Open Source

The author explores the transition from using AI chat interfaces to employing agentic workflows for complex tasks. By treating AI as a collaborative staff member rather than a simple tool, the author achieves more in-depth analytical work despite challenges with platform-imposed resource limits and account access.

Limitations in proprietary systems lead to an investigation of local, open-source models, which highlights the necessity of powerful hardware and GPU resources. These technical constraints suggest that successful investing in the AI sector requires a deep understanding of infrastructure and resource allocation beyond mere financial metrics.

It is very rare to truly grasp the importance of a concept if you don’t work directly in that field, to really see it with your own eyes. For example, the importance of safety and security, in any industry, is most often felt and emphasized only after bad things happen.

I had one of these moments when I connected many dots, from investing in stocks to getting locked out of my Pro account by Anthropic (a whole different story).

My understanding of AI and the AI-related tech industry is close to 0.1. At best I’m an AI enthusiast: I try different models and tools, read the news, and that’s that. Oh, and I watch a lot of YouTube videos on this topic.

My tiny aha moment is really just about two and a half dots.

The first dot: AI agents are real, and they are in the service industry as much as they are in the tech industry.

I have been using different models on a regular basis since GPT-4, and most of my use cases revolve around the chat experience. I send over questions and documents and ask LLMs to give me answers and solutions. It was not until the past two weeks, when I had to rely on Claude Cowork to rush out an in-depth analytical presentation for a high-stakes meeting, that I realized the power of the agentic workflow and experience.

To be honest I hate this name, agentic, as it emphasizes the hype more than the essence. To me, the difference (between working with something like Claude Cowork and working with the chat) is this:

You give a goal you want to achieve to the “agent” and it is the agent’s responsibility to figure out how to achieve it.

You cannot treat it like a tool to get answers from. It is much more than that.

As a matter of fact, I feel I can do the same thing in Claude chat (the normal thing in your browser), because the model is smart enough to break the task down into steps and then call the necessary tools. The only difference is that it doesn’t create the documents directly on your computer, so you have to upload and download them manually.

So the biggest difference maker is still me: I stopped assuming that Claude needed lots of handholding and started asking difficult questions. I raised the bar and started to talk to it (literally talk into the chat box via voice-to-text) like a staff member: what works, what doesn’t work. I show emotions by praising the work when it is well done and giving harsh feedback when the same mistake appears again.

The result was good, but not in the ways I had assumed. It didn’t really save much time, as work expands to fill whatever time is available. However, I’d say that for the same time period, the work is at least 50% more in-depth, with much more data analyzed. I didn’t have to swim through the spreadsheet on my own to discover insights. I just asked Claude to do that for me, and I just “guided” it to the conclusion I needed.

If you are interested in this whole process you can read it here.

The second dot: models are powerful, but their ability to serve me (and you) is bound by the availability of resources, aka tokens, and the mercy of model companies.

I subscribed to the Pro plan and gained access to Cowork. Cowork, as powerful as it is, consumes tokens MUCH faster than normal chats. If you have multiple documents such as Word docs or Excel spreadsheets, these will all be read and counted as inputs.
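To get a feel for why a folder of documents eats the budget so quickly, here is a minimal sketch that estimates input tokens from text length. It assumes roughly 4 characters per token, a common rule of thumb for English text and not Anthropic's actual tokenizer; the file names and contents are made up for illustration.

```python
# Rough estimate of input tokens for a folder of documents.
# ASSUMPTION: ~4 characters per token, a rule of thumb for English text.
# Real tokenizers differ by model and language.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Return a rough token count for a piece of text."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def estimate_folder_tokens(documents: dict[str, str]) -> dict[str, int]:
    """Map each (hypothetical) file name to its estimated token count."""
    return {name: estimate_tokens(body) for name, body in documents.items()}


# Hypothetical documents standing in for text extracted from Word/Excel files.
docs = {
    "report.docx": "word " * 2000,        # 10,000 characters
    "figures.xlsx": "1234,5678\n" * 500,  # 5,000 characters
}
per_file = estimate_folder_tokens(docs)
total = sum(per_file.values())
print(per_file, total)
```

Every time the agent re-reads the folder, an amount like this is spent again, which is one plausible reason a single attempt can burn a large share of a session.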

But it is necessary. Cowork’s power to work at the project level inside a folder with multiple documents is just another level. Once I tried that, I just could not go back to chatting.

Then I found myself checking the Usage page (Settings->Usage) much more often to see how many tokens were left and when the next session would begin. There is a name for this: anxiety. Claude once consumed 60% of the session tokens in just one attempt, and it was its fault because it forgot my way of working.

The result of my work is really at the mercy of Claude. How many tokens it gives me directly determines how much work can be done in an afternoon.

And Claude doesn’t tell me how many tokens I have access to. At least I don’t know. And it could potentially decide one day that I’m not a worthy customer with my $20/month, and all the compute and tokens will be distributed to the more generous enterprise customers. Or I could just be disabled (which I was).

I feel like the same thing has happened with OpenAI too. I remember back in the day, ChatGPT would give really good answers even though I was just a free user. Now that they have started to emphasize making a profit, all the answers I get are bullet points. That is just humiliating. It doesn’t make any sense; obviously the models are NOT becoming less powerful. Even if OpenAI had stopped researching more powerful models after GPT-4, the experience should at least remain the same, not get worse. It is apparently a matter of resource allocation, or re-allocation, where precious resources go from serving people generating zero revenue (like me) to serving people who pay.

But what are tokens, really? And why am I (or anybody else) bound by their availability, and why do I have to pay for them? All I can see is a bar showing how much is left. What are the model companies such as OpenAI or Anthropic paying for, in the stories I only read about in the news, e.g. OpenAI wanting to invest 100 billion dollars in building data centers with the most advanced Nvidia GPUs?

The third dot: these big model companies cannot be trusted, but open source models are not free either.

Even before I was locked out of my Claude account, I had a hunch that open source models should at least be part of my workflow, to lower the costs. As I was going through a Claude Code tutorial (an official one), I wanted to be a smart ass: instead of using an official Claude API key, I asked Claude Code (signed in with my official account) to rewrite the essential code so that I could hook up the OpenRouter API and use whatever model I wanted. I am not sure if this was the real reason I got banned (and no one is), but on the same day I did it, I was locked out.

I was set back a whole generation from my effective workflow, and I desperately wanted to at least restore my ability to work at the project level. I am determined to make it happen, whatever it takes. I still haven’t managed to accomplish this, but I have faith.

After some research and YouTube video watching, it dawned on me that I could download open source models to my computer and just use them locally, without any API key and without even an Internet connection! That was mind-blowing for me, as I had never really experienced anything good that is free. And this was just next level.

Well, not so much. As it turns out, my M1 MacBook Air with 16GB of memory can only work with the most basic models. I downloaded Ollama and then Qwen3 with 8B parameters and started chatting with the model in my terminal. It was so clunky that I felt like I was talking to ChatGPT 3. It was not smart at all.
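Besides the terminal chat, a locally running Ollama server can also be called from a script. The sketch below is only an illustration, not what I actually ran: it assumes Ollama is serving on its default local port (11434), that the model tag is `qwen3:8b`, and that the model has already been pulled. The helper names `build_request` and `ask_local_model` are my own.

```python
import json
import urllib.request

# ASSUMPTION: Ollama is running locally on its default port and the
# model tag "qwen3:8b" has already been downloaded.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "qwen3:8b") -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_local_model(prompt: str, model: str = "qwen3:8b") -> str:
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, `ask_local_model("Who was the most influential philosopher in human history?")` would return the model's answer as a string. No API key is involved, and the request never leaves the machine.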

Why is that, though? Why can’t I use some of the more powerful models? I know Anthropic and OpenAI models are proprietary, but why are open models with more parameters also out of my reach, at least according to the YouTube video tutorials?

The answer comes down to hardware. My computer, aka the hardware, is not powerful enough to support the larger and more powerful local models. So in this sense, the model doesn’t care who you are (you, me, Anthropic, or OpenAI): if a more powerful model is needed, more powerful hardware is required.

The two windows above are (1) the GPU history and (2) a local model (Qwen3:8B) running in my terminal. The huge spike started when a question was asked and Qwen started to think about how to answer (the question was: who was the most influential philosopher in human history?).

Now things start to make sense. A bigger model requires a more powerful and ideally dedicated GPU with more memory, both of which are expensive for individual consumers and companies alike. So even though I can theoretically use a powerful 1T-parameter open source model like Kimi, I can never afford to do so without Moonshot’s GPU clusters and teams of engineers.
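The arithmetic behind this is simple enough to sketch. The snippet below estimates the memory needed just to hold a model's weights, assuming nominal parameter counts (8B and 1T) and two common precisions (16-bit, and 4-bit quantized). It ignores activations, context cache, and runtime overhead, so real requirements are higher.

```python
# Back-of-the-envelope memory needed just to hold a model's weights.
# ASSUMPTIONS: parameter counts are nominal (8B, 1T); 16-bit weights take
# 2 bytes per parameter and 4-bit quantized weights take 0.5 bytes.
# Activations, context cache, and runtime overhead are ignored.
GB = 1e9


def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Return the gigabytes needed to store the weights alone."""
    return num_params * bits_per_param / 8 / GB


for name, params in [("8B model", 8e9), ("1T model", 1e12)]:
    for bits in (16, 4):
        print(f"{name} at {bits}-bit: {weight_memory_gb(params, bits):,.0f} GB")
```

Under these assumptions an 8B model needs about 16 GB at 16-bit and about 4 GB at 4-bit, which is why only a quantized version is comfortable on a 16GB laptop. A 1T-parameter model needs roughly 500 GB even at 4-bit, far beyond any consumer machine.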

The fourth dot: successful investing requires a deep understanding of technology more than the ability to crunch numbers, because the former provides a foundation for conviction.

Being a professional manager in my day job, I have long believed that the ability to understand how an organization, including its people and capital, works is the most important thing. However, this view is challenged more and more these days as I tinker with the models. The value of professional managers is deteriorating; we are essentially number crunchers without domain knowledge of the main business (whatever that is). For example, a professional manager is unlikely to be a great hospital administrator; such roles are almost always filled by doctors because their knowledge of medicine and patient treatment is the foundation of administrative judgment.

On the other hand, I have come to terms with the fact that by the time I hear about a rising stock, it is near the top. For example, with the current craze for AI and semiconductor stocks, I should not try to pick individual winners, because for whatever companies I know of, the growth was achieved months ago. My best chance of catching whatever growth is left is with some targeted ETFs in the field, and ideally only a small portion of my portfolio should be allocated to them.
