  • Connecting the Dots: My Journey with AI Agents and the Reality of Compute

    The author explores the transition from using AI chat interfaces to employing agentic workflows for complex tasks. By treating AI as a collaborative staff member rather than a simple tool, the author achieves more in-depth analytical work despite challenges with platform-imposed resource limits and account access.

    Limitations in proprietary systems lead to an investigation of local, open-source models, which highlights the necessity of powerful hardware and GPU resources. These technical constraints suggest that successful investing in the AI sector requires a deep understanding of infrastructure and resource allocation beyond mere financial metrics.

    It is rare to truly grasp the importance of a concept if you don’t work directly in that field, to really see it with your own eyes. For example, the importance of safety and security, in any industry, is most often felt and emphasized only after bad things happen.

    I had one of these moments when I connected many dots, from investing in stocks to getting locked out of my Pro account by Anthropic (a whole different story).

    My understanding of AI and the AI-related tech industry is close to 0.1. At best I’m an AI enthusiast: I try different models and tools, read the news, and that’s that. Oh, and I watch a lot of YouTube videos on this topic.

    My tiny ah-ha is really just about two and a half dots.

    The first dot: AI agents are real, and they belong to the service industry as much as to the tech industry.

    I have been using different models on a regular basis since GPT-4, and most of my use cases revolve around the chat experience. I send over questions and documents and ask LLMs to give me answers and solutions. It was not until the past two weeks, when I had to rely on Claude Cowork to rush out an in-depth analytical presentation for a high-stakes meeting, that I realized the power of the agentic workflow.

    To be honest I hate this name, agentic, as it emphasizes the hype more than the essence. To me, the difference between working with something like Claude Cowork and working with the chat is this:

    You give a goal you want to achieve to the “agent” and it is the agent’s responsibility to figure out how to achieve it. 

    You cannot treat it like a tool to get answers from. It is much more than that.

    As a matter of fact, I feel I can do the same thing in Claude chat (the normal thing in your browser), because the model is smart enough to break the task down into steps and then call the necessary tools. The only difference is that it doesn’t create the documents directly on your computer, so you have to upload and download them manually.

    So the biggest difference maker is still me: I stopped assuming that Claude needed lots of handholding and started asking difficult questions. I raised the bar. I started to talk to it (literally talking into the chat box via voice-to-text) like a staff member: what works, what doesn’t work. I show emotion by praising the work when it is well done and giving harsh feedback when the same mistake appears again.

    The result was good, but not in the ways I had assumed. It didn’t really save much time, as work expands to fill whatever time is available. However, I’d say that for the same time period, the work is at least 50% more in-depth, with much more data analyzed. I didn’t have to swim through the spreadsheet on my own to discover insights. I just asked Claude to do that for me, and I “guided” it to the conclusion I needed.

    If you are interested in this whole process you can read it here.

    The second dot: models are powerful, but their ability to serve me (and you) is bound by the availability of resources, a.k.a. tokens, and by the mercy of the model companies.

    I subscribed to the Pro plan and gained access to Cowork. Cowork, as powerful as it is, consumes tokens MUCH faster than normal chats. If you have multiple documents such as Word docs or Excel spreadsheets, these will all be read and counted as input.
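
    A rough way to see why a folder of documents eats into a session so quickly is a back-of-the-envelope estimate. The sketch below assumes about four characters per token, a common rule of thumb for English text and not Anthropic's actual tokenizer; the function name and the placeholder documents are made up for illustration.

```python
# Back-of-the-envelope token estimate for a set of documents.
# ASSUMPTION: ~4 characters per token (rule of thumb for English text,
# not the real tokenizer used by any particular model).
CHARS_PER_TOKEN = 4

def estimate_tokens(texts):
    """Return an approximate token count for a list of document strings."""
    total_chars = sum(len(t) for t in texts)
    return total_chars // CHARS_PER_TOKEN

# Placeholder "documents": only their lengths matter for the estimate.
docs = ["x" * 40_000, "y" * 120_000]
print(estimate_tokens(docs))
```

    Every document in the folder adds to that total before the model has written a single word of output, which is why project-level work drains a session faster than a short chat.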

    But it is necessary. Cowork’s ability to work at the project level, inside a folder with multiple documents, is on another level. Once I tried that, I could not go back to chatting.

    Then I found myself checking the Usage page (Settings -> Usage) much more often to see how many tokens were left and when the next session would begin. There is a name for this: anxiety. Claude once consumed 60% of the session tokens in just one attempt, and it was its fault because it forgot my way of working.

    The result of my work is really at the mercy of Claude. How many tokens it gives me directly determines how much work can be done in an afternoon.

    And Claude doesn’t tell me how many tokens I have access to. At least I don’t know. It could decide one day that I’m not a worthy customer with my $20/month, and that all the compute and tokens will go to the more generous enterprise customers. Or my account could just be disabled (which it was).

    I feel like the same thing has happened with OpenAI too. I remember that back in the day ChatGPT would give really good answers even though I was just a free user. Now that they have started to emphasize making a profit, all the answers I get are bullet points. That is just humiliating. It doesn’t make any sense; obviously the models are NOT becoming less powerful. Even if OpenAI had stopped researching more powerful models after GPT-4, the experience should at least have remained the same, not gotten worse. It appears to be a matter of resource allocation, or re-allocation, where precious resources go from serving people generating zero revenue (like me) to serving people who pay.

    But what are tokens, really? Why am I (or anybody else) bound by their availability, and why do I have to pay for them? All I can see is a bar showing how much is left. What are the model companies such as OpenAI or Anthropic paying for, which I only read about in the news (e.g. OpenAI wanting to invest 100 billion dollars in building data centers with the most advanced Nvidia GPUs)?

    The third dot: these big model companies cannot be trusted, but open source models are not free either.

    Even before I was locked out of my Claude account, I had a hunch that open source models should be at least part of my workflow, to lower the costs. As I was going through a Claude Code tutorial (an official one) I wanted to be a smart ass: instead of using an official Claude API key, I asked Claude Code (signed in with my official account) to rewrite the essential code so that I could hook up the OpenRouter API and use whatever model I wanted. I’m not sure if this was the real reason I got banned (and no one is), but on the same day I did it, I was locked out.

    I was set back a generation from my effective workflow, and I desperately wanted to at least restore my ability to work at the project level. I am determined to make it happen, whatever it takes. I still haven’t managed it, but I have faith.

    After some research and YouTube watching, it dawned on me that I can download open source models to my computer and use them locally, without any API key and without even an Internet connection! That was mind-blowing, as I had never really experienced anything good that was free. And this is just next level.

    Well, not so much. As it turns out, my M1 MacBook Air with 16GB of memory can only work with the most basic models. I downloaded Ollama and then Qwen3 with 8B parameters and started chatting with the model in my terminal. It was so clunky that it felt like talking to ChatGPT 3. It was not smart at all.
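
    For anyone who would rather script this than type into the terminal, here is a minimal sketch of calling a locally running Ollama server from Python. It assumes Ollama's default local endpoint (http://localhost:11434/api/generate) and that the qwen3:8b model has already been downloaded; the helper names build_payload and ask_local_model are hypothetical, chosen for this example.

```python
import json
import urllib.request

# ASSUMPTION: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(prompt, model="qwen3:8b"):
    """Build the JSON body for a single, non-streaming generation request."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask_local_model(prompt, model="qwen3:8b", url=OLLAMA_URL):
    """Send the prompt to the local server and return the generated text."""
    body = json.dumps(build_payload(prompt, model)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]

# Usage (needs a running Ollama server, so it is left commented out):
# print(ask_local_model("Who was the most influential philosopher in human history?"))
```

    Nothing here leaves the machine: the request goes to localhost, which is why no API key or Internet connection is involved.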

    Why is that, though? Why can’t I use some of the more powerful models? I know Anthropic and OpenAI models are proprietary, but why are open models with more parameters also out of my reach, at least according to the YouTube tutorials?

    The answer comes down to hardware. My computer is not powerful enough to support the larger and more powerful local models. In this sense, the model doesn’t care who you are (you, me, Anthropic, or OpenAI): if a more powerful model is needed, more powerful hardware is required.

    The two windows above are (1) the GPU history, on top, and (2) a local model (Qwen3:8B) running in my terminal. The huge spike started when a question was asked and Qwen started to think about how to answer (the question was: who was the most influential philosopher in human history?).

    Now things start to make sense. A bigger model requires a more powerful, ideally dedicated GPU with more memory, both of which are expensive for individual consumers and companies alike. So even though in theory I can use a powerful 1T-parameter open source model like Kimi, I could never afford to do so without Moonshot’s GPU clusters and teams of engineers.
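
    The arithmetic behind this can be sketched in a few lines. The weights of a model take roughly (number of parameters) x (bits per parameter) / 8 bytes. The 16-bit and 4-bit precisions below are illustrative assumptions, not what any particular download actually uses, and the estimate ignores everything beyond the weights (the context cache, the operating system, other apps), so real requirements are higher.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

# ASSUMPTION: 16-bit and 4-bit are example precisions chosen for illustration.
for name, params in [("8B model", 8e9), ("1T model", 1e12)]:
    for bits in (16, 4):
        size = weight_memory_gb(params, bits)
        print(f"{name} at {bits}-bit: ~{size:,.0f} GB")
```

    Under these assumptions an 8B model fits into a 16GB laptop only when compressed to around 4 bits per parameter (about 4 GB of weights), while a 1T-parameter model needs about 500 GB even at 4 bits, far beyond any consumer machine.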

    The fourth dot: successful investing requires a deep understanding of technology more than the ability to crunch numbers, because the former provides a foundation for conviction.

    As a professional manager in my day job, I have long believed that the ability to understand how an organization works, including its people and capital, is the most important thing. However, this view is challenged more and more these days as I tinker with the models. The value of professional managers is deteriorating; we are essentially number crunchers without domain knowledge of the main business (whatever that is). For example, a professional manager is unlikely to be a great hospital administrator; such roles are almost always filled by doctors, because their knowledge of medicine and patient treatment is the foundation of administrative judgment.

    On the other hand, I have come to terms with the fact that by the time I hear about a rising stock, it is near the top. For example, with the current craze for AI and semiconductor stocks, I should not try to pick individual winners, because for whatever companies I know of, the growth was already achieved months ago. My best chance of catching whatever growth is left is through some targeted ETFs in the field, and ideally these should still be only a small portion of my portfolio.

  • I Spent a Week Building a 25-Slide Deck with Claude. Here’s What Actually Worked (and What Blew Up in My Face)

    A brutally honest account of using AI to build a real corporate presentation — not a demo, not a toy project.


    I just finished a week-long project building a half-year industry analysis presentation for a large company with six subsidiaries across multiple business verticals. Thirty-plus slides, six companies, three years of operational data to cross-reference, and a deadline that didn’t move.

    I used Claude — specifically a combination of Claude’s chat interface, its desktop Cowork app, and its Office plugins for Word and Excel — as my primary co-worker throughout. Not as a toy. Not as a “let’s see what AI can do” experiment. As an actual production tool on a real deliverable.

    Here’s what I learned. The honest version.


    The Setup

    The brief: a 2026 H1 analysis report for a large company with six subsidiary businesses across multiple verticals. Leadership wanted forward-looking projections where possible. The data lived across dozens of operational reports — Word documents, each one extremely long — plus spreadsheets. Six companies. Three reporting periods each. That’s 18+ documents just for the core data layer, before touching anything about structure or narrative.

    I had no template. I had no clear starting point. I had a pile of source material and a deadline.

    This is where AI actually earns its keep — or fails you.


    What I Did Wrong First

    Let me start with the mistakes, because they’ll save you more time than the wins will.

    Mistake #1: I tried to build on top of an existing PPT file.

    My first instinct was sensible: don’t start from zero, grab a previous version from a colleague and iterate from there. Hand it to Claude, ask it to update slide by slide.

    This is a trap.

    Every time Claude touches a task involving an existing PowerPoint file, it re-reads the entire thing from scratch. It tries to understand the whole structure, all the existing slides, the formatting logic — before doing anything you actually asked for. I watched one session burn through 60% of its context window just trying to insert a single new slide. The deck barely moved. My progress stalled completely.

    The fix: stop trying to edit an existing file. Generate each slide as a standalone output and paste it in yourself. More on this below.

    Mistake #2: I gave Claude too much context, thinking more was always better.

    It isn’t. Dumping all 18 source documents into one session and asking Claude to “just figure it out” produces confused, generic output. The model gets pulled in too many directions. What I started calling “context contamination” is real — irrelevant material in the context window quietly degrades the quality of the output you actually wanted.

    The fix: treat each slide as an isolated task. Only load the files relevant to that specific slide.


    The Workflow That Actually Worked

    After the wrong turns, I landed on a four-phase system.

    Phase 1 — Plan in chat, not in Cowork.

    Before touching the desktop app or generating a single slide, I spent time in a Claude chat Project just reading and thinking. I uploaded source documents, asked Claude to help me understand what data I actually had, and together we worked out a chapter structure and a rough outline of what each section needed to contain.

    The critical thing here: you have to do this yourself. Claude can help you brainstorm, but you have to be the one who actually understands your material. If you can’t judge whether a proposed structure makes sense, you can’t course-correct when Claude gets it wrong — and it will get things wrong. Your judgment is not optional.

    Phase 2 — Extract data using the Office plugins.

    This was the biggest surprise of the project: Claude inside Word and Excel is genuinely powerful for data work, in a way that the standalone chat interface isn’t.

    With Claude in Word, I opened three operational reports simultaneously and asked it to cross-reference them — pull the same hotel’s revenue figures across Q1 2025, H1 2025, and Q1 2026 in one pass. It did. That would have taken me an hour manually. It took maybe three minutes.

    With Claude in Excel, I could ask open-ended analytical questions directly against the spreadsheet data. Not “apply this formula” — actual questions like “which segment showed the steepest decline in the back half of the year and what’s driving it?” Claude would locate the relevant data, analyze it, and discuss the findings with me interactively. This is not a smart autocomplete. It’s a different kind of tool.

    Phase 3 — Build slides one at a time in Cowork, with strict isolation.

    Here’s the discipline that made Cowork actually work: one slide (or one cluster of closely related slides) per conversation. One task. Clean context.

    My process for each slide:

    1. Copy the relevant source files into a fresh working folder
    2. Open a new Cowork conversation
    3. Tell Claude exactly which files to read and what I need
    4. Ask for a written work plan and outline before any slide generation
    5. Iterate on the content through conversation until it’s right
    6. Only at the very end, ask for the actual PowerPoint output

    That last point is important. Generating the PPT is the last small step, not the first. Most of the real work — the thinking, the structuring, the data decisions — happens in conversation. If you jump straight to “make me a slide,” you burn tokens on something you haven’t thought through yet, and the output reflects that.
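
    Step 1 of the process above (a fresh working folder that holds only the files one slide needs) is easy to script. This is a minimal sketch, not part of the original workflow: the function name make_slide_workspace, the folder names, and the file names are all hypothetical.

```python
import shutil
from pathlib import Path

def make_slide_workspace(slide_name, source_files, root):
    """Create a fresh folder for one slide and copy in only the files it needs."""
    workspace = Path(root) / slide_name
    # exist_ok=False fails loudly if the folder already exists,
    # so an old conversation's files never leak into a new one.
    workspace.mkdir(parents=True, exist_ok=False)
    for src in source_files:
        # Copy rather than move, so the original source files stay untouched.
        shutil.copy2(src, workspace / Path(src).name)
    return workspace

# Usage (hypothetical paths):
# make_slide_workspace("slide_07", ["reports/q1_2026.docx"], "cowork_workspaces")
```

    Because the files are copied, the originals and the master deck stay outside the folder the assistant can see.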

    Phase 4 — You manage the master file. Not Claude.

    The master PowerPoint file lives in a completely separate folder that Cowork cannot see. Claude generates individual slides; I copy and paste them in manually.

    This sounds tedious. It is slightly tedious. It is also the reason I never had to spend an afternoon rolling back a corrupted file. When Claude can’t see your master file, it can’t accidentally overwrite, reorder, or misinterpret it. The manual paste is cheap insurance.


    The Tools, Mapped to Their Jobs

    Tool                       What it’s actually good for
    Claude chat (Project)      Planning, brainstorming, making sense of source material
    Claude in Word             Cross-referencing multiple documents you already have open
    Claude in Excel            Open-ended analysis and Q&A against live spreadsheet data
    Claude Cowork (desktop)    Executing specific, isolated slide-generation tasks
    Claude in PowerPoint       Mostly skipped; Cowork handled this better in practice

    The Honest Summary

    AI didn’t make this presentation for me. It made it possible for me to make this presentation in a week instead of three.

    The parts that still required me: understanding the source material, judging whether a structure was right, deciding what story the data was telling, and managing the final output. None of that got automated away. What got automated was the tedious cross-referencing, the reformatting, the “turn this table into a slide” mechanical work that used to eat hours.

    The traps are real. Handing Claude a messy context and hoping for magic doesn’t work. Treating it like a simple command-line tool doesn’t work. What works is treating it like a smart but context-blind collaborator — one who needs clear instructions, bounded tasks, and a human in the loop who actually knows what good looks like.

    If you’re building complex decks and you haven’t tried this yet, try it. Just start with the isolation discipline. One slide, one conversation. Don’t let it touch your master file. Do the planning in chat first.

    It won’t feel like magic. It’ll feel like having a very fast, very capable assistant who needs good management. That’s exactly what it is.