The stack-of-audits problem


Last week I was on a call with a private investment firm I work with. Their analyst has been pulling his hair out trying to use Claude to spread financial statements.

The setup: five-plus years of income statements, balance sheets, and cash flow statements. All buried in long, messy PDFs from private companies. The kind of documents that vary in format from year to year and don't follow the clean structure of a 10-K. He had built a beautiful Excel template. He was uploading the PDFs to Claude. He was asking for the whole thing in one go.

Claude was choking. Numbers came back incomplete. The cash flow deltas (the 3, 6, 9 month math) were wrong. The team assumed they had a prompting problem. So they kept rewriting the prompt. Longer instructions. More rules. More context.
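A note on the 3, 6, 9 month math, since it is the part that is easiest to check by hand. This sketch assumes the interim cash flow figures are reported cumulatively, year to date, so a standalone quarter is the difference between two consecutive cumulative totals. The function name and the sample figures are made up for illustration.

```python
def standalone_periods(cumulative_totals):
    """Turn year-to-date totals (3M, 6M, 9M, 12M) into standalone period figures.

    Assumes each entry is cumulative from the start of the fiscal year.
    """
    periods = []
    previous = 0
    for total in cumulative_totals:
        periods.append(total - previous)  # this period = this YTD minus the prior YTD
        previous = total
    return periods


# Hypothetical operating cash flow, reported year to date at 3, 6, 9 and 12 months.
ytd_operating_cash_flow = [120, 260, 390, 540]
print(standalone_periods(ytd_operating_cash_flow))  # [120, 140, 130, 150]
```

A check this small is worth running on whatever comes back, because it catches a wrong delta without anyone re-reading the PDF.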

Here is what I told them on the call. Their prompts were fine. The job they were giving Claude was too big.

Asking Claude to spread five years of three statements from a stack of inconsistent PDFs in one shot is like handing a brand-new analyst a pile of audits on day one and saying "spread all of this by Friday." Of course the work comes back wrong. You wouldn't blame the analyst. You would shrink the ask.

I call this the stack-of-audits problem. The instinct, when AI underperforms, is to push harder. Write a longer prompt. Add more guardrails. Throw in more context. The move that actually works is the opposite. Make the next ask small enough that you would trust a new analyst with it.

For this team, that meant a different workflow. Start with one statement (income statement first). Pull one or two years at a time. Tell Claude exactly which PDF pages the numbers live on. Then move to the next chunk. Then the next. The accuracy went up. The frustration dropped. They got more done in the next hour than they had in the previous week.
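If it helps to see that workflow written down, here is a minimal sketch of the planning step: one statement, one or two years, and explicit page references per ask. Everything in it is hypothetical (the `Ask` class, `plan_asks`, `to_prompt`, the years, the page ranges). It only builds the list of small asks and the wording for each one. It does not call Claude.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Ask:
    statement: str           # e.g. "income statement"
    years: tuple[int, ...]   # one or two fiscal years
    pages: tuple[str, ...]   # page hint for each year, in the same order


def plan_asks(statements, years, years_per_ask=2, page_hints=None):
    """Split the full job into small asks: one statement, a few years each."""
    page_hints = page_hints or {}
    asks = []
    for statement in statements:
        for start in range(0, len(years), years_per_ask):
            chunk = tuple(years[start:start + years_per_ask])
            pages = tuple(
                page_hints.get((statement, year), "pages not located yet")
                for year in chunk
            )
            asks.append(Ask(statement, chunk, pages))
    return asks


def to_prompt(ask):
    """Word one small ask the way you would brief a new analyst."""
    lines = [f"Spread only the {ask.statement}."]
    for year, pages in zip(ask.years, ask.pages):
        lines.append(f"FY{year}: {pages}")
    lines.append("Do not fill in any other statement or year.")
    return "\n".join(lines)


# Hypothetical inputs: five years, three statements, one known page range.
asks = plan_asks(
    ["income statement", "balance sheet", "cash flow statement"],
    [2019, 2020, 2021, 2022, 2023],
    page_hints={("income statement", 2019): "PDF pages 14-15"},
)
print(len(asks))           # 9
print(to_prompt(asks[0]))
```

Nine small asks instead of one big one. Each is short enough to review by eye before moving to the next.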

This applies far beyond financial spreading. If you are trying to use AI to summarize a quarter of board materials, draft a year of sales emails, or process every invoice from the last fiscal year, look at the scope of the ask before you blame the model.

Two questions worth asking when an AI workflow is failing:

1. What is the smallest version of this task that I could hand to a junior person and trust?

2. What if I ran that smaller version ten times instead of the big one once?

Most of the AI workflows I see breaking have the same root cause. The job is too big for a single ask. The good news is that scope is a free fix. No new tool, no new prompt library, no new vendor. Just a smaller ask.

When Claude looks like it can't do the work, try doing less of the work. Then do that ten times.

Alex

Alex Talks AI

As an AI Coach, Advisor, and Agent Builder, I help organizations and business leaders harness the power of artificial intelligence to boost productivity and streamline operations. I help organizations navigate the transformative landscape of AI by educating teams, identifying operational and strategic opportunities for AI, and creating a framework for safe and transparent use of data in the organization.
