<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
  <title>Posts tagged with “AI” on Mark van Lent’s weblog</title>
  <updated>2026-04-08T00:00:00+00:00</updated>
  <link rel="self" type="application/atom+xml" href="https://markvanlent.dev/tags/ai/index.xml" hreflang="en"/>
  <id>tag:markvanlent.dev,2010-04-02:/tags/ai/index.xml</id>
  <link rel="alternate" type="text/html" href="https://markvanlent.dev/tags/ai/" hreflang="en"/>
  <author>
      <name>Mark van Lent</name>
      <uri>https://markvanlent.dev/about/</uri>
    </author>
  <rights>Copyright (c) Mark van Lent, Creative Commons Attribution 4.0 International License.</rights>
  <icon>https://markvanlent.dev/favicon.ico</icon>
  <entry>
    <title type="html"><![CDATA[AI Engineer Europe 2026: Workshop Day]]></title>
    <link rel="alternate" href="https://markvanlent.dev/2026/04/08/ai-engineer-europe-2026-workshop-day/" type="text/html" />
    <id>https://markvanlent.dev/2026/04/08/ai-engineer-europe-2026-workshop-day/</id>
    <author>
      <name>Mark van Lent</name>
      <uri>https://markvanlent.dev/about/</uri>
    </author>
    <category term="ai" />
    <category term="conference" />
    
    <updated>2026-04-09T20:32:35Z</updated>
    <published>2026-04-08T00:00:00Z</published>
    <content type="html"><![CDATA[<p><a href="https://www.ai.engineer/europe">AI Engineer Europe</a> is &ldquo;Europe&rsquo;s first flagship
AI Engineer event&rdquo;. I was fortunate enough to be one of the engineers that
<a href="https://schubergphilis.com/">Schuberg Philis</a> (my employer) sent to this
conference. The first day was workshop day.</p>
<p><img src="/images/ai_engineer_2026_entrance.jpg" alt="AI Engineer Europe 2026 was held at the Queen Elizabeth II Centre in London"></p>
<h2 id="how-to-build-agents-that-run-for-hours-without-losing-the-plot--ash-prabaker-and-andrew-wilson">How to Build Agents That Run for Hours (Without Losing the Plot) &mdash; Ash Prabaker and Andrew Wilson</h2>
<p>Why are agents losing the plot?</p>
<ul>
<li>Context: the agent can&rsquo;t carry state (and gets &ldquo;context anxiety&rdquo; as the
context fills up)</li>
<li>Planning: general models are not great at planning (e.g. they run out of context)</li>
<li>Verification: models are bad at evaluating their own output (they decide
they&rsquo;re done too early)</li>
</ul>
<p>There are two ways to fix this:</p>
<ul>
<li>Train the model</li>
<li>Wrap the model in a harness</li>
</ul>
<p>Last year, Anthropic&rsquo;s new models were also paired with harness improvements;
every release of Claude could run unattended for longer.</p>
<p>On harness design for long-running agents: they built a generator/evaluation
loop, then deleted half of it when the model caught up. The key idea is splitting
the generator (which builds the thing) from the evaluator (which grades the
thing); most people currently use the same instance to both build and evaluate.</p>
<blockquote>
<p>Tuning a standalone evaluator to be skeptical is tractable. Making a generator
self-critical is not.</p></blockquote>
<p>They added one more role: the planner, which expands a one-line prompt into a
full spec. This is the input for the generator. If you squint a bit, you&rsquo;ll
notice that this mimics the real world, where we have a product manager (the
planner), an individual contributor (the generator) and a QA person (the
evaluator).</p>
<p>Before any code gets written, the generator and evaluator negotiate what &ldquo;done&rdquo;
looks like for this chunk. They iterate via files until they agree (one agent
writes, the other reads and responds). The agents agree on a contract. This
bridges user stories to testable behaviour.</p>
<p>Ash presented an example of a solo agent building a game versus building the
same game with a full harness. The solo agent was done in 20 minutes, while the
full harness took 6 hours; but the result was significantly better and more
thought through.</p>
<p>Out of the box, Claude is a poor QA agent. It would find a bug and then decide
itself that it wasn&rsquo;t a big deal and approve the work anyway.</p>
<p>With Opus 4.6 half of what was described about harnesses became obsolete. The
new model needs less scaffolding and the harness can be less complex.</p>
<figure><img src="/images/ai_engineer_2026_anthropic_harness_design.jpg"
    alt="human -&gt; planner -&gt; generator -&gt; evaluator"><figcaption>
      <p>The simplified harness where the human prompts the planner, the planner writes the spec and the generator and evaluator work on the application</p>
    </figcaption>
</figure>

<p>Takeaways:</p>
<ol>
<li>Use an adversarial evaluator; self-evaluation is a trap</li>
<li>Structured handoffs are better than compaction</li>
<li>Make subjective quality gradable with rubrics the model can apply</li>
<li>Read the traces; they are your primary debugging loop</li>
<li>Delete scaffolding when the model catches up</li>
</ol>
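<p>Takeaway 3 &mdash; making subjective quality gradable &mdash; can be as simple as turning taste into a weighted checklist that the evaluator model fills in and the harness applies as a gate. A hypothetical sketch; the criteria, weights and threshold are invented for illustration.</p>

```python
# Sketch: turn subjective quality into a gradable rubric an evaluator
# model can apply. Criteria and weights are illustrative, not a standard.
RUBRIC = {
    "follows_spec": 3.0,     # does the work match the agreed contract?
    "tests_pass": 3.0,       # objective signal where available
    "readability": 1.0,      # subjective, but now explicitly scored
    "error_handling": 2.0,
}

def score(grades: dict[str, float]) -> float:
    """Weighted average of per-criterion grades in [0, 1]."""
    total_weight = sum(RUBRIC.values())
    weighted = sum(RUBRIC[name] * grades[name] for name in RUBRIC)
    return weighted / total_weight

def approve(grades: dict[str, float], threshold: float = 0.8) -> bool:
    # The evaluator model produces `grades`; the harness applies the gate.
    return score(grades) >= threshold
```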
<p>The models you are using (e.g. Opus for planning and Sonnet for building)
influence the harness. The harness patterns that work tend to get absorbed back
into the tools. Most of the loop described is buildable in Claude Code right now
with primitives that are already available.</p>
<p>Further reading on the Anthropic Engineering blog:</p>
<ul>
<li><a href="https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents">Effective harnesses for long-running agents</a></li>
<li><a href="https://www.anthropic.com/engineering/harness-design-long-running-apps">Harness design for long-running application development</a></li>
</ul>
<h2 id="building-your-own-software-factory--eric-zakariasson">Building Your Own Software Factory &mdash; Eric Zakariasson</h2>
<p>Eric spoke about his experience using Cursor to build a software factory.
Some parts are running autonomously. It&rsquo;s hard work. These are his observations.</p>
<p>There are several levels of using AI. At the lowest level you have autocomplete.
From there you progress through a coding intern, a junior dev and a developer
(where the majority of the code is written by the AI tool) to a senior developer,
and finally a software factory, where the AI is a black box and operates like a
<a href="https://en.wikipedia.org/wiki/Lights_out_(manufacturing)">dark factory</a>.</p>
<p>Why would you want a factory?</p>
<ul>
<li>Throughput (24/7, machines do not need sleep)</li>
<li>Consistent output (like an actual factory, but with AI there is a risk of
losing determinism)</li>
<li>Leverage your taste better</li>
</ul>
<p>What do you need to build a factory?</p>
<ul>
<li>Primitives &amp; patterns (co-located code and usage patterns)</li>
<li>Guardrails (you want to set the agents free, but not <em>too</em> free; think
rules, hooks and tests)</li>
<li>Enablers (what can you give the agents so they can act more freely; think
skills and MCPs)</li>
</ul>
<figure><img src="/images/ai_engineer_2026_eric_zakariasson.jpg"
    alt="Building the factory"><figcaption>
      <p>Eric Zakariasson speaking about what you need to build a software factory</p>
    </figcaption>
</figure>

<p>Rules should be created dynamically: if you see an agent doing something you
don&rsquo;t like, write a rule. Note that agents will behave better over time with
newer models, so rules may become obsolete.</p>
<p>Other aspects you&rsquo;ll need:</p>
<ul>
<li>Runnable: can the agent start the env?</li>
<li>Accessible: can the agent use the tools it needs, e.g. Datadog?</li>
<li>Verifiable: can the agent perform checks itself?</li>
</ul>
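<p>A hypothetical readiness check along those three axes might look like this before handing an agent any work. The specific commands (<code>docker</code>, <code>git</code>) are invented examples, not from Eric&rsquo;s setup.</p>

```python
# Sketch: verify an agent's environment is runnable, accessible and
# verifiable before giving it work. The commands checked are illustrative.
import shutil
import subprocess

def run_ok(cmd: list[str]) -> bool:
    """True if the command exists and exits with status 0."""
    try:
        return subprocess.run(cmd, capture_output=True).returncode == 0
    except FileNotFoundError:
        return False

def environment_ready() -> dict[str, bool]:
    return {
        # Runnable: can the agent start the environment?
        "runnable": shutil.which("docker") is not None,
        # Accessible: can the agent reach the tools it needs?
        "accessible": shutil.which("git") is not None,
        # Verifiable: can the agent run the checks itself?
        "verifiable": run_ok(["git", "--version"]),
    }
```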
<p>For each stage of the software development lifecycle you will need an agent:
for example, one to review changes and one for automated testing.</p>
<p>You need to shift your way of working:</p>
<ul>
<li>You&rsquo;ll write less code yourself (if any) but are managing your agents</li>
<li>You are also going from sync to async</li>
<li>You need to think more about scope and parallelising work (e.g. running some
tasks in parallel will guarantee merge conflicts while other tasks do not
interfere with each other).</li>
<li>You still need to know how data flows and what users want</li>
<li>You&rsquo;ll want to identify the human in the loop. Is there e.g. a copy/paste
action the user is doing now? Automate that away.</li>
</ul>
<p>Since agents will run for longer periods, you need to trust them more. You get
to know the agents: their weaknesses and their strengths and how to prompt them.</p>
<p>When having agents work in parallel, Eric uses separate environments (even
in different VMs) to get reproducible and isolated environments without side
effects from other branches and ongoing work. This takes more effort to
set up, but once you are there, it is easier to scale up the number of agents
working on your code.</p>
<p>You need to keep an eye out for where the agents go off the rails. Use this
information to improve the factory.</p>
<p>Now how do you go from 5 agents to 100? Same as before: observe the outcome.
And:</p>
<ul>
<li>Make sure the agents can verify their own work</li>
<li>Set up automations. Examples:
<ul>
<li>Eric demonstrated asking the agent what actions he does frequently so he can work on automating those</li>
<li>Review the comments made on PR reviews so the agents can learn from them</li>
</ul>
</li>
<li>Move up abstractions</li>
</ul>
<p>The takeaways:</p>
<ul>
<li>Be clear about your intent: what problem are you solving?</li>
<li>Stay in the loop for important decisions (e.g. which payment system to use,
etc)</li>
<li>Build systems and tools: codify them and give your agents access to them</li>
<li>Store context for later and keep it up-to-date (since it will evolve)</li>
<li>Let the agents be free (one team had even given the agents a place to complain
and that proved to be very useful)</li>
</ul>
<h2 id="build-your-own-deep-research-agent--technical-writer--louis-françois-bouchard-paul-iusztin-and-samridhi-vaid">Build Your Own Deep Research Agent + Technical Writer &mdash; Louis-François Bouchard, Paul Iusztin and Samridhi Vaid</h2>
<p>The team built a multi-agent pipeline to replace the research and technical writing
process. They give a topic and it will write a technical article (without slop
or hallucinations). It targets short content, like LinkedIn posts.</p>
<p>The GitHub repo of their project:
<a href="https://github.com/iusztinpaul/designing-real-world-ai-agents-workshop">iusztinpaul/designing-real-world-ai-agents-workshop</a></p>
<p>Constraints:</p>
<ul>
<li>Costs per task</li>
<li>Latency</li>
<li>Quality</li>
<li>Compliance &amp; data privacy</li>
</ul>
<p>There&rsquo;s a scale from simple prompts (where you have more control and lower costs)
via workflows and single agents to multiple agents (where you have more autonomy,
and thus less control, and higher costs). It&rsquo;s best to always use the simplest
solution. For example: if the context is known at or before query time
and is under 200K tokens, a simple prompt can suffice, using
<a href="https://www.geeksforgeeks.org/artificial-intelligence/context-augmented-generation-cag/">Context Augmented Generation (CAG)</a>.</p>
<p>In a situation where the context is not known beforehand (e.g. because it is
private or too recent), you might benefit from including a workflow. (A
workflow is a sequence of fixed steps, with the same steps in the same order
each time). Think about using
<a href="https://www.geeksforgeeks.org/nlp/what-is-retrieval-augmented-generation-rag/">Retrieval-Augmented Generation (RAG)</a>.</p>
<p>The next step is when you need the system to take autonomous actions or you need
dynamic behaviour. Then you get to agents, which can react to what is happening.
These agents can also use tools, which can have their own:</p>
<ul>
<li>System prompt</li>
<li>Validation logic</li>
<li>LLMs</li>
</ul>
<p>Tools are specialists, but with one shared decision maker which has a global
context. Delegation to tools helps with context management. The tools can have
their own context windows.</p>
<p>AI products are never just agents, simple workflows or LLM calls and tools;
they combine all of them. AI engineers need to understand how to build these
complex systems, and a deep research system is a perfect project for learning
to build such complex, multi-agent systems.</p>
<p>The MCP server they built (see <a href="https://github.com/iusztinpaul/designing-real-world-ai-agents-workshop">their GitHub repo</a>):</p>
<ul>
<li>Tools: actions the agent can do</li>
<li>Prompts: instructions the agent can follow</li>
<li>Resources: data the agent can read</li>
</ul>
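<p>Those three primitives map to a simple registry shape. A plain-Python sketch of the idea follows; this is <em>not</em> the actual MCP SDK, just an illustration of the tool/prompt/resource split, with made-up names throughout.</p>

```python
# Sketch of the three MCP primitives as a plain registry.
# Not the real MCP SDK; purely illustrative of the concept split.
class Server:
    def __init__(self, name: str):
        self.name = name
        self.tools = {}       # actions the agent can perform
        self.prompts = {}     # instructions the agent can follow
        self.resources = {}   # data the agent can read

    def tool(self, func):
        """Decorator registering an action by its function name."""
        self.tools[func.__name__] = func
        return func

    def prompt(self, name: str, text: str):
        self.prompts[name] = text

    def resource(self, uri: str, data: str):
        self.resources[uri] = data

server = Server("deep-research")

@server.tool
def search_web(query: str) -> list[str]:
    """Action: placeholder web search."""
    return [f"result for {query}"]

server.prompt("summarize", "Summarize the findings without slop.")
server.resource("notes://guidelines", "Write for LinkedIn; keep it short.")
```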
<p>Why both skills and MCP? They are moving to using skills more, but those cannot
replace the MCP server completely because some tools are too complex to be
turned into skills.</p>
<p>LinkedIn post generation:</p>
<ul>
<li>Guidelines: what to write about (topic, angle, etc)</li>
<li>Profiles: how to write (structure, terminology, character of the post)</li>
<li>Research</li>
</ul>
<p>Debugging workflows/agents purely through logs is hard. You want traces
(LLM/tool calls with full I/O + metadata), latency and cost tracking. They used
<a href="https://www.comet.com/docs/opik/">Opik</a> for observability.</p>
<p>You also want to automate evals. Generating one post allows for manual review,
but when scaling to 100 posts, it&rsquo;s impossible to manually review each one.
One small change could break something completely, so you need automated checks.</p>
<p>They used a three-layer architecture:</p>
<ul>
<li>Optimization</li>
<li>Regression testing (evals in CI/CD)</li>
<li>Production monitoring (using Opik)</li>
</ul>
<p>They encourage us to run their project ourselves and read the code to
understand the details of what&rsquo;s going on.</p>
<h2 id="ai-coding-for-real-engineers--matt-pocock">AI Coding For Real Engineers &mdash; Matt Pocock</h2>
<p>To follow the workshop along, see Matt&rsquo;s workshop at
<a href="https://aihero.dev/s/ai-2026">aihero.dev/s/ai-2026</a>.</p>
<p><em>Before I begin with my notes: this session is definitely worth watching (again)
once it&rsquo;s available on YouTube! Matt is an excellent teacher, has great energy
and great content as well.</em></p>
<p>Once there are about 100K tokens in the context, the AI starts making
increasingly poor decisions. We don&rsquo;t want the AI to bite off more than
it can chew, so keep your tasks small.</p>
<figure><img src="/images/ai_engineer_2026_matt_pocock_1.jpg"
    alt="Smart zone / dumb zone"><figcaption>
      <p>Matt Pocock explaining about the smart zone / dumb zone</p>
    </figcaption>
</figure>

<p>Even with the 1M context window of Claude the &ldquo;smart zone&rdquo; is still around 100K.
Claude basically just expanded the dumb zone. Good for retrieval, less good for
coding.</p>
<p>So how <em>do</em> you tackle big tasks? Multi-phase plans are a common solution. It&rsquo;s
basically a loop. This is where the
<a href="https://ralph-wiggum.ai/">Ralph Wiggum loop</a> comes from. Matt likes something
smarter though.</p>
<p>Every session starts with a system prompt. If you have 200K tokens in here
already, you are in the &ldquo;dumb zone&rdquo; from the start. To stay in the &ldquo;smart zone&rdquo;
you can clear the context. This does mean that you lose everything that
happened after the system prompt. Alternatively, you can compact.</p>
<p>If you show the number of tokens in the status line of Claude Code (or whatever
tool you are using), you know how close to the &ldquo;dumb zone&rdquo; you are.
<code>/compact</code> squeezes all the information into a summary. The downside of
<code>/compact</code> compared to <code>/clear</code>: the latter gives you a
deterministic state.</p>
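<p>The &ldquo;how close am I to the dumb zone&rdquo; check can be approximated with a character-based token estimate. The ~100K threshold comes from the talk; the chars-per-token heuristic and function names are illustrative.</p>

```python
# Sketch: estimate how close a context is to the ~100K-token "dumb zone".
# The chars/4 heuristic and the names here are illustrative.
SMART_ZONE_LIMIT = 100_000

def approx_tokens(text: str) -> int:
    return len(text) // 4  # rough average for English text

def zone_status(context: str) -> str:
    used = approx_tokens(context)
    pct = 100 * used // SMART_ZONE_LIMIT
    if used >= SMART_ZONE_LIMIT:
        return f"dumb zone ({pct}% of smart-zone budget): /clear or /compact"
    return f"smart zone ({pct}% of budget used)"
```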
<p>The <code>/grill-me</code> skill (<a href="https://github.com/mattpocock/skills/tree/main/grill-me">source</a>)
is a really nice way of taking inputs from the world. It can <cite>interview
the user relentlessly about a plan or design until reaching shared
understanding</cite> (according to the skill itself). Ideally you use
the <code>/grill-me</code> skill with both the developer and the domain expert in the room.</p>
<p>After the <code>/grill-me</code> skill, you want to summarize all those valuable tokens
into a Product Requirements Document (PRD). This is the definition of done for
your agent. You can use the <code>/write-a-prd</code> skill
(<a href="https://github.com/mattpocock/skills/tree/main/write-a-prd">source</a>)
to write this document. Note that there are testing decisions in there too.
These are important!</p>
<p>Matt explains that he does not actually read the PRDs that are generated. In the
grilling sessions he makes sure he and the AI are on the same wavelength so
there&rsquo;s no need to review the resulting document. Why would he? Doing so would
basically only test the LLM&rsquo;s ability to summarize.</p>
<p>Should he optimize the plan? Matt doesn&rsquo;t think optimizing the plan to death
adds a lot of value. Things will change afterwards anyway.</p>
<p>To see examples of a PRD and issues generated from it, check his
<a href="https://github.com/mattpocock/course-video-manager">course-video-manager</a>
repository, in particular the closed issues.</p>
<p>Now that we have our destination, how do we split it? Matt likes creating a
<a href="https://en.wikipedia.org/wiki/Kanban_board">Kanban board</a> out of it.
He created a skill to do this: <code>/prd-to-issues</code>
(<a href="https://github.com/mattpocock/skills/tree/main/prd-to-issues">source</a>).</p>
<p>As you&rsquo;ll see in that skill, Matt instructs the AI to use vertical slices. LLMs
love to code horizontally, i.e. per layer (database, business logic, frontend).
This means you don&rsquo;t get feedback on your work until all layers are done. If
you slice vertically instead, you can test the entire flow sooner. (This is also
known as &ldquo;tracer bullets&rdquo; if you&rsquo;ve read
<a href="https://pragprog.com/titles/tpp20/the-pragmatic-programmer-20th-anniversary-edition/">The Pragmatic Programmer</a>.)</p>
<p>With the Kanban board setup, it&rsquo;s easier to parallelize working on tasks. And
once we have the issues that can be worked on, the human can step out of the
loop.</p>
<p>For implementation use the <code>/tdd</code> skill
(<a href="https://github.com/mattpocock/skills/tree/main/tdd">source</a>). It does the red/green
refactor cycle: write a failing test first and then make it pass. This not only adds
(good) tests to the codebase; starting with the tests also makes it harder to &lsquo;cheat&rsquo;
when writing them (you cannot write the tests to match the implementation
because the latter does not exist yet).</p>
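<p>The red/green idea in miniature (a generic example, not taken from the <code>/tdd</code> skill itself): the test is written against the spec before the implementation exists, then the implementation is written to make it pass.</p>

```python
# Sketch of red/green TDD. The test comes first and would initially fail
# ("red") because the implementation does not exist yet.

def test_slugify():
    # Written first, against the spec: lowercase, spaces become hyphens.
    assert slugify("AI Engineer Europe") == "ai-engineer-europe"

# Only now is the implementation written, to make the test pass ("green").
def slugify(title: str) -> str:
    return title.strip().lower().replace(" ", "-")

test_slugify()  # green: the implementation satisfies the pre-written test
```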
<p>How do you conform with existing architecture, coding standards, API design,
constraints, etc?</p>
<ul>
<li>Push instructions to the LLM (e.g. in <code>CLAUDE.md</code>)</li>
<li>Pull: give the agent an opportunity to collect info, e.g. via skills.</li>
</ul>
<p>For the implementer you should use the pull strategy so it can pull what it
needs. For the reviewer use push (these are our standards, make sure they are
adhered to).</p>
<p>You absolutely need to have feedback loops for the AI. The quality of
your feedback loops directly affects the quality of the output.</p>
<p>As you may have noticed, there are two types of work when building something:</p>
<ul>
<li>Human in the loop (HITL) tasks (like planning) which <strong>need</strong> the human</li>
<li>Away from keyboard (AFK) tasks (like implementation) where the AI can work
autonomously</li>
</ul>
<p>To recap the process thus far: you have an idea, you have the grilling session
with the AI and this results in a PRD, which gets turned into issues on a Kanban
board. These are HITL steps. Now the AI can take over and handle implementation
where one or more agents work (the night shift so to speak). Once the AI is done,
the human steps back in the loop for the QA/review step.</p>
<figure><img src="/images/ai_engineer_2026_matt_pocock_2.jpg"
    alt="Smart zone / dumb zone"><figcaption>
      <p>Matt Pocock discussing the phases in the whole process from idea to finished implementation</p>
    </figcaption>
</figure>

<p>And yes, we do need code review. There is no way to avoid this. If we delegate
coding to the agent in small PRs, we have to review more code. Matt doesn&rsquo;t feel
good saying that, but it&rsquo;s his honest answer. The QA step is also where you can
impose your opinions on the agent. Note that in the QA phase we also create more
issues on the Kanban board.</p>
<p>You <em>can</em> and <em>should</em> have an automated review step though. Only QA manually
afterwards. But be careful that the automated review isn&rsquo;t done in the &ldquo;dumb
zone&rdquo;, you want to review in the smart zone.</p>
<p>Frontend in particular is tricky. It needs human eyes. AI is not very good at
that yet. But you <em>can</em> ask it to create a couple of prototypes to trigger
a feedback loop with the agent.</p>
<p>So how does this work in a team? You involve the team in all HITL steps. And
while the idea, research, prototype steps look linear, in the messy world you&rsquo;ll
bounce back and forth between those phases.</p>
<p>What if you have a bad, complicated codebase that even humans do a bad job in?
How do you improve that? If your files are &ldquo;shallow modules&rdquo; (small files with
little functionality), it&rsquo;s hard for the AI to navigate: it has to manually track
through the repo. It&rsquo;s also hard to draw test boundaries and to test the
interaction between modules. Should the tests mock other modules?</p>
<p>Building a codebase that is easy to test is essential, because the feedback loop
is better. &ldquo;Deep modules&rdquo; are better: these are modules with more functionality
and clearer dependencies. So how do you go from a bad codebase to a good one?
How do you group modules? The <code>/improve-codebase-architecture</code> skill
(<a href="https://github.com/mattpocock/skills/tree/main/improve-codebase-architecture">source</a>)
will find places to deepen the modules.</p>
<p>(The concept of shallow and deep modules comes from the book
<a href="https://www.amazon.com/Philosophy-Software-Design-John-Ousterhout/dp/1732102201">A Philosophy of Software Design</a>.)</p>
<p>If you take only one thing away from today: use the <code>/improve-codebase-architecture</code> skill.</p>
<div class="note ">
  <div class="note_header">
    <span class="hidden">:</span>
  </div>
  <div class="note_body">
    Matt was brilliant at switching between telling Claude what to do next,
presenting and answering questions. I&rsquo;ve tried to make the above a bit of a
logical story. What remains are more random notes resulting from questions from
the audience.
  </div>
</div>

<p>You need to have (enough) control over the thing to be able to fix it. The PRD
lists which modules are updated, which also helps you stay in control of the
beast. Because we delegate more, we lose our sense of the codebase. By building
deep modules (big shapes), it&rsquo;s easier to keep their mental models in your head.
You don&rsquo;t need to code review all the details in a module; you only need to make
sure the shape does what it needs to do.</p>
<p>Code is important, so understanding the tools deeply makes you a better developer
and you&rsquo;ll get more out of AI.</p>
<p>When using plan mode, you can instruct Claude in <code>CLAUDE.md</code> to be terse (&ldquo;when talking to
me, sacrifice grammar for the sake of concision&rdquo;). This helps when reading the
plans. But Matt dropped this in favour of the grilling session, after which he and the
LLM had the same shared understanding and he no longer needed to read the
plans.</p>
<p>Does he keep the markdown plans for future reference? No clear answer. Matt is
wary of outdated documentation (names and requirements change). He tends
to get rid of the plans and mark the issues as closed.</p>
<p>What does he think of <a href="https://steve-yegge.medium.com/introducing-beads-a-coding-agent-memory-system-637d7d92514a">Beads</a>
from Steve Yegge? It&rsquo;s another way to manage Kanban boards.</p>
<p>Sidenote: <a href="https://github.com/mattpocock/sandcastle">Sandcastle</a> (also created
by Matt) is an orchestrator. It takes the Ralph loop from sequential to parallel.</p>
<p>Matt is not selling a way of working. He does recommend buying old programming
books (pre AI) since they contain a lot of wisdom that is still applicable.</p>]]></content>
  </entry>
</feed>
