<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
  <title>Posts tagged with “Conference” on Mark van Lent’s weblog</title>
  <updated>2026-04-13T00:00:00+00:00</updated>
  <link rel="self" type="application/atom+xml" href="https://markvanlent.dev/tags/conference/index.xml" hreflang="en"/>
  <id>tag:markvanlent.dev,2010-04-02:/tags/conference/index.xml</id>
  <link rel="alternate" type="text/html" href="https://markvanlent.dev/tags/conference/" hreflang="en"/>
  <author>
      <name>Mark van Lent</name>
      <uri>https://markvanlent.dev/about/</uri>
    </author>
  <rights>Copyright (c) Mark van Lent, Creative Commons Attribution 4.0 International License.</rights>
  <icon>https://markvanlent.dev/favicon.ico</icon>
  <entry>
    <title type="html"><![CDATA[AI Engineer Europe 2026: reflection]]></title>
    <link rel="alternate" href="https://markvanlent.dev/2026/04/13/ai-engineer-europe-2026-reflection/" type="text/html" />
    <id>https://markvanlent.dev/2026/04/13/ai-engineer-europe-2026-reflection/</id>
    <author>
      <name>Mark van Lent</name>
      <uri>https://markvanlent.dev/about/</uri>
    </author>
    <category term="ai" />
    <category term="conference" />
    <category term="opinion" />
    
    <updated>2026-04-13T07:55:45Z</updated>
    <published>2026-04-13T00:00:00Z</published>
    <content type="html"><![CDATA[<p>Now that the dust is settling, it&rsquo;s time to reflect a bit on what I&rsquo;ve heard at
the conference.</p>
<h2 id="my-takeaways">My takeaways</h2>
<p>This was my first AI Engineer conference and also the first AI conference I
attended. It was a nice way to gauge where both I and we as a company stand.
This is especially true when I realise that mostly (only?) companies at the
forefront of these developments were there. AI is hot, but as we
<a href="/2026/04/10/ai-engineer-europe-2026-keynote/session-day-2/#most-enterprise-agentic-projects-are-doomed--heres-why--jess-grogan-avignon-and-jack-wang">have heard</a>,
only 12% of the (big) companies are &ldquo;AI achievers&rdquo;.</p>
<p>What I think the speakers agreed on:</p>
<ul>
<li>The role of the engineer is shifting. Less coding, more writing
specifications, planning and reviewing</li>
<li>Codebases must be designed for humans <strong>and</strong> agents to read and understand</li>
<li>Guardrails are essential. If we want agents to solve problems on their own,
we need to give them a certain amount of freedom, but the guardrails have to
come first</li>
<li>We need to have feedback loops. The quality of the feedback loop determines
the quality of the output</li>
<li>The human stays in the loop (at least for now)</li>
<li>Introducing AI is a process. Start with non-critical, well-scoped tasks and
let the system &ldquo;prove&rdquo; itself and earn trust.</li>
</ul>
<p>There are also things the speakers do not agree on:</p>
<ul>
<li><a href="/2026/04/09/ai-engineer-europe-2026-keynote/session-day-1/#harness-engineering-how-to-build-software-when-humans-steer-and-agents-execute--ryan-lopopolo">Code is free</a> vs <a href="/2026/04/09/ai-engineer-europe-2026-keynote/session-day-1/#it-aint-broke-why-software-fundamentals-matter-more-than-ever--matt-pocock">Code is not cheap</a></li>
<li>Move fast, <a href="/2026/04/09/ai-engineer-europe-2026-keynote/session-day-1/#harness-engineering-how-to-build-software-when-humans-steer-and-agents-execute--ryan-lopopolo">have the agents do the full job</a> and <a href="/2026/04/10/ai-engineer-europe-2026-keynote/session-day-2/#cicd-is-dead-agents-need-continuous-compute-and-computers--hugo-santos-and-madison-faulkner">remove humans from the loop to speed up</a> vs <a href="/2026/04/10/ai-engineer-europe-2026-keynote/session-day-2/#building-pi-in-a-world-of-slop--mario-zechner">slow down</a> and <a href="/2026/04/10/ai-engineer-europe-2026-keynote/session-day-2/#the-friction-is-your-judgment--armin-ronacher-and-cristina-poncela-cubeiro">think</a></li>
</ul>
<p>Other takeaways (that were not explicitly mentioned by multiple speakers):</p>
<ul>
<li>Agents are bad at self-evaluation</li>
<li>Only the first ~100K tokens of context are useful. More context will only make
the agent dumber</li>
<li>Agents are consuming APIs, documentation and websites and might even already
make up a large portion of their users</li>
<li>I should probably use more skills</li>
</ul>
<p>In general, I think we, as an industry, are still figuring out how to work with
AI. This is also directly related to the rapid pace at which things are
changing. Something may work today, but may be obsolete next month.</p>
<h2 id="the-conference">The conference</h2>
<p>Overall, I liked the atmosphere and energy. The organizers managed to get a great
bunch of speakers on stage who delivered insightful content.</p>
<p>I&rsquo;m a bit on the fence about the workshop day. I really liked that the speakers
had more time to go in depth (which is hard in the 20-minute timeslot of the
breakout sessions). On the other hand, I was expecting to do more &ldquo;work&rdquo; in a
workshop. However, most talks I attended were just that: longer talks. And if it
were not for taking notes, I would not have needed to bring my laptop.</p>
<p>Having said that, I had fun, learned a lot and would love to go again next year.</p>
<h2 id="the-trip">The trip</h2>
<p>This wasn&rsquo;t my first trip to London, but it was my first conference there. The
conference was held in the
<a href="https://en.wikipedia.org/wiki/Queen_Elizabeth_II_Centre">Queen Elizabeth II Centre</a>,
which is in the center of the city. And especially since we stayed in a hotel
close to the venue and could walk over there each day, I was very aware of where
we were, which was great. (I like London, can you tell?)</p>
<p>This was also my first time going to a conference with such a big group.
Sixteen people from Schuberg Philis were there! This also meant that for most
(perhaps even all?) of the sessions I joined, I was not the only one from our
company in the room, which meant I could discuss what we&rsquo;d heard and how it
applies to our company and customers. This added a lot of value for me.</p>
<p>It was also nice to get to know my colleagues a bit better, especially the
people I don&rsquo;t work with on a day-to-day basis. And while we talked a lot of
shop there, there was also time and space to bond on a more personal level.</p>
<h2 id="conclusion">Conclusion</h2>
<p>I think I can say that we, as a company, are in a good place with how we are
adopting AI and helping our customers. Having said that: AI engineering is a
field that is still very much developing. So we&rsquo;ll continue to learn and adapt.</p>
<p>As for myself: I&rsquo;ll work, together with my colleagues, on integrating the
takeaways into my everyday work.</p>]]></content>
  </entry>
  <entry>
    <title type="html"><![CDATA[AI Engineer Europe 2026: Keynote/Session Day 2]]></title>
    <link rel="alternate" href="https://markvanlent.dev/2026/04/10/ai-engineer-europe-2026-keynote/session-day-2/" type="text/html" />
    <id>https://markvanlent.dev/2026/04/10/ai-engineer-europe-2026-keynote/session-day-2/</id>
    <author>
      <name>Mark van Lent</name>
      <uri>https://markvanlent.dev/about/</uri>
    </author>
    <category term="ai" />
    <category term="conference" />
    
    <updated>2026-04-11T01:06:51Z</updated>
    <published>2026-04-10T00:00:00Z</published>
    <content type="html"><![CDATA[<p>The third and final day of the AI Engineer Europe 2026 conference. The format
of the day is similar to the previous day: kick the day off with keynotes, then
breakout sessions and close with keynotes at the end again. Today I spent most
of my time in the AI Architects track.</p>
<h2 id="building-pi-in-a-world-of-slop--mario-zechner">Building pi in a World of Slop &mdash; Mario Zechner</h2>
<p><a href="https://github.com/badlogic/pi-mono">Pi</a> is the engine inside <a href="https://openclaw.ai/">OpenClaw</a>.</p>
<p>Mario has several issues with Claude Code. To name a few: the system prompt
changes on every release, zero observability, zero model choice, almost zero
extensibility. He gravitated towards <a href="https://opencode.ai/">OpenCode</a> as a
replacement. But Mario found some design choices he did not like.</p>
<p>So he started developing pi. It contains four packages:</p>
<ul>
<li>pi-ai: a unified LLM API</li>
<li>pi-tui: a terminal UI framework</li>
<li>pi-agent-core: an agent loop for tool execution, validation, event streaming and message queuing</li>
<li>pi-coding-agent: the CLI</li>
</ul>
<p>Pi has four tools:</p>
<ul>
<li>Read</li>
<li>Write</li>
<li>Edit</li>
<li>Bash</li>
</ul>
<p>That&rsquo;s it.</p>
<p>It is in YOLO mode by default (in other words: it can execute commands without
asking for permission). You need to implement your own guardrails that fit
<em>your</em> security needs. You can ask pi to implement those for you if you want.</p>
<p>Pi is extensible:</p>
<ul>
<li>skills</li>
<li>prompt templates</li>
<li>themes</li>
<li>extensions: tools, commands, shortcuts, compaction, etc.</li>
</ul>
<p>All of these hot-reload.</p>
<p>How do you build an extension? You don&rsquo;t. You specify what you need, and have Pi
build it.</p>
<p>But none of this is about pi. It&rsquo;s about taking control of your own tools.</p>
<p><a href="https://en.wikipedia.org/wiki/Clanker">Clankers</a> are ruining OSS. Mario
auto-closes new PRs that look AI generated with the message that the user should
try again. Since Clankers don&rsquo;t read this, he gets rid of the bots this way.
Users that indeed try again are not blocked the second time.</p>
<blockquote>
<p>Agents are merchants of learned complexity</p></blockquote>
<p>Models are trained on bad architecture decisions and cargo cult best practices.
So they are trained on garbage. The model fills in blanks in your specs with garbage from the internet.</p>
<p>Humans work differently since we feel pain and fix things to resolve/prevent the
pain. Agents don&rsquo;t learn the way we learn.</p>
<p>How we <em>should</em> work with agents:</p>
<ul>
<li>Scoped, so that the agent doesn&rsquo;t need to load tons of code</li>
<li>With a closed loop so the agent can evaluate its own work</li>
<li>Nothing mission critical (instead: dashboards, debugging tools, etc)</li>
<li>Work on the boring stuff or things you haven&rsquo;t had time for yourself</li>
<li>Reproduce cases from user issues</li>
</ul>
<p>Most importantly:</p>
<blockquote>
<p>Slow the fuck down</p></blockquote>
<p>Think about what you are building and why. Also learn to say &ldquo;no&rdquo;. It&rsquo;s easy to
add features, but are they the right features to add? Limit generated code to
what you can review. And do review every line if it&rsquo;s critical code.</p>
<h2 id="the-friction-is-your-judgment--armin-ronacher-and-cristina-poncela-cubeiro">The Friction Is Your Judgment &mdash; Armin Ronacher and Cristina Poncela Cubeiro</h2>
<p>Agents initially felt like an unlocked secret: you get more done. Now you <em>have</em>
to use them or you fall behind. At that pace it is not sustainable to keep doing reviews and still have
time to think. It&rsquo;s addictive. We are tricked into thinking we are doing more
work. But we have less time to reflect on whether we are even doing the right thing. It
is hard to know when to stop.</p>
<p>Before agents there was a balance between creating and reviewing code. But the
creation part is now amplified. The parts the engineer still has to do are <em>not</em>
amplified. As a result reviews are skipped or rubber-stamped.</p>
<blockquote>
<p>The moments where you want to skip thinking are exactly the moments where it
matters most.</p></blockquote>
<p>Agents are optimized to write code that runs. They are not as good at making an
overall good design. Agents introduce more code paths and local failures. And a
degrading codebase reinforces itself: the agents will create worse code.</p>
<p>Libraries have clearly defined problems they are solving. And they likely have
a simple core. Products, on the other hand, have more interacting components:
flags, permissions, billing, etc. The components are more intertwined. One of
the problems with that is that the context window cannot hold the full picture.</p>
<p>Your codebase has become infrastructure. So you have to design it in a way the
agent can read it. To have an agent-legible codebase you need to have:</p>
<ul>
<li>Modularization with clear boundaries so agents can work in a single area at a
time</li>
<li>Known patterns and conventions the agent can use for pattern-matching</li>
<li>A simple core (push the complexity to layers above)</li>
<li>No hidden magic. If the agent cannot see it, it cannot take it into account</li>
</ul>
<p>Examples of mechanical enforcements:</p>
<ul>
<li>No bare catch-all. This forces the agent to think about error handling</li>
<li>No raw SQL outside the abstraction layer to preserve the query interface</li>
<li>Use components for the UI</li>
<li>No dynamic imports</li>
<li>Enforce unique function names so it&rsquo;s easier to <code>grep</code> for a name</li>
<li>Use <code>erasableSyntaxOnly</code> TypeScript mode</li>
</ul>
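<p>As an illustration of what such a mechanical enforcement could look like, here
is a minimal sketch of the first rule, written for Python code. It is not from
the talk: the function name <code>find_bare_excepts</code> and the sample input are
assumptions. It only uses the standard library <code>ast</code> module to flag
<code>except:</code> clauses that do not name an exception type.</p>

```python
import ast


def find_bare_excepts(source: str) -> list[int]:
    """Return the line numbers of every bare `except:` clause in `source`."""
    tree = ast.parse(source)
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        # A handler without an exception type is a bare catch-all.
        if isinstance(node, ast.ExceptHandler) and node.type is None
    )


# Hypothetical input: the first handler names a type, the second does not.
SAMPLE = '''
try:
    risky()
except ValueError:
    pass

try:
    risky()
except:
    pass
'''

if __name__ == "__main__":
    for line in find_bare_excepts(SAMPLE):
        print(f"line {line}: bare except is not allowed")
```

<p>A check like this could run as a linter step or a pre-commit hook, so the agent
gets the feedback mechanically instead of from a human reviewer.</p>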
<p>In pull request reviews you should separate the input going back to the agent
from what needs to go to a human to make a judgement call.</p>
<p>We still have to go slow. It becomes harder to understand what is going on
in the codebase. This makes cleanup also harder. It&rsquo;s harder to judge the state
of your codebase.</p>
<p>While we like to remove friction, some friction is useful. It makes it possible
to steer the project. The friction isn&rsquo;t your enemy, it&rsquo;s your judgement.</p>
<h2 id="context-is-the-new-code--patrick-debois">Context Is the New Code &mdash; Patrick Debois</h2>
<p>We are prompting to turn context into code. But we are also transforming code back into context in the form of skills.</p>
<p>Patrick is talking about the context development lifecycle today.</p>
<figure><img src="/images/ai_engineer_2026_patrick_debois_cdlc.jpg"
    alt="The Context Development Lifecycle"><figcaption>
      <p>Patrick Debois explained the Context Development Lifecycle. Image taken from <a href="https://tessl.io/blog/context-development-lifecycle-better-context-for-ai-coding-agents/">his article</a> since I didn&rsquo;t get a picture of it</p>
    </figcaption>
</figure>

<h3 id="generate-create--curate-context">Generate: create &amp; curate context</h3>
<p>Prompting is using humans as a context engine. If you want to get advanced: create
rules/instructions (<code>AGENT.md</code> or <code>CLAUDE.md</code>) to have reusable pieces of
context. You can also bring in context in the form of library documentation or
pull context from other places (context connectors, like MCP). But spec
driven development is also a form of context.</p>
<h3 id="evaluate-test--measure-context-quality">Evaluate: test &amp; measure context quality</h3>
<p>We&rsquo;re not yet writing evals for our code context. We could validate our skills
(more or less linting). Use an LLM as a judge to check whether the generated code matches the
criteria. Compare the outcome with code generated <em>without</em> the context in your
<code>AGENT.md</code> to see if the context has an effect.</p>
<p>Once a judge gets agents and can do stuff, you basically get an end-to-end test.
The tests give feedback on what is working and what is missing. We can generate
actions from there to improve context.</p>
<p>Can we run this in CI/CD? That&rsquo;s hard, because it&rsquo;s not deterministic. Better to
run e.g. 5 or more times to see if it works most of the time.</p>
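<p>A rough sketch of that idea in Python, under a few assumptions of my own: the
names <code>pass_rate</code>, <code>gate</code> and <code>make_flaky_check</code> are hypothetical, the
threshold of 0.8 is arbitrary, and the flaky check is a seeded stand-in for
&ldquo;generate code with the agent and let a judge score it&rdquo;.</p>

```python
import random
from typing import Callable


def pass_rate(check: Callable[[], bool], runs: int = 5) -> float:
    """Run a non-deterministic check `runs` times; return the fraction that passed."""
    results = [check() for _ in range(runs)]
    return sum(results) / runs


def gate(check: Callable[[], bool], runs: int = 5, threshold: float = 0.8) -> bool:
    """Accept the context change only if the check passes most of the time."""
    return pass_rate(check, runs) >= threshold


def make_flaky_check(success_probability: float, seed: int) -> Callable[[], bool]:
    """Stand-in for a real eval: passes with the given probability."""
    rng = random.Random(seed)
    return lambda: success_probability > rng.random()


if __name__ == "__main__":
    check = make_flaky_check(success_probability=0.9, seed=42)
    print("accepted" if gate(check, runs=10) else "rejected")
```

<p>The point of the threshold is that a single red run does not fail the pipeline,
while a context change that only works occasionally still gets rejected.</p>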
<h3 id="distribute">Distribute</h3>
<p>What if you want to reuse context in multiple projects/teams? You want to
package the context in one way or another. Then the question becomes: how do you
discover the skills? Via a skills marketplace. But most skills on there are
crap. (It&rsquo;s still useful to learn from others though.) A skill contains
context, code, etc.</p>
<p>We&rsquo;re also going to have dependencies and thus dependency hell.</p>
<p>We&rsquo;ll need security and scan the context. (Snyk has options for this.) Who built
this skill, with what model? We&rsquo;ll need a skill
<a href="https://en.wikipedia.org/wiki/Software_supply_chain">SBOM</a>.</p>
<h3 id="observe-monitor-and-improve-in-production">Observe: monitor and improve in production</h3>
<p>If you maintain a skill as something someone else can use, how do you get
feedback on how it&rsquo;s working for others? Look at agent logs and agent traces. Any
feedback on a PR that it&rsquo;s not correct is also feedback.</p>
<p>What about running code that&rsquo;s running in production and was created from a
context? You could e.g. use <a href="https://www.hud.io/">Hud</a>.</p>
<p>Agent sandboxing. Is code running in production doing strange things? Having a
context filter is like a WAF and can prevent prompt injections. Use harness
engineering.</p>
<blockquote>
<p>Context is the fuel. Coding agents are the engine.</p></blockquote>
<p>Related articles:</p>
<ul>
<li><a href="https://tessl.io/blog/cicd-for-context-in-agentic-coding-same-pipeline-different-rules/">CI/CD for Context in Agentic Coding: Same Pipeline, Different Rules</a></li>
<li><a href="https://tessl.io/blog/context-development-lifecycle-better-context-for-ai-coding-agents/">The Context Development Lifecycle: Optimizing Context for AI Coding Agents</a></li>
<li><a href="https://openai.com/index/harness-engineering/">Harness engineering: leveraging Codex in an agent-first world</a></li>
</ul>
<p>(Full disclosure: Patrick works for Tessl.)</p>
<h2 id="most-enterprise-agentic-projects-are-doomed--heres-why--jess-grogan-avignon-and-jack-wang">Most Enterprise Agentic Projects Are Doomed — Here&rsquo;s Why &mdash; Jess Grogan-Avignon and Jack Wang</h2>
<p>The presenters work in the world of large enterprises. A bad deployment can take down
critical infrastructure. Control, process, repeatability and governance structures
are designed for human speed, not machine speed.</p>
<p>Only 12% of the companies are &ldquo;AI Achievers&rdquo;. The other 88% of them remain stuck
and are falling behind.</p>
<p>Things that have made companies successful are now holding them back in the time
of AI.</p>
<figure><img src="/images/ai_engineer_2026_jess_grogan-avignon_and_jack_wang.jpg"
    alt="Enterprise behaviours"><figcaption>
      <p>Jess Grogan-Avignon and Jack Wang about enterprise tensions that could hold agentic projects back</p>
    </figcaption>
</figure>

<dl>
<dt>Speed</dt>
<dd>Enterprise speed is running at human speed (with practices such as security
reviews, deployment process, etc). Corporations have not invested like tech
companies have. An example: an application took 2 weeks to build, but 12
months to get it into production.
<p>Approval infra hasn&rsquo;t kept up with the speed at which the supply of code is
growing. To go faster, every human step in the process should become
executable, adaptable code.</p>
</dd>
<dt>Value</dt>
<dd>Needing a business case is not wrong per se. However, business cases assume that scope,
solution, expected value and cost to deliver are known beforehand. But when
the cost to prototype drops massively, you can more easily test things that
were economically impossible before.
<p>Enterprises need to think like a VC: take a bit of risk, not everything will
turn into profit. Try out things.</p>
</dd>
<dt>Delivery</dt>
<dd>Treating a scientific process like software feature delivery. Utopian design
upfront with guaranteed performance and status updates. Instead of building
the thing.
<p>IT is not the problem. The team needs to upskill product managers,
architects, etc around building confidence.</p>
</dd>
<dt>Trust</dt>
<dd>Completed features are not the most valuable thing you ship. There is a large
trust gap. The trust you build when using an AI is the most valuable. Trust in
content quality, accuracy, security, reliability.
<p>Agent autonomy is gated by evidence in outcomes: you need to earn autonomy.
Use what the user is saying to iterate. Shadow mode -&gt; advisory mode -&gt;
controlled autonomy -&gt; expanded autonomy. Engineer for trust, not completion.</p>
</dd>
<dt>Moat</dt>
<dd>What is unique for you? When your customer touches your product. Deployment is
the starting line, not the finish line. How fast can you iterate? Continuous,
compounding feedback loop.</dd>
</dl>
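<p>The four stages mentioned under &ldquo;Trust&rdquo; can be read as a ladder that an agent
climbs one step at a time. Below is a minimal sketch of that reading in Python.
The stage names come from the talk; the function <code>next_level</code> and the
thresholds (at least 50 observed runs, a 95% success rate) are assumptions of
mine, purely for illustration.</p>

```python
from enum import IntEnum


class Autonomy(IntEnum):
    SHADOW = 0      # agent runs alongside humans, output is not used
    ADVISORY = 1    # agent suggests, human decides
    CONTROLLED = 2  # agent acts within tight limits
    EXPANDED = 3    # agent acts with broader limits


def next_level(
    current: Autonomy,
    successes: int,
    total: int,
    min_runs: int = 50,              # hypothetical evidence threshold
    min_success_rate: float = 0.95,  # hypothetical quality threshold
) -> Autonomy:
    """Promote one step only when the observed outcomes justify it."""
    if current == Autonomy.EXPANDED:
        return current  # already at the top of the ladder
    if min_runs > total:
        return current  # not enough evidence yet
    if successes / total >= min_success_rate:
        return Autonomy(current + 1)
    return current


if __name__ == "__main__":
    print(next_level(Autonomy.SHADOW, successes=49, total=50).name)
```

<p>Promotion is deliberately limited to one step per evaluation, so autonomy is
earned gradually rather than granted in one go.</p>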
<p>Prescription to succeed:</p>
<ul>
<li>Start now: deliver differently and measure in confidence</li>
<li>Make finance a transformation partner, not a gatekeeper</li>
<li>Make governance speed an engineering problem</li>
<li>Redefine your moat as what you compound from today</li>
</ul>
<h2 id="the-domain-native-ai-organization-how-to-leverage-domain-expertise--chris-lovejoy">The Domain-Native AI Organization: How to Leverage Domain Expertise &mdash; Chris Lovejoy</h2>
<p>We often do not have deep understanding about the processes we are automating.
Use domain experts as oracle, evaluator or architect.</p>
<ul>
<li>Oracle: directly adds expertise</li>
<li>Evaluator: define and measure quality</li>
<li>Architect: build self-improving systems</li>
</ul>
<p>Most common mistakes:</p>
<ul>
<li>Not hiring domain experts (or too late)</li>
<li>Wrong kind of domain expert</li>
<li>Not fitting them in the organization properly</li>
</ul>
<p>Do you need domain experts? Yes! Appraising AI quality requires judgement. And
judgement requires domain expertise.</p>
<h2 id="cicd-is-dead-agents-need-continuous-compute-and-computers--hugo-santos-and-madison-faulkner">CI/CD Is Dead, Agents Need Continuous Compute and Computers &mdash; Hugo Santos and Madison Faulkner</h2>
<p>Agentic software is breaking traditional CI/CD. Why is CI/CD dead? At agent
scale you have more PRs and more repos. But it still takes the same time to
review and verify. Merging all different versions together becomes impossible.</p>
<p>Machine latency in the CI/CD pipeline was hidden behind the &lsquo;slow&rsquo; humans. With
agents the pain points become clear.</p>
<p>Today we have the human in the loop: we validate the PR. So each time a PR is
rejected the agentic software can update the PR fairly quickly only to be
blocked again by the human. But since this means there are only so many changes
going on, you have a relatively big window to get your stuff merged.</p>
<p>The PR as a unit of work was designed for humans. CI matters because it
validates your work. It also facilitates coordination.</p>
<p>Hugo explains a loop we have today: intent + plan (what are you doing) goes into
an agent harness loop. The agent checks out your code. The change then goes through internal
validation to make sure it is correct. The agent then reports back to the human for
external validation (does it look good?). When done, the change goes to the merge queue. This is fast,
but not fast enough because there is a human in the loop.</p>
<figure><img src="/images/ai_engineer_2026_hugo_santos_and_madison_faulkner.jpg"
    alt="CI/CD human out of the loop"><figcaption>
      <p>Hugo Santos showing the human out of the loop setup of tomorrow</p>
    </figcaption>
</figure>

<p>Tomorrow the human will be out of the loop. We&rsquo;ll have all kinds of LLMs (e.g.
one with a security focus, another one with a different focus, etc) to do the
external validation. The human only needs to approve a change once it&rsquo;s in the
pre merge queue. So the human is still gatekeeping, but later in the process.</p>
<h2 id="software-engineering-is-becoming-plan-and-review--louis-knight-webb">Software Engineering Is Becoming Plan and Review &mdash; Louis Knight-Webb</h2>
<p>What are we even going to do in the new AI era?</p>
<p>Work humans do:</p>
<ul>
<li>Plan</li>
<li>Write code</li>
<li>Review your own code</li>
<li>Review other people&rsquo;s code</li>
</ul>
<p>If we look at 2021, most time was spent writing code and reviewing other people&rsquo;s code. If we look at
2025, there is little actual coding; most time is spent reviewing other people&rsquo;s code.</p>
<p><img src="/images/ai_engineer_2026_louis_knight-webb.jpg" alt="Louis Knight-Webb about how he spent his time"></p>
<p>So work got displaced. The time we no longer spend on coding (because the AI is
doing that now) mostly changed into planning and reviewing code.</p>
<p>There are basically two modes to work in:</p>
<ul>
<li>Spend a lot of time planning (create a comprehensive plan doc, interrogation, etc). You
spend more time planning, but the coding agent can run longer and there&rsquo;s less
time needed for review.</li>
<li>Spend a lot of time reviewing. Loosely define prompts, at the cost of more
manual QA work. If you spend less time planning, the coding agent yields
results faster, but there&rsquo;s more back and forth with the agent delivering
half-baked work.</li>
</ul>
<p>The mode you&rsquo;ll want to use depends on the type of work. Plan-heavy mode works
well for refactoring/migration. For new features, heavy planning works for
backend tasks, but for the frontend the review-heavy mode is a better match.</p>
<p>Spending 5 minutes on planning saves you 30 minutes of reviewing code.</p>
<p>The time an agent can run before human intervention is needed has increased over
time. This is good: you want to minimize the time you are spending with the AI.
Most of the time the back and forth is done by the agent itself.</p>
<p>When the agent takes longer than 5 minutes, waiting on the AI (and slacking off)
is not realistic. So when agents run longer and longer, you need to change your
way of working. One option is parallelism (have multiple agents working at the
same time). <a href="https://www.vibekanban.com/">Vibe Kanban</a> is a tool to help you with
that approach.</p>
<p>What should the future look like to help the human?</p>
<ul>
<li>Focusmaxxing: embrace the fact that you cannot context switch every 30
seconds, instead build to get the most out of the humans</li>
<li>Write tasks</li>
<li>QA (websites, APIs)</li>
<li>Code review</li>
<li>Shepherd the change until it&rsquo;s deployed</li>
</ul>
<p><em>Then the talk took another turn.</em></p>
<p>Recently Louis decided to shut down the company behind Vibe Kanban. On stage he
instructed his agent to write the <a href="https://vibekanban.com/blog/shutdown">Goodbye bloop</a>
blog post (with a prompt he had prepared beforehand). Don&rsquo;t worry though, the
project will continue as open source and be maintained by the community.</p>
<h2 id="how-building-with-ai-can-double-the-throughput-of-your-engineering-team--brian-scanlan">How Building with AI Can Double the Throughput of Your Engineering Team &mdash; Brian Scanlan</h2>
<p>Brian works for Intercom. They are the company behind <a href="https://fin.ai/">Fin</a>.</p>
<p>Change is hard. You need clear executive guidance. How to enable change:</p>
<ul>
<li>Update job descriptions and expectations</li>
<li>Constantly talk about the urgency of AI adoption</li>
<li>Reward great work (financially, socially and publicly)</li>
<li>Give people room to learn, enable them and give them access to tools</li>
<li>Be very specific about what you want to see and how it is to be done</li>
</ul>
<p>Standardizing on a <strong>single</strong>, skill driven AI platform helps. Prove that it
works, optimize its usage. Connect it to everything. Anything the human can do
on their laptop, the agent should also be able to do. (You&rsquo;ll need to be in
control of your environment to make sure it doesn&rsquo;t do anything bad though!)
Start using the platform for <strong>all</strong> technical work. It will make mistakes
initially, but it becomes a flywheel: the platform gets more powerful
over time.</p>
<figure><img src="/images/ai_engineer_2026_brian_scanlan.jpg"
    alt="Replace technical work with Claude"><figcaption>
      <p>Brian Scanlan illustrating technical work being replaced by AI</p>
    </figcaption>
</figure>

<p>Engineering is changing. The engineers focus their time on writing specs,
validation and improving the agents. The agents write, test and review code.</p>
<p>Internal tools at his company are deprecated in favour of first-class vendor
replacements (like Anthropic/Claude Code).</p>
<blockquote>
<p>Give agents problems, not tasks</p></blockquote>
<p>Agents should figure out the necessary tasks on their own. They focus on
durable, high quality, sharable skills.</p>
<p>Current bottleneck: code review.</p>
<p>Intercom has extensive feedback loops via lots of hooks to <a href="https://www.honeycomb.io/">Honeycomb</a>.
They measure things like skill invocations, failures, etc.</p>
<p>A side-effect of using AI more is that their defect rate is going down. This
wasn&rsquo;t a goal, but a natural consequence.</p>
<p>Relevant link:</p>
<ul>
<li><a href="https://brian.scanlan.ie/">Brian&rsquo;s website</a></li>
</ul>
<h2 id="agents-dont-do-standups-building-the-post-engineer-engineering-org--mike-spitz">Agents Don&rsquo;t Do Standups: Building the Post-Engineer Engineering Org &mdash; Mike Spitz</h2>
<p>From &ldquo;how do we help engineers output more?&rdquo; to &ldquo;how do we make agents faster?&rdquo;</p>
<blockquote>
<p>Scrum did not survive</p></blockquote>
<p>Rituals designed for humans don&rsquo;t work for agents. Ceremonies became huddle
sessions.</p>
<p>Specs are turned into LDDs (lightweight design documents) which become tickets and PRs.</p>
<table>
  <thead>
      <tr>
          <th>Humans</th>
          <th>Agents</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Sprint planning</td>
          <td>Don&rsquo;t need 1hr estimation sessions</td>
      </tr>
      <tr>
          <td>Daily Standup</td>
          <td>Update tickets automatically</td>
      </tr>
      <tr>
          <td>Sprint refinement</td>
          <td>Generate tickets via LDDs &amp; flag issues, make sure tickets don&rsquo;t depend on each other</td>
      </tr>
      <tr>
          <td>Retro</td>
          <td>Metrics replace anecdotes</td>
      </tr>
  </tbody>
</table>
<p>How do you start?</p>
<ol>
<li>Pick the engineers with development and broad system knowledge</li>
<li>Scale slowly!</li>
<li>Experiment in non-critical systems</li>
</ol>
<p>Also keep in mind that:</p>
<blockquote>
<p>Not everyone can drive a sports car</p></blockquote>
<p>It&rsquo;s going to be hard for a few engineers. The curious engineer will smash this.</p>
<p>Some guardrails:</p>
<ol>
<li>Verifiable deterministic tasks (unit tests, e2e tests, linters, PR
prerequisites)</li>
<li>Agentic code review (human steering, agentic review; opinionated comments are
easy to offload)</li>
<li>Tiered human in the loop (heavy human review at system design, light review
at code (except security), heavy review at end for product feel)</li>
</ol>
<p>Prerequisites for autonomous loop:</p>
<ol>
<li>Composable skills (all parts of development are abstracted into composable
skills)</li>
<li>Agent-involved stages (agent involved at <em>every</em> stage: spec, LDD,
ticket/branch/PR creation, self-testing, self-QA)</li>
<li>Self-healing agents</li>
<li>Human multipliers (allow humans to parallelize)</li>
</ol>
<p>What do the humans do?</p>
<ol>
<li>Security (ensure no shortcuts were taken by the AI)</li>
<li>Product feel</li>
<li>Scale &amp; engineering complexity for task (Are we spending tokens on work we don&rsquo;t need to do? Is the agent over-engineering?)</li>
</ol>
<p>The playbook to get started:</p>
<ol>
<li>Start with boring, repetitive tasks</li>
<li>Remove as much redundancy from the process as possible</li>
<li>Make sure the good patterns are turned into skills</li>
<li>Build guardrails before autonomy</li>
<li>Build this with your best engineers</li>
<li><strong>Do not</strong> onboard everyone all at once</li>
<li><strong>Do not</strong> try to create a &ldquo;one size fits all&rdquo; approach</li>
<li><strong>Do not</strong> be conservative, otherwise you&rsquo;ll fall behind (compounding effect)</li>
<li><strong>Do not</strong> try to do too much at once, you want a phased approach</li>
</ol>]]></content>
  </entry>
  <entry>
    <title type="html"><![CDATA[AI Engineer Europe 2026: Keynote/Session Day 1]]></title>
    <link rel="alternate" href="https://markvanlent.dev/2026/04/09/ai-engineer-europe-2026-keynote/session-day-1/" type="text/html" />
    <id>https://markvanlent.dev/2026/04/09/ai-engineer-europe-2026-keynote/session-day-1/</id>
    <author>
      <name>Mark van Lent</name>
      <uri>https://markvanlent.dev/about/</uri>
    </author>
    <category term="ai" />
    <category term="conference" />
    
    <updated>2026-04-10T20:58:45Z</updated>
    <published>2026-04-09T00:00:00Z</published>
    <content type="html"><![CDATA[<p>Day two of the AI Engineer Europe conference.
<a href="/2026/04/08/ai-engineer-europe-2026-workshop-day/">Yesterday</a> was filled with
longer talks where the speakers could go more in-depth. Today consists of
shorter (usually 20-minute) slots of talks and keynotes.</p>
<p><img src="/images/ai_engineer_2026_stage.jpg" alt="AI Engineer Europe 2026 stage"></p>
<h2 id="the-new-application-layer--malte-ubl">The New Application Layer &mdash; Malte Ubl</h2>
<p>AI is changing how we build and what we build. Is there a place for the software
engineer in the future?</p>
<p>Agents are a new kind of software. They are both the builders and users of
software. Lots of software used to be too expensive to write (compared to the
benefits it would bring), but with AI this changes: agents allow us to automate
things that were economically unviable in the past. Companies are also making
different decisions about buying from SaaS companies versus building it
themselves. The cheaper software is to make, the more software there will be.
This leads to more work for software engineers.</p>
<p>As AI Engineers, our job is to build the next application layer: agents.</p>
<p>Archetypes of practical non-coding agents:</p>
<ul>
<li>Always running (24/7)</li>
<li>Compresses the research (in the chain of business event -&gt; research -&gt; human
decision, we can have the AI do the research)</li>
<li>Surfaces the hidden information (lots of info, but you cannot practically use it)</li>
<li>The boring things (more time for the human for the interesting tasks)</li>
</ul>
<p>We also need to realise that software is for agents now. For example, 60% of the
visitors of the vercel.com page are agents.</p>
<p>Not writing the code yourself also makes us less opinionated about the infra. It
&ldquo;just has to run&rdquo;.</p>
<p>We should also have agentic security infrastructure. We need to have an open
mindset for how to change things.</p>
<p>The new application layer can thrive independently of models. Model A may be the
&ldquo;best&rdquo; today, but tomorrow it&rsquo;s model B. We as engineers should provide a stable
interface.</p>
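<p>A minimal sketch of what such a stable interface could look like. This is my own
illustration, not code from the talk: <code>ChatModel</code>, <code>ModelA</code>, <code>ModelB</code> and
<code>summarise</code> are hypothetical names, and the two model classes are stand-ins
for real provider calls.</p>

```python
from typing import Protocol


class ChatModel(Protocol):
    """Hypothetical stable interface that the application codes against."""

    def complete(self, prompt: str) -> str: ...


class ModelA:
    def complete(self, prompt: str) -> str:
        # Stand-in for a call to provider A; no real API is used here.
        return f"[model-a] {prompt}"


class ModelB:
    def complete(self, prompt: str) -> str:
        # Stand-in for a call to provider B.
        return f"[model-b] {prompt}"


def summarise(model: ChatModel, text: str) -> str:
    """Application logic that only depends on the ChatModel interface."""
    return model.complete(f"Summarise: {text}")
```

<p>Swapping <code>ModelA()</code> for <code>ModelB()</code> changes the output, but
<code>summarise</code> itself does not have to change.</p>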
<p>Europe is the leader in AI Engineering. Not models though. But the model
companies are commoditising. The application layer is where the real innovation
happens.</p>
<h2 id="harness-engineering-how-to-build-software-when-humans-steer-and-agents-execute--ryan-lopopolo">Harness Engineering: How to Build Software When Humans Steer and Agents Execute &mdash; Ryan Lopopolo</h2>
<p>Use the models to do the full job. Lean into the idea that the models are
software engineers.</p>
<blockquote>
<p>Code is free</p></blockquote>
<p>Hiring the &ldquo;hands on the keyboard&rdquo; is constrained by token budgets nowadays. We
need to constructively use this capacity. The human skill sets that are now
needed are delegation and systems design.</p>
<p>The models are good enough. Code is free (to produce, maintain and refactor).
Your role is to unblock your team and your team is infinitely large. Your job is
to make use of that team.</p>
<p>Human time and attention are scarce. How do we effectively use them? When we are no
longer blocked by this, we <em>can</em> work on the low(er) priority issues. We have
infinite coding resources, right? Humans don&rsquo;t need to concern themselves with
implementation, but with specs and guardrails. Our job is to build systems and
structures to make our teams effective.</p>
<p>What does it mean to do a good job? It used to be years of experience in the field:
lots of little decisions in everyday work. But agents have seen more code than
we have. We need to write down the non-functional requirements so that the
agents can use them. Figure out what the agents are struggling with. Put
guardrails in place to guide the agents. Move on to higher level tasks.</p>
<p>Having a single QA expert in the team who can write a good QA plan can benefit
the whole team. Same goes for other experts: one engineer has more impact on the
whole team than before.</p>
<p>How can we help the agents to make the correct decisions?</p>
<ul>
<li>Continuously run review agents</li>
<li>Check if the code has a secure interface</li>
<li>&ldquo;Lint&rdquo; for non-functional requirements</li>
<li>Figure out why we are spending time on correcting agents and fix the problem so
you can move on</li>
<li>Adapt your codebase to match the world today, e.g. limit file size so they fit
in the context window</li>
<li>Have good error messages with resolution steps. This helps the agent resolve
issues itself</li>
</ul>
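<p>Two of the points above (limit file size, and write error messages that
include resolution steps) can be combined in a small check. This is a sketch
under my own assumptions: the limit of 500 lines and the names
<code>size_error</code> and <code>check_tree</code> are made up for illustration.</p>

```python
from pathlib import Path
from typing import Optional

MAX_LINES = 500  # assumed limit; tune it to the context window you target


def size_error(name: str, source: str, max_lines: int = MAX_LINES) -> Optional[str]:
    """Return an actionable error message if the source is too long."""
    line_count = len(source.splitlines())
    if line_count > max_lines:
        return (
            f"{name}: {line_count} lines exceeds the limit of {max_lines}. "
            "Resolution: split the file into smaller modules, "
            "then run this check again."
        )
    return None


def check_tree(root: Path, max_lines: int = MAX_LINES) -> list[str]:
    """Collect the error messages for all Python files below root."""
    errors = []
    for path in sorted(root.rglob("*.py")):
        error = size_error(str(path), path.read_text(encoding="utf-8"), max_lines)
        if error is not None:
            errors.append(error)
    return errors
```

<p>Because the message tells the agent what to do next, it doubles as a prompt.</p>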
<p>Everything is a prompt: rules, skills, error messages, PR comments, et cetera.</p>
<p>Just build things. Do not hesitate to have the agents do the full job.</p>
<h2 id="why-building-eval-platforms-is-hard--phil-hetzel">Why building eval platforms is hard &mdash; Phil Hetzel</h2>
<p>Evals and observability are related. Evals are what you do before hitting
production, observability is what you do when in production.</p>
<p>Why are evals important?</p>
<ul>
<li>LLMs have extreme variability (we love them for it though)</li>
<li>Agents are becoming the norm, people have come to expect them when
interacting with your company</li>
<li>You need to be confident in the agent&rsquo;s performance</li>
</ul>
<p>Evals are not a hard problem: gather a bunch of example inputs, loop through
them with the agent and publish the result. Right? Actually it&rsquo;s way more
complex. Think about tracing, alerting, online scoring, topic modeling, etc.</p>
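<p>The naive version (example inputs, a loop and a report) fits in a few lines.
A sketch of my own, assuming exact-match scoring; <code>run_evals</code> and
<code>toy_agent</code> are hypothetical names and the agent is a stand-in.</p>

```python
from typing import Callable


def run_evals(agent: Callable[[str], str], cases: list[tuple[str, str]]) -> dict:
    """Loop over (input, expected) pairs and report the pass rate."""
    rows = []
    for prompt, expected in cases:
        output = agent(prompt)
        # Exact match is an assumption; real evals often need fuzzier scoring.
        rows.append({"input": prompt, "output": output, "passed": output == expected})
    passed = sum(row["passed"] for row in rows)
    return {"pass_rate": passed / len(rows) if rows else 0.0, "rows": rows}


def toy_agent(prompt: str) -> str:
    # Stand-in for a real agent call; it just upper-cases its input.
    return prompt.upper()
```

<p>Everything this sketch lacks (tracing, alerting, online scoring, comparing
runs) is the part of the iceberg that sits under the surface.</p>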
<figure><img src="/images/ai_engineer_2026_phil_hetzel_1.jpg"
    alt="The iceberg of evals"><figcaption>
      <p>Phil Hetzel speaking about the eval iceberg hidden under the surface</p>
    </figcaption>
</figure>

<p>Different stages of doing evals:</p>
<ol>
<li>A spreadsheet plus a <code>for</code> loop. It&rsquo;s a great place to start, no barrier to
entry. However, it&rsquo;s more about documenting and not really experimenting, and it is
hard to compare experiments. It&rsquo;s also a cumbersome process.</li>
<li>Vibe coded UI. Nice next step, probably has proper persistence (database).
It&rsquo;s a bit more bespoke to bring others into the fold. But it&rsquo;s still more of
a reporting tool.</li>
<li>Encouraging experimentation. Give a user access to an agent configuration
plus a sandbox. Allow the users to tweak prompts and parameters.
(He demonstrated this with Braintrust.) Still no access to production traces.</li>
<li>&ldquo;The Flywheel&rdquo;. This is where observability and evals are connecting.
Understanding actual user behaviour. This unlocks the feedback loop.
Downside: you now have to manage the eval platform, at the pace the industry
is moving. More importantly: agent traces are nasty (and not like normal
application traces), very large and numerous.</li>
</ol>
<figure><img src="/images/ai_engineer_2026_phil_hetzel_2.jpg"
    alt="The flywheel"><figcaption>
      <p>Phil Hetzel speaking about the flywheel: observe -&gt; analyze -&gt; evaluate -&gt; improve -&gt; etc</p>
    </figcaption>
</figure>

<p>This is a new problem because of the specifics of traces: larger spans, highly
unstructured, difference in read patterns. And while these aspects individually
are not unique problems, the combination of them <em>is</em> new.</p>
<p>Building the right system: specific for agent traces with multipersona
workflows. This allows you to measure agent quality at scale, delivering near
real time feedback.</p>
<p>Looking forward:</p>
<ul>
<li>Surface unknown-unknowns via topic modelling techniques</li>
<li>Build the platform for users <strong>and</strong> agents</li>
<li>Perform observability via an AI proxy or gateway</li>
</ul>
<h2 id="what-breaks-when-you-build-ai-under-sovereignty-constraints--bilge-yücel">What Breaks When You Build AI Under Sovereignty Constraints &mdash; Bilge Yücel</h2>
<p>Sovereign AI is &ldquo;the ability of an organisation to design, deploy and operate AI
systems on its own terms.&rdquo; In a practical sense: this means having explicit
control over data flow, model choices, infrastructure, observability and
operations.</p>
<p>The pillars of sovereignty:</p>
<dl>
<dt>Data sovereignty</dt>
<dd>Data should be stored in &ldquo;trusted jurisdictions&rdquo;. If you send your data to a
model running in the US, you may already not be compliant anymore.</dd>
<dt>Infrastructure sovereignty</dt>
<dd>Maximal control: airgapped (EU AI Act safe). Maximal convenience: SaaS (CLOUD
Act risk).</dd>
<dt>Model sovereignty</dt>
<dd>You do not want to tightly couple with a specific model provider. You want to
be able to swap without architectural changes.</dd>
<dt>Operational sovereignty</dt>
<dd>Monitor how AI systems behave, have the human in the loop, manage versioning
and updates to models and applications.</dd>
</dl>
<p>Sovereignty is a spectrum. Some sectors need more sovereignty (finance,
healthcare); some need less (e.g. startups).</p>
<p>What are the engineering challenges? What do you do and what do you break?</p>
<ul>
<li>Replace the frontier API with a self-hosted model. Consequence: you have to
translate the API logic.</li>
<li>Private data to required jurisdiction. Consequence: managing multiple
databases and instances.</li>
<li>Replace managed infra with on-prem. Consequence: you discover vendor lock-in
and are limited by your hardware.</li>
<li>Incorporate observability and tracing. Consequence: you have to do version
control of the whole system.</li>
</ul>
<p>Haystack solves some of these problems. It has:</p>
<ul>
<li>A consistent interface</li>
<li>Explicit data flow (know what data was where)</li>
<li>Serializable to YAML so easy to version</li>
<li>No black box or hidden assumptions because it is open source</li>
</ul>
<p>(Full disclosure: Haystack is a product from Deepset, the company Bilge works
for.)</p>
<p>Architecture:</p>
<ul>
<li>Guardrails (check input)</li>
<li>Agent (with LLM)</li>
<li>Guardrails (prevent leaking info)</li>
</ul>
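<p>The three layers above can be sketched as plain function composition. This is
not Haystack code and not what Bilge showed; it is my own minimal illustration,
where the blocked phrase, the email pattern and all function names are
assumptions.</p>

```python
import re
from typing import Callable

# Assumed example rules; a real deployment would use proper moderation models.
BLOCKED_PHRASES = ("ignore previous instructions",)
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def input_guardrail(prompt: str) -> str:
    """Reject prompts that contain a blocked phrase."""
    lowered = prompt.lower()
    for phrase in BLOCKED_PHRASES:
        if phrase in lowered:
            raise ValueError(f"prompt rejected by input guardrail: {phrase!r}")
    return prompt


def output_guardrail(answer: str) -> str:
    """Prevent leaking email addresses by redacting them."""
    return EMAIL.sub("[redacted]", answer)


def guarded_call(agent: Callable[[str], str], prompt: str) -> str:
    """Input guardrail, then the agent, then the output guardrail."""
    return output_guardrail(agent(input_guardrail(prompt)))
```

<p>Since <code>agent</code> is just a callable here, the model behind it can be swapped
without touching the guardrails.</p>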
<figure><img src="/images/ai_engineer_2026_bilge_yucel.jpg"
    alt="Sovereign architecture"><figcaption>
      <p>Bilge Yücel showing a sovereign architecture</p>
    </figcaption>
</figure>

<p>Sovereignty checklist:</p>
<ul>
<li>Can you swap your models without changing the application logic?</li>
<li>Do you have reproducible run logs stored in a compliant way?</li>
<li>Can your team respond to an incident without needing the help of the
hyperscalers?</li>
</ul>
<p>Related blog post: <a href="https://haystack.deepset.ai/cookbook/safety_moderation_open_lms">AI Guardrails: Content Moderation and Safety with Open Language Models</a></p>
<h2 id="software-engineering--ai----gergely-orosz-and-swyx">Software Engineering + AI = ? &mdash; Gergely Orosz and swyx</h2>
<p>Gergely is the man behind <a href="https://www.pragmaticengineer.com/">The Pragmatic Engineer</a>.</p>
<p><strong>What does Gergely think about
<a href="https://www.newsnationnow.com/business/tech/tokenmaxxing-ai-status-game/">tokenmaxxing</a>?</strong>
It&rsquo;s happening at multiple large companies. Token output is measured and e.g.
shown on a leaderboard. This results in uncertainty: people start to think
performance is also measured by token usage. Token count can be weaponized. The
result: people start burning tokens by, for example, asking the AI instead of
reading the docs just to get the token count up, even if the AI doesn&rsquo;t do a
good job of answering the question. Some companies even have a minimum token
count. It&rsquo;s a weird time to live in.</p>
<p><strong>Is AI still making us faster?</strong> Experienced engineers were holding off,
especially with the older models on existing codebases. Targeting and measuring token
usage came from the C-suite. They had the idea that if their company was not
using AI tools, it was probably not doing well. There&rsquo;s a push to use AI.</p>
<p>Why are engineers putting up with that? For the same reason as we have weird
leetcode interviews: they select for smart people willing to put up with a
bullshit process to get the job. Big tech is a bit weird.</p>
<p>Individually, AI makes us go faster. For teams? Not always. It is hard to
retrofit in some situations. An engineer is perhaps not much more productive.
But giving non-coding colleagues access to AI means they no longer have to wait
for dev and can get things done themselves.</p>
<p>These times are hard to understand for us engineers:</p>
<ul>
<li>It takes a long time to get good at using AI</li>
<li>Knowing the theory behind it and how it works will not make you more productive</li>
</ul>
<p><strong>How is the role of a software engineer changing?</strong> Before AI the role was
already changing. AI just sped it up. Startups were already moving fast with
smaller teams. More roles collapse into the role of a software engineer (think
of DevOps, QA). Now also product management. Software engineers need to know
about the business, and take more responsibility.</p>
<p>&ldquo;Everyone is an engineering manager now&rdquo; (no longer a software engineer). This
is absolute nonsense. Traditional managers take care of people problems, their
growth path, etc. But with AI you don&rsquo;t have to worry about a person. It&rsquo;s more
a tech lead role. You can do more and faster, but are still in control. It&rsquo;s only
orchestration, not management. With agents you have a faster feedback loop than
with humans.</p>
<p>E.g. Uber is not building more features into their core product. Instead they
are working on infra and people are building more. It&rsquo;s a low risk way to be
hands on with AI and learn. But also: the codebase is too big for the context
window, so building separate things is easier. Plus: anything related to AI gets
funded faster.</p>
<p>Taking AI infra seriously is not well understood. We&rsquo;re learning from each
other.</p>
<h2 id="keynote--kitze">Keynote &mdash; Kitze</h2>
<p>This talk was fast paced and very enjoyable. I can recommend watching it.
(See the <a href="https://www.youtube.com/live/O_IMsEg91g8?si=Ug5UK3Hh5ELZKz37&amp;t=29326">AIE Europe Day 1: Keynotes &hellip;</a> starting at 8:08:46)</p>
<p><img src="/images/ai_engineer_2026_kitze.jpg" alt="Photo of Kitze speaking"></p>
<h2 id="it-aint-broke-why-software-fundamentals-matter-more-than-ever--matt-pocock">It Ain&rsquo;t Broke: Why Software Fundamentals Matter More Than Ever &mdash; Matt Pocock</h2>
<p>With the new paradigm (AI), we should probably chuck out the old rules to make
room for new ones, right? There&rsquo;s a &ldquo;specs to code&rdquo; movement where you
write the spec and have the AI do the coding. You don&rsquo;t look at the code. If
there is a problem with the code, you go back to the spec and try again. This is
driven by the idea that code is cheap.</p>
<p>But ignoring the code doesn&rsquo;t work. A bad codebase is one that is hard to
change. <a href="https://pragprog.com/titles/tpp20/the-pragmatic-programmer-20th-anniversary-edition/">The Pragmatic
Programmer</a>
has a chapter on software entropy. It basically means that if you do not pay
enough attention to the design of the whole system, the codebase will become
worse and worse. And this was exactly what he was seeing when he was running the
compiler again and again when using specs to code.</p>
<p>Bad code is the most expensive it has ever been. Good codebases matter more than
ever.</p>
<blockquote>
<p>Software fundamentals matter more than ever</p></blockquote>
<p>Common failure modes and how to avoid them by going back to old software
practices:</p>
<dl>
<dt>The AI didn&rsquo;t do what I wanted</dt>
<dd>No-one knows exactly what they want. There is a communication barrier between
you and the AI. <a href="https://www.amazon.com/Design-Essays-Computer-Scientist/dp/0201362988">The Design Of Design</a>
speaks about &ldquo;the design concept&rdquo;: the idea of what you are building. You and
the AI don&rsquo;t share a design concept. Hence the <code>/grill-me</code> skill. Works
towards a shared understanding. Use his other skills to generate a Product
Requirements Document or issues. (See <a href="https://github.com/mattpocock/skills">Matt&rsquo;s skills repository</a>
for the skills mentioned in this talk.)</dd>
<dt>The AI is way too verbose</dt>
<dd>You also see that in the interaction between a developer and a domain expert.
You need to establish shared language.
<a href="https://en.wikipedia.org/wiki/Domain-driven_design">Domain Driven Design</a>
also needs a ubiquitous language. In our case this is essentially a Markdown
file with concepts and what they mean. Use the <code>/ubiquitous-language</code> skill.
This generates a file (<code>UBIQUITOUS_LANGUAGE.md</code>) with tables with terminology.</dd>
<dt>Code that doesn&rsquo;t work</dt>
<dd>Use feedback loops: static types, browser access to look around, automated tests.</dd>
<dt>Doing way too much</dt>
<dd>The Pragmatic Programmer calls this &ldquo;outrunning your headlights&rdquo;. The rate of
feedback is your speed limit. Related skill <code>/tdd</code>: write test, make test
pass. However, testing is hard (how big of a unit do you want to test, what to
mock, what behaviour to test, etc). Good codebases are easy to test. (Because
of better feedback loops and more clear boundaries.) Use deep modules: few
modules with lots of functionality but simple interfaces.</dd>
</dl>
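<p>To make the idea of a deep module a bit more concrete, here is a small sketch
of my own (not from Matt&rsquo;s slides). The hypothetical <code>TextIndex</code> class has
only two public methods, while tokenising and the index bookkeeping stay hidden
behind that interface.</p>

```python
class TextIndex:
    """A deep module: a small public interface, the details stay inside."""

    def __init__(self) -> None:
        self._documents: list[str] = []
        self._postings: dict[str, set[int]] = {}

    def add(self, text: str) -> int:
        """Store a document and return its id."""
        doc_id = len(self._documents)
        self._documents.append(text)
        for token in self._tokenise(text):
            self._postings.setdefault(token, set()).add(doc_id)
        return doc_id

    def search(self, query: str) -> list[str]:
        """Return the documents that contain every word of the query."""
        tokens = self._tokenise(query)
        if not tokens:
            return []
        ids = set.intersection(*(self._postings.get(t, set()) for t in tokens))
        return [self._documents[i] for i in sorted(ids)]

    @staticmethod
    def _tokenise(text: str) -> list[str]:
        # Implementation detail: callers never need to know how this works.
        words = (word.strip(".,!?").lower() for word in text.split())
        return [word for word in words if word]
```

<p>Tests only need to exercise <code>add</code> and <code>search</code>, so the internals
can be rewritten (by you or by an agent) without breaking them.</p>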
<figure><img src="/images/ai_engineer_2026_matt_pocock_3.jpg"
    alt="Deep modules vs Shallow modules"><figcaption>
      <p>Matt Pocock about deep vs shallow modules</p>
    </figcaption>
</figure>

<dl>
<dt>AI does not understand my code</dt>
<dd>Again, use deep modules. Easier for the AI to understand the design. How to go
from shallow to deep modules? Use the <code>/improve-codebase-architecture</code> skill.</dd>
<dt>My brain hurts</dt>
<dd>The human cannot keep up. Again: deep modules make it simpler for you to
understand the codebase. You can treat these modules as grey boxes. The AI
can handle what&rsquo;s in the blob. Design the interface but delegate the
implementation.</dd>
</dl>

  <figure>

<blockquote >
Invest in the design of the system every day
</blockquote>

  <figcaption>
    &mdash;Kent Beck, <cite>Extreme Programming Explained</cite>
  </figcaption>
  </figure>


<p>Code is <strong>not</strong> cheap, it is important. If we think of AI as a programmer, you
need someone above that, thinking on the strategic level. And that&rsquo;s you! And
that&rsquo;s why fundamental software development skills are still important.</p>]]></content>
  </entry>
  <entry>
    <title type="html"><![CDATA[AI Engineer Europe 2026: Workshop Day]]></title>
    <link rel="alternate" href="https://markvanlent.dev/2026/04/08/ai-engineer-europe-2026-workshop-day/" type="text/html" />
    <id>https://markvanlent.dev/2026/04/08/ai-engineer-europe-2026-workshop-day/</id>
    <author>
      <name>Mark van Lent</name>
      <uri>https://markvanlent.dev/about/</uri>
    </author>
    <category term="ai" />
    <category term="conference" />
    
    <updated>2026-04-09T20:32:35Z</updated>
    <published>2026-04-08T00:00:00Z</published>
    <content type="html"><![CDATA[<p><a href="https://www.ai.engineer/europe">AI Engineer Europe</a> is &ldquo;Europe&rsquo;s first flagship
AI Engineer event&rdquo;. I was fortunate enough to be one of the engineers that
<a href="https://schubergphilis.com/">Schuberg Philis</a> (my employer) sent to this
conference. The first day was workshop day.</p>
<p><img src="/images/ai_engineer_2026_entrance.jpg" alt="AI Engineer Europe 2026 was held at the Queen Elizabeth II Centre in London"></p>
<h2 id="how-to-build-agents-that-run-for-hours-without-losing-the-plot--ash-prabaker-and-andrew-wilson">How to Build Agents That Run for Hours (Without Losing the Plot) &mdash; Ash Prabaker and Andrew Wilson</h2>
<p>Why are agents losing the plot?</p>
<ul>
<li>Context: the agent can&rsquo;t carry state (and has &ldquo;context anxiety&rdquo; if the context
is getting filled up)</li>
<li>Planning: general models are not great at planning (e.g. running out of context)</li>
<li>Verification: models are bad at evaluating their own output (it thinks it&rsquo;s done)</li>
</ul>
<p>There are two ways to fix this:</p>
<ul>
<li>Train the model</li>
<li>Wrap the model in a harness</li>
</ul>
<p>Anthropic&rsquo;s new models were also combined with harness improvements last year.
Every release of Claude could run longer unattended.</p>
<p>Harness design for long-running agents: building the generator/evaluator
loop, and then deleting half of it when the model caught up. The idea is to split
the generator (which builds the thing) from the evaluator (which grades the thing).
Most people now use the same instance to build and evaluate.</p>
<blockquote>
<p>Tuning a standalone evaluator to be skeptical is tractable. Making a generator
self-critical is not.</p></blockquote>
<p>They added one more role: plan (from a 1-line prompt to a full spec). This is
the input for the generator. If you squint a bit, you&rsquo;ll notice that this mimics
the real world where we have a product manager (the plan role), an individual
contributor (the generator) and a QA person (the evaluator).</p>
<p>Before any code gets written, the generator and evaluator negotiate what &ldquo;done&rdquo;
looks like for this chunk. They iterate via files until they agree (one agent
writes, the other reads and responds). The agents agree on a contract. This
bridges user stories to testable behaviour.</p>
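<p>The loop between generator and evaluator can be sketched as follows. This is
my own simplification, not Anthropic&rsquo;s harness: the agents are plain functions,
the contract is a string instead of negotiated files, and all names
(<code>Verdict</code>, <code>run_harness</code>, the toy functions) are hypothetical.</p>

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    passed: bool
    feedback: str


def run_harness(
    generate: Callable[[str, str], str],
    evaluate: Callable[[str, str], Verdict],
    contract: str,
    max_rounds: int = 5,
) -> tuple[str, int]:
    """Alternate generator and evaluator until the contract is met."""
    feedback = ""
    for round_number in range(1, max_rounds + 1):
        artifact = generate(contract, feedback)
        verdict = evaluate(contract, artifact)
        if verdict.passed:
            return artifact, round_number
        feedback = verdict.feedback
    raise RuntimeError(f"contract not met after {max_rounds} rounds")


def toy_generate(contract: str, feedback: str) -> str:
    # Stand-in for the generator agent: it only adds tests once asked to.
    return "code + tests" if "tests" in feedback else "code"


def toy_evaluate(contract: str, artifact: str) -> Verdict:
    # Stand-in for a skeptical evaluator that insists on tests.
    if "tests" in artifact:
        return Verdict(True, "")
    return Verdict(False, "add tests")
```

<p>The important design choice is that the evaluator is a separate role with its
own instructions, so it has no reason to wave its own work through.</p>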
<p>Ash presented an example where a solo agent built a game vs building a game
with a full harness. The solo agent was done in 20 minutes, and the full harness
took 6 hours. But the result was significantly better and more
thought through.</p>
<p>Out of the box, Claude is a poor QA agent. It would find a bug and then decide
itself that it wasn&rsquo;t a big deal and approve the work anyway.</p>
<p>With Opus 4.6 half of what was described about harnesses became obsolete. The
new model needs less scaffolding and the harness can be less complex.</p>
<figure><img src="/images/ai_engineer_2026_anthropic_harness_design.jpg"
    alt="human -&gt; planner -&gt; generator -&gt; evaluator"><figcaption>
      <p>The simplified harness where the human prompts the planner, the planner writes the spec and the generator and evaluator work on the application</p>
    </figcaption>
</figure>

<p>Takeaways:</p>
<ol>
<li>Use an adversarial evaluator, self-evaluation is a trap</li>
<li>Structured handoffs are better than compaction</li>
<li>Make subjective quality gradable with rubrics the model can apply</li>
<li>Read the traces as they are your primary debugging loop</li>
<li>Delete scaffolding where the model catches up</li>
</ol>
<p>The models you are using (e.g. Opus for planning and Sonnet for building)
influence the harness. The harness patterns that work tend to get absorbed back
into the tools. Most of the loop described is buildable in Claude Code right now
with primitives that are already available.</p>
<p>Further reading on the Anthropic Engineering blog:</p>
<ul>
<li><a href="https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents">Effective harnesses for long-running agents</a></li>
<li><a href="https://www.anthropic.com/engineering/harness-design-long-running-apps">Harness design for long-running application development</a></li>
</ul>
<h2 id="building-your-own-software-factory--eric-zakariasson">Building Your Own Software Factory &mdash; Eric Zakariasson</h2>
<p>Eric spoke about his experience using Cursor to build a software factory.
Some parts are running autonomously. It&rsquo;s hard work. These are his observations.</p>
<p>There are several levels of using AI. At the lowest level you have autocomplete.
Then you progress to a coding intern, a junior dev, a developer (where the
majority of code is written by the AI tool) to a senior developer and finally a
software factory where the AI is a black box and operates like a
<a href="https://en.wikipedia.org/wiki/Lights_out_(manufacturing)">dark factory</a>.</p>
<p>Why would you want a factory:</p>
<ul>
<li>Throughput (24/7, machines do not need sleep)</li>
<li>Consistent output (like an actual factory, but with AI there is a risk of
losing determinism)</li>
<li>Leverage your taste better</li>
</ul>
<p>What do you need to build a factory?</p>
<ul>
<li>Primitives &amp; patterns (co-located code and usage patterns)</li>
<li>Guardrails (you want to let the agents free, but not <em>too</em> free, think about
rules, hooks and tests)</li>
<li>Enablers (what can you allow the agents to do to be more free, think about
skills, MCPs)</li>
</ul>
<figure><img src="/images/ai_engineer_2026_eric_zakariasson.jpg"
    alt="Building the factory"><figcaption>
      <p>Eric Zakariasson speaking about what you need to build a software factory</p>
    </figcaption>
</figure>

<p>Rules should be created dynamically: if you see an agent doing something you
don&rsquo;t like, write a rule. Note that agents will behave better over time with
newer models, so rules may become obsolete.</p>
<p>Other aspects you&rsquo;ll need:</p>
<ul>
<li>Runnable: can the agent start the env?</li>
<li>Accessible: can the agent use the tools it needs, e.g. Datadog?</li>
<li>Verifiable: can the agent perform checks itself?</li>
</ul>
<p>For each of the different stages of the software development lifecycle you will
need an agent. For example, one for reviewing changes and one for automated testing.</p>
<p>You need to shift your way of working:</p>
<ul>
<li>You&rsquo;ll write less code yourself (if any) but are managing your agents</li>
<li>You are also going from sync to async</li>
<li>You need to think more about scope and parallelising work (e.g. running some
tasks in parallel will guarantee merge conflicts while other tasks do not
interfere with each other).</li>
<li>You still need to know how data flows and what users want</li>
<li>You&rsquo;ll want to identify the human in the loop. Is there e.g. a copy/paste
action the user is doing now? Automate that away.</li>
</ul>
<p>Since agents will run for longer periods, you need to trust them more. You get
to know the agents: their weaknesses and their strengths and how to prompt them.</p>
<p>When having agents work in parallel, Eric is using separate environments (even
in different VMs) to have reproducible and isolated environments without side
effects from other branches and ongoing work. This takes more effort to
set up, but once you are there, it is easier to scale up the number of agents
working on your code.</p>
<p>You need to keep an eye out for where the agents go off the rails. Use this
information to improve the factory.</p>
<p>Now how do you go from 5 agents to 100? Same as before: observe the outcome.
And:</p>
<ul>
<li>Make sure the agents can verify their own work</li>
<li>Set up automations. Examples:
<ul>
<li>Eric demonstrated asking the agent what actions he does frequently so he can work on automating those</li>
<li>Review the comments made on PR reviews so the agents can learn from them</li>
</ul>
</li>
<li>Move up abstractions</li>
</ul>
<p>The takeaways:</p>
<ul>
<li>Be clear about your intent: what problem are you solving?</li>
<li>Stay in the loop for important decisions (e.g. which payment system to use,
etc)</li>
<li>Build systems and tools: codify them and give your agents access to them</li>
<li>Store context for later and keep it up-to-date (since it will evolve)</li>
<li>Let the agents be free (one team had even given the agents a place to complain
and that proved to be very useful)</li>
</ul>
<h2 id="build-your-own-deep-research-agent--technical-writer--louis-françois-bouchard-paul-iusztin-and-samridhi-vaid">Build Your Own Deep Research Agent + Technical Writer &mdash; Louis-François Bouchard, Paul Iusztin and Samridhi Vaid</h2>
<p>The team built a multi-agent pipeline to replace the research and technical writing
process. They give a topic and it will write a technical article (without slop
or hallucinations). It targets short content, like LinkedIn posts.</p>
<p>The GitHub repo of their project:
<a href="https://github.com/iusztinpaul/designing-real-world-ai-agents-workshop">iusztinpaul/designing-real-world-ai-agents-workshop</a></p>
<p>Constraints:</p>
<ul>
<li>Costs per task</li>
<li>Latency</li>
<li>Quality</li>
<li>Compliance &amp; data privacy</li>
</ul>
<p>There&rsquo;s a scale from simple prompts (where you have more control and lower costs)
via workflows to single agent to multiple agents (where you have more autonomy,
and thus less control, and higher costs). It&rsquo;s best to always use the
simplest solution that does the job. For example: if the context is known at or
before query time and has fewer than 200K tokens, a simple prompt can suffice, using
<a href="https://www.geeksforgeeks.org/artificial-intelligence/context-augmented-generation-cag/">Context Augmented Generation (CAG)</a>.</p>
<p>In a situation where the context is not known beforehand (e.g. because it is
private or too recent), you might benefit from including a workflow. (A
workflow is a sequence of fixed steps, with the same steps in the same order
each time). Think about using
<a href="https://www.geeksforgeeks.org/nlp/what-is-retrieval-augmented-generation-rag/">Retrieval-Augmented Generation (RAG)</a>.</p>
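<p>The scale from prompt to workflow to agent can be written down as a small
decision function. This is my own reading of the rule of thumb, not code from
the workshop repository; <code>choose_architecture</code> and its parameters are
hypothetical names.</p>

```python
CAG_TOKEN_LIMIT = 200_000  # the threshold mentioned in the workshop


def choose_architecture(
    context_known_upfront: bool,
    context_tokens: int,
    needs_autonomy: bool,
) -> str:
    """Pick the simplest option on the prompt / workflow / agent scale."""
    if needs_autonomy:
        # More autonomy means less control and higher costs.
        return "agent"
    if not context_known_upfront or context_tokens >= CAG_TOKEN_LIMIT:
        # Context has to be retrieved in fixed steps first (think RAG).
        return "workflow"
    # Everything fits in a single prompt (think CAG).
    return "prompt"
```

<p>For example, <code>choose_architecture(True, 50_000, False)</code> picks the simple
prompt, while unknown or very large context pushes you to a workflow.</p>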
<p>The next step is when you need the system to take autonomous actions or you need
dynamic behaviour. Then you get to agents, which can react to what is happening.
These agents can also use tools, which can have their own:</p>
<ul>
<li>System prompt</li>
<li>Validation logic</li>
<li>LLMs</li>
</ul>
<p>Tools are specialists, but with one shared decision maker which has a global
context. Delegation to tools helps with context management. The tools can have
their own context windows.</p>
<p>AI products are never just agents, simple workflows or LLM calls and tools.
They combine all of them. AI engineers need to understand how to build these
complex systems. And deep research systems are a perfect project for learning
how to build these complex, multi-agent systems.</p>
<p>The MCP server they built (see <a href="https://github.com/iusztinpaul/designing-real-world-ai-agents-workshop">their GitHub repo</a>):</p>
<ul>
<li>Tools: actions the agent can do</li>
<li>Prompts: instructions the agent can follow</li>
<li>Resources: data the agent can read</li>
</ul>
<p>Why both skills and MCP? They are moving to using skills more, but those cannot
replace the MCP server completely because some tools are too complex to be
turned into skills.</p>
<p>LinkedIn post generation:</p>
<ul>
<li>Guidelines: what to write about (topic, angle, etc)</li>
<li>Profiles: how to write (structure, terminology, character of the post)</li>
<li>Research</li>
</ul>
<p>Debugging workflows/agents purely through logs is hard. You want traces
(LLM/tool calls with full I/O + metadata), latency and cost tracking. They used
<a href="https://www.comet.com/docs/opik/">Opik</a> for observability.</p>
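<p>To illustrate what a trace holds compared to a log line, here is a minimal
sketch of a tracing decorator. It deliberately does not use the Opik API; the
in-memory <code>TRACES</code> list, <code>traced</code> and <code>search_tool</code> are
hypothetical, and cost tracking is left out.</p>

```python
import functools
import time
from typing import Any, Callable

TRACES: list[dict[str, Any]] = []  # in-memory store, for illustration only


def traced(func: Callable[..., Any]) -> Callable[..., Any]:
    """Record inputs, output and latency of every call to func."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        start = time.perf_counter()
        result = func(*args, **kwargs)
        TRACES.append(
            {
                "name": func.__name__,
                "input": {"args": args, "kwargs": kwargs},
                "output": result,
                "latency_s": time.perf_counter() - start,
            }
        )
        return result

    return wrapper


@traced
def search_tool(query: str) -> list[str]:
    # Stand-in for a real tool call.
    return [f"result for {query}"]
```

<p>A real setup would also record token counts and costs per call and ship the
traces to an observability platform instead of a list.</p>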
<p>You also want to automate evals. Generating one post allows for manual review,
but when scaling to 100 posts, it&rsquo;s quite impossible to manually review each one.
One small change could break something completely, so you need to review.</p>
<p>They used a three layer architecture:</p>
<ul>
<li>Optimization</li>
<li>Regression testing (evals in CI/CD)</li>
<li>Production monitoring (using Opik)</li>
</ul>
<p>They encourage us to run their project ourselves and read the code to
understand the details of what&rsquo;s going on.</p>
<h2 id="ai-coding-for-real-engineers--matt-pocock">AI Coding For Real Engineers &mdash; Matt Pocock</h2>
<p>To follow along, see Matt&rsquo;s workshop at
<a href="https://aihero.dev/s/ai-2026">aihero.dev/s/ai-2026</a>.</p>
<p><em>Before I begin with my notes: this session is definitely worth watching (again)
once it&rsquo;s available on YouTube! Matt is an excellent teacher, has great energy
and great content as well.</em></p>
<p>Once there are about 100K tokens in the context, the AI starts to get dumber and
make increasingly dumb decisions. We don&rsquo;t want the AI to bite off more than
it can chew, so keep your tasks small.</p>
<figure><img src="/images/ai_engineer_2026_matt_pocock_1.jpg"
    alt="Smart zone / dumb zone"><figcaption>
      <p>Matt Pocock explaining about the smart zone / dumb zone</p>
    </figcaption>
</figure>

<p>Even with the 1M context window of Claude the &ldquo;smart zone&rdquo; is still around 100K.
Claude basically just expanded the dumb zone. Good for retrieval, less good for
coding.</p>
<p>So how <em>do</em> you tackle big tasks? Multi-phase plans are a common solution. It&rsquo;s
basically a loop. This is where the
<a href="https://ralph-wiggum.ai/">Ralph Wiggum loop</a> comes from. Matt likes something
smarter though.</p>
<p>Every session starts with a system prompt. If you have 200K tokens in here
already, you are in the &ldquo;dumb zone&rdquo; from the start. To stay in the &ldquo;smart zone&rdquo;
you can clear the context. This does mean that you lose everything that
happened after the system prompt. Alternatively, you can compact.</p>
<p>If you show the number of tokens in the status line of Claude Code (or whatever
tool you are using), you know how close to the &ldquo;dumb zone&rdquo; you are. When using
<code>/compact</code> it squeezes all information. The downside of using <code>/compact</code> over
<code>/clear</code>: with the latter you have a deterministic state.</p>
<p>The <code>/grill-me</code> skill (<a href="https://github.com/mattpocock/skills/tree/main/grill-me">source</a>)
is a really nice way of taking inputs from the world. It can <cite>interview
the user relentlessly about a plan or design until reaching shared
understanding</cite> (according to the skill itself). Ideally you use
the <code>/grill-me</code> skill with both the developer and the domain expert in the room.</p>
<p>After the <code>/grill-me</code> skill, you want to summarize all those valuable tokens
into a Product Requirements Document (PRD). This is the definition of done for
your agent. You can use the <code>/write-a-prd</code> skill
(<a href="https://github.com/mattpocock/skills/tree/main/write-a-prd">source</a>)
to write this document. Note that there are testing decisions in there too.
These are important!</p>
<p>Matt explains that he does not actually read the PRDs that are generated. In the
grilling sessions he makes sure he and the AI are on the same wavelength so
there&rsquo;s no need to review the resulting document. Why would he? Doing so would
basically only test the LLM&rsquo;s ability to summarize.</p>
<p>Should he optimize the plan? Matt doesn&rsquo;t think optimizing the plan to death
adds a lot of value. Things will change afterwards anyway.</p>
<p>To see examples of a PRD and issues generated from it, check his
<a href="https://github.com/mattpocock/course-video-manager">course-video-manager</a>
repository, in particular the closed issues.</p>
<p>Now that we have our destination, how do we split it? Matt likes creating a
<a href="https://en.wikipedia.org/wiki/Kanban_board">Kanban board</a> out of it.
He created a skill to do this: <code>/prd-to-issues</code>
(<a href="https://github.com/mattpocock/skills/tree/main/prd-to-issues">source</a>).</p>
<p>As you&rsquo;ll see in that skill, Matt instructs the AI to use vertical slices. LLMs
love to code horizontally, so per layer (database, business logic, frontend).
This means you don&rsquo;t get feedback on your work until all layers are done. If
you slice vertically, you can test the entire flow sooner. (Also known
as &ldquo;tracer bullets&rdquo; if you&rsquo;ve read
<a href="https://pragprog.com/titles/tpp20/the-pragmatic-programmer-20th-anniversary-edition/">The Pragmatic Programmer</a>.)</p>
<p>With the Kanban board set up, it&rsquo;s easier to parallelize working on tasks. And
once we have the issues that can be worked on, the human can step out of the
loop.</p>
<p>For implementation use the <code>/tdd</code> skill
(<a href="https://github.com/mattpocock/skills/tree/main/tdd">source</a>). It follows the red/green/refactor
cycle: write a failing test first and then make it pass. This not only adds
(good) tests to the codebase, but starting with the tests it&rsquo;s harder to &lsquo;cheat&rsquo;
with writing the tests (you cannot write the tests to match the implementation
because the latter does not exist yet).</p>
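<p>As a small illustration of that red/green order (my own example, not taken
from the <code>/tdd</code> skill; <code>slugify</code> is a hypothetical
function), the tests below are written first and the implementation only
afterwards:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">import re
import unittest


# Step 1 (red): the tests come first. As long as slugify() does not exist,
# running them fails.
class SlugifyTests(unittest.TestCase):
    def test_lowercases_and_joins_words_with_hyphens(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

    def test_drops_punctuation_and_surrounding_whitespace(self):
        self.assertEqual(slugify("  AI Engineer, Europe!  "), "ai-engineer-europe")


# Step 2 (green): the smallest implementation that makes the tests pass.
def slugify(title):
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)
</code></pre></div>
<p>Saved as a module, this can be run with <code>python -m unittest</code>.
Because the tests describe the desired behaviour before any code exists, they
cannot simply mirror whatever the implementation happens to do.</p>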
<p>How do you conform with existing architecture, coding standards, API design,
constraints, etc?</p>
<ul>
<li>Push instructions to the LLM (e.g. in <code>CLAUDE.md</code>)</li>
<li>Pull: give the agent an opportunity to collect info, e.g. via skills.</li>
</ul>
<p>For the implementer you should use the pull strategy so it can pull what it
needs. For the reviewer use push (these are our standards, make sure they are
adhered to).</p>
<p>You absolutely need to have feedback loops for the AI. The quality of
your feedback loops directly affects the quality of the output.</p>
<p>As you may have noticed, there are two types of work when building something:</p>
<ul>
<li>Human in the loop (HITL) tasks (like planning) which <strong>need</strong> the human</li>
<li>Away from keyboard (AFK) tasks (like implementation) where the AI can work
autonomously</li>
</ul>
<p>To recap the process thus far: you have an idea, you have the grilling session
with the AI and this results in a PRD, which gets turned into issues on a Kanban
board. These are HITL steps. Now the AI can take over and handle implementation
where one or more agents work (the night shift so to speak). Once the AI is done,
the human steps back in the loop for the QA/review step.</p>
<figure><img src="/images/ai_engineer_2026_matt_pocock_2.jpg"
    alt="Phases from idea to finished implementation"><figcaption>
      <p>Matt Pocock discussing the phases in the whole process from idea to finished implementation</p>
    </figcaption>
</figure>

<p>And yes, we do need code review. There is no way to avoid this. If we delegate
coding to the agent in small PRs, we have to review more code. Matt doesn&rsquo;t feel
good saying that, but it&rsquo;s his honest answer. The QA step is also where you can
impose your opinions on the agent. Note that in the QA phase we also create more
issues on the Kanban board.</p>
<p>You <em>can</em> and <em>should</em> have an automated review step though. Only QA manually
afterwards. But be careful that the automated review isn&rsquo;t done in the &ldquo;dumb
zone&rdquo;; you want to review in the &ldquo;smart zone&rdquo;.</p>
<p>Frontend in particular is tricky. It needs human eyes. AI is not very good at
that yet. But you <em>can</em> ask it to create a couple of prototypes to trigger
a feedback loop with the agent.</p>
<p>So how does this work in a team? You involve the team in all HITL steps. And
while the idea, research, prototype steps look linear, in the messy world you&rsquo;ll
bounce back and forth between those phases.</p>
<p>What if you have a bad, complicated codebase that even humans do a bad job in?
How do you improve that? If your files are &ldquo;shallow modules&rdquo; (small files with
little functionality), it&rsquo;s hard for AI to navigate. It has to manually track
through the repo. Also hard to draw test boundaries. Hard to test interaction
between modules. Should the tests mock other modules?</p>
<p>Building a codebase that is easy to test is essential, because the feedback loop
is better. &ldquo;Deep modules&rdquo; are better; these are modules with more functionality.
Dependencies are more clear. So how do you go from a bad codebase to a good one?
How to group modules? The <code>/improve-codebase-architecture</code> skill
(<a href="https://github.com/mattpocock/skills/tree/main/improve-codebase-architecture">source</a>)
will find places to deepen the modules.</p>
<p>(The concept of shallow/deep modules comes from the book
<a href="https://www.amazon.com/Philosophy-Software-Design-John-Ousterhout/dp/1732102201">A Philosophy of Software Design</a>, which also discusses dependencies.)</p>
<p>If you take only one thing away from today: use the <code>/improve-codebase-architecture</code> skill.</p>
<div class="note ">
  <div class="note_header">
    <span class="hidden">:</span>
  </div>
  <div class="note_body">
    Matt was brilliant at switching between telling Claude what to do next,
presenting and answering questions. I&rsquo;ve tried to make the above a bit of a
logical story. What remains are more random notes resulting from questions from
the audience.
  </div>
</div>

<p>You need to have (enough) control over the thing to be able to fix it. The PRD
lists which modules are updated, which also helps you stay in control of the
beast. Because we delegate more, we lose sense of our codebase. By building
deep modules (big shapes), it&rsquo;s easier to have their mental models in your mind.
You don&rsquo;t need to code review all details in a module, you only need to make
sure the shape does what it needs to do.</p>
<p>Code is important, so understanding the tools deeply makes you a better developer
and you&rsquo;ll get more out of AI.</p>
<p>When using plan mode, you can tell in <code>CLAUDE.md</code> to be terse (&ldquo;when talking to
me, sacrifice grammar for the sake of concision&rdquo;). This helps when reading the
plans. But Matt dropped this in favour of the grilling session where he and the
LLM came to the same shared understanding and he no longer needed to read the
plans.</p>
<p>Does he keep the markdown plans for future reference? No clear answer. Matt is
wary of outdated documentation (names and requirements have changed). He tends
to get rid of the plans and mark the issues as closed.</p>
<p>What does he think of <a href="https://steve-yegge.medium.com/introducing-beads-a-coding-agent-memory-system-637d7d92514a">Beads</a>
from Steve Yegge? It&rsquo;s another way to manage Kanban boards.</p>
<p>Sidenote: <a href="https://github.com/mattpocock/sandcastle">Sandcastle</a> (also created
by Matt) is an orchestrator. It takes the Ralph loop from sequential to parallel.</p>
<p>Matt is not selling a way of working. He does recommend buying old programming
books (pre AI) since they contain a lot of wisdom that is still applicable.</p>]]></content>
  </entry>
  <entry>
    <title type="html"><![CDATA[FOSDEM 2026]]></title>
    <link rel="alternate" href="https://markvanlent.dev/2026/01/31/fosdem-2026/" type="text/html" />
    <id>https://markvanlent.dev/2026/01/31/fosdem-2026/</id>
    <author>
      <name>Mark van Lent</name>
      <uri>https://markvanlent.dev/about/</uri>
    </author>
    <category term="ansible" />
    <category term="conference" />
    <category term="docker" />
    <category term="infrastructure as code" />
    <category term="git" />
    <category term="python" />
    <category term="security" />
    
    <updated>2026-02-01T14:54:22Z</updated>
    <published>2026-01-31T00:00:00Z</published>
    <content type="html"><![CDATA[<p>January is already almost over, so time for <a href="https://fosdem.org/2026/">FOSDEM</a>,
the yearly <q>free event for software developers to meet, share ideas and
collaborate</q> in Brussels. <a href="/2025/02/01/fosdem-2025/">Last year</a> I
focussed on the Go track, this year I selected a mix of security and Python
related talks to attend.</p>
<h2 id="streamlining-signed-artifacts-in-container-ecosystems--tonis-tiigi">Streamlining Signed Artifacts in Container Ecosystems &mdash; Tonis Tiigi</h2>
<p>It&rsquo;s possible to sign Docker images, but at the moment most are actually not
signed. Also, users should understand what the signature is protecting and what
it&rsquo;s <em>not</em> protecting. We should not want signing just to tick a box on the
security checklist, but because of the security it adds. And we need something
simple: integrated with existing tools, should not slow down tools.</p>
<p>Buildkit powers &ldquo;<code>docker build</code>&rdquo; but is not limited to Dockerfiles. It&rsquo;s high
performance, can build complex builds and has caching.</p>
<p>A modern build is a graph of images, Git repositories, local files, etc. The
results are images, binaries, archives.</p>
<figure><img src="/images/fosdem2026_tonis_tiigi.jpg"
    alt="Photo of Tonis Tiigi explaining the graph that is modern software building"><figcaption>
      <p>Tonis Tiigi explaining that builds of modern software are a complex graph</p>
    </figcaption>
</figure>

<p>We need Supply-chain Levels for Software Artifacts (SLSA) provenance: what has
actually happened in the build? What was the build config? Et cetera. It&rsquo;s useful to
figure out how an artifact was built.</p>
<p>Buildkit does not sign images by default. GitHub has <a href="https://docs.github.com/en/packages/managing-github-packages-using-github-actions-workflows/publishing-and-installing-a-package-with-github-actions#publishing-a-package-using-an-action">an example in the
documentation</a>
to run a build with Buildkit and generate an artifact. It claims to generate an
<q>unforgeable statement</q>. But if your GitHub credentials are
leaked and the attacker can get their hands on the temporary signing key, they can
use it to sign their own artifacts.</p>
<p>Docker created the <a href="https://github.com/docker/github-builder">github-builder</a>
repository. It contains reusable GitHub Actions to securely build images. If you
use this, your images are signed to prove that they were built from a certain
repository, using the configured build steps. Where Buildkit (among other
things) provides isolation, <code>github-builder</code> provides signing context. It also
protects against build dependency leaks.</p>
<p>So that takes care of the signatures, but how do you verify them?</p>
<ul>
<li>The command &ldquo;<code>docker inspect</code>&rdquo; now shows verified signatures</li>
<li>You can manually verify it with <a href="https://github.com/sigstore/cosign">cosign</a></li>
<li>You can also use sigstore/policy-controller for Kubernetes</li>
</ul>
<p>Buildx also includes experimental Rego (Open Policy Agent) policy support. This
means you can write a matching policy for <code>Dockerfile</code>, e.g. <code>Dockerfile.rego</code>,
which is then automatically loaded. All build sources now need to pass policy
for the build to continue (images, Git repositories, URLs, etc).</p>
<p>You can do very complex stuff in the policies. As a simple example Tonis showed:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-rego" data-lang="rego"><span class="line"><span class="cl"><span class="kd">package</span><span class="w"> </span><span class="nx">docker</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="n">allow</span><span class="w"> </span><span class="kd">if</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nx">input</span><span class="o">.</span><span class="nx">image</span><span class="o">.</span><span class="nx">repo</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">&#34;org/app&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nf">docker_github_builder_tag</span><span class="p">(</span><span class="nx">input</span><span class="o">.</span><span class="nx">image</span><span class="o">,</span><span class="w"> </span><span class="s2">&#34;org/app&#34;</span><span class="o">,</span><span class="w"> </span><span class="nx">input</span><span class="o">.</span><span class="nx">image</span><span class="o">.</span><span class="nx">tag</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><p>This policy should make sure that the image can only be built from this
repository and that the image tag should match the Git tag.</p>
<p>Summary:</p>
<ul>
<li>No reason not to sign</li>
<li>Not all signatures are equal</li>
<li>Software pulling packages should verify pulled content</li>
</ul>
<p><a href="https://fosdem.org/2026/schedule/event/HJAJTU-streamlining_signed_artifacts_in_container_ecosystems/">Link to the conference page</a></p>
<h2 id="sequoia-git-making-signed-commits-matter--neal-h-walfield">Sequoia git: Making Signed Commits Matter &mdash; Neal H. Walfield</h2>
<p>Version control systems (also known as VCSs) track the following:</p>
<ul>
<li>Changes to the code</li>
<li>Authorship</li>
<li>Other metadata</li>
<li>Commit message</li>
</ul>
<p>But the author can be faked: the metadata is set by the author, including the
author&rsquo;s name. After a quick &ldquo;<code>git config</code>&rdquo; command you can commit as anyone you
want, for example <a href="https://en.wikipedia.org/wiki/Linus_Torvalds">Linus Torvalds</a>.
Sure, GitHub could see that the committer (the one pushing the commit) and
author are different. However, this is not necessarily bad because we might
simply want to give proper attribution to the author of the commit.</p>
<p>And in theory the forge might also be compromised, or someone may have gotten
permission to push to the project.</p>
<p>To prevent impersonations, we can cryptographically prove who the author is by
signing the commits. But now the problem shifts to the certificates. Because
anyone can create a key with any name (again, for example Linus) attached to it.
So what does a signed commit mean now?</p>
<p>How can we be sure that the author is who they say they are? There are ways:</p>
<ul>
<li>You could talk to the developer to verify</li>
<li>You could go to <a href="https://en.wikipedia.org/wiki/Key_signing_party">key signing parties</a></li>
<li>You can use a central authority that you trust (e.g.
<a href="https://keys.openpgp.org/">keys.openpgp.org</a>, the Linux developer keyring,
the <code>distributions-gpg-keys</code> package, or, if you trust GitHub, use
<code>github.com/&lt;username&gt;.gpg</code>)</li>
</ul>
<p>You can use the following command to show the Git log and the signatures on them:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-shell" data-lang="shell"><span class="line"><span class="cl">git log --show-signature
</span></span></code></pre></div><p>But now you need to actually check that the signatures are indeed made by the
certificates you trust.</p>
<p>It&rsquo;s up to the maintainers of the software to curate a list of contributors and
track when contributors join and leave (yes, there is a temporal element as
well). This is hard. Maintainers need tooling. And you would want to detect
unauthorized commits (impersonation, a malicious forge, a machine in the middle
or for instance when a project is given to a new maintainer by a forge/registry).</p>
<p>What does the solution look like?</p>
<ul>
<li>Clear semantics</li>
<li>The project itself maintains signing policy</li>
<li>Third party uses maintainers&rsquo; policy to authenticate project</li>
<li>Verification, not attestation: do not rely on any external authority</li>
</ul>
<p>(Note that the maintainers can still be socially engineered to include the key
of an attacker in their policy. So they still have to be careful about who is
added to the policy.)</p>
<p>Sequoia git provides:</p>
<ul>
<li>Specification</li>
<li>Config</li>
<li>Tooling</li>
</ul>
<p>With <a href="https://gitlab.com/sequoia-pgp/sequoia-git">Sequoia git</a> (which is part of
the <a href="https://sequoia-pgp.org/">Sequoia PGP project</a>) you can have a signing
policy in an <code>openpgp-policy.toml</code> file in the project&rsquo;s Git repository. It
specifies users, their keys and their capabilities. You can use <code>sq-git</code> to help
maintain this file.</p>
<p>For instance to add user Alice and then describe the current policy, you can use
the following commands:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-shell" data-lang="shell"><span class="line"><span class="cl">sq-git policy authorize alice --committer &lt;cert&gt;
</span></span><span class="line"><span class="cl">sq-git policy describe
</span></span></code></pre></div><p>A commit is &ldquo;authenticated&rdquo; if at least one parent commit says the commit is
acceptable (via the policy). To verify that there is an authenticated path from
the current state back to a certain commit we trust, use this command:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-shell" data-lang="shell"><span class="line"><span class="cl">sq-git log --trust-root &lt;sha of trusted commit&gt;
</span></span></code></pre></div><p>Projects may have contributions from others that are not included in the policy.
To maintain an authenticated path when accepting the contribution, a trusted
author needs to merge the contribution via a merge commit that <em>is</em>
authenticated. (You may need to use the &ldquo;<code>--no-ff</code>&rdquo; option on the merge to make sure
there is a merge commit though.)</p>
<p><a href="https://fosdem.org/2026/schedule/event/KFSUCW-sequoia-git/">Link to the conference page</a></p>
<h2 id="an-endpoint-telemetry-blueprint-for-security-teams--victor-lyuboslavsky">An Endpoint Telemetry Blueprint for Security Teams &mdash; Victor Lyuboslavsky</h2>
<p>With open source we can inspect something that is broken, we can change the
defaults. With security we are used to the opposite; it&rsquo;s a black box. We are
not used to owning the data. The data exists on the endpoints, but ownership is
transferred to a different team. How can we add more security in a way engineers
understand and can use?</p>
<p>Victor presents a blueprint with the following layers:</p>
<ul>
<li>Endpoint agents</li>
<li>Control layer</li>
<li>Ingestion, streaming &amp; storage</li>
<li>Detection</li>
<li>Correlation, intelligence and response</li>
</ul>
<p>The value is not in the layers themselves, but the boundaries. For example, the
ingestion should move the data reliably but should not care which tool collected
it. This makes them loosely coupled.</p>
<p>For endpoint agents Victor suggests
<a href="https://github.com/osquery/osquery">osquery</a> which allows basic questions about
endpoints. Data is structured and consistent. It aligns with open source values.
(Alternatives: scripts &amp; cron, log shippers like filebeat or tools like auditd
or Event Tracing for Windows.)</p>
<p>Controlling the data (the next layer) means that you want to have:</p>
<ul>
<li>Central config</li>
<li>Live queries</li>
<li>Consistent schemas</li>
</ul>
<p><a href="https://github.com/fleetdm/fleet">Fleet</a> (disclaimer: Victor works here) is
built to manage <code>osquery</code> at scale and a good candidate for this layer.</p>
<p>The control layer needs to work hand-in-hand with the ingestion layer. The ingestion
layer moves data to downstream systems. E.g. <a href="https://github.com/vectordotdev/vector">Vector</a> or
<a href="https://www.elastic.co/logstash">Logstash</a> can be used here.</p>
<blockquote>
<p>Ingestion isn&rsquo;t where you get clever. It&rsquo;s where you get reliable.</p></blockquote>
<p>Streaming decouples producers from consumers and e.g. allows replay. Note that this
is an optional step and it would come <em>after</em> ingestion, not <em>in place of</em> it.
For instance <a href="https://kafka.apache.org/">Apache Kafka</a> can be used in this
layer. Ingestion absorbs the mess. Streaming preserves flexibility.</p>
<p>The storage layer is where telemetry becomes durable. It&rsquo;s about being able to
ask hard questions later. Examples of useful tools:
<a href="https://github.com/ClickHouse/ClickHouse">ClickHouse</a>,
<a href="https://www.elastic.co/elasticsearch">Elasticsearch</a> (which is better at text
search) and <a href="https://github.com/apache/iceberg">Iceberg</a> (which is slower for
active investigation).</p>
<p>For the detection layer you might want to use
<a href="https://github.com/SigmaHQ/Sigma">Sigma</a>. It provides portability. Rules are
translated to native SQL running on ClickHouse. Intent (Sigma signatures)
becomes execution (SQL query to get the data).</p>
<p>Finally the correlation layer: <a href="https://github.com/grafana/grafana">Grafana</a>
can be used for correlation and visualisation. Grafana can query ClickHouse.
Grafana also has alerting.</p>
<p>Note that response isn&rsquo;t just about automation. It&rsquo;s also to pause and ask
better questions. The correlation layer should focus on enabling humans to act.</p>
<p>Open endpoint telemetry is <strong>not</strong> an &ldquo;EDR killer&rdquo;. It does not replace it. It adds
diversity and complements other tools. It provides a second set of eyes.</p>
<p><a href="https://fosdem.org/2026/schedule/event/HYXTPH-endpoint-telemetry-blueprint/">Link to the conference page</a></p>
<h2 id="the-bakery-how-pep810-sped-up-my-bread-operations-business--jacob-coffee">The Bakery: How PEP810 sped up my bread operations business &mdash; Jacob Coffee</h2>
<p>Python loads imports eagerly by default. This leads to memory bloat and cold
start issues. Explicit lazy imports (see
<a href="https://peps.python.org/pep-0810/">PEP 810</a>) only import a module when it&rsquo;s
first accessed, not when the import statement is executed.</p>
<p>Lazy import is scheduled to be included in Python 3.15 and looks like this:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="n">lazy</span> <span class="kn">from</span> <span class="nn">bar</span> <span class="kn">import</span> <span class="n">foo</span>
</span></span></code></pre></div><p>The design principles applied are that lazy imports are:</p>
<ul>
<li>Explicit</li>
<li>Local</li>
<li>Granular</li>
</ul>
<p>When parsing the Python code a proxy module is created. Only when the module is
actually used is the proxy transparently replaced by the real module. You will
not always see improvements, so do not blindly replace all imports with lazy
imports.</p>
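<p>The <code>lazy</code> keyword is not available in current Python releases, but
the standard library already offers a comparable deferred-loading mechanism in
<code>importlib.util.LazyLoader</code>. The sketch below follows the recipe from
the <code>importlib</code> documentation and only illustrates the proxy idea; it
is not how PEP 810 is implemented. The helper name <code>lazy_import</code> and
the choice of <code>colorsys</code> as example module are mine.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">import importlib.util
import sys


def lazy_import(name):
    """Return a module object whose code only runs on first attribute access."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.loader is None:
        raise ModuleNotFoundError(f"No module named {name!r}")
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    # This only prepares the proxy; the module body has not been executed yet.
    loader.exec_module(module)
    return module


colorsys = lazy_import("colorsys")
# The first attribute access triggers the real import.
hue, saturation, value = colorsys.rgb_to_hsv(1.0, 0.0, 0.0)
</code></pre></div>
<p>One difference with the behaviour described in the talk: with this recipe a
missing module is still reported immediately, because <code>find_spec</code>
runs eagerly. Only errors raised while executing the module body are deferred to
first use.</p>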
<p>PEP 810 also eliminates the need for <code>TYPE_CHECKING</code> guards. (See the <a href="https://docs.python.org/3/library/typing.html#typing.TYPE_CHECKING">typing
docs</a>, in
short: importing a module that is expensive and only contains types used for
type checking in an &ldquo;<code>if TYPE_CHECKING:</code>&rdquo; block.) It also helps with faster test
discovery and collection, lower memory usage and reduced cold start slowness in
e.g. AWS Lambda functions, CLI applications, etc.</p>
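<p>For reference, this is roughly what such a <code>TYPE_CHECKING</code> guard
looks like today. It is a minimal sketch in which <code>decimal</code> merely
stands in for an expensive module and <code>double</code> is a hypothetical
function:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only evaluated by type checkers; this import is skipped at runtime.
    from decimal import Decimal


def double(amount: "Decimal") -> "Decimal":
    # The annotation is a string, so Decimal does not have to exist at runtime.
    return amount * 2
</code></pre></div>
<p>With an explicit lazy import the guard and the quoted annotations should no
longer be needed, since the module would only be loaded if it is actually used
at runtime.</p>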
<p>Meta (with Cinder) saw a 70% startup time reduction and 40% memory savings.
PySide has a 35% startup improvement.</p>
<p>About CLI tools: when using lazy imports you might notice the difference when
using <code>--help</code>. There&rsquo;s no need to load all dependencies to just output the help
text of a tool.</p>
<p>Some notes:</p>
<ul>
<li>Import time side effects (e.g. logging configuration, DB connections) are also
delayed!</li>
<li>Type checkers need to be updated</li>
<li>Import errors move to first use (so at runtime, not at launch). Keep that in
mind when debugging</li>
<li>It&rsquo;s not always faster, so profile your application before migrating and see
where you can potentially benefit</li>
<li>Document your lazy imports!</li>
<li>You cannot do lazy imports in functions</li>
</ul>
<p>Circular imports are probably still a problem, but they just show up later.</p>
<p><a href="https://github.com/JacobCoffee/breadctl">Link to the repo for this talk</a></p>
<p><a href="https://fosdem.org/2026/schedule/event/HAAABD-the_bakery_how_pep810_sped_up_my_bread_operations_business/">Link to the conference page</a></p>
<h2 id="modern-python-monorepo-with-uv-workspaces-prek-and-shared-libraries--jarek-potiuk">Modern Python monorepo with <code>uv</code>, <code>workspaces</code>, <code>prek</code> and shared libraries &mdash; Jarek Potiuk</h2>
<p>Jarek is, besides his other roles, the number 1 Apache Airflow contributor. The
<a href="https://github.com/apache/airflow">Apache Airflow repo</a> is the monorepo he
talks about today. There is also a series of blog posts about this topic: see
<a href="https://medium.com/apache-airflow/modern-python-monorepo-for-apache-airflow-part-1-1fe84863e1e1">part 1</a>,
which links to the other parts.</p>
<p>Airflow drove early requirements for
<a href="https://docs.astral.sh/uv/concepts/projects/workspaces/">uv workspaces</a>. They now
manage 120+ distributions seamlessly with it. It allows them to combine
distributions to work together in a workspace. Also used to import from one
distribution in another one.</p>
<p>The project shares a single virtual environment used by <code>uv</code> in the root of the project.
If you run &ldquo;<code>uv sync</code>&rdquo; from the top level you get everything. If you run it in a
subdirectory (e.g. <code>airflow-core</code>) you only get what is needed for that
distribution.</p>
<p>Benefits of the <code>uv</code> workspaces:</p>
<ul>
<li>Isolated</li>
<li>Explicit</li>
<li>Flexible</li>
</ul>
<p><a href="https://hatch.pypa.io/1.12/">Hatch</a> has (or will have, at the time of writing)
largely compatible workspaces.</p>
<p>However <a href="https://pre-commit.com/">pre-commit</a> became a bottleneck. They needed
to run 170+ pre-commit hooks <strong>on every commit</strong>.
<a href="https://github.com/j178/prek">Prek</a> is a drop-in replacement for pre-commit and
works fantastically well. It is optimized for speed and monorepos.</p>
<p>Airflow uses symlinked shared libraries (where a shared lib is also a
distribution). The Hatchling build backend needs to replace links with physical
copies during packaging. They use Prek to maintain consistency.</p>
<p><code>uv sync</code> detects conflicts between merged requirements files and Prek hooks
enforce relative imports in shared code to prevent cross coupling issues (IIRC).</p>
<p><a href="https://fosdem.org/2026/schedule/event/WE7NHM-modern-python-monorepo-apache-airflow/">Link to the conference page</a></p>
<h2 id="pyinfra-because-your-infrastructure-deserves-real-code-in-python-not-yaml-soup--loïc-wowi42-tosser">PyInfra: Because Your Infrastructure Deserves Real Code in Python, Not YAML Soup &mdash; Loïc &ldquo;wowi42&rdquo; Tosser</h2>
<p>Loïc is a Frenchman (which, as he himself states, means he <strong>must</strong> have
opinions) and not a YAML fan to put it mildly. That is: YAML as a programming
language, e.g. how it is used in <a href="https://github.com/ansible/ansible">Ansible</a>.</p>
<figure><img src="/images/fosdem2026_loic_tosser.jpg"
    alt="Photo of Loïc Tosser showing a complex Ansible task in YAML"><figcaption>
      <p>Loïc Tosser demonstrating what happens when you ask a config file to be a programming language</p>
    </figcaption>
</figure>

<p><a href="https://pyinfra.com/">PyInfra</a> is an infrastructure as code library to write
Python code which is then translated to shell scripts to run on the target
hosts. So, in contrast to Ansible, you do not need Python on the target. The
target machine only needs SSH and a POSIX shell. You can also configure Docker
containers with PyInfra.</p>
<blockquote>
<p>If it has SSH, PyInfra can talk to it.</p></blockquote>
<p>PyInfra has idempotent operations and built-in diff checking. Declarative
infrastructure with actual code and not YAML. You can use inventory from
Terraform, Coolify or any API.</p>
<p>You can leverage the entire Python packaging ecosystem. Slack integration? Just
use the right Python package.</p>
<p>PyInfra is not only a CLI tool; you can also use it as a library.</p>
<p>According to Loïc, PyInfra is 10 times faster than Ansible, uses 70% less code, has proper code
reuse via <code>import</code> and proper loops instead of <code>with_items</code>. It can have actual
unit tests and can scale to thousands of servers. Also you no longer have error
messages stating that <q>the error appears to be in &hellip; <strong>but may be
elsewhere in the file</strong> &hellip;</q> (looking at you Ansible). PyInfra has
clear error messages without having to specify <code>-vvvv</code> and wading through
hundreds of lines of output.</p>
<p>The suggested migration path:</p>
<ul>
<li>Start small, one playbook at a time</li>
<li>Use your IDE for autocomplete and refactoring</li>
<li>Leverage Python&rsquo;s standard library and the ecosystem with all its packages</li>
<li>Sleep better because you don&rsquo;t have to debug at 3 AM.</li>
</ul>
<p>Is PyInfra production ready? Yes! It has a stable API, is already in use in
production, it&rsquo;s actively maintained and is MIT licensed (so no commercial
entity behind it to steer its direction).</p>
<p>You can get started today with a simple &ldquo;<code>pip install pyinfra</code>&rdquo;.</p>
<p><a href="https://fosdem.org/2026/schedule/event/VEQTLH-infrastructure-as-python/">Link to the conference page</a></p>
<p>(Note from me, Mark: I found Loïc a great speaker: he has lots of energy, is
funny and can transfer his enthusiasm to the room. If the topic interests you
and the video becomes available, I would recommend watching this talk as a great
sales pitch to get started with PyInfra.)</p>
<h2 id="ducks-to-the-rescue---etl-using-python-and-duckdb--marc-andré-lemburg">Ducks to the rescue - ETL using Python and DuckDB &mdash; Marc-André Lemburg</h2>
<p>ETL stands for Extract, Transform, Load. Nowadays we usually do Extract, Load,
Transform because databases are efficient in processing.</p>
<p>DuckDB is an open source, in-process analytics database (OLAP). It is similar
to SQLite, but for OLAP workloads. It has great Python support and uses SQL as
standard query language. It&rsquo;s pip installable, column based
(<a href="https://arrow.apache.org/">Apache Arrow</a>). It&rsquo;s single writer but allows for
multiple readers, so it&rsquo;s not a distributed database.</p>
<p><a href="https://github.com/pola-rs/polars">Polars</a>&rsquo; streaming can help with processing
your data as a line-by-line stream so you don&rsquo;t have to load the whole file in
memory at once.</p>
<p>Example to load a CSV file into DuckDB extremely fast:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">read_csv</span><span class="p">(...)</span><span class="w">
</span></span></span></code></pre></div><p>You can load the data into staging tables first to prepare everything and not
mess up e.g. existing data. You can then transform data in DuckDB, e.g. filter
out unneeded and duplicate data, validate data, fill in missing data, convert
data types, etc. You can do the transforms in SQL. You can even use native
integrations to write to PostgreSQL, MySQL, etc. Or worst case stream to Python.</p>
<p>Guidelines:</p>
<ul>
<li>Know your queries, that is: know how your data is going to be used</li>
<li>Use the Pareto principle (80/20 rule): optimize for queries that are used
often</li>
<li>Keep a healthy balance between performance and space requirements (which are
often trade-offs)</li>
</ul>
<p>Huge datasets: use the <a href="https://github.com/duckdb/ducklake">DuckLake</a> extension.</p>
<p>To get started: &ldquo;<code>uv add duckdb</code>&rdquo;. Do some experiments and see how it works for
you.</p>
<p><a href="https://fosdem.org/2026/schedule/event/S7RELZ-ducks_to_the_rescue_-_etl_using_python_and_duckdb/">Link to the conference page</a></p>
<h2 id="my-takeaways">My takeaways</h2>
<ul>
<li>Yes, FOSDEM is crowded and you may not be able to get into every talk you want
to see in person, but it&rsquo;s still nice to be there. It&rsquo;s well organised and
there&rsquo;s a friendly atmosphere. Lots of interesting projects to see and people
to talk to. And it&rsquo;s convenient if you want to sponsor your favorite projects
by buying some merchandise.</li>
<li>It&rsquo;s worth investigating signing Docker images (in the right way) further.</li>
<li>Lazy imports look useful! Once Python 3.15 lands it&rsquo;s worth doing profiling on
the projects I work on to see if we can use those to speed things up on
startup and save some memory.</li>
<li>At work we recently decided to go for a monorepo for a project. I want to see
if/how <code>uv</code> workspaces and <code>prek</code> can help us.</li>
<li>I&rsquo;ve written a bunch of Ansible roles to configure my humble homelab and
laptop. Perhaps it&rsquo;s time to switch to PyInfra? It sounds promising and might
be worth the investment of migrating to.</li>
</ul>
<h2 id="about-the-trip">About the trip</h2>
<p><figure class="float-right"><img src="/images/fosdem2026_atomium.jpg"
    alt="Picture of the Atomium at night" width="200px"><figcaption>
      <p>The <a href="https://en.wikipedia.org/wiki/Atomium">Atomium</a> at night</p>
    </figcaption>
</figure>

Last year I drove to Brussels on Friday and stayed in the city center in the
<a href="https://cityboxhotels.com/hotels/brussels/citybox-brussels">Citybox Brussels
hotel</a> for one
night, since I had to be home on Sunday. The upside: it was just a short (15
minute?) tram ride to the FOSDEM location. Unfortunately it did mean I had to
drive home that evening.</p>
<p>This year I had more time, so I booked a room at
<a href="https://www.falkohotel.be/">Falko Hotel</a> for two nights. It&rsquo;s about a 20&ndash;30
minute drive (depending on traffic) to the <a href="https://www.interparking.be/en/parkings/brussels/toison-d-or/">parking
garage</a> I used.
And from there about 20 minutes with public transport to the Université libre de
Bruxelles.</p>
<p>Staying another night meant I had more time for sightseeing, had the time to
write this post from my notes and could drive home well rested the next day.</p>
<p>As for tech: besides a phone and laptop, I also brought along two items that
made the trip more comfortable:</p>
<ul>
<li>A <a href="https://mojogear.eu/en/products/mojogear-mini-evo-10-000-mah-power-bank-22-5w">MOJOGEAR Mini
Evo</a>
powerbank to give my phone extra juice to make it through the day. With 10,000
mAh and up to 22.5W of power it&rsquo;s more than sufficient for a day at a
conference. With its small size and less than 175 grams in weight, it&rsquo;s also
easy to carry around.</li>
<li>A <a href="https://www.gl-inet.com/products/gl-sft1200/">GL.iNet Opal (GL-SFT1200)</a>
travel router. I plug it in, hook it up to the hotel internet, start a VPN
connection and all my other devices automatically connect to it and can use
the internet without the hotel snooping on my traffic. (Not that I have an
indication that my hotel would do that, but theoretically they could if I
would not use a VPN.)</li>
</ul>]]></content>
  </entry>
  <entry>
    <title type="html"><![CDATA[FOSDEM 2025]]></title>
    <link rel="alternate" href="https://markvanlent.dev/2025/02/01/fosdem-2025/" type="text/html" />
    <id>https://markvanlent.dev/2025/02/01/fosdem-2025/</id>
    <author>
      <name>Mark van Lent</name>
      <uri>https://markvanlent.dev/about/</uri>
    </author>
    <category term="conference" />
    <category term="go" />
    <category term="kubernetes" />
    
    <updated>2025-06-08T15:27:48Z</updated>
    <published>2025-02-01T00:00:00Z</published>
    <content type="html"><![CDATA[<p>After years of thinking &ldquo;I should have gone&rdquo; after the fact, I finally went to
FOSDEM!</p>
<p>FOSDEM&mdash;which stands for Free and Open source Software Developers’ European
Meeting&mdash;is a free, no registration required, event held in Brussels each year.
Thousands of developers gather there to connect and share ideas. This was the
25th edition.</p>
<p>Below are the notes I took during the talks I attended.</p>
<h2 id="the-state-of-go--maartje-eyskens">The state of Go &mdash; Maartje Eyskens</h2>
<p>This is the eleventh edition of the Go devroom at FOSDEM. It&rsquo;s the first time
the Go devroom is bigger than the Python devroom. (By the way: Rust has the biggest one this
year.)</p>
<p>Go itself is 15 years old now (so it is definitely not a new language anymore) and
has had 25 point zero releases. Go has a stable API, a strong and stable
standard library, dependency management and generics.</p>
<p>Go 1.23 was released on August 13th, 2024. Go version 1.24 will be released this
month. So what has changed since last year? To name a few (Maartje mentioned
more, but I wasn&rsquo;t able to write them all down. Watch the recording if you want
to see them all):</p>
<ul>
<li><strong>Language changes</strong>
<ul>
<li>Go can now <code>range</code> over three new types: iterator functions that yield zero, one or two values per iteration</li>
<li>Generic types can now be used in type aliases.</li>
</ul>
</li>
<li><strong>Tools</strong>
<ul>
<li>&ldquo;<code>go vet</code>&rdquo;: many new warnings</li>
<li>Go tooling now supports JSON output (e.g. &ldquo;<code>go test --json</code>&rdquo;).</li>
<li>Go sets binary version based on VCS: &ldquo;<code>debug.ReadBuildInfo()</code>&rdquo; (use
&ldquo;<code>--buildvcs=false</code>&rdquo; to disable the <code>dirty</code> flag)</li>
<li>&ldquo;<code>go tool</code>&rdquo;: add tools used in builds in <code>go.mod</code>.</li>
<li>Go telemetry is still opt-in. Use &ldquo;<code>go telemetry on</code>&rdquo; and &ldquo;<code>go telemetry off</code>&rdquo; to
switch it on and off. The Go team believes that telemetry will play a
critical role in helping Go development.</li>
</ul>
</li>
<li><strong>Standard lib</strong>
<ul>
<li>Several new helpers for <code>iter</code> functions</li>
<li>Support for quantum proof key exchanges</li>
<li>A bunch of modern algorithms (e.g. PBKDF2 and SHA3) moved to the standard library</li>
<li>&ldquo;<code>os.OpenRoot(&quot;path&quot;)</code>&rdquo; gives you a safe file system. Even protects against symbolic links outside of path.</li>
<li>JSON encoding supports the new <code>omitzero</code> struct tag option</li>
<li>New <code>unique</code> package for faster comparison.</li>
</ul>
</li>
<li><strong>Runtime</strong>
<ul>
<li>Swiss maps</li>
</ul>
</li>
<li><strong>Ports</strong>
<ul>
<li>Go 1.25 will require macOS Monterey or later</li>
<li>Go 1.24 has no support for <code>windows/arm</code></li>
<li>Go 1.24 requires Linux kernel version 3.2 or later (released in 2012)</li>
</ul>
</li>
</ul>
<p>Go conferences this year:</p>
<ul>
<li>Go devroom @ FOSDEM 2025 (today)</li>
<li>Gophercon Latam Brazil (May 5–6)</li>
<li>GopherCon Europe Berlin (June 16–19)</li>
<li>Gophercon New York (August 26–28)</li>
</ul>
<p>Recording: <a href="https://fosdem.org/2025/schedule/event/fosdem-2025-5353-the-state-of-go/">The state of Go</a></p>
<h2 id="the-inner-workings-of-go-generics--anton-sankov">The Inner Workings of Go Generics &mdash; Anton Sankov</h2>
<p>Generics allow you to work with different types, while keeping type safety.</p>
<p>Example:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="kd">func</span><span class="w"> </span><span class="nx">ToSlice</span><span class="p">[</span><span class="nx">T</span><span class="w"> </span><span class="kt">any</span><span class="p">](</span><span class="nx">a</span><span class="p">,</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="nx">T</span><span class="p">)</span><span class="w"> </span><span class="p">[]</span><span class="nx">T</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="k">return</span><span class="w"> </span><span class="p">[]</span><span class="nx">T</span><span class="p">{</span><span class="nx">a</span><span class="p">,</span><span class="w"> </span><span class="nx">b</span><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nx">intSlice</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">ToSlice</span><span class="p">[</span><span class="kt">int</span><span class="p">](</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nx">floatSlice</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">ToSlice</span><span class="p">[</span><span class="kt">float32</span><span class="p">](</span><span class="mf">1.5</span><span class="p">,</span><span class="w"> </span><span class="mf">2.5</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1">// Alternative, use type inference:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nx">intSlice2</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nf">ToSlice</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="mi">2</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1">// Type safety: this will NOT compile:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nx">wrongSlice</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nf">ToSlice</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="s">&#34;string&#34;</span><span class="p">)</span><span class="w">
</span></span></span></code></pre></div><p>Why was Go created:</p>
<ul>
<li>Simplicity (over C++)</li>
<li>Fast compilation times (over C++)</li>
<li>Fast runtime (over C++)</li>
</ul>
<p>Generics complicate all three. However, people started complaining about the
lack of generics the day after Go was introduced to the world. Generics were
proposed a bunch of times and only in 2021 a proposal was accepted.</p>

  <figure>

<blockquote cite="https://research.swtch.com/generic">
The generic dilemma is this: do you want slow programmers, slow compilers and bloated binaries, or slow execution times?
</blockquote>

  <figcaption>
    &mdash;Russ Cox, <cite><a href="https://research.swtch.com/generic">The Generic Dilemma</a></cite>
  </figcaption>
  </figure>


<p>None of the proposals contained an implementation. So more proposals were
needed. Three proposals were written, and the last one was accepted: &ldquo;GC shape
stenciling&rdquo;, which is a middle ground between stenciling (proposal 1, the C++ way) and dictionaries (proposal 2, the Java way).</p>
<p>Anton showed an example of how this works. See <a href="https://asankov.dev/go-generics/">https://asankov.dev/go-generics/</a>
for the slides, which includes a full example, or his GitHub repo:
<a href="https://github.com/asankov/go-generics">https://github.com/asankov/go-generics</a></p>
<p>This proposal still has some drawbacks though: a performance penalty at compile time and at runtime. However, the impact on compile time is small, and the runtime penalty is usually small as well.</p>
<p>The exception to the latter is when you are passing interfaces to generic
methods. In this situation, generics can have a <em>big</em> performance impact. If
this matters to you: don&rsquo;t use generics.</p>
<figure><img src="/images/fosdem2025_anton_sankov_go_generics.png"
    alt="Picture comparing generics in C, C&#43;&#43;, Java and Go"><figcaption>
      <p>Anton comparing generics in different languages (image taken from <a href="https://asankov.dev/go-generics/36?clicks=3">his slides</a>)</p>
    </figcaption>
</figure>

<p>All in all, Go is in a good place with generics.</p>
<p>Recording: <a href="https://fosdem.org/2025/schedule/event/fosdem-2025-5329-the-inner-workings-of-go-generics/">The Inner Workings of Go Generics</a></p>
<h2 id="swiss-maps-in-go--bryan-boreham">Swiss Maps in Go &mdash; Bryan Boreham</h2>
<p>Swiss Map is a new map implementation in Go 1.24.</p>
<p>The name of the way it&rsquo;s implemented is <strong>C</strong>losed <strong>H</strong>ashing. The story goes
that this is where the name &ldquo;Swiss map&rdquo; comes from: CH is the country code of
Switzerland.</p>
<p>There were lots of visuals in the presentation to explain how it works, which I
found hard to take notes on that can be understood without copying the whole
presentation alongside it. It was an interesting presentation, do watch the
video if you want to know more about the topic.</p>
<p>Recording: <a href="https://fosdem.org/2025/schedule/event/fosdem-2025-6049-swiss-maps-in-go/">Swiss Maps in Go</a></p>
<h2 id="go-ing-easy-on-memory-writing-gc-friendly-code--sümer-cip">Go-ing Easy on Memory: Writing GC-Friendly code &mdash; Sümer Cip</h2>
<p>There&rsquo;s a lot of theoretical info around garbage collection (GC), but fewer actual
tips and tricks. This presentation aims to be as practical as possible.</p>
<p>An example of why this topic is important: Datadog switched from Go to Rust
because their service spent 30% of CPU resources on GC
(<a href="https://www.datadoghq.com/blog/engineering/timeseries-indexing-at-scale/">source</a>).</p>
<p>Some of the tips from Sümer:</p>
<ul>
<li>Reducing size almost always has compounding benefits</li>
<li>Returning a pointer to a local variable makes it escape to the heap; passing a pointer to a function you call usually does not (note: stack is better than heap for performance)</li>
<li><code>interface{}</code> and generics escape to heap</li>
<li>Avoid pointers! GC overhead is linear with the number of pointers.</li>
<li>Try keeping map key/values sizes under 128 bytes</li>
<li>&ldquo;Copying is expensive&rdquo; is a myth. Copying cache lines is the same as copying a pointer</li>
<li>Remember zero allocation libraries? Use them!</li>
<li>Reuse slices (e.g. <code>a = append(a[:0], 10, 20)</code> instead of <code>a = append(a, 10, 20)</code>)</li>
<li>Tune GC by using <code>GOGC</code> and <code>GOMEMLIMIT</code></li>
<li>Profile and benchmark your code</li>
</ul>
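<p>The tip about reusing slices can be sketched as follows. Reslicing to zero length keeps the capacity, so subsequent appends write into the existing backing array instead of allocating a new one. The function name <code>collectEven</code> is just an example of mine, not something from the talk.</p>

```go
package main

import "fmt"

// collectEven appends the even numbers from src to dst[:0], so the
// backing array of dst is reused instead of allocating a new one.
func collectEven(dst, src []int) []int {
	dst = dst[:0] // keep the capacity, drop the old contents
	for _, v := range src {
		if v%2 == 0 {
			dst = append(dst, v)
		}
	}
	return dst
}

func main() {
	buf := make([]int, 0, 8) // allocated once

	buf = collectEven(buf, []int{1, 2, 3, 4})
	fmt.Println(buf, cap(buf))

	// The second call overwrites the previous contents of buf.
	buf = collectEven(buf, []int{6, 7, 8})
	fmt.Println(buf, cap(buf))
}
```

<p>Keep in mind that the previous contents are overwritten on the next call, so this only works when the caller is done with the old result.</p>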
<p>Execution tracer is an underrated tool. It&rsquo;s a great cinematic visualization.
It&rsquo;s (kind of) safe to use on production: with Go 1.21 overhead drops to ~1-2%.
For more information see <a href="https://go.dev/blog/execution-traces-2024">More powerful Go execution
traces</a>.</p>
<p>As always: for more tips, details and background information (e.g. on memory and
garbage collection) watch the recording.</p>
<p>Other interesting talk: <a href="https://archive.fosdem.org/2018/schedule/event/faster/">Make your Go Faster</a> by Bryan
Boreham, FOSDEM 2018.</p>
<p>Recording: <a href="https://fosdem.org/2025/schedule/event/fosdem-2025-5343-go-ing-easy-on-memory-writing-gc-friendly-code/">Go-ing Easy on Memory: Writing GC-Friendly code</a></p>
<h2 id="build-better-go-release-binaries--dimitri-john-ledkov">Build better Go release binaries &mdash; Dimitri John Ledkov</h2>
<p>The focus of this talk is on Linux binaries, but may also be applicable to other
environments. It&rsquo;s basically a list of tips. I wrote these down as reminders to
later look into these in more depth.</p>
<ul>
<li>&ldquo;<code>go build -ldflags -w</code>&rdquo; to remove debug information&mdash;which is on by
default&mdash;since it is often unused in production anyway but can be quite large
in size.</li>
<li>&ldquo;<code>go build -trimpath</code>&rdquo; to keep full file paths out of the binary: they
take up space and make builds non-reproducible.</li>
<li>&ldquo;<code>go build -tags netgo,osusergo</code>&rdquo; (for container/portable binaries,
&ldquo;<code>CGO_ENABLED=1 go build</code>&rdquo; for explicit host OS resolution)</li>
<li>&ldquo;<code>GOAMD64=v2 GOARM64=v8.0</code>&rdquo; for production hardware that is not 20 years old,
this will improve performance of your binaries</li>
<li>&ldquo;<code>go build -buildmode=pie</code>&rdquo;: position independent code/executable can improve
security. Use this for dynamic libraries.</li>
<li>&ldquo;<code>go build -ldflags=&quot;-X main.Version=$(git describe ...)&quot;</code>&rdquo;</li>
<li>Go toolchain doesn&rsquo;t respect <code>CFLAGS</code>/<code>CXXFLAGS</code>. Use <code>CGO_CFLAGS</code>,
<code>CGO_CXXFLAGS</code>, etc.</li>
<li>Use &ldquo;<code>govulncheck -mode=binary</code>&rdquo; to report module level CVEs. If your binary has
symbol tables, it reports symbol level CVEs. If you keep your symbols in your
binaries, the vulnerability checker could check better if the vulnerability is
actually affecting you.</li>
<li>Do <strong>not</strong> use &ldquo;<code>go build -ldflags -s</code>&rdquo;. Do <strong>not</strong> use &ldquo;<code>strip --strip-all</code>&rdquo;.
Verify with &ldquo;<code>go tool nm</code>&rdquo;.</li>
<li>Bump your &ldquo;<code>toolchain go1.x.y</code>&rdquo; stanza regularly. The same code built with a new
Go toolchain can be safer.</li>
</ul>
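<p>For the <code>-X main.Version=...</code> tip, the Go side could look like the sketch below. The variable has a default value that the linker overrides at build time. The helper <code>versionString</code>, the application name and the version value <code>1.2.3</code> in the comment are placeholders of mine.</p>

```go
package main

import "fmt"

// Version is overridden at build time, for example with:
//
//	go build -ldflags="-X main.Version=1.2.3"
//
// The -X flag can only set string variables.
var Version = "dev"

// versionString is a hypothetical helper that formats the version for display.
func versionString(app string) string {
	return fmt.Sprintf("%s %s", app, Version)
}

func main() {
	fmt.Println(versionString("myapp"))
}
```

<p>Without the linker flag the binary simply reports the default value, which makes it easy to recognise local development builds.</p>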
<p>Recording: <a href="https://fosdem.org/2025/schedule/event/fosdem-2025-4406-build-better-go-release-binaries/">Build better Go release binaries</a></p>
<h2 id="kubernetes-outside-of-the-cloud-lessons-learned-after-3-years--nadia-santalla">Kubernetes outside of the cloud: Lessons learned after 3 years &mdash; Nadia Santalla</h2>
<p>For Nadia a self managed Kubernetes cluster, out of the cloud, is: managing your
own hardware, managing your own control plane, and not relying on external
services (like DNS).</p>
<p>A Kubernetes node is a properly configured and running <code>kubelet</code> process. A
control plane is a series of services that make Kubernetes work: an API, a
database (etcd often) and a bunch of clients using the API. These services often
run on Kubernetes itself (often on one or more dedicated machines).</p>
<p>There&rsquo;s an inception problem: if the services that run the control plane also
run in that control plane, how do we start? The answer: with a static manifest.</p>
<p>Useful tools if you want to run your own cluster:</p>
<ul>
<li>Kubeadm: generates static manifests and consistent config files, creates
RBAC objects and TLS certs, etc.</li>
<li>Kine (instead of etcd): etcd is not friendly to SSD lifespan. Kine can also be
deployed with Kubeadm.</li>
<li>Cilium as the CNI plugin: well documented, less hard to debug, lots of knobs
to tweak.</li>
<li>Cilium egress gateway: useful for multitenancy.</li>
<li>MetalLB: You&rsquo;ll probably need a load balancer, this is a nice one. Implements
failover using Gossip.</li>
<li>External-dns: creates A and AAAA records for ingress, load balancer services and
custom resources. But it will <strong>not</strong> serve the DNS records itself.</li>
<li>txqueuelen/stateless-dns: combines external-dns with PowerDNS. It makes
PowerDNS stateless.</li>
</ul>
<blockquote>
<p>GitOps is what makes Kubernetes worth it.</p></blockquote>
<p>Recording: <a href="https://fosdem.org/2025/schedule/event/fosdem-2025-4387-kubernetes-outside-of-the-cloud-lessons-learned-after-3-years/">Kubernetes outside of the cloud: Lessons learned after 3 years</a></p>
<h2 id="return-of-go-without-wires--ron-evans">Return Of Go Without Wires &mdash; Ron Evans</h2>
<p>I might be selling him short by summarizing it like this, but Ron showed his
adventures with his home made &ldquo;find my&rdquo; device, using
<a href="https://tinygo.org/">TinyGo</a>. If you like fiddling around with Bluetooth
devices, definitely watch the recording.</p>
<p>Related link: <a href="https://github.com/hybridgroup/go-haystack">https://github.com/hybridgroup/go-haystack</a></p>
<p>Recording: <a href="https://fosdem.org/2025/schedule/event/fosdem-2025-5907-return-of-go-without-wires/">Return Of Go Without Wires</a></p>
<h2 id="go-lightning-talks">Go Lightning Talks</h2>
<p>These are short talks (8 minutes if I recall correctly), and I only took short
&ldquo;check this out later&rdquo; notes:</p>
<ul>
<li>Check what the <a href="https://pkg.go.dev/sync">sync package</a> has to offer.</li>
<li>A go links implementation: <a href="https://github.com/tobiaskohlbau/golinks">https://github.com/tobiaskohlbau/golinks</a></li>
<li><a href="https://github.com/nikolayk812/pgx-outbox">pgx-outbox</a>, a solution for the
<a href="https://thorben-janssen.com/dual-writes/">dual write problem</a> using the
<a href="https://microservices.io/patterns/data/transactional-outbox.html">transactional outbox pattern</a></li>
<li>Generate RESTful HTTP handlers for a resource/entity with
<a href="https://github.com/dolanor/rip">https://github.com/dolanor/rip</a></li>
<li>gno, &ldquo;go for dapps, design for modularity, composability and safety&rdquo;:
<a href="https://gno.land/">https://gno.land/</a></li>
</ul>
<p>Recording: <a href="https://fosdem.org/2025/schedule/event/fosdem-2025-4609-go-lightning-talks/">Go Lightning Talks</a></p>
<h2 id="summary--reflections">Summary / reflections</h2>
<p><em>This section was added on 2025-02-02 after I have had time to reflect on what
I&rsquo;ve heard and seen.</em></p>
<p>Looking back at yesterday, I must say I liked FOSDEM. I&rsquo;ve had a great day and
was able to listen to a bunch of interesting talks (more on those in a bit). I
usually go to these kinds of conferences to get inspired and that also did happen
yesterday.</p>
<p>And while I&rsquo;m not really a <a href="https://en.wiktionary.org/wiki/hallway_track">hallway track</a>
kind of person, I did find the atmosphere at the stands pleasant, more so than at the
&ldquo;normal&rdquo;, commercial vendor booths you see at most conferences.</p>
<p>Now, about those talks. I think my biggest takeaways are:</p>
<ul>
<li>Using generics in Go is not something to shy away from for performance reasons
(at least in most cases). I&rsquo;ll keep them more in my mind when writing code.</li>
<li>Be more aware of garbage collection. While the services I&rsquo;m working on at work
do not have any performance issues (yet), it is good to be aware of some things,
like reusing slices. Especially for a person rather new to Go (me) it is
useful to form good habits.</li>
<li>There are lots of ways to tweak your Go binary builds. These are definitely
worth investigating and including them in our code base.</li>
<li>On a personal note: as the owner of a Raspberry Pi Pico W I&rsquo;ve learned that I
could use TinyGo instead of MicroPython to tinker with it.</li>
</ul>]]></content>
  </entry>
  <entry>
    <title type="html"><![CDATA[PyGrunn 2024]]></title>
    <link rel="alternate" href="https://markvanlent.dev/2024/05/17/pygrunn-2024/" type="text/html" />
    <id>https://markvanlent.dev/2024/05/17/pygrunn-2024/</id>
    <author>
      <name>Mark van Lent</name>
      <uri>https://markvanlent.dev/about/</uri>
    </author>
    <category term="conference" />
    <category term="python" />
    
    <updated>2024-05-17T16:42:29Z</updated>
    <published>2024-05-17T00:00:00Z</published>
    <content type="html"><![CDATA[<p>Notes from my day at the 12th edition of PyGrunn.</p>
<p><a href="https://pygrunn.org/">PyGrunn</a> is a Python focussed, one day conference held in
Groningen<sup id="fnref:1"><a href="#fn:1" class="footnote-ref" role="doc-noteref">1</a></sup>. Or as the organizers more eloquently phrase this:</p>
<figure class="float-right"><img src="/images/pygrunn24_banner.jpg"
    alt="PyGrunn banner" width="150px">
</figure>

<blockquote>
<p>PyGrunn is the &ldquo;Python and friends&rdquo; developer conference with a local
footprint and global mindset. Firmly rooted in the open source culture, it
aims to provide the leaders in advanced internet technologies a platform to
inform, inspire and impress their peers.</p></blockquote>
<p>Before I start with my notes I want to give a shout-out to Reinout van Rees.
<a href="https://reinout.vanrees.org/weblog/tags/pygrunn.html">His (PyGrunn) summaries</a>
are excellent. I&rsquo;m always impressed by the quality of them and how little time
he needs to write them. Where I&rsquo;m only able to make notes and need to write them
out afterwards, Reinout has the summary (as a coherent story) ready before the
speaker has unhooked their laptop. So if you are interested in one of the talks
he has attended, head over
<a href="https://reinout.vanrees.org/weblog/tags/pygrunn.html">there</a> first.</p>
<h2 id="platform-engineering-python-perspective--andrii-mishkovskyi">Platform Engineering: Python Perspective &mdash; Andrii Mishkovskyi</h2>
<p>Why do platform engineering teams exist? We&rsquo;ve seen a &ldquo;shift left&rdquo; of
responsibilities towards the developer e.g. QA, operations (DevOps). But we
(software developers) are trained to write code, not to e.g. monitor it.</p>
<p>So where does a humble developer start? There is an abundance of choices on the
tools to use. What do you pick for package management? Or continuous
integration? Or deployment, code quality, observability, etc.  This freedom of
choice comes with a cost. We start with discussing which tools to use instead of
the problem the customer is facing. And depending on which choice we make, the
result may make reasoning about the software more complex.</p>
<p>So what should you do?</p>
<p>Andrii broke it down into three parts:</p>
<ul>
<li>You observe (i.e. you read a lot to get the lay of the land)</li>
<li>You execute (this is the actual software development part)</li>
<li>And then you collect feedback (you reach out to teams, you observe how their work has changed)</li>
</ul>
<p>Some of the platform engineering team deliverables:</p>
<ul>
<li>Documentation</li>
<li>Self service portal (tip: look into <a href="https://backstage.io/">Backstage</a>)</li>
<li>Boilerplates</li>
<li>APIs</li>
</ul>
<p>The &ldquo;consumers&rdquo; are developers, compliance teams and other platform teams. The
goal is to:</p>
<ul>
<li>Have reasonable defaults</li>
<li>Remove redundancy</li>
<li>Keep things consistent</li>
</ul>
<p>To get a feel for the scale of things: at Andrii&rsquo;s company there are over 160
services, developed by 300+ developers in 500+ repositories. They total up
to 3 million lines of code and 6 million lines of YAML.</p>
<p>Templates (boilerplates) provide a paved path. The goal is to have teams spend
as little time as possible when starting a project. The templates use a certain
set of tools that are supported. Teams are free to use different tools though if
they want to.</p>
<p>Andrii uses <a href="https://github.com/cookiecutter/cookiecutter">cookiecutter</a>
templates at work. It&rsquo;s not his choice per se, but it&rsquo;s what was already in place
when he joined. There are currently three templates in use. They have evolved
over time. For example in the last nine years, over 800 changes have been made
(that is more than one change per week on average).</p>
<p>The evolution has left its marks: the templates have a lot of code duplication.
There is also code specific to a tiny minority of projects (only about 8 out of
the 500+). This means that most projects start with deleting code after using
the boilerplate.</p>
<p>And that also relates to the downside of using cookiecutter the way they do. Instead of
just using it to get started with a project, they also use it incrementally. But
cookiecutter does not have versioning built-in. So if you remove a file and then
reapply cookiecutter, the file is happily created again.</p>
<p>While Andrii is aware of the issues (and thinks
<a href="https://github.com/copier-org/copier">copier</a> might be a better alternative),
it is hard to replace practices that are already in use. And cookiecutter is
great for getting started with a project.</p>
<p>With regard to the standardization, they use the following in the templates:</p>
<ul>
<li><code>pyproject.toml</code> for all projects</li>
<li>Poetry (instead of setuptools + pip-tools)</li>
<li>sprinkle <a href="https://github.com/renovatebot/renovate">Renovate</a> on top for
automatically updating dependencies</li>
</ul>
<p>As it currently stands, 99 projects have migrated to <code>pyproject.toml</code> and
Poetry in the last 2 years. That makes sense because it takes time for projects to
transition. Plus they are not <em>required</em> to migrate; again: the templates are there
to help, not to limit the users. Renovate has been adopted more quickly.</p>
<p>Migrating from e.g. <code>pkg_resources</code> to <code>pkgutil</code> or
<a href="https://peps.python.org/pep-0420/">PEP 420</a> for namespace packages is hard.
Templates can help with that. However, cookiecutter does not actually <em>manage</em>
files. So if a file has been removed from a template, rerunning cookiecutter
does not remove the file from the project. So that requires some care.</p>
<p>When they migrated from a monolithic application to a microservices
architecture, authentication/authorization became an issue. There was no
visibility for teams, no transparency and no accountability. To combat this, they
created an API where applications can declare the required access and scopes
in a YAML file. Maintainers can approve this access. And this also allows for
CI/CD to check access. A CLI tool can verify the validity of the YAML and check
if the declared access has actually been approved.</p>
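<p>To make the idea concrete, here is a minimal sketch of what such a check could look like. The declaration format, the field names (<code>service</code>, <code>access</code>, <code>target</code>, <code>scopes</code>) and the approval store are all hypothetical: the talk did not show the actual schema, so treat this as an illustration of the approach and not as Andrii&rsquo;s implementation. The declaration is shown as the dictionary you would get after parsing the YAML file.</p>

```python
# Hypothetical access declaration, as it might look after parsing the YAML file.
declaration = {
    "service": "orders",
    "access": [
        {"target": "billing", "scopes": ["invoices:read"]},
        {"target": "users", "scopes": ["profile:read", "profile:write"]},
    ],
}

# Hypothetical record of what the maintainers have approved so far.
approved = {
    ("orders", "billing"): {"invoices:read"},
    ("orders", "users"): {"profile:read"},
}


def validate(declaration):
    """Return a list of structural problems found in the declaration."""
    problems = []
    if not isinstance(declaration.get("service"), str):
        problems.append("missing 'service' name")
    for index, entry in enumerate(declaration.get("access", [])):
        if not isinstance(entry.get("target"), str):
            problems.append(f"access[{index}]: missing 'target'")
        if not entry.get("scopes"):
            problems.append(f"access[{index}]: no scopes declared")
    return problems


def unapproved(declaration, approved):
    """Return the (target, scope) pairs that are declared but not approved."""
    service = declaration["service"]
    missing = []
    for entry in declaration["access"]:
        granted = approved.get((service, entry["target"]), set())
        for scope in entry["scopes"]:
            if scope not in granted:
                missing.append((entry["target"], scope))
    return missing
```

<p>A CI/CD job could fail the build when either function returns a non-empty list, which is one way to get the visibility and accountability mentioned above.</p>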
<h2 id="securing-your-team-solution-and-company-to-embrace-chaos--edzo-botjes">Securing your team, solution and company to embrace chaos &mdash; Edzo Botjes</h2>
<p>Edzo started by sharing a link to
<a href="https://docs.google.com/presentation/d/1j7HgfiZXd51QdPHD1_yptzbxx8s9O2BNsA0dCG-Ajes/edit#slide=id.g2dd9784acd0_0_108">his slides</a>
and warned us that it usually takes two weeks for people to digest the contents
of his talk. He would overload us with information. And that&rsquo;s where I decided
to solely concentrate on the talk and not on note taking.</p>
<blockquote>
<p>Nobody knows what they are doing. Embrace this.</p></blockquote>
<p>Even a simple, deterministic system like a <a href="https://en.wikipedia.org/wiki/Double_pendulum">double
pendulum</a> has an outcome that is unpredictable in practice.
In other (my) words: the whole world is in chaos and unpredictable.</p>
<p>When presented with information, everyone processes it differently and
understands something else (also see the viral phenomenon of <a href="https://en.wikipedia.org/wiki/The_dress">the dress</a>).</p>
<p><img src="/images/pygrunn24_edzo_botjes.png" alt="Different perspectives: one person sees a circle, another a rectangle"></p>
<p>What worked for Edzo was to embrace chaos. To do this he let go of his desire to
control things and to create a predictable outcome.</p>
<h2 id="descriptors-decoding-the-magic--alex-dijkstra">Descriptors: Decoding the Magic &mdash; Alex Dijkstra</h2>
<p>Many people have used descriptors without even being aware of it.</p>
<p>From the documentation:</p>

  <figure>

<blockquote cite="https://docs.python.org/3/howto/descriptor.html">
Descriptors let objects customize attribute lookup, storage, and deletion.
</blockquote>

  <figcaption>
    &mdash;<cite><a href="https://docs.python.org/3/howto/descriptor.html">Descriptor Guide</a></cite>
  </figcaption>
  </figure>


<p>You can view descriptors as reusable @properties. A descriptor implements the <code>__get__</code> and
<code>__set__</code> methods (and when needed <code>__delete__</code>).</p>
<p>Alex showed a bunch of examples. This is the template he showed to introduce
descriptors:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">class</span> <span class="nc">MyDescriptor</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="fm">__get__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">obj</span><span class="p">,</span> <span class="n">owner</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="c1"># owner is the class to which the instance belongs</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="n">obj</span><span class="o">.</span><span class="vm">__dict__</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">private_name</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="fm">__set__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">obj</span><span class="p">,</span> <span class="n">val</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="c1"># self is descriptor instance.</span>
</span></span><span class="line"><span class="cl">        <span class="n">obj</span><span class="o">.</span><span class="vm">__dict__</span><span class="p">[</span><span class="bp">self</span><span class="o">.</span><span class="n">private_name</span><span class="p">]</span> <span class="o">=</span> <span class="n">val</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="nf">__set_name__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">owner</span><span class="p">,</span> <span class="n">name</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">public_name</span> <span class="o">=</span> <span class="n">name</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">private_name</span> <span class="o">=</span> <span class="sa">f</span><span class="s1">&#39;_</span><span class="si">{</span><span class="n">name</span><span class="si">}</span><span class="s1">&#39;</span>
</span></span></code></pre></div><p>His example of how to use this:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">class</span> <span class="nc">MyClass</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">    <span class="n">value</span> <span class="o">=</span> <span class="n">MyDescriptor</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">myinstance</span> <span class="o">=</span> <span class="n">MyClass</span><span class="p">()</span>
</span></span><span class="line"><span class="cl"><span class="n">myinstance</span><span class="o">.</span><span class="n">value</span> <span class="o">=</span> <span class="mi">5</span>  <span class="c1"># MyDescriptor.__set__</span>
</span></span><span class="line"><span class="cl"><span class="n">myinstance</span><span class="o">.</span><span class="n">value</span>      <span class="c1"># MyDescriptor.__get__</span>
</span></span></code></pre></div><p>Using descriptors you can run extra logic when getting or setting the value. E.g. in a
class you can enforce that an attribute has a certain type.</p>
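<p>A minimal sketch of such a type-enforcing descriptor, building on the template above. The names <code>Typed</code> and <code>Point</code> are made up for this illustration and did not appear in the talk. The <code>obj is None</code> guard is also an addition: it handles the case where the attribute is looked up on the class itself instead of on an instance.</p>

```python
class Typed:
    """Descriptor that only accepts values of one type (illustrative name)."""

    def __init__(self, expected_type):
        self.expected_type = expected_type

    def __set_name__(self, owner, name):
        # Called once, when the owning class is created.
        self.public_name = name
        self.private_name = f"_{name}"

    def __get__(self, obj, owner=None):
        if obj is None:
            # Accessed on the class itself: return the descriptor.
            return self
        return obj.__dict__.get(self.private_name)

    def __set__(self, obj, val):
        if not isinstance(val, self.expected_type):
            raise TypeError(
                f"{self.public_name} must be {self.expected_type.__name__}, "
                f"got {type(val).__name__}"
            )
        obj.__dict__[self.private_name] = val


class Point:
    x = Typed(int)
    y = Typed(int)

    def __init__(self, x, y):
        # These assignments go through Typed.__set__ as well.
        self.x = x
        self.y = y
```

<p>With this in place <code>Point(1, 2)</code> works as usual, while <code>Point(1, "two")</code> raises a <code>TypeError</code>. The validation logic lives in one place and can be reused for any number of attributes.</p>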
<p>The <code>__set_name__</code> method was introduced in Python 3.6. You do not need it in every
descriptor, but it also doesn&rsquo;t hurt to include it.</p>
<p>Do you want to use descriptors all over the place? No. They can be useful: you
can create clean APIs with them and this helps if the API is used frequently.
However, they do add some overhead and make the code a bit more complex.</p>
<p>Resources:</p>
<ul>
<li><a href="https://docs.python.org/3/howto/descriptor.html">https://docs.python.org/3/howto/descriptor.html</a></li>
<li>Luciano Ramalho&rsquo;s book
<a href="https://www.oreilly.com/library/view/fluent-python-2nd/9781492056348/">Fluent Python</a>
(Luciano also did a few talks about descriptors)</li>
</ul>
<h2 id="release-the-krakend--erik-jan-blanksma">Release the KrakenD &mdash; Erik-Jan Blanksma</h2>
<p><a href="https://www.krakend.io/">KrakenD</a> is an API gateway product. Erik-Jan likes it
so much he wanted to share his experience with it.</p>
<p>Projects can start out simple, with a monolith that is accessed via a web
client. Before you know it, there are several services and multiple types of
clients. The solution is to introduce an API Gateway in the middle. It can then
handle the incoming requests.</p>
<p>API Gateway in short:</p>
<ul>
<li>It sits between clients and backend (as a sort of portal).</li>
<li>It hides internal complexity of backend for the clients.</li>
<li>The gateway is a great place to introduce things like
authorization/authentication, logging, load balancing, caching, etc.</li>
</ul>
<p>KrakenD is one of the available API Gateways. It is open source, but there&rsquo;s
also an enterprise version with extra features. KrakenD is implemented in Go and
offers a bunch of features out of the box (monitoring, throttling, request and
response manipulation). KrakenD integrates with e.g. observability tools (Jaeger,
Grafana, the Elastic stack), authorization/authentication services and queues
(RabbitMQ).</p>
<p>KrakenD is a stateless process, so no database is needed. It takes JSON (or
YAML) config. It can combine the results of multiple backend API calls and
return them as a single response.</p>
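<p>As an illustration of that aggregation, the sketch below generates a small configuration with one endpoint that is backed by two services. The field names (<code>version</code>, <code>endpoints</code>, <code>endpoint</code>, <code>backend</code>, <code>host</code>, <code>url_pattern</code>) follow the KrakenD configuration format as I understand it, so double-check them against the KrakenD documentation. The hosts and paths are hypothetical and this example was not part of the talk.</p>

```python
import json

# Hypothetical internal services; replace with your own hosts and paths.
config = {
    "version": 3,
    "endpoints": [
        {
            # One public endpoint ...
            "endpoint": "/dashboard/{user_id}",
            "method": "GET",
            # ... whose response is assembled from two backend calls.
            "backend": [
                {
                    "host": ["http://users.internal:8000"],
                    "url_pattern": "/users/{user_id}",
                },
                {
                    "host": ["http://orders.internal:8000"],
                    "url_pattern": "/users/{user_id}/orders",
                },
            ],
        }
    ],
}

# The rendered JSON is what you would store as the gateway configuration.
rendered = json.dumps(config, indent=2)
```

<p>Generating the configuration from code is only one option. The flexible configuration mentioned in the tips below serves a similar purpose without a separate build step.</p>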
<p>Tips:</p>
<ul>
<li>Use the KrakenDesigner (makes it easy to explore what&rsquo;s possible). Note that
you do not want to use this in production.</li>
<li>You&rsquo;ll want to split up the configuration when it grows. By using flexible
configuration you can combine so called partials, settings and templates.</li>
<li>KrakenD can check and even audit your configuration.</li>
<li>Since the configuration is in JSON, you can generate
OpenAPI.json from the KrakenD config. You can use this for Swagger.</li>
</ul>
<p>KrakenD is a great tool to manage your APIs. It is lightweight, fast, easy to
configure and has lots of functions out of the box. It is versatile and
extensible. By using it you can make your architecture more agile.</p>
<p>However, it also means that you will have to manage the API Gateway
configuration. And a change in the configuration means you will have to restart
the process.</p>
<h2 id="general-tips">General tips</h2>
<p>Some general notes:</p>
<ul>
<li>Have a look at:
<ul>
<li><a href="https://docs.pydantic.dev/latest/">Pydantic</a></li>
<li><a href="https://github.com/aws/chalice">Chalice</a></li>
<li><a href="https://xata.io/">Xata</a></li>
</ul>
</li>
<li>It can be helpful to use functional programming (e.g. using closures) instead of
defaulting to classes and methods.</li>
</ul>
<div class="footnotes" role="doc-endnotes">
<hr>
<ol>
<li id="fn:1">
<p>&ldquo;Grunn&rdquo; is what Groningen is called in the regional language&#160;<a href="#fnref:1" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
</ol>
</div>]]></content>
  </entry>
  <entry>
    <title type="html"><![CDATA[All Day DevOps 2022]]></title>
    <link rel="alternate" href="https://markvanlent.dev/2022/11/10/all-day-devops-2022/" type="text/html" />
    <id>https://markvanlent.dev/2022/11/10/all-day-devops-2022/</id>
    <author>
      <name>Mark van Lent</name>
      <uri>https://markvanlent.dev/about/</uri>
    </author>
    <category term="conference" />
    <category term="costs" />
    <category term="devops" />
    <category term="failure" />
    <category term="metrics" />
    <category term="observability" />
    <category term="sre" />
    <category term="terraform" />
    
    <updated>2022-11-10T21:32:11Z</updated>
    <published>2022-11-10T00:00:00Z</published>
    <content type="html"><![CDATA[<p>Today was the 7th <a href="https://www.alldaydevops.com/">All Day DevOps</a>. Just like
<a href="/2021/10/28/all-day-devops-2021/">last year</a> the organizers managed to get 180
speakers, spread over 6 tracks, to inform, teach and entertain us. As usual I
made some notes of the talks that I attended.</p>
<p><img src="/images/all_day_devops_2022_banner.webp" alt="All Day DevOps 2022 banner"></p>
<h2 id="keynote-the-rise-of-the-supply-chain-attack--sean-wright">Keynote: The Rise of the Supply Chain Attack &mdash; Sean Wright</h2>
<p>Dependency managers (like Apache Maven) changed the way software is being
developed: they make it much easier to use dependencies in your
application (you don&rsquo;t have to fetch them manually yourself). As a result,
building your own libraries is now less common, since using an available (open source)
library is much easier.</p>
<p>To get a feel for the scale: NPM serves 2.1 trillion annual downloads
(requests). This obviously includes automatic builds, but the growth over the
last couple of years has been exponential nonetheless. And using third party
libraries can be a very good thing. Good libraries can be much more secure and
have fewer bugs than some library developed in house.</p>
<p>Types of supply chain attacks:</p>
<dl>
<dt>Dependency confusion</dt>
<dd>Register a package in a public repository under the same name as a private
package (which you know is used by a company). If the package manager searches
for the package in the public repository <em>before</em> the private one, the
malicious package is used.</dd>
<dt>Typo squatting / masquerading</dt>
<dd>Use misspelled or &ldquo;familiar&rdquo; names in the hopes someone uses the wrong package.</dd>
<dt>Malicious package</dt>
<dd>Malicious payload in a package that promises (and perhaps even delivers)
useful functionality.</dd>
<dt>Malicious code injection</dt>
<dd>Compromise of an existing package (e.g. by compromising the build system or
repository).</dd>
<dt>Account takeover</dt>
<dd>Malicious actor can log in as the original author.</dd>
<dt>Exploiting vulnerabilities in packages</dt>
<dd>Using a known or unknown vulnerability in a library.</dd>
<dt>Protestware</dt>
<dd>For example a legitimate library with notes protesting against the war in
Ukraine.</dd>
</dl>
<p><em>After this Sean unfortunately dropped from the session.</em></p>
<h2 id="taking-control-over-cloud-costs--amir-shaked">Taking Control Over Cloud Costs &mdash; Amir Shaked</h2>
<p>Reasons to analyze your cloud costs:</p>
<ul>
<li>Costs translate to profit</li>
<li>Usage is directly tied to costs (and if you don&rsquo;t monitor properly, your spend
could increase unexpectedly)</li>
<li>Be in control: by monitoring you can determine what impact certain actions
have on your business</li>
<li>Be intelligent: by knowing what you spend where, you can connect it to your
business KPIs</li>
<li>Save money</li>
</ul>
<p>When you scale (hyper growth), you&rsquo;ll want to know whether your costs grow faster
than the number of users/sales/etc. You don&rsquo;t want to end up in a situation
where more success means <em>less</em> profit.</p>
<p>Things to look for in your environment:</p>
<ul>
<li>Unused resources</li>
<li>Over commitment</li>
<li>Under commitment</li>
<li>Price per unit we care about (e.g. users, sales, etc)</li>
<li>Bugs</li>
</ul>
<p>Best practices you can arrange from day one:</p>
<ul>
<li>Export billing data and create e.g. dashboards</li>
<li>Structure and organize resources for fine-grained cost reporting</li>
<li>Configure policies on who can spend and who has administrator permissions</li>
<li>Have a culture of &ldquo;showback&rdquo; in your organization. Make your employees aware of
the costs, have budgets and alerts.</li>
</ul>
<p><img src="/images/all_day_devops_2022_amir_shaked.webp" alt="Amir Shaked with the takeaways from his talk"></p>
<h2 id="beyond-monitoring-the-rise-of-observability-platform--sameer-paradkar">Beyond Monitoring: The Rise of Observability Platform &mdash; Sameer Paradkar</h2>
<p>Customer experience is important for your revenue. You need to deliver your
service and make sure you know how well you&rsquo;re able to do so. Observability
helps you find the needle in the haystack, identify the issue and respond before
your customers are affected.</p>
<p>Pillars of observability:</p>
<dl>
<dt>Metrics</dt>
<dd>Numeric values measured over time. Easy to query and retained for longer periods.</dd>
<dt>Logs</dt>
<dd>Records of events that occurred at a particular time (plain text, binary,
etc). Used to understand the system and what&rsquo;s going on.</dd>
<dt>Traces</dt>
<dd>Represent the end-to-end journey of a user request through the subsystems of
your architecture.</dd>
</dl>
<p>Relevant key performance indicators (KPIs) and key result areas (KRAs):</p>
<ul>
<li>Customer experience</li>
<li>Mean time to repair (MTTR)</li>
<li>Mean time between failure (MTBF)</li>
<li>Reliability and availability</li>
<li>Performance and scalability</li>
</ul>
<p>An observability platform gives you more visibility into your system&rsquo;s health
and performance. It allows you to discover unknown issues. As a result you&rsquo;ll
have fewer problems and blackouts. You can even catch issues in the build phase
of the software development process. The platform helps understand and debug
systems in production via the data you collected.</p>
<p><img src="/images/all_day_devops_2022_sameer_paradkar.webp" alt="Sameer Paradkar shows a reference architecture of an observability solution"></p>
<p>AIOps applies machine learning to the data you&rsquo;ve collected. It&rsquo;s a next stage
of maturity. Its goal is to create a system with automated functions, freeing up
engineers to work on other things. Automating remediation of issues can also
greatly reduce response time and mean time to repair. This means that the
customer experience is restored faster (or is never degraded to begin with).</p>
<h2 id="failure-is-not-an-option-its-a-fact--ixchel-ruiz">Failure Is Not an Option. It&rsquo;s a Fact &mdash; Ixchel Ruiz</h2>
<p>Failure can cause a deep emotional response: we can get depressed and it can
make us physically sick. On one side of the spectrum, a failure can cause harm
to other people. On the other side we could embrace failure and make things safe to
fail. Failure in IT on the project level is quite common. Failure can also
happen on a personal level.</p>
<p><img src="/images/all_day_devops_2022_ixchel_ruiz.webp" alt="Ixchel Ruiz talks about three types of errors: preventable, complexity related and intelligent"></p>
<p>Not all failures are created equal. There are three types:</p>
<dl>
<dt>Preventable</dt>
<dd>These are the &ldquo;bad&rdquo; failures. There&rsquo;s no reason to allow these.</dd>
<dt>Complexity related</dt>
<dd>These failures are due to the inherent uncertainty of work. Usually it is a
particular combination of needs, people and problems that cause them.</dd>
<dt>Intelligent</dt>
<dd>The &ldquo;good&rdquo; failures, since they provide new knowledge.</dd>
</dl>
<p>Steps to learn from failures:</p>
<ul>
<li>Recognize</li>
<li>Understand the cause</li>
<li>Extract lessons to prevent future failure</li>
<li>Share the information</li>
<li>Practice for the next failure in a safe and controlled setting</li>
</ul>
<p>Increase return by learning from every failure, share the lessons and review the
pattern of failures. Do note that none of this can happen in an environment without
psychological safety. You need to feel safe to discuss your failure, doubts or
questions to be able to learn.</p>
<h2 id="accurate-metrics--christina-zeller-and-marcus-crestani">Accurate Metrics &mdash; Christina Zeller and Marcus Crestani</h2>
<p>A manufacturing process was monitored and there was a nice dashboard to show
whether there were any problems. However, at a certain moment there was a
problem, but the dashboard was still claiming everything was fine.</p>
<p><img src="/images/all_day_devops_2022_christina_zeller.webp" alt="Christina Zeller explaining the problem that the monitoring system did not detect a problem"></p>
<p>What was going on? Unreliable network? Erratic monitoring system? Flawed
collection of metrics? It turned out to be <em>all of them</em>.</p>
<p>Sampling rates, retention policy and network issues can cause missing
measurements in your time series database. This missing information can cause a
drop in the reported failure rate if you are unlucky enough that the missing
samples are the failed ones. So you <em>think</em> everything is fine, but there is
something wrong in your environment.</p>
<p>Modelling metrics differently can help. One possible improvement is to have a
duration <em>sum</em> in your metrics instead of just the duration. If you now miss a
sample, the sum will still indicate that there has been a failure.</p>
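<p>A small simulation can make this more tangible. The numbers and the single lost scrape below are made up for illustration and are not the data from the talk. The first model exports only the most recent duration, the second one exports the running total of all durations.</p>

```python
# Durations in seconds of consecutive production steps; the 30.0 is the failure.
durations = [1.0, 1.1, 0.9, 30.0, 1.0, 1.2]

# Indexes of the scrapes that got lost (hypothetical network hiccup).
lost_scrapes = {3}


def scrape(values, lost):
    """Return the samples that actually reach the time series database."""
    return [value for index, value in enumerate(values) if index not in lost]


# Model 1: export only the latest duration.
gauge_samples = scrape(durations, lost_scrapes)

# Model 2: export the running total of all durations.
running_total = []
total = 0.0
for duration in durations:
    total += duration
    running_total.append(total)
sum_samples = scrape(running_total, lost_scrapes)

# Increase between consecutive samples that were stored.
increases = [later - earlier for earlier, later in zip(sum_samples, sum_samples[1:])]
```

<p>In the first model the slow step has vanished: every stored sample is close to one second. In the second model the scrape after the gap shows a jump of about 31 seconds, so the problem is still visible even though the sample itself was lost.</p>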
<p>Using histograms is even better since you place values in buckets, e.g. failures
in our case. A disadvantage however is that the metrics creation system must now
also know what qualifies as a success or failure.</p>
<p>Takeaways:</p>
<ul>
<li>Always collect metrics</li>
<li>Check that your metrics match reality</li>
<li>Always consider missing data</li>
<li>For duration metrics always use histograms</li>
</ul>
<p>After improving their metrics, the monitoring system matches the actual state of
the manufacturing environment again.</p>
<p>Tip: look into <a href="https://en.wikipedia.org/wiki/Hexagonal_architecture_(software)">hexagonal
architecture</a>,
also known as &ldquo;ports and adapters architecture.&rdquo;</p>
<h2 id="why-is-it-always-dns-tls-and-bad-configs--philipp-krenn">Why Is It Always DNS, TLS, and Bad Configs? &mdash; Philipp Krenn</h2>
<p><img src="/images/all_day_devops_2022_philipp_krenn.webp" alt="Philipp Krenn compares DNS, TLS and bad config with the three main characters in the Harry Potter series: when something bad happens, at least one of them is always involved"></p>
<p>DNS, TLS and bad config are where failures are waiting to haunt us when we least
expect it. We need to have a tool to find the issues in our system early on.
Health checks can be this essential tool to alert us.</p>
<p>You are able to narrow down where the issue is if you structure your health
checks like this:</p>
<dl>
<dt>Outside the network (different provider)</dt>
<dd>This allows you to detect issues with e.g. DNS, the uplink, a firewall, a load
balancer, service availability and latency.</dd>
<dt>On the network (different AZ)</dt>
<dd>If you are on the network, you could see issues with the network, firewall,
TLS, service availability and latency. You can compare your measurements from
outside with what you see on the inside.</dd>
<dt>On the instance</dt>
<dd>Again, by comparing the local point of view with the outside, you can detect
issues with service availability, proxy vs service, latency, dependencies
(e.g. a database that is slow or the service cannot reach)</dd>
</dl>
<p>Examples of tests you can use:</p>
<ul>
<li>Ping a host</li>
<li>Set up a TCP connection</li>
<li>Do an HTTP request</li>
<li>POST data to a service</li>
<li>Synthetic monitoring where you simulate button clicks</li>
</ul>
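<p>Two of these tests, sketched with only the Python standard library. This is a minimal illustration and not something Philipp showed; the function names and the default timeout are my own choices.</p>

```python
import http.client
import socket
import urllib.request


def tcp_check(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be set up."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_check(url, timeout=3.0):
    """Return True if the URL answers with a 2xx status code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status // 100 == 2
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts, DNS failures and
        # the HTTPError that urlopen raises for 4xx/5xx responses.
        return False
```

<p>Running the same two functions from outside the network, from another availability zone and on the instance itself gives you the three points of view described above.</p>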
<p>Health checks are cheap to run and give you a fast overview. They <strong>do not</strong>
replace observability and only tell you something is broken, not what is going
on. Start simple and only add synthetics when/where needed since they are more
complex.</p>
<h2 id="comprehending-terraform-infrastructure-as-code-how-to-evolve-fast--safe--anton-babenko">Comprehending Terraform Infrastructure as Code: How to Evolve Fast &amp; Safe &mdash; Anton Babenko</h2>
<p>By using <a href="https://registry.terraform.io/namespaces/terraform-aws-modules">Terraform AWS modules</a>
you&rsquo;ll have to write less Terraform code yourself, compared to using the
AWS provider resources directly.</p>
<p><img src="/images/all_day_devops_2022_anton_babenko.webp" alt="Anton Babenko compares writing all Terraform code yourself in one big book with using terraform-aws-modules which results in a smaller book"></p>
<p>For example: if you write your own Terraform code from scratch for an example
infrastructure with 40 resources, you&rsquo;ll need about 200 lines of code. Once you
introduce variables, that code base will grow to 1000 lines of code. When you
then split it up into modules, you&rsquo;ll need even more code.</p>
<p>Instead, if you use <code>terraform-aws-modules</code> you&rsquo;ll have more features than your
own modules would offer and only need about 100 lines of code.</p>
<p>Questions with regard to <code>terraform-aws-modules</code>:</p>
<dl>
<dt>Why?</dt>
<dd>Understandability is an important issue. To help with these questions (like
&ldquo;why is it done this way?&rdquo;) there are autogenerated documentation and diagrams. A
visualization of the dependencies also helps.</dd>
<dt>Does it work?</dt>
<dd>The project focusses on static analysis (using <code>tflint</code> and
<code>terraform validate</code>) and running Terraform on examples. Anton thinks that testing
code should be for humans (so HCL is better than Go).</dd>
<dt>How to get feedback quicker?</dt>
<dd>Options: run Terraform locally instead of in CI/CD pipeline, simplify the
code, and restrict surface area of your code using policies, guardrails, etc.
Split up your Terraform code, with different states, to speed up Terraform
runs.</dd>
</dl>
<p>Most important word for this talk: <strong>understandability</strong>.</p>
<blockquote>
<p>50-67% of time for software projects spent on maintenance.</p></blockquote>
<p>Some useful links Anton listed:</p>
<ul>
<li><a href="https://github.com/terraform-community-modules">terraform-community-modules</a></li>
<li><a href="https://github.com/terraform-aws-modules">terraform-aws-modules</a></li>
<li><a href="https://github.com/antonbabenko/terraform-aws-devops">Various links to Anton&rsquo;s Terraform, AWS, and DevOps projects</a></li>
<li><a href="https://github.com/antonbabenko/pre-commit-terraform">Collection of git hooks for Terraform</a></li>
<li><a href="https://serverless.tf">Doing serverless with Terraform</a></li>
<li><a href="https://www.terraform-best-practices.com/">www.terraform-best-practices.com</a></li>
<li><a href="https://www.youtube.com/channel/UCGH0yYPvlCN1VjSFMGVmFgQ">Your weekly dose of #Terraform by Anton</a></li>
<li><a href="https://weekly.tf/">Terraform weekly</a></li>
<li><a href="https://twitter.com/antonbabenko">Anton on Twitter (@antonbabenko)</a></li>
<li><a href="https://lepiter.io/feenk/developers-spend-most-of-their-time-figuri-9q25taswlbzjc5rsufndeu0py/">Developers spend most of their time figuring the system out</a></li>
<li><a href="https://diagrams.mingrammer.com/">Diagram as Code</a></li>
<li><a href="https://github.com/contentful-labs/terraform-diff">terraform-diff</a></li>
<li><a href="https://github.com/terraform-compliance/cli">terraform-compliance</a></li>
<li><a href="https://localstack.cloud/">LocalStack</a></li>
</ul>
<h2 id="keynote-journey-to-auto-devsecops-at-nasdaq--benjamin-wolf">Keynote: Journey to Auto-DevSecOps at Nasdaq &mdash; Benjamin Wolf</h2>
<p>Why DevOps? It can be pretty complicated to explain, even though it is an
obvious choice for Nasdaq. For most people Nasdaq is a stock exchange but it is
actually a global technology company. Sure, they run the stock exchange, but
also provide capital access platforms and protect the financial system.</p>
<p>Nasdaq develops and delivers solutions (value) for their users. They manage and
operate complex systems. It has been around for a while. So again, why DevOps?
The answer is: to get better at their practices.</p>
<ul>
<li>They want to deliver solutions (value) to their users <strong>efficiently, reliably and safely</strong>.</li>
<li>They want to manage and operate complex systems <strong>efficiently, reliably and safely</strong>.</li>
</ul>
<p>Years ago they had manually configured static servers and the development teams
were growing. They automated software deployment to a point where the product
owners could trigger the deployment, and even pick which branch to deploy. This
was an important <strong>first evolution</strong>. They had a &ldquo;DevOps team&rdquo; to handle this
automation.</p>
<p>The <strong>second evolution</strong> for Nasdaq was moving from a data center to the cloud,
using infrastructure as code (IaC). The question they asked themselves was what
to do first: migrate to cloud or get their data center infrastructure 100%
managed via IaC? They made the ambitious decision to do both at once.</p>
<p>By turning your infrastructure into code, you can create and destroy the
environment as many times as you like. And this was welcome: after about 2100
times they &ldquo;got it right&rdquo; and were able to move over the production environment
to the cloud. Without IaC the migration would not have gone as flawlessly as it
did.</p>
<p>The cloud and IaC brought them:</p>
<ul>
<li>Efficiency: maintenance, patching, capacity</li>
<li>Reliability: self-healing, immutable</li>
<li>Safety: immutability, destroy/create</li>
</ul>
<p>Over time the DevOps team started to handle a lot more work. The team consisted
of system administrators, but they were required to work as developers (make code
reusable, use git, etc). The DevOps team started to complain about being
overloaded and they became a bottleneck since a lot of development teams came to
them with problems (failing builds, cloud questions).</p>
<p>On the other side, the development teams started to complain because they were
dependent on the DevOps team but that team had become the bottleneck. And &ldquo;just&rdquo;
scaling up the DevOps team would not solve the problem.</p>
<p>Where the second evolution was about the technology, the <strong>third evolution</strong> was about
efficiency. They moved to a &ldquo;distributed DevOps&rdquo; model. Developers were
empowered: access to logs and metrics, training (cloud, Terraform, Jenkins). By
creating a central observability platform, developers could get insight in what
is going on, without the need to have access to the production environment.</p>
<p>This resulted in more deployments and enhanced reliability of the deployments
because of the observability platform.</p>
<p><img src="/images/all_day_devops_2022_benjamin_wolf_1.webp" alt="Benjamin Wolf about their third evolution"></p>
<p>A year or three later, new cracks appeared. Standards were diverging because
teams were allowed to pick their own path (libraries, databases, pipelines,
Terraform code, etc). It also led to practices that needed to be fixed (e.g.
lack of replication). Standardizing this led to quite a burden at the start of a
project: lots of basic stuff to set up.</p>
<p>Developers needed to be experts in a lot of technology, from JavaScript and the
JavaScript framework in use, via multiple .NET versions to Terraform and other
deployment related tech. An easy way to solve this situation was to flip back to
the previous situation with a single team responsible for deployment and such.
But this would basically mean recreating the bottleneck.</p>
<p>The ownership itself was not the problem. The efficiency was, because of the
boilerplate needed for teams. Stuff you want to do the same across the teams.
They wanted to empower the development teams, but also give them the standards
for databases, messaging, etc. Instead of copying a template and having teams
diverge afterwards, they looked into packaging to also make it easier to update
afterwards.</p>
<p>This led to <strong>evolution four</strong> with marker files, packages, code generators and
auto-devops pipelines. The pipeline looks at the markers (&ldquo;hey, this is a .NET
app&rdquo;) and can then apply a standard pipeline. Nasdaq&rsquo;s code generators create the
boilerplate for the teams so within two minutes of starting with a new
application you&rsquo;re able to write code to solve your business problem, instead of
having to create boilerplate code yourself first.</p>
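<p>A minimal sketch of how such marker detection could work. This is not Nasdaq&rsquo;s
implementation: the marker file names and pipeline names below are hypothetical
and only illustrate the idea of mapping a marker to a standard pipeline.</p>
<pre><code class="language-python">from pathlib import Path
import tempfile

# Hypothetical mapping from marker file to the standard pipeline to apply.
MARKERS = {
    ".dotnet-app": "dotnet-standard",
    ".node-app": "node-standard",
    ".terraform-module": "terraform-standard",
}


def select_pipeline(repo_root):
    """Return the standard pipeline for the first marker found, else None."""
    root = Path(repo_root)
    for marker, pipeline in MARKERS.items():
        if (root / marker).is_file():
            return pipeline
    return None
</code></pre>
<p>A repository without any marker yields <code>None</code>, which a real pipeline could
treat as &ldquo;no standard pipeline, configure it yourself.&rdquo;</p>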
<p>The developers can get up and running quickly, but in a safe way.</p>
<p>The development teams are all DevOps teams now, but Nasdaq also has a
specialized team for the complex areas (hardware, networking, etc). There is
also a &ldquo;developer experience&rdquo; team that focusses on the tools for the
developers, like the code generators.</p>
<p><img src="/images/all_day_devops_2022_benjamin_wolf_2.webp" alt="Benjamin Wolf about their fourth evolution"></p>
<p>Current status with regard to our three key areas:</p>
<ul>
<li>Efficiency: code generators, CLI, package setup</li>
<li>Reliability: package standards, pipelines</li>
<li>Safety: code scanning, package scanning, disaster recovery preconfigured</li>
</ul>
<h2 id="varieties-of-incident-response--kurt-andersen">Varieties of Incident Response &mdash; Kurt Andersen</h2>
<p>No matter how reliable our systems are, they are never 100% &mdash; an incident can
always happen. When the pager goes off, the first step is to recruit a response
team. This team can then observe what is going on. They need to figure out what
this means (orient themselves) and decide what to do. And finally they can act
to resolve the problem. (The OODA loop.)</p>
<p><img src="/images/all_day_devops_2022_kurt_andersen_1.webp" alt="Kurt Andersen about basic incident response"></p>
<p>Getting the right people involved can be hard for the technical responder; they
themselves might want to dive into the technical stuff first. This is where the
&ldquo;incident commander&rdquo; role comes in. The incident commander will recruit the team
to get the right people involved, coordinate who does what and handle
communication with people outside of the response team. (The latter can also be
handled by a dedicated &ldquo;comms lead&rdquo; if needed.)</p>
<p>But how does the incident commander get involved?</p>
<p>A fairly standard approach for an on-call system will be to have a tiered model:
tier 1 (NOC), tier 2 (people generally familiar with the system) and tier 3 (the
experts of the system having the issue). The problem with this model: where does
the incident commander come from? Tier 1? If so, can the incident commander
follow through if the issue is handed over to the next tier?</p>
<p>Another model (&ldquo;one at a time&rdquo;): team A gets involved, decides it is not their
responsibility, hands over to team B, which kicks it to team C, etc. Where does
the incident commander come from in this model?</p>
<p>The aforementioned models only work in the simplest cases. They share a few big
problems: handoffs are hard and there is no ownership, which results in loss of
context. To mitigate this, some teams have an &ldquo;all hands&rdquo; approach where
everyone is paged and everyone swarms into the incident response. However, most
people on the call (or in the war room) cannot contribute. This leads to a
mentality of &ldquo;how quickly can I get out of here?&rdquo;</p>
<p>Yet another approach is an Incident Command System (ICS), which comes from
emergency services. In this approach the alert goes to the incident commander
who then involves the team. While this works in some organizations, in tech it&rsquo;s
usually a bit too regimented.</p>
<p>The ICS morphed to an &ldquo;adaptive ICS&rdquo; where the technical team has more autonomy,
but the incident commander is still involved. This system can be scaled up to
where there&rsquo;s an &ldquo;area commander&rdquo; role which coordinates separate teams (via
their respective incident commanders).</p>
<p><img src="/images/all_day_devops_2022_kurt_andersen_2.webp" alt="Kurt Andersen with an image of the &ldquo;response trio&rdquo;"></p>
<p>Summarizing the roles of the parties in the &ldquo;response trio&rdquo;:</p>
<ul>
<li>incident commander: coordination</li>
<li>tech team: nitty-gritty to solve the problem</li>
<li>comms lead: maintain contact with stakeholders</li>
</ul>
<p>Each role will perform their own OODA loop from their own perspective.</p>
<p>But we started the story in the middle. We need to get back to the beginning and
ask the question &ldquo;why is the pager making noise?&rdquo; Perhaps the first question one
should ask is: &ldquo;is this something actionable?&rdquo; If it is not or if it is something
you can handle in the morning, perhaps you do not have to respond in the middle
of the night.</p>
<blockquote>
<p>Cut down the noise and focus on the signal.</p></blockquote>
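<p>The two questions above (is it actionable, and can it wait until the morning)
can be sketched as a small routing function. The three outcomes and the field
names are illustrative assumptions, not something shown in the talk.</p>
<pre><code class="language-python">from dataclasses import dataclass


@dataclass
class Alert:
    name: str
    actionable: bool
    can_wait_until_morning: bool


def route(alert):
    """Decide what to do with an alert: drop it, file a ticket or page someone."""
    if not alert.actionable:
        return "drop"    # noise: nobody can do anything with it
    if alert.can_wait_until_morning:
        return "ticket"  # signal, but not worth waking someone up for
    return "page"        # actionable and urgent
</code></pre>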
<h2 id="the-e-stands-for-enablement---modern-sre-at-pagerduty--paula-thrasher">The E Stands for Enablement - Modern SRE at PagerDuty &mdash; Paula Thrasher</h2>
<p>PagerDuty had an idea what SRE meant: they were enablers. You can hit them up on
Slack and they help you out with a problem. Having an SRE team initially reduced
the total minutes in incident in a year. But when PagerDuty grew further, the
number went up again. Oops.</p>
<p><img src="/images/all_day_devops_2022_paula_thrasher.webp" alt="Paula Thrasher about agreeing on who is responsible for what"></p>
<p>The &ldquo;get well&rdquo; project required teams to have the following:</p>
<ul>
<li>Fast rollbacks: all rollbacks have to happen within 5 minutes. The SRE team
provided guidelines to help teams achieve this.</li>
<li>Canary deploys: teams must test changes (canary) first. The SRE team enabled
this via tooling.</li>
<li>Product limits: reasonable limits. The SRE team made this possible via
telemetry and monitoring.</li>
</ul>
<p>Results:</p>
<ul>
<li>Lower number of minutes in major incident</li>
<li>Lower mean time to resolve</li>
<li>More &ldquo;incidents averted&rdquo;: four minor incidents were resolved before they
caused real harm.</li>
</ul>
<p>Important elements that made this &ldquo;get well&rdquo; project possible:</p>
<ul>
<li>Clear, measurable goals</li>
<li>Enabled by tools and templates</li>
<li>Gave teams a path to do it</li>
<li>Experts were available to coach</li>
<li>The teams owned the implementation themselves</li>
<li>The teams were held accountable</li>
</ul>
<p>PagerDuty used <a href="https://backstage.io">Backstage</a> as an internal &ldquo;one stop shop&rdquo;
developer portal with documentation and insights. It also integrates with the
development systems.</p>]]></content>
  </entry>
  <entry>
    <title type="html"><![CDATA[All Day DevOps 2021]]></title>
    <link rel="alternate" href="https://markvanlent.dev/2021/10/28/all-day-devops-2021/" type="text/html" />
    <id>https://markvanlent.dev/2021/10/28/all-day-devops-2021/</id>
    <author>
      <name>Mark van Lent</name>
      <uri>https://markvanlent.dev/about/</uri>
    </author>
    <category term="chaos engineering" />
    <category term="conference" />
    <category term="costs" />
    <category term="culture" />
    <category term="devops" />
    <category term="infrastructure as code" />
    <category term="terraform" />
    
    <updated>2021-12-09T21:14:34Z</updated>
    <published>2021-10-28T00:00:00Z</published>
    <content type="html"><![CDATA[<p><a href="https://www.alldaydevops.com/">All Day DevOps</a>, the free online DevOps
conference that goes on for 24 hours, was held for the 6th time today. With a
total of 180 speakers spread over 6 tracks, there&rsquo;s even more content than <a href="/2019/11/06/all-day-devops/">the
last time I attended</a>. These are the notes I
took during the day.</p>
<figure><img src="/images/all_day_devops_2021_keynote.png"
    alt="First slide of the kick-off session with Derek Weeks opening the day"><figcaption>
      <p>Derek Weeks opening the day for All Day DevOps</p>
    </figcaption>
</figure>

<h2 id="keynote-cams-then-and-now-and-the-path-forward--thomas-vikingops-krag">Keynote: CAMS Then and Now and the Path Forward &mdash; Thomas &ldquo;VikingOps&rdquo; Krag</h2>
<p>When people think about DevOps, they think of CAMS, which stands for:</p>
<ul>
<li><strong>C</strong>ulture</li>
<li><strong>A</strong>utomation</li>
<li><strong>M</strong>easurement</li>
<li><strong>S</strong>haring</li>
</ul>
<p>The term CAMS was formalized by John Willis and he wrote down his idea in the
article <a href="https://www.chef.io/blog/what-devops-means-to-me">What Devops Means to Me</a>.
This talk will focus on the most important aspect: culture, and specifically
organizational learning.</p>
<h3 id="culture">Culture</h3>
<p>Culture as described by Simon Sinek:</p>

  <figure>

<blockquote >
A group of people with a common set of values and beliefs. When we&rsquo;re
surrounded by people who believe what we believe something remarkable happens.
Trust emerges.
</blockquote>

  <figcaption>
    &mdash;Simon Sinek
  </figcaption>
  </figure>


<p><img src="/images/all_day_devops_2021_organizational_learning_thomas_krag.png" alt="Thomas Krag talks about organizational learning"></p>
<p>The 5 disciplines from <a href="https://en.wikipedia.org/wiki/Learning_organization">organizational learning</a>:</p>
<dl>
<dt>Shared vision</dt>
<dd>Have people play the game not just because of the rules, but because they feel
responsible for the game.</dd>
<dt>Mental Models</dt>
<dd>Breaking your own assumptions and how they can influence your actions. You
need to self-reflect on your own beliefs.</dd>
<dt>Personal Mastery</dt>
<dd>This is about owning yourself, learning and achieving your personal goals. And
to do the latter you first need to define what is important for yourself.</dd>
<dt>Team learning</dt>
<dd>Effective teamwork leads to results that persons cannot achieve on their own.
And individuals working as a team can learn faster than they would by themselves.</dd>
<dt>Systems Thinking</dt>
<dd>Looking into patterns that emerge inside your organization from a
holistic viewpoint instead of just looking at your own team.</dd>
</dl>
<h3 id="automation">Automation</h3>
<p>Automation is about automating the right things. You don&rsquo;t want to do it just to
automate things, but it has to fit in the system. So you have to start with
culture before you think about automation. The ultimate goal is
<a href="https://www.gitops.tech/">GitOps</a> where everything happens via a pull request.</p>
<h3 id="measurement">Measurement</h3>
<p>Measurement started out with a focus on the tooling. But
<strong>measuring how you work</strong> is as important as measuring your infrastructure.</p>
<p>Important key metrics, from <a href="https://www.devops-research.com/research.html">DORA</a>:</p>
<ul>
<li>deployment frequency (how often do you release to production)</li>
<li>lead time for changes (the amount of time for a change to get into production)</li>
<li>time to restore service (how long does it take to recover from a failure in
production)</li>
<li>change failure rate (the percentage of deployments causing a failure in
production)</li>
</ul>
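<p>As a rough illustration, these four metrics can be derived from a list of
deployment records. The <code>Deployment</code> record and its fields are assumptions
made for this sketch; DORA does not prescribe a data model. The list is assumed
to be non-empty.</p>
<pre><code class="language-python">from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did it cause a failure in production?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def dora_metrics(deployments, period_days):
    """Compute the four key metrics over a period of period_days days."""
    failures = [d for d in deployments if d.failed]
    restore_times = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "deployment_frequency_per_day": len(deployments) / period_days,
        "median_lead_time": median(d.deployed_at - d.committed_at for d in deployments),
        "median_time_to_restore": median(restore_times) if restore_times else None,
        "change_failure_rate": len(failures) / len(deployments),
    }
</code></pre>
<p>The sketch uses the median rather than the mean so that a single slow change
or long outage does not dominate the numbers; that is a design choice, not part
of the metric definitions.</p>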
<h3 id="sharing">Sharing</h3>
<p>Share how you are doing (in) DevOps. This is how we ended up here now.
Share how you are improving your own organization. But also share information
<em>within</em> your organization: documentation, videos, presentations,
<a href="https://devopsdays.org/open-space-format/">open spaces</a>,
<a href="https://leancoffee.org/">lean coffee sessions</a>.</p>
<h3 id="three-ways">Three ways</h3>
<p>CAMS originated in 2010. The
<a href="https://itrevolution.com/the-three-ways-principles-underpinning-devops/">three ways</a>
are principles that came out of the book
<a href="https://itrevolution.com/the-phoenix-project/">The Phoenix Project</a>. They are:</p>
<ul>
<li>1st way: create flow (systems thinking)</li>
<li>2nd way: feedback loops</li>
<li>3rd way: experimentation, risks, learning</li>
</ul>
<p>If we map these to CAMS:</p>
<ul>
<li>Culture: 3rd way</li>
<li>Automation: 1st way</li>
<li>Measurement: 2nd way</li>
<li>Sharing: 3rd way</li>
</ul>
<p>Working on your culture is as important as doing your actual work.</p>
<p>So what&rsquo;s next? CAMS is still as applicable as 10 years ago. It has always been
important, but it was only put into words in 2010. We need to continue sharing
to get the full value out of it.</p>
<h2 id="gamification-of-chaos-testing--bram-vogelaar">Gamification of Chaos Testing &mdash; Bram Vogelaar</h2>
<p>We think of <a href="https://en.wikipedia.org/wiki/Usain_Bolt">Usain Bolt</a> as the record
breaking athlete, but he&rsquo;s also the person that worked really hard to get there.
It takes a lot of time and effort to become good at something. Pilots and firemen
spend most of their time training and not doing what you expect them to do; just
to make sure they perform well under pressure. Also note that pilots use a lot
of checklists to prevent mistakes.</p>
<p>We should do the same: train for when our platform is in an error state. We
should not just be able to detect it, but also solve the problem.</p>
<p>Chaos engineering is <q>the discipline of experimenting on a distributed system
in order to build <strong>confidence</strong> in the system&rsquo;s capability to
withstand turbulent conditions <strong>in production</strong>.</q> This
practice started at Netflix with <a href="https://github.com/Netflix/chaosmonkey">Chaos Monkey</a>.</p>
<p><img src="/images/all_day_devops_2021_chaos_engineering_ecosystem_bram_vogelaar.png" alt="Bram Vogelaar showing an image of the vast CNCF cloud native landscape with a whole section of chaos engineering products"></p>
<p>We need to become comfortable with experimenting. Have game day exercises and
analyze what happened, to improve your training. Do not just focus on the result
of the exercise itself, but also ask questions like &ldquo;was it the right
experiment?&rdquo;</p>
<p>Now that we use containers, add sidecar containers with tools to get metrics or
detect errors. Or to do chaos engineering e.g. with
<a href="https://github.com/Shopify/toxiproxy">Toxiproxy</a>.</p>
<p>Since checklists are boring, we can use gamification to spice things up.
Celebrate failure, and learn from it!</p>
<blockquote>
<p>Living in the year 3000: breaking production on purpose on Saturdays and have
the system remedy the problem itself.</p></blockquote>
<p>Convince management that failure is <em>normal</em> and <em>expected</em> behaviour. Promising
100% uptime is not realistic. Large, complex systems will always be in a
(somewhat) degraded state.</p>
<p>Let engineers be scientists to deal with this complex environment. Give them
training, allow them to do tests (experiments), which results in having valid
monitoring that leads to actionable alerts. Get the engineers in a state where
they are comfortable with failures.</p>
<p>(<a href="https://www.slideshare.net/attachmentgenie">Slides</a>)</p>
<h2 id="watch-your-wallet-cost-optimizations-in-aws--renato-losio">Watch Your Wallet! Cost Optimizations in AWS &mdash; Renato Losio</h2>
<p>Each AWS service has its own price components. It&rsquo;s a complex subject. Even a
simple service like a load balancer has multiple components. It looks simple
with the &ldquo;$0.008 per LCU-hour&rdquo; price tag, but now you have to figure out what an
LCU-hour is. Then you learn it has four dimensions that are measured: number of
new connections per second, active connections, processed bytes and rule
evaluations. Good luck predicting the costs.</p>
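<p>To get a feeling for the arithmetic, here is a hedged sketch of an LCU
calculation. The price per LCU-hour is the one from the talk; the capacity of a
single LCU per dimension and the example usage are assumptions, so check the
current Elastic Load Balancing pricing page before relying on them.</p>
<pre><code class="language-python"># Capacity of a single LCU per dimension. Assumed values, not from the talk.
LCU_CAPACITY = {
    "new_connections_per_second": 25,
    "active_connections_per_minute": 3000,
    "processed_gb_per_hour": 1,
    "rule_evaluations_per_second": 1000,
}
PRICE_PER_LCU_HOUR = 0.008  # the price tag mentioned in the talk


def lcus_consumed(usage):
    """Only the dimension with the highest usage is billed."""
    return max(usage[name] / capacity for name, capacity in LCU_CAPACITY.items())


def monthly_cost(usage, hours=730):
    """Cost for a month of constant usage (730 hours is an approximation)."""
    return lcus_consumed(usage) * PRICE_PER_LCU_HOUR * hours


# Made-up usage figures for a small service.
usage = {
    "new_connections_per_second": 50,
    "active_connections_per_minute": 3000,
    "processed_gb_per_hour": 0.5,
    "rule_evaluations_per_second": 100,
}
print(lcus_consumed(usage), round(monthly_cost(usage), 2))
</code></pre>
<p>In this example the new connections dominate, so that dimension alone
determines the bill. Real traffic varies per hour, which is exactly what makes
the prediction hard.</p>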
<p>This presentation only sticks to the basics since cost optimization is such a
big topic.</p>
<p>To start managing and reducing your costs, you need to give your DevOps team
access to billing and cost data. If you cannot measure your costs, you cannot manage them. Note
that you do not need an excessive amount of tags for cost management. First you
need to figure out what you are going to change and how it&rsquo;s going to affect the
bill for your company.</p>

  <figure>

<blockquote >
When savings can be measured, they can be recognized, and <strong>cost efficiency
projects become exciting opportunities</strong>. As of early 2021, the most viewed
dashboard at Airbnb is a dashboard of AWS costs.
</blockquote>

  <figcaption>
    &mdash;Anna Matlin, Airbnb
  </figcaption>
  </figure>


<p>What patterns can we avoid? In most organizations, the most expensive parts of your
bill will be:</p>
<ul>
<li>Compute</li>
<li>Storage</li>
<li>Data transfer</li>
</ul>
<p>So we will dive into these subjects.</p>
<h3 id="compute">Compute</h3>
<p>Tips to reduce costs:</p>
<ul>
<li>Avoid fixed IP addresses where possible. This is not so much about the costs
of the IP address itself, but mostly because architectures that require a
fixed IP tend to be complex and more expensive.</li>
<li>Use Graviton (ARM) instances. They have a better price to performance ratio.
You can also migrate managed services (RDS, Lambda functions) to Graviton.</li>
<li>Check out Lightsail. It&rsquo;s less flexible than using EC2 instances, but cheaper.</li>
</ul>
<h3 id="data-transfer">Data transfer</h3>
<p>We usually forget to take data transfer costs into account upfront. It&rsquo;s also a
complex subject and there are a lot of considerations to make.</p>
<p><img src="/images/all_day_devops_2021_aws_data_transfer_costs_renato_losio.png" alt="Renato Losio shows an image with all kind of ways data transfer can cost you money"></p>
<p>Tips:</p>
<ul>
<li>Multi AZ is always good, but are 3 zones always better than 2 zones?</li>
<li>Multi region is easy to do nowadays, but expensive with regard to data
transfer. Ask yourself if you really need it.</li>
<li>If you want to use multiple cloud providers always think about the data
transfer costs. Perhaps the storage costs are lower at provider X, but if you
have to transfer data, you&rsquo;ll probably pay an egress cost.</li>
<li>Using a CDN (CloudFront) might be cheaper than paying for the egress data
transfer otherwise.</li>
</ul>
<h3 id="storage">Storage</h3>
<p>Initially it was simple: there were only two storage classes. Currently there
are 6 different classes with their own prices and characteristics.</p>
<p><img src="/images/all_day_devops_2021_s3_storage_classes_renato_losio.png" alt="Renato Losio shows an image with the various S3 storage classes and their use cases"></p>
<p>The best option is to use lifecycle rules to move data between different
classes. Note that in a lifecycle policy you cannot filter the objects based
on an extension (e.g. <code>*.jpg</code>) but instead you need to think &ldquo;from left to
right.&rdquo; So you need to think upfront about the prefixes you are going to want to
use in your bucket.</p>
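<p>A sketch of what such a prefix-based rule could look like. The dictionary
follows the shape that boto3&rsquo;s <code>put_bucket_lifecycle_configuration</code> accepts as
<code>LifecycleConfiguration</code>; the rule ID, the <code>images/</code> prefix and the number of
days are hypothetical values for illustration.</p>
<pre><code class="language-python">import json

# Hypothetical layout: all images live under the "images/" prefix, so a single
# rule can target them even though filtering on "*.jpg" is not possible.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-images",
            "Status": "Enabled",
            "Filter": {"Prefix": "images/"},
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
        }
    ]
}

# Print the configuration; applying it to a bucket is left out on purpose.
print(json.dumps(lifecycle_configuration, indent=2))
</code></pre>
<p>Had the images been spread over the bucket with only their extension in
common, there would be no prefix to put in the filter.</p>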
<p>Managed services can offer you automatic and manual backups, which can be great.
But what is the cost of that? Check how much retention you need for example.</p>
<p>With regard to EBS: for most use cases <code>gp3</code> is better and cheaper than <code>gp2</code>
(except for very large volumes). Note that you can change the EBS volume type
without stopping the machine. In most cases you can have a 20% cost saving
without affecting your performance.</p>
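<p>The 20% can be illustrated with a small calculation. The per GB-month prices
below are assumptions (they were not given in the talk and differ per region),
and the sketch ignores the extra charges for IOPS or throughput provisioned
above the <code>gp3</code> baseline.</p>
<pre><code class="language-python"># Assumed list prices in USD per GB-month; verify them for your own region.
GP2_PRICE = 0.10
GP3_PRICE = 0.08


def monthly_saving(size_gb):
    """Storage cost saved per month by moving a volume from gp2 to gp3."""
    return size_gb * (GP2_PRICE - GP3_PRICE)


def saving_percentage():
    """Relative saving on the storage price, rounded to a whole percentage."""
    return round((1 - GP3_PRICE / GP2_PRICE) * 100)


print(round(monthly_saving(500), 2), saving_percentage())
</code></pre>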
<h3 id="aws-changes-quickly">AWS changes quickly</h3>
<p>After each AWS re:Invent, your deployment is probably outdated with regard to cost
optimizations. Examples:</p>
<ul>
<li>Use <code>gp3</code> instead of <code>gp2</code> for your EBS volumes.</li>
<li>The instance type <code>m6</code> might be more interesting than the <code>m5</code> or <code>m4</code> you may
currently be using.</li>
<li>Dublin was the cheapest region in Europe, but for most things Stockholm is
cheaper than Dublin at the moment.</li>
</ul>
<p>So keep up to date with the offerings.</p>
<h2 id="keynote-call-for-code-with-the-linux-foundation-contributing-to-tech-for-good-even-if-youre-not-techincal--daniel-krook-and-demi-ajayi">Keynote: Call for Code with The Linux Foundation: Contributing to Tech-for-Good Even if You&rsquo;re Not Technical &mdash; Daniel Krook and Demi Ajayi</h2>
<p>Call for Code is a multi year program launched in 2018 to address humanitarian
issues and help bridge potential solutions. Last year the global challenge was
around climate change and a track was added for the social and business impact
of the COVID-19 pandemic.</p>
<p>It&rsquo;s not just about generating ideas to take on the issues. It should
eventually also lead to an adopted open source solution that is sustainable.</p>
<p>The 14 projects discussed today can be found on the <a href="https://linuxfoundation.org/projects/call-for-code/">Call for Code page on the Linux Foundation website</a>.
You can also read about them via the <a href="https://developer.ibm.com/callforcode/">IBM developer site</a>.
GitHub is central to how they iterate on the features. The related organizations are:</p>
<ul>
<li><a href="https://github.com/call-for-code">Call for Code with the Linux Foundation</a></li>
<li><a href="https://github.com/call-for-code-for-racial-justice">Call for Code for Racial Justice</a></li>
</ul>
<p>The key takeaway of this session is a call to help improve how the projects do
DevOps. The goal is to ensure that everyone can contribute to the projects with
confidence and that the projects can be deployed with speed.</p>
<p>The Call for Code for Racial Justice open source projects are categorised in three pillars:</p>
<ul>
<li>Police reform and judicial accountability:
<ul>
<li><a href="https://github.com/Call-for-Code-for-Racial-Justice/fairchange">Fair Change</a></li>
<li><a href="https://github.com/Call-for-Code-for-Racial-Justice/Open-Sentencing/">Open Sentencing</a></li>
<li><a href="https://github.com/Call-for-Code-for-Racial-Justice/Incident-Accuracy-Reporting-System">Incident Accuracy Reporting System (IARS)</a></li>
</ul>
</li>
<li>Diverse representation:
<ul>
<li><a href="https://github.com/Call-for-Code-for-Racial-Justice/TakeTwo">Take Two</a></li>
</ul>
</li>
<li>Policy and legislative reform:
<ul>
<li><a href="https://github.com/Call-for-Code-for-Racial-Justice/Legit-Info/blob/main/README.md">Legit-Info</a></li>
<li><a href="https://github.com/Call-for-Code-for-Racial-Justice/Five-Fifths-Voter">Five Fifths Voter</a></li>
<li><a href="https://github.com/Call-for-Code-for-Racial-Justice/Truth-Loop">Truth Loop</a></li>
</ul>
</li>
</ul>
<p>Demi talked about each of these projects in the program and their tech stacks. You
can read more about them on the
<a href="https://developer.ibm.com/callforcode/racial-justice/">Call for Code for Racial Justice section</a>
on the IBM developer site.</p>
<p>Daniel in turn talked about other Call for Code projects:</p>
<ul>
<li><a href="https://clusterduckprotocol.org/">ClusterDuck</a></li>
<li><a href="https://pyrrha-platform.org/">Pyrrha</a></li>
<li><a href="https://www.isac-simo.net/">ISAC-SIMO</a></li>
<li><a href="https://openeew.com/">OpenEEW</a></li>
<li><a href="https://github.com/Call-for-Code/Liquid-Prep">Liquid Prep</a></li>
<li><a href="https://github.com/Call-for-Code/DroneAid">DroneAid</a></li>
<li><a href="https://github.com/Rend-o-matic">Rend-o-matic</a></li>
</ul>
<p>Even if you cannot code, there are numerous ways you can contribute, e.g.
conducting user research, writing or reviewing documentation, doing design work,
or advocacy like speaking at conferences.</p>
<p><img src="/images/all_day_devops_2021_call_for_code_non_technical_contributions_demi_ajayi.png" alt="Demi Ajayi talking about non-technical ways to contribute to Call for Code"></p>
<p>There are multiple ways to get involved:</p>
<ul>
<li>Via the virtual community (<a href="https://callforcode.org/slack">join slack</a>, join events)</li>
<li>Directly via the projects on GitHub</li>
<li>Conduct outreach (recruit friends, host events, etc)</li>
</ul>
<h2 id="devsecops-culture-laughing-through-the-failures--chris-romeo">DevSecOps Culture: Laughing Through the Failures &mdash; Chris Romeo</h2>
<p>Chris shared 10 DevSecOps failures and talked about how to change the culture
and turn these failures into successes.</p>
<h3 id="1-name-and-brand">#1 Name and brand</h3>
<p>This quote nicely sums it up:</p>

  <figure>

<blockquote cite="https://twitter.com/jvehent">
<p>Can we stop the {Sec}Dev{Sec}Ops{Sec} naming foolishness?</p>
<p>Just call it DevOps and focus on making security a natural part of building stuff.</p>

</blockquote>

  <figcaption>
    &mdash;Julien Vehent, <cite><a href="https://twitter.com/jvehent">@jvehent</a></cite>
  </figcaption>
  </figure>


<p>To change the culture:</p>
<ul>
<li>Embrace security and make it part of DevOps</li>
<li>Teach this new definition to everyone involved with your project (developers,
testers, product managers, executives, etc)</li>
<li>Create a &ldquo;DevSecOps&rdquo; swear jar ;-)</li>
</ul>
<h3 id="2-the-infinity-graph">#2 The infinity graph</h3>
<p><img src="/images/all_day_devops_2021_infinity_loop_chris_romeo.png" alt="Chris Romeo shows the often used infinity symbol"></p>
<p>The problem is that nothing ever gets done with this infinity graph. It&rsquo;s not an
accurate representation.</p>
<p>The solution is to talk about pipelines instead and integrating security into
them. Code review should include security, vulnerability scanning should be part
of the pipeline, etc. Ban the infinity graph.</p>
<h3 id="3-security-as-a-special-team">#3 Security as a special team</h3>
<p>Creating a specific security team is the <em>opposite</em> of what DevOps is about. It&rsquo;s about
working together. Security isn&rsquo;t a specialty, it is the responsibility of
everybody. This requires knowledge and expertise.</p>
<p>The other way around is also true: teach security people to code. They don&rsquo;t
have to become great coders, but it would be nice if they can review code and
make suggestions to make things more secure.</p>
<h3 id="4-vendor-defined-devops">#4 Vendor defined DevOps</h3>
<p>Sometimes we let vendors define what DevOps and security are for us, via the
products that they offer. It would be better to find the best of breed outside
of the offering of your cloud provider. Take a vendor independent approach and
determine what DevOps means to <strong>you</strong>.</p>
<h3 id="5-big-company-envy">#5 Big company envy</h3>
<p>Looking at the big companies can be discouraging. You most likely have not
invested the same time in it as e.g. Netflix, Etsy, etc. have. So while you
won&rsquo;t be at the same level, don&rsquo;t see this as an excuse to give up. Do the
DevOps that you do. Don&rsquo;t fixate on the top of the class. Get on that path and
make incremental progress.</p>
<p>Use the <a href="https://owasp.org/www-project-devsecops-maturity-model/">OWASP DevSecOps Maturity Model</a> to create a roadmap.</p>
<h3 id="6-overcomplicated-pipelines-and-doing-everything-now">#6 Overcomplicated pipelines and doing everything now</h3>
<p>This can be a complicated subject (see for example the
<a href="https://blog.sonatype.com/2020-devsecops-reference-architecture">DevSecOps Reference Architecture</a>
from Sonatype).</p>
<p><img src="/images/all_day_devops_2021_sonatype_reference_architecture_chris_romeo.png" alt="Chris Romeo shows the Sonatype DevSecOps reference architecture"></p>
<p>But keep it simple! Start with a small subset of security tools. Everybody
related to your project should be able to explain the build pipeline. Don&rsquo;t try
to solve all problems immediately. Take a phased approach.</p>
<h3 id="7-security-as-gatekeeper">#7 Security as gatekeeper</h3>
<p>Security might want to slow down the pipeline and act as a gatekeeper. Don&rsquo;t say
&ldquo;<em>no</em>,&rdquo; but &ldquo;<em>yes, if&hellip;</em>&rdquo; For example: &ldquo;<em>yes</em>, that would be a great feature <em>if</em> you
enable multifactor authentication.&rdquo;</p>
<p>Practice empathy. Both security people for developers, but also the other way
around.</p>
<h3 id="8-noisy-security-tools">#8 Noisy security tools</h3>
<p>You buy a tool and enable every option to &ldquo;get your money&rsquo;s worth.&rdquo; The result
is 10,000 JIRA tickets of things that need to be fixed. This does not help.</p>
<p>It would be better to tune the tools and not waste time with security findings
that do not matter.</p>
<p>Start with a minimal policy focussing on the largest issue. Developers will then
start to trust the tool and <em>then</em> you can slowly increase the policy.</p>
<h3 id="9-lack-of-threat-modelling">#9 Lack of threat modelling</h3>
<p>Your scanning tools cannot find business logic flaws; there&rsquo;s no pattern to it.</p>
<p><img src="/images/all_day_devops_2021_threat_modelling_chris_romeo.png" alt="Chris Romeo shows the condescending Wonka meme with the text &ldquo;DevOps is too quick for threat modeling you say? &hellip; Thousands of business logic flaws beg to differ"></p>
<p>Perform threat modelling outside of the pipeline. It should be done when new
feature assignments go out.</p>
<h3 id="10-vulnerable-code-in-the-wild">#10 Vulnerable code in the wild</h3>
<p>There are lots of vulnerabilities in open source software; this is a supply
chain problem.</p>
<p>To improve this: embed software composition analysis (SCA) in <strong>all</strong> your
pipelines. Set the SCA policy to fail when a vulnerability is detected. If you
filter out a vulnerability (e.g. because there&rsquo;s no fix yet), make sure the
filter will not be active forever.</p>
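<p>The idea of a filter that expires can be sketched as follows. Real SCA tools
have their own configuration formats; the identifiers, the ignore list and the
function names here are hypothetical.</p>
<pre><code class="language-python">from datetime import date

# Hypothetical scan result and ignore list with an expiry date per entry.
findings = ["EXAMPLE-0001", "EXAMPLE-0002"]
ignores = {
    "EXAMPLE-0001": date(2021, 12, 1),  # no fix yet, re-evaluate on this date
}


def ignore_is_active(expires, today):
    """An ignore applies up to and including its expiry date."""
    return max(expires, today) == expires


def blocking_findings(findings, ignores, today):
    """Return the findings that should fail the build."""
    return [
        finding
        for finding in findings
        if not (finding in ignores and ignore_is_active(ignores[finding], today))
    ]
</code></pre>
<p>Once the expiry date has passed, the ignored finding breaks the build again,
which forces someone to check whether a fix has become available.</p>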
<h3 id="successes">Successes</h3>
<p>The 10 DevOps successes we can distil from the failures above:</p>
<ol>
<li>Just call it DevOps</li>
<li>Pipelines</li>
<li>Security for everyone</li>
<li>Embrace <strong>your</strong> DevOps</li>
<li>Be content with <strong>your</strong> DevOps</li>
<li>Simple and staged pipeline</li>
<li>Security as trusted partner</li>
<li>Tuned and valuable security tools</li>
<li>Threat modelling for everyone</li>
<li>Breaking the build for vulnerabilities</li>
</ol>
<h3 id="key-takeaways">Key takeaways</h3>
<ul>
<li>Hopefully the real impact of DevOps (when looking back 50 years from now) is
going to be security culture related.</li>
<li>Some of the failures were outside of your control, but you can change them.</li>
<li>Everybody has to code, choose <strong>your</strong> DevOps, keep it simple, lower the
noise, add the best practices we discussed, do some threat modelling, break
the build for vulnerabilities.</li>
<li>Embrace and laugh at the failures.</li>
</ul>
<h2 id="keynote-managing-risk-with-service-level-objectives-and-chaos-engineering--liz-fong-jones">Keynote: Managing Risk with Service Level Objectives and Chaos Engineering &mdash; Liz Fong-Jones</h2>
<p>Besides being a principal developer advocate at Honeycomb, Liz is also a member
of the platform on-call rotation. Honeycomb deploys with confidence up to 14
times a day, every day of the week&mdash;so also on Fridays. How do they manage to
(mostly) meet their Service Level Objectives (SLOs) while also scaling out their
user traffic?</p>
<p>Their confidence recipe:</p>
<ul>
<li>Quantify the amount of reliability.</li>
<li>Be able to identify risk areas that might prevent them from fulfilling their
targets.</li>
<li>Test to verify that the assumptions about the systems are correct.</li>
<li>Respond to that feedback to address the problems found via those experiments
or through natural outages.</li>
</ul>
<h3 id="how-to-measure-reliability">How to measure reliability?</h3>
<p>You need to know how broken is &ldquo;too broken.&rdquo; You don&rsquo;t have to alert on all
problems when working at scale. You need to measure success of the service and
define SLOs. These are a way to measure and quantify your reliability.</p>
<p>Honeycomb&rsquo;s job is to reliably ingest telemetry, index it, store it safely and
let people query it in near-real-time. Honeycomb&rsquo;s SLOs measure the things
that their customers care about.</p>
<p>For example, they have set an SLO that the homepage needs to load quickly
(within a few hundred milliseconds) 99.9% of the time. User queries need to
run successfully &ldquo;only&rdquo; 99% of the time and are allowed to take up to 10 seconds.
On the other hand: ingestion needs to succeed 99.99% of the time since they
only have one shot at it.</p>
<blockquote>
<p>Services are not just 100% down or 100% up (most of the time).</p></blockquote>
<p>These metrics help Honeycomb make decisions about reliability and product
velocity. If the service is down too much, they need to invest in reliability
(since having features that cannot be used does not add value). On the other
hand: if they exceed the SLO, they can move faster.</p>
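<p>To make the trade-off above a bit more concrete, here is a minimal sketch of
an error budget calculation in Python. The class name, the request counts and
the advice strings are assumptions made up for illustration; this is not
Honeycomb&rsquo;s implementation.</p>

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    """Error budget for one SLO over a fixed window of requests."""

    slo_target: float      # e.g. 0.999 for "99.9% of requests succeed"
    total_requests: int    # requests seen in the window
    failed_requests: int   # requests that violated the SLO

    @property
    def allowed_failures(self) -> float:
        # The budget is the share of requests that is allowed to fail.
        return self.total_requests * (1 - self.slo_target)

    @property
    def remaining(self) -> float:
        # Negative when more requests failed than the SLO allows.
        return self.allowed_failures - self.failed_requests

    def advice(self) -> str:
        if self.remaining < 0:
            return "invest in reliability"
        return "room to move faster"


# Hypothetical numbers for an ingestion SLO of 99.99%.
ingest = ErrorBudget(slo_target=0.9999, total_requests=10_000_000, failed_requests=400)
print(round(ingest.allowed_failures), round(ingest.remaining), ingest.advice())
```

<p>The point of the sketch is that the decision follows from a number instead
of a gut feeling: as long as budget remains there is room to ship, and once it
is gone the priority shifts to reliability work.</p>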
<h3 id="recipe-for-shipping-reliably-and-quickly">Recipe for shipping reliably and quickly</h3>
<p>Practices used by Honeycomb:</p>
<ul>
<li>Code is instrumented. Think beforehand what a success or failure in
production would look like.</li>
<li>Functional and visual testing using libraries that create snapshots.</li>
<li>They design for feature flag deployment (only making a feature available to a
small percentage of users initially).</li>
<li>Practice automated integration plus human review. (There&rsquo;s an SLO for how
long the tests are allowed to take. Code reviews are high priority tasks
since you are blocking someone else by not reviewing their code.)</li>
<li>The main branch is safe to release any time.</li>
<li>Automatically roll out the changes.</li>
<li>After the deployment the engineers have to observe what is happening with
their code in production. Only after they confirm that everything is okay
can they go home. So you are free to deploy on Friday at 19:30 as long as you
are willing to stick around until your code is deployed and you have confirmed
it is not causing issues.</li>
</ul>
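<p>The feature flag practice from the list above (exposing a feature to a
small percentage of users first) could be sketched as follows. This is an
illustration under assumptions: the function names and the flag name are
hypothetical, and hashing the user ID is just one common way to get a stable
assignment.</p>

```python
import hashlib


def bucket_for(user_id: str, flag_name: str) -> int:
    """Map a user to a stable bucket in the range 0-99 for a given flag."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(user_id: str, flag_name: str, rollout_percentage: int) -> bool:
    """Return True when the user falls inside the rollout percentage."""
    return bucket_for(user_id, flag_name) < rollout_percentage


# Hypothetical users and flag name.
users = [f"user-{n}" for n in range(1000)]
enabled = [u for u in users if is_enabled(u, "new-query-engine", 5)]
print(f"{len(enabled)} of {len(users)} users see the feature")
```

<p>Because the bucket depends only on the flag name and the user ID, a user
keeps the same experience between requests, and raising the percentage only
ever adds users to the group.</p>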
<p>For infrastructure Honeycomb also use infrastructure as code practices:</p>
<ul>
<li>They can use CI and feature flags for their infrastructure.</li>
<li>They can automatically provision fleets if needed.</li>
<li>They can automatically quarantine certain paths to keep the main fleet from
crashing or do performance profiling.</li>
</ul>
<h3 id="chaos-engineering">Chaos Engineering</h3>
<p>Left-over error budget is used for chaos engineering experiments. In such an
experiment you test a hypothesis. You need to control the percentage of
users affected by it and be able to revert the impact you are causing.</p>
<blockquote>
<p>Chaos engineering is engineering. It&rsquo;s not pure chaos.</p></blockquote>
<p>This works well for stateless things, but how does it work for stateful things?
In the case of the Honeycomb infrastructure, they make sure to only restart one
server or service at a time. They do not introduce too much chaos to reduce the
likelihood that something goes catastrophically wrong.</p>
<p>Two reasons why you will want to do these experiments at 3 PM and not at 3 AM:</p>
<ul>
<li>You want to test at peak traffic instead of low traffic, since the latter
could give you a false sense of security in situations when everything may
still look normal even though this would have been a problem in a high traffic
situation.</li>
<li>When doing things in the afternoon, there are more people available to deal
with something than there would be in the middle of the night.</li>
</ul>
<p>With the experiments they measure if they had an impact on the customer
experience. If they cause a change, does the telemetry reflect this? (Is the
node indeed reported as being offline, for example?) When you fix things, you
need to repeat the experiment and make sure the change indeed fixed the issue.</p>
<p><img src="/images/all_day_devops_2021_chaos_engineering_liz_fong-jones.png" alt="Liz Fong-Jones: Not every experiment succeeds. But you can mitigate the risks."></p>
<p>When you burn the error budget, the <a href="https://sre.google/sre-book/table-of-contents/">SRE book</a>
states that you should freeze deploys. Liz disagrees. If you freeze deploys, but
continue with feature development, the risk of the next deployment only
increases. Instead, Liz advocates for using the team&rsquo;s time to work on
reliability (i.e. change the nature of the work instead of stopping work).</p>
<blockquote>
<p>Fast and reliable: pick both!</p></blockquote>
<p>You don&rsquo;t have to pick between fast and reliable. In a lot of ways fast <strong>is</strong>
reliable. If you exercise your delivery pipelines every hour of every day,
stopping becomes the anomaly instead of deploying.</p>
<h3 id="takeaways">Takeaways</h3>
<ul>
<li>By designing the delivery pipeline for reliability, Honeycomb can meet their
SLOs.</li>
<li>Feature flags can reduce the blast radius and keep you within your SLO.</li>
<li>And when they cannot: there are other ways to mitigate the risks.</li>
<li>By discovering risks at 3 PM and not 3 AM, you improve the customer experience
since the system is more resilient.</li>
<li>If something does go catastrophically wrong, remember that the SLO is a
guideline not a rule. SLOs are for managing predictable-ish unknown-unknowns
and not things that are completely outside of your control.</li>
<li>We are all part of sociotechnical systems. Customers, engineers and stakeholders alike.</li>
<li>Outages or failed experiments are learning opportunities, not reasons to fire someone.</li>
<li>SLOs are an opportunity to have discussions about trade-offs between stability and speed.</li>
<li>DevOps is about talking to each other and talking to our customers.</li>
</ul>
<h2 id="common-pitfalls-of-infrastructure-as-code-and-how-to-avoid-them--tim-davis">Common Pitfalls of Infrastructure as Code (And how to avoid them!) &mdash; Tim Davis</h2>
<p>We start with the basics: what is Infrastructure as Code (IaC)? With the advent
of cloud providers, you no longer use hardware, but a UI to stand up
infrastructure. This led to <a href="https://en.wikipedia.org/wiki/Shadow_IT">shadow IT</a>
since developers ran off with a credit card to provision what they needed
themselves, instead of using slow, internal IT systems.</p>
<p>Developers, however, would rather write code and use developer methodologies
than click through a UI. This is where IaC started. With it you can create and
manage infrastructure by writing code.</p>
<p><img src="/images/all_day_devops_2021_pitfalls_tim_davis.png" alt="Tim Davis explains that IaC combines the pitfalls from both infrastructure and code"></p>
<p>What are the pitfalls? The bad news: you get all the pitfalls of infrastructure
and all the pitfalls of code. But you&rsquo;ve probably already got a lot of
experience with those issues and teams to handle them. You just use a different
methodology.</p>
<p>The first pitfall is not fostering the communication between the groups that
have experience and tools.</p>
<h3 id="infrastructure-pitfalls">Infrastructure pitfalls</h3>
<p>Which framework/tool do you pick? There are basically two categories:
multi-cloud or cloud agnostic tools on the one hand (like
<a href="https://www.terraform.io/">Terraform</a> and <a href="https://www.pulumi.com/">Pulumi</a>)
and cloud specific tools on the other (like
<a href="https://aws.amazon.com/cloudformation/">CloudFormation</a>). Note that for example
with Terraform and Pulumi you still have to rewrite code when switching from one
cloud provider to another, but at least the tool is familiar.</p>
<p>Security is a huge thing. You still need to know how to design your VPC, IAM
policies, security policies, etc. You still need to communicate with all the
teams that have the experience. It&rsquo;s not just Dev and Ops. With tools like
<a href="https://github.com/accurics/terrascan">Terrascan</a> and
<a href="https://www.checkov.io/">Checkov</a> you can shift-left the security aspect
instead of trying to bolt it on afterwards.</p>
<h3 id="code-pitfalls">Code pitfalls</h3>
<p>The biggest issue is with default values. If you use the UI, there are a
lot of boxes that may be blank or have stuff in them. Some of the boxes can be
left blank, for some you need to specify what you want. The UI is going to
yell at you; if you use IaC things may be less in your face.</p>
<p>You don&rsquo;t want to deploy something with an open policy.
<a href="https://www.openpolicyagent.org/">Open Policy Agent</a> can really help you to make sure you
stay within your allowed parameters. For instance you can write a policy to make
sure you only use a specific region, don&rsquo;t deploy an open S3 bucket or
only use certain sizes of EC2 instances.</p>
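<p>Open Policy Agent policies are written in its own language (Rego). To show
the kind of rules meant here without guessing at Rego syntax, the sketch below
expresses the same three checks in plain Python. The resource dictionaries, the
allowed values and the function name are hypothetical and much simpler than
what a real IaC tool outputs.</p>

```python
ALLOWED_REGIONS = {"eu-west-1"}
ALLOWED_INSTANCE_TYPES = {"t3.micro", "t3.small"}
PUBLIC_ACLS = {"public-read", "public-read-write"}


def violations(resource: dict) -> list[str]:
    """Return human-readable policy violations for one planned resource."""
    name = resource["name"]
    found = []
    if resource.get("region") not in ALLOWED_REGIONS:
        found.append(f"{name}: region {resource.get('region')} is not allowed")
    if resource["type"] == "s3_bucket" and resource.get("acl") in PUBLIC_ACLS:
        found.append(f"{name}: bucket must not be publicly accessible")
    if (
        resource["type"] == "ec2_instance"
        and resource.get("instance_type") not in ALLOWED_INSTANCE_TYPES
    ):
        found.append(f"{name}: instance type {resource.get('instance_type')} is not allowed")
    return found


# Hypothetical, simplified resources; a real plan has a different shape.
planned = [
    {"name": "logs", "type": "s3_bucket", "region": "eu-west-1", "acl": "public-read"},
    {"name": "api", "type": "ec2_instance", "region": "us-east-1", "instance_type": "m5.4xlarge"},
    {"name": "worker", "type": "ec2_instance", "region": "eu-west-1", "instance_type": "t3.small"},
]
problems = [message for resource in planned for message in violations(resource)]
for message in problems:
    print(message)
```

<p>Running a check like this in the pipeline, before anything is deployed,
gives you the &ldquo;yelling&rdquo; that the UI would otherwise have done.</p>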
<p>If you hard code certain values in Terraform or other IaC tools, you might need
to copy/paste a lot of code if you want to create e.g. a test, acceptance and
production environment. To mitigate these <a href="https://en.wikipedia.org/wiki/Don%27t_repeat_yourself">DRY (don&rsquo;t repeat
yourself)</a> issues you can
for instance use
<a href="https://www.terraform.io/docs/language/modules/index.html">Terraform modules</a> or
<a href="https://terragrunt.gruntwork.io/">Terragrunt</a>.</p>
<p>State size can become a problem. If you want to, you can put all of your
infrastructure in the same state file (which is where your tool stores the state
of the infrastructure). However it means the tool will have to check <em>all</em>
resources in the state file to detect if it needs to do something. To mitigate
this, you can again use Terraform modules. It helps with performance and makes
the codebase more manageable.</p>
<h2 id="wrap-up">Wrap-up</h2>
<p>The conference is still ongoing while I publish this post. However, this is it
for me for All Day DevOps for this year. I learned new things and got inspired.</p>
<p>Thanks to the organizers, moderators and speakers for hosting another great
event.</p>]]></content>
  </entry>
  <entry>
    <title type="html"><![CDATA[Devopsdays oNLine 2021]]></title>
    <link rel="alternate" href="https://markvanlent.dev/2021/06/29/devopsdays-online-2021/" type="text/html" />
    <id>https://markvanlent.dev/2021/06/29/devopsdays-online-2021/</id>
    <author>
      <name>Mark van Lent</name>
      <uri>https://markvanlent.dev/about/</uri>
    </author>
    <category term="conference" />
    <category term="devops" />
    
    <updated>2021-11-10T20:39:28Z</updated>
    <published>2021-06-29T00:00:00Z</published>
    <content type="html"><![CDATA[<p>Since <a href="https://en.wikipedia.org/wiki/COVID-19_pandemic">COVID-19</a> is still
around, devopsdays Amsterdam was an online event again. These are the notes I
took while watching the talks.</p>
<p><img src="/images/devopsdays2021_logo.jpg" alt="devopsdays oNLine logo"></p>
<h2 id="how-cognitive-biases-and-ranking-can-foster-an-ineffective-devops-culture--kenny-baas--evelyn-van-kelle">How cognitive biases and ranking can foster an ineffective DevOps culture &mdash; Kenny Baas &amp; Evelyn van Kelle</h2>
<p>In practice, not all team members are treated equally although we might think
they are.</p>
<h3 id="how-to-make-sure-everyone-said-what-has-to-be-said">How to make sure everyone said what has to be said?</h3>
<p>In each team ranking takes place. It determines who takes the lead, who
expresses an opinion, etc.</p>
<p>We need to make sure ranking is made explicit. Your job title or position in the
org chart is explicit ranking. But things like your gender, skin colour and level
of charisma are part of the implicit ranking.</p>
<p>The result of this ranking can be a situation like this: a woman on the team
just explained something on a certain topic, but questions about it are addressed
to the white man in the team.</p>
<p>We all have a list of traits we associate with higher rank. So if someone ticks
more of those boxes, they are seen as higher in rank. You can also rank yourself
lower compared to others. As a result you might not express your opinion because
someone you regard as having a higher rank said something that conflicts with
what you wanted to say.</p>
<p>The person with the higher rank should be aware of their rank and &ldquo;share it.&rdquo;
Ask yourself the question &ldquo;what <strong>don&rsquo;t</strong> I know?&rdquo; in a discussion. Ask the
question the other (with a lower rank) is afraid to ask.</p>
<blockquote>
<p>Own, play and share your rank.</p></blockquote>
<h3 id="how-can-we-create-and-include-new-insights">How can we create and include new insights?</h3>
<p>We use cognitive bias to make decisions. Usually this helps us, but sometimes it
works against us. What we know hinders us from taking on a new perspective and
it limits us in our thinking. We get stuck in what we know
(<a href="https://en.wikipedia.org/wiki/Functional_fixedness">functional fixedness</a>).</p>
<p><img src="/images/devopsdays2021_bias.jpg" alt="We all have biases: should the toilet paper be over or under?"></p>
<p>Be aware of your biases and try to break free from them.</p>
<h3 id="who-makes-decisions-and-how-to-get-everyone-onboard-with-the-decision">Who makes decisions and how to get everyone onboard with the decision?</h3>
<p>Have fewer &ldquo;corporate&rdquo; meetings and more &ldquo;campfires&rdquo; where we share ideas and talk
about the why.</p>
<p>There are 4 ways to go into a decision:</p>
<ul>
<li>Idea</li>
<li>Suggestion</li>
<li>Proposal</li>
<li>Command</li>
</ul>
<p>If you have a proposal or command, be open about it: &ldquo;this is what we are going
to do, what does it take to go along with it?&rdquo; This is better than pretending it
is completely open for discussion.</p>
<h2 id="getting-started-embrace-your-inner-child--rain-leander">Getting Started: Embrace Your Inner Child &mdash; Rain Leander</h2>
<p>Learning to ride a bike when you are older is harder than when you are a child.
This is not so much a physical thing, but we do not want to be embarrassed when
we are older. This goes for everything new we do: we want to be the
best&mdash;instantly. However you <em>will</em> fall down and fail. You <em>will</em> make mistakes.
Embrace that, be brave and learn.</p>
<p>Bravery is something you have to develop. It is hard! And it is not a <em>lack</em> of
fear, it is moving forward <em>despite</em> fear.</p>
<p>The steps to become more brave that worked for Rain:</p>
<ul>
<li>Embrace vulnerability</li>
<li>Admit you have fear</li>
<li>Do it anyway</li>
<li>When you fall down, get back up</li>
</ul>
<blockquote>
<p>Play!</p></blockquote>
<p>Playing is instrumental to life. No one can tell you how to play. Do what is fun
for you. And when it stops being fun, just stop doing it. Book time in your
agenda to play, <strong>each day</strong>! (And napping counts.)</p>
<p>Explore! Some of the things Rain uses to stay curious and explore:</p>
<ul>
<li>Listen</li>
<li>Ask questions</li>
<li>Avoid assumptions</li>
<li>Listen</li>
<li>Embrace learning as fun</li>
<li>Develop a beginner&rsquo;s mind</li>
<li>Listen</li>
</ul>
<p>You are a unicorn. Your experience, your background, your education, etc. makes
you unique.</p>
<p>Be willing to fall down, and get back up. Be brave. Play! Conquer the world.</p>
<p>(The original blog post for this talk:
<a href="http://groningenrain.nl/getting-started-embrace-your-inner-child/">Getting Started: Embrace Your Inner Child</a>)</p>
<h2 id="prevent-heroism-how-to-work-today-to-reduce-work-tomorrow--quintessence-anx">Prevent Heroism: How to Work Today to Reduce Work Tomorrow &mdash; Quintessence Anx</h2>
<p>We might be familiar with &ldquo;the angry sysadmin.&rdquo; It is the person who is highly
skilled, has lots of knowledge, gets asked all the questions and works overtime to get
their own work done. This is basically the description of the character Brent
from the book <a href="https://itrevolution.com/the-phoenix-project/">The Phoenix Project</a>.</p>
<p>Being a Brent-like person is stressful and may result in anger, resentment and
strained relationships. Brent-like persons typically have issues with things
like time management, maintaining focus and
<a href="https://en.wikipedia.org/wiki/Allostatic_load">allostatic load</a> (where you&rsquo;re too tired to
be awake and too stressed to sleep). But there are business implications as well:
diminishing returns, high turnover, reputational damage and declining financial
performance.</p>
<p>Instead of having (or being) a Brent-like person, we would rather have no silos and
no vacuums (the latter is where you, for instance, get no response to
questions).</p>
<p>Steps to get out of this situation and do things differently:</p>
<ul>
<li>Brainstorm the work (every task done during the day, planned or unplanned)</li>
<li>Determine which &ldquo;stream&rdquo; each task is in: reactive or proactive</li>
<li>Look for patterns</li>
<li>Switch to &ldquo;upstream mode&rdquo; to prevent work in the future (short term/long term
vs permanent/temporary matrix)</li>
<li>Plan and Prioritize work: upstream work is more important</li>
<li>Do the work</li>
<li>Iterate (what works, what does not, what can be improved)</li>
</ul>
<p>Upstream work is the proactive and preventative work that reduces the need for
reactive, or downstream, work.</p>
<p>But how does one measure things? Things to keep in mind:</p>
<ul>
<li>What are your needs?</li>
<li>Learn how to measure them</li>
<li>Establish baselines</li>
<li>Know your capacity</li>
<li>Track the rate of change</li>
<li>Set goals</li>
<li>Iterate</li>
</ul>
<p><img src="/images/devopsdays2021_metrical_thoughts.jpg" alt="Some example metrics to track: SLA compliance, question volume, change failure rate, unplanned work rate"></p>
<p>Slides and additional resources can be found at
<a href="https://noti.st/quintessence/TFyFzN/prevent-heroism-how-to-work-today-to-reduce-work-tomorrow">https://noti.st/quintessence</a></p>
<h2 id="real-world-continuous-delivery-learn-adapt-improve--michiel-rook">Real-world Continuous Delivery: Learn, Adapt, Improve &mdash; Michiel Rook</h2>
<p>This talk is a real life case study, focussing on the deployment of the platform
of a customer of Michiel.</p>
<p>There was a pretty long release check list with manual steps. Especially when
under pressure, steps were forgotten or done incorrectly. Ideally they released
every two weeks (after each sprint). It took a team 2&ndash;3 days of manual work.
That was time not spent on features.</p>
<p>To improve this situation, the following goals were set:</p>
<ul>
<li>Reduce costs by letting people do more valuable work</li>
<li>Faster feedback and ability to do more experiments</li>
<li>Reduce <a href="https://sre.google/sre-book/eliminating-toil/#toil-defined">toil</a></li>
</ul>
<p>For continuous delivery to work, you have to make things small. This is a key
difference between high and low performing teams (see
<a href="https://itrevolution.com/book/accelerate/">Accelerate</a>).</p>
<p><img src="/images/devopsdays2021_small_releases.jpg" alt="Small releases mean realizing value faster"></p>
<p>Although the platform was a monolith, they deferred splitting it up. If they
had broken down the monolith into smaller pieces, they would have had more
manual steps for each release, making the situation worse. So the strategy was
to automate first and then split.</p>
<h3 id="phase-1">Phase 1</h3>
<p>The teams increased the release cadence. They switched to a weekly release cycle
instead of a release every two weeks (on good weeks). Releasing more often makes
problems visible faster.</p>
<p>The release process was also automated. They created a pipeline to build and
deploy the platform. In this phase there was still a manual step between the
deployment to acceptance and production. They didn&rsquo;t trust the system enough
yet.</p>
<p>They also made changes in their way of working. The big one: pair programming.
This is superior to code reviews, for example because you also discuss code that
does <em>not</em> get written and you have architecture discussions.</p>
<p>Other changes:</p>
<ul>
<li>A &ldquo;do not leave broken windows&rdquo; mentality. This is from the book
<a href="https://pragprog.com/titles/tpp20/the-pragmatic-programmer-20th-anniversary-edition/">The Pragmatic Programmer</a>,
which states: <q>neglect
accelerates the rot faster than any other factor.</q></li>
<li>Measure how they were doing in terms of their continuous delivery journey.
The book <a href="https://leanpub.com/measuringcontinuousdelivery">Measuring Continuous Delivery</a>
has useful information about this.</li>
</ul>
<h3 id="phase-2">Phase 2</h3>
<p>In the next phase they had a robot press the button to deploy to production.
This required zero downtime deploys. They solved this by doing rolling updates
(a load balancer in front of a couple of backend servers and then upgrade one
server at a time until they are all up-to-date).</p>
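<p>The rolling update described above could be sketched like this. It is a
simplified model under assumptions: the <code>Server</code> class, the health
check and the server names are hypothetical, and a real deployment would talk
to an actual load balancer instead of flipping a boolean.</p>

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Server:
    name: str
    version: str
    in_rotation: bool = True


def always_healthy(server: Server) -> bool:
    """Stand-in health check; a real one would probe the upgraded server."""
    return True


def rolling_update(
    servers: list[Server],
    new_version: str,
    health_check: Callable[[Server], bool] = always_healthy,
) -> None:
    """Upgrade servers one at a time so the rest keep serving traffic."""
    for server in servers:
        server.in_rotation = False    # load balancer stops routing to this server
        server.version = new_version  # deploy the new release
        if not health_check(server):
            # Leave the broken server out of rotation and stop the rollout.
            raise RuntimeError(f"{server.name} failed its health check")
        server.in_rotation = True     # server rejoins the pool


pool = [Server("web-1", "1.0"), Server("web-2", "1.0"), Server("web-3", "1.0")]
rolling_update(pool, "1.1")
print([(s.name, s.version, s.in_rotation) for s in pool])
```

<p>Stopping at the first failed health check limits the damage to a single
server: the remaining servers still run the old version and keep handling
requests.</p>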
<p>Pipeline failures come in all forms, e.g.:</p>
<ul>
<li>Flaky tests</li>
<li>Timeouts</li>
<li>Network stability issues</li>
<li>External dependencies in tests</li>
</ul>
<p>If you cannot trust a test, remove it. Otherwise you&rsquo;ll probably work around it,
which is more dangerous.</p>
<p>This level of automation also meant they treated pipeline failures as priority 1
issues (which warrants extreme feedback to notify people of a broken build:
lights, sounds).</p>
<h3 id="phases-3">Phase 3</h3>
<p>In the third phase they made it faster:</p>
<ul>
<li>Scale vertically and horizontally</li>
<li>Parallelize tests</li>
</ul>
<h2 id="the-adjacent-possible-evolution-innovation--catastrophe--jason-yee">The Adjacent Possible: Evolution, Innovation &amp; Catastrophe &mdash; Jason Yee</h2>
<p>The four cornerstones of resilience:</p>
<ul>
<li>Monitoring: knowing what to look for</li>
<li>Responding: knowing what to do</li>
<li>Learning: knowing what has happened</li>
<li>Anticipating: knowing what to expect</li>
</ul>
<p>The term &ldquo;adjacent possible&rdquo; comes from biology in the context of evolution.
Evolution requires a chain of changes to happen.</p>
<p><img src="/images/devopsdays2021_adjacent_possible.jpg" alt="Adjacent possible: evolution by one small change after another"></p>
<p>Complex systems usually do not have a single root cause, only contributing
factors. Perhaps there&rsquo;s an &ldquo;adjacent possible&rdquo; of failure. We cannot anticipate
all failures, but perhaps we can explore them.</p>
<p><img src="/images/devopsdays2021_known_unknown.jpg" alt="Known knowns, known unknowns and unknown unknowns"></p>
<p>How do you explore adjacent possible failures?</p>
<p>To move from known knowns to known unknowns, we can use chaos engineering
(<q>thoughtful, planned experiments to improve our understanding of how systems
work (and fail), so that we can improve them</q>). This way we can explore the
known unknowns and these then become the known knowns. From that standpoint,
the original unknown unknowns become the known unknowns.</p>
<p>Scientific process:</p>
<ul>
<li>Observe (what is normal)</li>
<li>Hypothesize (how do we think it will react to failure)</li>
<li>Experiment (inject failure)</li>
<li>Analyze (what did actually happen)</li>
<li>Share knowledge</li>
</ul>
<p>Explore the adjacent possible!</p>
<h2 id="minimal-viable-presentations--mark-smalley">Minimal viable presentations &mdash; Mark Smalley</h2>
<p>There are four important points to take into consideration when creating a
presentation:</p>
<dl>
<dt>Accept that it is not about you</dt>
<dd>It is the audience&rsquo;s presentation. What you want to tell might not be what
they want to hear.  Kill your darlings.</dd>
<dt>Anticipate silent questions</dt>
<dd>These questions are:
<ul>
<li>Huh? What is he talking about? (Solution: be clear about the self evident stuff)</li>
<li>Really? (Solution: be clear about the evidence)</li>
<li>So? How is this relevant for me? What&rsquo;s in it for me?</li>
<li>What&rsquo;s next?</li>
</ul>
</dd>
<dt>Think of learning objectives</dt>
<dd>Think about what the audience should <strong>know</strong>, <strong>believe</strong>, <strong>be able to</strong> do and <strong>feel</strong> after the
presentation.</dd>
<dt>Dare to choose your ideal audience</dt>
<dd>Accept that the presentation will not be for the majority of the people
listening.</dd>
</dl>
<figure><img src="/images/devopsdays2021_learning_objectives.jpg"
    alt="Learning objectives of this talk"><figcaption>
      <p>Mark Smalley applies the advice given in his talk to this talk itself</p>
    </figcaption>
</figure>

<h2 id="engineering-metrics-that-matter--dan-lines">Engineering Metrics That Matter &mdash; Dan Lines</h2>
<p>Before we dive in, let&rsquo;s start with some
<a href="https://en.wikipedia.org/wiki/Performance_indicator">KPI</a>s you want to avoid:</p>
<ul>
<li>Lines of code</li>
<li>Number of commits</li>
<li>Individual stack ranking</li>
<li>Anything with story points</li>
</ul>
<p>So what <em>do</em> you want to measure? What will improve your process?</p>
<dl>
<dt>Couple pull request size with cycle time (how much time from coding to deployment)</dt>
<dd>What you&rsquo;ll see is that bigger PRs take longer to review and
thus increase cycle time.</dd>
<dt>Deployment frequency</dt>
<dd>How often can we get new value into the hands of customers.</dd>
<dt>Idle time</dt>
<dd>Waiting for e.g. a release or review. Waiting means context switches which
means less productivity.</dd>
<dt>Code churn</dt>
<dd>How much code are we reworking in a short period of time? High code churn
indicates delays and quality issues.</dd>
<dt>Review depth</dt>
<dd>The amount of feedback per PR.</dd>
<dt>Mean time to restore</dt>
<dd>How long it takes to fix an incident in production.</dd>
<dt>Investment profile</dt>
<dd>New functions for our customers vs bug fixing, backend or non-functional work.</dd>
</dl>
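<p>A few of the metrics above are simple to compute once you have timestamps.
The following sketch shows cycle time, deployment frequency and mean time to
restore. The function names and the sample data are assumptions for
illustration; in practice the timestamps would come from your version control,
deployment and incident tooling.</p>

```python
from datetime import datetime, timedelta
from statistics import mean


def cycle_time(first_commit: datetime, deployed: datetime) -> timedelta:
    """Time from the first commit of a change until it runs in production."""
    return deployed - first_commit


def deployment_frequency(deploys: list[datetime], period_days: int) -> float:
    """Average number of deployments per day over the given period."""
    return len(deploys) / period_days


def mean_time_to_restore(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Average duration between the start and the resolution of incidents."""
    seconds = [(resolved - started).total_seconds() for started, resolved in incidents]
    return timedelta(seconds=mean(seconds))


# Hypothetical data for one week.
deploys = [datetime(2021, 6, day, 14, 0) for day in (21, 22, 22, 24, 25)]
incidents = [
    (datetime(2021, 6, 22, 15, 0), datetime(2021, 6, 22, 16, 0)),
    (datetime(2021, 6, 24, 9, 0), datetime(2021, 6, 24, 11, 0)),
]
print(cycle_time(datetime(2021, 6, 21, 9, 0), datetime(2021, 6, 22, 14, 0)))
print(deployment_frequency(deploys, period_days=7))
print(mean_time_to_restore(incidents))
```

<p>The calculations themselves are trivial; the value is in tracking them over
time and looking at trends for the team as a whole, not at individuals.</p>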
<h2 id="devops-is-no-walk-in-the-park--sabine-wojcieszak">DevOps is no Walk in the Park &mdash; Sabine Wojcieszak</h2>
<p>In a park you are safe, you do not need a map, can wander where you want and you
need no preparation. DevOps is not like that.</p>
<p>The term CALMS has been coined by John Willis to explain DevOps. It stands for:</p>
<ul>
<li><strong>C</strong>ulture</li>
<li><strong>A</strong>utomation</li>
<li><strong>L</strong>ean</li>
<li><strong>M</strong>easurement</li>
<li><strong>S</strong>haring</li>
</ul>
<p>Most people talk a lot about the automation and some about the measurements
part, but very few about sharing and even less about culture.</p>
<p>There is a similarity with Agile where the term began as a niche but became
hyped and then mainstream, where people talk about Agile without knowing what it
is. We should prevent this from happening with DevOps.</p>
<p><img src="/images/devopsdays2021_agile_mainstream_devops.jpg" alt="Agile: from niche to mainstream. Will the same happen with DevOps?"></p>
<p>It&rsquo;s no longer business <em>and</em> IT, but IT <em>is part of</em> business. One could even
say IT <em>is</em> business.</p>
<p>DevOps is <strong>not</strong> cherry picking what we like (automation) or only picking the
low hanging fruit and calling it &ldquo;DevOps.&rdquo; It needs a holistic approach. But
this needs a lot of ingredients. In the right mix.</p>
<p>You should always ask yourself <em>why</em> you are doing DevOps. Do you think it is
cheaper because of the level of automation? Because everyone is doing it? To
deliver better software?</p>
<p>You need to understand the whole approach.</p>
<p>DevOps means to Sabine: <q>deliver valuable products with better quality sooner to
customers/users.</q> This means we need to know what is valuable for our customers,
what our customers think of as quality, etc.</p>
<p>The term CI/CD in the context of DevOps traditionally stands for continuous integration
/ continuous delivery. But it could also stand for continuous improvement /
continuous development, if we think of products and people. But we can also talk
about continuous improvement and continuous discovery of opportunities and
outcomes.</p>
<p>You cannot buy DevOps&mdash;neither with tools, nor with certifications. But it&rsquo;s
not for free either. You will fail for example.</p>
<p>Some anecdotes of where it went wrong:</p>
<ul>
<li>&ldquo;We are now the DevOps department.&rdquo; Which means there is still a silo with a
lack of communication.</li>
<li>&ldquo;We have a DevOps team that is setting up the pipelines.&rdquo; Again no
communication, no transparency.</li>
<li>&ldquo;We have a linter but it gave too many warnings so we turned it off.&rdquo; No
understanding about why it is done, no trust that it helps to grow, sign of
fear to make mistakes.</li>
<li>&ldquo;We are measured by the number of developed features, but not allowed to talk
to sales.&rdquo; It&rsquo;s a feature factory, measuring the wrong things. Again also not
enough communication and silos.</li>
<li>&ldquo;IT has not reported any problems.&rdquo; Be open to the obvious: if customers
complain, there is a problem. Ask yourself what is actually monitored.
Obviously not business relevant metrics, otherwise the problem would have been
detected by the company instead of the customers.</li>
</ul>
<p>DevOps is more than automation, it is CALMS.</p>

  <figure>

<blockquote >
If you don&rsquo;t get the C then don&rsquo;t bother with the A, L, M or S.
</blockquote>

  <figcaption>
    &mdash;John Willis
  </figcaption>
  </figure>]]></content>
  </entry>
</feed>
