FOSDEM 2026

January 31, 2026 (Last modified on February 1, 2026)

#ansible #conference #docker #infrastructure as code #git #python #security

Table of Contents

January is already almost over, so time for FOSDEM, the yearly free event for software developers to meet, share ideas and collaborate in Brussels. Last year I focussed on the Go track, this year I selected a mix of security and Python related talks to attend.

Streamlining Signed Artifacts in Container Ecosystems — Tonis Tiigi

It’s possible to sign Docker images, but at the moment most are actually not signed. Also, users should understand what the signature is protecting and what it’s not protecting. We should not want signing just to tick a box on the security checlist, but because of the security it adds. And we need something simple: integrated with existing tools, should not slow down tools.

Buildkit powers “docker build” but is not limited to Dockerfiles. It’s high performance, can build complex builds and has caching.

A modern build is a graph of images, Git repositories, local files, etc. The results are images, binaries, archives.

Photo of Tonis Tiigi explaining the graph that is modern software building — Tonis Tiigi explaining that builds of modern software are a complex graph

We need Supply-chain Levels for Software Artifacts (SLSA) provenance: what has actually happened in the build? What was the build config? Et cetera. It’s useful to figure out how an artifact was built.

Buildkit does not sign images by default. GitHub has an example in the documentation to run a build with Buildkit and generate an artifact. It claims to generate an unforgeable statement. But if your GitHub credentials are leaked and the attacker can get your hands on the temporary signing key, they can use it to sign their own artifacts.

Docker created the github-builder repository. It contains reusable GitHub Actions to securely build images. If you use this, your images are signed to prove that they were built from a certain repository, using the configured build steps. Where Buildkit (among other things) provides isolation, github-builder provides signing context. It also protects against build dependency leaks.

So that takes care of the signatures, but how do you verify them?

The command “docker inspect” now shows verified signatures
You can manually verify it with cosign
You can also use sigstore/policy-controller for Kubernetes

Buildx also includes experimental Rego (Open Policy Agent) policy support. This means you can write a matching policy for Dockerfile, e.g. Dockerfile.rego, which is then automatically loaded. All build sources now need to pass policy for the build to continue (images, Git repositories, URLs, etc).

You can do very complex stuff in the policies. As simple example Tonis showed:

package docker

allow if {
  input.image.repo = "org/app"
  docker_github_builder_tag(input.image, "org/app", input.image.tag)
}

This policy should make sure that the image can only be built from this repository and that the image tag should match the Git tag.

Summary:

No reason not to sign
Not all signatures are equal
Software pulling packages should verify pulled content

Link to the conference page

Sequoia git: Making Signed Commits Matter — Neal H. Walfield

Version control systems (also known as VCSs) track the following:

Changes to the code
Authorship
Other metadata
Commit message

But the author can be faked: the metadata is set by the author, including the author’s name. After a quick “git config” command you can commit as anyone you want, for example Linus Torvalds. Sure, GitHub could see that the committer (the one pushing the commit) and author are different. However, this is not necessarily bad because we might simply want to give proper attribution to the author of the commit.

And in theory the forge might also be compromised, or someone may have gotten permission to push to the project.

To prevent impersonations, we can cryptographically prove who the author is by signing the commits. But now the problem shifts to the certificates. Because anyone can create a key with any name (again, for example Linus) attached to it. So what does a signed commit mean now?

How can we be sure that the author is who they say they are? There are ways:

You could talk to developer the verify
You could go to key signing parties
You can use a central authority that you trust (e.g. keys.openpgp.org, the Linux developer keyring, the distributions-gpg-keys package, or, if you trust Github, use github.com/<username>.gpg)

You can use the following command to show the Git log and the signatures on them:

git log --show-signature

But now you need to actually check that the signatures are indeed made by the certificates you trust.

It’s up to the maintainers of the software to curate a list of contributors and track when contributors join and leave (yes, there is a temporal element as well). This is hard. Maintainer needs tooling. And you would want to detect unauthorized commits (impersonation, a malicious forge, a machine in the middle or for instance when project is given to a new maintainer by a forge/registry).

What does the solution look like?

Clear semantics
The project itself maintains signing policy
Third party uses maintainers’ policy to authenticate project
Verification, not attestation: do not rely on any external authority

(Note that the maintainers can still be socially engineered to include the key of an attacker in their policy. So they still have to be careful about who is added to the policy.)

Sequoia git provides:

Specification
Config
Tooling

With Sequoia git (which part of the Sequoia PGP project) you can have a signing policy in an openpgp-policy.toml file in the project’s Git repository. It specifies users, their keys and their capabilities. You can use sg-git to help maintain this file.

For instance to add user Alice and then describe the current policy, you can use the following commands:

sq-git policy authorize alice --committer <cert>
sq-git policy describe

A commit is “authenticated” if at least one parent commit says the commit is acceptable (via the policy). To verify that there is an authenticated path from the current state back to a certain commit we trust, use this command:

sq-git log --trust-root <sha of trusted commit>

Projects may have contributions from others that are not included in the policy. To maintain an authenticated path when accepting the contribution, a trusted author needs to merge the contribution via a merge commit that is authenticated. (You may need to use the “--no-ff” on the merge to make sure there is a merge commit though.)

Link to the conference page

An Endpoint Telemetry Blueprint for Security Teams — Victor Lyuboslavsky

With open source we can inspect something that is broken, we can change the defaults. With security we are used to the opposite; it’s a black box. We are not used to owning the data. The data exists on the endpoints, but ownership is transferred to a different team. How can we add more security in a way engineers understand and can use?

Victor presents a blueprint with the following layers:

Endpoint agents
Control layer
Ingestion, streaming & storage
Detection
Correlation, intelligence and response

The value is not in the layers themselves, but the boundaries. For example, the ingestion should move the data reliably but should not care which tool collected it. This makes them loosely coupled.

For endpoint agents Victor suggests osquery which allows basic questions about endpoints. Data is structured and consistent. It aligns with open source values. (Alternatives: scripts & cron, log shippers like filebeat or tools like auditd or Event Tracing for Windows.)

Controlling the data (the next layer) means that you want to have:

Central config
Live queries
Consistent schemas

Fleet (disclaimer: Victor works here) is built to manage osquery at scale and a good candidate for this layer.

The control layer needs to work hand-in-hand with ingestion layer. The ingestion layer moves data to downstream system. E.g. Vector or Logstash can be used here.

Ingestion isn’t where you get clever. It’s where you get reliable.

Streaming decouples users from consumers and e.g. allows replay. Note that this is an optional step and it would come after ingestion, not in place of it. For instance Apache Kafka can be used in this layer. Ingestion absorbs the mess. Streaming preserves flexibility.

The storage layer is where telemetry becomes durable. It’s about being able to ask hard questions later. Examples of useful tools: ClickHouse, Elasticsearch (which is better at text search) and Iceberg (which is slower for active investigation).

For the detection layer you might want to use Sigma. It provided portability. Rules are translated to native SQL running on ClickHouse. Intent (Sigma signatures) becomes execution (SQL query to get the data).

Finally the correlation layer: Grafana can be used for correlation and visualisation. Grafana can query ClickHouse. Grafana also has alerting.

Note that response isn’t just about automation. It’s also to pause and ask better questions. The correlation layer should focus on enabling humans to act.

Open endpoint telemetry is not an “EDR killer”. It does not replace it. It adds diversity and complements other tools. It provides a second set of eyes.

Link to the conference page

The Bakery: How PEP810 sped up my bread operations business — Jacob Coffee

Python loads imports eagerly by default. This leads to memory bloat and cold start issues. Explicit lazy imports (see PEP 810) only import a module when it’s first accessed not when the import statement is executed.

Lazy import is scheduled to be included in Python 3.15 and looks like this:

lazy import foo from bar

The design principles applied are that lazy imports are:

Explicit
Local
Granular

When parsing the Python code a proxy module is created. Only when the module is actually used, the proxy is transparently replaced by the real package. You will not always see improvements, so do not blindly replace all imports with lazy imports.

PEP 810 also eliminates the need for TYPE_CHECKING guards. (See the typing docs, in short: importing a module that is expensive and only contains types used for type checking in an “if TYPE_CHECKING:” block.) It also helps for faster test discovery and collection, less memory usage, decrease cold start slowness in e.g. AWS Lambda functions, CLI applications, etc.

Meta (with Cinder) saw a 70% startup time reduction and 40% memory savings. PySide has a 35% startup improvement.

About CLI tools: when using lazy imports you might notice the difference when using --help. There’s no need to load all dependencies to just output the help text of a tool.

Some notes:

Import time side effects (e.g. logging configuration, DB connections) are also delayed!
Type checkers need to be updated
Import errors move to first use (so in runtime, not at launch). Keep that in mind when debugging
It’s not always faster, so profile your application before migrating and see where you can potentially benefit
Document your lazy imports!
You cannot do lazy imports in functions

Circular imports are probably still a problem, but they just show up later.

Link to the repo for this talk

Link to the conference page

Modern Python monorepo with `uv`, `workspaces`, `prek` and shared libraries — Jarek Potiuk

Jarek is, besides his other roles, the number 1 Apache Airflow contributor. The Apache Airflow repo is the monorepo he talks about today. There is also a series of blog posts about this topic: see part 1, which links to the other parts.

Airflow drove early requirements for uv workspaces. They now manage 120+ distributions seamlessly with it. It allows them to combine distributions to work together in a workspace. Also used to import from one distribution in another one.

The project shares a single virtual environment used by uv in root of project. If you run “uv sync” from the top level you get everything. If you run it in a subdirectory (e.g. airflow-core) you only get what is needed for that distribution.

Benefits of the uv workspaces:

Isolated
Explicit
Flexible

Hatch has (or will have, at the time of writing) largely compatible workspaces.

However pre-commit became a bottleneck. They needed to run 170+ pre commit hooks on every commit. Prek is drop-in replacement for pre-commit and works fantastic. It is optimized for speed and monorepos.

Airflow uses symlinked shares libraries (where a shared lib is also a distribution). The Hatchling build backend needs to replace links with physical copies during packaging. They use Prek to maintain consistency.

uv sync detects conflicts between merged requirements files and Prek hooks enforce relative imports in shared code to prevent cross coupling issues (IIRC)

Link to the conference page

PyInfra: Because Your Infrastructure Deserves Real Code in Python, Not YAML Soup — Loïc “wowi42” Tosser

Loïc is a Frenchmen (which, as he himself states, means he must have opinions) and not a YAML fan to put it mildly. That is: YAML as a programming language, e.g. how it is used in Ansible.

Photo of Loïc Tosser showing a complex Ansible task in YAML — Loïc Tosser demonstrating what happens when you ask a config file to be a programming language

PyInfra is an infrastructure as code library to write Python code which is then translated to shell scripts to run on the target hosts. So, in contrast to Ansible, you do not need Python on the target. The target machine only needs SSH and a POSIX shell. You can also configure Docker containers with PyInfra.

If it has SSH, PyInfra can talk to it.

PyInfra has idempotent operations and built-in diff checking. Declarative infrastructure with actual code and not YAML. You can use inventory from Terraform, Coolify or any API.

You can leverage the entire Python packaging ecosystem. Slack integration? Just use the right Python package.

PyInfra is not only a CLI tool, you can also use it as a library.

PyInfra is 10 times faster than Ansible, uses 70% less code, has proper code reuse via import and proper loops instead of with_items. It can have actual unit tests and can scale to thousands of servers. Also you no longer have error messages stating that the error appears to be in … but may be elsewhere in the file … (looking at you Ansible). PyInfra has clear error messages without having to specify -vvvv and wading through hundreds of lines of output.

The suggested migration path:

Start small, one playbook at a time
Use your IDE for autocomplete and refactoring
Leverage Python’s standard library and the ecosystem with all its packages
Sleep better because you don’t have to debug at 3 AM.

Is PyInfra production ready? Yes! It has a stable API, is already in use in production, it’s actively maintained and is MIT licensed (so no commercial entity behind it to steer its direction).

You can get started today with a simple “pip install pyinfra”.

Link to the conference page

(Note from me, Mark, I found Loïc a great speaker: he has lots of energy, is funny and can transfer his enthusiasm to the room. If the topic interests you and the video becomes available, I would recommend watching this talk as a great sales pitch to get started with PyInfra.)

Ducks to the rescue - ETL using Python and DuckDB — Marc-André Lemburg

ETL stands for Extract, Transform, Load. Nowadays we usually do Extract, Load, Transform because databases are efficient in processing.

DuckDB is open source, in-process analytics data storage (OLAP). It is similar to SQLite, but for OLAP workloads. It has great Python support and uses SQL as standard query language. It’s pip installable, column based (Apache Arrow). It’s single writer but allows for multiple readers, so it’s not a distributed database.

Polars’ streaming can help with processing your data as a line-by-line stream so you don’t have to load the whole file in memory at once.

Example to load a CSV file into DuckDB extremely fast:

SELECT * FROM read_csv(...)

You can load the data into staging tables first to prepare everything and not mess up e.g. existing data. You can then transform data in DuckDB, e.g. filter out unneeded and duplicate data, validate data, fill in missing data, convert data types, etc. You can do the transforms in SQL. You can even use native integrations to write to PostgreSQL, MySQL, etc. Or worst case stream to Python.

Guidelines:

Know your queries, that is: know how your data is going to be used
Use the Pareto principle (80/20 rule): optimize for queries that are used often
Keep a healthy balance between performance and space requirements (which are often trade-offs)

Huge datasets: use the DuckLake extension.

To get started: “uv add duckdb”. Do some experiments and see how it works for you.

Link to the conference page

My takeaways

Yes, FOSDEM is crowded and you may not be able to get into every talk you want to see in person, but it’s still nice to be there. It’s well organised and there’s a friendly atmosphere. Lots of interesting projects to see and people to talk to. And it’s convenient if you want to sponsor your favorite projects by buying some merchandise.
It’s worth investigating signing Docker images (in the right way) further.
Lazy imports look useful! Once Python 3.15 lands it’s worth doing profiling on the projects I work on to see if we can use those to speed things up on startup and save some memory.
At work we recently decided to go for a monorepo for a project. I want to see if/how uv workspaces and prek can help us.
I’ve written a bunch of Ansible roles to configure my humble homelab and laptop. Perhaps it’s time to switch to PyInfra? It sounds promising and might be worth the investment of migrating to.

About the trip

Picture of the Atomium at night — The Atomium at night

Last year I drove to Brussels on Friday and stayed at the city center in the Citybox Brussels hotel for one night, since I had to be home on Sunday. The upside: it was just a short (15 minute?) tram ride to the FOSDEM location. Unfortunately it did mean I had to drive home that evening.

This year I had more time, so I booked a room at Falko Hotel for two nights. It’s about a 20–30 minute drive (depending on traffic) to the parking garage I used. And from there about 20 minutes with pubic transport to the Université libre de Bruxelles.

Staying another night meant I had more time for sightseeing, had the time to write this post from my notes and could drive home well rested the next day.

As for tech: besides a phone and laptop, I also brought along two items that made the trip more comfortable:

A MOJOGEAR Mini Evo powerbank to give my phone extra juice to make it through the day. With 10.000 mAh and up to 22.5W of power it’s more than sufficient for a day at a conference. With its small size and less than 175 grams in weight, it’s also easy to carry around.
A GL.iNet Opal (GL-SFT1200) travel router. I plug it in, hook it up to the hotel internet, start a VPN connection and all my other devices automatically connect to it and can use the internet without the hotel snooping on my traffic. (Not that I have an indication that my hotel would do that, but theoretically they could if I would not use a VPN.)

Streamlining Signed Artifacts in Container Ecosystems — Tonis Tiigi

Sequoia git: Making Signed Commits Matter — Neal H. Walfield

An Endpoint Telemetry Blueprint for Security Teams — Victor Lyuboslavsky

The Bakery: How PEP810 sped up my bread operations business — Jacob Coffee

Modern Python monorepo with uv, workspaces, prek and shared libraries — Jarek Potiuk

PyInfra: Because Your Infrastructure Deserves Real Code in Python, Not YAML Soup — Loïc “wowi42” Tosser

Ducks to the rescue - ETL using Python and DuckDB — Marc-André Lemburg

My takeaways

About the trip

Modern Python monorepo with `uv`, `workspaces`, `prek` and shared libraries — Jarek Potiuk