Making a micro Linux distro (2023)

How does “infrastructure on top of infrastructure” run?

Building our almost useless Linux micro distribution

Bonus section: making an actually useful micro distribution with u-root

Lastly, we are doing this example on the RISC-V architecture, specifically QEMU’s riscv64 virt machine. There’s very little in this article that is specific to this architecture, so you might as well do an almost identical exercise for other architectures like x86. We recently went through the RISC-V boot process with SBI and bare metal programming for RISC-V, so this is just a continuation up the software stack.

Warning: This article is a very simplified view of a Linux distribution. There are things written below that are not 100% accurate, but more like 99.9%. This article is meant for beginners and helping them form a basic mental framework for understanding Linux systems. More advanced users may be triggered by over-simplification in some parts.

Basically, the OS kernel does a lot of heavy lifting so that you can easily run your code on generic, complicated machinery such as your smartphone. What is written above probably doesn’t do full justice to kernels, which do a whole lot of things, but those few paragraphs should give a fairly good idea of a kernel’s main tasks, and there are many.

Linux is an extremely popular operating system kernel. It can be built to run on many architectures (really, a lot), and it is open source and free to use. A lot of people are “Linux users”, but what exactly does it mean that someone “uses Linux”? Those users typically install something like Debian or Ubuntu on their machines and use Linux that way. So what does that mean?

However, the kernel alone is not infrastructure for Chrome to run. We need to run sort of “infrastructure on top of infrastructure” to achieve the full infrastructure to run Chrome. Again, much like in the SBI article, we’re just layering abstractions on top of each other in some way, so essentially there is nothing new here, just the way we do it.

You may now guess where this is going — a Linux distribution is really the Linux kernel plus the infrastructure on top of the kernel infrastructure. Let’s dig into it.

Again, the kernel does a whole bunch of things, a million times more than what we can cover in a single article, but it definitely has its limits and it doesn’t do all the heavy lifting on your everyday personal device — and this is where something outside of the kernel gets into the picture.

The collection of the kernel, the processes that get launched right after the kernel, and the tools at your disposal represents the Linux distribution. It’s essentially packaging for the kernel alongside all these useful tools that do more around the machine than the kernel alone does (the kernel still provides the infrastructure for everything outside of it to run; nothing bypasses the kernel).

Let’s get our hands dirty and build something that’s basically useless but we’ll actually end up booting it for real. You may want to refresh your memory on the RISC-V boot process, I think it will be rewarding here.

I’m on an x86 platform here, so I will depend heavily on the cross-platform toolchain to build things for RISC-V. You will likely do something similar (I’m not sure I have yet seen someone build the RISC-V kernel on RISC-V itself).

Now is the time to configure the build. The first step is to make the defconfig target, which basically initializes your configuration file.

Note: Here and below, you may want to use a different CROSS_COMPILE prefix, depending on how the cross compilation tool is identified on your machine

This was hopefully quick, and the .config file should now be generated. It contains a lot of IDs for individual configuration options and their values, very often in yes/no form (e.g. CONFIG_FOO=y for an enabled option; in a generated .config, a disabled one typically shows up as the comment line # CONFIG_FOO is not set rather than CONFIG_FOO=n). You could edit the file manually, but I personally wouldn’t recommend it, especially as a beginner (I don’t consider myself an expert at this either). A better way to edit it is through the curses-based pseudo-interface, which you reach by running the menuconfig make target with the same settings you used for defconfig.

It’s time to build the kernel! Quick note: make famously has the -j flag, which sets the concurrency of the build, meaning it lets the build run several jobs simultaneously. If you want to build faster but aren’t sure what value to use, count the number of cores; if it’s something like 8, just pass -j8. I’m on a 16-core machine, so I run the build with -j16.

This can take some time, though for the RISC-V build, it shouldn’t take awfully long, but I would expect at least a few minutes.
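If you would rather not count cores by hand, the rule of thumb for -j can be sketched in a few lines of Go. This is only an illustration: the helper name jobsFlag is made up for this example, and it prints a suggestion rather than running make for you.

```go
package main

import (
	"fmt"
	"runtime"
)

// jobsFlag (hypothetical helper) formats the make -j flag for a given
// number of parallel jobs. Values below 1 are clamped to 1.
func jobsFlag(cores int) string {
	if cores < 1 {
		cores = 1
	}
	return fmt.Sprintf("-j%d", cores)
}

func main() {
	// runtime.NumCPU reports the logical CPUs usable by this process.
	fmt.Println("suggested flag:", jobsFlag(runtime.NumCPU()))
}
```

One job per core is a reasonable starting point, not a rule; some people go a bit above or below it depending on memory and disk speed.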

Switching to the UART view, we see that OpenSBI started tidily and Linux took over! Great! We even see some references to the SBI layer that we discussed before.

Seems like Linux is capable of dynamically figuring out the capabilities of the underlying RISC-V hardware. I’m not sure what exactly the mechanism behind it is: it could be passed through the device tree that we mentioned in the previous article, or something in the ISA itself could tell the kernel.

I guess this means that printk will now write to tty0? printk is basically a way to write out messages from the kernel space. Remember, your typical printf from C’s stdio.h is meant for running in the user space, not kernel space, so kernel space must have its own solution, and it is printk.

Great, Linux knows there is a UART at 0x10000000, just like we established before. Linux can now choose whether to use the SBI interface to drive the UART or to talk to it directly (if S-mode is allowed to do so on that machine, that is). On many platforms, the OS can ignore the hardware-access services that lower-level software such as the BIOS offers, and from what I hear, this happens a lot.

We’ll keep it simple in this article, and we won’t customize anything in the kernel unless we have to.

Remember how we said that Linux pretty much always needs a filesystem to be useful, and that all the “infrastructure on top of infrastructure” lives in the user space? Well, we didn’t explicitly pass anything related to the filesystem, and we surely didn’t pass any user space code to serve as the init, though we didn’t even get to the latter.

This initial filesystem has a name: initramfs. You’ll often hear it called initrd too (rd is short for ramdisk). The latter name is the one QEMU uses for loading the filesystem (the -initrd flag).

The filesystem is packaged as a cpio archive, which is conceptually similar to tar but uses a different binary format.

I guess this just means init shouldn’t finish, so it should be easy to fix? Let’s just make it print something every 10 seconds and never stop. Important to note: our output worked, we see a “Hello world” string!

We’ll write a new init, but let’s also make our initramfs a little more complex too. Let’s remember how we said that init starts up all the other processes on the machine. Wouldn’t it be nice if we actually had some sort of a shell? After all, that’s what we typically have with Linux — shells go well with Linux. We’ll build a useless shell, the one that just tells us what we asked it to do (echoes back the input).

For the “shell” we’re building, I want to get a little more creative. Why don’t we write this one in Go instead of old school C?

The bits in this console excerpt enclosed with triple square brackets are my user-provided input over UART. You can see 3 things interleaved on the UART

Jokes aside, you can make an exercise out of this and implement some sort of a mini shell out of this little_shell. Instead of just echoing back the commands given to it, you could make it actually understand what mkdir is. You can even have it fork off a process to execute that elsewhere. Sky is the limit, you’re in the Linux userspace!

There are many other things the kernel does for us, but let’s just stop here for now and appreciate this. It may not look like a lot, but the kernel gives us a pretty solid, portable infrastructure with which we can develop high level software while often disregarding the complexities of the underlying machine.

This is now a game of words in my opinion. In my view, what matters is that the reader now has an understanding of what Linux as the kernel is, what “infrastructure” it offers, and what is running in the user space and what is running in the kernel space.

Some people may call the kernel itself an operating system, some people will refer to the whole distribution as the operating system, or they may come up with something completely different. I hope that at this point you have a good understanding of what is happening on a machine once Linux is started and where the responsibilities of each component end (or you can at least imagine the boundaries on a more complex system).

The reason why I like the u-root project is because it’s so insanely easy to use. Its usage is a bit creative though, so there are really 2 steps here:

And that’s really it: this cpio file can now simply be run with QEMU and you’ll boot right into a shell! Go through the u-root documentation to understand how you can customize the initramfs image you get, including what sort of changes you can make to the init process behavior, but I think the default setup is already amazing to explore with.

And as you can see by the little /# prompt, you’re actually in a shell! u-root’s init forked off a shell process and gave it control over the UART.

This little shell that u-root gives you even supports Tab-completion! I will say I have occasionally encountered some hiccups with it, and it’s definitely not your full-blown Bash, but it’s more than just a toy.

First, we need to attach a network device. We add -device virtio-net-device,netdev=usernet -netdev user,id=usernet,hostfwd=tcp::10000-:22 to our QEMU CLI. The hostfwd part forwards TCP port 10000 on the host to port 22 in the guest; those two numbers do not really matter here as we won’t be SSH’ing into this machine (maybe you can do that exercise yourself, but I’m afraid it won’t be easy). The default kernel build should indeed bake in the virtio network device drivers, so this should more or less just work.
