04 Aug 2019
Once again, various life stressors have resulted in me attempting to retreat
into my little shell, and whenever that happens I always seem to find something
new to work on. In this particular instance, the stress seemed to perfectly
coincide with my discovery of a new Linux distribution, NixOS. I became very
fond of the distribution almost immediately, and after two weeks of living on
it and learning a bit more about it, I recently wiped the MacBook and
reinstalled it with NixOS as its sole operating system.
The next series of blog posts, assuming I manage to stay consistent, will be
about my journey using NixOS, why I have become so fond of it so quickly, and
other interesting things I come up with along the way.
Why NixOS?
The predominant form of package management in Linux distributions tends to be
imperative. Even in distributions like Gen/Funtoo, the build file is in general
a static entity composed of variables that resolve how imperative commands,
such as emerge system, are executed. Essentially, we spend our time telling the
system how we want it to accomplish what we actually want, and by following the
list of imperative commands we (theoretically) end up with what we wanted.
The problem is, this doesn’t always work. To quote GlaDOS, “This former Linux
Developer would like to remind you that Dependency Hell is a real place
where you WILL be sent at the first sign of poor imperative order control.”
Essentially, because we are so focused on how something will be accomplished,
if anything along that chain breaks we end up in a genuinely unfortunate
situation.
This is not a new problem, and many distributions have attempted to resolve
it. Back in 2004/2005, when I was starting to get interested in Linux, I spent
some time talking to a friend about what I was using to tinker. While I cannot
remember the specific distributions, I do know that RPM was the primary
mode of package management on them. He essentially scoffed at me, and said while
he was glad I was using Linux, he wished me luck when I inevitably discovered
“RPM Hell,” perhaps one of the earliest forms of dependency hell.
When I inquired as to what he was using, he told me about Gentoo. This was a
very long time ago, back when Stage 1/Stage 2/Stage 3 installations were still
a thing. The benefit, I was told, is that because literally every part of the
package is imperatively defined ahead of time, you mitigate the chance of
dependency hell, because you’re modifying the entire operating system as a
single unit.
This began my extremely long and at times complicated history with the
Gen/Funtoo community during my past life. I still hold an incredible soft spot
for their ecosystem, but there are failure cases that can occur even within the
Portage subsystem, and doing things imperatively often means that a problem
isn’t discovered until the middle of rebuilding your system, for example during
an emerge world. Not to mention there is a performance penalty, which was even
worse during the P3/P4 days, for building everything completely from source.
Various hybrid binary/source systems were developed over time. One that I
remember offhand is Sabayon, which, if I remember correctly, started as a layer
over Gentoo: unmodified packages were installed from binaries, while anything
customized was built from source. However, this technique only solves the speed
issue; it does nothing to resolve the problems of imperative management.
How does Nix Help?
So Nix works by utilizing a declarative form of package management.
Essentially, instead of focusing on telling Nix how to build our system, we
tell it what system we want. We declare the operating system we desire as a
function, Nix evaluates that function, and then it decides how to realize that
system. This is combined with transactional, atomic procedures. Every
“generation” of Nix is a standalone operating system. Installing or
uninstalling packages never leaves any cruft behind, because packages are
simply symbolically linked into the active system. Removing a package from our
declaration removes the request to link it to the system, and as far as the
resulting system is concerned, it never existed.
Nix, in particular, is a functional package manager, an even stricter form of
declarative package management. Instead of rehashing the explanations, I’ll let
the wonderful Nix webpage explain the difference. NixOS is simply an entire
operating system built on top of Nix.
Beyond the benefits discussed on the official Nix website, there are other
benefits to putting the entire operating system under functional control.
As an example, let us take the boot process. Since the system we are going to
build below utilizes encrypted swap and an encrypted root, you would expect
that we need to handle a custom initrd, a custom fstab, and so on. You would be
right! But instead of juggling multiple files to get this done, we can do
everything right in our configuration.nix.
Swap:
swapDevices = [
  {
    device = "/dev/disk/by-uuid/f6533f92-baf2-4804-afda-880a7b5975ac";
    encrypted = {
      enable = true;
      keyFile = "/mnt-root/root/swap.key"; # Believe it or not, this is correct.
      label = "nixos-swap";
      blkDev = "/dev/disk/by-uuid/6babbdb8-26ec-43ee-b7ab-76b43015acd3";
    };
  }
];
Root FS:
boot = {
  initrd = {
    luks = {
      devices = {
        decrypted-disk-name = {
          device = "/dev/disk/by-uuid/0765a1fc-6045-45af-978e-db49609bc0e3";
          keyFile = "/root.key";
        };
      };
    };
  };
};
Additionally, while some distributions have opted to provide several tools for
building your grub.cfg, those still rely on modifying external files, for
example under /etc/default or other directories. Instead, we add this right
into our configuration.nix as well. We are just declaring what we want Nix to
do; whatever else it decides it needs is up to it.
Grub Configuration:
boot = {
  loader = {
    grub = {
      device = "nodev"; # This isn't for BIOS.
      efiInstallAsRemovable = true; # Try to use Standard EFI.
      efiSupport = true; # This IS for EFI.
      enable = true; # Grub is needed for our weird shit
      enableCryptodisk = true; # Add LUKS support
      extraInitrd = "/boot/initrd.keys.gz"; # LUKS Key
      zfsSupport = true; # Add ZFS support
    };
  };
};
Some people may note right away that we’re building our key into the initrd and
may worry about security issues, but we will get to that as well!
Needless to say, practically everything that is handled in scattered
configuration files on a normal Linux distribution is instead located in one
centralized file. We can break that file up and import others as well, similar
to any other programming language. Essentially, NixOS reduces the entire
operating system to a series of Nix-language source files, and we let Nix
handle all the rest!
About Our System
So, for this initial article I will be discussing what I wanted out of my new
daily driver operating system, and how I went about implementing it. You can
look at my entire Nix-Configuration through the GitHub repository
of the same name, but I won’t be referring to any specific files yet
because the repository is going to change layouts several times throughout this
series of articles, as I attempt to convert to a more functional form of
creating my system.
So, without further ado, what are our requirements?
- Encrypted Root Partition
- ZFS Root Partition
- Encrypted Boot(!) Partition, to protect our kernels and initrd
- Encrypted Swap
- Hibernation Support with encrypted swap(!)
There are actually more requirements, but these form the basis for this article.
Before we get into it, it is worth noting that I used several different sources
to compile all the steps needed to accomplish everything.
First, for Nix on ZFS, the NixOS Wiki Page of the same name was
instrumental in solving the basic requirements of our work. We skip over ZFS
native encryption because, while it may not be leaky in ways that matter for
most threat models, it is still slightly leaky. From man zfs:
zfs will not encrypt metadata related to the pool structure, including dataset
names, dataset hierarchy, file size, file holes, and dedup tables.
Next, we need to ensure that we can encrypt the boot directory. This blog post
was instrumental in getting things to work. Had I not found this post, it’s
possible that I would have forgone boot partition encryption, and I’m very glad
I was able to get it done.
Last, but not least, encrypted swap with hibernation was enabled by following
the first answer to this Stack Exchange question, with some gentle
modifications.
Additionally, it is worth noting that the configuration.nix file was simply
copied from my previous trial system and tweaked from there. The primary reason
the repository is out of date is that while trying to get the system up and
running, I ignored most standards of aesthetics, so I’d like to get it cleaned
up properly before releasing it.
Let’s get started!
Part 1 - Live Environment Pre-Work
While installing the trial system, I simply used the minimal disk environment,
without a GUI, and used my phone to access documentation. However, with the
number of things I wanted to try this time, I figured it would be best to have a
graphical environment to refer to the three sources mentioned above. This poses
a unique problem on a Mac, as the proprietary NVIDIA driver is the only driver
at the time of writing that will get an X-Session up and running.
Adding additional complexity is the fact that while preparing for this process,
I disassembled my MacBook and somewhere along the way caused some sort of issue
with the IO Board, which means I was limited to only one USB 3.0 port on the
left side of the computer.
To start, I used macOS to install the new macOS beta. This was important because
the only way to update MacBook firmware is through macOS. By installing a beta
release, I was trying to get out ahead of any firmware updates to be released in
the next six months. Fingers crossed, this is all that will be required, and we
won’t need to figure out how to get macOS back because of a new firmware
exploit.
Once the beta was installed, and everything was good to go, I downloaded a quick
live-image of ElementaryOS (I knew the Nix live image would be a problem, and
wanted to wait till the system was ready for installation to deal with it), and
used Etcher to write it to a USB disk.
Rebooting into ElementaryOS, I ran a series of commands on the main SSD.
Starting with blkdiscard, I initiated a manual TRIM on the disk to mark every
sector as free of data. Next, I used an ATA Secure Erase command, per the Arch
Wiki memory cell clearing article, to reset the drive to factory default write
speed. For good measure, I ran the --security-erase-enhanced form of the wipe.
Finally, I ran another blkdiscard on the drive, just to be really sure that
everything was gone.
Next, we rebooted into ElementaryOS again, this time telling GRUB I wanted the
whole live system to be stored in RAM. When this was done, I downloaded the
NixOS Graphical Install CD, and burned the ISO to the USB drive.
Rebooting, I was presented with the boot menu, and I made sure to load NixOS
entirely into RAM as well. The NixOS live CD does not contain any non-free
firmware or software, which means the Mac’s Broadcom WIFI chipset will not be
detected. Since I already needed to deal with one small issue with the live
system, it was easier to simply unplug the USB drive and use a USB->Ethernet
dongle to connect for the majority of the installation.
So, now we are inside the NixOS live system, at a command prompt, and we have
an internet connection. Attempting to boot into a GUI, as expected, results in
a failure to find a valid display device. This isn’t as much of an issue as it
could be on other systems; we simply need to edit /etc/nixos/configuration.nix
on the live system to include services.xserver.videoDrivers = [ "nvidia" ]; and
then run nixos-rebuild switch. Once this completes, we run the given command to
start the X-Session, and voilà, it works. Checking all of our networking areas,
we see that we have a proper internet connection, and we can move on to the
more fun things.
Part 2 - Disk Configuration
This part is fairly straightforward. We use gdisk to set up three partitions.
The first partition is our EFI Boot Partition; the second will be our swap
partition, made large enough to handle hibernation plus a little extra; and the
third is our new root partition, where ZFS will live.
$ gdisk /dev/sda
GPT fdisk (gdisk) version 1.0.4
Partition table scan:
MBR: not present
BSD: not present
APM: not present
GPT: not present
Creating new GPT entries in memory.
Command (? for help): o
This option deletes all partitions and creates a new protective MBR.
Proceed? (Y/N): Y
Command (? for help): n
Partition number (1-128, default 1): 1
First sector (34-2097118, default = 2048) or {+-}size{KMGTP}:
Last sector (2048-2097118, default = 2097118) or {+-}size{KMGTP}: +200M
Current type is 'Linux filesystem'
Hex code or GUID (L to show codes, Enter = 8300): ef00
Changed type of partition to 'EFI System'
Command (? for help): n
Partition number (1-128, default 2): 2
First sector (34-2097118, default = 2048) or {+-}size{KMGTP}:
Last sector (2048-2097118, default = 2097118) or {+-}size{KMGTP}: +20G
Current type is 'Linux filesystem'
Hex code or GUID (L to show codes, Enter = 8300): 8200
Changed type of partition to 'Linux swap'
Command (? for help): n
Partition number (2-128, default 3): 3
First sector (34-2097118, default = 411648) or {+-}size{KMGTP}:
Last sector (411648-2097118, default = 2097118) or {+-}size{KMGTP}:
Current type is 'Linux filesystem'
Hex code or GUID (L to show codes, Enter = 8300):
Changed type of partition to 'Linux filesystem'
Command (? for help): w
Final checks complete. About to write GPT data. THIS WILL OVERWRITE EXISTING
PARTITIONS!!
Do you want to proceed? (Y/N): Y
OK; writing new GUID partition table (GPT) to /dev/sda.
The operation has completed successfully.
Now we can take a look at what we have:
$ gdisk /dev/sda
GPT fdisk (gdisk) version 1.0.4
Partition table scan:
MBR: protective
BSD: not present
APM: not present
GPT: present
Found valid GPT with protective MBR; using GPT.
Command (? for help): p
Disk /dev/sda: 977105060 sectors, 465.9 GiB
Model: APPLE SSD SM0512
Sector size (logical/physical): 512/4096 bytes
Disk identifier (GUID): C35223A0-E004-474E-8B79-230B64658AB0
Partition table holds up to 128 entries
Main partition table begins at sector 2 and ends at sector 33
First usable sector is 34, last usable sector is 977105026
Partitions will be aligned on 2048-sector boundaries
Total free space is 2014 sectors (1007.0 KiB)
Number  Start (sector)  End (sector)  Size       Code  Name
   1              2048        411647  200.0 MiB  EF00  EFI System
   2            411648      42354687  20.0 GiB   8200  Linux swap
   3          42354688     977105026  445.7 GiB  8300  Linux filesystem
Command (? for help): q
With everything now set up on disk, it is time to build our filesystems.
Part 3 - Filesystems
With our disk structures in place, let’s talk about our filesystems. There will
be three “main” ones, but it gets a bit more complex than that. First, let’s
start by setting up our new EFI System Partition:
$ mkfs.vfat /dev/sda1
That solves that issue. Next, we need to set up our two encrypted partitions.
Despite the fact that we are going to use keyfiles, we should still establish a
typed passphrase in the event we need to tweak the partitions from outside the
operating system built on it. After setting up the encryption, we open each
encrypted container and assign it a friendly name to work with.
$ cryptsetup luksFormat /dev/sda2
Enter passphrase:
Verify passphrase:
Command successful.
$ cryptsetup luksFormat /dev/sda3
Enter passphrase:
Verify passphrase:
Command successful.
$ cryptsetup luksOpen /dev/sda2 nixos-swap
Enter passphrase:
Command successful.
$ cryptsetup luksOpen /dev/sda3 nixos-root
Enter passphrase:
Command successful.
Okay, so now we have our containers. The next step is to create the basic
filesystem inside each one: swap on our swap device, ZFS on our ZFS device. It
is important that we use the /dev/disk/by-id/ entry, which combines the UUID
with the friendly name of the device. This makes identifying things easier when
we work with them, and helps ZFS understand what exactly is going on.
$ mkswap /dev/disk/by-id/dm-uuid-CRYPT-LUKS1-deadbeef-nixos-swap
Setting up swapspace version 1, size = 20971520 KB
Before we get to the ZFS setup, I’d like to explain the options I am using.
While the explanations are available on the wiki page, they are restated here
for convenience.
- -O compression=lz4: Disk space on an SSD is more valuable than CPU time.
  Using LZ4 will not impact the user experience to any discernible degree.
- -O normalization=formD: Filenames are stored as normalization-form-D Unicode.
  While not really required, it could let you do some interesting things, and
  in general I like to use Unicode wherever possible.
- -O xattr=sa: Boosts performance for certain file attributes; this could
  become useful if I ever attempt system hardening (I likely will at some
  point).
- -O acltype=posixacl: Required for systemd-journald.
- -O mountpoint=none: Turns off ZFS’ automount machinery. In certain instances,
  ZFS’ and NixOS’ boot-time automounting machinery could trigger a race
  condition and prevent the system from booting. This allows us to bypass that
  possibility completely.
- -o ashift=12: Forces 4K sectors. It is very likely ZFS would have done this
  anyway, but instead of risking the chance that it reads the hardware
  incorrectly, I just declare it manually.
So, with that out of the way, here is what our nice bulky zpool creation
command looks like:
$ zpool create -O compression=lz4 -O normalization=formD -O xattr=sa -O acltype=posixacl -O mountpoint=none -o ashift=12 zroot /dev/disk/by-id/dm-uuid-CRYPT-LUKS1-deadbeef-nixos-root
With our zpool initialized, next we need to form the filesystems under it. We
will do three things. First, we will create a separate dataset for our /home
directory, so that user data is kept somewhere separate from the root partition.
Next, we will define a root
dataset, and within that, a nixos
dataset. This
means, should we ever want to, we could run multiple distributions off of the
same ZPool by nesting them under the root
dataset, and pointing their /home
at our home dataset. Is it likely we will ever do this? No, but it would be nice
to have if we ever decided to try it!
Once again, we set our mount points to legacy to ensure that the automount
machinery has absolutely nothing to go on, preventing it from firing during
boot.
$ zfs create -o mountpoint=none zroot/root
$ zfs create -o mountpoint=legacy zroot/root/nixos
$ zfs create -o mountpoint=legacy zroot/home
Where does this leave us? As of right now, we have our EFI System Partition,
freshly created with nothing on it. We have our Swap Partition, nested within a
LUKS encrypted volume, and we have our three ZFS datasets, within our zroot
zpool, within a LUKS encrypted volume.
Not yet done, we need to mount everything to the proper locations, and then do
some additional work to make sure everything will boot as we want it to. To
begin with, we will set up the easy ZFS mount points. Next, we need to mount
our EFI System Partition to /mnt/efi. We do this because it will allow GRUB to
write to the EFI partition, and have that point to our actual, encrypted, boot
directory, which we also create here. Lastly, for some additional work we will
do in a moment, we manually create the /root directory.
$ mount -t zfs zroot/root/nixos /mnt
$ mkdir /mnt/home
$ mount -t zfs zroot/home /mnt/home
$ mkdir /mnt/efi
$ mount /dev/sda1 /mnt/efi
$ mkdir /mnt/boot
$ mkdir /mnt/root
Now, we mentioned above that we would like to have the system automatically
unlock. This is secure, because before we can even access GRUB directly, we will
have to type in our decryption passphrase for our nixos-root partition.
Essentially, everything we are about to do will be encrypted based on that
master passphrase anyways, so there’s no real chance of a leak occurring.
To do this, we will create two binary keyfiles. swap.key will be the binary
key for the swap partition, and root.key the binary key for the root
partition. We will use LUKS to assign those keys to their respective LUKS
encrypted volumes, allowing the volumes to be decrypted both with a binary
keyfile and with a passphrase. The root.key file will then be packaged into a
CPIO archive, and GRUB will append this to the initrd image made by NixOS.
During boot, we will type in our master passphrase to unlock GRUB, select our
boot entry, and then GRUB will hand over control to the initrd after appending
our CPIO archive. The initrd will unlock /dev/sda3 using root.key and then hand
over control to systemd, which will continue the boot and load swap.key to
unlock the swap partition. Since the swap partition is not re-encrypted with a
random key on every boot, this process is repeatable, which is what allows
hibernation to function properly.
The end result of all of this is that a single master passphrase only needs to
be entered once to allow the system to boot properly. Without this method, we
would have to re-enter the nixos-root
passphrase twice, and the nixos-swap
passphrase once. I am not sure, but this also might break our hibernation
capabilities.
Let’s get started. First we will create our binary keyfiles from /dev/urandom,
then assign them to the volumes, then create the CPIO archive and stash it
where it needs to be.
$ dd count=4096 bs=1 if=/dev/urandom of=/mnt/root/root.key
$ dd count=4096 bs=1 if=/dev/urandom of=/mnt/root/swap.key
$ cryptsetup luksAddKey /dev/sda2 /mnt/root/swap.key
Enter passphrase:
Command successful.
$ cryptsetup luksAddKey /dev/sda3 /mnt/root/root.key
Enter passphrase:
Command successful.
$ cd /mnt/root
$ echo ./root.key | cpio -o -H newc -R +0:+0 --reproducible | gzip -9 > /mnt/boot/initrd.keys.gz
With that last string of commands, we are all set. To recap what was
accomplished in this section:
- We created the FAT filesystem for our EFI System Partition
- LUKS formatted /dev/sda2 and /dev/sda3 with a passphrase
- Opened /dev/sda2 and /dev/sda3 and assigned them to nixos-swap and nixos-root, respectively
- Created a swap FS on nixos-swap
- Created a zpool on nixos-root
- Created 3 datasets on nixos-root
- Mounted everything correctly
- Generated 2 binary keyfiles
- Assigned each binary keyfile to its respective partition
- Generated a CPIO archive for our initrd.
It’s time to move on to installing NixOS and configuring it to make use of our
work.
End of Part 1
While I had intended for us to have a system up and running by the end of part
one, this post is close to breaking 600 lines, and to be completely honest, this
is the most I have written in quite a while. Part 2 will cover getting the
system up and running, as well as a little preview of what our
configuration.nix
file will look like. Expect that installment to be quite a
bit smaller than this one. Finally, part 3 will deal with the initial
declarative configuration of our /home
directory.
The end goal is that all user data will end up in a ~/.library directory,
similar to the nix-store, and upon login, a symbolic-link farm will be built
according to our declarative home-management system. In this way, everything
from /etc/nixos to various dotfiles will be kept in an easy-to-understand
layout, and simply linked into their less-easy-to-understand locations by the
derivation created by home-management.
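To make the idea concrete, here is a toy sketch of a symlink farm in shell. The
.library layout and the vimrc file are hypothetical placeholders, not the
actual home-management implementation:

```shell
# Build a tiny symlink farm: managed files live in one directory,
# and the (stand-in) home directory only holds links into it.
TARGET="$(mktemp -d)"        # scratch stand-in for $HOME
LIBRARY="$TARGET/.library"   # stand-in for the managed store
mkdir -p "$LIBRARY/dotfiles"
printf 'set number\n' > "$LIBRARY/dotfiles/vimrc"

# One dot-prefixed symlink per managed file.
for f in "$LIBRARY/dotfiles"/*; do
  ln -sf "$f" "$TARGET/.$(basename "$f")"
done
ls -l "$TARGET/.vimrc"
```

A declarative tool would regenerate the farm from the declaration on each
login, so stale links disappear along with their entries.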
But, for another time.
10 Dec 2018
There are yet more updates to the blog. The first is that we now have an actual
domain name: zyradyl.moe. In keeping with my tradition of complete
transparency, the domain was acquired through Gandi.net after I found out that
IWantMyName was unable to accept Discover cards. While I am still supportive of
IWMN as a company, if they don’t accept my card I am unable to use them.
Next, the DNS for this site is now handled through Cloudflare, which means that
this site is now fully available via HTTPS with a valid SSL certificate. So,
small victories.
While running through the process of updating the blog, I noticed several things
were broken and went ahead and fixed those:
- The “Site Version” link in the sidebar now properly links to the GitHub
  source repository.
- A long-standing issue with pagination has been corrected by updating to the
  jekyll-paginate-v2 gem and rewriting the appropriate Liquid blocks.
- GitHub Pages does not support the v2 gem. Therefore, the site has been
  downgraded back to the v1 gem, and the Liquid blocks were cleaned up based on
  trial and error.
- Related posts are now actually related! This is accomplished by iterating
  through tags at compile time and creating a list of related posts. While this
  may not always be accurate, it is far more accurate than the time-based
  system Jekyll uses by default.
- A small issue has been corrected with the header file used across pages.
  There was a typo that was generating invalid HTML. It didn’t cause any
  visible issues, but it was a problem all the same.
- The archive page now uses a new Liquid code block. This resolves the
  long-standing </ul> problem, where the code would generate trailing closing
  tags.
- HTTPS links have been enforced across the board. I cannot promise the site
  that you visit will have a valid SSL certificate, but we will certainly try
  to redirect the connection over SSL now.
- HTML Proofer is still throwing a few errors related to my consistent use of
  the Introduction and Conclusion headers, but these are not actual errors.
- Even these errors have been fixed. HTML Proofer now returns a completely safe
  site.
I’m also in the process of going back through previous posts and cleaning up
the YAML front matter. While this front-matter previously had very little
impact on the site, it now can matter quite a lot with the way the related
posts system works.
09 Dec 2018
So, another week has gone by, and it is time to update this blog with what I
have learned. Unfortunately, I was not able to run any data-transfer
experiments this week. I decided to revisit the base system to focus on
encrypting backup data while it is at rest, which was one of the remaining
security vulnerabilities in this process. While end-users still have to trust
me, they can at least be assured the data is encrypted at rest. Essentially, if
the system were ever stolen, or our apartment door broken down, we would just
have to cut power and the data would be safe. Please keep in mind that this
week’s post only covers the root drive. I didn’t make much progress because of
things happening at work, but this is a nice, strong foundation to build upon.
Many of the steps in this post were cobbled together from various sources across
the internet. At the bottom of this post you can find a works cited that will
show the posts that I used to gather the appropriate information.
End Goal
The end goal is to ensure that the operating system’s root drive is encrypted at
rest. Full Disk Encryption is not an active security measure, it is a passive
one. It is primarily there to ensure that should the system ever be stolen, it
would not be readable. The root partition will not host any user data, so the
encryption should be transparent and seamless.
In short, we will utilize a USB key holding a keyfile, which LUKS will use to
unlock the LVM array, allowing the initramfs to hand control over to the
operating system.
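On Debian, one common way to wire this up is a /etc/crypttab entry that uses
the passdev keyscript to read the keyfile from a separate device; the name,
UUID, label, and path below are placeholders for illustration, not my actual
values:

```
# /etc/crypttab (illustrative placeholders only)
# <target>      <source>        <key file (device:path)>                  <options>
djehuti_crypt   UUID=xxxx-xxxx  /dev/disk/by-label/KEYUSB:/keys/root.key  luks,keyscript=/lib/cryptsetup/scripts/passdev
```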
Notes
Because we are using a solid state drive, and we will be filling the drive with
data, it was important for me to over-provision the drive. The SSD we’re using
comes with 240GB of space. We can assume that there is some form of manufacturer
over-provisioning in play to get that number, if I had to guess I would assume
there is actually 256GB of NAND memory on the drive, but only 240GB are made
available to the user. This is a fairly reasonable level of over-provisioning.
However, since we plan to fill the drive with pseudorandom data in order to
obfuscate the amount of data actually in use, this 16GB could be used up quite
quickly. SSDs cannot actually rewrite sectors in place; they have to run a
READ/ERASE/WRITE cycle. This is typically handled by writing the new block to
an over-provisioned area and then pointing the drive’s firmware at that block.
In this way the drive avoids the ERASE penalty, which can be on the order of
0.5 seconds per block.
Essentially then, every single write to the drive will require a
READ/ERASE/WRITE cycle, so padding the over-provisioning is a very good idea. It
will help with wear leveling and prevent severe write amplification, while also
making the drive “feel” faster.
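To put rough numbers on this (assuming, as above, 256GB of raw NAND with 240GB
exposed, plus the 200GB usable size we settle on later in this post), the
reserved fraction works out as follows:

```shell
# Back-of-the-envelope over-provisioning math, in integer percent.
# The figures are the assumptions from the text, not measured values.
RAW=256      # assumed raw NAND, GB
VENDOR=240   # GB the vendor exposes
USABLE=200   # GB left visible after our later HPA change
VENDOR_OP=$(( (RAW - VENDOR) * 100 / RAW ))  # vendor reserve, ~6%
TOTAL_OP=$(( (RAW - USABLE) * 100 / RAW ))   # reserve after HPA, ~21%
echo "${VENDOR_OP}% -> ${TOTAL_OP}%"
```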
Prior Work
Before we get into the new installation, we need to prepare the drive for its
new role. Unless the flash cells are at their default state, the firmware will
regard them as holding data and will not utilize them for wear leveling, thus
rendering the over-provisioning useless.
To begin, boot the system via a Debian Live-CD and open up a root prompt using
sudo.
If you, like me, prefer to work remotely, you will then need to run a sequence
of commands to prep the system for SSH access: update the package lists,
install OpenSSH, set a password for the live-CD user, and finally start the
service. Once all of this is complete, you can log in from a more comfortable
system.
# apt-get update
# apt-get install openssh-server
# passwd user
# systemctl start sshd
We will need to install one last software package, hdparm. Run apt-get install
hdparm to grab it. Once you have done so, run hdparm -I /dev/sda. Under
“Security” you are looking for the words “not frozen”. If it says frozen, and
you are working remotely, you will need to access the physical console to
suspend/resume the machine. This should unfreeze access to the ATA security
system.
The first thing we need to do is to run an ATA Enhanced Erase. After this is
done, I still like to run blkdiscard
just to make sure every sector has been
marked as empty. Finally, we will use hdparm
to mark a host-protected-area,
which the drive firmware will be able to use as an over-provisioning space.
To calculate the HPA size, figure out what size you want to be available to
you, convert that into bytes, and divide by 512, the sector size. This will
give you the number to pass to hdparm.
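As a worked example, here is the arithmetic behind the sector count used in the
hdparm invocation below (200GB usable, decimal units, 512-byte sectors):

```shell
# Convert the desired usable size (200 GB, decimal) into 512-byte sectors.
TARGET_BYTES=$(( 200 * 1000 * 1000 * 1000 ))
SECTORS=$(( TARGET_BYTES / 512 ))
echo "$SECTORS"   # 390625000, the value passed to hdparm -Np
```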
# hdparm --user-master u --security-set-pass Eins /dev/sda
# hdparm --user-master u --security-erase-enhanced Eins /dev/sda
# blkdiscard /dev/sda
# hdparm -Np390625000 --yes-i-know-what-i-am-doing /dev/sda
# reboot
Once this is done reboot immediately. There is a lot that can go wrong if
you fail to reboot. At this point, I swapped out my disk for the Debian
installer. If you are doing this on your own 2006-2008 MacMini, you may want
to use the AMD64-mac ISO that the Debian project provides.
From here, we just have to confirm that the drive shows up how we want in the
installer (200GB in size, in my case), and we can proceed with the installation.
Installation
Most of the Debian installation process is self explanatory. The only point
where I will interject is partitioning. Because of the way the MacMini2,1
boots, it is important that we use an MBR based grub installation. You can do
a 32bit EFI installation, but it is very fragile, and I’m not a fan of fragile
things. That being said, I still wanted the ability to use GPT partitions. I
like being able to label everything from the partition up to the individual
filesystems.
Accomplishing this is actually fairly easy these days. You just need to create
a 1MB grub_bios partition as part of your scheme and you’re good to go. To get
the level of control we need, we will select manual partitioning when prompted
to set up our partitions in the installer.
Create a new partition table (This will default to GPT), and then lay out your
initial partition layout. It will look something like this:
<PART #> <SIZE> <NAME> <FILESYSTEM> <FLAGS>
#1 1MB BIOS_PARTITION none grub_bios
#2 1GB BOOT_PARTITION ext4 bootable
#3 199GB ROOT_PARTITION crypto crypto
When you select “Physical Volume For Encryption” it will prompt you to configure
some features. You can customize the partition there, but I actually wanted more
options than the GUI provided, so I accepted the defaults and planned to
re-encrypt later. Please make sure to allow the installer to write encrypted
data to the partition. Since we have already set up a customized HPA, a
potential attacker already knows the maximum amount of cipher text that can
be present, and if the HPA is disabled they would likely be able to gain
access to more. Therefore, it is important that we take every possible
precaution.
Once this is done, you should scroll to the top where it will say “Configure
Encryption” or something similar. Select this option, then select the physical
volume we just set up, and it should drop you back to the partitioning menu.
This time, however, you will be able to see the newly unlocked crypto partition
as something that we can further customize.
Select that volume and partition it like so:
<PART #> <SIZE> <NAME> <FILESYSTEM> <FLAGS>
#1 199GB none lvm
The LVM option will show up in the menu as "Physical Volume for LVM." From here,
we go back up to the top of our menu and select "Configure Logical Volume
Manager." You will then be taken to a new screen where it should show that you
have one PV available for use. Create a new volume group that fills the entire
PV and name it as you would like. For this project, I named it djehuti-root
and completed setup.
Next, we need to create a Logical Volume for each partition that we would like
to have. For me, this looked like the following:
<Logical Volume> <Size> <Name>
#1 30GB root-root
#2 25GB root-home
#3 10GB root-opt
#4 05GB root-swap
#5 05GB root-tmp
#6 10GB root-usr-local
#7 10GB root-var
#8 05GB root-var-audit
#9 05GB root-var-log
#10 05GB root-var-tmp
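For reference, the non-interactive equivalent of the layout above would look roughly like the following lvcreate calls. This is only a sketch: it assumes the volume group djehuti-root already exists on the unlocked PV, requires root and the LVM2 tools, and is meant for illustration rather than for running from inside the installer.

```shell
# Sketch only: recreate the logical volumes from the table above
# inside the djehuti-root volume group (requires root and LVM2).
lvcreate -L 30G -n root-root djehuti-root
lvcreate -L 25G -n root-home djehuti-root
lvcreate -L 10G -n root-opt djehuti-root
lvcreate -L 5G  -n root-swap djehuti-root
lvcreate -L 5G  -n root-tmp djehuti-root
lvcreate -L 10G -n root-usr-local djehuti-root
lvcreate -L 10G -n root-var djehuti-root
lvcreate -L 5G  -n root-var-audit djehuti-root
lvcreate -L 5G  -n root-var-log djehuti-root
lvcreate -L 5G  -n root-var-tmp djehuti-root
```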
Your layout may be similar. Once this is done, you can exit out and you will
see that all of your logical volumes are now available for formatting. Since I
wanted to stick with something stable, and most importantly resizable (more on
why later), I picked ext4 for all of my partitions. We will tweak mount
options later. For now, the end product looked like the following:
<PARTITION> <FS> <MOUNT POINT> <MOUNT OPTIONS>
/dev/sda2 ext4 /boot defaults
/dev/djehuti-root/root-root ext4 / defaults
/dev/djehuti-root/root-home ext4 /home defaults
/dev/djehuti-root/root-opt ext4 /opt defaults
/dev/djehuti-root/root-swap swap none defaults
/dev/djehuti-root/root-tmp ext4 /tmp defaults
/dev/djehuti-root/root-usr-local ext4 /usr/local defaults
/dev/djehuti-root/root-var ext4 /var defaults
/dev/djehuti-root/root-var-audit ext4 /var/audit defaults
/dev/djehuti-root/root-var-log ext4 /var/log defaults
/dev/djehuti-root/root-var-tmp ext4 /var/tmp defaults
Once everything is set up appropriately, follow through the installation until
you get to the tasksel portion. You really only want to install an SSH server
and the standard system utilities pack. Once the installation completes, reboot
into your server and make sure everything boots appropriately. We're going to be
doing some offline tweaking after this point, so ensuring that everything is
functioning as-is will save you a lot of headache.
Once you are satisfied the initial installation is functioning and booting
correctly, it is time to move on to re-encrypting the partition with our own
heavily customized parameters.
Re-Encryption
This process isn’t so much difficult as it is simply time consuming. Go ahead
and reboot your system to the boot media selection screen. You will want to
swap out your Debian Installation CD for the Debian LiveCD that we used earlier.
Once the disks have been swapped, boot into the live environment and then
bring up a shell. We will first need to install the tools that we will use, and
then run the actual command. The command itself is fairly self-explanatory,
so I won't walk through it, but I will explain the reasoning behind the
parameters below.
# apt-get update
# apt-get install cryptsetup
# cryptsetup-reencrypt /dev/sda3 --verbose --use-random --cipher serpent-xts-plain64 --key-size 512 --hash whirlpool --iter-time <higher number>
So, onto the parameters:
- cipher - I picked Serpent because it is widely acknowledged to be
a more “secure” cipher. Appropriate text from the above link
is as follows: “The official NIST report on AES competition
classified Serpent as having a high security margin along
with MARS and Twofish, in contrast to the adequate security
margin of RC6 and Rijndael (currently AES).” The speed
trade-off was negligible for me, as the true bottleneck in the
system will be network speed, not disk speed.
- key-size - The XTS algorithm requires double the number of bits to
achieve the same level of security. Therefore, 512 bits
are required to achieve an AES-256 level of security.
- hash - In general, I prefer hashes that have actually had extensive
cryptanalysis performed to very high round counts. In the best
known attack on Whirlpool, a worst case situation where the
attacker controls almost all aspects of the hash, the time
complexity is still 2^128 on 9.5 of 10 rounds. This
establishes a known time to break of over 100 years.
- iter-time - The higher your iteration time, the longer it takes to unlock,
but it also makes each guess at the passphrase more expensive
for an attacker. So if we combine what we know above with a
large iteration time, we gain fairly strong security at the
expense of a long unlock time when using a passphrase.
Once these specifications have been entered, you simply need to press enter and
sit back and relax as the system handles the rest. Once this process is
complete, you should once again reset and boot into the system to verify that
everything is still working as intended. If it is, you are ready for the next
step, which is automating the unlock process.
Auto-Decryption
There are a few ways to handle USB key based auto-decryption. The end goal is
to actually use a hardware security module to do this, and I don’t anticipate
the FBI busting down my door any time soon for hosting the data of my friends
and family, so I opted for one that is easily extendable.
Essentially, the key will live on an ext4 filesystem. It will be a simple
hidden file, so nothing extremely complex to find. This shouldn't be considered
secure at this point, but it is paving the way to a slightly more secure
future.
The first thing that I did, though it isn't strictly necessary, is write random
data to the entire USB stick. In my case, the USB drive could be found at
/dev/sdb.
# dd if=/dev/urandom of=/dev/sdb status=progress bs=1M
Once this is done, we’ve effectively destroyed the partition table. We will
recreate a GPT table, and then create a partition that fills the usable space
of the drive.
# apt update
# apt install parted
# parted /dev/sdb
(parted) mklabel gpt
(parted) mkpart KEYS ext4 0% 100%
(parted) quit
Now we just create the filesystem, a mount point for the filesystem, and make
our new LUKS keyfile. Once the file has been created, we just add it to the
existing LUKS header.
# mkfs.ext4 -L KEYS /dev/sdb1
# mkdir /mnt/KEYS
# mount LABEL=KEYS /mnt/KEYS
# dd if=/dev/random of=/mnt/KEYS/.root_key bs=1 count=4096 status=progress
# cryptsetup luksAddKey /dev/sda3 /mnt/KEYS/.root_key
After this point, the setup diverges a bit depending on what guide you follow.
We will stick close to the guide posted to the Debian mailing list for now, as
that guide got me a successful boot on the first try. The others are slightly
more elegant looking, but at the expense of added complexity. As such, they may
end up being the final configuration, but for this prototyping phase they are
a bit excessive.
We have to modify the crypttab file to enable the keyfile to be loaded off of
our freshly set up key drive.
sda3_crypt UUID="..." /dev/disk/by-label/KEYS:/.root_key:5 luks,initramfs,keyscript=/lib/cryptsetup/scripts/passdev,tries=2
At this point, we need to repackage our startup image, update grub, and reboot
to test the whole package.
# update-initramfs -tuck all
# update-grub
# reboot
At this point the system should boot automatically, but you will notice a weird
systemd-based timeout that happens. This is mentioned in the guide posted to
the Debian Stretch mailing list, and is fairly easy to solve. We just need to
create an empty service file to prevent systemd from doing its own thing.
# touch "/etc/systemd/system/systemd-cryptsetup@sda3_crypt.service"
# reboot
At this point, everything should boot correctly and quickly. You may notice a
few thrown errors, but it shouldn't be anything severe, just services loading
out of order.
At this point, it used to be possible to allow for the creation of a fallback
in the event that the key drive wasn’t present, but that seems to have been
removed. I plan to look into it further when I have more time.
Conclusion
This concludes the first part of the Operating System setup process. The next
step was originally planned to be thin-provisioning the partitions inside the
djehuti-root volume group, but there seem to be some problems in getting
the system to boot from a thin-provisioned root. I'm looking into a weird
combined system, where the root is static but all the accessory partitions are
thinly provisioned, but it will take time to tinker with this and report back.
Thin Provisioning isn’t strictly required, but it is a rather neat feature and
I like the idea of being able to create more partitions than would technically
fit. I’m not sure when this would be useful, but we will see.
Once all of this is finalized, we will move on to hardening the base system,
and last but not least creating the Stage 1 Project page. Then it is back to
experiments with data synchronization. This is a fairly large step back in
progress, but I am hopeful it will result in a better end product, where
security can be dynamically updated as needed.
Works Cited
The following sources were invaluable in cobbling this process together. I
sincerely thank the authors both for figuring the process out and documenting
the process online.
03 Dec 2018
Hello! It’s that time of the week again, where I update everyone on my latest
work. This episode is far less technical and focuses more on the concept of a
“One and Done” backup solution, aka the holy grail of data maintenance.
It fucking sucks.
Introduction
This entry is slightly unidirectional. The concept of a simple, easy to
implement, catch everything you might ever need solution is quite literally the
holy grail, yet it has never honestly been implemented. Sure, user data is
generally scooped up, but in the day and age of game mods, and with some
development projects taking place outside of the User directory, it seemed
prudent to at least attempt the full backup. Well, I’ve been attempting it
for seven days. Here’s what I’ve found.
Focus
We will not be focusing on the space impact of a complete backup. This is
actually fairly negligible. With out-of-band deduplication, only one set of
operating system files would ever be stored, so server side storage would reach
a weird type of equilibrium fairly quickly. Instead, I’ll talk about three
things:
- Metadata Overhead
- Metadata Processing
- Initial Synchronization
There may be another post tonight talking about additional things, but this
deserves its own little deal.
Metadata Overhead
A fully updated Windows 10 partition of your average gamer, aka my fiancé, is
composed of 479,641 files and 70,005 directories which comprise a total
data size of ~216 GiB. This is actually just the C drive and typical
programs. If you factor in the actual game drive in use by our test case, that
drive contains 354,315 files and 29,111 directories which comprise a
total of ~385 GiB of space.
In summation, an initial synchronization of what is typically considered a "full
system backup" comprises 833,956 files and 99,116 directories totaling
~601GiB, which results in an average filesize of ~755KiB and an average
directory size of ~9 files.
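Those averages can be double-checked with a bit of shell arithmetic (integer division, so the results are truncated):

```shell
# ~601 GiB across 833,956 files, expressed in KiB per file.
echo $((601 * 1024 * 1024 / 833956))    # 755
# Files per directory (truncated; the true value is ~8.4).
echo $((833956 / 99116))                # 8
```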
SyncThing creates a block store that is comprised of, by default, 128KiB
blocks. This means that for our system, assuming the data is contiguous, we need
4923392 Metadata Entries. Assuming the files are NOT contiguous, this is
4,923,392 metadata entries. Assuming the files are NOT contiguous, this is
probably closer to about 5 million metadata entries. As of right now, the
server side metadata storage for the testing pool is at 1.7 GiB and initial
synchronization is not yet complete. Extrapolating a bit, we can assume that
2.0 GiB would not be an unreasonable size for a final server side data
store.
The client side store, at the time of writing, is approximately 1 GiB and
may grow slightly larger. However, I will use 1 GiB. This means that there
is a plausible total of 3GiB of metadata overhead representing an overhead
percentage of ~0.5% across the pool. Scaling up, this means 10 clients
with 1TB of data each would require 51.2GB of Metadata.
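The block count and overhead figures above can be reproduced directly: 601 GiB divided into 128 KiB blocks, and 3 GiB of metadata measured against the pool size.

```shell
# Number of 128 KiB blocks needed for ~601 GiB of contiguous data.
echo $((601 * 1024 * 1024 / 128))                   # 4923392
# Metadata overhead: 3 GiB against 601 GiB, as a percentage.
awk 'BEGIN { printf "%.2f%%\n", 3 / 601 * 100 }'    # 0.50%
```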
Metadata Processing
Should anything happen to the metadata store, it would need to be rebuilt by
data reprocessing. This introduces a potentially massive liability, as scanning
frequency would need to be reduced to not impact the rebuild operation.
The server is capable of a hash rate of 107MB/s. I am picking the server’s
hash rate because it is both the slowest hash rate of the pool and would have
the most metadata that would need to be rebuilt.
For a complete rebuild of the data of our current cluster, it would take the
server ~96 Minutes during which no data synchronization could occur. This
equates to a minimum of 1 Missed Hourly Update and could potentially result
in up to 2 missed hourly updates if the timing was unfortunate enough.
For a complete rebuild of the data of our theoretical cluster, we will allow for
a hash rate of 300MB/s. The total data needed to be rebuilt would be 10TB.
This would result in a database rebuild time of ~10 Hours, which could result
in up to 11 missed synchronization attempts.
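Both rebuild estimates follow from the same back-of-the-envelope arithmetic (treating MB and MiB as roughly interchangeable, as the figures above do):

```shell
# Current cluster: ~601 GiB (~615,424 MB) at 107 MB/s, in minutes.
awk 'BEGIN { printf "%.0f\n", 601 * 1024 / 107 / 60 }'    # 96
# Theoretical cluster: 10 TB (10,000,000 MB) at 300 MB/s, in hours.
awk 'BEGIN { printf "%.1f\n", 10000000 / 300 / 3600 }'    # 9.3
```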
Initial Synchronization
The initial synchronization is composed of three primary parts. First, the
client and host must agree on what folders to synchronize. Second, the client
must build a database of the content hosted locally. Finally, utilizing a
rolling hash algorithm, data is entered into the metadata cache and transmitted
to the server.
Per the developer of SyncThing, millions of small files are the worst case
scenario for the backup system. In my independent, albeit anecdotal, testing,
the synchronization process was still running after 7 days. This represents a
very poor user experience and would not be ideal for a widespread rollout.
Conclusion
The primary goal of a backup utility is to synchronize files and achieve cross
system consistency as quickly as possible. While it is true that eventually
consistent systems are utilized in large scale operations, this type of
consistency is allowable only, in my opinion, at data sizes over 10TB. The
current testing set is approximately 1TB at most, and thus this is unacceptable.
Either the backup paradigm must change, or the utility used to implement it
must change. While I do not expect to find any faster utilities for performing
the backup process, I do plan to continue to experiment. At this time, however,
it seems that the most likely way to make the process as friendly as possible
would be the implementation of a default backup subset, with additional data
added upon user request, and after the high priority synchronization had been
completed.
30 Nov 2018
Yes, it’s that time of year again. I have updated the blog! You can find more
information below.
Analytics
Google Analytics is back. I understand that some people may not like being
tracked, but at that point you should have an ad or tracking blocker installed.
I recommend looking at the Brave Browser, which is what I personally use, or
installing uBlock Origin. The reason I have added this back to the blog is that
I have noticed links to this blog appearing in various places over the web, and
I would like to be able to detect how much traffic I am getting from these
links.
I wholeheartedly understand if you disapprove of the use of Google Analytics. If
anyone is able to suggest a better service, that collects less user data, please
open an issue in the GitHub repository for this site. I will gladly change
providers as long as the new one is also free.
Theme Updates
I forked the Lanyon repository and applied all the currently pending pull
requests. This should keep everything up to date with the latest version of
Jekyll. To make it easier for anyone else looking for that information, I
created a new PR in the Lanyon repository pointing to my mergers. You can also
find them under my GitHub site.
Fonts
Somehow the fonts on the site got nuked during the upgrade. They are back in
place now.
Page Speed
The new updates have not yet been optimized. Previously I used Google’s page
load speed thingy to optimize the site. Maintaining that by hand is one of the
reasons the upgrade was so painful to implement. I’m looking for a way to
automate the process the same way that I have currently automated the link tests
that I run prior to a push. This will likely involve writing a new process in
the Rakefile, so it will take some time. In the meantime, the only thing that
is really being pulled is a few font files and the analytics script, so the
load impact shouldn’t be too bad.
Organization
There is a weird issue between rendering the site on my local machine and the
way GitHub pages renders it on their side. To solve this I created a new
template and appended it to all the pages that should not appear in the sidebar.
Hopefully this strikes a good balance between being able to use standardized
templates, as well as ease of use. The new template simply imports the existing
page template under a new name.
Images
While I am primarily focused on text on this blog, I have recently included a
few images. These are also hosted on GitHub, so I cleaned up the way the image
directory is laid out. This has impacted all of one image, but should make any
expansion in the future much easier.
Liquid Changes
There was some issue with the way the Liquid on the Tags page was written. This
has been corrected accordingly.
About Page
The about page has been correctly updated! My view on some of the listed issues
has evolved in recent years. You will now find those things struck through with
comments added underneath.
Future Improvements
There is still minifying, javascript inlining, and font work to be done to make
the page run faster. Additionally, the tags page is simply a disaster. Between
all of that and the project pages, there is still a lot left to be done.
Hopefully, all will be accomplished in time.
Conclusion
Hopefully all of these changes make the site a bit more enjoyable to use. I do
understand if the use of analytics bothers you. Please make an issue in the
tracker, or if someone already has, comment accordingly. If there are enough
people using this site that honestly care, I might consider removing the
analytics while I do further research.