The forbidden storage layer

Storage in my homelab has always been a sore point. When I used to run a Kubernetes cluster for all my workloads, I relied on Longhorn.

I moved to Docker and needed mountable storage that is not NFS or SMB (i.e. something a database can run on) and that is persistent, so that when I re-provision a Docker host all my volumes can be re-mounted without restoring from backups.

Ceph could be an option, but it’s too heavy. I would only ever max out my lab at 3 compute nodes, and the reason I downscaled from Kubernetes in the first place was so that I don’t have to run an HA setup. It saves 30 EUR per month, so about 6 coffees? A worthy trade-off.

I wanted S3-compatible storage as a foundation. For applications with native S3 storage support (like Gitea or the Docker registry) it becomes the primary storage back-end. For the rest – Docker volumes and one-off experiments – I need an S3-backed, POSIX-compliant filesystem.

For that there are a few options. I settled on JuiceFS: a mature, thoroughly documented project with a ton of options for metadata and S3-compatible storage backends. For S3 I settled on Garage due to its simplicity and generally high praise.

A very short crash course on JuiceFS

JuiceFS is a distributed, Read-Write-Many storage system backed by an S3-compatible API, with a separate metadata engine. Valkey, Postgres, SQLite and many others can serve as the metadata engine. Metadata is critical: S3 objects DO NOT represent the actual file hierarchy, so no metadata = no filesystem. Backing up BOTH is very important. JuiceFS is easy to install and mounts pretty much anywhere via FUSE. It has a Docker plugin and a CSI plugin for Kubernetes. It mounts on Linux (unfortunately, the installer does not work on FreeBSD).

Layout

Here’s a rough sketch for the layout:

FreeBSD Hypervisor (14700T ThinkCentre m90q, 64G RAM)
 |
 v
Linux VM
----------
Docker container (/data)
 |
 v
Docker host -> JuiceFS volume
 |
 v
JuiceFS docker driver
 |           |
 v           v
Valkey       Garage S3

Everything runs off a single machine. I have no Garage or Valkey cluster. This is bad. I know. Valkey and Garage run in their own FreeBSD jails. Both are backed by ZFS snapshots and replicated to my TrueNAS instance.

Ingress

For now I use static IP addresses. Comms are backed by a 10Gb network, and all metadata and S3 (Garage) requests go via a switch, on their own VLAN. This ensures that the network path is already included in the benchmarks if I later set up the Garage and Valkey instances in HA mode.

Setup on a high level

This assumes you already have Garage and Valkey running. To make things easier, replace the s3_region value in garage.toml with us-east-1. This is the default AWS region, so you don’t need to spend time overriding it in the various tools you’ll use to access your Garage bucket.
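A quick sanity check for that setting can be sketched as a small shell function. It only greps the config file for the expected line; the file path in the usage note is an assumption, so point it at wherever your garage.toml lives.

```shell
# Sketch: confirm that garage.toml sets the S3 region to us-east-1.
# Returns 0 when a line like   s3_region = "us-east-1"   is present.
garage_region_ok() {
    grep -Eq '^[[:space:]]*s3_region[[:space:]]*=[[:space:]]*"us-east-1"' "$1"
}
```

Usage, assuming the config lives at /etc/garage.toml: garage_region_ok /etc/garage.toml && echo "region ok"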

Create juicefs key:

garage key create juicefs

outputs

==== ACCESS KEY INFORMATION ====
Key ID:              access-key
Key name:            juicefs
Secret key:          secret-key
Created:             2026-05-14 10:37:50.349 +00:00
Validity:            valid
Expiration:          never

Allow creating buckets:

garage key allow juicefs --create-bucket

Create juicefs bucket:

garage bucket create juicefs

Make the key owner of the bucket:

garage bucket allow --owner --read --write juicefs --key access-key

Create juicefs volume:

juicefs format --storage s3 --bucket http://garage-instance:3900/juicefs --access-key access-key --secret-key secret-key redis://localhost:6379/1 juicefs

Mount fs:

juicefs mount redis://localhost:6379/1 /mnt/juicefs
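Since the end goal is Docker volumes, the same filesystem can also be exposed through the JuiceFS Docker volume plugin. The following is a minimal sketch based on the plugin's documented options, not the exact setup used here: the plugin alias juicefs and the volume name jfsvolume are arbitrary, and access-key / secret-key are the same placeholders as above. The meta URL has to be reachable from the Docker host, so adjust it to wherever Valkey listens.

```shell
# Sketch: install the JuiceFS volume plugin and create a named Docker volume.
# $1 = metadata URL, $2 = Docker volume name (defaults to "jfsvolume").
create_jfs_volume() {
    local meta_url=$1
    local volume=${2:-jfsvolume}

    # Options go before the plugin name; --grant-all-permissions skips the prompt.
    docker plugin install --alias juicefs --grant-all-permissions juicedata/juicefs

    # "name" is the JuiceFS filesystem name chosen at format time.
    docker volume create -d juicefs \
        -o name=juicefs \
        -o metaurl="$meta_url" \
        -o storage=s3 \
        -o bucket=http://garage-instance:3900/juicefs \
        -o access-key=access-key \
        -o secret-key=secret-key \
        "$volume"
}
```

Usage: create_jfs_volume redis://localhost:6379/1 and then, for example, docker run --rm -v jfsvolume:/data busybox ls /data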

With this short sequence we’re off to the races. Let’s run some benchmarks.

Benchmarks

JuiceFS bundles two benchmarking commands – objbench and bench. The former benchmarks your S3 layer, the latter the filesystem.

bench

Here is one of the bench runs in a Linux VM with 8 cores and 16 GB of RAM:

BlockSize: 1.0 MiB, BigFileSize: 1.0 GiB, SmallFileSize: 128 KiB, SmallFileCount: 100, NumThreads: 4
Time used: 23.6 s, CPU: 173.2%, Memory: 657.4 MiB
+------------------+------------------+---------------+
|       ITEM       |       VALUE      |      COST     |
+------------------+------------------+---------------+
|   Write big file |     320.15 MiB/s |  12.79 s/file |
|    Read big file |     582.79 MiB/s |   7.03 s/file |
| Write small file |    338.3 files/s | 11.82 ms/file |
|  Read small file |   1562.7 files/s |  2.56 ms/file |
|        Stat file |   7935.6 files/s |  0.50 ms/file |
|   FUSE operation | 71838 operations |    1.16 ms/op |
|      Update meta |  1275 operations |   19.30 ms/op |
|       Put object |  1424 operations |  148.81 ms/op |
|       Get object |  1024 operations |  108.14 ms/op |
|    Delete object |     0 operations |    0.00 ms/op |
| Write into cache |  1424 operations |    2.68 ms/op |
|  Read from cache |   400 operations |    0.14 ms/op |
+------------------+------------------+---------------+

320 MiB/s throughput on write and around 600 MiB/s on read. Not too bad. However, one number stood out (bench output has colors and it was in yellow) – Update meta. Consistently hovering around 20 ms/op. I wanted to optimize that further but couldn’t figure out why it consistently stayed high.

I suspected the network path to Valkey was the issue. Starting with ping:

root@cvm:~/juicefs-docker# ping -i 0.2 -c 100 10.200.3.100
PING 10.200.3.100 (10.200.3.100) 56(84) bytes of data.
64 bytes from 10.200.3.100: icmp_seq=1 ttl=62 time=0.148 ms
64 bytes from 10.200.3.100: icmp_seq=3 ttl=62 time=0.283 ms
64 bytes from 10.200.3.100: icmp_seq=4 ttl=62 time=0.195 ms
64 bytes from 10.200.3.100: icmp_seq=5 ttl=62 time=0.213 ms
64 bytes from 10.200.3.100: icmp_seq=6 ttl=62 time=0.233 ms
64 bytes from 10.200.3.100: icmp_seq=7 ttl=62 time=0.081 ms
--- 10.200.3.100 ping statistics ---
7 packets transmitted, 7 received, 0% packet loss, time 1221ms
rtt min/avg/max/mdev = 0.081/0.186/0.283/0.060 ms

All good here. Testing Valkey latency:

root@cvm:~/juicefs-docker# valkey-cli -h 10.200.3.100 -a <password> --no-auth-warning --latency
min: 0, max: 11, avg: 0.55 (271 samples)

Also good. So the network path to Valkey is not to blame.

I am running the tests in a VM, and I stumbled upon Receive Packet Steering (RPS), a Linux kernel tunable. This setting allows software-based distribution of packet processing among multiple cores.

With mpstat -P ALL 1 I checked how many cores were busy during a benchmark run.

Here’s the output of the command when running juicefs bench:

Average:     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
Average:     all    8.56    0.00   16.40    9.16    0.00   14.54    0.00    0.00    0.00   51.35
Average:       0    1.50    0.00    2.79    0.09    0.00   91.06    0.00    0.00    0.00    4.56
Average:       1   10.78    0.00   18.25    2.27    0.00   13.57    0.00    0.00    0.00   55.13
Average:       2   10.46    0.00   19.55   17.91    0.00    1.30    0.00    0.00    0.00   50.77
Average:       3    9.55    0.00   20.28   46.90    0.00    0.83    0.00    0.00    0.00   22.43
Average:       4    8.77    0.00   19.34    1.94    0.00    0.64    0.00    0.00    0.00   69.31
Average:       5    8.27    0.00   17.22    3.25    0.00    0.46    0.00    0.00    0.00   70.80
Average:       6    8.91    0.00   18.71    1.83    0.00    0.53    0.00    0.00    0.00   70.01
Average:       7   10.86    0.00   16.44    1.79    0.00    0.46    0.00    0.00    0.00   70.44

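From this output it’s obvious that CPU 0 is doing most of the packet processing: its %soft sits at 91%. First, this suggests that RSS is not enabled or not properly configured on the host. Second, I can compensate for that with RPS on the Linux side. I set the rps_cpus value to ff, a bitmask with eight bits set, which covers all 8 CPUs of the VM.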
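The RPS mask is one bit per CPU, so it can be derived from the core count instead of being typed by hand. A minimal sketch, assuming at most 32 CPUs (larger masks need comma-separated 32-bit groups) and an interface name that you pass in; eth0 in the usage note is hypothetical. The value lives in sysfs, so it does not survive a reboot on its own.

```shell
# Print the hex bitmask with the lowest N bits set, e.g. 8 CPUs -> ff.
rps_mask() {
    printf '%x\n' $(( (1 << $1) - 1 ))
}

# Write a mask covering every online CPU to each receive queue of an
# interface. Needs root; $1 is the interface name.
apply_rps() {
    local iface=$1
    local mask q
    mask=$(rps_mask "$(nproc)")
    for q in /sys/class/net/"$iface"/queues/rx-*/rps_cpus; do
        echo "$mask" > "$q"
    done
}
```

Usage, as root: apply_rps eth0. Afterwards, cat /sys/class/net/eth0/queues/rx-0/rps_cpus should show the mask.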

Re-running the test shows the following distribution:

Average:     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
Average:     all    9.95    0.00   20.14    4.34    0.00   20.06    0.00    0.00    0.00   45.51
Average:       0    4.15    0.00   10.11    0.98    0.00   47.15    0.00    0.00    0.00   37.61
Average:       1    8.98    0.00   19.67    3.20    0.00   26.70    0.00    0.00    0.00   41.44
Average:       2    8.76    0.00   17.69    5.17    0.00   26.09    0.00    0.00    0.00   42.29
Average:       3   13.15    0.00   25.64    4.44    0.00    7.99    0.00    0.00    0.00   48.77
Average:       4   10.27    0.00   20.76    5.84    0.00   16.38    0.00    0.00    0.00   46.75
Average:       5   12.21    0.00   23.58    4.09    0.00    9.45    0.00    0.00    0.00   50.67
Average:       6   10.04    0.00   21.52    5.49    0.00   16.13    0.00    0.00    0.00   46.82
Average:       7   12.33    0.00   22.68    5.73    0.00    9.02    0.00    0.00    0.00   50.25

Much better.

Here’s the bench output after the tweak:

BlockSize: 1.0 MiB, BigFileSize: 1.0 GiB, SmallFileSize: 128 KiB, SmallFileCount: 100, NumThreads: 4
Time used: 16.3 s, CPU: 274.3%, Memory: 610.6 MiB
+------------------+------------------+---------------+
|       ITEM       |       VALUE      |      COST     |
+------------------+------------------+---------------+
|   Write big file |     558.74 MiB/s |   7.33 s/file |
|    Read big file |     768.95 MiB/s |   5.33 s/file |
| Write small file |    353.4 files/s | 11.32 ms/file |
|  Read small file |   1668.3 files/s |  2.40 ms/file |
|        Stat file |   6620.2 files/s |  0.60 ms/file |
|   FUSE operation | 71739 operations |    0.70 ms/op |
|      Update meta |  1274 operations |    1.81 ms/op |
|       Put object |  1424 operations |  102.75 ms/op |
|       Get object |  1024 operations |   79.54 ms/op |
|    Delete object |     0 operations |    0.00 ms/op |
| Write into cache |  1421 operations |    2.63 ms/op |
|  Read from cache |   400 operations |    0.14 ms/op |
+------------------+------------------+---------------+

Update meta dropped from ~20ms/op to ~2ms/op. Nice bump.

Further optimizations

Network-side tweaks between the host and the VM are a source of measurable performance gains. My network card supports SR-IOV, which I don’t currently use. I am using a software-based bridge from the FreeBSD host to the Linux VM, and while performance is sufficient, there are still some gains left on the table. There are other JuiceFS tweaks to play with, such as the metadata client cache and writeback options. That’s something to explore in the future.
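For reference, the metadata client cache and writeback options map to juicefs mount flags. This is a sketch of what such a mount could look like, not something benchmarked in this post; the 5-second cache lifetimes are arbitrary values picked for illustration. Note that --writeback acknowledges writes once they hit the local cache and uploads them asynchronously, so data that has not been uploaded yet can be lost if the host dies.

```shell
# Sketch: mount with a longer metadata cache and asynchronous uploads.
# $1 = metadata URL, $2 = mount point.
mount_jfs_tuned() {
    # -d runs the mount in the background; the cache values are in seconds.
    juicefs mount -d --writeback \
        --attr-cache 5 --entry-cache 5 --dir-entry-cache 5 \
        "$1" "$2"
}
```

Usage: mount_jfs_tuned redis://localhost:6379/1 /mnt/juicefs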

I noticed my small-file performance tanked over time. In my benchmarks it’s around 350 files per second; after running JuiceFS for some weeks it hovers around 100 files per second. My filesystem’s total size is 7 GB, while the bucket size reported by Garage is a whopping 52 GB. Even with 1-day retention, the total trash size was over 40 GB. I disabled trash entirely. With lots of small files and log churn it only adds unnecessary overhead.
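Disabling trash is a one-line change to the volume settings. A minimal sketch, assuming the same meta URL as in the setup section; the follow-up gc call only scans and reports leaked objects, and nothing is removed unless you add --delete yourself.

```shell
# Sketch: turn off the JuiceFS trash and check for leaked objects.
# $1 = metadata URL.
disable_jfs_trash() {
    local meta_url=$1
    # 0 days means deleted files are no longer kept in the trash.
    juicefs config --trash-days 0 "$meta_url"
    # Report-only scan; pass --delete manually once you trust the output.
    juicefs gc "$meta_url"
}
```

Usage: disable_jfs_trash redis://localhost:6379/1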

Conclusion

Deploying JuiceFS, Garage and Valkey to have a unified shared storage stack for Docker is an overly complicated solution with multiple single points of failure. Definitely do not do that for your production workloads. However, deploying and benchmarking across multiple layers helped me identify and resolve issues on the hot path of writing and reading data.

JuiceFS provides a solid, mature foundation for an almost infinitely scalable storage solution backed by a rock-solid S3 layer. Once set up, it’s been stable and headache-free. Garage is a straightforward and, dare I say, Simple Storage layer that is easy to configure and operate. Valkey is a Redis fork, so a very mature key-value store.

I have not properly planned and exercised disaster recovery for all the layers, which could be a fun next exercise. For now, backing up the contents of the Docker volumes is good enough.
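One possible starting point for the metadata layer is juicefs dump, which exports the metadata to a JSON file that juicefs load can later import into an empty metadata engine. This is a sketch only, not a tested recovery plan; the output directory and file naming are hypothetical.

```shell
# Sketch: dump JuiceFS metadata to a dated JSON file.
# $1 = metadata URL, $2 = existing directory to write the dump into.
backup_jfs_meta() {
    local meta_url=$1
    local out_dir=$2
    juicefs dump "$meta_url" "$out_dir/juicefs-meta-$(date +%Y%m%d).json"
}
```

Usage: backup_jfs_meta redis://localhost:6379/1 /var/backups. The dump covers metadata only; the objects in the Garage bucket still need their own backup.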