
Guidelines on Benchmarking and Rust

Beauty in benchmarks

This post covers:

  • Benchmark reports for contributors
  • Benchmark reports for users
  • Profiling with valgrind / kcachegrind
  • Reproducible benchmarks and graphics
  • Tips for benchmark behavior and benchmarking other languages

Lots of libraries advertise how performant they are with phrases like “blazingly fast”, “lightning fast”, “10x faster than y” – oftentimes written in the project’s main description. If performance is a library’s main selling point, then I expect instructions for reproducible benchmarks and lucid visualizations. Nothing less. Otherwise it’s all talk and no action, especially because great benchmark frameworks exist in nearly all languages.

I find performance-touting libraries without a benchmark foundation analogous to GUI libraries without screenshots.

This post mainly focuses on creating satisfactory benchmarks in Rust, but the main points here can be extrapolated to other languages.

Use Criterion

If there is one thing to take away from this post: benchmark with Criterion.

Never written a Rust benchmark? Use Criterion.

Only written benchmarks against Rust’s built in bench harness? Switch to Criterion:

  • Benchmark on stable Rust (I personally have eschewed nightly Rust for the last few months!)
  • Reports statistically significant changes between runs (to test branches or varying implementations).
  • Criterion is actively developed

Get started with Criterion
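
Getting off the ground takes only a handful of lines. Below is a minimal sketch (the criterion version and the do_work function are placeholders, so adapt to taste): Cargo is told that the bench opts out of the built-in harness, and the bench file hands Criterion a closure to measure.

# Cargo.toml
[dev-dependencies]
criterion = "0.2"

[[bench]]
name = "bench_bits"
harness = false

// benches/bench_bits.rs
#[macro_use]
extern crate criterion;

use criterion::{black_box, Criterion};

// Stand-in for whatever code is actually being measured
fn do_work(x: u64) -> u64 {
    x.wrapping_mul(0x9e37_79b9_7f4a_7c15)
}

fn benchmark(c: &mut Criterion) {
    // black_box keeps the optimizer from deleting the work outright
    c.bench_function("do_work", |b| b.iter(|| do_work(black_box(42))));
}

criterion_group!(benches, benchmark);
criterion_main!(benches);

A plain cargo bench then takes care of warmup, measurement, outlier classification, and report generation.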

When running benchmarks, the commandline output will look something like:

sixtyfour_bits/bitter_byte_checked
                        time:   [1.1052 us 1.1075 us 1.1107 us]
                        thrpt:  [6.7083 GiB/s 6.7274 GiB/s 6.7416 GiB/s]
                 change:
                        time:   [-1.0757% -0.0366% +0.8695%] (p = 0.94 > 0.05)
                        thrpt:  [-0.8621% +0.0367% +1.0874%]
                        No change in performance detected.
Found 10 outliers among 100 measurements (10.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  6 (6.00%) high severe

This output is good for contributors in pull requests or issues, but I better not see this in a project’s readme! Criterion generates reports automatically that are 100x better than console output.

Criterion Reports

Below is a criterion-generated plot from one of my projects: bitter. I’m only including one of the nearly 1800 graphics generated by criterion; the one chosen captures the heart of a single benchmark measuring Rust bit parsing libraries across read sizes (in bits).

This chart shows the mean measured time for each function as the input (or the size of the input) increases.

Out of all the auto-generated graphics, I would consider this the only visualization that could be displayed for a more general audience, but I still wouldn’t use it this way. This chart lacks context, and it’s not clear what the graphic is trying to convey. I’d even be worried about readers drawing inappropriate conclusions (pop quiz time: there is a superior library for all parameters; which one is it?).

It’s my opinion that the graphics that criterion generates are perfect for contributors of the project as there is no dearth of info. Criterion generates graphics that break down mean, median, standard deviation, MAD, etc, which are invaluable when trying to pinpoint areas of improvement.

As a comparison, here is the graphic I created using the same data:

It may be hard to believe that this is the same data, but here are the improvements:

  • A more self-explanatory title
  • Stylistically differentiate “us vs them”. In the above graphic, bitter methods are solid lines while “them” are dashed
  • More accessible x, y axis values
  • Eyes are drawn to the upper right, as the throughput values stand out, which is desirable as it shows bitter in a good light. It’s clearer which libraries perform better.

These add context that Criterion shouldn’t be expected to know. I recommend spending the time to dress reports up before presenting them to a wider audience.

Profiling and Criterion

Criterion does a great job comparing the performance of implementations, but we have to rely on profiling tools to show us why one is faster than another. We’ll be using the venerable valgrind, which doesn’t have a great cross-platform story, so I’ll be sticking to Linux for this.

# Create the benchmark executable with debugging symbols, but do not run it. We
# don't want valgrind to profile the compiler, so we have the "--no-run" flag. We
# also need debugging symbols so valgrind can track down source code
# appropriately. It blows my mind to this day that compiling with optimizations +
# debugging symbols is a thing. For so long I thought they were mutually
# exclusive.
RUSTFLAGS="-g" cargo bench  --no-run

# Now find the created benchmark executable. I tend to prefix my benchmark
# names with 'bench' to easily identify them
ls -lhtr ./target/release

# Let's say this was the executable
BENCH="./target/release/bench_bits-430e4e0f5d361f1f"

# Now identify a single test that you want profiled. Test identifiers are
# printed in the console output, so I'll use the one that I posted earlier
T_ID="sixtyfour_bits/bitter_byte_checked"

# Have valgrind profile criterion running our benchmark for 10 seconds
valgrind --tool=callgrind \
         --dump-instr=yes \
         --collect-jumps=yes \
         --simulate-cache=yes \
         $BENCH --profile-time 10 $T_ID

# valgrind outputs a callgrind.out.<pid>. We can analyze this with kcachegrind
kcachegrind

We can then navigate in kcachegrind to the lines of code with the most instructions executed in them; execution time typically scales with the number of instructions executed.

Don’t worry if nothing stands out. I just wanted to take a screenshot of what a profiling result looks like (with the assembly of the line highlighted below). The goal of profiling is to gain a better intuition of the code base. Hopefully you’ll find hidden hot spots, fix them, and then see the improvement on the next criterion run.

While I’ve only focused on Criterion, valgrind, and kcachegrind, your needs may be better suited by flame graphs and flamer.
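
As a taste of the flame graph route, here is a sketch that leans on the flame crate’s span API (start, end, dump_html); the parse function is a placeholder:

extern crate flame;

use std::fs::File;

// Placeholder for the code being profiled
fn parse() -> u64 {
    (0..1_000_000u64).sum()
}

fn main() {
    // Time a named span
    flame::start("parse");
    let result = parse();
    flame::end("parse");

    // Write an interactive flame graph viewable in a browser
    flame::dump_html(&mut File::create("flame-graph.html").unwrap()).unwrap();
    println!("parsed: {}", result);
}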

Make everything reproducible

Creating benchmarks and reports means nothing if they are ephemeral, as no one else can reproduce what you did, including future you once memory fades.

  • Include instructions in the readme on how to run the benchmark and generate any necessary output, e.g.:
cargo clean
RUSTFLAGS="-C target-cpu=native" cargo bench -- bit-reading
find ./target -wholename "*/new/raw.csv" -print0 | \
  xargs -0 xsv cat rows > assets/benchmark-data.csv
  • Commit the benchmark data to the repo. This may be a little controversial due to benchmarks varying across machines, but since benchmarks may take hours to run, you’ll save yourself and any contributors a ton of time when all they need is the data (for instance, when a visualization needs to be updated). Previous benchmark data can also be used to compare performance over time. Only commit new benchmark data when benchmarks have changed or a dependent library used in the comparison is updated.
  • Commit the script / instructions to generate graphics. I use R + ggplot2, but one can use matplotlib, gnuplot, or even Chart.js. Doesn’t matter what it is, but if someone points out a typo, you don’t want to scramble to remember how the chart was generated.

General Tips

  • Don’t force a narrative
    • While it’s important to be able to convey a point with graphics and callouts, ensure that the “competing” implementations are not gimped, as people prefer honesty over gamed benchmarks. Open source is not some winner-take-all, zero-sum environment.
    • It’s ok if, after benchmarking, your library isn’t on top. Benchmark suites in and of themselves are extremely useful to a community, see: TechEmpower Web Benchmarks, JSON Benchmark 1 / 2, HTTP / JSON / MP4 parsing benchmarks
  • Benchmark older versions of your library so you can accurately track progress or catch regressions. This can easily be done in Rust (a sketch of such a comparison bench follows after this list):
[dev-dependencies.bitterv1]
package = "bitter"
version = "=0.1.0"

and reference it like:

extern crate bitterv1;
  • A single graphic may not be satisfactory for all use cases. If we examine the chart I posted earlier, cramping is apparent when read sizes are small (< 4 bits), which may be important to some use cases.

We can fix that with a tasteful table.

Now users can quickly quantify performance at all sizes (well… to the closest power of 2). Being able to see a trend with shading is a bonus here.

  • When benchmarking across languages, first try an apples to apples comparison in the same benchmark framework (like Criterion), using a (hopefully) zero-cost -sys crate for C / C++. Otherwise, one can get an approximation using an appropriate benchmark harness for each language (make sure it can output data in csv or json).
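
As promised earlier in this list, here is a sketch of a version-comparison bench. The closure bodies are placeholders standing in for real calls into bitter and the pinned bitterv1 (consult bitter’s docs for the actual API); criterion will report the two side by side.

// benches/compare.rs
#[macro_use]
extern crate criterion;

use criterion::{black_box, Criterion};

fn compare_versions(c: &mut Criterion) {
    let data = vec![0xff_u8; 1024];

    // Swap these placeholder bodies for real bitter / bitterv1 parsing calls
    let d1 = data.clone();
    c.bench_function("parse/current", move |b| {
        b.iter(|| black_box(&d1).iter().map(|&x| u64::from(x)).sum::<u64>())
    });

    let d2 = data;
    c.bench_function("parse/v1", move |b| {
        b.iter(|| black_box(&d2).iter().map(|&x| u64::from(x)).sum::<u64>())
    });
}

criterion_group!(benches, compare_versions);
criterion_main!(benches);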

Conclusion

In summary:

  • It’s ok to use benchmark console output in issues / pull requests
  • While criterion reports are geared towards contributors, a couple of the graphics can serve a wider audience in a pinch
  • Putting thought into graphics is considerate for potential users
  • Profiling with criterion benchmarks + valgrind + kcachegrind is a great way to find hot spots in the code to optimize
  • Make benchmark data and graphics reproducible by committing the data / scripts to the repo

Quick investment advice for the smart and lazy

Fees, fees, and more fees

Someone new to investing asked for advice on how they should allocate their SIMPLE IRA funds in their Vanguard account. There are so many funds that if you don’t know what to look for, it’s easy to get lost or, worse, make bad decisions. Here’s my philosophy:

  • Be average: I don’t have the insight into companies to outmaneuver hedge funds and professional traders, and neither do you. So one should strive to invest in assets that track the market.
  • Be frugal: Since we’re average, any fees cut into our retirement.
  • Be lazy: I don’t have time to rebalance my portfolio so that it becomes more conservative as it matures. One should set their portfolio to 100% of a single fund and let that sit until retirement.

The takeaway: invest 100% into a single Vanguard Target Retirement Fund. Look up your age, and Vanguard will tell you which target retirement fund is appropriate. For instance, VFIFX (a target retirement 2050 fund) is one of the options.

To dive deeper into the reasoning:

  • For the Vanguard SIMPLE IRA, each fund charges a yearly fee of $25. So if you invested $100 initially and the market stayed the same over 4 years, you’d actually be losing money.
    • Takeaway: it’s best to choose one fund and put a lot of money into it.
    • Ideally this means you plan on eventually having over $2,500 in the account so that the fee is less than 1%.
  • But wait, there are more fees! Each fund charges a fee proportional to funds invested into it.
    • For instance, if a fund’s fee is 0.5% and you have $100 total invested, you’d be charged 50 cents for that year. 0.5% doesn’t sound like a lot until you have an account with $1 million and you’re being charged $5,000 a year.
    • Takeaway: look for funds with a low management fee (ideally < 0.3%), so that savings in fees will be compounded as the years progress.
  • Though not applicable to the Vanguard SIMPLE IRA, I must mention transaction fees (yet another fee!). If it costs you $7 a trade, you had better be investing over $700 (ideally $2,500) at a time to minimize what you have to “make up” to even out the fee.
  • I have neither the expertise nor the inclination to manage my portfolio (e.g. evaluating new funds or shifting money into more bonds) every year. All I know is that I can afford more risk while I’m young, but when I’m nearing retirement I’d like a more conservative portfolio to guard against a depressed market delaying retirement.
  • Enter: target date funds. They encapsulate that idea by tracking the market and automatically balancing the portfolio. So it will start out risky with more stocks but as the fund ages it will automatically start acquiring more conservative bonds.
  • VFIFX is a target retirement 2050 fund which has a 0.15% management fee. So a $100k portfolio would have an annual fee of (100k * 0.0015 + 25) $175 (or 0.175% of portfolio).
  • If you think you can outsmart the market, there are better funds out there

This post complements notecard investing and dives deeper into the “buy inexpensive, well-diversified mutual funds such as Vanguard Target 20XX funds”. For a rabbit hole of information, see bogleheads Vanguard target retirement funds.


Leaning on Algo to route Docker traffic through Wireguard

Algo, wireguard, and dns helping to keep you from driving off the road


I write about Wireguard often. It’s been a bit over a year since my initial article and a lot has changed. Enter algo. Within the last year, it has added support for Wireguard. While algo sets up a bit more than Wireguard, it is unparalleled in ease of deployment. After a couple of prompts, the server will be initialized and configs generated for your users.

In this article I’m going to touch on everything that I’ve written about before and try to consolidate it into a single post that I can refer back to. This article leans toward advanced usage of algo and Wireguard, though I try to make it clear when one can jettison the article if an advanced setup is not necessary.

The end goal is to set up a docker server where a subset of the containers are routed through our wireguard interface, as illustrated below. I will also touch upon routing an entire machine’s traffic through the VPN, which is a much easier process.

Algo Preface

There are many ways to install algo, but I am opinionated. I will illustrate my way, but feel free to deviate.

Algo should be installed on a local machine. I use an Ubuntu VM, but you should also be able to work on Windows via WSL or on a Mac. While algo supports a local installation (where algo is installed on the same server as the VPN), I recommend against it, as an algo installation contains the configurations and the private and public keys for all users: things that shouldn’t be on such a disposable server.

Algo does have an official docker image (15 pulls as of this writing, and I’m pretty sure I’m half of them). To clarify, this is not the VPN; the algo docker image is merely a way to simplify installing the dependencies needed to run algo. Unfortunately, since algo does not have any formal releases at this time, when I tried using the docker image the configuration code was out of date with the bleeding-edge config in the repo, which caused DNS issues that made me scratch my head for several hours.

Now we choose a VPS provider. Literally any of them should be fine. I happen to like Vultr – they have a lot of options (like a $3.50/mo plan!). They are perfect for disposable servers. When testing out my methodology, I think I spun up a couple dozen servers in my attempts, and in the end it only cost me 20 or so cents. Assuming the reader is following along, the post will contain a very small number of Vultr-specific instructions.

Algo Setup

Before we get started, grab your api key from vultr: https://my.vultr.com/settings/#settingsapi. We need to save this key so algo can access it.

We’ll be executing the following on our algo machine (which, to reiterate, is a machine sitting next to you or close by).

# Fill me in
VULTR_API_KEY=

# Installs the minimal dependencies (python 2 and git) to get algo up and
# going. It is possible to go a more flexible route by using pyenv, but that may be
# too developer centric and confusing for most people. Not to mention, most 
# likely python, pip, and git are already installed
apt install -y python python-pip git

# There are no released versions of algo, so we have to work off the bleeding edge
git clone https://github.com/trailofbits/algo.git
cd algo

# The file that algo will use to read our API key to create our server
cat > vultr.ini <<EOF
[default]
key = $VULTR_API_KEY
EOF

# Set the key so that only we can read it. Wouldn't want other users on the
# system to know the key to mess with our vultr account
chmod 600 vultr.ini

# Using pipenv here greatly simplifies the installing of python dependencies
# into an isolated environment, so it won't pollute the main system
pip install --user pipenv
pipenv install --skip-lock -r requirements.txt

${EDITOR:-vi} config.cfg

Up comes the editor. Edit the users who will be using this VPN. Keep in mind, algo will ask if you want the possibility of modifying this list later, so one should have a pretty good idea of the users; it is more secure to disable modifications of users after setup. It may even be easier to spin up an entirely new VPN with all the users and migrate existing users.
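
The users section of config.cfg is a plain YAML list. Something like the following, where the names besides docker-user (which this article references later) are illustrative:

# config.cfg (excerpt): one entry per VPN user / device
users:
  - docker-user
  - phone
  - laptop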

# Start the show
pipenv run ./algo

Answer the questions as you see fit, except for one. Answer yes to:

Do you want to install a DNS resolver on this VPN server, to block ads while surfing?

We’ll need it to resolve DNS queries.

You also don’t need to input anything for the following question (as algo will pick it up automatically)

Enter the local path to your configuration INI file

Let the command run. It may take 10 minutes if you’re setting up a server halfway around the world.

Once algo is finished, if you are not interested in routing select docker traffic through a wireguard interface, you can skip the rest of the article and read the official wireguard documentation on configuring clients.
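
If that’s you, client setup is short, since the configs algo generated are already in wg-quick format. A sketch for a Debian-flavored client, where the config path depends on the server name algo picked and the user in question:

# Install the client config and route all of this machine's traffic
# through the VPN
sudo apt install wireguard
sudo install -m 600 configs/<server>/wireguard/<user>.conf /etc/wireguard/wg0.conf
sudo wg-quick up wg0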

Routing select docker traffic through Wireguard

Make sure that wireguard is already installed on the box that will be running docker.

Before we hop onto that, we need to modify the wireguard config files that algo created. These config files are in wg-quick format, which, while amazingly convenient, routes all traffic through the VPN, and that may be undesirable. Instead we’re going to employ a method that sends only specific docker container traffic through the VPN.

First, we need to remove the wg-quick incompatibilities for our docker user (one of the users created in config.cfg):

# the wireguard user
CLIENT_USER=docker-user

# grab the client's VPN address. If more than one VPN server, this script will
# need to be modified to reference the correct config
CLIENT_ADDRESS=$(grep Address configs/*/wireguard/$CLIENT_USER.conf | \
	grep -P -o "[\d/\.]+" | head -1)

# Calculate the client's wireguard config sans wg-quick-isms.
CLIENT_CONFIG=$(sed '/Address/d; /DNS/d' configs/*/wireguard/$CLIENT_USER.conf)

# The docker subnet for our containers to run in
DOCKER_SUBNET=10.193.0.0/16

# Generate the Debian interface configuration. While this configuration is
# debian specific, the `ip` commands are linux distro agnostic, so consult your
# distro's documentation on setting up interfaces. These commands are adapted
# from wg-quick but are suited for an isolated interface
INTERFACE_CONFIG=$(
cat <<EOF
# interfaces marked "auto" are brought up at boot time.
auto wg1
iface wg1 inet manual

# Resolve dns through the dns server setup on our wireguard server
dns-nameserver 172.16.0.1

# Create a wireguard interface (device) named 'wg1'. The kernel knows what a
# wireguard interface is as we've already installed the kernel module
pre-up ip link add dev wg1 type wireguard

# Setup the wireguard interface with the config calculated earlier
pre-up wg setconf wg1 /etc/wireguard/wg1.conf

# Give our wireguard the client address the server expects our key to come from
pre-up ip address add $CLIENT_ADDRESS dev wg1
up ip link set up dev wg1

# Mark traffic emanating from our select docker containers into table 200
post-up ip rule add from $DOCKER_SUBNET table 200

# Route table 200 traffic through our wireguard interface
post-up ip route add default via ${CLIENT_ADDRESS%/*} table 200

# rp_filter is reverse path filtering. By default it will ensure that the
# source of the received packet belongs to the receiving interface. While a nice
# default, it will block data for our VPN client. By switching it to '2' we only
# drop the packet if it is not routable through any of the defined interfaces.
post-up sysctl -w net.ipv4.conf.all.rp_filter=2

# Delete the interface when ifdown wg1 is executed
post-down ip link del dev wg1
EOF
)

Check the results: CLIENT_CONFIG should look a little like:

[Interface]
PrivateKey = notmyprivatekeysodonottry

[Peer]
PublicKey = notthepublickeyeither
AllowedIPs = 0.0.0.0/0, ::/0
Endpoint = 100.100.100.100:51820
PersistentKeepalive = 25

Copy CLIENT_CONFIG into /etc/wireguard/wg1.conf on the docker machine.

Copy INTERFACE_CONFIG into /etc/network/interfaces.d/wg1 on the docker machine.
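
Any transfer mechanism works. Here is a sketch over ssh, where docker-host is a placeholder for however you reach the docker machine:

# Write both configs onto the docker machine. umask 077 keeps the
# private key out of world-readable territory
echo "$CLIENT_CONFIG" | ssh root@docker-host \
    'umask 077; cat > /etc/wireguard/wg1.conf'
echo "$INTERFACE_CONFIG" | ssh root@docker-host \
    'cat > /etc/network/interfaces.d/wg1'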

Setup Docker Networking

Create our docker network:

docker network create docker-vpn0 --subnet $DOCKER_SUBNET

At this point, if running Debian, you should be able to execute ifup wg1 successfully. To test wireguard, execute:

curl --interface wg1 'http://httpbin.org/ip'

The IP returned should be the same as the host in the Endpoint option.

It’s also important to test that our DNS is set up appropriately, as our wireguard server may resolve hosts differently. Additionally, we don’t want to leak DNS queries if that is important to you. Testing DNS is a little more nuanced, as one can’t provide an interface for dig to use, so we emulate it by executing a DNS query that comes from our docker subnet.

dig -b ${DOCKER_SUBNET%/*} @172.16.0.1 www.google.com

You can then geolocate the given IPs and see if they are located in the same area as the server (this really only works for domains that use anycast). Another way to verify is to execute tcpdump -n -v -i wg1 port 53 and see the dig command successfully communicate with 172.16.0.1 (alternatively one can verify that no traffic was sent on eth0 port 53).

To top our tests off:

docker run --rm --dns 172.16.0.1 \
  --network docker-vpn0 \
  appropriate/curl http://httpbin.org/ip

Providing the --dns flag is critical, else docker will delegate to the host machine’s /etc/resolv.conf, which we are trying to circumvent with our wg1 interface config!

This docker network won’t be permanent, so delete the network we created.

docker network rm docker-vpn0

Docker Compose

Instead we’ll be using docker compose, which encapsulates command line arguments better. We’ll have docker compose create the subnet and DNS settings for our container.

version: '3'
services:
  curl:
    image: 'appropriate/curl'
    dns: '172.16.0.1'
    networks:
      docker-vpn0: {}

networks:
  docker-vpn0:
    ipam:
      config:
        - subnet: 10.193.0.0/16

While the compose file ends up being more lines, the file can be checked into source control and more easily remembered and communicated to others.

docker-compose up

And we’re done! Straying from official algo and wireguard docs did add a bit of boilerplate but hopefully it isn’t much and it’s easy to maintain.

I must mention that solution #1 in a previous article did not earn a shoutout here. To recap, solution #1 encapsulated wireguard into a docker container and routed other containers through the wireguard container. Initially this was my top choice, but I’ve switched to the method illustrated here for reasons partly explained in that previous article. Wireguard is light enough, and easy enough to gain insight into through standard linux tools, that having the underlying OS manage the interface outweighs the benefits brought by an encapsulated wireguard container.