
SQLite: Dropping an index for a 300x speedup

For a small, personal project I use sqlite as a time series database and make queries like the following:

SELECT referer,
       COUNT(*) AS views
FROM   logs
WHERE  host = 'comments.nbsoftsolutions.com' AND
       epoch >= 1551630947 AND
       epoch < 1551632947
GROUP  BY referer
ORDER  BY views DESC

Over time, I’ve noticed this query becoming increasingly time consuming, sometimes taking minutes to complete. I thought I had created the proper indices:

CREATE index idx_epoch on logs(epoch);
CREATE index idx_host ON logs(host);

I ran ANALYZE, and still nothing changed. I was at my wits’ end and seriously thought about looking into another database. Then I enabled sqlite’s EXPLAIN QUERY PLAN and saw the following output:

0|0|0|SEARCH TABLE logs USING INDEX idx_host (host=?)
0|0|0|USE TEMP B-TREE FOR GROUP BY
0|0|0|USE TEMP B-TREE FOR ORDER BY
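
For reference, the plan above can be reproduced in the sqlite3 shell either by toggling .eqp on or by prefixing the query with EXPLAIN QUERY PLAN; a quick sketch:

EXPLAIN QUERY PLAN
SELECT referer,
       COUNT(*) AS views
FROM   logs
WHERE  host = 'comments.nbsoftsolutions.com' AND
       epoch >= 1551630947 AND
       epoch < 1551632947
GROUP  BY referer
ORDER  BY views DESC;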

Omitting the epoch timestamp index on a time series table is a red flag. SQLite will only use one index per table in a query like this, and it was choosing the wrong one.

After crawling the sqlite docs, I found a way to exclude the host index from consideration using the unary “+” operator.

Now our query looks like:

SELECT referer,
       COUNT(*) AS views
FROM   logs
WHERE  +host = 'comments.nbsoftsolutions.com' AND
       epoch >= 1551630947 AND
       epoch < 1551632947
GROUP  BY referer
ORDER  BY views DESC

And the new query plan picks up on our hint:

0|0|0|SEARCH TABLE logs USING INDEX idx_epoch (epoch>? AND epoch<?)
0|0|0|USE TEMP B-TREE FOR GROUP BY
0|0|0|USE TEMP B-TREE FOR ORDER BY

Execution time decreased from 30 seconds to 0.1 seconds: a 300x speedup from dropping the host index from consideration.

Being able to write raw SQL like this is a major reason why I tend to have disdain for ORMs that abstract everything away. Sometimes we need to drop to a lower level.

Let’s see if we can’t tease out why sqlite naively makes the wrong decision. From the docs on choosing from multiple indices:

the sqlite_stat1 table might indicate that an equality constraint on column x reduces the search space to 10 rows on average, whereas an equality constraint on column y reduces the search space to 3 rows on average

The contents of sqlite_stat1 in my scenario:

logs|idx_host|3165918 7685
logs|idx_epoch|3165918 2

This table states that an equality constraint on host reduces the search space to an average of 7685 rows, while an equality constraint on epoch reduces it to an average of 2 rows. Considering that our epoch range is [1551630947, 1551632947), a span of 2000, I would have expected sqlite to realize that 2 * 2000 < 7685. Even updating the stats so that the idx_host index is estimated to narrow results down to roughly 10 million rows changes nothing:

UPDATE sqlite_stat1
SET stat = '3165918 9999999'
WHERE idx = 'idx_host';

Thus we can conclude that SQLite will always prefer an index backed by an equality constraint over one backed by a range constraint. So update your queries or drop the offending indices if you have to.
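
If dropping the offending index is the right call for your schema (I went with the unary “+” rewrite above instead), it’s a one-liner:

DROP INDEX idx_host;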

If this is not true, or anyone has new / differing information, feel free to comment. This is with sqlite 3.22.0.


Azure Pipelines for Rust Projects

Azure Pipelines bridging the CI gap

EDIT: 2019-03-07: The bug described in the edit below has since been fixed, but I’m leaving the edit in for posterity.

EDIT: 2019-03-01: (this section describes a bug that has been fixed, so one can skip to “END EDIT”)

Currently there is a critical bug in creating cross platform jobs. Adapting the sample code for cross platform jobs to:

strategy:
  matrix:
    linux:
      imageName: 'ubuntu-16.04'
    mac:
      imageName: 'macos-10.13'
    windows:
      imageName: 'vs2017-win2016'
pool:
  vmImage: $(imageName)

steps:
  - bash: env
    displayName: env

Causes every host to use ubuntu-16.04 (as reported in the UI and by the AGENT_OS environment variable). This is heartbreaking, as this example was working a day or two ago. I can no longer recommend Azure Pipelines in this state. No one likes debugging CI issues. I spent a lot of time working on this blog post, so hopefully it will still be of use once this issue is fixed.

END EDIT

In this post I will detail why I believe Azure Pipelines can be a great CI / CD platform for open source Rust projects on Github. The catch is that there are some rough spots in Azure Pipelines and in the rust ecosystem, but everything can be worked around. In writing this post, I hope to provide examples one can copy and paste into their own projects.

The goal isn’t to convince the world to ditch Travis, AppVeyor, Circle CI, etc. for Azure Pipelines, but rather to introduce a relatively underused CI. Unless you enjoy debugging CI configuration, stick with your current process.

Hello World

To start off, let’s compile a project of mine (though any rust project is fine) using stable rust on Ubuntu. To activate Azure Pipelines, we need the YAML file azure-pipelines.yml in the repo’s root directory:

pool:
  vmImage: 'ubuntu-16.04'

steps:
  - script: |
      curl https://sh.rustup.rs -sSf | sh -s -- -y
      echo "##vso[task.setvariable variable=PATH;]$PATH:$HOME/.cargo/bin"
    displayName: Install rust
  - script: cargo build --all
    displayName: Cargo build
  - script: cargo test --all
    displayName: Cargo test

While Azure Pipelines natively supports environments like Java, .NET, Go, Ruby, etc., rust is not so lucky (yet! One should be able to contribute it). Our first step is to install rust. While the script, for the most part, is self explanatory, one must invoke a logging command (via echo in bash) to add cargo to the environment’s PATH for the subsequent build and test steps.
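
To illustrate the mechanism with a minimal sketch (MY_VAR is a made-up name), a variable set through the task.setvariable logging command in one step surfaces as an environment variable in the steps that follow it:

steps:
  - bash: echo "##vso[task.setvariable variable=MY_VAR;]hello"
    displayName: Set a variable
  - bash: echo "MY_VAR is now $MY_VAR"
    displayName: Read it in a later step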

The Azure Pipeline UI renders the build as follows:

Build on Multiple Rust Versions

One should test their project on more than just stable, ideally these four versions:

  • The minimum supported Rust version
  • Current stable
  • Beta
  • Nightly

A matrix will generate copies of a job with different inputs. We define a rustup_toolchain environment variable to reference in our installation step.

strategy:
  matrix:
    stable:
      rustup_toolchain: stable
    beta:
      rustup_toolchain: beta
    nightly:
      rustup_toolchain: nightly

pool:
  vmImage: 'ubuntu-16.04'

steps:
  - script: |
      curl https://sh.rustup.rs -sSf | sh -s -- -y --default-toolchain $RUSTUP_TOOLCHAIN
      echo "##vso[task.setvariable variable=PATH;]$PATH:$HOME/.cargo/bin"
    displayName: Install rust
  - script: cargo build --all
    displayName: Cargo build
  - script: cargo test --all
    displayName: Cargo test

Azure pipelines will render as follows:

The one shortcoming here is that there is no easy way to instruct Azure Pipelines that it’s ok for one of the matrix cells to fail (relevant github issue). For instance, allowing failures on nightly is not uncommon. Hopefully we see Azure Pipelines support this feature soon.

Jobs and Containers with Clippy

So far, I’ve only demonstrated a pipeline with a single job, so we were able to use a more concise pipeline configuration format. Now I want to introduce multiple jobs with an example. Clippy and its code lints are an indispensable tool for the rust community. Our goal will be to have these lints run alongside the build and test phase, as only running clippy after tests succeed may hide clippy’s tips when a test fails (tips that may explain why the tests failed).

First, let’s look at what it will take to run clippy:

pool:
  vmImage: 'ubuntu-16.04'
steps:
  - script: |
      curl https://sh.rustup.rs -sSf | sh -s -- -y
      echo "##vso[task.setvariable variable=PATH;]$PATH:$HOME/.cargo/bin"
    displayName: Install rust
  - script: rustup component add clippy
    displayName: Install Clippy
  - script: cargo clippy --all
    displayName: Run clippy

Nothing new here, but the rust installation step is more cumbersome than desired. Azure Pipelines offers the ability to run jobs inside containers, so we can leverage the official rust container to save a few lines of configuration.

pool:
  vmImage: 'ubuntu-16.04'
container: 'rust:latest'
steps:
  - script: rustup component add clippy
    displayName: Install Clippy
  - script: cargo clippy --all
    displayName: Run clippy

Better. Keep in mind that the rust docker image doesn’t support beta / nightly so we need to continue using our manual installation for building and testing our code.

Let’s combine our clippy job with our hello world job into one pipeline:

jobs:
  - job: 'Clippy'
    pool:
      vmImage: 'ubuntu-16.04'
    container: 'rust:latest'
    steps:
      - script: rustup component add clippy
        displayName: Install Clippy
      - script: cargo clippy --all
        displayName: Run clippy

  # Strategy matrix removed from config for brevity, 
  # but one can nest it inside a job
  - job: 'Test'
    pool:
      vmImage: 'ubuntu-16.04'
    steps:
      - script: |
          curl https://sh.rustup.rs -sSf | sh -s -- -y
          echo "##vso[task.setvariable variable=PATH;]$PATH:$HOME/.cargo/bin"
        displayName: Install rust
      - script: cargo build --all
        displayName: Cargo build
      - script: cargo test --all
        displayName: Cargo test

Conditions with Rustfmt

Let’s get a bit more fancy. Rustfmt is another tool available to rust developers. It’s great. It can help a code base appear more consistent. Whenever someone opens a pull request on a repo, it’s important that they adhere to the style guidelines. But keep in mind, passing style checks is maybe not as important when one has to push an emergency bugfix to the master branch.

We can create a pipeline where we only check the style on pull requests using conditions. Conditions can be specified per job or step.

jobs:
  - job: 'Rustfmt'
    pool:
      vmImage: 'ubuntu-16.04'
    container: 'rust:latest'
    condition: eq(variables['Build.Reason'], 'PullRequest')
    steps:
      - script: rustup component add rustfmt
        displayName: Install Rustfmt
      - script: cargo fmt --all -- --check
        displayName: Run Rustfmt

  - job: 'Test'
    pool:
      vmImage: 'ubuntu-16.04'
    steps:
      - script: |
          curl https://sh.rustup.rs -sSf | sh -s -- -y
          echo "##vso[task.setvariable variable=PATH;]$PATH:$HOME/.cargo/bin"
        displayName: Install rust
      - script: cargo build --all
        displayName: Cargo build
      - script: cargo test --all
        displayName: Cargo test

Here I’m using the Build.Reason variable to check if the build is running against a PR. Here is a list of predefined variables.

I’ve chosen to make rustfmt a separate job so that when it fails, it’s immediately clear that formatting is the only culprit.
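
Since conditions can also attach to individual steps, the same check could be written per step instead of on the whole job; a minimal sketch reusing the container job from before:

jobs:
  - job: 'Rustfmt'
    pool:
      vmImage: 'ubuntu-16.04'
    container: 'rust:latest'
    steps:
      - script: rustup component add rustfmt
        displayName: Install Rustfmt
        condition: eq(variables['Build.Reason'], 'PullRequest')
      - script: cargo fmt --all -- --check
        displayName: Run Rustfmt
        condition: eq(variables['Build.Reason'], 'PullRequest')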

Build on Linux, Mac, and Windows

Cross platform pipelines are where Azure Pipelines’ value proposition comes into play. I’m not aware of any other free CI where I can test windows, mac, and linux builds all in one place. I’ve always had to maintain separate travis and appveyor configs, so the thought of consolidating them into one gives me delight.

Below is a config that tests:

  • Windows using rust stable
  • Mac using rust stable
  • Linux using rust stable, beta, and nightly

strategy:
  matrix:
    windows-stable:
      imageName: 'vs2017-win2016'
      rustup_toolchain: stable
    mac-stable:
      imageName: 'macos-10.13'
      rustup_toolchain: stable
    linux-stable:
      imageName: 'ubuntu-16.04'
      rustup_toolchain: stable
    linux-beta:
      imageName: 'ubuntu-16.04'
      rustup_toolchain: beta
    linux-nightly:
      imageName: 'ubuntu-16.04'
      rustup_toolchain: nightly

pool:
  vmImage: $(imageName)

steps:
  - script: |
      curl https://sh.rustup.rs -sSf | sh -s -- -y --default-toolchain $RUSTUP_TOOLCHAIN
      echo "##vso[task.setvariable variable=PATH;]$PATH:$HOME/.cargo/bin"
    displayName: Install rust
    condition: ne( variables['Agent.OS'], 'Windows_NT' )
  - script: |
      curl -sSf -o rustup-init.exe https://win.rustup.rs
      rustup-init.exe -y --default-toolchain %RUSTUP_TOOLCHAIN%
      echo "##vso[task.setvariable variable=PATH;]%PATH%;%USERPROFILE%\.cargo\bin"
    displayName: Windows install rust
    condition: eq( variables['Agent.OS'], 'Windows_NT' )
  - script: cargo build --all
    displayName: Cargo build
  - script: cargo test --all
    displayName: Cargo test

This config doesn’t contain anything too groundbreaking compared to previous ones, but it combines a few concepts. It does demonstrate that steps can have conditions attached to them. So every time a Linux / Mac job is executed, the “Windows install rust” step is skipped and greyed out in the UI (and vice versa).

The reason why stable, beta, and nightly are not tested against windows, mac, and linux is to keep the example concise, but also because a cross-product matrix strategy is not implemented yet, so one has to manually expand it.

Introduction to Templates

The previous cross platform pipeline is ok. We can improve upon it with templates. Unfortunately, this will mean that we have to split our pipeline file into two, so everything won’t be self contained in a single file. Hopefully it is still easy to follow along.

We’re going to create a template whose only job is to install rust. We’ll store the template in _build/install-rust.yml.

First, here is how we reference the template file in our main config.

strategy:
  matrix:
    windows-stable:
      imageName: 'vs2017-win2016'
      rustup_toolchain: stable
    # ...

pool:
  vmImage: $(imageName)

steps:
  - template: '_build/install-rust.yml'
  - script: cargo build --all
    displayName: Cargo build
  - script: cargo test --all
    displayName: Cargo test

Then in _build/install-rust.yml we define the appropriate rust installation step based on the agent’s OS.

steps:
 - script: |
     curl -sSf -o rustup-init.exe https://win.rustup.rs
     rustup-init.exe -y --default-toolchain %RUSTUP_TOOLCHAIN%
     echo "##vso[task.setvariable variable=PATH;]%PATH%;%USERPROFILE%\.cargo\bin"
   displayName: Windows install rust
   condition: eq( variables['Agent.OS'], 'Windows_NT' )
 - script: |
     curl https://sh.rustup.rs -sSf | sh -s -- -y --default-toolchain $RUSTUP_TOOLCHAIN
     echo "##vso[task.setvariable variable=PATH;]$PATH:$HOME/.cargo/bin"
   displayName: Install rust
   condition: ne( variables['Agent.OS'], 'Windows_NT' )

Remember: just as splitting up source code can aid understanding, the same can be said of one’s CI configuration.

Reusable Template with Parameters

Templates can have parameters to make them even more function-like. Notice that our previous example referenced RUSTUP_TOOLCHAIN. As in source code, referencing a global, potentially undefined variable is a bad idea.

Instead our template should pick the toolchain in the following order:

  1. Parameter if provided
  2. Global RUSTUP_TOOLCHAIN if available
  3. Default to stable if none are available

Here’s our new _build/install-rust.yml:

parameters:
  rustup_toolchain: ''

steps:
 - bash: |
     TOOLCHAIN="${{parameters['rustup_toolchain']}}"
     TOOLCHAIN="${TOOLCHAIN:-$RUSTUP_TOOLCHAIN}"
     TOOLCHAIN="${TOOLCHAIN:-stable}"
     echo "##vso[task.setvariable variable=TOOLCHAIN;]$TOOLCHAIN"
   displayName: Set rust toolchain
 - script: |
     curl -sSf -o rustup-init.exe https://win.rustup.rs
     rustup-init.exe -y --default-toolchain %TOOLCHAIN%
     echo "##vso[task.setvariable variable=PATH;]%PATH%;%USERPROFILE%\.cargo\bin"
   displayName: Windows install rust
   condition: eq( variables['Agent.OS'], 'Windows_NT' )
 - script: |
     curl https://sh.rustup.rs -sSf | sh -s -- -y --default-toolchain $TOOLCHAIN
     echo "##vso[task.setvariable variable=PATH;]$PATH:$HOME/.cargo/bin"
   displayName: Install rust
   condition: ne( variables['Agent.OS'], 'Windows_NT' )

Besides the new parameters section, this template explicitly uses bash for the first step instead of script. While script calls each platform’s native interpreter, it’s occasionally useful to force Windows to use bash so that a cross platform script is easier to write.
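
As a small sketch of why that matters, a single bash step can run unchanged on all three agents (Agent.OS surfaces as the AGENT_OS environment variable):

steps:
  - bash: echo "Running under bash on $AGENT_OS"
    displayName: Cross platform step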

While we don’t have to change our existing configuration, the new template accepts a parameter if that is someone’s preference:

pool:
  vmImage: 'ubuntu-16.04'
steps:
  - template: '_build/install-rust.yml'
    parameters:
      rustup_toolchain: beta

The great news is that it is possible for us (the rust community) to create a repo with all the templates we need, and then reference this repo like so:

# Just an example resource, does not actually exist
resources:
  repositories:
    - repository: templates
      type: github
      name: rust-lang/azure-pipeline-templates
jobs:
 - template: install-rust.yml@templates

This would seriously cut down on the amount of CI code per rust project, as there wouldn’t even be a need to copy and paste CI config code anymore (a la trust).

Workaround Cross Compilation

Cross is another incredible tool at our disposal for “zero setup” cross compilation and “cross testing” of Rust crates. The bad news is that azure pipelines doesn’t allocate a tty (sensibly so), while Cross assumes one is allocated. While a pull request is open to fix the issue, it remains to be seen whether it will be merged in the near future.

Never one to give up (cross compilation is dear to me), I have a gist patch for the cross repo. This means the repo needs to be cloned, the patch downloaded and applied, and cross finally installed. Not ideal, but not excruciatingly difficult.

strategy:
  matrix:
    musl:
      target: 'x86_64-unknown-linux-musl'
      imageName: 'ubuntu-16.04'
    gnu:
      target: 'x86_64-unknown-linux-gnu'
      imageName: 'ubuntu-16.04'
    mac:
      target: 'x86_64-apple-darwin'
      imageName: 'macos-10.13'
pool:
  vmImage: $(imageName)
steps:
  - template: '_build/install-rust.yml'
  - script: |
      set -euo pipefail
      D=$(mktemp -d)
      git clone https://github.com/rust-embedded/cross.git "$D"
      cd "$D"
      curl -O -L "https://gist.githubusercontent.com/nickbabcock/c7bdc8e5974ed9956abf46ffd7dc13ff/raw/e211bc17ea88e505003ad763fac7060b4ac1d8d0/patch"
      git apply patch
      cargo install --path .
      rm -rf "$D"
    displayName: Install cross
  - script: cross build --all --target $TARGET
    displayName: Build
  - script: cross test --all --target $TARGET
    displayName: Test

Unfortunately this can’t be a long term solution, as the patch will likely become outdated and inapplicable in short order. Any project relying on cross compilation should wait before exploring azure pipelines due to these incompatibilities with cross.

You may have noticed that I’ve eschewed container jobs. That’s because the Docker socket is not accessible from inside the Docker container, which breaks cross. The provided workaround is too much for me to cope with; ideally there should be no workarounds (including the cross workaround). Hence I’ve stuck with using the base agents.

Generating Github Releases

Ready for pain? Welcome to github releases. Azure Pipelines only received this feature a couple of months ago, so it shouldn’t be surprising that there are rough edges. That said, hopefully a lot of what I describe here will become outdated.

Here’s what I want when I’m about to release an update:

  • I generate a git tag (eg: v1.1.0) (maybe through something like cargo release) and push it
  • CI builds and tests the tag
  • CI creates optimized executables and uploads them to the Github Release with the date of the release

I’ve made a pretty diagram for this flow:

Let’s go through the issues encountered.

Tag name

The migrating from travis document lists BUILD_SOURCEBRANCH as a replacement for TRAVIS_TAG. Except it’s not. Reading the description:

For Azure Pipelines, the BUILD_SOURCEBRANCH will be set to the full Git reference name, eg refs/tags/tag_name

That means my v1.1.0 becomes refs/tags/v1.1.0, which is not nearly as attractive a name. Fixing this requires the following step to define our own variable.

steps:
  - bash: |
      MY_TAG="$(Build.SourceBranch)"
      MY_TAG=${MY_TAG#refs/tags/}
      echo $MY_TAG
      echo "##vso[task.setvariable variable=build.my_tag]$MY_TAG"
    displayName: "Create tag variable"

Date of build

This isn’t a terrible problem, but it would be nice to be able to access the build’s date in ISO-8601 format without resorting to a custom step:

steps:
  - bash: |
      DATE="$(date +%Y-%m-%d)"
      echo "##vso[task.setvariable variable=build.date]$DATE"
    displayName: "Create date variable"

Tags not built by default

Azure pipelines does not build tags pushed to Github by default. I’m not kidding. The documentation on build triggers lists the default behavior as:

trigger:
  tags:
    include:
    - *

But that’s invalid yaml (the bare * needs to be quoted).

After fumbling around with the trigger syntax to ensure that all branches and tags are built, I ended up with:

trigger:
  branches:
    include: ['*']
  tags:
    include: ['*']

How is this not the default!? I’ve raised an issue about the documentation.

Github Release Task

Before releasing, ensure that you’ve packaged the binaries appropriately. I used the CopyFiles and ArchiveFiles tasks:

steps:
  - task: CopyFiles@2
    displayName: Copy assets
    inputs:
      sourceFolder: '$(Build.SourcesDirectory)/target/$(TARGET)/release'
      contents: |
        rrinlog
        rrinlog-server
      targetFolder: '$(Build.BinariesDirectory)/rrinlog'
  - task: ArchiveFiles@2
    displayName: Gather assets
    inputs:
      rootFolderOrFile: '$(Build.BinariesDirectory)/rrinlog'
      archiveType: 'tar'
      tarCompression: 'gz'
      archiveFile: '$(Build.ArtifactStagingDirectory)/rrinlog-$(build.my_tag)-$(TARGET).tar.gz'

Then we can finally release! It took me a couple of hours to figure out all the options, partially because gitHubConnection has a bug where one has to re-target and re-save the pipeline to avoid an authorization failure.

steps:
  - task: GitHubRelease@0
    condition: and(succeeded(), startsWith(variables['Build.SourceBranch'], 'refs/tags/'))
    inputs:
      gitHubConnection: 'nickbabcock'
      repositoryName: 'nickbabcock/rrinlog'
      action: 'edit'
      target: '$(build.sourceVersion)'
      tagSource: 'manual'
      tag: '$(build.my_tag)'
      assets: '$(Build.ArtifactStagingDirectory)/rrinlog-$(build.my_tag)-$(TARGET).tar.gz'
      title: '$(build.my_tag) - $(build.date)'
      assetUploadMode: 'replace'
      addChangeLog: false

In the end, I arrived at a satisfactory release to replace my previous travis + appveyor workflow.

Conclusion

Getting azure pipelines to work as expected has been exhausting, but that is mainly due to cross compilation (not azure pipelines’ fault), a lack of documentation, and bugs around github releases. There are cool features that I haven’t even touched on, such as job dependencies and uploading test and code coverage results. While one should be able to use tarpaulin to generate the code coverage report, I haven’t yet identified a clear front runner for generating JUnit, NUnit, or xUnit test results. After a break I may take another look.

In the end, I’m excited about what azure pipelines can do to help consolidate and streamline a lot of rust CI configs. It’s not there yet, but it may be soon.


Guidelines on Benchmarking and Rust

Beauty in benchmarks

This post covers:

  • Benchmark reports for contributors
  • Benchmark reports for users
  • Profiling with valgrind / kcachegrind
  • Reproducible benchmarks and graphics
  • Tips for benchmark behavior and benchmarking other languages

Lots of libraries advertise how performant they are with phrases like “blazingly fast”, “lightning fast”, or “10x faster than y”, oftentimes written in the project’s main description. If performance is a library’s main selling point, then I expect instructions for reproducible benchmarks and lucid visualizations. Nothing less. Otherwise it’s all talk and no action, especially because great benchmark frameworks exist in nearly all languages.

I find performance touting libraries without a benchmark foundation analogous to GUI libraries without screenshots.

This post mainly focuses on creating satisfactory benchmarks in Rust, but the main points can be extrapolated to other languages.

Use Criterion

If there is one thing to take away from this post: benchmark with Criterion.

Never written a Rust benchmark? Use Criterion.

Only written benchmarks against Rust’s built-in bench harness? Switch to Criterion:

  • Benchmark on stable Rust (I personally have eschewed nightly Rust for the last few months!)
  • Reports statistically significant changes between runs (to test branches or varying implementations).
  • Criterion is actively developed

Get started with Criterion
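
For the uninitiated, here is a minimal sketch modeled on Criterion’s getting-started example (assuming criterion 0.2, which was current around this post, and the 2018 edition; the fibonacci function and file names are just placeholders). In Cargo.toml:

[dev-dependencies]
criterion = "0.2"

[[bench]]
name = "my_benchmark"
harness = false

And in benches/my_benchmark.rs:

use criterion::{black_box, criterion_group, criterion_main, Criterion};

// Deliberately slow placeholder function to benchmark.
fn fibonacci(n: u64) -> u64 {
    match n {
        0 | 1 => 1,
        n => fibonacci(n - 1) + fibonacci(n - 2),
    }
}

fn bench_fib(c: &mut Criterion) {
    // black_box keeps the compiler from optimizing the argument away.
    c.bench_function("fib 20", |b| b.iter(|| fibonacci(black_box(20))));
}

criterion_group!(benches, bench_fib);
criterion_main!(benches);

Running cargo bench picks it up on stable Rust.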

When running benchmarks, the commandline output will look something like:

sixtyfour_bits/bitter_byte_checked
                        time:   [1.1052 us 1.1075 us 1.1107 us]
                        thrpt:  [6.7083 GiB/s 6.7274 GiB/s 6.7416 GiB/s]
                 change:
                        time:   [-1.0757% -0.0366% +0.8695%] (p = 0.94 > 0.05)
                        thrpt:  [-0.8621% +0.0367% +1.0874%]
                        No change in performance detected.
Found 10 outliers among 100 measurements (10.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  6 (6.00%) high severe

This output is good for contributors in pull requests or issues, but I’d better not see it in a project’s readme! Criterion automatically generates reports that are 100x better than console output.

Criterion Reports

Below is a criterion-generated plot from one of my projects: bitter. I’m only including one of the nearly 1800 graphics generated by criterion; the one chosen captures the heart of a single benchmark measuring Rust bit parsing libraries across read sizes (in bits).

This chart shows the mean measured time for each function as the input (or the size of the input) increases.

Out of all the auto-generated graphics, I would consider this the only visualization that could be shown to a more general audience, but I still wouldn’t use it that way. The chart lacks context, and it’s not clear what the graphic is trying to convey. I’d even be worried about someone drawing inappropriate conclusions (pop quiz time: there is a superior library for all parameters, so which one is it?).

It’s my opinion that the graphics criterion generates are perfect for contributors to the project, as there is no dearth of info. Criterion generates graphics that break down mean, median, standard deviation, MAD, etc., which are invaluable when trying to pinpoint areas of improvement.

As a comparison, here is the graphic I created using the same data:

It may be hard to believe that this is the same data, but here are the improvements:

  • A more self-explanatory title
  • Stylistically differentiate “us vs them”. In the above graphic, bitter methods are solid lines while “them” are dashed
  • More accessible x, y axis values
  • Eyes are drawn to the upper right, where the throughput values stand out, which is desirable as it shows bitter in a good light. It’s clearer which libraries perform better.

These add context that Criterion shouldn’t be expected to know. I recommend spending the time to dress up reports before presenting them to a wider audience.

Profiling and Criterion

Criterion does a great job comparing the performance of implementations, but we have to rely on profiling tools to show us why one is faster than another. We’ll be using the venerable valgrind, which doesn’t have a great cross platform story, so I’ll be sticking to linux for this.

# Create the benchmark executable with debugging symbols, but do not run it. We
# don't want valgrind to profile the compiler, so we have the "--no-run" flag. We
# also need debugging symbols so valgrind can track down source code
# appropriately. It blows my mind to this day that compiling with optimizations +
# debugging symbols is a thing. For so long I thought they were mutually
# exclusive.
RUSTFLAGS="-g" cargo bench  --no-run

# Now find the created benchmark executable. I tend to prefix my benchmark
# names with 'bench' to easily identify them
ls -lhtr ./target/release

# Let's say this was the executable
BENCH="./target/release/bench_bits-430e4e0f5d361f1f"

# Now identify a single test that you want profiled. Test identifiers are
# printed in the console output, so I'll use the one that I posted earlier
T_ID="sixtyfour_bits/bitter_byte_checked"

# Have valgrind profile criterion running our benchmark for 10 seconds
valgrind --tool=callgrind \
         --dump-instr=yes \
         --collect-jumps=yes \
         --simulate-cache=yes \
         $BENCH --profile-time 10 $T_ID

# valgrind outputs a callgrind.out.<pid>. We can analyze this with kcachegrind
kcachegrind

In kcachegrind we can then navigate to the lines of code with the most instructions executed in them; execution time typically scales with instructions executed.

Don’t worry if nothing stands out. I just wanted to take a screenshot of what a profiling result looks like (with the assembly of the highlighted line shown below). The goal of profiling is to build a better intuition of the code base. Hopefully you’ll find hidden hot spots, fix them, and then see the improvement on the next criterion run.

While I’ve only focused on Criterion, valgrind, and kcachegrind, your needs may be better served by flame graphs and flamer.

Make everything reproducible

Creating benchmarks and reports means nothing if they are ephemeral, as no one else can reproduce what you did, including your future self once your memory fades.

  • Include instructions in the readme on how to run the benchmark and generate any necessary output, eg:
cargo clean
RUSTFLAGS="-C target-cpu=native" cargo bench -- bit-reading
find ./target -wholename "*/new/raw.csv" -print0 | \
  xargs -0 xsv cat rows > assets/benchmark-data.csv
  • Commit the benchmark data to the repo. This may be a little controversial due to benchmarks varying across machines, but since benchmarks may take hours to run, you’ll save yourself and any contributors a ton of time when all they need is the data (for instance, when a visualization needs to be updated). Previous benchmark data can also be used to compare performance throughout time. Only commit new benchmark data when benchmarks have changed or a dependent library used in the comparison is updated.
  • Commit the script / instructions to generate graphics. I use R + ggplot2, but one can use matplotlib, gnuplot, or even Chart.js. Doesn’t matter what it is, but if someone points out a typo, you don’t want to scramble to remember how the chart was generated.

General Tips

  • Don’t force a narrative
    • While it’s important to be able to convey a point with graphics and callouts, ensure that the “competing” implementations are not gimped, as people prefer honesty over gamed benchmarks. Open source is not some winner-take-all, zero-sum environment.
    • It’s ok if, after benchmarking, your library isn’t on top. Benchmark suites in and of themselves are extremely useful to a community, see: TechEmpower Web Benchmarks, JSON Benchmark 1 / 2, HTTP / JSON / MP4 parsing benchmarks
  • Benchmark older versions of your library so you can accurately track progress or catch regressions. This can easily be done in Rust:
[dev-dependencies.bitterv1]
package = "bitter"
version = "=0.1.0"

and reference it like so (a sketch of benchmarking both versions side by side follows this list):

extern crate bitterv1;
  • A single graphic may not be satisfactory for all use cases. If we examine the chart I posted earlier, cramping is apparent when read sizes are small (< 4 bits), which may be important to some use cases.

We can fix that with a tasteful table

Now users can quickly quantify performance at all sizes (well… to the closest power of 2). Being able to see a trend with shading is a bonus here.

  • When benchmarking across languages, first try an apples-to-apples comparison in the same benchmark framework (like Criterion) using a (hopefully) zero-cost -sys crate for C / C++. Otherwise, one can get an approximation using an appropriate benchmark harness for each language (make sure it can output data in csv or json).
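
To make the earlier point about benchmarking older versions concrete, here is a sketch of how a pinned previous release can sit next to the current crate in one Criterion benchmark. The parse_current and parse_v1 functions are hypothetical stand-ins for whatever routines you would actually pull from your crate and its pinned bitterv1-style alias:

use criterion::{criterion_group, criterion_main, Criterion};

// Hypothetical stand-ins; in practice these would call into the current
// crate and the pinned older release respectively.
fn parse_current(data: &[u8]) -> u64 {
    data.iter().map(|&b| u64::from(b)).sum()
}

fn parse_v1(data: &[u8]) -> u64 {
    data.iter().rev().map(|&b| u64::from(b)).sum()
}

fn version_comparison(c: &mut Criterion) {
    let data = vec![0xAAu8; 1024];
    c.bench_function("parse/current", |b| b.iter(|| parse_current(&data)));
    c.bench_function("parse/v1", |b| b.iter(|| parse_v1(&data)));
}

criterion_group!(benches, version_comparison);
criterion_main!(benches);

Criterion then reports both in the same run, so a regression against the old release can be spotted directly.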

Conclusion

In summary:

  • It’s ok to use benchmark console output in issues / pull requests
  • While criterion reports are geared towards contributors, a couple may be devoted to a wider audience in a pinch
  • Putting thought into graphics is considerate for potential users
  • Profiling with criterion benchmarks + valgrind + kcachegrind is a great way to find hot spots in the code to optimize
  • Make benchmark data and graphics reproducible by committing the data / scripts to the repo