Over time, I've noticed that this query has been increasingly time-consuming, sometimes taking minutes to complete. I thought I had created the proper indices:
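Something along these lines (the table and index names here are my illustration, not necessarily the originals):

```sql
-- A time series table indexed on both the host and the epoch timestamp
CREATE INDEX idx_host ON timeseries (host);
CREATE INDEX idx_epoch ON timeseries (epoch);
```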
I ran ANALYZE, and still no change. I was at my wits' end and seriously thought about looking into another database. Then I ran SQLite's EXPLAIN QUERY PLAN and saw the following output.
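The plan looked something like this sketch (SQLite 3.22 prints plans as selectid|order|from|detail rows; the table name is a placeholder):

```
0|0|0|SEARCH TABLE timeseries USING INDEX idx_host (host=?)
```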
Omitting the epoch timestamp index from a time series table is a red flag. SQLite can only use one index from each table and it was choosing the wrong one.
After crawling the SQLite docs, I found a way to disable the host index using the unary "+" operator.
Now our query looks like:
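A sketch of the rewritten query (table, column, and parameter values are illustrative):

```sql
-- The unary "+" prevents SQLite from using an index on host
SELECT *
FROM timeseries
WHERE +host = 'example-host'
  AND epoch >= 1551630947
  AND epoch < 1551632947;
```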
And the new query plan picks up on our hint:
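Again sketched in the same format:

```
0|0|0|SEARCH TABLE timeseries USING INDEX idx_epoch (epoch>? AND epoch<?)
```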
Execution time decreased from 30 seconds to 0.1 seconds, a 300x speedup, simply by dropping the host index from consideration.
Being able to write raw SQL like this is a major reason why I tend to have a disdain for ORMs that abstract everything away. Sometimes we need to go to a lower level.
Let's see if we can't tease out why SQLite naively makes the wrong decision. From the docs on choosing from multiple indices:
the sqlite_stat1 table might indicate that an equality constraint on column x reduces the search space to 10 rows on average, whereas an equality constraint on column y reduces the search space to 3 rows on average
The contents of sqlite_stat1 in my scenario:
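Roughly the following (the total row count is a placeholder; the per-index averages are the ones discussed next):

```
tbl         idx        stat
----------  ---------  --------------
timeseries  idx_host   15000000 7685
timeseries  idx_epoch  15000000 2
```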
This table states that an equality constraint on host reduces the search space to an average of 7685 rows, while epoch reduces it to an average of 2 rows. Considering that our epoch range is [1551630947, 1551632947), a span of 2000, I would have expected SQLite to realize that 2 * 2000 < 7685. Even updating the estimate so that the idx_host index narrows results down to 10 million rows changes nothing:
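A sketch of forcing that pessimistic estimate by hand (the row counts are illustrative):

```sql
-- Pretend idx_host barely narrows the search space at all
UPDATE sqlite_stat1 SET stat = '10000000 10000000' WHERE idx = 'idx_host';

-- Reload the modified statistics into the query planner
ANALYZE sqlite_master;
```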
Thus we can conclude that SQLite will always prefer an equality constraint over any range constraint when evaluating indices. So update your queries or drop the offending indices if you have to.
If this is not true, or anyone has new or differing information, feel free to comment. This is with SQLite 3.22.0.
Update: a bug in Azure Pipelines causes every host to use ubuntu-16.04 (as reported in the UI and the AGENT_OS environment variable). This is heartbreaking, as this example was working a day or two ago. I can no longer recommend Azure Pipelines in this state. No one likes debugging CI issues. I spent a lot of time working on this blog post, so hopefully it will still be of use when this issue is fixed.
In this post I will detail why I believe that Azure Pipelines can be a great CI / CD platform for open source Rust projects on Github. The catch is that there are some rough spots, both in Azure Pipelines and in the rust ecosystem, but everything can be worked around. In writing this post, I hope to provide examples one can copy and paste into their own projects.
The goal isn't to convince the world to ditch Travis, Appveyor, Circle CI, etc for Azure Pipelines, but rather to introduce a relatively underused CI. Unless you enjoy debugging CI configuration, stick with your current process.
To start off, let's compile a project of mine (though any rust project is fine) using stable rust on Ubuntu. To activate Azure Pipelines, we need the yaml file azure-pipelines.yml in the root directory:
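A minimal sketch of such a file (the step names and exact rustup invocation are my reconstruction):

```yaml
pool:
  vmImage: 'ubuntu-16.04'

steps:
  - script: |
      curl https://sh.rustup.rs -sSf | sh -s -- -y
      echo "##vso[task.setvariable variable=PATH;]$PATH:$HOME/.cargo/bin"
    displayName: Install rust
  - script: cargo build --all
    displayName: Cargo build
  - script: cargo test --all
    displayName: Cargo test
```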
While azure pipelines natively supports environments like Java, .NET, Go, Ruby, etc, rust is not so lucky (yet! one should be able to contribute it). Our first step is to install rust. While the script, for the most part, appears self-explanatory, note that one must invoke a logging command (via echo in bash) to add cargo to the environment's path for subsequent steps (the build and test steps).
The Azure Pipeline UI renders the build as follows:
Build on Multiple Rust Versions
One should test their project on more than just stable, ideally these four versions:
Stable
Beta
Nightly
The minimum supported Rust version
A matrix will generate copies of a job with different inputs. We define a rustup_toolchain environment variable to reference in our installation step.
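A sketch of that matrix, feeding the variable into rustup's --default-toolchain:

```yaml
strategy:
  matrix:
    stable:
      rustup_toolchain: stable
    beta:
      rustup_toolchain: beta
    nightly:
      rustup_toolchain: nightly

steps:
  - script: |
      curl https://sh.rustup.rs -sSf | sh -s -- -y --default-toolchain $RUSTUP_TOOLCHAIN
      echo "##vso[task.setvariable variable=PATH;]$PATH:$HOME/.cargo/bin"
    displayName: Install rust
```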
Azure pipelines will render as follows:
The one shortcoming here is that there is no easy way to instruct Azure Pipelines that it’s ok for one of the matrix cells to fail (relevant github issue). For instance, allowing failures on nightly is not uncommon. Hopefully we see Azure Pipelines support this feature soon.
Jobs and Containers with Clippy
So far, I've only demonstrated a pipeline with a single job, which let us use a more concise configuration format. Now I want to introduce multiple jobs with an example. Clippy and its code lints are an indispensable tool to the rust community. Our goal will be to have these lints run alongside the build and test phase, as only running clippy after tests succeed may hide clippy's tips when a test fails (tips that might explain why the test failed).
First, let’s look at what it will take to run clippy:
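A sketch (clippy is distributed as a rustup component):

```yaml
jobs:
  - job: clippy
    pool:
      vmImage: 'ubuntu-16.04'
    steps:
      - script: |
          curl https://sh.rustup.rs -sSf | sh -s -- -y
          echo "##vso[task.setvariable variable=PATH;]$PATH:$HOME/.cargo/bin"
        displayName: Install rust
      - script: rustup component add clippy
        displayName: Install clippy
      - script: cargo clippy --all
        displayName: Run clippy
```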
Nothing here is new, but the rust installation step is more cumbersome than desired. Azure Pipelines offers the ability to run jobs inside containers, so we can leverage the official rust container to save a few lines of configuration.
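The same job sketched with the official rust container:

```yaml
jobs:
  - job: clippy
    pool:
      vmImage: 'ubuntu-16.04'
    container: 'rust:latest'
    steps:
      - script: rustup component add clippy
        displayName: Install clippy
      - script: cargo clippy --all
        displayName: Run clippy
```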
Better. Keep in mind that the rust docker image doesn’t support beta / nightly so we need to continue using our manual installation for building and testing our code.
Let’s combine our clippy job with our hello world job into one pipeline:
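Sketched as a single pipeline:

```yaml
jobs:
  - job: clippy
    pool:
      vmImage: 'ubuntu-16.04'
    container: 'rust:latest'
    steps:
      - script: rustup component add clippy
        displayName: Install clippy
      - script: cargo clippy --all
        displayName: Run clippy

  - job: build_and_test
    pool:
      vmImage: 'ubuntu-16.04'
    strategy:
      matrix:
        stable:
          rustup_toolchain: stable
        beta:
          rustup_toolchain: beta
        nightly:
          rustup_toolchain: nightly
    steps:
      - script: |
          curl https://sh.rustup.rs -sSf | sh -s -- -y --default-toolchain $RUSTUP_TOOLCHAIN
          echo "##vso[task.setvariable variable=PATH;]$PATH:$HOME/.cargo/bin"
        displayName: Install rust
      - script: cargo build --all
        displayName: Cargo build
      - script: cargo test --all
        displayName: Cargo test
```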
Conditions with Rustfmt
Let's get a bit more fancy. Rustfmt is another tool available to rust developers. It's great. It can help a code base appear more consistent. Whenever someone opens a pull request on a repo, it's important that they adhere to the style guidelines. But keep in mind, it's maybe not as important to pass style checks when one has to push an emergency bugfix to the master branch.
We can create a pipeline where we only check the style on pull requests using conditions. Conditions can be specified per job or step.
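A sketch of such a job, gated on the build being triggered by a pull request:

```yaml
jobs:
  - job: rustfmt
    pool:
      vmImage: 'ubuntu-16.04'
    container: 'rust:latest'
    condition: eq(variables['Build.Reason'], 'PullRequest')
    steps:
      - script: rustup component add rustfmt
        displayName: Install rustfmt
      - script: cargo fmt --all -- --check
        displayName: Run rustfmt
```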
I've chosen to make rustfmt a separate job so that when it fails, it's immediately clear that formatting is the only culprit.
Build on Linux, Mac, and Windows
Cross platform pipelines are where Azure Pipeline's value proposition comes into play. I'm not aware of any other free CI where I can test windows, mac, and linux builds all in one place. I've always had to maintain separate travis and appveyor configs, so the thought of consolidating them into one gives me delight.
Below is a config that tests:
Windows using rust stable
Mac using rust stable
Linux using rust stable, beta, and nightly
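A sketch of that config (the image names reflect the hosted agents available around the time of writing):

```yaml
strategy:
  matrix:
    windows-stable:
      imageName: 'vs2017-win2016'
      rustup_toolchain: stable
    mac-stable:
      imageName: 'macos-10.13'
      rustup_toolchain: stable
    linux-stable:
      imageName: 'ubuntu-16.04'
      rustup_toolchain: stable
    linux-beta:
      imageName: 'ubuntu-16.04'
      rustup_toolchain: beta
    linux-nightly:
      imageName: 'ubuntu-16.04'
      rustup_toolchain: nightly

pool:
  vmImage: $(imageName)

steps:
  - script: |
      curl https://sh.rustup.rs -sSf | sh -s -- -y --default-toolchain $RUSTUP_TOOLCHAIN
      echo "##vso[task.setvariable variable=PATH;]$PATH:$HOME/.cargo/bin"
    displayName: Install rust
    condition: ne( variables['Agent.OS'], 'Windows_NT' )
  - script: |
      curl -sSf -o rustup-init.exe https://win.rustup.rs
      rustup-init.exe -y --default-toolchain %RUSTUP_TOOLCHAIN%
      echo "##vso[task.setvariable variable=PATH;]%PATH%;%USERPROFILE%\.cargo\bin"
    displayName: Windows install rust
    condition: eq( variables['Agent.OS'], 'Windows_NT' )
  - script: cargo build --all
    displayName: Cargo build
  - script: cargo test --all
    displayName: Cargo test
```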
This config doesn't contain anything too groundbreaking compared to previous ones, but it combines a few concepts. It demonstrates that steps can have conditions attached to them: every time a Linux / Mac job executes, the "Windows install rust" step is skipped and greyed out in the UI (and vice versa).
The previous cross platform pipeline is ok, but we can improve upon it with templates. Unfortunately, this means splitting our pipeline file into two, so everything won't be self contained in a single file. Hopefully it is still easy to follow along.
We're going to create a template whose only job is to install rust. We'll store the template in _build/install-rust.yml.
First, here is how we reference the template file in our main config.
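A sketch of the main config:

```yaml
steps:
  - template: _build/install-rust.yml
  - script: cargo build --all
    displayName: Cargo build
  - script: cargo test --all
    displayName: Cargo test
```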
Then in _build/install-rust.yml we insert the appropriate install rust step based on the agent.
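A sketch of the template, reusing the conditioned install steps from the cross platform config:

```yaml
steps:
  - script: |
      curl https://sh.rustup.rs -sSf | sh -s -- -y --default-toolchain $RUSTUP_TOOLCHAIN
      echo "##vso[task.setvariable variable=PATH;]$PATH:$HOME/.cargo/bin"
    displayName: Install rust
    condition: ne( variables['Agent.OS'], 'Windows_NT' )
  - script: |
      curl -sSf -o rustup-init.exe https://win.rustup.rs
      rustup-init.exe -y --default-toolchain %RUSTUP_TOOLCHAIN%
      echo "##vso[task.setvariable variable=PATH;]%PATH%;%USERPROFILE%\.cargo\bin"
    displayName: Windows install rust
    condition: eq( variables['Agent.OS'], 'Windows_NT' )
```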
Remember: just as splitting up source code can aid understanding, the same can be said of one's CI configuration.
Reusable Template with Parameters
Templates can have parameters to make them even more function-like. Notice that our previous example referenced RUSTUP_TOOLCHAIN. Just like in source code, referencing a global, potentially undefined variable is a bad idea.
Instead our template should pick the toolchain in the following order:
Parameter if provided
Global RUSTUP_TOOLCHAIN if available
Default to stable if none are available
Here's our new _build/install-rust.yml:
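A sketch implementing that fallback order (the parameter name rust_version is my choice):

```yaml
parameters:
  rust_version: ''

steps:
  - bash: |
      TOOLCHAIN="${{ parameters.rust_version }}"
      TOOLCHAIN="${TOOLCHAIN:-$RUSTUP_TOOLCHAIN}"
      TOOLCHAIN="${TOOLCHAIN:-stable}"
      curl https://sh.rustup.rs -sSf | sh -s -- -y --default-toolchain "$TOOLCHAIN"
      echo "##vso[task.setvariable variable=PATH;]$PATH:$HOME/.cargo/bin"
    displayName: Install rust
    condition: ne( variables['Agent.OS'], 'Windows_NT' )
  - bash: |
      TOOLCHAIN="${{ parameters.rust_version }}"
      TOOLCHAIN="${TOOLCHAIN:-$RUSTUP_TOOLCHAIN}"
      TOOLCHAIN="${TOOLCHAIN:-stable}"
      curl -sSf -o rustup-init.exe https://win.rustup.rs
      ./rustup-init.exe -y --default-toolchain "$TOOLCHAIN"
      echo "##vso[task.setvariable variable=PATH;]$PATH:$USERPROFILE/.cargo/bin"
    displayName: Windows install rust
    condition: eq( variables['Agent.OS'], 'Windows_NT' )
```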
Besides the new parameters section, this template explicitly calls bash instead of script. While script calls each platform’s native interpreter, it’s occasionally useful to force Windows to use bash to more easily write a cross platform script.
While we don’t have to change our configuration, our new template allows for inputs if that is someone’s preference:
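For example:

```yaml
steps:
  - template: _build/install-rust.yml
    parameters:
      rust_version: beta
```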
The great news is that it is possible for us (the rust community) to create a repo with all the templates we need, and then to reference this repo like so
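A sketch (the repo and endpoint names here are hypothetical):

```yaml
resources:
  repositories:
    - repository: templates
      type: github
      name: some-org/rust-pipeline-templates  # hypothetical shared repo
      endpoint: my-github-connection          # hypothetical service connection

steps:
  - template: install-rust.yml@templates
```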
This would seriously cut down on the amount of CI code per rust project, as there wouldn't even be a need to copy and paste CI config code anymore (a la trust).
Working Around Cross Compilation
Cross is another incredible tool at our disposal for "zero setup" cross compilation and "cross testing" of Rust crates. The bad news is that azure pipelines doesn't allocate a tty (sensibly so), but Cross assumes one is allocated. While a pull request is open to fix the issue, it remains to be seen whether it will be merged in the near future.
Never one to give up (cross compilation is dear to me), I have a gist with a patch to the cross repo. This means the repo needs to be cloned, the patch downloaded and applied, and the tool installed. Not ideal, but not excruciatingly difficult.
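A sketch of those steps (the gist URL placeholder must be substituted, and the cross repo location is an assumption):

```yaml
steps:
  - script: |
      git clone https://github.com/rust-embedded/cross.git
      cd cross
      # Download and apply the tty patch from the gist
      curl -sSfL -o tty.patch <gist-url>
      git apply tty.patch
      cargo install --path .
    displayName: Install patched cross
```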
Unfortunately this can’t be a long term solution as the patch will likely become outdated and non-applicable in short order. Any projects relying on cross compilation should wait before exploring azure pipelines due to incompatibilities with cross.
You may have noticed that I've eschewed container jobs. That's because the Docker socket is not accessible from inside the Docker container, which breaks cross. The provided workaround is too much for me to cope with; ideally there should be no workarounds (including the cross workaround). Hence I've stuck with using the base agents.
Generating Github Releases
Ready for pain? Welcome to github releases. Azure Pipelines only received this feature a couple of months ago, so it shouldn't be surprising that there are rough edges. That said, hopefully much of what I describe here will soon become outdated.
Here’s what I want when I’m about to release an update:
I generate a git tag (eg: v1.1.0) (maybe through something like cargo release) and push it
CI builds and tests the tag
CI creates optimized executables and uploads them to the Github Release with the date of the release
Before releasing, ensure that you've packaged the binaries appropriately. I used the CopyFiles and ArchiveFiles tasks:
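A sketch using those tasks (the binary name and target triple are placeholders):

```yaml
steps:
  - task: CopyFiles@2
    displayName: Copy assets
    inputs:
      sourceFolder: '$(Build.SourcesDirectory)/target/release'
      contents: mybin  # placeholder binary name
      targetFolder: '$(Build.BinariesDirectory)/mybin'
  - task: ArchiveFiles@2
    displayName: Gather assets
    inputs:
      rootFolderOrFile: '$(Build.BinariesDirectory)/mybin'
      archiveType: 'tar'
      tarCompression: 'gz'
      archiveFile: '$(Build.ArtifactStagingDirectory)/mybin-$(Build.SourceBranchName)-x86_64-unknown-linux-gnu.tar.gz'
```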
Then we can finally release! It took me a couple of hours to figure out all the options, partially because gitHubConnection has a bug where one has to re-target and re-save the pipeline to avoid an authorization failure.
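A sketch of the release task (the service connection name is a placeholder, and the exact inputs may vary across task versions):

```yaml
steps:
  - task: GitHubRelease@0
    displayName: Create GitHub release
    condition: startsWith(variables['Build.SourceBranch'], 'refs/tags/')
    inputs:
      gitHubConnection: 'my-github-connection'  # placeholder service connection
      repositoryName: '$(Build.Repository.Name)'
      action: 'create'
      target: '$(Build.SourceVersion)'
      tagSource: 'auto'
      assets: '$(Build.ArtifactStagingDirectory)/*.tar.gz'
```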
In the end, I arrived at a satisfactory release to replace my previous travis + appveyor workflow.
Getting azure pipelines to work as expected has been exhausting, but that is mainly due to cross compilation (not azure pipeline's fault), lack of documentation, and bugs around github releases. There are cool features that I haven't even touched on, such as job dependencies and uploading test and code coverage results. While one should be able to use tarpaulin to generate the code coverage report, I haven't yet identified a clear front runner for generating JUnit, NUnit, or xUnit test results. After a break I may take another look.
In the end I'm excited about what azure pipelines can do to help consolidate and streamline a lot of rust configs. It's not there yet, but it may be soon.
Tips for benchmark behavior and benchmarking other languages
Lots of libraries advertise how performant they are with phrases like "blazingly fast", "lightning fast", or "10x faster than y", oftentimes written in the project's main description. If performance is a library's main selling point, then I expect instructions for reproducible benchmarks and lucid visualizations. Nothing less. Otherwise it's an all-talk-and-no-action situation, especially because great benchmark frameworks exist in nearly all languages.
I find performance touting libraries without a benchmark foundation analogous to GUI libraries without screenshots.
This post mainly focuses on creating satisfactory benchmarks in Rust, but the main points can be extrapolated to other languages.
If there is one thing to take away from this post, it's this: benchmark with Criterion.
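For reference, a minimal Criterion benchmark sketch (the function under test is a stand-in; the file is registered under a [[bench]] section with harness = false in Cargo.toml):

```rust
// benches/bench.rs
use criterion::{black_box, criterion_group, criterion_main, Criterion};

// Stand-in for the code you actually want to measure
fn count_commas(data: &[u8]) -> usize {
    data.iter().filter(|&&b| b == b',').count()
}

fn benchmark(c: &mut Criterion) {
    let data = vec![b','; 1024];
    c.bench_function("count commas 1k", |b| {
        b.iter(|| count_commas(black_box(&data)))
    });
}

criterion_group!(benches, benchmark);
criterion_main!(benches);
```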
When running benchmarks, the commandline output will look something like:
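Something along these lines (the numbers are purely illustrative):

```
count commas 1k         time:   [512.54 ns 514.89 ns 517.58 ns]
                        change: [-1.2031% -0.4512% +0.3019%] (p = 0.27 > 0.05)
                        No change in performance detected.
```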
This output is good for contributors in pull requests or issues, but I'd better not see it in a project's readme! Criterion automatically generates reports that are 100x better than console output.
Below is a criterion-generated plot from one of my projects: bitter. I'm including just one of the nearly 1800 graphics criterion generated; the one chosen captures the heart of a single benchmark measuring Rust bit parsing libraries across read sizes (in bits).
This chart shows the mean measured time for each function as the input (or the size of the input) increases.
Out of all the auto-generated graphics, I would consider this the only visualization that could be displayed to a more general audience, but I still wouldn't use it this way. The chart lacks context, and it's not clear what the graphic is trying to convey. I'd even be worried about readers drawing inappropriate conclusions (pop quiz time: there is a superior library for all parameters; which one is it?).
It's my opinion that the graphics criterion generates are perfect for contributors to the project, as there is no dearth of info. The breakdowns of mean, median, standard deviation, MAD, etc are invaluable when trying to pinpoint areas of improvement.
As a comparison, here is the graphic I created using the same data:
It may be hard to believe that it's the same data, but here are the improvements:
A more self-explanatory title
Stylistically differentiate “us vs them”. In the above graphic, bitter methods are solid lines while “them” are dashed
More accessible x, y axis values
Eyes are drawn to the upper right, where the throughput values stand out; this is desirable, as it shows bitter in a good light. It's clearer which libraries perform better.
These add context that Criterion shouldn't be expected to know. I recommend spending the time to dress reports up before presenting them to a wider audience.
Profiling and Criterion
Criterion does a great job comparing the performance of implementations, but we have to rely on profiling tools to show us why one is faster than another. We'll be using the venerable valgrind, which doesn't have a great cross platform story, so I'll be sticking to linux for this.
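A sketch of one such workflow (the benchmark binary's name and hash vary per project):

```bash
# Build the benchmark binaries without running them
cargo bench --no-run

# Run one benchmark executable under callgrind
valgrind --tool=callgrind target/release/bench-<hash> --bench

# Inspect the collected profile
kcachegrind callgrind.out.<pid>
```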
Then we can navigate in kcachegrind to the lines of code with the most instructions executed, as execution time typically scales with instructions executed.
Don't worry if nothing stands out. I just wanted to show a screenshot of what a profiling result looks like (with the assembly of the highlighted line shown below). The goal of profiling is to develop a better intuition for the code base. Hopefully you'll find hidden hot spots, fix them, and then see the improvement on the next criterion run.
While I've only focused on Criterion, valgrind, and kcachegrind, your needs may be better suited by flame graphs and flamer.
Make everything reproducible
Creating benchmarks and reports means nothing if they are ephemeral, as no one else can reproduce what you did, including yourself once your memory fades.
Include instructions in the readme on how to run the benchmark and generate any necessary output (eg: cargo bench).
Commit the benchmark data to the repo. This may be a little controversial, since benchmarks vary across machines, but because benchmarks may take hours to run, you'll save yourself and any contributors a ton of time when all they need is the data (for instance, when a visualization needs to be updated). Previous benchmark data can also be used to compare performance over time. Only commit new benchmark data when the benchmarks have changed or a dependent library used in the comparison is updated.
Commit the script / instructions to generate graphics. I use R + ggplot2, but one can use matplotlib, gnuplot, or even Chart.js. Doesn’t matter what it is, but if someone points out a typo, you don’t want to scramble to remember how the chart was generated.
Don’t force a narrative
While it's important to be able to convey a point with graphics and callouts, ensure that the "competing" implementations are not gimped, as people prefer honesty over gamed benchmarks. Open source is not some winner-take-all, zero-sum environment.
Benchmark older versions of your library so you can accurately track progress or catch regressions. This can easily be done in Rust:
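A sketch using cargo's dependency renaming (the version number is illustrative):

```toml
# Cargo.toml: alias a previously published version for side-by-side benchmarks
[dev-dependencies]
bitterv1 = { package = "bitter", version = "=0.1.0" }
```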
and reference it like:
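Then the old version is available under its alias; a sketch where read_bytes stands in for whatever API is being measured:

```rust
use criterion::{black_box, Criterion};

fn compare(c: &mut Criterion) {
    let data = [0u8; 1024];
    // Benchmark the current crate and the aliased old version side by side
    c.bench_function("current", |b| b.iter(|| bitter::read_bytes(black_box(&data))));
    c.bench_function("v0.1.0", |b| b.iter(|| bitterv1::read_bytes(black_box(&data))));
}
```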
A single graphic may not be satisfactory for all use cases. If we examine the chart I posted earlier, cramping is apparent when read sizes are small (< 4 bits), which may be important for some use cases.
We can fix that with a tasteful table
Now users can quickly quantify performance at all sizes (well… to the closest power of 2). Being able to see a trend with shading is a bonus here.
When benchmarking across languages, first try an apples-to-apples comparison in the same benchmark framework (like Criterion) using a (hopefully) zero-cost -sys crate for C / C++. Otherwise one can get an approximation using an appropriate benchmark harness for each language (make sure it can output data in csv or json):