
Migrating to Actix Web from Rocket for Stability

Actors in Actix, actors in life

I previously wrote an article back in November 2017: Replacing Elasticsearch with Rust and SQLite. In it, I needed to create a few HTTP endpoints that ingested JSON, performed a database lookup, and returned JSON. Very simple. No query / path parameters, authentication, authorization, H2, or TLS. Any v0.1 microframework would do (Project repo).

I went with Rocket and I knew what I was getting into back then:

[Rocket] has the best testing story, serde support, and contains minimal boilerplate. The downside is that nightly Rust is required.

I became enamored with succinct endpoints (a differentiating feature at the time):

#[post("/search", format = "application/json", data = "<data>")]
fn search(data: Json<Search>) -> Json<SearchResponse> {
    debug!("Search received: {:?}", data.0);
    Json(SearchResponse(vec![
        "blog_hits".to_string(),
        "sites".to_string(),
        "outbound_data".to_string(),
    ]))
}

I didn’t yet appreciate how important a feature stability is. I was familiar with needing new versions of the nightly compiler to stay current with clippy and rustfmt, but dependencies were a blind spot.

Six Months Later

It’s been six months since the article. There have been a half dozen updates to Rocket and numerous updates to the nightly compiler. Since I only revisit the project about once every other week, I was always met with a compiler error, as I had most likely updated the nightly compiler in the meantime to grab a new rustfmt or clippy version, and the Rocket version wouldn’t work with that nightly. Syncing Rocket and the nightly compiler to the latest version normally fixed the issue. I’m very thankful that Rocket is maintained to this high degree.

There was a time when the latest Rocket broke on nightly because the nightly version broke ring, a dependency. A combination of cross linking, alpha versions, module paths, and herd mentality led to a temporary impasse. The only solution was to pin the nightly version of Rust with a specific version of Rocket. Everyone worked together and eventually resolved the issue. Still, I find myself wary.

This did have the side effect of me pinning the nightly compiler via rustup override. However, when I came back a couple of weeks later to update dependencies (including Rocket), my usual incantation of cargo +nightly build failed. Syncing versions didn’t help. It took me an embarrassingly long time to realize that I needed to either update the nightly pin or remove it.
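For reference, pinning and unpinning look like this (the toolchain date is a placeholder):

rustup override set nightly-YYYY-MM-DD   # pin this project to a specific nightly
rustup override unset                    # remove the pin and fall back to the default toolchain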

I don’t want a project where I have to remember its unique setup. I like it when a cargo build or cargo build --all is all I need.

Actix-Web

In these last six months, I have been really impressed with the actix.rs project, specifically actix web. It’s a relatively new project; in fact, the first version released on crates.io appeared shortly before I started working on my project. To me, it has everything Rocket has to offer, but it also compiles on stable. As a plus, it integrates with tokio for asynchronous endpoints. I don’t utilize this feature, but it’s nice to know actix is closely tracking the future of scalable Rust networking.

I can’t overstate how closely actix web endpoints resemble Rocket endpoints. The following diff shows the migration of the endpoint posted earlier:

- #[post("/search", format = "application/json", data = "<data>")]
  fn search(data: Json<Search>) -> Json<SearchResponse> {
      debug!("Search received: {:?}", data.0);
      Json(SearchResponse(vec![
          "blog_hits".to_string(),
          "sites".to_string(),
          "outbound_data".to_string(),
      ]))
  }

That was too easy. How about an endpoint that takes application state too?

- #[post("/query", format = "application/json", data = "<data>")]
- fn query(data: Json<Query>, opt: State<options::Opt>) -> Result<Json<QueryResponse>, Error> {
+ fn query(data: (Json<Query>, State<options::Opt>)) -> Result<Json<QueryResponse>, Error> {
      // endpoint code 
  }

The only things that changed in the migration were the function signatures! Hats off to the actix project for reaching the same level of ergonomics as Rocket, making it a painless migration, all while working on stable Rust.

Really, the only code I wrote for the migration was for creating an App:

fn create_app(opts: options::Opt) -> App<options::Opt> {
    App::with_state(opts)
        .middleware(Logger::default())
        .resource("/", |r| r.f(index))
        .resource("/search", |r| r.method(http::Method::POST).with(search))
        .resource("/query", |r| r.method(http::Method::POST).with(query))
}
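For completeness, here’s a rough sketch of how create_app can be wired into a main function with actix web’s HTTP server. The bind address is an assumption, and I’m assuming options::Opt is Clone and parsed via a structopt-style from_args (the factory closure is invoked once per worker):

use actix_web::server;

fn main() {
    let opt = options::Opt::from_args(); // assumes a structopt-style Opt
    server::new(move || create_app(opt.clone())) // clone state for each worker's App
        .bind("127.0.0.1:8000")
        .expect("unable to bind to 127.0.0.1:8000")
        .run();
}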

I completed the migration in under an hour with zero experience with actix, using only the official docs. I kicked myself for not doing this sooner; it was almost too easy.

One hiccup, though: integration testing.

Integration Testing

It took me longer to convert a few tests than to go from zero to migrated with actix. Testing with actix has its own documentation section, which got me 95% of the way.

I was used to Rocket’s way of testing (example taken from Rocket docs):

let client = Client::new(rocket()).expect("valid rocket instance");
let mut response = client.get("/").dispatch();
assert_eq!(response.status(), Status::Ok);
assert_eq!(response.body_string(), Some("Hello world!".into()));

In the example, it’s immediately obvious how to test for the response’s body contents. Actix lacked this example, so I created it. I won’t bore you with why it took me so long to figure it out; for posterity, here is the same test with actix.

let mut srv = TestServer::with_factory(create_app);
let request = srv.client(http::Method::GET, "/").finish().unwrap();
let response = srv.execute(request.send()).unwrap();

assert!(response.status().is_success());
assert_eq!(response.content_type(), "text/plain");

let bytes = srv.execute(response.body()).unwrap();
assert_eq!(str::from_utf8(&bytes).unwrap(), "Hello world!");
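For a JSON endpoint like /search, the same TestServer can issue a POST. A hedged sketch (the request body is illustrative; it isn’t the project’s actual Search struct):

let mut srv = TestServer::with_factory(create_app);
let request = srv
    .client(http::Method::POST, "/search")
    .content_type("application/json")
    .body(r#"{"query": "example"}"#) // hypothetical payload
    .unwrap();
let response = srv.execute(request.send()).unwrap();

assert!(response.status().is_success());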

Other Thoughts

I’m keeping a Rocket branch alive so I can do some comparisons:

Compile times with Rocket

git checkout rocket
cargo clean
time cargo +nightly build --release -p rrinlog-server

real    2m34.228s
user    9m41.905s
sys     0m11.664s

Compile times with Actix-Web (I compiled with the same nightly to control for any compiler improvements)

git checkout master
cargo clean
time cargo +nightly build --release -p rrinlog-server

real    3m16.306s
user    12m4.985s
sys     0m16.032s

Binary size:

  • Rocket: 9.5MB (5.3MB strip -x)
  • Actix-Web: 13MB (8.5MB strip -x)
  • Actix-Web (no default features): 12MB (7.6MB strip -x)

So Actix-Web results in slower compile times and a larger executable. Migrating to actix web didn’t result in wins across the board, but it is a fine trade-off. Out of curiosity, I ran cargo bloat to see if there were obvious culprits:

 File  .text     Size Name
 9.0%  26.1%   1.2MiB [Unknown]
 6.8%  19.7% 961.4KiB std
 3.0%   8.6% 420.8KiB regex_syntax
 2.9%   8.2% 402.2KiB actix_web
 2.1%   6.1% 299.2KiB regex
34.6% 100.0%   4.8MiB .text section size, the file size is 13.8MiB

Eh, nothing stands out. That’s ok.

The last difference I felt was that Rocket has its built-in Rocket.toml, which I used to change the bound address. This was simple enough to move to a command-line argument.
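The change amounted to something like the following sketch, assuming the structopt crate (the flag name and default are illustrative, not the project’s exact ones):

#[derive(StructOpt, Debug, Clone)]
#[structopt(name = "rrinlog-server")]
pub struct Opt {
    /// Address for the HTTP server to bind to
    #[structopt(long = "address", default_value = "127.0.0.1:8000")]
    pub address: String,
}

main can then hand opt.address straight to .bind().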

I’m more than happy with the migration and recommend that anyone who uses Rocket and feels the pain of nightly breakage give actix web a try. My intent for this article was not to come across as “X is better than Y framework”, as both are exceptional, but rather to showcase actix for Rocket users. It’s easy, fast, and stable.


Viewing WireGuard Traffic with Tcpdump


On the article WireGuard VPN Walkthrough, Glen posted this tantalizing question:

How would you verify/confirm that the link is definitely encrypted? If you use OpenVPN and use Wireshark to sniff the packets, you see the OPENVPN protocol listed in the captured dump. Is there an equivalent for Wireguard?

For testing, here are my assumptions:

  • External WireGuard server is hosted at IP address 100.100.100.100
  • Local WireGuard interface is called wg1 at 10.192.122.2. We won’t be using wg-quick (see solution #2 if you want to set up the interface and follow along)
  • curl --interface eth0 http://httpbin.org/ip gives your external ip address (90.90.90.90)
  • curl --interface wg1 http://httpbin.org/ip gives vpn ip address (100.100.100.100)
  • If we can observe unencrypted data while listening to eth0 then the connection is not secure.

Plain HTTP is not secure, so let’s watch for our request to httpbin. We’ll be able to snoop using:

tcpdump -n -v -i eth0 port 80

Executing the eth0 curl statement, we can clearly see our HTTP request and response:

02:58:45.452812 IP (tos 0x0, ttl 64, id 31717, offset 0, flags [DF], proto TCP (6), length 129)
    192.168.1.6.40318 > 54.235.130.91.80: Flags [P.], cksum 0x7b68 (incorrect -> 0x26e5), seq 283436622:283436699, ack 1472925892, win 229, options [nop,nop,TS val 323460534 ecr 112785182], length 77: HTTP, length: 77
        GET /ip HTTP/1.1
        Host: httpbin.org
        User-Agent: curl/7.47.0
        Accept: */*

02:58:45.504943 IP (tos 0x20, ttl 232, id 59362, offset 0, flags [DF], proto TCP (6), length 372)
    54.235.130.91.80 > 192.168.1.6.40318: Flags [P.], cksum 0x675f (correct), seq 1:321, ack 77, win 105, options [nop,nop,TS val 112785194 ecr 323460534], length 320: HTTP, length: 320
        HTTP/1.1 200 OK
        Connection: keep-alive
        Server: gunicorn/19.7.1
        Date: Thu, 26 Apr 2018 00:15:35 GMT
        Content-Type: application/json
        Access-Control-Allow-Origin: *
        Access-Control-Allow-Credentials: true
        X-Powered-By: Flask
        X-Processed-Time: 0
        Content-Length: 33
        Via: 1.1 vegur

        { 
          "origin": "90.90.90.90"
        }
  • We see our eth0 address (192.168.1.6) talking to httpbin’s server at 54.235.130.91
  • The response contains our expected external ip address (90.90.90.90)
  • And oh no, we’re snooping!

Let’s look at what executing the wg1 curl looks like with:

tcpdump -n -v -i wg1 port 80
tcpdump: listening on wg1, link-type RAW (Raw IP), capture size 262144 bytes
03:10:13.382048 IP (tos 0x0, ttl 64, id 26588, offset 0, flags [DF], proto TCP (6), length 129)
    10.192.122.2.42904 > 54.225.185.38.80: Flags [P.], cksum 0x753d (incorrect -> 0x2b18), seq 354179898:354179975, ack 3265920867, win 216, options [nop,nop,TS val 323632516 ecr 2420503305], length 77: HTTP, length: 77
        GET /ip HTTP/1.1
        Host: httpbin.org
        User-Agent: curl/7.47.0
        Accept: */*

03:10:13.861353 IP (tos 0x0, ttl 234, id 3103, offset 0, flags [DF], proto TCP (6), length 371)
    54.225.185.38.80 > 10.192.122.2.42904: Flags [P.], cksum 0xf0d4 (correct), seq 1:320, ack 77, win 105, options [nop,nop,TS val 2420503425 ecr 323632516], length 319: HTTP, length: 319
        HTTP/1.1 200 OK
        Connection: keep-alive
        Server: gunicorn/19.7.1
        Date: Thu, 26 Apr 2018 00:27:03 GMT
        Content-Type: application/json
        Access-Control-Allow-Origin: *
        Access-Control-Allow-Credentials: true
        X-Powered-By: Flask
        X-Processed-Time: 0
        Content-Length: 32
        Via: 1.1 vegur

        { 
          "origin": "100.100.100.100"
        }
  • We see our wg1 address (10.192.122.2) talking to httpbin’s server at 54.225.185.38
  • The response contains our VPN’s external ip address (100.100.100.100)
  • And oh no, we’re snooping! Or are we?

Don’t worry: in this example we’re sending plaintext to a WireGuard interface and receiving plaintext back, which is what our tcpdump command is showing. However, our WireGuard interface doesn’t actually send network data anywhere itself; it has eth0 transmit the encrypted payload to our VPN server. This means that listening on eth0 while executing the wg1 curl will show only encrypted contents. If eth0 can’t read the contents, then no one else will either.

We’ll update our tcpdump command, since the traffic on eth0 won’t be TCP over port 80. We know we’ll be communicating with our VPN server, so we capture only the traffic between us and the server.

tcpdump -n -X -i eth0 host 100.100.100.100

Since we’ll be seeing encrypted packets, they won’t be printable. To display the contents, we’ll view the data hex encoded (which is the -X option).

I’ve also removed the IPv4 and UDP headers so we can focus on just the data. Below is our HTTP GET request.

0x0000:  .... .... .... .... .... .... .... ....  ................
0x0010:  .... .... .... .... .... .... 0400 0000  ................
0x0020:  bef2 24a4 0600 0000 0000 0000 002f b736  ..$........../.6
0x0030:  7448 8e01 778f 7e13 adb8 e66c e307 3d39  tH..w.~....l..=9
0x0040:  bfbf f53d a194 211b f0e5 6cab c561 1f5c  ...=..!...l..a.\
0x0050:  2c38 906b 1bec 183a 8e41 8bab 3a59 ca0b  ,8.k...:.A..:Y..
0x0060:  e1ef dda8 e882 4f8a 6590 f517 2d9a 2077  ......O.e...-..w
0x0070:  4830 8673 b26b 1a16 7f3b e358 f3ec b14f  H0.s.k...;.X...O
0x0080:  8d97 b15d 9ad3 3962 3e1f 5d6c 96be 0518  ...]..9b>.]l....

Notice that the data starts with 0400 0000. If we cross-reference this with WireGuard’s documented protocol, we can confirm that the data begins with an 8-bit 4 followed by 24 bits of 0, so we can rest assured that we’ve set up WireGuard correctly. One could dig a little deeper into the subsequent bits to pull out the receiver index part of the protocol, but as a heuristic, 0400 0000 is decent. Keep in mind WireGuard doesn’t try to obfuscate its data, so an internet provider could reasonably detect and block WireGuard traffic.
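As a rough heuristic, tcpdump can even match that leading type byte directly (using the same example server address from before):

# match WireGuard transport data packets: first four UDP payload bytes are 04 00 00 00
tcpdump -n -i eth0 'udp and host 100.100.100.100 and udp[8:4] = 0x04000000'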

The creator of WireGuard had this to say:

WireGuard does not aim to evade DPI [deep packet inspection], unfortunately. There are several things that prevent this from occurring:

  • The first byte, which is a fixed type value.
  • The fact that mac2 is most often all zeros.
  • The fixed length of handshake messages.
  • The unencrypted ephemeral public key.

So WireGuard isn’t a panacea for those trying to evade sophisticated and unfriendly firewalls (and WireGuard never billed itself as that). It’s a great VPN that can be combined with other tools to meet one’s needs.


Lessons learned: ZFS, databases, and backups

The graves of data without backups

I have a self-hosted Nextcloud for cloud storage installed on a ZFS RAID-6 array. I use rclone to keep my laptop in sync with my cloud. I was setting up a new computer and wanted a local copy of the cloud, so I executed rclone sync . nextcloud:. This ended up deleting a good chunk of my cloud files. The correct command was rclone sync nextcloud: .. The manual for rclone sync includes this snippet:

Important: Since this can cause data loss, test first with the --dry-run flag to see exactly what would be copied and deleted.

✅ - Lesson: Prefer rclone copy or rclone copyto where possible as they do not delete files.
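In command form, the safer habits look something like this (the local directory is a placeholder):

# Preview exactly what sync would copy and delete before running it for real
rclone sync nextcloud: ~/nextcloud --dry-run
# Or use copy, which never deletes files at the destination
rclone copy nextcloud: ~/nextcloud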

Oof. Now that I had just deleted a bunch of files, it became a test to see if I could restore them. Since I use zfs-auto-snapshot, I figured rolling back to the most recent snapshot would fix the problem. So I logged onto the server to run zfs list:

NAME                        USED  AVAIL  REFER  MOUNTPOINT
tank                       1.03T  9.36T   941G  /tank

I have only a single ZFS dataset, so if I rolled back to a snapshot, I’d be rolling back every application, database, and media file to a certain point in time. Since I had just executed the erroneous rclone command, I thought it safe to roll everything back to the previous snapshot, taken shortly before. So I did.

✅ - Lesson: Use more datasets. Datasets are cheap, and each can have its own configuration (sharing, compression, snapshots, etc.). The FreeBSD handbook on ZFS states:

The only drawbacks to having an extremely large number of datasets is that some commands like zfs list will be slower, and the mounting of hundreds or even thousands of datasets can slow the FreeBSD boot process. […] Destroying a dataset is much quicker than deleting all of the files that reside on the dataset, as it does not involve scanning all of the files and updating all of the corresponding metadata.

I regretted rolling back. I opened up Nextcloud to see a blank screen. Nextcloud relies on MySQL, and the logs showed severe MySQL errors. Uh oh, why would MySQL be broken when it had been working at the time of that snapshot? MySQL wouldn’t start. Without too much thought, I incremented innodb_force_recovery all the way to 5 to get it to start, but then no data was visible. I had no database backups.

✅ - Lesson: Always make database backups using proper database tools (mysqldump, pg_dumpall, .backup). Store these in a snapshotted directory in case you need to rollback the backup.
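A couple of hedged examples of what that looks like (paths and database names are placeholders that mirror the layout at the end of this post):

# MySQL: consistent logical dump written to a snapshotted directory
mysqldump --all-databases --single-transaction > /tank/database-backups/mysql-$(date +%F).sql
# SQLite: online backup of the live database file
sqlite3 /tank/databases/rrinlog.db ".backup /tank/database-backups/rrinlog-$(date +%F).db"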

So I scrapped that database, but why had it gone awry? Here I only have hypotheses. The internet is not abundant in technicians diagnosing why a file system snapshot of a database failed, but here are some good leads. A zfs snapshot is not instantaneous. A database has a data file and several logs that ensure power loss doesn’t cause any corruption. However, if the database and these logs get out of sync (like they might with a snapshot), you might see the database try to insert data into unavailable space. I say “might” because with a low-volume application, or by snapshotting at just the right time, the files may be in sync and you won’t see this problem.

✅ - Lesson: If you are taking automatic zfs snapshots, do not take snapshots of datasets containing databases: zfs set com.sun:auto-snapshot=false tank/containers-db

I went back through the initial installation for Nextcloud. Thankfully, it recognized all the files restored from the snapshot. I thought my troubles were over, but no such luck. I wrote an application called rrinlog that ingests nginx logs and exposes metrics for Grafana (previously blogged: Replacing Elasticsearch with Rust and SQLite). This application uses SQLite with journal_mode=WAL, and I started noticing that writes didn’t go through. They didn’t fail; they just didn’t insert! Well, from the application’s perspective the data appeared to insert, but I couldn’t SELECT them. A VACUUM remarked that the database was corrupt.

✅ - Lesson: SQLite, while heavily resistant to corruption, is still susceptible, so don’t forget to backup SQLite databases too!

Maybe it’s a bug in the library I’m using, or maybe it’s a SQLite bug. An error should have been raised somewhere along the way; then I could have caught the issue earlier and not lost as much data. The next step was to recover what data I had left using .backup. Annoyingly, this backup ended with a ROLLBACK statement, so I needed to hand-edit it.
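For the curious, one way to carry out that kind of recovery, sketched with sqlite3’s .dump (which emits SQL text that ends in ROLLBACK when it runs into corruption; filenames are placeholders):

# Dump whatever SQLite can still read as SQL text
sqlite3 corrupt.db ".dump" > recovered.sql
# A dump that hit corruption ends with "ROLLBACK; -- due to errors"; flip it to COMMIT
sed -i 's/^ROLLBACK;.*/COMMIT;/' recovered.sql
# Re-import into a fresh database
sqlite3 restored.db < recovered.sql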

After these trials I’ve changed my directory structure a little bit and applied all the lessons learned:

|- applications (snapshotted)
|- databases (not snapshotted)
|- database-backups (snapshotted)
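A sketch of setting up that layout with separate datasets and the snapshot property from earlier (dataset names mirror the directories above):

# One dataset per concern so snapshots and rollbacks stay scoped
zfs create tank/applications
zfs create tank/databases
zfs create tank/database-backups
# Exclude only the live databases from automatic snapshots
zfs set com.sun:auto-snapshot=false tank/databases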

It’s always a shame when one has to undergo a bit of stress in order to internalize best practices, but the hope is that, having had this experience, I’ll apply them in round one instead of round two.