Welcome to NBSoftSolutions, home of the software development company and writings of its main developer: Nick Babcock. If you would like to contact NBSoftSolutions, please see the Contact section of the about page.

Pushing the power envelope with undervolting

This article has been superceded at it’s new home: https://sff.life/how-to-undervolt-gpu/

This article depicts how to use standard undervolting tools on a standard CPU and GPU to achieve a significant gain in power efficiency while preserving performance.


I’m migrating away from a 4.5L sandwich style case that used a 300w flex atx PSU to power:

  • Ryzen 2700
  • MSI GTX 1070 Aero ITX
  • 32GB ram at 1.35v
  • 1TB Nvme

For the curious, entering these components into various power supply calculators yielded:

So it already seems like we’re pushing our PSU, but I’m going to go with an even smaller rater PSU soon, so I’ll need to get creative through undervolting. Undervolting will:

  • Decrease energy consumption by lowering voltage (and thus watts)
  • Decrease temperature of components as there is less heat to shed
  • Can have no effect on performance, as we can undervolt at given frequencies

Could this be too good to be true?

How to Undervolt

Components we’ll be using:

  • P4460 Kill a Watt electricity usage monitor: to measure output from the wall. This is the only thing that costs money on the list – you may be able to rent it from a local library or utility company. A wattmeter is not critical, but it’ll give us a sense of total component draw that the PSU has to supply
  • Cinebench R20: Free CPU benchmark
  • 3dmark Timespy: Free GPU benchmark
  • HWiNFO64: to measure our sensor readings (temperature, wattage, etc)
  • MSI Afterburner: to tweak the voltage / frequency curve of the gpu
  • Motherboard bios: to tweak cpu voltage
  • Google sheets: A spreadsheet for tracking power usage, benchmarks, any modifications, etc

a Kill a Watt device to measure wall draw

Feel free to swap components out for alternatives, but I do want to stress the importance of benchmarks, as it’s desirable for any potential performance loss to become apparent when running benchmarks at each stage of the undervolt.

Attentive readers will note the absence of any stress tests (eg: Prime95 With AVX & Small FFTs + MSI Kombustor). It is my opinion that drawing obscene amounts of power to run these stress tests is just too unrealistic. Benchmarks should be stressful enough, else what are they benchmarking? But I understand the drive for those who want ultimate stability, so I’d suggest after we work our way down the voltage ladder to start climbing up as stress tests fail.

Initial Benchmarks and Measurements

Let’s break down what it really means when we’re measuring the number of watts flowing through the wattmeter:

  • Kill A Watt shows 200 watts
  • The PSU (SSP-300-SUG) is 300W 80 Plus Gold certified
  • PSU is able to convert at least 87%-90% of inbound power to the components with the rest dispersed as heat
  • The components are asking between 174-180 watts (else the PSU would be rated silver or platinum)
  • If the components are asking for max power (300 watts), the Kill A Watt should be reading 337-344 before shutdown

First we’ll measure idle:

  • Close all programs including those in the background using any cpu cycles
  • Open HWiNFO
  • Wait for the system to settle down (ie, the wattmeter converges to a reading)
  • Record what’s being pulled from the wall, cpu / gpu watts, and cpu / gpu temps

Then benchmark the cpu:

  • Keep everything closed
  • Run Cinebench R20
  • For each run record score, max wattmeter reading, max watts / temp from the cpu
  • Reset HWiNFO recorded max values
  • Repeat three times

Then benchmark the gpu:

  • Keep everything closed
  • Run Timespy
  • For each run record score, max wattmeter reading, max watts / temp from the gpu
  • Reset HWiNFO recorded max values
  • Repeat three times

For timespy, the score you are interested in is the graphics score (highlighted below)

The reason why we’re interested in measuring max values is that we need any future power supply to handle any and all spikes.

Here’s an example of what I recorded for my initial measurements and how I interpreted them. Idle:

  • Wattmeter reading: 52.5W
  • CPU watts: 22
  • GPU watts: 15


  • Average Timespy score: 5976
  • Max wattmeter reading: 241W
  • Max GPU watts: 159W
  • Max GPU temperature: 82c
  • Our GPU exceeds it’s TDP rating by 10W and total component draw is around 210-217W (241 * [.87, .9]). Through some algebra we can approximate other components consuming 50-60W

If you’re recording CPU watts during the GPU benchmark you may be able to say a bit more about motherboard + ram + etc usage, but be careful – max wattmeter, max gpu watts, and max cpu watts are unlikely to happen at the same time, so unless one has the ability to correlate all three of them, better to stick with the more likely event: max wattmeter occurs at the same time as max gpu watts during a gpu benchmark.


  • Average Cinebench score: 3294
  • Max wattmeter reading: 131W
  • Max CPU watts: 77W
  • Max CPU temperature: 81c
  • Our CPU exceeds it’s TDP rating by 12W and total component draw is around 114-118 (131 * [.87, .9]). Other components are consuming around 20W.

GPU Undervolt

Determine Target Frequency

First, determine what target frequency you’d like your GPU to boost to, ideally a number between the GPU’s boost clock and max clock. The GPU’s boost clock will be listed in the manufacturer’s specification and also in GPU-Z. The GPU’s max clock is determined at runtime through GPU boost and will be reported as the “GPU Clock” in HWiNFO64 (make sure it’s running while the benchmark are in progress). Below are screenshots from HWiNFO64 and GPU-Z showing the differences between these two numbers.

  • Boost clock: 1721mhz
  • Max clock: 1886mhz

GPU boost (different from boost clock) increases the GPU’s core clock while thermal and TDP headroom remain. This is somewhat counterproductive for us as any voltage increases to reach higher frequencies will cause an increase in power usage and blow our budget.

I chose my target frequency to be 1860mhz – 139mhz over boost and 26mhz under max, as during benchmarking the gpu clock was pinned at 1860mhz most the time. The exact number doesn’t matter, as one can adjust their target frequency depending on their undervolting results. I change my target frequency later on. We’ll also choose our starting voltage to be 950mv, which is a pretty middling undervolt, and we’ll work our way down.

MSI Afterburner

The tool for undervolting! Powerful, but can be unintuitive at first.

Ctrl + F to bring up the voltage frequency curve. Find our target voltage of 950mv and take note of the frequency (1809mhz in the screenshot)

Our target frequency (1860mhz) is greater than 1809mhz by 51mhz, so we increase the core clock by 51mhz.

This will shift the voltage frequency graph up by 51mhz ensuring a nice smooth ride up the voltage / frequency curve until we hit 1860mhz at 950mv. Then to guarantee we don’t exceed 950mv:

  • Select the point at 950mv
  • Hit “l” to lock the voltage
  • For all voltage points greater than 950mv, drag them to or below 1860mhz
  • Hit “✔” to apply
  • Afterburner will adjust the points to be the same frequency as our locked voltage
  • You may have to comb over >950mv points to ensure that afterburner didn’t re-adjust any voltage points to be greater than 1860mhz. It happens
  • Hit “✔” to apply

End result should look like:

After our hard work, we’ll want to save this as a profile so that we can refer back to it after we inevitably undervolt too far. Also after determining what is the best undervolt, we’ll want to have that profile loaded on boot, so ensure the windows icon is selected.

Rinse and Repeat

Do the benchmark routine:

  • Clear / reset sensor data for max power usage and temperature
  • Start Timespy
  • Keep eyes glued on the kill a watt and record max draw
  • After Timespy completes record score, gpu max power usage, max temperature, and wall max wall draw.
  • Repeat 3 times

After three successful benchmarks:

  • Open Afterburner
  • Reset the voltage / frequency graph by hitting the “↺” icon
  • Decrement the target voltage by 25mv and keep the same target frequency
  • Calculate new core clock offset from new target voltage
  • Proceed to adjust all greater voltages to our frequency
  • Re-benchmark


  • Undervolted to 875mv at 1860mhz
  • 850mv was not stable
  • No affect on Timespy score (<1% difference)
  • GPU max wattage decreased from 159W to 124W (20-23% reduction)
  • GPU max temp decreased from 82 to 75 (9-10% reduction)
  • Attempting an undervolt of 875mv at 1886mhz (the original max clock) was not stable

These are good results that demonstrate that there is no performance loss from undervolting, yet one can mitigate heat and power usage. I decided to take undervolting one step further and set my target frequency to the GPU’s boost frequency (1721mhz) and record the results:

  • Undervolted to 800mv at 1721mhz
  • Slight decrease to Timespy score (6-7%)
  • GPU max wattage decreased from 159W to 105W (33% reduction)
  • GPU max temp decreased from 82 to 68 (15-17% reduction)

To me, the loss in performance in this chase to undervolt is greatly outweighed by the gain in efficiency.

CPU Undervolting

Now onto CPU undervolting. This step will differ greatly depending on the motherboard, but for me it was as easy as adjusting the VCore Voltage Offset in 20mv increments.

After booting, run cinebench, record values – the works. Repeat a few times. On success, decrement the offset more.


  • Undervolted to -100mv offset
  • Small decrease in Cinebench score (<3% difference)
  • CPU max wattage decreased from 77W to 65W (15% reduction)
  • CPU max temp decreased from 81 to 76 (6% reduction)


We’ve decreased power usage for the CPU and GPU considerably (a combined 66 watts), lowered temperatures, and opted into additional undervolts for a near neglible performance loss. Yet we’re unable to say anything about power options we can slim down to, so I ran an informal stress test by running cinebench and timespy at the same time and recorded max watts recorded from the wall: 205 watts. Meaning components are really desiring 180-200 watts. Incredible – less than 200 watts in a stress test.

Monitoring Remote Sites with Traefik and Prometheus

Partial screenshot of a traefik metric dashboard

I have several sites deployed on VPSs like DigitalOcean that have been dockerized and are reverse proxied by traefik so they don’t have to worry about Let’s Encrypt, https redirection, etc. Until recently I had very little insight into these sites and infrastructure. I couldn’t answer basic questions like:

  • How many requests is each site handling
  • What are the response times for each site
  • Is the box over / underprovisioned

For someone who has repeatedly blogged about metrics and observability (here, here, here, here, and here) – this gap was definitely a sore spot for me. I sowed the gap shut with traefix static routes, prometheus metrics, basic authentication, and Let’s Encrypt.

Do note that this article assumes you already having a working setup with traefik and let’s encrypt.

Sample screenshot

Exposing Traefik Metrics

Traefik keeps count of how many requests each backend has handled and the duration of these requests. Traefik exposes several possibilities for exporting these metrics, but only one of them is a pull model, which is prometheus. A pull model is ideal here as it allows one to be able to track metrics from a laptop without any errors on the server side, and one can more easily tell if a service has malfunctioned. So we’ll allow metrics to be exposed with a modification to our traefik.toml


If traefik is running inside a docker container (in my case, docker compose) the default api port needs to be exposed.

    # ... snip
      - 80:80
      - 443:443
      - 8080:8080

Now once traefik starts, we can retrieve metrics from http://<server-ip>:8080/metrics. Three things wrong:

  • Metrics broadcast to the entire internet. We need to lock this down to only authenticated individuals.
  • Metrics served over http so others can snoop on the metrics.
  • Typically I find it preferable to lock traffic down to as few ports as possible.

We fix these issues by binding the listening address to localhost, and reverse proxying through a traefik frontend that forces basic authentication and TLS.

To bind port 8080 so it only listens locally, update the docker compose file

       - 80:80
       - 443:443
-      - 8080:8080
+      - ""

Now let’s proxy traffic through a basic auth TLS frontend.

First, create a prometheus user with a password of “mypassword” encoded with bcrypt using htpasswd (installed through apache2-utils on Ubuntu):

$ htpasswd -nbB prometheus mypassword

Potential performance issues with constantly authenticating using bcrypt can be mitigated with SHA1. Though in practice CPU usage is less than 1%.

Next, we configure traefik using the file directive. This basically configures traefik with static routes – think nginx or apache. While not explicitly mentioned anywhere, one can configure traefik with as many route providers as necessary (in this case, docker and file). A nice feature is that the file provider can delegate to a separate file that can be watched so traefik doesn’t need to be restarted on config change. For the sake of this article, I’m keeping everything in one file.

Add the below to traefik.toml. It’ll listen for metrics.myapp.example.com, only allow those who authenticate as prometheus, and then forward the request to our traefik metrics.


       url = ""

      backend = "internal-traefik"
      basicAuth = ["prometheus:$2y$05$JMP9BgFp6rtzDpAMatnrDeuj78UG7W05Zr4eyjtq2i7.gk0KZfcIC"]

          rule = "Host:metrics.myapp.example.com"

Note that this relies on Let’s Encrypt working, as metrics.myapp.example.com will automatically be assigned a cert. Pretty neat!

Exposing System Metrics

We now have a slew of metrics giving insights into the number of requests, response times, etc, but we’re missing system metrics to know if we’ve over or under provisioned the box. That’s where node_exporter comes into play. It’s analogous to collectd, telegraf, and diamond, but it is geared for prometheus’s pull model.

While one can use the prometheus-node-exporter apt package, I opted to install node_exporter’s latest version right from Github Releases. I avoided installing via docker as the readme states “It’s not recommended to deploy it as a Docker container because it requires access to the host system”. No major harm done, the node_exporter package is a single executable.

Executing it reveals some work to do.

$ ./node_exporter
INFO[0000] Listening on :9100
  • Plain http
  • Internet accessible port 9100

The official standpoint is that anything requiring auth or security should use a reverse proxy. That’s fine, good to know.

Downloading just an executable isn’t installing it, so let’s do that now before we forget. This is for systemd, so steps may vary.

cp ./node_exporter /usr/sbin/.
useradd --no-create-home --shell /usr/sbin/nologin --system node_exporter
cat > /etc/systemd/system/node_exporter.service <<EOF
Description=Node Exporter



systemctl enable node_exporter
systemctl start node_exporter

Now we’ll add authentication and security.

Node Exporter through Traefik

Now that we have multiple metric exporters on the box, it may seem tempting to look for a solution that aggregates both exporters so we only need to configure prometheus to scrape one endpoint. One agent to rule them all goes into depth as to why that’d be a bad idea, but the gist is operational bottlenecks, lack of isolation, and bottom up configuration (instead of top down like prometheus prefers).

Ok two endpoints are needed for our two metric exporters. We could split our metrics.myapp.example.com into traefik.metrics.myapp.example.com and node.metrics.myapp.example.com (phew!). This is a fine approach, but I’m going let the Let’s Encrypt servers have a breather and only work with metrics.myapp.example.com. We’ll have /traefik route to /metrics on our traefik server and /node-exporter appropriately.

I’ll post traefik.toml config with commentary following:


       url = ""

       url = ""

      backend = "internal-traefik"
      basicAuth = ["prometheus:$2y$05$JMP9BgFp6rtzDpAMatnrDeuj78UG7W05Zr4eyjtq2i7.gk0KZfcIC"]

          rule = "Host:metrics.myapp.example.com;Path:/traefik;ReplacePath:/metrics"

      backend = "internal-node"
      basicAuth = ["prometheus:$2y$05$JMP9BgFp6rtzDpAMatnrDeuj78UG7W05Zr4eyjtq2i7.gk0KZfcIC"]

          rule = "Host:metrics.myapp.example.com;Path:/node-exporter;ReplacePath:/metrics"

The backend for internal-node is “” and not “”. is the ip address of the docker bridge. This interface links containers with each other and the outside world. If we had used that would be traffic local to traefik inside it’s container (which node-exporter is not inside, it resides on the host system). The bridge ip address can be found by executing:

docker network inspect bridge --format='{{(index .IPAM.Config 0).Gateway}}'

We now can update the node-exporter to listen to internal traffic only on

node_exporter --web.listen-address=""

Other notes:

  • One can’t override the backend in a frontend route based on path, so two frontends are created. This means that the config turned out a bit more verbose.
  • This necessitates another entry for basic auth. In this example I copied and pasted, but one could generate a new bcrypt hash with the same password or use a different password
  • Each frontend rule uses the ReplacePath modifier to change the path to /metrics so something like /node-exporter gets translated to
  • I prefix each frontend and backend endpoint with “internal” so that these can be excluded in prometheus queries with a simple regex.

Now it’s time for prometheus to actually scrape these metrics.

Enter Prometheus

The prometheus config can remain pretty bland, if only a bit redundant.

  - job_name: 'traefik'
    scheme: https
      username: prometheus
      password: mypassword

    metrics_path: "/traefik"
    - targets: ['metrics.myapp.example.com']

  - job_name: 'node-exporter'
    scheme: https
      username: prometheus
      password: mypassword

    metrics_path: "/node-exporter"
    - targets: ['metrics.myapp.example.com']

And finally Grafana

I haven’t quite got the hang of promQL, so I’d recommend importing these two dashboards:

Play with and tweak as needed.


Definitely a lot of pros with this experience

  • Traefik contains metrics and only needed configuration to expose
  • Traefik can reverse proxy through static routes in a config file. I was worried I’d have to setup basic auth and tls through nginx, apache, or something like ghost tunnel, which I’m unfamiliar with. I love nginx, but I love managing fewer services more.
  • Installing node_exporter was a 5 minute ordeal
  • Pre-existing Grafana dashboards for node_exporter and traefik
  • A metric pull model sits more comfortably with me, as the alternative would be to push the metrics to my home server where I house databases such as graphite and timescale. Pushing data to my home network is a lovely thought, but one I don’t want to depend on right now.

In fact, it was such a pleasant experience that even for boxes where I don’t host web sites, I’ll be installing this traefik setup for node_exporter.

Downsampling Timescale Data with Continuous Aggregations

Gardening with time series data

Timescale v1.3 was released recently and the major feature is continuous aggregations. The main use case for these aggregations is downsampling data: which has been brought up several times before, so it’s great that these issues have been addressed.

Downsampling data allows queries spanning a long time interval to complete in a reasonable time frame. I wrote and maintain OhmGraphite, which I have configured to send hardware sensor data to a Timescale instance every couple seconds. Querying one sensor over the course of a week will cause the db to scan over 600k values. This query will take 10 or so seconds to run on penny pinched hardware. A single grafana dashboard can be composed of dozens of queries, which will compound performance degradations. If one could downsample the data from a resolution of 2 seconds to 5 minutes, the db would scan 150x fewer values.

This is why I’m so excited about the new continuous aggregations feature. It’s starting to get to the point where Timescale is checking all the boxes for me. I’ve compared it against graphite before and not downsampling was a caveat preventing me from investing more time with Timescale. To be clear, one could have approximated downsampling prior to this release, by implementing multi-table approach + a system timer to compute aggregations. But in my opinion, downsampling should be a native feature of any time series database.

There are caveats with continuous aggregations but they are documented and we’ll explore them here as well. Here I hope to outline how I’ve already started using them.

So OhmGraphite writes to a table structure like so

   host TEXT,
   hardware TEXT,
   hardware_type TEXT,
   identifier TEXT,
   sensor TEXT,
   sensor_type TEXT,
   sensor_index INT,
   value REAL

SELECT create_hypertable('ohm_stats', 'time', if_not_exists => TRUE);

Basically it’s time + sensor info + sensor value. Here is how we downsample to 5 minute intervals without losing sensor granularity.

CREATE VIEW ohm_stats_5m
WITH (timescaledb.continuous)
  time_bucket('5m', time) as time,
  host, hardware, hardware_type, identifier, sensor, sensor_type, sensor_index,
  avg(value) as value_avg,
  max(value) as value_max,
  min(value) as value_min
FROM ohm_stats
GROUP BY 1, host, hardware, hardware_type, identifier, sensor, sensor_type, sensor_index;

Some notes:

  • The CREATE VIEW has a backing table that materializes the data and refreshes automatically (so it’s more akin to a materialized view in that it trades spaces for faster querying times).
  • While I used avg, max, and min, any aggregate function can be used like count, stddev_samp, sum
  • Every column in the GROUP BY has an index created for it automatically unless timescaledb.create_group_indexes is set to false. I’m still grappling with how one should decide on this option. One can always create indices later on the materialized data, but I’m unsure how removing indices effects the refresh performance.
  • drop_chunks on ohm_stats will delete data on ohm_stats_5m, but decoupling this behavior is slated for future work. This unfortunately means that continuous aggregations are far from “space saver” as now there needs to be room for additional tables and indices. I’ve commented before that graphite’s disk format is already lightweight compared to a postgres db, and continuous aggregations only exasperates the difference.
  • The continuous aggregations lag behind realtime data by a set amount, but as I’ll demonstrate this caveat doesn’t concern my use case.

The one area where we have to do some heavy lifting is our queries should be rewritten on the client side to utilize the continuous aggregations when a “large” time range is requested. A large time range would be when each data point is 5 or more minutes apart. In grafana, I changed CPU load calculation from:

  $__timeGroupAlias("time", $__interval),
FROM ohm_stats
  $__timeFilter("time") AND
  hardware_type = 'CPU' AND
  sensor_type = 'Load' AND
  sensor LIKE '%Total%'


  $__timeGroupAlias("time", $__interval),
FROM ohm_stats
  $__timeFilter("time") AND
  hardware_type = 'CPU' AND
  sensor_type = 'Load' AND
  sensor LIKE '%Total%' AND
  '$__interval' < '5m'::interval


  $__timeGroupAlias("time", $__interval),
FROM ohm_stats_5m
  $__timeFilter("time") AND
  hardware_type = 'CPU' AND
  sensor_type = 'Load' AND
  sensor LIKE '%Total%' AND
  '$__interval' >= '5m'::interval

Some notes:

  • The first UNIONed query is exactly the same as the first query outside of '$__interval' < '5m'::interval, which will aid rewriting queries as they can be copied and pasted
  • The '$__interval' < '5m'::interval causes the first query to be a noop if the interval is 5 minutes or greater, so only the second query will compute.
  • More tables at a less granular time interval (think ohm_graphite_1d for year or multi-year reports), can be accomplished by just tacking more queries to be unioned.
  • Since the fine grained data (ohm_stats) is still present, one can zoom in on a past time period to see the high resolution data.
  • When the day comes when drop_chunks can keep around the continuous aggregations, then the query will need to always pull data from continuous aggregation if viewing data from more than 7 days ago (for example):
('$__interval' >= '5m'::interval OR
  (NOW() - $__timeFrom()) > '7 days'::interval)
  • Long range queries using the continuous aggregations complete instantly freeing up server resources



  • Continuous aggregations are a welcome feature and beat downsampling by hand
  • Postgres has more aggregate functions than graphite (though simulating graphite’s last aggregation may require a tad of SQL finesse).
  • Since the continuous aggregations are loselessly downsampling data, one gets both the performance of aggregated data and fine-grained access when zoomed in.
  • Multiple aggregations can be used per downsample (graphite only allows users to choose one unless they break it out into multiple series).

More work is needed to make continuous aggregations a clear win-win


  • Queries ideally shouldn’t be rewritten to seemlessly query the continuous aggregations (unsure if this can be solved without compromises)
  • More disk space needed to hold materialized data and indices in addition to the regular data. This will be partialy solved when drop_chunks allows one to selectively keep the continuous aggregation data past the policy period.
  • Some future continuous aggregation features may be enterprise only.

Result: I will continue porting all timescale queries to continuous aggregations. The feature isn’t perfect but I only expect it to get better with time.