Monitoring Windows system metrics with Grafana
Beautiful dashboards for your home components
Update: 2017-10-30: With new preferred graphite docker image
Update: 2017-11-01: I’ve open sourced a Windows utility that will export Open Hardware sensor data into Graphite. It adds GPU, power, temperature, and more, which won’t be covered in this post. Check it out! The follow-up post contains a little more context.
At work we’re using Grafana for realtime visualization dashboards. We recently started this work and have had a blast creating dashboards and teasing new insights from data. Last month, I attended Grafanacon 2016, and one of the themes was how prevalent Grafana has become, with screenshots of Microsoft, NASA, and Intel using Grafana dashboards in advertisements. Anyways, a coworker linked to a Grafana dashboard of someone’s Plex setup running on top of ESXi.
I run Hyper-V (I’ve written Linux Virtualization with a Mounted Window’s Share on Client Hyper-V) and don’t use Plex, but I wanted to know if I could create a dashboard about my machine as well. Since Windows already publishes metrics via Performance Counters, I wanted to visualize them with Grafana. Here’s how we’ll do it:
- The host machine has performance counters
- Install telegraf, a single-binary time series data collector, onto the host.
- Run Graphite within a docker container on a virtual machine.
- Install Grafana onto the virtual machine.
This may be overcomplicating things, as both docker and Grafana can be installed on Windows. Indeed, docker contains thorough information for Windows users. But I like to keep my host machine as clean as it can be and I still feel like docker and Grafana treat Windows as a second class citizen [citation needed].
After downloading the Telegraf Windows bundle, the default config is pretty good, but could use some updating for my use case. Below is the config that I’m currently using:
- Collection interval set to 5s, as the default of 10s seems too slow and values smaller than 5s tend to time out while gathering the metrics.
- Moved logfile to my `D:` drive as my `C:` drive is a fast SSD and it’s not worth storing logs there.
- The one output is our graphite instance running on our VM. There may be something wrong with Go’s (the language that telegraf is written in) domain name resolution as only the IP address worked.
- The graphite template removes the `host` segment and swaps the measurement and tags, so the `measurement` will be something like `win_disk` and one of the tags will be `C:`. The one thing I dislike with telegraf’s data output is that I can’t choose to filter out tags, so even useless tags get written to graphite.
- `IncludeTotal` for processors, as it gives a good overview of processor usage across all processors, which is convenient for calculations and also helpful if the number of processors ever changes.
- `LogicalDisk`: I prefer seeing the disk throughput instead of percentages.
- `System` performance metrics: I don’t think there is a use for these metrics for the average computer user.
- `Memory`: I prefer to see how many committed bytes there are as well as available bytes.
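To make the graphite template bullet above more concrete, here is a rough Python sketch of how a template could turn a measurement, its tags, and a field into a graphite path. It is a simplified model and not telegraf’s actual serializer. The template string `measurement.tags.field`, the tag names, and the counter name are all assumptions picked for illustration.

```python
def render_path(template, measurement, tags, field):
    """Build a dotted graphite path from a template (simplified model)."""
    remaining = dict(tags)  # copy so the caller's dict is left untouched
    parts = []
    for segment in template.split("."):
        if segment == "measurement":
            parts.append([measurement])
        elif segment == "field":
            parts.append([field])
        elif segment == "tags":
            parts.append(None)  # filled in once we know which tags are left
        elif segment in remaining:
            # Any other segment names a single tag that gets its own slot
            parts.append([remaining.pop(segment)])

    # Leftover tag values, ordered by tag key
    leftover = [remaining[key] for key in sorted(remaining)]

    path = []
    for part in parts:
        path.extend(leftover if part is None else part)

    # Dots inside a value would create extra path levels, so swap them out
    return ".".join(value.replace(".", "_") for value in path)


# Hypothetical tags and counter name, only for illustration
example_tags = {"instance": "C:", "objectname": "LogicalDisk"}
print(render_path("measurement.tags.field", "win_disk",
                  example_tags, "Disk_Read_Bytes_persec"))
```

With a template that starts with `measurement`, everything about the disks ends up grouped under `win_disk` in the graphite tree, which tends to make wildcard queries in Grafana easier to write.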
Running it at the command line is fine for testing, but having telegraf boot up on startup is ideal. Telegraf provides a way to install as a service. Since I install telegraf into `C:\Apps\telegraf`, I executed the service command like the following:
Update: 2017-10-30: My new favorite graphite docker image is the official one, graphiteapp/graphite-statsd, so installation may be a little different. Also I removed the configuration that started graphite on boot (it is handled natively by docker using the `restart` command line).
Next comes Graphite and its bad rap for being hard to install. Yes, a lot of it is rooted in truth and work needs to be done to make installation easier, but people tend to make mountains out of molehills. If one makes technical decisions solely on the difficulty of installation, then they’d be missing out. Instead, we’re just going to use docker to set up our instance. My favorite graphite installation is praekeltfoundation/graphite because it doesn’t come with a significant amount of unneeded programs. The only thing we need to change is the data retention policy; we want to store data at the five second level, and not every minute.
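To get a feel for what a five second retention costs, here is a small Python sketch that parses a carbon-style retention string (`precision:duration` pairs) and estimates the number of points and the on-disk size per metric. The retention string `5s:7d,1m:30d` is a made-up example, not the policy from this post. The byte sizes assume whisper’s usual layout of a 16 byte header, 12 bytes per archive entry, and 12 bytes per point, so treat the result as an estimate.

```python
# Seconds per unit suffix used in retention strings
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}


def to_seconds(text):
    """Convert a value like '5s' or '7d' into seconds."""
    return int(text[:-1]) * UNIT_SECONDS[text[-1]]


def parse_retentions(spec):
    """Turn '5s:7d,1m:30d' into a list of (seconds_per_point, points)."""
    archives = []
    for part in spec.split(","):
        precision, duration = part.strip().split(":")
        step = to_seconds(precision)
        archives.append((step, to_seconds(duration) // step))
    return archives


def estimated_file_size(archives):
    """Estimate bytes per metric file (assumed whisper layout, see above)."""
    header = 16 + 12 * len(archives)
    return header + sum(12 * points for _, points in archives)


archives = parse_retentions("5s:7d,1m:30d")  # hypothetical policy
print(archives)
print(estimated_file_size(archives), "bytes per metric")
```

Five second data for a week works out to roughly 1.4 MB per metric under these assumptions, which is worth keeping in mind since every performance counter and tag combination becomes its own metric.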
Now that we have our image, let’s boot it up!
- We expose port 2003, which is the inbound port for metrics data
- Also port 8000 so grafana can talk to the graphite render api
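Before pointing telegraf at the container, it can be handy to check that port 2003 accepts data. Graphite’s plaintext protocol is one line per data point in the form `path value timestamp`. The Python sketch below formats and sends such a line. The metric name `test.hello` and the IP address mentioned in the comment are placeholders, not values from this setup.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Format one data point for graphite's plaintext protocol."""
    if timestamp is None:
        timestamp = time.time()
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(host, port, line):
    """Open a TCP connection to carbon and send one formatted line."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))


# Example call, with a placeholder address for the VM running graphite:
# send_metric("192.168.1.10", 2003, format_metric("test.hello", 1))
```

If the call succeeds, the metric should show up in the graphite tree shortly afterwards.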
Grafana is the easiest step.
- Follow the install instructions
- Log in with admin, admin
- Add graphite as a datasource
- Start visualizing
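If the datasource does not seem to return anything, querying the graphite render API directly (the port 8000 exposed earlier) can help narrow down where the problem is. Here is a minimal Python sketch that builds a render URL. The base address and the metric name are placeholders, and the actual metric paths depend on the telegraf template in use.

```python
from urllib.parse import urlencode


def render_url(base, target, since="-5min"):
    """Build a graphite render API URL that returns JSON for one target."""
    query = urlencode({"target": target, "from": since, "format": "json"})
    return f"{base}/render?{query}"


# Placeholder address and metric name, only for illustration
print(render_url("http://192.168.1.10:8000", "win_cpu.total"))
```

Opening the printed URL in a browser should return a JSON list of datapoints if metrics are flowing.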
- Hardware temperature
- GPU metrics
These two will probably require information outside of telegraf. I’m hoping to hitch a ride off Open Hardware Monitor.
See the follow up post for how to include GPU, temperature and power metrics.