
Grafana

I will refer the reader to my feelings on docker. Watch in amazement as I stand up a turbo Grafana dashboard with no container software.

NOTE: the following config examples may not be up to date or complete. All grafana configs etc. are backed up to the same place as the mkdocs source. Use those files to restore grafana, not the examples in this guide.

I'm currently not utilizing a lot of the data I have access to.

Grafana Installation

sudo apt-get update
sudo apt-get upgrade
sudo apt-get install -y software-properties-common apt-transport-https wget
wget -q -O - https://packages.grafana.com/gpg.key | sudo apt-key add -
sudo add-apt-repository "deb https://packages.grafana.com/oss/deb stable main"
sudo apt-get update
sudo apt-get install grafana

sudo systemctl start grafana-server
sudo systemctl enable grafana-server


Log in with admin/admin.
connections->data sources->prometheus
Set the prometheus server URL to wherever you're hosting it (the grafana VM, probably): http://192.168.4.9:9090

dashboard->new->import
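If you'd rather not click through the UI after every rebuild, Grafana can also pick up data sources from a provisioning file at startup. A minimal sketch, assuming the stock provisioning directory; adjust the URL to your prometheus host:

```yaml
# /etc/grafana/provisioning/datasources/prometheus.yaml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://192.168.4.9:9090
    isDefault: true
```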

Grafana Data Transformations and Tricks

  • Jellyfin Client Use by Count
    • Transform rename fields by regex: .*client="([^"]+)".*
  • TrueNAS Arc Hit Ratio: multiply query by 100 to get into percentages

  • TrueNAS disk sorting is gross enough that I'm just putting the image here:

  • How to get slow metrics to always show up: last_over_time(jellyfin_client_playback_count_value[1d])

  • You can fix cut off label names on bar charts by adding some padding at the ends of the labels with a transform->rename by regex, and capturing the whole thing, then adding your padding.
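The ARC hit ratio trick is just arithmetic inside the query. TrueNAS reports the ratio directly as a fraction, so I multiply by 100; if you only had raw hit/miss counters you could build the percentage yourself. A sketch, with assumed metric names (yours depend on your graphite mappings):

```
truenas_zfs_arc_hits / (truenas_zfs_arc_hits + truenas_zfs_arc_misses) * 100
```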

Data Scraping Overview

I tried using SNMP on a bunch of stuff. It is truly horrible: it runs like shit and is basically impossible to configure. Running device-specific tooling will be FAR superior in every case. I use it for esxi because afaik there's no other secure option; creating a metrics user in esxi for ESXi Shell/SSH access is still not really secure, because that would involve permanently enabling ssh access to esxi.

Telegraf Install:

echo 'deb [signed-by=/etc/apt/trusted.gpg.d/influxdata-archive_compat.gpg] https://repos.influxdata.com/debian stable main' | sudo tee /etc/apt/sources.list.d/influxdata.list
sudo apt-get update && sudo apt-get install telegraf
  • pfsense -> telegraf package -> graphite exporter -> prometheus
    • packages->telegraf
    • set endpoint to graphite
    • set graphite server ip, port is 2003
    • prefix I used is _pfsense. I fucked up with the _.
    • graphiteexporter/graphite-mappings.conf contains its mappings
  • truenas -> graphite builtin -> graphite exporter -> prometheus
    • system->reporting
    • set graphite server ip, port is 2003
    • graphite-mappings.conf contains its mappings
  • esxi -> resxtop + script -> telegraf -> prometheus
    • nightmare.
    • uses a custom script to reparse resxtop data output and send it to telegraf
  • supermicro -> ipmi_exporter -> prometheus
    • dirt simple
    • create an account with "User" level so you don't leave your admin level bmc creds lying around.
  • ups -> apcups_exporter -> prometheus
  • jellyfin -> json_exporter -> prometheus
  • servarr stack -> exportarr -> prometheus
  • misc statuses -> blackbox_exporter -> prometheus
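All of these pipelines terminate in ordinary prometheus scrape jobs. A sketch of what the prometheus.yml entries look like (ports shown are the exporters' defaults; swap in your own addresses):

```yaml
scrape_configs:
  - job_name: graphite_exporter
    static_configs:
      - targets: ["192.168.4.9:9108"]   # graphite_exporter's metrics port
  - job_name: ipmi_exporter
    static_configs:
      - targets: ["192.168.4.9:9290"]   # ipmi_exporter default port
```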

Prometheus and Exporters

I only use Prometheus. This is for cleanliness, and because Prometheus is fundamentally far more efficient than influxdb: it only stores float64 samples keyed by label strings. Here is the shell script I use to run all of the exporters. I run them on a separate VM that does nothing else; it has another layer of firewalling and security on it due to the credentials it's storing for these things.

Look at the main.go files for each of the exporters on their respective github pages if you need to see more launch args. There is no documentation anywhere.

#!/bin/bash
cd "$(dirname "$0")"
# Set the PATH variable explicitly to include any necessary directories
export PATH=/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin:/your/custom/path
screen -ls | grep '(Detached)' | awk '{print $1}' | xargs -I % -t screen -X -S % quit
screen -dmS prometheus ./prometheus/prometheus --config.file=prometheus/prometheus.yml
screen -dmS graphite_exporter ./exporter_graphite/graphite_exporter --graphite.listen-address=0.0.0.0:2003 --graphite.mapping-config=exporter_graphite/graphite-mappings.conf
screen -dmS blackbox_exporter ./exporter_blackbox/blackbox_exporter --config.file=exporter_blackbox/blackbox.yml
screen -dmS ipmi_exporter ./exporter_ipmi/ipmi_exporter --config.file=exporter_ipmi/ipmi_exporter.yaml
screen -dmS apcups_exporter ./exporter_apcups/apcups_exporter --metrics-interval 5 --metrics-address 192.168.4.9 --metrics-port 5000
screen -dmS json_exporter ./exporter_json/json_exporter --config.file exporter_json/json_exporter.yml
screen -dmS sabnzbd_exportarr ./exporter_exportarr/exportarr "sabnzbd" --port 9710 --url "http://192.168.5.4:8080" --api-key "APIKEY"
screen -dmS radarr_exportarr ./exporter_exportarr/exportarr "radarr" --port 9711 --url "http://192.168.5.4:7878" --api-key "APIKEY" --enable-additional-metrics true
screen -dmS sonarr_exportarr ./exporter_exportarr/exportarr "sonarr" --port 9712 --url "http://192.168.5.4:8989" --api-key "APIKEY" --enable-additional-metrics true
screen -dmS sonarrAnime_exportarr ./exporter_exportarr/exportarr "sonarr" --port 9713 --url "http://192.168.5.4:8990" --api-key "APIKEY" --enable-additional-metrics true
screen -dmS prowlarr_exportarr ./exporter_exportarr/exportarr "prowlarr" --backfill true --port 9714 --url "http://192.168.5.4:9696" --api-key "APIKEY" --enable-additional-metrics true

For each of the following, the configuration file goes in the same directory as the binary. You can change this however you want.

Prometheus

Blackbox

Dirt simple endpoint status prober. I use it to make sure stuff is online. Run with the default config and add scrape jobs in prometheus.yml. Easy.
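The scrape job for blackbox is the standard relabeling dance from its README: the URL you want probed goes in as a query parameter, and `__address__` gets rewritten to point at the exporter itself. A sketch (the module name and addresses are assumptions for my setup):

```yaml
scrape_configs:
  - job_name: blackbox_http
    metrics_path: /probe
    params:
      module: [http_2xx]              # module defined in blackbox.yml
    static_configs:
      - targets:
          - http://192.168.5.4:8096   # endpoint to probe
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target  # probed URL becomes ?target=
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 192.168.4.9:9115 # blackbox_exporter itself
```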

Graphite

Graphite is some prometheus competitor, idfk, but PfSense and TrueNAS both support it natively and it's smooth, so I use it for them. The graphite mappings file is ultra important: it remaps "."-delimited graphite strings to multi-label prometheus metrics.

Graphite sources are self-reporting, so you're going to get all metrics from a device no matter what mappings you have or don't have, UNLESS you use --graphite.mapping-strict-match. This argument makes the exporter record only matched metrics. I don't use it atm because I'm too lazy to map my hba temperature metric.

To see the raw graphite metrics (so you can build regexes for the mapping matchers), run a fake endpoint that just prints everything it receives:

socat -v TCP-LISTEN:2003,fork,reuseaddr -
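What you'll see on the wire is graphite's plaintext protocol, one metric per line: a dot-delimited path, a value, and a unix timestamp. For example (hostname made up):

```
servers.mynas.zfs_arc.hits 12345 1700000000
```

Those dot paths are what the mapping regexes match against.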

Here is my graphite mappings file for TrueNAS and PfSense:

graphite_mappings.conf

mappings:
    # ifstats mapping
    - match: 'servers\.(.*)\.interface-(.*)\.if_(.*)'
      match_type: regex
      name: 'truenas_interface_${3}'
      labels:
        hostname: ${1}
        device: ${2}
    # dataset metrics mapping
    - match: 'servers\.(.*)\.df-(.*)\.(.*)'
      match_type: regex
      name: 'truenas_dataset_${3}'
      labels:
        hostname: ${1}
        device: ${2}
    # memory metrics mapping
    - match: 'servers\.(.*)\.memory\.(.*)'
      match_type: regex
      name: 'truenas_${2}'
      labels:
        hostname: ${1}
    # zfs arc metrics mapping
    - match: 'servers\.(.*)\.zfs_arc\.(.*)'
      match_type: regex
      name: 'truenas_zfs_arc_${2}'
      labels:
        hostname: ${1}
    # processes metrics
    - match: 'servers\.(.*)\.processes\.(.*)'
      match_type: regex
      name: 'truenas_processes_${2}'
      labels:
        hostname: ${1}
    # LA metrics
    - match: 'servers\.(.*)\.load\.load\.(.*)'
      match_type: regex
      name: 'truenas_load_${2}'
      labels:
        hostname: ${1}
    # rrd cache metrics
    - match: 'servers\.(.*)\.rrdcached\.(.*)'
      match_type: regex
      name: 'truenas_rrdcached_${2}'
      labels:
        hostname: ${1}
    # swap metrics
    - match: 'servers\.(.*)\.swap\.(.*)'
      match_type: regex
      name: 'truenas_swap_${2}'
      labels:
        hostname: ${1}
    # uptime metric
    - match: 'servers\.(.*)\.uptime\.(.*)'
      match_type: regex
      name: 'truenas_uptime_${2}'
      labels:
        hostname: ${1}
    # disk metrics mapping
    - match: 'servers\.(.*)\.disk-(.*)\.(.*)\.(.*)'
      match_type: regex
      name: 'truenas_${3}_${4}'
      labels:
        hostname: ${1}
        device: ${2}
    # cpu and nfs metrics mapping
    - match: 'servers\.(.*)\.(.*)-(.*)\.(.*)'
      match_type: regex
      name: 'truenas_${2}_${4}'
      labels:
        hostname: ${1}
        device: ${3}
    - match: '^pfsense_\.([^.]+)\.([^.]+)\.net\.([^.]+)$'
      match_type: regex
      name: 'pfsense_net_${3}'
      labels:
        host: ${1}
        interface: ${2}
    - match: '^pfsense_\.([^.]+)\.([^.]+)\.([^.]+)\.pfconfig\.interface\.status'
      match_type: regex
      name: pfsense_interface_status
      labels:
        host: ${1}
        name: ${2}
        interface: ${3}

TrueNAS

Add Pool Status to Graphite

A cron job that maps statuses to labels using one-hot encoding. Simply query like this: truenas_pool_status{pool="primary"} == 1 and you'll get what you want; set the legend to {{ status }}.

for p in $(zpool list -H -o name); do status=$(zpool get -H -o value health "$p"); ts=$(date +%s); for s in ONLINE DEGRADED FAULTED OFFLINE UNAVAIL REMOVED UNKNOWN; do v=0; [ "$status" = "$s" ] && v=1; echo "truenas.pool_status;pool=$p;status=$s $v $ts"; done; done | nc -w 1 192.168.4.9 2003
Add HBA Temp to Graphite

One line cron job!!! (tasks->cron jobs) Run as root; the schedule is just * (every minute, the fastest cron allows).

echo "truenas.hba_temp $(mprutil -u 0 show cfgpage page 7 | awk '/^0010/ {printf "%d", "0x" $4 $5}')" $(date +%s) | nc -w 1 192.168.4.9 2003
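The awk bit grabs two hex bytes off the `0010` row of the config page and concatenates them into a `0x`-prefixed number (the exact byte positions are specific to my HBA's page layout, so check yours with mprutil first). Bash's printf does the equivalent hex-to-decimal conversion; a toy demo:

```shell
# 0x003c hex -> 60 decimal, the same conversion the awk printf performs
printf '%d\n' 0x003c
```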
Add SMB Metrics to Graphite

vibe coded a python script that uses tcpdump and listens on the SMB ports:

truenas_smb_metrics.py

#!/usr/bin/env python3
"""
SMB Traffic Monitor for TrueNAS Core
Monitors SMB traffic on ports 445 and 139 using tcpdump and reports to Graphite
"""

import socket
import time
import argparse
from datetime import datetime
from collections import defaultdict
import threading
import sys
import subprocess
import re

class SMBTrafficMonitor:
    def __init__(self, interface, graphite_host, graphite_port=2003, interval=10, debug=False):
        self.interface = interface
        self.graphite_host = graphite_host
        self.graphite_port = graphite_port
        self.interval = interval
        self.running = False
        self.debug = debug

        # Metrics storage
        self.metrics = {
            'bytes_in': 0,
            'bytes_out': 0,
            'packets_in': 0,
            'packets_out': 0,
            'connections': defaultdict(int)
        }
        self.lock = threading.Lock()

        # SMB ports
        self.smb_ports = {445, 139}

    def get_local_ip(self):
        """Get local IP address of the interface"""
        try:
            result = subprocess.run(
                ['ifconfig', self.interface],
                capture_output=True,
                text=True
            )
            for line in result.stdout.split('\n'):
                if 'inet ' in line:
                    parts = line.strip().split()
                    if len(parts) >= 2:
                        return parts[1]
        except Exception as e:
            print(f"Error getting local IP: {e}")

        return "0.0.0.0"

    def capture_packets(self):
        """Capture and analyze packets using tcpdump"""
        local_ip = self.get_local_ip()
        print(f"Monitoring SMB traffic on {self.interface} (IP: {local_ip})")
        print(f"Reporting to Graphite at {self.graphite_host}:{self.graphite_port}")
        print(f"Debug mode: {self.debug}")

        # tcpdump filter for SMB ports (445 and 139)
        tcpdump_filter = "tcp port 445 or tcp port 139"

        # Start tcpdump with verbose mode to get actual packet lengths
        # -v gives us the IP packet length which is what we want
        cmd = [
            'tcpdump',
            '-i', self.interface,
            '-l',  # Line buffered
            '-n',  # Don't resolve hostnames
            '-v',  # Verbose - gives us IP length
            '-tt', # Print timestamp as seconds since epoch
            tcpdump_filter
        ]

        try:
            process = subprocess.Popen(
                cmd,
                stdout=subprocess.PIPE,
                stderr=subprocess.DEVNULL,
                universal_newlines=True,
                bufsize=1
            )
        except FileNotFoundError:
            print("Error: tcpdump not found. Please install tcpdump.")
            sys.exit(1)
        except PermissionError:
            print("Error: This script requires root privileges to run tcpdump")
            sys.exit(1)

        print("tcpdump started successfully...")

        # Regex patterns to parse tcpdump verbose output
        # tcpdump -v output spans multiple lines:
        # Line 1: timestamp IP (..., length XXXX)
        # Line 2:     src_ip.src_port > dst_ip.dst_port: ...

        # Pattern to capture IP length (total IP packet size including headers)
        ip_length_pattern = re.compile(r'proto TCP \(6\), length (\d+)\)')

        # Pattern to capture source and destination (on the indented line)
        addr_pattern = re.compile(
            r'^\s+(\d+\.\d+\.\d+\.\d+)\.(\d+)\s*>\s*'
            r'(\d+\.\d+\.\d+\.\d+)\.(\d+):'
        )

        packet_count = 0
        parse_success = 0
        current_ip_length = None

        while self.running:
            line = None
            try:
                line = process.stdout.readline()
                if not line:
                    if process.poll() is not None:
                        print("tcpdump process ended unexpectedly")
                        break
                    continue

                packet_count += 1

                if self.debug and packet_count <= 10:
                    print(f"Raw line {packet_count}: {line.strip()}")

                # Check if this line has the IP length (first line of packet)
                ip_match = ip_length_pattern.search(line)
                if ip_match:
                    current_ip_length = int(ip_match.group(1))
                    if self.debug and packet_count <= 10:
                        print(f"  -> Found IP length: {current_ip_length}")
                    continue  # Address will be on next line

                # Check if this line has the addresses (second line of packet)
                addr_match = addr_pattern.search(line)
                if addr_match and current_ip_length is not None:
                    src_ip = addr_match.group(1)
                    src_port = int(addr_match.group(2))
                    dst_ip = addr_match.group(3)
                    dst_port = int(addr_match.group(4))

                    # Add Ethernet header (14 bytes) to get total wire size
                    packet_size = current_ip_length + 14

                    parse_success += 1

                    if self.debug and parse_success <= 10:
                        print(f"Parsed packet {parse_success}: {src_ip}:{src_port} -> {dst_ip}:{dst_port} ({packet_size} bytes on wire)")

                    if packet_count % 1000 == 0:
                        print(f"Processed {packet_count} lines, {parse_success} packets parsed successfully")

                    # Determine if incoming or outgoing (only meaningful once we
                    # have both the length line and the address line of a packet)
                    is_incoming = False
                    connection_key = None

                    if dst_port in self.smb_ports and dst_ip == local_ip:
                        # Incoming to SMB server
                        is_incoming = True
                        connection_key = f"{src_ip}:{src_port}"
                        if self.debug and packet_count <= 10:
                            print(f"  -> SMB INCOMING")
                    elif src_port in self.smb_ports and src_ip == local_ip:
                        # Outgoing from SMB server
                        is_incoming = False
                        connection_key = f"{dst_ip}:{dst_port}"
                        if self.debug and packet_count <= 10:
                            print(f"  -> SMB OUTGOING")

                    if connection_key:
                        with self.lock:
                            if is_incoming:
                                self.metrics['bytes_in'] += packet_size
                                self.metrics['packets_in'] += 1
                            else:
                                self.metrics['bytes_out'] += packet_size
                                self.metrics['packets_out'] += 1

                            self.metrics['connections'][connection_key] += 1

                    # Done with this packet; don't reuse its length for the next
                    current_ip_length = None

            except (AttributeError, ValueError, IndexError) as parse_error:
                if self.debug and line:
                    print(f"Failed to parse line: {parse_error}")
                    print(f"Line was: {line}")
                continue
            except Exception as e:
                if self.running:
                    print(f"Error reading tcpdump output: {e}")
                    if self.debug and line:
                        print(f"Line was: {line}")
                continue

        # Cleanup
        process.terminate()
        try:
            process.wait(timeout=5)
        except subprocess.TimeoutExpired:
            process.kill()

    def send_to_graphite(self, metric_name, value, timestamp):
        """Send metric to Graphite"""
        message = f"{metric_name} {value} {timestamp}\n"
        try:
            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            sock.settimeout(5)
            sock.connect((self.graphite_host, self.graphite_port))
            sock.sendall(message.encode())
            sock.close()
            return True
        except Exception as e:
            print(f"Error sending to Graphite: {e}")
            return False

    def report_metrics(self):
        """Periodically report metrics to Graphite"""
        base_metric = "truenas.smb"

        while self.running:
            time.sleep(self.interval)

            timestamp = int(time.time())

            with self.lock:
                # Calculate rates (per second)
                bytes_in_rate = self.metrics['bytes_in'] / self.interval
                bytes_out_rate = self.metrics['bytes_out'] / self.interval
                packets_in_rate = self.metrics['packets_in'] / self.interval
                packets_out_rate = self.metrics['packets_out'] / self.interval
                active_connections = len(self.metrics['connections'])

                # Send metrics
                self.send_to_graphite(f"{base_metric}.bytes_in", bytes_in_rate, timestamp)
                self.send_to_graphite(f"{base_metric}.bytes_out", bytes_out_rate, timestamp)
                self.send_to_graphite(f"{base_metric}.packets_in", packets_in_rate, timestamp)
                self.send_to_graphite(f"{base_metric}.packets_out", packets_out_rate, timestamp)
                self.send_to_graphite(f"{base_metric}.active_connections", active_connections, timestamp)
                self.send_to_graphite(f"{base_metric}.total_bandwidth", bytes_in_rate + bytes_out_rate, timestamp)

                # Console output
                print(f"\n[{datetime.now().strftime('%Y-%m-%d %H:%M:%S')}] Metrics:")
                print(f"  Bytes In:  {bytes_in_rate:.2f} B/s ({bytes_in_rate/1024:.2f} KB/s)")
                print(f"  Bytes Out: {bytes_out_rate:.2f} B/s ({bytes_out_rate/1024:.2f} KB/s)")
                print(f"  Packets In:  {packets_in_rate:.2f} pkt/s")
                print(f"  Packets Out: {packets_out_rate:.2f} pkt/s")
                print(f"  Active Connections: {active_connections}")

                # Reset counters
                self.metrics['bytes_in'] = 0
                self.metrics['bytes_out'] = 0
                self.metrics['packets_in'] = 0
                self.metrics['packets_out'] = 0
                self.metrics['connections'].clear()

    def start(self):
        """Start monitoring"""
        self.running = True

        # Start packet capture thread
        capture_thread = threading.Thread(target=self.capture_packets)
        capture_thread.daemon = True
        capture_thread.start()

        # Start reporting thread
        report_thread = threading.Thread(target=self.report_metrics)
        report_thread.daemon = True
        report_thread.start()

        try:
            # Keep main thread alive
            while True:
                time.sleep(1)
        except KeyboardInterrupt:
            print("\nStopping monitor...")
            self.running = False
            time.sleep(2)

def main():
    parser = argparse.ArgumentParser(description='Monitor SMB traffic and report to Graphite')
    parser.add_argument('-i', '--interface', required=True, help='Network interface to monitor (e.g., em0, igb0, vmx0)')
    parser.add_argument('-g', '--graphite-host', required=True, help='Graphite server hostname or IP')
    parser.add_argument('-p', '--graphite-port', type=int, default=2003, help='Graphite port (default: 2003)')
    parser.add_argument('-t', '--interval', type=int, default=10, help='Reporting interval in seconds (default: 10)')
    parser.add_argument('-d', '--debug', action='store_true', help='Enable debug output')

    args = parser.parse_args()

    monitor = SMBTrafficMonitor(
        interface=args.interface,
        graphite_host=args.graphite_host,
        graphite_port=args.graphite_port,
        interval=args.interval,
        debug=args.debug
    )

    monitor.start()

if __name__ == '__main__':
    main()

Create a startup task that executes the script with the appropriate args; the script's --help will show you what to do. Easy.

PfSense

Add Interface Names to Metrics

This guy is a legend: https://github.com/VictorRobellini/pfSense-Dashboard/tree/master

Note: telegraf doesn't care what data format it ingests (influx, graphite, whatever); it can translate metrics to whatever output format seamlessly.

I took his php code and stripped it down a bit because my dashboard is public and I don't want to include ip addresses etc. (don't want to give away my network map).


telegraf_pfif_alchemy.php

#!/usr/local/bin/php-cgi -f
<?php
require_once("config.inc");
require_once("gwlb.inc");
require_once("interfaces.inc");
$host = gethostname();
$source = "pfconfig";
$iflist = get_configured_interface_with_descr(true);
foreach ($iflist as $ifname => $friendly) {
    $ifinfo =  get_interface_info($ifname);
    $ifstatus = $ifinfo['status'];
    $ifconf = $config['interfaces'][$ifname];
    $realif = get_real_interface($ifname);
    $mac = get_interface_mac($realif);
    if (!isset($ifinfo)) {
        $ifinfo = "Unavailable";
    }
    if (strtolower($ifstatus) == "up") {
        $ifstatus = 1;
    }
    if (strtolower($ifstatus) == "active") {
        $ifstatus = 1;
    }
    if (strtolower($ifstatus) == "no carrier") {
        $ifstatus = 0;
    }
    if (strtolower($ifstatus) == "down") {
        $ifstatus = 0;
    }
    if (!isset($ifstatus)) {
        $ifstatus = 2;
    }
    if (!isset($ifconf)) {
        $ifconf = "Unassigned";
    }
    if (!isset($realif)) {
        $realif = "Unassigned";
    }
    printf(
        "interface,host=%s,name=%s,friendlyname=%s,source=%s status=%s\n",
        $host,
        $realif,
        $friendly,
        $source,
        $ifstatus
    );
};
?>

pfsense->diagnostics->Edit File (or Command Prompt) and upload it, then chmod +x it.

In the telegraf package settings: Additional configuration for Telegraf:

[[inputs.exec]]
  commands = ["/usr/local/bin/telegraf_pfif_alchemy.php"]
  timeout = "5s"
  interval = "5s"
  data_format = "influx"
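Given the printf format string in the script above, each interface comes out as one influx line protocol record, e.g. (values illustrative):

```
interface,host=pfsense.home,name=igb0,friendlyname=WAN,source=pfconfig status=1
```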

How to merge these in grafana: note that "interface" is my common denominator here. It's what I refer to my physical adapters as, and it's what I defined in my graphite mappings file when I regex-parsed things with captures.

rate(pfsense_net_bytes_sent[30s]) * on(interface) group_left(name) pfsense_interface_status

idfk why but pfsense traffic metrics seem to be flipped. whatever.

Json

I hate slogging through Json on the web so much. I got it working though.

Jellyfin

This pulls metrics from Jellyfin's playback reporting plugin. I have it set to ignore my personal user guid since I cause the metrics to be heavily skewed. This is an insanely powerful tool since any SQL query can be run and parsed into metrics.


json_exporter.yml

modules:
  jellyfin_client_count:
    # Simple SQL query against the playback reporting DB
    body:
      content: |
        {
          "CustomQueryString": "SELECT ClientName, COUNT(ClientName) AS Count FROM PlaybackActivity WHERE UserId != '5d9eadb456d14511b268c9eddcbeb0b8' GROUP BY ClientName ORDER BY Count DESC;"
        }
    headers:
      Content-Type: "application/json"
      Authorization: "MediaBrowser Token="
    metrics:
      - name: jellyfin_client_playback_count
        help: Total playback count per client (excluding specified user)
        type: object
        path: "{ $.results[*] }"  # Iterates over the array, e.g., ["Jellyfin Web", 488]
        labels:
          client: "{ [0] }"      # <-- CORRECT SYNTAX: Gets index 0
        values:
          value: "{ [1] }"       # <-- CORRECT SYNTAX: Gets index 1

  jellyfin_client_duration:
    body:
      content: |
        {
          "CustomQueryString": "SELECT ClientName, SUM(PlayDuration) AS Count FROM PlaybackActivity WHERE UserId != '5d9eadb456d14511b268c9eddcbeb0b8' AND PlayDuration > 0 GROUP BY ClientName ORDER BY Count DESC;"
        }
    headers:
      Content-Type: "application/json"
      Authorization: "MediaBrowser Token="
    metrics:
      - name: jellyfin_client_playback_duration_seconds
        help: Total playback duration per client in seconds (excluding specified user)
        type: object
        path: "{ $.results[*] }"
        labels:
          client: "{ [0] }"
        values:
          value: "{ [1] }"

  jellyfin_weekday_duration:
    body:
      content: |
        {
          "CustomQueryString": "SELECT strftime('%w', DateCreated) AS Weekday, SUM(PlayDuration) AS TotalPlayDuration FROM PlaybackActivity WHERE UserId != '5d9eadb456d14511b268c9eddcbeb0b8' AND PlayDuration > 0 GROUP BY strftime('%w', DateCreated) ORDER BY strftime('%w', DateCreated);"
        }
    headers:
      Content-Type: "application/json"
      Authorization: "MediaBrowser Token="
    metrics:
      - name: jellyfin_weekday_playback_duration_seconds
        help: Total playback duration per weekday in seconds (excluding specified user)
        type: object
        path: "{ $.results[*] }"
        labels:
          weekday: "{ [0] }"
        values:
          value: "{ [1] }"

  jellyfin_hourly_duration:
    body:
      content: |
        {
          "CustomQueryString": "SELECT strftime('%H', DateCreated) AS HourOfDay, SUM(PlayDuration) AS TotalDuration FROM PlaybackActivity WHERE UserId != '5d9eadb456d14511b268c9eddcbeb0b8' AND PlayDuration > 0 GROUP BY HourOfDay ORDER BY HourOfDay;"
        }
    headers:
      Content-Type: "application/json"
      Authorization: "MediaBrowser Token="
    metrics:
      - name: jellyfin_hourly_playback_duration_seconds
        help: Total playback duration per hour of day in seconds (excluding specified user)
        type: object
        path: "{ $.results[*] }"
        labels:
          hour: "{ [0] }"
        values:
          value: "{ [1] }"
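json_exporter is a probe-style exporter, so the prometheus scrape job picks the module and passes the Jellyfin endpoint in as the target via the usual relabeling. A sketch; the Playback Reporting query endpoint path and all addresses here are assumptions for my setup:

```yaml
scrape_configs:
  - job_name: jellyfin_json
    metrics_path: /probe
    params:
      module: [jellyfin_client_count]
    static_configs:
      - targets:
          - http://192.168.5.4:8096/user_usage_stats/submit_custom_query
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 192.168.4.9:7979   # json_exporter's default port
```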

SNMP

SNMP is vile. Avoid if possible. I am going to briefly touch on what I learned.

OIDs are strings of numbers used at runtime as the ID of a metric, like a globally unique guid agreed upon by all of mankind, unique across all systems. MIBs are like dictionaries that map OIDs to human-readable string identifiers and metric formats.

Prometheus has a cute program for this (the snmp_exporter generator). If your SNMP config uses straight OIDs, no human-readable strings, and is set up perfectly, then you don't need to carry MIBs around with you. So the generator produces that config file for you, and you only need the MIBs once.

Learn to use the snmp generator tool. Some tips: get all the mibs you can. Even proprietary software mostly uses standard public mibs; I have copies of VMWare-specific mibs I included when generating my snmp config. SNMPv2 is fine; v3 is secure as hell (encrypted).
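The generator's input file is small; a sketch of a generator.yml (the module name and OID choices are placeholders, and the exact schema varies between snmp_exporter versions):

```yaml
auths:
  my_v2:
    version: 2
    community: public
modules:
  esxi:
    walk:
      - 1.3.6.1.2.1.2      # IF-MIB interfaces
      - 1.3.6.1.2.1.25     # HOST-RESOURCES-MIB
```

Run the generator against your MIB collection and it spits out the full snmp.yml the exporter actually consumes.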

VMWare ESXi

using resxtop

NOTE: I have no idea why, but after a reboot the permissions for the role I created ("resxtop") seem to reset? Need to watch out for this.

In ESXi:

Manage->Security and Users->Roles->Add Role -> I called it "resxtop"

Edit the "resxtop" role: it needs System and some Global permissions. In Global it needs ServiceManagers, Health, SystemTag, and GlobalTag.

Go to Users and add a new user.

Right click the "Host" and go to "Permissions".

Assign the role to the user account you made.

APCUPS

Quick little exporter that hooks to an apcupsd process. done.

I find hooking my ups to apcupsd over usb a lot more elegant than messing with networking.

IPMI

Quick, easy, performant. Notice I only reference "ipmi" in the exporter's config under collectors. "User" level accounts on supermicro boards don't have access to much, and I don't need anything else. You need FreeIPMI's tools on your system for this to work (the exporter shells out to them; note the FreeIPMI driver in the config below). I have them in my PATH atm, otherwise I'd just slap the binaries in the same folder as the tool.


ipmi.yml

modules:
  default:
    # Credentials for the remote BMC (10.0.0.10)
    user: ""
    pass: ""

    # Common, safe defaults for most BMCs:
    driver: "LAN_2_0"      # FreeIPMI LAN 2.0
    privilege: "user"      # or "admin" if your BMC requires it
    collectors:
      - ipmi


Exportarr

Super easy, works with all of them. iirc I enabled backfill for prowlarr's exportarr so metrics wouldn't wink out for indexers or something, idfk. I also run with additional metrics enabled that I'm not using.

--backfill true --port 9714 --url "http://192.168.5.4:9696" --api-key "" --enable-additional-metrics true