In modern Linux systems, **systemd** has emerged as the de facto init system, managing everything from service startup and shutdown to process supervision, log aggregation, and resource control. For system administrators, DevOps engineers, and developers, ensuring services run reliably is critical—and real-time monitoring is the cornerstone of that reliability. Whether you’re troubleshooting a failing service, optimizing resource usage, or proactively preventing downtime, monitoring systemd services in real time empowers you to act swiftly. This blog explores the most essential techniques for real-time systemd service monitoring, from built-in tools like `systemctl` and `journalctl` to advanced third-party integrations and custom alerting workflows. By the end, you’ll have a toolkit to track service health, logs, resource consumption, and dependencies—all in real time.

Table of Contents

  • Understanding Systemd: A Quick Primer
  • 1. Real-Time Status Checks with systemctl
  • 2. Live Log Monitoring with journalctl
  • 3. Resource Usage Tracking with systemd-cgtop
  • 4. Third-Party Tools: Prometheus + Node Exporter
  • 5. Custom Scripts for Real-Time Alerts
  • 6. Advanced Monitoring: Dependencies and Socket Activation
  • 7. Setting Up Real-Time Alerts with Systemd
  • Troubleshooting Common Monitoring Issues
  • Conclusion

Understanding Systemd: A Quick Primer

Before diving into monitoring, let’s recap how systemd works. Systemd manages resources as units (e.g., .service, .socket, and .target units). A .service unit defines how a service starts, runs, and stops. Systemd also provides a centralized logging service (journald), uses cgroups for resource isolation, and ships tools like systemctl (for unit management) and journalctl (for log querying).

Real-time monitoring of systemd services involves tracking:

  • Service state (active, failed, inactive).
  • Logs generated by the service.
  • Resource usage (CPU, memory, I/O).
  • Dependencies between services.
  • Automatic recovery and failure events.

1. Real-Time Status Checks with systemctl

systemctl is the primary command-line tool for interacting with systemd. It provides real-time insights into service states, enabling you to quickly identify issues like failed or inactive services.

Key Commands for Real-Time Status

Check Service State

To get the current status of a specific service (e.g., nginx):

systemctl status nginx  

Output Explanation:

  • Active: active (running): Service is healthy.
  • Active: failed (Result: exit-code): Service crashed or failed to start.
  • Active: inactive (dead): Service is stopped.
  • The output also shows the main PID, memory usage, and the most recent log entries.
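systemctl status is formatted for humans. For scripts, systemctl show prints the same unit properties as key=value pairs. The sketch below (summarize_unit is a hypothetical helper name, assuming bash) condenses ActiveState, SubState, and NRestarts, the restart counter reported for services by recent systemd versions, into a single line. It reads stdin, so it can be fed from systemctl show or from saved output.

```shell
#!/bin/bash
# Hypothetical helper: condense `systemctl show` key=value output into one line.
summarize_unit() {
  local unit=$1 key value active="" sub="" restarts=""
  # Split each line at the first "="; the value keeps any further "=" characters.
  while IFS='=' read -r key value; do
    case $key in
      ActiveState) active=$value ;;
      SubState)    sub=$value ;;
      NRestarts)   restarts=$value ;;
    esac
  done
  # NRestarts is only reported for services; default to 0 when it is absent.
  printf '%s: %s (%s), restarts=%s\n' "$unit" "$active" "$sub" "${restarts:-0}"
}

# Usage:
#   systemctl show -p ActiveState,SubState,NRestarts nginx | summarize_unit nginx
```

For a healthy service this prints a line of the form "nginx: active (running), restarts=0", which is easy to log or compare in a loop.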
Monitor Service State Continuously

To track a service’s status in real time (refreshing every 1 second), combine systemctl with watch:

watch -n 1 systemctl status nginx  

The watch command repeats the systemctl status command every n seconds (here, 1), ideal for observing transient issues (e.g., a service that crashes intermittently).
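watch redraws the whole status screen every second, so a brief crash-and-restart is easy to miss. A minimal alternative, assuming bash (print_transitions is a hypothetical helper name), polls systemctl is-active and prints a timestamped line only when the state changes:

```shell
#!/bin/bash
# Hypothetical helper: read one state per line on stdin and print a
# timestamped line only when it differs from the previous state.
print_transitions() {
  local prev="" state
  while IFS= read -r state; do
    if [ "$state" != "$prev" ]; then
      printf '%s %s\n' "$(date '+%H:%M:%S')" "$state"
      prev=$state
    fi
  done
}

# Usage: poll once per second and log only the transitions.
#   while true; do systemctl is-active nginx; sleep 1; done | print_transitions
```

Because only changes are printed, the output doubles as a compact history of state flips (for example active, then failed, then active again).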

List All Active Services

To list all currently active services (a one-time snapshot; combine it with watch for continuous updates):

systemctl list-units --type=service --state=active  

Add --no-pager to avoid pagination:

systemctl list-units --type=service --state=active --no-pager  
Filter by State

Focus on failed services to triage critical issues:

systemctl list-units --type=service --state=failed  
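For a cron job or CI health check, it helps to turn that listing into bare unit names. A minimal sketch, assuming bash and awk (failed_service_names is a hypothetical helper name), picks the first field ending in .service on each line, which also skips the "●" marker that some systemd versions print before failed units:

```shell
#!/bin/bash
# Hypothetical helper: print failed unit names, one per line, from
# `systemctl list-units` output on stdin.
failed_service_names() {
  awk '{ for (i = 1; i <= NF; i++) if ($i ~ /\.service$/) { print $i; break } }'
}

# Usage: exit non-zero when anything has failed.
#   failed=$(systemctl list-units --type=service --state=failed --no-legend --plain | failed_service_names)
#   [ -z "$failed" ] || { echo "Failed services: $failed"; exit 1; }
```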

Pro Tip

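Use systemctl is-active <service> for scriptable checks: it prints the unit's state (e.g., active, inactive, or failed) and exits with status 0 only if the unit is active. With --quiet it prints nothing, so the check relies on the exit status alone: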

if systemctl is-active --quiet nginx; then  
  echo "Nginx is running"  
else  
  echo "Nginx is down!"  
fi  

2. Live Log Monitoring with journalctl

Systemd’s journald collects logs from services, kernel, and system processes in a structured, binary format. journalctl queries these logs, and its -f (follow) flag enables real-time log monitoring—critical for debugging live issues.

Key Commands for Real-Time Logs

Follow a Service’s Logs

To stream logs from a specific service (e.g., nginx) in real time:

journalctl -u nginx -f  
  • -u nginx: Filter logs by the nginx.service unit.
  • -f: “Follow” new log entries (like tail -f).
Filter by Priority

Focus on errors or critical messages to avoid noise:

journalctl -u nginx -f -p err  

-p err shows messages at priority err and every more severe level. Priority levels, from highest to lowest severity:
emerg (0), alert (1), crit (2), err (3), warning (4), notice (5), info (6), debug (7).

Combine Filters

Narrow logs by time, priority, and service:

journalctl -u nginx -f -p err --since "10 minutes ago"  

This shows errors from nginx in the last 10 minutes, continuing to stream new entries.
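A followed stream can also drive simple reactions. A minimal sketch, assuming bash (alert_on_pattern is a hypothetical helper name, and the nginx message in the usage line is only an example pattern), reads log lines on stdin and flags those matching an extended regular expression. journalctl's -o cat output mode prints just the message text, which keeps the matching simple.

```shell
#!/bin/bash
# Hypothetical helper: print an alert for every stdin line that matches an
# extended regular expression; all other lines are ignored.
alert_on_pattern() {
  local pattern=$1 line
  while IFS= read -r line; do
    if [[ $line =~ $pattern ]]; then
      printf 'ALERT: %s\n' "$line"
    fi
  done
}

# Usage:
#   journalctl -u nginx -f -o cat | alert_on_pattern 'upstream timed out'
```

In practice you would replace the printf with a call to your notification command, ideally with some rate limiting so a burst of errors does not produce a burst of alerts.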

Structured Log Output

For parsing logs programmatically (e.g., in scripts or monitoring tools), use JSON format:

journalctl -u nginx -f -o json  

Or human-readable JSON (json-pretty):

journalctl -u nginx -f -o json-pretty  

Pro Tip

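Use --no-hostname to omit the hostname from each line (cleaner output on single-machine setups). --output=short is journalctl's default, classic syslog-style format; switch to short-iso or short-precise if you need ISO 8601 or microsecond timestamps: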

journalctl -u nginx -f --no-hostname --output=short  

3. Resource Usage Tracking with systemd-cgtop

Systemd uses cgroups (control groups) to manage resource allocation for services. systemd-cgtop shows real-time CPU, memory, task, and disk I/O usage for systemd-managed cgroups, making it easy to identify resource-hungry services.

How to Use systemd-cgtop

Run the command without arguments to launch an interactive, top-like interface:

systemd-cgtop  

Key Metrics Displayed:

  • Tasks: Number of tasks (processes and threads) in the cgroup.
  • %CPU: CPU usage of the cgroup.
  • Memory: Memory used by the cgroup.
  • Input/s, Output/s: Disk I/O read and write rates.
  • Press q to quit.
  • Press c to sort by CPU usage, m by memory, i by I/O, t by task count, or p by path.
  • To focus on one service, pass its control group as an argument, e.g., systemd-cgtop system.slice/nginx.service.

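Example Output (illustrative; the exact columns vary by systemd version, and recent versions also show a Tasks column and report memory as an absolute size rather than a percentage)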

Control Group               CPU%   Mem%   Input/s Output/s  
/                           1.2    34.5   0B      0B  
/user.slice                 0.5    12.3   0B      0B  
/system.slice               0.7    22.2   0B      0B  
/system.slice/nginx.service 0.3    5.1    0B      0B  
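systemd-cgtop is interactive. For scripting, systemctl show can report a unit's cumulative CPU time as the CPUUsageNSec property (in nanoseconds, populated only when CPU accounting is enabled for the unit). Sampling it twice and dividing the difference by the elapsed time gives the average CPU usage over that window. A minimal sketch with hypothetical helper names, assuming bash:

```shell
#!/bin/bash
# Hypothetical helpers for scripted CPU sampling via `systemctl show`.

# Read a unit's cumulative CPU time in nanoseconds. If CPU accounting is
# disabled, systemd prints "[not set]" instead of a number.
sample_cpu_ns() {
  systemctl show -p CPUUsageNSec --value "$1"
}

# Average CPU usage between two samples taken INTERVAL seconds apart, as a
# percentage of one CPU (integer, rounded down; can exceed 100 on multi-core):
#   percent = (after - before) * 100 / (interval * 10^9)
cpu_percent() {
  local before=$1 after=$2 interval=$3
  echo $(( (after - before) * 100 / (interval * 1000000000) ))
}

# Usage:
#   a=$(sample_cpu_ns nginx); sleep 5; b=$(sample_cpu_ns nginx)
#   echo "nginx CPU: $(cpu_percent "$a" "$b" 5)%"
```

For example, 2.5 seconds of CPU time consumed over a 5-second window works out to 50%.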

4. Third-Party Tools: Prometheus + Node Exporter

For scalable, long-term monitoring, combine systemd’s built-in tools with third-party solutions like Prometheus (time-series database) and Node Exporter (metrics collector). Node Exporter exposes systemd metrics (e.g., service state, restart count) for Prometheus to scrape.

Step 1: Install Node Exporter

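Node Exporter ships a systemd collector, but upstream it is disabled by default and must be turned on with the --collector.systemd flag (some distribution packages may already set it; on Debian/Ubuntu, extra flags go in the ARGS= line of /etc/default/prometheus-node-exporter). Install Node Exporter via your package manager or from source: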

# On Ubuntu/Debian  
sudo apt install prometheus-node-exporter  

# Start and enable the service  
sudo systemctl enable --now prometheus-node-exporter  

Step 2: Verify Systemd Metrics

Node Exporter exposes metrics at http://localhost:9100/metrics. Check for systemd-specific metrics:

curl http://localhost:9100/metrics | grep node_systemd_unit_state  

Example output:

node_systemd_unit_state{name="nginx.service",state="active"} 1  
node_systemd_unit_state{name="nginx.service",state="failed"} 0  
node_systemd_unit_state{name="nginx.service",state="inactive"} 0  
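Each unit gets one series per possible state, and only the current state has the value 1. A minimal sketch, assuming the text exposition format shown above (current_state is a hypothetical helper name), picks out that state for one unit, which is handy for quick checks from a shell:

```shell
#!/bin/bash
# Hypothetical helper: from Prometheus text-format metrics on stdin, print the
# state label of the node_systemd_unit_state series with value 1 for a unit.
current_state() {
  local unit=$1
  awk -v unit="$unit" '
    index($0, "node_systemd_unit_state{") == 1 &&
    index($0, "name=\"" unit "\"") > 0 && $NF == 1 {
      # Extract the value inside state="..."
      if (match($0, /state="[^"]*"/)) print substr($0, RSTART + 7, RLENGTH - 8)
    }'
}

# Usage:
#   curl -s http://localhost:9100/metrics | current_state nginx.service
```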

Step 3: Configure Prometheus to Scrape Metrics

Add Node Exporter as a target in prometheus.yml:

scrape_configs:  
  - job_name: 'node'  
    static_configs:  
      - targets: ['localhost:9100']  

Restart Prometheus and navigate to http://localhost:9090/graph to query metrics. For example:

node_systemd_unit_state{name="nginx.service", state="active"}  

Step 4: Visualize with Grafana

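For dashboards, add Prometheus as a data source in Grafana and import a community dashboard, for example the general-purpose Node Exporter Full dashboard (ID 1860), or build your own panels on node_systemd_unit_state.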

5. Custom Scripts for Real-Time Alerts

For simple, service-specific monitoring, write custom scripts using systemctl and journalctl, then trigger alerts (e.g., email, Slack) when issues arise.

Example: Monitor Service and Send Alerts

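This script checks whether nginx is active and, if it is not, sends a Slack alert with the last 10 log lines. It builds the JSON payload with jq so that quotes and newlines in the logs are escaped correctly:

#!/bin/bash

SERVICE="nginx"
SLACK_WEBHOOK="https://hooks.slack.com/services/YOUR_WEBHOOK"

if ! systemctl is-active --quiet "$SERVICE"; then
  # Get the last 10 log lines for context
  LOGS=$(journalctl -u "$SERVICE" --no-pager -n 10)
  # Build the Slack payload with jq (requires the jq package); --arg escapes the text safely
  PAYLOAD=$(jq -n --arg text "ALERT: $SERVICE is down!"$'\n'"Logs:"$'\n'"$LOGS" '{text: $text}')
  # Send the Slack alert
  curl -sS -X POST -H "Content-Type: application/json" -d "$PAYLOAD" "$SLACK_WEBHOOK"
fi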

Automate with Systemd Timers

To run the script every minute, create a systemd timer:

  1. Create a service file (/etc/systemd/system/nginx-monitor.service):
[Unit]  
Description=Monitor Nginx and send alerts  

[Service]  
Type=oneshot  
ExecStart=/path/to/your/script.sh  
  2. Create a timer file (/etc/systemd/system/nginx-monitor.timer):
[Unit]
Description=Run Nginx monitor every minute

[Timer]
OnCalendar=*:0/1
Persistent=true

[Install]
WantedBy=timers.target
  3. Enable and start the timer:
sudo systemctl enable --now nginx-monitor.timer  
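Because the timer runs the check every minute, the alert script would post a new message every minute for as long as nginx stays down. One way to avoid that, sketched here under the assumption of bash and a hypothetical helper name (state_changed) and state-file path, is to remember the last observed state and alert only on transitions:

```shell
#!/bin/bash
# Hypothetical helper: store the current state in FILE and succeed (exit
# status 0) only when it differs from the state stored on the previous run.
state_changed() {
  local file=$1 current=$2 previous=""
  if [ -f "$file" ]; then
    previous=$(<"$file")
  fi
  printf '%s\n' "$current" > "$file"
  [ "$current" != "$previous" ]
}

# Usage inside the monitor script. The path is a hypothetical choice; adding
# StateDirectory=nginx-monitor to nginx-monitor.service would create
# /var/lib/nginx-monitor for it.
#   STATE=$(systemctl is-active nginx || true)
#   if state_changed /var/lib/nginx-monitor/last-state "$STATE"; then
#     echo "nginx is now $STATE"   # send the Slack alert here instead
#   fi
```

This also produces a "recovered" message when the state returns to active, which is usually welcome.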

6. Advanced Monitoring: Dependencies and Socket Activation

Systemd services often depend on other units (e.g., a database service that depends on a network target). Monitoring these dependencies, as well as socket activation (where services start on demand), helps diagnose cascading failures.

Monitor Dependencies

Use systemctl list-dependencies to visualize a service’s dependencies:

systemctl list-dependencies nginx.service  

Output example:

nginx.service  
├─system.slice  
└─basic.target  
  ├─-.mount  
  ├─paths.target  
  ├─slices.target  
  ...  

If a dependency fails (e.g., a required mount), nginx may not start either; this command helps identify such issues.
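Reading the tree by eye gets tedious for units with many dependencies. A minimal sketch, assuming bash (unhealthy_deps is a hypothetical helper name), uses list-dependencies --plain to get a flat list, asks systemctl is-active for every entry at once (it prints one state per unit, in argument order), and reports only those that are not active. Pass the full unit name, since the unit itself is filtered out of the list by name.

```shell
#!/bin/bash
# Hypothetical helper: list the dependencies of a unit that are not active.
unhealthy_deps() {
  local unit=$1
  local -a deps
  # Flat dependency list; drop the unit itself and strip indentation.
  mapfile -t deps < <(systemctl list-dependencies --plain --no-pager "$unit" |
    awk -v u="$unit" '$1 != u { print $1 }')
  if [ "${#deps[@]}" -eq 0 ]; then
    return 0
  fi
  # Pair each name with its state; "--" stops names such as "-.mount"
  # from being parsed as command-line options.
  paste <(printf '%s\n' "${deps[@]}") <(systemctl is-active -- "${deps[@]}") |
    awk -F'\t' '$2 != "active" { print $1 ": " $2 }'
}

# Usage:
#   unhealthy_deps nginx.service
```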

Monitor Socket Activation

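With socket activation, systemd itself listens on a socket unit (e.g., sshd.socket) and starts the matching service only when a connection arrives. List socket units, their listen addresses, and the units they activate with: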

systemctl list-sockets  

Check whether the socket is listening (its status shows Active: active (listening)) even while the service itself is inactive:

systemctl status sshd.socket  
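The combination of the two states is what matters: an inactive service behind a listening socket is normal, while a failed socket is not. A minimal sketch, assuming bash, a hypothetical helper name (classify_socket_pair), and an Accept=no socket that starts a single service (with Accept=yes, per-connection instances run instead and the plain service stays inactive):

```shell
#!/bin/bash
# Hypothetical helper: interpret the ActiveState of a socket unit together
# with the ActiveState of the service it activates.
classify_socket_pair() {
  local sock=$1 svc=$2
  case "$sock/$svc" in
    active/active)   echo "serving: socket listening, service running" ;;
    active/inactive) echo "idle: socket listening, service waits for a connection" ;;
    failed/*)        echo "broken: socket failed, no connections are accepted" ;;
    *)               echo "check manually: socket=$sock service=$svc" ;;
  esac
}

# Usage:
#   classify_socket_pair "$(systemctl is-active sshd.socket)" "$(systemctl is-active sshd.service)"
```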

7. Setting Up Real-Time Alerts with Systemd

Systemd has a built-in hook for alerting: the OnFailure= directive in a unit's [Unit] section. When the unit enters the failed state, systemd starts the units listed in OnFailure= (e.g., an alert service).

Example: Alert on Service Failure

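  1. Create an alert service (/etc/systemd/system/alert-on-failure@.service):
[Unit]
Description=Send alert when %i fails

[Service]
Type=oneshot
# Script that sends the email/Slack alert. Keep comments on their own line:
# systemd treats "#" as a comment only at the start of a line.
ExecStart=/path/to/send-alert.sh %i
  2. Modify the target service (e.g., nginx.service) to use OnFailure. Instead of editing the packaged unit file, you can add the OnFailure= line under [Unit] in a drop-in created with sudo systemctl edit nginx:
[Unit]
Description=Nginx HTTP Server
OnFailure=alert-on-failure@nginx.service

[Service]
ExecStart=/usr/sbin/nginx
...
  3. Reload systemd and restart the service:
sudo systemctl daemon-reload
sudo systemctl restart nginx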

Now, if nginx fails, alert-on-failure@nginx.service runs, triggering your alert script.
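The alert script itself is left open above. One possible shape for /path/to/send-alert.sh, sketched under stated assumptions: it receives the instance name (%i, here nginx) as $1, records the alert in the journal with logger, and forwards it to a plain-text webhook only if ALERT_WEBHOOK, a hypothetical variable you could set with Environment= in the alert unit, is defined. The function names are hypothetical too.

```shell
#!/bin/bash
# Hypothetical sketch of /path/to/send-alert.sh, started by
# alert-on-failure@.service with the failed unit's instance name as $1.

# Compose the alert text: host, unit, and the unit's last few log lines.
build_message() {
  local unit=$1
  printf 'ALERT on %s: %s.service entered the failed state\n' "$(hostname)" "$unit"
  printf 'Last log lines:\n'
  journalctl -u "$unit" --no-pager -n 5 -o cat
}

send_alert() {
  local unit=$1
  # Record the alert in the journal under the tag "unit-alert".
  build_message "$unit" | logger -t unit-alert
  # Forward to a plain-text webhook only if ALERT_WEBHOOK is set.
  if [ -n "${ALERT_WEBHOOK:-}" ]; then
    build_message "$unit" | curl -sS -X POST --data-binary @- "$ALERT_WEBHOOK"
  fi
}

# Do nothing unless a unit name was passed.
if [ "$#" -gt 0 ]; then
  send_alert "$1"
fi
```

Logging through logger means even a broken webhook leaves a trace you can find with journalctl -t unit-alert.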

Troubleshooting Common Monitoring Issues

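  • Logs Not Showing Up: Ensure journald is running (systemctl status systemd-journald). With the default Storage=auto in /etc/systemd/journald.conf, logs are persisted only if /var/log/journal exists; otherwise they are kept in /run/log/journal and lost at reboot. Create /var/log/journal or set Storage=persistent to keep them.
  • High Resource Usage: Use systemd-cgtop to identify resource-heavy cgroups. To investigate a suspected memory leak, track the service's memory over time (e.g., systemctl show -p MemoryCurrent <service>) and follow its logs with journalctl -u <service> -f for repeated errors.
  • Alerts Not Triggering: Verify the OnFailure= unit name, the script path, and that the script is executable. Test alerts manually with systemctl start alert-on-failure@nginx.service and check the result with journalctl -u alert-on-failure@nginx.service.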

Conclusion

Real-time monitoring of systemd services is critical for maintaining system reliability. By leveraging built-in tools like systemctl, journalctl, and systemd-cgtop, combined with third-party solutions like Prometheus and custom scripts, you can proactively detect issues, debug failures, and ensure services run smoothly.

Remember: The best monitoring strategy combines real-time visibility (via logs and status checks), resource tracking (via cgroups), and automated alerts (via systemd timers or OnFailure). With these techniques, you’ll minimize downtime and streamline troubleshooting.
