Essential Techniques for Monitoring Systemd Services in Real-Time

allway2 · Published 2026-02-14 13:21:54

In modern Linux systems, **systemd** has emerged as the de facto init system, managing everything from service startup and shutdown to process supervision, log aggregation, and resource control. For system administrators, DevOps engineers, and developers, ensuring services run reliably is critical—and real-time monitoring is the cornerstone of that reliability. Whether you’re troubleshooting a failing service, optimizing resource usage, or proactively preventing downtime, monitoring systemd services in real time empowers you to act swiftly. This blog explores the most essential techniques for real-time systemd service monitoring, from built-in tools like `systemctl` and `journalctl` to advanced third-party integrations and custom alerting workflows. By the end, you’ll have a toolkit to track service health, logs, resource consumption, and dependencies—all in real time.

Understanding Systemd: A Quick Primer

Before diving into monitoring, let’s recap how systemd works. Systemd uses units (e.g., .service, .socket, .target) to manage resources. A .service unit defines how a service starts, runs, and stops. Systemd also maintains a centralized log system (journald), cgroups for resource isolation, and tools like systemctl (for unit management) and journalctl (for log querying).

Real-time monitoring of systemd services involves tracking:

Service state (active, failed, inactive).
Logs generated by the service.
Resource usage (CPU, memory, I/O).
Dependencies between services.
Automatic recovery and failure events.

1. Real-Time Status Checks with `systemctl`

systemctl is the primary command-line tool for interacting with systemd. It provides real-time insights into service states, enabling you to quickly identify issues like failed or inactive services.

Key Commands for Real-Time Status

Check Service State

To get the current status of a specific service (e.g., nginx):

systemctl status nginx

Output Explanation:

Active: active (running): Service is healthy.
Active: failed (Result: exit-code): Service crashed or failed to start.
Active: inactive (dead): Service is stopped.
Includes recent log entries, PID, and memory usage.
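For scripts, the same state information can be read in a machine-friendly form. The sketch below assumes the key=value output of systemctl show with the ActiveState and SubState properties; summarize_state is a hypothetical helper name, not a systemd command.

```shell
# Sketch: condense `systemctl show` key=value output into one status line.
# summarize_state is a hypothetical helper, not part of systemd.
summarize_state() {
  # Reads lines such as "ActiveState=active" and "SubState=running" on stdin.
  awk -F= '
    $1 == "ActiveState" { active = $2 }
    $1 == "SubState"    { sub_state = $2 }
    END {
      if (active == "") { print "unknown"; exit }
      printf "%s (%s)\n", active, sub_state
    }'
}
```

One possible invocation: systemctl show nginx --property=ActiveState --property=SubState | summarize_state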

Monitor Service State Continuously

To track a service’s status in real time (refreshing every 1 second), combine systemctl with watch:

watch -n 1 systemctl status nginx

The watch command repeats the systemctl status command every n seconds (here, 1), ideal for observing transient issues (e.g., a service that crashes intermittently).
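watch redraws the whole screen, so a brief failure can scroll past unnoticed. A minimal alternative, sketched here under the assumption that one state per line is fed in from a polling loop, prints a line only when the state changes. report_transitions is a hypothetical helper name.

```shell
# Sketch: print a line only when the polled state changes.
# report_transitions is a hypothetical helper, not part of systemd.
report_transitions() {
  # Reads one state per line on stdin (e.g. the output of `systemctl is-active`).
  prev=""
  while IFS= read -r state; do
    if [ "$state" != "$prev" ]; then
      if [ -n "$prev" ]; then
        echo "state changed: $prev -> $state"
      else
        echo "initial state: $state"
      fi
      prev=$state
    fi
  done
}
```

It could be driven by a loop such as: while sleep 1; do systemctl is-active nginx; done | report_transitions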

List All Active Services

To list all currently active services (a point-in-time snapshot; wrap the command in watch to refresh it):

systemctl list-units --type=service --state=active

Add --no-pager to avoid pagination:

systemctl list-units --type=service --state=active --no-pager

Filter by State

Focus on failed services to triage critical issues:

systemctl list-units --type=service --state=failed
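To feed the failed list into a script, the unit names can be extracted from the table. This sketch assumes the column order UNIT, LOAD, ACTIVE, SUB, DESCRIPTION and input produced with the --no-legend and --plain options; failed_unit_names is a hypothetical helper name.

```shell
# Sketch: extract unit names from `systemctl list-units` table output.
# Assumes columns UNIT LOAD ACTIVE SUB DESCRIPTION (no legend, no bullet).
# failed_unit_names is a hypothetical helper, not part of systemd.
failed_unit_names() {
  # Print the first column of every row whose ACTIVE column is "failed".
  awk 'NF >= 4 && $3 == "failed" { print $1 }'
}
```

A possible invocation: systemctl list-units --type=service --state=failed --no-legend --plain | failed_unit_names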

Pro Tip

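Use systemctl is-active <service> for scriptable checks. It prints the unit state (active, inactive, failed, or a transitional state such as activating) and exits with status 0 only when the unit is active; add --quiet to suppress the output and rely on the exit status: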

if systemctl is-active --quiet nginx; then  
  echo "Nginx is running"  
else  
  echo "Nginx is down!"  
fi

2. Live Log Monitoring with `journalctl`

Systemd’s journald collects logs from services, kernel, and system processes in a structured, binary format. journalctl queries these logs, and its -f (follow) flag enables real-time log monitoring—critical for debugging live issues.

Key Commands for Real-Time Logs

Follow a Service’s Logs

To stream logs from a specific service (e.g., nginx) in real time:

journalctl -u nginx -f

-u nginx: Filter logs by the nginx.service unit.
-f: “Follow” new log entries (like tail -f).

Filter by Priority

Focus on errors or critical messages to avoid noise:

journalctl -u nginx -f -p err

Priority levels (from highest to lowest severity); -p err shows messages at err and every more severe level:
emerg (0), alert (1), crit (2), err (3), warning (4), notice (5), info (6), debug (7).
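Journal entries carry the priority as a number in the PRIORITY field, so scripts that read structured output often need to map it back to a name. A small sketch of that mapping, using the levels listed above; priority_name is a hypothetical helper name.

```shell
# Sketch: map a numeric journal priority (0-7) to its syslog-style name.
# priority_name is a hypothetical helper, not part of systemd.
priority_name() {
  case "$1" in
    0) echo emerg ;;
    1) echo alert ;;
    2) echo crit ;;
    3) echo err ;;
    4) echo warning ;;
    5) echo notice ;;
    6) echo info ;;
    7) echo debug ;;
    *) echo unknown; return 1 ;;
  esac
}
```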

Combine Filters

Narrow logs by time, priority, and service:

journalctl -u nginx -f -p err --since "10 minutes ago"

This shows errors from nginx in the last 10 minutes, continuing to stream new entries.

Structured Log Output

For parsing logs programmatically (e.g., in scripts or monitoring tools), use JSON format:

journalctl -u nginx -f -o json

Or human-readable JSON (json-pretty):

journalctl -u nginx -f -o json-pretty
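The JSON stream can be filtered with a JSON-aware tool instead of grep. The sketch below assumes jq is installed (it is not part of systemd) and that each entry carries the PRIORITY, _SYSTEMD_UNIT, and MESSAGE fields; errors_only is a hypothetical helper name.

```shell
# Sketch: keep only entries with priority err (3) or more severe.
# Requires jq, which is an assumption here and not part of systemd.
# errors_only is a hypothetical helper name.
errors_only() {
  # --unbuffered makes jq flush after each entry, which matters when following.
  jq --unbuffered -r '
    select(.PRIORITY != null and (.PRIORITY | tonumber) <= 3)
    | "\(.PRIORITY) \(._SYSTEMD_UNIT // "-") \(.MESSAGE)"'
}
```

A possible invocation: journalctl -u nginx -f -o json | errors_only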

Pro Tip

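Use --no-hostname to exclude the hostname from logs (cleaner output on single-machine setups). The example below spells out --output=short, which is the default syslog-style format; --output=short-iso and --output=short-precise give ISO 8601 and microsecond timestamps instead: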

journalctl -u nginx -f --no-hostname --output=short

3. Resource Usage Tracking with `systemd-cgtop`

Systemd uses cgroups (control groups) to manage resource allocation for services. systemd-cgtop provides real-time metrics for the task count, CPU, memory, and disk I/O of systemd-managed cgroups, making it easy to identify resource-hungry services. Figures appear only for cgroups that have the corresponding resource accounting enabled.

How to Use `systemd-cgtop`

Run the command without arguments to launch an interactive, top-like interface:

systemd-cgtop

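Key Metrics Displayed:

Tasks: Number of tasks (processes) in the cgroup.
%CPU: Percentage of CPU time used by the cgroup.
Memory: Memory used by the cgroup, shown as an absolute size.
Input/s and Output/s: Disk read and write throughput.

Navigation Tips

Press q to quit.
Press c to sort by CPU usage, m for memory, i for I/O.
Limit the view to one cgroup by passing its path as an argument, e.g., systemd-cgtop /system.slice/nginx.service.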

Example Output

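The values below are illustrative; a dash means no accounting data is available for that column.

Control Group                Tasks   %CPU   Memory  Input/s Output/s
/                              212    1.2     1.4G        -        -
/user.slice                     58    0.5   512.0M        -        -
/system.slice                  104    0.7   890.0M        -        -
/system.slice/nginx.service      5    0.3    24.0M        -        -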
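For a single service, the memory counter behind this view can also be read non-interactively. The sketch assumes the MemoryCurrent property printed by systemctl show, which reports bytes when memory accounting is active; memory_mib is a hypothetical helper name.

```shell
# Sketch: convert a "MemoryCurrent=<bytes>" line into MiB.
# memory_mib is a hypothetical helper, not part of systemd.
memory_mib() {
  # Prints MiB with one decimal, or "n/a" when the value is not a plain
  # number (for example "[not set]" when accounting is unavailable).
  awk -F= '
    $1 == "MemoryCurrent" {
      if ($2 ~ /^[0-9]+$/) printf "%.1f\n", $2 / 1048576
      else print "n/a"
    }'
}
```

A possible invocation: systemctl show nginx.service --property=MemoryCurrent | memory_mib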

4. Third-Party Tools: Prometheus + Node Exporter

For scalable, long-term monitoring, combine systemd’s built-in tools with third-party solutions like Prometheus (time-series database) and Node Exporter (metrics collector). Node Exporter exposes systemd metrics (e.g., service state, restart count) for Prometheus to scrape.

Step 1: Install Node Exporter

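Upstream Node Exporter ships a systemd collector, but it is disabled by default; enable it by starting the exporter with --collector.systemd (some distribution packages may already set this flag). Install Node Exporter via your package manager or from source: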

# On Ubuntu/Debian  
sudo apt install prometheus-node-exporter  

# Start and enable the service  
sudo systemctl enable --now prometheus-node-exporter

Step 2: Verify Systemd Metrics

Node Exporter exposes metrics at http://localhost:9100/metrics. Check for systemd-specific metrics:

curl http://localhost:9100/metrics | grep node_systemd_unit_state

Example output:

node_systemd_unit_state{name="nginx.service",state="active"} 1  
node_systemd_unit_state{name="nginx.service",state="failed"} 0  
node_systemd_unit_state{name="nginx.service",state="inactive"} 0
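Before Prometheus is wired up, the metrics page itself can be checked for failed units. This sketch assumes the sample format shown above, where each state of a unit is a separate line with value 1 or 0; failed_units_from_metrics is a hypothetical helper name.

```shell
# Sketch: list units whose "failed" state sample is 1 in Node Exporter output.
# failed_units_from_metrics is a hypothetical helper, not part of Node Exporter.
failed_units_from_metrics() {
  # Keep node_systemd_unit_state samples labelled state="failed" with value 1,
  # then print the content of the name label.
  grep '^node_systemd_unit_state{' |
    grep 'state="failed"' |
    awk '$NF == 1' |
    sed -n 's/.*name="\([^"]*\)".*/\1/p'
}
```

A possible invocation: curl -s http://localhost:9100/metrics | failed_units_from_metrics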

Step 3: Configure Prometheus to Scrape Metrics

Add Node Exporter as a target in prometheus.yml:

scrape_configs:  
  - job_name: 'node'  
    static_configs:  
      - targets: ['localhost:9100']

Restart Prometheus and navigate to http://localhost:9090/graph to query metrics. For example:

node_systemd_unit_state{name="nginx.service", state="active"}

Step 4: Visualize with Grafana

For dashboards, connect Prometheus to Grafana and import a Node Exporter dashboard (e.g., Node Exporter Full, Dashboard ID 1860).

5. Custom Scripts for Real-Time Alerts

For simple, service-specific monitoring, write custom scripts using systemctl and journalctl, then trigger alerts (e.g., email, Slack) when issues arise.

Example: Monitor Service and Send Alerts

This script checks if nginx is active and sends a Slack alert if it fails:

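#!/bin/bash

SERVICE="nginx"
SLACK_WEBHOOK="https://hooks.slack.com/services/YOUR_WEBHOOK"

if ! systemctl is-active --quiet "$SERVICE"; then
  # Get the last 10 log lines for context
  LOGS=$(journalctl -u "$SERVICE" --no-pager -n 10)
  TEXT=$(printf 'ALERT: %s is down!\nLogs:\n%s' "$SERVICE" "$LOGS")
  # Build the JSON payload with jq (must be installed) so that quotes and
  # newlines in the logs are escaped instead of breaking the JSON
  PAYLOAD=$(jq -n --arg text "$TEXT" '{text: $text}')
  # Send Slack alert
  curl -sS -X POST -H "Content-Type: application/json" \
    -d "$PAYLOAD" \
    "$SLACK_WEBHOOK"
fi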
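Run on a schedule, the check above would send a new alert on every run for as long as the service stays down. One way to avoid that, sketched here under the assumption that a small state file is acceptable, is to alert only when the service has just left the active state. should_alert and the state file path are hypothetical.

```shell
# Sketch: suppress repeated alerts by remembering the previous state in a file.
# should_alert and the state file location are hypothetical.
should_alert() {
  # $1: path to a state file, $2: current state (e.g. output of is-active).
  # Returns 0 only when the state is not "active" and differs from last run.
  state_file=$1
  current=$2
  previous=""
  [ -f "$state_file" ] && previous=$(cat "$state_file")
  printf '%s\n' "$current" > "$state_file"
  [ "$current" != "active" ] && [ "$current" != "$previous" ]
}
```

In the script, the alert branch could then be guarded with something like: STATE=$(systemctl is-active nginx); if should_alert /run/nginx-monitor.state "$STATE"; then ... fi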

Automate with Systemd Timers

To run the script every minute, create a systemd timer:

Create a service file (/etc/systemd/system/nginx-monitor.service):

[Unit]  
Description=Monitor Nginx and send alerts  

[Service]  
Type=oneshot  
ExecStart=/path/to/your/script.sh

Create a timer file (/etc/systemd/system/nginx-monitor.timer):

[Unit]  
Description=Run Nginx monitor every minute  

[Timer]  
OnCalendar=*:0/1  
Persistent=true  

[Install]  
WantedBy=timers.target

Enable and start the timer:

sudo systemctl enable --now nginx-monitor.timer

6. Advanced Monitoring: Dependencies and Socket Activation

Monitor Dependencies

Use systemctl list-dependencies to visualize a service’s dependencies:

systemctl list-dependencies nginx.service

Output example:

nginx.service  
├─system.slice  
└─basic.target  
  ├─-.mount  
  ├─paths.target  
  ├─slices.target  
  ...

If a dependency (e.g., network.target) fails, nginx may not start—this command helps identify such issues.
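To check every dependency rather than reading the tree by eye, the unit names can be pulled out and tested one by one. The sketch assumes the indented output of systemctl list-dependencies --plain, where the unit name is the last field on each line; dep_names is a hypothetical helper name.

```shell
# Sketch: extract unique unit names from `systemctl list-dependencies --plain`.
# dep_names is a hypothetical helper, not part of systemd.
dep_names() {
  # The unit name is the last whitespace-separated field of each non-empty
  # line; units that appear several times in the tree are printed once.
  awk 'NF && !seen[$NF]++ { print $NF }'
}
```

A possible use that prints only the dependencies that are not active: systemctl list-dependencies --plain nginx.service | dep_names | while read -r u; do systemctl is-active --quiet "$u" || echo "$u"; done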

Monitor Socket Activation

Services using socket activation (e.g., sshd.socket) start only when a connection is received. List active sockets with:

systemctl list-sockets

Check if a socket is listening and the service is inactive:

systemctl status sshd.socket

7. Setting Up Real-Time Alerts with Systemd

Systemd has built-in alerting via OnFailure in service units. When a service fails, OnFailure triggers another unit (e.g., an alert service).

Example: Alert on Service Failure

Create an alert service (/etc/systemd/system/alert-on-failure@.service):

[Unit]  
Description=Send alert when %i fails  

[Service]  
Type=oneshot  
# Script to send email/Slack (unit files do not support trailing comments,
# so the comment must sit on its own line)
ExecStart=/path/to/send-alert.sh %i

Modify the target service (e.g., nginx.service) to use OnFailure:

[Unit]  
Description=Nginx HTTP Server  
OnFailure=alert-on-failure@nginx.service  

[Service]  
ExecStart=/usr/sbin/nginx  
...

Reload systemd and restart the service:

sudo systemctl daemon-reload  
sudo systemctl restart nginx

Now, if nginx fails, alert-on-failure@nginx.service runs, triggering your alert script.
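The alert script itself is not shown in this post. As a rough sketch of what it might build its message from, the helper below assumes the Result and ExecMainStatus properties printed by systemctl show for the failed service; format_alert is a hypothetical name and the delivery step (email or Slack) is left out.

```shell
# Sketch: build a one-line alert message for a failed unit.
# format_alert is a hypothetical helper; delivery is not covered here.
format_alert() {
  # $1: instance name passed by the template unit as %i (e.g. "nginx").
  # Reads "Result=..." and "ExecMainStatus=..." lines on stdin.
  unit=$1
  awk -F= -v unit="$unit" '
    $1 == "Result"         { result = $2 }
    $1 == "ExecMainStatus" { status = $2 }
    END {
      printf "ALERT: %s failed (result=%s, exit status=%s)\n", unit, result, status
    }'
}
```

Inside the script it might be called as: systemctl show "$1.service" --property=Result --property=ExecMainStatus | format_alert "$1"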

Troubleshooting Common Monitoring Issues

Logs Not Showing Up: Ensure journald is running (systemctl status systemd-journald). If logs disappear after a reboot, check the Storage= setting in /etc/systemd/journald.conf: with Storage=auto, logs are persisted only if /var/log/journal exists; otherwise they are kept in /run/log/journal and lost on reboot.
High Resource Usage: Use systemd-cgtop to identify resource-heavy cgroups. If a service's memory keeps growing, follow its logs with journalctl -u <service> -f and look for repeated errors that point to a leak.
Alerts Not Triggering: Verify the unit name in OnFailure= and the alert script's path and execute permissions. Test alerts manually with systemctl start alert-on-failure@nginx.service.

Conclusion

Real-time monitoring of systemd services is critical for maintaining system reliability. By leveraging built-in tools like systemctl, journalctl, and systemd-cgtop, combined with third-party solutions like Prometheus and custom scripts, you can proactively detect issues, debug failures, and ensure services run smoothly.

Remember: The best monitoring strategy combines real-time visibility (via logs and status checks), resource tracking (via cgroups), and automated alerts (via systemd timers or OnFailure). With these techniques, you’ll minimize downtime and streamline troubleshooting.
