Previous article:

Spring Cloud Series: SkyWalking Distributed Tracing https://blog.csdn.net/sniper_fandc/article/details/149948321?fromshare=blogdetail&sharetype=blogdetail&sharerId=149948321&sharerefer=PC&sharesource=sniper_fandc&sharefrom=from_link

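Contents

1 Alarm Rules

1.1 Alarm Rule Configuration Items

1.2 Webhook Email Alarms

1.2.1 Adding Dependencies

1.2.2 Adding Configuration

1.2.3 Implementing the Endpoint

1.2.4 Configuring the Webhook

1.2.5 Restarting the Service and SkyWalking

2 Connecting the Webhook to Feishu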


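1 Alarm Rules

1.1 Alarm Rule Configuration Items

        When something abnormal happens, such as an endpoint responding very slowly or timing out, or the request success rate dropping very low, SkyWalking needs to notify developers and operations staff so that they can investigate promptly.

        The alarm-settings.yml file under apache-skywalking-apm-bin\config in the SkyWalking installation directory ships with default alarm rules: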

# Sample alarm rules.

rules:

  # Rule unique name, must be ended with `_rule`.

  service_resp_time_rule:

    # A MQE expression, the result type must be `SINGLE_VALUE` and the root operation of the expression must be a Compare Operation

    # which provides `1`(true) or `0`(false) result. When the result is `1`(true), the alarm will be triggered.

    expression: sum(service_resp_time > 1000) >= 3

    period: 10

    silence-period: 5

    message: Response time of service {name} is more than 1000ms in 3 minutes of last 10 minutes.

#  service_resp_time_rule:

#    expression: avg(service_resp_time) > 1000

#    period: 10

#    silence-period: 5

#    message: Avg response time of service {name} is more than 1000ms in last 10 minutes.

  service_sla_rule:

    expression: sum(service_sla < 8000) >= 2

    # The length of time to evaluate the metrics

    period: 10

    # How many times of checks, the alarm keeps silence after alarm triggered, default as same as period.

    silence-period: 3

    message: Successful rate of service {name} is lower than 80% in 2 minutes of last 10 minutes

  service_resp_time_percentile_rule:

    expression: sum(service_percentile{p='50,75,90,95,99'} > 1000) >= 3

    period: 10

    silence-period: 5

    message: Percentile response time of service {name} alarm in 3 minutes of last 10 minutes, due to more than one condition of p50 > 1000, p75 > 1000, p90 > 1000, p95 > 1000, p99 > 1000

  service_instance_resp_time_rule:

    expression: sum(service_instance_resp_time > 1000) >= 2

    period: 10

    silence-period: 5

    message: Response time of service instance {name} is more than 1000ms in 2 minutes of last 10 minutes

  database_access_resp_time_rule:

    expression: sum(database_access_resp_time > 1000) >= 2

    period: 10

    message: Response time of database access {name} is more than 1000ms in 2 minutes of last 10 minutes

  endpoint_relation_resp_time_rule:

    expression: sum(endpoint_relation_resp_time > 1000) >= 2

    period: 10

    message: Response time of endpoint relation {name} is more than 1000ms in 2 minutes of last 10 minutes

#  Active endpoint related metrics alarm will cost more memory than service and service instance metrics alarm.

#  Because the number of endpoint is much more than service and instance.

#

#  endpoint_resp_time_rule:

#    expression: sum(endpoint_resp_time > 1000) >= 2

#    period: 10

#    silence-period: 5

#    message: Response time of endpoint {name} is more than 1000ms in 2 minutes of last 10 minutes



#hooks:

#  webhook:

#    default:

#      is-default: true

#      urls:

#        - http://127.0.0.1/notify/

#        - http://127.0.0.1/go-wechat/

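        An alarm rule name must end with `_rule`. Within each rule:

        expression is the alarm expression; the alarm triggers when its result is 1 (true);

        period is the evaluation window in minutes; the expression is evaluated over the metrics in this window, and the alarm triggers when it is satisfied;

        silence-period is the silence time in minutes; after an alarm triggers, the same alarm is not triggered again during this time;

        message is the alarm message text.

        For example, service_resp_time_rule triggers an alarm when a service's response time exceeds 1000 ms in at least 3 of the last 10 minutes (the 3 minutes do not need to be consecutive).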
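        To make the semantics of `sum(service_resp_time > 1000) >= 3` with `period: 10` concrete, here is a minimal sketch. It is not SkyWalking's implementation; the class `AlarmWindowSketch` and the method `shouldTrigger` are hypothetical names, and the sample values are made up. It only counts how many per-minute values inside the window exceed the threshold.

```java
import java.util.List;

/** Hypothetical helper that mimics `sum(metric > threshold) >= minMatches` over a window. */
class AlarmWindowSketch {

    /**
     * @param perMinuteValues one metric value per minute, oldest first
     * @param period          number of most recent minutes to evaluate
     * @param threshold       value a minute must exceed to be counted
     * @param minMatches      how many minutes must exceed the threshold
     */
    static boolean shouldTrigger(List<Long> perMinuteValues, int period, long threshold, int minMatches) {
        // Keep only the last `period` minutes.
        int from = Math.max(0, perMinuteValues.size() - period);
        long matches = perMinuteValues.subList(from, perMinuteValues.size()).stream()
                .filter(value -> value > threshold)
                .count();
        return matches >= minMatches;
    }

    public static void main(String[] args) {
        // Three non-consecutive minutes above 1000 ms within the last ten minutes.
        List<Long> lastTenMinutes = List.of(200L, 1500L, 300L, 1200L, 250L, 400L, 1100L, 300L, 200L, 100L);
        System.out.println(shouldTrigger(lastTenMinutes, 10, 1000, 3)); // prints "true"
    }
}
```

        The slow minutes in the sample are spread out, which shows that the rule counts matching minutes rather than requiring a continuous slow stretch.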

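1.2 Webhook Email Alarms

        A webhook is a mechanism that lets an application push events or data to an external system in real time, usually via an HTTP callback, enabling automated information flow across systems. Core characteristics:

        Event-driven: when a preset condition fires (such as an alarm being triggered or data being updated), the sender actively issues an HTTP request (usually a POST) to the target URL.

        Lightweight integration: the receiver only needs to expose a reachable HTTP endpoint to receive the data, with no polling required.

        Flexible and extensible: suitable for alarm notification, workflow triggering, data synchronization, and similar scenarios.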
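        The sender side of a webhook can be sketched with the JDK's built-in HTTP client (Java 11+). This is only an illustration of the "POST JSON to a URL" idea, not what SkyWalking does internally; the class name `WebhookSenderSketch`, the sample payload, and the receiver URL passed on the command line are all assumptions.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.time.Duration;

/** Hypothetical webhook sender: pushes a JSON event to a receiver URL via HTTP POST. */
class WebhookSenderSketch {

    /** Builds the POST request; nothing is sent until a client executes it. */
    static HttpRequest buildRequest(String url, String jsonBody) {
        return HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(5))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: java WebhookSenderSketch <receiver-url>");
            return;
        }
        // Made-up event payload for illustration only.
        String payload = "{\"event\":\"alarm\",\"message\":\"demo\"}";
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(buildRequest(args[0], payload), HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

        The receiver only has to accept this POST on some path, which is why no polling is needed on either side.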

1.2.1 Adding Dependencies

    <dependencies>

        <dependency>

            <groupId>org.springframework.boot</groupId>

            <artifactId>spring-boot-starter-web</artifactId>

        </dependency>

        <dependency>

            <groupId>org.projectlombok</groupId>

            <artifactId>lombok</artifactId>

        </dependency>

        <dependency>

            <groupId>org.springframework.boot</groupId>

            <artifactId>spring-boot-starter-mail</artifactId>

        </dependency>

    </dependencies>

        spring-boot-starter-mail is the dependency that provides mail-sending support.

1.2.2 Adding Configuration

server:

  port: 8084

logging:

  pattern:

    dateformat: HH:mm:ss:SSS

spring:

  mail:

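    # Mail server host

    host: smtp.qq.com

    # Login account

    username: "sender email address"

    # Authorization code

    password: "authorization code"

    # Port

    port: 465

    # Default encoding

    default-encoding: UTF-8

    # Protocol to use

    protocol: smtps

    # Additional properties

    properties:

      # Default properties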

      "mail.smtp.connectiontimeout": 5000

      "mail.smtp.timeout": 3000

      "mail.smtp.writetimeout": 5000

      "mail.smtp.auth": true

      "mail.smtp.starttls.enable": true

      "mail.smtp.starttls.required": true

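      # Custom properties

      "personal": "Alarm System"

      "subject": "Order System Alarm"

        Enable the SMTP service in your QQ Mail settings, and remember to replace spring.mail.username and spring.mail.password in the configuration file with your own email account and authorization code.

        For how to obtain an authorization code, see the help document on the official QQ Mail site: https://service.mail.qq.com/detail/0/75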

1.2.3 Implementing the Endpoint

        Entity class that receives SkyWalking alarm messages:

import java.util.HashSet;

import java.util.List;

import java.util.Set;

import lombok.Data;

@Data

public class AlarmMessage {

    private int scopeId;

    private String scope;

    private String name;

    private String id0;

    private String id1;

    private String ruleName;

    private String alarmMessage;

    private List<Tag> tags;

    private long startTime;

    private transient int period;

    private Set<String> hooks = new HashSet<>();

    private String expression;

    @Data

    public static class Tag {

        private String key;

        private String value;

    }

}

        Mail-sending component:

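import java.util.Optional;

import lombok.extern.slf4j.Slf4j;

import org.springframework.beans.factory.annotation.Autowired;

import org.springframework.boot.autoconfigure.mail.MailProperties;

import org.springframework.context.annotation.Configuration;

import org.springframework.mail.javamail.JavaMailSender;

import org.springframework.mail.javamail.MimeMessageHelper;

// Spring Boot 3.x package; on Spring Boot 2.x use javax.mail.internet.MimeMessage instead

import jakarta.mail.internet.MimeMessage;

@Slf4j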

@Configuration

public class Mail {

    // Holds the spring.mail configuration, bound to a MailProperties object

    @Autowired

    private MailProperties mailProperties;

    @Autowired

    private JavaMailSender javaMailSender;

    public void send(String to, String content) {

        try {

            // Create a mail message

            MimeMessage message = javaMailSender.createMimeMessage();

            // Create the MimeMessageHelper (false = not a multipart message)

            MimeMessageHelper helper = new MimeMessageHelper(message, false);

            // Sender address and display name

            String personal = Optional.ofNullable(mailProperties.getProperties().get("personal")).orElse(mailProperties.getUsername());

            helper.setFrom(mailProperties.getUsername(), personal);

            // Recipient address

            helper.setTo(to);

            // Mail subject

            helper.setSubject(mailProperties.getProperties().getOrDefault("subject", "Alarm Notification"));

            // Mail body; the second argument indicates whether the body is HTML

            helper.setText(content, true);

            // Send

            javaMailSender.send(message);

        } catch (Exception e) {

            // Pass the exception as the last argument so the stack trace is logged

            log.error("Failed to send mail", e);

        }

    }

}

        Controller endpoint:

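import java.util.List;

import lombok.extern.slf4j.Slf4j;

import org.springframework.beans.factory.annotation.Autowired;

import org.springframework.web.bind.annotation.RequestBody;

import org.springframework.web.bind.annotation.RequestMapping;

import org.springframework.web.bind.annotation.RestController;

@Slf4j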

@RequestMapping("/alarm")

@RestController

public class AlarmController {

    @Autowired

    private Mail mail;

    @RequestMapping("/handler")

    public String handler(@RequestBody List<AlarmMessage> alarmMessages) {

        log.info("Received alarm, alarmMessages:{}", alarmMessages);

        mail.send("recipient email address", buildMessage(alarmMessages));

        return "Alarm received";

    }

    private String buildMessage(List<AlarmMessage> alarmMessages) {

        StringBuilder builder = new StringBuilder();

        builder.append("System alarm: <br/>");

        for (AlarmMessage alarmMessage : alarmMessages) {

            builder.append("scopeId: ").append(alarmMessage.getScopeId())

                    .append("<br/> scope: ").append(alarmMessage.getScope())

                    .append("<br/> Entity name of the target scope: ").append(alarmMessage.getName())

                    .append("<br/> Scope entity ID: ").append(alarmMessage.getId0())

                    .append("<br/> Alarm rule name: ").append(alarmMessage.getRuleName())

                    .append("<br/> Alarm message: ").append(alarmMessage.getAlarmMessage())

                    .append("<br/> Alarm time: ").append(alarmMessage.getStartTime())

                    .append("<br/><br/>---------------");

        }

        return builder.toString();

    }

}
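        The controller above writes startTime into the mail as a raw number. Assuming that value is an epoch timestamp in milliseconds, a small formatter makes the alarm time readable. The class `AlarmTimeFormatSketch`, the method `format`, and the chosen date pattern are hypothetical; the code uses only java.time.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper that turns an epoch-millisecond alarm time into readable text. */
class AlarmTimeFormatSketch {

    private static final DateTimeFormatter FORMATTER =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    /** Formats the timestamp in the given time zone. */
    static String format(long epochMillis, ZoneId zone) {
        return FORMATTER.format(Instant.ofEpochMilli(epochMillis).atZone(zone));
    }

    public static void main(String[] args) {
        System.out.println(format(0L, ZoneId.of("UTC"))); // prints "1970-01-01 00:00:00"
        // In the controller this could replace the raw value, for example:
        // .append(AlarmTimeFormatSketch.format(alarmMessage.getStartTime(), ZoneId.systemDefault()))
        System.out.println(format(System.currentTimeMillis(), ZoneId.systemDefault()));
    }
}
```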

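1.2.4 Configuring the Webhook

        This step mainly involves editing alarm-settings.yml under apache-skywalking-apm-bin\config so that alarms are pushed to a URL. Here the URL points to the alarm-service service, which processes the alarm information and sends it by email: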

hooks:

  webhook:

    default:

      is-default: true

      urls:

        - http://127.0.0.1:8084/alarm/handler

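1.2.5 Restarting the Service and SkyWalking

        Because distributed transactions are enabled, creating an order is relatively slow, so alarm messages will appear in the mailbox.

        Note: the alarm may take a while to arrive, because the default rules evaluate metrics over a 10-minute window, and message processing adds further delay.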

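2 Connecting the Webhook to Feishu

        Webhooks can also be connected to applications such as WeCom, Feishu, and DingTalk, so that developers and operations staff receive alarms more promptly. In any Feishu group, click the menu in the upper-right corner and add a bot.

        Choose the custom bot, fill in the bot information, and click Add.

        On the page that opens, copy the webhook URL and the signature verification secret (if signature verification is enabled), then modify the SkyWalking configuration file: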

hooks:

  feishu:

    default:

      is-default: true

      text-template: |

        {

          "msg_type":"text",

          "content": {

            "text": "Apache SkyWalking Alarm: \n %s."

          }

        }

      webhooks:

        - url: obtained from Feishu

          secret: obtained from Feishu
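        The `%s` in text-template is a placeholder for the alarm message. The sketch below shows how such a template could be filled, assuming plain `String.format` substitution; this is an illustration and not SkyWalking's source code. The class `FeishuTemplateSketch` and the service name order-service in the sample message are hypothetical.

```java
/** Hypothetical illustration of filling the `%s` placeholder in text-template. */
class FeishuTemplateSketch {

    // Same JSON as the text-template in alarm-settings.yml; "\\n" is a literal backslash + n.
    static final String TEXT_TEMPLATE =
            "{\n"
          + "  \"msg_type\":\"text\",\n"
          + "  \"content\": {\n"
          + "    \"text\": \"Apache SkyWalking Alarm: \\n %s.\"\n"
          + "  }\n"
          + "}";

    /** Substitutes the alarm message into the template. */
    static String render(String alarmMessage) {
        return String.format(TEXT_TEMPLATE, alarmMessage);
    }

    public static void main(String[] args) {
        System.out.println(render(
                "Response time of service order-service is more than 1000ms in 3 minutes of last 10 minutes"));
    }
}
```

        With this kind of substitution, a message containing double quotes would produce invalid JSON, so it is safer to keep the message text in the rules free of quotes.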

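        Then restart SkyWalking and run a test to watch the Feishu bot push alarm messages.

        The alarm is successfully pushed to Feishu. Other applications are integrated in the same way; see each application's developer documentation for details.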
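        When signature verification is enabled, SkyWalking computes the signature itself from the configured secret, so no extra code is needed. For readers who want to call the bot from their own service, the following sketch follows the signing scheme as described in Feishu's custom bot documentation, to the best of my understanding: the string `timestamp + "\n" + secret` is used as the HmacSHA256 key, an empty message is signed, and the result is Base64-encoded. The class `FeishuSignSketch` and the secret demo-secret are hypothetical; verify the details against the official documentation before relying on it.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical sketch of the signature a Feishu custom bot expects (assumed scheme, see lead-in). */
class FeishuSignSketch {

    /**
     * @param secret           signature secret copied from the bot settings
     * @param timestampSeconds current time in seconds since the epoch
     */
    static String sign(String secret, long timestampSeconds) {
        try {
            // The key is "timestamp\nsecret"; the signed message itself is empty.
            String stringToSign = timestampSeconds + "\n" + secret;
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(stringToSign.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            byte[] signature = mac.doFinal(new byte[0]);
            return Base64.getEncoder().encodeToString(signature);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 is unavailable", e);
        }
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis() / 1000;
        System.out.println("timestamp=" + now + ", sign=" + sign("demo-secret", now));
    }
}
```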
