I. Symptoms and trigger conditions

1. Page/service behavior

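Symptoms

  • Ranger Admin starts, but its background task keeps logging errors on every scheduled run
  • The log shows that the HiveResourceMgr timedTask failed to initialize
  • This is accompanied by ZooKeeper-related dynamic config parse errors and SASL/JAAS authentication failures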

2. Typical reproduction scenarios

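| Scenario condition | Description | Impact |
| --- | --- | --- |
| Kerberos is enabled on the cluster | The Ranger / ZK / Hive chain uses secure authentication | Missing authentication parameters cause immediate failure |
| Ranger depends on ZooKeeper | Commonly used for resource lookup, service discovery, and connection management | A broken ZK configuration amplifies the impact |
| Bigtop Stack is in use | Differences in the config templates/defaults of stacks/BIGTOP/3.2.0 | Switches that are "off by default but required" are more likely |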

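Troubleshooting strategy, conclusion first
This kind of problem is usually two errors stacked on top of each other.
First stop the hard error that floods the log (the dynamic config parse failure), then fix the authentication chain (SASL/JAAS).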

II. Locating the problem in the logs: timedTask failure + invalid dynamic config event + authentication failure

1. Core log (kept verbatim)

      at org.apache.tomcat.util.threads.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1191)
        at org.apache.tomcat.util.threads.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:659)
        at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
        at java.lang.Thread.run(Thread.java:748)
2026-01-16 12:58:24,324 [timed-executor-pool-0] ERROR [HiveResourceMgr.java:172] Could not initiate at timedTask
2026-01-16 12:58:42,783 [timed-executor-pool-0] INFO [BaseClient.java:132] Init Lookup Login: security enabled, using lookupPrincipal/lookupKeytab
2026-01-16 12:58:42,786 [timed-executor-pool-0] INFO [HiveClient.java:85] Secured Mode: JDBC Connection done with preAuthenticated Subject
2026-01-16 12:58:42,787 [timed-executor-pool-0-SendThread(dev1:2181)] WARN [ClientCnxn.java:1094] SASL configuration failed: javax.security.auth.login.LoginException: No JAAS configuration section named 'Client' was found in specified JAAS configuration file: '/dev/null'. Will continue connection to Zookeeper server without SASL authentication, if Zookeeper server allows it.
2026-01-16 12:58:42,788 [timed-executor-pool-0-EventThread] ERROR [ConnectionState.java:307] Authentication failed
2026-01-16 12:58:42,793 [timed-executor-pool-0-EventThread] ERROR [EnsembleTracker.java:214] Invalid config event received: {server.1=dev1:2888:3888:participant, version=0, server.3=dev3:2888:3888:participant, server.2=dev2:2888:3888:participant}
2026-01-16 12:58:42,793 [timed-executor-pool-0-EventThread] ERROR [EnsembleTracker.java:214] Invalid config event received: {server.1=dev1:2888:3888:participant, version=0, server.3=dev3:2888:3888:participant, server.2=dev2:2888:3888:participant}
2026-01-16 12:58:43,264 [timed-executor-pool-0] INFO [BaseClient.java:132] Init Lookup Login: security enabled, using lookupPrincipal/lookupKeytab
2026-01-16 12:58:43,266 [timed-executor-pool-0] INFO [HiveClient.java:85] Secured Mode: JDBC Connection done with preAuthenticated Subject
2026-01-16 12:58:43,268 [timed-executor-pool-0-SendThread(dev2:2181)] WARN [ClientCnxn.java:1094] SASL configuration failed: javax.security.auth.login.LoginException: No JAAS configuration section named 'Client' was found in specified JAAS configuration file: '/dev/null'. Will continue connection to Zookeeper server without SASL authentication, if Zookeeper server allows it.
2026-01-16 12:58:43,269 [timed-executor-pool-0-EventThread] ERROR [ConnectionState.java:307] Authentication failed
2026-01-16 12:58:43,274 [timed-executor-pool-0-EventThread] ERROR [EnsembleTracker.java:214] Invalid config event received: {server.1=dev1:2888:3888:participant, version=0, server.3=dev3:2888:3888:participant, server.2=dev2:2888:3888:participant}
2026-01-16 12:58:43,274 [timed-executor-pool-0-EventThread] ERROR [EnsembleTracker.java:214] Invalid config event received: {server.1=dev1:2888:3888:participant, version=0, server.3=dev3:2888:3888:participant, server.2=dev2:2888:3888:participant}
2026-01-16 12:58:43,378 [timed-executor-pool-0-SendThread(dev3:2181)] WARN [ClientCnxn.java:1094] SASL configuration failed: javax.security.auth.login.LoginException: No JAAS configuration section named 'Client' was found in specified JAAS configuration file: '/dev/null'. Will continue connection to Zookeeper server without SASL authentication, if Zookeeper server allows it.
2026-01-16 12:58:43,378 [timed-executor-pool-0-EventThread] ERROR [ConnectionState.java:307] Authentication failed
2026-01-16 12:58:43,384 [timed-executor-pool-0-EventThread] ERROR [EnsembleTracker.java:214] Invalid config event received: {server.1=dev1:2888:3888:participant, version=0, server.3=dev3:2888:3888:participant, server.2=dev2:2888:3888:participant}
2026-01-16 12:58:43,384 [timed-executor-pool-0-EventThread] ERROR [EnsembleTracker.java:214] Invalid config event received: {server.1=dev1:2888:3888:participant, version=0, server.3=dev3:2888:3888:participant, server.2=dev2:2888:3888:participant}
2026-01-16 12:58:43,489 [timed-executor-pool-0-SendThread(dev2:2181)] WARN [ClientCnxn.java:1094] SASL configuration failed: javax.security.auth.login.LoginException: No JAAS configuration section named 'Client' was found in specified JAAS configuration file: '/dev/null'. Will continue connection to Zookeeper server without SASL authentication, if Zookeeper server allows it.
2026-01-16 12:58:43,489 [timed-executor-pool-0-EventThread] ERROR [ConnectionState.java:307] Authentication failed
2026-01-16 12:58:43,493 [timed-executor-pool-0-EventThread] ERROR [EnsembleTracker.java:214] Invalid config event received: {server.1=dev1:2888:3888:participant, version=0, server.3=dev3:2888:3888:participant, server.2=dev2:2888:3888:participant}
2026-01-16 12:58:43,494 [timed-executor-pool-0-EventThread] ERROR [EnsembleTracker.java:214] Invalid config event received: {server.1=dev1:2888:3888:participant, version=0, server.3=dev3:2888:3888:participant, server.2=dev2:2888:3888:participant}

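2. Breaking down the key log lines

Start with the three lines that best guide the next step

  • Invalid config event received: the dynamic config could not be parsed (a hard error that reproduces reliably)
  • specified JAAS configuration file: '/dev/null': the JAAS configuration points at the wrong place (most likely a startup parameter did not take effect or was overridden)
  • Authentication failed: ZK authentication failed in secure mode (this makes later component initialization fail repeatedly)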
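
To see quickly which of these signatures dominates a log file, a small counter can help. The following is a minimal sketch, not part of Ranger: the function name `count_signatures` and the dictionary keys are made up for illustration, and the search strings are copied from the log excerpt above.

```python
from collections import Counter

# Substrings copied from the log excerpt; the dictionary keys are illustrative names.
SIGNATURES = {
    "invalid_config_event": "Invalid config event received",
    "jaas_dev_null": "JAAS configuration file: '/dev/null'",
    "auth_failed": "Authentication failed",
    "timed_task_failed": "Could not initiate at timedTask",
}


def count_signatures(lines):
    """Count how many log lines contain each known error signature."""
    counts = Counter({name: 0 for name in SIGNATURES})
    for line in lines:
        for name, needle in SIGNATURES.items():
            if needle in line:
                counts[name] += 1
    return counts


# Three lines taken from the excerpt above.
sample = [
    "2026-01-16 12:58:24,324 [timed-executor-pool-0] ERROR [HiveResourceMgr.java:172] Could not initiate at timedTask",
    "2026-01-16 12:58:42,788 [timed-executor-pool-0-EventThread] ERROR [ConnectionState.java:307] Authentication failed",
    "2026-01-16 12:58:43,269 [timed-executor-pool-0-EventThread] ERROR [ConnectionState.java:307] Authentication failed",
]
for name, value in count_signatures(sample).items():
    print(name, value)
```

For a real file, pass an open file handle instead of the sample list, since the function only needs an iterable of lines.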

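III. Root causes (two problems stacked together)

1. Cause one: the ZooKeeper dynamic config is missing the extended clientPort segment

In the log, the server.* entries already carry :participant but lack the ;host:2181 clientPort extension segment.

Why this triggers Invalid config event
The ZK dynamic config (/zookeeper/config) is the mutable description of the ensemble.
When the client (Curator's EnsembleTracker) sees a change event, it has to parse complete server entries, including the role and the client-facing clientPort.
When an event carries only participant and no clientPort, it is treated as an invalid config event, and Invalid config event received floods the log.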
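
The check described above can be sketched in a few lines. This is an illustration only, not Curator's actual parsing code: `parse_config_event` and `has_client_port` are hypothetical helpers that read the `{key=value, ...}` map exactly as it is printed in the log and test each server.* entry for a trailing `;[host:]port` segment.

```python
def parse_config_event(text):
    """Extract the server.N entries from the '{k=v, k=v}' map printed in the log."""
    body = text[text.index("{") + 1 : text.rindex("}")]
    entries = {}
    for item in body.split(", "):
        key, _, value = item.partition("=")
        if key.startswith("server."):
            entries[key] = value
    return entries


def has_client_port(spec):
    """Return True if the server spec ends with a ';[host:]port' client segment."""
    _, sep, client = spec.partition(";")
    if not sep:
        return False
    port = client.rsplit(":", 1)[-1]
    return port.isdigit()


# The event payload, copied from the log excerpt.
event = (
    "{server.1=dev1:2888:3888:participant, version=0, "
    "server.3=dev3:2888:3888:participant, server.2=dev2:2888:3888:participant}"
)
missing = sorted(k for k, v in parse_config_event(event).items() if not has_client_port(v))
print(missing)  # ['server.1', 'server.2', 'server.3']
```

An entry such as `dev1:2888:3888:participant;dev1:2181` would pass this check, which matches the `;host:2181` form mentioned above.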

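2. Cause two: the ZK SASL/JAAS parameters on the Ranger side are missing or overridden to /dev/null

The log says so explicitly:

  • No JAAS configuration section named 'Client'
  • ... JAAS configuration file: '/dev/null'

The most common root cause here is not a wrong keytab but a JVM parameter that was never passed (or was overridden by systemd or a script). As a result, Ranger never loads the intended JAAS file when it connects to ZK, so it cannot find the Client section.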
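
One way to confirm this is to look at the JVM command line of the running Ranger Admin process (for example, from `ps` output) and at the JAAS file it points to. The sketch below assumes the JAAS file is passed through the standard `-Djava.security.auth.login.config` system property. The helper names, the sample command line, the keytab path, and the principal are all hypothetical and must be replaced with values from the actual cluster.

```python
import re
import shlex

JAAS_FLAG = "-Djava.security.auth.login.config="


def jaas_path_from_cmdline(cmdline):
    """Return the JAAS path given on a JVM command line, or None if the flag is absent."""
    path = None
    for token in shlex.split(cmdline):
        if token.startswith(JAAS_FLAG):
            # Keep the last occurrence (assumption: a later duplicate overrides an earlier one).
            path = token[len(JAAS_FLAG):]
    return path


def has_client_section(jaas_text):
    """Return True if the JAAS text declares a 'Client { ... };' entry."""
    return re.search(r"^\s*Client\s*\{", jaas_text, re.MULTILINE) is not None


# Hypothetical command line and JAAS content, for illustration only.
cmdline = "java -Djava.security.auth.login.config=/dev/null -cp /path/to/classes Main"
example_jaas = """
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="/path/to/rangeradmin.keytab"
  principal="rangeradmin/host@EXAMPLE.COM";
};
"""

print(jaas_path_from_cmdline(cmdline))
print(has_client_section(example_jaas))
```

If the first function returns `/dev/null` or `None` for the real process, the parameter is being lost or overridden before the JVM starts, which is consistent with the log message above.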

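Why the timedTask gets interrupted
The HiveResourceMgr timedTask is a periodic background task whose initialization often depends on internal connection state (for example ZK state, authentication state, and downstream dependencies).
While the ZK side keeps raising invalid config events and authentication failures, the task's initialization fails over and over, which shows up as a service that is alive but keeps logging errors.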

IV. Troubleshooting strategy (stop the bleeding first, then close the loop)


::: danger For the fix, see
22205: Solution
:::

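Acceptance criteria

  • After a Ranger restart, Invalid config event received no longer appears continuously
  • /dev/null no longer appears in the log, and Authentication failed no longer floods it
  • The HiveResourceMgr timedTask no longer fails continuously on every period (occasional failures are acceptable, but the count must not keep growing)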
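
These criteria can be checked mechanically against the log lines written after the restart. The sketch below is one possible way to do it, under stated assumptions: log lines start with a `YYYY-MM-DD HH:MM:SS,mmm` timestamp as in the excerpt above, `acceptance_report` is a made-up helper, and the `max_occasional` threshold of 3 is an arbitrary stand-in for "occasional".

```python
from datetime import datetime

TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def parse_ts(line):
    """Parse the leading timestamp; return None for lines without one (e.g. stack traces)."""
    try:
        return datetime.strptime(line[:23], TS_FORMAT)
    except ValueError:
        return None


def acceptance_report(lines, restarted_at, max_occasional=3):
    """Evaluate the acceptance criteria on lines logged at or after restarted_at."""
    needles = {
        "invalid_config": "Invalid config event received",
        "dev_null": "'/dev/null'",
        "auth_failed": "Authentication failed",
        "timed_task": "Could not initiate at timedTask",
    }
    counts = {name: 0 for name in needles}
    for line in lines:
        ts = parse_ts(line)
        if ts is None or ts < restarted_at:
            continue
        for name, needle in needles.items():
            if needle in line:
                counts[name] += 1
    return {
        "no_invalid_config_event": counts["invalid_config"] == 0,
        "no_dev_null_jaas": counts["dev_null"] == 0,
        "auth_failed_not_flooding": counts["auth_failed"] <= max_occasional,
        "timed_task_not_flooding": counts["timed_task"] <= max_occasional,
    }


# One line from the excerpt, evaluated against a hypothetical restart time after it.
old_line = (
    "2026-01-16 12:58:42,788 [timed-executor-pool-0-EventThread] "
    "ERROR [ConnectionState.java:307] Authentication failed"
)
report = acceptance_report([old_line], restarted_at=datetime(2026, 1, 16, 13, 0, 0))
print(report)
```

Lines logged before the restart are ignored, so the old errors in the same file do not fail the check.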
