力扣1225-报告系统状态的连续日期
结果忽略了 2018 年的记录,因为我们只关心从 2019-01-01 到 2019-12-31 的记录。从 2019-01-01 到 2019-01-03 所有任务成功,系统状态为 "succeeded"。从 2019-01-06 到 2019-01-06 所有任务成功,系统状态为 "succeeded"。从 2019-01-04 到 2019-01-05 所有任务失败,系统状态为 "faile
表:Failed
+--------------+---------+ | Column Name | Type | +--------------+---------+ | fail_date | date | +--------------+---------+ 该表主键为 fail_date (具有唯一值的列)。 该表包含失败任务的天数.
表: Succeeded
+--------------+---------+ | Column Name | Type | +--------------+---------+ | success_date | date | +--------------+---------+ 该表主键为 success_date (具有唯一值的列)。 该表包含成功任务的天数.
系统 每天 运行一个任务。每个任务都独立于先前的任务。任务的状态可以是失败或是成功。
编写解决方案找出 2019-01-01 到 2019-12-31 期间任务连续同状态 period_state 的起止日期(start_date 和 end_date)。即如果任务失败了,就是失败状态的起止日期,如果任务成功了,就是成功状态的起止日期。
最后结果按照起始日期 start_date 排序
返回结果样例如下所示:
示例 1:
输入: Failed table: +-------------------+ | fail_date | +-------------------+ | 2018-12-28 | | 2018-12-29 | | 2019-01-04 | | 2019-01-05 | +-------------------+ Succeeded table: +-------------------+ | success_date | +-------------------+ | 2018-12-30 | | 2018-12-31 | | 2019-01-01 | | 2019-01-02 | | 2019-01-03 | | 2019-01-06 | +-------------------+ 输出: +--------------+--------------+--------------+ | period_state | start_date | end_date | +--------------+--------------+--------------+ | succeeded | 2019-01-01 | 2019-01-03 | | failed | 2019-01-04 | 2019-01-05 | | succeeded | 2019-01-06 | 2019-01-06 | +--------------+--------------+--------------+ 解释: 结果忽略了 2018 年的记录,因为我们只关心从 2019-01-01 到 2019-12-31 的记录 从 2019-01-01 到 2019-01-03 所有任务成功,系统状态为 "succeeded"。 从 2019-01-04 到 2019-01-05 所有任务失败,系统状态为 "failed"。 从 2019-01-06 到 2019-01-06 所有任务成功,系统状态为 "succeeded"。
思路:
第一步,构造统一的时间状态表
将原本分散在不同表中的成功日期和失败日期,通过 UNION ALL 合并为一张统一的时间表。
在这一步中,明确每一条记录只包含两个核心信息:
-
日期(date)
-
状态(state:failed 或 succeeded)
这样做的目的是将问题从“多表问题”转化为“单一时间序列分组问题”。
第二步,通过窗口函数制造“分组标识”
连续区间的识别是本题的难点。
解决方法是利用两个行号序列的差值来刻画“连续性”:
-
一个行号按全局日期顺序递增
-
一个行号按状态分组、日期顺序递增
当状态在日期序列中连续时,这两个行号的差值保持不变;
一旦状态发生切换,差值就会改变。
因此,全局行号 − 状态内行号 可以作为“连续区间的唯一标识”。
第三步,按区间标识进行分组聚合
在得到区间标识后,只需按以下维度分组:
-
状态
-
连续区间标识
然后分别取:
-
最小日期作为区间起始
-
最大日期作为区间结束
即可得到每一个状态连续区间的完整时间范围。
| 输入1 | |||||||||||||||||
| SUCCESS_DATE | date | state | date | state | date | state | rn1 | rn2 | rn1-rn2 | ||||||||
| 2018-12-30 | 2018-12-30 | succeeded | 2018-12-28 | failed | row_number() over (order by dt) as rn1, row_number() over (partition by state order by dt) as rn2 |
2018-12-28 | failed | 1 | 1 | 0 | date | state | |||||
| 2018-12-31 | 2018-12-31 | succeeded | 2018-12-29 | failed | 2018-12-29 | failed | 2 | 2 | 0 | where dt between date'2019-01-01' and date' 2019-12-31' | 2018-12-30 | succeeded | |||||
| 2019-01-01 | 2019-01-01 | succeeded | 排序 | 2018-12-30 | succeeded | 2018-12-30 | succeeded | 3 | 1 | 2 | 2018-12-31 | succeeded | group by state, (rn1 - rn2) order by start_date; |
||||
| 2019-01-02 | 2019-01-02 | succeeded | 2018-12-31 | succeeded | 2018-12-31 | succeeded | 4 | 2 | 2 | 2019-01-01 | succeeded | ||||||
| 2019-01-03 | 首先把两个表合并 | 2019-01-03 | succeeded | 2019-01-01 | succeeded | 2019-01-01 | succeeded | 5 | 3 | 2 | 2019-01-02 | succeeded | |||||
| 2019-01-06 | 2019-01-06 | succeeded | 2019-01-02 | succeeded | 2019-01-02 | succeeded | 6 | 4 | 2 | 2019-01-03 | succeeded | ||||||
| 2018-12-28 | failed | 2019-01-03 | succeeded | 2019-01-03 | succeeded | 7 | 5 | 2 | 2019-01-04 | failed | |||||||
| 输入2 | 2018-12-29 | failed | 2019-01-04 | failed | 2019-01-04 | failed | 8 | 3 | 5 | 2019-01-05 | failed | ||||||
| FAIL_DATE | 2019-01-04 | failed | 2019-01-05 | failed | 2019-01-05 | failed | 9 | 4 | 5 | 2019-01-06 | succeeded | ||||||
| 2018-12-28 | 2019-01-05 | failed | 2019-01-06 | succeeded | 2019-01-06 | succeeded | 10 | 6 | 4 | ||||||||
| 2018-12-29 | (rn1 - rn2) | ||||||||||||||||
| 2019-01-04 | period_state | start_date | end_date | ||||||||||||||
| 2019-01-05 | succeeded | 2019-01-01 | 2019-01-03 | ||||||||||||||
| failed | 2019-01-04 | 2019-01-05 | |||||||||||||||
| succeeded | 2019-01-06 | 2019-01-06 | |||||||||||||||
代码:
with t1 as (
select to_char(fail_date,'YYYY-MM-DD') as dt, 'failed' as state
from failed
union all
select to_char(success_date,'YYYY-MM-DD') as dt, 'succeeded' as state
from succeeded
),
t as (
select dt,
state,
row_number() over (order by dt) as rn1,
row_number() over (partition by state order by dt) as rn2
from t1
)
select state as period_state,
min(dt) as start_date,
max(dt) as end_date
from t
where dt between date'2019-01-01' and date' 2019-12-31'
group by state, (rn1 - rn2)
order by start_date;
更多推荐


所有评论(0)