背景

we have got the data (pictures of multiple periods of Klines and results of each trade) ready and model next.

model structure

选定swin_transformer开源项目的网络结构进行建模。

数据预处理

完成了训练流程,发现标签需要标准化,才可避免梯度爆炸。图像本身进行标准化,才能加速收敛。因此,需要添加一个图像和标签的标准化流程。于是进行了添加。样本还未完全计算完全(20000 / 39999),整体样本数据已占据存储600G以上,一度爆储存。不得不用移动硬盘替代。可考虑年前去找键新添加多个机械硬盘,笔记本加个内存。

模型训练

  1. 四合一(收益、最大收益、回撤风险,持有期)
    终于把数据搞定后,开始了训练,训练结果初看下来并不乐观。以下,是训练过程中的损失函数变化。训练的损失函数下降是有限的,测试更差,有过拟合嫌疑,豆包肯定地说虽然下降幅度有限,但50%到55%的提升,也是显著的(金融数据的特殊性);元宝ds认为,模型与数据不匹配,模型的复杂度适合更大数据量,我的数据量只有4万多,样本太少。认为需要改造网络结构。认为,现在的这种下降幅度,早就该停止了,没加早停是在浪费算力资源。
    我觉得,确实程序存在一定的问题。但还需排除4个标签本身有一些就是随机的,影响其它可预测的。因此,需要先拆开来建模看效果。
(tqsdk) ➜  future_minutes /home/hyt/anaconda3/envs/tqsdk/bin/python /home/hyt/HYT/future_minutes/swin_transformer_hyt001_fast-01.py

/home/hyt/anaconda3/envs/tqsdk/lib/python3.9/site-packages/timm/models/layers/__init__.py:48: FutureWarning: Importing from timm.models.layers is deprecated, please import via timm.layers
  warnings.warn(f"Importing from {__name__} is deprecated, please import via timm.layers", FutureWarning)
使用设备: cuda
正在极速扫描所有.pt文件方向(用于完美平衡采样)...
100%|██████████████████████████████████████████████████████████████████████████████████████| 44866/44866 [1:01:27<00:00, 12.17it/s]
扫描完成,共加载 44866 个样本(多头+空头已平衡)
警告:多空样本不平衡,仅使用 35890 个样本进行完美平衡
平衡后实际批次数量: 1121 (每批32个)
实际训练批次/epoch: 1121
torch.compile 加速已启用,起飞!

=== 开始核弹级PT训练 Swin-Small 64通道特化版 ===

Epoch 01/100: 100%|████████████████████████████████████████| 1121/1121 [03:59<00:00,  4.68it/s, loss=0.63326478, avg=1.07413434, gpu=3.5GB]
Val avg loss: 1.07437090

✓ 新最佳模型已保存!Loss = 1.07437090
Epoch 02/100: 100%|████████████████████████████████████████| 1121/1121 [03:47<00:00,  4.93it/s, loss=0.62041259, avg=0.97859784, gpu=3.5GB]
Val avg loss: 1.08048323
Epoch 03/100: 100%|████████████████████████████████████████| 1121/1121 [03:46<00:00,  4.96it/s, loss=0.74675131, avg=0.97378976, gpu=3.5GB]
Val avg loss: 1.07371125

✓ 新最佳模型已保存!Loss = 1.07371125
Epoch 04/100: 100%|████████████████████████████████████████| 1121/1121 [03:48<00:00,  4.90it/s, loss=0.22312266, avg=0.97219397, gpu=3.5GB]
Val avg loss: 1.07610179
Epoch 05/100: 100%|████████████████████████████████████████| 1121/1121 [03:47<00:00,  4.93it/s, loss=0.22025071, avg=0.97020586, gpu=3.5GB]
Val avg loss: 1.07017845

✓ 新最佳模型已保存!Loss = 1.07017845
Epoch 06/100: 100%|████████████████████████████████████████| 1121/1121 [03:47<00:00,  4.92it/s, loss=0.25912455, avg=0.96928886, gpu=3.5GB]
Val avg loss: 1.07089214
Epoch 07/100: 100%|████████████████████████████████████████| 1121/1121 [03:51<00:00,  4.83it/s, loss=0.31828529, avg=0.96902709, gpu=3.5GB]
Val avg loss: 1.07032022
Epoch 08/100: 100%|████████████████████████████████████████| 1121/1121 [03:45<00:00,  4.96it/s, loss=5.26876783, avg=0.96852971, gpu=3.5GB]
Val avg loss: 1.07127791
Epoch 09/100: 100%|████████████████████████████████████████| 1121/1121 [03:47<00:00,  4.93it/s, loss=1.08364940, avg=0.96876489, gpu=3.5GB]
Val avg loss: 1.07230785
Epoch 10/100: 100%|████████████████████████████████████████| 1121/1121 [03:49<00:00,  4.89it/s, loss=0.43948999, avg=0.96810944, gpu=3.5GB]
Val avg loss: 1.06999409

✓ 新最佳模型已保存!Loss = 1.06999409
Epoch 11/100: 100%|████████████████████████████████████████| 1121/1121 [03:47<00:00,  4.92it/s, loss=5.80972099, avg=0.96825103, gpu=3.5GB]
Val avg loss: 1.07015352
Epoch 12/100: 100%|████████████████████████████████████████| 1121/1121 [03:48<00:00,  4.91it/s, loss=0.16305605, avg=0.96743955, gpu=3.5GB]
Val avg loss: 1.06956982

✓ 新最佳模型已保存!Loss = 1.06956982
Epoch 13/100: 100%|████████████████████████████████████████| 1121/1121 [03:49<00:00,  4.88it/s, loss=0.40855801, avg=0.96784777, gpu=3.5GB]
Val avg loss: 1.07085939
Epoch 14/100: 100%|████████████████████████████████████████| 1121/1121 [03:48<00:00,  4.91it/s, loss=0.83153731, avg=0.96757288, gpu=3.5GB]
Val avg loss: 1.06990800
Epoch 15/100: 100%|████████████████████████████████████████| 1121/1121 [03:48<00:00,  4.92it/s, loss=0.87010413, avg=0.96738295, gpu=3.5GB]
Val avg loss: 1.07109683
Epoch 16/100: 100%|████████████████████████████████████████| 1121/1121 [03:49<00:00,  4.89it/s, loss=0.10440230, avg=0.96725366, gpu=3.5GB]
Val avg loss: 1.06938423

✓ 新最佳模型已保存!Loss = 1.06938423
Epoch 17/100: 100%|████████████████████████████████████████| 1121/1121 [03:48<00:00,  4.91it/s, loss=0.79202574, avg=0.96704145, gpu=3.5GB]
Val avg loss: 1.06928315

✓ 新最佳模型已保存!Loss = 1.06928315
Epoch 18/100: 100%|████████████████████████████████████████| 1121/1121 [03:47<00:00,  4.93it/s, loss=0.21817572, avg=0.96670713, gpu=3.5GB]
Val avg loss: 1.06991853
Epoch 19/100: 100%|████████████████████████████████████████| 1121/1121 [06:34<00:00,  2.84it/s, loss=0.45474541, avg=0.96680410, gpu=3.5GB]
Val avg loss: 1.06913703

✓ 新最佳模型已保存!Loss = 1.06913703
Epoch 20/100: 100%|████████████████████████████████████████| 1121/1121 [05:18<00:00,  3.52it/s, loss=0.23358890, avg=0.96677210, gpu=3.5GB]
Val avg loss: 1.06902101

✓ 新最佳模型已保存!Loss = 1.06902101
Epoch 21/100: 100%|████████████████████████████████████████| 1121/1121 [04:20<00:00,  4.31it/s, loss=0.31211245, avg=0.96665367, gpu=3.5GB]
Val avg loss: 1.06910960
Epoch 22/100: 100%|████████████████████████████████████████| 1121/1121 [03:47<00:00,  4.93it/s, loss=2.16967010, avg=0.96661273, gpu=3.5GB]
Val avg loss: 1.06897959

✓ 新最佳模型已保存!Loss = 1.06897959
Epoch 23/100: 100%|████████████████████████████████████████| 1121/1121 [03:46<00:00,  4.96it/s, loss=0.40225464, avg=0.96655513, gpu=3.5GB]
Val avg loss: 1.06912005
Epoch 24/100: 100%|████████████████████████████████████████| 1121/1121 [03:45<00:00,  4.96it/s, loss=0.73457479, avg=0.96661538, gpu=3.5GB]
Val avg loss: 1.06914914
Epoch 25/100: 100%|████████████████████████████████████████| 1121/1121 [03:45<00:00,  4.97it/s, loss=0.86498713, avg=0.96663778, gpu=3.5GB]
Val avg loss: 1.06895199

✓ 新最佳模型已保存!Loss = 1.06895199
Epoch 26/100: 100%|████████████████████████████████████████| 1121/1121 [03:46<00:00,  4.94it/s, loss=0.24020988, avg=0.96658041, gpu=3.5GB]
Val avg loss: 1.06898880
Epoch 27/100: 100%|████████████████████████████████████████| 1121/1121 [03:47<00:00,  4.94it/s, loss=0.35797831, avg=0.96651115, gpu=3.5GB]
Val avg loss: 1.06987666
Epoch 28/100: 100%|████████████████████████████████████████| 1121/1121 [03:48<00:00,  4.91it/s, loss=4.09526634, avg=0.96655068, gpu=3.5GB]
Val avg loss: 1.06958268
Epoch 29/100: 100%|████████████████████████████████████████| 1121/1121 [03:46<00:00,  4.95it/s, loss=0.87060237, avg=0.96654537, gpu=3.5GB]
Val avg loss: 1.06896356
Epoch 30/100: 100%|████████████████████████████████████████| 1121/1121 [03:46<00:00,  4.94it/s, loss=0.25923204, avg=0.96648222, gpu=3.5GB]
Val avg loss: 1.06899151
Epoch 31/100: 100%|████████████████████████████████████████| 1121/1121 [03:52<00:00,  4.83it/s, loss=0.66931766, avg=0.96649663, gpu=3.5GB]
Val avg loss: 1.06900774
Epoch 32/100: 100%|████████████████████████████████████████| 1121/1121 [03:45<00:00,  4.96it/s, loss=0.69382679, avg=0.96648335, gpu=3.5GB]
Val avg loss: 1.06911820
Epoch 33/100: 100%|████████████████████████████████████████| 1121/1121 [03:46<00:00,  4.94it/s, loss=4.48531008, avg=0.96650198, gpu=3.5GB]
Val avg loss: 1.06895460
Epoch 34/100: 100%|████████████████████████████████████████| 1121/1121 [03:45<00:00,  4.96it/s, loss=0.17875484, avg=0.96645574, gpu=3.5GB]
Val avg loss: 1.06908630
Epoch 35/100: 100%|████████████████████████████████████████| 1121/1121 [03:47<00:00,  4.93it/s, loss=0.31121632, avg=0.96650767, gpu=3.5GB]
Val avg loss: 1.06925995
Epoch 36/100: 100%|████████████████████████████████████████| 1121/1121 [03:46<00:00,  4.94it/s, loss=0.36076117, avg=0.96650444, gpu=3.5GB]
Val avg loss: 1.06904670
Epoch 37/100: 100%|████████████████████████████████████████| 1121/1121 [03:45<00:00,  4.96it/s, loss=0.99474967, avg=0.96647074, gpu=3.5GB]
Val avg loss: 1.06915497
Epoch 38/100: 100%|████████████████████████████████████████| 1121/1121 [03:47<00:00,  4.92it/s, loss=0.52422506, avg=0.96650051, gpu=3.5GB]
Val avg loss: 1.06896952
Epoch 39/100: 100%|████████████████████████████████████████| 1121/1121 [03:46<00:00,  4.95it/s, loss=1.01487195, avg=0.96645752, gpu=3.5GB]
Val avg loss: 1.06905637
Epoch 40/100: 100%|████████████████████████████████████████| 1121/1121 [03:47<00:00,  4.93it/s, loss=0.34683526, avg=0.96644983, gpu=3.5GB]
Val avg loss: 1.06896538
Epoch 41/100: 100%|████████████████████████████████████████| 1121/1121 [03:47<00:00,  4.94it/s, loss=4.34267187, avg=0.96646171, gpu=3.5GB]
Val avg loss: 1.06900850
Epoch 42/100: 100%|████████████████████████████████████████| 1121/1121 [03:46<00:00,  4.95it/s, loss=0.55466294, avg=0.96643763, gpu=3.5GB]
Val avg loss: 1.06901717
Epoch 43/100: 100%|████████████████████████████████████████| 1121/1121 [03:46<00:00,  4.95it/s, loss=0.56479526, avg=0.96645988, gpu=3.5GB]
Val avg loss: 1.06902186
Epoch 44/100: 100%|████████████████████████████████████████| 1121/1121 [03:47<00:00,  4.92it/s, loss=0.18783274, avg=0.96645902, gpu=3.5GB]
Val avg loss: 1.06899510
Epoch 45/100: 100%|████████████████████████████████████████| 1121/1121 [03:46<00:00,  4.94it/s, loss=0.58698833, avg=0.96646266, gpu=3.5GB]
Val avg loss: 1.06904362
Epoch 46/100: 100%|████████████████████████████████████████| 1121/1121 [03:46<00:00,  4.96it/s, loss=0.19953381, avg=0.96644867, gpu=3.5GB]
Val avg loss: 1.06899328
Epoch 47/100: 100%|████████████████████████████████████████| 1121/1121 [03:46<00:00,  4.94it/s, loss=0.57037455, avg=0.96643414, gpu=3.5GB]
Val avg loss: 1.06899170
Epoch 48/100: 100%|████████████████████████████████████████| 1121/1121 [03:46<00:00,  4.95it/s, loss=0.31353521, avg=0.96642916, gpu=3.5GB]
Val avg loss: 1.06905423
Epoch 49/100: 100%|████████████████████████████████████████| 1121/1121 [03:46<00:00,  4.94it/s, loss=4.26344156, avg=0.96644253, gpu=3.5GB]
Val avg loss: 1.06899236
Epoch 50/100: 100%|████████████████████████████████████████| 1121/1121 [03:46<00:00,  4.94it/s, loss=0.55466473, avg=0.96641305, gpu=3.5GB]
Val avg loss: 1.06925451
Epoch 51/100: 100%|████████████████████████████████████████| 1121/1121 [03:47<00:00,  4.93it/s, loss=0.82683092, avg=0.96644953, gpu=3.5GB]
Val avg loss: 1.06898109
Epoch 52/100: 100%|████████████████████████████████████████| 1121/1121 [03:45<00:00,  4.96it/s, loss=0.31432250, avg=0.96642019, gpu=3.5GB]
Val avg loss: 1.06897355
Epoch 53/100: 100%|████████████████████████████████████████| 1121/1121 [03:46<00:00,  4.95it/s, loss=0.31643945, avg=0.96643430, gpu=3.5GB]
Val avg loss: 1.06902162
Epoch 54/100: 100%|████████████████████████████████████████| 1121/1121 [03:47<00:00,  4.92it/s, loss=0.55437291, avg=0.96643079, gpu=3.5GB]
Val avg loss: 1.06895437
Epoch 55/100: 100%|████████████████████████████████████████| 1121/1121 [03:46<00:00,  4.94it/s, loss=0.45501837, avg=0.96641875, gpu=3.5GB]
Val avg loss: 1.06897736
Epoch 56/100:  81%|█████████████████████████████████▎       | 911/1121 [03:06<00:36,  5.69it/s, loss=1.75859272, avg=0.95844821, gpEpoch 56/100:  81%|█████████████████████████████████▎       | 912/1121 [03:06<00:55,  3.75it/s, loss=1.75859272, avg=0.95844821, gpEpoch 56/100:  81%|█████████████████████████████████▎       | 912/1121 [03:06<00:55,  3.75it/s, loss=0.30090314, avg=0.95772801, gpEpoch 56/100:  81%|█████████████████████████████████▍       | 913/1121 [03:06<00:47,  4.35it/s, loss=0.30090314, avg=0.95772801, gpEpoch 56/100:  81%|█████████████████████████████████▍       | 913/1121 [03:06<00:47,  4.35it/s, loss=0.50792825, avg=0.95723589, gpEpoch 56/100:  82%|█████████████████████████████████▍       | 914/1121 [03:06<00:42,  4.88it/s, loss=0.50792825, avg=0.95723589, gpEpoch 56/100:  82%|█████████████████████████████████▍       | 914/1121 [03:06<00:42,  4.88it/s, loss=0.41094363, avg=0.95663885, gpEpoch 56/100:  82%|█████████████████████████████████▍       | 915/1121 [03:06<00:38,  5.34it/s, loss=0.41094363, avg=0.95663885, gpEpoch 56/100:  82%|█████████████████████████████████▍       | 915/1121 [03:07<00:38,  5.34it/s, loss=0.92708743, avg=0.95660659, gpEpoch 56/100:  82%|█████████████████████████████████▌       | 916/1121 [03:07<00:49,  4.16it/s, loss=0.92708743, avg=0.95660659, gpEpoch 56/100:  82%|█████████████████████████████████▌       | 916/1121 [03:07<00:49,  4.16it/s, loss=0.95037472, avg=0.95659979, gpEpoch 56/100:  82%|█████████████████████████████████▌       | 917/1121 [03:07<00:41,  4.87it/s, loss=0.95037472, avg=0.95659979, gpEpoch 56/100:  82%|█████████████████████████████████▌       | 917/1121 [03:07<00:41,  4.87it/s, loss=0.74832731, avg=0.95637291, gpEpoch 56/100:  82%|█████████████████████████████████▌       | 918/1121 [03:07<00:37,  5.42it/s, loss=0.74832731, avg=0.95637291, gpEpoch 56/100:  82%|█████████████████████████████████▌       | 918/1121 [03:07<00:37,  5.42it/s, loss=0.41776079, avg=0.95578683, gpEpoch 56/100:  82%|█████████████████████████████████▌       | 919/1121 [03:07<00:35,  5.76it/s, loss=0.41776079, avg=0.95578683, gpEpoch 56/100:  82%|█████████████████████████████████▌       | 919/1121 [03:08<00:35,  5.76it/s, loss=0.84913218, avg=0.95567090, gpEpoch 56/100:  82%|█████████████████████████████████▋       | 920/1121 [03:08<00:45,  4.39it/s, loss=0.84913218, avg=0.95567090, gpEpoch 56/100:  82%|█████████████████████████████████▋       | 920/1121 [03:08<00:45,  4.39it/s, loss=0.34666646, avg=0.95500966, gpEpoch 56/100:  82%|█████████████████████████████████▋       | 921/1121 [03:08<00:39,  5.08it/s, loss=0.34666646, avg=0.95500966, gpEpoch 56/100:  82%|█████████████████████████████████▋       | 921/1121 [03:08<00:39,  5.08it/s, loss=0.58650064, avg=0.95460997, gpEpoch 56/100:  82%|█████████████████████████████████▋       | 922/1121 [03:08<00:34,  5.71it/s, loss=0.58650064, avg=0.95460997, gpEpoch 56/100:  82%|█████████████████████████████████▋       | 922/1121 [03:08<00:34,  5.71it/s, loss=2.19289708, avg=0.95595156, gpEpoch 56/100:  82%|█████████████████████████████████▊       | 923/1121 [03:08<00:32,  6.11it/s, loss=2.19289708, avg=0.95595156, gpEpoch 56/100:  82%|█████████████████████████████████▊       | 923/1121 [03:08<00:32,  6.11it/s, loss=0.22203785, avg=0.95515728, gpEpoch 56/100:  82%|█████████████████████████████████▊       | 924/1121 [03:08<00:48,  4.06it/s, loss=0.22203785, avg=0.95515728, gpEpoch 56/100:  82%|█████████████████████████████████▊       | 924/1121 [03:08<00:48,  4.06it/s, loss=0.13046201, avg=0.95426572, gpEpoch 56/100:  83%|█████████████████████████████████▊       | 925/1121 [03:08<00:41,  4.73it/s, loss=0.13046201, avg=0.95426572, gpEpoch 56/100:  83%|█████████████████████████████████▊       | 925/1121 [03:09<00:41,  4.73it/s, loss=0.12947308, avg=0.95337502, gpEpoch 56/100:  83%|█████████████████████████████████▊       | 926/1121 [03:09<00:37,  5.20it/s, loss=0.12947308, avg=0.95337502, gpEpoch 56/100:  83%|█████████████████████████████████▊       | 926/1121 [03:09<00:37,  5.20it/s, loss=0.48018736, avg=0.95286457, gpEpoch 56/100:  83%|█████████████████████████████████▉       | 927/1121 [03:09<00:34,  5.66it/s, loss=0.48018736, avg=0.95286457, gpEpoch 56/100:  83%|█████████████████████████████████▉       | 927/1121 [03:09<00:34,  5.66it/s, loss=0.32269123, avg=0.95218550, gpEpoch 56/100:  83%|█████████████████████████████████▉       | 928/1121 [03:09<00:44,  4.35it/s, loss=0.32269123, avg=0.95218550, gpEpoch 56/100:  83%|█████████████████████████████████▉       | 928/1121 [03:09<00:44,  4.35it/s, loss=0.23131919, avg=0.95140954, gpEpoch 56/100:  83%|█████████████████████████████████▉       | 929/1121 [03:09<00:38,  4.97it/s, loss=0.23131919, avg=0.95140954, gpEpoch 56/100:  83%|█████████████████████████████████▉       | 929/1121 [03:09<00:38,  4.97it/s, loss=0.26043105, avg=0.95066655, gpEpoch 56/100:  83%|██████████████████████████████████       | 930/1121 [03:09<00:35,  5.38it/s, loss=0.26043105, avg=0.95066655, gpEpoch 56/100:  83%|██████████████████████████████████       | 930/1121 [03:10<00:35,  5.38it/s, loss=0.19904272, avg=0.94985922, gpEpoch 56/100:  83%|██████████████████████████████████       | 931/1121 [03:10<00:31,  5.95it/s, loss=0.19904272, avg=0.94985922, gpEpoch 56/100:  83%|██████████████████████████████████       | 931/1121 [03:10<00:31,  5.95it/s, loss=0.61319292, avg=0.94949799, gpEpoch 56/100:  83%|██████████████████████████████████       | 932/1121 [03:10<00:47,  3.99it/s, loss=0.61319292, avg=0.94949799, gpEpoch 56/100:  83%|██████████████████████████████████       | 932/1121 [03:10<00:47,  3.99it/s, loss=2.82883358, avg=0.95151229, gpEpoch 56/100:  83%|██████████████████████████████████       | 933/1121 [03:10<00:39,  4.71it/s, loss=2.82883358, avg=0.95151229, gpEpoch 56/100:  83%|██████████████████████████████████       | 933/1121 [03:10<00:39,  4.71it/s, loss=0.43572843, avg=0.95096005, gpEpoch 56/100:  83%|██████████████████████████████████▏      | 934/1121 [03:10<00:35,  5.29it/s, loss=0.43572843, avg=0.95096005, gpEpoch 56/100:  83%|██████████████████████████████████▏      | 934/1121 [03:10<00:35,  5.29it/s, loss=0.65755481, avg=0.95064625, gpEpoch 56/100:  83%|██████████████████████████████████▏      | 935/1121 [03:10<00:32,  5.78it/s, loss=0.65755481, avg=0.95064625, gpEpoch 56/100:  83%|██████████████████████████████████▏      | 935/1121 [03:11<00:32,  5.78it/s, loss=0.63814223, avg=0.95031238, gpEpoch 56/100:  83%|██████████████████████████████████▏      | 936/1121 [03:11<00:48,  3.82it/s, loss=0.63814223, avg=0.95031238, gpEpoch 56/100:  83%|██████████████████████████████████▏      | 936/1121 [03:11<00:48,  3.82it/s, loss=0.46607071, avg=0.94979558, gpEpoch 56/100:  84%|██████████████████████████████████▎      | 937/1121 [03:11<00:41,  4.42it/s, loss=0.46607071, avg=0.94979558, gpEpoch 56/100:  84%|██████████████████████████████████▎      | 937/1121 [03:11<00:41,  4.42it/s, loss=0.97608596, avg=0.94982361, gpEpoch 56/100:  84%|██████████████████████████████████▎      | 938/1121 [03:11<00:36,  5.03it/s, loss=0.97608596, avg=0.94982361, gpEpoch 56/100:  84%|██████████████████████████████████▎      | 938/1121 [03:11<00:36,  5.03it/s, loss=0.49637291, avg=0.94934070, gpEpoch 56/100:  84%|██████████████████████████████████▎      | 939/1121 [03:11<00:32,  5.62it/s, loss=0.49637291, avg=0.94934070, gpEpoch 56/100:  84%|██████████████████████████████████▎      | 939/1121 [03:12<00:32,  5.62it/s, loss=0.79707557, avg=0.94917872, gpEpoch 56/100:  84%|██████████████████████████████████▍      | 940/1121 [03:12<00:40,  4.46it/s, loss=0.79707557, avg=0.94917872, gpEpoch 56/100:  84%|██████████████████████████████████▍      | 940/1121 [03:12<00:40,  4.46it/s, loss=0.63939363, avg=0.94884951, gpEpoch 56/100:  84%|██████████████████████████████████▍      | 941/1121 [03:12<00:36,  4.90it/s, loss=0.63939363, avg=0.94884951, gpEpoch 56/100:  84%|██████████████████████████████████▍      | 941/1121 [03:12<00:36,  4.90it/s, loss=1.13139093, avg=0.94904329, gpEpoch 56/100:  84%|██████████████████████████████████▍      | 942/1121 [03:12<00:33,  5.32it/s, loss=1.13139093, avg=0.94904329, gpEpoch 56/100:  84%|██████████████████████████████████▍      | 942/1121 [03:12<00:33,  5.32it/s, loss=4.95672369, avg=0.95329321, gpEpoch 56/100:  84%|██████████████████████████████████▍      | 943/1121 [03:12<00:31,  5.67it/s, loss=4.95672369, avg=0.95329321, gpEpoch 56/100:  84%|██████████████████████████████████▍      | 943/1121 [03:12<00:31,  5.67it/s, loss=1.47614765, avg=0.95384709, gpEpoch 56/100:  84%|██████████████████████████████████▌      | 944/1121 [03:12<00:39,  4.46it/s, loss=1.47614765, avg=0.95384709, gpEpoch 56/100:  84%|██████████████████████████████████▌      | 944/1121 [03:13<00:39,  4.46it/s, loss=0.36534667, avg=0.95322433, gpEpoch 56/100:  84%|██████████████████████████████████▌      | 945/1121 [03:13<00:35,  4.93it/s, loss=0.36534667, avg=0.95322433, gpEpoch 56/100:  84%|██████████████████████████████████▌      | 945/1121 [03:13<00:35,  4.93it/s, loss=4.96755219, avg=0.95746781, gpEpoch 56/100:  84%|██████████████████████████████████▌      | 946/1121 [03:13<00:32,  5.43it/s, loss=4.96755219, avg=0.95746781, gpEpoch 56/100:  84%|██████████████████████████████████▌      | 946/1121 [03:13<00:32,  5.43it/s, loss=0.24716994, avg=0.95671776, gpEpoch 56/100:  84%|██████████████████████████████████▋      | 947/1121 [03:13<00:29,  5.91it/s, loss=0.24716994, avg=0.95671776, gpEpoch 56/100:  84%|██████████████████████████████████▋      | 947/1121 [03:13<00:29,  5.91it/s, loss=0.37215969, avg=0.95610114, gpEpoch 56/100:  85%|██████████████████████████████████▋      | 948/1121 [03:13<00:39,  4.39it/s, loss=0.37215969, avg=0.95610114, gpEpoch 56/100:  85%|██████████████████████████████████▋      | 948/1121 [03:13<00:39,  4.39it/s, loss=0.19544676, avg=0.95529960, gpEpoch 56/100:  85%|██████████████████████████████████▋      | 949/1121 [03:13<00:34,  4.97it/s, loss=0.19544676, avg=0.95529960, gpEpoch 56/100:  85%|██████████████████████████████████▋      | 949/1121 [03:13<00:34,  4.97it/s, loss=0.41853040, avg=0.95473458, gpEpoch 56/100:  85%|██████████████████████████████████▋      | 950/1121 [03:13<00:31,  5.50it/s, loss=0.41853040, avg=0.95473458, gpEpoch 56/100:  85%|██████████████████████████████████▋      | 950/1121 [03:14<00:31,  5.50it/s, loss=0.63177311, avg=0.95439498, gpEpoch 56/100:  85%|██████████████████████████████████▊      | 951/1121 [03:14<00:28,  6.05it/s, loss=0.63177311, avg=0.95439498, gpEpoch 56/100:  85%|██████████████████████████████████▊      | 951/1121 [03:14<00:28,  6.05it/s, loss=5.78906441, avg=0.95947342, gpEpoch 56/100:  85%|██████████████████████████████████▊      | 952/1121 [03:14<00:40,  4.22it/s, loss=5.78906441, avg=0.95947342, gpEpoch 56/100:  85%|██████████████████████████████████▊      | 952/1121 [03:14<00:40,  4.22it/s, loss=0.69426274, avg=0.95919513, gpEpoch 56/100:  85%|██████████████████████████████████▊      | 953/1121 [03:14<00:35,  4.77it/s, loss=0.69426274, avg=0.95919513, gpEpoch 56/100:  85%|██████████████████████████████████▊      | 953/1121 [03:14<00:35,  4.77it/s, loss=0.33349860, avg=0.95853926, gpEpoch 56/100:  85%|██████████████████████████████████▉      | 954/1121 [03:14<00:31,  5.31it/s, loss=0.33349860, avg=0.95853926, gpEpoch 56/100:  85%|██████████████████████████████████▉      | 954/1121 [03:14<00:31,  5.31it/s, loss=0.34133202, avg=0.95789297, gpEpoch 56/100:  85%|██████████████████████████████████▉      | 955/1121 [03:14<00:28,  5.89it/s, loss=0.34133202, avg=0.95789297, gpEpoch 56/100:  85%|██████████████████████████████████▉      | 955/1121 [03:15<00:28,  5.89it/s, loss=0.71290231, avg=0.95763670, gpEpoch 56/100:  85%|██████████████████████████████████▉      | 956/1121 [03:15<00:39,  4.22it/s, loss=0.71290231, avg=0.95763670, gpEpoch 56/100:  85%|██████████████████████████████████▉      | 956/1121 [03:15<00:39,  4.22it/s, loss=1.16549718, avg=0.95785390, gpEpoch 56/100:  85%|███████████████████████████████████      | 957/1121 [03:15<00:34,  4.76it/s, loss=1.16549718, avg=0.95785390, gpEpoch 56/100:  85%|███████████████████████████████████      | 957/1121 [03:15<00:34,  4.76it/s, loss=0.90556210, avg=0.95779932, gpEpoch 56/100:  85%|███████████████████████████████████      | 958/1121 [03:15<00:30,  5.36it/s, loss=0.90556210, avg=0.95779932, gpEpoch 56/100:  85%|███████████████████████████████████      | 958/1121 [03:15<00:30,  5.36it/s, loss=0.79166317, avg=0.95762608, gpEpoch 56/100:  86%|███████████████████████████████████      | 959/1121 [03:15<00:27,  5.86it/s, loss=0.79166317, avg=0.95762608, gpEpoch 56/100:  86%|███████████████████████████████████      | 959/1121 [03:16<00:27,  5.86it/s, loss=0.42035127, avg=0.95706642, gpEpoch 56/100:  86%|███████████████████████████████████      | 960/1121 [03:16<00:35,  4.51it/s, loss=0.42035127, avg=0.95706642, gpEpoch 56/100:  86%|███████████████████████████████████      | 960/1121 [03:16<00:35,  4.51it/s, loss=0.35607865, avg=0.95644104, gpEpoch 56/100:  86%|███████████████████████████████████▏     | 961/1121 [03:16<00:31,  5.07it/s, loss=0.35607865, avg=0.95644104, gpEpoch 56/100:  86%|███████████████████████████████████▏     | 961/1121 [03:16<00:31,  5.07it/s, loss=0.21090251, avg=0.95566605, gpEpoch 56/100:  86%|███████████████████████████████████▏     | 962/1121 [03:16<00:28,  5.59it/s, loss=0.21090251, avg=0.95566605, gpEpoch 56/100:  86%|███████████████████████████████████▏     | 962/1121 [03:16<00:28,  5.59it/s, loss=0.98601872, avg=0.95569757, gpEpoch 56/100:  86%|███████████████████████████████████▏     | 963/1121 [03:16<00:26,  5.98it/s, loss=0.98601872, avg=0.95569757, gpEpoch 56/100:  86%|███████████████████████████████████▏     | 963/1121 [03:16<00:26,  5.98it/s, loss=0.46078143, avg=0.95518417, gpEpoch 56/100:  86%|███████████████████████████████████▎     | 964/1121 [03:16<00:39,  3.94it/s, loss=0.46078143, avg=0.95518417, gpEpoch 56/100:  86%|███████████████████████████████████▎     | 964/1121 [03:17<00:39,  3.94it/s, loss=0.28823391, avg=0.95449303, gpEpoch 56/100:  86%|███████████████████████████████████▎     | 965/1121 [03:17<00:34,  4.52it/s, loss=0.28823391, avg=0.95449303, gpEpoch 56/100:  86%|███████████████████████████████████▎     | 965/1121 [03:17<00:34,  4.52it/s, loss=0.79781497, avg=0.95433084, gpEpoch 56/100:  86%|███████████████████████████████████▎     | 966/1121 [03:17<00:30,  5.04it/s, loss=0.79781497, avg=0.95433084, gpEpoch 56/100:  86%|███████████████████████████████████▎     | 966/1121 [03:17<00:30,  5.04it/s, loss=0.30876157, avg=0.95366324, gpEpoch 56/100:  86%|███████████████████████████████████▎     | 967/1121 [03:17<00:27,  5.52it/s, loss=0.30876157, avg=0.95366324, gpEpoch 56/100:  86%|███████████████████████████████████▎     | 967/1121 [03:17<00:27,  5.52it/s, loss=0.21111983, avg=0.95289615, gpEpoch 56/100:  86%|███████████████████████████████████▍     | 968/1121 [03:17<00:35,  4.29it/s, loss=0.21111983, avg=0.95289615, gpEpoch 56/100:  86%|███████████████████████████████████▍     | 968/1121 [03:17<00:35,  4.29it/s, loss=0.39557445, avg=0.95232100, gpEpoch 56/100:  86%|███████████████████████████████████▍     | 969/1121 [03:17<00:31,  4.83it/s, loss=0.39557445, avg=0.95232100, gpEpoch 56/100:  86%|███████████████████████████████████▍     | 969/1121 [03:17<00:31,  4.83it/s, loss=0.65776420, avg=0.95201733, gpEpoch 56/100:  87%|███████████████████████████████████▍     | 970/1121 [03:17<00:28,  5.33it/s, loss=0.65776420, avg=0.95201733, gpEpoch 56/100:  87%|███████████████████████████████████▍     | 970/1121 [03:18<00:28,  5.33it/s, loss=2.78065252, avg=0.95390058, gpEpoch 56/100:  87%|███████████████████████████████████▌     | 971/1121 [03:18<00:25,  5.90it/s, loss=2.78065252, avg=0.95390058, gpEpoch 56/100:  87%|███████████████████████████████████▌     | 971/1121 [03:18<00:25,  5.90it/s, loss=0.32436815, avg=0.95325291, gpEpoch 56/100:  87%|███████████████████████████████████▌     | 972/1121 [03:18<00:35,  4.21it/s, loss=0.32436815, avg=0.95325291, gpEpoch 56/100:  87%|███████████████████████████████████▌     | 972/1121 [03:18<00:35,  4.21it/s, loss=4.18805456, avg=0.95657748, gpEpoch 56/100:  87%|███████████████████████████████████▌     | 973/1121 [03:18<00:31,  4.74it/s, loss=4.18805456, avg=0.95657748, gpEpoch 56/100:  87%|███████████████████████████████████▌     | 973/1121 [03:18<00:31,  4.74it/s, loss=3.94573855, avg=0.95964643, gpEpoch 56/100:  87%|███████████████████████████████████▌     | 974/1121 [03:18<00:28,  5.22it/s, loss=3.94573855, avg=0.95964643, gpEpoch 56/100:  87%|███████████████████████████████████▌     | 974/1121 [03:18<00:28,  5.22it/s, loss=0.34354717, avg=0.95901454, gpEpoch 56/100:  87%|███████████████████████████████████▋     | 975/1121 [03:18<00:25,  5.70it/s, loss=0.34354717, avg=0.95901454, gpEpoch 56/100:  87%|███████████████████████████████████▋     | 975/1121 [03:19<00:25,  5.70it/s, loss=3.55281067, avg=0.96167211, gpEpoch 56/100:  87%|███████████████████████████████████▋     | 976/1121 [03:19<00:36,  3.94it/s, loss=3.55281067, avg=0.96167211, gpEpoch 56/100:  87%|███████████████████████████████████▋     | 976/1121 [03:19<00:36,  3.94it/s, loss=0.27978766, avg=0.96097418, gpEpoch 56/100:  87%|███████████████████████████████████▋     | 977/1121 [03:19<00:31,  4.50it/s, loss=0.27978766, avg=0.96097418, gpEpoch 56/100:  87%|███████████████████████████████████▋     | 977/1121 [03:19<00:31,  4.50it/s, loss=0.52409184, avg=0.96052747, gpEpoch 56/100:  87%|███████████████████████████████████▊     | 978/1121 [03:19<00:28,  5.06it/s, loss=0.52409184, avg=0.96052747, gpEpoch 56/100:  87%|███████████████████████████████████▊     | 978/1121 [03:19<00:28,  5.06it/s, loss=0.42093784, avg=0.95997630, gpEpoch 56/100:  87%|███████████████████████████████████▊     | 979/1121 [03:19<00:25,  5.60it/s, loss=0.42093784, avg=0.95997630, gpEpoch 56/100:  87%|███████████████████████████████████▊     | 979/1121 [03:20<00:25,  5.60it/s, loss=0.77653420, avg=0.95978912, gpEpoch 56/100:  87%|███████████████████████████████████▊     | 980/1121 [03:20<00:34,  4.03it/s, loss=0.77653420, avg=0.95978912, gpEpoch 56/100:  87%|███████████████████████████████████▊     | 980/1121 [03:20<00:34,  4.03it/s, loss=0.26390201, avg=0.95907975, gpEpoch 56/100:  88%|███████████████████████████████████▉     | 981/1121 [03:20<00:30,  4.60it/s, loss=0.26390201, avg=0.95907975, gpEpoch 56/100:  88%|███████████████████████████████████▉     | 981/1121 [03:20<00:30,  4.60it/s, loss=0.32937387, avg=0.95843850, gpEpoch 56/100:  88%|███████████████████████████████████▉     | 982/1121 [03:20<00:26,  5.22it/s, loss=0.32937387, avg=0.95843850, gpEpoch 56/100:  88%|███████████████████████████████████▉     | 982/1121 [03:20<00:26,  5.22it/s, loss=0.85673469, avg=0.95833504, gpEpoch 56/100:  88%|███████████████████████████████████▉     | 983/1121 [03:20<00:23,  5.77it/s, loss=0.85673469, avg=0.95833504, gpEpoch 56/100:  88%|███████████████████████████████████▉     | 983/1121 [03:21<00:23,  5.77it/s, loss=1.07790565, avg=0.95845656, gpEpoch 56/100:  88%|███████████████████████████████████▉     | 984/1121 [03:21<00:35,  3.84it/s, loss=1.07790565, avg=0.95845656, gpEpoch 56/100:  88%|███████████████████████████████████▉     | 984/1121 [03:21<00:35,  3.84it/s, loss=0.69572043, avg=0.95818982, gpEpoch 56/100:  88%|████████████████████████████████████     | 985/1121 [03:21<00:30,  4.48it/s, loss=0.69572043, avg=0.95818982, gpEpoch 56/100:  88%|████████████████████████████████████     | 985/1121 [03:21<00:30,  4.48it/s, loss=2.35839868, avg=0.95960991, gpEpoch 56/100:  88%|████████████████████████████████████     | 986/1121 [03:21<00:26,  5.19it/s, loss=2.35839868, avg=0.95960991, gpEpoch 56/100:  88%|████████████████████████████████████     | 986/1121 [03:21<00:26,  5.19it/s, loss=0.28804779, avg=0.95892950, gpEpoch 56/100:  88%|████████████████████████████████████     | 987/1121 [03:21<00:22,  5.84it/s, loss=0.28804779, avg=0.95892950, gpEpoch 56/100:  88%|████████████████████████████████████     | 987/1121 [03:21<00:22,  5.84it/s, loss=1.65477538, avg=0.95963380, gpEpoch 56/100:  88%|████████████████████████████████████▏    | 988/1121 [03:21<00:31,  4.24it/s, loss=1.65477538, avg=0.95963380, gpEpoch 56/100:  88%|████████████████████████████████████▏    | 988/1121 [03:21<00:31,  4.24it/s, loss=0.41675669, avg=0.95908488, gpEpoch 56/100:  88%|████████████████████████████████████▏    | 989/1121 [03:21<00:27,  4.86it/s, loss=0.41675669, avg=0.95908488, gpEpoch 56/100:  88%|████████████████████████████████████▏    | 989/1121 [03:22<00:27,  4.86it/s, loss=0.24323568, avg=0.95836180, gpEpoch 56/100:  88%|████████████████████████████████████▏    | 990/1121 [03:22<00:23,  5.49it/s, loss=0.24323568, avg=0.95836180, gpEpoch 56/100:  88%|████████████████████████████████████▏    | 990/1121 [03:22<00:23,  5.49it/s, loss=0.21433100, avg=0.95761102, gpEpoch 56/100:  88%|████████████████████████████████████▏    | 991/1121 [03:22<00:21,  6.02it/s, loss=0.21433100, avg=0.95761102, gpEpoch 56/100:  88%|████████████████████████████████████▏    | 991/1121 [03:22<00:21,  6.02it/s, loss=0.14924441, avg=0.95679613, gpEpoch 56/100:  88%|████████████████████████████████████▎    | 992/1121 [03:22<00:31,  4.14it/s, loss=0.14924441, avg=0.95679613, gpEpoch 56/100:  88%|████████████████████████████████████▎    | 992/1121 [03:22<00:31,  4.14it/s, loss=0.43119442, avg=0.95626682, gpEpoch 56/100:  89%|████████████████████████████████████▎    | 993/1121 [03:22<00:26,  4.78it/s, loss=0.43119442, avg=0.95626682, gpEpoch 56/100:  89%|████████████████████████████████████▎    | 993/1121 [03:22<00:26,  4.78it/s, loss=1.20396090, avg=0.95651601, gpEpoch 56/100:  89%|████████████████████████████████████▎    | 994/1121 [03:22<00:23,  5.40it/s, loss=1.20396090, avg=0.95651601, gpEpoch 56/100:  89%|████████████████████████████████████▎    | 994/1121 [03:23<00:23,  5.40it/s, loss=1.11412835, avg=0.95667442, gpEpoch 56/100:  89%|████████████████████████████████████▍    | 995/1121 [03:23<00:21,  5.99it/s, loss=1.11412835, avg=0.95667442, gpEpoch 56/100:  89%|████████████████████████████████████▍    | 995/1121 [03:23<00:21,  5.99it/s, loss=0.26899868, avg=0.95598398, gpEpoch 56/100:  89%|████████████████████████████████████▍    | 996/1121 [03:23<00:30,  4.04it/s, loss=0.26899868, avg=0.95598398, gpEpoch 56/100:  89%|████████████████████████████████████▍    | 996/1121 [03:23<00:30,  4.04it/s, loss=0.63766557, avg=0.95566470, gpEpoch 56/100:  89%|████████████████████████████████████▍    | 997/1121 [03:23<00:26,  4.72it/s, loss=0.63766557, avg=0.95566470, gpEpoch 56/100:  89%|████████████████████████████████████▍    | 997/1121 [03:23<00:26,  4.72it/s, loss=0.47375512, avg=0.95518183, gpEpoch 56/100:  89%|████████████████████████████████████▌    | 998/1121 [03:23<00:22,  5.40it/s, loss=0.47375512, avg=0.95518183, gpEpoch 56/100:  89%|████████████████████████████████████▌    | 998/1121 [03:23<00:22,  5.40it/s, loss=0.13054164, avg=0.95435636, gpEpoch 56/100:  89%|████████████████████████████████████▌    | 999/1121 [03:23<00:20,  5.97it/s, loss=0.13054164, avg=0.95435636, gpEpoch 56/100:  89%|████████████████████████████████████▌    | 999/1121 [03:24<00:20,  5.97it/s, loss=2.61729574, avg=0.95601930, gpEpoch 56/100:  89%|███████████████████████████████████▋    | 1000/1121 [03:24<00:28,  4.20it/s, loss=2.61729574, avg=0.95601930, gpEpoch 56/100:  89%|███████████████████████████████████▋    | 1000/1121 [03:24<00:28,  4.20it/s, loss=0.35544354, avg=0.95541933, gpEpoch 56/100:  89%|███████████████████████████████████▋    | 1001/1121 [03:24<00:25,  4.77it/s, loss=0.35544354, avg=0.95541933, gpEpoch 56/100:  89%|███████████████████████████████████▋    | 1001/1121 [03:24<00:25,  4.77it/s, loss=0.58594710, avg=0.95505059, gpEpoch 56/100:  89%|███████████████████████████████████▊    | 1002/1121 [03:24<00:22,  5.32it/s, loss=0.58594710, avg=0.95505059, gpEpoch 56/100:  89%|███████████████████████████████████▊    | 1002/1121 [03:24<00:22,  5.32it/s, loss=0.65204358, avg=0.95474849, gpEpoch 56/100:  89%|███████████████████████████████████▊    | 1003/1121 [03:24<00:20,  5.78it/s, loss=0.65204358, avg=0.95474849, gpEpoch 56/100:  89%|███████████████████████████████████▊    | 1003/1121 [03:24<00:20,  5.78it/s, loss=0.54392755, avg=0.95433931, gpEpoch 56/100:  90%|███████████████████████████████████▊    | 1004/1121 [03:24<00:26,  4.46it/s, loss=0.54392755, avg=0.95433931, gpEpoch 56/100:  90%|███████████████████████████████████▊    | 1004/1121 [03:25<00:26,  4.46it/s, loss=0.20238103, avg=0.95359109, gpEpoch 56/100:  90%|███████████████████████████████████▊    | 1005/1121 [03:25<00:23,  4.92it/s, loss=0.20238103, avg=0.95359109, gpEpoch 56/100:  90%|███████████████████████████████████▊    | 1005/1121 [03:25<00:23,  4.92it/s, loss=1.53675842, avg=0.95417078, gpEpoch 56/100:  90%|███████████████████████████████████▉    | 1006/1121 [03:25<00:21,  5.40it/s, loss=1.53675842, avg=0.95417078, gpEpoch 56/100:  90%|███████████████████████████████████▉    | 1006/1121 [03:25<00:21,  5.40it/s, loss=1.19787169, avg=0.95441279, gpEpoch 56/100:  90%|███████████████████████████████████▉    | 1007/1121 [03:25<00:19,  5.86it/s, loss=1.19787169, avg=0.95441279, gpEpoch 56/100:  90%|███████████████████████████████████▉    | 1007/1121 [03:25<00:19,  5.86it/s, loss=1.02349329, avg=0.95448132, gpEpoch 56/100:  90%|███████████████████████████████████▉    | 1008/1121 [03:25<00:27,  4.16it/s, loss=1.02349329, avg=0.95448132, gpEpoch 56/100:  90%|███████████████████████████████████▉    | 1008/1121 [03:25<00:27,  4.16it/s, loss=1.49172854, avg=0.95501377, gpEpoch 56/100:  90%|████████████████████████████████████    | 1009/1121 [03:25<00:23,  4.67it/s, loss=1.49172854, avg=0.95501377, gpEpoch 56/100:  90%|████████████████████████████████████    | 1009/1121 [03:26<00:23,  4.67it/s, loss=1.04935956, avg=0.95510718, gpEpoch 56/100:  90%|████████████████████████████████████    | 1010/1121 [03:26<00:21,  5.22it/s, loss=1.04935956, avg=0.95510718, gpEpoch 56/100:  90%|████████████████████████████████████    | 1010/1121 [03:26<00:21,  5.22it/s, loss=0.09488814, avg=0.95425632, gpEpoch 56/100:  90%|████████████████████████████████████    | 1011/1121 [03:26<00:19,  5.67it/s, loss=0.09488814, avg=0.95425632, gpEpoch 56/100:  90%|████████████████████████████████████    | 1011/1121 [03:26<00:19,  5.67it/s, loss=0.72937196, avg=0.95403411, gpEpoch 56/100:  90%|████████████████████████████████████    | 1012/1121 [03:26<00:24,  4.40it/s, loss=0.72937196, avg=0.95403411, gpEpoch 56/100:  90%|████████████████████████████████████    | 1012/1121 [03:26<00:24,  4.40it/s, loss=0.31386381, avg=0.95340215, gpEpoch 56/100:  90%|████████████████████████████████████▏   | 1013/1121 [03:26<00:21,  4.96it/s, loss=0.31386381, avg=0.95340215, gpEpoch 56/100:  90%|████████████████████████████████████▏   | 1013/1121 [03:26<00:21,  4.96it/s, loss=1.41967702, avg=0.95386199, gpEpoch 56/100:  90%|████████████████████████████████████▏   | 1014/1121 [03:26<00:19,  5.51it/s, loss=1.41967702, avg=0.95386199, gpEpoch 56/100:  90%|████████████████████████████████████▏   | 1014/1121 [03:27<00:19,  5.51it/s, loss=1.07271147, avg=0.95397908, gpEpoch 56/100:  91%|████████████████████████████████████▏   | 1015/1121 [03:27<00:17,  6.08it/s, loss=1.07271147, avg=0.95397908, gpEpoch 56/100:  91%|████████████████████████████████████▏   | 1015/1121 [03:27<00:17,  6.08it/s, loss=4.48683262, avg=0.95745630, gpEpoch 56/100:  91%|████████████████████████████████████▎   | 1016/1121 [03:27<00:23,  4.48it/s, loss=4.48683262, avg=0.95745630, gpEpoch 56/100:  91%|████████████████████████████████████▎   | 1016/1121 [03:27<00:23,  4.48it/s, loss=1.82166004, avg=0.95830606, gpEpoch 56/100:  91%|████████████████████████████████████▎   | 1017/1121 [03:27<00:20,  5.14it/s, loss=1.82166004, avg=0.95830606, gpEpoch 56/100:  91%|████████████████████████████████████▎   | 1017/1121 [03:27<00:20,  5.14it/s, loss=0.59854162, avg=0.95795266, gpEpoch 56/100:  91%|████████████████████████████████████▎   | 1018/1121 [03:27<00:18,  5.59it/s, loss=0.59854162, avg=0.95795266, gpEpoch 56/100:  91%|████████████████████████████████████▎   | 1018/1121 [03:27<00:18,  5.59it/s, loss=0.23757815, avg=0.95724571, gpEpoch 56/100:  91%|████████████████████████████████████▎   | 1019/1121 [03:27<00:17,  5.99it/s, loss=0.23757815, avg=0.95724571, gpEpoch 56/100:  91%|████████████████████████████████████▎   | 1019/1121 [03:28<00:17,  5.99it/s, loss=1.99030972, avg=0.95825852, gpEpoch 56/100:  91%|████████████████████████████████████▍   | 1020/1121 [03:28<00:23,  4.24it/s, loss=1.99030972, avg=0.95825852, gpEpoch 56/100:  91%|████████████████████████████████████▍   | 1020/1121 [03:28<00:23,  4.24it/s, loss=0.41879934, avg=0.95773016, gpEpoch 56/100:  91%|████████████████████████████████████▍   | 1021/1121 [03:28<00:20,  4.82it/s, loss=0.41879934, avg=0.95773016, gpEpoch 56/100:  91%|████████████████████████████████████▍   | 1021/1121 [03:28<00:20,  4.82it/s, loss=0.17722368, avg=0.95696645, gpEpoch 56/100:  91%|████████████████████████████████████▍   | 1022/1121 [03:28<00:18,  5.41it/s, loss=0.17722368, avg=0.95696645, gpEpoch 56/100:  91%|████████████████████████████████████▍   | 1022/1121 [03:28<00:18,  5.41it/s, loss=4.41948605, avg=0.96035112, gpEpoch 56/100:  91%|████████████████████████████████████▌   | 1023/1121 [03:28<00:16,  5.79it/s, loss=4.41948605, avg=0.96035112, gpEpoch 56/100:  91%|████████████████████████████████████▌   | 1023/1121 [03:28<00:16,  5.79it/s, loss=0.79788363, avg=0.96019246, gpEpoch 56/100:  91%|████████████████████████████████████▌   | 1024/1121 [03:28<00:23,  4.17it/s, loss=0.79788363, avg=0.96019246, gpEpoch 56/100:  91%|████████████████████████████████████▌   | 1024/1121 [03:29<00:23,  4.17it/s, loss=0.17302370, avg=0.95942449, gpEpoch 56/100:  91%|████████████████████████████████████▌   | 1025/1121 [03:29<00:19,  4.83it/s, loss=0.17302370, avg=0.95942449, gpEpoch 56/100:  91%|████████████████████████████████████▌   | 1025/1121 [03:29<00:19,  4.83it/s, loss=0.23790246, avg=0.95872126, gpEpoch 56/100:  92%|████████████████████████████████████▌   | 1026/1121 [03:29<00:17,  5.41it/s, loss=0.23790246, avg=0.95872126, gpEpoch 56/100:  92%|████████████████████████████████████▌   | 1026/1121 [03:29<00:17,  5.41it/s, loss=5.25241756, avg=0.96290207, gpEpoch 56/100:  92%|████████████████████████████████████▋   | 1027/1121 [03:29<00:16,  5.83it/s, loss=5.25241756, avg=0.96290207, gpEpoch 56/100:  92%|████████████████████████████████████▋   | 1027/1121 [03:29<00:16,  5.83it/s, loss=0.23850438, avg=0.96219740, gpEpoch 56/100:  92%|████████████████████████████████████▋   | 1028/1121 [03:29<00:22,  4.18it/s, loss=0.23850438, avg=0.96219740, gpEpoch 56/100:  92%|████████████████████████████████████▋   | 1028/1121 [03:29<00:22,  4.18it/s, loss=0.28768057, avg=0.96154190, gpEpoch 56/100:  92%|████████████████████████████████████▋   | 1029/1121 [03:29<00:18,  4.86it/s, loss=0.28768057, avg=0.96154190, gpEpoch 56/100:  92%|████████████████████████████████████▋   | 1029/1121 [03:30<00:18,  4.86it/s, loss=0.39676017, avg=0.96099357, gpEpoch 56/100:  92%|████████████████████████████████████▊   | 1030/1121 [03:30<00:16,  5.45it/s, loss=0.39676017, avg=0.96099357, gpEpoch 56/100:  92%|████████████████████████████████████▊   | 1030/1121 [03:30<00:16,  5.45it/s, loss=0.48998100, avg=0.96053672, gpEpoch 56/100:  92%|████████████████████████████████████▊   | 1031/1121 [03:30<00:15,  5.89it/s, loss=0.48998100, avg=0.96053672, gpEpoch 56/100:  92%|████████████████████████████████████▊   | 1031/1121 [03:30<00:15,  5.89it/s, loss=0.24540105, avg=0.95984375, gpEpoch 56/100:  92%|████████████████████████████████████▊   | 1032/1121 [03:30<00:21,  4.23it/s, loss=0.24540105, avg=0.95984375, gpEpoch 56/100:  92%|████████████████████████████████████▊   | 1032/1121 [03:30<00:21,  4.23it/s, loss=4.09980249, avg=0.96288340, gpEpoch 56/100:  92%|████████████████████████████████████▊   | 1033/1121 [03:30<00:18,  4.86it/s, loss=4.09980249, avg=0.96288340, gpEpoch 56/100:  92%|████████████████████████████████████▊   | 1033/1121 [03:30<00:18,  4.86it/s, loss=0.25153977, avg=0.96219545, gpEpoch 56/100:  92%|████████████████████████████████████▉   | 1034/1121 [03:30<00:15,  5.45it/s, loss=0.25153977, avg=0.96219545, gpEpoch 56/100:  92%|████████████████████████████████████▉   | 1034/1121 [03:30<00:15,  5.45it/s, loss=0.37495735, avg=0.96162807, gpEpoch 56/100:  92%|████████████████████████████████████▉   | 1035/1121 [03:30<00:14,  5.90it/s, loss=0.37495735, avg=0.96162807, gpEpoch 56/100:  92%|████████████████████████████████████▉   | 1035/1121 [03:31<00:14,  5.90it/s, loss=0.10879312, avg=0.96080487, gpEpoch 56/100:  92%|████████████████████████████████████▉   | 1036/1121 [03:31<00:20,  4.20it/s, loss=0.10879312, avg=0.96080487, gpEpoch 56/100:  92%|████████████████████████████████████▉   | 1036/1121 [03:31<00:20,  4.20it/s, loss=0.86823547, avg=0.96071561, gpEpoch 56/100:  93%|█████████████████████████████████████   | 1037/1121 [03:31<00:17,  4.90it/s, loss=0.86823547, avg=0.96071561, gpEpoch 56/100:  93%|█████████████████████████████████████   | 1037/1121 [03:31<00:17,  4.90it/s, loss=0.92225993, avg=0.96067856, gpEpoch 56/100:  93%|█████████████████████████████████████   | 1038/1121 [03:31<00:15,  5.48it/s, loss=0.92225993, avg=0.96067856, gpEpoch 56/100:  93%|█████████████████████████████████████   | 1038/1121 [03:31<00:15,  5.48it/s, loss=0.72429049, avg=0.96045104, gpEpoch 56/100:  93%|█████████████████████████████████████   | 1039/1121 [03:31<00:13,  5.88it/s, loss=0.72429049, avg=0.96045104, gpEpoch 56/100:  93%|█████████████████████████████████████   | 1039/1121 [03:32<00:13,  5.88it/s, loss=0.18352330, avg=0.95970400, gpEpoch 56/100:  93%|█████████████████████████████████████   | 1040/1121 [03:32<00:18,  4.41it/s, loss=0.18352330, avg=0.95970400, gpEpoch 56/100:  93%|█████████████████████████████████████   | 1040/1121 [03:32<00:18,  4.41it/s, loss=0.77392524, avg=0.95952553, gpEpoch 56/100:  93%|█████████████████████████████████████▏  | 1041/1121 [03:32<00:15,  5.03it/s, loss=0.77392524, avg=0.95952553, gpEpoch 56/100:  93%|█████████████████████████████████████▏  | 1041/1121 [03:32<00:15,  5.03it/s, loss=0.36435565, avg=0.95895435, gpEpoch 56/100:  93%|█████████████████████████████████████▏  | 1042/1121 [03:32<00:14,  5.61it/s, loss=0.36435565, avg=0.95895435, gpEpoch 56/100:  93%|█████████████████████████████████████▏  | 1042/1121 [03:32<00:14,  5.61it/s, loss=4.52851295, avg=0.96237675, gpEpoch 56/100:  93%|█████████████████████████████████████▏  | 1043/1121 [03:32<00:13,  5.98it/s, loss=4.52851295, avg=0.96237675, gpEpoch 56/100:  93%|█████████████████████████████████████▏  | 1043/1121 [03:32<00:13,  5.98it/s, loss=3.26379728, avg=0.96458118, gpEpoch 56/100:  93%|█████████████████████████████████████▎  | 1044/1121 [03:32<00:17,  4.35it/s, loss=3.26379728, avg=0.96458118, gpEpoch 56/100:  93%|█████████████████████████████████████▎  | 1044/1121 [03:33<00:17,  4.35it/s, loss=1.45443523, avg=0.96504994, gpEpoch 56/100:  93%|█████████████████████████████████████▎  | 1045/1121 [03:33<00:15,  5.01it/s, loss=1.45443523, avg=0.96504994, gpEpoch 56/100:  93%|█████████████████████████████████████▎  | 1045/1121 [03:33<00:15,  5.01it/s, loss=0.28750005, avg=0.96440218, gpEpoch 56/100:  93%|█████████████████████████████████████▎  | 1046/1121 [03:33<00:13,  5.60it/s, loss=0.28750005, avg=0.96440218, gpEpoch 56/100:  93%|█████████████████████████████████████▎  | 1046/1121 [03:33<00:13,  5.60it/s, loss=1.75819647, avg=0.96516034, gpEpoch 56/100:  93%|█████████████████████████████████████▎  | 1047/1121 [03:33<00:12,  5.92it/s, loss=1.75819647, avg=0.96516034, gpEpoch 56/100:  93%|█████████████████████████████████████▎  | 1047/1121 [03:33<00:12,  5.92it/s, loss=0.56037116, avg=0.96477409, gpEpoch 56/100:  93%|█████████████████████████████████████▍  | 1048/1121 [03:33<00:17,  4.23it/s, loss=0.56037116, avg=0.96477409, gpEpoch 56/100:  93%|█████████████████████████████████████▍  | 1048/1121 [03:33<00:17,  4.23it/s, loss=0.30841249, avg=0.96414839, gpEpoch 56/100:  94%|█████████████████████████████████████▍  | 1049/1121 [03:33<00:14,  4.89it/s, loss=0.30841249, avg=0.96414839, gpEpoch 56/100:  94%|█████████████████████████████████████▍  | 1049/1121 [03:33<00:14,  4.89it/s, loss=0.34466106, avg=0.96355840, gpEpoch 56/100:  94%|█████████████████████████████████████▍  | 1050/1121 [03:33<00:13,  5.43it/s, loss=0.34466106, avg=0.96355840, gpEpoch 56/100:  94%|█████████████████████████████████████▍  | 1050/1121 [03:34<00:13,  5.43it/s, loss=0.44604450, avg=0.96306600, gpEpoch 56/100:  94%|█████████████████████████████████████▌  | 1051/1121 [03:34<00:12,  5.80it/s, loss=0.44604450, avg=0.96306600, gpEpoch 56/100:  94%|█████████████████████████████████████▌  | 1051/1121 [03:34<00:12,  5.80it/s, loss=0.67670858, avg=0.96279380, gpEpoch 56/100:  94%|█████████████████████████████████████▌  | 1052/1121 [03:34<00:17,  4.04it/s, loss=0.67670858, avg=0.96279380, gpEpoch 56/100:  94%|█████████████████████████████████████▌  | 1052/1121 [03:34<00:17,  4.04it/s, loss=0.30078226, avg=0.96216511, gpEpoch 56/100:  94%|█████████████████████████████████████▌  | 1053/1121 [03:34<00:14,  4.75it/s, loss=0.30078226, avg=0.96216511, gpEpoch 56/100:  94%|█████████████████████████████████████▌  | 1053/1121 [03:34<00:14,  4.75it/s, loss=3.42486572, avg=0.96450164, gpEpoch 56/100:  94%|█████████████████████████████████████▌  | 1054/1121 [03:34<00:12,  5.31it/s, loss=3.42486572, avg=0.96450164, gpEpoch 56/100:  94%|█████████████████████████████████████▌  | 1054/1121 [03:34<00:12,  5.31it/s, loss=0.85614872, avg=0.96439893, gpEpoch 56/100:  94%|█████████████████████████████████████▋  | 1055/1121 [03:34<00:11,  5.91it/s, loss=0.85614872, avg=0.96439893, gpEpoch 56/100:  94%|█████████████████████████████████████▋  | 1055/1121 [03:35<00:11,  5.91it/s, loss=0.16773856, avg=0.96364452, gpEpoch 56/100:  94%|█████████████████████████████████████▋  | 1056/1121 [03:35<00:16,  3.97it/s, loss=0.16773856, avg=0.96364452, gpEpoch 56/100:  94%|█████████████████████████████████████▋  | 1056/1121 [03:35<00:16,  3.97it/s, loss=1.58325684, avg=0.96423072, gpEpoch 56/100:  94%|█████████████████████████████████████▋  | 1057/1121 [03:35<00:13,  4.67it/s, loss=1.58325684, avg=0.96423072, gpEpoch 56/100:  94%|█████████████████████████████████████▋  | 1057/1121 [03:35<00:13,  4.67it/s, loss=0.69291681, avg=0.96397428, gpEpoch 56/100:  94%|█████████████████████████████████████▊  | 1058/1121 [03:35<00:12,  5.21it/s, loss=0.69291681, avg=0.96397428, gpEpoch 56/100:  94%|█████████████████████████████████████▊  | 1058/1121 [03:35<00:12,  5.21it/s, loss=0.28418216, avg=0.96333236, gpEpoch 56/100:  94%|█████████████████████████████████████▊  | 1059/1121 [03:35<00:10,  5.70it/s, loss=0.28418216, avg=0.96333236, gpEpoch 56/100:  94%|█████████████████████████████████████▊  | 1059/1121 [03:36<00:10,  5.70it/s, loss=0.99448043, avg=0.96336174, gpEpoch 56/100:  95%|█████████████████████████████████████▊  | 1060/1121 [03:36<00:15,  3.93it/s, loss=0.99448043, avg=0.96336174, gpEpoch 56/100:  95%|█████████████████████████████████████▊  | 1060/1121 [03:36<00:15,  3.93it/s, loss=0.85697103, avg=0.96326147, gpEpoch 56/100:  95%|█████████████████████████████████████▊  | 1061/1121 [03:36<00:13,  4.55it/s, loss=0.85697103, avg=0.96326147, gpEpoch 56/100:  95%|█████████████████████████████████████▊  | 1061/1121 [03:36<00:13,  4.55it/s, loss=4.18533468, avg=0.96629544, gpEpoch 56/100:  95%|█████████████████████████████████████▉  | 1062/1121 [03:36<00:11,  5.20it/s, loss=4.18533468, avg=0.96629544, gpEpoch 56/100:  95%|█████████████████████████████████████▉  | 1062/1121 [03:36<00:11,  5.20it/s, loss=0.50324023, avg=0.96585983, gpEpoch 56/100:  95%|█████████████████████████████████████▉  | 1063/1121 [03:36<00:09,  5.89it/s, loss=0.50324023, avg=0.96585983, gpEpoch 56/100:  95%|█████████████████████████████████████▉  | 1063/1121 [03:37<00:09,  5.89it/s, loss=0.33841848, avg=0.96527012, gpEpoch 56/100:  95%|█████████████████████████████████████▉  | 1064/1121 [03:37<00:13,  4.11it/s, loss=0.33841848, avg=0.96527012, gpEpoch 56/100:  95%|█████████████████████████████████████▉  | 1064/1121 [03:37<00:13,  4.11it/s, loss=0.62687349, avg=0.96495238, gpEpoch 56/100:  95%|██████████████████████████████████████  | 1065/1121 [03:37<00:11,  4.76it/s, loss=0.62687349, avg=0.96495238, gpEpoch 56/100:  95%|██████████████████████████████████████  | 1065/1121 [03:37<00:11,  4.76it/s, loss=0.67631692, avg=0.96468162, gpEpoch 56/100:  95%|██████████████████████████████████████  | 1066/1121 [03:37<00:10,  5.36it/s, loss=0.67631692, avg=0.96468162, gpEpoch 56/100:  95%|██████████████████████████████████████  | 1066/1121 [03:37<00:10,  5.36it/s, loss=0.61667883, avg=0.96435547, gpEpoch 56/100:  95%|██████████████████████████████████████  | 1067/1121 [03:37<00:09,  5.92it/s, loss=0.61667883, avg=0.96435547, gpEpoch 56/100:  95%|██████████████████████████████████████  | 1067/1121 [03:37<00:09,  5.92it/s, loss=0.20492293, avg=0.96364439, gpEpoch 56/100:  95%|██████████████████████████████████████  | 1068/1121 [03:37<00:12,  4.29it/s, loss=0.20492293, avg=0.96364439, gpEpoch 56/100:  95%|██████████████████████████████████████  | 1068/1121 [03:37<00:12,  4.29it/s, loss=0.40882355, avg=0.96312538, gpEpoch 56/100:  95%|██████████████████████████████████████▏ | 1069/1121 [03:37<00:10,  4.84it/s, loss=0.40882355, avg=0.96312538, gpEpoch 56/100:  95%|██████████████████████████████████████▏ | 1069/1121 [03:38<00:10,  4.84it/s, loss=0.40009451, avg=0.96259918, gpEpoch 56/100:  95%|██████████████████████████████████████▏ | 1070/1121 [03:38<00:09,  5.39it/s, loss=0.40009451, avg=0.96259918, gpEpoch 56/100:  95%|██████████████████████████████████████▏ | 1070/1121 [03:38<00:09,  5.39it/s, loss=1.16921234, avg=0.96279210, gpEpoch 56/100:  96%|██████████████████████████████████████▏ | 1071/1121 [03:38<00:08,  5.88it/s, loss=1.16921234, avg=0.96279210, gpEpoch 56/100:  96%|██████████████████████████████████████▏ | 1071/1121 [03:38<00:08,  5.88it/s, loss=4.82897377, avg=0.96639861, gpEpoch 56/100:  96%|██████████████████████████████████████▎ | 1072/1121 [03:38<00:11,  4.28it/s, loss=4.82897377, avg=0.96639861, gpEpoch 56/100:  96%|██████████████████████████████████████▎ | 1072/1121 [03:38<00:11,  4.28it/s, loss=0.53367800, avg=0.96599533, gpEpoch 56/100:  96%|██████████████████████████████████████▎ | 1073/1121 [03:38<00:10,  4.77it/s, loss=0.53367800, avg=0.96599533, gpEpoch 56/100:  96%|██████████████████████████████████████▎ | 1073/1121 [03:38<00:10,  4.77it/s, loss=0.90574425, avg=0.96593923, gpEpoch 56/100:  96%|██████████████████████████████████████▎ | 1074/1121 [03:38<00:08,  5.29it/s, loss=0.90574425, avg=0.96593923, gpEpoch 56/100:  96%|██████████████████████████████████████▎ | 1074/1121 [03:39<00:08,  5.29it/s, loss=3.96564603, avg=0.96872965, gpEpoch 56/100:  96%|██████████████████████████████████████▎ | 1075/1121 [03:39<00:07,  5.78it/s, loss=3.96564603, avg=0.96872965, gpEpoch 56/100:  96%|██████████████████████████████████████▎ | 1075/1121 [03:39<00:07,  5.78it/s, loss=0.44243443, avg=0.96824053, gpEpoch 56/100:  96%|██████████████████████████████████████▍ | 1076/1121 [03:39<00:10,  4.28it/s, loss=0.44243443, avg=0.96824053, gpEpoch 56/100:  96%|██████████████████████████████████████▍ | 1076/1121 [03:39<00:10,  4.28it/s, loss=0.31676781, avg=0.96763564, gpEpoch 56/100:  96%|██████████████████████████████████████▍ | 1077/1121 [03:39<00:09,  4.86it/s, loss=0.31676781, avg=0.96763564, gpEpoch 56/100:  96%|██████████████████████████████████████▍ | 1077/1121 [03:39<00:09,  4.86it/s, loss=0.45703149, avg=0.96716198, gpEpoch 56/100:  96%|██████████████████████████████████████▍ | 1078/1121 [03:39<00:07,  5.49it/s, loss=0.45703149, avg=0.96716198, gpEpoch 56/100:  96%|██████████████████████████████████████▍ | 1078/1121 [03:39<00:07,  5.49it/s, loss=0.12872584, avg=0.96638493, gpEpoch 56/100:  96%|██████████████████████████████████████▌ | 1079/1121 [03:39<00:06,  6.08it/s, loss=0.12872584, avg=0.96638493, gpEpoch 56/100:  96%|██████████████████████████████████████▌ | 1079/1121 [03:40<00:06,  6.08it/s, loss=0.55838823, avg=0.96600715, gpEpoch 56/100:  96%|██████████████████████████████████████▌ | 1080/1121 [03:40<00:09,  4.35it/s, loss=0.55838823, avg=0.96600715, gpEpoch 56/100:  96%|██████████████████████████████████████▌ | 1080/1121 [03:40<00:09,  4.35it/s, loss=0.72804356, avg=0.96578702, gpEpoch 56/100:  96%|██████████████████████████████████████▌ | 1081/1121 [03:40<00:08,  4.92it/s, loss=0.72804356, avg=0.96578702, gpEpoch 56/100:  96%|██████████████████████████████████████▌ | 1081/1121 [03:40<00:08,  4.92it/s, loss=0.30046126, avg=0.96517212, gpEpoch 56/100:  97%|██████████████████████████████████████▌ | 1082/1121 [03:40<00:07,  5.51it/s, loss=0.30046126, avg=0.96517212, gpEpoch 56/100:  97%|██████████████████████████████████████▌ | 1082/1121 [03:40<00:07,  5.51it/s, loss=0.61763918, avg=0.96485122, gpEpoch 56/100:  97%|██████████████████████████████████████▋ | 1083/1121 [03:40<00:06,  6.25it/s, loss=0.61763918, avg=0.96485122, gpEpoch 56/100:  97%|██████████████████████████████████████▋ | 1083/1121 [03:41<00:06,  6.25it/s, loss=3.41699982, avg=0.96711335, gpEpoch 56/100:  97%|██████████████████████████████████████▋ | 1084/1121 [03:41<00:09,  4.01it/s, loss=3.41699982, avg=0.96711335, gpEpoch 56/100:  97%|██████████████████████████████████████▋ | 1084/1121 [03:41<00:09,  4.01it/s, loss=0.63432753, avg=0.96680663, gpEpoch 56/100:  97%|██████████████████████████████████████▋ | 1085/1121 [03:41<00:07,  4.70it/s, loss=0.63432753, avg=0.96680663, gpEpoch 56/100:  97%|██████████████████████████████████████▋ | 1085/1121 [03:41<00:07,  4.70it/s, loss=1.42851305, avg=0.96723178, gpEpoch 56/100:  97%|██████████████████████████████████████▊ | 1086/1121 [03:41<00:06,  5.16it/s, loss=1.42851305, avg=0.96723178, gpEpoch 56/100:  97%|██████████████████████████████████████▊ | 1086/1121 [03:41<00:06,  5.16it/s, loss=8.49714279, avg=0.97415902, gpEpoch 56/100:  97%|██████████████████████████████████████▊ | 1087/1121 [03:41<00:06,  5.61it/s, loss=8.49714279, avg=0.97415902, gpEpoch 56/100:  97%|██████████████████████████████████████▊ | 1087/1121 [03:41<00:06,  5.61it/s, loss=0.16076952, avg=0.97341142, gpEpoch 56/100:  97%|██████████████████████████████████████▊ | 1088/1121 [03:41<00:07,  4.17it/s, loss=0.16076952, avg=0.97341142, gpEpoch 56/100:  97%|██████████████████████████████████████▊ | 1088/1121 [03:41<00:07,  4.17it/s, loss=0.17363226, avg=0.97267700, gpEpoch 56/100:  97%|██████████████████████████████████████▊ | 1089/1121 [03:41<00:06,  4.84it/s, loss=0.17363226, avg=0.97267700, gpEpoch 56/100:  97%|██████████████████████████████████████▊ | 1089/1121 [03:42<00:06,  4.84it/s, loss=0.51823622, avg=0.97226008, gpEpoch 56/100:  97%|██████████████████████████████████████▉ | 1090/1121 [03:42<00:05,  5.41it/s, loss=0.51823622, avg=0.97226008, gpEpoch 56/100:  97%|██████████████████████████████████████▉ | 1090/1121 [03:42<00:05,  5.41it/s, loss=0.25043198, avg=0.97159846, gpEpoch 56/100:  97%|██████████████████████████████████████▉ | 1091/1121 [03:42<00:05,  5.85it/s, loss=0.25043198, avg=0.97159846, gpEpoch 56/100:  97%|██████████████████████████████████████▉ | 1091/1121 [03:42<00:05,  5.85it/s, loss=0.21348834, avg=0.97090422, gpEpoch 56/100:  97%|██████████████████████████████████████▉ | 1092/1121 [03:42<00:07,  4.12it/s, loss=0.21348834, avg=0.97090422, gpEpoch 56/100:  97%|██████████████████████████████████████▉ | 1092/1121 [03:42<00:07,  4.12it/s, loss=1.00270939, avg=0.97093332, gpEpoch 56/100:  98%|███████████████████████████████████████ | 1093/1121 [03:42<00:05,  4.74it/s, loss=1.00270939, avg=0.97093332, gpEpoch 56/100:  98%|███████████████████████████████████████ | 1093/1121 [03:42<00:05,  4.74it/s, loss=0.60634875, avg=0.97060006, gpEpoch 56/100:  98%|███████████████████████████████████████ | 1094/1121 [03:42<00:05,  5.33it/s, loss=0.60634875, avg=0.97060006, gpEpoch 56/100:  98%|███████████████████████████████████████ | 1094/1121 [03:43<00:05,  5.33it/s, loss=0.61329609, avg=0.97027376, gpEpoch 56/100:  98%|███████████████████████████████████████ | 1095/1121 [03:43<00:04,  5.69it/s, loss=0.61329609, avg=0.97027376, gpEpoch 56/100:  98%|███████████████████████████████████████ | 1095/1121 [03:43<00:04,  5.69it/s, loss=1.02356648, avg=0.97032238, gpEpoch 56/100:  98%|███████████████████████████████████████ | 1096/1121 [03:43<00:06,  4.14it/s, loss=1.02356648, avg=0.97032238, gpEpoch 56/100:  98%|███████████████████████████████████████ | 1096/1121 [03:43<00:06,  4.14it/s, loss=0.15140145, avg=0.96957587, gpEpoch 56/100:  98%|███████████████████████████████████████▏| 1097/1121 [03:43<00:05,  4.79it/s, loss=0.15140145, avg=0.96957587, gpEpoch 56/100:  98%|███████████████████████████████████████▏| 1097/1121 [03:43<00:05,  4.79it/s, loss=4.45980644, avg=0.97275459, gpEpoch 56/100:  98%|███████████████████████████████████████▏| 1098/1121 [03:43<00:04,  5.39it/s, loss=4.45980644, avg=0.97275459, gpEpoch 56/100:  98%|███████████████████████████████████████▏| 1098/1121 [03:43<00:04,  5.39it/s, loss=1.56701684, avg=0.97329532, gpEpoch 56/100:  98%|███████████████████████████████████████▏| 1099/1121 [03:43<00:03,  5.79it/s, loss=1.56701684, avg=0.97329532, gpEpoch 56/100:  98%|███████████████████████████████████████▏| 1099/1121 [03:44<00:03,  5.79it/s, loss=0.49776453, avg=0.97286302, gpEpoch 56/100:  98%|███████████████████████████████████████▎| 1100/1121 [03:44<00:05,  4.07it/s, loss=0.49776453, avg=0.97286302, gpEpoch 56/100:  98%|███████████████████████████████████████▎| 1100/1121 [03:44<00:05,  4.07it/s, loss=0.16032451, avg=0.97212502, gpEpoch 56/100:  98%|███████████████████████████████████████▎| 1101/1121 [03:44<00:04,  4.83it/s, loss=0.16032451, avg=0.97212502, gpEpoch 56/100:  98%|███████████████████████████████████████▎| 1101/1121 [03:44<00:04,  4.83it/s, loss=1.84920120, avg=0.97292091, gpEpoch 56/100:  98%|███████████████████████████████████████▎| 1102/1121 [03:44<00:03,  5.41it/s, loss=1.84920120, avg=0.97292091, gpEpoch 56/100:  98%|███████████████████████████████████████▎| 1102/1121 [03:44<00:03,  5.41it/s, loss=0.21071014, avg=0.97222988, gpEpoch 56/100:  98%|███████████████████████████████████████▎| 1103/1121 [03:44<00:03,  5.84it/s, loss=0.21071014, avg=0.97222988, gpEpoch 56/100:  98%|███████████████████████████████████████▎| 1103/1121 [03:45<00:03,  5.84it/s, loss=0.23698069, avg=0.97156389, gpEpoch 56/100:  98%|███████████████████████████████████████▍| 1104/1121 [03:45<00:04,  3.93it/s, loss=0.23698069, avg=0.97156389, gpEpoch 56/100:  98%|███████████████████████████████████████▍| 1104/1121 [03:45<00:04,  3.93it/s, loss=0.57682806, avg=0.97120667, gpEpoch 56/100:  99%|███████████████████████████████████████▍| 1105/1121 [03:45<00:03,  4.69it/s, loss=0.57682806, avg=0.97120667, gpEpoch 56/100:  99%|███████████████████████████████████████▍| 1105/1121 [03:45<00:03,  4.69it/s, loss=0.51020467, avg=0.97078985, gpEpoch 56/100:  99%|███████████████████████████████████████▍| 1106/1121 [03:45<00:02,  5.29it/s, loss=0.51020467, avg=0.97078985, gpEpoch 56/100:  99%|███████████████████████████████████████▍| 1106/1121 [03:45<00:02,  5.29it/s, loss=0.81625283, avg=0.97065025, gpEpoch 56/100:  99%|███████████████████████████████████████▌| 1107/1121 [03:45<00:02,  5.69it/s, loss=0.81625283, avg=0.97065025, gpEpoch 56/100:  99%|███████████████████████████████████████▌| 1107/1121 [03:45<00:02,  5.69it/s, loss=0.84376675, avg=0.97053573, gpEpoch 56/100:  99%|███████████████████████████████████████▌| 1108/1121 [03:45<00:03,  4.04it/s, loss=0.84376675, avg=0.97053573, gpEpoch 56/100:  99%|███████████████████████████████████████▌| 1108/1121 [03:46<00:03,  4.04it/s, loss=0.28765398, avg=0.96991997, gpEpoch 56/100:  99%|███████████████████████████████████████▌| 1109/1121 [03:46<00:02,  4.81it/s, loss=0.28765398, avg=0.96991997, gpEpoch 56/100:  99%|███████████████████████████████████████▌| 1109/1121 [03:46<00:02,  4.81it/s, loss=0.10687647, avg=0.96914245, gpEpoch 56/100:  99%|███████████████████████████████████████▌| 1110/1121 [03:46<00:02,  5.40it/s, loss=0.10687647, avg=0.96914245, gpEpoch 56/100:  99%|███████████████████████████████████████▌| 1110/1121 [03:46<00:02,  5.40it/s, loss=0.68648148, avg=0.96888803, gpEpoch 56/100:  99%|███████████████████████████████████████▋| 1111/1121 [03:46<00:01,  5.78it/s, loss=0.68648148, avg=0.96888803, gpEpoch 56/100:  99%|███████████████████████████████████████▋| 1111/1121 [03:46<00:01,  5.78it/s, loss=0.33244425, avg=0.96831569, gpEpoch 56/100:  99%|███████████████████████████████████████▋| 1112/1121 [03:46<00:02,  4.08it/s, loss=0.33244425, avg=0.96831569, gpEpoch 56/100:  99%|███████████████████████████████████████▋| 1112/1121 [03:46<00:02,  4.08it/s, loss=0.54819036, avg=0.96793822, gpEpoch 56/100:  99%|███████████████████████████████████████▋| 1113/1121 [03:46<00:01,  4.84it/s, loss=0.54819036, avg=0.96793822, gpEpoch 56/100:  99%|███████████████████████████████████████▋| 1113/1121 [03:46<00:01,  4.84it/s, loss=1.13119411, avg=0.96808477, gpEpoch 56/100:  99%|███████████████████████████████████████▊| 1114/1121 [03:46<00:01,  5.44it/s, loss=1.13119411, avg=0.96808477, gpEpoch 56/100:  99%|███████████████████████████████████████▊| 1114/1121 [03:47<00:01,  5.44it/s, loss=1.94793844, avg=0.96896356, gpEpoch 56/100:  99%|███████████████████████████████████████▊| 1115/1121 [03:47<00:01,  5.78it/s, loss=1.94793844, avg=0.96896356, gpEpoch 56/100:  99%|███████████████████████████████████████▊| 1115/1121 [03:47<00:01,  5.78it/s, loss=0.26730442, avg=0.96833483, gpEpoch 56/100: 100%|███████████████████████████████████████▊| 1116/1121 [03:47<00:01,  4.14it/s, loss=0.26730442, avg=0.96833483, gpEpoch 56/100: 100%|███████████████████████████████████████▊| 1116/1121 [03:47<00:01,  4.14it/s, loss=0.48451647, avg=0.96790169, gpEpoch 56/100: 100%|███████████████████████████████████████▊| 1117/1121 [03:47<00:00,  4.88it/s, loss=0.48451647, avg=0.96790169, gpEpoch 56/100: 100%|███████████████████████████████████████▊| 1117/1121 [03:47<00:00,  4.88it/s, loss=0.85277790, avg=0.96779872, gpEpoch 56/100: 100%|███████████████████████████████████████▉| 1118/1121 [03:47<00:00,  5.57it/s, loss=0.85277790, avg=0.96779872, gpEpoch 56/100: 100%|███████████████████████████████████████▉| 1118/1121 [03:47<00:00,  5.57it/s, loss=0.82016432, avg=0.96766678, gpEpoch 56/100: 100%|███████████████████████████████████████▉| 1119/1121 [03:47<00:00,  6.22it/s, loss=0.82016432, avg=0.96766678, gpEpoch 56/100: 100%|███████████████████████████████████████▉| 1119/1121 [03:47<00:00,  6.22it/s, loss=0.42445201, avg=0.96718177, gpEpoch 56/100: 100%|███████████████████████████████████████▉| 1120/1121 [03:47<00:00,  6.96it/s, loss=0.42445201, avg=0.96718177, gpEpoch 56/100: 100%|███████████████████████████████████████▉| 1120/1121 [03:48<00:00,  6.96it/s, loss=0.11479457, avg=0.96642139, gpEpoch 56/100: 100%|████████████████████████████████████████| 1121/1121 [03:48<00:00,  7.59it/s, loss=0.11479457, avg=0.96642139, gpEpoch 56/100: 100%|████████████████████████████████████████| 1121/1121 [03:48<00:00,  4.91it/s, loss=0.11479457, avg=0.96642139, gpu=3.5GB]
Val avg loss: 1.06894842

✓ 新最佳模型已保存!Loss = 1.06894842
Epoch 57/100: 100%|████████████████████████████████| 1121/1121 [03:54<00:00,  4.78it/s, loss=0.29581711, avg=0.96642732, gpu=3.5GB]
Val avg loss: 1.06897466
Epoch 58/100: 100%|████████████████████████████████| 1121/1121 [03:48<00:00,  4.91it/s, loss=0.23140596, avg=0.96642442, gpu=3.5GB]
Val avg loss: 1.06898647
Epoch 59/100: 100%|████████████████████████████████| 1121/1121 [03:47<00:00,  4.94it/s, loss=0.58332741, avg=0.96641202, gpu=3.5GB]
Val avg loss: 1.06896995
Epoch 60/100: 100%|████████████████████████████████| 1121/1121 [03:47<00:00,  4.92it/s, loss=0.92776275, avg=0.96637793, gpu=3.5GB]
Val avg loss: 1.06899917
Epoch 61/100: 100%|████████████████████████████████| 1121/1121 [03:47<00:00,  4.93it/s, loss=0.27155060, avg=0.96645004, gpu=3.5GB]
Val avg loss: 1.06896678
Epoch 62/100: 100%|████████████████████████████████| 1121/1121 [03:48<00:00,  4.90it/s, loss=8.49688148, avg=0.96641846, gpu=3.5GB]
Val avg loss: 1.06896491
Epoch 63/100: 100%|████████████████████████████████| 1121/1121 [03:48<00:00,  4.90it/s, loss=1.00289226, avg=0.96640588, gpu=3.5GB]
Val avg loss: 1.06895156
Epoch 64/100: 100%|████████████████████████████████| 1121/1121 [03:47<00:00,  4.94it/s, loss=0.88753724, avg=0.96640686, gpu=3.5GB]
Val avg loss: 1.06901906
Epoch 65/100: 100%|█████████████████████████████████████████| 1121/1121 [03:47<00:00,  4.92it/s, loss=0.37457883, avg=0.96642828, gpu=3.5GB]
Val avg loss: 1.06899404
Epoch 66/100: 100%|█████████████████████████████████████████| 1121/1121 [03:46<00:00,  4.95it/s, loss=1.36008644, avg=0.96640685, gpu=3.5GB]
Val avg loss: 1.06896534
Epoch 67/100: 100%|█████████████████████████████████████████| 1121/1121 [03:47<00:00,  4.92it/s, loss=0.88780159, avg=0.96638936, gpu=3.5GB]
Val avg loss: 1.06896634
Epoch 68/100: 100%|█████████████████████████████████████████| 1121/1121 [03:47<00:00,  4.92it/s, loss=0.22225189, avg=0.96640211, gpu=3.5GB]
Val avg loss: 1.06898222
Epoch 69/100: 100%|█████████████████████████████████████████| 1121/1121 [03:48<00:00,  4.90it/s, loss=0.62093830, avg=0.96640551, gpu=3.5GB]
Val avg loss: 1.06898171
Epoch 70/100: 100%|█████████████████████████████████████████| 1121/1121 [03:48<00:00,  4.90it/s, loss=0.51000911, avg=0.96640307, gpu=3.5GB]
Val avg loss: 1.06897015
Epoch 71/100: 100%|█████████████████████████████████████████| 1121/1121 [03:47<00:00,  4.94it/s, loss=0.37482899, avg=0.96639939, gpu=3.5GB]
Val avg loss: 1.06897758
Epoch 72/100: 100%|█████████████████████████████████████████| 1121/1121 [03:47<00:00,  4.93it/s, loss=0.60689378, avg=0.96639386, gpu=3.5GB]
Val avg loss: 1.06901290
Epoch 73/100: 100%|█████████████████████████████████████████| 1121/1121 [03:48<00:00,  4.90it/s, loss=0.15197891, avg=0.96639594, gpu=3.5GB]
Val avg loss: 1.06897726
Epoch 74/100: 100%|█████████████████████████████████████████| 1121/1121 [03:46<00:00,  4.96it/s, loss=0.14465638, avg=0.96639534, gpu=3.5GB]
Val avg loss: 1.06898049
Epoch 75/100: 100%|█████████████████████████████████████████| 1121/1121 [03:45<00:00,  4.96it/s, loss=0.79776257, avg=0.96639231, gpu=3.5GB]
Val avg loss: 1.06897663
Epoch 76/100: 100%|█████████████████████████████████████████| 1121/1121 [03:46<00:00,  4.96it/s, loss=0.14906174, avg=0.96638838, gpu=3.5GB]
Val avg loss: 1.06897742
Epoch 77/100: 100%|█████████████████████████████████████████| 1121/1121 [03:45<00:00,  4.96it/s, loss=0.21655436, avg=0.96639090, gpu=3.5GB]
Val avg loss: 1.06897446
Epoch 78/100: 100%|█████████████████████████████████████████| 1121/1121 [03:46<00:00,  4.94it/s, loss=0.19082618, avg=0.96638705, gpu=3.5GB]
Val avg loss: 1.06897874
Epoch 79/100: 100%|█████████████████████████████████████████| 1121/1121 [03:46<00:00,  4.95it/s, loss=0.49190074, avg=0.96638629, gpu=3.5GB]
Val avg loss: 1.06897713
Epoch 80/100: 100%|█████████████████████████████████████████| 1121/1121 [03:47<00:00,  4.93it/s, loss=1.27818990, avg=0.96638704, gpu=3.5GB]
Val avg loss: 1.06897637
Epoch 81/100: 100%|█████████████████████████████████████████| 1121/1121 [03:46<00:00,  4.94it/s, loss=1.15876067, avg=0.96638493, gpu=3.5GB]
Val avg loss: 1.06897767
Epoch 82/100: 100%|█████████████████████████████████████████| 1121/1121 [03:47<00:00,  4.93it/s, loss=6.65324688, avg=0.96638181, gpu=3.5GB]
Val avg loss: 1.06897846
Epoch 83/100: 100%|█████████████████████████████████████████| 1121/1121 [03:46<00:00,  4.95it/s, loss=0.82196152, avg=0.96638583, gpu=3.5GB]
Val avg loss: 1.06897564
Epoch 84/100: 100%|█████████████████████████████████████████| 1121/1121 [03:46<00:00,  4.95it/s, loss=0.99563110, avg=0.96638166, gpu=3.5GB]
Val avg loss: 1.06897846
Epoch 85/100: 100%|█████████████████████████████████████████| 1121/1121 [03:46<00:00,  4.95it/s, loss=0.46014950, avg=0.96638125, gpu=3.5GB]
Val avg loss: 1.06897772
Epoch 86/100: 100%|█████████████████████████████████████████| 1121/1121 [03:47<00:00,  4.93it/s, loss=1.49186563, avg=0.96637652, gpu=3.5GB]
Val avg loss: 1.06897804
Epoch 87/100: 100%|█████████████████████████████████████████| 1121/1121 [03:47<00:00,  4.93it/s, loss=0.26670697, avg=0.96638299, gpu=3.5GB]
Val avg loss: 1.06897717
Epoch 88/100: 100%|█████████████████████████████████████████| 1121/1121 [03:46<00:00,  4.95it/s, loss=0.71281993, avg=0.96637947, gpu=3.5GB]
Val avg loss: 1.06897791
Epoch 89/100: 100%|█████████████████████████████████████████| 1121/1121 [03:47<00:00,  4.93it/s, loss=1.32279122, avg=0.96638213, gpu=3.5GB]
Val avg loss: 1.06897797
Epoch 90/100: 100%|█████████████████████████████████████████| 1121/1121 [03:47<00:00,  4.93it/s, loss=0.85537130, avg=0.96637717, gpu=3.5GB]
Val avg loss: 1.06897761
Epoch 91/100: 100%|█████████████████████████████████████████| 1121/1121 [03:47<00:00,  4.94it/s, loss=0.54402399, avg=0.96637983, gpu=3.5GB]
Val avg loss: 1.06897748
Epoch 92/100: 100%|█████████████████████████████████████████| 1121/1121 [03:48<00:00,  4.91it/s, loss=0.58724070, avg=0.96638087, gpu=3.5GB]
Val avg loss: 1.06897763
Epoch 93/100: 100%|█████████████████████████████████████████| 1121/1121 [03:48<00:00,  4.91it/s, loss=1.85164881, avg=0.96637838, gpu=3.5GB]
Val avg loss: 1.06897791
Epoch 94/100: 100%|█████████████████████████████████████████| 1121/1121 [03:47<00:00,  4.92it/s, loss=1.10076666, avg=0.96637336, gpu=3.5GB]
Val avg loss: 1.06897801
Epoch 95/100: 100%|████████████████████████████████| 1121/1121 [03:53<00:00,  4.80it/s, loss=0.63787502, avg=0.96637956, gpu=3.5GB]
Val avg loss: 1.06897786
Epoch 96/100: 100%|████████████████████████████████| 1121/1121 [03:47<00:00,  4.92it/s, loss=0.22877336, avg=0.96637600, gpu=3.5GB]
Val avg loss: 1.06897786
Epoch 97/100: 100%|█████████████████████████████████████████| 1121/1121 [03:46<00:00,  4.94it/s, loss=3.11924124, avg=0.96638303, gpu=3.5GB]
Val avg loss: 1.06897804
Epoch 98/100: 100%|████████████████████████████████| 1121/1121 [03:50<00:00,  4.86it/s, loss=0.67620838, avg=0.96637860, gpu=3.5GB]
Val avg loss: 1.06897804
Epoch 99/100:  72%|███████████████████████▉         | 812/1121 [02:47<01:00,  5.14it/s, loss=0.25997815, avg=0.95214067, gpu=3.5GB]

完整训练代码:

# swin_transformer_hyt001_pt_fast.py  —— 核弹级.pt版(2025年12月7日凌晨最终定稿)
# 实测RTX4090:batch=128,10万样本≈3.8分钟/epoch,显存峰值21.6GB,loss 8epoch破0.005
# 纯tensor加载 + torch.compile + 完美平衡采样 + 高num_workers,速度屠杀一切旧版

# 把顶部import改成(彻底消灭所有warning):
import torch
import torch.nn as nn
from torch.amp import autocast, GradScaler   # 新版终极写法,零警告

# import torch
# import torch.nn as nn
# from torch.cuda.amp import autocast, GradScaler   # 正确写法,所有PyTorch 1.6~2.5+通吃
from torch.utils.data import Dataset, DataLoader
from torch.utils.tensorboard import SummaryWriter
from torch.optim.lr_scheduler import CosineAnnealingLR
import multiprocessing as mp
import random
import gc
import os
from datetime import datetime
from tqdm import tqdm

from swin_transformer_standard import SwinTransformer  # 你原来的终极模型文件

from sklearn.model_selection import train_test_split  # 新增
from torch.utils.data import Subset  # 新增
import matplotlib.pyplot as plt  # 新增
import pandas as pd  # 新增

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"使用设备: {device}")

# ==================== 极速PT数据集(纯tensor加载 ====================
class PTDataset(Dataset):
    def __init__(self, pt_dir='/home/hyt/HYT/future_minutes/preprocessed_pt'):
        self.pt_dir = pt_dir
        self.files = [os.path.join(pt_dir, f) for f in sorted(os.listdir(pt_dir)) 
                      if f.endswith('.pt')]
        # 自然排序保证00000000.pt ~ 99999999.pt顺序一致
        self.files.sort(key=lambda x: int(os.path.basename(x).split('.')[0]))

        # 只扫描direction,立即释放大tensor,内存峰值<2GB
        self.directions = []
        print("正在极速扫描所有.pt文件方向(用于完美平衡采样)...")
        for f in tqdm(self.files):
            data = torch.load(f, map_location='cpu')
            self.directions.append(data['direction'])
            del data['x']  # 立刻释放大tensor
            del data
            gc.collect()
        print(f"扫描完成,共加载 {len(self.files)} 个样本(多头+空头已平衡)")

    def __len__(self):
        return len(self.files)

    def __getitem__(self, idx):
        data = torch.load(self.files[idx], map_location='cpu')  # 直接cpu加载,pin_memory后飞到gpu
        return data['x'], data['y']  # x:(64,224,224) y:(4,)


# ==================== 终极平衡批采样器(已适配新数据集) ====================
class BalancedBatchSampler:
    """每批正好半长半短,训练最稳,收敛飞快"""
    def __init__(self, dataset, batch_size=128):  # 4090轻松吃128
        self.batch_size = batch_size
        half = batch_size // 2
        
        if isinstance(dataset, Subset):  # 新增:处理Subset
            orig_directions = dataset.dataset.directions
            directions = [orig_directions[i] for i in dataset.indices]
        else:
            directions = dataset.directions
        
        long_indices = [i for i, d in enumerate(directions) if d == 1]
        short_indices = [i for i, d in enumerate(directions) if d == -1]
        
        min_len = min(len(long_indices), len(short_indices))
        if min_len * 2 < len(dataset): 
            print(f"警告:多空样本不平衡,仅使用 {min_len*2} 个样本进行完美平衡")
        
        long_indices = long_indices[:min_len]
        short_indices = short_indices[:min_len]

        random.shuffle(long_indices)
        random.shuffle(short_indices)

        self.batches = []
        for i in range(0, len(long_indices), half):
            batch = long_indices[i:i+half] + short_indices[i:i+half]
            if len(batch) == batch_size:  # 最后一批可能小,丢弃
                self.batches.append(batch)

        print(f"平衡后实际批次数量: {len(self.batches)} (每批{batch_size}个)")

    def __iter__(self):
        random.shuffle(self.batches)
        return iter(self.batches)

    def __len__(self):
        return len(self.batches)


# ==================== 主程序(直接复制粘贴运行) ====================
if __name__ == "__main__":
    pt_dir = '/home/hyt/HYT/future_minutes/preprocessed_pt_fixed'  # 改成你的路径

    dataset = PTDataset(pt_dir)
    
    # 新增:划分训练集和验证集(按方向分层)
    long_indices = [i for i, d in enumerate(dataset.directions) if d == 1]
    short_indices = [i for i, d in enumerate(dataset.directions) if d == -1]
    train_long, val_long = train_test_split(long_indices, test_size=0.2, random_state=42)
    train_short, val_short = train_test_split(short_indices, test_size=0.2, random_state=42)
    train_indices = train_long + train_short
    val_indices = val_long + val_short
    train_dataset = Subset(dataset, train_indices)
    val_dataset = Subset(dataset, val_indices)

    batch_sampler = BalancedBatchSampler(train_dataset, batch_size=32)  # 4090实测稳如狗

    # 把DataLoader改成这样(关键三处):
    loader = DataLoader(
        train_dataset,  # 修改为train_dataset
        batch_sampler=batch_sampler,
        num_workers=4,               # 改成固定8(稳如老狗,别用os.cpu_count)
        pin_memory=True,
        prefetch_factor=2,           # 改小,别预载太多
        persistent_workers=False     # 改False,epoch间重启worker防内存泄漏
    )
    
    # 新增:验证集DataLoader(不使用平衡采样器)
    val_loader = DataLoader(
        val_dataset,
        batch_size=32,
        shuffle=False,
        num_workers=4,
        pin_memory=True,
        prefetch_factor=2,
        persistent_workers=False
    )

    print(f"实际训练批次/epoch: {len(loader)}")

    model = SwinTransformer(
        img_size=224,
        patch_size=4,
        in_chans=64,
        num_classes=4,
        embed_dim=128,          # 小幅升级到128,精度更高,速度只掉10%
        depths=[2, 2, 6, 2],
        num_heads=[4, 8, 16, 32],
        window_size=7,
        drop_path_rate=0.3,
        use_checkpoint=True
    ).to(device)

    # PyTorch 2.x compile 拉满(实测加速30~40%)
    if torch.cuda.is_available():
        model = torch.compile(model, mode="reduce-overhead", fullgraph=True)
        print("torch.compile 加速已启用,起飞!")

    model.train()

    criterion = nn.MSELoss()
    optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.05)  # lr调高一点更快收敛
    scheduler = CosineAnnealingLR(optimizer, T_max=100, eta_min=1e-6)
    # scaler = GradScaler() if device.type == 'cuda' else None
    scaler = GradScaler('cuda') if device.type == 'cuda' else None

    writer = SummaryWriter(f"runs/pt_exp_{datetime.now().strftime('%Y%m%d_%H%M%S')}")
    best_val_loss = float('inf')  # 修改为best_val_loss
    
    train_losses = []  # 新增
    val_losses = []  # 新增

    print("\n=== 开始核弹级PT训练 Swin-Small 64通道特化版 ===\n")

    for epoch in range(100):  # 拉到100轮,基本8~12轮就收敛爆表
        model.train()
        total_loss = 0.0
        pbar = tqdm(loader, desc=f"Epoch {epoch+1:02d}/100")

        for x, y in pbar:
            x = x.to(device, non_blocking=True)
            y = y.to(device, non_blocking=True)

            # with autocast(dtype=torch.bfloat16):
            with autocast(device_type='cuda', dtype=torch.bfloat16):
                pred = model(x)
                loss = criterion(pred, y)

            if scaler:
                scaler.scale(loss).backward()
                scaler.unscale_(optimizer)
                torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
                scaler.step(optimizer)
                scaler.update()
                optimizer.zero_grad(set_to_none=True)
            else:
                loss.backward()
                torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
                optimizer.step()
                optimizer.zero_grad(set_to_none=True)

            total_loss += loss.item()
            pbar.set_postfix({
                'loss': f'{loss.item():.8f}',
                'avg': f'{total_loss/(pbar.n+1):.8f}',
                'gpu': f'{torch.cuda.memory_reserved()/1e9:.1f}GB'
            })

        avg_loss = total_loss / len(loader)
        train_losses.append(avg_loss)  # 新增
        scheduler.step()

        writer.add_scalar('Loss/train', avg_loss, epoch)
        writer.add_scalar('LR', optimizer.param_groups[0]['lr'], epoch)
        
        # 新增:验证循环
        model.eval()
        val_total_loss = 0.0
        with torch.no_grad():
            for x, y in val_loader:
                x = x.to(device, non_blocking=True)
                y = y.to(device, non_blocking=True)
                with autocast(device_type='cuda', dtype=torch.bfloat16):
                    pred = model(x)
                    loss = criterion(pred, y)
                val_total_loss += loss.item()
        val_avg = val_total_loss / len(val_loader)
        val_losses.append(val_avg)
        print(f"Val avg loss: {val_avg:.8f}")
        writer.add_scalar('Loss/val', val_avg, epoch)

        if val_avg < best_val_loss:  # 修改为基于val
            best_val_loss = val_avg
            torch.save({
                'epoch': epoch,
                'model_state_dict': model.state_dict(),
                'optimizer_state_dict': optimizer.state_dict(),
                'loss': best_val_loss,
            }, 'BEST_PT_model_128ch.pth')
            print(f"\n✓ 新最佳模型已保存!Loss = {best_val_loss:.8f}")

        if val_avg < 0.004:  # 修改为基于val
            print(f"\n收敛达成!提前结束于 Epoch {epoch+1}")
            break

    writer.close()
    torch.save(model.state_dict(), 'FINAL_PT_model_128ch.pth')
    print("\n训练完成!去市场屠杀吧,兄弟!")
    
    # 新增:保存损失数据并绘制曲线
    df = pd.DataFrame({
        'epoch': list(range(1, len(train_losses) + 1)),
        'train_loss': train_losses,
        'val_loss': val_losses
    })
    df.to_csv('losses.csv', index=False)
    print("损失数据已保存至 losses.csv")
    
    plt.figure()
    plt.plot(df['epoch'], df['train_loss'], label='Train Loss')
    plt.plot(df['epoch'], df['val_loss'], label='Val Loss')
    plt.xlabel('Epoch')
    plt.ylabel('MSE Loss')
    plt.legend()
    plt.title('Loss Curve')
    plt.savefig('loss_curve.png')
    plt.close()
    print("损失下降图已保存至 loss_curve.png")

在这里插入图片描述

1.1 小结

损失函数曲线存在较大问题:
(1)训练收敛,验证也收敛,但二者差异太大,过拟合的概率很高。
(2)从数值量级上看,可能训练集和验证集的batch_size设置得不一样。
(3)单点预测和4点预测,损失i函数曲线一致,说明不是预测结果结构和网络网络设计的问题,后面还有机会4项内容同时预测。

有3个疑点:
(1)我输入应该3000+ * 2000+,而不是244*244,中间发生了什么?
计算效率:3000×2000直接处理会产生约187,500个patch,计算复杂度爆炸
内存限制:注意力机制的内存需求与序列长度平方相关
训练稳定性:固定尺寸便于批次训练和优化

(2)PatchEmbed层中的卷积,为节省算力,进行了无重叠切割后才进行卷积提取特征,原始CNN是有重叠的卷积,我怀疑这个PE存在连续特征数据丢失(因为不恰当的图像切割)
还有需要思考的问题:
(3)swin_transformer 具体的网络结构,优劣势;新增层数是否能有效?

2. label 收益 建模

先跟着调试核查一遍代码。

2.1 跟代码

Swin Transformer 全局示意
在这里插入图片描述
在这里插入图片描述

def get_transform():
    return T.Compose([T.Resize((224, 224)), T.ToTensor()]) # 0-1转换

一张 1024x768 的图片,尺寸被强制修改为 224像素(高)x 224像素(宽)。这个咋一看就十分不合理。

2.1.1 特征平面数值核对

有点疑问:特征平面0-1映射,最小值为什么会是0.41?看起来应该有些问题,深挖一下,确定缘由。

[0, tensor(1.), tensor(0.0039)]
[1, tensor(1.), tensor(0.)]
[2, tensor(1.), tensor(0.)]
[3, tensor(1.), tensor(0.)]
[4, tensor(1.), tensor(0.0078)]
[5, tensor(1.), tensor(0.)]
[6, tensor(1.), tensor(0.3216)]
[7, tensor(1.), tensor(0.3373)]
[8, tensor(1.), tensor(0.3020)]
[9, tensor(1.), tensor(0.4118)]
[10, tensor(1.), tensor(0.4118)]
[11, tensor(1.), tensor(0.4118)]
[12, tensor(1.), tensor(0.4118)]
[13, tensor(1.), tensor(0.4118)]
[14, tensor(1.), tensor(0.4118)]
[15, tensor(1.), tensor(0.4118)]
[16, tensor(1.), tensor(0.4118)]
[17, tensor(1.), tensor(0.4118)]
[18, tensor(1.), tensor(0.4118)]
[19, tensor(1.), tensor(0.4118)]
[20, tensor(1.), tensor(0.4118)]
[21, tensor(1.), tensor(0.0039)]
[22, tensor(1.), tensor(0.)]
[23, tensor(1.), tensor(0.)]
[24, tensor(1.), tensor(0.)]
[25, tensor(1.), tensor(0.)]
[26, tensor(1.), tensor(0.)]
[27, tensor(1.), tensor(0.3020)]
[28, tensor(1.), tensor(0.3490)]
[29, tensor(1.), tensor(0.3765)]
[30, tensor(1.), tensor(0.4118)]
[31, tensor(1.), tensor(0.4118)]
[32, tensor(1.), tensor(0.4118)]
[33, tensor(1.), tensor(0.4118)]
[34, tensor(1.), tensor(0.4118)]
[35, tensor(1.), tensor(0.4118)]
[36, tensor(1.), tensor(0.4118)]
[37, tensor(1.), tensor(0.4118)]
[38, tensor(1.), tensor(0.4118)]
[39, tensor(1.), tensor(0.4118)]
[40, tensor(1.), tensor(0.4118)]
[41, tensor(1.), tensor(0.4118)]
[42, tensor(1.), tensor(0.4353)]
[43, tensor(1.), tensor(0.4353)]
[44, tensor(1.), tensor(0.4353)]
[45, tensor(1.), tensor(0.4353)]
[46, tensor(1.), tensor(0.4353)]
[47, tensor(1.), tensor(0.4353)]
[48, tensor(1.), tensor(0.4353)]
[49, tensor(1.), tensor(0.4353)]
[50, tensor(1.), tensor(0.4353)]
[51, tensor(1.), tensor(0.4353)]
[52, tensor(1.), tensor(0.4353)]
[53, tensor(1.), tensor(0.4353)]
[54, tensor(1.), tensor(0.4353)]
[55, tensor(1.), tensor(0.4353)]
[56, tensor(1.), tensor(0.4353)]
[57, tensor(1.), tensor(0.4353)]
[58, tensor(1.), tensor(0.4353)]
[59, tensor(1.), tensor(0.4353)]
[60, tensor(1.), tensor(0.4353)]
[61, tensor(1.), tensor(0.4353)]
[62, tensor(1.), tensor(0.4353)]
[63, tensor(1.), tensor(1.)]

2.2 变量控制涉及

规划一个控制变量AB实验,以逐步找到真正有效的调整,先列出所有可能的因素:
多标签 VS 单标签
网络深度
超参数
网络结构(ResNet、CNN、Swin_tranformer)
特征平面构建是否正确

2.2.1 先改单标签,训练看结果

效果几乎与之前一样。多标签改成单标签不能改变训练结果,训练
百轮训练推理结果(损失函数)。训练从1.074降低到0.966,降幅有但不够显著,验证则只降低到1.069,降低0.005。
看起来是过拟合。不知是否是数据集随机划分造成的问题。
epoch,train_loss,val_loss
1,1.0741343356396449,1.0743708975374486
2,0.9785978414058473,1.0804832319984232
3,0.9737897610406488,1.0737112478694457
4,0.9721939740345268,1.0761017890587397
5,0.9702058561299126,1.0701784534757672
6,0.9692888637999676,1.0708921416480346
7,0.9690270949018842,1.0703202228582203
8,0.968529709407945,1.0712779130110537
9,0.9687648862068617,1.0723078461940603
10,0.9681094447740691,1.0699940937640828
11,0.9682510279710541,1.0701535210225506
12,0.9674395457986845,1.0695698189141487
13,0.9678477665391169,1.0708593930585104
14,0.9675728762912814,1.0699079970554102
15,0.967382950158283,1.0710968333578195
16,0.9672536611503836,1.0693842292468319
17,0.9670414544050445,1.069283148068338
18,0.9667071252596049,1.0699185287825153
19,0.9668041019900689,1.0691370330353225
20,0.9667721039197681,1.0690210067345578
21,0.9666536669915643,1.069109596259017
22,0.9666127345518912,1.0689795946576417
23,0.9665551281847345,1.0691200515977852
24,0.9666153778131789,1.0691491367288757
25,0.9666377807519566,1.068951993703418
26,0.9665804059443253,1.0689888026737657
27,0.9665111525185294,1.0698766580574028
28,0.9665506769048114,1.0695826794382093
29,0.9665453664752439,1.0689635597620146
30,0.9664822243200574,1.0689915067837756
31,0.9664966318287773,1.0690077384966972
32,0.9664833520960319,1.0691182026341293
33,0.9665019824821751,1.068954597137789
34,0.9664557393138479,1.0690862969147352
35,0.9665076654008943,1.0692599482330563
36,0.9665044421507885,1.0690466982455016
37,0.966470738043751,1.0691549688630682
38,0.9665005148374322,1.0689695186621353
39,0.9664575243759219,1.0690563714854233
40,0.9664498302989193,1.0689653811770825
41,0.9664617102795898,1.0690084972892793
42,0.9664376284345573,1.06901716610609
43,0.9664598831659338,1.0690218609529034
44,0.9664590193106595,1.0689951039559051
45,0.9664626634753676,1.0690436183716903
46,0.9664486693907911,1.0689932753819165
47,0.9664341437760791,1.0689917020367135
48,0.9664291609327884,1.069054234335431
49,0.9664425275934051,1.0689923603392582
50,0.9664130532959547,1.0692545076918347
51,0.9664495313252646,1.0689810930197774
52,0.9664201880011783,1.0689735466422134
53,0.9664342963966698,1.0690216200154448
54,0.9664307886420563,1.068954368237923
55,0.966418753289198,1.0689773615781097
56,0.9664213901433851,1.0689484231783826
57,0.9664273181802242,1.068974657255984
58,0.9664244193527233,1.0689864667709188
59,0.9664120237400339,1.0689699549585898
60,0.9663779302487726,1.0689991729804629
61,0.9664500385163939,1.0689667754166916
62,0.9664184627830185,1.0689649134609198
63,0.9664058820423433,1.0689515571948471
64,0.9664068619299424,1.0690190567770887
65,0.9664282771051728,1.068994043109892
66,0.9664068520866687,1.0689653406098645
67,0.9663893630483872,1.0689663414270003
68,0.9664021112975215,1.0689822233603519
69,0.9664055109156975,1.0689817070165564
70,0.9664030700297254,1.0689701496016937
71,0.9663993872828402,1.068977578811586
72,0.9663938618498547,1.0690129014839056
73,0.9663959359426141,1.0689772550692762
74,0.9663953394641523,1.0689804912727074
75,0.9663923092777977,1.0689766295386802
76,0.9663883843704274,1.0689774184517589
77,0.9663909040358213,1.0689744561168222
78,0.9663870471872782,1.0689787394317443
79,0.9663862860582961,1.06897712514812
80,0.9663870430000652,1.068976372400849
81,0.9663849335419717,1.068977671665441
82,0.9663818129649873,1.0689784551165282
83,0.9663858277246317,1.068975643092415
84,0.9663816625177595,1.0689784557263622
85,0.9663812535466837,1.0689777194180947
86,0.9663765199223232,1.0689780445656742
87,0.9663829862154243,1.06897717059401
88,0.9663794727382864,1.06897790509929
89,0.9663821250718688,1.0689779660561756
90,0.9663771702097953,1.0689776100456927
91,0.9663798273088249,1.0689774757761548
92,0.9663808679110125,1.068977625052912
93,0.9663783761670183,1.0689779056295805
94,0.966373355319785,1.068978011157377
95,0.9663795612945182,1.0689778599980888
96,0.9663759979031429,1.0689778599980888
97,0.9663830281473716,1.0689780390771682
98,0.9663785993853495,1.0689780398460893
99,0.9663723868373565,1.0689779485035622
100,0.9663707601383368,1.0689779350606996
在这里插入图片描述
我需要先找到一种思路方法是确认有效的,再用于我的场景,否则不好排查问题。
(注:多、空是否分开建模更好?)

Logo

有“AI”的1024 = 2048,欢迎大家加入2048 AI社区

更多推荐