模型训练时遇到的问题总结

模型训练遇到的问题总结

次次皮

3686人浏览 · 2024-04-24 15:17:29

次次皮 · 2024-04-24 15:17:29 发布

1、训练出来的模型无法识别任何目标

2、OSError: [WinError 1455] 页面文件太小，无法完成操作

3、Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

4、IndexError: index 10 is out of bounds for dimension 1 with size 10

5、使用GPU训练，torch.cuda.OutOfMemoryError: CUDA out of memory

1、训练出来的模型无法识别任何目标

报错日志：

D:\Programs\miniconda3\envs\python311\python.exe D:\python\project\VisDrone2019-DET-MOT\train.py 
New https://pypi.org/project/ultralytics/8.1.15 available 😃 Update with 'pip install -U ultralytics'
Ultralytics YOLOv8.1.9 🚀 Python-3.11.7 torch-2.2.0 CUDA:0 (NVIDIA GeForce GTX 1650, 4096MiB)
engine\trainer: task=detect, mode=train, model=yolov8n.pt, data=D:\python\project\VisDrone2019-DET-MOT\class.yaml, epochs=20, time=None, patience=50, batch=2, imgsz=900, save=True, save_period=-1, cache=False, device=0, workers=8, project=None, name=train6, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, multi_scale=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, vid_stride=1, stream_buffer=False, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, embed=None, show=False, save_frames=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=True, show_boxes=True, line_width=None, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=None, workspace=4, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=7.5, cls=0.5, dfl=1.5, pose=12.0, kobj=1.0, label_smoothing=0.0, nbs=64, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0, auto_augment=randaugment, erasing=0.4, crop_fraction=1.0, cfg=None, tracker=botsort.yaml, save_dir=runs\detect\train6
Overriding model.yaml nc=80 with nc=12

                   from  n    params  module                                       arguments                     
  0                  -1  1       464  ultralytics.nn.modules.conv.Conv             [3, 16, 3, 2]                 
  1                  -1  1      4672  ultralytics.nn.modules.conv.Conv             [16, 32, 3, 2]                
  2                  -1  1      7360  ultralytics.nn.modules.block.C2f             [32, 32, 1, True]             
  3                  -1  1     18560  ultralytics.nn.modules.conv.Conv             [32, 64, 3, 2]                
  4                  -1  2     49664  ultralytics.nn.modules.block.C2f             [64, 64, 2, True]             
  5                  -1  1     73984  ultralytics.nn.modules.conv.Conv             [64, 128, 3, 2]               
  6                  -1  2    197632  ultralytics.nn.modules.block.C2f             [128, 128, 2, True]           
  7                  -1  1    295424  ultralytics.nn.modules.conv.Conv             [128, 256, 3, 2]              
  8                  -1  1    460288  ultralytics.nn.modules.block.C2f             [256, 256, 1, True]           
  9                  -1  1    164608  ultralytics.nn.modules.block.SPPF            [256, 256, 5]                 
 10                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']          
 11             [-1, 6]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
 12                  -1  1    148224  ultralytics.nn.modules.block.C2f             [384, 128, 1]                 
 13                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']          
 14             [-1, 4]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
 15                  -1  1     37248  ultralytics.nn.modules.block.C2f             [192, 64, 1]                  
 16                  -1  1     36992  ultralytics.nn.modules.conv.Conv             [64, 64, 3, 2]                
 17            [-1, 12]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
 18                  -1  1    123648  ultralytics.nn.modules.block.C2f             [192, 128, 1]                 
 19                  -1  1    147712  ultralytics.nn.modules.conv.Conv             [128, 128, 3, 2]              
 20             [-1, 9]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
 21                  -1  1    493056  ultralytics.nn.modules.block.C2f             [384, 256, 1]                 
 22        [15, 18, 21]  1    753652  ultralytics.nn.modules.head.Detect           [12, [64, 128, 256]]          
Model summary: 225 layers, 3013188 parameters, 3013172 gradients, 8.2 GFLOPs

Transferred 319/355 items from pretrained weights
Freezing layer 'model.22.dfl.conv.weight'
AMP: running Automatic Mixed Precision (AMP) checks with YOLOv8n...
AMP: checks passed ✅
WARNING ⚠️ imgsz=[900] must be multiple of max stride 32, updating to [928]
train: Scanning E:\DeepLearning\AI\VisDrone2019\VisDrone2019-DET-MOT\train\labels.cache... 30669 images, 0 backgrounds, 0 corrupt: 100%|██████████| 30669/30669 [00:00<?, ?it/s]
train: WARNING ⚠️ E:\DeepLearning\AI\VisDrone2019\VisDrone2019-DET-MOT\train\images\0000137_02220_d_0000163.jpg: 1 duplicate labels removed
train: WARNING ⚠️ E:\DeepLearning\AI\VisDrone2019\VisDrone2019-DET-MOT\train\images\0000140_00118_d_0000002.jpg: 1 duplicate labels removed
train: WARNING ⚠️ E:\DeepLearning\AI\VisDrone2019\VisDrone2019-DET-MOT\train\images\9999945_00000_d_0000114.jpg: 1 duplicate labels removed
train: WARNING ⚠️ E:\DeepLearning\AI\VisDrone2019\VisDrone2019-DET-MOT\train\images\9999987_00000_d_0000049.jpg: 1 duplicate labels removed
train: WARNING ⚠️ E:\DeepLearning\AI\VisDrone2019\VisDrone2019-DET-MOT\train\images\9999998_00219_d_0000175.jpg: 1 duplicate labels removed
val: Scanning E:\DeepLearning\AI\VisDrone2019\VisDrone2019-DET-MOT\val\labels.cache... 3394 images, 1 backgrounds, 0 corrupt: 100%|██████████| 3394/3394 [00:00<?, ?it/s]
Plotting labels to runs\detect\train6\labels.jpg... 
optimizer: 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically... 
optimizer: AdamW(lr=0.000625, momentum=0.9) with parameter groups 57 weight(decay=0.0), 64 weight(decay=0.0005), 63 bias(decay=0.0)
Image sizes 928 train, 928 val
Using 8 dataloader workers
Logging results to runs\detect\train6
Starting training for 20 epochs...

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
       1/20      2.55G        nan        nan        nan         77        928: 100%|██████████| 15335/15335 [1:25:45<00:00,  2.98it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 849/849 [01:50<00:00,  7.70it/s]
                   all       3394     158168    0.00413   1.32e-05    0.00208   0.000415

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
       2/20      3.17G        nan        nan        nan         44        928: 100%|██████████| 15335/15335 [1:25:31<00:00,  2.99it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 849/849 [01:50<00:00,  7.66it/s]
                   all       3394     158168    0.00401   1.32e-05    0.00203   0.000406

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
       3/20      2.08G        nan        nan        nan        134        928: 100%|██████████| 15335/15335 [1:25:26<00:00,  2.99it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 849/849 [01:50<00:00,  7.70it/s]
                   all       3394     158168    0.00154   1.14e-05   0.000779   7.79e-05

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
       4/20      3.25G        nan        nan        nan         99        928: 100%|██████████| 15335/15335 [1:25:26<00:00,  2.99it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 849/849 [01:50<00:00,  7.70it/s]
                   all       3394     158168    0.00409   1.32e-05    0.00206   0.000412

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
       5/20      3.15G        nan        nan        nan         90        928: 100%|██████████| 15335/15335 [1:25:25<00:00,  2.99it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 849/849 [01:50<00:00,  7.71it/s]
                   all       3394     158168    0.00412   1.32e-05    0.00207    0.00033

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
       6/20      2.22G        nan        nan        nan         85        928: 100%|██████████| 15335/15335 [1:25:25<00:00,  2.99it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 849/849 [01:50<00:00,  7.70it/s]
                   all       3394     158168     0.0043   1.32e-05    0.00216   0.000343
  0%|          | 0/15335 [00:00<?, ?it/s]
      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
       7/20      3.52G        nan        nan        nan         39        928: 100%|██████████| 15335/15335 [1:25:30<00:00,  2.99it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 849/849 [01:51<00:00,  7.65it/s]
                   all       3394     158168    0.00401   1.32e-05    0.00202   0.000322

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
       8/20      2.62G        nan        nan        nan         28        928: 100%|██████████| 15335/15335 [1:25:19<00:00,  3.00it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 849/849 [01:50<00:00,  7.70it/s]
                   all       3394     158168     0.0016   1.14e-05    0.00081   0.000162

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
       9/20       3.4G        nan        nan        nan        109        928: 100%|██████████| 15335/15335 [1:25:27<00:00,  2.99it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 849/849 [01:50<00:00,  7.69it/s]
                   all       3394     158168    0.00419   1.32e-05    0.00211   0.000338

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      10/20       2.8G        nan        nan        nan         35        928: 100%|██████████| 15335/15335 [1:25:22<00:00,  2.99it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 849/849 [01:50<00:00,  7.71it/s]
                   all       3394     158168     0.0016   1.14e-05    0.00081    8.1e-05
Closing dataloader mosaic
  0%|          | 0/15335 [00:00<?, ?it/s]
      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      11/20      2.73G        nan        nan        nan         41        928: 100%|██████████| 15335/15335 [1:23:44<00:00,  3.05it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 849/849 [01:50<00:00,  7.71it/s]
                   all       3394     158168     0.0013   1.14e-05   0.000697   0.000279

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      12/20      2.38G        nan        nan        nan         10        928: 100%|██████████| 15335/15335 [1:23:44<00:00,  3.05it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 849/849 [01:50<00:00,  7.71it/s]
                   all       3394     158168    0.00134   1.14e-05   0.000725    0.00029

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      13/20      2.87G        nan        nan        nan         20        928: 100%|██████████| 15335/15335 [1:23:45<00:00,  3.05it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 849/849 [01:50<00:00,  7.70it/s]
                   all       3394     158168    0.00132   1.14e-05   0.000707   0.000283

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      14/20      2.68G        nan        nan        nan         63        928: 100%|██████████| 15335/15335 [1:23:44<00:00,  3.05it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 849/849 [01:50<00:00,  7.71it/s]
                   all       3394     158168    0.00134   1.14e-05   0.000725    0.00029

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      15/20      2.28G        nan        nan        nan         56        928: 100%|██████████| 15335/15335 [1:23:45<00:00,  3.05it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 849/849 [01:50<00:00,  7.71it/s]
                   all       3394     158168    0.00134   1.14e-05   0.000725    0.00029
  0%|          | 0/15335 [00:00<?, ?it/s]
      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      16/20      2.15G        nan        nan        nan         51        928: 100%|██████████| 15335/15335 [1:23:44<00:00,  3.05it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 849/849 [01:50<00:00,  7.71it/s]
                   all       3394     158168    0.00132   1.14e-05   0.000714   0.000286

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      17/20      2.33G        nan        nan        nan         23        928: 100%|██████████| 15335/15335 [1:23:50<00:00,  3.05it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 849/849 [01:50<00:00,  7.70it/s]
                   all       3394     158168    0.00132   1.14e-05   0.000714   0.000286
  0%|          | 0/15335 [00:00<?, ?it/s]
      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      18/20       2.9G        nan        nan        nan         47        928: 100%|██████████| 15335/15335 [1:23:50<00:00,  3.05it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 849/849 [01:50<00:00,  7.68it/s]
                   all       3394     158168    0.00132   1.14e-05   0.000714   0.000286

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      19/20      2.69G        nan        nan        nan        136        928: 100%|██████████| 15335/15335 [1:23:51<00:00,  3.05it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 849/849 [01:50<00:00,  7.69it/s]
                   all       3394     158168    0.00134   1.14e-05   0.000725    0.00029
  0%|          | 0/15335 [00:00<?, ?it/s]
      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      20/20      2.41G        nan        nan        nan         93        928: 100%|██████████| 15335/15335 [1:23:52<00:00,  3.05it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 849/849 [01:50<00:00,  7.70it/s]
                   all       3394     158168    0.00132   1.14e-05   0.000714   0.000286

20 epochs completed in 28.833 hours.
Optimizer stripped from runs\detect\train6\weights\last.pt, 6.3MB
Optimizer stripped from runs\detect\train6\weights\best.pt, 6.3MB

Validating runs\detect\train6\weights\best.pt...
Ultralytics YOLOv8.1.9 🚀 Python-3.11.7 torch-2.2.0 CUDA:0 (NVIDIA GeForce GTX 1650, 4096MiB)
Model summary (fused): 168 layers, 3007988 parameters, 0 gradients, 8.1 GFLOPs
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 849/849 [01:40<00:00,  8.49it/s]
                   all       3394     158168    0.00405   1.32e-05    0.00204    0.00032
               regions       3394       5371          0          0          0          0
            pedestrian       3394      41239          0          0          0          0
                people       3394      23003          0          0          0          0
               bicycle       3394       7307     0.0208   0.000137     0.0105    0.00105
                   car       3394      45871     0.0278   2.18e-05     0.0139    0.00279
                   van       3394       8815          0          0          0          0
                 truck       3394       2108          0          0          0          0
              tricycle       3394       4812          0          0          0          0
       awning-tricycle       3394       2233          0          0          0          0
                   bus       3394        515          0          0          0          0
                 motor       3394      16861          0          0          0          0
                others       3394         33          0          0          0          0
Speed: 0.4ms preprocess, 26.0ms inference, 0.0ms loss, 0.2ms postprocess per image
Results saved to runs\detect\train6

Process finished with exit code 0

可以看到这些数据box_loss cls_loss dfl_loss都是nan

原因据说是显卡的bug，16系列的和pytorch的不兼容

问题数据：

正常数据：

解决方法：在 yolov8 的源码 ultralytics\ultralytics\cfg 目录下找到了 default.yaml ，并将 amp 改为 False，重新训练

解决方法还可以参考这个博主写的文章：yolov8 训练自己的数据集后，检测不到目标的解决办法_yolov8 训练模型未检测到-CSDN博客

2、OSError: [WinError 1455] 页面文件太小，无法完成操作

报错日志：

D:\Programs\miniconda3\envs\python311\python.exe D:\python\project\illegal-building\train.py 
New https://pypi.org/project/ultralytics/8.1.16 available 😃 Update with 'pip install -U ultralytics'
Ultralytics YOLOv8.1.9 🚀 Python-3.11.7 torch-2.2.0 CUDA:0 (NVIDIA GeForce GTX 1650, 4096MiB)
engine\trainer: task=detect, mode=train, model=yolov8n.pt, data=class.yaml, epochs=20, time=None, patience=50, batch=5, imgsz=6000, save=True, save_period=-1, cache=False, device=0, workers=8, project=None, name=train4, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, multi_scale=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, vid_stride=1, stream_buffer=False, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, embed=None, show=False, save_frames=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=True, show_boxes=True, line_width=None, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=None, workspace=4, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=7.5, cls=0.5, dfl=1.5, pose=12.0, kobj=1.0, label_smoothing=0.0, nbs=64, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0, auto_augment=randaugment, erasing=0.4, crop_fraction=1.0, cfg=None, tracker=botsort.yaml, save_dir=runs\detect\train4
Overriding model.yaml nc=80 with nc=10

                   from  n    params  module                                       arguments                     
  0                  -1  1       464  ultralytics.nn.modules.conv.Conv             [3, 16, 3, 2]                 
  1                  -1  1      4672  ultralytics.nn.modules.conv.Conv             [16, 32, 3, 2]                
  2                  -1  1      7360  ultralytics.nn.modules.block.C2f             [32, 32, 1, True]             
  3                  -1  1     18560  ultralytics.nn.modules.conv.Conv             [32, 64, 3, 2]                
  4                  -1  2     49664  ultralytics.nn.modules.block.C2f             [64, 64, 2, True]             
  5                  -1  1     73984  ultralytics.nn.modules.conv.Conv             [64, 128, 3, 2]               
  6                  -1  2    197632  ultralytics.nn.modules.block.C2f             [128, 128, 2, True]           
  7                  -1  1    295424  ultralytics.nn.modules.conv.Conv             [128, 256, 3, 2]              
  8                  -1  1    460288  ultralytics.nn.modules.block.C2f             [256, 256, 1, True]           
  9                  -1  1    164608  ultralytics.nn.modules.block.SPPF            [256, 256, 5]                 
 10                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']          
 11             [-1, 6]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
 12                  -1  1    148224  ultralytics.nn.modules.block.C2f             [384, 128, 1]                 
 13                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']          
 14             [-1, 4]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
 15                  -1  1     37248  ultralytics.nn.modules.block.C2f             [192, 64, 1]                  
 16                  -1  1     36992  ultralytics.nn.modules.conv.Conv             [64, 64, 3, 2]                
 17            [-1, 12]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
 18                  -1  1    123648  ultralytics.nn.modules.block.C2f             [192, 128, 1]                 
 19                  -1  1    147712  ultralytics.nn.modules.conv.Conv             [128, 128, 3, 2]              
 20             [-1, 9]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
 21                  -1  1    493056  ultralytics.nn.modules.block.C2f             [384, 256, 1]                 
 22        [15, 18, 21]  1    753262  ultralytics.nn.modules.head.Detect           [10, [64, 128, 256]]          
Model summary: 225 layers, 3012798 parameters, 3012782 gradients, 8.2 GFLOPs

Transferred 319/355 items from pretrained weights
Freezing layer 'model.22.dfl.conv.weight'
AMP: running Automatic Mixed Precision (AMP) checks with YOLOv8n...
AMP: checks passed ✅
WARNING ⚠️ imgsz=[6000] must be multiple of max stride 32, updating to [6016]
train: Scanning E:\DeepLearning\dataset\drone_images_deal\split\train\labels.cache... 1616 images, 0 backgrounds, 0 corrupt: 100%|██████████| 1616/1616 [00:00<?, ?it/s]
val: Scanning E:\DeepLearning\dataset\drone_images_deal\split\valid\labels... 202 images, 0 backgrounds, 0 corrupt: 100%|██████████| 202/202 [00:00<00:00, 1324.57it/s]
val: New cache created: E:\DeepLearning\dataset\drone_images_deal\split\valid\labels.cache
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "D:\Programs\miniconda3\envs\python311\Lib\multiprocessing\spawn.py", line 122, in spawn_main
    exitcode = _main(fd, parent_sentinel)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Programs\miniconda3\envs\python311\Lib\multiprocessing\spawn.py", line 131, in _main
    prepare(preparation_data)
  File "D:\Programs\miniconda3\envs\python311\Lib\multiprocessing\spawn.py", line 246, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "D:\Programs\miniconda3\envs\python311\Lib\multiprocessing\spawn.py", line 297, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen runpy>", line 291, in run_path
  File "<frozen runpy>", line 98, in _run_module_code
  File "<frozen runpy>", line 88, in _run_code
  File "D:\python\project\illegal-building\train.py", line 8, in <module>
    from ultralytics import YOLO
  File "D:\Programs\miniconda3\envs\python311\Lib\site-packages\ultralytics\__init__.py", line 5, in <module>
    from ultralytics.data.explorer.explorer import Explorer
  File "D:\Programs\miniconda3\envs\python311\Lib\site-packages\ultralytics\data\__init__.py", line 3, in <module>
    from .base import BaseDataset
  File "D:\Programs\miniconda3\envs\python311\Lib\site-packages\ultralytics\data\base.py", line 15, in <module>
    from torch.utils.data import Dataset
  File "D:\Programs\miniconda3\envs\python311\Lib\site-packages\torch\__init__.py", line 130, in <module>
    raise err
OSError: [WinError 1455] 页面文件太小，无法完成操作。 Error loading "D:\Programs\miniconda3\envs\python311\Lib\site-packages\torch\lib\shm.dll" or one of its dependencies.

解决方法：分配虚拟空间，参考博文：

“OSError: [WinError 1455]页面文件太小，无法完成操作。”解决方案

3、Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

报错日志：

D:\Programs\miniconda3\envs\python311\python.exe D:\python\project\illegal-building\train.py 
New https://pypi.org/project/ultralytics/8.1.16 available 😃 Update with 'pip install -U ultralytics'
Ultralytics YOLOv8.1.9 🚀 Python-3.11.7 torch-2.2.0 CUDA:0 (NVIDIA GeForce GTX 1650, 4096MiB)
engine\trainer: task=detect, mode=train, model=yolov8n.pt, data=class.yaml, epochs=10, time=None, patience=50, batch=2, imgsz=1440, save=True, save_period=-1, cache=False, device=0, workers=8, project=None, name=train7, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=False, fraction=1.0, profile=False, freeze=None, multi_scale=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, vid_stride=1, stream_buffer=False, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, embed=None, show=False, save_frames=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=True, show_boxes=True, line_width=None, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=None, workspace=4, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=7.5, cls=0.5, dfl=1.5, pose=12.0, kobj=1.0, label_smoothing=0.0, nbs=64, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0, auto_augment=randaugment, erasing=0.4, crop_fraction=1.0, cfg=None, tracker=botsort.yaml, save_dir=runs\detect\train7
Overriding model.yaml nc=80 with nc=10

                   from  n    params  module                                       arguments                     
  0                  -1  1       464  ultralytics.nn.modules.conv.Conv             [3, 16, 3, 2]                 
  1                  -1  1      4672  ultralytics.nn.modules.conv.Conv             [16, 32, 3, 2]                
  2                  -1  1      7360  ultralytics.nn.modules.block.C2f             [32, 32, 1, True]             
  3                  -1  1     18560  ultralytics.nn.modules.conv.Conv             [32, 64, 3, 2]                
  4                  -1  2     49664  ultralytics.nn.modules.block.C2f             [64, 64, 2, True]             
  5                  -1  1     73984  ultralytics.nn.modules.conv.Conv             [64, 128, 3, 2]               
  6                  -1  2    197632  ultralytics.nn.modules.block.C2f             [128, 128, 2, True]           
  7                  -1  1    295424  ultralytics.nn.modules.conv.Conv             [128, 256, 3, 2]              
  8                  -1  1    460288  ultralytics.nn.modules.block.C2f             [256, 256, 1, True]           
  9                  -1  1    164608  ultralytics.nn.modules.block.SPPF            [256, 256, 5]                 
 10                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']          
 11             [-1, 6]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
 12                  -1  1    148224  ultralytics.nn.modules.block.C2f             [384, 128, 1]                 
 13                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']          
 14             [-1, 4]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
 15                  -1  1     37248  ultralytics.nn.modules.block.C2f             [192, 64, 1]                  
 16                  -1  1     36992  ultralytics.nn.modules.conv.Conv             [64, 64, 3, 2]                
 17            [-1, 12]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
 18                  -1  1    123648  ultralytics.nn.modules.block.C2f             [192, 128, 1]                 
 19                  -1  1    147712  ultralytics.nn.modules.conv.Conv             [128, 128, 3, 2]              
 20             [-1, 9]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
 21                  -1  1    493056  ultralytics.nn.modules.block.C2f             [384, 256, 1]                 
 22        [15, 18, 21]  1    753262  ultralytics.nn.modules.head.Detect           [10, [64, 128, 256]]          
Model summary: 225 layers, 3012798 parameters, 3012782 gradients, 8.2 GFLOPs

Transferred 319/355 items from pretrained weights
Freezing layer 'model.22.dfl.conv.weight'
train: Scanning E:\DeepLearning\dataset\drone_images_deal\split\train\labels.cache... 1616 images, 0 backgrounds, 0 corrupt: 100%|██████████| 1616/1616 [00:00<?, ?it/s]
val: Scanning E:\DeepLearning\dataset\drone_images_deal\split\valid\labels.cache... 202 images, 0 backgrounds, 0 corrupt: 100%|██████████| 202/202 [00:00<?, ?it/s]
Plotting labels to runs\detect\train7\labels.jpg... 
optimizer: 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically... 
optimizer: AdamW(lr=0.000714, momentum=0.9) with parameter groups 57 weight(decay=0.0), 64 weight(decay=0.0005), 63 bias(decay=0.0)
Image sizes 1440 train, 1440 val
Using 8 dataloader workers
Logging results to runs\detect\train7
Starting training for 10 epochs...
Closing dataloader mosaic

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
       1/10      2.52G       2.42      7.816      1.957          4       1440:  57%|█████▋    | 459/808 [02:43<02:02,  2.86it/s]C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [888,0,0], thread: [64,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [888,0,0], thread: [65,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [888,0,0], thread: [66,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [888,0,0], thread: [67,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [888,0,0], thread: [68,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [888,0,0], thread: [69,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [888,0,0], thread: [70,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [888,0,0], thread: [71,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [888,0,0], thread: [72,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [888,0,0], thread: [73,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [888,0,0], thread: [74,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [888,0,0], thread: [75,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [888,0,0], thread: [76,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [888,0,0], thread: [77,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [888,0,0], thread: [78,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [888,0,0], thread: [79,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [888,0,0], thread: [80,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [888,0,0], thread: [81,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [888,0,0], thread: [82,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [888,0,0], thread: [83,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [888,0,0], thread: [84,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [888,0,0], thread: [85,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [888,0,0], thread: [86,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [888,0,0], thread: [87,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [888,0,0], thread: [88,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [888,0,0], thread: [89,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [888,0,0], thread: [90,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [888,0,0], thread: [91,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [888,0,0], thread: [92,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [888,0,0], thread: [93,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [888,0,0], thread: [94,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [888,0,0], thread: [95,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [836,0,0], thread: [0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [836,0,0], thread: [1,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [836,0,0], thread: [2,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [836,0,0], thread: [3,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [836,0,0], thread: [4,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [836,0,0], thread: [5,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [836,0,0], thread: [6,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [836,0,0], thread: [7,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [836,0,0], thread: [8,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [836,0,0], thread: [9,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [836,0,0], thread: [10,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [836,0,0], thread: [11,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [836,0,0], thread: [12,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [836,0,0], thread: [13,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [836,0,0], thread: [14,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [836,0,0], thread: [15,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [836,0,0], thread: [16,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [836,0,0], thread: [17,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [836,0,0], thread: [18,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [836,0,0], thread: [19,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [836,0,0], thread: [20,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [836,0,0], thread: [21,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [836,0,0], thread: [22,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [836,0,0], thread: [23,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [836,0,0], thread: [24,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [836,0,0], thread: [25,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [836,0,0], thread: [26,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [836,0,0], thread: [27,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [836,0,0], thread: [28,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [836,0,0], thread: [29,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [836,0,0], thread: [30,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [836,0,0], thread: [31,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [888,0,0], thread: [96,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [888,0,0], thread: [97,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [888,0,0], thread: [98,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [888,0,0], thread: [99,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [888,0,0], thread: [100,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [888,0,0], thread: [101,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [888,0,0], thread: [102,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [888,0,0], thread: [103,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [888,0,0], thread: [104,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [888,0,0], thread: [105,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [888,0,0], thread: [106,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [888,0,0], thread: [107,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [888,0,0], thread: [108,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [888,0,0], thread: [109,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [888,0,0], thread: [110,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [888,0,0], thread: [111,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [888,0,0], thread: [112,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [888,0,0], thread: [113,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [888,0,0], thread: [114,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [888,0,0], thread: [115,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [888,0,0], thread: [116,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [888,0,0], thread: [117,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [888,0,0], thread: [118,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [888,0,0], thread: [119,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [888,0,0], thread: [120,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [888,0,0], thread: [121,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [888,0,0], thread: [122,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [888,0,0], thread: [123,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [888,0,0], thread: [124,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [888,0,0], thread: [125,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [888,0,0], thread: [126,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [888,0,0], thread: [127,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
       1/10      2.52G       2.42      7.816      1.957          4       1440:  57%|█████▋    | 459/808 [02:43<02:04,  2.81it/s]
Traceback (most recent call last):
  File "D:\python\project\illegal-building\train.py", line 17, in <module>
    results = model.train(data="class.yaml", imgsz=1440, epochs=10, batch=2,device=0)  # 训练模型
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Programs\miniconda3\envs\python311\Lib\site-packages\ultralytics\engine\model.py", line 601, in train
    self.trainer.train()
  File "D:\Programs\miniconda3\envs\python311\Lib\site-packages\ultralytics\engine\trainer.py", line 208, in train
    self._do_train(world_size)
  File "D:\Programs\miniconda3\envs\python311\Lib\site-packages\ultralytics\engine\trainer.py", line 376, in _do_train
    self.loss, self.loss_items = self.model(batch)
                                 ^^^^^^^^^^^^^^^^^
  File "D:\Programs\miniconda3\envs\python311\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Programs\miniconda3\envs\python311\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Programs\miniconda3\envs\python311\Lib\site-packages\ultralytics\nn\tasks.py", line 79, in forward
    return self.loss(x, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Programs\miniconda3\envs\python311\Lib\site-packages\ultralytics\nn\tasks.py", line 258, in loss
    return self.criterion(preds, batch)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Programs\miniconda3\envs\python311\Lib\site-packages\ultralytics\utils\loss.py", line 221, in __call__
    _, target_bboxes, target_scores, fg_mask, _ = self.assigner(
                                                  ^^^^^^^^^^^^^^
  File "D:\Programs\miniconda3\envs\python311\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Programs\miniconda3\envs\python311\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Programs\miniconda3\envs\python311\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "D:\Programs\miniconda3\envs\python311\Lib\site-packages\ultralytics\utils\tal.py", line 72, in forward
    mask_pos, align_metric, overlaps = self.get_pos_mask(
                                       ^^^^^^^^^^^^^^^^^^
  File "D:\Programs\miniconda3\envs\python311\Lib\site-packages\ultralytics\utils\tal.py", line 94, in get_pos_mask
    align_metric, overlaps = self.get_box_metrics(pd_scores, pd_bboxes, gt_labels, gt_bboxes, mask_in_gts * mask_gt)
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Programs\miniconda3\envs\python311\Lib\site-packages\ultralytics\utils\tal.py", line 113, in get_box_metrics
    bbox_scores[mask_gt] = pd_scores[ind[0], :, ind[1]][mask_gt]  # b, max_num_obj, h*w
                           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.


Process finished with exit code 1

问题原因：

改成device='cpu'，查看具体报错

4、IndexError: index 10 is out of bounds for dimension 1 with size 10

解决方法见我的这篇博文：

模型训练时报错IndexError: index 10 is out of bounds for dimension 1 with size 10-CSDN博客

5、使用GPU训练，torch.cuda.OutOfMemoryError: CUDA out of memory

解决方法参考这篇：模型训练时报错Failed to allocate 12192768 bytes in function ‘cv::OutOfMemoryError‘-CSDN博客

2048 AI社区

有“AI”的1024 = 2048，欢迎大家加入2048 AI社区

更多推荐

AI导出pdf指令

2048 AI社区

大模型调教指南：如何通过人工智能让AI更懂你的业务？

大语言模型（LLM）本质上是一个概率训练生成系统。它的记忆来源于训练数据剪裁日期。你问它的股市，它可能在胡思乱想。它是通过概率预测下一个字，逻辑推理（尤其是数学和复杂业务逻辑）很容易翻车。每个企业都有自己的“黑话”和原生数据。通用模型不懂你们公司的区域、设置架构，也不懂一些罕见病的临床表现。它会一本正经地胡说八道，生成场景合理但实际错误的内容。调优（Fine-tuning）就是训练解决这些问题的核