💡 Buying advice (by budget & needs)

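| Goal | Recommended GPU | Price range (CNY) | Rationale |
| --- | --- | --- | --- |
| Limited budget, just want to try AI for fun | RTX 2080 Ti (modded to 22 GB) | ¥2,000 – ¥3,000 | Outstanding value. With the 22 GB memory mod it can run inference for small LLMs such as quantized 7B models, which is enough for learning the basics. However, the Turing architecture is old and has no native FP8, BF16, or TF32 support. |
| Mid-range budget, stability & balanced performance | RTX 3090 | ¥5,500 – ¥6,000 | 24 GB of VRAM plus the Ampere architecture with BF16/TF32 support. It runs quantized models up to 13B comfortably and, with the same 24 GB as the 4090, can also fit 4-bit models of around 30B, only more slowly. Good for both training and inference; the legendary "AI god card" among older GPUs. |
| Cutting-edge AI capability & best high-end value | RTX 4090 / 4090D | ¥15,000 – ¥20,000 | Ada Lovelace architecture with native FP8, DLSS 3, and very high Tensor throughput. Its 24 GB of GDDR6X at about 1 TB/s runs quantized 13B~30B models smoothly. Currently the most recommended high-end AI card, and still excellent value even after the 5090 launch. The 4090D is a slightly cut-down China-market variant; the spec table lists the full 4090. |
| Top budget, early adopter of the latest tech | RTX 5090 | ¥20,000 – ¥25,000 (estimated) | Blackwell architecture with FP4/FP8 support, 32 GB of GDDR7, and PCIe 5.0. A large performance jump, and the extra memory suits bigger local inference or fine-tuning jobs. However, it is expensive and offers worse value than the 4090. |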
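The model sizes quoted above follow mostly from VRAM capacity. How fast a card generates tokens follows mostly from memory bandwidth. Below is a rough Python sketch of both estimates, not a measurement, and it rests on several assumptions:

- Weights take exactly the nominal bits per parameter, ignoring quantization scales.
- A flat 20% overhead covers the KV cache, activations, and runtime context. This is a hypothetical figure; real overhead grows with context length and batch size.
- Single-stream decoding is memory-bound, so each generated token reads every weight once.
- The 22 GB mod keeps the stock 2080 Ti bandwidth of 616 GB/s.

The bandwidth numbers come from the spec table below.

```python
"""Back-of-envelope LLM inference sizing for the recommended cards.

Assumptions (rules of thumb, not measurements):
  * weights take params * bits / 8 bytes; quantization scales are ignored
  * OVERHEAD adds 20% for KV cache, activations and runtime context (hypothetical)
  * single-stream decoding is memory-bound: each token reads every weight once
"""

OVERHEAD = 0.20

# name: (VRAM in GB, memory bandwidth in GB/s).
# Assumption: the 22 GB mod keeps the stock 2080 Ti bandwidth.
CARDS = {
    "RTX 2080 Ti (22 GB mod)": (22, 616),
    "RTX 3090": (24, 936),
    "RTX 4090": (24, 1008),
    "RTX 5090": (32, 1792),
}


def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Weight size in GB (1 GB = 1e9 bytes): 1e9 params * bits / 8 bytes."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float, vram_gb: float) -> bool:
    """True if the weights plus the assumed overhead fit in VRAM."""
    return weight_gb(params_billion, bits_per_weight) * (1 + OVERHEAD) <= vram_gb


def max_tokens_per_s(params_billion: float, bits_per_weight: float,
                     bandwidth_gbs: float) -> float:
    """Upper bound on decode speed if every token streams all weights once."""
    return bandwidth_gbs / weight_gb(params_billion, bits_per_weight)


if __name__ == "__main__":
    # (parameters in billions, bits per weight)
    models = [(7, 4), (7, 16), (13, 4), (13, 8), (13, 16), (30, 4), (70, 4)]
    for name, (vram, bw) in CARDS.items():
        runnable = [
            f"{p}B@{b}bit (<= {max_tokens_per_s(p, b, bw):.0f} tok/s)"
            for p, b in models
            if fits(p, b, vram)
        ]
        print(f"{name}: " + ", ".join(runnable))
```

Under these assumptions, 4-bit 30B models fit on every card listed. An unquantized 13B model (about 31 GB with overhead) fits only on the 32 GB 5090, and a 4-bit 70B model fits on none of them. The tokens-per-second figures are ceilings; real runtimes land below them.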

Summary

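  • "Just playing with AI" → modded RTX 2080 Ti (the 22 GB is what matters)
  • "Getting serious about small projects" → RTX 3090
  • "Cutting-edge experience & long-term use" → RTX 4090 (first choice) or RTX 5090 (for early adopters)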

📊 Core specification comparison

Note: some cells give two values as "A / B". For the RTX 2080 Ti, this means the value at the reference clock / Founders Edition clock. For the other three cards, it means the dense / 2:4 structured-sparsity value (Tensor rows only).

| Category | Parameter | RTX 2080 Ti | RTX 3090 | RTX 4090 | RTX 5090 |
| --- | --- | --- | --- | --- | --- |
| Basic info | GPU codename | TU102 | GA102 | AD102 | GB202 |
| | GPU architecture | NVIDIA Turing | NVIDIA Ampere | NVIDIA Ada Lovelace | NVIDIA Blackwell |
| | GPCs (Graphics Processing Clusters) | 6 | 7 | 11 | 11 |
| | TPCs (Texture Processing Clusters) | 34 | 41 | 64 | 85 |
| | SMs (Streaming Multiprocessors) | 68 | 82 | 128 | 170 |
| | CUDA cores / SM | 64 | 128 | 128 | 128 |
| | CUDA cores / GPU | 4,352 | 10,496 | 16,384 | 21,760 |
| | Tensor cores / SM | 8 (2nd gen) | 4 (3rd gen) | 4 (4th gen) | 4 (5th gen) |
| | Tensor cores / GPU | 544 | 328 | 512 | 680 |
| | RT cores | 68 (1st gen) | 82 (2nd gen) | 128 (3rd gen) | 170 (4th gen) |
| | GPU boost clock (MHz) | 1,545 / 1,635 | 1,695 | 2,520 | 2,407 |
| CUDA (non-Tensor) performance | Peak FP16 TFLOPS | 26.9 / 28.5 | 35.6 | 82.6 | 104.8 |
| | Peak BF16 TFLOPS | N/A | 35.6 | 82.6 | 104.8 |
| | Peak FP32 TFLOPS | 13.4 / 14.2 | 35.6 | 82.6 | 104.8 |
| | Peak INT32 TOPS | 13.4 / 14.2 | 17.8 | 41.3 | 104.8 |
| | Ray-tracing performance (units differ, not directly comparable) | 10 Giga Rays/s | 69.5 RT TFLOPS | 191 RT TFLOPS | 317.5 RT TFLOPS |
| Tensor performance | Peak FP4 Tensor TFLOPS (FP32 accumulate) | N/A | N/A | N/A | 1,676 / 3,352 |
| | Peak INT4 Tensor TOPS | 430.3 / 455.4 | 568 / 1,136 | 1,321.2 / 2,642.4 | N/A |
| | Peak FP8 Tensor TFLOPS (FP16 accumulate) | N/A | N/A | 660.6 / 1,321.2 | 838 / 1,676 |
| | Peak FP8 Tensor TFLOPS (FP32 accumulate) | N/A | N/A | 330.3 / 660.6 | 419 / 838 |
| | Peak INT8 Tensor TOPS | 215.2 / 227.7 | 284 / 568 | 660.6 / 1,321.2 | 838 / 1,676 |
| | Peak FP16 Tensor TFLOPS (FP16 accumulate) | 107.6 / 113.8 | 142 / 284 | 330.3 / 660.6 | 419 / 838 |
| | Peak FP16 Tensor TFLOPS (FP32 accumulate) | 53.8 / 56.9 | 71 / 142 | 165.2 / 330.4 | 209.5 / 419 |
| | Peak BF16 Tensor TFLOPS (FP32 accumulate) | N/A | 71 / 142 | 165.2 / 330.4 | 209.5 / 419 |
| | Peak TF32 Tensor TFLOPS | N/A | 35.6 / 71 | 82.6 / 165.2 | 104.8 / 209.5 |
| Memory system | Memory size and type | 11 GB GDDR6 (22 GB on modded cards) | 24 GB GDDR6X | 24 GB GDDR6X | 32 GB GDDR7 |
| | Memory interface | 352-bit | 384-bit | 384-bit | 512-bit |
| | Memory data rate (per pin) | 14 Gbps | 19.5 Gbps | 21 Gbps | 28 Gbps |
| | Memory bandwidth | 616 GB/s | 936 GB/s | 1,008 GB/s | 1,792 GB/s |
| Render output | ROPs | 88 | 112 | 176 | 176 |
| | Pixel fill rate (GPixel/s) | N/A | 193 | 443.5 | 423.6 |
| | Texture units | 272 | 328 | 512 | 680 |
| | Texel fill rate (GTexel/s) | 420.2 / 444.7 | 566 | 1,290.2 | 1,636.8 |
| Cache & registers | L1 data cache / shared memory per SM (KB) | 96 | 128 | 128 | 128 |
| | L2 cache per GPU (KB) | 5,632 | 6,144 | 73,728 | 98,304 |
| | Register file per SM (KB) | 256 | 256 | 256 | 256 |
| Other | Video engines | 1× NVENC (7th gen), 1× NVDEC (4th gen) | 1× NVENC (7th gen), 1× NVDEC (5th gen) | 2× NVENC (8th gen), 1× NVDEC (5th gen) | 3× NVENC (9th gen), 2× NVDEC (6th gen) |
| | TGP (Total Graphics Power) (W) | 250 / 260 | 350 | 450 | 575 |
| | Transistor count (billion) | 18.6 | 28.3 | 76.3 | 92.2 |
| | Die size (mm²) | 754 | 628.4 | 608.5 | 750 |
| | Manufacturing process | TSMC 12 nm FFN | Samsung 8 nm (8N) | TSMC 4N | TSMC 4N |
| | PCI Express interface | Gen 3 | Gen 4 | Gen 4 | Gen 5 |
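Several rows of the table can be re-derived from other rows, which is a quick way to catch transcription errors. The sketch below assumes two things:

- Each CUDA core retires one fused multiply-add (2 FLOPs) per clock at the listed boost clock. The 2080 Ti uses its reference clock.
- Peak memory bandwidth is the bus width in bytes times the per-pin data rate.

```python
"""Re-derive peak FP32 throughput and memory bandwidth from the spec table."""

# name: (CUDA cores, boost clock in MHz, bus width in bits, data rate in Gbps)
# The 2080 Ti entry uses the reference boost clock (1,545 MHz).
SPECS = {
    "RTX 2080 Ti": (4352, 1545, 352, 14.0),
    "RTX 3090": (10496, 1695, 384, 19.5),
    "RTX 4090": (16384, 2520, 384, 21.0),
    "RTX 5090": (21760, 2407, 512, 28.0),
}


def fp32_tflops(cuda_cores: int, boost_mhz: float) -> float:
    """Peak non-Tensor FP32: one FMA (2 FLOPs) per core per clock."""
    return cuda_cores * 2 * boost_mhz * 1e6 / 1e12


def bandwidth_gbs(bus_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth: bytes per transfer across the bus x per-pin rate."""
    return bus_bits / 8 * data_rate_gbps


if __name__ == "__main__":
    for name, (cores, mhz, bus, rate) in SPECS.items():
        print(f"{name}: {fp32_tflops(cores, mhz):.1f} TFLOPS FP32, "
              f"{bandwidth_gbs(bus, rate):,.0f} GB/s")
```

Running it reproduces the table's 13.4 / 35.6 / 82.6 / 104.8 TFLOPS and 616 / 936 / 1,008 / 1,792 GB/s. Plugging in the 2080 Ti Founders Edition clock (1,635 MHz) gives the 14.2 TFLOPS value after the slash.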

