Nvidia GPU Architecture

From Jack's Lab
Revision as of 14:30, 23 March 2017 (Thu) by Comcat


1 Overview

The GPU architecture is built around a scalable array of Streaming Multiprocessors (SMs).

The key components of an SM:

  • CUDA cores (ALU + FPU)
  • Double Precision Units (DPU)
  • Special Function Units (SFU, abbreviated SPU on this page)
  • Load/Store Units (LD/ST)
  • Register File
  • Shared Memory/L1 Cache
  • Warp Scheduler



2 Fermi Micro Architecture

NVIDIA positioned Fermi as the first complete GPU computing architecture, delivering the features required for the most demanding HPC applications.


Fermi-arch.png


  • 1 SM: 32 CUDA cores + 16 Load/Store Units + 4 SPU + 2 Warp Schedulers
  • 1 SM: 2 warps issued concurrently (one per Warp Scheduler)
  • 1 Warp Scheduler: issues to 16 CUDA cores, or to the 16 Load/Store Units (shared) or the 4 SPU (shared); a warp is 32 threads
  • Up to 48 warps resident per SM at a time, for a total of 1536 (48x32) threads
  • 1 CUDA core: 1 ALU + 1 FPU
  • Register file: 32K (32,768) 32-bit registers per SM, i.e. 128 KB
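
The resident-thread figure in the list above follows directly from the warp count. A minimal Python sketch, assuming the usual warp size of 32 threads; the function name `resident_threads` is illustrative and not part of any NVIDIA API:

```python
WARP_SIZE = 32  # threads per warp (assumed constant across the architectures on this page)

def resident_threads(warps_per_sm: int, warp_size: int = WARP_SIZE) -> int:
    """Maximum number of threads that can be resident on one SM at a time."""
    return warps_per_sm * warp_size

# Fermi: 48 resident warps per SM
print(resident_threads(48))  # 1536
```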

GTX480:

  • 15 SM (32 CUDA cores/SM)
  • 480 CUDA cores
  • 1345 GFLOPS (single precision)
  • 40 nm
  • 3.2 billion transistors
  • 250 W
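
The GTX480 throughput figure is consistent with counting one fused multiply-add (2 floating-point operations) per CUDA core per shader clock. The sketch below assumes a shader clock of 1401 MHz, a value that is not stated on this page; `peak_gflops` is a hypothetical helper name.

```python
def peak_gflops(cuda_cores: int, shader_clock_mhz: float, flops_per_cycle: int = 2) -> float:
    """Theoretical single-precision peak: cores x FLOPs per cycle x clock."""
    return cuda_cores * flops_per_cycle * shader_clock_mhz / 1000.0

# 15 SMs x 32 CUDA cores per SM; 1401 MHz is an assumed shader clock
gtx480 = peak_gflops(cuda_cores=15 * 32, shader_clock_mhz=1401)
print(round(gtx480))  # 1345
```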


2.1 Video Cards

2.1.1 GeForce 400 Series

  • Release date: April 12, 2010
  • Codename: GF10x
  • Architecture: Fermi
  • Models:
  1. GeForce Series
  2. GeForce GT Series
  3. GeForce GTS Series
  4. GeForce GTX Series
  • Fabrication process and transistors:
  1. 260M 40 nm (GT218)
  2. 585M 40 nm (GF108)
  3. 1,170M 40 nm (GF106)
  4. 1,950M 40 nm (GF104)
  5. 1,950M 40 nm (GF114)
  6. 3,200M 40 nm (GF100)
  • Cards:
  1. Entry-level: GT420, GT430
  2. Mid-range: GT440, GTS450, GTX460
  3. High-end: GTX465, GTX470
  4. Enthusiast: GTX480 (2010.3, 3.2 billion transistors, 15 SMs, 1536 MB, 1345 GFLOPS, 250 W)



2.1.2 GeForce 500 Series

  • Release date: November 8, 2010
  • Codename: GF11x
  • Architecture: Fermi
  • Models:
  1. GeForce Series
  2. GeForce GT Series
  3. GeForce GTX Series
  • Fabrication process and transistors:
  1. 292M 40 nm (GF119)
  2. 585M 40 nm (GF108)
  3. 1,170M 40 nm (GF116)
  4. 1,950M 40 nm (GF114)
  5. 3,000M 40 nm (GF110)
  • Cards:
  1. Entry-level: 510, GT520, GT530
  2. Mid-range: GT545, GTX550Ti, GTX560, GTX560Ti
  3. High-end: GTX570, GTX580, GTX590 (2011.3, 2x3 billion transistors, 32 SMs, 2x1536 MB, 2488 GFLOPS, 365 W)



2.2 GPGPU Cards



3 Kepler Micro Architecture

Kepler-SM-arch.png

Kepler-arch.jpg


The first Kepler GeForce cards shipped in March 2012; the GK110-based Tesla K20 series followed in the fall of 2012.

  • 1 SM (SMX): 4 Warp Schedulers (2 instruction dispatch units per Warp Scheduler)
  • 1 Warp: 32 threads
  • 1 SM: 192 CUDA cores + 64 DPU (shared) + 32 Load/Store Units (shared) + 32 SPU (shared) + 4 Warp Schedulers
  • Up to 64 warps resident per SM at a time, for a total of 2048 (64x32) threads
  • Register file: 64K (65,536) 32-bit registers per SM, i.e. 256 KB
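
The per-SM limits in the list above can be combined into a few derived numbers. This is a rough sketch, taking the register file as 65,536 32-bit registers (an assumption about what "64K" means here); the class `SmLimits` and its field names are made up for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SmLimits:
    warp_schedulers: int
    dispatch_per_scheduler: int
    max_warps: int
    registers: int          # number of 32-bit registers (assumed interpretation)
    warp_size: int = 32

    @property
    def dispatch_units(self) -> int:
        """Instruction dispatch units per SM."""
        return self.warp_schedulers * self.dispatch_per_scheduler

    @property
    def max_threads(self) -> int:
        """Maximum resident threads per SM."""
        return self.max_warps * self.warp_size

    @property
    def registers_per_thread_at_full_occupancy(self) -> int:
        """Registers available to each thread when every warp slot is in use."""
        return self.registers // self.max_threads

kepler = SmLimits(warp_schedulers=4, dispatch_per_scheduler=2,
                  max_warps=64, registers=64 * 1024)
print(kepler.dispatch_units,
      kepler.max_threads,
      kepler.registers_per_thread_at_full_occupancy)  # 8 2048 32
```

A kernel that needs more than 32 registers per thread would therefore not be able to keep all 64 warp slots occupied under these assumptions.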


Kepler-GK110-arch.jpg


K20X:

  • 14 SM (SMX)
  • 2688 CUDA cores, 6 GB
  • 3.935 TFLOPS single precision / 1.312 TFLOPS double precision (DPU)
  • 28 nm
  • 235 W
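
The K20X double-precision figure matches the ratio of DPUs to CUDA cores in a Kepler SM (64 to 192, i.e. 1/3). A small sketch of that arithmetic, assuming double-precision throughput scales linearly with the unit ratio; the helper name `dp_peak_tflops` is hypothetical.

```python
def dp_peak_tflops(sp_tflops: float, dp_units_per_sm: int, cuda_cores_per_sm: int) -> float:
    """Scale the single-precision peak by the DPU-to-CUDA-core ratio."""
    return sp_tflops * dp_units_per_sm / cuda_cores_per_sm

k20x_cores = 14 * 192  # 14 SMs x 192 CUDA cores per SM
k20x_dp = dp_peak_tflops(3.935, dp_units_per_sm=64, cuda_cores_per_sm=192)

print(k20x_cores)                # 2688
print(f"{k20x_dp:.3f} TFLOPS")   # 1.312 TFLOPS
```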

GTX690:

  • 2x8 SM
  • 3072 CUDA cores
  • 2x2.8 TFLOPS
  • 2x3.54 billion transistors
  • 300 W (2012.4)



3.1 Video Cards

3.1.1 GeForce 600 series
  • Release date: March 22, 2012
  • Codename: GK10x
  • Models
  1. GeForce Series
  2. GeForce GT Series
  3. GeForce GTX Series
  • Fabrication process and transistors:
  1. 292M 40 nm (GF119)
  2. 585M 40 nm (GF108)
  3. 1,170M 40 nm (GF116)
  4. 1,950M 40 nm (GF114)
  5. 1,270M 28 nm (GK107)
  6. 1,270M 28 nm (GK208)
  7. 2,540M 28 nm (GK106)
  8. 3,540M 28 nm (GK104)
  • Cards:
  1. Entry-level: GT610, GT620, GT630, GT640
  2. Mid-range: GTX650, GTX650Ti, GTX650Ti Boost, GTX660
  3. High-end: GTX660Ti, GTX670
  4. Enthusiast: GTX680, GTX690


3.1.2 GeForce 700 series
  • Release date: May 2013
  • Codename: GK110 GK208
  • Models:
  1. GeForce Series
  2. GeForce GT Series
  3. GeForce GTX Series
  • Fabrication process and transistors:
  1. 585M 28 nm (GF117)
  2. 1,020M 28 nm (GK208)
  3. 1,270M 28 nm (GK107)
  4. 3,540M 28 nm (GK104)
  5. 7,080M 28 nm (GK110)
  • Cards:
  1. Entry-level: GeForce GT 705, GeForce GT 710, GeForce GT 720, GeForce GT 730, GeForce GT 740, GeForce GTX 745
  2. Mid-range: GeForce GTX 750, GeForce GTX 750 Ti, GeForce GTX 760 192-Bit, GeForce GTX 760, GeForce GTX 760 Ti
  3. High-end: GeForce GTX 770, GeForce GTX 780
  4. Enthusiast: GeForce GTX 780 Ti, GeForce GTX Titan, GeForce GTX Titan Black, GeForce GTX Titan Z



3.2 GPGPU Cards



4 Maxwell Micro Architecture

Maxwell-GTX980-SM-arch.png


Maxwell-arch.png


  • 1 SM (SMM): 4 Warp Schedulers (2 instruction dispatch units per Warp Scheduler)
  • 1 Warp Scheduler partition: 32 CUDA cores + 1 DPU + 8 Load/Store Units + 8 SPU
  • 1 SM (SMM): 128 CUDA cores + 4 DPU + 32 Load/Store Units + 32 SPU


GTX980:

  • 16 SM (SMM)
  • 2048 CUDA cores
  • 4612 GFLOPS single precision / 144 GFLOPS double precision (DPU)
  • 28 nm
  • 5.2 billion transistors
  • 165W
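
The GTX980 totals can be cross-checked against the per-SM counts for Maxwell. The sketch below assumes double-precision throughput scales with the DPU-to-core ratio (4 to 128, i.e. 1/32); `total_units` is an illustrative helper name.

```python
def total_units(sm_count: int, units_per_sm: int) -> int:
    """Number of units of one kind across the whole chip."""
    return sm_count * units_per_sm

SM_COUNT = 16                        # GTX980
cores = total_units(SM_COUNT, 128)   # CUDA cores per SMM
dpus = total_units(SM_COUNT, 4)      # DPUs per SMM

# Assumption: DP peak = SP peak x (DPUs / CUDA cores)
dp_gflops = 4612 * dpus / cores
print(cores, dpus, dp_gflops)  # 2048 64 144.125
```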


TITAN X:

TITAN-X-arch.png


4.1 Video Cards

4.1.1 GeForce 900 series

  • Release date: September 2014
  • Codename: GM20x
  • Models
  1. GeForce Series
  2. GeForce GT Series
  3. GeForce GTX Series
  • Cards
  1. Mid-range GTX950 / GTX960
  2. High-end GTX970 / GTX980
  3. Enthusiast GTX980 Ti / GTX Titan X



4.2 GPGPU Cards



5 Pascal Micro Architecture

The SM arch of Pascal GP100

Pascal-GP100-SM-arch.png

  • 1 SM: 2 Warp Schedulers (2 instruction dispatch units per Warp Scheduler)
  • 1 Warp Scheduler partition: 32 CUDA cores + 16 DPU + 8 Load/Store Units + 8 SPU
  • 1 SM: 64 CUDA cores + 32 DPU + 16 Load/Store Units + 16 SPU
  • e.g. Tesla P100: 60 SMs (56 enabled), 3584 CUDA cores, 1792 DPUs, 16 GB, 9.5 TFLOPS single precision / 4.7 TFLOPS double precision, 300 W
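
The difference between the 60 SMs on the GP100 die and the 56 enabled on Tesla P100 shows up directly in the unit totals. A short sketch using the per-SM counts from the list above; `chip_totals` is a hypothetical helper.

```python
CORES_PER_SM = 64   # GP100 CUDA cores per SM
DPUS_PER_SM = 32    # GP100 DPUs per SM

def chip_totals(sm_count: int) -> dict:
    """CUDA core and DPU totals for a given number of active SMs."""
    return {"cuda_cores": sm_count * CORES_PER_SM,
            "dpus": sm_count * DPUS_PER_SM}

full_die = chip_totals(60)     # all SMs on the die
tesla_p100 = chip_totals(56)   # SMs enabled on Tesla P100

print(full_die)    # {'cuda_cores': 3840, 'dpus': 1920}
print(tesla_p100)  # {'cuda_cores': 3584, 'dpus': 1792}

# With one DPU per two CUDA cores, DP peak is roughly half the SP peak
dp_to_sp_ratio = DPUS_PER_SM / CORES_PER_SM
print(dp_to_sp_ratio)  # 0.5
```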


The arch of Pascal GP100:

Pascal-GP100-arch.png


The SM arch of Pascal GP104

Pascal-GP104-SM-arch.png

  • 1 SM: 4 Warp Schedulers (2 instruction dispatch units per Warp Scheduler)
  • 1 Warp Scheduler partition: 32 CUDA cores + 1 DPU + 8 Load/Store Units + 8 SPU
  • 1 SM: 128 CUDA cores + 4 DPU + 32 Load/Store Units + 32 SPU
  • e.g. GTX1080 (GP104); GTX1080 Ti and TITAN X (GP102) use the same SM design
  • GTX1080 (GP104): 20 SMs, 2560 CUDA cores, 80 DPUs, 16 nm, 7.2 billion transistors, 8 GB, 8.2 TFLOPS single precision / 257 GFLOPS double precision, 180 W
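
Both GTX1080 throughput figures can be reproduced from the unit counts, which also explains why the double-precision value is 257 rather than 8200/32. The sketch assumes a base clock of 1607 MHz (not given on this page) and one fused multiply-add, i.e. 2 FLOPs, per unit per cycle; the function names are illustrative.

```python
BASE_CLOCK_MHZ = 1607  # assumed GTX1080 base clock; not stated on this page

def sp_gflops(sm_count: int, cores_per_sm: int, clock_mhz: float) -> float:
    """Single-precision peak: 2 FLOPs (one FMA) per CUDA core per cycle."""
    return sm_count * cores_per_sm * 2 * clock_mhz / 1000.0

def dp_gflops(sm_count: int, dpus_per_sm: int, clock_mhz: float) -> float:
    """Double-precision peak: 2 FLOPs (one FMA) per DPU per cycle."""
    return sm_count * dpus_per_sm * 2 * clock_mhz / 1000.0

sp = sp_gflops(20, 128, BASE_CLOCK_MHZ)
dp = dp_gflops(20, 4, BASE_CLOCK_MHZ)
print(f"{sp / 1000:.1f} TFLOPS, {dp:.0f} GFLOPS")  # 8.2 TFLOPS, 257 GFLOPS
```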


The arch of Pascal GP104:

Pascal-GP104-arch.png


5.1 Video Cards

5.1.1 GeForce 1000 series

  • Release date: May 2016
  • Codename: GP10x
  • Models
  1. GeForce GTX Series
  • Fabrication process and transistors:
  1. 3.3B 14 nm (GP107)
  2. 4.4B 16 nm (GP106)
  3. 7.2B 16 nm (GP104)
  4. 12B 16 nm (GP102)
  • Cards:
  1. Entry-level: GTX1050 / GTX1050 Ti
  2. Mid-range: GTX1060
  3. High-end: GTX1070 / GTX1080 (2016.5, 2560 CUDA cores, 20 SMs, 16 nm, 7.2 billion transistors, 8 GB, 8.2 TFLOPS / DPU: 0.257 TFLOPS, 180 W)
  4. Enthusiast: GTX1080 Ti / NVIDIA Titan X (2016.8, 3584 CUDA cores, 28 SMs, 16 nm, 12 billion transistors, 12 GB, 10 TFLOPS / DPU: 0.317 TFLOPS, 250 W)
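
One way to compare the generations covered on this page is single-precision throughput per watt. The sketch below uses only the GFLOPS and power figures quoted on this page for GTX480, GTX980 and GTX1080; it is a back-of-the-envelope comparison of peak numbers, not a measured result.

```python
# (single-precision GFLOPS, board power in W) as quoted on this page
cards = {
    "GTX480 (Fermi)": (1345, 250),
    "GTX980 (Maxwell)": (4612, 165),
    "GTX1080 (Pascal)": (8200, 180),
}

def gflops_per_watt(gflops: float, watts: float) -> float:
    """Peak single-precision throughput divided by rated power."""
    return gflops / watts

for name, (gflops, watts) in cards.items():
    print(f"{name}: {gflops_per_watt(gflops, watts):.1f} GFLOPS/W")
# GTX480 (Fermi): 5.4 GFLOPS/W
# GTX980 (Maxwell): 28.0 GFLOPS/W
# GTX1080 (Pascal): 45.6 GFLOPS/W
```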



5.2 GPGPU Cards



6 Reference



