Nvidia GPU Architecture

Revision as of 13:28, 23 March 2017 (Thursday)

1 Overview
2 Fermi Micro Architecture
- 2.1 Fermi Cards
  - 2.1.1 GeForce 400 Series
  - 2.1.2 GeForce 500 Series
3 Kepler Micro Architecture
- 3.1 Kepler Cards
  - 3.1.1 GeForce 600 series
  - 3.1.2 GeForce 700 series
4 Maxwell Micro Architecture
- 4.1 Maxwell Cards
  - 4.1.1 GeForce 900 series
5 Pascal Micro Architecture
- 5.1 Pascal Cards
  - 5.1.1 GeForce 1000 series
6 Reference

1 Overview

The GPU architecture is built around a scalable array of Streaming Multiprocessors (SMs).

The key components of an SM:

CUDA cores (ALU + FPU)
Double Precision Units (DPU)
Special Function Units (SFU, abbreviated SPU on this page)
Load/Store Units (LD/ST)

Register File
Shared Memory/L1 Cache

Warp Scheduler

2 Fermi Micro Architecture

NVIDIA describes Fermi as the first complete GPU computing architecture to deliver the features required for the most demanding HPC applications.

1 SM: 32 CUDA cores + 16 Load/Store Units + 4 SPU + 2 Warp Schedulers
1 SM: 2 warps issued concurrently (one per Warp Scheduler)
1 Warp Scheduler: 16 CUDA cores + 16 Load/Store Units (shared) + 4 SPU (shared); each warp is 32 threads
Up to 48 warps are resident per SM, for a total of 1536 (48 x 32) threads in a single SM at a time
1 CUDA core: 1 ALU + 1 FPU
Register file: 32K 32-bit registers (128 KB) per SM
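
The warp arithmetic in the list above can be checked with a short Python sketch. The function names are illustrative and not taken from any NVIDIA tool, and the pass counts assume a simplified model in which a 32-thread warp is fed through the available units in equal slices.

```python
import math

WARP_SIZE = 32  # threads per warp on every architecture covered on this page


def resident_threads(max_warps_per_sm: int) -> int:
    """Threads that can be resident on one SM at a time."""
    return max_warps_per_sm * WARP_SIZE


def issue_passes(units: int) -> int:
    """Passes needed to push one 32-thread warp through `units` lanes."""
    return math.ceil(WARP_SIZE / units)


print(resident_threads(48))  # Fermi: 48 resident warps per SM
print(issue_passes(16))      # 16 CUDA cores behind each warp scheduler
print(issue_passes(4))       # 4 special function units per SM
```

With the Fermi numbers this prints 1536, 2 and 8: a warp needs two passes through 16 CUDA cores and eight passes through the 4 special function units.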

GTX480:

15 SM (32 CUDA cores/SM)
480 CUDA cores
1345 GFLOPs
40 nm
3.2 billion transistors
250 W
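
The 1345 GFLOPs figure can be reproduced from the core count. This sketch assumes a 1.401 GHz shader clock for the GTX480 and two floating-point operations per CUDA core per cycle (one fused multiply-add); neither number appears on this page, so treat both as assumptions.

```python
def peak_gflops(sm_count: int, cores_per_sm: int, shader_clock_ghz: float,
                flops_per_cycle: int = 2) -> float:
    """Theoretical single-precision peak: cores x flops per cycle x clock."""
    cores = sm_count * cores_per_sm
    return cores * flops_per_cycle * shader_clock_ghz


# 15 SMs x 32 CUDA cores = 480 cores; the clock is an assumed value.
gtx480 = peak_gflops(sm_count=15, cores_per_sm=32, shader_clock_ghz=1.401)
print(round(gtx480))
```

Under those assumptions the result rounds to 1345, matching the figure listed above.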

2.1 Fermi Cards

2.1.1 GeForce 400 Series

Release date: April 12, 2010
Codename: GF10x
Architecture: Fermi

Models:

GeForce Series
GeForce GT Series
GeForce GTS Series
GeForce GTX Series

Fabrication process and transistors:

260M 40 nm (GT218)
585M 40 nm (GF108)
1,170M 40 nm (GF106)
1,950M 40 nm (GF104)
1,950M 40 nm (GF114)
3,200M 40 nm (GF100)

Cards:

Entry-level: GT420, GT430
Mid-range: GT440, GTS450, GTX460
High-end: GTX465, GTX470
Enthusiast: GTX480 (2010.3, 3.2 billion transistors, 15 SMs, 1536 MB, 1345 GFLOPS, 250 W)

2.1.2 GeForce 500 Series

Release date: 8 November 2010
Codename: GF11x
Architecture: Fermi

Models:

GeForce Series
GeForce GT Series
GeForce GTX Series

Fabrication process and transistors:

292M 40 nm (GF119)
585M 40 nm (GF108)
1,170M 40 nm (GF116)
1,950M 40 nm (GF114)
3,000M 40 nm (GF110)

Cards:

Entry-level: 510, GT520, GT530
Mid-range: GT545, GTX550Ti, GTX560, GTX560Ti
High-end: GTX570, GTX580, GTX590 (2011.3, 2 x 3 billion transistors, 32 SMs, 2 x 1536 MB, 2488 GFLOPS, 365 W)

3 Kepler Micro Architecture

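The first Kepler GPU (GK104, GeForce GTX680) shipped in March 2012; GK110, used in the Tesla K20 and K20X, was released in the fall of 2012.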

1 SM: 4 Warp Schedulers (2 instruction dispatch units per Warp Scheduler)
1 Warp: 32 threads
1 SM: 192 CUDA cores + 64 DPU (shared) + 32 Load/Store Units (shared) + 32 SPU (shared) + 4 Warp Schedulers

Up to 64 warps are resident per SM, for a total of 2048 (64 x 32) threads in a single SM at a time
Register file: 64K 32-bit registers (256 KB) per SM
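
The register file size and the resident-thread limit together bound how many registers each thread can use at full occupancy. A minimal sketch, assuming "64K" counts 32-bit registers (65,536 of them, 4 bytes each); the function names are illustrative.

```python
WARP_SIZE = 32  # threads per warp


def registers_per_thread(registers_per_sm: int, resident_warps: int) -> int:
    """Registers available to each thread when every warp slot is occupied."""
    threads = resident_warps * WARP_SIZE
    return registers_per_sm // threads


def register_file_kib(registers_per_sm: int, bytes_per_register: int = 4) -> int:
    """Register file size in KiB, assuming 32-bit (4-byte) registers."""
    return registers_per_sm * bytes_per_register // 1024


kepler_registers = 64 * 1024  # assumed meaning of "64K"
print(registers_per_thread(kepler_registers, resident_warps=64))
print(register_file_kib(kepler_registers))
```

For Kepler this gives 32 registers per thread with all 2048 threads resident, and a 256 KiB register file per SM. A kernel that needs more registers per thread leaves some warp slots unused.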

K20X:

14 SM
2688 CUDA cores, 6 GB
3.935 TFLOPS single precision / 1.312 TFLOPS double precision (DPU)
28 nm
235 W
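
Both K20X throughput figures follow from the per-SM unit counts given earlier (192 CUDA cores and 64 DPU per SM). The sketch below assumes a 732 MHz core clock and one fused multiply-add (2 floating-point operations) per unit per cycle; the clock is not stated on this page and is an assumption.

```python
def peak_tflops(sm_count: int, units_per_sm: int, clock_ghz: float) -> float:
    """Theoretical peak in TFLOPS: units x 2 flops (FMA) x clock."""
    return sm_count * units_per_sm * 2 * clock_ghz / 1000.0


K20X_SMS = 14
K20X_CLOCK_GHZ = 0.732  # assumed core clock

single = peak_tflops(K20X_SMS, units_per_sm=192, clock_ghz=K20X_CLOCK_GHZ)
double = peak_tflops(K20X_SMS, units_per_sm=64, clock_ghz=K20X_CLOCK_GHZ)
print(f"{single:.3f} TFLOPS single, {double:.3f} TFLOPS double")
```

Because there is one DPU for every three CUDA cores, double-precision throughput is one third of single-precision throughput on this chip.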

GTX690:

2x8 SM
3072 CUDA cores
2 x 2.8 TFLOPS
2 x 3.54 billion transistors
300 W (2012.4)

3.1 Kepler Cards

3.1.1 GeForce 600 series

Release date: March 22, 2012
Codename: GK10x

Models

GeForce Series
GeForce GT Series
GeForce GTX Series

Fabrication process and transistors

292M 40 nm (GF119)
585M 40 nm (GF108)
1,170M 40 nm (GF116)
1,950M 40 nm (GF114)
1,270M 28 nm (GK107)
1,270M 28 nm (GK208)
2,540M 28 nm (GK106)
3,540M 28 nm (GK104)

Cards:

Entry-level: GT610, GT620, GT630, GT640
Mid-range: GTX650, GTX650Ti, GTX650Ti Boost, GTX660
High-end: GTX660Ti, GTX670
Enthusiast: GTX680, GTX690

3.1.2 GeForce 700 series

Release date: May 2013
Codename: GK110 GK208

Models:

GeForce Series
GeForce GT Series
GeForce GTX Series

Fabrication process and transistors:

585M 28 nm (GF117)
1,020M 28 nm (GK208)
1,270M 28 nm (GK107)
3,540M 28 nm (GK104)
7,080M 28 nm (GK110)

Cards

Entry-level: GeForce GT 705, GeForce GT 710, GeForce GT 720, GeForce GT 730, GeForce GT 740, GeForce GTX 745
Mid-range: GeForce GTX 750, GeForce GTX 750 Ti, GeForce GTX 760 192-Bit, GeForce GTX 760, GeForce GTX 760 Ti
High-end: GeForce GTX 770, GeForce GTX 780
Enthusiast: GeForce GTX 780 Ti, GeForce GTX Titan, GeForce GTX Titan Black, GeForce GTX Titan Z

4 Maxwell Micro Architecture

1 SM (SMM): 4 Warp Schedulers (2 instruction dispatch units per Warp Scheduler)
1 Warp Scheduler: 32 CUDA cores + 1 DPU + 8 Load/Store Units + 8 SPU

1 SM (SMM): 128 CUDA cores + 4 DPU + 32 Load/Store Units + 32 SPU
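
The per-scheduler and per-SM figures above are consistent with each other: the SMM totals are the per-scheduler unit counts multiplied by the 4 warp schedulers. A small sketch of that check, using a hypothetical `Units` record introduced only for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Units:
    cuda_cores: int
    dpu: int
    ld_st: int
    sfu: int  # listed as SPU on this page

    def times(self, n: int) -> "Units":
        """Total units when this group is replicated n times."""
        return Units(self.cuda_cores * n, self.dpu * n,
                     self.ld_st * n, self.sfu * n)


per_scheduler = Units(cuda_cores=32, dpu=1, ld_st=8, sfu=8)
smm = per_scheduler.times(4)  # 4 warp schedulers per SMM
print(smm)
```

The result is 128 CUDA cores, 4 DPU, 32 load/store units and 32 special function units, which matches the SMM line.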

GTX980:

16 SM (SMM)
2048 CUDA cores
4612 GFLOPS single precision / 144 GFLOPS double precision (DPU)
28 nm
5.2 billion transistors
165 W
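
The double-precision figure for the GTX980 follows from the unit ratio in the SMM listing (4 DPU per 128 CUDA cores). A minimal sketch, assuming DPUs and CUDA cores run at the same clock and each retires one fused multiply-add per cycle; the function name is illustrative.

```python
from fractions import Fraction


def double_precision_gflops(single_gflops: int, dpu_per_sm: int,
                            cores_per_sm: int) -> tuple[float, Fraction]:
    """Scale single-precision peak by the DPU-to-CUDA-core ratio."""
    ratio = Fraction(dpu_per_sm, cores_per_sm)
    return float(single_gflops * ratio), ratio


dp, ratio = double_precision_gflops(4612, dpu_per_sm=4, cores_per_sm=128)
print(16 * 128)  # CUDA cores on 16 SMMs
print(ratio)     # double-precision rate relative to single precision
print(dp)        # GFLOPS
```

This prints 2048, 1/32 and 144.125, in line with the 2048 CUDA cores and 144 GFLOPs listed above.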

TITAN X:

4.1 Maxwell Cards

4.1.1 GeForce 900 series

Release date: September 2014
Codename: GM20x

Models

GeForce Series
GeForce GT Series
GeForce GTX Series

Cards

Mid-range: GTX950 / GTX960
High-end: GTX970 / GTX980
Enthusiast: GTX980 Ti / GTX Titan X

5 Pascal Micro Architecture

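1 SM: 2 Warp Schedulers (2 instruction dispatch units per Warp Scheduler)
1 Warp Scheduler: 32 CUDA cores + 16 DPU + 8 Load/Store Units + 8 SPU

1 SM: 64 CUDA cores + 32 DPU + 16 Load/Store Units + 16 SPU

These figures match the GP100 SM; the GP10x chips in the GeForce cards below have 128 CUDA cores and 4 Warp Schedulers per SM.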

5.1 Pascal Cards

5.1.1 GeForce 1000 series

Release date: May 2016
Codename: GP10x

Models

GeForce GTX Series

Fabrication process and transistors:

3.3B 14 nm (GP107)
4.4B 16 nm (GP106)
7.2B 16 nm (GP104)
12B 16 nm (GP102)

Cards:

Entry-level: GTX1050 / GTX1050 Ti
Mid-range: GTX1060
High-end: GTX1070 / GTX1080 (2016.5, 2560 CUDA cores, 20 SMs, 16 nm, 7.2 billion transistors, 8 GB, 8.2 TFLOPS / DPU: 0.257 TFLOPS, 180 W)
Enthusiast: GTX1080 Ti / NVIDIA Titan X (2016.8, 3584 CUDA cores, 28 SMs, 16 nm, 12 billion transistors, 12 GB, 10 TFLOPS / DPU: 0.317 TFLOPS, 250 W)
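
The card figures above are enough to derive two useful quantities: CUDA cores per SM and single-precision throughput per watt. The sketch uses only numbers listed for the GTX1080 and the Titan X; the dictionary layout and function names are illustrative.

```python
cards = {
    "GTX1080": {"cores": 2560, "sms": 20, "tflops": 8.2, "watts": 180},
    "Titan X": {"cores": 3584, "sms": 28, "tflops": 10.0, "watts": 250},
}


def cores_per_sm(card: dict) -> int:
    """CUDA cores in each SM, from the card totals."""
    return card["cores"] // card["sms"]


def gflops_per_watt(card: dict) -> float:
    """Single-precision GFLOPS delivered per watt of board power."""
    return card["tflops"] * 1000 / card["watts"]


for name, card in cards.items():
    print(name, cores_per_sm(card), round(gflops_per_watt(card), 1))
```

Both cards work out to 128 CUDA cores per SM. The GTX1080 comes to about 45.6 GFLOPS per watt and the Titan X to 40.0.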

6 Reference

https://en.wikipedia.org/wiki/GeForce_900_series

https://en.wikipedia.org/wiki/GeForce_10_series

https://en.wikipedia.org/wiki/Nvidia_Tesla
