

== Overview ==

The GPU architecture is built around a scalable array of <b style="color: #5a0">Streaming Multiprocessors (SM)</b>.


[[文件:Nv-dgx1.jpg]]


The key components of an SM:

* CUDA cores (ALU + FPU)
* Double Precision Units (DPU)
* Special Function Units (SFU, written SPU on this page)
* Load/Store Units (LD/ST)

* Register File
* Shared Memory/L1 Cache
* Warp Scheduler


'''Cards:'''
# GTX980 (2048 CUDA cores, 28 nm, 5.2 billion transistors, 4 GB, 4.981 TFLOPS / DPU: 0.1556 TFLOPS, 165 W, 2014.9, $549) [https://www.techpowerup.com/gpu-specs/evga-gtx-980.b3061 EVGA GTX980]
# Entry-level: GTX1050 / GTX1050 Ti
# Mid-range: GTX1060
# High-end: GTX1070 / GTX1080 (2560 CUDA cores, 20 SMs, 16 nm, 7.2 billion transistors, 8 GB, 8.2 TFLOPS / DPU: 0.257 TFLOPS, 180 W, 2016.5)
# Enthusiast: GTX1080 Ti / TITAN X (3584 CUDA cores, 28 SMs, 16 nm, 12 billion transistors, 12 GB, 10 TFLOPS / DPU: 0.317 TFLOPS, 250 W, 2016.8)
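
The figures in the card list above can be compared on a per-watt basis. The following is a minimal sketch that only reuses the peak single-precision TFLOPS and board power values quoted in the list; the helper name <code>gflops_per_watt</code> is made up for illustration, and peak numbers say nothing about sustained performance.

```python
# Values copied from the card list: (peak single-precision TFLOPS, board power in W).
CARDS = {
    "GTX980": (4.981, 165),
    "GTX1080": (8.2, 180),
    "TITAN X": (10.0, 250),
}


def gflops_per_watt(tflops: float, watts: float) -> float:
    """Peak single-precision GFLOPS per watt of board power (illustrative helper)."""
    return tflops * 1000.0 / watts


for name, (tflops, watts) in CARDS.items():
    print(f"{name}: {gflops_per_watt(tflops, watts):.1f} GFLOPS/W")
```

On these numbers the two 16 nm cards deliver noticeably more peak GFLOPS per watt than the 28 nm GTX980.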

<br><br>

== Fermi Micro Architecture ==

The Fermi architecture was the first complete GPU computing architecture to deliver the features required for the most demanding HPC applications.


[[文件:Fermi-arch.png | 750px]]


* 1 SM: 32 CUDA cores + 16 Load/Store Units + 4 SPUs + 2 Warp Schedulers
* 1 SM: 2 warps issued concurrently (one per warp scheduler)
* 1 Warp: 16 CUDA cores + 16 Load/Store Units (shared) + 4 SPUs (shared) + [32 threads context ?]
* Each SM handles 48 warps, for a total of 1536 (48x32) resident threads at a time [48 Warps context ?]
* 1 CUDA core: 1 ALU + 1 FPU
* Register file: 32K 32-bit registers per SM (128 KB)
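
The resident-thread figure in the list above is plain arithmetic. A small sketch of it, using only the numbers from the list (the constant and function names are illustrative):

```python
WARP_SIZE = 32            # threads per warp
FERMI_WARPS_PER_SM = 48   # resident warps per SM, from the list above
FERMI_CORES_PER_SM = 32   # CUDA cores per SM, from the list above


def resident_threads(warps_per_sm: int, warp_size: int = WARP_SIZE) -> int:
    """Number of threads whose context is held on one SM at a time."""
    return warps_per_sm * warp_size


threads = resident_threads(FERMI_WARPS_PER_SM)
print(threads)                        # 1536
print(threads // FERMI_CORES_PER_SM)  # 48 resident threads per CUDA core
```

Keeping many more threads resident than there are cores gives the warp schedulers other warps to issue while some are waiting on memory.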

GTX480:
* 15 SM (32 CUDA cores/SM)
* 480 CUDA cores
* 1345 GFLOPs
* 40 nm
* 3.2 billion transistors
* 250 Watts
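
The core count and peak GFLOPS above can be reproduced from the SM count. This sketch makes two assumptions that are not stated on this page: a shader clock of 1401 MHz, and 2 floating-point operations per CUDA core per cycle (one fused multiply-add). The function name is illustrative.

```python
def peak_gflops(sm_count: int, cores_per_sm: int, clock_mhz: float,
                flops_per_core_per_cycle: int = 2) -> float:
    """Theoretical peak single-precision GFLOPS.

    flops_per_core_per_cycle=2 assumes one fused multiply-add per cycle.
    """
    return sm_count * cores_per_sm * flops_per_core_per_cycle * clock_mhz / 1000.0


GTX480_SMS = 15            # from the list above
GTX480_CORES_PER_SM = 32   # from the list above
ASSUMED_CLOCK_MHZ = 1401   # assumption: shader clock, not given on this page

print(GTX480_SMS * GTX480_CORES_PER_SM)  # 480 CUDA cores
print(f"{peak_gflops(GTX480_SMS, GTX480_CORES_PER_SM, ASSUMED_CLOCK_MHZ):.0f} GFLOPS")
```

With the assumed clock the result rounds to the 1345 GFLOPS quoted above.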


=== Video Cards ===

==== GeForce 400 Series ====

* Release date:	April 12, 2010
* Codename:	GF10x
* Architecture:	Fermi

* Models:
#GeForce Series
#GeForce GT Series
#GeForce GTS Series
#GeForce GTX Series

* Fabrication process and transistors:
#260M 40 nm (GT218)
#585M 40 nm (GF108)
#1,170M 40 nm (GF106)
#1,950M 40 nm (GF104)
#1,950M 40 nm (GF114)
#3,200M 40 nm (GF100)

* Cards:
#Entry-level	GT420 GT430
#Mid-range	GT440 GTS450 GTX460
#High-end	GTX465 GTX470
#Enthusiast	GTX480 (2010.3, 3.2 billion Transistors, 15 SMs, 1536MB, 1345 GFLOPS, 250W)

<br><br>

==== GeForce 500 Series ====

* Release date:	8 November 2010
*Codename:	GF11x
*Architecture:	Fermi

*Models:
#GeForce Series
#GeForce GT Series
#GeForce GTX Series

* Fabrication process and transistors:
#292M 40 nm (GF119)
#585M 40 nm (GF108)
#1,170M 40 nm (GF116)
#1,950M 40 nm (GF114)
#3,000M 40 nm (GF110)

*Cards:
#Entry-level	510 GT520 GT530
#Mid-range	GT545 GTX550Ti GTX560 GTX560Ti
#High-end	GTX570 GTX580 GTX590 (2011.3, 2x3 billion transistors, 32 SMs, 2x1536 MB, 2488 GFLOPS, 365 W)

<br><br>

=== GPGPU Cards ===

See: http://wiki.jackslab.org/Nvidia_GPU_Architecture#Nvidia_Tesla_GPGPU_Cards

<br><br>

== Kepler Micro Architecture ==

[[文件:Kepler-SM-arch.png]]

[[文件:Kepler-arch.jpg | 800px]]


Released in the fall of 2012 (GK110; the first Kepler GeForce cards shipped in March 2012).

* 1 SM: 4 Warp Schedulers (2 instruction dispatchers per warp scheduler)
* 1 Warp: [32 threads context ?]
* 1 SM: 192 CUDA cores + 64 DPUs (shared) + 32 Load/Store Units (shared) + 32 SPUs (shared) + 4 Warp Schedulers

* Each SM handles 64 warps, for a total of 2048 (64x32) resident threads at a time [64 Warps context ?]
* Register file: 64K 32-bit registers per SM (256 KB)
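
The register file and resident-thread figures above together bound how many registers each thread can use at full occupancy. A rough sketch, assuming that "64K" means 65,536 registers and that each register is 32 bits wide (both are assumptions made for this example, as are the names):

```python
WARP_SIZE = 32
KEPLER_WARPS_PER_SM = 64            # from the list above
KEPLER_REGISTERS_PER_SM = 64 * 1024 # assumption: "64K" counts registers, not bytes
BYTES_PER_REGISTER = 4              # assumption: 32-bit registers


def registers_per_thread(registers_per_sm: int, resident_threads: int) -> int:
    """Registers available to each thread when every thread slot is occupied."""
    return registers_per_sm // resident_threads


resident = KEPLER_WARPS_PER_SM * WARP_SIZE
print(resident)                                                   # 2048 threads
print(registers_per_thread(KEPLER_REGISTERS_PER_SM, resident))    # 32 registers
print(KEPLER_REGISTERS_PER_SM * BYTES_PER_REGISTER // 1024, "KB") # 256 KB
```

A kernel that needs more registers per thread than this cannot keep all 64 warps resident on the SM.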


[[文件:Kepler-GK110-arch.jpg | 800px]]


K20X:

* 14 SM
* 2688 CUDA cores, 6GB
* 3.935 TFLOPs / DPU: 1.312 TFLOPs
* 28 nm
* 235Watts
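
The K20X numbers above are consistent with the per-SM unit counts listed earlier (192 CUDA cores and 64 DPUs per SM). The check below is a sketch that assumes the DPUs run at the same clock and per-cycle throughput as the CUDA cores; that assumption and the variable names are not from this page.

```python
from fractions import Fraction

K20X_SMS = 14        # from the list above
CORES_PER_SM = 192   # Kepler SM, from the list earlier in this section
DPUS_PER_SM = 64     # Kepler SM, from the list earlier in this section
SP_TFLOPS = 3.935    # listed single-precision peak

total_cores = K20X_SMS * CORES_PER_SM
dp_to_sp = Fraction(DPUS_PER_SM, CORES_PER_SM)  # double- to single-precision unit ratio

# Assumption: same clock and per-cycle throughput for DPUs and CUDA cores.
estimated_dp_tflops = SP_TFLOPS * DPUS_PER_SM / CORES_PER_SM

print(total_cores)                    # 2688
print(dp_to_sp)                       # 1/3
print(f"{estimated_dp_tflops:.3f}")   # 1.312
```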

GTX690:

* 2x8 SM
* 3072 CUDA cores
* 2x2.8TFLOPs
* 2x3.54 billion transistors
* 300Watts (2012.4)



<br>

=== Video Cards ===

==== GeForce 600 series ====

*Release date:	March 22, 2012
*Codename:	GK10x

*Models
#GeForce Series
#GeForce GT Series
#GeForce GTX Series

*Fabrication process and transistors
#292M 40 nm (GF119)
#585M 40 nm (GF108)
#1,170M 40 nm (GF116)
#1,950M 40 nm (GF114)
#1,270M 28 nm (GK107)
#1,270M 28 nm (GK208)
#2,540M 28 nm (GK106)
#3,540M 28 nm (GK104)

*Cards:
#Entry-level	GT610  GT620  GT630  GT640
#Mid-range	GTX650  GTX650Ti  GTX650Ti Boost  GTX 660
#High-end	GTX660Ti  GTX670
#Enthusiast	GTX680  GTX690

<br>

==== GeForce 700 series ====

*Release date:	May 2013
*Codename:	GK110  GK208

*Models:
#GeForce Series
#GeForce GT Series
#GeForce GTX Series

*Fabrication process and transistors:	
#585M 28 nm (GF117)
#1,020M 28 nm (GK208)
#1,270M 28 nm (GK107)
#3,540M 28 nm (GK104)
#7,080M 28 nm (GK110)

*Cards
#Entry-level:	GeForce GT 705  GeForce GT 710  GeForce GT 720  GeForce GT 730  GeForce GT 740  GeForce GTX 745
#Mid-range:	GeForce GTX 750  GeForce GTX 750 Ti  GeForce GTX 760 192-Bit  GeForce GTX 760  GeForce GTX 760 Ti
#High-end:	GeForce GTX 770  GeForce GTX 780
#Enthusiast:	GeForce GTX 780 Ti  GeForce GTX Titan  GeForce GTX Titan Black  GeForce GTX Titan Z

<br><br>

=== GPGPU Cards ===

See: http://wiki.jackslab.org/Nvidia_GPU_Architecture#Nvidia_Tesla_GPGPU_Cards

<br><br>

== Maxwell Micro Architecture ==

The SM arch of Maxwell GM204:

[[文件:Maxwell-GTX980-SM-arch.png]]

* 1 SM (SMM): 4 Warp Schedulers (2 instruction dispatchers per warp scheduler)
* 1 warp scheduler: 32 CUDA cores + 1 DPU + 8 Load/Store Units + 8 SPUs
* 1 SM (SMM): 128 CUDA cores + 4 DPUs + 32 Load/Store Units + 32 SPUs
* e.g. GTX980: 16 SMs (SMM), 2048 CUDA cores, 64 DPUs, 4612 GFLOPS / DPU: 144 GFLOPS, 28 nm, 5.2 billion transistors, 165 W
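
The per-SM and per-chip totals in the list above follow from the smallest unit, the share of one warp scheduler, multiplied up the hierarchy. A minimal sketch of that roll-up; the <code>UnitCounts</code> class and its field names are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitCounts:
    """Execution-unit counts at one level of the hierarchy (illustrative)."""
    cuda_cores: int
    dpu: int
    ld_st: int
    spu: int

    def scaled(self, factor: int) -> "UnitCounts":
        """Counts for `factor` copies of this level."""
        return UnitCounts(self.cuda_cores * factor, self.dpu * factor,
                          self.ld_st * factor, self.spu * factor)


PER_SCHEDULER = UnitCounts(cuda_cores=32, dpu=1, ld_st=8, spu=8)  # from the list
SCHEDULERS_PER_SMM = 4                                            # from the list
SMM_PER_GTX980 = 16                                               # from the list

per_smm = PER_SCHEDULER.scaled(SCHEDULERS_PER_SMM)
gtx980 = per_smm.scaled(SMM_PER_GTX980)

print(per_smm)  # 128 CUDA cores, 4 DPUs, 32 LD/ST, 32 SPUs
print(gtx980)   # 2048 CUDA cores, 64 DPUs on the whole chip
```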


The arch of Maxwell GM204:

[[文件:Maxwell-arch.png | 800px]]


TITAN X (GM200):

[[文件:TITAN-X-arch.png | 800px]]

<br>

=== Video Cards ===

==== GeForce 900 series ====

* Release date:	September 2014
* Codename:	GM20x

* Models
#GeForce Series
#GeForce GT Series
#GeForce GTX Series

*Cards
#Mid-range	GTX950 / GTX960
#High-end	GTX970 / GTX980
#Enthusiast	GTX980 Ti  / GTX Titan X

<br><br>

=== GPGPU Cards ===

See: http://wiki.jackslab.org/Nvidia_GPU_Architecture#Nvidia_Tesla_GPGPU_Cards

<br><br>

== Pascal Micro Architecture ==

;;The SM arch of Pascal GP100:

[[文件:Pascal-GP100-SM-arch.png]]

* 1 SM: 2 Warp Schedulers (2 instruction dispatchers per warp scheduler)
* 1 warp scheduler: 32 CUDA cores + 16 DPUs + 8 Load/Store Units + 8 SPUs
* 1 SM: 64 CUDA cores + 32 DPUs + 16 Load/Store Units + 16 SPUs
* e.g. Tesla P100: 60 SMs (56 enabled), 3584 CUDA cores, 1792 DPUs, 16 GB, 9.5 TFLOPS / DPU: 4.7 TFLOPS, 300 Watts
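
The Tesla P100 totals above depend on the number of enabled SMs, not on the 60 SMs physically present on the die. A short sketch of the arithmetic, assuming (as a simplification for this example) that double-precision throughput scales with the DPU-to-core ratio at the same clock:

```python
CORES_PER_SM = 64   # GP100 SM, from the list above
DPUS_PER_SM = 32    # GP100 SM, from the list above
FULL_SMS = 60       # SMs on the die
ENABLED_SMS = 56    # SMs enabled on Tesla P100
SP_TFLOPS = 9.5     # listed single-precision peak


def totals(sm_count: int) -> tuple[int, int]:
    """Return (CUDA cores, DPUs) for a given number of SMs."""
    return sm_count * CORES_PER_SM, sm_count * DPUS_PER_SM


full_cores, full_dpus = totals(FULL_SMS)
p100_cores, p100_dpus = totals(ENABLED_SMS)

# Assumption: DP peak = SP peak scaled by the DPU-to-core ratio.
estimated_dp_tflops = SP_TFLOPS * DPUS_PER_SM / CORES_PER_SM

print(full_cores, full_dpus)          # 3840 1920 on a fully enabled die
print(p100_cores, p100_dpus)          # 3584 1792 on Tesla P100
print(f"{estimated_dp_tflops:.2f}")   # 4.75, close to the listed 4.7 TFLOPS
```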


The arch of Pascal GP100:

[[文件:Pascal-GP100-arch.png | 800px]]


;;The SM arch of Pascal GP104

[[文件:Pascal-GP104-SM-arch.png | 624px]]

* 1 SM: 4 Warp Schedulers (2 instruction dispatchers per warp scheduler)
* 1 warp scheduler: 32 CUDA cores + 1 DPU + 8 Load/Store Units + 8 SPUs
* 1 SM: 128 CUDA cores + 4 DPUs + 32 Load/Store Units + 32 SPUs
* e.g. GTX1080, GTX1080Ti, TITAN X
* GTX1080 (GP104): 20 SMs, 2560 CUDA cores, 80 DPUs, 16 nm, 7.2 billion transistors, 8 GB, 8.2 TFLOPS / DPU: 257 GFLOPS, 180 Watts


The arch of Pascal GP104:

[[文件:Pascal-GP104-arch.png | 800px]]

<br>

=== Video Cards ===

==== GeForce 1000 series ====

* Release date:	May 2016
* Codename:	GP10x

* Models
#GeForce GTX Series

* Fabrication process and transistors:
#3.3B 14 nm (GP107)
#4.4B 16 nm (GP106)
#7.2B 16 nm (GP104)
#12B 16 nm (GP102)

* Cards:
#Entry-level:	GTX1050 / GTX1050 Ti
#Mid-range:	GTX1060
#High-end:	GTX1070 / GTX1080 (2016.5, 2560 CUDA cores, 20 SMs, 16 nm, 7.2 billion transistors, 8 GB, 8.2 TFLOPS / DPU: 0.257 TFLOPS, 180 Watts)
#Enthusiast:	GTX1080 Ti / NVIDIA Titan X (2016.8, 3584 CUDA cores, 28 SMs, 16 nm, 12 billion transistors, 12 GB, 10 TFLOPS / DPU: 0.317 TFLOPS, 250 Watts)

<br><br>

=== GPGPU Cards ===

See: http://wiki.jackslab.org/Nvidia_GPU_Architecture#Nvidia_Tesla_GPGPU_Cards

<br><br>

== Nvidia Tesla GPGPU Cards ==

Tesla products target the high-performance computing market.

As of 2012, Nvidia Teslas power some of the world's fastest supercomputers, including Titan at Oak Ridge National Laboratory and Tianhe-1A, in Tianjin, China.

<br>

=== Overview ===

[[文件:Nvidia-tesla-gpu-capabilities-table.jpg]]

<br>


[[文件:Nvidia-tesla-lineup-1.jpg | 800px]]

<br>

== References ==

* https://en.wikipedia.org/wiki/Fermi_(microarchitecture)
* https://en.wikipedia.org/wiki/GeForce_400_series
* https://en.wikipedia.org/wiki/GeForce_500_series
* https://en.wikipedia.org/wiki/GeForce_600_series
* https://en.wikipedia.org/wiki/GeForce_700_series
* https://en.wikipedia.org/wiki/GeForce_800M_series

* https://en.wikipedia.org/wiki/GeForce_900_series

* https://en.wikipedia.org/wiki/GeForce_10_series

* https://en.wikipedia.org/wiki/Nvidia_Tesla


<br><br>