Nvidia GPU Architecture

From Jack's Lab
Revision as of 14:59, 22 March 2017 (Wednesday)


1 Overview

The GPU architecture is built around a scalable array of Streaming Multiprocessors (SMs).

The key components of a Fermi SM:

  • CUDA cores (ALU + FPU)
  • Load/Store Units
  • Special Function Units
  • Register File
  • Shared Memory/L1 Cache
  • Warp Scheduler



2 Fermi Micro Architecture

The Fermi architecture was the first complete GPU computing architecture to deliver the features required for the most demanding HPC applications.

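  • 1 CUDA core: 1 integer ALU + 1 FPU
  • 1 SM: 32 CUDA cores + 16 Load/Store Units + 4 SFUs (Special Function Units) + 2 Warp Schedulers
  • 1 SM: 2 warps issued concurrently (one instruction from each warp, one warp per scheduler)
  • 1 Warp: 32 threads; each warp instruction is dispatched to a group of 16 CUDA cores, the 16 Load/Store Units (shared), or the 4 SFUs (shared)
  • Up to 48 warps resident per SM, each with its own on-chip context, for a total of 1536 (48x32) threads resident in a single SM at a time
  • Register file: 32K (32,768) 32-bit registers per SM, i.e. 128 KB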
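The warp and register figures above can be turned into a rough per-thread register budget. This is a minimal sketch, assuming 32,768 32-bit registers per SM (the count in NVIDIA's Fermi whitepaper) and ignoring register allocation granularity; the function names are illustrative only.

```python
# Fermi SM limits; the register count assumes 32,768 32-bit registers per SM.
WARP_SIZE = 32
FERMI_MAX_WARPS_PER_SM = 48
FERMI_REGISTERS_PER_SM = 32 * 1024  # number of 32-bit registers


def resident_threads(max_warps: int, warp_size: int = WARP_SIZE) -> int:
    """Maximum number of threads resident on one SM at a time."""
    return max_warps * warp_size


def regs_per_thread_at_full_occupancy(registers: int, max_warps: int) -> int:
    """Largest per-thread register count that still lets every warp slot
    be filled (ignores the hardware's allocation granularity)."""
    return registers // resident_threads(max_warps)


threads = resident_threads(FERMI_MAX_WARPS_PER_SM)
budget = regs_per_thread_at_full_occupancy(FERMI_REGISTERS_PER_SM,
                                           FERMI_MAX_WARPS_PER_SM)
print(threads, budget)                     # 1536 21
print(FERMI_REGISTERS_PER_SM * 4 // 1024)  # register file in KB: 128
```

Under these assumptions, a kernel that needs more than about 21 registers per thread cannot keep all 48 warps resident on a Fermi SM.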

GTX480:

  • 15 SM enabled (of 16 on the full GF100 chip; 32 CUDA cores/SM)
  • 480 CUDA cores
  • 1345 GFLOPS (single precision)
  • 40 nm
  • 3.2 billion transistors
  • 250 W
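The 1345 GFLOPS figure is a theoretical peak: SMs × cores per SM × shader clock × 2, counting a fused multiply-add as two floating-point operations. A minimal sketch, assuming a GTX480 shader clock of 1401 MHz (not stated on this page) and the 15 enabled SMs given in the GeForce 400 card list; `peak_gflops` is a hypothetical helper.

```python
def peak_gflops(sms: int, cores_per_sm: int, clock_ghz: float,
                flops_per_clock: int = 2) -> float:
    """Theoretical peak in GFLOPS: every core retires one fused
    multiply-add (2 FLOPs) per shader clock."""
    return sms * cores_per_sm * clock_ghz * flops_per_clock


# Assumed GTX480 shader clock: 1.401 GHz (not listed above).
gtx480 = peak_gflops(sms=15, cores_per_sm=32, clock_ghz=1.401)
print(round(gtx480, 2))  # 1344.96
```

With all 16 SMs the same formula gives about 1435 GFLOPS, so only the 15-SM configuration matches the listed figure.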


2.1 Fermi Cards

2.1.1 GeForce 400 Series

  • Release date: April 12, 2010
  • Codename: GF10x
  • Architecture: Fermi
  • Models:
  1. GeForce Series
  2. GeForce GT Series
  3. GeForce GTS Series
  4. GeForce GTX Series
  • Fabrication process and transistors:
  1. 260M, 40 nm (GT218)
  2. 585M, 40 nm (GF108)
  3. 1,170M, 40 nm (GF106)
  4. 1,950M, 40 nm (GF104)
  5. 1,950M, 40 nm (GF114)
  6. 3,200M, 40 nm (GF100)
  • Cards:
  1. Entry-level: GT420, GT430
  2. Mid-range: GT440, GTS450, GTX460
  3. High-end: GTX465, GTX470
  4. Enthusiast: GTX480 (March 2010, 3.2 billion transistors, 15 SMs, 1536 MB, 1345 GFLOPS, 250 W)



2.1.2 GeForce 500 Series

  • Release date: 8 November 2010
  • Codename: GF11x
  • Architecture: Fermi
  • Models:
  1. GeForce Series
  2. GeForce GT Series
  3. GeForce GTX Series
  • Fabrication process and transistors:
  1. 292M, 40 nm (GF119)
  2. 585M, 40 nm (GF108)
  3. 1,170M, 40 nm (GF116)
  4. 1,950M, 40 nm (GF114)
  5. 3,000M, 40 nm (GF110)
  • Cards:
  1. Entry-level: GeForce 510, GT520, GT530
  2. Mid-range: GT545, GTX550Ti, GTX560, GTX560Ti
  3. High-end: GTX570, GTX580, GTX590 (March 2011, 2x3 billion transistors, 32 SMs, 2x1536 MB, 2488 GFLOPS, 365 W)



3 Kepler Micro Architecture

Released in 2012; the GK110-based Tesla cards such as the K20X arrived in the fall of 2012.

  • 1 SM (SMX, GK110): 192 CUDA cores + 64 DP Units + 32 Load/Store Units + 32 SFUs + 4 Warp Schedulers (8 instruction dispatch units)
  • 1 SM: 4 warps issued concurrently (up to 2 independent instructions per warp per clock)
  • 1 Warp: 32 threads
  • Up to 64 warps resident per SM, for a total of 2048 (64x32) threads resident in a single SM at a time
  • Register file: 64K (65,536) 32-bit registers per SM, i.e. 256 KB
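Registers, not only warp slots, limit how many warps stay resident. A minimal sketch assuming 65,536 32-bit registers per SM and the 64-warp limit above; it ignores allocation granularity as well as the shared-memory and block-count limits that also apply in practice, and `register_limited_warps` is a hypothetical helper.

```python
# Kepler SM limits; assumes 65,536 32-bit registers per SM.
WARP_SIZE = 32
KEPLER_MAX_WARPS_PER_SM = 64
KEPLER_REGISTERS_PER_SM = 64 * 1024


def register_limited_warps(regs_per_thread: int,
                           registers: int = KEPLER_REGISTERS_PER_SM,
                           max_warps: int = KEPLER_MAX_WARPS_PER_SM) -> int:
    """Resident warps per SM when registers are the only constraint.

    Ignores register allocation granularity, shared memory and the
    per-SM block limit, all of which can lower the real figure.
    """
    warps_by_registers = registers // (regs_per_thread * WARP_SIZE)
    return min(max_warps, warps_by_registers)


for regs in (16, 32, 64):
    warps = register_limited_warps(regs)
    print(regs, warps, f"{warps / KEPLER_MAX_WARPS_PER_SM:.0%}")
# 16 64 100%
# 32 64 100%
# 64 32 50%
```

At 32 registers per thread the full 64-warp budget fits exactly (65,536 / 2048 = 32); doubling the per-thread count halves the resident warps.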

K20X:

  • 2688 CUDA cores, 6 GB
  • 14 SM
  • 3.935 TFLOPS single precision / 1.312 TFLOPS double precision
  • 28 nm
  • 235 W
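The K20X figures follow from the SM layout: 14 SMs × 192 cores gives 2688 cores, and with 64 DP units per SM the double-precision peak is one third of the single-precision peak. A minimal sketch that back-solves the clock from the listed 3.935 TFLOPS, assuming one fused multiply-add (2 FLOPs) per unit per clock; the implied clock is derived here, not taken from this page.

```python
# Figures from the Kepler SM and K20X lists.
SMS = 14
SP_CORES_PER_SM = 192
DP_UNITS_PER_SM = 64
SP_TFLOPS = 3.935

sp_cores = SMS * SP_CORES_PER_SM   # 2688
dp_units = SMS * DP_UNITS_PER_SM   # 896

# Implied clock in GHz, assuming 2 FLOPs (one FMA) per core per clock.
clock_ghz = SP_TFLOPS * 1000 / (sp_cores * 2)
dp_tflops = dp_units * clock_ghz * 2 / 1000

print(sp_cores, dp_units, round(clock_ghz, 3), round(dp_tflops, 3))
# 2688 896 0.732 1.312
```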

GTX690:

  • 3072 CUDA cores
  • 2x8 SM
  • 2x3.54 billion transistors
  • 300 W (April 2012)



4 Maxwell Micro Architecture

GM204:

  • 16 SM
  • 2048 CUDA cores
  • 4612 GFLOPS
  • 28 nm
  • 5.2 billion transistors
  • GTX980: 165 W
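Dividing the listed peak throughput by the listed power gives a rough efficiency trend across the three generations. A minimal sketch using only the figures on this page (theoretical single-precision peaks and board power, not measured results); GTX690 is omitted because no GFLOPS figure is given for it.

```python
# (card, peak single-precision GFLOPS, power in W) as listed on this page.
CARDS = [
    ("GTX480 (Fermi)", 1345, 250),
    ("GTX590 (Fermi, dual GPU)", 2488, 365),
    ("Tesla K20X (Kepler)", 3935, 235),
    ("GTX980 (Maxwell)", 4612, 165),
]


def gflops_per_watt(gflops: float, watts: float) -> float:
    """Peak throughput per watt of rated power."""
    return gflops / watts


for name, gflops, watts in CARDS:
    print(f"{name:26s} {gflops_per_watt(gflops, watts):5.1f} GFLOPS/W")
```

This gives about 5.4 GFLOPS/W for the GTX480 and about 28 GFLOPS/W for the GTX980.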

