October 7, 2022
Nvidia GPU hardware
This began when I wanted to answer the question, "what is a CUDA core?".
My next question was, "what kind of Nvidia card do I have, and how many
cores does it have?"
GeForce GT1030
My card is an MSI Graphics Card NVIDIA GEFORCE GT 1030 2GHD4 LP OC.
I bought it in 2021 with three requirements.
- It had to support 4K on my 3840x2160 monitor
- It could not have a fan
- The price had to be around $100
This card now sells for $125. General advice was to get a 750 card for the
same money, but I rejected that choice because 750 cards have fans.
If I had bought an EVGA 750 card, there is a good chance the fans would never
have turned on, since I use the card in 2D mode all the time.
What is CUDA?
CUDA stands for "Compute Unified Device Architecture", which is pretty vague.
You need to read the above article to get the big idea.
CUDA programming
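To make this concrete, here is a minimal sketch of a CUDA program: a kernel that adds two vectors, with one GPU thread per element. This assumes the CUDA toolkit (nvcc) and an Nvidia GPU; the name addVectors and the launch sizes are my own choices, not from any particular tutorial.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel: each GPU thread adds one pair of elements.
__global__ void addVectors(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1024;
    size_t bytes = n * sizeof(float);
    float *a, *b, *c;
    // Unified memory: visible to both CPU and GPU.
    cudaMallocManaged(&a, bytes);
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; i++) { a[i] = i; b[i] = 2 * i; }

    // 4 blocks of 256 threads = 1024 threads, one per element.
    addVectors<<<4, 256>>>(a, b, c, n);
    cudaDeviceSynchronize();

    printf("c[10] = %f\n", c[10]);  // 10 + 20 = 30
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

Compile with "nvcc add.cu -o add". The CUDA cores described below are what actually execute those 1024 threads, a warp of 32 at a time.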
GPU Assembly
The bottom line here is to forget it. Vendors generally do not even publish the instruction set,
or in any event they don't expect you to work at that level.
The Hardware
The silicon is the Nvidia GP108 chip. This was made by Samsung for Nvidia.
- released May, 2017
- Pascal architecture
- 14 nm process
- 1,800 million transistors
- 74 mm^2 die size
- 3 SM units (streaming multiprocessors)
- 384 shading units (CUDA cores)
- 24 TMU units (texture mapping units)
- 16 ROPs (raster output pipelines, aka raster output units)
- 96 SFU units (special function units)
- 3 TPC units (texture processor cluster)
- 1 GPC (graphics processing cluster)
- each SM has a 48K L1 Cache
- 512K shared cache
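Many of the numbers above can be read back at runtime with the CUDA runtime API. A sketch, again assuming the CUDA toolkit and querying device 0; on the GT 1030 this should report 3 multiprocessors and compute capability 6.1 (Pascal):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);  // query device 0
    printf("name: %s\n", prop.name);
    printf("SM count: %d\n", prop.multiProcessorCount);           // 3 on GP108
    printf("compute capability: %d.%d\n", prop.major, prop.minor); // 6.1 on GP108
    printf("L2 cache: %zu KB\n", (size_t)prop.l2CacheSize / 1024);
    printf("shared mem per SM: %zu KB\n",
           prop.sharedMemPerMultiprocessor / 1024);
    return 0;
}
```

Note that the runtime does not report CUDA cores directly; you multiply the SM count by the cores-per-SM of the architecture (3 x 128 = 384 here).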
The following white paper talks about the Pascal architecture in the "Tesla P100",
which has 56 SM units, but may still be informative for understanding Pascal.
Maxwell came first, then Pascal, and the world moved on to "Turing".
I hear about Fermi and Kepler, and even Ampere; but I am not trying to keep up with all of this.
In March of 2022, Nvidia introduced "Hopper", which displaces "Ampere".
The Hopper H100 has 80,000 million transistors and is geared to speeding up various AI calculations.
My GP108 chip has what is designated as a 384:24:16 core configuration (CUDA cores : TMUs : ROPs).
Each SM has 128 CUDA cores, so 3 SMs give the 384 total. Each CUDA core contains an ALU.
Each SM also has 4 double precision FP ALUs, and a half precision FP ALU which handles a vector of two
half precision floating point operations.
Have any comments? Questions?
Drop me a line!
Adventures in Computing / [email protected]