
fact sheet
Computationally intensive projects have usually run on supercomputers costing millions or tens of millions of dollars (remember the Cray X-MP supercomputer, which cost $15 million back in 1984). If you are lucky enough to work in academia or for the government and have a job that requires you to write computationally intensive code, chances are you have access to the type of machine that can speed up your work.
While computing has become both more powerful and more affordable since then, the demand for greater computational power has grown beyond just scientific and academic needs. Medical, financial, digital video, and gaming applications all require tremendous computational power. Where can developers who don't work in academia or for the government access this power in a cost-effective fashion?
GPU Computing with CUDA: Redefining Supercomputing
For years, Graphics Processing Units (GPUs) from visual computing company NVIDIA have been central to providing the highest-performance graphics, gaming, and visual experience to consumers and professionals alike. Today, these GPUs continue to render high-performance graphics by calculating lighting and transformation data for graphically intense programs. But since NVIDIA rearchitected its GPUs with the massively parallel CUDA architecture in November 2006, these GPUs also work in concert with the system CPU and act as high-performance parallel processors. The two together make for a heterogeneous computing environment. Sumit Gupta, Senior Product Manager for Tesla/GPU, explains: "In heterogeneous computing you use two different architecture types: the host CPU, which is optimized for sequential computing (database, operating system, etc.), and the GPU, which is optimized for parallel computing (HPC, scientific computing, design and creation, etc.). This gives the developer the flexibility to target different parts of an application to the most appropriate architecture type."
The CUDA architecture enables developers to take advantage of the GPU's floating-point performance, as much as one teraflop, using high-level programming languages like C, C++, and Fortran. "We created a new set of tools for the software developer," says Gupta, "to tap into the power of the GPU. This C for GPU toolkit is exactly like the C for CPU toolkit. The tools have the same options and are very familiar to a C programmer. Now, not only does supercomputing become affordable, but since we can get supercomputing power using the GPU in our regular desktop computers, every developer can have a supercomputer under their desk."
NVIDIA has already seen significant speed increases across a number of computationally intensive applications, from medical imaging to financial simulations, molecular dynamics, and more. Even consumer applications are becoming more computationally demanding, says Gupta. "Digital or video image processing has become so complex and demanding that the things consumers do in their digital cameras today would have been characterized as a supercomputing application ten years ago. Transcoding a Blu-Ray movie for playback on your iPod could take 4-5 hours on your typical laptop today. But with a GPU-enabled laptop and a transcoder developed on the CUDA architecture, it could be done in as little as 30 minutes." Results like these will help drive demand for the CUDA architecture. Best of all, the CUDA tools are free.
What Do Developers Need to Do?
CUDA tools today are targeted at the C language developer. If you program in another language (Java, C#, C++, etc.), you'll need to identify the functions that would benefit from GPU speedup, code just those functions in C, compile them with the CUDA C compiler, and then link the resulting object files with the rest of your application to create a CPU/GPU executable. Gupta says there are plans to extend the CUDA tools to Fortran and C++ next. Wrappers for Fortran, Python, Java, and .NET are available today.
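The workflow described above might look roughly like the following sketch. It assumes the hot spot is a SAXPY-style update (y = a*x + y); the file name, the kernel name `saxpy_kernel`, and the wrapper name `saxpy_gpu` are hypothetical and chosen only for illustration. The wrapper is given C linkage so that the rest of an application, built with its usual compiler, can call it after linking.

```cuda
// saxpy_gpu.cu (hypothetical file name): one GPU-accelerated function
// that the rest of an application can link against.
#include <stddef.h>
#include <cuda_runtime.h>

// Kernel: each GPU thread computes one element of y = a*x + y.
__global__ void saxpy_kernel(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {                      // the last block may be only partly used
        y[i] = a * x[i] + y[i];
    }
}

// Host-side wrapper with C linkage. x and y are ordinary host arrays.
// Returns 0 on success and 1 if any CUDA call fails.
extern "C" int saxpy_gpu(int n, float a, const float *x, float *y)
{
    if (n <= 0) return 0;

    size_t bytes = (size_t)n * sizeof(float);
    float *d_x = NULL;
    float *d_y = NULL;

    // Allocate GPU memory and copy the inputs across.
    cudaError_t err = cudaMalloc((void **)&d_x, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&d_y, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(d_x, x, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(d_y, y, bytes, cudaMemcpyHostToDevice);

    if (err == cudaSuccess) {
        int threads = 256;                          // threads per block (a common choice)
        int blocks = (n + threads - 1) / threads;   // enough blocks to cover n elements
        saxpy_kernel<<<blocks, threads>>>(n, a, d_x, d_y);
        err = cudaGetLastError();                   // reports launch failures
    }

    // This blocking copy also waits for the kernel to finish.
    if (err == cudaSuccess) err = cudaMemcpy(y, d_y, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_x);                                  // freeing a NULL pointer is a no-op
    cudaFree(d_y);
    return err == cudaSuccess ? 0 : 1;
}
```

Under these assumptions, the file could be compiled on its own with `nvcc -c saxpy_gpu.cu -o saxpy_gpu.o`, and the resulting object file linked into the application together with the CUDA runtime library (`-lcudart`). Most of the wrapper is data movement between host and GPU memory, which is one reason to move only functions that do substantial work per byte transferred.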
In addition to supporting C language developers, CUDA also supports OpenCL (Open Computing Language), a driver API for programming heterogeneous data- and task-parallel computing across GPUs and CPUs, as well as DX11 compute. The CUDA architecture is designed to natively support all computational interfaces: both standard languages, such as Fortran, C, and C++, and application programming interfaces.
One of the challenges developers face is parallelizing the portion of the code where there is a performance bottleneck. "This is a challenge with any massively parallel system," says Gupta. "And CUDA is designed to make this process easier for the developer. The CUDA programming model makes it easier to break down the code and express the parallelism in the algorithm."
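As an illustration of what "expressing the parallelism" can mean in practice, the sketch below takes a hypothetical bottleneck, brightening every pixel of an 8-bit grayscale image, and shows it twice: as a serial loop nest on the CPU and as a CUDA kernel in which each thread handles one pixel. All function names here are invented for the example, and the 16x16 block size is an assumption rather than a tuned value.

```cuda
#include <stddef.h>
#include <cuda_runtime.h>

// Shared by CPU and GPU code: add delta to one pixel, clamped to 0..255.
__host__ __device__ inline unsigned char brighten_pixel(unsigned char p, int delta)
{
    int v = (int)p + delta;
    return (unsigned char)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

// Serial version: two nested loops visit the pixels one after another.
void brighten_cpu(unsigned char *img, int width, int height, int delta)
{
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            img[y * width + x] = brighten_pixel(img[y * width + x], delta);
}

// Parallel version: the loop nest is replaced by a 2D grid of threads,
// and each thread derives its own (x, y) from its block and thread indices.
__global__ void brighten_kernel(unsigned char *img, int width, int height, int delta)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height)      // threads past the image edge do nothing
        img[y * width + x] = brighten_pixel(img[y * width + x], delta);
}

// Host helper: copies the image to the GPU, runs the kernel, copies it back.
// Returns 0 on success and 1 if any CUDA call fails.
int brighten_gpu(unsigned char *img, int width, int height, int delta)
{
    if (width <= 0 || height <= 0) return 0;

    size_t bytes = (size_t)width * (size_t)height;
    unsigned char *d_img = NULL;

    cudaError_t err = cudaMalloc((void **)&d_img, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(d_img, img, bytes, cudaMemcpyHostToDevice);

    if (err == cudaSuccess) {
        dim3 block(16, 16);                                   // 256 threads per block
        dim3 grid((width + block.x - 1) / block.x,            // round up so the grid
                  (height + block.y - 1) / block.y);          // covers the whole image
        brighten_kernel<<<grid, block>>>(d_img, width, height, delta);
        err = cudaGetLastError();
    }

    if (err == cudaSuccess) err = cudaMemcpy(img, d_img, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_img);
    return err == cudaSuccess ? 0 : 1;
}
```

The per-pixel work is identical in both versions; what changes is that the iteration order is no longer spelled out. That only works because no pixel depends on another, which is the property to look for when choosing which bottleneck to parallelize.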
The CUDA architecture works best with compute- and data-intensive code. The NVIDIA Tesla 10 series processor, which incorporates the second-generation CUDA architecture, has IEEE-standard double-precision floating-point hardware and, with its 240 processor cores, can reach one teraflop of single-precision performance, roughly 1,000 times faster than the previously mentioned Cray X-MP.
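The one-teraflop figure can be sanity-checked with a back-of-the-envelope peak-rate estimate. The core count comes from the text above; the clock rate (about 1.296 GHz) and the three single-precision operations per core per cycle (a multiply-add plus a multiply) are assumptions taken from commonly published Tesla C1060 specifications, not from this fact sheet.

```latex
\begin{align*}
P_{\text{peak}} &= N_{\text{cores}} \times f_{\text{clock}} \times \text{FLOPs per core per cycle} \\
                &\approx 240 \times 1.296\ \text{GHz} \times 3 \\
                &\approx 933\ \text{GFLOPS} \approx 1\ \text{TFLOPS}
\end{align*}
```

By the same arithmetic, a 1,000-fold advantage implies a baseline of about 1 GFLOPS for the older machine. Both numbers are theoretical peaks; sustained performance on real code is normally lower.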
CUDA Momentum
Gupta points out that CUDA has been in development for more than five years and has been shipping for over two years now. More than 80 universities are now teaching the CUDA architecture and parallel programming model. And since CUDA is enabled on all new NVIDIA GPUs, there are over 100 million CUDA-enabled GPUs deployed today. Application developers can thus be confident that there is a ready market out there for CUDA-developed applications.
With Tesla-based computing products ranging from the Tesla C1060 Computing Board to the Tesla Personal Supercomputer (with four Tesla C1060 Computing Boards) and the Tesla S1070 1U rack-mountable system, developers can easily afford their own supercomputing platform for developing CUDA applications.
As Microsoft Technical Fellow Burton Smith says, "We've all heard 'desktop supercomputer' claims in the past, but this time it's for real: NVIDIA and its partners will be delivering outstanding performance and broad applicability to the mainstream marketplace. Heterogeneous computing is what makes such a breakthrough possible."