Why Is It So Hard To Run A GPU Server On A Public Cloud?
When it comes to deep learning applications and the graphics cards used to run them, many people are misinformed. The popular perception is that only high-end cards from major manufacturers such as NVIDIA and AMD are effective, and that an average graphics card cannot produce comparable results. To see where the real difference lies, one must understand the underlying technologies that go into creating a card.
What is the distinction between the regular computers we all know and what we typically think of as high-performance computing machines? The answer lies in the core elements that go into building and running them. For example, the computer you are probably using right now has four major elements: a motherboard, a processor, RAM, and a graphics card. The motherboard and processor provide the communication ports and the processing capability, while the graphics card holds the hardware needed to create the images and video we see on our screens. The motherboard itself is simply the component that coordinates all of these elements and provides the connectivity needed to communicate with external software and hardware devices.
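To make that distinction concrete, here is a minimal sketch of how software can tell whether a machine is CPU-only or has a usable graphics card. It assumes a Python environment with PyTorch installed, which the article does not specify; any deep learning framework with a device query would do.

```python
# Minimal sketch; PyTorch is an assumption, not something the article prescribes.
import torch

def describe_machine() -> str:
    """Report whether this machine is CPU-only or has a usable GPU."""
    if torch.cuda.is_available():
        count = torch.cuda.device_count()        # number of visible GPUs
        name = torch.cuda.get_device_name(0)     # first visible GPU
        return f"GPU machine: {count} device(s), e.g. {name}"
    return "CPU-only machine: no CUDA-capable graphics card detected"

if __name__ == "__main__":
    print(describe_machine())
```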
As mentioned previously, the differences between regular computers and what we consider high-performance computing machines lie in the underlying technology. In the case of the latter, we are talking about two specific types of element: the GPU server itself and the graphics cards inside it. These are very important parts because they define the performance and capabilities of a particular system. Needless to say, these components are quite expensive, with the exception of systems customized for specific uses such as gaming and workstation environments. They are also very complex and capable of carrying out an enormous number of operations per second.
As already mentioned, the main function of these chips is to schedule tasks on a central fabric, or queue, according to pre-established priorities. The most common arrangement divides the workload between the CPU and the graphics cards into two distinct groups. This is the parallel processing side, and it is usually made up of two groups: one for the workloads of the processor cores and another for the discrete graphics cards. The two groups of elements on the die exist for entirely different reasons, yet together, joined by the fabric, they form the basis of the whole system's performance.
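A hedged illustration of that split, again assuming PyTorch purely for demonstration: the same workload can be submitted either to the processor-side group or to the discrete graphics card, and each device works through its own queue independently.

```python
# Sketch only; PyTorch and a CUDA device are assumptions, not article requirements.
import torch

def run_on(device: str, size: int = 2048) -> torch.Tensor:
    """Queue one matrix multiplication on the chosen device."""
    a = torch.randn(size, size, device=device)
    b = torch.randn(size, size, device=device)
    return a @ b                              # executed by that device's queue

cpu_result = run_on("cpu")                    # handled by the processor cores
if torch.cuda.is_available():
    gpu_result = run_on("cuda")               # enqueued on the GPU's stream
    torch.cuda.synchronize()                  # wait for the GPU queue to drain
```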
If you take a closer look at the fabric of the chip, you will notice that the two groups of elements are interconnected by a long run of wires. This wire path links the main processor of the computer to all of the critical workloads, which are then executed by the different processor cores. Because the fabric is rather complicated, connecting the different elements is complex as well, and every piece of data that has to cross it costs time. This is why GPU servers can end up running slower than standard ones when a workload spends more time moving data over the interconnect than computing.
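As a rough way to see that cost on a real machine, one can time the trip across the host-to-device link separately from the computation itself. This sketch again assumes PyTorch and a CUDA-capable GPU, neither of which the article names.

```python
# Sketch under the assumption that PyTorch and a CUDA-capable GPU are present.
import time
import torch

x = torch.randn(4096, 4096)                   # data starts in host (CPU) memory

if torch.cuda.is_available():
    t0 = time.perf_counter()
    x_gpu = x.to("cuda")                      # crosses the CPU-GPU interconnect
    torch.cuda.synchronize()
    t1 = time.perf_counter()

    y = x_gpu @ x_gpu                         # compute stays on the device
    torch.cuda.synchronize()
    t2 = time.perf_counter()

    print(f"transfer: {t1 - t0:.4f}s  compute: {t2 - t1:.4f}s")
```

If the transfer time dominates, the GPU server is effectively idling while data crawls across the fabric.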
This has become an important problem for GPU servers running on the public cloud, because there is no such thing as a one-size-fits-all server. One of the major causes is that the workloads of the different processor cores are often too large for the fabric to handle. A good example is that the most popular algorithms in modern computer games are written in languages that ultimately target the x86 instruction set. Game designers need this code to execute without interruptions from stalls caused by other tasks on the compute side. The issue is that the game engine's core responsibility is to pre-calculate the positions of the players and their animations at every frame. To keep those calculations accurate and the game running smoothly, it needs the most advanced computing techniques available, and serial x86-style code is simply not well suited to that kind of work.
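A short sketch of the contrast being gestured at here, with NumPy assumed purely for illustration: a per-frame loop in which each position depends on the previous frame is inherently serial, CPU-style work, whereas a large batch of independent positions can advance in a single data-parallel step of the kind GPUs are built for.

```python
# Illustrative sketch only; NumPy is an assumption, not part of the article.
import numpy as np

DT = 1.0 / 60.0                                # one frame at 60 frames per second

def serial_update(pos: float, vel: float, frames: int) -> float:
    """Each frame depends on the previous one: classic serial CPU-style work."""
    for _ in range(frames):
        pos = pos + vel * DT                   # next position needs the last one
    return pos

def parallel_update(pos: np.ndarray, vel: np.ndarray) -> np.ndarray:
    """Many independent positions advance one frame in a single
    data-parallel operation, the shape of work GPUs are designed for."""
    return pos + vel * DT

one_player = serial_update(0.0, 3.0, frames=600)                          # many frames, one object
many_players = parallel_update(np.zeros(100_000), np.full(100_000, 3.0))  # one frame, 100k objects
```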