Hacking a Server-Grade NVIDIA GPU Into a Home Desktop

What is the deal with computer hardware lately? Memory and storage prices have gone stratospheric. A top-of-the-line Raspberry Pi 5 now costs more than $300. Don’t even ask about a GPU; if you have to ask, you probably can’t afford one. With the days of $35 single-board computers and dirt-cheap RAM behind us, we have to start getting creative again, like back when turning Wi-Fi routers into computers was a big thing.

That’s what Oscar Molnar is doing, and it got him a nice deal on a GPU. For about $250, he picked up an NVIDIA Tesla V100 SXM2 16GB card. The trick is that this GPU was plucked from a data center. While it still has a lot of life left in it, it’s obsolete for commercial applications. But repurposing a server-grade GPU for home use isn’t easy. There is no PCIe connector, and the power connector isn’t what you’d expect either.

The adapter (📷: Oscar Molnar)

The Tesla V100 SXM2 was originally designed for NVIDIA DGX servers and hyperscale computing systems. Unlike a conventional graphics card, it plugs into a proprietary socket and relies on specialized server hardware for power, cooling, and communication. Fortunately for Molnar, third-party SXM2-to-PCIe adapter boards are available. By combining one of these adapters with the secondhand V100, he was able to install the card alongside his existing RTX 4080.

The V100 may date back to 2017, but it still packs 16GB of HBM2 memory and an impressive 900GB/s of memory bandwidth. That actually exceeds the bandwidth available on the newer RTX 4080. For AI inference workloads, where moving model weights through memory is often the limiting factor, bandwidth matters far more than many people realize.
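
To see why bandwidth matters so much, here is a rough back-of-the-envelope sketch in Python. It assumes token generation is purely memory-bound and that every model weight is read from VRAM once per generated token, which gives an upper bound rather than a prediction. The 900GB/s figure comes from the V100's specification above; the 15GB weight size is a made-up illustrative value, not a measurement from Molnar's setup.

```python
def max_tokens_per_second(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Upper bound on decode speed, assuming every weight is read once per token."""
    if weights_gb <= 0:
        raise ValueError("weights_gb must be positive")
    return bandwidth_gb_s / weights_gb


V100_BANDWIDTH_GB_S = 900.0      # memory bandwidth quoted for the V100
HYPOTHETICAL_WEIGHTS_GB = 15.0   # assumed size of a quantized model (illustrative only)

ceiling = max_tokens_per_second(V100_BANDWIDTH_GB_S, HYPOTHETICAL_WEIGHTS_GB)
print(f"Bandwidth-bound ceiling: {ceiling:.1f} tokens/s")
```

Real-world throughput lands below this ceiling because of compute time, KV-cache reads, and transfers between devices, but the estimate shows how directly memory bandwidth caps generation speed.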

However, there was still some work to be done to make this GPU suitable for home use. The adapter’s cooling fan was about as loud as a vacuum cleaner; Molnar measured the stock setup at 82 dB. After some experimentation, he discovered that the fan used standard PWM control signals despite its unusual connector. A handful of jumper wires and a custom cable allowed the fan to be connected directly to a motherboard header, reducing the noise dramatically while keeping temperatures below 50°C under load.

The card installed in a desktop computer (📷: Oscar Molnar)

With both GPUs installed, Molnar’s system now has 32GB of combined VRAM. Using tensor splitting in llama.cpp, large language models can be distributed across both devices. In one test, a quantized 27-billion-parameter Qwen3.6 model with a 128,000-token context window achieved 32 tokens per second during inference.
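
The article does not give Molnar's exact command, so the following Python sketch only illustrates how a tensor split might be derived from each card's VRAM and passed to llama.cpp. It relies on llama.cpp's `--tensor-split`, `-ngl`, `-c`, and `-m` options; the helper names, the `model.gguf` path, and the choice of `llama-server` as the binary are assumptions made for illustration.

```python
import shlex


def tensor_split(vram_gb: list[float]) -> str:
    """Return comma-separated proportions, scaled so the smallest GPU is 1."""
    if not vram_gb or any(v <= 0 for v in vram_gb):
        raise ValueError("vram_gb must contain positive sizes")
    smallest = min(vram_gb)
    return ",".join(f"{v / smallest:g}" for v in vram_gb)


def build_command(model_path: str, vram_gb: list[float], context_tokens: int) -> list[str]:
    """Assemble a llama.cpp command line (hypothetical helper, not Molnar's script)."""
    return [
        "llama-server",
        "-m", model_path,              # path to a GGUF model (placeholder)
        "-ngl", "99",                  # offload as many layers as possible to the GPUs
        "-c", str(context_tokens),     # context window size in tokens
        "--tensor-split", tensor_split(vram_gb),
    ]


# Two 16GB cards, as in the article: an RTX 4080 and a V100.
cmd = build_command("model.gguf", [16.0, 16.0], 128000)
print(shlex.join(cmd))
```

Splitting in proportion to VRAM is only a starting point; in practice the ratio may need adjusting to leave room for the context cache on each card.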

The software setup required some hacking, particularly because newer NVIDIA drivers have dropped support for the Volta architecture used by the V100. By pinning specific driver, kernel, and CUDA versions under NixOS, Molnar was able to get both the RTX 4080 and V100 working together reliably.

This kind of project certainly isn’t for everyone, but it’s a good reminder that if you’re willing to get creative, you can still get plenty of computing power for a reasonable price.
