Neocloud & Hosted.ai: Scaling AI Infrastructure via Fractional GPU Pooling
The compute shortage of 2024-2025 has transitioned into an optimization crisis in 2026. While the world's largest AI labs have secured their massive clusters, thousands of startups and enterprises are left fighting for the "scraps" of the GPU market. Enter Neocloud and Hosted.ai. Today, the two companies announced a landmark merger to create the first decentralized GPU Pooling Marketplace, a system designed to unlock the estimated 40% of global H100 capacity that currently sits idle during off-peak cycles.
VRAM Virtualization: The Technical Breakthrough
Until now, GPU resources have been largely monolithic. If you wanted to run an inference task, you had to lease a full physical GPU (or, at best, a MIG slice). Neocloud’s proprietary VRAM Virtualization layer changes this. By utilizing Dynamic Resource Fractionalization (DRF), Neocloud can abstract a single H200 or B200 GPU into dozens of Virtual Compute Units (VCUs).
Developers can now provision exactly the amount of VRAM they need—down to 1 GB increments—without the overhead of a full virtual machine. Neocloud’s custom hypervisor, Aether, manages the spatial and temporal multiplexing of GPU kernels. This ensures that a workload running in 4 GB of VRAM gets guaranteed memory bandwidth and compute cycles, isolated from other tenants on the same silicon.
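The fractionalization model described above can be sketched in a few lines. This is a minimal illustration, not Neocloud's actual API: the `FractionalGPU` and `VCU` classes are hypothetical, and the bandwidth-proportional-to-VRAM policy is an assumption; the 141 GB capacity and ~4.8 TB/s bandwidth figures are the published H200 specs.

```python
from dataclasses import dataclass, field

@dataclass
class VCU:
    """A hypothetical Virtual Compute Unit carved from one physical GPU."""
    tenant: str
    vram_gb: int            # provisioned in 1 GB increments
    bandwidth_gbps: float   # guaranteed slice of memory bandwidth

@dataclass
class FractionalGPU:
    """Illustrative DRF-style fractionalization of a single H200-class card."""
    total_vram_gb: int = 141        # H200 HBM3e capacity
    total_bw_gbps: float = 4800.0   # aggregate memory bandwidth
    vcus: list = field(default_factory=list)

    @property
    def free_vram_gb(self) -> int:
        return self.total_vram_gb - sum(v.vram_gb for v in self.vcus)

    def provision(self, tenant: str, vram_gb: int) -> VCU:
        """Carve out a VCU in whole-GB increments, if capacity allows."""
        if vram_gb < 1 or vram_gb > self.free_vram_gb:
            raise ValueError("requested VRAM not available on this card")
        # Assumed policy: the bandwidth guarantee scales with the VRAM fraction.
        bw = self.total_bw_gbps * vram_gb / self.total_vram_gb
        vcu = VCU(tenant, vram_gb, round(bw, 1))
        self.vcus.append(vcu)
        return vcu

gpu = FractionalGPU()
slice_a = gpu.provision("tenant-a", 4)
print(slice_a.bandwidth_gbps)   # proportional share of the card's bandwidth
print(gpu.free_vram_gb)         # 137
```

The real system would also enforce temporal multiplexing of compute cycles, which this memory-only sketch omits.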
Efficiency Benchmark
In production testing, Neocloud’s DRF technology improved cluster-wide GPU utilization from an average of 42% to 89%, while reducing the marginal cost per token for small-model inference by 65%.
Hosted.ai: The Real-Time Auction Engine (RAE)
If Neocloud is the engine, Hosted.ai is the marketplace that feeds it. The platform features a Real-Time Auction Engine (RAE) that functions similarly to a high-frequency trading floor. Data center owners can "list" their excess VCUs on the marketplace, and developers can bid for them based on SLA requirements.
The RAE uses a Vickrey-type auction to ensure fair pricing. Prices fluctuate every 60 seconds based on global supply and demand. Large enterprises with private data centers—such as banks or healthcare providers—can now turn their "night-time idle capacity" into a new revenue stream, effectively subsidizing their own AI R&D by exporting their compute to the Hosted.ai network.
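The core mechanic of a Vickrey auction is simple: the highest bidder wins, but pays the second-highest bid, which makes truthful bidding the dominant strategy. A minimal sketch (the `vickrey_winner` helper and the bid figures are illustrative, not Hosted.ai's implementation):

```python
def vickrey_winner(bids: dict) -> tuple:
    """Sealed-bid second-price (Vickrey) auction: the highest bidder
    wins but pays the second-highest bid."""
    if len(bids) < 2:
        raise ValueError("a second-price auction needs at least two bids")
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner = ranked[0][0]
    clearing_price = ranked[1][1]   # second-highest bid sets the price
    return winner, clearing_price

# Hypothetical developers bidding (USD/hour) for the same block of VCUs:
bids = {"dev-a": 0.42, "dev-b": 0.35, "dev-c": 0.51}
winner, price = vickrey_winner(bids)
print(winner, price)   # dev-c wins, but pays dev-a's bid of 0.42
```

In the marketplace described above, a round like this would re-clear every 60 seconds as supply and demand shift.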
Securing the Multi-Tenant GPU
The biggest hurdle to GPU pooling has always been security. Side-channel attacks and memory leakage are significant risks when multiple users share the same physical die. Neocloud addresses this with Encrypted Multi-Tenancy. Every VCU is wrapped in a hardware-protected memory enclave.
The Aether hypervisor performs real-time Memory-Address Sanitization to prevent "rowhammer" style exploits. Furthermore, all data passed to a VCU is encrypted in transit and only decrypted within the GPU’s Secure Memory Controller. This allows Hosted.ai to satisfy HIPAA and SOC2 compliance requirements, even in a shared-compute environment.
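The encrypt-in-transit, decrypt-in-enclave flow can be illustrated with a toy one-time pad, purely as a stand-in for the hardware-backed ciphers a real secure memory controller would use (the `xor_bytes` helper and the payload are hypothetical; production enclaves use AES-class encryption keyed inside the hardware):

```python
import secrets

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each payload byte with the corresponding key byte."""
    return bytes(d ^ k for d, k in zip(data, key))

# Illustrative one-time pad: the key is shared only with the enclave.
payload = b"tenant activations"
key = secrets.token_bytes(len(payload))

ciphertext = xor_bytes(payload, key)    # what crosses the shared fabric
recovered = xor_bytes(ciphertext, key)  # decrypted only inside the VCU

print(recovered == payload)   # True: the round trip is lossless
```

The point of the sketch is the trust boundary: the host sees only `ciphertext`, and plaintext exists only where the key does.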
The "Uber-ization" of Silicon
The long-term vision of the Neocloud-Hosted.ai merger is the Liquidization of Compute. By the end of 2026, the companies plan to release an Edge-Node Connector that will allow high-end consumer GPUs (like the NVIDIA RTX 6090) to join the global pool. This "Uber for GPUs" model would allow a designer in London to "rent out" their workstation's GPU power to a developer in Singapore while they sleep.
While interconnect latency remains a bottleneck for large-scale training, the inference market is perfectly suited for this decentralized model. Most agentic tasks involve small context windows and fast turnarounds, making them ideal candidates for Fractional Compute Routing.
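A router for such workloads can be sketched as a simple filter-then-minimize over marketplace listings. Everything here is illustrative (the `route_task` helper, the node names, and the prices are invented for the example); a production router would also weigh interconnect latency and SLA tiers, not just price:

```python
def route_task(task_vram_gb: int, listings: list):
    """Pick the cheapest listed VCU that satisfies the task's VRAM need
    (an illustrative sketch of fractional compute routing)."""
    eligible = [l for l in listings if l["vram_gb"] >= task_vram_gb]
    if not eligible:
        return None   # no node in the pool can host this task
    return min(eligible, key=lambda l: l["price_per_hr"])

# Hypothetical marketplace listings:
listings = [
    {"node": "london-ws-1", "vram_gb": 8,  "price_per_hr": 0.12},
    {"node": "sgp-dc-3",    "vram_gb": 24, "price_per_hr": 0.30},
    {"node": "fra-dc-7",    "vram_gb": 4,  "price_per_hr": 0.05},
]
print(route_task(6, listings)["node"])   # london-ws-1: cheapest with >= 6 GB
```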
Conclusion: Toward Compute as a Utility
The Neocloud-Hosted.ai partnership is a glimpse into a future where compute is treated as a standardized utility, much like water or electricity. In this world, the concept of "owning a server" becomes as antiquated as "owning a power generator." By democratizing access to high-end silicon, Neocloud and Hosted.ai are ensuring that the AI economy remains competitive and accessible to all, not just the "Compute Rich."
Architect Your Cloud Infrastructure
Managing fractional GPU pools requires meticulous planning. Use ByteNotes to collaborate on your infrastructure diagrams and keep your resource allocation logs organized for your engineering team.
Start Using ByteNotes Today →