NVIDIA, AWS expand EC2 G7 instances with RTX Blackwell for production AI; vector search now 10x faster on OpenSearch Serverless
NVIDIA and AWS announced a series of AI infrastructure expansions Tuesday. The headline: EC2 G7 instances powered by NVIDIA RTX PRO 4500 Blackwell Server Edition GPUs for AI inference, graphics, and data analytics workloads, delivering up to 4.6x the AI inference performance of G6 instances. G7 comes in one- to eight-GPU configurations plus bare metal variants, topping out at eight GPUs with 256GB of total GPU memory, 700 Gbps EFA networking, and 7.6TB of NVMe SSD storage.
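To make the memory figures concrete: 256GB across eight GPUs works out to 32GB per GPU. The sketch below uses that derived number to estimate how many GPUs a model of a given memory footprint would need. The function name, the even-sharding assumption, and the idea that any GPU count from one to eight is offered are illustrative assumptions, not details from the announcement.

```python
import math

# Derived from the article: 256 GB total across 8 GPUs -> 32 GB per GPU.
PER_GPU_GB = 256 / 8
MAX_GPUS = 8


def gpus_needed(required_gb: float,
                per_gpu_gb: float = PER_GPU_GB,
                max_gpus: int = MAX_GPUS) -> int:
    """Smallest GPU count whose combined memory covers required_gb.

    Hypothetical helper: assumes the workload shards evenly across GPUs
    and ignores runtime overhead (activations, KV cache, framework buffers).
    """
    if required_gb <= 0:
        raise ValueError("required_gb must be positive")
    count = math.ceil(required_gb / per_gpu_gb)
    if count > max_gpus:
        raise ValueError("workload does not fit on a single eight-GPU instance")
    return count


if __name__ == "__main__":
    for footprint in (24, 70, 256):
        print(footprint, "GB ->", gpus_needed(footprint), "GPU(s)")
```

In practice the usable memory per GPU is lower than the nominal figure, so a real sizing exercise would add headroom before rounding up.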
AWS also made NVIDIA cuVS the default GPU-accelerated vector indexing engine for Amazon OpenSearch Serverless, shifting vector search from a specialized optimization project to a standard cloud capability. Result: vector indexing up to 10x faster at one-quarter the cost of CPU-only builds, so billion-scale vector databases can be built in under an hour. That matters for RAG, semantic search, recommendation systems, and agentic AI workloads.
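As a rough illustration of what those ratios imply, the sketch below applies the "up to 10x faster" and "one-quarter the cost" figures to a CPU-only baseline. The baseline numbers in the usage example are made up for illustration, and because the announced figures are upper bounds, the result is a best case rather than a forecast.

```python
def gpu_index_estimate(cpu_hours: float,
                       cpu_cost_usd: float,
                       speedup: float = 10.0,
                       cost_ratio: float = 0.25) -> tuple[float, float]:
    """Best-case (hours, cost) for a GPU index build given a CPU baseline.

    Hypothetical helper: speedup and cost_ratio default to the "up to"
    figures quoted in the article, so real builds may do worse.
    """
    if cpu_hours <= 0 or cpu_cost_usd < 0:
        raise ValueError("baseline must be a positive duration and non-negative cost")
    if speedup <= 0 or not 0 < cost_ratio <= 1:
        raise ValueError("speedup must be > 0 and cost_ratio in (0, 1]")
    return cpu_hours / speedup, cpu_cost_usd * cost_ratio


if __name__ == "__main__":
    # Illustrative baseline only: an 8-hour, $400 CPU-only index build.
    hours, cost = gpu_index_estimate(cpu_hours=8.0, cpu_cost_usd=400.0)
    print(f"best case: {hours:.1f} h, ${cost:.2f}")
```

Read the other way, the "under an hour" claim at a full 10x speedup corresponds to a CPU-only build of just under ten hours.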
AWS achieved NVIDIA Exemplar Cloud status for NVIDIA GB300, meaning the cloud provider meets NVIDIA's rigorous performance thresholds for large-scale training workloads. The partnership reinforces AWS's position as a primary production deployment platform for NVIDIA hardware and software, with G7 instances available through AWS Deep Learning AMIs, ECS, EKS, and Amazon SageMaker AI.
For architects: G7 shifts inference from over-provisioned multi-GPU instances toward right-sized, cost-efficient deployments. Making cuVS the default in OpenSearch Serverless removes the need to tune separate vector DB infrastructure, lowering operational overhead for teams scaling retrieval and agentic systems. Watch for follow-on optimizations around latency: 700 Gbps networking is strong, but sub-10ms p99 latencies for live multi-hop agent calls remain a frontier challenge.
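The latency point is easier to see with numbers. The sketch below computes a nearest-rank p99 over per-request latencies and sums sequential hops to get an end-to-end figure, which shows how quickly a 10ms budget is consumed as hops are added. All function names and the sample data are illustrative assumptions; real measurements would come from tracing or load tests.

```python
def p99(samples_ms: list[float]) -> float:
    """Nearest-rank 99th percentile of a non-empty list of latencies."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    # Integer form of ceil(0.99 * n), avoiding floating-point rounding.
    rank = (99 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def end_to_end(hop_samples_ms: list[list[float]]) -> list[float]:
    """Per-request total latency, assuming hops run sequentially.

    Each inner list holds one hop's latency for requests 0..n-1,
    so the lists must all have the same length.
    """
    if len({len(hop) for hop in hop_samples_ms}) != 1:
        raise ValueError("every hop needs one sample per request")
    return [sum(per_request) for per_request in zip(*hop_samples_ms)]


def meets_budget(samples_ms: list[float], budget_ms: float = 10.0) -> bool:
    """True if the p99 of the samples is strictly under the budget."""
    return p99(samples_ms) < budget_ms


if __name__ == "__main__":
    # Synthetic data: each hop ranges from 0.1 ms to 6.0 ms across 60 requests.
    hop = [i / 10 for i in range(1, 61)]
    print("one hop under 10 ms:", meets_budget(hop))
    print("two hops under 10 ms:", meets_budget(end_to_end([hop, hop])))
```

Note that the end-to-end p99 has to be computed over summed per-request latencies; adding up each hop's individual p99 generally gives a different answer.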