
Now Available: Optimized DeepSeek-R1 On GMI Cloud

GMI Cloud is happy to announce that we are hosting DeepSeek and its distilled models!

February 03, 2025

GMI Cloud is excited to announce that we are now hosting a dedicated DeepSeek-R1 inference endpoint on optimized, US-based hardware.

What's DeepSeek-R1? Read our initial takeaways here.

Technical details:

  • Model Provider: DeepSeek
  • Type: Chat
  • Parameters: 685B
  • Deployment: Serverless (MaaS) or Dedicated Endpoint
  • Quantization: FP16
  • Context Length: 128K (the model can process up to 128,000 tokens within a single session)

Additionally, we are offering the following distilled models:

  • DeepSeek-R1-Distill-Llama-70B
  • DeepSeek-R1-Distill-Qwen-32B
  • DeepSeek-R1-Distill-Qwen-14B
  • DeepSeek-R1-Distill-Llama-8B
  • DeepSeek-R1-Distill-Qwen-7B
  • DeepSeek-R1-Distill-Qwen-1.5B
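Once you have access, a call to the serverless endpoint might look like the sketch below, assuming an OpenAI-compatible chat-completions interface. The URL and model identifier are illustrative assumptions only; check the GMI Cloud console for the actual values.

```python
import json

# Assumed values -- replace with the endpoint URL and model ID
# from your GMI Cloud dashboard.
API_URL = "https://api.gmi.cloud/v1/chat/completions"
MODEL = "deepseek-ai/DeepSeek-R1"

def build_chat_request(prompt, max_tokens=512):
    """Build a standard chat-completions payload for DeepSeek-R1."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

payload = build_chat_request("Summarize chain-of-thought reasoning in one sentence.")
print(json.dumps(payload, indent=2))

# To actually send the request (requires an API key):
# import urllib.request
# req = urllib.request.Request(
#     API_URL,
#     data=json.dumps(payload).encode(),
#     headers={"Authorization": "Bearer YOUR_KEY",
#              "Content-Type": "application/json"},
# )
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

The distilled models listed above can be selected the same way by swapping in their model identifier.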

Try our token-free service with unlimited usage!

Reach out for access to our dedicated endpoint here.

Build AI Without Limits

GMI Cloud helps you architect, deploy, optimize, and scale your AI strategies.

Ready to build?

Explore powerful AI models and launch your project in just a few clicks.

Get Started