51CTO interviews GMI Cloud's founder Alex Yeh about his path to success
Author: 51CTO chief editor
Original: https://mp.weixin.qq.com/s/YthTgJEOlrXtL8NGeQz3TA
Translated by GMI Cloud, edited for readability
Alex Yeh reflected on the entrepreneurial journey of the past two and a half years, comparing his startup, GMI Cloud, to "a shark struggling to survive in the deep sea." In the ocean’s depths, hidden dangers lurk just out of sight—and navigating around them has become the shark’s daily battle to survive and move forward.
Alex is the founder and CEO of GMI Cloud, an AI-native cloud provider launched in 2023. It’s his second startup. Before founding GMI Cloud, Alex served as a director at a leading private equity and venture capital firm in the Asia-Pacific region. He was also the youngest partner in the crypto and blockchain space, with over 100 AI-related investments under his belt.
In a field like AI—where progress happens daily and uncertainty is the norm—the metaphor feels especially apt. And yet, as Alex would tell you, the reality is often even more extraordinary.
Two and a half years ago, Alex moved to the U.S. to launch GMI Cloud. After significant effort, he secured over 500 hectares of land and a 100-megawatt power plant—marking the start of a new AI infrastructure venture. In just four months, he achieved four major milestones.
And that was just the beginning. “You can’t predict what AI will look like in the long term,” Alex said. “But you need a True North that lasts ten years—like an unshakable aircraft carrier.”
By early 2025, following the release of DeepSeek R1, global users rushed to try the model. GMI Cloud swiftly deployed, adapted, and optimized DeepSeek-R1 using H200 hardware. Demand surged, and Alex’s phone didn’t stop ringing. GMI Cloud is now focused on boosting AI inference performance and token throughput using high-end hardware.
In Q1 this year, GMI Cloud’s revenue tripled year-over-year. The growth raises key questions: What do users want from AI cloud services during the inference era? How can GMI Cloud compete with giants like Google, Microsoft, and Amazon? What’s changing in both infrastructure and applications?
We sat down with Alex for an 80-minute conversation to find out.
Tech innovation tends to be overhyped in the short term and underappreciated in the long run—especially in AI. For startups, the hardest test is surviving the quiet period before growth.
Months before DeepSeek R1 launched, Alex made a strategic call: GMI Cloud had to rapidly build a robust inference engine. In hindsight, it was inevitable. “Big models were already good enough. We saw airline call centers using AI customer service, apps doing instant translation. The next step was clearly edge and local deployment, which requires ultra-low latency.”
He set three key goals: Auto Scaling, Global Scaling, and Hot Swap. The first two are straightforward—but why Hot Swap? Because in inference, machine downtime is a deal-breaker. Hot swapping ensures reliability through instant machine replacement.
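To make Hot Swap concrete, here is a minimal sketch of the pattern in Python. It is illustrative only, not GMI Cloud's implementation, and every node name is hypothetical: a watchdog health-checks serving nodes and instantly promotes a pre-warmed standby when one fails, so downtime is bounded by the probe interval rather than by a cold boot.

```python
import random
import time

# Illustrative sketch of the hot-swap pattern (not GMI Cloud's actual
# implementation): failed serving nodes are replaced instantly from a
# pool of warm standbys, so inference traffic never waits on a cold boot.

active_nodes = ["gpu-node-1", "gpu-node-2", "gpu-node-3"]
standby_pool = ["gpu-node-4", "gpu-node-5"]  # pre-warmed: model weights already loaded


def is_healthy(node: str) -> bool:
    # Placeholder: a real check would probe the node's inference endpoint
    # and GPU telemetry. Here we simulate rare random failures.
    return random.random() > 0.01


def watchdog_tick() -> None:
    for i, node in enumerate(active_nodes):
        if not is_healthy(node) and standby_pool:
            replacement = standby_pool.pop(0)
            active_nodes[i] = replacement  # the router shifts traffic here immediately
            print(f"hot-swapped {node} -> {replacement}")


if __name__ == "__main__":
    for _ in range(100):
        watchdog_tick()
        time.sleep(0.1)  # in production, the probe interval bounds worst-case downtime
```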
In March, GMI Cloud launched its self-developed Inference Engine Cloud Platform with 99.995% uptime (roughly 26 minutes of downtime per year).
This head start has paid off in the last six months. When DeepSeek R1 exploded in February, customer demand turned almost entirely to inference.
With demand came new pressures. Hardware makers like AMD reached out to partner—but Alex declined. “We’re a cloud company, not a hardware vendor,” he said. With limited resources, GMI Cloud must focus on scale, not diversification.
“DeepSeek R1 dominates overseas, and the H200 is best for it. Our customers need high-performance inference. Supporting too many chip types would slow us down. When we’re as big as CoreWeave, we’ll revisit diversification.”
The engineering team is now in high-alert mode. “Tech moves fast—NVIDIA might drop Dynamo one week, and the next, there’s a new community paper. We have to digest and deploy quickly.”
We asked a hypothetical: If Alex were building an application startup, would he go B2B or B2C?
His answer: B2B. “They monetize differently. B2B needs enterprise support. B2C depends on finding large user scenarios. Post-launch feedback is intense, and trend cycles move fast.”
He believes every AI founder is a kind of superhero—and his superpower works best in B2B.
His playbook is “desperate but simple”: Find anchor customers, talk constantly, understand their needs, iterate fast, and deliver.
Still, Alex sees potential in global consumer (B2C) markets. “Chinese companies are great at going global with consumer products. They’re strong at execution, promotion, hardware integration, and open source. A lot of new open-source communities are led by Chinese teams. With that support, you don’t even need to build your own site to monetize.”
Whether it’s B2B or B2C, both offer opportunity. For example, text-to-image or text-to-video: “B2C can power creative studios for designers. B2B can partner with Adobe or offer vertical APIs.”
Why haven’t AI agents taken off?
Alex concluded decisively that the era of explosive growth for general-purpose Agents has not yet arrived. First, models still underperform in key areas, particularly in their ability to interact with the physical world. Second, computational costs remain too high to be truly cost-effective. Third, persistent barriers stand between application scenarios, with private datasets being a major one: until these datasets can be integrated, it will be difficult for Agents to deliver the kinds of services customers actually need.
Still, niche use cases are growing—like AI coding tools (Cursor, Windsurf) and content generators.
On the cost front, since the release of DeepSeek, open-source models have begun to outperform even some closed-source alternatives. With ongoing improvements across AI infrastructure—from memory and hardware to model architecture—the cost per token is steadily dropping. Lower-precision implementations and more efficient solutions are expected to push costs down even further.
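As a back-of-the-envelope illustration of why precision matters (the model size and GPU capacity below are round numbers chosen for the example, not GMI Cloud figures): halving the bytes per weight halves the memory a model occupies, which directly reduces the hardware needed per served token.

```python
# Back-of-the-envelope illustration (hypothetical numbers): weight memory
# for a 70B-parameter model at different precisions. Halving bytes per
# parameter roughly halves the GPUs needed just to hold the weights,
# which flows straight into cost per token.

PARAMS = 70e9        # a 70B-parameter model, chosen for illustration
GPU_MEMORY_GB = 141  # e.g. an H200-class GPU

for name, bytes_per_param in [("FP16", 2), ("FP8", 1), ("INT4", 0.5)]:
    weight_gb = PARAMS * bytes_per_param / 1e9
    gpus_for_weights = weight_gb / GPU_MEMORY_GB
    print(f"{name}: ~{weight_gb:.0f} GB of weights "
          f"(~{gpus_for_weights:.1f} GPUs just for weights)")

# FP16: ~140 GB of weights (~1.0 GPUs just for weights)
# FP8:  ~70 GB of weights  (~0.5 GPUs just for weights)
# INT4: ~35 GB of weights  (~0.2 GPUs just for weights)
```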
As for breaking through scenario-specific barriers, companies with proprietary data in various verticals can build specialized Agents tailored to each domain. If the product experience is strong enough, clients will naturally be more willing to entrust their data.
“Based on my current assessment,” Alex said, “general-purpose Agent technology still has a long way to go before it reaches full-scale breakthroughs. The most rapid progress is still happening in areas like AIGC and text-to-image generation. But in the coming years, as the data flywheel effect kicks in, applications will start to generate meaningful interaction data at scale. That data will fuel stronger multimodal models, which will in turn unlock the next wave of breakthroughs.”
Responsibility for data-related issues lies with the application layer, while the burden of reducing costs falls to cloud providers like GMI Cloud.
Token prices remain high—especially internationally. GMI Cloud uses methods like Prefill-Decode Disaggregation and Elasticity Provisioning to cut token costs without hurting performance.
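Prefill and decode stress hardware differently: prefill is compute-bound, since the whole prompt is processed in one large batch, while decode is memory-bandwidth-bound, generating one token at a time against the KV cache. Routing each phase to a pool sized for it raises utilization, which is where the savings come from. The sketch below shows the routing idea only; it is not GMI Cloud's scheduler, and all pool and function names are hypothetical.

```python
# Illustrative sketch of prefill-decode disaggregation (not GMI Cloud's
# actual scheduler). Prefill runs the full prompt through the model in
# one compute-heavy pass; decode then streams tokens one at a time,
# bound mainly by memory bandwidth. Separating the two lets each pool
# run the hardware it is best suited for at high utilization.

from dataclasses import dataclass


@dataclass
class Request:
    prompt: str
    max_new_tokens: int


def run_prefill(req: Request, pool: str) -> bytes:
    """Process the full prompt on the compute-optimized pool and
    return the serialized KV cache (placeholder)."""
    print(f"prefill of {len(req.prompt)} chars on {pool}")
    return b"kv-cache"


def run_decode(kv_cache: bytes, req: Request, pool: str) -> str:
    """Generate tokens on the bandwidth-optimized pool (placeholder)."""
    print(f"decoding up to {req.max_new_tokens} tokens on {pool}")
    return "generated text"


def serve(req: Request) -> str:
    kv = run_prefill(req, pool="prefill-pool")      # compute-heavy phase
    return run_decode(kv, req, pool="decode-pool")  # bandwidth-heavy phase


print(serve(Request(prompt="Explain hot swapping.", max_new_tokens=128)))
```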
On pricing wars, Alex is direct: “First, customers want it to exist. Then they want it to be good. Then they want it to be cheap. If it’s cheap but unreliable, it’s worthless. Or if it’s the cheapest in the U.S., but customers operate in Asia, it’s still not cost-effective.”
Recent examples like GPT-4o’s Ghibli-style image generation show how fast inference demand can spike—forcing OpenAI to impose limits.
So how quickly will costs drop?
Alex predicts: very swiftly. With every 1-2 year cycle, NVIDIA and others release new architectures. Each time, inference cost could halve. Within five years, it could approach zero.
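Taking that prediction at face value, the compounding is easy to check; the arithmetic below is purely illustrative, with the cycle length as the only variable.

```python
# Illustrative compounding of Alex's prediction: if inference cost halves
# with each 1-2 year hardware cycle, five years of cycles erase most of
# today's price.

for cycle_years in (1, 1.5, 2):
    cycles_in_5_years = 5 / cycle_years
    remaining = 0.5 ** cycles_in_5_years
    print(f"{cycle_years}-year cycles: cost falls to "
          f"{remaining:.1%} of today's within five years")

# 1-year cycles:   cost falls to 3.1% of today's
# 1.5-year cycles: cost falls to 9.9% of today's
# 2-year cycles:   cost falls to 17.7% of today's
```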
Scaling Laws and Moore’s Law point to two trends: cloud providers will continue to integrate models and hardware, and token prices will soon cease to be a bottleneck.
In October 2024, GMI Cloud raised $82 million in Series A. Over time, Alex developed a way to convey his vision clearly:
“We’re the Shopify of AI.”
Shopify is a globally recognized e-commerce platform that has, since its inception, been dedicated to helping entrepreneurs and influencers quickly launch online stores and pursue their business dreams. In the U.S., the market was once dominated by eBay and Amazon—platforms that locked entrepreneurs into rigid ecosystems with limited flexibility. Shopify changed that by offering creators a more autonomous and controllable alternative.
Alex sees a parallel in GMI Cloud’s mission. “Historically, most innovations have been confined to the ‘big three’ cloud providers in the U.S., which has made it difficult for customers to achieve real, value-added breakthroughs,” he explained. “We want to give control of the environment back to our clients.”
How does GMI Cloud achieve this? By prioritizing flexibility in product design. The platform is built around three core components—GPU Instance, Cluster Engine, and Inference Engine—which can be purchased separately or bundled together. Clients can mix and match these layers based on their needs: deploying locally trained models, using GMI Cloud’s pre-optimized offerings, or integrating third-party platforms. Even GPU resources come with zero vendor lock-in, ensuring customers maintain full control and autonomy.
At NVIDIA GTC 2025, GMI Cloud officially launched its Inference Engine—a gateway to its Model-as-a-Service (MaaS) layer.
Alex explained that it’s designed for product teams without ML backgrounds: with the Inference Engine, they can focus on growth and product rather than debugging base models, and simply pick a model from the Marketplace.
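In practice, that workflow is usually a few lines of client code. The sketch below assumes an OpenAI-compatible endpoint, a convention many inference clouds follow; the base URL, model name, and key are placeholders, and GMI Cloud's actual API may differ.

```python
# Hypothetical sketch of the "pick a model from the Marketplace" flow.
# Assumes an OpenAI-compatible endpoint, a common convention among
# inference clouds; the URL, model name, and key below are placeholders,
# and GMI Cloud's real API may differ.

from openai import OpenAI

client = OpenAI(
    base_url="https://api.example-inference-cloud.com/v1",  # placeholder endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="deepseek-r1",  # a model picked from the marketplace
    messages=[{"role": "user", "content": "Summarize hot swapping in one line."}],
)
print(response.choices[0].message.content)
```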
So, what’s next?
When asked about GMI Cloud’s 3–5 year roadmap, Alex shared an ambitious vision: to build an "AI of the Internet." “It’s a vision I find incredibly exciting,” he said. “We want to create an invisible GPU cloud network—a silent enabler that empowers startups and enterprise innovators to bring their AI ideas to life. We’ll deliver scenario-specific computing power and engine support across a wide range of AI R&D use cases, staying closely aligned with every breakthrough in the field. Our goal is to help shape the future of the AI industry—not just stand by and watch it unfold.”
The GMI Cloud he founded 28 months ago is evolving into a full-stack AI platform—from compute to storage to application layers, all modular.
“Like a luxury hotel: check in with a suitcase and live comfortably. And if you like something in the room—you can take it home.”
Give GMI Cloud a try and see for yourself whether it fits your AI needs. Pricing starts at $4.39 per GPU-hour, with rates as low as $2.50 per GPU-hour.