DeepSeek-R1: The Open-Source Challenger Upending the LLM Market

We took a look at DeepSeek-R1's research paper and its implications to understand why it's so groundbreaking.

January 28, 2025


Well, this is exciting: DeepSeek-R1 is an open-source reasoning model that rivals OpenAI's o1 in complex problem-solving tasks while being 90-95% more affordable. This breakthrough highlights the growing potential of open-source AI and its impact on the cloud computing landscape.

You can read the paper here: DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Here are the key business and practitioner takeaways:

Business:

  • Costs: At roughly 95% cheaper than OpenAI's o1, DeepSeek-R1 improves the margins on advanced AI reasoning capabilities and puts them within reach of startups, researchers, and budget-conscious businesses. We expect this up-to-20x improvement in affordability to make many more AI applications economically viable.
  • Open-source: The model follows the MIT license, which allows free commercial and academic use. This is critical for anyone interested in building on top of DeepSeek's model and opens up opportunities for the groundbreaking methods used to create DeepSeek to be applied to other open-source models.
  • Specialties: Benchmark comparisons indicate that DeepSeek-R1 excels in mathematical reasoning and software engineering tasks, while OpenAI's o1 performs better in general knowledge and problem-solving.
  • Strategic Implications: We fully expect other AI providers to reevaluate their pricing strategies in response to this competitive model, released free and open-source.

Practical Implications:

  • DeepSeek's findings underline the viability of smaller, distilled models for specialized tasks, offering high performance with lower resource requirements.
  • The combination of RL and cold-start approaches in DeepSeek-R1 provides a scalable and effective pathway for tackling complex reasoning challenges.

Background: Understanding AI Reasoning Models

Reasoning models are transforming AI by tackling tasks requiring logical inference, problem-solving, and decision-making. Unlike traditional pattern recognition models, they mimic human cognition, enabling advancements in complex fields like mathematics, coding, and scientific research.

DeepSeek-R1 (DS-R1) is a breakthrough in AI reasoning, using a multi-stage training process that integrates cold-start data before reinforcement learning, ensuring a strong foundation for high-complexity tasks. Built on the V3-Base model, it features a mixture of experts (MoE) framework with 671 billion parameters, activating only 37 billion per token for optimal efficiency. This design maximizes performance while minimizing resource use, making it ideal for enterprise-level workloads.
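To make the sparse-activation idea concrete, here is a toy sketch of a mixture-of-experts forward pass. This is not DeepSeek's actual routing code; the shapes, the top-k gating, and all names are illustrative assumptions:

```python
import numpy as np

def moe_forward(x, experts, router_w, top_k=2):
    """Toy mixture-of-experts layer: route one token to its top-k experts.

    Assumed shapes: x is (d,), each expert is a (d, d) weight matrix,
    router_w is (n_experts, d). Real MoE layers (including DeepSeek's)
    are far more elaborate; this only illustrates sparse activation.
    """
    logits = router_w @ x                      # score every expert
    top = np.argsort(logits)[-top_k:]          # keep only the best k
    gates = np.exp(logits[top])
    gates /= gates.sum()                       # softmax over selected experts
    # Only top_k experts actually run; the rest of the parameters stay
    # idle, which is how 671B total params can cost ~37B per token.
    return sum(g * (experts[i] @ x) for g, i in zip(gates, top))

rng = np.random.default_rng(0)
d, n_experts = 8, 4
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]
router_w = rng.normal(size=(n_experts, d))
y = moe_forward(rng.normal(size=d), experts, router_w)
print(y.shape)  # (8,)
```

Only the selected experts' matrix multiplies execute, so compute per token scales with `top_k`, not with the total expert count.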

DeepSeek has also open-sourced the model and six distilled variants (1.5B–70B parameters) based on Qwen and Llama architectures, offering developers flexible deployment options.

How Does DeepSeek-R1 Compare to OpenAI's o1?

Below are the DeepSeek-R1 benchmark performances provided in the paper, showcasing how R1 compares to OpenAI-o1-1217.

  • DeepSeek-R1 is better at…
    • Showing Detailed Reasoning: It provides a full, transparent chain-of-thought (tens of thousands of tokens). It is fascinating to watch the multi-faceted reasoning process the model uses to arrive at its answers, including edge cases and unintended consequences.
    • Cost-Effectiveness & Openness: The hosted version is free to use (with daily limits) and openly accessible. Users can also clone it from the GitHub repository and deploy DS-R1 on the AI infrastructure of their choosing.
  • OpenAI's o1 is better at…
    • Advanced Scientific Tasks: Demonstrates near doctoral-level performance in physics, chemistry, and biology.
    • High-Level Competition Performance: Achieves 83% accuracy on the IMO qualification exam and ranks in the 89th percentile on Codeforces.
  • They are equally good at…
    • Mathematics & Coding: Both handle complex math (e.g., geometry, combinatorics) and programming tasks well.
    • General Logical Reasoning: Both can break down multi-step logical problems and arrive at correct solutions.

OpenAI's o1 series, released in late 2024, introduced a novel approach to AI reasoning by allowing models to "think" longer before generating responses. This enhancement helps o1 excel in science, coding, and mathematics. However, DeepSeek-R1 has demonstrated competitive performance across these benchmarks, matching o1's capabilities in key reasoning tasks.

The parity between DeepSeek-R1 and OpenAI's proprietary model is a game-changer for enterprises looking to leverage AI for critical workloads. As an open-source solution, DeepSeek-R1 provides greater accessibility, enabling organizations to experiment, customize, and deploy powerful reasoning models without vendor lock-in. This aligns with GMI Cloud's vision of providing on-demand, flexible GPU resources to power AI innovation.

Implications for AI Development

DeepSeek-R1 seems to have no obvious drawbacks, but a few things can be considered limitations:

  • Limited additional fine-tuning: There is currently no official pipeline for fine-tuning or running reinforcement learning on top of the model. We look forward to those tools being open-sourced in the future.
  • Spontaneous stubbornness: DS-R1 performs very well at reasoning, yet some tests indicate it is more "stubborn" than o1 and can sometimes fail to expand on a topic.
  • Limited capabilities: While DS-R1 excels in reasoning tasks, it lags behind DeepSeek-V3 in areas like function calling, complex role-playing, and JSON output. Future improvements will focus on leveraging Chain-of-Thought (CoT) methods for these tasks.
  • Language optimization: DS-R1 is optimized for Chinese and English, which can lead to language mixing when it handles queries in other languages.
  • Prompt limitations: DS-R1 struggles with few-shot prompting, and zero-shot settings are currently recommended for optimal performance. Future work will refine prompt engineering to improve usability and robustness.

DeepSeek-R1: Observations of the Technicals

Emphasis on Reinforcement Learning (RL) Instead of Supervised Fine-Tuning (SFT)

Probably the most surprising line: "We directly apply RL to the base model without relying on supervised fine-tuning (SFT) as a preliminary step." – DeepSeek-R1 paper, Page 4

DeepSeek R1 boldly diverges from the common LLM training pattern (pre-training + large-scale SFT) by relying almost entirely on RL for fine-tuning. This approach minimizes dependence on vast labeled datasets and allows the model to “learn by doing” in an autonomous manner. This paradigm shift makes the model break free from traditional “pre-set patterns,” driving remarkable gains in adaptability, complex reasoning, and self-guided learning.

Group Relative Policy Optimization Reduces Costs of RL

This caught our eye in the paper and may explain at least part of why DeepSeek-R1 was so cost-effective to train.

In layman's terms (keep in mind this is just a summary): the model learns by generating a group of answers to each prompt, then comparing them to judge how relatively "good" each answer is. By rewarding the model for producing answers that beat its own group average, the researchers achieve far cheaper RL training.
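A minimal sketch of that group-relative baseline, assuming a simple normalized-advantage form (the paper's full GRPO objective also includes a clipped policy ratio and a KL penalty, omitted here):

```python
import statistics

def group_relative_advantages(rewards):
    """Normalize each sampled answer's reward against its own group.

    GRPO's key trick: instead of training a separate value network as a
    baseline, sample a group of answers per prompt and use the group's
    mean and standard deviation as the baseline.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against zero spread
    return [(r - mean) / std for r in rewards]

# Four sampled answers to one prompt, scored by a rule-based reward:
advantages = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
print(advantages)  # ≈ [1.414, -1.414, 0.0, 0.0]
```

Answers above the group mean get positive advantages and are reinforced; answers below it are discouraged, all without a learned critic, which is where the cost savings come from.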

Emergent Reasoning Abilities (Self-Verification, Reflection, Long-Chain Reasoning)

Under the pure RL regime, DeepSeek R1 spontaneously developed advanced capabilities:

  • Self-Validation: It checks intermediate reasoning steps before finalizing answers—akin to a student double-checking their work.
  • Reflection: It revisits past inferences, identifies errors, and refines solutions based on those insights.
  • Long-Chain Reasoning: DeepSeek R1 seamlessly handles multi-step logical or mathematical challenges, indicating robust problem-solving depth that emerged naturally from the RL-driven training.

No, it's not self-aware (yet); the paper itself avoids the term. But the line is increasingly blurred when a model spontaneously self-evolves behavior that many would characterize as (for lack of a better term) conceptually "self-aware" critical thinking, able to reference its own previous thoughts to identify mistakes in an earlier approach. We're curious where this goes, but reinforcement learning has certainly produced an interesting result, one the researchers highlight as an "aha moment."

This raises a question: at what point is something self-aware? We'll pursue this topic sometime in the future.

The Role of “Cold Start” & Multi-Stage Training

Although DeepSeek R1 primarily relies on RL, the paper reveals a critical “cold start” phase, where a small amount of high-quality chain-of-thought (CoT) data is used to stabilize the initial training. This subtle detail counters the impression of pure RL from zero—there is still a minimal guided setup to ensure training doesn’t collapse early. Additionally, language-consistency rewards and multi-objective optimization (e.g., combining reasoning, writing, and role-playing tasks) are carefully orchestrated to produce a balanced, high-performing model. These measures highlight that while the “pure RL” narrative is central, a degree of careful engineering is essential for effective results.
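As a rough illustration of how such a multi-objective reward might be assembled (the ASCII check, the additive form, and the `lam` weight are our simplifying assumptions, not the paper's implementation):

```python
def language_consistency(cot: str) -> float:
    """Toy proxy for a language-consistency reward: the fraction of
    whitespace-separated words that are pure ASCII. A real implementation
    would use a proper language-identification model."""
    words = cot.split()
    if not words:
        return 0.0
    return sum(w.isascii() for w in words) / len(words)

def combined_reward(accuracy: float, cot: str, lam: float = 0.1) -> float:
    """Additive multi-objective reward: task accuracy plus a weighted
    language-consistency term. lam is an assumed weighting, not the
    paper's value."""
    return accuracy + lam * language_consistency(cot)

# A correct answer whose chain-of-thought mixes in a Chinese word
# scores slightly below the maximum of 1.1:
r = combined_reward(1.0, "First compute the area then 验证 the result")
print(r)
```

Penalizing mixed-language chains-of-thought like this nudges the model toward reasoning traces a human reviewer can actually follow, at a small measured cost to raw benchmark performance according to the paper.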

Looking Ahead

GMI Cloud is already hosting DeepSeek-V3 for general purpose use, with dedicated DeepSeek-R1 endpoints for customers. Public endpoints will be available in February 2025. If you're curious to test DeepSeek's capabilities for yourself, please don't hesitate to reach out to us here.

Get started today

Give GMI Cloud a try and see for yourself if it's a good fit for your AI needs.

  • On-demand GPUs: starting at $4.39/GPU-hour (14-day trial, no long-term commitment, no setup needed).
  • Private Cloud: as low as $2.50/GPU-hour.