How to Run Inference Using Llama 3 (70B and 8B) on GMI Cloud

May 20, 2024

GMI Cloud provides a robust platform that simplifies training, fine-tuning, and inferencing, allowing users to deploy AI strategies in just a few clicks. In addition to providing instant access to top-tier GPUs from NVIDIA, our service stack includes compatibility with some of the premier open-source LLMs such as Llama 3. This blog post will guide you through the process of inferencing using Llama 3 on GMI Cloud, highlighting the platform’s unique advantages and key features of Llama 3.

Inferencing with Llama 3 on GMI Cloud

Step-by-Step Guide to start using Llama 3 in just a few clicks:

1. Log in to the GMI Cloud platform

  • Create an account, or log in with an existing one

2. Launch a container

  • Navigate to the ‘Containers’ page using the navigation bar on the left side of the page
  • Click the ‘Launch a Container’ button located in the upper right-hand corner

3. Choose your model template and parameters

  • In the first dropdown menu, select Llama 3 as your template. GMI Cloud offers access to both the Llama 3 70B and 8B models
  • Under the ‘Select Hardware Resources’ section, select the type of hardware you’d like to deploy such as the NVIDIA H100. This will auto-populate certain parameters
  • Enter details for storage, authentication, and container name

4. Connect to Jupyter Notebook

  • Return to the ‘Containers’ page; the container you just created will appear under the name you provided
  • Click the Jupyter Notebook icon to connect to your container

5. Start testing and inferencing

  • Within the Jupyter Notebook workspace, you can start testing and inferencing using Llama 3
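Once connected, a first inference cell might look like the sketch below. It assumes the container's Python environment includes `transformers` and `torch`, and that your Hugging Face account has access to the gated `meta-llama/Meta-Llama-3-8B-Instruct` weights; the model ID and helper names are illustrative, not part of the GMI Cloud image.

```python
# Minimal sketch of a first inference cell for a Llama 3 container.
# Assumptions: `transformers` and `torch` are installed, and the gated
# meta-llama/Meta-Llama-3-8B-Instruct weights are accessible to you.

def build_chat(system_prompt: str, user_prompt: str) -> list[dict]:
    """Assemble messages in the chat format the transformers pipeline expects."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]

def run_llama3(messages, model_id="meta-llama/Meta-Llama-3-8B-Instruct"):
    """Load the instruct model onto the available GPU(s) and generate a reply."""
    import torch
    from transformers import pipeline

    pipe = pipeline(
        "text-generation",
        model=model_id,
        model_kwargs={"torch_dtype": torch.bfloat16},
        device_map="auto",  # places the model on the container's GPU(s)
    )
    out = pipe(messages, max_new_tokens=128, do_sample=False)
    # The pipeline appends the assistant turn to the message list.
    return out[0]["generated_text"][-1]["content"]

if __name__ == "__main__":
    msgs = build_chat(
        "You are a concise assistant.",
        "Summarize grouped query attention in one sentence.",
    )
    print(run_llama3(msgs))
```

The heavy model load is kept inside `run_llama3`, so you can iterate on prompts cheaply and reuse a single loaded pipeline in later cells if you prefer.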

Key features of Llama 3:

Llama 3 represents the next generation of Meta’s open-source large language models, designed to push the boundaries of AI capabilities. Here are some key features and specifications that make Llama 3 a standout choice for developers and researchers:

Model Variants:

  • Model Sizes: Llama 3 includes models with 8 billion (8B) and 70 billion (70B) parameters, tailored for a wide range of use cases.
  • Performance: These models demonstrate state-of-the-art performance on industry benchmarks and offer improved reasoning capabilities.

Design and Architecture:

  • Tokenizer: Utilizes a tokenizer with a vocabulary of 128K tokens, leading to more efficient language encoding.
  • Inference Efficiency: Features Grouped Query Attention (GQA) to enhance inference efficiency, particularly in the 8B and 70B models.
  • Sequence Length: Trained on sequences of up to 8,192 tokens, ensuring robust handling of longer contexts.
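To make the GQA point concrete, here is a toy NumPy sketch (the head counts and dimensions are illustrative, not Llama 3's real configuration) showing how several query heads attend against a single shared key/value head, which is what shrinks the KV cache at inference time:

```python
# Toy illustration of grouped query attention (GQA): multiple query heads
# share one key/value head, so the KV cache stores far fewer heads.
# Shapes and head counts below are toy values, not Llama 3's actual config.
import numpy as np

def gqa(q, k, v, n_q_heads, n_kv_heads):
    """q: (n_q_heads, seq, d); k, v: (n_kv_heads, seq, d)."""
    group = n_q_heads // n_kv_heads  # query heads per shared KV head
    outs = []
    for h in range(n_q_heads):
        kv = h // group  # which KV head this query head reads from
        scores = q[h] @ k[kv].T / np.sqrt(q.shape[-1])
        # Numerically stable softmax over the key axis.
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        outs.append(weights @ v[kv])
    return np.stack(outs)

rng = np.random.default_rng(0)
q = rng.normal(size=(8, 4, 16))  # 8 query heads
k = rng.normal(size=(2, 4, 16))  # only 2 KV heads -> 4x smaller KV cache
v = rng.normal(size=(2, 4, 16))
out = gqa(q, k, v, 8, 2)
print(out.shape)  # (8, 4, 16)
```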

Training Data:

  • Scale: Pretrained on over 15 trillion tokens, which is seven times larger than the dataset used for Llama 2.
  • Diversity: Includes a significant portion of high-quality non-English data, covering over 30 languages.
  • Quality: Utilizes advanced data-filtering pipelines to ensure the highest quality training data, including heuristic filters, NSFW filters, semantic deduplication, and text classifiers.

Pretraining and Fine-Tuning:

  • Pretraining: Involves extensive scaling up with detailed scaling laws to optimize data mix and training compute, achieving over 95% effective training time.
  • Fine-Tuning: Incorporates supervised fine-tuning, rejection sampling, proximal policy optimization (PPO), and direct preference optimization (DPO) to enhance performance on reasoning and coding tasks.

Trust and Safety:

  • Safety Tools: Introduced new tools like Llama Guard 2, Code Shield, and CyberSec Eval 2 to ensure responsible use and deployment.
  • Red Teaming: Engaged in extensive safety testing through internal and external red teaming efforts to mitigate risks associated with misuse.

Why GMI Cloud

Accessibility:

GMI Cloud ensures broad access to the latest NVIDIA GPUs, including the H100 and H200 models. Leveraging our Asia-based data centers and deep relationships with NVIDIA as a Certified Partner, we provide unparalleled GPU access to meet your AI and machine learning needs.

Ease of Use:

Our platform simplifies AI deployment through a rich software stack designed for orchestration, virtualization, and containerization. GMI Cloud solutions are compatible with NVIDIA tools like TensorRT and come with pre-built images, making it easy to get started and manage your AI workflows efficiently.

Performance:

GMI Cloud delivers high-performance computing essential for training, inferencing, and fine-tuning AI models. Our infrastructure is optimized to ensure cost-effective and efficient operations, allowing you to maximize the potential of models like Llama 3.

Governance:

We offer robust multi-tenancy security and control mechanisms to ensure the highest levels of data security and compliance. Our platform is designed to protect your data and maintain strict governance standards, giving you peace of mind as you scale your AI solutions.

GMI Cloud provides a comprehensive and powerful environment for all your AI needs, making it the ideal choice for deploying advanced models like Llama 3. With our integrated solutions, you can streamline your AI processes, improve performance, and ensure the security and compliance of your operations.

Get started today

Give GMI Cloud a try and see for yourself if it's a good fit for your AI needs.

On-demand GPUs

14-day trial, no long-term commitments, no setup needed. Starting at $4.39/GPU-hour.

Private Cloud

As low as $2.50/GPU-hour.