Self-Host Gemma 7B on Shadeform at $0.57/hr

By Ronald Ding

Hosting large language models (LLMs) like Google's Gemma 7B can be resource-intensive and costly. With Shadeform, you can deploy the model on an A6000 GPU instance at $0.57/hr. Here's how to get started:

Create a HuggingFace Account and Access Gemma

First, you'll need to gain access to the Gemma model:

  1. Visit HuggingFace's Gemma 7B page to request access.
  2. If you're not already a HuggingFace member, sign up for a free account.
  3. Generate a new API token at the HuggingFace token settings page.

Alternatively, use the pre-configured API token provided for this guide: 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.
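Before launching anything, it can help to confirm that the token works. The sketch below is one possible check using only the Python standard library. It assumes the Hub's `whoami-v2` endpoint and uses hypothetical helper names (`build_whoami_request`, `token_is_valid`). A passing check only shows that the token is valid, not that access to Gemma has been granted.

```python
import urllib.error
import urllib.request

# Assumption: the Hub's "who am I" endpoint, used here only as a sanity check.
WHOAMI_URL = "https://huggingface.co/api/whoami-v2"


def build_whoami_request(token: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated request for the token's account info."""
    if not token:
        raise ValueError("token must be a non-empty string")
    return urllib.request.Request(
        WHOAMI_URL,
        headers={"Authorization": f"Bearer {token}"},
    )


def token_is_valid(token: str, timeout: float = 10.0) -> bool:
    """Return True if the Hub answers HTTP 200, False if it rejects the token."""
    try:
        with urllib.request.urlopen(build_whoami_request(token), timeout=timeout) as resp:
            return resp.status == 200
    except urllib.error.HTTPError:
        return False


# Example (performs a network call):
#   import os
#   print(token_is_valid(os.environ["HUGGING_FACE_HUB_TOKEN"]))
```

Reading the token from an environment variable, as in the commented example, keeps it out of scripts and shell history.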

Set Up Your Shadeform Account

Next, set up your hosting environment:

  1. Create your free Shadeform account at shadeform.ai.
  2. Add credits to your account so that you can launch an instance.
  3. Optionally, contact me on LinkedIn for a $10 credit.

Find and Launch the A6000 GPU Instance

Locate the most cost-effective A6000 GPU option and set up your model:

  1. Go to the A6000 GPU launch page on Shadeform.
  2. Under "Launch Configuration," select the "Container" tab.
  3. Enter the following machine and container details:
     • Machine Name: gemma-7b-llm
     • Container Image: ghcr.io/huggingface/text-generation-inference:1.4
     • Container Args: --model-id google/gemma-7b-it
     • Environment Variables:
       • Name: HUGGING_FACE_HUB_TOKEN
       • Value: 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
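For readers who want to see what these fields amount to, the sketch below assembles a roughly equivalent `docker run` command in Python. Shadeform does this step for you, so this is only an illustration. The `--gpus all` and `--shm-size 1g` flags and the port mapping to container port 80 are assumptions based on how the text-generation-inference image is commonly run, not settings taken from this guide. The helper name `docker_run_command` is hypothetical.

```python
import shlex

# Values taken from the launch configuration above.
IMAGE = "ghcr.io/huggingface/text-generation-inference:1.4"
MODEL_ID = "google/gemma-7b-it"
TOKEN_ENV = "HUGGING_FACE_HUB_TOKEN"


def docker_run_command(host_port: int = 80) -> list[str]:
    """Return an argument list approximating the container launch."""
    return [
        "docker", "run",
        "--gpus", "all",          # assumption: expose every GPU to the container
        "--shm-size", "1g",       # assumption: shared-memory size often used with this image
        "-p", f"{host_port}:80",  # assumption: the server listens on port 80 in the container
        "-e", TOKEN_ENV,          # pass the token through from the host environment
        IMAGE,
        "--model-id", MODEL_ID,   # the "Container Args" field
    ]


# Example: print a copy-pasteable shell command.
#   print(shlex.join(docker_run_command()))
```

Passing `-e HUGGING_FACE_HUB_TOKEN` without a value forwards the variable from the host environment, so the token never appears in the command itself.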

Once configured, launch your instance and wait for the model to be ready.

Wait Until the Model Is Ready

It takes a few minutes for the GPU instance to spin up and a few more minutes to download everything needed to run the model. On the Shadeform Running Instances page, click the instance you launched and open the Logs tab to watch the live logs.

Once you see the log line "Invalid hostname, defaulting to 0.0.0.0", your API is ready to go!
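Instead of watching the logs, readiness can also be polled from a script. The following is a minimal sketch, assuming the container exposes text-generation-inference's `/health` route on the same address used for `/generate`. The function names, the 15-minute default wait, and the 10-second interval are illustrative choices, not values from Shadeform.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def probe_health(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if GET <base_url>/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error: not ready yet.
        return False


def wait_until_ready(
    base_url: str,
    max_wait: float = 900.0,
    interval: float = 10.0,
    probe: Callable[[str], bool] = probe_health,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the probe succeeds; give up once max_wait seconds would be exceeded."""
    deadline = time.monotonic() + max_wait
    while True:
        if probe(base_url):
            return True
        if time.monotonic() + interval > deadline:
            return False
        sleep(interval)


# Example (placeholder address, performs network calls):
#   wait_until_ready("http://203.0.113.5")
```

The `probe` and `sleep` parameters exist so that the loop can be exercised without a live instance.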

Curl Your Gemma 7B Instance

Once the API is live, you can send prompts with curl. Replace <Your-Instance-IP> with the IP address of your instance:

curl -H 'Content-Type: application/json' \
     -d '{ "inputs":"What are the top national parks?", "parameters":{ "max_new_tokens":100 } }' \
     -X POST \
     http://<Your-Instance-IP>/generate

Example response:

{
  "generated_text": "\n\n1. Grand Canyon National Park, Arizona\n2. Yellowstone National Park, Wyoming\n3. Zion National Park, Utah\n4. Bryce Canyon National Park, Utah\n5. Rocky Mountain National Park, Colorado\n\nThese five national parks are consistently ranked among the top in the country for their stunning natural beauty, diverse ecosystems, and recreational opportunities."
}

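
The same request can be sent from Python. This is a minimal sketch using only the standard library. It mirrors the curl call above (same `/generate` path, same JSON body) and assumes the response carries a `generated_text` field as in the example output. The function names are hypothetical.

```python
import json
import urllib.request


def build_generate_request(
    base_url: str, prompt: str, max_new_tokens: int = 100
) -> urllib.request.Request:
    """Build the POST request that the curl command sends."""
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    return urllib.request.Request(
        f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(
    base_url: str, prompt: str, max_new_tokens: int = 100, timeout: float = 60.0
) -> str:
    """Send the prompt and return the generated text from the JSON response."""
    request = build_generate_request(base_url, prompt, max_new_tokens)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read())["generated_text"]


# Example (replace the placeholder with your instance's address):
#   print(generate("http://<Your-Instance-IP>", "What are the top national parks?"))
```

Splitting request construction from sending keeps the payload easy to inspect before any network call is made.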
© Shadeform, Inc