Gemma 4 31B Running Fastest on SambaCloud

By SambaNova

June 10, 2026

Gemma 4 31B is Google DeepMind's most capable dense open model to date, and it runs fastest on SambaCloud. Try it today for reasoning, coding, and agentic workflows.

Gemma 4 31B is an open-weight (Apache 2.0) frontier dense model built for advanced reasoning, coding, and agentic workflows.
Top benchmarks: 85.2% MMLU Pro, 89.2% AIME 2026 (no tools), 80.0% LiveCodeBench v6, 84.3% GPQA Diamond, Codeforces ELO 2150.
Gemma 4 31B is an open multimodal model that accepts text and image input and produces text output.
Available as a preview model in the SambaCloud playground or through the API under the model name gemma-4-31B-it.

It's the largest dense model in Google's Gemma 4 family, built from the same research foundation as Gemini 3.

Read the official Gemma 4 announcement here.

SambaCloud runs Gemma 4 31B more than 30% faster than the next-fastest provider, with a wider lead over the rest. That makes it the fastest place to run Gemma 4 31B, as verified by Artificial Analysis.

Why Use Gemma 4 31B?

Gemma 4 31B brings frontier-class reasoning to an open-weight model that's small enough to fine-tune and deploy on accessible hardware, while SambaCloud delivers it at the lowest latency available. Key strengths include:

Advanced Reasoning

Built as a highly capable reasoner with a configurable thinking mode, Gemma 4 31B scores 89.2% on AIME 2026 (no tools) and 84.3% on GPQA Diamond, with strong multi-step planning and instruction-following. Toggle thinking on or off depending on whether your workload needs deep deliberation or fast turnaround.

State-of-the-Art Coding

Gemma 4 31B delivers production-grade coding performance, with 80.0% on LiveCodeBench v6 and a Codeforces ELO of 2150. That is enough to turn a single workstation into a frontier-class local-first code assistant, or a SambaCloud endpoint into a hosted one.

Native Agentic Capabilities

Native function-calling, structured JSON output, and native system-prompt support let you build autonomous agents that reliably interact with tools and APIs. The model pairs naturally with multi-agent frameworks such as OpenClaw and CrewAI.
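
As a rough sketch of the function-calling loop described above, the snippet below defines one tool in the OpenAI-style `tools` schema and dispatches a tool call to a local Python function. It assumes SambaCloud accepts that schema through `client.chat.completions.create(..., tools=TOOLS)`, which this post does not confirm. `get_weather`, `TOOLS`, `REGISTRY`, and `dispatch_tool_call` are hypothetical names, and the tool call is hard-coded rather than taken from a live response.

```python
import json


# Hypothetical tool: a stand-in for a real API lookup.
def get_weather(city: str) -> dict:
    return {"city": city, "forecast": "sunny"}


# Tool description in the OpenAI-style schema (assumed, not confirmed, for SambaCloud).
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Return a short forecast for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

# Map tool names to the local functions that implement them.
REGISTRY = {"get_weather": get_weather}


def dispatch_tool_call(name: str, arguments: str) -> str:
    """Run one tool call and return its result as a JSON string."""
    if name not in REGISTRY:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        kwargs = json.loads(arguments)
        return json.dumps(REGISTRY[name](**kwargs))
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON or arguments that do not match the function signature.
        return json.dumps({"error": str(exc)})


# Hard-coded example of what a model-issued tool call might look like.
print(dispatch_tool_call("get_weather", '{"city": "Paris"}'))
```

In a live agent loop, the returned string would typically be sent back to the model as a tool-result message so it can compose its final answer. Returning errors as JSON instead of raising gives the model a chance to correct a bad call.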

Get Started Quickly with SambaCloud

With just a few lines of Python, you can pass Gemma 4 31B an image and have it reason over what it sees — extracting structured data from a chart, document, or screenshot and returning it as clean JSON. This is the kind of vision-plus-reasoning task that makes the model a strong foundation for document-processing and agentic applications.

import base64
import mimetypes
import os
import urllib.request

from sambanova import SambaNova

# Initialize client using environment variable for security
client = SambaNova(
    api_key=os.environ["SAMBANOVA_API_KEY"],
    base_url="https://api.sambanova.ai/v1",
)

# Fetch a publicly accessible image of a handwritten manuscript and inline it as a data URI.
# (Wikimedia rejects the default urllib User-Agent, so we set a browser-like one.)
source_url = "https://upload.wikimedia.org/wikipedia/commons/c/c3/GeneralRelativityTheoryManuscript.jpg"
req = urllib.request.Request(source_url, headers={"User-Agent": "Mozilla/5.0"})
with urllib.request.urlopen(req) as resp:
    image_bytes = resp.read()

media_type = mimetypes.guess_type(source_url)[0] or "image/jpeg"
image_url = f"data:{media_type};base64," + base64.b64encode(image_bytes).decode("ascii")

system_prompt = "You are a careful transcription assistant built on Gemma 4 31B."

user_prompt = (
    "Transcribe the handwritten text in this image exactly as written, in its "
    "original language, preserving line breaks. Then translate it into English. "
    "Return a JSON object with four keys: 'transcription' (the full original-language "
    "text), 'translation' (the English translation), 'legibility' (high, medium, or "
    "low), and 'notes' (any words you were unsure about, or an empty list). "
    "Respond with ONLY the JSON, no prose."
)

response = client.chat.completions.create(
    model="gemma-4-31B-it",
    messages=[
        {"role": "system", "content": system_prompt},
        {
            # Per Gemma 4's guidance, place the image BEFORE the text for best results
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_url}},
                {"type": "text", "text": user_prompt},
            ],
        },
    ],
    temperature=1.0,
    top_p=0.95,
    top_k=64,
)

print(response.choices[0].message.content)
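
The script above prints the model's raw reply. Since the prompt asks for JSON only, a small helper can turn that reply into a Python dict. This is a minimal sketch: `parse_json_reply` is a hypothetical helper, not part of the SambaNova SDK, and it assumes the reply is either bare JSON or JSON wrapped in a Markdown code fence, which models sometimes add despite instructions.

```python
import json
import re


def parse_json_reply(text: str) -> dict:
    """Parse a model reply that should contain a single JSON object."""
    cleaned = text.strip()
    # Drop a leading fence (with or without a "json" tag) and a trailing fence, if present.
    cleaned = re.sub(r"^```(?:json)?\s*", "", cleaned)
    cleaned = re.sub(r"\s*```$", "", cleaned)
    data = json.loads(cleaned)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return data


# Made-up sample standing in for response.choices[0].message.content.
sample = '```json\n{"transcription": "example", "translation": "example", "legibility": "medium", "notes": []}\n```'
parsed = parse_json_reply(sample)
print(parsed["legibility"])
```

With the script above, the call would be `parse_json_reply(response.choices[0].message.content)`. Wrapping that call in a `try`/`except` for `ValueError` covers both malformed JSON and a non-object reply.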

Check out SambaCloud and explore Gemma 4 31B in the playground, or generate an API key and integrate it into your agentic application today.

Gemma 4 31B brings frontier open intelligence to production. SambaNova delivers it the fastest.

FAQs

What is Gemma 4 31B?

Gemma 4 31B is Google DeepMind's most capable dense open model to date, released under an Apache 2.0 license. Built for advanced reasoning, coding, and agentic workflows, it's the largest dense model in the Gemma 4 family and shares the same research foundation as Gemini 3.

How does Gemma 4 31B perform on benchmarks?

Gemma 4 31B scores 85.2% on MMLU Pro, 89.2% on AIME 2026 (no tools), 80.0% on LiveCodeBench v6, and 84.3% on GPQA Diamond, and reaches a Codeforces ELO of 2150, reflecting frontier-class performance across reasoning, mathematics, and coding tasks.

Does Gemma 4 31B have a thinking mode?

Yes. Gemma 4 31B includes a configurable thinking mode. Toggle it on when your workload requires deep deliberation and multi-step reasoning, or off for faster turnaround. This makes the model adaptable to both complex reasoning tasks and latency-sensitive applications.

What agentic capabilities does Gemma 4 31B offer?

Gemma 4 31B offers native function-calling, structured JSON output, and native system-prompt support, making it a strong foundation for autonomous agents. It pairs naturally with multi-agent frameworks such as OpenClaw and CrewAI for building agentic workflows on SambaNova.

How fast is Gemma 4 31B on SambaCloud?

SambaCloud runs Gemma 4 31B more than 30% faster than the next-fastest provider, making it the fastest place to run the model, as verified by Artificial Analysis. This speed advantage is particularly valuable for agentic and reasoning workloads that require low latency.

How do I access Gemma 4 31B on SambaCloud?

Gemma 4 31B is available as a preview model on SambaCloud via the playground or API, using the model name gemma-4-31B-it. Generate an API key to start integrating it into your application today.
