Everything you need to deploy AI workloads on Apple Silicon
Frequently accessed documentation

Get up and running in under 5 minutes. Install the SDK, authenticate, and run your first job. Get started →
Complete reference for the MetalCloud Python SDK. Job submission, streaming, and more. View reference →
Direct API access for any language. OpenAPI spec, authentication, and endpoints. Explore API →
Pre-configured models ready to deploy. Llama, Mistral, DeepSeek, and more. Browse models →
Using Apple's MLX framework on MetalCloud. Optimizations, examples, and best practices. Learn MLX →
Understanding your bill, usage tracking, and cost optimization strategies. View billing →
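Since usage-based billing scales with tokens processed, a back-of-envelope cost estimate is easy to sketch. The rates below are placeholder values for illustration only, not MetalCloud's actual pricing; see the billing docs for real rates:

```python
def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      input_rate: float = 0.60, output_rate: float = 2.40) -> float:
    """Estimate a job's cost in USD.

    Rates are per million tokens and are placeholder values,
    not actual MetalCloud pricing -- check the billing docs.
    """
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# A job with 10,000 input tokens and 500 output tokens at the placeholder rates:
print(f"${estimate_cost_usd(10_000, 500):.4f}")  # -> $0.0072
```

Because output tokens are typically priced higher than input tokens, capping `max_tokens` on inference calls is usually the quickest cost lever.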
Run inference in just a few lines of code
import metalcloud

# Initialize client
client = metalcloud.Client()

# Run inference on Llama 70B
response = client.inference(
    model="meta-llama/Llama-3.3-70B",
    prompt="Explain unified memory in Apple Silicon",
    max_tokens=500
)

print(response.text)
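Calls to any hosted inference API can fail transiently (timeouts, dropped connections), so it is worth wrapping them in a retry. The helper below is a generic sketch, not part of the MetalCloud SDK: it is SDK-agnostic and simply retries whatever callable you pass in, with exponential backoff between attempts:

```python
import time

def with_retries(fn, attempts=3, base_delay=1.0,
                 retry_on=(ConnectionError, TimeoutError)):
    """Call fn(), retrying on transient errors with exponential backoff.

    Illustrative helper, not part of the MetalCloud SDK.
    """
    for i in range(attempts):
        try:
            return fn()
        except retry_on:
            if i == attempts - 1:
                raise  # out of attempts: surface the error
            time.sleep(base_delay * 2 ** i)

# Usage with the SDK would look something like:
# response = with_retries(lambda: client.inference(model=..., prompt=...))
```

Production deployments often layer jitter and a retry budget on top of this pattern, but the plain exponential backoff above covers most transient failures.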