Everything you need to deploy AI workloads on Apple Silicon
Frequently accessed documentation

Get up and running in under 5 minutes. Install the SDK, authenticate, and run your first job. Get started →
Complete reference for the MetalCloud Python SDK. Job submission, streaming, and more. View reference →
Direct API access for any language. OpenAPI spec, authentication, and endpoints. Explore API →
Pre-configured models ready to deploy. Llama, Mistral, DeepSeek, and more. Browse models →
Using Apple's MLX framework on MetalCloud. Optimizations, examples, and best practices. Learn MLX →
Understanding your bill, usage tracking, and cost optimization strategies. View billing →
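Since usage-based billing scales with tokens processed, a back-of-envelope cost estimate is easy to sketch. The rates below are placeholder values for illustration only, not MetalCloud's actual pricing; see the billing docs for real rates:

```python
def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      input_rate: float = 0.60, output_rate: float = 2.40) -> float:
    """Estimate a job's cost in USD.

    Rates are per million tokens and are placeholder values,
    not actual MetalCloud pricing -- check the billing docs.
    """
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# A job with 10,000 input tokens and 500 output tokens at the placeholder rates:
print(f"${estimate_cost_usd(10_000, 500):.4f}")  # -> $0.0072
```

Because output tokens are typically priced higher than input tokens, capping `max_tokens` on inference calls is usually the quickest cost lever.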
Run inference in just a few lines of code
import metalcloud

# Initialize client
client = metalcloud.Client()

# Run inference on Llama 70B
response = client.inference(
    model="meta-llama/Llama-3.3-70B",
    prompt="Explain unified memory in Apple Silicon",
    max_tokens=500
)

print(response.text)
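Calls to any hosted inference API can fail transiently (timeouts, dropped connections), so it is worth wrapping them in a retry. The helper below is a generic sketch, not part of the MetalCloud SDK: it is SDK-agnostic and simply retries whatever callable you pass in, with exponential backoff between attempts:

```python
import time

def with_retries(fn, attempts=3, base_delay=1.0,
                 retry_on=(ConnectionError, TimeoutError)):
    """Call fn(), retrying on transient errors with exponential backoff.

    Illustrative helper, not part of the MetalCloud SDK.
    """
    for i in range(attempts):
        try:
            return fn()
        except retry_on:
            if i == attempts - 1:
                raise  # out of attempts: surface the error
            time.sleep(base_delay * 2 ** i)

# Usage with the SDK would look something like:
# response = with_retries(lambda: client.inference(model=..., prompt=...))
```

Production deployments often layer jitter and a retry budget on top of this pattern, but the plain exponential backoff above covers most transient failures.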