Quickstart Guide
Get from zero to running AI inference on Apple Silicon in under 5 minutes.
Prerequisites
Before you begin, make sure you have:
- Python 3.9 or higher
- A MetalCloud account (join the waitlist if you don't have one)
- Your API key (available in your dashboard)
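If you're not sure which Python version you have, a quick standard-library check (nothing here is MetalCloud-specific) confirms you meet the 3.9 minimum:

```python
import sys

def meets_minimum(minimum=(3, 9)):
    # version_info compares tuple-wise: (3, 11) >= (3, 9) is True
    return sys.version_info[:2] >= minimum

if not meets_minimum():
    raise SystemExit(
        f"Python {sys.version_info.major}.{sys.version_info.minor} found; 3.9+ required."
    )
print("Python version OK")
```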
Step 1: Install the SDK
Install via pip
The MetalCloud Python SDK is available on PyPI:
Terminal
pip install metalcloud
This installs the metalcloud package and all required dependencies.
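To confirm the install succeeded, you can check that Python can locate the package. This is a generic standard-library sketch; `metalcloud` is simply the package name installed above:

```python
import importlib.util

def is_installed(package_name):
    # find_spec returns None when the package cannot be located
    return importlib.util.find_spec(package_name) is not None

print("metalcloud installed:", is_installed("metalcloud"))
```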
Step 2: Authenticate
Set your API key
You can authenticate in two ways:
Option A: Environment Variable (Recommended)
Terminal
export METALCLOUD_API_KEY="your-api-key-here"
Option B: Direct in Code
Python
import metalcloud

client = metalcloud.Client(api_key="your-api-key-here")
🔒 Security Tip
Never commit your API key to version control. Use environment variables or a secrets manager in production.
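One way to follow this advice is to read the key from the environment yourself and fail fast when it is missing. This is a standard-library sketch; only the `METALCLOUD_API_KEY` variable name comes from the steps above:

```python
import os

def get_api_key(var="METALCLOUD_API_KEY"):
    # Fail fast with a clear message instead of a cryptic auth error later
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"{var} is not set; export it before running.")
    return key
```

You could pass the result to metalcloud.Client(api_key=...), though the client already reads this environment variable by default.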
Step 3: Run Your First Job
Submit an inference request
Here's a complete example that runs Llama 3.3 70B:
Python
import metalcloud

# Initialize the client (uses METALCLOUD_API_KEY env var)
client = metalcloud.Client()

# Run inference
response = client.inference(
    model="meta-llama/Llama-3.3-70B",
    prompt="Explain the benefits of unified memory in Apple Silicon chips.",
    max_tokens=500,
    temperature=0.7
)

# Print the response
print(response.text)

# View usage stats
print(f"Tokens used: {response.usage.total_tokens}")
print(f"Cost: £{response.usage.cost:.4f}")
Streaming Responses
For real-time output, use streaming:
Python
for chunk in client.inference(
    model="meta-llama/Llama-3.3-70B",
    prompt="Write a haiku about cloud computing.",
    stream=True
):
    print(chunk.text, end="", flush=True)
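If you also need the full text once streaming finishes, collect chunks as they arrive. The pattern is shown with a stand-in generator in place of the real client call; the chunk objects and their .text attribute mirror the example above, and everything else is illustrative:

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str

def fake_stream():
    # Stand-in for client.inference(..., stream=True)
    for piece in ["Streamed ", "output, ", "joined."]:
        yield Chunk(piece)

pieces = []
for chunk in fake_stream():
    print(chunk.text, end="", flush=True)  # real-time display
    pieces.append(chunk.text)              # keep for later use

full_text = "".join(pieces)
```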
Next Steps
- Python SDK Reference → Full API documentation
- Model Library → Available models and configurations
- MLX Guide → Using Apple's MLX framework
- LLM Inference Guide → Best practices for large models
💬 Need Help?
Join our Discord community or email support@metalcloud.space.