Apple's MLX framework meets enterprise-scale compute. Train and deploy ML models with native Metal acceleration on 512GB unified memory machines. The fastest path from research to production on Apple Silicon.
Why MLX on MetalCloud
MLX is designed for Apple Silicon. MetalCloud gives you the compute to match your ambitions.
MLX runs natively on Metal with its own optimized GPU kernels. No CUDA translation layers, no compatibility modes. Pure Apple Silicon performance.
Train and run models that don't fit in traditional GPU VRAM. Unified memory means no CPU-GPU transfer overhead.
MLX's lazy evaluation optimizes computation graphs automatically. Dynamic shapes, efficient memory usage, fast iteration.
Familiar array operations. If you know NumPy, you know MLX. Minimal code changes from existing workflows.
Load models directly from HuggingFace Hub. MLX-LM provides optimized implementations of popular architectures.
First-class Swift bindings for iOS/macOS integration. Python for research. Deploy anywhere in the Apple ecosystem.
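For a taste of the NumPy-style API and lazy evaluation, here is a minimal sketch (assuming mlx is installed; it runs the same on a laptop as on MetalCloud):

import mlx.core as mx

# Familiar NumPy-style array operations
a = mx.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
b = mx.ones((2, 3))              # float32 array of ones

# Operations are recorded lazily as a computation graph...
c = (a + b).sum(axis=1)

# ...and only materialized when evaluated
mx.eval(c)
print(c.tolist())  # [6.0, 15.0]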
Quick Start
From zero to training in a few lines of code.
import mlx.core as mx
from mlx_lm import load, generate
import metalcloud

# Connect to a MetalCloud machine (request at least 256GB unified memory)
mc = metalcloud.connect(min_memory_gb=256)

# Load a large model - no quantization needed with 512GB
model, tokenizer = load("mlx-community/Llama-3.3-70B-Instruct")

# Fine-tune with LoRA on your data
from mlx_lm.tuner import train

train(
    model=model,
    tokenizer=tokenizer,
    train_data="./training_data.jsonl",
    adapter_path="./adapters",
    lora_rank=16,
    num_epochs=3,
    batch_size=4,  # Large batches possible with 512GB
)

# Generate with your fine-tuned model
response = generate(
    model,
    tokenizer,
    prompt="Explain quantum computing:",
    max_tokens=500,
)
print(response)
Use Cases
Iterate fast on large models. Test architectures, hyperparameters, and training strategies without waiting for GPU availability or managing multi-GPU complexity.
Train models that deploy directly to Apple devices. CoreML export, on-device fine-tuning, native Swift integration. Ship ML features faster.
Deploy MLX models as API endpoints. Consistent latency, simple scaling, pay-per-use pricing. No infrastructure to manage.
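As a sketch of what a minimal endpoint wrapper could look like, using only the Python standard library, with the model call stubbed out (in practice the stub body would call mlx_lm's generate with a loaded model; the handler and function names here are illustrative, not part of any MetalCloud SDK):

import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def generate_stub(prompt: str, max_tokens: int = 500) -> str:
    # Placeholder: swap in mlx_lm.generate(model, tokenizer,
    # prompt=prompt, max_tokens=max_tokens) for real inference
    return f"echo: {prompt}"

class GenerateHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Parse the JSON request body
        body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        text = generate_stub(body["prompt"], body.get("max_tokens", 500))

        # Return the generated text as JSON
        payload = json.dumps({"response": text}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

# Serve with: HTTPServer(("0.0.0.0", 8080), GenerateHandler).serve_forever()

A production deployment would add request batching, streaming, and authentication; this only shows the request/response shape.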
Quantize, prune, and distill models for edge deployment. Experiment at full precision, deploy optimized. Full control over the optimization pipeline.
Access 512GB unified memory machines optimized for Apple's MLX framework. From £0.40/hour.
Get Early Access