New top story on Hacker News: Show HN: RunAnywhere – Faster AI Inference on Apple Silicon
Show HN: RunAnywhere – Faster AI Inference on Apple Silicon, by sanchitmonga22 on Hacker News.

Hi HN, we're Sanchit and Shubham (YC W26). We built a fast inference engine for Apple Silicon. LLMs, speech-to-text, text-to-speech – MetalRT beats llama.cpp, Apple's MLX, Ollama, and sherpa-onnx on every modality we tested. Custom Metal shaders, no framework overhead.

We've also open-sourced RCLI, the fastest end-to-end voice AI pipeline on Apple Silicon: mic to spoken response, entirely on-device. No cloud, no API keys.

To get started:

  brew tap RunanywhereAI/rcli https://ift.tt/uiIg7hw
  brew install rcli
  rcli setup   # downloads ~1 GB of models
  rcli         # interactive mode with push-to-talk

Or:

  curl -fsSL https://ift.tt/FuToq0e | bash

The numbers (M4 Max, 64 GB, reproducible via `rcli bench`):

LLM decode – 1.67x faster than llama.cpp, 1.19x faster than Apple MLX (same model files):
- Qwen3-0.6B: 658 tok/s (vs mlx-lm 552, llama.cpp 295)
- Qwen3-4B: 186 tok/s (vs mlx-lm 1...
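As a sanity check, the per-model speedups implied by the quoted tok/s figures can be recomputed directly (a minimal sketch; figures are the Qwen3-0.6B decode numbers from the post, and the 1.67x headline presumably aggregates across models rather than coming from this single model):

```python
# Recompute relative decode speedups from the quoted tok/s figures (Qwen3-0.6B).
figures = {"RunAnywhere": 658, "mlx-lm": 552, "llama.cpp": 295}

for engine, tok_s in figures.items():
    if engine != "RunAnywhere":
        # Speedup = RunAnywhere throughput divided by the baseline's throughput.
        speedup = figures["RunAnywhere"] / tok_s
        print(f"vs {engine}: {speedup:.2f}x")  # 1.19x vs mlx-lm, 2.23x vs llama.cpp
```

The 658/552 ratio reproduces the stated 1.19x gap over MLX for this model.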