What is local AI?
Local AI is artificial intelligence that runs entirely on your own computer. The model lives on your disk, inference happens on your CPU or GPU, and no conversation leaves your machine unless you explicitly choose to send it somewhere else.
Cloud AI (ChatGPT, Claude.ai, Gemini) sends each prompt you type to a company's servers. A language model running in their datacenter generates a reply, which travels back over the internet. Local AI removes that round trip. The model file lives on your machine, and when you ask it a question your CPU or GPU does the thinking. "Runs on your hardware" means exactly that: no cloud tenant, no API call, no provider log of what you asked.
How does local AI work?
You install a local runtime (Ollama or LM Studio), download an open-source model file once, and from then on every prompt runs inference locally. The model never phones home and no request leaves your device.
The steps in practice: the runtime downloads the model weights (typically a few gigabytes) from a public registry, caches them on disk, and exposes a local HTTP endpoint on your own machine. Applications like InnerZero call that endpoint the same way a web app would call a cloud API, except the call never touches the internet. Inference uses your GPU first (fast) and falls back to CPU (slower but workable) if the model is larger than your VRAM. InnerZero is the front end: it handles the chat window, memory, voice, knowledge packs, and tools. The heavy lifting happens in Ollama or LM Studio, which we treat as swappable engines.
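The local-endpoint call described above can be sketched against Ollama's HTTP API (default port 11434, `/api/generate`). The model name `qwen3:8b` is just an example tag; note that nothing in this request leaves the machine:

```python
import json
import urllib.request

# Ollama's default local endpoint; the request never touches the internet.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Package a prompt as a POST to the local runtime's HTTP API."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        OLLAMA_URL,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_request("qwen3:8b", "Explain local AI in one sentence.")
print(req.full_url)
# With Ollama running, urllib.request.urlopen(req) returns the reply as JSON.
```

A front end like InnerZero does essentially this on every prompt, which is why the runtimes are swappable: anything that speaks the same local HTTP shape can serve as the engine.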
What is the difference between local AI and cloud AI?
Local AI wins on privacy, cost, and offline use. Cloud AI wins on raw capability for frontier reasoning tasks. Hybrid means local by default, with cloud available when a task needs it; this is the mode InnerZero supports via optional API keys and managed cloud.
The honest trade-off looks like this:
| | Local AI | Cloud AI | Hybrid |
|---|---|---|---|
| Privacy | Full. Nothing leaves your device. | Depends on provider terms. Prompts reach provider servers. | Local by default. Cloud only for messages you explicitly send. |
| Cost | Free forever. Electricity is the only marginal cost. | Subscription or pay-per-token. Scales with usage. | Free for local tasks. You pay the provider only for cloud calls. |
| Performance | Depends on your hardware. A mid-range PC runs an 8B model smoothly. | Datacenter GPUs. Consistent, fast, no local resources used. | Local for daily use, cloud for heavy reasoning when you want it. |
| Capability | Strong on most tasks. Open-source models up to gpt-oss 120B. | Frontier models (Claude, GPT-5, Gemini) still lead on hardest tasks. | Broad daily competence locally, frontier reach from cloud when needed. |
For a deeper walkthrough of the trade-offs, see local AI vs cloud AI.
What hardware do I need to run local AI?
Any modern laptop can run a small model (4B parameters) on CPU alone. A consumer GPU with 8 to 16 GB of VRAM comfortably runs the mid-range models most users reach for. Frontier models in the 30B to 120B range need a high-end GPU or multi-card setup.
InnerZero ships with a tier ladder so the right model gets picked automatically for your machine. Entry tier (8 to 16 GB VRAM) runs Qwen 3 8B for chat and code. Standard tier (16 to 24 GB VRAM) steps up to Qwen 3 30B for stronger reasoning. Pro tier (48 to 80 GB VRAM or a pair of cards) runs gpt-oss 120B for heavy tasks. Enterprise and frontier tiers push into multi-GPU workstations and the 235B class. For the full matrix, see the supported models page, and for a practical buying guide read hardware for local AI.
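The tier ladder can be summarised as a simple lookup from available VRAM to a default model. This is a hypothetical sketch, not InnerZero's actual selection code; the thresholds and model names follow the tiers described above:

```python
# Hypothetical sketch of the tier ladder: map available VRAM (GB) to the
# default model a picker might choose. Real selection logic lives inside
# InnerZero; thresholds here mirror the text above.
def pick_tier(vram_gb: float) -> str:
    if vram_gb >= 48:
        return "gpt-oss 120B"   # Pro tier: 48 to 80 GB or a pair of cards
    if vram_gb >= 16:
        return "Qwen 3 30B"     # Standard tier: 16 to 24 GB
    if vram_gb >= 8:
        return "Qwen 3 8B"      # Entry tier: 8 to 16 GB
    return "Qwen 3 4B"          # CPU or low-VRAM fallback

print(pick_tier(12))  # Qwen 3 8B
```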
Is local AI free?
Yes. The open-source models and the runtimes that serve them are free to download and use. InnerZero itself is free forever for personal use. Your only marginal cost is the electricity each inference uses.
The realistic catch is not money, it is time and hardware. A model big enough to feel useful wants a decent GPU, which you either already have or buy once. First-time setup (installing the runtime, downloading a model) takes half an hour end to end. After that, every prompt is free. Compare this to a cloud subscription at £15 to £20 a month: local breaks even in well under a year of regular use. See AI without a subscription for the full maths.
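The break-even maths is a one-line division: one-off hardware cost over the monthly subscription it replaces. The figures below are illustrative only, not prices from this document:

```python
# Back-of-envelope break-even: one-off hardware spend versus a rolling
# cloud subscription. Illustrative numbers, not quotes.
def breakeven_months(hardware_cost_gbp: float, monthly_sub_gbp: float) -> float:
    return hardware_cost_gbp / monthly_sub_gbp

# e.g. a £200 second-hand GPU against an £18/month subscription:
print(round(breakeven_months(200, 18), 1))  # 11.1 months
```

If you already own the hardware, the break-even point is immediate: every local prompt after setup costs only electricity.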
Is local AI private?
Yes. By default, no prompt, file, or memory leaves your machine. The model is a static file; it has no network access of its own. Only you see what you type and what the AI says back.
The caveat is that "private" depends on the app wrapper you use. InnerZero is designed so local mode truly is local: no telemetry, no analytics, no crash reports, no cloud sync. Optional cloud mode exists (for tasks where you want a frontier model) but is strictly opt-in, and even then only the current prompt plus a short conversation window is sent, never your full memory or files. For the formal statement of what is and is not collected, see the privacy policy.
Can local AI work offline?
Yes. Once the model is downloaded, local AI runs with no internet connection at all. Planes, trains, remote cabins, air-gapped work laptops: all fine.
The one caveat is first-time setup. Installing the runtime and downloading the model weights does require internet, because the weights live in a public registry. After that single download, you can disconnect permanently and the AI still works. For reference material that also needs to be available offline, see using AI offline and the knowledge-packs feature covered in knowledge packs explained.
What AI models can I run locally?
The three families InnerZero installs by default are Qwen 3, Gemma 3, and gpt-oss. Together they cover everything from a 1B voice model on a basic laptop up to a 235B reasoning model on a datacenter-class workstation.
Qwen 3 (by Alibaba) is the daily-driver family: Apache 2.0 licensed, sizes from 4B to 235B, strong at chat and code. Gemma 3 (by Google) covers the smaller sizes well and powers voice transcription. gpt-oss (OpenAI's open-weight release, 120B) is the top end of the default install on high-memory machines. For the complete catalogue and hardware mapping, see the models page.
What can I do with local AI?
The same tasks people use ChatGPT for, minus the cloud round trip: writing, coding, summarising, planning, answering questions about your own documents, voice chat, image understanding.
The practical surface area looks like this:
- Chat: day-to-day conversation, idea sparring, writing help.
- Code: reading, writing, debugging, explaining across common languages.
- Voice: push to talk, wake words, fully local speech-to-text.
- Knowledge packs: your own PDFs, notes, or reference material as offline context.
- Tools: local file access, search, and calculators where supported by the front end.
How does InnerZero fit?
InnerZero is a desktop app that turns a local runtime (Ollama or LM Studio) into a polished daily assistant. It adds memory, voice, tools, knowledge packs, and an optional cloud layer for when you want a frontier model.
The separation of concerns is intentional: Ollama and LM Studio are excellent engines but offer only a developer-oriented chat UI. InnerZero is the interface that makes local AI feel like a product instead of a tinkering project. Cloud is optional and always opt-in: add your own API key for any of seven providers, or subscribe to the managed plan. Either way, local stays free forever. Start at the features page, grab the installer from the download page, or see the full breakdown on the pricing page.
Who is local AI for?
Most people benefit from local AI, but the pitch differs by context. Pick the persona that matches your use case for a tailored breakdown.
- Local coding models, sandboxed agent, optional frontier keys.
- Persistent project memory, offline dictation, tuneable AI voice.
- Free forever, no account, offline study with knowledge packs.
- Sensitive sources stay on your machine; frontier via BYO keys.
- Travel, fieldwork, or air-gapped sites with no internet.
Frequently asked questions
Do I need a GPU for local AI?
No, but it helps. Small models like Qwen 3 4B and Gemma 3 4B run on CPU alone, slowly but usably, on any modern laptop. A dedicated GPU with 8 GB of VRAM or more makes mid-range models (8B to 14B) feel responsive. Serious reasoning models (30B and up) need 24 GB of VRAM or a pair of consumer cards.
How much does it cost to run local AI?
The software and models are free. You pay for hardware once (a consumer laptop or desktop you probably already own), plus the electricity each inference uses. Running a 14B model on a desktop GPU for an hour of active use typically costs a few pence in UK electricity. There is no per-token or subscription charge.
Is local AI as good as ChatGPT?
For daily tasks like writing, summarising, and conversational help, modern open-source models are competitive with GPT-4-class cloud models. For the hardest reasoning and coding problems, frontier cloud models (Claude Opus, GPT-5, Gemini 2.5 Pro) still lead. The honest answer is that local AI is good enough for most work and you reach for cloud only when the task requires it.
Can I use my own Claude or GPT API key?
Yes. InnerZero supports bring-your-own-key for seven providers including Anthropic, OpenAI, Google, and DeepSeek. Keys are encrypted and stored on your machine. Requests go directly from your device to the provider, with no InnerZero server in the middle and no markup on the provider's own token pricing.
What is Ollama?
Ollama is an open-source runtime that downloads and serves local language models on your computer. It exposes a simple HTTP API that applications like InnerZero call to generate text, embeddings, and vision outputs. You can think of it as the engine; InnerZero is the interface.
What is LM Studio?
LM Studio is a desktop application that does the same job as Ollama (downloading and serving local language models) with a built-in chat UI and model browser. InnerZero works with either backend, so you can pick whichever you already have installed.
Is local AI safe to use for sensitive documents?
Yes. Local AI never sends document contents to a third-party server. Your files stay on your disk, your prompts stay in local memory, and the model weights are static files that cannot phone home. This makes local AI well suited to confidential work such as legal drafts, medical notes, or private journals.
Can I run local AI on a Mac or Linux?
Ollama and LM Studio both run on macOS, Linux, and Windows. Open-source models run anywhere those runtimes run. InnerZero is currently a Windows desktop app, but the local AI stack underneath is cross-platform.