This Local LLM Looked Smart - Detailed Analysis & Overview

This Local LLM Looked Smart Until I Saw What It Made Up

Don't Trust One-Number

Your local LLM is 10x slower than it should be

Here's the one change that took mine from ~120 tok/s to 1200+ without a new GPU. TryHackMe just launched Cyber Security 101 ...

THIS is the REAL DEAL 🤯 for local LLMs

This is the stack that gets me over 4000 tokens per second

Are Local Models Finally Good Enough?

I have been covering

What is Ollama? Running Local LLMs Made Simple

Ready to become a certified watsonx AI Assistant Engineer? Register now and use code IBMTechYT20 for 20% off of your exam ...

Private AI on the go… a new trick

I put a tiny MacBook Air between me and some ridiculously large

Run Local LLMs on Hardware from $50 to $50,000 - We Test and Compare!

Dave tests llama3.1 and llama3.2 using Ollama on a Raspberry Pi, a Herk Orion Mini PC, a 3970X, an M2 Mac Pro, and a ...

Local AI just leveled up... Llama.cpp vs Ollama

Llama.cpp Web UI + GGUF Setup Walkthrough and Ollama comparisons. Check out ChatLLM: https://chatllm.abacus.ai/ltf My ...

I Made The Smallest (And Dumbest) LLM

I Made ChatGPT-2 Run on a Potato (63MB AI Model!) - Extreme Quantization Experiment What happens when you compress a ...

What Can a 500MB LLM Actually Do? You'll Be Surprised!

The Qwen3 family of thinking large language models has just been released and the smallest model in the family is just 523MB!

How Large Language Models Work

Learn in-demand Machine Learning skills now → https://ibm.biz/BdK65D Learn about watsonx → https://ibm.biz/BdvxRj Large ...

How to Choose Large Language Models: A Developer’s Guide to LLMs

Ready to become a certified watsonx AI Assistant Engineer? Register now and use code IBMTechYT20 for 20% off of your exam ...