Media Summary: A visual explanation of how transformers piece concepts together, told ... How can a Transformer have a huge hidden layer but still run faster? This paper shows that many feed-forward activations ... I made a video about one of my favorite papers! I hope you enjoy :) ===Summary=== "Applying ...

A Window Into LLMs Sparse - Detailed Analysis & Overview

A Window Into LLMs | Sparse Autoencoders Explained
RTPurbo: 100-Step Sparse Attention for LLMs
Reading an AI's Mind with Sparse Autoencoders
Sparse LLMs at inference: 6x faster transformers! | DEJAVU paper explained
Why Sparse Activations Make LLMs Faster | One Minute Paper
How DeepSeek Rewrote the Transformer [MLA]
Sparse Autoencoders Unlearn Knowledge in LLMs | A Paper-Based Walkthrough
IndexCache: Faster Sparse Attention for LLMs
Why LLMs Forget—and How RAG + Context Engineering Fix It (Free Labs).
Sanity Checks for LLM Sparse Autoencoders
How to train LLMs with long context?

A Window Into LLMs | Sparse Autoencoders Explained

This has been my favorite video so far

RTPurbo: 100-Step Sparse Attention for LLMs

Reading an AI's Mind with Sparse Autoencoders

A visual explanation of how transformers piece concepts together, told

Sparse LLMs at inference: 6x faster transformers! | DEJAVU paper explained

Why Sparse Activations Make LLMs Faster | One Minute Paper

How can a Transformer have a huge hidden layer but still run faster? This paper shows that many feed-forward activations

How DeepSeek Rewrote the Transformer [MLA]

Sparse Autoencoders Unlearn Knowledge in LLMs | A Paper-Based Walkthrough

I made a video about one of my favorite papers! I hope you enjoy :) ===Summary=== "Applying

IndexCache: Faster Sparse Attention for LLMs

Why LLMs Forget—and How RAG + Context Engineering Fix It (Free Labs).

00:26 - Understanding context

Sanity Checks for LLM Sparse Autoencoders

How to train LLMs with long context?
