Inspired by the foundational paper, “A Survey of Context Engineering for Large Language Models,” we developed the Context Engineering Navigator. The interactive tool below lets you search and explore the core methods from the paper’s framework, helping you put theory into practice.
Our Navigator breaks down the key components of context engineering, including:
Context Retrieval & Generation: Strategies for finding and creating relevant information.
Context Processing: Methodologies for cleaning, condensing, and structuring that information.
Context Management: Systems for efficiently storing and handling context.
Find the Right Methods for Your Project
Ready to dive in? Use our tool to pinpoint the context engineering methods that best fit your needs.
You can filter by a category (All Categories, Context Retrieval & Generation, Context Processing, or Context Management) and click on any method to learn how it works and when to use it. You can even search for your specific use case to see which techniques we recommend.
Context Engineering Navigator
Interactive guide to context engineering techniques for LLMs
Context Retrieval & Generation
Prompt Engineering & Context Generation
Frameworks & Chain-of-Thought Foundations
CLEAR Framework
Conciseness, Logic, Explicitness, Adaptability, Reflectiveness for robust prompt design
Prompt templates
Chain of Thought (CoT)
Step-by-step reasoning with "Let's think step by step"
Mathematical problem solving Debugging code logic
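To make this concrete, here is a minimal sketch of zero-shot CoT prompting. The helper names are our own, and `call_llm` is a hypothetical stand-in for whatever chat-completion client you use; only the prompt suffix matters.

```python
# Minimal zero-shot chain-of-thought sketch. `call_llm` is a stand-in for
# any chat-completion client (a callable that maps a prompt to a reply).
def cot_prompt(question: str) -> str:
    # Appending the trigger phrase elicits intermediate reasoning steps.
    return f"{question}\n\nLet's think step by step."

def ask(call_llm, question: str) -> str:
    reasoning = call_llm(cot_prompt(question))
    # Optional second pass to extract a concise final answer.
    return call_llm(f"{reasoning}\n\nTherefore, the final answer is:")
```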
Tree of Thoughts (ToT)
Hierarchical reasoning with exploration and backtracking
Classification tasks Content generation with examples
In-context learning
Performance heavily depends on example selection and ordering
Adaptive AI systems Personalized responses
External Knowledge Retrieval
Retrieval-Augmented Generation Fundamentals
FlashRAG
Comprehensive evaluation and modular RAG implementation
Rapid prototyping of retrieval systems
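As a reference point for what any RAG pipeline boils down to, here is a generic retrieve-then-read sketch. This is an illustration of the pattern, not FlashRAG's actual API; `embed` and `call_llm` are hypothetical stand-ins for your embedding and chat models.

```python
# Generic retrieve-then-read sketch (not FlashRAG's API).
import numpy as np

def top_k(query_vec: np.ndarray, doc_vecs: np.ndarray, docs: list[str], k: int = 3) -> list[str]:
    # Cosine similarity of the query against every document vector.
    sims = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9
    )
    return [docs[i] for i in np.argsort(-sims)[:k]]

def rag_answer(call_llm, embed, query: str, docs: list[str]) -> str:
    doc_vecs = np.stack([embed(d) for d in docs])
    context = "\n\n".join(top_k(embed(query), doc_vecs, docs))
    return call_llm(f"Answer using only this context:\n{context}\n\nQuestion: {query}")
```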
Advanced Retrieval Strategies
KRAGEN
Integrates knowledge graphs with advanced prompting. Breaks down complex problems into smaller sub-problems, retrieves relevant information for each through RAG, and consolidates the results into a more accurate and transparent response
Uses the LLM to plan, decompose, and orchestrate retrievals. Breaks a user's query into steps, retrieves relevant information for each step (from one or more sources), and then stitches the pieces together into a final coherent response
Complex research queries Investigative journalism
Adaptive Retrieval Mechanisms
Self-RAG
Model decides when to retrieve information dynamically. Uses special tokens to control retrieval timing and quality assessment
Conversational AI Adapts retrieval based on query complexity
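Self-RAG itself relies on specially trained reflection tokens, but its control flow can be approximated with plain prompts. The sketch below is that toy approximation, not the trained model; `call_llm` and `retrieve` are hypothetical stand-ins.

```python
# Toy approximation of Self-RAG's adaptive retrieval. The real model emits
# trained reflection tokens; here the decision and critique are plain prompts.
def adaptive_answer(call_llm, retrieve, query: str) -> str:
    decision = call_llm(
        "Does answering this query require external documents? "
        f"Reply RETRIEVE or NO_RETRIEVE.\n\nQuery: {query}"
    )
    if "NO_RETRIEVE" in decision:
        return call_llm(query)  # simple queries skip retrieval entirely
    context = "\n\n".join(retrieve(query))
    answer = call_llm(f"Context:\n{context}\n\nQuery: {query}")
    # Stand-in for the critique tokens: grade whether the answer is grounded.
    verdict = call_llm(
        f"Context:\n{context}\n\nAnswer:\n{answer}\n\n"
        "Is the answer fully supported by the context? Reply YES or NO."
    )
    return answer if "YES" in verdict else call_llm(query)
```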
RAPTOR
Processes documents hierarchically using recursive clustering and summarization, constructing a tree with differing levels of summarization from the bottom up
Research papers Legal documents Comprehensive reports Complex PDFs
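Here is a sketch of that bottom-up tree construction. The real method clusters chunks by embedding (e.g., with Gaussian mixtures); this simplified version groups adjacent chunks, and `summarize` is a hypothetical LLM call.

```python
# RAPTOR-style bottom-up tree sketch: each level summarizes groups of nodes
# from the level below until a single root summary remains.
def build_tree(summarize, chunks: list[str], fanout: int = 4) -> list[list[str]]:
    levels = [chunks]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        # Summarize each group of `fanout` nodes into one parent node.
        parents = [
            summarize("\n\n".join(prev[i:i + fanout]))
            for i in range(0, len(prev), fanout)
        ]
        levels.append(parents)
    return levels  # levels[0] = leaf chunks, levels[-1] = root summary
```

At query time, retrieval can then match against nodes at any level, so broad questions hit high-level summaries while detailed ones hit leaves.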
HippoRAG
Memory-inspired retrieval architecture that synergistically orchestrates LLMs, knowledge graphs, and the Personalized PageRank algorithm to mimic the different roles of neocortex and hippocampus in human memory
Scientific literature reviews Legal case briefings Medical diagnosis Synthesis of information from various sources
Knowledge Graph Integration and Structured Retrieval
KAPING
Retrieves facts using semantic similarity, combining the structure of the semantic network with the information content of its concepts
Document classification Content recommendation systems Knowledge graph applications
KARPA
Training-free KG adaptation with pre-planning, semantic matching, and relation path reasoning
Sequential reasoning over knowledge graphs to locate relevant triples, conducting exploration to retrieve related information from external databases while generating multiple reasoning paths
Iterative reading and reasoning approach that constructs specialized functions to collect relevant evidence from structured data sources
Database querying Structured data analysis
Agentic and Modular Retrieval Systems
AgenticRAG
Embeds autonomous AI agents into the RAG pipeline using agentic design patterns (reflection, planning, tool use, multi-agent collaboration) to dynamically manage retrieval strategies
Multi-agent collaborative reasoning Personal assistants Adaptive question answering systems Access to various data sources (emails, docs, web search)
Graph Enhanced RAG Systems
Incorporates knowledge graphs into RAG to capture structured relationships between entities, enabling multi-hop reasoning and deeper contextual retrieval using graph traversal and Cypher queries alongside vector similarity search (e.g. Microsoft's GraphRAG)
Enterprise knowledge bases with complex relationships Legal document analysis Scientific literature review Recommendation systems with rich entity relationships
Real Time RAG
Processes streaming data in real-time by constructing evolving knowledge graphs that capture scene-object-entity relationships as data arrives, using lightweight models and dynamic priority-based knowledge extraction
News monitoring Market analysis Intelligent transportation systems Healthcare monitoring Satellite remote sensing Financial trading systems
Dynamic Context Assembly
Assembly Functions and Orchestration Mechanisms
Template-based formatting
Consistent structure for context assembly
API responses Report generation
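A minimal sketch of template-based assembly: every request is rendered into the same structure, so downstream parsing and evaluation stay predictable. The section names here are illustrative.

```python
# A fixed template keeps assembled context consistent across requests.
CONTEXT_TEMPLATE = """\
## Task
{task}

## Retrieved context
{context}

## User question
{question}
"""

def assemble(task: str, context: str, question: str) -> str:
    return CONTEXT_TEMPLATE.format(task=task, context=context, question=question)
```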
Priority-based selection
Most important info first in context assembly
Mobile apps Real-time systems
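A sketch of greedy priority-based selection under a token budget; token counts are approximated by whitespace splitting for brevity, and a real system would use the model's tokenizer.

```python
# Greedy priority-based selection: highest-priority snippets are packed
# first until the context budget is exhausted.
def select(snippets: list[tuple[float, str]], budget: int) -> list[str]:
    chosen, used = [], 0
    for priority, text in sorted(snippets, key=lambda s: -s[0]):
        cost = len(text.split())  # crude token-count proxy
        if used + cost <= budget:
            chosen.append(text)
            used += cost
    return chosen
```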
Adaptive composition
Adjusts to task requirements and model capabilities
Multi-domain applications Scalable AI systems
Multi-Component Integration Strategies
Multi-Component Integration
Combines text, structured knowledge, temporal data, and external tools while maintaining coherent semantic relationships
Enterprise AI systems Comprehensive analytics platforms
APE (Automatic Prompt Engineer)
Uses search algorithms to find optimal prompts automatically. Treats the instruction as a 'program,' optimized by searching over a pool of instruction candidates proposed by an LLM to maximize a chosen score function
Production systems needing optimal prompts AI Content generation AI-powered chatbots
LM-BFF
Automated pipelines that combine prompt-based fine-tuning with dynamic demonstrations; a suite of simple techniques for fine-tuning pre-trained language models on a small number of annotated examples
Entity extraction Classification tasks Domain adaptation with limited examples
Promptbreeder
A self-referential evolutionary system in which LLMs improve their own task-prompts and mutation-prompts through a 'natural selection' analogy, evolving and adapting prompts for a given domain with an evolutionary algorithm
Autonomous AI systems Continuous learning applications Domain-specific optimization
LangChain
Sequential processing chains, agents, and web-browsing capabilities. Provides building blocks for AI applications with extensive integrations
Chatbots Data retrieval and processing Internal knowledge management RAG applications
AutoGPT/AutoGen
Complex AI agent development with user-friendly interfaces. Multi-agent conversation orchestration with event-driven architecture
Code generation Personalized content creation at scale Automated workflows Research tasks
Context Processing
Long Context Processing
Ultra-long Sequence Context Processing
Standard Transformer (Self-Attention, O(n²))
Baseline self-attention with quadratic complexity
General LLM Tasks Question answering Summarization
Architectural Innovations for Long Context
State Space Models (SSMs, e.g. Mamba)
Linear complexity with constant memory through hidden states
Real-time language modeling Streaming data Online inference with constant memory
Dilated Attention (e.g. LongNet)
Exponentially expanding attention fields
Genomics Legal/financial document analysis Logs
Toeplitz Neural Networks
Models sequences with relative positions encoded as Toeplitz matrices, reducing space-time complexity to log-linear
Code generation Document QA Long sequence forecasting
Linear Attention
Expresses self-attention as a linear dot product of kernel feature maps
Scientific papers Book summarization Historical document analysis
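The trick in one sketch: with a feature map φ, attention can be computed as φ(Q)(φ(K)ᵀV), which is O(n) in sequence length instead of O(n²). This uses the common elu(x)+1 feature map and omits causal masking, which real implementations add.

```python
# Linear attention sketch (non-causal, for illustration).
import numpy as np

def phi(x: np.ndarray) -> np.ndarray:
    # elu(x) + 1: a common positive feature map for linear attention.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray) -> np.ndarray:
    q, k = phi(Q), phi(K)          # (n, d)
    kv = k.T @ V                   # (d, d_v): computed once, reused per query
    z = q @ k.sum(axis=0)          # (n,): per-query normalizers
    return (q @ kv) / z[:, None]   # (n, d_v)
```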
Non-attention LLMs (Recursive memory)
Uses recursive memory transformers, breaking long sequences into chunks. The model keeps a summary ('memory') of each chunk and, as it moves through the sequence, recursively updates and uses this memory
Chatbots with persistent and growing context Logging agents
Position Interpolation & Context Extension
NTK/YaRN
Neural Tangent Kernel (NTK) interpolation rescales the positional encodings, linear interpolation stretches the position indices, and an attention-distribution correction tweaks the attention mechanism
Fine-tuning LLMs to process longer docs for specialized tasks
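A sketch of the two basic tricks, assuming standard RoPE frequencies: linear interpolation squeezes positions back into the trained range, while the NTK-aware variant enlarges the rotary base so high-frequency dimensions change less. The base-scaling formula shown is the commonly used one; treat it as an assumption, not the only variant.

```python
# Two RoPE context-extension tricks, sketched on standard RoPE frequencies.
import numpy as np

def rope_freqs(dim: int, base: float = 10000.0) -> np.ndarray:
    # theta_i = base^(-2i/d) for each rotary dimension pair.
    return base ** (-np.arange(0, dim, 2) / dim)

def linear_interp_positions(positions: np.ndarray, scale: float) -> np.ndarray:
    # Trained on 4k, targeting 16k -> scale = 4: position 16000 acts like 4000.
    return positions / scale

def ntk_scaled_freqs(dim: int, scale: float, base: float = 10000.0) -> np.ndarray:
    # NTK-aware trick: grow the base so low frequencies stretch while high
    # frequencies stay near the trained regime.
    return rope_freqs(dim, base * scale ** (dim / (dim - 2)))
```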
LongRoPE
Identifies effective rescale factors for RoPE's rotation angles for each RoPE dimension based on token positions. Uses evolutionary search algorithm with progressive extension strategy to achieve 2048k context window
Rapidly adapting LLMs to massive token windows (256k–2M tokens) Patent or scientific literature mining Big data retrieval
PoSE
Positional Skip-wisE training decouples training length from target context length by dividing the original context window into chunks and applying distinct skipping bias terms to manipulate position indices during training
Data curation Codebase analysis Long meeting transcripts
Self-Extend (bi-level/grouped/neighbor attention)
Constructs bi-level attention: grouped attention for distant tokens using a FLOOR operation, and neighbor attention for adjacent tokens within a specified range
Plug-and-play context extension
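A simplified rendering of that position mapping (the constants here are illustrative): relative positions within the neighbor window keep their exact values, while distant positions are merged into coarse groups via FLOOR so no position index exceeds what the model saw in training.

```python
# Self-Extend-style bi-level position mapping (simplified sketch).
def mapped_position(rel_pos: int, group: int = 8, neighbor: int = 512) -> int:
    if rel_pos <= neighbor:
        return rel_pos  # neighbor attention: exact relative positions
    # Grouped attention: distant tokens share coarse group indices, offset
    # so the mapping continues smoothly after the neighbor window.
    return neighbor + (rel_pos - neighbor) // group
```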
Optimization Techniques for Efficient Processing
Grouped Query Attention (GQA)
Optimizes multi-head attention by sharing key and value projections across multiple query heads, reducing memory bandwidth requirements
Scaling up LLM Inference for cost savings
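A numpy sketch of the core idea: several query heads share each key/value head, shrinking the KV cache proportionally. Head counts here are illustrative.

```python
# Grouped-query attention sketch: 8 query heads share 2 KV heads (4x smaller
# KV cache). Each KV head is broadcast to its group of query heads.
import numpy as np

def gqa(Q, K, V, n_q_heads=8, n_kv_heads=2):
    # Q: (n_q_heads, n, d); K, V: (n_kv_heads, n, d)
    group = n_q_heads // n_kv_heads
    K = np.repeat(K, group, axis=0)  # broadcast each KV head to its group
    V = np.repeat(V, group, axis=0)
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V
```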
FlashAttention 1/2
Memory-efficient attention algorithm that fuses operations and uses tiling to reduce memory accesses while maintaining exact attention computation
Cloud inference Large batch serving Distributed training
Ring Attention
Distributes attention computation across multiple devices using ring communication pattern
Multi-GPU or TPU setups
Sparse Attention (LongLoRA, SinkLoRA)
Selectively attends to subset of tokens based on patterns or learned importance
RAG Selective context retrieval
Efficient Selective Attention
Dynamically selects important tokens for attention computation
Large codebases Multi-document Q&A
BigBird
Combines local, global, and random attention patterns in sparse attention mechanism
Biomedical data mining Knowledge bases Graph-structured documents
Memory Management & Context Compression
Rolling Buffer Cache
Maintains fixed-size cache with sliding window mechanism for token storage
Real-time applications with fixed compute or memory budget
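A minimal sketch using a fixed-size deque: only the most recent `window` keys and values are kept, so memory stays constant as the stream grows.

```python
# Rolling buffer KV-cache sketch: constant memory over unbounded streams.
from collections import deque

class RollingKVCache:
    def __init__(self, window: int):
        self.keys = deque(maxlen=window)
        self.values = deque(maxlen=window)

    def append(self, k, v):
        # Oldest entries are evicted automatically once the window is full.
        self.keys.append(k)
        self.values.append(v)

    def view(self):
        return list(self.keys), list(self.values)
```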
Streaming LLM
Processes continuous input streams with memory-efficient caching mechanisms
Live chat moderation Streaming summarization
Infini-attention
Combines compressive memory with local attention in single model block
Retrieval-based LLMs Systems requiring both detail and recall
Heavy Hitter Oracle
Identifies and prioritizes most important tokens/context for attention
LLM API deployment Latency-sensitive applications
QwenLong-CPRS, InfLLM
Multi-granularity memory management with compression techniques
Advanced compression and context handling for variable-length inputs
Dynamic media Streaming environments AI-driven broadcast
Relational & Structured Context
Knowledge Graph Embeddings and Neural Integration
Graph Neural Networks, GraphFormers, Heterformer
Neural architectures designed for graph-structured data processing
Scientific discovery Knowledge base QA Relational Reasoning
Verbalization & Structured Data Representations
Verbalization/Structured Data Reps
Converts structured data into natural language representations
Table QA Knowledge base integration Data extraction
Programming Language Reps (Python/SQL)
Uses programming languages as intermediate representations
Data engineering Code synthesis Database querying
Matrix Representations
Compact matrix-based representations of structured information
Lightweight edge deployments On-device ML
Integration Frameworks & Synergized Approaches
K-BERT (Pretrain), KAPING (Inference Time)
Injects structured knowledge during pretraining or inference
Medical/financial LLMs RAG
Unified Approaches (GreaseLM, QA-GNN)
Combines natural language fluency with knowledge graph reasoning
Open-domain QA Research assistants Scientific agents
Context Management
Memory Hierarchies & Storage Architectures
Virtual Memory Systems
MemGPT
Virtual memory management modeled on operating systems
Extended conversations Document analysis Personal companion systems Chatbots with persistent memory
PagedAttention
Efficient KV cache memory management with non-contiguous blocks
Cloud inference Large batch serving Production LLM deployment
Dynamic Memory Organizations
MemoryBank
Uses Ebbinghaus Forgetting Curve principles with dynamic memory strength adjustment
Personal companion systems Psychological counseling Long-term AI companions
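A sketch of the forgetting-curve mechanics, assuming the classic retention form R = e^(−t/S): retention decays with elapsed time, and each recall reinforces a memory's strength, slowing future decay. Units and thresholds here are illustrative.

```python
# Ebbinghaus-style memory decay with reinforcement on recall.
import math
import time

class Memory:
    def __init__(self, text: str, strength: float = 1.0):
        self.text, self.strength = text, strength
        self.last_access = time.time()

    def retention(self) -> float:
        elapsed_h = (time.time() - self.last_access) / 3600
        return math.exp(-elapsed_h / self.strength)  # R = e^(-t/S)

    def recall(self) -> str:
        self.strength += 1.0           # reinforcement slows future decay
        self.last_access = time.time()
        return self.text

def forget(memories: list[Memory], threshold: float = 0.1) -> list[Memory]:
    # Drop memories whose retention has decayed below the threshold.
    return [m for m in memories if m.retention() >= threshold]
```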
ReadAgent
Episode pagination, memory gisting, and interactive lookup for human-like reading
Long document analysis Research paper comprehension Multi-document Q&A
Compressor-Retriever Systems
Lifelong context management using base model functions to compress and retrieve content with end-to-end differentiability
Long-term learning systems Evolving knowledge bases
System Configurations
Centralized Systems
Excellent task coordination but poor scalability; context overflows as topics increase
Single-domain chatbots Focused task automation
Decentralized Systems
Reduced context overflow but increased response time due to inter-agent querying
Multi-agent systems Distributed knowledge processing
Hybrid Systems
Balances shared knowledge with specialized processing, semi-autonomous operation
Enterprise AI systems Complex conversational agents
Context Compression
Context Manager Components
Snapshot creation
Save intermediate states during processing
Long-running tasks Recovery systems
State restoration
Resume from previous points in processing
Recovery from interruptions
Window management
Overall context optimization and organization
Multi-step reasoning Complex document processing
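To show how the three components above fit together, here is a minimal sketch of a context manager with snapshotting, restoration, and window management; the class and method names are our own.

```python
# Minimal context-manager sketch: bounded working context, named snapshots
# saved mid-task, and restoration to resume after an interruption.
import copy

class ContextWindowManager:
    def __init__(self, window_limit: int):
        self.window_limit = window_limit
        self.context: list[str] = []
        self.snapshots: dict[str, list[str]] = {}

    def add(self, item: str):
        self.context.append(item)
        # Window management: keep only the most recent items.
        self.context = self.context[-self.window_limit:]

    def snapshot(self, name: str):
        # Snapshot creation: save an intermediate state during processing.
        self.snapshots[name] = copy.deepcopy(self.context)

    def restore(self, name: str):
        # State restoration: resume from a previously saved point.
        self.context = copy.deepcopy(self.snapshots[name])
```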
Context Compression Techniques
In-Context Autoencoder (ICAE)
Condenses long contexts into compact memory slots for direct conditioning
Memory-constrained environments
Recurrent Context Compression (RCC)
Expands context window length in constrained storage using instruction reconstruction techniques
Edge computing Resource-limited deployment
Memory Augmented Approaches
kNN-Based Memory Caches
Stores key-value pairs and uses contrastive learning to improve retrieval accuracy
Conversation systems Recommendation engines
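A sketch of the cache itself, with cosine-similarity lookup; the contrastive training that sharpens retrieval in the real systems is out of scope here.

```python
# kNN memory cache sketch: store (embedding, value) pairs, retrieve the
# k nearest by cosine similarity.
import numpy as np

class KNNMemory:
    def __init__(self):
        self.keys: list[np.ndarray] = []
        self.values: list[str] = []

    def add(self, key: np.ndarray, value: str):
        # Normalize on insert so lookup reduces to a dot product.
        self.keys.append(key / (np.linalg.norm(key) + 1e-9))
        self.values.append(value)

    def lookup(self, query: np.ndarray, k: int = 5) -> list[str]:
        q = query / (np.linalg.norm(query) + 1e-9)
        sims = np.stack(self.keys) @ q
        return [self.values[i] for i in np.argsort(-sims)[:k]]
```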
Hierarchical Caching Systems
Hierarchical Caching
Multi-layer caching systems like Activation Refilling (ACRE)
Enterprise systems with multiple data tiers
Infinite-LLM with DistAttention
Handles extremely long sequences using distributed attention mechanisms
Large-scale document processing
KCache
Splits the KV cache: the K cache stays in high-bandwidth memory while the V cache moves to CPU memory, balancing inference speed and memory usage
High-performance serving Cost optimization
Multi-agent Distributive Processing
Multi-Agent Distributive Processing
Handles massive inputs in distributed manner with high cache reusability in RAG and agent workloads
Large-scale knowledge processing Enterprise RAG
Cache Access Pattern Analysis
Exploits high-reusability cache-access patterns in RAG and agent applications to reduce redundancy
Production RAG systems Agent platforms
A More Effective Way to Build Smarter Language Models
If you’ve built production machine learning (ML) systems, you’ve likely run into these challenges: your model struggles with domain-specific tasks, can’t access real-time information, or fails to deliver personalized responses. The traditional solution of creating massive, specialized datasets is often incredibly expensive and time-consuming.
Context engineering offers a powerful alternative. It’s the process of finding, preparing, and managing the right information to give your model at the exact moment it’s needed (at inference time). This approach provides your model with targeted, relevant knowledge, drastically reducing the need for costly retraining.
The world of AI will continue to evolve, but the principle of supplying models with relevant, external information will always be critical. Instead of waiting years for the perfect dataset, you can use context engineering to get immediate performance boosts for your specialized and dynamic applications.
Below, we’ve also included a list of additional resources that we’ve found most effective for deploying these techniques in real-world systems.