Inspired by the foundational paper, “A Survey of Context Engineering for Large Language Models,” we developed the Context Engineering Navigator. The interactive tool below lets you search and explore the core methods from the paper’s framework, helping you put theory into practice.
Our Navigator breaks down the key components of context engineering, including:
Context Retrieval & Generation: Strategies for finding and creating relevant information.
Context Processing: Methodologies for cleaning, condensing, and structuring that information.
Context Management: Systems for efficiently storing and handling context.
Find the Right Methods for Your Project
Ready to dive in? Use our tool to pinpoint the context engineering methods that best fit your needs.
You can filter by a category (All Categories, Context Retrieval & Generation, Context Processing, or Context Management) and click on any method to learn how it works and when to use it. You can even search for your specific use case to see which techniques we recommend.
Context Engineering Navigator
Interactive guide to context engineering techniques for LLMs
Context Retrieval & Generation
Prompt Engineering & Context Generation
Frameworks & Chain-of-Thought Foundations
CLEAR Framework
Conciseness, Logic, Explicitness, Adaptability, Reflectiveness for robust prompt design
Prompt templates
Chain of Thought (CoT)
Step-by-step reasoning with "Let's think step by step"
Mathematical problem solving Debugging code logic
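To make this concrete, here is a minimal sketch of zero-shot CoT prompting. The helper names are our own, and `call_llm` is a hypothetical stand-in for whatever chat-completion client you use; only the prompt suffix matters.

```python
# Minimal zero-shot chain-of-thought sketch. `call_llm` is a stand-in for
# any chat-completion client (a callable that maps a prompt to a reply).
def cot_prompt(question: str) -> str:
    # Appending the trigger phrase elicits intermediate reasoning steps.
    return f"{question}\n\nLet's think step by step."

def ask(call_llm, question: str) -> str:
    reasoning = call_llm(cot_prompt(question))
    # Optional second pass to extract a concise final answer.
    return call_llm(f"{reasoning}\n\nTherefore, the final answer is:")
```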
Tree of Thoughts (ToT)
Hierarchical reasoning with exploration and backtracking
Classification tasks Content generation with examples
In-context learning
Performance heavily depends on example selection and ordering
Adaptive AI systems Personalized responses
External Knowledge Retrieval
Retrieval-Augmented Generation Fundamentals
FlashRAG
Comprehensive evaluation and modular RAG implementation
Rapid prototyping of retrieval systems
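As a reference point for what any RAG pipeline boils down to, here is a generic retrieve-then-read sketch. This is an illustration of the pattern, not FlashRAG's actual API; `embed` and `call_llm` are hypothetical stand-ins for your embedding and chat models.

```python
# Generic retrieve-then-read sketch (not FlashRAG's API).
import numpy as np

def top_k(query_vec: np.ndarray, doc_vecs: np.ndarray, docs: list[str], k: int = 3) -> list[str]:
    # Cosine similarity of the query against every document vector.
    sims = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9
    )
    return [docs[i] for i in np.argsort(-sims)[:k]]

def rag_answer(call_llm, embed, query: str, docs: list[str]) -> str:
    doc_vecs = np.stack([embed(d) for d in docs])
    context = "\n\n".join(top_k(embed(query), doc_vecs, docs))
    return call_llm(f"Answer using only this context:\n{context}\n\nQuestion: {query}")
```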
Advanced Retrieval Strategies
KRAGEN
Integrates knowledge graphs with advanced prompting. Breaks down complex problems into smaller sub-problems, retrieves relevant information for each through RAG, and consolidates the results into a more accurate and transparent response
Uses the LLM to plan, decompose, and orchestrate retrievals. Breaks a user's query into steps, retrieves relevant information for each step (from one or more sources), and then stitches the pieces together into a final coherent response
Complex research queries Investigative journalism
Adaptive Retrieval Mechanisms
Self-RAG
Model decides when to retrieve information dynamically. Uses special tokens to control retrieval timing and quality assessment
Conversational AI Adapts retrieval based on query complexity
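Self-RAG itself relies on specially trained reflection tokens, but its control flow can be approximated with plain prompts. The sketch below is that toy approximation, not the trained model; `call_llm` and `retrieve` are hypothetical stand-ins.

```python
# Toy approximation of Self-RAG's adaptive retrieval. The real model emits
# trained reflection tokens; here the decision and critique are plain prompts.
def adaptive_answer(call_llm, retrieve, query: str) -> str:
    decision = call_llm(
        "Does answering this query require external documents? "
        f"Reply RETRIEVE or NO_RETRIEVE.\n\nQuery: {query}"
    )
    if "NO_RETRIEVE" in decision:
        return call_llm(query)  # simple queries skip retrieval entirely
    context = "\n\n".join(retrieve(query))
    answer = call_llm(f"Context:\n{context}\n\nQuery: {query}")
    # Stand-in for the critique tokens: grade whether the answer is grounded.
    verdict = call_llm(
        f"Context:\n{context}\n\nAnswer:\n{answer}\n\n"
        "Is the answer fully supported by the context? Reply YES or NO."
    )
    return answer if "YES" in verdict else call_llm(query)
```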
RAPTOR
Processes documents hierarchically using recursive clustering and summarization, constructing a tree with differing levels of summarization from the bottom up
Research papers Legal documents Comprehensive reports Complex PDFs
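Here is a sketch of that bottom-up tree construction. The real method clusters chunks by embedding (e.g., with Gaussian mixtures); this simplified version groups adjacent chunks, and `summarize` is a hypothetical LLM call.

```python
# RAPTOR-style bottom-up tree sketch: each level summarizes groups of nodes
# from the level below until a single root summary remains.
def build_tree(summarize, chunks: list[str], fanout: int = 4) -> list[list[str]]:
    levels = [chunks]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        # Summarize each group of `fanout` nodes into one parent node.
        parents = [
            summarize("\n\n".join(prev[i:i + fanout]))
            for i in range(0, len(prev), fanout)
        ]
        levels.append(parents)
    return levels  # levels[0] = leaf chunks, levels[-1] = root summary
```

At query time, retrieval can then match against nodes at any level, so broad questions hit high-level summaries while detailed ones hit leaves.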
HippoRAG
Memory-inspired retrieval architecture that synergistically orchestrates LLMs, knowledge graphs, and the Personalized PageRank algorithm to mimic the different roles of neocortex and hippocampus in human memory
Scientific literature reviews Legal case briefings Medical diagnosis Synthesis of information from various sources
Knowledge Graph Integration and Structured Retrieval
KAPING
Retrieves facts using semantic similarity, combining the structure of the semantic network with the information content of its concepts
Document classification Content recommendation systems Knowledge graph applications
KARPA
Training-free KG adaptation with pre-planning, semantic matching, and relation path reasoning
Sequential reasoning over knowledge graphs to locate relevant triples, conducting exploration to retrieve related information from external databases while generating multiple reasoning paths
Iterative reading and reasoning approach that constructs specialized functions to collect relevant evidence from structured data sources
Database querying Structured data analysis
Agentic and Modular Retrieval Systems
AgenticRAG
Embeds autonomous AI agents into the RAG pipeline using agentic design patterns (reflection, planning, tool use, multi-agent collaboration) to dynamically manage retrieval strategies
Multi-agent collaborative reasoning Personal assistants Adaptive question answering systems Access to various data sources (emails, docs, web search)
Graph Enhanced RAG Systems
Incorporates knowledge graphs into RAG to capture structured relationships between entities, enabling multi-hop reasoning and deeper contextual retrieval using graph traversal and Cypher queries alongside vector similarity search (e.g. Microsoft's GraphRAG)
Enterprise knowledge bases with complex relationships Legal document analysis Scientific literature review Recommendation systems with rich entity relationships
Real Time RAG
Processes streaming data in real-time by constructing evolving knowledge graphs that capture scene-object-entity relationships as data arrives, using lightweight models and dynamic priority-based knowledge extraction
News monitoring Market analysis Intelligent transportation systems Healthcare monitoring Satellite remote sensing Financial trading systems
Dynamic Context Assembly
Assembly Functions and Orchestration Mechanisms
Template-based formatting
Consistent structure for context assembly
API responses Report generation
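A minimal sketch of template-based assembly: every request is rendered into the same structure, so downstream parsing and evaluation stay predictable. The section names here are illustrative.

```python
# A fixed template keeps assembled context consistent across requests.
CONTEXT_TEMPLATE = """\
## Task
{task}

## Retrieved context
{context}

## User question
{question}
"""

def assemble(task: str, context: str, question: str) -> str:
    return CONTEXT_TEMPLATE.format(task=task, context=context, question=question)
```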
Priority-based selection
Most important info first in context assembly
Mobile apps Real-time systems
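A sketch of greedy priority-based selection under a token budget; token counts are approximated by whitespace splitting for brevity, and a real system would use the model's tokenizer.

```python
# Greedy priority-based selection: highest-priority snippets are packed
# first until the context budget is exhausted.
def select(snippets: list[tuple[float, str]], budget: int) -> list[str]:
    chosen, used = [], 0
    for priority, text in sorted(snippets, key=lambda s: -s[0]):
        cost = len(text.split())  # crude token-count proxy
        if used + cost <= budget:
            chosen.append(text)
            used += cost
    return chosen
```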
Adaptive composition
Adjusts to task requirements and model capabilities
Multi-domain applications Scalable AI systems
Multi-Component Integration Strategies
Multi-Component Integration
Combines text, structured knowledge, temporal data, and external tools while maintaining coherent semantic relationships
Enterprise AI systems Comprehensive analytics platforms
APE (Automatic Prompt Engineer)
Uses search algorithms to find optimal prompts automatically. Treats the instruction as a 'program,' optimized by searching over a pool of instruction candidates proposed by an LLM to maximize a chosen score function
Production systems needing optimal prompts AI Content generation AI-powered chatbots
LM-BFF
Automated pipelines that combine prompt-based fine-tuning with dynamic demonstrations; a suite of simple techniques for fine-tuning pre-trained language models on a small number of annotated examples
Entity extraction Classification tasks Domain adaptation with limited examples
Promptbreeder
A self-referential evolutionary system in which LLMs improve their own task-prompts and mutation-prompts through a 'natural selection' analogy, evolving and adapting prompts for a given domain with an evolutionary algorithm
Autonomous AI systems Continuous learning applications Domain-specific optimization
LangChain
Sequential processing chains, agents, and web-browsing capabilities. Provides building blocks for AI applications with extensive integrations
Chatbots Data retrieval and processing Internal knowledge management RAG applications
AutoGPT/AutoGen
Complex AI agent development with user-friendly interfaces. Multi-agent conversation orchestration with event-driven architecture
Code generation Personalized content creation at scale Automated workflows Research tasks
Context Processing
Long Context Processing
Ultra-long Sequence Context Processing
Standard Transformer (Self-Attention, O(n²))
Baseline self-attention with quadratic complexity
General LLM Tasks Question answering Summarization
Architectural Innovations for Long Context
State Space Models (SSMs, e.g. Mamba)
Linear complexity with constant memory through hidden states
Real-time language modeling Streaming data Online inference with constant memory
Dilated Attention (e.g. LongNet)
Exponentially expanding attention fields
Genomics Legal/financial document analysis Logs
Toeplitz Neural Networks
Models sequences with relative positions encoded as Toeplitz matrices, reducing space-time complexity to log-linear
Code generation Document QA Long sequence forecasting
Linear Attention
Expresses self-attention as a linear dot product of kernel feature maps
Scientific papers Book summarization Historical document analysis
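The trick in one sketch: with a feature map φ, attention can be computed as φ(Q)(φ(K)ᵀV), which is O(n) in sequence length instead of O(n²). This uses the common elu(x)+1 feature map and omits causal masking, which real implementations add.

```python
# Linear attention sketch (non-causal, for illustration).
import numpy as np

def phi(x: np.ndarray) -> np.ndarray:
    # elu(x) + 1: a common positive feature map for linear attention.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray) -> np.ndarray:
    q, k = phi(Q), phi(K)          # (n, d)
    kv = k.T @ V                   # (d, d_v): computed once, reused per query
    z = q @ k.sum(axis=0)          # (n,): per-query normalizers
    return (q @ kv) / z[:, None]   # (n, d_v)
```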
Non-attention LLMs (Recursive memory)
Uses recursive memory transformers, breaking long sequences into chunks. The model keeps a summary ('memory') of each chunk and, as it moves through the sequence, recursively updates and uses this memory
Chatbots with persistent and growing context Logging agents
Position Interpolation & Context Extension
NTK/YaRN
Neural Tangent Kernel (NTK) interpolation rescales the positional encodings, linear interpolation stretches the position indices, and an attention-distribution correction tweaks the attention mechanism
Fine-tuning LLMs to process longer docs for specialized tasks
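A sketch of the two basic tricks, assuming standard RoPE frequencies: linear interpolation squeezes positions back into the trained range, while the NTK-aware variant enlarges the rotary base so high-frequency dimensions change less. The base-scaling formula shown is the commonly used one; treat it as an assumption, not the only variant.

```python
# Two RoPE context-extension tricks, sketched on standard RoPE frequencies.
import numpy as np

def rope_freqs(dim: int, base: float = 10000.0) -> np.ndarray:
    # theta_i = base^(-2i/d) for each rotary dimension pair.
    return base ** (-np.arange(0, dim, 2) / dim)

def linear_interp_positions(positions: np.ndarray, scale: float) -> np.ndarray:
    # Trained on 4k, targeting 16k -> scale = 4: position 16000 acts like 4000.
    return positions / scale

def ntk_scaled_freqs(dim: int, scale: float, base: float = 10000.0) -> np.ndarray:
    # NTK-aware trick: grow the base so low frequencies stretch while high
    # frequencies stay near the trained regime.
    return rope_freqs(dim, base * scale ** (dim / (dim - 2)))
```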
LongRoPE
Identifies effective rescale factors for RoPE's rotation angles for each RoPE dimension based on token positions. Uses evolutionary search algorithm with progressive extension strategy to achieve 2048k context window
Rapidly adapting LLMs to massive token windows (256k–2M tokens) Patent or scientific literature mining Big data retrieval
PoSE
Positional Skip-wisE training decouples training length from target context length by dividing the original context window into chunks and applying distinct skipping bias terms to manipulate position indices during training
Data curation Codebase analysis Long meeting transcripts
Self-Extend (bi-level/grouped/neighbor attention)
Constructs bi-level attention: grouped attention for distant tokens using a FLOOR operation, and neighbor attention for adjacent tokens within a specified range
Plug-and-play context extension
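A simplified rendering of that position mapping (the constants here are illustrative): relative positions within the neighbor window keep their exact values, while distant positions are merged into coarse groups via FLOOR so no position index exceeds what the model saw in training.

```python
# Self-Extend-style bi-level position mapping (simplified sketch).
def mapped_position(rel_pos: int, group: int = 8, neighbor: int = 512) -> int:
    if rel_pos <= neighbor:
        return rel_pos  # neighbor attention: exact relative positions
    # Grouped attention: distant tokens share coarse group indices, offset
    # so the mapping continues smoothly after the neighbor window.
    return neighbor + (rel_pos - neighbor) // group
```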
Optimization Techniques for Efficient Processing
Grouped Query Attention (GQA)
Optimizes multi-head attention by sharing key and value projections across multiple query heads, reducing memory bandwidth requirements
Scaling up LLM Inference for cost savings
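A numpy sketch of the core idea: several query heads share each key/value head, shrinking the KV cache proportionally. Head counts here are illustrative.

```python
# Grouped-query attention sketch: 8 query heads share 2 KV heads (4x smaller
# KV cache). Each KV head is broadcast to its group of query heads.
import numpy as np

def gqa(Q, K, V, n_q_heads=8, n_kv_heads=2):
    # Q: (n_q_heads, n, d); K, V: (n_kv_heads, n, d)
    group = n_q_heads // n_kv_heads
    K = np.repeat(K, group, axis=0)  # broadcast each KV head to its group
    V = np.repeat(V, group, axis=0)
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V
```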
FlashAttention 1/2
Memory-efficient attention algorithm that fuses operations and uses tiling to reduce memory accesses while maintaining exact attention computation
Cloud inference Large batch serving Distributed training
Ring Attention
Distributes attention computation across multiple devices using ring communication pattern
Multi-GPU or TPU setups
Sparse Attention (LongLoRA, SinkLoRA)
Selectively attends to subset of tokens based on patterns or learned importance
RAG Selective context retrieval
Efficient Selective Attention
Dynamically selects important tokens for attention computation
Large codebases Multi-document Q&A
BigBird
Combines local, global, and random attention patterns in sparse attention mechanism
Biomedical data mining Knowledge bases Graph-structured documents
Memory Management & Context Compression
Rolling Buffer Cache
Maintains fixed-size cache with sliding window mechanism for token storage
Real-time applications with fixed compute or memory budget
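A minimal sketch using a fixed-size deque: only the most recent `window` keys and values are kept, so memory stays constant as the stream grows.

```python
# Rolling buffer KV-cache sketch: constant memory over unbounded streams.
from collections import deque

class RollingKVCache:
    def __init__(self, window: int):
        self.keys = deque(maxlen=window)
        self.values = deque(maxlen=window)

    def append(self, k, v):
        # Oldest entries are evicted automatically once the window is full.
        self.keys.append(k)
        self.values.append(v)

    def view(self):
        return list(self.keys), list(self.values)
```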
Streaming LLM
Processes continuous input streams with memory-efficient caching mechanisms
Live chat moderation Streaming summarization
Infini-attention
Combines compressive memory with local attention in single model block
Retrieval-based LLMs Systems requiring both detail and recall
Heavy Hitter Oracle
Identifies and prioritizes most important tokens/context for attention
LLM API deployment Latency-sensitive applications
QwenLong-CPRS, InfLLM
Multi-granularity memory management with compression techniques
Advanced compression and context handling for variable-length inputs
Dynamic media Streaming environments AI-driven broadcast
Relational & Structured Context
Knowledge Graph Embeddings and Neural Integration
Graph Neural Networks, GraphFormers, Heterformer
Neural architectures designed for graph-structured data processing
Scientific discovery Knowledge base QA Relational Reasoning
Verbalization & Structured Data Representations
Verbalization/Structured Data Reps
Converts structured data into natural language representations
Table QA Knowledge base integration Data extraction
Programming Language Reps (Python/SQL)
Uses programming languages as intermediate representations
Data engineering Code synthesis Database querying
Matrix Representations
Compact matrix-based representations of structured information
Lightweight edge deployments On-device ML
Integration Frameworks & Synergized Approaches
K-BERT (Pretrain), KAPING (Inference Time)
Injects structured knowledge during pretraining or inference
Medical/financial LLMs RAG
Unified Approaches (GreaseLM, QA-GNN)
Combines natural language fluency with knowledge graph reasoning
Open-domain QA Research assistants Scientific agents
Context Management
Memory Hierarchies & Storage Architectures
Virtual Memory Systems
MemGPT
Virtual memory management modeled on operating systems
Extended conversations Document analysis Personal companion systems Chatbots with persistent memory
PagedAttention
Efficient KV cache memory management with non-contiguous blocks
Cloud inference Large batch serving Production LLM deployment
Dynamic Memory Organizations
MemoryBank
Uses Ebbinghaus Forgetting Curve principles with dynamic memory strength adjustment
Personal companion systems Psychological counseling Long-term AI companions
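A sketch of the forgetting-curve mechanics, assuming the classic retention form R = e^(−t/S): retention decays with elapsed time, and each recall reinforces a memory's strength, slowing future decay. Units and thresholds here are illustrative.

```python
# Ebbinghaus-style memory decay with reinforcement on recall.
import math
import time

class Memory:
    def __init__(self, text: str, strength: float = 1.0):
        self.text, self.strength = text, strength
        self.last_access = time.time()

    def retention(self) -> float:
        elapsed_h = (time.time() - self.last_access) / 3600
        return math.exp(-elapsed_h / self.strength)  # R = e^(-t/S)

    def recall(self) -> str:
        self.strength += 1.0           # reinforcement slows future decay
        self.last_access = time.time()
        return self.text

def forget(memories: list[Memory], threshold: float = 0.1) -> list[Memory]:
    # Drop memories whose retention has decayed below the threshold.
    return [m for m in memories if m.retention() >= threshold]
```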
ReadAgent
Episode pagination, memory gisting, and interactive lookup for human-like reading
Long document analysis Research paper comprehension Multi-document Q&A
Compressor-Retriever Systems
Lifelong context management using base model functions to compress and retrieve content with end-to-end differentiability
Long-term learning systems Evolving knowledge bases
System Configurations
Centralized Systems
Excellent task coordination but poor scalability; context overflows as topics increase
Single-domain chatbots Focused task automation
Decentralized Systems
Reduced context overflow but increased response time due to inter-agent querying
Multi-agent systems Distributed knowledge processing
Hybrid Systems
Balances shared knowledge with specialized processing, semi-autonomous operation
Enterprise AI systems Complex conversational agents
Context Compression
Context Manager Components
Snapshot creation
Save intermediate states during processing
Long-running tasks Recovery systems
State restoration
Resume from previous points in processing
Recovery from interruptions
Window management
Overall context optimization and organization
Multi-step reasoning Complex document processing
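To show how the three components above fit together, here is a minimal sketch of a context manager with snapshotting, restoration, and window management; the class and method names are our own.

```python
# Minimal context-manager sketch: bounded working context, named snapshots
# saved mid-task, and restoration to resume after an interruption.
import copy

class ContextWindowManager:
    def __init__(self, window_limit: int):
        self.window_limit = window_limit
        self.context: list[str] = []
        self.snapshots: dict[str, list[str]] = {}

    def add(self, item: str):
        self.context.append(item)
        # Window management: keep only the most recent items.
        self.context = self.context[-self.window_limit:]

    def snapshot(self, name: str):
        # Snapshot creation: save an intermediate state during processing.
        self.snapshots[name] = copy.deepcopy(self.context)

    def restore(self, name: str):
        # State restoration: resume from a previously saved point.
        self.context = copy.deepcopy(self.snapshots[name])
```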
Context Compression Techniques
In-Context Autoencoder (ICAE)
Condenses long contexts into compact memory slots for direct conditioning
Memory-constrained environments
Recurrent Context Compression (RCC)
Expands context window length in constrained storage using instruction reconstruction techniques
Edge computing Resource-limited deployment
Memory Augmented Approaches
kNN-Based Memory Caches
Stores key-value pairs and uses contrastive learning to improve retrieval accuracy
Conversation systems Recommendation engines
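A sketch of the cache itself, with cosine-similarity lookup; the contrastive training that sharpens retrieval in the real systems is out of scope here.

```python
# kNN memory cache sketch: store (embedding, value) pairs, retrieve the
# k nearest by cosine similarity.
import numpy as np

class KNNMemory:
    def __init__(self):
        self.keys: list[np.ndarray] = []
        self.values: list[str] = []

    def add(self, key: np.ndarray, value: str):
        # Normalize on insert so lookup reduces to a dot product.
        self.keys.append(key / (np.linalg.norm(key) + 1e-9))
        self.values.append(value)

    def lookup(self, query: np.ndarray, k: int = 5) -> list[str]:
        q = query / (np.linalg.norm(query) + 1e-9)
        sims = np.stack(self.keys) @ q
        return [self.values[i] for i in np.argsort(-sims)[:k]]
```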
Hierarchical Caching Systems
Hierarchical Caching
Multi-layer caching systems like Activation Refilling (ACRE)
Enterprise systems with multiple data tiers
Infinite-LLM with DistAttention
Handles extremely long sequences using distributed attention mechanisms
Large-scale document processing
KCache
Splits the KV cache: the K cache stays in high-bandwidth memory while the V cache moves to CPU memory, balancing inference speed and memory usage
High-performance serving Cost optimization
Multi-agent Distributive Processing
Multi-Agent Distributive Processing
Handles massive inputs in distributed manner with high cache reusability in RAG and agent workloads
Large-scale knowledge processing Enterprise RAG
Cache Access Pattern Analysis
Exploits high-reusability cache-access patterns in RAG and agent applications to reduce redundancy
Production RAG systems Agent platforms
A More Effective Way to Build Smarter Language Models
If you’ve built production machine learning (ML) systems, you’ve likely run into these challenges: your model struggles with domain-specific tasks, can’t access real-time information, or fails to deliver personalized responses. The traditional solution of creating massive, specialized datasets is often incredibly expensive and time-consuming.
Context engineering offers a powerful alternative. It’s the process of finding, preparing, and managing the right information to give your model at the exact moment it’s needed (at inference time). This approach provides your model with targeted, relevant knowledge, drastically reducing the need for costly retraining.
The world of AI will continue to evolve, but the principle of supplying models with relevant, external information will always be critical. Instead of waiting years for the perfect dataset, you can use context engineering to get immediate performance boosts for your specialized and dynamic applications.
Below, we’ve also included a list of additional resources that we’ve found most effective for deploying these techniques in real-world systems.