
LangChain & LlamaIndex

LangChain core (chains, prompts, memory, output parsers, LCEL), LlamaIndex (nodes, indices, query engines), comparison, common patterns



LangChain and LlamaIndex are the two dominant frameworks for building LLM applications. While they overlap significantly, each has distinct strengths: LangChain excels at composable chains and agent workflows, while LlamaIndex is purpose-built for structured data retrieval and indexing.

LangChain Core Concepts

Installation

```bash
pip install langchain langchain-openai langchain-community langchain-chroma
```

Prompt Templates

Prompt templates let you create reusable, parameterized prompts:

```python
from langchain.prompts import ChatPromptTemplate, FewShotChatMessagePromptTemplate

# Basic template
prompt = ChatPromptTemplate.from_template(
    "Explain {topic} to a {audience} in {num_sentences} sentences."
)

# Format with variables
formatted = prompt.format(
    topic="neural networks",
    audience="5-year-old",
    num_sentences=3,
)
print(formatted)

# Multi-message template (system + human)
chat_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a {role}. Always respond in {language}."),
    ("human", "{question}"),
])

messages = chat_prompt.format_messages(
    role="helpful coding tutor",
    language="simple English",
    question="What is recursion?",
)

# Few-shot template
examples = [
    {"input": "happy", "output": "sad"},
    {"input": "tall", "output": "short"},
    {"input": "fast", "output": "slow"},
]

example_prompt = ChatPromptTemplate.from_messages([
    ("human", "{input}"),
    ("ai", "{output}"),
])

few_shot_prompt = FewShotChatMessagePromptTemplate(
    example_prompt=example_prompt,
    examples=examples,
)

final_prompt = ChatPromptTemplate.from_messages([
    ("system", "You give the opposite of each word."),
    few_shot_prompt,
    ("human", "{input}"),
])
```

Output Parsers

Output parsers convert raw LLM text into structured Python objects:

```python
from langchain.output_parsers import PydanticOutputParser
from langchain.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field

# Define output schema
class MovieReview(BaseModel):
    title: str = Field(description="The movie title")
    rating: float = Field(description="Rating from 1 to 10")
    summary: str = Field(description="One-sentence summary")
    pros: list[str] = Field(description="List of positive aspects")
    cons: list[str] = Field(description="List of negative aspects")

parser = PydanticOutputParser(pydantic_object=MovieReview)

prompt = ChatPromptTemplate.from_template(
    """Review the following movie and provide structured output.

{format_instructions}

Movie: {movie}"""
)

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

chain = prompt.partial(format_instructions=parser.get_format_instructions()) | llm | parser

review = chain.invoke({"movie": "Inception"})
print(f"Title: {review.title}")
print(f"Rating: {review.rating}")
print(f"Pros: {review.pros}")
```

Using with_structured_output (preferred approach)

```python
# Even simpler: use the model's native structured output
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
structured_llm = llm.with_structured_output(MovieReview)

review = structured_llm.invoke("Review the movie Inception")
print(type(review))  # MovieReview
print(review.rating)
```

LCEL (LangChain Expression Language)

LCEL is LangChain's declarative way to compose chains with the pipe operator (`|`). Every component is a `Runnable` exposing `invoke()`, `stream()`, and `batch()` methods, so chains are built like `prompt | llm | parser`. LCEL gives you streaming, async support, parallel execution, and fallbacks without changing your chain logic.

LCEL Chains

```python
from langchain_openai import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableParallel

llm = ChatOpenAI(model="gpt-4o-mini")

# Simple chain: prompt -> LLM -> string
chain = (
    ChatPromptTemplate.from_template("Tell a joke about {topic}")
    | llm
    | StrOutputParser()
)

result = chain.invoke({"topic": "programming"})
print(result)

# Streaming
for chunk in chain.stream({"topic": "AI"}):
    print(chunk, end="", flush=True)

# Parallel chains
analysis_chain = RunnableParallel(
    summary=ChatPromptTemplate.from_template("Summarize: {text}") | llm | StrOutputParser(),
    sentiment=ChatPromptTemplate.from_template("What is the sentiment of: {text}") | llm | StrOutputParser(),
    keywords=ChatPromptTemplate.from_template("Extract 5 keywords from: {text}") | llm | StrOutputParser(),
)

results = analysis_chain.invoke({"text": "LangChain makes it easy to build LLM applications..."})
print(results["summary"])
print(results["sentiment"])
print(results["keywords"])

# Chain with fallbacks
primary_llm = ChatOpenAI(model="gpt-4o")
fallback_llm = ChatOpenAI(model="gpt-4o-mini")

reliable_llm = primary_llm.with_fallbacks([fallback_llm])
chain = ChatPromptTemplate.from_template("{question}") | reliable_llm | StrOutputParser()
```

Conversation Memory

```python
from langchain_openai import ChatOpenAI
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

llm = ChatOpenAI(model="gpt-4o-mini")

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}"),
])

chain = prompt | llm

# Session-based memory store
store = {}

def get_session_history(session_id: str):
    if session_id not in store:
        store[session_id] = InMemoryChatMessageHistory()
    return store[session_id]

# Wrap chain with message history
chain_with_history = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="history",
)

# Conversation
config = {"configurable": {"session_id": "user_123"}}

response1 = chain_with_history.invoke(
    {"input": "My name is Alice. I'm learning about RAG."},
    config=config,
)
print(response1.content)

response2 = chain_with_history.invoke(
    {"input": "What am I learning about?"},
    config=config,
)
print(response2.content)  # "You mentioned you're learning about RAG!"
```

LlamaIndex Core Concepts

LlamaIndex is designed specifically for connecting LLMs to data. Its core abstraction is the index --- a data structure that organizes your documents for efficient LLM querying.

Installation

```bash
pip install llama-index llama-index-llms-openai llama-index-embeddings-openai
```

Documents and Nodes

```python
from llama_index.core import Document, VectorStoreIndex, Settings
from llama_index.core.node_parser import SentenceSplitter
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

# Configure global settings
Settings.llm = OpenAI(model="gpt-4o-mini", temperature=0)
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")

# Create documents
documents = [
    Document(
        text="RAG systems combine retrieval with generation for accurate answers.",
        metadata={"source": "rag_guide", "chapter": 1},
    ),
    Document(
        text="Vector databases store embeddings for fast similarity search.",
        metadata={"source": "vector_db_guide", "chapter": 1},
    ),
]

# Parse documents into nodes (LlamaIndex's term for chunks)
parser = SentenceSplitter(chunk_size=512, chunk_overlap=50)
nodes = parser.get_nodes_from_documents(documents)

print(f"Created {len(nodes)} nodes")
for node in nodes:
    print(f"  Node: {node.text[:80]}...")
    print(f"  Metadata: {node.metadata}")
```

Building and Querying an Index

```python
from llama_index.core import VectorStoreIndex, StorageContext, load_index_from_storage

# Build an index from documents
index = VectorStoreIndex.from_documents(documents)

# Query the index
query_engine = index.as_query_engine(
    similarity_top_k=3,
    response_mode="compact",  # Options: compact, tree_summarize, refine, no_text
)

response = query_engine.query("What are vector databases used for?")
print(response)
print(f"Source nodes: {len(response.source_nodes)}")
for node in response.source_nodes:
    print(f"  Score: {node.score:.4f} | {node.text[:60]}...")

# Persist and reload
index.storage_context.persist(persist_dir="./storage")

# Later: reload from disk
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)
```

Advanced Query Engines

```python
from llama_index.core import VectorStoreIndex, SummaryIndex
from llama_index.core.tools import QueryEngineTool
from llama_index.core.query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector

# Create specialized indices (technical_docs and overview_docs are
# lists of Documents loaded elsewhere)
vector_index = VectorStoreIndex.from_documents(technical_docs)
summary_index = SummaryIndex.from_documents(overview_docs)

# Create query engine tools
vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_index.as_query_engine(),
    description="Best for specific technical questions about implementation details.",
)

summary_tool = QueryEngineTool.from_defaults(
    query_engine=summary_index.as_query_engine(response_mode="tree_summarize"),
    description="Best for high-level summaries and overviews of topics.",
)

# Router decides which engine to use based on the query
router_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=[vector_tool, summary_tool],
)

# The router automatically picks the right engine
response = router_engine.query("Give me an overview of RAG systems")       # Uses summary
response = router_engine.query("What chunk_size should I use with HNSW?")  # Uses vector
```

LangChain vs LlamaIndex: When to Use Which

Use LangChain when you need flexible chain composition, agent workflows, integration with many tools (APIs, databases, web), or complex multi-step reasoning. Use LlamaIndex when your primary task is document retrieval/QA, you need structured index management, you want built-in evaluation, or you need advanced retrieval patterns (router, sub-question). Many production systems use both: LlamaIndex for the retrieval layer and LangChain for orchestration and agents.

Common Patterns

Pattern 1: Conversational RAG (LangChain)

```python
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_chroma import Chroma
from langchain.chains import create_retrieval_chain, create_history_aware_retriever
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder

llm = ChatOpenAI(model="gpt-4o-mini")
vectorstore = Chroma(embedding_function=OpenAIEmbeddings())
retriever = vectorstore.as_retriever()

# Contextualize question based on chat history
contextualize_prompt = ChatPromptTemplate.from_messages([
    ("system", "Given chat history and a new question, reformulate the question to be standalone."),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}"),
])

history_aware_retriever = create_history_aware_retriever(
    llm, retriever, contextualize_prompt
)

# Answer prompt
answer_prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer based on context:\n\n{context}"),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}"),
])

question_answer_chain = create_stuff_documents_chain(llm, answer_prompt)
rag_chain = create_retrieval_chain(history_aware_retriever, question_answer_chain)

# Multi-turn conversation
chat_history = []
result = rag_chain.invoke({"input": "What is RAG?", "chat_history": chat_history})
chat_history.extend([("human", "What is RAG?"), ("ai", result["answer"])])

result = rag_chain.invoke({"input": "How does it reduce hallucinations?", "chat_history": chat_history})
# The retriever understands "it" refers to RAG from the history
```

Pattern 2: Sub-Question Query Engine (LlamaIndex)

```python
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.tools import QueryEngineTool

# Create query engine tools for different document sets
# (ml_index and db_index are indices built elsewhere)
tools = [
    QueryEngineTool.from_defaults(
        query_engine=ml_index.as_query_engine(),
        description="Contains information about machine learning concepts.",
    ),
    QueryEngineTool.from_defaults(
        query_engine=db_index.as_query_engine(),
        description="Contains information about databases and data storage.",
    ),
]

# Sub-question engine decomposes complex queries
sub_question_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=tools,
)

# A complex question gets broken into sub-questions
response = sub_question_engine.query(
    "Compare how vector databases and traditional databases handle ML workloads"
)
# Internally generates:
#   Sub-Q1: "How do vector databases handle ML workloads?" -> ml_tool + db_tool
#   Sub-Q2: "How do traditional databases handle ML workloads?" -> db_tool
# then synthesizes a final answer
```