LangChain & LlamaIndex
LangChain and LlamaIndex are the two dominant frameworks for building LLM applications. While they overlap significantly, each has distinct strengths: LangChain excels at composable chains and agent workflows, while LlamaIndex is purpose-built for structured data retrieval and indexing.
LangChain Core Concepts
Installation
pip install langchain langchain-openai langchain-community langchain-chroma
Prompt Templates
Prompt templates let you create reusable, parameterized prompts:
from langchain.prompts import ChatPromptTemplate, FewShotChatMessagePromptTemplate

# Basic template
prompt = ChatPromptTemplate.from_template(
    "Explain {topic} to a {audience} in {num_sentences} sentences."
)

# Format with variables
formatted = prompt.format(
    topic="neural networks",
    audience="5-year-old",
    num_sentences=3,
)
print(formatted)

# Multi-message template (system + human)
chat_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a {role}. Always respond in {language}."),
    ("human", "{question}"),
])

messages = chat_prompt.format_messages(
    role="helpful coding tutor",
    language="simple English",
    question="What is recursion?",
)
# Few-shot template
examples = [
    {"input": "happy", "output": "sad"},
    {"input": "tall", "output": "short"},
    {"input": "fast", "output": "slow"},
]

example_prompt = ChatPromptTemplate.from_messages([
    ("human", "{input}"),
    ("ai", "{output}"),
])

few_shot_prompt = FewShotChatMessagePromptTemplate(
    example_prompt=example_prompt,
    examples=examples,
)

final_prompt = ChatPromptTemplate.from_messages([
    ("system", "You give the opposite of each word."),
    few_shot_prompt,
    ("human", "{input}"),
])
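What the few-shot template ultimately produces is a flat message list with the example pairs interleaved between the system message and the final human turn. A plain-Python sketch of that rendering (the function name is illustrative, not part of LangChain):

```python
def render_few_shot(system: str, examples: list[dict], user_input: str) -> list[tuple]:
    """Interleave (human, ai) example pairs between system and final human message."""
    messages = [("system", system)]
    for ex in examples:
        messages.append(("human", ex["input"]))
        messages.append(("ai", ex["output"]))
    messages.append(("human", user_input))
    return messages

examples = [
    {"input": "happy", "output": "sad"},
    {"input": "tall", "output": "short"},
]
msgs = render_few_shot("You give the opposite of each word.", examples, "fast")
for role, text in msgs:
    print(f"{role}: {text}")
```

Seeing the flattened list makes it clear why few-shot examples steer the model: from the model's perspective they are indistinguishable from real prior turns of the conversation.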
Output Parsers
Structured output from LLMs:
from langchain.output_parsers import PydanticOutputParser
from langchain.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field

# Define output schema
class MovieReview(BaseModel):
    title: str = Field(description="The movie title")
    rating: float = Field(description="Rating from 1 to 10")
    summary: str = Field(description="One-sentence summary")
    pros: list[str] = Field(description="List of positive aspects")
    cons: list[str] = Field(description="List of negative aspects")

parser = PydanticOutputParser(pydantic_object=MovieReview)
prompt = ChatPromptTemplate.from_template(
"""Review the following movie and provide structured output.
{format_instructions}
Movie: {movie}"""
)
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
chain = prompt.partial(format_instructions=parser.get_format_instructions()) | llm | parser
review = chain.invoke({"movie": "Inception"})
print(f"Title: {review.title}")
print(f"Rating: {review.rating}")
print(f"Pros: {review.pros}")
Using with_structured_output (preferred approach)
# Even simpler: use the model's native structured output
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
structured_llm = llm.with_structured_output(MovieReview)

review = structured_llm.invoke("Review the movie Inception")
print(type(review))  # MovieReview
print(review.rating)
LCEL (LangChain Expression Language)
LCEL Chains
from langchain_openai import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough, RunnableParallel

llm = ChatOpenAI(model="gpt-4o-mini")

# Simple chain: prompt -> LLM -> string
chain = (
    ChatPromptTemplate.from_template("Tell a joke about {topic}")
    | llm
    | StrOutputParser()
)

result = chain.invoke({"topic": "programming"})
print(result)
# Streaming
for chunk in chain.stream({"topic": "AI"}):
    print(chunk, end="", flush=True)

# Parallel chains
analysis_chain = RunnableParallel(
    summary=ChatPromptTemplate.from_template("Summarize: {text}") | llm | StrOutputParser(),
    sentiment=ChatPromptTemplate.from_template("What is the sentiment of: {text}") | llm | StrOutputParser(),
    keywords=ChatPromptTemplate.from_template("Extract 5 keywords from: {text}") | llm | StrOutputParser(),
)

results = analysis_chain.invoke({"text": "LangChain makes it easy to build LLM applications..."})
print(results["summary"])
print(results["sentiment"])
print(results["keywords"])
# Chain with fallbacks
primary_llm = ChatOpenAI(model="gpt-4o")
fallback_llm = ChatOpenAI(model="gpt-4o-mini")

reliable_llm = primary_llm.with_fallbacks([fallback_llm])
chain = ChatPromptTemplate.from_template("{question}") | reliable_llm | StrOutputParser()
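The fallback semantics are simple: try the primary runnable, and only if it raises, try the next one in order. A minimal, dependency-free sketch of that behavior (the class and function names are illustrative, not LangChain's internals):

```python
class WithFallbacks:
    """Toy stand-in for a runnable with fallbacks: try each callable in order."""

    def __init__(self, runnables):
        self.runnables = runnables

    def invoke(self, value):
        last_error = None
        for run in self.runnables:
            try:
                return run(value)
            except Exception as exc:  # the real library lets you filter exception types
                last_error = exc
        raise last_error

def flaky_primary(q):
    raise RuntimeError("rate limited")

def fallback(q):
    return f"fallback answer to: {q}"

reliable = WithFallbacks([flaky_primary, fallback])
print(reliable.invoke("What is LCEL?"))  # primary raises, so the fallback answers
```

Note that a fallback only fires on an exception; a low-quality but successful response from the primary model will never trigger it.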
Conversation Memory
from langchain_openai import ChatOpenAI
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

llm = ChatOpenAI(model="gpt-4o-mini")

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}"),
])
chain = prompt | llm
# Session-based memory store
store = {}

def get_session_history(session_id: str):
    if session_id not in store:
        store[session_id] = InMemoryChatMessageHistory()
    return store[session_id]
# Wrap chain with message history
chain_with_history = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="history",
)

# Conversation
config = {"configurable": {"session_id": "user_123"}}

response1 = chain_with_history.invoke(
    {"input": "My name is Alice. I'm learning about RAG."},
    config=config,
)
print(response1.content)

response2 = chain_with_history.invoke(
    {"input": "What am I learning about?"},
    config=config,
)
print(response2.content)  # "You mentioned you're learning about RAG!"
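Conceptually, the history wrapper just brackets each call with a history read and a history write. A rough dependency-free sketch of that lifecycle (illustrative, not the library's actual implementation):

```python
class ToyHistoryWrapper:
    """Load history for the session, run the chain, append the new turn."""

    def __init__(self, chain, get_history):
        self.chain = chain
        self.get_history = get_history

    def invoke(self, user_input: str, session_id: str) -> str:
        history = self.get_history(session_id)    # read
        reply = self.chain(history, user_input)   # run with history injected
        history.append(("human", user_input))     # write back both turns
        history.append(("ai", reply))
        return reply

store = {}

def get_history(session_id):
    return store.setdefault(session_id, [])

def echo_chain(history, user_input):
    return f"({len(history)} prior messages) you said: {user_input}"

bot = ToyHistoryWrapper(echo_chain, get_history)
print(bot.invoke("hi", "s1"))     # (0 prior messages) ...
print(bot.invoke("again", "s1"))  # (2 prior messages) ...
```

This is why the `session_id` in the `config` dict matters: it selects which history object gets read and written, keeping concurrent users' conversations separate.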
LlamaIndex Core Concepts
LlamaIndex is designed specifically for connecting LLMs to data. Its core abstraction is the index: a data structure that organizes your documents for efficient LLM querying.
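Before reaching for the library, it helps to see what a vector index does at its core: store one vector per chunk of text, and answer queries by a top-k similarity scan. The sketch below is deliberately naive and dependency-free (bag-of-words counts stand in for real embeddings; every name is illustrative):

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class ToyVectorIndex:
    """Stores (vector, text) pairs; retrieval is a top-k similarity scan."""

    def __init__(self, texts):
        self.entries = [(embed(t), t) for t in texts]

    def query(self, question: str, top_k: int = 1):
        q = embed(question)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[0]), reverse=True)
        return [text for _, text in ranked[:top_k]]

index = ToyVectorIndex([
    "Vector databases store embeddings for similarity search.",
    "LlamaIndex connects LLMs to your data.",
])
print(index.query("what stores embeddings?"))
```

Real indices replace the count vectors with learned embeddings and the linear scan with an approximate-nearest-neighbor structure, but the interface, build once, then query by similarity, is the same shape you will see below.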
Installation
pip install llama-index llama-index-llms-openai llama-index-embeddings-openai
Documents and Nodes
from llama_index.core import Document, VectorStoreIndex, Settings
from llama_index.core.node_parser import SentenceSplitter
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

# Configure global settings
Settings.llm = OpenAI(model="gpt-4o-mini", temperature=0)
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")

# Create documents
documents = [
    Document(
        text="RAG systems combine retrieval with generation for accurate answers.",
        metadata={"source": "rag_guide", "chapter": 1},
    ),
    Document(
        text="Vector databases store embeddings for fast similarity search.",
        metadata={"source": "vector_db_guide", "chapter": 1},
    ),
]

# Parse documents into nodes (LlamaIndex's term for chunks)
parser = SentenceSplitter(chunk_size=512, chunk_overlap=50)
nodes = parser.get_nodes_from_documents(documents)
print(f"Created {len(nodes)} nodes")
for node in nodes:
    print(f"  Node: {node.text[:80]}...")
    print(f"  Metadata: {node.metadata}")
Building and Querying an Index
from llama_index.core import VectorStoreIndex, StorageContext, load_index_from_storage

# Build an index from documents
index = VectorStoreIndex.from_documents(documents)

# Query the index
query_engine = index.as_query_engine(
    similarity_top_k=3,
    response_mode="compact",  # Options: compact, tree_summarize, refine, no_text
)

response = query_engine.query("What are vector databases used for?")
print(response)
print(f"Source nodes: {len(response.source_nodes)}")
for node in response.source_nodes:
    print(f"  Score: {node.score:.4f} | {node.text[:60]}...")
# Persist and reload
index.storage_context.persist(persist_dir="./storage")

# Later: reload from disk
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)
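The `response_mode` options differ in how retrieved chunks are synthesized into one answer: roughly, `compact` stuffs as much context as fits into as few prompts as possible, while `refine` makes one LLM call per chunk, revising the running answer each time. A dependency-free sketch of the refine loop, with a stub standing in for the LLM call (all names here are illustrative, not LlamaIndex internals):

```python
def fake_llm(prompt: str) -> str:
    """Stub LLM: reports how much context it was given."""
    return f"answer drawing on {len(prompt)} chars of context"

def refine_synthesize(question: str, chunks: list[str]) -> str:
    """One LLM call per chunk; each call may revise the previous answer."""
    answer = ""
    for chunk in chunks:
        if not answer:
            prompt = f"Context: {chunk}\nQuestion: {question}\nAnswer:"
        else:
            prompt = (
                f"Existing answer: {answer}\n"
                f"New context: {chunk}\n"
                f"Refine the answer to: {question}"
            )
        answer = fake_llm(prompt)
    return answer

chunks = ["Vector DBs use ANN search.", "HNSW is a common ANN index."]
print(refine_synthesize("How do vector DBs search?", chunks))
```

The trade-off follows directly from the loop: refine can incorporate more context than fits in one prompt, at the cost of one LLM round-trip per chunk, which is why `compact` is the usual default.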
Advanced Query Engines
from llama_index.core import VectorStoreIndex, SummaryIndex
from llama_index.core.tools import QueryEngineTool
from llama_index.core.query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector

# Create specialized indices
vector_index = VectorStoreIndex.from_documents(technical_docs)
summary_index = SummaryIndex.from_documents(overview_docs)

# Create query engine tools
vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_index.as_query_engine(),
    description="Best for specific technical questions about implementation details.",
)

summary_tool = QueryEngineTool.from_defaults(
    query_engine=summary_index.as_query_engine(response_mode="tree_summarize"),
    description="Best for high-level summaries and overviews of topics.",
)

# Router decides which engine to use based on the query
router_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=[vector_tool, summary_tool],
)

# The router automatically picks the right engine
response = router_engine.query("Give me an overview of RAG systems")  # Uses summary
response = router_engine.query("What chunk_size should I use with HNSW?")  # Uses vector
LangChain vs LlamaIndex: When to Use Which
Reach for LangChain when your application is dominated by composition: multi-step chains, agent workflows, conversation memory, streaming, and fallbacks. Reach for LlamaIndex when it is dominated by data: loading, chunking, indexing, and routing queries over document collections. The two are not mutually exclusive; many applications use LlamaIndex for retrieval inside a LangChain orchestration layer.
Common Patterns
Pattern 1: Conversational RAG (LangChain)
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_chroma import Chroma
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_history_aware_retriever
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder

llm = ChatOpenAI(model="gpt-4o-mini")
vectorstore = Chroma(embedding_function=OpenAIEmbeddings())
retriever = vectorstore.as_retriever()
# Contextualize question based on chat history
contextualize_prompt = ChatPromptTemplate.from_messages([
    ("system", "Given chat history and a new question, reformulate the question to be standalone."),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}"),
])

history_aware_retriever = create_history_aware_retriever(
    llm, retriever, contextualize_prompt
)
# Answer prompt
answer_prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer based on context:\n\n{context}"),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}"),
])

question_answer_chain = create_stuff_documents_chain(llm, answer_prompt)
rag_chain = create_retrieval_chain(history_aware_retriever, question_answer_chain)

# Multi-turn conversation
chat_history = []
result = rag_chain.invoke({"input": "What is RAG?", "chat_history": chat_history})
chat_history.extend([("human", "What is RAG?"), ("ai", result["answer"])])

result = rag_chain.invoke({"input": "How does it reduce hallucinations?", "chat_history": chat_history})
# The retriever understands "it" refers to RAG from the history
Pattern 2: Sub-Question Query Engine (LlamaIndex)
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.tools import QueryEngineTool

# Create query engine tools for different document sets
tools = [
    QueryEngineTool.from_defaults(
        query_engine=ml_index.as_query_engine(),
        description="Contains information about machine learning concepts.",
    ),
    QueryEngineTool.from_defaults(
        query_engine=db_index.as_query_engine(),
        description="Contains information about databases and data storage.",
    ),
]

# Sub-question engine decomposes complex queries
sub_question_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=tools,
)

# A complex question gets broken into sub-questions
response = sub_question_engine.query(
    "Compare how vector databases and traditional databases handle ML workloads"
)
# Internally generates:
#   Sub-Q1: "How do vector databases handle ML workloads?" -> ml_tool + db_tool
#   Sub-Q2: "How do traditional databases handle ML workloads?" -> db_tool
# Then synthesizes a final answer
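The decompose-route-synthesize shape of the sub-question engine can be sketched in plain Python. Here stub functions stand in for the LLM calls that do the decomposition and synthesis, and a keyword lookup stands in for the tool selector (every name is illustrative, not LlamaIndex's API):

```python
def decompose(question: str) -> list[str]:
    """Stub for the LLM decomposition step: split a compare-style question."""
    if question.lower().startswith("compare"):
        body = question[len("compare"):].strip()
        left, _, right = body.partition(" and ")  # naive split, purely for illustration
        return [f"What about {left}?", f"What about {right}?"]
    return [question]

def answer_sub_question(sub_q: str, tools: dict) -> str:
    """Stub router: pick the first tool whose keyword appears in the sub-question."""
    for keyword, tool in tools.items():
        if keyword in sub_q.lower():
            return tool(sub_q)
    return "no tool matched"

tools = {
    "vector": lambda q: "vector DBs: ANN search over embeddings",
    "traditional": lambda q: "traditional DBs: B-tree indexes, SQL",
}

question = "Compare vector databases and traditional databases"
sub_answers = [answer_sub_question(q, tools) for q in decompose(question)]
final = " | ".join(sub_answers)  # stand-in for the final synthesis call
print(final)
```

The real engine replaces each stub with an LLM call: one to generate the sub-questions, one per sub-question against the chosen query engine, and a final one to synthesize, so complex queries cost several round-trips.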