
LangChain & LlamaIndex

LangChain core (chains, prompts, memory, output parsers, LCEL), LlamaIndex (nodes, indices, query engines), comparison, common patterns



LangChain and LlamaIndex are the two dominant frameworks for building LLM applications. While they overlap significantly, each has distinct strengths: LangChain excels at composable chains and agent workflows, while LlamaIndex is purpose-built for structured data retrieval and indexing.

LangChain Core Concepts

Installation

```bash
pip install langchain langchain-openai langchain-community langchain-chroma
```

Prompt Templates

Prompt templates let you create reusable, parameterized prompts:

```python
from langchain.prompts import ChatPromptTemplate, FewShotChatMessagePromptTemplate

# Basic template
prompt = ChatPromptTemplate.from_template(
    "Explain {topic} to a {audience} in {num_sentences} sentences."
)

# Format with variables
formatted = prompt.format(
    topic="neural networks",
    audience="5-year-old",
    num_sentences=3,
)
print(formatted)

# Multi-message template (system + human)
chat_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a {role}. Always respond in {language}."),
    ("human", "{question}"),
])

messages = chat_prompt.format_messages(
    role="helpful coding tutor",
    language="simple English",
    question="What is recursion?",
)

# Few-shot template
examples = [
    {"input": "happy", "output": "sad"},
    {"input": "tall", "output": "short"},
    {"input": "fast", "output": "slow"},
]

example_prompt = ChatPromptTemplate.from_messages([
    ("human", "{input}"),
    ("ai", "{output}"),
])

few_shot_prompt = FewShotChatMessagePromptTemplate(
    example_prompt=example_prompt,
    examples=examples,
)

final_prompt = ChatPromptTemplate.from_messages([
    ("system", "You give the opposite of each word."),
    few_shot_prompt,
    ("human", "{input}"),
])
```

Output Parsers

Output parsers convert raw LLM text into structured Python objects:

```python
from langchain.output_parsers import PydanticOutputParser
from langchain.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field

# Define output schema
class MovieReview(BaseModel):
    title: str = Field(description="The movie title")
    rating: float = Field(description="Rating from 1 to 10")
    summary: str = Field(description="One-sentence summary")
    pros: list[str] = Field(description="List of positive aspects")
    cons: list[str] = Field(description="List of negative aspects")

parser = PydanticOutputParser(pydantic_object=MovieReview)

prompt = ChatPromptTemplate.from_template(
    """Review the following movie and provide structured output.

{format_instructions}

Movie: {movie}"""
)

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

chain = prompt.partial(format_instructions=parser.get_format_instructions()) | llm | parser

review = chain.invoke({"movie": "Inception"})
print(f"Title: {review.title}")
print(f"Rating: {review.rating}")
print(f"Pros: {review.pros}")
```

Using with_structured_output (preferred approach)

```python
# Even simpler: use the model's native structured output
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
structured_llm = llm.with_structured_output(MovieReview)

review = structured_llm.invoke("Review the movie Inception")
print(type(review))  # MovieReview
print(review.rating)
```

LCEL (LangChain Expression Language)

LCEL is LangChain's declarative way to compose chains with the pipe operator (`|`). Every component is a `Runnable` exposing `invoke()`, `stream()`, and `batch()` methods, so chains are built like `prompt | llm | parser`. LCEL gives you streaming, async support, parallel execution, and fallbacks without changing your chain logic.

LCEL Chains

```python
from langchain_openai import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableParallel

llm = ChatOpenAI(model="gpt-4o-mini")

# Simple chain: prompt -> LLM -> string
chain = (
    ChatPromptTemplate.from_template("Tell a joke about {topic}")
    | llm
    | StrOutputParser()
)

result = chain.invoke({"topic": "programming"})
print(result)

# Streaming
for chunk in chain.stream({"topic": "AI"}):
    print(chunk, end="", flush=True)

# Parallel chains
analysis_chain = RunnableParallel(
    summary=ChatPromptTemplate.from_template("Summarize: {text}") | llm | StrOutputParser(),
    sentiment=ChatPromptTemplate.from_template("What is the sentiment of: {text}") | llm | StrOutputParser(),
    keywords=ChatPromptTemplate.from_template("Extract 5 keywords from: {text}") | llm | StrOutputParser(),
)

results = analysis_chain.invoke({"text": "LangChain makes it easy to build LLM applications..."})
print(results["summary"])
print(results["sentiment"])
print(results["keywords"])

# Chain with fallbacks
primary_llm = ChatOpenAI(model="gpt-4o")
fallback_llm = ChatOpenAI(model="gpt-4o-mini")

reliable_llm = primary_llm.with_fallbacks([fallback_llm])
chain = ChatPromptTemplate.from_template("{question}") | reliable_llm | StrOutputParser()
```

Conversation Memory

```python
from langchain_openai import ChatOpenAI
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

llm = ChatOpenAI(model="gpt-4o-mini")

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}"),
])

chain = prompt | llm

# Session-based memory store
store = {}

def get_session_history(session_id: str):
    if session_id not in store:
        store[session_id] = InMemoryChatMessageHistory()
    return store[session_id]

# Wrap chain with message history
chain_with_history = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="history",
)

# Conversation
config = {"configurable": {"session_id": "user_123"}}

response1 = chain_with_history.invoke(
    {"input": "My name is Alice. I'm learning about RAG."},
    config=config,
)
print(response1.content)

response2 = chain_with_history.invoke(
    {"input": "What am I learning about?"},
    config=config,
)
print(response2.content)  # "You mentioned you're learning about RAG!"
```

LlamaIndex Core Concepts

LlamaIndex is designed specifically for connecting LLMs to data. Its core abstraction is the index --- a data structure that organizes your documents for efficient LLM querying.

Installation

```bash
pip install llama-index llama-index-llms-openai llama-index-embeddings-openai
```

Documents and Nodes

```python
from llama_index.core import Document, VectorStoreIndex, Settings
from llama_index.core.node_parser import SentenceSplitter
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

# Configure global settings
Settings.llm = OpenAI(model="gpt-4o-mini", temperature=0)
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")

# Create documents
documents = [
    Document(
        text="RAG systems combine retrieval with generation for accurate answers.",
        metadata={"source": "rag_guide", "chapter": 1},
    ),
    Document(
        text="Vector databases store embeddings for fast similarity search.",
        metadata={"source": "vector_db_guide", "chapter": 1},
    ),
]

# Parse documents into nodes (LlamaIndex's term for chunks)
parser = SentenceSplitter(chunk_size=512, chunk_overlap=50)
nodes = parser.get_nodes_from_documents(documents)

print(f"Created {len(nodes)} nodes")
for node in nodes:
    print(f"  Node: {node.text[:80]}...")
    print(f"  Metadata: {node.metadata}")
```

Building and Querying an Index

```python
from llama_index.core import VectorStoreIndex, StorageContext, load_index_from_storage

# Build an index from documents
index = VectorStoreIndex.from_documents(documents)

# Query the index
query_engine = index.as_query_engine(
    similarity_top_k=3,
    response_mode="compact",  # Options: compact, tree_summarize, refine, no_text
)

response = query_engine.query("What are vector databases used for?")
print(response)
print(f"Source nodes: {len(response.source_nodes)}")
for node in response.source_nodes:
    print(f"  Score: {node.score:.4f} | {node.text[:60]}...")

# Persist and reload
index.storage_context.persist(persist_dir="./storage")

# Later: reload from disk
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)
```

Advanced Query Engines

```python
from llama_index.core import VectorStoreIndex, SummaryIndex
from llama_index.core.tools import QueryEngineTool
from llama_index.core.query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector

# Create specialized indices (technical_docs and overview_docs are
# lists of Documents loaded elsewhere)
vector_index = VectorStoreIndex.from_documents(technical_docs)
summary_index = SummaryIndex.from_documents(overview_docs)

# Create query engine tools
vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_index.as_query_engine(),
    description="Best for specific technical questions about implementation details.",
)

summary_tool = QueryEngineTool.from_defaults(
    query_engine=summary_index.as_query_engine(response_mode="tree_summarize"),
    description="Best for high-level summaries and overviews of topics.",
)

# Router decides which engine to use based on the query
router_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=[vector_tool, summary_tool],
)

# The router automatically picks the right engine
response = router_engine.query("Give me an overview of RAG systems")       # Uses summary
response = router_engine.query("What chunk_size should I use with HNSW?")  # Uses vector
```

LangChain vs LlamaIndex: When to Use Which

Use LangChain when you need flexible chain composition, agent workflows, integration with many tools (APIs, databases, web), or complex multi-step reasoning. Use LlamaIndex when your primary task is document retrieval/QA, you need structured index management, you want built-in evaluation, or you need advanced retrieval patterns (router, sub-question). Many production systems use both: LlamaIndex for the retrieval layer and LangChain for orchestration and agents.

Common Patterns

Pattern 1: Conversational RAG (LangChain)

```python
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_chroma import Chroma
from langchain.chains import create_retrieval_chain, create_history_aware_retriever
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder

llm = ChatOpenAI(model="gpt-4o-mini")
vectorstore = Chroma(embedding_function=OpenAIEmbeddings())
retriever = vectorstore.as_retriever()

# Contextualize question based on chat history
contextualize_prompt = ChatPromptTemplate.from_messages([
    ("system", "Given chat history and a new question, reformulate the question to be standalone."),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}"),
])

history_aware_retriever = create_history_aware_retriever(
    llm, retriever, contextualize_prompt
)

# Answer prompt
answer_prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer based on context:\n\n{context}"),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}"),
])

question_answer_chain = create_stuff_documents_chain(llm, answer_prompt)
rag_chain = create_retrieval_chain(history_aware_retriever, question_answer_chain)

# Multi-turn conversation
chat_history = []
result = rag_chain.invoke({"input": "What is RAG?", "chat_history": chat_history})
chat_history.extend([("human", "What is RAG?"), ("ai", result["answer"])])

result = rag_chain.invoke({"input": "How does it reduce hallucinations?", "chat_history": chat_history})
# The retriever understands "it" refers to RAG from the history
```

Pattern 2: Sub-Question Query Engine (LlamaIndex)

```python
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.tools import QueryEngineTool

# Create query engine tools for different document sets
# (ml_index and db_index are indices built elsewhere)
tools = [
    QueryEngineTool.from_defaults(
        query_engine=ml_index.as_query_engine(),
        description="Contains information about machine learning concepts.",
    ),
    QueryEngineTool.from_defaults(
        query_engine=db_index.as_query_engine(),
        description="Contains information about databases and data storage.",
    ),
]

# Sub-question engine decomposes complex queries
sub_question_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=tools,
)

# A complex question gets broken into sub-questions
response = sub_question_engine.query(
    "Compare how vector databases and traditional databases handle ML workloads"
)
# Internally generates:
#   Sub-Q1: "How do vector databases handle ML workloads?" -> ml_tool + db_tool
#   Sub-Q2: "How do traditional databases handle ML workloads?" -> db_tool
# then synthesizes a final answer
```