
Transformers Library Quickstart

Pipeline API, AutoModel/AutoTokenizer, inference patterns, and batch processing

~40 min


Hugging Face's transformers library is the most popular open-source library for working with pre-trained transformer models. It provides a unified API for thousands of models across NLP, computer vision, audio, and multimodal tasks.

Installation

```shell
pip install transformers torch datasets accelerate
```

The library supports PyTorch, TensorFlow, and JAX backends. We'll use PyTorch throughout this module.

The Transformers Philosophy

Hugging Face Transformers provides three levels of abstraction: (1) Pipeline API for quick inference, (2) AutoModel + AutoTokenizer for flexible model usage, and (3) direct model classes for full control. Start simple and go deeper only when needed.

The Pipeline API

The pipeline() function is the simplest way to use pre-trained models. It handles tokenization, model inference, and post-processing in a single call.

Sentiment Analysis

```python
from transformers import pipeline

# Create a sentiment analysis pipeline
classifier = pipeline("sentiment-analysis")

# Single prediction
result = classifier("I love learning about AI!")
print(result)
# [{'label': 'POSITIVE', 'score': 0.9998}]

# Batch prediction
results = classifier([
    "This movie was terrible.",
    "The food was absolutely delicious!",
    "I'm not sure how I feel about this.",
])
for r in results:
    print(f"{r['label']}: {r['score']:.4f}")
```

Text Summarization

```python
summarizer = pipeline("summarization")

article = """
Hugging Face has become the central hub for machine learning models.
Founded in 2016, the company initially built a chatbot app before
pivoting to become the GitHub of machine learning. Their Transformers
library supports over 200,000 models and is used by thousands of
organizations. The platform hosts models, datasets, and Spaces for
demo applications.
"""

summary = summarizer(article, max_length=50, min_length=20)
print(summary[0]['summary_text'])
```

Named Entity Recognition (NER)

```python
ner = pipeline("ner", aggregation_strategy="simple")

text = "Elon Musk founded SpaceX in Hawthorne, California."
entities = ner(text)
for entity in entities:
    print(f"{entity['word']}: {entity['entity_group']} ({entity['score']:.3f})")

# Elon Musk: PER (0.998)
# SpaceX: ORG (0.995)
# Hawthorne: LOC (0.993)
# California: LOC (0.997)
```

Question Answering

```python
qa = pipeline("question-answering")

context = """
The transformer architecture was introduced in the 2017 paper
'Attention Is All You Need' by Vaswani et al. It replaced recurrent
layers with self-attention mechanisms, enabling massive parallelization
and leading to models like BERT and GPT.
"""

answer = qa(
    question="Who introduced the transformer architecture?",
    context=context,
)
print(f"Answer: {answer['answer']} (score: {answer['score']:.3f})")

# Answer: Vaswani et al (score: 0.892)
```

Zero-Shot Classification

zero_shot = pipeline("zero-shot-classification")

result = zero_shot( "I just got promoted to senior engineer!", candidate_labels=["career", "health", "sports", "technology"] ) print(f"Labels: {result['labels']}") print(f"Scores: {[f'{s:.3f}' for s in result['scores']]}")

Labels: ['career', 'technology', 'sports', 'health']

Scores: ['0.891', '0.067', '0.024', '0.018']

Translation

```python
translator = pipeline("translation_en_to_fr")
result = translator("Machine learning is transforming every industry.")
print(result[0]['translation_text'])

# L'apprentissage automatique transforme chaque industrie.
```

Text Generation

```python
generator = pipeline("text-generation", model="gpt2")

output = generator(
    "The future of artificial intelligence",
    max_new_tokens=50,
    num_return_sequences=1,
    temperature=0.7,
)
print(output[0]['generated_text'])
```

Specifying Models

Every pipeline uses a default model, but you can specify any compatible model from the Hub: pipeline("sentiment-analysis", model="nlptown/bert-base-multilingual-uncased-sentiment"). This lets you swap models without changing your code.

AutoModel and AutoTokenizer

When you need more control than the pipeline provides, use AutoModel and AutoTokenizer directly. This is the standard approach for production code.

The Three Auto Classes

```python
from transformers import AutoTokenizer, AutoModel, AutoConfig

model_name = "bert-base-uncased"

# Load just the config (no weights downloaded)
config = AutoConfig.from_pretrained(model_name)
print(f"Hidden size: {config.hidden_size}")        # 768
print(f"Num layers: {config.num_hidden_layers}")   # 12
print(f"Num heads: {config.num_attention_heads}")  # 12

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load the model
model = AutoModel.from_pretrained(model_name)
```

Manual Inference

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Tokenize input
text = "I absolutely love this product!"
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)

print(f"Input IDs shape: {inputs['input_ids'].shape}")
print(f"Attention mask shape: {inputs['attention_mask'].shape}")
print(f"Tokens: {tokenizer.convert_ids_to_tokens(inputs['input_ids'][0])}")

# Run inference (no gradient computation needed)
model.eval()
with torch.no_grad():
    outputs = model(**inputs)

# Process logits
logits = outputs.logits
probabilities = torch.softmax(logits, dim=-1)
predicted_class = torch.argmax(probabilities, dim=-1).item()

labels = model.config.id2label
print(f"Prediction: {labels[predicted_class]}")
print(f"Confidence: {probabilities[0][predicted_class]:.4f}")
```
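The softmax/argmax step above needs no library at all; a plain-Python sketch of the same math, using made-up logits for a two-class sentiment head, shows how raw scores become a probability and a predicted class:

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    # Subtract the max before exponentiating for numerical stability
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for [NEGATIVE, POSITIVE] from a sentiment head
logits = [-2.0, 3.5]
probs = softmax(logits)
predicted_class = max(range(len(probs)), key=lambda i: probs[i])

print([round(p, 4) for p in probs])  # [0.0041, 0.9959]
print(predicted_class)               # 1, i.e. POSITIVE
```

This is exactly what `torch.softmax` and `torch.argmax` do in the snippet above, just vectorized over the batch dimension.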

AutoModel Variants

Use the right AutoModel subclass for your task:

- AutoModel: Base model, returns hidden states
- AutoModelForSequenceClassification: Text classification
- AutoModelForTokenClassification: NER, POS tagging
- AutoModelForQuestionAnswering: Extractive QA
- AutoModelForCausalLM: Text generation (GPT-style)
- AutoModelForSeq2SeqLM: Translation, summarization (T5-style)
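In code that handles several tasks, this mapping can be kept as a small lookup table. The class names below are the real transformers classes, but the table and helper themselves are just an illustrative sketch, not part of the library:

```python
# Maps a task name to the transformers Auto class name to import
AUTO_CLASS_FOR_TASK = {
    "feature-extraction": "AutoModel",
    "text-classification": "AutoModelForSequenceClassification",
    "token-classification": "AutoModelForTokenClassification",
    "question-answering": "AutoModelForQuestionAnswering",
    "text-generation": "AutoModelForCausalLM",
    "translation": "AutoModelForSeq2SeqLM",
    "summarization": "AutoModelForSeq2SeqLM",
}

def auto_class_for(task: str) -> str:
    """Return the Auto class name for a task, or raise a helpful error."""
    try:
        return AUTO_CLASS_FOR_TASK[task]
    except KeyError:
        raise ValueError(
            f"Unknown task: {task!r}. Known tasks: {sorted(AUTO_CLASS_FOR_TASK)}"
        )

print(auto_class_for("text-generation"))  # AutoModelForCausalLM
```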

Batch Processing

For efficiency, always batch your inputs when processing multiple texts:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

texts = [
    "This is fantastic!",
    "Terrible experience.",
    "Pretty average, nothing special.",
    "Best purchase I've ever made!",
    "Would not recommend to anyone.",
]

# Tokenize as a batch - padding ensures uniform length
inputs = tokenizer(
    texts,
    return_tensors="pt",
    padding=True,     # Pad to longest in batch
    truncation=True,  # Truncate if over max length
    max_length=128,
)

with torch.no_grad():
    outputs = model(**inputs)  # Unpack the dict into keyword arguments
    probs = torch.softmax(outputs.logits, dim=-1)
    predictions = torch.argmax(probs, dim=-1)

for text, pred, prob in zip(texts, predictions, probs):
    label = model.config.id2label[pred.item()]
    confidence = prob[pred.item()].item()
    print(f"[{label} {confidence:.2f}] {text}")
```
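For datasets too large to tokenize and run in a single call, the usual pattern is to process fixed-size chunks. A minimal, framework-agnostic chunking helper (the batch size of 32 is just an illustrative default):

```python
def batched(items, batch_size=32):
    """Yield successive fixed-size chunks from a list."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

texts = [f"review {i}" for i in range(70)]
batch_sizes = [len(batch) for batch in batched(texts, batch_size=32)]
print(batch_sizes)  # [32, 32, 6]
```

Within each chunk you would call the tokenizer and model exactly as above. Chunking also means padding only reaches the longest text in that chunk rather than in the whole dataset, which saves compute when lengths vary.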

Device Management

Move models and inputs to GPU for faster inference:

```python
import torch
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    pipeline,
)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name).to(device)

# Inputs must also be on the same device
text = "Great movie!"
inputs = tokenizer(text, return_tensors="pt").to(device)

with torch.no_grad():
    outputs = model(**inputs)

# For pipelines, use the device argument
classifier = pipeline(
    "sentiment-analysis",
    device=0,  # GPU index, or -1 for CPU
)
```

Memory Considerations

Large models can exhaust GPU memory. Use model.half() for FP16 inference to halve memory usage, or use device_map='auto' with accelerate to automatically split a model across multiple GPUs or offload to CPU/disk.
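As a back-of-the-envelope check of why FP16 helps, weight memory is roughly parameter count times bytes per parameter. The sketch below ignores activations, KV caches, and framework overhead, and the 7B parameter count is just an illustrative example:

```python
def weight_memory_gib(num_params: int, bytes_per_param: int) -> float:
    """Approximate weight memory in GiB: params * bytes per param."""
    return num_params * bytes_per_param / (1024 ** 3)

params_7b = 7_000_000_000
print(f"FP32: {weight_memory_gib(params_7b, 4):.1f} GiB")  # FP32: 26.1 GiB
print(f"FP16: {weight_memory_gib(params_7b, 2):.1f} GiB")  # FP16: 13.0 GiB
```

Halving bytes per parameter halves weight memory exactly, which is why `model.half()` often makes the difference between fitting on one GPU and needing `device_map='auto'`.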