Python Libraries and Frameworks

 

# | Library | Best for
1 | PyTorch | The dominant deep learning framework: research and production, dynamic graphs, huge ecosystem
2 | TensorFlow / Keras | Deep learning with strong production tooling (TF Serving, TF Lite); Keras gives a clean high-level API
3 | Hugging Face Transformers | Pre-trained LLMs and transformer models (text, vision, audio): download and fine-tune SOTA models
4 | scikit-learn | Classical ML: regression, classification, clustering, preprocessing pipelines
5 | NumPy | Foundational array/tensor math that nearly every other library is built on
6 | pandas | Data loading, cleaning, and manipulation: the backbone of any ML data pipeline
7 | LangChain | Building LLM-powered apps: RAG, agents, chains, tool integration



1. PyTorch

PyTorch is an open-source deep learning framework originally developed by Meta AI (Facebook) and now governed by the PyTorch Foundation under the Linux Foundation. It has become the dominant framework in AI research and is increasingly common in production as well. Its core strength is the dynamic computation graph (define-by-run), meaning the network is built on the fly as code executes. This makes PyTorch feel natural and "Pythonic" — you can use standard Python control flow, debug with ordinary tools, and inspect tensors at any point, which is a major advantage over older static-graph frameworks.

At its heart, PyTorch provides Tensor objects (similar to NumPy arrays but GPU-accelerated) and autograd, an automatic differentiation engine that computes gradients for backpropagation. You build models by subclassing torch.nn.Module, define a forward method, and train using optimizers from torch.optim and loss functions from torch.nn. GPU acceleration is as simple as calling .to("cuda") on tensors and models.
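The pieces named above (tensors, autograd, an nn.Module subclass, an optimizer, a loss function, and .to(device)) fit together roughly as in the sketch below. The synthetic data, the TinyRegressor class name, the learning rate, and the step count are all assumptions chosen for illustration, not recommendations.

```python
import torch
from torch import nn

torch.manual_seed(0)  # reproducible weights and noise

# Synthetic regression data (an assumption for illustration): y = 3x + 2 plus noise.
X = torch.linspace(-1, 1, 64).unsqueeze(1)   # shape (64, 1)
y = 3 * X + 2 + 0.05 * torch.randn_like(X)


class TinyRegressor(nn.Module):
    """Hypothetical one-layer model; forward() defines the computation."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(1, 1)

    def forward(self, x):
        return self.linear(x)


# Use the GPU when one is available, otherwise stay on the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = TinyRegressor().to(device)
X, y = X.to(device), y.to(device)

loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

initial_loss = loss_fn(model(X), y).item()
for _ in range(200):
    optimizer.zero_grad()          # clear gradients from the previous step
    loss = loss_fn(model(X), y)    # forward pass builds the graph on the fly
    loss.backward()                # autograd computes d(loss)/d(parameters)
    optimizer.step()               # gradient descent update
final_loss = loss_fn(model(X), y).item()

print(f"loss: {initial_loss:.4f} -> {final_loss:.4f}")
```

Because the graph is rebuilt on every forward pass, you can put a print() or a debugger breakpoint inside forward() and inspect intermediate tensors directly.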

PyTorch's ecosystem is enormous. TorchVision and TorchAudio offer domain-specific datasets, models, and transforms (TorchText did the same for text, though it is no longer actively developed). PyTorch Lightning and Hugging Face Accelerate reduce boilerplate for training loops and distributed training. For deployment, TorchScript serializes models so they can run outside Python, torch.compile (introduced in PyTorch 2.0) speeds them up through just-in-time compilation, and ONNX export enables cross-platform inference.

The framework dominates academic publishing: most new research papers that release code do so in PyTorch, which means cutting-edge architectures tend to appear here first. It scales from a laptop to thousands of GPUs via DistributedDataParallel and FSDP (Fully Sharded Data Parallel) for training massive models like LLMs.

Best for: researchers prototyping novel architectures, teams fine-tuning transformers, and anyone who values flexibility and debuggability. Learning curve: moderate; you write more low-level code than with Keras, but gain full control. It's the safest default choice for serious deep learning work today.


2. TensorFlow / Keras

TensorFlow is Google's open-source machine learning platform, first released in 2015, and one of the most mature, production-hardened frameworks available. Keras is its official high-level API (tf.keras); since Keras 3 it is also a standalone, multi-backend library that can run on JAX and PyTorch. It provides a clean, beginner-friendly interface for building neural networks, while TensorFlow underneath handles the heavy computation.

The Keras API lets you build models in a few lines using the Sequential API for simple stacks of layers, or the Functional API for complex, multi-input/multi-output architectures. You compile a model with an optimizer, loss, and metrics, then call .fit(), .evaluate(), and .predict() — a workflow so intuitive it's often the first deep learning framework newcomers learn.
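The compile / fit / evaluate / predict workflow described above might look like the following minimal sketch. The synthetic dataset, layer sizes, epoch count, and batch size are assumptions chosen only to keep the example small.

```python
import numpy as np
import tensorflow as tf

tf.keras.utils.set_random_seed(0)  # seeds Python, NumPy, and TensorFlow

# Synthetic binary classification data (an assumption for illustration):
# the label is 1 when the four features sum to a positive number.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4)).astype("float32")
y = (X.sum(axis=1) > 0).astype("float32")

# Sequential API: a plain stack of layers.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Compile with an optimizer, a loss, and metrics, then train.
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=20, batch_size=16, verbose=0)

# evaluate() returns the loss followed by each compiled metric.
loss, accuracy = model.evaluate(X, y, verbose=0)

# predict() returns one probability per input row.
probabilities = model.predict(X[:3], verbose=0)
print(f"accuracy={accuracy:.2f}", probabilities.shape)
```

Note that this sketch evaluates on the training data for brevity; a real project would hold out a validation or test split.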

TensorFlow's biggest differentiator is its end-to-end production ecosystem. TensorFlow Serving deploys models behind high-performance APIs; TensorFlow Lite runs models on mobile and embedded/IoT devices; TensorFlow.js runs them in the browser; and TFX (TensorFlow Extended) provides full MLOps pipelines for data validation, training, and serving at scale. This makes it especially attractive to enterprises that need a reliable path from prototype to deployment.

Historically TensorFlow used static computation graphs, which were fast but hard to debug. Since TensorFlow 2.x, eager execution is the default, closing much of the usability gap with PyTorch while retaining graph optimization via @tf.function. It also integrates tightly with TensorBoard, the gold-standard visualization tool for monitoring training metrics, model graphs, and embeddings.

TensorFlow runs efficiently on CPUs, GPUs, and Google's custom TPUs, giving it an edge for certain large-scale Google Cloud workloads.

Best for: production deployment, mobile/edge inference, and teams wanting a mature, well-supported stack. Learning curve: Keras is one of the easiest entry points to deep learning, though TensorFlow's lower levels can feel more complex than PyTorch. Its research mindshare has declined relative to PyTorch, but it remains a powerful industrial choice.


3. Hugging Face Transformers

Hugging Face Transformers is the de facto standard library for working with pre-trained transformer models and modern large language models (LLMs). Instead of training models from scratch — which can cost millions of dollars — it lets you download thousands of state-of-the-art models from the Hugging Face Hub and use or fine-tune them in just a few lines of code. This democratization of cutting-edge AI is arguably its greatest contribution.

The library supports models across modalities: text (BERT, GPT, Llama, Mistral, T5), vision (ViT, DETR), audio (Whisper, Wav2Vec2), and multimodal models. Its highest-level interface, the pipeline() function, abstracts entire tasks — sentiment analysis, translation, summarization, question answering, image classification, speech recognition — into a single call, handling tokenization, model inference, and post-processing automatically.

For more control, you work directly with the AutoModel and AutoTokenizer classes, which load the correct architecture and tokenizer for a given model name. Transformers is PyTorch-first: it has also shipped TensorFlow and JAX/Flax variants of many models, but recent releases concentrate on PyTorch, so check the documentation for your version before relying on another backend. The Trainer API simplifies fine-tuning with built-in support for mixed precision, distributed training, and evaluation, while integration with the PEFT (Parameter-Efficient Fine-Tuning) library enables techniques like LoRA to fine-tune huge models on modest hardware.

The broader Hugging Face ecosystem amplifies its value: Datasets for efficient data loading, Tokenizers for fast text processing, Accelerate for distributed training, and the Hub itself for sharing models, datasets, and demo Spaces.

Best for: NLP and generative AI tasks, fine-tuning LLMs, and rapidly prototyping with state-of-the-art models without deep ML expertise. Learning curve: low for the pipeline API, moderate for custom fine-tuning. If you're building anything involving language models, chatbots, or transformer-based vision/audio, this is an essential tool.


4. scikit-learn

scikit-learn is the most popular library for classical (non-deep-learning) machine learning in Python. Built on NumPy, SciPy, and matplotlib, it provides clean, well-documented, and consistent implementations of the algorithms that power a huge proportion of real-world ML applications — many problems simply don't need deep learning, and scikit-learn is the go-to tool for them.

It covers the full range of traditional ML tasks. For supervised learning, it offers regression (linear, ridge, lasso) and classification (logistic regression, support vector machines, decision trees, random forests, gradient boosting, k-nearest neighbors). For unsupervised learning, it provides clustering (K-means, DBSCAN, hierarchical), dimensionality reduction (PCA, t-SNE), and anomaly detection. It also includes tools for model selection, hyperparameter tuning, and evaluation.

scikit-learn's defining feature is its consistent, elegant API. Almost every model follows the same pattern: instantiate an estimator, call .fit(X, y) to train, and .predict(X) to make predictions, with .transform() for preprocessing steps. This uniformity makes it easy to swap algorithms and learn new ones. The Pipeline class lets you chain preprocessing and modeling steps into a single reproducible object, preventing common mistakes like data leakage.
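As a small illustration of the fit / predict pattern and of Pipeline, the sketch below scales features and trains a classifier as a single object. The choice of the bundled iris dataset, StandardScaler, and LogisticRegression is an assumption for the example; any estimator with the same interface could be swapped in.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# The scaler is fitted on the training split only, inside the pipeline,
# so no statistics from the test split leak into training.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])

pipeline.fit(X_train, y_train)          # fits the scaler, then the model
predictions = pipeline.predict(X_test)  # applies the scaler, then predicts
accuracy = pipeline.score(X_test, y_test)
print(f"test accuracy: {accuracy:.2f}")
```

Swapping the algorithm means changing one line, for example replacing LogisticRegression with another classifier, because every estimator exposes the same methods.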

It excels at the practical "glue" work of ML: preprocessing (scaling, encoding categorical variables, imputing missing values), feature selection, cross-validation, and metrics (accuracy, precision, recall, F1, ROC-AUC, etc.). Tools like GridSearchCV and RandomizedSearchCV automate hyperparameter optimization.
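Cross-validation and grid search can be sketched as follows. The parameter grid and the use of an SVM on the iris dataset are illustrative assumptions; the `svc__C` naming follows scikit-learn's `<step name>__<parameter>` convention for pipeline parameters.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

pipeline = Pipeline([("scale", StandardScaler()), ("svc", SVC())])

# 5-fold cross-validation with default hyperparameters: one score per fold.
scores = cross_val_score(pipeline, X, y, cv=5)

# Grid search: try every combination below, each scored by 5-fold CV.
param_grid = {"svc__C": [0.1, 1, 10], "svc__kernel": ["linear", "rbf"]}
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print("fold scores:", scores.round(3))
print("best params:", search.best_params_)
print("best CV accuracy:", round(search.best_score_, 3))
```

RandomizedSearchCV has the same interface but samples a fixed number of combinations, which tends to be more practical when the grid is large.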

It is designed for small-to-medium structured/tabular datasets that fit in memory. GPU support is limited and experimental, and its neural networks stop at basic multi-layer perceptrons (MLPClassifier, MLPRegressor); for deep learning you turn to PyTorch or TensorFlow.

Best for: tabular data problems, baseline models, classical ML, teaching, and rapid experimentation. Learning curve: low — it's often the first ML library people learn. Reliable, stable, and exceptionally well-documented, it remains an indispensable part of nearly every data scientist's toolkit.


5. NumPy 

[https://www.youtube.com/watch?v=xECXZ3tyONo]

[https://www.youtube.com/watch?v=VXU4LSAQDSc]

[https://www.youtube.com/watch?v=ceMMJZrAXl8]

NumPy (Numerical Python) is the foundational library for numerical computing in Python and arguably the single most important package in the entire scientific Python ecosystem. Nearly every other AI and data library — pandas, scikit-learn, TensorFlow, PyTorch, SciPy — is built on top of NumPy or interoperates with its array format. Understanding NumPy is essential to understanding how AI computation works under the hood.

Its central object is the ndarray (n-dimensional array), a fast, memory-efficient container for homogeneous numerical data. Unlike Python lists, NumPy arrays store data in contiguous memory blocks and are implemented in C, making operations dramatically faster. NumPy enables vectorization — performing operations on entire arrays at once without explicit Python loops — which is both more concise and orders of magnitude faster. For example, adding two million-element arrays is a single expression that runs at near-C speed.
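The million-element addition mentioned above can be written as a single vectorized expression. The sketch below also runs the equivalent Python loop over a small slice to show that both give the same values; the random data is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random(1_000_000)
b = rng.random(1_000_000)

# Vectorized: one expression; the element-wise loop runs in compiled code.
total = a + b

# Equivalent pure-Python loop over a small slice, for comparison only.
loop_total = [x + y for x, y in zip(a[:5], b[:5])]

print(total.dtype, total.shape)
print(np.allclose(total[:5], loop_total))
```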

NumPy provides a rich set of mathematical capabilities: element-wise arithmetic, broadcasting (automatically aligning arrays of different shapes for operations), linear algebra (matrix multiplication, decompositions, eigenvalues via numpy.linalg), random number generation, Fourier transforms, and statistical functions. Broadcasting in particular is a powerful concept that underpins how tensors are manipulated in deep learning frameworks.
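Broadcasting and numpy.linalg are easier to see on small arrays. The values below are made up for illustration.

```python
import numpy as np

# A (3, 4) matrix of samples and a (4,) vector of per-column means.
X = np.arange(12, dtype=float).reshape(3, 4)
col_means = X.mean(axis=0)            # shape (4,)

# Broadcasting: the (4,) vector is stretched across all 3 rows.
centered = X - col_means              # shape (3, 4)

# A (3, 1) column broadcasts against a (1, 4) row to give a (3, 4) grid.
grid = np.array([[0], [10], [20]]) + np.array([[1, 2, 3, 4]])

# Linear algebra via numpy.linalg: solve A @ x = b.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(A, b)

print(centered)
print(grid)
print(x)  # approximately [0.8, 1.4]
```

The rule of thumb is that shapes are compared from the trailing dimension backwards, and two dimensions are compatible when they are equal or one of them is 1.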

In AI specifically, NumPy is the lingua franca for data exchange: datasets are typically represented as NumPy arrays before being converted into framework-specific tensors. Concepts like tensors, axes, reshaping, slicing, and indexing that you use in PyTorch and TensorFlow originate directly from NumPy conventions.

While NumPy itself runs on CPU only, GPU-accelerated drop-in replacements like CuPy mirror its API, and deep learning tensors are deliberately designed to feel NumPy-like.

Best for: all numerical computation, data preprocessing, and as the substrate beneath higher-level libraries. Learning curve: low to moderate. Mastering array operations, broadcasting, and indexing pays dividends across every other AI library you'll ever use.


6. pandas

[https://www.youtube.com/watch?v=mkYBJwX_dMs]

[https://www.youtube.com/watch?v=dcqPhpY7tWk]

[https://www.youtube.com/watch?v=EXIgjIBu4EU]

pandas is the standard library for data manipulation and analysis in Python and a critical part of nearly every AI and machine learning workflow. Real-world data is rarely clean and ready for modeling — it must be loaded, explored, cleaned, transformed, and engineered into useful features first. pandas is the tool that handles this crucial "data wrangling" stage, which often consumes the majority of a data scientist's time.

Its two core data structures are the Series (a one-dimensional labeled array) and the DataFrame (a two-dimensional labeled table, conceptually like a spreadsheet or SQL table). DataFrames allow you to work with heterogeneous columns — numbers, strings, dates, categories — under intuitive row and column labels. Built on top of NumPy, pandas combines NumPy's speed with far more flexible, labeled, real-world data handling.

pandas reads and writes a wide variety of formats out of the box: CSV, Excel, JSON, SQL databases, Parquet, and more, making data ingestion straightforward. Once data is loaded, it offers an enormous toolkit: filtering and selecting rows/columns (.loc, .iloc, boolean masks), handling missing data (dropna, fillna), grouping and aggregation (groupby for split-apply-combine operations), merging and joining datasets, pivoting and reshaping, and time-series functionality with powerful date/time handling.
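A few of the operations listed above (fillna, a boolean mask with .loc, merge, and groupby) are sketched below on a tiny in-memory table. The sales and regions data are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sales records with one missing amount.
sales = pd.DataFrame({
    "store": ["A", "A", "B", "B", "C"],
    "amount": [100.0, np.nan, 250.0, 50.0, 75.0],
})
regions = pd.DataFrame({
    "store": ["A", "B", "C"],
    "region": ["North", "South", "North"],
})

# Handle missing data: fill the gap with the column median (87.5 here).
sales["amount"] = sales["amount"].fillna(sales["amount"].median())

# Boolean mask with .loc: rows where amount is at least 75.
large = sales.loc[sales["amount"] >= 75, ["store", "amount"]]

# Merge (SQL-style join), then split-apply-combine with groupby.
merged = sales.merge(regions, on="store", how="left")
by_region = merged.groupby("region")["amount"].sum()

print(large)
print(by_region)
```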

For AI specifically, pandas is where feature engineering happens — creating new columns, encoding categories, binning values, computing rolling statistics — before data is converted to NumPy arrays or tensors and fed into scikit-learn, PyTorch, or TensorFlow. It also integrates with visualization libraries (matplotlib, seaborn) for exploratory data analysis (EDA).
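The feature-engineering steps named above (binning, rolling statistics, encoding categories, then converting to an array) could look like this sketch. The column names, bin edges, and labels are assumptions for illustration.

```python
import pandas as pd

# Hypothetical customer table.
df = pd.DataFrame({
    "age": [22, 35, 58, 41],
    "city": ["Paris", "Lima", "Paris", "Oslo"],
    "spend": [10.0, 20.0, 30.0, 40.0],
})

# Binning: turn a numeric column into labeled ranges (0-30], (30-50], (50-120].
df["age_group"] = pd.cut(
    df["age"], bins=[0, 30, 50, 120], labels=["young", "mid", "senior"]
)

# Rolling statistic: mean of the current and the previous row.
df["spend_roll2"] = df["spend"].rolling(window=2, min_periods=1).mean()

# One-hot encode the categorical columns; numeric columns pass through.
features = pd.get_dummies(
    df[["age", "spend_roll2", "city", "age_group"]],
    columns=["city", "age_group"],
    dtype=float,
)

# Hand off to NumPy, ready for scikit-learn or conversion to a tensor.
X = features.to_numpy()
print(features.columns.tolist())
print(X.shape)
```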

A limitation is that pandas works in-memory and can struggle with very large datasets; alternatives like Polars, Dask, or Spark address bigger-than-memory data.

Best for: data cleaning, exploration, and feature engineering on tabular data. Learning curve: low to moderate. It's an indispensable everyday tool for anyone doing data science or ML.


7. LangChain

LangChain is a popular open-source framework for building applications powered by large language models (LLMs). While libraries like Hugging Face Transformers focus on the models themselves, LangChain operates a layer above: it helps you orchestrate LLMs into complete applications such as chatbots, question-answering systems, autonomous agents, and retrieval-augmented generation (RAG) pipelines. It emerged as one of the central tools of the generative AI application boom.

Its core idea is composability: connecting LLMs to other components and data sources to overcome their limitations. A raw LLM is stateless and only knows its training data, but LangChain lets you augment it with conversation memory, external documents, and tools. The LangChain Expression Language (LCEL) lets you compose these components into "chains" (pipelines where the output of one step feeds the next) using a clean, declarative syntax.
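To make the "chain" idea concrete without depending on LangChain's fast-moving API, here is a plain-Python sketch of the same pattern: a prompt template, a model call, and an output parser composed so that each output feeds the next step. Every name here (make_chain, prompt_template, fake_llm, parse_output) is hypothetical, and fake_llm is a stand-in for a real model call. In LCEL the composition itself is written with the `|` operator rather than a helper function.

```python
from typing import Callable

Step = Callable[[object], object]


def make_chain(*steps: Step) -> Step:
    """Compose steps left to right: each output becomes the next input."""
    def run(value):
        for step in steps:
            value = step(value)
        return value
    return run


def prompt_template(inputs: dict) -> str:
    # Fill a reusable prompt with the caller's variables.
    return f"Shout this text: {inputs['text']}"


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call; it just uppercases the payload.
    payload = prompt.split(": ", 1)[1]
    return f"ANSWER: {payload.upper()}"


def parse_output(completion: str) -> str:
    # Strip the wrapper so callers get only the useful part.
    return completion.removeprefix("ANSWER: ")


chain = make_chain(prompt_template, fake_llm, parse_output)
result = chain({"text": "hello chains"})
print(result)  # HELLO CHAINS
```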

Key capabilities include: prompt templates for structuring and reusing prompts; memory to maintain conversation history across turns; document loaders and text splitters for ingesting PDFs, web pages, and databases; and integrations with vector stores (Pinecone, Chroma, FAISS) and embedding models to enable RAG, where the model retrieves relevant external documents to answer questions accurately and reduce hallucination.
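The retrieval step of RAG can be sketched in plain Python. This is a toy under stated assumptions: the "embedding" is a bag-of-words count rather than a learned vector, the in-memory list stands in for a vector store such as Chroma or FAISS, and the embed, cosine, and retrieve helpers are hypothetical names, not LangChain functions.

```python
import math
import re
from collections import Counter

# Stand-in for a vector store: three short documents held in memory.
documents = [
    "PyTorch uses dynamic computation graphs.",
    "pandas DataFrames hold labeled tabular data.",
    "NumPy arrays support broadcasting.",
]


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, k: int = 1) -> list[str]:
    """Return the k documents most similar to the question."""
    query = embed(question)
    ranked = sorted(documents, key=lambda doc: cosine(query, embed(doc)), reverse=True)
    return ranked[:k]


question = "What do NumPy arrays support?"
context = retrieve(question)

# The retrieved text is placed in the prompt so the model can ground its answer.
prompt = f"Answer using only this context:\n{context[0]}\n\nQuestion: {question}"
print(prompt)
```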

LangChain is best known for agents — systems where the LLM decides which tools to use (web search, calculators, APIs, code execution) to accomplish a multi-step task, reasoning iteratively about what action to take next. The companion product LangSmith provides observability, debugging, and evaluation for these often-unpredictable LLM applications, while LangGraph enables more controllable, stateful, graph-based agent workflows.
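The agent loop itself (decide on an action, run a tool, observe the result, repeat) is simple to sketch. In the toy below a scripted function plays the role of the LLM, so the control flow can be seen without any model call; the tool names, scripted_policy, and run_agent are all hypothetical and are not LangChain or LangGraph APIs.

```python
def calculator(expression: str) -> str:
    """Tool: add two integers written as 'a+b'."""
    left, right = expression.split("+")
    return str(int(left) + int(right))


def word_count(text: str) -> str:
    """Tool: count whitespace-separated words."""
    return str(len(text.split()))


TOOLS = {"calculator": calculator, "word_count": word_count}


def scripted_policy(task: str, observations: list[str]) -> tuple[str, str]:
    """Stand-in for the LLM: returns (action, action_input)."""
    if not observations:
        return "calculator", "19+23"      # first step: pick a tool
    return "finish", f"The answer is {observations[-1]}."


def run_agent(task: str, max_steps: int = 5) -> str:
    observations: list[str] = []
    for _ in range(max_steps):            # step limit guards against endless loops
        action, action_input = scripted_policy(task, observations)
        if action == "finish":
            return action_input
        observations.append(TOOLS[action](action_input))
    return "Stopped: step limit reached."


answer = run_agent("What is 19 + 23?")
print(answer)  # The answer is 42.
```

In a real agent the policy is a model call whose output must be parsed and validated, which is a large part of why these systems are harder to debug than the loop suggests.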

A common critique is that LangChain's abstractions can feel heavy or change rapidly; some developers prefer lighter alternatives like LlamaIndex (RAG-focused) or direct API calls.

Best for: prototyping and building LLM apps, RAG systems, and agents. Learning curve: moderate, given its breadth and evolving API. It remains a leading choice for connecting LLMs to real-world data and actions.


Differences Between Libraries and Frameworks in Python

Note: the line is often blurry. Many tools have both library-like and framework-like parts. PyTorch is usually used as a library (you write your own training loop), but PyTorch Lightning wraps it into a framework (it runs the loop and calls your hooks).

Other Practical Differences

Learning curve: Libraries are usually easier to adopt incrementally; frameworks require learning their conventions and structure upfront.

Lock-in: Frameworks impose more architecture, so switching later is harder. Libraries are easier to swap out.

Combining them: You can use many libraries together freely, but it's hard to use two full frameworks for the same job (they both want to be "in charge").

Examples (some from outside AI):

Libraries: requests, NumPy, pandas, Pillow, BeautifulSoup

Frameworks: Django, Flask, FastAPI, Scrapy, PyTorch Lightning

In one sentence: A library is a set of tools you reach for when you need them; a framework is a pre-built structure you build your application inside of, and it decides when to run your code.

Libraries vs. Frameworks in Python

Both are reusable collections of pre-written code, but the key difference is who is in control — a concept called Inversion of Control.

The Core Distinction

Aspect | Library | Framework
Control | You call the library | The framework calls your code
Analogy | A toolbox: you pick the tool when you need it | A house blueprint: you fill in the rooms, but the structure is fixed
Flow of control | Your code drives the program | The framework drives; your code plugs into it
Flexibility | High: use it however you like | Lower: follow the framework's rules and structure
Scope | Solves one specific problem | Provides a full skeleton for an entire application
"Don't call us, we'll call you" | ❌ You call it | ✅ It calls you

The Simplest Way to Remember It

You call a library. A framework calls you.

  • With a library, your code is in charge. You decide when and how to use its functions.
  • With a framework, the framework is in charge. It provides the overall structure and calls your code at the right moments (via callbacks, hooks, or methods you implement).

Code Examples

Library (NumPy) — you are in control, calling functions when you choose:

import numpy as np

# YOU decide to call NumPy, when and how
data = np.array([1, 2, 3])
result = np.mean(data)   # you invoke the library
print(result)

Framework (Flask): the framework controls the flow and calls your code.

from flask import Flask
app = Flask(__name__)

# YOU write this function, but the FRAMEWORK decides
# when to call it (when a request hits "/")
@app.route("/")
def home():
    return "Hello World"

if __name__ == "__main__":
    app.run()   # you hand control over to Flask

Notice: in Flask you never call home() yourself — Flask calls it for you when a web request arrives. That's inversion of control.
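The same inversion of control can be reproduced in a few lines of plain Python, which may make the mechanism clearer than a real web server does. MiniFramework below is a hypothetical toy, not Flask's implementation: it stores the functions you register and decides when to call them.

```python
class MiniFramework:
    """Hypothetical framework: it owns the loop and calls registered handlers."""

    def __init__(self):
        self._handlers = {}

    def route(self, path):
        """Decorator that registers a handler for a path without calling it."""
        def register(func):
            self._handlers[path] = func
            return func
        return register

    def dispatch(self, path):
        # The framework, not your code, chooses which handler to invoke.
        handler = self._handlers.get(path)
        return handler() if handler else "404 Not Found"

    def run(self, incoming_paths):
        # Stand-in for a server loop: handle each "request" in turn.
        return [self.dispatch(path) for path in incoming_paths]


app = MiniFramework()


@app.route("/")
def home():
    return "Hello World"


responses = app.run(["/", "/missing"])
print(responses)  # ['Hello World', '404 Not Found']
```

Your code never calls home() here either; it only registers the function and hands control to run().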

Mapping This to the AI Libraries We Discussed

Tool | Technically a... | Why
NumPy | Library | You call its functions directly
pandas | Library | You call methods on DataFrames
scikit-learn | Library | You call .fit(), .predict()
PyTorch | Library (leans toward framework) | Mostly you control the training loop
TensorFlow/Keras | Framework-like | model.fit() runs the loop and calls your callbacks
Hugging Face Transformers | Library | You call pipelines and models
LangChain | Framework | It orchestrates and calls your chains/agents
