Python Libraries and Frameworks

#	Library	Best for
1	PyTorch	The dominant deep learning framework — research & production, dynamic graphs, huge ecosystem
2	TensorFlow / Keras	Deep learning with strong production tooling (TF Serving, TF Lite); Keras gives a clean high-level API
3	Hugging Face Transformers	Pre-trained LLMs & transformer models (text, vision, audio) — download and fine-tune SOTA models
4	scikit-learn	Classical ML — regression, classification, clustering, preprocessing pipelines
5	NumPy	Foundational array/tensor math that nearly every other library is built on
6	pandas	Data loading, cleaning, and manipulation — the backbone of any ML data pipeline
7	LangChain	Building LLM-powered apps — RAG, agents, chains, tool integration

1. PyTorch

PyTorch is an open-source deep learning framework originally developed by Meta AI (Facebook) and now governed by the PyTorch Foundation under the Linux Foundation. It has become the dominant framework in AI research and is increasingly common in production as well. Its core strength is the dynamic computation graph (define-by-run), meaning the network is built on the fly as code executes. This makes PyTorch feel natural and "Pythonic" — you can use standard Python control flow, debug with ordinary tools, and inspect tensors at any point, which is a major advantage over older static-graph frameworks.

At its heart, PyTorch provides Tensor objects (similar to NumPy arrays but able to run on GPUs) and autograd, an automatic differentiation engine that computes gradients for backpropagation. You build models by subclassing torch.nn.Module and defining a forward method, then train them using optimizers from torch.optim and loss functions from torch.nn. Moving work to a GPU is as simple as calling .to("cuda") on tensors and models (for tensors, .to returns a new tensor, so reassign the result).
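The workflow described above can be sketched roughly as follows. The class name TinyRegressor, the layer sizes, the learning rate, and the synthetic data are all illustrative assumptions rather than anything prescribed by PyTorch.

```python
import torch
from torch import nn

torch.manual_seed(0)  # make the random data and initial weights repeatable


class TinyRegressor(nn.Module):  # hypothetical model name, for illustration only
    def __init__(self, in_features: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_features, 8),
            nn.ReLU(),
            nn.Linear(8, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Called by PyTorch when you write model(x)
        return self.net(x)


# Use the GPU when one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

# Synthetic data (an assumption for the demo): y is a noisy linear function of x
x = torch.randn(64, 3, device=device)
true_weights = torch.tensor([[1.0], [-2.0], [0.5]], device=device)
y = x @ true_weights + 0.01 * torch.randn(64, 1, device=device)

model = TinyRegressor().to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
loss_fn = nn.MSELoss()

losses = []
for step in range(100):
    optimizer.zero_grad()        # clear gradients from the previous step
    loss = loss_fn(model(x), y)  # forward pass builds the graph on the fly
    loss.backward()              # autograd computes gradients
    optimizer.step()             # update the parameters
    losses.append(loss.item())
```

Because the graph is built as the code runs, you can put a print statement or a debugger breakpoint anywhere inside forward or the loop and inspect intermediate tensors directly.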

PyTorch's ecosystem is enormous. TorchVision, TorchText, and TorchAudio offer domain-specific datasets, models, and transforms. PyTorch Lightning and Hugging Face Accelerate reduce boilerplate for training loops and distributed training. For deployment, TorchScript serializes models so they can run without a Python interpreter, torch.compile (introduced in PyTorch 2.0) speeds up execution by just-in-time compiling model code, and ONNX export enables cross-platform inference.

The framework dominates academic publishing — the majority of new research papers release PyTorch code — which means cutting-edge architectures appear here first. It scales from a laptop to thousands of GPUs via DistributedDataParallel and FSDP (Fully Sharded Data Parallel) for training massive models like LLMs.

Best for: researchers prototyping novel architectures, teams fine-tuning transformers, and anyone who values flexibility and debuggability. Learning curve: moderate; you write more low-level code than with Keras, but gain full control. It's the safest default choice for serious deep learning work today.

2. TensorFlow / Keras

TensorFlow is Google's open-source machine learning platform, first released in 2015, and one of the most mature, production-hardened frameworks available. Keras, now the official high-level API of TensorFlow (tf.keras), provides a clean, beginner-friendly interface that makes building neural networks remarkably approachable, while TensorFlow underneath handles the heavy computation.

The Keras API lets you build models in a few lines using the Sequential API for simple stacks of layers, or the Functional API for complex, multi-input/multi-output architectures. You compile a model with an optimizer, loss, and metrics, then call .fit(), .evaluate(), and .predict(). The workflow is so intuitive that Keras is often the first deep learning framework newcomers learn.

TensorFlow's biggest differentiator is its end-to-end production ecosystem. TensorFlow Serving deploys models behind high-performance APIs; TensorFlow Lite runs models on mobile and embedded/IoT devices; TensorFlow.js runs them in the browser; and TFX (TensorFlow Extended) provides full MLOps pipelines for data validation, training, and serving at scale. This makes it especially attractive to enterprises that need a reliable path from prototype to deployment.

Historically TensorFlow used static computation graphs, which were fast but hard to debug. Since TensorFlow 2.x, eager execution is the default, closing much of the usability gap with PyTorch while retaining graph optimization via @tf.function. It also integrates tightly with TensorBoard, the gold-standard visualization tool for monitoring training metrics, model graphs, and embeddings.

TensorFlow runs efficiently on CPUs, GPUs, and Google's custom TPUs, giving it an edge for certain large-scale Google Cloud workloads.

Best for: production deployment, mobile/edge inference, and teams wanting a mature, well-supported stack. Learning curve: Keras is one of the easiest entry points to deep learning, though TensorFlow's lower levels can feel more complex than PyTorch. Its research mindshare has declined relative to PyTorch, but it remains a powerful industrial choice.

3. Hugging Face Transformers

Hugging Face Transformers is the de facto standard library for working with pre-trained transformer models and modern large language models (LLMs). Instead of training models from scratch — which can cost millions of dollars — it lets you download thousands of state-of-the-art models from the Hugging Face Hub and use or fine-tune them in just a few lines of code. This democratization of cutting-edge AI is arguably its greatest contribution.

The library supports models across modalities: text (BERT, GPT, Llama, Mistral, T5), vision (ViT, DETR), audio (Whisper, Wav2Vec2), and multimodal models. Its highest-level interface, the pipeline() function, abstracts entire tasks — sentiment analysis, translation, summarization, question answering, image classification, speech recognition — into a single call, handling tokenization, model inference, and post-processing automatically.

For more control, you work directly with AutoModel and AutoTokenizer classes, which automatically load the correct architecture and tokenizer for any model name. Transformers integrates seamlessly with both PyTorch and TensorFlow (and JAX), so you can use whichever backend you prefer. The Trainer API simplifies fine-tuning with built-in support for mixed precision, distributed training, and evaluation, while integration with PEFT (Parameter-Efficient Fine-Tuning) libraries enables techniques like LoRA to fine-tune huge models on modest hardware.

The broader Hugging Face ecosystem amplifies its value: Datasets for efficient data loading, Tokenizers for fast text processing, Accelerate for distributed training, and the Hub itself for sharing models, datasets, and demo Spaces.

Best for: NLP and generative AI tasks, fine-tuning LLMs, and rapidly prototyping with state-of-the-art models without deep ML expertise. Learning curve: low for the pipeline API, moderate for custom fine-tuning. If you're building anything involving language models, chatbots, or transformer-based vision/audio, this is an essential tool.

4. scikit-learn

scikit-learn is the most popular library for classical (non-deep-learning) machine learning in Python. Built on NumPy, SciPy, and matplotlib, it provides clean, well-documented, and consistent implementations of the algorithms that power a huge proportion of real-world ML applications — many problems simply don't need deep learning, and scikit-learn is the go-to tool for them.

It covers the full range of traditional ML tasks. For supervised learning, it offers regression (linear, ridge, lasso) and classification (logistic regression, support vector machines, decision trees, random forests, gradient boosting, k-nearest neighbors). For unsupervised learning, it provides clustering (K-means, DBSCAN, hierarchical), dimensionality reduction (PCA, t-SNE), and anomaly detection. It also includes tools for model selection, hyperparameter tuning, and evaluation.

scikit-learn's defining feature is its consistent, elegant API. Almost every model follows the same pattern: instantiate an estimator, call .fit(X, y) to train, and .predict(X) to make predictions, with .transform() for preprocessing steps. This uniformity makes it easy to swap algorithms and learn new ones. The Pipeline class lets you chain preprocessing and modeling steps into a single reproducible object, preventing common mistakes like data leakage.
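A minimal sketch of the fit/predict pattern and the Pipeline class, assuming the bundled Iris dataset and a logistic regression model purely for illustration (the step names "scale" and "model" are arbitrary labels):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Chain preprocessing and the estimator into one object.
# The scaler is fitted on the training split only, which avoids data leakage.
clf = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])

clf.fit(X_train, y_train)            # train
predictions = clf.predict(X_test)    # predict
accuracy = clf.score(X_test, y_test)
```

Swapping LogisticRegression for another classifier changes one line; the rest of the code stays the same, which is the practical payoff of the uniform API.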

It excels at the practical "glue" work of ML: preprocessing (scaling, encoding categorical variables, imputing missing values), feature selection, cross-validation, and metrics (accuracy, precision, recall, F1, ROC-AUC, etc.). Tools like GridSearchCV and RandomizedSearchCV automate hyperparameter optimization.
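Hyperparameter search can be sketched in the same style. The parameter grid, the SVC estimator, and the 5-fold setting below are assumptions chosen for a small demonstration, not recommended defaults.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

pipe = Pipeline([("scale", StandardScaler()), ("svc", SVC())])

# Parameters of a pipeline step are addressed as "<step name>__<parameter>"
param_grid = {"svc__C": [0.1, 1, 10], "svc__kernel": ["linear", "rbf"]}

# Every combination is evaluated with 5-fold cross-validation
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

best_params = search.best_params_   # winning combination
best_score = search.best_score_     # its mean cross-validated accuracy
```

RandomizedSearchCV has a similar interface but samples a fixed number of combinations, which tends to scale better when the grid is large.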

It is designed for small-to-medium structured/tabular datasets that fit in memory and does not natively support GPUs or deep neural networks — for those, you turn to PyTorch or TensorFlow.

Best for: tabular data problems, baseline models, classical ML, teaching, and rapid experimentation. Learning curve: low — it's often the first ML library people learn. Reliable, stable, and exceptionally well-documented, it remains an indispensable part of nearly every data scientist's toolkit.

5. NumPy

[https://www.youtube.com/watch?v=xECXZ3tyONo]

[https://www.youtube.com/watch?v=VXU4LSAQDSc]

[https://www.youtube.com/watch?v=ceMMJZrAXl8]

NumPy (Numerical Python) is the foundational library for numerical computing in Python and arguably the single most important package in the entire scientific Python ecosystem. Nearly every other AI and data library — pandas, scikit-learn, TensorFlow, PyTorch, SciPy — is built on top of NumPy or interoperates with its array format. Understanding NumPy is essential to understanding how AI computation works under the hood.

Its central object is the ndarray (n-dimensional array), a fast, memory-efficient container for homogeneous numerical data. Unlike Python lists, NumPy arrays store data in contiguous memory blocks and are implemented in C, making operations dramatically faster. NumPy enables vectorization — performing operations on entire arrays at once without explicit Python loops — which is both more concise and orders of magnitude faster. For example, adding two million-element arrays is a single expression that runs at near-C speed.
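To make the vectorization point concrete, here is a small sketch that adds two million-element arrays both ways. The random input data is an assumption for the demo; timing is left to the reader (for example with timeit), since the exact speedup depends on the machine.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random(1_000_000)
b = rng.random(1_000_000)

# Explicit Python loop: one interpreter round-trip per element
looped = np.empty_like(a)
for i in range(a.size):
    looped[i] = a[i] + b[i]

# Vectorized: a single expression, with the loop running in compiled code
vectorized = a + b
```

Both versions produce the same values; the vectorized form is shorter and avoids per-element interpreter overhead.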

NumPy provides a rich set of mathematical capabilities: element-wise arithmetic, broadcasting (automatically aligning arrays of different shapes for operations), linear algebra (matrix multiplication, decompositions, eigenvalues via numpy.linalg), random number generation, Fourier transforms, and statistical functions. Broadcasting in particular is a powerful concept that underpins how tensors are manipulated in deep learning frameworks.
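Broadcasting is easiest to see on a tiny example. The arrays below are made up for illustration; the comments give the shapes involved.

```python
import numpy as np

matrix = np.arange(6, dtype=float).reshape(2, 3)  # shape (2, 3): [[0, 1, 2], [3, 4, 5]]
row = np.array([10.0, 20.0, 30.0])                # shape (3,)
col = np.array([[100.0], [200.0]])                # shape (2, 1)

plus_row = matrix + row  # row is stretched across both rows
plus_col = matrix + col  # col is stretched across all three columns

# Centering each column: the mean over axis 0 has shape (3,) and broadcasts back
centered = matrix - matrix.mean(axis=0)

# Linear algebra via the @ operator and numpy.linalg
A = np.array([[2.0, 0.0], [0.0, 4.0]])
should_be_identity = np.linalg.inv(A) @ A
```

The rule of thumb: shapes are compared from the trailing dimension backwards, and two dimensions are compatible when they are equal or one of them is 1.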

In AI specifically, NumPy is the lingua franca for data exchange: datasets are typically represented as NumPy arrays before being converted into framework-specific tensors. The conventions for axes, reshaping, slicing, and indexing that you use on PyTorch and TensorFlow tensors closely follow NumPy's.

While NumPy itself runs on CPU only, GPU-accelerated drop-in replacements like CuPy mirror its API, and deep learning tensors are deliberately designed to feel NumPy-like.

Best for: all numerical computation, data preprocessing, and as the substrate beneath higher-level libraries. Learning curve: low to moderate. Mastering array operations, broadcasting, and indexing pays dividends across every other AI library you'll ever use.

Here is the complete checklist of every Python and NumPy command used, demonstrated, or mentioned in the tutorial, organized by functional category.

### 1. Library Setup & Environment

```python

# Installing NumPy (run this in a terminal, not in the Python interpreter)

# pip install numpy

# Importing the library under its standard alias

import numpy as np

```

### 2. Array Creation

```python

# Create an array of zeros (defaults to float data type)

np.zeros(shape)

# Create an array of ones (defaults to float data type)

np.ones(shape)

# Create an uninitialized empty array (faster allocation, contains junk data)

np.empty(shape)

# Create evenly spaced numbers over a specified interval (start, stop, num_elements)

np.linspace(2, 10, 5)

# Convert a Python list or nested list into a NumPy array

np.array([1, 2, 3])

# Generate an array of random integers within a range

np.random.randint(low, high, size)

```

### 3. Inspection & Workspace Tips (Jupyter Notebook)

```python

# Check the dimensions/shape of an array

array_name.shape

# Inspect the core data type of the array object itself

type(array_name)

# Inspect the data type of the elements inside the array
# (type(array_name[0]) works for 1-D arrays; array_name.dtype works for any shape)

array_name.dtype

# Jupyter Shortcut: type the array name and a dot, then press Tab to view all available methods

# array_name.<Tab>

# Jupyter Shortcut: View the full documentation/docstring for an array object

?array_name

# Jupyter Shortcut: View documentation specifically for an attribute or method

?array_name.shape

```

### 4. Shape Manipulation & Sorting

```python

# Change the dimensions of an array without changing its data

array_name.reshape(rows, columns)

# Transpose an array (interchanges rows and columns / flips axes)

array_name.T

# Return a sorted copy of an array

np.sort(array_name)

```

### 5. Indexing, Slicing, & Filtering

```python

# Access the element at index 0 (first element)

array_name[0]

# Slice from index 0 up to (but excluding) index 2

array_name[0:2]

# Access the last element of an array

array_name[-1]

# Multi-dimensional slice: Reverse rows, keep columns and color channels intact

photo[::-1, :, :]

# Multi-dimensional slice: Keep rows intact, reverse columns (horizontal mirror)

photo[:, ::-1, :]

# Multi-dimensional slice: Crop a specific bounding box section

photo[50:150, 150:280]

# Multi-dimensional slice: Downsample by skipping every other row and column (step of 2)

photo[::2, ::2]

# Vectorized conditional replacement (condition, value_if_true, value_if_false)

np.where(photo > 100, 255, 0)

```
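The slicing commands above can be tried without an image file. The sketch below substitutes a synthetic 200 x 300 RGB array for `photo` (an assumption, since the tutorial's actual image is not available here) and checks the resulting shapes.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for a loaded image: height 200, width 300, 3 color channels
photo = rng.integers(0, 256, size=(200, 300, 3), dtype=np.uint8)

flipped_vertical = photo[::-1, :, :]    # reverse the row order (upside down)
mirrored = photo[:, ::-1, :]            # reverse the column order (horizontal mirror)
cropped = photo[50:150, 150:280]        # rows 50..149, columns 150..279
downsampled = photo[::2, ::2]           # keep every other row and column
binary = np.where(photo > 100, 255, 0)  # threshold every value to 0 or 255

print(cropped.shape, downsampled.shape)
```

Basic slices like these return views onto the original data rather than copies, so writing into `cropped` would also modify `photo`; call `.copy()` when an independent array is needed.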

### 6. Mathematical & Statistical Methods

```python

# Apply a mathematical function (sine) element-wise across the entire array

np.sin(array_name)

# Sum of all elements in the array

array_name.sum()

# Product of all elements in the array

array_name.prod()

# Mean (average) of all elements

array_name.mean()

# Standard deviation of the data

array_name.std()

# Variance of the data

array_name.var()

# Find the minimum value in the array

array_name.min()

# Find the maximum value in the array

array_name.max()

# Find the index of the minimum value

array_name.argmin()

# Find the index of the maximum value

array_name.argmax()

```

### 7. Operators & Broadcasting

```python

# Scalar math: Adds 30 to every single element

a + 30

# Scalar math: Multiplies every single element by 10

a * 10

# Element-wise math: Adds corresponding elements of two arrays together

a + b

# Element-wise math: Multiplies corresponding elements of two arrays together

a * b

# Matrix math: Computes the matrix product (the dot product when both arrays are 1-D)

a @ b

# Boolean Masking: Returns a boolean array evaluating the condition for each element

a > 3

# Boolean Indexing: Filters the array, returning only elements that evaluate to True

a[a > 3]

```

6. pandas

[https://www.youtube.com/watch?v=mkYBJwX_dMs]

[https://www.youtube.com/watch?v=dcqPhpY7tWk]

[https://www.youtube.com/watch?v=EXIgjIBu4EU]

pandas is the standard library for data manipulation and analysis in Python and a critical part of nearly every AI and machine learning workflow. Real-world data is rarely clean and ready for modeling — it must be loaded, explored, cleaned, transformed, and engineered into useful features first. pandas is the tool that handles this crucial "data wrangling" stage, which often consumes the majority of a data scientist's time.

Its two core data structures are the Series (a one-dimensional labeled array) and the DataFrame (a two-dimensional labeled table, conceptually like a spreadsheet or SQL table). DataFrames allow you to work with heterogeneous columns — numbers, strings, dates, categories — under intuitive row and column labels. Built on top of NumPy, pandas combines NumPy's speed with far more flexible, labeled, real-world data handling.

pandas reads and writes a wide variety of formats out of the box: CSV, Excel, JSON, SQL databases, Parquet, and more, making data ingestion straightforward. Once data is loaded, it offers an enormous toolkit: filtering and selecting rows/columns (.loc, .iloc, boolean masks), handling missing data (dropna, fillna), grouping and aggregation (groupby for split-apply-combine operations), merging and joining datasets, pivoting and reshaping, and time-series functionality with powerful date/time handling.
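A short sketch of several of these operations on a small hand-made DataFrame. The column names and values are invented for illustration; in practice the data would usually come from pd.read_csv or a similar reader.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "city": ["Oslo", "Oslo", "Lima", "Lima", "Pune"],
    "temp_c": [4.0, np.nan, 19.0, 21.0, 30.0],
    "sales": [10, 12, 7, 9, 15],
})

# Handle missing data: replace NaN with the column mean
df["temp_c"] = df["temp_c"].fillna(df["temp_c"].mean())

# Label-based selection with a boolean mask, and position-based selection
warm = df.loc[df["temp_c"] > 15, ["city", "sales"]]
first_two = df.iloc[:2]

# Split-apply-combine: total sales per city
sales_by_city = df.groupby("city")["sales"].sum()

# Join with a lookup table
regions = pd.DataFrame({
    "city": ["Oslo", "Lima", "Pune"],
    "region": ["Europe", "Americas", "Asia"],
})
merged = df.merge(regions, on="city", how="left")
```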

For AI specifically, pandas is where feature engineering happens — creating new columns, encoding categories, binning values, computing rolling statistics — before data is converted to NumPy arrays or tensors and fed into scikit-learn, PyTorch, or TensorFlow. It also integrates with visualization libraries (matplotlib, seaborn) for exploratory data analysis (EDA).
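The feature engineering steps mentioned above might look like this in practice. The dataset, the bin edges, and the new column names are assumptions made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "day": pd.date_range("2024-01-01", periods=6, freq="D"),
    "plan": ["free", "pro", "free", "team", "pro", "free"],
    "age": [19, 34, 52, 41, 27, 65],
    "visits": [3, 5, 2, 8, 6, 4],
})

# New column derived from a date
df["is_weekend"] = df["day"].dt.dayofweek >= 5

# Binning a numeric column into labeled ranges
df["age_group"] = pd.cut(
    df["age"], bins=[0, 30, 50, 120], labels=["young", "mid", "senior"]
)

# Rolling statistic over the previous three rows (first two rows are NaN)
df["visits_roll3"] = df["visits"].rolling(window=3).mean()

# One-hot encode a categorical column
encoded = pd.get_dummies(df, columns=["plan"], dtype=int)

# Hand the numeric features to NumPy, ready for scikit-learn or a tensor library
features = encoded.drop(columns=["day", "age_group"]).to_numpy(dtype=float)
```

The NaN values left by the rolling window would normally be dropped or imputed before training.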

A limitation is that pandas works in-memory and can struggle with very large datasets; alternatives like Polars, Dask, or Spark address bigger-than-memory data.

Best for: data cleaning, exploration, and feature engineering on tabular data. Learning curve: low to moderate. It's an indispensable everyday tool for anyone doing data science or ML.

7. LangChain

LangChain is a popular open-source framework for building applications powered by large language models (LLMs). While libraries like Hugging Face focus on the models themselves, LangChain operates a layer above — it helps you orchestrate LLMs into complete, useful applications such as chatbots, question-answering systems, autonomous agents, and retrieval-augmented generation (RAG) pipelines. It emerged as one of the central tools of the generative AI application boom.

Its core idea is composability: connecting LLMs to other components and data sources to overcome their limitations. A raw LLM is stateless and only knows its training data, but LangChain lets you augment it. The LangChain Expression Language (LCEL) lets you compose these components into "chains" — pipelines where the output of one step feeds the next — using a clean, declarative syntax.

Key capabilities include: prompt templates for structuring and reusing prompts; memory to maintain conversation history across turns; document loaders and text splitters for ingesting PDFs, web pages, and databases; and integrations with vector stores (Pinecone, Chroma, FAISS) and embedding models to enable RAG, where the model retrieves relevant external documents to answer questions accurately and reduce hallucination.
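The retrieval step at the core of RAG can be illustrated without any framework. The sketch below is plain NumPy, not LangChain's API: the three-dimensional "embeddings" are invented by hand, whereas a real system would obtain them from an embedding model and store them in a vector store.

```python
import numpy as np

documents = [
    "PyTorch uses dynamic computation graphs.",
    "pandas DataFrames hold labeled tabular data.",
    "LangChain connects LLMs to external data.",
]

# Made-up embedding vectors, one row per document (hypothetical values)
doc_vectors = np.array([
    [0.9, 0.1, 0.0],
    [0.1, 0.9, 0.1],
    [0.0, 0.2, 0.9],
])

query = "How do I give an LLM access to my documents?"
query_vector = np.array([0.1, 0.1, 0.95])  # also made up


def cosine_similarity(matrix: np.ndarray, vector: np.ndarray) -> np.ndarray:
    """Cosine similarity between each row of matrix and vector."""
    return (matrix @ vector) / (
        np.linalg.norm(matrix, axis=1) * np.linalg.norm(vector)
    )


scores = cosine_similarity(doc_vectors, query_vector)
best = int(np.argmax(scores))  # index of the most similar document
retrieved = documents[best]

# The retrieved text is placed into the prompt that would be sent to the LLM
prompt = f"Answer using only this context:\n{retrieved}\n\nQuestion: {query}"
```

A framework like LangChain wraps these steps (loading, splitting, embedding, storing, retrieving, prompting) behind reusable components, so the orchestration code stays small.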

LangChain is best known for agents — systems where the LLM decides which tools to use (web search, calculators, APIs, code execution) to accomplish a multi-step task, reasoning iteratively about what action to take next. The companion product LangSmith provides observability, debugging, and evaluation for these often-unpredictable LLM applications, while LangGraph enables more controllable, stateful, graph-based agent workflows.

A common critique is that LangChain's abstractions can feel heavy or change rapidly; some developers prefer lighter alternatives like LlamaIndex (RAG-focused) or direct API calls.

Best for: prototyping and building LLM apps, RAG systems, and agents. Learning curve: moderate, given its breadth and evolving API. It remains a leading choice for connecting LLMs to real-world data and actions.

Differences Between Libraries and Frameworks in Python

Note: the line between libraries and frameworks is often blurry. Many tools have both library-like and framework-like parts. PyTorch is usually used as a library (you write your own training loop), but PyTorch Lightning wraps it into a framework (it runs the loop and calls your hooks).

Other Practical Differences

Learning curve: Libraries are usually easier to adopt incrementally; frameworks require learning their conventions and structure upfront.

Lock-in: Frameworks impose more architecture, so switching later is harder. Libraries are easier to swap out.

Combining them: You can use many libraries together freely, but it's hard to use two full frameworks for the same job (they both want to be "in charge").

Examples beyond AI:

Libraries: requests, NumPy, pandas, Pillow, BeautifulSoup

Frameworks: Django, Flask, FastAPI, Scrapy, PyTorch Lightning

In one sentence: A library is a set of tools you reach for when you need them; a framework is a pre-built structure you build your application inside of, and it decides when to run your code.

Libraries vs. Frameworks in Python

Both are reusable collections of pre-written code, but the key difference is who is in control — a concept called Inversion of Control.

The Core Distinction

Aspect	Library	Framework
Control	You call the library	The framework calls your code
Analogy	A toolbox — you pick the tool when you need it	A house blueprint — you fill in the rooms, but the structure is fixed
Flow of control	Your code drives the program	The framework drives; your code plugs into it
Flexibility	High — use it however you like	Lower — follow the framework's rules and structure
Scope	Solves one specific problem	Provides a full skeleton for an entire application
"Don't call us, we'll call you"	❌ You call it	✅ It calls you

The Simplest Way to Remember It

You call a library. A framework calls you.

With a library, your code is in charge. You decide when and how to use its functions.
With a framework, the framework is in charge. It provides the overall structure and calls your code at the right moments (via callbacks, hooks, or methods you implement).

Code Examples

Library (NumPy) — you are in control, calling functions when you choose:

```python
import numpy as np

# YOU decide to call NumPy, when and how
data = np.array([1, 2, 3])
result = np.mean(data)   # you invoke the library
print(result)
```

Framework (Flask), where the framework controls the flow and calls your code:

```python
from flask import Flask

app = Flask(__name__)

# YOU write this function, but the FRAMEWORK decides
# when to call it (when a request hits "/")
@app.route("/")
def home():
    return "Hello World"

if __name__ == "__main__":
    app.run()   # you hand control over to Flask
```

Notice: in Flask you never call home() yourself — Flask calls it for you when a web request arrives. That's inversion of control.
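To see the mechanism without a web server, here is a toy framework in plain Python. MiniFramework and its handle method are hypothetical names invented for this sketch; it imitates the decorator-based registration style but is not how Flask is actually implemented.

```python
class MiniFramework:
    """Toy stand-in for a web framework (hypothetical, for illustration)."""

    def __init__(self):
        self._routes = {}  # maps a path to the user's handler function

    def route(self, path):
        """Decorator that registers a handler for a path."""
        def decorator(func):
            self._routes[path] = func
            return func
        return decorator

    def handle(self, path):
        """The framework, not the user, decides when a handler runs."""
        handler = self._routes.get(path)
        if handler is None:
            return "404 Not Found"
        return handler()


app = MiniFramework()


@app.route("/")
def home():
    return "Hello World"


# Simulate incoming requests; the framework looks up and calls home() itself
response = app.handle("/")
missing = app.handle("/nope")
print(response, "|", missing)
```

The user code only registers `home`; every call to it happens inside `handle`, which is the inversion of control in miniature.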

Mapping This to the AI Libraries We Discussed

Tool	Technically a...	Why
NumPy	Library	You call its functions directly
pandas	Library	You call methods on DataFrames
scikit-learn	Library	You call `.fit()`, `.predict()`
PyTorch	Library (leans toward framework)	Mostly you control the training loop
TensorFlow/Keras	Framework-like	`model.fit()` runs the loop and calls your callbacks
Hugging Face Transformers	Library	You call pipelines and models
LangChain	Framework	It orchestrates and calls your chains/agents

Artificial Intelligence Theory and Application
