Posts

Python Libraries and Frameworks

  # | Library | Best for
  1 | PyTorch | The dominant deep learning framework: research and production, dynamic graphs, huge ecosystem
  2 | TensorFlow / Keras | Deep learning with strong production tooling (TF Serving, TF Lite); Keras gives a clean high-level API
  3 | Hugging Face Transformers | Pre-trained LLMs and transformer models (text, vision, audio); download and fine-tune SOTA models
  4 | scikit-learn | Classical ML: regression, classification, clustering, preprocessing pipelines
  5 | NumPy | Foundational array/tensor math that nearly every other library is built on
  6 | pandas | Data loading, cleaning, and manipulation; the backbone of any ML data pipeline
  7 | LangChain | Building LLM-powered apps: RAG, agents, chains, tool integration...
Recent posts

Data Distributions

  Introduction

  Data is everywhere, but raw numbers alone tell us very little. To make sense of data, statisticians use probability distributions: mathematical patterns that describe how likely each value is to appear. Whether you're flipping a coin, measuring heights, counting website visitors, or predicting waiting times, there is usually a distribution that fits. Understanding these patterns helps data scientists, analysts, and curious learners spot trends, test ideas, and build smarter models.

  In this post, we'll explore nine essential distributions every data enthusiast should know, from the famous bell curve to the lesser-known Beta and Log-Normal, explained simply and with real-world examples. The nine are: Normal, Bernoulli, Binomial, Poisson, Exponential, Gamma, Beta, Uniform, and Log-Normal. Each is explained below.

  1. Normal Distribution

  The Normal Distrib...

ROC and AUC Explained

This StatQuest video by Josh Starmer gives a clear explanation of ROC (Receiver Operating Characteristic) curves and AUC (Area Under the Curve), two tools for evaluating the performance of classification models (such as logistic regression). See: https://www.youtube.com/watch?v=4jRBRDbJemM

Explanation in Words

1. The Problem: Choosing a Threshold

When a machine learning model makes a prediction (e.g., "Is this mouse obese?"), it usually outputs a probability (e.g., "There is a 0.8 chance this mouse is obese"). To make a final decision, you must choose a threshold.

Standard Threshold (0.5): If the probability is greater than 0.5, classify as Obese.

Low Threshold (e.g., 0.1): You classify almost everyone as Obese. You catch nearly all of the actual cases (high sensitivity), but you also wrongly flag many non-obese mice (many false positives). This is useful for dangerous diseases like Ebola, where you can't afford to miss a case.

High Threshold (e.g., 0.9): You are very stric...
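The threshold trade-off described above can be sketched in a few lines of plain Python. This is a minimal illustration, not code from the video: the probabilities and labels are made-up values, and `rates_at` is a hypothetical helper name. It counts how many actual positives are caught (the true positive rate, i.e. sensitivity) and how many negatives are wrongly flagged (the false positive rate) at a given threshold.

```python
# Hypothetical data (not from the video): predicted probability of "obese"
# for eight mice, and the true label for each (1 = obese, 0 = not obese).
probs = [0.05, 0.2, 0.35, 0.45, 0.6, 0.7, 0.85, 0.95]
labels = [0, 0, 1, 0, 1, 0, 1, 1]


def rates_at(threshold, probs, labels):
    """Return (true_positive_rate, false_positive_rate) at one threshold."""
    tp = fp = fn = tn = 0
    for p, y in zip(probs, labels):
        predicted = 1 if p > threshold else 0  # classify as obese above the threshold
        if predicted == 1 and y == 1:
            tp += 1  # obese mouse correctly flagged
        elif predicted == 1 and y == 0:
            fp += 1  # non-obese mouse wrongly flagged
        elif predicted == 0 and y == 1:
            fn += 1  # obese mouse missed
        else:
            tn += 1  # non-obese mouse correctly left alone
    return tp / (tp + fn), fp / (fp + tn)


for threshold in (0.1, 0.5, 0.9):
    tpr, fpr = rates_at(threshold, probs, labels)
    print(f"threshold={threshold}: TPR={tpr:.2f}, FPR={fpr:.2f}")
```

On this toy data, lowering the threshold raises both rates and raising it lowers both. Each threshold therefore gives one (FPR, TPR) point, and an ROC curve is the set of those points plotted across all thresholds.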