As I dive into the world of Artificial Intelligence, stepping away from my daily work as an SAP Full Stack Developer, I've realized I need to start at the very beginning. Today is quite literally my first day learning Machine Learning. Before getting lost in the hype of LLMs or complex neural networks, I wanted to strip away the magic and understand the absolute foundation.
For me, that meant looking at the very first equation of Machine Learning: Linear Regression.
The Mathematical Foundation
At its core, linear regression attempts to model the relationship between a scalar response and one or more explanatory variables by fitting a linear equation to observed data.
When I looked at Simple Linear Regression (a single input variable), the hypothesis function is defined simply as:

$$\hat{y} = wx + b$$

Where:
- $\hat{y}$ is our predicted output.
- $x$ is our input feature.
- $w$ is the weight (the slope).
- $b$ is the bias (the intercept).
To find the optimal values for $w$ and $b$, I needed a way to measure how far off my predictions were from the actual true values in my dataset. In machine learning, we do this using a Cost Function. The most standard one for this use case is the Mean Squared Error (MSE):

$$J(w, b) = \frac{1}{2n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2$$

(The extra factor of $\tfrac{1}{2}$ is a common convention: it cancels the 2 that falls out of the derivative.)
Once we have the cost, we use Gradient Descent to iteratively update our parameters and minimize that cost. We calculate the partial derivatives (the gradients) with respect to $w$ and $b$:

$$\frac{\partial J}{\partial w} = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)\,x_i \qquad \frac{\partial J}{\partial b} = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)$$

and then step each parameter against its gradient:

$$w := w - \alpha \frac{\partial J}{\partial w} \qquad b := b - \alpha \frac{\partial J}{\partial b}$$

Here, $\alpha$ is our learning rate, dictating how big a step we take in the direction of steepest descent.
Implementation in Vanilla Rust
I could have easily used Python with PyTorch or Scikit-Learn to do this in three lines of code. But to truly grasp the mechanics, I decided to write it from scratch using a systems language. Rust forces me to be explicit about memory and types, which I find incredibly helpful for building solid mental models.
Here is my pure Rust implementation, using nothing but standard vectors:
```rust
fn main() {
    // Training data: roughly y ≈ 2x, with a little noise
    let xs: Vec<f64> = vec![1.0, 2.0, 3.0, 4.0, 5.0];
    let ys: Vec<f64> = vec![2.0, 4.0, 5.5, 8.0, 10.5];
    let n = xs.len() as f64;

    // Parameters, initialized to zero
    let mut w: f64 = 0.0;
    let mut b: f64 = 0.0;

    let learning_rate: f64 = 0.01;
    let epochs: usize = 1000;

    for epoch in 0..epochs {
        let mut dw = 0.0;
        let mut db = 0.0;
        let mut cost = 0.0;

        // Accumulate gradients and cost over the whole dataset
        // (batch gradient descent)
        for i in 0..xs.len() {
            let prediction = w * xs[i] + b;
            let error = prediction - ys[i];
            dw += error * xs[i];
            db += error;
            cost += error * error;
        }
        dw /= n;
        db /= n;
        cost /= 2.0 * n; // MSE with the 1/2 convention

        // Step against the gradient
        w -= learning_rate * dw;
        b -= learning_rate * db;

        if epoch % 200 == 0 {
            println!("Epoch {:>4} | Cost: {:.4} | w: {:.4}, b: {:.4}", epoch, cost, w, b);
        }
    }

    println!("Final Equation: y = {:.4}x + {:.4}", w, b);

    // Predict on an unseen input
    let test_x = 6.0;
    let pred_y = w * test_x + b;
    println!("Prediction for x={}: {:.4}", test_x, pred_y);
}
```
By building it from scratch in Rust, I forced myself to understand the raw math before hiding it behind a library. It’s a small step, but it demystifies the learning process. It builds the intuition I know I’ll need as I continue this journey into more complex models. Not bad for day one.