The First Equation of Machine Learning: Linear Regression in Vanilla Rust

A deep dive into the fundamental equation of Machine Learning, exploring the math behind Linear Regression and implementing it in Rust from scratch.


As I dive into the world of Artificial Intelligence, moving away from my daily work as an SAP Full Stack Developer, I've realized I need to start at the very beginning. Today is quite literally my first day learning Machine Learning. Before getting lost in the hype of LLMs or complex neural networks, I wanted to strip away the magic and understand the absolute foundation.

For me, that meant looking at the very first equation of Machine Learning: Linear Regression.

The Mathematical Foundation

At its core, linear regression attempts to model the relationship between a scalar response and one or more explanatory variables by fitting a linear equation to observed data.

When I looked at Simple Linear Regression (a single variable), the hypothesis function is defined simply as:

y = wx + b

Where:

  • y is our predicted output.
  • x is our input feature.
  • w is the weight (the slope).
  • b is the bias (the intercept).

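In code, the hypothesis really is just that line. A minimal sketch (the `predict` helper and its sample values are mine, not part of the post's program):

```rust
// Hypothesis: predict y for a given input x using weight w and bias b.
fn predict(w: f64, b: f64, x: f64) -> f64 {
    w * x + b
}

fn main() {
    // With w = 2.0 and b = 1.0, the line is y = 2x + 1.
    let y = predict(2.0, 1.0, 3.0);
    println!("{}", y); // prints 7
}
```
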
To find the optimal values for w and b, I needed a way to measure how far off my predictions were from the actual true values in my dataset. In machine learning, we do this using a Cost Function. The most standard one for this use case is the Mean Squared Error (MSE):

J(w, b) = \frac{1}{2N} \sum_{i=1}^{N} (y^{(i)} - (wx^{(i)} + b))^2

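That cost is only a few lines of Rust. A minimal sketch using the same 1/(2N) convention (the `mse` function and its toy data are illustrative assumptions, not the post's code):

```rust
// Mean Squared Error with the 1/(2N) convention from the formula above.
fn mse(w: f64, b: f64, xs: &[f64], ys: &[f64]) -> f64 {
    let n = xs.len() as f64;
    let sum: f64 = xs
        .iter()
        .zip(ys)
        .map(|(x, y)| {
            let err = y - (w * x + b); // residual: actual minus predicted
            err * err
        })
        .sum();
    sum / (2.0 * n)
}

fn main() {
    let xs = [1.0, 2.0, 3.0];
    let ys = [2.0, 4.0, 6.0];
    // A perfect fit (w = 2, b = 0) should give zero cost.
    println!("{}", mse(2.0, 0.0, &xs, &ys)); // prints 0
}
```
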
Once we have the cost, we use Gradient Descent to iteratively update our parameters and minimize that cost. We calculate the partial derivatives (the gradients) with respect to w and b:

w = w - \alpha \frac{\partial J}{\partial w} = w - \alpha \frac{1}{N} \sum_{i=1}^{N} (wx^{(i)} + b - y^{(i)})x^{(i)}

b = b - \alpha \frac{\partial J}{\partial b} = b - \alpha \frac{1}{N} \sum_{i=1}^{N} (wx^{(i)} + b - y^{(i)})

Here, α is our learning rate, dictating how big a step we take in the direction of the steepest descent.
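
Those two update rules translate almost symbol-for-symbol into code. A minimal sketch of a single gradient-descent step, isolated from the training loop (the `gradient_step` function and its tiny dataset are mine, for illustration):

```rust
// One gradient-descent update for (w, b), following the update rules above.
fn gradient_step(w: f64, b: f64, xs: &[f64], ys: &[f64], alpha: f64) -> (f64, f64) {
    let n = xs.len() as f64;
    let mut dw = 0.0;
    let mut db = 0.0;
    for (x, y) in xs.iter().zip(ys) {
        let error = (w * x + b) - y; // (wx + b - y)
        dw += error * x;             // partial derivative term for w
        db += error;                 // partial derivative term for b
    }
    // Average the sums (the 1/N factor) and step against the gradient.
    (w - alpha * dw / n, b - alpha * db / n)
}

fn main() {
    let xs = [1.0, 2.0];
    let ys = [2.0, 4.0];
    // Starting from w = b = 0 with alpha = 0.1.
    let (w, b) = gradient_step(0.0, 0.0, &xs, &ys, 0.1);
    println!("after one step: w = {:.2}, b = {:.2}", w, b); // w = 0.50, b = 0.30
}
```
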

Implementation in Vanilla Rust

I could have easily used Python with PyTorch or Scikit-Learn to do this in three lines of code. But to truly grasp the mechanics, I decided to write it from scratch using a systems language. Rust forces me to be explicit about memory and types, which I find incredibly helpful for building solid mental models.

Here is my pure Rust implementation, using nothing but standard vectors:

fn main() {
    let xs: Vec<f64> = vec![1.0, 2.0, 3.0, 4.0, 5.0];
    let ys: Vec<f64> = vec![2.0, 4.0, 5.5, 8.0, 10.5];
    let n = xs.len() as f64;

    let mut w: f64 = 0.0;
    let mut b: f64 = 0.0;
    
    let learning_rate: f64 = 0.01;
    let epochs: usize = 1000;

    for epoch in 0..epochs {
        // Accumulate the gradient sums and the squared-error cost.
        let mut dw = 0.0;
        let mut db = 0.0;
        let mut cost = 0.0;

        for i in 0..xs.len() {
            let prediction = w * xs[i] + b;
            let error = prediction - ys[i];

            dw += error * xs[i]; // contribution to dJ/dw: (wx + b - y) * x
            db += error;         // contribution to dJ/db: (wx + b - y)
            cost += error * error;
        }

        // Average the sums: mean gradients and the 1/(2N) MSE cost.
        dw /= n;
        db /= n;
        cost /= 2.0 * n;

        // Step against the gradient.
        w -= learning_rate * dw;
        b -= learning_rate * db;

        if epoch % 200 == 0 {
            println!("Epoch {:>4} | Cost: {:.4} | w: {:.4}, b: {:.4}", epoch, cost, w, b);
        }
    }

    println!("Final Equation: y = {:.4}x + {:.4}", w, b);
    
    let test_x = 6.0;
    let pred_y = w * test_x + b;
    println!("Prediction for x={}: {:.4}", test_x, pred_y);
}
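
As a sanity check on what gradient descent converges to (this isn't in the program above), simple linear regression also has a closed-form least-squares solution: w = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b = ȳ − w·x̄. A minimal sketch computing it for the same dataset:

```rust
// Closed-form ordinary-least-squares fit for simple linear regression:
// w = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2), b = mean_y - w * mean_x.
fn ols_fit(xs: &[f64], ys: &[f64]) -> (f64, f64) {
    let n = xs.len() as f64;
    let mean_x = xs.iter().sum::<f64>() / n;
    let mean_y = ys.iter().sum::<f64>() / n;
    let num: f64 = xs
        .iter()
        .zip(ys)
        .map(|(x, y)| (x - mean_x) * (y - mean_y))
        .sum();
    let den: f64 = xs.iter().map(|x| (x - mean_x) * (x - mean_x)).sum();
    let w = num / den;
    (w, mean_y - w * mean_x)
}

fn main() {
    // Same dataset as the gradient-descent program.
    let xs = [1.0, 2.0, 3.0, 4.0, 5.0];
    let ys = [2.0, 4.0, 5.5, 8.0, 10.5];
    let (w, b) = ols_fit(&xs, &ys);
    println!("closed form: w = {:.4}, b = {:.4}", w, b);
}
```

With enough epochs, the trained w and b should land very close to this analytical answer, which is a nice way to confirm the loop is doing the right thing.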

By building it from scratch in Rust, I forced myself to understand the raw math before hiding it behind a library. It’s a small step, but it demystifies the learning process. It builds the intuition I know I’ll need as I continue this journey into more complex models. Not bad for day one.