Tuesday, October 11, 2016

Stock Beta Computation using Linear Regression with MATLAB

In this article I show you how to compute stock beta using linear regression. The dataset is CAPMuniverse available in MATLAB.

The CAPM model:

$$R(k,i) = a(i) + C(k) + b(i) * (M(k) - C(k)) + V(k,i)$$

for samples k = 1, ... , m and assets i = 1, ... , n, where a(i) is a parameter that specifies the non-systematic return of an asset, b(i) is the asset beta, and V(k,i) is the residual error for each asset with associated random variable V(i). Asset alphas a(1), ... , a(n) are zeros in strict form of CAPM but non-zeros in practice.

The MATLAB dataset CAPMuniverse contains daily total return data from 03-Jan-2000 to 07-Nov-2005 for 12 stocks: 'AAPL', 'AMZN', 'CSCO', 'DELL', 'EBAY', 'GOOG', 'HPQ', 'IBM', 'INTC', 'MSFT', 'ORCL', 'YHOO'. Columns 13 and 14 hold the daily returns of the market and of the risk-free rate, respectively. To compute the beta of a stock, subtract the risk-free rate from the stock returns to get y, and from the market returns to get x. You then need to prepend a column of ones to x to form X, so that X has size m x 2 and the hypothesis is
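As a sketch, the data can be loaded and the design matrix assembled as below. The variable names `Data` and `Assets` follow the dataset's documented contents, but check the Mathworks page for your toolbox version:

```matlab
load CAPMuniverse               % provides Assets, Data, Dates (Financial Toolbox)

i = 12;                         % column index of the stock, e.g. 12 for YHOO
r  = Data(:, i)  - Data(:, 14); % excess return of the stock (risk-free rate in column 14)
rm = Data(:, 13) - Data(:, 14); % excess return of the market (column 13)

% drop rows with missing observations (e.g. GOOG has NaNs before its IPO)
valid = ~isnan(r) & ~isnan(rm);
y = r(valid);
x = rm(valid);

m = length(y);
X = [ones(m, 1) x];             % m x 2 design matrix: intercept column plus market excess return
```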

$$h_\theta(x)=\theta^{T}x=\theta_0+\theta_1x$$

More information regarding the dataset can be seen on Mathworks website. You will use regression to find the betas of these securities.

To compute the cost

$$J(\theta_0,\theta_1)=\frac{1}{2m}\sum\limits_{i=1}^m \left(h_\theta(x^{(i)})-y^{(i)}\right)^2$$

The function computeCost() is as follows:

function cost = computeCost(X,y,theta)
    m = length(y);               % number of samples
    h = X*theta;                 % hypothesis: predicted excess returns, m x 1
    cost = 1/2/m*sum((h-y).^2);  % mean squared error, halved
end
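As a quick sanity check: with θ = [0; 0] the hypothesis is identically zero, so the cost reduces to the mean of the squared excess returns divided by two (X and y as built earlier):

```matlab
theta0 = zeros(2, 1);
cost0 = computeCost(X, y, theta0);   % equals sum(y.^2)/(2*m) when theta is zero
```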

The parameters θ are updated with the pseudo-code:

Repeat until the maximum number of iterations {

Do the following for all thetas:

$$\theta_j := \theta_j-\frac{\alpha}{m} \sum\limits_{i=1}^m \left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)}$$

where $j=0,\dots,n$ (here $n=1$, so only $\theta_0$ and $\theta_1$ are updated)

}

The cost can be computed within the loop as well.

You need to update all the parameters $θ_j$ simultaneously. The function optimizeCost(), with its output variables and input parameters, is as follows:

function [theta,cost_range] = optimizeCost(X,y,theta,step,maxrun)
    m = length(y);
    cost_range = zeros(maxrun,1);     % cost history, one entry per iteration
    for iter = 1:maxrun
        h = X*theta;                  % hypothesis at the current theta
        grad = 1/m * (h-y)' * X;      % gradient, 1 x 2; updates all thetas simultaneously
        theta = theta - step * grad';
        cost_range(iter) = 1/2/m*sum((h-y).^2);  % cost before this iteration's update
    end
end
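A typical call looks like the following. The learning rate and iteration count here are assumptions, not values from the article; tune them until the cost curve flattens:

```matlab
step = 0.1;            % learning rate alpha (assumed; tune as needed)
maxrun = 1500;         % number of gradient-descent iterations (assumed)
theta = zeros(2, 1);   % initialize alpha and beta estimates at zero
[theta, cost_range] = optimizeCost(X, y, theta, step, maxrun);
```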

Plotting the regression line:
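A minimal plotting sketch, assuming x, y, and a fitted theta as above:

```matlab
figure;
plot(x, y, 'b.');                    % daily excess returns of the stock vs the market
hold on;
plot(x, [ones(length(x),1) x] * theta, 'r-', 'LineWidth', 1.5);  % fitted regression line
xlabel('Market excess return');
ylabel('Stock excess return');
legend('Data', 'Regression line', 'Location', 'best');
hold off;
```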


For example, column 12 holds the return data for Yahoo (YHOO), and the fitted $θ_0$ and $θ_1$ are

theta =

    0.0001
    1.6543

where $θ_0$ and $θ_1$ are the alpha and beta values of YHOO computed by regression.

Plotting the cost against the number of iterations shows that the cost decreases steadily and levels off as the iterations proceed, confirming that gradient descent is converging.
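This convergence plot can be produced from the cost_range returned by optimizeCost():

```matlab
figure;
plot(1:length(cost_range), cost_range, 'b-');
xlabel('Iteration');
ylabel('Cost J(\theta)');
title('Cost vs. number of iterations');
```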