The following describes the attributes, which are drawn from the companies' income statements and balance sheets.
● Size
• Sales
● Profit
• ROCE: profit before tax / capital employed (%)
• FFTL: funds flow (earnings before interest, tax & depreciation) / total liabilities
● Gearing
• GEAR: (current liabilities + long-term debt) / total assets
• CLTA: current liabilities / total assets
● Liquidity
• CACL: current assets / current liabilities
• QACL: (current assets - stock) / current liabilities
• WCTA: (current assets - current liabilities) / total assets
• LAG: number of days between the accounting year end and the date the annual report and accounts were filed at the company registry.
• AGE: number of years the company has been operating since its incorporation date.
• CHAUD: coded 1 if the company changed auditor in the previous three years, 0 otherwise.
• BIG6: coded 1 if the company's auditor is a Big6 auditor, 0 otherwise.
The target variable is FAIL, coded 1 or 0. The model is built and fitted in MATLAB using logistic regression.
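For reference, logistic regression models the probability of failure as the sigmoid of a linear combination of the attributes:
$$h_\theta(x)=\frac{1}{1+e^{-\theta^{T}x}},\qquad P(\mathrm{FAIL}=1\mid x)=h_\theta(x)$$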
First the data set is read in from sheet Sheet1 of the xls file. X is normalized (z-scored column by column) because the ranges of the variables differ greatly; a manual alternative to normalize() is sketched after the listing.
[data,txt,raw] = xlsread('bankruptcy.xls','Sheet1');
X = data(:,1:12);      % the 12 predictor attributes
X = normalize(X);      % z-score each column (normalize's default method)
y = data(:,13);        % target variable FAIL
[m,n] = size(X);
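The normalize() call requires a fairly recent MATLAB release (R2018a or later). A minimal manual z-scoring, equivalent to the default behavior of normalize(), would be:
mu = mean(X);                  % column means
sigma = std(X);                % column standard deviations
X = (X - mu) ./ sigma;         % implicit expansion (R2016b+); use bsxfun on older releases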
Then a column of ones is added to X, corresponding to the intercept parameter θ0, and theta is initialized to a vector of zeros.
X = [ones(m,1) X];
theta = zeros(n+1,1);
The function fminunc is then called to search for the parameters that minimize the cost (an unconstrained optimization), with the cost and its gradient supplied by the function computeCost(). Since computeCost() also takes a regularization weight lambda, a value has to be defined and passed through; the value below is only illustrative.
lambda = 1;                                     % regularization weight (illustrative value)
options = optimset('GradObj', 'on', 'MaxIter', 100);
[theta,cost] = fminunc(@(t)(computeCost(t,X,y,lambda)),theta,options);
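Once fminunc returns theta, the fitted model can be scored against the training set. A minimal sketch (the 0.5 threshold and the accuracy measure are illustrative choices, not part of the original listing):
p = sigmoid(X * theta) >= 0.5;              % predicted class: 1 = FAIL
accuracy = mean(p == y) * 100;              % training-set accuracy in percent
fprintf('Training accuracy: %.1f%%\n', accuracy);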
The function computeCost()
function [cost,grad] = computeCost(theta,X,y,lambda)
implements the cost function$$J(\theta)=\frac{1}{m}\sum\limits_{i=1}^m\left[-y^{(i)}\log\left(h_\theta(x^{(i)})\right)-\left(1-y^{(i)}\right)\log\left(1-h_\theta(x^{(i)})\right)\right]+\frac{\lambda}{2m}\sum\limits_{j=1}^n\theta_j^2$$
and the gradient
$$\frac{\partial J(\theta)}{\partial \theta_j}=\frac{1}{m}\sum\limits_{i=1}^m\left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)}+\frac{\lambda}{m}\theta_j\qquad (j\ge 1)$$
which are stored in the variables cost and grad and returned to the calling function. For j = 0 the regularization term is dropped.
The two expressions above are implemented by the lines below. Note that theta(1) must be excluded from the regularization terms, as it is the parameter of x0 = 1:
z = X*theta;                                  % linear combination of the features
h = sigmoid(z);                               % hypothesis: predicted probability of FAIL
% theta(1) is replaced by 0 so the intercept is not regularized
grad = (1/m * (h-y)' * X) + lambda/m * [0; theta(2:end)]';
cost = 1/m * sum(-y .* log(h) - (1-y) .* log(1-h)) + lambda/(2*m) * sum(theta(2:end).^2);
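The sigmoid() helper called above is not shown in the listing; a minimal version, assumed here, is:
function g = sigmoid(z)
% element-wise logistic function
g = 1 ./ (1 + exp(-z));
end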
Performance improves considerably if the data are first projected into a higher-dimensional feature space (the explanation will follow in another post); one illustrative mapping is sketched below.
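As an illustration only (the projection actually used is deferred to the follow-up post), one simple way to lift the attributes into a higher-dimensional space is to append their squares and pairwise products before adding the column of ones. Here X0 is a hypothetical name for the matrix of the 12 normalized attributes:
X0 = normalize(data(:,1:12));                 % the original normalized attributes
Xpoly = X0;
for i = 1:size(X0,2)
    for j = i:size(X0,2)
        Xpoly = [Xpoly, X0(:,i).*X0(:,j)];    % add quadratic terms
    end
end
X = [ones(size(Xpoly,1),1) Xpoly];            % re-attach the intercept column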