Linear Regression

事情的发生是有规律性的，这样我们才能预测天气、发射火箭、制造原子弹、将来甚至进行星际旅行。那么这些规律又是如何发现的呢？以前靠人，例如牛顿、爱因斯坦。那么以后会不会由机器来发掘呢？

Linear Regression

场景1 假设你是房地产销售人员，而且你手里有一些已经有一些已经成功售出的历史数据，如 表 1-1 所示。你想根据房屋面积来预测房子的售价，这个时候你应该怎么办？

表 1-1 历史数据

房屋面积(m²)	售价(万)
42	88
55	108
64	130
75	158
82	180
90	192
108	210
123	258

图 1-1 数据分布图

假想函数

观察数据分布图，我们容易发现它很像初中就接触过的线性函数——y=ax+b。这里我们就假设售价与房屋面积的关系为线性函数。即假想函数(Hypothesis Function) 是

$$ h(x) = θ_0+θ_1x $$

有了假想函数，下一步需要做的事情就是确定假想函数中的各个参数的值，或者叫特征值(Features)，这里我们有 $ θ_0 , θ_1 $ 这两个特征值。

我们分别做如下假设，图 1-2 中表示了各个假想函数的拟合结果:

$θ_0=150,θ_1=0 $，得到假想函数 $ h(x) = 150 $
$θ_0=0,θ_1=1 $，得到假想函数 $ h(x) = x $
$θ_0=0,θ_1=2 $，得到假想函数 $ h(x) = 2x $

图 1-2 各个假想函数的拟合结果

代价函数

从 图 1-2 中我们很明显的可以看到，$ θ_0=0,θ_1=2 $ 时拟合的结果最好。

提问1 除了用经验来判断拟合结果的好坏，有没有能量化比较的方法呢?

答案是肯定的，这就是代价函数(Cost Function)。

$$J(θ_0,θ_1) = \frac 1 {2m} \sum_{i=0}^{m} (h_θ(x_i)-y_i)^2 $$

1	注:有的公式符号前面需要加转义字符 \，例如 _，应该写成 \_

$ (h_θ(x_i)-y_i)^2 $。表示假想函数求出来的值减去实际值的平方
$ \sum_{i=0}^{n} $。表示从 i=0 累加到 i=n

随着 $ θ_0,θ_1 $ 的取值不同，代价函数 $ J(θ_0,θ_1) $ 的结果值如 图 1-3 所示

图 1-3 代价函数的结果

代价函数表示的是假想函数的拟合程度，代价函数求出来的值越小，说明拟合的越成功。现在的问题已经变成了如何求代价函数的最小值。

梯度下降

梯度下降(Gradient Descent)定义如下:

$$ θ_j=θ_j-α \frac δ {δθ_j} J(θ_0,θ_1) $$

图 1-4 梯度下降的几何含义

需要注意的是:在每次迭代中，应该同时更新参数。在这里需要同时更新 $ θ_0,θ_1$

$ temp0:=θ_0-α \frac δ {δθ_0} J(θ_0,θ_1) $
$ temp1:=θ_1-α \frac δ {δθ_0} J(θ_0,θ_1) $
$ θ_0=temp0 $
$ θ_1=temp1 $

图 1-5 α 取值的影响

实际操作中，我们需要及时调整参数 α，以确保梯度下降算法在合理的时间内收敛。未能收敛或取得最小值的时间太多意味着我们的步长是错误的。

octave 代码实现

图 1-6 J(θ)求偏导

执行结果如下:

X-y-theta

iterator-result

Cost Function

function J = computeCost(X, y, theta)
%COMPUTECOST Compute cost for linear regression
%   J = COMPUTECOST(X, y, theta) computes the cost of using theta as the
%   parameter for linear regression to fit the data points in X and y

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly
J = 0;

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta
%               You should set J to the cost.

m = size(X,1);
predictions = X*theta;
sqrErrors = (predictions-y).^2;

J = 1/(2*m)*sum(sqrErrors);



% =========================================================================

end

Gradient Descent

function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
%GRADIENTDESCENT Performs gradient descent to learn theta
%   theta = GRADIENTDESCENT(X, y, theta, alpha, num_iters) updates theta by
%   taking num_iters gradient steps with learning rate alpha

% Initialize some useful values
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);

for iter = 1:num_iters

    % ====================== YOUR CODE HERE ======================
    % Instructions: Perform a single gradient step on the parameter vector
    %               theta.
    %
    % Hint: While debugging, it can be useful to print out the values
    %       of the cost function (computeCost) and gradient here.
    %

    x1 = X(:,2);
    h = theta(1) + (theta(2)*x1);

    theta_zero = theta(1) - alpha * (1/m) * sum(h-y);
    theta_one = theta(2) - alpha * (1/m) * sum((h-y) .* x1);

    theta = [theta_zero; theta_one];


    % ============================================================

    % Save the cost J in every iteration
    J_history(iter) = computeCost(X, y, theta);
    disp(J_history(iter));

end
    disp(min(J_history));
end