Linear regression is a method for finding a linear relationship between two related variables. In mathematical terms, it is the process of representing a set of given (xi, yi) points in an x-y coordinate system with the most reasonable linear equation. Let's examine this method with a simple example, using the five points (1, 3), (2, 2), (3, 3), (4, 5), (5, 4).
The line that represents these points will have an equation of the form y = a + bx, where b is the slope of the line and a is the point where it intersects the y-axis. Using this equation, we can express the y-value predicted for each x-value in terms of the unknowns a and b; let's call it ỹ. Naturally, there will be differences between ỹ and our original y-values.
Our goal is to draw the line for which these error differences are as small as possible; this is the only way to achieve the "most reasonable linear equation" mentioned at the beginning. In other words, the total of the errors (y - ỹ) calculated across all x values must be minimum. Here we face an important question: how should we sum the errors?
How to Find the Minimum Error:
We can examine the y error differences we find for each x value using a few methods:
- 1. Sum them: Some of the errors will be negative and some positive, so if we simply add them as they are, the positive and negative errors cancel each other out. A clearly bad line can therefore end up with a total error of zero, which means a plain sum gives meaningless results.
- 2. Sum their absolute values: If we add the absolute values of the errors and try to minimize this total, lines with a few large errors can still look advantageous, because every error counts the same regardless of its size. The total error may be numerically small even though the line misses the overall trend of the points. For example, in the figure below, although the fit on the left is worse than the one on the right, an evaluation based on absolute values would choose the one on the left.
Both of the above methods lead us to mistakes. Taking the square of the errors saves us from both troubles: errors of different signs can no longer cancel each other out, and the higher errors are weighted, i.e., their effect is increased. In fact, the sum of any even power of the errors would also avoid cancellation, but the 4th and higher powers create unnecessary workload; the 2nd power is already sufficient to weight the errors.
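To make this comparison concrete, here is a short Python sketch (my own illustration; the article itself uses Octave) that evaluates all three error totals for the five example points against a flat line drawn at the mean of the y-values:

```python
xs = [1, 2, 3, 4, 5]
ys = [3, 2, 3, 5, 4]

def errors(a, b):
    """Signed errors y - (a + b*x) for each data point."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# A flat line through the mean of y (a = 3.4, b = 0) fits the trend
# badly, yet its signed errors cancel out completely.
e = errors(3.4, 0.0)
print(round(sum(e), 9))                      # 0.0 -> plain sum is meaningless
print(round(sum(abs(v) for v in e), 9))      # 4.4 -> no cancellation
print(round(sum(v * v for v in e), 9))       # 5.2 -> larger misses weigh more
```

The flat line is obviously wrong, but its plain error sum is exactly zero; only the absolute and squared totals expose it.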
Sum of Squares of Errors
For each x value we compute the error y - ỹ with ỹ = a + bx, square it, and sum over the five points:

F(a, b) = (3 - (a + b))^2 + (2 - (a + 2b))^2 + (3 - (a + 3b))^2 + (5 - (a + 4b))^2 + (4 - (a + 5b))^2

If we expand the squares and collect the terms, we get:

F(a, b) = 5a^2 + 55b^2 + 30ab - 34a - 112b + 63
As you can see, our error function depends on the variables a and b. Once we find the a and b values at the minimum point of this function, we will have solved our problem. Since we have reduced our problem to minimizing a single function, we can call it a cost function.
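The expansion can be sanity-checked numerically. The following Python sketch (my own check, not part of the article) confirms that the expanded polynomial agrees with the direct sum of squared errors at arbitrary (a, b) values:

```python
xs = [1, 2, 3, 4, 5]
ys = [3, 2, 3, 5, 4]

def sse(a, b):
    """Direct sum of squared errors for the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

def cost(a, b):
    """Expanded polynomial form of the same cost function."""
    return 5*a**2 + 55*b**2 + 30*a*b - 34*a - 112*b + 63

# Both forms agree wherever we evaluate them.
for a, b in [(0, 0), (1, 1), (1.9, 0.5), (-2, 3)]:
    assert abs(sse(a, b) - cost(a, b)) < 1e-9
print(sse(0, 0))   # 63 -> the constant term of the polynomial
```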
Minimum Point of the Cost Function
Let's take a look at what the 5a^2 + 55b^2 + 30ab - 34a - 112b + 63 function we found above looks like. The easiest way to do this is to use the online version of Octave, an open source numerical analysis program. It is a great alternative because it saves you the trouble of downloading and installing it on your computer, and it can even be used on your phone or tablet.
We go to the https://octave-online.net/ website and write the following commands in sequence:
a = -10 : 0.1 : 10
b = -10 : 0.1 : 10
F = 5*a.^2 + 55*b.^2 + 30*a.*b - 34*a - 112*b + 63
In this way, we created the a and b arrays from -10 to 10, and wrote our cost function.
We can see the 3D graph with the command plot3(a,b,F).
As you can see, our function has a minimum point near the origin. Therefore, if we look at this function along the a-axis, calculate the slopes, and find the point where the slope is 0, we will have found the minimum with respect to the a-axis. We must then do the same thing with respect to the b-axis.
We know that the mathematical name for this is partial derivative.
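This slope condition can also be checked numerically with finite differences. The short Python sketch below (my own check, not from the article) approximates the slope along each axis at the point (1.9, 0.5) found later and shows that both slopes vanish there:

```python
def cost(a, b):
    """The cost function found above."""
    return 5*a**2 + 55*b**2 + 30*a*b - 34*a - 112*b + 63

h = 1e-6
a0, b0 = 1.9, 0.5
# Central-difference approximations of the slope along each axis.
slope_a = (cost(a0 + h, b0) - cost(a0 - h, b0)) / (2 * h)
slope_b = (cost(a0, b0 + h) - cost(a0, b0 - h)) / (2 * h)
print(abs(slope_a) < 1e-6, abs(slope_b) < 1e-6)   # True True
```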
Partial Derivatives of the Cost Function
As we explained above, we need to take the derivatives of the function
F = 5a^2 + 55b^2 + 30ab - 34a - 112b + 63
with respect to both a and b and set them equal to 0. As stated above, this finds the point where both slopes are 0, i.e., the values of a and b that minimize the function. Taking the partial derivatives gives ∂F/∂a = 10a + 30b - 34 = 0 and ∂F/∂b = 30a + 110b - 112 = 0.
We have two equations in two unknowns. We can solve them by elimination: multiply one of the equations by a suitable coefficient and add it to the other.
10a + 30b = 34 (multiply this by -3 and add it to the other to eliminate a)
30a + 110b = 112
------------------------------------------
-30a - 90b = -102
30a + 110b = 112
------------------------------------------
20b = 10, so b = 0.5; substituting this back into 10a + 30b = 34 gives 10a = 19, i.e., a = 1.9.
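The same elimination can be reproduced with exact rational arithmetic. This Python sketch (an illustration of the steps above, not from the article) uses the standard library's fractions module to avoid any rounding:

```python
from fractions import Fraction

# Normal equations from the partial derivatives:
#   10a + 30b  = 34
#   30a + 110b = 112
# Multiply the first by -3 and add: (110 - 90)b = 112 - 102.
b = Fraction(112 - 102, 110 - 90)    # 20b = 10 -> b = 1/2
a = (Fraction(34) - 30 * b) / 10     # back-substitute into 10a + 30b = 34
print(a, b)                          # 19/10 1/2
```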
Therefore, the equation we are looking for is y = 1.9 + 0.5x. Now let's plot this equation together with the 5 points we determined at the beginning.
Our Octave commands are as follows:
a = 0:1:6
b = 0:1:6
plot(a,b,'w')
hold on
x = [1 2 3 4 5]
y = [3 2 3 5 4]
plot(x,y,'x')
t = 0:0.1:6;
k = 1.9 + 0.5*t;
plot(t,k,'b')
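For readers without Octave at hand, the fit itself can be reproduced in a few lines of Python using the standard closed-form least-squares formulas (my own addition; the article only uses Octave for plotting):

```python
xs = [1, 2, 3, 4, 5]
ys = [3, 2, 3, 5, 4]
n = len(xs)

sx, sy = sum(xs), sum(ys)
sxx = sum(x * x for x in xs)
sxy = sum(x * y for x, y in zip(xs, ys))

# Closed-form solution of the same two normal equations.
b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
a = (sy - b * sx) / n
print(a, b)   # 1.9 0.5
```

These formulas are just the solution of the system 10a + 30b = 34, 30a + 110b = 112 written for general data, and they recover the same line y = 1.9 + 0.5x.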