Monday, March 25, 2024

UNDERSTANDING ARTIFICIAL NEURAL NETWORKS

NATURAL NEURAL NETWORKS
A natural neural network is composed of connections between neurons. The human brain has approximately 86 billion neurons, a chimpanzee approximately 28 billion, a honey bee about 960 thousand, and a nematode, a creature 1 millimeter long and 65 micrometers thick, just 302 neurons.

The structure of a typical neuron is shown in the figure.



Axon terminals connect to the dendrites of other neurons. The connection points are called synapses. The structure of a synapse is shown in the figure below.




There are electrochemical interactions between axons and dendrites. When the change in ion levels exceeds a certain threshold, an electrical impulse (an action potential, carried by ion flows rather than free electrons) travels from one side to the other. These electrical signals produce our movements, thoughts, and all our actions: when we raise an arm, impulses make muscle cells contract; when we dream, signals flow from some neuron groups to others; hormones are released, and so on.

When scientists observed the neuron structures summarized above, they realized they could be modeled mathematically. Once a system is modeled mathematically, it can be imitated in different artificial structures. This is how the foundation of artificial neural networks was laid.



The mathematical model above means the following: the information received from the inputs (the dendrites) is multiplied by weights called W (the chemical levels of the dendrite channels), summed (entering the neuron body), and passed through an activation function called f (the chemical behaviour of the axon), which produces the output.
In the natural structure, the signal produced at the axon output corresponds to the concept of output in our model.
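A single neuron of this model can be sketched in a few lines of Python. This is only an illustration: the sigmoid used as the activation f, and the example inputs and weights, are arbitrary choices, not values from the text.

```python
import math

def sigmoid(x):
    """A common activation function f: squashes any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def neuron(inputs, weights):
    """One artificial neuron: weighted sum of the inputs, then activation."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return sigmoid(total)

# Example: two inputs with arbitrary illustrative weights
out = neuron([1.0, 0.0], [0.4, -0.6])
```

Whatever the inputs, the sigmoid keeps the neuron's output between 0 and 1, just as a real neuron's firing is bounded.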

Now we will solve a problem that is itself a product of natural intelligence, using the artificial neural network we have modeled above.

IMPLEMENTATION OF XOR LOGIC GATE WITH ARTIFICIAL NEURAL NETWORK

The input (x1, x2) and output (y) combinations for the XOR logic gate are as follows:

x1  x2  |  y
 0   0  |  0
 0   1  |  1
 1   0  |  1
 1   1  |  0


We need to design an artificial neural network with two inputs and one output. For example, something like this:




Here, X1 and X2 are the input neurons; A, B, and C are the neurons of the intermediate layer (also called the hidden layer); and O is the output neuron. w1-w9 are the weight multipliers. In an artificial neural network, inputs are multiplied by weights and summed to create the output values. There is no other trick to artificial neural networks. In fact, the basic working structure of the brain is not much different; yet when this structure is scaled to thousands and millions of elements, it produces amazing things.

Our main goal is to find such mathematical multipliers that the system will give the appropriate "XOR output" value for each "XOR input" value applied to the input.

In the first stage, we take the weight multipliers randomly because we do not know what they are. Our goal will be to find their values.



The value of x1*w1 enters neuron A, and the value of x2*w2 also enters neuron A. In other words, the product of the input and the weight is the input of the corresponding neuron. For example, according to our first input value, neuron A:

A = x1*w1 + x2*w2 = 0 * 0.1 + 0 * 0.2 = 0

Each neuron other than X1 and X2 has an activation function. The input value is the input of this activation function, and the output of the function is reflected in the output of the neuron. Different activation functions can be used. One of the most common is the sigmoid function.

The sigmoid function is 1 / (1 + e^(-x)), and its curve is as follows:
 


Now let's see the output value produced by each neuron for the first input value, that is, for (x1,x2) = (0,0):



For the output neuron O, the input value is:

O-input = sigmoid(A)*w7 + sigmoid(B)*w8 + sigmoid(C)*w9
= 0.5*0.7 + 0.5*0.8 + 0.5*0.9 = 1.2

Since the output of neuron O is the sigmoid value of the input value,
Output = sigmoid(O) = 0.7685248

To summarize what we have done so far, we found an output value of 0.7685248 for the input value (x1,x2) = (0,0) and the randomly determined weight values w1-9.
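The whole forward pass can be reproduced in a few lines of Python. This is a sketch; the weights are the randomly chosen values w1-w9 from the text.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Initial weights w1..w9 as chosen in the text
w = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

def forward(x1, x2):
    # Hidden neurons A, B, C: weighted sum of the inputs, then sigmoid
    A = sigmoid(x1 * w[0] + x2 * w[1])
    B = sigmoid(x1 * w[2] + x2 * w[3])
    C = sigmoid(x1 * w[4] + x2 * w[5])
    # Output neuron O: weighted sum of the hidden outputs, then sigmoid
    return sigmoid(A * w[6] + B * w[7] + C * w[8])

out = forward(0, 0)   # reproduces the 0.7685248 computed above
```

For (0,0) every hidden neuron outputs sigmoid(0) = 0.5, so the output input is 0.5*0.7 + 0.5*0.8 + 0.5*0.9 = 1.2, and sigmoid(1.2) gives the 0.7685248 found above.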

Similarly, let's write the output values we found for each input pair and the differences between our findings and the expected values, i.e. the errors:

 
We need to find new w values that minimize the error at the output, but how do we find them? Different algorithms can be used for this task. In the method we will use, we take the derivative of the activation function. The derivative gives us the rate and direction of change. By multiplying the output error by this derivative, we learn in which direction, and by how much, each w weight should be increased or decreased. Repeating this many times yields suitable weight values. This method is called backpropagation. Let's see step by step how it is calculated for the output neuron:

Input value of the output neuron = w7*Aout + w8*Bout + w9*Cout

Output value of the output neuron = sigmoid(Input value of the output neuron)

Error = desired output value - output value of the output neuron

Delta output = SigmoidDerivative(Input value of the output neuron) * error

We can also call this "global error".

We need to find new weight multipliers based on the error. For example:

New w7 value = Old w7 value + Delta output * Aout

w1-w6 weights can be calculated with the same logic:

New w1 value = Old w1 value + x1*(SigmoidDerivative(Ain)*Delta output * old w7)

In this way, we find the new weight multipliers w1-w9. For each w, we do the same calculations with the corresponding input and output neurons. If we apply the new multipliers to the system, we will see that the output values approach our desired outputs.
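One full update step for w7 and w1 can be written out with the formulas above. A sketch, with two conventions made explicit: the error is taken as desired minus actual output (the usual delta-rule convention, so the "+" updates move the weights in the error-reducing direction), and the learning rate of 1 is implicit.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dsigmoid(x):
    """Derivative of the sigmoid: s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

# State for input (x1, x2) = (0, 0), with the initial weights from the text
x1, x2, desired = 0.0, 0.0, 0.0
w1, w7 = 0.1, 0.7
A_in = x1 * 0.1 + x2 * 0.2     # input of hidden neuron A (uses w1, w2)
A_out = sigmoid(A_in)           # 0.5
O_in = 1.2                      # computed earlier for this input
O_out = sigmoid(O_in)           # 0.7685248

# Delta rule: error = desired - actual, so '+' updates reduce the error
error = desired - O_out
delta_out = dsigmoid(O_in) * error

new_w7 = w7 + delta_out * A_out
new_w1 = w1 + x1 * (dsigmoid(A_in) * delta_out * w7)
```

Since the output 0.7685 is too high for the desired 0, the update pushes w7 down; and since x1 = 0 for this sample, w1 is left untouched.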


As you can see, the outputs steadily evolve toward the desired values. If we repeat this process 10,000 times...

 
After 10,000 corrections, the values are very close to the desired outputs. The w1-w9 weight multipliers as they stand after the 10,000th correction (made for the input (x1,x2) = (1,1)) are as follows:

(1.251617 1.351617 0.7538233 0.8538233 7.1306686 7.2306686 -2.1630681 -2.2256228 -1.628696 )

Initially, we set the weight multipliers arbitrarily as w = [0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9]. We performed the error correction 10,000 times and found new weight multipliers. When we put the newly found multipliers into the system, as shown in the figure below, it gives an output value of 0.0040834 for the input (1,1). It should have been 0, but we came very close, with only a small error.

This is how artificial neural networks basically work.

UNDERSTANDING LINEAR REGRESSION

Linear regression is a method for finding a linear relationship between two related variables. In mathematical terms, it is the process of expressing a given set of (xi, yi) points in an x-y coordinate system with the most reasonable linear equation. Let's examine this method with a simple example.



The equation of the line that represents the above points will be of the form y = a + bx. The value of b is the slope of the line, and a is the point where it intersects the y-axis. Using this linear equation, we can write the y-value we will find for each x-value in the graph in terms of the unknowns a and b. Let's call this ỹ. Naturally, there will be differences between ỹ and our original y-points.



Our goal is to draw a line for which the calculated error differences are as small as possible. Only then do we obtain the "most reasonable linear equation" mentioned at the beginning. In other words, the sum of the error values (y - ỹ) calculated for each x value must be minimal. Here we face an important problem: how should we sum the errors?

How to Find the Minimum Error:
 
We can examine the y error differences we find for each x value using a few methods:
  • 1. Sum them: Some of the errors will be negative and some positive, so if we simply add them as they are, the positive and negative errors cancel each other out. Simply adding the error values therefore gives meaningless results.
  • 2. Sum their absolute values: If we add the absolute values of the errors and try to minimize this sum, some poorly fitting lines can still look good, because no extra penalty is applied to large individual errors. In other words, the total error may be numerically small while the line itself is a poor fit. For example, in the figure below, the situation on the left fits worse than the one on the right, yet judged by absolute values we would choose the left one.


Both of the above methods lead us astray. Taking the square of the errors solves both problems: squares are always positive, so errors of different signs cannot cancel each other out, and larger errors are weighted more heavily, i.e. their effect is increased. In fact, any even power of the errors would avoid the cancellation, but the 4th and higher powers only add unnecessary computation and lead to the same kind of result; the 2nd power is already sufficient to weight the errors.
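A tiny numeric illustration of the three options, using made-up error values:

```python
# Two equal-magnitude errors of opposite sign
errors = [2.0, -2.0]

plain_sum = sum(errors)                 # the errors cancel to 0 -- useless
abs_sum = sum(abs(e) for e in errors)   # no cancellation, but a large error
                                        # costs no more than several small ones
sq_sum = sum(e ** 2 for e in errors)    # positive, and one error of 4 would
                                        # contribute 16, versus 8 for two of 2
```

Squaring both removes the sign problem and makes a single large miss more expensive than several small ones, which is exactly the weighting we want.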

Sum of Squares of Errors

For each x value we take the error y - ỹ, where ỹ = a + bx, and sum the squares:

E(a,b) = Σ (yi - a - b·xi)^2

Using the five example points (1,3), (2,2), (3,3), (4,5), (5,4) from the figure above and expanding, we obtain:

E(a,b) = 5a^2 + 55b^2 + 30ab - 34a - 112b + 63

As you can see, our error function depends on the variables a and b. When we find the a and b values at the minimum point of this function, we will have solved our problem. Since we have reduced our problem to a function, we can call it a cost function.
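We can check numerically that the expanded polynomial really equals the sum of squared errors. A Python sketch; the five points are the ones plotted later with Octave.

```python
xs = [1, 2, 3, 4, 5]
ys = [3, 2, 3, 5, 4]

def sse(a, b):
    """Sum of squared errors of the line y = a + b*x over the five points."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

def cost(a, b):
    """The expanded polynomial form of the cost function."""
    return 5*a**2 + 55*b**2 + 30*a*b - 34*a - 112*b + 63
```

The two expressions agree for any (a, b), confirming the expansion.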

Minimum Point of the Cost Function

Let's take a look at what the 5a^2 + 55b^2 + 30ab - 34a - 112b + 63 function we found above looks like. The easiest way to do this is to use the online version of Octave, an open source numerical analysis program. It is a great alternative because it saves you the trouble of downloading and installing it on your computer, and it can even be used on your phone or tablet.

We go to the https://octave-online.net/ website and write the following commands in sequence:

a = -10 : 0.1 : 10;
b = -10 : 0.1 : 10;
[A, B] = meshgrid(a, b);
F = 5*A.^2 + 55*B.^2 + 30*A.*B - 34*A - 112*B + 63;

In this way, we created the a and b ranges from -10 to 10 and evaluated our cost function over the whole (a, b) grid (meshgrid is needed so that F is computed for every pair, not just along the diagonal).

We can see the 3D surface with the command mesh(A, B, F).



As you can see, our function has a minimum point near the origin. If we fix b, look along the a-axis, and find the point where the slope with respect to a is 0, we have found the minimum with respect to a. We must do the same with respect to the b-axis.

We know that the mathematical name for this is partial derivative.

Partial Derivatives of the Cost Function

As we explained above, we need to take the derivatives of the function
F = 5a^2 + 55b^2 + 30ab - 34a - 112b + 63
with respect to both a and b and set them equal to 0. This locates the point where the slope is 0, i.e. the values of a and b that minimize the function:

dF/da = 10a + 30b - 34 = 0
dF/db = 30a + 110b - 112 = 0

We have two equations with two unknowns. We can solve them by elimination: multiply one of the equations by a suitable constant and add it to the other.

10a + 30b = 34 (multiply this by -3 and add it to the other to eliminate a)
30a + 110b = 112
------------------------------------------
-30a - 90b = -102
30a + 110b = 112
------------------------------------------
20b = 10            yields b = 0.5 and a = 1.9

Therefore, the equation we are looking for is y = 1.9 + 0.5x. Now let's plot this equation together with the 5 points we determined at the beginning.

Our Octave commands are as follows:

a = 0:1:6;
b = 0:1:6;
plot(a, b, 'w')     % white 'invisible' line, used only to set the axis range
hold on
x = [1 2 3 4 5];
y = [3 2 3 5 4];
plot(x, y, 'x')
t = 0:0.1:6;
k = 1.9 + 0.5*t;
plot(t, k, 'b')
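As a cross-check, the same coefficients follow from the standard closed-form least-squares formulas. A Python sketch:

```python
xs = [1, 2, 3, 4, 5]
ys = [3, 2, 3, 5, 4]
n = len(xs)

sx, sy = sum(xs), sum(ys)
sxx = sum(x * x for x in xs)
sxy = sum(x * y for x, y in zip(xs, ys))

# Normal equations solved in closed form
b = (n * sxy - sx * sy) / (n * sxx - sx * sx)   # slope
a = (sy - b * sx) / n                            # intercept
```

This recovers the slope 0.5 and intercept 1.9 found by elimination above.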












Monday, January 31, 2022

Understanding Low Pass Filters using SciLab

This study will utilize SciLab to explore the characteristics of low-pass filters. SciLab, Octave and MATLAB are all powerful scientific computing programs with significant overlap in functionality. As a result, any of these programs could be suitable for this investigation.

We will aim to understand and compare the characteristics of various filter types, including the Simple Moving Average, Weighted Moving Average, Exponential Moving Average, Butterworth Low Pass Filter and Alpha-Beta Filter.

Let’s use a noisy sine signal for the job.
First, a clean sine:

pi = %pi;             // Scilab's built-in constant, more precise than 3.1415
M_SQRT2 = sqrt(2);    // likewise, instead of the 1.4142 approximation
t = 0 : 0.05 : 4*pi;
y = sin(t);
plot(t,y)


Adding some noise:

r_var = 0.3*(0.5 - rand(t));
noisy_sine = y + r_var;
scf();
plot(t,noisy_sine)

Simple Moving Average


This filter outputs the arithmetic mean of the last N data points, sliding the window forward one sample at a time.

SciLab code;

sma_filter_coeff = 32;    //or 8

sma_array = zeros(1,length(t));

aux_array = zeros(1,sma_filter_coeff);    // circular buffer for the last N samples


for i = 1:length(t)

    // shift the window one step to the left...
    for j = 1:(sma_filter_coeff-1)
        aux_array(j) = aux_array(j+1);
    end

    // ...append the newest sample and average the window
    aux_array(sma_filter_coeff) = noisy_sine(i);

    sma_array(i) = sum(aux_array)/sma_filter_coeff;

end
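For reference, the same sliding-window average can be written compactly in Python (a sketch; `deque` with `maxlen` plays the role of the zero-initialized circular buffer):

```python
from collections import deque

def sma(signal, n):
    """Simple moving average: mean of the last n samples, window padded with zeros."""
    window = deque([0.0] * n, maxlen=n)
    out = []
    for s in signal:
        window.append(s)              # newest sample in, oldest falls out
        out.append(sum(window) / n)
    return out
```

Because the window starts full of zeros, the first outputs ramp up before settling on the true average.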

 

 

 

Weighted Moving Average

This recursive filter keeps a running average and replaces 1/N of it with the newest sample on each step, so recent samples carry exponentially more weight than old ones.

 

SciLab code;

wma_filter_coeff = 16;    //or 8

wma_array = zeros(1,length(t));

average = 0;


for i = 1:length(t)

    average = average - (average/wma_filter_coeff);

    average = average + (noisy_sine(i)/wma_filter_coeff);

    wma_array(i) = average;

end
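Written as a Python function, the same recursion is (a sketch; the average starts at zero, as in the SciLab version):

```python
def recursive_average(signal, n):
    """avg <- avg - avg/n + x/n, i.e. avg <- (1 - 1/n)*avg + x/n."""
    avg = 0.0
    out = []
    for x in signal:
        avg = avg - avg / n       # forget 1/n of the old average...
        avg = avg + x / n         # ...and blend in 1/n of the new sample
        out.append(avg)
    return out
```

Fed a constant signal, the output converges to that constant, which is the basic sanity check for any averaging filter.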


Exponential Moving Average

 

 SciLab code;

ema_filter_coeff = 32;    //or 8

ema_array = zeros(1,length(t));

alpha = 2/(ema_filter_coeff + 1);


for i = 1:length(t)-1

    ema_array(i+1) = noisy_sine(i)*alpha + ema_array(i)*(1-alpha);

end
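The same EMA recursion in Python, with the standard smoothing factor alpha = 2/(N + 1) (a sketch, indexed like the SciLab loop):

```python
def ema(signal, n):
    """Exponential moving average with the standard alpha = 2/(n + 1)."""
    alpha = 2.0 / (n + 1)
    out = [0.0] * len(signal)
    for i in range(len(signal) - 1):
        # next output blends the current sample with the previous output
        out[i + 1] = signal[i] * alpha + out[i] * (1 - alpha)
    return out
```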

Butterworth Filter

A second-order Butterworth filter:

butterworth_array = zeros(1,length(t));

samplerate = 1;

cutoff = 0.05;


QcRaw = (2 * pi * cutoff) / samplerate;

QcWarp = tan(QcRaw);


gain = 1.0 / (1.0 + M_SQRT2/QcWarp + 2.0/(QcWarp*QcWarp));


by_2 = (1.0 - M_SQRT2/QcWarp + 2.0/(QcWarp*QcWarp)) * gain;

by_1 = (2.0 - 4.0/(QcWarp*QcWarp)) * gain;

by_0 = 1; ax_0 = gain; ax_1 = 2 * gain; ax_2 = gain;


xv_0 = 0; xv_1 = 0; xv_2 = 0; yv_0 = 0; yv_1 = 0; yv_2 = 0;


for i = 1:length(t)

    xv_2 = xv_1;

    xv_1 = xv_0;

    xv_0 = noisy_sine(i);

    yv_2 = yv_1;

    yv_1 = yv_0;

    yv_0 = ax_0 * xv_0 + ax_1 * xv_1 + ax_2 * xv_2 - by_1 * yv_1 - by_2 * yv_2;


    butterworth_array(i) = yv_0;

end
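A direct Python port of the biquad above (a sketch; note that the feedback terms must use the two previous outputs, y[n-1] and y[n-2]):

```python
import math

def butterworth2(signal, cutoff, samplerate):
    """Second-order low-pass Butterworth, bilinear-transform coefficients."""
    qc = math.tan(2 * math.pi * cutoff / samplerate)
    g = 1.0 / (1.0 + math.sqrt(2) / qc + 2.0 / (qc * qc))
    b2 = (1.0 - math.sqrt(2) / qc + 2.0 / (qc * qc)) * g
    b1 = (2.0 - 4.0 / (qc * qc)) * g
    a0, a1, a2 = g, 2 * g, g              # feedforward coefficients
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x0 in signal:
        # y[n] = a0 x[n] + a1 x[n-1] + a2 x[n-2] - b1 y[n-1] - b2 y[n-2]
        y0 = a0 * x0 + a1 * x1 + a2 * x2 - b1 * y1 - b2 * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out
```

A quick property check: a low-pass filter should pass DC unchanged, so a long constant input must settle at the same constant.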

 


Alpha-Beta Filter

In fact, the alpha-beta filter is not strictly a low-pass filter; it's primarily an estimator. However, it can be used for signal smoothing in a way that resembles a low-pass filter. Let's see how this works.

dt = 0.1;

ALPHA = 0.2;

BETA = 0.0001;


position = zeros(1,length(t));

speed = zeros(1,length(t));

measured = zeros(1,length(t));

prediction_error = 0;


for i = 2:length(t)

    measured(i) = noisy_sine(i);

    position(i) = position(i-1) + ( speed(i-1) * dt );

    speed(i) = speed(i-1);

    prediction_error = measured(i) - position(i);

    position(i) = position(i) + ALPHA * prediction_error;

    speed(i) = speed(i) + ( BETA / dt) * prediction_error;


end
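The predict-and-correct cycle above can be sketched as a self-contained Python function (the constant-signal test below is an illustrative choice, not from the text):

```python
def alpha_beta(measurements, dt=0.1, alpha=0.2, beta=0.0001):
    """Alpha-beta filter: predict position from speed, then correct both
    states with a fraction of the prediction error."""
    position, speed = 0.0, 0.0
    out = []
    for z in measurements:
        # Predict: advance position using the current speed estimate
        position += speed * dt
        # Correct: apportion the prediction error to position and speed
        err = z - position
        position += alpha * err
        speed += (beta / dt) * err
        out.append(position)
    return out
```

Tracking a constant level, the estimate closes most of the gap within a few steps (thanks to alpha) and the small beta slowly drives the residual bias to zero.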

 

 

All of the filters above in one frame (detail):
