Self-made algorithms, written to understand how the algorithms work.
Probability-based concept: starting out with the prior probability (or belief) P(A), when given a likelihood P(B | A) and evidence P(B), we arrive at the posterior probability P(A | B). The Bayes' Theorem formula is: posterior = (likelihood × prior) / evidence.
P(A | B) = (P(B | A) ⋅ P(A)) / P(B)
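As a minimal sketch, the theorem translates directly into code (the probability values in the example below are made-up placeholders, and `posterior` is a name chosen for illustration):

```python
def posterior(prior_a, likelihood_b_given_a, evidence_b):
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    return (likelihood_b_given_a * prior_a) / evidence_b

# Example with assumed values: P(A) = 0.3, P(B|A) = 0.8, P(B) = 0.5
print(posterior(0.3, 0.8, 0.5))  # 0.48
```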
User-defined algorithm to perform Gaussian Naive Bayes (for handling numerical data).
Probability-based concept: the Gaussian Naive Bayes formula is used for this dataset because the data is normally distributed. Formula used: f(x) = (1 / (sd * sqrt(2 * pi))) * pow(e, (-1/2) * pow((x - m) / sd, 2)), where sd = standard deviation, m = mean, and f(x) = probability density at the point x.
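A direct translation of this density formula into Python might look like the sketch below (the example mean and standard deviation are assumed values):

```python
import math

def gaussian_pdf(x, m, sd):
    """Gaussian density f(x) for mean m and standard deviation sd."""
    exponent = -0.5 * ((x - m) / sd) ** 2
    return (1 / (sd * math.sqrt(2 * math.pi))) * math.exp(exponent)

# Example with assumed values: a feature with mean 5.0 and sd 1.2
print(gaussian_pdf(4.5, 5.0, 1.2))  # ~0.305
```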
Statistics-based concept
A straight-line equation is y = m*x + b, where m is the slope of the line and b is the y-intercept.
m = (Σ (xi - mean_x)(yi - mean_y)) / (Σ (xi - mean_x)^2)
b = mean_y - (m * mean_x); here xi and yi denote the x and y values at position i, where i ranges from 0 to n - 1 and n is the number of points. For regression-based problems, accuracy is measured using the R2_score function.
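A minimal sketch of these formulae (the helper names `fit_line` and `r2_score` and the sample points are placeholders chosen for illustration):

```python
def fit_line(x, y):
    """Least-squares estimates of slope m and intercept b."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    m = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) \
        / sum((xi - mean_x) ** 2 for xi in x)
    b = mean_y - m * mean_x
    return m, b

def r2_score(y_actual, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_actual) / len(y_actual)
    ss_res = sum((ya - yp) ** 2 for ya, yp in zip(y_actual, y_pred))
    ss_tot = sum((ya - mean_y) ** 2 for ya in y_actual)
    return 1 - ss_res / ss_tot

# Example with made-up points
x = [1, 2, 3, 4]
y = [2.1, 4.0, 6.2, 7.9]
m, b = fit_line(x, y)
print(r2_score(y, [m * xi + b for xi in x]))
```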
Calculus-based concept
Steps for Gradient Descent
1. Initialize the values of m and b (assume m = 1 and b = 0 for the very first iteration).
2. Choose a learning rate, then calculate the step size used to obtain the next m and b values.
3. Calculate the next values of m and b.
4. Repeat the entire process for each new pair of m and b until the step size <= 0.001, i.e. roughly 1000 iterations at most.
Formulae to be used: equation of a straight line: y = m*x + b, where m is the slope of the line and b is the intercept. Equation of the SS for a particular line: SS = Σ(pyi - ayi)^2, where i ranges from 1 to n and n is the total number of y points.
Here pyi = predicted y value of the i-th point from the best-fit line, ayi = actual y value of the i-th point, and SS = sum of the squared distances. From the equation of SS we can see that SS = f(m, b), i.e. SS is a function of m and b.
Calculation of b: differentiate f(m, b) partially with respect to b, then follow the 4 steps of gradient descent above for the b value alone.
Calculation of m: differentiate f(m, b) partially with respect to m, then follow the 4 steps of gradient descent above for the m value alone.
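Putting the 4 steps and the two partial derivatives together, a minimal sketch might look like this (the learning rate, tolerance, and sample points are assumed placeholders; `gradient_descent` is a name chosen here, not from the original):

```python
def gradient_descent(x, y, lr=0.01, tol=0.001, max_iter=1000):
    """Fit y = m*x + b by minimising SS = sum((m*xi + b - yi)^2)."""
    m, b = 1.0, 0.0                                  # step 1: initial guess
    for _ in range(max_iter):
        # partial derivatives of SS with respect to m and b
        grad_m = sum(2 * xi * (m * xi + b - yi) for xi, yi in zip(x, y))
        grad_b = sum(2 * (m * xi + b - yi) for xi, yi in zip(x, y))
        step_m, step_b = lr * grad_m, lr * grad_b    # step 2: step sizes
        m, b = m - step_m, b - step_b                # step 3: next m and b
        if abs(step_m) <= tol and abs(step_b) <= tol:  # step 4: stop rule
            break
    return m, b

print(gradient_descent([1, 2, 3, 4], [2.1, 4.0, 6.2, 7.9]))
```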
Geometric concept
The equation for the distance from a point to the line is:
di = (|A*xi + B*yi + C|) / √(A^2 + B^2)
Here, di is the distance of point i from the line,
xi is the x coordinate of point i and yi is the y coordinate of point i, and
i ranges from 1 to n, where n is the total number of points.
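A small sketch of this distance formula (the line coefficients and the sample point are assumed values):

```python
import math

def point_line_distance(A, B, C, xi, yi):
    """Distance from point (xi, yi) to the line A*x + B*y + C = 0."""
    return abs(A * xi + B * yi + C) / math.sqrt(A ** 2 + B ** 2)

# Example: distance from (1, 2) to the line x + y - 5 = 0
print(point_line_distance(1, 1, -5, 1, 2))  # ~1.414
```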
Logistic Regression says: after calculating di for every point, we weight each distance by that point's class label yi_cap and sum the results to get d_dash.
d_dash = Σ (yi_cap * di), where i ranges from 1 to the total number of points.
Since Logistic Regression here performs binary classification, yi_cap takes one of two values:
**yi_cap = +1 for one class and yi_cap = -1 for the other**
The line that has the maximum d_dash value is the best-fit line.
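A sketch of this selection rule under the reading above, i.e. each distance weighted by its point's class label before summing (the points, labels, candidate lines, and the helper name `d_dash` are all assumed for illustration):

```python
import math

def d_dash(A, B, C, points, labels):
    """Sum of class-weighted distances for the line A*x + B*y + C = 0.
    labels holds the yi_cap value, +1 or -1, for each point."""
    norm = math.sqrt(A ** 2 + B ** 2)
    return sum(lab * abs(A * x + B * y + C) / norm
               for (x, y), lab in zip(points, labels))

# Made-up example: two candidate lines, four points with +/-1 labels
points = [(0, 0), (1, 0), (3, 3), (4, 3)]
labels = [1, 1, -1, -1]
candidates = [(1, 1, -3), (1, -1, 0)]  # (A, B, C) triples
best = max(candidates, key=lambda line: d_dash(*line, points, labels))
print(best)  # the line with the maximum d_dash value
```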