Muhammad Faheem (faheemmu@grinnell.edu) and Sauryanshu Khanal (khanalsa@grinnell.edu)
We perform an OLS (ordinary least squares) regression using a .csv file as input and write the results to a text file. The output includes the coefficients, their variances, standard errors, and p-values for the regression model the user specifies in the shell. For background, see https://www.statology.org/multiple-linear-regression/. Our implementation supports exactly three numerical variables (and hence three-variable datasets): one dependent variable (Y) and two independent variables (X1 and X2).
The program provides its own shell with five commands:
load anyFileName.csv // load the specified .csv file (the file must be present in the repository)
view // print the data from the loaded .csv file
reg var0 var1 var2 // fit a regression model on the three named columns, with y as the dependent variable and x1 and x2 as the independent variables
export output.txt // write the most recent regression results to a text file with the given name
exit // exit the shell
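Below is a minimal sketch of how such a shell might recognize these five commands and check their argument counts. It is not the project's parser. The enum `shell_cmd` and the function `parse_command` are hypothetical names. The sketch also assumes that commands are separated from their arguments by whitespace.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical codes for the five shell commands. */
typedef enum { CMD_LOAD, CMD_VIEW, CMD_REG, CMD_EXPORT, CMD_EXIT, CMD_INVALID } shell_cmd;

/* Split `line` in place on whitespace. args receives up to max_args pointers
   into line; *nargs is set to the number of words after the command name.
   Returns CMD_INVALID for unknown commands or a wrong argument count. */
shell_cmd parse_command(char *line, char **args, int max_args, int *nargs) {
    assert(line && args && nargs);
    *nargs = 0;
    char *name = strtok(line, " \t\n");
    if (name == NULL) return CMD_INVALID;  /* empty or blank line */

    char *tok;
    while ((tok = strtok(NULL, " \t\n")) != NULL) {
        if (*nargs == max_args) return CMD_INVALID;  /* too many arguments */
        args[(*nargs)++] = tok;
    }

    if (strcmp(name, "load") == 0 && *nargs == 1) return CMD_LOAD;      /* load file.csv */
    if (strcmp(name, "view") == 0 && *nargs == 0) return CMD_VIEW;      /* view */
    if (strcmp(name, "reg") == 0 && *nargs == 3) return CMD_REG;        /* reg a b c */
    if (strcmp(name, "export") == 0 && *nargs == 1) return CMD_EXPORT;  /* export out.txt */
    if (strcmp(name, "exit") == 0 && *nargs == 0) return CMD_EXIT;      /* exit */
    return CMD_INVALID;
}
```

A read-eval loop would read each line with `fgets`, pass it to `parse_command`, and `switch` on the result. Because the argument count is checked up front, a malformed command such as `reg height weight` is rejected before it reaches the regression code.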
- Install the GNU Scientific Library (GSL). We use its Student's t-distribution functions to compute two-tailed p-values for the coefficients.
sudo apt-get install libgsl-dev
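For reference, the standard two-tailed p-value for coefficient j is derived from its t-statistic and the residual degrees of freedom. This assumes the model has an intercept, so with n observations there are three estimated coefficients and n - 3 degrees of freedom. GSL provides the upper tail of the t-distribution as `gsl_cdf_tdist_Q(x, nu)`, declared in `gsl/gsl_cdf.h`. We have not confirmed that this is the exact call the project uses.

```latex
t_j = \frac{\hat\beta_j}{\operatorname{SE}(\hat\beta_j)}, \qquad
\nu = n - 3, \qquad
p_j = 2\,\Pr\left(T_\nu > |t_j|\right) = 2\left(1 - F_{t,\nu}\left(|t_j|\right)\right)
```

In C with GSL, this corresponds to `2.0 * gsl_cdf_tdist_Q(fabs(t_j), n - 3)`.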
- Build the project with make:
make
- Run ./shell to start the shell:
./shell
load sample_data.csv // Load the dataset
view // Print loaded dataset
reg height weight age // Run the regression and display its output
The output should be displayed as follows:
reg weight height age // Run a different regression and display its output
export output.txt // Write the regression results to output.txt
Note: export writes the results of the most recent regression (reg weight height age in this example).
exit // Exit the shell
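In the session above, `reg height weight age` refers to columns by their header names. Below is one way to resolve those names to column positions. It assumes the first line of the .csv file is a plain comma-separated header with no quoted fields, no empty fields, and no spaces around the commas. The function `column_index` is hypothetical and is not taken from the project's source.

```c
#include <assert.h>
#include <string.h>

/* Return the 0-based index of column `name` in a comma-separated header line,
   or -1 if it is absent. The header is copied, so the caller's string is not
   modified. Assumes no quoting, no empty fields, and headers under 1024 chars. */
int column_index(const char *header, const char *name) {
    assert(header && name);
    char buf[1024];
    if (strlen(header) >= sizeof buf) return -1;  /* header too long for this sketch */
    strcpy(buf, header);

    int idx = 0;
    for (char *tok = strtok(buf, ",\r\n"); tok != NULL;
         tok = strtok(NULL, ",\r\n"), idx++) {
        if (strcmp(tok, name) == 0) return idx;
    }
    return -1;
}
```

If any of the three names returns -1, the shell can report an error instead of running the regression. This matches the goal of rejecting bad input before any computation starts.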
- Inside the shell, you can load a new dataset at any time and run new regressions on it.
- The shell handles errors, including those that occur during parallel computation, and exits cleanly when it encounters an unexpected error.