Multiple Linear Regression.

Use the dataset that you have been using for the previous projects. Use all of your independent variables, your response variable, and the lm() function to build a multiple linear regression model. Print the model with the summary() function. The output will be similar to the bottom of page 141.
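As a minimal sketch of this step, the code below fits a full model with lm() and prints it with summary(). It assumes the built-in mtcars data stands in for your own project dataset, with mpg as the response; the names cars and full_model are illustrative only.

```r
# Assumption: mtcars stands in for your own project dataset.
data(mtcars)
cars <- mtcars

# Store a nominal variable as a factor so lm() builds indicator columns for it.
cars$am <- factor(cars$am, labels = c("automatic", "manual"))

# mpg is the response; "." means "every other column is a predictor".
full_model <- lm(mpg ~ ., data = cars)

# Coefficient table, residual standard error, R-squared, and F-statistic.
summary(full_model)
```

Writing the formula as mpg ~ . avoids typing every predictor by hand, but it only works if the data frame contains nothing except the response and the variables you want in the model.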

Use the pairs() function to look at the scatterplots of the interval/ratio variables. Color your points by the value of a nominal/ordinal variable.
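One possible way to draw the colored scatterplot matrix is sketched here. It assumes mtcars as a stand-in dataset, four of its numeric columns as the interval/ratio variables, and cyl as the nominal/ordinal variable; substitute your own column names.

```r
# Assumption: mtcars stands in for your own project dataset.
data(mtcars)

# Keep only the interval/ratio variables for the scatterplot matrix.
numeric_cols <- mtcars[, c("mpg", "disp", "hp", "wt")]

# Convert the grouping variable to integer codes 1, 2, 3, ...
# Base graphics maps these codes to the first colors of the palette.
point_colors <- as.integer(factor(mtcars$cyl))

pairs(numeric_cols, col = point_colors, pch = 19)
```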

A standard regression model with highly correlated independent variables will often perform poorly, because the coefficient estimates become unstable and hard to interpret. For this project, you will remove independent variables until the model is trustworthy. Use the summary() output and the scatterplots to decide if a variable should be removed. Remove the variable.

Repeat the process of

• build model

• check summary() and scatterplots

• remove variable

until you believe all variables in the model should stay in the model.
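One pass through this loop might look like the sketch below. It assumes mtcars as a stand-in dataset, and the choice to drop disp is made purely for illustration; your own summary() output and scatterplots should drive the decision.

```r
# Assumption: mtcars stands in for your own project dataset.
data(mtcars)

# Step 1: build a model.
model_1 <- lm(mpg ~ disp + hp + wt + drat, data = mtcars)

# Step 2: check the summary and how strongly the predictors are related.
summary(model_1)
round(cor(mtcars[, c("disp", "hp", "wt", "drat")]), 2)

# Step 3: remove one variable. Dropping disp is an illustrative choice only.
model_2 <- update(model_1, . ~ . - disp)
summary(model_2)

# Repeat steps 2 and 3 on model_2 until every remaining variable should stay.
```

Using update() keeps each refit to a single line and makes the sequence of removed variables easy to read back later.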

Use par(mfrow = c(2,2)) and the plot() function to look at diagnostic plots of the reduced model (similar to the plots on page 129).
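A short sketch of the diagnostic plots follows. The model mpg ~ wt + hp on mtcars is an assumed stand-in for your own reduced model.

```r
# Assumption: this model stands in for your own reduced model.
data(mtcars)
reduced_model <- lm(mpg ~ wt + hp, data = mtcars)

# Split the plotting window into a 2 x 2 grid and remember the old settings.
old_par <- par(mfrow = c(2, 2))

# Calling plot() on an lm object draws its four default diagnostic plots.
plot(reduced_model)

# Restore the previous layout so later plots are not squeezed into the grid.
par(old_par)
```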

Create a simple linear regression model with one of your numeric independent variables and your response variable.

Build the scatterplot of the response variable by the independent variable. Include the line of best fit on this first scatterplot.

Build the scatterplot of the residuals by the independent variable (similar to figure 3.3, page 50).

Plot the residuals by the response variable. Do the scatterplots indicate that there are any problems with the model?
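The three scatterplots could be produced along these lines. The sketch assumes mtcars as a stand-in dataset, with mpg as the response and wt as the numeric independent variable.

```r
# Assumption: mtcars stands in for your data; mpg is y and wt is x.
data(mtcars)
simple_model <- lm(mpg ~ wt, data = mtcars)

# Scatterplot of the response by the independent variable, with the fitted line.
plot(mpg ~ wt, data = mtcars, main = "mpg by wt")
abline(simple_model, col = "red")

# Residuals by the independent variable, with a reference line at zero.
res <- residuals(simple_model)
plot(mtcars$wt, res, xlab = "wt", ylab = "Residuals", main = "Residuals by wt")
abline(h = 0, lty = 2)

# Residuals by the response variable.
plot(mtcars$mpg, res, xlab = "mpg", ylab = "Residuals", main = "Residuals by mpg")
abline(h = 0, lty = 2)
```

In the plot against the independent variable, curvature or a funnel shape would be the usual signs of a problem. The plot against the response typically shows an upward trend even for a well-behaved model, since the residuals are part of the response.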

Use hist() to plot a histogram of the residuals. Do the residuals appear to be normally distributed?

Use qqnorm() and qqline() to plot a QQ-normal plot with the QQ-line of the residuals. Do the residuals appear to be normally distributed?
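Both normality checks can be sketched as follows, again assuming the stand-in model mpg ~ wt on mtcars.

```r
# Assumption: this model stands in for your own simple linear regression.
data(mtcars)
res <- residuals(lm(mpg ~ wt, data = mtcars))

# Histogram: look for a roughly symmetric, bell-shaped distribution.
hist(res, main = "Histogram of residuals", xlab = "Residuals")

# QQ-normal plot: points close to the line suggest approximate normality.
qqnorm(res)
qqline(res)
```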

Use par(mfrow = c(2,2)) and plot() on your linear model to build a plot similar to figure 3.14 on page 70.

Record which data points are labeled in the subplots.

Solution:

The data points: 209, 425, 435, and 599.
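To inspect labeled points like these, index the data frame by row. In the sketch below, mtcars is an assumed stand-in dataset. Taking the three largest absolute standardized residuals is also an assumption, standing in for the row labels you read off your own plots.

```r
# Assumption: mtcars and this model stand in for your own data and model.
data(mtcars)
fit <- lm(mpg ~ wt, data = mtcars)

# Stand-in for the labels read off the diagnostic plots:
# the row positions of the three largest absolute standardized residuals.
flagged <- order(abs(rstandard(fit)), decreasing = TRUE)[1:3]

# Print the full rows for the flagged observations.
flagged_rows <- mtcars[flagged, ]
print(flagged_rows)

# With row numbers from your own plots you would index directly,
# for example: my_data[c(209, 425, 435, 599), ]   (my_data is hypothetical)
```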

Print those observations. Investigate each of these points and decide which ones are legitimate data points and which ones are erroneous and polluting your dataset.

Use car::powerTransform() to find power transformations for

• y - min(y) + 1, and

• x - min(x) + 1.
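The sketch below shows one way to estimate and apply the powers. It assumes the car package is installed and uses mtcars as a stand-in dataset. Applying the estimated power with car::bcPower() is one reasonable choice, not necessarily the one your course intends.

```r
# Assumptions: the car package is installed; mtcars stands in for your data.
data(mtcars)
y <- mtcars$mpg
x <- mtcars$wt

# Shift each variable so its minimum is exactly 1 (the transform needs values > 0).
y_shift <- y - min(y) + 1
x_shift <- x - min(x) + 1

# Estimate a Box-Cox power (lambda) for each shifted variable.
pt_y <- car::powerTransform(y_shift)
pt_x <- car::powerTransform(x_shift)
summary(pt_y)
summary(pt_x)

# Extract the estimated powers as plain numbers.
lambda_y <- unname(coef(pt_y))
lambda_x <- unname(coef(pt_x))

# Apply the powers; bcPower() uses log() when lambda is essentially 0.
y_new <- car::bcPower(y_shift, lambda_y)
x_new <- car::bcPower(x_shift, lambda_x)
```

The summary() output also reports a rounded power, which is often easier to interpret than the raw estimate.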

Transform the data and call the new data y_new and x_new. Build four scatterplots.

• y ~ x

• y_new ~ x

• y ~ x_new

• y_new ~ x_new
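The four scatterplots can be placed in a single 2 x 2 grid for side-by-side comparison, as in this sketch. To keep it self-contained, it assumes mtcars as a stand-in dataset and uses a log transform as a placeholder for whatever powers powerTransform() suggested for your data.

```r
# Assumptions: mtcars stands in for your data, and log() stands in for
# the power transformations found in the previous step.
data(mtcars)
y <- mtcars$mpg
x <- mtcars$wt
y_new <- log(y - min(y) + 1)
x_new <- log(x - min(x) + 1)

# Draw all four combinations in one 2 x 2 grid.
old_par <- par(mfrow = c(2, 2))
plot(x, y, main = "y ~ x")
plot(x, y_new, main = "y_new ~ x")
plot(x_new, y, main = "y ~ x_new")
plot(x_new, y_new, main = "y_new ~ x_new")
par(old_par)
```

The panel whose points fall closest to a straight line with even scatter is the natural candidate for the final model.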

Which of these models appears to be the best fit?

Solution:

The simple linear regression model.

Build the corresponding linear model.
