Statistics project

**Order Instructions:**

Using any data of interest to your group, compile a data set comprised of one predictor and one response variable with at least 20 observations (data points), and answer the questions below. Your project needs to be typed and plots can be made using any software of your choice. Only one project (with each member’s name) per group needs to be submitted. Your project should include all the observations used.

Provide a brief description of your project. Make sure to identify the predictor and response variables, as well as discussing the objective of your regression model.

1. (20%) All your answers must be in the order in which the questions are asked, otherwise you will be deducted 20%. Note: Even if only one answer is out of order you will still be deducted 20%.

2. (15%) For your predictor and response variables:

(a) compute the range and IQR.

(b) make a histogram of your data.

(c) make a boxplot of your data.

3. (25%) Make a scatterplot of your data and describe the:

(a) Direction

(b) Form

(c) Strength

(d) Correlation

(e) Outliers

4. (40%) Based on your data, construct a linear regression model of your response variable as a function of your predictor variable following the steps below:

(a) Compute x̄ and ȳ

(b) Compute sx and sy

(c) Compute r

(d) Compute a and b

(e) Construct the respective Least Squares line and plot it over your scatter plot.

(f) Compute the respective R² and interpret your results.

(g) For your model, compute and plot the residuals vs x. Describe what you observe from this plot.

(h) Are there any outliers? If so, are they high leverage and/or influential?

(i) Based on your model, make 3 predictions for your response variable (i.e., use 3 different values of x that are not in your data, and compute the respective y value).

**SAMPLE ANSWER**

### Statistics project

**Question One**

The data below were obtained from an organization that wanted to estimate the cost of leasing a building from the contract value for constructing it. The contract value is therefore the predictor variable and the estimated cost is the response variable. The objective of the regression model is to predict the estimated cost from the contract value.

| Estimated cost (obs. 1 to 10) | Estimated cost (obs. 11 to 20) | Contract value (obs. 1 to 10) | Contract value (obs. 11 to 20) |
| --- | --- | --- | --- |
| 85,000 | 310,000 | 100,000 | 360,000 |
| 70,000 | 305,000 | 120,000 | 370,000 |
| 110,000 | 180,000 | 150,000 | 200,000 |
| 90,000 | 170,000 | 80,000 | 250,000 |
| 130,000 | 160,000 | 180,000 | 300,000 |
| 160,000 | 110,000 | 190,000 | 160,000 |
| 160,000 | 150,000 | 200,000 | 210,000 |
| 280,000 | 180,000 | 350,000 | 230,000 |
| 130,000 | 175,000 | 180,000 | 250,000 |
| 320,000 | 180,000 | 380,000 | 270,000 |

**Question Two**

**(a) Compute the range and IQR.**

**Range**

Contract value = 380,000 - 80,000 = 300,000

Estimated cost = 320,000 - 70,000 = 250,000

**Interquartile Range**

Contract value = 300,000 - 175,000 = 125,000

Estimated cost = 197,500 - 125,000 = 72,500
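As a cross-check, the range and IQR can be recomputed with a short Python sketch. Python is used here only for illustration (the answer does not say which software produced its figures), and the helper names `value_range` and `iqr` are hypothetical. Quartile conventions differ between textbooks and software packages, so the IQR this prints need not equal the figures quoted above; the range does not depend on any convention.

```python
import statistics

# Observations copied from the data table (20 per variable).
estimated_cost = [85000, 70000, 110000, 90000, 130000, 160000, 160000, 280000,
                  130000, 320000, 310000, 305000, 180000, 170000, 160000,
                  110000, 150000, 180000, 175000, 180000]
contract_value = [100000, 120000, 150000, 80000, 180000, 190000, 200000, 350000,
                  180000, 380000, 360000, 370000, 200000, 250000, 300000,
                  160000, 210000, 230000, 250000, 270000]


def value_range(data):
    """Largest observation minus smallest observation."""
    return max(data) - min(data)


def iqr(data, method="inclusive"):
    """Q3 - Q1 under the chosen quartile convention (an assumption here)."""
    q1, _median, q3 = statistics.quantiles(data, n=4, method=method)
    return q3 - q1


for name, data in [("Contract value", contract_value),
                   ("Estimated cost", estimated_cost)]:
    print(f"{name}: range = {value_range(data):,}, IQR = {iqr(data):,.0f}")
```

Passing `method="exclusive"` instead shows how much the quartiles move with the convention on a sample of only 20 points.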

**(b) Make a histogram of your data.**

**(c) Make a boxplot of your data.**

**Question Three**

**(a) Direction**

The direction of a relationship tells whether the values of two variables go up and down together. If two variables have a positive direction, then as the values of one variable go up, so do the values of the other. The data used have a positive direction because the points of the scatter plot run from the lower left to the upper right. This implies that as the contract value goes up, so does the estimated cost, and vice versa.

**(b) Form**

The form of a scatter plot is the overall shape of the point cloud, which may follow a curve or a straight line. If the relationship is linear, the points cluster around a generally straight line. In the plot above the points follow a straight and consistent pattern, i.e. there is a linear relationship between the estimated cost and the contract value.

**(c) Strength**

The strength of the relationship between variables is determined by how closely the plotted points cluster around the overall form, here a straight line. Tightly clustered points indicate a strong relationship. In this case, most points lie close to the line, so the relationship is strong.

**(d) Correlation**

The correlation between two variables measures the strength and direction of their linear relationship. Both have already been established in the previous paragraphs, and the correlation coefficient computed in Question Four (r ≈ 0.94) agrees with them. Therefore, we conclude that there is a strong positive relationship between the variables.

**(e) Outliers**

Outliers are points that lie far from the bulk of the data in a scatter plot. In this case, there are four outliers, which the box plot also shows.
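One common way to make the outlier count reproducible is the 1.5 × IQR rule that most box plot software uses for its whiskers. The sketch below assumes that rule and the "inclusive" quartile convention, neither of which is stated in the answer, and the function name `iqr_outliers` is hypothetical. With these data it flags four estimated-cost values, the same count reported above.

```python
import statistics

# Estimated cost observations copied from the data table.
estimated_cost = [85000, 70000, 110000, 90000, 130000, 160000, 160000, 280000,
                  130000, 320000, 310000, 305000, 180000, 170000, 160000,
                  110000, 150000, 180000, 175000, 180000]


def iqr_outliers(data, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    spread = q3 - q1
    lower_fence = q1 - k * spread
    upper_fence = q3 + k * spread
    return sorted(v for v in data if v < lower_fence or v > upper_fence)


flagged = iqr_outliers(estimated_cost)
print(flagged)  # [280000, 305000, 310000, 320000]
```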

**Question Four**

**(a) Compute x̄ and ȳ**

The mean of the estimated cost (the response, y) is the sum of all the observations divided by the number of observations.

**ȳ =** 3,455,000/20

= 172,750

The mean of the contract value (the predictor, x) is the sum of all the observations divided by the number of observations.

**x̄ =** 4,530,000/20

= 226,500

**(b) Compute sx and sy**

The standard deviation of a variable is the square root of the sum of the squared deviations from its own mean, divided by the number of observations less one.

The standard deviation of the estimated cost (sy) is

sy = (107,323,750,000/19)^(1/2)

= 75,157.2912

The standard deviation of the contract value (sx) is

sx = (152,055,000,000/19)^(1/2)

= 89,458.8997
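The means and sample standard deviations can be checked directly from the raw observations. This is a minimal sketch with hypothetical helper names (`mean`, `sample_sd`); the one detail worth noting is that each variable's deviations are measured from that variable's own mean.

```python
import math

# x = contract value (predictor), y = estimated cost (response).
contract_value = [100000, 120000, 150000, 80000, 180000, 190000, 200000, 350000,
                  180000, 380000, 360000, 370000, 200000, 250000, 300000,
                  160000, 210000, 230000, 250000, 270000]
estimated_cost = [85000, 70000, 110000, 90000, 130000, 160000, 160000, 280000,
                  130000, 320000, 310000, 305000, 180000, 170000, 160000,
                  110000, 150000, 180000, 175000, 180000]


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation: squared deviations divided by n - 1."""
    centre = mean(values)  # the variable's own mean
    squared_deviations = sum((v - centre) ** 2 for v in values)
    return math.sqrt(squared_deviations / (len(values) - 1))


x_bar, y_bar = mean(contract_value), mean(estimated_cost)
s_x, s_y = sample_sd(contract_value), sample_sd(estimated_cost)
print(f"x_bar = {x_bar:,.0f}, s_x = {s_x:,.4f}")
print(f"y_bar = {y_bar:,.0f}, s_y = {s_y:,.4f}")
```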

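**(c) Compute r**

The correlation coefficient is computed from the column totals in the last row of the table below, with n = 20:

$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^{2} - \left(\sum X\right)^{2}\right]\left[n\sum Y^{2} - \left(\sum Y\right)^{2}\right]}}$$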

| Estimated cost (Y) | Contract value (X) | XY | Y^{2} | X^{2} |
| --- | --- | --- | --- | --- |
| 85,000 | 100,000 | 8,500,000,000 | 7,225,000,000 | 10,000,000,000 |
| 70,000 | 120,000 | 8,400,000,000 | 4,900,000,000 | 14,400,000,000 |
| 110,000 | 150,000 | 16,500,000,000 | 12,100,000,000 | 22,500,000,000 |
| 90,000 | 80,000 | 7,200,000,000 | 8,100,000,000 | 6,400,000,000 |
| 130,000 | 180,000 | 23,400,000,000 | 16,900,000,000 | 32,400,000,000 |
| 160,000 | 190,000 | 30,400,000,000 | 25,600,000,000 | 36,100,000,000 |
| 160,000 | 200,000 | 32,000,000,000 | 25,600,000,000 | 40,000,000,000 |
| 280,000 | 350,000 | 98,000,000,000 | 78,400,000,000 | 122,500,000,000 |
| 130,000 | 180,000 | 23,400,000,000 | 16,900,000,000 | 32,400,000,000 |
| 320,000 | 380,000 | 121,600,000,000 | 102,400,000,000 | 144,400,000,000 |
| 310,000 | 360,000 | 111,600,000,000 | 96,100,000,000 | 129,600,000,000 |
| 305,000 | 370,000 | 112,850,000,000 | 93,025,000,000 | 136,900,000,000 |
| 180,000 | 200,000 | 36,000,000,000 | 32,400,000,000 | 40,000,000,000 |
| 170,000 | 250,000 | 42,500,000,000 | 28,900,000,000 | 62,500,000,000 |
| 160,000 | 300,000 | 48,000,000,000 | 25,600,000,000 | 90,000,000,000 |
| 110,000 | 160,000 | 17,600,000,000 | 12,100,000,000 | 25,600,000,000 |
| 150,000 | 210,000 | 31,500,000,000 | 22,500,000,000 | 44,100,000,000 |
| 180,000 | 230,000 | 41,400,000,000 | 32,400,000,000 | 52,900,000,000 |
| 175,000 | 250,000 | 43,750,000,000 | 30,625,000,000 | 62,500,000,000 |
| 180,000 | 270,000 | 48,600,000,000 | 32,400,000,000 | 72,900,000,000 |
| **3,455,000** | **4,530,000** | **903,200,000,000** | **704,175,000,000** | **1,178,100,000,000** |

The last row gives the column totals.

r = 0.94439147
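The value of r can be reproduced from the column totals alone. The sketch below plugs the totals from the last row of the table into the usual computational formula for Pearson's r; the function name `pearson_r_from_sums` is hypothetical.

```python
import math

# Column totals from the last row of the table (n = 20 observations).
n = 20
sum_x = 4_530_000             # contract value (X)
sum_y = 3_455_000             # estimated cost (Y)
sum_xy = 903_200_000_000
sum_x2 = 1_178_100_000_000    # sum of squared contract values
sum_y2 = 704_175_000_000      # sum of squared estimated costs


def pearson_r_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Pearson correlation coefficient from raw sums."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) *
                            (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = pearson_r_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2)
print(f"r = {r:.8f}")
```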

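**(d) Compute a and b**

The slope of the least squares line is b = r·sy/sx and the intercept is a = ȳ - b·x̄, which gives

a = -6958.173

b = 0.793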
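The intercept and slope can also be obtained straight from the observations, using the equivalent form b = Sxy/Sxx and a = ȳ - b·x̄. This is an illustrative sketch; `least_squares` is a hypothetical helper name.

```python
# x = contract value (predictor), y = estimated cost (response).
contract_value = [100000, 120000, 150000, 80000, 180000, 190000, 200000, 350000,
                  180000, 380000, 360000, 370000, 200000, 250000, 300000,
                  160000, 210000, 230000, 250000, 270000]
estimated_cost = [85000, 70000, 110000, 90000, 130000, 160000, 160000, 280000,
                  130000, 320000, 310000, 305000, 180000, 170000, 160000,
                  110000, 150000, 180000, 175000, 180000]


def least_squares(x, y):
    """Return (intercept a, slope b) of the least squares line y = a + b*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b


a, b = least_squares(contract_value, estimated_cost)
print(f"a = {a:.3f}, b = {b:.7f}")
```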

**(e) Construct the respective Least Squares line and plot it over your scatter plot.**

Estimated cost = -6958.173 + 0.793 × contract value

**(f) Compute the respective R² and interpret your results.**

R² = 0.89187525

This implies that about 89 percent of the variation in estimated cost is explained by the linear relationship with contract value.
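R² can be checked from its definition as the share of total variation that the fitted line accounts for, 1 - SSE/SST, rather than by squaring r. The sketch below is one way to do that; all variable names are illustrative.

```python
# x = contract value (predictor), y = estimated cost (response).
x = [100000, 120000, 150000, 80000, 180000, 190000, 200000, 350000,
     180000, 380000, 360000, 370000, 200000, 250000, 300000,
     160000, 210000, 230000, 250000, 270000]
y = [85000, 70000, 110000, 90000, 130000, 160000, 160000, 280000,
     130000, 320000, 310000, 305000, 180000, 170000, 160000,
     110000, 150000, 180000, 175000, 180000]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Least squares slope and intercept.
b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
     / sum((xi - x_bar) ** 2 for xi in x))
a = y_bar - b * x_bar

# SSE: variation left over around the line; SST: total variation in y.
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
sst = sum((yi - y_bar) ** 2 for yi in y)

r_squared = 1 - sse / sst
print(f"R^2 = {r_squared:.8f}")
```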

**(g) For your model, compute and plot the residuals vs. x. Describe what you observe from this plot.**

The residual plot above indicates that the residuals have roughly constant variance and show no systematic pattern, because their spread is similar regardless of the contract value. The normal probability plot below also suggests that the residuals are approximately normally distributed.
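The residuals behind such a plot are simply the observed estimated costs minus the fitted ones. The sketch below computes them and lists them in order of contract value, which is the order in which a residuals-vs-x plot would show them. It is illustrative only, and the plotting step itself is left to whatever software is preferred.

```python
# x = contract value (predictor), y = estimated cost (response).
x = [100000, 120000, 150000, 80000, 180000, 190000, 200000, 350000,
     180000, 380000, 360000, 370000, 200000, 250000, 300000,
     160000, 210000, 230000, 250000, 270000]
y = [85000, 70000, 110000, 90000, 130000, 160000, 160000, 280000,
     130000, 320000, 310000, 305000, 180000, 170000, 160000,
     110000, 150000, 180000, 175000, 180000]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
     / sum((xi - x_bar) ** 2 for xi in x))
a = y_bar - b * x_bar

# Residual = observed y minus the value predicted by the fitted line.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Print (x, residual) pairs sorted by x, ready to be plotted.
for xi, ei in sorted(zip(x, residuals)):
    print(f"{xi:>9,}  {ei:>12,.1f}")
```

A useful sanity check is that least squares residuals sum to zero (up to rounding), so a residual plot should be centred on the horizontal axis.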

**(h) Are there any outliers? If so, are they high leverage and/or influential?**

There are outliers in the data, but they are neither high-leverage nor influential.
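Leverage in simple regression depends only on how far an observation's x value lies from x̄: h = 1/n + (x - x̄)²/Sxx. The sketch below computes it for every contract value so that the judgment above can be examined. The answer does not say which cutoff it applied, and rules of thumb vary (2p/n and 3p/n are both used, with p = 2 coefficients here), so no cutoff is asserted. The helper name `leverages` is hypothetical.

```python
# Contract value observations (the predictor, x).
contract_value = [100000, 120000, 150000, 80000, 180000, 190000, 200000, 350000,
                  180000, 380000, 360000, 370000, 200000, 250000, 300000,
                  160000, 210000, 230000, 250000, 270000]


def leverages(x):
    """Leverage of each observation in a simple linear regression on x."""
    n = len(x)
    x_bar = sum(x) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return [1 / n + (xi - x_bar) ** 2 / s_xx for xi in x]


h = leverages(contract_value)

# List the five observations with the largest leverage.
for xi, hi in sorted(zip(contract_value, h), key=lambda pair: -pair[1])[:5]:
    print(f"contract value {xi:>9,}: leverage {hi:.3f}")
```

Whether a high-leverage point is also influential depends on its residual as well, which a measure such as Cook's distance combines with leverage.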

**(i) Based on your model, make 3 predictions for your response variable**

Using the equation Estimated cost = -6958.173 + 0.793 × contract value, with the slope kept at full precision (about 0.7934136) rather than rounded to 0.793, the predicted values for three contract values not in the data are shown in the table below.

| Contract value | 276,000 | 302,000 | 144,000 |
| --- | --- | --- | --- |
| Predicted estimated cost | 212,023.9716 | 232,652.7243 | 107,293.3807 |
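The three predictions can be reproduced by refitting the line and evaluating it at the new contract values. The sketch below uses unrounded coefficients, which is what the tabulated predictions correspond to. Plugging the rounded slope 0.793 into the equation gives slightly lower values. The variable names are illustrative.

```python
# x = contract value (predictor), y = estimated cost (response).
x = [100000, 120000, 150000, 80000, 180000, 190000, 200000, 350000,
     180000, 380000, 360000, 370000, 200000, 250000, 300000,
     160000, 210000, 230000, 250000, 270000]
y = [85000, 70000, 110000, 90000, 130000, 160000, 160000, 280000,
     130000, 320000, 310000, 305000, 180000, 170000, 160000,
     110000, 150000, 180000, 175000, 180000]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
     / sum((xi - x_bar) ** 2 for xi in x))
a = y_bar - b * x_bar

# New contract values, none of which appear in the data.
new_contract_values = [276000, 302000, 144000]
predictions = [a + b * v for v in new_contract_values]

for v, p in zip(new_contract_values, predictions):
    print(f"contract value {v:,} -> predicted estimated cost {p:,.4f}")
```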
