Statistical Computing

# Problem A. Random number generation and power.

**Definition:** The **power** of a statistical test is the probability the test correctly rejects the null hypothesis when it is indeed false.
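In symbols, the definition above can be restated as follows. The symbol $\beta$ is introduced here for illustration and denotes the probability of a Type II error (failing to reject a false null hypothesis).

$$\text{Power} = P(\text{reject } H_0 \mid H_0 \text{ is false}) = 1 - \beta$$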

# Statistical Computing Assignment Paper

1. Let’s explore the `rnorm` function. The `rnorm()` function in R randomly generates data from a normal distribution with a specified mean and a specified standard deviation. Recall that we saw the exact equation of the probability density function in the notes. The r in `rnorm` stands for “random.” The data values are drawn at random according to the probabilities dictated by that density function.

a. Use the `set.seed()` function with a seed number of your choosing. Then complete the following: Use `rnorm(500)` and draw a histogram using `ggplot2` tools of the resulting data values. Enter your code in the space provided below so that when you **knit** this document it will show the code and the histogram.

```{r problem1a}

```
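As a small illustration of what `set.seed()` does, the sketch below resets the generator to the same seed twice and compares the draws. The seed (42), the sample size (5), and the object names are hypothetical and chosen only for this example. The block uses a plain `r` fence so that it is displayed but not run when the document is knit.

```r
# Hypothetical seed and sample size, chosen only for illustration
set.seed(42)
first_draw <- rnorm(5)

set.seed(42)            # resetting the same seed restarts the generator
second_draw <- rnorm(5)

identical(first_draw, second_draw)  # TRUE: same seed, same values

third_draw <- rnorm(5)              # no reset, so the random stream continues
identical(first_draw, third_draw)   # FALSE: these are new draws
```

This is why you only need to set the seed once at the start: every later call simply continues from where the previous one stopped.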

b. Use `rnorm(500)` again and draw a histogram of the resulting data values using base R tools. Note: You do not need to use the `set.seed()` function again. Enter the code in the space provided below.

```{r problem1b}

#Enter your code here.

```
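The two plotting approaches differ mainly in what they expect as input. The sketch below shows the general pattern on made-up exponential data rather than normal data. The object names, seed, and bin count are assumptions for illustration, and the example assumes the `ggplot2` package is installed.

```r
library(ggplot2)  # assumes the ggplot2 package is installed

set.seed(1)                         # hypothetical seed
wait_times <- rexp(200, rate = 2)   # made-up data: 200 exponential draws

# Base R: hist() takes the numeric vector directly
hist(wait_times, main = "Base R histogram", xlab = "Value")

# ggplot2: the vector must first be placed in a data frame
plot_data <- data.frame(value = wait_times)
ggplot(plot_data, aes(x = value)) +
  geom_histogram(bins = 20) +
  labs(title = "ggplot2 histogram", x = "Value", y = "Count")
```

The number of bins is a judgment call; trying a few values is a reasonable way to see how it changes the apparent shape.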

c. Write a few sentences to describe your histograms in parts a) and b). Also, explain what you think the 500 in the code represents. Type your answers below in plain text:

TYPE YOUR ANSWER HERE!

d. Based on a) and b), what do you think the default mean is for the `rnorm` function? How did the graphs inform your answer? Type your answers below.

TYPE YOUR ANSWER HERE IN PLAIN TEXT:

e. Do you know what the default standard deviation is? How did you determine this?

TYPE YOUR ANSWER HERE IN PLAIN TEXT:

f. Use `rnorm(500,100,5)` and draw a histogram (`base` or `ggplot2`) of the resulting data values. Report your code in the space provided below.

```{r problem1f}

#Enter your code here.

```

g. Use `rnorm(500,100,5)` again and draw a histogram (`base` or `ggplot2`) of the resulting data values. Report your code in the space provided below.

```{r problem1g}

#Enter your code here.

```

h. Based on f) and g), what does the second argument (the 100) of the `rnorm` function do? How did the graphs provide evidence of this for you?

TYPE YOUR ANSWERS HERE IN PLAIN TEXT:

i. What does the third argument (the 5) of the `rnorm` function do?

TYPE YOUR ANSWER HERE IN PLAIN TEXT:

j. Generate 1000 observations from the F distribution with 5 numerator degrees of freedom and 10 denominator degrees of freedom. Draw a histogram (`base` or `ggplot2`) of the data values.

```{r problem1j}

#Enter your code here.

```

k. Write a few sentences to describe the histogram you generated in part j.

TYPE YOUR ANSWER HERE IN PLAIN TEXT:
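The `r` prefix is not specific to `rnorm`. Base R follows the same naming pattern for many distributions, and the help page `?Distributions` lists them. The sketch below shows a few of these generators with made-up parameter values; the seed and object names are hypothetical.

```r
set.seed(7)  # hypothetical seed

u  <- runif(4, min = 0, max = 1)   # uniform on [0, 1]
e  <- rexp(4, rate = 2)            # exponential with rate 2
ch <- rchisq(4, df = 3)            # chi-squared with 3 degrees of freedom
tt <- rt(4, df = 8)                # Student's t with 8 degrees of freedom

# Each call returns a numeric vector whose length is the first argument
lengths(list(u = u, e = e, ch = ch, tt = tt))
```

In each case the first argument is the number of values to draw, and the remaining arguments are the parameters of that distribution.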

*Now that you have an understanding of how the `rnorm` function works, complete the following problems. These items go together to investigate the statistical concept of power.*

2. Create two scalars named `rows` and `samplesize` with the values of 10 and 3, respectively. (We will later change these values to 1000 and 30, but while you are working on getting all of your code to run, these smaller values will allow you to print objects and view them to investigate what is going on).

```{r problem2}

#Enter your code here.

```

3. Use the `set.seed(1000)` function so that we all are randomly generating from the same starting point.

```{r problem3}

#Enter your code here.

```

4. Use the `rnorm()` function in R in order to randomly generate data from a normal distribution with a mean of 100 and a standard deviation of 5. You should generate enough values to fill in a matrix with the number of rows and the number of columns given by the objects `rows` and `samplesize` that you created in #2. You should not retype their values. Instead, reference the objects so that later you can make the change once in the code to explore other rows and sample size options.

```{r problem4}

#Enter your code here.

```

5. Create a matrix of the data values you randomly generated in #4 that has the number of rows given by the scalar you created in #2. Also include code to print the matrix you created.

```{r problem5}

#Enter your code here.

```
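To see how `matrix()` arranges a vector, it can help to fill one with the integers 1, 2, 3, ... so that the order is visible. This is only a sketch: `n_rows`, `n_cols`, and the other names are hypothetical and deliberately different from the objects used in this assignment.

```r
n_rows <- 2   # hypothetical names and sizes, not those used in the assignment
n_cols <- 4

values <- seq_len(n_rows * n_cols)   # 1, 2, ..., 8 so the fill order is visible

by_column <- matrix(values, nrow = n_rows)                # default: fills down columns
by_row    <- matrix(values, nrow = n_rows, byrow = TRUE)  # fills across rows

by_column
by_row
dim(by_column)   # 2 4
```

Because only `nrow` is supplied, R works out the number of columns from the length of the vector. For independent random draws the fill order does not change the statistical properties of the rows.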

6. Calculate the mean of each row of your matrix and store this information in an object called `mymeans`. Print the output. Answer the question that follows.

```{r problem6}

#Enter your code here.

```

What do you notice about the sample means from your samples and the mean of the normal distribution from which they were drawn?

TYPE YOUR ANSWER HERE IN PLAIN TEXT:

7. Calculate the standard deviation of each row of your matrix and store this information in an object called `mysd`. Print the output. Answer the question that follows.

```{r problem7}

#Enter your code here.

```

What do you notice about the sample standard deviations from your samples and the standard deviation of the normal distribution from which they were drawn?

TYPE YOUR ANSWER HERE IN PLAIN TEXT:
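One common way to summarize each row of a matrix is `apply()` with `MARGIN = 1`. The sketch below uses a tiny made-up matrix whose row summaries are easy to check by hand; the object names are hypothetical.

```r
toy <- matrix(c(1, 2, 3,
                4, 6, 8), nrow = 2, byrow = TRUE)  # made-up values

row_means <- apply(toy, 1, mean)  # MARGIN = 1 means "by row"
row_sds   <- apply(toy, 1, sd)

row_means      # 2 6
row_sds        # 1 2
rowMeans(toy)  # same values as row_means
```

`rowMeans()` is a shortcut for the mean; base R has no matching shortcut for the standard deviation, so `apply()` is the usual choice there.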

8. Suppose you planned to test the hypotheses $H_0: \mu = 107$ vs. $H_a: \mu \neq 107$ in order to determine whether the mean of the population from which your data are drawn is different from 107.

a. Question: What is the true mean of the population from which this data is drawn? Type your answer below in plain text.

TYPE YOUR ANSWER HERE IN PLAIN TEXT:

b. Question: Therefore, what do you expect your p-value would look like (small or large)? Type your answer below in plain text.

TYPE YOUR ANSWER HERE IN PLAIN TEXT:

c. Question: Therefore, what is the correct outcome of this test (reject the null or do not reject the null)? Type your answer below in plain text.

TYPE YOUR ANSWER HERE IN PLAIN TEXT:

d. For this test, you would calculate the test statistic as $$t=\frac{\bar{x}-107}{s/\sqrt{n}}$$

Using the objects you created in #6 and #7, and knowing that the object `samplesize` from #2 holds the size of each sample, calculate this test statistic for every row of your data using R. You do not need the apply function; this is just a calculation involving vectors! Save your vector of test statistics in an object called `test.stat`. Be very careful with parentheses!! I would calculate the numerator and denominator separately and then divide if I were you! Your code should ultimately print the object `test.stat`.

```{r problem8d}

#Enter your code here.

```
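The advice about parentheses can be seen on a toy example. The numbers and object names below are made up for illustration (they are not the objects from #6 and #7), and the hypothesized mean of 11 is likewise hypothetical.

```r
xbar <- c(10, 12)  # made-up sample means
s    <- c(2, 4)    # made-up sample standard deviations
n    <- 4          # made-up common sample size
mu0  <- 11         # made-up hypothesized mean

numerator   <- xbar - mu0     # -1  1
denominator <- s / sqrt(n)    #  1  2
t_toy <- numerator / denominator
t_toy                         # -1.0  0.5

# Without parentheses, division is carried out before subtraction:
xbar - mu0 / s / sqrt(n)      # a different quantity, not the test statistic
```

Because R works element by element on vectors, one line of arithmetic produces a test statistic for every sample at once.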

e. Report the data values (there should be 3 values) from the first row of your matrix of data values by using R code to print them. Report the mean and standard deviation of these three data values by using R code to calculate them. Use R as a calculator to calculate the test statistic for these three data values only. Then print the first element of the object `test.stat` to verify that it was calculated correctly. Beware of the order of operations (PEMDAS)!

```{r problem8e}

#Enter your code here.

```

9. We can use the following code to calculate the two-sided p-value:

`pvals <- 2*pt(abs(test.stat),lower.tail=_____________,df=________)`

Note that the `abs()` function in R calculates absolute values. I did this because I wanted to work only with the positive version of the test statistics. Since the t-distribution is symmetric, I can use mirror images to calculate p-values more efficiently. Additionally, I can multiply by 2 in order to find the two-sided p-value. Complete the two blanks in the code and run it below. Your outcome should be a vector with one p-value for each row of the data. Also print your vector `pvals`.

```{r problem9}

#Enter your code here.

```
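The mirror-image idea can be checked numerically on any symmetric distribution. The sketch below uses the standard normal distribution and made-up test statistics rather than the t-distribution from this problem, so it illustrates the symmetry argument only; the object names are hypothetical.

```r
z <- c(-1.96, 0.5, 2.3)   # made-up standard normal test statistics

upper_tail <- 1 - pnorm(abs(z))   # P(Z > |z|)
lower_tail <- pnorm(-abs(z))      # P(Z < -|z|), the mirror image

all.equal(upper_tail, lower_tail) # TRUE by symmetry

two_sided <- 2 * upper_tail       # both tails combined
two_sided
```

Taking the absolute value first means a single tail calculation works whether the original statistic was positive or negative.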

10. Find the proportion of times the p-values were less than 0.05. Note that you should use a logical comparison to get TRUE/FALSE values for each p-value based on if it is less than 0.05. Then fill in the blanks in the two interpretations below.

```{r problem10}

#Enter your code here.

```

We can interpret the proportion or percentage you calculated in the previous problem as follows:

Interpretation 1: ___% of the time we correctly rejected the null hypothesis that $H_0: \mu = 107$ when the true population mean is $\mu = 100$ based on samples of size 3.

OR

Interpretation 2: The probability of correctly rejecting the null hypothesis of $H_0: \mu = 107$ when the true population mean is $\mu = 100$ is about ___ based on samples of size 3.
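A proportion can be read off a logical comparison because R treats `TRUE` as 1 and `FALSE` as 0 in arithmetic. The sketch below uses four made-up p-values; the object names are hypothetical.

```r
toy_p <- c(0.01, 0.20, 0.03, 0.70)  # made-up p-values

is_small <- toy_p < 0.05   # TRUE FALSE TRUE FALSE
sum(is_small)              # 2   (number of p-values below 0.05)
mean(is_small)             # 0.5 (proportion of p-values below 0.05)
```

Multiplying the result of `mean()` by 100 gives the percentage form used in Interpretation 1.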

### The remaining problems ask you to re-run your code from #2-10, making changes each time. Copy and paste your code and re-run it to answer the following questions. Do not edit your original code (so that I can grade questions #2-10). Instead, copy and paste all of the individual lines of code from #2-10 and make the necessary changes. HOWEVER, REMOVE ANY PRINTING OF OBJECTS. I DON’T WANT TO SEE ANYTHING THAT IS 1000 LINES LONG!

11. Our answer in #10 was based on seeing the process repeated a mere 10 times. That’s not enough to see the long-term patterns! Copy your code from #2-10 and paste it below. **Remove any lines of code that would print out objects.** For this problem, I want you to change the number of `rows` to 1000 so that we can look at the long-term proportion of times the null hypothesis is correctly rejected. Your code should ultimately print out the proportion of times you obtained a p-value of less than 0.05 and therefore rejected the null hypothesis. Also, write a sentence to summarize the proportion you find in context as indicated in the space after your R code.

```{r problem11}

#Enter your code here.

```

Type your sentence in plain text here:

12. Our answer in #10 was based on seeing the process repeated a mere 10 times, but it was also based on samples of size 3. That’s not very interesting! Copy your code from #11. For this problem, I want you to keep the number of `rows` at 1000 and change the `samplesize` to 30 so that we can look at the long-term proportion of times the null hypothesis is correctly rejected for a larger sample size. Your code should ultimately print out the proportion of times you obtained a p-value of less than 0.05 and therefore rejected the null hypothesis. Type your code below. Also, write a sentence to summarize the proportion you find in context as indicated in the space after your R code, and answer the questions that follow by typing in plain text.

```{r problem12}

#Enter your code here.

```

Type your sentence in plain text here:

Is this what you would expect? (YES/NO)

TYPE YOUR ANSWER HERE IN PLAIN TEXT:

Explain. Type your answer as plain text here:

TYPE YOUR ANSWER HERE IN PLAIN TEXT:
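A simulated proportion like the ones above can be cross-checked against a theoretical power calculation. One option in base R is `power.t.test()` from the `stats` package. The sketch below uses a hypothetical design (sample size 20, true difference 2, standard deviation 5) rather than the settings in this assignment, and it is offered as a sanity check, not as part of the required solution.

```r
# Hypothetical design, chosen only for illustration
theory <- power.t.test(n = 20, delta = 2, sd = 5, sig.level = 0.05,
                       type = "one.sample", alternative = "two.sided")

theory$power   # theoretical probability of rejecting H0 under this design
```

Here `delta` is the distance between the true mean and the hypothesized mean. With enough simulated samples, the simulated proportion for the same design should land close to this value.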
