For all the questions below, use degrees of freedom N – 1.
Q1. The file data.online.scores.txt contains students’ exam scores (a sample from the population) from the past few years of an online course.

The first column is the student ID, the second column is the mid-term score, and the third column is the final score; columns are separated by tabs. Based on the dataset, give the following statistical description of the data. If a result is not an integer, round it to 3 decimal places. Give the basic statistical description of the mid-term scores.

a. Max, min

b. First quartile Q1, median, third quartile Q3.

c. The mean score.

d. The mode score.

e. Empirical Variance.
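
One possible way to compute the quantities in Q1 is sketched below in Python, using only the standard library. The helper names load_scores and describe are hypothetical, the file is assumed to have no header row, and the "inclusive" quartile method is an assumption: other quartile conventions can give slightly different Q1 and Q3 values, so check which one the course expects.

```python
import csv
import statistics


def load_scores(path):
    """Read tab-separated rows of (id, mid-term, final).

    Assumes there is no header row and that both score columns are numeric.
    """
    midterm, final = [], []
    with open(path, newline="") as f:
        for row in csv.reader(f, delimiter="\t"):
            if len(row) < 3:
                continue  # skip blank or malformed lines
            midterm.append(float(row[1]))
            final.append(float(row[2]))
    return midterm, final


def describe(scores):
    """Return the Q1 summary statistics, rounded to 3 decimal places."""
    # Quartile convention is an assumption; "exclusive" is the other built-in option.
    q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return {
        "max": max(scores),
        "min": min(scores),
        "q1": round(q1, 3),
        "median": round(median, 3),
        "q3": round(q3, 3),
        "mean": round(statistics.fmean(scores), 3),
        # multimode returns every most-frequent value, in case of ties
        "mode": statistics.multimode(scores),
        # statistics.variance divides by N - 1 (sample variance)
        "variance": round(statistics.variance(scores), 3),
    }
```

For example, describe(load_scores("data.online.scores.txt")[0]) would summarize the mid-term column, provided the file is in the working directory.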

Q2. Use the students’ score data (file data.online.scores.txt).

Normalize the mid-term scores using z-score normalization (subtract the mean, then divide by the empirical standard deviation).

a. Compare the empirical variance before and after normalization.

b. Given an original score of 90, what is the corresponding score after normalization?

c. Pearson’s correlation coefficient between midterm scores and final scores is:

d. Covariance between midterm scores and final scores is:
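
The Q2 quantities could be computed along the following lines. This is a minimal sketch in Python rather than a prescribed solution: the function names are hypothetical, and every variance-like quantity uses the N – 1 denominator stated at the top of the sheet.

```python
import math
import statistics


def z_scores(scores):
    """Z-score normalize: subtract the mean, divide by the sample std (N - 1)."""
    mean = statistics.fmean(scores)
    std = statistics.stdev(scores)
    return [(x - mean) / std for x in scores]


def z_score_of(value, scores):
    """Normalized value of a single raw score, relative to the given sample."""
    return (value - statistics.fmean(scores)) / statistics.stdev(scores)


def covariance(xs, ys):
    """Sample covariance with the N - 1 denominator."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of the sample stds."""
    return covariance(xs, ys) / (statistics.stdev(xs) * statistics.stdev(ys))
```

With midterm and final as lists of scores, statistics.variance(z_scores(midterm)) gives the variance after normalization for part a, and z_score_of(90, midterm) gives the value asked for in part b.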

Q3. Given the inventories of two libraries, Citadel’s Maester Library (CML) and Castle Black’s library (CBL), compare the similarity between these two libraries using different proximity measures. If a result is not an integer, round it to 3 decimal places.

a. Given 200 books, Table 1 summarizes how many books are supplied by each library. In Table 1, the cell CBL = 0, CML = 0 gives the number of items among the 200 that are served by neither CBL nor CML. The cell CBL = 1, CML = 0 gives the number of items served by CBL but not CML, and so on. Based on Table 1, calculate the Jaccard coefficient of Citadel’s Maester Library (CML) and Castle Black’s library (CBL).
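
For binary attributes, the Jaccard coefficient ignores the items that neither library holds: it is the count of items held by both, divided by the count of items held by at least one. A small Python sketch follows; the function and parameter names are hypothetical, and the counts must be read off Table 1, which is not reproduced here.

```python
def jaccard(f11, f10, f01):
    """Jaccard coefficient from contingency counts.

    f11: items held by both libraries (CBL = 1, CML = 1)
    f10: items held by CBL only       (CBL = 1, CML = 0)
    f01: items held by CML only       (CBL = 0, CML = 1)
    The 0/0 cell (held by neither) is deliberately not used.
    """
    held_by_any = f11 + f10 + f01
    if held_by_any == 0:
        raise ValueError("Jaccard is undefined when no item is held by either library")
    return round(f11 / held_by_any, 3)
```

Calling jaccard with the three non-zero-match cells of Table 1 returns the coefficient rounded to 3 decimal places.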

