Problem Solving with R Code

  

Instructions: Using R and the data set provided, provide the code and output that completes the following. I have intentionally left a few minor details for you to either reason through or research. I am aware that there are lots of ways to accomplish many of these tasks in R; I do not really care which way you do them so long as your output answers the questions asked.

Expectation: This assignment is constructed such that you can easily validate your answers in Excel with minimum effort. Therefore, I expect you to do so.

Note: You might find this tidbit useful. A simple call to the aggregate function in R produces two variables with the labels "Group.1" and "x". If you want to change the labels, you can use the "names" command, e.g. names(x) <- c("NewLabel_Group.1", "NewLabel_x"), where x is the object returned by aggregate.
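The renaming tip in the note can be sketched as follows. This is only an illustration: the data frame `releases` and its columns `year` and `loc` are invented here and are not the assignment data set.

```r
# Toy data, invented purely for illustration (NOT the assignment data set).
releases <- data.frame(
  year = c(1992, 1992, 1993, 1993, 1993),
  loc  = c(100, 200, 300, 400, 500)
)

# The non-formula form of aggregate() with an unnamed grouping list
# labels its output columns "Group.1" and "x".
avg_loc <- aggregate(releases$loc, by = list(releases$year), FUN = mean)
default_labels <- names(avg_loc)
print(default_labels)

# Replace the default labels with more descriptive ones.
names(avg_loc) <- c("Year", "Avg_LOC")
print(avg_loc)
```

Naming the grouping list, as in by = list(Year = releases$year), is another way to label the group column up front; either approach should give the same values.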

1. Load the data file.

2. Provide me the structure of the loaded data set.

3. Provide me a “summary” of the loaded data structure.

4. Count of releases per year.

5. Count of releases for each group of two years (i.e. 1992 and 1993, 1994 and 1995, etc.).

6. Average number of Lines of Code (LOC) per release per year.

7. Average file size per year.

8. Create a single data frame through code which contains the year along with the average, median, and standard deviation for the LOC and tar file variables.

Find something interesting to tell me about from this data or about this assignment! Note that this is subjective on my part and is intended to challenge you to go beyond the constraints of the above questions!
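One possible shape for the numbered tasks above is sketched below. It is not the expected solution. The assignment does not give the file name or column names, so the data frame `releases`, its columns `year`, `loc`, and `tar_size`, and the helper `stat_by_year` are all hypothetical stand-ins. The two-year grouping also assumes the first group starts at the earliest year in the data.

```r
# Hypothetical stand-in for the assignment data. With a real file, a call
# such as read.csv() on that file would replace this block (task 1).
releases <- data.frame(
  year     = c(1992, 1992, 1993, 1994, 1994, 1995),
  loc      = c(100, 200, 300, 400, 600, 800),
  tar_size = c(10, 20, 30, 40, 60, 80)
)

str(releases)             # task 2: structure
print(summary(releases))  # task 3: summary

# Task 4: count of releases per year.
per_year <- aggregate(releases$loc, by = list(releases$year), FUN = length)
names(per_year) <- c("Year", "Releases")

# Task 5: count per two-year group, labeled by the first year of the group.
# Assumption: groups start at the earliest year present in the data.
start <- min(releases$year)
period_start <- start + 2 * ((releases$year - start) %/% 2)
per_period <- aggregate(releases$loc, by = list(period_start), FUN = length)
names(per_period) <- c("Period_Start", "Releases")

# Tasks 6-8: one statistic per year, with the output columns relabeled.
# stat_by_year is a hypothetical helper written for this sketch.
stat_by_year <- function(values, years, fun, label) {
  out <- aggregate(values, by = list(years), FUN = fun)
  names(out) <- c("Year", label)
  out
}

pieces <- list(
  stat_by_year(releases$loc,      releases$year, mean,   "LOC_Avg"),
  stat_by_year(releases$loc,      releases$year, median, "LOC_Median"),
  stat_by_year(releases$loc,      releases$year, sd,     "LOC_SD"),
  stat_by_year(releases$tar_size, releases$year, mean,   "Tar_Avg"),
  stat_by_year(releases$tar_size, releases$year, median, "Tar_Median"),
  stat_by_year(releases$tar_size, releases$year, sd,     "Tar_SD")
)

# Join the per-statistic tables on Year into a single data frame.
# Note: sd() is NA for any year that has only one release.
stats <- Reduce(function(a, b) merge(a, b, by = "Year"), pieces)

print(per_year)
print(per_period)
print(stats)
```

Because each intermediate table is small, the same numbers can be checked in Excel with a pivot table or AVERAGEIF-style formulas, in line with the validation expectation stated above.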
