Section 4 Methods

In this project, we extracted R-package names from CRAN and obtained summary daily download logs of R-packages through the web Application Programming Interface (API) maintained by r-hub(R-Hub 2020). We also collected release dates of R-packages through CRAN. In addition, we also tried to scrap the number of commits on master (main) branch in Github repositories for all of available R-packages. Then we conducted eight exploratory data analysis studies:

  • we explored the daily downloads of all the packages on CRAN and figured out the number of R-packages occupied by different downloaded groups;

  • we analyzed the daily download trend of R itself and the found out the most popular version of R;

  • we compared the top 15 downloaded R-packages on CRAN, yearly, on 1st April. from 2013 to 2021, to see how user preference change;

  • we figured out the relationship between initial release dates and the total download counts over the most recent 6 month period. The reason for choosing the period of 6 months is : different R-packages are initially released in different time. If the comparison period is too long, it will be unfair to the R-packages released later. And if the period is too short, download data will be not enough to reflect some potential patterns or relationships;

  • we compared two closely related packages : fable and forecast, to see the difference between their MA (moving average) trends;

  • we studied how the number of commits on master (main) branches in Github repositories may influence the total download counts for the most recent 6 month period;

  • we explored the relationship between the number of updates for all of R-packages and their download counts;

  • we analyzed the name length patterns for both CRAN task view packages and all of R-packages on CRAN;

  • we explored how the alphabetical order of R-packages would related to the download counts and the statistic features (e.g.variance, median of the download counts).

References