5.4 Compare last year’s downloads with the earliest release date
Finding: R-packages that were first released on CRAN earlier tend to have higher download counts over the past year. A likely reason is that, when they were released, there were fewer R-packages in the same category, so users had ‘no choice’ but to use them. These R-packages therefore built up a user base, which in turn makes them more likely to attract new users.
Intuitively, we might assume that the earlier an R-package is released, the more people know about it, and thus the more downloads it has. However, R-packages on different topics cannot be compared directly, because the total number of downloads in one topic may be much higher than in another. Therefore, to test this conjecture as cleanly as possible, we selected the R-packages of three domains through the CRAN task views (n.d.b), calculated each package's downloads over the past year, and extracted its earliest release date for comparison. The three topics are:
- R-packages for Time Series Analysis
The first topic is Time Series Analysis, a set of statistical techniques for time series data and trend analysis. Time series data are observations recorded over a sequence of particular time periods or intervals (“Time Series Analysis” 2020).
- Bayesian R-packages for general model fitting
The second topic is Bayesian Inference. Bayesian statistics is a mathematical procedure that applies probabilities to statistical problems. It gives people the tools to update their beliefs in light of new data (perpetual 2019).
- Econometrics R-packages
To test whether the same pattern holds in another area, the last topic is Econometrics. Econometrics is the application of statistical methods to quantitative data in order to develop theories or test existing hypotheses in economics or finance; it relies on techniques such as regression models and null hypothesis testing (Hayes 2020).
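The comparison described above needs two numbers per package: its earliest release date and its total downloads over the comparison window. The following is a minimal sketch in Python, assuming the release dates and daily download counts of each package have already been retrieved. The record layout, the field names, the `summarise` helper, and the sample values are all hypothetical and are not taken from the original analysis.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class PackageRecord:
    """Hypothetical container for one package's raw data."""
    name: str
    topic: str                        # e.g. the task view the package belongs to
    release_dates: list[date]         # every release date found for the package
    daily_downloads: dict[date, int]  # downloads per day


def summarise(record: PackageRecord, window_end: date, window_days: int = 365) -> dict:
    """Return the earliest release date and the downloads inside the window."""
    window_start = window_end - timedelta(days=window_days)
    total = sum(
        count
        for day, count in record.daily_downloads.items()
        if window_start < day <= window_end  # keep only days inside the window
    )
    return {
        "name": record.name,
        "topic": record.topic,
        "earliest_release": min(record.release_dates),
        "downloads": total,
    }


# Made-up example values, for illustration only.
example = PackageRecord(
    name="pkgA",
    topic="TimeSeries",
    release_dates=[date(2015, 6, 1), date(2010, 3, 1)],
    daily_downloads={
        date(2018, 1, 1): 100,   # outside the window, ignored
        date(2019, 12, 31): 5,
        date(2020, 6, 1): 10,
    },
)
summary = summarise(example, window_end=date(2020, 6, 30))
print(summary["earliest_release"], summary["downloads"])
```

Keeping the topic on each summary row makes it straightforward to compare packages only within the same topic, which is the point of restricting the analysis to three task views.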
Figure 5.10 displays a scatterplot of the past year’s download count against the earliest release date for the Time Series Analysis, Econometrics, and Bayesian R-packages. In general, the later the earliest release date, the lower the download count. Most of the Time Series Analysis R-packages were first released between 2012 and 2019, most of the Bayesian R-packages between 2007 and 2012, and most of the Econometrics R-packages between 2013 and 2016.

Figure 5.10: The download count decreases as the initial release date gets later.
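The downward trend in the scatterplot can also be checked numerically with a rank correlation between earliest release date and download count, computed separately for each topic. The sketch below implements Spearman's rho with the Python standard library. It is a suggested check rather than a step reported in this analysis, and the helper names and sample values are hypothetical.

```python
from datetime import date


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(release_dates, downloads):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    days = [d.toordinal() for d in release_dates]  # dates as day numbers
    return pearson(average_ranks(days), average_ranks(downloads))


# Made-up values for one topic, for illustration only.
releases = [date(2008, 1, 1), date(2011, 1, 1), date(2014, 1, 1), date(2017, 1, 1)]
downloads = [9000, 4000, 5000, 300]
rho = spearman(releases, downloads)
print(round(rho, 3))  # -0.8 for this toy data
```

A value of rho close to -1 within a topic would support the reading that later releases go with fewer downloads. A rank-based measure is used here because download counts are typically highly skewed.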
In conclusion, we are not surprised to find that the earlier an R-package is released, the more downloads it tends to have, a pattern that appears in all three topics above. This is probably because R-packages released earlier are better known. When they were released, there were probably relatively few R-packages on the same topic, so they faced little competition. As a result, R-packages that arrive later are easily overshadowed, since people generally stick to well-known, mature, and familiar packages.
That is to say, earlier R-packages have had more time to shape user habits, and habits strengthen with time. For example, a teacher who has long used certain R-packages may recommend them to students, and colleagues tend to recommend the R-packages they know to others, especially when their own experience has been satisfying.