5.9 Compare download counts with alphabetical order of names

We might also have such experiences in our life : when we go shopping in the supermarket, the products placed in front of the shelves can be noticed by us easier. From the perspective of R-packages, in addition to the lengths of names, we also wondered whether alphabetical order can link to the download volume. R-packages with earlier alphabetical order will be placed at the first part of package list on CRAN(“Available CRAN Packages by Name” 2021). To answer this question, we grouped the R-packages by 26-letter order, calculated the average downloads for each group, and made comparisons.

Finding 1: For all of CRAN R-packages, the average downloads of different alphabetical groups are slightly increasing with alphabetical order, while the total downloads tend to decrease a little, on the contrary.

From Figure 5.25, we could see that the average downloads of different alphabetical groups are slightly increasing with alphabetical order, while the total downloads tend to decrease, instead. That is because the later-ordered groups contain fewer R-packages. Developers may prefer to name their packages with top alphabetical order, which might be easier for users to notice.

Figure 5.25: The average total download counts of each group is little linked to the alphabetical order of name, for all of R-packages.

Finding 2: For all of R-packages on CRAN, alphabetical groups with higher total downloads tend to have greater variance, owing to more outliers.

Then, we took a look at how the variance varies across groups. Figure 5.26 shows the data range and the median value for each alphabetical group. It can be seen that the group “R/r” has the highest outlier and the group “X/x” has the largest variation. The variances across groups difference little, which means that for each group, 50% of the download counts are relatively concentrated. The real difference is the highest and lowest downloads per group. In general, the larger the total numbers of downloads (which also means more packages in this group), the more outliers can be included, such as group “F/f,” “L/l” and “R/r.”

Figure 5.26: The R-packages with name starting with “j” has the largest variation.

In order to further verify our conclusion, we turned to the ultra-low-downloaded R-packages. As mentioned previously, when it comes to the ultra-low-downloaded R-packages, we could approximately assume the only factor that may affect the downloads is name order here. From Figure 5.27, we could see that the difference in median download counts of each alphabetical group is not significant as we expected.

Figure 5.27: The last 1% downloaded R-packages with name starting with “U/u” has only one observation.

Therefore, we could approximately draw a conclusion : In general, the R-packages with top alphabetical order can be easier to get relatively higher download volume, with non-significant gaps. At the same time, the higher the numbers of downloads, the greater the variance can appear in this group.

References

“Available CRAN Packages by Name.” 2021. CRAN Packages By Name. https://cran.r-project.org/web/packages/available_packages_by_name.html.