Naming things is tricky in general, and especially in R, because there is no real consensus on naming conventions for packages and functions. Since I am currently in the process of releasing an R package, I wanted more recent data on how the different naming styles are used for R packages and package functions. My hypothesis was that users are most likely familiar with the naming conventions of the most downloaded R packages, so it might make sense to adopt those styles.
For my small analysis project I distinguished between the following naming styles:
- lowercase (lc)
- UPPERCASE (UC)
- lowerCamelCase (lCC)
- UpperCamelCase (UCC)
- name_with_underscores (us / snake_case)
- name.with.dots (dot)
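A minimal sketch of how a name could be assigned to one of these six styles in R. The function name `classify_style`, the regular expressions, and the precedence (underscore before dot) are assumptions made for illustration; they are not the rules used in the actual analysis.

```r
# Assign one name to one of the six styles listed above.
# Hypothetical helper: the patterns and their order are illustrative assumptions.
classify_style <- function(name) {
  if (grepl("_", name, fixed = TRUE)) return("us")    # name_with_underscores
  if (grepl(".", name, fixed = TRUE)) return("dot")   # name.with.dots
  if (grepl("^[a-z][a-z0-9]*$", name, perl = TRUE)) return("lc")
  if (grepl("^[A-Z][A-Z0-9]*$", name, perl = TRUE)) return("UC")
  # Mixed case without separators: the first letter decides the camel case variant.
  if (grepl("^[a-z][A-Za-z0-9]*$", name, perl = TRUE)) return("lCC")
  if (grepl("^[A-Z][A-Za-z0-9]*$", name, perl = TRUE)) return("UCC")
  NA_character_                                       # anything else stays unclassified
}

example_names <- c("ggplot2", "data.table", "read_csv", "getData", "MASS", "DataFrame")
styles <- vapply(example_names, classify_style, character(1), USE.NAMES = FALSE)
# styles is c("lc", "dot", "us", "lCC", "UC", "UCC")
```

Names that mix separators, such as an underscore and a dot in the same name, fall into whichever check comes first, so the order of the checks is itself a design choice.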
Today (2017-06-04), I pulled the package names and the function names of each package from the rdocumentation API for the 500 most downloaded R packages. After some processing, I created a contingency table of the function and package naming styles and visualized it with the following mosaic plot. If a package uses several function naming styles, the modal (most frequent) style is used. If a package contains both lowercase and lowerCamelCase functions, the lowercase occurrences are added to the lowerCamelCase counts.
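The aggregation rule for a single package could be sketched as follows. The function name `package_style` is hypothetical, the input is assumed to be a character vector with one style label per function, and ties are resolved by taking the first maximum, which is an assumption since the tie-breaking rule is not stated here.

```r
# Reduce the function styles of one package to a single label.
# Hypothetical helper; takes one style label per function of the package.
package_style <- function(styles) {
  tab <- table(styles)
  counts <- setNames(as.vector(tab), names(tab))  # plain named integer vector

  # If both lowercase and lowerCamelCase occur, count lowercase as lowerCamelCase.
  if (all(c("lc", "lCC") %in% names(counts))) {
    counts[["lCC"]] <- counts[["lCC"]] + counts[["lc"]]
    counts <- counts[names(counts) != "lc"]
  }

  # Modal value; which.max() returns the first maximum if there is a tie.
  names(counts)[which.max(counts)]
}

# Made-up example: 3 lowercase, 2 lowerCamelCase and 4 underscore functions.
example_styles <- c("lc", "lc", "lc", "lCC", "lCC", "us", "us", "us", "us")
package_style(example_styles)  # "lCC", because 3 + 2 = 5 beats 4
```

Without the merge step, the same package would be labelled `us`, so this rule can change the result for packages with a mix of styles.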
First we see that around 70% of the package names are in lowercase. This large proportion could be interpreted as a guideline for choosing a package name.
The function names are distributed more heterogeneously. The combination of lowercase package names and lowerCamelCase function names appears most frequently. Underscore, dot and lowercase function names are also often used in conjunction with lowercase package names.
When we look only at the 100 most downloaded R packages, the combination of lowercase package names and function names with underscores is the second most frequent one.
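For readers who want to reproduce this kind of summary, here is a rough sketch of how the contingency table, the package style shares and the most frequent combination could be computed with base R. The data frame is made up for illustration and is not the data behind this post; the column and variable names are assumptions as well.

```r
# Hypothetical per-package styles; these values are invented for illustration only.
pkg_styles <- data.frame(
  package_style  = c("lc",  "lc", "lc",  "lCC", "dot", "lc", "UCC", "lc"),
  function_style = c("lCC", "us", "dot", "lCC", "dot", "lc", "UCC", "lCC"),
  stringsAsFactors = FALSE
)

# Contingency table: package naming style (rows) by function naming style (columns).
tab <- table(package = pkg_styles$package_style,
             functions = pkg_styles$function_style)

# Share of each package naming style (row margins of the table).
package_share <- prop.table(margin.table(tab, 1))

# Most frequent package/function combination (first one if there are ties).
idx <- which(tab == max(tab), arr.ind = TRUE)
top_combo <- c(package   = rownames(tab)[idx[1, 1]],
               functions = colnames(tab)[idx[1, 2]])
```

Passing `tab` to `mosaicplot()` from base graphics draws a mosaic plot of such a table.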