The golden era of Ten simple rules articles

R
PLOS
Data analysis
Author

Pablo Gómez Barreiro

Published

September 2, 2024

Top 10 lists have dominated the internet for ages, and academic publishing is no exception. My Twitter feed is proof of this; with frequent “Ten Simple Rules to…” being retweeted left and right. This made me curious about the origin of these papers and whether they were a sudden trend or just a passing phase. I did not jump into this data search with many expectations, but what I initially thought would be a simple answer turned into an unexpectedly enjoyable internet adventure.

The data, and the cleaning

Using Dimensions, I downloaded all available data for articles containing the phrases “10 simple rules” and “Ten simple rules.” It’s not a perfect starting point, but I’ll clean the data later. If you’d like to follow along or explore on your own, you can find the original .csv file here: LINK. Just remember to skip the first row when loading the file in R. After that, it’s simply a matter of filtering out titles that don’t contain the target keywords

Show the code
library(tidyverse)

keywords<-c("ten simple rules","10 simple rules")

clean_data<- read_csv("Dimensions-Publication-2024-08-24_19-59-19.csv",  skip = 1)%>%
  filter(grepl(paste(keywords,collapse = "|"),tolower(Title)))

head(clean_data,10)

The original sin

Arranging data by publication year shows that the first known paper (according to Dimensions) with “Ten simple rules” in its title was published back in 1988.

Screenshot from the 1988 movie Who framed roger rabbit?

For temporal context, the movie Who framed roger rabbit? was also published in 1988
Show the code
library(tidyverse)

clean_data%>%
  select(Title,PubYear)%>%
  arrange(PubYear)%>%
  head(5)

The paper, titled Ten Simple Rules for Improving Advertising Programs in the Health Care Industry, was the first of its kind. I’d love to provide more details about these rules, but in 1988, academia hadn’t yet discovered the wonders of open access, and modern publishers still believe in gatekeeping the contents of a 36-year-old article.

The second paper with a similar title had to wait a whopping 17 years before making its appearance in 2005. Philip E. Bourne, the Editor-in-Chief of PLOS Computational Biology, was the mastermind behind the fitting title Ten Simple Rules for Getting Published, which can be credited with truly starting the “10 simple rules” trend, setting off a snowball effect in academic publishing.

The golden era

Ten simple rules” is such a simple and effective formula for a title—a blend of clickbait with the enticing promise to solve the reader’s problems in just a few easy steps. But just how many of these papers are out there in the wild? and the answer is:

Screen caption of the paper titled The association between academia clickbait and impact

Clickbait titles in academia are on the rise, but this is not necessarily correlated with an increase of citations
Show the code
library(ggthemes)

clean_data%>%
  summarise(.by=PubYear,n=n())%>%
  ggplot(aes(x=PubYear,y=n))+
  geom_col(fill="darkred")+
  scale_x_continuous(breaks = seq(1988,2024,2))+
  labs(title = "The age of 10 simple rules....", y="Number of Articles",x="Year",caption = "Data source: Dimensions")+
  theme_economist()+
  theme(axis.text.y = element_text(hjust = 0),
        axis.title.y = element_text(vjust = 5),
        axis.text.x = element_text(angle=90))

A histogram showing the number of articles with ten simple rules in its title over time.

a lot! About 300 of “10 simple rules” papers. That means in the scientific literature there are at least 3000 simple rules!

My next question was, “Where are all these articles being published?” When I saw the answer, my first thought was that I must have made an embarrassing mistake during data cleaning or miswritten something in the code. But nope, the data was correct—papers titled Ten Simple Rules have a clear and distinct origin.

Show the code
clean_data%>%
  summarise(.by=`Source title`,n=n())%>%
  arrange(desc(n))%>%
  mutate(`Source title`=ifelse(grepl("International Journal for Paras",`Source title`),"IJPPW",`Source title`))%>%
  head(10)%>%
  ggplot(aes(y=fct_reorder(`Source title`,n),x=n))+
  geom_col(fill="darkred")+
  scale_x_continuous(limits = c(0,300))+
  labs(title = "The age of 10 simple rules....", y="Top 10 journals",x="Number of articles",caption = "Data source: Dimensions")+
  theme_economist()+
  theme(axis.text.y = element_text(hjust = 0),
        axis.title.y = element_text(vjust = 5))

A graph showing the journal PLOS Computational Biology has published the large majority of articles with the words "ten simple rules" in its title

The large majority of these papers are coming from PLOS Computational Biology, and of course it’s no coincidence that their Editor-in-Chief published the first modern Ten Simple Rules paper back in 2005. This became a series, with him leading the list of authors who had articles published in it.

Show the code
#Compile a vector with all authors names and count
all_authors <-  trimws(unlist(strsplit(clean_data$Authors, ";")))
table(all_authors)%>%
  as.tibble()%>%
  arrange(desc(n))%>%
  head(5)
# A tibble: 5 × 2
  all_authors              n
  <chr>                <int>
1 Bourne, Philip E.       27
2 Bourne, Philip E        12
3 Botham, Crystal M.       6
4 Mulder, Nicola           5
5 De Las Rivas, Javier     4

The impact of 10 simple rules for

Dimensions data also includes article citation numbers. To assess the impact of these papers over time, I decided to track the number of citations they accumulate each year, adding 1 to avoid dividing by zero for articles published in 2024.

Show the code
library(ggrepel)
clean_data%>%
  mutate(years_publicated=2024-PubYear+1,
         cit_per_year=`Times cited`/years_publicated,
         iscompbio=ifelse(`Source title`=="PLOS Computational Biology","Yes","No"))%>%
  ggplot(aes(x=PubYear,y=cit_per_year,fill=iscompbio))+
  scale_fill_manual(values = c("darkblue","darkred"))+
  geom_jitter(height=0,width = .2,colour="black",size=3,shape=21)+
  geom_text_repel(data=.%>%
                    filter(cit_per_year>40)%>%
                    mutate(Title=gsub("ten simple rules for ","...",tolower(Title))),
                  aes(label=Title,y=cit_per_year),
                  nudge_y=5,
                  force=2,
                  max.overlaps=3,
                  direction="both",
                  size=4,
                  segment.linetype = 2
                  )+
  scale_x_continuous(breaks = seq(1988,2024,2))+
  labs(title = "The impact of 10 simple rules for....", y="Citations/year published",x="Year",caption = "Data source: Dimensions",fill="Published in PLOS CB?")+
  theme_economist()+
  theme(axis.text.y = element_text(hjust = 0),
        axis.title.y = element_text(vjust = 5),
        axis.text.x = element_text(angle=90),
        legend.position = "bottom")

A graph with the number of citations accumulated per year against year published of papers with 10 simple rules in its title. The graph makes a distinction between articles published or not at PLOS

For good measurement I am also going to plot articles vs total citations

Show the code
clean_data%>% 
  ggplot(aes(x=`Times cited`))+
  geom_histogram(colour="black",fill="darkred",binwidth = 10)+
  labs(title = "The impact of 10 simple rules for....", y="Number of articles",x="Total citations",caption = "Data source: Dimensions")+
  theme_economist()+
  theme(axis.text.y = element_text(hjust = 0),
        axis.title.y = element_text(vjust = 5),
        axis.text.x = element_text(angle=90))

Histogram showing most articles with ten simple rules in the title are in the low spectrum of number of citations

Despite PLOS having the large majority of ten simple rules articles, the more impactful ones (citation wise) were published at different publishers. However, in general the majority of articles remain on the lower end of the citation spectrum, including some newer articles that may not have yet achieved their breakthrough.

At this point, I decided to turn to the internet for some additional clues into PLOS and its “Ten Simple Rules” series, just to discover my findings were not particularly novel.

Stephen Heard had already commented on this trend on his blog less than a year ago. Along with a quality parody of this trending genre of papers, his blog ends up highlighting the epitome article of the Ten simple rules… series: Ten Simple Rules for Writing a PLOS Ten Simple Rules Article. Published back in 2014, the article is still a fun read with some interesting data on the series. Back then, Philip E. Bourne was present in almost half of the papers on the series.

2014 Pie chart from PLOS article shows Philip E. Bourne presence in the series

The future of the ten simple rules series

Later, in 2018, Philip E. Bourne would then went into publishing One thousand simple rules. The paper ends rising some interesting questions about the series, its novelty and the future of science dissemination. This article goes as far as questioning of whether these articles might be better suited as blogs. But here we are in 2024, and the series is still going strong. With the current PLOS article processing chargess, each published rule cost authors approximately $296, perhaps an extra factor to reconsider the long term viability of the series in its current format. At glance, some articles in the series seem to offer basic guidelines and common sense, with a wide range of tones in them (From Ten simple rules to win a Nobel prize to Ten simple rules to ruin a collaborative environment). In 2023, the number of articles published in the series was halved compared to 2022, although the 2024 “harvest” is on track to match 2023’s output.
As 2025 marks the 20th anniversary of the “Ten Simple Rules” series, will PLOS plan a special edition to celebrate the milestone?

Note

PLOS articles can be downloaded for text mining purposes from here. In hindsight, I should have use this to get some better metrics for the Ten simple rules… series. I have got a tutorial on text mining PLOS data here in case you want to explore further.