From epub to print copy, which of the traditional ‘pain’ journals is the quickest?
Peter Kamerman
5 February 2017

Background
Traditional journals (those that publish hardcopy volumes) typically publish an electronic version of an article before the print copy is produced, presumably because the practice reduces the time between an article being accepted for publication and the information being disseminated. These epubs have a DOI, which makes them readily citable.
In my neck of the woods (South Africa), the number of original research outputs by a university is factored into the annual government subsidy an institution receives. Only articles with page numbers are included in the calculation, which for traditional journals means that the articles must have been published in hardcopy format. At my institution, the University of the Witwatersrand, a small fraction of that government subsidy for publications trickles down to the originating labs as a research incentive. It’s not much money, but every bit helps in these tight funding times, and so ‘time to print’ is something we have to consider when selecting which journal(s) to submit our work to.
To help us decide which of the traditional pain-focused journals has the quickest electronic-to-hardcopy turnaround, I have performed a very crude analysis of the 'time to print' of the four top-ranked traditional pain journals (based on impact factor) that we typically consider submitting manuscripts to (Table 1).
# Make a dataframe to populate the table
tab_df <- tibble(Journal = c('<a href="http://journals.lww.com/clinicalpain/pages/default.aspx" target="_blank">Clinical Journal of Pain</a>',
                             '<a href="http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)1532-2149" target="_blank">European Journal of Pain</a>',
                             '<a href="http://www.jpain.org/home" target="_blank">Journal of Pain</a>',
                             '<a href="http://journals.lww.com/pain/pages/default.aspx" target="_blank">PAIN</a>'),
                 `Impact factor` = c(2.7, 2.9, 4.5, 5.6),
                 `Year started` = c(1985, 1997, 2000, 1975),
                 `Frequency (issues per year)` = c(12, 10, 12, 12))

# Print table
kable(x = tab_df,
      align = 'lrrr',
      caption = '<b>Table 1.</b> Journals included in assessment')
Journal | Impact factor | Year started | Frequency (issues per year) |
---|---|---|---|
Clinical Journal of Pain | 2.7 | 1985 | 12 |
European Journal of Pain | 2.9 | 1997 | 10 |
Journal of Pain | 4.5 | 2000 | 12 |
PAIN | 5.6 | 1975 | 12 |
Getting the data
I obtained the electronic and print publication dates of articles for the past four years from PubMed. Beyond the usual web-browser method of searching PubMed, you can access the full database remotely through the user-friendly and well-documented Entrez Programming Utilities API (E-utilities). In R you can make these queries to the PubMed database directly using packages such as xml2, or, if you are not familiar with using web APIs, the rOpenSci team provides the excellent rentrez package. I have used the direct approach here (see code below).
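For anyone who prefers the rentrez route, a minimal sketch of a roughly equivalent query is shown below. It was not part of this analysis: the object names (pm_search, pm_records) are mine, and the search term simply mirrors the one used in the chunk that follows.

# A minimal rentrez sketch (not run as part of this analysis)
library(rentrez)

# Search PubMed using (roughly) the same criteria as the E-utilities query below
pm_search <- entrez_search(db = 'pubmed',
                           term = '("Pain"[Journal] OR "J Pain"[Journal] OR "Clin J Pain"[Journal] OR "Eur J Pain"[Journal]) AND journal article[Publication Type] AND hasabstract[All Fields] AND ("2013/01/01"[EDAT] : "2016/12/31"[EDAT])',
                           retmax = 10000)

# Fetch the first 200 records as xml (in practice the PMIDs would be chunked)
pm_records <- entrez_fetch(db = 'pubmed',
                           id = pm_search$ids[1:200],
                           rettype = 'xml')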
# Set eval = FALSE after first run so as to speed-up knit on future run
# First run used to save data outputs from this chunk to file, which can
# be read into memory in future runs.
############################################################
# #
# Query PubMed for records from the #
# top four journals from the past 4 years #
# #
############################################################
# Set E-Utilities base query string
base_url <- 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/'

# Set database queries to search and fetch from PubMed
database_query  <- 'esearch.fcgi?db=pubmed'
database_query2 <- 'efetch.fcgi?db=pubmed'
# Set search criteria
## Restricted to:
### 1. journal articles
### 2. articles with abstracts
### 3. Clin J Pain, Eur J Pain, J Pain, PAIN
### 4. Entrez entry date range of 2013/01/01 to 2016/12/31
### 5. First 10,000 articles
### 6. xml format
terms <- '&term=((journal+article[Publication+Type]+AND+hasabstract[All+Fields])+AND+("2013/01/01"[EDAT]+:+"2016/12/31"[EDAT]))+AND+((("Pain"[Journal]+OR+"J+Pain"[Journal])+OR+"Clin+J+Pain"[Journal])+OR+"Eur+J+Pain"[Journal])&rettype=xml&retmax=10000'

# Piece together the search query string
search_query <- paste0(base_url,
                       database_query,
                       terms)

# Execute search
search_get <- read_xml(search_query)

# Find xpath for PMIDs from 'query'
pmid_path <- xml_find_all(search_get, xpath = './/Id')

# Use xpath to extract PMIDs
pmids <- xml_text(pmid_path)
############################################################
# #
# Fetch the records using the returned PMIDs #
# #
############################################################
# Get the number of records returned by the search
record_count <- length(pmids)

# Split the 'pmids' vector into n = 200 sized chunks
# (the max number of ids the API can handle per fetch);
# trim the final chunk so the index does not run past the end of 'pmids'
splitter <- seq(from = 1,
                to = record_count,
                by = 200)

# Create an empty list of length 'splitter'
splitter_list <- vector(mode = 'list',
                        length = length(splitter))

# Split the list of PMIDs, and paste each chunk into a single string
for(i in seq_along(splitter)) {
  splitter_list[[i]] <- pmids[splitter[[i]]:min(splitter[[i]] + 199, record_count)]
  splitter_list[[i]] <- paste(splitter_list[[i]],
                              collapse = ',')
}

# Create empty list of length 'splitter_list'
pubmed_query <- vector(mode = 'list',
                       length = length(splitter_list))

# Populate empty list with repeated PubMed query calls
for(i in seq_along(splitter_list)) {
  pubmed_query[[i]] <- paste0(base_url,
                              database_query2, '&id=',
                              splitter_list[[i]], '&retmode=xml&retmax=200')
}

# Fetch pubmed xml records
record <- map(pubmed_query,
              read_xml)
############################################################
# #
# Make a user-defined function ('parse_record') #
# to extract date information #
# #
############################################################
parse_record <- function(record) {

  # Packages to load when function used outside this .Rmd script #
  ################################################################
  # library(dplyr)
  # library(xml2)
  # library(stringr)

  # Set XPaths to xml nodes #
  ###########################
  #-- Publisher -----------------------------------------------------------#
  publisher_path <- xml_path(
    xml_find_all(record,
                 './/ISSNLinking'))

  #-- Journal -------------------------------------------------------------#
  journal_path <- xml_path(
    xml_find_all(record,
                 './/ISOAbbreviation'))

  #-- Volume --------------------------------------------------------------#
  volume_path <- xml2::xml_path(
    xml2::xml_find_all(record,
                       './/Volume'))

  #-- Issue ---------------------------------------------------------------#
  issue_path <- xml2::xml_path(
    xml2::xml_find_all(record,
                       './/Issue'))

  #-- PMID ----------------------------------------------------------------#
  pmid_path <- xml_path(
    xml_find_all(record,
                 ".//ArticleId[@IdType = 'pubmed']"))

  #-- Publication status --------------------------------------------------#
  status_path <- xml_path(
    xml_find_all(record,
                 './/PublicationStatus'))

  #-- Year / month published ----------------------------------------------#
  year_published_path <- xml_path(
    xml_find_all(record,
                 './/PubDate/Year'))

  month_published_path <- xml_path(
    xml_find_all(record,
                 './/PubDate/Month'))

  #-- Year / month / day online -------------------------------------------#
  year_online_path <- xml_path(
    xml_find_all(record,
                 ".//ArticleDate[@DateType = 'Electronic']/Year"))

  month_online_path <- xml_path(
    xml_find_all(record,
                 ".//ArticleDate[@DateType = 'Electronic']/Month"))

  day_online_path <- xml_path(
    xml_find_all(record,
                 ".//ArticleDate[@DateType = 'Electronic']/Day"))

  #-- Year / month / day entrez -------------------------------------------#
  # PAIN stopped giving the 'ArticleDate' info in 2015, so also get
  # 'PubMedPubDate[@PubStatus = 'entrez']', which is a close match.
  year_entrez_path <- xml_path(
    xml_find_all(record,
                 ".//PubMedPubDate[@PubStatus = 'entrez']/Year"))

  month_entrez_path <- xml_path(
    xml_find_all(record,
                 ".//PubMedPubDate[@PubStatus = 'entrez']/Month"))

  day_entrez_path <- xml_path(
    xml_find_all(record,
                 ".//PubMedPubDate[@PubStatus = 'entrez']/Day"))

  # Extract information using XPaths #
  ####################################
  #-- Publisher -----------------------------------------------------------#
  # Define vector for publisher name
  publisher <- vector(mode = 'character',
                      length = length(publisher_path))

  for(i in 1:length(publisher_path)) {
    publisher[[i]] <- str_to_lower(
      xml_text(
        xml_find_first(record,
                       publisher_path[[i]])))
  }

  # Make article marker for joins
  ## Define vector for 'trimmed' publisher path
  publisher_path2 <- vector(mode = 'character',
                            length = length(publisher_path))

  for(i in 1:length(publisher_path)) {
    publisher_path2[[i]] <-
      str_extract(publisher_path[[i]],
                  '/PubmedArticleSet/PubmedArticle\\[[0-9][0-9]?[0-9]?\\]')
  }

  # Make dataframe
  publisher2 <- data.frame(article_node = publisher_path2,
                           publisher = publisher)

  #-- Journal -------------------------------------------------------------#
  # Define vector for journal name
  journal <- vector(mode = 'character',
                    length = length(journal_path))

  for(i in 1:length(journal_path)) {
    journal[[i]] <- str_to_lower(
      xml_text(
        xml_find_first(record,
                       journal_path[[i]])))
  }

  # Make article marker for joins
  ## Define vector for 'trimmed' journal path
  journal_path2 <- vector(mode = 'character',
                          length = length(journal_path))

  for(i in 1:length(journal_path)) {
    journal_path2[[i]] <-
      str_extract(journal_path[[i]],
                  '/PubmedArticleSet/PubmedArticle\\[[0-9][0-9]?[0-9]?\\]')
  }

  # Make dataframe
  journal2 <- data.frame(article_node = journal_path2,
                         journal = journal) %>%
    mutate(journal = str_replace_all(journal,
                                     pattern = '[.]',
                                     replacement = ''))

  #-- Volume ---------------------------------------------------------------#
  # Define vector for journal volume
  volume <- vector(mode = 'character',
                   length = length(volume_path))

  for(i in 1:length(volume_path)) {
    volume[[i]] <- xml2::xml_text(
      xml2::xml_find_first(record,
                           volume_path[[i]]))
  }

  # Make article marker for joins
  ## Define vector for 'trimmed' volume path
  volume_path2 <- vector(mode = 'character',
                         length = length(volume_path))

  for(i in 1:length(volume_path)) {
    volume_path2[[i]] <-
      stringr::str_extract(volume_path[[i]],
                           '/PubmedArticleSet/PubmedArticle\\[[0-9][0-9]?[0-9]?\\]')
  }

  # Make dataframe
  volume2 <- data.frame(article_node = volume_path2,
                        volume = volume) %>%
    separate(volume,
             into = c('volume', 'other'),
             extra = 'merge') %>%
    mutate(volume = as.numeric(volume)) %>%
    select(article_node, volume)

  #-- Issue ----------------------------------------------------------------#
  # Define vector for journal issue
  issue <- vector(mode = 'character',
                  length = length(issue_path))

  for(i in 1:length(issue_path)) {
    issue[[i]] <- xml2::xml_text(
      xml2::xml_find_first(record,
                           issue_path[[i]]))
  }

  # Make article marker for joins
  ## Define vector for 'trimmed' issue path
  issue_path2 <- vector(mode = 'character',
                        length = length(issue_path))

  for(i in 1:length(issue_path)) {
    issue_path2[[i]] <-
      stringr::str_extract(issue_path[[i]],
                           '/PubmedArticleSet/PubmedArticle\\[[0-9][0-9]?[0-9]?\\]')
  }

  # Make dataframe
  issue2 <- data.frame(article_node = issue_path2,
                       issue = issue) %>%
    separate(issue,
             into = c('issue', 'other'),
             extra = 'merge') %>%
    mutate(issue = as.numeric(issue)) %>%
    select(article_node, issue)

  #-- PMID ----------------------------------------------------------------#
  # Define vector for pmid
  pmid <- vector(mode = 'character',
                 length = length(pmid_path))

  for(i in 1:length(pmid_path)) {
    pmid[[i]] <- xml_text(
      xml_find_first(record,
                     pmid_path[[i]]))
  }

  # Make article marker for joins
  ## Define vector for 'trimmed' pmid path
  pmid_path2 <- vector(mode = 'character',
                       length = length(pmid_path))

  for(i in 1:length(pmid_path)) {
    pmid_path2[[i]] <-
      str_extract(pmid_path[[i]],
                  '/PubmedArticleSet/PubmedArticle\\[[0-9][0-9]?[0-9]?\\]')
  }

  # Make dataframe
  pmid2 <- data.frame(article_node = pmid_path2,
                      pmid = pmid)

  #-- Publication status --------------------------------------------------#
  # Define vector for publication status
  status <- vector(mode = 'character',
                   length = length(status_path))

  for(i in 1:length(status_path)) {
    status[[i]] <- xml_text(
      xml_find_first(record,
                     status_path[[i]]))
  }

  # Make article marker for joins
  ## Define vector for 'trimmed' status path
  status_path2 <- vector(mode = 'character',
                         length = length(status_path))

  for(i in 1:length(status_path)) {
    status_path2[[i]] <-
      str_extract(status_path[[i]],
                  '/PubmedArticleSet/PubmedArticle\\[[0-9][0-9]?[0-9]?\\]')
  }

  # Make dataframe
  status2 <- data.frame(article_node = status_path2,
                        publication_status = status) %>%
    # Edit text
    mutate(publication_status = ifelse(
      is.na(publication_status),
      yes = NA,
      no = ifelse(
        publication_status == 'ppublish',
        yes = 'print copy',
        no = 'ahead of print')))

  #-- Year / month / day published ----------------------------------------#
  # Define vector for publication year
  year_published <- vector(mode = 'character',
                           length = length(year_published_path))

  for(i in 1:length(year_published_path)) {
    year_published[[i]] <- xml_text(
      xml_find_first(record,
                     year_published_path[[i]]))
  }

  # Define vector for publication month
  month_published <- vector(mode = 'character',
                            length = length(month_published_path))

  for(i in 1:length(month_published_path)) {
    month_published[[i]] <- xml_text(
      xml_find_first(record,
                     month_published_path[[i]]))
  }

  # Define vector for publication day (default = 1st of the month)
  day_published <- rep('01', length(year_published_path))

  # Make article marker for joins
  ## Define vector for 'trimmed' year path
  year_published_path2 <- vector(mode = 'character',
                                 length = length(year_published_path))

  for(i in 1:length(year_published_path)) {
    year_published_path2[[i]] <-
      str_extract(year_published_path[[i]],
                  '/PubmedArticleSet/PubmedArticle\\[[0-9][0-9]?[0-9]?\\]')
  }

  # Make dataframe
  year_published2 <- data.frame(article_node = year_published_path2,
                                year_published = year_published,
                                month_published = month_published,
                                day_published = day_published) %>%
    # Convert to date
    mutate(date_published = paste(year_published,
                                  month_published,
                                  day_published, sep = '-'),
           date_published = ymd(date_published)) %>%
    # Select required columns
    select(article_node, date_published)

  #-- Year / month / day online -------------------------------------------#
  # Define vector for online publication year
  year_online <- vector(mode = 'character',
                        length = length(year_online_path))

  for(i in 1:length(year_online_path)) {
    year_online[[i]] <- xml_text(
      xml_find_first(record,
                     year_online_path[[i]]))
  }

  # Define vector for online publication month
  month_online <- vector(mode = 'character',
                         length = length(month_online_path))

  for(i in 1:length(month_online_path)) {
    month_online[[i]] <- xml_text(
      xml_find_first(record,
                     month_online_path[[i]]))
  }

  # Define vector for online publication day
  day_online <- vector(mode = 'character',
                       length = length(day_online_path))

  for(i in 1:length(day_online_path)) {
    day_online[[i]] <- xml_text(
      xml_find_first(record,
                     day_online_path[[i]]))
  }

  # Make article marker for joins
  ## Define vector for 'trimmed' year path
  year_online_path2 <- vector(mode = 'character',
                              length = length(year_online_path))

  for(i in 1:length(year_online_path)) {
    year_online_path2[[i]] <-
      str_extract(year_online_path[[i]],
                  '/PubmedArticleSet/PubmedArticle\\[[0-9][0-9]?[0-9]?\\]')
  }

  # Make dataframe
  year_online2 <- data.frame(article_node = year_online_path2,
                             year_online = year_online,
                             month_online = month_online,
                             day_online = day_online) %>%
    # Convert to date
    mutate(date_online = paste(year_online,
                               month_online,
                               day_online, sep = '-'),
           date_online = ymd(date_online)) %>%
    # Select required columns
    select(article_node, date_online)

  #-- Year / month / day entrez -------------------------------------------#
  # Define vector for entrez publication year
  year_entrez <- vector(mode = 'character',
                        length = length(year_entrez_path))

  for(i in 1:length(year_entrez_path)) {
    year_entrez[[i]] <- xml_text(
      xml_find_first(record,
                     year_entrez_path[[i]]))
  }

  # Define vector for entrez publication month
  month_entrez <- vector(mode = 'character',
                         length = length(month_entrez_path))

  for(i in 1:length(month_entrez_path)) {
    month_entrez[[i]] <- xml_text(
      xml_find_first(record,
                     month_entrez_path[[i]]))
  }

  # Define vector for entrez publication day
  day_entrez <- vector(mode = 'character',
                       length = length(day_entrez_path))

  for(i in 1:length(day_entrez_path)) {
    day_entrez[[i]] <- xml_text(
      xml_find_first(record,
                     day_entrez_path[[i]]))
  }

  # Make article marker for joins
  ## Define vector for 'trimmed' year path
  year_entrez_path2 <- vector(mode = 'character',
                              length = length(year_entrez_path))

  for(i in 1:length(year_entrez_path)) {
    year_entrez_path2[[i]] <-
      str_extract(year_entrez_path[[i]],
                  '/PubmedArticleSet/PubmedArticle\\[[0-9][0-9]?[0-9]?\\]')
  }

  # Make dataframe
  year_entrez2 <- data.frame(article_node = year_entrez_path2,
                             year_entrez = year_entrez,
                             month_entrez = month_entrez,
                             day_entrez = day_entrez) %>%
    # Convert to date
    mutate(date_entrez = paste(year_entrez,
                               month_entrez,
                               day_entrez, sep = '-'),
           date_entrez = ymd(date_entrez)) %>%
    # Select required columns
    select(article_node, date_entrez)

  # Put it all together #
  #######################
  #-- Make into dataframe ----------------------------------------------#
  # Join 'short' dataframes on the article marker
  record <- pmid2 %>%
    left_join(publisher2,
              by = 'article_node') %>%
    left_join(journal2,
              by = 'article_node') %>%
    left_join(volume2,
              by = 'article_node') %>%
    left_join(issue2,
              by = 'article_node') %>%
    left_join(status2,
              by = 'article_node') %>%
    left_join(year_online2,
              by = 'article_node') %>%
    left_join(year_entrez2,
              by = 'article_node') %>%
    left_join(year_published2,
              by = 'article_node') %>%
    select(pmid,
           publisher,
           journal,
           volume,
           issue,
           publication_status,
           date_online,
           date_entrez,
           date_published)

  #-- Output -----------------------------------------------------------#
  return(record)
}
############################################################
# #
# Generate dataframe from downloaded xml record #
# #
############################################################
df <- map_df(record,
             parse_record)
############################################################
# #
# Clean-up dataframe #
# #
############################################################
df <- df %>%
  # Remove 'date_online' column (use complete 'date_entrez' data instead)
  select(-date_online) %>%
  # Make a 'year_entrez' and 'year_published' column
  mutate(year_entrez = as.numeric(str_extract(date_entrez,
                                              pattern = '[0-9]{4}')),
         year_published = as.numeric(str_extract(date_published,
                                                 pattern = '[0-9]{4}'))) %>%
  # Fix journal names
  mutate(journal = fct_recode(as.factor(journal),
                              `Clin J Pain` = 'clin j pain',
                              `Eur J Pain` = 'eur j pain',
                              `J Pain` = 'j pain',
                              PAIN = 'pain'))
# Generate 'print copy' data
df_print <- df %>%
  # Only want papers that have completed the publication cycle
  filter(publication_status != 'ahead of print') %>%
  # Remove 'date_published' = NA
  filter(!is.na(date_published)) %>%
  # Remove 'year_published' > 2016
  filter(year_published < 2017) %>%
  # Make an interval column ('time to print' in days)
  mutate(interval = as.numeric(date_published - date_entrez)) %>%
  # Remove interval values < 1
  filter(interval >= 1)
# Generate 'ahead of print' data
# df_ahead <- df %>%
# filter(publication_status != 'print copy')
# Clean-up environment
rm(list = c('base_url',
'database_query',
'database_query2',
'i',
'parse_record',
'pmid_path',
'pmids',
'pubmed_query',
'record',
'record_count',
'search_get',
'search_query',
'splitter',
'splitter_list',
'terms'))
# readr::write_rds(df, './_data/2017-02-05-publication-time/df.rds')
# readr::write_rds(df_print, './_data/2017-02-05-publication-time/df_print.rds')
Caveats
I mentioned at the start that this was a very crude analysis, and the primary reasons for this statement are as follows:
1. The PubMed database contains errors, and I made no attempt to verify the data retrieved from PubMed against the data available through the publishers.
2. PubMed xml records follow a template, but the template is not applied consistently across all records. These inconsistencies make programmatically extracting the data susceptible to errors and missing values. For example, the XPath for the print publication year and month is typically //PubDate/Year and //PubDate/Month, respectively. But in some records these individual year and month nodes are missing, and instead a single date string of the form 'YEAR Month-Month' is provided at the path //PubDate/MedlineDate (a sketch of one possible fallback is given below). Similarly, all records include a //PubMedPubDate[@PubStatus = 'entrez'] node from which the date an article was added to the Entrez database can be extracted, but only some records provide the date the publisher first released the e-publication (//ArticleDate[@DateType = 'Electronic']).
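To illustrate the second point, here is a minimal sketch (not part of the analysis) of how the MedlineDate fallback could be parsed, assuming the same xml2 / stringr / lubridate toolchain used in parse_record() above; 'article' is a hypothetical single PubmedArticle node.

# Sketch only: fall back to '//PubDate/MedlineDate' when the individual
# Year and Month nodes are missing ('article' is a single PubmedArticle node)
medline_date <- xml_text(
  xml_find_first(article, './/PubDate/MedlineDate'))

# MedlineDate strings take the form 'YEAR Month-Month' (e.g. '2015 Jan-Feb'),
# so take the year and the first month, and default the day to the 1st
year_fallback  <- str_extract(medline_date, '[0-9]{4}')
month_fallback <- str_extract(medline_date, '[A-Za-z]{3}')
date_fallback  <- ymd(paste(year_fallback, month_fallback, '01', sep = '-'))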
Median ‘time to print’
The figure below shows the median time in days between an article being recorded as an 'epub ahead of print' and then as a 'print copy' on the PubMed database. The data are shown as a heatmap, with the colour getting darker (more purple) as the time between epub and print increases. A quick scan of the plot reveals that the Clinical Journal of Pain (Clin J Pain) and the European Journal of Pain (Eur J Pain) take the longest, while the time taken by PAIN and the Journal of Pain (J Pain) is relatively short.
# Read in saved outputs from get_data chunk
df <- read_rds('./_data/2017-02-05-publication-time/df.rds')
df_print <- read_rds('./_data/2017-02-05-publication-time/df_print.rds')
############################################################
# #
# Plot heatmap #
# #
############################################################
# Summarise data for plotting
## Median 'time to print' by journal and year
df_heat <- df_print %>%
  group_by(journal, year_entrez) %>%
  rename(year = year_entrez) %>%
  summarise(median = round(median(interval))) %>%
  # Add tooltip
  mutate(tooltip = paste0('<b>', journal, '</b> <br>',
                          '<em>Time to print:</em> ', median, ' days')) %>%
  ungroup() %>%
  mutate(journal = fct_relevel(journal,
                               'Clin J Pain',
                               'Eur J Pain',
                               'PAIN',
                               'J Pain'))
# ggplot
gg_heat <- ggplot(data = df_heat) +
  aes(x = year,
      y = journal,
      fill = median,
      tooltip = tooltip,
      data_id = tooltip) +
  geom_tile_interactive() +
  scale_fill_viridis_c(direction = -1,
                       name = 'Days\n') +
  labs(caption = "(Interactive figure, 'hover' over plot elements for more detailed information)",
       x = '\nYear') +
  theme(panel.background = element_blank(),
        axis.ticks = element_blank(),
        axis.title.x = element_text(size = 14),
        axis.title.y = element_blank(),
        axis.text.y = element_text(size = 12),
        axis.text.x = element_text(size = 12),
        plot.caption = element_text(size = 8),
        panel.grid = element_blank())
gi_heat <- girafe(ggobj = gg_heat,
                  height_svg = 5,
                  width_svg = 6)
girafe_options(x = gi_heat,
opts_tooltip(css = 'font-family:arial;background-color:#eaeaea;
padding:10px;border-radius:10px 20px 10px 20px;',
opacity = 1,
offx = 10, offy = -10),
opts_hover(css = 'color:#FFFFFF;opacity:0.4;'))
Variability in the ‘time to print’
The box-and-whisker plot below gives some idea of the spread of the 'time to print' for each of the journals over the past few years. Clearly there are errors in the PubMed database. I cannot believe that the Clinical Journal of Pain took 529 days to transition one article from epub to print in 2014. Nor can I believe that only one day was needed for the Journal of Pain to transition six epubs to print in 2013. But pruning the data for 'outliers' didn't shift the median time to publication substantially (a sketch of that check follows), so I decided to present the data warts and all.
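For transparency, that kind of sensitivity check could be sketched roughly as follows: drop values beyond the boxplot whiskers within each journal/year group and recompute the medians. The column names match df_print as built above, but this snippet was not used to produce the figures.

# Sketch only: drop values beyond the boxplot whiskers and recompute medians
df_pruned <- df_print %>%
  group_by(journal, year_entrez) %>%
  filter(interval >= boxplot.stats(interval)$stats[1],
         interval <= boxplot.stats(interval)$stats[5]) %>%
  summarise(median_pruned = round(median(interval)))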
############################################################
# #
# Summary stats #
# #
############################################################
# Generate boxplot summary stats for 'time to print' (days) interval
summary_stats <- df_print %>%
  select(journal,
         year_entrez,
         interval,
         pmid) %>%
  rename(year = year_entrez) %>%
  group_by(journal, year) %>%
  summarise(median = round(median(interval)),
            Q25 = round(quantile(interval, 0.25)),
            Q75 = round(quantile(interval, 0.75)),
            lower_whisker = round(boxplot.stats(interval)$stats[1]),
            upper_whisker = round(boxplot.stats(interval)$stats[5]),
            min = min(interval),
            max = max(interval)) %>%
  mutate(tooltip = paste0('<b>Time to print (days): ', journal, '</b> <br>',
                          '<em>Median:</em> ', median, '<br>',
                          '<em>Minimum / Maximum:</em> ', min, ' / ', max, '<br>',
                          '<em>Inter-quartile range:</em> ', Q25, ' to ', Q75, '<br>',
                          '<em>Whisker range:</em> ', lower_whisker, ' to ', upper_whisker, '<br>')) %>%
  ungroup() %>%
  select(journal, year, tooltip)

df_print <- df_print %>%
  rename(year = year_entrez) %>%
  left_join(summary_stats)
############################################################
# #
# Plot #
# #
############################################################
gg_box <- df_print %>%
  mutate(journal = fct_relevel(journal,
                               'Clin J Pain',
                               'Eur J Pain',
                               'PAIN',
                               'J Pain')) %>%
  ungroup() %>%
  ggplot(.) +
  aes(x = factor(year),
      y = interval,
      fill = journal,
      colour = journal,
      tooltip = tooltip,
      data_id = tooltip) +
  geom_boxplot_interactive() +
  labs(caption = "(Interactive figure, 'hover' over plot elements for more detailed information)\n",
       y = 'Time to print (days)\n',
       x = '\nYear') +
  scale_y_continuous(limits = c(-5, 605),
                     breaks = c(0, 100, 200, 300, 400, 500, 600),
                     labels = c(0, 100, 200, 300, 400, 500, 600),
                     expand = c(0, 0)) +
  scale_colour_manual(values = c('#000000', '#E69F00', '#0072B2', '#009E73')) +
  scale_fill_manual(values = c('#4c4c4c', '#edbb4c', '#4c9cc9', '#4cbb9d')) +
  facet_wrap(~ journal, ncol = 4) +
  theme(legend.position = 'none',
        panel.background = element_blank(),
        axis.ticks = element_blank(),
        axis.title = element_text(size = 20),
        axis.text.y = element_text(size = 18),
        axis.text.x = element_text(size = 18,
                                   angle = 60,
                                   hjust = 1),
        strip.text = element_text(size = 18),
        plot.caption = element_text(size = 12),
        panel.grid.major = element_line(colour = '#999999',
                                        size = 0.1))
gi_box <- girafe(ggobj = gg_box,
                 height_svg = 7,
                 width_svg = 9)
girafe_options(x = gi_box,
opts_tooltip(css = 'font-family:arial;background-color:#eaeaea;
padding:10px;border-radius:10px 20px 10px 20px;',
opacity = 1,
offx = 10, offy = -10),
opts_hover(css = 'color:#FFFFFF;opacity:0.4;'))
You could be mischievous with these data and say that the two top-ranked journals (by impact factor) are the top-ranked journals partially because they are streets ahead of the other two journals in terms of getting articles from electronic to print format. But another possibility is that the ‘lesser’ two journals have more papers to print compared to the Journal of Pain and PAIN. That is, does the elitism inherent in the impact factor system afford the Journal of Pain and PAIN greater scope to reject submissions (something I am a little too familiar with for my liking), giving them fewer articles to process?
# Calculate the median number of articles per issue
issue_no <- df_print %>%
  group_by(journal, year, volume, issue) %>%
  # Number of articles by journal/year/volume/issue
  summarise(article_no = n()) %>%
  # Average number of articles per issue per journal
  group_by(journal) %>%
  summarise(median = round(median(article_no)))

# Add to table 1
tab_df2 <- tab_df %>%
  select(Journal, `Frequency (issues per year)`) %>%
  rename(`Issues per year (n)` = `Frequency (issues per year)`) %>%
  bind_cols(issue_no[2]) %>% # bind_cols(issue_no[2], ahead_no[2]) %>%
  mutate(`Articles per year (n; median)` =
           median * `Issues per year (n)`) %>%
  rename(`Articles per issue (n; median)` = median) %>%
  select(Journal,
         `Articles per year (n; median)`,
         `Issues per year (n)`,
         `Articles per issue (n; median)`) # `'ahead of print' articles (n; 31 Dec 2016)`)

# Print table
kable(x = tab_df2,
      align = 'lrrr',
      caption = '<b>Table 2.</b> Journal outputs')
Journal | Articles per year (n; median) | Issues per year (n) | Articles per issue (n; median) |
---|---|---|---|
Clinical Journal of Pain | 96 | 12 | 8 |
European Journal of Pain | 140 | 10 | 14 |
Journal of Pain | 108 | 12 | 9 |
PAIN | 228 | 12 | 19 |
The European Journal of Pain publishes 10 issues per year, while the other three journals publish 12 (Table 2). Yet despite having the lowest issue frequency of the four journals, the European Journal of Pain publishes the second greatest number of articles per issue (median = 14 articles), and hence it is competitive with regard to the total number of printed articles per year (median = 140 articles). The reduced number of issues per year does mean that if you miss being published in an issue of the European Journal of Pain, there is a longer wait until the next issue compared to the other three journals. However, this delay does not account for the magnitude of the difference in time to print between the European Journal of Pain and the Journal of Pain or PAIN.
The long time to print for the Clinical Journal of Pain isn't easy to explain either. Comparing the Clinical Journal of Pain with the Journal of Pain (Table 2), the two journals have a comparable issue frequency (12 per year), a comparable median number of articles per issue, and hence a comparable total number of articles published per year, but vastly different time to print data. Whatever the reason for the Clinical Journal of Pain's slow time to print, the journal needs to find a solution, and increasing the number of articles per issue is the obvious one.
Closing remarks
The unrefined nature of this analysis, combined with the fallibility of the PubMed database when it comes to dates, means that there may be some inaccuracies in the data presented here. Nevertheless, I think the data are strong enough to conclude that the two top-ranked journals (PAIN and the Journal of Pain) are quicker at converting articles from 'epub ahead of print' to 'print copy' than the two lower-ranked journals. The reasons for the differences are not obvious, and I do not believe it is a publisher issue¹. Whatever the reason, I don't think it is acceptable for journals such as the European Journal of Pain and the Clinical Journal of Pain to take, on average, half to three-quarters of a year to bring an article out in 'print'.
Comment
I have had excellent experiences with all four journals, and these data are in no way a reflection of the quality of the work done by their editorial and copy-editing staff. Indeed, in my experience the time to print for these four journals is not an indicator of the time it takes for an article to go from acceptance to being available online with a DOI, and is only an issue for those of us with weird funding mechanisms.
Session information
sessionInfo()
## R version 4.0.4 (2021-02-15)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Catalina 10.15.7
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] gdtools_0.2.3 ggiraph_0.7.8 knitr_1.31 lubridate_1.7.10
## [5] forcats_0.5.1 stringr_1.4.0 dplyr_1.0.5 purrr_0.3.4
## [9] readr_1.4.0 tidyr_1.1.3 tibble_3.1.0 ggplot2_3.3.3
## [13] tidyverse_1.3.0 xml2_1.3.2
##
## loaded via a namespace (and not attached):
## [1] tidyselect_1.1.0 xfun_0.22 bslib_0.2.4 haven_2.3.1
## [5] colorspace_2.0-0 vctrs_0.3.6 generics_0.1.0 viridisLite_0.3.0
## [9] htmltools_0.5.1.1 yaml_2.2.1 utf8_1.2.1 rlang_0.4.10
## [13] jquerylib_0.1.3 pillar_1.5.1 withr_2.4.1 glue_1.4.2
## [17] DBI_1.1.1 dbplyr_2.1.0 uuid_0.1-4 modelr_0.1.8
## [21] readxl_1.3.1 lifecycle_1.0.0 munsell_0.5.0 gtable_0.3.0
## [25] cellranger_1.1.0 rvest_1.0.0 htmlwidgets_1.5.3 evaluate_0.14
## [29] labeling_0.4.2 fansi_0.4.2 highr_0.8 broom_0.7.5
## [33] Rcpp_1.0.6 backports_1.2.1 scales_1.1.1 jsonlite_1.7.2
## [37] farver_2.1.0 systemfonts_1.0.1 fs_1.5.0 hms_1.0.0
## [41] digest_0.6.27 stringi_1.5.3 grid_4.0.4 cli_2.3.1
## [45] tools_4.0.4 magrittr_2.0.1 sass_0.3.1 crayon_1.4.1
## [49] pkgconfig_2.0.3 ellipsis_0.3.1 reprex_1.0.0 assertthat_0.2.1
## [53] rmarkdown_2.7 httr_1.4.2 rstudioapi_0.13 R6_2.5.0
## [57] compiler_4.0.4
¹ Clinical Journal of Pain (slowest 'time to print') and PAIN (second fastest 'time to print') are both published by Lippincott Williams & Wilkins.