Connected Papers: A Game-Changing Tool for Researchers


For anyone who is interested in a scholarly topic—perhaps COVID19, systemic racism, or climate change—finding relevant research articles can be an enormously daunting task. For example, a PubMed search for "COVID19" on July 18, 2020 produced 31,577 results. Only 13 of these papers were published before 2020. Limiting the results to papers with free full text reduced the number to 22,910 papers. Further limiting the results to review articles yielded 2,345 results, which hardly makes for easy reading.

Therefore, most people with a passing interest in the topic will be quickly deterred, resigned to reading snippets about recent research in the popular press. This is a great loss because some of the people with a passing interest—perhaps engineers, computer scientists, interior designers, social workers, and educators—have skills which could be applied to reduce the problem. Instead, only those who start with a very strong commitment to the topic will be able to gain a deep understanding of the research. This is problematic because these people tend to come from similar educational backgrounds, take similar approaches to solving problems, and have similar blind spots. This is not the best way to solve massive problems like a global pandemic, systemic racism, or climate change.

Those with a very strong commitment to a difficult topic, such as enthusiastic students and dedicated researchers, may spend hours upon hours sifting through the literature to identify relevant papers. This takes time away from understanding the research and contributing to it, and adds up to hundreds or even thousands of lost research hours.

Connected Papers: A game-changing tool

Three Israeli researchers—with interests in mathematics, machine learning, and electrical engineering—were familiar with this problem and wanted to find a solution. They developed an online search tool called Connected Papers, which was made freely available to the public in June 2020.

When the user enters a topic of interest, Connected Papers will sort through upwards of 50,000 related papers and select a few dozen of the most cited papers for the user to peruse. The results are presented visually in a force-directed graph. Each research paper is represented by a circle, with similar papers clustered together in space and connected by strong lines. Less similar papers are presented farther away in space, often clustered in their own groups. More frequently cited papers are represented as larger circles, and more recent papers are represented by a darker color.
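To make the idea of a force-directed graph concrete, here is a minimal sketch in Python. It is not Connected Papers' actual code: the function name `force_layout`, the force formulas (loosely modeled on the classic Fruchterman-Reingold approach), and the toy papers A to D are all assumptions chosen for illustration. Every pair of circles pushes apart, while linked circles pull together in proportion to their similarity.

```python
import math
import random


def force_layout(nodes, edges, iterations=200, seed=0):
    """Toy force-directed layout: every pair of nodes repels, linked nodes attract.

    edges is a list of (node_a, node_b, weight); a higher weight means more similar.
    """
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
    k = 1.0 / math.sqrt(len(nodes))  # preferred spacing between nodes

    for step in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}

        # Repulsion: push every pair of nodes apart.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                push = k * k / dist
                disp[a][0] += dx / dist * push
                disp[a][1] += dy / dist * push
                disp[b][0] -= dx / dist * push
                disp[b][1] -= dy / dist * push

        # Attraction: pull linked nodes together, harder for higher weights.
        for a, b, weight in edges:
            dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
            dist = math.hypot(dx, dy) or 1e-9
            pull = weight * dist * dist / k
            disp[a][0] -= dx / dist * pull
            disp[a][1] -= dy / dist * pull
            disp[b][0] += dx / dist * pull
            disp[b][1] += dy / dist * pull

        # Move each node a limited distance; the limit shrinks so the layout settles.
        max_move = 0.1 * (1 - step / iterations)
        for n in nodes:
            length = math.hypot(disp[n][0], disp[n][1]) or 1e-9
            move = min(length, max_move)
            pos[n][0] += disp[n][0] / length * move
            pos[n][1] += disp[n][1] / length * move
    return pos


# Hypothetical example: two pairs of similar papers joined by one weak link.
papers = ["A", "B", "C", "D"]
links = [("A", "B", 1.0), ("C", "D", 1.0), ("B", "C", 0.1)]
layout = force_layout(papers, links)


def distance(p, q):
    """Straight-line distance between two papers in the finished layout."""
    return math.hypot(layout[p][0] - layout[q][0], layout[p][1] - layout[q][1])


print(distance("A", "B"), distance("A", "D"))
```

In this toy example, the strongly linked papers A and B should land closer together than the unlinked papers A and D, which is the clustering effect described above. Circle size and color (citation count and publication year) would be drawn separately and are left out of the sketch.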

Trial run 1

The Connected Papers "COVID19" search results drop-down menu.

I tested this new research tool by entering "COVID19" into the search box. Within a few seconds, the drop-down menu was populated with five paper titles, and the option to "See all paper suggestions for COVID19." I chose this last option and was presented with a list of 10 papers, a far cry from the 31,577 on the same topic from PubMed. The top result on Connected Papers had been cited 566 times, so I clicked the button next to it to "Build a graph". The resulting graph contained 41 papers, dating back to 2005.

The graph included top papers on COVID19, swine flu (H1N1), the 1918 flu, SARS, and smallpox, and the impact of mass gatherings, holiday travel, school closures, and social mixing.

For anyone with an interest in how to control COVID19, this list was gold and had taken less than 2 minutes to generate. The paper titles were listed in the left side panel for easy scrolling. The upper left corner contained buttons to isolate prior works and derivative works.

When I clicked on a paper title or circle, a preview appeared in the right side panel with the abstract, authors, journal, the number of times the paper had been cited, and the number of references found within the paper. There was also the option to build a graph using that paper, allowing me to refine or expand my search with ease. I could also click to view the paper details in Semantic Scholar, which provides additional information including tables, figures, and full text when available.

The data source

Connected Papers relies on the data in Semantic Scholar to build its graphs. Semantic Scholar was launched in 2015 by the Allen Institute for AI (artificial intelligence) with a collection of 3 million computer science papers. The database now includes more than 180 million papers from all fields of science, with an emphasis on computer science, molecular biology, microbiology, and neuroscience. Semantic Scholar has partnerships with over 500 publishers, university presses, and scholarly societies to provide timely access to scientific research in a manner that is freely available to the public.

In both Semantic Scholar and Connected Papers, the details provided for a paper prominently feature the number of times the paper has been cited and the number of references found within the paper. This is the information that Connected Papers uses to determine if papers cover similar topics. If two papers have highly overlapping citations and reference lists, they can be strongly connected and positioned close to each other on a graph even if they do not directly cite each other.
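The overlap idea can be sketched in a few lines of Python. This is an illustration only: the article does not give Connected Papers' actual formula, so the Jaccard overlap, the equal weighting of the two scores, the field names `references` and `cited_by`, and the toy papers are all assumptions.

```python
def jaccard(a, b):
    """Share of items two collections have in common (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity(paper_a, paper_b):
    """Average the overlap of reference lists and of citing-paper lists.

    The equal weighting of the two overlaps is an illustrative assumption.
    """
    shared_refs = jaccard(paper_a["references"], paper_b["references"])
    shared_citers = jaccard(paper_a["cited_by"], paper_b["cited_by"])
    return (shared_refs + shared_citers) / 2


# Hypothetical toy records: none of these papers cites another directly.
paper_a = {"references": {"r1", "r2", "r3", "r4"}, "cited_by": {"c1", "c2", "c3"}}
paper_b = {"references": {"r2", "r3", "r4", "r5"}, "cited_by": {"c2", "c3", "c4"}}
paper_c = {"references": {"r8", "r9"}, "cited_by": {"c9"}}

print(similarity(paper_a, paper_b))  # high: most references and citing papers are shared
print(similarity(paper_a, paper_c))  # zero: nothing in common
```

Papers A and B score highly even though neither cites the other, because they draw on the same references and are cited by the same later papers. Paper C shares nothing with A and would sit far away on a graph.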

Trial run 2

The Connected Papers search results for "systemic racism."

To test Connected Papers in a different field of study, I searched for "systemic racism" and was presented with a list of 10 publications. The first match was a chapter from a scholarly book published in 2019 called "Educating for Critical Consciousness". The chapter had 0 citations and 0 references listed on Semantic Scholar, and neither an abstract nor free full text was available. When I clicked "Build a graph", I received the message "Sorry! We couldn't find enough papers to create a graph."

The third paper in the list for "systemic racism" looked more promising, with 3 citations to its credit and 124 references in the text, so I built a graph from it instead. The majority of publications in the graph had been cited 0 to 2 times. Topics included racism in health care, causes of obesity, and the association between air pollution and insulin resistance.

The resulting graph included 41 publications, with the earliest one from 2004. The largest circle had been cited 160 times and was from 2012.

The focus on health issues (rather than education, housing, employment, law enforcement, etc.) reflects Semantic Scholar's emphasis on the natural sciences over the social sciences. Furthermore, the number of publications with 0 identified references (a near impossibility in modern scholarly research) suggests that Semantic Scholar has been less successful in gaining access to the scholarly books that are commonly used to present research results in the social sciences.

A quick search for "climate change" and some of my favorite genes yielded more satisfying results.


Every researcher, whether a student or a seasoned professional, knows the struggle of finding papers that are relevant to their research. This is especially challenging for researchers who are exploring new fields. Yet a deep understanding of the literature is essential for becoming familiar with a field, and identifying a niche that requires additional research to advance the field.

In the olden days, the process of finding relevant papers involved physically going to the library to find old journal articles that were cited in an article of interest, and perusing recent journal issues for new papers. The internet and related databases made this process much easier, allowing researchers to browse through titles and abstracts with ease. Another step forward was the introduction of high-quality open access journals, which are freely available to anyone with an internet connection. This has forced many traditional journal publishers to make articles freely available online after a short period of time (e.g., a year).

To complicate the situation, the number of scientific papers published each year continues to grow. In 2018, there were 3 million papers published across 42,500 journals. Databases and search engines have improved how they store, organize, and analyze this data. However, finding relevant papers still often requires researchers to read hundreds of titles and dozens of abstracts to find a few relevant articles. Other people in the same field will repeat the process, often struggling with the same poorly worded explanations or looking for the same detail buried in the text. The work is tedious, and relies almost entirely on searching for keywords and reading text.

If a researcher finds a relevant paper from an unexpected source, they will often share it with colleagues working in the same field. These unexpected gems can be an especially exciting aspect of research. By greatly simplifying the process of finding these gems, Connected Papers gives researchers the freedom to explore new fields and play with new research ideas. It also allows novices, including young students, to learn and become passionate about exciting research. This will only make it easier for talented, dedicated people to unleash their full potential to tackle an abundance of important questions. The game has changed.

