What is this?

This is a literature mining tool that facilitates scientific discovery of etiologies and treatments.

Basic usage

The goal of Clinical Queries (CQ) is to extract information about medical conditions directly from the scientific literature. The application currently supports two types of queries:
  1. Causes: given a symptom (e.g. “Chest Pain”) the app will retrieve a list of underlying causes (etiologies).
  2. Treatments: given a disease (e.g. “Malaria”) the app will retrieve a list of treatments (both drug and non-drug treatments).

In response to a query, CQ displays a list of results (causes or treatments), each backed by one or more scientific papers. CQ links to each paper and highlights the sentences that triggered the match. Results are ranked by the number of supporting papers, which is displayed in parentheses next to each result.
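
As an illustration, the ranking amounts to counting distinct supporting papers per result. The following minimal sketch (with made-up result names and paper IDs; none of this is CQ's actual code) shows the idea:

```python
from collections import defaultdict

# Hypothetical extracted assertions: (result_name, paper_id) pairs.
assertions = [
    ("chloroquine", "PMID:111"),
    ("chloroquine", "PMID:222"),
    ("artemisinin", "PMID:333"),
]

def rank_results(assertions):
    """Group assertions by result and rank by number of distinct supporting papers."""
    papers = defaultdict(set)
    for result, paper_id in assertions:
        papers[result].add(paper_id)  # a paper counts once per result
    return sorted(papers.items(), key=lambda kv: len(kv[1]), reverse=True)

for result, pids in rank_results(assertions):
    print(f"{result} ({len(pids)})")  # e.g. "chloroquine (2)"
```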

How does this work?

At a high level, the system scans the literature (see "Data sources" below) for sentences that contain assertions about etiologies or treatments for conditions or diseases, and presents these assertions while linking to the original paper. The system also extracts the etiology and treatment names and groups the results by them.

An attempt is made to group the various names that refer to the same condition, disease, or treatment into a single item. This is hard to achieve in practice, and the system may err in either direction, grouping too aggressively or not aggressively enough.

Note that the system does not distinguish between papers that study a specific condition or phenomenon and papers that merely mention it in passing. We argue that both are useful, but practitioners interested in a given result should, of course, consult the relevant literature and judge the evidence for themselves.

To find assertions about etiologies and treatments, the system consults around 30-40 templates, forms such as ____ due to ____ or ____ in patients with ____, and attempts to match them against the literature. The matching is done at the syntactic level, after a linguistic analysis of the text, so the second template, for example, might also match sentences like ____ and other conditions in large groups of patients in a UK hospital with ____. Some patterns also include constraints on the kind of item that can fill a slot, for example requiring the term to be a disease or a chemical.
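
To make the template idea concrete, here is a minimal sketch that approximates two of the templates with surface-level regular expressions and a toy slot constraint. The actual system matches at the syntactic level over parsed sentences; the template set, slot names, and the is_disease check below are illustrative assumptions, not CQ's implementation:

```python
import re

# Surface-level approximations of two slot templates; the real matching is
# syntactic, over parsed sentences, which plain regexes cannot capture.
TEMPLATES = [
    (re.compile(r"(?P<effect>[\w -]+?) due to (?P<cause>[\w -]+)", re.I), "cause"),
    (re.compile(r"(?P<treatment>[\w -]+?) in patients with (?P<disease>[\w -]+)", re.I), "treatment"),
]

def is_disease(term):
    """Placeholder slot constraint; a real system would consult a biomedical ontology."""
    return term.strip().lower() in {"malaria", "chest pain"}

def extract(sentence):
    matches = []
    for pattern, relation in TEMPLATES:
        m = pattern.search(sentence)
        if m is None:
            continue
        slots = m.groupdict()
        # Enforce a type constraint on the disease slot, where present.
        if "disease" in slots and not is_disease(slots["disease"]):
            continue
        matches.append((relation, slots))
    return matches

print(extract("Chest pain due to anxiety"))
print(extract("Chloroquine in patients with malaria"))
```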

Why do I see near duplicates?

The list of results is extracted from the literature (see "How does this work?" above). We try to group similar items into a single entry, by applying various similarity rules and/or by attempting to link each extracted item to a biomedical ontology, but (a) this automatic linking is not always accurate; and (b) sometimes the ontology itself treats two very similar conditions as separate concepts. For this reason, the items in the left pane may appear to contain near duplicates: we could not determine with certainty that the two items refer to the same concept, and preferred to err on the side of not linking them.
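
For illustration, a similarity-rule grouping might look like the following sketch. The SequenceMatcher-based rule and the 0.9 threshold are assumptions chosen to show the trade-off, not CQ's actual linking logic (which also consults a biomedical ontology); when similarity falls below the threshold, the sketch keeps items separate, mirroring our choice to err on the side of not linking:

```python
from difflib import SequenceMatcher

def similar(a, b, threshold=0.9):
    """String-similarity rule; the 0.9 threshold is an illustrative assumption."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def group_names(names):
    """Greedy grouping: add a name to the first group all of whose members it resembles."""
    groups = []
    for name in names:
        for group in groups:
            if all(similar(name, member) for member in group):
                group.append(name)
                break
        else:
            groups.append([name])  # when unsure, err on the side of not merging
    return groups

print(group_names(["myocardial infarction", "Myocardial Infarctions", "angina"]))
# [['myocardial infarction', 'Myocardial Infarctions'], ['angina']]
```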

In future versions of this tool, we are considering a user interface that will allow you to filter or merge such entries.

Data sources

We currently consult the text of all abstracts available on PubMed. The literature we consult is updated regularly, but it may lag 2-3 months behind the latest PubMed release.

What are "broad" and "narrow"

The application also supports two extraction modes:

  1. Narrow: the default extraction mode; it shows fewer results than “Broad”, but the results are generally more accurate and contain fewer near-duplicates (differently named variants of what is essentially the same cause or treatment).

  2. Broad: an extraction mode aimed at medical researchers interested in the widest possible coverage. “Broad” returns more results than “Narrow”, but these results may be less accurate and contain more near-duplicates.

These modes correspond to different sets of patterns, as described in "How does this work?" above (the "broad" set is naturally more inclusive).
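
As a minimal sketch of how the two modes might select their pattern sets (the template strings and the superset relation shown here are illustrative assumptions, not the actual CQ pattern sets):

```python
# Hypothetical mode-to-template-set mapping; "broad" is sketched as a
# superset of "narrow" to reflect that it is the more inclusive mode.
NARROW_TEMPLATES = ["____ due to ____", "____ in patients with ____"]
BROAD_TEMPLATES = NARROW_TEMPLATES + ["____ associated with ____"]

def templates_for(mode: str) -> list:
    return BROAD_TEMPLATES if mode == "broad" else NARROW_TEMPLATES

print(len(templates_for("narrow")), len(templates_for("broad")))  # 2 3
```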

Why do I see non-treatments?

As discussed in "How does this work?", the system scans the literature for assertions. Some of these may come from a larger context that negates the assertion (in future versions, we aim to automatically identify and remove many of these). Others may come from papers that were refuted in later studies, or from papers whose findings apply only to animal models. We stress that it is up to the reader to consult the relevant literature (at least the evidence sentences and their contexts) and assess the findings for themselves.
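
As a rough illustration of the planned negation filtering, a naive cue-based check might look like the sketch below. The cue list and the string-matching approach are assumptions for illustration only, not the method we will use:

```python
# Naive cue-based negation check; real negation detection would need to
# consider syntactic scope, not just the presence of a cue phrase.
NEGATION_CUES = ("no evidence that", "did not", "failed to", "was not")

def likely_negated(sentence: str) -> bool:
    s = sentence.lower()
    return any(cue in s for cue in NEGATION_CUES)

print(likely_negated("Drug X did not improve outcomes in patients with Y."))  # True
print(likely_negated("Drug X improved outcomes in patients with Y."))         # False
```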

Who built this?

The Clinical Queries application is a collaboration between the Allen Institute for AI, Rambam Health Care Campus, Bar Ilan University and the Technion Biomedical Engineering Lab.

[Logos: Allen Institute for AI, Rambam Health Care Campus, Bar Ilan University, Technion BME Lab, ERC]