September 10, 2019

CheXpert PR

Background

In the United States, about half of all radiographic exams are X-rays, mostly of the chest. In developing countries around the world, chest X-rays are even more widely used, for example to detect lung cancer early, stop the spread of tuberculosis, and support the responsible use of antibiotics for pneumonia. In many ways, the chest X-ray is the “bread and butter” of modern medical imaging.

Because of the critical role of chest X-rays, in January 2019 the Machine Learning (ML) group at Stanford University, led by Dr. Andrew Ng, released CheXpert, a large-scale chest X-ray dataset for research. It contains 224,316 images from 65,240 patients, from studies performed at Stanford Hospital between October 2002 and July 2017, along with their associated radiology reports. As with the earlier ChestX-ray14 dataset released by NIH¹, the Stanford researchers used a natural language processing (NLP) tool to extract labels for 14 common thoracic conditions from the free-text radiology reports. Compared with ChestX-ray14, however, CheXpert is significantly larger, has more variability in its data source, and is labelled by a more accurate NLP system²,³.

To benchmark how well Artificial Intelligence (AI) interprets chest X-rays compared with radiologists, the Stanford researchers also provide a hold-out test set of 500 studies annotated by eight board-certified radiologists. The majority vote of five radiologists' annotations serves as the ground truth; the annotations of the remaining three are used to benchmark radiologist performance. Five common thoracic conditions are evaluated: (a) Atelectasis, (b) Cardiomegaly, (c) Consolidation, (d) Edema, and (e) Pleural Effusion. Performance is measured by the Area Under the ROC Curve (AUC) for binary classification, averaged over the five conditions. For AI models, the number of radiologist operating points that lie below the model's ROC curve is also reported, to determine whether the model is superior to the radiologists. The Stanford ML group published a baseline model that achieves an average AUC of 0.907 across the five conditions and outperforms 1.8 of the three radiologists on average.
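The evaluation described above can be illustrated with a minimal sketch in plain Python. It is not the organizers' evaluation code: the function names, the tie handling, and the linear interpolation of the ROC curve are assumptions made for illustration only. It shows a majority vote over radiologist labels, a pairwise-ranking AUC for one condition, and a check of whether a radiologist's operating point (false positive rate, true positive rate) lies below the model's ROC curve.

```python
from typing import List, Sequence, Tuple


def majority_vote(annotations: Sequence[Sequence[int]]) -> List[int]:
    """Combine binary labels (one row per radiologist) into one label per study."""
    n_raters = len(annotations)
    return [int(2 * sum(column) > n_raters) for column in zip(*annotations)]


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs obtained by thresholding at every distinct score, from strict to lenient."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def below_curve(point: Tuple[float, float], curve: List[Tuple[float, float]]) -> bool:
    """True if a reader's (FPR, TPR) operating point lies strictly below the ROC curve."""
    fpr, tpr = point
    # Highest TPR the curve reaches at this FPR (vertical steps use their upper end).
    curve_tpr = max(
        y1 if x1 == x0 else y0 + (y1 - y0) * (fpr - x0) / (x1 - x0)
        for (x0, y0), (x1, y1) in zip(curve, curve[1:])
        if x0 <= fpr <= x1
    )
    return tpr < curve_tpr


def mean_auc(per_condition_auc: Sequence[float]) -> float:
    """Average AUC over the evaluated conditions."""
    return sum(per_condition_auc) / len(per_condition_auc)
```

In this sketch, a model's headline score would be `mean_auc` over the five per-condition AUC values, and the number of radiologists it outperforms on a condition would be the count of radiologist operating points for which `below_curve` returns true.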

JFhealthcare topped the CheXpert leaderboard

The AI team at JFhealthcare recently achieved an average AUC of 0.926, which currently ranks No. 1 on the leaderboard. More importantly, it is the first team whose model outperforms all three radiologists on the test set, showing that an AI model can achieve super-human performance in interpreting chest X-rays on the CheXpert dataset, even when compared with human experts.

The competition organizers at Stanford also highly praised this record.

The JFhealthcare AI team used multiple strategies to improve model performance on chest X-ray interpretation, including data denoising and data augmentation designed specifically for chest X-ray images, a deep convolutional neural network based on DenseNet and pyramid attention, and dynamic balancing of the loss function for positive and negative samples during training. The team also evaluated the contribution of each of these techniques through comprehensive ablation studies. A more detailed technical report and an open-source code release are in progress.
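The release does not describe how the team balances the loss, so the following is only a sketch of one common approach, not JFhealthcare's method: a binary cross-entropy in which the positive and negative weights are recomputed from the class counts of each batch, so that both classes contribute equally to the loss. The function name `balanced_bce` and the weighting formula are assumptions for illustration.

```python
import math
from typing import Sequence


def balanced_bce(labels: Sequence[int], probs: Sequence[float], eps: float = 1e-7) -> float:
    """Binary cross-entropy with per-batch class weights (hypothetical helper).

    Weights are inversely proportional to each class's frequency in the batch,
    so a rare positive class is not drowned out by abundant negatives.
    """
    n = len(labels)
    n_pos = sum(labels)
    n_neg = n - n_pos
    # Assumed weighting: each class receives half of the total weight in the batch.
    w_pos = n / (2 * n_pos) if n_pos else 0.0
    w_neg = n / (2 * n_neg) if n_neg else 0.0
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total -= w_pos * y * math.log(p) + w_neg * (1 - y) * math.log(1 - p)
    return total / n
```

With one positive and three negatives in a batch, the single positive sample carries a weight of 2 and each negative a weight of 2/3, so the two classes carry equal total weight.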

About JFhealthcare

JFhealthcare has long worked to provide affordable healthcare to township hospitals in rural China through advanced technologies, including cloud computing and AI. It was founded by healthcare veteran Frank Wu, who led the healthcare division of Siemens in Northeast Asia for more than 20 years. The leadership also includes a senior RSNA member in China, a principal AI scientist from the Baidu Silicon Valley AI Lab, and a research associate from the US CDC.

JFhealthcare focuses strongly on deploying AI solutions for primary care in rural China in real clinical environments. Its main product, tuberculosis (TB) screening based on chest X-rays, has been deployed in dozens of township hospitals across multiple provinces in China, serving more than twenty thousand people within two months. The mobile screening truck recently launched by JFhealthcare is expected to operate in more than ten provinces in China by the end of 2019, serving half a million people.

JFhealthcare also places strong emphasis on teamwork between engineers and radiologists. The two teams work closely to develop the cloud-based platform for image annotation and to design an annotation protocol that is both clinically sound and computationally efficient for AI model development. With these tools and strategies in place, the radiologist team at JFhealthcare not only supplies a large volume of chest X-ray annotations (about 5,000 image annotations a week) in various formats for training the AI algorithms, but also reviews the prediction results from the AI models on a regular basis. The TB screening model has so far reached about 90% agreement with the radiologist team in daily use.

In the future, JFhealthcare plans to develop a chest X-ray screening product that jointly covers the major thoracic diseases: tuberculosis, lung cancer, and pneumonia. It will be based on more than three hundred thousand chest X-ray images with curated radiology reports, collected by JFhealthcare from about one thousand township hospitals over the past few years. The team hopes to fully leverage the clinical value of chest X-rays through AI and deliver it to the many township hospitals in China where radiologists are most needed.

References

1. https://www.nih.gov/news-events/news-releases/nih-clinical-center-provides-one-largest-publicly-available-chest-x-ray-datasets-scientific-community

2. https://arxiv.org/abs/1901.07031

3. https://lukeoakdenrayner.wordpress.com/2019/02/25/half-a-million-x-rays-first-impressions-of-the-stanford-and-mit-chest-x-ray-datasets/