Five Ways to Sharpen Your Data Science Skills as a Radiologist

You’re a full-time radiologist working a day job. A few years ago, you heard about big data and AI and decided to learn all that you could. A good thing you did, too: Radiology AI is real and here today, and your practice looks to you as the local expert.

The challenge, then, is keeping up.

Just as an AI model's performance can degrade over time, your data skills need upkeep. If your goals are similar to mine, these five suggestions might help. They are not ways to become the world's leading expert in AI, but suggestions for brushing up and keeping up.

1. Engage in a Data Project

While images are the most apparent source of rich data, you regularly run across a lot of text data as a radiologist. Finding a project in data analytics can take your work to the next level. Maybe the project is predicting future volume based on this year's data to justify hiring new radiologists. Perhaps it’s predicting demand for radiologists and technologists by day and hour using a combination of exam volume and turnaround time by modality. Maybe it's building a computer vision model for a disease entity that you spent your career studying.

Working on a project that will impact your daily practice is probably the single best way to keep your skills current because there is a real incentive to do things well. There is the pressure of eventually displaying your work. What’s more, it’s an opportunity to improve the way your co-workers do their work. As long as you are willing to accept the challenge, both intrinsic and extrinsic rewards can be well worth the effort.

2. Make Your Data Better

If taking on a truly tangible project sounds too involved for your professional life right now, that's okay! There are plenty of other ways to keep your data skills current.

The best data science projects start with high-quality data. Sometimes this means better data: A structured format for diagnostic findings, standardized recommendations, and fine-tuned, practice-level reporting templates are all meaningful engagements. These can be highly worthwhile projects to refine the input to machine learning models.

Sometimes high-quality data means better use of data. What is the volume trend in your practice? What is the average turnaround time? Make a request to your data center for a spreadsheet of last month's radiology reports by modality, anatomy, and timestamps. In particular, timestamps are extremely helpful for calculating turnaround time and identifying outliers.
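As a rough illustration of what timestamps make possible, here is a minimal Python sketch that computes turnaround time per modality and flags unusually slow exams. The column names (`accession`, `modality`, `exam_completed`, `report_finalized`), the timestamp format, the sample rows, and the "three times the median" outlier rule are all assumptions made for this example; your own export and your practice's definition of turnaround time will likely differ.

```python
import csv
import io
import statistics
from collections import defaultdict
from datetime import datetime

# Made-up sample export; real column names and timestamp formats will differ.
SAMPLE_CSV = """accession,modality,exam_completed,report_finalized
A001,CT,2022-05-01 08:00,2022-05-01 08:45
A002,CT,2022-05-01 09:10,2022-05-01 09:40
A003,CT,2022-05-01 10:00,2022-05-01 10:50
A004,CT,2022-05-01 11:00,2022-05-01 19:00
A005,MR,2022-05-01 08:30,2022-05-01 10:30
A006,MR,2022-05-01 09:00,2022-05-01 11:15
"""

TIME_FORMAT = "%Y-%m-%d %H:%M"  # assumed format of the export


def turnaround_minutes(row):
    """Minutes from exam completion to finalized report (one possible definition)."""
    start = datetime.strptime(row["exam_completed"], TIME_FORMAT)
    end = datetime.strptime(row["report_finalized"], TIME_FORMAT)
    return (end - start).total_seconds() / 60


def summarize(csv_file, outlier_factor=3.0):
    """Return {modality: (median_minutes, [accessions flagged as outliers])}."""
    by_modality = defaultdict(list)
    for row in csv.DictReader(csv_file):
        by_modality[row["modality"]].append((row["accession"], turnaround_minutes(row)))

    summary = {}
    for modality, exams in by_modality.items():
        median = statistics.median(minutes for _, minutes in exams)
        # Flag exams slower than outlier_factor times the modality median.
        outliers = [acc for acc, minutes in exams if minutes > outlier_factor * median]
        summary[modality] = (median, outliers)
    return summary


if __name__ == "__main__":
    for modality, (median, outliers) in summarize(io.StringIO(SAMPLE_CSV)).items():
        print(f"{modality}: median {median:.1f} min, outliers: {outliers}")
```

The median is used instead of the mean so that a single delayed report does not distort the typical turnaround time for a modality.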

Take a moment to learn how data is populated. How is your practice's turnaround time calculated? Do your radiologists and ED physicians agree on the definition? Are there manual components in the time calculations — for instance, does the scanner automatically fill in exam start time, or is there another button a technologist has to click? If a data field is manually populated, what are your options to improve the quality of that data?

It might be easy and tempting to do the analysis straight on a spreadsheet, but instead, try using an analytic platform. Is R your cup of tea? Do you prefer Python? The benefit of approaching analytics this way is that you can scale your analysis quickly. With the right platform, the difference between analyzing a 300-row spreadsheet and a 30-million-row spreadsheet is just hardware.
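To make the scaling point concrete, the sketch below counts exam volume by weekday and hour while reading the file one row at a time, so the same function works whether the export has 300 rows or many millions. The column name `exam_completed`, the timestamp format, and the sample rows are hypothetical and stand in for whatever your data center provides.

```python
import csv
import io
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# Made-up sample export; a real file would be opened from disk instead.
SAMPLE_CSV = """accession,modality,exam_completed
A001,CT,2022-05-02 08:05
A002,CT,2022-05-02 08:40
A003,MR,2022-05-02 08:55
A004,CT,2022-05-02 14:10
A005,XR,2022-05-03 08:20
"""


def volume_by_weekday_hour(csv_file):
    """Count exams per (weekday, hour), holding only the counts in memory."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        completed = datetime.strptime(row["exam_completed"], "%Y-%m-%d %H:%M")
        counts[(WEEKDAYS[completed.weekday()], completed.hour)] += 1
    return counts


if __name__ == "__main__":
    counts = volume_by_weekday_hour(io.StringIO(SAMPLE_CSV))
    for (weekday, hour), n in sorted(counts.items(), key=lambda item: -item[1]):
        print(f"{weekday} {hour:02d}:00  {n} exams")
```

With a real export, you would pass an open file handle (for example, `open("reports.csv", newline="")`, where the file name is a placeholder) instead of the in-memory sample.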

3. Try Machine Learning Competitions

Many machine learning competitions allow you to solve discrete problems in a "practice mode" (or in actual competition mode with penalties for wrong answers). For example, national societies such as RSNA and the Society for Imaging Informatics in Medicine (SIIM) routinely put out radiology-relevant competitions on a timely topic. ML competitions provide optimized data and encapsulate the problem. While real-life data science is messy and often involves mixed-quality data, competitions abstract out the logistics and focus on model-building. If you got into data science because you enjoyed the rush of creating something out of your own hours of effort, you might enjoy these. Cash or computing resources are common prizes for top performers.

Kaggle is a website that allows users to publish anonymized data sets, build machine learning models, and host or participate in data science competitions both in and outside of healthcare. RSNA and SIIM have hosted many of their recent machine learning challenges (and the winning solutions) on Kaggle. Outside of the radiology competitions, data science problems on Kaggle range from straightforward to very difficult, and there is something for everyone from complete novices to experts. It's never just busywork.

4. Learn a New API or New Language

Like any skill, every element of computer science builds upon itself. While current literature covering radiology data science emphasizes coding in Python, a radiologist with the right data and no access to full-time data scientists can use a low-code or no-code environment like PyCaret to turn ideas into a working prototype.

For those with coding experience, even within one programming language, there are many packages to consider. Python libraries in machine learning alone pose a daunting challenge: PyTorch, Keras, Caffe/Caffe2, and MXNet are just some examples of the many choices you have for computer vision. For natural language processing, popular starting points include NLTK, Gensim, spaCy, and others.

Finally, the proper integration of data models into the broader healthcare technology stack and workflow is critical in the real world. Pragmatic considerations often require knowledge beyond Python or data science. Java (deeplearning4j), C++ (OpenCV, CUDA), and C# (also OpenCV, through a .NET wrapper) are worth considering for data science projects ripe for clinical translation.

One great way to keep yourself current as a data scientist is to keep learning new things because learning new things requires you to review what you already know.

5. Attend the ACR Data Science and ACR Imaging Informatics Summits

The ACR Data Science Summit occurs annually in June. It has frequently been a pre-conference event before the SIIM annual meeting at the same venue. Attending the DSI Summit and SIIM is an excellent way to keep up with the most topical considerations in radiology AI. This year's DSI Summit on June 8 will focus on the data considerations for bringing AI to practice, such as model evaluation, deployment, and ongoing monitoring. SIIM also offers learning labs and workshops appropriate for informaticists at all levels. The 2022 virtual Hackathon at SIIM provides hands-on programming opportunities with the DICOMweb and FHIR APIs.

The ACR Informatics Summit takes place in Washington, DC, on Oct. 22-23, 2022, and will cover a broader swath of topics related to informatics, including data science. The Informatics Summit is perfect for radiologists whose work reaches beyond data science research and involves integration into the broader radiology workflow.


As a radiologist, I am not (and probably never will be) as good as a full-time data scientist, so my goal is to keep abreast of the newest technologies and periodically create something that helps me solve my everyday problems at work.

How do you keep up with your data skills?

Po-Hao “Howard” Chen, MD, MBA, Chief Imaging Informatics Officer, IT Medical Director for Enterprise Radiology, and Staff Radiologist in Musculoskeletal Imaging at Cleveland Clinic
