The accuracy of AI models used in healthcare to forecast disease is not as high as some studies might imply.
We use artificial intelligence (AI)-based tools daily, most commonly voice assistants such as Alexa and Siri. These consumer products work pretty well (Siri understands most of what we say), but they're far from perfect, though they improve over time. We accept their limitations and adjust how we use them until they get the right answer, or we give up. The consequences are usually minor when Siri or Alexa misinterprets a user's request.
Because an AI model depends entirely on its dataset, the larger and more accurate
the data, the more accurate the model. But errors in the AI models
that support doctors' clinical decisions can mean the difference between life and
death. Therefore, it is important to understand how well these models work before
deploying them. Currently, published reports on the technology are overly optimistic
about its accuracy, sometimes leading to sensationalist stories in the press. There
is much debate in the media about algorithms that can diagnose early-stage Alzheimer's
disease with up to 74% accuracy, or that are more accurate than clinicians. Scientific
studies detailing such advances could form the basis for new ventures, new investment
and research directions, and large-scale implementation in hospital systems. In
most cases, this technology is not yet ready for clinical use, for the reasons that follow.
When more data are fed into an AI model, one would expect its accuracy to improve, or at least not get worse. Yet researchers have observed the opposite: the accuracy reported for published models decreases as the size of the dataset increases.
The cause of this counterintuitive pattern lies in how scientists estimate and report model accuracy. Under best practices, researchers train an AI model on part of the dataset and keep the rest in a "lock box." They then use this "held-out" data to test the model's accuracy.
Suppose an AI program is being developed to distinguish people with dementia
from people without it by analyzing how they speak. The model is trained on
data consisting of audio samples paired with dementia diagnosis labels, so it
learns to predict from the audio whether a person has dementia. It is then
tested on held-out data of the same type to measure its accuracy, and that
accuracy estimate is what gets published in a scientific paper. The higher the
accuracy on the held-out data, the better the algorithm is said to perform.
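To make the lock-box procedure concrete, here is a minimal sketch in Python with scikit-learn. The random feature matrix stands in for acoustic features extracted from speech recordings, and the logistic-regression classifier and split sizes are illustrative assumptions, not the method of any particular study:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Stand-in data: 200 speakers, 20 acoustic features each, with a binary
# label (1 = dementia diagnosis, 0 = control). A real study would extract
# these features from audio recordings.
X = rng.normal(size=(200, 20))
y = rng.integers(0, 2, size=200)

# Best practice: split once, and keep the held-out set in a "lock box"
# until the model is final.
X_train, X_heldout, y_train, y_heldout = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# The accuracy on the held-out set is the number that gets reported.
reported = accuracy_score(y_heldout, model.predict(X_heldout))
print(f"Held-out accuracy: {reported:.2f}")
```

The key discipline is that the split happens exactly once and the held-out set is touched only for the final measurement; everything that follows hinges on whether that rule is actually respected.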
Why, then, do studies show that reported accuracy decreases as dataset size increases? Ideally,
the held-out data are not seen by the scientists until the model is complete and fixed.
However, scientists may peek at the data, sometimes unintentionally, and modify
the model until it achieves high accuracy, a phenomenon known as data leakage.
By using the held-out data to tune the model and then to test it, the researchers virtually
guarantee the system will correctly predict the held-out data, which overestimates
the true accuracy of the model. Instead, they need to use new datasets for testing, to see whether
the model is truly learning and can examine relatively unfamiliar data and still make the right diagnosis.
For models trained and tested on small datasets, the effects of data leakage
and publication bias are especially pronounced. This means models trained on
small datasets are more likely to report exaggerated estimates of accuracy,
and the published literature shows exactly this pattern: models trained on
small datasets report higher accuracy than models trained on large ones.
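A small simulation illustrates how leakage and small datasets combine to inflate reported accuracy. In the sketch below the labels are pure noise, so an honest model can do no better than 50 percent; repeatedly "tuning" against the held-out set and keeping the best result, which is exactly the leakage pattern described above, pushes the reported number above chance, and the smaller the dataset, the larger the inflation. The feature counts and number of tuning attempts are arbitrary choices for illustration:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

def leaked_accuracy(n_samples: int, n_tries: int = 50) -> float:
    """Labels are random noise, so true accuracy is ~0.5. Keeping the best
    of many feature subsets *scored on the held-out set* simulates leakage."""
    X = rng.normal(size=(n_samples, 20))
    y = rng.integers(0, 2, size=n_samples)
    X_tr, X_ho, y_tr, y_ho = train_test_split(
        X, y, test_size=0.5, random_state=0
    )
    scores = []
    for _ in range(n_tries):
        cols = rng.choice(20, size=10, replace=False)  # try a feature subset
        model = LogisticRegression(max_iter=1000).fit(X_tr[:, cols], y_tr)
        scores.append(accuracy_score(y_ho, model.predict(X_ho[:, cols])))
    return max(scores)  # the overestimate that would get reported

for n in (20, 100, 1000):
    print(f"n = {n:4d}  best held-out accuracy: {leaked_accuracy(n):.2f}")
```

With 20 samples the "best" accuracy typically lands far above chance even though there is no signal at all; with 1,000 samples it stays much closer to 0.5, matching the pattern seen in the published literature.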
After determining
that developing an AI model is ethical for a particular application, the
first question an algorithm designer should ask is: "Do we have enough
data to represent a complicated concept like human health?" If the answer is
yes, researchers should spend more effort on reliably validating their models
than on squeezing every last bit of "accuracy" out of them.
Reliable validation
of models begins with ensuring representative data. The most difficult problem in
AI model development is the design of the training and test data itself.
Consumer AI companies can collect data opportunistically, but clinical AI models carry
higher risks and require more care. Algorithm designers should regularly examine the
size and composition of data used to train models to ensure that it is representative
of the range of disease symptoms and user demographics.
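One routine check is simply to tabulate the composition of the data. Here is a minimal sketch with pandas; the column names (age_group, sex, diagnosis) are hypothetical stand-ins for whatever demographic and clinical fields a real dataset records:

```python
import pandas as pd

# Hypothetical participant metadata; in practice this would be loaded
# from the study's records rather than written inline.
df = pd.DataFrame({
    "age_group": ["60-69", "70-79", "70-79", "80+", "60-69", "80+"],
    "sex":       ["F", "M", "F", "M", "M", "F"],
    "diagnosis": ["dementia", "control", "dementia",
                  "dementia", "control", "control"],
})

# How balanced are the diagnostic labels overall?
print(df["diagnosis"].value_counts(normalize=True))

# Is every demographic stratum represented in both classes?
print(pd.crosstab([df["age_group"], df["sex"]], df["diagnosis"]))
```

Empty or tiny cells in the cross-tabulation flag subgroups for which the model's behavior is effectively untested.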
All datasets are
imperfect in some way. Researchers should aim to understand the limitations of the
data used to train and evaluate a model, and how those limitations affect its performance.
Unfortunately, there is no silver bullet to reliably validate clinical AI models.
Every tool and every clinical population is different. To arrive at a satisfactory
validation plan that considers real-world conditions, clinicians and patients should
be involved early in the design process, with input from stakeholders such as the
Food and Drug Administration. A broader conversation makes it more likely that
the training dataset is representative, that the right parameters are chosen to
measure the model's behavior, and that what the AI tells clinicians is relevant.
Lessons need to
be learned from the reproducibility crisis in clinical research, where strategies
such as preregistration and patient involvement in studies have been proposed as ways
of increasing transparency and promoting trust. Similarly, a sociotechnical approach
to AI model design recognizes that building reliable and accountable AI models for
clinical applications is not a purely technical problem. It requires a deep knowledge
of the underlying clinical application, an awareness that these models exist in
the context of a larger system, and an understanding of the potential harm if the
model's performance degrades during use.
Without this holistic approach, the AI hype will continue. This is unfortunate
because the technology has real potential to improve clinical outcomes and extend
clinical reach to underserved communities. Adopting a more holistic approach to
developing and testing clinical AI models will lead to more nuanced discussions
about how well these models work and what their limitations are. I believe this
will ultimately allow the technology to reach its full potential, and people to benefit from it.