How accurate is artificial intelligence (AI) in the medical field?


The accuracy of AI models used in healthcare to forecast disease is not as high as some studies might imply.

We use artificial intelligence (AI)-based tools daily, most commonly voice assistants such as Alexa and Siri. These consumer products work reasonably well (Siri understands most of what we say), but they are far from perfect, and they will keep improving over time. We accept their limitations and adjust how we use them until they get the right answer or we give up. When Siri or Alexa misinterprets a request, the consequences are usually minor.

Because an AI model depends entirely on its dataset, the larger and better the dataset, the more accurate the model tends to be. But errors in the AI models that support doctors' clinical decisions can mean the difference between life and death, so it is important to understand how well these models work before deploying them. Currently, published reports on the technology are overly optimistic about its accuracy, sometimes leading to sensationalist stories in the press. The media is full of claims about algorithms that can diagnose early-stage Alzheimer's disease with up to 74% accuracy, or that are more accurate than clinicians. Scientific studies detailing such advances can become the basis for new ventures, new investment and research directions, and large-scale deployment in hospital systems. In most cases, however, the technology is not yet ready for use, for the following reason.

When a researcher feeds more data into an AI model, the model's accuracy should improve, or at least not get worse. Yet researchers found the opposite in published models: reported accuracy decreased as the size of the dataset increased.

The cause of this counterintuitive pattern lies in how scientists estimate and report a model's accuracy. Under best practice, researchers train an AI model on part of the dataset and keep the rest in a "lockbox". They then use this "holdout" data to test the model's accuracy.

Suppose an AI program is developed that distinguishes people with dementia from people without it by analyzing how they speak. The model is built from training data consisting of audio samples and dementia diagnosis labels, so that it can predict whether a person has dementia from the audio alone. It is then tested on the same kind of data held out from training, to measure how accurately it performs. That accuracy estimate is what gets published in a scientific paper: the higher the accuracy on the holdout data, the better the algorithm is said to work.
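As a rough illustration of this train-and-holdout workflow, here is a minimal sketch using scikit-learn with synthetic data. The features, model choice, and sizes are assumptions for illustration only, not the setup of any study discussed here.

```python
# Minimal sketch of a holdout ("lockbox") evaluation with synthetic data.
# The dataset, features, and model are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Synthetic stand-in for a clinical dataset: features plus diagnostic labels.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Split once: the holdout set stays untouched until the model is final.
X_train, X_holdout, y_train, y_holdout = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Report accuracy on data the model has never seen during training.
holdout_accuracy = accuracy_score(y_holdout, model.predict(X_holdout))
print(f"Holdout accuracy: {holdout_accuracy:.2f}")
```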

Why, then, do studies show that reported accuracy decreases as dataset size increases? Ideally, the holdout data is not looked at by the scientists until the model is complete and frozen. In practice, however, researchers may, sometimes unintentionally, peek at the holdout data and tweak the model until it achieves high accuracy on it, a phenomenon known as data leakage. By tuning the model against the holdout data and then testing on that same data, the researchers virtually guarantee that the system predicts the holdout data well, and so they overestimate the model's true accuracy. To know whether a model has genuinely learned to make a diagnosis, it must be tested on data it has truly never seen.
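The effect of such leakage can be sketched with a toy experiment: if the model configuration is chosen by repeatedly checking its score on the holdout set, the reported holdout accuracy tends to be higher than the accuracy on data that was never consulted. The data, model, and hyperparameters below are illustrative assumptions only.

```python
# Sketch of how data leakage inflates accuracy: hyperparameters are chosen
# by repeatedly peeking at the "holdout" set, and that same set is then
# used to report accuracy. A truly unseen set tells a different story.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=300, n_features=50, n_informative=5,
                           random_state=0)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4,
                                                    random_state=0)
X_holdout, X_unseen, y_holdout, y_unseen = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0)

# "Leaky" model selection: keep whichever setting scores best on the holdout.
best_model, best_holdout_acc = None, -1.0
for depth in [1, 2, 3, 5, 8, None]:
    m = RandomForestClassifier(max_depth=depth, random_state=0)
    m.fit(X_train, y_train)
    acc = accuracy_score(y_holdout, m.predict(X_holdout))
    if acc > best_holdout_acc:
        best_model, best_holdout_acc = m, acc

# The reported (leaky) accuracy is optimistic compared with unseen data.
print(f"Accuracy on peeked-at holdout: {best_holdout_acc:.2f}")
print(f"Accuracy on truly unseen data: "
      f"{accuracy_score(y_unseen, best_model.predict(X_unseen)):.2f}")
```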

The effects of data leakage and publication bias are especially pronounced for models trained and tested on small datasets. Models built on small datasets are more likely to produce exaggerated estimates of accuracy, and this is exactly the pattern researchers observe in the published literature: models trained on small datasets report higher accuracy than models trained on large ones.
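The same point can be made with a purely statistical simulation: when the test set is small, accuracy estimates are noisy, and if only the best-looking results get published, even a chance-level model can appear impressive. The numbers below are synthetic and for illustration only.

```python
# Sketch of why small test sets plus publication bias inflate accuracy:
# a coin-flip classifier (true accuracy 0.5), evaluated across many small
# holdout sets, will sometimes score very well, and those are the results
# most likely to be reported.
import numpy as np

rng = np.random.default_rng(0)
n_studies = 200  # number of hypothetical "studies" of the same useless model

for holdout_size in [20, 100, 1000]:
    # Each study's accuracy is the fraction of correct coin-flip predictions.
    accuracies = rng.binomial(holdout_size, 0.5, size=n_studies) / holdout_size
    print(f"holdout n={holdout_size:5d}: "
          f"best reported accuracy = {accuracies.max():.2f}, "
          f"true accuracy = 0.50")
```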

After determining that developing an AI model is ethical for a particular application, the first question an algorithm designer should ask is: "Do we have enough data to represent a concept as complicated as human health?" If the answer is yes, researchers should spend more effort validating models than trying to squeeze every last bit of "accuracy" out of them.

Reliable validation of models begins with ensuring that the data are representative. The hardest problem in AI model development is the design of the training and test data themselves.
Consumer AI companies can collect data opportunistically, but clinical AI models carry more risk and demand more care. Algorithm designers should routinely examine the size and composition of the data used to train a model, to make sure it represents the full range of disease presentations and user demographics.
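One simple, concrete step in that direction is auditing the composition of the training set before any modeling begins. The sketch below assumes a hypothetical metadata table with columns such as age_group, sex, and diagnosis; the names and values are invented for illustration.

```python
# Sketch of a dataset-composition audit: count how many samples fall into
# each demographic and diagnostic subgroup before training. Column names
# and values are hypothetical.
import pandas as pd

# Hypothetical metadata table for a clinical training set.
metadata = pd.DataFrame({
    "age_group": ["60-69", "70-79", "70-79", "80+", "60-69", "80+"],
    "sex":       ["F", "M", "F", "F", "M", "M"],
    "diagnosis": ["dementia", "control", "dementia", "dementia",
                  "control", "control"],
})

# Counts per subgroup reveal gaps in coverage (e.g., missing combinations).
composition = (metadata
               .groupby(["diagnosis", "sex", "age_group"])
               .size()
               .rename("n_samples"))
print(composition)
print("\nTotal samples per diagnosis:")
print(metadata["diagnosis"].value_counts())
```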

All datasets are imperfect in some way. Researchers should aim to understand the limitations of the data used to train and evaluate a model, and how those limitations affect its performance.

Unfortunately, there is no silver bullet for reliably validating clinical AI models. Every tool and every clinical population is different. To arrive at a satisfactory validation plan that accounts for real-world conditions, clinicians and patients should be involved early in the design process, with input from stakeholders such as the Food and Drug Administration. A broader conversation makes it more likely that the training dataset is representative, that the reported metrics capture the model's real behavior, and that what the AI tells clinicians is actually relevant to care.

Lessons should be drawn from the reproducibility crisis in clinical research, where strategies such as preregistration and patient-centered study design have been proposed as ways to increase transparency and promote trust. Similarly, a sociotechnical approach to AI model design recognizes that building reliable and accountable AI models for clinical applications is not a purely technical problem. It requires deep knowledge of the underlying clinical application, an awareness that these models exist within a larger system, and an understanding of the potential harm if a model's performance degrades once it is in use.

Without this holistic approach, the AI hype will continue. That would be unfortunate, because the technology has real potential to improve clinical outcomes and extend clinical reach to underserved communities. Adopting a more holistic approach to developing and testing clinical AI models will lead to more nuanced discussions about how well these models work and where their limits lie. I believe this will ultimately let the technology reach its full potential, with people benefiting from it.
