An international research team led by Assistant Professor Zhiyu Wan from ShanghaiTech University has recently published groundbreaking findings in the journal Health Data Science. The findings highlight biases in multimodal large language models (LLMs), such as ChatGPT-4 and LLaVA, when they diagnose skin diseases from medical images. The study systematically evaluated these AI models across different sex and age groups.
Using approximately 10,000 dermatoscopic images, the study focused on three common skin diseases: melanoma, melanocytic nevi, and benign keratosis-like lesions. Both ChatGPT-4 and LLaVA outperformed most traditional deep learning models overall. However, ChatGPT-4 showed greater fairness across demographic groups, whereas LLaVA exhibited significant sex-related biases.
Dr. Wan emphasized, "While large language models like ChatGPT-4 and LLaVA show clear potential in dermatology, we must address the observed biases, particularly across sex and age groups, to ensure these technologies are safe and effective for all patients."
The team plans further research that will incorporate additional demographic variables, such as skin tone, to assess more comprehensively the fairness and reliability of AI models in clinical scenarios. This research provides critical guidance for developing more equitable and trustworthy medical AI systems.
Journal reference:
Wan, Z., et al. (2025). Evaluating Sex and Age Biases in Multimodal Large Language Models for Skin Disease Identification from Dermatoscopic Images. Health Data Science. doi.org/10.34133/hds.0256.