22/09/2025

The UNICORN Challenge: How can we assess the reliability of general models in specific tasks?

At MICCAI 2024, the Medical Image Computing and Computer Assisted Intervention Society (MICCAI) announced the UNICORN Challenge as one of its Lighthouse Challenges. These prestigious challenges highlight high-quality, high-impact efforts in the community.

The UNICORN Challenge is an international benchmarking initiative to evaluate the capabilities of large multimodal foundation models in medicine. These models, designed as “generalists,” are expected to adapt to a wide range of tasks without requiring task-specific training. Yet despite this promise, the field currently faces a benchmarking crisis: there is a lack of systematic tools to assess how reliably such models perform across different clinical tasks, particularly in medical image interpretation. The UNICORN Challenge addresses this gap by providing a unified set of 20 diverse tasks in radiology and digital pathology to test how well a single model can transfer its knowledge across multiple domains of vision and language.

COMFORT aims to build AI models and tools tailored to kidney and prostate cancer. Several members of the COMFORT consortium are at the heart of the UNICORN Challenge: Alessa Hering and her team from Radboud UMC are the organisers of the challenge, and Hartmut Häntze and Sarah de Boer are presenting their work on the challenge at this year’s MICCAI, on 23 September 2025.

Sarah de Boer says about the challenge: “By building a general image encoder for radiology and submitting this model to the UNICORN challenge, we wanted to investigate its generalisability in different radiology tasks. One of the tasks in UNICORN was the detection of clinically significant prostate cancer. In the future, we hope to apply our model to kidney and prostate cancer and see how it compares to so-called ‘narrow’ AI models, which are built for one task only.”

Hartmut Häntze presented an adaptation of the public segmentation model MRSegmentator for cancer-related tasks. “We demonstrated that segmentation models for general anatomy can be used as a feature extractor to classify nodule malignancy.” This means that, although it was trained to segment general anatomy, the model can also analyse an image and extract useful information (patterns, shapes, textures, etc.) from the region of interest, which can then be used to classify a nodule as benign or malignant.
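
For readers who want a concrete picture of the "feature extractor" idea, here is a minimal sketch in Python. It is not the team's actual pipeline: the encoder is a hypothetical stand-in (a handful of fixed random filters instead of a trained segmentation network), the image patches are synthetic toy data, and all function names are made up for illustration. It only shows the general pattern: a frozen encoder turns each region of interest into a feature vector, and a small classifier is trained on those vectors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the frozen encoder of a pretrained segmentation
# model: 16 fixed channel-wise filters followed by a ReLU. A real pipeline
# would use the trained network's encoder weights here instead.
N_CHANNELS = 16
ENC_W = rng.normal(size=N_CHANNELS)
ENC_B = rng.normal(size=N_CHANNELS)


def extract_features(patch):
    """Map a 3D region-of-interest patch to one feature vector."""
    # Feature map of shape (channels, D, H, W); the weights are never updated.
    fmap = np.maximum(
        ENC_W[:, None, None, None] * patch[None] + ENC_B[:, None, None, None],
        0.0,
    )
    # Global average pooling over the three spatial axes.
    return fmap.mean(axis=(1, 2, 3))


def make_toy_patch(malignant):
    """Synthetic ROI (assumption): 'malignant' patches are brighter on average."""
    return rng.normal(loc=1.0 if malignant else 0.0, scale=1.0, size=(8, 8, 8))


def train_logistic_regression(X, y, lr=0.5, steps=500):
    """Fit a plain logistic regression with batch gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        logits = np.clip(X @ w + b, -30.0, 30.0)
        prob = 1.0 / (1.0 + np.exp(-logits))
        grad = prob - y
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b


# Toy dataset: 300 synthetic nodules with benign (0) or malignant (1) labels.
labels = rng.integers(0, 2, size=300)
features = np.stack([extract_features(make_toy_patch(l)) for l in labels])

# Standardise features using statistics from the training split only.
mu = features[:200].mean(axis=0)
sigma = features[:200].std(axis=0) + 1e-8
X = (features - mu) / sigma

# Train the small classifier on the first 200 cases, evaluate on the rest.
w, b = train_logistic_regression(X[:200], labels[:200])
predictions = (X[200:] @ w + b > 0).astype(int)
accuracy = (predictions == labels[200:]).mean()
print(f"held-out accuracy on toy data: {accuracy:.2f}")
```

The point of the design is that only the small classifier at the end is trained; the encoder stays untouched, which is what makes a general anatomy model reusable for a new task such as nodule malignancy.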

Find out more: UNICORN Challenge