Letter · Open access · Peer-reviewed

Using a Structured Image Database, How Well Can Novices Assign Skin Lesion Images to the Correct Diagnostic Grouping?

2009; Elsevier BV; Volume: 129; Issue: 10; Language: English

DOI

10.1038/jid.2009.75

ISSN

1523-1747

Authors

Nicola H. Brown, Karen Robertson, Yvonne Bisset, Jonathan L. Rees

Topic(s)

Cutaneous Melanoma Detection and Management

Abstract

Keywords: basal cell carcinoma; squamous cell carcinoma; seborrheic keratoses

TO THE EDITOR

The cognitive basis of expertise in dermatology has received little formal attention (Jackson, 1975). Crucial to any such account is how experts recognize dermatological lesions, that is, how they attach semantics to images. Insights into these processes might allow improved rates of skill acquisition and may be relevant to attempts to use computers to diagnose solitary skin lesions such as skin cancers. Although not designed to answer fundamental cognitive questions, a number of very different techniques have been suggested as tools or heuristics to facilitate diagnosis. In melanoma, for instance, diagnostic strategies range from rule-based systems, such as the ABCD system (Friedman et al., 1985), to approaches that rely on what might be termed gestalt, such as the 'ugly duckling' sign (Grob and Bonerandi, 1998). In some domains of expertise, strategies based on 'matching' show that humans are capable of classifying objects using implicit concepts of likeness that allow them to search for an image that looks like a reference image (Murphy, 2002; Muller et al., 2004). Informal discussions with clinicians suggest that this matching approach, sometimes referred to as 'diagnostic snap', is looked upon with great skepticism, in part because there is not a one-to-one correspondence between morphology and diagnosis and because the range of lesions that present clinically seems so rich.

To pursue this question experimentally, we examined whether non-experts (novices) could assign images of lesions from three common diagnostic groups (basal cell carcinoma (BCC), squamous cell carcinoma (SCC), and seborrheic keratoses (SK)) to the correct group using a simple structured computer image database. Use of the database does not require knowledge of diagnostic terms, presuppose knowledge of the rules by which lesions are classified, or require participants to know how many diagnostic categories are being examined.

Participants were recruited opportunistically from the university campus (age range 22–52 years) and consisted of five medical students, one law student, and seven adults from a range of non-medical educational backgrounds. We refer to the participants as novices, apart from two of the five medical students who had completed a 2-week dermatology attachment; results are presented with and without these two students. The study was conducted in a designated research room within the Department of Dermatology, University of Edinburgh. Three experiments were undertaken sequentially; no participant took part in any single experiment more than once, but some took part in more than one experiment (different index images were used for each experiment; see below). Ethical permission for collection of the images was obtained from the Lothian Ethics Committee.
In brief, in all experiments participants were shown a series of six index prints of skin tumors, two from each test diagnostic group (BCC, SCC, SK); the images differed for each experiment. They were asked to match each test lesion either to a group of images from a single diagnostic category (experiments 1 and 2) or to a single image (experiment 3) using the database. Participants were not informed of the diagnosis of any image. The experimenter had no training in dermatology; her role was to facilitate the use of the computer database. She was instructed to be indifferent to the choices of the participants and was unaware of how the data would be analyzed. The experimental design is shown schematically in Figure 1.

In experiment 1, eight novice participants were studied (six female, two male). Three different images labeled A, B, and C (one BCC, one SCC, and one SK) were shown on the computer screen (screen 1), and the participant was asked which of the three looked most similar to one of the six index prints. Depending on their selection (A, B, or C), they were led to a second screen (screen 2) comprising up to 16 lesions of the same diagnostic category (i.e., BCC, SCC, or SK). They were asked whether they were happy that the index photograph belonged to this 'family' of lesions (i.e., diagnostic group); if not, they had the option of repeating the procedure (returning to screen 1) freely until they made their choice.

In experiment 2, nine participants, of whom seven were novices, were studied (five female, four male). Again, participants were shown one of the series of six index prints. They were then shown a screen of 15 images with a range of diagnoses (hemangiomas, melanoma, and melanocytic nevi, but three images each of BCC, SCC, and SK) and were asked to indicate which image looked most similar to the index image. Depending on their choice, they were led to one of a possible 15 screen 2s. Each screen 2 contained 15 images that, as for screen 1, spanned a range of diagnoses but with a predominance of BCC, SCC, and SK. Again, they were asked to select a match and, depending on their choice, were directed to one of nine further screens (screen 3), each comprising images from a single diagnostic group, including one screen each for BCC, SCC, and SK (the other six screens showed non-BCC/SCC/SK lesions). Participants were asked to choose the screen of images most similar to the index case and could return to earlier screens freely.

In experiment 3, seven participants, of whom six were novices, took part (six female, one male). In the first two experiments, each 'final' screen was always of a homogeneous diagnostic group, which may have made participants more confident that they did not need to go back to earlier screens. To make the task harder, in experiment 3 participants were asked to choose a single image from a final screen with a range of diagnoses (rather than a screen defined by a single diagnostic category). Again, six images were shown one at a time. Participants were then shown screen 1, comprising 10 images of a range of diagnoses (e.g., hemangiomas and viral warts, but including BCC, SCC, and SK). They chose the image with the greatest likeness and proceeded to one of three screen 2s. Each screen 2 contained a preponderance of either SK, BCC, or SCC but also contained a range of other diagnoses. They were then asked to make a match with a single lesion, with the option of returning to screen 1 and repeating the process freely until they were content with their choice. In all experiments, there were three possible categories (BCC, SCC, and SK), and the test outcome was assignment to the correct (one of three) group.
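The accept-or-return navigation loop described for experiment 1 can be sketched as a small state machine. The Python fragment below is a hypothetical illustration only: the `SCREEN_1` mapping and the `run_trial` helper are assumptions for exposition, not the study's actual database.

```python
# Hypothetical sketch of the experiment-1 matching task: screen 1 offers three
# images (A, B, C), one per diagnostic group; picking one leads to a 'family'
# screen of that group, which the participant may accept or reject.
SCREEN_1 = {"A": "BCC", "B": "SCC", "C": "SK"}  # assumed image-to-group mapping

def run_trial(choices):
    """Follow a participant's sequence of (pick, accepted) decisions.

    The participant may reject a family screen and return to screen 1
    freely; the trial ends when a family is accepted.
    """
    for pick, accepted in choices:
        family = SCREEN_1[pick]
        if accepted:
            return family  # final group assignment for this index print
    return None  # participant never settled on a family

# e.g. reject the first family offered, then accept the SK family:
# run_trial([("A", False), ("C", True)]) -> "SK"
```

The free return to screen 1 is what makes the task self-correcting: a participant is never forced to commit to their first likeness judgment.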
Without use of the database, a 'blind' participant would assign an index case to the correct diagnostic group on one occasion in three (given that there are two cards from each of three categories). We therefore used a binomial test to examine whether the results obtained differed from a chance rate of 1/3, using 'R' software (R Development Core Team, 2008). Our only outcome was the number of correct scores for an index image; how the participants achieved this match, whether for instance they flicked between screens on more than one occasion, was not examined. Scores for the correct diagnosis in experiments 1, 2, and 3 were 36/48 (75%), 50/54 (93%), and 38/42 (90%), respectively. If only novices are considered, the respective figures were 36/48 (75%), 39/42 (93%), and 34/36 (94%). All these scores are highly significant. [...] If the database were scaled up (>10,000 images), then elements of machine learning based on user feedback or computer vision will need to be incorporated. We think such approaches are worthy of further research.

The authors state no conflict of interest.

Nicola Brown was supported by a Wellcome Trust summer studentship at the University of Edinburgh. This work was supported by the Foundation for Skin Research (Edinburgh) and the Wellcome Trust.
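The comparison against the one-in-three chance rate can be reproduced with an exact one-sided binomial test. The sketch below uses Python's standard library rather than R (the helper name is ours); the counts are those reported above.

```python
from math import comb

def binom_p_ge(k, n, p):
    """Exact one-sided binomial p-value: P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Correct-assignment counts from the three experiments, tested against
# the chance rate of one in three.
for k, n in [(36, 48), (50, 54), (38, 42)]:
    print(f"{k}/{n}: P = {binom_p_ge(k, n, 1/3):.2e}")
```

The equivalent call in R is `binom.test(36, 48, p = 1/3, alternative = "greater")`.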

Reference(s)

Friedman RJ, Rigel DS, Kopf AW. Early detection of malignant melanoma: the role of physician examination and self-examination of the skin. CA Cancer J Clin 1985;35:130-51.

Grob JJ, Bonerandi JJ. The 'ugly duckling' sign: identification of the common characteristics of nevi in an individual as a basis for melanoma screening. Arch Dermatol 1998;134:103-4.

Jackson R. The importance of being visually literate. Observations on the art and science of making a morphological diagnosis in dermatology. Arch Dermatol 1975;111:632-6.

Muller H, Michoux N, Bandon D, Geissbuhler A. A review of content-based image retrieval systems in medical applications: clinical benefits and future directions. Int J Med Inform 2004;73:1-23.

Murphy GL. The Big Book of Concepts. Cambridge, MA: MIT Press; 2002.

R Development Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2008. http://www.R-project.org