Comparing reliability between different ultrasound techniques

Letter

National Production, Peer Reviewed

2012; Wiley; Volume: 39; Issue: 4 Language: English

10.1002/uog.11138

ISSN

1469-0705

Authors

Wellington P. Martins

Topic(s)

Statistical Methods in Epidemiology

Abstract

Ultrasound in Obstetrics & Gynecology, Volume 39, Issue 4, p. 482-485. Correspondence. Free Access.

W. P. Martins, Corresponding Author (wpmartins@gmail.com)
Departamento de Ginecologia e Obstetrícia da Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo (FMRP-USP), Av. Bandeirantes, 3900–8° andar - HCRP - Campus Universitário, Ribeirão Preto, São Paulo, Brazil. CEP: 14049-900; Escola de Ultra-sonografia e Reciclagem Médica de Ribeirão Preto (EURP), Ribeirão Preto, Brazil; Instituto Nacional de Ciência e Tecnologia (INCT) de Hormônios e Saúde da Mulher, Ribeirão Preto, Brazil

First published: 27 March 2012. https://doi.org/10.1002/uog.11138

I read with great interest the recently published article by Sur et al.1, in the November issue of the White Journal.
In this study the authors intended to compare the intraobserver reliability of embryo volume measurement using a 'semi-automated technique', which combines Virtual Organ Computer-aided AnaLysis (VOCAL) and Sonography-based Automated Volume Count (SonoAVC), with a manual technique using VOCAL alone. The authors concluded that the semi-automated technique is more reliable than the manual technique for embryo volume measurement. However, based on their results, I strongly disagree with their conclusion; additionally, I have some minor comments about the study.

Although the terms 'reliability' and 'agreement' are sometimes used interchangeably, they refer to different concepts2. A recent guideline for reporting reliability and agreement studies2, a statistical review3 and even some reliability and agreement studies recently published in the White Journal4-6 have reported how reliability and agreement should be defined. Reliability relates the magnitude of the measurement error in observed measurements to the 'true' variability between subjects or, in other words, reflects the ability of a measurement to differentiate between subjects or objects2. Agreement quantifies how close two measurements made on the same subject are, regardless of the 'true' variability between subjects3. The authors of these studies concur that evaluating both reliability and agreement might be important in some situations, particularly when different samples are being evaluated, which might present different 'true' variability (heterogeneity) between subjects. They also agree that the intraclass correlation coefficient (ICC) should be used to evaluate the reliability of methods assessing continuous data (e.g. fetal volume) and that Bland–Altman plots might be used along with other methods when assessing agreement.

In the study of Sur et al.1, the authors stated that their objective was to compare the intraobserver reliability between the two methods of assessing fetal volume.
Therefore, I expected them to compare the ICCs between these methods, preferably considering the 95% CIs. Since they concluded that the semi-automated technique is more reliable than the manual technique for embryo volume measurement, I assumed that the ICC for the semi-automated technique was greater than that observed for the manual technique, but this information was not provided in the abstract. When examining the full text I was surprised by the results. Actually, the ICC for the manual technique was higher than that observed for the semi-automated technique: 0.976 (95% CI, 0.926–0.991) vs. 0.942 (95% CI, 0.874–0.997). This result means that, when using the manual mode, 97.6% of the variability in measurements of embryo volumes was estimated to be due to genuine differences between fetuses, with 2.4% being due to errors in the measurement process and the observer involved; when using the semi-automated technique, 5.8% of the variability in measurements was estimated to be due to errors in the measurement process and the observer. In other words, random errors were responsible for a greater proportion (2.4-fold) of the total variability when using the semi-automated method. Sur et al. should have concluded that the manual technique is more reliable than the semi-automated one.

However, there was an overlap in the 95% CIs between the estimated ICCs, and it is possible that the observed difference occurred only by chance. Moreover, the 95% CIs provided were miscalculated: using the estimated ICC and the sample size of 52 subjects, I calculated different values (Table 1). Even considering the corrected 95% CIs, there is still a small overlap between the two 95% CIs of the estimated ICCs; however, this would probably not occur if a larger sample (> 90 subjects) were examined, providing more precise estimates.
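The recalculated intervals reported in Table 1 can be approximated with a short sketch. It assumes the usual Fisher z-transformation with standard error 1/sqrt(n − 3) and a normal quantile of 1.96; the letter states only that a Fisher transformation and a sample size of 52 were used, so the exact variance formula is an assumption, and the function names are illustrative.

```python
import math

Z_975 = 1.96  # normal quantile for a two-sided 95% interval (assumed)


def fisher_ci(icc, n, z=Z_975):
    """Approximate 95% CI for a correlation-type coefficient.

    Assumes the Fisher z-transformation with SE = 1/sqrt(n - 3).
    """
    centre = math.atanh(icc)
    half_width = z / math.sqrt(n - 3)
    return math.tanh(centre - half_width), math.tanh(centre + half_width)


def error_share(icc):
    """Proportion of total variability attributed to measurement error."""
    return 1.0 - icc


def intervals_overlap(icc_low, icc_high, n):
    """True if the CI of the lower ICC reaches the CI of the higher ICC."""
    _, upper_of_low = fisher_ci(icc_low, n)
    lower_of_high, _ = fisher_ci(icc_high, n)
    return upper_of_low >= lower_of_high


semi_ci = fisher_ci(0.942, 52)
manual_ci = fisher_ci(0.976, 52)
print("semi-automated:", tuple(round(x, 3) for x in semi_ci))
print("manual:", tuple(round(x, 3) for x in manual_ci))
print("error ratio:", round(error_share(0.942) / error_share(0.976), 1))
print("overlap at n=52:", intervals_overlap(0.942, 0.976, 52))
print("overlap at n=91:", intervals_overlap(0.942, 0.976, 91))
```

Under these assumptions the sketch gives 0.901–0.966 for the semi-automated technique and 0.958–0.986 for the manual technique, and the two intervals overlap with 52 subjects but not with 91.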
Therefore, the best conclusion based on the results of this study is that the manual technique is probably more reliable than the semi-automated one; however, the evaluated sample did not provide sufficient precision for a more robust conclusion. The worse reliability associated with the semi-automated technique is very plausible, since there are more sources of error: each of the three measurements (gestational sac and yolk sac, using the manual method, and amniotic fluid, using SonoAVC) will contribute errors that will be summed when calculating fetal volume.

Table 1. 95% CIs of the intraclass correlation coefficients (ICC) provided by Sur et al.1 and recalculated based on sample size

Technique        ICC (95% CI) provided by authors    ICC (95% CI) estimated by sample size
Semi-automated   0.942 (0.874–0.997)                 0.942 (0.901–0.966)
Manual           0.976 (0.926–0.991)                 0.976 (0.958–0.986)

95% CIs estimated using Fisher transformation and sample size = 52 subjects.

On the other hand, the intraobserver agreement, measured as percentage difference between measurements, seems to be better for the semi-automated method. However, comparing the intraobserver agreement between these two methods is complicated, because the semi-automated technique resulted in larger measurements than did the manual method (three-fold). This huge systematic difference probably occurred because SonoAVC underestimated the total fluid: based on the results from several studies about the validity of volume estimation using VOCAL7-9, it is unlikely that VOCAL underestimated fetal volume to such a degree. The only expected systematic error when estimating fetal volume with VOCAL would be a 5–10% underestimation of fetal volume if limbs are not included, evaluating only fetal head and trunk volume10, 11.
In fact, although the observed limits of agreement (LOA) of approximately ± 16% with the semi-automated technique look narrower than the observed LOA of ± 25% with the manual technique, they were worse in absolute terms: approximately 0.50 cm³ vs. 0.25 cm³ for the semi-automated vs. manual techniques. This is consistent with the findings from ICCs: the semi-automated technique again resulted in larger (two-fold) random errors. The authors would have avoided this misleading result if they had also presented the results from Bland–Altman plots using absolute differences.

I also have some minor comments:

1. Datasets. By evaluating a single three-dimensional (3D) dataset twice, the authors did not evaluate the true reliability between methods, because only stable information was analyzed. This might be particularly relevant for the variability of automated estimation of amniotic fluid by SonoAVC. In my opinion, the 'true' reliability and agreement of an ultrasound method can only be assessed properly when a whole new examination is performed. In this case (fetal volume), a new 3D dataset should have been acquired, preferably with some minutes between acquisitions to permit fetal movements.

2. Evaluated gestational age (5–9 weeks). I believe that fetal volume measurement is of great value, particularly between 10 and 14 weeks' gestation. At this gestational age, fetal head and trunk volume has been shown to be better than crown–rump length (CRL) for evaluating growth impairment12 and for estimating gestational age10. One reason for this is the fetal attitude: small fetal flexions or deflections may modify the CRL measurement, whereas they are very unlikely to change the fetal volume estimation. Until 10 weeks' gestation, fetuses are normally in a neutral position, and the improvement of fetal volume over CRL in determining the 'true' gestational age is smaller and unlikely to have any clinical meaning13.

3. Citations from previous studies.
The authors stated in their introduction that 'embryo volume, between 7 and 10 weeks of gestational age, has been shown to be a better predictor of gestational age than CRL in a study of 30 in vitro fertilization (IVF) pregnancies11.' This information is not accurate: in the cited study10, the authors demonstrated that fetal head and trunk volume was more accurate than CRL for determining gestational age between 10 and 14 weeks' gestation. In another study13, the authors observed that fetal volume assessed by VOCAL was better than CRL for estimating gestational age between 7 and 10 weeks. However, in the latter study the authors concluded that the improvement was small and unlikely to have any clinical relevance13, probably for the reasons previously described.

4. Inaccurate information. The authors reported that Blaas et al.11 measured fetal volume using the conventional slicing technique. Actually, Blaas et al. measured the fetal volume using EchoPAC-3D, a completely different technique, which requires delineating segmentation lines all over the surface of the embryo and geometric reconstruction. In my opinion, the EchoPAC-3D method provided the best images (Figure 1) and the impression that the fetuses were measured accurately. However, this technique is time-consuming: the segmentation process took between 20 and 30 min per fetus11, which is somewhat prohibitive in clinical practice. The lack of a final rendered image showing exactly what has been evaluated is one of the main flaws of the described semi-automated method: although a huge systematic error was shown, we were not able to determine where this extra volume came from (e.g. umbilical cord, underestimated fluid volume).

5. Figure 5. In my opinion, this figure is not very intuitive because it is hard to understand how one measurement could be > 100% smaller than another.
The authors should provide an explanation reporting that the difference between measurements is expressed relative to the average value of the two techniques and not relative to either method alone. For example, we can see one point near −150% on the y-axis and close to 1.0 cm³ on the x-axis. This means that the average value between the two techniques was 1 cm³ and that the difference between the measurements was 1.5 cm³ (150% of 1.0 cm³); therefore, one might conclude that in the same fetus the manual technique measured 0.25 cm³ while the semi-automated technique measured 1.75 cm³ (a seven-fold difference).

6. Including limbs in fetal volume. I agree that including the limbs in the measurement of the fetus will probably result in values closer to the whole fetus, since they represent 5–10% of the total volume between 7 and 12 weeks11, and I understand why the authors developed a strategy for including the limbs before the 10th week when using VOCAL, 'drawing a thin connecting stalk between the limbs and trunk'. However, this thin stalk does not exist and drawing such a stalk might be considered another source of bias, which is acknowledged as a limitation by the authors. In my opinion, including the limbs in fetal volume estimation is not important because the underestimation in volume would be predictable, relatively small and constant14. Therefore any observer would be able to compare the measured value with proper reference curves determined using VOCAL or any other method that does not include the limbs in fetal volume estimation.

Figure 1. Three-dimensional (3D) surface-rendered fetus delineated with Virtual Organ Computer-aided AnaLysis1 (a) and EchoPAC-3D11 (b). When using the semi-automated technique, combining VOCAL with SonoAVC as suggested by the authors1, the observer is not able to evaluate precisely what has been measured using the rendered images.
Therefore, even very large systematic errors (> three-fold) may be overlooked, which should be considered a significant limitation of this method. Images reprinted with permission.

References

1 Sur SD, Clewes JS, Campbell BK, Raine-Fenning NJ. Embryo volume measurement: an intraobserver, intermethod comparative study of semiautomated and manual three-dimensional ultrasound techniques. Ultrasound Obstet Gynecol 2011; 38: 516–523.
2 Kottner J, Audige L, Brorson S, Donner A, Gajewski BJ, Hrobjartsson A, Roberts C, Shoukri M, Streiner DL. Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed. J Clin Epidemiol 2011; 64: 96–106.
3 Bartlett JW, Frost C. Reliability, repeatability and reproducibility: analysis of measurement errors in continuous variables. Ultrasound Obstet Gynecol 2008; 31: 466–475.
4 Lima JC, Miyague AH, Filho FM, Nastri CO, Martins WP. Biometry and estimated fetal weight by two-dimensional and three-dimensional ultrasonography: an intra- and inter-observer reliability and agreement study. Ultrasound Obstet Gynecol 2011. DOI: 10.1002/uog.10146.
5 Martins WP, Raine-Fenning NJ, Leite SP, Ferriani RA, Nastri CO. A standardized measurement technique may improve the reliability of measurements of endometrial thickness and volume. Ultrasound Obstet Gynecol 2011; 38: 107–115.
6 Martins WP, Lima JC, Welsh AW, Araujo Junior E, Miyague AH, Mauad Filho F, Raine-Fenning NJ. Three-dimensional Doppler evaluation of single spherical samples from placenta: Intra- and inter-observer reliability. Ultrasound Obstet Gynecol 2011. DOI: 10.1002/uog.11076.
7 Martins WP, Ferriani RA, Barra DA, Dos Reis RM, Bortolieiro MA, Nastri CO, Filho FM. Reliability and validity of tissue volume measurement by three-dimensional ultrasound: an experimental model. Ultrasound Obstet Gynecol 2007; 29: 210–214.
8 Cheong KB, Leung KY, Li TK, Chan HY, Lee YP, Tang MH. Comparison of inter- and intraobserver agreement and reliability between three different types of placental volume measurement technique (XI VOCAL, VOCAL and multiplanar) and validity in the in-vitro setting. Ultrasound Obstet Gynecol 2010; 36: 210–217.
9 Raine-Fenning NJ, Clewes JS, Kendall NR, Bunkheila AK, Campbell BK, Johnson IR. The interobserver reliability and validity of volume calculation from three-dimensional ultrasound datasets in the in vitro setting. Ultrasound Obstet Gynecol 2003; 21: 283–291.
10 Martins WP, Ferriani RA, Nastri CO, Filho FM. First trimester fetal volume and crown-rump length: comparison between singletons and twins conceived by in vitro fertilization. Ultrasound Med Biol 2008; 34: 1360–1364.
11 Blaas HG, Taipale P, Torp H, Eik-Nes SH. Three-dimensional ultrasound volume calculations of human embryos and young fetuses: a study on the volumetry of compound structures and its reproducibility. Ultrasound Obstet Gynecol 2006; 27: 640–646.
12 Falcon O, Peralta CF, Cavoretto P, Auer M, Nicolaides KH. Fetal trunk and head volume in chromosomally abnormal fetuses at 11 + 0 to 13 + 6 weeks of gestation. Ultrasound Obstet Gynecol 2005; 26: 517–520.
13 Martins WP, Nastri CO, Barra DA, Navarro PA, Mauad Filho F, Ferriani RA. Fetal volume and crown-rump length from 7 to 10 weeks of gestational age in singletons and twins. Eur J Obstet Gynecol Reprod Biol 2009; 145: 32–35.
14 Martins WP. Measurement of embryo volume using SonoAVC. Ultrasound Med Biol 2010; 36: 2144; author reply 2145.

W. P. Martins*
* Departamento de Ginecologia e Obstetrícia da Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo (FMRP-USP), Av. Bandeirantes, 3900–8° andar - HCRP - Campus Universitário, Ribeirão Preto, São Paulo, Brazil. CEP: 14049-900
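The two percentage-based arguments in the letter (the limits of agreement and the point read from Figure 5) both rest on converting a percentage of the mean back into absolute units. A minimal sketch of that arithmetic follows. The mean volumes of 1.0 and 3.0 cm³ are illustrative assumptions chosen to reflect the roughly three-fold difference described in the letter, not values reported by Sur et al.; the sign convention (manual minus semi-automated) and the function names are likewise assumed.

```python
def absolute_loa(mean_volume_cm3, loa_percent):
    """Convert limits of agreement given as a percentage of the mean to cm³."""
    return mean_volume_cm3 * loa_percent / 100.0


def split_pair(mean_cm3, percent_difference):
    """Recover two measurements from their mean and percentage difference.

    Assumes percent_difference = 100 * (first - second) / mean.
    """
    difference = mean_cm3 * percent_difference / 100.0
    first = mean_cm3 + difference / 2.0
    second = mean_cm3 - difference / 2.0
    return first, second


# Hypothetical mean volumes: manual 1.0 cm³, semi-automated three-fold larger.
manual_loa = absolute_loa(1.0, 25.0)
semi_loa = absolute_loa(3.0, 16.0)
print(f"manual LOA: ±{manual_loa:.2f} cm³, semi-automated LOA: ±{semi_loa:.2f} cm³")

# Figure 5 example: mean 1.0 cm³, difference of −150% (manual minus semi-automated).
manual, semi = split_pair(1.0, -150.0)
print(f"manual {manual} cm³, semi-automated {semi} cm³, ratio {semi / manual}")
```

With these assumed means, the narrower percentage limits of the semi-automated technique correspond to the wider absolute limits (about 0.48 vs. 0.25 cm³), and the Figure 5 point resolves to 0.25 and 1.75 cm³, the seven-fold difference described in comment 5.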
