Diabetic Retinopathy Assessment Variability Among Eye Care Providers in an Urban Teleophthalmology Program

Abstract Background: Teleophthalmology is an evidence-based method for diabetic eye screening. It is unclear whether the type of eye care provider performing teleophthalmology interpretation produces significant variability. Introduction: We assessed grading variability between an optometrist, general ophthalmologist, and retinal specialist using images from an urban, diabetic retinopathy teleophthalmology program. Methods: Three readers evaluated digital retinal images in 100 cases (178 eyes from 90 patients with type 2 diabetes). Fisher's exact test, percent agreement, and the observed proportion of positive (Ppos) or negative agreement (Pneg) were used to assess variability. Results: Among cases deemed gradable by all three readers (n = 65), there was substantial agreement on absence of any retinopathy (88% ± 4.6%, Pneg = 0.91–0.95), presence of moderate nonproliferative or worse retinopathy (87% ± 3.9%, Ppos = 0.67–1.00), and presence of macular edema (99% ± 0.9%, Ppos = 0.67–1.00). There was limited agreement regarding presence of referable nondiabetic eye pathology (61% ± 11%, Ppos = 0.21–0.59) and early, nonroutine referral for a follow-up clinical eye exam (66% ± 8.1%, Ppos = 0.19–0.54). Among all cases (n = 100), there was acceptable agreement regarding which had gradable images (77% ± 5.0%, Ppos = 0.50–0.90). Discussion: Inclusion of multiple types of eye care providers as teleophthalmology readers is unlikely to produce significant variability in the assessment of diabetic retinopathy among high-quality images. Greater variability was found regarding image gradability, nondiabetic eye pathology, and recommended clinical referral times. Conclusions: Our results suggest that more extensive training and uniform referral standards are needed to improve consensus on image gradability, referable nondiabetic eye pathology, and recommended clinical referral times.


Introduction
An estimated 29.1 million American adults have diabetes mellitus and among them, approximately one quarter have diabetic retinopathy. 1 Early detection and treatment dramatically reduce the risk of severe vision loss, yet diabetic retinopathy remains the leading cause of blindness among working-age U.S. adults, largely due to low screening rates. 2 Indeed, fewer than 50% of Americans with diabetes mellitus receive yearly diabetic eye screening as recommended by the American Diabetes Association (ADA), the American Academy of Ophthalmology (AAO), and the American Optometric Association (AOA). [2][3][4] An urgent need exists for improved access to diabetic eye screening. The rapidly increasing prevalence of diabetes, expected to more than double by the year 2050, has created a growing demand for screening that cannot be met by the current eye care provider workforce. 5,6 As an alternative to traditional screening methods, where an eye care provider performs a dilated eye examination, teleophthalmology programs using store-and-forward retinal photographs have been well validated to provide high-quality diabetic eye screening, to improve access to care, and to reduce blindness. [7][8][9][10] Teleophthalmology is an evidence-based, cost-effective approach to diabetic eye screening that represents one of many dramatic advances in telemedicine on the verge of significant widespread adoption in the United States. 11,12 These retinal photographs can be obtained in the primary care setting where 90% of patients with known diabetes obtain medical care. 13 This technology is particularly well suited for preventing blindness in urban minority populations such as Latinos, who are at greater risk for vision loss due to a higher prevalence of more severe diabetic retinopathy and lower screening rates. 14
While teleophthalmology has become increasingly accepted for diabetic eye screening in primary care settings, the availability of adequately trained, qualified readers may limit its widespread implementation. 15,16 Prior studies have compared many different types of readers, including general practitioners, nurses, optometrists, and ophthalmologists. 15 Most U.S. teleophthalmology programs currently use eye care providers as readers, but training backgrounds among these readers can vary significantly. It remains unclear whether the type of eye care provider performing teleophthalmology interpretation produces significant variability. This study is the first to directly assess variability between an optometrist, a general ophthalmologist, and a fellowship-trained retina specialist for teleophthalmology evaluation of diabetic retinopathy.

TELEOPHTHALMOLOGY IMAGE ACQUISITION AND PRESENTATION
Three readers assessed a set of masked telemedicine cases from 90 patients (178 eyes) with type 2 diabetes mellitus, who participated in a clinical teleophthalmology program for diabetic eye screening. All images were acquired by a trained ophthalmic photographer from 2008 to 2010 at a single urban primary care clinic affiliated with Massachusetts General Hospital located in Chelsea, Massachusetts, using a nonmydriatic camera (Topcon TRC NW-6S, Topcon Medical Systems, Inc., Tokyo, Japan). Four nonmydriatic, nonstereoscopic digital images were acquired from each eye following the validated Joslin 3-field technique (i.e., 45-degree views of the posterior pole, area nasal to the disc, and area superotemporal to the disc) plus an external photograph. 17 Images from both eyes were included with the exception of 2 cases with monocular patients in whom their only eye was imaged. To assess intrareader reliability, 10 cases were duplicated and all 3 readers independently assessed 100 masked case vignettes from 90 patients. In addition to retinal photographs, each masked telemedicine case included a brief synopsis of the patient's demographic information and medical history, including age, gender, self-reported ethnicity, diabetes type, diabetes medications, last hemoglobin A1c, and past medical history. Readers viewed all masked telemedicine cases within Microsoft PowerPoint (Microsoft, Inc., Redmond, WA).

READER BACKGROUNDS AND GRADING INSTRUCTIONS
The readers were a Veterans Affairs (VA) Medical Center optometrist (D.H.), an academic general ophthalmologist (C.K.), and an academic fellowship-trained retina specialist (L.K.). All readers completed the standard teleophthalmology diabetic retinopathy reader training program (an electronic independent study course) instituted by the VA healthcare system before initiation of the study. All were active readers in clinical teleophthalmology programs, but the optometrist had several more years of experience from her prior participation in the VA teleophthalmology program.
Readers had not previously evaluated any of the teleophthalmology images in this study and did not have access to the patients' prior clinical assessments for diabetic retinopathy. All readers were oriented to the navigation of the telemedicine case vignettes and the electronic grading sheet by the same investigator (Y.L.) using a series of five sample cases that were not included in the data analysis. Readers were instructed on how to adjust images as needed to maximize gradability, including adjustment for brightness and contrast, as well as for increasing magnification over any area of interest. Readers were informed that the intent of the study was to assess variability in telemedicine diabetic retinopathy assessments.
The readers were asked to determine whether images from each case were of sufficient quality to be gradable for diabetic retinopathy, whether there was any diabetic retinopathy present, the severity of diabetic retinopathy if present, and whether there was referable nondiabetic eye pathology (e.g., glaucomatous optic nerve or advanced macular degeneration). For each case, readers selected one of the following three options for recommended referral time to a clinical eye examination: (1) urgent/within 1 month, (2) nonurgent/within 2-6 months, or (3) routine/within 1 year. Diabetic retinopathy was evaluated on a three-level severity scale: (1) none, (2) mild nonproliferative, or (3) moderate nonproliferative or worse based on the International Classification of Diabetic Retinopathy. 18 If images were considered ungradable, then all remaining assessments regarding the absence or presence of retinopathy, macular edema, or nondiabetic eye pathology were categorized as "unable to determine." There were no synchronous clinical dilated eye examinations or dilated seven-field ETDRS retinal photographs to serve as reference standards because study images were previously acquired in an active clinical teleophthalmology program. 19

STATISTICAL ANALYSIS
Inter-reader and intrareader variability were assessed using Fisher's exact test and percent agreement. In addition, a statistical measure known as the observed proportion of positive (Ppos) or negative agreement (Pneg) was used in place of kappa (κ) to confirm agreement because of the "kappa paradox," wherein kappa values are artificially low due to the infrequency of retinopathy in this community-based sample. 20,21 Values of 70-90% agreement are generally considered acceptable for inter-reader reliability. 22,23 Percent agreement has been used in similar studies and can be safely used since readers were well-trained eye care providers who were unlikely to be guessing in their assessments. 8,24 Agreement was considered limited if <70%, acceptable if 70% or greater, and substantial if 80% or greater. All statistical calculations were made using Stata software (StataCorp LP, College Station, TX).
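The agreement measures named above can be illustrated with a short sketch. It assumes a 2×2 table for two readers and one binary finding, with the standard definitions Ppos = 2a/(2a + b + c) and Pneg = 2d/(2d + b + c), where a and d count the cases on which both readers agree that the finding is present or absent, and b and c count the discordant cases. The function names and the example counts are hypothetical and are not taken from the study data; the study's own calculations were made in Stata.

```python
def agreement_stats(a, b, c, d):
    """Agreement measures for two readers rating one binary finding.

    a: both readers positive
    b: reader 1 positive, reader 2 negative
    c: reader 1 negative, reader 2 positive
    d: both readers negative
    """
    n = a + b + c + d
    if n == 0:
        raise ValueError("at least one case is required")
    nan = float("nan")
    percent = (a + d) / n                      # observed (percent) agreement
    pos_denom = 2 * a + b + c
    neg_denom = 2 * d + b + c
    ppos = 2 * a / pos_denom if pos_denom else nan
    pneg = 2 * d / neg_denom if neg_denom else nan
    # Chance-expected agreement from the marginal totals, used by Cohen's kappa.
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    kappa = (percent - expected) / (1 - expected) if expected < 1 else nan
    return {"percent_agreement": percent, "ppos": ppos, "pneg": pneg, "kappa": kappa}


def agreement_label(percent):
    """Thresholds stated in the text: <70% limited, >=70% acceptable, >=80% substantial."""
    if percent >= 0.80:
        return "substantial"
    if percent >= 0.70:
        return "acceptable"
    return "limited"


# Hypothetical counts for a rare finding among 65 cases (not study data).
example = agreement_stats(a=2, b=3, c=3, d=57)
for name, value in example.items():
    print(f"{name}: {value:.2f}")
print(agreement_label(example["percent_agreement"]))
```

With these hypothetical counts, percent agreement is about 91% and Pneg is 0.95, yet kappa is only 0.35. This is the pattern the kappa paradox describes when a finding is infrequent.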

ETHICS/IRB APPROVAL
This study was approved by the Human Studies Committee/ Institutional Review Board at the Massachusetts Eye and Ear Infirmary and complied with the tenets of the Declaration of Helsinki.

Results
Among all three readers in this study, the average percentage of all telemedicine cases (n = 100) found to have clinically significant diabetic retinopathy (i.e., moderate nonproliferative or worse) was 3.7% ± 1.5% (range: 2.0-5.0%) and the average percentage of cases found to have macular edema was 2.7% ± 0.6% (range: 2.0-3.0%) (Table 2). These values were not significantly different (p > 0.05) among the three readers. The frequency of cases with images considered ungradable for diabetic retinopathy was variable: 27% by the general ophthalmologist, 17% by the optometrist, and 3% by the retina specialist (p < 0.005). However, inter-reader agreement regarding gradable images for diabetic retinopathy among all cases (n = 100) was acceptable (77% ± 5.0%, Ppos = 0.50-0.90).
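As an illustration of how ungradable rates can be compared with Fisher's exact test, the sketch below computes a two-sided p-value for a single 2×2 table using only the Python standard library. It is a pairwise simplification under stated assumptions: the exact table configuration used in the study is not described, and all three readers graded the same 100 cases, so treating the two readers' counts as independent samples is an assumption made here for illustration only. The function name is hypothetical; the counts follow from the reported 27% and 3% of 100 cases.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    tolerance = 1 + 1e-9  # guards against floating-point ties
    p_value = sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * tolerance
    )
    return min(1.0, p_value)


# Rows: general ophthalmologist, retina specialist.
# Columns: ungradable, gradable (27% and 3% of 100 cases, as reported).
p_value = fisher_exact_two_sided(27, 73, 3, 97)
print(f"p = {p_value:.2g}")
```

Under these assumptions the p-value falls well below the 0.005 threshold reported in the text.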
Intrareader agreement was more variable in each of the diagnostic and disposition categories (Table 4). Overall, there was greater intrareader agreement (80-90%, Ppos = 0.67-1.00) for the optometrist and retina specialist than for the general ophthalmologist (50-70%, Ppos = 0.60-0.77) on gradability of images for diabetic retinopathy, severity of

Discussion
Our study demonstrated substantial inter-reader agreement in diabetic retinopathy assessment among the subset of teleophthalmology cases (n = 65) with high-quality images considered gradable by all three readers: an optometrist, a general ophthalmologist, and a fellowship-trained retina specialist. Prior studies have assessed variability among other types of teleophthalmology readers for diabetic retinopathy. Ruamviboonsuk et al. evaluated inter-reader agreement between three retina specialists, three general ophthalmologists, three ophthalmic nurses, and three ophthalmic photographers in single-field digital retinal photographs. 15 Notably, their study did not include optometrists, who form the majority of readers in the VA healthcare system, one of the largest U.S. teleophthalmology programs. 11 A 2-day instruction course was provided to all nonphysician readers in that study, but no standardized training was provided to general ophthalmologists or retina specialists. The authors found only fair agreement between and within groups, except for retina specialists, who showed moderate to substantial intragroup agreement. Vujosevic et al. demonstrated similar findings among retina specialists. 25 Longer training may be needed for nonphysician readers, who showed moderate to substantial agreement with   16 While using nonphysician readers may expand reader availability, investment in a year-long training program may not be practical in many U.S. healthcare settings. In this study, all readers were eye care providers who underwent a brief electronic independent self-study training program instituted by the VA healthcare system. Ruamviboonsuk et al. concluded that retina specialists were generally the most reliable teleophthalmology readers. 15
It is therefore notable that in our study, all cases identified by the retina specialist as having clinically significant diabetic eye disease (i.e., those with macular edema or moderate nonproliferative or worse retinopathy) were also identified as such by the other two readers. Therefore, if we were to use the retina specialist's interpretation as a reference standard, then the general ophthalmologist and optometrist both had 100% sensitivity in identifying vision-threatening diabetic eye disease for early clinical referral, although they referred additional patients (70% and 81% specificity, respectively), often as a result of noting more ungradable images. 18 A previous study comparing the results of in-person clinical diabetic eye examinations between optometrists and general ophthalmologists found that optometrists had a referral sensitivity of 77% and 92% for moderate to severe macular edema and diabetic retinopathy, respectively. 26 Our study demonstrates that telemedicine evaluation by non-retina-specialist eye care providers is highly sensitive for detecting clinically significant, vision-threatening forms of diabetic eye disease when a standardized training program is used and, furthermore, that this sensitivity appears to exceed clinical thresholds currently accepted for in-person dilated eye examinations.
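The sensitivity and specificity comparison described above, in which one reader's interpretation is treated as the reference standard, can be sketched as follows. The function name and the ten-case referral lists are hypothetical and chosen only to show the calculation; they are not the study data.

```python
def sensitivity_specificity(reference, reader):
    """Sensitivity and specificity of one reader against a reference reader.

    Both arguments are equal-length sequences of booleans, one per case,
    where True means the case was flagged for early clinical referral.
    """
    if len(reference) != len(reader):
        raise ValueError("both readers must grade the same cases")
    pairs = list(zip(reference, reader))
    true_pos = sum(1 for ref, obs in pairs if ref and obs)
    false_neg = sum(1 for ref, obs in pairs if ref and not obs)
    true_neg = sum(1 for ref, obs in pairs if not ref and not obs)
    false_pos = sum(1 for ref, obs in pairs if not ref and obs)
    nan = float("nan")
    sensitivity = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else nan
    specificity = true_neg / (true_neg + false_pos) if (true_neg + false_pos) else nan
    return sensitivity, specificity


# Hypothetical example: the reference reader refers 2 of 10 cases; the other
# reader refers the same 2 cases plus 2 more (for instance, ungradable images).
reference_refers = [True, True, False, False, False, False, False, False, False, False]
reader_refers = [True, True, True, True, False, False, False, False, False, False]

sens, spec = sensitivity_specificity(reference_refers, reader_refers)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

In this toy example the second reader misses none of the reference referrals (sensitivity 1.00) but refers two additional cases, which lowers specificity to 0.75. This mirrors the pattern described in the text, where extra referrals lowered specificity without affecting sensitivity.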
Since most patients in our study did not have retinopathy, it was unsurprising that there was greater agreement on the absence of any retinopathy (Pneg) versus its presence (Ppos) and greater agreement on absent or mild retinopathy (Pneg, severity of retinopathy) versus moderate or worse retinopathy (Ppos, severity of retinopathy) due to the difference in the frequencies of these findings in the community. Similarly, because of the lower frequency of referable nondiabetic eye disease and earlier, nonroutine referral, there was greater agreement on the absence of referable nondiabetic eye disease (Pneg) versus its presence (Ppos) and greater agreement on routine (Pneg, recommended referral time) versus earlier, nonroutine referral (Ppos, recommended referral time).
In addition to inter-reader agreement, we also evaluated intrareader agreement by duplicating 10 cases in our dataset. The general ophthalmologist had the lowest intrareader agreement, in part, from having a high percentage (27%) of ungradable image assessments, which led to more frequent assessments of "unable to determine" in each category. The higher frequency of ungradable image assessments by the general ophthalmologist may reflect less experience compared to the retina specialist in evaluating diabetic retinopathy under suboptimal conditions (e.g., through a postoperative gas bubble). Nor did the general ophthalmologist have as much experience with teleophthalmology as the VA optometrist. These data support the existence of a learning curve in teleophthalmology grading among eye care providers. 15 As a result, teleophthalmology programs should consider having more experienced readers supervise newer readers with protocols for ongoing quality assurance, wherein more than one reader assesses a subset of images ("over-reads"). The large range of ungradable rates (3-27%) found among our readers is similar to that of other published studies (4-35%). 8,17,27 Since the absence of retinopathy is more common in our clinical image dataset, the fact that the optometrist and general ophthalmologist noted more ungradable images than did the retina specialist also resulted in their noting fewer cases with no diabetic retinopathy. The wide range of gradability found by readers in our study suggests that improved training modules to standardize criteria for gradability should be developed. This is especially important because unnecessarily high ungradable rates could strain downstream eye care services (e.g., patients with ungradable images are typically referred for clinical evaluation because 50% have significant ocular pathology). 28 In addition, there was limited inter-reader agreement on referable nondiabetic eye disease and recommended clinical referral times. Owsley et al. have found that 44.2% of patients participating in teleophthalmology screening for diabetic retinopathy have nondiabetic eye pathology. 29 Given the high frequency of nondiabetic eye pathology in teleophthalmology programs and the need to regulate the volume of downstream eye clinic referrals, training modules to further standardize referral criteria in these areas are also needed.
One proposed solution to inter-reader variability is automated grading using objective software algorithms. Prior studies have shown that automated grading systems have high sensitivity, but low specificity. 30 A more recent study by Gulshan et al. showed both high sensitivity and specificity for detecting moderate or worse diabetic retinopathy using a deep learning algorithm. 31 However, costs for licensing such software, professional liability, and billing models remain to be determined. Also, these algorithms may be limited in their ability to correctly assess the relative significance of a wide variety of nondiabetic eye pathology. 32 Thus, automated grading systems may be best used in combination with human readers. Future studies should evaluate the optimal implementation of automated grading systems in clinical teleophthalmology programs.
A major strength of our study is that the telemedicine cases came directly from a clinical teleophthalmology program for diabetic retinopathy screening in an urban, primary care clinic. In addition, our study population was unique given the large proportion (73.4%) of Latino patients. This group has been found to have an increased prevalence, incidence, and likelihood of progression of diabetic retinopathy as well as lower rates of diabetic eye screening compared to other ethnic groups. 14,33 Latinos and other high-risk groups with historically low screening rates have the greatest potential to benefit from increased access to diabetic eye screening using teleophthalmology. Thus, it is important for these groups to be well represented in teleophthalmology research studies. Interestingly, the frequency of diabetic retinopathy was substantially lower in our study than that reported previously in a Los Angeles Latino population. 34 This may have resulted, in part, from a lack of patients with type 1 diabetes in our study. In addition, patients with type 2 diabetes obtaining regular care at this community-based primary care clinic may have relatively better long-term glycemic control than those in other parts of the country with less access to care.
Limitations of our study included the absence of a reference standard for the telemedicine cases (e.g., synchronous in-person examination or ETDRS 7-field images) since our cases came directly from a real-world teleophthalmology program. 19 Furthermore, a single individual represented each type of eye care provider in our study. Some of our results may be attributable to unique aspects of each reader, and larger studies with more readers may help to determine the generalizability of our findings. Although all readers underwent standardized training, there were no interim assessments to provide readers with additional feedback. Such feedback may be critical for ongoing quality assurance, and the grading of a subset of reads by at least one other experienced reader should be used for quality control purposes in teleophthalmology programs. Another limitation was the small number of duplicates (10%) used to assess intrareader variability. However, this exceeded the proportion of duplicate reads (5%) used as quality control standards for retinal image assessments in large-scale research studies. 35
In summary, substantial inter-reader agreement in diabetic retinopathy assessment was found among three telemedicine readers (an optometrist, a general ophthalmologist, and a retina specialist) for cases with high-quality images. This is the first study directly comparing variability in diabetic teleophthalmology evaluation between readers with these training backgrounds. Our study suggests that the inclusion of multiple types of eye care providers as teleophthalmology readers is unlikely to produce significant variability in diabetic retinopathy grading when a standardized training program is used. Training modules to further standardize criteria for image gradability, referable nondiabetic eye pathology, and recommended clinical referral times should be developed. Continued advances in image assessment are needed to expand access to diabetic eye screening using teleophthalmology to meet the growing demand in communities worldwide.