Ethical considerations
This study followed the national and international guidelines set out in the Declaration of Helsinki and complied with the legal requirements on data confidentiality (Law 15/1999 of 13 December on Personal Data Protection [LOPD]).
Mammogram selection
A random sample of 200 mammograms was selected from asymptomatic women aged 50 to 64 years who had participated in the first and second rounds of a population-based breast cancer screening program in Spain. A total of 33,435 mammograms were stratified so that the sample included the four possible results of screening: true negatives, true positives, false negatives, and false positives. These results were validated by comparing the original interpretation obtained in the screening program with the result of the mammogram performed in the following round (2 years later). Histological confirmation was available for all women with a final diagnosis of cancer (both carcinoma in situ and invasive carcinoma). The mammograms were performed in Barcelona (Spain) between 1996 and 1999.
Of the 200 mammograms selected, 30 (15%) corresponded to women with a definitive diagnosis of cancer (14% true positives, 1% false negatives). The remaining 170 mammograms (85%) corresponded to women with a definitive result of absence of cancer (55% true negatives, 30% false positives by recall). For each participant, two views of each breast were taken (craniocaudal and mediolateral oblique), for a total of four films per participant. All mammograms met the following minimum quality criteria: the breast was situated centrally with the nipple in profile, all the breast tissue was visualized, the pectoral muscle shadow reached the nipple level, and the inframammary angle could be visualized. We excluded a small number of mammograms that did not meet these criteria, as well as those from women requiring more than one film in one of the views, women who had undergone plastic surgery, women with breast implants, and women with radiopaque skin markers on the breast.
Original films (not copies) were always used. All the mammograms were obtained with a standard film-screen technique (Toshiba SSH 140 A and Bennett Trex Medical) using Agfa Mamoray-HT film.
Radiologists
A random sample of 28 radiologists was selected from the radiology services of different health centers in Spain (general hospitals, district hospitals and primary care centers).
Before data collection began, the radiologists were asked whether they routinely interpreted mammograms. Depending on their responses, they were divided into two groups. The first group included 21 radiologists who routinely read mammograms but had different amounts of experience, while the second group included seven radiologists who read mammograms infrequently or who were medical residents in radiology (radiologists not routinely interpreting mammograms).
Experience-related variables
Experience of mammogram interpretation was evaluated in 21 routine readers through a questionnaire designed after a literature review of the possible factors related to radiologists' self-reported experience.[11] Telephone interviews were performed by one of the project's researchers, with prior agreement from participating radiologists. The items referred to routine practice in mammogram interpretation during the year prior to participation in the study. The following experience-related factors were taken into account:
Annual reading volume. This variable included both screening and diagnostic mammograms. Annual volume was calculated from the number of readings made per week, accounting for holiday periods and rotations.
Consultations. Radiologists were asked whether they routinely (frequently) consulted with other radiologists when interpreting mammograms. This variable is an indicator of whether the mammogram reading was performed individually or as a team.
Years of experience in reading mammograms. The number of years of experience reading both diagnostic and screening mammograms was evaluated without taking into account years of specialist practice.
Radiologists' age. Age at interview (as a proxy variable of experience).
Focus on breast radiology. This variable was the percentage of radiologists' working hours devoted to breast radiology, including both mammograms and other diagnostic techniques.
Feedback. Radiologists were considered to obtain feedback when they followed up women for whom they had recommended further workup after the screening test (imaging tests or invasive procedures).
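As an illustration of the annual reading volume variable, the calculation can be sketched as follows. The text does not give the exact formula, so this is a minimal sketch that assumes weekly readings are multiplied by the number of weeks actually worked; the function name and the `weeks_off` and `weeks_per_year` parameters are hypothetical.

```python
def annual_reading_volume(readings_per_week: float,
                          weeks_off: float = 0.0,
                          weeks_per_year: float = 52.0) -> float:
    """Estimate yearly mammogram readings from a self-reported weekly count.

    weeks_off is an assumed parameter covering holidays and rotations spent
    away from mammography; the source does not state the exact formula.
    """
    if not 0.0 <= weeks_off <= weeks_per_year:
        raise ValueError("weeks_off must lie between 0 and weeks_per_year")
    return readings_per_week * (weeks_per_year - weeks_off)


# Hypothetical radiologist: 100 readings per week, 6 weeks away per year.
volume = annual_reading_volume(readings_per_week=100, weeks_off=6)
print(volume)  # 4600.0
```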
Reading procedure
Because the aim was to reproduce normal mammogram reading practice as closely as possible, the 28 radiologists independently read the set of 200 mammograms at their workplace. For each breast, the radiologists provided information on the following variables: result, breast density (from less dense to more dense), lesion pattern (nodular, distorting fibrous, mixed, calcified, and parenchymatous asymmetry) and location of the lesion. The results of readings were reported according to the Breast Imaging Reporting and Data System (BI-RADS).[12,13] When there was more than one lesion in the same breast, only the most severe lesion was reported. At no time were previous mammograms available to radiologists for comparison while interpreting films.
At the beginning of the study, a session was held with all the radiologists to standardize the criteria for recording mammogram data. The radiologists recorded the results on a standard data collection form that included the completion rules explained in the initial session.
The participating radiologists were blinded to both the study design and the proportion of cancers in the sample, although they were informed that cancer cases were oversampled.
Statistical analysis
To calculate sensitivity and specificity, mammograms were considered positive (women were recalled for additional investigations) when classified as BI-RADS 0, III, IV or V. Readings were considered negative when they were classified as BI-RADS I or II. A single BI-RADS category was determined for each woman, based on the more malignant of the two breasts.
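The dichotomization described above can be sketched in a few lines. This is an illustrative sketch rather than the study's own code: it assumes that, for the positive/negative outcome, taking the more malignant breast is equivalent to calling a woman positive when either breast falls in a recall category, which avoids having to rank category 0 against the others. The function names are hypothetical.

```python
# BI-RADS categories treated as positive (recall) and negative, as stated in the text.
POSITIVE_CATEGORIES = {"0", "III", "IV", "V"}
NEGATIVE_CATEGORIES = {"I", "II"}


def breast_is_positive(category: str) -> bool:
    """Return True when a single breast's BI-RADS category implies recall."""
    if category in POSITIVE_CATEGORIES:
        return True
    if category in NEGATIVE_CATEGORIES:
        return False
    raise ValueError(f"unknown BI-RADS category: {category!r}")


def woman_is_positive(left: str, right: str) -> bool:
    """Assumed per-woman rule: positive if either breast is positive."""
    return breast_is_positive(left) or breast_is_positive(right)


print(woman_is_positive("I", "II"))   # False
print(woman_is_positive("II", "0"))   # True
```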
For the univariate analysis, sensitivity and specificity were evaluated according to each experience-related variable, stratified into two levels with a cut-off indicating presumably less and presumably more experience in mammogram reading. Moreover, a global measure of accuracy was calculated: the radiologist was assumed to be accurate when classifying mammograms from women with cancer as positive and those from women without breast cancer as negative.
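The three measures follow directly from these definitions. A minimal sketch, using a hypothetical list of (has_cancer, read_positive) pairs and a hypothetical function name:

```python
from typing import Dict, Iterable, Tuple


def diagnostic_measures(readings: Iterable[Tuple[bool, bool]]) -> Dict[str, float]:
    """Compute sensitivity, specificity and accuracy from individual readings.

    Each reading is a (has_cancer, read_positive) pair.
    """
    tp = fn = tn = fp = 0
    for has_cancer, read_positive in readings:
        if has_cancer and read_positive:
            tp += 1
        elif has_cancer:
            fn += 1
        elif read_positive:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one cancer and one non-cancer reading")
    return {
        "sensitivity": tp / (tp + fn),   # cancers read as positive
        "specificity": tn / (tn + fp),   # non-cancers read as negative
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Hypothetical readings, for illustration only.
example = [(True, True), (True, False), (False, False), (False, False), (False, True)]
print(diagnostic_measures(example))
```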
The statistical significance of differences in sensitivity, specificity and accuracy, both between the radiologists not routinely interpreting mammograms and the group of routine readers and between the two levels of routine readers for each experience-related variable, was determined through chi-square tests.
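For a comparison between two groups, such a test reduces to a chi-square test on a 2x2 table of correct and incorrect readings. The sketch below computes the Pearson statistic without continuity correction; the text does not say whether a correction was applied, so that choice is an assumption, and the counts in the example are hypothetical.

```python
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square test (1 df, no continuity correction) for [[a, b], [c, d]].

    Rows are the two reader groups; columns are correct and incorrect readings.
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts of detected and missed cancers in two reader groups.
statistic, p_value = chi_square_2x2(500, 130, 150, 60)
print(f"chi-square = {statistic:.2f}, p = {p_value:.4f}")
```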
Sensitivity, specificity and accuracy were then modeled using multivariate logistic regression, estimated through the method described in detail by Smith-Bindman et al.[14] These models were adjusted for all the experience-related variables. Because of their characteristics, the seven radiologists not routinely interpreting mammograms were excluded from this regression. Given that 15% of the women in the sample had cancer and 85% were cancer-free, weights were used when estimating accuracy to assign equal importance to the interpretation of mammograms from women with and without cancer. The analysis took into account the correlation arising from the fact that readings were not independent, since each radiologist interpreted the same 200 films. Therefore, marginal models were estimated based on generalized estimating equations (GEE). The analysis used a logit link and an exchangeable working correlation structure. The GENMOD procedure of SAS 9.1 was used.
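The weighting step can be illustrated as follows. The exact weights are not given in the text, so this sketch assumes each group (cancer and no cancer) is scaled to carry half of the total weight; under that assumption, weighted accuracy equals the mean of sensitivity and specificity. The function names are hypothetical.

```python
from typing import Tuple


def balancing_weights(n_cancer: int, n_no_cancer: int) -> Tuple[float, float]:
    """Per-reading weights giving both groups equal total weight (assumed scheme)."""
    if n_cancer <= 0 or n_no_cancer <= 0:
        raise ValueError("both groups must be non-empty")
    total = n_cancer + n_no_cancer
    return total / (2 * n_cancer), total / (2 * n_no_cancer)


def weighted_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Accuracy with cancer and non-cancer readings weighted equally."""
    w_cancer, w_no_cancer = balancing_weights(tp + fn, tn + fp)
    correct = w_cancer * tp + w_no_cancer * tn
    total = w_cancer * (tp + fn) + w_no_cancer * (tn + fp)
    return correct / total


# Sample composition from the text: 30 cancers and 170 non-cancers per reader.
w_cancer, w_no_cancer = balancing_weights(30, 170)
print(round(w_cancer, 3), round(w_no_cancer, 3))  # 3.333 0.588

# Hypothetical reader: all 30 cancers detected, half of the 170 non-cancers recalled.
print(weighted_accuracy(tp=30, fn=0, tn=85, fp=85))
```

With these weights, a reader who recalls many women without cancer is penalized as heavily as one who misses the same proportion of cancers, even though non-cancers outnumber cancers in the sample.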