Evaluation of Speakers at a National Radiology Continuing Medical Education Course
Jannette Collins, MD, MEd, FCCP*, Brian F. Mullan, MD†, J. M. Holbert, MD‡
The field of continuing medical education (CME) has become a subject for investigation
over the past two decades. CME studies have expanded dramatically as a
result of a need to know why and how physicians learn, and how formal
and informal education contributes to the medical practice of competent
physicians.1 When asked for a definition of CME, many physicians
will describe a short course with instructors presenting didactic lectures
to large groups of physicians sitting for long hours in a hotel conference
room in rows of narrow tables. The lecture is the staple format of CME
course instruction. However, few if any studies have examined the features
of effective or ineffective radiology CME lectures. A query of PubMed
(http://www.pubmed.com) with the search
terms “radiology CME lectures”, “radiology CME”,
and “radiology lectures” revealed articles about delivering
CME electronically and the use of software in making digital slides, but
no articles specifically related to what constitutes effective lecturing
at radiology CME courses.
Materials and Methods
The Society of Thoracic Radiology (STR) annual meeting took place at the Fairmont Hotel in San Francisco, March 24-28, 2002. The course was jointly sponsored by the STR and the Radiological Society of North America (RSNA), which is accredited by the Accreditation Council for Continuing Medical Education (ACCME) to sponsor continuing medical education for physicians. The course was designated as meeting the criteria for up to 37 credit hours in Category 1 of the Physician’s Recognition Award of the American Medical Association.
Twenty-two sessions were programmed for the five-day course. One session was cancelled due to an absent speaker. Sessions included from two to seven speakers and lasted from one-half to five hours. All sessions consisted of didactic lecture presentations supplemented with 1-3 page handouts. All but one speaker used digital presentation; one speaker utilized traditional slides. Attendees were supplied with a booklet containing evaluation forms (Figure 1) to use in evaluating each session. The form asked participants to evaluate ten items from one to four, where one represented “strongly agree” and four represented “strongly disagree.” None of the items allowed for evaluation of individual speakers, and all of the sessions evaluated included more than one speaker. Item #11 asked participants to respond to the following: “Please tell us what topic you think would be most important to have covered in the future.” Participants’ responses were tallied and collated by the Department of Data Management at the RSNA to provide the following data: total respondents, total respondents per day, total program evaluations, distribution of program evaluations by day, percent distribution of number of sessions attended, percent distribution of total credits earned, meeting attendance by session number, mean ratings for each session (items 1-10), and a summary of verbatim comments to item #11. Comments were evaluated for total number and common themes by one radiologist (JC) and subsequently reviewed by two other radiologists (JMH, BM).
The STR Training Committee implemented a system to evaluate individual speakers through the use of scannable forms. Each participant was provided with a set of forms that listed each speaker’s name, title of presentation, and time of presentation. Participants were asked to rate each speaker’s overall performance from one to seven, where 1=very poor, 2=poor, 3=unsatisfactory, 4=satisfactory, 5=good, 6=very good, and 7=outstanding. A space was provided for comments for each speaker. The forms were scanned at the University of Iowa, and the data were entered into computer spreadsheets. Raw data included speakers’ names and the scores they received. From these data, the following were computed: an average score for each speaker, an average score for all speakers, the total number of evaluations per speaker, and the average number of evaluations per speaker. Comments were collated and analyzed to determine the number of positive and negative comments and common themes. All comments were initially reviewed by one radiologist (JC) and subsequently by two additional radiologists (JMH, BM). Criteria for what constituted a negative or positive comment were not pre-determined and were based on subjective analysis by each reviewer.
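The tallying just described is straightforward aggregation. As a minimal illustrative sketch (not the committee’s actual software, and with hypothetical speaker names and scores), the per-speaker and overall summaries could be computed as:

```python
from collections import defaultdict

def summarize_ratings(records):
    """Aggregate (speaker, score) pairs, where scores use the 1-7 scale
    (1=very poor ... 7=outstanding), into per-speaker and overall summaries."""
    by_speaker = defaultdict(list)
    for speaker, score in records:
        by_speaker[speaker].append(score)
    per_speaker = {
        name: {"average": sum(s) / len(s), "evaluations": len(s)}
        for name, s in by_speaker.items()
    }
    all_scores = [score for _, score in records]
    overall = {
        "average_score": sum(all_scores) / len(all_scores),
        "evaluations_per_speaker": len(all_scores) / len(by_speaker),
    }
    return per_speaker, overall

# Hypothetical example: two speakers, two evaluations each.
per_speaker, overall = summarize_ratings(
    [("Speaker A", 6), ("Speaker A", 7), ("Speaker B", 5), ("Speaker B", 4)]
)
```

The same aggregation extends directly to the counts reported below (total evaluations per speaker, average number of evaluations per speaker).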
Results
Session evaluations - The total number of respondents was 234 (75.7%) of 309 professional registrants; the number of respondents per day ranged from 226 on the second day of the course to 153 on the last day. A total of 3,879 evaluations were submitted, ranging from 1,016 (26.2%) of 3,879 on the second day to 385 (9.9%) of 3,879 on the last day. One hundred sixty-seven (71.4%) of 234 respondents attended fifteen or more of the 22 sessions. One of the 22 sessions was cancelled, yet 53 (1.37%) of 3,879 responses related to this cancelled session. The average and median number of sessions attended were 16.58 and 18.00 (standard deviation 4.65, range 1 – 22). The average and median credits earned were 28.97 and 30.88 (standard deviation 7.49, range 3.75 – 37.00). The average and median number of respondents per session were 176.32 and 188.00 (standard deviation 42.08, range 53.00 – 222.00).
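The averages, medians, and standard deviations reported here are standard descriptive statistics. A minimal sketch of how such a summary can be computed, using Python’s standard library and hypothetical attendance counts for illustration only:

```python
import statistics

def describe(values):
    """Return the mean, median, sample standard deviation, and range
    of a list of numeric values."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "range": (min(values), max(values)),
    }

# Hypothetical sessions-attended counts, for illustration only.
summary = describe([22, 18, 18, 15, 10, 5])
```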
Mean ratings for items 1 – 10 for all sessions ranged from 1.28 – 2.05, with one representing the most positive response and four the least positive. The median rating for items 1 – 10 for all sessions was either one (n = 102) or two (n = 117), except for one median rating of 1.5. Standard deviations ranged from 0.451 to 0.902. A “comment” was defined as one statement or set of statements from one participant for one session. The total number of comments was 152. Not all sessions received comments. There were no illegible comments. Participants were asked to comment on what topic they thought would be most important to have covered in future meetings, and 57 (38%) of 152 comments were related to this question (e.g. “More MRI”, “More on micronodules”). Many of the comments were not related to the question. Sixteen (10.5%) of 152 comments were related to poor image quality (e.g. images too dark, images too small, images not seen on inferior portion of slide, animation effects distracting). Five (3.3%) of 152 comments related to a recommendation to have more time for discussion and questions. Four (2.6%) of 152 comments related to a recommendation for a multidisciplinary approach at future courses (e.g. including radiologists, pulmonologists, cardiothoracic surgeons, and thoracic pathologists). Three (2.0%) of 152 comments were recommendations for having a specific speaker return and give a longer presentation on PowerPoint lecturing. There were no other dominant themes repeated in the comments submitted. The remainder of the comments varied and related to participant satisfaction with the course amenities (e.g. “I was disappointed in the breakfast selection”); satisfaction with the lecture room (e.g. “Room not dark enough”); superlatives (e.g. “Outstanding”); and the organization of the course (e.g. “Speakers left prior to discussion”, “Need to keep speakers on time”).
Speaker evaluations - The total number of non-zero data points was 12,602. The number of responses for each rating from one to seven was the following: 1(n=14), 2(n=9), 3(n=130), 4(n=1053), 5(n=3251), 6(n=5399), 7(n=2746). The total number of speakers evaluated was 81. One speaker, who delivered the Benjamin Felson Memorial Lecture, was not evaluated. Nine speakers each gave two presentations. Evaluations for speakers who gave two presentations were summed and not individually recorded; therefore, speakers who gave two presentations generally received twice as many evaluations as a speaker who gave only one presentation.
The average speaker rating was 5.7 (standard deviation 0.94, range 4.3 – 6.4). The average number of evaluations per speaker was 153. The number of evaluations per speaker ranged from 2 to 313 (the speaker with 313 evaluations gave two presentations). Four speakers received an average rating of five or less (one received only two evaluations; the others received 135, 159, and 158 evaluations).
A “comment” was defined as one statement or set of statements from one participant for one speaker. The total number of comments was 914. Fifty-two comments were deleted because it was not obvious whether they were negative or positive, or words were missing that compromised the meaning (e.g. “Images, especially from older generation CT scanners”, “Really wild case”, “Basic stuff”); they were unrelated to the quality of the presentation (e.g. “I can’t rate myself”, “She has given this lecture for years”); the statements were nonjudgmental, offered advice, or asked questions (e.g. “More negative than my personal experience” [PACS talk]); or they were related to the organization of the course and not individual speakers (e.g. “Too many talks on catheters this year”). Fifteen (28.8%) of the 52 deleted comments were related to redundancy in the program, specifically the topics of functional lung imaging, imaging of the pleura, lines and monitoring devices, and missed lung cancer. The total number of comments analyzed further was 862.
The total number of comments considered positive was 505 (58.6%) of 862. Seventy-six (93.8%) of 81 speakers received one or more positive comments. The total number of comments considered negative was 404 (46.9%) of 862. Seventy-two (88.9%) of 81 speakers received one or more negative comments. The highest number of negative comments for any one speaker was forty-three. Since several comments included more than one statement, and a single comment could include both positive and negative statements, the total number of positive and negative comments exceeded 862.
Review of all negative comments revealed common themes (Table 1). One hundred seven (12.4%) of 862 comments were negative and related to content of the presentation (e.g. no data provided, too many numbers and statistics, no significant advances shown, biased, out of date, not enough examples/images shown, too complex/not simple enough/confusing, disorganized, overinclusive, didn’t follow printed program, not practical for most radiologists, need more on applications [related to CT angiography presentation], too technical, not enough technical information, need to describe images, not concrete enough, lacking in science with too much opinion, imprecise, not clinically focused, not enough pathologic focus, no focus, and incorrect terminology [referring to an MRI study as a CT scan]). Fifty-one (63%) of 81 speakers received one or more negative comments related to content. Seventy-four (8.6%) of 862 comments were negative and related to the delivery of the presentation (e.g. monotone voice, talked too fast, voice too sharp, read the slides, unenthusiastic, rambled, stammered, dull, negative tone, condescending, not dynamic, tentative, not loud enough, too many “ums” and “uhs”, didn’t speak well into the microphone, shaking pointer, sloppy language, and didn’t use laser pointer/cursor). Thirty-one (38.3%) of 81 speakers received one or more negative comments related to delivery. Seventy-two (8.4%) of 862 comments were related to poor image slides (e.g. poor contrast, images too small, too few images, and cine images moved too fast). Nineteen (23.5%) of 81 speakers received negative comments related to image slides. One speaker received 37 such comments. Fifty-eight (6.7%) of 862 comments were related to poor command of the English language. Nine (11.1%) of 81 speakers received negative comments related to language, and one speaker received 22 such comments.
Two telling comments were: “I’d rather have an unknown presenter than a famous speaker with bad English” and “Poor English can ruin the best talk.” Thirty-seven (4.3%) of 862 comments were related to poor quality of text slides (e.g. too many lines per slide, too many words per line, lines extending too far inferiorly on the slide, spelling errors, distracting animation effects, too many graphs, and poor color scheme). Twenty-six (32%) of 81 speakers received one or more negative comments related to poor quality of text slides. Nineteen (2.2%) of 862 comments were related to poor handout material (e.g. didn’t follow slides, poor references, no handout/references, and not detailed enough). Fourteen (17.3%) of 81 speakers received one or more negative comments related to handouts.
Discussion
Lectures have been viewed as a poor method to promote the development of thinking skills or the formation of attitudes.2-4 The main reason for this is the lack of involvement of the students, who remain passive recipients of information. However, when done effectively, the lecture can allow students to learn new material, explain difficult concepts, organize thinking, promote problem solving, and challenge attitudes.5-7 Lectures remain the most popular and desired form of teaching as judged by participant responses at a national CME course (unpublished participant evaluation data from the 2002 annual meeting of the American Roentgen Ray Society, Atlanta, GA).
Comments from the 2002 STR individual speaker evaluations were both positive (58.6%) and negative (46.9%). Since a single “comment” could include both positive and negative statements, the total exceeds 100%. Analysis of the negative comments revealed common themes: poor content, poor delivery, poor image slides, poor use of the English language, poor text slides, and poor quality handouts. The theme that was commented on most often (12.4%), and directed at the largest number of faculty (63%) was poor content. This information can be used to provide feedback to speakers on how their presentations can be improved. It can also be used by persons involved in directing “teach the teachers” courses. “Teach the teachers” workshops and courses, which introduce participants to interactive lecturing, lead to lectures that increase student participation and involvement in the large class lecture.8
Several initiatives for “teaching the teachers” are in place.9 A study that involved interviews with medical educators, review of the literature, analysis of the main themes in 11 “teaching the teachers” courses, and a survey of 593 physicians (including radiologists) in England found the following to be the key themes in training medical teachers: 1) giving feedback constructively, 2) keeping up to date as a teacher, 3) building a good educational climate, 4) assessing the trainee and his/her learning needs, and 5) practical teaching skills.10 Another study,11 aimed at helping hospital consultants identify their needs in relation to teaching skills, showed that physician teachers need to acquire and update their teaching skills by attending courses that include basic teaching and assessment/appraisal skills.
Gelula12 reported on aspects of voice clarity and speaking speed, approaches to using audiovisual aids, effectively using the audience as a resource, and ways to be entertaining as keys to effective lecturing. Specific recommendations included: 1) practicing lecturing, 2) listening to one’s own voice and being deliberate, 3) timing oneself, 4) not talking to the slides (rather, looking at the audience during most of the presentation, and looking to everyone in the audience), 5) not reading slides word-for-word, 6) pausing after highlighting points on a slide, 7) speaking in a conversational tone, 8) moving around the podium if possible and using body language to emphasize points (if possible, leaving the podium and moving into the audience), 9) asking questions of the audience (even requesting a show of hands), 10) using voice inflection (at times speaking in a whisper to make a dramatic entry to a central point, and forcefully following up with the conclusion), and 11) using humor.
In another study, Copeland et al13 collected data prospectively from physicians participating in lecture-based CME internal medicine courses to determine the most important features of the effective lecture. These features were clarity and visibility of slides, relevance of material to the audience, and the speaker’s ability to identify key issues, engage the audience, and present material clearly and with animation. Features determined least likely to affect the attendee’s ratings of a lecture included presenter’s age, gender, physical appearance, and time of day in which the lecture was delivered.
One aspect of “delivery” that has received attention in the literature is the art of entertaining the audience. According to Gagne’s conditions of learning14, it is first necessary to motivate and gain the attention of the learner in order for learning to take place. When done properly, this aspect of the lecture offers a distinct advantage over written text or computerized programs. The importance of entertainment in the perceived effectiveness of a lecture was shown in a study by Naftulin et al15 in which the authors hired a professional actor, whom they named Dr. Myron L. Fox, to deliver a lecture to a group of highly trained educators on mathematical game theory as applied to physician education. The source material was derived from a complex but sufficiently understandable scientific article geared to lay readers. One of the authors of the article coached “Dr. Fox” to present his topic and conduct his question and answer period with an excessive use of double talk, neologisms, non sequiturs, and contradictory statements. All this was interspersed with parenthetical humor and meaningless references to unrelated topics. The participants not only responded favorably to the lecture, but several even noted that they had read Dr. Fox’s publications! The authors of the study concluded that the extent to which students are satisfied with teaching, and even the degree to which they feel they have learned, may reflect little more than their illusion of having learned. Furthermore, the relationship between the illusion of having learned and motivation for learning supports the possibility of training actors to give legitimate lectures, or of providing the educator with a more dramatic stage presence to enhance student satisfaction with the learning process. The study also pointed out that learner satisfaction may not be an accurate measure of learning, and that assessments that go beyond learner perceptions are necessary to make this determination.
Gigliotti16 offered suggestions for developing an effective slide presentation using novelty and humor. The author’s premise was that it does not matter how important the content of a presentation is if it is not heard due to lack of interest. She emphasized the use of change (e.g. a change of voice or posture, or an amusing anecdote to break up the lecture) and creativity. For example, she suggested that a road sign reading “Gas Next Exit” would attract more interest from the audience than a slide that reads “Abdominal distention.”
Van Dokkum17 also offered suggestions for effective lecturing that included audience entertainment. He stated, “The two basic elements of a presentation are that it is both scientific and entertaining at the same time.” He recommended that speakers make slides simple, speak clearly and slowly, stay within the time limit, keep the microphone a fixed distance from the mouth (especially when turning the head or moving away from the podium), use a laser pointer only when needed to make a point and avoid random movement of the pointer, check the lecture room before the presentation to learn how the audiovisual equipment functions, avoid reading directly from the slides, speak to and look at everybody during the presentation, relate to things heard in an earlier presentation, rehearse, do not apologize, and use appropriate color combinations in making slides.
Copeland et al18 recommended that those who design continuing medical education courses could improve them through faculty development or by providing guidelines to their lecturers. In addition, course directors could collect data and give feedback on these specific, behaviorally based, and important lecture features. Based on the comments received from the participants at the 2002 STR course, and on recommendations for effective lecturing from the literature, a list of features of effective lecturing was developed (Figure 2). This list, in addition to speaker ratings data, was provided to the STR speakers. Thus, the data from the evaluations were used in a way that offered guidance to speakers in constructing future presentations. The items in the list can also be used in creating targeted CME course evaluation surveys at future STR meetings. For example, future speaker evaluation items could be related to specific areas in which speakers received the highest number of negative comments, such as content and delivery. Information derived from the negative comments could be provided to raters as a description of the aspects of content and delivery that should be evaluated. Because the literature suggests generalizability of these items, they could also be used in creating surveys for other CME courses.
Fifty-two (5.7%) of 914 comments were not analyzed because their meaning, positive or negative, was not clear. Therefore, it may be useful for evaluation forms to provide separate columns for participants to write positive and negative comments. Positive comments were generally non-specific (e.g. “Outstanding”, “Born to teach”) and thus not analyzed further. However, descriptive positive comments can be as instructive to speakers as negative comments. For example, if a speaker is not aware that something he/she does is perceived positively by an audience, the speaker may discontinue that activity. Therefore, evaluators should be encouraged to provide descriptive statements for both positive and negative responses.
The fact that some participants provided ratings for a session that was cancelled (i.e. there was no presentation to be rated) highlights a limitation of the evaluation system that was used. However, only 53 (1.37%) of 3,879 responses related to this cancelled session, and they were therefore unlikely to have greatly influenced the overall mean scores.
The data from this study describe the perceptions of the evaluators as to the quality of the lectures rated. It cannot be assumed that the ratings reflect the degree to which the lectures were effective in changing knowledge, attitudes, or the way physicians practice. Determining the latter would require testing on the content of the presentations and/or interviews with participants; this type of outcomes assessment should be the focus of future research aimed at determining the effectiveness of CME courses.
In summary, individual evaluations of speakers at a national CME course provided information regarding the quality of presentations that was not provided by evaluations of grouped presentations. Systematic analysis of speaker evaluations provided specific information related to the types and frequency of features related to ineffective lecturing. This information can be used to design CME course evaluations, design future CME course outcomes studies, provide training to presenters, and monitor presenter performance.
Collins J, Mullan BF, Holbert JM. Evaluation of speakers at a national radiology continuing medical education course. Med Educ Online [serial online] 2002;7:17. Available from http://www.med-ed-online.org.