How to digest teaching quality surveys?

None of the available methods for evaluating teaching quality (Berk (2005) counts twelve of them) automatically produces better teaching. Overall, the results of teaching quality assessment surveys are a very good source of data for measuring teaching practice, including for formative purposes. They are often subject to fundamental criticism, which, however, seems to lack robust evidence in practice (Berk, 2005, 50, lists the vast relevant literature). If students fill them in diligently, and if those evaluating the results know how to read them, many imperfections can be levelled out through interpretation (which is, obviously, itself subject to biases). In any case, we must accept that evaluating teaching quality will always involve subjective elements and that, as a result, we work with approximations.

Well, I thought, let’s give it a try and make the most of my teaching survey results from the last two years, instead of just reading them once upon receipt and then filing them away. One afternoon this summer I sat down with my five most recent ones and did some statistics. The good news first: their overall indices all lie between 1.3 and 1.5 (on a scale from 1 to 5, where 1 is best), which I think is comparatively good.

Getting more detailed information out of them is trickier. The first part consists of numerical indices in response to specific questions (‘How do you rate this teacher’s ability to engage with the students during lectures?’, etc.). Then there is a comment box in which students can express themselves freely, and some students use this opportunity quite diligently.

Rating-based data

First, I considered the numerical indices. What should I make of them? As the results are largely homogeneous, I thought the most valuable information could be gained by looking at the worst scores across the board and seeing whether any patterns emerged. As an amateur statistician, I decided to highlight those individual indices that obtained a value of 1.6 or higher (that is, worse) in at least one of the five courses. Here are the results:

  1. How would you rate your own contribution? – 2.1/2.2/2.1/2.5/1.9
  2. How do you rate the reading list? – 1.7/1.6/1.6/1.3/1.2
  3. How satisfied are you with the integration of classes with lectures? – 1.4/1.6/1.3/1.2/1.5
  4. How satisfied have you been with feedback given by this teacher? – 1.6/1.5/1.2/1.7/1.8

For all other survey questions, values were consistently 1.5 or lower (that is, better).
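
The selection rule (flag any question that scored 1.6 or worse, i.e. 1.6 or higher on this scale, in at least one course) amounts to a simple filter. The Python sketch below is illustrative only: the function name `flag_weak_questions` and the `THRESHOLD` constant are hypothetical, the data are just the four rows listed above (the values for the other questions are not reproduced here), and the per-question mean is an extra summary that was not part of the original exercise.

```python
from statistics import mean

# Scores per question across the five courses, copied from the list above.
# Scale: 1 (best) to 5 (worst), so HIGHER values are WORSE.
RATINGS = {
    "Own contribution": [2.1, 2.2, 2.1, 2.5, 1.9],
    "Reading list": [1.7, 1.6, 1.6, 1.3, 1.2],
    "Integration of classes with lectures": [1.4, 1.6, 1.3, 1.2, 1.5],
    "Feedback given by this teacher": [1.6, 1.5, 1.2, 1.7, 1.8],
}

THRESHOLD = 1.6  # hypothetical name: a course score at or above this is flagged


def flag_weak_questions(ratings, threshold=THRESHOLD):
    """Return {question: (courses_flagged, mean_score)} for every question
    that reached `threshold` or worse in at least one course."""
    flagged = {}
    for question, scores in ratings.items():
        # Count the courses in which this question scored at or above the threshold.
        hits = sum(1 for s in scores if s >= threshold)
        if hits > 0:
            flagged[question] = (hits, round(mean(scores), 2))
    return flagged


if __name__ == "__main__":
    for question, (hits, avg) in flag_weak_questions(RATINGS).items():
        print(f"{question}: {hits}/5 courses at or above {THRESHOLD}, mean {avg}")
```

Run on these rows, it flags all four questions, as expected: the reading list and feedback reach the threshold in three of the five courses each, the integration of classes with lectures in only one. With the full survey data, questions that stayed at 1.5 or better throughout would simply drop out of the result.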

When it came to interpreting these values, I first decided to leave out the question about students’ own contribution: this is a special case, a kind of control question. For the other three underperforming indices (reading list; integration of classes with lectures; feedback given to students), I hoped to gain more insight from the free comment box. Survey results are most effective when read together with comments or other non-rating-based data sources (Berk, 2005, 49; Hendry and Dean, 2002, 76).

Students’ free comments

Reading through the comment boxes of five courses (each with 60 to 90 students) was quite a challenge (not to mention the bad handwriting). I tried to identify patterns within these comments. I concentrate here on the critical comments and on those positive comments that refer to specific aspects (as opposed to a general appraisal of the course), and I have tried to categorise them. I included all comments that seemed to carry some substance, but I did not weight them. There were comments I did not really know what to do with (‘He scares us to death!’). There were also comments on other topics, such as practicalities, the use of visiting scholars, exam method and exam content; however, these were far too few to provide even a remotely representative basis for review.

Here is the list of patterns I was able to distil:

Teaching style

| Good | Not good |
|---|---|
| Clear, comprehensible explanations | |
| Lectures and classes well structured | Structure not clear |
| Teacher manages to keep large groups engaged | |
| Exam-style discussions are good exam preparation | |
| Everyone feels valued | Teacher engages more with active students |
| Interactive teaching encouraging | Too much interaction with students |
| Sharing newspaper articles on blog | |
| | Questions to class sometimes unclear |
| | 2 hours not enough |
| Teacher constantly tries to improve teaching | |
| Good presentational skills | |
| Frequent recaps appreciated | |
| | Communicate structure more clearly |
| | Too little time for questions |
| Complicated things become understandable | |
| Good slides | |
| Good variety of session design | |
| ‘Very German’ (?) | ‘Very German’ (?) |


Reading list

| Good | Not good |
|---|---|
| Clearly prioritised | Not clearly prioritised |
| | Uncertainty about status of further reading |
| | Too long |
| | Needs more basic reading |


Feedback

| Good | Not good |
|---|---|
| Happy about opportunity to write essay/mock exam | Not enough feedback on formative work |
| Great feedback | Not enough feedback |


Substantive content of the course

| Good | Not good |
|---|---|
| Substance at a good level | Lectures too broad |
| Makes complex things understandable | Simplistic language |
| Good mixture of theory and practice | Not enough detail |
| Contemporary and topical | Too difficult because it requires background knowledge |
| | Boring for people with background knowledge |
| | Overly technical |



The only crystal-clear outcome relates to reading lists. Even allowing for the fact that complex reading is, by its nature, simply not students’ favourite pastime, their critique is probably valid. It is specific (too much, too difficult, etc.) and is raised by a significant number of respondents. It is also understandable and hence no surprise. Reading lists in my sub-discipline grow rapidly because the substance evolves constantly and quickly: since the financial crisis, new material has been produced at breathtaking speed. As a consequence of these findings, I will streamline and shorten the reading lists and get used to cutting items even though they have some value. I also need to improve guidance on how to use the reading list, in particular on how to deal with ‘further reading’.

With respect to feedback, too, the results seem to suggest room for improvement. However, I wonder whether students use the feedback criterion to express more general uncertainties about the content of the course. I say this because there were plenty of other opportunities to receive feedback that students did not take up: in particular, my office hours are, on average, only booked to 50% of capacity, and my ‘walk-in feedback clinic’ was half-empty last term. Discontent with feedback on essays and other formative work is tricky to handle: students are reluctant to invest their own intellectual effort in analysing what went wrong, which is, of course, a tedious task. Note that I do provide a model answer in bullet-point form for each piece of formative work. Put colloquially, then, I tend to believe that some students are too lazy to come to my office hours but nonetheless complain that I do not do enough to improve their formative work individually. Moreover, the survey results here are less clear-cut than those on reading lists, as other students expressly praise the quality of the feedback. Still, there is certainly room for improvement: I cannot rule out that, within a batch of 50 to 70 scripts per exercise, some do not receive the care in feedback that they ideally should. I will therefore pay even more attention to this.

Critical reflection

The survey results probably do not yield any other clear-cut suggestions for improvement, given the lack of other meaningful ways to interpret the remaining survey data (see Hendry and Dean, 2002, 76, citing Murray, 1997; Hodges and Stanton, 2007, 280). Many aspects receive positive and negative comments alike. I tend to read this as indicating that there is no systemic deficiency. However, it may well be that my performance is inconsistent, or that there is sometimes a communication problem, in the sense that I do not reach all students. The latter would point to the need to pay even more attention to the diversity of my students; the former might mean, for instance, that on some days I manage to engage everybody with the substance of a lecture, whereas on other days I fail to create that atmosphere. Hence, this is above all a reminder that I should always strive to give my best and to make my teaching accessible to a diverse audience … and that I should read through my compilation of student feedback from time to time.

First experience with simpler surveys

I also tried the formative evaluation approach during my recent summer school session. I thought this was a good idea because the summer school has no survey comparable to the standardised teaching quality assessment discussed in the previous sections. Instead, it uses a proprietary format that measures overall satisfaction with the programme on a black-or-white basis and is therefore unsuitable for formative purposes. At the end of the summer course, I asked students to fill in the form provided by LSE’s Teaching and Learning Centre and, for the sake of anonymity, to email it to a neutral person.

This survey was a straightforward failure. The main lesson, if any, is how not to conduct a survey. Only 15% of the class returned the form (despite a friendly reminder), and the data were basically worthless. I could deduce that the respondents were generally happy with the course, but nothing more. For the future, it seems that the last day of the course is not a good time for such a survey. Instructions to students also apparently need to be much clearer. Perhaps (and this is just a guess) summer school students, unlike students on our regular programmes, do not feel committed to the School and see no point in taking part in measures aimed at improving teaching quality.