By Gail E. Krovitz, Ph.D., Director of Academic Training & Consulting
Testing, testing, 1, 2, 3… The idea of testing in education brings out mixed feelings in many of us. We think of excessive standardized testing in K-12 education, or instructors in higher education who assess a semester of content with only a scantron final test. But what if the act of testing is misunderstood and can actually provide an opportunity for learning instead of just assessing learning gained elsewhere? Research on the testing effect shows just that. The testing effect is the “finding that retrieval of information from memory [i.e., as during a test] produces better retention than restudying the same information for an equivalent amount of time” (Roediger and Butler, 2011: 20) and is supported by a strong series of experiments in laboratory settings as well as classroom studies. “In education today, people tend to think of tests as dipstick devices… you stick it in to measure what people know. But every time you test someone, you change what they know” (HL Roediger III, as quoted in Glenn).
Many laboratory studies that examine the testing effect are set up as follows:
“One group of students studied some set of materials and then was given an initial test (or sometimes repeated tests). Retention of the material was assessed on a final criterial test, and the tested group’s performance was compared with that of one or two control groups. In one type of control, students studied the material and took the final test just as the tested group did, but were not given an initial test. In a second type of control (a restudy control), students studied the material just as the tested group did, but then studied the material a second time when the tested group received the initial test; in this case total exposure time to the material was equated for the tested and control groups. The typical finding throughout the literature is that the tested group outperforms both kinds of control groups… on the final test” (Roediger and Karpicke, 2006b: 182).
In summary, taking a test prepares students better for future retention of material than repeatedly studying or re-reading the same material does (see studies reviewed in Roediger and Karpicke, 2006b, and Roediger and Butler, 2011). This is particularly true when the final test is delayed rather than taken immediately after studying (the latter being otherwise known as cramming in educational circles!).
This result may seem counterintuitive: how can taking a test provide better preparation for retention of information on a delayed test (i.e., long-term retention) than continuing to study or re-read the same material? But it is actually not surprising if you think about it in terms of learning the skills needed for a task: practicing the exact skill during learning (i.e., taking a practice test) helps you perform better when you are assessed on that skill later (i.e., taking a follow-up or final test) (Roediger and Karpicke, 2006a). If I want to learn to play tennis, I’m best served by actually practicing the skills needed to play tennis instead of reading about it. Also, research on how memory works shows that retrieving information during a test is not a “neutral event,” but actually affects the ability to retrieve that information in the future. “People usually imagine memory as a storage space, as a space where we put things, as if they were books in a library. But the act of retrieval is not neutral. It affects the system” (JD Karpicke, as quoted in Glenn).
We might want to chalk these findings up to so-called “mediated” (or indirect) effects of testing. Mediated effects include the fact that frequent testing encourages students to study more throughout the class rather than cramming right before one or two large tests, or that tests give students feedback about what they do or don’t know so they can refine their future studying efforts (Roediger and Karpicke, 2006b: 182). With mediated effects of testing, “it is not the act of taking the test itself that influences learning, but rather the fact that testing promotes learning via some other process or processes” (Roediger and Karpicke, 2006b: 182). While these mediated effects are certainly valuable, and could lead to recommendations for more instructors to use low-stakes formative testing in their classes, this research focuses on direct or unmediated effects of the tests: something intrinsic to taking a test that helps future knowledge retrieval. “Testing not only measures knowledge, but also changes it, often greatly improving retention of the tested knowledge” (Roediger and Karpicke, 2006b: 181). Unfortunately, researchers don’t currently know why the testing effect works (see Roediger and Karpicke, 2006b for discussion of the theoretical studies investigating this), but the effect itself has been demonstrated consistently in many studies.
Some other interesting findings from this research involve whether or not to give feedback (the correct answers) and what test format is most effective.
Feedback: It is important to give students the correct answers (or “feedback” as it’s called in these studies), as presenting students with the correct response after the test is more effective than simply telling them whether their answer to a particular question was correct or not (Butler et al., 2007). It is also best to give this feedback after the test as a whole (delayed feedback) rather than right after answering each question (immediate feedback) (Butler et al., 2007). A laboratory study by Butler and Roediger (2008) illustrates the testing effect, as well as the importance of feedback, and of delayed feedback. The experiment yielded the following results for students who did not have a chance to read (study) the assigned passages they would eventually be tested on (2008: 609):
No study, no initial test: 10% correct on final test
No study, initial test, but no feedback given: 18% correct on final test
No study, initial test, immediate feedback given: 42% correct on final test
No study, initial test, delayed feedback given: 57% correct on final test
Similar patterns were observed in each experimental setup, illustrating the importance of the initial test and the use of delayed feedback.
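To make the size of these differences concrete, here is a minimal Python sketch that computes how many percentage points each condition gained over the baseline of no study and no initial test. The numbers are the ones listed above; the dictionary labels and the function name `gains_over_baseline` are hypothetical, chosen only for this illustration.

```python
# Final-test accuracy (percent correct) for the "no study" conditions listed
# above (Butler and Roediger, 2008: 609). Labels are illustrative shorthand.
final_test_scores = {
    "no initial test": 10,
    "initial test, no feedback": 18,
    "initial test, immediate feedback": 42,
    "initial test, delayed feedback": 57,
}


def gains_over_baseline(scores, baseline_key):
    """Return each condition's gain over the baseline, in percentage points."""
    baseline = scores[baseline_key]
    return {
        name: value - baseline
        for name, value in scores.items()
        if name != baseline_key
    }


gains = gains_over_baseline(final_test_scores, "no initial test")
for condition, gain in gains.items():
    print(f"{condition}: +{gain} percentage points")
```

Under these figures, the initial test alone adds 8 percentage points, while the initial test with delayed feedback adds 47. These are simple differences between reported group results, not a statistical analysis.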
Interestingly, Butler et al. (2007) mention other studies showing that immediate feedback might be more effective in actual classroom settings (rather than laboratory experiments), and they suggest that this might be because students do not go back after the test to process the delayed feedback (for both correct and incorrect questions), since they are not forced to do so as they would be in an experimental setup. Thus, it might be helpful to recommend that students make an effort to view the feedback and read both the correct and incorrect questions and answers after the test.
Test format: Studies reviewed here suggest that if the initial test is in short answer or essay format (a “free-recall” or “production” type test), it contributes to a larger testing effect than if the initial test is multiple-choice (a “recognition” type test); the follow-up test format (whether short answer or multiple-choice) does not matter as much (Roediger and Karpicke, 2006b). However, recognition type tests do still show a strong testing effect, so it’s probably still advantageous to use them in educational settings (for example, when the class size is too large to make manual grading of free-recall tests realistic).
Another potential issue of using multiple-choice or true/false questions on tests is that students are exposed to incorrect answers during the testing process. Therefore, “students may sometimes endorse false items as being true and thereby learn erroneous information,” or “even if they read a false item and know it is false, the mere act of reading the false statement may make it seem true at a later point of time” (Roediger and Karpicke, 2006b: 203). This is called the negative suggestion effect (Roediger and Karpicke, 2006b). To counterbalance this, providing feedback is extremely important on recognition type tests, and research shows that “if feedback is provided after a multiple-choice test, the negative effects are completely nullified” (Roediger and Butler, 2011: 23).
So far, much of this discussion has focused on results of laboratory studies on the testing effect, but what about studies in actual classrooms? Classroom findings might differ because students are responsible for more information in the classroom than in a laboratory setting, the material is presented in a variety of ways, and “students also differ greatly in the amount of studying they do before tests, in how soon they begin studying (relative to when tests occur), in their interest in the course material, and in their motivation to learn” (Roediger and Karpicke, 2006b: 195). However, studies in classroom settings also demonstrate a testing effect. In a study looking at frequent quizzes given in a middle school science class, McDaniel and Agarwal found that frequent quizzing increased student performance on unit tests from 79% correct (for material not previously tested with a quiz) to more than 90% correct (2011: 403). The quizzing effect persisted through an end-of-semester test (79% on quizzed content vs. 72% on non-quizzed content) and an end-of-school-year test (68% for quizzed vs. 62% for non-quizzed content) (2011: 403). The quizzes were low stakes, counting for less than 10% of students’ grades. In another example, one section of a statistics for psychology course included a test of four short answer questions at the end of each lecture period (totaling around 8% of the final grade), while another section of the same class taught by the same professor did not use these end-of-class tests. Students in the section using the end-of-class tests scored significantly higher on the exams than students in the comparison section (mean score of 86% versus 78%), and fewer of them earned mean exam scores lower than 70% (5.4% of the class versus 27.1% of the class in the comparison section) (Lyle and Crawford, 2011).
All in all, research on the testing effect is compelling, and suggests that testing (or information retrieval practice) has a greater effect than studying on long-term retention of information, so more frequent “retrieval practice” (i.e., testing/quizzing) in the classroom should help increase long-term retention of information (Roediger and Butler, 2011). This research should hopefully allow us to see tests as opportunities for learning, instead of just instruments that assess learning acquired by other means, and maybe it will inspire some of us to include more frequent testing in our own classes.
Sources:
Butler, A.C., J.D. Karpicke, and H.L. Roediger, III (2007). The effect of type and timing of feedback on learning from multiple-choice tests. Journal of Experimental Psychology: Applied 13(4): 273-281.
Butler, A.C., and H.L. Roediger, III (2008). Feedback enhances the positive effects and reduces the negative effects of multiple-choice testing. Memory and Cognition 36(3): 604-616.
Glenn, D. (2007). You will be tested on this. Chronicle of Higher Education 53(40): A14. Accessed online on January 10, 2012 at http://chronicle.com/article/You-Will-be-Tested-on-This/14732
Lyle, K.B. and N.A. Crawford (2011). Retrieving essential material at the end of lectures improves performance on statistics exams. Teaching of Psychology 38(2): 94-97.
McDaniel, M.A., and P.K. Agarwal (2011). Test-enhanced learning in a middle school science classroom: the effects of quiz frequency and placement. Journal of Educational Psychology 103(2): 399-414.
Roediger, H.L., III, and A.C. Butler (2011). The critical role of retrieval practice in long-term retention. Trends in Cognitive Sciences 15(1): 20-27.
Roediger, H.L., III, and J.D. Karpicke (2006a). Test-enhanced learning: taking memory tests improves long-term retention. Psychological Science 17(3): 249-255.
Roediger, H.L., III and J.D. Karpicke (2006b). The power of testing memory: basic research and implications for educational practice. Perspectives on Psychological Science 1(3): 181-210.



