
Gather Evidence


There are many common methods for gathering evidence of student learning. It is hard to say that any one method is right or wrong, but some will align better with your outcomes than others. Some may also work better in your class, discipline, or program than others.

There are a few things to think about when deciding what kind of method(s) to select:

1) Think about how reliable or consistent the measurement is likely to be. A measure with low reliability would give different results each time it was used. For a concrete example, using just one multiple choice item to assess a learning outcome is not likely to be reliable. A student might misread the item, understand the larger concept but miss the finer details in the individual question, circle “B” instead of “C,” and so on. On the other hand, combining the total number of correct answers from a group of ten items is likely to be a more reliable measure of student learning.

2) Think about how valid a measurement is likely to be. Measures with high validity have good alignment with the outcome they claim to be measuring. For example, using an essay to assess students’ writing skills is probably more valid than using a multiple choice exam about the rules of grammar.

3) You may already have experience with some methods through your own research or prior experience teaching or assessing learning. If these methods have reasonable validity and/or reliability, there is nothing wrong with selecting a method with which you are already comfortable.

The following is a list of common methods of gathering evidence. The list is not meant to be exhaustive, and the advantages and disadvantages are meant to sensitize you to some of the successes and problems you may encounter using a particular method.

Direct evidence is created by students as they demonstrate what they have learned. Indirect evidence is typically created by students telling us what they think they have learned; indirect evidence can also be items such as average grades or graduation rates that point to student success without telling us what exactly students learned (or did not learn).

Direct evidence is generally more convincing and should be used where possible. Indirect evidence can sometimes offer unique insights and may also be used to supplement direct evidence.

Using more than one line of evidence, or triangulating, generally offers more robust evidence of student learning than using just one method. You can also select methods that complement one another’s strengths and weaknesses. 

Direct Evidence

Term Papers, Essays and other Written Work

Advantages

• Easy to integrate into classroom routines

• Can be used to assess a variety of kinds of outcomes

Disadvantages

• Requires time to develop high quality assignments


Embedded Test Questions

Advantages

• Easy to integrate into classroom routines

• Can be closely linked to outcomes

Disadvantages

• Comparability over time compromised unless questions are identical

• Traditional exams may be an inauthentic way to assess learning


Portfolios

Advantages

• Portfolios can bring work from across courses/time together in one place

• There are free tools (e.g., Blackboard and Google) for managing portfolios

• They provide a site of student reflection and metacognition

Disadvantages

• Identifying and implementing a software package that fits your portfolio project can take time, and not all portfolio systems are free


Capstone or Signature Assignments

Advantages

• Can be an effective place to assess more than one outcome

• May be a good site to assess what has been learned over a student’s career

Disadvantages

• Guiding students through capstone assignments and assessing them may be particularly time intensive


External Tests

Advantages

• Make it possible to compare across institutions (and across time)

• Tests may be particularly well developed or designed

Disadvantages

• May not be well aligned with local outcomes

• Generally are not free of cost


Indirect Evidence

Locally Developed Surveys

Advantages

• Can be quickly developed

• Can be administered at low cost

• Can be closely linked to outcomes

Disadvantages

• Creating high-quality surveys may require time and training

• Low response rates can affect the validity of results


Externally Developed Surveys

Advantages

• Surveys have already been developed, typically by individuals with a background in survey methods

• Make it possible to compare across institutions (and across time)

Disadvantages

• May not be well aligned with locally developed outcomes

• Generally are not free

• Low response rates can affect the validity of results


Reflective Essays

Advantages

• Can gather evidence that would be difficult to assess in other ways, for example about attitudes or change over time

• Can be used as a pedagogical tool to encourage reflection

Disadvantages

• Students may be tempted to write what they think instructors want to hear and not what they actually think


Focus Groups

Advantages

• An authentic way to learn about attitudes and perceptions

• Able to gather insights that instructors did not anticipate

Disadvantages

• Keeping a focus group on topic can require some skill or training on the part of the facilitator

• Evidence from focus groups is often unstructured, requiring skills in qualitative analysis


Graduation Rates

Advantages

• Facilitate comparison between groups and over time

• Can be seen as important measures of student success

Disadvantages

• Constructing such measures may take knowledge of data management and data analysis techniques

• These are likely to be, at best, a very rough proxy for any specific learning outcome

Term Papers, Essays and other Written Work

Student writing can be a good way to assess a variety of outcomes, from knowledge and basic mastery of content to skills like analysis, critical thinking and (obviously) writing. With written work it is generally a good idea to develop a rubric for assessment.

Some advantages of using written work are that it is easy to integrate into classroom or course routines and, if you use a rubric, much of the time spent assessing and providing feedback to students can overlap with grading.

The main disadvantage is that, without some kind of standards, different people are likely to assess the same evidence (e.g., a paper) in different ways.

Embedded Test Questions

Questions that align well with outcomes can be embedded in midterm, final, or other kinds of student examinations. These test questions can be multiple choice, fill in the blank, short answer, or essay. You can use existing test questions, although you might want to think about creating new ones (or making adjustments) to ensure good alignment with the outcome being assessed.

Some advantages of embedded test questions are that they are easy to integrate into usual classroom procedures and can be crafted to closely align with outcomes.

One disadvantage of embedded test questions is that comparability between courses or over time can be limited unless identical questions are used. Another is that traditional tests may not be an authentic form of assessment; a student may be able to select the correct answer on a multiple choice exam but fail to apply the underlying knowledge to a problem in the real world.


Portfolios

A portfolio is a collection of artifacts that a student has produced - such as texts, images, creative products, or webpages. These collections are likely to be housed in an online repository. Students may simply collect things there, use the collection to highlight their own learning, or use the portfolio as a place to showcase their work to others.

There are many options for online portfolio creation and curation. Note that UCR’s iLearn can be used to create portfolios, and there are other free alternatives such as Google Drive.

The advantages of portfolios are that the collection of evidence is streamlined, and portfolios can be used to assess learning or improvement over time. (Students will collect and add evidence of their learning to the portfolio over time.) Another advantage is that portfolios are a natural place for students to reflect on their own learning; students can also often take portfolios with them, using them in graduate school or employment applications.

One potential disadvantage of portfolios is that it can take time to identify and implement an electronic portfolio management system with features that work for your purposes.

Capstone or Signature Assignments

A capstone or signature assignment is one that asks students to create a significant original piece of work, integrating what they have learned over several courses or even years of study. The assignment might be an undergraduate-level thesis, an original research project, or creative work.

Some advantages of capstone assignments include the potential of assessing more than one outcome with the same piece of student work: A good capstone project should show a student has met most of the program’s expectations. Capstone assignments may also be a good place to assess student learning in interdisciplinary programs where much coursework may occur outside the department or program.

One disadvantage of capstone assignments is that they may be time intensive for instructors, who may have to guide students through the process (for example, by reviewing drafts) and then assess projects that may themselves be lengthy and complex.

External Tests

External tests are ones developed by a third party - such as a testing service or disciplinary association - that measure what students know in a particular field. One example is the Diagnostic of Undergraduate Chemistry Knowledge (DUCK), which asks students about general principles of chemistry and contains groups of items that ask about important subfields. The test has been taken at universities across the country, and the American Chemical Society reports national averages and percentile rankings.

The advantage of external tests is that they make it possible to compare students’ learning across or between institutions; it is also often possible to make comparisons over time. Many of these national tests have been developed by professionals with a background in psychometrics, meaning they may be particularly high-quality exams.

One disadvantage of external tests is that they are not available for every discipline and, where they are available, may not be appropriate. Such exams may also not be well aligned with local outcomes. Many of these exams are also not free of cost.

Locally Developed Surveys

One of the most common ways to gather indirect evidence of student learning is to ask students to self-report what they have learned. These surveys could be a half sheet of paper asking students about what they learned in the previous hour or an online survey with dozens of questions asking students about what they have learned in their degree programs. Surveys can include close-ended items (e.g., “strongly agree, agree, neutral . . .”) or open-ended items (e.g., “How do you think your writing has improved?”).

The advantage of local surveys is that they can be quickly created and are generally easy to administer. (There are many free online tools such as Qualtrics.) Locally developed surveys can be easily tailored to your outcomes.

One disadvantage of surveys is that creating high-quality surveys can be difficult without at least some experience in survey design. Another potential disadvantage is low response rates: If a relatively low percentage of the students invited answer a survey, there is a chance that there could be significant differences between those who respond and those who do not.

Externally Developed Surveys

Others have also developed surveys that ask about student learning in general ways. One example is the University of California Undergraduate Experience Survey. This survey asks students to self-report on their level of reading, writing, math, and other skills when they started at this campus and where they are now. The change in self-assigned scores could be seen as indirect evidence of learning.

The advantage of external surveys is that they have already been developed. Moreover, such surveys may have already been administered, making it possible to look at change over time; it may also be possible to compare students on one campus to students on other campuses.

The disadvantage of externally developed surveys is that they may not align particularly well with a given set of local outcomes. Some of these surveys are proprietary and not free of cost. External surveys may also suffer from the problem of low response rates.

Reflective Essays

Reflective essays ask students to reflect on what they have learned, often near the end of a course or just before earning a degree. Reflective essays can also be integrated with larger assignments, such as portfolios or group presentations, giving instructors additional insights into what students have learned. These kinds of essays ask students to make additional connections with course material or engage in metacognition. Reflective essays are often assessed with a rubric.

The advantage of reflective essays is that they can give instructors insights that might otherwise be difficult to obtain. Reflective essays can also be a unique pedagogical tool, encouraging students to link ideas and engage in metacognition.

One disadvantage of reflective essays is that there may be a tendency for students to say what they think instructors are interested in hearing and not what they actually think.

Focus Groups

Focus groups involve students having a focused discussion on particular topics, such as what they have learned in a particular course or their experiences in the coursework offered by a given program. Focus groups typically involve a facilitator asking a small number of open-ended questions and then listening to answers as well as encouraging discussion among participants. The choice of the facilitator is important: If it is the instructor of a course students are currently enrolled in, students may not feel comfortable sharing. (Another faculty or staff member from the same program may be a better choice.) The conversation is typically documented by notes, a transcription, or a recording. This then becomes the evidence and is analyzed for common themes or patterns.

One advantage of focus groups is that they are better able than surveys to gauge the strength of respondents’ feelings. Compared to surveys with predefined sets of answer choices, focus groups are also much better able to glean insights from responses or feelings that were not anticipated.

A disadvantage of focus groups is that if a facilitator is not careful, the group can easily stray off topic. It can also be time consuming to analyze unstructured data.

Metrics Such as Passing, Retention, and Graduation Rates

Metrics such as graduation rates or, in the context of particular programs, how long it took to pass a particular sequence of courses (e.g., BIOL 005A, 005B, and 005C) can give some insight into student learning.

The advantage of these kinds of metrics is that they can often be compared across groups of students as well as disaggregated for subgroups (e.g., men or women) and tracked over time.

The disadvantage is that constructing these kinds of metrics generally requires direct access to student data and at least some background in data management and analysis techniques. These are also somewhat imprecise metrics in that it is not possible to know what, exactly, a student learned (or did not learn).
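As a rough illustration of disaggregating a metric by subgroup and tracking it over time, the sketch below computes graduation rates per cohort and group. The record layout, cohort years, and group labels are hypothetical; real student data would come from an institutional data source with its own structure and access rules.

```python
from collections import defaultdict


def graduation_rates(records):
    """Return {(cohort_year, group): share of students who graduated}.

    Each record is a hypothetical (cohort_year, group, graduated) tuple.
    """
    totals = defaultdict(int)
    graduates = defaultdict(int)
    for cohort_year, group, graduated in records:
        key = (cohort_year, group)
        totals[key] += 1
        graduates[key] += int(graduated)
    return {key: graduates[key] / totals[key] for key in totals}


# Made-up records for two entering cohorts and two subgroups.
records = [
    (2015, "transfer", True),
    (2015, "transfer", True),
    (2015, "transfer", True),
    (2015, "transfer", False),
    (2015, "first-year", True),
    (2015, "first-year", False),
    (2016, "transfer", True),
    (2016, "first-year", True),
]

rates = graduation_rates(records)
for (cohort_year, group), rate in sorted(rates.items()):
    print(cohort_year, group, f"{rate:.0%}")
```

Note that the output says nothing about what any student learned, which is the imprecision described above.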

It is not necessary to look at every single example of student evidence to conduct assessment. It is possible to draw a sample, and if you do this carefully, the time you spend examining this sample of student evidence can tell you something about learning among the larger group of students.

The key to sampling is selecting a group that is representative of the larger group. One option is to gather the kind of random sample statisticians talk about. To do this, you would gather all the evidence (for example, all the term papers) and then use some random selection procedure to choose which pieces to assess (maybe those papers written by students with a student identification number ending in a six). You are typically much safer in making generalizations from those kinds of randomly selected samples. There is a large body of work on random sampling that can offer more precise guidance on considerations such as generalization and sample size.
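One simple way to carry out a random selection is to let software pick the papers. The sketch below assumes each paper has some identifier (the IDs here are made up) and uses Python's standard `random` module; the function name and the seed value are illustrative only.

```python
import random


def draw_random_sample(paper_ids, sample_size, seed=None):
    """Pick sample_size distinct papers, each equally likely to be chosen.

    Passing a seed makes the draw repeatable, so the same sample can be
    reproduced later. random.sample raises ValueError if sample_size is
    larger than the number of papers.
    """
    rng = random.Random(seed)
    return rng.sample(paper_ids, sample_size)


# Hypothetical identifiers for 120 term papers.
paper_ids = [f"paper-{i:03d}" for i in range(1, 121)]
sample = draw_random_sample(paper_ids, 30, seed=2024)
print(sorted(sample))
```

Recording the seed alongside the results lets someone else confirm which papers were drawn.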

If it is not feasible to assess all the pieces of evidence, and if you are not willing or able to assemble a random sample, the alternative is to try to be purposeful and collect a sample that attempts to approximate the larger group of students. This would likely mean including students from different classes and stages in the program; you will also want to include students whose work will show a variety of levels of mastery. It is important to be aware of limitations sampling may introduce: If some groups of students are not included or are underrepresented, you limit your ability to say anything about learning among those groups.

If you are sampling because you simply do not have time to assess every student and are wondering how much evidence your sample should include, here is one approach to try. Start with about thirty examples. Conduct your assessment and then draw some tentative conclusions. Then assess ten more examples. Does any of this new evidence change your conclusions? If not, then your sample is probably big enough. If the additional evidence changed your conclusion, you should repeat the process until adding more evidence does not change your conclusions.
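The start-with-thirty, add-ten approach can be sketched as a short loop. The sketch makes two assumptions that are not part of the guidance above: the "conclusion" is reduced to the share of work meeting a passing rubric score, and it counts as unchanged when that share moves by no more than a chosen tolerance. The function name, the tolerance of 0.05, and the example scores are all hypothetical.

```python
def sample_until_stable(scores, passing_score, start=30, step=10, tolerance=0.05):
    """Assess work in batches until the conclusion stops changing.

    scores: rubric scores in the order they will be assessed (shuffle first).
    Returns (number of pieces assessed, share meeting passing_score).
    """
    if not scores:
        raise ValueError("scores must not be empty")

    def share_meeting(n):
        assessed = scores[:n]
        return sum(s >= passing_score for s in assessed) / len(assessed)

    n = min(start, len(scores))
    current = share_meeting(n)
    while n < len(scores):
        n_next = min(n + step, len(scores))
        updated = share_meeting(n_next)
        stable = abs(updated - current) <= tolerance
        n, current = n_next, updated
        if stable:
            break  # the extra batch did not change the conclusion
    return n, current


# Made-up rubric scores on a 1-4 scale for 100 papers.
scores = [3, 4, 2, 3] * 25
assessed, share = sample_until_stable(scores, passing_score=3)
print(assessed, share)
```

In practice a conclusion is usually richer than a single percentage, so the stability check would be a judgment call rather than a fixed tolerance.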

Remember, though, you do not have to collect a sample. If you are assessing student learning in a small class or program, you should probably include all students. Similarly, if you are using embedded test questions, it might be easier to look at the results for all the students at the end of the grading process.