The ABCs of Assessment

Trusteeship, March/April 2003, Volume 11, Number 2
By T. Dary Erwin

When was the last time your board heard an update about how well your institution was educating students? What kind of evidence was presented to you: perceptions about what might be happening or direct measures of what students actually learned and developed? Isn’t part of your oversight responsibility to ask for and expect accurate information about educational quality?

Since the 1980s, governors and state legislators have been asking public colleges and universities for more and better evidence about effectiveness. With the competition for state funds fiercer than ever, the tactic of making a case for increased funding based on logic and tradition has given way to arguments favoring a greater role for outcomes-based information in decision making. Private institutions, too, are facing increased pressure to prove their graduates’ achievements.

Generally, elected officials and state policymakers ask for three basic types of data. First, officials seek business-related operational information, such as the length of time needed to pay bills or to process a financial-aid application. The focus here is on efficiency, not effectiveness. That approach seemed justified when the public perceived that funds might be wasted or misspent. With public budgets now pared to the bone, that perception has diminished.

Second, state and federal groups ask for such outputs as retention and graduation rates. Recently, for example, the Bush Administration called for federal monitoring of graduation rates so that colleges can be held accountable for purposes of federal-aid distribution. The problem with basing high-stakes decisions on such statistics, as we shall see, is that a college or university can move students through its pipeline without necessarily teaching them much.

Third, some states recognize the shortcomings of these first two types of measures and have added various institutional and student-learning outcomes measures. The idea here is to capture measures related more to quality than operations. Coupled with collection of these types of data, a few states have experimented with performance-based budgeting or funding. That is, rather than basing public funding primarily on enrollment counts, states are tinkering with linking funding to measures of quality. Major problems persist, however, regarding what measures of student learning to use and how to link this information to funding, and no state has developed an effective model for other states to emulate.

In sum, the long-standing quest for consensus on the appropriate data for measuring quality in higher education continues. What follows is some perspective on this ongoing conundrum, along with the results of some new research in this field being conducted at James Madison University in Virginia.

Four Traditional Areas. Historically, officials have sought data in four broad areas of student assessment: basic skills, the major programs, student affairs and other out-of-class activities, and general education.

In basic skills, states such as New Jersey and Georgia have required passage of tests by first-year students in such areas as mathematics and reading. Entry-skills testing is receiving less attention now because of the proliferation of competency tests in high schools across the country. Legislators evidently believe it is better to deal with the problem of basic skills closer to the source—at the high schools.

Assessment of major programs has been driven largely by federal mandates about reporting student-achievement results through the accreditation process. Many professional organizations, such as the Accreditation Board for Engineering and Technology, the American Assembly of Collegiate Schools of Business, and the National Organization of Nurse Practitioner Faculties/American Association of Colleges of Nursing, are now including student-learning data in their oversight reviews of institutional programs. However, academic disciplines that lack such professional outlets are taking longer to develop assessment processes.

Student affairs is probably the least mentioned of these four areas, but it is soon likely to receive increased attention. At least two related public-policy questions loom: First, with the rise of distance education, what is the value of the residential campus experience? And second, are student-activity fees contributing to educational outcomes? One can ask what students learn or develop outside of class from, for example, extracurricular activities, study-abroad programs, career counseling, and other campus services.

The remaining area, one that often is the main focus for assessment mandates, is general education. This refers to the common knowledge, skills, and personal characteristics of all baccalaureate graduates, regardless of their major. Assessing the effectiveness of an institution’s general education program can yield the greatest insights because it is the “picture window” of an institution. In addition, general education has received much criticism from outside of higher education, with many in the business world complaining that too many recent graduates lack adequate communication, technology, and quantitative reasoning skills.

More than 90 percent of the general education programs in American colleges and universities offer a cafeteria-like array of courses. Departments covet course listings because they convert to credit hours and assigned resources. Yet the descriptive categories and course titles do not necessarily reflect the reality of what occurs inside classrooms. Over time at some institutions, courses have been added to general education curricula for reasons that have little to do with improving the quality of education.

What Boards Can Do. Trustees who follow the current debate over assessment of student learning may wish there were a higher education equivalent of the business world’s income statement. Absent such a development, board members may ask several questions of senior administrators to help determine the extent to which the institution is measuring educational results.

  1. Has the institution set specific objectives regarding what students should learn for each major program, general education category, and educationally related student-affairs program? Weasel words such as “appreciation,” “familiar,” or “awareness” can signal a lack of clear educational purpose. Instead, ask for specific learning objectives, and be sure to focus on what students should be learning, not how they are learning.
  2. Is there regular program-by-program collection and analysis of student-learning data? Are assessment instruments given only to the best students, to volunteer students, or to students of cooperative faculty? Are the assessment instruments credible? Are only the “easy” objectives being measured? Asking students to self-report can produce different results from those that directly measure learning.
  3. How have students changed in general education over two to three years? How many students reach an expected standard or competency level? Do students who have taken part in the student-affairs program score or rate higher than students who have not participated? Expect analyses of these data about student learning.
  4. Have program requirements or ways of delivering instruction or content in a course changed because students were not learning? Have resources been allocated or reallocated based on how well students have learned? Expect a description of the uses of the assessment data to improve programs or services.
  5. Are descriptions of what students learn based on data, rather than on unsubstantiated claims? Ask that regular reports about student learning be a part of the departmental and institutional annual reporting process.

Every college and university needs accurate information about quality, and there’s no question that some critics will continue to push for more intensive board oversight of student learning. To be sure, many will understand that measuring student learning is more difficult than counting dollars and that we’ll need to design more sophisticated assessment instruments in our efforts to measure complex ways of thinking. But asking for and requiring aggregated data about student learning is a responsibility of all of us who are part of the education enterprise. Will we pass the test?

Assessing Student Learning at James Madison University

The public debate over how to measure student learning as well as a long-standing desire to improve programs and services prompted creation in 1986 of a special unit to study assessment at James Madison University (JMU), a public institution in Harrisonburg, Va.

Today, the Center for Assessment and Research Studies (CARS) has seven doctoral-level faculty members who provide help to faculty and student-affairs staff across campus and offer master's and doctoral programs in assessment and measurement. Here are additional details about this effort:

Scope. Every JMU undergraduate and graduate program is required to furnish evidence of student learning, which is reported annually and which goes far beyond state requirements. Administrators of the five course groups within general education are required to report data supporting this evidence, and course sequences must generate positive data on student learning if they are to remain in the curriculum. Educational programs in the student-affairs category also report data about student learning from their various programs and services.

Assessment instruments. More than 90 percent of the tests and rating scales are designed by CARS faculty. CARS conducts extensive studies of reliability and validity before using these tests to gauge student progress or allocate resources. There are two main reasons CARS creates its own tests. Many tests such as those designed by the Educational Testing Service do not match our curriculum or report only one total score, when more diagnostic information is needed to make program changes. Also, no college-level test has nationally representative norms, which makes comparisons difficult.

Computer-based testing. CARS is creating many of its assessment instruments on the computer or moving existing ones to it. Computer delivery offers multimedia and “smart” testing capabilities, which allow novel (and entertaining) questions and branching. For example, we have a general education fine-arts test that displays artistic representations and music; part of our oral communication test displays video clips of speeches and small-group processes; and our American history and government test provides clips from speeches by Martin Luther King Jr. and President Franklin Delano Roosevelt as question prompts.

Our Information Literacy Test (ILT) displays test questions in the bottom half of the computer screen, but on the top half are Internet links to other electronic databases to help students answer the test questions. The ILT tests students’ ability to find, access, and evaluate the credibility of information sources stored electronically—skills we posit as basic for the first-year student as well as for enlightened citizenship.

Collecting assessment information. CARS assesses all first-year students and all late sophomores/early juniors in general education. For the latter group, we call off classes for one day in late February, when about 1,800 sophomores or juniors are assessed. That same day is used by many academic departments to collect learning information from graduating seniors.

Analyzing assessment information. CARS analyzes assessment data along three primary indicators:

  • Do student scores or ratings change over time (an outcome that allows an institution to claim it is adding value)? For example, what is the difference between the performance of entering first-year students and end-of-the-year sophomores in a given aspect of general education?
  • Is there an expected standard or competency level for students who have completed the respective program? How many students meet that standard?
  • Is there a relationship between the educational program or service and the outcome? Do students who have completed a particular program at the mid-point of their undergraduate program score or rate higher than students who have not yet completed that particular part of the program? Are outcomes related or correlated to grades?
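As a rough illustration of the three indicators above, the following Python sketch computes each one on a handful of made-up records. It is not CARS's actual method: the data, the cut score, and the function names are all hypothetical, and the last indicator is reduced to a simple difference in average scores. The question about correlation with grades is not covered.

```python
from statistics import mean

# Hypothetical records, invented for illustration only:
# (student_id, first_year_score, sophomore_score, completed_program)
records = [
    ("s1", 52, 61, True),
    ("s2", 47, 58, True),
    ("s3", 60, 59, False),
    ("s4", 55, 70, True),
    ("s5", 49, 50, False),
]

STANDARD = 60  # hypothetical competency cut score


def mean_gain(rows):
    """Indicator 1: average change from the first-year to the sophomore score."""
    return mean(post - pre for _, pre, post, _ in rows)


def share_meeting_standard(rows, standard):
    """Indicator 2: fraction of students whose later score meets the standard."""
    return sum(1 for _, _, post, _ in rows if post >= standard) / len(rows)


def completer_gap(rows):
    """Indicator 3: average later score of program completers minus non-completers."""
    completers = [post for _, _, post, done in rows if done]
    others = [post for _, _, post, done in rows if not done]
    return mean(completers) - mean(others)


print("Mean gain:", mean_gain(records))
print("Share meeting standard:", share_meeting_standard(records, STANDARD))
print("Completer gap:", completer_gap(records))
```

A raw gap between completers and non-completers does not by itself show that the program caused the difference, which is one reason the instruments' reliability and validity need to be studied before such numbers are used in decisions.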

Use of assessment results. One hallmark of any successful assessment program is that results data are put to use, rather than being collected merely for compliance purposes. Two examples of our uses: First, our former general education program, then called “liberal studies,” was replaced with an almost entirely new initiative in 1997 after data showed mixed success. A second example is the use of assessment data in external academic program reviews. Some of these reviews led to discontinuing selected majors and certification programs, the development of new interdisciplinary minors, the restructuring of majors into minors, and the migration of programs and faculty to different departments.

Training leaders in assessment. Institutional assessment efforts often fail because of the people running them. In some cases, assessment coordinators do not receive adequate support, and in other cases, the will is there, but the expertise is not.

CARS has created master's and doctoral programs in assessment and measurement to train students essentially in the why, what, and how. The why encompasses knowledge of the public-policy reasons assessment is required. Some educators may think accountability is a fad. But as trustees know well, calls for accountability come from taxpayers, government officials, educational oversight groups, business and industry, and consumers. Accountability is a continuing demand.

The what component of JMU’s graduate curriculum includes the objectives and constructs we should be assessing. Some knowledge and skills are straightforward, but some important areas are not as clear. For example, in assessing whether an institution is producing lifelong learners, the typical reaction is to survey alumni about continuing education. By contrast, CARS is developing a “curiosity index,” or measure of one’s intrinsic motivation to learn. This index will have two scores: breadth (interest in a variety of learning activities) and depth (the desire to pursue a sustained investigation).

The how component is a more widely understood area involving measurement and statistics. Developing programmatic measures of student learning means careful study and revision of assessment instruments to reduce measurement error and to ensure that the instruments measure intended educational purposes. In addition, a professional communication course is required because assessment coordinators should interact with a variety of audiences.

One pitfall of the assessment field is the preponderance of jargon and everyday words that have a special meaning to those in the field. To help trustees penetrate the vocabulary, the CARS Web site, www.jmu.edu/assessment, offers a user-friendly search tool that provides clear definitions for common terms.—T.D.E.