Justice Through Testing

An earlier version of this paper appears in Educational Horizons, 85, 1 (Fall 2006), 44-55.

©Edward G. Rozycki, Ed. D.

edited 11/2/15

Rationale, or Dogma?

Over the last seventy years, increasing numbers of public school people in the United States have come to tell a story in which they express widely shared aspirations and deeply felt concerns. Let us call this story The Testing For Justice Rationale. It goes something like this:

For some children and not others to have their needs met by the schools is unfair. Justice therefore dictates that we meet the needs of all children. How do we determine those needs? By comparing what children can do with what they can learn to do. Any discrepancy between achievement and potential is an indicator of need. How do we determine such things as achievement and potential? By adequate testing. [1]

This rationale supports many well-intentioned attempts at upgrading American schools. But it is replete with questionable assumptions seldom examined even when repeated tries at improving schooling practice have failed.

Educational testing has long been noted to affect the lives not only of students, but of educators themselves.[2] Thus, an understanding of testing and the assumptions it is based on is indispensable to intelligent schooling practice. School testing can be critiqued on several grounds:

a. Are the tests technically adequate?

b. Are they fair?

c. Is the imposition of a testing process little more than an exercise of political power?

The Testing For Justice Rationale goes beyond answering these questions and burdens testing not only with the determination of need, but ultimately, of justice.

Why Bother with Testing?

Modern schooling, which processes large numbers of students, seems inconceivable without testing. That is because it is so convenient for sorting students. It can stand in for a long and involved set of social interactions with master teachers -- typical of an apprenticeship -- more common in schools before they grew to present-day sizes and pursued a philosophy of productive efficiency.

Why do teachers give tests? For several reasons. Among them are a) to support the authority of the teacher's judgment as to acquired learnings; and b) to substitute for an infeasibly broad examination of student ability. This convenience is so important in the mass processing carried out in today's schools that learnings which don't lend themselves to easy examination, e.g. with paper and pencil, find it hard to gain status in a curriculum. Goodson comments,

For the groups and associations promoting themselves as school subjects, and irresistibly drawn to claiming academic status, a central criterion has been whether the subject's content could be tested by written examinations for an able clientele.[3]

In testing, however, many crucial assumptions are made about means, ends and the causal connections between them. Achievement tests, for example, are not, in and of themselves, the point of instruction; otherwise, we would teach, not merely to the tests, but the very tests themselves. Nor is mere participation in course work thought sufficient to make testing unnecessary. Rather, what is sought in achievement tests are certain important residues of the instructional process.

Calling something a test assumes that there is a strong consensus on what the test is an indicator of. But when it comes to things as vague and controversial as human abilities, upon which a judgment of educational need might be based, interesting things happen. Tests come to stand in for controversial and pluralistic conceptions of human ability. Intelligence, for example, becomes what IQ tests measure. Or, on the other hand, the concept, say, of intelligence, itself, becomes a focal point of controversy.[4]

What Makes a Test a Test?

From the student's point of view every test is a task.[5] But not every task is a test, even if it looks like one. What conditions must a task satisfy, if it is to really be a test? This is a question of great practicality. State governments allocate funds to school districts on the basis of efficiency. This efficiency is determined by tests provided by state departments of education and imposed on local school districts. But what is necessary if this procedure is to be anything more than a charade?

To avoid overlooking assumptions built into our conception of testing, let's use a substitute concept: rank-task. We begin our investigation by talking about rank-tasks rather than about tests. A rank-task is a type of activity some outcomes of which can be ranked, as, for example, better, the same, or worse. Think of a rank-task as any procedure which assigns someone a number. This can be interpreted as a rank to compare that person to others involved with the procedure. Cinderella's Prince, looking to fit the glass slipper, would be undertaking such a rank-task. Some feet are too small; others, too large; only Cinderella's, just right. But trying to sort football players by the numbers on their jerseys is not a rank-task because there is generally no significance to the comparison of any two numbers other than that they indicate a different wearer.

Tests are, at the minimum, rank-tasks. They can be performed with more or less skill. But the skill demonstrated may not be what it is we wish to measure. For instance, students take SAT preparation courses to learn test-taking skills, not the information the tests are designed to measure. Often these test-taking skills can be as crucial to a good score as actually knowing the material covered in the test. For example, The Princeton Review has for some years provided materials and training in simple test-taking procedures which seem to be able to raise SAT scores significantly.[6] The SATs are intended to measure scholastic aptitude. But the effectiveness of the Princeton Review's materials suggests that the SATs are also measuring something else, namely the ability to take standardized tests of this type.

This observation illustrates the very practical nature of our seemingly theoretical observations about testing. Among the readers of this article are certain to be individuals who did not get a scholarship, or who failed to get into the college or university of their choice, because of the score they received on the SAT. And there is a fair chance that the reason they did not get a higher score was not that they lacked scholastic aptitude, but that they lacked certain test-taking skills.

Tests are also taken to be indicators. As such they must meet certain conditions. Any well-designed rank-task must be able to vary in a consistent manner upon reapplication, and the variation must be understood to make a difference. Test makers call this consistency, or internal validity. Tests must also indicate something other than themselves: this is called "external," or construct, validity.[7]

Usually out of the hands of professional test-makers is a fourth condition: trustworthiness -- we must be able to believe the results were not manipulated for special purposes. This is usually a matter of test security[8] and is not infrequently dealt with in a cavalier fashion in many schools.

The important thing, especially from the test-taker's point of view, is that every test is a task that can be performed with more or less skill, independently of any technical considerations of externality and trustworthiness. For example, a student may learn to do multiple choice exams very efficiently even if those exams do not test anything we can recognize as subject-matter. On the other hand, a student may know a great deal about something, yet be very bad at demonstrating that knowledge through the medium of the test prepared for it.[9]

If a rank-task is a test then the goals of the testing control (determine) the kind of test tasks we present to the student. These tasks in turn control the knowledge the student will have to bring to support the test task. The connections between student knowledge and the test outcomes used to evaluate it are mediated by the task itself. Whether an increase in test scores indicates an increase in student knowledge or an increase in test taking skill may depend on this mediation.

From Consensus, Through Testing, To Justice

Let's reiterate an important point: we -- as interested parties -- must agree upon some way of determining what knowledge a student has that is independent of the test; otherwise the test becomes problematic. Lacking such consensus on the test, evaluations of potential or achievement are questionable. So, then, is the determination of need and, consequently, of fairness.

Thus, in a very real way, it is problems of consensus that bear ultimately, via testing, upon perceptions of fairness in schooling. We can lay the argument out as follows:

a) Consensus -- among interested parties -- will affect which ideas of potential (e.g. native ability, capacity, competence) can be used for testing in the school.

b) Consensus will affect which ideas of achievement (e.g. skills acquired or developed) can be used for testing in the school.

We then bring in the connections given by The Testing For Justice Rationale:

c) The difference between potential and achievement measures need.

d) The difference in treatment of need measures justice.

The most immediately practical version of this argument, which we will call The Status Quo Argument, is this:

There is a consensus in our community that Group A and Group B differ in potential. We observe that they differ in achievement. Since their achievement merely reflects their potential, there is no disparity in educational need. Therefore, our present treatments of Groups A and B, although they may look different, are not unjust.

Despite the fact that the Status Quo Argument has been pressed into the service of racism and class bias, it is theoretically sound.[10] The moral issue is how the consensus supporting it arises. It is around such claims of consensus that many of the controversies about schooling cluster. (Consider, for example, the widely accepted assumption that so-called "gifted students" have no need for special educational treatments.)

Objectivity and Need

One assumption of much discussion about schooling practice is that testing offers us an objective way of making decisions that gets around problems of values and consensus. But is this so? Test data seems so impartial, so objective. But what can numbers alone tell us?

Imagine we have three groups of students, Groups 1, 2 and 3, and we give each student three rank-tasks: RankTask 1, RankTask 2 and RankTask 3. Suppose chart 1 gives us the following results -- assuming the group means to be calculable.

	RankTask 1	RankTask 2	RankTask 3
Group 1	90	90	60
Group 2	50	50	60
Group 3	15	15	60

Chart 1

Even if we can also assume that the differences between groups for each test are significant and that there has been no cheating, what are we to make of the differences in these scores? Are they any guide to practical decision?

It all depends. Our first question should be, "What are these tests supposed to indicate?" Unless they are believed to indicate something, they are still merely rank-tasks. And if these test results are to be important to making equitable schooling decisions, they must deal with what Thomas F. Green[11] has called educationally relevant attributes.

An attribute is educationally relevant in Green's terms, if it would be fair to distribute schooling benefits on the basis of that attribute. If we believed it was fair, for example, for males to get more diplomas than females just because they were males, then sex would be an educationally relevant attribute.

In the United States, sex is, by law, not educationally relevant in public schools. In other cultures, it is considered to be so. Let us imagine a society that is so fixated on gender stereotypes that physiological distinctions are overridden by psychological ones. To the extent that a female is seen as a "tomboy," she is given preference with "real men" over other females. "Girly men" are devalued. In this society, the test that decides who enjoys the privileges of gender prejudice is called the "Degree of Masculinity Test."

In chart 1, suppose Test 1 (RankTask 1) indicated something like "degree of masculinity," (DMT). If Test 2 (RankTask 2) indicated percentage of high school graduates in the group, we, in the United States, would find that it indicated an unjust situation, because we reject gender as educationally relevant. But if Test 3 (RankTask 3) stood for percentage of high school graduates, it would be taken, on the same assumption of the irrelevance of gender, to be an indicator of equitable schooling practice.
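The reasoning just given can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name verdict and its attribute_is_relevant flag are hypothetical, and the flag stands for the community's consensus about educational relevance, which has to be supplied from outside because it cannot be computed from the scores.

```python
# Group means copied from Chart 1. Reading them as graduation rates is the
# supposition made in the text, not a fact about any real school.
rank_task_2 = {"Group 1": 90, "Group 2": 50, "Group 3": 15}
rank_task_3 = {"Group 1": 60, "Group 2": 60, "Group 3": 60}


def verdict(benefit_by_group, attribute_is_relevant):
    """Hypothetical helper: judge how a schooling benefit is distributed.

    attribute_is_relevant encodes consensus about the grouping attribute;
    nothing in the numbers themselves determines it.
    """
    benefit_differs = len(set(benefit_by_group.values())) > 1
    if not benefit_differs:
        # Equal shares across groups: equitable when the grouping attribute
        # is irrelevant. The text does not discuss the other case.
        if attribute_is_relevant:
            return "not addressed in the text"
        return "equitable"
    # Unequal shares are unjust only if the grouping attribute is irrelevant.
    return "not unjust" if attribute_is_relevant else "unjust"


print(verdict(rank_task_2, attribute_is_relevant=False))  # unjust
print(verdict(rank_task_3, attribute_is_relevant=False))  # equitable
print(verdict(rank_task_2, attribute_is_relevant=True))   # not unjust
```

The same column of numbers yields opposite verdicts once the flag is changed, which is the sense in which the scores alone are no guide to decision.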

More Educationally Relevant Attributes

Chart 2 shows attributes in terms of which people might be grouped compared with different kinds of schooling benefits. In each block the words just or unjust indicate whether there is a general consensus in the U.S. that any schooling benefits distributed on the basis of the indicated kinds of attribute are considered just. Question marks indicate controversial practices.[12]

Attribute (rows) / Benefit (columns)	H. S. diplomas	Access to further schooling	Knowledge per se	Playing Varsity Sports	Nurturance	Special Programs
SEX	unjust	unjust	unjust	?	unjust	just
RACE	unjust	unjust	unjust	just	unjust	just
HEIGHT	unjust	unjust	unjust	just	unjust	just
ABILITY	just	just	just	?	unjust	just
EFFORT	?	?	just	just	unjust	just
CHOICE	just	just	just	just	?	just
NEED	unjust	unjust	unjust	unjust	?	just
WEALTH	?	unjust	just	?	unjust	just
DISABILITY	just	just	?	just	unjust	just
POTENTIAL	just	just	just	just	unjust	just
ACHIEVEMENT	just	just	just	just	unjust	just

Chart 2. Benefits distributed according to attribute

What the chart indicates is that in different situations an attribute may be educationally relevant or it may not. Consider the case of sexual groupings and varsity sports. Sex is generally not considered a relevant attribute so far as any educational benefit is concerned. It is unjust, for example, to distribute high school diplomas on the basis of sex. But participation in varsity sports is another matter. There is sometimes controversy about allowing women to play football, particularly in public high schools. Our chart indicates this with a question mark. (Imagine how chart 2 would look if it reflected the common opinions of the US in 1800.)

Choice is an important and controversial attribute in our culture. If adults choose not to participate in certain programs, for example, it is generally not thought to be unjust if they fail to gain the benefits those programs offer. But if children or mentally incompetent people choose not to participate it is often taken as a sign of immaturity or incompetence. Truancy is an example. Significantly, in the case of truancy the lack of consequent benefits is still often argued to be unjust, even though this insinuates that coercion may be justified. (This sense of injustice no doubt supports compulsory schooling statutes.)

Other controversial practices suggested by the chart are:

a. allowing students to play varsity sports on the basis of choice (interest) rather than ability (a long-established practice at Swarthmore College);

b. social promotion, promoting students on the basis of effort rather than knowledge;

c. providing nurturance, a scarce resource, on the basis of need rather than following traditional practices such as sharing on a per capita basis (this is called Special Education);

d. providing diplomas and sports participation on the basis of wealth. (This is an important service of some kinds of private schooling.)

Needs and Consensus

Embedded in The Testing for Justice Rationale is an interesting equation:

Ability - achievement = need

Read this "Ability minus achievement equals need" or "The measure of need is indicated by the difference between ability and achievement."

On the basis of this equation students are often sorted into three types: underachievers, normal and overachievers. Chart 3 shows some hypothetical scores for tests of ability and achievement. Using the equation given above, need is calculated. On the basis of need, students are typed as overachievers, normal or underachievers.

	Ability	Achievement	Need	Type
Group A	50	95	-45	Overachiever
Group B	50	50	0	Normal
Group C	50	15	35	Underachiever

Chart 3
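The arithmetic behind Chart 3 is simple enough to sketch in Python. The function names are our own, and the cut-off at zero is an assumption: the chart shows only that a need of 0 counts as normal, not how small differences would be treated in practice.

```python
def need(ability, achievement):
    """Need as the Rationale's equation defines it: ability minus achievement."""
    return ability - achievement


def student_type(need_score):
    """Assumed typing rule: positive need means underachiever, negative means overachiever."""
    if need_score > 0:
        return "Underachiever"
    if need_score < 0:
        return "Overachiever"
    return "Normal"


# Hypothetical (ability, achievement) scores copied from Chart 3.
chart_3 = {
    "Group A": (50, 95),
    "Group B": (50, 50),
    "Group C": (50, 15),
}

for group, (ability, achievement) in chart_3.items():
    score = need(ability, achievement)
    print(group, score, student_type(score))
```

Running the sketch reproduces the Need and Type columns of the chart. It shows how mechanical the calculation is: everything of consequence lies in what the two input scores are taken to indicate.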

So it is argued that underachievers have greater educational needs. And numbers make it seem so objective.

Vague formulas like the one above guide a surprising amount of daily school practice and can be discerned in the rationales offered for such practice.[13] They express not only accepted generalizations from practice but also conceptions of human nature. Their usefulness is not that they provide exact measures of important pedagogical constructs, but that they can so readily guide practice.

But do they really identify needs? It depends on what we mean by needs.[14]

In schooling, needs have long been treated as though they were independent of consensus. But underlying much talk about needs is the assumption that something should be desired. When someone calls something a need, he is often urging action to address it while begging the crucial question of why we should address it.

We can distinguish between two conceptions of need: a conditional concept and an approval concept. A clear picture of this distinction can be gotten by comparing the following situations.

Situation 1: Johnny asks to borrow a magic marker from you. "I need it to write graffiti on the boys' room wall," he explains.

Situation 2: Mark tells you, "I need a magic marker to do my school art project."

We would deny that Johnny needs a magic marker, but concede that Mark needs one. Why? Because we do not approve of graffiti but we value Mark's art project. If our values were different, our assessment of needs would be different.

The conditional concept of need says merely that some item X is necessary to bring about some other item Y. The magic marker stands in this relation to graffitiing the wall just as it does to doing the art project. In the conditional sense both Johnny and Mark have needs, just as cars need fuel or terrorists need explosives. A conditional need indicates, at most, a lack. But lacks do not necessarily beg for remediation.

Talking about needs in schooling is an attempt to transform an objective, take-it-or-leave-it conditional need into a need that moves us to support without careful consideration. The common technique is to show that there is a lack of some kind, and then to treat that lack as synonymous with an approval concept of need. A typical instance goes something like this:

Researchers working for one or another special interest group announce with alarm that there is a great need to emphasize classical antiquity in the high school curriculum because 97% of the five thousand high school seniors surveyed nationwide could not identify Achilles, the Acropolis, Adonis, Aeneas, the Aeneid and several dozen similar items.

If the research has been done properly, it does demonstrate that there is a lack among high school seniors. But it does not demonstrate that we should do anything about it. That is an entirely different matter.

We are not disparaging needs-slogans, but merely reiterating the point that needs-slogans assume and obscure issues of value and consensus. If, for example, people do agree on the value of "self-fulfillment" and what it means, then what they believe to be a causal or logical necessity to achieve self-fulfillment will probably be approved of also. But an even more important consideration is this: where people appear to be unmoved by appeals to needs, this may not be a matter of heartlessness, but rather a disagreement over values or over beliefs in what is causally or logically related.

Examining the Rationale

Suppose candidates for school positions -- teachers, principals or superintendents -- were asked to comment on the Testing for Justice Rationale during their employment interview. I would wager that were they to disavow or deny it, they would be denied employment. Most likely, surreptitiously denied employment -- moved to the bottom of the list -- since educators like to flatter themselves that they are open to diversity in philosophy as well as race, religion, ethnicity, disability or sexual preference. (And lawsuits are expensive.)

But too many schools adopt such slogans as "All children can learn," or "We are dedicated to excellence" -- and importune their staff to accept them, at least to the level of lip-service. This does not leave much wriggle room for those who find the Testing for Justice Rationale to be presumptuous of Omnipotence almost to the point of blasphemy.

Actually, if we analyze the Testing for Justice Rationale, we can see just where issues of value versus issues of power can be distinguished. By doing so we may achieve a consensus on important values without necessitating commitment to a possibly counterfactual optimism expressed in the Rationale.

The Rationale is:

For some children and not others to have their needs met by the schools is unfair. Justice therefore dictates that we meet the needs of all children. How do we determine those needs? By comparing what children can do with what they can learn to do. Any discrepancy between achievement and potential is an indicator of need. How do we determine such things as achievement and potential? By adequate testing.

Is it really unfair for some children and not others to have their needs met by the schools? Does the concept of readiness -- so important to reading teachers -- not indicate that we recognize that some children may be such that the school can meet their needs better than others? And even if there is unfairness here, must it be the school that is responsible to address it? Does Justice dictate this? Or is it that other institutions in our society have foisted this off on the schools?[15]

Ought we, as educators, accept responsibilities beyond the reasonable scope of our knowledge? Less blather about "determining potential" and more humility might work out in the long run to enhance our professional repute to a greater extent than our posing as Medicine Men for All Things Academic.

But if we are to accept such responsibilities, can we expect to be given reasonable resources to support our efforts? So far as funding is concerned, Special Education has been reneged on since its inception. Do we really expect a more generous flow from the public coffers in the future?

Let us recognize that testing is a side issue. Tests are constructed after most of the important issues of value, ethics and politics that impinge upon schooling have been settled. This is why private and parochial schools seldom are consumed with the furor, the enthusiasms and the dismay that testing brings to public education.

We may well continue to be concerned with the inequities we perceive in our society. We may well continue to pursue a dream of alabaster cities' gleam, undimmed by human tears. If so, we might do better to look elsewhere than to public education to address our aspirations.