Time-on-Task Evaluation
©1999 Eileen Bonine

edited 4/14/01

It has been demonstrated that increased time spent on learning activities yields increased learning, provided that the teacher is competent and the learning activities are effectively designed and implemented (Brophy, 1988). This should come as no surprise. Two elements of time spent, as described by Levin & Nolan (1996), are the time allocated to teaching a subject and the time students spend actively engaged in learning. The concept of "time-on-task" was derived as a measure of the latter. Few teachers would dispute that one of their primary objectives is to keep the class on task as much as possible, particularly since the time allocated to teaching is eroded by administrative needs, announcements, and other interruptions. A method for assessing time-on-task in a classroom setting is discussed below.

When evaluating a class for "time-on-task," the observer scans the classroom, noting and recording individual student behavior at regular time intervals. The experience and skill level of the observer will help determine the observation interval, but an interval of approximately 5 seconds between observations is suggested. To avoid bias, the observer must take a random sample of student behavior by selecting students from all areas of the classroom. A sampling plan, indicating an order in which to observe individual students, is provided to assist the observer.
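A sampling plan of the kind described might be generated as follows. This is a minimal sketch: the seating grid, the one-time shuffle, and the cycling through seats are my assumptions for illustration, not the plan actually provided with the method.

```python
import random

# Hypothetical 4-row by 5-column seating grid; each seat is (row, column).
seats = [(r, c) for r in range(4) for c in range(5)]

# Shuffle once so that seats from all areas of the room appear throughout
# the observation order, rather than sweeping row by row.
random.shuffle(seats)

def observation_order(n_observations):
    """Yield the seat to observe at each interval, cycling through the plan."""
    for i in range(n_observations):
        yield seats[i % len(seats)]

# With a 5-second interval, the first few observations would be:
for interval, seat in enumerate(observation_order(5)):
    print(f"t={5 * interval}s: observe seat {seat}")
```

A fixed, pre-shuffled order also makes two observers' sessions comparable, since both can follow the same plan.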

Students are judged to be on-task, misbehaving, or doing nothing. The observer selects one of these three descriptions of the student's behavior and records either a letter T (on-task), a letter B (misbehaving), or nothing (not on task, not misbehaving). At the end of the observation session, the data are tallied and a percent time-on-task score is calculated. In order to accurately assess time-on-task, the observer must be able to clearly distinguish between these three behaviors. In certain learning situations, the distinction may be difficult to make. When a student is sitting quietly, who can really determine whether or not he is on task? If the student is thinking about or processing the subject material, formulating a question or an answer, or simply listening and absorbing, he may be judged to be doing nothing when he is in fact on-task and actively learning. The five-second sampling interval requires the observer to make a snap decision without benefit of careful study.

The calculation of time-on-task is made by dividing the number of on-task observations by the total number of observations. Should the "nothing" data points be excluded from the total? This bears careful consideration. The number of these null points, of course, has a bearing on the decision. A data set with very few null points will not be greatly affected either way, but a large number of null points can sway the on-task percentage significantly. If the objective of the evaluation is to determine time spent effectively on learning activities, and the observer confidently assigns the null value to mean "not on task, not misbehaving", then the points should be included. Excluding them will give a falsely high on-task rating. If the observer cannot confidently determine that the student is not on task, the points should be excluded.
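The two ways of handling null points can be made concrete with a small sketch. The T/B/blank codes are from the procedure above; the sample observations themselves are invented for illustration.

```python
# Each observation is 'T' (on-task), 'B' (misbehaving), or None
# (neither on task nor misbehaving). Sample data are hypothetical.
observations = ['T', 'T', None, 'B', 'T', None, 'B', 'T', 'T', None]

on_task = observations.count('T')
misbehaving = observations.count('B')

# Null points included: divide by all observations.
pct_included = 100 * on_task / len(observations)

# Null points excluded: divide only by the T and B observations.
pct_excluded = 100 * on_task / (on_task + misbehaving)

print(f"On-task, nulls included: {pct_included:.0f}%")   # 5/10 = 50%
print(f"On-task, nulls excluded: {pct_excluded:.0f}%")   # 5/7 = 71%
```

Note that excluding the nulls raises the on-task figure from 50% to 71% for this sample, illustrating why exclusion inflates the rating whenever the observer is confident the null points mean "not on task."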

I had the opportunity to practice the time-on-task observation process in a recent teaching simulation. The teacher was attempting to have the class act out a scene from Romeo and Juliet. The students mostly stood clustered around the teacher, occasionally wandering off on their own. They often had their backs turned, making it difficult to judge their behavior with their faces obscured. If they were talking or laughing among themselves or otherwise clearly misbehaving, there was no problem assessing them, but if they were silent there was no way to tell with their backs turned if they were paying attention and on-task. I had difficulty randomizing the observations. I had a tendency to get caught up in whatever action was taking place, and would either suspend my observations temporarily or focus on one cluster of students at the expense of those on the fringe of the activity. This perhaps skewed my observations, resulting in an incorrectly high measure of misbehavior, but I can't be sure that the other students weren't also off-task as I was.

The behavior table is attached in Appendix A. The table was expanded to hold the complete data set. The data are summarized below:

On-task observations:
Misbehaving observations:
Neither on task nor misbehaving:
Total observations:
% on-task (null points included):
% on-task (null points excluded):

In this case, the null points had little effect on the overall assessment. Looking at the data table in Appendix A, one can see that the misbehavior tended to be infectious. A few on-task ratings tended to be followed by a longer string of misbehaviors. Particularly because the students were standing around in an informal cluster, they had a tendency to get drawn into what was going on in their vicinity. Their close proximity to one another made it easy for them to see each other's behavior and mirror it. Had the students been sitting separately at their desks, neatly lined up in rows facing forward, this might have been less of a problem. They might not have become engaged in the activity, however, resulting in similar off-task scoring. While the students were unruly, they seemed to be getting into the spirit of the playacting, and while they were off-task frequently, my sense is that they weren't really far off and could have been brought back in. The data show strings of on-task behavior as well as strings of misbehavior. Group dynamics tended to dictate behavior: with the exception of one student with a wandering-off problem, the students were either all engaged or all misbehaving.

I found the evaluation process to be highly subjective and uncomfortably imprecise. I was unable to follow the suggested randomization pattern, and simply did my best to scan the classroom fairly. An observer would need to gain a fair amount of experience before he could be confident in the reproducibility of his results. Two side-by-side observers in our simulation differed by more than 10 percentage points.

An acceptable score would seem to be situational. As observed in the Romeo and Juliet scenario, the classroom setting affected the scoring. If the data were used for trend evaluations, with a generous allowance for experimental error, the teacher might be able to use them to compare classroom behavior on different days, with different students, or perhaps the same class with a different teacher. The method provides one measure of classroom control and, when performed by a trained observer, could become a useful tool for a teacher wishing to monitor classroom performance. I would not be confident comparing the results obtained with this class against those obtained from another type of classroom situation or setting.


Brophy, J. E. (1988). Educating teachers about managing classrooms and students. Teaching and Teacher Education, 4(1), 3.

Levin, J., & Nolan, J. F. (1996). Principles of classroom management (2nd ed.). Boston: Allyn and Bacon.