The High-Stakes Testing Scam Revealed At Last

What if I told you the high-stakes testing American children have been going through is a complete and utter scam? Many would say they already knew that, but would they be able to tell you how they knew this? Probably not. At least not at the levels our state Departments of Education developed with the many testing companies such as American Institutes for Research, Pearson, and the Smarter Balanced Assessment Consortium.

The Delaware Department of Education put out a Request for Proposal for our new Social Studies State Assessment. The actual RFP is a treasure trove of testing information. For starters, the Delaware Department of Education is flat-out lying in its RFP. Last year, the Delaware DOE put out its “Delaware School Success Framework”. This is essentially Delaware’s report card for schools. Included in this horrible accountability testing machine are participation rate penalties for schools that fall below a 95% participation rate on the state assessments. The Delaware DOE and State Board of Education tried passing an updated version of Delaware’s regulation regarding school accountability, but many parents and education organizations balked and successfully blocked the State Board of Education from passing it. As a result, even though the Delaware State Board of Education eventually passed the Delaware School Success Framework, there is no regulatory power behind it. But that didn’t stop the Delaware DOE from making it look perfectly legal in its RFP for the new Social Studies state assessment.

One of the first things the DOE calls for from a potential vendor for this test is understanding of and the ability to put the Rasch Scoring Methodology into the test. What is this Rasch the Delaware DOE has? It is an all-consuming itch to trip up kids and schools and parents. This is part of the underbelly of state testing that no one talks about. The website appropriately titled explains the Rasch Scoring Methodology as this:

What is a Rasch Analysis? The Rasch model, where the total score summarizes completely a person’s standing on a variable, arises from a more fundamental requirement: that the comparison of two people is independent of which items may be used within the set of items assessing the same variable. Thus the Rasch model is taken as a criterion for the structure of the responses, rather than a mere statistical description of the responses. For example, the comparison of the performance of two students’ work marked by different graders should be independent of the graders.

In this case it is considered that the researcher is deliberately developing items that are valid for the purpose and that meet the Rasch requirements of invariance of comparisons.

Analyzing data according to the Rasch model, that is, conducting a Rasch analysis, gives a range of details for checking whether or not adding the scores is justified in the data. This is called the test of fit between the data and the model. If the invariance of responses across different groups of people does not hold, then taking the total score to characterize a person is not justified. Of course, data never fit the model perfectly, and it is important to consider the fit of data to the model with respect to the uses to be made of the total scores. If the data do fit the model adequately for the purpose, then the Rasch analysis also linearises the total score, which is bounded by 0 and the maximum score on the items, into measurements. The linearised value is the location of the person on the unidimensional continuum – the value is called a parameter in the model and there can be only one number in a unidimensional framework. This parameter can then be used in analysis of variance and regression more readily than the raw total score which has floor and ceiling effects.

Many assessments in these disciplines involve a well defined group of people responding to a set of items for assessment. Generally, the responses to the items are scored 0, 1 (for two ordered categories); or 0, 1, 2 (for three ordered categories); or 0, 1, 2, 3 (for four ordered categories) and so on, to indicate increasing levels of a response on some variable such as health status or academic achievement. These responses are then added across items to give each person a total score. This total score summarises the responses to all the items, and a person with a higher total score than another one is deemed to show more of the variable assessed. Summing the scores of the items to give a single score for a person implies that the items are intended to measure a single variable, often referred to as a unidimensional variable.

The Rasch model is the only item response theory (IRT) model in which the total score across items characterizes a person totally. It is also the simplest of such models having the minimum of parameters for the person (just one), and just one parameter corresponding to each category of an item. This item parameter is generically referred to as a threshold. There is just one in the case of a dichotomous item, two in the case of three ordered categories, and so on.

Now this has a lot of lingo I didn’t quite get.  But the important part about understanding the Rasch Methodology of Scoring is that ALL items must be the same.  This is NOT what is going on currently.  With Smarter Balanced, PARCC and other state assessments, the testing companies have developed what is called a Partial Matrix of Items.  What this means is that a portion of the state assessment is the same for everyone.  But the remaining portion comes from a bucket of different test items submitted for these tests.  In partial matrix testing theory, the similar content shared by all could be anywhere from 20-30% of the items on the test.  The rest varies based on what is in the bucket.  What this means is this shocking find: students aren’t taking the exact same state assessment.  For Smarter Balanced test-takers, the tests aren’t the same.  The same for PARCC as well.
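To make the partial matrix idea concrete, here is a minimal sketch of how two test forms could be assembled from a shared block of common items plus a random draw from a bucket. Everything in it is an assumption for illustration: the item IDs, the 10 common items, the bucket of 60, the 40-item form length, and the `build_form` helper are all hypothetical, not taken from the RFP or from any vendor's actual process.

```python
import random

def build_form(common_items, pool, form_length, rng):
    """Return one test form: every common item plus random items from the pool."""
    n_extra = form_length - len(common_items)
    if n_extra < 0 or n_extra > len(pool):
        raise ValueError("form_length is incompatible with the item counts")
    return list(common_items) + rng.sample(pool, n_extra)

# Hypothetical item IDs: 10 common items and a bucket of 60 others.
common = [f"C{i:02d}" for i in range(10)]
bucket = [f"B{i:02d}" for i in range(60)]

rng = random.Random(0)  # fixed seed so the example is repeatable
form_a = build_form(common, bucket, 40, rng)
form_b = build_form(common, bucket, 40, rng)

shared = set(form_a) & set(form_b)      # items both students see
common_share = len(common) / len(form_a)  # 10 of 40 items = 25% common

print(f"common share of each form: {common_share:.0%}")
print(f"items shared by the two forms: {len(shared)} of {len(form_a)}")
```

With these made-up numbers, a quarter of each form is guaranteed to be identical, and anything else the two forms share is down to chance in the draw.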

The truly frightening part about this is the probabilities with Partial Matrix. If a student is a high achiever, the probability they will get a correct answer is above .5 on each item’s scale. If they aren’t a high achiever and struggle, the probability drops below .5 on the scale. So these tests are designed so roughly half get an item right and half get it wrong. But if kids aren’t taking the same exact test, where all the items after the “common” items change, that throws the whole model out of whack. The testing companies know this. Our state DOEs know this. The US DOE knows this. Chances are many corporate education reform companies, politicians, and even some school Superintendents know this. Any testing coordinator in a school district or charter school should know this.
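The .5 figure can be seen in the standard formula for the dichotomous Rasch model, where the chance of a correct answer depends only on the gap between a student's ability and an item's difficulty. The sketch below uses that textbook formula. The ability and difficulty values are hypothetical numbers on the logit scale, chosen only to show where the probability crosses .5.

```python
import math

def rasch_probability(ability, difficulty):
    """Probability of a correct answer under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# Hypothetical values on the logit scale, for illustration only.
item_difficulty = 0.0
for ability in (-1.0, 0.0, 1.0):
    p = rasch_probability(ability, item_difficulty)
    print(f"ability {ability:+.1f}: P(correct) = {p:.3f}")

# ability -1.0: P(correct) = 0.269
# ability +0.0: P(correct) = 0.500
# ability +1.0: P(correct) = 0.731
```

A student whose ability sits exactly at the item's difficulty has an even chance. Above it the probability climbs past .5, and below it the probability falls under .5.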

This is also why opt out throws the whole scheme into disarray.  If too many “smart kids” opt out, it will change that whole .5 probability.  If too many struggling kids opt out, the test scores will be very high.  The testing companies love this model because it furthers the whole standardized testing environment which gives them lots of money.  With this model, schools fail and schools succeed.  It really is based on the socio-economic demographics of any given school.  This explains why the 95% participation rate is the desired outcome.  With a school of 1000 kids, 950 kids taking the test isn’t going to skew the results too much.  But once you get below that level, that .5 probability begins to shift in either direction.  None of these testing advocates care if the kids are proficient or not.  They already know, for the most part, exactly how it is going to turn out.  That’s when the real work and potential manipulation can occur.
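The arithmetic behind the 95% threshold can be sketched with a hypothetical school of 1,000 students. The 50% starting proficiency rate and the `observed_rate` helper are assumptions made up for this example. The code only shows the most extreme cases, where every student who opts out is proficient or none of them is.

```python
def observed_rate(enrolled, proficient, opt_outs, opt_out_proficient):
    """Proficiency rate among the students who actually take the test."""
    tested = enrolled - opt_outs
    if tested <= 0:
        raise ValueError("no students tested")
    return (proficient - opt_out_proficient) / tested

enrolled, proficient = 1000, 500  # hypothetical school, 50% proficient

for opt_outs in (50, 200):
    # Extreme cases: every opt-out is proficient, or none is.
    low = observed_rate(enrolled, proficient, opt_outs, opt_outs)
    high = observed_rate(enrolled, proficient, opt_outs, 0)
    print(f"{opt_outs} opt-outs: rate between {low:.1%} and {high:.1%}")

# 50 opt-outs: rate between 47.4% and 52.6%
# 200 opt-outs: rate between 37.5% and 62.5%
```

At 95% participation the reported rate can move by less than three points in either direction. At 80% participation the same school could be reported anywhere from 37.5% to 62.5% proficient.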

In Delaware, students don’t take the Smarter Balanced Assessment at the same time. There is a three-month testing window. Some schools begin in the first week of March whereas others may not start until May. How do we know, with 100% certainty, that companies like our testing vendor, American Institutes for Research, aren’t looking at that data constantly? How do we know they aren’t able to ascertain which questions have a higher or lower probability of being answered correctly once students start taking the test? How do we know the testing gurus at our state DOEs aren’t in constant contact with the testing companies and aren’t able to determine ahead of time which testing items in the “non-common” partial matrix to send to different schools, or even certain grades?

For example, say a state really wants to have a particular school show phenomenal “growth” in proficiency scores from one year to the next. This could be a charter school. While the overall proficiency rate isn’t phenomenal, the growth could be. As a result, more parents could be wowed by this school and might be more apt to send their children there. It could flip around another way. Say a state DOE really is just sick of a particular district and wants more charters in that area. The best way to make more charters is to show more failing traditional schools. Even some charters could be expendable. Another one might want to expand its enrollment and have more influence and pull than other ones. With current accountability regulations (and more to come under ESSA), this allows states to continue labeling and shaming certain schools. The reality is these assessments can be molded into any shape a state might want if they are able to interact with the testing vendor and determine which items go to which school. This is a worst-case scenario for an already bad test.

While state DOEs brag about the computer-adaptability of these tests and how it will “work with the student”, this is the most egregious part of the whole modern-day standardized testing scheme. This “adaptability” disguises the true intent: different items on the tests for different students. Even if students talk about particular items on the test, the adaptability prevents them from having the same items on the test. It is an ingenious scheme.

For teachers, some could be guided towards certain directions by the state DOEs for where to go with curriculum.  Others could be guided in the wrong direction which will ultimately change the results of these assessments.  It is the grandest illusion of them all.  The state DOEs will say “we have advisory committees.  Teachers pick the items for the test.”  I’m sure they do.  And I’m also sure there are plants on those committees.  Ones that wind up working with certain state foundations, state DOEs, or other corporate education reform companies.  It sounds so shady, doesn’t it?  How much of a soul has to be sold to make more money or climb up the corporate education ladder?

While all of this may have your head reeling, try this on for size: what happens when competency-based education becomes the next “thing”? When digital personalized learning becomes the norm and all these state assessments become broken down into mini-standardized tests? Instead of those 7-10 days when students are hogging up all the bandwidth in the school and teachers most likely lose a lot of hair, the tests will be shorter. They will become end-of-unit assessments. Teachers won’t even need to worry about administering their own end-of-unit assessment because Smarter Balanced and PARCC already did all the work! How convenient. Not only did our states reduce testing time, but also teachers’ time and effort. A true cause for celebration. And parents won’t even be able to opt their kids out of these tests because most of them most likely won’t even know their kid is testing, and their classroom grades will be based on their digital personalized learning work and their competency-based education high-stakes mini-test. We know Delaware is leaning towards this testing model because Delaware Secretary of Education Dr. Steven Godowsky mentioned this during our last Assessment Inventory Committee meeting back in May.

Meanwhile, back at the state DOE, they are getting all this data.  They are getting it from their vendors like American Institutes for Research, or Questar, or Pearson.  Other companies want to see it so they can work on a report about how to fix our schools.  Our state DOEs actually pay them to do these reports.  Through contracts and extensions of contracts.  Yes, only the student identifier code goes out.  These testing companies really don’t care about who the student is, just what they can extrapolate from the data.  But then that information comes back to the state.  The state knows who that student identifier belongs to.  For example, Student ID # belongs to John Johns at Delaware Elementary School.  Based on the information from all that data, they can easily paint a picture of that student.  Based on the scores, how long it took them to take the test, how they answered responsive questions… all of this allows them to track.  So much so they can determine, based on other algorithms and matrices, exactly what career path John Johns is heading towards.  Perhaps we should guide him towards that culinary program.  Or maybe Bio-technology pathways.  Or maybe poor John Johns won’t ever advance past a welder position.  FERPA guidelines allow state DOEs to actually do this.

Want to know who always loses in these testing games? Students with disabilities. They may receive accommodations but they never get the one accommodation they need the most. For regular classroom tests, IEP teams frequently agree on a student not taking every single test question. Maybe 1/2 or 3/4 of the questions. Standardized tests don’t allow for that. The answer is always the same: they will get more time. What they fail to understand is what “more time” means to these students. It means more time focusing on the same task: taking a test. What are their regular peers doing when these kids are getting “more time”? They are learning. Receiving instruction. Getting ahead. Students with disabilities are, yet again, put in a position where they will fall further behind.

We all knew our kids were guinea pigs for these tests.  We just didn’t know how much.  The time to opt out of these tests, no matter what the circumstances might be, is now.  Not later, not tomorrow.  Now.  Today is your opt out day for your child.

Below is the RFP for Delaware’s Social Studies state assessment.  I’ve gone through this and highlighted key wording and troubling aspects which I will write more about tonight or tomorrow.  Don’t be fooled by the DOE’s statements of assurance in this.  I have no doubt their legal team went through it very carefully.  But I’m fairly certain they didn’t expect a citizen to go through it and dissect it like I did…