Applied Psychological Measurement

 

Applied Psychological Measurement, Vol. 32, No. 1, 11-26 (2008)
DOI: 10.1177/0146621607311560

Investigating the Population Sensitivity Assumption of Item Response Theory True-Score Equating Across Two Subgroups of Examinees and Two Test Formats

Alina A. von Davier

Educational Testing Service, avondavier{at}ets.org

Christine Wilson

Educational Testing Service

Dorans and Holland (2000) and von Davier, Holland, and Thayer (2003) introduced measures of the degree to which an observed-score equating function is sensitive to the population on which it is computed. This article extends the findings of Dorans and Holland and of von Davier et al. to item response theory (IRT) true-score equating methods that are commonly used in the nonequivalent-groups anchor test (NEAT) design. Using data from the Advanced Placement Program Calculus AB exam, which contains multiple-choice (MC) and free-response (FR) sections, the authors investigate the population sensitivity of the IRT equating functions computed for the MC section alone and for the MC and FR sections together. The degree of population sensitivity is also compared across three equating methods: IRT true-score equating and two observed-score methods, chained equipercentile equating and Tucker linear equating.
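The abstract compares equating functions computed in subgroups against the function computed in the whole population. The Python sketch below illustrates the two ingredients under stated assumptions. It is not the authors' code. It first maps a form-X true score to the form-Y scale with IRT true-score equating. This assumes 3PL item parameters for the multiple-choice items that are already on a common θ scale (for example, after Stocking-Lord linking) and a scaling constant D = 1.7. Free-response items, which would need a polytomous model such as the generalized partial credit model, are not handled. It then computes the RMSD(x) index of Dorans and Holland (2000): at each score point, the subgroup-weighted root mean squared difference between subgroup and total-group equated scores, divided by the standard deviation of Y in the total group. The function names (`tcc`, `irt_true_score_equate`, `rmsd`) are hypothetical. Bisection is used here in place of the Newton-Raphson iteration usually described for this step, purely for robustness.

```python
import numpy as np

D = 1.7  # logistic scaling constant (assumed; some programs use D = 1)


def tcc(theta, a, b, c):
    """Test characteristic curve: expected number-correct score at ability
    theta under the three-parameter logistic (3PL) model."""
    p = c + (1.0 - c) / (1.0 + np.exp(-D * a * (theta - b)))
    return float(p.sum())


def irt_true_score_equate(tau_x, x_items, y_items, lo=-10.0, hi=10.0, iters=200):
    """Map a form-X true score tau_x to the form-Y true-score scale.

    x_items and y_items are (a, b, c) arrays assumed to be on a common theta
    scale already. With every a > 0 the TCC increases in theta, so bisection
    on [lo, hi] finds the theta whose form-X true score equals tau_x
    (values of tau_x extremely close to the bounds are clamped to lo or hi).
    """
    a, b, c = (np.asarray(v, dtype=float) for v in x_items)
    if not (c.sum() < tau_x < len(a)):
        raise ValueError("tau_x must lie strictly between sum(c) and the item count")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if tcc(mid, a, b, c) < tau_x:
            lo = mid  # form-X true score still below tau_x: move theta up
        else:
            hi = mid
    theta = 0.5 * (lo + hi)
    ay, by, cy = (np.asarray(v, dtype=float) for v in y_items)
    return tcc(theta, ay, by, cy)  # equated score: form-Y true score at that theta


def rmsd(e_sub, e_total, weights, sd_y):
    """RMSD(x) following Dorans and Holland (2000): at each score point, the
    weighted root mean squared difference between the subgroup equating
    functions (rows of e_sub) and the total-group function e_total, divided
    by the SD of Y in the total group. Weights are normalized to sum to 1."""
    e_sub = np.asarray(e_sub, dtype=float)
    diff2 = (e_sub - np.asarray(e_total, dtype=float)) ** 2
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return np.sqrt((w[:, None] * diff2).sum(axis=0)) / sd_y
```

In a population-sensitivity study of this kind, one would evaluate an equating function for each subgroup and for the total group at every score point. The subgroup results form the rows of `e_sub`, and the subgroup proportions are passed as `weights`. RMSD values near zero at a score point indicate that the equated score there barely depends on which subgroup the equating was computed in.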

Index terms: population sensitivity • observed-score equating • IRT true-score equating • nonequivalent-groups anchor test (NEAT)

References

  • Braun, H.I., & Holland, P.W. (1982). Observed score test equating: A mathematical analysis of some ETS equating procedures. In P. W. Holland & D. B. Rubin (Eds.), Test equating (pp. 9-49). New York: Academic Press.
  • Brennan, R.L., & Kolen, M.J. (1987). Some practical issues in equating. Applied Psychological Measurement, 11, 279-290.
  • Cook, L.L., Dorans, N.J., Eignor, D.R., & Petersen, N.S. (1985). An assessment of the relationship between the assumption of unidimensionality and the quality of IRT true-score equating (ETS Research Rep. No. RR-85-30). Princeton, NJ: Educational Testing Service.
  • Cook, L.L., & Eignor, D.R. (1991). An NCME instructional module on IRT equating methods. Educational Measurement: Issues and Practice, 10, 37-45.
  • Cook, L.L., & Petersen, N.S. (1987). Problems related to the use of conventional and item response theory equating methods in less than optimal circumstances. Applied Psychological Measurement, 11, 225-244.
  • Dorans, N.J., & Holland, P.W. (2000). Population invariance and equatability of tests: Basic theory and the linear case. Journal of Educational Measurement, 37, 281-306.
  • Dorans, N.J., Holland, P.W., Thayer, D.T., & Tateneni, K. (2003, April). Invariance of score linking across gender groups for three Advanced Placement Program exams. Paper presented at the annual meeting of the National Council on Measurement in Education, New Orleans, LA.
  • Haebara, T. (1980). Equating logistic ability scales by a weighted least squares method. Japanese Psychological Research, 22, 144-149.
  • Hambleton, R.K., Swaminathan, H., & Rogers, H.J. (1991). Fundamentals of item response theory. Newbury Park, CA: Sage.
  • Harris, D.J., & Crouse, J.D. (1993). A study of criteria used in equating. Applied Measurement in Education, 6, 195-240.
  • Harris, D.J., & Kolen, M.J. (1986). Effect of examinee group on equating relationships. Applied Psychological Measurement, 10, 35-43.
  • Hattie, J. (1985). Methodology review: Assessing unidimensionality of tests and items. Applied Psychological Measurement, 9, 139-164.
  • Jodoin, M.G., & Davey, T. (2003, April). A multidimensional simulation approach to investigate the robustness of IRT common item equating. Paper presented at the annual meeting of the American Educational Research Association, Chicago, IL.
  • Kolen, M.J., & Brennan, R.L. (2004). Test equating, scaling, and linking: Methods and practices (2nd ed.). New York: Springer-Verlag.
  • Lord, F.M. (1980). Application of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum.
  • Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 16, 159-176.
  • Petersen, N.S. (2008). A discussion of population invariance of equating. Applied Psychological Measurement, 32, 98-101.
  • Petersen, N.S., Cook, L.L., & Stocking, M.L. (1983). IRT versus conventional equating methods: A comparative study of scale stability. Journal of Educational Statistics, 8, 137-156.
  • Petersen, N.S., Kolen, M.J., & Hoover, H.D. (1989). Scaling, norming, and equating. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 221-262). New York: Macmillan.
  • Stocking, M.L., & Lord, F.M. (1983). Developing a common metric in item response theory. Applied Psychological Measurement, 7, 201-210.
  • Thissen, D., Wainer, H., & Wang, X.-B. (1994). Are tests comprising both multiple-choice and free-response items necessarily less unidimensional than multiple-choice tests? An analysis of two tests. Journal of Educational Measurement, 31, 113-123.
  • von Davier, A.A. (2003). Notes on linear equating methods for the non-equivalent groups design (ETS Research Rep. No. RR-03-24). Princeton, NJ: Educational Testing Service.
  • von Davier, A.A., Holland, P.W., & Thayer, D.T. (2003). Population invariance and chain versus post-stratification equating methods. In N. J. Dorans (Ed.), Population invariance of score linking: Theory and applications to Advanced Placement Program examinations (ETS Research Rep. No. RR-03-27, pp. 19-36). Princeton, NJ: Educational Testing Service.
  • von Davier, A.A., Holland, P.W., & Thayer, D.T. (2004a). The chain and post-stratification methods for observed-score equating and their relationship to population invariance. Journal of Educational Measurement, 41, 15-32.
  • von Davier, A.A., Holland, P.W., & Thayer, D.T. (2004b). The kernel method of test equating. New York: Springer-Verlag.
  • von Davier, A.A., & Wilson, C. (2005). A didactic approach to the use of IRT true score equating (ETS Research Rep. No. RR-05-26). Princeton, NJ: Educational Testing Service.



This article has been cited by other articles:


Qing Yi, D. J. Harris, and Xiaohong Gao
Invariance of Equating Functions Across Different Subgroups of Examinees Taking a Science Achievement Test
Applied Psychological Measurement, January 1, 2008; 32(1): 62-80.


N. J. Dorans, Jinghua Liu, and S. Hammond
Anchor Test Type and Population Invariance: An Exploration Across Subpopulations and Test Administrations
Applied Psychological Measurement, January 1, 2008; 32(1): 81-97.


N. S. Petersen
A Discussion of Population Invariance of Equating
Applied Psychological Measurement, January 1, 2008; 32(1): 98-101.


R. L. Brennan
A Discussion of Population Invariance
Applied Psychological Measurement, January 1, 2008; 32(1): 102-114.

