|
Sign In to gain access to subscriptions and/or personal tools.
|
Effects of Ignoring Item Interaction on Item Parameter Estimation and Detection of Interacting Items
Cheng-Te Chen
National Chung Cheng University, Chia-Yi, Taiwan
Wen-Chung Wang
National Chung Cheng University, Chia-Yi, Taiwan, psywcw{at}ccu.edu.tw
This study explores the effects of ignoring item interaction on item parameter estimation and the efficiency of using the local dependence index Q3 and the SAS NLMIXED procedure to detect item interaction under the three-parameter logistic model and the generalized partial credit model. Through simulations, it was found that ignoring positive item interaction led to overestimation for the discrimination parameters, underestimation for the difficulty parameters, and a Q3 much smaller than zero. As the guessing parameters approached zero, the overestimation for the discrimination parameters became more serious. In contrast, ignoring negative item interaction led to underestimation for the discrimination parameters, overestimation for the difficulty parameters, and a Q3 much larger than zero. As the guessing parameters approached zero, the underestimation for the discrimination parameters became less serious. A modification of posterior predictive p value for Q3 was proposed to detect item interaction and was found to work very well. Direct modeling of item interaction using NLMIXED was demonstrated.
Key Words: local item dependence three-parameter logistic model generalized partial credit model posterior predictive check item response theory SAS NLMIXED
References
- Béguin, A.A., & Glas, C.A.W. (2001). MCMC estimation of multidimensional IRT models. Psychometrika, 66, 541-562.[CrossRef]
- Berkhof, J., van Mechelen, I., & Hoijtink, H. (2001). Posterior predictive checks: Principles and discussion. Computational Statistics, 15, 337-354.[CrossRef][Web of Science]
- Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee's ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 397-479). Reading, MA: Addison-Wesley.
- Bradlow, E.T., Wainer, H., & Wang, X. (1999). A Bayesian random effects model for testlets. Psychometrika, 64, 153-168.[CrossRef][Web of Science]
- Chen, W.-H., & Thissen, D. (1997). Local dependence indexes for item pairs using item response theory. Journal of Educational and Behavioral Statistics, 22, 265-289.[Abstract/Free Full Text]
- De Boeck, P., & Wilson, M. R. (Eds.). (2004). Explanatory item response models: A generalized linear and nonlinear approach. New York: Springer-Verlag.
- Embretson, S.E., & Reise, S.P. (2000). Item response theory for psychologists. Mahwah, NJ: Lawrence Erlbaum.
- Fox, J.P., & Glas, C.A.W. (2001). Bayesian estimation of a multilevel IRT model using Gibbs sampling. Psychometrika, 66, 271-288.[CrossRef][Web of Science]
- Gelman, A., Carlin, J.B., Stern, H.S., & Rubin, D.B. (1995). Bayesian data analysis. London: Chapman & Hall.
- Glas, C.A.W., & Meijer, R.R. (2003). A Bayesian approach to person fit analysis in item response theory models. Applied Psychological Measurement, 27, 217-233.[Abstract]
- Hambleton, R.K., & Swaminathan, H. (1985).Item response theory: Principles and applications. Boston: Kluwer Nijhoff.
- Hoijtink, H., & Molenaar, I.W. (1997). A multidimensional item response model: Constrained latent class analysis using the Gibbs sampler and posterior predictive checks. Psychometrika, 62, 171-189.[CrossRef][Web of Science]
- Hoskens, M., & De Boeck, P. (1997). A parametric model for local dependence among test items. Psychological Methods, 2, 261-277.[CrossRef][Web of Science]
- Janssen, R., Tuerlinckx, F., Meulders, M., & de Boeck, P. (2000). A hierarchical IRT model for criterion referenced measurement. Journal of Educational and Behavioral Statistics, 25, 285-306.[Abstract/Free Full Text]
- Karabatsos, G., & Sheu, C.-F. (2004). Order-constrained Bayes inference for dichotomous models of unidimensional nonparametric IRT. Applied Psychological Measurement, 28, 110-125.[Abstract]
- Li, Y., Bolt, D.M., & Fu, J. (2006). A comparison of alternative models for testlets. Applied Psychological Measurement, 30, 3-21.[Abstract/Free Full Text]
- Lord, F.M. (1980). Applications of item response theory to practice testing problems. Hillsdale, NJ: Lawrence Erlbaum.
- Lynch, S.M., & Western, B. (2004). Bayesian posterior predictive checks for complex models. Sociological Methods and Research, 32, 301-335.[CrossRef]
- Masters, G.N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, 149-174.[CrossRef][Web of Science]
- Meijer, R.R., & Nering, M.L. (1997). Trait level estimation for nonfitting response vectors. Applied Psychological Measurement, 21, 321-336.[Abstract/Free Full Text]
- Meng, X.L. (1994). Posterior predictive p-values. Annals of Statistics, 22, 1142-1160.[Web of Science]
- Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 16, 159-176.[Abstract/Free Full Text]
- Muraki, E., & Bock, R.D. (2003). PARSCALE 4: IRT item analysis and test scoring for rating-scale data [Computer software]. Chicago: Scientific Software.
- Nering, M.L. (1995). The distribution of person fit using true and estimated person parameters. Applied Psychological Measurement, 19, 121-129.[CrossRef][Web of Science]
- Patz, R.J., & Junker, B.W. (1999). A straightforward approach to Markov chain Monte Carlo methods for item response theory models. Journal of Educational and Behavioral Statistics, 24, 146-178.[Abstract/Free Full Text]
- Pinheiro, J.C., & Bates, D.M. (1995). Approximations to the log-likelihood function in the nonlinear mixed-effects model. Journal of Computational and Graphical Statistics, 4, 12-35.[CrossRef]
- Reise, S.P. (2000). Using multilevel logistic regression to evaluate person-fit in IRT models. Multi-variate Behavioral Research, 35, 543-568.[CrossRef][Web of Science]
- Reise, S.P., & Widaman, K.F. (1999). Assessing the fit of measurement models at the individual level: A comparison of item response theory and covariance structure models. Psychological Methods, 4, 3-21.[CrossRef][Web of Science]
- Rijmen, F., & De Boeck, P. (2002). The random weights linear logistic test model. Applied Psychological Measurement, 26, 271-285.[Abstract/Free Full Text]
- Rosenbaum, P. (1984). Testing the conditional independence and monotonicity assumptions of item response theory. Psychometrika, 53, 349-359.[CrossRef]
- Rubin, D.B. (1984). Bayesianly justifiable and relevant frequency calculations for applied statistician. The Annals of Statistics, 12, 1151-1172.
- SAS Institute. (2000). SAS/STAT user's guide (Version 8). Cary, NC: Author.
- Sheu, C.-F., Chen, C.-T., Su, Y.-H., & Wang, W.-C. (2005). Using SAS PROC NLMIXED to fit item response theory models. Behavior Research Methods, 37, 202-218.[Medline]
[Order article via Infotrieve]
- Sireci, S.G., Thissen, D., & Wainer, H. (1991). On the reliability of testlet based tests. Journal of Educational Measurement, 28, 237-247.[CrossRef][Web of Science]
- Smits, D.J.M., & De Boeck, P. (2003). Estimation of the MIRID: A program and a SAS-based approach. Behavioral Research Methods, Instruments, and Computers, 35, 537-549.[Web of Science][Medline]
[Order article via Infotrieve]
- Spray, J.A., & Ackerman, Y.A. (1987). The effect of item response dependency on trait or ability dimensionality (ACT Research Report Series 87-10). Iowa City, IA: American College Testing Program.
- Thissen, D., & Chen, W. (1993). Item response theory and local dependency: An interim report (Research Memorandum 93-3). Chapel Hill: L. L. Thurston Laboratory, University of North Carolina at Chapel Hill.
- Thissen, D., Steinberg, L., & Mooney, J.A. (1989). Trace lines for testlets: Use of multiple-categorical-response models. Journal of Educational Measurement, 26, 247-260.[CrossRef][Web of Science]
- Tuerlinckx, F., & De Boeck, P. (2001). The effect of ignoring item interactions on the estimated discrimination parameters in item response theory. Psychological Methods, 6, 181-195.[CrossRef][Web of Science][Medline]
[Order article via Infotrieve]
- Tuerlinckx, F., De Boeck, P., & Lens, W. (2002). Measuring needs with the Thematic Apperception Test: A psychometric study. Journal of Personality and Social Psychology, 82, 448-461.[CrossRef][Web of Science][Medline]
[Order article via Infotrieve]
- van den Wollenberg, A.L. (1982). Two new test statistics for the Rasch model. Psychometrika, 47, 123-140.[CrossRef][Web of Science]
- Verguts, T., & De Boeck, P. (2000). A Rasch model for detecting learning while solving an intelligence test. Applied Psychological Measurement, 24, 151-162.[Abstract/Free Full Text]
- Wainer, H. (1995). Precision and differential item functioning on a testlet-based test: The 1991 Law School Admissions Test as an example. Applied Measurement in Education, 8, 157-186.[CrossRef][Web of Science]
- Wainer, H., Bradlow, E.T., & Du, Z. (2000). Testlet response theory: An analog for the 3PL model using in testlet-based adaptive testing. In W. van der Linden & C. A. W. Glas (Eds.), Computerized adaptive testing: Theory and practice (pp. 245-269). London: Kluwer.
- Wainer, H., & Kiely, G.L. (1987). Item clusters and computerized adaptive testing: A case for testlets. Journal of Educational Measurement, 24, 185-201.[CrossRef][Web of Science]
- Wainer, H., & Lukhele, R. (1997). How reliable are TOEFL scores? Educational and Psychological Measurement, 57, 749-766.
- Wainer, H., & Thissen, D. (1996). How is reliability related to the quality of test scores? What is the effect of local dependence on reliability? Educational Measurement: Issues and Practice, 15(1), 22-29.[CrossRef]
- Wainer, H., & Wang, X. (2000). Using a new statistical model for testlets to score TOEFL. Journal of Educational Measurement, 37, 203-220.[CrossRef]
- Wang, W.-C., & Wilson, M.R. (2005a). Exploring local item dependence using a random-effects facet model. Applied Psychological Measurement, 29, 296-318.[Abstract]
- Wang, W.-C., & Wilson, M.R. (2005b). The Rasch testlet model. Applied Psychological Measurement, 29, 126-149.[CrossRef][Web of Science]
- Wang, X., Bradlow, E.T., & Wainer, H. (2002). A general Bayesian model for testlets: Theory and applications. Applied Psychological Measurement, 26, 109-128.[Abstract/Free Full Text]
- Wilson, M., & Adams, R.J. (1995). Rasch models for item bundles. Psychometrika, 60, 181-198.[CrossRef][Web of Science]
- Wilson, M., & Hoskens, M. (2001). The rater bundle model. Journal of Educational and Behavioral Statistics, 26, 283-306.[Abstract/Free Full Text]
- Wolfinger, R.D. (1999). Fitting nonlinear mixed models with the new NLMIXED procedure. Proceedings of the 24th Annual SAS Users Group International Conference (SUGI 24), 287-24.
- Yen, W.M. (1981). Using simulation results to choose a latent trait model. Applied Psychological Measurement, 5, 245-262.[Medline]
[Order article via Infotrieve]
- Yen, W.M. (1984). Effect of local item dependence on the fit and equating performance of the three-parameter logistic model. Applied Psychological Measurement, 8, 125-145.[Abstract]
- Yen, W.M. (1993). Scaling performance assessments: Strategies for managing local item dependence. Journal of Educational Measurement, 30, 187-213.[CrossRef][Web of Science]
- Zimowski, M.F., Muraki, E., Mislevy, R.J., & Bock, R.D. (2003). Bilog-MG 3: Multiple group IRT analysis and test maintenance for binary items [Computer software]. Chicago: Scientific Software.
Applied Psychological Measurement, Vol. 31, No. 5,
388-411 (2007)
DOI: 10.1177/0146621606297309

CiteULike Complore Connotea Del.icio.us Digg Reddit Technorati Twitter What's this?
|
|