Applied Psychological Measurement, Vol. 19, No. 1, 51-71 (1995)
DOI: 10.1177/014662169501900107
Complex Composites: Issues That Arise in Combining Different Modes of Assessment
Mark Wilson
University of California, Berkeley
Wen-chung Wang
National Taiwan University
Data from the California Learning Assessment System are used to examine certain characteristics of tests designed as composites of items of different modes. The characteristics include rater severity, test information, and definition of the latent variable. Three different assessment modes (multiple-choice, open-ended, and investigation items; the latter two are referred to as performance-based modes) were combined in a test across three different test forms. Rater severity was investigated by incorporating a rater parameter for each rater in an item response model that then was used to analyze the data. Some rater severities were found to be quite extreme, and the impact of this variation in rater severities on both total scores and trait level estimates was examined. Within-rater variation in severity also was examined and was found to be significant. The information contribution of the three modes was compared. Performance-based items provided more information than multiple-choice items and also provided the greatest precision for higher levels of the latent variable. A projection-like method was applied to investigate the effects of assessment mode on the definition of the latent variable. The multiple-choice items added information to the performance-based variable. The analysis also showed that the results of the projection-like method did not differ in practical terms from those obtained when the latent trait was defined jointly by both the multiple-choice and the performance-based items.
Index terms: equating, linking, multiple assessment modes, polytomous item response models, rater effects.
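As an illustration of what "incorporating a rater parameter" can mean, the following is a minimal sketch of a partial-credit-style model in which each rater's severity is subtracted from the logit of every score step. The abstract does not give the model's exact form, so the parametrization, the function names, and all numeric values here are assumptions made for illustration only. They are not the authors' specification or their data.

```python
import math

def category_probs(theta, delta, taus, rho):
    """Score-category probabilities for one rated item.

    theta: person trait level
    delta: item difficulty
    taus:  step parameters for scores 1..m
    rho:   rater severity (assumed additive on the logit scale)
    """
    # Cumulative step logits; score 0 is the reference with logit 0.
    logits = [0.0]
    for tau in taus:
        logits.append(logits[-1] + (theta - delta - tau - rho))
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(logits)
    weights = [math.exp(x - peak) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]

def expected_score(theta, delta, taus, rho):
    """Expected rating for a person-item-rater combination."""
    probs = category_probs(theta, delta, taus, rho)
    return sum(score * p for score, p in enumerate(probs))

# Hypothetical values: the same response judged by three raters.
steps = [-0.5, 0.5]
lenient = expected_score(0.0, 0.0, steps, rho=-1.0)
neutral = expected_score(0.0, 0.0, steps, rho=0.0)
severe = expected_score(0.0, 0.0, steps, rho=1.0)
print(lenient, neutral, severe)
```

Under this sketch, a more severe rater lowers the expected rating for the same trait level, which is the mechanism by which variation in rater severity can affect total scores.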