Nosofsky, R.M. (1984).  Choice, similarity, and the context theory of classification.  Journal of Experimental Psychology: Learning, Memory, and Cognition, Vol. 10, No. 1, 104-114.

Primary Reviewer: Angie

Secondary Reviewer: Amanda

Purpose

The purpose of this article is to provide an interpretation of context theory in terms of choice theory and similarity to exemplars.  Also, the article attempts to more fully explain possible relationships between identification and classification. 

Context theory assumes that a person’s classification of a stimulus is based on its similarity to stored category exemplars.  The probability of classifying a stimulus as a member of a particular category, X, is a simple ratio: the summed similarity of the stimulus to the exemplars in Category X divided by its summed similarity to the exemplars in all categories.  This is known as the response-ratio rule.  The response-ratio rule, proposed by Medin and Schaffer (1978), structurally resembles Luce’s (1963) choice model for stimulus identification, and it can be viewed as a bias-free extension of the choice model applied at the category level.  The relationship may be seen by comparing the demands of identification and classification tasks.  In the identification paradigm there is a one-to-one mapping of stimuli onto responses, because each stimulus has its own unique response.  In the classification paradigm there is a many-to-one mapping, because many stimuli share the same category response.  An interstimulus confusion that counts as an error in identification therefore still yields a correct classification response whenever the two confused stimuli belong to the same category (a within-category confusion).  This is referred to as the mapping hypothesis.  Assuming the mapping hypothesis is correct, the response-ratio rule can be derived directly from choice theory applied at the identification level.
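
The response-ratio rule described above can be illustrated with a minimal Python sketch.  The function name, the similarity values, and the category labels are hypothetical and chosen only for illustration; they do not come from the article.

```python
def classification_probability(similarities, categories, target):
    """Response-ratio rule: summed similarity of the probe to the exemplars
    of the target category, divided by its summed similarity to all
    stored exemplars."""
    total = sum(similarities)
    in_target = sum(s for s, c in zip(similarities, categories) if c == target)
    return in_target / total


# Hypothetical similarities of one probe stimulus to four stored exemplars,
# two from Category X and two from Category Y (values are made up).
sims = [0.8, 0.4, 0.2, 0.1]
labels = ["X", "X", "Y", "Y"]

p_x = classification_probability(sims, labels, "X")  # roughly 0.8
p_y = classification_probability(sims, labels, "Y")  # roughly 0.2
print(p_x, p_y)
```

Because the rule is a ratio over all categories, the probabilities for the two categories sum to one, and no response-bias term appears, which is the sense in which the rule is bias-free.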

Another important aspect of context theory is the multiplicative rule for computing stimulus similarity.  It is considered the crucial feature of context theory because it differentiates the theory from other theories of classification learning.  A major assumption in the field of multidimensional scaling is that stimulus similarity is some monotonically decreasing function of psychological distance.  Nosofsky hypothesizes that the multiplicative similarity rule arises as a special case in which the psychological distance between stimuli conforms to the city-block metric and stimulus similarity is an exponential decay function of that distance.  The city-block metric measures the distance between two stimuli as the sum of their absolute differences along each dimension, rather than as the length of the straight line between them.  In two dimensions, for example, it is the horizontal distance plus the vertical distance between the two stimuli.
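
The step from these two assumptions to a multiplicative rule can be written out briefly.  The notation below is assumed for this summary and is not taken from the article: x_{ik} is the value of stimulus i on dimension k, d_{ij} is the city-block distance between stimuli i and j, c is a positive scaling constant, and \eta_{ij} is the similarity of i and j.

```latex
% City-block distance: sum of absolute differences over dimensions
d_{ij} = \sum_{k} \lvert x_{ik} - x_{jk} \rvert

% Exponential decay of similarity with distance; the exponential of a sum
% is a product of exponentials, one factor per dimension
\eta_{ij} = e^{-c\, d_{ij}}
          = e^{-c \sum_{k} \lvert x_{ik} - x_{jk} \rvert}
          = \prod_{k} e^{-c \lvert x_{ik} - x_{jk} \rvert}

% For binary-valued dimensions, each factor equals 1 when the two stimuli
% match on dimension k and a constant s_k with 0 < s_k < 1 when they mismatch
\eta_{ij} = \prod_{k \,:\, x_{ik} \neq x_{jk}} s_k
```

Under these assumptions, overall similarity is a product of one term per dimension, which is the form a multiplicative similarity rule takes.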

To summarize, context theory arises as a consequence of integrating the mapping hypothesis of the identification-classification relationship with three models from the areas of choice and similarity: Luce’s choice model for stimulus identification, an exponential decay function relating stimulus generalization to psychological distance, and the city-block metric as the description of psychological distance for stimuli composed of separable dimensions.  The problem, though, is that the mapping hypothesis, which provides a clear link between identification and classification theories, had been rejected by Shepard et al. (1961) on empirical grounds.  Nosofsky reexamines the Shepard et al. results in detail in an attempt to resolve this discrepancy.

Experimental Work

Shepard et al. (1961) studied the relationship between identification and classification performance for sets of eight stimuli that varied along three binary-valued separable dimensions (e.g., black/white, big/small, square/triangle).  Subjects were required to learn a unique response to each of the eight stimuli.  Subjects also learned to classify the stimuli in several different ways.  A paired-associate learning paradigm was employed for both the identification and classification tasks.  After recording subjects’ errors, Shepard et al. used what was essentially the mapping hypothesis to predict classification performance from identification performance.  The total number of predicted errors was compared to the total number of observed errors.  The mapping hypothesis failed in three basic ways.  First, the predicted number of errors exceeded the observed number of errors.  Second, the observed amount of variation exceeded the predicted amount.  Third, the observed rank order of difficulty between different dimensions was different from the predicted rank order of difficulty.
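
The kind of prediction involved can be sketched in Python as follows.  This is an illustration of the mapping hypothesis only, not Shepard et al.'s actual procedure or data: the confusion matrix, the four-stimulus design, and all names are hypothetical.

```python
def predict_classification(confusions, category_of):
    """Mapping hypothesis: the probability of giving category C to stimulus i
    is the sum of the identification probabilities P(response j | stimulus i)
    over every stimulus j that belongs to C."""
    predictions = []
    for row in confusions:
        by_category = {}
        for j, p in enumerate(row):
            cat = category_of[j]
            by_category[cat] = by_category.get(cat, 0.0) + p
        predictions.append(by_category)
    return predictions


# Hypothetical identification confusion matrix for four stimuli.
# Row i holds P(response j | stimulus i); each row sums to 1.
confusions = [
    [0.70, 0.20, 0.05, 0.05],
    [0.20, 0.70, 0.05, 0.05],
    [0.05, 0.05, 0.70, 0.20],
    [0.05, 0.05, 0.20, 0.70],
]
category_of = ["A", "A", "B", "B"]  # stimuli 0 and 1 in A, 2 and 3 in B

predicted = predict_classification(confusions, category_of)
# Predicted classification error for each stimulus.
predicted_error = [
    1.0 - predicted[i][category_of[i]] for i in range(len(confusions))
]
print(predicted_error)
```

In this made-up example each stimulus is identified correctly 70% of the time, but the predicted classification error is only about 10%, because confusions with the other member of the same category do not count as classification errors.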

Nosofsky suggests that two factors play a role in these tasks and may explain the discrepancies between the predicted and observed patterns of scores: increased overall sensitivity in classification tasks relative to identification tasks, and selective attention.  Subjects may distribute attention across dimensions in such a way that performance is optimized.  When different combinations of overall sensitivity and distribution of attention are assumed, the pattern of data generated by the model presented in the article seems to reflect the results of Shepard et al. (1961).
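
A rough Python sketch can show how sensitivity and attention might enter such a model.  It assumes, for illustration only, that attention acts as a set of weights on the dimensions of a city-block distance and that sensitivity is a single scaling constant c; the parameter values, the function names, and the one-dimension classification are all hypothetical.

```python
import math
from itertools import product


def similarity(a, b, weights, c):
    """Exponential decay of an attention-weighted city-block distance."""
    distance = c * sum(w * abs(x - y) for w, x, y in zip(weights, a, b))
    return math.exp(-distance)


def mean_accuracy(stimuli, category_of, weights, c):
    """Average probability of a correct response under the response-ratio
    rule, with every stimulus stored as an exemplar of its category."""
    total = 0.0
    for probe in stimuli:
        sims = {s: similarity(probe, s, weights, c) for s in stimuli}
        same = sum(
            v for s, v in sims.items() if category_of[s] == category_of[probe]
        )
        total += same / sum(sims.values())
    return total / len(stimuli)


# Eight stimuli built from three binary-valued dimensions.
stimuli = list(product([0, 1], repeat=3))
# Hypothetical classification in which only the first dimension matters.
category_of = {s: s[0] for s in stimuli}

equal = mean_accuracy(stimuli, category_of, (1 / 3, 1 / 3, 1 / 3), c=3.0)
focused = mean_accuracy(stimuli, category_of, (1.0, 0.0, 0.0), c=3.0)
print(equal, focused)
```

With these made-up values, placing all attention on the relevant dimension gives a higher predicted accuracy than spreading it equally, which is the sense in which a distribution of attention can optimize classification performance.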

Overall, the working hypothesis here is that selective attention optimizes classification performance.  The article attempts to quantify the processes of selective attention and abstraction that may mediate identification and classification by considering the optimal pattern of dimension weights.  However, Medin et al. (1983) showed that abstraction does not occur automatically, but depends greatly upon specific experimental conditions.  By having subjects classify photographs of women’s faces into two categories varying along four binary-valued dimensions (light or dark hair, light or dark shirt, open or closed smile, and long or short hair) without ever seeing the same photograph twice, the experiment was expected to enhance the processes of selective attention and abstraction, for two reasons.  First, rote paired-associate learning is discouraged because subjects are never retested on any particular exemplar.  Second, the potential size of the exemplar population is, for all practical purposes, infinite.  It has been found that abstraction improves as category size increases (e.g., Homa, Cross, Cornell, Goldman, & Schwarz, 1973).

Conclusions

The context theory was related to a more general theoretical framework for the modeling of choice and similarity, and the response-ratio rule for classification was related to Luce’s (1963) choice model for identification.  The classification and choice theories can be related through a modified form of the mapping hypothesis in which the levels of sensitivity and the distribution of attention are altered.  The multiplicative similarity rule was related to theoretical results from multidimensional scaling research.  Lastly, identification and classification performance may be related and understood in terms of selective attention.  Some support has been shown for the idea that attention may be distributed among dimensions in a way that optimizes performance.

Points for Discussion 

The experimental paradigms discussed in this article had binary-valued dimensions.  Will work like this really help illuminate the processes of identification and categorization?  After all, how many dimensions are there in the real world that take only two values?

This article attempts to relate research findings to mathematical models.  Do you think it is possible to reduce human mental processes to precise mathematical models?  Is the human mind complex enough to discover exact mathematical models of its own mental processes?


This page was last updated:
07/18/2006 00:36