The most compelling way to discriminate between two cognitive models is to show that they make qualitatively different predictions for the same experimental manipulation. McLaren, Bennett, Guttman-Nahir, Kim, & Mackintosh (1995) presented experimental data that they argued demonstrate a case where exemplar models of categorization make a qualitatively wrong prediction. They asked people to learn two different categories of checkerboard patterns that varied in which tiles were filled. The prototypes for these categories shared some of the tiles in common, but differed in which of the non-shared tiles were filled.

People saw a series of exemplars produced by randomly perturbing one of the category prototypes by swapping the black/white identity of a subset of its tiles. Critically, these could be close exemplars, in which some of the tiles of one prototype were changed to make it more like the other prototype, or far exemplars, in which the flipped tiles were chosen to make the exemplar different from the target prototype but also less like the other category’s prototype. At test, McLaren et al. (1995) showed that participants were better able to classify the never-seen category prototypes than the category-specific far exemplars. They then showed analytically that, under some assumptions, exemplar models can never produce this ordering of accuracies–a qualitative failure.

A rejoinder from Lamberts (1996), however, showed that this prediction failure hinges critically on a choice not about the exemplar representation, but about the similarity function used to compare stored exemplars to test grids. In this assignment, you will use simulations to replicate Lamberts’ results, and show further that prototype models are subject to the exact same qualitative mismatch when the same kind of similarity function is used.


The Prototypes

Following Lamberts, we’ll randomly construct 16x16 grids to represent the two categories A and B. We’ll start by creating prototype A by randomly setting each tile to black or white with equal probability. Then we’ll construct prototype B by randomly flipping the value of 7 of the 16 tiles in each row.

# We'll use the tidyverse throughout
library(tidyverse)

# Make prototype A by setting each cell of a 16x16 grid randomly to be white 
# or black
a <- tibble(row = 1:16, col = 1:16) %>%
  expand(row, col) %>%
  mutate(value = rbinom(256, 1, .5))

# Pick 7 cells from each row and flip their value (to make prototype B)
different <- a %>%
  group_by(row) %>%
  sample_n(7) %>%
  mutate(value = as.numeric(!value)) %>%
  ungroup()

# Find the cells which will not differ between prototypes
same <- anti_join(a, different, by = c("row", "col"))

# Make prototype B
b <- bind_rows(different, same)

Let’s plot the prototypes to see what they look like.

prototypes <- a %>%
  mutate(prototype = "a") %>%
  bind_rows(mutate(b, prototype = "b"))

ggplot(prototypes, aes(x = row, y = col, fill = as.factor(value))) + 
  facet_wrap(~ prototype) +
  geom_tile(color = "black") + 
  theme(legend.position = "none", axis.text = element_blank(),
        axis.line = element_blank(), axis.ticks = element_blank(),
        axis.title = element_blank()) + 
  scale_fill_manual(values = c("black", "white")) 

Now that we have the prototypes, we need to construct close and far exemplars for each category following the recipe in McLaren et al. (1995). For each kind of exemplar, we’ll randomly select 25 tiles to change. For close exemplars, these will be tiles that differ in value between the two prototypes; for far exemplars, they will be tiles that are the same in both prototypes. This design makes close exemplars of category A different from the prototype of A but more similar to B. In contrast, far exemplars of category A are different from prototype A but also different from prototype B.

close_exemplar <- function(prototype) {
  # Flip 25 tiles sampled from those that differ between the two prototypes,
  # pushing the exemplar toward the other category's prototype
  changed <- different %>%
    sample_n(25) %>%
    select(-value) %>%
    left_join(prototype, by = c("row", "col")) %>%
    mutate(value = as.numeric(!value))
  
  # Swap the flipped tiles into the prototype and tag the exemplar type
  anti_join(prototype, changed, by = c("row", "col")) %>%
    bind_rows(changed) %>%
    mutate(type = "close")
} 

far_exemplar <- function(prototype) {
  # Flip 25 tiles sampled from those that are the same in both prototypes,
  # pushing the exemplar away from both categories
  changed <- same %>%
    sample_n(25) %>%
    select(-value) %>%
    left_join(prototype, by = c("row", "col")) %>%
    mutate(value = as.numeric(!value))
  
  # Swap the flipped tiles into the prototype and tag the exemplar type
  anti_join(prototype, changed, by = c("row", "col")) %>%
    bind_rows(changed) %>%
    mutate(type = "far")
} 

Take a look at these close and far exemplars for category A. Do you see how they differ from both category A’s and category B’s prototypes? Make sure you understand what the manipulation is doing.

a_manips <- prototypes %>%
  rename(type = prototype) %>%
  bind_rows(close_exemplar(a) %>% mutate(type = "close a"),
            far_exemplar(a) %>% mutate(type = "far a")) %>%
  mutate(type = factor(type, levels = c("far a", "a","close a", "b")))

ggplot(a_manips, aes(x = row, y = col, fill = as.factor(value))) + 
  facet_grid(. ~ type) +
  geom_tile(color = "black") + 
  theme(legend.position = "none", axis.text = element_blank(),
        axis.line = element_blank(), axis.ticks = element_blank(),
        axis.title = element_blank()) + 
  scale_fill_manual(values = c("black", "white")) 

Building an exemplar model

Now that we have the functions to generate the training stimuli, we need to (1) create a training set like the one McLaren et al.’s participants saw, (2) decide how our model should represent these training items, and (3) simulate the model’s behavior on the close, far, and prototype test trials.

Training

To simulate the training trials, you need to decide how many of each type of stimulus (close A, close B, far A, far B) our simulated participant should be exposed to. In McLaren et al.’s experiment, this was set at the individual-participant level by using a learning threshold. We could do the same in principle, but for simplicity let’s not worry about it–you can choose some reasonable number of trials of each type (e.g. 20). You then want to make that many distinct copies of each trial type by calling the functions above to construct a training set.
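Here is a minimal sketch of one way this could be done, assuming 20 trials of each type; the `n_train` variable and the `category` and `item` columns are just bookkeeping choices, not part of the functions above:

# One possible training set: 20 close and 20 far exemplars of each prototype.
# Each exemplar is tagged with its category label and a trial index so that
# individual training items can be recovered later.
n_train <- 20

training_set <- map_dfr(1:n_train, function(i) {
  bind_rows(
    close_exemplar(a) %>% mutate(category = "a"),
    far_exemplar(a)   %>% mutate(category = "a"),
    close_exemplar(b) %>% mutate(category = "b"),
    far_exemplar(b)   %>% mutate(category = "b")
  ) %>%
    mutate(item = i)
})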

Representation and Algorithm

A critical choice is how our simulated learner should represent these exemplars. One possibility is to follow Busemeyer & Dietrich and use a connectionist representation. You’re welcome to do this if you like. Alternatively, an even simpler choice is to say that participants’ representation is identical to the representation of the stimuli in our simulation–a set of grids.

Once you have figured out how you want to solve that, you need to implement the exemplar model described in Lamberts (1996). There are three important functions here:

Psychological distance

According to exemplar models, people compare two exemplars by estimating the psychological distance between them. The distance between two exemplars \(x_{i}\) and \(x_{j}\) is an additive function of their dissimilarity across all \(P\) dimensions–here, one dimension per tile \(p\) (see below). A critical parameter in this distance function is \(r\), the exponent that scales the distance. When \(r\) is 1, this distance is equivalent to city-block distance–it simply counts the number of dissimilar tiles. However, when \(r\) takes different values, the distance function changes. When \(r\) is 2, for example, this function computes Euclidean distance: the tile-wise differences are squared before being summed, and the square root of the sum is taken. For binary tiles, this amounts to the square root of the number of dissimilar tiles, which compresses the differences among larger distances rather than counting them linearly.

\[ d_{ij} = \left(\sum_{p=1}^{P}{\mid x_{ip}-x_{jp} \mid^{r}}\right)^{1/r} \]
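As a sketch, one way to compute this distance for two grids stored as tibbles in the format above (the `minkowski_distance()` name and the join-based implementation are just one possible choice):

# Minkowski distance between two grids, each a tibble with row, col, and
# value columns. Joining on row and col means the two grids don't need to
# be stored in the same order.
minkowski_distance <- function(grid_i, grid_j, r = 1) {
  inner_join(grid_i, grid_j, by = c("row", "col"), suffix = c("_i", "_j")) %>%
    summarise(d = sum(abs(value_i - value_j) ^ r) ^ (1 / r)) %>%
    pull(d)
}

# e.g. with r = 1 this just counts mismatching tiles:
# minkowski_distance(a, b, r = 1) should be 7 * 16 = 112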

Similarity

The next step in an exemplar model is to convert this distance to a psychological similarity. We’ll follow Lamberts (1996) and say that the psychological similarity of two exemplars \(i\) and \(j\) is an exponentiated negative distance with a single parameter \(c\) that scales how sensitive the model is to each unit of distance. You’ll see that the choice of \(c\) is important in scaling the absolute levels of accuracy below, but it doesn’t change the qualitative pattern of responses across close, far, and prototype trials.

\[ S_{ij} = e^{-c \cdot d_{ij}} \]
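Using the hypothetical `minkowski_distance()` helper sketched above, this could be as simple as the following; the default value of \(c\) here is purely illustrative:

# Exponential similarity; larger c makes similarity fall off faster with
# distance. c = 0.1 is just an illustrative starting value.
similarity <- function(grid_i, grid_j, r = 1, c = 0.1) {
  exp(-c * minkowski_distance(grid_i, grid_j, r = r))
}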

Test

Choice

Finally, the model has to make a choice at test when deciding whether a new exemplar is a member of category A or category B. Following Lamberts, we’ll use the classic Luce Choice Axiom, in which each category is chosen in proportion to its summed similarity to the test item.

\[ P\left(R_{J} \mid S_{i} \right) = \frac{\sum_{j \in C_{J}}{s_{ij}}}{\sum_{K}{\sum_{k \in C_{K}}{s_{ik}}}} \]
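For example, a minimal sketch using the `training_set` and `similarity()` helpers sketched above (the nesting approach and the `p_respond_a()` name are just illustrative choices):

# Probability of responding "A" to a test grid: summed similarity to the
# stored category-A exemplars divided by summed similarity to all stored
# exemplars. Assumes training_set has category, item, and type columns
# plus one row per tile.
p_respond_a <- function(test_grid, training_set, r = 1, c = 0.1) {
  sims <- training_set %>%
    group_by(category, item, type) %>%
    nest() %>%
    ungroup() %>%
    mutate(sim = map_dbl(data, ~ similarity(.x, test_grid, r = r, c = c)))

  sum(sims$sim[sims$category == "a"]) / sum(sims$sim)
}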

In order to simulate the key result, you’ll need to test the model on three types of stimuli: (1) new close exemplars of A and B, (2) new far exemplars of A and B, and (3) the untrained A and B prototypes. For simplicity, you can just do one test trial of each type.
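For instance, using the hypothetical helpers sketched above, testing the category-A items might look something like this (the choices of \(r\) and \(c\) here are arbitrary):

# One test item of each type for category A, classified against the training
# set. Values near 1 mean the model confidently responds "A".
test_items <- list(prototype_a = a,
                   close_a = close_exemplar(a),
                   far_a = far_exemplar(a))

map_dbl(test_items, p_respond_a, training_set = training_set, r = 1, c = 0.1)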

What happens in the model? What happens when you change the key parameter \(r\)? Can exemplar models reproduce the pattern of results in McLaren et al.?


Homework Content

  1. Training Stimuli (20 Points): Use the provided functions to create the set of training stimuli that you’ll use to simulate McLaren et al.’s experiment.

  2. Model (30 Points): Implement the three functions (Distance, Similarity, and Choice) to build a working Exemplar Model.

  3. Test (20 Points): Run the model for \(r = 1\) and \(r = 2\). Can you reproduce Lamberts’ results? You’ll need to change the value of \(c\) to get the accuracy levels reported. Try to get this approximately right (within e.g. 10%), but don’t stress about matching the exact values. In a real model, you would estimate the best value of \(c\) numerically rather than fiddling with it by hand.

  4. A Prototype Model (20 Points): Construct a simple prototype model and simulate the same experiment. One very simple way to do this is to say that the model stores exactly one point-estimate of each category–the average of all of its exemplars. You can then use the similarity function from Busemeyer & Dietrich to compare these point-estimates to the three types of test stimuli (a sketch of this approach appears after this list). Here, \(\sigma\) is a parameter that controls the sensitivity of the similarity function. This should look really familiar…

\[ S_{ij} = \sum_{p=1}^{P}e^{- \left(\frac{\mid x_{ip} - x_{jp}\mid }{\sigma}\right)^2} \]

  5. A generalized problem of similarity functions (10 Points): Show that you get the same pattern of results with this new model by tweaking the similarity function just like you did with the Exemplar model. What does this mean about the claims in McLaren et al.? See if you can tie this back to Marr’s ideas about levels of analysis and about representation and process.
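As with the exemplar model, here is a minimal sketch of what a prototype model could look like, assuming the `training_set` built above; the function names and the value of \(\sigma\) are illustrative choices:

# Store each category as the tile-wise average of its training exemplars
prototype_of <- function(training_set, cat) {
  training_set %>%
    filter(category == cat) %>%
    group_by(row, col) %>%
    summarise(value = mean(value), .groups = "drop")
}

# Busemeyer & Dietrich-style summed-exponential similarity between a test
# grid and a stored prototype; sigma = 0.5 is just an illustrative value
bd_similarity <- function(grid_i, grid_j, sigma = 0.5) {
  inner_join(grid_i, grid_j, by = c("row", "col"), suffix = c("_i", "_j")) %>%
    summarise(s = sum(exp(-(abs(value_i - value_j) / sigma) ^ 2))) %>%
    pull(s)
}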

Report format

R Markdown: All code used to generate the results and plots should be organized and submitted in an R Markdown document.

Download the template for this homework here:

download.file("https://raw.githubusercontent.com/dyurovsky/cog-models/master/hw1_template.Rmd", 
              destfile = "hw1_template.Rmd")

Teamwork and grading

You will be allowed to work with up to two other members of the class and submit one project together. You can either choose your partner(s), or, if you’d prefer, I’m happy to choose partners for you. You’re also welcome to work alone if you prefer.

All partners will receive the same grade.