# Homework 3

Due by 11:59 PM on Wednesday, October 21, 2020

Getting your assignment: You can find template code for your submission here at this GitHub Classroom link. All of the code you write you should go in hw3.Rmd, and please knit the Markdown file in your completed submission.

## Introduction

An important problem in inferential learning is that the data available are almost always too sparse to unambiguously pick out the true hypothesis (see e.g. Quine, 1960). Human learners, however, appear to cope with this problem just fine, converging on a small set of possible hypothesis from very little data. In machine learning, this is known as few-shot learning, and is a real challenge for modern learning systems. One hypothesis for what helps human learners is that have a strong prior on the kinds of hypotheses that are likely and relevant, which makes the search problem much easier.

In this assignment, you’ll be working with a simple problem designed to demonstrate this phenomenon, and building a model that learns in a human-like way. The number game, developed by Josh Tenenbaum (2000), demonstrates the power of strong sampling for inductive inference. The key insight is that the probability that data is generated by a hypothesized process is proportional to the size of the set of possible data that the hypothesis could generate.

In the number game, you get as input one or more numbers between 1 and 100 generated by a computer program according to an unknown rule. Your job is to determine the most likely rule, or to make predictions about which numbers the program will generate next.

# Modeling the number game

To figure out what hypothesis the computer is generating samples from, you can use Bayesian inference. To do this, you will use Bayes Rule to infer the posterior probability for each hypothesis:

$P\left(H \vert D\right) \propto P\left(D \vert H\right)P\left(H\right)$

To specify the model, you need to specify a prior probability ($$P\left(H\right)$$) for each hypothesis, and a likelihood function that specified how probable the observed data are under that hypothesis ($$P\left(D \vert H\right)$$).

Prior: Following Tenenbaum (2000), we’ll say that hypotheses can be of two kinds: Mathematical and Interval. We’ll say that aprior, there is a $$\lambda$$ probability that the true hypothesis is in the set of Mathemtical hypotheses, and a (1-$$\lambda$$) probability that the true hypothesis is in the set of Interval Hypotheses

Likelihood: If we assume that the every number consistent with a particular hypothesis is equally likely to be generated by the machine, then if every number in $$D$$ is in $$H$$, the likelihood that set of numbers $$D$$ is generated by the hypothesis $$H$$ is the probability of choosing $$\lvert D \rvert$$ numbers at random from the set of $$\lvert H \rvert$$ numbers ($$\frac{1}{\lvert H \rvert \choose \lvert D \rvert}$$). You can compute this in R using the choose function. Otherwise, $$H$$ cannot generate $$D$$ and the likelihood is 0.

# The candidate hypotheses

Your first goal is to write all of the candidate hypotheses that your model will consider. You don’t need to exhaustively include all of the hypotheses in the original Tenenbaum (2000) model, just the following

Mathematical hypotheses:

• even numbers

• odd numbers

• square numbers

• cube numbers

• prime numbers

• multiples of any number $$n$$, $$\left(3 \leq n \leq 12\right)$$

• powers of any number $$n$$, $$\left(2 \leq n \leq 10\right)$$

Interval hypotheses:

• decades ($$\left\{1-10,\; 10-20, \;\ldots \right\}$$)

• any range $$1 \leq n \leq 100, n \leq m \leq 100, \left\{n-m\right\}$$

One nice way of representing these hypotheses as tibbles with two columns: name which gives a human-readable description of the hypothesis, and set which is a list of all of the numbers 1—100 consistent with that hypothesis. You’ll first. write a function that generates each of these. For the multiple-set hypotheses (e.g. multiples of any number), you can make the function generate a multi-row tibble where each row is one of the sub-hypotheses.

#### Problem 1: Define all of the hypotheses for the number game. You’re welcome to use the stubs I have provided in the template. (3 points).

Now that you have each of the individual hypotheses, you can put them all together. A nice representation for this is one big tibble that has a row for all of the hypotheses you’ve generated. You’ll find the bind_rows function to be helpful here.

The last detail is that you want to be able to assign a prior probability appropriately to each of these hypotheses. Remember, all of the mathematical hypothesis together should have prior probability $$\lambda$$, and the interval hypotheses should make up the remainder of the prior ($$1 - \lambda$$). You’ll want to divy up this probability equally within those respective sets.

# Modeling Inference

Now that you have all of your hypotheses, you need to write the likelihood function. And finally to compute the posterior probability of each hypothesis.

# Inferring hypotheses

Now you are ready to test the model! Using the default $$\lambda$$ value of $$\frac{4}{5}$$, you will get the top 5 best hypotheses for each of the following sets of data $$D$$:

• $$2$$

• $$2, 4, 8$$

• $$15, 22$$

• $$1, 2, 3, 4, 6$$

# Predicting the next number.

Now you will implement the final part of the model: Prediction about whether the computer is likely to generate a new number $$d'$$ having observed all of the previous examples $$D$$. To do this, you will use Bayesian model averaging. Instead of committing to the best hypothesis after observing $$D$$, you will average together all of the hypotheses, weighing them by the posterior probability of them being true after observing $$D$$.

$P\left(d'\right) = \sum_{h \in H}P\left(d' | h\right)P\left(h | D\right)$

You’ll just need to write one new function predict_value which will take in a tibble of hypotheses, a set of previously observed numbers, and a new target number. It should return the posterior probability of observing that number according to the formula above. You shouldn’t need any new helper functions here!

To check whether it works correctly, try these examples using the default value for $$\lambda$$:

• $$D = 2, \;d' = 12$$

• $$D = 2, 4, 8, \;d' = 12$$

• $$D = 15, 22 \; d' = 21$$

• $$D = 15, 22 \; d' = 50$$