Introduction to IRT

The Framework

At its base, IRT defines a model that relates a student's latent ability to the probability of answering problems correctly. We write this as $p(\theta,b_i)$, where $p$ is the probability of answering problem $i$ correctly given the student's ability $\theta$ and the intrinsic parameters of the problem, collected in the vector $b_i$.

We often want to study how the probability of a correct answer varies with latent ability ($\theta$) for a problem with fixed intrinsic parameters ($b_i$). We call this the characteristic equation of a problem, which is just a plot of $(\theta, p(\theta,b_i))$.

In standard models, $\theta$ is unbounded, that is $\theta \in ]-\infty,\infty[$, and a sigmoid function ($\sigma(x) = \frac{1}{1+e^{-x}}$) is usually used as the base for the characteristic equation. This is convenient since the sigmoid is defined $\forall x \in ]-\infty,\infty[$ and has a range of $]0,1[$, which makes it a natural way to map ability to a probability.

Even though $\theta$ is unbounded, in practice, $\theta$ is typically in the range of $[-3,3]$.

Assumptions

There are some formal assumptions for how $p(\theta,b_i)$ should behave. They're mostly common sense, but it's worth noting them regardless.

  • Monotonicity: The probability of answering correctly, $p(\theta,b_i)$, should monotonically increase as $\theta$ increases.
  • Unidimensionality: Basic models assume that there is only one latent trait that encodes student ability, $\theta$, but this assumption can be relaxed in more advanced models.
  • Local Independence: Responses to different problems are independent of each other, conditional on the student's ability $\theta$ (the factorization below makes this concrete).
  • Invariance: The intrinsic parameters of the problems are stable across different students.
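
Local independence in particular is what makes fitting tractable: writing $u_i \in \{0,1\}$ for the response to problem $i$, the probability of a whole response pattern factorizes over the problems,

$$P(u_1,\dots,u_n \mid \theta) = \prod_{i=1}^{n} p(\theta,b_i)^{u_i}\bigl(1-p(\theta,b_i)\bigr)^{1-u_i}$$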

Models

1 Parameter (Rasch model)

The simplest model uses only one intrinsic parameter: the difficulty of the problem. We write the characteristic equation as

$$p(\theta,d_i)=\frac{1}{1+e^{-(\theta-d_i)}}$$

where $d_i$ is the difficulty parameter. It lives on the same scale as $\theta$ and spans the same range. The neutral value is $0$.
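
As a quick sketch (the function name and values are my own, not from any particular IRT library), the Rasch characteristic equation in Python:

```python
import numpy as np

def rasch(theta, d):
    """Probability of a correct answer under the Rasch (1PL) model."""
    return 1.0 / (1.0 + np.exp(-(theta - d)))

# When ability equals difficulty, the student is at exactly 50%.
print(rasch(theta=0.0, d=0.0))   # 0.5
print(rasch(theta=2.0, d=0.0))   # ~0.88
print(rasch(theta=-2.0, d=0.0))  # ~0.12
```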

2 Parameter model

We can extend this model by introducing a discrimination parameter that varies the slope of the characteristic equation.

$$p(\theta,d_i,a_i) = \frac{1}{1+e^{-a_i(\theta-d_i)}}$$

where $a_i$ is the discrimination parameter.

$a_i$ should be in the range $[0,\infty[$, where $0$ means the skill has no effect: everyone has the same probability of answering correctly, $\frac{1}{2}$ (or, in the 4 parameter model, the average of $b_i$ and $c_i$). As $a_i$ approaches $\infty$, the characteristic equation becomes a perfect step function. The neutral value is $1$.
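
A small numeric check of the two limits described above (a sketch under the 2 parameter model, not library code):

```python
import numpy as np

def icc_2pl(theta, d, a):
    """2-parameter characteristic equation."""
    return 1.0 / (1.0 + np.exp(-a * (theta - d)))

thetas = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])

# a = 0: ability has no effect, everyone sits at 0.5.
print(icc_2pl(thetas, d=0.0, a=0.0))   # [0.5 0.5 0.5 0.5 0.5]

# Large a: the curve approaches a step at theta = d.
print(icc_2pl(thetas, d=0.0, a=50.0))  # ~[0. 0. 0.5 1. 1.]
```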

3 Parameter model

The 3 parameter model accounts for the probability of guessing correctly by raising the lower asymptote

$$p(\theta,d_i,a_i,c_i) = c_i + (1-c_i) \frac{1}{1+e^{-a_i(\theta-d_i)}}$$

where $c_i$ is the guessing parameter.

$c_i$ should be in the range of $[0,1]$ where $0$ means that it's impossible to guess the solution, and $1$ means that it is impossible to fail the question. The neutral value is $0$.

4 Parameter model

The 4 parameter model is used less often, but is still worthwhile to discuss. It introduces a slip factor: the probability that someone who knows the concept still makes a mistake.

$$p(\theta,d_i,a_i,c_i,b_i) = c_i + (b_i-c_i) \frac{1}{1+e^{-a_i(\theta-d_i)}}$$

where $b_i$ is the upper bound probability parameter.

$b_i$ should be larger than $c_i$ and lies in the range $[0,1]$. It encodes the probability that someone who knows the answer actually answers correctly (so the slip probability is $1-b_i$). The neutral value is $1$.
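
Because each parameter has a neutral value, all four models collapse into one function with sensible defaults. A minimal sketch (the name icc and its defaults are my own convention):

```python
import numpy as np

def icc(theta, d, a=1.0, c=0.0, b=1.0):
    """4-parameter characteristic equation.

    The defaults are the neutral values, so icc(theta, d) is the Rasch
    model, adding a gives the 2PL, adding c the 3PL, and b the 4PL.
    """
    return c + (b - c) / (1.0 + np.exp(-a * (theta - d)))

# 3PL example: a hard, discriminating multiple-choice problem with a
# 25% guessing floor.
print(icc(theta=0.0, d=1.5, a=2.0, c=0.25))  # ~0.29
```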

Building intuition

[Plots: the characteristic curve as each parameter is swept around its neutral value, i.e. difficulty (0), discrimination (1), guess factor (0), slip factor (1). A code sketch that reproduces these panels follows below.]
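
A minimal matplotlib sketch that reproduces these panels by sweeping one parameter at a time around its neutral value (the icc helper and the particular sweep values are my own choices):

```python
import numpy as np
import matplotlib.pyplot as plt

def icc(theta, d=0.0, a=1.0, c=0.0, b=1.0):
    """4-parameter characteristic equation with neutral defaults."""
    return c + (b - c) / (1.0 + np.exp(-a * (theta - d)))

theta = np.linspace(-4, 4, 200)
fig, axes = plt.subplots(2, 2, figsize=(8, 6), sharex=True, sharey=True)

# Sweep one parameter at a time, keeping the others at their neutral values.
sweeps = [
    ("difficulty (0)", "d", [-2.0, 0.0, 2.0]),
    ("discrimination (1)", "a", [0.5, 1.0, 3.0]),
    ("guess factor (0)", "c", [0.0, 0.25, 0.5]),
    ("slip factor (1)", "b", [1.0, 0.9, 0.75]),
]
for ax, (title, name, values) in zip(axes.flat, sweeps):
    for v in values:
        ax.plot(theta, icc(theta, **{name: v}), label=f"{name}={v}")
    ax.set_title(title)
    ax.set_xlabel(r"$\theta$")
    ax.set_ylabel("P(correct)")
    ax.legend()

fig.tight_layout()
plt.show()
```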

Fitting the model

We fit the models using maximum likelihood estimation. The procedure is the same whether or not the intrinsic problem parameters are known in advance. However, if you already know the problem difficulties, you can estimate a student's ability iteratively, using only a subset of the problems.
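
As a sketch of that second case (problem difficulties known in advance, Rasch model for simplicity), estimating a single student's $\theta$ is a one-dimensional maximum likelihood problem. The helper names and data below are made up for illustration:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def rasch(theta, d):
    return 1.0 / (1.0 + np.exp(-(theta - d)))

def neg_log_likelihood(theta, difficulties, responses):
    """Negative log-likelihood of a response pattern under the Rasch model."""
    p = rasch(theta, difficulties)
    return -np.sum(responses * np.log(p) + (1 - responses) * np.log(1 - p))

# Known problem difficulties and one student's 0/1 responses.
difficulties = np.array([-1.5, -0.5, 0.0, 0.5, 1.5])
responses    = np.array([1, 1, 1, 0, 0])

result = minimize_scalar(neg_log_likelihood,
                         bounds=(-4, 4),
                         args=(difficulties, responses),
                         method="bounded")
print(result.x)  # maximum likelihood estimate of theta
```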

By always choosing the problem that gives the maximum information, $I = p(\theta,b_i)\,(1-p(\theta,b_i))$ (in the Rasch model), you can quickly estimate the student's ability with a high degree of confidence. The highest-information problem is the one where your current estimate gives the student a probability of answering correctly closest to $50\%$.
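
A minimal sketch of that selection rule under the Rasch model, where $I_i(\theta) = p_i(1-p_i)$ and the most informative problem is the one whose difficulty is closest to the current ability estimate (names and data are illustrative):

```python
import numpy as np

def rasch(theta, d):
    return 1.0 / (1.0 + np.exp(-(theta - d)))

def most_informative(theta_hat, difficulties, asked):
    """Index of the unasked problem with maximum information at theta_hat."""
    p = rasch(theta_hat, difficulties)
    info = p * (1.0 - p)          # Rasch item information
    info[list(asked)] = -np.inf   # never re-ask a problem
    return int(np.argmax(info))

difficulties = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
asked = set()
theta_hat = 0.0                    # current ability estimate

next_item = most_informative(theta_hat, difficulties, asked)
print(next_item)  # 2 -> difficulty 0.0, closest to the current estimate
```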