A probabilistic interpretation of binary cross-entropy

This post is about the most commonly used loss function for binary classification, known as "binary cross-entropy loss." My goal is to explain why this strange-looking formula is the right one in a particular sense: it is the negative logarithm of the "likelihood" that the model reproduces the observed data, so minimizing the loss is the same as maximizing that likelihood.