Introduction
Data science problems typically focus on predicting the what — for instance, what will a house sell for? Or what will a customer buy? Or what is the likelihood that a patient has a disease?
However, many real-world decisions rely equally on when something will occur. How long until a customer leaves? When will a loan default? How much time is left before a component breaks down?
Forecasting when an event will take place is a predictive modeling scenario that often receives little attention in introductory resources. Predicting the “when” is commonly known as time-to-event modeling or survival analysis.
Although event modeling shares methods and principles with more conventional predictive modeling, it also brings unique complexities that must be addressed to produce accurate predictions.
This is the beginning of a multi-part series that will explore the fundamentals of time-to-event modeling. This first installment will introduce core concepts, while subsequent articles will delve into time-to-event model development techniques.
Here are the three topics I’ll cover in this article:
- Breaking events into discrete time intervals
- Censoring in event data
- The life table
Discretizing Time
Although time is inherently continuous, depending on the time-to-event modeling scenario, it may be suitable to treat time as either continuous or discrete. In this article, we’ll concentrate on the discrete approach, but I do want to briefly discuss the choice between discrete and continuous time treatment.
Guidelines for treating time as continuous
Time is generally best treated as continuous when:
- The event can happen at any moment and is naturally continuous (we’ll contrast this with the less intuitive, inherently discrete events in the next section). Equipment breakdown is a typical example.
- The timing of the event can be recorded with precision. It’s challenging to pinpoint the exact moment an unemployed person finds a job, but modern vehicle sensors can capture the precise timing of a car accident.
- The granularity of the time measurement is very fine relative to the overall time horizon. For instance, measuring events down to the second when the natural timeline of the event spans weeks or months.
Keep in mind that measuring time in small increments alone doesn’t automatically mean a continuous-time setting is appropriate. Consider human response time to changing images. Response time can be measured in deciseconds (1/10 of a second), but since typical response times are around 2–3 deciseconds, this unit represents a significant portion of the underlying timeline. Despite the small unit of measurement, this example likely wouldn’t work well as a continuous-time model.
Guidelines for treating time as discrete
Time is generally best treated as discrete when:
- The event itself is naturally discrete. For example, a customer can only miss a payment on a due date; they can’t miss it at an arbitrary moment in time.
- Precise event timing cannot be reliably recorded. We can’t know exactly when a pipe burst or when a person became infected with a disease.
- Data are grouped at discrete intervals for practical reasons. In many applications, treating time as continuous adds minimal value. In home insurance, for instance, it rarely matters what second a pipe burst or a fire started; the relevant unit is typically just the day of the event or the day the claim was filed.
When the modeling context calls for discrete time, a deliberate decision must be made about how to discretize. This requires a solid understanding of the problem domain. In life insurance, time is often measured in years; in business reporting, months or quarters may be more suitable.
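As a minimal sketch of how this discretization looks in code (the durations and the 30-day interval below are hypothetical, chosen only for illustration):

```python
import numpy as np

# Hypothetical durations, in days, from contract start to claim.
durations_days = np.array([3, 45, 100, 365])

# Discretize into 30-day "months": np.ceil maps each duration to the
# interval in which it ends, so day 3 falls in month 1, day 45 in
# month 2, and so on.
interval_days = 30
discrete_month = np.ceil(durations_days / interval_days).astype(int)

print(discrete_month.tolist())  # [1, 2, 4, 13]
```

The choice of `interval_days` is exactly the domain decision described above; nothing in the code forces monthly intervals.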
A note on ties — One additional distinction I want to highlight between discrete and continuous time is ‘ties’ — that is, an event occurring at the exact same time for multiple observations. Many continuous time-to-event modeling methods assume that ties are not possible and don’t exist in the dataset. Discrete time-to-event approaches don’t carry this assumption, and depending on the scenario, ties can be common (think of insurance claims within a month).
Censoring
Data censoring is far more prevalent in time-to-event data than in traditional machine learning applications. Data censoring happens when the value of an observation is only partially known — we might know it falls above (right censoring) or below (left censoring) a certain threshold, but we don’t know the exact value.
Consider yourself as an example: how many years will you live? You know you’ll live at least to your current age (because you already have), but you don’t know how much longer you’ll go. You are a right-censored data point! Your great-great-grandmother is not censored because she has already passed away, so you can determine how long she lived. Alright, enough of this example — I don’t enjoy thinking about my own mortality.
While both right and left censoring can occur in time-to-event applications, I’ll focus my discussion on right censoring because it’s the most common type you’ll encounter. Right censoring typically arises from two phenomena in the data: (1) the event hasn’t occurred or hasn’t had a full opportunity to occur for some observations, and (2) data collection ceased for some observations at a certain point in time. We’ll spend a bit of time discussing each.
The event hasn’t happened
Our somewhat too-real lifespan example falls into the category of censoring due to an event not occurring. Death and taxes are inevitable — or so they say. But not all events you might need to model are guaranteed to eventually happen. Consider modeling when someone catches the flu, gets fired from their job, or when an insurance claim on a house is filed. These are things that may or may not happen, but they are also subject to censoring.
Let’s explore the home insurance example a bit further. We want to predict the timing of claims for a set of home insurance policies. We have a dataset of 1-year contracts that goes back to contracts that started 5 years ago and includes data up to last month. Pause and think about where the censoring comes into play here. Every contract that began less than a year before the data cutoff is right-censored: the contract is still active, so we don’t know how many claims it will ultimately have or when they will occur.
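To make the bookkeeping concrete, here’s a small sketch of how each policy gets an observed duration and an event flag; the cutoff date, contract length, and the `censoring_status` helper are all hypothetical, not from any real dataset:

```python
from datetime import date, timedelta

OBSERVATION_CUTOFF = date(2024, 6, 1)   # hypothetical end of our data
CONTRACT_DAYS = 365                     # 1-year contracts

def censoring_status(start, claim=None):
    """Return (observed_days, event) for a single policy.

    event is 1 if a claim was observed during the follow-up window,
    0 if the record is right-censored (no claim observed before the
    contract ended or before the data cutoff).
    """
    contract_end = start + timedelta(days=CONTRACT_DAYS)
    follow_up_end = min(contract_end, OBSERVATION_CUTOFF)
    if claim is not None and claim <= follow_up_end:
        return (claim - start).days, 1
    return (follow_up_end - start).days, 0

# A policy that started ~3 months before the cutoff with no claim yet
# is right-censored with 92 days of observed exposure:
print(censoring_status(date(2024, 3, 1)))                     # (92, 0)
# A policy whose claim arrived 45 days in is an uncensored event:
print(censoring_status(date(2023, 1, 1), date(2023, 2, 15)))  # (45, 1)
```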
Data stopped being collected
Sometimes our data are censored because we fail to gather event data for various reasons. Imagine we’re conducting a study on how long it takes a job seeker to receive an offer. We begin with 500 participants in our study, but after a while, 50 of them stop responding to our calls and emails. We know their status up to their last response, but we have no way of knowing whether an offer arrived afterward; those 50 participants are right-censored at the point they dropped out of the study.
Can We Ever Really Know a Customer’s Claim Status?
To understand data censoring, imagine a company trying to figure out the best way to communicate with its customers. Say we reached out to a customer last week, and their status was “interested.” That was the last time we contacted them, but we don’t know what their status is now or what it will be in the future (assuming they continue to ignore us).
To make this clearer, let’s return to our home insurance example. We will likely have some customers who cancel their policies before the contract period ends. For these customers, we know the number and timing of claims (if any) up to the point of cancellation. However, once they leave, we have no way of knowing if they experienced any claimable events afterward.
What Happens If You Ignore Data Censoring?
Models built without accounting for censoring will produce biased predictions. Since we are tracking events, censored observations look exactly like non-events to a naive model; the more censoring there is, the fewer events the model sees. Time-to-event models that don’t account for censoring will therefore systematically underestimate the true event rate.
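A quick simulation illustrates the bias. All numbers here are hypothetical: a true per-period event probability of 0.2, with each subject censored in a uniformly random period between 1 and 5. The naive estimate treats censored subjects as non-events; the censoring-aware estimate chains per-period hazards computed only over subjects still at risk:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000
hazard = 0.2        # true per-period event probability
horizon = 5         # we want P(event within 5 periods)

event_time = rng.geometric(hazard, size=n)           # true event period (1, 2, ...)
censor_time = rng.integers(1, horizon + 1, size=n)   # censoring period, uniform 1-5

observed_time = np.minimum(event_time, censor_time)
event_seen = event_time <= censor_time               # censoring at period end

# Naive estimate: censored subjects counted as "no event within 5 periods".
naive = np.mean(event_seen & (observed_time <= horizon))

# Life-table estimate: per-period hazards over subjects still at risk,
# chained into a survival curve.
surv = 1.0
for t in range(1, horizon + 1):
    at_risk = observed_time >= t                     # not yet failed or censored
    events_t = (observed_time == t) & event_seen
    surv *= 1 - events_t.sum() / at_risk.sum()
life_table = 1 - surv

true_p = 1 - (1 - hazard) ** horizon                 # ≈ 0.672
print(f"true={true_p:.3f} naive={naive:.3f} life_table={life_table:.3f}")
```

With these settings, the naive estimate lands around 0.46, well below the true value of roughly 0.67, while the hazard-chaining estimate recovers it.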
Additional Note: Most time-to-event methods assume that censoring is non-informative. This means the reason an observation is censored is unrelated to its underlying risk of an event, after accounting for observed features. If the censoring is actually related to the event risk, standard time-to-event methods can become biased. In some cases, it may be more appropriate to model the censoring process directly, for example, by treating it as a competing risk.
The good news is that there is a simple data transformation that corrects for time-based right censoring. The life table offers a clear and intuitive way to see how this correction works.
The Life Table
Life tables are very simple, yet powerful tools for modeling time-to-event data. While the actual prediction methodology is generally inflexible and tends to underfit, understanding how data is structured in life tables provides a solid foundation for more advanced time-to-event modeling techniques.
Before diving into the details of life tables, let’s get a conceptual overview of what they do. Essentially, life tables break time into multiple discrete intervals to handle the censoring problem.
Consider a single home insurance policy. We could certainly determine the number of claims by simply observing the contract until it expires. But to do that, we’d have to wait until the contract ends, which delays our ability to learn from recent data. The life table allows us to start learning from the data much sooner by dividing time into discrete intervals. We can learn from each discrete interval as soon as it ends. Instead of waiting for a home insurance policy to expire, we can start learning after the first month (if we divide time into monthly intervals).
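In practice, this “learn from each interval as it closes” idea is often implemented by expanding each observation into one row per discrete interval, sometimes called a person-period format. A minimal sketch, where the `to_person_periods` helper and its inputs are hypothetical:

```python
def to_person_periods(policy_id, observed_periods, event):
    """Expand one policy into one row per observed discrete period.

    The event flag is 1 only in the final period, and only if the
    observation ended with an event rather than with censoring.
    """
    return [
        {"policy": policy_id, "period": t,
         "event": int(event and t == observed_periods)}
        for t in range(1, observed_periods + 1)
    ]

# A policy with a claim in month 4 becomes four rows (event in the last);
# a policy censored after 2 months becomes two rows, all with event=0.
print(to_person_periods("A", 4, event=True))
print(to_person_periods("B", 2, event=False))
```

Once the data are in this shape, each row can contribute to the model as soon as its interval closes, which is exactly the early-learning benefit described above.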
Each row in a life table represents a discrete unit of time. The columns of the life table generally fall into two categories: (1) observational data and (2) calculations derived from the observational data. The observational columns include the number of units ‘at risk’ (units that could experience an event), the number of units that actually experienced the event, and the number of units that were censored. The calculation columns include the number of units adjusted for censoring, the conditional probability of the event, the unconditional probability of the event, and the survival probability.
Describing the life table in words can be tricky. Let’s walk through an example to build our intuition.

[Example life table not reproduced here. Note: the additional (1 − conditional probability) column was added for illustration.]

I want to stress how important it is to understand the calculations in the life table. While life tables themselves are rarely used for predictive modeling, the details of these calculations are absolutely essential knowledge when using more advanced techniques.
If you can read through the formulas and understand them, great! If not, I’ve included additional comments on each calculation below.
Let’s go through the columns one by one.
Discrete Time — The sequential, discretized units of time. These could be days, weeks, months, etc.
Units at Risk — This column shows the number of units at risk at the start of each time period. In other words, these are the units that hadn’t experienced the event before the time period in question.
The first value of 1,283 is an input; the other values can be calculated by subtracting the censored units and the number of events from the previous time period’s units at risk.
Censored — The number of units censored during the current time period. These calculations assume censoring occurs at the beginning of the period, which means censored units are not considered ‘at risk’ during that period. Simple adjustments to the calculations can change this assumption; common alternatives assume the censored units are exposed to risk for the full period or for half of it.
Conditional Probability — In discrete-time survival analysis, this is often called the hazard. It’s the probability of the event happening in the current interval, given that the unit has survived up to that interval.
1-Conditional Probability — A simple calculation to get the conditional survival probability.
Survival Probability — The product of all the conditional survival probabilities up to the current point. You can think of survival as a series of coin flips with a different probability of heads on each flip; the survival probability is the chance that none of the first n flips comes up heads.
Unconditional Probability — This calculation gives the probability of an event in a specific time period, without conditioning on survival up to that point. It removes the condition by multiplying the probability of the event in time period n by the product of all the survival probabilities in the time periods from 1 to n-1.
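The column definitions above chain together in a few lines of code. Only the starting at-risk count of 1,283 comes from the example; the monthly event and censoring counts below are invented for illustration, and censoring is assumed to happen at the start of each period:

```python
# Hypothetical monthly counts; only the 1,283 starting units come from the text.
events = [30, 28, 25, 22]
censored = [100, 90, 80, 70]

# Units at risk: previous period's at-risk minus its events and censored units.
at_risk = [1283]
for e, c in zip(events[:-1], censored[:-1]):
    at_risk.append(at_risk[-1] - e - c)

survival = 1.0
for t, (n, e, c) in enumerate(zip(at_risk, events, censored), start=1):
    adjusted = n - c               # censoring assumed at the start of the period
    hazard = e / adjusted          # conditional probability of the event
    uncond = hazard * survival     # unconditional probability: hazard * S(t-1)
    survival *= 1 - hazard         # survival through period t
    print(f"t={t} at_risk={n} adjusted={adjusted} "
          f"hazard={hazard:.4f} uncond={uncond:.4f} survival={survival:.4f}")
```

Note the order of operations in the loop: the unconditional probability multiplies the current hazard by the survival probability accumulated through period t−1, before survival is updated.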
Wrapping It Up
Time-to-event modeling provides the tools to predict when something will happen. This is different from more common machine learning approaches that predict what will happen or how much.
In this article, we covered three main points: (1) discretizing time, (2) understanding censoring in time-to-event data, and (3) using the life table to demonstrate how censoring can be addressed through data structuring.
In the next article, we’ll build on these concepts and show how they translate into practical predictive modeling techniques.