Revolutionizing Mouse Health Assessment Through AI-Powered Continuous Home-Cage Surveillance Across Multiple Centers

Figure 1 illustrates how daily health checks are carried out at the three facilities. At Rutgers University (fig. 1a), technicians use a flashlight (pen light) to look for signs of animal distress. However, visibility can be limited because animals are often resting inside their nests or enrichment devices (fig. 1b,c). Fully pulling cages out of the rack to inspect the animals is discouraged at both Rutgers and the École Polytechnique Fédérale de Lausanne (EPFL), as this practice can disturb the animals’ sleep and compromise the reliability of scientific results. Observations of technician workflow at Rutgers showed that each cage typically receives only 4–8 seconds of attention during routine checks.

To assess how well these daily checks work, we reviewed all clinical cases recorded for animals housed in DVC racks over 11 months at Rutgers (USA), 3 years at EPFL (Switzerland), and 10 months at Sanofi (France). As shown in figure 2a, the highest number of detected cases occurred on cage change days: Friday at EPFL, Monday at Rutgers, and Wednesday at Sanofi. At Rutgers and EPFL, case numbers dropped sharply on weekends. This decline likely reflects reduced weekend staffing: fewer technicians must still inspect a large number of cages, so weekend checks tend to prioritize food, water, and urgent veterinary needs. In contrast, Sanofi showed little difference in case detection between cage change days and other days, possibly because tumor study cages were pulled from the rack daily for visual checks without being opened. Across the study period, the total number of confirmed cases was 229 at EPFL, 42 at Rutgers, and 65 at Sanofi, categorized in figure 2b. Supplementary figures 1 and 2 show animal density per cage and the distribution of veterinary cases, respectively. These results indicate that brief visual inspections are poor at detecting animal distress: most health problems are noticed only when technicians handle animals during cage changes.

**Fig. 2: Distribution of veterinary cases by weekday and veterinary category across the three institutions.**

To improve health monitoring, we created machine learning/AI algorithms that analyze nighttime movement when mice are naturally most active. Each cage’s activity pattern over a 12-hour dark period is compared to its own average from the previous 7 days. Figure 3 shows examples of the alerts the system generates. Red lines represent the current day’s activity; green lines show the 7-day baseline. A color-coded bar at the top of each graph indicates status: green for normal, red for abnormal (labeled as hypoactivity, hyperactivity, or unusual spatial behavior). Figure 3a shows a mouse with reduced movement (hypoactivity); figure 3b shows zero movement, indicating a likely death; figure 3c displays a hyperactive pattern; and figure 3d represents normal activity with no alerts. Technicians use these signals to decide whether a closer physical examination is needed.
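The baseline comparison described above can be sketched in a few lines. This is only an illustration of the idea, not the published algorithm: the real system uses trained ensemble models, whereas the function below uses a simple ratio of total nightly activity. The function name, the hourly binning, and the `low` and `high` cut-offs are all assumptions made for readability.

```python
from statistics import mean

def classify_night(current, history, low=0.5, high=1.5):
    """Compare one dark-phase activity trace with the cage's own 7-day baseline.

    current: activity values for the current 12-hour dark period
    history: list of 7 such traces, one per previous night
    low, high: hypothetical ratio cut-offs, not values from the study
    """
    if len(history) != 7:
        raise ValueError("expected exactly 7 baseline nights")
    baseline_total = mean(sum(night) for night in history)
    current_total = sum(current)
    if current_total == 0:
        return "no activity"  # zero movement, the pattern associated with a likely death
    if baseline_total == 0:
        return "hyperactivity"  # any movement exceeds a zero baseline
    ratio = current_total / baseline_total
    if ratio < low:
        return "hypoactivity"
    if ratio > high:
        return "hyperactivity"
    return "normal"

# Toy data: 12 hourly bins per night, constant activity over the baseline week
baseline = [[10] * 12 for _ in range(7)]
for trace in ([10] * 12, [3] * 12, [20] * 12, [0] * 12):
    print(classify_night(trace, baseline))
```

A per-cage baseline, rather than a facility-wide one, means each cage is judged against its own recent behavior, so differences in strain or housing density do not by themselves trigger alerts.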

**Fig. 3: Example of graphical output from the welfare-check task generated by the algorithm.**

A key strength of this algorithm is that users can customize the thresholds for flagging movement anomalies, and with them the trade-off between true alarms (true positives, TP) and false alarms (false positives, FP). We tested three ensemble models on data from all three sites to find the threshold best suited for scaling the system across an entire facility. The maximum allowed FP rates were set at 5% (model 1), 15% (model 2), and 30% (model 3). This flexibility lets each facility choose settings that balance detection sensitivity against the workload created by false alarms. Too many false warnings can lead to “user fatigue,” reducing trust in the system and making it less effective.
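One way to picture how a maximum FP rate translates into an alert threshold is the sketch below. It assumes each cage-night has a single numeric anomaly score and that an alert fires when the score exceeds a cut-off; both are assumptions for illustration, and the source does not describe how its ensemble models were actually calibrated. The scores are synthetic.

```python
def threshold_for_fp_rate(healthy_scores, max_fp_rate):
    """Return a score cut-off whose false-alarm rate on healthy data stays within budget.

    healthy_scores: anomaly scores from cage-nights with no veterinary case
    max_fp_rate: fraction of healthy cage-nights allowed to raise an alert
    An alert fires when score > threshold (a convention assumed here).
    """
    if not healthy_scores:
        raise ValueError("need at least one healthy score")
    if not 0 <= max_fp_rate < 1:
        raise ValueError("max_fp_rate must be in [0, 1)")
    ranked = sorted(healthy_scores, reverse=True)
    allowed = int(max_fp_rate * len(ranked))  # healthy nights permitted to alert
    return ranked[allowed]  # at most `allowed` scores are strictly greater

def fp_rate(healthy_scores, threshold):
    """Fraction of healthy cage-nights that would raise an alert."""
    return sum(s > threshold for s in healthy_scores) / len(healthy_scores)

# Synthetic scores 0..99 standing in for healthy cage-nights
scores = list(range(100))
for budget in (0.05, 0.15, 0.30):  # the three FP budgets named in the text
    cut = threshold_for_fp_rate(scores, budget)
    print(budget, cut, fp_rate(scores, cut))
```

Lowering the cut-off catches more real cases but sends technicians to more healthy cages, which is the trade-off behind the three models.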

The analysis covered 1,367 cages across the three sites over varying durations (roughly 300 days each for Rutgers and Sanofi, and over 1,000 days at EPFL), yielding 90,540 evaluations at EPFL, 51,094 at Rutgers, and 15,367 at Sanofi (see Table 1). Table 1 also reports FP detection rates for each model. The analysis started retroactively from the date each facility first recorded a confirmed veterinary case (day 0). As expected, FP rates rose from model 1 to model 3, with notable differences between sites. EPFL’s results closely matched the algorithm’s performance during optimization. In contrast, Rutgers showed relatively small changes in false positives across the three models.

Discrepancies in detection rates between EPFL and the other two sites (Rutgers and Sanofi) may be attributable to the larger population of wild-type B6 mice at EPFL. At both Sanofi and Rutgers, performance differences between model 2 and model 3 were negligible (Table 1). This inter-site variation could stem from the smaller cage count and shorter evaluation period at Rutgers and Sanofi.

Table 1: Performance of the AI algorithms across the three institutions

The proportion of clinical cases, or true positives (TPs), identified on day 0 generally increased from model 1 through model 3 (Fig. 4). EPFL improved from 17.9% (model 1) to 40.6% (model 3), and Sanofi rose from 50.8% (model 1) to 61.5% (model 3). Rutgers University was the exception: its detection rate declined under model 3 (42.2%) relative to model 2 (50%). All facilities and time points showed statistically significant improvements when moving from model 1 to model 2 (McNemar test, P < 0.05, power ranging from 0.71 to 1), though Sanofi saw only a marginal gain (+6.1%). To address weekday bias in operator-reported cases, we broadened the detection window to 3 and 6 days before day 0 (days −3 and −6), counting any detection within that window as a TP. Expanding the window boosted TP detection across all three models (Fig. 4). Only EPFL showed a statistically significant improvement between model 2 and model 3, with detection rates climbing by 7–9 percentage points for both the −3 and −6 day intervals (McNemar test, P < 0.05, though power was limited [0.05–0.24] due to the small number of discordant pairs). Given these findings, we selected model 2 for the subsequent results presented in this report, as it struck the best balance between operational efficiency and TP detection. Findings from models 1 and 3 are provided in Supplementary Figs. 3–6.
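The McNemar test compares two models on the same set of cases and depends only on the discordant pairs, that is, cases detected by one model but not the other. The sketch below shows the exact (binomial) two-sided form using only the standard library. The source does not say which variant of the test was used, so the exact form is an assumption, and the detection flags are invented for illustration.

```python
from math import comb

def mcnemar_exact(detected_a, detected_b):
    """Two-sided exact McNemar p-value for paired detection outcomes.

    detected_a, detected_b: equal-length sequences of booleans, one entry per
    clinical case, saying whether each model flagged that case.
    """
    if len(detected_a) != len(detected_b):
        raise ValueError("both models must be scored on the same cases")
    only_a = sum(1 for a, b in zip(detected_a, detected_b) if a and not b)
    only_b = sum(1 for a, b in zip(detected_a, detected_b) if b and not a)
    n = only_a + only_b  # concordant pairs carry no information
    if n == 0:
        return 1.0
    k = min(only_a, only_b)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Invented example: 30 cases, the second model catches 10 that the first misses
model_a = [True] * 15 + [False] * 15
model_b = [True] * 25 + [False] * 5
print(mcnemar_exact(model_a, model_b))
```

Because the statistic ignores concordant pairs, a comparison with few discordant pairs has little power regardless of the total number of cases, which is consistent with the low power reported for the model 2 versus model 3 comparison.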

**Fig. 4: Detection rates of three ensemble models across three time intervals.**

Figure 5 displays the categories of TP clinical issues identified by model 2 across the three locations on day 0. The naming conventions for clinical cases varied among the institutions depending on their available clinical records. A clinical category was included only if it contained at least three documented cases. The algorithm detected 93% of found-dead cases at EPFL (71 total cases), 85% at Rutgers (13 total cases), and 100% at Sanofi (3 total cases). The small share of missed fatal cases at Rutgers and EPFL involved group-housed mice. Because the current technology cannot track the movement of individual animals, the activity of cage mates can presumably mask a death, so the system likely lacks the sensitivity needed to detect 100% of spontaneous deaths. At Rutgers, the algorithm detected every recorded case of hunched posture, eye problems, and weight loss. Detection rates exceeding 80% were also recorded for health impairment, neurological disorders, and hyperactivity at EPFL, as well as for ruffled fur and tumors or ulcers at Sanofi. The lowest detection rates were noted for injuries at EPFL (73%) and for fighting and skin conditions at Rutgers (62% and 57%, respectively).
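The per-category rates above follow a simple rule: count detections within each clinical category and report only categories with at least three documented cases. A minimal sketch of that tally is shown below. The record layout and the function name are assumptions, and the records are invented placeholders rather than study data.

```python
from collections import defaultdict

def detection_rates(cases, min_cases=3):
    """Return the detection rate per category, skipping sparsely populated ones.

    cases: iterable of (category, detected) pairs, one per clinical case
    min_cases: inclusion threshold (the text uses three documented cases)
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for category, detected in cases:
        totals[category] += 1
        hits[category] += bool(detected)
    return {
        category: hits[category] / totals[category]
        for category in totals
        if totals[category] >= min_cases
    }

# Invented records for illustration only
records = [
    ("found dead", True), ("found dead", True), ("found dead", False),
    ("found dead", True), ("fighting", True), ("fighting", False),
]
print(detection_rates(records))  # "fighting" is dropped: only two cases
```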

**Fig. 5: Detection rates of TP veterinary case categories using model 2.**

To evaluate the algorithm’s ability to catch subclinical cases within the TP subset, we extended the analysis window to −3 and −6 days. Figure 6 illustrates when the algorithm first flagged anomalous behavior. Generally, initial anomaly detections occurred between days −3 and −6, pointing to an early warning signal of the clinical issue. At EPFL and Sanofi, TP detection ranged from 59% to 100% depending on the clinical category during days −3 to −6. At Rutgers, certain TP cases involving skin issues (43%), fighting (38%), and eye-related problems (67%) went undetected on earlier days. Early stages of these conditions may not have yet altered the locomotor activity of the mice, which could explain why the algorithm failed to pick them up.
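The windowed analysis can be expressed as a small helper that finds the earliest alert within a given number of days before the reported case. This is a sketch under stated assumptions: days are integers relative to day 0 (the day the case was recorded), negative values are days before it, and the function names are hypothetical.

```python
def first_alert_day(alert_days, window=6):
    """Return the earliest alert day within [-window, 0], or None if there is none.

    alert_days: days on which the cage raised an alert, relative to day 0
    window: how many days before day 0 still count (0, 3, or 6 in the text)
    """
    in_window = [day for day in alert_days if -window <= day <= 0]
    return min(in_window) if in_window else None

def is_true_positive(alert_days, window):
    """A case counts as detected if any alert falls inside the window."""
    return first_alert_day(alert_days, window) is not None

# Invented alert history for one cage
alerts = [-9, -5, -2, 0]
for window in (0, 3, 6):
    print(window, first_alert_day(alerts, window), is_true_positive(alerts, window))
```

Widening the window can only add detections, never remove them, which is why the TP rate rises or stays flat as the window grows from day 0 to day −6.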

**Fig. 6: Temporal distribution of the first alerts generated by model 2 from day 0 to day −6 for each veterinary case category.**
