
Propensity Labs is a Delaware Public Benefit Corporation. Our charter commits us to advancing the safety of artificial intelligence systems for the benefit of humanity, through their evaluation and responsible deployment.
Today, that means evaluating frontier models for the kinds of behavior that can lead to loss of control, such as power-seeking, specification gaming, and resistance to goal modification. SysAdmin is our first benchmark, covering 7 frontier models and 2,800 tasks, and counting.
SysAdmin benchmark: Measuring propensities that could lead to Loss of Control in an open-world Linux sandbox. 7 frontier models, 2,800 tasks, 2x2 factorial design. Submitted to NeurIPS 2026. Draft here.
Early benchmark: Testing power seeking in models tasked with simple system administration work. Presented at the Alignment Workshop at NeurIPS 2025. Poster here.
Evaluations of the most-downloaded open-source models on the propensities that polled HuggingFace users were interested in: Instruction Following and Hallucinations.
Analysis of power-seeking behavior among agents on Moltbook, and the effect of humans masquerading as agents.
We publish our research here: https://propensitylabs.substack.com/
Propensities are what a model is inclined to do when given the opportunity, in contrast with capabilities, which are what a model is able to do.
Existing organizations focus on what models are capable of, but there is a gap in systematically evaluating the model propensities that could lead to systemic risks such as loss-of-control scenarios.
Yes, focusing on capabilities alone does not address all risks from misalignment. Models can hide capabilities, or acquire them later through further training or scaffolding. Conversely, models might have the capacity to do certain things but not be inclined to do them.
We're initially focusing on propensities that contribute to Loss of Control scenarios, such as Power Seeking, Corrigibility, and Lawlessness. Our current work evaluates Power Seeking tendencies in frontier LLMs, which manifest in terminal agents as behaviors such as privilege escalation and scope increase while completing tasks.
Loss of Control is defined as 'Risks from humans losing the ability to reliably direct, modify, or shut down a model'.
Model propensities contribute disproportionately to it. Other systemic risks stem more from model misuse (enabling CBRN, cyber attacks, and harmful manipulation).
Of course! Please reach out to us via the form below or email us at info@propensitylabs.ai.