
We're Propensity Labs, an independent evaluation platform that certifies AI models on propensities that could lead to catastrophic risks.
SysAdmin Benchmark: testing power-seeking in models tasked with simple system admin work. Presented at the Alignment Workshop at NeurIPS 2025. Poster here.
Evaluations of the most downloaded open-source models, on the propensities HuggingFace users were most interested in when polled: Instruction Following and Hallucinations.
[WIP] We're creating an open-world Linux environment with complex tasks to measure power-seeking propensities in models. Stay tuned!
Propensities are what a model is inclined to do when given the opportunity, in contrast with capabilities, which are what a model is able to do.
While there are existing organizations that focus on what models are capable of, there's a gap in systematically evaluating all of the model propensities that could lead to systemic risks such as loss-of-control scenarios.
Yes, focusing on capabilities alone does not address all risks from misalignment. Models can hide capabilities, or acquire them later through further training or scaffolding. Conversely, models might have the capacity to do certain things but not be inclined to do them.
We're initially focusing on propensities relevant to Loss of Control scenarios, such as Power Seeking, Corrigibility, and Lawlessness. Our current work focuses on evaluating Power Seeking tendencies in frontier LLMs.
Loss of Control is defined as 'Risks from humans losing the ability to reliably direct, modify, or shut down a model'.
Model propensities contribute disproportionately to Loss of Control compared with other systemic risks, which stem more from model misuse (enabling CBRN threats, cyber attacks, and harmful manipulation).
Of course! Please reach out to us via the form below or email us at info@propensitylabs.ai