Bug Validation Framework
LeanCert includes a validation framework to filter false positives from counterexample searches.
The Problem
Static analysis and interval arithmetic can report "violations" that aren't real bugs:
- Interval explosion: Wide bounds due to the dependency problem
- Precision artifacts: Floating-point rounding in the checker
- Intentional behavior: Code that's designed to handle edge cases
The validation framework helps distinguish real bugs from false alarms.
Quick Example
import leancert as lc
from leancert.validation import BugValidator
# Expression that might have issues
x = lc.var('x')
expr = 1 / (x - 1)  # Singularity at x = 1
# Find potential violations
result = lc.find_bounds(expr, {'x': (0.5, 1.5)})
# Validate if the "unbounded" result is a real issue
validator = BugValidator()
verdict = validator.validate(
expr=expr,
domain={'x': (0.5, 1.5)},
reported_issue="unbounded output"
)
print(verdict)
# ValidationVerdict(
# is_real_bug=True,
# confidence=0.95,
# reason="Singularity at x=1 within domain"
# )
Components
BugValidator
Main entry point for validation.
from leancert.validation import BugValidator
validator = BugValidator()
verdict = validator.validate(expr, domain, reported_issue)
IntervalExplosionDetector
Detects whether bounds blew up because of interval arithmetic limitations rather than actual unboundedness.
from leancert.validation import IntervalExplosionDetector
detector = IntervalExplosionDetector()
# Check if wide bounds are due to dependency problem
is_explosion = detector.detect(
expr=x * x - x * x, # Should be 0, but interval gives [-1, 1] on [-1, 1]
domain={'x': (-1, 1)}
)
# Returns True - this is interval explosion, not a real issue
Common explosion patterns (a detection sketch follows the list):
- x - x → should be 0, interval gives wide bounds
- x * (1/x) → should be 1, interval gives wide bounds
- Correlated variables in LayerNorm
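The gist of explosion detection can be illustrated without LeanCert at all: compare the width of the reported interval against the spread of values observed by concrete sampling. The helper below is a hypothetical sketch, not the detector's implementation:

import random

def looks_like_explosion(f, interval_width, lo, hi, samples=1000, ratio=100.0):
    """Hypothetical helper: True if the reported interval is far wider than
    the value spread seen when sampling f concretely over [lo, hi]."""
    values = [f(random.uniform(lo, hi)) for _ in range(samples)]
    sampled_width = max(values) - min(values)
    return interval_width > ratio * max(sampled_width, 1e-12)

# x*x - x*x is identically 0, yet interval arithmetic reports [-1, 1] on [-1, 1]:
print(looks_like_explosion(lambda x: x * x - x * x, 2.0, -1.0, 1.0))  # True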
CounterexampleVerifier
Concretely evaluates the expression at reported counterexample points.
from leancert.validation import CounterexampleVerifier
verifier = CounterexampleVerifier()
# Check if the counterexample is real
is_real = verifier.verify(
expr=x**2,
point={'x': 2.0},
claimed_bound=3.0,
bound_type='upper'
)
# Returns True - x²=4 really does exceed 3
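Conceptually, verification is just concrete evaluation at the reported point. A minimal plain-Python sketch (not the library's internals):

def exceeds_upper_bound(f, x_value, claimed_upper):
    """Hypothetical helper: does f(x_value) actually exceed the claimed bound?"""
    return f(x_value) > claimed_upper

print(exceeds_upper_bound(lambda x: x**2, 2.0, 3.0))  # True: 4.0 > 3.0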
CommentAnalyzer
Checks code context for indicators that behavior is intentional.
from leancert.validation import CommentAnalyzer
analyzer = CommentAnalyzer()
code_context = """
# NOTE: This can return infinity for x near 0
# This is expected behavior - caller handles it
def f(x):
return 1/x
"""
is_intentional = analyzer.is_intentional(code_context)
# Returns True - comment indicates intentional behavior
Detected patterns (a matching sketch follows the list):
- # expected, # intentional, # by design
- # TODO: handle edge case
- # FIXME (indicates known issue)
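A rough sketch of this kind of pattern matching; the marker list and regex below are illustrative, not the analyzer's actual rules:

import re

INTENTIONAL_MARKERS = re.compile(
    r"#.*\b(expected|intentional|by design)\b", re.IGNORECASE
)

def has_intentional_marker(code_context: str) -> bool:
    """Hypothetical helper: scan comments for intentional-behavior markers."""
    return bool(INTENTIONAL_MARKERS.search(code_context))

print(has_intentional_marker("# This is expected behavior - caller handles it"))  # True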
ValidationVerdict
Result of validation:
@dataclass
class ValidationVerdict:
is_real_bug: bool # True if this is likely a real issue
confidence: float # 0.0 to 1.0
reason: str # Human-readable explanation
suggested_fix: str | None # Optional fix suggestion
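A typical way to consume a verdict (field names from the dataclass above; the 0.9 threshold is an arbitrary choice):

if verdict.is_real_bug and verdict.confidence >= 0.9:
    print(f"Real bug: {verdict.reason}")
    if verdict.suggested_fix:
        print(f"Suggested fix: {verdict.suggested_fix}")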
Validation Pipeline
The BugValidator runs multiple checks:
Reported Issue
│
▼
┌─────────────────┐
│ Interval │──▶ If explosion detected,
│ Explosion Check │ likely false positive
└─────────────────┘
│
▼
┌─────────────────┐
│ Counterexample │──▶ Concrete evaluation
│ Verification │ confirms or refutes
└─────────────────┘
│
▼
┌─────────────────┐
│ Comment │──▶ Check for intentional
│ Analysis │ behavior markers
└─────────────────┘
│
▼
Verdict
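Schematically, the checks might combine as below. This is a sketch only; BugValidator's actual ordering and weighting are internal to LeanCert:

def run_pipeline(is_explosion, counterexample_confirmed, is_intentional):
    """Hypothetical composition of the three checks into a verdict."""
    if is_explosion:
        return "false positive: interval explosion"
    if not counterexample_confirmed:
        return "false positive: counterexample refuted by concrete evaluation"
    if is_intentional:
        return "not a bug: behavior documented as intentional"
    return "real bug"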
Configuration
validator = BugValidator(
concrete_samples=100, # Points to sample for verification
explosion_threshold=1e10, # Bound magnitude indicating explosion
confidence_threshold=0.8 # Min confidence to report as bug
)
Integration with CI
import sys

import leancert as lc
from leancert.validation import BugValidator
def check_module(module_path):
"""Check a module for real bugs, filtering false positives."""
validator = BugValidator()
real_bugs = []
# Run interval analysis
results = lc.analyze_file(module_path)
for issue in results.potential_issues:
verdict = validator.validate(
expr=issue.expr,
domain=issue.domain,
reported_issue=issue.description
)
if verdict.is_real_bug and verdict.confidence > 0.8:
real_bugs.append(issue)
return real_bugs
# In CI pipeline
bugs = check_module("src/math_utils.py")
if bugs:
print(f"Found {len(bugs)} real bugs")
sys.exit(1)
BugReport
Structured bug report for integration:
@dataclass
class BugReport:
location: str # File and line
expr: Expr # Expression with issue
domain: dict # Input domain
issue_type: str # "overflow", "division_by_zero", etc.
counterexample: dict # Concrete input triggering issue
confidence: float
def to_json(self) -> str: ...
def to_github_annotation(self) -> str: ...
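As an illustration, to_github_annotation() could target the GitHub Actions workflow-command format. The sketch below assumes a file:line shape for location and is not the method's actual output:

def github_annotation(location: str, issue_type: str, confidence: float) -> str:
    """Hypothetical rendering via GitHub's ::error workflow command."""
    file, line = location.rsplit(":", 1)
    return f"::error file={file},line={line}::{issue_type} (confidence {confidence:.2f})"

print(github_annotation("src/math_utils.py:42", "division_by_zero", 0.95))
# ::error file=src/math_utils.py,line=42::division_by_zero (confidence 0.95)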
Example: Filtering False Positives
import leancert as lc
from leancert.validation import BugValidator
x = lc.var('x')
# False positive: interval explosion
expr1 = (x + 1) * (x - 1) - (x*x - 1)  # Algebraically 0
result1 = lc.find_bounds(expr1, {'x': (-10, 10)})
# Interval arithmetic reports very wide bounds, but the expression is identically 0
validator = BugValidator()
verdict1 = validator.validate(expr1, {'x': (-10, 10)}, "large bounds")
# verdict1.is_real_bug = False (detected as interval explosion)
# Real bug: division by zero possible
expr2 = 1 / x
result2 = lc.find_bounds(expr2, {'x': (-1, 1)})
verdict2 = validator.validate(expr2, {'x': (-1, 1)}, "unbounded")
# verdict2.is_real_bug = True (x=0 causes real issue)
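Both verdicts can be sanity-checked with plain Python, independently of LeanCert:

# expr1 is identically 0 at every sample point...
for x_val in (-10.0, -1.5, 0.5, 10.0):
    assert (x_val + 1) * (x_val - 1) - (x_val * x_val - 1) == 0.0

# ...while 1/x genuinely exceeds any fixed bound as x approaches 0.
print(1 / 1e-300)  # 1e+300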