# Neural Network & Transformer Verification

LeanCert includes a formally verified neural network verifier based on interval propagation and DeepPoly relaxations, with support for modern architectures including Transformers.
## Overview
The ML module provides:
- Interval Propagation: Sound overapproximation of neural network outputs
- DeepPoly Relaxations: Tight linear bounds for ReLU and sigmoid activations
- Transformer Support: Multi-Head Attention, LayerNorm, GELU, Residual connections
- Verified Soundness: All propagation theorems are formally proved in Lean
## Supported Architectures
| Architecture | Components | Status |
|---|---|---|
| Feedforward (MLP) | Linear, ReLU, Sigmoid | Fully verified |
| Transformers | Attention, LayerNorm, GELU | Fully verified |
| Residual Networks | Skip connections | Fully verified |
## Quick Example

```lean
import LeanCert.ML.Network

open LeanCert.ML

-- Define a simple 2-layer network
def myNet : TwoLayerNet := {
  layer1 := { weights := [[1, -1], [0, 1]], bias := [0, 0] }
  layer2 := { weights := [[1, 1]], bias := [0] }
}

-- Input interval: x₁ ∈ [-1, 1], x₂ ∈ [0, 1]
def inputBox : IntervalVector := [
  IntervalDyadic.ofRat (-1) 1,
  IntervalDyadic.ofRat 0 1
]

-- Propagate intervals through the network
def outputBounds := myNet.forwardInterval inputBox
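```

Working the propagation by hand for this network (interval arithmetic, with ReLU applied after the first layer):

\[
W_1 x + b_1 \in \begin{pmatrix} [-2, 1] \\ [0, 1] \end{pmatrix}
\;\xrightarrow{\ \text{ReLU}\ }\;
\begin{pmatrix} [0, 1] \\ [0, 1] \end{pmatrix}
\;\xrightarrow{\ W_2,\, b_2\ }\;
[0, 2]
\]

so the computed `outputBounds` is the single interval \( [0, 2] \) (or a slight dyadic over-approximation of it).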
## Architecture

### Layer Structure

A dense layer computes \( y = \text{ReLU}(Wx + b) \).
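As a rough sketch (a hypothetical simplification, assuming the `weights`/`bias` field names from the Quick Example; the actual `Layer` in `ML/Network.lean` also carries the `inputDim` and `WellFormed` data used by the soundness theorem below):

```lean
-- Hypothetical simplification of the layer structure, for illustration only
structure Layer where
  weights : List (List ℚ)  -- one row of weights per output neuron
  bias    : List ℚ

-- Real-valued semantics: yᵢ = ReLU(⟨Wᵢ, x⟩ + bᵢ)
-- (dimension checks, handled by `WellFormed` in the real definition, are omitted)
def forwardReal (l : Layer) (x : List ℝ) : List ℝ :=
  (l.weights.zip l.bias).map fun (row, b) =>
    max 0 ((row.zip x).foldl (fun acc (w, xi) => acc + (w : ℝ) * xi) (0 : ℝ) + (b : ℝ))
```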
### Soundness Theorem

The key theorem guarantees that interval propagation is sound:

```lean
theorem mem_forwardInterval {l : Layer} {xs : List ℝ} {Is : IntervalVector}
    (hwf : l.WellFormed)
    (hdim : l.inputDim = Is.length)
    (hxlen : xs.length = Is.length)
    (hmem : ∀ i, xs[i] ∈ Is[i]) :
    ∀ i, (forwardReal l xs)[i] ∈ (forwardInterval l Is)[i]
```
This says: if every real input is contained in its corresponding interval, then every real output is contained in its corresponding output interval.
## Activation Functions

### ReLU

ReLU interval propagation uses the simple rule:

\[ \text{ReLU}([l, u]) = [\max(0, l), \max(0, u)] \]

```lean
def relu (I : IntervalDyadic) : IntervalDyadic where
  lo := Dyadic.max 0 I.lo
  hi := Dyadic.max 0 I.hi

theorem mem_relu {x : ℝ} {I : IntervalDyadic} (hx : x ∈ I) :
    max 0 x ∈ relu I
```
### Sigmoid

Sigmoid uses the conservative bound \( \sigma(x) \in (0, 1) \):

```lean
def sigmoid (_I : IntervalDyadic) : IntervalDyadic where
  lo := 0
  hi := 1

theorem mem_sigmoid {x : ℝ} {I : IntervalDyadic} (_hx : x ∈ I) :
    Real.sigmoid x ∈ sigmoid I
```
## DeepPoly Relaxations

For tighter bounds, the module implements DeepPoly-style linear relaxations.

### ReLU Triangle Relaxation

For the "crossing case" where \( l < 0 < u \), ReLU is bounded by:

- Lower: \( y \geq 0 \)
- Upper: the line through \( (l, 0) \) and \( (u, u) \)
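Writing out the upper edge explicitly, the chord through \( (l, 0) \) and \( (u, u) \) gives

\[ \max(0, x) \;\leq\; \frac{u}{u - l}\,(x - l) \qquad \text{for } x \in [l, u]. \]

The formal statement in LeanCert: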
```lean
theorem upper_bound_valid (l u : ℚ) (x : ℝ)
    (hl : l < 0) (hu : 0 < u) (hx_mem : l ≤ x ∧ x ≤ u) :
    max 0 x ≤ (crossingUpperBound l u).eval x
```
### Sigmoid Monotonicity Bounds

Since sigmoid is strictly monotone:

\[ \sigma(l) \leq \sigma(x) \leq \sigma(u) \quad \text{for } x \in [l, u] \]

```lean
theorem sigmoid_relaxation_sound (l u x : ℝ) (h : l ≤ x ∧ x ≤ u) :
    sigmoid l ≤ sigmoid x ∧ sigmoid x ≤ sigmoid u
```
## GELU Activation

GELU (Gaussian Error Linear Unit) is the standard activation for Transformers, here in its tanh approximation:

\[ \text{GELU}(x) = 0.5 \cdot x \cdot (1 + \tanh(\sqrt{2/\pi} \cdot (x + 0.044715 \cdot x^3))) \]

```lean
def gelu (I : IntervalDyadic) : IntervalDyadic :=
  -- Verified interval approximation of 0.5 · x · (1 + tanh(√(2/π) · (x + 0.044715 · x³)))
  let cubed := (I.mul I).mul I
  let inner := I.add (cubed.scale 0.044715)
  let scaled := inner.scale (Real.sqrt (2 / Real.pi))
  (I.scale 0.5).mul (IntervalDyadic.one.add (tanh scaled))

theorem mem_geluInterval {x : ℝ} {I : IntervalDyadic} (hx : x ∈ I) :
    Real.gelu x ∈ gelu I
```
## Transformer Components

### Self-Attention

LeanCert verifies the scaled dot-product attention mechanism:

\[ \text{Attention}(Q, K, V) = \text{softmax}\left(\frac{Q \cdot K^T}{\sqrt{d_k}}\right) \cdot V \]

```lean
import LeanCert.ML.Attention

-- Verified attention output bounds
theorem mem_scaledDotProductAttention
    {Q K V : Matrix n d ℝ} {Q_int K_int V_int : IntervalMatrix n d}
    (hQ : ∀ i j, Q i j ∈ Q_int i j)
    (hK : ∀ i j, K i j ∈ K_int i j)
    (hV : ∀ i j, V i j ∈ V_int i j) :
    ∀ i j, (scaledDotProductAttention Q K V) i j ∈
      (scaledDotProductAttentionInterval Q_int K_int V_int) i j
```
### Layer Normalization

Interval bounds for LayerNorm are computed soundly:

```lean
import LeanCert.ML.LayerNorm

-- LayerNorm: y = (x - μ) / σ * γ + β
theorem mem_layerNormInterval {x : Vector n ℝ} {I : IntervalVector n}
    (hx : ∀ i, x i ∈ I i) :
    ∀ i, (layerNorm x γ β) i ∈ (layerNormInterval I γ_int β_int) i
```
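Here \( \mu \) and \( \sigma \) are the per-vector mean and standard deviation (any numerical-stability \( \epsilon \) term used by the implementation is omitted here):

\[ \mu = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad \sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2 \]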
Note: Standard interval arithmetic may overestimate LayerNorm bounds due to variable correlation (the mean and variance are computed from the same input).
### Affine Arithmetic for Tight LayerNorm Bounds

To address the dependency problem in LayerNorm, LeanCert provides `LeanCert.ML.LayerNormAffine`, which uses affine arithmetic to track linear correlations between variables:

```lean
import LeanCert.ML.LayerNormAffine

-- Tight bounds via affine forms that track variable correlations
def layerNormAffine (x : AffineVector n) (γ β : List ℚ) : AffineVector n

-- Used in TransformerBlock.forwardIntervalTight
theorem layerNormAffine_sound {x_real : Vector n ℝ} {x_affine : AffineVector n}
    (hx : x_real ∈ x_affine) :
    layerNorm x_real γ β ∈ (layerNormAffine x_affine γ β).toIntervalVector
```
Key insight: In LayerNorm, the centering operation x - μ creates correlated outputs (they sum to zero). Standard interval arithmetic loses this correlation, leading to loose bounds. Affine arithmetic preserves it:
| Method | LayerNorm([0.9, 1.1]) Bounds |
|---|---|
| Standard Interval | [-1.5, 1.5] |
| Affine Arithmetic | [-0.1, 0.1] |
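The gap comes from the centering step. As a minimal illustration (two inputs in \([0.9, 1.1]\) written as affine forms over shared noise symbols \( \varepsilon_i \in [-1, 1] \); only the centering \( x - \mu \) is shown, and the division by \( \sigma \) widens the plain-interval result further):

\[ x_1 = 1 + 0.1\,\varepsilon_1, \quad x_2 = 1 + 0.1\,\varepsilon_2, \quad \mu = 1 + 0.05\,\varepsilon_1 + 0.05\,\varepsilon_2 \]

\[ x_1 - \mu = 0.05\,\varepsilon_1 - 0.05\,\varepsilon_2 \in [-0.1, 0.1], \]

whereas plain interval subtraction with \( x_1, \mu \in [0.9, 1.1] \) only gives \( x_1 - \mu \in [-0.2, 0.2] \).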
Use `TransformerBlock.forwardIntervalTight` for the tightest bounds on transformer layers.
## Optimized Implementation

For real-world networks, the `LeanCert.ML.Optimized` module provides:
| Optimization | Speedup | Description |
|---|---|---|
| Structure-of-Arrays | ~5x | Cache-efficient interval storage |
| Split-Sign Decomposition | ~2x | Branch-free interval matrix multiply |
| Common Exponent Alignment | ~10-50x | Pure integer (GMP) arithmetic |
```lean
import LeanCert.ML.Optimized

open LeanCert.ML.Optimized

-- Create a quantized network for fast propagation
-- (`myLayers` and `alignedInput` are placeholders for your own layer list
--  and pre-aligned interval input)
def qnet := QuantizedNet.ofLayers myLayers

-- Fast interval propagation
def output := qnet.forward alignedInput
```
## Verification Status
| Component | Status |
|---|---|
| `mem_forwardInterval` (layer soundness) | ✓ Fully verified |
| `mem_relu` | ✓ Fully verified |
| `mem_sigmoid` | ✓ Fully verified |
| `relu_relaxation_sound` (DeepPoly ReLU) | ✓ Fully verified |
| `sigmoid_relaxation_sound` (DeepPoly Sigmoid) | ✓ Fully verified |
| Optimized implementations | Computationally correct, proofs in progress |
## Use Cases

- Robustness Verification: Prove that small input perturbations don't change the output class (see the sketch after this list)
- Safety Certification: Verify that outputs stay within safe bounds
- Lipschitz Estimation: Bound the sensitivity of the network to input changes
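For the robustness use case, a minimal sketch (reusing `myNet` and `forwardInterval` from the Quick Example, and assuming `IntervalDyadic.ofRat` accepts rational endpoints; the nominal point, radius, and decision threshold are illustrative assumptions):

```lean
-- Perturbation box of radius 1/10 around the nominal input (1/2, 1/2):
-- x₁ ∈ [2/5, 3/5], x₂ ∈ [2/5, 3/5]
def perturbedInput : IntervalVector := [
  IntervalDyadic.ofRat (2/5) (3/5),
  IntervalDyadic.ofRat (2/5) (3/5)
]

-- Sound bounds on the network output over the entire box
def perturbedOutput := myNet.forwardInterval perturbedInput
```

By the soundness theorems above, every concrete input inside `perturbedInput` has its true output inside `perturbedOutput`, so checking that the output interval stays on one side of the decision threshold certifies the property for the whole perturbation set.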
## Files
| File | Description |
|---|---|
| `ML/Network.lean` | Layer and network definitions |
| `ML/IntervalVector.lean` | Activation functions (ReLU, sigmoid) |
| `ML/Symbolic/ReLU.lean` | DeepPoly ReLU relaxation |
| `ML/Symbolic/Sigmoid.lean` | DeepPoly sigmoid relaxation |
| `ML/Optimized.lean` | High-performance implementations |
| `ML/LayerNormAffine.lean` | Affine arithmetic for tight LayerNorm bounds |
| `ML/Transformer.lean` | Full transformer block definitions |
| `ML/Attention.lean` | Scaled dot-product attention verification |
| `ML/Softmax.lean` | Softmax interval propagation |