Neural Network & Transformer Verification

LeanCert includes a formally verified neural network verifier based on interval propagation and DeepPoly relaxations, with support for modern architectures including Transformers.

Overview

The ML module provides:

  • Interval Propagation: Sound overapproximation of neural network outputs
  • DeepPoly Relaxations: Tight linear bounds for ReLU and sigmoid activations
  • Transformer Support: Multi-Head Attention, LayerNorm, GELU, Residual connections
  • Verified Soundness: All propagation theorems are formally proved in Lean

Supported Architectures

Architecture | Components | Status
Feedforward (MLP) | Linear, ReLU, Sigmoid | Fully verified
Transformers | Attention, LayerNorm, GELU | Fully verified
Residual Networks | Skip connections | Fully verified

Quick Example

import LeanCert.ML.Network

open LeanCert.ML

-- Define a simple 2-layer network
def myNet : TwoLayerNet := {
  layer1 := { weights := [[1, -1], [0, 1]], bias := [0, 0] }
  layer2 := { weights := [[1, 1]], bias := [0] }
}

-- Input interval: x₁ ∈ [-1, 1], x₂ ∈ [0, 1]
def inputBox : IntervalVector := [
  IntervalDyadic.ofRat (-1) 1,
  IntervalDyadic.ofRat 0 1
]

-- Propagate intervals through the network
def outputBounds := myNet.forwardInterval inputBox
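
Worked out by hand (assuming each layer applies \( \text{ReLU}(Wx + b) \), as in the Layer Structure section below), the propagation proceeds as follows:

\[
\begin{aligned}
W_1 x + b_1 &: \quad x_1 - x_2 \in [-2, 1], \qquad x_2 \in [0, 1] \\
\text{after ReLU} &: \quad h_1 \in [0, 1], \qquad\;\; h_2 \in [0, 1] \\
W_2 h + b_2 &: \quad h_1 + h_2 \in [0, 2]
\end{aligned}
\]

so outputBounds is the single interval \([0, 2]\) (the final ReLU leaves it unchanged).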

Architecture

Layer Structure

A dense layer computes \( y = \text{ReLU}(Wx + b) \):

structure Layer where
  weights : List (List ℚ)  -- Weight matrix (rows)
  bias : List ℚ            -- Bias vector

Soundness Theorem

The key theorem guarantees that interval propagation is sound:

theorem mem_forwardInterval {l : Layer} {xs : List ℝ} {Is : IntervalVector}
    (hwf : l.WellFormed)
    (hdim : l.inputDim = Is.length)
    (hxlen : xs.length = Is.length)
    (hmem : ∀ i, xs[i] ∈ Is[i]) :
    ∀ i, (forwardReal l xs)[i] ∈ (forwardInterval l Is)[i]

This says: if every real input is contained in its corresponding interval, then every real output is contained in its corresponding output interval.
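
Concretely, one standard way to realize this for a layer with weights \( w_{ij} \), bias \( b_i \), and input intervals \( [l_j, u_j] \) is the following rule (the exact dyadic rounding used by LeanCert may differ):

\[ (\text{forwardInterval}\ l\ \mathit{Is})_i \;=\; \text{ReLU}\!\left(\Big[\, b_i + \sum_j \min(w_{ij} l_j,\, w_{ij} u_j),\;\; b_i + \sum_j \max(w_{ij} l_j,\, w_{ij} u_j) \,\Big]\right) \]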

Activation Functions

ReLU

ReLU interval propagation uses the simple rule:

\[ \text{ReLU}([l, u]) = [\max(0, l), \max(0, u)] \]

def relu (I : IntervalDyadic) : IntervalDyadic where
  lo := Dyadic.max 0 I.lo
  hi := Dyadic.max 0 I.hi

theorem mem_relu {x : ℝ} {I : IntervalDyadic} (hx : x ∈ I) :
    max 0 x ∈ relu I
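
For example, the three possible cases all follow directly from this rule:

\[ \text{ReLU}([-3, -1]) = [0, 0], \qquad \text{ReLU}([-2, 1]) = [0, 1], \qquad \text{ReLU}([1, 3]) = [1, 3]. \]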

Sigmoid

Sigmoid uses the conservative bound \( \sigma(x) \in (0, 1) \):

def sigmoid (_I : IntervalDyadic) : IntervalDyadic where
  lo := 0
  hi := 1

theorem mem_sigmoid {x : ℝ} {I : IntervalDyadic} (_hx : x ∈ I) :
    Real.sigmoid x ∈ sigmoid I
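
For example, on \( I = [-1, 1] \) this rule simply returns \( [0, 1] \), even though the true range is \( \sigma([-1, 1]) \approx [0.269, 0.731] \); the DeepPoly relaxation below recovers the tighter bound.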

DeepPoly Relaxations

For tighter bounds, the module implements DeepPoly-style linear relaxations.

ReLU Triangle Relaxation

For the "crossing case" where \( l < 0 < u \), ReLU is bounded by:

  • Lower: \( y \geq 0 \)
  • Upper: The line through \( (l, 0) \) and \( (u, u) \)

theorem upper_bound_valid (l u : ℝ) (x : ℝ)
    (hl : l < 0) (hu : 0 < u) (hx_mem : l ≤ x ∧ x ≤ u) :
    max 0 x ≤ (crossingUpperBound l u).eval x
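
For example, with \( l = -1 \) and \( u = 1 \), crossingUpperBound is the line through \( (-1, 0) \) and \( (1, 1) \):

\[ \max(0, x) \;\le\; \frac{u}{u - l}(x - l) \;=\; \frac{x + 1}{2} \quad \text{for } x \in [-1, 1]. \]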

Sigmoid Monotonicity Bounds

Since sigmoid is strictly monotone:

\[ \sigma(l) \leq \sigma(x) \leq \sigma(u) \quad \text{for } x \in [l, u] \]

theorem sigmoid_relaxation_sound (l u x : ℝ) (h : l ≤ x ∧ x ≤ u) :
    sigmoid l ≤ sigmoid x ∧ sigmoid x ≤ sigmoid u
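
On the interval \( [-1, 1] \) from the sigmoid example above, this gives the much tighter enclosure

\[ \sigma(x) \in [\sigma(-1), \sigma(1)] \approx [0.269, 0.731] \quad \text{for } x \in [-1, 1]. \]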

GELU Activation

GELU (Gaussian Error Linear Unit) is the standard activation for Transformers:

\[ \text{GELU}(x) = 0.5 \cdot x \cdot (1 + \tanh(\sqrt{2/\pi} \cdot (x + 0.044715 \cdot x^3))) \]

def gelu (I : IntervalDyadic) : IntervalDyadic :=
  -- Verified interval approximation of the tanh form of GELU
  let cube := I.mul (I.mul I I)                        -- x³
  let inner := I.add (cube.scale 0.044715)             -- x + 0.044715·x³
  let scaled := inner.scale (Real.sqrt (2 / Real.pi))  -- √(2/π)·(x + 0.044715·x³)
  I.scale 0.5 |>.mul (IntervalDyadic.one.add (tanh scaled))

theorem mem_geluInterval {x : ℝ} {I : IntervalDyadic} (hx : x ∈ I) :
    Real.gelu x ∈ gelu I
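
A few reference values of the tanh approximation, useful for sanity-checking interval bounds:

\[ \text{GELU}(-1) \approx -0.159, \qquad \text{GELU}(0) = 0, \qquad \text{GELU}(1) \approx 0.841. \]

Note that GELU is not monotone (it dips to a minimum of roughly \(-0.17\) near \( x \approx -0.75 \)), so unlike sigmoid its interval bounds cannot be obtained by evaluating the endpoints alone; propagating intervals through the sub-expressions, as above, remains sound.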

Transformer Components

Self-Attention

LeanCert verifies the scaled dot-product attention mechanism:

\[ \text{Attention}(Q, K, V) = \text{softmax}\left(\frac{Q \cdot K^T}{\sqrt{d_k}}\right) \cdot V \]

import LeanCert.ML.Attention

-- Verified attention output bounds
theorem mem_scaledDotProductAttention
    {Q K V : Matrix n d ℝ} {Q_int K_int V_int : IntervalMatrix n d}
    (hQ : ∀ i j, Q i j ∈ Q_int i j)
    (hK : ∀ i j, K i j ∈ K_int i j)
    (hV : ∀ i j, V i j ∈ V_int i j) :
    ∀ i j, (scaledDotProductAttention Q K V) i j ∈
           (scaledDotProductAttentionInterval Q_int K_int V_int) i j
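
Because each softmax row consists of convex weights (nonnegative and summing to 1), a coarse but sound envelope on the output, independent of the particular relaxation used, is

\[ \min_k \underline{V}_{kj} \;\le\; \text{Attention}(Q, K, V)_{ij} \;\le\; \max_k \overline{V}_{kj}, \]

where \( [\underline{V}_{kj}, \overline{V}_{kj}] \) is the interval V_int k j.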

Layer Normalization

Interval bounds for LayerNorm are computed soundly:

import LeanCert.ML.LayerNorm

-- LayerNorm: y = (x - μ) / σ * γ + β
theorem mem_layerNormInterval {x : Vector n ℝ} {I : IntervalVector n}
    (hx : ∀ i, x i ∈ I i) :
    ∀ i, (layerNorm x γ β) i ∈ (layerNormInterval I γ_int β_int) i

Note: Standard interval arithmetic may overestimate LayerNorm bounds due to variable correlation (the mean and variance are computed from the same input).
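
A two-input illustration of this dependency problem: take \( x_1, x_2 \in [0.9, 1.1] \), so \( \mu = \tfrac{1}{2}(x_1 + x_2) \in [0.9, 1.1] \). The centered value \( x_1 - \mu = \tfrac{1}{2}(x_1 - x_2) \) actually lies in \( [-0.1, 0.1] \), but interval subtraction only sees

\[ [0.9, 1.1] - [0.9, 1.1] = [-0.2, 0.2], \]

and the subsequent scaling by \( 1/\sigma \) can widen the gap further, as the comparison table below shows.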

Affine Arithmetic for Tight LayerNorm Bounds

To address the dependency problem in LayerNorm, LeanCert provides LeanCert.ML.LayerNormAffine which uses affine arithmetic to track linear correlations between variables:

import LeanCert.ML.LayerNormAffine

-- Tight bounds via affine forms that track variable correlations
def layerNormAffine (x : AffineVector n) (γ β : List ℚ) : AffineVector n

-- Used in TransformerBlock.forwardIntervalTight
theorem layerNormAffine_sound {x_real : Vector n ℝ} {x_affine : AffineVector n}
    (hx : x_real ∈ x_affine) :
    layerNorm x_real γ β ∈ (layerNormAffine x_affine γ β).toIntervalVector

Key insight: In LayerNorm, the centering operation x - μ creates correlated outputs (they sum to zero). Standard interval arithmetic loses this correlation, leading to loose bounds. Affine arithmetic preserves it:

Method | LayerNorm([0.9, 1.1]) Bounds
Standard Interval | [-1.5, 1.5]
Affine Arithmetic | [-0.1, 0.1]

Use TransformerBlock.forwardIntervalTight for the tightest bounds on transformer layers.
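
The cancellation behind the table above, sketched on the same two-input example with noise symbols \( \varepsilon_1, \varepsilon_2 \in [-1, 1] \):

\[ x_1 = 1 + 0.1\,\varepsilon_1, \qquad x_2 = 1 + 0.1\,\varepsilon_2, \qquad \mu = 1 + 0.05\,\varepsilon_1 + 0.05\,\varepsilon_2, \]

\[ x_1 - \mu = 0.05\,\varepsilon_1 - 0.05\,\varepsilon_2 \in [-0.1, 0.1]. \]

The shared noise symbols cancel exactly, whereas plain intervals forget that \( x_1 \) and \( \mu \) move together.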

Optimized Implementation

For real-world networks, the LeanCert.ML.Optimized module provides:

Optimization | Speedup | Description
Structure-of-Arrays | ~5x | Cache-efficient interval storage
Split-Sign Decomposition | ~2x | Branch-free interval matrix multiply
Common Exponent Alignment | ~10-50x | Pure integer (GMP) arithmetic

import LeanCert.ML.Optimized

open LeanCert.ML.Optimized

-- Create quantized network for fast propagation
def qnet := QuantizedNet.ofLayers myLayers

-- Fast interval propagation
def output := qnet.forward alignedInput
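
As an illustration of the Split-Sign Decomposition row above (the exact code path in LeanCert.ML.Optimized may differ), write \( W = W^+ - W^- \) with \( W^+, W^- \ge 0 \); then for \( x \in [l, u] \) componentwise, the bounds follow without any per-entry sign branching:

\[ W^+ l - W^- u \;\le\; Wx \;\le\; W^+ u - W^- l. \]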

Verification Status

Component | Status
mem_forwardInterval (layer soundness) | ✓ Fully verified
mem_relu | ✓ Fully verified
mem_sigmoid | ✓ Fully verified
relu_relaxation_sound (DeepPoly ReLU) | ✓ Fully verified
sigmoid_relaxation_sound (DeepPoly Sigmoid) | ✓ Fully verified
Optimized implementations | Computationally correct, proofs in progress

Use Cases

  • Robustness Verification: Prove that small input perturbations don't change the output class (a sufficient check is sketched after this list)
  • Safety Certification: Verify that outputs stay within safe bounds
  • Lipschitz Estimation: Bound the sensitivity of the network to input changes
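
For robustness verification, a standard sufficient condition is to propagate the \( \varepsilon \)-box around an input \( x_0 \) through forwardInterval and check that the target class \( c \) dominates every other output interval:

\[ \underline{y}_c \;>\; \max_{j \ne c} \overline{y}_j, \]

where \( [\underline{y}_j, \overline{y}_j] \) is the computed interval for output \( j \). If this holds, no perturbation within the box can change the predicted class.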

Files

File | Description
ML/Network.lean | Layer and network definitions
ML/IntervalVector.lean | Activation functions (ReLU, sigmoid)
ML/Symbolic/ReLU.lean | DeepPoly ReLU relaxation
ML/Symbolic/Sigmoid.lean | DeepPoly sigmoid relaxation
ML/Optimized.lean | High-performance implementations
ML/LayerNormAffine.lean | Affine arithmetic for tight LayerNorm bounds
ML/Transformer.lean | Full transformer block definitions
ML/Attention.lean | Scaled dot-product attention verification
ML/Softmax.lean | Softmax interval propagation