Preprocessing (A prompt to AI Code Generation)

HexagonDiff is a C++ tool for differential verification of deep neural networks (DNNs) on Hexagon DSPs. It compares the outputs of two DNN implementations to identify discrepancies and ensure correctness.

Basics

Use the onnx library to parse the ONNX models of the two DNN implementations. Extract the computational graph, the input and output tensors, and the parameters of each layer.

Write your own parser to read the VNNLIB specification file. Assume that the input specification has the form $lb \leq x \leq ub$, where $lb$ and $ub$ are the lower and upper bounds of the input, respectively.
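As an illustration of the box-constraint assumption above, here is a minimal parser sketch. It assumes that inputs are declared as X_0, X_1, ... and that every input bound is written as a single `(assert (<= X_i c))` or `(assert (>= X_i c))` line, which is a common VNNLIB layout but not the only one. The function name `parse_input_box` is hypothetical, and output constraints (on Y_i) are deliberately ignored.

```python
import re

# Number literal such as 1, -0.5, .25 or 1e-3.
_NUM = r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?"
# Simple box constraints such as "(assert (<= X_0 1.0))".
_BOUND = re.compile(
    r"\(\s*assert\s*\(\s*(<=|>=)\s+X_(\d+)\s+(" + _NUM + r")\s*\)\s*\)"
)
# Input declarations such as "(declare-const X_0 Real)".
_DECL = re.compile(r"\(\s*declare-const\s+X_(\d+)\s+Real\s*\)")


def parse_input_box(text):
    """Return (lb, ub) lists for the inputs X_0 .. X_{n-1} (hypothetical helper)."""
    # Drop comments, which start with ';' and run to the end of the line.
    text = "\n".join(line.split(";", 1)[0] for line in text.splitlines())
    declared = [int(i) for i in _DECL.findall(text)]
    n = max(declared) + 1 if declared else 0
    lb = [float("-inf")] * n
    ub = [float("inf")] * n
    for op, idx, value in _BOUND.findall(text):
        i, v = int(idx), float(value)
        if op == "<=":
            ub[i] = min(ub[i], v)  # x_i <= v tightens the upper bound
        else:
            lb[i] = max(lb[i], v)  # x_i >= v tightens the lower bound
    return lb, ub


SPEC = """
; two inputs, each constrained to a box
(declare-const X_0 Real)
(declare-const X_1 Real)
(declare-const Y_0 Real)
(assert (>= X_0 -1.0))
(assert (<= X_0 1.0))
(assert (>= X_1 0.0))
(assert (<= X_1 0.5))
(assert (<= Y_0 0.0))
"""
lb, ub = parse_input_box(SPEC)
# lb == [-1.0, 0.0], ub == [1.0, 0.5]
```

Specifications that use disjunctions or constraints relating several variables would need a real S-expression parser rather than regular expressions.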

Write most of your operations in torch. For operations that do not exist in torch, write your own implementation in triton for GPU acceleration.

Differential Network

We assume that the two DNN implementations differ only in certain layers, such as affine layers and token pruning layers, which means we can construct a differential network. The differential network has the same structure as the original network, except for the following differences:

All non-linear operators (e.g., ReLU, MaxPool) remain the same as in the two original DNNs.
If the two networks have different weights and biases in a certain affine layer, the differential network keeps the weights and biases from both DNNs, $(W_{x}, W_{y}, b_{x}, b_{y})$, as the affine operator.
We assign a type to the tensor on each edge of the network. During verification, tensors of different types are bounded differently. At the current stage, we only consider the following types of tensors:
1. $Eq (x, y)$ represents that the two DNNs have exactly the same value at this point. Later, we will bound $x$ and $y$ using the same bounds: $lb \leq x, y \leq ub$.
2. $Diff (x, y)$ represents that the two DNNs have different values but the same shape at this point. Later, we will bound $x$ and $y$ separately, $lb_{x} \leq x \leq ub_{x}$ and $lb_{y} \leq y \leq ub_{y}$, together with a differential bound $lb_{d} \leq y - \beta x \leq ub_{d}$, where $\beta = \frac{ub_{y} - lb_{y}}{ub_{x} - lb_{x}}$.
3. $Truncate (x, y)$ represents that one vector (tensor) of the differential network is truncated from the other. Later, we will bound the common part of $x$ and $y$ using the same bounds as $Diff$, and bound the truncated part of $y$ using a single bound on $y$: $lb_{y} \leq y \leq ub_{y}$.
4. $TruncateMerge (x, y)$ represents that one vector (tensor) of the differential network is truncated from the other, and the truncated part is merged into the last vector (tensor). In addition to the $Truncate$ bounds, we also need $Diff$ bounds between every value (tensor) in the truncated part and the merged value (tensor). For example, if vector $y$ is truncated from vector $x$, we bound $x$ and $y$ using $Truncate$ bounds, and bound the difference between the last element of $y$ and every element in the truncated part of $x$ using $Diff$ bounds.
For token pruning in transformers, we use special operators such as $TopK$ and $EViT$ to represent the token pruning operations; these operators generate the $Truncate$ and $TruncateMerge$ types.
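To make the $Diff$ type concrete, the sketch below propagates a shared ($Eq$) input box through one affine operator $(W_{x}, W_{y}, b_{x}, b_{y})$ and computes the three bounds listed for $Diff$. It uses plain interval arithmetic, which is only one possible bounding method and is an assumption here, as are the helper names `affine_bounds` and `diff_bounds` and the fallback $\beta = 1$ when $x$ has zero width. The differential bound exploits the fact that $y - \beta x = (W_{y} - \beta W_{x}) z + (b_{y} - \beta b_{x})$ is itself affine in the shared input $z$.

```python
def affine_bounds(W, b, lb, ub):
    """Interval bounds of W z + b for lb <= z <= ub (elementwise)."""
    out_lb, out_ub = [], []
    for row, bias in zip(W, b):
        lo = hi = bias
        for w, l, u in zip(row, lb, ub):
            # A non-negative weight maps l to the lower end; a negative one flips it.
            lo += w * l if w >= 0 else w * u
            hi += w * u if w >= 0 else w * l
        out_lb.append(lo)
        out_ub.append(hi)
    return out_lb, out_ub


def diff_bounds(Wx, bx, Wy, by, lb, ub):
    """Per-neuron (beta, lb_d, ub_d) with lb_d <= y - beta * x <= ub_d."""
    lbx, ubx = affine_bounds(Wx, bx, lb, ub)
    lby, uby = affine_bounds(Wy, by, lb, ub)
    result = []
    for i in range(len(bx)):
        width_x = ubx[i] - lbx[i]
        # beta as defined for Diff; the fallback for a constant x is an assumption.
        beta = (uby[i] - lby[i]) / width_x if width_x > 0 else 1.0
        # y_i - beta * x_i is affine in the shared input, so bound it directly.
        Wd = [[wy - beta * wx for wx, wy in zip(Wx[i], Wy[i])]]
        bd = [by[i] - beta * bx[i]]
        lbd, ubd = affine_bounds(Wd, bd, lb, ub)
        result.append((beta, lbd[0], ubd[0]))
    return result


# One shared input z in [-1, 1]; x = 2 z and y = 2.5 z + 0.25.
bounds = diff_bounds([[2.0]], [0.0], [[2.5]], [0.25], [-1.0], [1.0])
# bounds == [(1.25, 0.25, 0.25)]
```

In this example $x \in [-2, 2]$ and $y \in [-2.25, 2.75]$, so $\beta = 1.25$ and $y - \beta x$ is pinned to exactly $0.25$, which is much tighter than subtracting the two separate intervals.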

Conversion to the Differential Network

The following changes have to be made to convert the two original DNNs into the differential network:

The division in LayerNorm needs to be removed, since it is difficult to verify.
Integer tensor operations need to be fused into operators such as $TopK$ and $EViT$, which generate the $Truncate$ and $TruncateMerge$ types.
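One possible reading of the LayerNorm change is sketched below: the mean is still subtracted, but the division by the standard deviation is dropped, which leaves an operator that is linear in its input. This interpretation is an assumption, as are the function name `layer_norm_no_division` and the parameter names `scale` and `shift` (the usual LayerNorm weight and bias).

```python
def layer_norm_no_division(x, scale, shift):
    """Centered LayerNorm without the division by the standard deviation.

    Assumed form: scale_i * (x_i - mean(x)) + shift_i, which is affine in x.
    """
    mean = sum(x) / len(x)
    return [g * (v - mean) + s for v, g, s in zip(x, scale, shift)]


out = layer_norm_no_division([1.0, 2.0, 3.0], [2.0, 2.0, 2.0], [0.5, 0.5, 0.5])
# out == [-1.5, 0.5, 2.5]
```

Because the result is affine in the input, it can be bounded with the same machinery as any other affine layer.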

After making these changes, we can construct the differential network by merging the two DNNs together. The two DNNs will share the same non-linear operators, and have different weights and biases for affine layers. The types in the differential network are then deduced based on the structure of the network and the differences between the two DNNs.
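The type deduction described above can be sketched as a single forward pass over the merged network. The rules below are assumptions made for illustration: the shared input starts as $Eq$, an affine layer whose parameters differ turns $Eq$ into $Diff$, $TopK$ and $EViT$ always produce $Truncate$ and $TruncateMerge$, and every other operator passes the incoming type through. The names `EdgeType`, `deduce_type`, and `deduce_chain` are hypothetical, and the sketch handles a simple chain of layers rather than a general graph.

```python
from enum import Enum


class EdgeType(Enum):
    EQ = "Eq"
    DIFF = "Diff"
    TRUNCATE = "Truncate"
    TRUNCATE_MERGE = "TruncateMerge"


def deduce_type(op, input_type, params_differ=False):
    """Type of the edge produced by `op`, given the type of its input edge."""
    if op == "TopK":
        return EdgeType.TRUNCATE
    if op == "EViT":
        return EdgeType.TRUNCATE_MERGE
    if op == "Affine" and params_differ and input_type is EdgeType.EQ:
        return EdgeType.DIFF
    # Shared operators and identical affine layers keep the incoming type.
    return input_type


def deduce_chain(layers):
    """Deduce edge types along a chain of (op_name, params_differ) pairs."""
    current = EdgeType.EQ  # both networks read the same input
    types = []
    for op, differ in layers:
        current = deduce_type(op, current, differ)
        types.append(current)
    return types


types = deduce_chain([
    ("Affine", False),  # identical weights: still Eq
    ("ReLU", False),
    ("Affine", True),   # weights differ: becomes Diff
    ("ReLU", False),
    ("TopK", False),    # token pruning: becomes Truncate
])
# [t.value for t in types] == ["Eq", "Eq", "Diff", "Diff", "Truncate"]
```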
