Token Pruning
This section discusses how to linearize token pruning methods, which are commonly used in transformer models to reduce the number of tokens processed and thereby speed up inference. We cover four popular methods: Top-K, EViT, POMT, and ToMe.
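To make the basic idea concrete before looking at the individual methods, here is a minimal sketch of Top-K style pruning in NumPy. It assumes each token already has a scalar importance score; how that score is computed is not specified here and varies by method. The function name top_k_prune and the toy inputs are hypothetical and chosen only for illustration.

```python
import numpy as np


def top_k_prune(tokens, scores, k):
    """Keep the k highest-scoring tokens, preserving their original order.

    tokens: array of shape (num_tokens, dim)
    scores: array of shape (num_tokens,), assumed to be given (hypothetical input)
    """
    num_tokens = tokens.shape[0]
    if not 0 < k <= num_tokens:
        raise ValueError("k must be between 1 and the number of tokens")
    # Indices of the k largest scores (returned in arbitrary order).
    keep = np.argpartition(scores, -k)[-k:]
    # Sort the indices so the surviving tokens stay in sequence order.
    keep.sort()
    return tokens[keep], keep


# Toy example: 6 tokens with 2 features each.
tokens = np.arange(12, dtype=np.float32).reshape(6, 2)
scores = np.array([0.1, 0.9, 0.3, 0.8, 0.05, 0.5])
pruned, kept = top_k_prune(tokens, scores, k=3)
print(kept)          # indices of the tokens that survive
print(pruned.shape)  # (3, 2): half of the tokens were dropped
```

Subsequent layers then operate on 3 tokens instead of 6, which is where the inference savings come from.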