Papers
arxiv:2602.03918

Entropy Reveals Block Importance in Masked Self-Supervised Vision Transformers

Published on Feb 3
Authors:
,

Abstract

Block importance in masked self-supervised vision transformers can be estimated using information entropy of pretrained weights, enabling efficient pruning without data access.

Masked self-supervised vision transformers have become a dominant pretraining paradigm, yet their substantial model size poses significant challenges for resource-constrained deployment and efficient transfer learning. A fundamental question remains: are all transformer blocks equally important for downstream performance? In this paper, we show that block importance in masked self-supervised vision transformers can be accurately estimated without access to any data. Our key finding is that the information entropy of pretrained block weights strongly correlates with oracle sensitivity obtained via iterative block removal and finetuning. This observation enables Gardener, a data-free, one-shot, block-level pruning principle that identifies redundant blocks through simple information-theoretic measurements. We evaluate Gardener on VideoMAE-B across multiple pruning ratios and downstream video recognition benchmarks. Despite its negligible computational overhead, Gardener consistently matches or outperforms existing data-free pruning baselines and closely approaches sensitivity-based pruning. Remarkably, even after pruning up to 91.7\% of blocks, the pruned model retains competitive transfer performance. Our results reveal substantial block-level redundancy in masked self-supervised vision transformers and demonstrate that information-theoretic analysis offers a principled and efficient pathway for model compression and resource-efficient transfer learning.

Community

Sign up or log in to comment

Get this paper in your agent:

hf papers read 2602.03918
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2602.03918 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2602.03918 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2602.03918 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.