How can I tell the size of a model before downloading it?

I’m using the ‘hf’ CLI tool to download models from HF.

Take Qwen/Qwen3.5-397B-A17B for example.

How can I tell BEFORE downloading how large it is?

Maybe dry-run mode on the hf CLI is a reliable method. The dry-run output lists the files that would be downloaded and the total bytes they would require.

hf download Qwen/Qwen3.5-397B-A17B --dry-run

‘hf’ doesn’t have anything close to that argument. Bad bot.

Maybe it’s due to an older version of the huggingface_hub library. Try pip install -U huggingface_hub first…

hf download Qwen/Qwen3.5-397B-A17B --dry-run
[dry-run] Fetching 107 files: 100%|██████████████████████████████████████████████████| 107/107 [00:02<00:00, 35.91it/s]
Download complete: : 0.00B [00:02, ?B/s]              [dry-run] Will download 107 files (out of 107) totalling 806.8G.]
File                                         Bytes to download
-------------------------------------------- -----------------
.gitattributes                               1.6K
LICENSE                                      11.5K
README.md                                    94.8K
chat_template.jinja                          7.8K
config.json                                  4.2K
generation_config.json                       244.0
merges.txt                                   3.4M
model.safetensors-00001-of-00094.safetensors 8.6G
...

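Since the question is about disk space, a total like the one in the dry-run output can be checked against the free space on the target drive before starting. This is only a minimal sketch: it assumes the size suffixes are decimal (G = 10^9 bytes), which I have not confirmed for hf, and `parse_size` and `fits_on_disk` are made-up helper names, not part of huggingface_hub.

```python
import shutil

# Assumption: suffixes are decimal multiples (K = 10**3, M = 10**6, ...).
_UNITS = {"B": 1, "K": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def parse_size(text: str) -> int:
    """Convert a string such as '806.8G' or '244.0' to a byte count."""
    text = text.strip().upper()
    if text and text[-1] in _UNITS:
        return round(float(text[:-1]) * _UNITS[text[-1]])
    return round(float(text))  # no suffix: plain bytes


def fits_on_disk(required_bytes: int, path: str = ".", headroom: float = 0.05) -> bool:
    """True if `path` has room for the download plus a safety margin."""
    free = shutil.disk_usage(path).free
    return required_bytes * (1 + headroom) <= free


needed = parse_size("806.8G")
print(f"Need about {needed / 10**9:.1f} GB; fits here: {fits_on_disk(needed)}")
```

Pass the directory you plan to download into as `path`, since free space is measured per filesystem.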
To check the size of a Hugging Face model like Qwen/Qwen3.5-397B-A17B without downloading it, there are two options:

Python Script

Run this to get total size and per-file breakdown without downloading:

python

from huggingface_hub import HfApi

def print_model_sizes(repo_id):
    api = HfApi()
    # files_metadata=True asks the Hub to include per-file sizes
    repo_info = api.model_info(repo_id=repo_id, files_metadata=True)
    total_size_bytes = 0
    print(f"File sizes for model '{repo_id}':")
    for sibling in repo_info.siblings or []:
        filename = sibling.rfilename
        size_bytes = sibling.size or 0
        total_size_bytes += size_bytes
        size_gb = size_bytes / (1024 ** 3)
        print(f" {filename}: {size_gb:.2f} GB")
    total_gb = total_size_bytes / (1024 ** 3)
    print(f"\nTotal size: {total_gb:.2f} GB")

print_model_sizes("Qwen/Qwen3.5-397B-A17B")

This queries metadata only; no model files are downloaded.
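One thing to watch when comparing numbers from different tools: the script above divides by 1024 ** 3, so the figure it labels "GB" is strictly GiB, and the same byte count looks smaller than it does in decimal GB. A quick sketch of the difference (the byte count here is an illustrative value, not a measured one):

```python
def to_gb(num_bytes: int) -> float:
    """Decimal gigabytes (10**9 bytes), as drive vendors count."""
    return num_bytes / 10**9


def to_gib(num_bytes: int) -> float:
    """Binary gibibytes (2**30 bytes), which is what 1024 ** 3 gives."""
    return num_bytes / 1024**3


example_bytes = 800 * 10**9  # illustrative value only
print(f"{to_gb(example_bytes):.1f} GB == {to_gib(example_bytes):.1f} GiB")
```

The gap is about 7% at this scale, which is enough to matter when a drive is nearly full.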

HF CLI Dry-Run

bash

hf space status Qwen/Qwen3.5-397B-A17B # or use model_info
# dry-run download:
hf download Qwen/Qwen3.5-397B-A17B --dry-run

Use Hugging Face’s generic search= parameter. It behaves like the site search: partial/fuzzy-ish matching across model names and metadata.

#!/usr/bin/env python3

from __future__ import annotations

import argparse
from huggingface_hub import HfApi
from huggingface_hub.utils import HfHubHTTPError

def human_size(num_bytes: int | None) -> str:
    # Treat None and 0 the same way
    if not num_bytes:
        return "0 B"

    units = ["B", "KB", "MB", "GB", "TB"]
    size = float(num_bytes)

    # Units are binary multiples (1 KB = 1024 B here)
    for unit in units:
        if size < 1024 or unit == units[-1]:
            return f"{size:.2f} {unit}"
        size /= 1024

    return f"{size:.2f} TB"

def get_repo_size(api: HfApi, repo_id: str) -> tuple[int, list[tuple[str, int]]]:
    # files_metadata=True makes the Hub return per-file sizes
    info = api.model_info(repo_id, files_metadata=True)

    files = []
    total = 0

    for sibling in info.siblings or []:
        name = sibling.rfilename
        size = sibling.size or 0
        files.append((name, size))
        total += size

    # Largest files first
    files.sort(key=lambda x: x[1], reverse=True)
    return total, files

def main() -> int:
    parser = argparse.ArgumentParser(
        description="Search Hugging Face models by partial term and show approximate repo sizes."
    )
    parser.add_argument("query", help="Partial search term, e.g. qwen, llama, flux, embedding, whisper")
    parser.add_argument("-n", "--limit", type=int, default=20, help="Number of model results to check")
    parser.add_argument("--show-files", action="store_true", help="Show largest files per repo")
    parser.add_argument("--top-files", type=int, default=5, help="Number of files to show with --show-files")
    parser.add_argument("--sort", default="downloads", help="HF sort field: downloads, likes, lastModified, createdAt")
    parser.add_argument("--ascending", action="store_true", help="Sort HF results ascending")

    args = parser.parse_args()

    api = HfApi()

    models = api.list_models(
        search=args.query,
        limit=args.limit,
        sort=args.sort,
        direction=1 if args.ascending else -1,
        full=True,
    )

    rows = []

    for model in models:
        repo_id = model.modelId

        try:
            total_size, files = get_repo_size(api, repo_id)
        except HfHubHTTPError as e:
            # Keep the row so the failure is visible in the output
            rows.append({
                "repo_id": repo_id,
                "size": None,
                "downloads": getattr(model, "downloads", None),
                "likes": getattr(model, "likes", None),
                "pipeline": getattr(model, "pipeline_tag", None),
                "error": str(e),
                "files": [],
            })
            continue

        rows.append({
            "repo_id": repo_id,
            "size": total_size,
            "downloads": getattr(model, "downloads", None),
            "likes": getattr(model, "likes", None),
            "pipeline": getattr(model, "pipeline_tag", None),
            "error": None,
            "files": files,
        })

    # Largest repos first; failed lookups count as size 0
    rows.sort(key=lambda r: r["size"] or 0, reverse=True)

    print()
    print(f"Search: {args.query}")
    print()

    print(f"{'SIZE':>12}  {'DOWNLOADS':>10}  {'LIKES':>7}  {'TYPE':<24}  MODEL")
    print("-" * 90)

    for row in rows:
        size = human_size(row["size"])
        downloads = row["downloads"] if row["downloads"] is not None else "-"
        likes = row["likes"] if row["likes"] is not None else "-"
        pipeline = row["pipeline"] or "-"

        print(f"{size:>12}  {downloads:>10}  {likes:>7}  {pipeline:<24}  {row['repo_id']}")

        if row["error"]:
            print(f"{'':>12}  error: {row['error']}")

        if args.show_files and row["files"]:
            for filename, file_size in row["files"][:args.top_files]:
                print(f"{'':>12}  {human_size(file_size):>10}  {filename}")
            print()

    return 0

if __name__ == "__main__":
    raise SystemExit(main())

Or, depending on which version of huggingface_hub you have:

#!/usr/bin/env python3

from __future__ import annotations

from huggingface_hub import HfApi
from huggingface_hub.utils import HfHubHTTPError

def human_size(num_bytes: int | None) -> str:
    # Treat None and 0 the same way
    if not num_bytes:
        return "0 B"

    units = ["B", "KB", "MB", "GB", "TB"]
    size = float(num_bytes)

    # Units are binary multiples (1 KB = 1024 B here)
    for unit in units:
        if size < 1024 or unit == units[-1]:
            return f"{size:.2f} {unit}"
        size /= 1024

    return f"{size:.2f} TB"

def prompt_int(label: str, default: int) -> int:
    value = input(f"{label} [{default}]: ").strip()

    if not value:
        return default

    try:
        parsed = int(value)
        if parsed <= 0:
            print(f"Using default: {default}")
            return default
        return parsed
    except ValueError:
        print(f"Invalid number. Using default: {default}")
        return default

def prompt_bool(label: str, default: bool = False) -> bool:
    default_text = "y" if default else "n"
    value = input(f"{label} [y/n, default {default_text}]: ").strip().lower()

    if not value:
        return default

    return value in {"y", "yes", "true", "1"}

def prompt_choice(label: str, choices: list[str], default: str) -> str:
    print(f"{label}:")
    for index, choice in enumerate(choices, start=1):
        marker = " default" if choice == default else ""
        print(f" {index}. {choice}{marker}")

    value = input(f"Choose 1-{len(choices)} [{default}]: ").strip()

    if not value:
        return default

    # Accept either the choice text or its number
    if value in choices:
        return value

    try:
        index = int(value)
        if 1 <= index <= len(choices):
            return choices[index - 1]
    except ValueError:
        pass

    print(f"Invalid choice. Using default: {default}")
    return default

def get_repo_size(api: HfApi, repo_id: str) -> tuple[int, list[tuple[str, int]]]:
    # files_metadata=True makes the Hub return per-file sizes
    info = api.model_info(repo_id, files_metadata=True)

    files: list[tuple[str, int]] = []
    total = 0

    for sibling in info.siblings or []:
        # getattr fallbacks tolerate attribute names that differ between versions
        name = getattr(sibling, "rfilename", None) or getattr(sibling, "filename", None) or "<unknown>"
        size = getattr(sibling, "size", None) or 0

        size = int(size)
        files.append((name, size))
        total += size

    # Largest files first
    files.sort(key=lambda x: x[1], reverse=True)
    return total, files

def get_model_id(model) -> str | None:
    return getattr(model, "modelId", None) or getattr(model, "id", None)

def sort_models_locally(models: list, sort: str, ascending: bool) -> list:
    reverse = not ascending

    if sort == "downloads":
        models.sort(key=lambda m: getattr(m, "downloads", 0) or 0, reverse=reverse)
    elif sort == "likes":
        models.sort(key=lambda m: getattr(m, "likes", 0) or 0, reverse=reverse)
    elif sort == "lastModified":
        models.sort(key=lambda m: str(getattr(m, "lastModified", "") or ""), reverse=reverse)
    elif sort == "createdAt":
        models.sort(key=lambda m: str(getattr(m, "createdAt", "") or ""), reverse=reverse)
    elif sort == "trendingScore":
        models.sort(key=lambda m: getattr(m, "trendingScore", 0) or 0, reverse=reverse)

    return models

def main() -> int:
    print()
    print("Hugging Face model size search")
    print()

    query = input("Search term, partial match is OK: ").strip()

    if not query:
        print("No search term entered. Exiting.")
        return 1

    limit = prompt_int("How many model results should I check?", 20)

    sort = prompt_choice(
        "Sort Hugging Face search results by",
        ["downloads", "likes", "lastModified", "createdAt", "trendingScore"],
        "downloads",
    )

    ascending = prompt_bool("Ascending sort?", False)
    show_files = prompt_bool("Show largest files for each model?", True)

    top_files = 5
    if show_files:
        top_files = prompt_int("How many largest files per model?", 5)

    size_sort = prompt_choice(
        "Sort final output by computed repo size",
        ["desc", "asc", "none"],
        "desc",
    )

    api = HfApi()

    print()
    print(f"Searching Hugging Face for: {query}")
    print()

    # Retry without sort= if this version of list_models rejects the argument
    try:
        models = list(api.list_models(
            search=query,
            limit=limit,
            sort=sort,
            full=True,
        ))
    except TypeError:
        models = list(api.list_models(
            search=query,
            limit=limit,
            full=True,
        ))

    # Sort direction is applied locally instead of through the API
    models = sort_models_locally(models, sort=sort, ascending=ascending)

    if not models:
        print(f"No models found for search term: {query}")
        return 1

    rows = []

    total_models = len(models)

    for index, model in enumerate(models, start=1):
        repo_id = get_model_id(model)

        if not repo_id:
            continue

        print(f"[{index}/{total_models}] Checking {repo_id}...")

        try:
            total_size, files = get_repo_size(api, repo_id)
            error = None
        except HfHubHTTPError as e:
            total_size = None
            files = []
            error = str(e)
        except Exception as e:
            total_size = None
            files = []
            error = f"{type(e).__name__}: {e}"

        rows.append({
            "repo_id": repo_id,
            "size": total_size,
            "downloads": getattr(model, "downloads", None),
            "likes": getattr(model, "likes", None),
            "pipeline": getattr(model, "pipeline_tag", None),
            "error": error,
            "files": files,
        })

    # "none" keeps the order returned by the search
    if size_sort == "asc":
        rows.sort(key=lambda r: r["size"] or 0)
    elif size_sort == "desc":
        rows.sort(key=lambda r: r["size"] or 0, reverse=True)

    print()
    print(f"Search: {query}")
    print()

    print(f"{'SIZE':>12}  {'DOWNLOADS':>10}  {'LIKES':>7}  {'TYPE':<24}  MODEL")
    print("-" * 110)

    for row in rows:
        size = human_size(row["size"])
        downloads = row["downloads"] if row["downloads"] is not None else "-"
        likes = row["likes"] if row["likes"] is not None else "-"
        pipeline = row["pipeline"] or "-"

        print(f"{size:>12}  {downloads:>10}  {likes:>7}  {pipeline:<24}  {row['repo_id']}")

        if row["error"]:
            print(f"{'':>12}  error: {row['error']}")

        if show_files and row["files"]:
            for filename, file_size in row["files"][:top_files]:
                print(f"{'':>12}  {human_size(file_size):>10}  {filename}")
            print()

    return 0

if __name__ == "__main__":
    raise SystemExit(main())

um just a question here.
are you asking how much physical space it will take up on the hard drive?
or how much VRAM it needs to be able to run efficiently?
because those are 2 similar but very different questions.

@John6666 I owe you an apology, I didn’t realize the --help output could be different depending on the 2nd command. I had already run ‘hf --help’ before creating this discussion and saw nothing I could use, then 1 day later you’re mentioning a dry-run argument that I knew I hadn’t seen in -help, so I assumed you were a bot/lying. I should have just tried your command before assuming :sweat_smile:

I was asking about physical disk space.

I know that for VRAM, you need the model’s weight in GB + breathing room to store the KV Cache.
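To put rough numbers on "weights plus breathing room": the KV cache grows with the number of layers, KV heads, head size, and context length. Here is a back-of-envelope sketch, assuming a standard transformer with a plain (uncompressed) KV cache. The model shape values are invented for illustration and are not taken from any real config.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_value: int = 2,
                   batch: int = 1) -> int:
    """Keys and values (hence the factor 2) for every layer and token."""
    return 2 * layers * kv_heads * head_dim * context_len * bytes_per_value * batch


def weights_bytes(params: float, bits_per_param: int = 16) -> int:
    """Memory for the weights alone at the given precision."""
    return int(params * bits_per_param / 8)


# Hypothetical 8B-parameter model shape, for illustration only.
weights = weights_bytes(8e9, bits_per_param=16)
cache = kv_cache_bytes(layers=32, kv_heads=8, head_dim=128, context_len=8192)
print(f"weights ~{weights / 1e9:.1f} GB, KV cache ~{cache / 1e9:.2f} GB, "
      f"total ~{(weights + cache) / 1e9:.1f} GB")
```

Real usage will be somewhat higher, since activations and framework overhead are not counted here.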

Yes, that was the difference I was pointing out. It matters because for VRAM you do need the model weights in GB plus some breathing room. That also limits your options, because one of the largest, fastest cards I’ve found for running AI models (whether you’re looking at consumer or enterprise markets) has only 24 GB of VRAM. Some of the older architectures go up to, I think, 48 GB. There may be others; my research has not been exhaustive, since I’m still looking into this myself.

What this means in practice is that some of the larger models (like the 70B models) need at least two graphics cards. That’s because of how they are actually run: the model is not run as a single 70B block. It goes through an architecture that breaks the model up. I don’t completely understand that yet.

What it means is they run it as an 8-bit model (I think that is the terminology), and the program you use to run the large 70B model can then operate it in a VRAM pool shared between two or more cards instead of on just one card.
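A rough way to see why precision and card count go together: weight memory is the parameter count times the bytes per parameter, and the number of cards follows from dividing by the VRAM per card. This is a sketch under simple assumptions (weights only, a flat 25% overhead that I picked arbitrarily, and weights that split evenly across cards); `cards_needed` is a hypothetical helper.

```python
import math


def cards_needed(params: float, bits_per_param: int,
                 vram_per_card_gb: float = 24.0, overhead: float = 0.25) -> int:
    """Minimum number of identical cards whose pooled VRAM holds the weights."""
    weights_gb = params * bits_per_param / 8 / 1e9
    return math.ceil(weights_gb * (1 + overhead) / vram_per_card_gb)


for name, params in [("8B", 8e9), ("30B", 30e9), ("70B", 70e9)]:
    for bits in (16, 8, 4):
        print(f"{name} at {bits}-bit: {cards_needed(params, bits)} x 24 GB card(s)")
```

Lower precision shrinks the weights, so the same model needs fewer cards at 4-bit than at 8-bit or 16-bit.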

The smaller models, like the 8B ones (and I think there is a 16B one), can run safely on one 24 GB card, but the larger ones won’t.
So far, my research into what I’m trying to do has only led me to discover the 8B, the 30B, and the 70B. I haven’t been looking into it very long.