I’m using the ‘hf’ CLI tool to download models from HF.
Take Qwen/Qwen3.5-397B-A17B, for example.
How can I tell how large it is BEFORE downloading?
Dry-run mode in the hf CLI is probably the most reliable method. The dry-run output lists the files that would be downloaded and the total bytes they would require, without downloading anything.
hf download Qwen/Qwen3.5-397B-A17B --dry-run
‘hf’ doesn’t have anything close to that argument. Bad bot.
That may be due to an older version of the huggingface_hub library. Try pip install -U huggingface_hub first…
hf download Qwen/Qwen3.5-397B-A17B --dry-run
[dry-run] Fetching 107 files: 100%|██████████████████████████████████████████████████| 107/107 [00:02<00:00, 35.91it/s]
Download complete: : 0.00B [00:02, ?B/s] [dry-run] Will download 107 files (out of 107) totalling 806.8G.]
File Bytes to download
-------------------------------------------- -----------------
.gitattributes 1.6K
LICENSE 11.5K
README.md 94.8K
chat_template.jinja 7.8K
config.json 4.2K
generation_config.json 244.0
merges.txt 3.4M
model.safetensors-00001-of-00094.safetensors 8.6G
...
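Since upgrading is what made --dry-run show up here, it can help to confirm which huggingface_hub version is actually installed before and after running pip install -U huggingface_hub. A minimal sketch using only the standard library's importlib.metadata (installed_version is a hypothetical helper name):

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str) -> str | None:
    """Return the installed version string of a package, or None if it isn't installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the installed version string, or a hint if the library is missing.
    v = installed_version("huggingface_hub")
    print(v if v else "huggingface_hub is not installed")
```

Run it in the same Python environment that provides the hf command, otherwise you may be checking a different install.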
To check the size of a Hugging Face model like Qwen/Qwen3.5-397B-A17B from Python, run this to get the total size and a per-file breakdown without downloading anything:
python
from huggingface_hub import HfApi
def print_model_sizes(repo_id):
    api = HfApi()
    # files_metadata=True asks the Hub to include per-file sizes (metadata only)
    repo_info = api.model_info(repo_id=repo_id, files_metadata=True)
    total_size_bytes = 0
    print(f"File sizes for model '{repo_id}':")
    for sibling in repo_info.siblings or []:
        filename = sibling.rfilename
        size_bytes = sibling.size or 0  # size may be None if the Hub has no metadata for it
        total_size_bytes += size_bytes
        size_gib = size_bytes / (1024 ** 3)
        print(f"  {filename}: {size_gib:.2f} GiB")
    total_gib = total_size_bytes / (1024 ** 3)
    print(f"\nTotal size: {total_gib:.2f} GiB")
print_model_sizes("Qwen/Qwen3.5-397B-A17B")
This queries metadata only; no model files are downloaded.
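Note that the snippet divides by 1024 ** 3, so its figures are binary gibibytes (GiB). Drive vendors and many tools quote decimal gigabytes (10**9 bytes), which gives a number about 7% larger for the same byte count, so compare like with like when checking free disk space. A small sketch of both conversions (bytes_to_gb_and_gib is a hypothetical helper name):

```python
from __future__ import annotations


def bytes_to_gb_and_gib(num_bytes: int) -> tuple[float, float]:
    """Return (decimal GB, binary GiB) for a byte count."""
    return num_bytes / 1000**3, num_bytes / 1024**3


gb, gib = bytes_to_gb_and_gib(8_000_000_000)
print(f"{gb:.2f} GB = {gib:.2f} GiB")  # 8.00 GB = 7.45 GiB
```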
bash
# metadata only (no download): use model_info(..., files_metadata=True) as in the Python snippet above
# dry-run download:
hf download Qwen/Qwen3.5-397B-A17B --dry-run
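If you mainly care about the weight files, another option is to list the repository tree and group sizes by file extension. This sketch assumes a huggingface_hub release that provides HfApi.list_repo_tree() and the RepoFile class; summarize_by_extension and fetch_file_sizes are hypothetical helper names. Only metadata is fetched:

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable


def summarize_by_extension(files: Iterable[tuple[str, int]]) -> dict[str, int]:
    """Sum file sizes (bytes) per extension; files without an extension go under "(none)"."""
    totals: dict[str, int] = defaultdict(int)
    for path, size in files:
        name = path.rsplit("/", 1)[-1]  # use the basename so dots in folder names are ignored
        ext = name.rsplit(".", 1)[-1] if "." in name else "(none)"
        totals[ext] += size
    return dict(totals)


def fetch_file_sizes(repo_id: str) -> list[tuple[str, int]]:
    """List (path, size) for every file in a model repo, recursively (metadata only)."""
    # Imported here so summarize_by_extension works without huggingface_hub installed.
    from huggingface_hub import HfApi
    from huggingface_hub.hf_api import RepoFile

    api = HfApi()
    entries = api.list_repo_tree(repo_id, recursive=True)
    # The tree also contains RepoFolder entries, which have no size; keep files only.
    return [(e.path, e.size) for e in entries if isinstance(e, RepoFile)]
```

Usage: summarize_by_extension(fetch_file_sizes("Qwen/Qwen3.5-397B-A17B")) returns a dict such as {"safetensors": ..., "json": ...} in bytes, which makes it easy to see how much of the total is weights versus tokenizer and config files.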
To compare sizes across many models, use the search= parameter of HfApi.list_models(). It does partial matching, much like the site search: it returns models whose IDs contain the search term. This script checks the size of each result:
#!/usr/bin/env python3
from __future__ import annotations
import argparse
from huggingface_hub import HfApi
from huggingface_hub.utils import HfHubHTTPError
def human_size(num_bytes: int | None) -> str:
    """Format a byte count using 1024-based units."""
    if not num_bytes:
        return "0 B"
    units = ["B", "KB", "MB", "GB", "TB"]
    size = float(num_bytes)
    for unit in units:
        if size < 1024 or unit == units[-1]:
            return f"{size:.2f} {unit}"
        size /= 1024
    return f"{size:.2f} TB"
def get_repo_size(api: HfApi, repo_id: str) -> tuple[int, list[tuple[str, int]]]:
    """Return (total bytes, [(filename, bytes), ...]) with the largest files first."""
    # files_metadata=True makes the Hub include per-file sizes; nothing is downloaded.
    info = api.model_info(repo_id, files_metadata=True)
    files = []
    total = 0
    for sibling in info.siblings or []:
        name = sibling.rfilename
        size = sibling.size or 0
        files.append((name, size))
        total += size
    files.sort(key=lambda x: x[1], reverse=True)
    return total, files
def main() -> int:
    parser = argparse.ArgumentParser(
        description="Search Hugging Face models by partial term and show approximate repo sizes."
    )
    parser.add_argument("query", help="Partial search term, e.g. qwen, llama, flux, embedding, whisper")
    parser.add_argument("-n", "--limit", type=int, default=20, help="Number of model results to check")
    parser.add_argument("--show-files", action="store_true", help="Show largest files per repo")
    parser.add_argument("--top-files", type=int, default=5, help="Number of files to show with --show-files")
    parser.add_argument("--sort", default="downloads", help="HF sort field: downloads, likes, lastModified, createdAt")
    parser.add_argument("--ascending", action="store_true", help="Sort HF results ascending")
    args = parser.parse_args()
    api = HfApi()
    models = api.list_models(
        search=args.query,
        limit=args.limit,
        sort=args.sort,
        direction=1 if args.ascending else -1,
        full=True,
    )
    rows = []
    for model in models:
        repo_id = model.id
        try:
            total_size, files = get_repo_size(api, repo_id)
            error = None
        except HfHubHTTPError as e:
            # Keep the row so the model still shows up, but record the error
            total_size, files, error = None, [], str(e)
        rows.append({
            "repo_id": repo_id,
            "size": total_size,
            "downloads": getattr(model, "downloads", None),
            "likes": getattr(model, "likes", None),
            "pipeline": getattr(model, "pipeline_tag", None),
            "error": error,
            "files": files,
        })
    # Largest repos first; failed lookups (size None) sort last
    rows.sort(key=lambda r: r["size"] or 0, reverse=True)
    print()
    print(f"Search: {args.query}")
    print()
    print(f"{'SIZE':>12} {'DOWNLOADS':>10} {'LIKES':>7} {'TYPE':<24} MODEL")
    print("-" * 90)
    for row in rows:
        size = human_size(row["size"])
        downloads = row["downloads"] if row["downloads"] is not None else "-"
        likes = row["likes"] if row["likes"] is not None else "-"
        pipeline = row["pipeline"] or "-"
        print(f"{size:>12} {downloads:>10} {likes:>7} {pipeline:<24} {row['repo_id']}")
        if row["error"]:
            print(f"{'':>12} error: {row['error']}")
        if args.show_files and row["files"]:
            for filename, file_size in row["files"][:args.top_files]:
                print(f"{'':>12} {human_size(file_size):>10} {filename}")
    print()
    return 0
if __name__ == "__main__":
    raise SystemExit(main())
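Or, depending on which huggingface_hub version you have, this interactive variant retries list_models() without sort= if that argument isn't accepted, and then sorts the results locally: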
#!/usr/bin/env python3
from __future__ import annotations
from huggingface_hub import HfApi
from huggingface_hub.utils import HfHubHTTPError
def human_size(num_bytes: int | None) -> str:
    """Format a byte count using 1024-based units."""
    if not num_bytes:
        return "0 B"
    units = ["B", "KB", "MB", "GB", "TB"]
    size = float(num_bytes)
    for unit in units:
        if size < 1024 or unit == units[-1]:
            return f"{size:.2f} {unit}"
        size /= 1024
    return f"{size:.2f} TB"
def prompt_int(label: str, default: int) -> int:
    """Ask for a positive integer; fall back to the default on empty or invalid input."""
    value = input(f"{label} [{default}]: ").strip()
    if not value:
        return default
    try:
        parsed = int(value)
    except ValueError:
        print(f"Invalid number. Using default: {default}")
        return default
    if parsed <= 0:
        print(f"Using default: {default}")
        return default
    return parsed
def prompt_bool(label: str, default: bool = False) -> bool:
    default_text = "y" if default else "n"
    value = input(f"{label} [y/n, default {default_text}]: ").strip().lower()
    if not value:
        return default
    return value in {"y", "yes", "true", "1"}
def prompt_choice(label: str, choices: list[str], default: str) -> str:
    """Accept either the choice text itself or its 1-based number."""
    print(f"{label}:")
    for index, choice in enumerate(choices, start=1):
        marker = " (default)" if choice == default else ""
        print(f"  {index}. {choice}{marker}")
    value = input(f"Choose 1-{len(choices)} [{default}]: ").strip()
    if not value:
        return default
    if value in choices:
        return value
    try:
        index = int(value)
        if 1 <= index <= len(choices):
            return choices[index - 1]
    except ValueError:
        pass
    print(f"Invalid choice. Using default: {default}")
    return default
def get_repo_size(api: HfApi, repo_id: str) -> tuple[int, list[tuple[str, int]]]:
    """Return (total bytes, [(filename, bytes), ...]) with the largest files first."""
    info = api.model_info(repo_id, files_metadata=True)
    files: list[tuple[str, int]] = []
    total = 0
    for sibling in info.siblings or []:
        name = getattr(sibling, "rfilename", None) or getattr(sibling, "filename", None) or "<unknown>"
        size = int(getattr(sibling, "size", None) or 0)
        files.append((name, size))
        total += size
    files.sort(key=lambda x: x[1], reverse=True)
    return total, files
def first_attr(obj, *names):
    """Return the first non-None attribute; field names differ between huggingface_hub versions."""
    for name in names:
        value = getattr(obj, name, None)
        if value is not None:
            return value
    return None
def get_model_id(model) -> str | None:
    return first_attr(model, "id", "modelId")
# Sort field -> attribute names to try (snake_case in newer releases, camelCase in older ones)
SORT_ATTRS = {
    "downloads": ("downloads",),
    "likes": ("likes",),
    "lastModified": ("last_modified", "lastModified"),
    "createdAt": ("created_at", "createdAt"),
    "trendingScore": ("trending_score", "trendingScore"),
}
def sort_models_locally(models: list, sort: str, ascending: bool) -> list:
    names = SORT_ATTRS.get(sort)
    if names is None:
        return models
    if sort in ("lastModified", "createdAt"):
        # Values may be datetimes or ISO strings; compare as strings so both work
        models.sort(key=lambda m: str(first_attr(m, *names) or ""), reverse=not ascending)
    else:
        models.sort(key=lambda m: first_attr(m, *names) or 0, reverse=not ascending)
    return models
def main() -> int:
    print()
    print("Hugging Face model size search")
    print()
    query = input("Search term, partial match is OK: ").strip()
    if not query:
        print("No search term entered. Exiting.")
        return 1
    limit = prompt_int("How many model results should I check?", 20)
    sort = prompt_choice(
        "Sort Hugging Face search results by",
        ["downloads", "likes", "lastModified", "createdAt", "trendingScore"],
        "downloads",
    )
    ascending = prompt_bool("Ascending sort?", False)
    show_files = prompt_bool("Show largest files for each model?", True)
    top_files = 5
    if show_files:
        top_files = prompt_int("How many largest files per model?", 5)
    size_sort = prompt_choice(
        "Sort final output by computed repo size",
        ["desc", "asc", "none"],
        "desc",
    )
    api = HfApi()
    print()
    print(f"Searching Hugging Face for: {query}")
    print()
    try:
        models = list(api.list_models(search=query, limit=limit, sort=sort, full=True))
    except TypeError:
        # Fallback if this huggingface_hub version doesn't accept sort=
        models = list(api.list_models(search=query, limit=limit, full=True))
    models = sort_models_locally(models, sort=sort, ascending=ascending)
    if not models:
        print(f"No models found for search term: {query}")
        return 1
    rows = []
    total_models = len(models)
    for index, model in enumerate(models, start=1):
        repo_id = get_model_id(model)
        if not repo_id:
            continue
        print(f"[{index}/{total_models}] Checking {repo_id}...")
        try:
            total_size, files = get_repo_size(api, repo_id)
            error = None
        except HfHubHTTPError as e:
            total_size, files, error = None, [], str(e)
        except Exception as e:
            total_size, files, error = None, [], f"{type(e).__name__}: {e}"
        rows.append({
            "repo_id": repo_id,
            "size": total_size,
            "downloads": getattr(model, "downloads", None),
            "likes": getattr(model, "likes", None),
            "pipeline": getattr(model, "pipeline_tag", None),
            "error": error,
            "files": files,
        })
    if size_sort == "asc":
        rows.sort(key=lambda r: r["size"] or 0)
    elif size_sort == "desc":
        rows.sort(key=lambda r: r["size"] or 0, reverse=True)
    print()
    print(f"Search: {query}")
    print()
    print(f"{'SIZE':>12} {'DOWNLOADS':>10} {'LIKES':>7} {'TYPE':<24} MODEL")
    print("-" * 110)
    for row in rows:
        size = human_size(row["size"])
        downloads = row["downloads"] if row["downloads"] is not None else "-"
        likes = row["likes"] if row["likes"] is not None else "-"
        pipeline = row["pipeline"] or "-"
        print(f"{size:>12} {downloads:>10} {likes:>7} {pipeline:<24} {row['repo_id']}")
        if row["error"]:
            print(f"{'':>12} error: {row['error']}")
        if show_files and row["files"]:
            for filename, file_size in row["files"][:top_files]:
                print(f"{'':>12} {human_size(file_size):>10} {filename}")
    print()
    return 0
if __name__ == "__main__":
    raise SystemExit(main())
Um, just a question here.
Are you asking how much physical space it will take up on the hard drive,
or how much VRAM it needs to run efficiently?
Those are two related but very different questions.
@John6666 I owe you an apology. I didn’t realize the --help output differs depending on the subcommand (‘hf --help’ vs. ‘hf download --help’). I had already run ‘hf --help’ before creating this discussion and saw nothing I could use; then a day later you mentioned a dry-run argument that I knew I hadn’t seen in --help, so I assumed you were a bot, or lying. I should have just tried your command before assuming.
I was asking about physical disk space.
I know that for VRAM, you need the model’s weights in GB plus some headroom for the KV cache.
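That rule of thumb can be written out. A rough sketch, assuming a standard transformer with grouped-query attention: the weights take parameters × bits per parameter / 8 bytes, and the KV cache stores one key and one value vector per layer, per KV head, per token. The model shape below is hypothetical (roughly 8B-class), and activations, framework overhead and fragmentation are ignored, so treat the result as a lower bound:

```python
from __future__ import annotations


def weights_bytes(num_params: float, bits_per_param: float) -> float:
    """Memory for the weights alone: params x bits / 8."""
    return num_params * bits_per_param / 8


def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int = 1, bytes_per_elem: int = 2) -> int:
    """KV cache: 2 (K and V) x layers x KV heads x head dim x tokens x batch x bytes per element."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem


# Hypothetical 8B-parameter model in 16-bit precision with an 8192-token context.
w = weights_bytes(8e9, 16)
kv = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128, seq_len=8192)
print(f"weights {w / 1024**3:.2f} GiB + KV cache {kv / 1024**3:.2f} GiB "
      f"= {(w + kv) / 1024**3:.2f} GiB")
# weights 14.90 GiB + KV cache 1.00 GiB = 15.90 GiB
```

The KV-cache term grows linearly with context length and batch size, so doubling either doubles it; that is why long contexts need more headroom than the weights alone suggest.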
Yes, that was the difference I was pointing out. The reason to focus on VRAM too is that you need the model weights in GB plus some breathing room, and that constrains which hardware works. Consumer cards top out at around 24 to 32 GB of VRAM (an RTX 4090 has 24 GB, an RTX 5090 has 32 GB); some workstation cards have 48 GB, and data-center GPUs such as the A100 and H100 have 80 GB. My research into this has not been exhaustive, because I’ve only recently started looking into it too. What this means in practice is that larger models (like the 70B models) need at least two consumer graphics cards. That’s because of how the models are actually run: they aren’t loaded as one 70B block on a single card, but split up across devices. I don’t completely understand that part yet.
From what I understand, they run it as an 8-bit model (quantization, I think that is the terminology), which shrinks the weights, and use software that splits the model so the 70B model can run in a VRAM pool across two or more cards instead of on just one.
Smaller models, like the 8B ones (and I think there is a 16B one), can run safely on one 24 GB card, but the larger ones won’t.
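The arithmetic behind one card versus several is simple: lower precision shrinks the weights, and splitting the model lets cards pool their VRAM. A rough sketch, assuming a flat 20% overhead on top of the weights (a placeholder for KV cache and runtime buffers, not a measured value) and a perfectly even split across identical cards, which real setups don't quite achieve; min_cards is a hypothetical helper:

```python
import math


def min_cards(num_params: float, bits_per_param: int, vram_per_card_gb: float,
              overhead: float = 0.2) -> int:
    """Smallest number of identical cards whose pooled VRAM fits weights plus overhead."""
    weights_gb = num_params * bits_per_param / 8 / 1e9  # decimal GB
    needed_gb = weights_gb * (1 + overhead)
    return math.ceil(needed_gb / vram_per_card_gb)


for params, label in [(8e9, "8B"), (70e9, "70B")]:
    for bits in (16, 8, 4):
        print(f"{label} @ {bits}-bit: {min_cards(params, bits, 24)} x 24 GB card(s)")
```

Under these assumptions, a 70B model needs about 84 GB at 8-bit (four 24 GB cards) and about 42 GB at 4-bit (two cards), while an 8B model needs about 19 GB at 16-bit and fits on one card.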
So far, my research into what I’m trying to do has only led me to discover the 8B, the 30B, and the 70B. I haven’t been looking into it very long.