In a Training Loop 🔄

5 30 52

lulavc

AI & ML interests

None yet

Recent Activity

commented on an article 15 days ago

Agentic RL: Token-In, Token-Out Done Right

upvoted an article 15 days ago

Agentic RL: Token-In, Token-Out Done Right

upvoted a collection 15 days ago

Collabs


Organizations

commented on Agentic RL: Token-In, Token-Out Done Right 15 days ago

Thank you!!

upvoted an article 15 days ago

Article

Agentic RL: Token-In, Token-Out Done Right

huggingface • 17 days ago • 13

upvoted a collection 15 days ago

Collabs

Collection

This collection tracks artifacts on the Hub that are products of collaborations with hugging-science. • 17 items • Updated Jan 22 • 2

replied to pankajpandey-dev's post 15 days ago

Good work, and a nice thing you're doing. Let's make AI more capable in languages where quality is sometimes lacking. I made some good Hindi friends on my path in AI.

reacted to pankajpandey-dev's post with 🔥 15 days ago

Post

14972

🇮🇳 Qwen3-4B Hindi Instruct v2 — a Hindi LLM that runs on your own machine
Most strong Hindi-capable models are either huge or cloud-only. I wanted one that's small enough to run locally but actually follows instructions in Hindi — so I fine-tuned Qwen3-4B on 10K Hindi instruction pairs and shipped it with a full GGUF quant ladder.
✅ Fine-tune (16-bit): huggingface.co/pankajpandey-dev/Qwen3-4B-Hindi-Instruct-v2
✅ GGUF (Q4/Q5/Q8): huggingface.co/pankajpandey-dev/Qwen3-4B-Hindi-Instruct-v2-GGUF
Runs in Ollama, llama.cpp, and LM Studio. The Q4_K_M is just 2.5 GB — fits comfortably on a laptop, CPU or GPU.
Part of my Hindi LLM Series — building openly-licensed Indic models for local and edge use. More coming (Gemma next). Feedback welcome 🙏
#Hindi #IndicNLP #GGUF #LocalLLM #Qwen

4 replies

upvoted a collection 15 days ago

2026 June 🌞 China Open Source Highlights

Collection

20 items • Updated about 18 hours ago • 3

updated a Space 15 days ago

HF-VPS

🚀

Monitor HF-VPS status and resources in real time

liked a model 16 days ago

deepseek-ai/DeepSeek-V4-Flash-Base

292B • Updated Apr 27 • 52.4k • 246

liked a model 17 days ago

unsloth/LFM2.5-8B-A1B

Text Generation • 8B • Updated 17 days ago • 868 • 12

liked a model 27 days ago

HauhauCS/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive

Image-Text-to-Text • 35B • Updated Apr 17 • 2.7M • 1.86k

replied to their post 28 days ago

Sent 2 more emails to deepseek support today.

reacted to salma-remyx's post with 🔥 28 days ago

Post

11603

The space of possible improvements for your AI model is large while evaluation is costly.

So I was excited to discover the ICML 2026 paper from Kobalczyk, Lin, Letham, Zhao, Balandat, and Bakshy titled "LILO: Bayesian Optimization with Natural Language Feedback."

The method learns efficiently from expert preferences, balancing exploration and exploitation in a principled way with Bayesian Optimization for expensive-to-evaluate black-box objectives.

Experimenting with the technique, I trained a Gaussian Process proxy model on the implicit preferences in my code repo's commit history at VQASynth.

The result: I used the model's preference scores to re-rank candidate papers recommended based on my interests in spatial reasoning and multimodal data synthesis.

Semantic relevance is a high-recall method for finding arXiv papers personalized to your interests. Adding contributor preferences, extracted from the merge history of your code, offers a high-precision filter.
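To make the recall-then-precision idea concrete, here is a minimal sketch of a two-stage recommender. It is not the LILO method or the VQASynth code: the `recommend` function, the toy 2-D embeddings, and the hard-coded `pref` scores (standing in for a learned preference model) are all assumptions made up for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend(interest_vec, papers, preference_score, recall_k=3, top_n=2):
    # Stage 1 (high recall): shortlist the papers closest to the interest vector.
    shortlist = sorted(
        papers,
        key=lambda p: cosine(interest_vec, p["embedding"]),
        reverse=True,
    )[:recall_k]
    # Stage 2 (high precision): re-rank the shortlist by preference score.
    reranked = sorted(shortlist, key=preference_score, reverse=True)
    return [p["title"] for p in reranked[:top_n]]


# Toy data: "pref" is a hypothetical stand-in for a preference model's output.
papers = [
    {"title": "A", "embedding": [1.0, 0.0], "pref": 0.2},
    {"title": "B", "embedding": [0.9, 0.1], "pref": 0.9},
    {"title": "C", "embedding": [0.8, 0.2], "pref": 0.5},
    {"title": "D", "embedding": [0.0, 1.0], "pref": 1.0},
]

result = recommend([1.0, 0.0], papers, preference_score=lambda p: p["pref"])
print(result)  # ['B', 'C']
```

Paper D has the highest preference score but never reaches the re-ranking stage because it is not semantically close to the interest vector, which is the point of filtering in this order.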

So what's next? I'm using the model to synthesize a larger volume of preference data to finetune an open-weight coding model with DPO and LoRA. Tuning Coding Agents via Implicit Preference Distillation

arXiv: https://arxiv.org/pdf/2510.17671
Substack: https://remyxai.substack.com/p/lilo-and-myx
VQASynth: https://github.com/remyxai/VQASynth

1 reply

reacted to espejelomar's post with ❤️ 28 days ago

Post

4728

Sharing WorldForge with @abdelstark

It's an open-source Python project for evaluating and replaying robotics and world-model workflows.

The useful part is not only calling a model. WorldForge records the run, validates action shapes, translates outputs into actions, and keeps replay artifacts you can inspect later.
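For readers who want a feel for that record, validate, and replay pattern, here is a rough standard-library sketch. It does not use or mirror the WorldForge API: `validate_action`, `record_run`, `replay`, the JSON layout, and the 2-D action size are hypothetical choices for this example only.

```python
import json
import tempfile
from pathlib import Path


def validate_action(action, expected_dim):
    """Check that an action is a flat numeric vector of the expected length."""
    if len(action) != expected_dim:
        raise ValueError(f"expected {expected_dim} values, got {len(action)}")
    for value in action:
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"non-numeric action value: {value!r}")
    return [float(value) for value in action]


def record_run(model_outputs, expected_dim, artifact_path):
    """Validate each raw output and save the whole run as a JSON artifact."""
    steps = []
    for index, raw in enumerate(model_outputs):
        action = validate_action(raw, expected_dim)
        steps.append({"step": index, "raw_output": raw, "action": action})
    payload = {"action_dim": expected_dim, "steps": steps}
    Path(artifact_path).write_text(json.dumps(payload, indent=2))
    return steps


def replay(artifact_path):
    """Load saved actions from disk; no model or GPU is needed at this point."""
    data = json.loads(Path(artifact_path).read_text())
    return [step["action"] for step in data["steps"]]


# Toy run: two hypothetical 2-D model outputs, recorded and then replayed.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "run.json"
    record_run([[0.1, 0.2], [0.3, 0.4]], expected_dim=2, artifact_path=path)
    replayed = replay(path)

print(replayed)  # [[0.1, 0.2], [0.3, 0.4]]
```

Because validation happens before anything is written, a malformed output fails at record time rather than surfacing later during replay.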

The current demo uses LeRobot + LeWorldModel on PushT through the official loader:

stable_worldmodel.policy.AutoCostModel("pusht/lewm")

The harness also has replay-only paths for Cosmos-Policy and GR00T-style outputs, so you can inspect the provider contract from saved artifacts without keeping a GPU server online.

Try it:

pip install worldforge-ai
uv run --extra harness worldforge-harness --flow robotics-compare

Repo: https://github.com/AbdelStark/worldforge
Docs: https://abdelstark.github.io/worldforge/

Pre-1.0, MIT, and actively looking for contributors. Good areas:
- robotics provider adapters
- replay artifacts
- eval flows
- docs & first-run demos

Good first issues: https://github.com/AbdelStark/worldforge/contribute

If you're building robot policy evals or model adapters, would love a PR — or an issue describing what's missing.