Thank you!!
lulavc
AI & ML interests
None yet
Recent Activity
commented on Agentic RL: Token-In, Token-Out Done Right 15 days ago
upvoted an article 15 days ago
Article
Agentic RL: Token-In, Token-Out Done Right
huggingface
⢠⢠13upvoted a collection 15 days ago
replied to pankajpandey-dev's post 15 days ago
Good work, and a nice thing you're doing. Let's make AI more capable in the languages where quality is sometimes lacking. I've made some good Hindi-speaking friends on my path in AI.
reacted to pankajpandey-dev's post with 🔥 15 days ago
Post
14972
🇮🇳 Qwen3-4B Hindi Instruct v2: a Hindi LLM that runs on your own machine
Most strong Hindi-capable models are either huge or cloud-only. I wanted one that's small enough to run locally but actually follows instructions in Hindi, so I fine-tuned Qwen3-4B on 10K Hindi instruction pairs and shipped it with a full GGUF quant ladder.
- Fine-tune (16-bit): huggingface.co/pankajpandey-dev/Qwen3-4B-Hindi-Instruct-v2
- GGUF (Q4/Q5/Q8): huggingface.co/pankajpandey-dev/Qwen3-4B-Hindi-Instruct-v2-GGUF
Runs in Ollama, llama.cpp, and LM Studio. The Q4_K_M is just 2.5 GB and fits comfortably on a laptop, CPU or GPU.
Part of my Hindi LLM Series: building openly-licensed Indic models for local and edge use. More coming (Gemma next). Feedback welcome.
#Hindi #IndicNLP #GGUF #LocalLLM #Qwen
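As a rough sanity check on the file sizes in a quant ladder like the one above, the arithmetic can be sketched in a few lines of Python. This is a back-of-the-envelope estimate, not something taken from the post: the bits-per-weight figures are approximate averages commonly quoted for llama.cpp quant types, the 4.0e9 parameter count is an assumed round number for a "4B" model, and the function name `estimate_size_gb` is made up for illustration. Real GGUF files also carry metadata and mix tensor types, so actual sizes will differ somewhat.

```python
# Approximate average bits per weight for a few GGUF quant types.
# Assumption: these are rough, commonly quoted figures, not exact values.
APPROX_BITS_PER_WEIGHT = {
    "F16": 16.0,
    "Q8_0": 8.5,
    "Q5_K_M": 5.7,
    "Q4_K_M": 4.85,
}


def estimate_size_gb(n_params: float, quant: str) -> float:
    """Return an approximate model file size in GB (10^9 bytes)."""
    bits = APPROX_BITS_PER_WEIGHT[quant]
    return n_params * bits / 8 / 1e9


if __name__ == "__main__":
    n_params = 4.0e9  # assumed round parameter count for a "4B" model
    for quant in APPROX_BITS_PER_WEIGHT:
        print(f"{quant}: ~{estimate_size_gb(n_params, quant):.1f} GB")
```

Under these assumptions the Q4_K_M estimate comes out a little under 2.5 GB, which is in the same range as the size quoted in the post.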
upvoted a collection 15 days ago
replied to their post 28 days ago
Sent 2 more emails to deepseek support today.
reacted to salma-remyx's post with 🔥 28 days ago
Post
11603
The space of possible improvements for your AI model is large while evaluation is costly.
So I was excited to discover the ICML 2026 paper from Kobalczyk, Lin, Letham, Zhao, Balandat, and Bakshy titled "LILO: Bayesian Optimization with Natural Language Feedback."
The method learns efficiently from expert preferences, balancing exploration and exploitation in a principled way with Bayesian Optimization for expensive-to-evaluate black-box objectives.
Experimenting with the technique, I trained a Gaussian Process proxy model on the implicit preferences in my code repo's commit history at VQASynth.
The result: I used the model's preference scores to re-rank candidate papers recommended based on my interests in spatial reasoning and multimodal data synthesis.
Semantic relevance is a high-recall method for finding arXiv papers personalized to your interests. Adding contributor preferences, extracted from the merge history of your code, offers a high-precision filter.
So what's next? I'm using the model to synthesize a larger volume of preference data to fine-tune an open-weight coding model with DPO and LoRA.
Tuning Coding Agents via Implicit Preference Distillation
arXiv: https://arxiv.org/pdf/2510.17671
Substack: https://remyxai.substack.com/p/lilo-and-myx
VQASynth: https://github.com/remyxai/VQASynth
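The recall-then-precision idea described in the post can be illustrated with a small two-stage ranking sketch. This is not the author's pipeline and it does not fit a Gaussian Process: the `preference_score` callable simply stands in for whatever proxy model produces preference scores, and the function names, embeddings, and scores are all hypothetical.

```python
from math import sqrt
from typing import Callable, List, Sequence, Tuple

Candidate = Tuple[str, Sequence[float]]  # (title, embedding)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recall_then_rerank(
    query_vec: Sequence[float],
    candidates: List[Candidate],
    preference_score: Callable[[str], float],
    k: int = 3,
) -> List[str]:
    """Stage 1: keep the k candidates most similar to the query (recall).
    Stage 2: order that shortlist by preference score (precision)."""
    shortlist = sorted(
        candidates, key=lambda c: cosine(query_vec, c[1]), reverse=True
    )[:k]
    reranked = sorted(shortlist, key=lambda c: preference_score(c[0]), reverse=True)
    return [title for title, _ in reranked]


if __name__ == "__main__":
    # Toy data: titles, embeddings, and preference scores are made up.
    papers = [("A", [1.0, 0.0]), ("B", [0.9, 0.1]), ("C", [0.0, 1.0]), ("D", [0.8, 0.2])]
    prefs = {"A": 0.1, "B": 0.9, "C": 1.0, "D": 0.5}
    print(recall_then_rerank([1.0, 0.0], papers, prefs.get, k=3))
```

In this toy setup, a candidate with a high preference score but low semantic relevance never reaches the second stage, which is the sense in which the preference model acts as a filter on top of retrieval rather than a replacement for it.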
reacted to espejelomar's post with ❤️ 28 days ago
Post
4728
Sharing WorldForge with @abdelstark
It's an open-source Python project for evaluating and replaying robotics and world-model workflows.
The useful part is not only calling a model. WorldForge records the run, validates action shapes, translates outputs into actions, and keeps replay artifacts you can inspect later.
The current demo uses LeRobot + LeWorldModel on PushT through the official loader:
stable_worldmodel.policy.AutoCostModel("pusht/lewm")
The harness also has replay-only paths for Cosmos-Policy and GR00T-style outputs, so you can inspect the provider contract from saved artifacts without keeping a GPU server online.
Try it:
pip install worldforge-ai
uv run --extra harness worldforge-harness --flow robotics-compare
Repo: https://github.com/AbdelStark/worldforge
Docs: https://abdelstark.github.io/worldforge/
Pre-1.0, MIT, and actively looking for contributors. Good areas:
- robotics provider adapters
- replay artifacts
- eval flows
- docs & first-run demos
Good first issues: https://github.com/AbdelStark/worldforge/contribute
If you're building robot policy evals or model adapters, would love a PR, or an issue describing what's missing.
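To make "validates action shapes" concrete, here is a minimal sketch of what such a check could look like before a recorded run is replayed. It is not WorldForge's API: the function name `validate_actions`, the list-of-lists action format, and the 2-dimensional action size in the example are assumptions chosen for illustration.

```python
from typing import List, Sequence


def validate_actions(actions: Sequence[Sequence[float]], action_dim: int) -> List[str]:
    """Return a list of error messages; an empty list means every
    action has the expected length and contains only numbers."""
    errors = []
    for step, action in enumerate(actions):
        if len(action) != action_dim:
            errors.append(
                f"step {step}: expected {action_dim} values, got {len(action)}"
            )
        elif not all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in action
        ):
            errors.append(f"step {step}: non-numeric value")
    return errors


if __name__ == "__main__":
    # Hypothetical recorded run with one malformed step.
    recorded = [[0.1, 0.2], [0.3], [0.5, 0.6]]
    for message in validate_actions(recorded, action_dim=2):
        print(message)
```

Returning all errors rather than raising on the first one suits replay artifacts, since a saved run can then be inspected in full without re-running the model.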