We trained an open-source, Mythos-like cybersecurity LLM for the Build Small Hackathon: meet OpenMythos.
Trained in two stages: SFT on ~1.84K filtered ArXiv cs.CR papers + real CVE data, then RLVR using GitHub repos paired with past vulnerabilities, with a verifier model checking outputs against ground truth.
Trained on: H100s from Modal
The RLVR stage made the biggest difference: responses got more precise and less prone to confusing similar vulnerability classes.
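To make the "checking outputs against ground truth" idea concrete, here is a minimal sketch of what a verifiable reward could look like. It is not our training code: the actual verifier is a model, and this sketch swaps in a simple regex match on CWE IDs as a stand-in. The function names, the CWE-based labels, and the 1.0 / 0.5 / 0.0 scoring are all assumptions made for illustration.

```python
import re

# Hypothetical stand-in for the verifier: the real setup uses a verifier model,
# while this sketch only string-matches CWE IDs against a known label.
CWE_PATTERN = re.compile(r"CWE-(\d+)", re.IGNORECASE)


def extract_cwe_ids(text: str) -> set[str]:
    """Return every CWE ID mentioned in the text, normalized to 'CWE-<n>'."""
    return {f"CWE-{num}" for num in CWE_PATTERN.findall(text)}


def verifiable_reward(response: str, ground_truth_cwe: str) -> float:
    """Score a response against the ground-truth vulnerability class.

    1.0 -> names only the correct class
    0.5 -> names the correct class but hedges across others
    0.0 -> misses the correct class entirely
    (The scoring scheme is an assumption for this sketch.)
    """
    predicted = extract_cwe_ids(response)
    truth = ground_truth_cwe.upper()
    if truth not in predicted:
        return 0.0
    return 1.0 if len(predicted) == 1 else 0.5


if __name__ == "__main__":
    print(verifiable_reward("This is reflected XSS (CWE-79).", "CWE-79"))
    print(verifiable_reward("Either CWE-79 or CWE-89.", "CWE-79"))
    print(verifiable_reward("Looks like SQL injection, CWE-89.", "CWE-79"))
```

Giving partial credit to answers that list several classes is one way a reward like this could push a model away from confusing similar vulnerability classes, since only a single correct class earns the full score.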
Everything is open:
🤖 Demo → build-small-hackathon/OpenMythos
🧠 Model → build-small-hackathon/OpenMythos
📦 CVE Dataset → build-small-hackathon/CVE_Vulnerailities_Detailed
📄 ArXiv Dataset → himanshu17HF/ArvixImport-Filtered-Final
Try it out and let us know where it breaks 🙏