# How Cosine’s model was trained

**Cosine’s Genie model** is purpose-built for software engineering, optimized for autonomy, reasoning, and code correctness.

Unlike general-purpose LLMs, Genie was trained to understand real-world repository structures, dependency graphs, and test-driven workflows.

***

#### **Training sources and approach**

* **Pretraining:** Uses high-quality open-source repositories under permissive licenses (e.g., MIT, Apache, BSD).
* **Filtering:** Removal of PII, insecure code, and non-source text.
* **Domain diversity:** Data across 20+ languages and frameworks (Python, Java, JS/TS, C#, Go, etc.).
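As a rough illustration of the license and PII filtering steps listed above, the following is a minimal sketch and not Cosine's actual pipeline. The `SourceFile` type, the allow-list contents, the `<EMAIL>` placeholder, and the email-only PII pattern are all assumptions made for the example; real filtering would cover far more PII categories and would also screen for insecure code.

```python
import re
from dataclasses import dataclass

# Hypothetical allow-list of SPDX identifiers. The page names MIT, Apache,
# and BSD only as examples, so the exact set here is an assumption.
PERMISSIVE_LICENSES = {"MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause"}

# Deliberately simple email pattern for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


@dataclass
class SourceFile:
    path: str
    license_id: str  # SPDX identifier of the containing repository (assumed known)
    text: str


def is_permissive(f: SourceFile) -> bool:
    """Keep only files whose repository license is on the allow-list."""
    return f.license_id in PERMISSIVE_LICENSES


def scrub_pii(text: str) -> str:
    """Replace email addresses with a placeholder token."""
    return EMAIL_RE.sub("<EMAIL>", text)


def filter_corpus(files):
    """Drop non-permissive files, then scrub PII from what remains."""
    kept = []
    for f in files:
        if not is_permissive(f):
            continue
        kept.append(SourceFile(f.path, f.license_id, scrub_pii(f.text)))
    return kept
```

Running the license check before the PII scrub means no work is spent cleaning files that would be discarded anyway.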

***

#### **Reinforcement learning for engineering tasks**

Genie is post-trained with reinforcement signals specific to engineering quality:

* Successful vs. failed task completions.
* Code compile/test outcomes.
* PR merge acceptance rates.
* Efficiency of fixes and refactors.

This reinforcement phase teaches Genie to *plan, validate, and reason* about software — not just autocomplete text.
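To make the idea of engineering-specific reinforcement signals more concrete, here is a minimal sketch of how the four signals listed above might be folded into a single scalar reward. This is an assumption-laden illustration: the `EpisodeOutcome` fields, the weights, and the `diff_budget` parameter are hypothetical and are not taken from Cosine's training setup.

```python
from dataclasses import dataclass


@dataclass
class EpisodeOutcome:
    task_completed: bool   # successful vs. failed task completion
    compiled: bool         # code compile outcome
    tests_passed: int      # test outcomes
    tests_total: int
    pr_merged: bool        # PR merge acceptance
    lines_changed: int     # proxy for efficiency of the fix or refactor


# Hypothetical weights chosen only so that they sum to 1.0.
WEIGHTS = {"task": 0.4, "build": 0.1, "tests": 0.3, "merge": 0.1, "efficiency": 0.1}


def reward(o: EpisodeOutcome, diff_budget: int = 200) -> float:
    """Combine the signals into a reward in the range [0, 1]."""
    test_rate = o.tests_passed / o.tests_total if o.tests_total else 0.0
    # Smaller diffs score higher; diffs at or over the budget score zero.
    efficiency = max(0.0, 1.0 - o.lines_changed / diff_budget)
    return (
        WEIGHTS["task"] * o.task_completed
        + WEIGHTS["build"] * o.compiled
        + WEIGHTS["tests"] * test_rate
        + WEIGHTS["merge"] * o.pr_merged
        + WEIGHTS["efficiency"] * efficiency
    )
```

A weighted sum is only one possible design. It is used here because it keeps each signal's contribution easy to inspect.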

***

#### **Continuous evaluation and fine-tuning**

Cosine runs continuous regression tests on real repositories to measure:

* Code accuracy and runtime stability.
* Test pass rates and diff efficiency.
* Hallucination and error frequency.

Enterprise deployments may use **private fine-tuning** on internal codebases, fully contained within their VPC or on-prem environments, with no data egress.
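The regression measurements described in this section could be tracked along the lines of the sketch below. It is illustrative only: the page does not define "diff efficiency", so the definition used here (size of a reference patch divided by size of the model's patch, capped at 1.0) is an assumption, as are the `RunResult` type and the `tolerance` threshold.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    repo: str
    tests_passed: int
    tests_total: int
    lines_changed: int            # size of the model-produced diff
    reference_lines_changed: int  # size of a known-good patch (assumed available)


def pass_rate(runs) -> float:
    """Fraction of all tests passed across the evaluated repositories."""
    total = sum(r.tests_total for r in runs)
    return sum(r.tests_passed for r in runs) / total if total else 0.0


def diff_efficiency(runs) -> float:
    """Mean ratio of reference diff size to model diff size, capped at 1.0."""
    ratios = [
        min(1.0, r.reference_lines_changed / r.lines_changed)
        for r in runs
        if r.lines_changed > 0
    ]
    return sum(ratios) / len(ratios) if ratios else 0.0


def regressed(baseline, candidate, tolerance: float = 0.01) -> bool:
    """Flag a candidate whose pass rate drops below baseline by more than tolerance."""
    return pass_rate(candidate) < pass_rate(baseline) - tolerance
```

Comparing each candidate model against a fixed baseline on the same repositories is what makes this a regression check rather than a one-off benchmark.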

***

#### **Model safety and data governance**

* **Zero customer data used for training.**
* **PII and license filtering** applied pre-training.
* **Model cards** document dataset sources, evaluation benchmarks, and update history.
* Aligned with **NIST AI RMF** and **EU AI Act** governance frameworks.

***

#### **Why this matters**

This purpose-built training pipeline makes Genie more reliable for real engineering tasks — from legacy refactors to multi-service migrations — and ensures Cosine is **trustworthy, secure, and audit-ready** for enterprise use.

***

#### **Related pages**

* [Model governance and control](https://www.notion.so/How-Cosine-s-model-was-trained-284aa81af4ab802aaf71c02b36a7f529?pvs=21)
* [How does Cosine minimize hallucinations and ensure code quality?](https://www.notion.so/How-Cosine-s-model-was-trained-284aa81af4ab802aaf71c02b36a7f529?pvs=21)
* [How does Cosine handle security, privacy, and IP?](https://www.notion.so/How-Cosine-s-model-was-trained-284aa81af4ab802aaf71c02b36a7f529?pvs=21)

→ Next: What benchmarks or case studies exist?
