Welcome

Opt-in tooling to collect, sanitize, and publish coding-agent traces for open model training.

What This Project Is

trace-share is open infrastructure for building high-quality coding-agent datasets from real developer workflows.

The pipeline is designed to:

collect from approved local tool sources
sanitize data locally before any upload
convert traces into consistent Episode records
publish versioned dataset snapshots for model training

Why It Exists

Open-source coding models are limited by a lack of high-signal agent interaction data. Most public data is static code, not iterative workflows with tool use, debugging, and fix loops.

trace-share exists to close that gap with reproducible, consent-gated, and privacy-aware data contribution flows.

Who This Is For

model trainers building open coding assistants
companies that want to support shared model infrastructure
individual contributors who want to donate sanitized traces
maintainers building parser adapters for additional coding tools

We Are Looking For Support

This project is seeking support from companies and infrastructure partners for:

storage and egress credits for dataset artifacts
index/search hosting capacity
CI/release compute for larger snapshot builds
security and privacy review support

If your team wants to help, start with the Proposal and Project Status.

We Welcome Contributions

Community contributions are core to this project:

add parser adapters and source definitions
improve sanitization and safety coverage
expand tests and CI reliability
improve docs and onboarding

Start with CLI Usage and Parser Adapters.