Welcome

trace-share

Opt-in tooling to collect, sanitize, and publish coding-agent traces for open model training.

What This Project Is

trace-share is open infrastructure for building high-quality coding-agent datasets from real developer workflows.

The pipeline is designed to:

  • collect from approved local tool sources
  • sanitize data locally before any upload
  • convert traces into consistent Episode records
  • publish versioned dataset snapshots for model training
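As a rough illustration of how these four stages could fit together, here is a minimal Python sketch. It is not the project's actual implementation: the `Episode` fields, the `APPROVED_SOURCES` allowlist, the redaction patterns, and every function name are assumptions made up for this example, and a real sanitizer would need far broader coverage than two regular expressions.

```python
import hashlib
import json
import re
from dataclasses import asdict, dataclass

# Hypothetical allowlist of tool sources the contributor has approved.
APPROVED_SOURCES = {"example-tool"}

# Illustrative redaction patterns only (emails and home-directory paths).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
HOME_PATH_RE = re.compile(r"/(?:home|Users)/[^/\s]+")


@dataclass
class Episode:
    """Hypothetical record shape; the real Episode schema may differ."""
    source: str
    steps: list[str]


def sanitize(text: str) -> str:
    """Redact emails and user home directories before anything leaves the machine."""
    text = EMAIL_RE.sub("<EMAIL>", text)
    return HOME_PATH_RE.sub("/home/<USER>", text)


def to_episode(source: str, raw_steps: list[str]) -> Episode:
    """Collect from an approved source, sanitize locally, and normalize to an Episode."""
    if source not in APPROVED_SOURCES:
        raise ValueError(f"source not approved: {source}")
    return Episode(source=source, steps=[sanitize(step) for step in raw_steps])


def snapshot_id(episodes: list[Episode]) -> str:
    """Derive a deterministic version tag from the snapshot's content."""
    payload = json.dumps([asdict(e) for e in episodes], sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]
```

Calling `to_episode("example-tool", [...])` on raw trace lines yields an `Episode` whose steps are already redacted, and `snapshot_id` gives the same tag for the same content, which is one simple way to make snapshot versions reproducible.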

Why It Exists

Open-source coding models are limited by a lack of high-signal agent interaction data. Most public data is static code, not iterative workflows with tool use, debugging, and fix loops.

trace-share exists to close that gap with reproducible, consent-gated, and privacy-aware data contribution flows.

Who This Is For

  • model trainers building open coding assistants
  • companies that want to support shared model infrastructure
  • individual contributors who want to donate sanitized traces
  • maintainers building parser adapters for additional coding tools

We Are Looking For Support

This project is seeking support from companies and infrastructure partners for:

  • storage and egress credits for dataset artifacts
  • index/search hosting capacity
  • CI/release compute for larger snapshot builds
  • security and privacy review support

If your team wants to help, start with the Proposal and Project Status pages.

We Welcome Contributions

Community contributions are core to this project:

  • add parser adapters and source definitions
  • improve sanitization and safety coverage
  • expand tests and CI reliability
  • improve docs and onboarding

Start with the CLI Usage and Parser Adapters pages.

Start Here