dfaf1b77 refactor: rename muse-work/ → state/ Gabriel Cardona <gabriel@tellurstori.com> 8h ago
# Muse VCS — Architecture Reference

> **Version:** v0.1.2
>
> **See also:** [Plugin Authoring Guide](../guide/plugin-authoring-guide.md) · [CRDT Reference](../guide/crdt-reference.md) · [E2E Walkthrough](muse-e2e-demo.md) · [Plugin Protocol](../protocol/muse-protocol.md) · [Domain Concepts](../protocol/muse-domain-concepts.md) · [Type Contracts](../reference/type-contracts.md)

---

## What Muse Is

Muse is a **domain-agnostic version control system for multidimensional state**. It provides
a complete DAG engine — content-addressed objects, commits, branches, three-way merge, drift
detection, time-travel checkout, and a full log graph — with one deliberate gap: it does not
know what "state" is.

That gap is the plugin slot. A `MuseDomainPlugin` tells Muse how to interpret your domain's
data. Everything else — the DAG, object store, branching, lineage walking, log, merge state
machine — is provided by the core engine and shared across all domains.

Muse adds **four layers of semantic richness** on top of that base. Phases 1 and 2 are part of
the required `MuseDomainPlugin` protocol; Phases 3 and 4 are optional protocol extensions that
plugins can adopt without breaking anything:

| Phase | Protocol | What you gain |
|-------|----------|---------------|
| 1 — Typed Delta Algebra | `MuseDomainPlugin` (required) | Rich, typed operation lists instead of opaque file diffs |
| 2 — Domain Schema | `MuseDomainPlugin.schema()` (required) | Algorithm selection driven by declared data structure |
| 3 — OT Merge Engine | `StructuredMergePlugin` (optional) | Sub-file auto-merge using Operational Transformation |
| 4 — CRDT Semantics | `CRDTPlugin` (optional) | Convergent join — no conflicts ever possible |

---

## The Seven Invariants

```
State    = a serializable, content-addressed snapshot of any multidimensional space
Commit   = a named delta from a parent state, recorded in a DAG
Branch   = a divergent line of intent forked from a shared ancestor
Merge    = three-way reconciliation of two divergent state lines against a common base
Drift    = the gap between committed state and live state
Checkout = deterministic reconstruction of any historical state from the DAG
Lineage  = the causal chain from root to any commit
```

None of those definitions contain the word "music."

---

## Repository Structure on Disk

Every Muse repository keeps its history and metadata in a `.muse/` directory next to the
working tree:

```
.muse/
  repo.json              — repository ID, domain name, creation metadata
  HEAD                   — ref pointer, e.g. refs/heads/main
  config.toml            — optional local config (auth token, remotes)
  refs/
    heads/
      main               — SHA-256 commit ID of branch HEAD
      feature/…          — additional branch HEADs
  objects/
    <sha2>/              — shard directory (first 2 hex chars)
      <sha62>            — raw content-addressed blob (remaining 62 hex chars)
  commits/
    <commit_id>.json     — CommitRecord (includes structured_delta since Phase 1)
  snapshots/
    <snapshot_id>.json   — SnapshotRecord (manifest: {path → object_id})
  tags/
    <tag_id>.json        — TagRecord
  MERGE_STATE.json       — present only during an active merge conflict
state/                   — the working tree (domain files live here)
.museattributes          — optional: per-path merge strategy overrides (TOML)
.museignore              — optional: snapshot exclusion rules (TOML, domain-scoped)
```

The object store mirrors Git's loose-object layout. Sharding by the first two hex characters
of each SHA-256 digest spreads blobs across at most 256 subdirectories. As a result, no single
directory grows large enough to slow down lookups as the repository grows.
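The following minimal sketch shows the sharded layout. It assumes that blobs are stored uncompressed under `.muse/objects/<first 2 hex>/<remaining 62 hex>`, as in the tree above. `object_path` and `write_object` are hypothetical helpers, not the actual API of `object_store.py`.

```python
import hashlib
import pathlib


def object_path(objects_dir: pathlib.Path, object_id: str) -> pathlib.Path:
    """Map a 64-char SHA-256 hex ID to <objects_dir>/<first 2 chars>/<last 62 chars>."""
    return objects_dir / object_id[:2] / object_id[2:]


def write_object(objects_dir: pathlib.Path, data: bytes) -> str:
    """Store `data` under its own digest and return the digest.

    Writing the same bytes twice is a no-op: the path is derived from the
    content, so an existing file already holds exactly these bytes.
    """
    object_id = hashlib.sha256(data).hexdigest()
    dest = object_path(objects_dir, object_id)
    if not dest.exists():
        dest.parent.mkdir(parents=True, exist_ok=True)
        dest.write_bytes(data)
    return object_id
```

A production store would typically write to a temporary file and rename it into place. That way, a crash never leaves a truncated blob under a valid ID. The sketch skips this step for brevity.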

---

## Core Engine Modules

```
muse/
  domain.py              — all protocol definitions and shared type aliases
  core/
    store.py             — file-based commit / snapshot / tag CRUD
    repo.py              — repository detection (MUSE_REPO_ROOT or directory walk)
    snapshot.py          — content-addressed snapshot and commit ID derivation
    object_store.py      — SHA-256 blob storage under .muse/objects/
    merge_engine.py      — three-way merge + CRDT join entry points
    op_transform.py      — Operational Transformation (Phase 3)
    schema.py            — DomainSchema TypedDicts (Phase 2)
    diff_algorithms/     — LCS, tree-edit, numerical, set diff (Phase 2)
    crdts/               — VectorClock, LWWRegister, ORSet, RGA, AWMap, GCounter (Phase 4)
    errors.py            — ExitCode enum
    attributes.py        — .museattributes loading and strategy resolution
  plugins/
    registry.py          — domain name → MuseDomainPlugin instance
    music/
      plugin.py          — MidiPlugin: reference implementation of all protocols
      midi_diff.py       — note-level MIDI diff and MIDI reconstruction
    scaffold/
      plugin.py          — copy-paste template for new domain plugins
  cli/
    app.py               — Typer application root, command registration
    commands/            — one file per Tier 2 command; plumbing/ for Tier 1
```

---

## Deterministic ID Derivation

All IDs are SHA-256 digests — the DAG is fully content-addressed:

```
object_id   = sha256(raw_file_bytes)
snapshot_id = sha256(sorted("path:object_id\n" pairs))
commit_id   = sha256(sorted_parent_ids | snapshot_id | message | timestamp_iso)
```

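The same manifest always produces the same `snapshot_id`, so two commits that point to
identical state share one snapshot. Because `commit_id` also hashes the message and timestamp,
recommitting identical state later yields a new commit ID that reuses the existing snapshot.
Objects are never overwritten, so writes are always idempotent.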
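The derivation rules above can be sketched as follows. The exact byte encoding is an assumption. For snapshots, the sketch joins sorted `path:object_id\n` lines. For commits, it uses `|` as the field separator and comma-joins the sorted parents. The real `snapshot.py` may serialise differently.

```python
import hashlib


def snapshot_id(manifest: dict[str, str]) -> str:
    """Hash a {path: object_id} manifest; sorting makes the ID independent of dict order."""
    lines = sorted(f"{path}:{object_id}\n" for path, object_id in manifest.items())
    return hashlib.sha256("".join(lines).encode("utf-8")).hexdigest()


def commit_id(parent_ids: list[str], snap_id: str, message: str, timestamp_iso: str) -> str:
    """Hash sorted parents, snapshot ID, message and timestamp into one commit ID."""
    fields = [",".join(sorted(parent_ids)), snap_id, message, timestamp_iso]
    return hashlib.sha256("|".join(fields).encode("utf-8")).hexdigest()
```

Sorting in both functions is what makes the IDs deterministic: neither the manifest's insertion order nor the order in which parents are listed affects the result.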

---

## Phase 1 — Typed Delta Algebra

Every commit now carries a `structured_delta: StructuredDelta` alongside the snapshot
manifest. A `StructuredDelta` is a list of typed `DomainOp` entries:

| Op type | Meaning |
|---------|---------|
| `InsertOp` | An element was added at a position |
| `DeleteOp` | An element was removed |
| `MoveOp` | An element was repositioned |
| `ReplaceOp` | An element's value changed (before/after content hashes) |
| `PatchOp` | A container was internally modified (carries child ops recursively) |

This replaces the old opaque `{added, removed, modified}` path lists entirely. Every operation
carries a `content_id` (SHA-256 hash of the element), an `address` (domain-specific location),
and a `content_summary` (human-readable description for `muse show`).

`muse show <commit>` and `muse diff` display note-level diffs for MIDI files — not just "file
changed" but "3 notes added at bar 4, 1 note removed from bar 7."
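The shape of these operations can be sketched with `TypedDict`s, in keeping with the choice to avoid Pydantic. Only `address`, `content_id`, and `content_summary` come from the description above. The `op` tag, the `position` field, and the `summarize` helper are hypothetical. The sketch models only two of the five op types.

```python
from typing import Literal, TypedDict, Union


class InsertOp(TypedDict):
    op: Literal["insert"]
    address: str          # domain-specific location, e.g. a track path
    position: int         # index within the container (assumed field)
    content_id: str       # SHA-256 of the inserted element
    content_summary: str  # human-readable text for `muse show`


class DeleteOp(TypedDict):
    op: Literal["delete"]
    address: str
    position: int
    content_id: str
    content_summary: str


# Simplified stand-in: the full union also covers MoveOp, ReplaceOp and PatchOp.
DomainOp = Union[InsertOp, DeleteOp]


def summarize(ops: list[DomainOp]) -> str:
    """Render a one-line summary in the spirit of `muse show`."""
    added = sum(1 for op in ops if op["op"] == "insert")
    removed = sum(1 for op in ops if op["op"] == "delete")
    return f"{added} added, {removed} removed"
```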

---

## Phase 2 — Domain Schema & Diff Algorithm Library

Plugins implement `schema() -> DomainSchema` to declare the structural shape of their data.
The schema drives algorithm selection in `diff_by_schema()`:

| Schema kind | Diff algorithm | Use when… |
|-------------|---------------|-----------|
| `"sequence"` | Myers LCS | Ordered lists (note events, DNA sequences) |
| `"tree"` | LCS-based tree edit | Hierarchical structures (scene graphs, XML) |
| `"tensor"` | Epsilon-tolerant numerical | N-dimensional arrays (simulation grids) |
| `"set"` | Hash-set algebra | Unordered collections (annotation sets) |
| `"map"` | Per-key comparison | Key-value maps (manifests, configs) |

`DomainSchema.merge_mode` controls which merge path the core engine takes:
- `"three_way"` — classic three-way merge (Phases 1–3)
- `"crdt"` — convergent CRDT join (Phase 4)
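Below is a toy version of schema-driven dispatch that covers three of the five kinds. The function names are hypothetical, and ops are simplified to `(kind, payload)` tuples. `diff_sequence` uses a plain O(n·m) dynamic-programming LCS rather than the Myers algorithm named in the table. Both yield a minimal insert/delete script, but Myers is much faster when the inputs differ only slightly.

```python
from typing import Callable

Op = tuple[str, object]  # ("insert" | "delete" | "replace", payload)


def diff_sequence(base: list[str], target: list[str]) -> list[Op]:
    """Ordered lists: keep the longest common subsequence, emit the rest."""
    n, m = len(base), len(target)
    # lcs[i][j] = length of the LCS of base[i:] and target[j:]
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if base[i] == target[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])
    ops: list[Op] = []
    i = j = 0
    while i < n and j < m:
        if base[i] == target[j]:
            i, j = i + 1, j + 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            ops.append(("delete", base[i]))
            i += 1
        else:
            ops.append(("insert", target[j]))
            j += 1
    ops += [("delete", x) for x in base[i:]]
    ops += [("insert", x) for x in target[j:]]
    return ops


def diff_set(base: frozenset[str], target: frozenset[str]) -> list[Op]:
    """Unordered collections: plain set difference in both directions."""
    return [("delete", x) for x in sorted(base - target)] + [
        ("insert", x) for x in sorted(target - base)
    ]


def diff_map(base: dict[str, str], target: dict[str, str]) -> list[Op]:
    """Key-value maps: compare key by key."""
    ops: list[Op] = []
    for key in sorted(base.keys() | target.keys()):
        if key not in target:
            ops.append(("delete", key))
        elif key not in base:
            ops.append(("insert", key))
        elif base[key] != target[key]:
            ops.append(("replace", key))
    return ops


DIFFERS: dict[str, Callable[..., list[Op]]] = {
    "sequence": diff_sequence,
    "set": diff_set,
    "map": diff_map,
}


def diff_by_kind(kind: str, base: object, target: object) -> list[Op]:
    """Pick the algorithm from the declared schema kind, not from file contents."""
    return DIFFERS[kind](base, target)
```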

---

## Phase 3 — Operation-Level Merge Engine

Plugins that implement `StructuredMergePlugin` gain sub-file auto-merge:

```python
import pathlib
from typing import Protocol, runtime_checkable

# MuseDomainPlugin, StateSnapshot, DomainOp and MergeResult are Muse's own types.


@runtime_checkable
class StructuredMergePlugin(MuseDomainPlugin, Protocol):
    def merge_ops(
        self,
        base: StateSnapshot,
        ours_snap: StateSnapshot,
        theirs_snap: StateSnapshot,
        ours_ops: list[DomainOp],
        theirs_ops: list[DomainOp],
        *,
        repo_root: pathlib.Path | None = None,
    ) -> MergeResult: ...
```

The core merge engine detects this with `isinstance(plugin, StructuredMergePlugin)` and calls
`merge_ops()` when both branches have `StructuredDelta`. Non-supporting plugins fall back to
file-level `merge()` automatically.

### Operational Transformation (`muse/core/op_transform.py`)

| Function | Purpose |
|----------|---------|
| `ops_commute(a, b)` | Returns `True` when two ops can be applied in either order |
| `transform(a, b)` | Adjusts positions so that applying both ops in either order yields the same state (the diamond property) |
| `merge_op_lists(base, ours, theirs)` | Three-way OT merge; returns `MergeOpsResult` |
| `merge_structured(base_delta, ours_delta, theirs_delta)` | Wrapper for `StructuredDelta` inputs |

**Commutativity rules (all 25 pairings of the five op types covered):**
- Different addresses → always commute
- `InsertOp` + `InsertOp` at same position → conflict
- `DeleteOp` + `DeleteOp` same content_id → idempotent (not a conflict)
- `PatchOp` + `PatchOp` → recursive check on child ops
- Cross-type pairs → generally commute (structural independence)
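The diamond property for two concurrent inserts can be sketched in a few lines. This is a simplified stand-in for `op_transform.py`. It assumes that inserts address a flat list by integer position. The `Insert` dataclass and the helper names are hypothetical. Here, `ops_commute` means "mergeable after `transform`".

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Insert:
    address: str     # which container (e.g. a track) the op targets
    position: int    # index in that container before the op is applied
    content_id: str  # the element being inserted


def ops_commute(a: Insert, b: Insert) -> bool:
    """True when a and b can be merged automatically; the same slot is a conflict."""
    return a.address != b.address or a.position != b.position


def transform(a: Insert, b: Insert) -> Insert:
    """Rewrite `a` so it can be applied after `b` has already been applied."""
    if a.address == b.address and b.position < a.position:
        # b inserted an element before a's slot, shifting that slot right by one.
        return replace(a, position=a.position + 1)
    return a


def apply_insert(seq: list[str], op: Insert) -> list[str]:
    return seq[: op.position] + [op.content_id] + seq[op.position :]


base = ["x", "y", "z"]
ours = Insert("track", 1, "A")
theirs = Insert("track", 2, "B")
# Both paths around the diamond converge on ["x", "A", "y", "B", "z"].
left = apply_insert(apply_insert(base, ours), transform(theirs, ours))
right = apply_insert(apply_insert(base, theirs), transform(ours, theirs))
```

When both inserts target the same slot, neither order is privileged. This is why the rules above report that case as a conflict instead of picking a winner.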

---

## Phase 4 — CRDT Semantics

Plugins that implement `CRDTPlugin` replace three-way merge with a mathematical `join` on a
lattice. **`join` always succeeds — no conflict state ever exists.**

```python
from typing import Protocol, runtime_checkable

# MuseDomainPlugin, StateSnapshot, CRDTDimensionSpec and CRDTSnapshotManifest
# are Muse's own types.


@runtime_checkable
class CRDTPlugin(MuseDomainPlugin, Protocol):
    def crdt_schema(self) -> list[CRDTDimensionSpec]: ...
    def join(self, a: CRDTSnapshotManifest, b: CRDTSnapshotManifest) -> CRDTSnapshotManifest: ...
    def to_crdt_state(self, snapshot: StateSnapshot) -> CRDTSnapshotManifest: ...
    def from_crdt_state(self, crdt: CRDTSnapshotManifest) -> StateSnapshot: ...
```

Entry point: `crdt_join_snapshots()` in `merge_engine.py`.

### CRDT Primitive Library (`muse/core/crdts/`)

| Primitive | File | Best for |
|-----------|------|----------|
| `VectorClock` | `vclock.py` | Causal ordering between agents |
| `LWWRegister` | `lww_register.py` | Scalar values; last write wins |
| `ORSet` | `or_set.py` | Unordered sets; a concurrent add beats a remove |
| `RGA` | `rga.py` | Ordered sequences (collaborative editing) |
| `AWMap` | `aw_map.py` | Key-value maps; a concurrent add beats a remove |
| `GCounter` | `g_counter.py` | Monotonically increasing counters |

For all six, the join operation is commutative, associative, and idempotent. These are the
three semilattice laws that guarantee convergence regardless of message delivery order.
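The lattice laws are easiest to see on the grow-only counter. Here is a minimal sketch that assumes a per-agent map of counts. This hypothetical `GCounter` is illustrative and is not the class in `g_counter.py`.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class GCounter:
    """Grow-only counter: one monotonically increasing slot per agent."""

    counts: dict[str, int] = field(default_factory=dict)

    def increment(self, agent: str, n: int = 1) -> "GCounter":
        if n < 0:
            raise ValueError("a GCounter can only grow")
        new = dict(self.counts)
        new[agent] = new.get(agent, 0) + n
        return GCounter(new)

    def join(self, other: "GCounter") -> "GCounter":
        # Element-wise max is commutative, associative and idempotent.
        agents = self.counts.keys() | other.counts.keys()
        return GCounter({a: max(self.counts.get(a, 0), other.counts.get(a, 0)) for a in agents})

    def value(self) -> int:
        return sum(self.counts.values())
```

Element-wise `max` is what makes `join` safe to apply any number of times, in any order. The counter's value is simply the sum across agents.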

### When to use CRDT mode

| Scenario | Recommendation |
|----------|----------------|
| Human-paced commits (once per hour/day) | Three-way merge (Phases 1–3) |
| Many agents writing concurrently (sub-second) | CRDT mode |
| Shared annotation sets (many simultaneous contributors) | CRDT `ORSet` |
| Collaborative score editing (DAW-style) | CRDT `RGA` |
| Per-dimension mix | Set `merge_mode="crdt"` per `CRDTDimensionSpec` |

---

## The Full Plugin Protocol Stack

```
MuseDomainPlugin              ← required by every domain plugin
  ├── schema()                ← Phase 2: declare data structure
  ├── snapshot()              ← capture current live state
  ├── diff()                  ← compute typed StructuredDelta
  ├── drift()                 ← detect uncommitted changes
  ├── apply()                 ← apply delta to working tree
  └── merge()                 ← three-way merge (fallback)

StructuredMergePlugin         ← optional Phase 3 extension
  └── merge_ops()             ← operation-level OT merge

CRDTPlugin                    ← optional Phase 4 extension
  ├── crdt_schema()           ← declare per-dimension CRDT types
  ├── join()                  ← convergent lattice join
  ├── to_crdt_state()         ← lift plain snapshot to CRDT state
  └── from_crdt_state()       ← materialise CRDT state back to snapshot
```

The core engine detects capabilities at runtime via `isinstance`:

```python
schema = plugin.schema()
if isinstance(plugin, CRDTPlugin) and schema["merge_mode"] == "crdt":
    # Phase 4: convergent lattice join, never produces a conflict
    return crdt_join_snapshots(plugin, ...)
elif isinstance(plugin, StructuredMergePlugin):
    # Phase 3: operation-level OT merge over both branches' structured deltas
    return plugin.merge_ops(base, ours_snap, theirs_snap, ours_ops, theirs_ops)
else:
    # Fallback: file-level three-way merge that every plugin implements
    return plugin.merge(base, ours_snap, theirs_snap)
```
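Capability detection relies on `typing.runtime_checkable`. Below is a self-contained sketch of the pattern with hypothetical stand-in protocols (`BasePlugin`, `OpMergePlugin`) and plugins. Note that `isinstance` only checks that the methods exist, not their signatures. Static checking with mypy is what catches a mis-typed `merge_ops`.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class BasePlugin(Protocol):  # stand-in for MuseDomainPlugin
    def merge(self, base: dict, ours: dict, theirs: dict) -> dict: ...


@runtime_checkable
class OpMergePlugin(BasePlugin, Protocol):  # stand-in for StructuredMergePlugin
    def merge_ops(self, base: dict, ours: dict, theirs: dict) -> dict: ...


class FilePlugin:
    """Implements only the required fallback; no explicit inheritance needed."""

    def merge(self, base: dict, ours: dict, theirs: dict) -> dict:
        return {**base, **theirs, **ours}  # toy merge: ours wins on key clashes


class OTPlugin(FilePlugin):
    """Also provides merge_ops, so it matches the richer protocol."""

    def merge_ops(self, base: dict, ours: dict, theirs: dict) -> dict:
        return self.merge(base, ours, theirs)


def pick_merge_path(plugin: BasePlugin) -> str:
    return "merge_ops" if isinstance(plugin, OpMergePlugin) else "merge"
```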

---

## How CLI Commands Use the Plugin

| Command | Plugin method(s) called |
|---------|------------------------|
| `muse commit` | `snapshot()`, `diff()` (for structured_delta) |
| `muse status` | `drift()` |
| `muse diff` | `diff()` |
| `muse show` | reads stored `structured_delta` |
| `muse merge` | `merge_ops()` or `merge()` (capability detection) |
| `muse cherry-pick` | `merge()` |
| `muse stash` | `snapshot()` |
| `muse checkout` | `diff()` + `apply()` |
| `muse domains` | `schema()`, capability introspection |

---

## Adding a New Domain — Quick Reference

1. Copy `muse/plugins/scaffold/plugin.py` → `muse/plugins/<domain>/plugin.py`
2. Implement all methods (every `raise NotImplementedError` must be replaced)
3. Register in `muse/plugins/registry.py`
4. Run `muse init --domain <domain>` in any project directory
5. All domain-agnostic CLI commands (Tiers 1 and 2) work immediately

See the full [Plugin Authoring Guide](../guide/plugin-authoring-guide.md) for a step-by-step
walkthrough covering Phases 1–4 with examples.
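Step 3 amounts to mapping a domain name to a plugin instance. The following self-contained sketch shows that lookup, assuming a plain dict. `GenomicsPlugin`, the `PLUGINS` table, `resolve_plugin`, and the schema keys shown are all hypothetical. They do not reflect the contents of `registry.py`.

```python
from typing import Protocol


class DomainPlugin(Protocol):  # stand-in for MuseDomainPlugin
    def schema(self) -> dict[str, str]: ...


class GenomicsPlugin:  # hypothetical domain created from the scaffold
    def schema(self) -> dict[str, str]:
        return {"kind": "sequence", "merge_mode": "three_way"}


PLUGINS: dict[str, DomainPlugin] = {"genomics": GenomicsPlugin()}


def resolve_plugin(domain: str) -> DomainPlugin:
    """Look up a plugin by domain name (e.g. the one recorded in repo.json)."""
    try:
        return PLUGINS[domain]
    except KeyError:
        known = ", ".join(sorted(PLUGINS))
        raise ValueError(f"unknown domain {domain!r}; registered: {known}") from None
```

Failing loudly with the list of registered names makes a typo in `muse init --domain` easy to diagnose.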

---

## CLI Command Reference

Muse uses a **three-tier command architecture**. See [`docs/reference/cli-tiers.md`](../reference/cli-tiers.md) for the full specification and JSON output schemas.

### Tier 1 — Plumbing (`muse plumbing …`)

Machine-readable, JSON-outputting, pipeable primitives. Designed for scripts, agents, and CI automation.

| Command | Description |
|---------|-------------|
| `muse plumbing hash-object <file>` | SHA-256 a file; optionally store it |
| `muse plumbing cat-object <id>` | Emit raw bytes of a stored object |
| `muse plumbing rev-parse <ref>` | Resolve branch/HEAD/prefix → commit_id |
| `muse plumbing ls-files [<ref>]` | List tracked files and object IDs |
| `muse plumbing read-commit <id>` | Emit full commit JSON |
| `muse plumbing read-snapshot <id>` | Emit full snapshot JSON |
| `muse plumbing commit-tree <snap_id>` | Create a commit from an explicit snapshot |
| `muse plumbing update-ref <branch> <id>` | Move a branch HEAD |
| `muse plumbing commit-graph` | Emit commit DAG as JSON |
| `muse plumbing pack-objects <ids…>` | Build a PackBundle JSON to stdout |
| `muse plumbing unpack-objects` | Read PackBundle JSON from stdin, write to store |
| `muse plumbing ls-remote [remote]` | List branch refs on a remote |

### Tier 2 — Core Porcelain (top-level `muse …`)

Human and agent VCS commands. Domain-agnostic; delegate to `muse.core.*`.

| Command | Description |
|---------|-------------|
| `muse init [--domain <name>]` | Initialize a repository |
| `muse commit -m <msg>` | Snapshot live state and record a commit |
| `muse status` | Show drift between HEAD and working tree |
| `muse diff [<base>] [<target>]` | Show delta between commits or vs. working tree |
| `muse log [--oneline] [--graph] [--stat]` | Display commit history |
| `muse show [<ref>] [--json] [--stat]` | Inspect a single commit with operation-level detail |
| `muse branch [<name>] [-d <name>]` | Create or delete branches |
| `muse checkout <branch\|commit> [-b]` | Switch branches or restore historical state |
| `muse merge <branch>` | Three-way merge (or CRDT join, capability-detected) |
| `muse cherry-pick <commit>` | Apply a specific commit's delta on top of HEAD |
| `muse revert <commit>` | Create a new commit undoing a prior commit |
| `muse reset <commit> [--hard]` | Move branch pointer |
| `muse stash` / `pop` / `list` / `drop` | Temporarily shelve uncommitted changes |
| `muse tag add <tag> [<ref>]` | Tag a commit |
| `muse tag list [<ref>]` | List tags |
| `muse domains` | Show domain dashboard — registered domains, capabilities, schema |
| `muse remote add <name> <url>` | Register a named remote |
| `muse clone <url> [dir]` | Clone a remote repository |
| `muse fetch [remote]` | Download commits/snapshots/objects |
| `muse pull [remote]` | Fetch + three-way merge |
| `muse push [remote]` | Upload local commits/snapshots/objects |
| `muse check` | Domain-agnostic invariant check |
| `muse annotate` | CRDT-backed commit annotations |

### Tier 3 — Semantic Porcelain

Domain-specific commands that interpret multidimensional state. Each sub-namespace is served by the corresponding plugin.

#### MIDI domain (`muse midi …`)

| Command | Description |
|---------|-------------|
| `muse midi notes` | Every note in a track as musical notation |
| `muse midi note-log` | Note-level commit history |
| `muse midi note-blame` | Per-bar attribution |
| `muse midi harmony` | Chord analysis and key detection |
| `muse midi piano-roll` | ASCII piano roll visualization |
| `muse midi hotspots` | Bar-level churn leaderboard |
| `muse midi velocity-profile` | Dynamic range and velocity histogram |
| `muse midi transpose` | Transpose all notes by N semitones |
| `muse midi mix` | Combine two MIDI tracks into one |
| `muse midi query` | MIDI DSL predicate query over history |
| `muse midi check` | Enforce MIDI invariant rules |

#### Code domain (`muse code …`)

| Command | Description |
|---------|-------------|
| `muse code symbols` | List every symbol in a snapshot |
| `muse code symbol-log` | Full history of one symbol |
| `muse code detect-refactor` | Detect semantic refactoring operations |
| `muse code grep` | Search the symbol graph |
| `muse code blame` | Which commit last touched a symbol |
| `muse code hotspots` | Symbol churn leaderboard |
| `muse code stable` | Symbol stability leaderboard |
| `muse code coupling` | File co-change analysis |
| `muse code compare` | Deep semantic comparison between snapshots |
| `muse code languages` | Language and symbol-type breakdown |
| `muse code patch` | Surgical per-symbol modification |
| `muse code query` | Symbol graph predicate DSL |
| `muse code query-history` | Temporal symbol search |
| `muse code deps` | Import graph and call graph |
| `muse code find-symbol` | Cross-commit symbol search |
| `muse code impact` | Transitive blast-radius for a symbol |
| `muse code dead` | Dead code candidates |
| `muse code coverage` | Interface call-coverage |
| `muse code lineage` | Full provenance chain of a symbol |
| `muse code api-surface` | Public API surface at a commit |
| `muse code codemap` | Semantic topology of the codebase |
| `muse code clones` | Find duplicate symbols |
| `muse code checkout-symbol` | Restore a historical symbol version |
| `muse code semantic-cherry-pick` | Cherry-pick named symbols from history |
| `muse code index` | Manage local symbol indexes |
| `muse code breakage` | Detect structural breakage vs HEAD |
| `muse code invariants` | Enforce architectural rules |
| `muse code check` | Semantic invariant enforcement |

#### Coordination domain (`muse coord …`)

| Command | Description |
|---------|-------------|
| `muse coord reserve` | Advisory symbol reservation |
| `muse coord intent` | Declare an operation before executing |
| `muse coord forecast` | Predict merge conflicts |
| `muse coord plan-merge` | Dry-run semantic merge plan |
| `muse coord shard` | Partition codebase into parallel work zones |
| `muse coord reconcile` | Recommend merge ordering strategy |

---

## Remote Sync

Muse supports synchronizing repositories with a remote host (e.g. MuseHub)
through six commands built on a typed, swappable transport layer.

### Commands

| Command | Description |
|---------|-------------|
| `muse remote add <name> <url>` | Register a named remote connection |
| `muse clone <url> [dir]` | Create a local copy of a remote repository |
| `muse fetch [remote]` | Download commits/snapshots/objects (no merge) |
| `muse pull [remote]` | Fetch + three-way merge into current branch |
| `muse push [remote]` | Upload local commits/snapshots/objects |
| `muse plumbing ls-remote [remote]` | List branch refs on a remote (Tier 1 plumbing) |

### Transport Architecture

```
Muse CLI (client)                                  MuseHub (server)
─────────────────                                  ────────────────
MuseTransport Protocol
  └─ HttpTransport (urllib, stdlib)  ──HTTPS──►    GET  {url}/refs
                                                   POST {url}/fetch
                                                   POST {url}/push
```

The `MuseTransport` Protocol in `muse/core/transport.py` is the seam between CLI commands and
the HTTP implementation. Every command delegates to this Protocol, so MuseHub can move to
HTTP/2 or gRPC without touching command code. Only the transport implementation (currently
`HttpTransport`) has to change.
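The value of the seam shows up in tests: command code can run against an in-memory fake instead of the network. The following is a minimal sketch. `Transport`, `fetch_refs`, `InMemoryTransport`, and `ls_remote_lines` are hypothetical names, because this document does not show the real `MuseTransport` method set.

```python
from typing import Protocol


class Transport(Protocol):  # stand-in for MuseTransport
    def fetch_refs(self, url: str) -> dict[str, str]:
        """Return {branch: commit_id} for the remote at `url`."""
        ...


class InMemoryTransport:
    """Test double: satisfies the protocol without any HTTP."""

    def __init__(self, refs: dict[str, str]) -> None:
        self._refs = dict(refs)

    def fetch_refs(self, url: str) -> dict[str, str]:
        return dict(self._refs)


def ls_remote_lines(transport: Transport, url: str) -> list[str]:
    """Command logic depends only on the protocol, never on urllib."""
    refs = transport.fetch_refs(url)
    return [f"{commit_id}\t{branch}" for branch, commit_id in sorted(refs.items())]
```

Supporting a different wire protocol means writing another class with the same methods. `ls_remote_lines` itself stays untouched.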

### PackBundle Wire Format

The unit of exchange is a `PackBundle` (defined in `muse/core/pack.py`): a
JSON object carrying commits, snapshots, and base64-encoded object blobs.
`build_pack()` assembles a bundle from a set of local commit IDs;
`apply_pack()` writes a received bundle into the local `.muse/` directory in
dependency order (objects → snapshots → commits).

```
muse/core/pack.py
  ├─ build_pack(root, commit_ids, *, have) → PackBundle
  └─ apply_pack(root, bundle) → ApplyResult   # commits/snapshots/objects written + skipped
```
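The following in-memory sketch covers both halves. It assumes that `have` lists commit IDs the receiver already holds, and that each commit records its parents and its snapshot. All names and the dict-based store are hypothetical. Unlike a real implementation, this sketch re-sends every blob referenced by a new snapshot, even if the receiver already has it.

```python
import base64
from typing import TypedDict


class Commit(TypedDict):
    commit_id: str
    parent_ids: list[str]
    snapshot_id: str


class Bundle(TypedDict):
    commits: list[Commit]
    snapshots: dict[str, dict[str, str]]  # snapshot_id -> {path: object_id}
    objects: dict[str, str]               # object_id -> base64-encoded bytes


def build_bundle(
    commits: dict[str, Commit],
    snapshots: dict[str, dict[str, str]],
    objects: dict[str, bytes],
    wanted: list[str],
    have: set[str],
) -> Bundle:
    """Walk ancestry from `wanted`, stopping at commits the receiver already has."""
    picked: list[Commit] = []
    seen: set[str] = set()
    stack = list(wanted)
    while stack:
        cid = stack.pop()
        if cid in seen or cid in have:
            continue
        seen.add(cid)
        picked.append(commits[cid])
        stack.extend(commits[cid]["parent_ids"])
    snaps = {c["snapshot_id"]: snapshots[c["snapshot_id"]] for c in picked}
    oids = {oid for manifest in snaps.values() for oid in manifest.values()}
    blobs = {oid: base64.b64encode(objects[oid]).decode("ascii") for oid in sorted(oids)}
    return {"commits": picked, "snapshots": snaps, "objects": blobs}


def apply_bundle(store: dict[str, dict], bundle: Bundle) -> None:
    """Write objects, then snapshots, then commits, skipping anything already present.

    With this order, an interrupted apply never leaves a commit that points at a
    missing snapshot or blob.
    """
    for oid, b64 in bundle["objects"].items():
        store["objects"].setdefault(oid, base64.b64decode(b64))
    for sid, manifest in bundle["snapshots"].items():
        store["snapshots"].setdefault(sid, manifest)
    for commit in bundle["commits"]:
        store["commits"].setdefault(commit["commit_id"], commit)
```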

### Local State for Remotes

```
.muse/
  config.toml                [remotes.<name>] url + branch
  remotes/<name>/<branch>    last-known remote commit ID (tracking head)
```

See `docs/reference/remotes.md` for the full reference including the MuseHub
API contract, authentication, and tracking branch semantics.

---

## Testing & Verification

```bash
# Full test suite (1903 tests)
.venv/bin/pytest tests/ -v

# Type checking (zero errors required)
mypy muse/

# Typing audit (zero Any violations required)
python tools/typing_audit.py --dirs muse/ tests/ --max-any 0
```

CI runs all three gates on every PR to `dev` and on every `dev → main` merge.

---

## Key Design Decisions

**Why no `async`?** The CLI is synchronous by design. All algorithms are CPU-bound and
complete in bounded time. If a domain's data is too large to diff synchronously, the plugin
should chunk it — this is a domain concern, not a core concern.

**Why TypedDicts over Pydantic?** Zero external dependencies. All types are JSON-serialisable
by construction. `mypy --strict` verifies them without runtime overhead.

**Why content-addressed storage?** Objects are never overwritten. Checkout, revert, and
cherry-pick cost zero bytes when the target objects already exist. The object store scales to
millions of fine-grained sub-elements (individual notes, nucleotides, mesh vertices) without
format changes.

**Why four phases?** Each phase is independently useful. A plugin that only implements
Phase 1 gets rich operation-level `muse show` output. Phase 2 adds algorithm selection.
Phase 3 adds sub-file auto-merge. Phase 4 adds convergent multi-agent semantics. Adoption
is incremental.