# Muse Plugin Authoring Guide

> A complete walkthrough for building a domain plugin for Muse v0.1.1. By the end
> you will have a fully typed, schema-aware, OT-capable, CRDT-ready plugin that
> works with every `muse` CLI command immediately — no core changes needed.
>
> **Difficulty progression:** Core Protocol (30 min) → Domain Schema (30 min) → OT Merge (1 hr) → CRDT Semantics (1 hr)

---

## Table of Contents

1. [What a Plugin Is](#what-a-plugin-is)
2. [Quick Start — Copy the Scaffold](#quick-start--copy-the-scaffold)
3. [Core Protocol (Required)](#core-protocol-required)
4. [Domain Schema](#domain-schema)
5. [Operation-Level Merge (OT)](#operation-level-merge-ot)
6. [CRDT Semantics](#crdt-semantics)
7. [Registering Your Plugin](#registering-your-plugin)
8. [Testing Your Plugin](#testing-your-plugin)
9. [Checklist Before You Ship](#checklist-before-you-ship)

---

## What a Plugin Is

A Muse plugin is a Python class that implements one or more protocols defined in
`muse/domain.py`. The core engine treats every domain identically — it knows nothing
about your data. You teach it by implementing the protocol.

The protocol stack has four levels. You must implement the base level. The rest are
optional and add progressively richer capabilities:

```
Level 1: MuseDomainPlugin       ← required — basic VCS operations
Level 2: schema()               ← declares data structure, enables algorithm selection
Level 3: StructuredMergePlugin  ← enables sub-file OT merge
Level 4: CRDTPlugin             ← enables convergent multi-agent join
```

The reference implementation is `muse/plugins/midi/plugin.py`. Read it alongside this
guide — it shows every method with real implementation and full docstrings.
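
One way to picture the four levels is as capabilities that can be probed at runtime. The sketch below is only an illustration under that assumption: the four protocol classes, `NotesPlugin`, and the `capabilities` helper are hypothetical stand-ins, not the definitions in `muse/domain.py`, and the real engine may detect levels differently.

```python
import hashlib
from typing import Protocol, runtime_checkable


@runtime_checkable
class Base(Protocol):          # stand-in for level 1
    def snapshot(self, live_state: dict[str, bytes]) -> dict[str, str]: ...


@runtime_checkable
class HasSchema(Protocol):     # stand-in for level 2
    def schema(self) -> dict[str, str]: ...


@runtime_checkable
class HasMergeOps(Protocol):   # stand-in for level 3
    def merge_ops(self, ours_ops: list[str], theirs_ops: list[str]) -> list[str]: ...


@runtime_checkable
class HasJoin(Protocol):       # stand-in for level 4
    def join(self, a: dict[str, int], b: dict[str, int]) -> dict[str, int]: ...


class NotesPlugin:
    """Hypothetical plugin that implements levels 1 and 2 only."""

    def snapshot(self, live_state: dict[str, bytes]) -> dict[str, str]:
        return {path: hashlib.sha256(data).hexdigest() for path, data in live_state.items()}

    def schema(self) -> dict[str, str]:
        return {"domain": "notes", "version": "1.0"}


def capabilities(plugin: Base) -> list[str]:
    """Name every level whose methods the plugin provides (checks method names only)."""
    levels = [("core", Base), ("schema", HasSchema), ("ot", HasMergeOps), ("crdt", HasJoin)]
    return [name for name, proto in levels if isinstance(plugin, proto)]


found = capabilities(NotesPlugin())
```

Because `isinstance` against a runtime-checkable protocol only checks that the methods exist, a plugin gains a level simply by defining the corresponding methods; no base class or registration flag is needed in this sketch.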

---

## Quick Start — Copy the Scaffold

The fastest path to a working plugin:

```bash
cp -r muse/plugins/scaffold muse/plugins/<your_domain>
```

Then open `muse/plugins/<your_domain>/plugin.py` and replace every `raise NotImplementedError`
with real code. The scaffold includes:

- Full type annotations for all four protocol levels
- Docstrings explaining what each method must return
- Inline TODO comments marking exactly what to fill in
- Example implementations you can adapt

Register and test:

```bash
# Add to muse/plugins/registry.py (see Registering Your Plugin below)
muse init --domain <your_domain>
muse commit -m "initial state"
muse domains   # inspect your plugin's capabilities
```

---

## Core Protocol (Required)

Every plugin must implement six methods: the five described in this section plus `schema()`,
covered under [Domain Schema](#domain-schema). All are synchronous. None may import from
`muse.core.*` — the core engine calls you, not the other way around.

### Types you work with

```python
LiveState = pathlib.Path | dict[str, bytes]
StateSnapshot = dict[str, str]        # {path: object_id (sha256 hex)}
StateDelta = StructuredDelta          # {"ops": [DomainOp, ...]}
DriftReport = dict[str, list[str]]    # {"added": [...], "removed": [...], "modified": [...]}
```

### `snapshot(live_state) -> StateSnapshot`

Capture the current state of the working tree. The engine calls this on every `muse commit`.

**Contract:**
- Must be deterministic — same input always produces the same manifest
- Must hash every element that can independently change
- Must return a `dict` whose values are SHA-256 hex digests (object IDs)

```python
import hashlib
import pathlib

def snapshot(self, live_state: LiveState) -> StateSnapshot:
    """Walk live_state and return {path: sha256_hex} for every versioned element."""
    if isinstance(live_state, pathlib.Path):
        manifest: dict[str, str] = {}
        # sorted() gives a stable iteration order regardless of filesystem
        for p in sorted(live_state.rglob("*.your_extension")):
            raw = p.read_bytes()
            sha = hashlib.sha256(raw).hexdigest()
            manifest[str(p.relative_to(live_state))] = sha
        return manifest
    # dict[str, bytes] path — used by internal tests
    return {
        k: hashlib.sha256(v).hexdigest()
        for k, v in live_state.items()
    }
```

### `diff(base, target) -> StateDelta`

Compute the minimal delta between two snapshots. The engine calls this for `muse diff`,
`muse show`, and as the first step of `muse commit` (to build `structured_delta`).

**Contract:**
- Must return a `StructuredDelta`, which wraps a `list[DomainOp]` of typed operations under `ops`
- Should be as granular as makes sense for your domain
- For sequences, use `diff_by_schema()` from `muse.core.diff_algorithms`

```python
# diff_by_schema is for sequence-level diffs; this file-level example does not call it
from muse.core.diff_algorithms import diff_by_schema
from muse.domain import StructuredDelta, DomainOp, InsertOp, DeleteOp, ReplaceOp

def diff(self, base: StateSnapshot, target: StateSnapshot) -> StateDelta:
    ops: list[DomainOp] = []
    base_paths = set(base)
    target_paths = set(target)

    # Paths only in target were added
    for path in sorted(target_paths - base_paths):
        ops.append(InsertOp(
            op="insert",
            address=path,
            position=None,
            content_id=target[path],
            content_summary=f"added {path}",
        ))
    # Paths only in base were removed
    for path in sorted(base_paths - target_paths):
        ops.append(DeleteOp(
            op="delete",
            address=path,
            content_id=base[path],
            content_summary=f"removed {path}",
        ))
    # Paths in both were modified if their object IDs differ
    for path in sorted(base_paths & target_paths):
        if base[path] != target[path]:
            ops.append(ReplaceOp(
                op="replace",
                address=path,
                before_content_id=base[path],
                after_content_id=target[path],
                content_summary=f"modified {path}",
            ))
    return StructuredDelta(ops=ops)
```

### `merge(base, left, right, *, repo_root) -> MergeResult`

Three-way merge. The engine calls this for `muse merge` when the plugin does not implement
`StructuredMergePlugin`. Implement this even if you plan to implement OT merge — it is the
fallback for `muse cherry-pick`.

**Contract** (fields of the returned `MergeResult`):
- `merged` — the snapshot that results from reconciling left and right
- `conflicts` — list of paths that could not be auto-resolved
- `applied_strategies` — optional metadata about what resolution was applied
- `dimension_reports` — optional per-dimension auto-merge notes

```python
import pathlib

from muse.domain import MergeResult

def merge(
    self,
    base: StateSnapshot,
    left: StateSnapshot,
    right: StateSnapshot,
    *,
    repo_root: pathlib.Path | None = None,
) -> MergeResult:
    merged: dict[str, str] = dict(base)
    conflicts: list[str] = []

    all_paths = set(base) | set(left) | set(right)
    for path in sorted(all_paths):
        b, l, r = base.get(path), left.get(path), right.get(path)

        if l == r:        # both sides agree (including both deleted)
            winner = l
        elif b == l:      # only right changed (edit or deletion)
            winner = r
        elif b == r:      # only left changed (edit or deletion)
            winner = l
        else:             # both changed differently
            conflicts.append(path)
            winner = l or r   # keep left's version as a placeholder, else right's

        # None means the path is absent from the merged snapshot
        if winner is None:
            merged.pop(path, None)
        else:
            merged[path] = winner

    return MergeResult(
        merged=merged,
        conflicts=conflicts,
        applied_strategies={},
        dimension_reports={},
    )
```

### `drift(committed, live) -> DriftReport`

Report how much the working tree has diverged from the last committed snapshot.
The engine calls this for `muse status`.

```python
def drift(self, committed: StateSnapshot, live: LiveState) -> DriftReport:
    current = self.snapshot(live)
    delta = self.diff(committed, current)
    added = [op["address"] for op in delta["ops"] if op["op"] == "insert"]
    removed = [op["address"] for op in delta["ops"] if op["op"] == "delete"]
    modified = [op["address"] for op in delta["ops"] if op["op"] in ("replace", "patch")]
    return {"added": added, "removed": removed, "modified": modified}
```

### `apply(delta, live_state) -> LiveState`

Apply a delta to the working tree. The engine calls this at the end of `muse checkout`
for any domain-level post-processing after the file-level restore has already happened.

```python
def apply(self, delta: StateDelta, live_state: LiveState) -> LiveState:
    # For most domains: files are already restored by the engine.
    # Return live_state unchanged unless you need post-processing.
    return live_state
```

---

## Domain Schema

Implement `schema() -> DomainSchema` to declare the structural shape of your data.
This enables `diff_by_schema()` to automatically select the best diff algorithm for
each dimension, and powers the `muse domains` dashboard.

### Schema TypedDicts

```python
# All defined in muse/core/schema.py
from typing import Literal, TypedDict

# ElementSchema and DimensionSpec come first because DomainSchema refers to them
ElementSchema = TypedDict("ElementSchema", {
    "name": str,
    "kind": Literal["sequence", "tree", "tensor", "set", "map"],
    "description": str,
})

DimensionSpec = TypedDict("DimensionSpec", {
    "name": str,
    "element": str,
    "description": str,
})

DomainSchema = TypedDict("DomainSchema", {
    "domain": str,
    "version": str,
    "merge_mode": Literal["three_way", "crdt"],
    "elements": list[ElementSchema],
    "dimensions": list[DimensionSpec],
})
```

### Choosing `kind` for each element

| Your data | Use `kind` | Diff algorithm |
|-----------|-----------|----------------|
| Ordered list of events (rows, notes, steps) | `"sequence"` | Myers LCS — O(nd) |
| Hierarchical tree (DOM, JSON tree, scene graph) | `"tree"` | LCS-based tree edit |
| N-dimensional numeric array | `"tensor"` | Epsilon-tolerant numerical |
| Unordered collection (labels, tags, gene sets) | `"set"` | Set algebra |
| Key-value dict (parameters, config) | `"map"` | Per-key comparison |
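
To make the last three rows concrete, here is a rough sketch of what "set algebra", "per-key comparison", and "epsilon-tolerant" mean in practice. These three functions are illustrative assumptions written for this guide, not the algorithms in `muse.core.diff_algorithms`, and the tensor version assumes two flat lists of equal length.

```python
def diff_set(base: set[str], target: set[str]) -> dict[str, list[str]]:
    """Set algebra: only membership matters, order is ignored."""
    return {
        "added": sorted(target - base),
        "removed": sorted(base - target),
    }


def diff_map(base: dict[str, float], target: dict[str, float]) -> dict[str, list[str]]:
    """Per-key comparison: each key is added, removed, or modified on its own."""
    shared = base.keys() & target.keys()
    return {
        "added": sorted(target.keys() - base.keys()),
        "removed": sorted(base.keys() - target.keys()),
        "modified": sorted(k for k in shared if base[k] != target[k]),
    }


def diff_tensor(base: list[float], target: list[float], eps: float = 1e-6) -> list[int]:
    """Epsilon-tolerant: report only the indices that moved by more than eps."""
    return [i for i, (x, y) in enumerate(zip(base, target)) if abs(x - y) > eps]


tags = diff_set({"exon", "promoter"}, {"exon", "enhancer"})
params = diff_map({"lr": 0.1, "decay": 0.9}, {"lr": 0.2, "seed": 7.0})
moved = diff_tensor([1.0, 2.0, 3.0], [1.0, 2.0000001, 3.5])
```

The tensor case shows why `kind` matters: a byte-level hash would flag the second value as changed, while the epsilon-tolerant comparison treats it as numerical noise and reports only index 2.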

### Example — a genomics plugin schema

```python
from muse.core.schema import DomainSchema, ElementSchema, DimensionSpec

def schema(self) -> DomainSchema:
    return DomainSchema(
        domain="genomics",
        version="1.0",
        merge_mode="three_way",
        elements=[
            ElementSchema(
                name="nucleotide_sequence",
                kind="sequence",
                description="Ordered nucleotide positions in a chromosome",
            ),
            ElementSchema(
                name="annotation_set",
                kind="set",
                description="Gene ontology annotations on a locus",
            ),
            ElementSchema(
                name="expression_tensor",
                kind="tensor",
                description="3D array: sample × gene × timepoint expression values",
            ),
        ],
        dimensions=[
            DimensionSpec(
                name="sequence",
                element="nucleotide_sequence",
                description="The primary sequence dimension",
            ),
            DimensionSpec(
                name="annotations",
                element="annotation_set",
                description="Functional annotations",
            ),
            DimensionSpec(
                name="expression",
                element="expression_tensor",
                description="Quantitative expression data",
            ),
        ],
    )
```

---

## Operation-Level Merge (OT)

Implement `StructuredMergePlugin` to enable sub-file auto-merge using Operational
Transformation. When both sides have a `structured_delta`, the engine calls `merge_ops()`
instead of `merge()`.

### What OT gives you

Without OT merge: two branches that both modified the same file conflict at file granularity —
you get one conflict entry even if their changes are on completely different notes / rows / elements.

With OT merge: the engine computes which operations commute (can apply in either order with
the same result) and which don't. Non-commuting ops become the real, minimal conflict set.

### Protocol

```python
import pathlib

from muse.domain import StructuredMergePlugin, MergeResult, DomainOp

class YourPlugin(StructuredMergePlugin):
    def merge_ops(
        self,
        base: StateSnapshot,
        ours_snap: StateSnapshot,
        theirs_snap: StateSnapshot,
        ours_ops: list[DomainOp],
        theirs_ops: list[DomainOp],
        *,
        repo_root: pathlib.Path | None = None,
    ) -> MergeResult:
        from muse.core.op_transform import merge_op_lists
        result = merge_op_lists(
            base_ops=[],
            ours_ops=ours_ops,
            theirs_ops=theirs_ops,
        )

        if result.conflict_ops:
            # Conflicting addresses, de-duplicated and sorted for stable output
            conflicts = sorted({op["address"] for op in result.conflict_ops})
        else:
            conflicts = []

        # Build merged snapshot from merged ops + your base state.
        # _apply_ops is not shown here: it stands for your own helper that
        # replays the merged ops onto the base snapshot.
        merged = self._apply_ops(base, ours_snap, theirs_snap, result.merged_ops)
        return MergeResult(
            merged=merged,
            conflicts=conflicts,
            applied_strategies={},
            dimension_reports={},
        )
```

### Commutativity — what the engine checks

The function `ops_commute(a, b)` in `muse/core/op_transform.py` covers all 25 op-pair
combinations. Key rules:

| Op pair | Commute? | Reasoning |
|---------|----------|-----------|
| Any ops at different addresses | ✓ always | Orthogonal files/dimensions |
| `InsertOp` + `InsertOp` at same address, different positions | ✓ | Position-disjoint |
| `InsertOp` + `InsertOp` at same address, same position | ✗ conflict | Ordering ambiguity |
| `DeleteOp` + `DeleteOp` same `content_id` | ✓ idempotent | Both deleted same thing |
| `ReplaceOp` + `ReplaceOp` same address | ✗ conflict | Both updated same element |
| `PatchOp` + `PatchOp` same address | recursive check | Recurse into child ops |

---

## CRDT Semantics

Implement `CRDTPlugin` to replace three-way merge with a mathematical join.
CRDTs are ideal when many agents write concurrently and you want **zero conflicts by construction**.

### When to choose CRDT mode

| Scenario | Right choice |
|----------|-------------|
| Human-paced commits (DAW, editor) | OT merge |
| Many autonomous agents writing sub-second | CRDT join |
| Collaborative annotation (many simultaneous adds) | CRDT `ORSet` |
| Collaborative sequence editing (multi-cursor) | CRDT `RGA` |
| Distributed sensor writes (telemetry, IoT) | CRDT `GCounter` or `LWWRegister` |

### Choosing CRDT primitives

```python
from muse.core.crdts import VectorClock, LWWRegister, ORSet, RGA, AWMap, GCounter
```

| Primitive | Use for | Semantics |
|-----------|---------|-----------|
| `VectorClock` | Causal ordering across agents | Track which agent wrote what |
| `LWWRegister[T]` | A scalar that one agent owns | Latest timestamp wins |
| `ORSet[T]` | A set where concurrent adds win | "Observed-Remove": a remove cancels only the adds it has observed, so a concurrent add survives |
| `RGA[T]` | An ordered sequence (list) | Insertion is commutative via parent-ID tree |
| `AWMap[K, V]` | A key-value map | Adds win; keys are independently managed |
| `GCounter` | A counter that only grows | Perfect for event counts, message IDs |
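
The add-wins behaviour of an observed-remove set is easiest to see in a toy model. The class below is a minimal sketch written for this guide, not the `ORSet` in `muse.core.crdts`: `TinyORSet` and its methods are hypothetical, and tombstones are kept forever for simplicity.

```python
import copy
import uuid
from dataclasses import dataclass, field


@dataclass
class TinyORSet:
    """Toy observed-remove set: every add gets a unique tag; removes tombstone tags."""
    adds: set[tuple[str, str]] = field(default_factory=set)   # (element, tag) pairs
    removed: set[str] = field(default_factory=set)            # tombstoned tags

    def add(self, element: str) -> None:
        self.adds.add((element, uuid.uuid4().hex))

    def remove(self, element: str) -> None:
        # Tombstone only the tags this replica has observed for the element.
        self.removed |= {tag for elem, tag in self.adds if elem == element}

    def join(self, other: "TinyORSet") -> "TinyORSet":
        return TinyORSet(self.adds | other.adds, self.removed | other.removed)

    def values(self) -> set[str]:
        return {elem for elem, tag in self.adds if tag not in self.removed}


replica_a = TinyORSet()
replica_a.add("drums")
replica_b = copy.deepcopy(replica_a)   # both replicas have observed the first add

replica_a.remove("drums")              # A removes the add it observed
replica_b.add("drums")                 # B concurrently re-adds with a fresh tag

merged = replica_a.join(replica_b)
```

After the join, `"drums"` is still present: replica A never saw B's fresh tag, so its remove could not cancel it. A later `merged.remove("drums")` would succeed, because by then every tag has been observed.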

### Protocol implementation sketch

```python
from muse.core.schema import DomainSchema, CRDTDimensionSpec
from muse.domain import CRDTPlugin, CRDTSnapshotManifest
from muse.core.crdts import ORSet, RGA, VectorClock

class YourCRDTPlugin(CRDTPlugin):
    def crdt_schema(self) -> list[CRDTDimensionSpec]:
        return [
            CRDTDimensionSpec(
                name="labels",
                crdt_type="or_set",
                description="Unordered annotation labels",
            ),
            CRDTDimensionSpec(
                name="sequence",
                crdt_type="rga",
                description="Ordered element sequence",
            ),
        ]

    def join(
        self,
        a: CRDTSnapshotManifest,
        b: CRDTSnapshotManifest,
    ) -> CRDTSnapshotManifest:
        # Merge vector clocks
        vc_a = VectorClock.from_dict(a["vclock"])
        vc_b = VectorClock.from_dict(b["vclock"])
        merged_vc = vc_a.merge(vc_b)

        # Join each CRDT dimension
        labels_a = ORSet[str].from_dict(a["crdt_state"]["labels"])
        labels_b = ORSet[str].from_dict(b["crdt_state"]["labels"])
        merged_labels = labels_a.join(labels_b)

        seq_a = RGA[str].from_dict(a["crdt_state"]["sequence"])
        seq_b = RGA[str].from_dict(b["crdt_state"]["sequence"])
        merged_seq = seq_a.join(seq_b)

        return CRDTSnapshotManifest(
            # File-level manifest. NOTE: taking a's copy is order-dependent, so
            # `files` alone is not commutative. If callers compare it, choose
            # the side deterministically (for example by vector clock).
            files=a["files"],
            vclock=merged_vc.to_dict(),
            crdt_state={
                "labels": merged_labels.to_dict(),
                "sequence": merged_seq.to_dict(),
            },
        )

    def to_crdt_state(self, snapshot: StateSnapshot) -> CRDTSnapshotManifest:
        # Lift a plain snapshot into CRDT state (first time, or after plain checkout)
        return CRDTSnapshotManifest(
            files=snapshot,
            vclock=VectorClock().to_dict(),
            crdt_state={
                "labels": ORSet[str]().to_dict(),
                "sequence": RGA[str]().to_dict(),
            },
        )

    def from_crdt_state(self, crdt: CRDTSnapshotManifest) -> StateSnapshot:
        return crdt["files"]
```

### The three lattice laws (why join always converges)

Every CRDT `join` satisfies:

1. **Commutativity:** `a.join(b) == b.join(a)` — order of arrival doesn't matter
2. **Associativity:** `a.join(b.join(c)) == (a.join(b)).join(c)` — batching is fine
3. **Idempotency:** `a.join(a) == a` — duplicates are harmless

These three laws guarantee that no matter how many agents write concurrently, and no matter what
order messages arrive in, every replica converges to the same value once it has received the same set of updates.
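
The three laws can be verified directly on the simplest CRDT, a grow-only counter. This is a self-contained sketch under the usual textbook definition (per-agent maximum), not the `GCounter` class from `muse.core.crdts`; the `GCounterState` alias and the `join` and `value` functions are names chosen for this example.

```python
from itertools import permutations

GCounterState = dict[str, int]   # agent ID -> increments seen from that agent


def join(a: GCounterState, b: GCounterState) -> GCounterState:
    """Grow-only counter join: take the per-agent maximum."""
    return {agent: max(a.get(agent, 0), b.get(agent, 0)) for agent in a.keys() | b.keys()}


def value(c: GCounterState) -> int:
    """The counter's value is the sum over all agents."""
    return sum(c.values())


x: GCounterState = {"agent-1": 3}
y: GCounterState = {"agent-1": 1, "agent-2": 4}
z: GCounterState = {"agent-3": 2}

commutative = join(x, y) == join(y, x)
associative = join(x, join(y, z)) == join(join(x, y), z)
idempotent = join(x, x) == x

# Deliver the three updates in every possible order and collect the final states.
finals = []
for order in permutations([x, y, z]):
    state: GCounterState = {}
    for update in order:
        state = join(state, update)
    finals.append(state)
converged = all(final == finals[0] for final in finals)
```

All six delivery orders end in the same state, with `value(finals[0]) == 9`. The same style of check works for any `join` you write: pick three states, assert the three laws, then assert that every permutation converges.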

---

## Registering Your Plugin

Import your plugin in `muse/plugins/registry.py` and add one entry to `_REGISTRY`:

```python
from muse.plugins.my_domain.plugin import MyDomainPlugin

_REGISTRY: dict[str, MuseDomainPlugin] = {
    "midi": MidiPlugin(),
    "my_domain": MyDomainPlugin(),  # ← add this
}
```

Then initialize:

```bash
muse init --domain my_domain
muse domains   # should show your domain with its capabilities
```

---

## Testing Your Plugin

Every plugin must have tests covering:

### 1. Protocol conformance

```python
from muse.domain import MuseDomainPlugin
from muse.plugins.my_domain.plugin import MyDomainPlugin

def test_plugin_satisfies_protocol() -> None:
    plugin = MyDomainPlugin()
    assert isinstance(plugin, MuseDomainPlugin)
```

### 2. Snapshot determinism

```python
import pathlib

def test_snapshot_deterministic(tmp_path: pathlib.Path) -> None:
    plugin = MyDomainPlugin()
    # Use an extension your plugin actually tracks, or both snapshots will be empty
    (tmp_path / "element.ext").write_bytes(b"data")
    s1 = plugin.snapshot(tmp_path)
    s2 = plugin.snapshot(tmp_path)
    assert s1, "snapshot is empty: the test file was not picked up"
    assert s1 == s2
```

### 3. Diff / apply round-trip

```python
import hashlib

def sha256(data: bytes) -> str:
    """Test helper used throughout this section: hex digest of raw bytes."""
    return hashlib.sha256(data).hexdigest()

def test_diff_apply_roundtrip() -> None:
    plugin = MyDomainPlugin()
    base = {"a.ext": sha256(b"v1")}
    target = {"a.ext": sha256(b"v2"), "b.ext": sha256(b"new")}
    delta = plugin.diff(base, target)
    assert any(op["op"] == "replace" for op in delta["ops"])
    assert any(op["op"] == "insert" for op in delta["ops"])
```

### 4. Merge — clean case

```python
def test_merge_clean_different_paths() -> None:
    plugin = MyDomainPlugin()
    base = {"a.ext": sha256(b"v1")}
    left = {"a.ext": sha256(b"v1"), "b.ext": sha256(b"left")}
    right = {"a.ext": sha256(b"v1"), "c.ext": sha256(b"right")}
    result = plugin.merge(base, left, right)
    assert result["conflicts"] == []
    assert "b.ext" in result["merged"]
    assert "c.ext" in result["merged"]
```

### 5. Merge — conflict case

```python
def test_merge_conflict_same_path() -> None:
    plugin = MyDomainPlugin()
    base = {"a.ext": sha256(b"v1")}
    left = {"a.ext": sha256(b"left")}
    right = {"a.ext": sha256(b"right")}
    result = plugin.merge(base, left, right)
    assert "a.ext" in result["conflicts"]
```

### 6. Schema

```python
from muse.core.schema import DomainSchema

def test_schema_shape() -> None:
    plugin = MyDomainPlugin()
    s = plugin.schema()
    assert s["domain"] == "my_domain"
    assert len(s["elements"]) > 0
    assert len(s["dimensions"]) > 0
    assert s["merge_mode"] in ("three_way", "crdt")
```

### 7. CRDT lattice laws

```python
def test_join_commutative() -> None:
    plugin = MyCRDTPlugin()
    a = plugin.to_crdt_state({"x": sha256(b"a")})
    b = plugin.to_crdt_state({"y": sha256(b"b")})
    ab = plugin.join(a, b)
    ba = plugin.join(b, a)
    # compare the domain-meaningful fields, not object identity
    assert ab["crdt_state"] == ba["crdt_state"]

def test_join_associative() -> None:
    plugin = MyCRDTPlugin()
    a = plugin.to_crdt_state({"x": sha256(b"a")})
    b = plugin.to_crdt_state({"y": sha256(b"b")})
    c = plugin.to_crdt_state({"z": sha256(b"c")})
    left = plugin.join(plugin.join(a, b), c)
    right = plugin.join(a, plugin.join(b, c))
    assert left["crdt_state"] == right["crdt_state"]

def test_join_idempotent() -> None:
    plugin = MyCRDTPlugin()
    a = plugin.to_crdt_state({"x": sha256(b"a")})
    aa = plugin.join(a, a)
    assert aa["crdt_state"] == a["crdt_state"]
```

---

## Checklist Before You Ship

```
□ MuseDomainPlugin protocol: snapshot, diff, merge, drift, apply, schema all implemented
□ schema() returns a valid DomainSchema with merge_mode set
□ All type hints pass mypy --strict with zero errors
□ python tools/typing_audit.py --dirs muse/ tests/ --max-any 0 passes (zero violations)
□ pytest tests/test_<domain>_plugin.py -v — all green
□ Registered in muse/plugins/registry.py
□ muse init --domain <your_domain> works
□ muse domains lists your domain with correct capabilities
□ If OT merge: StructuredMergePlugin isinstance check passes
□ If CRDT: join satisfies commutativity, associativity, idempotency
□ No Any, no object, no cast(), no type: ignore, no Optional[X], no print()
□ Module docstring on plugin.py explains what the domain models
```