cgcardona / muse public
crdt-reference.md markdown
475 lines 14.8 KB
1e566283 feat: comprehensive docs, scaffold plugin, and muse domains dashboard (#20) Gabriel Cardona <cgcardona@gmail.com> 2d ago
1 # Muse CRDT Reference
2
3 > **Audience:** Plugin authors adding Phase 4 (CRDT Semantics) to their domain plugin,
4 > or anyone curious about how Muse achieves conflict-free convergence.
5 >
6 > **Implementation:** `muse/core/crdts/` — six primitives, each with full type safety,
7 > `to_dict`/`from_dict` serialisation, and tested lattice-law compliance.
8
9 ---
10
11 ## Table of Contents
12
13 1. [Why CRDTs?](#why-crdts)
14 2. [The Three Lattice Laws](#the-three-lattice-laws)
15 3. [Primitive Reference](#primitive-reference)
16 - [VectorClock](#vectorclock)
17 - [LWWRegister](#lwwregister)
18 - [ORSet](#orset)
19 - [RGA](#rga)
20 - [AWMap](#awmap)
21 - [GCounter](#gcounter)
22 4. [Combining Primitives](#combining-primitives)
23 5. [CRDTPlugin Integration](#crdtplugin-integration)
24 6. [Performance Notes](#performance-notes)
25 7. [When Not to Use CRDTs](#when-not-to-use-crdts)
26
27 ---
28
29 ## Why CRDTs?
30
31 In classical three-way merge (Muse Phases 1–3), two branches that both edit the same element
32 produce a conflict that a human must resolve. This is correct and desirable for human-paced
33 collaborative editing — the human has an opinion and should make the final call.
34
35 But consider a different scenario: twenty automated agents simultaneously annotating a genome,
36 or a distributed sensor network writing telemetry, or a DAW plugin streaming real-time automation
37 changes from multiple collaborators. In these cases:
38
39 - Conflicts are too frequent to be individually resolvable
40 - No human is present to arbitrate
41 - The agents don't coordinate in real time
42 - Messages may be delayed, reordered, or duplicated
43
44 CRDTs (**Conflict-free Replicated Data Types**) solve this by changing the definition of a "write."
45 Instead of "replace the current value," each write is a **join on a partial order** — the state
46 space is a lattice, and the merge of any two states is always the least upper bound of both.
47
48 The result: **join always converges to the same final state, regardless of message order, delay,
49 or duplication.** No conflict state ever exists.
50
51 ---
52
53 ## The Three Lattice Laws
54
55 Every CRDT `join` operation must satisfy all three laws. Muse's test suite verifies them for
56 all six primitives. If you build a composite CRDT from these primitives, your `join` inherits
57 these properties automatically (lattice composition is closed under these laws).
58
59 ### 1. Commutativity
60
61 ```
62 a.join(b) ≡ b.join(a)
63 ```
64
65 The order in which you receive updates doesn't matter. Agent A sending its state to B produces
66 the same result as B sending to A first.
67
68 ### 2. Associativity
69
70 ```
71 a.join(b.join(c)) ≡ (a.join(b)).join(c)
72 ```
73
74 You can batch updates in any grouping. Receiving 10 updates one at a time is equivalent to
75 receiving them all batched, or in any intermediate grouping.
76
77 ### 3. Idempotency
78
79 ```
80 a.join(a) ≡ a
81 ```
82
83 Receiving the same update twice is harmless. Deduplication is not required.
84
85 These three laws together mean: **no matter how your network behaves — delays, reorders,
86 duplicates — all replicas eventually reach the same state once they have seen all updates.**
87
88 ---
89
90 ## Primitive Reference
91
92 All primitives are in `muse/core/crdts/`. Import them from the package:
93
94 ```python
95 from muse.core.crdts import VectorClock, LWWRegister, ORSet, RGA, AWMap, GCounter
96 ```
97
98 All primitives are **immutable** — every mutating method returns a new instance. This
99 makes them safe to use as values in `TypedDict` fields and easy to test.
100
101 ---
102
103 ### VectorClock
104
105 **File:** `muse/core/crdts/vclock.py`
106
107 A vector clock assigns a logical timestamp to each agent. It answers "does event A causally
108 precede event B?" without requiring a synchronized wall clock.
109
110 **Use for:** tracking causal ordering between agents. Required by all other CRDTs implicitly
111 when you need to know which write was "more recent."
112
113 #### API
114
115 ```python
116 vc = VectorClock() # empty — all agents at tick 0
117
118 # Increment agent's own clock before a write
119 vc2 = vc.increment("agent-1") # {"agent-1": 1}
120
121 # Merge with a clock received from another agent
122 merged = vc.merge(other_vc) # take max per agent
123
124 # Causal comparison
125 vc_a.happens_before(vc_b) # True if vc_a ≤ vc_b (strictly before)
126 vc_a.concurrent_with(vc_b) # True if neither precedes the other
127 vc_a.equivalent(vc_b) # True if all per-agent ticks are equal
128
129 # Serialisation
130 d = vc.to_dict() # {"agent-1": 1, "agent-2": 3}
131 vc3 = VectorClock.from_dict(d)
132 ```
133
134 #### When to use
135
136 Always embed a `VectorClock` in your `CRDTSnapshotManifest["vclock"]` field. It tracks
137 which writes from which agents have been seen, enabling you to detect concurrent writes
138 and apply correct merge semantics.
139
140 ---
141
142 ### LWWRegister
143
144 **File:** `muse/core/crdts/lww_register.py`
145
146 A register holding a single value. When two agents write concurrently, the one with the
147 higher timestamp wins ("Last Write Wins").
148
149 **Use for:** scalar values where recency is the right semantic — tempo, a mode enum,
150 a display name, a configuration flag. Not appropriate when concurrent writes represent
151 genuinely independent work that should both be preserved.
152
153 #### API
154
155 ```python
156 reg: LWWRegister[float] = LWWRegister()
157
158 # Write a new value (timestamp should be monotonically increasing per agent)
159 reg2 = reg.write(120.0, timestamp=1700000000.0, author="agent-1")
160
161 # Read current value
162 val = reg2.read() # 120.0
163
164 # Join two registers — higher timestamp wins
165 merged = reg2.join(other_reg)
166
167 # Serialisation
168 d = reg.to_dict() # {"value": ..., "timestamp": ..., "author": ...}
169 reg3 = LWWRegister[float].from_dict(d)
170
171 reg2.equivalent(reg3) # True if same value/timestamp/author
172 ```
173
174 #### Warning on timestamps
175
176 LWW correctness depends on timestamps being reasonably monotone. In a distributed system
177 with clock skew, use logical timestamps (derived from a `VectorClock`) rather than wall time.
178
179 ---
180
181 ### ORSet
182
183 **File:** `muse/core/crdts/or_set.py`
184
185 An Observed-Remove Set. Elements can be added and removed, but **concurrent adds win over
186 concurrent removes**. This is the opposite of a naive set where removes win.
187
188 **Why adds-win?** In collaborative scenarios, a concurrent remove means "I didn't know you
189 were going to add that" — not "I decided to delete your add." Adds-win semantics prevent
190 silent data loss.
191
192 **Use for:** annotation sets, tag collections, member lists, gene ontology terms, feature
193 flags — any unordered collection where concurrent adds should be preserved.
194
195 #### API
196
197 ```python
198 s: ORSet[str] = ORSet()
199
200 # Add with a unique token (UUID or agent+timestamp combination)
201 s2 = s.add("annotation-GO:0001234", token="agent1-tick42")
202
203 # Remove by value (removes all tokens for that element)
204 s3 = s2.remove("annotation-GO:0001234")
205
206 # Query
207 "annotation-GO:0001234" in s2 # True
208 s2.elements() # frozenset({"annotation-GO:0001234"})
209 s2.tokens_for("annotation-GO:0001234") # frozenset({"agent1-tick42"})
210
211 # Join — union of all add-tokens, then subtract remove-tokens
212 merged = s2.join(other_set)
213
214 # Serialisation
215 d = s.to_dict()
216 s4 = ORSet[str].from_dict(d)
217
218 s2.equivalent(s3) # True if same elements and tokens
219 ```
220
221 #### Concurrent add + remove example
222
223 ```
224 Agent A: s.add("X", token="a1")
225 Agent B: s.remove("X") (before seeing A's add)
226
227 After join:
228 A's add token "a1" is present
229 B's remove only targets tokens B has seen — not "a1"
230 Result: "X" is in the merged set ✓
231 ```
232
233 ---
234
235 ### RGA
236
237 **File:** `muse/core/crdts/rga.py`
238
239 A Replicated Growable Array — a list where concurrent insertions are commutative.
240 Two agents can insert at the same logical position; the result is deterministic based
241 on `element_id` ordering (larger ID appears first).
242
243 **Use for:** collaborative text editing, ordered note sequences, ordered event streams,
244 any sequence where multiple agents might insert concurrently.
245
246 #### API
247
248 ```python
249 rga: RGA[str] = RGA()
250
251 # Insert after the virtual root (parent_id=None means "at the beginning")
252 rga2 = rga.insert(value="C4", element_id="id-100", parent_id=None)
253 rga3 = rga2.insert(value="D4", element_id="id-200", parent_id="id-100")
254 rga4 = rga3.insert(value="E4", element_id="id-300", parent_id="id-200")
255
256 # Delete by element_id (tombstones the element, does not shift IDs)
257 rga5 = rga4.delete("id-200")
258
259 # Read current sequence (tombstoned elements excluded)
260 rga4.to_sequence() # ["C4", "D4", "E4"]
261 rga5.to_sequence() # ["C4", "E4"]
262
263 len(rga4) # 3
264
265 # Join — builds parent-ID tree, traverses in canonical order
266 merged = rga4.join(other_rga)
267
268 # Serialisation
269 d = rga.to_dict()
270 rga6 = RGA[str].from_dict(d)
271
272 rga4.equivalent(rga6) # True if same elements in same order
273 ```
274
275 #### How concurrent insertions resolve
276
277 ```
278 Initial: ["A", "C"] (A at id-100, C at id-300)
279
280 Agent 1: inserts "B" at id-200, parent_id="id-100"
281 Agent 2: inserts "X" at id-250, parent_id="id-100"
282
283 After join (same parent "id-100", id-250 > id-200):
284 Result: ["A", "X", "B", "C"]
285 (larger element_id appears first among siblings)
286 ```
287
288 To get a specific ordering, choose `element_id` values accordingly. For sequential
289 insertions from a single agent, monotonically increasing IDs produce the expected order.
290
291 ---
292
293 ### AWMap
294
295 **File:** `muse/core/crdts/aw_map.py`
296
297 An Add-Wins Map. A dictionary where concurrent adds win over concurrent removes, and each
298 key is managed independently (adding a key does not conflict with removing a different key).
299
300 **Use for:** parameter maps, configuration dicts, per-dimension metadata, named dimension
301 states, any key-value structure where concurrent writes to different keys should not conflict.
302
303 #### API
304
305 ```python
306 m: AWMap[str, float] = AWMap()
307
308 # Set a key-value pair (uses an ORSet internally per key)
309 m2 = m.set("tempo", 120.0, token="agent1-t1")
310 m3 = m2.set("key_sig", 0.0, token="agent1-t2")
311
312 # Remove a key
313 m4 = m3.remove("key_sig")
314
315 # Query
316 m3.get("tempo") # 120.0
317 m3.get("missing") # None
318 "tempo" in m3 # True
319 m3.keys() # frozenset({"tempo", "key_sig"})
320
321 # Flatten to plain dict
322 m3.to_plain_dict() # {"tempo": 120.0, "key_sig": 0.0}
323
324 # Join — union of all add-sets per key, removes applied per key
325 merged = m3.join(other_map)
326
327 # Serialisation
328 d = m.to_dict()
329 m5 = AWMap[str, float].from_dict(d)
330
331 m3.equivalent(m4) # True if same key-value pairs
332 ```
333
334 ---
335
336 ### GCounter
337
338 **File:** `muse/core/crdts/g_counter.py`
339
340 A grow-only counter. Each agent increments its own shard; the global value is the sum.
341 Decrement is not possible — this is by design. Counters that can only grow are trivially
342 convergent.
343
344 **Use for:** event counts, version numbers, message sequence numbers, commit counts,
345 read counts — any monotonically increasing quantity.
346
347 #### API
348
349 ```python
350 gc = GCounter()
351
352 # Increment this agent's shard
353 gc2 = gc.increment("agent-1")
354 gc3 = gc2.increment("agent-1")
355 gc4 = gc3.increment("agent-2")
356
357 gc4.value() # 3
358 gc4.value_for("agent-1") # 2
359 gc4.value_for("agent-2") # 1
360 gc4.value_for("agent-99") # 0
361
362 # Join — take max per agent shard
363 merged = gc4.join(other_counter)
364
365 # Serialisation
366 d = gc.to_dict() # {"agent-1": 2, "agent-2": 1}
367 gc5 = GCounter.from_dict(d)
368
369 gc4.equivalent(gc5) # True if same per-agent values
370 ```
371
372 ---
373
374 ## Combining Primitives
375
376 Complex CRDT state is built by composing primitives. The composition inherits the lattice
377 laws because each primitive satisfies them and because `join` is applied field-by-field.
378
379 ### Example: a collaborative score header
380
381 ```python
382 @dataclass
383 class ScoreHeaderCRDT:
384 """Convergent score header: tempo register + time_sig register + author set."""
385
386 tempo: LWWRegister[float]
387 time_sig: LWWRegister[str]
388 authors: ORSet[str]
389
390 def join(self, other: ScoreHeaderCRDT) -> ScoreHeaderCRDT:
391 return ScoreHeaderCRDT(
392 tempo=self.tempo.join(other.tempo),
393 time_sig=self.time_sig.join(other.time_sig),
394 authors=self.authors.join(other.authors),
395 )
396 ```
397
398 Because `LWWRegister.join` and `ORSet.join` both satisfy the three laws, `ScoreHeaderCRDT.join`
399 does too — for free, by composition.
400
401 ---
402
403 ## CRDTPlugin Integration
404
405 The entry point in the core engine is `crdt_join_snapshots()` in `muse/core/merge_engine.py`.
406 The `muse merge` command calls it when `isinstance(plugin, CRDTPlugin)` is `True`:
407
408 ```python
409 from muse.core.merge_engine import crdt_join_snapshots
410 from muse.domain import CRDTPlugin, CRDTSnapshotManifest
411
412 # In merge_engine.py — called by the merge command
413 def crdt_join_snapshots(
414 plugin: CRDTPlugin,
415 ours: StateSnapshot,
416 theirs: StateSnapshot,
417 ) -> MergeResult:
418 crdt_a = plugin.to_crdt_state(ours)
419 crdt_b = plugin.to_crdt_state(theirs)
420 joined = plugin.join(crdt_a, crdt_b)
421 merged_snapshot = plugin.from_crdt_state(joined)
422 return MergeResult(
423 merged=merged_snapshot,
424 conflicts=[], # always empty — CRDT join never conflicts
425 applied_strategies={},
426 dimension_reports={},
427 )
428 ```
429
430 Notice `conflicts=[]` is always empty. This is the CRDT guarantee: **no human intervention
431 is ever required.**
432
433 ---
434
435 ## Performance Notes
436
437 | Primitive | Join complexity | Storage |
438 |-----------|----------------|---------|
439 | `VectorClock` | O(agents) | One int per agent |
440 | `LWWRegister` | O(1) | One value + timestamp |
441 | `ORSet` | O(n + m) tokens | One UUID per add operation |
442 | `RGA` | O(n log n) | One node per insert (tombstones retained) |
443 | `AWMap` | O(keys × tokens) | Per-key ORSet overhead |
444 | `GCounter` | O(agents) | One int per agent |
445
446 **RGA memory warning:** `RGA` retains tombstoned elements forever (this is required for
447 commutativity). For domains with high churn (many inserts and deletes), implement periodic
448 garbage collection by taking a snapshot of the live sequence, creating a fresh `RGA`, and
449 re-inserting only the live elements. This is a safe operation because garbage collection
450 only affects elements both sides have observed as deleted — a coordination-free safe point.
451
452 ---
453
454 ## When Not to Use CRDTs
455
456 CRDTs are not always the right choice. Use three-way merge (Phases 1–3) when:
457
458 - **Humans are making creative decisions** — a DAW producer choosing a chord voicing should
459 not have their choice silently overwritten by a LWW timestamp. Use OT merge with conflicts.
460
461 - **The domain has invariants that CRDTs cannot enforce** — CRDTs converge, but they can
462 produce semantically invalid states. A MIDI file with notes outside the pitch range 0–127
463 is technically convergent but musically invalid. Invariant enforcement requires coordination.
464
465 - **Conflict visibility is a feature** — in code review, you want conflicts to be visible
466 to humans. "This merge is clean" is meaningful precisely because conflicts exist.
467
468 - **You have a clear authority model** — if one agent is the "source of truth," LWW with
469 that agent always winning is fine. But that's a policy, not a CRDT.
470
471 Use CRDTs when all of the following are true:
472 1. Many agents write concurrently (more than humans can coordinate)
473 2. No single agent is the authority
474 3. All writes are semantically valid in isolation
475 4. Convergence is more important than precision