crdt-reference.md
markdown
| 1 | # Muse CRDT Reference |
| 2 | |
| 3 | > **Audience:** Plugin authors adding CRDT Semantics to their domain plugin, |
| 4 | > or anyone curious about how Muse achieves conflict-free convergence. |
| 5 | > |
| 6 | > **Implementation:** `muse/core/crdts/` — six primitives, each with full type safety, |
| 7 | > `to_dict`/`from_dict` serialisation, and tested lattice-law compliance. |
| 8 | |
| 9 | --- |
| 10 | |
| 11 | ## Table of Contents |
| 12 | |
| 13 | 1. [Why CRDTs?](#why-crdts) |
| 14 | 2. [The Three Lattice Laws](#the-three-lattice-laws) |
| 15 | 3. [Primitive Reference](#primitive-reference) |
| 16 | - [VectorClock](#vectorclock) |
| 17 | - [LWWRegister](#lwwregister) |
| 18 | - [ORSet](#orset) |
| 19 | - [RGA](#rga) |
| 20 | - [AWMap](#awmap) |
| 21 | - [GCounter](#gcounter) |
| 22 | 4. [Combining Primitives](#combining-primitives) |
| 23 | 5. [CRDTPlugin Integration](#crdtplugin-integration) |
| 24 | 6. [Performance Notes](#performance-notes) |
| 25 | 7. [When Not to Use CRDTs](#when-not-to-use-crdts) |
| 26 | |
| 27 | --- |
| 28 | |
| 29 | ## Why CRDTs? |
| 30 | |
| 31 | In classical three-way merge (Muse Phases 1–3), two branches that both edit the same element |
| 32 | produce a conflict that a human must resolve. This is correct and desirable for human-paced |
| 33 | collaborative editing — the human has an opinion and should make the final call. |
| 34 | |
| 35 | But consider a different scenario: twenty automated agents simultaneously annotating a genome, |
| 36 | or a distributed sensor network writing telemetry, or a DAW plugin streaming real-time automation |
| 37 | changes from multiple collaborators. In these cases: |
| 38 | |
| 39 | - Conflicts are too frequent to be individually resolvable |
| 40 | - No human is present to arbitrate |
| 41 | - The agents don't coordinate in real time |
| 42 | - Messages may be delayed, reordered, or duplicated |
| 43 | |
| 44 | CRDTs (**Conflict-free Replicated Data Types**) solve this by changing the definition of a "write." |
| 45 | Instead of "replace the current value," each write is a **join on a partial order** — the state |
| 46 | space is a lattice, and the merge of any two states is always the least upper bound of both. |
| 47 | |
| 48 | The result: **join always converges to the same final state, regardless of message order, delay, |
| 49 | or duplication.** No conflict state ever exists. |
| 50 | |
| 51 | --- |
| 52 | |
| 53 | ## The Three Lattice Laws |
| 54 | |
| 55 | Every CRDT `join` operation must satisfy all three laws. Muse's test suite verifies them for |
| 56 | all six primitives. If you build a composite CRDT from these primitives, your `join` inherits |
| 57 | these properties automatically (lattice composition is closed under these laws). |
| 58 | |
| 59 | ### 1. Commutativity |
| 60 | |
| 61 | ``` |
| 62 | a.join(b) ≡ b.join(a) |
| 63 | ``` |
| 64 | |
| 65 | The order in which you receive updates doesn't matter. Agent A sending its state to B produces |
| 66 | the same result as B sending to A first. |
| 67 | |
| 68 | ### 2. Associativity |
| 69 | |
| 70 | ``` |
| 71 | a.join(b.join(c)) ≡ (a.join(b)).join(c) |
| 72 | ``` |
| 73 | |
| 74 | You can batch updates in any grouping. Receiving 10 updates one at a time is equivalent to |
| 75 | receiving them all batched, or in any intermediate grouping. |
| 76 | |
| 77 | ### 3. Idempotency |
| 78 | |
| 79 | ``` |
| 80 | a.join(a) ≡ a |
| 81 | ``` |
| 82 | |
| 83 | Receiving the same update twice is harmless. Deduplication is not required. |
| 84 | |
| 85 | These three laws together mean: **no matter how your network behaves — delays, reorders, |
| 86 | duplicates — all replicas eventually reach the same state once they have seen all updates.** |
| 87 | |
| 88 | --- |
| 89 | |
| 90 | ## Primitive Reference |
| 91 | |
| 92 | All primitives are in `muse/core/crdts/`. Import them from the package: |
| 93 | |
| 94 | ```python |
| 95 | from muse.core.crdts import VectorClock, LWWRegister, ORSet, RGA, AWMap, GCounter |
| 96 | ``` |
| 97 | |
| 98 | All primitives are **immutable** — every mutating method returns a new instance. This |
| 99 | makes them safe to use as values in `TypedDict` fields and easy to test. |
| 100 | |
| 101 | --- |
| 102 | |
| 103 | ### VectorClock |
| 104 | |
| 105 | **File:** `muse/core/crdts/vclock.py` |
| 106 | |
| 107 | A vector clock assigns a logical timestamp to each agent. It answers "does event A causally |
| 108 | precede event B?" without requiring a synchronized wall clock. |
| 109 | |
| 110 | **Use for:** tracking causal ordering between agents. Required by all other CRDTs implicitly |
| 111 | when you need to know which write was "more recent." |
| 112 | |
| 113 | #### API |
| 114 | |
| 115 | ```python |
| 116 | vc = VectorClock() # empty — all agents at tick 0 |
| 117 | |
| 118 | # Increment agent's own clock before a write |
| 119 | vc2 = vc.increment("agent-1") # {"agent-1": 1} |
| 120 | |
| 121 | # Merge with a clock received from another agent |
| 122 | merged = vc.merge(other_vc) # take max per agent |
| 123 | |
| 124 | # Causal comparison |
| 125 | vc_a.happens_before(vc_b) # True if vc_a ≤ vc_b (strictly before) |
| 126 | vc_a.concurrent_with(vc_b) # True if neither precedes the other |
| 127 | vc_a.equivalent(vc_b) # True if all per-agent ticks are equal |
| 128 | |
| 129 | # Serialisation |
| 130 | d = vc.to_dict() # {"agent-1": 1, "agent-2": 3} |
| 131 | vc3 = VectorClock.from_dict(d) |
| 132 | ``` |
| 133 | |
| 134 | #### When to use |
| 135 | |
| 136 | Always embed a `VectorClock` in your `CRDTSnapshotManifest["vclock"]` field. It tracks |
| 137 | which writes from which agents have been seen, enabling you to detect concurrent writes |
| 138 | and apply correct merge semantics. |
| 139 | |
| 140 | --- |
| 141 | |
| 142 | ### LWWRegister |
| 143 | |
| 144 | **File:** `muse/core/crdts/lww_register.py` |
| 145 | |
| 146 | A register holding a single value. When two agents write concurrently, the one with the |
| 147 | higher timestamp wins ("Last Write Wins"). |
| 148 | |
| 149 | **Use for:** scalar values where recency is the right semantic — tempo, a mode enum, |
| 150 | a display name, a configuration flag. Not appropriate when concurrent writes represent |
| 151 | genuinely independent work that should both be preserved. |
| 152 | |
| 153 | #### API |
| 154 | |
| 155 | ```python |
| 156 | reg: LWWRegister[float] = LWWRegister() |
| 157 | |
| 158 | # Write a new value (timestamp should be monotonically increasing per agent) |
| 159 | reg2 = reg.write(120.0, timestamp=1700000000.0, author="agent-1") |
| 160 | |
| 161 | # Read current value |
| 162 | val = reg2.read() # 120.0 |
| 163 | |
| 164 | # Join two registers — higher timestamp wins |
| 165 | merged = reg2.join(other_reg) |
| 166 | |
| 167 | # Serialisation |
| 168 | d = reg.to_dict() # {"value": ..., "timestamp": ..., "author": ...} |
| 169 | reg3 = LWWRegister[float].from_dict(d) |
| 170 | |
| 171 | reg2.equivalent(reg3) # True if same value/timestamp/author |
| 172 | ``` |
| 173 | |
| 174 | #### Warning on timestamps |
| 175 | |
| 176 | LWW correctness depends on timestamps being reasonably monotone. In a distributed system |
| 177 | with clock skew, use logical timestamps (derived from a `VectorClock`) rather than wall time. |
| 178 | |
| 179 | --- |
| 180 | |
| 181 | ### ORSet |
| 182 | |
| 183 | **File:** `muse/core/crdts/or_set.py` |
| 184 | |
| 185 | An Observed-Remove Set. Elements can be added and removed, but **concurrent adds win over |
| 186 | concurrent removes**. This is the opposite of a naive set where removes win. |
| 187 | |
| 188 | **Why adds-win?** In collaborative scenarios, a concurrent remove means "I didn't know you |
| 189 | were going to add that" — not "I decided to delete your add." Adds-win semantics prevent |
| 190 | silent data loss. |
| 191 | |
| 192 | **Use for:** annotation sets, tag collections, member lists, gene ontology terms, feature |
| 193 | flags — any unordered collection where concurrent adds should be preserved. |
| 194 | |
| 195 | #### API |
| 196 | |
| 197 | ```python |
| 198 | s: ORSet[str] = ORSet() |
| 199 | |
| 200 | # Add with a unique token (UUID or agent+timestamp combination) |
| 201 | s2 = s.add("annotation-GO:0001234", token="agent1-tick42") |
| 202 | |
| 203 | # Remove by value (removes all tokens for that element) |
| 204 | s3 = s2.remove("annotation-GO:0001234") |
| 205 | |
| 206 | # Query |
| 207 | "annotation-GO:0001234" in s2 # True |
| 208 | s2.elements() # frozenset({"annotation-GO:0001234"}) |
| 209 | s2.tokens_for("annotation-GO:0001234") # frozenset({"agent1-tick42"}) |
| 210 | |
| 211 | # Join — union of all add-tokens, then subtract remove-tokens |
| 212 | merged = s2.join(other_set) |
| 213 | |
| 214 | # Serialisation |
| 215 | d = s.to_dict() |
| 216 | s4 = ORSet[str].from_dict(d) |
| 217 | |
| 218 | s2.equivalent(s3) # True if same elements and tokens |
| 219 | ``` |
| 220 | |
| 221 | #### Concurrent add + remove example |
| 222 | |
| 223 | ``` |
| 224 | Agent A: s.add("X", token="a1") |
| 225 | Agent B: s.remove("X") (before seeing A's add) |
| 226 | |
| 227 | After join: |
| 228 | A's add token "a1" is present |
| 229 | B's remove only targets tokens B has seen — not "a1" |
| 230 | Result: "X" is in the merged set ✓ |
| 231 | ``` |
| 232 | |
| 233 | --- |
| 234 | |
| 235 | ### RGA |
| 236 | |
| 237 | **File:** `muse/core/crdts/rga.py` |
| 238 | |
| 239 | A Replicated Growable Array — a list where concurrent insertions are commutative. |
| 240 | Two agents can insert at the same logical position; the result is deterministic based |
| 241 | on `element_id` ordering (larger ID appears first). |
| 242 | |
| 243 | **Use for:** collaborative text editing, ordered note sequences, ordered event streams, |
| 244 | any sequence where multiple agents might insert concurrently. |
| 245 | |
| 246 | #### API |
| 247 | |
| 248 | ```python |
| 249 | rga: RGA[str] = RGA() |
| 250 | |
| 251 | # Insert after the virtual root (parent_id=None means "at the beginning") |
| 252 | rga2 = rga.insert(value="C4", element_id="id-100", parent_id=None) |
| 253 | rga3 = rga2.insert(value="D4", element_id="id-200", parent_id="id-100") |
| 254 | rga4 = rga3.insert(value="E4", element_id="id-300", parent_id="id-200") |
| 255 | |
| 256 | # Delete by element_id (tombstones the element, does not shift IDs) |
| 257 | rga5 = rga4.delete("id-200") |
| 258 | |
| 259 | # Read current sequence (tombstoned elements excluded) |
| 260 | rga4.to_sequence() # ["C4", "D4", "E4"] |
| 261 | rga5.to_sequence() # ["C4", "E4"] |
| 262 | |
| 263 | len(rga4) # 3 |
| 264 | |
| 265 | # Join — builds parent-ID tree, traverses in canonical order |
| 266 | merged = rga4.join(other_rga) |
| 267 | |
| 268 | # Serialisation |
| 269 | d = rga.to_dict() |
| 270 | rga6 = RGA[str].from_dict(d) |
| 271 | |
| 272 | rga4.equivalent(rga6) # True if same elements in same order |
| 273 | ``` |
| 274 | |
| 275 | #### How concurrent insertions resolve |
| 276 | |
| 277 | ``` |
| 278 | Initial: ["A", "C"] (A at id-100, C at id-300) |
| 279 | |
| 280 | Agent 1: inserts "B" at id-200, parent_id="id-100" |
| 281 | Agent 2: inserts "X" at id-250, parent_id="id-100" |
| 282 | |
| 283 | After join (same parent "id-100", id-250 > id-200): |
| 284 | Result: ["A", "X", "B", "C"] |
| 285 | (larger element_id appears first among siblings) |
| 286 | ``` |
| 287 | |
| 288 | To get a specific ordering, choose `element_id` values accordingly. For sequential |
| 289 | insertions from a single agent, monotonically increasing IDs produce the expected order. |
| 290 | |
| 291 | --- |
| 292 | |
| 293 | ### AWMap |
| 294 | |
| 295 | **File:** `muse/core/crdts/aw_map.py` |
| 296 | |
| 297 | An Add-Wins Map. A dictionary where concurrent adds win over concurrent removes, and each |
| 298 | key is managed independently (adding a key does not conflict with removing a different key). |
| 299 | |
| 300 | **Use for:** parameter maps, configuration dicts, per-dimension metadata, named dimension |
| 301 | states, any key-value structure where concurrent writes to different keys should not conflict. |
| 302 | |
| 303 | #### API |
| 304 | |
| 305 | ```python |
| 306 | m: AWMap[str, float] = AWMap() |
| 307 | |
| 308 | # Set a key-value pair (uses an ORSet internally per key) |
| 309 | m2 = m.set("tempo", 120.0, token="agent1-t1") |
| 310 | m3 = m2.set("key_sig", 0.0, token="agent1-t2") |
| 311 | |
| 312 | # Remove a key |
| 313 | m4 = m3.remove("key_sig") |
| 314 | |
| 315 | # Query |
| 316 | m3.get("tempo") # 120.0 |
| 317 | m3.get("missing") # None |
| 318 | "tempo" in m3 # True |
| 319 | m3.keys() # frozenset({"tempo", "key_sig"}) |
| 320 | |
| 321 | # Flatten to plain dict |
| 322 | m3.to_plain_dict() # {"tempo": 120.0, "key_sig": 0.0} |
| 323 | |
| 324 | # Join — union of all add-sets per key, removes applied per key |
| 325 | merged = m3.join(other_map) |
| 326 | |
| 327 | # Serialisation |
| 328 | d = m.to_dict() |
| 329 | m5 = AWMap[str, float].from_dict(d) |
| 330 | |
| 331 | m3.equivalent(m4) # True if same key-value pairs |
| 332 | ``` |
| 333 | |
| 334 | --- |
| 335 | |
| 336 | ### GCounter |
| 337 | |
| 338 | **File:** `muse/core/crdts/g_counter.py` |
| 339 | |
| 340 | A grow-only counter. Each agent increments its own shard; the global value is the sum. |
| 341 | Decrement is not possible — this is by design. Counters that can only grow are trivially |
| 342 | convergent. |
| 343 | |
| 344 | **Use for:** event counts, version numbers, message sequence numbers, commit counts, |
| 345 | read counts — any monotonically increasing quantity. |
| 346 | |
| 347 | #### API |
| 348 | |
| 349 | ```python |
| 350 | gc = GCounter() |
| 351 | |
| 352 | # Increment this agent's shard |
| 353 | gc2 = gc.increment("agent-1") |
| 354 | gc3 = gc2.increment("agent-1") |
| 355 | gc4 = gc3.increment("agent-2") |
| 356 | |
| 357 | gc4.value() # 3 |
| 358 | gc4.value_for("agent-1") # 2 |
| 359 | gc4.value_for("agent-2") # 1 |
| 360 | gc4.value_for("agent-99") # 0 |
| 361 | |
| 362 | # Join — take max per agent shard |
| 363 | merged = gc4.join(other_counter) |
| 364 | |
| 365 | # Serialisation |
| 366 | d = gc.to_dict() # {"agent-1": 2, "agent-2": 1} |
| 367 | gc5 = GCounter.from_dict(d) |
| 368 | |
| 369 | gc4.equivalent(gc5) # True if same per-agent values |
| 370 | ``` |
| 371 | |
| 372 | --- |
| 373 | |
| 374 | ## Combining Primitives |
| 375 | |
| 376 | Complex CRDT state is built by composing primitives. The composition inherits the lattice |
| 377 | laws because each primitive satisfies them and because `join` is applied field-by-field. |
| 378 | |
| 379 | ### Example: a collaborative score header |
| 380 | |
| 381 | ```python |
| 382 | @dataclass |
| 383 | class ScoreHeaderCRDT: |
| 384 | """Convergent score header: tempo register + time_sig register + author set.""" |
| 385 | |
| 386 | tempo: LWWRegister[float] |
| 387 | time_sig: LWWRegister[str] |
| 388 | authors: ORSet[str] |
| 389 | |
| 390 | def join(self, other: ScoreHeaderCRDT) -> ScoreHeaderCRDT: |
| 391 | return ScoreHeaderCRDT( |
| 392 | tempo=self.tempo.join(other.tempo), |
| 393 | time_sig=self.time_sig.join(other.time_sig), |
| 394 | authors=self.authors.join(other.authors), |
| 395 | ) |
| 396 | ``` |
| 397 | |
| 398 | Because `LWWRegister.join` and `ORSet.join` both satisfy the three laws, `ScoreHeaderCRDT.join` |
| 399 | does too — for free, by composition. |
| 400 | |
| 401 | --- |
| 402 | |
| 403 | ## CRDTPlugin Integration |
| 404 | |
| 405 | The entry point in the core engine is `crdt_join_snapshots()` in `muse/core/merge_engine.py`. |
| 406 | The `muse merge` command calls it when `isinstance(plugin, CRDTPlugin)` is `True`: |
| 407 | |
| 408 | ```python |
| 409 | from muse.core.merge_engine import crdt_join_snapshots |
| 410 | from muse.domain import CRDTPlugin, CRDTSnapshotManifest |
| 411 | |
| 412 | # In merge_engine.py — called by the merge command |
| 413 | def crdt_join_snapshots( |
| 414 | plugin: CRDTPlugin, |
| 415 | ours: StateSnapshot, |
| 416 | theirs: StateSnapshot, |
| 417 | ) -> MergeResult: |
| 418 | crdt_a = plugin.to_crdt_state(ours) |
| 419 | crdt_b = plugin.to_crdt_state(theirs) |
| 420 | joined = plugin.join(crdt_a, crdt_b) |
| 421 | merged_snapshot = plugin.from_crdt_state(joined) |
| 422 | return MergeResult( |
| 423 | merged=merged_snapshot, |
| 424 | conflicts=[], # always empty — CRDT join never conflicts |
| 425 | applied_strategies={}, |
| 426 | dimension_reports={}, |
| 427 | ) |
| 428 | ``` |
| 429 | |
| 430 | Notice `conflicts=[]` is always empty. This is the CRDT guarantee: **no human intervention |
| 431 | is ever required.** |
| 432 | |
| 433 | --- |
| 434 | |
| 435 | ## Performance Notes |
| 436 | |
| 437 | | Primitive | Join complexity | Storage | |
| 438 | |-----------|----------------|---------| |
| 439 | | `VectorClock` | O(agents) | One int per agent | |
| 440 | | `LWWRegister` | O(1) | One value + timestamp | |
| 441 | | `ORSet` | O(n + m) tokens | One UUID per add operation | |
| 442 | | `RGA` | O(n log n) | One node per insert (tombstones retained) | |
| 443 | | `AWMap` | O(keys × tokens) | Per-key ORSet overhead | |
| 444 | | `GCounter` | O(agents) | One int per agent | |
| 445 | |
| 446 | **RGA memory warning:** `RGA` retains tombstoned elements forever (this is required for |
| 447 | commutativity). For domains with high churn (many inserts and deletes), implement periodic |
| 448 | garbage collection by taking a snapshot of the live sequence, creating a fresh `RGA`, and |
| 449 | re-inserting only the live elements. This is a safe operation because garbage collection |
| 450 | only affects elements both sides have observed as deleted — a coordination-free safe point. |
| 451 | |
| 452 | --- |
| 453 | |
| 454 | ## When Not to Use CRDTs |
| 455 | |
| 456 | CRDTs are not always the right choice. Use three-way merge (Phases 1–3) when: |
| 457 | |
| 458 | - **Humans are making creative decisions** — a DAW producer choosing a chord voicing should |
| 459 | not have their choice silently overwritten by a LWW timestamp. Use OT merge with conflicts. |
| 460 | |
| 461 | - **The domain has invariants that CRDTs cannot enforce** — CRDTs converge, but they can |
| 462 | produce semantically invalid states. A MIDI file with notes outside the pitch range 0–127 |
| 463 | is technically convergent but musically invalid. Invariant enforcement requires coordination. |
| 464 | |
| 465 | - **Conflict visibility is a feature** — in code review, you want conflicts to be visible |
| 466 | to humans. "This merge is clean" is meaningful precisely because conflicts exist. |
| 467 | |
| 468 | - **You have a clear authority model** — if one agent is the "source of truth," LWW with |
| 469 | that agent always winning is fine. But that's a policy, not a CRDT. |
| 470 | |
| 471 | Use CRDTs when all of the following are true: |
| 472 | 1. Many agents write concurrently (more than humans can coordinate) |
| 473 | 2. No single agent is the authority |
| 474 | 3. All writes are semantically valid in isolation |
| 475 | 4. Convergence is more important than precision |