9ee9c39c refactor: rename music→midi domain, strip all 5-dim backward compat Gabriel Cardona <gabriel@tellurstori.com> 1d ago
# Entity Identity in Muse

## The problem with content hashes as identity

Muse uses SHA-256 content hashes to address every object in its store. Two
blobs with identical bytes have the same hash, which gives content equality.
This is correct for immutable storage but wrong for *entity identity*.

When a musician changes a note's velocity from 80 to 100, the note keeps the
same identity from the musician's perspective. But the content hash changes,
so the old diff model produces a `DeleteOp + InsertOp` pair: the note
appears to have been removed and a completely different note inserted. All
lineage, provenance, and causal history are lost.

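The effect can be reproduced in a few lines. This is a minimal sketch, not Muse's actual hashing code: the `content_id` helper and the canonical-JSON encoding are assumptions made for illustration, and the real serialization may differ.

```python
import hashlib
import json


def content_id(note: dict) -> str:
    """Hypothetical content hash: SHA-256 over a canonical JSON encoding."""
    payload = json.dumps(note, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


before = {"pitch": 60, "velocity": 80, "start_tick": 0, "duration_ticks": 480, "channel": 0}
after = {**before, "velocity": 100}  # same note to the musician, louder

# Only one field changed, yet the two hashes are unrelated, so a diff
# keyed on content hashes sees a delete plus an insert.
same_content = content_id(before) == content_id(after)
print(same_content)
```

Whatever the encoding, any change to a hashed field yields a different digest, which is why a hash alone cannot carry identity across edits.
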
## The solution: stable entity IDs

A `NoteEntity` in `muse/plugins/midi/entity.py` extends the five `NoteKey`
fields with an optional `entity_id`: a UUID4 that is assigned at first
insertion and **never changes**, regardless of how the note's fields are
mutated later.

```
NoteKey:    (pitch, velocity, start_tick, duration_ticks, channel)
            ↑ content equality

NoteEntity: NoteKey + entity_id (UUID4)
                      ↑ stable identity across mutations
```

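As a rough illustration of the two shapes, the sketch below models them as frozen dataclasses. The class layout is an assumption; the real definitions in `muse/plugins/midi/entity.py` may use a different construct (for example a `TypedDict`), and only the field names come from this document.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class NoteKey:
    """The five content fields; equality here is content equality."""
    pitch: int
    velocity: int
    start_tick: int
    duration_ticks: int
    channel: int


@dataclass(frozen=True)
class NoteEntity(NoteKey):
    """NoteKey plus an optional stable ID (layout assumed for illustration)."""
    entity_id: Optional[str] = None


# The ID is assigned once, at first insertion.
original = NoteEntity(60, 80, 0, 480, 0, entity_id=str(uuid.uuid4()))

# Mutating a content field produces a new value but carries the ID along.
louder = replace(original, velocity=100)
```

Here `louder` differs from `original` in content, yet both share one `entity_id`, which is the property the rest of this document relies on.
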
## Entity assignment heuristic

`assign_entity_ids()` maps a new note list onto entity IDs from the prior
commit using a three-tier matching strategy:

1. **Exact content match**: all five fields identical → same entity, no mutation.
2. **Fuzzy match**: same pitch + channel, `|Δtick| ≤ threshold` (tick threshold, default 10),
   and `|Δvelocity| ≤ threshold` (velocity threshold, default 20) → same entity, emit `MutateOp`.
3. **No match** → new entity, fresh UUID4, emit `InsertOp`.

Any note in the prior index that matched nothing is treated as removed and emits a `DeleteOp`.

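The three tiers above can be sketched as follows. This is not the real `assign_entity_ids()`: the function name `assign_ids_sketch`, its signature, the tuple-based op records, and the choice to take the first fuzzy candidate rather than the nearest are all assumptions. Running the exact pass over every note before the fuzzy pass is also an assumption, made so that an exact match is never stolen by a fuzzy one.

```python
from __future__ import annotations

import uuid
from typing import NamedTuple


class Note(NamedTuple):
    pitch: int
    velocity: int
    start_tick: int
    duration_ticks: int
    channel: int


def assign_ids_sketch(prior: dict[str, Note], new_notes: list[Note],
                      tick_threshold: int = 10, velocity_threshold: int = 20):
    """Return (entity_ids, ops). `prior` maps entity_id -> Note from the last commit."""
    unmatched = dict(prior)          # prior entities not yet claimed
    assigned: dict[int, str] = {}    # index into new_notes -> entity_id
    ops: list[tuple[str, str]] = []  # (op name, entity_id)

    # Tier 1: exact content match. Same entity, nothing emitted.
    for i, note in enumerate(new_notes):
        eid = next((e for e, old in unmatched.items() if old == note), None)
        if eid is not None:
            assigned[i] = eid
            del unmatched[eid]

    # Tier 2: fuzzy match on pitch + channel within both thresholds.
    for i, note in enumerate(new_notes):
        if i in assigned:
            continue
        eid = next((e for e, old in unmatched.items()
                    if old.pitch == note.pitch and old.channel == note.channel
                    and abs(old.start_tick - note.start_tick) <= tick_threshold
                    and abs(old.velocity - note.velocity) <= velocity_threshold), None)
        if eid is not None:
            assigned[i] = eid
            del unmatched[eid]
            ops.append(("MutateOp", eid))

    # Tier 3: no match. Fresh UUID4, emit an insert.
    for i in range(len(new_notes)):
        if i not in assigned:
            assigned[i] = str(uuid.uuid4())
            ops.append(("InsertOp", assigned[i]))

    # Prior entities that nothing claimed are deletions.
    ops.extend(("DeleteOp", eid) for eid in unmatched)
    return [assigned[i] for i in range(len(new_notes))], ops


prior = {"a": Note(60, 80, 0, 480, 0), "b": Note(64, 90, 480, 480, 0), "c": Note(67, 70, 960, 480, 0)}
new = [Note(60, 80, 0, 480, 0), Note(64, 100, 485, 480, 0), Note(72, 70, 0, 480, 0)]
ids, ops = assign_ids_sketch(prior, new)
```

In this example the first note keeps ID `a` silently, the second keeps `b` with a `MutateOp` (velocity moved by 10, start by 5 ticks), the third gets a fresh ID, and `c` is reported as deleted.
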
## MutateOp vs. DeleteOp + InsertOp

The `MutateOp` in `muse/domain.py` carries:

| Field | Description |
|-------|-------------|
| `entity_id` | Stable entity ID |
| `old_content_id` | SHA-256 of the note before the mutation |
| `new_content_id` | SHA-256 of the note after the mutation |
| `fields` | `dict[field_name, FieldMutation(old, new)]` |
| `old_summary` / `new_summary` | Human-readable before/after strings |

This enables queries like "show me all velocity edits to the cello part" across
the full commit history.

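To make the table concrete, here is a hedged sketch of how such a record might be built and queried. The dataclass layout, the helpers `make_mutate_op` and `velocity_edits`, the summary format, and the JSON-based hashing are all hypothetical; only the field names are taken from the table above.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class FieldMutation:
    old: Any
    new: Any


@dataclass(frozen=True)
class MutateOp:
    entity_id: str
    old_content_id: str
    new_content_id: str
    fields: dict[str, FieldMutation]
    old_summary: str
    new_summary: str


def _content_id(note: dict[str, int]) -> str:
    # Assumed encoding: canonical JSON. The real serialization may differ.
    return hashlib.sha256(json.dumps(note, sort_keys=True).encode("utf-8")).hexdigest()


def _summary(note: dict[str, int]) -> str:
    # Hypothetical human-readable form.
    return f"pitch={note['pitch']} vel={note['velocity']} tick={note['start_tick']}"


def make_mutate_op(entity_id: str, old: dict[str, int], new: dict[str, int]) -> MutateOp:
    """Record only the fields whose values differ between old and new."""
    changed = {k: FieldMutation(old[k], new[k]) for k in old if old[k] != new[k]}
    return MutateOp(entity_id, _content_id(old), _content_id(new),
                    changed, _summary(old), _summary(new))


def velocity_edits(ops: list[MutateOp]) -> list[MutateOp]:
    """Filter a stream of ops down to those that touched velocity."""
    return [op for op in ops if "velocity" in op.fields]


old_note = {"pitch": 60, "velocity": 80, "start_tick": 0, "duration_ticks": 480, "channel": 0}
new_note = {**old_note, "velocity": 100}
op = make_mutate_op("entity-1", old_note, new_note)
```

Because each op names the changed fields explicitly, a history query reduces to a filter over ops instead of a pairwise comparison of deleted and inserted notes.
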
## Entity index storage

Entity indexes live under `.muse/entity_index/` as derived artifacts:

```
.muse/entity_index/
  <commit_id[:16]>/
    <track_safe_name>_<hash[:8]>.json
```

They are fully rebuildable from commit history and should be added to
`.museignore` in CI to avoid accidental commits.

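The layout above translates into a small path helper. This is a sketch under assumptions: the function `entity_index_path` is hypothetical, the track name is taken as already sanitized, and the document does not say what the `<hash>` component is a hash of, so it is passed in as an opaque string.

```python
from pathlib import PurePosixPath


def entity_index_path(repo_root: str, commit_id: str,
                      track_safe_name: str, track_hash: str) -> PurePosixPath:
    """Build <root>/.muse/entity_index/<commit_id[:16]>/<name>_<hash[:8]>.json."""
    return (PurePosixPath(repo_root) / ".muse" / "entity_index"
            / commit_id[:16] / f"{track_safe_name}_{track_hash[:8]}.json")


# Placeholder identifiers, not real commit or track hashes.
path = entity_index_path("repo", "0123456789abcdef" + "f" * 48, "cello", "deadbeefcafe")
print(path)
```
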
## Independence from core

Entity identity is purely a MIDI-plugin concern. The core engine
(`muse/core/`) never imports from `muse/plugins/`. The `MutateOp` and
`FieldMutation` types in `muse/domain.py` are domain-agnostic, so a genomics
plugin can use the same types to track mutations in a nucleotide sequence.

## Related files

| File | Role |
|------|------|
| `muse/domain.py` | `MutateOp`, `FieldMutation`, `EntityProvenance` |
| `muse/plugins/midi/entity.py` | `NoteEntity`, `EntityIndex`, `assign_entity_ids`, `diff_with_entity_ids` |
| `muse/plugins/midi/midi_diff.py` | `diff_midi_notes_with_entities()` |
| `tests/test_entity.py` | Unit tests |