bda49bdb feat: redesign .museignore as TOML with domain-scoped sections (#100) Gabriel Cardona <cgcardona@gmail.com> 1d ago
# Entity Identity in Muse

## The problem with content hashes as identity

Muse uses SHA-256 content hashes to address every object in its store. Two
blobs with identical bytes have the same hash: content equality. This is
correct for immutable storage but wrong for *entity identity*.

When a musician changes a note's velocity from 80 to 100, the note has the
same identity from the musician's perspective. But the content hash changes,
so the old diff model produces a `DeleteOp + InsertOp` pair: the note
appears to have been removed and a completely different note inserted. All
lineage, provenance, and causal history are lost.
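The following minimal Python sketch makes the failure concrete. It assumes two things, purely for illustration. First, a note's content ID is the SHA-256 of its five fields serialized as sorted-key JSON. Second, the old diff model compares notes only by content ID. `content_id` and `naive_diff` are hypothetical names, not Muse APIs.

```python
import hashlib
import json


def content_id(note: dict[str, int]) -> str:
    """Hypothetical content ID: SHA-256 of the note's fields as sorted-key JSON."""
    payload = json.dumps(note, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def naive_diff(old: list[dict[str, int]], new: list[dict[str, int]]) -> list[tuple[str, str]]:
    """Content-keyed diff: a hash present on only one side becomes a delete or an insert."""
    old_ids = {content_id(n) for n in old}
    new_ids = {content_id(n) for n in new}
    ops = [("DeleteOp", cid) for cid in sorted(old_ids - new_ids)]
    ops += [("InsertOp", cid) for cid in sorted(new_ids - old_ids)]
    return ops


before = {"pitch": 60, "velocity": 80, "start_tick": 0, "duration_ticks": 480, "channel": 0}
after = dict(before, velocity=100)  # the same musical note, only louder

ops = naive_diff([before], [after])
print([kind for kind, _ in ops])  # ['DeleteOp', 'InsertOp']
```

Nothing in the two ops links the deleted hash to the inserted one. That missing link is exactly what an entity ID supplies.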

## The solution: stable entity IDs

A `NoteEntity` in `muse/plugins/midi/entity.py` extends the five `NoteKey`
fields with an optional `entity_id`: a UUID4 that is assigned at first
insertion and **never changes**, regardless of how the note's fields are
mutated later.

```
NoteKey:    (pitch, velocity, start_tick, duration_ticks, channel)
            ↑ content equality

NoteEntity: NoteKey + entity_id (UUID4)
            ↑ stable identity across mutations
```
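Here is a minimal sketch of the two shapes. It assumes `NoteKey` is a plain named tuple and `NoteEntity` is a frozen dataclass carrying the same five fields plus `entity_id`. The real classes in `entity.py` may be laid out differently. The point it illustrates: editing a content field changes the key but leaves `entity_id` untouched.

```python
import uuid
from dataclasses import dataclass, replace
from typing import NamedTuple, Optional


class NoteKey(NamedTuple):
    """Content equality: notes with the same five fields are the same content."""
    pitch: int
    velocity: int
    start_tick: int
    duration_ticks: int
    channel: int


@dataclass(frozen=True)
class NoteEntity:
    """The five NoteKey fields plus an optional, stable entity_id (hypothetical layout)."""
    pitch: int
    velocity: int
    start_tick: int
    duration_ticks: int
    channel: int
    entity_id: Optional[str] = None  # UUID4 string, assigned at first insertion

    def key(self) -> NoteKey:
        """Project onto the content fields only; entity_id is deliberately excluded."""
        return NoteKey(self.pitch, self.velocity, self.start_tick,
                       self.duration_ticks, self.channel)


note = NoteEntity(60, 80, 0, 480, 0, entity_id=str(uuid.uuid4()))
louder = replace(note, velocity=100)  # edit a content field; entity_id is copied as-is

print(louder.key() != note.key())          # True: content changed
print(louder.entity_id == note.entity_id)  # True: identity preserved
```

`dataclasses.replace` copies every field not named in the call, which is why `entity_id` survives the velocity edit.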

## Entity assignment heuristic

`assign_entity_ids()` maps a new note list onto entity IDs from the prior
commit using a three-tier matching strategy:

1. **Exact content match**: all five fields identical → same entity, no mutation.
2. **Fuzzy match**: same pitch + channel, `|Δtick| ≤` a tick threshold (default 10),
   and `|Δvelocity| ≤` a velocity threshold (default 20) → same entity, emit `MutateOp`.
3. **No match** → new entity, fresh UUID4, emit `InsertOp`.

Notes in the prior index that no new note matched → emit `DeleteOp`.
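The three tiers can be sketched as below. This is a hypothetical re-implementation, not the code in `entity.py`, and it makes several assumptions:

- `Δtick` compares `start_tick`.
- Tier 1 is resolved for every note before tier 2 runs, so a fuzzy match cannot claim a prior note that has an exact partner.
- Fuzzy ties break on smallest tick distance, then smallest velocity distance.
- Ops are simple `(kind, entity_id)` tuples.

`Entity` is likewise a stand-in for an index entry.

```python
import uuid
from typing import NamedTuple, Optional


class NoteKey(NamedTuple):
    pitch: int
    velocity: int
    start_tick: int
    duration_ticks: int
    channel: int


class Entity(NamedTuple):
    """Hypothetical index entry: a stable ID plus the note's current content."""
    entity_id: str
    key: NoteKey


def assign_entity_ids(
    prior: list[Entity],
    notes: list[NoteKey],
    tick_threshold: int = 10,
    velocity_threshold: int = 20,
) -> tuple[list[Entity], list[tuple[str, str]]]:
    """Sketch of the three-tier matcher; returns (entities for `notes`, ops)."""
    unmatched = list(prior)  # prior entities not yet claimed by a new note
    assigned: list[Optional[Entity]] = [None] * len(notes)
    ops: list[tuple[str, str]] = []

    # Tier 1: exact content match -> same entity, no op.
    for i, note in enumerate(notes):
        for ent in unmatched:
            if ent.key == note:
                assigned[i] = Entity(ent.entity_id, note)
                unmatched.remove(ent)
                break

    # Tier 2: same pitch + channel, within both thresholds -> same entity, MutateOp.
    for i, note in enumerate(notes):
        if assigned[i] is not None:
            continue
        candidates = [
            ent for ent in unmatched
            if ent.key.pitch == note.pitch
            and ent.key.channel == note.channel
            and abs(ent.key.start_tick - note.start_tick) <= tick_threshold
            and abs(ent.key.velocity - note.velocity) <= velocity_threshold
        ]
        if candidates:
            # Prefer the closest candidate in time, then in velocity.
            best = min(candidates, key=lambda e: (abs(e.key.start_tick - note.start_tick),
                                                  abs(e.key.velocity - note.velocity)))
            assigned[i] = Entity(best.entity_id, note)
            unmatched.remove(best)
            ops.append(("MutateOp", best.entity_id))

    # Tier 3: no match -> new entity with a fresh UUID4, InsertOp.
    for i, note in enumerate(notes):
        if assigned[i] is None:
            new_id = str(uuid.uuid4())
            assigned[i] = Entity(new_id, note)
            ops.append(("InsertOp", new_id))

    # Prior entities that nothing matched -> DeleteOp.
    ops.extend(("DeleteOp", ent.entity_id) for ent in unmatched)
    return [e for e in assigned if e is not None], ops


prior = [Entity("a", NoteKey(60, 80, 0, 480, 0)), Entity("b", NoteKey(64, 90, 480, 480, 0))]
notes = [NoteKey(60, 100, 0, 480, 0), NoteKey(67, 70, 960, 480, 0)]
entities, ops = assign_entity_ids(prior, notes)
print([kind for kind, _ in ops])  # ['MutateOp', 'InsertOp', 'DeleteOp']
```

In the usage lines, the velocity bump from 80 to 100 sits exactly at the default threshold, so note `a` keeps its ID. The unrelated pitch-67 note gets a fresh ID, and `b` is reported as deleted.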

## MutateOp vs. DeleteOp + InsertOp

The `MutateOp` in `muse/domain.py` carries:

| Field | Description |
|-------|-------------|
| `entity_id` | Stable entity ID |
| `old_content_id` | SHA-256 of the note before the mutation |
| `new_content_id` | SHA-256 of the note after the mutation |
| `fields` | `dict[field_name, FieldMutation(old, new)]` |
| `old_summary` / `new_summary` | Human-readable before/after strings |

Because `entity_id` stays fixed and `fields` records exactly which fields
changed, this enables queries like "show me all velocity edits to the cello
part" across the full commit history.
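Here is a sketch of the `MutateOp` shape from the table as Python dataclasses, plus a query for the velocity-edit example. The field names come from the table. The types are assumptions made for illustration. So is the `history` input, an iterable of `(commit_id, track_name, op)` triples: a `MutateOp` does not itself record which track it belongs to. `velocity_edits` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Iterator


@dataclass(frozen=True)
class FieldMutation:
    old: Any
    new: Any


@dataclass
class MutateOp:
    entity_id: str         # stable entity ID
    old_content_id: str    # SHA-256 of the note before the mutation
    new_content_id: str    # SHA-256 of the note after the mutation
    fields: dict[str, FieldMutation]
    old_summary: str
    new_summary: str


def velocity_edits(
    history: Iterable[tuple[str, str, MutateOp]], track: str
) -> Iterator[tuple[str, MutateOp]]:
    """Yield (commit_id, op) for each MutateOp on `track` whose changed fields include velocity."""
    for commit_id, track_name, op in history:
        if track_name == track and "velocity" in op.fields:
            yield commit_id, op


op = MutateOp(
    entity_id="e1",
    old_content_id="<sha256 before>",  # placeholder values
    new_content_id="<sha256 after>",
    fields={"velocity": FieldMutation(80, 100)},
    old_summary="pitch=60 vel=80",
    new_summary="pitch=60 vel=100",
)
history = [("c1", "cello", op), ("c2", "violin", op)]
for commit_id, edit in velocity_edits(history, "cello"):
    change = edit.fields["velocity"]
    print(commit_id, change.old, "->", change.new)  # c1 80 -> 100
```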

## Entity index storage

Entity indexes live under `.muse/entity_index/` as derived artifacts:

```
.muse/entity_index/
    <commit_id[:16]>/
        <track_safe_name>_<hash[:8]>.json
```
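A path builder for this layout might look like the following. The directory structure comes from the tree. How `track_safe_name` is derived and what `hash` covers are not specified here, so this sketch makes two hypothetical assumptions. Unsafe characters become underscores. The 8-character suffix is a SHA-256 prefix of the raw track name, which keeps apart two names that sanitize to the same string. `entity_index_path` is not a Muse API.

```python
import hashlib
import re
from pathlib import Path


def entity_index_path(repo_root: Path, commit_id: str, track_name: str) -> Path:
    """Hypothetical builder for .muse/entity_index/<commit_id[:16]>/<safe>_<hash[:8]>.json."""
    # Assumption: each run of characters outside [A-Za-z0-9_-] becomes one underscore.
    safe_name = re.sub(r"[^A-Za-z0-9_-]+", "_", track_name)
    # Assumption: the suffix is the first 8 hex chars of SHA-256 over the raw track name.
    digest = hashlib.sha256(track_name.encode("utf-8")).hexdigest()[:8]
    return repo_root / ".muse" / "entity_index" / commit_id[:16] / f"{safe_name}_{digest}.json"


path = entity_index_path(Path("repo"), "0123456789abcdef" * 4, "Cello 1")
print(path.parent.as_posix())  # repo/.muse/entity_index/0123456789abcdef
```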

These index files are fully rebuildable from commit history and should be
added to the `[domain.midi]` section of `.museignore` (TOML format) in CI to
avoid accidental commits:

```toml
[domain.midi]
patterns = [
    ".muse/entity_index/",
]
```

## Independence from core

Entity identity is purely a music-plugin concern. The core engine
(`muse/core/`) never imports from `muse/plugins/`. The `MutateOp` and
`FieldMutation` types in `muse/domain.py` are domain-agnostic: a genomics
plugin can use the same types to track mutations in a nucleotide sequence.
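A field-level diff helper that knows nothing about notes or nucleotides can illustrate the domain-agnostic claim. `field_mutations` is a hypothetical helper, and the record shapes are invented for illustration. Only the idea of one `FieldMutation(old, new)` per changed field comes from `muse/domain.py`.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class FieldMutation:
    old: Any
    new: Any


def field_mutations(old: Mapping[str, Any], new: Mapping[str, Any]) -> dict[str, FieldMutation]:
    """Domain-agnostic: report every field whose value differs between two flat records."""
    return {
        name: FieldMutation(old.get(name), new.get(name))
        for name in sorted(set(old) | set(new))
        if old.get(name) != new.get(name)
    }


# MIDI plugin: a velocity edit on a note.
note_change = field_mutations(
    {"pitch": 60, "velocity": 80, "start_tick": 0},
    {"pitch": 60, "velocity": 100, "start_tick": 0},
)

# Hypothetical genomics plugin: a point substitution at one position.
base_change = field_mutations(
    {"position": 1042, "base": "A"},
    {"position": 1042, "base": "G"},
)

print(note_change)  # {'velocity': FieldMutation(old=80, new=100)}
print(base_change)  # {'base': FieldMutation(old='A', new='G')}
```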

## Related files

| File | Role |
|------|------|
| `muse/domain.py` | `MutateOp`, `FieldMutation`, `EntityProvenance` |
| `muse/plugins/midi/entity.py` | `NoteEntity`, `EntityIndex`, `assign_entity_ids`, `diff_with_entity_ids` |
| `muse/plugins/midi/midi_diff.py` | `diff_midi_notes_with_entities()` |
| `tests/test_entity.py` | Unit tests |