docs/reference/security.md · cgcardona/muse — MuseHub

dfaf1b77 refactor: rename muse-work/ → state/ Gabriel Cardona <gabriel@tellurstori.com> 8h ago

# Security Architecture — Muse Trust Boundary Reference

Muse is designed to run at the scale of millions of agent calls per minute.
Every data path that crosses a trust boundary — user input, remote HTTP
responses, manifest keys from the object store, terminal output — is guarded
by an explicit validation primitive. This document describes each guard,
where it applies, and the attack it prevents.

---

## Table of Contents

1. [Threat Model](#threat-model)
2. [Trust Boundary Design](#trust-boundary-design)
3. [Validation Module — `muse/core/validation.py`](#validation-module)
4. [Object ID & Ref ID Validation](#object-id--ref-id-validation)
5. [Branch Name & Repo ID Validation](#branch-name--repo-id-validation)
6. [Path Containment — Zip-Slip Defence](#path-containment--zip-slip-defence)
7. [Display Sanitization — ANSI Injection Defence](#display-sanitization--ansi-injection-defence)
8. [Glob Injection Prevention](#glob-injection-prevention)
9. [Numeric Guards](#numeric-guards)
10. [XML Safety — `muse/core/xml_safe.py`](#xml-safety)
11. [HTTP Transport Hardening](#http-transport-hardening)
12. [Snapshot Integrity](#snapshot-integrity)
13. [Identity Store Security](#identity-store-security)
14. [Size Caps](#size-caps)

---

## Threat Model

Muse's primary threat surface has four entry points:

| Entry point | Source of untrusted data |
|---|---|
| CLI arguments | User shell input, agent-generated commands |
| Environment variables | CI systems, compromised orchestrators |
| Remote HTTP responses | MuseHub server, MitM attacker |
| On-disk data | Tampered `.muse/` directory, crafted MIDI / MusicXML files |

At the scale of millions of agent calls per minute, even a low-probability
exploitation path becomes a near-certainty. Every function that accepts
external data must validate it before use.

---

## Trust Boundary Design

Muse uses a layered trust model:

```
External world (untrusted)
        |
        |  CLI args, env vars, HTTP responses, files
        v
CLI commands  ←──────────────── muse/cli/commands/
        |
        |  validated, typed data only
        v
Core engine   ←──────────────── muse/core/
        |
        |  content-addressed blobs
        v
Object store  ←──────────────── muse/core/object_store.py
```

Rule: data is validated at the point it crosses from the external world
into the CLI layer, or from the network into the core. Internal functions
that call each other do not re-validate data they receive from trusted callers.

The validation module, `muse/core/validation.py`, sits at the bottom of the
dependency graph: every layer may import it, and it imports no other Muse
module.

---

## Validation Module

`muse/core/validation.py` — the single source of all trust-boundary
primitives.

```
muse/core/validation.py
├── validate_object_id(s)         → str | raises ValueError
├── validate_ref_id(s)            → str | raises ValueError
├── validate_branch_name(name)    → str | raises ValueError
├── validate_repo_id(repo_id)     → str | raises ValueError
├── validate_domain_name(domain)  → str | raises ValueError
├── contain_path(base, rel)       → pathlib.Path | raises ValueError
├── sanitize_glob_prefix(prefix)  → str (never raises)
├── sanitize_display(s)           → str (never raises)
├── clamp_int(value, lo, hi)      → int | raises ValueError
└── finite_float(value, fallback) → float (never raises)
```

The convention: functions named `validate_*` raise `ValueError` on bad input;
functions named `sanitize_*` strip bad characters and always return a safe
string. Of the remaining helpers, `contain_path` and `clamp_int` raise on bad
input, while `finite_float` substitutes its fallback and never raises.

---

## Object ID & Ref ID Validation

Functions: `validate_object_id(s)` and `validate_ref_id(s)`
Guard: enforces exactly 64 lowercase hexadecimal characters.
Attack prevented: path traversal via crafted object or commit IDs.

### Why this matters

Object IDs are used to construct filesystem paths:

```
.muse/objects/<id[:2]>/<id[2:]>
.muse/commits/<commit_id>.json
```

If an ID were interpolated unchecked, a crafted value such as
`../../../etc/passwd` (padded to the expected length) would construct a path
outside `.muse/`. Enforcing the 64-character hex format closes this class of
attack completely: a string matching `[0-9a-f]{64}` cannot contain a path
separator or a dot.

### Where applied

- `object_store.object_path()` — before constructing the shard path
- `object_store.restore_object()` — before reading a blob
- `object_store.write_object()` — verifies the provided ID is valid hex
  and checks that the written content hashes to the provided ID
  (content integrity, not just format integrity)
- `store.resolve_commit_ref()` — sanitizes user-supplied ref before prefix scan
- `store.store_pulled_commit()` — validates commit and snapshot IDs from remote
- `merge_engine.read_merge_state()` — validates IDs read from MERGE_STATE.json
- `merge_engine.apply_resolution()` — validates the resolution object ID

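The format check described above can be sketched as follows. This is a minimal illustration, assuming a regex-based implementation; `validate_object_id` is the name used in this document, while the `_HEX64` constant and the standalone `object_path` helper are hypothetical and may differ from the code in `muse/core/validation.py` and `object_store.py`.

```python
import pathlib
import re

# Hypothetical constant: exactly 64 lowercase hexadecimal characters.
_HEX64 = re.compile(r"[0-9a-f]{64}")


def validate_object_id(s: str) -> str:
    """Return s unchanged if it is a 64-char lowercase hex string, else raise."""
    if not isinstance(s, str) or _HEX64.fullmatch(s) is None:
        raise ValueError(f"invalid object ID: {s!r}")
    return s


def object_path(repo_root: pathlib.Path, object_id: str) -> pathlib.Path:
    """Build the shard path only after the ID has passed validation."""
    oid = validate_object_id(object_id)
    return repo_root / ".muse" / "objects" / oid[:2] / oid[2:]
```
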
---

## Branch Name & Repo ID Validation

Functions: `validate_branch_name(name)` and `validate_repo_id(repo_id)`
Guard: rejects backslashes, null bytes, CR/LF, leading/trailing dots,
consecutive dots, consecutive slashes, leading/trailing slashes, and names
longer than 255 characters.
Attack prevented: path traversal via branch names used in ref paths, null
byte injection, and log injection via CR/LF.

### Branch name rules

| Allowed | Rejected |
|---|---|
| `main`, `dev`, `feature/my-branch` | Backslash: `evil\branch` |
| Digits, hyphens, underscores | Null byte: `branch\x00name` |
| Forward slashes (namespacing) | CR or LF: `branch\rname` |
| Up to 255 characters | Leading dot: `.hidden` |
| | Trailing dot: `branch.` |
| | Consecutive dots: `branch..name` |
| | Consecutive slashes: `feat//branch` |
| | Leading or trailing slash |

### Where applied

- `cli/commands/init.py` — `--default-branch` and `--domain` arguments
- `cli/commands/commit.py` — HEAD branch detection (HEAD-poisoning guard)
- `cli/commands/branch.py` — creation and deletion targets
- `cli/commands/checkout.py` — new branch creation via `-b`
- `cli/commands/merge.py` — target branch name
- `cli/commands/reset.py` — branch before writing the ref file
- `store.get_head_commit_id()` — branch from the ref layer

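The rejection rules in the table above can be expressed as a short sequence of checks. The sketch below is an illustration under stated assumptions, not the project's implementation: the error messages and the `_MAX_BRANCH_LEN` constant are hypothetical, and rejecting the empty string is an assumption that the rules above do not state.

```python
_MAX_BRANCH_LEN = 255  # length limit stated in the rules above


def validate_branch_name(name: str) -> str:
    """Return name unchanged if it satisfies the documented rules, else raise."""
    if not name:
        # Assumption: an empty name is rejected as well.
        raise ValueError("branch name is empty")
    if len(name) > _MAX_BRANCH_LEN:
        raise ValueError("branch name is longer than 255 characters")
    if any(ch in name for ch in ("\\", "\x00", "\r", "\n")):
        raise ValueError("branch name contains a forbidden character")
    if name.startswith((".", "/")) or name.endswith((".", "/")):
        raise ValueError("branch name has a leading or trailing dot or slash")
    if ".." in name or "//" in name:
        raise ValueError("branch name contains consecutive dots or slashes")
    return name
```
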
---

## Path Containment — Zip-Slip Defence

Function: `contain_path(base: pathlib.Path, rel: str) -> pathlib.Path`
Guard: joins `base / rel`, resolves symlinks, then asserts the result is
inside `base`.
Attack prevented: zip-slip (path traversal via manifest keys or
user-supplied relative paths).

### The zip-slip attack

A malicious archive or snapshot manifest can contain a key like
`../../.ssh/authorized_keys`. If the restore loop does:

```python
# Vulnerable: manifest_key comes from untrusted data and is used unchecked.
dest = workdir / manifest_key
dest.write_bytes(blob)
```

…then a crafted key writes outside the working directory. `contain_path`
closes this by checking:

```python
resolved = (base / rel).resolve()
# Path.is_relative_to() requires Python 3.9 or newer.
if not resolved.is_relative_to(base.resolve()):
    raise ValueError("Path traversal detected")
```

### Symlink escape

`contain_path` resolves symlinks before the containment check. A symlink
inside `state/` that points to `/etc/passwd` would resolve to a path
outside `state/`, causing `contain_path` to raise before any data is
written.

### Where applied

- `cli/commands/checkout.py` — `_checkout_snapshot()` for every restored file
- `cli/commands/merge.py` — `_restore_from_manifest()` for every restored file
- `cli/commands/reset.py` — `--hard` reset restore loop
- `cli/commands/revert.py` — revert restore loop
- `cli/commands/cherry_pick.py` — cherry-pick restore loop
- `cli/commands/stash.py` — `stash pop` restore loop
- All 7 semantic write commands (arpeggiate, humanize, invert, quantize,
  retrograde, velocity_normalize, midi_shard) — output file paths
- `merge_engine.read_merge_state()` — conflict path list from MERGE_STATE.json
- `merge_engine.apply_resolution()` — resolution target file path

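Putting the containment check and the symlink resolution together, a complete function might look like the sketch below. It follows the signature given in this section; the error message text and the decision to return the resolved path are assumptions, and `Path.is_relative_to()` needs Python 3.9 or newer. Because the join happens before resolution, an absolute `rel` such as `/etc/passwd` replaces the base entirely and is rejected by the same check.

```python
import pathlib


def contain_path(base: pathlib.Path, rel: str) -> pathlib.Path:
    """Join base and rel, resolve symlinks, and require the result to stay inside base."""
    base_resolved = base.resolve()
    # resolve() collapses ".." segments and follows symlinks that exist on disk.
    resolved = (base_resolved / rel).resolve()
    if not resolved.is_relative_to(base_resolved):
        raise ValueError(f"Path traversal detected: {rel!r}")
    return resolved
```
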
---

## Display Sanitization — ANSI Injection Defence

Function: `sanitize_display(s: str) -> str`
Guard: strips all C0 control characters except `\t` and `\n`, plus DEL
(`\x7f`) and C1 control characters (`\x80–\x9f`).
Attack prevented: ANSI/OSC terminal escape injection via commit messages,
branch names, author fields, and other user-controlled strings echoed to the
terminal.

### The attack

A commit message like:

```
Add feature\x1b]2;Hacked terminal title\x07 (harmless-looking)
```

…would, when echoed to a terminal, silently change the terminal's title bar or
execute other OSC/CSI sequences. At millions of agent calls per minute, a
malicious agent could systematically inject escape sequences into commit
messages that other users' terminals execute.

### Characters stripped

| Code point | Name | Why stripped |
|---|---|---|
| `\x00–\x08` | C0 (NUL to BS) | Control bytes; no legitimate use in display |
| `\x0b–\x0c` | VT, FF | Not standard line breaks; terminal control |
| `\x0d` | CR | Cursor return — log injection |
| `\x0e–\x1a` | SO to SUB | Control shift codes |
| `\x1b` | ESC | ANSI escape sequence start |
| `\x1c–\x1f` | FS to US | Control separators |
| `\x7f` | DEL | Backspace-style control |
| `\x80–\x9f` | C1 | CSI (`\x9b`) and other C1 escape starters |

Preserved: `\t` (tab) and `\n` (newline) — legitimate in commit messages.

### Where applied

All `typer.echo()` paths that output user-controlled strings:
`log`, `tag`, `branch`, `checkout`, `merge`, `reset`, `revert`,
`cherry_pick`, `commit`, `find_phrase`, `agent_map`.

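The table of stripped characters maps directly onto a single character class. The sketch below assumes a regex-based implementation; the `_CONTROL_CHARS` name is hypothetical. Applied to the example commit message above, it removes ESC and BEL, so the leftover `]2;Hacked terminal title` is printed as inert text instead of being interpreted by the terminal.

```python
import re

# C0 controls except \t (\x09) and \n (\x0a), plus DEL (\x7f) and the C1 range.
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b-\x1f\x7f\x80-\x9f]")


def sanitize_display(s: str) -> str:
    """Remove terminal control characters from s; tab and newline are kept."""
    return _CONTROL_CHARS.sub("", s)
```
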
---

## Glob Injection Prevention

Function: `sanitize_glob_prefix(prefix: str) -> str`
Guard: strips the glob metacharacters `*`, `?`, `[`, `]`, `{`, `}` from
a string before it is used in a `pathlib.Path.glob()` pattern.
Attack prevented: glob injection turning a targeted prefix lookup into an
arbitrary filesystem scan.

The function `_find_commit_by_prefix()` in `store.py` constructs:

```python
list(commits_dir.glob(f"{sanitized}*.json"))
```

Without sanitization, a crafted prefix like `*/` would turn the pattern into
`*/*.json` and scan every subdirectory of `.muse/commits/`, and a prefix of
`**/` would walk the entire directory tree rooted there.

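A minimal sketch of the stripping step and of the lookup it protects, assuming a translation-table implementation. `sanitize_glob_prefix` is the name used in this document; `_GLOB_META` and `find_commits_by_prefix` are hypothetical names for illustration. The sketch only removes metacharacters and leaves every other check on the ref to the callers.

```python
import pathlib

# Translation table that deletes the six metacharacters listed above.
_GLOB_META = str.maketrans("", "", "*?[]{}")


def sanitize_glob_prefix(prefix: str) -> str:
    """Delete glob metacharacters from prefix; never raises for str input."""
    return prefix.translate(_GLOB_META)


def find_commits_by_prefix(commits_dir: pathlib.Path, prefix: str) -> list[pathlib.Path]:
    """Hypothetical lookup: match commit files whose names start with prefix."""
    sanitized = sanitize_glob_prefix(prefix)
    # The only wildcard left in the pattern is the trailing one added here.
    return sorted(commits_dir.glob(f"{sanitized}*.json"))
```
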
---

## Numeric Guards

Functions: `clamp_int(value, lo, hi, name)` and `finite_float(value, fallback)`
Guard: `clamp_int` raises `ValueError` for out-of-range integers;
`finite_float` returns `fallback` for `Inf` / `-Inf` / `NaN` floats.
Attack prevented: resource exhaustion via large numeric arguments; NaN
propagation causing silent computation corruption.

### Where applied

| Command | Flag | Bounds |
|---|---|---|
| `muse log` | `--max-count` | ≥ 1 |
| `muse find_phrase` | `--depth` | 1–10,000 |
| `muse agent_map` | `--depth` | 1–10,000 |
| `muse find_phrase` | `--min-score` | 0.0–1.0 |
| `muse humanize` | `--timing` | ≤ 1.0 beat |
| `muse humanize` | `--velocity` | ≤ 127 |
| `muse invert` | `--pivot` | 0–127 (MIDI note range) |
| MIDI parser | `tempo` | guard against `tempo=0` (division by zero) |
| MIDI parser | `divisions` | guard against negative or zero values |

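The two guards are small enough to sketch in full. The version below follows the behaviour described in this section: `clamp_int` rejects out-of-range values with `ValueError` instead of silently clamping them, and `finite_float` never raises. The default value for `name` and the wording of the error message are assumptions.

```python
import math


def clamp_int(value: int, lo: int, hi: int, name: str = "value") -> int:
    """Return value if lo <= value <= hi, else raise ValueError naming the argument."""
    if not lo <= value <= hi:
        raise ValueError(f"{name} must be between {lo} and {hi}, got {value}")
    return value


def finite_float(value: float, fallback: float) -> float:
    """Return value if it is finite, else fallback (covers NaN, Inf and -Inf)."""
    return value if math.isfinite(value) else fallback
```
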
---

## XML Safety

Module: `muse/core/xml_safe.py`
Guard: wraps `defusedxml.ElementTree.parse()` behind a typed `SafeET`
class.
Attack prevented: Billion Laughs (entity expansion DoS), XXE (external
entity credential theft), and SSRF via XML.

### The attacks

Billion Laughs:
A chain of DTD-defined entities in which each entity expands to several
copies of the previous one, so the expansion grows exponentially. Parsing a
single small file can consume gigabytes of memory.

XXE (XML External Entity):
```xml
<!DOCTYPE root [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<root>&xxe;</root>
```
The parser fetches the file and embeds its contents in the parse tree. With a
`SYSTEM "http://..."` URL, it becomes an SSRF vector.

### Why a typed wrapper

`defusedxml` does not ship type stubs. Importing it directly requires a
`# type: ignore` comment, which the project's zero-ignore rule bans.
`xml_safe.py` contains the single justified crossing of the typed/untyped
boundary and re-exports all necessary stdlib `ElementTree` types with full
type information.

```python
# Instead of:
import xml.etree.ElementTree as ET  # unsafe — no XXE protection
ET.parse("score.xml")

# Use:
from muse.core.xml_safe import SafeET
SafeET.parse("score.xml")  # fully typed, XXE-safe
```

---

## HTTP Transport Hardening

Module: `muse/core/transport.py`

### Redirect refusal

`_STRICT_OPENER` is a `urllib.request.OpenerDirector` built with a custom
`_NoRedirectHandler` that raises on any HTTP redirect. This prevents:

- Authorization header leakage — a redirect to a different host would
  carry the `Authorization: Bearer <token>` header to the attacker's server.
- Scheme downgrade — a redirect from `https://` to `http://` would
  expose the bearer token over cleartext.

### HTTPS enforcement

`_build_request()` uses `urllib.parse.urlparse(url).scheme` to check for
HTTPS. A URL that uses any other scheme raises before a connection is
attempted.

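Redirect refusal and the scheme check can both be built from the standard library. The sketch below is one plausible shape, not the contents of `transport.py`: the names `NoRedirectHandler`, `STRICT_OPENER` and `require_https` are illustrative stand-ins for the private names mentioned above, and raising `URLError` on a redirect is an assumption about the exception type.

```python
import urllib.error
import urllib.parse
import urllib.request


class NoRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Refuse every HTTP redirect instead of following it."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Raising here stops the opener before any request reaches newurl.
        raise urllib.error.URLError(f"redirect ({code}) to {newurl!r} refused")


# build_opener() uses this subclass in place of the default redirect handler.
STRICT_OPENER = urllib.request.build_opener(NoRedirectHandler())


def require_https(url: str) -> str:
    """Return url unchanged if its scheme is https, else raise before connecting."""
    scheme = urllib.parse.urlparse(url).scheme
    if scheme != "https":
        raise ValueError(f"refusing non-HTTPS URL (scheme {scheme!r})")
    return url
```
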
### Response size cap

`_execute()` reads at most `MAX_RESPONSE_BYTES` (64 MB) from any HTTP
response. If a `Content-Length` header declares a larger body, the request is
rejected before reading begins. This prevents OOM attacks via an unbounded
response body.

### Content-Type guard

`_assert_json_content(raw, endpoint)` checks that the first non-whitespace
byte of a response body is `{` or `[` before calling `json.loads()`. This
catches HTML error pages (proxy intercept pages, Cloudflare challenges) that
would otherwise produce a misleading `JSONDecodeError`.

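The size cap and the JSON guard can be sketched as two small helpers. Several details here are assumptions: `read_capped` is a hypothetical name, 64 MB is taken to mean 64 × 1024 × 1024 bytes, the exception type is `ValueError`, and the sketch reads one byte past the limit so that an oversized body without a `Content-Length` header is detected instead of being silently truncated.

```python
from typing import BinaryIO, Optional

MAX_RESPONSE_BYTES = 64 * 1024 * 1024  # assumed interpretation of "64 MB"


def read_capped(body: BinaryIO, content_length: Optional[str],
                limit: int = MAX_RESPONSE_BYTES) -> bytes:
    """Read the body, rejecting it if it is declared or found to exceed limit."""
    # Reject early when the server already declares an oversized body.
    if content_length is not None and content_length.isdecimal():
        if int(content_length) > limit:
            raise ValueError("declared response body exceeds the size cap")
    raw = body.read(limit + 1)  # one extra byte reveals an oversized body
    if len(raw) > limit:
        raise ValueError("response body exceeds the size cap")
    return raw


def assert_json_content(raw: bytes, endpoint: str) -> None:
    """Raise unless the first non-whitespace byte is '{' or '['."""
    if raw.lstrip()[:1] not in (b"{", b"["):
        raise ValueError(f"{endpoint}: response body does not look like JSON")
```
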
---

## Snapshot Integrity

Module: `muse/core/snapshot.py`

### Null-byte separators in hash computation

`compute_snapshot_id()` and `compute_commit_id()` hash a canonical
representation of the manifest. The separator between key and value is the
null byte (`\x00`) rather than a printable character like `|` or `:`.

Why this matters: if the separator is `:`, then a file named `a:b` with
object ID `c` and a file named `a` with object ID `b:c` produce the same hash
input (`a:b:c`). The null byte cannot appear in filenames on POSIX or Windows,
so the boundary between key and value is always unambiguous and this kind of
collision is structurally impossible.

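The ambiguity described above is easy to reproduce. The sketch below hashes a manifest with NUL-terminated fields; the canonical form it uses (keys sorted, UTF-8 encoding, a NUL byte after every field, SHA-256) is an assumption for illustration and is not taken from `snapshot.py`. With these rules the two manifests from the example, `{"a:b": "c"}` and `{"a": "b:c"}`, feed different bytes to the hash.

```python
import hashlib


def compute_snapshot_id(manifest: dict[str, str]) -> str:
    """Hash sorted (path, object_id) pairs, terminating each field with a NUL byte."""
    h = hashlib.sha256()
    for path in sorted(manifest):
        h.update(path.encode("utf-8"))
        h.update(b"\x00")  # cannot occur inside a filename
        h.update(manifest[path].encode("utf-8"))
        h.update(b"\x00")
    return h.hexdigest()
```
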
### Symlink and hidden-file exclusion

`walk_workdir()` skips:
- Symlinks — following symlinks during snapshot could include files
  outside the working directory, leaking content.
- Hidden files and directories (names starting with `.`) — `.muse/` must
  never be snapshotted; other dotfiles (`.env`, `.git`) are excluded to prevent
  accidental credential capture.

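A directory walk with both exclusions might look like the following sketch. It assumes an `os.walk`-based traversal and a generator return type, neither of which is confirmed by this document; only the function name and the two skip rules come from the text above.

```python
import os
import pathlib
from typing import Iterator


def walk_workdir(root: pathlib.Path) -> Iterator[pathlib.Path]:
    """Yield files under root, skipping symlinks and dot-prefixed names."""
    # followlinks=False means symlinked directories are never descended into.
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        # Prune hidden directories in place so os.walk skips their contents.
        dirnames[:] = sorted(d for d in dirnames if not d.startswith("."))
        for name in sorted(filenames):
            path = pathlib.Path(dirpath) / name
            if name.startswith(".") or path.is_symlink():
                continue
            yield path
```
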
---

## Identity Store Security

Module: `muse/core/identity.py`

The identity store (`~/.muse/identity.toml`) holds bearer tokens. Several
layered controls protect it:

| Control | Implementation | Threat prevented |
|---|---|---|
| 0o700 directory | `os.chmod(~/.muse/, 0o700)` | Other local users cannot list or traverse the directory |
| 0o600 from byte zero | `os.open()` + `os.fchmod()` before writing | Eliminates the TOCTOU window that `write_text()` + `chmod()` creates |
| Atomic rename | Temp file + `os.replace()` | A crash or kill signal during write leaves the old file intact — never a partial file |
| Symlink guard | Check `path.is_symlink()` before write | Blocks pre-placed symlink attacks targeting a different credential file |
| Exclusive write lock | `fcntl.flock(LOCK_EX)` on `.identity.lock` | Prevents race conditions when parallel agents write simultaneously |
| Token masking | All log calls use `"Bearer ***"` | Tokens never appear in log output |
| URL normalisation | `_hostname_from_url()` strips scheme, userinfo, path | `https://admin:secret@musehub.ai/repos/x` and `musehub.ai` resolve to the same key |

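Three of the controls in the table (0o600 from byte zero, atomic rename, symlink guard) combine naturally into one write routine. The sketch below is POSIX-only and illustrative: `write_secret_atomically` and the `.tmp` suffix are hypothetical, and the `fcntl.flock` lock from the table is left out to keep the example short.

```python
import os
import pathlib


def write_secret_atomically(path: pathlib.Path, data: bytes) -> None:
    """Write data to path with mode 0o600, replacing any old file atomically."""
    if path.is_symlink():
        raise ValueError(f"refusing to write through symlink: {path}")
    tmp = path.with_name(path.name + ".tmp")
    # O_EXCL fails if the temp file already exists; 0o600 applies at creation.
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as f:
        # The umask can only clear bits, so pin the mode before writing anything.
        os.fchmod(f.fileno(), 0o600)
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    # Readers see either the old file or the complete new one, never a partial file.
    os.replace(tmp, path)
```
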
---

## Size Caps

| Constant | Value | Where enforced |
|---|---|---|
| `MAX_FILE_BYTES` | 256 MB | `object_store.read_object()` — cap per-blob reads |
| `MAX_RESPONSE_BYTES` | 64 MB | `transport._execute()` — cap HTTP response body |
| `MAX_SYSEX_BYTES` | 64 KiB | `midi_merge._msg_to_dict()` — cap SysEx data per message |
| MIDI file size | `MAX_FILE_BYTES` | `midi_parser.parse_file()` — cap file size before parse |

---

See also:

- [`docs/reference/auth.md`](auth.md) — identity lifecycle (`muse auth`)
- [`docs/reference/hub.md`](hub.md) — hub connection management (`muse hub`)
- [`docs/reference/remotes.md`](remotes.md) — push, fetch, clone transport
- [`muse/core/validation.py`](../../muse/core/validation.py) — implementation
- [`tests/test_core_validation.py`](../../tests/test_core_validation.py) — test suite