cy0307 committed on
Commit 91b502e · verified · 1 Parent(s): 2bd560e

Add Qwen3-Omni held-out error analysis

Files changed (37)
  1. ARTIFACT_GUIDE.md +2 -0
  2. PROJECT_STATUS.md +1 -1
  3. data/artifact_index.json +46 -13
  4. data/mirror_parity.json +879 -79
  5. data/omni_finetune_verified_result.json +22 -1
  6. data/project_status.json +4 -2
  7. data/publication_audit.json +9 -9
  8. data/scope_claims_audit.json +1 -1
  9. data/task_surface_integrity.json +145 -145
  10. data/website_integrity.json +5 -5
  11. docs/data/artifact_index.json +46 -13
  12. docs/data/mirror_parity.json +366 -62
  13. docs/data/omni_finetune_verified_result.json +22 -1
  14. docs/data/project_status.json +4 -2
  15. docs/data/publication_audit.json +9 -9
  16. docs/data/scope_claims_audit.json +1 -1
  17. docs/data/task_surface_integrity.json +145 -145
  18. docs/data/website_integrity.json +5 -5
  19. metrics/artifact_index.json +46 -13
  20. metrics/mirror_parity.json +366 -62
  21. metrics/omni_finetune_verified_result.json +22 -1
  22. metrics/project_status.json +4 -2
  23. metrics/publication_audit.json +9 -9
  24. metrics/scope_claims_audit.json +1 -1
  25. metrics/task_surface_integrity.json +145 -145
  26. metrics/website_integrity.json +5 -5
  27. results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/PUBLIC_RESULT_SUMMARY.md +18 -0
  28. results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/ERROR_ANALYSIS.md +78 -0
  29. results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/action_family_error_analysis.csv +9 -0
  30. results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/episode_error_analysis.csv +15 -0
  31. results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/error_analysis_summary.json +667 -0
  32. results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/missing_modality_error_analysis.csv +2 -0
  33. results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/object_category_error_analysis.csv +11 -0
  34. results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/train_seen_error_analysis.csv +3 -0
  35. scripts/build_artifact_index.py +24 -0
  36. scripts/omni/analyze_qwen3_omni_errors.py +370 -0
  37. scripts/validate_mirror_parity.py +11 -0
ARTIFACT_GUIDE.md CHANGED
@@ -110,12 +110,14 @@ research project.
110
  | [`results/omni_finetune/MULTI_EPISODE_ACCESS_STATUS.md`](results/omni_finetune/MULTI_EPISODE_ACCESS_STATUS.md) | Documents the public multi-episode access path, selected 128-episode pilot plan, and data requirements. |
111
  | [`docs/data/omni_finetune_verified_result.json`](docs/data/omni_finetune_verified_result.json) | Compact verified summary for the first selected-episode Qwen3-Omni diagnostic pilot, including split counts, held-out metrics, and the quality-target caveat. |
112
  | [`results/omni_finetune/verified_public/`](results/omni_finetune/verified_public/) | Public-safe verified held-out result packages. These include metrics, predictions, reports, manifests, training metadata, validation summaries, and audit files, but not raw data or weights. |
 
113
  | [`scripts/omni/discover_xperience10m_sources.py`](scripts/omni/discover_xperience10m_sources.py) | Discovery gate for valid multi-episode Xperience-10M sources. |
114
  | [`scripts/omni/train_qwen3_omni_lora.py`](scripts/omni/train_qwen3_omni_lora.py) | Training entrypoint for the Qwen3-Omni LoRA pilot after the data gate passes. |
115
  | [`scripts/omni/run_128_fullsplit_parallel_export_8gpu.sh`](scripts/omni/run_128_fullsplit_parallel_export_8gpu.sh) | Full 96/16/16 launcher with parallel export, 8-process LoRA training, validation-sample monitoring, held-out test evaluation, and quality-target reporting. |
116
  | [`scripts/omni/merge_qwen3_omni_eval_shards.py`](scripts/omni/merge_qwen3_omni_eval_shards.py) | Recomputes held-out metrics from deterministic Qwen eval shards and checks missing or duplicate prediction ids. |
117
  | [`scripts/omni/package_verified_omni_result.py`](scripts/omni/package_verified_omni_result.py) | Creates a contract-driven public-safe package from validated held-out fine-tuning outputs without raw data, base weights, adapter/checkpoint weights, full checkpoints, or large archives. |
118
  | [`scripts/omni/audit_verified_omni_package.py`](scripts/omni/audit_verified_omni_package.py) | Audits a verified package before README, website, or Hugging Face updates by checking validation status, required files, primary metrics, held-out evidence, and forbidden file types. |
 
119
  | [`scripts/omni/watch_verified_omni_package.py`](scripts/omni/watch_verified_omni_package.py) | Waits for a passing held-out eval validation and then runs the verified public-safe packager automatically. |
120
  | [`OMNI_MODEL_EXTENSION_CONTRACT.md`](OMNI_MODEL_EXTENSION_CONTRACT.md) | Human-readable contract for adding new model families while preserving the same episode split, held-out evaluation, packaging gate, and public-safety boundary. |
121
  | [`configs/omni_backbones/`](configs/omni_backbones/) | Backbone registry for implemented Qwen3-Omni LoRA plus planned Cosmos-style world-model and VLA/policy branches. |
 
110
  | [`results/omni_finetune/MULTI_EPISODE_ACCESS_STATUS.md`](results/omni_finetune/MULTI_EPISODE_ACCESS_STATUS.md) | Documents the public multi-episode access path, selected 128-episode pilot plan, and data requirements. |
111
  | [`docs/data/omni_finetune_verified_result.json`](docs/data/omni_finetune_verified_result.json) | Compact verified summary for the first selected-episode Qwen3-Omni diagnostic pilot, including split counts, held-out metrics, and the quality-target caveat. |
112
  | [`results/omni_finetune/verified_public/`](results/omni_finetune/verified_public/) | Public-safe verified held-out result packages. These include metrics, predictions, reports, manifests, training metadata, validation summaries, and audit files, but not raw data or weights. |
113
+ | [`results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/ERROR_ANALYSIS.md`](results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/ERROR_ANALYSIS.md) | Derived held-out error analysis by episode, action family, train-seen status, required-modality state, and object category for the validation-aware Qwen3-Omni diagnostic pilot. |
114
  | [`scripts/omni/discover_xperience10m_sources.py`](scripts/omni/discover_xperience10m_sources.py) | Discovery gate for valid multi-episode Xperience-10M sources. |
115
  | [`scripts/omni/train_qwen3_omni_lora.py`](scripts/omni/train_qwen3_omni_lora.py) | Training entrypoint for the Qwen3-Omni LoRA pilot after the data gate passes. |
116
  | [`scripts/omni/run_128_fullsplit_parallel_export_8gpu.sh`](scripts/omni/run_128_fullsplit_parallel_export_8gpu.sh) | Full 96/16/16 launcher with parallel export, 8-process LoRA training, validation-sample monitoring, held-out test evaluation, and quality-target reporting. |
117
  | [`scripts/omni/merge_qwen3_omni_eval_shards.py`](scripts/omni/merge_qwen3_omni_eval_shards.py) | Recomputes held-out metrics from deterministic Qwen eval shards and checks missing or duplicate prediction ids. |
118
  | [`scripts/omni/package_verified_omni_result.py`](scripts/omni/package_verified_omni_result.py) | Creates a contract-driven public-safe package from validated held-out fine-tuning outputs without raw data, base weights, adapter/checkpoint weights, full checkpoints, or large archives. |
119
  | [`scripts/omni/audit_verified_omni_package.py`](scripts/omni/audit_verified_omni_package.py) | Audits a verified package before README, website, or Hugging Face updates by checking validation status, required files, primary metrics, held-out evidence, and forbidden file types. |
120
+ | [`scripts/omni/analyze_qwen3_omni_errors.py`](scripts/omni/analyze_qwen3_omni_errors.py) | Computes public-safe held-out error-analysis tables from the verified Qwen3-Omni prediction package. |
121
  | [`scripts/omni/watch_verified_omni_package.py`](scripts/omni/watch_verified_omni_package.py) | Waits for a passing held-out eval validation and then runs the verified public-safe packager automatically. |
122
  | [`OMNI_MODEL_EXTENSION_CONTRACT.md`](OMNI_MODEL_EXTENSION_CONTRACT.md) | Human-readable contract for adding new model families while preserving the same episode split, held-out evaluation, packaging gate, and public-safety boundary. |
123
  | [`configs/omni_backbones/`](configs/omni_backbones/) | Backbone registry for implemented Qwen3-Omni LoRA plus planned Cosmos-style world-model and VLA/policy branches. |
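The new `analyze_qwen3_omni_errors.py` row describes error tables grouped by episode, action family, and similar keys. As a rough illustration only, and not the script's actual code, a grouped error rate over prediction records could be computed as below. The record fields (`episode_id`, `action_family`, `correct`) and the helper name `group_error_rates` are assumptions made for this sketch.

```python
from collections import defaultdict


def group_error_rates(records, key):
    """Count records and errors per value of `key`, then derive an error rate."""
    totals = defaultdict(lambda: {"n": 0, "errors": 0})
    for rec in records:
        bucket = totals[rec[key]]
        bucket["n"] += 1
        if not rec["correct"]:
            bucket["errors"] += 1
    # Sort groups so the resulting table has a stable row order.
    return {
        group: {**counts, "error_rate": counts["errors"] / counts["n"]}
        for group, counts in sorted(totals.items())
    }


# Hypothetical held-out prediction records (not taken from the real package).
records = [
    {"episode_id": "ep_a", "action_family": "pick", "correct": True},
    {"episode_id": "ep_a", "action_family": "place", "correct": False},
    {"episode_id": "ep_b", "action_family": "pick", "correct": False},
    {"episode_id": "ep_b", "action_family": "pick", "correct": True},
]

by_family = group_error_rates(records, "action_family")
by_episode = group_error_rates(records, "episode_id")
```

The same helper can be reused for each grouping column, which is one plausible way to produce one CSV per breakdown.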
PROJECT_STATUS.md CHANGED
@@ -30,7 +30,7 @@ scale-up readiness; it is not presented as final full-dataset model quality.
30
  | Public dashboard and Hub pages | Verified | GitHub Pages, HF Space, artifact dataset, baseline model repo, Qwen3-Omni LoRA repo | Readers can move between the website, code, derived artifacts, baseline weights, and Qwen3-Omni pilot status without needing local infrastructure details. |
31
  | Public package policy | Verified | `DATA_NOTICE.md`, `REPRODUCIBILITY.md` | Raw Xperience-10M data, private gated files, large archives, credentials, and full Qwen weights are not redistributed. |
32
  | Reproducibility | Verified for the public sample | `REPRODUCIBILITY.md`, `docs/data/reproducibility_matrix.json`, `notes/reproducibility_audit.md` | The public sample workflow has explicit commands, expected outputs, and exact-match reproduction evidence. |
33
- | Qwen3-Omni fine-tuning | Verified validation-aware diagnostic held-out pilot; quality target not met | `docs/data/omni_finetune_verified_result.json`, `results/omni_finetune/verified_public/`, `scripts/omni/package_verified_omni_result.py`, `scripts/omni/audit_verified_omni_package.py` | The selected 96/16/16 episode split produced a validation-aware public-safe held-out package with 3,808 exported windows, 512 validation windows, and 448 test predictions. JSON validity is 87.50%, below the 98% target, so the result is a diagnostic baseline and the next pass should focus on structured-output improvements and error analysis. |
34
  | Raw Xperience-10M redistribution | Not included | `DATA_NOTICE.md`, `docs/data/publication_audit.json` | Raw MP4, HDF5, RRD files, private gated data, and full Qwen weights are intentionally excluded. |
35
 
36
  ## Fast Research Route
 
30
  | Public dashboard and Hub pages | Verified | GitHub Pages, HF Space, artifact dataset, baseline model repo, Qwen3-Omni LoRA repo | Readers can move between the website, code, derived artifacts, baseline weights, and Qwen3-Omni pilot status without needing local infrastructure details. |
31
  | Public package policy | Verified | `DATA_NOTICE.md`, `REPRODUCIBILITY.md` | Raw Xperience-10M data, private gated files, large archives, credentials, and full Qwen weights are not redistributed. |
32
  | Reproducibility | Verified for the public sample | `REPRODUCIBILITY.md`, `docs/data/reproducibility_matrix.json`, `notes/reproducibility_audit.md` | The public sample workflow has explicit commands, expected outputs, and exact-match reproduction evidence. |
33
+ | Qwen3-Omni fine-tuning | Verified validation-aware diagnostic held-out pilot; quality target not met | `docs/data/omni_finetune_verified_result.json`, `results/omni_finetune/verified_public/`, `results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/`, `scripts/omni/package_verified_omni_result.py`, `scripts/omni/audit_verified_omni_package.py`, `scripts/omni/analyze_qwen3_omni_errors.py` | The selected 96/16/16 episode split produced a validation-aware public-safe held-out package with 3,808 exported windows, 512 validation windows, 448 test predictions, and derived error-analysis tables by episode, action family, train-seen status, required-modality state, and object category. JSON validity is 87.50%, below the 98% target, so the result is a diagnostic baseline and the next pass should focus on structured-output improvements. |
34
  | Raw Xperience-10M redistribution | Not included | `DATA_NOTICE.md`, `docs/data/publication_audit.json` | Raw MP4, HDF5, RRD files, private gated data, and full Qwen weights are intentionally excluded. |
35
 
36
  ## Fast Research Route
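The status row above compares a JSON validity rate (87.50%) with a 98% target. A minimal sketch of such a check follows, assuming "valid" simply means the model output parses with `json.loads`; the project's real definition may be stricter (for example, schema checks). The sample outputs and the names `json_validity_rate` and `QUALITY_TARGET` are hypothetical.

```python
import json

QUALITY_TARGET = 0.98  # the 98% target quoted in the status table


def json_validity_rate(outputs):
    """Return the fraction of output strings that parse as JSON."""
    if not outputs:
        return 0.0
    valid = 0
    for text in outputs:
        try:
            json.loads(text)
            valid += 1
        except (json.JSONDecodeError, TypeError):
            pass  # count as invalid
    return valid / len(outputs)


# Toy data: seven well-formed outputs and one truncated output.
outputs = [
    '{"action": "pick", "object": "cup"}',
    '{"action": "place"}',
    '{"action": "open"}',
    '{"action": "close"}',
    '{"action": "pour"}',
    '{"action": "push"}',
    '{"action": "pull"}',
    '{"action": "pick", "object": ',
]

rate = json_validity_rate(outputs)
meets_target = rate >= QUALITY_TARGET
```

With this toy list the rate falls below the target, mirroring the "quality target not met" status without reproducing the real predictions.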
data/artifact_index.json CHANGED
@@ -1,12 +1,12 @@
1
  {
2
  "title": "Ropedia Xperience-10M Task Suite Artifact Index",
3
- "generated_at_utc": "2026-06-06T14:35:42+00:00",
4
  "status": "pass",
5
- "artifact_count": 83,
6
  "missing": [],
7
  "by_kind": {
8
  "project_path": 14,
9
- "scaleup_contract": 6,
10
  "project_scope": 1,
11
  "source_alignment": 5,
12
  "publication_workflow": 3,
@@ -28,7 +28,7 @@
28
  "onboarding_doc": 1,
29
  "generated_figure": 3,
30
  "generated_figure_assets": 1,
31
- "scaleup_status": 2,
32
  "citation": 1,
33
  "license": 1
34
  },
@@ -63,8 +63,8 @@
63
  "surface": "repo_hf",
64
  "shows": "Gives a compact current-state table for first-pass readers.",
65
  "exists": true,
66
- "bytes": 8534,
67
- "sha256": "5eb48d489da7f005baab233a94c9d6b209eb1e9ffdb138c8e0e600ece9239a29"
68
  },
69
  {
70
  "id": "project_status_json",
@@ -74,8 +74,8 @@
74
  "surface": "website_hf",
75
  "shows": "Machine-readable copy of the current project status for website and HF mirrors.",
76
  "exists": true,
77
- "bytes": 10977,
78
- "sha256": "2bb0639c137dfd6eddd337eb909292543ae2e72753dee398f8240ff35f6a3984"
79
  },
80
  {
81
  "id": "research_roadmap",
@@ -187,6 +187,17 @@
187
  "bytes": 6519,
188
  "sha256": "a3773fc681e298325e2be80556d6be6e7e30b90ba22ee24b66633f07ff9c4ea4"
189
  },
190
  {
191
  "id": "additional_development_directions",
192
  "title": "Additional development directions",
@@ -250,8 +261,8 @@
250
  "surface": "repo_hf",
251
  "shows": "Gives the human-readable map from project scope to data, tasks, platform mirrors, and scale-up status.",
252
  "exists": true,
253
- "bytes": 15660,
254
- "sha256": "a9ad335b82c35a5ac102428663ffae1c8798e90e45cc5e795c3a499b4563b417"
255
  },
256
  {
257
  "id": "official_dataset_card_alignment",
@@ -695,8 +706,8 @@
695
  "surface": "repo_hf",
696
  "shows": "Generates the selective artifact catalog from local files.",
697
  "exists": true,
698
- "bytes": 30785,
699
- "sha256": "0c42b68e44e6a32b6b5161b47161adc5ccdb57567e1462e8271ea87af50ab92d"
700
  },
701
  {
702
  "id": "publication_audit",
@@ -731,7 +742,7 @@
731
  "volatile": true,
732
  "shows": "Confirms prepared GitHub/HF Space/artifact/model mirrors share the same critical data, figure, website HTML, and validator files.",
733
  "exists": true,
734
- "bytes": 111950,
735
  "hash_policy": "existence_and_size_only"
736
  },
737
  {
@@ -933,6 +944,28 @@
933
  "bytes": 3076,
934
  "sha256": "23b87581cfc1d95b0af118a0dbb4e601f42fc6bad608759490e13a9a1ef73205"
935
  },
936
  {
937
  "id": "citation",
938
  "title": "Citation metadata",
 
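Each index entry above records `exists`, `bytes`, and usually `sha256`, while the volatile mirror-parity entry carries `"hash_policy": "existence_and_size_only"` instead of a hash. The sketch below shows one way such an entry could be derived from a local file. It is an assumption-based illustration, not the code in `scripts/build_artifact_index.py`; the function name `index_entry` and its parameters are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def index_entry(path, artifact_id, volatile=False):
    """Build a catalog entry with existence, size, and (optionally) a SHA-256."""
    p = Path(path)
    entry = {"id": artifact_id, "path": str(path), "exists": p.is_file()}
    if not entry["exists"]:
        return entry
    data = p.read_bytes()
    entry["bytes"] = len(data)
    if volatile:
        # Files that change on every run are tracked by size only.
        entry["hash_policy"] = "existence_and_size_only"
    else:
        entry["sha256"] = hashlib.sha256(data).hexdigest()
    return entry


# Demonstrate on a throwaway file so the example has no external dependencies.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_bytes(b"hello\n")
    stable_entry = index_entry(sample, "sample_stable")
    volatile_entry = index_entry(sample, "sample_volatile", volatile=True)
    missing_entry = index_entry(Path(tmp) / "absent.txt", "sample_missing")
```

Skipping the hash for volatile files avoids a catalog that invalidates itself each time a validator rewrites its own report.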
1
  {
2
  "title": "Ropedia Xperience-10M Task Suite Artifact Index",
3
+ "generated_at_utc": "2026-06-06T14:53:45+00:00",
4
  "status": "pass",
5
+ "artifact_count": 86,
6
  "missing": [],
7
  "by_kind": {
8
  "project_path": 14,
9
+ "scaleup_contract": 7,
10
  "project_scope": 1,
11
  "source_alignment": 5,
12
  "publication_workflow": 3,
 
28
  "onboarding_doc": 1,
29
  "generated_figure": 3,
30
  "generated_figure_assets": 1,
31
+ "scaleup_status": 4,
32
  "citation": 1,
33
  "license": 1
34
  },
 
63
  "surface": "repo_hf",
64
  "shows": "Gives a compact current-state table for first-pass readers.",
65
  "exists": true,
66
+ "bytes": 8805,
67
+ "sha256": "4051b78674306078880de33a144a499144b2487b11455c70a364a94cefa035a7"
68
  },
69
  {
70
  "id": "project_status_json",
 
74
  "surface": "website_hf",
75
  "shows": "Machine-readable copy of the current project status for website and HF mirrors.",
76
  "exists": true,
77
+ "bytes": 11274,
78
+ "sha256": "ae2b2c520ab1e0553fa399439345edd87832fa5293d8c27ffe610ede5bfa1067"
79
  },
80
  {
81
  "id": "research_roadmap",
 
187
  "bytes": 6519,
188
  "sha256": "a3773fc681e298325e2be80556d6be6e7e30b90ba22ee24b66633f07ff9c4ea4"
189
  },
190
+ {
191
+ "id": "qwen3_omni_error_analysis_script",
192
+ "title": "Qwen3-Omni held-out error-analysis script",
193
+ "path": "scripts/omni/analyze_qwen3_omni_errors.py",
194
+ "kind": "scaleup_contract",
195
+ "surface": "repo_hf",
196
+ "shows": "Computes public-safe held-out error-analysis tables by episode, action family, train-seen status, required-modality state, and object category.",
197
+ "exists": true,
198
+ "bytes": 15676,
199
+ "sha256": "d4c7e46d9fbd5f9d84bc32374f457fd8c9d68c8faa39c77bc45770eb95d80337"
200
+ },
201
  {
202
  "id": "additional_development_directions",
203
  "title": "Additional development directions",
 
261
  "surface": "repo_hf",
262
  "shows": "Gives the human-readable map from project scope to data, tasks, platform mirrors, and scale-up status.",
263
  "exists": true,
264
+ "bytes": 16318,
265
+ "sha256": "cda5f4b5be4b7a2d26aff6ed7f930bfba13dfc463d533a9880193c0a0611b677"
266
  },
267
  {
268
  "id": "official_dataset_card_alignment",
 
706
  "surface": "repo_hf",
707
  "shows": "Generates the selective artifact catalog from local files.",
708
  "exists": true,
709
+ "bytes": 32191,
710
+ "sha256": "4a105c732d2f6c54a78333d7f47e0139325ba638027e34e6acd929a90626b8e0"
711
  },
712
  {
713
  "id": "publication_audit",
 
742
  "volatile": true,
743
  "shows": "Confirms prepared GitHub/HF Space/artifact/model mirrors share the same critical data, figure, website HTML, and validator files.",
744
  "exists": true,
745
+ "bytes": 126335,
746
  "hash_policy": "existence_and_size_only"
747
  },
748
  {
 
944
  "bytes": 3076,
945
  "sha256": "23b87581cfc1d95b0af118a0dbb4e601f42fc6bad608759490e13a9a1ef73205"
946
  },
947
+ {
948
+ "id": "qwen3_omni_error_analysis_report",
949
+ "title": "Qwen3-Omni held-out error-analysis report",
950
+ "path": "results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/ERROR_ANALYSIS.md",
951
+ "kind": "scaleup_status",
952
+ "surface": "repo_hf",
953
+ "shows": "Summarizes validation-aware Qwen3-Omni held-out failures by episode, action family, train-seen status, required-modality state, and object category.",
954
+ "exists": true,
955
+ "bytes": 3331,
956
+ "sha256": "063fcc2ebd7b57ab5b281fd5e8edc629da4e1f4e5a708483ba27375d02af9467"
957
+ },
958
+ {
959
+ "id": "qwen3_omni_error_analysis_json",
960
+ "title": "Qwen3-Omni held-out error-analysis JSON",
961
+ "path": "results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/error_analysis_summary.json",
962
+ "kind": "scaleup_status",
963
+ "surface": "repo_hf",
964
+ "shows": "Machine-readable Qwen3-Omni held-out error analysis with grouped metrics and sanitized failure examples.",
965
+ "exists": true,
966
+ "bytes": 25202,
967
+ "sha256": "c2e4eaa686f5d9739a8d0bfd8ae51a453b94019489ed84a154e2bce2fa316ff5"
968
+ },
969
  {
970
  "id": "citation",
971
  "title": "Citation metadata",
data/mirror_parity.json CHANGED
@@ -1,16 +1,20 @@
1
  {
2
- "status": "pass",
3
- "generated_at_utc": "2026-06-06T14:37:36+00:00",
4
  "hf_root": "hf_publish",
5
  "summary": {
6
- "group_count": 104,
7
- "failure_count": 0,
8
- "failures_by_surface": {}
9
  },
10
  "checks": [
11
  {
12
  "name": "repo_hf_space_artifact_model_data_parity",
13
- "status": "pass"
14
  },
15
  {
16
  "name": "repo_hf_visual_asset_parity",
@@ -18,7 +22,7 @@
18
  },
19
  {
20
  "name": "repo_hf_validator_script_parity",
21
- "status": "pass"
22
  },
23
  {
24
  "name": "repo_hf_website_html_parity",
@@ -26,7 +30,7 @@
26
  },
27
  {
28
  "name": "repo_hf_diagnostic_result_parity",
29
- "status": "pass"
30
  },
31
  {
32
  "name": "repo_hf_quality_doc_parity",
@@ -98,34 +102,56 @@
98
  },
99
  {
100
  "name": "data/artifact_index.json",
101
- "status": "pass",
102
  "local": {
103
  "path": "repo:docs/data/artifact_index.json",
104
  "exists": true,
105
- "bytes": 37736,
106
- "sha256": "f1d87cbabab02227b834ad333507af31a8ce309600f0e0427bb8cb59a26c3b71"
107
  },
108
  "mirrors": {
109
  "hf_space": {
110
  "path": "hf_space:data/artifact_index.json",
111
  "exists": true,
112
- "bytes": 37736,
113
- "sha256": "f1d87cbabab02227b834ad333507af31a8ce309600f0e0427bb8cb59a26c3b71"
114
  },
115
  "hf_artifacts": {
116
  "path": "hf_artifacts:docs/data/artifact_index.json",
117
  "exists": true,
118
- "bytes": 37736,
119
- "sha256": "f1d87cbabab02227b834ad333507af31a8ce309600f0e0427bb8cb59a26c3b71"
120
  },
121
  "hf_model": {
122
  "path": "hf_model:metrics/artifact_index.json",
123
  "exists": true,
124
- "bytes": 37736,
125
- "sha256": "f1d87cbabab02227b834ad333507af31a8ce309600f0e0427bb8cb59a26c3b71"
126
  }
127
  },
128
- "failures": []
129
  },
130
  {
131
  "name": "data/brand_assets.json",
@@ -350,27 +376,27 @@
350
  "local": {
351
  "path": "repo:docs/data/omni_finetune_verified_result.json",
352
  "exists": true,
353
- "bytes": 3145,
354
- "sha256": "37b001a24201ba56b327fa89f19792d64ebcdabc1faffa7e7bb4fd6b8323731a"
355
  },
356
  "mirrors": {
357
  "hf_space": {
358
  "path": "hf_space:data/omni_finetune_verified_result.json",
359
  "exists": true,
360
- "bytes": 3145,
361
- "sha256": "37b001a24201ba56b327fa89f19792d64ebcdabc1faffa7e7bb4fd6b8323731a"
362
  },
363
  "hf_artifacts": {
364
  "path": "hf_artifacts:docs/data/omni_finetune_verified_result.json",
365
  "exists": true,
366
- "bytes": 3145,
367
- "sha256": "37b001a24201ba56b327fa89f19792d64ebcdabc1faffa7e7bb4fd6b8323731a"
368
  },
369
  "hf_model": {
370
  "path": "hf_model:metrics/omni_finetune_verified_result.json",
371
  "exists": true,
372
- "bytes": 3145,
373
- "sha256": "37b001a24201ba56b327fa89f19792d64ebcdabc1faffa7e7bb4fd6b8323731a"
374
  }
375
  },
376
  "failures": []
@@ -474,61 +500,83 @@
474
  "local": {
475
  "path": "repo:docs/data/project_status.json",
476
  "exists": true,
477
- "bytes": 10977,
478
- "sha256": "2bb0639c137dfd6eddd337eb909292543ae2e72753dee398f8240ff35f6a3984"
479
  },
480
  "mirrors": {
481
  "hf_space": {
482
  "path": "hf_space:data/project_status.json",
483
  "exists": true,
484
- "bytes": 10977,
485
- "sha256": "2bb0639c137dfd6eddd337eb909292543ae2e72753dee398f8240ff35f6a3984"
486
  },
487
  "hf_artifacts": {
488
  "path": "hf_artifacts:docs/data/project_status.json",
489
  "exists": true,
490
- "bytes": 10977,
491
- "sha256": "2bb0639c137dfd6eddd337eb909292543ae2e72753dee398f8240ff35f6a3984"
492
  },
493
  "hf_model": {
494
  "path": "hf_model:metrics/project_status.json",
495
  "exists": true,
496
- "bytes": 10977,
497
- "sha256": "2bb0639c137dfd6eddd337eb909292543ae2e72753dee398f8240ff35f6a3984"
498
  }
499
  },
500
  "failures": []
501
  },
502
  {
503
  "name": "data/publication_audit.json",
504
- "status": "pass",
505
  "local": {
506
  "path": "repo:docs/data/publication_audit.json",
507
  "exists": true,
508
  "bytes": 7237,
509
- "sha256": "bfdfb04abf62dfb3ffa596f1d9ec58fc5bac633f6c1cfb1710d3988ef635cf03"
510
  },
511
  "mirrors": {
512
  "hf_space": {
513
  "path": "hf_space:data/publication_audit.json",
514
  "exists": true,
515
  "bytes": 7237,
516
- "sha256": "bfdfb04abf62dfb3ffa596f1d9ec58fc5bac633f6c1cfb1710d3988ef635cf03"
517
  },
518
  "hf_artifacts": {
519
  "path": "hf_artifacts:docs/data/publication_audit.json",
520
  "exists": true,
521
  "bytes": 7237,
522
- "sha256": "bfdfb04abf62dfb3ffa596f1d9ec58fc5bac633f6c1cfb1710d3988ef635cf03"
523
  },
524
  "hf_model": {
525
  "path": "hf_model:metrics/publication_audit.json",
526
  "exists": true,
527
  "bytes": 7237,
528
- "sha256": "bfdfb04abf62dfb3ffa596f1d9ec58fc5bac633f6c1cfb1710d3988ef635cf03"
529
  }
530
  },
531
- "failures": []
532
  },
533
  {
534
  "name": "data/public_surface_qa.json",
@@ -811,34 +859,56 @@
811
  },
812
  {
813
  "name": "data/scope_claims_audit.json",
814
- "status": "pass",
815
  "local": {
816
  "path": "repo:docs/data/scope_claims_audit.json",
817
  "exists": true,
818
  "bytes": 20823,
819
- "sha256": "7f01728415c9c54126eab25f2ce68e563b455f02d2bf10af514463c33bc0091e"
820
  },
821
  "mirrors": {
822
  "hf_space": {
823
  "path": "hf_space:data/scope_claims_audit.json",
824
  "exists": true,
825
  "bytes": 20823,
826
- "sha256": "7f01728415c9c54126eab25f2ce68e563b455f02d2bf10af514463c33bc0091e"
827
  },
828
  "hf_artifacts": {
829
  "path": "hf_artifacts:docs/data/scope_claims_audit.json",
830
  "exists": true,
831
  "bytes": 20823,
832
- "sha256": "7f01728415c9c54126eab25f2ce68e563b455f02d2bf10af514463c33bc0091e"
833
  },
834
  "hf_model": {
835
  "path": "hf_model:metrics/scope_claims_audit.json",
836
  "exists": true,
837
  "bytes": 20823,
838
- "sha256": "7f01728415c9c54126eab25f2ce68e563b455f02d2bf10af514463c33bc0091e"
839
  }
840
  },
841
- "failures": []
842
  },
843
  {
844
  "name": "data/single_episode_explorer.json",
@@ -935,34 +1005,56 @@
935
  },
936
  {
937
  "name": "data/task_surface_integrity.json",
938
- "status": "pass",
939
  "local": {
940
  "path": "repo:docs/data/task_surface_integrity.json",
941
  "exists": true,
942
  "bytes": 45779,
943
- "sha256": "1ae426aea9895c32912b2c9a0e519a55912222493d3c1d72e4785d71cd3b71cb"
944
  },
945
  "mirrors": {
946
  "hf_space": {
947
  "path": "hf_space:data/task_surface_integrity.json",
948
  "exists": true,
949
  "bytes": 45779,
950
- "sha256": "1ae426aea9895c32912b2c9a0e519a55912222493d3c1d72e4785d71cd3b71cb"
951
  },
952
  "hf_artifacts": {
953
  "path": "hf_artifacts:docs/data/task_surface_integrity.json",
954
  "exists": true,
955
  "bytes": 45779,
956
- "sha256": "1ae426aea9895c32912b2c9a0e519a55912222493d3c1d72e4785d71cd3b71cb"
957
  },
958
  "hf_model": {
959
  "path": "hf_model:metrics/task_surface_integrity.json",
960
  "exists": true,
961
  "bytes": 45779,
962
- "sha256": "1ae426aea9895c32912b2c9a0e519a55912222493d3c1d72e4785d71cd3b71cb"
963
  }
964
  },
965
- "failures": []
966
  },
967
  {
968
  "name": "data/task_walkthroughs.json",
@@ -997,34 +1089,56 @@
997
  },
998
  {
999
  "name": "data/website_integrity.json",
1000
- "status": "pass",
1001
  "local": {
1002
  "path": "repo:docs/data/website_integrity.json",
1003
  "exists": true,
1004
  "bytes": 15221,
1005
- "sha256": "08f9429aead121834f52fb108a35ff0933435d49064650b94b7ed84c1002182b"
1006
  },
1007
  "mirrors": {
1008
  "hf_space": {
1009
  "path": "hf_space:data/website_integrity.json",
1010
  "exists": true,
1011
  "bytes": 15221,
1012
- "sha256": "08f9429aead121834f52fb108a35ff0933435d49064650b94b7ed84c1002182b"
1013
  },
1014
  "hf_artifacts": {
1015
  "path": "hf_artifacts:docs/data/website_integrity.json",
1016
  "exists": true,
1017
  "bytes": 15221,
1018
- "sha256": "08f9429aead121834f52fb108a35ff0933435d49064650b94b7ed84c1002182b"
1019
  },
1020
  "hf_model": {
1021
  "path": "hf_model:metrics/website_integrity.json",
1022
  "exists": true,
1023
  "bytes": 15221,
1024
- "sha256": "08f9429aead121834f52fb108a35ff0933435d49064650b94b7ed84c1002182b"
1025
  }
1026
  },
1027
- "failures": []
1028
  },
1029
  {
1030
  "name": "data/xperience10m_dataset_card_alignment.json",
@@ -1723,6 +1837,46 @@
1723
  },
1724
  "failures": []
1725
  },
1726
  {
1727
  "name": "scripts/audio_ablation_and_raw_upgrade.py",
1728
  "status": "pass",
@@ -1754,21 +1908,21 @@
1754
  "local": {
1755
  "path": "repo:scripts/build_artifact_index.py",
1756
  "exists": true,
1757
- "bytes": 30785,
1758
- "sha256": "0c42b68e44e6a32b6b5161b47161adc5ccdb57567e1462e8271ea87af50ab92d"
1759
  },
1760
  "mirrors": {
1761
  "hf_artifacts": {
1762
  "path": "hf_artifacts:scripts/build_artifact_index.py",
1763
  "exists": true,
1764
- "bytes": 30785,
1765
- "sha256": "0c42b68e44e6a32b6b5161b47161adc5ccdb57567e1462e8271ea87af50ab92d"
1766
  },
1767
  "hf_model": {
1768
  "path": "hf_model:scripts/build_artifact_index.py",
1769
  "exists": true,
1770
- "bytes": 30785,
1771
- "sha256": "0c42b68e44e6a32b6b5161b47161adc5ccdb57567e1462e8271ea87af50ab92d"
1772
  }
1773
  },
1774
  "failures": []
@@ -2054,21 +2208,21 @@
2054
  "local": {
2055
  "path": "repo:scripts/validate_mirror_parity.py",
2056
  "exists": true,
2057
- "bytes": 12642,
2058
- "sha256": "17420a261d1327c0a8acb79adb75fc15217f117216eb74acf0cab3fa36de856c"
2059
  },
2060
  "mirrors": {
2061
  "hf_artifacts": {
2062
  "path": "hf_artifacts:scripts/validate_mirror_parity.py",
2063
  "exists": true,
2064
- "bytes": 12642,
2065
- "sha256": "17420a261d1327c0a8acb79adb75fc15217f117216eb74acf0cab3fa36de856c"
2066
  },
2067
  "hf_model": {
2068
  "path": "hf_model:scripts/validate_mirror_parity.py",
2069
  "exists": true,
2070
- "bytes": 12642,
2071
- "sha256": "17420a261d1327c0a8acb79adb75fc15217f117216eb74acf0cab3fa36de856c"
2072
  }
2073
  },
2074
  "failures": []
@@ -2807,6 +2961,395 @@
2807
  },
2808
  "failures": []
2809
  },
2810
  {
2811
  "name": "docs/QUALITY_GATES.md",
2812
  "status": "pass",
@@ -3061,27 +3604,27 @@
3061
  "local": {
3062
  "path": "repo:PROJECT_STATUS.md",
3063
  "exists": true,
3064
- "bytes": 8534,
3065
- "sha256": "5eb48d489da7f005baab233a94c9d6b209eb1e9ffdb138c8e0e600ece9239a29"
3066
  },
3067
  "mirrors": {
3068
  "hf_space": {
3069
  "path": "hf_space:PROJECT_STATUS.md",
3070
  "exists": true,
3071
- "bytes": 8534,
3072
- "sha256": "5eb48d489da7f005baab233a94c9d6b209eb1e9ffdb138c8e0e600ece9239a29"
3073
  },
3074
  "hf_artifacts": {
3075
  "path": "hf_artifacts:PROJECT_STATUS.md",
3076
  "exists": true,
3077
- "bytes": 8534,
3078
- "sha256": "5eb48d489da7f005baab233a94c9d6b209eb1e9ffdb138c8e0e600ece9239a29"
3079
  },
3080
  "hf_model": {
3081
  "path": "hf_model:PROJECT_STATUS.md",
3082
  "exists": true,
3083
- "bytes": 8534,
3084
- "sha256": "5eb48d489da7f005baab233a94c9d6b209eb1e9ffdb138c8e0e600ece9239a29"
3085
  }
3086
  },
3087
  "failures": []
@@ -3211,5 +3754,262 @@
3211
  "failures": []
3212
  }
3213
  ],
3214
- "failures": []
3215
  }
 
1
  {
2
+ "status": "fail",
3
+ "generated_at_utc": "2026-06-06T14:55:21+00:00",
4
  "hf_root": "hf_publish",
5
  "summary": {
6
+ "group_count": 114,
7
+ "failure_count": 32,
8
+ "failures_by_surface": {
9
+ "hf_space": 10,
10
+ "hf_artifacts": 11,
11
+ "hf_model": 11
12
+ }
13
  },
14
  "checks": [
15
  {
16
  "name": "repo_hf_space_artifact_model_data_parity",
17
+ "status": "fail"
18
  },
19
  {
20
  "name": "repo_hf_visual_asset_parity",
 
22
  },
23
  {
24
  "name": "repo_hf_validator_script_parity",
25
+ "status": "fail"
26
  },
27
  {
28
  "name": "repo_hf_website_html_parity",
 
30
  },
31
  {
32
  "name": "repo_hf_diagnostic_result_parity",
33
+ "status": "fail"
34
  },
35
  {
36
  "name": "repo_hf_quality_doc_parity",
 
102
  },
103
  {
104
  "name": "data/artifact_index.json",
105
+ "status": "fail",
106
  "local": {
107
  "path": "repo:docs/data/artifact_index.json",
108
  "exists": true,
109
+ "bytes": 39486,
110
+ "sha256": "87782cd08bc1106d694a727e21333450d2965b48c48f500d1b6f4294d7b247d0"
111
  },
112
  "mirrors": {
113
  "hf_space": {
114
  "path": "hf_space:data/artifact_index.json",
115
  "exists": true,
116
+ "bytes": 39486,
117
+ "sha256": "2563b854f81b07bfde2880647d0145b511be071a1a274fe1e909ce2be7ce43e1"
118
  },
119
  "hf_artifacts": {
120
  "path": "hf_artifacts:docs/data/artifact_index.json",
121
  "exists": true,
122
+ "bytes": 39486,
123
+ "sha256": "2563b854f81b07bfde2880647d0145b511be071a1a274fe1e909ce2be7ce43e1"
124
  },
125
  "hf_model": {
126
  "path": "hf_model:metrics/artifact_index.json",
127
  "exists": true,
128
+ "bytes": 39486,
129
+ "sha256": "2563b854f81b07bfde2880647d0145b511be071a1a274fe1e909ce2be7ce43e1"
130
  }
131
  },
132
+ "failures": [
133
+ {
134
+ "surface": "hf_space",
135
+ "kind": "hash_mismatch",
136
+ "path": "hf_space:data/artifact_index.json",
137
+ "expected_sha256": "87782cd08bc1106d694a727e21333450d2965b48c48f500d1b6f4294d7b247d0",
138
+ "actual_sha256": "2563b854f81b07bfde2880647d0145b511be071a1a274fe1e909ce2be7ce43e1"
139
+ },
140
+ {
141
+ "surface": "hf_artifacts",
142
+ "kind": "hash_mismatch",
143
+ "path": "hf_artifacts:docs/data/artifact_index.json",
144
+ "expected_sha256": "87782cd08bc1106d694a727e21333450d2965b48c48f500d1b6f4294d7b247d0",
145
+ "actual_sha256": "2563b854f81b07bfde2880647d0145b511be071a1a274fe1e909ce2be7ce43e1"
146
+ },
147
+ {
148
+ "surface": "hf_model",
149
+ "kind": "hash_mismatch",
150
+ "path": "hf_model:metrics/artifact_index.json",
151
+ "expected_sha256": "87782cd08bc1106d694a727e21333450d2965b48c48f500d1b6f4294d7b247d0",
152
+ "actual_sha256": "2563b854f81b07bfde2880647d0145b511be071a1a274fe1e909ce2be7ce43e1"
153
+ }
154
+ ]
155
  },
156
  {
157
  "name": "data/brand_assets.json",
 
376
  "local": {
377
  "path": "repo:docs/data/omni_finetune_verified_result.json",
378
  "exists": true,
379
+ "bytes": 4142,
380
+ "sha256": "297aa6fc86bc09ba7968f3c5c2db265320c0613c5ec9a36701114ba451321b81"
381
  },
382
  "mirrors": {
383
  "hf_space": {
384
  "path": "hf_space:data/omni_finetune_verified_result.json",
385
  "exists": true,
386
+ "bytes": 4142,
387
+ "sha256": "297aa6fc86bc09ba7968f3c5c2db265320c0613c5ec9a36701114ba451321b81"
388
  },
389
  "hf_artifacts": {
390
  "path": "hf_artifacts:docs/data/omni_finetune_verified_result.json",
391
  "exists": true,
392
+ "bytes": 4142,
393
+ "sha256": "297aa6fc86bc09ba7968f3c5c2db265320c0613c5ec9a36701114ba451321b81"
394
  },
395
  "hf_model": {
396
  "path": "hf_model:metrics/omni_finetune_verified_result.json",
397
  "exists": true,
398
+ "bytes": 4142,
399
+ "sha256": "297aa6fc86bc09ba7968f3c5c2db265320c0613c5ec9a36701114ba451321b81"
400
  }
401
  },
402
  "failures": []
 
500
  "local": {
501
  "path": "repo:docs/data/project_status.json",
502
  "exists": true,
503
+ "bytes": 11274,
504
+ "sha256": "ae2b2c520ab1e0553fa399439345edd87832fa5293d8c27ffe610ede5bfa1067"
505
  },
506
  "mirrors": {
507
  "hf_space": {
508
  "path": "hf_space:data/project_status.json",
509
  "exists": true,
510
+ "bytes": 11274,
511
+ "sha256": "ae2b2c520ab1e0553fa399439345edd87832fa5293d8c27ffe610ede5bfa1067"
512
  },
513
  "hf_artifacts": {
514
  "path": "hf_artifacts:docs/data/project_status.json",
515
  "exists": true,
516
+ "bytes": 11274,
517
+ "sha256": "ae2b2c520ab1e0553fa399439345edd87832fa5293d8c27ffe610ede5bfa1067"
518
  },
519
  "hf_model": {
520
  "path": "hf_model:metrics/project_status.json",
521
  "exists": true,
522
+ "bytes": 11274,
523
+ "sha256": "ae2b2c520ab1e0553fa399439345edd87832fa5293d8c27ffe610ede5bfa1067"
524
  }
525
  },
526
  "failures": []
527
  },
528
  {
529
  "name": "data/publication_audit.json",
530
+ "status": "fail",
531
  "local": {
532
  "path": "repo:docs/data/publication_audit.json",
533
  "exists": true,
534
  "bytes": 7237,
535
+ "sha256": "8a21c29d92f3a15b835c37d7784c17fada3edbda050515deed8e440535ed046d"
536
  },
537
  "mirrors": {
538
  "hf_space": {
539
  "path": "hf_space:data/publication_audit.json",
540
  "exists": true,
541
  "bytes": 7237,
542
+ "sha256": "a1741a97c2fb5dee8b9ed8e988b31530128e4fab8b8c458cb8f381e2ad16756c"
543
  },
544
  "hf_artifacts": {
545
  "path": "hf_artifacts:docs/data/publication_audit.json",
546
  "exists": true,
547
  "bytes": 7237,
548
+ "sha256": "a1741a97c2fb5dee8b9ed8e988b31530128e4fab8b8c458cb8f381e2ad16756c"
549
  },
550
  "hf_model": {
551
  "path": "hf_model:metrics/publication_audit.json",
552
  "exists": true,
553
  "bytes": 7237,
554
+ "sha256": "a1741a97c2fb5dee8b9ed8e988b31530128e4fab8b8c458cb8f381e2ad16756c"
555
  }
556
  },
557
+ "failures": [
558
+ {
559
+ "surface": "hf_space",
560
+ "kind": "hash_mismatch",
561
+ "path": "hf_space:data/publication_audit.json",
562
+ "expected_sha256": "8a21c29d92f3a15b835c37d7784c17fada3edbda050515deed8e440535ed046d",
563
+ "actual_sha256": "a1741a97c2fb5dee8b9ed8e988b31530128e4fab8b8c458cb8f381e2ad16756c"
564
+ },
565
+ {
566
+ "surface": "hf_artifacts",
567
+ "kind": "hash_mismatch",
568
+ "path": "hf_artifacts:docs/data/publication_audit.json",
569
+ "expected_sha256": "8a21c29d92f3a15b835c37d7784c17fada3edbda050515deed8e440535ed046d",
570
+ "actual_sha256": "a1741a97c2fb5dee8b9ed8e988b31530128e4fab8b8c458cb8f381e2ad16756c"
571
+ },
572
+ {
573
+ "surface": "hf_model",
574
+ "kind": "hash_mismatch",
575
+ "path": "hf_model:metrics/publication_audit.json",
576
+ "expected_sha256": "8a21c29d92f3a15b835c37d7784c17fada3edbda050515deed8e440535ed046d",
577
+ "actual_sha256": "a1741a97c2fb5dee8b9ed8e988b31530128e4fab8b8c458cb8f381e2ad16756c"
578
+ }
579
+ ]
580
  },
581
  {
582
  "name": "data/public_surface_qa.json",
 
859
  },
860
  {
861
  "name": "data/scope_claims_audit.json",
862
+ "status": "fail",
863
  "local": {
864
  "path": "repo:docs/data/scope_claims_audit.json",
865
  "exists": true,
866
  "bytes": 20823,
867
+ "sha256": "77402dc77c4ecf5cf1e68480ae2c9822a134ae7ef4a24a7b8b9008a2509c2fa3"
868
  },
869
  "mirrors": {
870
  "hf_space": {
871
  "path": "hf_space:data/scope_claims_audit.json",
872
  "exists": true,
873
  "bytes": 20823,
874
+ "sha256": "4fb8c088f8ec533b142534b37e9241f8690c8819333434f5d89336c2af8c1c31"
875
  },
876
  "hf_artifacts": {
877
  "path": "hf_artifacts:docs/data/scope_claims_audit.json",
878
  "exists": true,
879
  "bytes": 20823,
880
+ "sha256": "4fb8c088f8ec533b142534b37e9241f8690c8819333434f5d89336c2af8c1c31"
881
  },
882
  "hf_model": {
883
  "path": "hf_model:metrics/scope_claims_audit.json",
884
  "exists": true,
885
  "bytes": 20823,
886
+ "sha256": "4fb8c088f8ec533b142534b37e9241f8690c8819333434f5d89336c2af8c1c31"
887
  }
888
  },
889
+ "failures": [
890
+ {
891
+ "surface": "hf_space",
892
+ "kind": "hash_mismatch",
893
+ "path": "hf_space:data/scope_claims_audit.json",
894
+ "expected_sha256": "77402dc77c4ecf5cf1e68480ae2c9822a134ae7ef4a24a7b8b9008a2509c2fa3",
895
+ "actual_sha256": "4fb8c088f8ec533b142534b37e9241f8690c8819333434f5d89336c2af8c1c31"
896
+ },
897
+ {
898
+ "surface": "hf_artifacts",
899
+ "kind": "hash_mismatch",
900
+ "path": "hf_artifacts:docs/data/scope_claims_audit.json",
901
+ "expected_sha256": "77402dc77c4ecf5cf1e68480ae2c9822a134ae7ef4a24a7b8b9008a2509c2fa3",
902
+ "actual_sha256": "4fb8c088f8ec533b142534b37e9241f8690c8819333434f5d89336c2af8c1c31"
903
+ },
904
+ {
905
+ "surface": "hf_model",
906
+ "kind": "hash_mismatch",
907
+ "path": "hf_model:metrics/scope_claims_audit.json",
908
+ "expected_sha256": "77402dc77c4ecf5cf1e68480ae2c9822a134ae7ef4a24a7b8b9008a2509c2fa3",
909
+ "actual_sha256": "4fb8c088f8ec533b142534b37e9241f8690c8819333434f5d89336c2af8c1c31"
910
+ }
911
+ ]
912
  },
913
  {
914
  "name": "data/single_episode_explorer.json",
 
1005
  },
1006
  {
1007
  "name": "data/task_surface_integrity.json",
1008
+ "status": "fail",
1009
  "local": {
1010
  "path": "repo:docs/data/task_surface_integrity.json",
1011
  "exists": true,
1012
  "bytes": 45779,
1013
+ "sha256": "8232e2bafa8b5157d97c018e41be5da3ec69ddb4d2020a0dcc7c6377c5575bb6"
1014
  },
1015
  "mirrors": {
1016
  "hf_space": {
1017
  "path": "hf_space:data/task_surface_integrity.json",
1018
  "exists": true,
1019
  "bytes": 45779,
1020
+ "sha256": "51c30fe86c558042960e57a252bc6d3c67d95d5a70a8747043a1cdffe57cf53f"
1021
  },
1022
  "hf_artifacts": {
1023
  "path": "hf_artifacts:docs/data/task_surface_integrity.json",
1024
  "exists": true,
1025
  "bytes": 45779,
1026
+ "sha256": "51c30fe86c558042960e57a252bc6d3c67d95d5a70a8747043a1cdffe57cf53f"
1027
  },
1028
  "hf_model": {
1029
  "path": "hf_model:metrics/task_surface_integrity.json",
1030
  "exists": true,
1031
  "bytes": 45779,
1032
+ "sha256": "51c30fe86c558042960e57a252bc6d3c67d95d5a70a8747043a1cdffe57cf53f"
1033
  }
1034
  },
1035
+ "failures": [
1036
+ {
1037
+ "surface": "hf_space",
1038
+ "kind": "hash_mismatch",
1039
+ "path": "hf_space:data/task_surface_integrity.json",
1040
+ "expected_sha256": "8232e2bafa8b5157d97c018e41be5da3ec69ddb4d2020a0dcc7c6377c5575bb6",
1041
+ "actual_sha256": "51c30fe86c558042960e57a252bc6d3c67d95d5a70a8747043a1cdffe57cf53f"
1042
+ },
1043
+ {
1044
+ "surface": "hf_artifacts",
1045
+ "kind": "hash_mismatch",
1046
+ "path": "hf_artifacts:docs/data/task_surface_integrity.json",
1047
+ "expected_sha256": "8232e2bafa8b5157d97c018e41be5da3ec69ddb4d2020a0dcc7c6377c5575bb6",
1048
+ "actual_sha256": "51c30fe86c558042960e57a252bc6d3c67d95d5a70a8747043a1cdffe57cf53f"
1049
+ },
1050
+ {
1051
+ "surface": "hf_model",
1052
+ "kind": "hash_mismatch",
1053
+ "path": "hf_model:metrics/task_surface_integrity.json",
1054
+ "expected_sha256": "8232e2bafa8b5157d97c018e41be5da3ec69ddb4d2020a0dcc7c6377c5575bb6",
1055
+ "actual_sha256": "51c30fe86c558042960e57a252bc6d3c67d95d5a70a8747043a1cdffe57cf53f"
1056
+ }
1057
+ ]
1058
  },
1059
  {
1060
  "name": "data/task_walkthroughs.json",
 
1089
  },
1090
  {
1091
  "name": "data/website_integrity.json",
1092
+ "status": "fail",
1093
  "local": {
1094
  "path": "repo:docs/data/website_integrity.json",
1095
  "exists": true,
1096
  "bytes": 15221,
1097
+ "sha256": "dcbd09b4c4522770c43504c500eb653de706538516ee2ec72e491ffc3416c6e2"
1098
  },
1099
  "mirrors": {
1100
  "hf_space": {
1101
  "path": "hf_space:data/website_integrity.json",
1102
  "exists": true,
1103
  "bytes": 15221,
1104
+ "sha256": "140d8be179f51351ae55ba7587b7042a1e512e72d9318563a78c96a25e13f830"
1105
  },
1106
  "hf_artifacts": {
1107
  "path": "hf_artifacts:docs/data/website_integrity.json",
1108
  "exists": true,
1109
  "bytes": 15221,
1110
+ "sha256": "140d8be179f51351ae55ba7587b7042a1e512e72d9318563a78c96a25e13f830"
1111
  },
1112
  "hf_model": {
1113
  "path": "hf_model:metrics/website_integrity.json",
1114
  "exists": true,
1115
  "bytes": 15221,
1116
+ "sha256": "140d8be179f51351ae55ba7587b7042a1e512e72d9318563a78c96a25e13f830"
1117
  }
1118
  },
1119
+ "failures": [
1120
+ {
1121
+ "surface": "hf_space",
1122
+ "kind": "hash_mismatch",
1123
+ "path": "hf_space:data/website_integrity.json",
1124
+ "expected_sha256": "dcbd09b4c4522770c43504c500eb653de706538516ee2ec72e491ffc3416c6e2",
1125
+ "actual_sha256": "140d8be179f51351ae55ba7587b7042a1e512e72d9318563a78c96a25e13f830"
1126
+ },
1127
+ {
1128
+ "surface": "hf_artifacts",
1129
+ "kind": "hash_mismatch",
1130
+ "path": "hf_artifacts:docs/data/website_integrity.json",
1131
+ "expected_sha256": "dcbd09b4c4522770c43504c500eb653de706538516ee2ec72e491ffc3416c6e2",
1132
+ "actual_sha256": "140d8be179f51351ae55ba7587b7042a1e512e72d9318563a78c96a25e13f830"
1133
+ },
1134
+ {
1135
+ "surface": "hf_model",
1136
+ "kind": "hash_mismatch",
1137
+ "path": "hf_model:metrics/website_integrity.json",
1138
+ "expected_sha256": "dcbd09b4c4522770c43504c500eb653de706538516ee2ec72e491ffc3416c6e2",
1139
+ "actual_sha256": "140d8be179f51351ae55ba7587b7042a1e512e72d9318563a78c96a25e13f830"
1140
+ }
1141
+ ]
1142
  },
1143
  {
1144
  "name": "data/xperience10m_dataset_card_alignment.json",
 
1837
  },
1838
  "failures": []
1839
  },
1840
+ {
1841
+ "name": "scripts/omni/analyze_qwen3_omni_errors.py",
1842
+ "status": "fail",
1843
+ "local": {
1844
+ "path": "repo:scripts/omni/analyze_qwen3_omni_errors.py",
1845
+ "exists": true,
1846
+ "bytes": 15676,
1847
+ "sha256": "d4c7e46d9fbd5f9d84bc32374f457fd8c9d68c8faa39c77bc45770eb95d80337"
1848
+ },
1849
+ "mirrors": {
1850
+ "hf_artifacts": {
1851
+ "path": "hf_artifacts:scripts/omni/analyze_qwen3_omni_errors.py",
1852
+ "exists": true,
1853
+ "bytes": 15655,
1854
+ "sha256": "e90ffd4bb75b001ab41cd956dfbb0a99b574d0b5e8ffc1a64e2887490d658daa"
1855
+ },
1856
+ "hf_model": {
1857
+ "path": "hf_model:scripts/omni/analyze_qwen3_omni_errors.py",
1858
+ "exists": true,
1859
+ "bytes": 15655,
1860
+ "sha256": "e90ffd4bb75b001ab41cd956dfbb0a99b574d0b5e8ffc1a64e2887490d658daa"
1861
+ }
1862
+ },
1863
+ "failures": [
1864
+ {
1865
+ "surface": "hf_artifacts",
1866
+ "kind": "hash_mismatch",
1867
+ "path": "hf_artifacts:scripts/omni/analyze_qwen3_omni_errors.py",
1868
+ "expected_sha256": "d4c7e46d9fbd5f9d84bc32374f457fd8c9d68c8faa39c77bc45770eb95d80337",
1869
+ "actual_sha256": "e90ffd4bb75b001ab41cd956dfbb0a99b574d0b5e8ffc1a64e2887490d658daa"
1870
+ },
1871
+ {
1872
+ "surface": "hf_model",
1873
+ "kind": "hash_mismatch",
1874
+ "path": "hf_model:scripts/omni/analyze_qwen3_omni_errors.py",
1875
+ "expected_sha256": "d4c7e46d9fbd5f9d84bc32374f457fd8c9d68c8faa39c77bc45770eb95d80337",
1876
+ "actual_sha256": "e90ffd4bb75b001ab41cd956dfbb0a99b574d0b5e8ffc1a64e2887490d658daa"
1877
+ }
1878
+ ]
1879
+ },
1880
  {
1881
  "name": "scripts/audio_ablation_and_raw_upgrade.py",
1882
  "status": "pass",
 
1908
  "local": {
1909
  "path": "repo:scripts/build_artifact_index.py",
1910
  "exists": true,
1911
+ "bytes": 32191,
1912
+ "sha256": "4a105c732d2f6c54a78333d7f47e0139325ba638027e34e6acd929a90626b8e0"
1913
  },
1914
  "mirrors": {
1915
  "hf_artifacts": {
1916
  "path": "hf_artifacts:scripts/build_artifact_index.py",
1917
  "exists": true,
1918
+ "bytes": 32191,
1919
+ "sha256": "4a105c732d2f6c54a78333d7f47e0139325ba638027e34e6acd929a90626b8e0"
1920
  },
1921
  "hf_model": {
1922
  "path": "hf_model:scripts/build_artifact_index.py",
1923
  "exists": true,
1924
+ "bytes": 32191,
1925
+ "sha256": "4a105c732d2f6c54a78333d7f47e0139325ba638027e34e6acd929a90626b8e0"
1926
  }
1927
  },
1928
  "failures": []
 
2208
  "local": {
2209
  "path": "repo:scripts/validate_mirror_parity.py",
2210
  "exists": true,
2211
+ "bytes": 13781,
2212
+ "sha256": "3659adf936b058617dde97ee4c424615a361e59f5ea74975116422dfe01768e8"
2213
  },
2214
  "mirrors": {
2215
  "hf_artifacts": {
2216
  "path": "hf_artifacts:scripts/validate_mirror_parity.py",
2217
  "exists": true,
2218
+ "bytes": 13781,
2219
+ "sha256": "3659adf936b058617dde97ee4c424615a361e59f5ea74975116422dfe01768e8"
2220
  },
2221
  "hf_model": {
2222
  "path": "hf_model:scripts/validate_mirror_parity.py",
2223
  "exists": true,
2224
+ "bytes": 13781,
2225
+ "sha256": "3659adf936b058617dde97ee4c424615a361e59f5ea74975116422dfe01768e8"
2226
  }
2227
  },
2228
  "failures": []
 
2961
  },
2962
  "failures": []
2963
  },
2964
+ {
2965
+ "name": "results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/ERROR_ANALYSIS.md",
2966
+ "status": "pass",
2967
+ "local": {
2968
+ "path": "repo:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/ERROR_ANALYSIS.md",
2969
+ "exists": true,
2970
+ "bytes": 3331,
2971
+ "sha256": "063fcc2ebd7b57ab5b281fd5e8edc629da4e1f4e5a708483ba27375d02af9467"
2972
+ },
2973
+ "mirrors": {
2974
+ "hf_space": {
2975
+ "path": "hf_space:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/ERROR_ANALYSIS.md",
2976
+ "exists": true,
2977
+ "bytes": 3331,
2978
+ "sha256": "063fcc2ebd7b57ab5b281fd5e8edc629da4e1f4e5a708483ba27375d02af9467"
2979
+ },
2980
+ "hf_artifacts": {
2981
+ "path": "hf_artifacts:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/ERROR_ANALYSIS.md",
2982
+ "exists": true,
2983
+ "bytes": 3331,
2984
+ "sha256": "063fcc2ebd7b57ab5b281fd5e8edc629da4e1f4e5a708483ba27375d02af9467"
2985
+ },
2986
+ "hf_model": {
2987
+ "path": "hf_model:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/ERROR_ANALYSIS.md",
2988
+ "exists": true,
2989
+ "bytes": 3331,
2990
+ "sha256": "063fcc2ebd7b57ab5b281fd5e8edc629da4e1f4e5a708483ba27375d02af9467"
2991
+ }
2992
+ },
2993
+ "failures": []
2994
+ },
2995
+ {
2996
+ "name": "results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/error_analysis_summary.json",
2997
+ "status": "pass",
2998
+ "local": {
2999
+ "path": "repo:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/error_analysis_summary.json",
3000
+ "exists": true,
3001
+ "bytes": 25202,
3002
+ "sha256": "c2e4eaa686f5d9739a8d0bfd8ae51a453b94019489ed84a154e2bce2fa316ff5"
3003
+ },
3004
+ "mirrors": {
3005
+ "hf_space": {
3006
+ "path": "hf_space:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/error_analysis_summary.json",
3007
+ "exists": true,
3008
+ "bytes": 25202,
3009
+ "sha256": "c2e4eaa686f5d9739a8d0bfd8ae51a453b94019489ed84a154e2bce2fa316ff5"
3010
+ },
3011
+ "hf_artifacts": {
3012
+ "path": "hf_artifacts:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/error_analysis_summary.json",
3013
+ "exists": true,
3014
+ "bytes": 25202,
3015
+ "sha256": "c2e4eaa686f5d9739a8d0bfd8ae51a453b94019489ed84a154e2bce2fa316ff5"
3016
+ },
3017
+ "hf_model": {
3018
+ "path": "hf_model:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/error_analysis_summary.json",
3019
+ "exists": true,
3020
+ "bytes": 25202,
3021
+ "sha256": "c2e4eaa686f5d9739a8d0bfd8ae51a453b94019489ed84a154e2bce2fa316ff5"
3022
+ }
3023
+ },
3024
+ "failures": []
3025
+ },
3026
+ {
3027
+ "name": "results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/episode_error_analysis.csv",
3028
+ "status": "fail",
3029
+ "local": {
3030
+ "path": "repo:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/episode_error_analysis.csv",
3031
+ "exists": true,
3032
+ "bytes": 2121,
3033
+ "sha256": "7f0bc74140f100b9fe444c38eb74d155605bfc5984f665e653a2cd34a5cb96bd"
3034
+ },
3035
+ "mirrors": {
3036
+ "hf_space": {
3037
+ "path": "hf_space:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/episode_error_analysis.csv",
3038
+ "exists": true,
3039
+ "bytes": 2136,
3040
+ "sha256": "4024fa756edb5a8a9aaac7c213eb411e8d146b109594ad339cc13b08c960bba9"
3041
+ },
3042
+ "hf_artifacts": {
3043
+ "path": "hf_artifacts:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/episode_error_analysis.csv",
3044
+ "exists": true,
3045
+ "bytes": 2136,
3046
+ "sha256": "4024fa756edb5a8a9aaac7c213eb411e8d146b109594ad339cc13b08c960bba9"
3047
+ },
3048
+ "hf_model": {
3049
+ "path": "hf_model:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/episode_error_analysis.csv",
3050
+ "exists": true,
3051
+ "bytes": 2136,
3052
+ "sha256": "4024fa756edb5a8a9aaac7c213eb411e8d146b109594ad339cc13b08c960bba9"
3053
+ }
3054
+ },
3055
+ "failures": [
3056
+ {
3057
+ "surface": "hf_space",
3058
+ "kind": "hash_mismatch",
3059
+ "path": "hf_space:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/episode_error_analysis.csv",
3060
+ "expected_sha256": "7f0bc74140f100b9fe444c38eb74d155605bfc5984f665e653a2cd34a5cb96bd",
3061
+ "actual_sha256": "4024fa756edb5a8a9aaac7c213eb411e8d146b109594ad339cc13b08c960bba9"
3062
+ },
3063
+ {
3064
+ "surface": "hf_artifacts",
3065
+ "kind": "hash_mismatch",
3066
+ "path": "hf_artifacts:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/episode_error_analysis.csv",
3067
+ "expected_sha256": "7f0bc74140f100b9fe444c38eb74d155605bfc5984f665e653a2cd34a5cb96bd",
3068
+ "actual_sha256": "4024fa756edb5a8a9aaac7c213eb411e8d146b109594ad339cc13b08c960bba9"
3069
+ },
3070
+ {
3071
+ "surface": "hf_model",
3072
+ "kind": "hash_mismatch",
3073
+ "path": "hf_model:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/episode_error_analysis.csv",
3074
+ "expected_sha256": "7f0bc74140f100b9fe444c38eb74d155605bfc5984f665e653a2cd34a5cb96bd",
3075
+ "actual_sha256": "4024fa756edb5a8a9aaac7c213eb411e8d146b109594ad339cc13b08c960bba9"
3076
+ }
3077
+ ]
3078
+ },
3079
+ {
3080
+ "name": "results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/action_family_error_analysis.csv",
3081
+ "status": "fail",
3082
+ "local": {
3083
+ "path": "repo:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/action_family_error_analysis.csv",
3084
+ "exists": true,
3085
+ "bytes": 1320,
3086
+ "sha256": "e15bf22e96b887c4b00aeb8ba548f4fd72ea0aab0772cc59e9bdda517ad72430"
3087
+ },
3088
+ "mirrors": {
3089
+ "hf_space": {
3090
+ "path": "hf_space:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/action_family_error_analysis.csv",
3091
+ "exists": true,
3092
+ "bytes": 1329,
3093
+ "sha256": "d995069202708fa456b35aa459ba6e66d90c799b3c4f7b43aa0f6ac4871c986a"
3094
+ },
3095
+ "hf_artifacts": {
3096
+ "path": "hf_artifacts:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/action_family_error_analysis.csv",
3097
+ "exists": true,
3098
+ "bytes": 1329,
3099
+ "sha256": "d995069202708fa456b35aa459ba6e66d90c799b3c4f7b43aa0f6ac4871c986a"
3100
+ },
3101
+ "hf_model": {
3102
+ "path": "hf_model:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/action_family_error_analysis.csv",
3103
+ "exists": true,
3104
+ "bytes": 1329,
3105
+ "sha256": "d995069202708fa456b35aa459ba6e66d90c799b3c4f7b43aa0f6ac4871c986a"
3106
+ }
3107
+ },
3108
+ "failures": [
3109
+ {
3110
+ "surface": "hf_space",
3111
+ "kind": "hash_mismatch",
3112
+ "path": "hf_space:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/action_family_error_analysis.csv",
3113
+ "expected_sha256": "e15bf22e96b887c4b00aeb8ba548f4fd72ea0aab0772cc59e9bdda517ad72430",
3114
+ "actual_sha256": "d995069202708fa456b35aa459ba6e66d90c799b3c4f7b43aa0f6ac4871c986a"
3115
+ },
3116
+ {
3117
+ "surface": "hf_artifacts",
3118
+ "kind": "hash_mismatch",
3119
+ "path": "hf_artifacts:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/action_family_error_analysis.csv",
3120
+ "expected_sha256": "e15bf22e96b887c4b00aeb8ba548f4fd72ea0aab0772cc59e9bdda517ad72430",
3121
+ "actual_sha256": "d995069202708fa456b35aa459ba6e66d90c799b3c4f7b43aa0f6ac4871c986a"
3122
+ },
3123
+ {
3124
+ "surface": "hf_model",
3125
+ "kind": "hash_mismatch",
3126
+ "path": "hf_model:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/action_family_error_analysis.csv",
3127
+ "expected_sha256": "e15bf22e96b887c4b00aeb8ba548f4fd72ea0aab0772cc59e9bdda517ad72430",
3128
+ "actual_sha256": "d995069202708fa456b35aa459ba6e66d90c799b3c4f7b43aa0f6ac4871c986a"
3129
+ }
3130
+ ]
3131
+ },
3132
+ {
3133
+ "name": "results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/train_seen_error_analysis.csv",
3134
+ "status": "fail",
3135
+ "local": {
3136
+ "path": "repo:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/train_seen_error_analysis.csv",
3137
+ "exists": true,
3138
+ "bytes": 572,
3139
+ "sha256": "cb196616b6f073266087d8cb7182e36c0a761607f3082ad78c350fd99e1996e7"
3140
+ },
3141
+ "mirrors": {
3142
+ "hf_space": {
3143
+ "path": "hf_space:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/train_seen_error_analysis.csv",
3144
+ "exists": true,
3145
+ "bytes": 575,
3146
+ "sha256": "51de0dd0c65d6edc25e78598eebd681fd6ec16ac27de0fa406cc4318023402ad"
3147
+ },
3148
+ "hf_artifacts": {
3149
+ "path": "hf_artifacts:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/train_seen_error_analysis.csv",
3150
+ "exists": true,
3151
+ "bytes": 575,
3152
+ "sha256": "51de0dd0c65d6edc25e78598eebd681fd6ec16ac27de0fa406cc4318023402ad"
3153
+ },
3154
+ "hf_model": {
3155
+ "path": "hf_model:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/train_seen_error_analysis.csv",
3156
+ "exists": true,
3157
+ "bytes": 575,
3158
+ "sha256": "51de0dd0c65d6edc25e78598eebd681fd6ec16ac27de0fa406cc4318023402ad"
3159
+ }
3160
+ },
3161
+ "failures": [
3162
+ {
3163
+ "surface": "hf_space",
3164
+ "kind": "hash_mismatch",
3165
+ "path": "hf_space:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/train_seen_error_analysis.csv",
3166
+ "expected_sha256": "cb196616b6f073266087d8cb7182e36c0a761607f3082ad78c350fd99e1996e7",
3167
+ "actual_sha256": "51de0dd0c65d6edc25e78598eebd681fd6ec16ac27de0fa406cc4318023402ad"
3168
+ },
3169
+ {
3170
+ "surface": "hf_artifacts",
3171
+ "kind": "hash_mismatch",
3172
+ "path": "hf_artifacts:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/train_seen_error_analysis.csv",
3173
+ "expected_sha256": "cb196616b6f073266087d8cb7182e36c0a761607f3082ad78c350fd99e1996e7",
3174
+ "actual_sha256": "51de0dd0c65d6edc25e78598eebd681fd6ec16ac27de0fa406cc4318023402ad"
3175
+ },
3176
+ {
3177
+ "surface": "hf_model",
3178
+ "kind": "hash_mismatch",
3179
+ "path": "hf_model:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/train_seen_error_analysis.csv",
3180
+ "expected_sha256": "cb196616b6f073266087d8cb7182e36c0a761607f3082ad78c350fd99e1996e7",
3181
+ "actual_sha256": "51de0dd0c65d6edc25e78598eebd681fd6ec16ac27de0fa406cc4318023402ad"
3182
+ }
3183
+ ]
3184
+ },
3185
+ {
3186
+ "name": "results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/missing_modality_error_analysis.csv",
3187
+ "status": "fail",
3188
+ "local": {
3189
+ "path": "repo:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/missing_modality_error_analysis.csv",
3190
+ "exists": true,
3191
+ "bytes": 408,
3192
+ "sha256": "6447cf285b466a914055adb0aef4f3d47bf82d33a277d8ca2e6f22c4f0f2a7f7"
3193
+ },
3194
+ "mirrors": {
3195
+ "hf_space": {
3196
+ "path": "hf_space:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/missing_modality_error_analysis.csv",
3197
+ "exists": true,
3198
+ "bytes": 410,
3199
+ "sha256": "1bef2f2a709e2b93a01d1cbb43bb483de5bc5b18b25707c796e1df0bab204171"
3200
+ },
3201
+ "hf_artifacts": {
3202
+ "path": "hf_artifacts:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/missing_modality_error_analysis.csv",
3203
+ "exists": true,
3204
+ "bytes": 410,
3205
+ "sha256": "1bef2f2a709e2b93a01d1cbb43bb483de5bc5b18b25707c796e1df0bab204171"
3206
+ },
3207
+ "hf_model": {
3208
+ "path": "hf_model:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/missing_modality_error_analysis.csv",
3209
+ "exists": true,
3210
+ "bytes": 410,
3211
+ "sha256": "1bef2f2a709e2b93a01d1cbb43bb483de5bc5b18b25707c796e1df0bab204171"
3212
+ }
3213
+ },
3214
+ "failures": [
3215
+ {
3216
+ "surface": "hf_space",
3217
+ "kind": "hash_mismatch",
3218
+ "path": "hf_space:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/missing_modality_error_analysis.csv",
3219
+ "expected_sha256": "6447cf285b466a914055adb0aef4f3d47bf82d33a277d8ca2e6f22c4f0f2a7f7",
3220
+ "actual_sha256": "1bef2f2a709e2b93a01d1cbb43bb483de5bc5b18b25707c796e1df0bab204171"
3221
+ },
3222
+ {
3223
+ "surface": "hf_artifacts",
3224
+ "kind": "hash_mismatch",
3225
+ "path": "hf_artifacts:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/missing_modality_error_analysis.csv",
3226
+ "expected_sha256": "6447cf285b466a914055adb0aef4f3d47bf82d33a277d8ca2e6f22c4f0f2a7f7",
3227
+ "actual_sha256": "1bef2f2a709e2b93a01d1cbb43bb483de5bc5b18b25707c796e1df0bab204171"
3228
+ },
3229
+ {
3230
+ "surface": "hf_model",
3231
+ "kind": "hash_mismatch",
3232
+ "path": "hf_model:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/missing_modality_error_analysis.csv",
3233
+ "expected_sha256": "6447cf285b466a914055adb0aef4f3d47bf82d33a277d8ca2e6f22c4f0f2a7f7",
3234
+ "actual_sha256": "1bef2f2a709e2b93a01d1cbb43bb483de5bc5b18b25707c796e1df0bab204171"
3235
+ }
3236
+ ]
3237
+ },
3238
+ {
3239
+ "name": "results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/object_category_error_analysis.csv",
3240
+ "status": "fail",
3241
+ "local": {
3242
+ "path": "repo:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/object_category_error_analysis.csv",
3243
+ "exists": true,
3244
+ "bytes": 1704,
3245
+ "sha256": "f9cbd5e566ef666fe2d1050cc5bdadc7967a2056bdaa1e2e9f88fb0c22ee0ef8"
3246
+ },
3247
+ "mirrors": {
3248
+ "hf_space": {
3249
+ "path": "hf_space:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/object_category_error_analysis.csv",
3250
+ "exists": true,
3251
+ "bytes": 1715,
3252
+ "sha256": "5c0e94caf4fe1eb26565e0bb796cd3c1eed2741b98a528f38317bcfb6a4c2e23"
3253
+ },
3254
+ "hf_artifacts": {
3255
+ "path": "hf_artifacts:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/object_category_error_analysis.csv",
3256
+ "exists": true,
3257
+ "bytes": 1715,
3258
+ "sha256": "5c0e94caf4fe1eb26565e0bb796cd3c1eed2741b98a528f38317bcfb6a4c2e23"
3259
+ },
3260
+ "hf_model": {
3261
+ "path": "hf_model:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/object_category_error_analysis.csv",
3262
+ "exists": true,
3263
+ "bytes": 1715,
3264
+ "sha256": "5c0e94caf4fe1eb26565e0bb796cd3c1eed2741b98a528f38317bcfb6a4c2e23"
3265
+ }
3266
+ },
3267
+ "failures": [
3268
+ {
3269
+ "surface": "hf_space",
3270
+ "kind": "hash_mismatch",
3271
+ "path": "hf_space:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/object_category_error_analysis.csv",
3272
+ "expected_sha256": "f9cbd5e566ef666fe2d1050cc5bdadc7967a2056bdaa1e2e9f88fb0c22ee0ef8",
3273
+ "actual_sha256": "5c0e94caf4fe1eb26565e0bb796cd3c1eed2741b98a528f38317bcfb6a4c2e23"
3274
+ },
3275
+ {
3276
+ "surface": "hf_artifacts",
3277
+ "kind": "hash_mismatch",
3278
+ "path": "hf_artifacts:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/object_category_error_analysis.csv",
3279
+ "expected_sha256": "f9cbd5e566ef666fe2d1050cc5bdadc7967a2056bdaa1e2e9f88fb0c22ee0ef8",
3280
+ "actual_sha256": "5c0e94caf4fe1eb26565e0bb796cd3c1eed2741b98a528f38317bcfb6a4c2e23"
3281
+ },
3282
+ {
3283
+ "surface": "hf_model",
3284
+ "kind": "hash_mismatch",
3285
+ "path": "hf_model:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/object_category_error_analysis.csv",
3286
+ "expected_sha256": "f9cbd5e566ef666fe2d1050cc5bdadc7967a2056bdaa1e2e9f88fb0c22ee0ef8",
3287
+ "actual_sha256": "5c0e94caf4fe1eb26565e0bb796cd3c1eed2741b98a528f38317bcfb6a4c2e23"
3288
+ }
3289
+ ]
3290
+ },
3291
+ {
3292
+ "name": "docs/ARTIFACT_GUIDE.md",
3293
+ "status": "pass",
3294
+ "local": {
3295
+ "path": "repo:ARTIFACT_GUIDE.md",
3296
+ "exists": true,
3297
+ "bytes": 16318,
3298
+ "sha256": "cda5f4b5be4b7a2d26aff6ed7f930bfba13dfc463d533a9880193c0a0611b677"
3299
+ },
3300
+ "mirrors": {
3301
+ "hf_space": {
3302
+ "path": "hf_space:ARTIFACT_GUIDE.md",
3303
+ "exists": true,
3304
+ "bytes": 16318,
3305
+ "sha256": "cda5f4b5be4b7a2d26aff6ed7f930bfba13dfc463d533a9880193c0a0611b677"
3306
+ },
3307
+ "hf_artifacts": {
3308
+ "path": "hf_artifacts:ARTIFACT_GUIDE.md",
3309
+ "exists": true,
3310
+ "bytes": 16318,
3311
+ "sha256": "cda5f4b5be4b7a2d26aff6ed7f930bfba13dfc463d533a9880193c0a0611b677"
3312
+ },
3313
+ "hf_model": {
3314
+ "path": "hf_model:ARTIFACT_GUIDE.md",
3315
+ "exists": true,
3316
+ "bytes": 16318,
3317
+ "sha256": "cda5f4b5be4b7a2d26aff6ed7f930bfba13dfc463d533a9880193c0a0611b677"
3318
+ }
3319
+ },
3320
+ "failures": []
3321
+ },
3322
+ {
3323
+ "name": "docs/OMNI_MODEL_EXTENSION_CONTRACT.md",
3324
+ "status": "pass",
3325
+ "local": {
3326
+ "path": "repo:OMNI_MODEL_EXTENSION_CONTRACT.md",
3327
+ "exists": true,
3328
+ "bytes": 8900,
3329
+ "sha256": "c4e51d0aa7536045c229418603a67c6b3c5f31c9d756ca7395cb0c9455f0ed6d"
3330
+ },
3331
+ "mirrors": {
3332
+ "hf_space": {
3333
+ "path": "hf_space:OMNI_MODEL_EXTENSION_CONTRACT.md",
3334
+ "exists": true,
3335
+ "bytes": 8900,
3336
+ "sha256": "c4e51d0aa7536045c229418603a67c6b3c5f31c9d756ca7395cb0c9455f0ed6d"
3337
+ },
3338
+ "hf_artifacts": {
3339
+ "path": "hf_artifacts:OMNI_MODEL_EXTENSION_CONTRACT.md",
3340
+ "exists": true,
3341
+ "bytes": 8900,
3342
+ "sha256": "c4e51d0aa7536045c229418603a67c6b3c5f31c9d756ca7395cb0c9455f0ed6d"
3343
+ },
3344
+ "hf_model": {
3345
+ "path": "hf_model:OMNI_MODEL_EXTENSION_CONTRACT.md",
3346
+ "exists": true,
3347
+ "bytes": 8900,
3348
+ "sha256": "c4e51d0aa7536045c229418603a67c6b3c5f31c9d756ca7395cb0c9455f0ed6d"
3349
+ }
3350
+ },
3351
+ "failures": []
3352
+ },
3353
  {
3354
  "name": "docs/QUALITY_GATES.md",
3355
  "status": "pass",
 
3604
  "local": {
3605
  "path": "repo:PROJECT_STATUS.md",
3606
  "exists": true,
3607
+ "bytes": 8805,
3608
+ "sha256": "4051b78674306078880de33a144a499144b2487b11455c70a364a94cefa035a7"
3609
  },
3610
  "mirrors": {
3611
  "hf_space": {
3612
  "path": "hf_space:PROJECT_STATUS.md",
3613
  "exists": true,
3614
+ "bytes": 8805,
3615
+ "sha256": "4051b78674306078880de33a144a499144b2487b11455c70a364a94cefa035a7"
3616
  },
3617
  "hf_artifacts": {
3618
  "path": "hf_artifacts:PROJECT_STATUS.md",
3619
  "exists": true,
3620
+ "bytes": 8805,
3621
+ "sha256": "4051b78674306078880de33a144a499144b2487b11455c70a364a94cefa035a7"
3622
  },
3623
  "hf_model": {
3624
  "path": "hf_model:PROJECT_STATUS.md",
3625
  "exists": true,
3626
+ "bytes": 8805,
3627
+ "sha256": "4051b78674306078880de33a144a499144b2487b11455c70a364a94cefa035a7"
3628
  }
3629
  },
3630
  "failures": []
 
3754
  "failures": []
3755
  }
3756
  ],
3757
+ "failures": [
3758
+ {
3759
+ "group": "data/artifact_index.json",
3760
+ "surface": "hf_space",
3761
+ "kind": "hash_mismatch",
3762
+ "path": "hf_space:data/artifact_index.json",
3763
+ "expected_sha256": "87782cd08bc1106d694a727e21333450d2965b48c48f500d1b6f4294d7b247d0",
3764
+ "actual_sha256": "2563b854f81b07bfde2880647d0145b511be071a1a274fe1e909ce2be7ce43e1"
3765
+ },
3766
+ {
3767
+ "group": "data/artifact_index.json",
3768
+ "surface": "hf_artifacts",
3769
+ "kind": "hash_mismatch",
3770
+ "path": "hf_artifacts:docs/data/artifact_index.json",
3771
+ "expected_sha256": "87782cd08bc1106d694a727e21333450d2965b48c48f500d1b6f4294d7b247d0",
3772
+ "actual_sha256": "2563b854f81b07bfde2880647d0145b511be071a1a274fe1e909ce2be7ce43e1"
3773
+ },
3774
+ {
3775
+ "group": "data/artifact_index.json",
3776
+ "surface": "hf_model",
3777
+ "kind": "hash_mismatch",
3778
+ "path": "hf_model:metrics/artifact_index.json",
3779
+ "expected_sha256": "87782cd08bc1106d694a727e21333450d2965b48c48f500d1b6f4294d7b247d0",
3780
+ "actual_sha256": "2563b854f81b07bfde2880647d0145b511be071a1a274fe1e909ce2be7ce43e1"
3781
+ },
3782
+ {
3783
+ "group": "data/publication_audit.json",
3784
+ "surface": "hf_space",
3785
+ "kind": "hash_mismatch",
3786
+ "path": "hf_space:data/publication_audit.json",
3787
+ "expected_sha256": "8a21c29d92f3a15b835c37d7784c17fada3edbda050515deed8e440535ed046d",
3788
+ "actual_sha256": "a1741a97c2fb5dee8b9ed8e988b31530128e4fab8b8c458cb8f381e2ad16756c"
3789
+ },
3790
+ {
3791
+ "group": "data/publication_audit.json",
3792
+ "surface": "hf_artifacts",
3793
+ "kind": "hash_mismatch",
3794
+ "path": "hf_artifacts:docs/data/publication_audit.json",
3795
+ "expected_sha256": "8a21c29d92f3a15b835c37d7784c17fada3edbda050515deed8e440535ed046d",
3796
+ "actual_sha256": "a1741a97c2fb5dee8b9ed8e988b31530128e4fab8b8c458cb8f381e2ad16756c"
3797
+ },
3798
+ {
3799
+ "group": "data/publication_audit.json",
3800
+ "surface": "hf_model",
3801
+ "kind": "hash_mismatch",
3802
+ "path": "hf_model:metrics/publication_audit.json",
3803
+ "expected_sha256": "8a21c29d92f3a15b835c37d7784c17fada3edbda050515deed8e440535ed046d",
3804
+ "actual_sha256": "a1741a97c2fb5dee8b9ed8e988b31530128e4fab8b8c458cb8f381e2ad16756c"
3805
+ },
3806
+ {
3807
+ "group": "data/scope_claims_audit.json",
3808
+ "surface": "hf_space",
3809
+ "kind": "hash_mismatch",
3810
+ "path": "hf_space:data/scope_claims_audit.json",
3811
+ "expected_sha256": "77402dc77c4ecf5cf1e68480ae2c9822a134ae7ef4a24a7b8b9008a2509c2fa3",
3812
+ "actual_sha256": "4fb8c088f8ec533b142534b37e9241f8690c8819333434f5d89336c2af8c1c31"
3813
+ },
3814
+ {
3815
+ "group": "data/scope_claims_audit.json",
3816
+ "surface": "hf_artifacts",
3817
+ "kind": "hash_mismatch",
3818
+ "path": "hf_artifacts:docs/data/scope_claims_audit.json",
3819
+ "expected_sha256": "77402dc77c4ecf5cf1e68480ae2c9822a134ae7ef4a24a7b8b9008a2509c2fa3",
3820
+ "actual_sha256": "4fb8c088f8ec533b142534b37e9241f8690c8819333434f5d89336c2af8c1c31"
3821
+ },
3822
+ {
3823
+ "group": "data/scope_claims_audit.json",
3824
+ "surface": "hf_model",
3825
+ "kind": "hash_mismatch",
3826
+ "path": "hf_model:metrics/scope_claims_audit.json",
3827
+ "expected_sha256": "77402dc77c4ecf5cf1e68480ae2c9822a134ae7ef4a24a7b8b9008a2509c2fa3",
3828
+ "actual_sha256": "4fb8c088f8ec533b142534b37e9241f8690c8819333434f5d89336c2af8c1c31"
3829
+ },
3830
+ {
3831
+ "group": "data/task_surface_integrity.json",
3832
+ "surface": "hf_space",
3833
+ "kind": "hash_mismatch",
3834
+ "path": "hf_space:data/task_surface_integrity.json",
3835
+ "expected_sha256": "8232e2bafa8b5157d97c018e41be5da3ec69ddb4d2020a0dcc7c6377c5575bb6",
3836
+ "actual_sha256": "51c30fe86c558042960e57a252bc6d3c67d95d5a70a8747043a1cdffe57cf53f"
3837
+ },
3838
+ {
3839
+ "group": "data/task_surface_integrity.json",
3840
+ "surface": "hf_artifacts",
3841
+ "kind": "hash_mismatch",
3842
+ "path": "hf_artifacts:docs/data/task_surface_integrity.json",
3843
+ "expected_sha256": "8232e2bafa8b5157d97c018e41be5da3ec69ddb4d2020a0dcc7c6377c5575bb6",
3844
+ "actual_sha256": "51c30fe86c558042960e57a252bc6d3c67d95d5a70a8747043a1cdffe57cf53f"
3845
+ },
3846
+ {
3847
+ "group": "data/task_surface_integrity.json",
3848
+ "surface": "hf_model",
3849
+ "kind": "hash_mismatch",
3850
+ "path": "hf_model:metrics/task_surface_integrity.json",
3851
+ "expected_sha256": "8232e2bafa8b5157d97c018e41be5da3ec69ddb4d2020a0dcc7c6377c5575bb6",
3852
+ "actual_sha256": "51c30fe86c558042960e57a252bc6d3c67d95d5a70a8747043a1cdffe57cf53f"
3853
+ },
3854
+ {
3855
+ "group": "data/website_integrity.json",
3856
+ "surface": "hf_space",
3857
+ "kind": "hash_mismatch",
3858
+ "path": "hf_space:data/website_integrity.json",
3859
+ "expected_sha256": "dcbd09b4c4522770c43504c500eb653de706538516ee2ec72e491ffc3416c6e2",
3860
+ "actual_sha256": "140d8be179f51351ae55ba7587b7042a1e512e72d9318563a78c96a25e13f830"
3861
+ },
3862
+ {
3863
+ "group": "data/website_integrity.json",
3864
+ "surface": "hf_artifacts",
3865
+ "kind": "hash_mismatch",
3866
+ "path": "hf_artifacts:docs/data/website_integrity.json",
3867
+ "expected_sha256": "dcbd09b4c4522770c43504c500eb653de706538516ee2ec72e491ffc3416c6e2",
3868
+ "actual_sha256": "140d8be179f51351ae55ba7587b7042a1e512e72d9318563a78c96a25e13f830"
3869
+ },
3870
+ {
3871
+ "group": "data/website_integrity.json",
3872
+ "surface": "hf_model",
3873
+ "kind": "hash_mismatch",
3874
+ "path": "hf_model:metrics/website_integrity.json",
3875
+ "expected_sha256": "dcbd09b4c4522770c43504c500eb653de706538516ee2ec72e491ffc3416c6e2",
3876
+ "actual_sha256": "140d8be179f51351ae55ba7587b7042a1e512e72d9318563a78c96a25e13f830"
3877
+ },
3878
+ {
3879
+ "group": "scripts/omni/analyze_qwen3_omni_errors.py",
3880
+ "surface": "hf_artifacts",
3881
+ "kind": "hash_mismatch",
3882
+ "path": "hf_artifacts:scripts/omni/analyze_qwen3_omni_errors.py",
3883
+ "expected_sha256": "d4c7e46d9fbd5f9d84bc32374f457fd8c9d68c8faa39c77bc45770eb95d80337",
3884
+ "actual_sha256": "e90ffd4bb75b001ab41cd956dfbb0a99b574d0b5e8ffc1a64e2887490d658daa"
3885
+ },
3886
+ {
3887
+ "group": "scripts/omni/analyze_qwen3_omni_errors.py",
3888
+ "surface": "hf_model",
3889
+ "kind": "hash_mismatch",
3890
+ "path": "hf_model:scripts/omni/analyze_qwen3_omni_errors.py",
3891
+ "expected_sha256": "d4c7e46d9fbd5f9d84bc32374f457fd8c9d68c8faa39c77bc45770eb95d80337",
3892
+ "actual_sha256": "e90ffd4bb75b001ab41cd956dfbb0a99b574d0b5e8ffc1a64e2887490d658daa"
3893
+ },
3894
+ {
3895
+ "group": "results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/episode_error_analysis.csv",
3896
+ "surface": "hf_space",
3897
+ "kind": "hash_mismatch",
3898
+ "path": "hf_space:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/episode_error_analysis.csv",
3899
+ "expected_sha256": "7f0bc74140f100b9fe444c38eb74d155605bfc5984f665e653a2cd34a5cb96bd",
3900
+ "actual_sha256": "4024fa756edb5a8a9aaac7c213eb411e8d146b109594ad339cc13b08c960bba9"
3901
+ },
3902
+ {
3903
+ "group": "results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/episode_error_analysis.csv",
3904
+ "surface": "hf_artifacts",
3905
+ "kind": "hash_mismatch",
3906
+ "path": "hf_artifacts:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/episode_error_analysis.csv",
3907
+ "expected_sha256": "7f0bc74140f100b9fe444c38eb74d155605bfc5984f665e653a2cd34a5cb96bd",
3908
+ "actual_sha256": "4024fa756edb5a8a9aaac7c213eb411e8d146b109594ad339cc13b08c960bba9"
3909
+ },
3910
+ {
3911
+ "group": "results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/episode_error_analysis.csv",
3912
+ "surface": "hf_model",
3913
+ "kind": "hash_mismatch",
3914
+ "path": "hf_model:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/episode_error_analysis.csv",
3915
+ "expected_sha256": "7f0bc74140f100b9fe444c38eb74d155605bfc5984f665e653a2cd34a5cb96bd",
3916
+ "actual_sha256": "4024fa756edb5a8a9aaac7c213eb411e8d146b109594ad339cc13b08c960bba9"
3917
+ },
3918
+ {
3919
+ "group": "results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/action_family_error_analysis.csv",
3920
+ "surface": "hf_space",
3921
+ "kind": "hash_mismatch",
3922
+ "path": "hf_space:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/action_family_error_analysis.csv",
3923
+ "expected_sha256": "e15bf22e96b887c4b00aeb8ba548f4fd72ea0aab0772cc59e9bdda517ad72430",
3924
+ "actual_sha256": "d995069202708fa456b35aa459ba6e66d90c799b3c4f7b43aa0f6ac4871c986a"
3925
+ },
3926
+ {
3927
+ "group": "results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/action_family_error_analysis.csv",
3928
+ "surface": "hf_artifacts",
3929
+ "kind": "hash_mismatch",
3930
+ "path": "hf_artifacts:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/action_family_error_analysis.csv",
3931
+ "expected_sha256": "e15bf22e96b887c4b00aeb8ba548f4fd72ea0aab0772cc59e9bdda517ad72430",
3932
+ "actual_sha256": "d995069202708fa456b35aa459ba6e66d90c799b3c4f7b43aa0f6ac4871c986a"
3933
+ },
3934
+ {
3935
+ "group": "results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/action_family_error_analysis.csv",
3936
+ "surface": "hf_model",
3937
+ "kind": "hash_mismatch",
3938
+ "path": "hf_model:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/action_family_error_analysis.csv",
3939
+ "expected_sha256": "e15bf22e96b887c4b00aeb8ba548f4fd72ea0aab0772cc59e9bdda517ad72430",
3940
+ "actual_sha256": "d995069202708fa456b35aa459ba6e66d90c799b3c4f7b43aa0f6ac4871c986a"
3941
+ },
3942
+ {
3943
+ "group": "results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/train_seen_error_analysis.csv",
3944
+ "surface": "hf_space",
3945
+ "kind": "hash_mismatch",
3946
+ "path": "hf_space:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/train_seen_error_analysis.csv",
3947
+ "expected_sha256": "cb196616b6f073266087d8cb7182e36c0a761607f3082ad78c350fd99e1996e7",
3948
+ "actual_sha256": "51de0dd0c65d6edc25e78598eebd681fd6ec16ac27de0fa406cc4318023402ad"
3949
+ },
3950
+ {
3951
+ "group": "results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/train_seen_error_analysis.csv",
3952
+ "surface": "hf_artifacts",
3953
+ "kind": "hash_mismatch",
3954
+ "path": "hf_artifacts:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/train_seen_error_analysis.csv",
3955
+ "expected_sha256": "cb196616b6f073266087d8cb7182e36c0a761607f3082ad78c350fd99e1996e7",
3956
+ "actual_sha256": "51de0dd0c65d6edc25e78598eebd681fd6ec16ac27de0fa406cc4318023402ad"
3957
+ },
3958
+ {
3959
+ "group": "results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/train_seen_error_analysis.csv",
3960
+ "surface": "hf_model",
3961
+ "kind": "hash_mismatch",
3962
+ "path": "hf_model:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/train_seen_error_analysis.csv",
3963
+ "expected_sha256": "cb196616b6f073266087d8cb7182e36c0a761607f3082ad78c350fd99e1996e7",
3964
+ "actual_sha256": "51de0dd0c65d6edc25e78598eebd681fd6ec16ac27de0fa406cc4318023402ad"
3965
+ },
3966
+ {
3967
+ "group": "results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/missing_modality_error_analysis.csv",
3968
+ "surface": "hf_space",
3969
+ "kind": "hash_mismatch",
3970
+ "path": "hf_space:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/missing_modality_error_analysis.csv",
3971
+ "expected_sha256": "6447cf285b466a914055adb0aef4f3d47bf82d33a277d8ca2e6f22c4f0f2a7f7",
3972
+ "actual_sha256": "1bef2f2a709e2b93a01d1cbb43bb483de5bc5b18b25707c796e1df0bab204171"
3973
+ },
3974
+ {
3975
+ "group": "results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/missing_modality_error_analysis.csv",
3976
+ "surface": "hf_artifacts",
3977
+ "kind": "hash_mismatch",
3978
+ "path": "hf_artifacts:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/missing_modality_error_analysis.csv",
3979
+ "expected_sha256": "6447cf285b466a914055adb0aef4f3d47bf82d33a277d8ca2e6f22c4f0f2a7f7",
3980
+ "actual_sha256": "1bef2f2a709e2b93a01d1cbb43bb483de5bc5b18b25707c796e1df0bab204171"
3981
+ },
3982
+ {
3983
+ "group": "results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/missing_modality_error_analysis.csv",
3984
+ "surface": "hf_model",
3985
+ "kind": "hash_mismatch",
3986
+ "path": "hf_model:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/missing_modality_error_analysis.csv",
3987
+ "expected_sha256": "6447cf285b466a914055adb0aef4f3d47bf82d33a277d8ca2e6f22c4f0f2a7f7",
3988
+ "actual_sha256": "1bef2f2a709e2b93a01d1cbb43bb483de5bc5b18b25707c796e1df0bab204171"
3989
+ },
3990
+ {
3991
+ "group": "results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/object_category_error_analysis.csv",
3992
+ "surface": "hf_space",
3993
+ "kind": "hash_mismatch",
3994
+ "path": "hf_space:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/object_category_error_analysis.csv",
3995
+ "expected_sha256": "f9cbd5e566ef666fe2d1050cc5bdadc7967a2056bdaa1e2e9f88fb0c22ee0ef8",
3996
+ "actual_sha256": "5c0e94caf4fe1eb26565e0bb796cd3c1eed2741b98a528f38317bcfb6a4c2e23"
3997
+ },
3998
+ {
3999
+ "group": "results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/object_category_error_analysis.csv",
4000
+ "surface": "hf_artifacts",
4001
+ "kind": "hash_mismatch",
4002
+ "path": "hf_artifacts:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/object_category_error_analysis.csv",
4003
+ "expected_sha256": "f9cbd5e566ef666fe2d1050cc5bdadc7967a2056bdaa1e2e9f88fb0c22ee0ef8",
4004
+ "actual_sha256": "5c0e94caf4fe1eb26565e0bb796cd3c1eed2741b98a528f38317bcfb6a4c2e23"
4005
+ },
4006
+ {
4007
+ "group": "results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/object_category_error_analysis.csv",
4008
+ "surface": "hf_model",
4009
+ "kind": "hash_mismatch",
4010
+ "path": "hf_model:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/object_category_error_analysis.csv",
4011
+ "expected_sha256": "f9cbd5e566ef666fe2d1050cc5bdadc7967a2056bdaa1e2e9f88fb0c22ee0ef8",
4012
+ "actual_sha256": "5c0e94caf4fe1eb26565e0bb796cd3c1eed2741b98a528f38317bcfb6a4c2e23"
4013
+ }
4014
+ ]
4015
  }
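The `failures` records above all share one shape: a mirror surface whose `sha256` differs from the local copy yields a `hash_mismatch` entry carrying the expected and actual digests. A minimal sketch of that comparison follows. The function name `find_hash_mismatches` and the shortened paths are illustrative assumptions, not the project's actual parity script; the digests and byte counts are copied from the `object_category_error_analysis.csv` entry in this diff.

```python
from typing import Dict, List


def find_hash_mismatches(local: Dict, mirrors: Dict[str, Dict]) -> List[Dict]:
    """Return one failure record per mirror whose sha256 differs from the local copy.

    Hypothetical helper: mirrors the record shape seen in mirror_parity.json,
    but is not taken from the repository.
    """
    failures = []
    for surface, entry in mirrors.items():
        if entry.get("sha256") != local.get("sha256"):
            failures.append({
                "surface": surface,
                "kind": "hash_mismatch",
                "path": entry["path"],
                "expected_sha256": local["sha256"],
                "actual_sha256": entry.get("sha256"),
            })
    return failures


# Digests and sizes copied from the diff; paths are shortened for readability.
local = {
    "path": "repo:analysis/object_category_error_analysis.csv",
    "bytes": 1704,
    "sha256": "f9cbd5e566ef666fe2d1050cc5bdadc7967a2056bdaa1e2e9f88fb0c22ee0ef8",
}
mirror_sha = "5c0e94caf4fe1eb26565e0bb796cd3c1eed2741b98a528f38317bcfb6a4c2e23"
mirrors = {
    surface: {
        "path": f"{surface}:analysis/object_category_error_analysis.csv",
        "bytes": 1715,
        "sha256": mirror_sha,
    }
    for surface in ("hf_space", "hf_artifacts", "hf_model")
}

failures = find_hash_mismatches(local, mirrors)
status = "pass" if not failures else "fail"
print(status, len(failures))
```

With the values above, all three mirrors report the same digest as each other but a different one from the local file, and the byte counts also differ (1704 locally, 1715 on each mirror). That pattern is consistent with the mirrors holding one shared copy that is out of step with the local file, rather than each mirror being corrupted separately.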
data/omni_finetune_verified_result.json CHANGED
@@ -67,7 +67,28 @@
67
  "audit_status": "pass",
68
  "contains_raw_xperience10m_data": false,
69
  "contains_qwen_base_weights": false,
70
- "contains_lora_weights": false
71
  },
72
  "required_next_steps": [
73
  "Improve JSON-format reliability through prompt, decoding, constrained parsing, or target formatting changes.",
 
67
  "audit_status": "pass",
68
  "contains_raw_xperience10m_data": false,
69
  "contains_qwen_base_weights": false,
70
+ "contains_lora_weights": false,
71
+ "error_analysis": {
72
+ "status": "pass",
73
+ "path": "results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/error_analysis_summary.json",
74
+ "markdown_report": "results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/ERROR_ANALYSIS.md",
75
+ "groupings": [
76
+ "episode",
77
+ "action_family",
78
+ "train_seen_status",
79
+ "required_modality_state",
80
+ "object_category"
81
+ ],
82
+ "key_readouts": {
83
+ "parsed_prediction_rate": 0.8772321428571429,
84
+ "weakest_action_family": "locomotion",
85
+ "weakest_action_family_samples": 23,
86
+ "weakest_action_family_parsed_prediction_rate": 0.2608695652173913,
87
+ "seen_action_exact_rate": 0.04580152671755725,
88
+ "unseen_action_exact_rate": 0.015772870662460567,
89
+ "required_modality_state": "rrd_missing_only_required_modalities_present"
90
+ }
91
+ }
92
  },
93
  "required_next_steps": [
94
  "Improve JSON-format reliability through prompt, decoding, constrained parsing, or target formatting changes.",
data/project_status.json CHANGED
@@ -180,10 +180,12 @@
180
  "evidence": [
181
  "docs/data/omni_finetune_verified_result.json",
182
  "results/omni_finetune/verified_public/",
 
183
  "scripts/omni/package_verified_omni_result.py",
184
- "scripts/omni/audit_verified_omni_package.py"
 
185
  ],
186
- "readout": "The selected 96/16/16 episode split produced a validation-aware public-safe held-out package with 3,808 exported windows, 512 validation windows, and 448 test predictions. JSON validity is 87.50%, below the 98% target, so it is a stronger diagnostic baseline but not a strong model-quality result."
187
  },
188
  {
189
  "area": "Raw Xperience-10M redistribution",
 
180
  "evidence": [
181
  "docs/data/omni_finetune_verified_result.json",
182
  "results/omni_finetune/verified_public/",
183
+ "results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/",
184
  "scripts/omni/package_verified_omni_result.py",
185
+ "scripts/omni/audit_verified_omni_package.py",
186
+ "scripts/omni/analyze_qwen3_omni_errors.py"
187
  ],
188
+ "readout": "The selected 96/16/16 episode split produced a validation-aware public-safe held-out package with 3,808 exported windows, 512 validation windows, 448 test predictions, and derived error-analysis tables by episode, action family, train-seen status, required-modality state, and object category. JSON validity is 87.50%, below the 98% target, so it is a diagnostic baseline but not a strong model-quality result."
189
  },
190
  {
191
  "area": "Raw Xperience-10M redistribution",
data/publication_audit.json CHANGED
@@ -1,6 +1,6 @@
1
  {
2
  "status": "pass",
3
- "generated_at_utc": "2026-06-06T14:38:05+00:00",
4
  "checks": [
5
  {
6
  "name": "required_publication_assets_present",
@@ -182,8 +182,8 @@
182
  "github_repo": {
183
  "root": "repo",
184
  "exists": true,
185
- "file_count": 442,
186
- "text_file_count": 372,
187
  "largest_file": {
188
  "path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
189
  "bytes": 55702978
@@ -193,8 +193,8 @@
193
  "hf_space_bundle": {
194
  "root": "hf_publish/space",
195
  "exists": true,
196
- "file_count": 356,
197
- "text_file_count": 286,
198
  "largest_file": {
199
  "path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
200
  "bytes": 55702978
@@ -204,8 +204,8 @@
204
  "hf_artifact_bundle": {
205
  "root": "hf_publish/artifacts",
206
  "exists": true,
207
- "file_count": 514,
208
- "text_file_count": 420,
209
  "largest_file": {
210
  "path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
211
  "bytes": 55702978
@@ -215,8 +215,8 @@
215
  "hf_model_bundle": {
216
  "root": "hf_publish/model",
217
  "exists": true,
218
- "file_count": 701,
219
- "text_file_count": 572,
220
  "largest_file": {
221
  "path": "pytorch_model.bin",
222
  "bytes": 93495480
 
1
  {
2
  "status": "pass",
3
+ "generated_at_utc": "2026-06-06T14:54:02+00:00",
4
  "checks": [
5
  {
6
  "name": "required_publication_assets_present",
 
182
  "github_repo": {
183
  "root": "repo",
184
  "exists": true,
185
+ "file_count": 450,
186
+ "text_file_count": 380,
187
  "largest_file": {
188
  "path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
189
  "bytes": 55702978
 
193
  "hf_space_bundle": {
194
  "root": "hf_publish/space",
195
  "exists": true,
196
+ "file_count": 363,
197
+ "text_file_count": 293,
198
  "largest_file": {
199
  "path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
200
  "bytes": 55702978
 
204
  "hf_artifact_bundle": {
205
  "root": "hf_publish/artifacts",
206
  "exists": true,
207
+ "file_count": 522,
208
+ "text_file_count": 428,
209
  "largest_file": {
210
  "path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
211
  "bytes": 55702978
 
215
  "hf_model_bundle": {
216
  "root": "hf_publish/model",
217
  "exists": true,
218
+ "file_count": 709,
219
+ "text_file_count": 580,
220
  "largest_file": {
221
  "path": "pytorch_model.bin",
222
  "bytes": 93495480
data/scope_claims_audit.json CHANGED
@@ -1,6 +1,6 @@
1
  {
2
  "status": "pass",
3
- "generated_at_utc": "2026-06-06T14:35:59+00:00",
4
  "summary": {
5
  "qwen3_omni_verified_diagnostic_pilot": true,
6
  "dataset_manifest_num_episodes": 119,
 
1
  {
2
  "status": "pass",
3
+ "generated_at_utc": "2026-06-06T14:54:01+00:00",
4
  "summary": {
5
  "qwen3_omni_verified_diagnostic_pilot": true,
6
  "dataset_manifest_num_episodes": 119,
data/task_surface_integrity.json CHANGED
@@ -1,6 +1,6 @@
1
  {
2
  "status": "pass",
3
- "generated_at_utc": "2026-06-06T14:35:59+00:00",
4
  "summary": {
5
  "task_count": 12,
6
  "expected_task_count": 12,
@@ -64,15 +64,21 @@
64
  "observed": "timeline_action"
65
  },
66
  {
67
- "name": "timeline_action: public_field_input_short_is_human_readable",
68
  "status": "pass",
69
- "value": "20-frame multimodal window",
70
  "raw_hits": []
71
  },
72
  {
73
- "name": "timeline_action: public_field_card_blurb_is_human_readable",
74
  "status": "pass",
75
- "value": "Recognize the current manipulation action from synchronized visual, motion, inertial, pose, and annotation context.",
76
  "raw_hits": []
77
  },
78
  {
@@ -88,9 +94,9 @@
88
  "raw_hits": []
89
  },
90
  {
91
- "name": "timeline_action: public_field_research_name_is_human_readable",
92
  "status": "pass",
93
- "value": "Egocentric Action Recognition",
94
  "raw_hits": []
95
  },
96
  {
@@ -99,12 +105,6 @@
99
  "value": "Look at one short multimodal window and name what action is happening now.",
100
  "raw_hits": []
101
  },
102
- {
103
- "name": "timeline_action: public_field_process_short_is_human_readable",
104
- "status": "pass",
105
- "value": "window features -> action label builder -> classifier",
106
- "raw_hits": []
107
- },
108
  {
109
  "name": "timeline_action: known_task_family",
110
  "status": "pass",
@@ -184,15 +184,21 @@
184
  "observed": "timeline_subtask"
185
  },
186
  {
187
- "name": "timeline_subtask: public_field_input_short_is_human_readable",
188
  "status": "pass",
189
- "value": "20-frame multimodal window",
190
  "raw_hits": []
191
  },
192
  {
193
- "name": "timeline_subtask: public_field_card_blurb_is_human_readable",
194
  "status": "pass",
195
- "value": "Recognize the broader activity stage so fine actions become a readable procedure timeline.",
196
  "raw_hits": []
197
  },
198
  {
@@ -208,9 +214,9 @@
208
  "raw_hits": []
209
  },
210
  {
211
- "name": "timeline_subtask: public_field_research_name_is_human_readable",
212
  "status": "pass",
213
- "value": "Temporal Subtask Recognition",
214
  "raw_hits": []
215
  },
216
  {
@@ -219,12 +225,6 @@
219
  "value": "Predict the higher-level task stage for the current window.",
220
  "raw_hits": []
221
  },
222
- {
223
- "name": "timeline_subtask: public_field_process_short_is_human_readable",
224
- "status": "pass",
225
- "value": "window features -> subtask label builder -> classifier",
226
- "raw_hits": []
227
- },
228
  {
229
  "name": "timeline_subtask: known_task_family",
230
  "status": "pass",
@@ -304,15 +304,21 @@
304
  "observed": "transition_detection"
305
  },
306
  {
307
- "name": "transition_detection: public_field_input_short_is_human_readable",
308
  "status": "pass",
309
- "value": "current window with boundary target",
310
  "raw_hits": []
311
  },
312
  {
313
- "name": "transition_detection: public_field_card_blurb_is_human_readable",
314
  "status": "pass",
315
- "value": "Detect the local moment where the episode changes from one action segment to the next.",
316
  "raw_hits": []
317
  },
318
  {
@@ -328,9 +334,9 @@
328
  "raw_hits": []
329
  },
330
  {
331
- "name": "transition_detection: public_field_research_name_is_human_readable",
332
  "status": "pass",
333
- "value": "Temporal Action Segmentation",
334
  "raw_hits": []
335
  },
336
  {
@@ -339,12 +345,6 @@
339
  "value": "Detect whether the current window is near a boundary between actions.",
340
  "raw_hits": []
341
  },
342
- {
343
- "name": "transition_detection: public_field_process_short_is_human_readable",
344
- "status": "pass",
345
- "value": "action changes -> boundary labels -> binary classifier",
346
- "raw_hits": []
347
- },
348
  {
349
  "name": "transition_detection: known_task_family",
350
  "status": "pass",
@@ -422,15 +422,21 @@
422
  "observed": "next_action"
423
  },
424
  {
425
- "name": "next_action: public_field_input_short_is_human_readable",
426
  "status": "pass",
427
- "value": "current window at time t",
428
  "raw_hits": []
429
  },
430
  {
431
- "name": "next_action: public_field_card_blurb_is_human_readable",
432
  "status": "pass",
433
- "value": "Forecast the near-future action from the current observations only.",
434
  "raw_hits": []
435
  },
436
  {
@@ -446,9 +452,9 @@
446
  "raw_hits": []
447
  },
448
  {
449
- "name": "next_action: public_field_research_name_is_human_readable",
450
  "status": "pass",
451
- "value": "Short-Horizon Intention Prediction",
452
  "raw_hits": []
453
  },
454
  {
@@ -457,12 +463,6 @@
457
  "value": "Use the current window to guess the action that will happen shortly after it.",
458
  "raw_hits": []
459
  },
460
- {
461
- "name": "next_action: public_field_process_short_is_human_readable",
462
- "status": "pass",
463
- "value": "current features -> future label shift -> classifier",
464
- "raw_hits": []
465
- },
466
  {
467
  "name": "next_action: known_task_family",
468
  "status": "pass",
@@ -540,15 +540,21 @@
540
  "observed": "hand_trajectory_forecast"
541
  },
542
  {
543
- "name": "hand_trajectory_forecast: public_field_input_short_is_human_readable",
544
  "status": "pass",
545
- "value": "current multimodal window",
546
  "raw_hits": []
547
  },
548
  {
549
- "name": "hand_trajectory_forecast: public_field_card_blurb_is_human_readable",
550
  "status": "pass",
551
- "value": "Predict the future 3D left/right hand path from the current multimodal state.",
552
  "raw_hits": []
553
  },
554
  {
@@ -564,9 +570,9 @@
564
  "raw_hits": []
565
  },
566
  {
567
- "name": "hand_trajectory_forecast: public_field_research_name_is_human_readable",
568
  "status": "pass",
569
- "value": "3D Hand Motion Forecasting",
570
  "raw_hits": []
571
  },
572
  {
@@ -575,12 +581,6 @@
575
  "value": "Predict where the hands will move over the next few frames.",
576
  "raw_hits": []
577
  },
578
- {
579
- "name": "hand_trajectory_forecast: public_field_process_short_is_human_readable",
580
- "status": "pass",
581
- "value": "current features -> future mocap target -> regression head",
582
- "raw_hits": []
583
- },
584
  {
585
  "name": "hand_trajectory_forecast: known_task_family",
586
  "status": "pass",
@@ -658,15 +658,21 @@
658
  "observed": "contact_prediction"
659
  },
660
  {
661
- "name": "contact_prediction: public_field_input_short_is_human_readable",
662
  "status": "pass",
663
- "value": "non-contact, non-caption features",
664
  "raw_hits": []
665
  },
666
  {
667
- "name": "contact_prediction: public_field_card_blurb_is_human_readable",
668
  "status": "pass",
669
- "value": "Predict whether body or hand contact with the scene is occurring without leaking contact labels.",
670
  "raw_hits": []
671
  },
672
  {
@@ -682,9 +688,9 @@
682
  "raw_hits": []
683
  },
684
  {
685
- "name": "contact_prediction: public_field_research_name_is_human_readable",
686
  "status": "pass",
687
- "value": "Human-Object Contact Prediction",
688
  "raw_hits": []
689
  },
690
  {
@@ -693,12 +699,6 @@
693
  "value": "Predict whether the body or hand is in contact with something.",
694
  "raw_hits": []
695
  },
696
- {
697
- "name": "contact_prediction: public_field_process_short_is_human_readable",
698
- "status": "pass",
699
- "value": "feature filter -> contact target -> binary classifier",
700
- "raw_hits": []
701
- },
702
  {
703
  "name": "contact_prediction: known_task_family",
704
  "status": "pass",
@@ -774,15 +774,21 @@
774
  "observed": "object_relevance"
775
  },
776
  {
777
- "name": "object_relevance: public_field_input_short_is_human_readable",
778
  "status": "pass",
779
- "value": "non-caption multimodal features",
780
  "raw_hits": []
781
  },
782
  {
783
- "name": "object_relevance: public_field_card_blurb_is_human_readable",
784
  "status": "pass",
785
- "value": "Infer which objects are relevant to the current manipulation window from non-caption features.",
786
  "raw_hits": []
787
  },
788
  {
@@ -798,9 +804,9 @@
798
  "raw_hits": []
799
  },
800
  {
801
- "name": "object_relevance: public_field_research_name_is_human_readable",
802
  "status": "pass",
803
- "value": "Object-Centric Interaction Recognition",
804
  "raw_hits": []
805
  },
806
  {
@@ -809,12 +815,6 @@
809
  "value": "Predict which objects matter in the current window.",
810
  "raw_hits": []
811
  },
812
- {
813
- "name": "object_relevance: public_field_process_short_is_human_readable",
814
- "status": "pass",
815
- "value": "object vocabulary -> multi-hot labels -> sigmoid heads",
816
- "raw_hits": []
817
- },
818
  {
819
  "name": "object_relevance: known_task_family",
820
  "status": "pass",
@@ -892,15 +892,21 @@
892
  "observed": "caption_grounding"
893
  },
894
  {
895
- "name": "caption_grounding: public_field_input_short_is_human_readable",
896
  "status": "pass",
897
- "value": "text-like query and candidate windows",
898
  "raw_hits": []
899
  },
900
  {
901
- "name": "caption_grounding: public_field_card_blurb_is_human_readable",
902
  "status": "pass",
903
- "value": "Retrieve the matching time window for an annotation-derived text query.",
904
  "raw_hits": []
905
  },
906
  {
@@ -916,9 +922,9 @@
916
  "raw_hits": []
917
  },
918
  {
919
- "name": "caption_grounding: public_field_research_name_is_human_readable",
920
  "status": "pass",
921
- "value": "Language-to-Moment Grounding",
922
  "raw_hits": []
923
  },
924
  {
@@ -927,12 +933,6 @@
927
  "value": "Given a text-like query from annotation, find the matching time window.",
928
  "raw_hits": []
929
  },
930
- {
931
- "name": "caption_grounding: public_field_process_short_is_human_readable",
932
- "status": "pass",
933
- "value": "query features -> candidate index -> cosine ranker",
934
- "raw_hits": []
935
- },
936
  {
937
  "name": "caption_grounding: known_task_family",
938
  "status": "pass",
@@ -1008,15 +1008,21 @@
1008
  "observed": "cross_modal_retrieval"
1009
  },
1010
  {
1011
- "name": "cross_modal_retrieval: public_field_input_short_is_human_readable",
1012
  "status": "pass",
1013
- "value": "motion/IMU/pose query; depth/video candidates",
1014
  "raw_hits": []
1015
  },
1016
  {
1017
- "name": "cross_modal_retrieval: public_field_card_blurb_is_human_readable",
1018
  "status": "pass",
1019
- "value": "Use motion, IMU, and camera-pose signals to retrieve the matching depth/video window.",
1020
  "raw_hits": []
1021
  },
1022
  {
@@ -1032,9 +1038,9 @@
1032
  "raw_hits": []
1033
  },
1034
  {
1035
- "name": "cross_modal_retrieval: public_field_research_name_is_human_readable",
1036
  "status": "pass",
1037
- "value": "Multimodal Representation Retrieval",
1038
  "raw_hits": []
1039
  },
1040
  {
@@ -1043,12 +1049,6 @@
1043
  "value": "Use one group of modalities to retrieve the matching window from another group.",
1044
  "raw_hits": []
1045
  },
1046
- {
1047
- "name": "cross_modal_retrieval: public_field_process_short_is_human_readable",
1048
- "status": "pass",
1049
- "value": "modality split -> projection -> nearest-neighbor ranker",
1050
- "raw_hits": []
1051
- },
1052
  {
1053
  "name": "cross_modal_retrieval: known_task_family",
1054
  "status": "pass",
@@ -1126,15 +1126,21 @@
1126
  "observed": "modality_reconstruction"
1127
  },
1128
  {
1129
- "name": "modality_reconstruction: public_field_input_short_is_human_readable",
1130
  "status": "pass",
1131
- "value": "motion, IMU, and camera/pose features",
1132
  "raw_hits": []
1133
  },
1134
  {
1135
- "name": "modality_reconstruction: public_field_card_blurb_is_human_readable",
1136
  "status": "pass",
1137
- "value": "Predict compressed depth/video feature vectors from motion, IMU, and camera-pose features.",
1138
  "raw_hits": []
1139
  },
1140
  {
@@ -1150,9 +1156,9 @@
1150
  "raw_hits": []
1151
  },
1152
  {
1153
- "name": "modality_reconstruction: public_field_research_name_is_human_readable",
1154
  "status": "pass",
1155
- "value": "Modality Feature Reconstruction",
1156
  "raw_hits": []
1157
  },
1158
  {
@@ -1161,12 +1167,6 @@
1161
  "value": "Predict one modality feature block from other modality blocks.",
1162
  "raw_hits": []
1163
  },
1164
- {
1165
- "name": "modality_reconstruction: public_field_process_short_is_human_readable",
1166
- "status": "pass",
1167
- "value": "source-target split -> scaler -> regression head",
1168
- "raw_hits": []
1169
- },
1170
  {
1171
  "name": "modality_reconstruction: known_task_family",
1172
  "status": "pass",
@@ -1243,12 +1243,6 @@
1243
  "status": "pass",
1244
  "observed": "temporal_order"
1245
  },
1246
- {
1247
- "name": "temporal_order: public_field_input_short_is_human_readable",
1248
- "status": "pass",
1249
- "value": "two adjacent windows plus difference vector",
1250
- "raw_hits": []
1251
- },
1252
  {
1253
  "name": "temporal_order: public_field_card_blurb_is_human_readable",
1254
  "status": "pass",
@@ -1256,27 +1250,27 @@
1256
  "raw_hits": []
1257
  },
1258
  {
1259
- "name": "temporal_order: public_field_display_name_is_human_readable",
1260
  "status": "pass",
1261
  "value": "Temporal Order Verification",
1262
  "raw_hits": []
1263
  },
1264
  {
1265
- "name": "temporal_order: public_field_output_short_is_human_readable",
1266
  "status": "pass",
1267
- "value": "correct or reversed",
1268
  "raw_hits": []
1269
  },
1270
  {
1271
- "name": "temporal_order: public_field_research_name_is_human_readable",
1272
  "status": "pass",
1273
  "value": "Temporal Order Verification",
1274
  "raw_hits": []
1275
  },
1276
  {
1277
- "name": "temporal_order: public_field_plain_goal_is_human_readable",
1278
  "status": "pass",
1279
- "value": "Tell whether two nearby windows are in the correct time order.",
1280
  "raw_hits": []
1281
  },
1282
  {
@@ -1285,6 +1279,12 @@
1285
  "value": "pair builder -> feature combiner -> binary classifier",
1286
  "raw_hits": []
1287
  },
1288
  {
1289
  "name": "temporal_order: known_task_family",
1290
  "status": "pass",
@@ -1360,15 +1360,21 @@
1360
  "observed": "misalignment_detection"
1361
  },
1362
  {
1363
- "name": "misalignment_detection: public_field_input_short_is_human_readable",
1364
  "status": "pass",
1365
- "value": "motion-side and visual/depth-side feature groups",
1366
  "raw_hits": []
1367
  },
1368
  {
1369
- "name": "misalignment_detection: public_field_card_blurb_is_human_readable",
1370
  "status": "pass",
1371
- "value": "Detect whether motion and visual/depth streams have been artificially shifted out of sync.",
 
  "raw_hits": []
1373
  },
1374
  {
@@ -1384,9 +1390,9 @@
1384
  "raw_hits": []
1385
  },
1386
  {
1387
- "name": "misalignment_detection: public_field_research_name_is_human_readable",
1388
  "status": "pass",
1389
- "value": "Cross-Modal Misalignment Detection",
1390
  "raw_hits": []
1391
  },
1392
  {
@@ -1395,12 +1401,6 @@
1395
  "value": "Detect when modalities that should match are shifted out of sync.",
1396
  "raw_hits": []
1397
  },
1398
- {
1399
- "name": "misalignment_detection: public_field_process_short_is_human_readable",
1400
- "status": "pass",
1401
- "value": "aligned/shifted pairs -> feature combiner -> binary classifier",
1402
- "raw_hits": []
1403
- },
1404
  {
1405
  "name": "misalignment_detection: known_task_family",
1406
  "status": "pass",
 
1
  {
2
  "status": "pass",
3
+ "generated_at_utc": "2026-06-06T14:53:59+00:00",
4
  "summary": {
5
  "task_count": 12,
6
  "expected_task_count": 12,
 
64
  "observed": "timeline_action"
65
  },
66
  {
67
+ "name": "timeline_action: public_field_card_blurb_is_human_readable",
68
  "status": "pass",
69
+ "value": "Recognize the current manipulation action from synchronized visual, motion, inertial, pose, and annotation context.",
70
  "raw_hits": []
71
  },
72
  {
73
+ "name": "timeline_action: public_field_research_name_is_human_readable",
74
  "status": "pass",
75
+ "value": "Egocentric Action Recognition",
76
+ "raw_hits": []
77
+ },
78
+ {
79
+ "name": "timeline_action: public_field_input_short_is_human_readable",
80
+ "status": "pass",
81
+ "value": "20-frame multimodal window",
82
  "raw_hits": []
83
  },
84
  {
 
94
  "raw_hits": []
95
  },
96
  {
97
+ "name": "timeline_action: public_field_process_short_is_human_readable",
98
  "status": "pass",
99
+ "value": "window features -> action label builder -> classifier",
100
  "raw_hits": []
101
  },
102
  {
 
105
  "value": "Look at one short multimodal window and name what action is happening now.",
106
  "raw_hits": []
107
  },
108
  {
109
  "name": "timeline_action: known_task_family",
110
  "status": "pass",
 
184
  "observed": "timeline_subtask"
185
  },
186
  {
187
+ "name": "timeline_subtask: public_field_card_blurb_is_human_readable",
188
  "status": "pass",
189
+ "value": "Recognize the broader activity stage so fine actions become a readable procedure timeline.",
190
  "raw_hits": []
191
  },
192
  {
193
+ "name": "timeline_subtask: public_field_research_name_is_human_readable",
194
  "status": "pass",
195
+ "value": "Temporal Subtask Recognition",
196
+ "raw_hits": []
197
+ },
198
+ {
199
+ "name": "timeline_subtask: public_field_input_short_is_human_readable",
200
+ "status": "pass",
201
+ "value": "20-frame multimodal window",
202
  "raw_hits": []
203
  },
204
  {
 
214
  "raw_hits": []
215
  },
216
  {
217
+ "name": "timeline_subtask: public_field_process_short_is_human_readable",
218
  "status": "pass",
219
+ "value": "window features -> subtask label builder -> classifier",
220
  "raw_hits": []
221
  },
222
  {
 
225
  "value": "Predict the higher-level task stage for the current window.",
226
  "raw_hits": []
227
  },
228
  {
229
  "name": "timeline_subtask: known_task_family",
230
  "status": "pass",
 
304
  "observed": "transition_detection"
305
  },
306
  {
307
+ "name": "transition_detection: public_field_card_blurb_is_human_readable",
308
  "status": "pass",
309
+ "value": "Detect the local moment where the episode changes from one action segment to the next.",
310
  "raw_hits": []
311
  },
312
  {
313
+ "name": "transition_detection: public_field_research_name_is_human_readable",
314
  "status": "pass",
315
+ "value": "Temporal Action Segmentation",
316
+ "raw_hits": []
317
+ },
318
+ {
319
+ "name": "transition_detection: public_field_input_short_is_human_readable",
320
+ "status": "pass",
321
+ "value": "current window with boundary target",
322
  "raw_hits": []
323
  },
324
  {
 
334
  "raw_hits": []
335
  },
336
  {
337
+ "name": "transition_detection: public_field_process_short_is_human_readable",
338
  "status": "pass",
339
+ "value": "action changes -> boundary labels -> binary classifier",
340
  "raw_hits": []
341
  },
342
  {
 
345
  "value": "Detect whether the current window is near a boundary between actions.",
346
  "raw_hits": []
347
  },
348
  {
349
  "name": "transition_detection: known_task_family",
350
  "status": "pass",
 
422
  "observed": "next_action"
423
  },
424
  {
425
+ "name": "next_action: public_field_card_blurb_is_human_readable",
426
  "status": "pass",
427
+ "value": "Forecast the near-future action from the current observations only.",
428
  "raw_hits": []
429
  },
430
  {
431
+ "name": "next_action: public_field_research_name_is_human_readable",
432
  "status": "pass",
433
+ "value": "Short-Horizon Intention Prediction",
434
+ "raw_hits": []
435
+ },
436
+ {
437
+ "name": "next_action: public_field_input_short_is_human_readable",
438
+ "status": "pass",
439
+ "value": "current window at time t",
440
  "raw_hits": []
441
  },
442
  {
 
452
  "raw_hits": []
453
  },
454
  {
455
+ "name": "next_action: public_field_process_short_is_human_readable",
456
  "status": "pass",
457
+ "value": "current features -> future label shift -> classifier",
458
  "raw_hits": []
459
  },
460
  {
 
463
  "value": "Use the current window to guess the action that will happen shortly after it.",
464
  "raw_hits": []
465
  },
466
  {
467
  "name": "next_action: known_task_family",
468
  "status": "pass",
 
540
  "observed": "hand_trajectory_forecast"
541
  },
542
  {
543
+ "name": "hand_trajectory_forecast: public_field_card_blurb_is_human_readable",
544
  "status": "pass",
545
+ "value": "Predict the future 3D left/right hand path from the current multimodal state.",
546
  "raw_hits": []
547
  },
548
  {
549
+ "name": "hand_trajectory_forecast: public_field_research_name_is_human_readable",
550
  "status": "pass",
551
+ "value": "3D Hand Motion Forecasting",
552
+ "raw_hits": []
553
+ },
554
+ {
555
+ "name": "hand_trajectory_forecast: public_field_input_short_is_human_readable",
556
+ "status": "pass",
557
+ "value": "current multimodal window",
558
  "raw_hits": []
559
  },
560
  {
 
570
  "raw_hits": []
571
  },
572
  {
573
+ "name": "hand_trajectory_forecast: public_field_process_short_is_human_readable",
574
  "status": "pass",
575
+ "value": "current features -> future mocap target -> regression head",
576
  "raw_hits": []
577
  },
578
  {
 
581
  "value": "Predict where the hands will move over the next few frames.",
582
  "raw_hits": []
583
  },
584
  {
585
  "name": "hand_trajectory_forecast: known_task_family",
586
  "status": "pass",
 
658
  "observed": "contact_prediction"
659
  },
660
  {
661
+ "name": "contact_prediction: public_field_card_blurb_is_human_readable",
662
  "status": "pass",
663
+ "value": "Predict whether body or hand contact with the scene is occurring without leaking contact labels.",
664
  "raw_hits": []
665
  },
666
  {
667
+ "name": "contact_prediction: public_field_research_name_is_human_readable",
668
  "status": "pass",
669
+ "value": "Human-Object Contact Prediction",
670
+ "raw_hits": []
671
+ },
672
+ {
673
+ "name": "contact_prediction: public_field_input_short_is_human_readable",
674
+ "status": "pass",
675
+ "value": "non-contact, non-caption features",
676
  "raw_hits": []
677
  },
678
  {
 
688
  "raw_hits": []
689
  },
690
  {
691
+ "name": "contact_prediction: public_field_process_short_is_human_readable",
692
  "status": "pass",
693
+ "value": "feature filter -> contact target -> binary classifier",
694
  "raw_hits": []
695
  },
696
  {
 
699
  "value": "Predict whether the body or hand is in contact with something.",
700
  "raw_hits": []
701
  },
702
  {
703
  "name": "contact_prediction: known_task_family",
704
  "status": "pass",
 
774
  "observed": "object_relevance"
775
  },
776
  {
777
+ "name": "object_relevance: public_field_card_blurb_is_human_readable",
778
  "status": "pass",
779
+ "value": "Infer which objects are relevant to the current manipulation window from non-caption features.",
780
  "raw_hits": []
781
  },
782
  {
783
+ "name": "object_relevance: public_field_research_name_is_human_readable",
784
  "status": "pass",
785
+ "value": "Object-Centric Interaction Recognition",
786
+ "raw_hits": []
787
+ },
788
+ {
789
+ "name": "object_relevance: public_field_input_short_is_human_readable",
790
+ "status": "pass",
791
+ "value": "non-caption multimodal features",
792
  "raw_hits": []
793
  },
794
  {
 
804
  "raw_hits": []
805
  },
806
  {
807
+ "name": "object_relevance: public_field_process_short_is_human_readable",
808
  "status": "pass",
809
+ "value": "object vocabulary -> multi-hot labels -> sigmoid heads",
810
  "raw_hits": []
811
  },
812
  {
 
815
  "value": "Predict which objects matter in the current window.",
816
  "raw_hits": []
817
  },
818
  {
819
  "name": "object_relevance: known_task_family",
820
  "status": "pass",
 
892
  "observed": "caption_grounding"
893
  },
894
  {
895
+ "name": "caption_grounding: public_field_card_blurb_is_human_readable",
896
  "status": "pass",
897
+ "value": "Retrieve the matching time window for an annotation-derived text query.",
898
  "raw_hits": []
899
  },
900
  {
901
+ "name": "caption_grounding: public_field_research_name_is_human_readable",
902
  "status": "pass",
903
+ "value": "Language-to-Moment Grounding",
904
+ "raw_hits": []
905
+ },
906
+ {
907
+ "name": "caption_grounding: public_field_input_short_is_human_readable",
908
+ "status": "pass",
909
+ "value": "text-like query and candidate windows",
910
  "raw_hits": []
911
  },
912
  {
 
922
  "raw_hits": []
923
  },
924
  {
925
+ "name": "caption_grounding: public_field_process_short_is_human_readable",
926
  "status": "pass",
927
+ "value": "query features -> candidate index -> cosine ranker",
928
  "raw_hits": []
929
  },
930
  {
 
933
  "value": "Given a text-like query from annotation, find the matching time window.",
934
  "raw_hits": []
935
  },
936
  {
937
  "name": "caption_grounding: known_task_family",
938
  "status": "pass",
 
1008
  "observed": "cross_modal_retrieval"
1009
  },
1010
  {
1011
+ "name": "cross_modal_retrieval: public_field_card_blurb_is_human_readable",
1012
  "status": "pass",
1013
+ "value": "Use motion, IMU, and camera-pose signals to retrieve the matching depth/video window.",
1014
  "raw_hits": []
1015
  },
1016
  {
1017
+ "name": "cross_modal_retrieval: public_field_research_name_is_human_readable",
1018
  "status": "pass",
1019
+ "value": "Multimodal Representation Retrieval",
1020
+ "raw_hits": []
1021
+ },
1022
+ {
1023
+ "name": "cross_modal_retrieval: public_field_input_short_is_human_readable",
1024
+ "status": "pass",
1025
+ "value": "motion/IMU/pose query; depth/video candidates",
1026
  "raw_hits": []
1027
  },
1028
  {
 
1038
  "raw_hits": []
1039
  },
1040
  {
1041
+ "name": "cross_modal_retrieval: public_field_process_short_is_human_readable",
1042
  "status": "pass",
1043
+ "value": "modality split -> projection -> nearest-neighbor ranker",
1044
  "raw_hits": []
1045
  },
1046
  {
 
1049
  "value": "Use one group of modalities to retrieve the matching window from another group.",
1050
  "raw_hits": []
1051
  },
1052
  {
1053
  "name": "cross_modal_retrieval: known_task_family",
1054
  "status": "pass",
 
1126
  "observed": "modality_reconstruction"
1127
  },
1128
  {
1129
+ "name": "modality_reconstruction: public_field_card_blurb_is_human_readable",
1130
  "status": "pass",
1131
+ "value": "Predict compressed depth/video feature vectors from motion, IMU, and camera-pose features.",
1132
  "raw_hits": []
1133
  },
1134
  {
1135
+ "name": "modality_reconstruction: public_field_research_name_is_human_readable",
1136
  "status": "pass",
1137
+ "value": "Modality Feature Reconstruction",
1138
+ "raw_hits": []
1139
+ },
1140
+ {
1141
+ "name": "modality_reconstruction: public_field_input_short_is_human_readable",
1142
+ "status": "pass",
1143
+ "value": "motion, IMU, and camera/pose features",
1144
  "raw_hits": []
1145
  },
1146
  {
 
1156
  "raw_hits": []
1157
  },
1158
  {
1159
+ "name": "modality_reconstruction: public_field_process_short_is_human_readable",
1160
  "status": "pass",
1161
+ "value": "source-target split -> scaler -> regression head",
1162
  "raw_hits": []
1163
  },
1164
  {
 
1167
  "value": "Predict one modality feature block from other modality blocks.",
1168
  "raw_hits": []
1169
  },
1170
  {
1171
  "name": "modality_reconstruction: known_task_family",
1172
  "status": "pass",
 
1243
  "status": "pass",
1244
  "observed": "temporal_order"
1245
  },
1246
  {
1247
  "name": "temporal_order: public_field_card_blurb_is_human_readable",
1248
  "status": "pass",
 
1250
  "raw_hits": []
1251
  },
1252
  {
1253
+ "name": "temporal_order: public_field_research_name_is_human_readable",
1254
  "status": "pass",
1255
  "value": "Temporal Order Verification",
1256
  "raw_hits": []
1257
  },
1258
  {
1259
+ "name": "temporal_order: public_field_input_short_is_human_readable",
1260
  "status": "pass",
1261
+ "value": "two adjacent windows plus difference vector",
1262
  "raw_hits": []
1263
  },
1264
  {
1265
+ "name": "temporal_order: public_field_display_name_is_human_readable",
1266
  "status": "pass",
1267
  "value": "Temporal Order Verification",
1268
  "raw_hits": []
1269
  },
1270
  {
1271
+ "name": "temporal_order: public_field_output_short_is_human_readable",
1272
  "status": "pass",
1273
+ "value": "correct or reversed",
1274
  "raw_hits": []
1275
  },
1276
  {
 
1279
  "value": "pair builder -> feature combiner -> binary classifier",
1280
  "raw_hits": []
1281
  },
1282
+ {
1283
+ "name": "temporal_order: public_field_plain_goal_is_human_readable",
1284
+ "status": "pass",
1285
+ "value": "Tell whether two nearby windows are in the correct time order.",
1286
+ "raw_hits": []
1287
+ },
1288
  {
1289
  "name": "temporal_order: known_task_family",
1290
  "status": "pass",
 
1360
  "observed": "misalignment_detection"
1361
  },
1362
  {
1363
+ "name": "misalignment_detection: public_field_card_blurb_is_human_readable",
1364
  "status": "pass",
1365
+ "value": "Detect whether motion and visual/depth streams have been artificially shifted out of sync.",
1366
  "raw_hits": []
1367
  },
1368
  {
1369
+ "name": "misalignment_detection: public_field_research_name_is_human_readable",
1370
  "status": "pass",
1371
+ "value": "Cross-Modal Misalignment Detection",
1372
+ "raw_hits": []
1373
+ },
1374
+ {
1375
+ "name": "misalignment_detection: public_field_input_short_is_human_readable",
1376
+ "status": "pass",
1377
+ "value": "motion-side and visual/depth-side feature groups",
1378
  "raw_hits": []
1379
  },
1380
  {
 
1390
  "raw_hits": []
1391
  },
1392
  {
1393
+ "name": "misalignment_detection: public_field_process_short_is_human_readable",
1394
  "status": "pass",
1395
+ "value": "aligned/shifted pairs -> feature combiner -> binary classifier",
1396
  "raw_hits": []
1397
  },
1398
  {
 
1401
  "value": "Detect when modalities that should match are shifted out of sync.",
1402
  "raw_hits": []
1403
  },
1404
  {
1405
  "name": "misalignment_detection: known_task_family",
1406
  "status": "pass",
data/website_integrity.json CHANGED
@@ -1,6 +1,6 @@
1
  {
2
  "status": "pass",
3
- "generated_at_utc": "2026-06-06T14:36:10+00:00",
4
  "docs_root": "docs",
5
  "site_base": "/ropedia-xperience-10m-task-suite/",
6
  "summary": {
@@ -251,7 +251,7 @@
251
  },
252
  {
253
  "path": "data/artifact_index.json",
254
- "bytes": 37736,
255
  "top_level_type": "dict"
256
  },
257
  {
@@ -291,7 +291,7 @@
291
  },
292
  {
293
  "path": "data/mirror_parity.json",
294
- "bytes": 111950,
295
  "top_level_type": "dict"
296
  },
297
  {
@@ -301,7 +301,7 @@
301
  },
302
  {
303
  "path": "data/omni_finetune_verified_result.json",
304
- "bytes": 3145,
305
  "top_level_type": "dict"
306
  },
307
  {
@@ -321,7 +321,7 @@
321
  },
322
  {
323
  "path": "data/project_status.json",
324
- "bytes": 10977,
325
  "top_level_type": "dict"
326
  },
327
  {
 
1
  {
2
  "status": "pass",
3
+ "generated_at_utc": "2026-06-06T14:54:01+00:00",
4
  "docs_root": "docs",
5
  "site_base": "/ropedia-xperience-10m-task-suite/",
6
  "summary": {
 
251
  },
252
  {
253
  "path": "data/artifact_index.json",
254
+ "bytes": 39486,
255
  "top_level_type": "dict"
256
  },
257
  {
 
291
  },
292
  {
293
  "path": "data/mirror_parity.json",
294
+ "bytes": 126335,
295
  "top_level_type": "dict"
296
  },
297
  {
 
301
  },
302
  {
303
  "path": "data/omni_finetune_verified_result.json",
304
+ "bytes": 4142,
305
  "top_level_type": "dict"
306
  },
307
  {
 
321
  },
322
  {
323
  "path": "data/project_status.json",
324
+ "bytes": 11274,
325
  "top_level_type": "dict"
326
  },
327
  {
docs/data/artifact_index.json CHANGED
@@ -1,12 +1,12 @@
1
  {
2
  "title": "Ropedia Xperience-10M Task Suite Artifact Index",
3
- "generated_at_utc": "2026-06-06T14:35:42+00:00",
4
  "status": "pass",
5
- "artifact_count": 83,
6
  "missing": [],
7
  "by_kind": {
8
  "project_path": 14,
9
- "scaleup_contract": 6,
10
  "project_scope": 1,
11
  "source_alignment": 5,
12
  "publication_workflow": 3,
@@ -28,7 +28,7 @@
28
  "onboarding_doc": 1,
29
  "generated_figure": 3,
30
  "generated_figure_assets": 1,
31
- "scaleup_status": 2,
32
  "citation": 1,
33
  "license": 1
34
  },
@@ -63,8 +63,8 @@
63
  "surface": "repo_hf",
64
  "shows": "Gives a compact current-state table for first-pass readers.",
65
  "exists": true,
66
- "bytes": 8534,
67
- "sha256": "5eb48d489da7f005baab233a94c9d6b209eb1e9ffdb138c8e0e600ece9239a29"
68
  },
69
  {
70
  "id": "project_status_json",
@@ -74,8 +74,8 @@
74
  "surface": "website_hf",
75
  "shows": "Machine-readable copy of the current project status for website and HF mirrors.",
76
  "exists": true,
77
- "bytes": 10977,
78
- "sha256": "2bb0639c137dfd6eddd337eb909292543ae2e72753dee398f8240ff35f6a3984"
79
  },
80
  {
81
  "id": "research_roadmap",
@@ -187,6 +187,17 @@
187
  "bytes": 6519,
188
  "sha256": "a3773fc681e298325e2be80556d6be6e7e30b90ba22ee24b66633f07ff9c4ea4"
189
  },
190
  {
191
  "id": "additional_development_directions",
192
  "title": "Additional development directions",
@@ -250,8 +261,8 @@
250
  "surface": "repo_hf",
251
  "shows": "Gives the human-readable map from project scope to data, tasks, platform mirrors, and scale-up status.",
252
  "exists": true,
253
- "bytes": 15660,
254
- "sha256": "a9ad335b82c35a5ac102428663ffae1c8798e90e45cc5e795c3a499b4563b417"
255
  },
256
  {
257
  "id": "official_dataset_card_alignment",
@@ -695,8 +706,8 @@
695
  "surface": "repo_hf",
696
  "shows": "Generates the selective artifact catalog from local files.",
697
  "exists": true,
698
- "bytes": 30785,
699
- "sha256": "0c42b68e44e6a32b6b5161b47161adc5ccdb57567e1462e8271ea87af50ab92d"
700
  },
701
  {
702
  "id": "publication_audit",
@@ -731,7 +742,7 @@
731
  "volatile": true,
732
  "shows": "Confirms prepared GitHub/HF Space/artifact/model mirrors share the same critical data, figure, website HTML, and validator files.",
733
  "exists": true,
734
- "bytes": 111950,
735
  "hash_policy": "existence_and_size_only"
736
  },
737
  {
@@ -933,6 +944,28 @@
933
  "bytes": 3076,
934
  "sha256": "23b87581cfc1d95b0af118a0dbb4e601f42fc6bad608759490e13a9a1ef73205"
935
  },
936
  {
937
  "id": "citation",
938
  "title": "Citation metadata",
 
1
  {
2
  "title": "Ropedia Xperience-10M Task Suite Artifact Index",
3
+ "generated_at_utc": "2026-06-06T14:53:45+00:00",
4
  "status": "pass",
5
+ "artifact_count": 86,
6
  "missing": [],
7
  "by_kind": {
8
  "project_path": 14,
9
+ "scaleup_contract": 7,
10
  "project_scope": 1,
11
  "source_alignment": 5,
12
  "publication_workflow": 3,
 
28
  "onboarding_doc": 1,
29
  "generated_figure": 3,
30
  "generated_figure_assets": 1,
31
+ "scaleup_status": 4,
32
  "citation": 1,
33
  "license": 1
34
  },
 
63
  "surface": "repo_hf",
64
  "shows": "Gives a compact current-state table for first-pass readers.",
65
  "exists": true,
66
+ "bytes": 8805,
67
+ "sha256": "4051b78674306078880de33a144a499144b2487b11455c70a364a94cefa035a7"
68
  },
69
  {
70
  "id": "project_status_json",
 
74
  "surface": "website_hf",
75
  "shows": "Machine-readable copy of the current project status for website and HF mirrors.",
76
  "exists": true,
77
+ "bytes": 11274,
78
+ "sha256": "ae2b2c520ab1e0553fa399439345edd87832fa5293d8c27ffe610ede5bfa1067"
79
  },
80
  {
81
  "id": "research_roadmap",
 
187
  "bytes": 6519,
188
  "sha256": "a3773fc681e298325e2be80556d6be6e7e30b90ba22ee24b66633f07ff9c4ea4"
189
  },
190
+ {
191
+ "id": "qwen3_omni_error_analysis_script",
192
+ "title": "Qwen3-Omni held-out error-analysis script",
193
+ "path": "scripts/omni/analyze_qwen3_omni_errors.py",
194
+ "kind": "scaleup_contract",
195
+ "surface": "repo_hf",
196
+ "shows": "Computes public-safe held-out error-analysis tables by episode, action family, train-seen status, required-modality state, and object category.",
197
+ "exists": true,
198
+ "bytes": 15676,
199
+ "sha256": "d4c7e46d9fbd5f9d84bc32374f457fd8c9d68c8faa39c77bc45770eb95d80337"
200
+ },
201
  {
202
  "id": "additional_development_directions",
203
  "title": "Additional development directions",
 
261
  "surface": "repo_hf",
262
  "shows": "Gives the human-readable map from project scope to data, tasks, platform mirrors, and scale-up status.",
263
  "exists": true,
264
+ "bytes": 16318,
265
+ "sha256": "cda5f4b5be4b7a2d26aff6ed7f930bfba13dfc463d533a9880193c0a0611b677"
266
  },
267
  {
268
  "id": "official_dataset_card_alignment",
 
706
  "surface": "repo_hf",
707
  "shows": "Generates the selective artifact catalog from local files.",
708
  "exists": true,
709
+ "bytes": 32191,
710
+ "sha256": "4a105c732d2f6c54a78333d7f47e0139325ba638027e34e6acd929a90626b8e0"
711
  },
712
  {
713
  "id": "publication_audit",
 
742
  "volatile": true,
743
  "shows": "Confirms prepared GitHub/HF Space/artifact/model mirrors share the same critical data, figure, website HTML, and validator files.",
744
  "exists": true,
745
+ "bytes": 126335,
746
  "hash_policy": "existence_and_size_only"
747
  },
748
  {
 
944
  "bytes": 3076,
945
  "sha256": "23b87581cfc1d95b0af118a0dbb4e601f42fc6bad608759490e13a9a1ef73205"
946
  },
947
+ {
948
+ "id": "qwen3_omni_error_analysis_report",
949
+ "title": "Qwen3-Omni held-out error-analysis report",
950
+ "path": "results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/ERROR_ANALYSIS.md",
951
+ "kind": "scaleup_status",
952
+ "surface": "repo_hf",
953
+ "shows": "Summarizes validation-aware Qwen3-Omni held-out failures by episode, action family, train-seen status, required-modality state, and object category.",
954
+ "exists": true,
955
+ "bytes": 3331,
956
+ "sha256": "063fcc2ebd7b57ab5b281fd5e8edc629da4e1f4e5a708483ba27375d02af9467"
957
+ },
958
+ {
959
+ "id": "qwen3_omni_error_analysis_json",
960
+ "title": "Qwen3-Omni held-out error-analysis JSON",
961
+ "path": "results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/error_analysis_summary.json",
962
+ "kind": "scaleup_status",
963
+ "surface": "repo_hf",
964
+ "shows": "Machine-readable Qwen3-Omni held-out error analysis with grouped metrics and sanitized failure examples.",
965
+ "exists": true,
966
+ "bytes": 25202,
967
+ "sha256": "c2e4eaa686f5d9739a8d0bfd8ae51a453b94019489ed84a154e2bce2fa316ff5"
968
+ },
969
  {
970
  "id": "citation",
971
  "title": "Citation metadata",
docs/data/mirror_parity.json CHANGED
@@ -1,9 +1,9 @@
1
  {
2
  "status": "pass",
3
- "generated_at_utc": "2026-06-06T14:37:36+00:00",
4
  "hf_root": "hf_publish",
5
  "summary": {
6
- "group_count": 104,
7
  "failure_count": 0,
8
  "failures_by_surface": {}
9
  },
@@ -102,27 +102,27 @@
102
  "local": {
103
  "path": "repo:docs/data/artifact_index.json",
104
  "exists": true,
105
- "bytes": 37736,
106
- "sha256": "f1d87cbabab02227b834ad333507af31a8ce309600f0e0427bb8cb59a26c3b71"
107
  },
108
  "mirrors": {
109
  "hf_space": {
110
  "path": "hf_space:data/artifact_index.json",
111
  "exists": true,
112
- "bytes": 37736,
113
- "sha256": "f1d87cbabab02227b834ad333507af31a8ce309600f0e0427bb8cb59a26c3b71"
114
  },
115
  "hf_artifacts": {
116
  "path": "hf_artifacts:docs/data/artifact_index.json",
117
  "exists": true,
118
- "bytes": 37736,
119
- "sha256": "f1d87cbabab02227b834ad333507af31a8ce309600f0e0427bb8cb59a26c3b71"
120
  },
121
  "hf_model": {
122
  "path": "hf_model:metrics/artifact_index.json",
123
  "exists": true,
124
- "bytes": 37736,
125
- "sha256": "f1d87cbabab02227b834ad333507af31a8ce309600f0e0427bb8cb59a26c3b71"
126
  }
127
  },
128
  "failures": []
@@ -350,27 +350,27 @@
350
  "local": {
351
  "path": "repo:docs/data/omni_finetune_verified_result.json",
352
  "exists": true,
353
- "bytes": 3145,
354
- "sha256": "37b001a24201ba56b327fa89f19792d64ebcdabc1faffa7e7bb4fd6b8323731a"
355
  },
356
  "mirrors": {
357
  "hf_space": {
358
  "path": "hf_space:data/omni_finetune_verified_result.json",
359
  "exists": true,
360
- "bytes": 3145,
361
- "sha256": "37b001a24201ba56b327fa89f19792d64ebcdabc1faffa7e7bb4fd6b8323731a"
362
  },
363
  "hf_artifacts": {
364
  "path": "hf_artifacts:docs/data/omni_finetune_verified_result.json",
365
  "exists": true,
366
- "bytes": 3145,
367
- "sha256": "37b001a24201ba56b327fa89f19792d64ebcdabc1faffa7e7bb4fd6b8323731a"
368
  },
369
  "hf_model": {
370
  "path": "hf_model:metrics/omni_finetune_verified_result.json",
371
  "exists": true,
372
- "bytes": 3145,
373
- "sha256": "37b001a24201ba56b327fa89f19792d64ebcdabc1faffa7e7bb4fd6b8323731a"
374
  }
375
  },
376
  "failures": []
@@ -474,27 +474,27 @@
474
  "local": {
475
  "path": "repo:docs/data/project_status.json",
476
  "exists": true,
477
- "bytes": 10977,
478
- "sha256": "2bb0639c137dfd6eddd337eb909292543ae2e72753dee398f8240ff35f6a3984"
479
  },
480
  "mirrors": {
481
  "hf_space": {
482
  "path": "hf_space:data/project_status.json",
483
  "exists": true,
484
- "bytes": 10977,
485
- "sha256": "2bb0639c137dfd6eddd337eb909292543ae2e72753dee398f8240ff35f6a3984"
486
  },
487
  "hf_artifacts": {
488
  "path": "hf_artifacts:docs/data/project_status.json",
489
  "exists": true,
490
- "bytes": 10977,
491
- "sha256": "2bb0639c137dfd6eddd337eb909292543ae2e72753dee398f8240ff35f6a3984"
492
  },
493
  "hf_model": {
494
  "path": "hf_model:metrics/project_status.json",
495
  "exists": true,
496
- "bytes": 10977,
497
- "sha256": "2bb0639c137dfd6eddd337eb909292543ae2e72753dee398f8240ff35f6a3984"
498
  }
499
  },
500
  "failures": []
@@ -506,26 +506,26 @@
506
  "path": "repo:docs/data/publication_audit.json",
507
  "exists": true,
508
  "bytes": 7237,
509
- "sha256": "bfdfb04abf62dfb3ffa596f1d9ec58fc5bac633f6c1cfb1710d3988ef635cf03"
510
  },
511
  "mirrors": {
512
  "hf_space": {
513
  "path": "hf_space:data/publication_audit.json",
514
  "exists": true,
515
  "bytes": 7237,
516
- "sha256": "bfdfb04abf62dfb3ffa596f1d9ec58fc5bac633f6c1cfb1710d3988ef635cf03"
517
  },
518
  "hf_artifacts": {
519
  "path": "hf_artifacts:docs/data/publication_audit.json",
520
  "exists": true,
521
  "bytes": 7237,
522
- "sha256": "bfdfb04abf62dfb3ffa596f1d9ec58fc5bac633f6c1cfb1710d3988ef635cf03"
523
  },
524
  "hf_model": {
525
  "path": "hf_model:metrics/publication_audit.json",
526
  "exists": true,
527
  "bytes": 7237,
528
- "sha256": "bfdfb04abf62dfb3ffa596f1d9ec58fc5bac633f6c1cfb1710d3988ef635cf03"
529
  }
530
  },
531
  "failures": []
@@ -816,26 +816,26 @@
816
  "path": "repo:docs/data/scope_claims_audit.json",
817
  "exists": true,
818
  "bytes": 20823,
819
- "sha256": "7f01728415c9c54126eab25f2ce68e563b455f02d2bf10af514463c33bc0091e"
820
  },
821
  "mirrors": {
822
  "hf_space": {
823
  "path": "hf_space:data/scope_claims_audit.json",
824
  "exists": true,
825
  "bytes": 20823,
826
- "sha256": "7f01728415c9c54126eab25f2ce68e563b455f02d2bf10af514463c33bc0091e"
827
  },
828
  "hf_artifacts": {
829
  "path": "hf_artifacts:docs/data/scope_claims_audit.json",
830
  "exists": true,
831
  "bytes": 20823,
832
- "sha256": "7f01728415c9c54126eab25f2ce68e563b455f02d2bf10af514463c33bc0091e"
833
  },
834
  "hf_model": {
835
  "path": "hf_model:metrics/scope_claims_audit.json",
836
  "exists": true,
837
  "bytes": 20823,
838
- "sha256": "7f01728415c9c54126eab25f2ce68e563b455f02d2bf10af514463c33bc0091e"
839
  }
840
  },
841
  "failures": []
@@ -940,26 +940,26 @@
940
  "path": "repo:docs/data/task_surface_integrity.json",
941
  "exists": true,
942
  "bytes": 45779,
943
- "sha256": "1ae426aea9895c32912b2c9a0e519a55912222493d3c1d72e4785d71cd3b71cb"
944
  },
945
  "mirrors": {
946
  "hf_space": {
947
  "path": "hf_space:data/task_surface_integrity.json",
948
  "exists": true,
949
  "bytes": 45779,
950
- "sha256": "1ae426aea9895c32912b2c9a0e519a55912222493d3c1d72e4785d71cd3b71cb"
951
  },
952
  "hf_artifacts": {
953
  "path": "hf_artifacts:docs/data/task_surface_integrity.json",
954
  "exists": true,
955
  "bytes": 45779,
956
- "sha256": "1ae426aea9895c32912b2c9a0e519a55912222493d3c1d72e4785d71cd3b71cb"
957
  },
958
  "hf_model": {
959
  "path": "hf_model:metrics/task_surface_integrity.json",
960
  "exists": true,
961
  "bytes": 45779,
962
- "sha256": "1ae426aea9895c32912b2c9a0e519a55912222493d3c1d72e4785d71cd3b71cb"
963
  }
964
  },
965
  "failures": []
@@ -1002,26 +1002,26 @@
1002
  "path": "repo:docs/data/website_integrity.json",
1003
  "exists": true,
1004
  "bytes": 15221,
1005
- "sha256": "08f9429aead121834f52fb108a35ff0933435d49064650b94b7ed84c1002182b"
1006
  },
1007
  "mirrors": {
1008
  "hf_space": {
1009
  "path": "hf_space:data/website_integrity.json",
1010
  "exists": true,
1011
  "bytes": 15221,
1012
- "sha256": "08f9429aead121834f52fb108a35ff0933435d49064650b94b7ed84c1002182b"
1013
  },
1014
  "hf_artifacts": {
1015
  "path": "hf_artifacts:docs/data/website_integrity.json",
1016
  "exists": true,
1017
  "bytes": 15221,
1018
- "sha256": "08f9429aead121834f52fb108a35ff0933435d49064650b94b7ed84c1002182b"
1019
  },
1020
  "hf_model": {
1021
  "path": "hf_model:metrics/website_integrity.json",
1022
  "exists": true,
1023
  "bytes": 15221,
1024
- "sha256": "08f9429aead121834f52fb108a35ff0933435d49064650b94b7ed84c1002182b"
1025
  }
1026
  },
1027
  "failures": []
@@ -1723,6 +1723,31 @@
1723
  },
1724
  "failures": []
1725
  },
1726
  {
1727
  "name": "scripts/audio_ablation_and_raw_upgrade.py",
1728
  "status": "pass",
@@ -1754,21 +1779,21 @@
1754
  "local": {
1755
  "path": "repo:scripts/build_artifact_index.py",
1756
  "exists": true,
1757
- "bytes": 30785,
1758
- "sha256": "0c42b68e44e6a32b6b5161b47161adc5ccdb57567e1462e8271ea87af50ab92d"
1759
  },
1760
  "mirrors": {
1761
  "hf_artifacts": {
1762
  "path": "hf_artifacts:scripts/build_artifact_index.py",
1763
  "exists": true,
1764
- "bytes": 30785,
1765
- "sha256": "0c42b68e44e6a32b6b5161b47161adc5ccdb57567e1462e8271ea87af50ab92d"
1766
  },
1767
  "hf_model": {
1768
  "path": "hf_model:scripts/build_artifact_index.py",
1769
  "exists": true,
1770
- "bytes": 30785,
1771
- "sha256": "0c42b68e44e6a32b6b5161b47161adc5ccdb57567e1462e8271ea87af50ab92d"
1772
  }
1773
  },
1774
  "failures": []
@@ -2054,21 +2079,21 @@
2054
  "local": {
2055
  "path": "repo:scripts/validate_mirror_parity.py",
2056
  "exists": true,
2057
- "bytes": 12642,
2058
- "sha256": "17420a261d1327c0a8acb79adb75fc15217f117216eb74acf0cab3fa36de856c"
2059
  },
2060
  "mirrors": {
2061
  "hf_artifacts": {
2062
  "path": "hf_artifacts:scripts/validate_mirror_parity.py",
2063
  "exists": true,
2064
- "bytes": 12642,
2065
- "sha256": "17420a261d1327c0a8acb79adb75fc15217f117216eb74acf0cab3fa36de856c"
2066
  },
2067
  "hf_model": {
2068
  "path": "hf_model:scripts/validate_mirror_parity.py",
2069
  "exists": true,
2070
- "bytes": 12642,
2071
- "sha256": "17420a261d1327c0a8acb79adb75fc15217f117216eb74acf0cab3fa36de856c"
2072
  }
2073
  },
2074
  "failures": []
@@ -2807,6 +2832,285 @@
2807
  },
2808
  "failures": []
2809
  },
2810
  {
2811
  "name": "docs/QUALITY_GATES.md",
2812
  "status": "pass",
@@ -3061,27 +3365,27 @@
3061
  "local": {
3062
  "path": "repo:PROJECT_STATUS.md",
3063
  "exists": true,
3064
- "bytes": 8534,
3065
- "sha256": "5eb48d489da7f005baab233a94c9d6b209eb1e9ffdb138c8e0e600ece9239a29"
3066
  },
3067
  "mirrors": {
3068
  "hf_space": {
3069
  "path": "hf_space:PROJECT_STATUS.md",
3070
  "exists": true,
3071
- "bytes": 8534,
3072
- "sha256": "5eb48d489da7f005baab233a94c9d6b209eb1e9ffdb138c8e0e600ece9239a29"
3073
  },
3074
  "hf_artifacts": {
3075
  "path": "hf_artifacts:PROJECT_STATUS.md",
3076
  "exists": true,
3077
- "bytes": 8534,
3078
- "sha256": "5eb48d489da7f005baab233a94c9d6b209eb1e9ffdb138c8e0e600ece9239a29"
3079
  },
3080
  "hf_model": {
3081
  "path": "hf_model:PROJECT_STATUS.md",
3082
  "exists": true,
3083
- "bytes": 8534,
3084
- "sha256": "5eb48d489da7f005baab233a94c9d6b209eb1e9ffdb138c8e0e600ece9239a29"
3085
  }
3086
  },
3087
  "failures": []
 
1
  {
2
  "status": "pass",
3
+ "generated_at_utc": "2026-06-06T14:56:44+00:00",
4
  "hf_root": "hf_publish",
5
  "summary": {
6
+ "group_count": 114,
7
  "failure_count": 0,
8
  "failures_by_surface": {}
9
  },
 
102
  "local": {
103
  "path": "repo:docs/data/artifact_index.json",
104
  "exists": true,
105
+ "bytes": 39486,
106
+ "sha256": "87782cd08bc1106d694a727e21333450d2965b48c48f500d1b6f4294d7b247d0"
107
  },
108
  "mirrors": {
109
  "hf_space": {
110
  "path": "hf_space:data/artifact_index.json",
111
  "exists": true,
112
+ "bytes": 39486,
113
+ "sha256": "87782cd08bc1106d694a727e21333450d2965b48c48f500d1b6f4294d7b247d0"
114
  },
115
  "hf_artifacts": {
116
  "path": "hf_artifacts:docs/data/artifact_index.json",
117
  "exists": true,
118
+ "bytes": 39486,
119
+ "sha256": "87782cd08bc1106d694a727e21333450d2965b48c48f500d1b6f4294d7b247d0"
120
  },
121
  "hf_model": {
122
  "path": "hf_model:metrics/artifact_index.json",
123
  "exists": true,
124
+ "bytes": 39486,
125
+ "sha256": "87782cd08bc1106d694a727e21333450d2965b48c48f500d1b6f4294d7b247d0"
126
  }
127
  },
128
  "failures": []
 
350
  "local": {
351
  "path": "repo:docs/data/omni_finetune_verified_result.json",
352
  "exists": true,
353
+ "bytes": 4142,
354
+ "sha256": "297aa6fc86bc09ba7968f3c5c2db265320c0613c5ec9a36701114ba451321b81"
355
  },
356
  "mirrors": {
357
  "hf_space": {
358
  "path": "hf_space:data/omni_finetune_verified_result.json",
359
  "exists": true,
360
+ "bytes": 4142,
361
+ "sha256": "297aa6fc86bc09ba7968f3c5c2db265320c0613c5ec9a36701114ba451321b81"
362
  },
363
  "hf_artifacts": {
364
  "path": "hf_artifacts:docs/data/omni_finetune_verified_result.json",
365
  "exists": true,
366
+ "bytes": 4142,
367
+ "sha256": "297aa6fc86bc09ba7968f3c5c2db265320c0613c5ec9a36701114ba451321b81"
368
  },
369
  "hf_model": {
370
  "path": "hf_model:metrics/omni_finetune_verified_result.json",
371
  "exists": true,
372
+ "bytes": 4142,
373
+ "sha256": "297aa6fc86bc09ba7968f3c5c2db265320c0613c5ec9a36701114ba451321b81"
374
  }
375
  },
376
  "failures": []
 
474
  "local": {
475
  "path": "repo:docs/data/project_status.json",
476
  "exists": true,
477
+ "bytes": 11274,
478
+ "sha256": "ae2b2c520ab1e0553fa399439345edd87832fa5293d8c27ffe610ede5bfa1067"
479
  },
480
  "mirrors": {
481
  "hf_space": {
482
  "path": "hf_space:data/project_status.json",
483
  "exists": true,
484
+ "bytes": 11274,
485
+ "sha256": "ae2b2c520ab1e0553fa399439345edd87832fa5293d8c27ffe610ede5bfa1067"
486
  },
487
  "hf_artifacts": {
488
  "path": "hf_artifacts:docs/data/project_status.json",
489
  "exists": true,
490
+ "bytes": 11274,
491
+ "sha256": "ae2b2c520ab1e0553fa399439345edd87832fa5293d8c27ffe610ede5bfa1067"
492
  },
493
  "hf_model": {
494
  "path": "hf_model:metrics/project_status.json",
495
  "exists": true,
496
+ "bytes": 11274,
497
+ "sha256": "ae2b2c520ab1e0553fa399439345edd87832fa5293d8c27ffe610ede5bfa1067"
498
  }
499
  },
500
  "failures": []
 
506
  "path": "repo:docs/data/publication_audit.json",
507
  "exists": true,
508
  "bytes": 7237,
509
+ "sha256": "8a21c29d92f3a15b835c37d7784c17fada3edbda050515deed8e440535ed046d"
510
  },
511
  "mirrors": {
512
  "hf_space": {
513
  "path": "hf_space:data/publication_audit.json",
514
  "exists": true,
515
  "bytes": 7237,
516
+ "sha256": "8a21c29d92f3a15b835c37d7784c17fada3edbda050515deed8e440535ed046d"
517
  },
518
  "hf_artifacts": {
519
  "path": "hf_artifacts:docs/data/publication_audit.json",
520
  "exists": true,
521
  "bytes": 7237,
522
+ "sha256": "8a21c29d92f3a15b835c37d7784c17fada3edbda050515deed8e440535ed046d"
523
  },
524
  "hf_model": {
525
  "path": "hf_model:metrics/publication_audit.json",
526
  "exists": true,
527
  "bytes": 7237,
528
+ "sha256": "8a21c29d92f3a15b835c37d7784c17fada3edbda050515deed8e440535ed046d"
529
  }
530
  },
531
  "failures": []
 
816
  "path": "repo:docs/data/scope_claims_audit.json",
817
  "exists": true,
818
  "bytes": 20823,
819
+ "sha256": "77402dc77c4ecf5cf1e68480ae2c9822a134ae7ef4a24a7b8b9008a2509c2fa3"
820
  },
821
  "mirrors": {
822
  "hf_space": {
823
  "path": "hf_space:data/scope_claims_audit.json",
824
  "exists": true,
825
  "bytes": 20823,
826
+ "sha256": "77402dc77c4ecf5cf1e68480ae2c9822a134ae7ef4a24a7b8b9008a2509c2fa3"
827
  },
828
  "hf_artifacts": {
829
  "path": "hf_artifacts:docs/data/scope_claims_audit.json",
830
  "exists": true,
831
  "bytes": 20823,
832
+ "sha256": "77402dc77c4ecf5cf1e68480ae2c9822a134ae7ef4a24a7b8b9008a2509c2fa3"
833
  },
834
  "hf_model": {
835
  "path": "hf_model:metrics/scope_claims_audit.json",
836
  "exists": true,
837
  "bytes": 20823,
838
+ "sha256": "77402dc77c4ecf5cf1e68480ae2c9822a134ae7ef4a24a7b8b9008a2509c2fa3"
839
  }
840
  },
841
  "failures": []
 
940
  "path": "repo:docs/data/task_surface_integrity.json",
941
  "exists": true,
942
  "bytes": 45779,
943
+ "sha256": "8232e2bafa8b5157d97c018e41be5da3ec69ddb4d2020a0dcc7c6377c5575bb6"
944
  },
945
  "mirrors": {
946
  "hf_space": {
947
  "path": "hf_space:data/task_surface_integrity.json",
948
  "exists": true,
949
  "bytes": 45779,
950
+ "sha256": "8232e2bafa8b5157d97c018e41be5da3ec69ddb4d2020a0dcc7c6377c5575bb6"
951
  },
952
  "hf_artifacts": {
953
  "path": "hf_artifacts:docs/data/task_surface_integrity.json",
954
  "exists": true,
955
  "bytes": 45779,
956
+ "sha256": "8232e2bafa8b5157d97c018e41be5da3ec69ddb4d2020a0dcc7c6377c5575bb6"
957
  },
958
  "hf_model": {
959
  "path": "hf_model:metrics/task_surface_integrity.json",
960
  "exists": true,
961
  "bytes": 45779,
962
+ "sha256": "8232e2bafa8b5157d97c018e41be5da3ec69ddb4d2020a0dcc7c6377c5575bb6"
963
  }
964
  },
965
  "failures": []
 
1002
  "path": "repo:docs/data/website_integrity.json",
1003
  "exists": true,
1004
  "bytes": 15221,
1005
+ "sha256": "dcbd09b4c4522770c43504c500eb653de706538516ee2ec72e491ffc3416c6e2"
1006
  },
1007
  "mirrors": {
1008
  "hf_space": {
1009
  "path": "hf_space:data/website_integrity.json",
1010
  "exists": true,
1011
  "bytes": 15221,
1012
+ "sha256": "dcbd09b4c4522770c43504c500eb653de706538516ee2ec72e491ffc3416c6e2"
1013
  },
1014
  "hf_artifacts": {
1015
  "path": "hf_artifacts:docs/data/website_integrity.json",
1016
  "exists": true,
1017
  "bytes": 15221,
1018
+ "sha256": "dcbd09b4c4522770c43504c500eb653de706538516ee2ec72e491ffc3416c6e2"
1019
  },
1020
  "hf_model": {
1021
  "path": "hf_model:metrics/website_integrity.json",
1022
  "exists": true,
1023
  "bytes": 15221,
1024
+ "sha256": "dcbd09b4c4522770c43504c500eb653de706538516ee2ec72e491ffc3416c6e2"
1025
  }
1026
  },
1027
  "failures": []
 
1723
  },
1724
  "failures": []
1725
  },
1726
+ {
1727
+ "name": "scripts/omni/analyze_qwen3_omni_errors.py",
1728
+ "status": "pass",
1729
+ "local": {
1730
+ "path": "repo:scripts/omni/analyze_qwen3_omni_errors.py",
1731
+ "exists": true,
1732
+ "bytes": 15676,
1733
+ "sha256": "d4c7e46d9fbd5f9d84bc32374f457fd8c9d68c8faa39c77bc45770eb95d80337"
1734
+ },
1735
+ "mirrors": {
1736
+ "hf_artifacts": {
1737
+ "path": "hf_artifacts:scripts/omni/analyze_qwen3_omni_errors.py",
1738
+ "exists": true,
1739
+ "bytes": 15676,
1740
+ "sha256": "d4c7e46d9fbd5f9d84bc32374f457fd8c9d68c8faa39c77bc45770eb95d80337"
1741
+ },
1742
+ "hf_model": {
1743
+ "path": "hf_model:scripts/omni/analyze_qwen3_omni_errors.py",
1744
+ "exists": true,
1745
+ "bytes": 15676,
1746
+ "sha256": "d4c7e46d9fbd5f9d84bc32374f457fd8c9d68c8faa39c77bc45770eb95d80337"
1747
+ }
1748
+ },
1749
+ "failures": []
1750
+ },
1751
  {
1752
  "name": "scripts/audio_ablation_and_raw_upgrade.py",
1753
  "status": "pass",
 
1779
  "local": {
1780
  "path": "repo:scripts/build_artifact_index.py",
1781
  "exists": true,
1782
+ "bytes": 32191,
1783
+ "sha256": "4a105c732d2f6c54a78333d7f47e0139325ba638027e34e6acd929a90626b8e0"
1784
  },
1785
  "mirrors": {
1786
  "hf_artifacts": {
1787
  "path": "hf_artifacts:scripts/build_artifact_index.py",
1788
  "exists": true,
1789
+ "bytes": 32191,
1790
+ "sha256": "4a105c732d2f6c54a78333d7f47e0139325ba638027e34e6acd929a90626b8e0"
1791
  },
1792
  "hf_model": {
1793
  "path": "hf_model:scripts/build_artifact_index.py",
1794
  "exists": true,
1795
+ "bytes": 32191,
1796
+ "sha256": "4a105c732d2f6c54a78333d7f47e0139325ba638027e34e6acd929a90626b8e0"
1797
  }
1798
  },
1799
  "failures": []
 
2079
  "local": {
2080
  "path": "repo:scripts/validate_mirror_parity.py",
2081
  "exists": true,
2082
+ "bytes": 13781,
2083
+ "sha256": "3659adf936b058617dde97ee4c424615a361e59f5ea74975116422dfe01768e8"
2084
  },
2085
  "mirrors": {
2086
  "hf_artifacts": {
2087
  "path": "hf_artifacts:scripts/validate_mirror_parity.py",
2088
  "exists": true,
2089
+ "bytes": 13781,
2090
+ "sha256": "3659adf936b058617dde97ee4c424615a361e59f5ea74975116422dfe01768e8"
2091
  },
2092
  "hf_model": {
2093
  "path": "hf_model:scripts/validate_mirror_parity.py",
2094
  "exists": true,
2095
+ "bytes": 13781,
2096
+ "sha256": "3659adf936b058617dde97ee4c424615a361e59f5ea74975116422dfe01768e8"
2097
  }
2098
  },
2099
  "failures": []
 
2832
  },
2833
  "failures": []
2834
  },
2835
+ {
2836
+ "name": "results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/ERROR_ANALYSIS.md",
2837
+ "status": "pass",
2838
+ "local": {
2839
+ "path": "repo:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/ERROR_ANALYSIS.md",
2840
+ "exists": true,
2841
+ "bytes": 3331,
2842
+ "sha256": "063fcc2ebd7b57ab5b281fd5e8edc629da4e1f4e5a708483ba27375d02af9467"
2843
+ },
2844
+ "mirrors": {
2845
+ "hf_space": {
2846
+ "path": "hf_space:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/ERROR_ANALYSIS.md",
2847
+ "exists": true,
2848
+ "bytes": 3331,
2849
+ "sha256": "063fcc2ebd7b57ab5b281fd5e8edc629da4e1f4e5a708483ba27375d02af9467"
2850
+ },
2851
+ "hf_artifacts": {
2852
+ "path": "hf_artifacts:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/ERROR_ANALYSIS.md",
2853
+ "exists": true,
2854
+ "bytes": 3331,
2855
+ "sha256": "063fcc2ebd7b57ab5b281fd5e8edc629da4e1f4e5a708483ba27375d02af9467"
2856
+ },
2857
+ "hf_model": {
2858
+ "path": "hf_model:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/ERROR_ANALYSIS.md",
2859
+ "exists": true,
2860
+ "bytes": 3331,
2861
+ "sha256": "063fcc2ebd7b57ab5b281fd5e8edc629da4e1f4e5a708483ba27375d02af9467"
2862
+ }
2863
+ },
2864
+ "failures": []
2865
+ },
2866
+ {
2867
+ "name": "results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/error_analysis_summary.json",
2868
+ "status": "pass",
2869
+ "local": {
2870
+ "path": "repo:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/error_analysis_summary.json",
2871
+ "exists": true,
2872
+ "bytes": 25202,
2873
+ "sha256": "c2e4eaa686f5d9739a8d0bfd8ae51a453b94019489ed84a154e2bce2fa316ff5"
2874
+ },
2875
+ "mirrors": {
2876
+ "hf_space": {
2877
+ "path": "hf_space:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/error_analysis_summary.json",
2878
+ "exists": true,
2879
+ "bytes": 25202,
2880
+ "sha256": "c2e4eaa686f5d9739a8d0bfd8ae51a453b94019489ed84a154e2bce2fa316ff5"
2881
+ },
2882
+ "hf_artifacts": {
2883
+ "path": "hf_artifacts:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/error_analysis_summary.json",
2884
+ "exists": true,
2885
+ "bytes": 25202,
2886
+ "sha256": "c2e4eaa686f5d9739a8d0bfd8ae51a453b94019489ed84a154e2bce2fa316ff5"
2887
+ },
2888
+ "hf_model": {
2889
+ "path": "hf_model:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/error_analysis_summary.json",
2890
+ "exists": true,
2891
+ "bytes": 25202,
2892
+ "sha256": "c2e4eaa686f5d9739a8d0bfd8ae51a453b94019489ed84a154e2bce2fa316ff5"
2893
+ }
2894
+ },
2895
+ "failures": []
2896
+ },
2897
+ {
2898
+ "name": "results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/episode_error_analysis.csv",
2899
+ "status": "pass",
2900
+ "local": {
2901
+ "path": "repo:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/episode_error_analysis.csv",
2902
+ "exists": true,
2903
+ "bytes": 2121,
2904
+ "sha256": "7f0bc74140f100b9fe444c38eb74d155605bfc5984f665e653a2cd34a5cb96bd"
2905
+ },
2906
+ "mirrors": {
2907
+ "hf_space": {
2908
+ "path": "hf_space:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/episode_error_analysis.csv",
2909
+ "exists": true,
2910
+ "bytes": 2121,
2911
+ "sha256": "7f0bc74140f100b9fe444c38eb74d155605bfc5984f665e653a2cd34a5cb96bd"
2912
+ },
2913
+ "hf_artifacts": {
2914
+ "path": "hf_artifacts:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/episode_error_analysis.csv",
2915
+ "exists": true,
2916
+ "bytes": 2121,
2917
+ "sha256": "7f0bc74140f100b9fe444c38eb74d155605bfc5984f665e653a2cd34a5cb96bd"
2918
+ },
2919
+ "hf_model": {
2920
+ "path": "hf_model:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/episode_error_analysis.csv",
2921
+ "exists": true,
2922
+ "bytes": 2121,
2923
+ "sha256": "7f0bc74140f100b9fe444c38eb74d155605bfc5984f665e653a2cd34a5cb96bd"
2924
+ }
2925
+ },
2926
+ "failures": []
2927
+ },
2928
+ {
2929
+ "name": "results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/action_family_error_analysis.csv",
2930
+ "status": "pass",
2931
+ "local": {
2932
+ "path": "repo:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/action_family_error_analysis.csv",
2933
+ "exists": true,
2934
+ "bytes": 1320,
2935
+ "sha256": "e15bf22e96b887c4b00aeb8ba548f4fd72ea0aab0772cc59e9bdda517ad72430"
2936
+ },
2937
+ "mirrors": {
2938
+ "hf_space": {
2939
+ "path": "hf_space:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/action_family_error_analysis.csv",
2940
+ "exists": true,
2941
+ "bytes": 1320,
2942
+ "sha256": "e15bf22e96b887c4b00aeb8ba548f4fd72ea0aab0772cc59e9bdda517ad72430"
2943
+ },
2944
+ "hf_artifacts": {
2945
+ "path": "hf_artifacts:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/action_family_error_analysis.csv",
2946
+ "exists": true,
2947
+ "bytes": 1320,
2948
+ "sha256": "e15bf22e96b887c4b00aeb8ba548f4fd72ea0aab0772cc59e9bdda517ad72430"
2949
+ },
2950
+ "hf_model": {
2951
+ "path": "hf_model:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/action_family_error_analysis.csv",
2952
+ "exists": true,
2953
+ "bytes": 1320,
2954
+ "sha256": "e15bf22e96b887c4b00aeb8ba548f4fd72ea0aab0772cc59e9bdda517ad72430"
2955
+ }
2956
+ },
2957
+ "failures": []
2958
+ },
2959
+ {
2960
+ "name": "results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/train_seen_error_analysis.csv",
2961
+ "status": "pass",
2962
+ "local": {
2963
+ "path": "repo:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/train_seen_error_analysis.csv",
2964
+ "exists": true,
2965
+ "bytes": 572,
2966
+ "sha256": "cb196616b6f073266087d8cb7182e36c0a761607f3082ad78c350fd99e1996e7"
2967
+ },
2968
+ "mirrors": {
2969
+ "hf_space": {
2970
+ "path": "hf_space:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/train_seen_error_analysis.csv",
2971
+ "exists": true,
2972
+ "bytes": 572,
2973
+ "sha256": "cb196616b6f073266087d8cb7182e36c0a761607f3082ad78c350fd99e1996e7"
2974
+ },
2975
+ "hf_artifacts": {
2976
+ "path": "hf_artifacts:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/train_seen_error_analysis.csv",
2977
+ "exists": true,
2978
+ "bytes": 572,
2979
+ "sha256": "cb196616b6f073266087d8cb7182e36c0a761607f3082ad78c350fd99e1996e7"
2980
+ },
2981
+ "hf_model": {
2982
+ "path": "hf_model:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/train_seen_error_analysis.csv",
2983
+ "exists": true,
2984
+ "bytes": 572,
2985
+ "sha256": "cb196616b6f073266087d8cb7182e36c0a761607f3082ad78c350fd99e1996e7"
2986
+ }
2987
+ },
2988
+ "failures": []
2989
+ },
2990
+ {
2991
+ "name": "results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/missing_modality_error_analysis.csv",
2992
+ "status": "pass",
2993
+ "local": {
2994
+ "path": "repo:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/missing_modality_error_analysis.csv",
2995
+ "exists": true,
2996
+ "bytes": 408,
2997
+ "sha256": "6447cf285b466a914055adb0aef4f3d47bf82d33a277d8ca2e6f22c4f0f2a7f7"
2998
+ },
2999
+ "mirrors": {
3000
+ "hf_space": {
3001
+ "path": "hf_space:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/missing_modality_error_analysis.csv",
3002
+ "exists": true,
3003
+ "bytes": 408,
3004
+ "sha256": "6447cf285b466a914055adb0aef4f3d47bf82d33a277d8ca2e6f22c4f0f2a7f7"
3005
+ },
3006
+ "hf_artifacts": {
3007
+ "path": "hf_artifacts:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/missing_modality_error_analysis.csv",
3008
+ "exists": true,
3009
+ "bytes": 408,
3010
+ "sha256": "6447cf285b466a914055adb0aef4f3d47bf82d33a277d8ca2e6f22c4f0f2a7f7"
3011
+ },
3012
+ "hf_model": {
3013
+ "path": "hf_model:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/missing_modality_error_analysis.csv",
3014
+ "exists": true,
3015
+ "bytes": 408,
3016
+ "sha256": "6447cf285b466a914055adb0aef4f3d47bf82d33a277d8ca2e6f22c4f0f2a7f7"
3017
+ }
3018
+ },
3019
+ "failures": []
3020
+ },
3021
+ {
3022
+ "name": "results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/object_category_error_analysis.csv",
3023
+ "status": "pass",
3024
+ "local": {
3025
+ "path": "repo:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/object_category_error_analysis.csv",
3026
+ "exists": true,
3027
+ "bytes": 1704,
3028
+ "sha256": "f9cbd5e566ef666fe2d1050cc5bdadc7967a2056bdaa1e2e9f88fb0c22ee0ef8"
3029
+ },
3030
+ "mirrors": {
3031
+ "hf_space": {
3032
+ "path": "hf_space:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/object_category_error_analysis.csv",
3033
+ "exists": true,
3034
+ "bytes": 1704,
3035
+ "sha256": "f9cbd5e566ef666fe2d1050cc5bdadc7967a2056bdaa1e2e9f88fb0c22ee0ef8"
3036
+ },
3037
+ "hf_artifacts": {
3038
+ "path": "hf_artifacts:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/object_category_error_analysis.csv",
3039
+ "exists": true,
3040
+ "bytes": 1704,
3041
+ "sha256": "f9cbd5e566ef666fe2d1050cc5bdadc7967a2056bdaa1e2e9f88fb0c22ee0ef8"
3042
+ },
3043
+ "hf_model": {
3044
+ "path": "hf_model:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/object_category_error_analysis.csv",
3045
+ "exists": true,
3046
+ "bytes": 1704,
3047
+ "sha256": "f9cbd5e566ef666fe2d1050cc5bdadc7967a2056bdaa1e2e9f88fb0c22ee0ef8"
3048
+ }
3049
+ },
3050
+ "failures": []
3051
+ },
3052
+ {
3053
+ "name": "docs/ARTIFACT_GUIDE.md",
3054
+ "status": "pass",
3055
+ "local": {
3056
+ "path": "repo:ARTIFACT_GUIDE.md",
3057
+ "exists": true,
3058
+ "bytes": 16318,
3059
+ "sha256": "cda5f4b5be4b7a2d26aff6ed7f930bfba13dfc463d533a9880193c0a0611b677"
3060
+ },
3061
+ "mirrors": {
3062
+ "hf_space": {
3063
+ "path": "hf_space:ARTIFACT_GUIDE.md",
3064
+ "exists": true,
3065
+ "bytes": 16318,
3066
+ "sha256": "cda5f4b5be4b7a2d26aff6ed7f930bfba13dfc463d533a9880193c0a0611b677"
3067
+ },
3068
+ "hf_artifacts": {
3069
+ "path": "hf_artifacts:ARTIFACT_GUIDE.md",
3070
+ "exists": true,
3071
+ "bytes": 16318,
3072
+ "sha256": "cda5f4b5be4b7a2d26aff6ed7f930bfba13dfc463d533a9880193c0a0611b677"
3073
+ },
3074
+ "hf_model": {
3075
+ "path": "hf_model:ARTIFACT_GUIDE.md",
3076
+ "exists": true,
3077
+ "bytes": 16318,
3078
+ "sha256": "cda5f4b5be4b7a2d26aff6ed7f930bfba13dfc463d533a9880193c0a0611b677"
3079
+ }
3080
+ },
3081
+ "failures": []
3082
+ },
3083
+ {
3084
+ "name": "docs/OMNI_MODEL_EXTENSION_CONTRACT.md",
3085
+ "status": "pass",
3086
+ "local": {
3087
+ "path": "repo:OMNI_MODEL_EXTENSION_CONTRACT.md",
3088
+ "exists": true,
3089
+ "bytes": 8900,
3090
+ "sha256": "c4e51d0aa7536045c229418603a67c6b3c5f31c9d756ca7395cb0c9455f0ed6d"
3091
+ },
3092
+ "mirrors": {
3093
+ "hf_space": {
3094
+ "path": "hf_space:OMNI_MODEL_EXTENSION_CONTRACT.md",
3095
+ "exists": true,
3096
+ "bytes": 8900,
3097
+ "sha256": "c4e51d0aa7536045c229418603a67c6b3c5f31c9d756ca7395cb0c9455f0ed6d"
3098
+ },
3099
+ "hf_artifacts": {
3100
+ "path": "hf_artifacts:OMNI_MODEL_EXTENSION_CONTRACT.md",
3101
+ "exists": true,
3102
+ "bytes": 8900,
3103
+ "sha256": "c4e51d0aa7536045c229418603a67c6b3c5f31c9d756ca7395cb0c9455f0ed6d"
3104
+ },
3105
+ "hf_model": {
3106
+ "path": "hf_model:OMNI_MODEL_EXTENSION_CONTRACT.md",
3107
+ "exists": true,
3108
+ "bytes": 8900,
3109
+ "sha256": "c4e51d0aa7536045c229418603a67c6b3c5f31c9d756ca7395cb0c9455f0ed6d"
3110
+ }
3111
+ },
3112
+ "failures": []
3113
+ },
3114
  {
3115
  "name": "docs/QUALITY_GATES.md",
3116
  "status": "pass",
 
3365
  "local": {
3366
  "path": "repo:PROJECT_STATUS.md",
3367
  "exists": true,
3368
+ "bytes": 8805,
3369
+ "sha256": "4051b78674306078880de33a144a499144b2487b11455c70a364a94cefa035a7"
3370
  },
3371
  "mirrors": {
3372
  "hf_space": {
3373
  "path": "hf_space:PROJECT_STATUS.md",
3374
  "exists": true,
3375
+ "bytes": 8805,
3376
+ "sha256": "4051b78674306078880de33a144a499144b2487b11455c70a364a94cefa035a7"
3377
  },
3378
  "hf_artifacts": {
3379
  "path": "hf_artifacts:PROJECT_STATUS.md",
3380
  "exists": true,
3381
+ "bytes": 8805,
3382
+ "sha256": "4051b78674306078880de33a144a499144b2487b11455c70a364a94cefa035a7"
3383
  },
3384
  "hf_model": {
3385
  "path": "hf_model:PROJECT_STATUS.md",
3386
  "exists": true,
3387
+ "bytes": 8805,
3388
+ "sha256": "4051b78674306078880de33a144a499144b2487b11455c70a364a94cefa035a7"
3389
  }
3390
  },
3391
  "failures": []
docs/data/omni_finetune_verified_result.json CHANGED
@@ -67,7 +67,28 @@
67
  "audit_status": "pass",
68
  "contains_raw_xperience10m_data": false,
69
  "contains_qwen_base_weights": false,
70
- "contains_lora_weights": false
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
71
  },
72
  "required_next_steps": [
73
  "Improve JSON-format reliability through prompt, decoding, constrained parsing, or target formatting changes.",
 
67
  "audit_status": "pass",
68
  "contains_raw_xperience10m_data": false,
69
  "contains_qwen_base_weights": false,
70
+ "contains_lora_weights": false,
71
+ "error_analysis": {
72
+ "status": "pass",
73
+ "path": "results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/error_analysis_summary.json",
74
+ "markdown_report": "results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/ERROR_ANALYSIS.md",
75
+ "groupings": [
76
+ "episode",
77
+ "action_family",
78
+ "train_seen_status",
79
+ "required_modality_state",
80
+ "object_category"
81
+ ],
82
+ "key_readouts": {
83
+ "parsed_prediction_rate": 0.8772321428571429,
84
+ "weakest_action_family": "locomotion",
85
+ "weakest_action_family_samples": 23,
86
+ "weakest_action_family_parsed_prediction_rate": 0.2608695652173913,
87
+ "seen_action_exact_rate": 0.04580152671755725,
88
+ "unseen_action_exact_rate": 0.015772870662460567,
89
+ "required_modality_state": "rrd_missing_only_required_modalities_present"
90
+ }
91
+ }
92
  },
93
  "required_next_steps": [
94
  "Improve JSON-format reliability through prompt, decoding, constrained parsing, or target formatting changes.",
docs/data/project_status.json CHANGED
@@ -180,10 +180,12 @@
180
  "evidence": [
181
  "docs/data/omni_finetune_verified_result.json",
182
  "results/omni_finetune/verified_public/",
 
183
  "scripts/omni/package_verified_omni_result.py",
184
- "scripts/omni/audit_verified_omni_package.py"
 
185
  ],
186
- "readout": "The selected 96/16/16 episode split produced a validation-aware public-safe held-out package with 3,808 exported windows, 512 validation windows, and 448 test predictions. JSON validity is 87.50%, below the 98% target, so it is a stronger diagnostic baseline but not a strong model-quality result."
187
  },
188
  {
189
  "area": "Raw Xperience-10M redistribution",
 
180
  "evidence": [
181
  "docs/data/omni_finetune_verified_result.json",
182
  "results/omni_finetune/verified_public/",
183
+ "results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/",
184
  "scripts/omni/package_verified_omni_result.py",
185
+ "scripts/omni/audit_verified_omni_package.py",
186
+ "scripts/omni/analyze_qwen3_omni_errors.py"
187
  ],
188
+ "readout": "The selected 96/16/16 episode split produced a validation-aware public-safe held-out package with 3,808 exported windows, 512 validation windows, 448 test predictions, and derived error-analysis tables by episode, action family, train-seen status, required-modality state, and object category. JSON validity is 87.50%, below the 98% target, so it is a diagnostic baseline but not a strong model-quality result."
189
  },
190
  {
191
  "area": "Raw Xperience-10M redistribution",
docs/data/publication_audit.json CHANGED
@@ -1,6 +1,6 @@
1
  {
2
  "status": "pass",
3
- "generated_at_utc": "2026-06-06T14:38:05+00:00",
4
  "checks": [
5
  {
6
  "name": "required_publication_assets_present",
@@ -182,8 +182,8 @@
182
  "github_repo": {
183
  "root": "repo",
184
  "exists": true,
185
- "file_count": 442,
186
- "text_file_count": 372,
187
  "largest_file": {
188
  "path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
189
  "bytes": 55702978
@@ -193,8 +193,8 @@
193
  "hf_space_bundle": {
194
  "root": "hf_publish/space",
195
  "exists": true,
196
- "file_count": 356,
197
- "text_file_count": 286,
198
  "largest_file": {
199
  "path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
200
  "bytes": 55702978
@@ -204,8 +204,8 @@
204
  "hf_artifact_bundle": {
205
  "root": "hf_publish/artifacts",
206
  "exists": true,
207
- "file_count": 514,
208
- "text_file_count": 420,
209
  "largest_file": {
210
  "path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
211
  "bytes": 55702978
@@ -215,8 +215,8 @@
215
  "hf_model_bundle": {
216
  "root": "hf_publish/model",
217
  "exists": true,
218
- "file_count": 701,
219
- "text_file_count": 572,
220
  "largest_file": {
221
  "path": "pytorch_model.bin",
222
  "bytes": 93495480
 
1
  {
2
  "status": "pass",
3
+ "generated_at_utc": "2026-06-06T14:54:02+00:00",
4
  "checks": [
5
  {
6
  "name": "required_publication_assets_present",
 
182
  "github_repo": {
183
  "root": "repo",
184
  "exists": true,
185
+ "file_count": 450,
186
+ "text_file_count": 380,
187
  "largest_file": {
188
  "path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
189
  "bytes": 55702978
 
193
  "hf_space_bundle": {
194
  "root": "hf_publish/space",
195
  "exists": true,
196
+ "file_count": 363,
197
+ "text_file_count": 293,
198
  "largest_file": {
199
  "path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
200
  "bytes": 55702978
 
204
  "hf_artifact_bundle": {
205
  "root": "hf_publish/artifacts",
206
  "exists": true,
207
+ "file_count": 522,
208
+ "text_file_count": 428,
209
  "largest_file": {
210
  "path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
211
  "bytes": 55702978
 
215
  "hf_model_bundle": {
216
  "root": "hf_publish/model",
217
  "exists": true,
218
+ "file_count": 709,
219
+ "text_file_count": 580,
220
  "largest_file": {
221
  "path": "pytorch_model.bin",
222
  "bytes": 93495480
docs/data/scope_claims_audit.json CHANGED
@@ -1,6 +1,6 @@
1
  {
2
  "status": "pass",
3
- "generated_at_utc": "2026-06-06T14:35:59+00:00",
4
  "summary": {
5
  "qwen3_omni_verified_diagnostic_pilot": true,
6
  "dataset_manifest_num_episodes": 119,
 
1
  {
2
  "status": "pass",
3
+ "generated_at_utc": "2026-06-06T14:54:01+00:00",
4
  "summary": {
5
  "qwen3_omni_verified_diagnostic_pilot": true,
6
  "dataset_manifest_num_episodes": 119,
docs/data/task_surface_integrity.json CHANGED
@@ -1,6 +1,6 @@
1
  {
2
  "status": "pass",
3
- "generated_at_utc": "2026-06-06T14:35:59+00:00",
4
  "summary": {
5
  "task_count": 12,
6
  "expected_task_count": 12,
@@ -64,15 +64,21 @@
64
  "observed": "timeline_action"
65
  },
66
  {
67
- "name": "timeline_action: public_field_input_short_is_human_readable",
68
  "status": "pass",
69
- "value": "20-frame multimodal window",
70
  "raw_hits": []
71
  },
72
  {
73
- "name": "timeline_action: public_field_card_blurb_is_human_readable",
74
  "status": "pass",
75
- "value": "Recognize the current manipulation action from synchronized visual, motion, inertial, pose, and annotation context.",
 
 
 
 
 
 
76
  "raw_hits": []
77
  },
78
  {
@@ -88,9 +94,9 @@
88
  "raw_hits": []
89
  },
90
  {
91
- "name": "timeline_action: public_field_research_name_is_human_readable",
92
  "status": "pass",
93
- "value": "Egocentric Action Recognition",
94
  "raw_hits": []
95
  },
96
  {
@@ -99,12 +105,6 @@
99
  "value": "Look at one short multimodal window and name what action is happening now.",
100
  "raw_hits": []
101
  },
102
- {
103
- "name": "timeline_action: public_field_process_short_is_human_readable",
104
- "status": "pass",
105
- "value": "window features -> action label builder -> classifier",
106
- "raw_hits": []
107
- },
108
  {
109
  "name": "timeline_action: known_task_family",
110
  "status": "pass",
@@ -184,15 +184,21 @@
184
  "observed": "timeline_subtask"
185
  },
186
  {
187
- "name": "timeline_subtask: public_field_input_short_is_human_readable",
188
  "status": "pass",
189
- "value": "20-frame multimodal window",
190
  "raw_hits": []
191
  },
192
  {
193
- "name": "timeline_subtask: public_field_card_blurb_is_human_readable",
194
  "status": "pass",
195
- "value": "Recognize the broader activity stage so fine actions become a readable procedure timeline.",
 
 
 
 
 
 
196
  "raw_hits": []
197
  },
198
  {
@@ -208,9 +214,9 @@
208
  "raw_hits": []
209
  },
210
  {
211
- "name": "timeline_subtask: public_field_research_name_is_human_readable",
212
  "status": "pass",
213
- "value": "Temporal Subtask Recognition",
214
  "raw_hits": []
215
  },
216
  {
@@ -219,12 +225,6 @@
219
  "value": "Predict the higher-level task stage for the current window.",
220
  "raw_hits": []
221
  },
222
- {
223
- "name": "timeline_subtask: public_field_process_short_is_human_readable",
224
- "status": "pass",
225
- "value": "window features -> subtask label builder -> classifier",
226
- "raw_hits": []
227
- },
228
  {
229
  "name": "timeline_subtask: known_task_family",
230
  "status": "pass",
@@ -304,15 +304,21 @@
304
  "observed": "transition_detection"
305
  },
306
  {
307
- "name": "transition_detection: public_field_input_short_is_human_readable",
308
  "status": "pass",
309
- "value": "current window with boundary target",
310
  "raw_hits": []
311
  },
312
  {
313
- "name": "transition_detection: public_field_card_blurb_is_human_readable",
314
  "status": "pass",
315
- "value": "Detect the local moment where the episode changes from one action segment to the next.",
 
 
 
 
 
 
316
  "raw_hits": []
317
  },
318
  {
@@ -328,9 +334,9 @@
328
  "raw_hits": []
329
  },
330
  {
331
- "name": "transition_detection: public_field_research_name_is_human_readable",
332
  "status": "pass",
333
- "value": "Temporal Action Segmentation",
334
  "raw_hits": []
335
  },
336
  {
@@ -339,12 +345,6 @@
339
  "value": "Detect whether the current window is near a boundary between actions.",
340
  "raw_hits": []
341
  },
342
- {
343
- "name": "transition_detection: public_field_process_short_is_human_readable",
344
- "status": "pass",
345
- "value": "action changes -> boundary labels -> binary classifier",
346
- "raw_hits": []
347
- },
348
  {
349
  "name": "transition_detection: known_task_family",
350
  "status": "pass",
@@ -422,15 +422,21 @@
422
  "observed": "next_action"
423
  },
424
  {
425
- "name": "next_action: public_field_input_short_is_human_readable",
426
  "status": "pass",
427
- "value": "current window at time t",
428
  "raw_hits": []
429
  },
430
  {
431
- "name": "next_action: public_field_card_blurb_is_human_readable",
432
  "status": "pass",
433
- "value": "Forecast the near-future action from the current observations only.",
 
 
 
 
 
 
434
  "raw_hits": []
435
  },
436
  {
@@ -446,9 +452,9 @@
446
  "raw_hits": []
447
  },
448
  {
449
- "name": "next_action: public_field_research_name_is_human_readable",
450
  "status": "pass",
451
- "value": "Short-Horizon Intention Prediction",
452
  "raw_hits": []
453
  },
454
  {
@@ -457,12 +463,6 @@
457
  "value": "Use the current window to guess the action that will happen shortly after it.",
458
  "raw_hits": []
459
  },
460
- {
461
- "name": "next_action: public_field_process_short_is_human_readable",
462
- "status": "pass",
463
- "value": "current features -> future label shift -> classifier",
464
- "raw_hits": []
465
- },
466
  {
467
  "name": "next_action: known_task_family",
468
  "status": "pass",
@@ -540,15 +540,21 @@
540
  "observed": "hand_trajectory_forecast"
541
  },
542
  {
543
- "name": "hand_trajectory_forecast: public_field_input_short_is_human_readable",
544
  "status": "pass",
545
- "value": "current multimodal window",
546
  "raw_hits": []
547
  },
548
  {
549
- "name": "hand_trajectory_forecast: public_field_card_blurb_is_human_readable",
550
  "status": "pass",
551
- "value": "Predict the future 3D left/right hand path from the current multimodal state.",
 
 
 
 
 
 
552
  "raw_hits": []
553
  },
554
  {
@@ -564,9 +570,9 @@
564
  "raw_hits": []
565
  },
566
  {
567
- "name": "hand_trajectory_forecast: public_field_research_name_is_human_readable",
568
  "status": "pass",
569
- "value": "3D Hand Motion Forecasting",
570
  "raw_hits": []
571
  },
572
  {
@@ -575,12 +581,6 @@
575
  "value": "Predict where the hands will move over the next few frames.",
576
  "raw_hits": []
577
  },
578
- {
579
- "name": "hand_trajectory_forecast: public_field_process_short_is_human_readable",
580
- "status": "pass",
581
- "value": "current features -> future mocap target -> regression head",
582
- "raw_hits": []
583
- },
584
  {
585
  "name": "hand_trajectory_forecast: known_task_family",
586
  "status": "pass",
@@ -658,15 +658,21 @@
658
  "observed": "contact_prediction"
659
  },
660
  {
661
- "name": "contact_prediction: public_field_input_short_is_human_readable",
662
  "status": "pass",
663
- "value": "non-contact, non-caption features",
664
  "raw_hits": []
665
  },
666
  {
667
- "name": "contact_prediction: public_field_card_blurb_is_human_readable",
668
  "status": "pass",
669
- "value": "Predict whether body or hand contact with the scene is occurring without leaking contact labels.",
 
 
 
 
 
 
670
  "raw_hits": []
671
  },
672
  {
@@ -682,9 +688,9 @@
682
  "raw_hits": []
683
  },
684
  {
685
- "name": "contact_prediction: public_field_research_name_is_human_readable",
686
  "status": "pass",
687
- "value": "Human-Object Contact Prediction",
688
  "raw_hits": []
689
  },
690
  {
@@ -693,12 +699,6 @@
693
  "value": "Predict whether the body or hand is in contact with something.",
694
  "raw_hits": []
695
  },
696
- {
697
- "name": "contact_prediction: public_field_process_short_is_human_readable",
698
- "status": "pass",
699
- "value": "feature filter -> contact target -> binary classifier",
700
- "raw_hits": []
701
- },
702
  {
703
  "name": "contact_prediction: known_task_family",
704
  "status": "pass",
@@ -774,15 +774,21 @@
774
  "observed": "object_relevance"
775
  },
776
  {
777
- "name": "object_relevance: public_field_input_short_is_human_readable",
778
  "status": "pass",
779
- "value": "non-caption multimodal features",
780
  "raw_hits": []
781
  },
782
  {
783
- "name": "object_relevance: public_field_card_blurb_is_human_readable",
784
  "status": "pass",
785
- "value": "Infer which objects are relevant to the current manipulation window from non-caption features.",
 
 
 
 
 
 
786
  "raw_hits": []
787
  },
788
  {
@@ -798,9 +804,9 @@
798
  "raw_hits": []
799
  },
800
  {
801
- "name": "object_relevance: public_field_research_name_is_human_readable",
802
  "status": "pass",
803
- "value": "Object-Centric Interaction Recognition",
804
  "raw_hits": []
805
  },
806
  {
@@ -809,12 +815,6 @@
809
  "value": "Predict which objects matter in the current window.",
810
  "raw_hits": []
811
  },
812
- {
813
- "name": "object_relevance: public_field_process_short_is_human_readable",
814
- "status": "pass",
815
- "value": "object vocabulary -> multi-hot labels -> sigmoid heads",
816
- "raw_hits": []
817
- },
818
  {
819
  "name": "object_relevance: known_task_family",
820
  "status": "pass",
@@ -892,15 +892,21 @@
892
  "observed": "caption_grounding"
893
  },
894
  {
895
- "name": "caption_grounding: public_field_input_short_is_human_readable",
896
  "status": "pass",
897
- "value": "text-like query and candidate windows",
898
  "raw_hits": []
899
  },
900
  {
901
- "name": "caption_grounding: public_field_card_blurb_is_human_readable",
902
  "status": "pass",
903
- "value": "Retrieve the matching time window for an annotation-derived text query.",
 
 
 
 
 
 
904
  "raw_hits": []
905
  },
906
  {
@@ -916,9 +922,9 @@
916
  "raw_hits": []
917
  },
918
  {
919
- "name": "caption_grounding: public_field_research_name_is_human_readable",
920
  "status": "pass",
921
- "value": "Language-to-Moment Grounding",
922
  "raw_hits": []
923
  },
924
  {
@@ -927,12 +933,6 @@
927
  "value": "Given a text-like query from annotation, find the matching time window.",
928
  "raw_hits": []
929
  },
930
- {
931
- "name": "caption_grounding: public_field_process_short_is_human_readable",
932
- "status": "pass",
933
- "value": "query features -> candidate index -> cosine ranker",
934
- "raw_hits": []
935
- },
936
  {
937
  "name": "caption_grounding: known_task_family",
938
  "status": "pass",
@@ -1008,15 +1008,21 @@
1008
  "observed": "cross_modal_retrieval"
1009
  },
1010
  {
1011
- "name": "cross_modal_retrieval: public_field_input_short_is_human_readable",
1012
  "status": "pass",
1013
- "value": "motion/IMU/pose query; depth/video candidates",
1014
  "raw_hits": []
1015
  },
1016
  {
1017
- "name": "cross_modal_retrieval: public_field_card_blurb_is_human_readable",
1018
  "status": "pass",
1019
- "value": "Use motion, IMU, and camera-pose signals to retrieve the matching depth/video window.",
 
 
 
 
 
 
1020
  "raw_hits": []
1021
  },
1022
  {
@@ -1032,9 +1038,9 @@
1032
  "raw_hits": []
1033
  },
1034
  {
1035
- "name": "cross_modal_retrieval: public_field_research_name_is_human_readable",
1036
  "status": "pass",
1037
- "value": "Multimodal Representation Retrieval",
1038
  "raw_hits": []
1039
  },
1040
  {
@@ -1043,12 +1049,6 @@
1043
  "value": "Use one group of modalities to retrieve the matching window from another group.",
1044
  "raw_hits": []
1045
  },
1046
- {
1047
- "name": "cross_modal_retrieval: public_field_process_short_is_human_readable",
1048
- "status": "pass",
1049
- "value": "modality split -> projection -> nearest-neighbor ranker",
1050
- "raw_hits": []
1051
- },
1052
  {
1053
  "name": "cross_modal_retrieval: known_task_family",
1054
  "status": "pass",
@@ -1126,15 +1126,21 @@
1126
  "observed": "modality_reconstruction"
1127
  },
1128
  {
1129
- "name": "modality_reconstruction: public_field_input_short_is_human_readable",
1130
  "status": "pass",
1131
- "value": "motion, IMU, and camera/pose features",
1132
  "raw_hits": []
1133
  },
1134
  {
1135
- "name": "modality_reconstruction: public_field_card_blurb_is_human_readable",
1136
  "status": "pass",
1137
- "value": "Predict compressed depth/video feature vectors from motion, IMU, and camera-pose features.",
 
 
 
 
 
 
1138
  "raw_hits": []
1139
  },
1140
  {
@@ -1150,9 +1156,9 @@
1150
  "raw_hits": []
1151
  },
1152
  {
1153
- "name": "modality_reconstruction: public_field_research_name_is_human_readable",
1154
  "status": "pass",
1155
- "value": "Modality Feature Reconstruction",
1156
  "raw_hits": []
1157
  },
1158
  {
@@ -1161,12 +1167,6 @@
1161
  "value": "Predict one modality feature block from other modality blocks.",
1162
  "raw_hits": []
1163
  },
1164
- {
1165
- "name": "modality_reconstruction: public_field_process_short_is_human_readable",
1166
- "status": "pass",
1167
- "value": "source-target split -> scaler -> regression head",
1168
- "raw_hits": []
1169
- },
1170
  {
1171
  "name": "modality_reconstruction: known_task_family",
1172
  "status": "pass",
@@ -1243,12 +1243,6 @@
1243
  "status": "pass",
1244
  "observed": "temporal_order"
1245
  },
1246
- {
1247
- "name": "temporal_order: public_field_input_short_is_human_readable",
1248
- "status": "pass",
1249
- "value": "two adjacent windows plus difference vector",
1250
- "raw_hits": []
1251
- },
1252
  {
1253
  "name": "temporal_order: public_field_card_blurb_is_human_readable",
1254
  "status": "pass",
@@ -1256,27 +1250,27 @@
1256
  "raw_hits": []
1257
  },
1258
  {
1259
- "name": "temporal_order: public_field_display_name_is_human_readable",
1260
  "status": "pass",
1261
  "value": "Temporal Order Verification",
1262
  "raw_hits": []
1263
  },
1264
  {
1265
- "name": "temporal_order: public_field_output_short_is_human_readable",
1266
  "status": "pass",
1267
- "value": "correct or reversed",
1268
  "raw_hits": []
1269
  },
1270
  {
1271
- "name": "temporal_order: public_field_research_name_is_human_readable",
1272
  "status": "pass",
1273
  "value": "Temporal Order Verification",
1274
  "raw_hits": []
1275
  },
1276
  {
1277
- "name": "temporal_order: public_field_plain_goal_is_human_readable",
1278
  "status": "pass",
1279
- "value": "Tell whether two nearby windows are in the correct time order.",
1280
  "raw_hits": []
1281
  },
1282
  {
@@ -1285,6 +1279,12 @@
1285
  "value": "pair builder -> feature combiner -> binary classifier",
1286
  "raw_hits": []
1287
  },
 
 
 
 
 
 
1288
  {
1289
  "name": "temporal_order: known_task_family",
1290
  "status": "pass",
@@ -1360,15 +1360,21 @@
1360
  "observed": "misalignment_detection"
1361
  },
1362
  {
1363
- "name": "misalignment_detection: public_field_input_short_is_human_readable",
1364
  "status": "pass",
1365
- "value": "motion-side and visual/depth-side feature groups",
1366
  "raw_hits": []
1367
  },
1368
  {
1369
- "name": "misalignment_detection: public_field_card_blurb_is_human_readable",
1370
  "status": "pass",
1371
- "value": "Detect whether motion and visual/depth streams have been artificially shifted out of sync.",
 
 
 
 
 
 
1372
  "raw_hits": []
1373
  },
1374
  {
@@ -1384,9 +1390,9 @@
1384
  "raw_hits": []
1385
  },
1386
  {
1387
- "name": "misalignment_detection: public_field_research_name_is_human_readable",
1388
  "status": "pass",
1389
- "value": "Cross-Modal Misalignment Detection",
1390
  "raw_hits": []
1391
  },
1392
  {
@@ -1395,12 +1401,6 @@
1395
  "value": "Detect when modalities that should match are shifted out of sync.",
1396
  "raw_hits": []
1397
  },
1398
- {
1399
- "name": "misalignment_detection: public_field_process_short_is_human_readable",
1400
- "status": "pass",
1401
- "value": "aligned/shifted pairs -> feature combiner -> binary classifier",
1402
- "raw_hits": []
1403
- },
1404
  {
1405
  "name": "misalignment_detection: known_task_family",
1406
  "status": "pass",
 
1
  {
2
  "status": "pass",
3
+ "generated_at_utc": "2026-06-06T14:53:59+00:00",
4
  "summary": {
5
  "task_count": 12,
6
  "expected_task_count": 12,
 
64
  "observed": "timeline_action"
65
  },
66
  {
67
+ "name": "timeline_action: public_field_card_blurb_is_human_readable",
68
  "status": "pass",
69
+ "value": "Recognize the current manipulation action from synchronized visual, motion, inertial, pose, and annotation context.",
70
  "raw_hits": []
71
  },
72
  {
73
+ "name": "timeline_action: public_field_research_name_is_human_readable",
74
  "status": "pass",
75
+ "value": "Egocentric Action Recognition",
76
+ "raw_hits": []
77
+ },
78
+ {
79
+ "name": "timeline_action: public_field_input_short_is_human_readable",
80
+ "status": "pass",
81
+ "value": "20-frame multimodal window",
82
  "raw_hits": []
83
  },
84
  {
 
94
  "raw_hits": []
95
  },
96
  {
97
+ "name": "timeline_action: public_field_process_short_is_human_readable",
98
  "status": "pass",
99
+ "value": "window features -> action label builder -> classifier",
100
  "raw_hits": []
101
  },
102
  {
 
105
  "value": "Look at one short multimodal window and name what action is happening now.",
106
  "raw_hits": []
107
  },
 
 
 
 
 
 
108
  {
109
  "name": "timeline_action: known_task_family",
110
  "status": "pass",
 
184
  "observed": "timeline_subtask"
185
  },
186
  {
187
+ "name": "timeline_subtask: public_field_card_blurb_is_human_readable",
188
  "status": "pass",
189
+ "value": "Recognize the broader activity stage so fine actions become a readable procedure timeline.",
190
  "raw_hits": []
191
  },
192
  {
193
+ "name": "timeline_subtask: public_field_research_name_is_human_readable",
194
  "status": "pass",
195
+ "value": "Temporal Subtask Recognition",
196
+ "raw_hits": []
197
+ },
198
+ {
199
+ "name": "timeline_subtask: public_field_input_short_is_human_readable",
200
+ "status": "pass",
201
+ "value": "20-frame multimodal window",
202
  "raw_hits": []
203
  },
204
  {
 
214
  "raw_hits": []
215
  },
216
  {
217
+ "name": "timeline_subtask: public_field_process_short_is_human_readable",
218
  "status": "pass",
219
+ "value": "window features -> subtask label builder -> classifier",
220
  "raw_hits": []
221
  },
222
  {
 
225
  "value": "Predict the higher-level task stage for the current window.",
226
  "raw_hits": []
227
  },
 
 
 
 
 
 
228
  {
229
  "name": "timeline_subtask: known_task_family",
230
  "status": "pass",
 
304
  "observed": "transition_detection"
305
  },
306
  {
307
+ "name": "transition_detection: public_field_card_blurb_is_human_readable",
308
  "status": "pass",
309
+ "value": "Detect the local moment where the episode changes from one action segment to the next.",
310
  "raw_hits": []
311
  },
312
  {
313
+ "name": "transition_detection: public_field_research_name_is_human_readable",
314
  "status": "pass",
315
+ "value": "Temporal Action Segmentation",
316
+ "raw_hits": []
317
+ },
318
+ {
319
+ "name": "transition_detection: public_field_input_short_is_human_readable",
320
+ "status": "pass",
321
+ "value": "current window with boundary target",
322
  "raw_hits": []
323
  },
324
  {
 
334
  "raw_hits": []
335
  },
336
  {
337
+ "name": "transition_detection: public_field_process_short_is_human_readable",
338
  "status": "pass",
339
+ "value": "action changes -> boundary labels -> binary classifier",
340
  "raw_hits": []
341
  },
342
  {
 
345
  "value": "Detect whether the current window is near a boundary between actions.",
346
  "raw_hits": []
347
  },
 
 
 
 
 
 
348
  {
349
  "name": "transition_detection: known_task_family",
350
  "status": "pass",
 
422
  "observed": "next_action"
423
  },
424
  {
425
+ "name": "next_action: public_field_card_blurb_is_human_readable",
426
  "status": "pass",
427
+ "value": "Forecast the near-future action from the current observations only.",
428
  "raw_hits": []
429
  },
430
  {
431
+ "name": "next_action: public_field_research_name_is_human_readable",
432
  "status": "pass",
433
+ "value": "Short-Horizon Intention Prediction",
434
+ "raw_hits": []
435
+ },
436
+ {
437
+ "name": "next_action: public_field_input_short_is_human_readable",
438
+ "status": "pass",
439
+ "value": "current window at time t",
440
  "raw_hits": []
441
  },
442
  {
 
452
  "raw_hits": []
453
  },
454
  {
455
+ "name": "next_action: public_field_process_short_is_human_readable",
456
  "status": "pass",
457
+ "value": "current features -> future label shift -> classifier",
458
  "raw_hits": []
459
  },
460
  {
 
463
  "value": "Use the current window to guess the action that will happen shortly after it.",
464
  "raw_hits": []
465
  },
 
 
 
 
 
 
466
  {
467
  "name": "next_action: known_task_family",
468
  "status": "pass",
 
540
  "observed": "hand_trajectory_forecast"
541
  },
542
  {
543
+ "name": "hand_trajectory_forecast: public_field_card_blurb_is_human_readable",
544
  "status": "pass",
545
+ "value": "Predict the future 3D left/right hand path from the current multimodal state.",
546
  "raw_hits": []
547
  },
548
  {
549
+ "name": "hand_trajectory_forecast: public_field_research_name_is_human_readable",
550
  "status": "pass",
551
+ "value": "3D Hand Motion Forecasting",
552
+ "raw_hits": []
553
+ },
554
+ {
555
+ "name": "hand_trajectory_forecast: public_field_input_short_is_human_readable",
556
+ "status": "pass",
557
+ "value": "current multimodal window",
558
  "raw_hits": []
559
  },
560
  {
 
570
  "raw_hits": []
571
  },
572
  {
573
+ "name": "hand_trajectory_forecast: public_field_process_short_is_human_readable",
574
  "status": "pass",
575
+ "value": "current features -> future mocap target -> regression head",
576
  "raw_hits": []
577
  },
578
  {
 
581
  "value": "Predict where the hands will move over the next few frames.",
582
  "raw_hits": []
583
  },
 
 
 
 
 
 
584
  {
585
  "name": "hand_trajectory_forecast: known_task_family",
586
  "status": "pass",
 
658
  "observed": "contact_prediction"
659
  },
660
  {
661
+ "name": "contact_prediction: public_field_card_blurb_is_human_readable",
662
  "status": "pass",
663
+ "value": "Predict whether body or hand contact with the scene is occurring without leaking contact labels.",
664
  "raw_hits": []
665
  },
666
  {
667
+ "name": "contact_prediction: public_field_research_name_is_human_readable",
668
  "status": "pass",
669
+ "value": "Human-Object Contact Prediction",
670
+ "raw_hits": []
671
+ },
672
+ {
673
+ "name": "contact_prediction: public_field_input_short_is_human_readable",
674
+ "status": "pass",
675
+ "value": "non-contact, non-caption features",
676
  "raw_hits": []
677
  },
678
  {
 
688
  "raw_hits": []
689
  },
690
  {
691
+ "name": "contact_prediction: public_field_process_short_is_human_readable",
692
  "status": "pass",
693
+ "value": "feature filter -> contact target -> binary classifier",
694
  "raw_hits": []
695
  },
696
  {
 
699
  "value": "Predict whether the body or hand is in contact with something.",
700
  "raw_hits": []
701
  },
 
 
 
 
 
 
702
  {
703
  "name": "contact_prediction: known_task_family",
704
  "status": "pass",
 
774
  "observed": "object_relevance"
775
  },
776
  {
777
+ "name": "object_relevance: public_field_card_blurb_is_human_readable",
778
  "status": "pass",
779
+ "value": "Infer which objects are relevant to the current manipulation window from non-caption features.",
780
  "raw_hits": []
781
  },
782
  {
783
+ "name": "object_relevance: public_field_research_name_is_human_readable",
784
  "status": "pass",
785
+ "value": "Object-Centric Interaction Recognition",
786
+ "raw_hits": []
787
+ },
788
+ {
789
+ "name": "object_relevance: public_field_input_short_is_human_readable",
790
+ "status": "pass",
791
+ "value": "non-caption multimodal features",
792
  "raw_hits": []
793
  },
794
  {
 
804
  "raw_hits": []
805
  },
806
  {
807
+ "name": "object_relevance: public_field_process_short_is_human_readable",
808
  "status": "pass",
809
+ "value": "object vocabulary -> multi-hot labels -> sigmoid heads",
810
  "raw_hits": []
811
  },
812
  {
 
815
  "value": "Predict which objects matter in the current window.",
816
  "raw_hits": []
817
  },
 
 
 
 
 
 
818
  {
819
  "name": "object_relevance: known_task_family",
820
  "status": "pass",
 
892
  "observed": "caption_grounding"
893
  },
894
  {
895
+ "name": "caption_grounding: public_field_card_blurb_is_human_readable",
896
  "status": "pass",
897
+ "value": "Retrieve the matching time window for an annotation-derived text query.",
898
  "raw_hits": []
899
  },
900
  {
901
+ "name": "caption_grounding: public_field_research_name_is_human_readable",
902
  "status": "pass",
903
+ "value": "Language-to-Moment Grounding",
904
+ "raw_hits": []
905
+ },
906
+ {
907
+ "name": "caption_grounding: public_field_input_short_is_human_readable",
908
+ "status": "pass",
909
+ "value": "text-like query and candidate windows",
910
  "raw_hits": []
911
  },
912
  {
 
922
  "raw_hits": []
923
  },
924
  {
925
+ "name": "caption_grounding: public_field_process_short_is_human_readable",
926
  "status": "pass",
927
+ "value": "query features -> candidate index -> cosine ranker",
928
  "raw_hits": []
929
  },
930
  {
 
933
  "value": "Given a text-like query from annotation, find the matching time window.",
934
  "raw_hits": []
935
  },
936
  {
937
  "name": "caption_grounding: known_task_family",
938
  "status": "pass",
 
1008
  "observed": "cross_modal_retrieval"
1009
  },
1010
  {
1011
+ "name": "cross_modal_retrieval: public_field_card_blurb_is_human_readable",
1012
  "status": "pass",
1013
+ "value": "Use motion, IMU, and camera-pose signals to retrieve the matching depth/video window.",
1014
  "raw_hits": []
1015
  },
1016
  {
1017
+ "name": "cross_modal_retrieval: public_field_research_name_is_human_readable",
1018
  "status": "pass",
1019
+ "value": "Multimodal Representation Retrieval",
1020
+ "raw_hits": []
1021
+ },
1022
+ {
1023
+ "name": "cross_modal_retrieval: public_field_input_short_is_human_readable",
1024
+ "status": "pass",
1025
+ "value": "motion/IMU/pose query; depth/video candidates",
1026
  "raw_hits": []
1027
  },
1028
  {
 
1038
  "raw_hits": []
1039
  },
1040
  {
1041
+ "name": "cross_modal_retrieval: public_field_process_short_is_human_readable",
1042
  "status": "pass",
1043
+ "value": "modality split -> projection -> nearest-neighbor ranker",
1044
  "raw_hits": []
1045
  },
1046
  {
 
1049
  "value": "Use one group of modalities to retrieve the matching window from another group.",
1050
  "raw_hits": []
1051
  },
1052
  {
1053
  "name": "cross_modal_retrieval: known_task_family",
1054
  "status": "pass",
 
1126
  "observed": "modality_reconstruction"
1127
  },
1128
  {
1129
+ "name": "modality_reconstruction: public_field_card_blurb_is_human_readable",
1130
  "status": "pass",
1131
+ "value": "Predict compressed depth/video feature vectors from motion, IMU, and camera-pose features.",
1132
  "raw_hits": []
1133
  },
1134
  {
1135
+ "name": "modality_reconstruction: public_field_research_name_is_human_readable",
1136
  "status": "pass",
1137
+ "value": "Modality Feature Reconstruction",
1138
+ "raw_hits": []
1139
+ },
1140
+ {
1141
+ "name": "modality_reconstruction: public_field_input_short_is_human_readable",
1142
+ "status": "pass",
1143
+ "value": "motion, IMU, and camera/pose features",
1144
  "raw_hits": []
1145
  },
1146
  {
 
1156
  "raw_hits": []
1157
  },
1158
  {
1159
+ "name": "modality_reconstruction: public_field_process_short_is_human_readable",
1160
  "status": "pass",
1161
+ "value": "source-target split -> scaler -> regression head",
1162
  "raw_hits": []
1163
  },
1164
  {
 
1167
  "value": "Predict one modality feature block from other modality blocks.",
1168
  "raw_hits": []
1169
  },
1170
  {
1171
  "name": "modality_reconstruction: known_task_family",
1172
  "status": "pass",
 
1243
  "status": "pass",
1244
  "observed": "temporal_order"
1245
  },
1246
  {
1247
  "name": "temporal_order: public_field_card_blurb_is_human_readable",
1248
  "status": "pass",
 
1250
  "raw_hits": []
1251
  },
1252
  {
1253
+ "name": "temporal_order: public_field_research_name_is_human_readable",
1254
  "status": "pass",
1255
  "value": "Temporal Order Verification",
1256
  "raw_hits": []
1257
  },
1258
  {
1259
+ "name": "temporal_order: public_field_input_short_is_human_readable",
1260
  "status": "pass",
1261
+ "value": "two adjacent windows plus difference vector",
1262
  "raw_hits": []
1263
  },
1264
  {
1265
+ "name": "temporal_order: public_field_display_name_is_human_readable",
1266
  "status": "pass",
1267
  "value": "Temporal Order Verification",
1268
  "raw_hits": []
1269
  },
1270
  {
1271
+ "name": "temporal_order: public_field_output_short_is_human_readable",
1272
  "status": "pass",
1273
+ "value": "correct or reversed",
1274
  "raw_hits": []
1275
  },
1276
  {
 
1279
  "value": "pair builder -> feature combiner -> binary classifier",
1280
  "raw_hits": []
1281
  },
1282
+ {
1283
+ "name": "temporal_order: public_field_plain_goal_is_human_readable",
1284
+ "status": "pass",
1285
+ "value": "Tell whether two nearby windows are in the correct time order.",
1286
+ "raw_hits": []
1287
+ },
1288
  {
1289
  "name": "temporal_order: known_task_family",
1290
  "status": "pass",
 
1360
  "observed": "misalignment_detection"
1361
  },
1362
  {
1363
+ "name": "misalignment_detection: public_field_card_blurb_is_human_readable",
1364
  "status": "pass",
1365
+ "value": "Detect whether motion and visual/depth streams have been artificially shifted out of sync.",
1366
  "raw_hits": []
1367
  },
1368
  {
1369
+ "name": "misalignment_detection: public_field_research_name_is_human_readable",
1370
  "status": "pass",
1371
+ "value": "Cross-Modal Misalignment Detection",
1372
+ "raw_hits": []
1373
+ },
1374
+ {
1375
+ "name": "misalignment_detection: public_field_input_short_is_human_readable",
1376
+ "status": "pass",
1377
+ "value": "motion-side and visual/depth-side feature groups",
1378
  "raw_hits": []
1379
  },
1380
  {
 
1390
  "raw_hits": []
1391
  },
1392
  {
1393
+ "name": "misalignment_detection: public_field_process_short_is_human_readable",
1394
  "status": "pass",
1395
+ "value": "aligned/shifted pairs -> feature combiner -> binary classifier",
1396
  "raw_hits": []
1397
  },
1398
  {
 
1401
  "value": "Detect when modalities that should match are shifted out of sync.",
1402
  "raw_hits": []
1403
  },
1404
  {
1405
  "name": "misalignment_detection: known_task_family",
1406
  "status": "pass",
docs/data/website_integrity.json CHANGED
@@ -1,6 +1,6 @@
1
  {
2
  "status": "pass",
3
- "generated_at_utc": "2026-06-06T14:36:10+00:00",
4
  "docs_root": "docs",
5
  "site_base": "/ropedia-xperience-10m-task-suite/",
6
  "summary": {
@@ -251,7 +251,7 @@
251
  },
252
  {
253
  "path": "data/artifact_index.json",
254
- "bytes": 37736,
255
  "top_level_type": "dict"
256
  },
257
  {
@@ -291,7 +291,7 @@
291
  },
292
  {
293
  "path": "data/mirror_parity.json",
294
- "bytes": 111950,
295
  "top_level_type": "dict"
296
  },
297
  {
@@ -301,7 +301,7 @@
301
  },
302
  {
303
  "path": "data/omni_finetune_verified_result.json",
304
- "bytes": 3145,
305
  "top_level_type": "dict"
306
  },
307
  {
@@ -321,7 +321,7 @@
321
  },
322
  {
323
  "path": "data/project_status.json",
324
- "bytes": 10977,
325
  "top_level_type": "dict"
326
  },
327
  {
 
1
  {
2
  "status": "pass",
3
+ "generated_at_utc": "2026-06-06T14:54:01+00:00",
4
  "docs_root": "docs",
5
  "site_base": "/ropedia-xperience-10m-task-suite/",
6
  "summary": {
 
251
  },
252
  {
253
  "path": "data/artifact_index.json",
254
+ "bytes": 39486,
255
  "top_level_type": "dict"
256
  },
257
  {
 
291
  },
292
  {
293
  "path": "data/mirror_parity.json",
294
+ "bytes": 126335,
295
  "top_level_type": "dict"
296
  },
297
  {
 
301
  },
302
  {
303
  "path": "data/omni_finetune_verified_result.json",
304
+ "bytes": 4142,
305
  "top_level_type": "dict"
306
  },
307
  {
 
321
  },
322
  {
323
  "path": "data/project_status.json",
324
+ "bytes": 11274,
325
  "top_level_type": "dict"
326
  },
327
  {
metrics/artifact_index.json CHANGED
@@ -1,12 +1,12 @@
1
  {
2
  "title": "Ropedia Xperience-10M Task Suite Artifact Index",
3
- "generated_at_utc": "2026-06-06T14:35:42+00:00",
4
  "status": "pass",
5
- "artifact_count": 83,
6
  "missing": [],
7
  "by_kind": {
8
  "project_path": 14,
9
- "scaleup_contract": 6,
10
  "project_scope": 1,
11
  "source_alignment": 5,
12
  "publication_workflow": 3,
@@ -28,7 +28,7 @@
28
  "onboarding_doc": 1,
29
  "generated_figure": 3,
30
  "generated_figure_assets": 1,
31
- "scaleup_status": 2,
32
  "citation": 1,
33
  "license": 1
34
  },
@@ -63,8 +63,8 @@
63
  "surface": "repo_hf",
64
  "shows": "Gives a compact current-state table for first-pass readers.",
65
  "exists": true,
66
- "bytes": 8534,
67
- "sha256": "5eb48d489da7f005baab233a94c9d6b209eb1e9ffdb138c8e0e600ece9239a29"
68
  },
69
  {
70
  "id": "project_status_json",
@@ -74,8 +74,8 @@
74
  "surface": "website_hf",
75
  "shows": "Machine-readable copy of the current project status for website and HF mirrors.",
76
  "exists": true,
77
- "bytes": 10977,
78
- "sha256": "2bb0639c137dfd6eddd337eb909292543ae2e72753dee398f8240ff35f6a3984"
79
  },
80
  {
81
  "id": "research_roadmap",
@@ -187,6 +187,17 @@
187
  "bytes": 6519,
188
  "sha256": "a3773fc681e298325e2be80556d6be6e7e30b90ba22ee24b66633f07ff9c4ea4"
189
  },
190
  {
191
  "id": "additional_development_directions",
192
  "title": "Additional development directions",
@@ -250,8 +261,8 @@
250
  "surface": "repo_hf",
251
  "shows": "Gives the human-readable map from project scope to data, tasks, platform mirrors, and scale-up status.",
252
  "exists": true,
253
- "bytes": 15660,
254
- "sha256": "a9ad335b82c35a5ac102428663ffae1c8798e90e45cc5e795c3a499b4563b417"
255
  },
256
  {
257
  "id": "official_dataset_card_alignment",
@@ -695,8 +706,8 @@
695
  "surface": "repo_hf",
696
  "shows": "Generates the selective artifact catalog from local files.",
697
  "exists": true,
698
- "bytes": 30785,
699
- "sha256": "0c42b68e44e6a32b6b5161b47161adc5ccdb57567e1462e8271ea87af50ab92d"
700
  },
701
  {
702
  "id": "publication_audit",
@@ -731,7 +742,7 @@
731
  "volatile": true,
732
  "shows": "Confirms prepared GitHub/HF Space/artifact/model mirrors share the same critical data, figure, website HTML, and validator files.",
733
  "exists": true,
734
- "bytes": 111950,
735
  "hash_policy": "existence_and_size_only"
736
  },
737
  {
@@ -933,6 +944,28 @@
933
  "bytes": 3076,
934
  "sha256": "23b87581cfc1d95b0af118a0dbb4e601f42fc6bad608759490e13a9a1ef73205"
935
  },
936
  {
937
  "id": "citation",
938
  "title": "Citation metadata",
 
1
  {
2
  "title": "Ropedia Xperience-10M Task Suite Artifact Index",
3
+ "generated_at_utc": "2026-06-06T14:53:45+00:00",
4
  "status": "pass",
5
+ "artifact_count": 86,
6
  "missing": [],
7
  "by_kind": {
8
  "project_path": 14,
9
+ "scaleup_contract": 7,
10
  "project_scope": 1,
11
  "source_alignment": 5,
12
  "publication_workflow": 3,
 
28
  "onboarding_doc": 1,
29
  "generated_figure": 3,
30
  "generated_figure_assets": 1,
31
+ "scaleup_status": 4,
32
  "citation": 1,
33
  "license": 1
34
  },
 
63
  "surface": "repo_hf",
64
  "shows": "Gives a compact current-state table for first-pass readers.",
65
  "exists": true,
66
+ "bytes": 8805,
67
+ "sha256": "4051b78674306078880de33a144a499144b2487b11455c70a364a94cefa035a7"
68
  },
69
  {
70
  "id": "project_status_json",
 
74
  "surface": "website_hf",
75
  "shows": "Machine-readable copy of the current project status for website and HF mirrors.",
76
  "exists": true,
77
+ "bytes": 11274,
78
+ "sha256": "ae2b2c520ab1e0553fa399439345edd87832fa5293d8c27ffe610ede5bfa1067"
79
  },
80
  {
81
  "id": "research_roadmap",
 
187
  "bytes": 6519,
188
  "sha256": "a3773fc681e298325e2be80556d6be6e7e30b90ba22ee24b66633f07ff9c4ea4"
189
  },
190
+ {
191
+ "id": "qwen3_omni_error_analysis_script",
192
+ "title": "Qwen3-Omni held-out error-analysis script",
193
+ "path": "scripts/omni/analyze_qwen3_omni_errors.py",
194
+ "kind": "scaleup_contract",
195
+ "surface": "repo_hf",
196
+ "shows": "Computes public-safe held-out error-analysis tables by episode, action family, train-seen status, required-modality state, and object category.",
197
+ "exists": true,
198
+ "bytes": 15676,
199
+ "sha256": "d4c7e46d9fbd5f9d84bc32374f457fd8c9d68c8faa39c77bc45770eb95d80337"
200
+ },
201
  {
202
  "id": "additional_development_directions",
203
  "title": "Additional development directions",
 
261
  "surface": "repo_hf",
262
  "shows": "Gives the human-readable map from project scope to data, tasks, platform mirrors, and scale-up status.",
263
  "exists": true,
264
+ "bytes": 16318,
265
+ "sha256": "cda5f4b5be4b7a2d26aff6ed7f930bfba13dfc463d533a9880193c0a0611b677"
266
  },
267
  {
268
  "id": "official_dataset_card_alignment",
 
706
  "surface": "repo_hf",
707
  "shows": "Generates the selective artifact catalog from local files.",
708
  "exists": true,
709
+ "bytes": 32191,
710
+ "sha256": "4a105c732d2f6c54a78333d7f47e0139325ba638027e34e6acd929a90626b8e0"
711
  },
712
  {
713
  "id": "publication_audit",
 
742
  "volatile": true,
743
  "shows": "Confirms prepared GitHub/HF Space/artifact/model mirrors share the same critical data, figure, website HTML, and validator files.",
744
  "exists": true,
745
+ "bytes": 126335,
746
  "hash_policy": "existence_and_size_only"
747
  },
748
  {
 
944
  "bytes": 3076,
945
  "sha256": "23b87581cfc1d95b0af118a0dbb4e601f42fc6bad608759490e13a9a1ef73205"
946
  },
947
+ {
948
+ "id": "qwen3_omni_error_analysis_report",
949
+ "title": "Qwen3-Omni held-out error-analysis report",
950
+ "path": "results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/ERROR_ANALYSIS.md",
951
+ "kind": "scaleup_status",
952
+ "surface": "repo_hf",
953
+ "shows": "Summarizes validation-aware Qwen3-Omni held-out failures by episode, action family, train-seen status, required-modality state, and object category.",
954
+ "exists": true,
955
+ "bytes": 3331,
956
+ "sha256": "063fcc2ebd7b57ab5b281fd5e8edc629da4e1f4e5a708483ba27375d02af9467"
957
+ },
958
+ {
959
+ "id": "qwen3_omni_error_analysis_json",
960
+ "title": "Qwen3-Omni held-out error-analysis JSON",
961
+ "path": "results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/error_analysis_summary.json",
962
+ "kind": "scaleup_status",
963
+ "surface": "repo_hf",
964
+ "shows": "Machine-readable Qwen3-Omni held-out error analysis with grouped metrics and sanitized failure examples.",
965
+ "exists": true,
966
+ "bytes": 25202,
967
+ "sha256": "c2e4eaa686f5d9739a8d0bfd8ae51a453b94019489ed84a154e2bce2fa316ff5"
968
+ },
969
  {
970
  "id": "citation",
971
  "title": "Citation metadata",
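The index entries above record `exists`, `bytes`, and `sha256` for each artifact, and the mirror-parity entry describes confirming that the prepared mirrors share the same files. The following is a minimal sketch of how such a check could be computed, not the repository's `validate_mirror_parity.py`. The function names `fingerprint` and `check_group` and the `compare_hash` flag are hypothetical. Treating `compare_hash=False` as the equivalent of the `existence_and_size_only` hash policy is an assumption.

```python
import hashlib
from pathlib import Path


def fingerprint(path):
    """Return existence, size in bytes, and SHA-256 hex digest for one file."""
    p = Path(path)
    if not p.is_file():
        return {"path": str(path), "exists": False}
    data = p.read_bytes()
    return {
        "path": str(path),
        "exists": True,
        "bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def check_group(name, local_path, mirror_paths, compare_hash=True):
    """Compare one local file against its mirror copies (hypothetical helper).

    mirror_paths maps a surface label (for example "hf_space") to a file path.
    With compare_hash=False only existence and size are compared.
    """
    local = fingerprint(local_path)
    mirrors = {surface: fingerprint(p) for surface, p in mirror_paths.items()}
    failures = []
    if not local["exists"]:
        failures.append("local: missing")
    for surface, info in mirrors.items():
        if not info["exists"]:
            failures.append(f"{surface}: missing")
        elif local["exists"]:
            if info["bytes"] != local["bytes"]:
                failures.append(f"{surface}: size mismatch")
            elif compare_hash and info["sha256"] != local["sha256"]:
                failures.append(f"{surface}: sha256 mismatch")
    return {
        "name": name,
        "status": "pass" if not failures else "fail",
        "local": local,
        "mirrors": mirrors,
        "failures": failures,
    }
```

Calling `check_group("docs/data/project_status.json", local_path, {"hf_space": mirror_path})` returns a dictionary shaped like the groups in the parity report, with an empty `failures` list when every mirror matches.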
metrics/mirror_parity.json CHANGED
@@ -1,9 +1,9 @@
1
  {
2
  "status": "pass",
3
- "generated_at_utc": "2026-06-06T14:37:36+00:00",
4
  "hf_root": "hf_publish",
5
  "summary": {
6
- "group_count": 104,
7
  "failure_count": 0,
8
  "failures_by_surface": {}
9
  },
@@ -102,27 +102,27 @@
102
  "local": {
103
  "path": "repo:docs/data/artifact_index.json",
104
  "exists": true,
105
- "bytes": 37736,
106
- "sha256": "f1d87cbabab02227b834ad333507af31a8ce309600f0e0427bb8cb59a26c3b71"
107
  },
108
  "mirrors": {
109
  "hf_space": {
110
  "path": "hf_space:data/artifact_index.json",
111
  "exists": true,
112
- "bytes": 37736,
113
- "sha256": "f1d87cbabab02227b834ad333507af31a8ce309600f0e0427bb8cb59a26c3b71"
114
  },
115
  "hf_artifacts": {
116
  "path": "hf_artifacts:docs/data/artifact_index.json",
117
  "exists": true,
118
- "bytes": 37736,
119
- "sha256": "f1d87cbabab02227b834ad333507af31a8ce309600f0e0427bb8cb59a26c3b71"
120
  },
121
  "hf_model": {
122
  "path": "hf_model:metrics/artifact_index.json",
123
  "exists": true,
124
- "bytes": 37736,
125
- "sha256": "f1d87cbabab02227b834ad333507af31a8ce309600f0e0427bb8cb59a26c3b71"
126
  }
127
  },
128
  "failures": []
@@ -350,27 +350,27 @@
350
  "local": {
351
  "path": "repo:docs/data/omni_finetune_verified_result.json",
352
  "exists": true,
353
- "bytes": 3145,
354
- "sha256": "37b001a24201ba56b327fa89f19792d64ebcdabc1faffa7e7bb4fd6b8323731a"
355
  },
356
  "mirrors": {
357
  "hf_space": {
358
  "path": "hf_space:data/omni_finetune_verified_result.json",
359
  "exists": true,
360
- "bytes": 3145,
361
- "sha256": "37b001a24201ba56b327fa89f19792d64ebcdabc1faffa7e7bb4fd6b8323731a"
362
  },
363
  "hf_artifacts": {
364
  "path": "hf_artifacts:docs/data/omni_finetune_verified_result.json",
365
  "exists": true,
366
- "bytes": 3145,
367
- "sha256": "37b001a24201ba56b327fa89f19792d64ebcdabc1faffa7e7bb4fd6b8323731a"
368
  },
369
  "hf_model": {
370
  "path": "hf_model:metrics/omni_finetune_verified_result.json",
371
  "exists": true,
372
- "bytes": 3145,
373
- "sha256": "37b001a24201ba56b327fa89f19792d64ebcdabc1faffa7e7bb4fd6b8323731a"
374
  }
375
  },
376
  "failures": []
@@ -474,27 +474,27 @@
474
  "local": {
475
  "path": "repo:docs/data/project_status.json",
476
  "exists": true,
477
- "bytes": 10977,
478
- "sha256": "2bb0639c137dfd6eddd337eb909292543ae2e72753dee398f8240ff35f6a3984"
479
  },
480
  "mirrors": {
481
  "hf_space": {
482
  "path": "hf_space:data/project_status.json",
483
  "exists": true,
484
- "bytes": 10977,
485
- "sha256": "2bb0639c137dfd6eddd337eb909292543ae2e72753dee398f8240ff35f6a3984"
486
  },
487
  "hf_artifacts": {
488
  "path": "hf_artifacts:docs/data/project_status.json",
489
  "exists": true,
490
- "bytes": 10977,
491
- "sha256": "2bb0639c137dfd6eddd337eb909292543ae2e72753dee398f8240ff35f6a3984"
492
  },
493
  "hf_model": {
494
  "path": "hf_model:metrics/project_status.json",
495
  "exists": true,
496
- "bytes": 10977,
497
- "sha256": "2bb0639c137dfd6eddd337eb909292543ae2e72753dee398f8240ff35f6a3984"
498
  }
499
  },
500
  "failures": []
@@ -506,26 +506,26 @@
506
  "path": "repo:docs/data/publication_audit.json",
507
  "exists": true,
508
  "bytes": 7237,
509
- "sha256": "bfdfb04abf62dfb3ffa596f1d9ec58fc5bac633f6c1cfb1710d3988ef635cf03"
510
  },
511
  "mirrors": {
512
  "hf_space": {
513
  "path": "hf_space:data/publication_audit.json",
514
  "exists": true,
515
  "bytes": 7237,
516
- "sha256": "bfdfb04abf62dfb3ffa596f1d9ec58fc5bac633f6c1cfb1710d3988ef635cf03"
517
  },
518
  "hf_artifacts": {
519
  "path": "hf_artifacts:docs/data/publication_audit.json",
520
  "exists": true,
521
  "bytes": 7237,
522
- "sha256": "bfdfb04abf62dfb3ffa596f1d9ec58fc5bac633f6c1cfb1710d3988ef635cf03"
523
  },
524
  "hf_model": {
525
  "path": "hf_model:metrics/publication_audit.json",
526
  "exists": true,
527
  "bytes": 7237,
528
- "sha256": "bfdfb04abf62dfb3ffa596f1d9ec58fc5bac633f6c1cfb1710d3988ef635cf03"
529
  }
530
  },
531
  "failures": []
@@ -816,26 +816,26 @@
816
  "path": "repo:docs/data/scope_claims_audit.json",
817
  "exists": true,
818
  "bytes": 20823,
819
- "sha256": "7f01728415c9c54126eab25f2ce68e563b455f02d2bf10af514463c33bc0091e"
820
  },
821
  "mirrors": {
822
  "hf_space": {
823
  "path": "hf_space:data/scope_claims_audit.json",
824
  "exists": true,
825
  "bytes": 20823,
826
- "sha256": "7f01728415c9c54126eab25f2ce68e563b455f02d2bf10af514463c33bc0091e"
827
  },
828
  "hf_artifacts": {
829
  "path": "hf_artifacts:docs/data/scope_claims_audit.json",
830
  "exists": true,
831
  "bytes": 20823,
832
- "sha256": "7f01728415c9c54126eab25f2ce68e563b455f02d2bf10af514463c33bc0091e"
833
  },
834
  "hf_model": {
835
  "path": "hf_model:metrics/scope_claims_audit.json",
836
  "exists": true,
837
  "bytes": 20823,
838
- "sha256": "7f01728415c9c54126eab25f2ce68e563b455f02d2bf10af514463c33bc0091e"
839
  }
840
  },
841
  "failures": []
@@ -940,26 +940,26 @@
940
  "path": "repo:docs/data/task_surface_integrity.json",
941
  "exists": true,
942
  "bytes": 45779,
943
- "sha256": "1ae426aea9895c32912b2c9a0e519a55912222493d3c1d72e4785d71cd3b71cb"
944
  },
945
  "mirrors": {
946
  "hf_space": {
947
  "path": "hf_space:data/task_surface_integrity.json",
948
  "exists": true,
949
  "bytes": 45779,
950
- "sha256": "1ae426aea9895c32912b2c9a0e519a55912222493d3c1d72e4785d71cd3b71cb"
951
  },
952
  "hf_artifacts": {
953
  "path": "hf_artifacts:docs/data/task_surface_integrity.json",
954
  "exists": true,
955
  "bytes": 45779,
956
- "sha256": "1ae426aea9895c32912b2c9a0e519a55912222493d3c1d72e4785d71cd3b71cb"
957
  },
958
  "hf_model": {
959
  "path": "hf_model:metrics/task_surface_integrity.json",
960
  "exists": true,
961
  "bytes": 45779,
962
- "sha256": "1ae426aea9895c32912b2c9a0e519a55912222493d3c1d72e4785d71cd3b71cb"
963
  }
964
  },
965
  "failures": []
@@ -1002,26 +1002,26 @@
1002
  "path": "repo:docs/data/website_integrity.json",
1003
  "exists": true,
1004
  "bytes": 15221,
1005
- "sha256": "08f9429aead121834f52fb108a35ff0933435d49064650b94b7ed84c1002182b"
1006
  },
1007
  "mirrors": {
1008
  "hf_space": {
1009
  "path": "hf_space:data/website_integrity.json",
1010
  "exists": true,
1011
  "bytes": 15221,
1012
- "sha256": "08f9429aead121834f52fb108a35ff0933435d49064650b94b7ed84c1002182b"
1013
  },
1014
  "hf_artifacts": {
1015
  "path": "hf_artifacts:docs/data/website_integrity.json",
1016
  "exists": true,
1017
  "bytes": 15221,
1018
- "sha256": "08f9429aead121834f52fb108a35ff0933435d49064650b94b7ed84c1002182b"
1019
  },
1020
  "hf_model": {
1021
  "path": "hf_model:metrics/website_integrity.json",
1022
  "exists": true,
1023
  "bytes": 15221,
1024
- "sha256": "08f9429aead121834f52fb108a35ff0933435d49064650b94b7ed84c1002182b"
1025
  }
1026
  },
1027
  "failures": []
@@ -1723,6 +1723,31 @@
1723
  },
1724
  "failures": []
1725
  },
1726
  {
1727
  "name": "scripts/audio_ablation_and_raw_upgrade.py",
1728
  "status": "pass",
@@ -1754,21 +1779,21 @@
1754
  "local": {
1755
  "path": "repo:scripts/build_artifact_index.py",
1756
  "exists": true,
1757
- "bytes": 30785,
1758
- "sha256": "0c42b68e44e6a32b6b5161b47161adc5ccdb57567e1462e8271ea87af50ab92d"
1759
  },
1760
  "mirrors": {
1761
  "hf_artifacts": {
1762
  "path": "hf_artifacts:scripts/build_artifact_index.py",
1763
  "exists": true,
1764
- "bytes": 30785,
1765
- "sha256": "0c42b68e44e6a32b6b5161b47161adc5ccdb57567e1462e8271ea87af50ab92d"
1766
  },
1767
  "hf_model": {
1768
  "path": "hf_model:scripts/build_artifact_index.py",
1769
  "exists": true,
1770
- "bytes": 30785,
1771
- "sha256": "0c42b68e44e6a32b6b5161b47161adc5ccdb57567e1462e8271ea87af50ab92d"
1772
  }
1773
  },
1774
  "failures": []
@@ -2054,21 +2079,21 @@
2054
  "local": {
2055
  "path": "repo:scripts/validate_mirror_parity.py",
2056
  "exists": true,
2057
- "bytes": 12642,
2058
- "sha256": "17420a261d1327c0a8acb79adb75fc15217f117216eb74acf0cab3fa36de856c"
2059
  },
2060
  "mirrors": {
2061
  "hf_artifacts": {
2062
  "path": "hf_artifacts:scripts/validate_mirror_parity.py",
2063
  "exists": true,
2064
- "bytes": 12642,
2065
- "sha256": "17420a261d1327c0a8acb79adb75fc15217f117216eb74acf0cab3fa36de856c"
2066
  },
2067
  "hf_model": {
2068
  "path": "hf_model:scripts/validate_mirror_parity.py",
2069
  "exists": true,
2070
- "bytes": 12642,
2071
- "sha256": "17420a261d1327c0a8acb79adb75fc15217f117216eb74acf0cab3fa36de856c"
2072
  }
2073
  },
2074
  "failures": []
@@ -2807,6 +2832,285 @@
2807
  },
2808
  "failures": []
2809
  },
2810
  {
2811
  "name": "docs/QUALITY_GATES.md",
2812
  "status": "pass",
@@ -3061,27 +3365,27 @@
3061
  "local": {
3062
  "path": "repo:PROJECT_STATUS.md",
3063
  "exists": true,
3064
- "bytes": 8534,
3065
- "sha256": "5eb48d489da7f005baab233a94c9d6b209eb1e9ffdb138c8e0e600ece9239a29"
3066
  },
3067
  "mirrors": {
3068
  "hf_space": {
3069
  "path": "hf_space:PROJECT_STATUS.md",
3070
  "exists": true,
3071
- "bytes": 8534,
3072
- "sha256": "5eb48d489da7f005baab233a94c9d6b209eb1e9ffdb138c8e0e600ece9239a29"
3073
  },
3074
  "hf_artifacts": {
3075
  "path": "hf_artifacts:PROJECT_STATUS.md",
3076
  "exists": true,
3077
- "bytes": 8534,
3078
- "sha256": "5eb48d489da7f005baab233a94c9d6b209eb1e9ffdb138c8e0e600ece9239a29"
3079
  },
3080
  "hf_model": {
3081
  "path": "hf_model:PROJECT_STATUS.md",
3082
  "exists": true,
3083
- "bytes": 8534,
3084
- "sha256": "5eb48d489da7f005baab233a94c9d6b209eb1e9ffdb138c8e0e600ece9239a29"
3085
  }
3086
  },
3087
  "failures": []
 
1
  {
2
  "status": "pass",
3
+ "generated_at_utc": "2026-06-06T14:56:44+00:00",
4
  "hf_root": "hf_publish",
5
  "summary": {
6
+ "group_count": 114,
7
  "failure_count": 0,
8
  "failures_by_surface": {}
9
  },
 
102
  "local": {
103
  "path": "repo:docs/data/artifact_index.json",
104
  "exists": true,
105
+ "bytes": 39486,
106
+ "sha256": "87782cd08bc1106d694a727e21333450d2965b48c48f500d1b6f4294d7b247d0"
107
  },
108
  "mirrors": {
109
  "hf_space": {
110
  "path": "hf_space:data/artifact_index.json",
111
  "exists": true,
112
+ "bytes": 39486,
113
+ "sha256": "87782cd08bc1106d694a727e21333450d2965b48c48f500d1b6f4294d7b247d0"
114
  },
115
  "hf_artifacts": {
116
  "path": "hf_artifacts:docs/data/artifact_index.json",
117
  "exists": true,
118
+ "bytes": 39486,
119
+ "sha256": "87782cd08bc1106d694a727e21333450d2965b48c48f500d1b6f4294d7b247d0"
120
  },
121
  "hf_model": {
122
  "path": "hf_model:metrics/artifact_index.json",
123
  "exists": true,
124
+ "bytes": 39486,
125
+ "sha256": "87782cd08bc1106d694a727e21333450d2965b48c48f500d1b6f4294d7b247d0"
126
  }
127
  },
128
  "failures": []
 
350
  "local": {
351
  "path": "repo:docs/data/omni_finetune_verified_result.json",
352
  "exists": true,
353
+ "bytes": 4142,
354
+ "sha256": "297aa6fc86bc09ba7968f3c5c2db265320c0613c5ec9a36701114ba451321b81"
355
  },
356
  "mirrors": {
357
  "hf_space": {
358
  "path": "hf_space:data/omni_finetune_verified_result.json",
359
  "exists": true,
360
+ "bytes": 4142,
361
+ "sha256": "297aa6fc86bc09ba7968f3c5c2db265320c0613c5ec9a36701114ba451321b81"
362
  },
363
  "hf_artifacts": {
364
  "path": "hf_artifacts:docs/data/omni_finetune_verified_result.json",
365
  "exists": true,
366
+ "bytes": 4142,
367
+ "sha256": "297aa6fc86bc09ba7968f3c5c2db265320c0613c5ec9a36701114ba451321b81"
368
  },
369
  "hf_model": {
370
  "path": "hf_model:metrics/omni_finetune_verified_result.json",
371
  "exists": true,
372
+ "bytes": 4142,
373
+ "sha256": "297aa6fc86bc09ba7968f3c5c2db265320c0613c5ec9a36701114ba451321b81"
374
  }
375
  },
376
  "failures": []
 
474
  "local": {
475
  "path": "repo:docs/data/project_status.json",
476
  "exists": true,
477
+ "bytes": 11274,
478
+ "sha256": "ae2b2c520ab1e0553fa399439345edd87832fa5293d8c27ffe610ede5bfa1067"
479
  },
480
  "mirrors": {
481
  "hf_space": {
482
  "path": "hf_space:data/project_status.json",
483
  "exists": true,
484
+ "bytes": 11274,
485
+ "sha256": "ae2b2c520ab1e0553fa399439345edd87832fa5293d8c27ffe610ede5bfa1067"
486
  },
487
  "hf_artifacts": {
488
  "path": "hf_artifacts:docs/data/project_status.json",
489
  "exists": true,
490
+ "bytes": 11274,
491
+ "sha256": "ae2b2c520ab1e0553fa399439345edd87832fa5293d8c27ffe610ede5bfa1067"
492
  },
493
  "hf_model": {
494
  "path": "hf_model:metrics/project_status.json",
495
  "exists": true,
496
+ "bytes": 11274,
497
+ "sha256": "ae2b2c520ab1e0553fa399439345edd87832fa5293d8c27ffe610ede5bfa1067"
498
  }
499
  },
500
  "failures": []
 
506
  "path": "repo:docs/data/publication_audit.json",
507
  "exists": true,
508
  "bytes": 7237,
509
+ "sha256": "8a21c29d92f3a15b835c37d7784c17fada3edbda050515deed8e440535ed046d"
510
  },
511
  "mirrors": {
512
  "hf_space": {
513
  "path": "hf_space:data/publication_audit.json",
514
  "exists": true,
515
  "bytes": 7237,
516
+ "sha256": "8a21c29d92f3a15b835c37d7784c17fada3edbda050515deed8e440535ed046d"
517
  },
518
  "hf_artifacts": {
519
  "path": "hf_artifacts:docs/data/publication_audit.json",
520
  "exists": true,
521
  "bytes": 7237,
522
+ "sha256": "8a21c29d92f3a15b835c37d7784c17fada3edbda050515deed8e440535ed046d"
523
  },
524
  "hf_model": {
525
  "path": "hf_model:metrics/publication_audit.json",
526
  "exists": true,
527
  "bytes": 7237,
528
+ "sha256": "8a21c29d92f3a15b835c37d7784c17fada3edbda050515deed8e440535ed046d"
529
  }
530
  },
531
  "failures": []
 
816
  "path": "repo:docs/data/scope_claims_audit.json",
817
  "exists": true,
818
  "bytes": 20823,
819
+ "sha256": "77402dc77c4ecf5cf1e68480ae2c9822a134ae7ef4a24a7b8b9008a2509c2fa3"
820
  },
821
  "mirrors": {
822
  "hf_space": {
823
  "path": "hf_space:data/scope_claims_audit.json",
824
  "exists": true,
825
  "bytes": 20823,
826
+ "sha256": "77402dc77c4ecf5cf1e68480ae2c9822a134ae7ef4a24a7b8b9008a2509c2fa3"
827
  },
828
  "hf_artifacts": {
829
  "path": "hf_artifacts:docs/data/scope_claims_audit.json",
830
  "exists": true,
831
  "bytes": 20823,
832
+ "sha256": "77402dc77c4ecf5cf1e68480ae2c9822a134ae7ef4a24a7b8b9008a2509c2fa3"
833
  },
834
  "hf_model": {
835
  "path": "hf_model:metrics/scope_claims_audit.json",
836
  "exists": true,
837
  "bytes": 20823,
838
+ "sha256": "77402dc77c4ecf5cf1e68480ae2c9822a134ae7ef4a24a7b8b9008a2509c2fa3"
839
  }
840
  },
841
  "failures": []
 
940
  "path": "repo:docs/data/task_surface_integrity.json",
941
  "exists": true,
942
  "bytes": 45779,
943
+ "sha256": "8232e2bafa8b5157d97c018e41be5da3ec69ddb4d2020a0dcc7c6377c5575bb6"
944
  },
945
  "mirrors": {
946
  "hf_space": {
947
  "path": "hf_space:data/task_surface_integrity.json",
948
  "exists": true,
949
  "bytes": 45779,
950
+ "sha256": "8232e2bafa8b5157d97c018e41be5da3ec69ddb4d2020a0dcc7c6377c5575bb6"
951
  },
952
  "hf_artifacts": {
953
  "path": "hf_artifacts:docs/data/task_surface_integrity.json",
954
  "exists": true,
955
  "bytes": 45779,
956
+ "sha256": "8232e2bafa8b5157d97c018e41be5da3ec69ddb4d2020a0dcc7c6377c5575bb6"
957
  },
958
  "hf_model": {
959
  "path": "hf_model:metrics/task_surface_integrity.json",
960
  "exists": true,
961
  "bytes": 45779,
962
+ "sha256": "8232e2bafa8b5157d97c018e41be5da3ec69ddb4d2020a0dcc7c6377c5575bb6"
963
  }
964
  },
965
  "failures": []
 
1002
  "path": "repo:docs/data/website_integrity.json",
1003
  "exists": true,
1004
  "bytes": 15221,
1005
+ "sha256": "dcbd09b4c4522770c43504c500eb653de706538516ee2ec72e491ffc3416c6e2"
1006
  },
1007
  "mirrors": {
1008
  "hf_space": {
1009
  "path": "hf_space:data/website_integrity.json",
1010
  "exists": true,
1011
  "bytes": 15221,
1012
+ "sha256": "dcbd09b4c4522770c43504c500eb653de706538516ee2ec72e491ffc3416c6e2"
1013
  },
1014
  "hf_artifacts": {
1015
  "path": "hf_artifacts:docs/data/website_integrity.json",
1016
  "exists": true,
1017
  "bytes": 15221,
1018
+ "sha256": "dcbd09b4c4522770c43504c500eb653de706538516ee2ec72e491ffc3416c6e2"
1019
  },
1020
  "hf_model": {
1021
  "path": "hf_model:metrics/website_integrity.json",
1022
  "exists": true,
1023
  "bytes": 15221,
1024
+ "sha256": "dcbd09b4c4522770c43504c500eb653de706538516ee2ec72e491ffc3416c6e2"
1025
  }
1026
  },
1027
  "failures": []
 
1723
  },
1724
  "failures": []
1725
  },
1726
+ {
1727
+ "name": "scripts/omni/analyze_qwen3_omni_errors.py",
1728
+ "status": "pass",
1729
+ "local": {
1730
+ "path": "repo:scripts/omni/analyze_qwen3_omni_errors.py",
1731
+ "exists": true,
1732
+ "bytes": 15676,
1733
+ "sha256": "d4c7e46d9fbd5f9d84bc32374f457fd8c9d68c8faa39c77bc45770eb95d80337"
1734
+ },
1735
+ "mirrors": {
1736
+ "hf_artifacts": {
1737
+ "path": "hf_artifacts:scripts/omni/analyze_qwen3_omni_errors.py",
1738
+ "exists": true,
1739
+ "bytes": 15676,
1740
+ "sha256": "d4c7e46d9fbd5f9d84bc32374f457fd8c9d68c8faa39c77bc45770eb95d80337"
1741
+ },
1742
+ "hf_model": {
1743
+ "path": "hf_model:scripts/omni/analyze_qwen3_omni_errors.py",
1744
+ "exists": true,
1745
+ "bytes": 15676,
1746
+ "sha256": "d4c7e46d9fbd5f9d84bc32374f457fd8c9d68c8faa39c77bc45770eb95d80337"
1747
+ }
1748
+ },
1749
+ "failures": []
1750
+ },
1751
  {
1752
  "name": "scripts/audio_ablation_and_raw_upgrade.py",
1753
  "status": "pass",
 
1779
  "local": {
1780
  "path": "repo:scripts/build_artifact_index.py",
1781
  "exists": true,
1782
+ "bytes": 32191,
1783
+ "sha256": "4a105c732d2f6c54a78333d7f47e0139325ba638027e34e6acd929a90626b8e0"
1784
  },
1785
  "mirrors": {
1786
  "hf_artifacts": {
1787
  "path": "hf_artifacts:scripts/build_artifact_index.py",
1788
  "exists": true,
1789
+ "bytes": 32191,
1790
+ "sha256": "4a105c732d2f6c54a78333d7f47e0139325ba638027e34e6acd929a90626b8e0"
1791
  },
1792
  "hf_model": {
1793
  "path": "hf_model:scripts/build_artifact_index.py",
1794
  "exists": true,
1795
+ "bytes": 32191,
1796
+ "sha256": "4a105c732d2f6c54a78333d7f47e0139325ba638027e34e6acd929a90626b8e0"
1797
  }
1798
  },
1799
  "failures": []
 
2079
  "local": {
2080
  "path": "repo:scripts/validate_mirror_parity.py",
2081
  "exists": true,
2082
+ "bytes": 13781,
2083
+ "sha256": "3659adf936b058617dde97ee4c424615a361e59f5ea74975116422dfe01768e8"
2084
  },
2085
  "mirrors": {
2086
  "hf_artifacts": {
2087
  "path": "hf_artifacts:scripts/validate_mirror_parity.py",
2088
  "exists": true,
2089
+ "bytes": 13781,
2090
+ "sha256": "3659adf936b058617dde97ee4c424615a361e59f5ea74975116422dfe01768e8"
2091
  },
2092
  "hf_model": {
2093
  "path": "hf_model:scripts/validate_mirror_parity.py",
2094
  "exists": true,
2095
+ "bytes": 13781,
2096
+ "sha256": "3659adf936b058617dde97ee4c424615a361e59f5ea74975116422dfe01768e8"
2097
  }
2098
  },
2099
  "failures": []
 
2832
  },
2833
  "failures": []
2834
  },
2835
+ {
2836
+ "name": "results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/ERROR_ANALYSIS.md",
2837
+ "status": "pass",
2838
+ "local": {
2839
+ "path": "repo:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/ERROR_ANALYSIS.md",
2840
+ "exists": true,
2841
+ "bytes": 3331,
2842
+ "sha256": "063fcc2ebd7b57ab5b281fd5e8edc629da4e1f4e5a708483ba27375d02af9467"
2843
+ },
2844
+ "mirrors": {
2845
+ "hf_space": {
2846
+ "path": "hf_space:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/ERROR_ANALYSIS.md",
2847
+ "exists": true,
2848
+ "bytes": 3331,
2849
+ "sha256": "063fcc2ebd7b57ab5b281fd5e8edc629da4e1f4e5a708483ba27375d02af9467"
2850
+ },
2851
+ "hf_artifacts": {
2852
+ "path": "hf_artifacts:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/ERROR_ANALYSIS.md",
2853
+ "exists": true,
2854
+ "bytes": 3331,
2855
+ "sha256": "063fcc2ebd7b57ab5b281fd5e8edc629da4e1f4e5a708483ba27375d02af9467"
2856
+ },
2857
+ "hf_model": {
2858
+ "path": "hf_model:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/ERROR_ANALYSIS.md",
2859
+ "exists": true,
2860
+ "bytes": 3331,
2861
+ "sha256": "063fcc2ebd7b57ab5b281fd5e8edc629da4e1f4e5a708483ba27375d02af9467"
2862
+ }
2863
+ },
2864
+ "failures": []
2865
+ },
2866
+ {
2867
+ "name": "results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/error_analysis_summary.json",
2868
+ "status": "pass",
2869
+ "local": {
2870
+ "path": "repo:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/error_analysis_summary.json",
2871
+ "exists": true,
2872
+ "bytes": 25202,
2873
+ "sha256": "c2e4eaa686f5d9739a8d0bfd8ae51a453b94019489ed84a154e2bce2fa316ff5"
2874
+ },
2875
+ "mirrors": {
2876
+ "hf_space": {
2877
+ "path": "hf_space:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/error_analysis_summary.json",
2878
+ "exists": true,
2879
+ "bytes": 25202,
2880
+ "sha256": "c2e4eaa686f5d9739a8d0bfd8ae51a453b94019489ed84a154e2bce2fa316ff5"
2881
+ },
2882
+ "hf_artifacts": {
2883
+ "path": "hf_artifacts:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/error_analysis_summary.json",
2884
+ "exists": true,
2885
+ "bytes": 25202,
2886
+ "sha256": "c2e4eaa686f5d9739a8d0bfd8ae51a453b94019489ed84a154e2bce2fa316ff5"
2887
+ },
2888
+ "hf_model": {
2889
+ "path": "hf_model:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/error_analysis_summary.json",
2890
+ "exists": true,
2891
+ "bytes": 25202,
2892
+ "sha256": "c2e4eaa686f5d9739a8d0bfd8ae51a453b94019489ed84a154e2bce2fa316ff5"
2893
+ }
2894
+ },
2895
+ "failures": []
2896
+ },
2897
+ {
2898
+ "name": "results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/episode_error_analysis.csv",
2899
+ "status": "pass",
2900
+ "local": {
2901
+ "path": "repo:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/episode_error_analysis.csv",
2902
+ "exists": true,
2903
+ "bytes": 2121,
2904
+ "sha256": "7f0bc74140f100b9fe444c38eb74d155605bfc5984f665e653a2cd34a5cb96bd"
2905
+ },
2906
+ "mirrors": {
2907
+ "hf_space": {
2908
+ "path": "hf_space:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/episode_error_analysis.csv",
2909
+ "exists": true,
2910
+ "bytes": 2121,
2911
+ "sha256": "7f0bc74140f100b9fe444c38eb74d155605bfc5984f665e653a2cd34a5cb96bd"
2912
+ },
2913
+ "hf_artifacts": {
2914
+ "path": "hf_artifacts:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/episode_error_analysis.csv",
2915
+ "exists": true,
2916
+ "bytes": 2121,
2917
+ "sha256": "7f0bc74140f100b9fe444c38eb74d155605bfc5984f665e653a2cd34a5cb96bd"
2918
+ },
2919
+ "hf_model": {
2920
+ "path": "hf_model:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/episode_error_analysis.csv",
2921
+ "exists": true,
2922
+ "bytes": 2121,
2923
+ "sha256": "7f0bc74140f100b9fe444c38eb74d155605bfc5984f665e653a2cd34a5cb96bd"
2924
+ }
2925
+ },
2926
+ "failures": []
2927
+ },
2928
+ {
2929
+ "name": "results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/action_family_error_analysis.csv",
2930
+ "status": "pass",
2931
+ "local": {
2932
+ "path": "repo:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/action_family_error_analysis.csv",
2933
+ "exists": true,
2934
+ "bytes": 1320,
2935
+ "sha256": "e15bf22e96b887c4b00aeb8ba548f4fd72ea0aab0772cc59e9bdda517ad72430"
2936
+ },
2937
+ "mirrors": {
2938
+ "hf_space": {
2939
+ "path": "hf_space:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/action_family_error_analysis.csv",
2940
+ "exists": true,
2941
+ "bytes": 1320,
2942
+ "sha256": "e15bf22e96b887c4b00aeb8ba548f4fd72ea0aab0772cc59e9bdda517ad72430"
2943
+ },
2944
+ "hf_artifacts": {
2945
+ "path": "hf_artifacts:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/action_family_error_analysis.csv",
2946
+ "exists": true,
2947
+ "bytes": 1320,
2948
+ "sha256": "e15bf22e96b887c4b00aeb8ba548f4fd72ea0aab0772cc59e9bdda517ad72430"
2949
+ },
2950
+ "hf_model": {
2951
+ "path": "hf_model:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/action_family_error_analysis.csv",
2952
+ "exists": true,
2953
+ "bytes": 1320,
2954
+ "sha256": "e15bf22e96b887c4b00aeb8ba548f4fd72ea0aab0772cc59e9bdda517ad72430"
2955
+ }
2956
+ },
2957
+ "failures": []
2958
+ },
2959
+ {
2960
+ "name": "results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/train_seen_error_analysis.csv",
2961
+ "status": "pass",
2962
+ "local": {
2963
+ "path": "repo:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/train_seen_error_analysis.csv",
2964
+ "exists": true,
2965
+ "bytes": 572,
2966
+ "sha256": "cb196616b6f073266087d8cb7182e36c0a761607f3082ad78c350fd99e1996e7"
2967
+ },
2968
+ "mirrors": {
2969
+ "hf_space": {
2970
+ "path": "hf_space:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/train_seen_error_analysis.csv",
2971
+ "exists": true,
2972
+ "bytes": 572,
2973
+ "sha256": "cb196616b6f073266087d8cb7182e36c0a761607f3082ad78c350fd99e1996e7"
2974
+ },
2975
+ "hf_artifacts": {
2976
+ "path": "hf_artifacts:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/train_seen_error_analysis.csv",
2977
+ "exists": true,
2978
+ "bytes": 572,
2979
+ "sha256": "cb196616b6f073266087d8cb7182e36c0a761607f3082ad78c350fd99e1996e7"
2980
+ },
2981
+ "hf_model": {
2982
+ "path": "hf_model:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/train_seen_error_analysis.csv",
2983
+ "exists": true,
2984
+ "bytes": 572,
2985
+ "sha256": "cb196616b6f073266087d8cb7182e36c0a761607f3082ad78c350fd99e1996e7"
2986
+ }
2987
+ },
2988
+ "failures": []
2989
+ },
2990
+ {
2991
+ "name": "results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/missing_modality_error_analysis.csv",
2992
+ "status": "pass",
2993
+ "local": {
2994
+ "path": "repo:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/missing_modality_error_analysis.csv",
2995
+ "exists": true,
2996
+ "bytes": 408,
2997
+ "sha256": "6447cf285b466a914055adb0aef4f3d47bf82d33a277d8ca2e6f22c4f0f2a7f7"
2998
+ },
2999
+ "mirrors": {
3000
+ "hf_space": {
3001
+ "path": "hf_space:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/missing_modality_error_analysis.csv",
3002
+ "exists": true,
3003
+ "bytes": 408,
3004
+ "sha256": "6447cf285b466a914055adb0aef4f3d47bf82d33a277d8ca2e6f22c4f0f2a7f7"
3005
+ },
3006
+ "hf_artifacts": {
3007
+ "path": "hf_artifacts:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/missing_modality_error_analysis.csv",
3008
+ "exists": true,
3009
+ "bytes": 408,
3010
+ "sha256": "6447cf285b466a914055adb0aef4f3d47bf82d33a277d8ca2e6f22c4f0f2a7f7"
3011
+ },
3012
+ "hf_model": {
3013
+ "path": "hf_model:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/missing_modality_error_analysis.csv",
3014
+ "exists": true,
3015
+ "bytes": 408,
3016
+ "sha256": "6447cf285b466a914055adb0aef4f3d47bf82d33a277d8ca2e6f22c4f0f2a7f7"
3017
+ }
3018
+ },
3019
+ "failures": []
3020
+ },
3021
+ {
3022
+ "name": "results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/object_category_error_analysis.csv",
3023
+ "status": "pass",
3024
+ "local": {
3025
+ "path": "repo:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/object_category_error_analysis.csv",
3026
+ "exists": true,
3027
+ "bytes": 1704,
3028
+ "sha256": "f9cbd5e566ef666fe2d1050cc5bdadc7967a2056bdaa1e2e9f88fb0c22ee0ef8"
3029
+ },
3030
+ "mirrors": {
3031
+ "hf_space": {
3032
+ "path": "hf_space:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/object_category_error_analysis.csv",
3033
+ "exists": true,
3034
+ "bytes": 1704,
3035
+ "sha256": "f9cbd5e566ef666fe2d1050cc5bdadc7967a2056bdaa1e2e9f88fb0c22ee0ef8"
3036
+ },
3037
+ "hf_artifacts": {
3038
+ "path": "hf_artifacts:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/object_category_error_analysis.csv",
3039
+ "exists": true,
3040
+ "bytes": 1704,
3041
+ "sha256": "f9cbd5e566ef666fe2d1050cc5bdadc7967a2056bdaa1e2e9f88fb0c22ee0ef8"
3042
+ },
3043
+ "hf_model": {
3044
+ "path": "hf_model:results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/object_category_error_analysis.csv",
3045
+ "exists": true,
3046
+ "bytes": 1704,
3047
+ "sha256": "f9cbd5e566ef666fe2d1050cc5bdadc7967a2056bdaa1e2e9f88fb0c22ee0ef8"
3048
+ }
3049
+ },
3050
+ "failures": []
3051
+ },
3052
+ {
3053
+ "name": "docs/ARTIFACT_GUIDE.md",
3054
+ "status": "pass",
3055
+ "local": {
3056
+ "path": "repo:ARTIFACT_GUIDE.md",
3057
+ "exists": true,
3058
+ "bytes": 16318,
3059
+ "sha256": "cda5f4b5be4b7a2d26aff6ed7f930bfba13dfc463d533a9880193c0a0611b677"
3060
+ },
3061
+ "mirrors": {
3062
+ "hf_space": {
3063
+ "path": "hf_space:ARTIFACT_GUIDE.md",
3064
+ "exists": true,
3065
+ "bytes": 16318,
3066
+ "sha256": "cda5f4b5be4b7a2d26aff6ed7f930bfba13dfc463d533a9880193c0a0611b677"
3067
+ },
3068
+ "hf_artifacts": {
3069
+ "path": "hf_artifacts:ARTIFACT_GUIDE.md",
3070
+ "exists": true,
3071
+ "bytes": 16318,
3072
+ "sha256": "cda5f4b5be4b7a2d26aff6ed7f930bfba13dfc463d533a9880193c0a0611b677"
3073
+ },
3074
+ "hf_model": {
3075
+ "path": "hf_model:ARTIFACT_GUIDE.md",
3076
+ "exists": true,
3077
+ "bytes": 16318,
3078
+ "sha256": "cda5f4b5be4b7a2d26aff6ed7f930bfba13dfc463d533a9880193c0a0611b677"
3079
+ }
3080
+ },
3081
+ "failures": []
3082
+ },
3083
+ {
3084
+ "name": "docs/OMNI_MODEL_EXTENSION_CONTRACT.md",
3085
+ "status": "pass",
3086
+ "local": {
3087
+ "path": "repo:OMNI_MODEL_EXTENSION_CONTRACT.md",
3088
+ "exists": true,
3089
+ "bytes": 8900,
3090
+ "sha256": "c4e51d0aa7536045c229418603a67c6b3c5f31c9d756ca7395cb0c9455f0ed6d"
3091
+ },
3092
+ "mirrors": {
3093
+ "hf_space": {
3094
+ "path": "hf_space:OMNI_MODEL_EXTENSION_CONTRACT.md",
3095
+ "exists": true,
3096
+ "bytes": 8900,
3097
+ "sha256": "c4e51d0aa7536045c229418603a67c6b3c5f31c9d756ca7395cb0c9455f0ed6d"
3098
+ },
3099
+ "hf_artifacts": {
3100
+ "path": "hf_artifacts:OMNI_MODEL_EXTENSION_CONTRACT.md",
3101
+ "exists": true,
3102
+ "bytes": 8900,
3103
+ "sha256": "c4e51d0aa7536045c229418603a67c6b3c5f31c9d756ca7395cb0c9455f0ed6d"
3104
+ },
3105
+ "hf_model": {
3106
+ "path": "hf_model:OMNI_MODEL_EXTENSION_CONTRACT.md",
3107
+ "exists": true,
3108
+ "bytes": 8900,
3109
+ "sha256": "c4e51d0aa7536045c229418603a67c6b3c5f31c9d756ca7395cb0c9455f0ed6d"
3110
+ }
3111
+ },
3112
+ "failures": []
3113
+ },
3114
  {
3115
  "name": "docs/QUALITY_GATES.md",
3116
  "status": "pass",
 
3365
  "local": {
3366
  "path": "repo:PROJECT_STATUS.md",
3367
  "exists": true,
3368
+ "bytes": 8805,
3369
+ "sha256": "4051b78674306078880de33a144a499144b2487b11455c70a364a94cefa035a7"
3370
  },
3371
  "mirrors": {
3372
  "hf_space": {
3373
  "path": "hf_space:PROJECT_STATUS.md",
3374
  "exists": true,
3375
+ "bytes": 8805,
3376
+ "sha256": "4051b78674306078880de33a144a499144b2487b11455c70a364a94cefa035a7"
3377
  },
3378
  "hf_artifacts": {
3379
  "path": "hf_artifacts:PROJECT_STATUS.md",
3380
  "exists": true,
3381
+ "bytes": 8805,
3382
+ "sha256": "4051b78674306078880de33a144a499144b2487b11455c70a364a94cefa035a7"
3383
  },
3384
  "hf_model": {
3385
  "path": "hf_model:PROJECT_STATUS.md",
3386
  "exists": true,
3387
+ "bytes": 8805,
3388
+ "sha256": "4051b78674306078880de33a144a499144b2487b11455c70a364a94cefa035a7"
3389
  }
3390
  },
3391
  "failures": []
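The parity records above share one shape: a `local` entry and a `mirrors` map, each carrying `exists`, `bytes`, and `sha256`, plus a `failures` list. The following is a minimal sketch of how one such record could be re-checked. It is not the contents of `scripts/validate_mirror_parity.py`, and the helper names `describe` and `check_record` are hypothetical.

```python
import hashlib
from pathlib import Path


def describe(path: Path) -> dict:
    """Return exists/bytes/sha256 for a file, mirroring the record fields above."""
    if not path.is_file():
        return {"exists": False, "bytes": None, "sha256": None}
    data = path.read_bytes()
    return {
        "exists": True,
        "bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def check_record(record: dict) -> list:
    """Compare every mirror entry with the local entry and list any mismatches."""
    local = record["local"]
    failures = []
    for mirror_name, mirror in record["mirrors"].items():
        for field in ("exists", "bytes", "sha256"):
            if mirror.get(field) != local.get(field):
                failures.append(f"{mirror_name}: {field} differs from local")
    return failures


# Values copied from the PROJECT_STATUS.md record above (two of its mirrors shown).
digest = "4051b78674306078880de33a144a499144b2487b11455c70a364a94cefa035a7"
entry = {"exists": True, "bytes": 8805, "sha256": digest}
record = {
    "local": dict(entry),
    "mirrors": {"hf_space": dict(entry), "hf_model": dict(entry)},
}
failures = check_record(record)
status = "pass" if not failures else "fail"
print(status, failures)
```

In practice `describe` would be called on the local file and on each downloaded mirror copy to fill the record before `check_record` runs; how the real script obtains the mirror copies is not shown in this diff.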
metrics/omni_finetune_verified_result.json CHANGED
@@ -67,7 +67,28 @@
  "audit_status": "pass",
  "contains_raw_xperience10m_data": false,
  "contains_qwen_base_weights": false,
- "contains_lora_weights": false
+ "contains_lora_weights": false,
+ "error_analysis": {
+ "status": "pass",
+ "path": "results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/error_analysis_summary.json",
+ "markdown_report": "results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/ERROR_ANALYSIS.md",
+ "groupings": [
+ "episode",
+ "action_family",
+ "train_seen_status",
+ "required_modality_state",
+ "object_category"
+ ],
+ "key_readouts": {
+ "parsed_prediction_rate": 0.8772321428571429,
+ "weakest_action_family": "locomotion",
+ "weakest_action_family_samples": 23,
+ "weakest_action_family_parsed_prediction_rate": 0.2608695652173913,
+ "seen_action_exact_rate": 0.04580152671755725,
+ "unseen_action_exact_rate": 0.015772870662460567,
+ "required_modality_state": "rrd_missing_only_required_modalities_present"
+ }
+ }
  },
  "required_next_steps": [
  "Improve JSON-format reliability through prompt, decoding, constrained parsing, or target formatting changes.",
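The `key_readouts` block added in this hunk stores rates as raw fractions. As a reading aid, the sketch below formats them as percentages and derives two comparisons from the recorded values only. The helper name `as_percent` and the derived variable names are hypothetical, and the implied count is simply samples times rate, rounded.

```python
# Values copied from the "key_readouts" block in this hunk.
key_readouts = {
    "parsed_prediction_rate": 0.8772321428571429,
    "weakest_action_family": "locomotion",
    "weakest_action_family_samples": 23,
    "weakest_action_family_parsed_prediction_rate": 0.2608695652173913,
    "seen_action_exact_rate": 0.04580152671755725,
    "unseen_action_exact_rate": 0.015772870662460567,
}


def as_percent(rate: float) -> str:
    """Format a fraction in [0, 1] as a percentage with two decimals."""
    return f"{rate * 100:.2f}%"


# Parsed predictions implied for the weakest family: samples * rate, rounded.
implied_parsed = round(
    key_readouts["weakest_action_family_samples"]
    * key_readouts["weakest_action_family_parsed_prediction_rate"]
)

# How many times higher the exact-match rate is on seen actions than on unseen ones.
seen_to_unseen_ratio = (
    key_readouts["seen_action_exact_rate"] / key_readouts["unseen_action_exact_rate"]
)

print("overall parsed:", as_percent(key_readouts["parsed_prediction_rate"]))
print(
    key_readouts["weakest_action_family"],
    "parsed:",
    as_percent(key_readouts["weakest_action_family_parsed_prediction_rate"]),
    f"(about {implied_parsed} of {key_readouts['weakest_action_family_samples']})",
)
print(f"seen/unseen exact-rate ratio: {seen_to_unseen_ratio:.2f}")
```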
metrics/project_status.json CHANGED
@@ -180,10 +180,12 @@
  "evidence": [
  "docs/data/omni_finetune_verified_result.json",
  "results/omni_finetune/verified_public/",
+ "results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/",
  "scripts/omni/package_verified_omni_result.py",
- "scripts/omni/audit_verified_omni_package.py"
+ "scripts/omni/audit_verified_omni_package.py",
+ "scripts/omni/analyze_qwen3_omni_errors.py"
  ],
- "readout": "The selected 96/16/16 episode split produced a validation-aware public-safe held-out package with 3,808 exported windows, 512 validation windows, and 448 test predictions. JSON validity is 87.50%, below the 98% target, so it is a stronger diagnostic baseline but not a strong model-quality result."
+ "readout": "The selected 96/16/16 episode split produced a validation-aware public-safe held-out package with 3,808 exported windows, 512 validation windows, 448 test predictions, and derived error-analysis tables by episode, action family, train-seen status, required-modality state, and object category. JSON validity is 87.50%, below the 98% target, so it is a diagnostic baseline but not a strong model-quality result."
  },
  {
  "area": "Raw Xperience-10M redistribution",
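The readout in this hunk compares an observed JSON validity of 87.50% with a 98% target. A minimal sketch of that comparison follows. The function name `gate` and the label strings are hypothetical and are not taken from the project's audit scripts.

```python
JSON_VALIDITY_TARGET = 0.98   # target stated in the readout
observed_json_validity = 0.8750  # value stated in the readout


def gate(observed: float, target: float) -> str:
    """Label a rate as meeting the target or falling below it."""
    return "meets_target" if observed >= target else "below_target"


# Shortfall expressed in percentage points.
shortfall_pp = (JSON_VALIDITY_TARGET - observed_json_validity) * 100

print(gate(observed_json_validity, JSON_VALIDITY_TARGET))
print(f"shortfall: {shortfall_pp:.1f} percentage points")
```

The label is why the readout calls the result a diagnostic baseline: the rate is reported as-is rather than treated as a passed quality gate.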
metrics/publication_audit.json CHANGED
@@ -1,6 +1,6 @@
  {
  "status": "pass",
- "generated_at_utc": "2026-06-06T14:38:05+00:00",
+ "generated_at_utc": "2026-06-06T14:54:02+00:00",
  "checks": [
  {
  "name": "required_publication_assets_present",
@@ -182,8 +182,8 @@
  "github_repo": {
  "root": "repo",
  "exists": true,
- "file_count": 442,
- "text_file_count": 372,
+ "file_count": 450,
+ "text_file_count": 380,
  "largest_file": {
  "path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
  "bytes": 55702978
@@ -193,8 +193,8 @@
  "hf_space_bundle": {
  "root": "hf_publish/space",
  "exists": true,
- "file_count": 356,
- "text_file_count": 286,
+ "file_count": 363,
+ "text_file_count": 293,
  "largest_file": {
  "path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
  "bytes": 55702978
@@ -204,8 +204,8 @@
  "hf_artifact_bundle": {
  "root": "hf_publish/artifacts",
  "exists": true,
- "file_count": 514,
- "text_file_count": 420,
+ "file_count": 522,
+ "text_file_count": 428,
  "largest_file": {
  "path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
  "bytes": 55702978
@@ -215,8 +215,8 @@
  "hf_model_bundle": {
  "root": "hf_publish/model",
  "exists": true,
- "file_count": 701,
- "text_file_count": 572,
+ "file_count": 709,
+ "text_file_count": 580,
  "largest_file": {
  "path": "pytorch_model.bin",
  "bytes": 93495480
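The four bundle hunks above change only `file_count` and `text_file_count`. A small sketch that tabulates the per-bundle deltas from the recorded before and after values is shown below. The dictionary layout and the name `deltas` are illustrative only.

```python
# (file_count, text_file_count) before and after this commit, copied from the hunks above.
before = {
    "github_repo": (442, 372),
    "hf_space_bundle": (356, 286),
    "hf_artifact_bundle": (514, 420),
    "hf_model_bundle": (701, 572),
}
after = {
    "github_repo": (450, 380),
    "hf_space_bundle": (363, 293),
    "hf_artifact_bundle": (522, 428),
    "hf_model_bundle": (709, 580),
}

# Per-bundle change in each counter.
deltas = {
    bundle: {
        "file_count": after[bundle][0] - before[bundle][0],
        "text_file_count": after[bundle][1] - before[bundle][1],
    }
    for bundle in before
}

for bundle, change in deltas.items():
    print(f"{bundle}: +{change['file_count']} files, +{change['text_file_count']} text files")
```

In every bundle the two counters grow by the same amount, so all files added by this commit are counted as text files.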
metrics/scope_claims_audit.json CHANGED
@@ -1,6 +1,6 @@
  {
  "status": "pass",
- "generated_at_utc": "2026-06-06T14:35:59+00:00",
+ "generated_at_utc": "2026-06-06T14:54:01+00:00",
  "summary": {
  "qwen3_omni_verified_diagnostic_pilot": true,
  "dataset_manifest_num_episodes": 119,
metrics/task_surface_integrity.json CHANGED
@@ -1,6 +1,6 @@
1
  {
2
  "status": "pass",
3
- "generated_at_utc": "2026-06-06T14:35:59+00:00",
4
  "summary": {
5
  "task_count": 12,
6
  "expected_task_count": 12,
@@ -64,15 +64,21 @@
64
  "observed": "timeline_action"
65
  },
66
  {
67
- "name": "timeline_action: public_field_input_short_is_human_readable",
68
  "status": "pass",
69
- "value": "20-frame multimodal window",
70
  "raw_hits": []
71
  },
72
  {
73
- "name": "timeline_action: public_field_card_blurb_is_human_readable",
74
  "status": "pass",
75
- "value": "Recognize the current manipulation action from synchronized visual, motion, inertial, pose, and annotation context.",
76
  "raw_hits": []
77
  },
78
  {
@@ -88,9 +94,9 @@
88
  "raw_hits": []
89
  },
90
  {
91
- "name": "timeline_action: public_field_research_name_is_human_readable",
92
  "status": "pass",
93
- "value": "Egocentric Action Recognition",
94
  "raw_hits": []
95
  },
96
  {
@@ -99,12 +105,6 @@
99
  "value": "Look at one short multimodal window and name what action is happening now.",
100
  "raw_hits": []
101
  },
102
- {
103
- "name": "timeline_action: public_field_process_short_is_human_readable",
104
- "status": "pass",
105
- "value": "window features -> action label builder -> classifier",
106
- "raw_hits": []
107
- },
108
  {
109
  "name": "timeline_action: known_task_family",
110
  "status": "pass",
@@ -184,15 +184,21 @@
184
  "observed": "timeline_subtask"
185
  },
186
  {
187
- "name": "timeline_subtask: public_field_input_short_is_human_readable",
188
  "status": "pass",
189
- "value": "20-frame multimodal window",
190
  "raw_hits": []
191
  },
192
  {
193
- "name": "timeline_subtask: public_field_card_blurb_is_human_readable",
194
  "status": "pass",
195
- "value": "Recognize the broader activity stage so fine actions become a readable procedure timeline.",
196
  "raw_hits": []
197
  },
198
  {
@@ -208,9 +214,9 @@
208
  "raw_hits": []
209
  },
210
  {
211
- "name": "timeline_subtask: public_field_research_name_is_human_readable",
212
  "status": "pass",
213
- "value": "Temporal Subtask Recognition",
214
  "raw_hits": []
215
  },
216
  {
@@ -219,12 +225,6 @@
219
  "value": "Predict the higher-level task stage for the current window.",
220
  "raw_hits": []
221
  },
222
- {
223
- "name": "timeline_subtask: public_field_process_short_is_human_readable",
224
- "status": "pass",
225
- "value": "window features -> subtask label builder -> classifier",
226
- "raw_hits": []
227
- },
228
  {
229
  "name": "timeline_subtask: known_task_family",
230
  "status": "pass",
@@ -304,15 +304,21 @@
304
  "observed": "transition_detection"
305
  },
306
  {
307
- "name": "transition_detection: public_field_input_short_is_human_readable",
308
  "status": "pass",
309
- "value": "current window with boundary target",
310
  "raw_hits": []
311
  },
312
  {
313
- "name": "transition_detection: public_field_card_blurb_is_human_readable",
314
  "status": "pass",
315
- "value": "Detect the local moment where the episode changes from one action segment to the next.",
316
  "raw_hits": []
317
  },
318
  {
@@ -328,9 +334,9 @@
328
  "raw_hits": []
329
  },
330
  {
331
- "name": "transition_detection: public_field_research_name_is_human_readable",
332
  "status": "pass",
333
- "value": "Temporal Action Segmentation",
334
  "raw_hits": []
335
  },
336
  {
@@ -339,12 +345,6 @@
339
  "value": "Detect whether the current window is near a boundary between actions.",
340
  "raw_hits": []
341
  },
342
- {
343
- "name": "transition_detection: public_field_process_short_is_human_readable",
344
- "status": "pass",
345
- "value": "action changes -> boundary labels -> binary classifier",
346
- "raw_hits": []
347
- },
348
  {
349
  "name": "transition_detection: known_task_family",
350
  "status": "pass",
@@ -422,15 +422,21 @@
422
  "observed": "next_action"
423
  },
424
  {
425
- "name": "next_action: public_field_input_short_is_human_readable",
426
  "status": "pass",
427
- "value": "current window at time t",
428
  "raw_hits": []
429
  },
430
  {
431
- "name": "next_action: public_field_card_blurb_is_human_readable",
432
  "status": "pass",
433
- "value": "Forecast the near-future action from the current observations only.",
434
  "raw_hits": []
435
  },
436
  {
@@ -446,9 +452,9 @@
446
  "raw_hits": []
447
  },
448
  {
449
- "name": "next_action: public_field_research_name_is_human_readable",
450
  "status": "pass",
451
- "value": "Short-Horizon Intention Prediction",
452
  "raw_hits": []
453
  },
454
  {
@@ -457,12 +463,6 @@
457
  "value": "Use the current window to guess the action that will happen shortly after it.",
458
  "raw_hits": []
459
  },
460
- {
461
- "name": "next_action: public_field_process_short_is_human_readable",
462
- "status": "pass",
463
- "value": "current features -> future label shift -> classifier",
464
- "raw_hits": []
465
- },
466
  {
467
  "name": "next_action: known_task_family",
468
  "status": "pass",
@@ -540,15 +540,21 @@
540
  "observed": "hand_trajectory_forecast"
541
  },
542
  {
543
- "name": "hand_trajectory_forecast: public_field_input_short_is_human_readable",
544
  "status": "pass",
545
- "value": "current multimodal window",
546
  "raw_hits": []
547
  },
548
  {
549
- "name": "hand_trajectory_forecast: public_field_card_blurb_is_human_readable",
550
  "status": "pass",
551
- "value": "Predict the future 3D left/right hand path from the current multimodal state.",
552
  "raw_hits": []
553
  },
554
  {
@@ -564,9 +570,9 @@
564
  "raw_hits": []
565
  },
566
  {
567
- "name": "hand_trajectory_forecast: public_field_research_name_is_human_readable",
568
  "status": "pass",
569
- "value": "3D Hand Motion Forecasting",
570
  "raw_hits": []
571
  },
572
  {
@@ -575,12 +581,6 @@
575
  "value": "Predict where the hands will move over the next few frames.",
576
  "raw_hits": []
577
  },
578
- {
579
- "name": "hand_trajectory_forecast: public_field_process_short_is_human_readable",
580
- "status": "pass",
581
- "value": "current features -> future mocap target -> regression head",
582
- "raw_hits": []
583
- },
584
  {
585
  "name": "hand_trajectory_forecast: known_task_family",
586
  "status": "pass",
@@ -658,15 +658,21 @@
658
  "observed": "contact_prediction"
659
  },
660
  {
661
- "name": "contact_prediction: public_field_input_short_is_human_readable",
662
  "status": "pass",
663
- "value": "non-contact, non-caption features",
664
  "raw_hits": []
665
  },
666
  {
667
- "name": "contact_prediction: public_field_card_blurb_is_human_readable",
668
  "status": "pass",
669
- "value": "Predict whether body or hand contact with the scene is occurring without leaking contact labels.",
670
  "raw_hits": []
671
  },
672
  {
@@ -682,9 +688,9 @@
682
  "raw_hits": []
683
  },
684
  {
685
- "name": "contact_prediction: public_field_research_name_is_human_readable",
686
  "status": "pass",
687
- "value": "Human-Object Contact Prediction",
688
  "raw_hits": []
689
  },
690
  {
@@ -693,12 +699,6 @@
693
  "value": "Predict whether the body or hand is in contact with something.",
694
  "raw_hits": []
695
  },
696
- {
697
- "name": "contact_prediction: public_field_process_short_is_human_readable",
698
- "status": "pass",
699
- "value": "feature filter -> contact target -> binary classifier",
700
- "raw_hits": []
701
- },
702
  {
703
  "name": "contact_prediction: known_task_family",
704
  "status": "pass",
@@ -774,15 +774,21 @@
774
  "observed": "object_relevance"
775
  },
776
  {
777
- "name": "object_relevance: public_field_input_short_is_human_readable",
778
  "status": "pass",
779
- "value": "non-caption multimodal features",
780
  "raw_hits": []
781
  },
782
  {
783
- "name": "object_relevance: public_field_card_blurb_is_human_readable",
784
  "status": "pass",
785
- "value": "Infer which objects are relevant to the current manipulation window from non-caption features.",
786
  "raw_hits": []
787
  },
788
  {
@@ -798,9 +804,9 @@
798
  "raw_hits": []
799
  },
800
  {
801
- "name": "object_relevance: public_field_research_name_is_human_readable",
802
  "status": "pass",
803
- "value": "Object-Centric Interaction Recognition",
804
  "raw_hits": []
805
  },
806
  {
@@ -809,12 +815,6 @@
809
  "value": "Predict which objects matter in the current window.",
810
  "raw_hits": []
811
  },
812
- {
813
- "name": "object_relevance: public_field_process_short_is_human_readable",
814
- "status": "pass",
815
- "value": "object vocabulary -> multi-hot labels -> sigmoid heads",
816
- "raw_hits": []
817
- },
818
  {
819
  "name": "object_relevance: known_task_family",
820
  "status": "pass",
@@ -892,15 +892,21 @@
892
  "observed": "caption_grounding"
893
  },
894
  {
895
- "name": "caption_grounding: public_field_input_short_is_human_readable",
896
  "status": "pass",
897
- "value": "text-like query and candidate windows",
898
  "raw_hits": []
899
  },
900
  {
901
- "name": "caption_grounding: public_field_card_blurb_is_human_readable",
902
  "status": "pass",
903
- "value": "Retrieve the matching time window for an annotation-derived text query.",
904
  "raw_hits": []
905
  },
906
  {
@@ -916,9 +922,9 @@
916
  "raw_hits": []
917
  },
918
  {
919
- "name": "caption_grounding: public_field_research_name_is_human_readable",
920
  "status": "pass",
921
- "value": "Language-to-Moment Grounding",
922
  "raw_hits": []
923
  },
924
  {
@@ -927,12 +933,6 @@
927
  "value": "Given a text-like query from annotation, find the matching time window.",
928
  "raw_hits": []
929
  },
930
- {
931
- "name": "caption_grounding: public_field_process_short_is_human_readable",
932
- "status": "pass",
933
- "value": "query features -> candidate index -> cosine ranker",
934
- "raw_hits": []
935
- },
936
  {
937
  "name": "caption_grounding: known_task_family",
938
  "status": "pass",
@@ -1008,15 +1008,21 @@
1008
  "observed": "cross_modal_retrieval"
1009
  },
1010
  {
1011
- "name": "cross_modal_retrieval: public_field_input_short_is_human_readable",
1012
  "status": "pass",
1013
- "value": "motion/IMU/pose query; depth/video candidates",
1014
  "raw_hits": []
1015
  },
1016
  {
1017
- "name": "cross_modal_retrieval: public_field_card_blurb_is_human_readable",
1018
  "status": "pass",
1019
- "value": "Use motion, IMU, and camera-pose signals to retrieve the matching depth/video window.",
 
 
 
 
 
 
1020
  "raw_hits": []
1021
  },
1022
  {
@@ -1032,9 +1038,9 @@
1032
  "raw_hits": []
1033
  },
1034
  {
1035
- "name": "cross_modal_retrieval: public_field_research_name_is_human_readable",
1036
  "status": "pass",
1037
- "value": "Multimodal Representation Retrieval",
1038
  "raw_hits": []
1039
  },
1040
  {
@@ -1043,12 +1049,6 @@
1043
  "value": "Use one group of modalities to retrieve the matching window from another group.",
1044
  "raw_hits": []
1045
  },
1046
- {
1047
- "name": "cross_modal_retrieval: public_field_process_short_is_human_readable",
1048
- "status": "pass",
1049
- "value": "modality split -> projection -> nearest-neighbor ranker",
1050
- "raw_hits": []
1051
- },
1052
  {
1053
  "name": "cross_modal_retrieval: known_task_family",
1054
  "status": "pass",
@@ -1126,15 +1126,21 @@
1126
  "observed": "modality_reconstruction"
1127
  },
1128
  {
1129
- "name": "modality_reconstruction: public_field_input_short_is_human_readable",
1130
  "status": "pass",
1131
- "value": "motion, IMU, and camera/pose features",
1132
  "raw_hits": []
1133
  },
1134
  {
1135
- "name": "modality_reconstruction: public_field_card_blurb_is_human_readable",
1136
  "status": "pass",
1137
- "value": "Predict compressed depth/video feature vectors from motion, IMU, and camera-pose features.",
 
 
 
 
 
 
1138
  "raw_hits": []
1139
  },
1140
  {
@@ -1150,9 +1156,9 @@
1150
  "raw_hits": []
1151
  },
1152
  {
1153
- "name": "modality_reconstruction: public_field_research_name_is_human_readable",
1154
  "status": "pass",
1155
- "value": "Modality Feature Reconstruction",
1156
  "raw_hits": []
1157
  },
1158
  {
@@ -1161,12 +1167,6 @@
1161
  "value": "Predict one modality feature block from other modality blocks.",
1162
  "raw_hits": []
1163
  },
1164
- {
1165
- "name": "modality_reconstruction: public_field_process_short_is_human_readable",
1166
- "status": "pass",
1167
- "value": "source-target split -> scaler -> regression head",
1168
- "raw_hits": []
1169
- },
1170
  {
1171
  "name": "modality_reconstruction: known_task_family",
1172
  "status": "pass",
@@ -1243,12 +1243,6 @@
1243
  "status": "pass",
1244
  "observed": "temporal_order"
1245
  },
1246
- {
1247
- "name": "temporal_order: public_field_input_short_is_human_readable",
1248
- "status": "pass",
1249
- "value": "two adjacent windows plus difference vector",
1250
- "raw_hits": []
1251
- },
1252
  {
1253
  "name": "temporal_order: public_field_card_blurb_is_human_readable",
1254
  "status": "pass",
@@ -1256,27 +1250,27 @@
1256
  "raw_hits": []
1257
  },
1258
  {
1259
- "name": "temporal_order: public_field_display_name_is_human_readable",
1260
  "status": "pass",
1261
  "value": "Temporal Order Verification",
1262
  "raw_hits": []
1263
  },
1264
  {
1265
- "name": "temporal_order: public_field_output_short_is_human_readable",
1266
  "status": "pass",
1267
- "value": "correct or reversed",
1268
  "raw_hits": []
1269
  },
1270
  {
1271
- "name": "temporal_order: public_field_research_name_is_human_readable",
1272
  "status": "pass",
1273
  "value": "Temporal Order Verification",
1274
  "raw_hits": []
1275
  },
1276
  {
1277
- "name": "temporal_order: public_field_plain_goal_is_human_readable",
1278
  "status": "pass",
1279
- "value": "Tell whether two nearby windows are in the correct time order.",
1280
  "raw_hits": []
1281
  },
1282
  {
@@ -1285,6 +1279,12 @@
1285
  "value": "pair builder -> feature combiner -> binary classifier",
1286
  "raw_hits": []
1287
  },
 
 
 
 
 
 
1288
  {
1289
  "name": "temporal_order: known_task_family",
1290
  "status": "pass",
@@ -1360,15 +1360,21 @@
1360
  "observed": "misalignment_detection"
1361
  },
1362
  {
1363
- "name": "misalignment_detection: public_field_input_short_is_human_readable",
1364
  "status": "pass",
1365
- "value": "motion-side and visual/depth-side feature groups",
1366
  "raw_hits": []
1367
  },
1368
  {
1369
- "name": "misalignment_detection: public_field_card_blurb_is_human_readable",
1370
  "status": "pass",
1371
- "value": "Detect whether motion and visual/depth streams have been artificially shifted out of sync.",
 
 
 
 
 
 
1372
  "raw_hits": []
1373
  },
1374
  {
@@ -1384,9 +1390,9 @@
1384
  "raw_hits": []
1385
  },
1386
  {
1387
- "name": "misalignment_detection: public_field_research_name_is_human_readable",
1388
  "status": "pass",
1389
- "value": "Cross-Modal Misalignment Detection",
1390
  "raw_hits": []
1391
  },
1392
  {
@@ -1395,12 +1401,6 @@
1395
  "value": "Detect when modalities that should match are shifted out of sync.",
1396
  "raw_hits": []
1397
  },
1398
- {
1399
- "name": "misalignment_detection: public_field_process_short_is_human_readable",
1400
- "status": "pass",
1401
- "value": "aligned/shifted pairs -> feature combiner -> binary classifier",
1402
- "raw_hits": []
1403
- },
1404
  {
1405
  "name": "misalignment_detection: known_task_family",
1406
  "status": "pass",
 
1
  {
2
  "status": "pass",
3
+ "generated_at_utc": "2026-06-06T14:53:59+00:00",
4
  "summary": {
5
  "task_count": 12,
6
  "expected_task_count": 12,
 
64
  "observed": "timeline_action"
65
  },
66
  {
67
+ "name": "timeline_action: public_field_card_blurb_is_human_readable",
68
  "status": "pass",
69
+ "value": "Recognize the current manipulation action from synchronized visual, motion, inertial, pose, and annotation context.",
70
  "raw_hits": []
71
  },
72
  {
73
+ "name": "timeline_action: public_field_research_name_is_human_readable",
74
  "status": "pass",
75
+ "value": "Egocentric Action Recognition",
76
+ "raw_hits": []
77
+ },
78
+ {
79
+ "name": "timeline_action: public_field_input_short_is_human_readable",
80
+ "status": "pass",
81
+ "value": "20-frame multimodal window",
82
  "raw_hits": []
83
  },
84
  {
 
94
  "raw_hits": []
95
  },
96
  {
97
+ "name": "timeline_action: public_field_process_short_is_human_readable",
98
  "status": "pass",
99
+ "value": "window features -> action label builder -> classifier",
100
  "raw_hits": []
101
  },
102
  {
 
105
  "value": "Look at one short multimodal window and name what action is happening now.",
106
  "raw_hits": []
107
  },
 
 
 
 
 
 
108
  {
109
  "name": "timeline_action: known_task_family",
110
  "status": "pass",
 
184
  "observed": "timeline_subtask"
185
  },
186
  {
187
+ "name": "timeline_subtask: public_field_card_blurb_is_human_readable",
188
  "status": "pass",
189
+ "value": "Recognize the broader activity stage so fine actions become a readable procedure timeline.",
190
  "raw_hits": []
191
  },
192
  {
193
+ "name": "timeline_subtask: public_field_research_name_is_human_readable",
194
  "status": "pass",
195
+ "value": "Temporal Subtask Recognition",
196
+ "raw_hits": []
197
+ },
198
+ {
199
+ "name": "timeline_subtask: public_field_input_short_is_human_readable",
200
+ "status": "pass",
201
+ "value": "20-frame multimodal window",
202
  "raw_hits": []
203
  },
204
  {
 
214
  "raw_hits": []
215
  },
216
  {
217
+ "name": "timeline_subtask: public_field_process_short_is_human_readable",
218
  "status": "pass",
219
+ "value": "window features -> subtask label builder -> classifier",
220
  "raw_hits": []
221
  },
222
  {
 
225
  "value": "Predict the higher-level task stage for the current window.",
226
  "raw_hits": []
227
  },
 
 
 
 
 
 
228
  {
229
  "name": "timeline_subtask: known_task_family",
230
  "status": "pass",
 
304
  "observed": "transition_detection"
305
  },
306
  {
307
+ "name": "transition_detection: public_field_card_blurb_is_human_readable",
308
  "status": "pass",
309
+ "value": "Detect the local moment where the episode changes from one action segment to the next.",
310
  "raw_hits": []
311
  },
312
  {
313
+ "name": "transition_detection: public_field_research_name_is_human_readable",
314
  "status": "pass",
315
+ "value": "Temporal Action Segmentation",
316
+ "raw_hits": []
317
+ },
318
+ {
319
+ "name": "transition_detection: public_field_input_short_is_human_readable",
320
+ "status": "pass",
321
+ "value": "current window with boundary target",
322
  "raw_hits": []
323
  },
324
  {
 
334
  "raw_hits": []
335
  },
336
  {
337
+ "name": "transition_detection: public_field_process_short_is_human_readable",
338
  "status": "pass",
339
+ "value": "action changes -> boundary labels -> binary classifier",
340
  "raw_hits": []
341
  },
342
  {
 
345
  "value": "Detect whether the current window is near a boundary between actions.",
346
  "raw_hits": []
347
  },
 
 
 
 
 
 
348
  {
349
  "name": "transition_detection: known_task_family",
350
  "status": "pass",
 
422
  "observed": "next_action"
423
  },
424
  {
425
+ "name": "next_action: public_field_card_blurb_is_human_readable",
426
  "status": "pass",
427
+ "value": "Forecast the near-future action from the current observations only.",
428
  "raw_hits": []
429
  },
430
  {
431
+ "name": "next_action: public_field_research_name_is_human_readable",
432
  "status": "pass",
433
+ "value": "Short-Horizon Intention Prediction",
434
+ "raw_hits": []
435
+ },
436
+ {
437
+ "name": "next_action: public_field_input_short_is_human_readable",
438
+ "status": "pass",
439
+ "value": "current window at time t",
440
  "raw_hits": []
441
  },
442
  {
 
452
  "raw_hits": []
453
  },
454
  {
455
+ "name": "next_action: public_field_process_short_is_human_readable",
456
  "status": "pass",
457
+ "value": "current features -> future label shift -> classifier",
458
  "raw_hits": []
459
  },
460
  {
 
463
  "value": "Use the current window to guess the action that will happen shortly after it.",
464
  "raw_hits": []
465
  },
 
 
 
 
 
 
466
  {
467
  "name": "next_action: known_task_family",
468
  "status": "pass",
 
540
  "observed": "hand_trajectory_forecast"
541
  },
542
  {
543
+ "name": "hand_trajectory_forecast: public_field_card_blurb_is_human_readable",
544
  "status": "pass",
545
+ "value": "Predict the future 3D left/right hand path from the current multimodal state.",
546
  "raw_hits": []
547
  },
548
  {
549
+ "name": "hand_trajectory_forecast: public_field_research_name_is_human_readable",
550
  "status": "pass",
551
+ "value": "3D Hand Motion Forecasting",
552
+ "raw_hits": []
553
+ },
554
+ {
555
+ "name": "hand_trajectory_forecast: public_field_input_short_is_human_readable",
556
+ "status": "pass",
557
+ "value": "current multimodal window",
558
  "raw_hits": []
559
  },
560
  {
 
570
  "raw_hits": []
571
  },
572
  {
573
+ "name": "hand_trajectory_forecast: public_field_process_short_is_human_readable",
574
  "status": "pass",
575
+ "value": "current features -> future mocap target -> regression head",
576
  "raw_hits": []
577
  },
578
  {
 
581
  "value": "Predict where the hands will move over the next few frames.",
582
  "raw_hits": []
583
  },
 
 
 
 
 
 
584
  {
585
  "name": "hand_trajectory_forecast: known_task_family",
586
  "status": "pass",
 
658
  "observed": "contact_prediction"
659
  },
660
  {
661
+ "name": "contact_prediction: public_field_card_blurb_is_human_readable",
662
  "status": "pass",
663
+ "value": "Predict whether body or hand contact with the scene is occurring without leaking contact labels.",
664
  "raw_hits": []
665
  },
666
  {
667
+ "name": "contact_prediction: public_field_research_name_is_human_readable",
668
  "status": "pass",
669
+ "value": "Human-Object Contact Prediction",
670
+ "raw_hits": []
671
+ },
672
+ {
673
+ "name": "contact_prediction: public_field_input_short_is_human_readable",
674
+ "status": "pass",
675
+ "value": "non-contact, non-caption features",
676
  "raw_hits": []
677
  },
678
  {
 
688
  "raw_hits": []
689
  },
690
  {
691
+ "name": "contact_prediction: public_field_process_short_is_human_readable",
692
  "status": "pass",
693
+ "value": "feature filter -> contact target -> binary classifier",
694
  "raw_hits": []
695
  },
696
  {
 
699
  "value": "Predict whether the body or hand is in contact with something.",
700
  "raw_hits": []
701
  },
 
 
 
 
 
 
702
  {
703
  "name": "contact_prediction: known_task_family",
704
  "status": "pass",
 
774
  "observed": "object_relevance"
775
  },
776
  {
777
+ "name": "object_relevance: public_field_card_blurb_is_human_readable",
778
  "status": "pass",
779
+ "value": "Infer which objects are relevant to the current manipulation window from non-caption features.",
780
  "raw_hits": []
781
  },
782
  {
783
+ "name": "object_relevance: public_field_research_name_is_human_readable",
784
  "status": "pass",
785
+ "value": "Object-Centric Interaction Recognition",
786
+ "raw_hits": []
787
+ },
788
+ {
789
+ "name": "object_relevance: public_field_input_short_is_human_readable",
790
+ "status": "pass",
791
+ "value": "non-caption multimodal features",
792
  "raw_hits": []
793
  },
794
  {
 
804
  "raw_hits": []
805
  },
806
  {
807
+ "name": "object_relevance: public_field_process_short_is_human_readable",
808
  "status": "pass",
809
+ "value": "object vocabulary -> multi-hot labels -> sigmoid heads",
810
  "raw_hits": []
811
  },
812
  {
 
815
  "value": "Predict which objects matter in the current window.",
816
  "raw_hits": []
817
  },
 
 
 
 
 
 
818
  {
819
  "name": "object_relevance: known_task_family",
820
  "status": "pass",
 
892
  "observed": "caption_grounding"
893
  },
894
  {
895
+ "name": "caption_grounding: public_field_card_blurb_is_human_readable",
896
  "status": "pass",
897
+ "value": "Retrieve the matching time window for an annotation-derived text query.",
898
  "raw_hits": []
899
  },
900
  {
901
+ "name": "caption_grounding: public_field_research_name_is_human_readable",
902
  "status": "pass",
903
+ "value": "Language-to-Moment Grounding",
904
+ "raw_hits": []
905
+ },
906
+ {
907
+ "name": "caption_grounding: public_field_input_short_is_human_readable",
908
+ "status": "pass",
909
+ "value": "text-like query and candidate windows",
910
  "raw_hits": []
911
  },
912
  {
 
922
  "raw_hits": []
923
  },
924
  {
925
+ "name": "caption_grounding: public_field_process_short_is_human_readable",
926
  "status": "pass",
927
+ "value": "query features -> candidate index -> cosine ranker",
928
  "raw_hits": []
929
  },
930
  {
 
933
  "value": "Given a text-like query from annotation, find the matching time window.",
934
  "raw_hits": []
935
  },
 
 
 
 
 
 
936
  {
937
  "name": "caption_grounding: known_task_family",
938
  "status": "pass",
 
1008
  "observed": "cross_modal_retrieval"
1009
  },
1010
  {
1011
+ "name": "cross_modal_retrieval: public_field_card_blurb_is_human_readable",
1012
  "status": "pass",
1013
+ "value": "Use motion, IMU, and camera-pose signals to retrieve the matching depth/video window.",
1014
  "raw_hits": []
1015
  },
1016
  {
1017
+ "name": "cross_modal_retrieval: public_field_research_name_is_human_readable",
1018
  "status": "pass",
1019
+ "value": "Multimodal Representation Retrieval",
1020
+ "raw_hits": []
1021
+ },
1022
+ {
1023
+ "name": "cross_modal_retrieval: public_field_input_short_is_human_readable",
1024
+ "status": "pass",
1025
+ "value": "motion/IMU/pose query; depth/video candidates",
1026
  "raw_hits": []
1027
  },
1028
  {
 
1038
  "raw_hits": []
1039
  },
1040
  {
1041
+ "name": "cross_modal_retrieval: public_field_process_short_is_human_readable",
1042
  "status": "pass",
1043
+ "value": "modality split -> projection -> nearest-neighbor ranker",
1044
  "raw_hits": []
1045
  },
1046
  {
 
1049
  "value": "Use one group of modalities to retrieve the matching window from another group.",
1050
  "raw_hits": []
1051
  },
 
 
 
 
 
 
1052
  {
1053
  "name": "cross_modal_retrieval: known_task_family",
1054
  "status": "pass",
 
1126
  "observed": "modality_reconstruction"
1127
  },
1128
  {
1129
+ "name": "modality_reconstruction: public_field_card_blurb_is_human_readable",
1130
  "status": "pass",
1131
+ "value": "Predict compressed depth/video feature vectors from motion, IMU, and camera-pose features.",
1132
  "raw_hits": []
1133
  },
1134
  {
1135
+ "name": "modality_reconstruction: public_field_research_name_is_human_readable",
1136
  "status": "pass",
1137
+ "value": "Modality Feature Reconstruction",
1138
+ "raw_hits": []
1139
+ },
1140
+ {
1141
+ "name": "modality_reconstruction: public_field_input_short_is_human_readable",
1142
+ "status": "pass",
1143
+ "value": "motion, IMU, and camera/pose features",
1144
  "raw_hits": []
1145
  },
1146
  {
 
1156
  "raw_hits": []
1157
  },
1158
  {
1159
+ "name": "modality_reconstruction: public_field_process_short_is_human_readable",
1160
  "status": "pass",
1161
+ "value": "source-target split -> scaler -> regression head",
1162
  "raw_hits": []
1163
  },
1164
  {
 
1167
  "value": "Predict one modality feature block from other modality blocks.",
1168
  "raw_hits": []
1169
  },
 
 
 
 
 
 
1170
  {
1171
  "name": "modality_reconstruction: known_task_family",
1172
  "status": "pass",
 
1243
  "status": "pass",
1244
  "observed": "temporal_order"
1245
  },
 
 
 
 
 
 
1246
  {
1247
  "name": "temporal_order: public_field_card_blurb_is_human_readable",
1248
  "status": "pass",
 
1250
  "raw_hits": []
1251
  },
1252
  {
1253
+ "name": "temporal_order: public_field_research_name_is_human_readable",
1254
  "status": "pass",
1255
  "value": "Temporal Order Verification",
1256
  "raw_hits": []
1257
  },
1258
  {
1259
+ "name": "temporal_order: public_field_input_short_is_human_readable",
1260
  "status": "pass",
1261
+ "value": "two adjacent windows plus difference vector",
1262
  "raw_hits": []
1263
  },
1264
  {
1265
+ "name": "temporal_order: public_field_display_name_is_human_readable",
1266
  "status": "pass",
1267
  "value": "Temporal Order Verification",
1268
  "raw_hits": []
1269
  },
1270
  {
1271
+ "name": "temporal_order: public_field_output_short_is_human_readable",
1272
  "status": "pass",
1273
+ "value": "correct or reversed",
1274
  "raw_hits": []
1275
  },
1276
  {
 
1279
  "value": "pair builder -> feature combiner -> binary classifier",
1280
  "raw_hits": []
1281
  },
1282
+ {
1283
+ "name": "temporal_order: public_field_plain_goal_is_human_readable",
1284
+ "status": "pass",
1285
+ "value": "Tell whether two nearby windows are in the correct time order.",
1286
+ "raw_hits": []
1287
+ },
1288
  {
1289
  "name": "temporal_order: known_task_family",
1290
  "status": "pass",
 
1360
  "observed": "misalignment_detection"
1361
  },
1362
  {
1363
+ "name": "misalignment_detection: public_field_card_blurb_is_human_readable",
1364
  "status": "pass",
1365
+ "value": "Detect whether motion and visual/depth streams have been artificially shifted out of sync.",
1366
  "raw_hits": []
1367
  },
1368
  {
1369
+ "name": "misalignment_detection: public_field_research_name_is_human_readable",
1370
  "status": "pass",
1371
+ "value": "Cross-Modal Misalignment Detection",
1372
+ "raw_hits": []
1373
+ },
1374
+ {
1375
+ "name": "misalignment_detection: public_field_input_short_is_human_readable",
1376
+ "status": "pass",
1377
+ "value": "motion-side and visual/depth-side feature groups",
1378
  "raw_hits": []
1379
  },
1380
  {
 
1390
  "raw_hits": []
1391
  },
1392
  {
1393
+ "name": "misalignment_detection: public_field_process_short_is_human_readable",
1394
  "status": "pass",
1395
+ "value": "aligned/shifted pairs -> feature combiner -> binary classifier",
1396
  "raw_hits": []
1397
  },
1398
  {
 
1401
  "value": "Detect when modalities that should match are shifted out of sync.",
1402
  "raw_hits": []
1403
  },
 
 
 
 
 
 
1404
  {
1405
  "name": "misalignment_detection: known_task_family",
1406
  "status": "pass",
metrics/website_integrity.json CHANGED
@@ -1,6 +1,6 @@
1
  {
2
  "status": "pass",
3
- "generated_at_utc": "2026-06-06T14:36:10+00:00",
4
  "docs_root": "docs",
5
  "site_base": "/ropedia-xperience-10m-task-suite/",
6
  "summary": {
@@ -251,7 +251,7 @@
251
  },
252
  {
253
  "path": "data/artifact_index.json",
254
- "bytes": 37736,
255
  "top_level_type": "dict"
256
  },
257
  {
@@ -291,7 +291,7 @@
291
  },
292
  {
293
  "path": "data/mirror_parity.json",
294
- "bytes": 111950,
295
  "top_level_type": "dict"
296
  },
297
  {
@@ -301,7 +301,7 @@
301
  },
302
  {
303
  "path": "data/omni_finetune_verified_result.json",
304
- "bytes": 3145,
305
  "top_level_type": "dict"
306
  },
307
  {
@@ -321,7 +321,7 @@
321
  },
322
  {
323
  "path": "data/project_status.json",
324
- "bytes": 10977,
325
  "top_level_type": "dict"
326
  },
327
  {
 
1
  {
2
  "status": "pass",
3
+ "generated_at_utc": "2026-06-06T14:54:01+00:00",
4
  "docs_root": "docs",
5
  "site_base": "/ropedia-xperience-10m-task-suite/",
6
  "summary": {
 
251
  },
252
  {
253
  "path": "data/artifact_index.json",
254
+ "bytes": 39486,
255
  "top_level_type": "dict"
256
  },
257
  {
 
291
  },
292
  {
293
  "path": "data/mirror_parity.json",
294
+ "bytes": 126335,
295
  "top_level_type": "dict"
296
  },
297
  {
 
301
  },
302
  {
303
  "path": "data/omni_finetune_verified_result.json",
304
+ "bytes": 4142,
305
  "top_level_type": "dict"
306
  },
307
  {
 
321
  },
322
  {
323
  "path": "data/project_status.json",
324
+ "bytes": 11274,
325
  "top_level_type": "dict"
326
  },
327
  {
results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/PUBLIC_RESULT_SUMMARY.md CHANGED
@@ -22,4 +22,22 @@
22
 
23
  Raw Xperience-10M files, base-model weights, adapter or checkpoint weights, full checkpoints, and large archives are not included.
24
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
25
  Use this package as the source for README, website, and Hugging Face updates.
 
22
 
23
  Raw Xperience-10M files, base-model weights, adapter or checkpoint weights, full checkpoints, and large archives are not included.
24
 
25
+ ## Error Analysis
26
+
27
+ The package includes a derived held-out error analysis under `analysis/`. It
28
+ groups the 448 public prediction rows by episode, coarse action family,
29
+ train-seen status, required-modality state, and object category.
30
+
31
+ Key readouts:
32
+
33
+ - Official JSON validity from `metrics.json`: `0.8750`
34
+ - Parsed prediction rate from public rows: `0.8772`
35
+ - Weakest action family by parsed prediction rate: `locomotion` with 23 rows and `0.2609`
36
+ - Train-seen split: seen labels have `0.0458` action exact rate; unseen labels have `0.0158`
37
+ - Required-modality state: all held-out rows have required modalities present, with only `visualization.rrd` absent
38
+
39
+ Use `analysis/ERROR_ANALYSIS.md` and
40
+ `analysis/error_analysis_summary.json` before planning the next
41
+ structured-output pass.
42
+
43
  Use this package as the source for README, website, and Hugging Face updates.
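Reviewer note: the Key readouts report rates only, so it can help to convert them back into row counts. The sketch below assumes each rate is a simple fraction of the stated row count (448 overall, 131 seen and 317 unseen rows in the train-seen split). The helper name `rows_from_rate` is hypothetical and not part of the package. Under that assumption the two validity readouts differ by a single row; the diff does not say why.

```python
# Reviewer sketch: recover approximate row counts from the rounded rates
# quoted in the summary. Assumes each rate is (matching rows / group rows).

TOTAL_ROWS = 448  # public prediction rows stated in the summary


def rows_from_rate(rate: float, total: int = TOTAL_ROWS) -> int:
    """Return the nearest whole number of rows implied by a rounded rate."""
    return round(rate * total)


json_valid_rows = rows_from_rate(0.8750)  # JSON validity from metrics.json
parsed_rows = rows_from_rate(0.8772)      # parsed prediction rate from public rows

# Train-seen split: exact-action rows implied by each group's rate.
seen_exact = rows_from_rate(0.0458, total=131)
unseen_exact = rows_from_rate(0.0158, total=317)

print("JSON-valid rows:", json_valid_rows)
print("Parsed rows:", parsed_rows)
print("Exact-action rows (seen, unseen):", seen_exact, unseen_exact)
print("Overall action exact rate:", round((seen_exact + unseen_exact) / TOTAL_ROWS, 4))
```

The recombined action exact rate can be compared with the `0.0246` value reported in `analysis/ERROR_ANALYSIS.md`.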
results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/ERROR_ANALYSIS.md ADDED
@@ -0,0 +1,78 @@
+ # Qwen3-Omni Held-Out Error Analysis
+
+ This report is computed from the verified public package predictions. It contains only derived metrics and sanitized examples.
+
+ ## Overall
+
+ - Prediction rows: `448`
+ - JSON validity from `metrics.json`: `0.8750`
+ - Parsed prediction rate from public rows: `0.8772`
+ - Action exact rate: `0.0246`
+ - Subtask exact rate: `0.0067`
+ - Contact exact rate: `0.6451`
+ - Object F1: `0.2230`
+
+ ## Weakest Episode Groups
+
+ | group | samples | parsed_prediction_rate | action_exact_rate | object_f1 |
+ | --- | --- | --- | --- | --- |
+ | 1796b943-caad-43c6-b9bd-80b8d601f37d__ep1 | 32 | 0.5625 | 0.0000 | 0.0459 |
+ | 8a8e1b3c-607e-4ada-b3fd-fa639727e92c__ep1 | 32 | 0.7500 | 0.0312 | 0.0942 |
+ | 33f7ae08-ac1d-4321-9cb9-eca79016b359__ep1 | 32 | 0.8438 | 0.0000 | 0.0529 |
+ | b750fab3-7fbb-43a0-b451-c64c4d4a64da__ep1 | 32 | 0.8438 | 0.0000 | 0.2353 |
+ | ba18b7c1-21ff-45da-8452-41acce7fc8de__ep2 | 32 | 0.8438 | 0.0000 | 0.2836 |
+ | ba045ed4-ef25-404d-b756-8dcbd45b18fa__ep2 | 32 | 0.8438 | 0.0625 | 0.0746 |
+ | b9dd769b-e31a-4fdb-945e-5a60db6487b0__ep2 | 32 | 0.8750 | 0.0312 | 0.3265 |
+ | 4b02bb38-384a-438a-b5f9-6131d85c34b0__ep1 | 32 | 0.8750 | 0.0938 | 0.2830 |
+
+ ## Action Families
+
+ | group | samples | parsed_prediction_rate | action_exact_rate | subtask_exact_rate | object_f1 |
+ | --- | --- | --- | --- | --- | --- |
+ | locomotion | 23 | 0.2609 | 0.0000 | 0.0000 | 0.0120 |
+ | food_kitchen | 5 | 0.6000 | 0.2000 | 0.0000 | 0.2727 |
+ | cleaning | 8 | 0.7500 | 0.0000 | 0.0000 | 0.0000 |
+ | other | 94 | 0.8511 | 0.0000 | 0.0000 | 0.1910 |
+ | phone_use | 51 | 0.9020 | 0.0588 | 0.0196 | 0.3501 |
+ | paper_cardboard_craft | 142 | 0.9225 | 0.0282 | 0.0141 | 0.2308 |
+ | small_object_sorting | 87 | 0.9655 | 0.0000 | 0.0000 | 0.2740 |
+ | retail_stocking | 38 | 0.9737 | 0.0789 | 0.0000 | 0.1564 |
+
+ ## Train-Seen Split
+
+ | group | samples | parsed_prediction_rate | action_exact_rate | next_action_exact_rate |
+ | --- | --- | --- | --- | --- |
+ | unseen_in_train | 317 | 0.8454 | 0.0158 | 0.0158 |
+ | seen_in_train | 131 | 0.9542 | 0.0458 | 0.0458 |
+
+ ## Required-Modality State
+
+ | group | samples | parsed_prediction_rate | action_exact_rate | object_f1 |
+ | --- | --- | --- | --- | --- |
+ | rrd_missing_only_required_modalities_present | 448 | 0.8772 | 0.0246 | 0.2230 |
+
+ ## Object Categories
+
+ | group | samples | object_precision | object_recall | object_f1 |
+ | --- | --- | --- | --- | --- |
+ | furniture_room | 96 | 0.2534 | 0.2334 | 0.2430 |
+ | other_object | 135 | 0.1372 | 0.1643 | 0.1495 |
+ | food_kitchen | 56 | 0.2228 | 0.2000 | 0.2108 |
+ | cleaning | 8 | 0.0400 | 0.0476 | 0.0435 |
+ | phone_device | 162 | 0.3252 | 0.3132 | 0.3191 |
+ | paper_cardboard | 261 | 0.2227 | 0.3234 | 0.2638 |
+ | craft_small_object | 106 | 0.2266 | 0.2581 | 0.2413 |
+ | retail_container | 101 | 0.2028 | 0.1752 | 0.1880 |
+
+ ## Interpretation
+
+ The diagnostic pilot is dominated by invalid or weak structured outputs and exact-label failures. These tables identify where to tighten JSON constraints, action/subtask target formatting, object vocabularies, and missing-modality robustness before claiming stronger model quality.
+
+ Generated files:
+
+ - `error_analysis_summary.json`
+ - `episode_error_analysis.csv`
+ - `action_family_error_analysis.csv`
+ - `train_seen_error_analysis.csv`
+ - `missing_modality_error_analysis.csv`
+ - `object_category_error_analysis.csv`
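Reviewer note: the Object Categories table reports precision, recall, and F1 per group without stating how F1 is combined. The sketch below checks whether each `object_f1` matches the harmonic mean of that row's `object_precision` and `object_recall`. The harmonic-mean formula is an assumption about the analysis script, not something this diff confirms, and the helper names are hypothetical. The values are copied from the table above.

```python
# Reviewer sketch: test whether the reported object_f1 values are consistent
# with F1 = 2 * P * R / (P + R) applied to each row's precision and recall.

# (group, object_precision, object_recall, reported object_f1)
OBJECT_CATEGORIES = [
    ("furniture_room", 0.2534, 0.2334, 0.2430),
    ("other_object", 0.1372, 0.1643, 0.1495),
    ("food_kitchen", 0.2228, 0.2000, 0.2108),
    ("cleaning", 0.0400, 0.0476, 0.0435),
    ("phone_device", 0.3252, 0.3132, 0.3191),
    ("paper_cardboard", 0.2227, 0.3234, 0.2638),
    ("craft_small_object", 0.2266, 0.2581, 0.2413),
    ("retail_container", 0.2028, 0.1752, 0.1880),
]


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def max_f1_gap(rows) -> float:
    """Largest absolute gap between the recomputed and reported F1."""
    return max(abs(f1(p, r) - reported) for _, p, r, reported in rows)


for group, p, r, reported in OBJECT_CATEGORIES:
    print(f"{group}: recomputed={f1(p, r):.4f} reported={reported:.4f}")
print(f"largest gap: {max_f1_gap(OBJECT_CATEGORIES):.6f}")
```

A gap that stays within rounding error of the four-decimal table values suggests the per-group F1 is derived from the group-level precision and recall rather than averaged per row.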
results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/action_family_error_analysis.csv ADDED
@@ -0,0 +1,9 @@
+ group,samples,parsed_prediction_rate,action_exact_rate,subtask_exact_rate,transition_exact_rate,next_action_exact_rate,contact_exact_rate,object_precision,object_recall,object_f1
+ locomotion,23,0.2608695652173913,0.0,0.0,0.2608695652173913,0.0,0.08695652173913043,0.010752688172043012,0.0136986301369863,0.012048192771084338
+ food_kitchen,5,0.6,0.2,0.0,0.6,0.2,0.2,0.375,0.21428571428571427,0.2727272727272727
+ cleaning,8,0.75,0.0,0.0,0.625,0.0,0.625,0.0,0.0,0.0
+ other,94,0.851063829787234,0.0,0.0,0.8085106382978723,0.0,0.6063829787234043,0.17220543806646527,0.21428571428571427,0.19095477386934673
+ phone_use,51,0.9019607843137255,0.058823529411764705,0.0196078431372549,0.8431372549019608,0.058823529411764705,0.5686274509803921,0.35542168674698793,0.34502923976608185,0.3501483679525222
+ paper_cardboard_craft,142,0.9225352112676056,0.028169014084507043,0.014084507042253521,0.9154929577464789,0.028169014084507043,0.8169014084507042,0.1853233830845771,0.3059548254620123,0.2308288148721921
+ small_object_sorting,87,0.9655172413793104,0.0,0.0,0.9425287356321839,0.0,0.5747126436781609,0.26515151515151514,0.2834008097165992,0.27397260273972607
+ retail_stocking,38,0.9736842105263158,0.07894736842105263,0.0,0.9473684210526315,0.07894736842105263,0.7631578947368421,0.15384615384615385,0.1590909090909091,0.1564245810055866
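Reviewer note: one quick sanity check on `action_family_error_analysis.csv` is that the per-family rows recombine into the overall figures. The sketch below assumes `parsed_prediction_rate` is parsed rows divided by `samples` in each family. It inlines only the first three CSV columns, copied from the rows above, so the snippet is self-contained. The variable names are hypothetical.

```python
# Reviewer sketch: rebuild overall counts from the per-family rows.
import csv
import io

# First three columns of action_family_error_analysis.csv, copied from the diff.
CSV_TEXT = """group,samples,parsed_prediction_rate
locomotion,23,0.2608695652173913
food_kitchen,5,0.6
cleaning,8,0.75
other,94,0.851063829787234
phone_use,51,0.9019607843137255
paper_cardboard_craft,142,0.9225352112676056
small_object_sorting,87,0.9655172413793104
retail_stocking,38,0.9736842105263158
"""

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))

# Total rows across families, and parsed rows implied by each family's rate.
total_samples = sum(int(r["samples"]) for r in rows)
parsed_rows = sum(
    round(int(r["samples"]) * float(r["parsed_prediction_rate"])) for r in rows
)

print("total samples:", total_samples)
print("parsed rows:", parsed_rows)
print("overall parsed rate:", round(parsed_rows / total_samples, 4))
```

If the families partition the held-out rows, the totals should agree with the overall row count and parsed prediction rate reported in `ERROR_ANALYSIS.md`.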
results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/episode_error_analysis.csv ADDED
@@ -0,0 +1,15 @@
+ group,samples,parsed_prediction_rate,action_exact_rate,subtask_exact_rate,transition_exact_rate,next_action_exact_rate,contact_exact_rate,object_precision,object_recall,object_f1
+ 1796b943-caad-43c6-b9bd-80b8d601f37d__ep1,32,0.5625,0.0,0.0,0.5625,0.0,0.53125,0.045871559633027525,0.045871559633027525,0.045871559633027525
+ 8a8e1b3c-607e-4ada-b3fd-fa639727e92c__ep1,32,0.75,0.03125,0.0,0.71875,0.03125,0.4375,0.08108108108108109,0.1125,0.09424083769633508
+ 33f7ae08-ac1d-4321-9cb9-eca79016b359__ep1,32,0.84375,0.0,0.0,0.6875,0.0,0.53125,0.043859649122807015,0.06666666666666667,0.05291005291005291
+ b750fab3-7fbb-43a0-b451-c64c4d4a64da__ep1,32,0.84375,0.0,0.0,0.84375,0.0,0.375,0.2153846153846154,0.25925925925925924,0.23529411764705882
+ ba18b7c1-21ff-45da-8452-41acce7fc8de__ep2,32,0.84375,0.0,0.0,0.84375,0.0,0.75,0.3,0.2689655172413793,0.2836363636363637
+ ba045ed4-ef25-404d-b756-8dcbd45b18fa__ep2,32,0.84375,0.0625,0.0625,0.84375,0.0625,0.75,0.04830917874396135,0.16393442622950818,0.07462686567164178
+ b9dd769b-e31a-4fdb-945e-5a60db6487b0__ep2,32,0.875,0.03125,0.0,0.8125,0.03125,0.40625,0.30303030303030304,0.35398230088495575,0.32653061224489793
+ 4b02bb38-384a-438a-b5f9-6131d85c34b0__ep1,32,0.875,0.09375,0.03125,0.8125,0.09375,0.40625,0.2608695652173913,0.30927835051546393,0.2830188679245283
+ 9c553886-83c5-4dc4-be5c-dcb269b3a771__ep2,32,0.9375,0.0,0.0,0.9375,0.0,0.9375,0.21333333333333335,0.2831858407079646,0.24334600760456274
+ 5399ef86-4df9-49bc-809f-8f4f92f9e659__ep6,32,0.9375,0.0,0.0,0.90625,0.0,0.78125,0.027777777777777776,0.027777777777777776,0.027777777777777776
+ b6579cb5-0a71-4ca6-8808-1e2700be05c7__ep3,32,0.96875,0.03125,0.0,0.9375,0.03125,0.96875,0.5130434782608696,0.4573643410852713,0.48360655737704916
+ a1012a57-385e-45a9-8a59-694a26fe92a5__ep1,32,1.0,0.0,0.0,1.0,0.0,0.90625,0.1927710843373494,0.48484848484848486,0.27586206896551724
+ 877779cd-25f3-4293-a3c4-39067dd9558c__ep4,32,1.0,0.0,0.0,1.0,0.0,0.34375,0.3402061855670103,0.3548387096774194,0.3473684210526316
+ 34f07a04-eb37-45a3-95ec-189ed5f4a85b__ep5,32,1.0,0.09375,0.0,1.0,0.09375,0.90625,0.18840579710144928,0.18055555555555555,0.1843971631205674
results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/error_analysis_summary.json ADDED
@@ -0,0 +1,667 @@
1
+ {
2
+ "status": "pass",
3
+ "source_package": "xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval",
4
+ "source_prediction_rows": 448,
5
+ "metrics_json_validity_rate": 0.875,
6
+ "computed": {
7
+ "group": "overall",
8
+ "samples": 448,
9
+ "parsed_prediction_rate": 0.8772321428571429,
10
+ "action_exact_rate": 0.024553571428571428,
11
+ "subtask_exact_rate": 0.006696428571428571,
12
+ "transition_exact_rate": 0.8504464285714286,
13
+ "next_action_exact_rate": 0.024553571428571428,
14
+ "contact_exact_rate": 0.6450892857142857,
15
+ "object_precision": 0.19611111111111112,
16
+ "object_recall": 0.25841874084919475,
17
+ "object_f1": 0.22299431459254582
18
+ },
19
+ "worst_episode_groups": [
20
+ {
21
+ "group": "1796b943-caad-43c6-b9bd-80b8d601f37d__ep1",
22
+ "samples": 32,
23
+ "parsed_prediction_rate": 0.5625,
24
+ "action_exact_rate": 0.0,
25
+ "subtask_exact_rate": 0.0,
26
+ "transition_exact_rate": 0.5625,
27
+ "next_action_exact_rate": 0.0,
28
+ "contact_exact_rate": 0.53125,
29
+ "object_precision": 0.045871559633027525,
30
+ "object_recall": 0.045871559633027525,
31
+ "object_f1": 0.045871559633027525
32
+ },
33
+ {
34
+ "group": "8a8e1b3c-607e-4ada-b3fd-fa639727e92c__ep1",
35
+ "samples": 32,
36
+ "parsed_prediction_rate": 0.75,
37
+ "action_exact_rate": 0.03125,
38
+ "subtask_exact_rate": 0.0,
39
+ "transition_exact_rate": 0.71875,
40
+ "next_action_exact_rate": 0.03125,
41
+ "contact_exact_rate": 0.4375,
42
+ "object_precision": 0.08108108108108109,
43
+ "object_recall": 0.1125,
44
+ "object_f1": 0.09424083769633508
45
+ },
46
+ {
47
+ "group": "33f7ae08-ac1d-4321-9cb9-eca79016b359__ep1",
48
+ "samples": 32,
49
+ "parsed_prediction_rate": 0.84375,
50
+ "action_exact_rate": 0.0,
51
+ "subtask_exact_rate": 0.0,
52
+ "transition_exact_rate": 0.6875,
53
+ "next_action_exact_rate": 0.0,
54
+ "contact_exact_rate": 0.53125,
55
+ "object_precision": 0.043859649122807015,
56
+ "object_recall": 0.06666666666666667,
57
+ "object_f1": 0.05291005291005291
58
+ },
59
+ {
60
+ "group": "b750fab3-7fbb-43a0-b451-c64c4d4a64da__ep1",
61
+ "samples": 32,
62
+ "parsed_prediction_rate": 0.84375,
63
+ "action_exact_rate": 0.0,
64
+ "subtask_exact_rate": 0.0,
65
+ "transition_exact_rate": 0.84375,
66
+ "next_action_exact_rate": 0.0,
67
+ "contact_exact_rate": 0.375,
68
+ "object_precision": 0.2153846153846154,
69
+ "object_recall": 0.25925925925925924,
70
+ "object_f1": 0.23529411764705882
71
+ },
72
+ {
73
+ "group": "ba18b7c1-21ff-45da-8452-41acce7fc8de__ep2",
74
+ "samples": 32,
75
+ "parsed_prediction_rate": 0.84375,
76
+ "action_exact_rate": 0.0,
77
+ "subtask_exact_rate": 0.0,
78
+ "transition_exact_rate": 0.84375,
79
+ "next_action_exact_rate": 0.0,
80
+ "contact_exact_rate": 0.75,
81
+ "object_precision": 0.3,
82
+ "object_recall": 0.2689655172413793,
83
+ "object_f1": 0.2836363636363637
84
+ },
85
+ {
86
+ "group": "ba045ed4-ef25-404d-b756-8dcbd45b18fa__ep2",
87
+ "samples": 32,
88
+ "parsed_prediction_rate": 0.84375,
89
+ "action_exact_rate": 0.0625,
90
+ "subtask_exact_rate": 0.0625,
91
+ "transition_exact_rate": 0.84375,
92
+ "next_action_exact_rate": 0.0625,
93
+ "contact_exact_rate": 0.75,
94
+ "object_precision": 0.04830917874396135,
95
+ "object_recall": 0.16393442622950818,
96
+ "object_f1": 0.07462686567164178
97
+ },
98
+ {
99
+ "group": "b9dd769b-e31a-4fdb-945e-5a60db6487b0__ep2",
100
+ "samples": 32,
101
+ "parsed_prediction_rate": 0.875,
102
+ "action_exact_rate": 0.03125,
103
+ "subtask_exact_rate": 0.0,
104
+ "transition_exact_rate": 0.8125,
105
+ "next_action_exact_rate": 0.03125,
106
+ "contact_exact_rate": 0.40625,
107
+ "object_precision": 0.30303030303030304,
108
+ "object_recall": 0.35398230088495575,
109
+ "object_f1": 0.32653061224489793
110
+ },
111
+ {
112
+ "group": "4b02bb38-384a-438a-b5f9-6131d85c34b0__ep1",
113
+ "samples": 32,
114
+ "parsed_prediction_rate": 0.875,
115
+ "action_exact_rate": 0.09375,
116
+ "subtask_exact_rate": 0.03125,
117
+ "transition_exact_rate": 0.8125,
118
+ "next_action_exact_rate": 0.09375,
119
+ "contact_exact_rate": 0.40625,
120
+ "object_precision": 0.2608695652173913,
121
+ "object_recall": 0.30927835051546393,
122
+ "object_f1": 0.2830188679245283
123
+ }
124
+ ],
125
+ "action_family_groups": [
126
+ {
127
+ "group": "locomotion",
128
+ "samples": 23,
129
+ "parsed_prediction_rate": 0.2608695652173913,
130
+ "action_exact_rate": 0.0,
131
+ "subtask_exact_rate": 0.0,
132
+ "transition_exact_rate": 0.2608695652173913,
133
+ "next_action_exact_rate": 0.0,
134
+ "contact_exact_rate": 0.08695652173913043,
135
+ "object_precision": 0.010752688172043012,
136
+ "object_recall": 0.0136986301369863,
137
+ "object_f1": 0.012048192771084338
138
+ },
139
+ {
140
+ "group": "food_kitchen",
141
+ "samples": 5,
142
+ "parsed_prediction_rate": 0.6,
143
+ "action_exact_rate": 0.2,
144
+ "subtask_exact_rate": 0.0,
145
+ "transition_exact_rate": 0.6,
146
+ "next_action_exact_rate": 0.2,
147
+ "contact_exact_rate": 0.2,
148
+ "object_precision": 0.375,
149
+ "object_recall": 0.21428571428571427,
150
+ "object_f1": 0.2727272727272727
151
+ },
152
+ {
153
+ "group": "cleaning",
154
+ "samples": 8,
155
+ "parsed_prediction_rate": 0.75,
156
+ "action_exact_rate": 0.0,
157
+ "subtask_exact_rate": 0.0,
158
+ "transition_exact_rate": 0.625,
159
+ "next_action_exact_rate": 0.0,
160
+ "contact_exact_rate": 0.625,
161
+ "object_precision": 0.0,
162
+ "object_recall": 0.0,
163
+ "object_f1": 0.0
164
+ },
165
+ {
166
+ "group": "other",
167
+ "samples": 94,
168
+ "parsed_prediction_rate": 0.851063829787234,
169
+ "action_exact_rate": 0.0,
170
+ "subtask_exact_rate": 0.0,
171
+ "transition_exact_rate": 0.8085106382978723,
172
+ "next_action_exact_rate": 0.0,
173
+ "contact_exact_rate": 0.6063829787234043,
174
+ "object_precision": 0.17220543806646527,
175
+ "object_recall": 0.21428571428571427,
176
+ "object_f1": 0.19095477386934673
177
+ },
178
+ {
179
+ "group": "phone_use",
180
+ "samples": 51,
181
+ "parsed_prediction_rate": 0.9019607843137255,
182
+ "action_exact_rate": 0.058823529411764705,
183
+ "subtask_exact_rate": 0.0196078431372549,
184
+ "transition_exact_rate": 0.8431372549019608,
185
+ "next_action_exact_rate": 0.058823529411764705,
186
+ "contact_exact_rate": 0.5686274509803921,
187
+ "object_precision": 0.35542168674698793,
188
+ "object_recall": 0.34502923976608185,
189
+ "object_f1": 0.3501483679525222
190
+ },
191
+ {
192
+ "group": "paper_cardboard_craft",
193
+ "samples": 142,
194
+ "parsed_prediction_rate": 0.9225352112676056,
195
+ "action_exact_rate": 0.028169014084507043,
196
+ "subtask_exact_rate": 0.014084507042253521,
197
+ "transition_exact_rate": 0.9154929577464789,
198
+ "next_action_exact_rate": 0.028169014084507043,
199
+ "contact_exact_rate": 0.8169014084507042,
200
+ "object_precision": 0.1853233830845771,
201
+ "object_recall": 0.3059548254620123,
202
+ "object_f1": 0.2308288148721921
203
+ },
204
+ {
205
+ "group": "small_object_sorting",
206
+ "samples": 87,
207
+ "parsed_prediction_rate": 0.9655172413793104,
208
+ "action_exact_rate": 0.0,
209
+ "subtask_exact_rate": 0.0,
210
+ "transition_exact_rate": 0.9425287356321839,
211
+ "next_action_exact_rate": 0.0,
212
+ "contact_exact_rate": 0.5747126436781609,
213
+ "object_precision": 0.26515151515151514,
214
+ "object_recall": 0.2834008097165992,
215
+ "object_f1": 0.27397260273972607
216
+ },
217
+ {
218
+ "group": "retail_stocking",
219
+ "samples": 38,
220
+ "parsed_prediction_rate": 0.9736842105263158,
221
+ "action_exact_rate": 0.07894736842105263,
222
+ "subtask_exact_rate": 0.0,
223
+ "transition_exact_rate": 0.9473684210526315,
224
+ "next_action_exact_rate": 0.07894736842105263,
225
+ "contact_exact_rate": 0.7631578947368421,
226
+ "object_precision": 0.15384615384615385,
227
+ "object_recall": 0.1590909090909091,
228
+ "object_f1": 0.1564245810055866
229
+ }
230
+ ],
231
+ "train_seen_groups": [
232
+ {
233
+ "group": "unseen_in_train",
234
+ "samples": 317,
235
+ "parsed_prediction_rate": 0.8454258675078864,
236
+ "action_exact_rate": 0.015772870662460567,
237
+ "subtask_exact_rate": 0.006309148264984227,
238
+ "transition_exact_rate": 0.8233438485804416,
239
+ "next_action_exact_rate": 0.015772870662460567,
240
+ "contact_exact_rate": 0.6151419558359621,
241
+ "object_precision": 0.15804806991988346,
242
+ "object_recall": 0.23183760683760685,
243
+ "object_f1": 0.18796015591165008
244
+ },
245
+ {
246
+ "group": "seen_in_train",
247
+ "samples": 131,
248
+ "parsed_prediction_rate": 0.9541984732824428,
249
+ "action_exact_rate": 0.04580152671755725,
250
+ "subtask_exact_rate": 0.007633587786259542,
251
+ "transition_exact_rate": 0.916030534351145,
252
+ "next_action_exact_rate": 0.04580152671755725,
253
+ "contact_exact_rate": 0.7175572519083969,
254
+ "object_precision": 0.3185011709601874,
255
+ "object_recall": 0.31627906976744186,
256
+ "object_f1": 0.3173862310385064
257
+ }
258
+ ],
259
+ "missing_modality_groups": [
260
+ {
261
+ "group": "rrd_missing_only_required_modalities_present",
262
+ "samples": 448,
263
+ "parsed_prediction_rate": 0.8772321428571429,
264
+ "action_exact_rate": 0.024553571428571428,
265
+ "subtask_exact_rate": 0.006696428571428571,
266
+ "transition_exact_rate": 0.8504464285714286,
267
+ "next_action_exact_rate": 0.024553571428571428,
268
+ "contact_exact_rate": 0.6450892857142857,
269
+ "object_precision": 0.19611111111111112,
270
+ "object_recall": 0.25841874084919475,
271
+ "object_f1": 0.22299431459254582
272
+ }
273
+ ],
274
+ "object_category_groups": [
275
+ {
276
+ "group": "furniture_room",
277
+ "samples": 96,
278
+ "parsed_prediction_rate": 0.71875,
279
+ "action_exact_rate": 0.0,
280
+ "subtask_exact_rate": 0.0,
281
+ "transition_exact_rate": 0.7083333333333334,
282
+ "next_action_exact_rate": 0.0,
283
+ "contact_exact_rate": 0.4166666666666667,
284
+ "object_precision": 0.2534246575342466,
285
+ "object_recall": 0.2334384858044164,
286
+ "object_f1": 0.24302134646962234
287
+ },
288
+ {
289
+ "group": "other_object",
290
+ "samples": 135,
291
+ "parsed_prediction_rate": 0.7925925925925926,
292
+ "action_exact_rate": 0.02962962962962963,
293
+ "subtask_exact_rate": 0.007407407407407408,
294
+ "transition_exact_rate": 0.762962962962963,
295
+ "next_action_exact_rate": 0.02962962962962963,
296
+ "contact_exact_rate": 0.6,
297
+ "object_precision": 0.13717693836978131,
298
+ "object_recall": 0.16428571428571428,
299
+ "object_f1": 0.1495124593716143
300
+ },
301
+ {
302
+ "group": "food_kitchen",
303
+ "samples": 56,
304
+ "parsed_prediction_rate": 0.8571428571428571,
305
+ "action_exact_rate": 0.0,
306
+ "subtask_exact_rate": 0.0,
307
+ "transition_exact_rate": 0.8214285714285714,
308
+ "next_action_exact_rate": 0.0,
309
+ "contact_exact_rate": 0.7678571428571429,
310
+ "object_precision": 0.22277227722772278,
311
+ "object_recall": 0.2,
312
+ "object_f1": 0.2107728337236534
313
+ },
314
+ {
315
+ "group": "cleaning",
316
+ "samples": 8,
317
+ "parsed_prediction_rate": 0.875,
318
+ "action_exact_rate": 0.0,
319
+ "subtask_exact_rate": 0.0,
320
+ "transition_exact_rate": 0.875,
321
+ "next_action_exact_rate": 0.0,
322
+ "contact_exact_rate": 0.625,
323
+ "object_precision": 0.04,
324
+ "object_recall": 0.047619047619047616,
325
+ "object_f1": 0.043478260869565216
326
+ },
327
+ {
328
+ "group": "phone_device",
329
+ "samples": 162,
330
+ "parsed_prediction_rate": 0.9074074074074074,
331
+ "action_exact_rate": 0.024691358024691357,
332
+ "subtask_exact_rate": 0.006172839506172839,
333
+ "transition_exact_rate": 0.8703703703703703,
334
+ "next_action_exact_rate": 0.024691358024691357,
335
+ "contact_exact_rate": 0.5864197530864198,
336
+ "object_precision": 0.32521739130434785,
337
+ "object_recall": 0.3132328308207705,
338
+ "object_f1": 0.31911262798634815
339
+ },
340
+ {
341
+ "group": "paper_cardboard",
342
+ "samples": 261,
343
+ "parsed_prediction_rate": 0.9080459770114943,
344
+ "action_exact_rate": 0.034482758620689655,
345
+ "subtask_exact_rate": 0.011494252873563218,
346
+ "transition_exact_rate": 0.8888888888888888,
347
+ "next_action_exact_rate": 0.034482758620689655,
348
+ "contact_exact_rate": 0.7203065134099617,
349
+ "object_precision": 0.22274881516587677,
350
+ "object_recall": 0.32339449541284404,
351
+ "object_f1": 0.2637979420018709
352
+ },
353
+ {
354
+ "group": "craft_small_object",
355
+ "samples": 106,
356
+ "parsed_prediction_rate": 0.9339622641509434,
357
+ "action_exact_rate": 0.02830188679245283,
358
+ "subtask_exact_rate": 0.009433962264150943,
359
+ "transition_exact_rate": 0.9150943396226415,
360
+ "next_action_exact_rate": 0.02830188679245283,
361
+ "contact_exact_rate": 0.5,
362
+ "object_precision": 0.22662889518413598,
363
+ "object_recall": 0.25806451612903225,
364
+ "object_f1": 0.24132730015082954
365
+ },
366
+ {
367
+ "group": "retail_container",
368
+ "samples": 101,
369
+ "parsed_prediction_rate": 0.9405940594059405,
370
+ "action_exact_rate": 0.0297029702970297,
371
+ "subtask_exact_rate": 0.0,
372
+ "transition_exact_rate": 0.9108910891089109,
373
+ "next_action_exact_rate": 0.0297029702970297,
374
+ "contact_exact_rate": 0.7722772277227723,
375
+ "object_precision": 0.20279720279720279,
376
+ "object_recall": 0.17522658610271905,
377
+ "object_f1": 0.18800648298217182
378
+ },
379
+ {
380
+ "group": "tool_stationery",
381
+ "samples": 138,
382
+ "parsed_prediction_rate": 0.9565217391304348,
383
+ "action_exact_rate": 0.014492753623188406,
384
+ "subtask_exact_rate": 0.0,
385
+ "transition_exact_rate": 0.9347826086956522,
386
+ "next_action_exact_rate": 0.014492753623188406,
387
+ "contact_exact_rate": 0.8043478260869565,
388
+ "object_precision": 0.27906976744186046,
389
+ "object_recall": 0.3894523326572008,
390
+ "object_f1": 0.32514817950889074
391
+ },
392
+ {
393
+ "group": "no_object_label",
394
+ "samples": 2,
395
+ "parsed_prediction_rate": 1.0,
396
+ "action_exact_rate": 0.0,
397
+ "subtask_exact_rate": 0.0,
398
+ "transition_exact_rate": 1.0,
399
+ "next_action_exact_rate": 0.0,
400
+ "contact_exact_rate": 1.0,
401
+ "object_precision": 0.0,
402
+ "object_recall": 0.0,
403
+ "object_f1": 0.0
404
+ }
405
+ ],
406
+ "invalid_json_examples": [
407
+ {
408
+ "id": "8a8e1b3c-607e-4ada-b3fd-fa639727e92c__ep1:qa:0",
409
+ "episode_id": "8a8e1b3c-607e-4ada-b3fd-fa639727e92c__ep1",
410
+ "true_action": "Hold smartphone",
411
+ "raw_prediction_prefix": "{\"action\": \"Pour liquid into bowl\", \"contact\": \"yes\", \"evidence_window\": {\"end_frame\": 19, \"start_frame\": 0}, \"next_action\": \"Pour liquid into bowl\", \"objects\": [\"bottle\", \"bowl\", \"kitchen counter\", \"kitchen sink\", \"kitchen tap\", \"kitchen c"
412
+ },
413
+ {
414
+ "id": "8a8e1b3c-607e-4ada-b3fd-fa639727e92c__ep1:qa:90",
415
+ "episode_id": "8a8e1b3c-607e-4ada-b3fd-fa639727e92c__ep1",
416
+ "true_action": "Move towards the stove",
417
+ "raw_prediction_prefix": "{\"action\": \"Walk through workspace\", \"contact\": \"no\", \"evidence_window\": {\"end_frame\": 1819, \"start_frame\": 1800}, \"next_action\": \"Walk through workspace\", \"objects\": [\"kitchen counter\", \"kitchen sink\", \"kitchen cabinets\", \"kitchen applianc"
418
+ },
419
+ {
420
+ "id": "8a8e1b3c-607e-4ada-b3fd-fa639727e92c__ep1:qa:100",
421
+ "episode_id": "8a8e1b3c-607e-4ada-b3fd-fa639727e92c__ep1",
422
+ "true_action": "Open stove pot lid",
423
+ "raw_prediction_prefix": "{\"action\": \"Approach packing area\", \"contact\": \"no\", \"evidence_window\": {\"end_frame\": 2019, \"start_frame\": 2000}, \"next_action\": \"Approach packing area\", \"objects\": [\"kitchen counter\", \"kitchen sink\", \"kitchen cabinet\", \"kitchen utensils\", "
424
+ },
425
+ {
426
+ "id": "8a8e1b3c-607e-4ada-b3fd-fa639727e92c__ep1:qa:126",
427
+ "episode_id": "8a8e1b3c-607e-4ada-b3fd-fa639727e92c__ep1",
428
+ "true_action": "Closing the door",
429
+ "raw_prediction_prefix": "{\"action\": \"Close door\", \"contact\": \"yes\", \"evidence_window\": {\"end_frame\": 2539, \"start_frame\": 2520}, \"next_action\": \"Close door\", \"objects\": [\"door\", \"door handle\", \"plastic bag\", \"blue bucket\", \"red bucket\", \"blue container\", \"red conta"
430
+ },
431
+ {
432
+ "id": "8a8e1b3c-607e-4ada-b3fd-fa639727e92c__ep1:qa:127",
433
+ "episode_id": "8a8e1b3c-607e-4ada-b3fd-fa639727e92c__ep1",
434
+ "true_action": "Closing the door",
435
+ "raw_prediction_prefix": "{\"action\": \"Close the door\", \"contact\": \"yes\", \"evidence_window\": {\"end_frame\": 2559, \"start_frame\": 2540}, \"next_action\": \"Close the door\", \"objects\": [\"door\", \"door handle\", \"door knob\", \"door frame\", \"door hinge\", \"door latch\", \"door loc"
436
+ },
437
+ {
438
+ "id": "8a8e1b3c-607e-4ada-b3fd-fa639727e92c__ep1:qa:134",
439
+ "episode_id": "8a8e1b3c-607e-4ada-b3fd-fa639727e92c__ep1",
440
+ "true_action": "Picking up bottle",
441
+ "raw_prediction_prefix": "{\"action\": \"Carry cereal boxes\", \"contact\": \"yes\", \"evidence_window\": {\"end_frame\": 2699, \"start_frame\": 2680}, \"next_action\": \"Carry cereal boxes\", \"objects\": [\"cereal boxes\", \"door\", \"washing machine\", \"laundry basket\", \"laundry detergent"
442
+ },
443
+ {
444
+ "id": "8a8e1b3c-607e-4ada-b3fd-fa639727e92c__ep1:qa:140",
445
+ "episode_id": "8a8e1b3c-607e-4ada-b3fd-fa639727e92c__ep1",
446
+ "true_action": "Wipe kitchen counter",
447
+ "raw_prediction_prefix": "{\"action\": \"Walk through workspace\", \"contact\": \"no\", \"evidence_window\": {\"end_frame\": 2819, \"start_frame\": 2800}, \"next_action\": \"Walk through workspace\", \"objects\": [\"laundry machine\", \"laundry basket\", \"laundry detergent\", \"laundry soap\""
448
+ },
449
+ {
450
+ "id": "8a8e1b3c-607e-4ada-b3fd-fa639727e92c__ep1:qa:170",
451
+ "episode_id": "8a8e1b3c-607e-4ada-b3fd-fa639727e92c__ep1",
452
+ "true_action": "Reach for cleaning supplies",
453
+ "raw_prediction_prefix": "{\"action\": \"Close cabinet door\", \"contact\": \"yes\", \"evidence_window\": {\"end_frame\": 3419, \"start_frame\": 3400}, \"next_action\": \"Close cabinet door\", \"objects\": [\"cabinet\", \"cabinet door\", \"cabinet handle\", \"cabinet shelf\", \"cabinet drawer\","
454
+ },
455
+ {
456
+ "id": "33f7ae08-ac1d-4321-9cb9-eca79016b359__ep1:qa:0",
457
+ "episode_id": "33f7ae08-ac1d-4321-9cb9-eca79016b359__ep1",
458
+ "true_action": "Observe and walk through store",
459
+ "raw_prediction_prefix": "{\"action\": \"Walk through workspace\", \"contact\": \"no\", \"evidence_window\": {\"end_frame\": 19, \"start_frame\": 0}, \"next_action\": \"Walk through workspace\", \"objects\": [\"person\", \"mirror\", \"desk\", \"chair\", \"box\", \"shelf\", \"door\", \"wall\", \"floor\","
460
+ },
461
+ {
462
+ "id": "33f7ae08-ac1d-4321-9cb9-eca79016b359__ep1:qa:1",
463
+ "episode_id": "33f7ae08-ac1d-4321-9cb9-eca79016b359__ep1",
464
+ "true_action": "Observe and walk through store",
465
+ "raw_prediction_prefix": "{\"action\": \"Walk through workspace\", \"contact\": \"no\", \"evidence_window\": {\"end_frame\": 39, \"start_frame\": 20}, \"next_action\": \"Walk through workspace\", \"objects\": [\"person\", \"desk\", \"chair\", \"wall\", \"ceiling\", \"floor\", \"box\", \"cardboard\", \""
466
+ },
467
+ {
468
+ "id": "33f7ae08-ac1d-4321-9cb9-eca79016b359__ep1:qa:50",
469
+ "episode_id": "33f7ae08-ac1d-4321-9cb9-eca79016b359__ep1",
470
+ "true_action": "Walk towards shelves",
471
+ "raw_prediction_prefix": "{\"action\": \"Walk through workspace\", \"contact\": \"no\", \"evidence_window\": {\"end_frame\": 1019, \"start_frame\": 1000}, \"next_action\": \"Walk through workspace\", \"objects\": [\"person\", \"cardboard\", \"shelf\", \"door\", \"box\", \"jar\", \"lantern\", \"light\""
472
+ },
473
+ {
474
+ "id": "33f7ae08-ac1d-4321-9cb9-eca79016b359__ep1:qa:59",
475
+ "episode_id": "33f7ae08-ac1d-4321-9cb9-eca79016b359__ep1",
476
+ "true_action": "Observe workspace",
477
+ "raw_prediction_prefix": "{\"action\": \"Walk through workspace\", \"contact\": \"no\", \"evidence_window\": {\"end_frame\": 1199, \"start_frame\": 1180}, \"next_action\": \"Walk through workspace\", \"objects\": [\"cardboard\", \"cardboard box\", \"cardboard pieces\", \"cardboard sheet\", \"ca"
478
+ }
479
+ ],
480
+ "object_overgeneration_examples": [
481
+ {
482
+ "id": "a1012a57-385e-45a9-8a59-694a26fe92a5__ep1:qa:19",
483
+ "episode_id": "a1012a57-385e-45a9-8a59-694a26fe92a5__ep1",
484
+ "true_action": "Start cutting",
485
+ "predicted_object_count": 175,
486
+ "first_predicted_objects": [
487
+ "cardboard",
488
+ "cardboard box",
489
+ "cardboard pieces",
490
+ "cardboard sheet",
491
+ "cardboard square",
492
+ "cardboard tray",
493
+ "cardboard tube",
494
+ "utility knife",
495
+ "scissors",
496
+ "ruler",
497
+ "pen",
498
+ "marker",
499
+ "box",
500
+ "container",
501
+ "plastic container",
502
+ "tin can",
503
+ "jar",
504
+ "canned food",
505
+ "canned goods",
506
+ "canned product"
507
+ ]
508
+ },
509
+ {
510
+ "id": "33f7ae08-ac1d-4321-9cb9-eca79016b359__ep1:qa:70",
511
+ "episode_id": "33f7ae08-ac1d-4321-9cb9-eca79016b359__ep1",
512
+ "true_action": "Reach for wire hangers",
513
+ "predicted_object_count": 53,
514
+ "first_predicted_objects": [
515
+ "cardboard",
516
+ "cardboard box",
517
+ "cardboard pieces",
518
+ "cardboard shapes",
519
+ "cardboard squares",
520
+ "cardboard tray",
521
+ "cardboard tube",
522
+ "cardboard pieces",
523
+ "cardboard shapes",
524
+ "cardboard squares",
525
+ "cardboard tray",
526
+ "cardboard tube",
527
+ "blue foam pieces",
528
+ "blue foam sheet",
529
+ "blue product box",
530
+ "blue strip",
531
+ "canned food",
532
+ "canned goods",
533
+ "canned items",
534
+ "cans"
535
+ ]
536
+ },
537
+ {
538
+ "id": "ba045ed4-ef25-404d-b756-8dcbd45b18fa__ep2:qa:30",
539
+ "episode_id": "ba045ed4-ef25-404d-b756-8dcbd45b18fa__ep2",
540
+ "true_action": "Grasp lantern",
541
+ "predicted_object_count": 119,
542
+ "first_predicted_objects": [
543
+ "jar",
544
+ "red bowl",
545
+ "cardboard box",
546
+ "white paper",
547
+ "black bag",
548
+ "white bag",
549
+ "plastic bag",
550
+ "cardboard pieces",
551
+ "cardboard tray",
552
+ "cardboard sheet",
553
+ "cardboard shape",
554
+ "cardboard tube",
555
+ "cardboard strip",
556
+ "cardboard pattern",
557
+ "cardboard cutout",
558
+ "cardboard square",
559
+ "cardboard stack",
560
+ "plastic container",
561
+ "canned food",
562
+ "tin can"
563
+ ]
564
+ },
565
+ {
566
+ "id": "ba045ed4-ef25-404d-b756-8dcbd45b18fa__ep2:qa:176",
567
+ "episode_id": "ba045ed4-ef25-404d-b756-8dcbd45b18fa__ep2",
568
+ "true_action": "Release lantern",
569
+ "predicted_object_count": 205,
570
+ "first_predicted_objects": [
571
+ "jar",
572
+ "gift box",
573
+ "cardboard",
574
+ "paper lantern",
575
+ "plastic bag",
576
+ "plastic container",
577
+ "shopping bag",
578
+ "cardboard box",
579
+ "cardboard piece",
580
+ "cardboard tray",
581
+ "cardboard sheet",
582
+ "cardboard shape",
583
+ "cardboard pattern",
584
+ "cardboard square",
585
+ "cardboard strip",
586
+ "cardboard tube",
587
+ "cardboard piece",
588
+ "cardboard cutout",
589
+ "cardboard pattern piece",
590
+ "box"
591
+ ]
592
+ },
593
+ {
594
+ "id": "1796b943-caad-43c6-b9bd-80b8d601f37d__ep1:qa:40",
595
+ "episode_id": "1796b943-caad-43c6-b9bd-80b8d601f37d__ep1",
596
+ "true_action": "Move through the training room",
597
+ "predicted_object_count": 108,
598
+ "first_predicted_objects": [
599
+ "people",
600
+ "office chairs",
601
+ "desk",
602
+ "computer",
603
+ "laptop",
604
+ "office supplies",
605
+ "whiteboard",
606
+ "door",
607
+ "window",
608
+ "light fixture",
609
+ "wall",
610
+ "floor",
611
+ "box",
612
+ "cardboard",
613
+ "paper",
614
+ "plastic container",
615
+ "jar",
616
+ "bottle",
617
+ "canned food",
618
+ "snack package"
619
+ ]
620
+ }
621
+ ],
622
+ "modality_missing_by_episode": {
623
+ "8a8e1b3c-607e-4ada-b3fd-fa639727e92c__ep1": [
624
+ "visualization.rrd"
625
+ ],
626
+ "a1012a57-385e-45a9-8a59-694a26fe92a5__ep1": [
627
+ "visualization.rrd"
628
+ ],
629
+ "33f7ae08-ac1d-4321-9cb9-eca79016b359__ep1": [
630
+ "visualization.rrd"
631
+ ],
632
+ "9c553886-83c5-4dc4-be5c-dcb269b3a771__ep2": [
633
+ "visualization.rrd"
634
+ ],
635
+ "34f07a04-eb37-45a3-95ec-189ed5f4a85b__ep5": [
636
+ "visualization.rrd"
637
+ ],
638
+ "b9dd769b-e31a-4fdb-945e-5a60db6487b0__ep2": [
639
+ "visualization.rrd"
640
+ ],
641
+ "ba045ed4-ef25-404d-b756-8dcbd45b18fa__ep2": [
642
+ "visualization.rrd"
643
+ ],
644
+ "4b02bb38-384a-438a-b5f9-6131d85c34b0__ep1": [
645
+ "visualization.rrd"
646
+ ],
647
+ "5399ef86-4df9-49bc-809f-8f4f92f9e659__ep6": [
648
+ "visualization.rrd"
649
+ ],
650
+ "b750fab3-7fbb-43a0-b451-c64c4d4a64da__ep1": [
651
+ "visualization.rrd"
652
+ ],
653
+ "877779cd-25f3-4293-a3c4-39067dd9558c__ep4": [
654
+ "visualization.rrd"
655
+ ],
656
+ "1796b943-caad-43c6-b9bd-80b8d601f37d__ep1": [
657
+ "visualization.rrd"
658
+ ],
659
+ "ba18b7c1-21ff-45da-8452-41acce7fc8de__ep2": [
660
+ "visualization.rrd"
661
+ ],
662
+ "b6579cb5-0a71-4ca6-8808-1e2700be05c7__ep3": [
663
+ "visualization.rrd"
664
+ ]
665
+ },
666
+ "interpretation": "The diagnostic pilot is dominated by invalid or weak structured outputs and exact-label failures. These tables identify where to tighten JSON constraints, action/subtask target formatting, object vocabularies, and missing-modality robustness before claiming stronger model quality."
667
+ }
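Reviewer note: the `object_f1` values in this file appear consistent with the harmonic mean of the reported `object_precision` and `object_recall`. The sketch below checks that, assuming the standard F1 definition. The part of the script that computes these fields is not visible in this section of the diff, so `f1_from_pr` is a hypothetical helper, not the committed implementation. The input numbers are copied from the `computed` block and the `cleaning` action-family group above.

```python
# Hypothetical reviewer-side check; not part of analyze_qwen3_omni_errors.py.

def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the "computed" (overall) block of the JSON above.
overall_f1 = f1_from_pr(0.19611111111111112, 0.25841874084919475)

# The "cleaning" action-family group reports precision = recall = 0.0.
cleaning_f1 = f1_from_pr(0.0, 0.0)

print(round(overall_f1, 6), cleaning_f1)
```

If the recomputed value ever drifts from the stored `object_f1`, that would point to a different averaging scheme (for example, a per-sample mean of F1) rather than a single F1 over pooled counts.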
results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/missing_modality_error_analysis.csv ADDED
@@ -0,0 +1,2 @@
+ group,samples,parsed_prediction_rate,action_exact_rate,subtask_exact_rate,transition_exact_rate,next_action_exact_rate,contact_exact_rate,object_precision,object_recall,object_f1
+ rrd_missing_only_required_modalities_present,448,0.8772321428571429,0.024553571428571428,0.006696428571428571,0.8504464285714286,0.024553571428571428,0.6450892857142857,0.19611111111111112,0.25841874084919475,0.22299431459254582
results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/object_category_error_analysis.csv ADDED
@@ -0,0 +1,11 @@
+ group,samples,parsed_prediction_rate,action_exact_rate,subtask_exact_rate,transition_exact_rate,next_action_exact_rate,contact_exact_rate,object_precision,object_recall,object_f1
+ furniture_room,96,0.71875,0.0,0.0,0.7083333333333334,0.0,0.4166666666666667,0.2534246575342466,0.2334384858044164,0.24302134646962234
+ other_object,135,0.7925925925925926,0.02962962962962963,0.007407407407407408,0.762962962962963,0.02962962962962963,0.6,0.13717693836978131,0.16428571428571428,0.1495124593716143
+ food_kitchen,56,0.8571428571428571,0.0,0.0,0.8214285714285714,0.0,0.7678571428571429,0.22277227722772278,0.2,0.2107728337236534
+ cleaning,8,0.875,0.0,0.0,0.875,0.0,0.625,0.04,0.047619047619047616,0.043478260869565216
+ phone_device,162,0.9074074074074074,0.024691358024691357,0.006172839506172839,0.8703703703703703,0.024691358024691357,0.5864197530864198,0.32521739130434785,0.3132328308207705,0.31911262798634815
+ paper_cardboard,261,0.9080459770114943,0.034482758620689655,0.011494252873563218,0.8888888888888888,0.034482758620689655,0.7203065134099617,0.22274881516587677,0.32339449541284404,0.2637979420018709
+ craft_small_object,106,0.9339622641509434,0.02830188679245283,0.009433962264150943,0.9150943396226415,0.02830188679245283,0.5,0.22662889518413598,0.25806451612903225,0.24132730015082954
+ retail_container,101,0.9405940594059405,0.0297029702970297,0.0,0.9108910891089109,0.0297029702970297,0.7722772277227723,0.20279720279720279,0.17522658610271905,0.18800648298217182
+ tool_stationery,138,0.9565217391304348,0.014492753623188406,0.0,0.9347826086956522,0.014492753623188406,0.8043478260869565,0.27906976744186046,0.3894523326572008,0.32514817950889074
+ no_object_label,2,1.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0
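Reviewer note: unlike the action-family table, the object-category rows do not partition the 448 held-out rows. Their `samples` column sums to more than 448, which suggests a row is counted once for each category its labeled objects match. That reading is an inference from the counts only; the grouping logic is not visible in this part of the diff. A minimal sketch of the check, with counts copied from the committed tables and a hypothetical helper name `is_partition`:

```python
# Hypothetical reviewer-side check; not part of analyze_qwen3_omni_errors.py.
HELD_OUT_ROWS = 448  # "source_prediction_rows" in error_analysis_summary.json

# "samples" column of object_category_error_analysis.csv.
OBJECT_CATEGORY_SAMPLES = {
    "furniture_room": 96, "other_object": 135, "food_kitchen": 56,
    "cleaning": 8, "phone_device": 162, "paper_cardboard": 261,
    "craft_small_object": 106, "retail_container": 101,
    "tool_stationery": 138, "no_object_label": 2,
}

# "samples" values of "action_family_groups" in the summary JSON.
ACTION_FAMILY_SAMPLES = {
    "locomotion": 23, "food_kitchen": 5, "cleaning": 8, "other": 94,
    "phone_use": 51, "paper_cardboard_craft": 142,
    "small_object_sorting": 87, "retail_stocking": 38,
}


def is_partition(counts: dict[str, int], total: int) -> bool:
    """True when the group sizes add up to exactly the total row count."""
    return sum(counts.values()) == total


object_total = sum(OBJECT_CATEGORY_SAMPLES.values())
print(is_partition(ACTION_FAMILY_SAMPLES, HELD_OUT_ROWS))
print(is_partition(OBJECT_CATEGORY_SAMPLES, HELD_OUT_ROWS), object_total)
```

Because the object-category groups overlap, their per-group rates should not be sample-weighted back into an overall rate the way the action-family or train-seen groups can be.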
results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/train_seen_error_analysis.csv ADDED
@@ -0,0 +1,3 @@
+ group,samples,parsed_prediction_rate,action_exact_rate,subtask_exact_rate,transition_exact_rate,next_action_exact_rate,contact_exact_rate,object_precision,object_recall,object_f1
+ unseen_in_train,317,0.8454258675078864,0.015772870662460567,0.006309148264984227,0.8233438485804416,0.015772870662460567,0.6151419558359621,0.15804806991988346,0.23183760683760685,0.18796015591165008
+ seen_in_train,131,0.9541984732824428,0.04580152671755725,0.007633587786259542,0.916030534351145,0.04580152671755725,0.7175572519083969,0.3185011709601874,0.31627906976744186,0.3173862310385064
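Reviewer note: the two train-seen rows split the 448 held-out rows into 317 unseen and 131 seen, so the sample-weighted mean of a per-group rate should reproduce the overall rate in `error_analysis_summary.json`. The sketch below runs that consistency check for two columns. `weighted_rate` is a hypothetical helper, not part of the committed script, and the numbers are copied from the CSV rows above.

```python
# Hypothetical reviewer-side check; not part of analyze_qwen3_omni_errors.py.
# Each tuple: (samples, parsed_prediction_rate, transition_exact_rate).
GROUPS = {
    "unseen_in_train": (317, 0.8454258675078864, 0.8233438485804416),
    "seen_in_train": (131, 0.9541984732824428, 0.916030534351145),
}


def weighted_rate(groups: dict[str, tuple], column: int) -> float:
    """Sample-weighted mean of one rate column across disjoint groups."""
    total = sum(row[0] for row in groups.values())
    return sum(row[0] * row[column] for row in groups.values()) / total


total_samples = sum(row[0] for row in GROUPS.values())
parsed = weighted_rate(GROUPS, 1)
transition = weighted_rate(GROUPS, 2)
print(total_samples, round(parsed, 6), round(transition, 6))
```

The same check does not apply to `object_precision`, `object_recall`, or `object_f1`, which are ratios over object counts rather than per-row rates.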
scripts/build_artifact_index.py CHANGED
@@ -129,6 +129,14 @@ ARTIFACTS = [
  "surface": "repo_hf",
  "shows": "Builds synthetic verified packages for every configured backbone and audits them against the public-safe package contract.",
  },
+ {
+ "id": "qwen3_omni_error_analysis_script",
+ "title": "Qwen3-Omni held-out error-analysis script",
+ "path": "scripts/omni/analyze_qwen3_omni_errors.py",
+ "kind": "scaleup_contract",
+ "surface": "repo_hf",
+ "shows": "Computes public-safe held-out error-analysis tables by episode, action family, train-seen status, required-modality state, and object category.",
+ },
  {
  "id": "additional_development_directions",
  "title": "Additional development directions",
@@ -674,6 +682,22 @@ ARTIFACTS = [
  "surface": "repo_hf",
  "shows": "Documents the public multi-episode access status and 32-episode pilot selection.",
  },
+ {
+ "id": "qwen3_omni_error_analysis_report",
+ "title": "Qwen3-Omni held-out error-analysis report",
+ "path": "results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/ERROR_ANALYSIS.md",
+ "kind": "scaleup_status",
+ "surface": "repo_hf",
+ "shows": "Summarizes validation-aware Qwen3-Omni held-out failures by episode, action family, train-seen status, required-modality state, and object category.",
+ },
+ {
+ "id": "qwen3_omni_error_analysis_json",
+ "title": "Qwen3-Omni held-out error-analysis JSON",
+ "path": "results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/error_analysis_summary.json",
+ "kind": "scaleup_status",
+ "surface": "repo_hf",
+ "shows": "Machine-readable Qwen3-Omni held-out error analysis with grouped metrics and sanitized failure examples.",
+ },
  {
  "id": "citation",
  "title": "Citation metadata",
scripts/omni/analyze_qwen3_omni_errors.py ADDED
@@ -0,0 +1,370 @@
+#!/usr/bin/env python3
+"""Analyze public-safe Qwen3-Omni held-out prediction errors.
+
+The script consumes a verified public package, not raw Xperience-10M data. It
+summarizes where the diagnostic pilot fails by episode, train-seen status,
+coarse action family, object category, parsed prediction state, and
+required-modality state. The outputs are small derived CSV/JSON/Markdown
+artifacts suitable for the public package.
+"""
+
+from __future__ import annotations
+
+import argparse
+import csv
+import json
+from collections import Counter, defaultdict
+from pathlib import Path
+from typing import Any
+
+
+DEFAULT_PACKAGE = (
+    Path(__file__).resolve().parents[2]
+    / "results/omni_finetune/verified_public/"
+    / "xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval"
+)
+
+ACTION_FAMILIES = [
+    ("phone_use", ("phone", "smartphone", "watch", "screen")),
+    ("paper_cardboard_craft", ("paper", "cardboard", "fold", "cut", "draw", "mark", "ruler", "scissors", "lantern", "star")),
+    ("retail_stocking", ("shelf", "product", "can", "canned", "container", "box", "grocery", "stock")),
+    ("small_object_sorting", ("bead", "button", "tile", "mahjong", "puzzle", "piece")),
+    ("cleaning", ("clean", "wipe", "wash", "vacuum", "sweep", "trash")),
+    ("locomotion", ("walk", "approach", "enter", "move through", "arrive", "leave")),
+    ("food_kitchen", ("kettle", "rice", "saucepan", "kitchen", "bottle", "jar", "lid")),
+]
+
+OBJECT_CATEGORIES = [
+    ("phone_device", ("phone", "smartphone", "watch", "charger", "cable", "power bank", "earbud")),
+    ("paper_cardboard", ("paper", "cardboard", "lantern", "origami", "star", "ribbon")),
+    ("tool_stationery", ("scissors", "knife", "ruler", "marker", "pen", "stapler", "glue", "tape")),
+    ("retail_container", ("shelf", "container", "product", "box", "can", "canned", "package", "bag")),
+    ("furniture_room", ("table", "chair", "desk", "counter", "sink", "door", "wall", "floor")),
+    ("food_kitchen", ("kettle", "rice", "saucepan", "jar", "bottle", "food", "kitchen")),
+    ("craft_small_object", ("bead", "button", "tile", "mahjong", "puzzle", "foam", "piece")),
+    ("cleaning", ("vacuum", "broom", "cloth", "towel", "trash")),
+]
+
+REQUIRED_VIDEO_FILES = {
+    "fisheye_cam0.mp4",
+    "fisheye_cam1.mp4",
+    "fisheye_cam2.mp4",
+    "fisheye_cam3.mp4",
+    "stereo_left.mp4",
+    "stereo_right.mp4",
+}
+
+REQUIRED_HDF5_MODALITIES = {
+    "calibration",
+    "slam_pose",
+    "slam_point_cloud",
+    "depth",
+    "depth_confidence",
+    "hand_mocap",
+    "body_mocap",
+    "contacts",
+    "imu",
+    "caption",
+}
+
+
+def parse_args() -> argparse.Namespace:
+    parser = argparse.ArgumentParser(description=__doc__)
+    parser.add_argument("--package-dir", type=Path, default=DEFAULT_PACKAGE)
+    parser.add_argument("--output-dir", type=Path)
+    parser.add_argument("--max-examples", type=int, default=12)
+    return parser.parse_args()
+
+
+def load_json(path: Path) -> dict[str, Any]:
+    return json.loads(path.read_text(encoding="utf-8"))
+
+
+def load_jsonl(path: Path) -> list[dict[str, Any]]:
+    rows = []
+    with path.open("r", encoding="utf-8") as handle:
+        for line in handle:
+            line = line.strip()
+            if line:
+                rows.append(json.loads(line))
+    return rows
+
+
+def norm(value: Any) -> str:
+    return str(value or "").strip().lower()
+
+
+def family_for(text: str, families: list[tuple[str, tuple[str, ...]]], fallback: str = "other") -> str:
+    low = norm(text)
+    for name, keywords in families:
+        if any(keyword in low for keyword in keywords):
+            return name
+    return fallback
+
+
+def object_categories(objects: list[Any]) -> set[str]:
+    categories: set[str] = set()
+    for obj in objects:
+        categories.add(family_for(str(obj), OBJECT_CATEGORIES, "other_object"))
+    return categories or {"no_object_label"}
+
+
+def f1(precision: float, recall: float) -> float:
+    if precision + recall == 0:
+        return 0.0
+    return 2 * precision * recall / (precision + recall)
+
+
+def bool_metric(row: dict[str, Any], key: str) -> bool:
+    true_json = row.get("true_json") or {}
+    pred_json = row.get("pred_json") or {}
+    return norm(true_json.get(key)) == norm(pred_json.get(key)) and bool(pred_json)
+
+
+def object_overlap(row: dict[str, Any]) -> tuple[int, int, int]:
+    true_objects = {norm(item) for item in (row.get("true_json") or {}).get("objects", []) if norm(item)}
+    pred_objects = {norm(item) for item in (row.get("pred_json") or {}).get("objects", []) if norm(item)}
+    return len(true_objects & pred_objects), len(pred_objects), len(true_objects)
+
+
+def modality_state(episode: dict[str, Any] | None) -> tuple[str, list[str]]:
+    if not episode:
+        return "episode_manifest_missing", ["episode_manifest_missing"]
+    missing: list[str] = []
+    files = {str(item.get("name")): bool(item.get("exists")) for item in episode.get("files", [])}
+    for filename in sorted(REQUIRED_VIDEO_FILES):
+        if not files.get(filename):
+            missing.append(filename)
+    hdf5 = episode.get("hdf5_modalities") or {}
+    for modality in sorted(REQUIRED_HDF5_MODALITIES):
+        if not hdf5.get(modality):
+            missing.append(modality)
+    if missing:
+        return "missing_required_modalities", missing
+    if files.get("visualization.rrd") is False:
+        return "rrd_missing_only_required_modalities_present", ["visualization.rrd"]
+    return "required_modalities_present", []
+
+
+def add_row_stats(bucket: dict[str, Any], row: dict[str, Any]) -> None:
+    bucket["samples"] += 1
+    valid = bool(row.get("pred_json"))
+    bucket["parsed_predictions"] += int(valid)
+    bucket["action_exact"] += int(bool_metric(row, "action"))
+    bucket["subtask_exact"] += int(bool_metric(row, "subtask"))
+    bucket["transition_exact"] += int(bool_metric(row, "transition"))
+    bucket["next_action_exact"] += int(bool_metric(row, "next_action"))
+    bucket["contact_exact"] += int(bool_metric(row, "contact"))
+    matched, pred_count, true_count = object_overlap(row)
+    bucket["object_matched"] += matched
+    bucket["object_predicted"] += pred_count
+    bucket["object_true"] += true_count
+
+
+def empty_bucket() -> dict[str, Any]:
+    return {
+        "samples": 0,
+        "parsed_predictions": 0,
+        "action_exact": 0,
+        "subtask_exact": 0,
+        "transition_exact": 0,
+        "next_action_exact": 0,
+        "contact_exact": 0,
+        "object_matched": 0,
+        "object_predicted": 0,
+        "object_true": 0,
+    }
+
+
+def finalize_bucket(name: str, bucket: dict[str, Any]) -> dict[str, Any]:
+    samples = max(int(bucket["samples"]), 1)
+    precision = bucket["object_matched"] / bucket["object_predicted"] if bucket["object_predicted"] else 0.0
+    recall = bucket["object_matched"] / bucket["object_true"] if bucket["object_true"] else 0.0
+    return {
+        "group": name,
+        "samples": bucket["samples"],
+        "parsed_prediction_rate": bucket["parsed_predictions"] / samples,
+        "action_exact_rate": bucket["action_exact"] / samples,
+        "subtask_exact_rate": bucket["subtask_exact"] / samples,
+        "transition_exact_rate": bucket["transition_exact"] / samples,
+        "next_action_exact_rate": bucket["next_action_exact"] / samples,
+        "contact_exact_rate": bucket["contact_exact"] / samples,
+        "object_precision": precision,
+        "object_recall": recall,
+        "object_f1": f1(precision, recall),
+    }
+
+
+def write_csv(path: Path, rows: list[dict[str, Any]]) -> None:
+    path.parent.mkdir(parents=True, exist_ok=True)
+    if not rows:
+        path.write_text("", encoding="utf-8")
+        return
+    with path.open("w", encoding="utf-8", newline="") as handle:
+        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()), lineterminator="\n")
+        writer.writeheader()
+        writer.writerows(rows)
+
+
+def top_rows(groups: dict[str, dict[str, Any]], *, min_samples: int = 1, reverse: bool = False) -> list[dict[str, Any]]:
+    rows = [finalize_bucket(name, bucket) for name, bucket in groups.items() if bucket["samples"] >= min_samples]
+    return sorted(rows, key=lambda row: (row["parsed_prediction_rate"], row["action_exact_rate"], row["samples"]), reverse=reverse)
+
+
+def markdown_table(rows: list[dict[str, Any]], columns: list[str], limit: int = 8) -> list[str]:
+    selected = rows[:limit]
+    if not selected:
+        return ["No rows."]
+    lines = ["| " + " | ".join(columns) + " |", "| " + " | ".join("---" for _ in columns) + " |"]
+    for row in selected:
+        values = []
+        for col in columns:
+            value = row.get(col)
+            if isinstance(value, float):
+                values.append(f"{value:.4f}")
+            else:
+                values.append(str(value))
+        lines.append("| " + " | ".join(values) + " |")
+    return lines
+
+
+def main() -> int:
+    args = parse_args()
+    package_dir = args.package_dir.expanduser().resolve()
+    output_dir = args.output_dir or package_dir / "analysis"
+    output_dir = output_dir.expanduser().resolve()
+
+    predictions = load_jsonl(package_dir / "eval" / "predictions.jsonl")
+    metrics = load_json(package_dir / "eval" / "metrics.json")
+    episode_manifest = load_json(package_dir / "dataset" / "episode_manifest.json")
+    episodes = {episode.get("episode_id"): episode for episode in episode_manifest.get("episodes", [])}
+
+    overall = empty_bucket()
+    by_episode: dict[str, dict[str, Any]] = defaultdict(empty_bucket)
+    by_family: dict[str, dict[str, Any]] = defaultdict(empty_bucket)
+    by_seen: dict[str, dict[str, Any]] = defaultdict(empty_bucket)
+    by_modality: dict[str, dict[str, Any]] = defaultdict(empty_bucket)
+    by_object_category: dict[str, dict[str, Any]] = defaultdict(empty_bucket)
+    invalid_examples = []
+    overgenerated_examples = []
+    modality_missing_by_episode: dict[str, list[str]] = {}
+
+    for row in predictions:
+        episode_id = str(row.get("episode_id"))
+        true_json = row.get("true_json") or {}
+        pred_json = row.get("pred_json") or {}
+        add_row_stats(overall, row)
+        add_row_stats(by_episode[episode_id], row)
+        add_row_stats(by_family[family_for(str(true_json.get("action")), ACTION_FAMILIES)], row)
+        add_row_stats(by_seen["seen_in_train" if row.get("true_label_seen_in_train") else "unseen_in_train"], row)
+        state, missing = modality_state(episodes.get(episode_id))
+        modality_missing_by_episode.setdefault(episode_id, missing)
+        add_row_stats(by_modality[state], row)
+        for category in object_categories(true_json.get("objects", [])):
+            add_row_stats(by_object_category[category], row)
+        if not pred_json and len(invalid_examples) < args.max_examples:
+            invalid_examples.append({
+                "id": row.get("id"),
+                "episode_id": episode_id,
+                "true_action": true_json.get("action"),
+                "raw_prediction_prefix": str(row.get("raw_prediction", ""))[:240],
+            })
+        pred_objects = pred_json.get("objects", []) if isinstance(pred_json, dict) else []
+        if len(pred_objects) > 20 and len(overgenerated_examples) < args.max_examples:
+            overgenerated_examples.append({
+                "id": row.get("id"),
+                "episode_id": episode_id,
+                "true_action": true_json.get("action"),
+                "predicted_object_count": len(pred_objects),
+                "first_predicted_objects": pred_objects[:20],
+            })
+
+    episode_rows = top_rows(by_episode)
+    family_rows = top_rows(by_family)
+    seen_rows = top_rows(by_seen)
+    modality_rows = top_rows(by_modality)
+    object_rows = top_rows(by_object_category)
+
+    write_csv(output_dir / "episode_error_analysis.csv", episode_rows)
+    write_csv(output_dir / "action_family_error_analysis.csv", family_rows)
+    write_csv(output_dir / "train_seen_error_analysis.csv", seen_rows)
+    write_csv(output_dir / "missing_modality_error_analysis.csv", modality_rows)
+    write_csv(output_dir / "object_category_error_analysis.csv", object_rows)
+
+    summary = {
+        "status": "pass",
+        "source_package": package_dir.name,
+        "source_prediction_rows": len(predictions),
+        "metrics_json_validity_rate": metrics.get("json_validity_rate"),
+        "computed": finalize_bucket("overall", overall),
+        "worst_episode_groups": episode_rows[:8],
+        "action_family_groups": family_rows,
+        "train_seen_groups": seen_rows,
+        "missing_modality_groups": modality_rows,
+        "object_category_groups": object_rows,
+        "invalid_json_examples": invalid_examples,
+        "object_overgeneration_examples": overgenerated_examples,
+        "modality_missing_by_episode": modality_missing_by_episode,
+        "interpretation": (
+            "The diagnostic pilot is dominated by invalid or weak structured outputs and exact-label failures. "
+            "These tables identify where to tighten JSON constraints, action/subtask target formatting, object vocabularies, "
+            "and missing-modality robustness before claiming stronger model quality."
+        ),
+    }
+    (output_dir / "error_analysis_summary.json").write_text(json.dumps(summary, indent=2) + "\n", encoding="utf-8")
+
+    report = [
+        "# Qwen3-Omni Held-Out Error Analysis",
+        "",
+        "This report is computed from the verified public package predictions. It contains only derived metrics and sanitized examples.",
+        "",
+        "## Overall",
+        "",
+        f"- Prediction rows: `{len(predictions)}`",
+        f"- JSON validity from `metrics.json`: `{summary['metrics_json_validity_rate']:.4f}`",
+        f"- Parsed prediction rate from public rows: `{summary['computed']['parsed_prediction_rate']:.4f}`",
+        f"- Action exact rate: `{summary['computed']['action_exact_rate']:.4f}`",
+        f"- Subtask exact rate: `{summary['computed']['subtask_exact_rate']:.4f}`",
+        f"- Contact exact rate: `{summary['computed']['contact_exact_rate']:.4f}`",
+        f"- Object F1: `{summary['computed']['object_f1']:.4f}`",
+        "",
+        "## Weakest Episode Groups",
+        "",
+        *markdown_table(episode_rows, ["group", "samples", "parsed_prediction_rate", "action_exact_rate", "object_f1"]),
+        "",
+        "## Action Families",
+        "",
+        *markdown_table(family_rows, ["group", "samples", "parsed_prediction_rate", "action_exact_rate", "subtask_exact_rate", "object_f1"]),
+        "",
+        "## Train-Seen Split",
+        "",
+        *markdown_table(seen_rows, ["group", "samples", "parsed_prediction_rate", "action_exact_rate", "next_action_exact_rate"]),
+        "",
+        "## Required-Modality State",
+        "",
+        *markdown_table(modality_rows, ["group", "samples", "parsed_prediction_rate", "action_exact_rate", "object_f1"]),
+        "",
+        "## Object Categories",
+        "",
+        *markdown_table(object_rows, ["group", "samples", "object_precision", "object_recall", "object_f1"]),
+        "",
+        "## Interpretation",
+        "",
+        summary["interpretation"],
+        "",
+        "Generated files:",
+        "",
+        "- `error_analysis_summary.json`",
+        "- `episode_error_analysis.csv`",
+        "- `action_family_error_analysis.csv`",
+        "- `train_seen_error_analysis.csv`",
+        "- `missing_modality_error_analysis.csv`",
+        "- `object_category_error_analysis.csv`",
+    ]
+    (output_dir / "ERROR_ANALYSIS.md").write_text("\n".join(report) + "\n", encoding="utf-8")
+    print(json.dumps({"status": "pass", "output_dir": str(output_dir), "prediction_rows": len(predictions)}, indent=2))
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
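Reviewer note: the object metrics in this script are micro-averaged. Matched, predicted, and true object counts are summed per bucket, and precision, recall, and F1 are computed once from the totals. The sketch below re-implements that arithmetic in isolation to make it easy to check by hand. The helper name `micro_object_scores` and the two sample rows are hypothetical and do not come from the public package; `norm` and `object_overlap` follow the definitions in the diff above.

```python
from typing import Any


def norm(value: Any) -> str:
    """Lowercase and strip a label, treating None as an empty string."""
    return str(value or "").strip().lower()


def object_overlap(row: dict[str, Any]) -> tuple[int, int, int]:
    """Return (matched, predicted, true) object counts for one prediction row."""
    true_objects = {norm(o) for o in (row.get("true_json") or {}).get("objects", []) if norm(o)}
    pred_objects = {norm(o) for o in (row.get("pred_json") or {}).get("objects", []) if norm(o)}
    return len(true_objects & pred_objects), len(pred_objects), len(true_objects)


def micro_object_scores(rows: list[dict[str, Any]]) -> tuple[float, float, float]:
    """Hypothetical helper: sum counts over rows, then compute P/R/F1 once."""
    matched = predicted = true = 0
    for row in rows:
        m, p, t = object_overlap(row)
        matched += m
        predicted += p
        true += t
    precision = matched / predicted if predicted else 0.0
    recall = matched / true if true else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Illustrative rows only (not taken from predictions.jsonl).
rows = [
    {"true_json": {"objects": ["Phone", "table"]}, "pred_json": {"objects": ["phone", "chair"]}},
    {"true_json": {"objects": ["scissors"]}, "pred_json": None},  # unparsed prediction
]
precision, recall, f1 = micro_object_scores(rows)
print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```

Because the counts are pooled, a row whose prediction failed to parse adds nothing to the predicted total but still adds its true objects, so it lowers recall without affecting precision.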
scripts/validate_mirror_parity.py CHANGED
@@ -30,6 +30,7 @@ DATA_FILES = [
     "foundation_model_plan.json",
     "live_publication_status.json",
     "modality_atlas.json",
+    "omni_finetune_verified_result.json",
     "project_brief.json",
     "project_manifest.json",
     "project_packet.json",
@@ -76,6 +77,7 @@ ASSET_FILES = [
 ]
 
 SCRIPT_FILES = [
+    "omni/analyze_qwen3_omni_errors.py",
     "audio_ablation_and_raw_upgrade.py",
     "build_artifact_index.py",
     "build_brand_assets.py",
@@ -122,9 +124,18 @@ RESULT_FILES = [
     "single_episode_diagnostics/timeline_overlay/timeline_overlay.csv",
     "single_episode_diagnostics/alignment_stress/alignment_shift_metrics.csv",
     "single_episode_diagnostics/alignment_stress/alignment_stress_summary.json",
+    "omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/ERROR_ANALYSIS.md",
+    "omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/error_analysis_summary.json",
+    "omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/episode_error_analysis.csv",
+    "omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/action_family_error_analysis.csv",
+    "omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/train_seen_error_analysis.csv",
+    "omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/missing_modality_error_analysis.csv",
+    "omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_96train_16val_16test_valmon_20260605_eval/analysis/object_category_error_analysis.csv",
 ]
 
 DOC_FILES = [
+    "ARTIFACT_GUIDE.md",
+    "OMNI_MODEL_EXTENSION_CONTRACT.md",
     "QUALITY_GATES.md",
     "EVALUATION_PROTOCOL.md",
     "FIGURE_INDEX.md",