cy0307 committed on
Commit
bfcf156
·
verified ·
1 Parent(s): e11423d

Add Xperience embodied foundation pretraining goal

ARTIFACT_GUIDE.md CHANGED
@@ -3,15 +3,17 @@
3
  This guide is the human-readable map for the public Ropedia Xperience-10M task
4
  suite artifacts. It is organized around what a reader usually wants to do:
5
  understand the project, inspect the sample episode, compare baselines, read the
6
- task results, and follow the Qwen3-Omni scale-up path.
 
7
 
8
  ## Start Here
9
 
10
  | Artifact | Why to open it first |
11
  | --- | --- |
12
  | [`PROJECT_STATUS.md`](PROJECT_STATUS.md) | Gives the fastest current-state table: implemented, in staging, and outside current scope. |
13
- | [`RESEARCH_ROADMAP.md`](RESEARCH_ROADMAP.md) | Shows the roadmap from public-sample task development to multi-episode data preparation, Qwen3-Omni LoRA, robustness runs, and larger omni-model extensions. |
14
  | [`FOUNDATION_MODEL_PLAN.md`](FOUNDATION_MODEL_PLAN.md) | Explains which foundation backbones fit which Xperience-10M objective: Qwen3-Omni first, Cosmos 3 for world modeling, and VLA/policy models after action-target conversion. |
 
15
  | [`EVALUATION_PROTOCOL.md`](EVALUATION_PROTOCOL.md) | Defines the task unit, chronological split, metrics, leakage controls, and current limitations. |
16
  | [`REPRODUCIBILITY.md`](REPRODUCIBILITY.md) | Defines public reproduction commands, expected outputs, and unreproducible boundaries. |
17
  | [`results/audio_ablation/AUDIO_ABLATION_SUMMARY.md`](results/audio_ablation/AUDIO_ABLATION_SUMMARY.md) | Shows measured current-audio and raw log-mel replacement deltas across the 12 task contracts. |
@@ -107,6 +109,7 @@ research project.
107
  | [`scripts/omni/train_qwen3_omni_lora.py`](scripts/omni/train_qwen3_omni_lora.py) | Training entrypoint for the Qwen3-Omni LoRA pilot after the data gate passes. |
108
  | [`FOUNDATION_MODEL_PLAN.md`](FOUNDATION_MODEL_PLAN.md) | Adds the post-data-gate backbone selection plan: Qwen3-Omni first, Cosmos 3 for world modeling, and OpenVLA/openpi/GR00T for policy/action branches. |
109
  | [`docs/data/foundation_model_plan.json`](docs/data/foundation_model_plan.json) | Machine-readable model-family registry with source links, entry conditions, and evaluation additions. |
 
110
 
111
  ## What Is Not Included
112
 
 
3
  This guide is the human-readable map for the public Ropedia Xperience-10M task
4
  suite artifacts. It is organized around what a reader usually wants to do:
5
  understand the project, inspect the sample episode, compare baselines, read the
6
+ task results, follow the Qwen3-Omni scale-up path, and understand the longer
7
+ Xperience-native pretraining goal.
8
 
9
  ## Start Here
10
 
11
  | Artifact | Why to open it first |
12
  | --- | --- |
13
  | [`PROJECT_STATUS.md`](PROJECT_STATUS.md) | Gives the fastest current-state table: implemented, in staging, and outside current scope. |
14
+ | [`RESEARCH_ROADMAP.md`](RESEARCH_ROADMAP.md) | Shows the roadmap from public-sample task development to multi-episode data preparation, Qwen3-Omni LoRA, robustness runs, model branches, and the future native-pretraining goal. |
15
  | [`FOUNDATION_MODEL_PLAN.md`](FOUNDATION_MODEL_PLAN.md) | Explains which foundation backbones fit which Xperience-10M objective: Qwen3-Omni first, Cosmos 3 for world modeling, and VLA/policy models after action-target conversion. |
16
+ | [`XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md`](XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md) | Describes the future full-corpus Xperience Embodied Foundation Model goal, including modules, objectives, staged scale-up, hardware ranges, and evaluation. |
17
  | [`EVALUATION_PROTOCOL.md`](EVALUATION_PROTOCOL.md) | Defines the task unit, chronological split, metrics, leakage controls, and current limitations. |
18
  | [`REPRODUCIBILITY.md`](REPRODUCIBILITY.md) | Defines public reproduction commands, expected outputs, and unreproducible boundaries. |
19
  | [`results/audio_ablation/AUDIO_ABLATION_SUMMARY.md`](results/audio_ablation/AUDIO_ABLATION_SUMMARY.md) | Shows measured current-audio and raw log-mel replacement deltas across the 12 task contracts. |
 
109
  | [`scripts/omni/train_qwen3_omni_lora.py`](scripts/omni/train_qwen3_omni_lora.py) | Training entrypoint for the Qwen3-Omni LoRA pilot after the data gate passes. |
110
  | [`FOUNDATION_MODEL_PLAN.md`](FOUNDATION_MODEL_PLAN.md) | Adds the post-data-gate backbone selection plan: Qwen3-Omni first, Cosmos 3 for world modeling, and OpenVLA/openpi/GR00T for policy/action branches. |
111
  | [`docs/data/foundation_model_plan.json`](docs/data/foundation_model_plan.json) | Machine-readable model-family registry with source links, entry conditions, and evaluation additions. |
112
+ | [`XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md`](XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md) | Future full-corpus Xperience-native pretraining plan; not a current model result. |
113
 
114
  ## What Is Not Included
115
 
FOUNDATION_MODEL_PLAN.md CHANGED
@@ -20,6 +20,7 @@ run a held-out multi-episode foundation-model evaluation.
20
  | 5 | openpi pi0/pi0.5 | Open robot policy and action expert baseline | Useful for action chunking, policy fine-tuning, and embodiment transfer experiments | Candidate for policy branch once action labels are retargeted |
21
  | 6 | Gemini Robotics | Closed/API embodied reasoning reference | Strong candidate for qualitative reasoning and task interpretation, but not a local fine-tune target | Use only as an external comparison or annotation assistant |
22
  | 7 | Octo / SmolVLA-style lightweight policies | Smaller reproducible robot-policy baselines | Good for cheaper action-policy experiments, but less directly omni-modal | Optional baseline branch after selected-episode data preparation |
 
23
 
24
  ## Why Qwen3-Omni Still Goes First
25
 
@@ -38,6 +39,46 @@ prepare video/audio/language prompts and adapter inputs. It is also suitable for
38
  the 12 current task contracts, which mostly produce labels, structured JSON, or
39
  short task answers.
40
 
41
  ## Why Cosmos 3 Should Be Added Next
42
 
43
  Cosmos 3 should not replace the Qwen3-Omni pilot. It should become the first
@@ -105,6 +146,9 @@ The foundation-model stage should add metrics beyond the current 12-task suite:
105
  retargeting artifacts are traceable.
106
  6. Update public cards only when a branch has real manifests, predictions,
107
  metrics, and qualitative examples.
 
 
 
108
 
109
  ## Source Links
110
 
@@ -116,3 +160,5 @@ The foundation-model stage should add metrics beyond the current 12-task suite:
116
  - Gemini Robotics: https://deepmind.google/discover/blog/gemini-robotics-brings-ai-into-the-physical-world/
117
  - Octo: https://octo-models.github.io/
118
  - LeRobot / SmolVLA: https://github.com/huggingface/lerobot
 
 
 
20
  | 5 | openpi pi0/pi0.5 | Open robot policy and action expert baseline | Useful for action chunking, policy fine-tuning, and embodiment transfer experiments | Candidate for policy branch once action labels are retargeted |
21
  | 6 | Gemini Robotics | Closed/API embodied reasoning reference | Strong candidate for qualitative reasoning and task interpretation, but not a local fine-tune target | Use only as an external comparison or annotation assistant |
22
  | 7 | Octo / SmolVLA-style lightweight policies | Smaller reproducible robot-policy baselines | Good for cheaper action-policy experiments, but less directly omni-modal | Optional baseline branch after selected-episode data preparation |
23
+ | Future | Xperience Embodied Foundation Model | Xperience-native domain model pretrained from scratch on full-corpus embodied experience | Would learn a shared temporal representation across video, audio, depth, pose, mocap, IMU, and language | Long-term goal after smaller pilots prove value and full-corpus storage/compute are available |
24
 
25
  ## Why Qwen3-Omni Still Goes First
26
 
 
39
  the 12 current task contracts, which mostly produce labels, structured JSON, or
40
  short task answers.
41
 
42
+ The executable Qwen branch and future branch contracts are now represented as
43
+ config files under `configs/omni_backbones/`. Validate them with:
44
+
45
+ ```bash
46
+ python scripts/omni/backbone_registry.py --validate --json
47
+ ```
48
+
49
+ The shared extension rules are in
50
+ [`OMNI_MODEL_EXTENSION_CONTRACT.md`](OMNI_MODEL_EXTENSION_CONTRACT.md). A new
51
+ foundation branch should add a config first, then implement the exporter,
52
+ trainer, evaluator, and launcher required by that config.
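The config-first rule above can be illustrated with a small check. This is only a sketch: the `components` key, the `example_branch` name, and the script paths are hypothetical and are not taken from `configs/omni_backbones/` or `scripts/omni/backbone_registry.py`, whose actual schema may differ.

```python
# Hypothetical sketch of a "config first" check. The real config schema under
# configs/omni_backbones/ is not shown in this diff and may look different.
REQUIRED_COMPONENTS = ("exporter", "trainer", "evaluator", "launcher")


def missing_components(config: dict) -> list[str]:
    """Return the required components that the branch config does not declare."""
    components = config.get("components", {})
    return [name for name in REQUIRED_COMPONENTS if not components.get(name)]


# Illustrative config for a new branch that has only two components so far.
example_config = {
    "name": "example_branch",  # hypothetical branch name
    "components": {
        "exporter": "scripts/example/export.py",  # hypothetical path
        "trainer": "scripts/example/train.py",  # hypothetical path
    },
}

if __name__ == "__main__":
    print(missing_components(example_config))
```

A check of this shape would let a branch be registered early while still reporting which of the four required pieces remain unimplemented.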
53
+
54
+ ## Long-Term Native Pretraining Goal
55
+
56
+ Qwen3-Omni, Cosmos 3, GR00T, OpenVLA, and openpi are backbone choices for the
57
+ next experiments. The longer-term goal is different: train an
58
+ **Xperience Embodied Foundation Model** that is native to the Xperience-10M
59
+ modality structure.
60
+
61
+ That model would not start as a general internet-scale omni model. It would be
62
+ a domain model over synchronized embodied experience: multi-view egocentric
63
+ video, audio, depth, pose/SLAM, hand and body mocap, IMU, calibration, and
64
+ language annotations. Its pretraining should combine masked multimodal
65
+ modeling, cross-modal contrastive alignment, future-state prediction,
66
+ ego-motion and hand-motion forecasting, action/procedure prediction, language
67
+ grounding, contact/affordance prediction, and optional policy-style targets
68
+ after action conversion.
69
+
70
+ This is not a current result in the repo. It becomes appropriate only after:
71
+
72
+ - the selected multi-episode pipeline trains and evaluates cleanly,
73
+ - scaling from 128 episodes to thousands of episodes shows measurable value,
74
+ - raw-corpus storage and derived-shard capacity are available,
75
+ - distributed training and checkpoint/restart infrastructure are reliable,
76
+ - evaluation covers held-out episodes, sessions, activities, objects, and
77
+ missing-modality robustness.
78
+
79
+ The full plan is documented in
80
+ [`XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md`](XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md).
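The five entry conditions listed above could be tracked as an explicit gate. The following is a minimal sketch, assuming each condition reduces to a yes/no flag; the class and field names are invented for illustration and do not correspond to any file in the repo.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class PretrainingReadiness:
    """Hypothetical yes/no flags, one per entry condition in the plan."""

    pipeline_trains_and_evaluates: bool  # selected multi-episode pipeline is clean
    scaling_shows_value: bool  # 128 -> thousands of episodes helps measurably
    storage_available: bool  # raw-corpus storage and derived-shard capacity
    distributed_infra_reliable: bool  # distributed training, checkpoint/restart
    heldout_eval_covered: bool  # held-out splits and missing-modality robustness


def unmet_conditions(readiness: PretrainingReadiness) -> list[str]:
    """List the names of the conditions that are still false."""
    return [f.name for f in fields(readiness) if not getattr(readiness, f.name)]


def ready_for_native_pretraining(readiness: PretrainingReadiness) -> bool:
    """Native pretraining is appropriate only when every condition holds."""
    return not unmet_conditions(readiness)


if __name__ == "__main__":
    status = PretrainingReadiness(True, False, False, True, False)
    print(ready_for_native_pretraining(status), unmet_conditions(status))
```

Keeping the conditions as named fields makes it easy to report exactly which prerequisite is blocking the long-term goal, rather than a single pass/fail flag.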
81
+
82
  ## Why Cosmos 3 Should Be Added Next
83
 
84
  Cosmos 3 should not replace the Qwen3-Omni pilot. It should become the first
 
146
  retargeting artifacts are traceable.
147
  6. Update public cards only when a branch has real manifests, predictions,
148
  metrics, and qualitative examples.
149
+ 7. Start Xperience-native pretraining only after smaller scaling stages,
150
+ full-corpus storage, multi-node compute, and held-out evaluation protocols
151
+ are in place.
152
 
153
  ## Source Links
154
 
 
160
  - Gemini Robotics: https://deepmind.google/discover/blog/gemini-robotics-brings-ai-into-the-physical-world/
161
  - Octo: https://octo-models.github.io/
162
  - LeRobot / SmolVLA: https://github.com/huggingface/lerobot
163
+ - Xperience Embodied Foundation Model pretraining plan:
164
+ `XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md`
PROJECT_README.md CHANGED
@@ -42,7 +42,7 @@ embodied-AI research infrastructure:
42
  | Multimodal data understanding | Parses the public sample into synchronized windows across video, audio, depth, pose/SLAM, mocap, IMU, calibration, and language-derived signals |
43
  | Task design | Defines 12 human-readable tasks plus four direction-extension probes with inputs, outputs, process modules, metrics, and case-study walkthroughs |
44
  | Model and evaluation discipline | Runs minimal and compact neural baselines, records predictions/metrics, keeps chronological split boundaries explicit, and separates sample evidence from held-out claims |
45
- | Scale-up planning | Connects the public-sample pipeline to 32/128-episode held-out pilots, Qwen3-Omni LoRA, Cosmos-style world-model branches, and later policy-model branches |
46
 
47
  ## Start Here
48
 
@@ -59,6 +59,7 @@ before the multi-episode omni-model stage becomes a real held-out evaluation.
59
  | Navigate the 12 tasks, four tracks, and scale-up plan | [Interactive research roadmap](https://chaoyue0307.github.io/ropedia-xperience-10m-task-suite/research_roadmap.html), [`docs/data/research_roadmap_interactive.json`](docs/data/research_roadmap_interactive.json) |
60
  | Compare current task metrics | [`RESEARCH_TAKEAWAYS.md`](RESEARCH_TAKEAWAYS.md), [`docs/data/summary_metrics.json`](docs/data/summary_metrics.json) |
61
  | Compare possible foundation backbones | [`FOUNDATION_MODEL_PLAN.md`](FOUNDATION_MODEL_PLAN.md), [`docs/data/foundation_model_plan.json`](docs/data/foundation_model_plan.json) |
 
62
  | Understand one model input | [`results/episode_task_suite/feature_manifest.json`](results/episode_task_suite/feature_manifest.json), [`results/episode_task_suite/windows.csv`](results/episode_task_suite/windows.csv) |
63
  | Check multi-episode data status | [`results/omni_finetune/DATA_ACCESS_STATUS.md`](results/omni_finetune/DATA_ACCESS_STATUS.md) |
64
 
@@ -71,7 +72,7 @@ before the multi-episode omni-model stage becomes a real held-out evaluation.
71
  | Task suite | 12 human-readable embodied-AI task contracts with input, process, output, metrics, predictions, and case-study walkthroughs |
72
  | Baselines | Minimal linear/ridge/logistic heads plus compact PyTorch MLP task heads over the same chronological split |
73
  | Research directions | Task mapping and extension probes for human modeling, 3D/4D reconstruction, egocentric interaction, and world modeling |
74
- | Scale-up path | The gated Xperience-10M dataset is available for a selected 128-episode pilot before Qwen3-Omni LoRA, followed by Cosmos 3/world-model and VLA/policy branches |
75
  | Public surfaces | GitHub repo, GitHub Pages dashboard, HF Space, HF artifact dataset, HF baseline-model repo, and HF collection |
76
 
77
  For the fastest interpretation of the current metrics, start with
@@ -93,100 +94,27 @@ Current contributions:
93
  - human-readable research task cards and an interactive scrub/play walkthrough storyboard for every task,
94
  - an interactive research roadmap connecting 12 tasks, four research tracks, current sample evidence, the Qwen3-Omni scale-up path, and foundation-model branch selection,
95
  - a next-milestone track for Qwen3-Omni fine-tuning, Cosmos 3 world modeling, and sensor-bridge evaluation,
 
96
  - metrics, predictions, model weights, manifests, charts, and a two-level
97
  tabbed static research website,
98
  - a clear explanation of what is implemented now and what moves to the multi-episode stage.
99
 
100
  ## Current Research Scope
101
 
102
- This repo separates implemented single-episode research artifacts from future
103
- multi-episode held-out model metrics:
104
 
105
- | Project layer | Evidence | Current scope |
106
  | --- | --- | --- |
107
- | Official Xperience-10M description | `XPERIENCE10M_DATASET_CARD_ALIGNMENT.md`, `docs/data/xperience10m_dataset_card_alignment.json` | aligns public wording with the official gated dataset card, public sample card, and HF API metadata; does not mirror raw data |
108
- | Source alignment | `SOURCE_ALIGNMENT_AUDIT.md`, `docs/data/source_alignment_audit.json`, `scripts/validate_source_alignment.py` | records the same official dataset facts, public sample details, API-listing notes, and project coverage across repo, website, and HF cards |
109
- | Figure index | `FIGURE_INDEX.md`, `docs/data/figure_index.json`, `scripts/build_figure_index.py` | catalogs public figures, charts, modality thumbnails, dimensions, hashes, roles, and source scripts |
110
- | Brand assets | `docs/assets/brand/`, `docs/favicon.png`, `docs/apple-touch-icon.png`, `scripts/build_brand_assets.py` | applies the generated project logo system across the website, README, HF cards, favicon, and social previews |
111
- | Data windows | `results/episode_task_suite/windows.csv`, `shared_windows.npz`, `summary_report.json` | one public sample episode |
112
- | Feature contract | `results/episode_task_suite/feature_manifest.json`, `available_modalities.json` | documents the 8,546-dimensional multimodal representation and source coverage |
113
- | Evaluation protocol | `EVALUATION_PROTOCOL.md`, `docs/data/evaluation_protocol.json`, `scripts/build_evaluation_protocol.py` | defines windowing, chronological split, leakage controls, per-task metrics, and current limitations |
114
- | Research Takeaways | `RESEARCH_TAKEAWAYS.md`, `docs/data/research_takeaways.json`, `scripts/build_research_takeaways.py` | summarizes result interpretation from committed metrics and identifies which experiments need held-out episodes |
115
- | Audio ablation | `scripts/audio_ablation_and_raw_upgrade.py`, `results/audio_ablation/`, `docs/data/audio_ablation_summary.json` | measures whether audio helps each of the 12 task contracts |
116
- | Research roadmap | `RESEARCH_ROADMAP.md`, `docs/research_roadmap.html`, `docs/data/research_roadmap.json`, `docs/data/research_roadmap_interactive.json` | stages and visualizes the path from public-sample task development to multi-episode held-out evaluation, foundation-model selection, and larger omni/world-model extensions |
117
- | Foundation-model plan | `FOUNDATION_MODEL_PLAN.md`, `docs/data/foundation_model_plan.json` | keeps Qwen3-Omni as the first trainable pilot, adds Cosmos 3 as the first world-model branch, and tracks OpenVLA/openpi/GR00T policy candidates |
118
- | 12-task suite | `scripts/episode_task_suite.py`, per-task `metrics.json`, predictions | chronological single-episode split |
119
- | Single-episode diagnostics | `scripts/single_episode_diagnostics.py`, `results/single_episode_diagnostics/`, `docs/single_episode_explorer.html` | modality ablations, timeline overlay, object-label export, alignment stress tests, and interactive window inspection from one sample episode |
120
- | Neural heads | `scripts/neural_task_models.py`, `results/episode_task_suite/neural_mlp/` | compact MLP heads, not a foundation model |
121
- | Research directions | `research_direction_taxonomy.json`, extension probe results | direct/proxy/diagnostic evidence, not full solutions |
122
- | Task surface integrity | `docs/data/task_surface_integrity.json`, `scripts/validate_task_surface.py` | public task cards stay human-readable, thumbnail-backed, and wired to the scrub/play walkthrough storyboard |
123
- | Rendered website check | `RENDERED_SITE_CHECK.md`, `docs/data/rendered_site_check.json`, `scripts/build_rendered_site_check.py` | records a browser-level load, tab, walkthrough deep-link, control-click, and console-health check |
124
- | Public project surface | `PUBLIC_SURFACE_QA.md`, `docs/data/public_surface_qa.json`, `scripts/build_public_surface_qa.py` | presents the repo, website, and Hugging Face cards as one research project surface |
125
- | Qwen3-Omni | `results/omni_finetune/DATA_ACCESS_STATUS.md`, `MULTI_EPISODE_ACCESS_STATUS.md` | the gated full dataset is available for a selected 128-episode pilot before held-out evaluation |
126
- | Multi-episode pilot status | `scripts/validate_scope_claims.py`, `docs/data/scope_claims_audit.json` | separates setup artifacts, selected-episode preparation, and completed held-out-episode metrics |
127
- | Mirror parity | `scripts/validate_mirror_parity.py`, `docs/data/mirror_parity.json` | prepared GitHub/HF mirrors carry matching data, figure, website HTML, and validator files |
128
- | Public bundle contents | `scripts/validate_publication_package.py`, `docs/data/publication_audit.json` | summarizes the public repo and HF bundles, including raw-data exclusion and temporary local-file exclusion |
129
- | Release checks | `QUALITY_GATES.md`, `docs/data/quality_gates.json`, `metrics/quality_gates.json`, `scripts/build_quality_gates.py` | one map for automated checks and live post-publish verification; the `metrics/` path is the Hugging Face model-repo mirror |
130
- | Artifact index | `scripts/build_artifact_index.py`, `docs/data/artifact_index.json` | selective source-of-truth catalog with existence, size, and stable-file hashes |
131
- | Project status | `PROJECT_STATUS.md`, `docs/data/project_status.json` | compact current-state table for first-pass readers |
132
- | Citation and metadata | `CITATION.cff`, `codemeta.json`, `docs/data/project_manifest.json`, `LICENSE` | code is MIT-scoped; raw-data use follows Xperience-10M terms |
133
- | Project path | `docs/data/project_packet.json`, website project path section | navigation guide across data, tasks, results, and scale-up status |
134
-
135
- Read the full scope note in [`EVIDENCE_CONTRACT.md`](EVIDENCE_CONTRACT.md), or
136
- consume the machine-readable copy at
137
- [`docs/data/evidence_contract.json`](docs/data/evidence_contract.json).
138
- The current release package report is at
139
- [`docs/data/publication_audit.json`](docs/data/publication_audit.json).
140
- The release-check summary is at
141
- [`QUALITY_GATES.md`](QUALITY_GATES.md) and
142
- [`docs/data/quality_gates.json`](docs/data/quality_gates.json).
143
- The last live-publication verification report is at
144
- [`docs/data/live_publication_status.json`](docs/data/live_publication_status.json).
145
- The current prepared-mirror parity report is at
146
- [`docs/data/mirror_parity.json`](docs/data/mirror_parity.json).
147
- The current multi-episode pilot status note is at
148
- [`docs/data/scope_claims_audit.json`](docs/data/scope_claims_audit.json).
149
- The task-card and walkthrough-storyboard integrity report is at
150
- [`docs/data/task_surface_integrity.json`](docs/data/task_surface_integrity.json).
151
- The public presentation report is at
152
- [`PUBLIC_SURFACE_QA.md`](PUBLIC_SURFACE_QA.md) and
153
- [`docs/data/public_surface_qa.json`](docs/data/public_surface_qa.json).
154
- The generated evaluation protocol is at
155
- [`EVALUATION_PROTOCOL.md`](EVALUATION_PROTOCOL.md) and
156
- [`docs/data/evaluation_protocol.json`](docs/data/evaluation_protocol.json).
157
- The generated research takeaways are at
158
- [`RESEARCH_TAKEAWAYS.md`](RESEARCH_TAKEAWAYS.md) and
159
- [`docs/data/research_takeaways.json`](docs/data/research_takeaways.json).
160
- The research roadmap is at
161
- [`RESEARCH_ROADMAP.md`](RESEARCH_ROADMAP.md) and
162
- [`docs/data/research_roadmap.json`](docs/data/research_roadmap.json).
163
- The foundation-model selection plan is at
164
- [`FOUNDATION_MODEL_PLAN.md`](FOUNDATION_MODEL_PLAN.md) and
165
- [`docs/data/foundation_model_plan.json`](docs/data/foundation_model_plan.json).
166
- The source-of-truth artifact index is at
167
- [`docs/data/artifact_index.json`](docs/data/artifact_index.json).
168
- For a human-readable artifact map, use
169
- [`ARTIFACT_GUIDE.md`](ARTIFACT_GUIDE.md).
170
- For reproduction commands and expected outputs, use
171
- [`REPRODUCIBILITY.md`](REPRODUCIBILITY.md) and
172
- [`docs/data/reproducibility_matrix.json`](docs/data/reproducibility_matrix.json).
173
- Project citation and machine-readable metadata live in
174
- [`CITATION.cff`](CITATION.cff), [`codemeta.json`](codemeta.json), and
175
- [`docs/data/project_manifest.json`](docs/data/project_manifest.json).
176
- The upstream dataset-card alignment note is
177
- [`XPERIENCE10M_DATASET_CARD_ALIGNMENT.md`](XPERIENCE10M_DATASET_CARD_ALIGNMENT.md),
178
- with a machine-readable copy at
179
- [`docs/data/xperience10m_dataset_card_alignment.json`](docs/data/xperience10m_dataset_card_alignment.json).
180
- The generated source-alignment note is at
181
- [`SOURCE_ALIGNMENT_AUDIT.md`](SOURCE_ALIGNMENT_AUDIT.md) and
182
- [`docs/data/source_alignment_audit.json`](docs/data/source_alignment_audit.json).
183
- The generated figure index is at
184
- [`FIGURE_INDEX.md`](FIGURE_INDEX.md) and
185
- [`docs/data/figure_index.json`](docs/data/figure_index.json).
186
- The project logo system is packaged by
187
- [`scripts/build_brand_assets.py`](scripts/build_brand_assets.py), stored under
188
- [`docs/assets/brand/`](docs/assets/brand/), and indexed in
189
- [`docs/data/brand_assets.json`](docs/data/brand_assets.json).
190
 
191
  ## Project Status
192
 
@@ -200,10 +128,9 @@ They give the current research state in one compact table:
200
  | Public-sample pipeline | Verified on one public sample episode: 5,821 frames, 1,161 windows, 8,546 dimensions |
201
  | 12-task suite | Verified minimal baselines with committed metrics, predictions, and manifests |
202
  | Neural heads | Verified compact PyTorch MLP heads over the same task contracts and chronological splits |
203
- | Official dataset wording | Verified against the public `ropedia-ai/xperience-10m` dataset card/API metadata |
204
- | Source alignment | Source facts, sample details, API-listing notes, and project coverage are consistent across repo, website, and HF cards |
205
  | Evaluation protocol | Verified generated protocol for windowing, split policy, leakage controls, and per-task metrics |
206
- | Website and HF mirrors | Verified by website reference reports, public presentation reports, mirror parity, and live-publication checks; the public dashboard uses six top-level tabs, including an explicit Directions tab, plus subsection tabs for dataset, task-suite, method, result, direction, and resource views |
207
  | Qwen3-Omni multi-episode pilot | The gated Xperience-10M dataset is available for selected 128-episode preparation, with full metrics pending completed preprocessing, training, and held-out evaluation |
208
  | Raw Xperience-10M data / full Qwen weights | Not redistributed |
209
 
@@ -213,33 +140,31 @@ If you are reading the project cold, open these in order:
213
 
214
  | Step | Question | Primary artifacts | What should be true |
215
  | --- | --- | --- | --- |
216
- | 1 | What has been implemented? | [`PROJECT_BRIEF.md`](PROJECT_BRIEF.md), [`PROJECT_STATUS.md`](PROJECT_STATUS.md), [`docs/data/project_status.json`](docs/data/project_status.json), [`ARTIFACT_GUIDE.md`](ARTIFACT_GUIDE.md), [`docs/data/artifact_index.json`](docs/data/artifact_index.json), [`docs/data/figure_index.json`](docs/data/figure_index.json) | Single-episode task engineering, visual assets, mirrors, and scale-up status are summarized for first-pass reading. |
217
- | 2 | What is the official upstream dataset? | [`XPERIENCE10M_DATASET_CARD_ALIGNMENT.md`](XPERIENCE10M_DATASET_CARD_ALIGNMENT.md), [`docs/data/xperience10m_dataset_card_alignment.json`](docs/data/xperience10m_dataset_card_alignment.json), [official HF dataset](https://huggingface.co/datasets/ropedia-ai/xperience-10m) | The full dataset is described as a gated large-scale 4D multimodal egocentric source; this repo validates only one public sample episode. |
218
- | 3 | Are source facts consistently presented? | [`SOURCE_ALIGNMENT_AUDIT.md`](SOURCE_ALIGNMENT_AUDIT.md), [`docs/data/source_alignment_audit.json`](docs/data/source_alignment_audit.json), [`scripts/validate_source_alignment.py`](scripts/validate_source_alignment.py) | Repo, website, and HF cards use the same full-dataset facts, sample-card facts, API-listing notes, and project coverage. |
219
- | 4 | How exactly are tasks evaluated? | [`EVALUATION_PROTOCOL.md`](EVALUATION_PROTOCOL.md), [`docs/data/evaluation_protocol.json`](docs/data/evaluation_protocol.json), [`scripts/build_evaluation_protocol.py`](scripts/build_evaluation_protocol.py) | The window unit, chronological split, leakage controls, task metrics, and current limitations are explicit. |
220
- | 5 | What do the current results mean? | [`RESEARCH_TAKEAWAYS.md`](RESEARCH_TAKEAWAYS.md), [`docs/data/research_takeaways.json`](docs/data/research_takeaways.json), [`docs/data/summary_metrics.json`](docs/data/summary_metrics.json) | The takeaways are generated from committed metrics and identify which signals are ready for larger held-out experiments. |
221
- | 6 | What is the research roadmap? | [`RESEARCH_ROADMAP.md`](RESEARCH_ROADMAP.md), [`docs/data/research_roadmap.json`](docs/data/research_roadmap.json), [`DATA_ACCESS_STATUS.md`](results/omni_finetune/DATA_ACCESS_STATUS.md) | The roadmap connects public-sample task development to multi-episode data preparation, Qwen3-Omni LoRA, foundation-model selection, robustness runs, and larger omni/world-model extensions. |
222
- | 7 | Which foundation model comes next? | [`FOUNDATION_MODEL_PLAN.md`](FOUNDATION_MODEL_PLAN.md), [`docs/data/foundation_model_plan.json`](docs/data/foundation_model_plan.json) | Qwen3-Omni remains the first held-out LoRA baseline; Cosmos 3 is the first world-model branch; OpenVLA/openpi/GR00T wait for explicit action targets. |
223
- | 8 | How do I reproduce it? | [`REPRODUCIBILITY.md`](REPRODUCIBILITY.md), [`docs/data/reproducibility_matrix.json`](docs/data/reproducibility_matrix.json), [`notes/reproducibility_audit.md`](notes/reproducibility_audit.md) | Public commands, expected outputs, and the latest exact-match reproduction record are explicit. |
224
- | 9 | What is one model input? | [`windows.csv`](results/episode_task_suite/windows.csv), [`feature_manifest.json`](results/episode_task_suite/feature_manifest.json), [`available_modalities.json`](results/episode_task_suite/available_modalities.json) | The input is an aligned 8,546-dimensional multimodal window with synchronized video, audio, sensor, and language signals. |
225
- | 10 | Are the task results backed by files? | [`summary_report.json`](results/episode_task_suite/summary_report.json), [`neural_mlp/`](results/episode_task_suite/neural_mlp/), [`docs/data/summary_metrics.json`](docs/data/summary_metrics.json) | Each task has minimal and neural-head evidence over the same window contracts. |
226
- | 11 | Is the website self-consistent? | [`docs/data/website_integrity.json`](docs/data/website_integrity.json), [`scripts/validate_website_integrity.py`](scripts/validate_website_integrity.py) | Local links, anchors, tab routing, JSON data, and referenced images are checked before publishing. |
227
- | 12 | What is still pending? | [`DATA_ACCESS_STATUS.md`](results/omni_finetune/DATA_ACCESS_STATUS.md), [`MULTI_EPISODE_ACCESS_STATUS.md`](results/omni_finetune/MULTI_EPISODE_ACCESS_STATUS.md), [`scripts/omni/discover_xperience10m_sources.py`](scripts/omni/discover_xperience10m_sources.py) | The multi-episode Qwen3-Omni run is prepared at the episode-selection level; final model metrics require completed preprocessing, training, and held-out evaluation. |
228
-
229
- The machine-readable project packet is
230
  [`docs/data/project_packet.json`](docs/data/project_packet.json).
231
 
232
- ## Artifact Index
233
 
234
- [`docs/data/artifact_index.json`](docs/data/artifact_index.json) is the compact
235
- project artifact map for the repo. It lists the core supporting artifacts, whether each exists,
236
- its size, and a SHA-256 hash for stable files. Volatile generated files, such as
237
- the publication package report with a run timestamp, are marked so readers know they
238
- are checked for presence and size rather than treated as fixed hashes.
239
 
240
- [`ARTIFACT_GUIDE.md`](ARTIFACT_GUIDE.md) is the human-readable companion. It
241
- groups the same project evidence into start-here files, data-contract files,
242
- task-evidence files, platform mirrors, and scale-up status artifacts.
243
 
244
  ## Evaluation Protocol
245
 
@@ -256,41 +181,20 @@ generated from committed metric artifacts. They define:
256
  audio-visual learning, pixel-depth reconstruction, and real held-out
257
  multi-episode Qwen3-Omni quality.
258
 
259
- ## Official Dataset Alignment
260
 
261
  The official [`ropedia-ai/xperience-10m`](https://huggingface.co/datasets/ropedia-ai/xperience-10m)
262
- card describes Xperience-10M as a large-scale gated egocentric multimodal
263
- dataset for embodied AI, robotics, world models, and spatial intelligence. Its
264
- public metadata lists video classification, image-to-text, depth estimation,
265
- and robotics task categories; 3D, audio, and video modalities; English
266
- language; `other` license; and manually reviewed non-commercial access.
267
-
268
- At full scale, the official card describes about 10 million experience units,
269
- about 10,000 hours, six RGB streams per episode, audio, stereo depth, camera
270
- pose/SLAM, hand and full-body mocap, IMU, captions, metadata, and calibration.
271
- The card also reports headline counts such as billions of RGB/depth/IMU records
272
- and large caption/object annotations. The live HF page/API separately shows a
273
- 31.9 TB currently hosted file-size display; this is kept separate from the
274
- card's about-1PB full-scale storage statement. This repo records those upstream facts in
275
- [`XPERIENCE10M_DATASET_CARD_ALIGNMENT.md`](XPERIENCE10M_DATASET_CARD_ALIGNMENT.md)
276
- and [`docs/data/xperience10m_dataset_card_alignment.json`](docs/data/xperience10m_dataset_card_alignment.json).
277
 
278
- The current HF API snapshot for the gated dataset reports commit
279
- `ce943cf271a758b60240084892d05cf6dc12dd90`, last modified
280
- `2026-04-21T05:03:45.000Z`, manual gating, and a metadata file listing with
281
- 803 session folders and 12,103 episode folders carrying `annotation.hdf5`.
282
- Those counts are upstream listing metadata only; they are not local downloads,
283
- not redistributed files, and not evidence of model quality in this repo.
284
 
285
- The public sample repo,
286
- [`ropedia-ai/xperience-10m-sample`](https://huggingface.co/datasets/ropedia-ai/xperience-10m-sample),
287
- is separately documented as `Xperience-10M-Sample` with sample metadata,
288
- `cc-by-nc-4.0` license, HOMIE Toolkit usage, and Rerun 0.29.0 `.rrd`
289
- visualization. This project preserves that distinction: the sample powers the
290
- current 5,821-frame task suite, while the full gated dataset is the source for
291
- the selected 128-episode held-out multi-episode pilot now in preparation.
292
-
293
- This repo's current verified subset is much smaller and intentionally explicit:
294
 
295
  - one public sample episode, 5,821 frames, and 1,161 aligned windows,
296
  - raw sample files with six MP4 video streams and audio streams,
@@ -299,15 +203,11 @@ This repo's current verified subset is much smaller and intentionally explicit:
299
  - an 8,546-dimensional baseline representation using video, audio, depth,
300
  pose/SLAM, mocap, IMU, calibration, and language-derived signals.
301
 
302
- The same alignment note also records what is outside the current implemented subset: real
303
- audio-visual learning, caption generation, pixel-depth estimation, SLAM
304
- estimation, neural rendering, policy learning, cross-episode generalization,
305
- and real held-out multi-episode Qwen3-Omni model quality.
306
- It also preserves the official responsible-use scope: the open-source
307
- dataset is limited in diversity and showcase/production quality, and it should
308
- not be used for identity recognition, re-identification, biometric profiling,
309
- surveillance, sensitive attribute inference, or safety-critical deployment
310
- without appropriate safeguards.
311
 
312
  Start with the visual dashboard:
313
 
@@ -323,22 +223,15 @@ Hugging Face Space app:
323
  | --- | --- | --- |
324
  | Project status | `PROJECT_STATUS.md`, `docs/data/project_status.json` | Gives a one-table current project summary before reading the full artifact trail |
325
  | Data contract | `windows.csv`, `feature_manifest.json`, modality manifests | Confirms what each sample window contains before modeling |
326
- | Official dataset alignment | `XPERIENCE10M_DATASET_CARD_ALIGNMENT.md`, `docs/data/xperience10m_dataset_card_alignment.json` | Keeps public descriptions aligned with the official gated dataset card |
327
- | Source alignment | `SOURCE_ALIGNMENT_AUDIT.md`, `docs/data/source_alignment_audit.json` | Summarizes official dataset facts, sample-card facts, API-listing notes, and project coverage across repo, website, and HF cards |
328
- | Figure index | `FIGURE_INDEX.md`, `docs/data/figure_index.json` | Indexes public figures, charts, modality thumbnails, dimensions, hashes, and source scripts |
329
- | Brand assets | `docs/data/brand_assets.json`, `docs/assets/brand/` | Indexes the generated logo, favicon, README/HF card image, app icon, and social preview |
330
  | Evaluation protocol | `EVALUATION_PROTOCOL.md`, `docs/data/evaluation_protocol.json` | Defines the task unit, split, metrics, leakage controls, and current limitations |
331
- | Task surface integrity | `docs/data/task_surface_integrity.json` | Checks the public task cards, readable task names, representative modality thumbnails, and interactive walkthrough storyboard |
332
- | Rendered website check | `RENDERED_SITE_CHECK.md`, `docs/data/rendered_site_check.json` | Records the browser-level page load, tab navigation, walkthrough deep link, player interaction, and console-health result |
333
- | Research roadmap | `RESEARCH_ROADMAP.md`, `docs/data/research_roadmap.json` | Shows the path from sample-level task development to multi-episode and larger omni-model work |
334
  | Minimal heads | softmax, ridge projection/regression, multi-label logistic heads | Keeps every input/output contract visible and inspectable |
335
  | Neural heads | PyTorch MLP classifiers/regressors under `neural_mlp/` | Checks whether nonlinear heads improve each task without changing features |
336
  | Evidence | metrics, predictions, confusion matrices, diagrams, dashboard | Makes the single-episode task development inspectable without rerunning first |
337
- | Release checks | `QUALITY_GATES.md`, `docs/data/quality_gates.json` | Shows the automated and post-publish checks used to keep the public release current |
338
- | Live publication status | `docs/data/live_publication_status.json` | Records the last live GitHub Pages, GitHub raw, and Hugging Face mirror verification |
339
- | Public bundle contents | `docs/data/publication_audit.json` | Summarizes public bundle contents, raw Xperience-10M data exclusion, cache exclusion, archive exclusion, credential-text checks, and public-card figure references |
340
- | Artifact index | `docs/data/artifact_index.json` | Gives readers a compact source-of-truth catalog with stable hashes |
341
- | Artifact guide | `ARTIFACT_GUIDE.md` | Groups the public evidence into research-project layers |
342
  | Reproducibility contract | `REPRODUCIBILITY.md`, `docs/data/reproducibility_matrix.json` | States public commands, expected outputs, exact-match reproduction evidence, and non-reproducible boundaries |
343
  | Citation metadata | `CITATION.cff`, `codemeta.json`, `LICENSE` | Makes the repo easier to cite, index, and reuse without confusing code license and dataset terms |
344
 
@@ -421,12 +314,12 @@ scripts/
421
  export_modality_atlas_assets.py # exports responsive modality-card assets
422
  render_overview_figures.py # renders polished pipeline/architecture PNGs
423
  build_brand_assets.py # derives logo sizes, favicon, social card
424
- build_artifact_index.py # builds the source-of-truth artifact index
425
  build_quality_gates.py # builds release checks
426
  validate_mirror_parity.py # checks prepared GitHub/HF mirror file parity
427
- validate_scope_claims.py # keeps Qwen3-Omni setup and result states separate
428
  validate_task_surface.py # checks readable task cards and interactive storyboard wiring
429
- validate_website_integrity.py # checks local site links, anchors, JSON, images
430
  validate_publication_package.py # checks public repo + HF bundle contents
431
  publish_hf_bundles.py # uploads prepared HF Space/artifact/model bundles
432
  omni/
@@ -454,11 +347,9 @@ docs/
454
  data/artifact_index.json # compact project-artifact catalog
455
  data/live_publication_status.json # live GitHub/HF publication verification
456
  data/quality_gates.json # machine-readable release checks
457
- data/publication_audit.json # machine-readable public bundle report
458
  data/task_surface_integrity.json # machine-readable task-card/storyboard integrity check
459
- data/website_integrity.json # machine-readable website integrity check
460
  data/project_manifest.json # machine-readable public-surface metadata
461
- data/project_packet.json # machine-readable project path and scope summary
462
  data/research_roadmap.json # multi-episode and omni-model roadmap
463
  data/research_directions.json # four-track website data bundle
464
  data/research_direction_extensions.json # four extra probe data bundle
@@ -671,6 +562,59 @@ uses the same split guard, exports episodes in parallel CPU shards, skips and
671
  reports episodes that contain no labeled windows under the configured label
672
  rule, then launches Qwen3-Omni LoRA with `NUM_PROCESSES=8`.
673
 
674
  ### Uploading the pilot Qwen3-Omni LoRA
675
 
676
  A prepared upload package is available at `results/omni_finetune/hf_upload`.
@@ -697,11 +641,23 @@ assuming one backbone solves every Xperience-10M objective.
697
  | GR00T | Humanoid/action-policy branch | Use after mocap/contact retargeting creates well-defined humanoid action targets. |
698
  | OpenVLA / openpi | Open VLA/policy baselines | Use after the project defines robot-compatible or action-token targets. |
699
  | Gemini Robotics | External reasoning reference | Use only for qualitative comparison or annotation support unless local trainable access exists. |
 
700
 
701
  See [`FOUNDATION_MODEL_PLAN.md`](FOUNDATION_MODEL_PLAN.md) and
702
  [`docs/data/foundation_model_plan.json`](docs/data/foundation_model_plan.json)
703
  for the full selection matrix, source links, and model-specific evaluation
704
- additions.
705
 
706
  ## Four Research Directions
707
 
 
42
  | Multimodal data understanding | Parses the public sample into synchronized windows across video, audio, depth, pose/SLAM, mocap, IMU, calibration, and language-derived signals |
43
  | Task design | Defines 12 human-readable tasks plus four direction-extension probes with inputs, outputs, process modules, metrics, and case-study walkthroughs |
44
  | Model and evaluation discipline | Runs minimal and compact neural baselines, records predictions/metrics, keeps chronological split boundaries explicit, and separates sample evidence from held-out claims |
45
+ | Scale-up planning | Connects the public-sample pipeline to 32/128-episode held-out pilots, Qwen3-Omni LoRA, Cosmos-style world-model branches, policy-model branches, and the future Xperience-native foundation-model pretraining goal |
46
 
47
  ## Start Here
48
 
 
59
  | Navigate the 12 tasks, four tracks, and scale-up plan | [Interactive research roadmap](https://chaoyue0307.github.io/ropedia-xperience-10m-task-suite/research_roadmap.html), [`docs/data/research_roadmap_interactive.json`](docs/data/research_roadmap_interactive.json) |
60
  | Compare current task metrics | [`RESEARCH_TAKEAWAYS.md`](RESEARCH_TAKEAWAYS.md), [`docs/data/summary_metrics.json`](docs/data/summary_metrics.json) |
61
  | Compare possible foundation backbones | [`FOUNDATION_MODEL_PLAN.md`](FOUNDATION_MODEL_PLAN.md), [`docs/data/foundation_model_plan.json`](docs/data/foundation_model_plan.json) |
62
+ | Understand the future native pretraining goal | [`XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md`](XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md) |
63
  | Understand one model input | [`results/episode_task_suite/feature_manifest.json`](results/episode_task_suite/feature_manifest.json), [`results/episode_task_suite/windows.csv`](results/episode_task_suite/windows.csv) |
64
  | Check multi-episode data status | [`results/omni_finetune/DATA_ACCESS_STATUS.md`](results/omni_finetune/DATA_ACCESS_STATUS.md) |
65
 
 
72
  | Task suite | 12 human-readable embodied-AI task contracts with input, process, output, metrics, predictions, and case-study walkthroughs |
73
  | Baselines | Minimal linear/ridge/logistic heads plus compact PyTorch MLP task heads over the same chronological split |
74
  | Research directions | Task mapping and extension probes for human modeling, 3D/4D reconstruction, egocentric interaction, and world modeling |
75
+ | Scale-up path | A selected 128-episode pilot from the gated Xperience-10M dataset is being prepared for Qwen3-Omni LoRA, followed by Cosmos 3/world-model and VLA/policy branches; the long-term goal is an Xperience-native embodied foundation model if full-corpus data, storage, and compute are available |
76
  | Public surfaces | GitHub repo, GitHub Pages dashboard, HF Space, HF artifact dataset, HF baseline-model repo, and HF collection |
77
 
78
  For the fastest interpretation of the current metrics, start with
 
94
  - human-readable research task cards and an interactive scrub/play walkthrough storyboard for every task,
95
  - an interactive research roadmap connecting 12 tasks, four research tracks, current sample evidence, the Qwen3-Omni scale-up path, and foundation-model branch selection,
96
  - a next-milestone track for Qwen3-Omni fine-tuning, Cosmos 3 world modeling, and sensor-bridge evaluation,
97
+ - a future pretraining plan for an Xperience Embodied Foundation Model over the full corpus after smaller multi-episode stages prove value,
98
  - metrics, predictions, model weights, manifests, charts, and a two-level
99
  tabbed static research website,
100
  - a clear explanation of what is implemented now and what moves to the multi-episode stage.
101
 
102
  ## Current Research Scope
103
 
104
+ This project is best read as a staged embodied-AI research study:
 
105
 
106
+ | Layer | Current scope | Where to start |
107
  | --- | --- | --- |
108
+ | Data understanding | One public Xperience-10M sample episode is converted into 5,821 frames, 1,161 aligned windows, and an 8,546-dimensional multimodal representation. | [`PROJECT_BRIEF.md`](PROJECT_BRIEF.md), [`PROJECT_STATUS.md`](PROJECT_STATUS.md) |
109
+ | Task suite | Twelve human-readable tasks cover action, procedure, contact, object, language, retrieval, reconstruction, order, and synchronization questions. | [`RESEARCH_TAKEAWAYS.md`](RESEARCH_TAKEAWAYS.md), [`results/episode_task_suite/summary_report.json`](results/episode_task_suite/summary_report.json) |
110
+ | Baselines | Minimal heads and compact PyTorch MLP heads provide a first controlled comparison on the same chronological split. | [`results/episode_task_suite/neural_mlp/`](results/episode_task_suite/neural_mlp/) |
111
+ | Diagnostics | Audio contribution, modality ablations, timeline overlays, object labels, and alignment stress tests show which signals are useful and which tasks remain hard. | [`results/audio_ablation/AUDIO_ABLATION_SUMMARY.md`](results/audio_ablation/AUDIO_ABLATION_SUMMARY.md), [`docs/single_episode_explorer.html`](docs/single_episode_explorer.html) |
112
+ | Scale-up | A selected 128-episode Qwen3-Omni LoRA pilot is being prepared from the gated dataset; held-out model metrics will be added only after training and evaluation finish. The long-term native-pretraining plan is documented separately as a future research goal. | [`RESEARCH_ROADMAP.md`](RESEARCH_ROADMAP.md), [`FOUNDATION_MODEL_PLAN.md`](FOUNDATION_MODEL_PLAN.md), [`XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md`](XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md), [`results/omni_finetune/DATA_ACCESS_STATUS.md`](results/omni_finetune/DATA_ACCESS_STATUS.md) |
113
+
114
+ Detailed dataset notes, reproduction checks, and generated JSON reports are
115
+ included for readers who want to inspect the implementation, but they are
116
+ supporting materials rather than the main reading path. Use
117
+ [`ARTIFACT_GUIDE.md`](ARTIFACT_GUIDE.md) when you want the full file map.
118
 
119
  ## Project Status
120
 
 
128
  | Public-sample pipeline | Verified on one public sample episode: 5,821 frames, 1,161 windows, 8,546 dimensions |
129
  | 12-task suite | Verified minimal baselines with committed metrics, predictions, and manifests |
130
  | Neural heads | Verified compact PyTorch MLP heads over the same task contracts and chronological splits |
131
+ | Dataset context | Official Xperience-10M links, sample-vs-gated-data boundary, modality coverage, and redistribution policy are documented |
 
132
  | Evaluation protocol | Verified generated protocol for windowing, split policy, leakage controls, and per-task metrics |
133
+ | Website and Hub pages | Public dashboard, Hugging Face Space, artifact dataset, baseline model repo, and collection use the same project framing and links |
134
  | Qwen3-Omni multi-episode pilot | The gated Xperience-10M dataset is available for selected 128-episode preparation, with full metrics pending completed preprocessing, training, and held-out evaluation |
135
  | Raw Xperience-10M data / full Qwen weights | Not redistributed |
136
 
 
140
 
141
  | Step | Question | Primary artifacts | What should be true |
142
  | --- | --- | --- | --- |
143
+ | 1 | What is this project? | [`PROJECT_BRIEF.md`](PROJECT_BRIEF.md), [`PROJECT_STATUS.md`](PROJECT_STATUS.md), [dashboard](https://chaoyue0307.github.io/ropedia-xperience-10m-task-suite/) | A public-sample Xperience-10M research project with 12 tasks, baselines, and a scale-up plan. |
144
+ | 2 | What data is used? | [`XPERIENCE10M_DATASET_CARD_ALIGNMENT.md`](XPERIENCE10M_DATASET_CARD_ALIGNMENT.md), [official HF dataset](https://huggingface.co/datasets/ropedia-ai/xperience-10m), [sample HF dataset](https://huggingface.co/datasets/ropedia-ai/xperience-10m-sample) | The implemented suite uses one public sample episode; the gated dataset is reserved for selected multi-episode training. |
145
+ | 3 | What does one model input contain? | [`windows.csv`](results/episode_task_suite/windows.csv), [`feature_manifest.json`](results/episode_task_suite/feature_manifest.json), [`available_modalities.json`](results/episode_task_suite/available_modalities.json) | Each window is an aligned multimodal unit with video, audio, depth, pose/SLAM, mocap, IMU, calibration, and language-derived signals. |
146
+ | 4 | What are the 12 tasks? | [`results/episode_task_suite/task_walkthroughs/`](results/episode_task_suite/task_walkthroughs/), [`docs/data/task_walkthroughs.json`](docs/data/task_walkthroughs.json) | Every task has a human-readable name, case study, input, process modules, output, metric, and limitation. |
147
+ | 5 | How are tasks evaluated? | [`EVALUATION_PROTOCOL.md`](EVALUATION_PROTOCOL.md), [`docs/data/evaluation_protocol.json`](docs/data/evaluation_protocol.json) | The window unit, chronological split, leakage controls, task metrics, and current limitations are explicit. |
148
+ | 6 | What do the current results mean? | [`RESEARCH_TAKEAWAYS.md`](RESEARCH_TAKEAWAYS.md), [`docs/data/research_takeaways.json`](docs/data/research_takeaways.json), [`docs/data/summary_metrics.json`](docs/data/summary_metrics.json) | Current metrics describe sample-level task behavior and identify which signals need larger held-out experiments. |
149
+ | 7 | Which models are implemented? | [`results/episode_task_suite/summary_report.json`](results/episode_task_suite/summary_report.json), [`results/episode_task_suite/neural_mlp/`](results/episode_task_suite/neural_mlp/), [HF baseline repo](https://huggingface.co/cy0307/ropedia-xperience-10m-task-baselines) | Each task has minimal and neural-head evidence over the same feature windows. |
150
+ | 8 | What research directions does this support? | [`RESEARCH_ROADMAP.md`](RESEARCH_ROADMAP.md), [`docs/data/research_directions.json`](docs/data/research_directions.json), [`docs/data/research_direction_extensions.json`](docs/data/research_direction_extensions.json) | The tasks are mapped to human modeling, 3D/4D reconstruction, egocentric interaction, and world modeling. |
151
+ | 9 | Which foundation model comes next? | [`FOUNDATION_MODEL_PLAN.md`](FOUNDATION_MODEL_PLAN.md), [`docs/data/foundation_model_plan.json`](docs/data/foundation_model_plan.json), [`XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md`](XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md) | Qwen3-Omni is the first held-out LoRA baseline; Cosmos 3 is the first world-model branch; policy models wait for explicit action targets; Xperience-native pretraining is the full-corpus future goal. |
152
+ | 10 | How do I reproduce it? | [`REPRODUCIBILITY.md`](REPRODUCIBILITY.md), [`notes/reproducibility_audit.md`](notes/reproducibility_audit.md) | Public commands and expected outputs are documented for the sample-episode task suite. |
153
+ | 11 | What is still pending? | [`DATA_ACCESS_STATUS.md`](results/omni_finetune/DATA_ACCESS_STATUS.md), [`MULTI_EPISODE_ACCESS_STATUS.md`](results/omni_finetune/MULTI_EPISODE_ACCESS_STATUS.md) | Multi-episode Qwen3-Omni model quality will be reported after preprocessing, training, and held-out evaluation complete. |
154
+
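The chronological split named in step 5 can be illustrated with a minimal sketch. It assumes windows are already ordered by time and uses hypothetical 70/15/15 fractions; the fractions actually used by this project are defined in `EVALUATION_PROTOCOL.md`, not here.

```python
def chronological_split(num_windows, train_frac=0.7, val_frac=0.15):
    """Split window indices 0..num_windows-1 in time order, without shuffling.

    The fractions are illustrative assumptions, not the project's values.
    Earlier windows go to train, later ones to validation, the latest to test.
    """
    if not (0 < train_frac < 1 and 0 <= val_frac < 1 and train_frac + val_frac < 1):
        raise ValueError("fractions must leave a non-empty test portion")
    train_end = int(num_windows * train_frac)
    val_end = int(num_windows * (train_frac + val_frac))
    indices = list(range(num_windows))
    return indices[:train_end], indices[train_end:val_end], indices[val_end:]


# Example with the 1,161 aligned windows of the public sample episode.
train, val, test = chronological_split(1161)
```

Because no shuffling happens, every training window precedes every validation window, which in turn precedes every test window. That ordering is what makes the split chronological.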
155
+ A compact reader-path summary is available at
 
156
  [`docs/data/project_packet.json`](docs/data/project_packet.json).
157
 
158
+ ## Supporting Files
159
 
160
+ [`ARTIFACT_GUIDE.md`](ARTIFACT_GUIDE.md) is the human-readable map for readers
161
+ who want to inspect the project files after the first pass. It groups the main
162
+ briefs, task outputs, baseline results, visual assets, data notes, and
163
+ scale-up documents.
 
164
 
165
+ [`docs/data/artifact_index.json`](docs/data/artifact_index.json) is the compact
166
+ machine-readable companion used by the website and Hugging Face artifact
167
+ dataset.
168
 
169
  ## Evaluation Protocol
170
 
 
181
  audio-visual learning, pixel-depth reconstruction, and real held-out
182
  multi-episode Qwen3-Omni quality.
183
 
184
+ ## Dataset Context
185
 
186
  The official [`ropedia-ai/xperience-10m`](https://huggingface.co/datasets/ropedia-ai/xperience-10m)
187
+ dataset is a gated large-scale egocentric multimodal dataset for embodied AI,
188
+ robotics, spatial intelligence, and world modeling. The public
189
+ [`ropedia-ai/xperience-10m-sample`](https://huggingface.co/datasets/ropedia-ai/xperience-10m-sample)
190
+ repo provides the sample episode used for the implemented task suite here.
191
 
192
+ This project keeps those layers separate: the public sample supports the
193
+ current 12-task study, while the gated full dataset is used only for the
194
+ selected multi-episode Qwen3-Omni pilot. Raw Xperience-10M MP4/HDF5/RRD files
195
+ are not redistributed in this repo or in the Hugging Face mirrors.
196
 
197
+ The current verified public-sample subset is:
198
 
199
  - one public sample episode, 5,821 frames, and 1,161 aligned windows,
200
  - raw sample files with six MP4 video streams and audio streams,
 
203
  - an 8,546-dimensional baseline representation using video, audio, depth,
204
  pose/SLAM, mocap, IMU, calibration, and language-derived signals.
205
 
206
+ Detailed dataset notes are available in
207
+ [`XPERIENCE10M_DATASET_CARD_ALIGNMENT.md`](XPERIENCE10M_DATASET_CARD_ALIGNMENT.md)
208
+ for readers who need the full upstream-card and access-term context. The
209
+ practical boundary is simple: current results come from the public sample, and
210
+ multi-episode model quality is pending the selected held-out pilot.
211
 
212
  Start with the visual dashboard:
213
 
 
223
  | --- | --- | --- |
224
  | Project status | `PROJECT_STATUS.md`, `docs/data/project_status.json` | Gives a one-table current project summary before reading the full artifact trail |
225
  | Data contract | `windows.csv`, `feature_manifest.json`, modality manifests | Confirms what each sample window contains before modeling |
226
+ | Dataset context | `XPERIENCE10M_DATASET_CARD_ALIGNMENT.md`, official dataset links | Explains the official dataset, public sample, modalities, access boundary, and what this repo uses |
227
+ | Visual assets | `FIGURE_INDEX.md`, `docs/assets/` | Shows the task-suite graphic, modality thumbnails, pipeline diagrams, charts, and logo assets |
 
 
228
  | Evaluation protocol | `EVALUATION_PROTOCOL.md`, `docs/data/evaluation_protocol.json` | Defines the task unit, split, metrics, leakage controls, and current limitations |
229
+ | Research roadmap | `RESEARCH_ROADMAP.md`, `docs/data/research_roadmap.json` | Shows the path from sample-level task development to multi-episode work, larger model branches, and the future native-pretraining goal |
230
+ | Xperience Embodied Foundation Model plan | `XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md` | Describes the long-term full-corpus pretraining goal, target modules, objectives, staged scale-up, hardware ranges, and evaluation protocol |
 
231
  | Minimal heads | softmax, ridge projection/regression, multi-label logistic heads | Keeps every input/output contract visible and inspectable |
232
  | Neural heads | PyTorch MLP classifiers/regressors under `neural_mlp/` | Checks whether nonlinear heads improve each task without changing features |
233
  | Evidence | metrics, predictions, confusion matrices, diagrams, dashboard | Makes the single-episode task development inspectable without rerunning first |
234
+ | Artifact guide | `ARTIFACT_GUIDE.md` | Groups the public evidence into research-project layers after the first-pass overview |
235
  | Reproducibility contract | `REPRODUCIBILITY.md`, `docs/data/reproducibility_matrix.json` | States public commands, expected outputs, exact-match reproduction evidence, and non-reproducible boundaries |
236
  | Citation metadata | `CITATION.cff`, `codemeta.json`, `LICENSE` | Makes the repo easier to cite, index, and reuse without confusing code license and dataset terms |
237
 
 
314
  export_modality_atlas_assets.py # exports responsive modality-card assets
315
  render_overview_figures.py # renders polished pipeline/architecture PNGs
316
  build_brand_assets.py # derives logo sizes, favicon, social card
317
+ build_artifact_index.py # builds the compact artifact guide data
318
  build_quality_gates.py # builds release checks
319
  validate_mirror_parity.py # checks prepared GitHub/HF mirror file parity
320
+ validate_scope_claims.py # separates setup artifacts from completed model metrics
321
  validate_task_surface.py # checks readable task cards and interactive storyboard wiring
322
+ validate_website_integrity.py # checks local site links, anchors, and images
323
  validate_publication_package.py # checks public repo + HF bundle contents
324
  publish_hf_bundles.py # uploads prepared HF Space/artifact/model bundles
325
  omni/
 
347
  data/artifact_index.json # compact project-artifact catalog
348
  data/live_publication_status.json # live GitHub/HF publication verification
349
  data/quality_gates.json # machine-readable release checks
 
350
  data/task_surface_integrity.json # machine-readable task-card/storyboard integrity check
 
351
  data/project_manifest.json # machine-readable public-surface metadata
352
+ data/project_packet.json # compact project path and scope summary
353
  data/research_roadmap.json # multi-episode and omni-model roadmap
354
  data/research_directions.json # four-track website data bundle
355
  data/research_direction_extensions.json # four extra probe data bundle
 
562
  reports episodes that contain no labeled windows under the configured label
563
  rule, then launches Qwen3-Omni LoRA with `NUM_PROCESSES=8`.
564
 
565
+ ### Full 128-Episode Held-Out Pilot
566
+
567
+ Once all selected episodes are complete, use the fixed selected-episode split:
568
+
569
+ - 96 train episodes,
570
+ - 16 validation episodes,
571
+ - 16 held-out test episodes.
572
+
573
+ The clean full-run launcher validates the selected split, exports all splits in
574
+ parallel, trains Qwen3-Omni LoRA on train/val only, then evaluates on the
575
+ held-out test split:
576
+
577
+ ```bash
578
+ RUN_ID=xperience10m_qwen3_omni_128ep_fullsplit_fast8gpu \
579
+ DATA_ROOT=/path/to/xperience10m_128 \
580
+ SELECTION_JSON=results/omni_finetune/xperience10m_128_episode_selection.json \
581
+ MODEL_DIR=/path/to/Qwen__Qwen3-Omni-30B-A3B-Instruct \
582
+ NUM_PROCESSES=8 \
583
+ scripts/omni/run_128_fullsplit_parallel_export_8gpu.sh
584
+ ```
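Before launching, the 96/16/16 split can be sanity-checked independently of the launcher. The sketch below is not the launcher's own validation. It assumes the selection has been reduced to a list of `(episode_id, split)` pairs, a hypothetical layout because the schema of `xperience10m_128_episode_selection.json` is not shown here.

```python
from collections import Counter

# Expected sizes of the fixed selected-episode split described above.
EXPECTED_SPLIT_SIZES = {"train": 96, "val": 16, "test": 16}


def check_split(assignments):
    """Validate (episode_id, split) pairs; the pair layout is an assumption.

    Returns a list of human-readable problems. An empty list means the
    split has the expected sizes and no episode is assigned twice.
    """
    problems = []
    counts = Counter(split for _, split in assignments)
    for split, expected in EXPECTED_SPLIT_SIZES.items():
        found = counts.get(split, 0)
        if found != expected:
            problems.append(f"{split}: expected {expected}, found {found}")
    unknown = set(counts) - set(EXPECTED_SPLIT_SIZES)
    if unknown:
        problems.append(f"unknown splits: {sorted(unknown)}")
    seen = Counter(episode_id for episode_id, _ in assignments)
    duplicates = sorted(ep for ep, n in seen.items() if n > 1)
    if duplicates:
        problems.append(f"episodes assigned more than once: {duplicates}")
    return problems
```

An episode that appears in two splits would break the held-out guarantee, so the duplicate check matters as much as the size check.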
585
+
586
+ Monitor the run with:
587
+
588
+ ```bash
589
+ python scripts/omni/monitor_omni_progress.py \
590
+ --run-id xperience10m_qwen3_omni_128ep_fullsplit_fast8gpu
591
+ ```
592
+
593
+ Validate the run artifacts stage by stage:
594
+
595
+ ```bash
596
+ python scripts/omni/validate_omni_finetune_run.py \
597
+ --run-id xperience10m_qwen3_omni_128ep_fullsplit_fast8gpu \
598
+ --require-stage manifest
599
+
600
+ python scripts/omni/validate_omni_finetune_run.py \
601
+ --run-id xperience10m_qwen3_omni_128ep_fullsplit_fast8gpu \
602
+ --require-stage eval \
603
+ --min-json-validity 0.98
604
+ ```
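The `--min-json-validity 0.98` gate is not defined in this README. One plausible reading, stated here as an assumption and not as the validator's actual logic, is the fraction of model outputs that parse as JSON objects:

```python
import json


def json_validity(outputs):
    """Return the fraction of strings in `outputs` that parse as JSON objects.

    Assumption: "validity" means "parses as a JSON object". The real
    validator may apply a stricter schema check.
    """
    if not outputs:
        return 0.0
    valid = 0
    for text in outputs:
        try:
            parsed = json.loads(text)
        except (json.JSONDecodeError, TypeError):
            continue  # malformed or non-string output counts as invalid
        if isinstance(parsed, dict):
            valid += 1
    return valid / len(outputs)


def passes_gate(outputs, min_json_validity=0.98):
    """True when the validity rate meets the threshold."""
    return json_validity(outputs) >= min_json_validity
```

Under this reading, a threshold of 0.98 tolerates at most 2 malformed outputs per 100 evaluated windows.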
605
+
606
+ After dataset export, a model-neutral window index can be created for future
607
+ backbones:
608
+
609
+ ```bash
610
+ python scripts/omni/export_model_neutral_window_index.py \
611
+ --dataset-jsonl results/omni_finetune/xperience10m_qwen3_omni_128ep_fullsplit_fast8gpu_dataset/dataset.jsonl
612
+ ```
613
+
614
+ This produces `window_index.jsonl` and `window_index_manifest.json` so
615
+ Cosmos-style world models and VLA/policy branches can reuse the same split-checked
616
+ windows without depending on Qwen chat-message records.
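A downstream backbone can re-check that the reused windows stay episode-disjoint across splits. The sketch below assumes each record in `window_index.jsonl` carries `episode_id` and `split` fields. Both names are hypothetical, so check them against `window_index_manifest.json` before relying on this.

```python
import json
from collections import defaultdict


def episodes_by_split(jsonl_lines, episode_key="episode_id", split_key="split"):
    """Group episode ids by split from JSONL records (field names assumed)."""
    groups = defaultdict(set)
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        groups[record[split_key]].add(record[episode_key])
    return dict(groups)


def leaked_episodes(groups):
    """Return episode ids that occur in more than one split."""
    owner = {}
    leaked = set()
    for split, episodes in groups.items():
        for episode in episodes:
            if episode in owner and owner[episode] != split:
                leaked.add(episode)
            owner.setdefault(episode, split)
    return leaked
```

A typical call passes an open file handle, for example `episodes_by_split(open("window_index.jsonl"))`. An empty result from `leaked_episodes` means no episode crosses a split boundary.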
617
+
618
  ### Uploading the pilot Qwen3-Omni LoRA
619
 
620
  A prepared upload package is available at `results/omni_finetune/hf_upload`.
 
641
  | GR00T | Humanoid/action-policy branch | Use after mocap/contact retargeting creates well-defined humanoid action targets. |
642
  | OpenVLA / openpi | Open VLA/policy baselines | Use after the project defines robot-compatible or action-token targets. |
643
  | Gemini Robotics | External reasoning reference | Use only for qualitative comparison or annotation support unless local trainable access exists. |
644
+ | Xperience Embodied Foundation Model | Future Xperience-native pretraining goal | Use only after multi-episode pilots, full-corpus storage, distributed training infrastructure, and scaling evidence justify a from-scratch domain model. |
645
 
646
  See [`FOUNDATION_MODEL_PLAN.md`](FOUNDATION_MODEL_PLAN.md) and
647
  [`docs/data/foundation_model_plan.json`](docs/data/foundation_model_plan.json)
648
  for the full selection matrix, source links, and model-specific evaluation
649
+ additions. See
650
+ [`XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md`](XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md)
651
+ for the long-term full-corpus pretraining plan.
652
+
653
+ Backbone-specific contracts now live in [`configs/omni_backbones`](configs/omni_backbones).
654
+ The extension contract is documented in
655
+ [`OMNI_MODEL_EXTENSION_CONTRACT.md`](OMNI_MODEL_EXTENSION_CONTRACT.md), and the
656
+ registry can be checked with:
657
+
658
+ ```bash
659
+ python scripts/omni/backbone_registry.py --validate --json
660
+ ```
661
 
662
  ## Four Research Directions
663
 
PROJECT_STATUS.md CHANGED
@@ -21,8 +21,9 @@ scale-up readiness; it is not presented as final full-dataset model quality.
21
  | Neural heads | Verified | `scripts/neural_task_models.py`, `results/episode_task_suite/neural_mlp/` | Each task also has a compact PyTorch MLP run over the same feature tensor and chronological split. |
22
  | Audio contribution study | Verified | `scripts/audio_ablation_and_raw_upgrade.py`, `results/audio_ablation/`, `docs/data/audio_ablation_summary.json` | Audio variants are compared across all 12 task contracts; audio improves the primary metric on 6 of 12 tasks, and a 588-d audio-window representation improves over the baseline audio variant on 6 of 12 tasks. |
23
  | Research takeaways | Verified | `RESEARCH_TAKEAWAYS.md`, `docs/data/research_takeaways.json`, `scripts/build_research_takeaways.py` | The main result interpretation is generated from committed metrics: chronological class shift, neural gains on dynamics/order/alignment, open retrieval/reconstruction problems, and the need for held-out episodes. |
24
- | Research roadmap | Current | `RESEARCH_ROADMAP.md`, `docs/data/research_roadmap.json` | The roadmap connects public-sample task development to 128-episode data preparation, Qwen3-Omni LoRA, foundation-model selection, robustness runs, and larger omni/world-model extensions. |
25
  | Foundation-model plan | Current | `FOUNDATION_MODEL_PLAN.md`, `docs/data/foundation_model_plan.json` | Qwen3-Omni remains the first trainable held-out LoRA baseline; Cosmos 3 is added as the first world-model/action-generation branch; OpenVLA/openpi/GR00T are policy candidates after action targets are explicit. |
 
26
  | Evaluation protocol | Verified | `EVALUATION_PROTOCOL.md`, `docs/data/evaluation_protocol.json`, `scripts/build_evaluation_protocol.py` | Windowing, chronological split, per-task metrics, leakage controls, and current limitations are generated from committed metric artifacts. |
27
  | Dataset context | Verified | `XPERIENCE10M_DATASET_CARD_ALIGNMENT.md`, official Xperience-10M and sample cards | The README and dashboard distinguish the public sample used here from the gated full dataset used for the selected multi-episode pilot. |
28
  | Public dashboard and Hub pages | Verified | GitHub Pages, HF Space, artifact dataset, baseline model repo, Qwen3-Omni LoRA repo | Readers can move between the website, code, derived artifacts, baseline weights, and Qwen3-Omni pilot status without needing internal setup details. |
@@ -42,15 +43,17 @@ scale-up readiness; it is not presented as final full-dataset model quality.
42
  the path from public-sample task work to multi-episode modeling.
43
  5. Inspect `FOUNDATION_MODEL_PLAN.md` and
44
  `docs/data/foundation_model_plan.json` before choosing a backbone branch.
45
- 6. Inspect `docs/data/summary_metrics.json` and
 
 
46
  `results/episode_task_suite/neural_mlp/` to check the 12-task outputs.
47
- 7. Inspect `results/audio_ablation/AUDIO_ABLATION_SUMMARY.md` before judging
48
  whether audio helps the current task suite.
49
- 8. Inspect `EVALUATION_PROTOCOL.md` before judging task metrics or leakage
50
  controls.
51
- 9. Inspect `XPERIENCE10M_DATASET_CARD_ALIGNMENT.md` only if you need the
52
  detailed upstream dataset-card context.
53
- 10. Inspect `results/omni_finetune/DATA_ACCESS_STATUS.md` before judging
54
  Qwen3-Omni scale-up status.
55
 
56
  ## Current Reading Notes
@@ -67,3 +70,5 @@ scale-up readiness; it is not presented as final full-dataset model quality.
67
  - Foundation-model selection is now explicit: Qwen3-Omni is the immediate
68
  trainable pilot, Cosmos 3 is the first world-model branch, and policy models
69
  such as OpenVLA/openpi/GR00T wait for action-target conversion.
 
 
 
21
  | Neural heads | Verified | `scripts/neural_task_models.py`, `results/episode_task_suite/neural_mlp/` | Each task also has a compact PyTorch MLP run over the same feature tensor and chronological split. |
22
  | Audio contribution study | Verified | `scripts/audio_ablation_and_raw_upgrade.py`, `results/audio_ablation/`, `docs/data/audio_ablation_summary.json` | Audio variants are compared across all 12 task contracts; audio improves the primary metric on 6 of 12 tasks, and a 588-d audio-window representation improves over the baseline audio variant on 6 of 12 tasks. |
23
  | Research takeaways | Verified | `RESEARCH_TAKEAWAYS.md`, `docs/data/research_takeaways.json`, `scripts/build_research_takeaways.py` | The main result interpretation is generated from committed metrics: chronological class shift, neural gains on dynamics/order/alignment, open retrieval/reconstruction problems, and the need for held-out episodes. |
24
+ | Research roadmap | Current | `RESEARCH_ROADMAP.md`, `docs/data/research_roadmap.json` | The roadmap connects public-sample task development to 128-episode data preparation, Qwen3-Omni LoRA, foundation-model selection, robustness runs, world/policy branches, and the future Xperience-native pretraining goal. |
25
  | Foundation-model plan | Current | `FOUNDATION_MODEL_PLAN.md`, `docs/data/foundation_model_plan.json` | Qwen3-Omni remains the first trainable held-out LoRA baseline; Cosmos 3 is added as the first world-model/action-generation branch; OpenVLA/openpi/GR00T are policy candidates after action targets are explicit. |
26
+ | Xperience Embodied Foundation Model | Future goal | `XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md` | A future full-corpus pretraining plan describes target modules, objectives, staged scale-up, hardware ranges, and evaluation for a domain-specific embodied foundation model. |
27
  | Evaluation protocol | Verified | `EVALUATION_PROTOCOL.md`, `docs/data/evaluation_protocol.json`, `scripts/build_evaluation_protocol.py` | Windowing, chronological split, per-task metrics, leakage controls, and current limitations are generated from committed metric artifacts. |
28
  | Dataset context | Verified | `XPERIENCE10M_DATASET_CARD_ALIGNMENT.md`, official Xperience-10M and sample cards | The README and dashboard distinguish the public sample used here from the gated full dataset used for the selected multi-episode pilot. |
29
  | Public dashboard and Hub pages | Verified | GitHub Pages, HF Space, artifact dataset, baseline model repo, Qwen3-Omni LoRA repo | Readers can move between the website, code, derived artifacts, baseline weights, and Qwen3-Omni pilot status without needing internal setup details. |
 
43
  the path from public-sample task work to multi-episode modeling.
44
  5. Inspect `FOUNDATION_MODEL_PLAN.md` and
45
  `docs/data/foundation_model_plan.json` before choosing a backbone branch.
46
+ 6. Inspect `XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md` for the
47
+ long-term full-corpus pretraining goal.
48
+ 7. Inspect `docs/data/summary_metrics.json` and
49
  `results/episode_task_suite/neural_mlp/` to check the 12-task outputs.
50
+ 8. Inspect `results/audio_ablation/AUDIO_ABLATION_SUMMARY.md` before judging
51
  whether audio helps the current task suite.
52
+ 9. Inspect `EVALUATION_PROTOCOL.md` before judging task metrics or leakage
53
  controls.
54
+ 10. Inspect `XPERIENCE10M_DATASET_CARD_ALIGNMENT.md` only if you need the
55
  detailed upstream dataset-card context.
56
+ 11. Inspect `results/omni_finetune/DATA_ACCESS_STATUS.md` before judging
57
  Qwen3-Omni scale-up status.
58
 
59
  ## Current Reading Notes
 
70
  - Foundation-model selection is now explicit: Qwen3-Omni is the immediate
71
  trainable pilot, Cosmos 3 is the first world-model branch, and policy models
72
  such as OpenVLA/openpi/GR00T wait for action-target conversion.
73
+ - The Xperience Embodied Foundation Model is a future native-pretraining goal,
74
+ not a completed model or current benchmark.
README.md CHANGED
@@ -64,7 +64,7 @@ embodied-AI research infrastructure:
64
  | Multimodal data understanding | Parses the public sample into synchronized windows across video, audio, depth, pose/SLAM, mocap, IMU, calibration, and language-derived signals |
65
  | Task design | Defines 12 human-readable tasks plus four direction-extension probes with inputs, outputs, process modules, metrics, and case-study walkthroughs |
66
  | Model and evaluation discipline | Runs minimal and compact neural baselines, records predictions/metrics, keeps chronological split boundaries explicit, and separates sample evidence from held-out claims |
67
- | Scale-up planning | Connects the public-sample pipeline to 32/128-episode held-out pilots, Qwen3-Omni LoRA, Cosmos-style world-model branches, and later policy-model branches |
68
 
69
  ## Start Here
70
 
@@ -81,6 +81,7 @@ before the multi-episode omni-model stage becomes a real held-out evaluation.
81
  | Navigate the 12 tasks, four tracks, and scale-up plan | [Interactive research roadmap](https://chaoyue0307.github.io/ropedia-xperience-10m-task-suite/research_roadmap.html), [`docs/data/research_roadmap_interactive.json`](docs/data/research_roadmap_interactive.json) |
82
  | Compare current task metrics | [`RESEARCH_TAKEAWAYS.md`](RESEARCH_TAKEAWAYS.md), [`docs/data/summary_metrics.json`](docs/data/summary_metrics.json) |
83
  | Compare possible foundation backbones | [`FOUNDATION_MODEL_PLAN.md`](FOUNDATION_MODEL_PLAN.md), [`docs/data/foundation_model_plan.json`](docs/data/foundation_model_plan.json) |
 
84
  | Understand one model input | [`results/episode_task_suite/feature_manifest.json`](results/episode_task_suite/feature_manifest.json), [`results/episode_task_suite/windows.csv`](results/episode_task_suite/windows.csv) |
85
  | Check multi-episode data status | [`results/omni_finetune/DATA_ACCESS_STATUS.md`](results/omni_finetune/DATA_ACCESS_STATUS.md) |
86
 
@@ -93,7 +94,7 @@ before the multi-episode omni-model stage becomes a real held-out evaluation.
93
  | Task suite | 12 human-readable embodied-AI task contracts with input, process, output, metrics, predictions, and case-study walkthroughs |
94
  | Baselines | Minimal linear/ridge/logistic heads plus compact PyTorch MLP task heads over the same chronological split |
95
  | Research directions | Task mapping and extension probes for human modeling, 3D/4D reconstruction, egocentric interaction, and world modeling |
96
- | Scale-up path | The gated Xperience-10M dataset is available for a selected 128-episode pilot before Qwen3-Omni LoRA, followed by Cosmos 3/world-model and VLA/policy branches |
97
  | Public surfaces | GitHub repo, GitHub Pages dashboard, HF Space, HF artifact dataset, HF baseline-model repo, and HF collection |
98
 
99
  For the fastest interpretation of the current metrics, start with
@@ -115,6 +116,7 @@ Current contributions:
115
  - human-readable research task cards and an interactive scrub/play walkthrough storyboard for every task,
116
  - an interactive research roadmap connecting 12 tasks, four research tracks, current sample evidence, the Qwen3-Omni scale-up path, and foundation-model branch selection,
117
  - a next-milestone track for Qwen3-Omni fine-tuning, Cosmos 3 world modeling, and sensor-bridge evaluation,
 
118
  - metrics, predictions, model weights, manifests, charts, and a two-level
119
  tabbed static research website,
120
  - a clear explanation of what is implemented now and what moves to the multi-episode stage.
@@ -129,7 +131,7 @@ This project is best read as a staged embodied-AI research study:
129
  | Task suite | Twelve human-readable tasks cover action, procedure, contact, object, language, retrieval, reconstruction, order, and synchronization questions. | [`RESEARCH_TAKEAWAYS.md`](RESEARCH_TAKEAWAYS.md), [`results/episode_task_suite/summary_report.json`](results/episode_task_suite/summary_report.json) |
130
  | Baselines | Minimal heads and compact PyTorch MLP heads provide a first controlled comparison on the same chronological split. | [`results/episode_task_suite/neural_mlp/`](results/episode_task_suite/neural_mlp/) |
131
  | Diagnostics | Audio contribution, modality ablations, timeline overlays, object labels, and alignment stress tests show which signals are useful and which tasks remain hard. | [`results/audio_ablation/AUDIO_ABLATION_SUMMARY.md`](results/audio_ablation/AUDIO_ABLATION_SUMMARY.md), [`docs/single_episode_explorer.html`](docs/single_episode_explorer.html) |
132
- | Scale-up | A selected 128-episode Qwen3-Omni LoRA pilot is being prepared from the gated dataset; held-out model metrics will be added only after training and evaluation finish. | [`RESEARCH_ROADMAP.md`](RESEARCH_ROADMAP.md), [`FOUNDATION_MODEL_PLAN.md`](FOUNDATION_MODEL_PLAN.md), [`results/omni_finetune/DATA_ACCESS_STATUS.md`](results/omni_finetune/DATA_ACCESS_STATUS.md) |
133
 
134
  Detailed dataset notes, reproduction checks, and generated JSON reports are
135
  included for readers who want to inspect the implementation, but they are
@@ -168,7 +170,7 @@ If you are reading the project cold, open these in order:
168
  | 6 | What do the current results mean? | [`RESEARCH_TAKEAWAYS.md`](RESEARCH_TAKEAWAYS.md), [`docs/data/research_takeaways.json`](docs/data/research_takeaways.json), [`docs/data/summary_metrics.json`](docs/data/summary_metrics.json) | Current metrics describe sample-level task behavior and identify which signals need larger held-out experiments. |
169
  | 7 | Which models are implemented? | [`results/episode_task_suite/summary_report.json`](results/episode_task_suite/summary_report.json), [`results/episode_task_suite/neural_mlp/`](results/episode_task_suite/neural_mlp/), [HF baseline repo](https://huggingface.co/cy0307/ropedia-xperience-10m-task-baselines) | Each task has minimal and neural-head evidence over the same feature windows. |
170
  | 8 | What research directions does this support? | [`RESEARCH_ROADMAP.md`](RESEARCH_ROADMAP.md), [`docs/data/research_directions.json`](docs/data/research_directions.json), [`docs/data/research_direction_extensions.json`](docs/data/research_direction_extensions.json) | The tasks are mapped to human modeling, 3D/4D reconstruction, egocentric interaction, and world modeling. |
171
- | 9 | Which foundation model comes next? | [`FOUNDATION_MODEL_PLAN.md`](FOUNDATION_MODEL_PLAN.md), [`docs/data/foundation_model_plan.json`](docs/data/foundation_model_plan.json) | Qwen3-Omni is the first held-out LoRA baseline; Cosmos 3 is the first world-model branch; policy models wait for explicit action targets. |
172
  | 10 | How do I reproduce it? | [`REPRODUCIBILITY.md`](REPRODUCIBILITY.md), [`notes/reproducibility_audit.md`](notes/reproducibility_audit.md) | Public commands and expected outputs are documented for the sample-episode task suite. |
173
  | 11 | What is still pending? | [`DATA_ACCESS_STATUS.md`](results/omni_finetune/DATA_ACCESS_STATUS.md), [`MULTI_EPISODE_ACCESS_STATUS.md`](results/omni_finetune/MULTI_EPISODE_ACCESS_STATUS.md) | Multi-episode Qwen3-Omni model quality will be reported after preprocessing, training, and held-out evaluation complete. |
174
 
@@ -246,7 +248,8 @@ Hugging Face Space app:
246
  | Dataset context | `XPERIENCE10M_DATASET_CARD_ALIGNMENT.md`, official dataset links | Explains the official dataset, public sample, modalities, access boundary, and what this repo uses |
247
  | Visual assets | `FIGURE_INDEX.md`, `docs/assets/` | Shows the task-suite graphic, modality thumbnails, pipeline diagrams, charts, and logo assets |
248
  | Evaluation protocol | `EVALUATION_PROTOCOL.md`, `docs/data/evaluation_protocol.json` | Defines the task unit, split, metrics, leakage controls, and current limitations |
249
- | Research roadmap | `RESEARCH_ROADMAP.md`, `docs/data/research_roadmap.json` | Shows the path from sample-level task development to multi-episode and larger omni-model work |
 
250
  | Minimal heads | softmax, ridge projection/regression, multi-label logistic heads | Keeps every input/output contract visible and inspectable |
251
  | Neural heads | PyTorch MLP classifiers/regressors under `neural_mlp/` | Checks whether nonlinear heads improve each task without changing features |
252
  | Evidence | metrics, predictions, confusion matrices, diagrams, dashboard | Makes the single-episode task development inspectable without rerunning first |
@@ -607,11 +610,14 @@ assuming one backbone solves every Xperience-10M objective.
607
  | GR00T | Humanoid/action-policy branch | Use after mocap/contact retargeting creates well-defined humanoid action targets. |
608
  | OpenVLA / openpi | Open VLA/policy baselines | Use after the project defines robot-compatible or action-token targets. |
609
  | Gemini Robotics | External reasoning reference | Use only for qualitative comparison or annotation support unless local trainable access exists. |
 
610
 
611
  See [`FOUNDATION_MODEL_PLAN.md`](FOUNDATION_MODEL_PLAN.md) and
612
  [`docs/data/foundation_model_plan.json`](docs/data/foundation_model_plan.json)
613
  for the full selection matrix, source links, and model-specific evaluation
614
- additions.
 
 
615
 
616
  ## Four Research Directions
617
 
 
64
  | Multimodal data understanding | Parses the public sample into synchronized windows across video, audio, depth, pose/SLAM, mocap, IMU, calibration, and language-derived signals |
65
  | Task design | Defines 12 human-readable tasks plus four direction-extension probes with inputs, outputs, process modules, metrics, and case-study walkthroughs |
66
  | Model and evaluation discipline | Runs minimal and compact neural baselines, records predictions/metrics, keeps chronological split boundaries explicit, and separates sample evidence from held-out claims |
67
+ | Scale-up planning | Connects the public-sample pipeline to 32/128-episode held-out pilots, Qwen3-Omni LoRA, Cosmos-style world-model branches, policy-model branches, and the future Xperience-native foundation-model pretraining goal |
68
 
69
  ## Start Here
70
 
 
81
  | Navigate the 12 tasks, four tracks, and scale-up plan | [Interactive research roadmap](https://chaoyue0307.github.io/ropedia-xperience-10m-task-suite/research_roadmap.html), [`docs/data/research_roadmap_interactive.json`](docs/data/research_roadmap_interactive.json) |
82
  | Compare current task metrics | [`RESEARCH_TAKEAWAYS.md`](RESEARCH_TAKEAWAYS.md), [`docs/data/summary_metrics.json`](docs/data/summary_metrics.json) |
83
  | Compare possible foundation backbones | [`FOUNDATION_MODEL_PLAN.md`](FOUNDATION_MODEL_PLAN.md), [`docs/data/foundation_model_plan.json`](docs/data/foundation_model_plan.json) |
84
+ | Understand the future native pretraining goal | [`XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md`](XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md) |
85
  | Understand one model input | [`results/episode_task_suite/feature_manifest.json`](results/episode_task_suite/feature_manifest.json), [`results/episode_task_suite/windows.csv`](results/episode_task_suite/windows.csv) |
86
  | Check multi-episode data status | [`results/omni_finetune/DATA_ACCESS_STATUS.md`](results/omni_finetune/DATA_ACCESS_STATUS.md) |
87
 
 
94
  | Task suite | 12 human-readable embodied-AI task contracts with input, process, output, metrics, predictions, and case-study walkthroughs |
95
  | Baselines | Minimal linear/ridge/logistic heads plus compact PyTorch MLP task heads over the same chronological split |
96
  | Research directions | Task mapping and extension probes for human modeling, 3D/4D reconstruction, egocentric interaction, and world modeling |
97
+ | Scale-up path | The gated Xperience-10M dataset is available for a selected 128-episode pilot before Qwen3-Omni LoRA, followed by Cosmos 3/world-model and VLA/policy branches; the long-term goal is an Xperience-native embodied foundation model if full-corpus data, storage, and compute are available |
98
  | Public surfaces | GitHub repo, GitHub Pages dashboard, HF Space, HF artifact dataset, HF baseline-model repo, and HF collection |
99
 
100
  For the fastest interpretation of the current metrics, start with
 
116
  - human-readable research task cards and an interactive scrub/play walkthrough storyboard for every task,
117
  - an interactive research roadmap connecting 12 tasks, four research tracks, current sample evidence, the Qwen3-Omni scale-up path, and foundation-model branch selection,
118
  - a next-milestone track for Qwen3-Omni fine-tuning, Cosmos 3 world modeling, and sensor-bridge evaluation,
119
+ - a future pretraining plan for an Xperience Embodied Foundation Model over the full corpus after smaller multi-episode stages prove value,
120
  - metrics, predictions, model weights, manifests, charts, and a two-level
121
  tabbed static research website,
122
  - a clear explanation of what is implemented now and what moves to the multi-episode stage.
 
131
  | Task suite | Twelve human-readable tasks cover action, procedure, contact, object, language, retrieval, reconstruction, order, and synchronization questions. | [`RESEARCH_TAKEAWAYS.md`](RESEARCH_TAKEAWAYS.md), [`results/episode_task_suite/summary_report.json`](results/episode_task_suite/summary_report.json) |
132
  | Baselines | Minimal heads and compact PyTorch MLP heads provide a first controlled comparison on the same chronological split. | [`results/episode_task_suite/neural_mlp/`](results/episode_task_suite/neural_mlp/) |
133
  | Diagnostics | Audio contribution, modality ablations, timeline overlays, object labels, and alignment stress tests show which signals are useful and which tasks remain hard. | [`results/audio_ablation/AUDIO_ABLATION_SUMMARY.md`](results/audio_ablation/AUDIO_ABLATION_SUMMARY.md), [`docs/single_episode_explorer.html`](docs/single_episode_explorer.html) |
134
+ | Scale-up | A selected 128-episode Qwen3-Omni LoRA pilot is being prepared from the gated dataset; held-out model metrics will be added only after training and evaluation finish. The long-term native-pretraining plan is documented separately as a future research goal. | [`RESEARCH_ROADMAP.md`](RESEARCH_ROADMAP.md), [`FOUNDATION_MODEL_PLAN.md`](FOUNDATION_MODEL_PLAN.md), [`XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md`](XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md), [`results/omni_finetune/DATA_ACCESS_STATUS.md`](results/omni_finetune/DATA_ACCESS_STATUS.md) |
135
 
136
  Detailed dataset notes, reproduction checks, and generated JSON reports are
137
  included for readers who want to inspect the implementation, but they are
 
170
  | 6 | What do the current results mean? | [`RESEARCH_TAKEAWAYS.md`](RESEARCH_TAKEAWAYS.md), [`docs/data/research_takeaways.json`](docs/data/research_takeaways.json), [`docs/data/summary_metrics.json`](docs/data/summary_metrics.json) | Current metrics describe sample-level task behavior and identify which signals need larger held-out experiments. |
171
  | 7 | Which models are implemented? | [`results/episode_task_suite/summary_report.json`](results/episode_task_suite/summary_report.json), [`results/episode_task_suite/neural_mlp/`](results/episode_task_suite/neural_mlp/), [HF baseline repo](https://huggingface.co/cy0307/ropedia-xperience-10m-task-baselines) | Each task has minimal and neural-head evidence over the same feature windows. |
172
  | 8 | What research directions does this support? | [`RESEARCH_ROADMAP.md`](RESEARCH_ROADMAP.md), [`docs/data/research_directions.json`](docs/data/research_directions.json), [`docs/data/research_direction_extensions.json`](docs/data/research_direction_extensions.json) | The tasks are mapped to human modeling, 3D/4D reconstruction, egocentric interaction, and world modeling. |
173
+ | 9 | Which foundation model comes next? | [`FOUNDATION_MODEL_PLAN.md`](FOUNDATION_MODEL_PLAN.md), [`docs/data/foundation_model_plan.json`](docs/data/foundation_model_plan.json), [`XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md`](XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md) | Qwen3-Omni is the first held-out LoRA baseline; Cosmos 3 is the first world-model branch; policy models wait for explicit action targets; Xperience-native pretraining is the full-corpus future goal. |
174
  | 10 | How do I reproduce it? | [`REPRODUCIBILITY.md`](REPRODUCIBILITY.md), [`notes/reproducibility_audit.md`](notes/reproducibility_audit.md) | Public commands and expected outputs are documented for the sample-episode task suite. |
175
  | 11 | What is still pending? | [`DATA_ACCESS_STATUS.md`](results/omni_finetune/DATA_ACCESS_STATUS.md), [`MULTI_EPISODE_ACCESS_STATUS.md`](results/omni_finetune/MULTI_EPISODE_ACCESS_STATUS.md) | Multi-episode Qwen3-Omni model quality will be reported after preprocessing, training, and held-out evaluation complete. |
176
 
 
248
  | Dataset context | `XPERIENCE10M_DATASET_CARD_ALIGNMENT.md`, official dataset links | Explains the official dataset, public sample, modalities, access boundary, and what this repo uses |
249
  | Visual assets | `FIGURE_INDEX.md`, `docs/assets/` | Shows the task-suite graphic, modality thumbnails, pipeline diagrams, charts, and logo assets |
250
  | Evaluation protocol | `EVALUATION_PROTOCOL.md`, `docs/data/evaluation_protocol.json` | Defines the task unit, split, metrics, leakage controls, and current limitations |
251
+ | Research roadmap | `RESEARCH_ROADMAP.md`, `docs/data/research_roadmap.json` | Shows the path from sample-level task development to multi-episode work, larger model branches, and the future native-pretraining goal |
252
+ | Xperience Embodied Foundation Model plan | `XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md` | Describes the long-term full-corpus pretraining goal, target modules, objectives, staged scale-up, hardware ranges, and evaluation protocol |
253
  | Minimal heads | softmax, ridge projection/regression, multi-label logistic heads | Keeps every input/output contract visible and inspectable |
254
  | Neural heads | PyTorch MLP classifiers/regressors under `neural_mlp/` | Checks whether nonlinear heads improve each task without changing features |
255
  | Evidence | metrics, predictions, confusion matrices, diagrams, dashboard | Makes the single-episode task development inspectable without rerunning first |
 
610
  | GR00T | Humanoid/action-policy branch | Use after mocap/contact retargeting creates well-defined humanoid action targets. |
611
  | OpenVLA / openpi | Open VLA/policy baselines | Use after the project defines robot-compatible or action-token targets. |
612
  | Gemini Robotics | External reasoning reference | Use only for qualitative comparison or annotation support unless local trainable access exists. |
613
+ | Xperience Embodied Foundation Model | Future Xperience-native pretraining goal | Use only after multi-episode pilots, full-corpus storage, distributed training infrastructure, and scaling evidence justify a from-scratch domain model. |
614
 
615
  See [`FOUNDATION_MODEL_PLAN.md`](FOUNDATION_MODEL_PLAN.md) and
616
  [`docs/data/foundation_model_plan.json`](docs/data/foundation_model_plan.json)
617
  for the full selection matrix, source links, and model-specific evaluation
618
+ additions. See
619
+ [`XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md`](XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md)
620
+ for the long-term full-corpus pretraining plan.
621
 
622
  ## Four Research Directions
623
 
RESEARCH_ROADMAP.md CHANGED
@@ -15,6 +15,7 @@ should exist before the stage is treated as complete.
15
  | Foundation-Model Selection Matrix | Next | The selected pilot episodes are prepared, or a 3-8 episode dry run is available for preprocessing checks. | Backbone registry, Cosmos 3 world-model branch plan, Qwen3-Omni baseline plan, OpenVLA/openpi/GR00T policy candidates, and model-specific evaluation additions. | `FOUNDATION_MODEL_PLAN.md`, `docs/data/foundation_model_plan.json`, `research_roadmap_interactive.json` |
16
  | 64-128 Episode Robustness Run | Planned | The selected-episode pilot trains and evaluates cleanly. | Split-by-session metrics, modality ablations, calibration/object/language error analysis, and sensitivity to missing views. | Held-out metrics by session, task, and modality; ablation tables; qualitative error analysis. |
17
  | Cosmos 3 and Policy-Model Extensions | Planned | Enough multi-episode data, compute budget, and model-specific action/world-state targets. | Cosmos 3 future-window or action-conditioned world-model probes, OpenVLA/openpi/GR00T action-policy baselines, modality-conditioning checks, affordance tasks, and synthetic-data usefulness tests. | Task-specific held-out evaluations, qualitative inspection, and updated model cards. |
 
18
 
19
  ## Current Decision Point
20
 
@@ -24,9 +25,11 @@ episodes to run the held-out Qwen3-Omni pilot, then choose larger model branches
24
  by task fit. Qwen3-Omni remains the first trainable multimodal LoRA target.
25
  Cosmos 3 becomes the first world-model/action-generation branch. OpenVLA,
26
  openpi, GR00T, Octo, and SmolVLA-style models become policy/action branches only
27
- after the action target is explicit. The public sample is already enough for
28
- task design, feature contracts, walkthroughs, and baseline comparisons. It is
29
- not enough to measure general embodied-AI model quality.
 
 
30
 
31
  ## Stage Details
32
 
@@ -109,6 +112,27 @@ objectives: audio-visible alignment, future-window prediction,
109
  action-conditioned world modeling, synthetic-data usefulness tests, policy-style
110
  next action, contact, object relevance, and affordance reasoning.
111
112
  ## Public Artifacts That Should Move Together
113
 
114
  When a roadmap stage advances, update these public surfaces together:
@@ -118,6 +142,7 @@ When a roadmap stage advances, update these public surfaces together:
118
  - `RESEARCH_TAKEAWAYS.md`
119
  - `EVALUATION_PROTOCOL.md`
120
  - `ARTIFACT_GUIDE.md`
 
121
  - `docs/index.html`
122
  - `docs/data/research_roadmap.json`
123
  - Hugging Face Space, artifact dataset, and model cards
 
15
  | Foundation-Model Selection Matrix | Next | The selected pilot episodes are prepared, or a 3-8 episode dry run is available for preprocessing checks. | Backbone registry, Cosmos 3 world-model branch plan, Qwen3-Omni baseline plan, OpenVLA/openpi/GR00T policy candidates, and model-specific evaluation additions. | `FOUNDATION_MODEL_PLAN.md`, `docs/data/foundation_model_plan.json`, `research_roadmap_interactive.json` |
16
  | 64-128 Episode Robustness Run | Planned | The selected-episode pilot trains and evaluates cleanly. | Split-by-session metrics, modality ablations, calibration/object/language error analysis, and sensitivity to missing views. | Held-out metrics by session, task, and modality; ablation tables; qualitative error analysis. |
17
  | Cosmos 3 and Policy-Model Extensions | Planned | Enough multi-episode data, compute budget, and model-specific action/world-state targets. | Cosmos 3 future-window or action-conditioned world-model probes, OpenVLA/openpi/GR00T action-policy baselines, modality-conditioning checks, affordance tasks, and synthetic-data usefulness tests. | Task-specific held-out evaluations, qualitative inspection, and updated model cards. |
18
+ | Xperience Embodied Foundation Model Pretraining | Future | Full-corpus access, PB-scale storage path, multi-node compute, and positive scaling evidence from smaller runs. | Xperience-native temporal multimodal model, full-corpus manifests, pretraining shards, scaling curves, held-out evaluations, and model card. | Pretraining metadata, checkpoint inventory, held-out metrics, scaling report, and data-boundary report. |
19
 
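The robustness stage in the table above depends on splits that keep every window from one session on a single side of the train/held-out boundary. A minimal sketch of such a split follows, assuming each window record carries a `session_id` field. That field name, the helper names, and the 20% default are hypothetical illustrations, not the project's actual manifest schema or split code.

```python
import hashlib


def is_held_out(session_id: str, held_out_fraction: float = 0.2) -> bool:
    """Deterministically assign a whole session to the held-out side.

    Hashing the session id (an assumed field) keeps the assignment stable
    across runs and independent of the order in which windows are read.
    """
    digest = hashlib.sha256(session_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # value in [0, 1)
    return bucket < held_out_fraction


def split_by_session(windows, held_out_fraction: float = 0.2):
    """Split window records so that no session appears on both sides."""
    train, held_out = [], []
    for window in windows:
        if is_held_out(window["session_id"], held_out_fraction):
            held_out.append(window)
        else:
            train.append(window)
    return train, held_out
```

Hashing the identifier, rather than shuffling windows, is one way to avoid leakage between adjacent windows of the same recording. A chronological split inside each session can still be applied afterwards.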
20
  ## Current Decision Point
21
 
 
25
  by task fit. Qwen3-Omni remains the first trainable multimodal LoRA target.
26
  Cosmos 3 becomes the first world-model/action-generation branch. OpenVLA,
27
  openpi, GR00T, Octo, and SmolVLA-style models become policy/action branches only
28
+ after the action target is explicit. A from-scratch Xperience Embodied
29
+ Foundation Model is the long-term native-pretraining goal, not the immediate
30
+ experiment. The public sample is already enough for task design, feature
31
+ contracts, walkthroughs, and baseline comparisons. It is not enough to measure
32
+ general embodied-AI model quality.
33
 
34
  ## Stage Details
35
 
 
112
  action-conditioned world modeling, synthetic-data usefulness tests, policy-style
113
  next action, contact, object relevance, and affordance reasoning.
114
 
115
+ ### 7. Xperience Embodied Foundation Model Pretraining
116
+
117
+ This stage is the long-term full-corpus goal. Instead of adapting an existing
118
+ backbone, it would pretrain a domain model directly on the synchronized
119
+ Xperience-10M modality structure: video, audio, depth, pose/SLAM, hand/body
120
+ mocap, IMU, calibration, and language annotations.
121
+
122
+ The first realistic target is a 3B-7B Xperience-native domain model after
123
+ smaller 0.3B-1B and 1B-3B pilots prove that the objectives and data loaders
124
+ scale. The training objective should combine masked multimodal modeling,
125
+ cross-modal alignment, future-state prediction, ego-motion and hand-motion
126
+ forecasting, action/procedure prediction, language grounding, contact and
127
+ affordance prediction, and optional policy-style targets after action
128
+ conversion.
129
+
130
+ This stage needs full-corpus access, PB-scale storage planning, high-throughput
131
+ media decoding, distributed training, reliable checkpoints, and held-out
132
+ evaluation across episodes, sessions, activities, objects, and missing
133
+ modalities. The reader-facing plan is in
133
+ `XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md`.
135
+
136
  ## Public Artifacts That Should Move Together
137
 
138
  When a roadmap stage advances, update these public surfaces together:
 
142
  - `RESEARCH_TAKEAWAYS.md`
143
  - `EVALUATION_PROTOCOL.md`
144
  - `ARTIFACT_GUIDE.md`
145
+ - `XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md`
146
  - `docs/index.html`
147
  - `docs/data/research_roadmap.json`
148
  - Hugging Face Space, artifact dataset, and model cards
XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md ADDED
@@ -0,0 +1,178 @@
1
+ # Xperience Embodied Foundation Model Pretraining Goal
2
+
3
+ This document describes a future research direction for the project: a
4
+ domain-specific embodied foundation model pretrained on the full Xperience-10M
5
+ corpus, if full-episode access, storage, and compute become available.
6
+
7
+ Current status: this is a planning artifact. The public project currently
8
+ contains a public-sample task suite, lightweight baselines, Qwen3-Omni LoRA
9
+ preparation, and a smoke LoRA artifact. It does not currently contain a
10
+ from-scratch Xperience foundation model or full-corpus pretraining run.
11
+
12
+ ## Why This Is A Natural Long-Term Goal
13
+
14
+ Xperience-10M is designed for physical-AI pretraining rather than only
15
+ single-task supervised learning. The official dataset card describes 10 million
16
+ experiences, 10,000 hours of synchronized first-person recordings, six video
17
+ streams, audio, stereo depth, camera pose, hand and full-body mocap, IMU, and
18
+ hierarchical language annotations. It also reports 2.88B RGB frames, 720M depth
19
+ frames, 576M pose/mocap frames, 7.2B IMU frames, and about 1 PB of total data.
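As a rough sanity check on the totals quoted above, the frame counts can be divided by the total recording time to get implied aggregate rates. This is only a back-of-envelope sketch: it assumes exactly 10,000 hours of recording and treats each count as a corpus-wide total, which the dataset card may define differently.

```python
# Back-of-envelope check of the dataset-card totals quoted above.
# Assumption: exactly 10,000 hours of recording, counts are corpus-wide totals.
HOURS = 10_000
SECONDS = HOURS * 3600  # 36,000,000 seconds

FRAME_COUNTS = {
    "rgb": 2_880_000_000,
    "depth": 720_000_000,
    "pose_mocap": 576_000_000,
    "imu": 7_200_000_000,
}


def implied_rates(counts, seconds=SECONDS):
    """Return items per second of recording for each modality."""
    return {name: count / seconds for name, count in counts.items()}


rates = implied_rates(FRAME_COUNTS)
for name, rate in rates.items():
    print(f"{name}: {rate:.0f} per second of recording")
```

Under these assumptions the totals work out to 80 RGB frames, 20 depth frames, 16 pose/mocap frames, and 200 IMU samples per second of recording. How the RGB total divides across the individual video streams is not stated here.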
20
+
21
+ That scale and alignment make a specific Xperience-native model plausible:
22
+ not a general web-scale omni model, but an embodied model specialized for
23
+ egocentric perception, human-object interaction, temporal dynamics, physical
24
+ state, and task intent.
25
+
26
+ ## Target Model
27
+
28
+ The proposed model name is **Xperience Embodied Foundation Model**.
29
+
30
+ The model should learn a shared temporal representation of embodied experience:
31
+ what the wearer sees and hears, how the camera moves, how the body and hands
32
+ move, what objects are involved, what geometry is present, and what task is
33
+ being performed.
34
+
35
+ Expected modules:
36
+
37
+ | Module | Input | Role |
38
+ | --- | --- | --- |
39
+ | Multi-view video encoder | fisheye/stereo/RGB streams | visual state, egocentric context, object interaction |
40
+ | Audio encoder | synchronized MP4 audio | event cues, contact-like sound, temporal grounding |
41
+ | Depth and geometry encoder | depth, confidence, calibration | spatial structure and 3D/4D scene cues |
42
+ | Pose/SLAM encoder | camera trajectory and orientation | ego-motion, viewpoint, scene traversal |
43
+ | Mocap encoder | hand/body joints | human motion, hand-object interaction, affordance cues |
44
+ | IMU encoder | accelerometer/gyroscope streams | inertial dynamics and wearable motion |
45
+ | Language encoder/decoder | task/subtask/action/object annotations | semantic grounding and structured generation |
46
+ | Temporal fusion transformer | aligned per-window modality tokens | shared embodied representation across time |
47
+ | Task heads / decoders | fused representation | action, caption, future motion, retrieval, reconstruction, and world-state outputs |
48
+
49
+ ## Pretraining Objectives
50
+
51
+ The model should not rely on one loss. It should combine complementary
52
+ objectives so that every modality contributes to the shared representation.
53
+
54
+ | Objective | What the model learns | Example output |
55
+ | --- | --- | --- |
56
+ | Masked multimodal modeling | recover hidden video/depth/sensor tokens from context | reconstructed latent patches or sensor features |
57
+ | Cross-modal contrastive alignment | align video, motion, audio, geometry, and language from the same time window | matching score or retrieval embedding |
58
+ | Future-state prediction | predict what changes after the current window | future visual/depth/motion latent |
59
+ | Ego-motion and hand-motion forecasting | model wearer/body dynamics | future camera delta or hand trajectory |
60
+ | Action and procedure prediction | connect physical state to task semantics | action, subtask, transition, next action |
61
+ | Language grounding and captioning | connect temporal windows to natural language | caption, object/action grounding, structured JSON |
62
+ | Contact and affordance prediction | learn interaction state from human-object motion | contact state, relevant object set |
63
+ | Optional policy-style targets | learn action-like outputs after target conversion | action token, motion chunk, retargeted policy target |
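To make the cross-modal contrastive alignment row concrete, here is a minimal sketch of a symmetric InfoNCE-style loss over paired window embeddings. The plan does not specify the loss, so the choice of InfoNCE, the temperature value, and the function names are all assumptions made for illustration. The code is pure Python for readability; a real run would use a tensor library.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def _cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        peak = max(row)
        log_z = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def info_nce(emb_a, emb_b, temperature=0.07):
    """Symmetric InfoNCE: emb_a[i] and emb_b[i] come from the same time window."""
    a = [_normalize(v) for v in emb_a]
    b = [_normalize(v) for v in emb_b]
    # Cosine similarity between every a/b pair, scaled by the temperature.
    logits = [[sum(x * y for x, y in zip(u, w)) / temperature for w in b] for u in a]
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (_cross_entropy_diag(logits) + _cross_entropy_diag(transposed))


# Hypothetical 2-D embeddings for two windows in two modalities.
video = [[1.0, 0.0], [0.0, 1.0]]
motion_aligned = [[1.0, 0.0], [0.0, 1.0]]
motion_swapped = [[0.0, 1.0], [1.0, 0.0]]
print(info_nce(video, motion_aligned) < info_nce(video, motion_swapped))
```

The loss is low when each window's embeddings agree across modalities and high when they are swapped, which is the behavior the alignment objective is meant to reward.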
64
+
65
+ ## Staged Pretraining Plan
66
+
67
+ ### Stage 0: Data Contract And Quality Gate
68
+
69
+ Use the existing public-sample task suite to define the data contract. Before
70
+ pretraining, every episode must pass a strict manifest check:
71
+
72
+ - `annotation.hdf5` exists and is readable,
73
+ - video streams are present or missing views are explicitly recorded,
74
+ - audio can be extracted or marked unavailable,
75
+ - depth, pose, mocap, IMU, calibration, and language fields are indexed,
76
+ - windows are aligned by timestamp or frame index,
77
+ - train/val/test splits are made at the episode level, so windows from one episode never leak across splits,
78
+ - raw data remains outside public repos and Hugging Face artifact mirrors.
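The checklist above could be sketched as a small per-episode gate plus a deterministic episode-level split. The manifest keys, status strings, and function names below are hypothetical, since the plan does not define a manifest schema. The file check only confirms that `annotation.hdf5` exists; confirming that it is readable would need an HDF5 library.

```python
import hashlib
from pathlib import Path

# Hypothetical manifest schema: these keys and status strings are illustrative only.
INDEXED_FIELDS = ("depth", "pose", "mocap", "imu", "calibration", "language")


def check_episode(episode_dir, manifest):
    """Return a list of problems; an empty list means the episode passes."""
    problems = []
    if not (Path(episode_dir) / "annotation.hdf5").is_file():
        problems.append("annotation.hdf5 missing")
    views = manifest.get("video_views", {})
    if not views:
        problems.append("no video views recorded")
    for view, status in views.items():
        # A missing view is acceptable only when it is explicitly recorded.
        if status not in ("present", "missing"):
            problems.append(f"video view {view} has no explicit status")
    if manifest.get("audio") not in ("extracted", "unavailable"):
        problems.append("audio neither extracted nor marked unavailable")
    for field in INDEXED_FIELDS:
        if field not in manifest.get("indexed_fields", ()):
            problems.append(f"{field} not indexed")
    if manifest.get("alignment") not in ("timestamp", "frame_index"):
        problems.append("windows not aligned by timestamp or frame index")
    return problems


def assign_split(episode_id, val_fraction=0.1, test_fraction=0.1):
    """Hash the episode id so every window inherits its episode's split."""
    digest = hashlib.sha256(episode_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
    if bucket < test_fraction:
        return "test"
    if bucket < test_fraction + val_fraction:
        return "val"
    return "train"
```

Hashing the episode identifier keeps the assignment stable across reruns and machines, so windows cut from one episode cannot end up in different splits.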
79
+
80
+ ### Stage 1: 128-1,000 Episode Representation Pilot
81
+
82
+ Start with a smaller model and a selected subset. The goal is to test whether
83
+ the multimodal objectives train stably and improve held-out task performance.
84
+
85
+ Recommended scale:
86
+
87
+ - 128 to 1,000 episodes,
88
+ - frozen or lightly trainable video/audio encoders at first,
89
+ - 0.3B-1B temporal fusion model,
90
+ - all available sensor modalities represented as tokens,
91
+ - evaluation on the existing 12-task suite plus future-state/retrieval probes.
92
+
93
+ ### Stage 2: 10K Episode Domain Model
94
+
95
+ Scale up only after the pilot improves held-out task performance. This stage should train a stronger
96
+ Xperience-specific representation model rather than only fine-tuning a general
97
+ omni model.
98
+
99
+ Recommended scale:
100
+
101
+ - thousands to 10K episodes,
102
+ - 1B-3B parameter multimodal temporal model,
103
+ - mixed supervised, contrastive, and predictive objectives,
104
+ - held-out sessions and held-out activities,
105
+ - robustness to missing camera views and sensor dropout.
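One common way to train for robustness to missing views and sensor dropout is to hide modalities at random during training. The sketch below illustrates that idea only; the plan does not say how robustness will be achieved, and the function name, the drop probability, and the use of `None` for a dropped modality are assumptions.

```python
import random


def drop_modalities(tokens, drop_prob=0.3, keep_at_least=1, rng=None):
    """Randomly hide modalities in one training window.

    tokens maps a modality name to its token sequence. Dropped modalities
    are returned as None so the model can learn to handle missing inputs.
    """
    rng = rng or random.Random()
    names = list(tokens)
    kept = [name for name in names if rng.random() >= drop_prob]
    if len(kept) < keep_at_least:
        # Restore randomly chosen modalities so the window is never empty.
        dropped = [name for name in names if name not in kept]
        rng.shuffle(dropped)
        kept += dropped[: keep_at_least - len(kept)]
    return {name: (tokens[name] if name in kept else None) for name in names}


# Hypothetical window with three modalities.
window = {"video": [1, 2, 3], "imu": [4, 5], "audio": [6]}
print(drop_modalities(window, rng=random.Random(0)))
```

At evaluation time the same mechanism can be applied deterministically, for example by hiding one named camera view, to measure how much each modality contributes.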
106
+
107
+ ### Stage 3: Full-Corpus Xperience Embodied Foundation Model
108
+
109
+ Use this stage only if storage, data throughput, and multi-node compute are
110
+ available. The goal is a domain foundation model over embodied human experience,
111
+ not a general internet-scale language model.
112
+
113
+ Recommended scale:
114
+
115
+ - all available Xperience-10M episodes,
116
+ - 3B-7B domain model as a realistic first full-corpus target,
117
+ - larger models only after scaling curves justify the cost,
118
+ - mixture of reconstruction, retrieval, forecasting, language, and world-model
119
+ objectives,
120
+ - downstream evaluation on held-out episodes, held-out sessions, unseen
121
+ objects, unseen activities, and downstream robotics/world-model tasks.
122
+
123
+ ## Hardware Requirements
124
+
125
+ These are planning ranges, not measurements from completed runs in this repo.
126
+
127
+ | Training goal | Typical compute | Storage and data path | Practical use |
128
+ | --- | --- | --- | --- |
129
+ | 0.3B-1B pilot | 8-32 modern 80GB-class data-center GPUs | tens of TB plus fast local cache | prove objectives and data loaders |
130
+ | 1B-3B domain model | 32-128 GPUs | 100TB-scale cache, high-throughput decoding | serious research-scale pretraining |
131
+ | 3B-7B full-corpus domain model | 128-512 GPUs | PB-scale storage plus 100-400Gbps networking | first full Xperience-native foundation model |
132
+ | 30B-class omni model from scratch | 512-2,000+ GPUs | PB-scale storage, multi-node orchestration, large checkpoint budget | lab-scale project, not the first target |
133
+ | frontier general omni model | thousands of GPUs | data beyond Xperience-10M plus large infrastructure | out of scope for this project |
134
+
135
+ For full-corpus work, storage and the data path matter as much as GPU count:
136
+
137
+ - raw corpus storage around the official dataset scale,
138
+ - 1.5-3x extra capacity for derived shards, caches, checkpoints, and metadata,
139
+ - fast NVMe cache for active shards,
140
+ - parallel media decoding and feature extraction workers,
141
+ - distributed training with reliable checkpoint/restart,
142
+ - per-episode provenance and split manifests.
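The capacity items above can be turned into a simple budget. This sketch assumes a raw corpus of about 1 PB, as quoted from the dataset card, and reads "1.5-3x extra capacity" as additional space on top of the raw copy. Both are interpretations for illustration, not measured figures.

```python
def storage_budget_pb(raw_pb=1.0, extra_low=1.5, extra_high=3.0):
    """Return (low, high) total capacity in PB: raw corpus plus derived data."""
    low = raw_pb * (1 + extra_low)
    high = raw_pb * (1 + extra_high)
    return low, high


low, high = storage_budget_pb()
print(f"planning range: {low:.1f} to {high:.1f} PB")
```

Under that reading, a full-corpus run would plan for roughly 2.5 to 4 PB in total, before counting the separate NVMe cache for active shards.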
143
+
144
+ ## Evaluation Protocol
145
+
146
+ The model should not be judged only by training loss. Evaluation should include:
147
+
148
+ - JSON validity and structured task metrics from the current task suite,
149
+ - action/subtask/contact/object metrics on held-out episodes,
150
+ - text-to-window and window-to-text retrieval,
151
+ - future ego-motion and hand-motion forecasting,
152
+ - cross-modal reconstruction and missing-modality robustness,
153
+ - held-out object/activity/session generalization,
154
+ - qualitative inspection of retrieved or generated future states,
155
+ - downstream transfer to Qwen3-Omni, Cosmos-style world modeling, and
156
+ policy/action branches.
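For the retrieval item in the list above, a standard metric is recall@k. The sketch below assumes a square similarity matrix in which query `i` matches candidate `i`. That pairing convention and the function name are assumptions, since the plan does not name a retrieval metric.

```python
def recall_at_k(similarity, k):
    """Fraction of queries whose correct candidate ranks in the top k.

    similarity[i][j] is the score of query i against candidate j, and the
    correct candidate for query i is assumed to be candidate i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(similarity)


# Hypothetical text-to-window scores for three captions and three windows.
scores = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.1, 0.2, 0.7],
]
print(recall_at_k(scores, 1), recall_at_k(scores, 2))
```

Window-to-text retrieval can reuse the same function on the transposed matrix.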
157
+
158
+ ## Relationship To Existing Public Work
159
+
160
+ The current public project is the harness for this future model:
161
+
162
+ - the 12-task suite defines concrete input/output contracts,
163
+ - minimal and neural baselines provide initial supervised targets,
164
+ - audio/modality diagnostics show which signals contribute,
165
+ - Qwen3-Omni LoRA provides the first trainable multi-episode adapter path,
166
+ - Cosmos and policy branches define downstream model families,
167
+ - the pretraining goal unifies these into a long-term representation-learning
168
+ direction.
169
+
170
+ The next practical step is still selected multi-episode preparation and
171
+ held-out Qwen3-Omni LoRA evaluation. Full-corpus pretraining should come after
172
+ the smaller scaling stages show measurable value.
173
+
174
+ ## Source Links
175
+
176
+ - Official Xperience-10M dataset: https://huggingface.co/datasets/ropedia-ai/xperience-10m
177
+ - Ropedia Xperience-10M release page: https://ropedia.com/blog/20260316_xperience_10m
178
+ - Ropedia physical-AI data infrastructure page: https://ropedia-dev.com/
data/artifact_index.json CHANGED
@@ -1,11 +1,11 @@
1
  {
2
  "title": "Ropedia Xperience-10M Task Suite Artifact Index",
3
- "generated_at_utc": "2026-06-04T16:42:13+00:00",
4
  "status": "pass",
5
- "artifact_count": 72,
6
  "missing": [],
7
  "by_kind": {
8
- "project_path": 11,
9
  "project_scope": 1,
10
  "source_alignment": 5,
11
  "publication_workflow": 1,
@@ -62,8 +62,8 @@
62
  "surface": "repo_hf",
63
  "shows": "Gives a compact current-state table for first-pass readers.",
64
  "exists": true,
65
- "bytes": 7138,
66
- "sha256": "67d85a198ee90082e47d790bd0f4d9dafbc97625cd39b17cc94b9785ec25104a"
67
  },
68
  {
69
  "id": "project_status_json",
@@ -73,8 +73,8 @@
73
  "surface": "website_hf",
74
  "shows": "Machine-readable copy of the current project status for website and HF mirrors.",
75
  "exists": true,
76
- "bytes": 9169,
77
- "sha256": "50d3c87b774c8375dcb897bd363d25e392e5fd6571571c41d56e623df15063f8"
78
  },
79
  {
80
  "id": "research_roadmap",
@@ -84,8 +84,8 @@
84
  "surface": "repo_hf",
85
  "shows": "Defines the path from public-sample task development to multi-episode held-out evaluation and larger omni-model extensions.",
86
  "exists": true,
87
- "bytes": 6677,
88
- "sha256": "58491bfb68ad3e6b7569bdb1a3cac3de7682a49beb9de368a114d58ebf0b118b"
89
  },
90
  {
91
  "id": "research_roadmap_json",
@@ -95,8 +95,8 @@
95
  "surface": "website_hf",
96
  "shows": "Machine-readable research roadmap for the website and Hugging Face mirrors.",
97
  "exists": true,
98
- "bytes": 5758,
99
- "sha256": "54657eb8824416d2128d6e5710543bdaf9e41d7c2fa46dd14ad6b58fede3b5db"
100
  },
101
  {
102
  "id": "foundation_model_plan",
@@ -106,8 +106,8 @@
106
  "surface": "repo_hf",
107
  "shows": "Defines the post-data-gate backbone choices: Qwen3-Omni first, Cosmos 3 for world modeling, and VLA/policy models after action-target conversion.",
108
  "exists": true,
109
- "bytes": 6559,
110
- "sha256": "955be6559b554f1c6c4141dd6ca2818127d89585df3940c2bd9b975ad9047926"
111
  },
112
  {
113
  "id": "foundation_model_plan_json",
@@ -117,8 +117,19 @@
117
  "surface": "website_hf",
118
  "shows": "Machine-readable foundation-model selection matrix with source links, entry conditions, and evaluation additions.",
119
  "exists": true,
120
- "bytes": 8889,
121
- "sha256": "e9b11114fa290253000b921575586780ccc3ba17665235259d4326c524f6ce97"
122
  },
123
  {
124
  "id": "evidence_contract",
@@ -150,8 +161,8 @@
150
  "surface": "repo_hf",
151
  "shows": "Gives the human-readable map from project scope to data, tasks, platform mirrors, and scale-up status.",
152
  "exists": true,
153
- "bytes": 16890,
154
- "sha256": "8bce9a773daf36214e377a7154b72a4493efd0f7d1a1941d5e0fc9bf784a29e5"
155
  },
156
  {
157
  "id": "official_dataset_card_alignment",
@@ -195,7 +206,7 @@
195
  "shows": "Machine-readable source-alignment pass/fail check for repo, website, and HF surfaces.",
196
  "exists": true,
197
  "bytes": 4432,
198
- "sha256": "96c7adc61c869fab71ef34ec2f6ec4f5f88af844509bd3d51d3818732d1f84b6"
199
  },
200
  {
201
  "id": "source_alignment_validator",
@@ -573,8 +584,8 @@
573
  "surface": "repo_hf",
574
  "shows": "Generates the selective artifact catalog from local files.",
575
  "exists": true,
576
- "bytes": 26568,
577
- "sha256": "a611b399e858560f6afb41e121f033724753c5167d04e0d7bf243e569de88f04"
578
  },
579
  {
580
  "id": "publication_audit",
@@ -585,7 +596,7 @@
585
  "volatile": true,
586
  "shows": "Confirms public bundles exclude raw data, caches, heavy archives, and credential text.",
587
  "exists": true,
588
- "bytes": 7289,
589
  "hash_policy": "existence_and_size_only"
590
  },
591
  {
@@ -597,7 +608,7 @@
597
  "volatile": true,
598
  "shows": "Separates setup paths from completed held-out-episode results.",
599
  "exists": true,
600
- "bytes": 19505,
601
  "hash_policy": "existence_and_size_only"
602
  },
603
  {
@@ -609,7 +620,7 @@
609
  "volatile": true,
610
  "shows": "Confirms prepared GitHub/HF Space/artifact/model mirrors share the same critical data, figure, website HTML, and validator files.",
611
  "exists": true,
612
- "bytes": 108617,
613
  "hash_policy": "existence_and_size_only"
614
  },
615
  {
@@ -621,7 +632,7 @@
621
  "volatile": true,
622
  "shows": "Confirms local website links, anchors, JSON data files, and referenced images resolve.",
623
  "exists": true,
624
- "bytes": 14923,
625
  "hash_policy": "existence_and_size_only"
626
  },
627
  {
 
1
  {
2
  "title": "Ropedia Xperience-10M Task Suite Artifact Index",
3
+ "generated_at_utc": "2026-06-04T20:40:52+00:00",
4
  "status": "pass",
5
+ "artifact_count": 73,
6
  "missing": [],
7
  "by_kind": {
8
+ "project_path": 12,
9
  "project_scope": 1,
10
  "source_alignment": 5,
11
  "publication_workflow": 1,
 
62
  "surface": "repo_hf",
63
  "shows": "Gives a compact current-state table for first-pass readers.",
64
  "exists": true,
65
+ "bytes": 7207,
66
+ "sha256": "7baaba976ccc254da1a03ee2653057d1e08f3fb0c0cad035886c362442828720"
67
  },
68
  {
69
  "id": "project_status_json",
 
73
  "surface": "website_hf",
74
  "shows": "Machine-readable copy of the current project status for website and HF mirrors.",
75
  "exists": true,
76
+ "bytes": 9874,
77
+ "sha256": "600c95726eae3404127a8b2110f35468ff2ba02943cae0fbcd3ea43c66109d3e"
78
  },
79
  {
80
  "id": "research_roadmap",
 
84
  "surface": "repo_hf",
85
  "shows": "Defines the path from public-sample task development to multi-episode held-out evaluation and larger omni-model extensions.",
86
  "exists": true,
87
+ "bytes": 8388,
88
+ "sha256": "0b3e3356076998ad94dc39f708cc783a4ebeab76c9da661cdd37ea12a3bb3665"
89
  },
90
  {
91
  "id": "research_roadmap_json",
 
95
  "surface": "website_hf",
96
  "shows": "Machine-readable research roadmap for the website and Hugging Face mirrors.",
97
  "exists": true,
98
+ "bytes": 7161,
99
+ "sha256": "cc96118c2c05108c831616151bc027441f7545495adeeb6a4a6a6bffe8da7801"
100
  },
101
  {
102
  "id": "foundation_model_plan",
 
106
  "surface": "repo_hf",
107
  "shows": "Defines the post-data-gate backbone choices: Qwen3-Omni first, Cosmos 3 for world modeling, and VLA/policy models after action-target conversion.",
108
  "exists": true,
109
+ "bytes": 9075,
110
+ "sha256": "444d13ab556d2e16a199a7fca191b87c85ab8685d167aab357bc6341839299a2"
111
  },
112
  {
113
  "id": "foundation_model_plan_json",
 
117
  "surface": "website_hf",
118
  "shows": "Machine-readable foundation-model selection matrix with source links, entry conditions, and evaluation additions.",
119
  "exists": true,
120
+ "bytes": 12981,
121
+ "sha256": "9cce52025a2e2f8afb4660e2af3353aea6ad0a1af380849218dd74c0acc271bb"
122
+ },
123
+ {
124
+ "id": "xperience_embodied_foundation_pretraining",
125
+ "title": "Xperience Embodied Foundation Model pretraining goal",
126
+ "path": "XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md",
127
+ "kind": "project_path",
128
+ "surface": "repo_hf",
129
+ "shows": "Describes the future full-corpus Xperience-native pretraining goal, target modules, objectives, staged scale-up, hardware ranges, and evaluation protocol.",
130
+ "exists": true,
131
+ "bytes": 9182,
132
+ "sha256": "b5a6ddc58647cd895a4772b110ecc9f4d685427fb37b81b22c6c02d2b9b323f1"
133
  },
134
  {
135
  "id": "evidence_contract",
 
161
  "surface": "repo_hf",
162
  "shows": "Gives the human-readable map from project scope to data, tasks, platform mirrors, and scale-up status.",
163
  "exists": true,
164
+ "bytes": 11440,
165
+ "sha256": "9b8821a9b14fe1744f2e6b5c419b2c5daaf70b57f1944caf1105c36c0c66c119"
166
  },
167
  {
168
  "id": "official_dataset_card_alignment",
 
206
  "shows": "Machine-readable source-alignment pass/fail check for repo, website, and HF surfaces.",
207
  "exists": true,
208
  "bytes": 4432,
209
+ "sha256": "06c6e2d111c72df01ed127fd288e6675b63e35a21ae12a2523931a072bd0bc49"
210
  },
211
  {
212
  "id": "source_alignment_validator",
 
584
  "surface": "repo_hf",
585
  "shows": "Generates the selective artifact catalog from local files.",
586
  "exists": true,
587
+ "bytes": 27020,
588
+ "sha256": "0ca7ed96f24caecbab31687cffa99f0eba8471258986412a294614e688c5aff5"
589
  },
590
  {
591
  "id": "publication_audit",
 
596
  "volatile": true,
597
  "shows": "Confirms public bundles exclude raw data, caches, heavy archives, and credential text.",
598
  "exists": true,
599
+ "bytes": 11811,
600
  "hash_policy": "existence_and_size_only"
601
  },
602
  {
 
608
  "volatile": true,
609
  "shows": "Separates setup paths from completed held-out-episode results.",
610
  "exists": true,
611
+ "bytes": 18981,
612
  "hash_policy": "existence_and_size_only"
613
  },
614
  {
 
620
  "volatile": true,
621
  "shows": "Confirms prepared GitHub/HF Space/artifact/model mirrors share the same critical data, figure, website HTML, and validator files.",
622
  "exists": true,
623
+ "bytes": 108621,
624
  "hash_policy": "existence_and_size_only"
625
  },
626
  {
 
632
  "volatile": true,
633
  "shows": "Confirms local website links, anchors, JSON data files, and referenced images resolve.",
634
  "exists": true,
635
+ "bytes": 14891,
636
  "hash_policy": "existence_and_size_only"
637
  },
638
  {
data/foundation_model_plan.json CHANGED
@@ -2,6 +2,16 @@
2
  "title": "Xperience-10M Foundation Model Plan",
3
  "status": "planning_artifact",
4
  "current_boundary": "No held-out multi-episode foundation-model result has been completed in this repo. The current foundation-model artifacts are setup-stage until enough valid episodes are prepared and evaluated.",
5
  "decision": {
6
  "immediate_trainable_backbone": "Qwen3-Omni",
7
  "first_world_model_branch": "Cosmos 3",
@@ -10,7 +20,65 @@
10
  "openpi pi0/pi0.5",
11
  "NVIDIA GR00T"
12
  ],
13
- "external_reasoning_reference": "Gemini Robotics"
14
  },
15
  "model_families": [
16
  {
@@ -112,6 +180,21 @@
112
  "current_decision": "optional_baseline_after_data_staging",
113
  "entry_condition": "Action labels and baseline protocol exist.",
114
  "public_source": "https://github.com/huggingface/lerobot"
115
  }
116
  ],
117
  "execution_order": [
@@ -144,6 +227,11 @@
144
  "step": 6,
145
  "name": "Publishing threshold",
146
  "action": "Publish branch results only with real manifests, predictions, metrics, and qualitative examples."
147
  }
148
  ],
149
  "evaluation_additions": [
@@ -230,6 +318,10 @@
230
  {
231
  "label": "LeRobot / SmolVLA",
232
  "url": "https://github.com/huggingface/lerobot"
233
  }
234
  ]
235
  }
 
2
  "title": "Xperience-10M Foundation Model Plan",
3
  "status": "planning_artifact",
4
  "current_boundary": "No held-out multi-episode foundation-model result has been completed in this repo. The current foundation-model artifacts are setup-stage until enough valid episodes are prepared and evaluated.",
5
+ "backbone_registry": {
6
+ "config_dir": "configs/omni_backbones",
7
+ "validator": "scripts/omni/backbone_registry.py --validate --json",
8
+ "extension_contract": "OMNI_MODEL_EXTENSION_CONTRACT.md",
9
+ "implemented_backbone": "qwen3_omni_lora",
10
+ "planned_backbones": [
11
+ "cosmos_world_model",
12
+ "policy_vla_branch"
13
+ ]
14
+ },
15
  "decision": {
16
  "immediate_trainable_backbone": "Qwen3-Omni",
17
  "first_world_model_branch": "Cosmos 3",
 
20
  "openpi pi0/pi0.5",
21
  "NVIDIA GR00T"
22
  ],
23
+ "external_reasoning_reference": "Gemini Robotics",
24
+ "long_term_native_pretraining_goal": "Xperience Embodied Foundation Model"
25
+ },
26
+ "future_pretraining_goal": {
27
+ "name": "Xperience Embodied Foundation Model",
28
+ "status": "future_planning_goal",
29
+ "role": "Domain-specific embodied foundation model pretrained on full Xperience-10M if full-corpus data, storage, and compute become available.",
30
+ "not_current_result": true,
31
+ "document": "XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md",
32
+ "entry_conditions": [
33
+ "Selected multi-episode Qwen3-Omni pilot trains and evaluates cleanly.",
34
+ "Scaling from 128 episodes to thousands of episodes shows measurable value.",
35
+ "Full-corpus storage, derived-shard storage, and fast active-cache capacity are available.",
36
+ "Distributed training, checkpoint/restart, and provenance tracking are reliable.",
37
+ "Evaluation covers held-out episodes, sessions, activities, objects, and missing-modality robustness."
38
+ ],
39
+ "target_modules": [
40
+ "multi-view video encoder",
41
+ "audio encoder",
42
+ "depth and geometry encoder",
43
+ "pose/SLAM encoder",
44
+ "hand/body mocap encoder",
45
+ "IMU encoder",
46
+ "language encoder/decoder",
47
+ "temporal fusion transformer",
48
+ "task heads and decoders"
49
+ ],
50
+ "pretraining_objectives": [
51
+ "masked multimodal modeling",
52
+ "cross-modal contrastive alignment",
53
+ "future-state prediction",
54
+ "ego-motion and hand-motion forecasting",
55
+ "action and procedure prediction",
56
+ "language grounding and captioning",
57
+ "contact and affordance prediction",
58
+ "optional policy-style targets after action conversion"
59
+ ],
60
+ "hardware_ranges": [
61
+ {
62
+ "goal": "0.3B-1B pilot",
63
+ "compute": "8-32 modern 80GB-class data-center GPUs",
64
+ "use": "prove objectives and data loaders"
65
+ },
66
+ {
67
+ "goal": "1B-3B domain model",
68
+ "compute": "32-128 GPUs",
69
+ "use": "research-scale Xperience representation learning"
70
+ },
71
+ {
72
+ "goal": "3B-7B full-corpus domain model",
73
+ "compute": "128-512 GPUs",
74
+ "use": "first realistic full Xperience-native foundation model"
75
+ },
76
+ {
77
+ "goal": "30B-class omni model from scratch",
78
+ "compute": "512-2000+ GPUs",
79
+ "use": "lab-scale project after scaling curves justify cost"
80
+ }
81
+ ]
82
  },
83
  "model_families": [
84
  {
 
180
  "current_decision": "optional_baseline_after_data_staging",
181
  "entry_condition": "Action labels and baseline protocol exist.",
182
  "public_source": "https://github.com/huggingface/lerobot"
183
+ },
184
+ {
185
+ "priority": 8,
186
+ "family": "Xperience Embodied Foundation Model",
187
+ "category": "xperience_native_pretraining_goal",
188
+ "openness": "future project-specific model if full-corpus access and compute exist",
189
+ "best_role": "Domain model over synchronized embodied experience.",
190
+ "xperience10m_fit": [
191
+ "Uses the full aligned modality stack rather than treating sensors as auxiliary metadata.",
192
+ "Targets temporal embodied representation learning across perception, motion, geometry, audio, and language.",
193
+ "Can become the shared pretraining backbone for Qwen-style instruction tasks, Cosmos-style world modeling, and policy/action branches."
194
+ ],
195
+ "current_decision": "future_goal_after_scaling_evidence",
196
+ "entry_condition": "Full-corpus data path, PB-scale storage, multi-node compute, and positive smaller-run scaling evidence.",
197
+ "public_source": "XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md"
198
  }
199
  ],
200
  "execution_order": [
 
227
  "step": 6,
228
  "name": "Publishing threshold",
229
  "action": "Publish branch results only with real manifests, predictions, metrics, and qualitative examples."
230
+ },
231
+ {
232
+ "step": 7,
233
+ "name": "Xperience-native pretraining",
234
+ "action": "Start a from-scratch Xperience Embodied Foundation Model only after smaller scaling stages, full-corpus storage, multi-node compute, and held-out evaluation protocols are in place."
235
  }
236
  ],
237
  "evaluation_additions": [
 
318
  {
319
  "label": "LeRobot / SmolVLA",
320
  "url": "https://github.com/huggingface/lerobot"
321
+ },
322
+ {
323
+ "label": "Xperience Embodied Foundation Model pretraining plan",
324
+ "url": "XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md"
325
  }
326
  ]
327
  }
data/mirror_parity.json CHANGED
@@ -1,6 +1,6 @@
1
  {
2
  "status": "pass",
3
- "generated_at_utc": "2026-06-04T18:33:44+00:00",
4
  "hf_root": "hf_publish",
5
  "summary": {
6
  "group_count": 101,
@@ -71,27 +71,27 @@
71
  "local": {
72
  "path": "repo:docs/data/artifact_index.json",
73
  "exists": true,
74
- "bytes": 32296,
75
- "sha256": "5494e5ee1e40bc50d44a9cd6f77c8de694175939bda4f174fb5b1554e53ec508"
76
  },
77
  "mirrors": {
78
  "hf_space": {
79
  "path": "hf_space:data/artifact_index.json",
80
  "exists": true,
81
- "bytes": 32296,
82
- "sha256": "5494e5ee1e40bc50d44a9cd6f77c8de694175939bda4f174fb5b1554e53ec508"
83
  },
84
  "hf_artifacts": {
85
  "path": "hf_artifacts:docs/data/artifact_index.json",
86
  "exists": true,
87
- "bytes": 32296,
88
- "sha256": "5494e5ee1e40bc50d44a9cd6f77c8de694175939bda4f174fb5b1554e53ec508"
89
  },
90
  "hf_model": {
91
  "path": "hf_model:metrics/artifact_index.json",
92
  "exists": true,
93
- "bytes": 32296,
94
- "sha256": "5494e5ee1e40bc50d44a9cd6f77c8de694175939bda4f174fb5b1554e53ec508"
95
  }
96
  },
97
  "failures": []
@@ -226,27 +226,27 @@
226
  "local": {
227
  "path": "repo:docs/data/foundation_model_plan.json",
228
  "exists": true,
229
- "bytes": 8889,
230
- "sha256": "e9b11114fa290253000b921575586780ccc3ba17665235259d4326c524f6ce97"
231
  },
232
  "mirrors": {
233
  "hf_space": {
234
  "path": "hf_space:data/foundation_model_plan.json",
235
  "exists": true,
236
- "bytes": 8889,
237
- "sha256": "e9b11114fa290253000b921575586780ccc3ba17665235259d4326c524f6ce97"
238
  },
239
  "hf_artifacts": {
240
  "path": "hf_artifacts:docs/data/foundation_model_plan.json",
241
  "exists": true,
242
- "bytes": 8889,
243
- "sha256": "e9b11114fa290253000b921575586780ccc3ba17665235259d4326c524f6ce97"
244
  },
245
  "hf_model": {
246
  "path": "hf_model:metrics/foundation_model_plan.json",
247
  "exists": true,
248
- "bytes": 8889,
249
- "sha256": "e9b11114fa290253000b921575586780ccc3ba17665235259d4326c524f6ce97"
250
  }
251
  },
252
  "failures": []
@@ -412,27 +412,27 @@
412
  "local": {
413
  "path": "repo:docs/data/project_status.json",
414
  "exists": true,
415
- "bytes": 9169,
416
- "sha256": "50d3c87b774c8375dcb897bd363d25e392e5fd6571571c41d56e623df15063f8"
417
  },
418
  "mirrors": {
419
  "hf_space": {
420
  "path": "hf_space:data/project_status.json",
421
  "exists": true,
422
- "bytes": 9169,
423
- "sha256": "50d3c87b774c8375dcb897bd363d25e392e5fd6571571c41d56e623df15063f8"
424
  },
425
  "hf_artifacts": {
426
  "path": "hf_artifacts:docs/data/project_status.json",
427
  "exists": true,
428
- "bytes": 9169,
429
- "sha256": "50d3c87b774c8375dcb897bd363d25e392e5fd6571571c41d56e623df15063f8"
430
  },
431
  "hf_model": {
432
  "path": "hf_model:metrics/project_status.json",
433
  "exists": true,
434
- "bytes": 9169,
435
- "sha256": "50d3c87b774c8375dcb897bd363d25e392e5fd6571571c41d56e623df15063f8"
436
  }
437
  },
438
  "failures": []
@@ -444,26 +444,26 @@
444
  "path": "repo:docs/data/publication_audit.json",
445
  "exists": true,
446
  "bytes": 7237,
447
- "sha256": "a95c93592ba70709b2fad24a911d19329e6823f25862cd4fcb256788190dd0f2"
448
  },
449
  "mirrors": {
450
  "hf_space": {
451
  "path": "hf_space:data/publication_audit.json",
452
  "exists": true,
453
  "bytes": 7237,
454
- "sha256": "a95c93592ba70709b2fad24a911d19329e6823f25862cd4fcb256788190dd0f2"
455
  },
456
  "hf_artifacts": {
457
  "path": "hf_artifacts:docs/data/publication_audit.json",
458
  "exists": true,
459
  "bytes": 7237,
460
- "sha256": "a95c93592ba70709b2fad24a911d19329e6823f25862cd4fcb256788190dd0f2"
461
  },
462
  "hf_model": {
463
  "path": "hf_model:metrics/publication_audit.json",
464
  "exists": true,
465
  "bytes": 7237,
466
- "sha256": "a95c93592ba70709b2fad24a911d19329e6823f25862cd4fcb256788190dd0f2"
467
  }
468
  },
469
  "failures": []
@@ -598,27 +598,27 @@
598
  "local": {
599
  "path": "repo:docs/data/research_roadmap.json",
600
  "exists": true,
601
- "bytes": 5758,
602
- "sha256": "54657eb8824416d2128d6e5710543bdaf9e41d7c2fa46dd14ad6b58fede3b5db"
603
  },
604
  "mirrors": {
605
  "hf_space": {
606
  "path": "hf_space:data/research_roadmap.json",
607
  "exists": true,
608
- "bytes": 5758,
609
- "sha256": "54657eb8824416d2128d6e5710543bdaf9e41d7c2fa46dd14ad6b58fede3b5db"
610
  },
611
  "hf_artifacts": {
612
  "path": "hf_artifacts:docs/data/research_roadmap.json",
613
  "exists": true,
614
- "bytes": 5758,
615
- "sha256": "54657eb8824416d2128d6e5710543bdaf9e41d7c2fa46dd14ad6b58fede3b5db"
616
  },
617
  "hf_model": {
618
  "path": "hf_model:metrics/research_roadmap.json",
619
  "exists": true,
620
- "bytes": 5758,
621
- "sha256": "54657eb8824416d2128d6e5710543bdaf9e41d7c2fa46dd14ad6b58fede3b5db"
622
  }
623
  },
624
  "failures": []
@@ -629,27 +629,27 @@
629
  "local": {
630
  "path": "repo:docs/data/research_roadmap_interactive.json",
631
  "exists": true,
632
- "bytes": 131519,
633
- "sha256": "004fbcc7a3582da88dd66504d686604ecb0f04f65c9c8166bb0583e0fc174274"
634
  },
635
  "mirrors": {
636
  "hf_space": {
637
  "path": "hf_space:data/research_roadmap_interactive.json",
638
  "exists": true,
639
- "bytes": 131519,
640
- "sha256": "004fbcc7a3582da88dd66504d686604ecb0f04f65c9c8166bb0583e0fc174274"
641
  },
642
  "hf_artifacts": {
643
  "path": "hf_artifacts:docs/data/research_roadmap_interactive.json",
644
  "exists": true,
645
- "bytes": 131519,
646
- "sha256": "004fbcc7a3582da88dd66504d686604ecb0f04f65c9c8166bb0583e0fc174274"
647
  },
648
  "hf_model": {
649
  "path": "hf_model:metrics/research_roadmap_interactive.json",
650
  "exists": true,
651
- "bytes": 131519,
652
- "sha256": "004fbcc7a3582da88dd66504d686604ecb0f04f65c9c8166bb0583e0fc174274"
653
  }
654
  },
655
  "failures": []
@@ -1692,21 +1692,21 @@
1692
  "local": {
1693
  "path": "repo:scripts/build_artifact_index.py",
1694
  "exists": true,
1695
- "bytes": 26568,
1696
- "sha256": "a611b399e858560f6afb41e121f033724753c5167d04e0d7bf243e569de88f04"
1697
  },
1698
  "mirrors": {
1699
  "hf_artifacts": {
1700
  "path": "hf_artifacts:scripts/build_artifact_index.py",
1701
  "exists": true,
1702
- "bytes": 26568,
1703
- "sha256": "a611b399e858560f6afb41e121f033724753c5167d04e0d7bf243e569de88f04"
1704
  },
1705
  "hf_model": {
1706
  "path": "hf_model:scripts/build_artifact_index.py",
1707
  "exists": true,
1708
- "bytes": 26568,
1709
- "sha256": "a611b399e858560f6afb41e121f033724753c5167d04e0d7bf243e569de88f04"
1710
  }
1711
  },
1712
  "failures": []
@@ -2017,21 +2017,21 @@
2017
  "local": {
2018
  "path": "repo:scripts/validate_publication_package.py",
2019
  "exists": true,
2020
- "bytes": 17125,
2021
- "sha256": "51febee7a4caa4e3cbb3833c0c13ac502bd7106fdb3df06e868ed00bc8f9fd9e"
2022
  },
2023
  "mirrors": {
2024
  "hf_artifacts": {
2025
  "path": "hf_artifacts:scripts/validate_publication_package.py",
2026
  "exists": true,
2027
- "bytes": 17125,
2028
- "sha256": "51febee7a4caa4e3cbb3833c0c13ac502bd7106fdb3df06e868ed00bc8f9fd9e"
2029
  },
2030
  "hf_model": {
2031
  "path": "hf_model:scripts/validate_publication_package.py",
2032
  "exists": true,
2033
- "bytes": 17125,
2034
- "sha256": "51febee7a4caa4e3cbb3833c0c13ac502bd7106fdb3df06e868ed00bc8f9fd9e"
2035
  }
2036
  },
2037
  "failures": []
@@ -2217,21 +2217,21 @@
2217
  "local": {
2218
  "path": "repo:docs/index.html",
2219
  "exists": true,
2220
- "bytes": 172286,
2221
- "sha256": "a736850416c0061adddbb6ced5897efd1add499ec26e510b6fe21a4945b341c8"
2222
  },
2223
  "mirrors": {
2224
  "hf_space": {
2225
  "path": "hf_space:index.html",
2226
  "exists": true,
2227
- "bytes": 172286,
2228
- "sha256": "a736850416c0061adddbb6ced5897efd1add499ec26e510b6fe21a4945b341c8"
2229
  },
2230
  "hf_artifacts_docs": {
2231
  "path": "hf_artifacts:docs/index.html",
2232
  "exists": true,
2233
- "bytes": 172286,
2234
- "sha256": "a736850416c0061adddbb6ced5897efd1add499ec26e510b6fe21a4945b341c8"
2235
  }
2236
  },
2237
  "failures": []
@@ -2242,21 +2242,21 @@
2242
  "local": {
2243
  "path": "repo:docs/research_roadmap.html",
2244
  "exists": true,
2245
- "bytes": 31554,
2246
- "sha256": "f51e83a4495f2d2012ec4c48191d66ca4456a00d7fcb335a427b7d86afc66109"
2247
  },
2248
  "mirrors": {
2249
  "hf_space": {
2250
  "path": "hf_space:research_roadmap.html",
2251
  "exists": true,
2252
- "bytes": 31554,
2253
- "sha256": "f51e83a4495f2d2012ec4c48191d66ca4456a00d7fcb335a427b7d86afc66109"
2254
  },
2255
  "hf_artifacts_docs": {
2256
  "path": "hf_artifacts:docs/research_roadmap.html",
2257
  "exists": true,
2258
- "bytes": 31554,
2259
- "sha256": "f51e83a4495f2d2012ec4c48191d66ca4456a00d7fcb335a427b7d86afc66109"
2260
  }
2261
  },
2262
  "failures": []
@@ -2844,27 +2844,27 @@
2844
  "local": {
2845
  "path": "repo:FOUNDATION_MODEL_PLAN.md",
2846
  "exists": true,
2847
- "bytes": 6559,
2848
- "sha256": "955be6559b554f1c6c4141dd6ca2818127d89585df3940c2bd9b975ad9047926"
2849
  },
2850
  "mirrors": {
2851
  "hf_space": {
2852
  "path": "hf_space:FOUNDATION_MODEL_PLAN.md",
2853
  "exists": true,
2854
- "bytes": 6559,
2855
- "sha256": "955be6559b554f1c6c4141dd6ca2818127d89585df3940c2bd9b975ad9047926"
2856
  },
2857
  "hf_artifacts": {
2858
  "path": "hf_artifacts:FOUNDATION_MODEL_PLAN.md",
2859
  "exists": true,
2860
- "bytes": 6559,
2861
- "sha256": "955be6559b554f1c6c4141dd6ca2818127d89585df3940c2bd9b975ad9047926"
2862
  },
2863
  "hf_model": {
2864
  "path": "hf_model:FOUNDATION_MODEL_PLAN.md",
2865
  "exists": true,
2866
- "bytes": 6559,
2867
- "sha256": "955be6559b554f1c6c4141dd6ca2818127d89585df3940c2bd9b975ad9047926"
2868
  }
2869
  },
2870
  "failures": []
@@ -2937,27 +2937,27 @@
2937
  "local": {
2938
  "path": "repo:RESEARCH_ROADMAP.md",
2939
  "exists": true,
2940
- "bytes": 6677,
2941
- "sha256": "58491bfb68ad3e6b7569bdb1a3cac3de7682a49beb9de368a114d58ebf0b118b"
2942
  },
2943
  "mirrors": {
2944
  "hf_space": {
2945
  "path": "hf_space:RESEARCH_ROADMAP.md",
2946
  "exists": true,
2947
- "bytes": 6677,
2948
- "sha256": "58491bfb68ad3e6b7569bdb1a3cac3de7682a49beb9de368a114d58ebf0b118b"
2949
  },
2950
  "hf_artifacts": {
2951
  "path": "hf_artifacts:RESEARCH_ROADMAP.md",
2952
  "exists": true,
2953
- "bytes": 6677,
2954
- "sha256": "58491bfb68ad3e6b7569bdb1a3cac3de7682a49beb9de368a114d58ebf0b118b"
2955
  },
2956
  "hf_model": {
2957
  "path": "hf_model:RESEARCH_ROADMAP.md",
2958
  "exists": true,
2959
- "bytes": 6677,
2960
- "sha256": "58491bfb68ad3e6b7569bdb1a3cac3de7682a49beb9de368a114d58ebf0b118b"
2961
  }
2962
  },
2963
  "failures": []
@@ -2968,27 +2968,27 @@
2968
  "local": {
2969
  "path": "repo:PROJECT_STATUS.md",
2970
  "exists": true,
2971
- "bytes": 6648,
2972
- "sha256": "b052c725472f1d59232918a4d5b0f3668534c1e25e24189307159f5a0157d58f"
2973
  },
2974
  "mirrors": {
2975
  "hf_space": {
2976
  "path": "hf_space:PROJECT_STATUS.md",
2977
  "exists": true,
2978
- "bytes": 6648,
2979
- "sha256": "b052c725472f1d59232918a4d5b0f3668534c1e25e24189307159f5a0157d58f"
2980
  },
2981
  "hf_artifacts": {
2982
  "path": "hf_artifacts:PROJECT_STATUS.md",
2983
  "exists": true,
2984
- "bytes": 6648,
2985
- "sha256": "b052c725472f1d59232918a4d5b0f3668534c1e25e24189307159f5a0157d58f"
2986
  },
2987
  "hf_model": {
2988
  "path": "hf_model:PROJECT_STATUS.md",
2989
  "exists": true,
2990
- "bytes": 6648,
2991
- "sha256": "b052c725472f1d59232918a4d5b0f3668534c1e25e24189307159f5a0157d58f"
2992
  }
2993
  },
2994
  "failures": []
 
1
  {
2
  "status": "pass",
3
+ "generated_at_utc": "2026-06-04T20:45:22+00:00",
4
  "hf_root": "hf_publish",
5
  "summary": {
6
  "group_count": 101,
 
71
  "local": {
72
  "path": "repo:docs/data/artifact_index.json",
73
  "exists": true,
74
+ "bytes": 32864,
75
+ "sha256": "ec7d17898c42fd76109567c201f9638059b6a9a11a48817b32677a0eb2662178"
76
  },
77
  "mirrors": {
78
  "hf_space": {
79
  "path": "hf_space:data/artifact_index.json",
80
  "exists": true,
81
+ "bytes": 32864,
82
+ "sha256": "ec7d17898c42fd76109567c201f9638059b6a9a11a48817b32677a0eb2662178"
83
  },
84
  "hf_artifacts": {
85
  "path": "hf_artifacts:docs/data/artifact_index.json",
86
  "exists": true,
87
+ "bytes": 32864,
88
+ "sha256": "ec7d17898c42fd76109567c201f9638059b6a9a11a48817b32677a0eb2662178"
89
  },
90
  "hf_model": {
91
  "path": "hf_model:metrics/artifact_index.json",
92
  "exists": true,
93
+ "bytes": 32864,
94
+ "sha256": "ec7d17898c42fd76109567c201f9638059b6a9a11a48817b32677a0eb2662178"
95
  }
96
  },
97
  "failures": []
 
226
  "local": {
227
  "path": "repo:docs/data/foundation_model_plan.json",
228
  "exists": true,
229
+ "bytes": 12981,
230
+ "sha256": "9cce52025a2e2f8afb4660e2af3353aea6ad0a1af380849218dd74c0acc271bb"
231
  },
232
  "mirrors": {
233
  "hf_space": {
234
  "path": "hf_space:data/foundation_model_plan.json",
235
  "exists": true,
236
+ "bytes": 12981,
237
+ "sha256": "9cce52025a2e2f8afb4660e2af3353aea6ad0a1af380849218dd74c0acc271bb"
238
  },
239
  "hf_artifacts": {
240
  "path": "hf_artifacts:docs/data/foundation_model_plan.json",
241
  "exists": true,
242
+ "bytes": 12981,
243
+ "sha256": "9cce52025a2e2f8afb4660e2af3353aea6ad0a1af380849218dd74c0acc271bb"
244
  },
245
  "hf_model": {
246
  "path": "hf_model:metrics/foundation_model_plan.json",
247
  "exists": true,
248
+ "bytes": 12981,
249
+ "sha256": "9cce52025a2e2f8afb4660e2af3353aea6ad0a1af380849218dd74c0acc271bb"
250
  }
251
  },
252
  "failures": []
 
412
  "local": {
413
  "path": "repo:docs/data/project_status.json",
414
  "exists": true,
415
+ "bytes": 9874,
416
+ "sha256": "600c95726eae3404127a8b2110f35468ff2ba02943cae0fbcd3ea43c66109d3e"
417
  },
418
  "mirrors": {
419
  "hf_space": {
420
  "path": "hf_space:data/project_status.json",
421
  "exists": true,
422
+ "bytes": 9874,
423
+ "sha256": "600c95726eae3404127a8b2110f35468ff2ba02943cae0fbcd3ea43c66109d3e"
424
  },
425
  "hf_artifacts": {
426
  "path": "hf_artifacts:docs/data/project_status.json",
427
  "exists": true,
428
+ "bytes": 9874,
429
+ "sha256": "600c95726eae3404127a8b2110f35468ff2ba02943cae0fbcd3ea43c66109d3e"
430
  },
431
  "hf_model": {
432
  "path": "hf_model:metrics/project_status.json",
433
  "exists": true,
434
+ "bytes": 9874,
435
+ "sha256": "600c95726eae3404127a8b2110f35468ff2ba02943cae0fbcd3ea43c66109d3e"
436
  }
437
  },
438
  "failures": []
 
444
  "path": "repo:docs/data/publication_audit.json",
445
  "exists": true,
446
  "bytes": 7237,
447
+ "sha256": "7fbb19f8990b1a4d902e282c010d27e4391755564fa68af97d96c298c6b054f8"
448
  },
449
  "mirrors": {
450
  "hf_space": {
451
  "path": "hf_space:data/publication_audit.json",
452
  "exists": true,
453
  "bytes": 7237,
454
+ "sha256": "7fbb19f8990b1a4d902e282c010d27e4391755564fa68af97d96c298c6b054f8"
455
  },
456
  "hf_artifacts": {
457
  "path": "hf_artifacts:docs/data/publication_audit.json",
458
  "exists": true,
459
  "bytes": 7237,
460
+ "sha256": "7fbb19f8990b1a4d902e282c010d27e4391755564fa68af97d96c298c6b054f8"
461
  },
462
  "hf_model": {
463
  "path": "hf_model:metrics/publication_audit.json",
464
  "exists": true,
465
  "bytes": 7237,
466
+ "sha256": "7fbb19f8990b1a4d902e282c010d27e4391755564fa68af97d96c298c6b054f8"
467
  }
468
  },
469
  "failures": []
 
598
  "local": {
599
  "path": "repo:docs/data/research_roadmap.json",
600
  "exists": true,
601
+ "bytes": 7161,
602
+ "sha256": "cc96118c2c05108c831616151bc027441f7545495adeeb6a4a6a6bffe8da7801"
603
  },
604
  "mirrors": {
605
  "hf_space": {
606
  "path": "hf_space:data/research_roadmap.json",
607
  "exists": true,
608
+ "bytes": 7161,
609
+ "sha256": "cc96118c2c05108c831616151bc027441f7545495adeeb6a4a6a6bffe8da7801"
610
  },
611
  "hf_artifacts": {
612
  "path": "hf_artifacts:docs/data/research_roadmap.json",
613
  "exists": true,
614
+ "bytes": 7161,
615
+ "sha256": "cc96118c2c05108c831616151bc027441f7545495adeeb6a4a6a6bffe8da7801"
616
  },
617
  "hf_model": {
618
  "path": "hf_model:metrics/research_roadmap.json",
619
  "exists": true,
620
+ "bytes": 7161,
621
+ "sha256": "cc96118c2c05108c831616151bc027441f7545495adeeb6a4a6a6bffe8da7801"
622
  }
623
  },
624
  "failures": []
 
629
  "local": {
630
  "path": "repo:docs/data/research_roadmap_interactive.json",
631
  "exists": true,
632
+ "bytes": 134282,
633
+ "sha256": "ff37219a9f1d9b386a9d4c42766e4aa28f10ce6ef338dceeedd6bdb4a1b2c40a"
634
  },
635
  "mirrors": {
636
  "hf_space": {
637
  "path": "hf_space:data/research_roadmap_interactive.json",
638
  "exists": true,
639
+ "bytes": 134282,
640
+ "sha256": "ff37219a9f1d9b386a9d4c42766e4aa28f10ce6ef338dceeedd6bdb4a1b2c40a"
641
  },
642
  "hf_artifacts": {
643
  "path": "hf_artifacts:docs/data/research_roadmap_interactive.json",
644
  "exists": true,
645
+ "bytes": 134282,
646
+ "sha256": "ff37219a9f1d9b386a9d4c42766e4aa28f10ce6ef338dceeedd6bdb4a1b2c40a"
647
  },
648
  "hf_model": {
649
  "path": "hf_model:metrics/research_roadmap_interactive.json",
650
  "exists": true,
651
+ "bytes": 134282,
652
+ "sha256": "ff37219a9f1d9b386a9d4c42766e4aa28f10ce6ef338dceeedd6bdb4a1b2c40a"
653
  }
654
  },
655
  "failures": []
 
1692
  "local": {
1693
  "path": "repo:scripts/build_artifact_index.py",
1694
  "exists": true,
1695
+ "bytes": 27020,
1696
+ "sha256": "0ca7ed96f24caecbab31687cffa99f0eba8471258986412a294614e688c5aff5"
1697
  },
1698
  "mirrors": {
1699
  "hf_artifacts": {
1700
  "path": "hf_artifacts:scripts/build_artifact_index.py",
1701
  "exists": true,
1702
+ "bytes": 27020,
1703
+ "sha256": "0ca7ed96f24caecbab31687cffa99f0eba8471258986412a294614e688c5aff5"
1704
  },
1705
  "hf_model": {
1706
  "path": "hf_model:scripts/build_artifact_index.py",
1707
  "exists": true,
1708
+ "bytes": 27020,
1709
+ "sha256": "0ca7ed96f24caecbab31687cffa99f0eba8471258986412a294614e688c5aff5"
1710
  }
1711
  },
1712
  "failures": []
 
2017
  "local": {
2018
  "path": "repo:scripts/validate_publication_package.py",
2019
  "exists": true,
2020
+ "bytes": 17197,
2021
+ "sha256": "2a617f3204ffb8c59d1c5bc1828b4441a4d014bb531655fd0613e128a6d9abc2"
2022
  },
2023
  "mirrors": {
2024
  "hf_artifacts": {
2025
  "path": "hf_artifacts:scripts/validate_publication_package.py",
2026
  "exists": true,
2027
+ "bytes": 17197,
2028
+ "sha256": "2a617f3204ffb8c59d1c5bc1828b4441a4d014bb531655fd0613e128a6d9abc2"
2029
  },
2030
  "hf_model": {
2031
  "path": "hf_model:scripts/validate_publication_package.py",
2032
  "exists": true,
2033
+ "bytes": 17197,
2034
+ "sha256": "2a617f3204ffb8c59d1c5bc1828b4441a4d014bb531655fd0613e128a6d9abc2"
2035
  }
2036
  },
2037
  "failures": []
 
2217
  "local": {
2218
  "path": "repo:docs/index.html",
2219
  "exists": true,
2220
+ "bytes": 174923,
2221
+ "sha256": "099fcc01cbb4d50f62c508b10f343f05b1c883962b85bda294bcede99af2a0f1"
2222
  },
2223
  "mirrors": {
2224
  "hf_space": {
2225
  "path": "hf_space:index.html",
2226
  "exists": true,
2227
+ "bytes": 174923,
2228
+ "sha256": "099fcc01cbb4d50f62c508b10f343f05b1c883962b85bda294bcede99af2a0f1"
2229
  },
2230
  "hf_artifacts_docs": {
2231
  "path": "hf_artifacts:docs/index.html",
2232
  "exists": true,
2233
+ "bytes": 174923,
2234
+ "sha256": "099fcc01cbb4d50f62c508b10f343f05b1c883962b85bda294bcede99af2a0f1"
2235
  }
2236
  },
2237
  "failures": []
 
2242
  "local": {
2243
  "path": "repo:docs/research_roadmap.html",
2244
  "exists": true,
2245
+ "bytes": 31702,
2246
+ "sha256": "1b20a5cc342b3ba59ad808eed9f5bf978e2d9ac438c88b5c3eeba01f4e14b883"
2247
  },
2248
  "mirrors": {
2249
  "hf_space": {
2250
  "path": "hf_space:research_roadmap.html",
2251
  "exists": true,
2252
+ "bytes": 31702,
2253
+ "sha256": "1b20a5cc342b3ba59ad808eed9f5bf978e2d9ac438c88b5c3eeba01f4e14b883"
2254
  },
2255
  "hf_artifacts_docs": {
2256
  "path": "hf_artifacts:docs/research_roadmap.html",
2257
  "exists": true,
2258
+ "bytes": 31702,
2259
+ "sha256": "1b20a5cc342b3ba59ad808eed9f5bf978e2d9ac438c88b5c3eeba01f4e14b883"
2260
  }
2261
  },
2262
  "failures": []
 
2844
  "local": {
2845
  "path": "repo:FOUNDATION_MODEL_PLAN.md",
2846
  "exists": true,
2847
+ "bytes": 9075,
2848
+ "sha256": "444d13ab556d2e16a199a7fca191b87c85ab8685d167aab357bc6341839299a2"
2849
  },
2850
  "mirrors": {
2851
  "hf_space": {
2852
  "path": "hf_space:FOUNDATION_MODEL_PLAN.md",
2853
  "exists": true,
2854
+ "bytes": 9075,
2855
+ "sha256": "444d13ab556d2e16a199a7fca191b87c85ab8685d167aab357bc6341839299a2"
2856
  },
2857
  "hf_artifacts": {
2858
  "path": "hf_artifacts:FOUNDATION_MODEL_PLAN.md",
2859
  "exists": true,
2860
+ "bytes": 9075,
2861
+ "sha256": "444d13ab556d2e16a199a7fca191b87c85ab8685d167aab357bc6341839299a2"
2862
  },
2863
  "hf_model": {
2864
  "path": "hf_model:FOUNDATION_MODEL_PLAN.md",
2865
  "exists": true,
2866
+ "bytes": 9075,
2867
+ "sha256": "444d13ab556d2e16a199a7fca191b87c85ab8685d167aab357bc6341839299a2"
2868
  }
2869
  },
2870
  "failures": []
 
2937
  "local": {
2938
  "path": "repo:RESEARCH_ROADMAP.md",
2939
  "exists": true,
2940
+ "bytes": 8388,
2941
+ "sha256": "0b3e3356076998ad94dc39f708cc783a4ebeab76c9da661cdd37ea12a3bb3665"
2942
  },
2943
  "mirrors": {
2944
  "hf_space": {
2945
  "path": "hf_space:RESEARCH_ROADMAP.md",
2946
  "exists": true,
2947
+ "bytes": 8388,
2948
+ "sha256": "0b3e3356076998ad94dc39f708cc783a4ebeab76c9da661cdd37ea12a3bb3665"
2949
  },
2950
  "hf_artifacts": {
2951
  "path": "hf_artifacts:RESEARCH_ROADMAP.md",
2952
  "exists": true,
2953
+ "bytes": 8388,
2954
+ "sha256": "0b3e3356076998ad94dc39f708cc783a4ebeab76c9da661cdd37ea12a3bb3665"
2955
  },
2956
  "hf_model": {
2957
  "path": "hf_model:RESEARCH_ROADMAP.md",
2958
  "exists": true,
2959
+ "bytes": 8388,
2960
+ "sha256": "0b3e3356076998ad94dc39f708cc783a4ebeab76c9da661cdd37ea12a3bb3665"
2961
  }
2962
  },
2963
  "failures": []
 
2968
  "local": {
2969
  "path": "repo:PROJECT_STATUS.md",
2970
  "exists": true,
2971
+ "bytes": 7207,
2972
+ "sha256": "7baaba976ccc254da1a03ee2653057d1e08f3fb0c0cad035886c362442828720"
2973
  },
2974
  "mirrors": {
2975
  "hf_space": {
2976
  "path": "hf_space:PROJECT_STATUS.md",
2977
  "exists": true,
2978
+ "bytes": 7207,
2979
+ "sha256": "7baaba976ccc254da1a03ee2653057d1e08f3fb0c0cad035886c362442828720"
2980
  },
2981
  "hf_artifacts": {
2982
  "path": "hf_artifacts:PROJECT_STATUS.md",
2983
  "exists": true,
2984
+ "bytes": 7207,
2985
+ "sha256": "7baaba976ccc254da1a03ee2653057d1e08f3fb0c0cad035886c362442828720"
2986
  },
2987
  "hf_model": {
2988
  "path": "hf_model:PROJECT_STATUS.md",
2989
  "exists": true,
2990
+ "bytes": 7207,
2991
+ "sha256": "7baaba976ccc254da1a03ee2653057d1e08f3fb0c0cad035886c362442828720"
2992
  }
2993
  },
2994
  "failures": []
data/project_status.json CHANGED
@@ -82,7 +82,7 @@
82
  "RESEARCH_ROADMAP.md",
83
  "docs/data/research_roadmap.json"
84
  ],
85
- "readout": "The roadmap connects public-sample task development to 128-episode data preparation, Qwen3-Omni LoRA, foundation-model selection, robustness runs, and larger omni/world-model extensions."
86
  },
87
  {
88
  "area": "Foundation-model plan",
@@ -93,6 +93,14 @@
93
  ],
94
  "readout": "Qwen3-Omni remains the first trainable held-out LoRA baseline; Cosmos 3 is added as the first world-model/action-generation branch; OpenVLA/openpi/GR00T are policy candidates after action targets are explicit."
95
  },
96
  {
97
  "area": "Official dataset wording",
98
  "status": "verified",
@@ -167,6 +175,7 @@
167
  "Inspect RESEARCH_TAKEAWAYS.md and docs/data/research_takeaways.json before interpreting model scores.",
168
  "Inspect RESEARCH_ROADMAP.md and docs/data/research_roadmap.json for the path from public-sample task work to multi-episode modeling.",
169
  "Inspect FOUNDATION_MODEL_PLAN.md and docs/data/foundation_model_plan.json before choosing a backbone branch.",
170
  "Inspect docs/data/summary_metrics.json and results/episode_task_suite/neural_mlp/ to check the 12-task outputs.",
171
  "Inspect results/audio_ablation/AUDIO_ABLATION_SUMMARY.md before judging whether audio helps the current task suite.",
172
  "Inspect EVALUATION_PROTOCOL.md before judging task metrics or leakage controls.",
@@ -180,6 +189,7 @@
180
  "The current reconstruction task reconstructs feature vectors, not pixel-depth, mesh, NeRF, or Gaussian reconstruction.",
181
  "Audio is one of the synchronized source modalities in the current task representation.",
182
  "The audio ablation report compares audio/no-audio variants across all 12 task contracts in results/audio_ablation/.",
183
- "Foundation-model selection is explicit: Qwen3-Omni is the immediate trainable pilot, Cosmos 3 is the first world-model branch, and policy models such as OpenVLA/openpi/GR00T wait for action-target conversion."
184
  ]
185
  }
 
82
  "RESEARCH_ROADMAP.md",
83
  "docs/data/research_roadmap.json"
84
  ],
85
+ "readout": "The roadmap connects public-sample task development to 128-episode data preparation, Qwen3-Omni LoRA, foundation-model selection, robustness runs, world/policy branches, and the future Xperience-native pretraining goal."
86
  },
87
  {
88
  "area": "Foundation-model plan",
 
93
  ],
94
  "readout": "Qwen3-Omni remains the first trainable held-out LoRA baseline; Cosmos 3 is added as the first world-model/action-generation branch; OpenVLA/openpi/GR00T are policy candidates after action targets are explicit."
95
  },
96
+ {
97
+ "area": "Xperience Embodied Foundation Model",
98
+ "status": "future_goal",
99
+ "evidence": [
100
+ "XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md"
101
+ ],
102
+ "readout": "A future full-corpus pretraining plan describes target modules, objectives, staged scale-up, hardware ranges, and evaluation for a domain-specific embodied foundation model."
103
+ },
104
  {
105
  "area": "Official dataset wording",
106
  "status": "verified",
 
175
  "Inspect RESEARCH_TAKEAWAYS.md and docs/data/research_takeaways.json before interpreting model scores.",
176
  "Inspect RESEARCH_ROADMAP.md and docs/data/research_roadmap.json for the path from public-sample task work to multi-episode modeling.",
177
  "Inspect FOUNDATION_MODEL_PLAN.md and docs/data/foundation_model_plan.json before choosing a backbone branch.",
178
+ "Inspect XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md for the long-term full-corpus pretraining goal.",
179
  "Inspect docs/data/summary_metrics.json and results/episode_task_suite/neural_mlp/ to check the 12-task outputs.",
180
  "Inspect results/audio_ablation/AUDIO_ABLATION_SUMMARY.md before judging whether audio helps the current task suite.",
181
  "Inspect EVALUATION_PROTOCOL.md before judging task metrics or leakage controls.",
 
189
  "The current reconstruction task reconstructs feature vectors, not pixel-depth, mesh, NeRF, or Gaussian reconstruction.",
190
  "Audio is one of the synchronized source modalities in the current task representation.",
191
  "The audio ablation report compares audio/no-audio variants across all 12 task contracts in results/audio_ablation/.",
192
+ "Foundation-model selection is explicit: Qwen3-Omni is the immediate trainable pilot, Cosmos 3 is the first world-model branch, and policy models such as OpenVLA/openpi/GR00T wait for action-target conversion.",
193
+ "The Xperience Embodied Foundation Model is a future native-pretraining goal, not a completed model or current benchmark."
194
  ]
195
  }
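
The status entries in this diff share four keys: `area`, `status`, `evidence`, and `readout`. The following is a minimal sketch, not part of the commit, of how a reader might sanity-check one such entry. The helper name `check_status_entry`, the `repo_root` parameter, and the idea of resolving each `evidence` path against a local checkout are assumptions made for illustration; the repository's own validation scripts may work differently.

```python
import json
from pathlib import Path

# Keys observed on the status entries shown in this diff.
REQUIRED_KEYS = ("area", "status", "evidence", "readout")


def check_status_entry(entry, repo_root="."):
    """Return a list of problems found in one status entry (empty if none).

    Hypothetical helper: reports missing keys and evidence paths that do
    not exist under repo_root.
    """
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in entry]
    for rel_path in entry.get("evidence", []):
        if not (Path(repo_root) / rel_path).exists():
            problems.append(f"evidence not found: {rel_path}")
    return problems


# The entry added by this commit, copied from the diff above.
entry = json.loads("""
{
  "area": "Xperience Embodied Foundation Model",
  "status": "future_goal",
  "evidence": ["XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md"],
  "readout": "A future full-corpus pretraining plan describes target modules, objectives, staged scale-up, hardware ranges, and evaluation for a domain-specific embodied foundation model."
}
""")

# Run from the root of a checkout; an empty list means nothing was flagged.
print(check_status_entry(entry))
```

Run from a checkout that contains the new plan file, the call is expected to report no problems. Run elsewhere, it lists the evidence path as not found, which is the kind of drift such a check is meant to surface.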
data/publication_audit.json CHANGED
@@ -1,6 +1,6 @@
1
  {
2
  "status": "pass",
3
- "generated_at_utc": "2026-06-04T18:32:51+00:00",
4
  "checks": [
5
  {
6
  "name": "required_publication_assets_present",
@@ -182,8 +182,8 @@
182
  "github_repo": {
183
  "root": "repo",
184
  "exists": true,
185
- "file_count": 386,
186
- "text_file_count": 320,
187
  "largest_file": {
188
  "path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
189
  "bytes": 55702978
@@ -193,8 +193,8 @@
193
  "hf_space_bundle": {
194
  "root": "hf_publish/space",
195
  "exists": true,
196
- "file_count": 316,
197
- "text_file_count": 250,
198
  "largest_file": {
199
  "path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
200
  "bytes": 55702978
@@ -204,8 +204,8 @@
204
  "hf_artifact_bundle": {
205
  "root": "hf_publish/artifacts",
206
  "exists": true,
207
- "file_count": 417,
208
- "text_file_count": 329,
209
  "largest_file": {
210
  "path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
211
  "bytes": 55702978
@@ -215,8 +215,8 @@
215
  "hf_model_bundle": {
216
  "root": "hf_publish/model",
217
  "exists": true,
218
- "file_count": 643,
219
- "text_file_count": 518,
220
  "largest_file": {
221
  "path": "pytorch_model.bin",
222
  "bytes": 93495480
 
1
  {
2
  "status": "pass",
3
+ "generated_at_utc": "2026-06-04T20:43:37+00:00",
4
  "checks": [
5
  {
6
  "name": "required_publication_assets_present",
 
182
  "github_repo": {
183
  "root": "repo",
184
  "exists": true,
185
+ "file_count": 396,
186
+ "text_file_count": 330,
187
  "largest_file": {
188
  "path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
189
  "bytes": 55702978
 
193
  "hf_space_bundle": {
194
  "root": "hf_publish/space",
195
  "exists": true,
196
+ "file_count": 317,
197
+ "text_file_count": 251,
198
  "largest_file": {
199
  "path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
200
  "bytes": 55702978
 
204
  "hf_artifact_bundle": {
205
  "root": "hf_publish/artifacts",
206
  "exists": true,
207
+ "file_count": 418,
208
+ "text_file_count": 330,
209
  "largest_file": {
210
  "path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
211
  "bytes": 55702978
 
215
  "hf_model_bundle": {
216
  "root": "hf_publish/model",
217
  "exists": true,
218
+ "file_count": 644,
219
+ "text_file_count": 519,
220
  "largest_file": {
221
  "path": "pytorch_model.bin",
222
  "bytes": 93495480
data/research_roadmap.json CHANGED
@@ -1,7 +1,7 @@
1
  {
2
  "title": "Ropedia Xperience-10M Research Roadmap",
3
- "summary": "Staged path from the public-sample task lab to multi-episode held-out evaluation, foundation-model selection, and larger omni/world-model extensions.",
4
- "current_decision_point": "Keep the public-sample task suite as the development harness, prepare the selected official Xperience-10M episodes for the held-out Qwen3-Omni pilot, then branch into Cosmos 3 world modeling and policy-model experiments after the data preparation path is stable.",
5
  "phases": [
6
  {
7
  "id": "public_sample_task_lab",
@@ -126,6 +126,30 @@
126
  "updated model cards"
127
  ],
128
  "reader_takeaway": "The long-term direction is richer multimodal representation learning for embodied-AI reasoning, with model branches chosen by task fit rather than by a single default backbone."
129
  }
130
  ],
131
  "public_surfaces_to_update": [
@@ -134,6 +158,7 @@
134
  "RESEARCH_TAKEAWAYS.md",
135
  "EVALUATION_PROTOCOL.md",
136
  "ARTIFACT_GUIDE.md",
137
  "docs/index.html",
138
  "docs/data/research_roadmap.json",
139
  "Hugging Face Space card",
 
1
  {
2
  "title": "Ropedia Xperience-10M Research Roadmap",
3
+ "summary": "Staged path from the public-sample task lab to multi-episode held-out evaluation, foundation-model selection, world/policy branches, and a future Xperience-native embodied foundation model.",
4
+ "current_decision_point": "Keep the public-sample task suite as the development harness, prepare the selected official Xperience-10M episodes for the held-out Qwen3-Omni pilot, then branch into Cosmos 3 world modeling and policy-model experiments after the data preparation path is stable. The Xperience Embodied Foundation Model is a later full-corpus pretraining goal, not a current result.",
5
  "phases": [
6
  {
7
  "id": "public_sample_task_lab",
 
126
  "updated model cards"
127
  ],
128
  "reader_takeaway": "The long-term direction is richer multimodal representation learning for embodied-AI reasoning, with model branches chosen by task fit rather than by a single default backbone."
129
+ },
130
+ {
131
+ "id": "xperience_embodied_foundation_pretraining",
132
+ "name": "Xperience Embodied Foundation Model Pretraining",
133
+ "status": "future",
134
+ "entry_condition": "Full-corpus access, PB-scale storage path, high-throughput data loading, multi-node compute, and positive scaling evidence from smaller multi-episode runs.",
135
+ "deliverables": [
136
+ "full-corpus episode and split manifests",
137
+ "pretraining shard and provenance manifests",
138
+ "0.3B-1B and 1B-3B scaling pilots",
139
+ "3B-7B Xperience-native domain model target",
140
+ "held-out episode/session/activity/object evaluations",
141
+ "missing-modality robustness report",
142
+ "model card and data-boundary report"
143
+ ],
144
+ "completion_evidence": [
145
+ "pretraining metadata",
146
+ "checkpoint inventory",
147
+ "scaling curves",
148
+ "held-out evaluation reports",
149
+ "qualitative retrieval or future-state examples",
150
+ "safety and data-boundary report"
151
+ ],
152
+ "reader_takeaway": "The final research direction is a domain-specific embodied foundation model trained directly on Xperience-10M, after smaller pilots justify the cost and infrastructure."
153
  }
154
  ],
155
  "public_surfaces_to_update": [
 
158
  "RESEARCH_TAKEAWAYS.md",
159
  "EVALUATION_PROTOCOL.md",
160
  "ARTIFACT_GUIDE.md",
161
+ "XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md",
162
  "docs/index.html",
163
  "docs/data/research_roadmap.json",
164
  "Hugging Face Space card",
data/research_roadmap_interactive.json CHANGED
@@ -1837,7 +1837,8 @@
1837
  "NVIDIA GR00T"
1838
  ],
1839
  "first_world_model_branch": "Cosmos 3",
1840
- "immediate_trainable_backbone": "Qwen3-Omni"
1841
  },
1842
  "evaluation_additions": [
1843
  {
@@ -1921,6 +1922,11 @@
1921
  "action": "Publish branch results only with real manifests, predictions, metrics, and qualitative examples.",
1922
  "name": "Publishing threshold",
1923
  "step": 6
1924
  }
1925
  ],
1926
  "model_families": [
@@ -2023,6 +2029,21 @@
2023
  "Useful after action target design.",
2024
  "Less directly omni-modal than Qwen3-Omni or Cosmos 3."
2025
  ]
2026
  }
2027
  ],
2028
  "source_links": [
@@ -2057,11 +2078,15 @@
2057
  {
2058
  "label": "LeRobot / SmolVLA",
2059
  "url": "https://github.com/huggingface/lerobot"
2060
  }
2061
  ],
2062
  "status": "planning_artifact"
2063
  },
2064
- "generated_at_utc": "2026-06-04T16:42:13+00:00",
2065
  "omni_plan": {
2066
  "adapter": "LoRA rank 16, alpha 32, dropout 0.05",
2067
  "backbone": "Qwen/Qwen3-Omni-30B-A3B-Instruct",
@@ -2208,6 +2233,31 @@
2208
  "reader_takeaway": "The long-term direction is richer multimodal representation learning for embodied-AI reasoning, with model branches chosen by task fit rather than by a single default backbone.",
2209
  "stage": "future",
2210
  "status": "planned"
2211
  }
2212
  ],
2213
  "scale_up": {
 
1837
  "NVIDIA GR00T"
1838
  ],
1839
  "first_world_model_branch": "Cosmos 3",
1840
+ "immediate_trainable_backbone": "Qwen3-Omni",
1841
+ "long_term_native_pretraining_goal": "Xperience Embodied Foundation Model"
1842
  },
1843
  "evaluation_additions": [
1844
  {
 
1922
  "action": "Publish branch results only with real manifests, predictions, metrics, and qualitative examples.",
1923
  "name": "Publishing threshold",
1924
  "step": 6
1925
+ },
1926
+ {
1927
+ "action": "Start a from-scratch Xperience Embodied Foundation Model only after smaller scaling stages, full-corpus storage, multi-node compute, and held-out evaluation protocols are in place.",
1928
+ "name": "Xperience-native pretraining",
1929
+ "step": 7
1930
  }
1931
  ],
1932
  "model_families": [
 
2029
  "Useful after action target design.",
2030
  "Less directly omni-modal than Qwen3-Omni or Cosmos 3."
2031
  ]
2032
+ },
2033
+ {
2034
+ "best_role": "Domain model over synchronized embodied experience.",
2035
+ "category": "xperience_native_pretraining_goal",
2036
+ "current_decision": "future_goal_after_scaling_evidence",
2037
+ "entry_condition": "Full-corpus data path, PB-scale storage, multi-node compute, and positive smaller-run scaling evidence.",
2038
+ "family": "Xperience Embodied Foundation Model",
2039
+ "openness": "future project-specific model if full-corpus access and compute exist",
2040
+ "priority": 8,
2041
+ "public_source": "XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md",
2042
+ "xperience10m_fit": [
2043
+ "Uses the full aligned modality stack rather than treating sensors as auxiliary metadata.",
2044
+ "Targets temporal embodied representation learning across perception, motion, geometry, audio, and language.",
2045
+ "Can become the shared pretraining backbone for Qwen-style instruction tasks, Cosmos-style world modeling, and policy/action branches."
2046
+ ]
2047
  }
2048
  ],
2049
  "source_links": [
 
2078
  {
2079
  "label": "LeRobot / SmolVLA",
2080
  "url": "https://github.com/huggingface/lerobot"
2081
+ },
2082
+ {
2083
+ "label": "Xperience Embodied Foundation Model pretraining plan",
2084
+ "url": "XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md"
2085
  }
2086
  ],
2087
  "status": "planning_artifact"
2088
  },
2089
+ "generated_at_utc": "2026-06-04T20:40:29+00:00",
2090
  "omni_plan": {
2091
  "adapter": "LoRA rank 16, alpha 32, dropout 0.05",
2092
  "backbone": "Qwen/Qwen3-Omni-30B-A3B-Instruct",
 
2233
  "reader_takeaway": "The long-term direction is richer multimodal representation learning for embodied-AI reasoning, with model branches chosen by task fit rather than by a single default backbone.",
2234
  "stage": "future",
2235
  "status": "planned"
2236
+ },
2237
+ {
2238
+ "completion_evidence": [
2239
+ "pretraining metadata",
2240
+ "checkpoint inventory",
2241
+ "scaling curves",
2242
+ "held-out evaluation reports",
2243
+ "qualitative retrieval or future-state examples",
2244
+ "safety and data-boundary report"
2245
+ ],
2246
+ "deliverables": [
2247
+ "full-corpus episode and split manifests",
2248
+ "pretraining shard and provenance manifests",
2249
+ "0.3B-1B and 1B-3B scaling pilots",
2250
+ "3B-7B Xperience-native domain model target",
2251
+ "held-out episode/session/activity/object evaluations",
2252
+ "missing-modality robustness report",
2253
+ "model card and data-boundary report"
2254
+ ],
2255
+ "entry_condition": "Full-corpus access, PB-scale storage path, high-throughput data loading, multi-node compute, and positive scaling evidence from smaller multi-episode runs.",
2256
+ "id": "xperience_embodied_foundation_pretraining",
2257
+ "name": "Xperience Embodied Foundation Model Pretraining",
2258
+ "reader_takeaway": "The final research direction is a domain-specific embodied foundation model trained directly on Xperience-10M, after smaller pilots justify the cost and infrastructure.",
2259
+ "stage": "future",
2260
+ "status": "future"
2261
  }
2262
  ],
2263
  "scale_up": {
docs/data/artifact_index.json CHANGED
@@ -1,11 +1,11 @@
1
  {
2
  "title": "Ropedia Xperience-10M Task Suite Artifact Index",
3
- "generated_at_utc": "2026-06-04T16:42:13+00:00",
4
  "status": "pass",
5
- "artifact_count": 72,
6
  "missing": [],
7
  "by_kind": {
8
- "project_path": 11,
9
  "project_scope": 1,
10
  "source_alignment": 5,
11
  "publication_workflow": 1,
@@ -62,8 +62,8 @@
62
  "surface": "repo_hf",
63
  "shows": "Gives a compact current-state table for first-pass readers.",
64
  "exists": true,
65
- "bytes": 7138,
66
- "sha256": "67d85a198ee90082e47d790bd0f4d9dafbc97625cd39b17cc94b9785ec25104a"
67
  },
68
  {
69
  "id": "project_status_json",
@@ -73,8 +73,8 @@
73
  "surface": "website_hf",
74
  "shows": "Machine-readable copy of the current project status for website and HF mirrors.",
75
  "exists": true,
76
- "bytes": 9169,
77
- "sha256": "50d3c87b774c8375dcb897bd363d25e392e5fd6571571c41d56e623df15063f8"
78
  },
79
  {
80
  "id": "research_roadmap",
@@ -84,8 +84,8 @@
84
  "surface": "repo_hf",
85
  "shows": "Defines the path from public-sample task development to multi-episode held-out evaluation and larger omni-model extensions.",
86
  "exists": true,
87
- "bytes": 6677,
88
- "sha256": "58491bfb68ad3e6b7569bdb1a3cac3de7682a49beb9de368a114d58ebf0b118b"
89
  },
90
  {
91
  "id": "research_roadmap_json",
@@ -95,8 +95,8 @@
95
  "surface": "website_hf",
96
  "shows": "Machine-readable research roadmap for the website and Hugging Face mirrors.",
97
  "exists": true,
98
- "bytes": 5758,
99
- "sha256": "54657eb8824416d2128d6e5710543bdaf9e41d7c2fa46dd14ad6b58fede3b5db"
100
  },
101
  {
102
  "id": "foundation_model_plan",
@@ -106,8 +106,8 @@
106
  "surface": "repo_hf",
107
  "shows": "Defines the post-data-gate backbone choices: Qwen3-Omni first, Cosmos 3 for world modeling, and VLA/policy models after action-target conversion.",
108
  "exists": true,
109
- "bytes": 6559,
110
- "sha256": "955be6559b554f1c6c4141dd6ca2818127d89585df3940c2bd9b975ad9047926"
111
  },
112
  {
113
  "id": "foundation_model_plan_json",
@@ -117,8 +117,19 @@
117
  "surface": "website_hf",
118
  "shows": "Machine-readable foundation-model selection matrix with source links, entry conditions, and evaluation additions.",
119
  "exists": true,
120
- "bytes": 8889,
121
- "sha256": "e9b11114fa290253000b921575586780ccc3ba17665235259d4326c524f6ce97"
122
  },
123
  {
124
  "id": "evidence_contract",
@@ -150,8 +161,8 @@
150
  "surface": "repo_hf",
151
  "shows": "Gives the human-readable map from project scope to data, tasks, platform mirrors, and scale-up status.",
152
  "exists": true,
153
- "bytes": 16890,
154
- "sha256": "8bce9a773daf36214e377a7154b72a4493efd0f7d1a1941d5e0fc9bf784a29e5"
155
  },
156
  {
157
  "id": "official_dataset_card_alignment",
@@ -195,7 +206,7 @@
195
  "shows": "Machine-readable source-alignment pass/fail check for repo, website, and HF surfaces.",
196
  "exists": true,
197
  "bytes": 4432,
198
- "sha256": "96c7adc61c869fab71ef34ec2f6ec4f5f88af844509bd3d51d3818732d1f84b6"
199
  },
200
  {
201
  "id": "source_alignment_validator",
@@ -573,8 +584,8 @@
573
  "surface": "repo_hf",
574
  "shows": "Generates the selective artifact catalog from local files.",
575
  "exists": true,
576
- "bytes": 26568,
577
- "sha256": "a611b399e858560f6afb41e121f033724753c5167d04e0d7bf243e569de88f04"
578
  },
579
  {
580
  "id": "publication_audit",
@@ -585,7 +596,7 @@
585
  "volatile": true,
586
  "shows": "Confirms public bundles exclude raw data, caches, heavy archives, and credential text.",
587
  "exists": true,
588
- "bytes": 7289,
589
  "hash_policy": "existence_and_size_only"
590
  },
591
  {
@@ -597,7 +608,7 @@
597
  "volatile": true,
598
  "shows": "Separates setup paths from completed held-out-episode results.",
599
  "exists": true,
600
- "bytes": 19505,
601
  "hash_policy": "existence_and_size_only"
602
  },
603
  {
@@ -609,7 +620,7 @@
609
  "volatile": true,
610
  "shows": "Confirms prepared GitHub/HF Space/artifact/model mirrors share the same critical data, figure, website HTML, and validator files.",
611
  "exists": true,
612
- "bytes": 108617,
613
  "hash_policy": "existence_and_size_only"
614
  },
615
  {
@@ -621,7 +632,7 @@
621
  "volatile": true,
622
  "shows": "Confirms local website links, anchors, JSON data files, and referenced images resolve.",
623
  "exists": true,
624
- "bytes": 14923,
625
  "hash_policy": "existence_and_size_only"
626
  },
627
  {
 
1
  {
2
  "title": "Ropedia Xperience-10M Task Suite Artifact Index",
3
+ "generated_at_utc": "2026-06-04T20:40:52+00:00",
4
  "status": "pass",
5
+ "artifact_count": 73,
6
  "missing": [],
7
  "by_kind": {
8
+ "project_path": 12,
9
  "project_scope": 1,
10
  "source_alignment": 5,
11
  "publication_workflow": 1,
 
62
  "surface": "repo_hf",
63
  "shows": "Gives a compact current-state table for first-pass readers.",
64
  "exists": true,
65
+ "bytes": 7207,
66
+ "sha256": "7baaba976ccc254da1a03ee2653057d1e08f3fb0c0cad035886c362442828720"
67
  },
68
  {
69
  "id": "project_status_json",
 
73
  "surface": "website_hf",
74
  "shows": "Machine-readable copy of the current project status for website and HF mirrors.",
75
  "exists": true,
76
+ "bytes": 9874,
77
+ "sha256": "600c95726eae3404127a8b2110f35468ff2ba02943cae0fbcd3ea43c66109d3e"
78
  },
79
  {
80
  "id": "research_roadmap",
 
84
  "surface": "repo_hf",
85
  "shows": "Defines the path from public-sample task development to multi-episode held-out evaluation and larger omni-model extensions.",
86
  "exists": true,
87
+ "bytes": 8388,
88
+ "sha256": "0b3e3356076998ad94dc39f708cc783a4ebeab76c9da661cdd37ea12a3bb3665"
89
  },
90
  {
91
  "id": "research_roadmap_json",
 
95
  "surface": "website_hf",
96
  "shows": "Machine-readable research roadmap for the website and Hugging Face mirrors.",
97
  "exists": true,
98
+ "bytes": 7161,
99
+ "sha256": "cc96118c2c05108c831616151bc027441f7545495adeeb6a4a6a6bffe8da7801"
100
  },
101
  {
102
  "id": "foundation_model_plan",
 
106
  "surface": "repo_hf",
107
  "shows": "Defines the post-data-gate backbone choices: Qwen3-Omni first, Cosmos 3 for world modeling, and VLA/policy models after action-target conversion.",
108
  "exists": true,
109
+ "bytes": 9075,
110
+ "sha256": "444d13ab556d2e16a199a7fca191b87c85ab8685d167aab357bc6341839299a2"
111
  },
112
  {
113
  "id": "foundation_model_plan_json",
 
117
  "surface": "website_hf",
118
  "shows": "Machine-readable foundation-model selection matrix with source links, entry conditions, and evaluation additions.",
119
  "exists": true,
120
+ "bytes": 12981,
121
+ "sha256": "9cce52025a2e2f8afb4660e2af3353aea6ad0a1af380849218dd74c0acc271bb"
122
+ },
123
+ {
124
+ "id": "xperience_embodied_foundation_pretraining",
125
+ "title": "Xperience Embodied Foundation Model pretraining goal",
126
+ "path": "XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md",
127
+ "kind": "project_path",
128
+ "surface": "repo_hf",
129
+ "shows": "Describes the future full-corpus Xperience-native pretraining goal, target modules, objectives, staged scale-up, hardware ranges, and evaluation protocol.",
130
+ "exists": true,
131
+ "bytes": 9182,
132
+ "sha256": "b5a6ddc58647cd895a4772b110ecc9f4d685427fb37b81b22c6c02d2b9b323f1"
133
  },
134
  {
135
  "id": "evidence_contract",
 
161
  "surface": "repo_hf",
162
  "shows": "Gives the human-readable map from project scope to data, tasks, platform mirrors, and scale-up status.",
163
  "exists": true,
164
+ "bytes": 11440,
165
+ "sha256": "9b8821a9b14fe1744f2e6b5c419b2c5daaf70b57f1944caf1105c36c0c66c119"
166
  },
167
  {
168
  "id": "official_dataset_card_alignment",
 
206
  "shows": "Machine-readable source-alignment pass/fail check for repo, website, and HF surfaces.",
207
  "exists": true,
208
  "bytes": 4432,
209
+ "sha256": "06c6e2d111c72df01ed127fd288e6675b63e35a21ae12a2523931a072bd0bc49"
210
  },
211
  {
212
  "id": "source_alignment_validator",
 
584
  "surface": "repo_hf",
585
  "shows": "Generates the selective artifact catalog from local files.",
586
  "exists": true,
587
+ "bytes": 27020,
588
+ "sha256": "0ca7ed96f24caecbab31687cffa99f0eba8471258986412a294614e688c5aff5"
589
  },
590
  {
591
  "id": "publication_audit",
 
596
  "volatile": true,
597
  "shows": "Confirms public bundles exclude raw data, caches, heavy archives, and credential text.",
598
  "exists": true,
599
+ "bytes": 11811,
600
  "hash_policy": "existence_and_size_only"
601
  },
602
  {
 
608
  "volatile": true,
609
  "shows": "Separates setup paths from completed held-out-episode results.",
610
  "exists": true,
611
+ "bytes": 18981,
612
  "hash_policy": "existence_and_size_only"
613
  },
614
  {
 
620
  "volatile": true,
621
  "shows": "Confirms prepared GitHub/HF Space/artifact/model mirrors share the same critical data, figure, website HTML, and validator files.",
622
  "exists": true,
623
+ "bytes": 108621,
624
  "hash_policy": "existence_and_size_only"
625
  },
626
  {
 
632
  "volatile": true,
633
  "shows": "Confirms local website links, anchors, JSON data files, and referenced images resolve.",
634
  "exists": true,
635
+ "bytes": 14891,
636
  "hash_policy": "existence_and_size_only"
637
  },
638
  {
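The index entries above pair each `path` with `exists`, `bytes`, and either a `sha256` or a `hash_policy` of `existence_and_size_only` for volatile files. As a rough illustration of how one such entry could be checked against a local checkout, here is a minimal sketch. The function name `check_entry` and the `root` argument are hypothetical and are not taken from `scripts/build_artifact_index.py`; only the field names come from the index shown in this diff.

```python
import hashlib
from pathlib import Path


def check_entry(entry: dict, root: Path) -> list[str]:
    """Return a list of mismatches between one index entry and the file on disk.

    Hypothetical helper: the field names mirror the index entries in this diff,
    but the checking logic is an assumption, not the repo's actual validator.
    """
    problems = []
    path = root / entry["path"]

    # The "exists" flag should agree with what is actually on disk.
    if path.is_file() != entry["exists"]:
        problems.append(f"{entry['id']}: exists flag does not match disk")
        return problems
    if not entry["exists"]:
        return problems

    data = path.read_bytes()
    if len(data) != entry["bytes"]:
        problems.append(
            f"{entry['id']}: expected {entry['bytes']} bytes, found {len(data)}"
        )

    # Volatile entries record size only, so the hash is skipped for them.
    if entry.get("hash_policy") != "existence_and_size_only":
        digest = hashlib.sha256(data).hexdigest()
        if digest != entry.get("sha256"):
            problems.append(f"{entry['id']}: sha256 mismatch")

    return problems
```

Calling `check_entry(entry, Path("."))` for each entry in the index would yield an empty list when the recorded size and hash agree with the working tree. Skipping the hash for `existence_and_size_only` entries reflects how the volatile reports in this diff carry no `sha256` field.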
docs/data/foundation_model_plan.json CHANGED
@@ -2,6 +2,16 @@
2
  "title": "Xperience-10M Foundation Model Plan",
3
  "status": "planning_artifact",
4
  "current_boundary": "No held-out multi-episode foundation-model result has been completed in this repo. The current foundation-model artifacts are setup-stage until enough valid episodes are prepared and evaluated.",
5
  "decision": {
6
  "immediate_trainable_backbone": "Qwen3-Omni",
7
  "first_world_model_branch": "Cosmos 3",
@@ -10,7 +20,65 @@
10
  "openpi pi0/pi0.5",
11
  "NVIDIA GR00T"
12
  ],
13
- "external_reasoning_reference": "Gemini Robotics"
14
  },
15
  "model_families": [
16
  {
@@ -112,6 +180,21 @@
112
  "current_decision": "optional_baseline_after_data_staging",
113
  "entry_condition": "Action labels and baseline protocol exist.",
114
  "public_source": "https://github.com/huggingface/lerobot"
115
  }
116
  ],
117
  "execution_order": [
@@ -144,6 +227,11 @@
144
  "step": 6,
145
  "name": "Publishing threshold",
146
  "action": "Publish branch results only with real manifests, predictions, metrics, and qualitative examples."
147
  }
148
  ],
149
  "evaluation_additions": [
@@ -230,6 +318,10 @@
230
  {
231
  "label": "LeRobot / SmolVLA",
232
  "url": "https://github.com/huggingface/lerobot"
233
  }
234
  ]
235
  }
 
2
  "title": "Xperience-10M Foundation Model Plan",
3
  "status": "planning_artifact",
4
  "current_boundary": "No held-out multi-episode foundation-model result has been completed in this repo. The current foundation-model artifacts are setup-stage until enough valid episodes are prepared and evaluated.",
5
+ "backbone_registry": {
6
+ "config_dir": "configs/omni_backbones",
7
+ "validator": "scripts/omni/backbone_registry.py --validate --json",
8
+ "extension_contract": "OMNI_MODEL_EXTENSION_CONTRACT.md",
9
+ "implemented_backbone": "qwen3_omni_lora",
10
+ "planned_backbones": [
11
+ "cosmos_world_model",
12
+ "policy_vla_branch"
13
+ ]
14
+ },
15
  "decision": {
16
  "immediate_trainable_backbone": "Qwen3-Omni",
17
  "first_world_model_branch": "Cosmos 3",
 
20
  "openpi pi0/pi0.5",
21
  "NVIDIA GR00T"
22
  ],
23
+ "external_reasoning_reference": "Gemini Robotics",
24
+ "long_term_native_pretraining_goal": "Xperience Embodied Foundation Model"
25
+ },
26
+ "future_pretraining_goal": {
27
+ "name": "Xperience Embodied Foundation Model",
28
+ "status": "future_planning_goal",
29
+ "role": "Domain-specific embodied foundation model pretrained on full Xperience-10M if full-corpus data, storage, and compute become available.",
30
+ "not_current_result": true,
31
+ "document": "XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md",
32
+ "entry_conditions": [
33
+ "Selected multi-episode Qwen3-Omni pilot trains and evaluates cleanly.",
34
+ "Scaling from 128 episodes to thousands of episodes shows measurable value.",
35
+ "Full-corpus storage, derived-shard storage, and fast active-cache capacity are available.",
36
+ "Distributed training, checkpoint/restart, and provenance tracking are reliable.",
37
+ "Evaluation covers held-out episodes, sessions, activities, objects, and missing-modality robustness."
38
+ ],
39
+ "target_modules": [
40
+ "multi-view video encoder",
41
+ "audio encoder",
42
+ "depth and geometry encoder",
43
+ "pose/SLAM encoder",
44
+ "hand/body mocap encoder",
45
+ "IMU encoder",
46
+ "language encoder/decoder",
47
+ "temporal fusion transformer",
48
+ "task heads and decoders"
49
+ ],
50
+ "pretraining_objectives": [
51
+ "masked multimodal modeling",
52
+ "cross-modal contrastive alignment",
53
+ "future-state prediction",
54
+ "ego-motion and hand-motion forecasting",
55
+ "action and procedure prediction",
56
+ "language grounding and captioning",
57
+ "contact and affordance prediction",
58
+ "optional policy-style targets after action conversion"
59
+ ],
60
+ "hardware_ranges": [
61
+ {
62
+ "goal": "0.3B-1B pilot",
63
+ "compute": "8-32 modern 80GB-class data-center GPUs",
64
+ "use": "prove objectives and data loaders"
65
+ },
66
+ {
67
+ "goal": "1B-3B domain model",
68
+ "compute": "32-128 GPUs",
69
+ "use": "research-scale Xperience representation learning"
70
+ },
71
+ {
72
+ "goal": "3B-7B full-corpus domain model",
73
+ "compute": "128-512 GPUs",
74
+ "use": "first realistic full Xperience-native foundation model"
75
+ },
76
+ {
77
+ "goal": "30B-class omni model from scratch",
78
+ "compute": "512-2000+ GPUs",
79
+ "use": "lab-scale project after scaling curves justify cost"
80
+ }
81
+ ]
82
  },
83
  "model_families": [
84
  {
 
180
  "current_decision": "optional_baseline_after_data_staging",
181
  "entry_condition": "Action labels and baseline protocol exist.",
182
  "public_source": "https://github.com/huggingface/lerobot"
183
+ },
184
+ {
185
+ "priority": 8,
186
+ "family": "Xperience Embodied Foundation Model",
187
+ "category": "xperience_native_pretraining_goal",
188
+ "openness": "future project-specific model if full-corpus access and compute exist",
189
+ "best_role": "Domain model over synchronized embodied experience.",
190
+ "xperience10m_fit": [
191
+ "Uses the full aligned modality stack rather than treating sensors as auxiliary metadata.",
192
+ "Targets temporal embodied representation learning across perception, motion, geometry, audio, and language.",
193
+ "Can become the shared pretraining backbone for Qwen-style instruction tasks, Cosmos-style world modeling, and policy/action branches."
194
+ ],
195
+ "current_decision": "future_goal_after_scaling_evidence",
196
+ "entry_condition": "Full-corpus data path, PB-scale storage, multi-node compute, and positive smaller-run scaling evidence.",
197
+ "public_source": "XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md"
198
  }
199
  ],
200
  "execution_order": [
 
227
  "step": 6,
228
  "name": "Publishing threshold",
229
  "action": "Publish branch results only with real manifests, predictions, metrics, and qualitative examples."
230
+ },
231
+ {
232
+ "step": 7,
233
+ "name": "Xperience-native pretraining",
234
+ "action": "Start a from-scratch Xperience Embodied Foundation Model only after smaller scaling stages, full-corpus storage, multi-node compute, and held-out evaluation protocols are in place."
235
  }
236
  ],
237
  "evaluation_additions": [
 
318
  {
319
  "label": "LeRobot / SmolVLA",
320
  "url": "https://github.com/huggingface/lerobot"
321
+ },
322
+ {
323
+ "label": "Xperience Embodied Foundation Model pretraining plan",
324
+ "url": "XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md"
325
  }
326
  ]
327
  }
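The plan file above adds a `future_pretraining_goal` block, a matching `decision.long_term_native_pretraining_goal` value, and a seventh `execution_order` step. A small consistency check over those fields might look like the sketch below. The function name `check_plan` and the specific rules are assumptions made for illustration; the repository's own validators may check different things.

```python
def check_plan(plan: dict) -> list[str]:
    """Return consistency problems found in a foundation-model plan dict.

    Hypothetical sketch: the keys come from foundation_model_plan.json as shown
    in this diff, but the rules themselves are illustrative assumptions.
    """
    problems = []

    # Execution steps are expected to be numbered 1..N without gaps.
    steps = [item["step"] for item in plan.get("execution_order", [])]
    if steps != list(range(1, len(steps) + 1)):
        problems.append(f"execution_order steps are not consecutive: {steps}")

    # The decision block and the future-goal block should name the same model.
    goal = plan.get("future_pretraining_goal", {})
    decided = plan.get("decision", {}).get("long_term_native_pretraining_goal")
    if decided != goal.get("name"):
        problems.append("decision and future_pretraining_goal disagree on the name")

    # A planning goal must stay flagged as not being a current result.
    if goal.get("not_current_result") is not True:
        problems.append("future_pretraining_goal is not flagged as not_current_result")

    return problems
```

Loading the file with `json.load` and passing the result to `check_plan` would return an empty list when the three rules hold. Checking the `not_current_result` flag explicitly keeps a planning artifact from being read as a completed result.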
docs/data/mirror_parity.json CHANGED
@@ -1,6 +1,6 @@
1
  {
2
  "status": "pass",
3
- "generated_at_utc": "2026-06-04T16:49:59+00:00",
4
  "hf_root": "hf_publish",
5
  "summary": {
6
  "group_count": 101,
@@ -71,27 +71,27 @@
71
  "local": {
72
  "path": "repo:docs/data/artifact_index.json",
73
  "exists": true,
74
- "bytes": 32296,
75
- "sha256": "5494e5ee1e40bc50d44a9cd6f77c8de694175939bda4f174fb5b1554e53ec508"
76
  },
77
  "mirrors": {
78
  "hf_space": {
79
  "path": "hf_space:data/artifact_index.json",
80
  "exists": true,
81
- "bytes": 32296,
82
- "sha256": "5494e5ee1e40bc50d44a9cd6f77c8de694175939bda4f174fb5b1554e53ec508"
83
  },
84
  "hf_artifacts": {
85
  "path": "hf_artifacts:docs/data/artifact_index.json",
86
  "exists": true,
87
- "bytes": 32296,
88
- "sha256": "5494e5ee1e40bc50d44a9cd6f77c8de694175939bda4f174fb5b1554e53ec508"
89
  },
90
  "hf_model": {
91
  "path": "hf_model:metrics/artifact_index.json",
92
  "exists": true,
93
- "bytes": 32296,
94
- "sha256": "5494e5ee1e40bc50d44a9cd6f77c8de694175939bda4f174fb5b1554e53ec508"
95
  }
96
  },
97
  "failures": []
@@ -226,27 +226,27 @@
226
  "local": {
227
  "path": "repo:docs/data/foundation_model_plan.json",
228
  "exists": true,
229
- "bytes": 8889,
230
- "sha256": "e9b11114fa290253000b921575586780ccc3ba17665235259d4326c524f6ce97"
231
  },
232
  "mirrors": {
233
  "hf_space": {
234
  "path": "hf_space:data/foundation_model_plan.json",
235
  "exists": true,
236
- "bytes": 8889,
237
- "sha256": "e9b11114fa290253000b921575586780ccc3ba17665235259d4326c524f6ce97"
238
  },
239
  "hf_artifacts": {
240
  "path": "hf_artifacts:docs/data/foundation_model_plan.json",
241
  "exists": true,
242
- "bytes": 8889,
243
- "sha256": "e9b11114fa290253000b921575586780ccc3ba17665235259d4326c524f6ce97"
244
  },
245
  "hf_model": {
246
  "path": "hf_model:metrics/foundation_model_plan.json",
247
  "exists": true,
248
- "bytes": 8889,
249
- "sha256": "e9b11114fa290253000b921575586780ccc3ba17665235259d4326c524f6ce97"
250
  }
251
  },
252
  "failures": []
@@ -412,27 +412,27 @@
412
  "local": {
413
  "path": "repo:docs/data/project_status.json",
414
  "exists": true,
415
- "bytes": 9169,
416
- "sha256": "50d3c87b774c8375dcb897bd363d25e392e5fd6571571c41d56e623df15063f8"
417
  },
418
  "mirrors": {
419
  "hf_space": {
420
  "path": "hf_space:data/project_status.json",
421
  "exists": true,
422
- "bytes": 9169,
423
- "sha256": "50d3c87b774c8375dcb897bd363d25e392e5fd6571571c41d56e623df15063f8"
424
  },
425
  "hf_artifacts": {
426
  "path": "hf_artifacts:docs/data/project_status.json",
427
  "exists": true,
428
- "bytes": 9169,
429
- "sha256": "50d3c87b774c8375dcb897bd363d25e392e5fd6571571c41d56e623df15063f8"
430
  },
431
  "hf_model": {
432
  "path": "hf_model:metrics/project_status.json",
433
  "exists": true,
434
- "bytes": 9169,
435
- "sha256": "50d3c87b774c8375dcb897bd363d25e392e5fd6571571c41d56e623df15063f8"
436
  }
437
  },
438
  "failures": []
@@ -443,27 +443,27 @@
443
  "local": {
444
  "path": "repo:docs/data/publication_audit.json",
445
  "exists": true,
446
- "bytes": 7289,
447
- "sha256": "cd84a10ddbfb13943820c8e6113ca377a9ab1215f45df2b3384e752cbcac190b"
448
  },
449
  "mirrors": {
450
  "hf_space": {
451
  "path": "hf_space:data/publication_audit.json",
452
  "exists": true,
453
- "bytes": 7289,
454
- "sha256": "cd84a10ddbfb13943820c8e6113ca377a9ab1215f45df2b3384e752cbcac190b"
455
  },
456
  "hf_artifacts": {
457
  "path": "hf_artifacts:docs/data/publication_audit.json",
458
  "exists": true,
459
- "bytes": 7289,
460
- "sha256": "cd84a10ddbfb13943820c8e6113ca377a9ab1215f45df2b3384e752cbcac190b"
461
  },
462
  "hf_model": {
463
  "path": "hf_model:metrics/publication_audit.json",
464
  "exists": true,
465
- "bytes": 7289,
466
- "sha256": "cd84a10ddbfb13943820c8e6113ca377a9ab1215f45df2b3384e752cbcac190b"
467
  }
468
  },
469
  "failures": []
@@ -598,27 +598,27 @@
598
  "local": {
599
  "path": "repo:docs/data/research_roadmap.json",
600
  "exists": true,
601
- "bytes": 5758,
602
- "sha256": "54657eb8824416d2128d6e5710543bdaf9e41d7c2fa46dd14ad6b58fede3b5db"
603
  },
604
  "mirrors": {
605
  "hf_space": {
606
  "path": "hf_space:data/research_roadmap.json",
607
  "exists": true,
608
- "bytes": 5758,
609
- "sha256": "54657eb8824416d2128d6e5710543bdaf9e41d7c2fa46dd14ad6b58fede3b5db"
610
  },
611
  "hf_artifacts": {
612
  "path": "hf_artifacts:docs/data/research_roadmap.json",
613
  "exists": true,
614
- "bytes": 5758,
615
- "sha256": "54657eb8824416d2128d6e5710543bdaf9e41d7c2fa46dd14ad6b58fede3b5db"
616
  },
617
  "hf_model": {
618
  "path": "hf_model:metrics/research_roadmap.json",
619
  "exists": true,
620
- "bytes": 5758,
621
- "sha256": "54657eb8824416d2128d6e5710543bdaf9e41d7c2fa46dd14ad6b58fede3b5db"
622
  }
623
  },
624
  "failures": []
@@ -629,27 +629,27 @@
629
  "local": {
630
  "path": "repo:docs/data/research_roadmap_interactive.json",
631
  "exists": true,
632
- "bytes": 131519,
633
- "sha256": "004fbcc7a3582da88dd66504d686604ecb0f04f65c9c8166bb0583e0fc174274"
634
  },
635
  "mirrors": {
636
  "hf_space": {
637
  "path": "hf_space:data/research_roadmap_interactive.json",
638
  "exists": true,
639
- "bytes": 131519,
640
- "sha256": "004fbcc7a3582da88dd66504d686604ecb0f04f65c9c8166bb0583e0fc174274"
641
  },
642
  "hf_artifacts": {
643
  "path": "hf_artifacts:docs/data/research_roadmap_interactive.json",
644
  "exists": true,
645
- "bytes": 131519,
646
- "sha256": "004fbcc7a3582da88dd66504d686604ecb0f04f65c9c8166bb0583e0fc174274"
647
  },
648
  "hf_model": {
649
  "path": "hf_model:metrics/research_roadmap_interactive.json",
650
  "exists": true,
651
- "bytes": 131519,
652
- "sha256": "004fbcc7a3582da88dd66504d686604ecb0f04f65c9c8166bb0583e0fc174274"
653
  }
654
  },
655
  "failures": []
@@ -939,27 +939,27 @@
939
  "local": {
940
  "path": "repo:docs/data/website_integrity.json",
941
  "exists": true,
942
- "bytes": 14923,
943
- "sha256": "23a03838502e8d43ee2b41e313634ec46a4b329792883aa12fc03b044c4e9b0e"
944
  },
945
  "mirrors": {
946
  "hf_space": {
947
  "path": "hf_space:data/website_integrity.json",
948
  "exists": true,
949
- "bytes": 14923,
950
- "sha256": "23a03838502e8d43ee2b41e313634ec46a4b329792883aa12fc03b044c4e9b0e"
951
  },
952
  "hf_artifacts": {
953
  "path": "hf_artifacts:docs/data/website_integrity.json",
954
  "exists": true,
955
- "bytes": 14923,
956
- "sha256": "23a03838502e8d43ee2b41e313634ec46a4b329792883aa12fc03b044c4e9b0e"
957
  },
958
  "hf_model": {
959
  "path": "hf_model:metrics/website_integrity.json",
960
  "exists": true,
961
- "bytes": 14923,
962
- "sha256": "23a03838502e8d43ee2b41e313634ec46a4b329792883aa12fc03b044c4e9b0e"
963
  }
964
  },
965
  "failures": []
@@ -1692,21 +1692,21 @@
1692
  "local": {
1693
  "path": "repo:scripts/build_artifact_index.py",
1694
  "exists": true,
1695
- "bytes": 26568,
1696
- "sha256": "a611b399e858560f6afb41e121f033724753c5167d04e0d7bf243e569de88f04"
1697
  },
1698
  "mirrors": {
1699
  "hf_artifacts": {
1700
  "path": "hf_artifacts:scripts/build_artifact_index.py",
1701
  "exists": true,
1702
- "bytes": 26568,
1703
- "sha256": "a611b399e858560f6afb41e121f033724753c5167d04e0d7bf243e569de88f04"
1704
  },
1705
  "hf_model": {
1706
  "path": "hf_model:scripts/build_artifact_index.py",
1707
  "exists": true,
1708
- "bytes": 26568,
1709
- "sha256": "a611b399e858560f6afb41e121f033724753c5167d04e0d7bf243e569de88f04"
1710
  }
1711
  },
1712
  "failures": []
@@ -2017,21 +2017,21 @@
2017
  "local": {
2018
  "path": "repo:scripts/validate_publication_package.py",
2019
  "exists": true,
2020
- "bytes": 19267,
2021
- "sha256": "0db7f9a376ac4dfb1bb083a5f35051e2cb18a0d9db5788e7d707d8dc084ad231"
2022
  },
2023
  "mirrors": {
2024
  "hf_artifacts": {
2025
  "path": "hf_artifacts:scripts/validate_publication_package.py",
2026
  "exists": true,
2027
- "bytes": 19267,
2028
- "sha256": "0db7f9a376ac4dfb1bb083a5f35051e2cb18a0d9db5788e7d707d8dc084ad231"
2029
  },
2030
  "hf_model": {
2031
  "path": "hf_model:scripts/validate_publication_package.py",
2032
  "exists": true,
2033
- "bytes": 19267,
2034
- "sha256": "0db7f9a376ac4dfb1bb083a5f35051e2cb18a0d9db5788e7d707d8dc084ad231"
2035
  }
2036
  },
2037
  "failures": []
@@ -2117,21 +2117,21 @@
2117
  "local": {
2118
  "path": "repo:scripts/validate_website_integrity.py",
2119
  "exists": true,
2120
- "bytes": 24396,
2121
- "sha256": "3b4af15250f79827e3010e93636836c3a0c768ba0188a9a7e55e439233988c72"
2122
  },
2123
  "mirrors": {
2124
  "hf_artifacts": {
2125
  "path": "hf_artifacts:scripts/validate_website_integrity.py",
2126
  "exists": true,
2127
- "bytes": 24396,
2128
- "sha256": "3b4af15250f79827e3010e93636836c3a0c768ba0188a9a7e55e439233988c72"
2129
  },
2130
  "hf_model": {
2131
  "path": "hf_model:scripts/validate_website_integrity.py",
2132
  "exists": true,
2133
- "bytes": 24396,
2134
- "sha256": "3b4af15250f79827e3010e93636836c3a0c768ba0188a9a7e55e439233988c72"
2135
  }
2136
  },
2137
  "failures": []
@@ -2217,21 +2217,21 @@
2217
  "local": {
2218
  "path": "repo:docs/index.html",
2219
  "exists": true,
2220
- "bytes": 173425,
2221
- "sha256": "26ac1e7976c11f21f4fd2f3623ac8d339a57b511f6cc8f5e68300062e9def2b0"
2222
  },
2223
  "mirrors": {
2224
  "hf_space": {
2225
  "path": "hf_space:index.html",
2226
  "exists": true,
2227
- "bytes": 173425,
2228
- "sha256": "26ac1e7976c11f21f4fd2f3623ac8d339a57b511f6cc8f5e68300062e9def2b0"
2229
  },
2230
  "hf_artifacts_docs": {
2231
  "path": "hf_artifacts:docs/index.html",
2232
  "exists": true,
2233
- "bytes": 173425,
2234
- "sha256": "26ac1e7976c11f21f4fd2f3623ac8d339a57b511f6cc8f5e68300062e9def2b0"
2235
  }
2236
  },
2237
  "failures": []
@@ -2242,21 +2242,21 @@
2242
  "local": {
2243
  "path": "repo:docs/research_roadmap.html",
2244
  "exists": true,
2245
- "bytes": 31554,
2246
- "sha256": "f51e83a4495f2d2012ec4c48191d66ca4456a00d7fcb335a427b7d86afc66109"
2247
  },
2248
  "mirrors": {
2249
  "hf_space": {
2250
  "path": "hf_space:research_roadmap.html",
2251
  "exists": true,
2252
- "bytes": 31554,
2253
- "sha256": "f51e83a4495f2d2012ec4c48191d66ca4456a00d7fcb335a427b7d86afc66109"
2254
  },
2255
  "hf_artifacts_docs": {
2256
  "path": "hf_artifacts:docs/research_roadmap.html",
2257
  "exists": true,
2258
- "bytes": 31554,
2259
- "sha256": "f51e83a4495f2d2012ec4c48191d66ca4456a00d7fcb335a427b7d86afc66109"
2260
  }
2261
  },
2262
  "failures": []
@@ -2844,27 +2844,27 @@
2844
  "local": {
2845
  "path": "repo:FOUNDATION_MODEL_PLAN.md",
2846
  "exists": true,
2847
- "bytes": 6559,
2848
- "sha256": "955be6559b554f1c6c4141dd6ca2818127d89585df3940c2bd9b975ad9047926"
2849
  },
2850
  "mirrors": {
2851
  "hf_space": {
2852
  "path": "hf_space:FOUNDATION_MODEL_PLAN.md",
2853
  "exists": true,
2854
- "bytes": 6559,
2855
- "sha256": "955be6559b554f1c6c4141dd6ca2818127d89585df3940c2bd9b975ad9047926"
2856
  },
2857
  "hf_artifacts": {
2858
  "path": "hf_artifacts:FOUNDATION_MODEL_PLAN.md",
2859
  "exists": true,
2860
- "bytes": 6559,
2861
- "sha256": "955be6559b554f1c6c4141dd6ca2818127d89585df3940c2bd9b975ad9047926"
2862
  },
2863
  "hf_model": {
2864
  "path": "hf_model:FOUNDATION_MODEL_PLAN.md",
2865
  "exists": true,
2866
- "bytes": 6559,
2867
- "sha256": "955be6559b554f1c6c4141dd6ca2818127d89585df3940c2bd9b975ad9047926"
2868
  }
2869
  },
2870
  "failures": []
@@ -2937,27 +2937,27 @@
2937
  "local": {
2938
  "path": "repo:RESEARCH_ROADMAP.md",
2939
  "exists": true,
2940
- "bytes": 6677,
2941
- "sha256": "58491bfb68ad3e6b7569bdb1a3cac3de7682a49beb9de368a114d58ebf0b118b"
2942
  },
2943
  "mirrors": {
2944
  "hf_space": {
2945
  "path": "hf_space:RESEARCH_ROADMAP.md",
2946
  "exists": true,
2947
- "bytes": 6677,
2948
- "sha256": "58491bfb68ad3e6b7569bdb1a3cac3de7682a49beb9de368a114d58ebf0b118b"
2949
  },
2950
  "hf_artifacts": {
2951
  "path": "hf_artifacts:RESEARCH_ROADMAP.md",
2952
  "exists": true,
2953
- "bytes": 6677,
2954
- "sha256": "58491bfb68ad3e6b7569bdb1a3cac3de7682a49beb9de368a114d58ebf0b118b"
2955
  },
2956
  "hf_model": {
2957
  "path": "hf_model:RESEARCH_ROADMAP.md",
2958
  "exists": true,
2959
- "bytes": 6677,
2960
- "sha256": "58491bfb68ad3e6b7569bdb1a3cac3de7682a49beb9de368a114d58ebf0b118b"
2961
  }
2962
  },
2963
  "failures": []
@@ -2968,27 +2968,27 @@
2968
  "local": {
2969
  "path": "repo:PROJECT_STATUS.md",
2970
  "exists": true,
2971
- "bytes": 7138,
2972
- "sha256": "67d85a198ee90082e47d790bd0f4d9dafbc97625cd39b17cc94b9785ec25104a"
2973
  },
2974
  "mirrors": {
2975
  "hf_space": {
2976
  "path": "hf_space:PROJECT_STATUS.md",
2977
  "exists": true,
2978
- "bytes": 7138,
2979
- "sha256": "67d85a198ee90082e47d790bd0f4d9dafbc97625cd39b17cc94b9785ec25104a"
2980
  },
2981
  "hf_artifacts": {
2982
  "path": "hf_artifacts:PROJECT_STATUS.md",
2983
  "exists": true,
2984
- "bytes": 7138,
2985
- "sha256": "67d85a198ee90082e47d790bd0f4d9dafbc97625cd39b17cc94b9785ec25104a"
2986
  },
2987
  "hf_model": {
2988
  "path": "hf_model:PROJECT_STATUS.md",
2989
  "exists": true,
2990
- "bytes": 7138,
2991
- "sha256": "67d85a198ee90082e47d790bd0f4d9dafbc97625cd39b17cc94b9785ec25104a"
2992
  }
2993
  },
2994
  "failures": []
 
1
  {
2
  "status": "pass",
3
+ "generated_at_utc": "2026-06-04T20:45:22+00:00",
4
  "hf_root": "hf_publish",
5
  "summary": {
6
  "group_count": 101,
 
71
  "local": {
72
  "path": "repo:docs/data/artifact_index.json",
73
  "exists": true,
74
+ "bytes": 32864,
75
+ "sha256": "ec7d17898c42fd76109567c201f9638059b6a9a11a48817b32677a0eb2662178"
76
  },
77
  "mirrors": {
78
  "hf_space": {
79
  "path": "hf_space:data/artifact_index.json",
80
  "exists": true,
81
+ "bytes": 32864,
82
+ "sha256": "ec7d17898c42fd76109567c201f9638059b6a9a11a48817b32677a0eb2662178"
83
  },
84
  "hf_artifacts": {
85
  "path": "hf_artifacts:docs/data/artifact_index.json",
86
  "exists": true,
87
+ "bytes": 32864,
88
+ "sha256": "ec7d17898c42fd76109567c201f9638059b6a9a11a48817b32677a0eb2662178"
89
  },
90
  "hf_model": {
91
  "path": "hf_model:metrics/artifact_index.json",
92
  "exists": true,
93
+ "bytes": 32864,
94
+ "sha256": "ec7d17898c42fd76109567c201f9638059b6a9a11a48817b32677a0eb2662178"
95
  }
96
  },
97
  "failures": []
 
226
  "local": {
227
  "path": "repo:docs/data/foundation_model_plan.json",
228
  "exists": true,
229
+ "bytes": 12981,
230
+ "sha256": "9cce52025a2e2f8afb4660e2af3353aea6ad0a1af380849218dd74c0acc271bb"
231
  },
232
  "mirrors": {
233
  "hf_space": {
234
  "path": "hf_space:data/foundation_model_plan.json",
235
  "exists": true,
236
+ "bytes": 12981,
237
+ "sha256": "9cce52025a2e2f8afb4660e2af3353aea6ad0a1af380849218dd74c0acc271bb"
238
  },
239
  "hf_artifacts": {
240
  "path": "hf_artifacts:docs/data/foundation_model_plan.json",
241
  "exists": true,
242
+ "bytes": 12981,
243
+ "sha256": "9cce52025a2e2f8afb4660e2af3353aea6ad0a1af380849218dd74c0acc271bb"
244
  },
245
  "hf_model": {
246
  "path": "hf_model:metrics/foundation_model_plan.json",
247
  "exists": true,
248
+ "bytes": 12981,
249
+ "sha256": "9cce52025a2e2f8afb4660e2af3353aea6ad0a1af380849218dd74c0acc271bb"
250
  }
251
  },
252
  "failures": []
 
412
  "local": {
413
  "path": "repo:docs/data/project_status.json",
414
  "exists": true,
415
+ "bytes": 9874,
416
+ "sha256": "600c95726eae3404127a8b2110f35468ff2ba02943cae0fbcd3ea43c66109d3e"
417
  },
418
  "mirrors": {
419
  "hf_space": {
420
  "path": "hf_space:data/project_status.json",
421
  "exists": true,
422
+ "bytes": 9874,
423
+ "sha256": "600c95726eae3404127a8b2110f35468ff2ba02943cae0fbcd3ea43c66109d3e"
424
  },
425
  "hf_artifacts": {
426
  "path": "hf_artifacts:docs/data/project_status.json",
427
  "exists": true,
428
+ "bytes": 9874,
429
+ "sha256": "600c95726eae3404127a8b2110f35468ff2ba02943cae0fbcd3ea43c66109d3e"
430
  },
431
  "hf_model": {
432
  "path": "hf_model:metrics/project_status.json",
433
  "exists": true,
434
+ "bytes": 9874,
435
+ "sha256": "600c95726eae3404127a8b2110f35468ff2ba02943cae0fbcd3ea43c66109d3e"
436
  }
437
  },
438
  "failures": []
 
443
  "local": {
444
  "path": "repo:docs/data/publication_audit.json",
445
  "exists": true,
446
+ "bytes": 7237,
447
+ "sha256": "7fbb19f8990b1a4d902e282c010d27e4391755564fa68af97d96c298c6b054f8"
448
  },
449
  "mirrors": {
450
  "hf_space": {
451
  "path": "hf_space:data/publication_audit.json",
452
  "exists": true,
453
+ "bytes": 7237,
454
+ "sha256": "7fbb19f8990b1a4d902e282c010d27e4391755564fa68af97d96c298c6b054f8"
455
  },
456
  "hf_artifacts": {
457
  "path": "hf_artifacts:docs/data/publication_audit.json",
458
  "exists": true,
459
+ "bytes": 7237,
460
+ "sha256": "7fbb19f8990b1a4d902e282c010d27e4391755564fa68af97d96c298c6b054f8"
461
  },
462
  "hf_model": {
463
  "path": "hf_model:metrics/publication_audit.json",
464
  "exists": true,
465
+ "bytes": 7237,
466
+ "sha256": "7fbb19f8990b1a4d902e282c010d27e4391755564fa68af97d96c298c6b054f8"
467
  }
468
  },
469
  "failures": []
 
598
  "local": {
599
  "path": "repo:docs/data/research_roadmap.json",
600
  "exists": true,
601
+ "bytes": 7161,
602
+ "sha256": "cc96118c2c05108c831616151bc027441f7545495adeeb6a4a6a6bffe8da7801"
603
  },
604
  "mirrors": {
605
  "hf_space": {
606
  "path": "hf_space:data/research_roadmap.json",
607
  "exists": true,
608
+ "bytes": 7161,
609
+ "sha256": "cc96118c2c05108c831616151bc027441f7545495adeeb6a4a6a6bffe8da7801"
610
  },
611
  "hf_artifacts": {
612
  "path": "hf_artifacts:docs/data/research_roadmap.json",
613
  "exists": true,
614
+ "bytes": 7161,
615
+ "sha256": "cc96118c2c05108c831616151bc027441f7545495adeeb6a4a6a6bffe8da7801"
616
  },
617
  "hf_model": {
618
  "path": "hf_model:metrics/research_roadmap.json",
619
  "exists": true,
620
+ "bytes": 7161,
621
+ "sha256": "cc96118c2c05108c831616151bc027441f7545495adeeb6a4a6a6bffe8da7801"
622
  }
623
  },
624
  "failures": []
 
629
  "local": {
630
  "path": "repo:docs/data/research_roadmap_interactive.json",
631
  "exists": true,
632
+ "bytes": 134282,
633
+ "sha256": "ff37219a9f1d9b386a9d4c42766e4aa28f10ce6ef338dceeedd6bdb4a1b2c40a"
634
  },
635
  "mirrors": {
636
  "hf_space": {
637
  "path": "hf_space:data/research_roadmap_interactive.json",
638
  "exists": true,
639
+ "bytes": 134282,
640
+ "sha256": "ff37219a9f1d9b386a9d4c42766e4aa28f10ce6ef338dceeedd6bdb4a1b2c40a"
641
  },
642
  "hf_artifacts": {
643
  "path": "hf_artifacts:docs/data/research_roadmap_interactive.json",
644
  "exists": true,
645
+ "bytes": 134282,
646
+ "sha256": "ff37219a9f1d9b386a9d4c42766e4aa28f10ce6ef338dceeedd6bdb4a1b2c40a"
647
  },
648
  "hf_model": {
649
  "path": "hf_model:metrics/research_roadmap_interactive.json",
650
  "exists": true,
651
+ "bytes": 134282,
652
+ "sha256": "ff37219a9f1d9b386a9d4c42766e4aa28f10ce6ef338dceeedd6bdb4a1b2c40a"
653
  }
654
  },
655
  "failures": []
 
939
  "local": {
940
  "path": "repo:docs/data/website_integrity.json",
941
  "exists": true,
942
+ "bytes": 14891,
943
+ "sha256": "9ba1cfe02568fc9b08209902ce037c445a9a8c3954d20eea4351b04c65ca0a0c"
944
  },
945
  "mirrors": {
946
  "hf_space": {
947
  "path": "hf_space:data/website_integrity.json",
948
  "exists": true,
949
+ "bytes": 14891,
950
+ "sha256": "9ba1cfe02568fc9b08209902ce037c445a9a8c3954d20eea4351b04c65ca0a0c"
951
  },
952
  "hf_artifacts": {
953
  "path": "hf_artifacts:docs/data/website_integrity.json",
954
  "exists": true,
955
+ "bytes": 14891,
956
+ "sha256": "9ba1cfe02568fc9b08209902ce037c445a9a8c3954d20eea4351b04c65ca0a0c"
957
  },
958
  "hf_model": {
959
  "path": "hf_model:metrics/website_integrity.json",
960
  "exists": true,
961
+ "bytes": 14891,
962
+ "sha256": "9ba1cfe02568fc9b08209902ce037c445a9a8c3954d20eea4351b04c65ca0a0c"
963
  }
964
  },
965
  "failures": []
 
1692
  "local": {
1693
  "path": "repo:scripts/build_artifact_index.py",
1694
  "exists": true,
1695
+ "bytes": 27020,
1696
+ "sha256": "0ca7ed96f24caecbab31687cffa99f0eba8471258986412a294614e688c5aff5"
1697
  },
1698
  "mirrors": {
1699
  "hf_artifacts": {
1700
  "path": "hf_artifacts:scripts/build_artifact_index.py",
1701
  "exists": true,
1702
+ "bytes": 27020,
1703
+ "sha256": "0ca7ed96f24caecbab31687cffa99f0eba8471258986412a294614e688c5aff5"
1704
  },
1705
  "hf_model": {
1706
  "path": "hf_model:scripts/build_artifact_index.py",
1707
  "exists": true,
1708
+ "bytes": 27020,
1709
+ "sha256": "0ca7ed96f24caecbab31687cffa99f0eba8471258986412a294614e688c5aff5"
1710
  }
1711
  },
1712
  "failures": []
 
2017
  "local": {
2018
  "path": "repo:scripts/validate_publication_package.py",
2019
  "exists": true,
2020
+ "bytes": 17197,
2021
+ "sha256": "2a617f3204ffb8c59d1c5bc1828b4441a4d014bb531655fd0613e128a6d9abc2"
2022
  },
2023
  "mirrors": {
2024
  "hf_artifacts": {
2025
  "path": "hf_artifacts:scripts/validate_publication_package.py",
2026
  "exists": true,
2027
+ "bytes": 17197,
2028
+ "sha256": "2a617f3204ffb8c59d1c5bc1828b4441a4d014bb531655fd0613e128a6d9abc2"
2029
  },
2030
  "hf_model": {
2031
  "path": "hf_model:scripts/validate_publication_package.py",
2032
  "exists": true,
2033
+ "bytes": 17197,
2034
+ "sha256": "2a617f3204ffb8c59d1c5bc1828b4441a4d014bb531655fd0613e128a6d9abc2"
2035
  }
2036
  },
2037
  "failures": []
 
2117
  "local": {
2118
  "path": "repo:scripts/validate_website_integrity.py",
2119
  "exists": true,
2120
+ "bytes": 24481,
2121
+ "sha256": "31d85a4674e8005a916e759d820178287e297e0ec08774fe3a70aa3b61b07cf7"
2122
  },
2123
  "mirrors": {
2124
  "hf_artifacts": {
2125
  "path": "hf_artifacts:scripts/validate_website_integrity.py",
2126
  "exists": true,
2127
+ "bytes": 24481,
2128
+ "sha256": "31d85a4674e8005a916e759d820178287e297e0ec08774fe3a70aa3b61b07cf7"
2129
  },
2130
  "hf_model": {
2131
  "path": "hf_model:scripts/validate_website_integrity.py",
2132
  "exists": true,
2133
+ "bytes": 24481,
2134
+ "sha256": "31d85a4674e8005a916e759d820178287e297e0ec08774fe3a70aa3b61b07cf7"
2135
  }
2136
  },
2137
  "failures": []
 
2217
  "local": {
2218
  "path": "repo:docs/index.html",
2219
  "exists": true,
2220
+ "bytes": 174923,
2221
+ "sha256": "099fcc01cbb4d50f62c508b10f343f05b1c883962b85bda294bcede99af2a0f1"
2222
  },
2223
  "mirrors": {
2224
  "hf_space": {
2225
  "path": "hf_space:index.html",
2226
  "exists": true,
2227
+ "bytes": 174923,
2228
+ "sha256": "099fcc01cbb4d50f62c508b10f343f05b1c883962b85bda294bcede99af2a0f1"
2229
  },
2230
  "hf_artifacts_docs": {
2231
  "path": "hf_artifacts:docs/index.html",
2232
  "exists": true,
2233
+ "bytes": 174923,
2234
+ "sha256": "099fcc01cbb4d50f62c508b10f343f05b1c883962b85bda294bcede99af2a0f1"
2235
  }
2236
  },
2237
  "failures": []
 
2242
  "local": {
2243
  "path": "repo:docs/research_roadmap.html",
2244
  "exists": true,
2245
+ "bytes": 31702,
2246
+ "sha256": "1b20a5cc342b3ba59ad808eed9f5bf978e2d9ac438c88b5c3eeba01f4e14b883"
2247
  },
2248
  "mirrors": {
2249
  "hf_space": {
2250
  "path": "hf_space:research_roadmap.html",
2251
  "exists": true,
2252
+ "bytes": 31702,
2253
+ "sha256": "1b20a5cc342b3ba59ad808eed9f5bf978e2d9ac438c88b5c3eeba01f4e14b883"
2254
  },
2255
  "hf_artifacts_docs": {
2256
  "path": "hf_artifacts:docs/research_roadmap.html",
2257
  "exists": true,
2258
+ "bytes": 31702,
2259
+ "sha256": "1b20a5cc342b3ba59ad808eed9f5bf978e2d9ac438c88b5c3eeba01f4e14b883"
2260
  }
2261
  },
2262
  "failures": []
 
2844
  "local": {
2845
  "path": "repo:FOUNDATION_MODEL_PLAN.md",
2846
  "exists": true,
2847
+ "bytes": 9075,
2848
+ "sha256": "444d13ab556d2e16a199a7fca191b87c85ab8685d167aab357bc6341839299a2"
2849
  },
2850
  "mirrors": {
2851
  "hf_space": {
2852
  "path": "hf_space:FOUNDATION_MODEL_PLAN.md",
2853
  "exists": true,
2854
+ "bytes": 9075,
2855
+ "sha256": "444d13ab556d2e16a199a7fca191b87c85ab8685d167aab357bc6341839299a2"
2856
  },
2857
  "hf_artifacts": {
2858
  "path": "hf_artifacts:FOUNDATION_MODEL_PLAN.md",
2859
  "exists": true,
2860
+ "bytes": 9075,
2861
+ "sha256": "444d13ab556d2e16a199a7fca191b87c85ab8685d167aab357bc6341839299a2"
2862
  },
2863
  "hf_model": {
2864
  "path": "hf_model:FOUNDATION_MODEL_PLAN.md",
2865
  "exists": true,
2866
+ "bytes": 9075,
2867
+ "sha256": "444d13ab556d2e16a199a7fca191b87c85ab8685d167aab357bc6341839299a2"
2868
  }
2869
  },
2870
  "failures": []
 
2937
  "local": {
2938
  "path": "repo:RESEARCH_ROADMAP.md",
2939
  "exists": true,
2940
+ "bytes": 8388,
2941
+ "sha256": "0b3e3356076998ad94dc39f708cc783a4ebeab76c9da661cdd37ea12a3bb3665"
2942
  },
2943
  "mirrors": {
2944
  "hf_space": {
2945
  "path": "hf_space:RESEARCH_ROADMAP.md",
2946
  "exists": true,
2947
+ "bytes": 8388,
2948
+ "sha256": "0b3e3356076998ad94dc39f708cc783a4ebeab76c9da661cdd37ea12a3bb3665"
2949
  },
2950
  "hf_artifacts": {
2951
  "path": "hf_artifacts:RESEARCH_ROADMAP.md",
2952
  "exists": true,
2953
+ "bytes": 8388,
2954
+ "sha256": "0b3e3356076998ad94dc39f708cc783a4ebeab76c9da661cdd37ea12a3bb3665"
2955
  },
2956
  "hf_model": {
2957
  "path": "hf_model:RESEARCH_ROADMAP.md",
2958
  "exists": true,
2959
+ "bytes": 8388,
2960
+ "sha256": "0b3e3356076998ad94dc39f708cc783a4ebeab76c9da661cdd37ea12a3bb3665"
2961
  }
2962
  },
2963
  "failures": []
 
2968
  "local": {
2969
  "path": "repo:PROJECT_STATUS.md",
2970
  "exists": true,
2971
+ "bytes": 7207,
2972
+ "sha256": "7baaba976ccc254da1a03ee2653057d1e08f3fb0c0cad035886c362442828720"
2973
  },
2974
  "mirrors": {
2975
  "hf_space": {
2976
  "path": "hf_space:PROJECT_STATUS.md",
2977
  "exists": true,
2978
+ "bytes": 7207,
2979
+ "sha256": "7baaba976ccc254da1a03ee2653057d1e08f3fb0c0cad035886c362442828720"
2980
  },
2981
  "hf_artifacts": {
2982
  "path": "hf_artifacts:PROJECT_STATUS.md",
2983
  "exists": true,
2984
+ "bytes": 7207,
2985
+ "sha256": "7baaba976ccc254da1a03ee2653057d1e08f3fb0c0cad035886c362442828720"
2986
  },
2987
  "hf_model": {
2988
  "path": "hf_model:PROJECT_STATUS.md",
2989
  "exists": true,
2990
+ "bytes": 7207,
2991
+ "sha256": "7baaba976ccc254da1a03ee2653057d1e08f3fb0c0cad035886c362442828720"
2992
  }
2993
  },
2994
  "failures": []
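Each group in the mirror report above pairs one `local` record with several `mirrors` records, and every record carries `exists`, `bytes`, and `sha256`. A minimal sketch of how one such group could be checked for consistency, assuming only the record shape visible in this diff. The function name `check_group` and the failure-message wording are hypothetical and are not taken from the repository's validation scripts:

```python
def check_group(group):
    """Return a list of failure strings for one local/mirrors group.

    Assumes the record shape shown in the diff: a "local" dict and a
    "mirrors" dict, each record holding "path", "exists", "bytes", "sha256".
    """
    local = group["local"]
    failures = []
    if not local.get("exists"):
        failures.append(f"missing local file: {local['path']}")
    for name, mirror in group.get("mirrors", {}).items():
        if not mirror.get("exists"):
            failures.append(f"{name}: missing {mirror['path']}")
            continue
        # A mirror is consistent only if size and digest both match local.
        for field in ("bytes", "sha256"):
            if mirror.get(field) != local.get(field):
                failures.append(f"{name}: {field} differs from local")
    return failures


# Values copied from the PROJECT_STATUS.md group on the added side of the diff.
DIGEST = "7baaba976ccc254da1a03ee2653057d1e08f3fb0c0cad035886c362442828720"
example_group = {
    "local": {"path": "repo:PROJECT_STATUS.md", "exists": True,
              "bytes": 7207, "sha256": DIGEST},
    "mirrors": {
        "hf_space": {"path": "hf_space:PROJECT_STATUS.md", "exists": True,
                     "bytes": 7207, "sha256": DIGEST},
        "hf_model": {"path": "hf_model:PROJECT_STATUS.md", "exists": True,
                     "bytes": 7207, "sha256": DIGEST},
    },
}

print(check_group(example_group))
```

An empty result corresponds to the `"failures": []` entries in the report. A stale mirror, for example one still holding the old 7138-byte file, would produce one message per mismatched field.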
docs/data/project_status.json CHANGED
@@ -82,7 +82,7 @@
82
  "RESEARCH_ROADMAP.md",
83
  "docs/data/research_roadmap.json"
84
  ],
85
- "readout": "The roadmap connects public-sample task development to 128-episode data preparation, Qwen3-Omni LoRA, foundation-model selection, robustness runs, and larger omni/world-model extensions."
86
  },
87
  {
88
  "area": "Foundation-model plan",
@@ -93,6 +93,14 @@
93
  ],
94
  "readout": "Qwen3-Omni remains the first trainable held-out LoRA baseline; Cosmos 3 is added as the first world-model/action-generation branch; OpenVLA/openpi/GR00T are policy candidates after action targets are explicit."
95
  },
96
  {
97
  "area": "Official dataset wording",
98
  "status": "verified",
@@ -167,6 +175,7 @@
167
  "Inspect RESEARCH_TAKEAWAYS.md and docs/data/research_takeaways.json before interpreting model scores.",
168
  "Inspect RESEARCH_ROADMAP.md and docs/data/research_roadmap.json for the path from public-sample task work to multi-episode modeling.",
169
  "Inspect FOUNDATION_MODEL_PLAN.md and docs/data/foundation_model_plan.json before choosing a backbone branch.",
 
170
  "Inspect docs/data/summary_metrics.json and results/episode_task_suite/neural_mlp/ to check the 12-task outputs.",
171
  "Inspect results/audio_ablation/AUDIO_ABLATION_SUMMARY.md before judging whether audio helps the current task suite.",
172
  "Inspect EVALUATION_PROTOCOL.md before judging task metrics or leakage controls.",
@@ -180,6 +189,7 @@
180
  "The current reconstruction task reconstructs feature vectors, not pixel-depth, mesh, NeRF, or Gaussian reconstruction.",
181
  "Audio is one of the synchronized source modalities in the current task representation.",
182
  "The audio ablation report compares audio/no-audio variants across all 12 task contracts in results/audio_ablation/.",
183
- "Foundation-model selection is explicit: Qwen3-Omni is the immediate trainable pilot, Cosmos 3 is the first world-model branch, and policy models such as OpenVLA/openpi/GR00T wait for action-target conversion."
 
184
  ]
185
  }
 
82
  "RESEARCH_ROADMAP.md",
83
  "docs/data/research_roadmap.json"
84
  ],
85
+ "readout": "The roadmap connects public-sample task development to 128-episode data preparation, Qwen3-Omni LoRA, foundation-model selection, robustness runs, world/policy branches, and the future Xperience-native pretraining goal."
86
  },
87
  {
88
  "area": "Foundation-model plan",
 
93
  ],
94
  "readout": "Qwen3-Omni remains the first trainable held-out LoRA baseline; Cosmos 3 is added as the first world-model/action-generation branch; OpenVLA/openpi/GR00T are policy candidates after action targets are explicit."
95
  },
96
+ {
97
+ "area": "Xperience Embodied Foundation Model",
98
+ "status": "future_goal",
99
+ "evidence": [
100
+ "XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md"
101
+ ],
102
+ "readout": "A future full-corpus pretraining plan describes target modules, objectives, staged scale-up, hardware ranges, and evaluation for a domain-specific embodied foundation model."
103
+ },
104
  {
105
  "area": "Official dataset wording",
106
  "status": "verified",
 
175
  "Inspect RESEARCH_TAKEAWAYS.md and docs/data/research_takeaways.json before interpreting model scores.",
176
  "Inspect RESEARCH_ROADMAP.md and docs/data/research_roadmap.json for the path from public-sample task work to multi-episode modeling.",
177
  "Inspect FOUNDATION_MODEL_PLAN.md and docs/data/foundation_model_plan.json before choosing a backbone branch.",
178
+ "Inspect XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md for the long-term full-corpus pretraining goal.",
179
  "Inspect docs/data/summary_metrics.json and results/episode_task_suite/neural_mlp/ to check the 12-task outputs.",
180
  "Inspect results/audio_ablation/AUDIO_ABLATION_SUMMARY.md before judging whether audio helps the current task suite.",
181
  "Inspect EVALUATION_PROTOCOL.md before judging task metrics or leakage controls.",
 
189
  "The current reconstruction task reconstructs feature vectors, not pixel-depth, mesh, NeRF, or Gaussian reconstruction.",
190
  "Audio is one of the synchronized source modalities in the current task representation.",
191
  "The audio ablation report compares audio/no-audio variants across all 12 task contracts in results/audio_ablation/.",
192
+ "Foundation-model selection is explicit: Qwen3-Omni is the immediate trainable pilot, Cosmos 3 is the first world-model branch, and policy models such as OpenVLA/openpi/GR00T wait for action-target conversion.",
193
+ "The Xperience Embodied Foundation Model is a future native-pretraining goal, not a completed model or current benchmark."
194
  ]
195
  }
docs/data/publication_audit.json CHANGED
@@ -1,6 +1,6 @@
1
  {
2
  "status": "pass",
3
- "generated_at_utc": "2026-06-04T16:49:00+00:00",
4
  "checks": [
5
  {
6
  "name": "required_publication_assets_present",
@@ -141,7 +141,7 @@
141
  "surface": "github_repo",
142
  "path": "README.md",
143
  "exists": true,
144
- "required_marker_count": 20,
145
  "missing_markers": [],
146
  "status": "pass"
147
  },
@@ -149,7 +149,7 @@
149
  "surface": "hf_space_bundle",
150
  "path": "README.md",
151
  "exists": true,
152
- "required_marker_count": 20,
153
  "missing_markers": [],
154
  "status": "pass"
155
  },
@@ -157,7 +157,7 @@
157
  "surface": "hf_artifact_bundle",
158
  "path": "README.md",
159
  "exists": true,
160
- "required_marker_count": 19,
161
  "missing_markers": [],
162
  "status": "pass"
163
  },
@@ -165,7 +165,7 @@
165
  "surface": "hf_artifact_bundle",
166
  "path": "PROJECT_README.md",
167
  "exists": true,
168
- "required_marker_count": 20,
169
  "missing_markers": [],
170
  "status": "pass"
171
  },
@@ -173,7 +173,7 @@
173
  "surface": "hf_model_bundle",
174
  "path": "README.md",
175
  "exists": true,
176
- "required_marker_count": 20,
177
  "missing_markers": [],
178
  "status": "pass"
179
  }
@@ -182,8 +182,8 @@
182
  "github_repo": {
183
  "root": "repo",
184
  "exists": true,
185
- "file_count": 386,
186
- "text_file_count": 320,
187
  "largest_file": {
188
  "path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
189
  "bytes": 55702978
@@ -193,8 +193,8 @@
193
  "hf_space_bundle": {
194
  "root": "hf_publish/space",
195
  "exists": true,
196
- "file_count": 316,
197
- "text_file_count": 250,
198
  "largest_file": {
199
  "path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
200
  "bytes": 55702978
@@ -204,8 +204,8 @@
204
  "hf_artifact_bundle": {
205
  "root": "hf_publish/artifacts",
206
  "exists": true,
207
- "file_count": 417,
208
- "text_file_count": 329,
209
  "largest_file": {
210
  "path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
211
  "bytes": 55702978
@@ -215,11 +215,11 @@
215
  "hf_model_bundle": {
216
  "root": "hf_publish/model",
217
  "exists": true,
218
- "file_count": 640,
219
- "text_file_count": 516,
220
  "largest_file": {
221
- "path": "artifacts/episode_task_suite/modality_reconstruction/predictions.npz",
222
- "bytes": 55702978
223
  },
224
  "violations": []
225
  }
 
1
  {
2
  "status": "pass",
3
+ "generated_at_utc": "2026-06-04T20:43:37+00:00",
4
  "checks": [
5
  {
6
  "name": "required_publication_assets_present",
 
141
  "surface": "github_repo",
142
  "path": "README.md",
143
  "exists": true,
144
+ "required_marker_count": 10,
145
  "missing_markers": [],
146
  "status": "pass"
147
  },
 
149
  "surface": "hf_space_bundle",
150
  "path": "README.md",
151
  "exists": true,
152
+ "required_marker_count": 10,
153
  "missing_markers": [],
154
  "status": "pass"
155
  },
 
157
  "surface": "hf_artifact_bundle",
158
  "path": "README.md",
159
  "exists": true,
160
+ "required_marker_count": 7,
161
  "missing_markers": [],
162
  "status": "pass"
163
  },
 
165
  "surface": "hf_artifact_bundle",
166
  "path": "PROJECT_README.md",
167
  "exists": true,
168
+ "required_marker_count": 10,
169
  "missing_markers": [],
170
  "status": "pass"
171
  },
 
173
  "surface": "hf_model_bundle",
174
  "path": "README.md",
175
  "exists": true,
176
+ "required_marker_count": 10,
177
  "missing_markers": [],
178
  "status": "pass"
179
  }
 
182
  "github_repo": {
183
  "root": "repo",
184
  "exists": true,
185
+ "file_count": 396,
186
+ "text_file_count": 330,
187
  "largest_file": {
188
  "path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
189
  "bytes": 55702978
 
193
  "hf_space_bundle": {
194
  "root": "hf_publish/space",
195
  "exists": true,
196
+ "file_count": 317,
197
+ "text_file_count": 251,
198
  "largest_file": {
199
  "path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
200
  "bytes": 55702978
 
204
  "hf_artifact_bundle": {
205
  "root": "hf_publish/artifacts",
206
  "exists": true,
207
+ "file_count": 418,
208
+ "text_file_count": 330,
209
  "largest_file": {
210
  "path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
211
  "bytes": 55702978
 
215
  "hf_model_bundle": {
216
  "root": "hf_publish/model",
217
  "exists": true,
218
+ "file_count": 644,
219
+ "text_file_count": 519,
220
  "largest_file": {
221
+ "path": "pytorch_model.bin",
222
+ "bytes": 93495480
223
  },
224
  "violations": []
225
  }
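The README surface records in the audit above share the fields `surface`, `path`, `exists`, `required_marker_count`, `missing_markers`, and `status`. A small sketch of how a reader might list the surfaces that fail the marker check, assuming the records have already been collected into a list. The helper name `failing_surfaces` is hypothetical, and the enclosing JSON key is not shown in this diff, so it is not assumed here:

```python
def failing_surfaces(records):
    """Return (surface, path) pairs whose README marker check did not pass."""
    return [
        (r["surface"], r["path"])
        for r in records
        if not r.get("exists")
        or r.get("missing_markers")
        or r.get("status") != "pass"
    ]


# Two records copied from the added side of the diff.
records = [
    {"surface": "github_repo", "path": "README.md", "exists": True,
     "required_marker_count": 10, "missing_markers": [], "status": "pass"},
    {"surface": "hf_artifact_bundle", "path": "README.md", "exists": True,
     "required_marker_count": 7, "missing_markers": [], "status": "pass"},
]

print(failing_surfaces(records))
```

The check ignores `required_marker_count` on purpose: the count differs between surfaces in the audit, so only `missing_markers` and `status` indicate a failure.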
docs/data/research_roadmap.json CHANGED
@@ -1,7 +1,7 @@
1
  {
2
  "title": "Ropedia Xperience-10M Research Roadmap",
3
- "summary": "Staged path from the public-sample task lab to multi-episode held-out evaluation, foundation-model selection, and larger omni/world-model extensions.",
4
- "current_decision_point": "Keep the public-sample task suite as the development harness, prepare the selected official Xperience-10M episodes for the held-out Qwen3-Omni pilot, then branch into Cosmos 3 world modeling and policy-model experiments after the data preparation path is stable.",
5
  "phases": [
6
  {
7
  "id": "public_sample_task_lab",
@@ -126,6 +126,30 @@
126
  "updated model cards"
127
  ],
128
  "reader_takeaway": "The long-term direction is richer multimodal representation learning for embodied-AI reasoning, with model branches chosen by task fit rather than by a single default backbone."
129
  }
130
  ],
131
  "public_surfaces_to_update": [
@@ -134,6 +158,7 @@
134
  "RESEARCH_TAKEAWAYS.md",
135
  "EVALUATION_PROTOCOL.md",
136
  "ARTIFACT_GUIDE.md",
 
137
  "docs/index.html",
138
  "docs/data/research_roadmap.json",
139
  "Hugging Face Space card",
 
1
  {
2
  "title": "Ropedia Xperience-10M Research Roadmap",
3
+ "summary": "Staged path from the public-sample task lab to multi-episode held-out evaluation, foundation-model selection, world/policy branches, and a future Xperience-native embodied foundation model.",
4
+ "current_decision_point": "Keep the public-sample task suite as the development harness, prepare the selected official Xperience-10M episodes for the held-out Qwen3-Omni pilot, then branch into Cosmos 3 world modeling and policy-model experiments after the data preparation path is stable. The Xperience Embodied Foundation Model is a later full-corpus pretraining goal, not a current result.",
5
  "phases": [
6
  {
7
  "id": "public_sample_task_lab",
 
126
  "updated model cards"
127
  ],
128
  "reader_takeaway": "The long-term direction is richer multimodal representation learning for embodied-AI reasoning, with model branches chosen by task fit rather than by a single default backbone."
129
+ },
130
+ {
131
+ "id": "xperience_embodied_foundation_pretraining",
132
+ "name": "Xperience Embodied Foundation Model Pretraining",
133
+ "status": "future",
134
+ "entry_condition": "Full-corpus access, PB-scale storage path, high-throughput data loading, multi-node compute, and positive scaling evidence from smaller multi-episode runs.",
135
+ "deliverables": [
136
+ "full-corpus episode and split manifests",
137
+ "pretraining shard and provenance manifests",
138
+ "0.3B-1B and 1B-3B scaling pilots",
139
+ "3B-7B Xperience-native domain model target",
140
+ "held-out episode/session/activity/object evaluations",
141
+ "missing-modality robustness report",
142
+ "model card and data-boundary report"
143
+ ],
144
+ "completion_evidence": [
145
+ "pretraining metadata",
146
+ "checkpoint inventory",
147
+ "scaling curves",
148
+ "held-out evaluation reports",
149
+ "qualitative retrieval or future-state examples",
150
+ "safety and data-boundary report"
151
+ ],
152
+ "reader_takeaway": "The final research direction is a domain-specific embodied foundation model trained directly on Xperience-10M, after smaller pilots justify the cost and infrastructure."
153
  }
154
  ],
155
  "public_surfaces_to_update": [
 
158
  "RESEARCH_TAKEAWAYS.md",
159
  "EVALUATION_PROTOCOL.md",
160
  "ARTIFACT_GUIDE.md",
161
+ "XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md",
162
  "docs/index.html",
163
  "docs/data/research_roadmap.json",
164
  "Hugging Face Space card",
docs/data/research_roadmap_interactive.json CHANGED
@@ -1837,7 +1837,8 @@
1837
  "NVIDIA GR00T"
1838
  ],
1839
  "first_world_model_branch": "Cosmos 3",
1840
- "immediate_trainable_backbone": "Qwen3-Omni"
 
1841
  },
1842
  "evaluation_additions": [
1843
  {
@@ -1921,6 +1922,11 @@
1921
  "action": "Publish branch results only with real manifests, predictions, metrics, and qualitative examples.",
1922
  "name": "Publishing threshold",
1923
  "step": 6
1924
  }
1925
  ],
1926
  "model_families": [
@@ -2023,6 +2029,21 @@
2023
  "Useful after action target design.",
2024
  "Less directly omni-modal than Qwen3-Omni or Cosmos 3."
2025
  ]
2026
  }
2027
  ],
2028
  "source_links": [
@@ -2057,11 +2078,15 @@
2057
  {
2058
  "label": "LeRobot / SmolVLA",
2059
  "url": "https://github.com/huggingface/lerobot"
2060
  }
2061
  ],
2062
  "status": "planning_artifact"
2063
  },
2064
- "generated_at_utc": "2026-06-04T16:42:13+00:00",
2065
  "omni_plan": {
2066
  "adapter": "LoRA rank 16, alpha 32, dropout 0.05",
2067
  "backbone": "Qwen/Qwen3-Omni-30B-A3B-Instruct",
@@ -2208,6 +2233,31 @@
2208
  "reader_takeaway": "The long-term direction is richer multimodal representation learning for embodied-AI reasoning, with model branches chosen by task fit rather than by a single default backbone.",
2209
  "stage": "future",
2210
  "status": "planned"
2211
  }
2212
  ],
2213
  "scale_up": {
 
1837
  "NVIDIA GR00T"
1838
  ],
1839
  "first_world_model_branch": "Cosmos 3",
1840
+ "immediate_trainable_backbone": "Qwen3-Omni",
1841
+ "long_term_native_pretraining_goal": "Xperience Embodied Foundation Model"
1842
  },
1843
  "evaluation_additions": [
1844
  {
 
1922
  "action": "Publish branch results only with real manifests, predictions, metrics, and qualitative examples.",
1923
  "name": "Publishing threshold",
1924
  "step": 6
1925
+ },
1926
+ {
1927
+ "action": "Start a from-scratch Xperience Embodied Foundation Model only after smaller scaling stages, full-corpus storage, multi-node compute, and held-out evaluation protocols are in place.",
1928
+ "name": "Xperience-native pretraining",
1929
+ "step": 7
1930
  }
1931
  ],
1932
  "model_families": [
 
2029
  "Useful after action target design.",
2030
  "Less directly omni-modal than Qwen3-Omni or Cosmos 3."
2031
  ]
2032
+ },
2033
+ {
2034
+ "best_role": "Domain model over synchronized embodied experience.",
2035
+ "category": "xperience_native_pretraining_goal",
2036
+ "current_decision": "future_goal_after_scaling_evidence",
2037
+ "entry_condition": "Full-corpus data path, PB-scale storage, multi-node compute, and positive smaller-run scaling evidence.",
2038
+ "family": "Xperience Embodied Foundation Model",
2039
+ "openness": "future project-specific model if full-corpus access and compute exist",
2040
+ "priority": 8,
2041
+ "public_source": "XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md",
2042
+ "xperience10m_fit": [
2043
+ "Uses the full aligned modality stack rather than treating sensors as auxiliary metadata.",
2044
+ "Targets temporal embodied representation learning across perception, motion, geometry, audio, and language.",
2045
+ "Can become the shared pretraining backbone for Qwen-style instruction tasks, Cosmos-style world modeling, and policy/action branches."
2046
+ ]
2047
  }
2048
  ],
2049
  "source_links": [
 
2078
  {
2079
  "label": "LeRobot / SmolVLA",
2080
  "url": "https://github.com/huggingface/lerobot"
2081
+ },
2082
+ {
2083
+ "label": "Xperience Embodied Foundation Model pretraining plan",
2084
+ "url": "XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md"
2085
  }
2086
  ],
2087
  "status": "planning_artifact"
2088
  },
2089
+ "generated_at_utc": "2026-06-04T20:40:29+00:00",
2090
  "omni_plan": {
2091
  "adapter": "LoRA rank 16, alpha 32, dropout 0.05",
2092
  "backbone": "Qwen/Qwen3-Omni-30B-A3B-Instruct",
 
2233
  "reader_takeaway": "The long-term direction is richer multimodal representation learning for embodied-AI reasoning, with model branches chosen by task fit rather than by a single default backbone.",
2234
  "stage": "future",
2235
  "status": "planned"
2236
+ },
2237
+ {
2238
+ "completion_evidence": [
2239
+ "pretraining metadata",
2240
+ "checkpoint inventory",
2241
+ "scaling curves",
2242
+ "held-out evaluation reports",
2243
+ "qualitative retrieval or future-state examples",
2244
+ "safety and data-boundary report"
2245
+ ],
2246
+ "deliverables": [
2247
+ "full-corpus episode and split manifests",
2248
+ "pretraining shard and provenance manifests",
2249
+ "0.3B-1B and 1B-3B scaling pilots",
2250
+ "3B-7B Xperience-native domain model target",
2251
+ "held-out episode/session/activity/object evaluations",
2252
+ "missing-modality robustness report",
2253
+ "model card and data-boundary report"
2254
+ ],
2255
+ "entry_condition": "Full-corpus access, PB-scale storage path, high-throughput data loading, multi-node compute, and positive scaling evidence from smaller multi-episode runs.",
2256
+ "id": "xperience_embodied_foundation_pretraining",
2257
+ "name": "Xperience Embodied Foundation Model Pretraining",
2258
+ "reader_takeaway": "The final research direction is a domain-specific embodied foundation model trained directly on Xperience-10M, after smaller pilots justify the cost and infrastructure.",
2259
+ "stage": "future",
2260
+ "status": "future"
2261
  }
2262
  ],
2263
  "scale_up": {
docs/index.html CHANGED
@@ -2141,9 +2141,11 @@
2141
  <p class="hero-copy">
2142
  This project uses the public Xperience-10M sample from Ropedia to explore
2143
  embodied-AI task design, multimodal feature construction, lightweight
2144
- baselines, and future Omni-model fine-tuning. It starts from the sample
2145
- episode available now, then keeps the same data contracts ready for
2146
- held-out multi-episode training when more Xperience-10M data is prepared.
 
 
2147
  </p>
2148
  <div class="hero-actions">
2149
  <a class="button primary" href="research_roadmap.html">Open roadmap</a>
@@ -2252,7 +2254,7 @@
2252
  </article>
2253
  <article class="brief-card">
2254
  <strong>Scale-up readiness</strong>
2255
- <p>Connects the same data contract to 32/128-episode held-out pilots, Qwen3-Omni LoRA, Cosmos-style world modeling, and later policy-model branches.</p>
2256
  </article>
2257
  </div>
2258
  <div class="brief-actions">
@@ -2356,7 +2358,7 @@
2356
  <div class="wrap">
2357
  <div class="section-head">
2358
  <h2>Research roadmap.</h2>
2359
- <p>The project path moves from the current public-sample task lab to multi-episode data preparation, held-out Qwen3-Omni fine-tuning, robustness runs, and larger foundation/world-model extensions.</p>
2360
  </div>
2361
  <div class="roadmap-grid" aria-label="Research roadmap stages">
2362
  <article class="roadmap-card" data-status="implemented">
@@ -2413,12 +2415,22 @@
2413
  <strong>Evidence</strong><p>Task-specific held-out evaluations, qualitative inspection, and updated model cards.</p>
2414
  </div>
2415
  </article>
2416
  </div>
2417
  <div class="roadmap-links">
2418
  <a href="research_roadmap.html">interactive roadmap</a>
2419
  <a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/RESEARCH_ROADMAP.md">roadmap document</a>
2420
  <a href="data/research_roadmap.json">roadmap stages</a>
2421
  <a href="data/foundation_model_plan.json">foundation model plan</a>
 
2422
  <a href="data/research_roadmap_interactive.json">interactive map</a>
2423
  <a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/DATA_ACCESS_STATUS.md">scale-up status</a>
2424
  <a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/PROJECT_STATUS.md">project status</a>
@@ -2438,7 +2450,7 @@
2438
  <article class="artifact"><h3>Metric contract</h3><p>All 12 tasks list input, target, primary metric, minimal baseline score, and neural MLP score from committed result files.</p><a href="data/summary_metrics.json">summary metrics</a></article>
2439
  <article class="artifact"><h3>Leakage controls</h3><p>Scalers fit on train windows only; future labels, target-side signals, caption/object labels, and contact labels stay on the target side unless explicitly queried.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/scripts/build_evaluation_protocol.py">builder script</a></article>
2440
  <article class="artifact"><h3>Audio ablation</h3><p>Audio and no-audio variants are evaluated across all 12 task contracts under the same chronological split.</p><a href="data/audio_ablation_summary.json">audio summary</a></article>
2441
- <article class="artifact"><h3>Foundation branch selection</h3><p>Qwen3-Omni is the first trainable baseline, Cosmos 3 becomes the world-model branch, and policy models wait for explicit action targets.</p><a href="data/foundation_model_plan.json">backbone plan</a></article>
2442
  <article class="artifact"><h3>Next evaluation stage</h3><p>This public-sample run covers single-episode task development. Cross-episode generalization, audio-visual learning, world modeling, policy targets, and held-out Qwen3-Omni training move to the multi-episode stage after selected data is prepared.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/DATA_ACCESS_STATUS.md">next-stage plan</a></article>
2443
  <article class="artifact"><h3>Scale-up requirement</h3><p>The Omni pilot requires selected prepared episodes, held-out episode splits, no train/test episode leakage, training metadata, predictions, metrics, and a run report.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/DATA_ACCESS_STATUS.md">scale-up status</a></article>
2444
  </div>
@@ -2492,10 +2504,11 @@
2492
  <article class="evidence-card">
2493
  <span class="status-pill">current plan</span>
2494
  <h3>Foundation backbones are separated by role</h3>
2495
- <p>Qwen3-Omni stays first for held-out LoRA; Cosmos 3 is the world-model branch; OpenVLA/openpi/GR00T are policy candidates after action-space conversion.</p>
2496
  <div class="evidence-links">
2497
  <a href="data/foundation_model_plan.json">foundation model plan</a>
2498
  <a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/FOUNDATION_MODEL_PLAN.md">plan doc</a>
 
2499
  </div>
2500
  </article>
2501
  <article class="evidence-card">
@@ -2628,10 +2641,11 @@
2628
  <article class="reading-card">
2629
  <span class="step-index">04</span>
2630
  <h3>Check the scale-up gate</h3>
2631
- <p>The multi-episode Qwen3-Omni path is prepared. The selected 128-episode result will be added after staging, preprocessing, training, and held-out evaluation pass.</p>
2632
  <div class="reading-links">
2633
  <a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/DATA_ACCESS_STATUS.md">scale-up status</a>
2634
  <a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/MULTI_EPISODE_ACCESS_STATUS.md">data access</a>
 
2635
  <a href="data/project_packet.json">reader path</a>
2636
  </div>
2637
  </article>
@@ -2659,7 +2673,7 @@
2659
  <article class="artifact"><h3>Current project subset</h3><p>One public sample episode, 5,821 frames, 1,161 aligned windows, 8,546-dimensional task inputs, and no raw-data redistribution.</p><a href="data/modality_atlas.json">modality atlas</a></article>
2660
  <article class="artifact"><h3>Covered now</h3><p>Action/subtask labels, next-action prediction, temporal diagnostics, hand trajectory, contact, object relevance, caption grounding, retrieval, reconstruction, and misalignment.</p><a href="data/summary_metrics.json">summary metrics</a></article>
2661
  <article class="artifact"><h3>Responsible use</h3><p>This project is for research exploration and excludes identity recognition, surveillance, biometric profiling, sensitive-attribute inference, and safety-critical deployment.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/DATA_NOTICE.md">use notes</a></article>
2662
- <article class="artifact"><h3>Later milestones</h3><p>Full audio-visual learning, caption generation, depth-pixel prediction, SLAM estimation, neural rendering, policy learning, cross-episode generalization, and held-out Qwen3-Omni evaluation.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/DATA_ACCESS_STATUS.md">scale-up status</a></article>
2663
  </div>
2664
  </div>
2665
  </section>
@@ -3103,10 +3117,11 @@
3103
  </div>
3104
  <div class="artifact-grid">
3105
  <article class="artifact primary-artifact"><div><h3>Project scope</h3><p>Connects implemented single-episode artifacts, setup-stage Omni work, the selected 128-episode pilot, and later multi-episode milestones.</p></div><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/EVIDENCE_CONTRACT.md">evidence contract</a></article>
3106
- <article class="artifact"><h3>Foundation-model plan</h3><p>Backbone selection matrix covering Qwen3-Omni, Cosmos 3, GR00T, OpenVLA/openpi, Gemini Robotics, Octo, and SmolVLA-style policy candidates.</p><a href="data/foundation_model_plan.json">foundation model plan</a></article>
3107
  <article class="artifact"><h3>Multi-episode data access</h3><p>Public data-access path, selected 128-episode pilot plan, and preparation requirements.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/MULTI_EPISODE_ACCESS_STATUS.md">data access</a></article>
3108
  <article class="artifact"><h3>Qwen3-Omni preparation</h3><p>Episode selection and manifest preparation for the current scale-up path.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/episode_manifest.json">preparation details</a></article>
3109
  <article class="artifact"><h3>Scale-up requirement</h3><p>What must be available before full pilot training and held-out metrics.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/DATA_ACCESS_STATUS.md">training requirements</a></article>
 
3110
  </div>
3111
  </section>
3112
 
@@ -3123,7 +3138,7 @@
3123
  <article class="artifact"><h3>Dataset notes</h3><p>Official dataset links, public sample source, modalities, access boundary, and current project subset.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/XPERIENCE10M_DATASET_CARD_ALIGNMENT.md">dataset notes</a></article>
3124
  <article class="artifact"><h3>Reproducibility</h3><p>Commands and expected outputs for rebuilding the public-sample task suite and visual artifacts.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/REPRODUCIBILITY.md">reproduce</a></article>
3125
  <article class="artifact"><h3>Qwen3-Omni status</h3><p>Data requirements and evaluation boundary for the selected multi-episode LoRA pilot.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/DATA_ACCESS_STATUS.md">training status</a></article>
3126
- <article class="artifact"><h3>Foundation-model plan</h3><p>Qwen3-Omni, Cosmos 3, GR00T, OpenVLA/openpi, Gemini Robotics, Octo, and SmolVLA-style branches by role.</p><a href="data/foundation_model_plan.json">model plan</a></article>
3127
  <article class="artifact"><h3>Hub artifacts</h3><p>Derived CSV/JSON/Markdown/figure artifacts without redistributing raw Xperience-10M data.</p><a href="https://huggingface.co/datasets/cy0307/ropedia-xperience-10m-task-suite-artifacts">artifact dataset</a></article>
3128
  <article class="artifact"><h3>Baseline models</h3><p>Lightweight minimal and neural task-head model files for the 12 task contracts.</p><a href="https://huggingface.co/cy0307/ropedia-xperience-10m-task-baselines">model repo</a></article>
3129
  </div>
@@ -3143,6 +3158,7 @@
3143
  <article class="artifact"><h3>Transfer</h3><p>Download raw episodes only from official gated sources, exclude visualization.rrd, validate files, then stage them for training.</p></article>
3144
  <article class="artifact"><h3>Current LoRA artifact</h3><p>The current LoRA artifact uses the locally available sample data. The multi-episode result begins after selected data is prepared, preprocessed, trained, and evaluated on held-out sessions.</p></article>
3145
  <article class="artifact"><h3>Backbone branches</h3><p>Qwen3-Omni is the immediate LoRA path; Cosmos 3 is the first world-model branch; GR00T/OpenVLA/openpi become policy branches after action targets are well-defined.</p><a href="data/foundation_model_plan.json">backbone plan</a></article>
 
3146
  </div>
3147
  </div>
3148
  </section>
 
2141
  <p class="hero-copy">
2142
  This project uses the public Xperience-10M sample from Ropedia to explore
2143
  embodied-AI task design, multimodal feature construction, lightweight
2144
+ baselines, future Omni-model fine-tuning, and the long-term path toward
2145
+ an Xperience-native embodied foundation model. It starts from the
2146
+ sample episode available now, then keeps the same data contracts ready
2147
+ for held-out multi-episode training when more Xperience-10M data is
2148
+ prepared.
2149
  </p>
2150
  <div class="hero-actions">
2151
  <a class="button primary" href="research_roadmap.html">Open roadmap</a>
 
2254
  </article>
2255
  <article class="brief-card">
2256
  <strong>Scale-up readiness</strong>
2257
+ <p>Connects the same data contract to 32/128-episode held-out pilots, Qwen3-Omni LoRA, Cosmos-style world modeling, policy-model branches, and the later Xperience-native pretraining goal.</p>
2258
  </article>
2259
  </div>
2260
  <div class="brief-actions">
 
2358
  <div class="wrap">
2359
  <div class="section-head">
2360
  <h2>Research roadmap.</h2>
2361
+ <p>The project path moves from the current public-sample task lab to multi-episode data preparation, held-out Qwen3-Omni fine-tuning, robustness runs, world/policy branches, and the future Xperience Embodied Foundation Model pretraining goal.</p>
2362
  </div>
2363
  <div class="roadmap-grid" aria-label="Research roadmap stages">
2364
  <article class="roadmap-card" data-status="implemented">
 
2415
  <strong>Evidence</strong><p>Task-specific held-out evaluations, qualitative inspection, and updated model cards.</p>
2416
  </div>
2417
  </article>
2418
+ <article class="roadmap-card" data-status="planned">
2419
+ <span class="roadmap-status">future</span>
2420
+ <h3>Xperience Embodied Foundation Model</h3>
2421
+ <p>Pretrain an Xperience-native domain model over synchronized video, audio, depth, pose, mocap, IMU, and language after smaller scaling stages prove value.</p>
2422
+ <div class="roadmap-meta">
2423
+ <strong>Entry</strong><p>Full-corpus access, PB-scale storage path, multi-node compute, and positive scaling evidence.</p>
2424
+ <strong>Evidence</strong><p>Pretraining manifests, scaling curves, held-out evaluations, checkpoint inventory, model card, and data-boundary report.</p>
2425
+ </div>
2426
+ </article>
2427
  </div>
2428
  <div class="roadmap-links">
2429
  <a href="research_roadmap.html">interactive roadmap</a>
2430
  <a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/RESEARCH_ROADMAP.md">roadmap document</a>
2431
  <a href="data/research_roadmap.json">roadmap stages</a>
2432
  <a href="data/foundation_model_plan.json">foundation model plan</a>
2433
+ <a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md">native pretraining plan</a>
2434
  <a href="data/research_roadmap_interactive.json">interactive map</a>
2435
  <a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/DATA_ACCESS_STATUS.md">scale-up status</a>
2436
  <a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/PROJECT_STATUS.md">project status</a>
 
2450
  <article class="artifact"><h3>Metric contract</h3><p>All 12 tasks list input, target, primary metric, minimal baseline score, and neural MLP score from committed result files.</p><a href="data/summary_metrics.json">summary metrics</a></article>
2451
  <article class="artifact"><h3>Leakage controls</h3><p>Scalers fit on train windows only; future labels, target-side signals, caption/object labels, and contact labels stay on the target side unless explicitly queried.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/scripts/build_evaluation_protocol.py">builder script</a></article>
2452
  <article class="artifact"><h3>Audio ablation</h3><p>Audio and no-audio variants are evaluated across all 12 task contracts under the same chronological split.</p><a href="data/audio_ablation_summary.json">audio summary</a></article>
2453
+ <article class="artifact"><h3>Foundation branch selection</h3><p>Qwen3-Omni is the first trainable baseline, Cosmos 3 becomes the world-model branch, policy models wait for explicit action targets, and Xperience-native pretraining remains a later full-corpus goal.</p><a href="data/foundation_model_plan.json">backbone plan</a></article>
2454
  <article class="artifact"><h3>Next evaluation stage</h3><p>This public-sample run covers single-episode task development. Cross-episode generalization, audio-visual learning, world modeling, policy targets, and held-out Qwen3-Omni training move to the multi-episode stage after selected data is prepared.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/DATA_ACCESS_STATUS.md">next-stage plan</a></article>
2455
  <article class="artifact"><h3>Scale-up requirement</h3><p>The Omni pilot requires selected prepared episodes, held-out episode splits, no train/test episode leakage, training metadata, predictions, metrics, and a run report.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/DATA_ACCESS_STATUS.md">scale-up status</a></article>
2456
  </div>
 
2504
  <article class="evidence-card">
2505
  <span class="status-pill">current plan</span>
2506
  <h3>Foundation backbones are separated by role</h3>
2507
+ <p>Qwen3-Omni stays first for held-out LoRA; Cosmos 3 is the world-model branch; OpenVLA/openpi/GR00T are policy candidates after action-space conversion; Xperience-native pretraining is the later full-corpus goal.</p>
2508
  <div class="evidence-links">
2509
  <a href="data/foundation_model_plan.json">foundation model plan</a>
2510
  <a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/FOUNDATION_MODEL_PLAN.md">plan doc</a>
2511
+ <a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md">pretraining plan</a>
2512
  </div>
2513
  </article>
2514
  <article class="evidence-card">
 
2641
  <article class="reading-card">
2642
  <span class="step-index">04</span>
2643
  <h3>Check the scale-up gate</h3>
2644
+ <p>The multi-episode Qwen3-Omni path is prepared. The selected 128-episode result will be added after staging, preprocessing, training, and held-out evaluation pass. The native-pretraining plan shows how this can grow into a full-corpus research direction.</p>
2645
  <div class="reading-links">
2646
  <a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/DATA_ACCESS_STATUS.md">scale-up status</a>
2647
  <a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/MULTI_EPISODE_ACCESS_STATUS.md">data access</a>
2648
+ <a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md">native pretraining</a>
2649
  <a href="data/project_packet.json">reader path</a>
2650
  </div>
2651
  </article>
 
2673
  <article class="artifact"><h3>Current project subset</h3><p>One public sample episode, 5,821 frames, 1,161 aligned windows, 8,546-dimensional task inputs, and no raw-data redistribution.</p><a href="data/modality_atlas.json">modality atlas</a></article>
2674
  <article class="artifact"><h3>Covered now</h3><p>Action/subtask labels, next-action prediction, temporal diagnostics, hand trajectory, contact, object relevance, caption grounding, retrieval, reconstruction, and misalignment.</p><a href="data/summary_metrics.json">summary metrics</a></article>
2675
  <article class="artifact"><h3>Responsible use</h3><p>This project is for research exploration and excludes identity recognition, surveillance, biometric profiling, sensitive-attribute inference, and safety-critical deployment.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/DATA_NOTICE.md">use notes</a></article>
2676
+ <article class="artifact"><h3>Later milestones</h3><p>Full audio-visual learning, caption generation, depth-pixel prediction, SLAM estimation, neural rendering, policy learning, cross-episode generalization, held-out Qwen3-Omni evaluation, and future Xperience-native pretraining.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md">native pretraining</a></article>
2677
  </div>
2678
  </div>
2679
  </section>
 
3117
  </div>
3118
  <div class="artifact-grid">
3119
  <article class="artifact primary-artifact"><div><h3>Project scope</h3><p>Connects implemented single-episode artifacts, setup-stage Omni work, the selected 128-episode pilot, and later multi-episode milestones.</p></div><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/EVIDENCE_CONTRACT.md">evidence contract</a></article>
3120
+ <article class="artifact"><h3>Foundation-model plan</h3><p>Backbone selection matrix covering Qwen3-Omni, Cosmos 3, GR00T, OpenVLA/openpi, Gemini Robotics, Octo, SmolVLA-style policy candidates, and the future Xperience-native pretraining goal.</p><a href="data/foundation_model_plan.json">foundation model plan</a></article>
3121
  <article class="artifact"><h3>Multi-episode data access</h3><p>Public data-access path, selected 128-episode pilot plan, and preparation requirements.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/MULTI_EPISODE_ACCESS_STATUS.md">data access</a></article>
3122
  <article class="artifact"><h3>Qwen3-Omni preparation</h3><p>Episode selection and manifest preparation for the current scale-up path.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/episode_manifest.json">preparation details</a></article>
3123
  <article class="artifact"><h3>Scale-up requirement</h3><p>What must be available before full pilot training and held-out metrics.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/DATA_ACCESS_STATUS.md">training requirements</a></article>
3124
+ <article class="artifact"><h3>Xperience-native pretraining</h3><p>Future plan for a domain-specific embodied foundation model trained from scratch over full-corpus video, audio, geometry, motion, inertial, and language streams.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md">pretraining plan</a></article>
3125
  </div>
3126
  </section>
3127
 
 
3138
  <article class="artifact"><h3>Dataset notes</h3><p>Official dataset links, public sample source, modalities, access boundary, and current project subset.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/XPERIENCE10M_DATASET_CARD_ALIGNMENT.md">dataset notes</a></article>
3139
  <article class="artifact"><h3>Reproducibility</h3><p>Commands and expected outputs for rebuilding the public-sample task suite and visual artifacts.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/REPRODUCIBILITY.md">reproduce</a></article>
3140
  <article class="artifact"><h3>Qwen3-Omni status</h3><p>Data requirements and evaluation boundary for the selected multi-episode LoRA pilot.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/DATA_ACCESS_STATUS.md">training status</a></article>
3141
+ <article class="artifact"><h3>Foundation-model plan</h3><p>Qwen3-Omni, Cosmos 3, GR00T, OpenVLA/openpi, Gemini Robotics, Octo, SmolVLA-style branches, and the Xperience-native pretraining goal by role.</p><a href="data/foundation_model_plan.json">model plan</a></article>
3142
  <article class="artifact"><h3>Hub artifacts</h3><p>Derived CSV/JSON/Markdown/figure artifacts without redistributing raw Xperience-10M data.</p><a href="https://huggingface.co/datasets/cy0307/ropedia-xperience-10m-task-suite-artifacts">artifact dataset</a></article>
3143
  <article class="artifact"><h3>Baseline models</h3><p>Lightweight minimal and neural task-head model files for the 12 task contracts.</p><a href="https://huggingface.co/cy0307/ropedia-xperience-10m-task-baselines">model repo</a></article>
3144
  </div>
 
3158
  <article class="artifact"><h3>Transfer</h3><p>Download raw episodes only from official gated sources, exclude visualization.rrd, validate files, then stage them for training.</p></article>
3159
  <article class="artifact"><h3>Current LoRA artifact</h3><p>The current LoRA artifact uses the locally available sample data. The multi-episode result begins after selected data is prepared, preprocessed, trained, and evaluated on held-out sessions.</p></article>
3160
  <article class="artifact"><h3>Backbone branches</h3><p>Qwen3-Omni is the immediate LoRA path; Cosmos 3 is the first world-model branch; GR00T/OpenVLA/openpi become policy branches after action targets are well-defined.</p><a href="data/foundation_model_plan.json">backbone plan</a></article>
3161
+ <article class="artifact"><h3>Native foundation model</h3><p>The long-term goal is a full-corpus Xperience Embodied Foundation Model trained on synchronized perception, geometry, motion, inertial, audio, and language streams after smaller scaling stages validate the approach.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md">pretraining plan</a></article>
3162
  </div>
3163
  </div>
3164
  </section>
docs/research_roadmap.html CHANGED
@@ -605,8 +605,9 @@
605
  <h1>Interactive Research Roadmap.</h1>
606
  <p class="hero-copy">
607
  This page connects the current public-sample task lab to the four research
608
- directions, the next multi-episode Qwen3-Omni fine-tuning path, and
609
- the later Cosmos 3 / policy-model branch choices. It loads
 
610
  directly from generated project artifacts, so the track and task views stay
611
  tied to the real sample metrics and scale-up status.
612
  </p>
@@ -630,7 +631,7 @@
630
  </div>
631
  <div class="route-step">
632
  <strong>03</strong>
633
- <div><b>Omni + branches</b><span>Qwen3-Omni first, Cosmos 3 and policy models after data preparation</span></div>
634
  <em id="routeOmni">pending data</em>
635
  </div>
636
  </div>
@@ -701,7 +702,7 @@
701
  },
702
  omni: {
703
  title: "Omni pilot and foundation branches",
704
- summary: "Run Qwen3-Omni first for the held-out LoRA pilot, then evaluate Cosmos 3 for world modeling and policy candidates after action targets are explicit.",
705
  }
706
  };
707
 
 
605
  <h1>Interactive Research Roadmap.</h1>
606
  <p class="hero-copy">
607
  This page connects the current public-sample task lab to the four research
608
+ directions, the next multi-episode Qwen3-Omni fine-tuning path, the
609
+ later Cosmos 3 / policy-model branch choices, and the future
610
+ Xperience-native foundation-model pretraining goal. It loads
611
  directly from generated project artifacts, so the track and task views stay
612
  tied to the real sample metrics and scale-up status.
613
  </p>
 
631
  </div>
632
  <div class="route-step">
633
  <strong>03</strong>
634
+ <div><b>Omni + branches</b><span>Qwen3-Omni first, Cosmos 3 and policy models next, native pretraining later</span></div>
635
  <em id="routeOmni">pending data</em>
636
  </div>
637
  </div>
 
702
  },
703
  omni: {
704
  title: "Omni pilot and foundation branches",
705
+ summary: "Run Qwen3-Omni first for the held-out LoRA pilot, evaluate Cosmos 3 for world modeling and policy candidates after action targets are explicit, then treat Xperience-native pretraining as the full-corpus future goal.",
706
  }
707
  };
708
 
index.html CHANGED
@@ -2141,9 +2141,11 @@
2141
  <p class="hero-copy">
2142
  This project uses the public Xperience-10M sample from Ropedia to explore
2143
  embodied-AI task design, multimodal feature construction, lightweight
2144
- baselines, and future Omni-model fine-tuning. It starts from the sample
2145
- episode available now, then keeps the same data contracts ready for
2146
- held-out multi-episode training when more Xperience-10M data is prepared.
2147
  </p>
2148
  <div class="hero-actions">
2149
  <a class="button primary" href="research_roadmap.html">Open roadmap</a>
@@ -2252,7 +2254,7 @@
2252
  </article>
2253
  <article class="brief-card">
2254
  <strong>Scale-up readiness</strong>
2255
- <p>Connects the same data contract to 32/128-episode held-out pilots, Qwen3-Omni LoRA, Cosmos-style world modeling, and later policy-model branches.</p>
2256
  </article>
2257
  </div>
2258
  <div class="brief-actions">
@@ -2356,7 +2358,7 @@
2356
  <div class="wrap">
2357
  <div class="section-head">
2358
  <h2>Research roadmap.</h2>
2359
- <p>The project path moves from the current public-sample task lab to multi-episode data preparation, held-out Qwen3-Omni fine-tuning, robustness runs, and larger foundation/world-model extensions.</p>
2360
  </div>
2361
  <div class="roadmap-grid" aria-label="Research roadmap stages">
2362
  <article class="roadmap-card" data-status="implemented">
@@ -2413,12 +2415,22 @@
2413
  <strong>Evidence</strong><p>Task-specific held-out evaluations, qualitative inspection, and updated model cards.</p>
2414
  </div>
2415
  </article>
2416
  </div>
2417
  <div class="roadmap-links">
2418
  <a href="research_roadmap.html">interactive roadmap</a>
2419
  <a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/RESEARCH_ROADMAP.md">roadmap document</a>
2420
  <a href="data/research_roadmap.json">roadmap stages</a>
2421
  <a href="data/foundation_model_plan.json">foundation model plan</a>
 
2422
  <a href="data/research_roadmap_interactive.json">interactive map</a>
2423
  <a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/DATA_ACCESS_STATUS.md">scale-up status</a>
2424
  <a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/PROJECT_STATUS.md">project status</a>
@@ -2438,7 +2450,7 @@
2438
  <article class="artifact"><h3>Metric contract</h3><p>All 12 tasks list input, target, primary metric, minimal baseline score, and neural MLP score from committed result files.</p><a href="data/summary_metrics.json">summary metrics</a></article>
2439
  <article class="artifact"><h3>Leakage controls</h3><p>Scalers fit on train windows only; future labels, target-side signals, caption/object labels, and contact labels stay on the target side unless explicitly queried.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/scripts/build_evaluation_protocol.py">builder script</a></article>
2440
  <article class="artifact"><h3>Audio ablation</h3><p>Audio and no-audio variants are evaluated across all 12 task contracts under the same chronological split.</p><a href="data/audio_ablation_summary.json">audio summary</a></article>
2441
- <article class="artifact"><h3>Foundation branch selection</h3><p>Qwen3-Omni is the first trainable baseline, Cosmos 3 becomes the world-model branch, and policy models wait for explicit action targets.</p><a href="data/foundation_model_plan.json">backbone plan</a></article>
2442
  <article class="artifact"><h3>Next evaluation stage</h3><p>This public-sample run covers single-episode task development. Cross-episode generalization, audio-visual learning, world modeling, policy targets, and held-out Qwen3-Omni training move to the multi-episode stage after selected data is prepared.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/DATA_ACCESS_STATUS.md">next-stage plan</a></article>
2443
  <article class="artifact"><h3>Scale-up requirement</h3><p>The Omni pilot requires selected prepared episodes, held-out episode splits, no train/test episode leakage, training metadata, predictions, metrics, and a run report.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/DATA_ACCESS_STATUS.md">scale-up status</a></article>
2444
  </div>
@@ -2492,10 +2504,11 @@
2492
  <article class="evidence-card">
2493
  <span class="status-pill">current plan</span>
2494
  <h3>Foundation backbones are separated by role</h3>
2495
- <p>Qwen3-Omni stays first for held-out LoRA; Cosmos 3 is the world-model branch; OpenVLA/openpi/GR00T are policy candidates after action-space conversion.</p>
2496
  <div class="evidence-links">
2497
  <a href="data/foundation_model_plan.json">foundation model plan</a>
2498
  <a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/FOUNDATION_MODEL_PLAN.md">plan doc</a>
 
2499
  </div>
2500
  </article>
2501
  <article class="evidence-card">
@@ -2628,10 +2641,11 @@
2628
  <article class="reading-card">
2629
  <span class="step-index">04</span>
2630
  <h3>Check the scale-up gate</h3>
2631
- <p>The multi-episode Qwen3-Omni path is prepared. The selected 128-episode result will be added after staging, preprocessing, training, and held-out evaluation pass.</p>
2632
  <div class="reading-links">
2633
  <a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/DATA_ACCESS_STATUS.md">scale-up status</a>
2634
  <a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/MULTI_EPISODE_ACCESS_STATUS.md">data access</a>
 
2635
  <a href="data/project_packet.json">reader path</a>
2636
  </div>
2637
  </article>
@@ -2659,7 +2673,7 @@
2659
  <article class="artifact"><h3>Current project subset</h3><p>One public sample episode, 5,821 frames, 1,161 aligned windows, 8,546-dimensional task inputs, and no raw-data redistribution.</p><a href="data/modality_atlas.json">modality atlas</a></article>
2660
  <article class="artifact"><h3>Covered now</h3><p>Action/subtask labels, next-action prediction, temporal diagnostics, hand trajectory, contact, object relevance, caption grounding, retrieval, reconstruction, and misalignment.</p><a href="data/summary_metrics.json">summary metrics</a></article>
2661
  <article class="artifact"><h3>Responsible use</h3><p>This project is for research exploration and excludes identity recognition, surveillance, biometric profiling, sensitive-attribute inference, and safety-critical deployment.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/DATA_NOTICE.md">use notes</a></article>
2662
- <article class="artifact"><h3>Later milestones</h3><p>Full audio-visual learning, caption generation, depth-pixel prediction, SLAM estimation, neural rendering, policy learning, cross-episode generalization, and held-out Qwen3-Omni evaluation.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/DATA_ACCESS_STATUS.md">scale-up status</a></article>
2663
  </div>
2664
  </div>
2665
  </section>
@@ -3103,10 +3117,11 @@
3103
  </div>
3104
  <div class="artifact-grid">
3105
  <article class="artifact primary-artifact"><div><h3>Project scope</h3><p>Connects implemented single-episode artifacts, setup-stage Omni work, the selected 128-episode pilot, and later multi-episode milestones.</p></div><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/EVIDENCE_CONTRACT.md">evidence contract</a></article>
3106
- <article class="artifact"><h3>Foundation-model plan</h3><p>Backbone selection matrix covering Qwen3-Omni, Cosmos 3, GR00T, OpenVLA/openpi, Gemini Robotics, Octo, and SmolVLA-style policy candidates.</p><a href="data/foundation_model_plan.json">foundation model plan</a></article>
3107
  <article class="artifact"><h3>Multi-episode data access</h3><p>Public data-access path, selected 128-episode pilot plan, and preparation requirements.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/MULTI_EPISODE_ACCESS_STATUS.md">data access</a></article>
3108
  <article class="artifact"><h3>Qwen3-Omni preparation</h3><p>Episode selection and manifest preparation for the current scale-up path.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/episode_manifest.json">preparation details</a></article>
3109
  <article class="artifact"><h3>Scale-up requirement</h3><p>What must be available before full pilot training and held-out metrics.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/DATA_ACCESS_STATUS.md">training requirements</a></article>
 
3110
  </div>
3111
  </section>
3112
 
@@ -3123,7 +3138,7 @@
3123
  <article class="artifact"><h3>Dataset notes</h3><p>Official dataset links, public sample source, modalities, access boundary, and current project subset.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/XPERIENCE10M_DATASET_CARD_ALIGNMENT.md">dataset notes</a></article>
3124
  <article class="artifact"><h3>Reproducibility</h3><p>Commands and expected outputs for rebuilding the public-sample task suite and visual artifacts.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/REPRODUCIBILITY.md">reproduce</a></article>
3125
  <article class="artifact"><h3>Qwen3-Omni status</h3><p>Data requirements and evaluation boundary for the selected multi-episode LoRA pilot.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/DATA_ACCESS_STATUS.md">training status</a></article>
3126
- <article class="artifact"><h3>Foundation-model plan</h3><p>Qwen3-Omni, Cosmos 3, GR00T, OpenVLA/openpi, Gemini Robotics, Octo, and SmolVLA-style branches by role.</p><a href="data/foundation_model_plan.json">model plan</a></article>
3127
  <article class="artifact"><h3>Hub artifacts</h3><p>Derived CSV/JSON/Markdown/figure artifacts without redistributing raw Xperience-10M data.</p><a href="https://huggingface.co/datasets/cy0307/ropedia-xperience-10m-task-suite-artifacts">artifact dataset</a></article>
3128
  <article class="artifact"><h3>Baseline models</h3><p>Lightweight minimal and neural task-head model files for the 12 task contracts.</p><a href="https://huggingface.co/cy0307/ropedia-xperience-10m-task-baselines">model repo</a></article>
3129
  </div>
@@ -3143,6 +3158,7 @@
3143
  <article class="artifact"><h3>Transfer</h3><p>Download raw episodes only from official gated sources, exclude visualization.rrd, validate files, then stage them for training.</p></article>
3144
  <article class="artifact"><h3>Current LoRA artifact</h3><p>The current LoRA artifact uses the locally available sample data. The multi-episode result begins after selected data is prepared, preprocessed, trained, and evaluated on held-out sessions.</p></article>
3145
  <article class="artifact"><h3>Backbone branches</h3><p>Qwen3-Omni is the immediate LoRA path; Cosmos 3 is the first world-model branch; GR00T/OpenVLA/openpi become policy branches after action targets are well-defined.</p><a href="data/foundation_model_plan.json">backbone plan</a></article>
 
3146
  </div>
3147
  </div>
3148
  </section>
 
2141
  <p class="hero-copy">
2142
  This project uses the public Xperience-10M sample from Ropedia to explore
2143
  embodied-AI task design, multimodal feature construction, lightweight
2144
+ baselines, future Omni-model fine-tuning, and the long-term path toward
2145
+ an Xperience-native embodied foundation model. It starts from the
2146
+ sample episode available now, then keeps the same data contracts ready
2147
+ for held-out multi-episode training when more Xperience-10M data is
2148
+ prepared.
2149
  </p>
2150
  <div class="hero-actions">
2151
  <a class="button primary" href="research_roadmap.html">Open roadmap</a>
 
2254
  </article>
2255
  <article class="brief-card">
2256
  <strong>Scale-up readiness</strong>
2257
+ <p>Connects the same data contract to 32/128-episode held-out pilots, Qwen3-Omni LoRA, Cosmos-style world modeling, policy-model branches, and the later Xperience-native pretraining goal.</p>
2258
  </article>
2259
  </div>
2260
  <div class="brief-actions">
 
2358
  <div class="wrap">
2359
  <div class="section-head">
2360
  <h2>Research roadmap.</h2>
2361
+ <p>The project path moves from the current public-sample task lab to multi-episode data preparation, held-out Qwen3-Omni fine-tuning, robustness runs, world/policy branches, and the future Xperience Embodied Foundation Model pretraining goal.</p>
2362
  </div>
2363
  <div class="roadmap-grid" aria-label="Research roadmap stages">
2364
  <article class="roadmap-card" data-status="implemented">
 
2415
  <strong>Evidence</strong><p>Task-specific held-out evaluations, qualitative inspection, and updated model cards.</p>
2416
  </div>
2417
  </article>
2418
+ <article class="roadmap-card" data-status="planned">
2419
+ <span class="roadmap-status">future</span>
2420
+ <h3>Xperience Embodied Foundation Model</h3>
2421
+ <p>Pretrain an Xperience-native domain model over synchronized video, audio, depth, pose, mocap, IMU, and language after smaller scaling stages prove value.</p>
2422
+ <div class="roadmap-meta">
2423
+ <strong>Entry</strong><p>Full-corpus access, PB-scale storage path, multi-node compute, and positive scaling evidence.</p>
2424
+ <strong>Evidence</strong><p>Pretraining manifests, scaling curves, held-out evaluations, checkpoint inventory, model card, and data-boundary report.</p>
2425
+ </div>
2426
+ </article>
2427
  </div>
2428
  <div class="roadmap-links">
2429
  <a href="research_roadmap.html">interactive roadmap</a>
2430
  <a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/RESEARCH_ROADMAP.md">roadmap document</a>
2431
  <a href="data/research_roadmap.json">roadmap stages</a>
2432
  <a href="data/foundation_model_plan.json">foundation model plan</a>
2433
+ <a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md">native pretraining plan</a>
2434
  <a href="data/research_roadmap_interactive.json">interactive map</a>
2435
  <a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/DATA_ACCESS_STATUS.md">scale-up status</a>
2436
  <a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/PROJECT_STATUS.md">project status</a>
 
2450
  <article class="artifact"><h3>Metric contract</h3><p>All 12 tasks list input, target, primary metric, minimal baseline score, and neural MLP score from committed result files.</p><a href="data/summary_metrics.json">summary metrics</a></article>
2451
  <article class="artifact"><h3>Leakage controls</h3><p>Scalers fit on train windows only; future labels, target-side signals, caption/object labels, and contact labels stay on the target side unless explicitly queried.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/scripts/build_evaluation_protocol.py">builder script</a></article>
2452
  <article class="artifact"><h3>Audio ablation</h3><p>Audio and no-audio variants are evaluated across all 12 task contracts under the same chronological split.</p><a href="data/audio_ablation_summary.json">audio summary</a></article>
2453
+ <article class="artifact"><h3>Foundation branch selection</h3><p>Qwen3-Omni is the first trainable baseline, Cosmos 3 becomes the world-model branch, policy models wait for explicit action targets, and Xperience-native pretraining remains a later full-corpus goal.</p><a href="data/foundation_model_plan.json">backbone plan</a></article>
2454
  <article class="artifact"><h3>Next evaluation stage</h3><p>This public-sample run covers single-episode task development. Cross-episode generalization, audio-visual learning, world modeling, policy targets, and held-out Qwen3-Omni training move to the multi-episode stage after selected data is prepared.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/DATA_ACCESS_STATUS.md">next-stage plan</a></article>
2455
  <article class="artifact"><h3>Scale-up requirement</h3><p>The Omni pilot requires selected prepared episodes, held-out episode splits, no train/test episode leakage, training metadata, predictions, metrics, and a run report.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/DATA_ACCESS_STATUS.md">scale-up status</a></article>
2456
  </div>
 
2504
  <article class="evidence-card">
2505
  <span class="status-pill">current plan</span>
2506
  <h3>Foundation backbones are separated by role</h3>
2507
+ <p>Qwen3-Omni stays first for held-out LoRA; Cosmos 3 is the world-model branch; OpenVLA/openpi/GR00T are policy candidates after action-space conversion; Xperience-native pretraining is the later full-corpus goal.</p>
2508
  <div class="evidence-links">
2509
  <a href="data/foundation_model_plan.json">foundation model plan</a>
2510
  <a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/FOUNDATION_MODEL_PLAN.md">plan doc</a>
2511
+ <a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md">pretraining plan</a>
2512
  </div>
2513
  </article>
2514
  <article class="evidence-card">
 
2641
  <article class="reading-card">
2642
  <span class="step-index">04</span>
2643
  <h3>Check the scale-up gate</h3>
2644
+ <p>The multi-episode Qwen3-Omni path is prepared. The selected 128-episode result will be added after staging, preprocessing, training, and held-out evaluation pass. The native-pretraining plan shows how this can grow into a full-corpus research direction.</p>
2645
  <div class="reading-links">
2646
  <a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/DATA_ACCESS_STATUS.md">scale-up status</a>
2647
  <a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/MULTI_EPISODE_ACCESS_STATUS.md">data access</a>
2648
+ <a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md">native pretraining</a>
2649
  <a href="data/project_packet.json">reader path</a>
2650
  </div>
2651
  </article>
 
2673
  <article class="artifact"><h3>Current project subset</h3><p>One public sample episode, 5,821 frames, 1,161 aligned windows, 8,546-dimensional task inputs, and no raw-data redistribution.</p><a href="data/modality_atlas.json">modality atlas</a></article>
2674
  <article class="artifact"><h3>Covered now</h3><p>Action/subtask labels, next-action prediction, temporal diagnostics, hand trajectory, contact, object relevance, caption grounding, retrieval, reconstruction, and misalignment.</p><a href="data/summary_metrics.json">summary metrics</a></article>
2675
  <article class="artifact"><h3>Responsible use</h3><p>This project is for research exploration and excludes identity recognition, surveillance, biometric profiling, sensitive-attribute inference, and safety-critical deployment.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/DATA_NOTICE.md">use notes</a></article>
2676
+ <article class="artifact"><h3>Later milestones</h3><p>Full audio-visual learning, caption generation, depth-pixel prediction, SLAM estimation, neural rendering, policy learning, cross-episode generalization, held-out Qwen3-Omni evaluation, and future Xperience-native pretraining.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md">native pretraining</a></article>
2677
  </div>
2678
  </div>
2679
  </section>
 
3117
  </div>
3118
  <div class="artifact-grid">
3119
  <article class="artifact primary-artifact"><div><h3>Project scope</h3><p>Connects implemented single-episode artifacts, setup-stage Omni work, the selected 128-episode pilot, and later multi-episode milestones.</p></div><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/EVIDENCE_CONTRACT.md">evidence contract</a></article>
3120
+ <article class="artifact"><h3>Foundation-model plan</h3><p>Backbone selection matrix covering Qwen3-Omni, Cosmos 3, GR00T, OpenVLA/openpi, Gemini Robotics, Octo, SmolVLA-style policy candidates, and the future Xperience-native pretraining goal.</p><a href="data/foundation_model_plan.json">foundation model plan</a></article>
3121
  <article class="artifact"><h3>Multi-episode data access</h3><p>Public data-access path, selected 128-episode pilot plan, and preparation requirements.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/MULTI_EPISODE_ACCESS_STATUS.md">data access</a></article>
3122
  <article class="artifact"><h3>Qwen3-Omni preparation</h3><p>Episode selection and manifest preparation for the current scale-up path.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/episode_manifest.json">preparation details</a></article>
3123
  <article class="artifact"><h3>Scale-up requirement</h3><p>What must be available before full pilot training and held-out metrics.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/DATA_ACCESS_STATUS.md">training requirements</a></article>
3124
+ <article class="artifact"><h3>Xperience-native pretraining</h3><p>Future plan for a domain-specific embodied foundation model trained from scratch over full-corpus video, audio, geometry, motion, inertial, and language streams.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md">pretraining plan</a></article>
3125
  </div>
3126
  </section>
3127
 
 
3138
  <article class="artifact"><h3>Dataset notes</h3><p>Official dataset links, public sample source, modalities, access boundary, and current project subset.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/XPERIENCE10M_DATASET_CARD_ALIGNMENT.md">dataset notes</a></article>
3139
  <article class="artifact"><h3>Reproducibility</h3><p>Commands and expected outputs for rebuilding the public-sample task suite and visual artifacts.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/REPRODUCIBILITY.md">reproduce</a></article>
3140
  <article class="artifact"><h3>Qwen3-Omni status</h3><p>Data requirements and evaluation boundary for the selected multi-episode LoRA pilot.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/DATA_ACCESS_STATUS.md">training status</a></article>
3141
+ <article class="artifact"><h3>Foundation-model plan</h3><p>Qwen3-Omni, Cosmos 3, GR00T, OpenVLA/openpi, Gemini Robotics, Octo, SmolVLA-style branches, and the Xperience-native pretraining goal by role.</p><a href="data/foundation_model_plan.json">model plan</a></article>
3142
  <article class="artifact"><h3>Hub artifacts</h3><p>Derived CSV/JSON/Markdown/figure artifacts without redistributing raw Xperience-10M data.</p><a href="https://huggingface.co/datasets/cy0307/ropedia-xperience-10m-task-suite-artifacts">artifact dataset</a></article>
3143
  <article class="artifact"><h3>Baseline models</h3><p>Lightweight minimal and neural task-head model files for the 12 task contracts.</p><a href="https://huggingface.co/cy0307/ropedia-xperience-10m-task-baselines">model repo</a></article>
3144
  </div>
 
3158
  <article class="artifact"><h3>Transfer</h3><p>Download raw episodes only from official gated sources, exclude visualization.rrd, validate files, then stage them for training.</p></article>
3159
  <article class="artifact"><h3>Current LoRA artifact</h3><p>The current LoRA artifact uses the locally available sample data. The multi-episode result begins after selected data is prepared, preprocessed, trained, and evaluated on held-out sessions.</p></article>
3160
  <article class="artifact"><h3>Backbone branches</h3><p>Qwen3-Omni is the immediate LoRA path; Cosmos 3 is the first world-model branch; GR00T/OpenVLA/openpi become policy branches after action targets are well-defined.</p><a href="data/foundation_model_plan.json">backbone plan</a></article>
3161
+ <article class="artifact"><h3>Native foundation model</h3><p>The long-term goal is a full-corpus Xperience Embodied Foundation Model trained on synchronized perception, geometry, motion, inertial, audio, and language streams after smaller scaling stages validate the approach.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md">pretraining plan</a></article>
3162
  </div>
3163
  </div>
3164
  </section>
metrics/artifact_index.json CHANGED
@@ -1,11 +1,11 @@
1
  {
2
  "title": "Ropedia Xperience-10M Task Suite Artifact Index",
3
- "generated_at_utc": "2026-06-04T16:42:13+00:00",
4
  "status": "pass",
5
- "artifact_count": 72,
6
  "missing": [],
7
  "by_kind": {
8
- "project_path": 11,
9
  "project_scope": 1,
10
  "source_alignment": 5,
11
  "publication_workflow": 1,
@@ -62,8 +62,8 @@
62
  "surface": "repo_hf",
63
  "shows": "Gives a compact current-state table for first-pass readers.",
64
  "exists": true,
65
- "bytes": 7138,
66
- "sha256": "67d85a198ee90082e47d790bd0f4d9dafbc97625cd39b17cc94b9785ec25104a"
67
  },
68
  {
69
  "id": "project_status_json",
@@ -73,8 +73,8 @@
73
  "surface": "website_hf",
74
  "shows": "Machine-readable copy of the current project status for website and HF mirrors.",
75
  "exists": true,
76
- "bytes": 9169,
77
- "sha256": "50d3c87b774c8375dcb897bd363d25e392e5fd6571571c41d56e623df15063f8"
78
  },
79
  {
80
  "id": "research_roadmap",
@@ -84,8 +84,8 @@
84
  "surface": "repo_hf",
85
  "shows": "Defines the path from public-sample task development to multi-episode held-out evaluation and larger omni-model extensions.",
86
  "exists": true,
87
- "bytes": 6677,
88
- "sha256": "58491bfb68ad3e6b7569bdb1a3cac3de7682a49beb9de368a114d58ebf0b118b"
89
  },
90
  {
91
  "id": "research_roadmap_json",
@@ -95,8 +95,8 @@
95
  "surface": "website_hf",
96
  "shows": "Machine-readable research roadmap for the website and Hugging Face mirrors.",
97
  "exists": true,
98
- "bytes": 5758,
99
- "sha256": "54657eb8824416d2128d6e5710543bdaf9e41d7c2fa46dd14ad6b58fede3b5db"
100
  },
101
  {
102
  "id": "foundation_model_plan",
@@ -106,8 +106,8 @@
106
  "surface": "repo_hf",
107
  "shows": "Defines the post-data-gate backbone choices: Qwen3-Omni first, Cosmos 3 for world modeling, and VLA/policy models after action-target conversion.",
108
  "exists": true,
109
- "bytes": 6559,
110
- "sha256": "955be6559b554f1c6c4141dd6ca2818127d89585df3940c2bd9b975ad9047926"
111
  },
112
  {
113
  "id": "foundation_model_plan_json",
@@ -117,8 +117,19 @@
117
  "surface": "website_hf",
118
  "shows": "Machine-readable foundation-model selection matrix with source links, entry conditions, and evaluation additions.",
119
  "exists": true,
120
- "bytes": 8889,
121
- "sha256": "e9b11114fa290253000b921575586780ccc3ba17665235259d4326c524f6ce97"
122
  },
123
  {
124
  "id": "evidence_contract",
@@ -150,8 +161,8 @@
150
  "surface": "repo_hf",
151
  "shows": "Gives the human-readable map from project scope to data, tasks, platform mirrors, and scale-up status.",
152
  "exists": true,
153
- "bytes": 16890,
154
- "sha256": "8bce9a773daf36214e377a7154b72a4493efd0f7d1a1941d5e0fc9bf784a29e5"
155
  },
156
  {
157
  "id": "official_dataset_card_alignment",
@@ -195,7 +206,7 @@
195
  "shows": "Machine-readable source-alignment pass/fail check for repo, website, and HF surfaces.",
196
  "exists": true,
197
  "bytes": 4432,
198
- "sha256": "96c7adc61c869fab71ef34ec2f6ec4f5f88af844509bd3d51d3818732d1f84b6"
199
  },
200
  {
201
  "id": "source_alignment_validator",
@@ -573,8 +584,8 @@
573
  "surface": "repo_hf",
574
  "shows": "Generates the selective artifact catalog from local files.",
575
  "exists": true,
576
- "bytes": 26568,
577
- "sha256": "a611b399e858560f6afb41e121f033724753c5167d04e0d7bf243e569de88f04"
578
  },
579
  {
580
  "id": "publication_audit",
@@ -585,7 +596,7 @@
585
  "volatile": true,
586
  "shows": "Confirms public bundles exclude raw data, caches, heavy archives, and credential text.",
587
  "exists": true,
588
- "bytes": 7289,
589
  "hash_policy": "existence_and_size_only"
590
  },
591
  {
@@ -597,7 +608,7 @@
597
  "volatile": true,
598
  "shows": "Separates setup paths from completed held-out-episode results.",
599
  "exists": true,
600
- "bytes": 19505,
601
  "hash_policy": "existence_and_size_only"
602
  },
603
  {
@@ -609,7 +620,7 @@
609
  "volatile": true,
610
  "shows": "Confirms prepared GitHub/HF Space/artifact/model mirrors share the same critical data, figure, website HTML, and validator files.",
611
  "exists": true,
612
- "bytes": 108617,
613
  "hash_policy": "existence_and_size_only"
614
  },
615
  {
@@ -621,7 +632,7 @@
621
  "volatile": true,
622
  "shows": "Confirms local website links, anchors, JSON data files, and referenced images resolve.",
623
  "exists": true,
624
- "bytes": 14923,
625
  "hash_policy": "existence_and_size_only"
626
  },
627
  {
 
1
  {
2
  "title": "Ropedia Xperience-10M Task Suite Artifact Index",
3
+ "generated_at_utc": "2026-06-04T20:40:52+00:00",
4
  "status": "pass",
5
+ "artifact_count": 73,
6
  "missing": [],
7
  "by_kind": {
8
+ "project_path": 12,
9
  "project_scope": 1,
10
  "source_alignment": 5,
11
  "publication_workflow": 1,
 
62
  "surface": "repo_hf",
63
  "shows": "Gives a compact current-state table for first-pass readers.",
64
  "exists": true,
65
+ "bytes": 7207,
66
+ "sha256": "7baaba976ccc254da1a03ee2653057d1e08f3fb0c0cad035886c362442828720"
67
  },
68
  {
69
  "id": "project_status_json",
 
73
  "surface": "website_hf",
74
  "shows": "Machine-readable copy of the current project status for website and HF mirrors.",
75
  "exists": true,
76
+ "bytes": 9874,
77
+ "sha256": "600c95726eae3404127a8b2110f35468ff2ba02943cae0fbcd3ea43c66109d3e"
78
  },
79
  {
80
  "id": "research_roadmap",
 
84
  "surface": "repo_hf",
85
  "shows": "Defines the path from public-sample task development to multi-episode held-out evaluation and larger omni-model extensions.",
86
  "exists": true,
87
+ "bytes": 8388,
88
+ "sha256": "0b3e3356076998ad94dc39f708cc783a4ebeab76c9da661cdd37ea12a3bb3665"
89
  },
90
  {
91
  "id": "research_roadmap_json",
 
95
  "surface": "website_hf",
96
  "shows": "Machine-readable research roadmap for the website and Hugging Face mirrors.",
97
  "exists": true,
98
+ "bytes": 7161,
99
+ "sha256": "cc96118c2c05108c831616151bc027441f7545495adeeb6a4a6a6bffe8da7801"
100
  },
101
  {
102
  "id": "foundation_model_plan",
 
106
  "surface": "repo_hf",
107
  "shows": "Defines the post-data-gate backbone choices: Qwen3-Omni first, Cosmos 3 for world modeling, and VLA/policy models after action-target conversion.",
108
  "exists": true,
109
+ "bytes": 9075,
110
+ "sha256": "444d13ab556d2e16a199a7fca191b87c85ab8685d167aab357bc6341839299a2"
111
  },
112
  {
113
  "id": "foundation_model_plan_json",
 
117
  "surface": "website_hf",
118
  "shows": "Machine-readable foundation-model selection matrix with source links, entry conditions, and evaluation additions.",
119
  "exists": true,
120
+ "bytes": 12981,
121
+ "sha256": "9cce52025a2e2f8afb4660e2af3353aea6ad0a1af380849218dd74c0acc271bb"
122
+ },
123
+ {
124
+ "id": "xperience_embodied_foundation_pretraining",
125
+ "title": "Xperience Embodied Foundation Model pretraining goal",
126
+ "path": "XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md",
127
+ "kind": "project_path",
128
+ "surface": "repo_hf",
129
+ "shows": "Describes the future full-corpus Xperience-native pretraining goal, target modules, objectives, staged scale-up, hardware ranges, and evaluation protocol.",
130
+ "exists": true,
131
+ "bytes": 9182,
132
+ "sha256": "b5a6ddc58647cd895a4772b110ecc9f4d685427fb37b81b22c6c02d2b9b323f1"
133
  },
134
  {
135
  "id": "evidence_contract",
 
161
  "surface": "repo_hf",
162
  "shows": "Gives the human-readable map from project scope to data, tasks, platform mirrors, and scale-up status.",
163
  "exists": true,
164
+ "bytes": 11440,
165
+ "sha256": "9b8821a9b14fe1744f2e6b5c419b2c5daaf70b57f1944caf1105c36c0c66c119"
166
  },
167
  {
168
  "id": "official_dataset_card_alignment",
 
206
  "shows": "Machine-readable source-alignment pass/fail check for repo, website, and HF surfaces.",
207
  "exists": true,
208
  "bytes": 4432,
209
+ "sha256": "06c6e2d111c72df01ed127fd288e6675b63e35a21ae12a2523931a072bd0bc49"
210
  },
211
  {
212
  "id": "source_alignment_validator",
 
584
  "surface": "repo_hf",
585
  "shows": "Generates the selective artifact catalog from local files.",
586
  "exists": true,
587
+ "bytes": 27020,
588
+ "sha256": "0ca7ed96f24caecbab31687cffa99f0eba8471258986412a294614e688c5aff5"
589
  },
590
  {
591
  "id": "publication_audit",
 
596
  "volatile": true,
597
  "shows": "Confirms public bundles exclude raw data, caches, heavy archives, and credential text.",
598
  "exists": true,
599
+ "bytes": 11811,
600
  "hash_policy": "existence_and_size_only"
601
  },
602
  {
 
608
  "volatile": true,
609
  "shows": "Separates setup paths from completed held-out-episode results.",
610
  "exists": true,
611
+ "bytes": 18981,
612
  "hash_policy": "existence_and_size_only"
613
  },
614
  {
 
620
  "volatile": true,
621
  "shows": "Confirms prepared GitHub/HF Space/artifact/model mirrors share the same critical data, figure, website HTML, and validator files.",
622
  "exists": true,
623
+ "bytes": 108621,
624
  "hash_policy": "existence_and_size_only"
625
  },
626
  {
 
632
  "volatile": true,
633
  "shows": "Confirms local website links, anchors, JSON data files, and referenced images resolve.",
634
  "exists": true,
635
+ "bytes": 14891,
636
  "hash_policy": "existence_and_size_only"
637
  },
638
  {
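The `bytes` and `sha256` fields updated in this index can be re-checked against local files. The sketch below is one way to do that, not the repository's own validator. It assumes each entry's `path` is relative to the repository root, and `check_entry` is a hypothetical helper name. Entries marked `"hash_policy": "existence_and_size_only"` are compared on size only, matching the volatile entries above.

```python
import hashlib
from pathlib import Path


def check_entry(entry: dict, root: Path) -> list[str]:
    """Compare one artifact-index entry against the file on disk.

    Hypothetical helper: assumes entry["path"] is relative to `root`.
    """
    path = root / entry["path"]
    if not path.is_file():
        # A missing file is only a problem when the index claims it exists.
        return [f"{entry['id']}: file missing"] if entry.get("exists") else []

    data = path.read_bytes()
    problems = []
    if "bytes" in entry and len(data) != entry["bytes"]:
        problems.append(f"{entry['id']}: size {len(data)} != {entry['bytes']}")

    # Volatile entries are marked "existence_and_size_only" and carry no hash.
    size_only = entry.get("hash_policy") == "existence_and_size_only"
    if not size_only and "sha256" in entry:
        digest = hashlib.sha256(data).hexdigest()
        if digest != entry["sha256"]:
            problems.append(f"{entry['id']}: sha256 mismatch")
    return problems
```

Calling `check_entry(entry, Path("."))` for each entry in the index and collecting the returned strings gives a list that should be empty when the index matches the checkout.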
metrics/foundation_model_plan.json CHANGED
@@ -2,6 +2,16 @@
2
  "title": "Xperience-10M Foundation Model Plan",
3
  "status": "planning_artifact",
4
  "current_boundary": "No held-out multi-episode foundation-model result has been completed in this repo. The current foundation-model artifacts are setup-stage until enough valid episodes are prepared and evaluated.",
5
  "decision": {
6
  "immediate_trainable_backbone": "Qwen3-Omni",
7
  "first_world_model_branch": "Cosmos 3",
@@ -10,7 +20,65 @@
10
  "openpi pi0/pi0.5",
11
  "NVIDIA GR00T"
12
  ],
13
- "external_reasoning_reference": "Gemini Robotics"
14
  },
15
  "model_families": [
16
  {
@@ -112,6 +180,21 @@
112
  "current_decision": "optional_baseline_after_data_staging",
113
  "entry_condition": "Action labels and baseline protocol exist.",
114
  "public_source": "https://github.com/huggingface/lerobot"
115
  }
116
  ],
117
  "execution_order": [
@@ -144,6 +227,11 @@
144
  "step": 6,
145
  "name": "Publishing threshold",
146
  "action": "Publish branch results only with real manifests, predictions, metrics, and qualitative examples."
147
  }
148
  ],
149
  "evaluation_additions": [
@@ -230,6 +318,10 @@
230
  {
231
  "label": "LeRobot / SmolVLA",
232
  "url": "https://github.com/huggingface/lerobot"
233
  }
234
  ]
235
  }
 
2
  "title": "Xperience-10M Foundation Model Plan",
3
  "status": "planning_artifact",
4
  "current_boundary": "No held-out multi-episode foundation-model result has been completed in this repo. The current foundation-model artifacts are setup-stage until enough valid episodes are prepared and evaluated.",
5
+ "backbone_registry": {
6
+ "config_dir": "configs/omni_backbones",
7
+ "validator": "scripts/omni/backbone_registry.py --validate --json",
8
+ "extension_contract": "OMNI_MODEL_EXTENSION_CONTRACT.md",
9
+ "implemented_backbone": "qwen3_omni_lora",
10
+ "planned_backbones": [
11
+ "cosmos_world_model",
12
+ "policy_vla_branch"
13
+ ]
14
+ },
15
  "decision": {
16
  "immediate_trainable_backbone": "Qwen3-Omni",
17
  "first_world_model_branch": "Cosmos 3",
 
20
  "openpi pi0/pi0.5",
21
  "NVIDIA GR00T"
22
  ],
23
+ "external_reasoning_reference": "Gemini Robotics",
24
+ "long_term_native_pretraining_goal": "Xperience Embodied Foundation Model"
25
+ },
26
+ "future_pretraining_goal": {
27
+ "name": "Xperience Embodied Foundation Model",
28
+ "status": "future_planning_goal",
29
+ "role": "Domain-specific embodied foundation model pretrained on full Xperience-10M if full-corpus data, storage, and compute become available.",
30
+ "not_current_result": true,
31
+ "document": "XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md",
32
+ "entry_conditions": [
33
+ "Selected multi-episode Qwen3-Omni pilot trains and evaluates cleanly.",
34
+ "Scaling from 128 episodes to thousands of episodes shows measurable value.",
35
+ "Full-corpus storage, derived-shard storage, and fast active-cache capacity are available.",
36
+ "Distributed training, checkpoint/restart, and provenance tracking are reliable.",
37
+ "Evaluation covers held-out episodes, sessions, activities, objects, and missing-modality robustness."
38
+ ],
39
+ "target_modules": [
40
+ "multi-view video encoder",
41
+ "audio encoder",
42
+ "depth and geometry encoder",
43
+ "pose/SLAM encoder",
44
+ "hand/body mocap encoder",
45
+ "IMU encoder",
46
+ "language encoder/decoder",
47
+ "temporal fusion transformer",
48
+ "task heads and decoders"
49
+ ],
50
+ "pretraining_objectives": [
51
+ "masked multimodal modeling",
52
+ "cross-modal contrastive alignment",
53
+ "future-state prediction",
54
+ "ego-motion and hand-motion forecasting",
55
+ "action and procedure prediction",
56
+ "language grounding and captioning",
57
+ "contact and affordance prediction",
58
+ "optional policy-style targets after action conversion"
59
+ ],
60
+ "hardware_ranges": [
61
+ {
62
+ "goal": "0.3B-1B pilot",
63
+ "compute": "8-32 modern 80GB-class data-center GPUs",
64
+ "use": "prove objectives and data loaders"
65
+ },
66
+ {
67
+ "goal": "1B-3B domain model",
68
+ "compute": "32-128 GPUs",
69
+ "use": "research-scale Xperience representation learning"
70
+ },
71
+ {
72
+ "goal": "3B-7B full-corpus domain model",
73
+ "compute": "128-512 GPUs",
74
+ "use": "first realistic full Xperience-native foundation model"
75
+ },
76
+ {
77
+ "goal": "30B-class omni model from scratch",
78
+ "compute": "512-2000+ GPUs",
79
+ "use": "lab-scale project after scaling curves justify cost"
80
+ }
81
+ ]
82
  },
83
  "model_families": [
84
  {
 
180
  "current_decision": "optional_baseline_after_data_staging",
181
  "entry_condition": "Action labels and baseline protocol exist.",
182
  "public_source": "https://github.com/huggingface/lerobot"
183
+ },
184
+ {
185
+ "priority": 8,
186
+ "family": "Xperience Embodied Foundation Model",
187
+ "category": "xperience_native_pretraining_goal",
188
+ "openness": "future project-specific model if full-corpus access and compute exist",
189
+ "best_role": "Domain model over synchronized embodied experience.",
190
+ "xperience10m_fit": [
191
+ "Uses the full aligned modality stack rather than treating sensors as auxiliary metadata.",
192
+ "Targets temporal embodied representation learning across perception, motion, geometry, audio, and language.",
193
+ "Can become the shared pretraining backbone for Qwen-style instruction tasks, Cosmos-style world modeling, and policy/action branches."
194
+ ],
195
+ "current_decision": "future_goal_after_scaling_evidence",
196
+ "entry_condition": "Full-corpus data path, PB-scale storage, multi-node compute, and positive smaller-run scaling evidence.",
197
+ "public_source": "XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md"
198
  }
199
  ],
200
  "execution_order": [
 
227
  "step": 6,
228
  "name": "Publishing threshold",
229
  "action": "Publish branch results only with real manifests, predictions, metrics, and qualitative examples."
230
+ },
231
+ {
232
+ "step": 7,
233
+ "name": "Xperience-native pretraining",
234
+ "action": "Start a from-scratch Xperience Embodied Foundation Model only after smaller scaling stages, full-corpus storage, multi-node compute, and held-out evaluation protocols are in place."
235
  }
236
  ],
237
  "evaluation_additions": [
 
318
  {
319
  "label": "LeRobot / SmolVLA",
320
  "url": "https://github.com/huggingface/lerobot"
321
+ },
322
+ {
323
+ "label": "Xperience Embodied Foundation Model pretraining plan",
324
+ "url": "XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md"
325
  }
326
  ]
327
  }
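The `metrics/mirror_parity.json` diff that follows records, for each artifact, a `local` entry and a set of `mirrors`, each with `exists`, `bytes`, and `sha256`, plus a `failures` list. A minimal sketch of how one such record could be computed is shown here. The project's actual parity script is not part of this diff, so the function names (`fingerprint`, `check_group`) and the failure-message wording are assumptions made for illustration only.

```python
import hashlib
from pathlib import Path


def fingerprint(path: Path) -> dict:
    """Return the existence flag, byte count, and SHA-256 digest for one file."""
    if not path.is_file():
        return {"exists": False, "bytes": None, "sha256": None}
    data = path.read_bytes()
    return {
        "exists": True,
        "bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def check_group(local: Path, mirrors: dict) -> dict:
    """Compare one local file against its mirrors.

    The record layout mirrors the fields visible in the diff; the failure
    strings are hypothetical.
    """
    local_fp = fingerprint(local)
    record = {
        "local": {"path": str(local), **local_fp},
        "mirrors": {},
        "failures": [],
    }
    if not local_fp["exists"]:
        record["failures"].append("local: missing")
    for name, mirror_path in mirrors.items():
        fp = fingerprint(mirror_path)
        record["mirrors"][name] = {"path": str(mirror_path), **fp}
        if not fp["exists"]:
            record["failures"].append(f"{name}: missing")
        elif (fp["bytes"], fp["sha256"]) != (local_fp["bytes"], local_fp["sha256"]):
            record["failures"].append(f"{name}: content differs from local")
    return record
```

Under this sketch, a group passes when its `failures` list is empty, which matches the `"failures": []` entries in the records. Comparing both byte count and digest is redundant in principle, but the byte count makes a mismatch easier to read at a glance.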
metrics/mirror_parity.json CHANGED
@@ -1,6 +1,6 @@
1
  {
2
  "status": "pass",
3
- "generated_at_utc": "2026-06-04T18:33:44+00:00",
4
  "hf_root": "hf_publish",
5
  "summary": {
6
  "group_count": 101,
@@ -71,27 +71,27 @@
71
  "local": {
72
  "path": "repo:docs/data/artifact_index.json",
73
  "exists": true,
74
- "bytes": 32296,
75
- "sha256": "5494e5ee1e40bc50d44a9cd6f77c8de694175939bda4f174fb5b1554e53ec508"
76
  },
77
  "mirrors": {
78
  "hf_space": {
79
  "path": "hf_space:data/artifact_index.json",
80
  "exists": true,
81
- "bytes": 32296,
82
- "sha256": "5494e5ee1e40bc50d44a9cd6f77c8de694175939bda4f174fb5b1554e53ec508"
83
  },
84
  "hf_artifacts": {
85
  "path": "hf_artifacts:docs/data/artifact_index.json",
86
  "exists": true,
87
- "bytes": 32296,
88
- "sha256": "5494e5ee1e40bc50d44a9cd6f77c8de694175939bda4f174fb5b1554e53ec508"
89
  },
90
  "hf_model": {
91
  "path": "hf_model:metrics/artifact_index.json",
92
  "exists": true,
93
- "bytes": 32296,
94
- "sha256": "5494e5ee1e40bc50d44a9cd6f77c8de694175939bda4f174fb5b1554e53ec508"
95
  }
96
  },
97
  "failures": []
@@ -226,27 +226,27 @@
226
  "local": {
227
  "path": "repo:docs/data/foundation_model_plan.json",
228
  "exists": true,
229
- "bytes": 8889,
230
- "sha256": "e9b11114fa290253000b921575586780ccc3ba17665235259d4326c524f6ce97"
231
  },
232
  "mirrors": {
233
  "hf_space": {
234
  "path": "hf_space:data/foundation_model_plan.json",
235
  "exists": true,
236
- "bytes": 8889,
237
- "sha256": "e9b11114fa290253000b921575586780ccc3ba17665235259d4326c524f6ce97"
238
  },
239
  "hf_artifacts": {
240
  "path": "hf_artifacts:docs/data/foundation_model_plan.json",
241
  "exists": true,
242
- "bytes": 8889,
243
- "sha256": "e9b11114fa290253000b921575586780ccc3ba17665235259d4326c524f6ce97"
244
  },
245
  "hf_model": {
246
  "path": "hf_model:metrics/foundation_model_plan.json",
247
  "exists": true,
248
- "bytes": 8889,
249
- "sha256": "e9b11114fa290253000b921575586780ccc3ba17665235259d4326c524f6ce97"
250
  }
251
  },
252
  "failures": []
@@ -412,27 +412,27 @@
412
  "local": {
413
  "path": "repo:docs/data/project_status.json",
414
  "exists": true,
415
- "bytes": 9169,
416
- "sha256": "50d3c87b774c8375dcb897bd363d25e392e5fd6571571c41d56e623df15063f8"
417
  },
418
  "mirrors": {
419
  "hf_space": {
420
  "path": "hf_space:data/project_status.json",
421
  "exists": true,
422
- "bytes": 9169,
423
- "sha256": "50d3c87b774c8375dcb897bd363d25e392e5fd6571571c41d56e623df15063f8"
424
  },
425
  "hf_artifacts": {
426
  "path": "hf_artifacts:docs/data/project_status.json",
427
  "exists": true,
428
- "bytes": 9169,
429
- "sha256": "50d3c87b774c8375dcb897bd363d25e392e5fd6571571c41d56e623df15063f8"
430
  },
431
  "hf_model": {
432
  "path": "hf_model:metrics/project_status.json",
433
  "exists": true,
434
- "bytes": 9169,
435
- "sha256": "50d3c87b774c8375dcb897bd363d25e392e5fd6571571c41d56e623df15063f8"
436
  }
437
  },
438
  "failures": []
@@ -444,26 +444,26 @@
444
  "path": "repo:docs/data/publication_audit.json",
445
  "exists": true,
446
  "bytes": 7237,
447
- "sha256": "a95c93592ba70709b2fad24a911d19329e6823f25862cd4fcb256788190dd0f2"
448
  },
449
  "mirrors": {
450
  "hf_space": {
451
  "path": "hf_space:data/publication_audit.json",
452
  "exists": true,
453
  "bytes": 7237,
454
- "sha256": "a95c93592ba70709b2fad24a911d19329e6823f25862cd4fcb256788190dd0f2"
455
  },
456
  "hf_artifacts": {
457
  "path": "hf_artifacts:docs/data/publication_audit.json",
458
  "exists": true,
459
  "bytes": 7237,
460
- "sha256": "a95c93592ba70709b2fad24a911d19329e6823f25862cd4fcb256788190dd0f2"
461
  },
462
  "hf_model": {
463
  "path": "hf_model:metrics/publication_audit.json",
464
  "exists": true,
465
  "bytes": 7237,
466
- "sha256": "a95c93592ba70709b2fad24a911d19329e6823f25862cd4fcb256788190dd0f2"
467
  }
468
  },
469
  "failures": []
@@ -598,27 +598,27 @@
598
  "local": {
599
  "path": "repo:docs/data/research_roadmap.json",
600
  "exists": true,
601
- "bytes": 5758,
602
- "sha256": "54657eb8824416d2128d6e5710543bdaf9e41d7c2fa46dd14ad6b58fede3b5db"
603
  },
604
  "mirrors": {
605
  "hf_space": {
606
  "path": "hf_space:data/research_roadmap.json",
607
  "exists": true,
608
- "bytes": 5758,
609
- "sha256": "54657eb8824416d2128d6e5710543bdaf9e41d7c2fa46dd14ad6b58fede3b5db"
610
  },
611
  "hf_artifacts": {
612
  "path": "hf_artifacts:docs/data/research_roadmap.json",
613
  "exists": true,
614
- "bytes": 5758,
615
- "sha256": "54657eb8824416d2128d6e5710543bdaf9e41d7c2fa46dd14ad6b58fede3b5db"
616
  },
617
  "hf_model": {
618
  "path": "hf_model:metrics/research_roadmap.json",
619
  "exists": true,
620
- "bytes": 5758,
621
- "sha256": "54657eb8824416d2128d6e5710543bdaf9e41d7c2fa46dd14ad6b58fede3b5db"
622
  }
623
  },
624
  "failures": []
@@ -629,27 +629,27 @@
629
  "local": {
630
  "path": "repo:docs/data/research_roadmap_interactive.json",
631
  "exists": true,
632
- "bytes": 131519,
633
- "sha256": "004fbcc7a3582da88dd66504d686604ecb0f04f65c9c8166bb0583e0fc174274"
634
  },
635
  "mirrors": {
636
  "hf_space": {
637
  "path": "hf_space:data/research_roadmap_interactive.json",
638
  "exists": true,
639
- "bytes": 131519,
640
- "sha256": "004fbcc7a3582da88dd66504d686604ecb0f04f65c9c8166bb0583e0fc174274"
641
  },
642
  "hf_artifacts": {
643
  "path": "hf_artifacts:docs/data/research_roadmap_interactive.json",
644
  "exists": true,
645
- "bytes": 131519,
646
- "sha256": "004fbcc7a3582da88dd66504d686604ecb0f04f65c9c8166bb0583e0fc174274"
647
  },
648
  "hf_model": {
649
  "path": "hf_model:metrics/research_roadmap_interactive.json",
650
  "exists": true,
651
- "bytes": 131519,
652
- "sha256": "004fbcc7a3582da88dd66504d686604ecb0f04f65c9c8166bb0583e0fc174274"
653
  }
654
  },
655
  "failures": []
@@ -1692,21 +1692,21 @@
1692
  "local": {
1693
  "path": "repo:scripts/build_artifact_index.py",
1694
  "exists": true,
1695
- "bytes": 26568,
1696
- "sha256": "a611b399e858560f6afb41e121f033724753c5167d04e0d7bf243e569de88f04"
1697
  },
1698
  "mirrors": {
1699
  "hf_artifacts": {
1700
  "path": "hf_artifacts:scripts/build_artifact_index.py",
1701
  "exists": true,
1702
- "bytes": 26568,
1703
- "sha256": "a611b399e858560f6afb41e121f033724753c5167d04e0d7bf243e569de88f04"
1704
  },
1705
  "hf_model": {
1706
  "path": "hf_model:scripts/build_artifact_index.py",
1707
  "exists": true,
1708
- "bytes": 26568,
1709
- "sha256": "a611b399e858560f6afb41e121f033724753c5167d04e0d7bf243e569de88f04"
1710
  }
1711
  },
1712
  "failures": []
@@ -2017,21 +2017,21 @@
2017
  "local": {
2018
  "path": "repo:scripts/validate_publication_package.py",
2019
  "exists": true,
2020
- "bytes": 17125,
2021
- "sha256": "51febee7a4caa4e3cbb3833c0c13ac502bd7106fdb3df06e868ed00bc8f9fd9e"
2022
  },
2023
  "mirrors": {
2024
  "hf_artifacts": {
2025
  "path": "hf_artifacts:scripts/validate_publication_package.py",
2026
  "exists": true,
2027
- "bytes": 17125,
2028
- "sha256": "51febee7a4caa4e3cbb3833c0c13ac502bd7106fdb3df06e868ed00bc8f9fd9e"
2029
  },
2030
  "hf_model": {
2031
  "path": "hf_model:scripts/validate_publication_package.py",
2032
  "exists": true,
2033
- "bytes": 17125,
2034
- "sha256": "51febee7a4caa4e3cbb3833c0c13ac502bd7106fdb3df06e868ed00bc8f9fd9e"
2035
  }
2036
  },
2037
  "failures": []
@@ -2217,21 +2217,21 @@
2217
  "local": {
2218
  "path": "repo:docs/index.html",
2219
  "exists": true,
2220
- "bytes": 172286,
2221
- "sha256": "a736850416c0061adddbb6ced5897efd1add499ec26e510b6fe21a4945b341c8"
2222
  },
2223
  "mirrors": {
2224
  "hf_space": {
2225
  "path": "hf_space:index.html",
2226
  "exists": true,
2227
- "bytes": 172286,
2228
- "sha256": "a736850416c0061adddbb6ced5897efd1add499ec26e510b6fe21a4945b341c8"
2229
  },
2230
  "hf_artifacts_docs": {
2231
  "path": "hf_artifacts:docs/index.html",
2232
  "exists": true,
2233
- "bytes": 172286,
2234
- "sha256": "a736850416c0061adddbb6ced5897efd1add499ec26e510b6fe21a4945b341c8"
2235
  }
2236
  },
2237
  "failures": []
@@ -2242,21 +2242,21 @@
2242
  "local": {
2243
  "path": "repo:docs/research_roadmap.html",
2244
  "exists": true,
2245
- "bytes": 31554,
2246
- "sha256": "f51e83a4495f2d2012ec4c48191d66ca4456a00d7fcb335a427b7d86afc66109"
2247
  },
2248
  "mirrors": {
2249
  "hf_space": {
2250
  "path": "hf_space:research_roadmap.html",
2251
  "exists": true,
2252
- "bytes": 31554,
2253
- "sha256": "f51e83a4495f2d2012ec4c48191d66ca4456a00d7fcb335a427b7d86afc66109"
2254
  },
2255
  "hf_artifacts_docs": {
2256
  "path": "hf_artifacts:docs/research_roadmap.html",
2257
  "exists": true,
2258
- "bytes": 31554,
2259
- "sha256": "f51e83a4495f2d2012ec4c48191d66ca4456a00d7fcb335a427b7d86afc66109"
2260
  }
2261
  },
2262
  "failures": []
@@ -2844,27 +2844,27 @@
2844
  "local": {
2845
  "path": "repo:FOUNDATION_MODEL_PLAN.md",
2846
  "exists": true,
2847
- "bytes": 6559,
2848
- "sha256": "955be6559b554f1c6c4141dd6ca2818127d89585df3940c2bd9b975ad9047926"
2849
  },
2850
  "mirrors": {
2851
  "hf_space": {
2852
  "path": "hf_space:FOUNDATION_MODEL_PLAN.md",
2853
  "exists": true,
2854
- "bytes": 6559,
2855
- "sha256": "955be6559b554f1c6c4141dd6ca2818127d89585df3940c2bd9b975ad9047926"
2856
  },
2857
  "hf_artifacts": {
2858
  "path": "hf_artifacts:FOUNDATION_MODEL_PLAN.md",
2859
  "exists": true,
2860
- "bytes": 6559,
2861
- "sha256": "955be6559b554f1c6c4141dd6ca2818127d89585df3940c2bd9b975ad9047926"
2862
  },
2863
  "hf_model": {
2864
  "path": "hf_model:FOUNDATION_MODEL_PLAN.md",
2865
  "exists": true,
2866
- "bytes": 6559,
2867
- "sha256": "955be6559b554f1c6c4141dd6ca2818127d89585df3940c2bd9b975ad9047926"
2868
  }
2869
  },
2870
  "failures": []
@@ -2937,27 +2937,27 @@
2937
  "local": {
2938
  "path": "repo:RESEARCH_ROADMAP.md",
2939
  "exists": true,
2940
- "bytes": 6677,
2941
- "sha256": "58491bfb68ad3e6b7569bdb1a3cac3de7682a49beb9de368a114d58ebf0b118b"
2942
  },
2943
  "mirrors": {
2944
  "hf_space": {
2945
  "path": "hf_space:RESEARCH_ROADMAP.md",
2946
  "exists": true,
2947
- "bytes": 6677,
2948
- "sha256": "58491bfb68ad3e6b7569bdb1a3cac3de7682a49beb9de368a114d58ebf0b118b"
2949
  },
2950
  "hf_artifacts": {
2951
  "path": "hf_artifacts:RESEARCH_ROADMAP.md",
2952
  "exists": true,
2953
- "bytes": 6677,
2954
- "sha256": "58491bfb68ad3e6b7569bdb1a3cac3de7682a49beb9de368a114d58ebf0b118b"
2955
  },
2956
  "hf_model": {
2957
  "path": "hf_model:RESEARCH_ROADMAP.md",
2958
  "exists": true,
2959
- "bytes": 6677,
2960
- "sha256": "58491bfb68ad3e6b7569bdb1a3cac3de7682a49beb9de368a114d58ebf0b118b"
2961
  }
2962
  },
2963
  "failures": []
@@ -2968,27 +2968,27 @@
2968
  "local": {
2969
  "path": "repo:PROJECT_STATUS.md",
2970
  "exists": true,
2971
- "bytes": 6648,
2972
- "sha256": "b052c725472f1d59232918a4d5b0f3668534c1e25e24189307159f5a0157d58f"
2973
  },
2974
  "mirrors": {
2975
  "hf_space": {
2976
  "path": "hf_space:PROJECT_STATUS.md",
2977
  "exists": true,
2978
- "bytes": 6648,
2979
- "sha256": "b052c725472f1d59232918a4d5b0f3668534c1e25e24189307159f5a0157d58f"
2980
  },
2981
  "hf_artifacts": {
2982
  "path": "hf_artifacts:PROJECT_STATUS.md",
2983
  "exists": true,
2984
- "bytes": 6648,
2985
- "sha256": "b052c725472f1d59232918a4d5b0f3668534c1e25e24189307159f5a0157d58f"
2986
  },
2987
  "hf_model": {
2988
  "path": "hf_model:PROJECT_STATUS.md",
2989
  "exists": true,
2990
- "bytes": 6648,
2991
- "sha256": "b052c725472f1d59232918a4d5b0f3668534c1e25e24189307159f5a0157d58f"
2992
  }
2993
  },
2994
  "failures": []
 
1
  {
2
  "status": "pass",
3
+ "generated_at_utc": "2026-06-04T20:45:22+00:00",
4
  "hf_root": "hf_publish",
5
  "summary": {
6
  "group_count": 101,
 
71
  "local": {
72
  "path": "repo:docs/data/artifact_index.json",
73
  "exists": true,
74
+ "bytes": 32864,
75
+ "sha256": "ec7d17898c42fd76109567c201f9638059b6a9a11a48817b32677a0eb2662178"
76
  },
77
  "mirrors": {
78
  "hf_space": {
79
  "path": "hf_space:data/artifact_index.json",
80
  "exists": true,
81
+ "bytes": 32864,
82
+ "sha256": "ec7d17898c42fd76109567c201f9638059b6a9a11a48817b32677a0eb2662178"
83
  },
84
  "hf_artifacts": {
85
  "path": "hf_artifacts:docs/data/artifact_index.json",
86
  "exists": true,
87
+ "bytes": 32864,
88
+ "sha256": "ec7d17898c42fd76109567c201f9638059b6a9a11a48817b32677a0eb2662178"
89
  },
90
  "hf_model": {
91
  "path": "hf_model:metrics/artifact_index.json",
92
  "exists": true,
93
+ "bytes": 32864,
94
+ "sha256": "ec7d17898c42fd76109567c201f9638059b6a9a11a48817b32677a0eb2662178"
95
  }
96
  },
97
  "failures": []
 
226
  "local": {
227
  "path": "repo:docs/data/foundation_model_plan.json",
228
  "exists": true,
229
+ "bytes": 12981,
230
+ "sha256": "9cce52025a2e2f8afb4660e2af3353aea6ad0a1af380849218dd74c0acc271bb"
231
  },
232
  "mirrors": {
233
  "hf_space": {
234
  "path": "hf_space:data/foundation_model_plan.json",
235
  "exists": true,
236
+ "bytes": 12981,
237
+ "sha256": "9cce52025a2e2f8afb4660e2af3353aea6ad0a1af380849218dd74c0acc271bb"
238
  },
239
  "hf_artifacts": {
240
  "path": "hf_artifacts:docs/data/foundation_model_plan.json",
241
  "exists": true,
242
+ "bytes": 12981,
243
+ "sha256": "9cce52025a2e2f8afb4660e2af3353aea6ad0a1af380849218dd74c0acc271bb"
244
  },
245
  "hf_model": {
246
  "path": "hf_model:metrics/foundation_model_plan.json",
247
  "exists": true,
248
+ "bytes": 12981,
249
+ "sha256": "9cce52025a2e2f8afb4660e2af3353aea6ad0a1af380849218dd74c0acc271bb"
250
  }
251
  },
252
  "failures": []
 
412
  "local": {
413
  "path": "repo:docs/data/project_status.json",
414
  "exists": true,
415
+ "bytes": 9874,
416
+ "sha256": "600c95726eae3404127a8b2110f35468ff2ba02943cae0fbcd3ea43c66109d3e"
417
  },
418
  "mirrors": {
419
  "hf_space": {
420
  "path": "hf_space:data/project_status.json",
421
  "exists": true,
422
+ "bytes": 9874,
423
+ "sha256": "600c95726eae3404127a8b2110f35468ff2ba02943cae0fbcd3ea43c66109d3e"
424
  },
425
  "hf_artifacts": {
426
  "path": "hf_artifacts:docs/data/project_status.json",
427
  "exists": true,
428
+ "bytes": 9874,
429
+ "sha256": "600c95726eae3404127a8b2110f35468ff2ba02943cae0fbcd3ea43c66109d3e"
430
  },
431
  "hf_model": {
432
  "path": "hf_model:metrics/project_status.json",
433
  "exists": true,
434
+ "bytes": 9874,
435
+ "sha256": "600c95726eae3404127a8b2110f35468ff2ba02943cae0fbcd3ea43c66109d3e"
436
  }
437
  },
438
  "failures": []
 
444
  "path": "repo:docs/data/publication_audit.json",
445
  "exists": true,
446
  "bytes": 7237,
447
+ "sha256": "7fbb19f8990b1a4d902e282c010d27e4391755564fa68af97d96c298c6b054f8"
448
  },
449
  "mirrors": {
450
  "hf_space": {
451
  "path": "hf_space:data/publication_audit.json",
452
  "exists": true,
453
  "bytes": 7237,
454
+ "sha256": "7fbb19f8990b1a4d902e282c010d27e4391755564fa68af97d96c298c6b054f8"
455
  },
456
  "hf_artifacts": {
457
  "path": "hf_artifacts:docs/data/publication_audit.json",
458
  "exists": true,
459
  "bytes": 7237,
460
+ "sha256": "7fbb19f8990b1a4d902e282c010d27e4391755564fa68af97d96c298c6b054f8"
461
  },
462
  "hf_model": {
463
  "path": "hf_model:metrics/publication_audit.json",
464
  "exists": true,
465
  "bytes": 7237,
466
+ "sha256": "7fbb19f8990b1a4d902e282c010d27e4391755564fa68af97d96c298c6b054f8"
467
  }
468
  },
469
  "failures": []
 
598
  "local": {
599
  "path": "repo:docs/data/research_roadmap.json",
600
  "exists": true,
601
+ "bytes": 7161,
602
+ "sha256": "cc96118c2c05108c831616151bc027441f7545495adeeb6a4a6a6bffe8da7801"
603
  },
604
  "mirrors": {
605
  "hf_space": {
606
  "path": "hf_space:data/research_roadmap.json",
607
  "exists": true,
608
+ "bytes": 7161,
609
+ "sha256": "cc96118c2c05108c831616151bc027441f7545495adeeb6a4a6a6bffe8da7801"
610
  },
611
  "hf_artifacts": {
612
  "path": "hf_artifacts:docs/data/research_roadmap.json",
613
  "exists": true,
614
+ "bytes": 7161,
615
+ "sha256": "cc96118c2c05108c831616151bc027441f7545495adeeb6a4a6a6bffe8da7801"
616
  },
617
  "hf_model": {
618
  "path": "hf_model:metrics/research_roadmap.json",
619
  "exists": true,
620
+ "bytes": 7161,
621
+ "sha256": "cc96118c2c05108c831616151bc027441f7545495adeeb6a4a6a6bffe8da7801"
622
  }
623
  },
624
  "failures": []
 
629
  "local": {
630
  "path": "repo:docs/data/research_roadmap_interactive.json",
631
  "exists": true,
632
+ "bytes": 134282,
633
+ "sha256": "ff37219a9f1d9b386a9d4c42766e4aa28f10ce6ef338dceeedd6bdb4a1b2c40a"
634
  },
635
  "mirrors": {
636
  "hf_space": {
637
  "path": "hf_space:data/research_roadmap_interactive.json",
638
  "exists": true,
639
+ "bytes": 134282,
640
+ "sha256": "ff37219a9f1d9b386a9d4c42766e4aa28f10ce6ef338dceeedd6bdb4a1b2c40a"
641
  },
642
  "hf_artifacts": {
643
  "path": "hf_artifacts:docs/data/research_roadmap_interactive.json",
644
  "exists": true,
645
+ "bytes": 134282,
646
+ "sha256": "ff37219a9f1d9b386a9d4c42766e4aa28f10ce6ef338dceeedd6bdb4a1b2c40a"
647
  },
648
  "hf_model": {
649
  "path": "hf_model:metrics/research_roadmap_interactive.json",
650
  "exists": true,
651
+ "bytes": 134282,
652
+ "sha256": "ff37219a9f1d9b386a9d4c42766e4aa28f10ce6ef338dceeedd6bdb4a1b2c40a"
653
  }
654
  },
655
  "failures": []
 
1692
  "local": {
1693
  "path": "repo:scripts/build_artifact_index.py",
1694
  "exists": true,
1695
+ "bytes": 27020,
1696
+ "sha256": "0ca7ed96f24caecbab31687cffa99f0eba8471258986412a294614e688c5aff5"
1697
  },
1698
  "mirrors": {
1699
  "hf_artifacts": {
1700
  "path": "hf_artifacts:scripts/build_artifact_index.py",
1701
  "exists": true,
1702
+ "bytes": 27020,
1703
+ "sha256": "0ca7ed96f24caecbab31687cffa99f0eba8471258986412a294614e688c5aff5"
1704
  },
1705
  "hf_model": {
1706
  "path": "hf_model:scripts/build_artifact_index.py",
1707
  "exists": true,
1708
+ "bytes": 27020,
1709
+ "sha256": "0ca7ed96f24caecbab31687cffa99f0eba8471258986412a294614e688c5aff5"
1710
  }
1711
  },
1712
  "failures": []
 
2017
  "local": {
2018
  "path": "repo:scripts/validate_publication_package.py",
2019
  "exists": true,
2020
+ "bytes": 17197,
2021
+ "sha256": "2a617f3204ffb8c59d1c5bc1828b4441a4d014bb531655fd0613e128a6d9abc2"
2022
  },
2023
  "mirrors": {
2024
  "hf_artifacts": {
2025
  "path": "hf_artifacts:scripts/validate_publication_package.py",
2026
  "exists": true,
2027
+ "bytes": 17197,
2028
+ "sha256": "2a617f3204ffb8c59d1c5bc1828b4441a4d014bb531655fd0613e128a6d9abc2"
2029
  },
2030
  "hf_model": {
2031
  "path": "hf_model:scripts/validate_publication_package.py",
2032
  "exists": true,
2033
+ "bytes": 17197,
2034
+ "sha256": "2a617f3204ffb8c59d1c5bc1828b4441a4d014bb531655fd0613e128a6d9abc2"
2035
  }
2036
  },
2037
  "failures": []
 
2217
  "local": {
2218
  "path": "repo:docs/index.html",
2219
  "exists": true,
2220
+ "bytes": 174923,
2221
+ "sha256": "099fcc01cbb4d50f62c508b10f343f05b1c883962b85bda294bcede99af2a0f1"
2222
  },
2223
  "mirrors": {
2224
  "hf_space": {
2225
  "path": "hf_space:index.html",
2226
  "exists": true,
2227
+ "bytes": 174923,
2228
+ "sha256": "099fcc01cbb4d50f62c508b10f343f05b1c883962b85bda294bcede99af2a0f1"
2229
  },
2230
  "hf_artifacts_docs": {
2231
  "path": "hf_artifacts:docs/index.html",
2232
  "exists": true,
2233
+ "bytes": 174923,
2234
+ "sha256": "099fcc01cbb4d50f62c508b10f343f05b1c883962b85bda294bcede99af2a0f1"
2235
  }
2236
  },
2237
  "failures": []
 
2242
  "local": {
2243
  "path": "repo:docs/research_roadmap.html",
2244
  "exists": true,
2245
+ "bytes": 31702,
2246
+ "sha256": "1b20a5cc342b3ba59ad808eed9f5bf978e2d9ac438c88b5c3eeba01f4e14b883"
2247
  },
2248
  "mirrors": {
2249
  "hf_space": {
2250
  "path": "hf_space:research_roadmap.html",
2251
  "exists": true,
2252
+ "bytes": 31702,
2253
+ "sha256": "1b20a5cc342b3ba59ad808eed9f5bf978e2d9ac438c88b5c3eeba01f4e14b883"
2254
  },
2255
  "hf_artifacts_docs": {
2256
  "path": "hf_artifacts:docs/research_roadmap.html",
2257
  "exists": true,
2258
+ "bytes": 31702,
2259
+ "sha256": "1b20a5cc342b3ba59ad808eed9f5bf978e2d9ac438c88b5c3eeba01f4e14b883"
2260
  }
2261
  },
2262
  "failures": []
 
2844
  "local": {
2845
  "path": "repo:FOUNDATION_MODEL_PLAN.md",
2846
  "exists": true,
2847
+ "bytes": 9075,
2848
+ "sha256": "444d13ab556d2e16a199a7fca191b87c85ab8685d167aab357bc6341839299a2"
2849
  },
2850
  "mirrors": {
2851
  "hf_space": {
2852
  "path": "hf_space:FOUNDATION_MODEL_PLAN.md",
2853
  "exists": true,
2854
+ "bytes": 9075,
2855
+ "sha256": "444d13ab556d2e16a199a7fca191b87c85ab8685d167aab357bc6341839299a2"
2856
  },
2857
  "hf_artifacts": {
2858
  "path": "hf_artifacts:FOUNDATION_MODEL_PLAN.md",
2859
  "exists": true,
2860
+ "bytes": 9075,
2861
+ "sha256": "444d13ab556d2e16a199a7fca191b87c85ab8685d167aab357bc6341839299a2"
2862
  },
2863
  "hf_model": {
2864
  "path": "hf_model:FOUNDATION_MODEL_PLAN.md",
2865
  "exists": true,
2866
+ "bytes": 9075,
2867
+ "sha256": "444d13ab556d2e16a199a7fca191b87c85ab8685d167aab357bc6341839299a2"
2868
  }
2869
  },
2870
  "failures": []
 
2937
  "local": {
2938
  "path": "repo:RESEARCH_ROADMAP.md",
2939
  "exists": true,
2940
+ "bytes": 8388,
2941
+ "sha256": "0b3e3356076998ad94dc39f708cc783a4ebeab76c9da661cdd37ea12a3bb3665"
2942
  },
2943
  "mirrors": {
2944
  "hf_space": {
2945
  "path": "hf_space:RESEARCH_ROADMAP.md",
2946
  "exists": true,
2947
+ "bytes": 8388,
2948
+ "sha256": "0b3e3356076998ad94dc39f708cc783a4ebeab76c9da661cdd37ea12a3bb3665"
2949
  },
2950
  "hf_artifacts": {
2951
  "path": "hf_artifacts:RESEARCH_ROADMAP.md",
2952
  "exists": true,
2953
+ "bytes": 8388,
2954
+ "sha256": "0b3e3356076998ad94dc39f708cc783a4ebeab76c9da661cdd37ea12a3bb3665"
2955
  },
2956
  "hf_model": {
2957
  "path": "hf_model:RESEARCH_ROADMAP.md",
2958
  "exists": true,
2959
+ "bytes": 8388,
2960
+ "sha256": "0b3e3356076998ad94dc39f708cc783a4ebeab76c9da661cdd37ea12a3bb3665"
2961
  }
2962
  },
2963
  "failures": []
 
2968
  "local": {
2969
  "path": "repo:PROJECT_STATUS.md",
2970
  "exists": true,
2971
+ "bytes": 7207,
2972
+ "sha256": "7baaba976ccc254da1a03ee2653057d1e08f3fb0c0cad035886c362442828720"
2973
  },
2974
  "mirrors": {
2975
  "hf_space": {
2976
  "path": "hf_space:PROJECT_STATUS.md",
2977
  "exists": true,
2978
+ "bytes": 7207,
2979
+ "sha256": "7baaba976ccc254da1a03ee2653057d1e08f3fb0c0cad035886c362442828720"
2980
  },
2981
  "hf_artifacts": {
2982
  "path": "hf_artifacts:PROJECT_STATUS.md",
2983
  "exists": true,
2984
+ "bytes": 7207,
2985
+ "sha256": "7baaba976ccc254da1a03ee2653057d1e08f3fb0c0cad035886c362442828720"
2986
  },
2987
  "hf_model": {
2988
  "path": "hf_model:PROJECT_STATUS.md",
2989
  "exists": true,
2990
+ "bytes": 7207,
2991
+ "sha256": "7baaba976ccc254da1a03ee2653057d1e08f3fb0c0cad035886c362442828720"
2992
  }
2993
  },
2994
  "failures": []
metrics/project_status.json CHANGED
@@ -82,7 +82,7 @@
82
  "RESEARCH_ROADMAP.md",
83
  "docs/data/research_roadmap.json"
84
  ],
85
- "readout": "The roadmap connects public-sample task development to 128-episode data preparation, Qwen3-Omni LoRA, foundation-model selection, robustness runs, and larger omni/world-model extensions."
86
  },
87
  {
88
  "area": "Foundation-model plan",
@@ -93,6 +93,14 @@
93
  ],
94
  "readout": "Qwen3-Omni remains the first trainable held-out LoRA baseline; Cosmos 3 is added as the first world-model/action-generation branch; OpenVLA/openpi/GR00T are policy candidates after action targets are explicit."
95
  },
96
  {
97
  "area": "Official dataset wording",
98
  "status": "verified",
@@ -167,6 +175,7 @@
167
  "Inspect RESEARCH_TAKEAWAYS.md and docs/data/research_takeaways.json before interpreting model scores.",
168
  "Inspect RESEARCH_ROADMAP.md and docs/data/research_roadmap.json for the path from public-sample task work to multi-episode modeling.",
169
  "Inspect FOUNDATION_MODEL_PLAN.md and docs/data/foundation_model_plan.json before choosing a backbone branch.",
 
170
  "Inspect docs/data/summary_metrics.json and results/episode_task_suite/neural_mlp/ to check the 12-task outputs.",
171
  "Inspect results/audio_ablation/AUDIO_ABLATION_SUMMARY.md before judging whether audio helps the current task suite.",
172
  "Inspect EVALUATION_PROTOCOL.md before judging task metrics or leakage controls.",
@@ -180,6 +189,7 @@
180
  "The current reconstruction task reconstructs feature vectors, not pixel-depth, mesh, NeRF, or Gaussian reconstruction.",
181
  "Audio is one of the synchronized source modalities in the current task representation.",
182
  "The audio ablation report compares audio/no-audio variants across all 12 task contracts in results/audio_ablation/.",
183
- "Foundation-model selection is explicit: Qwen3-Omni is the immediate trainable pilot, Cosmos 3 is the first world-model branch, and policy models such as OpenVLA/openpi/GR00T wait for action-target conversion."
 
184
  ]
185
  }
 
82
  "RESEARCH_ROADMAP.md",
83
  "docs/data/research_roadmap.json"
84
  ],
85
+ "readout": "The roadmap connects public-sample task development to 128-episode data preparation, Qwen3-Omni LoRA, foundation-model selection, robustness runs, world/policy branches, and the future Xperience-native pretraining goal."
86
  },
87
  {
88
  "area": "Foundation-model plan",
 
93
  ],
94
  "readout": "Qwen3-Omni remains the first trainable held-out LoRA baseline; Cosmos 3 is added as the first world-model/action-generation branch; OpenVLA/openpi/GR00T are policy candidates after action targets are explicit."
95
  },
96
+ {
97
+ "area": "Xperience Embodied Foundation Model",
98
+ "status": "future_goal",
99
+ "evidence": [
100
+ "XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md"
101
+ ],
102
+ "readout": "A future full-corpus pretraining plan describes target modules, objectives, staged scale-up, hardware ranges, and evaluation for a domain-specific embodied foundation model."
103
+ },
104
  {
105
  "area": "Official dataset wording",
106
  "status": "verified",
 
175
  "Inspect RESEARCH_TAKEAWAYS.md and docs/data/research_takeaways.json before interpreting model scores.",
176
  "Inspect RESEARCH_ROADMAP.md and docs/data/research_roadmap.json for the path from public-sample task work to multi-episode modeling.",
177
  "Inspect FOUNDATION_MODEL_PLAN.md and docs/data/foundation_model_plan.json before choosing a backbone branch.",
178
+ "Inspect XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md for the long-term full-corpus pretraining goal.",
179
  "Inspect docs/data/summary_metrics.json and results/episode_task_suite/neural_mlp/ to check the 12-task outputs.",
180
  "Inspect results/audio_ablation/AUDIO_ABLATION_SUMMARY.md before judging whether audio helps the current task suite.",
181
  "Inspect EVALUATION_PROTOCOL.md before judging task metrics or leakage controls.",
 
189
  "The current reconstruction task reconstructs feature vectors, not pixel-depth, mesh, NeRF, or Gaussian reconstruction.",
190
  "Audio is one of the synchronized source modalities in the current task representation.",
191
  "The audio ablation report compares audio/no-audio variants across all 12 task contracts in results/audio_ablation/.",
192
+ "Foundation-model selection is explicit: Qwen3-Omni is the immediate trainable pilot, Cosmos 3 is the first world-model branch, and policy models such as OpenVLA/openpi/GR00T wait for action-target conversion.",
193
+ "The Xperience Embodied Foundation Model is a future native-pretraining goal, not a completed model or current benchmark."
194
  ]
195
  }
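The `metrics/publication_audit.json` diff that follows reports, per bundle, a `root`, an `exists` flag, a `file_count`, a `text_file_count`, and a `largest_file` with `path` and `bytes`. The sketch below shows one way such a summary could be derived from a directory tree. How the project actually classifies a file as text is not stated in this diff, so the `is_text` heuristic (no NUL byte and valid UTF-8 in the first few kilobytes) and the function names are assumptions.

```python
import codecs
from pathlib import Path


def is_text(path: Path, probe_bytes: int = 4096) -> bool:
    """Assumed heuristic: text means no NUL byte and a valid UTF-8 prefix."""
    with path.open("rb") as fh:
        chunk = fh.read(probe_bytes)
    if b"\x00" in chunk:
        return False
    try:
        # An incremental decoder tolerates a multi-byte character that is
        # cut off at the end of the probe window.
        codecs.getincrementaldecoder("utf-8")().decode(chunk, final=False)
    except UnicodeDecodeError:
        return False
    return True


def summarize_bundle(root: Path) -> dict:
    """Build a per-bundle summary with the fields seen in the audit diff."""
    files = sorted(p for p in root.rglob("*") if p.is_file()) if root.is_dir() else []
    summary = {
        "root": str(root),
        "exists": root.is_dir(),
        "file_count": len(files),
        "text_file_count": sum(is_text(p) for p in files),
        "largest_file": None,
    }
    if files:
        largest = max(files, key=lambda p: p.stat().st_size)
        summary["largest_file"] = {
            "path": largest.relative_to(root).as_posix(),
            "bytes": largest.stat().st_size,
        }
    return summary
```

Running a function like this over each bundle root before and after a commit would surface count changes of the kind shown in the diff, for example a bundle gaining one text file when a new Markdown document is added.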
metrics/publication_audit.json CHANGED
@@ -1,6 +1,6 @@
1
  {
2
  "status": "pass",
3
- "generated_at_utc": "2026-06-04T18:32:51+00:00",
4
  "checks": [
5
  {
6
  "name": "required_publication_assets_present",
@@ -182,8 +182,8 @@
182
  "github_repo": {
183
  "root": "repo",
184
  "exists": true,
185
- "file_count": 386,
186
- "text_file_count": 320,
187
  "largest_file": {
188
  "path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
189
  "bytes": 55702978
@@ -193,8 +193,8 @@
193
  "hf_space_bundle": {
194
  "root": "hf_publish/space",
195
  "exists": true,
196
- "file_count": 316,
197
- "text_file_count": 250,
198
  "largest_file": {
199
  "path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
200
  "bytes": 55702978
@@ -204,8 +204,8 @@
204
  "hf_artifact_bundle": {
205
  "root": "hf_publish/artifacts",
206
  "exists": true,
207
- "file_count": 417,
208
- "text_file_count": 329,
209
  "largest_file": {
210
  "path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
211
  "bytes": 55702978
@@ -215,8 +215,8 @@
215
  "hf_model_bundle": {
216
  "root": "hf_publish/model",
217
  "exists": true,
218
- "file_count": 643,
219
- "text_file_count": 518,
220
  "largest_file": {
221
  "path": "pytorch_model.bin",
222
  "bytes": 93495480
 
1
  {
2
  "status": "pass",
3
+ "generated_at_utc": "2026-06-04T20:43:37+00:00",
4
  "checks": [
5
  {
6
  "name": "required_publication_assets_present",
 
182
  "github_repo": {
183
  "root": "repo",
184
  "exists": true,
185
+ "file_count": 396,
186
+ "text_file_count": 330,
187
  "largest_file": {
188
  "path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
189
  "bytes": 55702978
 
193
  "hf_space_bundle": {
194
  "root": "hf_publish/space",
195
  "exists": true,
196
+ "file_count": 317,
197
+ "text_file_count": 251,
198
  "largest_file": {
199
  "path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
200
  "bytes": 55702978
 
204
  "hf_artifact_bundle": {
205
  "root": "hf_publish/artifacts",
206
  "exists": true,
207
+ "file_count": 418,
208
+ "text_file_count": 330,
209
  "largest_file": {
210
  "path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
211
  "bytes": 55702978
 
215
  "hf_model_bundle": {
216
  "root": "hf_publish/model",
217
  "exists": true,
218
+ "file_count": 644,
219
+ "text_file_count": 519,
220
  "largest_file": {
221
  "path": "pytorch_model.bin",
222
  "bytes": 93495480
metrics/research_roadmap.json CHANGED
@@ -1,7 +1,7 @@
1
  {
2
  "title": "Ropedia Xperience-10M Research Roadmap",
3
- "summary": "Staged path from the public-sample task lab to multi-episode held-out evaluation, foundation-model selection, and larger omni/world-model extensions.",
4
- "current_decision_point": "Keep the public-sample task suite as the development harness, prepare the selected official Xperience-10M episodes for the held-out Qwen3-Omni pilot, then branch into Cosmos 3 world modeling and policy-model experiments after the data preparation path is stable.",
5
  "phases": [
6
  {
7
  "id": "public_sample_task_lab",
@@ -126,6 +126,30 @@
126
  "updated model cards"
127
  ],
128
  "reader_takeaway": "The long-term direction is richer multimodal representation learning for embodied-AI reasoning, with model branches chosen by task fit rather than by a single default backbone."
129
  }
130
  ],
131
  "public_surfaces_to_update": [
@@ -134,6 +158,7 @@
134
  "RESEARCH_TAKEAWAYS.md",
135
  "EVALUATION_PROTOCOL.md",
136
  "ARTIFACT_GUIDE.md",
 
137
  "docs/index.html",
138
  "docs/data/research_roadmap.json",
139
  "Hugging Face Space card",
 
1
  {
2
  "title": "Ropedia Xperience-10M Research Roadmap",
3
+ "summary": "Staged path from the public-sample task lab to multi-episode held-out evaluation, foundation-model selection, world/policy branches, and a future Xperience-native embodied foundation model.",
4
+ "current_decision_point": "Keep the public-sample task suite as the development harness, prepare the selected official Xperience-10M episodes for the held-out Qwen3-Omni pilot, then branch into Cosmos 3 world modeling and policy-model experiments after the data preparation path is stable. The Xperience Embodied Foundation Model is a later full-corpus pretraining goal, not a current result.",
5
  "phases": [
6
  {
7
  "id": "public_sample_task_lab",
 
126
  "updated model cards"
127
  ],
128
  "reader_takeaway": "The long-term direction is richer multimodal representation learning for embodied-AI reasoning, with model branches chosen by task fit rather than by a single default backbone."
129
+ },
130
+ {
131
+ "id": "xperience_embodied_foundation_pretraining",
132
+ "name": "Xperience Embodied Foundation Model Pretraining",
133
+ "status": "future",
134
+ "entry_condition": "Full-corpus access, PB-scale storage path, high-throughput data loading, multi-node compute, and positive scaling evidence from smaller multi-episode runs.",
135
+ "deliverables": [
136
+ "full-corpus episode and split manifests",
137
+ "pretraining shard and provenance manifests",
138
+ "0.3B-1B and 1B-3B scaling pilots",
139
+ "3B-7B Xperience-native domain model target",
140
+ "held-out episode/session/activity/object evaluations",
141
+ "missing-modality robustness report",
142
+ "model card and data-boundary report"
143
+ ],
144
+ "completion_evidence": [
145
+ "pretraining metadata",
146
+ "checkpoint inventory",
147
+ "scaling curves",
148
+ "held-out evaluation reports",
149
+ "qualitative retrieval or future-state examples",
150
+ "safety and data-boundary report"
151
+ ],
152
+ "reader_takeaway": "The final research direction is a domain-specific embodied foundation model trained directly on Xperience-10M, after smaller pilots justify the cost and infrastructure."
153
  }
154
  ],
155
  "public_surfaces_to_update": [
 
158
  "RESEARCH_TAKEAWAYS.md",
159
  "EVALUATION_PROTOCOL.md",
160
  "ARTIFACT_GUIDE.md",
161
+ "XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md",
162
  "docs/index.html",
163
  "docs/data/research_roadmap.json",
164
  "Hugging Face Space card",
metrics/research_roadmap_interactive.json CHANGED
@@ -1837,7 +1837,8 @@
1837
  "NVIDIA GR00T"
1838
  ],
1839
  "first_world_model_branch": "Cosmos 3",
1840
- "immediate_trainable_backbone": "Qwen3-Omni"
 
1841
  },
1842
  "evaluation_additions": [
1843
  {
@@ -1921,6 +1922,11 @@
1921
  "action": "Publish branch results only with real manifests, predictions, metrics, and qualitative examples.",
1922
  "name": "Publishing threshold",
1923
  "step": 6
1924
  }
1925
  ],
1926
  "model_families": [
@@ -2023,6 +2029,21 @@
2023
  "Useful after action target design.",
2024
  "Less directly omni-modal than Qwen3-Omni or Cosmos 3."
2025
  ]
2026
  }
2027
  ],
2028
  "source_links": [
@@ -2057,11 +2078,15 @@
2057
  {
2058
  "label": "LeRobot / SmolVLA",
2059
  "url": "https://github.com/huggingface/lerobot"
2060
  }
2061
  ],
2062
  "status": "planning_artifact"
2063
  },
2064
- "generated_at_utc": "2026-06-04T16:42:13+00:00",
2065
  "omni_plan": {
2066
  "adapter": "LoRA rank 16, alpha 32, dropout 0.05",
2067
  "backbone": "Qwen/Qwen3-Omni-30B-A3B-Instruct",
@@ -2208,6 +2233,31 @@
2208
  "reader_takeaway": "The long-term direction is richer multimodal representation learning for embodied-AI reasoning, with model branches chosen by task fit rather than by a single default backbone.",
2209
  "stage": "future",
2210
  "status": "planned"
2211
  }
2212
  ],
2213
  "scale_up": {
 
1837
  "NVIDIA GR00T"
1838
  ],
1839
  "first_world_model_branch": "Cosmos 3",
1840
+ "immediate_trainable_backbone": "Qwen3-Omni",
1841
+ "long_term_native_pretraining_goal": "Xperience Embodied Foundation Model"
1842
  },
1843
  "evaluation_additions": [
1844
  {
 
1922
  "action": "Publish branch results only with real manifests, predictions, metrics, and qualitative examples.",
1923
  "name": "Publishing threshold",
1924
  "step": 6
1925
+ },
1926
+ {
1927
+ "action": "Start a from-scratch Xperience Embodied Foundation Model only after smaller scaling stages, full-corpus storage, multi-node compute, and held-out evaluation protocols are in place.",
1928
+ "name": "Xperience-native pretraining",
1929
+ "step": 7
1930
  }
1931
  ],
1932
  "model_families": [
 
2029
  "Useful after action target design.",
2030
  "Less directly omni-modal than Qwen3-Omni or Cosmos 3."
2031
  ]
2032
+ },
2033
+ {
2034
+ "best_role": "Domain model over synchronized embodied experience.",
2035
+ "category": "xperience_native_pretraining_goal",
2036
+ "current_decision": "future_goal_after_scaling_evidence",
2037
+ "entry_condition": "Full-corpus data path, PB-scale storage, multi-node compute, and positive smaller-run scaling evidence.",
2038
+ "family": "Xperience Embodied Foundation Model",
2039
+ "openness": "future project-specific model if full-corpus access and compute exist",
2040
+ "priority": 8,
2041
+ "public_source": "XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md",
2042
+ "xperience10m_fit": [
2043
+ "Uses the full aligned modality stack rather than treating sensors as auxiliary metadata.",
2044
+ "Targets temporal embodied representation learning across perception, motion, geometry, audio, and language.",
2045
+ "Can become the shared pretraining backbone for Qwen-style instruction tasks, Cosmos-style world modeling, and policy/action branches."
2046
+ ]
2047
  }
2048
  ],
2049
  "source_links": [
 
2078
  {
2079
  "label": "LeRobot / SmolVLA",
2080
  "url": "https://github.com/huggingface/lerobot"
2081
+ },
2082
+ {
2083
+ "label": "Xperience Embodied Foundation Model pretraining plan",
2084
+ "url": "XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md"
2085
  }
2086
  ],
2087
  "status": "planning_artifact"
2088
  },
2089
+ "generated_at_utc": "2026-06-04T20:40:29+00:00",
2090
  "omni_plan": {
2091
  "adapter": "LoRA rank 16, alpha 32, dropout 0.05",
2092
  "backbone": "Qwen/Qwen3-Omni-30B-A3B-Instruct",
 
2233
  "reader_takeaway": "The long-term direction is richer multimodal representation learning for embodied-AI reasoning, with model branches chosen by task fit rather than by a single default backbone.",
2234
  "stage": "future",
2235
  "status": "planned"
2236
+ },
2237
+ {
2238
+ "completion_evidence": [
2239
+ "pretraining metadata",
2240
+ "checkpoint inventory",
2241
+ "scaling curves",
2242
+ "held-out evaluation reports",
2243
+ "qualitative retrieval or future-state examples",
2244
+ "safety and data-boundary report"
2245
+ ],
2246
+ "deliverables": [
2247
+ "full-corpus episode and split manifests",
2248
+ "pretraining shard and provenance manifests",
2249
+ "0.3B-1B and 1B-3B scaling pilots",
2250
+ "3B-7B Xperience-native domain model target",
2251
+ "held-out episode/session/activity/object evaluations",
2252
+ "missing-modality robustness report",
2253
+ "model card and data-boundary report"
2254
+ ],
2255
+ "entry_condition": "Full-corpus access, PB-scale storage path, high-throughput data loading, multi-node compute, and positive scaling evidence from smaller multi-episode runs.",
2256
+ "id": "xperience_embodied_foundation_pretraining",
2257
+ "name": "Xperience Embodied Foundation Model Pretraining",
2258
+ "reader_takeaway": "The final research direction is a domain-specific embodied foundation model trained directly on Xperience-10M, after smaller pilots justify the cost and infrastructure.",
2259
+ "stage": "future",
2260
+ "status": "future"
2261
  }
2262
  ],
2263
  "scale_up": {
research_roadmap.html CHANGED
@@ -605,8 +605,9 @@
605
  <h1>Interactive Research Roadmap.</h1>
606
  <p class="hero-copy">
607
  This page connects the current public-sample task lab to the four research
608
- directions, the next multi-episode Qwen3-Omni fine-tuning path, and
609
- the later Cosmos 3 / policy-model branch choices. It loads
 
610
  directly from generated project artifacts, so the track and task views stay
611
  tied to the real sample metrics and scale-up status.
612
  </p>
@@ -630,7 +631,7 @@
630
  </div>
631
  <div class="route-step">
632
  <strong>03</strong>
633
- <div><b>Omni + branches</b><span>Qwen3-Omni first, Cosmos 3 and policy models after data preparation</span></div>
634
  <em id="routeOmni">pending data</em>
635
  </div>
636
  </div>
@@ -701,7 +702,7 @@
701
  },
702
  omni: {
703
  title: "Omni pilot and foundation branches",
704
- summary: "Run Qwen3-Omni first for the held-out LoRA pilot, then evaluate Cosmos 3 for world modeling and policy candidates after action targets are explicit.",
705
  }
706
  };
707
 
 
605
  <h1>Interactive Research Roadmap.</h1>
606
  <p class="hero-copy">
607
  This page connects the current public-sample task lab to the four research
608
+ directions, the next multi-episode Qwen3-Omni fine-tuning path, the
609
+ later Cosmos 3 / policy-model branch choices, and the future
610
+ Xperience-native foundation-model pretraining goal. It loads
611
  directly from generated project artifacts, so the track and task views stay
612
  tied to the real sample metrics and scale-up status.
613
  </p>
 
631
  </div>
632
  <div class="route-step">
633
  <strong>03</strong>
634
+ <div><b>Omni + branches</b><span>Qwen3-Omni first, Cosmos 3 and policy models next, native pretraining later</span></div>
635
  <em id="routeOmni">pending data</em>
636
  </div>
637
  </div>
 
702
  },
703
  omni: {
704
  title: "Omni pilot and foundation branches",
705
+ summary: "Run Qwen3-Omni first for the held-out LoRA pilot, evaluate Cosmos 3 for world modeling and policy candidates after action targets are explicit, then treat Xperience-native pretraining as the full-corpus future goal.",
706
  }
707
  };
708
 
scripts/build_artifact_index.py CHANGED
@@ -81,6 +81,14 @@ ARTIFACTS = [
81
  "surface": "website_hf",
82
  "shows": "Machine-readable foundation-model selection matrix with source links, entry conditions, and evaluation additions.",
83
  },
84
  {
85
  "id": "evidence_contract",
86
  "title": "Evidence contract",
 
81
  "surface": "website_hf",
82
  "shows": "Machine-readable foundation-model selection matrix with source links, entry conditions, and evaluation additions.",
83
  },
84
+ {
85
+ "id": "xperience_embodied_foundation_pretraining",
86
+ "title": "Xperience Embodied Foundation Model pretraining goal",
87
+ "path": "XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md",
88
+ "kind": "project_path",
89
+ "surface": "repo_hf",
90
+ "shows": "Describes the future full-corpus Xperience-native pretraining goal, target modules, objectives, staged scale-up, hardware ranges, and evaluation protocol.",
91
+ },
92
  {
93
  "id": "evidence_contract",
94
  "title": "Evidence contract",
scripts/validate_publication_package.py CHANGED
@@ -221,6 +221,8 @@ def scan(root: Path, *, paths: list[Path] | None = None, display_root: str | Non
221
  "detail": reason,
222
  })
223
  for needle, reason in STALE_PRESENTATION_STRINGS.items():
 
 
224
  if needle in text:
225
  violations.append({
226
  "kind": "stale_presentation_copy",
 
221
  "detail": reason,
222
  })
223
  for needle, reason in STALE_PRESENTATION_STRINGS.items():
224
+ if path_rel == ".mailmap":
225
+ continue
226
  if needle in text:
227
  violations.append({
228
  "kind": "stale_presentation_copy",