Add Xperience embodied foundation pretraining goal
- ARTIFACT_GUIDE.md +5 -2
- FOUNDATION_MODEL_PLAN.md +46 -0
- PROJECT_README.md +129 -173
- PROJECT_STATUS.md +11 -6
- README.md +12 -6
- RESEARCH_ROADMAP.md +28 -3
- XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md +178 -0
- data/artifact_index.json +35 -24
- data/foundation_model_plan.json +93 -1
- data/mirror_parity.json +93 -93
- data/project_status.json +12 -2
- data/publication_audit.json +9 -9
- data/research_roadmap.json +27 -2
- data/research_roadmap_interactive.json +52 -2
- docs/data/artifact_index.json +35 -24
- docs/data/foundation_model_plan.json +93 -1
- docs/data/mirror_parity.json +111 -111
- docs/data/project_status.json +12 -2
- docs/data/publication_audit.json +16 -16
- docs/data/research_roadmap.json +27 -2
- docs/data/research_roadmap_interactive.json +52 -2
- docs/index.html +27 -11
- docs/research_roadmap.html +5 -4
- index.html +27 -11
- metrics/artifact_index.json +35 -24
- metrics/foundation_model_plan.json +93 -1
- metrics/mirror_parity.json +93 -93
- metrics/project_status.json +12 -2
- metrics/publication_audit.json +9 -9
- metrics/research_roadmap.json +27 -2
- metrics/research_roadmap_interactive.json +52 -2
- research_roadmap.html +5 -4
- scripts/build_artifact_index.py +8 -0
- scripts/validate_publication_package.py +2 -0
ARTIFACT_GUIDE.md
CHANGED
|
@@ -3,15 +3,17 @@
|
|
| 3 |
This guide is the human-readable map for the public Ropedia Xperience-10M task
|
| 4 |
suite artifacts. It is organized around what a reader usually wants to do:
|
| 5 |
understand the project, inspect the sample episode, compare baselines, read the
|
| 6 |
-
task results,
|
|
|
|
| 7 |
|
| 8 |
## Start Here
|
| 9 |
|
| 10 |
| Artifact | Why to open it first |
|
| 11 |
| --- | --- |
|
| 12 |
| [`PROJECT_STATUS.md`](PROJECT_STATUS.md) | Gives the fastest current-state table: implemented, in staging, and outside current scope. |
|
| 13 |
-
| [`RESEARCH_ROADMAP.md`](RESEARCH_ROADMAP.md) | Shows the roadmap from public-sample task development to multi-episode data preparation, Qwen3-Omni LoRA, robustness runs, and
|
| 14 |
| [`FOUNDATION_MODEL_PLAN.md`](FOUNDATION_MODEL_PLAN.md) | Explains which foundation backbones fit which Xperience-10M objective: Qwen3-Omni first, Cosmos 3 for world modeling, and VLA/policy models after action-target conversion. |
|
|
|
|
| 15 |
| [`EVALUATION_PROTOCOL.md`](EVALUATION_PROTOCOL.md) | Defines the task unit, chronological split, metrics, leakage controls, and current limitations. |
|
| 16 |
| [`REPRODUCIBILITY.md`](REPRODUCIBILITY.md) | Defines public reproduction commands, expected outputs, and unreproducible boundaries. |
|
| 17 |
| [`results/audio_ablation/AUDIO_ABLATION_SUMMARY.md`](results/audio_ablation/AUDIO_ABLATION_SUMMARY.md) | Shows measured current-audio and raw log-mel replacement deltas across the 12 task contracts. |
|
|
@@ -107,6 +109,7 @@ research project.
|
|
| 107 |
| [`scripts/omni/train_qwen3_omni_lora.py`](scripts/omni/train_qwen3_omni_lora.py) | Training entrypoint for the Qwen3-Omni LoRA pilot after the data gate passes. |
|
| 108 |
| [`FOUNDATION_MODEL_PLAN.md`](FOUNDATION_MODEL_PLAN.md) | Adds the post-data-gate backbone selection plan: Qwen3-Omni first, Cosmos 3 for world modeling, and OpenVLA/openpi/GR00T for policy/action branches. |
|
| 109 |
| [`docs/data/foundation_model_plan.json`](docs/data/foundation_model_plan.json) | Machine-readable model-family registry with source links, entry conditions, and evaluation additions. |
|
|
|
|
| 110 |
|
| 111 |
## What Is Not Included
|
| 112 |
|
|
|
|
| 3 |
This guide is the human-readable map for the public Ropedia Xperience-10M task
|
| 4 |
suite artifacts. It is organized around what a reader usually wants to do:
|
| 5 |
understand the project, inspect the sample episode, compare baselines, read the
|
| 6 |
+
task results, follow the Qwen3-Omni scale-up path, and understand the longer
|
| 7 |
+
Xperience-native pretraining goal.
|
| 8 |
|
| 9 |
## Start Here
|
| 10 |
|
| 11 |
| Artifact | Why to open it first |
|
| 12 |
| --- | --- |
|
| 13 |
| [`PROJECT_STATUS.md`](PROJECT_STATUS.md) | Gives the fastest current-state table: implemented, in staging, and outside current scope. |
|
| 14 |
+
| [`RESEARCH_ROADMAP.md`](RESEARCH_ROADMAP.md) | Shows the roadmap from public-sample task development to multi-episode data preparation, Qwen3-Omni LoRA, robustness runs, model branches, and the future native-pretraining goal. |
|
| 15 |
| [`FOUNDATION_MODEL_PLAN.md`](FOUNDATION_MODEL_PLAN.md) | Explains which foundation backbones fit which Xperience-10M objective: Qwen3-Omni first, Cosmos 3 for world modeling, and VLA/policy models after action-target conversion. |
|
| 16 |
+
| [`XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md`](XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md) | Describes the future full-corpus Xperience Embodied Foundation Model goal, including modules, objectives, staged scale-up, hardware ranges, and evaluation. |
|
| 17 |
| [`EVALUATION_PROTOCOL.md`](EVALUATION_PROTOCOL.md) | Defines the task unit, chronological split, metrics, leakage controls, and current limitations. |
|
| 18 |
| [`REPRODUCIBILITY.md`](REPRODUCIBILITY.md) | Defines public reproduction commands, expected outputs, and unreproducible boundaries. |
|
| 19 |
| [`results/audio_ablation/AUDIO_ABLATION_SUMMARY.md`](results/audio_ablation/AUDIO_ABLATION_SUMMARY.md) | Shows measured current-audio and raw log-mel replacement deltas across the 12 task contracts. |
|
|
|
|
| 109 |
| [`scripts/omni/train_qwen3_omni_lora.py`](scripts/omni/train_qwen3_omni_lora.py) | Training entrypoint for the Qwen3-Omni LoRA pilot after the data gate passes. |
|
| 110 |
| [`FOUNDATION_MODEL_PLAN.md`](FOUNDATION_MODEL_PLAN.md) | Adds the post-data-gate backbone selection plan: Qwen3-Omni first, Cosmos 3 for world modeling, and OpenVLA/openpi/GR00T for policy/action branches. |
|
| 111 |
| [`docs/data/foundation_model_plan.json`](docs/data/foundation_model_plan.json) | Machine-readable model-family registry with source links, entry conditions, and evaluation additions. |
|
| 112 |
+
| [`XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md`](XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md) | Future full-corpus Xperience-native pretraining plan; not a current model result. |
|
| 113 |
|
| 114 |
## What Is Not Included
|
| 115 |
|
FOUNDATION_MODEL_PLAN.md
CHANGED
|
@@ -20,6 +20,7 @@ run a held-out multi-episode foundation-model evaluation.
|
|
| 20 |
| 5 | openpi pi0/pi0.5 | Open robot policy and action expert baseline | Useful for action chunking, policy fine-tuning, and embodiment transfer experiments | Candidate for policy branch once action labels are retargeted |
|
| 21 |
| 6 | Gemini Robotics | Closed/API embodied reasoning reference | Strong candidate for qualitative reasoning and task interpretation, but not a local fine-tune target | Use only as an external comparison or annotation assistant |
|
| 22 |
| 7 | Octo / SmolVLA-style lightweight policies | Smaller reproducible robot-policy baselines | Good for cheaper action-policy experiments, but less directly omni-modal | Optional baseline branch after selected-episode data preparation |
|
|
|
|
| 23 |
|
| 24 |
## Why Qwen3-Omni Still Goes First
|
| 25 |
|
|
@@ -38,6 +39,46 @@ prepare video/audio/language prompts and adapter inputs. It is also suitable for
|
|
| 38 |
the 12 current task contracts, which mostly produce labels, structured JSON, or
|
| 39 |
short task answers.
|
| 40 |
|
|
|
|
| 41 |
## Why Cosmos 3 Should Be Added Next
|
| 42 |
|
| 43 |
Cosmos 3 should not replace the Qwen3-Omni pilot. It should become the first
|
|
@@ -105,6 +146,9 @@ The foundation-model stage should add metrics beyond the current 12-task suite:
|
|
| 105 |
retargeting artifacts are traceable.
|
| 106 |
6. Update public cards only when a branch has real manifests, predictions,
|
| 107 |
metrics, and qualitative examples.
|
|
|
|
| 108 |
|
| 109 |
## Source Links
|
| 110 |
|
|
@@ -116,3 +160,5 @@ The foundation-model stage should add metrics beyond the current 12-task suite:
|
|
| 116 |
- Gemini Robotics: https://deepmind.google/discover/blog/gemini-robotics-brings-ai-into-the-physical-world/
|
| 117 |
- Octo: https://octo-models.github.io/
|
| 118 |
- LeRobot / SmolVLA: https://github.com/huggingface/lerobot
|
|
|
|
| 20 |
| 5 | openpi pi0/pi0.5 | Open robot policy and action expert baseline | Useful for action chunking, policy fine-tuning, and embodiment transfer experiments | Candidate for policy branch once action labels are retargeted |
|
| 21 |
| 6 | Gemini Robotics | Closed/API embodied reasoning reference | Strong candidate for qualitative reasoning and task interpretation, but not a local fine-tune target | Use only as an external comparison or annotation assistant |
|
| 22 |
| 7 | Octo / SmolVLA-style lightweight policies | Smaller reproducible robot-policy baselines | Good for cheaper action-policy experiments, but less directly omni-modal | Optional baseline branch after selected-episode data preparation |
|
| 23 |
+
| Future | Xperience Embodied Foundation Model | Xperience-native domain model pretrained from scratch on full-corpus embodied experience | Would learn a shared temporal representation across video, audio, depth, pose, mocap, IMU, and language | Long-term goal after smaller pilots prove value and full-corpus storage/compute are available |
|
| 24 |
|
| 25 |
## Why Qwen3-Omni Still Goes First
|
| 26 |
|
|
|
|
| 39 |
the 12 current task contracts, which mostly produce labels, structured JSON, or
|
| 40 |
short task answers.
|
| 41 |
|
| 42 |
+
The executable Qwen branch and future branch contracts are now represented as
|
| 43 |
+
config files under `configs/omni_backbones/`. Validate them with:
|
| 44 |
+
|
| 45 |
+
```bash
|
| 46 |
+
python scripts/omni/backbone_registry.py --validate --json
|
| 47 |
+
```
|
| 48 |
+
|
| 49 |
+
The shared extension rules are in
|
| 50 |
+
[`OMNI_MODEL_EXTENSION_CONTRACT.md`](OMNI_MODEL_EXTENSION_CONTRACT.md). A new
|
| 51 |
+
foundation branch should add a config first, then implement the exporter,
|
| 52 |
+
trainer, evaluator, and launcher required by that config.
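The config-first rule above could be checked mechanically. The following is a minimal sketch, not the contents of `scripts/omni/backbone_registry.py` (which this diff does not show). The `BackboneConfig` class, the `components` field, and every script path except `scripts/omni/train_qwen3_omni_lora.py` are hypothetical names chosen for illustration; only the four component roles come from the text.

```python
from dataclasses import dataclass, field

# Component roles named in the prose above. The real config schema under
# configs/omni_backbones/ is not shown in this diff, so this is an assumption.
REQUIRED_COMPONENTS = ("exporter", "trainer", "evaluator", "launcher")


@dataclass
class BackboneConfig:
    """Hypothetical stand-in for one backbone branch config."""
    name: str
    components: dict = field(default_factory=dict)  # role -> script path


def missing_components(config):
    """Return the required roles that the config does not declare."""
    return [role for role in REQUIRED_COMPONENTS if not config.components.get(role)]


def validate_registry(configs):
    """Map each branch name to its missing roles (an empty list means valid)."""
    return {cfg.name: missing_components(cfg) for cfg in configs}


if __name__ == "__main__":
    qwen = BackboneConfig(
        name="qwen3_omni",
        components={
            "exporter": "scripts/omni/export_example.py",      # hypothetical path
            "trainer": "scripts/omni/train_qwen3_omni_lora.py",
            "evaluator": "scripts/omni/evaluate_example.py",   # hypothetical path
            "launcher": "scripts/omni/launch_example.sh",      # hypothetical path
        },
    )
    # A future branch that has a config but no implementation yet.
    cosmos = BackboneConfig(name="cosmos3", components={})
    print(validate_registry([qwen, cosmos]))
```

A check of this shape lets a future branch land as a config alone and still report exactly which pieces remain unimplemented.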
|
| 53 |
+
|
| 54 |
+
## Long-Term Native Pretraining Goal
|
| 55 |
+
|
| 56 |
+
Qwen3-Omni, Cosmos 3, GR00T, OpenVLA, and openpi are backbone choices for the
|
| 57 |
+
next experiments. The longer-term goal is different: train an
|
| 58 |
+
**Xperience Embodied Foundation Model** that is native to the Xperience-10M
|
| 59 |
+
modality structure.
|
| 60 |
+
|
| 61 |
+
That model would not start as a general internet-scale omni model. It would be
|
| 62 |
+
a domain model over synchronized embodied experience: multi-view egocentric
|
| 63 |
+
video, audio, depth, pose/SLAM, hand and body mocap, IMU, calibration, and
|
| 64 |
+
language annotations. Its pretraining should combine masked multimodal
|
| 65 |
+
modeling, cross-modal contrastive alignment, future-state prediction,
|
| 66 |
+
ego-motion and hand-motion forecasting, action/procedure prediction, language
|
| 67 |
+
grounding, contact/affordance prediction, and optional policy-style targets
|
| 68 |
+
after action conversion.
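To make one of the objectives listed above concrete, here is a minimal sketch of cross-modal contrastive alignment. The plan does not say which loss it would use; this assumes a symmetric InfoNCE loss over paired window embeddings (for example, one video embedding and one audio embedding per time window). The function names and the temperature value are illustrative assumptions, and the pure-Python form is for readability rather than training.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def _diagonal_cross_entropy(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        top = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = top + math.log(sum(math.exp(x - top) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def symmetric_info_nce(emb_a, emb_b, temperature=0.07):
    """Symmetric InfoNCE between two modalities.

    emb_a[i] and emb_b[i] are embeddings of the same time window; every
    other pairing in the batch is treated as a negative.
    """
    a = [_normalize(v) for v in emb_a]
    b = [_normalize(v) for v in emb_b]
    # Cosine similarity matrix, scaled by the temperature.
    logits = [[sum(x * y for x, y in zip(u, v)) / temperature for v in b] for u in a]
    logits_t = [list(col) for col in zip(*logits)]
    return 0.5 * (_diagonal_cross_entropy(logits) + _diagonal_cross_entropy(logits_t))


if __name__ == "__main__":
    video = [[1.0, 0.0], [0.0, 1.0]]
    audio_aligned = [[0.9, 0.1], [0.1, 0.9]]
    audio_swapped = audio_aligned[::-1]
    print(symmetric_info_nce(video, audio_aligned))  # low loss: pairs match
    print(symmetric_info_nce(video, audio_swapped))  # high loss: pairs mismatched
```

The loss is small when each window's embeddings agree across modalities and large when they are mismatched, which is the property a shared temporal representation would rely on.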
|
| 69 |
+
|
| 70 |
+
This is not a current result in the repo. It becomes appropriate only after:
|
| 71 |
+
|
| 72 |
+
- the selected multi-episode pipeline trains and evaluates cleanly,
|
| 73 |
+
- scaling from 128 episodes to thousands of episodes shows measurable value,
|
| 74 |
+
- raw-corpus storage and derived-shard capacity are available,
|
| 75 |
+
- distributed training and checkpoint/restart infrastructure are reliable,
|
| 76 |
+
- evaluation covers held-out episodes, sessions, activities, objects, and
|
| 77 |
+
missing-modality robustness.
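The entry conditions above can be read as a set of boolean gates that must all pass. The sketch below is one hypothetical way to encode them; the class and field names are invented for illustration and do not correspond to any file in this repo.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PretrainingGates:
    """Hypothetical encoding of the five entry conditions listed above."""
    pipeline_trains_and_evaluates: bool
    scaling_shows_value: bool
    storage_and_shards_available: bool
    distributed_infra_reliable: bool
    heldout_eval_coverage: bool


def unmet_gates(gates):
    """Return the names of gates that have not passed, in declaration order."""
    return [name for name, passed in asdict(gates).items() if not passed]


def ready_for_native_pretraining(gates):
    """Native pretraining is appropriate only when every gate has passed."""
    return not unmet_gates(gates)


if __name__ == "__main__":
    # Illustrative values only; they do not describe the project's real status.
    example = PretrainingGates(
        pipeline_trains_and_evaluates=True,
        scaling_shows_value=False,
        storage_and_shards_available=False,
        distributed_infra_reliable=True,
        heldout_eval_coverage=False,
    )
    print(ready_for_native_pretraining(example))
    print(unmet_gates(example))
```

Reporting the unmet gates by name, rather than a single pass/fail flag, keeps the status table honest about which prerequisite is still blocking.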
|
| 78 |
+
|
| 79 |
+
The full plan is documented in
|
| 80 |
+
[`XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md`](XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md).
|
| 81 |
+
|
| 82 |
## Why Cosmos 3 Should Be Added Next
|
| 83 |
|
| 84 |
Cosmos 3 should not replace the Qwen3-Omni pilot. It should become the first
|
|
|
|
| 146 |
retargeting artifacts are traceable.
|
| 147 |
6. Update public cards only when a branch has real manifests, predictions,
|
| 148 |
metrics, and qualitative examples.
|
| 149 |
+
7. Start Xperience-native pretraining only after smaller scaling stages,
|
| 150 |
+
full-corpus storage, multi-node compute, and held-out evaluation protocols
|
| 151 |
+
are in place.
|
| 152 |
|
| 153 |
## Source Links
|
| 154 |
|
|
|
|
| 160 |
- Gemini Robotics: https://deepmind.google/discover/blog/gemini-robotics-brings-ai-into-the-physical-world/
|
| 161 |
- Octo: https://octo-models.github.io/
|
| 162 |
- LeRobot / SmolVLA: https://github.com/huggingface/lerobot
|
| 163 |
+
- Xperience Embodied Foundation Model pretraining plan:
|
| 164 |
+
`XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md`
|
PROJECT_README.md
CHANGED
|
@@ -42,7 +42,7 @@ embodied-AI research infrastructure:
|
|
| 42 |
| Multimodal data understanding | Parses the public sample into synchronized windows across video, audio, depth, pose/SLAM, mocap, IMU, calibration, and language-derived signals |
|
| 43 |
| Task design | Defines 12 human-readable tasks plus four direction-extension probes with inputs, outputs, process modules, metrics, and case-study walkthroughs |
|
| 44 |
| Model and evaluation discipline | Runs minimal and compact neural baselines, records predictions/metrics, keeps chronological split boundaries explicit, and separates sample evidence from held-out claims |
|
| 45 |
-
| Scale-up planning | Connects the public-sample pipeline to 32/128-episode held-out pilots, Qwen3-Omni LoRA, Cosmos-style world-model branches, and
|
| 46 |
|
| 47 |
## Start Here
|
| 48 |
|
|
@@ -59,6 +59,7 @@ before the multi-episode omni-model stage becomes a real held-out evaluation.
|
|
| 59 |
| Navigate the 12 tasks, four tracks, and scale-up plan | [Interactive research roadmap](https://chaoyue0307.github.io/ropedia-xperience-10m-task-suite/research_roadmap.html), [`docs/data/research_roadmap_interactive.json`](docs/data/research_roadmap_interactive.json) |
|
| 60 |
| Compare current task metrics | [`RESEARCH_TAKEAWAYS.md`](RESEARCH_TAKEAWAYS.md), [`docs/data/summary_metrics.json`](docs/data/summary_metrics.json) |
|
| 61 |
| Compare possible foundation backbones | [`FOUNDATION_MODEL_PLAN.md`](FOUNDATION_MODEL_PLAN.md), [`docs/data/foundation_model_plan.json`](docs/data/foundation_model_plan.json) |
|
|
|
|
| 62 |
| Understand one model input | [`results/episode_task_suite/feature_manifest.json`](results/episode_task_suite/feature_manifest.json), [`results/episode_task_suite/windows.csv`](results/episode_task_suite/windows.csv) |
|
| 63 |
| Check multi-episode data status | [`results/omni_finetune/DATA_ACCESS_STATUS.md`](results/omni_finetune/DATA_ACCESS_STATUS.md) |
|
| 64 |
|
|
@@ -71,7 +72,7 @@ before the multi-episode omni-model stage becomes a real held-out evaluation.
|
|
| 71 |
| Task suite | 12 human-readable embodied-AI task contracts with input, process, output, metrics, predictions, and case-study walkthroughs |
|
| 72 |
| Baselines | Minimal linear/ridge/logistic heads plus compact PyTorch MLP task heads over the same chronological split |
|
| 73 |
| Research directions | Task mapping and extension probes for human modeling, 3D/4D reconstruction, egocentric interaction, and world modeling |
|
| 74 |
-
| Scale-up path | The gated Xperience-10M dataset is available for a selected 128-episode pilot before Qwen3-Omni LoRA, followed by Cosmos 3/world-model and VLA/policy branches |
|
| 75 |
| Public surfaces | GitHub repo, GitHub Pages dashboard, HF Space, HF artifact dataset, HF baseline-model repo, and HF collection |
|
| 76 |
|
| 77 |
For the fastest interpretation of the current metrics, start with
|
|
@@ -93,100 +94,27 @@ Current contributions:
|
|
| 93 |
- human-readable research task cards and an interactive scrub/play walkthrough storyboard for every task,
|
| 94 |
- an interactive research roadmap connecting 12 tasks, four research tracks, current sample evidence, the Qwen3-Omni scale-up path, and foundation-model branch selection,
|
| 95 |
- a next-milestone track for Qwen3-Omni fine-tuning, Cosmos 3 world modeling, and sensor-bridge evaluation,
|
|
|
|
| 96 |
- metrics, predictions, model weights, manifests, charts, and a two-level
|
| 97 |
tabbed static research website,
|
| 98 |
- a clear explanation of what is implemented now and what moves to the multi-episode stage.
|
| 99 |
|
| 100 |
## Current Research Scope
|
| 101 |
|
| 102 |
-
This
|
| 103 |
-
multi-episode held-out model metrics:
|
| 104 |
|
| 105 |
-
|
|
| 106 |
| --- | --- | --- |
|
| 107 |
-
|
|
| 108 |
-
|
|
| 109 |
-
|
|
| 110 |
-
|
|
| 111 |
-
|
|
| 112 |
-
|
| 113 |
-
|
| 114 |
-
|
| 115 |
-
|
| 116 |
-
|
| 117 |
-
| Foundation-model plan | `FOUNDATION_MODEL_PLAN.md`, `docs/data/foundation_model_plan.json` | keeps Qwen3-Omni as the first trainable pilot, adds Cosmos 3 as the first world-model branch, and tracks OpenVLA/openpi/GR00T policy candidates |
|
| 118 |
-
| 12-task suite | `scripts/episode_task_suite.py`, per-task `metrics.json`, predictions | chronological single-episode split |
|
| 119 |
-
| Single-episode diagnostics | `scripts/single_episode_diagnostics.py`, `results/single_episode_diagnostics/`, `docs/single_episode_explorer.html` | modality ablations, timeline overlay, object-label export, alignment stress tests, and interactive window inspection from one sample episode |
|
| 120 |
-
| Neural heads | `scripts/neural_task_models.py`, `results/episode_task_suite/neural_mlp/` | compact MLP heads, not a foundation model |
|
| 121 |
-
| Research directions | `research_direction_taxonomy.json`, extension probe results | direct/proxy/diagnostic evidence, not full solutions |
|
| 122 |
-
| Task surface integrity | `docs/data/task_surface_integrity.json`, `scripts/validate_task_surface.py` | public task cards stay human-readable, thumbnail-backed, and wired to the scrub/play walkthrough storyboard |
|
| 123 |
-
| Rendered website check | `RENDERED_SITE_CHECK.md`, `docs/data/rendered_site_check.json`, `scripts/build_rendered_site_check.py` | records a browser-level load, tab, walkthrough deep-link, control-click, and console-health check |
|
| 124 |
-
| Public project surface | `PUBLIC_SURFACE_QA.md`, `docs/data/public_surface_qa.json`, `scripts/build_public_surface_qa.py` | presents the repo, website, and Hugging Face cards as one research project surface |
|
| 125 |
-
| Qwen3-Omni | `results/omni_finetune/DATA_ACCESS_STATUS.md`, `MULTI_EPISODE_ACCESS_STATUS.md` | the gated full dataset is available for a selected 128-episode pilot before held-out evaluation |
|
| 126 |
-
| Multi-episode pilot status | `scripts/validate_scope_claims.py`, `docs/data/scope_claims_audit.json` | separates setup artifacts, selected-episode preparation, and completed held-out-episode metrics |
|
| 127 |
-
| Mirror parity | `scripts/validate_mirror_parity.py`, `docs/data/mirror_parity.json` | prepared GitHub/HF mirrors carry matching data, figure, website HTML, and validator files |
|
| 128 |
-
| Public bundle contents | `scripts/validate_publication_package.py`, `docs/data/publication_audit.json` | summarizes the public repo and HF bundles, including raw-data exclusion and temporary local-file exclusion |
|
| 129 |
-
| Release checks | `QUALITY_GATES.md`, `docs/data/quality_gates.json`, `metrics/quality_gates.json`, `scripts/build_quality_gates.py` | one map for automated checks and live post-publish verification; the `metrics/` path is the Hugging Face model-repo mirror |
|
| 130 |
-
| Artifact index | `scripts/build_artifact_index.py`, `docs/data/artifact_index.json` | selective source-of-truth catalog with existence, size, and stable-file hashes |
|
| 131 |
-
| Project status | `PROJECT_STATUS.md`, `docs/data/project_status.json` | compact current-state table for first-pass readers |
|
| 132 |
-
| Citation and metadata | `CITATION.cff`, `codemeta.json`, `docs/data/project_manifest.json`, `LICENSE` | code is MIT-scoped; raw-data use follows Xperience-10M terms |
|
| 133 |
-
| Project path | `docs/data/project_packet.json`, website project path section | navigation guide across data, tasks, results, and scale-up status |
|
| 134 |
-
|
| 135 |
-
Read the full scope note in [`EVIDENCE_CONTRACT.md`](EVIDENCE_CONTRACT.md), or
|
| 136 |
-
consume the machine-readable copy at
|
| 137 |
-
[`docs/data/evidence_contract.json`](docs/data/evidence_contract.json).
|
| 138 |
-
The current release package report is at
|
| 139 |
-
[`docs/data/publication_audit.json`](docs/data/publication_audit.json).
|
| 140 |
-
The release-check summary is at
|
| 141 |
-
[`QUALITY_GATES.md`](QUALITY_GATES.md) and
|
| 142 |
-
[`docs/data/quality_gates.json`](docs/data/quality_gates.json).
|
| 143 |
-
The last live-publication verification report is at
|
| 144 |
-
[`docs/data/live_publication_status.json`](docs/data/live_publication_status.json).
|
| 145 |
-
The current prepared-mirror parity report is at
|
| 146 |
-
[`docs/data/mirror_parity.json`](docs/data/mirror_parity.json).
|
| 147 |
-
The current multi-episode pilot status note is at
|
| 148 |
-
[`docs/data/scope_claims_audit.json`](docs/data/scope_claims_audit.json).
|
| 149 |
-
The task-card and walkthrough-storyboard integrity report is at
|
| 150 |
-
[`docs/data/task_surface_integrity.json`](docs/data/task_surface_integrity.json).
|
| 151 |
-
The public presentation report is at
|
| 152 |
-
[`PUBLIC_SURFACE_QA.md`](PUBLIC_SURFACE_QA.md) and
|
| 153 |
-
[`docs/data/public_surface_qa.json`](docs/data/public_surface_qa.json).
|
| 154 |
-
The generated evaluation protocol is at
|
| 155 |
-
[`EVALUATION_PROTOCOL.md`](EVALUATION_PROTOCOL.md) and
|
| 156 |
-
[`docs/data/evaluation_protocol.json`](docs/data/evaluation_protocol.json).
|
| 157 |
-
The generated research takeaways are at
|
| 158 |
-
[`RESEARCH_TAKEAWAYS.md`](RESEARCH_TAKEAWAYS.md) and
|
| 159 |
-
[`docs/data/research_takeaways.json`](docs/data/research_takeaways.json).
|
| 160 |
-
The research roadmap is at
|
| 161 |
-
[`RESEARCH_ROADMAP.md`](RESEARCH_ROADMAP.md) and
|
| 162 |
-
[`docs/data/research_roadmap.json`](docs/data/research_roadmap.json).
|
| 163 |
-
The foundation-model selection plan is at
|
| 164 |
-
[`FOUNDATION_MODEL_PLAN.md`](FOUNDATION_MODEL_PLAN.md) and
|
| 165 |
-
[`docs/data/foundation_model_plan.json`](docs/data/foundation_model_plan.json).
|
| 166 |
-
The source-of-truth artifact index is at
|
| 167 |
-
[`docs/data/artifact_index.json`](docs/data/artifact_index.json).
|
| 168 |
-
For a human-readable artifact map, use
|
| 169 |
-
[`ARTIFACT_GUIDE.md`](ARTIFACT_GUIDE.md).
|
| 170 |
-
For reproduction commands and expected outputs, use
|
| 171 |
-
[`REPRODUCIBILITY.md`](REPRODUCIBILITY.md) and
|
| 172 |
-
[`docs/data/reproducibility_matrix.json`](docs/data/reproducibility_matrix.json).
|
| 173 |
-
Project citation and machine-readable metadata live in
|
| 174 |
-
[`CITATION.cff`](CITATION.cff), [`codemeta.json`](codemeta.json), and
|
| 175 |
-
[`docs/data/project_manifest.json`](docs/data/project_manifest.json).
|
| 176 |
-
The upstream dataset-card alignment note is
|
| 177 |
-
[`XPERIENCE10M_DATASET_CARD_ALIGNMENT.md`](XPERIENCE10M_DATASET_CARD_ALIGNMENT.md),
|
| 178 |
-
with a machine-readable copy at
|
| 179 |
-
[`docs/data/xperience10m_dataset_card_alignment.json`](docs/data/xperience10m_dataset_card_alignment.json).
|
| 180 |
-
The generated source-alignment note is at
|
| 181 |
-
[`SOURCE_ALIGNMENT_AUDIT.md`](SOURCE_ALIGNMENT_AUDIT.md) and
|
| 182 |
-
[`docs/data/source_alignment_audit.json`](docs/data/source_alignment_audit.json).
|
| 183 |
-
The generated figure index is at
|
| 184 |
-
[`FIGURE_INDEX.md`](FIGURE_INDEX.md) and
|
| 185 |
-
[`docs/data/figure_index.json`](docs/data/figure_index.json).
|
| 186 |
-
The project logo system is packaged by
|
| 187 |
-
[`scripts/build_brand_assets.py`](scripts/build_brand_assets.py), stored under
|
| 188 |
-
[`docs/assets/brand/`](docs/assets/brand/), and indexed in
|
| 189 |
-
[`docs/data/brand_assets.json`](docs/data/brand_assets.json).
|
| 190 |
|
| 191 |
## Project Status
|
| 192 |
|
|
@@ -200,10 +128,9 @@ They give the current research state in one compact table:
|
|
| 200 |
| Public-sample pipeline | Verified on one public sample episode: 5,821 frames, 1,161 windows, 8,546 dimensions |
|
| 201 |
| 12-task suite | Verified minimal baselines with committed metrics, predictions, and manifests |
|
| 202 |
| Neural heads | Verified compact PyTorch MLP heads over the same task contracts and chronological splits |
|
| 203 |
-
|
|
| 204 |
-
| Source alignment | Source facts, sample details, API-listing notes, and project coverage are consistent across repo, website, and HF cards |
|
| 205 |
| Evaluation protocol | Verified generated protocol for windowing, split policy, leakage controls, and per-task metrics |
|
| 206 |
-
| Website and
|
| 207 |
| Qwen3-Omni multi-episode pilot | The gated Xperience-10M dataset is available for selected 128-episode preparation, with full metrics pending completed preprocessing, training, and held-out evaluation |
|
| 208 |
| Raw Xperience-10M data / full Qwen weights | Not redistributed |
|
| 209 |
|
|
@@ -213,33 +140,31 @@ If you are reading the project cold, open these in order:
|
|
| 213 |
|
| 214 |
| Step | Question | Primary artifacts | What should be true |
|
| 215 |
| --- | --- | --- | --- |
|
| 216 |
-
| 1 | What
|
| 217 |
-
| 2 | What is
|
| 218 |
-
| 3 |
|
| 219 |
-
| 4 |
|
| 220 |
-
| 5 |
|
| 221 |
-
| 6 | What
|
| 222 |
-
| 7 | Which
|
| 223 |
-
| 8 |
|
| 224 |
-
| 9 |
|
| 225 |
-
| 10 |
|
| 226 |
-
| 11 |
|
| 227 |
-
|
| 228 |
-
|
| 229 |
-
The machine-readable project packet is
|
| 230 |
[`docs/data/project_packet.json`](docs/data/project_packet.json).
|
| 231 |
|
| 232 |
-
##
|
| 233 |
|
| 234 |
-
[`
|
| 235 |
-
|
| 236 |
-
|
| 237 |
-
|
| 238 |
-
are checked for presence and size rather than treated as fixed hashes.
|
| 239 |
|
| 240 |
-
[`
|
| 241 |
-
|
| 242 |
-
|
| 243 |
|
| 244 |
## Evaluation Protocol
|
| 245 |
|
|
@@ -256,41 +181,20 @@ generated from committed metric artifacts. They define:
|
|
| 256 |
audio-visual learning, pixel-depth reconstruction, and real held-out
|
| 257 |
multi-episode Qwen3-Omni quality.
|
| 258 |
|
| 259 |
-
##
|
| 260 |
|
| 261 |
The official [`ropedia-ai/xperience-10m`](https://huggingface.co/datasets/ropedia-ai/xperience-10m)
|
| 262 |
-
|
| 263 |
-
|
| 264 |
-
|
| 265 |
-
|
| 266 |
-
language; `other` license; and manually reviewed non-commercial access.
|
| 267 |
-
|
| 268 |
-
At full scale, the official card describes about 10 million experience units,
|
| 269 |
-
about 10,000 hours, six RGB streams per episode, audio, stereo depth, camera
|
| 270 |
-
pose/SLAM, hand and full-body mocap, IMU, captions, metadata, and calibration.
|
| 271 |
-
The card also reports headline counts such as billions of RGB/depth/IMU records
|
| 272 |
-
and large caption/object annotations. The live HF page/API separately shows a
|
| 273 |
-
31.9 TB currently hosted file-size display; this is kept separate from the
|
| 274 |
-
card's about-1PB full-scale storage statement. This repo records those upstream facts in
|
| 275 |
-
[`XPERIENCE10M_DATASET_CARD_ALIGNMENT.md`](XPERIENCE10M_DATASET_CARD_ALIGNMENT.md)
|
| 276 |
-
and [`docs/data/xperience10m_dataset_card_alignment.json`](docs/data/xperience10m_dataset_card_alignment.json).
|
| 277 |
|
| 278 |
-
|
| 279 |
-
|
| 280 |
-
|
| 281 |
-
|
| 282 |
-
Those counts are upstream listing metadata only; they are not local downloads,
|
| 283 |
-
not redistributed files, and not evidence of model quality in this repo.
|
| 284 |
|
| 285 |
-
The
|
| 286 |
-
[`ropedia-ai/xperience-10m-sample`](https://huggingface.co/datasets/ropedia-ai/xperience-10m-sample),
|
| 287 |
-
is separately documented as `Xperience-10M-Sample` with sample metadata,
|
| 288 |
-
`cc-by-nc-4.0` license, HOMIE Toolkit usage, and Rerun 0.29.0 `.rrd`
|
| 289 |
-
visualization. This project preserves that distinction: the sample powers the
|
| 290 |
-
current 5,821-frame task suite, while the full gated dataset is the source for
|
| 291 |
-
the selected 128-episode held-out multi-episode pilot now in preparation.
|
| 292 |
-
|
| 293 |
-
This repo's current verified subset is much smaller and intentionally explicit:
|
| 294 |
|
| 295 |
- one public sample episode, 5,821 frames, and 1,161 aligned windows,
|
| 296 |
- raw sample files with six MP4 video streams and audio streams,
|
|
@@ -299,15 +203,11 @@ This repo's current verified subset is much smaller and intentionally explicit:
|
|
| 299 |
- an 8,546-dimensional baseline representation using video, audio, depth,
|
| 300 |
pose/SLAM, mocap, IMU, calibration, and language-derived signals.
|
| 301 |
|
| 302 |
-
|
| 303 |
-
|
| 304 |
-
|
| 305 |
-
|
| 306 |
-
|
| 307 |
-
dataset is limited in diversity and showcase/production quality, and it should
|
| 308 |
-
not be used for identity recognition, re-identification, biometric profiling,
|
| 309 |
-
surveillance, sensitive attribute inference, or safety-critical deployment
|
| 310 |
-
without appropriate safeguards.
|
| 311 |
|
| 312 |
Start with the visual dashboard:
|
| 313 |
|
|
@@ -323,22 +223,15 @@ Hugging Face Space app:
|
|
| 323 |
| --- | --- | --- |
|
| 324 |
| Project status | `PROJECT_STATUS.md`, `docs/data/project_status.json` | Gives a one-table current project summary before reading the full artifact trail |
|
| 325 |
| Data contract | `windows.csv`, `feature_manifest.json`, modality manifests | Confirms what each sample window contains before modeling |
|
| 326 |
-
|
|
| 327 |
-
|
|
| 328 |
-
| Figure index | `FIGURE_INDEX.md`, `docs/data/figure_index.json` | Indexes public figures, charts, modality thumbnails, dimensions, hashes, and source scripts |
|
| 329 |
-
| Brand assets | `docs/data/brand_assets.json`, `docs/assets/brand/` | Indexes the generated logo, favicon, README/HF card image, app icon, and social preview |
|
| 330 |
| Evaluation protocol | `EVALUATION_PROTOCOL.md`, `docs/data/evaluation_protocol.json` | Defines the task unit, split, metrics, leakage controls, and current limitations |
|
| 331 |
-
|
|
| 332 |
-
|
|
| 333 |
-
| Research roadmap | `RESEARCH_ROADMAP.md`, `docs/data/research_roadmap.json` | Shows the path from sample-level task development to multi-episode and larger omni-model work |
|
| 334 |
| Minimal heads | softmax, ridge projection/regression, multi-label logistic heads | Keeps every input/output contract visible and inspectable |
|
| 335 |
| Neural heads | PyTorch MLP classifiers/regressors under `neural_mlp/` | Checks whether nonlinear heads improve each task without changing features |
|
| 336 |
| Evidence | metrics, predictions, confusion matrices, diagrams, dashboard | Makes the single-episode task development inspectable without rerunning first |
|
| 337 |
-
|
|
| 338 |
-
| Live publication status | `docs/data/live_publication_status.json` | Records the last live GitHub Pages, GitHub raw, and Hugging Face mirror verification |
|
| 339 |
-
| Public bundle contents | `docs/data/publication_audit.json` | Summarizes public bundle contents, raw Xperience-10M data exclusion, cache exclusion, archive exclusion, credential-text checks, and public-card figure references |
|
| 340 |
-
| Artifact index | `docs/data/artifact_index.json` | Gives readers a compact source-of-truth catalog with stable hashes |
|
| 341 |
-
| Artifact guide | `ARTIFACT_GUIDE.md` | Groups the public evidence into research-project layers |
|
| 342 |
| Reproducibility contract | `REPRODUCIBILITY.md`, `docs/data/reproducibility_matrix.json` | States public commands, expected outputs, exact-match reproduction evidence, and non-reproducible boundaries |
|
| 343 |
| Citation metadata | `CITATION.cff`, `codemeta.json`, `LICENSE` | Makes the repo easier to cite, index, and reuse without confusing code license and dataset terms |
|
| 344 |
|
|
@@ -421,12 +314,12 @@ scripts/
|
|
| 421 |
export_modality_atlas_assets.py # exports responsive modality-card assets
|
| 422 |
render_overview_figures.py # renders polished pipeline/architecture PNGs
|
| 423 |
build_brand_assets.py # derives logo sizes, favicon, social card
|
| 424 |
-
build_artifact_index.py # builds the
|
| 425 |
build_quality_gates.py # builds release checks
|
| 426 |
validate_mirror_parity.py # checks prepared GitHub/HF mirror file parity
|
| 427 |
-
validate_scope_claims.py #
|
| 428 |
validate_task_surface.py # checks readable task cards and interactive storyboard wiring
|
| 429 |
-
validate_website_integrity.py # checks local site links, anchors,
|
| 430 |
validate_publication_package.py # checks public repo + HF bundle contents
|
| 431 |
publish_hf_bundles.py # uploads prepared HF Space/artifact/model bundles
|
| 432 |
omni/
|
|
@@ -454,11 +347,9 @@ docs/
|
|
| 454 |
data/artifact_index.json # compact project-artifact catalog
|
| 455 |
data/live_publication_status.json # live GitHub/HF publication verification
|
| 456 |
data/quality_gates.json # machine-readable release checks
|
| 457 |
-
data/publication_audit.json # machine-readable public bundle report
|
| 458 |
data/task_surface_integrity.json # machine-readable task-card/storyboard integrity check
|
| 459 |
-
data/website_integrity.json # machine-readable website integrity check
|
| 460 |
data/project_manifest.json # machine-readable public-surface metadata
|
| 461 |
-
data/project_packet.json #
|
| 462 |
data/research_roadmap.json # multi-episode and omni-model roadmap
|
| 463 |
data/research_directions.json # four-track website data bundle
|
| 464 |
data/research_direction_extensions.json # four extra probe data bundle
|
|
@@ -671,6 +562,59 @@ uses the same split guard, exports episodes in parallel CPU shards, skips and
|
|
| 671 |
reports episodes that contain no labeled windows under the configured label
|
| 672 |
rule, then launches Qwen3-Omni LoRA with `NUM_PROCESSES=8`.
|
| 673 |
|
| 674 |
### Uploading the pilot Qwen3-Omni LoRA
|
| 675 |
|
| 676 |
A prepared upload package is available at `results/omni_finetune/hf_upload`.
|
|
@@ -697,11 +641,23 @@ assuming one backbone solves every Xperience-10M objective.
|
|
| 697 |
| GR00T | Humanoid/action-policy branch | Use after mocap/contact retargeting creates well-defined humanoid action targets. |
|
| 698 |
| OpenVLA / openpi | Open VLA/policy baselines | Use after the project defines robot-compatible or action-token targets. |
|
| 699 |
| Gemini Robotics | External reasoning reference | Use only for qualitative comparison or annotation support unless local trainable access exists. |
|
|
|
|
| 700 |
|
| 701 |
See [`FOUNDATION_MODEL_PLAN.md`](FOUNDATION_MODEL_PLAN.md) and
|
| 702 |
[`docs/data/foundation_model_plan.json`](docs/data/foundation_model_plan.json)
|
| 703 |
for the full selection matrix, source links, and model-specific evaluation
|
| 704 |
-
additions.
|
| 705 |
|
| 706 |
## Four Research Directions
|
| 707 |
|
|
|
|
| 42 |
| Multimodal data understanding | Parses the public sample into synchronized windows across video, audio, depth, pose/SLAM, mocap, IMU, calibration, and language-derived signals |
|
| 43 |
| Task design | Defines 12 human-readable tasks plus four direction-extension probes with inputs, outputs, process modules, metrics, and case-study walkthroughs |
|
| 44 |
| Model and evaluation discipline | Runs minimal and compact neural baselines, records predictions/metrics, keeps chronological split boundaries explicit, and separates sample evidence from held-out claims |
|
| 45 |
+
| Scale-up planning | Connects the public-sample pipeline to 32/128-episode held-out pilots, Qwen3-Omni LoRA, Cosmos-style world-model branches, policy-model branches, and the future Xperience-native foundation-model pretraining goal |
|
| 46 |
|
| 47 |
## Start Here
|
| 48 |
|
|
|
|
| 59 |
| Navigate the 12 tasks, four tracks, and scale-up plan | [Interactive research roadmap](https://chaoyue0307.github.io/ropedia-xperience-10m-task-suite/research_roadmap.html), [`docs/data/research_roadmap_interactive.json`](docs/data/research_roadmap_interactive.json) |
|
| 60 |
| Compare current task metrics | [`RESEARCH_TAKEAWAYS.md`](RESEARCH_TAKEAWAYS.md), [`docs/data/summary_metrics.json`](docs/data/summary_metrics.json) |
|
| 61 |
| Compare possible foundation backbones | [`FOUNDATION_MODEL_PLAN.md`](FOUNDATION_MODEL_PLAN.md), [`docs/data/foundation_model_plan.json`](docs/data/foundation_model_plan.json) |
|
| 62 |
+
| Understand the future native pretraining goal | [`XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md`](XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md) |
|
| 63 |
| Understand one model input | [`results/episode_task_suite/feature_manifest.json`](results/episode_task_suite/feature_manifest.json), [`results/episode_task_suite/windows.csv`](results/episode_task_suite/windows.csv) |
|
| 64 |
| Check multi-episode data status | [`results/omni_finetune/DATA_ACCESS_STATUS.md`](results/omni_finetune/DATA_ACCESS_STATUS.md) |
|
| 65 |
|
|
|
|
| 72 |
| Task suite | 12 human-readable embodied-AI task contracts with input, process, output, metrics, predictions, and case-study walkthroughs |
|
| 73 |
| Baselines | Minimal linear/ridge/logistic heads plus compact PyTorch MLP task heads over the same chronological split |
|
| 74 |
| Research directions | Task mapping and extension probes for human modeling, 3D/4D reconstruction, egocentric interaction, and world modeling |
|
| 75 |
+
| Scale-up path | The gated Xperience-10M dataset is available for preparing a selected 128-episode pilot ahead of Qwen3-Omni LoRA training, followed by Cosmos 3/world-model and VLA/policy branches; the long-term goal is an Xperience-native embodied foundation model if full-corpus data, storage, and compute are available |
|
| 76 |
| Public surfaces | GitHub repo, GitHub Pages dashboard, HF Space, HF artifact dataset, HF baseline-model repo, and HF collection |
|
| 77 |
|
| 78 |
For the fastest interpretation of the current metrics, start with
|
|
|
|
| 94 |
- human-readable research task cards and an interactive scrub/play walkthrough storyboard for every task,
|
| 95 |
- an interactive research roadmap connecting 12 tasks, four research tracks, current sample evidence, the Qwen3-Omni scale-up path, and foundation-model branch selection,
|
| 96 |
- a next-milestone track for Qwen3-Omni fine-tuning, Cosmos 3 world modeling, and sensor-bridge evaluation,
|
| 97 |
+
- a future pretraining plan for an Xperience Embodied Foundation Model over the full corpus after smaller multi-episode stages prove value,
|
| 98 |
- metrics, predictions, model weights, manifests, charts, and a two-level
|
| 99 |
tabbed static research website,
|
| 100 |
- a clear explanation of what is implemented now and what moves to the multi-episode stage.
|
| 101 |
|
| 102 |
## Current Research Scope
|
| 103 |
|
| 104 |
+
This project is best read as a staged embodied-AI research study:
|
|
|
|
| 105 |
|
| 106 |
+
| Layer | Current scope | Where to start |
|
| 107 |
| --- | --- | --- |
|
| 108 |
+
| Data understanding | One public Xperience-10M sample episode is converted into 5,821 frames, 1,161 aligned windows, and an 8,546-dimensional multimodal representation. | [`PROJECT_BRIEF.md`](PROJECT_BRIEF.md), [`PROJECT_STATUS.md`](PROJECT_STATUS.md) |
|
| 109 |
+
| Task suite | Twelve human-readable tasks cover action, procedure, contact, object, language, retrieval, reconstruction, order, and synchronization questions. | [`RESEARCH_TAKEAWAYS.md`](RESEARCH_TAKEAWAYS.md), [`results/episode_task_suite/summary_report.json`](results/episode_task_suite/summary_report.json) |
|
| 110 |
+
| Baselines | Minimal heads and compact PyTorch MLP heads provide a first controlled comparison on the same chronological split. | [`results/episode_task_suite/neural_mlp/`](results/episode_task_suite/neural_mlp/) |
|
| 111 |
+
| Diagnostics | Audio contribution, modality ablations, timeline overlays, object labels, and alignment stress tests show which signals are useful and which tasks remain hard. | [`results/audio_ablation/AUDIO_ABLATION_SUMMARY.md`](results/audio_ablation/AUDIO_ABLATION_SUMMARY.md), [`docs/single_episode_explorer.html`](docs/single_episode_explorer.html) |
|
| 112 |
+
| Scale-up | A selected 128-episode Qwen3-Omni LoRA pilot is being prepared from the gated dataset; held-out model metrics will be added only after training and evaluation finish. The long-term native-pretraining plan is documented separately as a future research goal. | [`RESEARCH_ROADMAP.md`](RESEARCH_ROADMAP.md), [`FOUNDATION_MODEL_PLAN.md`](FOUNDATION_MODEL_PLAN.md), [`XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md`](XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md), [`results/omni_finetune/DATA_ACCESS_STATUS.md`](results/omni_finetune/DATA_ACCESS_STATUS.md) |
|
| 113 |
+
|
| 114 |
+
Detailed dataset notes, reproduction checks, and generated JSON reports are
|
| 115 |
+
included for readers who want to inspect the implementation, but they are
|
| 116 |
+
supporting materials rather than the main reading path. Use
|
| 117 |
+
[`ARTIFACT_GUIDE.md`](ARTIFACT_GUIDE.md) when you want the full file map.
|
| 118 |
|
| 119 |
## Project Status
|
| 120 |
|
|
|
|
| 128 |
| Public-sample pipeline | Verified on one public sample episode: 5,821 frames, 1,161 windows, 8,546 dimensions |
|
| 129 |
| 12-task suite | Verified minimal baselines with committed metrics, predictions, and manifests |
|
| 130 |
| Neural heads | Verified compact PyTorch MLP heads over the same task contracts and chronological splits |
|
| 131 |
+
| Dataset context | Official Xperience-10M links, sample-vs-gated-data boundary, modality coverage, and redistribution policy are documented |
|
|
|
|
| 132 |
| Evaluation protocol | Verified generated protocol for windowing, split policy, leakage controls, and per-task metrics |
|
| 133 |
+
| Website and Hub pages | Public dashboard, Hugging Face Space, artifact dataset, baseline model repo, and collection use the same project framing and links |
|
| 134 |
| Qwen3-Omni multi-episode pilot | The gated Xperience-10M dataset is available for selected 128-episode preparation, with full metrics pending completed preprocessing, training, and held-out evaluation |
|
| 135 |
| Raw Xperience-10M data / full Qwen weights | Not redistributed |
|
| 136 |
|
|
|
|
| 140 |
|
| 141 |
| Step | Question | Primary artifacts | What should be true |
|
| 142 |
| --- | --- | --- | --- |
|
| 143 |
+
| 1 | What is this project? | [`PROJECT_BRIEF.md`](PROJECT_BRIEF.md), [`PROJECT_STATUS.md`](PROJECT_STATUS.md), [dashboard](https://chaoyue0307.github.io/ropedia-xperience-10m-task-suite/) | A public-sample Xperience-10M research project with 12 tasks, baselines, and a scale-up plan. |
|
| 144 |
+
| 2 | What data is used? | [`XPERIENCE10M_DATASET_CARD_ALIGNMENT.md`](XPERIENCE10M_DATASET_CARD_ALIGNMENT.md), [official HF dataset](https://huggingface.co/datasets/ropedia-ai/xperience-10m), [sample HF dataset](https://huggingface.co/datasets/ropedia-ai/xperience-10m-sample) | The implemented suite uses one public sample episode; the gated dataset is reserved for selected multi-episode training. |
|
| 145 |
+
| 3 | What does one model input contain? | [`windows.csv`](results/episode_task_suite/windows.csv), [`feature_manifest.json`](results/episode_task_suite/feature_manifest.json), [`available_modalities.json`](results/episode_task_suite/available_modalities.json) | Each window is an aligned multimodal unit with video, audio, depth, pose/SLAM, mocap, IMU, calibration, and language-derived signals. |
|
| 146 |
+
| 4 | What are the 12 tasks? | [`results/episode_task_suite/task_walkthroughs/`](results/episode_task_suite/task_walkthroughs/), [`docs/data/task_walkthroughs.json`](docs/data/task_walkthroughs.json) | Every task has a human-readable name, case study, input, process modules, output, metric, and limitation. |
|
| 147 |
+
| 5 | How are tasks evaluated? | [`EVALUATION_PROTOCOL.md`](EVALUATION_PROTOCOL.md), [`docs/data/evaluation_protocol.json`](docs/data/evaluation_protocol.json) | The window unit, chronological split, leakage controls, task metrics, and current limitations are explicit. |
|
| 148 |
+
| 6 | What do the current results mean? | [`RESEARCH_TAKEAWAYS.md`](RESEARCH_TAKEAWAYS.md), [`docs/data/research_takeaways.json`](docs/data/research_takeaways.json), [`docs/data/summary_metrics.json`](docs/data/summary_metrics.json) | Current metrics describe sample-level task behavior and identify which signals need larger held-out experiments. |
|
| 149 |
+
| 7 | Which models are implemented? | [`results/episode_task_suite/summary_report.json`](results/episode_task_suite/summary_report.json), [`results/episode_task_suite/neural_mlp/`](results/episode_task_suite/neural_mlp/), [HF baseline repo](https://huggingface.co/cy0307/ropedia-xperience-10m-task-baselines) | Each task has minimal and neural-head evidence over the same feature windows. |
|
| 150 |
+
| 8 | What research directions does this support? | [`RESEARCH_ROADMAP.md`](RESEARCH_ROADMAP.md), [`docs/data/research_directions.json`](docs/data/research_directions.json), [`docs/data/research_direction_extensions.json`](docs/data/research_direction_extensions.json) | The tasks are mapped to human modeling, 3D/4D reconstruction, egocentric interaction, and world modeling. |
|
| 151 |
+
| 9 | Which foundation model comes next? | [`FOUNDATION_MODEL_PLAN.md`](FOUNDATION_MODEL_PLAN.md), [`docs/data/foundation_model_plan.json`](docs/data/foundation_model_plan.json), [`XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md`](XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md) | Qwen3-Omni is the first held-out LoRA baseline; Cosmos 3 is the first world-model branch; policy models wait for explicit action targets; Xperience-native pretraining is the full-corpus future goal. |
|
| 152 |
+
| 10 | How do I reproduce it? | [`REPRODUCIBILITY.md`](REPRODUCIBILITY.md), [`notes/reproducibility_audit.md`](notes/reproducibility_audit.md) | Public commands and expected outputs are documented for the sample-episode task suite. |
|
| 153 |
+
| 11 | What is still pending? | [`DATA_ACCESS_STATUS.md`](results/omni_finetune/DATA_ACCESS_STATUS.md), [`MULTI_EPISODE_ACCESS_STATUS.md`](results/omni_finetune/MULTI_EPISODE_ACCESS_STATUS.md) | Multi-episode Qwen3-Omni model quality will be reported after preprocessing, training, and held-out evaluation complete. |
|
| 154 |
+
|
| 155 |
+
A compact reader-path summary is available at
|
|
|
|
| 156 |
[`docs/data/project_packet.json`](docs/data/project_packet.json).
|
| 157 |
|
| 158 |
+
## Supporting Files
|
| 159 |
|
| 160 |
+
[`ARTIFACT_GUIDE.md`](ARTIFACT_GUIDE.md) is the human-readable map for readers
|
| 161 |
+
who want to inspect the project files after the first pass. It groups the main
|
| 162 |
+
briefs, task outputs, baseline results, visual assets, data notes, and
|
| 163 |
+
scale-up documents.
|
|
|
|
| 164 |
|
| 165 |
+
[`docs/data/artifact_index.json`](docs/data/artifact_index.json) is the compact
|
| 166 |
+
machine-readable companion used by the website and Hugging Face artifact
|
| 167 |
+
dataset.
|
| 168 |
|
| 169 |
## Evaluation Protocol
|
| 170 |
|
|
|
|
| 181 |
audio-visual learning, pixel-depth reconstruction, and real held-out
|
| 182 |
multi-episode Qwen3-Omni quality.
|
| 183 |
|
| 184 |
+
## Dataset Context
|
| 185 |
|
| 186 |
The official [`ropedia-ai/xperience-10m`](https://huggingface.co/datasets/ropedia-ai/xperience-10m)
|
| 187 |
+
dataset is a gated large-scale egocentric multimodal dataset for embodied AI,
|
| 188 |
+
robotics, spatial intelligence, and world modeling. The public
|
| 189 |
+
[`ropedia-ai/xperience-10m-sample`](https://huggingface.co/datasets/ropedia-ai/xperience-10m-sample)
|
| 190 |
+
repo provides the sample episode used for the implemented task suite here.
|
| 191 |
|
| 192 |
+
This project keeps those layers separate: the public sample supports the
|
| 193 |
+
current 12-task study, while the gated full dataset is used only for the
|
| 194 |
+
selected multi-episode Qwen3-Omni pilot. Raw Xperience-10M MP4/HDF5/RRD files
|
| 195 |
+
are not redistributed in this repo or in the Hugging Face mirrors.
|
| 196 |
|
| 197 |
+
The current verified public-sample subset is:
|
| 198 |
|
| 199 |
- one public sample episode, 5,821 frames, and 1,161 aligned windows,
|
| 200 |
- raw sample files with six MP4 video streams and audio streams,
|
|
|
|
| 203 |
- an 8,546-dimensional baseline representation using video, audio, depth,
|
| 204 |
pose/SLAM, mocap, IMU, calibration, and language-derived signals.
|
| 205 |
|
| 206 |
+
Detailed dataset notes are available in
|
| 207 |
+
[`XPERIENCE10M_DATASET_CARD_ALIGNMENT.md`](XPERIENCE10M_DATASET_CARD_ALIGNMENT.md)
|
| 208 |
+
for readers who need the full upstream-card and access-term context. The
|
| 209 |
+
practical boundary is simple: current results come from the public sample, and
|
| 210 |
+
multi-episode model quality is pending the selected held-out pilot.
|
| 211 |
|
| 212 |
Start with the visual dashboard:
|
| 213 |
|
|
|
|
| 223 |
| --- | --- | --- |
|
| 224 |
| Project status | `PROJECT_STATUS.md`, `docs/data/project_status.json` | Gives a one-table current project summary before reading the full artifact trail |
|
| 225 |
| Data contract | `windows.csv`, `feature_manifest.json`, modality manifests | Confirms what each sample window contains before modeling |
|
| 226 |
+
| Dataset context | `XPERIENCE10M_DATASET_CARD_ALIGNMENT.md`, official dataset links | Explains the official dataset, public sample, modalities, access boundary, and what this repo uses |
|
| 227 |
+
| Visual assets | `FIGURE_INDEX.md`, `docs/assets/` | Shows the task-suite graphic, modality thumbnails, pipeline diagrams, charts, and logo assets |
|
| 228 |
| Evaluation protocol | `EVALUATION_PROTOCOL.md`, `docs/data/evaluation_protocol.json` | Defines the task unit, split, metrics, leakage controls, and current limitations |
|
| 229 |
+
| Research roadmap | `RESEARCH_ROADMAP.md`, `docs/data/research_roadmap.json` | Shows the path from sample-level task development to multi-episode work, larger model branches, and the future native-pretraining goal |
|
| 230 |
+
| Xperience Embodied Foundation Model plan | `XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md` | Describes the long-term full-corpus pretraining goal, target modules, objectives, staged scale-up, hardware ranges, and evaluation protocol |
|
|
|
|
| 231 |
| Minimal heads | softmax, ridge projection/regression, multi-label logistic heads | Keeps every input/output contract visible and inspectable |
|
| 232 |
| Neural heads | PyTorch MLP classifiers/regressors under `neural_mlp/` | Checks whether nonlinear heads improve each task without changing features |
|
| 233 |
| Evidence | metrics, predictions, confusion matrices, diagrams, dashboard | Makes the single-episode task development inspectable without rerunning first |
|
| 234 |
+
| Artifact guide | `ARTIFACT_GUIDE.md` | Groups the public evidence into research-project layers after the first-pass overview |
|
| 235 |
| Reproducibility contract | `REPRODUCIBILITY.md`, `docs/data/reproducibility_matrix.json` | States public commands, expected outputs, exact-match reproduction evidence, and non-reproducible boundaries |
|
| 236 |
| Citation metadata | `CITATION.cff`, `codemeta.json`, `LICENSE` | Makes the repo easier to cite, index, and reuse without confusing code license and dataset terms |
|
| 237 |
|
|
|
|
| 314 |
export_modality_atlas_assets.py # exports responsive modality-card assets
|
| 315 |
render_overview_figures.py # renders polished pipeline/architecture PNGs
|
| 316 |
build_brand_assets.py # derives logo sizes, favicon, social card
|
| 317 |
+
build_artifact_index.py # builds the compact artifact guide data
|
| 318 |
build_quality_gates.py # builds release checks
|
| 319 |
validate_mirror_parity.py # checks prepared GitHub/HF mirror file parity
|
| 320 |
+
validate_scope_claims.py # separates setup artifacts from completed model metrics
|
| 321 |
validate_task_surface.py # checks readable task cards and interactive storyboard wiring
|
| 322 |
+
validate_website_integrity.py # checks local site links, anchors, and images
|
| 323 |
validate_publication_package.py # checks public repo + HF bundle contents
|
| 324 |
publish_hf_bundles.py # uploads prepared HF Space/artifact/model bundles
|
| 325 |
omni/
|
|
|
|
| 347 |
data/artifact_index.json # compact project-artifact catalog
|
| 348 |
data/live_publication_status.json # live GitHub/HF publication verification
|
| 349 |
data/quality_gates.json # machine-readable release checks
|
|
|
|
| 350 |
data/task_surface_integrity.json # machine-readable task-card/storyboard integrity check
|
|
|
|
| 351 |
data/project_manifest.json # machine-readable public-surface metadata
|
| 352 |
+
data/project_packet.json # compact project path and scope summary
|
| 353 |
data/research_roadmap.json # multi-episode and omni-model roadmap
|
| 354 |
data/research_directions.json # four-track website data bundle
|
| 355 |
data/research_direction_extensions.json # four extra probe data bundle
|
|
|
|
| 562 |
reports episodes that contain no labeled windows under the configured label
|
| 563 |
rule, then launches Qwen3-Omni LoRA with `NUM_PROCESSES=8`.
|
| 564 |
|
| 565 |
+
### Full 128-Episode Held-Out Pilot
|
| 566 |
+
|
| 567 |
+
Once all selected episodes are complete, use the fixed selected-episode split:
|
| 568 |
+
|
| 569 |
+
- 96 train episodes,
|
| 570 |
+
- 16 validation episodes,
|
| 571 |
+
- 16 held-out test episodes.
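
The split above only works as a held-out protocol if the three groups are the stated sizes and share no episodes. The following is a minimal sketch of such a check, not the project's actual split guard: the function name `check_split` and the `ep000`-style episode ids are hypothetical, and the real selection file may use a different layout.

```python
def check_split(train, val, test, expected=(96, 16, 16)):
    """Return a list of problems found in a train/val/test episode split.

    Hypothetical helper: checks group sizes, duplicates inside a group,
    and episode ids shared between groups.
    """
    problems = []
    parts = {"train": list(train), "val": list(val), "test": list(test)}
    for (name, ids), want in zip(parts.items(), expected):
        if len(ids) != want:
            problems.append(f"{name}: expected {want} episodes, got {len(ids)}")
        if len(set(ids)) != len(ids):
            problems.append(f"{name}: duplicate episode ids")
    names = list(parts)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            shared = set(parts[a]) & set(parts[b])
            if shared:
                problems.append(f"{a}/{b}: {len(shared)} shared episode ids")
    return problems


# Illustrative ids only; real episode names come from the selection JSON.
episodes = [f"ep{i:03d}" for i in range(128)]
clean = check_split(episodes[:96], episodes[96:112], episodes[112:])
leaky = check_split(episodes[:96], episodes[95:111], episodes[112:])
```

Here `clean` is an empty list, while `leaky` reports the one episode that appears in both train and validation. Returning a list of problems rather than raising on the first one makes it easier to log every issue before a long export starts.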
|
| 572 |
+
|
| 573 |
+
The clean full-run launcher validates the selected split, exports all splits in
|
| 574 |
+
parallel, trains Qwen3-Omni LoRA on train/val only, then evaluates on the
|
| 575 |
+
held-out test split:
|
| 576 |
+
|
| 577 |
+
```bash
|
| 578 |
+
RUN_ID=xperience10m_qwen3_omni_128ep_fullsplit_fast8gpu \
|
| 579 |
+
DATA_ROOT=/path/to/xperience10m_128 \
|
| 580 |
+
SELECTION_JSON=results/omni_finetune/xperience10m_128_episode_selection.json \
|
| 581 |
+
MODEL_DIR=/path/to/Qwen__Qwen3-Omni-30B-A3B-Instruct \
|
| 582 |
+
NUM_PROCESSES=8 \
|
| 583 |
+
scripts/omni/run_128_fullsplit_parallel_export_8gpu.sh
|
| 584 |
+
```
|
| 585 |
+
|
| 586 |
+
Monitor the run with:
|
| 587 |
+
|
| 588 |
+
```bash
|
| 589 |
+
python scripts/omni/monitor_omni_progress.py \
|
| 590 |
+
--run-id xperience10m_qwen3_omni_128ep_fullsplit_fast8gpu
|
| 591 |
+
```
|
| 592 |
+
|
| 593 |
+
Validate the run artifacts stage by stage:
|
| 594 |
+
|
| 595 |
+
```bash
|
| 596 |
+
python scripts/omni/validate_omni_finetune_run.py \
|
| 597 |
+
--run-id xperience10m_qwen3_omni_128ep_fullsplit_fast8gpu \
|
| 598 |
+
--require-stage manifest
|
| 599 |
+
|
| 600 |
+
python scripts/omni/validate_omni_finetune_run.py \
|
| 601 |
+
--run-id xperience10m_qwen3_omni_128ep_fullsplit_fast8gpu \
|
| 602 |
+
--require-stage eval \
|
| 603 |
+
--min-json-validity 0.98
|
| 604 |
+
```
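
The `--min-json-validity 0.98` threshold suggests the evaluation stage checks how often model outputs parse as JSON. The exact definition used by `validate_omni_finetune_run.py` is not shown here, so the sketch below is only one plausible reading: it assumes validity means "the output string parses to a JSON object", and the name `json_validity` and the sample outputs are hypothetical.

```python
import json


def json_validity(outputs):
    """Fraction of output strings that parse as a JSON object.

    Assumption: a top-level list or scalar does not count as valid,
    because task answers are expected to be keyed objects.
    """
    if not outputs:
        return 0.0
    ok = 0
    for text in outputs:
        try:
            ok += isinstance(json.loads(text), dict)
        except (json.JSONDecodeError, TypeError):
            pass  # unparseable or non-string output counts as invalid
    return ok / len(outputs)


# Illustrative outputs only; not taken from any real run.
samples = ['{"action": "pour"}', '{"action": "stir"}', "not json", "[1, 2]"]
rate = json_validity(samples)
passes = rate >= 0.98
```

With these four samples the rate is 0.5, so the 0.98 gate would fail. A gate of this kind is useful because downstream metrics are only meaningful for outputs that can be parsed at all.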
|
| 605 |
+
|
| 606 |
+
After dataset export, a model-neutral window index can be created for future
|
| 607 |
+
backbones:
|
| 608 |
+
|
| 609 |
+
```bash
|
| 610 |
+
python scripts/omni/export_model_neutral_window_index.py \
|
| 611 |
+
--dataset-jsonl results/omni_finetune/xperience10m_qwen3_omni_128ep_fullsplit_fast8gpu_dataset/dataset.jsonl
|
| 612 |
+
```
|
| 613 |
+
|
| 614 |
+
This produces `window_index.jsonl` and `window_index_manifest.json` so
|
| 615 |
+
Cosmos-style world models and VLA/policy branches can reuse the same split-checked
|
| 616 |
+
windows without depending on Qwen chat-message records.
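
A consumer of the window index would typically read it line by line and group windows by split. The sketch below shows that pattern under stated assumptions: the record fields `split`, `episode_id`, and `window` are hypothetical, since the real schema of `window_index.jsonl` is defined by the export script and is not reproduced here.

```python
import io
import json
from collections import defaultdict


def episodes_by_split(lines):
    """Group episode ids by split from JSONL records.

    Assumes each record has hypothetical "split" and "episode_id" keys.
    """
    groups = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        groups[record["split"]].add(record["episode_id"])
    return dict(groups)


# In-memory stand-in for a window index file; the values are illustrative.
demo = io.StringIO(
    '{"split": "train", "episode_id": "ep001", "window": 0}\n'
    '{"split": "train", "episode_id": "ep001", "window": 1}\n'
    '{"split": "test", "episode_id": "ep120", "window": 0}\n'
)
groups = episodes_by_split(demo)
overlap = groups["train"] & groups["test"]
```

An empty `overlap` confirms that no episode contributes windows to both train and test, which is the property a new backbone needs to inherit from the split-checked export. For a real file, pass an open file handle instead of the `io.StringIO` stand-in.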
|
| 617 |
+
|
| 618 |
### Uploading the pilot Qwen3-Omni LoRA
|
| 619 |
|
| 620 |
A prepared upload package is available at `results/omni_finetune/hf_upload`.
|
|
|
|
| 641 |
| GR00T | Humanoid/action-policy branch | Use after mocap/contact retargeting creates well-defined humanoid action targets. |
|
| 642 |
| OpenVLA / openpi | Open VLA/policy baselines | Use after the project defines robot-compatible or action-token targets. |
|
| 643 |
| Gemini Robotics | External reasoning reference | Use only for qualitative comparison or annotation support unless local trainable access exists. |
|
| 644 |
+
| Xperience Embodied Foundation Model | Future Xperience-native pretraining goal | Use only after multi-episode pilots, full-corpus storage, distributed training infrastructure, and scaling evidence justify a from-scratch domain model. |
|
| 645 |
|
| 646 |
See [`FOUNDATION_MODEL_PLAN.md`](FOUNDATION_MODEL_PLAN.md) and
|
| 647 |
[`docs/data/foundation_model_plan.json`](docs/data/foundation_model_plan.json)
|
| 648 |
for the full selection matrix, source links, and model-specific evaluation
|
| 649 |
+
additions. See
|
| 650 |
+
[`XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md`](XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md)
|
| 651 |
+
for the long-term full-corpus pretraining plan.
|
| 652 |
+
|
| 653 |
+
Backbone-specific contracts now live in [`configs/omni_backbones`](configs/omni_backbones).
|
| 654 |
+
The extension contract is documented in
|
| 655 |
+
[`OMNI_MODEL_EXTENSION_CONTRACT.md`](OMNI_MODEL_EXTENSION_CONTRACT.md), and the
|
| 656 |
+
registry can be checked with:
|
| 657 |
+
|
| 658 |
+
```bash
|
| 659 |
+
python scripts/omni/backbone_registry.py --validate --json
|
| 660 |
+
```
|
| 661 |
|
| 662 |
## Four Research Directions
|
| 663 |
|
PROJECT_STATUS.md
CHANGED
|
@@ -21,8 +21,9 @@ scale-up readiness; it is not presented as final full-dataset model quality.
|
|
| 21 |
| Neural heads | Verified | `scripts/neural_task_models.py`, `results/episode_task_suite/neural_mlp/` | Each task also has a compact PyTorch MLP run over the same feature tensor and chronological split. |
|
| 22 |
| Audio contribution study | Verified | `scripts/audio_ablation_and_raw_upgrade.py`, `results/audio_ablation/`, `docs/data/audio_ablation_summary.json` | Audio variants are compared across all 12 task contracts; audio improves the primary metric on 6 of 12 tasks, and a 588-d audio-window representation improves over the baseline audio variant on 6 of 12 tasks. |
|
| 23 |
| Research takeaways | Verified | `RESEARCH_TAKEAWAYS.md`, `docs/data/research_takeaways.json`, `scripts/build_research_takeaways.py` | The main result interpretation is generated from committed metrics: chronological class shift, neural gains on dynamics/order/alignment, open retrieval/reconstruction problems, and the need for held-out episodes. |
|
| 24 |
-
| Research roadmap | Current | `RESEARCH_ROADMAP.md`, `docs/data/research_roadmap.json` | The roadmap connects public-sample task development to 128-episode data preparation, Qwen3-Omni LoRA, foundation-model selection, robustness runs, and
|
| 25 |
| Foundation-model plan | Current | `FOUNDATION_MODEL_PLAN.md`, `docs/data/foundation_model_plan.json` | Qwen3-Omni remains the first trainable held-out LoRA baseline; Cosmos 3 is added as the first world-model/action-generation branch; OpenVLA/openpi/GR00T are policy candidates after action targets are explicit. |
|
|
|
|
| 26 |
| Evaluation protocol | Verified | `EVALUATION_PROTOCOL.md`, `docs/data/evaluation_protocol.json`, `scripts/build_evaluation_protocol.py` | Windowing, chronological split, per-task metrics, leakage controls, and current limitations are generated from committed metric artifacts. |
|
| 27 |
| Dataset context | Verified | `XPERIENCE10M_DATASET_CARD_ALIGNMENT.md`, official Xperience-10M and sample cards | The README and dashboard distinguish the public sample used here from the gated full dataset used for the selected multi-episode pilot. |
|
| 28 |
| Public dashboard and Hub pages | Verified | GitHub Pages, HF Space, artifact dataset, baseline model repo, Qwen3-Omni LoRA repo | Readers can move between the website, code, derived artifacts, baseline weights, and Qwen3-Omni pilot status without needing internal setup details. |
|
|
@@ -42,15 +43,17 @@ scale-up readiness; it is not presented as final full-dataset model quality.
|
|
| 42 |
the path from public-sample task work to multi-episode modeling.
|
| 43 |
5. Inspect `FOUNDATION_MODEL_PLAN.md` and
|
| 44 |
`docs/data/foundation_model_plan.json` before choosing a backbone branch.
|
| 45 |
-
6. Inspect `
|
| 46 |
`results/episode_task_suite/neural_mlp/` to check the 12-task outputs.
|
| 47 |
-
|
| 48 |
whether audio helps the current task suite.
|
| 49 |
-
|
| 50 |
controls.
|
| 51 |
-
|
| 52 |
detailed upstream dataset-card context.
|
| 53 |
-
|
| 54 |
Qwen3-Omni scale-up status.
|
| 55 |
|
| 56 |
## Current Reading Notes
|
|
@@ -67,3 +70,5 @@ scale-up readiness; it is not presented as final full-dataset model quality.
|
|
| 67 |
- Foundation-model selection is now explicit: Qwen3-Omni is the immediate
|
| 68 |
trainable pilot, Cosmos 3 is the first world-model branch, and policy models
|
| 69 |
such as OpenVLA/openpi/GR00T wait for action-target conversion.
|
|
|
|
|
|
|
|
|
| 21 |
| Neural heads | Verified | `scripts/neural_task_models.py`, `results/episode_task_suite/neural_mlp/` | Each task also has a compact PyTorch MLP run over the same feature tensor and chronological split. |
|
| 22 |
| Audio contribution study | Verified | `scripts/audio_ablation_and_raw_upgrade.py`, `results/audio_ablation/`, `docs/data/audio_ablation_summary.json` | Audio variants are compared across all 12 task contracts; audio improves the primary metric on 6 of 12 tasks, and a 588-d audio-window representation improves over the baseline audio variant on 6 of 12 tasks. |
|
| 23 |
| Research takeaways | Verified | `RESEARCH_TAKEAWAYS.md`, `docs/data/research_takeaways.json`, `scripts/build_research_takeaways.py` | The main result interpretation is generated from committed metrics: chronological class shift, neural gains on dynamics/order/alignment, open retrieval/reconstruction problems, and the need for held-out episodes. |
|
| 24 |
+
| Research roadmap | Current | `RESEARCH_ROADMAP.md`, `docs/data/research_roadmap.json` | The roadmap connects public-sample task development to 128-episode data preparation, Qwen3-Omni LoRA, foundation-model selection, robustness runs, world/policy branches, and the future Xperience-native pretraining goal. |
|
| 25 |
| Foundation-model plan | Current | `FOUNDATION_MODEL_PLAN.md`, `docs/data/foundation_model_plan.json` | Qwen3-Omni remains the first trainable held-out LoRA baseline; Cosmos 3 is added as the first world-model/action-generation branch; OpenVLA/openpi/GR00T are policy candidates after action targets are explicit. |
|
| 26 |
+
| Xperience Embodied Foundation Model | Future goal | `XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md` | A future full-corpus pretraining plan describes target modules, objectives, staged scale-up, hardware ranges, and evaluation for a domain-specific embodied foundation model. |
|
| 27 |
| Evaluation protocol | Verified | `EVALUATION_PROTOCOL.md`, `docs/data/evaluation_protocol.json`, `scripts/build_evaluation_protocol.py` | Windowing, chronological split, per-task metrics, leakage controls, and current limitations are generated from committed metric artifacts. |
|
| 28 |
| Dataset context | Verified | `XPERIENCE10M_DATASET_CARD_ALIGNMENT.md`, official Xperience-10M and sample cards | The README and dashboard distinguish the public sample used here from the gated full dataset used for the selected multi-episode pilot. |
|
| 29 |
| Public dashboard and Hub pages | Verified | GitHub Pages, HF Space, artifact dataset, baseline model repo, Qwen3-Omni LoRA repo | Readers can move between the website, code, derived artifacts, baseline weights, and Qwen3-Omni pilot status without needing internal setup details. |
|
|
|
|
| 43 |
the path from public-sample task work to multi-episode modeling.
|
| 44 |
5. Inspect `FOUNDATION_MODEL_PLAN.md` and
|
| 45 |
`docs/data/foundation_model_plan.json` before choosing a backbone branch.
|
| 46 |
+
6. Inspect `XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md` for the
|
| 47 |
+
long-term full-corpus pretraining goal.
|
| 48 |
+
7. Inspect `docs/data/summary_metrics.json` and
|
| 49 |
`results/episode_task_suite/neural_mlp/` to check the 12-task outputs.
|
| 50 |
+
8. Inspect `results/audio_ablation/AUDIO_ABLATION_SUMMARY.md` before judging
|
| 51 |
whether audio helps the current task suite.
|
| 52 |
+
9. Inspect `EVALUATION_PROTOCOL.md` before judging task metrics or leakage
|
| 53 |
controls.
|
| 54 |
+
10. Inspect `XPERIENCE10M_DATASET_CARD_ALIGNMENT.md` only if you need the
|
| 55 |
detailed upstream dataset-card context.
|
| 56 |
+
11. Inspect `results/omni_finetune/DATA_ACCESS_STATUS.md` before judging
|
| 57 |
Qwen3-Omni scale-up status.
|
| 58 |
|
| 59 |
## Current Reading Notes
|
|
|
|
| 70 |
- Foundation-model selection is now explicit: Qwen3-Omni is the immediate
|
| 71 |
trainable pilot, Cosmos 3 is the first world-model branch, and policy models
|
| 72 |
such as OpenVLA/openpi/GR00T wait for action-target conversion.
|
| 73 |
+
- The Xperience Embodied Foundation Model is a future native-pretraining goal,
|
| 74 |
+
not a completed model or current benchmark.
|
README.md
CHANGED
|
@@ -64,7 +64,7 @@ embodied-AI research infrastructure:
|
|
| 64 |
| Multimodal data understanding | Parses the public sample into synchronized windows across video, audio, depth, pose/SLAM, mocap, IMU, calibration, and language-derived signals |
|
| 65 |
| Task design | Defines 12 human-readable tasks plus four direction-extension probes with inputs, outputs, process modules, metrics, and case-study walkthroughs |
|
| 66 |
| Model and evaluation discipline | Runs minimal and compact neural baselines, records predictions/metrics, keeps chronological split boundaries explicit, and separates sample evidence from held-out claims |
|
| 67 |
-
| Scale-up planning | Connects the public-sample pipeline to 32/128-episode held-out pilots, Qwen3-Omni LoRA, Cosmos-style world-model branches, and
|
| 68 |
|
| 69 |
## Start Here
|
| 70 |
|
|
@@ -81,6 +81,7 @@ before the multi-episode omni-model stage becomes a real held-out evaluation.
|
|
| 81 |
| Navigate the 12 tasks, four tracks, and scale-up plan | [Interactive research roadmap](https://chaoyue0307.github.io/ropedia-xperience-10m-task-suite/research_roadmap.html), [`docs/data/research_roadmap_interactive.json`](docs/data/research_roadmap_interactive.json) |
|
| 82 |
| Compare current task metrics | [`RESEARCH_TAKEAWAYS.md`](RESEARCH_TAKEAWAYS.md), [`docs/data/summary_metrics.json`](docs/data/summary_metrics.json) |
|
| 83 |
| Compare possible foundation backbones | [`FOUNDATION_MODEL_PLAN.md`](FOUNDATION_MODEL_PLAN.md), [`docs/data/foundation_model_plan.json`](docs/data/foundation_model_plan.json) |
|
|
|
|
| 84 |
| Understand one model input | [`results/episode_task_suite/feature_manifest.json`](results/episode_task_suite/feature_manifest.json), [`results/episode_task_suite/windows.csv`](results/episode_task_suite/windows.csv) |
|
| 85 |
| Check multi-episode data status | [`results/omni_finetune/DATA_ACCESS_STATUS.md`](results/omni_finetune/DATA_ACCESS_STATUS.md) |
|
| 86 |
|
|
@@ -93,7 +94,7 @@ before the multi-episode omni-model stage becomes a real held-out evaluation.
|
|
| 93 |
| Task suite | 12 human-readable embodied-AI task contracts with input, process, output, metrics, predictions, and case-study walkthroughs |
|
| 94 |
| Baselines | Minimal linear/ridge/logistic heads plus compact PyTorch MLP task heads over the same chronological split |
|
| 95 |
| Research directions | Task mapping and extension probes for human modeling, 3D/4D reconstruction, egocentric interaction, and world modeling |
|
| 96 |
-
| Scale-up path | The gated Xperience-10M dataset is available for a selected 128-episode pilot before Qwen3-Omni LoRA, followed by Cosmos 3/world-model and VLA/policy branches |
|
| 97 |
| Public surfaces | GitHub repo, GitHub Pages dashboard, HF Space, HF artifact dataset, HF baseline-model repo, and HF collection |
|
| 98 |
|
| 99 |
For the fastest interpretation of the current metrics, start with
|
|
@@ -115,6 +116,7 @@ Current contributions:
|
|
| 115 |
- human-readable research task cards and an interactive scrub/play walkthrough storyboard for every task,
|
| 116 |
- an interactive research roadmap connecting 12 tasks, four research tracks, current sample evidence, the Qwen3-Omni scale-up path, and foundation-model branch selection,
|
| 117 |
- a next-milestone track for Qwen3-Omni fine-tuning, Cosmos 3 world modeling, and sensor-bridge evaluation,
|
|
|
|
| 118 |
- metrics, predictions, model weights, manifests, charts, and a two-level
|
| 119 |
tabbed static research website,
|
| 120 |
- a clear explanation of what is implemented now and what moves to the multi-episode stage.
|
|
@@ -129,7 +131,7 @@ This project is best read as a staged embodied-AI research study:
|
|
| 129 |
| Task suite | Twelve human-readable tasks cover action, procedure, contact, object, language, retrieval, reconstruction, order, and synchronization questions. | [`RESEARCH_TAKEAWAYS.md`](RESEARCH_TAKEAWAYS.md), [`results/episode_task_suite/summary_report.json`](results/episode_task_suite/summary_report.json) |
|
| 130 |
| Baselines | Minimal heads and compact PyTorch MLP heads provide a first controlled comparison on the same chronological split. | [`results/episode_task_suite/neural_mlp/`](results/episode_task_suite/neural_mlp/) |
|
| 131 |
| Diagnostics | Audio contribution, modality ablations, timeline overlays, object labels, and alignment stress tests show which signals are useful and which tasks remain hard. | [`results/audio_ablation/AUDIO_ABLATION_SUMMARY.md`](results/audio_ablation/AUDIO_ABLATION_SUMMARY.md), [`docs/single_episode_explorer.html`](docs/single_episode_explorer.html) |
|
| 132 |
-
| Scale-up | A selected 128-episode Qwen3-Omni LoRA pilot is being prepared from the gated dataset; held-out model metrics will be added only after training and evaluation finish. | [`RESEARCH_ROADMAP.md`](RESEARCH_ROADMAP.md), [`FOUNDATION_MODEL_PLAN.md`](FOUNDATION_MODEL_PLAN.md), [`results/omni_finetune/DATA_ACCESS_STATUS.md`](results/omni_finetune/DATA_ACCESS_STATUS.md) |
|
| 133 |
|
| 134 |
Detailed dataset notes, reproduction checks, and generated JSON reports are
|
| 135 |
included for readers who want to inspect the implementation, but they are
|
|
@@ -168,7 +170,7 @@ If you are reading the project cold, open these in order:
|
|
| 168 |
| 6 | What do the current results mean? | [`RESEARCH_TAKEAWAYS.md`](RESEARCH_TAKEAWAYS.md), [`docs/data/research_takeaways.json`](docs/data/research_takeaways.json), [`docs/data/summary_metrics.json`](docs/data/summary_metrics.json) | Current metrics describe sample-level task behavior and identify which signals need larger held-out experiments. |
|
| 169 |
| 7 | Which models are implemented? | [`results/episode_task_suite/summary_report.json`](results/episode_task_suite/summary_report.json), [`results/episode_task_suite/neural_mlp/`](results/episode_task_suite/neural_mlp/), [HF baseline repo](https://huggingface.co/cy0307/ropedia-xperience-10m-task-baselines) | Each task has minimal and neural-head evidence over the same feature windows. |
|
| 170 |
| 8 | What research directions does this support? | [`RESEARCH_ROADMAP.md`](RESEARCH_ROADMAP.md), [`docs/data/research_directions.json`](docs/data/research_directions.json), [`docs/data/research_direction_extensions.json`](docs/data/research_direction_extensions.json) | The tasks are mapped to human modeling, 3D/4D reconstruction, egocentric interaction, and world modeling. |
|
| 171 |
-
| 9 | Which foundation model comes next? | [`FOUNDATION_MODEL_PLAN.md`](FOUNDATION_MODEL_PLAN.md), [`docs/data/foundation_model_plan.json`](docs/data/foundation_model_plan.json) | Qwen3-Omni is the first held-out LoRA baseline; Cosmos 3 is the first world-model branch; policy models wait for explicit action targets. |
|
| 172 |
| 10 | How do I reproduce it? | [`REPRODUCIBILITY.md`](REPRODUCIBILITY.md), [`notes/reproducibility_audit.md`](notes/reproducibility_audit.md) | Public commands and expected outputs are documented for the sample-episode task suite. |
|
| 173 |
| 11 | What is still pending? | [`DATA_ACCESS_STATUS.md`](results/omni_finetune/DATA_ACCESS_STATUS.md), [`MULTI_EPISODE_ACCESS_STATUS.md`](results/omni_finetune/MULTI_EPISODE_ACCESS_STATUS.md) | Multi-episode Qwen3-Omni model quality will be reported after preprocessing, training, and held-out evaluation complete. |
|
| 174 |
|
|
@@ -246,7 +248,8 @@ Hugging Face Space app:
|
|
| 246 |
| Dataset context | `XPERIENCE10M_DATASET_CARD_ALIGNMENT.md`, official dataset links | Explains the official dataset, public sample, modalities, access boundary, and what this repo uses |
|
| 247 |
| Visual assets | `FIGURE_INDEX.md`, `docs/assets/` | Shows the task-suite graphic, modality thumbnails, pipeline diagrams, charts, and logo assets |
|
| 248 |
| Evaluation protocol | `EVALUATION_PROTOCOL.md`, `docs/data/evaluation_protocol.json` | Defines the task unit, split, metrics, leakage controls, and current limitations |
|
| 249 |
-
| Research roadmap | `RESEARCH_ROADMAP.md`, `docs/data/research_roadmap.json` | Shows the path from sample-level task development to multi-episode
|
|
|
|
| 250 |
| Minimal heads | softmax, ridge projection/regression, multi-label logistic heads | Keeps every input/output contract visible and inspectable |
|
| 251 |
| Neural heads | PyTorch MLP classifiers/regressors under `neural_mlp/` | Checks whether nonlinear heads improve each task without changing features |
|
| 252 |
| Evidence | metrics, predictions, confusion matrices, diagrams, dashboard | Makes the single-episode task development inspectable without rerunning first |
|
|
@@ -607,11 +610,14 @@ assuming one backbone solves every Xperience-10M objective.
|
|
| 607 |
| GR00T | Humanoid/action-policy branch | Use after mocap/contact retargeting creates well-defined humanoid action targets. |
|
| 608 |
| OpenVLA / openpi | Open VLA/policy baselines | Use after the project defines robot-compatible or action-token targets. |
|
| 609 |
| Gemini Robotics | External reasoning reference | Use only for qualitative comparison or annotation support unless local trainable access exists. |
|
|
|
|
| 610 |
|
| 611 |
See [`FOUNDATION_MODEL_PLAN.md`](FOUNDATION_MODEL_PLAN.md) and
|
| 612 |
[`docs/data/foundation_model_plan.json`](docs/data/foundation_model_plan.json)
|
| 613 |
for the full selection matrix, source links, and model-specific evaluation
|
| 614 |
-
additions.
|
|
|
|
|
|
|
| 615 |
|
| 616 |
## Four Research Directions
|
| 617 |
|
|
|
|
| 64 |
| Multimodal data understanding | Parses the public sample into synchronized windows across video, audio, depth, pose/SLAM, mocap, IMU, calibration, and language-derived signals |
|
| 65 |
| Task design | Defines 12 human-readable tasks plus four direction-extension probes with inputs, outputs, process modules, metrics, and case-study walkthroughs |
|
| 66 |
| Model and evaluation discipline | Runs minimal and compact neural baselines, records predictions/metrics, keeps chronological split boundaries explicit, and separates sample evidence from held-out claims |
|
| 67 |
+
| Scale-up planning | Connects the public-sample pipeline to 32/128-episode held-out pilots, Qwen3-Omni LoRA, Cosmos-style world-model branches, policy-model branches, and the future Xperience-native foundation-model pretraining goal |
|
| 68 |
|
| 69 |
## Start Here
|
| 70 |
|
|
|
|
| 81 |
| Navigate the 12 tasks, four tracks, and scale-up plan | [Interactive research roadmap](https://chaoyue0307.github.io/ropedia-xperience-10m-task-suite/research_roadmap.html), [`docs/data/research_roadmap_interactive.json`](docs/data/research_roadmap_interactive.json) |
|
| 82 |
| Compare current task metrics | [`RESEARCH_TAKEAWAYS.md`](RESEARCH_TAKEAWAYS.md), [`docs/data/summary_metrics.json`](docs/data/summary_metrics.json) |
|
| 83 |
| Compare possible foundation backbones | [`FOUNDATION_MODEL_PLAN.md`](FOUNDATION_MODEL_PLAN.md), [`docs/data/foundation_model_plan.json`](docs/data/foundation_model_plan.json) |
|
| 84 |
+
| Understand the future native pretraining goal | [`XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md`](XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md) |
|
| 85 |
| Understand one model input | [`results/episode_task_suite/feature_manifest.json`](results/episode_task_suite/feature_manifest.json), [`results/episode_task_suite/windows.csv`](results/episode_task_suite/windows.csv) |
|
| 86 |
| Check multi-episode data status | [`results/omni_finetune/DATA_ACCESS_STATUS.md`](results/omni_finetune/DATA_ACCESS_STATUS.md) |
|
| 87 |
|
|
|
|
| 94 |
| Task suite | 12 human-readable embodied-AI task contracts with input, process, output, metrics, predictions, and case-study walkthroughs |
|
| 95 |
| Baselines | Minimal linear/ridge/logistic heads plus compact PyTorch MLP task heads over the same chronological split |
|
| 96 |
| Research directions | Task mapping and extension probes for human modeling, 3D/4D reconstruction, egocentric interaction, and world modeling |
|
| 97 |
+
| Scale-up path | The gated Xperience-10M dataset is available for a selected 128-episode pilot before Qwen3-Omni LoRA, followed by Cosmos 3/world-model and VLA/policy branches; the long-term goal is an Xperience-native embodied foundation model if full-corpus data, storage, and compute are available |
|
| 98 |
| Public surfaces | GitHub repo, GitHub Pages dashboard, HF Space, HF artifact dataset, HF baseline-model repo, and HF collection |
|
| 99 |
|
| 100 |
For the fastest interpretation of the current metrics, start with
|
|
|
|
| 116 |
- human-readable research task cards and an interactive scrub/play walkthrough storyboard for every task,
|
| 117 |
- an interactive research roadmap connecting 12 tasks, four research tracks, current sample evidence, the Qwen3-Omni scale-up path, and foundation-model branch selection,
|
| 118 |
- a next-milestone track for Qwen3-Omni fine-tuning, Cosmos 3 world modeling, and sensor-bridge evaluation,
|
| 119 |
+
- a future pretraining plan for an Xperience Embodied Foundation Model over the full corpus after smaller multi-episode stages prove value,
|
| 120 |
- metrics, predictions, model weights, manifests, charts, and a two-level
|
| 121 |
tabbed static research website,
|
| 122 |
- a clear explanation of what is implemented now and what moves to the multi-episode stage.
|
|
|
|
| 131 |
| Task suite | Twelve human-readable tasks cover action, procedure, contact, object, language, retrieval, reconstruction, order, and synchronization questions. | [`RESEARCH_TAKEAWAYS.md`](RESEARCH_TAKEAWAYS.md), [`results/episode_task_suite/summary_report.json`](results/episode_task_suite/summary_report.json) |
|
| 132 |
| Baselines | Minimal heads and compact PyTorch MLP heads provide a first controlled comparison on the same chronological split. | [`results/episode_task_suite/neural_mlp/`](results/episode_task_suite/neural_mlp/) |
|
| 133 |
| Diagnostics | Audio contribution, modality ablations, timeline overlays, object labels, and alignment stress tests show which signals are useful and which tasks remain hard. | [`results/audio_ablation/AUDIO_ABLATION_SUMMARY.md`](results/audio_ablation/AUDIO_ABLATION_SUMMARY.md), [`docs/single_episode_explorer.html`](docs/single_episode_explorer.html) |
|
| 134 |
+
| Scale-up | A selected 128-episode Qwen3-Omni LoRA pilot is being prepared from the gated dataset; held-out model metrics will be added only after training and evaluation finish. The long-term native-pretraining plan is documented separately as a future research goal. | [`RESEARCH_ROADMAP.md`](RESEARCH_ROADMAP.md), [`FOUNDATION_MODEL_PLAN.md`](FOUNDATION_MODEL_PLAN.md), [`XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md`](XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md), [`results/omni_finetune/DATA_ACCESS_STATUS.md`](results/omni_finetune/DATA_ACCESS_STATUS.md) |
|
| 135 |
|
| 136 |
Detailed dataset notes, reproduction checks, and generated JSON reports are
|
| 137 |
included for readers who want to inspect the implementation, but they are
|
|
|
|
| 170 |
| 6 | What do the current results mean? | [`RESEARCH_TAKEAWAYS.md`](RESEARCH_TAKEAWAYS.md), [`docs/data/research_takeaways.json`](docs/data/research_takeaways.json), [`docs/data/summary_metrics.json`](docs/data/summary_metrics.json) | Current metrics describe sample-level task behavior and identify which signals need larger held-out experiments. |
|
| 171 |
| 7 | Which models are implemented? | [`results/episode_task_suite/summary_report.json`](results/episode_task_suite/summary_report.json), [`results/episode_task_suite/neural_mlp/`](results/episode_task_suite/neural_mlp/), [HF baseline repo](https://huggingface.co/cy0307/ropedia-xperience-10m-task-baselines) | Each task has minimal and neural-head evidence over the same feature windows. |
|
| 172 |
| 8 | What research directions does this support? | [`RESEARCH_ROADMAP.md`](RESEARCH_ROADMAP.md), [`docs/data/research_directions.json`](docs/data/research_directions.json), [`docs/data/research_direction_extensions.json`](docs/data/research_direction_extensions.json) | The tasks are mapped to human modeling, 3D/4D reconstruction, egocentric interaction, and world modeling. |
|
| 173 |
+
| 9 | Which foundation model comes next? | [`FOUNDATION_MODEL_PLAN.md`](FOUNDATION_MODEL_PLAN.md), [`docs/data/foundation_model_plan.json`](docs/data/foundation_model_plan.json), [`XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md`](XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md) | Qwen3-Omni is the first held-out LoRA baseline; Cosmos 3 is the first world-model branch; policy models wait for explicit action targets; Xperience-native pretraining is the full-corpus future goal. |
|
| 174 |
| 10 | How do I reproduce it? | [`REPRODUCIBILITY.md`](REPRODUCIBILITY.md), [`notes/reproducibility_audit.md`](notes/reproducibility_audit.md) | Public commands and expected outputs are documented for the sample-episode task suite. |
|
| 175 |
| 11 | What is still pending? | [`DATA_ACCESS_STATUS.md`](results/omni_finetune/DATA_ACCESS_STATUS.md), [`MULTI_EPISODE_ACCESS_STATUS.md`](results/omni_finetune/MULTI_EPISODE_ACCESS_STATUS.md) | Multi-episode Qwen3-Omni model quality will be reported after preprocessing, training, and held-out evaluation complete. |
|
| 176 |
|
|
|
|
| 248 |
| Dataset context | `XPERIENCE10M_DATASET_CARD_ALIGNMENT.md`, official dataset links | Explains the official dataset, public sample, modalities, access boundary, and what this repo uses |
|
| 249 |
| Visual assets | `FIGURE_INDEX.md`, `docs/assets/` | Shows the task-suite graphic, modality thumbnails, pipeline diagrams, charts, and logo assets |
|
| 250 |
| Evaluation protocol | `EVALUATION_PROTOCOL.md`, `docs/data/evaluation_protocol.json` | Defines the task unit, split, metrics, leakage controls, and current limitations |
|
| 251 |
+
| Research roadmap | `RESEARCH_ROADMAP.md`, `docs/data/research_roadmap.json` | Shows the path from sample-level task development to multi-episode work, larger model branches, and the future native-pretraining goal |
|
| 252 |
+
| Xperience Embodied Foundation Model plan | `XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md` | Describes the long-term full-corpus pretraining goal, target modules, objectives, staged scale-up, hardware ranges, and evaluation protocol |
|
| 253 |
| Minimal heads | softmax, ridge projection/regression, multi-label logistic heads | Keeps every input/output contract visible and inspectable |
|
| 254 |
| Neural heads | PyTorch MLP classifiers/regressors under `neural_mlp/` | Checks whether nonlinear heads improve each task without changing features |
|
| 255 |
| Evidence | metrics, predictions, confusion matrices, diagrams, dashboard | Makes the single-episode task development inspectable without rerunning first |
|
|
|
|
| 610 |
| GR00T | Humanoid/action-policy branch | Use after mocap/contact retargeting creates well-defined humanoid action targets. |
|
| 611 |
| OpenVLA / openpi | Open VLA/policy baselines | Use after the project defines robot-compatible or action-token targets. |
|
| 612 |
| Gemini Robotics | External reasoning reference | Use only for qualitative comparison or annotation support unless local trainable access exists. |
|
| 613 |
+
| Xperience Embodied Foundation Model | Future Xperience-native pretraining goal | Use only after multi-episode pilots, full-corpus storage, distributed training infrastructure, and scaling evidence justify a from-scratch domain model. |
|
| 614 |
|
| 615 |
See [`FOUNDATION_MODEL_PLAN.md`](FOUNDATION_MODEL_PLAN.md) and
|
| 616 |
[`docs/data/foundation_model_plan.json`](docs/data/foundation_model_plan.json)
|
| 617 |
for the full selection matrix, source links, and model-specific evaluation
|
| 618 |
+
additions. See
|
| 619 |
+
[`XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md`](XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md)
|
| 620 |
+
for the long-term full-corpus pretraining plan.
|
| 621 |
|
| 622 |
## Four Research Directions
|
| 623 |
|
RESEARCH_ROADMAP.md
CHANGED
|
@@ -15,6 +15,7 @@ should exist before the stage is treated as complete.
|
|
| 15 |
| Foundation-Model Selection Matrix | Next | The selected pilot episodes are prepared, or a 3-8 episode dry run is available for preprocessing checks. | Backbone registry, Cosmos 3 world-model branch plan, Qwen3-Omni baseline plan, OpenVLA/openpi/GR00T policy candidates, and model-specific evaluation additions. | `FOUNDATION_MODEL_PLAN.md`, `docs/data/foundation_model_plan.json`, `research_roadmap_interactive.json` |
|
| 16 |
| 64-128 Episode Robustness Run | Planned | The selected-episode pilot trains and evaluates cleanly. | Split-by-session metrics, modality ablations, calibration/object/language error analysis, and sensitivity to missing views. | Held-out metrics by session, task, and modality; ablation tables; qualitative error analysis. |
|
| 17 |
| Cosmos 3 and Policy-Model Extensions | Planned | Enough multi-episode data, compute budget, and model-specific action/world-state targets. | Cosmos 3 future-window or action-conditioned world-model probes, OpenVLA/openpi/GR00T action-policy baselines, modality-conditioning checks, affordance tasks, and synthetic-data usefulness tests. | Task-specific held-out evaluations, qualitative inspection, and updated model cards. |
|
|
|
|
| 18 |
|
| 19 |
## Current Decision Point
|
| 20 |
|
|
@@ -24,9 +25,11 @@ episodes to run the held-out Qwen3-Omni pilot, then choose larger model branches
|
|
| 24 |
by task fit. Qwen3-Omni remains the first trainable multimodal LoRA target.
|
| 25 |
Cosmos 3 becomes the first world-model/action-generation branch. OpenVLA,
|
| 26 |
openpi, GR00T, Octo, and SmolVLA-style models become policy/action branches only
|
| 27 |
-
after the action target is explicit.
|
| 28 |
-
|
| 29 |
-
|
|
|
|
|
|
|
| 30 |
|
| 31 |
## Stage Details
|
| 32 |
|
|
@@ -109,6 +112,27 @@ objectives: audio-visible alignment, future-window prediction,
|
|
| 109 |
action-conditioned world modeling, synthetic-data usefulness tests, policy-style
|
| 110 |
next action, contact, object relevance, and affordance reasoning.
|
| 111 |
|
| 112 |
## Public Artifacts That Should Move Together
|
| 113 |
|
| 114 |
When a roadmap stage advances, update these public surfaces together:
|
|
@@ -118,6 +142,7 @@ When a roadmap stage advances, update these public surfaces together:
|
|
| 118 |
- `RESEARCH_TAKEAWAYS.md`
|
| 119 |
- `EVALUATION_PROTOCOL.md`
|
| 120 |
- `ARTIFACT_GUIDE.md`
|
|
|
|
| 121 |
- `docs/index.html`
|
| 122 |
- `docs/data/research_roadmap.json`
|
| 123 |
- Hugging Face Space, artifact dataset, and model cards
|
|
|
|
| 15 |
| Foundation-Model Selection Matrix | Next | The selected pilot episodes are prepared, or a 3-8 episode dry run is available for preprocessing checks. | Backbone registry, Cosmos 3 world-model branch plan, Qwen3-Omni baseline plan, OpenVLA/openpi/GR00T policy candidates, and model-specific evaluation additions. | `FOUNDATION_MODEL_PLAN.md`, `docs/data/foundation_model_plan.json`, `research_roadmap_interactive.json` |
|
| 16 |
| 64-128 Episode Robustness Run | Planned | The selected-episode pilot trains and evaluates cleanly. | Split-by-session metrics, modality ablations, calibration/object/language error analysis, and sensitivity to missing views. | Held-out metrics by session, task, and modality; ablation tables; qualitative error analysis. |
|
| 17 |
| Cosmos 3 and Policy-Model Extensions | Planned | Enough multi-episode data, compute budget, and model-specific action/world-state targets. | Cosmos 3 future-window or action-conditioned world-model probes, OpenVLA/openpi/GR00T action-policy baselines, modality-conditioning checks, affordance tasks, and synthetic-data usefulness tests. | Task-specific held-out evaluations, qualitative inspection, and updated model cards. |
|
| 18 |
+
| Xperience Embodied Foundation Model Pretraining | Future | Full-corpus access, PB-scale storage path, multi-node compute, and positive scaling evidence from smaller runs. | Xperience-native temporal multimodal model, full-corpus manifests, pretraining shards, scaling curves, held-out evaluations, and model card. | Pretraining metadata, checkpoint inventory, held-out metrics, scaling report, and data-boundary report. |
|
| 19 |
|
| 20 |
## Current Decision Point
|
| 21 |
|
|
|
|
| 25 |
by task fit. Qwen3-Omni remains the first trainable multimodal LoRA target.
|
| 26 |
Cosmos 3 becomes the first world-model/action-generation branch. OpenVLA,
|
| 27 |
openpi, GR00T, Octo, and SmolVLA-style models become policy/action branches only
|
| 28 |
+
after the action target is explicit. A from-scratch Xperience Embodied
|
| 29 |
+
Foundation Model is the long-term native-pretraining goal, not the immediate
|
| 30 |
+
experiment. The public sample is already enough for task design, feature
|
| 31 |
+
contracts, walkthroughs, and baseline comparisons. It is not enough to measure
|
| 32 |
+
general embodied-AI model quality.
|
| 33 |
|
| 34 |
## Stage Details
|
| 35 |
|
|
|
|
| 112 |
action-conditioned world modeling, synthetic-data usefulness tests, policy-style
|
| 113 |
next action, contact, object relevance, and affordance reasoning.
|
| 114 |
|
| 115 |
+
### 7. Xperience Embodied Foundation Model Pretraining
|
| 116 |
+
|
| 117 |
+
This stage is the long-term full-corpus goal. Instead of adapting an existing
|
| 118 |
+
backbone, it would pretrain a domain model directly on the synchronized
|
| 119 |
+
Xperience-10M modality structure: video, audio, depth, pose/SLAM, hand/body
|
| 120 |
+
mocap, IMU, calibration, and language annotations.
|
| 121 |
+
|
| 122 |
+
The first realistic target is a 3B-7B Xperience-native domain model after
|
| 123 |
+
smaller 0.3B-1B and 1B-3B pilots prove that the objectives and data loaders
|
| 124 |
+
scale. The training objective should combine masked multimodal modeling,
|
| 125 |
+
cross-modal alignment, future-state prediction, ego-motion and hand-motion
|
| 126 |
+
forecasting, action/procedure prediction, language grounding, contact and
|
| 127 |
+
affordance prediction, and optional policy-style targets after action
|
| 128 |
+
conversion.
|
| 129 |
+
|
| 130 |
+
This stage needs full-corpus access, PB-scale storage planning, high-throughput
|
| 131 |
+
media decoding, distributed training, reliable checkpoints, and held-out
|
| 132 |
+
evaluation across episodes, sessions, activities, objects, and missing
|
| 133 |
+
modalities. The reader-facing plan is in
|
| 134 |
+
`XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md`.
|
| 135 |
+
|
| 136 |
## Public Artifacts That Should Move Together
|
| 137 |
|
| 138 |
When a roadmap stage advances, update these public surfaces together:
|
|
|
|
| 142 |
- `RESEARCH_TAKEAWAYS.md`
|
| 143 |
- `EVALUATION_PROTOCOL.md`
|
| 144 |
- `ARTIFACT_GUIDE.md`
|
| 145 |
+
- `XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md`
|
| 146 |
- `docs/index.html`
|
| 147 |
- `docs/data/research_roadmap.json`
|
| 148 |
- Hugging Face Space, artifact dataset, and model cards
|
XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md
ADDED
|
@@ -0,0 +1,178 @@
|
| 1 |
+
# Xperience Embodied Foundation Model Pretraining Goal
|
| 2 |
+
|
| 3 |
+
This document describes a future research direction for the project: a
|
| 4 |
+
domain-specific embodied foundation model pretrained on the full Xperience-10M
|
| 5 |
+
corpus, if full-episode access, storage, and compute become available.
|
| 6 |
+
|
| 7 |
+
Current status: this is a planning artifact. The public project currently
|
| 8 |
+
contains a public-sample task suite, lightweight baselines, Qwen3-Omni LoRA
|
| 9 |
+
preparation, and a smoke LoRA artifact. It does not currently contain a
|
| 10 |
+
from-scratch Xperience foundation model or full-corpus pretraining run.
|
| 11 |
+
|
| 12 |
+
## Why This Is A Natural Long-Term Goal
|
| 13 |
+
|
| 14 |
+
Xperience-10M is designed for physical-AI pretraining rather than only
|
| 15 |
+
single-task supervised learning. The official dataset card describes 10 million
|
| 16 |
+
experiences, 10,000 hours of synchronized first-person recordings, six video
|
| 17 |
+
streams, audio, stereo depth, camera pose, hand and full-body mocap, IMU, and
|
| 18 |
+
hierarchical language annotations. It also reports 2.88B RGB frames, 720M depth
|
| 19 |
+
frames, 576M pose/mocap frames, 7.2B IMU frames, and about 1 PB of total data.
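As a rough sanity check on those totals, the sketch below divides each quoted count by the quoted 10,000 hours to get the implied aggregate rates. It is illustrative only: the variable names are hypothetical, "about 1 PB" is assumed to mean 10^15 bytes, and the RGB figure is an aggregate over all video streams, since the per-stream split is not stated here.

```python
# Totals as quoted from the dataset card summary above.
HOURS = 10_000
EXPERIENCES = 10_000_000
TOTAL_FRAMES = {
    "rgb_frames": 2_880_000_000,
    "depth_frames": 720_000_000,
    "pose_mocap_frames": 576_000_000,
    "imu_frames": 7_200_000_000,
}
# Assumption: "about 1 PB" is read as a decimal petabyte (10**15 bytes).
TOTAL_BYTES = 10**15

total_seconds = HOURS * 3600

# Implied aggregate sample rate per modality, in samples per recorded second.
rates_hz = {name: count / total_seconds for name, count in TOTAL_FRAMES.items()}

# Implied average storage per recorded hour and average experience length.
gb_per_hour = TOTAL_BYTES / HOURS / 1e9
seconds_per_experience = total_seconds / EXPERIENCES

for name, rate in rates_hz.items():
    print(f"{name}: {rate:.1f} per second")
print(f"storage: {gb_per_hour:.0f} GB per recorded hour")
print(f"average experience length: {seconds_per_experience:.1f} s")
```

Under these assumptions the counts work out to roughly 80 RGB frames, 20 depth frames, 16 pose/mocap frames, and 200 IMU samples per recorded second, about 100 GB per recorded hour, and an average of 3.6 seconds per experience. These are derived averages for storage and decoding planning, not figures reported by the dataset card.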
|
| 20 |
+
|
| 21 |
+
That scale and alignment make a specific Xperience-native model plausible:
|
| 22 |
+
not a general web-scale omni model, but an embodied model specialized for
|
| 23 |
+
egocentric perception, human-object interaction, temporal dynamics, physical
|
| 24 |
+
state, and task intent.
|
| 25 |
+
|
| 26 |
+
## Target Model
|
| 27 |
+
|
| 28 |
+
The proposed model name is **Xperience Embodied Foundation Model**.
|
| 29 |
+
|
| 30 |
+
The model should learn a shared temporal representation of embodied experience:
|
| 31 |
+
what the wearer sees and hears, how the camera moves, how the body and hands
|
| 32 |
+
move, what objects are involved, what geometry is present, and what task is
|
| 33 |
+
being performed.
|
| 34 |
+
|
| 35 |
+
Expected modules:
|
| 36 |
+
|
| 37 |
+
| Module | Input | Role |
|
| 38 |
+
| --- | --- | --- |
|
| 39 |
+
| Multi-view video encoder | fisheye/stereo/RGB streams | visual state, egocentric context, object interaction |
|
| 40 |
+
| Audio encoder | synchronized MP4 audio | event cues, contact-like sound, temporal grounding |
|
| 41 |
+
| Depth and geometry encoder | depth, confidence, calibration | spatial structure and 3D/4D scene cues |
|
| 42 |
+
| Pose/SLAM encoder | camera trajectory and orientation | ego-motion, viewpoint, scene traversal |
|
| 43 |
+
| Mocap encoder | hand/body joints | human motion, hand-object interaction, affordance cues |
|
| 44 |
+
| IMU encoder | accelerometer/gyroscope streams | inertial dynamics and wearable motion |
|
| 45 |
+
| Language encoder/decoder | task/subtask/action/object annotations | semantic grounding and structured generation |
|
| 46 |
+
| Temporal fusion transformer | aligned per-window modality tokens | shared embodied representation across time |
|
| 47 |
+
| Task heads / decoders | fused representation | action, caption, future motion, retrieval, reconstruction, and world-state outputs |
|
| 48 |
+
|
| 49 |
+
## Pretraining Objectives
|
| 50 |
+
|
| 51 |
+
The model should not rely on a single loss. It should combine complementary
|
| 52 |
+
objectives so that every modality contributes to the shared representation.
|
| 53 |
+
|
| 54 |
+
| Objective | What the model learns | Example output |
|
| 55 |
+
| --- | --- | --- |
|
| 56 |
+
| Masked multimodal modeling | recover hidden video/depth/sensor tokens from context | reconstructed latent patches or sensor features |
|
| 57 |
+
| Cross-modal contrastive alignment | align video, motion, audio, geometry, and language from the same time window | matching score or retrieval embedding |
|
| 58 |
+
| Future-state prediction | predict what changes after the current window | future visual/depth/motion latent |
|
| 59 |
+
| Ego-motion and hand-motion forecasting | model wearer/body dynamics | future camera delta or hand trajectory |
|
| 60 |
+
| Action and procedure prediction | connect physical state to task semantics | action, subtask, transition, next action |
|
| 61 |
+
| Language grounding and captioning | connect temporal windows to natural language | caption, object/action grounding, structured JSON |
|
| 62 |
+
| Contact and affordance prediction | learn interaction state from human-object motion | contact state, relevant object set |
|
| 63 |
+
| Optional policy-style targets | learn action-like outputs after target conversion | action token, motion chunk, retargeted policy target |
|
| 64 |
+
|
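To make the cross-modal contrastive row concrete, here is a minimal sketch of a symmetric InfoNCE-style loss over a batch of paired window embeddings. It is an illustration under assumptions, not the project's training code: the function names, the pure-Python list representation, and the temperature of 0.07 are all hypothetical choices made for readability.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def _cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def info_nce(emb_a, emb_b, temperature=0.07):
    """Symmetric contrastive loss for two modalities.

    emb_a[i] and emb_b[i] are assumed to come from the same time window;
    every other pairing in the batch is treated as a negative.
    """
    a = [_normalize(v) for v in emb_a]
    b = [_normalize(v) for v in emb_b]
    logits = [[sum(x * y for x, y in zip(u, w)) / temperature for w in b] for u in a]
    logits_t = [list(col) for col in zip(*logits)]
    return 0.5 * (_cross_entropy_diag(logits) + _cross_entropy_diag(logits_t))


video = [[1.0, 0.0], [0.0, 1.0]]
motion_aligned = [[1.0, 0.0], [0.0, 1.0]]
motion_shuffled = [[0.0, 1.0], [1.0, 0.0]]
print(info_nce(video, motion_aligned) < info_nce(video, motion_shuffled))
```

Correctly paired windows give a loss near zero, while mismatched pairs are penalized heavily. A real run would compute the same quantity on batched tensors in a deep-learning framework.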
| 65 |
+
## Staged Pretraining Plan
|
| 66 |
+
|
| 67 |
+
### Stage 0: Data Contract And Quality Gate
|
| 68 |
+
|
| 69 |
+
Use the existing public-sample task suite to define the data contract. Before
|
| 70 |
+
pretraining, every episode must pass a strict manifest check:
|
| 71 |
+
|
| 72 |
+
- `annotation.hdf5` exists and is readable,
|
| 73 |
+
- video streams are present or missing views are explicitly recorded,
|
| 74 |
+
- audio can be extracted or marked unavailable,
|
| 75 |
+
- depth, pose, mocap, IMU, calibration, and language fields are indexed,
|
| 76 |
+
- windows are aligned by timestamp or frame index,
|
| 77 |
+
- train/val/test splits are made per episode, not per window, so windows from one episode never leak across splits,
|
| 78 |
+
- raw data remains outside public repos and Hugging Face artifact mirrors.
|
| 79 |
+
|
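The checklist above can be sketched as a small gate function plus a deterministic episode-level split. Everything named here is hypothetical: the record keys (`annotation_readable`, `video_views`, `missing_views`, `audio_status`, `indexed_fields`), the view names, and the 10%/10% validation and test fractions are placeholders for illustration, not the repo's actual manifest schema.

```python
import hashlib

# Hypothetical names for the six video streams and the indexed sensor fields.
EXPECTED_VIEWS = tuple(f"view_{i}" for i in range(6))
REQUIRED_FIELDS = ("depth", "pose", "mocap", "imu", "calibration", "language")


def manifest_problems(record):
    """Return a list of reasons an episode record fails the quality gate."""
    problems = []
    if not record.get("annotation_readable", False):
        problems.append("annotation.hdf5 missing or unreadable")
    covered = set(record.get("video_views", ())) | set(record.get("missing_views", ()))
    if not set(EXPECTED_VIEWS) <= covered:
        problems.append("some views are neither present nor recorded as missing")
    if record.get("audio_status") not in ("extracted", "unavailable"):
        problems.append("audio neither extracted nor marked unavailable")
    indexed = set(record.get("indexed_fields", ()))
    problems.extend(f"{name} not indexed" for name in REQUIRED_FIELDS if name not in indexed)
    return problems


def assign_split(episode_id, val_fraction=0.1, test_fraction=0.1):
    """Map an episode id to a split using a stable hash of the id."""
    digest = hashlib.sha256(episode_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x1_0000_0000  # uniform-ish value in [0, 1)
    if bucket < test_fraction:
        return "test"
    if bucket < test_fraction + val_fraction:
        return "val"
    return "train"


def split_windows(windows):
    """Group (episode_id, window_index) pairs so an episode stays in one split."""
    splits = {"train": [], "val": [], "test": []}
    for episode_id, window_index in windows:
        splits[assign_split(episode_id)].append((episode_id, window_index))
    return splits
```

Hashing the episode id rather than shuffling windows keeps the assignment reproducible across machines and guarantees that all windows from one episode land in the same split.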
| 80 |
+
### Stage 1: 128-1,000 Episode Representation Pilot
|
| 81 |
+
|
| 82 |
+
Start with a smaller model and a selected subset. The goal is to test whether
|
| 83 |
+
the multimodal objectives train stably and improve held-out task performance.
|
| 84 |
+
|
| 85 |
+
Recommended scale:
|
| 86 |
+
|
| 87 |
+
- 128 to 1,000 episodes,
|
| 88 |
+
- frozen or lightly trainable video/audio encoders at first,
|
| 89 |
+
- 0.3B-1B temporal fusion model,
|
| 90 |
+
- all available sensor modalities represented as tokens,
|
| 91 |
+
- evaluation on the existing 12-task suite plus future-state/retrieval probes.
|
| 92 |
+
|
| 93 |
+
### Stage 2: 10K Episode Domain Model
|
| 94 |
+
|
| 95 |
+
Scale up only after the pilot shows measurable value. This stage should train a stronger
|
| 96 |
+
Xperience-specific representation model rather than only fine-tuning a general
|
| 97 |
+
omni model.
|
| 98 |
+
|
| 99 |
+
Recommended scale:
|
| 100 |
+
|
| 101 |
+
- thousands to 10K episodes,
|
| 102 |
+
- 1B-3B parameter multimodal temporal model,
|
| 103 |
+
- mixed supervised, contrastive, and predictive objectives,
|
| 104 |
+
- held-out sessions and held-out activities,
|
| 105 |
+
- robustness to missing camera views and sensor dropout.
|
| 106 |
+
|
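One common way to train for robustness to missing views and sensor dropout is to randomly drop whole modalities from each training window. The sketch below assumes that approach; the plan above does not say which technique will be used, and the function name, the 0.3 drop probability, and the dictionary-of-tokens layout are hypothetical.

```python
import random


def drop_modalities(tokens_by_modality, drop_prob=0.3, rng=None, keep_at_least=1):
    """Return a copy of the window with some modalities removed at random.

    tokens_by_modality maps a modality name to its token list. At least
    keep_at_least modalities always survive so the window is never empty.
    """
    rng = rng or random.Random()
    names = list(tokens_by_modality)
    kept = [name for name in names if rng.random() >= drop_prob]
    if len(kept) < keep_at_least:
        kept = rng.sample(names, k=min(keep_at_least, len(names)))
    return {name: tokens_by_modality[name] for name in kept}


window = {
    "video": [1, 2, 3],
    "audio": [4, 5],
    "depth": [6],
    "imu": [7, 8, 9, 10],
}
print(sorted(drop_modalities(window, rng=random.Random(0))))
```

Passing a seeded `random.Random` makes the dropout pattern reproducible, which helps when comparing runs with and without a given sensor.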
| 107 |
+
### Stage 3: Full-Corpus Xperience Embodied Foundation Model
|
| 108 |
+
|
| 109 |
+
Use this stage only if storage, data throughput, and multi-node compute are
|
| 110 |
+
available. The goal is a domain foundation model over embodied human experience,
|
| 111 |
+
not a general internet-scale language model.
|
| 112 |
+
|
| 113 |
+
Recommended scale:
|
| 114 |
+
|
| 115 |
+
- all available Xperience-10M episodes,
|
| 116 |
+
- 3B-7B domain model as a realistic first full-corpus target,
|
| 117 |
+
- larger models only after scaling curves justify the cost,
|
| 118 |
+
- mixture of reconstruction, retrieval, forecasting, language, and world-model
|
| 119 |
+
objectives,
|
| 120 |
+
- downstream evaluation on held-out episodes, held-out sessions, unseen
|
| 121 |
+
objects, unseen activities, and downstream robotics/world-model tasks.
|
| 122 |
+
|
| 123 |
+
## Hardware Requirements
|
| 124 |
+
|
| 125 |
+
These are planning ranges, not completed run measurements from this repo.
|
| 126 |
+
|
| 127 |
+
| Training goal | Typical compute | Storage and data path | Practical use |
|
| 128 |
+
| --- | --- | --- | --- |
|
| 129 |
+
| 0.3B-1B pilot | 8-32 modern 80GB-class data-center GPUs | tens of TB plus fast local cache | prove objectives and data loaders |
|
| 130 |
+
| 1B-3B domain model | 32-128 GPUs | 100TB-scale cache, high-throughput decoding | serious research-scale pretraining |
|
| 131 |
+
| 3B-7B full-corpus domain model | 128-512 GPUs | PB-scale storage plus 100-400Gbps networking | first full Xperience-native foundation model |
|
| 132 |
+
| 30B-class omni model from scratch | 512-2,000+ GPUs | PB-scale storage, multi-node orchestration, large checkpoint budget | lab-scale project, not the first target |
|
| 133 |
+
| frontier general omni model | thousands of GPUs | data beyond Xperience-10M plus large infrastructure | out of scope for this project |
|
| 134 |
+
|
| 135 |
+
For full-corpus work, storage is as important as GPU count:
|
| 136 |
+
|
| 137 |
+
- raw corpus storage at roughly the official dataset scale (about 1 PB),
|
| 138 |
+
- 1.5-3x extra capacity for derived shards, caches, checkpoints, and metadata,
|
| 139 |
+
- fast NVMe cache for active shards,
|
| 140 |
+
- parallel media decoding and feature extraction workers,
|
| 141 |
+
- distributed training with reliable checkpoint/restart,
|
| 142 |
+
- per-episode provenance and split manifests.
|
| 143 |
+
|
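A rough worked example of the storage budget follows. It reads "1.5-3x extra capacity" as additional space equal to 1.5 to 3 times the raw corpus, on top of the raw corpus itself; that reading is an assumption, as is the 1 PB raw figure taken from the approximate total quoted earlier.

```python
RAW_CORPUS_PB = 1.0  # approximate total reported on the dataset card


def total_storage_pb(raw_pb, extra_low=1.5, extra_high=3.0):
    """Return (low, high) total capacity in PB: raw data plus derived data."""
    return raw_pb * (1 + extra_low), raw_pb * (1 + extra_high)


low, high = total_storage_pb(RAW_CORPUS_PB)
print(f"plan for roughly {low:g} to {high:g} PB in total")
```

Under that reading, a full-corpus run would need on the order of 2.5 to 4 PB before counting the NVMe cache tier.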
| 144 |
+
## Evaluation Protocol
|
| 145 |
+
|
| 146 |
+
The model should not be judged only by training loss. Evaluation should include:
|
| 147 |
+
|
| 148 |
+
- JSON validity and structured task metrics from the current task suite,
|
| 149 |
+
- action/subtask/contact/object metrics on held-out episodes,
|
| 150 |
+
- text-to-window and window-to-text retrieval,
|
| 151 |
+
- future ego-motion and hand-motion forecasting,
|
| 152 |
+
- cross-modal reconstruction and missing-modality robustness,
|
| 153 |
+
- held-out object/activity/session generalization,
|
| 154 |
+
- qualitative inspection of retrieved or generated future states,
|
| 155 |
+
- downstream transfer to Qwen3-Omni, Cosmos-style world modeling, and
|
| 156 |
+
policy/action branches.
|
| 157 |
+
|
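For the retrieval items in the list above, a typical metric is recall@k over a window-by-caption similarity matrix. The sketch below assumes that row `i` and column `i` belong to the same window, which is a convention chosen for illustration; the function name and the toy scores are hypothetical.

```python
def recall_at_k(similarity, k):
    """Fraction of queries whose matching item ranks in the top k.

    similarity[i][j] scores query i against candidate j, and the correct
    candidate for query i is assumed to be index i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(similarity)


# Toy text-to-window scores: queries 0 and 2 retrieve their own window first.
scores = [
    [0.9, 0.1, 0.0],
    [0.2, 0.1, 0.7],
    [0.3, 0.2, 0.5],
]
print(recall_at_k(scores, 1), recall_at_k(scores, 3))
```

Transposing the matrix gives the window-to-text direction, so the same function covers both retrieval tasks.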
| 158 |
+
## Relationship To Existing Public Work
|
| 159 |
+
|
| 160 |
+
The current public project is the harness for this future model:
|
| 161 |
+
|
| 162 |
+
- the 12-task suite defines concrete input/output contracts,
|
| 163 |
+
- minimal and neural baselines provide initial supervised targets,
|
| 164 |
+
- audio/modality diagnostics show which signals contribute,
|
| 165 |
+
- Qwen3-Omni LoRA provides the first trainable multi-episode adapter path,
|
| 166 |
+
- Cosmos and policy branches define downstream model families,
|
| 167 |
+
- the pretraining goal unifies these into a long-term representation-learning
|
| 168 |
+
direction.
|
| 169 |
+
|
| 170 |
+
The next practical step is still selected multi-episode preparation and
|
| 171 |
+
held-out Qwen3-Omni LoRA evaluation. Full-corpus pretraining should come after
|
| 172 |
+
the smaller scaling stages show measurable value.
|
| 173 |
+
|
| 174 |
+
## Source Links
|
| 175 |
+
|
| 176 |
+
- Official Xperience-10M dataset: https://huggingface.co/datasets/ropedia-ai/xperience-10m
|
| 177 |
+
- Ropedia Xperience-10M release page: https://ropedia.com/blog/20260316_xperience_10m
|
| 178 |
+
- Ropedia physical-AI data infrastructure page: https://ropedia-dev.com/
|
data/artifact_index.json
CHANGED
|
@@ -1,11 +1,11 @@
|
|
| 1 |
{
|
| 2 |
"title": "Ropedia Xperience-10M Task Suite Artifact Index",
|
| 3 |
-
"generated_at_utc": "2026-06-
|
| 4 |
"status": "pass",
|
| 5 |
-
"artifact_count":
|
| 6 |
"missing": [],
|
| 7 |
"by_kind": {
|
| 8 |
-
"project_path":
|
| 9 |
"project_scope": 1,
|
| 10 |
"source_alignment": 5,
|
| 11 |
"publication_workflow": 1,
|
|
@@ -62,8 +62,8 @@
|
|
| 62 |
"surface": "repo_hf",
|
| 63 |
"shows": "Gives a compact current-state table for first-pass readers.",
|
| 64 |
"exists": true,
|
| 65 |
-
"bytes":
|
| 66 |
-
"sha256": "
|
| 67 |
},
|
| 68 |
{
|
| 69 |
"id": "project_status_json",
|
|
@@ -73,8 +73,8 @@
|
|
| 73 |
"surface": "website_hf",
|
| 74 |
"shows": "Machine-readable copy of the current project status for website and HF mirrors.",
|
| 75 |
"exists": true,
|
| 76 |
-
"bytes":
|
| 77 |
-
"sha256": "
|
| 78 |
},
|
| 79 |
{
|
| 80 |
"id": "research_roadmap",
|
|
@@ -84,8 +84,8 @@
|
|
| 84 |
"surface": "repo_hf",
|
| 85 |
"shows": "Defines the path from public-sample task development to multi-episode held-out evaluation and larger omni-model extensions.",
|
| 86 |
"exists": true,
|
| 87 |
-
"bytes":
|
| 88 |
-
"sha256": "
|
| 89 |
},
|
| 90 |
{
|
| 91 |
"id": "research_roadmap_json",
|
|
@@ -95,8 +95,8 @@
|
|
| 95 |
"surface": "website_hf",
|
| 96 |
"shows": "Machine-readable research roadmap for the website and Hugging Face mirrors.",
|
| 97 |
"exists": true,
|
| 98 |
-
"bytes":
|
| 99 |
-
"sha256": "
|
| 100 |
},
|
| 101 |
{
|
| 102 |
"id": "foundation_model_plan",
|
|
@@ -106,8 +106,8 @@
|
|
| 106 |
"surface": "repo_hf",
|
| 107 |
"shows": "Defines the post-data-gate backbone choices: Qwen3-Omni first, Cosmos 3 for world modeling, and VLA/policy models after action-target conversion.",
|
| 108 |
"exists": true,
|
| 109 |
-
"bytes":
|
| 110 |
-
"sha256": "
|
| 111 |
},
|
| 112 |
{
|
| 113 |
"id": "foundation_model_plan_json",
|
|
@@ -117,8 +117,19 @@
|
|
| 117 |
"surface": "website_hf",
|
| 118 |
"shows": "Machine-readable foundation-model selection matrix with source links, entry conditions, and evaluation additions.",
|
| 119 |
"exists": true,
|
| 120 |
-
"bytes":
|
| 121 |
-
"sha256": "
|
| 122 |
},
|
| 123 |
{
|
| 124 |
"id": "evidence_contract",
|
|
@@ -150,8 +161,8 @@
|
|
| 150 |
"surface": "repo_hf",
|
| 151 |
"shows": "Gives the human-readable map from project scope to data, tasks, platform mirrors, and scale-up status.",
|
| 152 |
"exists": true,
|
| 153 |
-
"bytes":
|
| 154 |
-
"sha256": "
|
| 155 |
},
|
| 156 |
{
|
| 157 |
"id": "official_dataset_card_alignment",
|
|
@@ -195,7 +206,7 @@
|
|
| 195 |
"shows": "Machine-readable source-alignment pass/fail check for repo, website, and HF surfaces.",
|
| 196 |
"exists": true,
|
| 197 |
"bytes": 4432,
|
| 198 |
-
"sha256": "
|
| 199 |
},
|
| 200 |
{
|
| 201 |
"id": "source_alignment_validator",
|
|
@@ -573,8 +584,8 @@
|
|
| 573 |
"surface": "repo_hf",
|
| 574 |
"shows": "Generates the selective artifact catalog from local files.",
|
| 575 |
"exists": true,
|
| 576 |
-
"bytes":
|
| 577 |
-
"sha256": "
|
| 578 |
},
|
| 579 |
{
|
| 580 |
"id": "publication_audit",
|
|
@@ -585,7 +596,7 @@
|
|
| 585 |
"volatile": true,
|
| 586 |
"shows": "Confirms public bundles exclude raw data, caches, heavy archives, and credential text.",
|
| 587 |
"exists": true,
|
| 588 |
-
"bytes":
|
| 589 |
"hash_policy": "existence_and_size_only"
|
| 590 |
},
|
| 591 |
{
|
|
@@ -597,7 +608,7 @@
|
|
| 597 |
"volatile": true,
|
| 598 |
"shows": "Separates setup paths from completed held-out-episode results.",
|
| 599 |
"exists": true,
|
| 600 |
-
"bytes":
|
| 601 |
"hash_policy": "existence_and_size_only"
|
| 602 |
},
|
| 603 |
{
|
|
@@ -609,7 +620,7 @@
|
|
| 609 |
"volatile": true,
|
| 610 |
"shows": "Confirms prepared GitHub/HF Space/artifact/model mirrors share the same critical data, figure, website HTML, and validator files.",
|
| 611 |
"exists": true,
|
| 612 |
-
"bytes":
|
| 613 |
"hash_policy": "existence_and_size_only"
|
| 614 |
},
|
| 615 |
{
|
|
@@ -621,7 +632,7 @@
|
|
| 621 |
"volatile": true,
|
| 622 |
"shows": "Confirms local website links, anchors, JSON data files, and referenced images resolve.",
|
| 623 |
"exists": true,
|
| 624 |
-
"bytes":
|
| 625 |
"hash_policy": "existence_and_size_only"
|
| 626 |
},
|
| 627 |
{
|
|
|
|
| 1 |
{
|
| 2 |
"title": "Ropedia Xperience-10M Task Suite Artifact Index",
|
| 3 |
+
"generated_at_utc": "2026-06-04T20:40:52+00:00",
|
| 4 |
"status": "pass",
|
| 5 |
+
"artifact_count": 73,
|
| 6 |
"missing": [],
|
| 7 |
"by_kind": {
|
| 8 |
+
"project_path": 12,
|
| 9 |
"project_scope": 1,
|
| 10 |
"source_alignment": 5,
|
| 11 |
"publication_workflow": 1,
|
|
|
|
| 62 |
"surface": "repo_hf",
|
| 63 |
"shows": "Gives a compact current-state table for first-pass readers.",
|
| 64 |
"exists": true,
|
| 65 |
+
"bytes": 7207,
|
| 66 |
+
"sha256": "7baaba976ccc254da1a03ee2653057d1e08f3fb0c0cad035886c362442828720"
|
| 67 |
},
|
| 68 |
{
|
| 69 |
"id": "project_status_json",
|
|
|
|
| 73 |
"surface": "website_hf",
|
| 74 |
"shows": "Machine-readable copy of the current project status for website and HF mirrors.",
|
| 75 |
"exists": true,
|
| 76 |
+
"bytes": 9874,
|
| 77 |
+
"sha256": "600c95726eae3404127a8b2110f35468ff2ba02943cae0fbcd3ea43c66109d3e"
|
| 78 |
},
|
| 79 |
{
|
| 80 |
"id": "research_roadmap",
|
|
|
|
| 84 |
"surface": "repo_hf",
|
| 85 |
"shows": "Defines the path from public-sample task development to multi-episode held-out evaluation and larger omni-model extensions.",
|
| 86 |
"exists": true,
|
| 87 |
+
"bytes": 8388,
|
| 88 |
+
"sha256": "0b3e3356076998ad94dc39f708cc783a4ebeab76c9da661cdd37ea12a3bb3665"
|
| 89 |
},
|
| 90 |
{
|
| 91 |
"id": "research_roadmap_json",
|
|
|
|
| 95 |
"surface": "website_hf",
|
| 96 |
"shows": "Machine-readable research roadmap for the website and Hugging Face mirrors.",
|
| 97 |
"exists": true,
|
| 98 |
+
"bytes": 7161,
|
| 99 |
+
"sha256": "cc96118c2c05108c831616151bc027441f7545495adeeb6a4a6a6bffe8da7801"
|
| 100 |
},
|
| 101 |
{
|
| 102 |
"id": "foundation_model_plan",
|
|
|
|
| 106 |
"surface": "repo_hf",
|
| 107 |
"shows": "Defines the post-data-gate backbone choices: Qwen3-Omni first, Cosmos 3 for world modeling, and VLA/policy models after action-target conversion.",
|
| 108 |
"exists": true,
|
| 109 |
+
"bytes": 9075,
|
| 110 |
+
"sha256": "444d13ab556d2e16a199a7fca191b87c85ab8685d167aab357bc6341839299a2"
|
| 111 |
},
|
| 112 |
{
|
| 113 |
"id": "foundation_model_plan_json",
|
|
|
|
| 117 |
"surface": "website_hf",
|
| 118 |
"shows": "Machine-readable foundation-model selection matrix with source links, entry conditions, and evaluation additions.",
|
| 119 |
"exists": true,
|
| 120 |
+
"bytes": 12981,
|
| 121 |
+
"sha256": "9cce52025a2e2f8afb4660e2af3353aea6ad0a1af380849218dd74c0acc271bb"
|
| 122 |
+
},
|
| 123 |
+
{
|
| 124 |
+
"id": "xperience_embodied_foundation_pretraining",
|
| 125 |
+
"title": "Xperience Embodied Foundation Model pretraining goal",
|
| 126 |
+
"path": "XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md",
|
| 127 |
+
"kind": "project_path",
|
| 128 |
+
"surface": "repo_hf",
|
| 129 |
+
"shows": "Describes the future full-corpus Xperience-native pretraining goal, target modules, objectives, staged scale-up, hardware ranges, and evaluation protocol.",
|
| 130 |
+
"exists": true,
|
| 131 |
+
"bytes": 9182,
|
| 132 |
+
"sha256": "b5a6ddc58647cd895a4772b110ecc9f4d685427fb37b81b22c6c02d2b9b323f1"
|
| 133 |
},
|
| 134 |
{
|
| 135 |
"id": "evidence_contract",
|
|
|
|
| 161 |
"surface": "repo_hf",
|
| 162 |
"shows": "Gives the human-readable map from project scope to data, tasks, platform mirrors, and scale-up status.",
|
| 163 |
"exists": true,
|
| 164 |
+
"bytes": 11440,
|
| 165 |
+
"sha256": "9b8821a9b14fe1744f2e6b5c419b2c5daaf70b57f1944caf1105c36c0c66c119"
|
| 166 |
},
|
| 167 |
{
|
| 168 |
"id": "official_dataset_card_alignment",
|
|
|
|
| 206 |
"shows": "Machine-readable source-alignment pass/fail check for repo, website, and HF surfaces.",
|
| 207 |
"exists": true,
|
| 208 |
"bytes": 4432,
|
| 209 |
+
"sha256": "06c6e2d111c72df01ed127fd288e6675b63e35a21ae12a2523931a072bd0bc49"
|
| 210 |
},
|
| 211 |
{
|
| 212 |
"id": "source_alignment_validator",
|
|
|
|
| 584 |
"surface": "repo_hf",
|
| 585 |
"shows": "Generates the selective artifact catalog from local files.",
|
| 586 |
"exists": true,
|
| 587 |
+
"bytes": 27020,
|
| 588 |
+
"sha256": "0ca7ed96f24caecbab31687cffa99f0eba8471258986412a294614e688c5aff5"
|
| 589 |
},
|
| 590 |
{
|
| 591 |
"id": "publication_audit",
|
|
|
|
| 596 |
"volatile": true,
|
| 597 |
"shows": "Confirms public bundles exclude raw data, caches, heavy archives, and credential text.",
|
| 598 |
"exists": true,
|
| 599 |
+
"bytes": 11811,
|
| 600 |
"hash_policy": "existence_and_size_only"
|
| 601 |
},
|
| 602 |
{
|
|
|
|
| 608 |
"volatile": true,
|
| 609 |
"shows": "Separates setup paths from completed held-out-episode results.",
|
| 610 |
"exists": true,
|
| 611 |
+
"bytes": 18981,
|
| 612 |
"hash_policy": "existence_and_size_only"
|
| 613 |
},
|
| 614 |
{
|
|
|
|
| 620 |
"volatile": true,
|
| 621 |
"shows": "Confirms prepared GitHub/HF Space/artifact/model mirrors share the same critical data, figure, website HTML, and validator files.",
|
| 622 |
"exists": true,
|
| 623 |
+
"bytes": 108621,
|
| 624 |
"hash_policy": "existence_and_size_only"
|
| 625 |
},
|
| 626 |
{
|
|
|
|
| 632 |
"volatile": true,
|
| 633 |
"shows": "Confirms local website links, anchors, JSON data files, and referenced images resolve.",
|
| 634 |
"exists": true,
|
| 635 |
+
"bytes": 14891,
|
| 636 |
"hash_policy": "existence_and_size_only"
|
| 637 |
},
|
| 638 |
{
|
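The index entries above pair each artifact with `exists`, `bytes`, and either a `sha256` digest or a `hash_policy` of `existence_and_size_only` for volatile files. A minimal sketch of how such an entry could be computed is shown here; it is not the contents of `scripts/build_artifact_index.py`, and the function name `describe_artifact` is hypothetical.

```python
import hashlib
from pathlib import Path


def describe_artifact(path, volatile=False):
    """Build an index entry with size and, for stable files, a SHA-256 digest."""
    file_path = Path(path)
    if not file_path.is_file():
        return {"exists": False}
    data = file_path.read_bytes()
    entry = {"exists": True, "bytes": len(data)}
    if volatile:
        # Volatile files change on every build, so only presence and size are kept.
        entry["hash_policy"] = "existence_and_size_only"
    else:
        entry["sha256"] = hashlib.sha256(data).hexdigest()
    return entry
```

Skipping the digest for volatile files avoids a chicken-and-egg problem in which a generated report would have to contain its own hash.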
data/foundation_model_plan.json
CHANGED
|
@@ -2,6 +2,16 @@
|
|
| 2 |
"title": "Xperience-10M Foundation Model Plan",
|
| 3 |
"status": "planning_artifact",
|
| 4 |
"current_boundary": "No held-out multi-episode foundation-model result has been completed in this repo. The current foundation-model artifacts are setup-stage until enough valid episodes are prepared and evaluated.",
|
| 5 |
"decision": {
|
| 6 |
"immediate_trainable_backbone": "Qwen3-Omni",
|
| 7 |
"first_world_model_branch": "Cosmos 3",
|
|
@@ -10,7 +20,65 @@
|
|
| 10 |
"openpi pi0/pi0.5",
|
| 11 |
"NVIDIA GR00T"
|
| 12 |
],
|
| 13 |
-
"external_reasoning_reference": "Gemini Robotics"
|
| 14 |
},
|
| 15 |
"model_families": [
|
| 16 |
{
|
|
@@ -112,6 +180,21 @@
|
|
| 112 |
"current_decision": "optional_baseline_after_data_staging",
|
| 113 |
"entry_condition": "Action labels and baseline protocol exist.",
|
| 114 |
"public_source": "https://github.com/huggingface/lerobot"
|
| 115 |
}
|
| 116 |
],
|
| 117 |
"execution_order": [
|
|
@@ -144,6 +227,11 @@
|
|
| 144 |
"step": 6,
|
| 145 |
"name": "Publishing threshold",
|
| 146 |
"action": "Publish branch results only with real manifests, predictions, metrics, and qualitative examples."
|
| 147 |
}
|
| 148 |
],
|
| 149 |
"evaluation_additions": [
|
|
@@ -230,6 +318,10 @@
|
|
| 230 |
{
|
| 231 |
"label": "LeRobot / SmolVLA",
|
| 232 |
"url": "https://github.com/huggingface/lerobot"
|
| 233 |
}
|
| 234 |
]
|
| 235 |
}
|
|
|
|
| 2 |
"title": "Xperience-10M Foundation Model Plan",
|
| 3 |
"status": "planning_artifact",
|
| 4 |
"current_boundary": "No held-out multi-episode foundation-model result has been completed in this repo. The current foundation-model artifacts are setup-stage until enough valid episodes are prepared and evaluated.",
|
| 5 |
+
"backbone_registry": {
|
| 6 |
+
"config_dir": "configs/omni_backbones",
|
| 7 |
+
"validator": "scripts/omni/backbone_registry.py --validate --json",
|
| 8 |
+
"extension_contract": "OMNI_MODEL_EXTENSION_CONTRACT.md",
|
| 9 |
+
"implemented_backbone": "qwen3_omni_lora",
|
| 10 |
+
"planned_backbones": [
|
| 11 |
+
"cosmos_world_model",
|
| 12 |
+
"policy_vla_branch"
|
| 13 |
+
]
|
| 14 |
+
},
|
| 15 |
"decision": {
|
| 16 |
"immediate_trainable_backbone": "Qwen3-Omni",
|
| 17 |
"first_world_model_branch": "Cosmos 3",
|
|
|
|
| 20 |
"openpi pi0/pi0.5",
|
| 21 |
"NVIDIA GR00T"
|
| 22 |
],
|
| 23 |
+
"external_reasoning_reference": "Gemini Robotics",
|
| 24 |
+
"long_term_native_pretraining_goal": "Xperience Embodied Foundation Model"
|
| 25 |
+
},
|
| 26 |
+
"future_pretraining_goal": {
|
| 27 |
+
"name": "Xperience Embodied Foundation Model",
|
| 28 |
+
"status": "future_planning_goal",
|
| 29 |
+
"role": "Domain-specific embodied foundation model pretrained on full Xperience-10M if full-corpus data, storage, and compute become available.",
|
| 30 |
+
"not_current_result": true,
|
| 31 |
+
"document": "XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md",
|
| 32 |
+
"entry_conditions": [
|
| 33 |
+
"Selected multi-episode Qwen3-Omni pilot trains and evaluates cleanly.",
|
| 34 |
+
"Scaling from 128 episodes to thousands of episodes shows measurable value.",
|
| 35 |
+
"Full-corpus storage, derived-shard storage, and fast active-cache capacity are available.",
|
| 36 |
+
"Distributed training, checkpoint/restart, and provenance tracking are reliable.",
|
| 37 |
+
"Evaluation covers held-out episodes, sessions, activities, objects, and missing-modality robustness."
|
| 38 |
+
],
|
| 39 |
+
"target_modules": [
|
| 40 |
+
"multi-view video encoder",
|
| 41 |
+
"audio encoder",
|
| 42 |
+
"depth and geometry encoder",
|
| 43 |
+
"pose/SLAM encoder",
|
| 44 |
+
"hand/body mocap encoder",
|
| 45 |
+
"IMU encoder",
|
| 46 |
+
"language encoder/decoder",
|
| 47 |
+
"temporal fusion transformer",
|
| 48 |
+
"task heads and decoders"
|
| 49 |
+
],
|
| 50 |
+
"pretraining_objectives": [
|
| 51 |
+
"masked multimodal modeling",
|
| 52 |
+
"cross-modal contrastive alignment",
|
| 53 |
+
"future-state prediction",
|
| 54 |
+
"ego-motion and hand-motion forecasting",
|
| 55 |
+
"action and procedure prediction",
|
| 56 |
+
"language grounding and captioning",
|
| 57 |
+
"contact and affordance prediction",
|
| 58 |
+
"optional policy-style targets after action conversion"
|
| 59 |
+
],
|
| 60 |
+
"hardware_ranges": [
|
| 61 |
+
{
|
| 62 |
+
"goal": "0.3B-1B pilot",
|
| 63 |
+
"compute": "8-32 modern 80GB-class data-center GPUs",
|
| 64 |
+
"use": "prove objectives and data loaders"
|
| 65 |
+
},
|
| 66 |
+
{
|
| 67 |
+
"goal": "1B-3B domain model",
|
| 68 |
+
"compute": "32-128 GPUs",
|
| 69 |
+
"use": "research-scale Xperience representation learning"
|
| 70 |
+
},
|
| 71 |
+
{
|
| 72 |
+
"goal": "3B-7B full-corpus domain model",
|
| 73 |
+
"compute": "128-512 GPUs",
|
| 74 |
+
"use": "first realistic full Xperience-native foundation model"
|
| 75 |
+
},
|
| 76 |
+
{
|
| 77 |
+
"goal": "30B-class omni model from scratch",
|
| 78 |
+
"compute": "512-2000+ GPUs",
|
| 79 |
+
"use": "lab-scale project after scaling curves justify cost"
|
| 80 |
+
}
|
| 81 |
+
]
|
| 82 |
},
|
| 83 |
"model_families": [
|
| 84 |
{
|
|
|
|
| 180 |
"current_decision": "optional_baseline_after_data_staging",
|
| 181 |
"entry_condition": "Action labels and baseline protocol exist.",
|
| 182 |
"public_source": "https://github.com/huggingface/lerobot"
|
| 183 |
+
},
|
| 184 |
+
{
|
| 185 |
+
"priority": 8,
|
| 186 |
+
"family": "Xperience Embodied Foundation Model",
|
| 187 |
+
"category": "xperience_native_pretraining_goal",
|
| 188 |
+
"openness": "future project-specific model if full-corpus access and compute exist",
|
| 189 |
+
"best_role": "Domain model over synchronized embodied experience.",
|
| 190 |
+
"xperience10m_fit": [
|
| 191 |
+
"Uses the full aligned modality stack rather than treating sensors as auxiliary metadata.",
|
| 192 |
+
"Targets temporal embodied representation learning across perception, motion, geometry, audio, and language.",
|
| 193 |
+
"Can become the shared pretraining backbone for Qwen-style instruction tasks, Cosmos-style world modeling, and policy/action branches."
|
| 194 |
+
],
|
| 195 |
+
"current_decision": "future_goal_after_scaling_evidence",
|
| 196 |
+
"entry_condition": "Full-corpus data path, PB-scale storage, multi-node compute, and positive smaller-run scaling evidence.",
|
| 197 |
+
"public_source": "XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md"
|
| 198 |
}
|
| 199 |
],
|
| 200 |
"execution_order": [
|
|
|
|
| 227 |
"step": 6,
|
| 228 |
"name": "Publishing threshold",
|
| 229 |
"action": "Publish branch results only with real manifests, predictions, metrics, and qualitative examples."
|
| 230 |
+
},
|
| 231 |
+
{
|
| 232 |
+
"step": 7,
|
| 233 |
+
"name": "Xperience-native pretraining",
|
| 234 |
+
"action": "Start a from-scratch Xperience Embodied Foundation Model only after smaller scaling stages, full-corpus storage, multi-node compute, and held-out evaluation protocols are in place."
|
| 235 |
}
|
| 236 |
],
|
| 237 |
"evaluation_additions": [
|
|
|
|
| 318 |
{
|
| 319 |
"label": "LeRobot / SmolVLA",
|
| 320 |
"url": "https://github.com/huggingface/lerobot"
|
| 321 |
+
},
|
| 322 |
+
{
|
| 323 |
+
"label": "Xperience Embodied Foundation Model pretraining plan",
|
| 324 |
+
"url": "XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md"
|
| 325 |
}
|
| 326 |
]
|
| 327 |
}
|
data/mirror_parity.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
| 1 |
{
|
| 2 |
"status": "pass",
|
| 3 |
-
"generated_at_utc": "2026-06-
|
| 4 |
"hf_root": "hf_publish",
|
| 5 |
"summary": {
|
| 6 |
"group_count": 101,
|
|
@@ -71,27 +71,27 @@
|
|
| 71 |
"local": {
|
| 72 |
"path": "repo:docs/data/artifact_index.json",
|
| 73 |
"exists": true,
|
| 74 |
-
"bytes":
|
| 75 |
-
"sha256": "
|
| 76 |
},
|
| 77 |
"mirrors": {
|
| 78 |
"hf_space": {
|
| 79 |
"path": "hf_space:data/artifact_index.json",
|
| 80 |
"exists": true,
|
| 81 |
-
"bytes":
|
| 82 |
-
"sha256": "
|
| 83 |
},
|
| 84 |
"hf_artifacts": {
|
| 85 |
"path": "hf_artifacts:docs/data/artifact_index.json",
|
| 86 |
"exists": true,
|
| 87 |
-
"bytes":
|
| 88 |
-
"sha256": "
|
| 89 |
},
|
| 90 |
"hf_model": {
|
| 91 |
"path": "hf_model:metrics/artifact_index.json",
|
| 92 |
"exists": true,
|
| 93 |
-
"bytes":
|
| 94 |
-
"sha256": "
|
| 95 |
}
|
| 96 |
},
|
| 97 |
"failures": []
|
|
@@ -226,27 +226,27 @@
|
|
| 226 |
"local": {
|
| 227 |
"path": "repo:docs/data/foundation_model_plan.json",
|
| 228 |
"exists": true,
|
| 229 |
-
"bytes":
|
| 230 |
-
"sha256": "
|
| 231 |
},
|
| 232 |
"mirrors": {
|
| 233 |
"hf_space": {
|
| 234 |
"path": "hf_space:data/foundation_model_plan.json",
|
| 235 |
"exists": true,
|
| 236 |
-
"bytes":
|
| 237 |
-
"sha256": "
|
| 238 |
},
|
| 239 |
"hf_artifacts": {
|
| 240 |
"path": "hf_artifacts:docs/data/foundation_model_plan.json",
|
| 241 |
"exists": true,
|
| 242 |
-
"bytes":
|
| 243 |
-
"sha256": "
|
| 244 |
},
|
| 245 |
"hf_model": {
|
| 246 |
"path": "hf_model:metrics/foundation_model_plan.json",
|
| 247 |
"exists": true,
|
| 248 |
-
"bytes":
|
| 249 |
-
"sha256": "
|
| 250 |
}
|
| 251 |
},
|
| 252 |
"failures": []
|
|
@@ -412,27 +412,27 @@
|
|
| 412 |
"local": {
|
| 413 |
"path": "repo:docs/data/project_status.json",
|
| 414 |
"exists": true,
|
| 415 |
-
"bytes":
|
| 416 |
-
"sha256": "
|
| 417 |
},
|
| 418 |
"mirrors": {
|
| 419 |
"hf_space": {
|
| 420 |
"path": "hf_space:data/project_status.json",
|
| 421 |
"exists": true,
|
| 422 |
-
"bytes":
|
| 423 |
-
"sha256": "
|
| 424 |
},
|
| 425 |
"hf_artifacts": {
|
| 426 |
"path": "hf_artifacts:docs/data/project_status.json",
|
| 427 |
"exists": true,
|
| 428 |
-
"bytes":
|
| 429 |
-
"sha256": "
|
| 430 |
},
|
| 431 |
"hf_model": {
|
| 432 |
"path": "hf_model:metrics/project_status.json",
|
| 433 |
"exists": true,
|
| 434 |
-
"bytes":
|
| 435 |
-
"sha256": "
|
| 436 |
}
|
| 437 |
},
|
| 438 |
"failures": []
|
|
@@ -444,26 +444,26 @@
|
|
| 444 |
"path": "repo:docs/data/publication_audit.json",
|
| 445 |
"exists": true,
|
| 446 |
"bytes": 7237,
|
| 447 |
-
"sha256": "
|
| 448 |
},
|
| 449 |
"mirrors": {
|
| 450 |
"hf_space": {
|
| 451 |
"path": "hf_space:data/publication_audit.json",
|
| 452 |
"exists": true,
|
| 453 |
"bytes": 7237,
|
| 454 |
-
"sha256": "
|
| 455 |
},
|
| 456 |
"hf_artifacts": {
|
| 457 |
"path": "hf_artifacts:docs/data/publication_audit.json",
|
| 458 |
"exists": true,
|
| 459 |
"bytes": 7237,
|
| 460 |
-
"sha256": "
|
| 461 |
},
|
| 462 |
"hf_model": {
|
| 463 |
"path": "hf_model:metrics/publication_audit.json",
|
| 464 |
"exists": true,
|
| 465 |
"bytes": 7237,
|
| 466 |
-
"sha256": "
|
| 467 |
}
|
| 468 |
},
|
| 469 |
"failures": []
|
|
@@ -598,27 +598,27 @@
|
|
| 598 |
"local": {
|
| 599 |
"path": "repo:docs/data/research_roadmap.json",
|
| 600 |
"exists": true,
|
| 601 |
-
"bytes":
|
| 602 |
-
"sha256": "
|
| 603 |
},
|
| 604 |
"mirrors": {
|
| 605 |
"hf_space": {
|
| 606 |
"path": "hf_space:data/research_roadmap.json",
|
| 607 |
"exists": true,
|
| 608 |
-
"bytes":
|
| 609 |
-
"sha256": "
|
| 610 |
},
|
| 611 |
"hf_artifacts": {
|
| 612 |
"path": "hf_artifacts:docs/data/research_roadmap.json",
|
| 613 |
"exists": true,
|
| 614 |
-
"bytes":
|
| 615 |
-
"sha256": "
|
| 616 |
},
|
| 617 |
"hf_model": {
|
| 618 |
"path": "hf_model:metrics/research_roadmap.json",
|
| 619 |
"exists": true,
|
| 620 |
-
"bytes":
|
| 621 |
-
"sha256": "
|
| 622 |
}
|
| 623 |
},
|
| 624 |
"failures": []
|
|
@@ -629,27 +629,27 @@
|
|
| 629 |
"local": {
|
| 630 |
"path": "repo:docs/data/research_roadmap_interactive.json",
|
| 631 |
"exists": true,
|
| 632 |
-
"bytes":
|
| 633 |
-
"sha256": "
|
| 634 |
},
|
| 635 |
"mirrors": {
|
| 636 |
"hf_space": {
|
| 637 |
"path": "hf_space:data/research_roadmap_interactive.json",
|
| 638 |
"exists": true,
|
| 639 |
-
"bytes":
|
| 640 |
-
"sha256": "
|
| 641 |
},
|
| 642 |
"hf_artifacts": {
|
| 643 |
"path": "hf_artifacts:docs/data/research_roadmap_interactive.json",
|
| 644 |
"exists": true,
|
| 645 |
-
"bytes":
|
| 646 |
-
"sha256": "
|
| 647 |
},
|
| 648 |
"hf_model": {
|
| 649 |
"path": "hf_model:metrics/research_roadmap_interactive.json",
|
| 650 |
"exists": true,
|
| 651 |
-
"bytes":
|
| 652 |
-
"sha256": "
|
| 653 |
}
|
| 654 |
},
|
| 655 |
"failures": []
|
|
@@ -1692,21 +1692,21 @@
|
|
| 1692 |
"local": {
|
| 1693 |
"path": "repo:scripts/build_artifact_index.py",
|
| 1694 |
"exists": true,
|
| 1695 |
-
"bytes":
|
| 1696 |
-
"sha256": "
|
| 1697 |
},
|
| 1698 |
"mirrors": {
|
| 1699 |
"hf_artifacts": {
|
| 1700 |
"path": "hf_artifacts:scripts/build_artifact_index.py",
|
| 1701 |
"exists": true,
|
| 1702 |
-
"bytes":
|
| 1703 |
-
"sha256": "
|
| 1704 |
},
|
| 1705 |
"hf_model": {
|
| 1706 |
"path": "hf_model:scripts/build_artifact_index.py",
|
| 1707 |
"exists": true,
|
| 1708 |
-
"bytes":
|
| 1709 |
-
"sha256": "
|
| 1710 |
}
|
| 1711 |
},
|
| 1712 |
"failures": []
|
|
@@ -2017,21 +2017,21 @@
|
|
| 2017 |
"local": {
|
| 2018 |
"path": "repo:scripts/validate_publication_package.py",
|
| 2019 |
"exists": true,
|
| 2020 |
-
"bytes":
|
| 2021 |
-
"sha256": "
|
| 2022 |
},
|
| 2023 |
"mirrors": {
|
| 2024 |
"hf_artifacts": {
|
| 2025 |
"path": "hf_artifacts:scripts/validate_publication_package.py",
|
| 2026 |
"exists": true,
|
| 2027 |
-
"bytes":
|
| 2028 |
-
"sha256": "
|
| 2029 |
},
|
| 2030 |
"hf_model": {
|
| 2031 |
"path": "hf_model:scripts/validate_publication_package.py",
|
| 2032 |
"exists": true,
|
| 2033 |
-
"bytes":
|
| 2034 |
-
"sha256": "
|
| 2035 |
}
|
| 2036 |
},
|
| 2037 |
"failures": []
|
|
@@ -2217,21 +2217,21 @@
|
|
| 2217 |
"local": {
|
| 2218 |
"path": "repo:docs/index.html",
|
| 2219 |
"exists": true,
|
| 2220 |
-
"bytes":
|
| 2221 |
-
"sha256": "
|
| 2222 |
},
|
| 2223 |
"mirrors": {
|
| 2224 |
"hf_space": {
|
| 2225 |
"path": "hf_space:index.html",
|
| 2226 |
"exists": true,
|
| 2227 |
-
"bytes":
|
| 2228 |
-
"sha256": "
|
| 2229 |
},
|
| 2230 |
"hf_artifacts_docs": {
|
| 2231 |
"path": "hf_artifacts:docs/index.html",
|
| 2232 |
"exists": true,
|
| 2233 |
-
"bytes":
|
| 2234 |
-
"sha256": "
|
| 2235 |
}
|
| 2236 |
},
|
| 2237 |
"failures": []
|
|
@@ -2242,21 +2242,21 @@
|
|
| 2242 |
"local": {
|
| 2243 |
"path": "repo:docs/research_roadmap.html",
|
| 2244 |
"exists": true,
|
| 2245 |
-
"bytes":
|
| 2246 |
-
"sha256": "
|
| 2247 |
},
|
| 2248 |
"mirrors": {
|
| 2249 |
"hf_space": {
|
| 2250 |
"path": "hf_space:research_roadmap.html",
|
| 2251 |
"exists": true,
|
| 2252 |
-
"bytes":
|
| 2253 |
-
"sha256": "
|
| 2254 |
},
|
| 2255 |
"hf_artifacts_docs": {
|
| 2256 |
"path": "hf_artifacts:docs/research_roadmap.html",
|
| 2257 |
"exists": true,
|
| 2258 |
-
"bytes":
|
| 2259 |
-
"sha256": "
|
| 2260 |
}
|
| 2261 |
},
|
| 2262 |
"failures": []
|
|
@@ -2844,27 +2844,27 @@
|
|
| 2844 |
"local": {
|
| 2845 |
"path": "repo:FOUNDATION_MODEL_PLAN.md",
|
| 2846 |
"exists": true,
|
| 2847 |
-
"bytes":
|
| 2848 |
-
"sha256": "
|
| 2849 |
},
|
| 2850 |
"mirrors": {
|
| 2851 |
"hf_space": {
|
| 2852 |
"path": "hf_space:FOUNDATION_MODEL_PLAN.md",
|
| 2853 |
"exists": true,
|
| 2854 |
-
"bytes":
|
| 2855 |
-
"sha256": "
|
| 2856 |
},
|
| 2857 |
"hf_artifacts": {
|
| 2858 |
"path": "hf_artifacts:FOUNDATION_MODEL_PLAN.md",
|
| 2859 |
"exists": true,
|
| 2860 |
-
"bytes":
|
| 2861 |
-
"sha256": "
|
| 2862 |
},
|
| 2863 |
"hf_model": {
|
| 2864 |
"path": "hf_model:FOUNDATION_MODEL_PLAN.md",
|
| 2865 |
"exists": true,
|
| 2866 |
-
"bytes":
|
| 2867 |
-
"sha256": "
|
| 2868 |
}
|
| 2869 |
},
|
| 2870 |
"failures": []
|
|
@@ -2937,27 +2937,27 @@
|
|
| 2937 |
"local": {
|
| 2938 |
"path": "repo:RESEARCH_ROADMAP.md",
|
| 2939 |
"exists": true,
|
| 2940 |
-
"bytes":
|
| 2941 |
-
"sha256": "
|
| 2942 |
},
|
| 2943 |
"mirrors": {
|
| 2944 |
"hf_space": {
|
| 2945 |
"path": "hf_space:RESEARCH_ROADMAP.md",
|
| 2946 |
"exists": true,
|
| 2947 |
-
"bytes":
|
| 2948 |
-
"sha256": "
|
| 2949 |
},
|
| 2950 |
"hf_artifacts": {
|
| 2951 |
"path": "hf_artifacts:RESEARCH_ROADMAP.md",
|
| 2952 |
"exists": true,
|
| 2953 |
-
"bytes":
|
| 2954 |
-
"sha256": "
|
| 2955 |
},
|
| 2956 |
"hf_model": {
|
| 2957 |
"path": "hf_model:RESEARCH_ROADMAP.md",
|
| 2958 |
"exists": true,
|
| 2959 |
-
"bytes":
|
| 2960 |
-
"sha256": "
|
| 2961 |
}
|
| 2962 |
},
|
| 2963 |
"failures": []
|
|
@@ -2968,27 +2968,27 @@
|
|
| 2968 |
"local": {
|
| 2969 |
"path": "repo:PROJECT_STATUS.md",
|
| 2970 |
"exists": true,
|
| 2971 |
-
"bytes":
|
| 2972 |
-
"sha256": "
|
| 2973 |
},
|
| 2974 |
"mirrors": {
|
| 2975 |
"hf_space": {
|
| 2976 |
"path": "hf_space:PROJECT_STATUS.md",
|
| 2977 |
"exists": true,
|
| 2978 |
-
"bytes":
|
| 2979 |
-
"sha256": "
|
| 2980 |
},
|
| 2981 |
"hf_artifacts": {
|
| 2982 |
"path": "hf_artifacts:PROJECT_STATUS.md",
|
| 2983 |
"exists": true,
|
| 2984 |
-
"bytes":
|
| 2985 |
-
"sha256": "
|
| 2986 |
},
|
| 2987 |
"hf_model": {
|
| 2988 |
"path": "hf_model:PROJECT_STATUS.md",
|
| 2989 |
"exists": true,
|
| 2990 |
-
"bytes":
|
| 2991 |
-
"sha256": "
|
| 2992 |
}
|
| 2993 |
},
|
| 2994 |
"failures": []
|
|
|
|
| 1 |
{
|
| 2 |
"status": "pass",
|
| 3 |
+
"generated_at_utc": "2026-06-04T20:45:22+00:00",
|
| 4 |
"hf_root": "hf_publish",
|
| 5 |
"summary": {
|
| 6 |
"group_count": 101,
|
|
|
|
| 71 |
"local": {
|
| 72 |
"path": "repo:docs/data/artifact_index.json",
|
| 73 |
"exists": true,
|
| 74 |
+
"bytes": 32864,
|
| 75 |
+
"sha256": "ec7d17898c42fd76109567c201f9638059b6a9a11a48817b32677a0eb2662178"
|
| 76 |
},
|
| 77 |
"mirrors": {
|
| 78 |
"hf_space": {
|
| 79 |
"path": "hf_space:data/artifact_index.json",
|
| 80 |
"exists": true,
|
| 81 |
+
"bytes": 32864,
|
| 82 |
+
"sha256": "ec7d17898c42fd76109567c201f9638059b6a9a11a48817b32677a0eb2662178"
|
| 83 |
},
|
| 84 |
"hf_artifacts": {
|
| 85 |
"path": "hf_artifacts:docs/data/artifact_index.json",
|
| 86 |
"exists": true,
|
| 87 |
+
"bytes": 32864,
|
| 88 |
+
"sha256": "ec7d17898c42fd76109567c201f9638059b6a9a11a48817b32677a0eb2662178"
|
| 89 |
},
|
| 90 |
"hf_model": {
|
| 91 |
"path": "hf_model:metrics/artifact_index.json",
|
| 92 |
"exists": true,
|
| 93 |
+
"bytes": 32864,
|
| 94 |
+
"sha256": "ec7d17898c42fd76109567c201f9638059b6a9a11a48817b32677a0eb2662178"
|
| 95 |
}
|
| 96 |
},
|
| 97 |
"failures": []
|
|
|
|
| 226 |
"local": {
|
| 227 |
"path": "repo:docs/data/foundation_model_plan.json",
|
| 228 |
"exists": true,
|
| 229 |
+
"bytes": 12981,
|
| 230 |
+
"sha256": "9cce52025a2e2f8afb4660e2af3353aea6ad0a1af380849218dd74c0acc271bb"
|
| 231 |
},
|
| 232 |
"mirrors": {
|
| 233 |
"hf_space": {
|
| 234 |
"path": "hf_space:data/foundation_model_plan.json",
|
| 235 |
"exists": true,
|
| 236 |
+
"bytes": 12981,
|
| 237 |
+
"sha256": "9cce52025a2e2f8afb4660e2af3353aea6ad0a1af380849218dd74c0acc271bb"
|
| 238 |
},
|
| 239 |
"hf_artifacts": {
|
| 240 |
"path": "hf_artifacts:docs/data/foundation_model_plan.json",
|
| 241 |
"exists": true,
|
| 242 |
+
"bytes": 12981,
|
| 243 |
+
"sha256": "9cce52025a2e2f8afb4660e2af3353aea6ad0a1af380849218dd74c0acc271bb"
|
| 244 |
},
|
| 245 |
"hf_model": {
|
| 246 |
"path": "hf_model:metrics/foundation_model_plan.json",
|
| 247 |
"exists": true,
|
| 248 |
+
"bytes": 12981,
|
| 249 |
+
"sha256": "9cce52025a2e2f8afb4660e2af3353aea6ad0a1af380849218dd74c0acc271bb"
|
| 250 |
}
|
| 251 |
},
|
| 252 |
"failures": []
|
|
|
|
| 412 |
"local": {
|
| 413 |
"path": "repo:docs/data/project_status.json",
|
| 414 |
"exists": true,
|
| 415 |
+
"bytes": 9874,
|
| 416 |
+
"sha256": "600c95726eae3404127a8b2110f35468ff2ba02943cae0fbcd3ea43c66109d3e"
|
| 417 |
},
|
| 418 |
"mirrors": {
|
| 419 |
"hf_space": {
|
| 420 |
"path": "hf_space:data/project_status.json",
|
| 421 |
"exists": true,
|
| 422 |
+
"bytes": 9874,
|
| 423 |
+
"sha256": "600c95726eae3404127a8b2110f35468ff2ba02943cae0fbcd3ea43c66109d3e"
|
| 424 |
},
|
| 425 |
"hf_artifacts": {
|
| 426 |
"path": "hf_artifacts:docs/data/project_status.json",
|
| 427 |
"exists": true,
|
| 428 |
+
"bytes": 9874,
|
| 429 |
+
"sha256": "600c95726eae3404127a8b2110f35468ff2ba02943cae0fbcd3ea43c66109d3e"
|
| 430 |
},
|
| 431 |
"hf_model": {
|
| 432 |
"path": "hf_model:metrics/project_status.json",
|
| 433 |
"exists": true,
|
| 434 |
+
"bytes": 9874,
|
| 435 |
+
"sha256": "600c95726eae3404127a8b2110f35468ff2ba02943cae0fbcd3ea43c66109d3e"
|
| 436 |
}
|
| 437 |
},
|
| 438 |
"failures": []
|
|
|
|
| 444 |
"path": "repo:docs/data/publication_audit.json",
|
| 445 |
"exists": true,
|
| 446 |
"bytes": 7237,
|
| 447 |
+
"sha256": "7fbb19f8990b1a4d902e282c010d27e4391755564fa68af97d96c298c6b054f8"
|
| 448 |
},
|
| 449 |
"mirrors": {
|
| 450 |
"hf_space": {
|
| 451 |
"path": "hf_space:data/publication_audit.json",
|
| 452 |
"exists": true,
|
| 453 |
"bytes": 7237,
|
| 454 |
+
"sha256": "7fbb19f8990b1a4d902e282c010d27e4391755564fa68af97d96c298c6b054f8"
|
| 455 |
},
|
| 456 |
"hf_artifacts": {
|
| 457 |
"path": "hf_artifacts:docs/data/publication_audit.json",
|
| 458 |
"exists": true,
|
| 459 |
"bytes": 7237,
|
| 460 |
+
"sha256": "7fbb19f8990b1a4d902e282c010d27e4391755564fa68af97d96c298c6b054f8"
|
| 461 |
},
|
| 462 |
"hf_model": {
|
| 463 |
"path": "hf_model:metrics/publication_audit.json",
|
| 464 |
"exists": true,
|
| 465 |
"bytes": 7237,
|
| 466 |
+
"sha256": "7fbb19f8990b1a4d902e282c010d27e4391755564fa68af97d96c298c6b054f8"
|
| 467 |
}
|
| 468 |
},
|
| 469 |
"failures": []
|
|
|
|
| 598 |
"local": {
|
| 599 |
"path": "repo:docs/data/research_roadmap.json",
|
| 600 |
"exists": true,
|
| 601 |
+
"bytes": 7161,
|
| 602 |
+
"sha256": "cc96118c2c05108c831616151bc027441f7545495adeeb6a4a6a6bffe8da7801"
|
| 603 |
},
|
| 604 |
"mirrors": {
|
| 605 |
"hf_space": {
|
| 606 |
"path": "hf_space:data/research_roadmap.json",
|
| 607 |
"exists": true,
|
| 608 |
+
"bytes": 7161,
|
| 609 |
+
"sha256": "cc96118c2c05108c831616151bc027441f7545495adeeb6a4a6a6bffe8da7801"
|
| 610 |
},
|
| 611 |
"hf_artifacts": {
|
| 612 |
"path": "hf_artifacts:docs/data/research_roadmap.json",
|
| 613 |
"exists": true,
|
| 614 |
+
"bytes": 7161,
|
| 615 |
+
"sha256": "cc96118c2c05108c831616151bc027441f7545495adeeb6a4a6a6bffe8da7801"
|
| 616 |
},
|
| 617 |
"hf_model": {
|
| 618 |
"path": "hf_model:metrics/research_roadmap.json",
|
| 619 |
"exists": true,
|
| 620 |
+
"bytes": 7161,
|
| 621 |
+
"sha256": "cc96118c2c05108c831616151bc027441f7545495adeeb6a4a6a6bffe8da7801"
|
| 622 |
}
|
| 623 |
},
|
| 624 |
"failures": []
|
|
|
|
| 629 |
"local": {
|
| 630 |
"path": "repo:docs/data/research_roadmap_interactive.json",
|
| 631 |
"exists": true,
|
| 632 |
+
"bytes": 134282,
|
| 633 |
+
"sha256": "ff37219a9f1d9b386a9d4c42766e4aa28f10ce6ef338dceeedd6bdb4a1b2c40a"
|
| 634 |
},
|
| 635 |
"mirrors": {
|
| 636 |
"hf_space": {
|
| 637 |
"path": "hf_space:data/research_roadmap_interactive.json",
|
| 638 |
"exists": true,
|
| 639 |
+
"bytes": 134282,
|
| 640 |
+
"sha256": "ff37219a9f1d9b386a9d4c42766e4aa28f10ce6ef338dceeedd6bdb4a1b2c40a"
|
| 641 |
},
|
| 642 |
"hf_artifacts": {
|
| 643 |
"path": "hf_artifacts:docs/data/research_roadmap_interactive.json",
|
| 644 |
"exists": true,
|
| 645 |
+
"bytes": 134282,
|
| 646 |
+
"sha256": "ff37219a9f1d9b386a9d4c42766e4aa28f10ce6ef338dceeedd6bdb4a1b2c40a"
|
| 647 |
},
|
| 648 |
"hf_model": {
|
| 649 |
"path": "hf_model:metrics/research_roadmap_interactive.json",
|
| 650 |
"exists": true,
|
| 651 |
+
"bytes": 134282,
|
| 652 |
+
"sha256": "ff37219a9f1d9b386a9d4c42766e4aa28f10ce6ef338dceeedd6bdb4a1b2c40a"
|
| 653 |
}
|
| 654 |
},
|
| 655 |
"failures": []
|
|
|
|
| 1692 |
"local": {
|
| 1693 |
"path": "repo:scripts/build_artifact_index.py",
|
| 1694 |
"exists": true,
|
| 1695 |
+
"bytes": 27020,
|
| 1696 |
+
"sha256": "0ca7ed96f24caecbab31687cffa99f0eba8471258986412a294614e688c5aff5"
|
| 1697 |
},
|
| 1698 |
"mirrors": {
|
| 1699 |
"hf_artifacts": {
|
| 1700 |
"path": "hf_artifacts:scripts/build_artifact_index.py",
|
| 1701 |
"exists": true,
|
| 1702 |
+
"bytes": 27020,
|
| 1703 |
+
"sha256": "0ca7ed96f24caecbab31687cffa99f0eba8471258986412a294614e688c5aff5"
|
| 1704 |
},
|
| 1705 |
"hf_model": {
|
| 1706 |
"path": "hf_model:scripts/build_artifact_index.py",
|
| 1707 |
"exists": true,
|
| 1708 |
+
"bytes": 27020,
|
| 1709 |
+
"sha256": "0ca7ed96f24caecbab31687cffa99f0eba8471258986412a294614e688c5aff5"
|
| 1710 |
}
|
| 1711 |
},
|
| 1712 |
"failures": []
|
|
|
|
| 2017 |
"local": {
|
| 2018 |
"path": "repo:scripts/validate_publication_package.py",
|
| 2019 |
"exists": true,
|
| 2020 |
+
"bytes": 17197,
|
| 2021 |
+
"sha256": "2a617f3204ffb8c59d1c5bc1828b4441a4d014bb531655fd0613e128a6d9abc2"
|
| 2022 |
},
|
| 2023 |
"mirrors": {
|
| 2024 |
"hf_artifacts": {
|
| 2025 |
"path": "hf_artifacts:scripts/validate_publication_package.py",
|
| 2026 |
"exists": true,
|
| 2027 |
+
"bytes": 17197,
|
| 2028 |
+
"sha256": "2a617f3204ffb8c59d1c5bc1828b4441a4d014bb531655fd0613e128a6d9abc2"
|
| 2029 |
},
|
| 2030 |
"hf_model": {
|
| 2031 |
"path": "hf_model:scripts/validate_publication_package.py",
|
| 2032 |
"exists": true,
|
| 2033 |
+
"bytes": 17197,
|
| 2034 |
+
"sha256": "2a617f3204ffb8c59d1c5bc1828b4441a4d014bb531655fd0613e128a6d9abc2"
|
| 2035 |
}
|
| 2036 |
},
|
| 2037 |
"failures": []
|
|
|
|
| 2217 |
"local": {
|
| 2218 |
"path": "repo:docs/index.html",
|
| 2219 |
"exists": true,
|
| 2220 |
+
"bytes": 174923,
|
| 2221 |
+
"sha256": "099fcc01cbb4d50f62c508b10f343f05b1c883962b85bda294bcede99af2a0f1"
|
| 2222 |
},
|
| 2223 |
"mirrors": {
|
| 2224 |
"hf_space": {
|
| 2225 |
"path": "hf_space:index.html",
|
| 2226 |
"exists": true,
|
| 2227 |
+
"bytes": 174923,
|
| 2228 |
+
"sha256": "099fcc01cbb4d50f62c508b10f343f05b1c883962b85bda294bcede99af2a0f1"
|
| 2229 |
},
|
| 2230 |
"hf_artifacts_docs": {
|
| 2231 |
"path": "hf_artifacts:docs/index.html",
|
| 2232 |
"exists": true,
|
| 2233 |
+
"bytes": 174923,
|
| 2234 |
+
"sha256": "099fcc01cbb4d50f62c508b10f343f05b1c883962b85bda294bcede99af2a0f1"
|
| 2235 |
}
|
| 2236 |
},
|
| 2237 |
"failures": []
|
|
|
|
| 2242 |
"local": {
|
| 2243 |
"path": "repo:docs/research_roadmap.html",
|
| 2244 |
"exists": true,
|
| 2245 |
+
"bytes": 31702,
|
| 2246 |
+
"sha256": "1b20a5cc342b3ba59ad808eed9f5bf978e2d9ac438c88b5c3eeba01f4e14b883"
|
| 2247 |
},
|
| 2248 |
"mirrors": {
|
| 2249 |
"hf_space": {
|
| 2250 |
"path": "hf_space:research_roadmap.html",
|
| 2251 |
"exists": true,
|
| 2252 |
+
"bytes": 31702,
|
| 2253 |
+
"sha256": "1b20a5cc342b3ba59ad808eed9f5bf978e2d9ac438c88b5c3eeba01f4e14b883"
|
| 2254 |
},
|
| 2255 |
"hf_artifacts_docs": {
|
| 2256 |
"path": "hf_artifacts:docs/research_roadmap.html",
|
| 2257 |
"exists": true,
|
| 2258 |
+
"bytes": 31702,
|
| 2259 |
+
"sha256": "1b20a5cc342b3ba59ad808eed9f5bf978e2d9ac438c88b5c3eeba01f4e14b883"
|
| 2260 |
}
|
| 2261 |
},
|
| 2262 |
"failures": []
|
|
|
|
| 2844 |
"local": {
|
| 2845 |
"path": "repo:FOUNDATION_MODEL_PLAN.md",
|
| 2846 |
"exists": true,
|
| 2847 |
+
"bytes": 9075,
|
| 2848 |
+
"sha256": "444d13ab556d2e16a199a7fca191b87c85ab8685d167aab357bc6341839299a2"
|
| 2849 |
},
|
| 2850 |
"mirrors": {
|
| 2851 |
"hf_space": {
|
| 2852 |
"path": "hf_space:FOUNDATION_MODEL_PLAN.md",
|
| 2853 |
"exists": true,
|
| 2854 |
+
"bytes": 9075,
|
| 2855 |
+
"sha256": "444d13ab556d2e16a199a7fca191b87c85ab8685d167aab357bc6341839299a2"
|
| 2856 |
},
|
| 2857 |
"hf_artifacts": {
|
| 2858 |
"path": "hf_artifacts:FOUNDATION_MODEL_PLAN.md",
|
| 2859 |
"exists": true,
|
| 2860 |
+
"bytes": 9075,
|
| 2861 |
+
"sha256": "444d13ab556d2e16a199a7fca191b87c85ab8685d167aab357bc6341839299a2"
|
| 2862 |
},
|
| 2863 |
"hf_model": {
|
| 2864 |
"path": "hf_model:FOUNDATION_MODEL_PLAN.md",
|
| 2865 |
"exists": true,
|
| 2866 |
+
"bytes": 9075,
|
| 2867 |
+
"sha256": "444d13ab556d2e16a199a7fca191b87c85ab8685d167aab357bc6341839299a2"
|
| 2868 |
}
|
| 2869 |
},
|
| 2870 |
"failures": []
|
|
|
|
| 2937 |
"local": {
|
| 2938 |
"path": "repo:RESEARCH_ROADMAP.md",
|
| 2939 |
"exists": true,
|
| 2940 |
+
"bytes": 8388,
|
| 2941 |
+
"sha256": "0b3e3356076998ad94dc39f708cc783a4ebeab76c9da661cdd37ea12a3bb3665"
|
| 2942 |
},
|
| 2943 |
"mirrors": {
|
| 2944 |
"hf_space": {
|
| 2945 |
"path": "hf_space:RESEARCH_ROADMAP.md",
|
| 2946 |
"exists": true,
|
| 2947 |
+
"bytes": 8388,
|
| 2948 |
+
"sha256": "0b3e3356076998ad94dc39f708cc783a4ebeab76c9da661cdd37ea12a3bb3665"
|
| 2949 |
},
|
| 2950 |
"hf_artifacts": {
|
| 2951 |
"path": "hf_artifacts:RESEARCH_ROADMAP.md",
|
| 2952 |
"exists": true,
|
| 2953 |
+
"bytes": 8388,
|
| 2954 |
+
"sha256": "0b3e3356076998ad94dc39f708cc783a4ebeab76c9da661cdd37ea12a3bb3665"
|
| 2955 |
},
|
| 2956 |
"hf_model": {
|
| 2957 |
"path": "hf_model:RESEARCH_ROADMAP.md",
|
| 2958 |
"exists": true,
|
| 2959 |
+
"bytes": 8388,
|
| 2960 |
+
"sha256": "0b3e3356076998ad94dc39f708cc783a4ebeab76c9da661cdd37ea12a3bb3665"
|
| 2961 |
}
|
| 2962 |
},
|
| 2963 |
"failures": []
|
|
|
|
| 2968 |
"local": {
|
| 2969 |
"path": "repo:PROJECT_STATUS.md",
|
| 2970 |
"exists": true,
|
| 2971 |
+
"bytes": 7207,
|
| 2972 |
+
"sha256": "7baaba976ccc254da1a03ee2653057d1e08f3fb0c0cad035886c362442828720"
|
| 2973 |
},
|
| 2974 |
"mirrors": {
|
| 2975 |
"hf_space": {
|
| 2976 |
"path": "hf_space:PROJECT_STATUS.md",
|
| 2977 |
"exists": true,
|
| 2978 |
+
"bytes": 7207,
|
| 2979 |
+
"sha256": "7baaba976ccc254da1a03ee2653057d1e08f3fb0c0cad035886c362442828720"
|
| 2980 |
},
|
| 2981 |
"hf_artifacts": {
|
| 2982 |
"path": "hf_artifacts:PROJECT_STATUS.md",
|
| 2983 |
"exists": true,
|
| 2984 |
+
"bytes": 7207,
|
| 2985 |
+
"sha256": "7baaba976ccc254da1a03ee2653057d1e08f3fb0c0cad035886c362442828720"
|
| 2986 |
},
|
| 2987 |
"hf_model": {
|
| 2988 |
"path": "hf_model:PROJECT_STATUS.md",
|
| 2989 |
"exists": true,
|
| 2990 |
+
"bytes": 7207,
|
| 2991 |
+
"sha256": "7baaba976ccc254da1a03ee2653057d1e08f3fb0c0cad035886c362442828720"
|
| 2992 |
}
|
| 2993 |
},
|
| 2994 |
"failures": []
|
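The parity records above each pair a `local` entry with its `mirrors`, recording `exists`, `bytes`, and `sha256` per copy plus a `failures` list. The sketch below shows one way such a record could be produced. It is not the repository's actual script: the function names `file_fingerprint` and `parity_group` are hypothetical, and the failure message wording is an assumption, since every `failures` list in this diff is empty.

```python
import hashlib
from pathlib import Path


def file_fingerprint(path):
    """Return exists/bytes/sha256 for one file (hypothetical helper)."""
    p = Path(path)
    if not p.is_file():
        return {"path": str(path), "exists": False, "bytes": None, "sha256": None}
    digest = hashlib.sha256()
    size = 0
    with p.open("rb") as fh:
        # Hash in 1 MiB chunks so large artifacts are not loaded at once.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
            size += len(chunk)
    return {"path": str(path), "exists": True, "bytes": size, "sha256": digest.hexdigest()}


def parity_group(local_path, mirror_paths):
    """Compare a local file with its named mirror copies (hypothetical helper)."""
    local = file_fingerprint(local_path)
    mirrors = {name: file_fingerprint(path) for name, path in mirror_paths.items()}
    failures = []
    if not local["exists"]:
        failures.append("local: missing")
    for name, record in mirrors.items():
        if not record["exists"]:
            failures.append(f"{name}: missing")
        elif local["exists"] and record["sha256"] != local["sha256"]:
            # Assumed message format; the source shows only empty lists.
            failures.append(f"{name}: sha256 mismatch")
    return {"local": local, "mirrors": mirrors, "failures": failures}
```

Under these assumptions a group passes when `failures` is empty, which matches the records in this diff, where every mirror reports the same `bytes` and `sha256` as the local copy.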
data/project_status.json
CHANGED
|
@@ -82,7 +82,7 @@
|
|
| 82 |
"RESEARCH_ROADMAP.md",
|
| 83 |
"docs/data/research_roadmap.json"
|
| 84 |
],
|
| 85 |
-
"readout": "The roadmap connects public-sample task development to 128-episode data preparation, Qwen3-Omni LoRA, foundation-model selection, robustness runs, and
|
| 86 |
},
|
| 87 |
{
|
| 88 |
"area": "Foundation-model plan",
|
|
@@ -93,6 +93,14 @@
|
|
| 93 |
],
|
| 94 |
"readout": "Qwen3-Omni remains the first trainable held-out LoRA baseline; Cosmos 3 is added as the first world-model/action-generation branch; OpenVLA/openpi/GR00T are policy candidates after action targets are explicit."
|
| 95 |
},
|
| 96 |
{
|
| 97 |
"area": "Official dataset wording",
|
| 98 |
"status": "verified",
|
|
@@ -167,6 +175,7 @@
|
|
| 167 |
"Inspect RESEARCH_TAKEAWAYS.md and docs/data/research_takeaways.json before interpreting model scores.",
|
| 168 |
"Inspect RESEARCH_ROADMAP.md and docs/data/research_roadmap.json for the path from public-sample task work to multi-episode modeling.",
|
| 169 |
"Inspect FOUNDATION_MODEL_PLAN.md and docs/data/foundation_model_plan.json before choosing a backbone branch.",
|
| 170 |
"Inspect docs/data/summary_metrics.json and results/episode_task_suite/neural_mlp/ to check the 12-task outputs.",
|
| 171 |
"Inspect results/audio_ablation/AUDIO_ABLATION_SUMMARY.md before judging whether audio helps the current task suite.",
|
| 172 |
"Inspect EVALUATION_PROTOCOL.md before judging task metrics or leakage controls.",
|
|
@@ -180,6 +189,7 @@
|
|
| 180 |
"The current reconstruction task reconstructs feature vectors, not pixel-depth, mesh, NeRF, or Gaussian reconstruction.",
|
| 181 |
"Audio is one of the synchronized source modalities in the current task representation.",
|
| 182 |
"The audio ablation report compares audio/no-audio variants across all 12 task contracts in results/audio_ablation/.",
|
| 183 |
-
"Foundation-model selection is explicit: Qwen3-Omni is the immediate trainable pilot, Cosmos 3 is the first world-model branch, and policy models such as OpenVLA/openpi/GR00T wait for action-target conversion."
|
| 184 |
]
|
| 185 |
}
|
|
|
|
| 82 |
"RESEARCH_ROADMAP.md",
|
| 83 |
"docs/data/research_roadmap.json"
|
| 84 |
],
|
| 85 |
+
"readout": "The roadmap connects public-sample task development to 128-episode data preparation, Qwen3-Omni LoRA, foundation-model selection, robustness runs, world/policy branches, and the future Xperience-native pretraining goal."
|
| 86 |
},
|
| 87 |
{
|
| 88 |
"area": "Foundation-model plan",
|
|
|
|
| 93 |
],
|
| 94 |
"readout": "Qwen3-Omni remains the first trainable held-out LoRA baseline; Cosmos 3 is added as the first world-model/action-generation branch; OpenVLA/openpi/GR00T are policy candidates after action targets are explicit."
|
| 95 |
},
|
| 96 |
+
{
|
| 97 |
+
"area": "Xperience Embodied Foundation Model",
|
| 98 |
+
"status": "future_goal",
|
| 99 |
+
"evidence": [
|
| 100 |
+
"XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md"
|
| 101 |
+
],
|
| 102 |
+
"readout": "A future full-corpus pretraining plan describes target modules, objectives, staged scale-up, hardware ranges, and evaluation for a domain-specific embodied foundation model."
|
| 103 |
+
},
|
| 104 |
{
|
| 105 |
"area": "Official dataset wording",
|
| 106 |
"status": "verified",
|
|
|
|
| 175 |
"Inspect RESEARCH_TAKEAWAYS.md and docs/data/research_takeaways.json before interpreting model scores.",
|
| 176 |
"Inspect RESEARCH_ROADMAP.md and docs/data/research_roadmap.json for the path from public-sample task work to multi-episode modeling.",
|
| 177 |
"Inspect FOUNDATION_MODEL_PLAN.md and docs/data/foundation_model_plan.json before choosing a backbone branch.",
|
| 178 |
+
"Inspect XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md for the long-term full-corpus pretraining goal.",
|
| 179 |
"Inspect docs/data/summary_metrics.json and results/episode_task_suite/neural_mlp/ to check the 12-task outputs.",
|
| 180 |
"Inspect results/audio_ablation/AUDIO_ABLATION_SUMMARY.md before judging whether audio helps the current task suite.",
|
| 181 |
"Inspect EVALUATION_PROTOCOL.md before judging task metrics or leakage controls.",
|
|
|
|
| 189 |
"The current reconstruction task reconstructs feature vectors, not pixel-depth, mesh, NeRF, or Gaussian reconstruction.",
|
| 190 |
"Audio is one of the synchronized source modalities in the current task representation.",
|
| 191 |
"The audio ablation report compares audio/no-audio variants across all 12 task contracts in results/audio_ablation/.",
|
| 192 |
+
"Foundation-model selection is explicit: Qwen3-Omni is the immediate trainable pilot, Cosmos 3 is the first world-model branch, and policy models such as OpenVLA/openpi/GR00T wait for action-target conversion.",
|
| 193 |
+
"The Xperience Embodied Foundation Model is a future native-pretraining goal, not a completed model or current benchmark."
|
| 194 |
]
|
| 195 |
}
|
data/publication_audit.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
| 1 |
{
|
| 2 |
"status": "pass",
|
| 3 |
-
"generated_at_utc": "2026-06-
|
| 4 |
"checks": [
|
| 5 |
{
|
| 6 |
"name": "required_publication_assets_present",
|
|
@@ -182,8 +182,8 @@
|
|
| 182 |
"github_repo": {
|
| 183 |
"root": "repo",
|
| 184 |
"exists": true,
|
| 185 |
-
"file_count":
|
| 186 |
-
"text_file_count":
|
| 187 |
"largest_file": {
|
| 188 |
"path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
|
| 189 |
"bytes": 55702978
|
|
@@ -193,8 +193,8 @@
|
|
| 193 |
"hf_space_bundle": {
|
| 194 |
"root": "hf_publish/space",
|
| 195 |
"exists": true,
|
| 196 |
-
"file_count":
|
| 197 |
-
"text_file_count":
|
| 198 |
"largest_file": {
|
| 199 |
"path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
|
| 200 |
"bytes": 55702978
|
|
@@ -204,8 +204,8 @@
|
|
| 204 |
"hf_artifact_bundle": {
|
| 205 |
"root": "hf_publish/artifacts",
|
| 206 |
"exists": true,
|
| 207 |
-
"file_count":
|
| 208 |
-
"text_file_count":
|
| 209 |
"largest_file": {
|
| 210 |
"path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
|
| 211 |
"bytes": 55702978
|
|
@@ -215,8 +215,8 @@
|
|
| 215 |
"hf_model_bundle": {
|
| 216 |
"root": "hf_publish/model",
|
| 217 |
"exists": true,
|
| 218 |
-
"file_count":
|
| 219 |
-
"text_file_count":
|
| 220 |
"largest_file": {
|
| 221 |
"path": "pytorch_model.bin",
|
| 222 |
"bytes": 93495480
|
|
|
|
| 1 |
{
|
| 2 |
"status": "pass",
|
| 3 |
+
"generated_at_utc": "2026-06-04T20:43:37+00:00",
|
| 4 |
"checks": [
|
| 5 |
{
|
| 6 |
"name": "required_publication_assets_present",
|
|
|
|
| 182 |
"github_repo": {
|
| 183 |
"root": "repo",
|
| 184 |
"exists": true,
|
| 185 |
+
"file_count": 396,
|
| 186 |
+
"text_file_count": 330,
|
| 187 |
"largest_file": {
|
| 188 |
"path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
|
| 189 |
"bytes": 55702978
|
|
|
|
| 193 |
"hf_space_bundle": {
|
| 194 |
"root": "hf_publish/space",
|
| 195 |
"exists": true,
|
| 196 |
+
"file_count": 317,
|
| 197 |
+
"text_file_count": 251,
|
| 198 |
"largest_file": {
|
| 199 |
"path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
|
| 200 |
"bytes": 55702978
|
|
|
|
| 204 |
"hf_artifact_bundle": {
|
| 205 |
"root": "hf_publish/artifacts",
|
| 206 |
"exists": true,
|
| 207 |
+
"file_count": 418,
|
| 208 |
+
"text_file_count": 330,
|
| 209 |
"largest_file": {
|
| 210 |
"path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
|
| 211 |
"bytes": 55702978
|
|
|
|
| 215 |
"hf_model_bundle": {
|
| 216 |
"root": "hf_publish/model",
|
| 217 |
"exists": true,
|
| 218 |
+
"file_count": 644,
|
| 219 |
+
"text_file_count": 519,
|
| 220 |
"largest_file": {
|
| 221 |
"path": "pytorch_model.bin",
|
| 222 |
"bytes": 93495480
|
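The bundle entries above report `file_count`, `text_file_count`, and `largest_file` for each publish root. The sketch below shows one way such an inventory could be gathered. It is illustrative only: `bundle_inventory` is a hypothetical name, and the suffix-based definition of a "text file" is an assumption, since the diff does not show how the audit script classifies files.

```python
from pathlib import Path

# Assumed definition of "text file"; the real audit rule is not shown in the diff.
TEXT_SUFFIXES = {".md", ".json", ".html", ".py", ".txt", ".csv"}


def bundle_inventory(root):
    """Summarize one bundle root in the shape of the audit entries (hypothetical)."""
    root_path = Path(root)
    if not root_path.is_dir():
        return {"root": str(root), "exists": False, "file_count": 0,
                "text_file_count": 0, "largest_file": None}
    files = [p for p in root_path.rglob("*") if p.is_file()]
    text_files = [p for p in files if p.suffix.lower() in TEXT_SUFFIXES]
    # default=None covers an empty bundle directory.
    largest = max(files, key=lambda p: p.stat().st_size, default=None)
    return {
        "root": str(root),
        "exists": True,
        "file_count": len(files),
        "text_file_count": len(text_files),
        "largest_file": None if largest is None else {
            "path": largest.relative_to(root_path).as_posix(),
            "bytes": largest.stat().st_size,
        },
    }
```

Reporting the largest file by relative path mirrors entries such as `pytorch_model.bin` above, which makes an unexpectedly large upload easy to spot before publishing.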
data/research_roadmap.json
CHANGED
|
@@ -1,7 +1,7 @@
|
|
| 1 |
{
|
| 2 |
"title": "Ropedia Xperience-10M Research Roadmap",
|
| 3 |
-
"summary": "Staged path from the public-sample task lab to multi-episode held-out evaluation, foundation-model selection, and
|
| 4 |
-
"current_decision_point": "Keep the public-sample task suite as the development harness, prepare the selected official Xperience-10M episodes for the held-out Qwen3-Omni pilot, then branch into Cosmos 3 world modeling and policy-model experiments after the data preparation path is stable.",
|
| 5 |
"phases": [
|
| 6 |
{
|
| 7 |
"id": "public_sample_task_lab",
|
|
@@ -126,6 +126,30 @@
|
|
| 126 |
"updated model cards"
|
| 127 |
],
|
| 128 |
"reader_takeaway": "The long-term direction is richer multimodal representation learning for embodied-AI reasoning, with model branches chosen by task fit rather than by a single default backbone."
|
| 129 |
}
|
| 130 |
],
|
| 131 |
"public_surfaces_to_update": [
|
|
@@ -134,6 +158,7 @@
|
|
| 134 |
"RESEARCH_TAKEAWAYS.md",
|
| 135 |
"EVALUATION_PROTOCOL.md",
|
| 136 |
"ARTIFACT_GUIDE.md",
|
| 137 |
"docs/index.html",
|
| 138 |
"docs/data/research_roadmap.json",
|
| 139 |
"Hugging Face Space card",
|
|
|
|
| 1 |
{
|
| 2 |
"title": "Ropedia Xperience-10M Research Roadmap",
|
| 3 |
+
"summary": "Staged path from the public-sample task lab to multi-episode held-out evaluation, foundation-model selection, world/policy branches, and a future Xperience-native embodied foundation model.",
|
| 4 |
+
"current_decision_point": "Keep the public-sample task suite as the development harness, prepare the selected official Xperience-10M episodes for the held-out Qwen3-Omni pilot, then branch into Cosmos 3 world modeling and policy-model experiments after the data preparation path is stable. The Xperience Embodied Foundation Model is a later full-corpus pretraining goal, not a current result.",
|
| 5 |
"phases": [
|
| 6 |
{
|
| 7 |
"id": "public_sample_task_lab",
|
|
|
|
| 126 |
"updated model cards"
|
| 127 |
],
|
| 128 |
"reader_takeaway": "The long-term direction is richer multimodal representation learning for embodied-AI reasoning, with model branches chosen by task fit rather than by a single default backbone."
|
| 129 |
+
},
|
| 130 |
+
{
|
| 131 |
+
"id": "xperience_embodied_foundation_pretraining",
|
| 132 |
+
"name": "Xperience Embodied Foundation Model Pretraining",
|
| 133 |
+
"status": "future",
|
| 134 |
+
"entry_condition": "Full-corpus access, PB-scale storage path, high-throughput data loading, multi-node compute, and positive scaling evidence from smaller multi-episode runs.",
|
| 135 |
+
"deliverables": [
|
| 136 |
+
"full-corpus episode and split manifests",
|
| 137 |
+
"pretraining shard and provenance manifests",
|
| 138 |
+
"0.3B-1B and 1B-3B scaling pilots",
|
| 139 |
+
"3B-7B Xperience-native domain model target",
|
| 140 |
+
"held-out episode/session/activity/object evaluations",
|
| 141 |
+
"missing-modality robustness report",
|
| 142 |
+
"model card and data-boundary report"
|
| 143 |
+
],
|
| 144 |
+
"completion_evidence": [
|
| 145 |
+
"pretraining metadata",
|
| 146 |
+
"checkpoint inventory",
|
| 147 |
+
"scaling curves",
|
| 148 |
+
"held-out evaluation reports",
|
| 149 |
+
"qualitative retrieval or future-state examples",
|
| 150 |
+
"safety and data-boundary report"
|
| 151 |
+
],
|
| 152 |
+
"reader_takeaway": "The final research direction is a domain-specific embodied foundation model trained directly on Xperience-10M, after smaller pilots justify the cost and infrastructure."
|
| 153 |
}
|
| 154 |
],
|
| 155 |
"public_surfaces_to_update": [
|
|
|
|
| 158 |
"RESEARCH_TAKEAWAYS.md",
|
| 159 |
"EVALUATION_PROTOCOL.md",
|
| 160 |
"ARTIFACT_GUIDE.md",
|
| 161 |
+
"XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md",
|
| 162 |
"docs/index.html",
|
| 163 |
"docs/data/research_roadmap.json",
|
| 164 |
"Hugging Face Space card",
|
data/research_roadmap_interactive.json
CHANGED
|
@@ -1837,7 +1837,8 @@
|
|
| 1837 |
"NVIDIA GR00T"
|
| 1838 |
],
|
| 1839 |
"first_world_model_branch": "Cosmos 3",
|
| 1840 |
-
"immediate_trainable_backbone": "Qwen3-Omni"
|
| 1841 |
},
|
| 1842 |
"evaluation_additions": [
|
| 1843 |
{
|
|
@@ -1921,6 +1922,11 @@
|
|
| 1921 |
"action": "Publish branch results only with real manifests, predictions, metrics, and qualitative examples.",
|
| 1922 |
"name": "Publishing threshold",
|
| 1923 |
"step": 6
|
| 1924 |
}
|
| 1925 |
],
|
| 1926 |
"model_families": [
|
|
@@ -2023,6 +2029,21 @@
|
|
| 2023 |
"Useful after action target design.",
|
| 2024 |
"Less directly omni-modal than Qwen3-Omni or Cosmos 3."
|
| 2025 |
]
|
| 2026 |
}
|
| 2027 |
],
|
| 2028 |
"source_links": [
|
|
@@ -2057,11 +2078,15 @@
|
|
| 2057 |
{
|
| 2058 |
"label": "LeRobot / SmolVLA",
|
| 2059 |
"url": "https://github.com/huggingface/lerobot"
|
| 2060 |
}
|
| 2061 |
],
|
| 2062 |
"status": "planning_artifact"
|
| 2063 |
},
|
| 2064 |
-
"generated_at_utc": "2026-06-
|
| 2065 |
"omni_plan": {
|
| 2066 |
"adapter": "LoRA rank 16, alpha 32, dropout 0.05",
|
| 2067 |
"backbone": "Qwen/Qwen3-Omni-30B-A3B-Instruct",
|
|
@@ -2208,6 +2233,31 @@
|
|
| 2208 |
"reader_takeaway": "The long-term direction is richer multimodal representation learning for embodied-AI reasoning, with model branches chosen by task fit rather than by a single default backbone.",
|
| 2209 |
"stage": "future",
|
| 2210 |
"status": "planned"
|
| 2211 |
}
|
| 2212 |
],
|
| 2213 |
"scale_up": {
|
|
|
|
| 1837 |
"NVIDIA GR00T"
|
| 1838 |
],
|
| 1839 |
"first_world_model_branch": "Cosmos 3",
|
| 1840 |
+
"immediate_trainable_backbone": "Qwen3-Omni",
|
| 1841 |
+
"long_term_native_pretraining_goal": "Xperience Embodied Foundation Model"
|
| 1842 |
},
|
| 1843 |
"evaluation_additions": [
|
| 1844 |
{
|
|
|
|
| 1922 |
"action": "Publish branch results only with real manifests, predictions, metrics, and qualitative examples.",
|
| 1923 |
"name": "Publishing threshold",
|
| 1924 |
"step": 6
|
| 1925 |
+
},
|
| 1926 |
+
{
|
| 1927 |
+
"action": "Start a from-scratch Xperience Embodied Foundation Model only after smaller scaling stages, full-corpus storage, multi-node compute, and held-out evaluation protocols are in place.",
|
| 1928 |
+
"name": "Xperience-native pretraining",
|
| 1929 |
+
"step": 7
|
| 1930 |
}
|
| 1931 |
],
|
| 1932 |
"model_families": [
|
|
|
|
| 2029 |
"Useful after action target design.",
|
| 2030 |
"Less directly omni-modal than Qwen3-Omni or Cosmos 3."
|
| 2031 |
]
|
| 2032 |
+
},
|
| 2033 |
+
{
|
| 2034 |
+
"best_role": "Domain model over synchronized embodied experience.",
|
| 2035 |
+
"category": "xperience_native_pretraining_goal",
|
| 2036 |
+
"current_decision": "future_goal_after_scaling_evidence",
|
| 2037 |
+
"entry_condition": "Full-corpus data path, PB-scale storage, multi-node compute, and positive smaller-run scaling evidence.",
|
| 2038 |
+
"family": "Xperience Embodied Foundation Model",
|
| 2039 |
+
"openness": "future project-specific model if full-corpus access and compute exist",
|
| 2040 |
+
"priority": 8,
|
| 2041 |
+
"public_source": "XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md",
|
| 2042 |
+
"xperience10m_fit": [
|
| 2043 |
+
"Uses the full aligned modality stack rather than treating sensors as auxiliary metadata.",
|
| 2044 |
+
"Targets temporal embodied representation learning across perception, motion, geometry, audio, and language.",
|
| 2045 |
+
"Can become the shared pretraining backbone for Qwen-style instruction tasks, Cosmos-style world modeling, and policy/action branches."
|
| 2046 |
+
]
|
| 2047 |
}
|
| 2048 |
],
|
| 2049 |
"source_links": [
|
|
|
|
| 2078 |
{
|
| 2079 |
"label": "LeRobot / SmolVLA",
|
| 2080 |
"url": "https://github.com/huggingface/lerobot"
|
| 2081 |
+
},
|
| 2082 |
+
{
|
| 2083 |
+
"label": "Xperience Embodied Foundation Model pretraining plan",
|
| 2084 |
+
"url": "XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md"
|
| 2085 |
}
|
| 2086 |
],
|
| 2087 |
"status": "planning_artifact"
|
| 2088 |
},
|
| 2089 |
+
"generated_at_utc": "2026-06-04T20:40:29+00:00",
|
| 2090 |
"omni_plan": {
|
| 2091 |
"adapter": "LoRA rank 16, alpha 32, dropout 0.05",
|
| 2092 |
"backbone": "Qwen/Qwen3-Omni-30B-A3B-Instruct",
|
|
|
|
| 2233 |
"reader_takeaway": "The long-term direction is richer multimodal representation learning for embodied-AI reasoning, with model branches chosen by task fit rather than by a single default backbone.",
|
| 2234 |
"stage": "future",
|
| 2235 |
"status": "planned"
|
| 2236 |
+
},
|
| 2237 |
+
{
|
| 2238 |
+
"completion_evidence": [
|
| 2239 |
+
"pretraining metadata",
|
| 2240 |
+
"checkpoint inventory",
|
| 2241 |
+
"scaling curves",
|
| 2242 |
+
"held-out evaluation reports",
|
| 2243 |
+
"qualitative retrieval or future-state examples",
|
| 2244 |
+
"safety and data-boundary report"
|
| 2245 |
+
],
|
| 2246 |
+
"deliverables": [
|
| 2247 |
+
"full-corpus episode and split manifests",
|
| 2248 |
+
"pretraining shard and provenance manifests",
|
| 2249 |
+
"0.3B-1B and 1B-3B scaling pilots",
|
| 2250 |
+
"3B-7B Xperience-native domain model target",
|
| 2251 |
+
"held-out episode/session/activity/object evaluations",
|
| 2252 |
+
"missing-modality robustness report",
|
| 2253 |
+
"model card and data-boundary report"
|
| 2254 |
+
],
|
| 2255 |
+
"entry_condition": "Full-corpus access, PB-scale storage path, high-throughput data loading, multi-node compute, and positive scaling evidence from smaller multi-episode runs.",
|
| 2256 |
+
"id": "xperience_embodied_foundation_pretraining",
|
| 2257 |
+
"name": "Xperience Embodied Foundation Model Pretraining",
|
| 2258 |
+
"reader_takeaway": "The final research direction is a domain-specific embodied foundation model trained directly on Xperience-10M, after smaller pilots justify the cost and infrastructure.",
|
| 2259 |
+
"stage": "future",
|
| 2260 |
+
"status": "future"
|
| 2261 |
}
|
| 2262 |
],
|
| 2263 |
"scale_up": {
|
docs/data/artifact_index.json
CHANGED
|
@@ -1,11 +1,11 @@
|
|
| 1 |
{
|
| 2 |
"title": "Ropedia Xperience-10M Task Suite Artifact Index",
|
| 3 |
-
"generated_at_utc": "2026-06-
|
| 4 |
"status": "pass",
|
| 5 |
-
"artifact_count":
|
| 6 |
"missing": [],
|
| 7 |
"by_kind": {
|
| 8 |
-
"project_path":
|
| 9 |
"project_scope": 1,
|
| 10 |
"source_alignment": 5,
|
| 11 |
"publication_workflow": 1,
|
|
@@ -62,8 +62,8 @@
|
|
| 62 |
"surface": "repo_hf",
|
| 63 |
"shows": "Gives a compact current-state table for first-pass readers.",
|
| 64 |
"exists": true,
|
| 65 |
-
"bytes":
|
| 66 |
-
"sha256": "
|
| 67 |
},
|
| 68 |
{
|
| 69 |
"id": "project_status_json",
|
|
@@ -73,8 +73,8 @@
|
|
| 73 |
"surface": "website_hf",
|
| 74 |
"shows": "Machine-readable copy of the current project status for website and HF mirrors.",
|
| 75 |
"exists": true,
|
| 76 |
-
"bytes":
|
| 77 |
-
"sha256": "
|
| 78 |
},
|
| 79 |
{
|
| 80 |
"id": "research_roadmap",
|
|
@@ -84,8 +84,8 @@
|
|
| 84 |
"surface": "repo_hf",
|
| 85 |
"shows": "Defines the path from public-sample task development to multi-episode held-out evaluation and larger omni-model extensions.",
|
| 86 |
"exists": true,
|
| 87 |
-
"bytes":
|
| 88 |
-
"sha256": "
|
| 89 |
},
|
| 90 |
{
|
| 91 |
"id": "research_roadmap_json",
|
|
@@ -95,8 +95,8 @@
|
|
| 95 |
"surface": "website_hf",
|
| 96 |
"shows": "Machine-readable research roadmap for the website and Hugging Face mirrors.",
|
| 97 |
"exists": true,
|
| 98 |
-
"bytes":
|
| 99 |
-
"sha256": "
|
| 100 |
},
|
| 101 |
{
|
| 102 |
"id": "foundation_model_plan",
|
|
@@ -106,8 +106,8 @@
|
|
| 106 |
"surface": "repo_hf",
|
| 107 |
"shows": "Defines the post-data-gate backbone choices: Qwen3-Omni first, Cosmos 3 for world modeling, and VLA/policy models after action-target conversion.",
|
| 108 |
"exists": true,
|
| 109 |
-
"bytes":
|
| 110 |
-
"sha256": "
|
| 111 |
},
|
| 112 |
{
|
| 113 |
"id": "foundation_model_plan_json",
|
|
@@ -117,8 +117,19 @@
|
|
| 117 |
"surface": "website_hf",
|
| 118 |
"shows": "Machine-readable foundation-model selection matrix with source links, entry conditions, and evaluation additions.",
|
| 119 |
"exists": true,
|
| 120 |
-
"bytes":
|
| 121 |
-
"sha256": "
|
| 122 |
},
|
| 123 |
{
|
| 124 |
"id": "evidence_contract",
|
|
@@ -150,8 +161,8 @@
|
|
| 150 |
"surface": "repo_hf",
|
| 151 |
"shows": "Gives the human-readable map from project scope to data, tasks, platform mirrors, and scale-up status.",
|
| 152 |
"exists": true,
|
| 153 |
-
"bytes":
|
| 154 |
-
"sha256": "
|
| 155 |
},
|
| 156 |
{
|
| 157 |
"id": "official_dataset_card_alignment",
|
|
@@ -195,7 +206,7 @@
|
|
| 195 |
"shows": "Machine-readable source-alignment pass/fail check for repo, website, and HF surfaces.",
|
| 196 |
"exists": true,
|
| 197 |
"bytes": 4432,
|
| 198 |
-
"sha256": "
|
| 199 |
},
|
| 200 |
{
|
| 201 |
"id": "source_alignment_validator",
|
|
@@ -573,8 +584,8 @@
|
|
| 573 |
"surface": "repo_hf",
|
| 574 |
"shows": "Generates the selective artifact catalog from local files.",
|
| 575 |
"exists": true,
|
| 576 |
-
"bytes":
|
| 577 |
-
"sha256": "
|
| 578 |
},
|
| 579 |
{
|
| 580 |
"id": "publication_audit",
|
|
@@ -585,7 +596,7 @@
|
|
| 585 |
"volatile": true,
|
| 586 |
"shows": "Confirms public bundles exclude raw data, caches, heavy archives, and credential text.",
|
| 587 |
"exists": true,
|
| 588 |
-
"bytes":
|
| 589 |
"hash_policy": "existence_and_size_only"
|
| 590 |
},
|
| 591 |
{
|
|
@@ -597,7 +608,7 @@
|
|
| 597 |
"volatile": true,
|
| 598 |
"shows": "Separates setup paths from completed held-out-episode results.",
|
| 599 |
"exists": true,
|
| 600 |
-
"bytes":
|
| 601 |
"hash_policy": "existence_and_size_only"
|
| 602 |
},
|
| 603 |
{
|
|
@@ -609,7 +620,7 @@
|
|
| 609 |
"volatile": true,
|
| 610 |
"shows": "Confirms prepared GitHub/HF Space/artifact/model mirrors share the same critical data, figure, website HTML, and validator files.",
|
| 611 |
"exists": true,
|
| 612 |
-
"bytes":
|
| 613 |
"hash_policy": "existence_and_size_only"
|
| 614 |
},
|
| 615 |
{
|
|
@@ -621,7 +632,7 @@
|
|
| 621 |
"volatile": true,
|
| 622 |
"shows": "Confirms local website links, anchors, JSON data files, and referenced images resolve.",
|
| 623 |
"exists": true,
|
| 624 |
-
"bytes":
|
| 625 |
"hash_policy": "existence_and_size_only"
|
| 626 |
},
|
| 627 |
{
|
|
|
|
| 1 |
{
|
| 2 |
"title": "Ropedia Xperience-10M Task Suite Artifact Index",
|
| 3 |
+
"generated_at_utc": "2026-06-04T20:40:52+00:00",
|
| 4 |
"status": "pass",
|
| 5 |
+
"artifact_count": 73,
|
| 6 |
"missing": [],
|
| 7 |
"by_kind": {
|
| 8 |
+
"project_path": 12,
|
| 9 |
"project_scope": 1,
|
| 10 |
"source_alignment": 5,
|
| 11 |
"publication_workflow": 1,
|
|
|
|
| 62 |
"surface": "repo_hf",
|
| 63 |
"shows": "Gives a compact current-state table for first-pass readers.",
|
| 64 |
"exists": true,
|
| 65 |
+
"bytes": 7207,
|
| 66 |
+
"sha256": "7baaba976ccc254da1a03ee2653057d1e08f3fb0c0cad035886c362442828720"
|
| 67 |
},
|
| 68 |
{
|
| 69 |
"id": "project_status_json",
|
|
|
|
| 73 |
"surface": "website_hf",
|
| 74 |
"shows": "Machine-readable copy of the current project status for website and HF mirrors.",
|
| 75 |
"exists": true,
|
| 76 |
+
"bytes": 9874,
|
| 77 |
+
"sha256": "600c95726eae3404127a8b2110f35468ff2ba02943cae0fbcd3ea43c66109d3e"
|
| 78 |
},
|
| 79 |
{
|
| 80 |
"id": "research_roadmap",
|
|
|
|
| 84 |
"surface": "repo_hf",
|
| 85 |
"shows": "Defines the path from public-sample task development to multi-episode held-out evaluation and larger omni-model extensions.",
|
| 86 |
"exists": true,
|
| 87 |
+
"bytes": 8388,
|
| 88 |
+
"sha256": "0b3e3356076998ad94dc39f708cc783a4ebeab76c9da661cdd37ea12a3bb3665"
|
| 89 |
},
|
| 90 |
{
|
| 91 |
"id": "research_roadmap_json",
|
|
|
|
| 95 |
"surface": "website_hf",
|
| 96 |
"shows": "Machine-readable research roadmap for the website and Hugging Face mirrors.",
|
| 97 |
"exists": true,
|
| 98 |
+
"bytes": 7161,
|
| 99 |
+
"sha256": "cc96118c2c05108c831616151bc027441f7545495adeeb6a4a6a6bffe8da7801"
|
| 100 |
},
|
| 101 |
{
|
| 102 |
"id": "foundation_model_plan",
|
|
|
|
| 106 |
"surface": "repo_hf",
|
| 107 |
"shows": "Defines the post-data-gate backbone choices: Qwen3-Omni first, Cosmos 3 for world modeling, and VLA/policy models after action-target conversion.",
|
| 108 |
"exists": true,
|
| 109 |
+
"bytes": 9075,
|
| 110 |
+
"sha256": "444d13ab556d2e16a199a7fca191b87c85ab8685d167aab357bc6341839299a2"
|
| 111 |
},
|
| 112 |
{
|
| 113 |
"id": "foundation_model_plan_json",
|
|
|
|
| 117 |
"surface": "website_hf",
|
| 118 |
"shows": "Machine-readable foundation-model selection matrix with source links, entry conditions, and evaluation additions.",
|
| 119 |
"exists": true,
|
| 120 |
+
"bytes": 12981,
|
| 121 |
+
"sha256": "9cce52025a2e2f8afb4660e2af3353aea6ad0a1af380849218dd74c0acc271bb"
|
| 122 |
+
},
|
| 123 |
+
{
|
| 124 |
+
"id": "xperience_embodied_foundation_pretraining",
|
| 125 |
+
"title": "Xperience Embodied Foundation Model pretraining goal",
|
| 126 |
+
"path": "XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md",
|
| 127 |
+
"kind": "project_path",
|
| 128 |
+
"surface": "repo_hf",
|
| 129 |
+
"shows": "Describes the future full-corpus Xperience-native pretraining goal, target modules, objectives, staged scale-up, hardware ranges, and evaluation protocol.",
|
| 130 |
+
"exists": true,
|
| 131 |
+
"bytes": 9182,
|
| 132 |
+
"sha256": "b5a6ddc58647cd895a4772b110ecc9f4d685427fb37b81b22c6c02d2b9b323f1"
|
| 133 |
},
|
| 134 |
{
|
| 135 |
"id": "evidence_contract",
|
|
|
|
| 161 |
"surface": "repo_hf",
|
| 162 |
"shows": "Gives the human-readable map from project scope to data, tasks, platform mirrors, and scale-up status.",
|
| 163 |
"exists": true,
|
| 164 |
+
"bytes": 11440,
|
| 165 |
+
"sha256": "9b8821a9b14fe1744f2e6b5c419b2c5daaf70b57f1944caf1105c36c0c66c119"
|
| 166 |
},
|
| 167 |
{
|
| 168 |
"id": "official_dataset_card_alignment",
|
|
|
|
| 206 |
"shows": "Machine-readable source-alignment pass/fail check for repo, website, and HF surfaces.",
|
| 207 |
"exists": true,
|
| 208 |
"bytes": 4432,
|
| 209 |
+
"sha256": "06c6e2d111c72df01ed127fd288e6675b63e35a21ae12a2523931a072bd0bc49"
|
| 210 |
},
|
| 211 |
{
|
| 212 |
"id": "source_alignment_validator",
|
|
|
|
| 584 |
"surface": "repo_hf",
|
| 585 |
"shows": "Generates the selective artifact catalog from local files.",
|
| 586 |
"exists": true,
|
| 587 |
+
"bytes": 27020,
|
| 588 |
+
"sha256": "0ca7ed96f24caecbab31687cffa99f0eba8471258986412a294614e688c5aff5"
|
| 589 |
},
|
| 590 |
{
|
| 591 |
"id": "publication_audit",
|
|
|
|
| 596 |
"volatile": true,
|
| 597 |
"shows": "Confirms public bundles exclude raw data, caches, heavy archives, and credential text.",
|
| 598 |
"exists": true,
|
| 599 |
+
"bytes": 11811,
|
| 600 |
"hash_policy": "existence_and_size_only"
|
| 601 |
},
|
| 602 |
{
|
|
|
|
| 608 |
"volatile": true,
|
| 609 |
"shows": "Separates setup paths from completed held-out-episode results.",
|
| 610 |
"exists": true,
|
| 611 |
+
"bytes": 18981,
|
| 612 |
"hash_policy": "existence_and_size_only"
|
| 613 |
},
|
| 614 |
{
|
|
|
|
| 620 |
"volatile": true,
|
| 621 |
"shows": "Confirms prepared GitHub/HF Space/artifact/model mirrors share the same critical data, figure, website HTML, and validator files.",
|
| 622 |
"exists": true,
|
| 623 |
+
"bytes": 108621,
|
| 624 |
"hash_policy": "existence_and_size_only"
|
| 625 |
},
|
| 626 |
{
|
|
|
|
| 632 |
"volatile": true,
|
| 633 |
"shows": "Confirms local website links, anchors, JSON data files, and referenced images resolve.",
|
| 634 |
"exists": true,
|
| 635 |
+
"bytes": 14891,
|
| 636 |
"hash_policy": "existence_and_size_only"
|
| 637 |
},
|
| 638 |
{
|
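The index entries above pair each `path` with a `bytes` count and, for non-volatile artifacts, a `sha256` digest. Volatile entries instead carry `"hash_policy": "existence_and_size_only"`. A minimal sketch of how one entry could be re-checked against local files follows. The function name `check_entry` and the `root` argument are hypothetical and are not taken from `scripts/build_artifact_index.py`, whose actual logic is not shown in this diff.

```python
import hashlib
from pathlib import Path

# Policy string copied from the index entries marked "volatile": true.
SIZE_ONLY = "existence_and_size_only"


def check_entry(entry: dict, root: Path) -> list[str]:
    """Return mismatch messages for one artifact-index entry (empty list = pass).

    Hypothetical helper: field names follow the JSON above, the rest is assumed.
    """
    path = root / entry["path"]
    if not path.is_file():
        return [f"{entry['path']}: file not found"]

    data = path.read_bytes()
    problems = []
    if entry.get("bytes") != len(data):
        problems.append(f"{entry['path']}: bytes {len(data)} != {entry.get('bytes')}")

    # Volatile artifacts are only checked for existence and size, not content.
    if entry.get("hash_policy") != SIZE_ONLY:
        digest = hashlib.sha256(data).hexdigest()
        if entry.get("sha256") != digest:
            problems.append(f"{entry['path']}: sha256 mismatch")
    return problems
```

Skipping the digest for volatile entries mirrors the `hash_policy` field: reports that embed a timestamp change on every regeneration, so a content hash would fail on each run.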
docs/data/foundation_model_plan.json
CHANGED
|
@@ -2,6 +2,16 @@
|
|
| 2 |
"title": "Xperience-10M Foundation Model Plan",
|
| 3 |
"status": "planning_artifact",
|
| 4 |
"current_boundary": "No held-out multi-episode foundation-model result has been completed in this repo. The current foundation-model artifacts are setup-stage until enough valid episodes are prepared and evaluated.",
|
| 5 |
"decision": {
|
| 6 |
"immediate_trainable_backbone": "Qwen3-Omni",
|
| 7 |
"first_world_model_branch": "Cosmos 3",
|
|
@@ -10,7 +20,65 @@
|
|
| 10 |
"openpi pi0/pi0.5",
|
| 11 |
"NVIDIA GR00T"
|
| 12 |
],
|
| 13 |
-
"external_reasoning_reference": "Gemini Robotics"
|
| 14 |
},
|
| 15 |
"model_families": [
|
| 16 |
{
|
|
@@ -112,6 +180,21 @@
|
|
| 112 |
"current_decision": "optional_baseline_after_data_staging",
|
| 113 |
"entry_condition": "Action labels and baseline protocol exist.",
|
| 114 |
"public_source": "https://github.com/huggingface/lerobot"
|
| 115 |
}
|
| 116 |
],
|
| 117 |
"execution_order": [
|
|
@@ -144,6 +227,11 @@
|
|
| 144 |
"step": 6,
|
| 145 |
"name": "Publishing threshold",
|
| 146 |
"action": "Publish branch results only with real manifests, predictions, metrics, and qualitative examples."
|
| 147 |
}
|
| 148 |
],
|
| 149 |
"evaluation_additions": [
|
|
@@ -230,6 +318,10 @@
|
|
| 230 |
{
|
| 231 |
"label": "LeRobot / SmolVLA",
|
| 232 |
"url": "https://github.com/huggingface/lerobot"
|
| 233 |
}
|
| 234 |
]
|
| 235 |
}
|
|
|
|
| 2 |
"title": "Xperience-10M Foundation Model Plan",
|
| 3 |
"status": "planning_artifact",
|
| 4 |
"current_boundary": "No held-out multi-episode foundation-model result has been completed in this repo. The current foundation-model artifacts are setup-stage until enough valid episodes are prepared and evaluated.",
|
| 5 |
+
"backbone_registry": {
|
| 6 |
+
"config_dir": "configs/omni_backbones",
|
| 7 |
+
"validator": "scripts/omni/backbone_registry.py --validate --json",
|
| 8 |
+
"extension_contract": "OMNI_MODEL_EXTENSION_CONTRACT.md",
|
| 9 |
+
"implemented_backbone": "qwen3_omni_lora",
|
| 10 |
+
"planned_backbones": [
|
| 11 |
+
"cosmos_world_model",
|
| 12 |
+
"policy_vla_branch"
|
| 13 |
+
]
|
| 14 |
+
},
|
| 15 |
"decision": {
|
| 16 |
"immediate_trainable_backbone": "Qwen3-Omni",
|
| 17 |
"first_world_model_branch": "Cosmos 3",
|
|
|
|
| 20 |
"openpi pi0/pi0.5",
|
| 21 |
"NVIDIA GR00T"
|
| 22 |
],
|
| 23 |
+
"external_reasoning_reference": "Gemini Robotics",
|
| 24 |
+
"long_term_native_pretraining_goal": "Xperience Embodied Foundation Model"
|
| 25 |
+
},
|
| 26 |
+
"future_pretraining_goal": {
|
| 27 |
+
"name": "Xperience Embodied Foundation Model",
|
| 28 |
+
"status": "future_planning_goal",
|
| 29 |
+
"role": "Domain-specific embodied foundation model pretrained on full Xperience-10M if full-corpus data, storage, and compute become available.",
|
| 30 |
+
"not_current_result": true,
|
| 31 |
+
"document": "XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md",
|
| 32 |
+
"entry_conditions": [
|
| 33 |
+
"Selected multi-episode Qwen3-Omni pilot trains and evaluates cleanly.",
|
| 34 |
+
"Scaling from 128 episodes to thousands of episodes shows measurable value.",
|
| 35 |
+
"Full-corpus storage, derived-shard storage, and fast active-cache capacity are available.",
|
| 36 |
+
"Distributed training, checkpoint/restart, and provenance tracking are reliable.",
|
| 37 |
+
"Evaluation covers held-out episodes, sessions, activities, objects, and missing-modality robustness."
|
| 38 |
+
],
|
| 39 |
+
"target_modules": [
|
| 40 |
+
"multi-view video encoder",
|
| 41 |
+
"audio encoder",
|
| 42 |
+
"depth and geometry encoder",
|
| 43 |
+
"pose/SLAM encoder",
|
| 44 |
+
"hand/body mocap encoder",
|
| 45 |
+
"IMU encoder",
|
| 46 |
+
"language encoder/decoder",
|
| 47 |
+
"temporal fusion transformer",
|
| 48 |
+
"task heads and decoders"
|
| 49 |
+
],
|
| 50 |
+
"pretraining_objectives": [
|
| 51 |
+
"masked multimodal modeling",
|
| 52 |
+
"cross-modal contrastive alignment",
|
| 53 |
+
"future-state prediction",
|
| 54 |
+
"ego-motion and hand-motion forecasting",
|
| 55 |
+
"action and procedure prediction",
|
| 56 |
+
"language grounding and captioning",
|
| 57 |
+
"contact and affordance prediction",
|
| 58 |
+
"optional policy-style targets after action conversion"
|
| 59 |
+
],
|
| 60 |
+
"hardware_ranges": [
|
| 61 |
+
{
|
| 62 |
+
"goal": "0.3B-1B pilot",
|
| 63 |
+
"compute": "8-32 modern 80GB-class data-center GPUs",
|
| 64 |
+
"use": "prove objectives and data loaders"
|
| 65 |
+
},
|
| 66 |
+
{
|
| 67 |
+
"goal": "1B-3B domain model",
|
| 68 |
+
"compute": "32-128 GPUs",
|
| 69 |
+
"use": "research-scale Xperience representation learning"
|
| 70 |
+
},
|
| 71 |
+
{
|
| 72 |
+
"goal": "3B-7B full-corpus domain model",
|
| 73 |
+
"compute": "128-512 GPUs",
|
| 74 |
+
"use": "first realistic full Xperience-native foundation model"
|
| 75 |
+
},
|
| 76 |
+
{
|
| 77 |
+
"goal": "30B-class omni model from scratch",
|
| 78 |
+
"compute": "512-2000+ GPUs",
|
| 79 |
+
"use": "lab-scale project after scaling curves justify cost"
|
| 80 |
+
}
|
| 81 |
+
]
|
| 82 |
},
|
| 83 |
"model_families": [
|
| 84 |
{
|
|
|
|
| 180 |
"current_decision": "optional_baseline_after_data_staging",
|
| 181 |
"entry_condition": "Action labels and baseline protocol exist.",
|
| 182 |
"public_source": "https://github.com/huggingface/lerobot"
|
| 183 |
+
},
|
| 184 |
+
{
|
| 185 |
+
"priority": 8,
|
| 186 |
+
"family": "Xperience Embodied Foundation Model",
|
| 187 |
+
"category": "xperience_native_pretraining_goal",
|
| 188 |
+
"openness": "future project-specific model if full-corpus access and compute exist",
|
| 189 |
+
"best_role": "Domain model over synchronized embodied experience.",
|
| 190 |
+
"xperience10m_fit": [
|
| 191 |
+
"Uses the full aligned modality stack rather than treating sensors as auxiliary metadata.",
|
| 192 |
+
"Targets temporal embodied representation learning across perception, motion, geometry, audio, and language.",
|
| 193 |
+
"Can become the shared pretraining backbone for Qwen-style instruction tasks, Cosmos-style world modeling, and policy/action branches."
|
| 194 |
+
],
|
| 195 |
+
"current_decision": "future_goal_after_scaling_evidence",
|
| 196 |
+
"entry_condition": "Full-corpus data path, PB-scale storage, multi-node compute, and positive smaller-run scaling evidence.",
|
| 197 |
+
"public_source": "XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md"
|
| 198 |
}
|
| 199 |
],
|
| 200 |
"execution_order": [
|
|
|
|
| 227 |
"step": 6,
|
| 228 |
"name": "Publishing threshold",
|
| 229 |
"action": "Publish branch results only with real manifests, predictions, metrics, and qualitative examples."
|
| 230 |
+
},
|
| 231 |
+
{
|
| 232 |
+
"step": 7,
|
| 233 |
+
"name": "Xperience-native pretraining",
|
| 234 |
+
"action": "Start a from-scratch Xperience Embodied Foundation Model only after smaller scaling stages, full-corpus storage, multi-node compute, and held-out evaluation protocols are in place."
|
| 235 |
}
|
| 236 |
],
|
| 237 |
"evaluation_additions": [
|
|
|
|
| 318 |
{
|
| 319 |
"label": "LeRobot / SmolVLA",
|
| 320 |
"url": "https://github.com/huggingface/lerobot"
|
| 321 |
+
},
|
| 322 |
+
{
|
| 323 |
+
"label": "Xperience Embodied Foundation Model pretraining plan",
|
| 324 |
+
"url": "XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md"
|
| 325 |
}
|
| 326 |
]
|
| 327 |
}
|
docs/data/mirror_parity.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
| 1 |
{
|
| 2 |
"status": "pass",
|
| 3 |
-
"generated_at_utc": "2026-06-
|
| 4 |
"hf_root": "hf_publish",
|
| 5 |
"summary": {
|
| 6 |
"group_count": 101,
|
|
@@ -71,27 +71,27 @@
|
|
| 71 |
"local": {
|
| 72 |
"path": "repo:docs/data/artifact_index.json",
|
| 73 |
"exists": true,
|
| 74 |
-
"bytes":
|
| 75 |
-
"sha256": "
|
| 76 |
},
|
| 77 |
"mirrors": {
|
| 78 |
"hf_space": {
|
| 79 |
"path": "hf_space:data/artifact_index.json",
|
| 80 |
"exists": true,
|
| 81 |
-
"bytes":
|
| 82 |
-
"sha256": "
|
| 83 |
},
|
| 84 |
"hf_artifacts": {
|
| 85 |
"path": "hf_artifacts:docs/data/artifact_index.json",
|
| 86 |
"exists": true,
|
| 87 |
-
"bytes":
|
| 88 |
-
"sha256": "
|
| 89 |
},
|
| 90 |
"hf_model": {
|
| 91 |
"path": "hf_model:metrics/artifact_index.json",
|
| 92 |
"exists": true,
|
| 93 |
-
"bytes":
|
| 94 |
-
"sha256": "
|
| 95 |
}
|
| 96 |
},
|
| 97 |
"failures": []
|
|
@@ -226,27 +226,27 @@
|
|
| 226 |
"local": {
|
| 227 |
"path": "repo:docs/data/foundation_model_plan.json",
|
| 228 |
"exists": true,
|
| 229 |
-
"bytes":
|
| 230 |
-
"sha256": "
|
| 231 |
},
|
| 232 |
"mirrors": {
|
| 233 |
"hf_space": {
|
| 234 |
"path": "hf_space:data/foundation_model_plan.json",
|
| 235 |
"exists": true,
|
| 236 |
-
"bytes":
|
| 237 |
-
"sha256": "
|
| 238 |
},
|
| 239 |
"hf_artifacts": {
|
| 240 |
"path": "hf_artifacts:docs/data/foundation_model_plan.json",
|
| 241 |
"exists": true,
|
| 242 |
-
"bytes":
|
| 243 |
-
"sha256": "
|
| 244 |
},
|
| 245 |
"hf_model": {
|
| 246 |
"path": "hf_model:metrics/foundation_model_plan.json",
|
| 247 |
"exists": true,
|
| 248 |
-
"bytes":
|
| 249 |
-
"sha256": "
|
| 250 |
}
|
| 251 |
},
|
| 252 |
"failures": []
|
|
@@ -412,27 +412,27 @@
|
|
| 412 |
"local": {
|
| 413 |
"path": "repo:docs/data/project_status.json",
|
| 414 |
"exists": true,
|
| 415 |
-
"bytes":
|
| 416 |
-
"sha256": "
|
| 417 |
},
|
| 418 |
"mirrors": {
|
| 419 |
"hf_space": {
|
| 420 |
"path": "hf_space:data/project_status.json",
|
| 421 |
"exists": true,
|
| 422 |
-
"bytes":
|
| 423 |
-
"sha256": "
|
| 424 |
},
|
| 425 |
"hf_artifacts": {
|
| 426 |
"path": "hf_artifacts:docs/data/project_status.json",
|
| 427 |
"exists": true,
|
| 428 |
-
"bytes":
|
| 429 |
-
"sha256": "
|
| 430 |
},
|
| 431 |
"hf_model": {
|
| 432 |
"path": "hf_model:metrics/project_status.json",
|
| 433 |
"exists": true,
|
| 434 |
-
"bytes":
|
| 435 |
-
"sha256": "
|
| 436 |
}
|
| 437 |
},
|
| 438 |
"failures": []
|
|
@@ -443,27 +443,27 @@
|
|
| 443 |
"local": {
|
| 444 |
"path": "repo:docs/data/publication_audit.json",
|
| 445 |
"exists": true,
|
| 446 |
-
"bytes":
|
| 447 |
-
"sha256": "
|
| 448 |
},
|
| 449 |
"mirrors": {
|
| 450 |
"hf_space": {
|
| 451 |
"path": "hf_space:data/publication_audit.json",
|
| 452 |
"exists": true,
|
| 453 |
-
"bytes":
|
| 454 |
-
"sha256": "
|
| 455 |
},
|
| 456 |
"hf_artifacts": {
|
| 457 |
"path": "hf_artifacts:docs/data/publication_audit.json",
|
| 458 |
"exists": true,
|
| 459 |
-
"bytes":
|
| 460 |
-
"sha256": "
|
| 461 |
},
|
| 462 |
"hf_model": {
|
| 463 |
"path": "hf_model:metrics/publication_audit.json",
|
| 464 |
"exists": true,
|
| 465 |
-
"bytes":
|
| 466 |
-
"sha256": "
|
| 467 |
}
|
| 468 |
},
|
| 469 |
"failures": []
|
|
@@ -598,27 +598,27 @@
|
|
| 598 |
"local": {
|
| 599 |
"path": "repo:docs/data/research_roadmap.json",
|
| 600 |
"exists": true,
|
| 601 |
-
"bytes":
|
| 602 |
-
"sha256": "
|
| 603 |
},
|
| 604 |
"mirrors": {
|
| 605 |
"hf_space": {
|
| 606 |
"path": "hf_space:data/research_roadmap.json",
|
| 607 |
"exists": true,
|
| 608 |
-
"bytes":
|
| 609 |
-
"sha256": "
|
| 610 |
},
|
| 611 |
"hf_artifacts": {
|
| 612 |
"path": "hf_artifacts:docs/data/research_roadmap.json",
|
| 613 |
"exists": true,
|
| 614 |
-
"bytes":
|
| 615 |
-
"sha256": "
|
| 616 |
},
|
| 617 |
"hf_model": {
|
| 618 |
"path": "hf_model:metrics/research_roadmap.json",
|
| 619 |
"exists": true,
|
| 620 |
-
"bytes":
|
| 621 |
-
"sha256": "
|
| 622 |
}
|
| 623 |
},
|
| 624 |
"failures": []
|
|
@@ -629,27 +629,27 @@
|
|
| 629 |
"local": {
|
| 630 |
"path": "repo:docs/data/research_roadmap_interactive.json",
|
| 631 |
"exists": true,
|
| 632 |
-
"bytes":
|
| 633 |
-
"sha256": "
|
| 634 |
},
|
| 635 |
"mirrors": {
|
| 636 |
"hf_space": {
|
| 637 |
"path": "hf_space:data/research_roadmap_interactive.json",
|
| 638 |
"exists": true,
|
| 639 |
-
"bytes":
|
| 640 |
-
"sha256": "
|
| 641 |
},
|
| 642 |
"hf_artifacts": {
|
| 643 |
"path": "hf_artifacts:docs/data/research_roadmap_interactive.json",
|
| 644 |
"exists": true,
|
| 645 |
-
"bytes":
|
| 646 |
-
"sha256": "
|
| 647 |
},
|
| 648 |
"hf_model": {
|
| 649 |
"path": "hf_model:metrics/research_roadmap_interactive.json",
|
| 650 |
"exists": true,
|
| 651 |
-
"bytes":
|
| 652 |
-
"sha256": "
|
| 653 |
}
|
| 654 |
},
|
| 655 |
"failures": []
|
|
@@ -939,27 +939,27 @@
|
|
| 939 |
"local": {
|
| 940 |
"path": "repo:docs/data/website_integrity.json",
|
| 941 |
"exists": true,
|
| 942 |
-
"bytes":
|
| 943 |
-
"sha256": "
|
| 944 |
},
|
| 945 |
"mirrors": {
|
| 946 |
"hf_space": {
|
| 947 |
"path": "hf_space:data/website_integrity.json",
|
| 948 |
"exists": true,
|
| 949 |
-
"bytes":
|
| 950 |
-
"sha256": "
|
| 951 |
},
|
| 952 |
"hf_artifacts": {
|
| 953 |
"path": "hf_artifacts:docs/data/website_integrity.json",
|
| 954 |
"exists": true,
|
| 955 |
-
"bytes":
|
| 956 |
-
"sha256": "
|
| 957 |
},
|
| 958 |
"hf_model": {
|
| 959 |
"path": "hf_model:metrics/website_integrity.json",
|
| 960 |
"exists": true,
|
| 961 |
-
"bytes":
|
| 962 |
-
"sha256": "
|
| 963 |
}
|
| 964 |
},
|
| 965 |
"failures": []
|
|
@@ -1692,21 +1692,21 @@
|
|
| 1692 |
"local": {
|
| 1693 |
"path": "repo:scripts/build_artifact_index.py",
|
| 1694 |
"exists": true,
|
| 1695 |
-
"bytes":
|
| 1696 |
-
"sha256": "
|
| 1697 |
},
|
| 1698 |
"mirrors": {
|
| 1699 |
"hf_artifacts": {
|
| 1700 |
"path": "hf_artifacts:scripts/build_artifact_index.py",
|
| 1701 |
"exists": true,
|
| 1702 |
-
"bytes":
|
| 1703 |
-
"sha256": "
|
| 1704 |
},
|
| 1705 |
"hf_model": {
|
| 1706 |
"path": "hf_model:scripts/build_artifact_index.py",
|
| 1707 |
"exists": true,
|
| 1708 |
-
"bytes":
|
| 1709 |
-
"sha256": "
|
| 1710 |
}
|
| 1711 |
},
|
| 1712 |
"failures": []
|
|
@@ -2017,21 +2017,21 @@
|
|
| 2017 |
"local": {
|
| 2018 |
"path": "repo:scripts/validate_publication_package.py",
|
| 2019 |
"exists": true,
|
| 2020 |
-
"bytes":
|
| 2021 |
-
"sha256": "
|
| 2022 |
},
|
| 2023 |
"mirrors": {
|
| 2024 |
"hf_artifacts": {
|
| 2025 |
"path": "hf_artifacts:scripts/validate_publication_package.py",
|
| 2026 |
"exists": true,
|
| 2027 |
-
"bytes":
|
| 2028 |
-
"sha256": "
|
| 2029 |
},
|
| 2030 |
"hf_model": {
|
| 2031 |
"path": "hf_model:scripts/validate_publication_package.py",
|
| 2032 |
"exists": true,
|
| 2033 |
-
"bytes":
|
| 2034 |
-
"sha256": "
|
| 2035 |
}
|
| 2036 |
},
|
| 2037 |
"failures": []
|
|
@@ -2117,21 +2117,21 @@
|
|
| 2117 |
"local": {
|
| 2118 |
"path": "repo:scripts/validate_website_integrity.py",
|
| 2119 |
"exists": true,
|
| 2120 |
-
"bytes":
|
| 2121 |
-
"sha256": "
|
| 2122 |
},
|
| 2123 |
"mirrors": {
|
| 2124 |
"hf_artifacts": {
|
| 2125 |
"path": "hf_artifacts:scripts/validate_website_integrity.py",
|
| 2126 |
"exists": true,
|
| 2127 |
-
"bytes":
|
| 2128 |
-
"sha256": "
|
| 2129 |
},
|
| 2130 |
"hf_model": {
|
| 2131 |
"path": "hf_model:scripts/validate_website_integrity.py",
|
| 2132 |
"exists": true,
|
| 2133 |
-
"bytes":
|
| 2134 |
-
"sha256": "
|
| 2135 |
}
|
| 2136 |
},
|
| 2137 |
"failures": []
|
|
@@ -2217,21 +2217,21 @@
|
|
| 2217 |
"local": {
|
| 2218 |
"path": "repo:docs/index.html",
|
| 2219 |
"exists": true,
|
| 2220 |
-
"bytes":
|
| 2221 |
-
"sha256": "
|
| 2222 |
},
|
| 2223 |
"mirrors": {
|
| 2224 |
"hf_space": {
|
| 2225 |
"path": "hf_space:index.html",
|
| 2226 |
"exists": true,
|
| 2227 |
-
"bytes":
|
| 2228 |
-
"sha256": "
|
| 2229 |
},
|
| 2230 |
"hf_artifacts_docs": {
|
| 2231 |
"path": "hf_artifacts:docs/index.html",
|
| 2232 |
"exists": true,
|
| 2233 |
-
"bytes":
|
| 2234 |
-
"sha256": "
|
| 2235 |
}
|
| 2236 |
},
|
| 2237 |
"failures": []
|
|
@@ -2242,21 +2242,21 @@
|
|
| 2242 |
"local": {
|
| 2243 |
"path": "repo:docs/research_roadmap.html",
|
| 2244 |
"exists": true,
|
| 2245 |
-
"bytes":
|
| 2246 |
-
"sha256": "
|
| 2247 |
},
|
| 2248 |
"mirrors": {
|
| 2249 |
"hf_space": {
|
| 2250 |
"path": "hf_space:research_roadmap.html",
|
| 2251 |
"exists": true,
|
| 2252 |
-
"bytes":
|
| 2253 |
-
"sha256": "
|
| 2254 |
},
|
| 2255 |
"hf_artifacts_docs": {
|
| 2256 |
"path": "hf_artifacts:docs/research_roadmap.html",
|
| 2257 |
"exists": true,
|
| 2258 |
-
"bytes":
|
| 2259 |
-
"sha256": "
|
| 2260 |
}
|
| 2261 |
},
|
| 2262 |
"failures": []
|
|
@@ -2844,27 +2844,27 @@
|
|
| 2844 |
"local": {
|
| 2845 |
"path": "repo:FOUNDATION_MODEL_PLAN.md",
|
| 2846 |
"exists": true,
|
| 2847 |
-
"bytes":
|
| 2848 |
-
"sha256": "
|
| 2849 |
},
|
| 2850 |
"mirrors": {
|
| 2851 |
"hf_space": {
|
| 2852 |
"path": "hf_space:FOUNDATION_MODEL_PLAN.md",
|
| 2853 |
"exists": true,
|
| 2854 |
-
"bytes":
|
| 2855 |
-
"sha256": "
|
| 2856 |
},
|
| 2857 |
"hf_artifacts": {
|
| 2858 |
"path": "hf_artifacts:FOUNDATION_MODEL_PLAN.md",
|
| 2859 |
"exists": true,
|
| 2860 |
-
"bytes":
|
| 2861 |
-
"sha256": "
|
| 2862 |
},
|
| 2863 |
"hf_model": {
|
| 2864 |
"path": "hf_model:FOUNDATION_MODEL_PLAN.md",
|
| 2865 |
"exists": true,
|
| 2866 |
-
"bytes":
|
| 2867 |
-
"sha256": "
|
| 2868 |
}
|
| 2869 |
},
|
| 2870 |
"failures": []
|
|
@@ -2937,27 +2937,27 @@
|
|
| 2937 |
"local": {
|
| 2938 |
"path": "repo:RESEARCH_ROADMAP.md",
|
| 2939 |
"exists": true,
|
| 2940 |
-
"bytes":
|
| 2941 |
-
"sha256": "
|
| 2942 |
},
|
| 2943 |
"mirrors": {
|
| 2944 |
"hf_space": {
|
| 2945 |
"path": "hf_space:RESEARCH_ROADMAP.md",
|
| 2946 |
"exists": true,
|
| 2947 |
-
"bytes":
|
| 2948 |
-
"sha256": "
|
| 2949 |
},
|
| 2950 |
"hf_artifacts": {
|
| 2951 |
"path": "hf_artifacts:RESEARCH_ROADMAP.md",
|
| 2952 |
"exists": true,
|
| 2953 |
-
"bytes":
|
| 2954 |
-
"sha256": "
|
| 2955 |
},
|
| 2956 |
"hf_model": {
|
| 2957 |
"path": "hf_model:RESEARCH_ROADMAP.md",
|
| 2958 |
"exists": true,
|
| 2959 |
-
"bytes":
|
| 2960 |
-
"sha256": "
|
| 2961 |
}
|
| 2962 |
},
|
| 2963 |
"failures": []
|
|
@@ -2968,27 +2968,27 @@
|
|
| 2968 |
"local": {
|
| 2969 |
"path": "repo:PROJECT_STATUS.md",
|
| 2970 |
"exists": true,
|
| 2971 |
-
"bytes":
|
| 2972 |
-
"sha256": "
|
| 2973 |
},
|
| 2974 |
"mirrors": {
|
| 2975 |
"hf_space": {
|
| 2976 |
"path": "hf_space:PROJECT_STATUS.md",
|
| 2977 |
"exists": true,
|
| 2978 |
-
"bytes":
|
| 2979 |
-
"sha256": "
|
| 2980 |
},
|
| 2981 |
"hf_artifacts": {
|
| 2982 |
"path": "hf_artifacts:PROJECT_STATUS.md",
|
| 2983 |
"exists": true,
|
| 2984 |
-
"bytes":
|
| 2985 |
-
"sha256": "
|
| 2986 |
},
|
| 2987 |
"hf_model": {
|
| 2988 |
"path": "hf_model:PROJECT_STATUS.md",
|
| 2989 |
"exists": true,
|
| 2990 |
-
"bytes":
|
| 2991 |
-
"sha256": "
|
| 2992 |
}
|
| 2993 |
},
|
| 2994 |
"failures": []
|
|
|
|
| 1 |
{
|
| 2 |
"status": "pass",
|
| 3 |
+
"generated_at_utc": "2026-06-04T20:45:22+00:00",
|
| 4 |
"hf_root": "hf_publish",
|
| 5 |
"summary": {
|
| 6 |
"group_count": 101,
|
|
|
|
| 71 |
"local": {
|
| 72 |
"path": "repo:docs/data/artifact_index.json",
|
| 73 |
"exists": true,
|
| 74 |
+
"bytes": 32864,
|
| 75 |
+
"sha256": "ec7d17898c42fd76109567c201f9638059b6a9a11a48817b32677a0eb2662178"
|
| 76 |
},
|
| 77 |
"mirrors": {
|
| 78 |
"hf_space": {
|
| 79 |
"path": "hf_space:data/artifact_index.json",
|
| 80 |
"exists": true,
|
| 81 |
+
"bytes": 32864,
|
| 82 |
+
"sha256": "ec7d17898c42fd76109567c201f9638059b6a9a11a48817b32677a0eb2662178"
|
| 83 |
},
|
| 84 |
"hf_artifacts": {
|
| 85 |
"path": "hf_artifacts:docs/data/artifact_index.json",
|
| 86 |
"exists": true,
|
| 87 |
+
"bytes": 32864,
|
| 88 |
+
"sha256": "ec7d17898c42fd76109567c201f9638059b6a9a11a48817b32677a0eb2662178"
|
| 89 |
},
|
| 90 |
"hf_model": {
|
| 91 |
"path": "hf_model:metrics/artifact_index.json",
|
| 92 |
"exists": true,
|
| 93 |
+
"bytes": 32864,
|
| 94 |
+
"sha256": "ec7d17898c42fd76109567c201f9638059b6a9a11a48817b32677a0eb2662178"
|
| 95 |
}
|
| 96 |
},
|
| 97 |
"failures": []
|
|
|
|
| 226 |
"local": {
|
| 227 |
"path": "repo:docs/data/foundation_model_plan.json",
|
| 228 |
"exists": true,
|
| 229 |
+
"bytes": 12981,
|
| 230 |
+
"sha256": "9cce52025a2e2f8afb4660e2af3353aea6ad0a1af380849218dd74c0acc271bb"
|
| 231 |
},
|
| 232 |
"mirrors": {
|
| 233 |
"hf_space": {
|
| 234 |
"path": "hf_space:data/foundation_model_plan.json",
|
| 235 |
"exists": true,
|
| 236 |
+
"bytes": 12981,
|
| 237 |
+
"sha256": "9cce52025a2e2f8afb4660e2af3353aea6ad0a1af380849218dd74c0acc271bb"
|
| 238 |
},
|
| 239 |
"hf_artifacts": {
|
| 240 |
"path": "hf_artifacts:docs/data/foundation_model_plan.json",
|
| 241 |
"exists": true,
|
| 242 |
+
"bytes": 12981,
|
| 243 |
+
"sha256": "9cce52025a2e2f8afb4660e2af3353aea6ad0a1af380849218dd74c0acc271bb"
|
| 244 |
},
|
| 245 |
"hf_model": {
|
| 246 |
"path": "hf_model:metrics/foundation_model_plan.json",
|
| 247 |
"exists": true,
|
| 248 |
+
"bytes": 12981,
|
| 249 |
+
"sha256": "9cce52025a2e2f8afb4660e2af3353aea6ad0a1af380849218dd74c0acc271bb"
|
| 250 |
}
|
| 251 |
},
|
| 252 |
"failures": []
|
|
|
|
| 412 |
"local": {
|
| 413 |
"path": "repo:docs/data/project_status.json",
|
| 414 |
"exists": true,
|
| 415 |
+
"bytes": 9874,
|
| 416 |
+
"sha256": "600c95726eae3404127a8b2110f35468ff2ba02943cae0fbcd3ea43c66109d3e"
|
| 417 |
},
|
| 418 |
"mirrors": {
|
| 419 |
"hf_space": {
|
| 420 |
"path": "hf_space:data/project_status.json",
|
| 421 |
"exists": true,
|
| 422 |
+
"bytes": 9874,
|
| 423 |
+
"sha256": "600c95726eae3404127a8b2110f35468ff2ba02943cae0fbcd3ea43c66109d3e"
|
| 424 |
},
|
| 425 |
"hf_artifacts": {
|
| 426 |
"path": "hf_artifacts:docs/data/project_status.json",
|
| 427 |
"exists": true,
|
| 428 |
+
"bytes": 9874,
|
| 429 |
+
"sha256": "600c95726eae3404127a8b2110f35468ff2ba02943cae0fbcd3ea43c66109d3e"
|
| 430 |
},
|
| 431 |
"hf_model": {
|
| 432 |
"path": "hf_model:metrics/project_status.json",
|
| 433 |
"exists": true,
|
| 434 |
+
"bytes": 9874,
|
| 435 |
+
"sha256": "600c95726eae3404127a8b2110f35468ff2ba02943cae0fbcd3ea43c66109d3e"
|
| 436 |
}
|
| 437 |
},
|
| 438 |
"failures": []
|
|
|
|
| 443 |
"local": {
|
| 444 |
"path": "repo:docs/data/publication_audit.json",
|
| 445 |
"exists": true,
|
| 446 |
+
"bytes": 7237,
|
| 447 |
+
"sha256": "7fbb19f8990b1a4d902e282c010d27e4391755564fa68af97d96c298c6b054f8"
|
| 448 |
},
|
| 449 |
"mirrors": {
|
| 450 |
"hf_space": {
|
| 451 |
"path": "hf_space:data/publication_audit.json",
|
| 452 |
"exists": true,
|
| 453 |
+
"bytes": 7237,
|
| 454 |
+
"sha256": "7fbb19f8990b1a4d902e282c010d27e4391755564fa68af97d96c298c6b054f8"
|
| 455 |
},
|
| 456 |
"hf_artifacts": {
|
| 457 |
"path": "hf_artifacts:docs/data/publication_audit.json",
|
| 458 |
"exists": true,
|
| 459 |
+
"bytes": 7237,
|
| 460 |
+
"sha256": "7fbb19f8990b1a4d902e282c010d27e4391755564fa68af97d96c298c6b054f8"
|
| 461 |
},
|
| 462 |
"hf_model": {
|
| 463 |
"path": "hf_model:metrics/publication_audit.json",
|
| 464 |
"exists": true,
|
| 465 |
+
"bytes": 7237,
|
| 466 |
+
"sha256": "7fbb19f8990b1a4d902e282c010d27e4391755564fa68af97d96c298c6b054f8"
|
| 467 |
}
|
| 468 |
},
|
| 469 |
"failures": []
|
|
|
|
| 598 |
"local": {
|
| 599 |
"path": "repo:docs/data/research_roadmap.json",
|
| 600 |
"exists": true,
|
| 601 |
+
"bytes": 7161,
|
| 602 |
+
"sha256": "cc96118c2c05108c831616151bc027441f7545495adeeb6a4a6a6bffe8da7801"
|
| 603 |
},
|
| 604 |
"mirrors": {
|
| 605 |
"hf_space": {
|
| 606 |
"path": "hf_space:data/research_roadmap.json",
|
| 607 |
"exists": true,
|
| 608 |
+
"bytes": 7161,
|
| 609 |
+
"sha256": "cc96118c2c05108c831616151bc027441f7545495adeeb6a4a6a6bffe8da7801"
|
| 610 |
},
|
| 611 |
"hf_artifacts": {
|
| 612 |
"path": "hf_artifacts:docs/data/research_roadmap.json",
|
| 613 |
"exists": true,
|
| 614 |
+
"bytes": 7161,
|
| 615 |
+
"sha256": "cc96118c2c05108c831616151bc027441f7545495adeeb6a4a6a6bffe8da7801"
|
| 616 |
},
|
| 617 |
"hf_model": {
|
| 618 |
"path": "hf_model:metrics/research_roadmap.json",
|
| 619 |
"exists": true,
|
| 620 |
+
"bytes": 7161,
|
| 621 |
+
"sha256": "cc96118c2c05108c831616151bc027441f7545495adeeb6a4a6a6bffe8da7801"
|
| 622 |
}
|
| 623 |
},
|
| 624 |
"failures": []
|
|
|
|
| 629 |
"local": {
|
| 630 |
"path": "repo:docs/data/research_roadmap_interactive.json",
|
| 631 |
"exists": true,
|
| 632 |
+
"bytes": 134282,
|
| 633 |
+
"sha256": "ff37219a9f1d9b386a9d4c42766e4aa28f10ce6ef338dceeedd6bdb4a1b2c40a"
|
| 634 |
},
|
| 635 |
"mirrors": {
|
| 636 |
"hf_space": {
|
| 637 |
"path": "hf_space:data/research_roadmap_interactive.json",
|
| 638 |
"exists": true,
|
| 639 |
+
"bytes": 134282,
|
| 640 |
+
"sha256": "ff37219a9f1d9b386a9d4c42766e4aa28f10ce6ef338dceeedd6bdb4a1b2c40a"
|
| 641 |
},
|
| 642 |
"hf_artifacts": {
|
| 643 |
"path": "hf_artifacts:docs/data/research_roadmap_interactive.json",
|
| 644 |
"exists": true,
|
| 645 |
+
"bytes": 134282,
|
| 646 |
+
"sha256": "ff37219a9f1d9b386a9d4c42766e4aa28f10ce6ef338dceeedd6bdb4a1b2c40a"
|
| 647 |
},
|
| 648 |
"hf_model": {
|
| 649 |
"path": "hf_model:metrics/research_roadmap_interactive.json",
|
| 650 |
"exists": true,
|
| 651 |
+
"bytes": 134282,
|
| 652 |
+
"sha256": "ff37219a9f1d9b386a9d4c42766e4aa28f10ce6ef338dceeedd6bdb4a1b2c40a"
|
| 653 |
}
|
| 654 |
},
|
| 655 |
"failures": []
|
|
|
|
| 939 |
"local": {
|
| 940 |
"path": "repo:docs/data/website_integrity.json",
|
| 941 |
"exists": true,
|
| 942 |
+
"bytes": 14891,
|
| 943 |
+
"sha256": "9ba1cfe02568fc9b08209902ce037c445a9a8c3954d20eea4351b04c65ca0a0c"
|
| 944 |
},
|
| 945 |
"mirrors": {
|
| 946 |
"hf_space": {
|
| 947 |
"path": "hf_space:data/website_integrity.json",
|
| 948 |
"exists": true,
|
| 949 |
+
"bytes": 14891,
|
| 950 |
+
"sha256": "9ba1cfe02568fc9b08209902ce037c445a9a8c3954d20eea4351b04c65ca0a0c"
|
| 951 |
},
|
| 952 |
"hf_artifacts": {
|
| 953 |
"path": "hf_artifacts:docs/data/website_integrity.json",
|
| 954 |
"exists": true,
|
| 955 |
+
"bytes": 14891,
|
| 956 |
+
"sha256": "9ba1cfe02568fc9b08209902ce037c445a9a8c3954d20eea4351b04c65ca0a0c"
|
| 957 |
},
|
| 958 |
"hf_model": {
|
| 959 |
"path": "hf_model:metrics/website_integrity.json",
|
| 960 |
"exists": true,
|
| 961 |
+
"bytes": 14891,
|
| 962 |
+
"sha256": "9ba1cfe02568fc9b08209902ce037c445a9a8c3954d20eea4351b04c65ca0a0c"
|
| 963 |
}
|
| 964 |
},
|
| 965 |
"failures": []
|
|
|
|
| 1692 |
"local": {
|
| 1693 |
"path": "repo:scripts/build_artifact_index.py",
|
| 1694 |
"exists": true,
|
| 1695 |
+
"bytes": 27020,
|
| 1696 |
+
"sha256": "0ca7ed96f24caecbab31687cffa99f0eba8471258986412a294614e688c5aff5"
|
| 1697 |
},
|
| 1698 |
"mirrors": {
|
| 1699 |
"hf_artifacts": {
|
| 1700 |
"path": "hf_artifacts:scripts/build_artifact_index.py",
|
| 1701 |
"exists": true,
|
| 1702 |
+
"bytes": 27020,
|
| 1703 |
+
"sha256": "0ca7ed96f24caecbab31687cffa99f0eba8471258986412a294614e688c5aff5"
|
| 1704 |
},
|
| 1705 |
"hf_model": {
|
| 1706 |
"path": "hf_model:scripts/build_artifact_index.py",
|
| 1707 |
"exists": true,
|
| 1708 |
+
"bytes": 27020,
|
| 1709 |
+
"sha256": "0ca7ed96f24caecbab31687cffa99f0eba8471258986412a294614e688c5aff5"
|
| 1710 |
}
|
| 1711 |
},
|
| 1712 |
"failures": []
|
|
|
|
| 2017 |
"local": {
|
| 2018 |
"path": "repo:scripts/validate_publication_package.py",
|
| 2019 |
"exists": true,
|
| 2020 |
+
"bytes": 17197,
|
| 2021 |
+
"sha256": "2a617f3204ffb8c59d1c5bc1828b4441a4d014bb531655fd0613e128a6d9abc2"
|
| 2022 |
},
|
| 2023 |
"mirrors": {
|
| 2024 |
"hf_artifacts": {
|
| 2025 |
"path": "hf_artifacts:scripts/validate_publication_package.py",
|
| 2026 |
"exists": true,
|
| 2027 |
+
"bytes": 17197,
|
| 2028 |
+
"sha256": "2a617f3204ffb8c59d1c5bc1828b4441a4d014bb531655fd0613e128a6d9abc2"
|
| 2029 |
},
|
| 2030 |
"hf_model": {
|
| 2031 |
"path": "hf_model:scripts/validate_publication_package.py",
|
| 2032 |
"exists": true,
|
| 2033 |
+
"bytes": 17197,
|
| 2034 |
+
"sha256": "2a617f3204ffb8c59d1c5bc1828b4441a4d014bb531655fd0613e128a6d9abc2"
|
| 2035 |
}
|
| 2036 |
},
|
| 2037 |
"failures": []
|
|
|
|
| 2117 |
"local": {
|
| 2118 |
"path": "repo:scripts/validate_website_integrity.py",
|
| 2119 |
"exists": true,
|
| 2120 |
+
"bytes": 24481,
|
| 2121 |
+
"sha256": "31d85a4674e8005a916e759d820178287e297e0ec08774fe3a70aa3b61b07cf7"
|
| 2122 |
},
|
| 2123 |
"mirrors": {
|
| 2124 |
"hf_artifacts": {
|
| 2125 |
"path": "hf_artifacts:scripts/validate_website_integrity.py",
|
| 2126 |
"exists": true,
|
| 2127 |
+
"bytes": 24481,
|
| 2128 |
+
"sha256": "31d85a4674e8005a916e759d820178287e297e0ec08774fe3a70aa3b61b07cf7"
|
| 2129 |
},
|
| 2130 |
"hf_model": {
|
| 2131 |
"path": "hf_model:scripts/validate_website_integrity.py",
|
| 2132 |
"exists": true,
|
| 2133 |
+
"bytes": 24481,
|
| 2134 |
+
"sha256": "31d85a4674e8005a916e759d820178287e297e0ec08774fe3a70aa3b61b07cf7"
|
| 2135 |
}
|
| 2136 |
},
|
| 2137 |
"failures": []
|
|
|
|
| 2217 |
"local": {
|
| 2218 |
"path": "repo:docs/index.html",
|
| 2219 |
"exists": true,
|
| 2220 |
+
"bytes": 174923,
|
| 2221 |
+
"sha256": "099fcc01cbb4d50f62c508b10f343f05b1c883962b85bda294bcede99af2a0f1"
|
| 2222 |
},
|
| 2223 |
"mirrors": {
|
| 2224 |
"hf_space": {
|
| 2225 |
"path": "hf_space:index.html",
|
| 2226 |
"exists": true,
|
| 2227 |
+
"bytes": 174923,
|
| 2228 |
+
"sha256": "099fcc01cbb4d50f62c508b10f343f05b1c883962b85bda294bcede99af2a0f1"
|
| 2229 |
},
|
| 2230 |
"hf_artifacts_docs": {
|
| 2231 |
"path": "hf_artifacts:docs/index.html",
|
| 2232 |
"exists": true,
|
| 2233 |
+
"bytes": 174923,
|
| 2234 |
+
"sha256": "099fcc01cbb4d50f62c508b10f343f05b1c883962b85bda294bcede99af2a0f1"
|
| 2235 |
}
|
| 2236 |
},
|
| 2237 |
"failures": []
|
|
|
|
| 2242 |
"local": {
|
| 2243 |
"path": "repo:docs/research_roadmap.html",
|
| 2244 |
"exists": true,
|
| 2245 |
+
"bytes": 31702,
|
| 2246 |
+
"sha256": "1b20a5cc342b3ba59ad808eed9f5bf978e2d9ac438c88b5c3eeba01f4e14b883"
|
| 2247 |
},
|
| 2248 |
"mirrors": {
|
| 2249 |
"hf_space": {
|
| 2250 |
"path": "hf_space:research_roadmap.html",
|
| 2251 |
"exists": true,
|
| 2252 |
+
"bytes": 31702,
|
| 2253 |
+
"sha256": "1b20a5cc342b3ba59ad808eed9f5bf978e2d9ac438c88b5c3eeba01f4e14b883"
|
| 2254 |
},
|
| 2255 |
"hf_artifacts_docs": {
|
| 2256 |
"path": "hf_artifacts:docs/research_roadmap.html",
|
| 2257 |
"exists": true,
|
| 2258 |
+
"bytes": 31702,
|
| 2259 |
+
"sha256": "1b20a5cc342b3ba59ad808eed9f5bf978e2d9ac438c88b5c3eeba01f4e14b883"
|
| 2260 |
}
|
| 2261 |
},
|
| 2262 |
"failures": []
|
|
|
|
| 2844 |
"local": {
|
| 2845 |
"path": "repo:FOUNDATION_MODEL_PLAN.md",
|
| 2846 |
"exists": true,
|
| 2847 |
+
"bytes": 9075,
|
| 2848 |
+
"sha256": "444d13ab556d2e16a199a7fca191b87c85ab8685d167aab357bc6341839299a2"
|
| 2849 |
},
|
| 2850 |
"mirrors": {
|
| 2851 |
"hf_space": {
|
| 2852 |
"path": "hf_space:FOUNDATION_MODEL_PLAN.md",
|
| 2853 |
"exists": true,
|
| 2854 |
+
"bytes": 9075,
|
| 2855 |
+
"sha256": "444d13ab556d2e16a199a7fca191b87c85ab8685d167aab357bc6341839299a2"
|
| 2856 |
},
|
| 2857 |
"hf_artifacts": {
|
| 2858 |
"path": "hf_artifacts:FOUNDATION_MODEL_PLAN.md",
|
| 2859 |
"exists": true,
|
| 2860 |
+
"bytes": 9075,
|
| 2861 |
+
"sha256": "444d13ab556d2e16a199a7fca191b87c85ab8685d167aab357bc6341839299a2"
|
| 2862 |
},
|
| 2863 |
"hf_model": {
|
| 2864 |
"path": "hf_model:FOUNDATION_MODEL_PLAN.md",
|
| 2865 |
"exists": true,
|
| 2866 |
+
"bytes": 9075,
|
| 2867 |
+
"sha256": "444d13ab556d2e16a199a7fca191b87c85ab8685d167aab357bc6341839299a2"
|
| 2868 |
}
|
| 2869 |
},
|
| 2870 |
"failures": []
|
|
|
|
| 2937 |
"local": {
|
| 2938 |
"path": "repo:RESEARCH_ROADMAP.md",
|
| 2939 |
"exists": true,
|
| 2940 |
+
"bytes": 8388,
|
| 2941 |
+
"sha256": "0b3e3356076998ad94dc39f708cc783a4ebeab76c9da661cdd37ea12a3bb3665"
|
| 2942 |
},
|
| 2943 |
"mirrors": {
|
| 2944 |
"hf_space": {
|
| 2945 |
"path": "hf_space:RESEARCH_ROADMAP.md",
|
| 2946 |
"exists": true,
|
| 2947 |
+
"bytes": 8388,
|
| 2948 |
+
"sha256": "0b3e3356076998ad94dc39f708cc783a4ebeab76c9da661cdd37ea12a3bb3665"
|
| 2949 |
},
|
| 2950 |
"hf_artifacts": {
|
| 2951 |
"path": "hf_artifacts:RESEARCH_ROADMAP.md",
|
| 2952 |
"exists": true,
|
| 2953 |
+
"bytes": 8388,
|
| 2954 |
+
"sha256": "0b3e3356076998ad94dc39f708cc783a4ebeab76c9da661cdd37ea12a3bb3665"
|
| 2955 |
},
|
| 2956 |
"hf_model": {
|
| 2957 |
"path": "hf_model:RESEARCH_ROADMAP.md",
|
| 2958 |
"exists": true,
|
| 2959 |
+
"bytes": 8388,
|
| 2960 |
+
"sha256": "0b3e3356076998ad94dc39f708cc783a4ebeab76c9da661cdd37ea12a3bb3665"
|
| 2961 |
}
|
| 2962 |
},
|
| 2963 |
"failures": []
|
|
|
|
| 2968 |
"local": {
|
| 2969 |
"path": "repo:PROJECT_STATUS.md",
|
| 2970 |
"exists": true,
|
| 2971 |
+
"bytes": 7207,
|
| 2972 |
+
"sha256": "7baaba976ccc254da1a03ee2653057d1e08f3fb0c0cad035886c362442828720"
|
| 2973 |
},
|
| 2974 |
"mirrors": {
|
| 2975 |
"hf_space": {
|
| 2976 |
"path": "hf_space:PROJECT_STATUS.md",
|
| 2977 |
"exists": true,
|
| 2978 |
+
"bytes": 7207,
|
| 2979 |
+
"sha256": "7baaba976ccc254da1a03ee2653057d1e08f3fb0c0cad035886c362442828720"
|
| 2980 |
},
|
| 2981 |
"hf_artifacts": {
|
| 2982 |
"path": "hf_artifacts:PROJECT_STATUS.md",
|
| 2983 |
"exists": true,
|
| 2984 |
+
"bytes": 7207,
|
| 2985 |
+
"sha256": "7baaba976ccc254da1a03ee2653057d1e08f3fb0c0cad035886c362442828720"
|
| 2986 |
},
|
| 2987 |
"hf_model": {
|
| 2988 |
"path": "hf_model:PROJECT_STATUS.md",
|
| 2989 |
"exists": true,
|
| 2990 |
+
"bytes": 7207,
|
| 2991 |
+
"sha256": "7baaba976ccc254da1a03ee2653057d1e08f3fb0c0cad035886c362442828720"
|
| 2992 |
}
|
| 2993 |
},
|
| 2994 |
"failures": []
|
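The mirror parity entries above pair a `local` record with one record per mirror, each holding `exists`, `bytes`, and `sha256`, plus a `failures` list. The following is a minimal sketch of how such an entry could be computed, not the repository's actual script: the function names `describe_file` and `check_parity`, the failure message strings, and the use of plain filesystem paths for mirrors are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def describe_file(path: Path) -> dict:
    """Return the per-file fields seen in the report: exists, bytes, sha256."""
    if not path.is_file():
        return {"exists": False, "bytes": None, "sha256": None}
    data = path.read_bytes()
    return {
        "exists": True,
        "bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def check_parity(local: Path, mirrors: dict) -> dict:
    """Compare one local file against named mirror copies.

    `mirrors` maps a mirror name (for example "hf_space") to a local path
    where that mirror's copy is staged; this layout is hypothetical.
    """
    local_info = describe_file(local)
    entry = {"local": local_info, "mirrors": {}, "failures": []}
    if not local_info["exists"]:
        entry["failures"].append("local: missing")
    for name, mirror_path in mirrors.items():
        info = describe_file(mirror_path)
        entry["mirrors"][name] = info
        if not info["exists"]:
            entry["failures"].append(f"{name}: missing")
        elif info["sha256"] != local_info["sha256"]:
            entry["failures"].append(f"{name}: sha256 mismatch")
    return entry
```

An entry with an empty `failures` list corresponds to the matching `bytes` and `sha256` values shown for every mirror in the hunks above.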
docs/data/project_status.json
CHANGED
|
@@ -82,7 +82,7 @@
|
|
| 82 |
"RESEARCH_ROADMAP.md",
|
| 83 |
"docs/data/research_roadmap.json"
|
| 84 |
],
|
| 85 |
-
"readout": "The roadmap connects public-sample task development to 128-episode data preparation, Qwen3-Omni LoRA, foundation-model selection, robustness runs, and
|
| 86 |
},
|
| 87 |
{
|
| 88 |
"area": "Foundation-model plan",
|
|
@@ -93,6 +93,14 @@
|
|
| 93 |
],
|
| 94 |
"readout": "Qwen3-Omni remains the first trainable held-out LoRA baseline; Cosmos 3 is added as the first world-model/action-generation branch; OpenVLA/openpi/GR00T are policy candidates after action targets are explicit."
|
| 95 |
},
|
| 96 |
{
|
| 97 |
"area": "Official dataset wording",
|
| 98 |
"status": "verified",
|
|
@@ -167,6 +175,7 @@
|
|
| 167 |
"Inspect RESEARCH_TAKEAWAYS.md and docs/data/research_takeaways.json before interpreting model scores.",
|
| 168 |
"Inspect RESEARCH_ROADMAP.md and docs/data/research_roadmap.json for the path from public-sample task work to multi-episode modeling.",
|
| 169 |
"Inspect FOUNDATION_MODEL_PLAN.md and docs/data/foundation_model_plan.json before choosing a backbone branch.",
|
| 170 |
"Inspect docs/data/summary_metrics.json and results/episode_task_suite/neural_mlp/ to check the 12-task outputs.",
|
| 171 |
"Inspect results/audio_ablation/AUDIO_ABLATION_SUMMARY.md before judging whether audio helps the current task suite.",
|
| 172 |
"Inspect EVALUATION_PROTOCOL.md before judging task metrics or leakage controls.",
|
|
@@ -180,6 +189,7 @@
|
|
| 180 |
"The current reconstruction task reconstructs feature vectors, not pixel-depth, mesh, NeRF, or Gaussian reconstruction.",
|
| 181 |
"Audio is one of the synchronized source modalities in the current task representation.",
|
| 182 |
"The audio ablation report compares audio/no-audio variants across all 12 task contracts in results/audio_ablation/.",
|
| 183 |
-
"Foundation-model selection is explicit: Qwen3-Omni is the immediate trainable pilot, Cosmos 3 is the first world-model branch, and policy models such as OpenVLA/openpi/GR00T wait for action-target conversion."
|
| 184 |
]
|
| 185 |
}
|
|
|
|
| 82 |
"RESEARCH_ROADMAP.md",
|
| 83 |
"docs/data/research_roadmap.json"
|
| 84 |
],
|
| 85 |
+
"readout": "The roadmap connects public-sample task development to 128-episode data preparation, Qwen3-Omni LoRA, foundation-model selection, robustness runs, world/policy branches, and the future Xperience-native pretraining goal."
|
| 86 |
},
|
| 87 |
{
|
| 88 |
"area": "Foundation-model plan",
|
|
|
|
| 93 |
],
|
| 94 |
"readout": "Qwen3-Omni remains the first trainable held-out LoRA baseline; Cosmos 3 is added as the first world-model/action-generation branch; OpenVLA/openpi/GR00T are policy candidates after action targets are explicit."
|
| 95 |
},
|
| 96 |
+
{
|
| 97 |
+
"area": "Xperience Embodied Foundation Model",
|
| 98 |
+
"status": "future_goal",
|
| 99 |
+
"evidence": [
|
| 100 |
+
"XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md"
|
| 101 |
+
],
|
| 102 |
+
"readout": "A future full-corpus pretraining plan describes target modules, objectives, staged scale-up, hardware ranges, and evaluation for a domain-specific embodied foundation model."
|
| 103 |
+
},
|
| 104 |
{
|
| 105 |
"area": "Official dataset wording",
|
| 106 |
"status": "verified",
|
|
|
|
| 175 |
"Inspect RESEARCH_TAKEAWAYS.md and docs/data/research_takeaways.json before interpreting model scores.",
|
| 176 |
"Inspect RESEARCH_ROADMAP.md and docs/data/research_roadmap.json for the path from public-sample task work to multi-episode modeling.",
|
| 177 |
"Inspect FOUNDATION_MODEL_PLAN.md and docs/data/foundation_model_plan.json before choosing a backbone branch.",
|
| 178 |
+
"Inspect XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md for the long-term full-corpus pretraining goal.",
|
| 179 |
"Inspect docs/data/summary_metrics.json and results/episode_task_suite/neural_mlp/ to check the 12-task outputs.",
|
| 180 |
"Inspect results/audio_ablation/AUDIO_ABLATION_SUMMARY.md before judging whether audio helps the current task suite.",
|
| 181 |
"Inspect EVALUATION_PROTOCOL.md before judging task metrics or leakage controls.",
|
|
|
|
| 189 |
"The current reconstruction task reconstructs feature vectors, not pixel-depth, mesh, NeRF, or Gaussian reconstruction.",
|
| 190 |
"Audio is one of the synchronized source modalities in the current task representation.",
|
| 191 |
"The audio ablation report compares audio/no-audio variants across all 12 task contracts in results/audio_ablation/.",
|
| 192 |
+
"Foundation-model selection is explicit: Qwen3-Omni is the immediate trainable pilot, Cosmos 3 is the first world-model branch, and policy models such as OpenVLA/openpi/GR00T wait for action-target conversion.",
|
| 193 |
+
"The Xperience Embodied Foundation Model is a future native-pretraining goal, not a completed model or current benchmark."
|
| 194 |
]
|
| 195 |
}
|
docs/data/publication_audit.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
| 1 |
{
|
| 2 |
"status": "pass",
|
| 3 |
-
"generated_at_utc": "2026-06-
|
| 4 |
"checks": [
|
| 5 |
{
|
| 6 |
"name": "required_publication_assets_present",
|
|
@@ -141,7 +141,7 @@
|
|
| 141 |
"surface": "github_repo",
|
| 142 |
"path": "README.md",
|
| 143 |
"exists": true,
|
| 144 |
-
"required_marker_count":
|
| 145 |
"missing_markers": [],
|
| 146 |
"status": "pass"
|
| 147 |
},
|
|
@@ -149,7 +149,7 @@
|
|
| 149 |
"surface": "hf_space_bundle",
|
| 150 |
"path": "README.md",
|
| 151 |
"exists": true,
|
| 152 |
-
"required_marker_count":
|
| 153 |
"missing_markers": [],
|
| 154 |
"status": "pass"
|
| 155 |
},
|
|
@@ -157,7 +157,7 @@
|
|
| 157 |
"surface": "hf_artifact_bundle",
|
| 158 |
"path": "README.md",
|
| 159 |
"exists": true,
|
| 160 |
-
"required_marker_count":
|
| 161 |
"missing_markers": [],
|
| 162 |
"status": "pass"
|
| 163 |
},
|
|
@@ -165,7 +165,7 @@
|
|
| 165 |
"surface": "hf_artifact_bundle",
|
| 166 |
"path": "PROJECT_README.md",
|
| 167 |
"exists": true,
|
| 168 |
-
"required_marker_count":
|
| 169 |
"missing_markers": [],
|
| 170 |
"status": "pass"
|
| 171 |
},
|
|
@@ -173,7 +173,7 @@
|
|
| 173 |
"surface": "hf_model_bundle",
|
| 174 |
"path": "README.md",
|
| 175 |
"exists": true,
|
| 176 |
-
"required_marker_count":
|
| 177 |
"missing_markers": [],
|
| 178 |
"status": "pass"
|
| 179 |
}
|
|
@@ -182,8 +182,8 @@
|
|
| 182 |
"github_repo": {
|
| 183 |
"root": "repo",
|
| 184 |
"exists": true,
|
| 185 |
-
"file_count":
|
| 186 |
-
"text_file_count":
|
| 187 |
"largest_file": {
|
| 188 |
"path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
|
| 189 |
"bytes": 55702978
|
|
@@ -193,8 +193,8 @@
|
|
| 193 |
"hf_space_bundle": {
|
| 194 |
"root": "hf_publish/space",
|
| 195 |
"exists": true,
|
| 196 |
-
"file_count":
|
| 197 |
-
"text_file_count":
|
| 198 |
"largest_file": {
|
| 199 |
"path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
|
| 200 |
"bytes": 55702978
|
|
@@ -204,8 +204,8 @@
|
|
| 204 |
"hf_artifact_bundle": {
|
| 205 |
"root": "hf_publish/artifacts",
|
| 206 |
"exists": true,
|
| 207 |
-
"file_count":
|
| 208 |
-
"text_file_count":
|
| 209 |
"largest_file": {
|
| 210 |
"path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
|
| 211 |
"bytes": 55702978
|
|
@@ -215,11 +215,11 @@
|
|
| 215 |
"hf_model_bundle": {
|
| 216 |
"root": "hf_publish/model",
|
| 217 |
"exists": true,
|
| 218 |
-
"file_count":
|
| 219 |
-
"text_file_count":
|
| 220 |
"largest_file": {
|
| 221 |
-
"path": "
|
| 222 |
-
"bytes":
|
| 223 |
},
|
| 224 |
"violations": []
|
| 225 |
}
|
|
|
|
| 1 |
{
|
| 2 |
"status": "pass",
|
| 3 |
+
"generated_at_utc": "2026-06-04T20:43:37+00:00",
|
| 4 |
"checks": [
|
| 5 |
{
|
| 6 |
"name": "required_publication_assets_present",
|
|
|
|
| 141 |
"surface": "github_repo",
|
| 142 |
"path": "README.md",
|
| 143 |
"exists": true,
|
| 144 |
+
"required_marker_count": 10,
|
| 145 |
"missing_markers": [],
|
| 146 |
"status": "pass"
|
| 147 |
},
|
|
|
|
| 149 |
"surface": "hf_space_bundle",
|
| 150 |
"path": "README.md",
|
| 151 |
"exists": true,
|
| 152 |
+
"required_marker_count": 10,
|
| 153 |
"missing_markers": [],
|
| 154 |
"status": "pass"
|
| 155 |
},
|
|
|
|
| 157 |
"surface": "hf_artifact_bundle",
|
| 158 |
"path": "README.md",
|
| 159 |
"exists": true,
|
| 160 |
+
"required_marker_count": 7,
|
| 161 |
"missing_markers": [],
|
| 162 |
"status": "pass"
|
| 163 |
},
|
|
|
|
| 165 |
"surface": "hf_artifact_bundle",
|
| 166 |
"path": "PROJECT_README.md",
|
| 167 |
"exists": true,
|
| 168 |
+
"required_marker_count": 10,
|
| 169 |
"missing_markers": [],
|
| 170 |
"status": "pass"
|
| 171 |
},
|
|
|
|
| 173 |
"surface": "hf_model_bundle",
|
| 174 |
"path": "README.md",
|
| 175 |
"exists": true,
|
| 176 |
+
"required_marker_count": 10,
|
| 177 |
"missing_markers": [],
|
| 178 |
"status": "pass"
|
| 179 |
}
|
|
|
|
| 182 |
"github_repo": {
|
| 183 |
"root": "repo",
|
| 184 |
"exists": true,
|
| 185 |
+
"file_count": 396,
|
| 186 |
+
"text_file_count": 330,
|
| 187 |
"largest_file": {
|
| 188 |
"path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
|
| 189 |
"bytes": 55702978
|
|
|
|
| 193 |
"hf_space_bundle": {
|
| 194 |
"root": "hf_publish/space",
|
| 195 |
"exists": true,
|
| 196 |
+
"file_count": 317,
|
| 197 |
+
"text_file_count": 251,
|
| 198 |
"largest_file": {
|
| 199 |
"path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
|
| 200 |
"bytes": 55702978
|
|
|
|
| 204 |
"hf_artifact_bundle": {
|
| 205 |
"root": "hf_publish/artifacts",
|
| 206 |
"exists": true,
|
| 207 |
+
"file_count": 418,
|
| 208 |
+
"text_file_count": 330,
|
| 209 |
"largest_file": {
|
| 210 |
"path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
|
| 211 |
"bytes": 55702978
|
|
|
|
| 215 |
"hf_model_bundle": {
|
| 216 |
"root": "hf_publish/model",
|
| 217 |
"exists": true,
|
| 218 |
+
"file_count": 644,
|
| 219 |
+
"text_file_count": 519,
|
| 220 |
"largest_file": {
|
| 221 |
+
"path": "pytorch_model.bin",
|
| 222 |
+
"bytes": 93495480
|
| 223 |
},
|
| 224 |
"violations": []
|
| 225 |
}
|
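The audit records above report `required_marker_count`, `missing_markers`, and `status` for each README surface. A rough sketch of that kind of check follows, assuming a marker is simply a substring that must appear in the file text. The function name `audit_markers` and the `"fail"` status value are assumptions; the diff only shows `"pass"`, and the actual marker strings are not visible here.

```python
def audit_markers(text: str, required_markers: list[str]) -> dict:
    """Report which required marker strings are absent from a document."""
    # A marker counts as present if it occurs anywhere in the text.
    missing = [marker for marker in required_markers if marker not in text]
    return {
        "required_marker_count": len(required_markers),
        "missing_markers": missing,
        # "fail" is an assumed counterpart to the "pass" value seen in the report.
        "status": "pass" if not missing else "fail",
    }
```

Under this reading, the count changing in the diff would reflect a marker being added to the required list rather than anything about the README itself.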
docs/data/research_roadmap.json
CHANGED
|
@@ -1,7 +1,7 @@
|
|
| 1 |
{
|
| 2 |
"title": "Ropedia Xperience-10M Research Roadmap",
|
| 3 |
-
"summary": "Staged path from the public-sample task lab to multi-episode held-out evaluation, foundation-model selection, and
|
| 4 |
-
"current_decision_point": "Keep the public-sample task suite as the development harness, prepare the selected official Xperience-10M episodes for the held-out Qwen3-Omni pilot, then branch into Cosmos 3 world modeling and policy-model experiments after the data preparation path is stable.",
|
| 5 |
"phases": [
|
| 6 |
{
|
| 7 |
"id": "public_sample_task_lab",
|
|
@@ -126,6 +126,30 @@
|
|
| 126 |
"updated model cards"
|
| 127 |
],
|
| 128 |
"reader_takeaway": "The long-term direction is richer multimodal representation learning for embodied-AI reasoning, with model branches chosen by task fit rather than by a single default backbone."
|
| 129 |
}
|
| 130 |
],
|
| 131 |
"public_surfaces_to_update": [
|
|
@@ -134,6 +158,7 @@
|
|
| 134 |
"RESEARCH_TAKEAWAYS.md",
|
| 135 |
"EVALUATION_PROTOCOL.md",
|
| 136 |
"ARTIFACT_GUIDE.md",
|
| 137 |
"docs/index.html",
|
| 138 |
"docs/data/research_roadmap.json",
|
| 139 |
"Hugging Face Space card",
|
|
|
|
| 1 |
{
|
| 2 |
"title": "Ropedia Xperience-10M Research Roadmap",
|
| 3 |
+
"summary": "Staged path from the public-sample task lab to multi-episode held-out evaluation, foundation-model selection, world/policy branches, and a future Xperience-native embodied foundation model.",
|
| 4 |
+
"current_decision_point": "Keep the public-sample task suite as the development harness, prepare the selected official Xperience-10M episodes for the held-out Qwen3-Omni pilot, then branch into Cosmos 3 world modeling and policy-model experiments after the data preparation path is stable. The Xperience Embodied Foundation Model is a later full-corpus pretraining goal, not a current result.",
|
| 5 |
"phases": [
|
| 6 |
{
|
| 7 |
"id": "public_sample_task_lab",
|
|
|
|
| 126 |
"updated model cards"
|
| 127 |
],
|
| 128 |
"reader_takeaway": "The long-term direction is richer multimodal representation learning for embodied-AI reasoning, with model branches chosen by task fit rather than by a single default backbone."
|
| 129 |
+
},
|
| 130 |
+
{
|
| 131 |
+
"id": "xperience_embodied_foundation_pretraining",
|
| 132 |
+
"name": "Xperience Embodied Foundation Model Pretraining",
|
| 133 |
+
"status": "future",
|
| 134 |
+
"entry_condition": "Full-corpus access, PB-scale storage path, high-throughput data loading, multi-node compute, and positive scaling evidence from smaller multi-episode runs.",
|
| 135 |
+
"deliverables": [
|
| 136 |
+
"full-corpus episode and split manifests",
|
| 137 |
+
"pretraining shard and provenance manifests",
|
| 138 |
+
"0.3B-1B and 1B-3B scaling pilots",
|
| 139 |
+
"3B-7B Xperience-native domain model target",
|
| 140 |
+
"held-out episode/session/activity/object evaluations",
|
| 141 |
+
"missing-modality robustness report",
|
| 142 |
+
"model card and data-boundary report"
|
| 143 |
+
],
|
| 144 |
+
"completion_evidence": [
|
| 145 |
+
"pretraining metadata",
|
| 146 |
+
"checkpoint inventory",
|
| 147 |
+
"scaling curves",
|
| 148 |
+
"held-out evaluation reports",
|
| 149 |
+
"qualitative retrieval or future-state examples",
|
| 150 |
+
"safety and data-boundary report"
|
| 151 |
+
],
|
| 152 |
+
"reader_takeaway": "The final research direction is a domain-specific embodied foundation model trained directly on Xperience-10M, after smaller pilots justify the cost and infrastructure."
|
| 153 |
}
|
| 154 |
],
|
| 155 |
"public_surfaces_to_update": [
|
|
|
|
| 158 |
"RESEARCH_TAKEAWAYS.md",
|
| 159 |
"EVALUATION_PROTOCOL.md",
|
| 160 |
"ARTIFACT_GUIDE.md",
|
| 161 |
+
"XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md",
|
| 162 |
"docs/index.html",
|
| 163 |
"docs/data/research_roadmap.json",
|
| 164 |
"Hugging Face Space card",
|
docs/data/research_roadmap_interactive.json
CHANGED
|
@@ -1837,7 +1837,8 @@
|
|
| 1837 |
"NVIDIA GR00T"
|
| 1838 |
],
|
| 1839 |
"first_world_model_branch": "Cosmos 3",
|
| 1840 |
-
"immediate_trainable_backbone": "Qwen3-Omni"
|
| 1841 |
},
|
| 1842 |
"evaluation_additions": [
|
| 1843 |
{
|
|
@@ -1921,6 +1922,11 @@
|
|
| 1921 |
"action": "Publish branch results only with real manifests, predictions, metrics, and qualitative examples.",
|
| 1922 |
"name": "Publishing threshold",
|
| 1923 |
"step": 6
|
| 1924 |
}
|
| 1925 |
],
|
| 1926 |
"model_families": [
|
|
@@ -2023,6 +2029,21 @@
|
|
| 2023 |
"Useful after action target design.",
|
| 2024 |
"Less directly omni-modal than Qwen3-Omni or Cosmos 3."
|
| 2025 |
]
|
| 2026 |
}
|
| 2027 |
],
|
| 2028 |
"source_links": [
|
|
@@ -2057,11 +2078,15 @@
|
|
| 2057 |
{
|
| 2058 |
"label": "LeRobot / SmolVLA",
|
| 2059 |
"url": "https://github.com/huggingface/lerobot"
|
| 2060 |
}
|
| 2061 |
],
|
| 2062 |
"status": "planning_artifact"
|
| 2063 |
},
|
| 2064 |
-
"generated_at_utc": "2026-06-
|
| 2065 |
"omni_plan": {
|
| 2066 |
"adapter": "LoRA rank 16, alpha 32, dropout 0.05",
|
| 2067 |
"backbone": "Qwen/Qwen3-Omni-30B-A3B-Instruct",
|
|
@@ -2208,6 +2233,31 @@
|
|
| 2208 |
"reader_takeaway": "The long-term direction is richer multimodal representation learning for embodied-AI reasoning, with model branches chosen by task fit rather than by a single default backbone.",
|
| 2209 |
"stage": "future",
|
| 2210 |
"status": "planned"
|
| 2211 |
}
|
| 2212 |
],
|
| 2213 |
"scale_up": {
|
|
|
|
| 1837 |
"NVIDIA GR00T"
|
| 1838 |
],
|
| 1839 |
"first_world_model_branch": "Cosmos 3",
|
| 1840 |
+
"immediate_trainable_backbone": "Qwen3-Omni",
|
| 1841 |
+
"long_term_native_pretraining_goal": "Xperience Embodied Foundation Model"
|
| 1842 |
},
|
| 1843 |
"evaluation_additions": [
|
| 1844 |
{
|
|
|
|
| 1922 |
"action": "Publish branch results only with real manifests, predictions, metrics, and qualitative examples.",
|
| 1923 |
"name": "Publishing threshold",
|
| 1924 |
"step": 6
|
| 1925 |
+
},
|
| 1926 |
+
{
|
| 1927 |
+
"action": "Start a from-scratch Xperience Embodied Foundation Model only after smaller scaling stages, full-corpus storage, multi-node compute, and held-out evaluation protocols are in place.",
|
| 1928 |
+
"name": "Xperience-native pretraining",
|
| 1929 |
+
"step": 7
|
| 1930 |
}
|
| 1931 |
],
|
| 1932 |
"model_families": [
|
|
|
|
| 2029 |
"Useful after action target design.",
|
| 2030 |
"Less directly omni-modal than Qwen3-Omni or Cosmos 3."
|
| 2031 |
]
|
| 2032 |
+
},
|
| 2033 |
+
{
|
| 2034 |
+
"best_role": "Domain model over synchronized embodied experience.",
|
| 2035 |
+
"category": "xperience_native_pretraining_goal",
|
| 2036 |
+
"current_decision": "future_goal_after_scaling_evidence",
|
| 2037 |
+
"entry_condition": "Full-corpus data path, PB-scale storage, multi-node compute, and positive smaller-run scaling evidence.",
|
| 2038 |
+
"family": "Xperience Embodied Foundation Model",
|
| 2039 |
+
"openness": "future project-specific model if full-corpus access and compute exist",
|
| 2040 |
+
"priority": 8,
|
| 2041 |
+
"public_source": "XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md",
|
| 2042 |
+
"xperience10m_fit": [
|
| 2043 |
+
"Uses the full aligned modality stack rather than treating sensors as auxiliary metadata.",
|
| 2044 |
+
"Targets temporal embodied representation learning across perception, motion, geometry, audio, and language.",
|
| 2045 |
+
"Can become the shared pretraining backbone for Qwen-style instruction tasks, Cosmos-style world modeling, and policy/action branches."
|
| 2046 |
+
]
|
| 2047 |
}
|
| 2048 |
],
|
| 2049 |
"source_links": [
|
|
|
|
| 2078 |
{
|
| 2079 |
"label": "LeRobot / SmolVLA",
|
| 2080 |
"url": "https://github.com/huggingface/lerobot"
|
| 2081 |
+
},
|
| 2082 |
+
{
|
| 2083 |
+
"label": "Xperience Embodied Foundation Model pretraining plan",
|
| 2084 |
+
"url": "XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md"
|
| 2085 |
}
|
| 2086 |
],
|
| 2087 |
"status": "planning_artifact"
|
| 2088 |
},
|
| 2089 |
+
"generated_at_utc": "2026-06-04T20:40:29+00:00",
|
| 2090 |
"omni_plan": {
|
| 2091 |
"adapter": "LoRA rank 16, alpha 32, dropout 0.05",
|
| 2092 |
"backbone": "Qwen/Qwen3-Omni-30B-A3B-Instruct",
|
|
|
|
| 2233 |
"reader_takeaway": "The long-term direction is richer multimodal representation learning for embodied-AI reasoning, with model branches chosen by task fit rather than by a single default backbone.",
|
| 2234 |
"stage": "future",
|
| 2235 |
"status": "planned"
|
| 2236 |
+
},
|
| 2237 |
+
{
|
| 2238 |
+
"completion_evidence": [
|
| 2239 |
+
"pretraining metadata",
|
| 2240 |
+
"checkpoint inventory",
|
| 2241 |
+
"scaling curves",
|
| 2242 |
+
"held-out evaluation reports",
|
| 2243 |
+
"qualitative retrieval or future-state examples",
|
| 2244 |
+
"safety and data-boundary report"
|
| 2245 |
+
],
|
| 2246 |
+
"deliverables": [
|
| 2247 |
+
"full-corpus episode and split manifests",
|
| 2248 |
+
"pretraining shard and provenance manifests",
|
| 2249 |
+
"0.3B-1B and 1B-3B scaling pilots",
|
| 2250 |
+
"3B-7B Xperience-native domain model target",
|
| 2251 |
+
"held-out episode/session/activity/object evaluations",
|
| 2252 |
+
"missing-modality robustness report",
|
| 2253 |
+
"model card and data-boundary report"
|
| 2254 |
+
],
|
| 2255 |
+
"entry_condition": "Full-corpus access, PB-scale storage path, high-throughput data loading, multi-node compute, and positive scaling evidence from smaller multi-episode runs.",
|
| 2256 |
+
"id": "xperience_embodied_foundation_pretraining",
|
| 2257 |
+
"name": "Xperience Embodied Foundation Model Pretraining",
|
| 2258 |
+
"reader_takeaway": "The final research direction is a domain-specific embodied foundation model trained directly on Xperience-10M, after smaller pilots justify the cost and infrastructure.",
|
| 2259 |
+
"stage": "future",
|
| 2260 |
+
"status": "future"
|
| 2261 |
}
|
| 2262 |
],
|
| 2263 |
"scale_up": {
|
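The roadmap stage entry added in the hunk above uses a fixed set of keys (`completion_evidence`, `deliverables`, `entry_condition`, `id`, `name`, `reader_takeaway`, `stage`, `status`). A minimal sketch of a shape check for one such entry follows. The names `REQUIRED_KEYS`, `LIST_KEYS`, and `check_stage` are hypothetical and do not come from the repository's scripts, and the example entry abbreviates the lists shown in the diff.

```python
# Hypothetical shape check for one roadmap stage entry.
# The key names are copied from the diff; everything else is an assumption.
REQUIRED_KEYS = {
    "completion_evidence",
    "deliverables",
    "entry_condition",
    "id",
    "name",
    "reader_takeaway",
    "stage",
    "status",
}
LIST_KEYS = ("completion_evidence", "deliverables")


def check_stage(stage: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the entry looks well-formed."""
    problems = []
    missing = sorted(REQUIRED_KEYS - stage.keys())
    if missing:
        problems.append("missing keys: " + ", ".join(missing))
    for key in LIST_KEYS:
        # Only inspect list-valued keys that are actually present.
        if key in stage and (not isinstance(stage[key], list) or not stage[key]):
            problems.append(f"{key} must be a non-empty list")
    return problems


# Abbreviated copy of the entry added in this commit.
example_stage = {
    "completion_evidence": ["pretraining metadata", "scaling curves"],
    "deliverables": ["full-corpus episode and split manifests"],
    "entry_condition": "Full-corpus access, PB-scale storage path, and positive scaling evidence.",
    "id": "xperience_embodied_foundation_pretraining",
    "name": "Xperience Embodied Foundation Model Pretraining",
    "reader_takeaway": "A domain-specific embodied foundation model is the final research direction.",
    "stage": "future",
    "status": "future",
}

if __name__ == "__main__":
    print(check_stage(example_stage))
```

A check like this could run before publishing the JSON, but whether the repository's own validation script does anything similar is not shown in this diff.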
docs/index.html
CHANGED
|
@@ -2141,9 +2141,11 @@
|
|
| 2141 |
<p class="hero-copy">
|
| 2142 |
This project uses the public Xperience-10M sample from Ropedia to explore
|
| 2143 |
embodied-AI task design, multimodal feature construction, lightweight
|
| 2144 |
-
baselines,
|
| 2145 |
-
|
| 2146 |
-
|
| 2147 |
</p>
|
| 2148 |
<div class="hero-actions">
|
| 2149 |
<a class="button primary" href="research_roadmap.html">Open roadmap</a>
|
|
@@ -2252,7 +2254,7 @@
|
|
| 2252 |
</article>
|
| 2253 |
<article class="brief-card">
|
| 2254 |
<strong>Scale-up readiness</strong>
|
| 2255 |
-
<p>Connects the same data contract to 32/128-episode held-out pilots, Qwen3-Omni LoRA, Cosmos-style world modeling, and later
|
| 2256 |
</article>
|
| 2257 |
</div>
|
| 2258 |
<div class="brief-actions">
|
|
@@ -2356,7 +2358,7 @@
|
|
| 2356 |
<div class="wrap">
|
| 2357 |
<div class="section-head">
|
| 2358 |
<h2>Research roadmap.</h2>
|
| 2359 |
-
<p>The project path moves from the current public-sample task lab to multi-episode data preparation, held-out Qwen3-Omni fine-tuning, robustness runs, and
|
| 2360 |
</div>
|
| 2361 |
<div class="roadmap-grid" aria-label="Research roadmap stages">
|
| 2362 |
<article class="roadmap-card" data-status="implemented">
|
|
@@ -2413,12 +2415,22 @@
|
|
| 2413 |
<strong>Evidence</strong><p>Task-specific held-out evaluations, qualitative inspection, and updated model cards.</p>
|
| 2414 |
</div>
|
| 2415 |
</article>
|
| 2416 |
</div>
|
| 2417 |
<div class="roadmap-links">
|
| 2418 |
<a href="research_roadmap.html">interactive roadmap</a>
|
| 2419 |
<a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/RESEARCH_ROADMAP.md">roadmap document</a>
|
| 2420 |
<a href="data/research_roadmap.json">roadmap stages</a>
|
| 2421 |
<a href="data/foundation_model_plan.json">foundation model plan</a>
|
|
|
|
| 2422 |
<a href="data/research_roadmap_interactive.json">interactive map</a>
|
| 2423 |
<a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/DATA_ACCESS_STATUS.md">scale-up status</a>
|
| 2424 |
<a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/PROJECT_STATUS.md">project status</a>
|
|
@@ -2438,7 +2450,7 @@
|
|
| 2438 |
<article class="artifact"><h3>Metric contract</h3><p>All 12 tasks list input, target, primary metric, minimal baseline score, and neural MLP score from committed result files.</p><a href="data/summary_metrics.json">summary metrics</a></article>
|
| 2439 |
<article class="artifact"><h3>Leakage controls</h3><p>Scalers fit on train windows only; future labels, target-side signals, caption/object labels, and contact labels stay on the target side unless explicitly queried.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/scripts/build_evaluation_protocol.py">builder script</a></article>
|
| 2440 |
<article class="artifact"><h3>Audio ablation</h3><p>Audio and no-audio variants are evaluated across all 12 task contracts under the same chronological split.</p><a href="data/audio_ablation_summary.json">audio summary</a></article>
|
| 2441 |
-
<article class="artifact"><h3>Foundation branch selection</h3><p>Qwen3-Omni is the first trainable baseline, Cosmos 3 becomes the world-model branch,
|
| 2442 |
<article class="artifact"><h3>Next evaluation stage</h3><p>This public-sample run covers single-episode task development. Cross-episode generalization, audio-visual learning, world modeling, policy targets, and held-out Qwen3-Omni training move to the multi-episode stage after selected data is prepared.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/DATA_ACCESS_STATUS.md">next-stage plan</a></article>
|
| 2443 |
<article class="artifact"><h3>Scale-up requirement</h3><p>The Omni pilot requires selected prepared episodes, held-out episode splits, no train/test episode leakage, training metadata, predictions, metrics, and a run report.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/DATA_ACCESS_STATUS.md">scale-up status</a></article>
|
| 2444 |
</div>
|
|
@@ -2492,10 +2504,11 @@
|
|
| 2492 |
<article class="evidence-card">
|
| 2493 |
<span class="status-pill">current plan</span>
|
| 2494 |
<h3>Foundation backbones are separated by role</h3>
|
| 2495 |
-
<p>Qwen3-Omni stays first for held-out LoRA; Cosmos 3 is the world-model branch; OpenVLA/openpi/GR00T are policy candidates after action-space conversion.</p>
|
| 2496 |
<div class="evidence-links">
|
| 2497 |
<a href="data/foundation_model_plan.json">foundation model plan</a>
|
| 2498 |
<a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/FOUNDATION_MODEL_PLAN.md">plan doc</a>
|
|
|
|
| 2499 |
</div>
|
| 2500 |
</article>
|
| 2501 |
<article class="evidence-card">
|
|
@@ -2628,10 +2641,11 @@
|
|
| 2628 |
<article class="reading-card">
|
| 2629 |
<span class="step-index">04</span>
|
| 2630 |
<h3>Check the scale-up gate</h3>
|
| 2631 |
-
<p>The multi-episode Qwen3-Omni path is prepared. The selected 128-episode result will be added after staging, preprocessing, training, and held-out evaluation pass.</p>
|
| 2632 |
<div class="reading-links">
|
| 2633 |
<a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/DATA_ACCESS_STATUS.md">scale-up status</a>
|
| 2634 |
<a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/MULTI_EPISODE_ACCESS_STATUS.md">data access</a>
|
|
|
|
| 2635 |
<a href="data/project_packet.json">reader path</a>
|
| 2636 |
</div>
|
| 2637 |
</article>
|
|
@@ -2659,7 +2673,7 @@
|
|
| 2659 |
<article class="artifact"><h3>Current project subset</h3><p>One public sample episode, 5,821 frames, 1,161 aligned windows, 8,546-dimensional task inputs, and no raw-data redistribution.</p><a href="data/modality_atlas.json">modality atlas</a></article>
|
| 2660 |
<article class="artifact"><h3>Covered now</h3><p>Action/subtask labels, next-action prediction, temporal diagnostics, hand trajectory, contact, object relevance, caption grounding, retrieval, reconstruction, and misalignment.</p><a href="data/summary_metrics.json">summary metrics</a></article>
|
| 2661 |
<article class="artifact"><h3>Responsible use</h3><p>This project is for research exploration and excludes identity recognition, surveillance, biometric profiling, sensitive-attribute inference, and safety-critical deployment.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/DATA_NOTICE.md">use notes</a></article>
|
| 2662 |
-
<article class="artifact"><h3>Later milestones</h3><p>Full audio-visual learning, caption generation, depth-pixel prediction, SLAM estimation, neural rendering, policy learning, cross-episode generalization,
|
| 2663 |
</div>
|
| 2664 |
</div>
|
| 2665 |
</section>
|
|
@@ -3103,10 +3117,11 @@
|
|
| 3103 |
</div>
|
| 3104 |
<div class="artifact-grid">
|
| 3105 |
<article class="artifact primary-artifact"><div><h3>Project scope</h3><p>Connects implemented single-episode artifacts, setup-stage Omni work, the selected 128-episode pilot, and later multi-episode milestones.</p></div><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/EVIDENCE_CONTRACT.md">evidence contract</a></article>
|
| 3106 |
-
<article class="artifact"><h3>Foundation-model plan</h3><p>Backbone selection matrix covering Qwen3-Omni, Cosmos 3, GR00T, OpenVLA/openpi, Gemini Robotics, Octo,
|
| 3107 |
<article class="artifact"><h3>Multi-episode data access</h3><p>Public data-access path, selected 128-episode pilot plan, and preparation requirements.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/MULTI_EPISODE_ACCESS_STATUS.md">data access</a></article>
|
| 3108 |
<article class="artifact"><h3>Qwen3-Omni preparation</h3><p>Episode selection and manifest preparation for the current scale-up path.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/episode_manifest.json">preparation details</a></article>
|
| 3109 |
<article class="artifact"><h3>Scale-up requirement</h3><p>What must be available before full pilot training and held-out metrics.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/DATA_ACCESS_STATUS.md">training requirements</a></article>
|
|
|
|
| 3110 |
</div>
|
| 3111 |
</section>
|
| 3112 |
|
|
@@ -3123,7 +3138,7 @@
|
|
| 3123 |
<article class="artifact"><h3>Dataset notes</h3><p>Official dataset links, public sample source, modalities, access boundary, and current project subset.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/XPERIENCE10M_DATASET_CARD_ALIGNMENT.md">dataset notes</a></article>
|
| 3124 |
<article class="artifact"><h3>Reproducibility</h3><p>Commands and expected outputs for rebuilding the public-sample task suite and visual artifacts.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/REPRODUCIBILITY.md">reproduce</a></article>
|
| 3125 |
<article class="artifact"><h3>Qwen3-Omni status</h3><p>Data requirements and evaluation boundary for the selected multi-episode LoRA pilot.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/DATA_ACCESS_STATUS.md">training status</a></article>
|
| 3126 |
-
<article class="artifact"><h3>Foundation-model plan</h3><p>Qwen3-Omni, Cosmos 3, GR00T, OpenVLA/openpi, Gemini Robotics, Octo,
|
| 3127 |
<article class="artifact"><h3>Hub artifacts</h3><p>Derived CSV/JSON/Markdown/figure artifacts without redistributing raw Xperience-10M data.</p><a href="https://huggingface.co/datasets/cy0307/ropedia-xperience-10m-task-suite-artifacts">artifact dataset</a></article>
|
| 3128 |
<article class="artifact"><h3>Baseline models</h3><p>Lightweight minimal and neural task-head model files for the 12 task contracts.</p><a href="https://huggingface.co/cy0307/ropedia-xperience-10m-task-baselines">model repo</a></article>
|
| 3129 |
</div>
|
|
@@ -3143,6 +3158,7 @@
|
|
| 3143 |
<article class="artifact"><h3>Transfer</h3><p>Download raw episodes only from official gated sources, exclude visualization.rrd, validate files, then stage them for training.</p></article>
|
| 3144 |
<article class="artifact"><h3>Current LoRA artifact</h3><p>The current LoRA artifact uses the locally available sample data. The multi-episode result begins after selected data is prepared, preprocessed, trained, and evaluated on held-out sessions.</p></article>
|
| 3145 |
<article class="artifact"><h3>Backbone branches</h3><p>Qwen3-Omni is the immediate LoRA path; Cosmos 3 is the first world-model branch; GR00T/OpenVLA/openpi become policy branches after action targets are well-defined.</p><a href="data/foundation_model_plan.json">backbone plan</a></article>
|
|
|
|
| 3146 |
</div>
|
| 3147 |
</div>
|
| 3148 |
</section>
|
|
|
|
| 2141 |
<p class="hero-copy">
|
| 2142 |
This project uses the public Xperience-10M sample from Ropedia to explore
|
| 2143 |
embodied-AI task design, multimodal feature construction, lightweight
|
| 2144 |
+
baselines, future Omni-model fine-tuning, and the long-term path toward
|
| 2145 |
+
an Xperience-native embodied foundation model. It starts from the
|
| 2146 |
+
sample episode available now, then keeps the same data contracts ready
|
| 2147 |
+
for held-out multi-episode training when more Xperience-10M data is
|
| 2148 |
+
prepared.
|
| 2149 |
</p>
|
| 2150 |
<div class="hero-actions">
|
| 2151 |
<a class="button primary" href="research_roadmap.html">Open roadmap</a>
|
|
|
|
| 2254 |
</article>
|
| 2255 |
<article class="brief-card">
|
| 2256 |
<strong>Scale-up readiness</strong>
|
| 2257 |
+
<p>Connects the same data contract to 32/128-episode held-out pilots, Qwen3-Omni LoRA, Cosmos-style world modeling, policy-model branches, and the later Xperience-native pretraining goal.</p>
|
| 2258 |
</article>
|
| 2259 |
</div>
|
| 2260 |
<div class="brief-actions">
|
|
|
|
| 2358 |
<div class="wrap">
|
| 2359 |
<div class="section-head">
|
| 2360 |
<h2>Research roadmap.</h2>
|
| 2361 |
+
<p>The project path moves from the current public-sample task lab to multi-episode data preparation, held-out Qwen3-Omni fine-tuning, robustness runs, world/policy branches, and the future Xperience Embodied Foundation Model pretraining goal.</p>
|
| 2362 |
</div>
|
| 2363 |
<div class="roadmap-grid" aria-label="Research roadmap stages">
|
| 2364 |
<article class="roadmap-card" data-status="implemented">
|
|
|
|
| 2415 |
<strong>Evidence</strong><p>Task-specific held-out evaluations, qualitative inspection, and updated model cards.</p>
|
| 2416 |
</div>
|
| 2417 |
</article>
|
| 2418 |
+
<article class="roadmap-card" data-status="planned">
|
| 2419 |
+
<span class="roadmap-status">future</span>
|
| 2420 |
+
<h3>Xperience Embodied Foundation Model</h3>
|
| 2421 |
+
<p>Pretrain an Xperience-native domain model over synchronized video, audio, depth, pose, mocap, IMU, and language after smaller scaling stages prove value.</p>
|
| 2422 |
+
<div class="roadmap-meta">
|
| 2423 |
+
<strong>Entry</strong><p>Full-corpus access, PB-scale storage path, multi-node compute, and positive scaling evidence.</p>
|
| 2424 |
+
<strong>Evidence</strong><p>Pretraining manifests, scaling curves, held-out evaluations, checkpoint inventory, model card, and data-boundary report.</p>
|
| 2425 |
+
</div>
|
| 2426 |
+
</article>
|
| 2427 |
</div>
|
| 2428 |
<div class="roadmap-links">
|
| 2429 |
<a href="research_roadmap.html">interactive roadmap</a>
|
| 2430 |
<a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/RESEARCH_ROADMAP.md">roadmap document</a>
|
| 2431 |
<a href="data/research_roadmap.json">roadmap stages</a>
|
| 2432 |
<a href="data/foundation_model_plan.json">foundation model plan</a>
|
| 2433 |
+
<a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md">native pretraining plan</a>
|
| 2434 |
<a href="data/research_roadmap_interactive.json">interactive map</a>
|
| 2435 |
<a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/DATA_ACCESS_STATUS.md">scale-up status</a>
|
| 2436 |
<a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/PROJECT_STATUS.md">project status</a>
|
|
|
|
| 2450 |
<article class="artifact"><h3>Metric contract</h3><p>All 12 tasks list input, target, primary metric, minimal baseline score, and neural MLP score from committed result files.</p><a href="data/summary_metrics.json">summary metrics</a></article>
|
| 2451 |
<article class="artifact"><h3>Leakage controls</h3><p>Scalers fit on train windows only; future labels, target-side signals, caption/object labels, and contact labels stay on the target side unless explicitly queried.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/scripts/build_evaluation_protocol.py">builder script</a></article>
|
| 2452 |
<article class="artifact"><h3>Audio ablation</h3><p>Audio and no-audio variants are evaluated across all 12 task contracts under the same chronological split.</p><a href="data/audio_ablation_summary.json">audio summary</a></article>
|
| 2453 |
+
<article class="artifact"><h3>Foundation branch selection</h3><p>Qwen3-Omni is the first trainable baseline, Cosmos 3 becomes the world-model branch, policy models wait for explicit action targets, and Xperience-native pretraining remains a later full-corpus goal.</p><a href="data/foundation_model_plan.json">backbone plan</a></article>
|
| 2454 |
<article class="artifact"><h3>Next evaluation stage</h3><p>This public-sample run covers single-episode task development. Cross-episode generalization, audio-visual learning, world modeling, policy targets, and held-out Qwen3-Omni training move to the multi-episode stage after selected data is prepared.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/DATA_ACCESS_STATUS.md">next-stage plan</a></article>
|
| 2455 |
<article class="artifact"><h3>Scale-up requirement</h3><p>The Omni pilot requires selected prepared episodes, held-out episode splits, no train/test episode leakage, training metadata, predictions, metrics, and a run report.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/DATA_ACCESS_STATUS.md">scale-up status</a></article>
|
| 2456 |
</div>
|
|
|
|
| 2504 |
<article class="evidence-card">
|
| 2505 |
<span class="status-pill">current plan</span>
|
| 2506 |
<h3>Foundation backbones are separated by role</h3>
|
| 2507 |
+
<p>Qwen3-Omni stays first for held-out LoRA; Cosmos 3 is the world-model branch; OpenVLA/openpi/GR00T are policy candidates after action-space conversion; Xperience-native pretraining is the later full-corpus goal.</p>
|
| 2508 |
<div class="evidence-links">
|
| 2509 |
<a href="data/foundation_model_plan.json">foundation model plan</a>
|
| 2510 |
<a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/FOUNDATION_MODEL_PLAN.md">plan doc</a>
|
| 2511 |
+
<a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md">pretraining plan</a>
|
| 2512 |
</div>
|
| 2513 |
</article>
|
| 2514 |
<article class="evidence-card">
|
|
|
|
| 2641 |
<article class="reading-card">
|
| 2642 |
<span class="step-index">04</span>
|
| 2643 |
<h3>Check the scale-up gate</h3>
|
| 2644 |
+
<p>The multi-episode Qwen3-Omni path is prepared. The selected 128-episode result will be added after staging, preprocessing, training, and held-out evaluation pass. The native-pretraining plan shows how this can grow into a full-corpus research direction.</p>
|
| 2645 |
<div class="reading-links">
|
| 2646 |
<a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/DATA_ACCESS_STATUS.md">scale-up status</a>
|
| 2647 |
<a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/MULTI_EPISODE_ACCESS_STATUS.md">data access</a>
|
| 2648 |
+
<a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md">native pretraining</a>
|
| 2649 |
<a href="data/project_packet.json">reader path</a>
|
| 2650 |
</div>
|
| 2651 |
</article>
|
|
|
|
| 2673 |
<article class="artifact"><h3>Current project subset</h3><p>One public sample episode, 5,821 frames, 1,161 aligned windows, 8,546-dimensional task inputs, and no raw-data redistribution.</p><a href="data/modality_atlas.json">modality atlas</a></article>
|
| 2674 |
<article class="artifact"><h3>Covered now</h3><p>Action/subtask labels, next-action prediction, temporal diagnostics, hand trajectory, contact, object relevance, caption grounding, retrieval, reconstruction, and misalignment.</p><a href="data/summary_metrics.json">summary metrics</a></article>
|
| 2675 |
<article class="artifact"><h3>Responsible use</h3><p>This project is for research exploration and excludes identity recognition, surveillance, biometric profiling, sensitive-attribute inference, and safety-critical deployment.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/DATA_NOTICE.md">use notes</a></article>
|
| 2676 |
+
<article class="artifact"><h3>Later milestones</h3><p>Full audio-visual learning, caption generation, depth-pixel prediction, SLAM estimation, neural rendering, policy learning, cross-episode generalization, held-out Qwen3-Omni evaluation, and future Xperience-native pretraining.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md">native pretraining</a></article>
|
| 2677 |
</div>
|
| 2678 |
</div>
|
| 2679 |
</section>
|
|
|
|
| 3117 |
</div>
|
| 3118 |
<div class="artifact-grid">
|
| 3119 |
<article class="artifact primary-artifact"><div><h3>Project scope</h3><p>Connects implemented single-episode artifacts, setup-stage Omni work, the selected 128-episode pilot, and later multi-episode milestones.</p></div><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/EVIDENCE_CONTRACT.md">evidence contract</a></article>
|
| 3120 |
+
<article class="artifact"><h3>Foundation-model plan</h3><p>Backbone selection matrix covering Qwen3-Omni, Cosmos 3, GR00T, OpenVLA/openpi, Gemini Robotics, Octo, SmolVLA-style policy candidates, and the future Xperience-native pretraining goal.</p><a href="data/foundation_model_plan.json">foundation model plan</a></article>
|
| 3121 |
<article class="artifact"><h3>Multi-episode data access</h3><p>Public data-access path, selected 128-episode pilot plan, and preparation requirements.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/MULTI_EPISODE_ACCESS_STATUS.md">data access</a></article>
|
| 3122 |
<article class="artifact"><h3>Qwen3-Omni preparation</h3><p>Episode selection and manifest preparation for the current scale-up path.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/episode_manifest.json">preparation details</a></article>
|
| 3123 |
<article class="artifact"><h3>Scale-up requirement</h3><p>What must be available before full pilot training and held-out metrics.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/DATA_ACCESS_STATUS.md">training requirements</a></article>
|
| 3124 |
+
<article class="artifact"><h3>Xperience-native pretraining</h3><p>Future plan for a domain-specific embodied foundation model trained from scratch over full-corpus video, audio, geometry, motion, inertial, and language streams.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md">pretraining plan</a></article>
|
| 3125 |
</div>
|
| 3126 |
</section>
|
| 3127 |
|
|
|
|
| 3138 |
<article class="artifact"><h3>Dataset notes</h3><p>Official dataset links, public sample source, modalities, access boundary, and current project subset.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/XPERIENCE10M_DATASET_CARD_ALIGNMENT.md">dataset notes</a></article>
|
| 3139 |
<article class="artifact"><h3>Reproducibility</h3><p>Commands and expected outputs for rebuilding the public-sample task suite and visual artifacts.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/REPRODUCIBILITY.md">reproduce</a></article>
|
| 3140 |
<article class="artifact"><h3>Qwen3-Omni status</h3><p>Data requirements and evaluation boundary for the selected multi-episode LoRA pilot.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/DATA_ACCESS_STATUS.md">training status</a></article>
|
| 3141 |
+
<article class="artifact"><h3>Foundation-model plan</h3><p>Qwen3-Omni, Cosmos 3, GR00T, OpenVLA/openpi, Gemini Robotics, Octo, SmolVLA-style branches, and the Xperience-native pretraining goal by role.</p><a href="data/foundation_model_plan.json">model plan</a></article>
|
| 3142 |
<article class="artifact"><h3>Hub artifacts</h3><p>Derived CSV/JSON/Markdown/figure artifacts without redistributing raw Xperience-10M data.</p><a href="https://huggingface.co/datasets/cy0307/ropedia-xperience-10m-task-suite-artifacts">artifact dataset</a></article>
|
| 3143 |
<article class="artifact"><h3>Baseline models</h3><p>Lightweight minimal and neural task-head model files for the 12 task contracts.</p><a href="https://huggingface.co/cy0307/ropedia-xperience-10m-task-baselines">model repo</a></article>
|
| 3144 |
</div>
|
|
|
|
| 3158 |
<article class="artifact"><h3>Transfer</h3><p>Download raw episodes only from official gated sources, exclude visualization.rrd, validate files, then stage them for training.</p></article>
|
| 3159 |
<article class="artifact"><h3>Current LoRA artifact</h3><p>The current LoRA artifact uses the locally available sample data. The multi-episode result begins after selected data is prepared, preprocessed, trained, and evaluated on held-out sessions.</p></article>
|
| 3160 |
<article class="artifact"><h3>Backbone branches</h3><p>Qwen3-Omni is the immediate LoRA path; Cosmos 3 is the first world-model branch; GR00T/OpenVLA/openpi become policy branches after action targets are well-defined.</p><a href="data/foundation_model_plan.json">backbone plan</a></article>
|
| 3161 |
+
<article class="artifact"><h3>Native foundation model</h3><p>The long-term goal is a full-corpus Xperience Embodied Foundation Model trained on synchronized perception, geometry, motion, inertial, audio, and language streams after smaller scaling stages validate the approach.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md">pretraining plan</a></article>
|
| 3162 |
</div>
|
| 3163 |
</div>
|
| 3164 |
</section>
|
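The hunks for `docs/index.html` above are repeated for the root `index.html` in this commit, and the file list includes `mirror_parity.json` artifacts, which suggests some files are kept as mirrored copies. A minimal sketch of a byte-level parity check follows, assuming mirrored copies are meant to be identical. The helper names `digest`, `out_of_sync`, and `read_copies` are hypothetical, and this is not the repository's parity script.

```python
import hashlib
from pathlib import Path


def digest(data: bytes) -> str:
    """SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(data).hexdigest()


def out_of_sync(copies: dict[str, bytes]) -> list[str]:
    """Return the names of copies whose digest differs from the first copy's digest."""
    if not copies:
        return []
    names = list(copies)
    reference = digest(copies[names[0]])
    return [name for name in names[1:] if digest(copies[name]) != reference]


def read_copies(paths: list[str]) -> dict[str, bytes]:
    """Load each path's bytes, keyed by the path string (assumes the files exist)."""
    return {path: Path(path).read_bytes() for path in paths}


if __name__ == "__main__":
    # In-memory stand-ins for the two HTML copies.
    same = {"index.html": b"<p>hero</p>", "docs/index.html": b"<p>hero</p>"}
    print(out_of_sync(same))
```

Comparing digests rather than diffing text keeps the check cheap for large JSON artifacts. It only reports that a copy drifted, not where.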
docs/research_roadmap.html
CHANGED
|
@@ -605,8 +605,9 @@
|
|
| 605 |
<h1>Interactive Research Roadmap.</h1>
|
| 606 |
<p class="hero-copy">
|
| 607 |
This page connects the current public-sample task lab to the four research
|
| 608 |
-
directions, the next multi-episode Qwen3-Omni fine-tuning path,
|
| 609 |
-
|
|
|
|
| 610 |
directly from generated project artifacts, so the track and task views stay
|
| 611 |
tied to the real sample metrics and scale-up status.
|
| 612 |
</p>
|
|
@@ -630,7 +631,7 @@
|
|
| 630 |
</div>
|
| 631 |
<div class="route-step">
|
| 632 |
<strong>03</strong>
|
| 633 |
-
<div><b>Omni + branches</b><span>Qwen3-Omni first, Cosmos 3 and policy models
|
| 634 |
<em id="routeOmni">pending data</em>
|
| 635 |
</div>
|
| 636 |
</div>
|
|
@@ -701,7 +702,7 @@
|
|
| 701 |
},
|
| 702 |
omni: {
|
| 703 |
title: "Omni pilot and foundation branches",
|
| 704 |
-
summary: "Run Qwen3-Omni first for the held-out LoRA pilot,
|
| 705 |
}
|
| 706 |
};
|
| 707 |
|
|
|
|
| 605 |
<h1>Interactive Research Roadmap.</h1>
|
| 606 |
<p class="hero-copy">
|
| 607 |
This page connects the current public-sample task lab to the four research
|
| 608 |
+
directions, the next multi-episode Qwen3-Omni fine-tuning path, the
|
| 609 |
+
later Cosmos 3 / policy-model branch choices, and the future
|
| 610 |
+
Xperience-native foundation-model pretraining goal. It loads
|
| 611 |
directly from generated project artifacts, so the track and task views stay
|
| 612 |
tied to the real sample metrics and scale-up status.
|
| 613 |
</p>
|
|
|
|
| 631 |
</div>
|
| 632 |
<div class="route-step">
|
| 633 |
<strong>03</strong>
|
| 634 |
+
<div><b>Omni + branches</b><span>Qwen3-Omni first, Cosmos 3 and policy models next, native pretraining later</span></div>
|
| 635 |
<em id="routeOmni">pending data</em>
|
| 636 |
</div>
|
| 637 |
</div>
|
|
|
|
| 702 |
},
|
| 703 |
omni: {
|
| 704 |
title: "Omni pilot and foundation branches",
|
| 705 |
+
summary: "Run Qwen3-Omni first for the held-out LoRA pilot, evaluate Cosmos 3 for world modeling and policy candidates after action targets are explicit, then treat Xperience-native pretraining as the full-corpus future goal.",
|
| 706 |
}
|
| 707 |
};
|
| 708 |
|
index.html
CHANGED
|
@@ -2141,9 +2141,11 @@
|
|
| 2141 |
<p class="hero-copy">
|
| 2142 |
This project uses the public Xperience-10M sample from Ropedia to explore
|
| 2143 |
embodied-AI task design, multimodal feature construction, lightweight
|
| 2144 |
-
baselines,
|
| 2145 |
-
|
| 2146 |
-
|
| 2147 |
</p>
|
| 2148 |
<div class="hero-actions">
|
| 2149 |
<a class="button primary" href="research_roadmap.html">Open roadmap</a>
|
|
@@ -2252,7 +2254,7 @@
|
|
| 2252 |
</article>
|
| 2253 |
<article class="brief-card">
|
| 2254 |
<strong>Scale-up readiness</strong>
|
| 2255 |
-
<p>Connects the same data contract to 32/128-episode held-out pilots, Qwen3-Omni LoRA, Cosmos-style world modeling, and later
|
| 2256 |
</article>
|
| 2257 |
</div>
|
| 2258 |
<div class="brief-actions">
|
|
@@ -2356,7 +2358,7 @@
|
|
| 2356 |
<div class="wrap">
|
| 2357 |
<div class="section-head">
|
| 2358 |
<h2>Research roadmap.</h2>
|
| 2359 |
-
<p>The project path moves from the current public-sample task lab to multi-episode data preparation, held-out Qwen3-Omni fine-tuning, robustness runs, and
|
| 2360 |
</div>
|
| 2361 |
<div class="roadmap-grid" aria-label="Research roadmap stages">
|
| 2362 |
<article class="roadmap-card" data-status="implemented">
|
|
@@ -2413,12 +2415,22 @@
|
|
| 2413 |
<strong>Evidence</strong><p>Task-specific held-out evaluations, qualitative inspection, and updated model cards.</p>
|
| 2414 |
</div>
|
| 2415 |
</article>
|
| 2416 |
</div>
|
| 2417 |
<div class="roadmap-links">
|
| 2418 |
<a href="research_roadmap.html">interactive roadmap</a>
|
| 2419 |
<a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/RESEARCH_ROADMAP.md">roadmap document</a>
|
| 2420 |
<a href="data/research_roadmap.json">roadmap stages</a>
|
| 2421 |
<a href="data/foundation_model_plan.json">foundation model plan</a>
|
| 2422 |
<a href="data/research_roadmap_interactive.json">interactive map</a>
|
| 2423 |
<a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/DATA_ACCESS_STATUS.md">scale-up status</a>
|
| 2424 |
<a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/PROJECT_STATUS.md">project status</a>
|
|
@@ -2438,7 +2450,7 @@
|
|
| 2438 |
<article class="artifact"><h3>Metric contract</h3><p>All 12 tasks list input, target, primary metric, minimal baseline score, and neural MLP score from committed result files.</p><a href="data/summary_metrics.json">summary metrics</a></article>
|
| 2439 |
<article class="artifact"><h3>Leakage controls</h3><p>Scalers fit on train windows only; future labels, target-side signals, caption/object labels, and contact labels stay on the target side unless explicitly queried.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/scripts/build_evaluation_protocol.py">builder script</a></article>
|
| 2440 |
<article class="artifact"><h3>Audio ablation</h3><p>Audio and no-audio variants are evaluated across all 12 task contracts under the same chronological split.</p><a href="data/audio_ablation_summary.json">audio summary</a></article>
|
| 2441 |
-
<article class="artifact"><h3>Foundation branch selection</h3><p>Qwen3-Omni is the first trainable baseline, Cosmos 3 becomes the world-model branch,
|
| 2442 |
<article class="artifact"><h3>Next evaluation stage</h3><p>This public-sample run covers single-episode task development. Cross-episode generalization, audio-visual learning, world modeling, policy targets, and held-out Qwen3-Omni training move to the multi-episode stage after selected data is prepared.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/DATA_ACCESS_STATUS.md">next-stage plan</a></article>
|
| 2443 |
<article class="artifact"><h3>Scale-up requirement</h3><p>The Omni pilot requires selected prepared episodes, held-out episode splits, no train/test episode leakage, training metadata, predictions, metrics, and a run report.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/DATA_ACCESS_STATUS.md">scale-up status</a></article>
|
| 2444 |
</div>
|
|
@@ -2492,10 +2504,11 @@
|
|
| 2492 |
<article class="evidence-card">
|
| 2493 |
<span class="status-pill">current plan</span>
|
| 2494 |
<h3>Foundation backbones are separated by role</h3>
|
| 2495 |
-
<p>Qwen3-Omni stays first for held-out LoRA; Cosmos 3 is the world-model branch; OpenVLA/openpi/GR00T are policy candidates after action-space conversion.</p>
|
| 2496 |
<div class="evidence-links">
|
| 2497 |
<a href="data/foundation_model_plan.json">foundation model plan</a>
|
| 2498 |
<a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/FOUNDATION_MODEL_PLAN.md">plan doc</a>
|
| 2499 |
</div>
|
| 2500 |
</article>
|
| 2501 |
<article class="evidence-card">
|
|
@@ -2628,10 +2641,11 @@
|
|
| 2628 |
<article class="reading-card">
|
| 2629 |
<span class="step-index">04</span>
|
| 2630 |
<h3>Check the scale-up gate</h3>
|
| 2631 |
-
<p>The multi-episode Qwen3-Omni path is prepared. The selected 128-episode result will be added after staging, preprocessing, training, and held-out evaluation pass.</p>
|
| 2632 |
<div class="reading-links">
|
| 2633 |
<a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/DATA_ACCESS_STATUS.md">scale-up status</a>
|
| 2634 |
<a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/MULTI_EPISODE_ACCESS_STATUS.md">data access</a>
|
| 2635 |
<a href="data/project_packet.json">reader path</a>
|
| 2636 |
</div>
|
| 2637 |
</article>
|
|
@@ -2659,7 +2673,7 @@
|
|
| 2659 |
<article class="artifact"><h3>Current project subset</h3><p>One public sample episode, 5,821 frames, 1,161 aligned windows, 8,546-dimensional task inputs, and no raw-data redistribution.</p><a href="data/modality_atlas.json">modality atlas</a></article>
|
| 2660 |
<article class="artifact"><h3>Covered now</h3><p>Action/subtask labels, next-action prediction, temporal diagnostics, hand trajectory, contact, object relevance, caption grounding, retrieval, reconstruction, and misalignment.</p><a href="data/summary_metrics.json">summary metrics</a></article>
|
| 2661 |
<article class="artifact"><h3>Responsible use</h3><p>This project is for research exploration and excludes identity recognition, surveillance, biometric profiling, sensitive-attribute inference, and safety-critical deployment.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/DATA_NOTICE.md">use notes</a></article>
|
| 2662 |
-
<article class="artifact"><h3>Later milestones</h3><p>Full audio-visual learning, caption generation, depth-pixel prediction, SLAM estimation, neural rendering, policy learning, cross-episode generalization,
|
| 2663 |
</div>
|
| 2664 |
</div>
|
| 2665 |
</section>
|
|
@@ -3103,10 +3117,11 @@
|
|
| 3103 |
</div>
|
| 3104 |
<div class="artifact-grid">
|
| 3105 |
<article class="artifact primary-artifact"><div><h3>Project scope</h3><p>Connects implemented single-episode artifacts, setup-stage Omni work, the selected 128-episode pilot, and later multi-episode milestones.</p></div><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/EVIDENCE_CONTRACT.md">evidence contract</a></article>
|
| 3106 |
-
<article class="artifact"><h3>Foundation-model plan</h3><p>Backbone selection matrix covering Qwen3-Omni, Cosmos 3, GR00T, OpenVLA/openpi, Gemini Robotics, Octo,
|
| 3107 |
<article class="artifact"><h3>Multi-episode data access</h3><p>Public data-access path, selected 128-episode pilot plan, and preparation requirements.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/MULTI_EPISODE_ACCESS_STATUS.md">data access</a></article>
|
| 3108 |
<article class="artifact"><h3>Qwen3-Omni preparation</h3><p>Episode selection and manifest preparation for the current scale-up path.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/episode_manifest.json">preparation details</a></article>
|
| 3109 |
<article class="artifact"><h3>Scale-up requirement</h3><p>What must be available before full pilot training and held-out metrics.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/DATA_ACCESS_STATUS.md">training requirements</a></article>
|
| 3110 |
</div>
|
| 3111 |
</section>
|
| 3112 |
|
|
@@ -3123,7 +3138,7 @@
|
|
| 3123 |
<article class="artifact"><h3>Dataset notes</h3><p>Official dataset links, public sample source, modalities, access boundary, and current project subset.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/XPERIENCE10M_DATASET_CARD_ALIGNMENT.md">dataset notes</a></article>
|
| 3124 |
<article class="artifact"><h3>Reproducibility</h3><p>Commands and expected outputs for rebuilding the public-sample task suite and visual artifacts.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/REPRODUCIBILITY.md">reproduce</a></article>
|
| 3125 |
<article class="artifact"><h3>Qwen3-Omni status</h3><p>Data requirements and evaluation boundary for the selected multi-episode LoRA pilot.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/DATA_ACCESS_STATUS.md">training status</a></article>
|
| 3126 |
-
<article class="artifact"><h3>Foundation-model plan</h3><p>Qwen3-Omni, Cosmos 3, GR00T, OpenVLA/openpi, Gemini Robotics, Octo,
|
| 3127 |
<article class="artifact"><h3>Hub artifacts</h3><p>Derived CSV/JSON/Markdown/figure artifacts without redistributing raw Xperience-10M data.</p><a href="https://huggingface.co/datasets/cy0307/ropedia-xperience-10m-task-suite-artifacts">artifact dataset</a></article>
|
| 3128 |
<article class="artifact"><h3>Baseline models</h3><p>Lightweight minimal and neural task-head model files for the 12 task contracts.</p><a href="https://huggingface.co/cy0307/ropedia-xperience-10m-task-baselines">model repo</a></article>
|
| 3129 |
</div>
|
|
@@ -3143,6 +3158,7 @@
|
|
| 3143 |
<article class="artifact"><h3>Transfer</h3><p>Download raw episodes only from official gated sources, exclude visualization.rrd, validate files, then stage them for training.</p></article>
|
| 3144 |
<article class="artifact"><h3>Current LoRA artifact</h3><p>The current LoRA artifact uses the locally available sample data. The multi-episode result begins after selected data is prepared, preprocessed, trained, and evaluated on held-out sessions.</p></article>
|
| 3145 |
<article class="artifact"><h3>Backbone branches</h3><p>Qwen3-Omni is the immediate LoRA path; Cosmos 3 is the first world-model branch; GR00T/OpenVLA/openpi become policy branches after action targets are well-defined.</p><a href="data/foundation_model_plan.json">backbone plan</a></article>
|
| 3146 |
</div>
|
| 3147 |
</div>
|
| 3148 |
</section>
|
|
|
|
| 2141 |
<p class="hero-copy">
|
| 2142 |
This project uses the public Xperience-10M sample from Ropedia to explore
|
| 2143 |
embodied-AI task design, multimodal feature construction, lightweight
|
| 2144 |
+
baselines, future Omni-model fine-tuning, and the long-term path toward
|
| 2145 |
+
an Xperience-native embodied foundation model. It starts from the
|
| 2146 |
+
sample episode available now, then keeps the same data contracts ready
|
| 2147 |
+
for held-out multi-episode training when more Xperience-10M data is
|
| 2148 |
+
prepared.
|
| 2149 |
</p>
|
| 2150 |
<div class="hero-actions">
|
| 2151 |
<a class="button primary" href="research_roadmap.html">Open roadmap</a>
|
|
|
|
| 2254 |
</article>
|
| 2255 |
<article class="brief-card">
|
| 2256 |
<strong>Scale-up readiness</strong>
|
| 2257 |
+
<p>Connects the same data contract to 32/128-episode held-out pilots, Qwen3-Omni LoRA, Cosmos-style world modeling, policy-model branches, and the later Xperience-native pretraining goal.</p>
|
| 2258 |
</article>
|
| 2259 |
</div>
|
| 2260 |
<div class="brief-actions">
|
|
|
|
| 2358 |
<div class="wrap">
|
| 2359 |
<div class="section-head">
|
| 2360 |
<h2>Research roadmap.</h2>
|
| 2361 |
+
<p>The project path moves from the current public-sample task lab to multi-episode data preparation, held-out Qwen3-Omni fine-tuning, robustness runs, world/policy branches, and the future Xperience Embodied Foundation Model pretraining goal.</p>
|
| 2362 |
</div>
|
| 2363 |
<div class="roadmap-grid" aria-label="Research roadmap stages">
|
| 2364 |
<article class="roadmap-card" data-status="implemented">
|
|
|
|
| 2415 |
<strong>Evidence</strong><p>Task-specific held-out evaluations, qualitative inspection, and updated model cards.</p>
|
| 2416 |
</div>
|
| 2417 |
</article>
|
| 2418 |
+
<article class="roadmap-card" data-status="planned">
|
| 2419 |
+
<span class="roadmap-status">future</span>
|
| 2420 |
+
<h3>Xperience Embodied Foundation Model</h3>
|
| 2421 |
+
<p>Pretrain an Xperience-native domain model over synchronized video, audio, depth, pose, mocap, IMU, and language after smaller scaling stages prove value.</p>
|
| 2422 |
+
<div class="roadmap-meta">
|
| 2423 |
+
<strong>Entry</strong><p>Full-corpus access, PB-scale storage path, multi-node compute, and positive scaling evidence.</p>
|
| 2424 |
+
<strong>Evidence</strong><p>Pretraining manifests, scaling curves, held-out evaluations, checkpoint inventory, model card, and data-boundary report.</p>
|
| 2425 |
+
</div>
|
| 2426 |
+
</article>
|
| 2427 |
</div>
|
| 2428 |
<div class="roadmap-links">
|
| 2429 |
<a href="research_roadmap.html">interactive roadmap</a>
|
| 2430 |
<a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/RESEARCH_ROADMAP.md">roadmap document</a>
|
| 2431 |
<a href="data/research_roadmap.json">roadmap stages</a>
|
| 2432 |
<a href="data/foundation_model_plan.json">foundation model plan</a>
|
| 2433 |
+
<a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md">native pretraining plan</a>
|
| 2434 |
<a href="data/research_roadmap_interactive.json">interactive map</a>
|
| 2435 |
<a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/DATA_ACCESS_STATUS.md">scale-up status</a>
|
| 2436 |
<a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/PROJECT_STATUS.md">project status</a>
|
|
|
|
| 2450 |
<article class="artifact"><h3>Metric contract</h3><p>All 12 tasks list input, target, primary metric, minimal baseline score, and neural MLP score from committed result files.</p><a href="data/summary_metrics.json">summary metrics</a></article>
|
| 2451 |
<article class="artifact"><h3>Leakage controls</h3><p>Scalers fit on train windows only; future labels, target-side signals, caption/object labels, and contact labels stay on the target side unless explicitly queried.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/scripts/build_evaluation_protocol.py">builder script</a></article>
|
| 2452 |
<article class="artifact"><h3>Audio ablation</h3><p>Audio and no-audio variants are evaluated across all 12 task contracts under the same chronological split.</p><a href="data/audio_ablation_summary.json">audio summary</a></article>
|
| 2453 |
+
<article class="artifact"><h3>Foundation branch selection</h3><p>Qwen3-Omni is the first trainable baseline, Cosmos 3 becomes the world-model branch, policy models wait for explicit action targets, and Xperience-native pretraining remains a later full-corpus goal.</p><a href="data/foundation_model_plan.json">backbone plan</a></article>
|
| 2454 |
<article class="artifact"><h3>Next evaluation stage</h3><p>This public-sample run covers single-episode task development. Cross-episode generalization, audio-visual learning, world modeling, policy targets, and held-out Qwen3-Omni training move to the multi-episode stage after selected data is prepared.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/DATA_ACCESS_STATUS.md">next-stage plan</a></article>
|
| 2455 |
<article class="artifact"><h3>Scale-up requirement</h3><p>The Omni pilot requires selected prepared episodes, held-out episode splits, no train/test episode leakage, training metadata, predictions, metrics, and a run report.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/DATA_ACCESS_STATUS.md">scale-up status</a></article>
|
| 2456 |
</div>
|
|
|
|
| 2504 |
<article class="evidence-card">
|
| 2505 |
<span class="status-pill">current plan</span>
|
| 2506 |
<h3>Foundation backbones are separated by role</h3>
|
| 2507 |
+
<p>Qwen3-Omni stays first for held-out LoRA; Cosmos 3 is the world-model branch; OpenVLA/openpi/GR00T are policy candidates after action-space conversion; Xperience-native pretraining is the later full-corpus goal.</p>
|
| 2508 |
<div class="evidence-links">
|
| 2509 |
<a href="data/foundation_model_plan.json">foundation model plan</a>
|
| 2510 |
<a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/FOUNDATION_MODEL_PLAN.md">plan doc</a>
|
| 2511 |
+
<a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md">pretraining plan</a>
|
| 2512 |
</div>
|
| 2513 |
</article>
|
| 2514 |
<article class="evidence-card">
|
|
|
|
| 2641 |
<article class="reading-card">
|
| 2642 |
<span class="step-index">04</span>
|
| 2643 |
<h3>Check the scale-up gate</h3>
|
| 2644 |
+
<p>The multi-episode Qwen3-Omni path is prepared. The selected 128-episode result will be added after staging, preprocessing, training, and held-out evaluation pass. The native-pretraining plan shows how this can grow into a full-corpus research direction.</p>
|
| 2645 |
<div class="reading-links">
|
| 2646 |
<a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/DATA_ACCESS_STATUS.md">scale-up status</a>
|
| 2647 |
<a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/MULTI_EPISODE_ACCESS_STATUS.md">data access</a>
|
| 2648 |
+
<a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md">native pretraining</a>
|
| 2649 |
<a href="data/project_packet.json">reader path</a>
|
| 2650 |
</div>
|
| 2651 |
</article>
|
|
|
|
| 2673 |
<article class="artifact"><h3>Current project subset</h3><p>One public sample episode, 5,821 frames, 1,161 aligned windows, 8,546-dimensional task inputs, and no raw-data redistribution.</p><a href="data/modality_atlas.json">modality atlas</a></article>
|
| 2674 |
<article class="artifact"><h3>Covered now</h3><p>Action/subtask labels, next-action prediction, temporal diagnostics, hand trajectory, contact, object relevance, caption grounding, retrieval, reconstruction, and misalignment.</p><a href="data/summary_metrics.json">summary metrics</a></article>
|
| 2675 |
<article class="artifact"><h3>Responsible use</h3><p>This project is for research exploration and excludes identity recognition, surveillance, biometric profiling, sensitive-attribute inference, and safety-critical deployment.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/DATA_NOTICE.md">use notes</a></article>
|
| 2676 |
+
<article class="artifact"><h3>Later milestones</h3><p>Full audio-visual learning, caption generation, depth-pixel prediction, SLAM estimation, neural rendering, policy learning, cross-episode generalization, held-out Qwen3-Omni evaluation, and future Xperience-native pretraining.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md">native pretraining</a></article>
|
| 2677 |
</div>
|
| 2678 |
</div>
|
| 2679 |
</section>
|
|
|
|
| 3117 |
</div>
|
| 3118 |
<div class="artifact-grid">
|
| 3119 |
<article class="artifact primary-artifact"><div><h3>Project scope</h3><p>Connects implemented single-episode artifacts, setup-stage Omni work, the selected 128-episode pilot, and later multi-episode milestones.</p></div><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/EVIDENCE_CONTRACT.md">evidence contract</a></article>
|
| 3120 |
+
<article class="artifact"><h3>Foundation-model plan</h3><p>Backbone selection matrix covering Qwen3-Omni, Cosmos 3, GR00T, OpenVLA/openpi, Gemini Robotics, Octo, SmolVLA-style policy candidates, and the future Xperience-native pretraining goal.</p><a href="data/foundation_model_plan.json">foundation model plan</a></article>
|
| 3121 |
<article class="artifact"><h3>Multi-episode data access</h3><p>Public data-access path, selected 128-episode pilot plan, and preparation requirements.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/MULTI_EPISODE_ACCESS_STATUS.md">data access</a></article>
|
| 3122 |
<article class="artifact"><h3>Qwen3-Omni preparation</h3><p>Episode selection and manifest preparation for the current scale-up path.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/episode_manifest.json">preparation details</a></article>
|
| 3123 |
<article class="artifact"><h3>Scale-up requirement</h3><p>What must be available before full pilot training and held-out metrics.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/DATA_ACCESS_STATUS.md">training requirements</a></article>
|
| 3124 |
+
<article class="artifact"><h3>Xperience-native pretraining</h3><p>Future plan for a domain-specific embodied foundation model trained from scratch over full-corpus video, audio, geometry, motion, inertial, and language streams.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md">pretraining plan</a></article>
|
| 3125 |
</div>
|
| 3126 |
</section>
|
| 3127 |
|
|
|
|
| 3138 |
<article class="artifact"><h3>Dataset notes</h3><p>Official dataset links, public sample source, modalities, access boundary, and current project subset.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/XPERIENCE10M_DATASET_CARD_ALIGNMENT.md">dataset notes</a></article>
|
| 3139 |
<article class="artifact"><h3>Reproducibility</h3><p>Commands and expected outputs for rebuilding the public-sample task suite and visual artifacts.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/REPRODUCIBILITY.md">reproduce</a></article>
|
| 3140 |
<article class="artifact"><h3>Qwen3-Omni status</h3><p>Data requirements and evaluation boundary for the selected multi-episode LoRA pilot.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/omni_finetune/DATA_ACCESS_STATUS.md">training status</a></article>
|
| 3141 |
+
<article class="artifact"><h3>Foundation-model plan</h3><p>Qwen3-Omni, Cosmos 3, GR00T, OpenVLA/openpi, Gemini Robotics, Octo, SmolVLA-style branches, and the Xperience-native pretraining goal by role.</p><a href="data/foundation_model_plan.json">model plan</a></article>
|
| 3142 |
<article class="artifact"><h3>Hub artifacts</h3><p>Derived CSV/JSON/Markdown/figure artifacts without redistributing raw Xperience-10M data.</p><a href="https://huggingface.co/datasets/cy0307/ropedia-xperience-10m-task-suite-artifacts">artifact dataset</a></article>
|
| 3143 |
<article class="artifact"><h3>Baseline models</h3><p>Lightweight minimal and neural task-head model files for the 12 task contracts.</p><a href="https://huggingface.co/cy0307/ropedia-xperience-10m-task-baselines">model repo</a></article>
|
| 3144 |
</div>
|
|
|
|
| 3158 |
<article class="artifact"><h3>Transfer</h3><p>Download raw episodes only from official gated sources, exclude visualization.rrd, validate files, then stage them for training.</p></article>
|
| 3159 |
<article class="artifact"><h3>Current LoRA artifact</h3><p>The current LoRA artifact uses the locally available sample data. The multi-episode result begins after selected data is prepared, preprocessed, trained, and evaluated on held-out sessions.</p></article>
|
| 3160 |
<article class="artifact"><h3>Backbone branches</h3><p>Qwen3-Omni is the immediate LoRA path; Cosmos 3 is the first world-model branch; GR00T/OpenVLA/openpi become policy branches after action targets are well-defined.</p><a href="data/foundation_model_plan.json">backbone plan</a></article>
|
| 3161 |
+
<article class="artifact"><h3>Native foundation model</h3><p>The long-term goal is a full-corpus Xperience Embodied Foundation Model trained on synchronized perception, geometry, motion, inertial, audio, and language streams after smaller scaling stages validate the approach.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md">pretraining plan</a></article>
|
| 3162 |
</div>
|
| 3163 |
</div>
|
| 3164 |
</section>
|
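Note on the "Leakage controls" card in the index.html diff above: the card says scalers are fit on train windows only, under a chronological split. The following is a minimal sketch of that idea, not the project's `build_evaluation_protocol.py`. The function names, the 70/30 split fraction, and the toy windows are all assumptions made for illustration.

```python
from statistics import fmean, pstdev


def chronological_split(windows, train_fraction=0.7):
    """Split time-ordered windows without shuffling (fraction is an assumption)."""
    cut = int(len(windows) * train_fraction)
    return windows[:cut], windows[cut:]


def fit_scaler(train_rows):
    """Compute per-feature mean and std from the training windows only."""
    columns = list(zip(*train_rows))
    means = [fmean(col) for col in columns]
    # Constant features get std 1.0 so the transform never divides by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def transform(rows, scaler):
    """Apply train-derived statistics to any split (train or held-out)."""
    means, stds = scaler
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


# Toy data: 10 windows, 2 features each, already in chronological order.
windows = [[float(t), 2.0] for t in range(10)]
train, test = chronological_split(windows)
scaler = fit_scaler(train)          # statistics never see the test windows
test_scaled = transform(test, scaler)
print(len(train), len(test), test_scaled[0])
```

The point of the sketch is the ordering of operations: the split happens first, and the held-out windows are only ever transformed, never used to estimate statistics.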
metrics/artifact_index.json
CHANGED
|
@@ -1,11 +1,11 @@
|
|
| 1 |
{
|
| 2 |
"title": "Ropedia Xperience-10M Task Suite Artifact Index",
|
| 3 |
-
"generated_at_utc": "2026-06-
|
| 4 |
"status": "pass",
|
| 5 |
-
"artifact_count":
|
| 6 |
"missing": [],
|
| 7 |
"by_kind": {
|
| 8 |
-
"project_path":
|
| 9 |
"project_scope": 1,
|
| 10 |
"source_alignment": 5,
|
| 11 |
"publication_workflow": 1,
|
|
@@ -62,8 +62,8 @@
|
|
| 62 |
"surface": "repo_hf",
|
| 63 |
"shows": "Gives a compact current-state table for first-pass readers.",
|
| 64 |
"exists": true,
|
| 65 |
-
"bytes":
|
| 66 |
-
"sha256": "
|
| 67 |
},
|
| 68 |
{
|
| 69 |
"id": "project_status_json",
|
|
@@ -73,8 +73,8 @@
|
|
| 73 |
"surface": "website_hf",
|
| 74 |
"shows": "Machine-readable copy of the current project status for website and HF mirrors.",
|
| 75 |
"exists": true,
|
| 76 |
-
"bytes":
|
| 77 |
-
"sha256": "
|
| 78 |
},
|
| 79 |
{
|
| 80 |
"id": "research_roadmap",
|
|
@@ -84,8 +84,8 @@
|
|
| 84 |
"surface": "repo_hf",
|
| 85 |
"shows": "Defines the path from public-sample task development to multi-episode held-out evaluation and larger omni-model extensions.",
|
| 86 |
"exists": true,
|
| 87 |
-
"bytes":
|
| 88 |
-
"sha256": "
|
| 89 |
},
|
| 90 |
{
|
| 91 |
"id": "research_roadmap_json",
|
|
@@ -95,8 +95,8 @@
|
|
| 95 |
"surface": "website_hf",
|
| 96 |
"shows": "Machine-readable research roadmap for the website and Hugging Face mirrors.",
|
| 97 |
"exists": true,
|
| 98 |
-
"bytes":
|
| 99 |
-
"sha256": "
|
| 100 |
},
|
| 101 |
{
|
| 102 |
"id": "foundation_model_plan",
|
|
@@ -106,8 +106,8 @@
|
|
| 106 |
"surface": "repo_hf",
|
| 107 |
"shows": "Defines the post-data-gate backbone choices: Qwen3-Omni first, Cosmos 3 for world modeling, and VLA/policy models after action-target conversion.",
|
| 108 |
"exists": true,
|
| 109 |
-
"bytes":
|
| 110 |
-
"sha256": "
|
| 111 |
},
|
| 112 |
{
|
| 113 |
"id": "foundation_model_plan_json",
|
|
@@ -117,8 +117,19 @@
|
|
| 117 |
"surface": "website_hf",
|
| 118 |
"shows": "Machine-readable foundation-model selection matrix with source links, entry conditions, and evaluation additions.",
|
| 119 |
"exists": true,
|
| 120 |
-
"bytes":
|
| 121 |
-
"sha256": "
|
| 122 |
},
|
| 123 |
{
|
| 124 |
"id": "evidence_contract",
|
|
@@ -150,8 +161,8 @@
|
|
| 150 |
"surface": "repo_hf",
|
| 151 |
"shows": "Gives the human-readable map from project scope to data, tasks, platform mirrors, and scale-up status.",
|
| 152 |
"exists": true,
|
| 153 |
-
"bytes":
|
| 154 |
-
"sha256": "
|
| 155 |
},
|
| 156 |
{
|
| 157 |
"id": "official_dataset_card_alignment",
|
|
@@ -195,7 +206,7 @@
|
|
| 195 |
"shows": "Machine-readable source-alignment pass/fail check for repo, website, and HF surfaces.",
|
| 196 |
"exists": true,
|
| 197 |
"bytes": 4432,
|
| 198 |
-
"sha256": "
|
| 199 |
},
|
| 200 |
{
|
| 201 |
"id": "source_alignment_validator",
|
|
@@ -573,8 +584,8 @@
|
|
| 573 |
"surface": "repo_hf",
|
| 574 |
"shows": "Generates the selective artifact catalog from local files.",
|
| 575 |
"exists": true,
|
| 576 |
-
"bytes":
|
| 577 |
-
"sha256": "
|
| 578 |
},
|
| 579 |
{
|
| 580 |
"id": "publication_audit",
|
|
@@ -585,7 +596,7 @@
|
|
| 585 |
"volatile": true,
|
| 586 |
"shows": "Confirms public bundles exclude raw data, caches, heavy archives, and credential text.",
|
| 587 |
"exists": true,
|
| 588 |
-
"bytes":
|
| 589 |
"hash_policy": "existence_and_size_only"
|
| 590 |
},
|
| 591 |
{
|
|
@@ -597,7 +608,7 @@
|
|
| 597 |
"volatile": true,
|
| 598 |
"shows": "Separates setup paths from completed held-out-episode results.",
|
| 599 |
"exists": true,
|
| 600 |
-
"bytes":
|
| 601 |
"hash_policy": "existence_and_size_only"
|
| 602 |
},
|
| 603 |
{
|
|
@@ -609,7 +620,7 @@
|
|
| 609 |
"volatile": true,
|
| 610 |
"shows": "Confirms prepared GitHub/HF Space/artifact/model mirrors share the same critical data, figure, website HTML, and validator files.",
|
| 611 |
"exists": true,
|
| 612 |
-
"bytes":
|
| 613 |
"hash_policy": "existence_and_size_only"
|
| 614 |
},
|
| 615 |
{
|
|
@@ -621,7 +632,7 @@
|
|
| 621 |
"volatile": true,
|
| 622 |
"shows": "Confirms local website links, anchors, JSON data files, and referenced images resolve.",
|
| 623 |
"exists": true,
|
| 624 |
-
"bytes":
|
| 625 |
"hash_policy": "existence_and_size_only"
|
| 626 |
},
|
| 627 |
{
|
|
|
|
| 1 |
{
|
| 2 |
"title": "Ropedia Xperience-10M Task Suite Artifact Index",
|
| 3 |
+
"generated_at_utc": "2026-06-04T20:40:52+00:00",
|
| 4 |
"status": "pass",
|
| 5 |
+
"artifact_count": 73,
|
| 6 |
"missing": [],
|
| 7 |
"by_kind": {
|
| 8 |
+
"project_path": 12,
|
| 9 |
"project_scope": 1,
|
| 10 |
"source_alignment": 5,
|
| 11 |
"publication_workflow": 1,
|
|
|
|
| 62 |
"surface": "repo_hf",
|
| 63 |
"shows": "Gives a compact current-state table for first-pass readers.",
|
| 64 |
"exists": true,
|
| 65 |
+
"bytes": 7207,
|
| 66 |
+
"sha256": "7baaba976ccc254da1a03ee2653057d1e08f3fb0c0cad035886c362442828720"
|
| 67 |
},
|
| 68 |
{
|
| 69 |
"id": "project_status_json",
|
|
|
|
| 73 |
"surface": "website_hf",
|
| 74 |
"shows": "Machine-readable copy of the current project status for website and HF mirrors.",
|
| 75 |
"exists": true,
|
| 76 |
+
"bytes": 9874,
|
| 77 |
+
"sha256": "600c95726eae3404127a8b2110f35468ff2ba02943cae0fbcd3ea43c66109d3e"
|
| 78 |
},
|
| 79 |
{
|
| 80 |
"id": "research_roadmap",
|
|
|
|
| 84 |
"surface": "repo_hf",
|
| 85 |
"shows": "Defines the path from public-sample task development to multi-episode held-out evaluation and larger omni-model extensions.",
|
| 86 |
"exists": true,
|
| 87 |
+
"bytes": 8388,
|
| 88 |
+
"sha256": "0b3e3356076998ad94dc39f708cc783a4ebeab76c9da661cdd37ea12a3bb3665"
|
| 89 |
},
|
| 90 |
{
|
| 91 |
"id": "research_roadmap_json",
|
|
|
|
| 95 |
"surface": "website_hf",
|
| 96 |
"shows": "Machine-readable research roadmap for the website and Hugging Face mirrors.",
|
| 97 |
"exists": true,
|
| 98 |
+
"bytes": 7161,
|
| 99 |
+
"sha256": "cc96118c2c05108c831616151bc027441f7545495adeeb6a4a6a6bffe8da7801"
|
| 100 |
},
|
| 101 |
{
|
| 102 |
"id": "foundation_model_plan",
|
|
|
|
| 106 |
"surface": "repo_hf",
|
| 107 |
"shows": "Defines the post-data-gate backbone choices: Qwen3-Omni first, Cosmos 3 for world modeling, and VLA/policy models after action-target conversion.",
|
| 108 |
"exists": true,
|
| 109 |
+
"bytes": 9075,
|
| 110 |
+
"sha256": "444d13ab556d2e16a199a7fca191b87c85ab8685d167aab357bc6341839299a2"
|
| 111 |
},
|
| 112 |
{
|
| 113 |
"id": "foundation_model_plan_json",
|
|
|
|
| 117 |
"surface": "website_hf",
|
| 118 |
"shows": "Machine-readable foundation-model selection matrix with source links, entry conditions, and evaluation additions.",
|
| 119 |
"exists": true,
|
| 120 |
+
"bytes": 12981,
|
| 121 |
+
"sha256": "9cce52025a2e2f8afb4660e2af3353aea6ad0a1af380849218dd74c0acc271bb"
|
| 122 |
+
},
|
| 123 |
+
{
|
| 124 |
+
"id": "xperience_embodied_foundation_pretraining",
|
| 125 |
+
"title": "Xperience Embodied Foundation Model pretraining goal",
|
| 126 |
+
"path": "XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md",
|
| 127 |
+
"kind": "project_path",
|
| 128 |
+
"surface": "repo_hf",
|
| 129 |
+
"shows": "Describes the future full-corpus Xperience-native pretraining goal, target modules, objectives, staged scale-up, hardware ranges, and evaluation protocol.",
|
| 130 |
+
"exists": true,
|
| 131 |
+
"bytes": 9182,
|
| 132 |
+
"sha256": "b5a6ddc58647cd895a4772b110ecc9f4d685427fb37b81b22c6c02d2b9b323f1"
|
| 133 |
},
|
| 134 |
{
|
| 135 |
"id": "evidence_contract",
|
|
|
|
| 161 |
"surface": "repo_hf",
|
| 162 |
"shows": "Gives the human-readable map from project scope to data, tasks, platform mirrors, and scale-up status.",
|
| 163 |
"exists": true,
|
| 164 |
+
"bytes": 11440,
|
| 165 |
+
"sha256": "9b8821a9b14fe1744f2e6b5c419b2c5daaf70b57f1944caf1105c36c0c66c119"
|
| 166 |
},
|
| 167 |
{
|
| 168 |
"id": "official_dataset_card_alignment",
|
|
|
|
| 206 |
"shows": "Machine-readable source-alignment pass/fail check for repo, website, and HF surfaces.",
|
| 207 |
"exists": true,
|
| 208 |
"bytes": 4432,
|
| 209 |
+
"sha256": "06c6e2d111c72df01ed127fd288e6675b63e35a21ae12a2523931a072bd0bc49"
|
| 210 |
},
|
| 211 |
{
|
| 212 |
"id": "source_alignment_validator",
|
|
|
|
| 584 |
"surface": "repo_hf",
|
| 585 |
"shows": "Generates the selective artifact catalog from local files.",
|
| 586 |
"exists": true,
|
| 587 |
+
"bytes": 27020,
|
| 588 |
+
"sha256": "0ca7ed96f24caecbab31687cffa99f0eba8471258986412a294614e688c5aff5"
|
| 589 |
},
|
| 590 |
{
|
| 591 |
"id": "publication_audit",
|
|
|
|
| 596 |
"volatile": true,
|
| 597 |
"shows": "Confirms public bundles exclude raw data, caches, heavy archives, and credential text.",
|
| 598 |
"exists": true,
|
| 599 |
+
"bytes": 11811,
|
| 600 |
"hash_policy": "existence_and_size_only"
|
| 601 |
},
|
| 602 |
{
|
|
|
|
| 608 |
"volatile": true,
|
| 609 |
"shows": "Separates setup paths from completed held-out-episode results.",
|
| 610 |
"exists": true,
|
| 611 |
+
"bytes": 18981,
|
| 612 |
"hash_policy": "existence_and_size_only"
|
| 613 |
},
|
| 614 |
{
|
|
|
|
| 620 |
"volatile": true,
|
| 621 |
"shows": "Confirms prepared GitHub/HF Space/artifact/model mirrors share the same critical data, figure, website HTML, and validator files.",
|
| 622 |
"exists": true,
|
| 623 |
+
"bytes": 108621,
|
| 624 |
"hash_policy": "existence_and_size_only"
|
| 625 |
},
|
| 626 |
{
|
|
|
|
| 632 |
"volatile": true,
|
| 633 |
"shows": "Confirms local website links, anchors, JSON data files, and referenced images resolve.",
|
| 634 |
"exists": true,
|
| 635 |
+
"bytes": 14891,
|
| 636 |
"hash_policy": "existence_and_size_only"
|
| 637 |
},
|
| 638 |
{
|
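The index entries above record `exists`, `bytes`, and `sha256` for stable files, while entries marked `volatile` carry `"hash_policy": "existence_and_size_only"` instead of a hash. A minimal sketch of how such a record could be produced is given here. `describe_file` is a hypothetical helper written for illustration; it is not the logic in `scripts/build_artifact_index.py`, which this diff does not show.

```python
import hashlib
import tempfile
from pathlib import Path


def describe_file(path: Path, volatile: bool = False) -> dict:
    """Build an index-style record for one file (hypothetical helper)."""
    record = {"path": path.name, "exists": path.is_file()}
    if not record["exists"]:
        return record
    data = path.read_bytes()
    record["bytes"] = len(data)
    if volatile:
        # Volatile files change on every build, so only existence and size are pinned.
        record["volatile"] = True
        record["hash_policy"] = "existence_and_size_only"
    else:
        record["sha256"] = hashlib.sha256(data).hexdigest()
    return record


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "example.json"
    sample.write_bytes(b'{"status": "pass"}\n')
    stable = describe_file(sample)
    volatile = describe_file(sample, volatile=True)
    missing = describe_file(Path(tmp) / "absent.json")
```

Skipping the hash for volatile files is what would keep a self-referential report (one that embeds timestamps or other hashes) from invalidating the index each time it is regenerated.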
metrics/foundation_model_plan.json
CHANGED
|
@@ -2,6 +2,16 @@
|
|
| 2 |
"title": "Xperience-10M Foundation Model Plan",
|
| 3 |
"status": "planning_artifact",
|
| 4 |
"current_boundary": "No held-out multi-episode foundation-model result has been completed in this repo. The current foundation-model artifacts are setup-stage until enough valid episodes are prepared and evaluated.",
|
| 5 |
"decision": {
|
| 6 |
"immediate_trainable_backbone": "Qwen3-Omni",
|
| 7 |
"first_world_model_branch": "Cosmos 3",
|
|
@@ -10,7 +20,65 @@
|
|
| 10 |
"openpi pi0/pi0.5",
|
| 11 |
"NVIDIA GR00T"
|
| 12 |
],
|
| 13 |
-
"external_reasoning_reference": "Gemini Robotics"
|
| 14 |
},
|
| 15 |
"model_families": [
|
| 16 |
{
|
|
@@ -112,6 +180,21 @@
|
|
| 112 |
"current_decision": "optional_baseline_after_data_staging",
|
| 113 |
"entry_condition": "Action labels and baseline protocol exist.",
|
| 114 |
"public_source": "https://github.com/huggingface/lerobot"
|
| 115 |
}
|
| 116 |
],
|
| 117 |
"execution_order": [
|
|
@@ -144,6 +227,11 @@
|
|
| 144 |
"step": 6,
|
| 145 |
"name": "Publishing threshold",
|
| 146 |
"action": "Publish branch results only with real manifests, predictions, metrics, and qualitative examples."
|
| 147 |
}
|
| 148 |
],
|
| 149 |
"evaluation_additions": [
|
|
@@ -230,6 +318,10 @@
|
|
| 230 |
{
|
| 231 |
"label": "LeRobot / SmolVLA",
|
| 232 |
"url": "https://github.com/huggingface/lerobot"
|
| 233 |
}
|
| 234 |
]
|
| 235 |
}
|
|
|
|
| 2 |
"title": "Xperience-10M Foundation Model Plan",
|
| 3 |
"status": "planning_artifact",
|
| 4 |
"current_boundary": "No held-out multi-episode foundation-model result has been completed in this repo. The current foundation-model artifacts are setup-stage until enough valid episodes are prepared and evaluated.",
|
| 5 |
+
"backbone_registry": {
|
| 6 |
+
"config_dir": "configs/omni_backbones",
|
| 7 |
+
"validator": "scripts/omni/backbone_registry.py --validate --json",
|
| 8 |
+
"extension_contract": "OMNI_MODEL_EXTENSION_CONTRACT.md",
|
| 9 |
+
"implemented_backbone": "qwen3_omni_lora",
|
| 10 |
+
"planned_backbones": [
|
| 11 |
+
"cosmos_world_model",
|
| 12 |
+
"policy_vla_branch"
|
| 13 |
+
]
|
| 14 |
+
},
|
| 15 |
"decision": {
|
| 16 |
"immediate_trainable_backbone": "Qwen3-Omni",
|
| 17 |
"first_world_model_branch": "Cosmos 3",
|
|
|
|
| 20 |
"openpi pi0/pi0.5",
|
| 21 |
"NVIDIA GR00T"
|
| 22 |
],
|
| 23 |
+
"external_reasoning_reference": "Gemini Robotics",
|
| 24 |
+
"long_term_native_pretraining_goal": "Xperience Embodied Foundation Model"
|
| 25 |
+
},
|
| 26 |
+
"future_pretraining_goal": {
|
| 27 |
+
"name": "Xperience Embodied Foundation Model",
|
| 28 |
+
"status": "future_planning_goal",
|
| 29 |
+
"role": "Domain-specific embodied foundation model pretrained on full Xperience-10M if full-corpus data, storage, and compute become available.",
|
| 30 |
+
"not_current_result": true,
|
| 31 |
+
"document": "XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md",
|
| 32 |
+
"entry_conditions": [
|
| 33 |
+
"Selected multi-episode Qwen3-Omni pilot trains and evaluates cleanly.",
|
| 34 |
+
"Scaling from 128 episodes to thousands of episodes shows measurable value.",
|
| 35 |
+
"Full-corpus storage, derived-shard storage, and fast active-cache capacity are available.",
|
| 36 |
+
"Distributed training, checkpoint/restart, and provenance tracking are reliable.",
|
| 37 |
+
"Evaluation covers held-out episodes, sessions, activities, objects, and missing-modality robustness."
|
| 38 |
+
],
|
| 39 |
+
"target_modules": [
|
| 40 |
+
"multi-view video encoder",
|
| 41 |
+
"audio encoder",
|
| 42 |
+
"depth and geometry encoder",
|
| 43 |
+
"pose/SLAM encoder",
|
| 44 |
+
"hand/body mocap encoder",
|
| 45 |
+
"IMU encoder",
|
| 46 |
+
"language encoder/decoder",
|
| 47 |
+
"temporal fusion transformer",
|
| 48 |
+
"task heads and decoders"
|
| 49 |
+
],
|
| 50 |
+
"pretraining_objectives": [
|
| 51 |
+
"masked multimodal modeling",
|
| 52 |
+
"cross-modal contrastive alignment",
|
| 53 |
+
"future-state prediction",
|
| 54 |
+
"ego-motion and hand-motion forecasting",
|
| 55 |
+
"action and procedure prediction",
|
| 56 |
+
"language grounding and captioning",
|
| 57 |
+
"contact and affordance prediction",
|
| 58 |
+
"optional policy-style targets after action conversion"
|
| 59 |
+
],
|
| 60 |
+
"hardware_ranges": [
|
| 61 |
+
{
|
| 62 |
+
"goal": "0.3B-1B pilot",
|
| 63 |
+
"compute": "8-32 modern 80GB-class data-center GPUs",
|
| 64 |
+
"use": "prove objectives and data loaders"
|
| 65 |
+
},
|
| 66 |
+
{
|
| 67 |
+
"goal": "1B-3B domain model",
|
| 68 |
+
"compute": "32-128 GPUs",
|
| 69 |
+
"use": "research-scale Xperience representation learning"
|
| 70 |
+
},
|
| 71 |
+
{
|
| 72 |
+
"goal": "3B-7B full-corpus domain model",
|
| 73 |
+
"compute": "128-512 GPUs",
|
| 74 |
+
"use": "first realistic full Xperience-native foundation model"
|
| 75 |
+
},
|
| 76 |
+
{
|
| 77 |
+
"goal": "30B-class omni model from scratch",
|
| 78 |
+
"compute": "512-2000+ GPUs",
|
| 79 |
+
"use": "lab-scale project after scaling curves justify cost"
|
| 80 |
+
}
|
| 81 |
+
]
|
| 82 |
},
|
| 83 |
"model_families": [
|
| 84 |
{
|
|
|
|
| 180 |
"current_decision": "optional_baseline_after_data_staging",
|
| 181 |
"entry_condition": "Action labels and baseline protocol exist.",
|
| 182 |
"public_source": "https://github.com/huggingface/lerobot"
|
| 183 |
+
},
|
| 184 |
+
{
|
| 185 |
+
"priority": 8,
|
| 186 |
+
"family": "Xperience Embodied Foundation Model",
|
| 187 |
+
"category": "xperience_native_pretraining_goal",
|
| 188 |
+
"openness": "future project-specific model if full-corpus access and compute exist",
|
| 189 |
+
"best_role": "Domain model over synchronized embodied experience.",
|
| 190 |
+
"xperience10m_fit": [
|
| 191 |
+
"Uses the full aligned modality stack rather than treating sensors as auxiliary metadata.",
|
| 192 |
+
"Targets temporal embodied representation learning across perception, motion, geometry, audio, and language.",
|
| 193 |
+
"Can become the shared pretraining backbone for Qwen-style instruction tasks, Cosmos-style world modeling, and policy/action branches."
|
| 194 |
+
],
|
| 195 |
+
"current_decision": "future_goal_after_scaling_evidence",
|
| 196 |
+
"entry_condition": "Full-corpus data path, PB-scale storage, multi-node compute, and positive smaller-run scaling evidence.",
|
| 197 |
+
"public_source": "XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md"
|
| 198 |
}
|
| 199 |
],
|
| 200 |
"execution_order": [
|
|
|
|
| 227 |
"step": 6,
|
| 228 |
"name": "Publishing threshold",
|
| 229 |
"action": "Publish branch results only with real manifests, predictions, metrics, and qualitative examples."
|
| 230 |
+
},
|
| 231 |
+
{
|
| 232 |
+
"step": 7,
|
| 233 |
+
"name": "Xperience-native pretraining",
|
| 234 |
+
"action": "Start a from-scratch Xperience Embodied Foundation Model only after smaller scaling stages, full-corpus storage, multi-node compute, and held-out evaluation protocols are in place."
|
| 235 |
}
|
| 236 |
],
|
| 237 |
"evaluation_additions": [
|
|
|
|
| 318 |
{
|
| 319 |
"label": "LeRobot / SmolVLA",
|
| 320 |
"url": "https://github.com/huggingface/lerobot"
|
| 321 |
+
},
|
| 322 |
+
{
|
| 323 |
+
"label": "Xperience Embodied Foundation Model pretraining plan",
|
| 324 |
+
"url": "XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md"
|
| 325 |
}
|
| 326 |
]
|
| 327 |
}
|
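The `hardware_ranges` entries added above encode GPU counts as free-text strings. As an illustration only, the sketch below parses those strings into numeric bounds and checks that each tier starts where the previous one ends. `gpu_bounds` is a hypothetical helper and not part of the repository's tooling; the `goal` and `compute` values are copied from the diff.

```python
import re

HARDWARE_RANGES = [
    {"goal": "0.3B-1B pilot", "compute": "8-32 modern 80GB-class data-center GPUs"},
    {"goal": "1B-3B domain model", "compute": "32-128 GPUs"},
    {"goal": "3B-7B full-corpus domain model", "compute": "128-512 GPUs"},
    {"goal": "30B-class omni model from scratch", "compute": "512-2000+ GPUs"},
]


def gpu_bounds(compute: str) -> tuple[int, int, bool]:
    """Return (low, high, open_ended) parsed from a leading 'LOW-HIGH[+]' prefix."""
    match = re.match(r"(\d+)-(\d+)(\+?)", compute)
    if match is None:
        raise ValueError(f"no GPU range found in {compute!r}")
    low, high, plus = match.groups()
    return int(low), int(high), plus == "+"


bounds = [gpu_bounds(row["compute"]) for row in HARDWARE_RANGES]

# Each tier's lower bound should equal the previous tier's upper bound.
contiguous = all(prev[1] == cur[0] for prev, cur in zip(bounds, bounds[1:]))
```

If the plan later stores numeric fields directly, a parser like this becomes unnecessary; it is only useful while the ranges remain human-readable strings.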
metrics/mirror_parity.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
| 1 |
{
|
| 2 |
"status": "pass",
|
| 3 |
-
"generated_at_utc": "2026-06-
|
| 4 |
"hf_root": "hf_publish",
|
| 5 |
"summary": {
|
| 6 |
"group_count": 101,
|
|
@@ -71,27 +71,27 @@
|
|
| 71 |
"local": {
|
| 72 |
"path": "repo:docs/data/artifact_index.json",
|
| 73 |
"exists": true,
|
| 74 |
-
"bytes":
|
| 75 |
-
"sha256": "
|
| 76 |
},
|
| 77 |
"mirrors": {
|
| 78 |
"hf_space": {
|
| 79 |
"path": "hf_space:data/artifact_index.json",
|
| 80 |
"exists": true,
|
| 81 |
-
"bytes":
|
| 82 |
-
"sha256": "
|
| 83 |
},
|
| 84 |
"hf_artifacts": {
|
| 85 |
"path": "hf_artifacts:docs/data/artifact_index.json",
|
| 86 |
"exists": true,
|
| 87 |
-
"bytes":
|
| 88 |
-
"sha256": "
|
| 89 |
},
|
| 90 |
"hf_model": {
|
| 91 |
"path": "hf_model:metrics/artifact_index.json",
|
| 92 |
"exists": true,
|
| 93 |
-
"bytes":
|
| 94 |
-
"sha256": "
|
| 95 |
}
|
| 96 |
},
|
| 97 |
"failures": []
|
|
@@ -226,27 +226,27 @@
|
|
| 226 |
"local": {
|
| 227 |
"path": "repo:docs/data/foundation_model_plan.json",
|
| 228 |
"exists": true,
|
| 229 |
-
"bytes":
|
| 230 |
-
"sha256": "
|
| 231 |
},
|
| 232 |
"mirrors": {
|
| 233 |
"hf_space": {
|
| 234 |
"path": "hf_space:data/foundation_model_plan.json",
|
| 235 |
"exists": true,
|
| 236 |
-
"bytes":
|
| 237 |
-
"sha256": "
|
| 238 |
},
|
| 239 |
"hf_artifacts": {
|
| 240 |
"path": "hf_artifacts:docs/data/foundation_model_plan.json",
|
| 241 |
"exists": true,
|
| 242 |
-
"bytes":
|
| 243 |
-
"sha256": "
|
| 244 |
},
|
| 245 |
"hf_model": {
|
| 246 |
"path": "hf_model:metrics/foundation_model_plan.json",
|
| 247 |
"exists": true,
|
| 248 |
-
"bytes":
|
| 249 |
-
"sha256": "
|
| 250 |
}
|
| 251 |
},
|
| 252 |
"failures": []
|
|
@@ -412,27 +412,27 @@
|
|
| 412 |
"local": {
|
| 413 |
"path": "repo:docs/data/project_status.json",
|
| 414 |
"exists": true,
|
| 415 |
-
"bytes":
|
| 416 |
-
"sha256": "
|
| 417 |
},
|
| 418 |
"mirrors": {
|
| 419 |
"hf_space": {
|
| 420 |
"path": "hf_space:data/project_status.json",
|
| 421 |
"exists": true,
|
| 422 |
-
"bytes":
|
| 423 |
-
"sha256": "
|
| 424 |
},
|
| 425 |
"hf_artifacts": {
|
| 426 |
"path": "hf_artifacts:docs/data/project_status.json",
|
| 427 |
"exists": true,
|
| 428 |
-
"bytes":
|
| 429 |
-
"sha256": "
|
| 430 |
},
|
| 431 |
"hf_model": {
|
| 432 |
"path": "hf_model:metrics/project_status.json",
|
| 433 |
"exists": true,
|
| 434 |
-
"bytes":
|
| 435 |
-
"sha256": "
|
| 436 |
}
|
| 437 |
},
|
| 438 |
"failures": []
|
|
@@ -444,26 +444,26 @@
|
|
| 444 |
"path": "repo:docs/data/publication_audit.json",
|
| 445 |
"exists": true,
|
| 446 |
"bytes": 7237,
|
| 447 |
-
"sha256": "
|
| 448 |
},
|
| 449 |
"mirrors": {
|
| 450 |
"hf_space": {
|
| 451 |
"path": "hf_space:data/publication_audit.json",
|
| 452 |
"exists": true,
|
| 453 |
"bytes": 7237,
|
| 454 |
-
"sha256": "
|
| 455 |
},
|
| 456 |
"hf_artifacts": {
|
| 457 |
"path": "hf_artifacts:docs/data/publication_audit.json",
|
| 458 |
"exists": true,
|
| 459 |
"bytes": 7237,
|
| 460 |
-
"sha256": "
|
| 461 |
},
|
| 462 |
"hf_model": {
|
| 463 |
"path": "hf_model:metrics/publication_audit.json",
|
| 464 |
"exists": true,
|
| 465 |
"bytes": 7237,
|
| 466 |
-
"sha256": "
|
| 467 |
}
|
| 468 |
},
|
| 469 |
"failures": []
|
|
@@ -598,27 +598,27 @@
|
|
| 598 |
"local": {
|
| 599 |
"path": "repo:docs/data/research_roadmap.json",
|
| 600 |
"exists": true,
|
| 601 |
-
"bytes":
|
| 602 |
-
"sha256": "
|
| 603 |
},
|
| 604 |
"mirrors": {
|
| 605 |
"hf_space": {
|
| 606 |
"path": "hf_space:data/research_roadmap.json",
|
| 607 |
"exists": true,
|
| 608 |
-
"bytes":
|
| 609 |
-
"sha256": "
|
| 610 |
},
|
| 611 |
"hf_artifacts": {
|
| 612 |
"path": "hf_artifacts:docs/data/research_roadmap.json",
|
| 613 |
"exists": true,
|
| 614 |
-
"bytes":
|
| 615 |
-
"sha256": "
|
| 616 |
},
|
| 617 |
"hf_model": {
|
| 618 |
"path": "hf_model:metrics/research_roadmap.json",
|
| 619 |
"exists": true,
|
| 620 |
-
"bytes":
|
| 621 |
-
"sha256": "
|
| 622 |
}
|
| 623 |
},
|
| 624 |
"failures": []
|
|
@@ -629,27 +629,27 @@
|
|
| 629 |
"local": {
|
| 630 |
"path": "repo:docs/data/research_roadmap_interactive.json",
|
| 631 |
"exists": true,
|
| 632 |
-
"bytes":
|
| 633 |
-
"sha256": "
|
| 634 |
},
|
| 635 |
"mirrors": {
|
| 636 |
"hf_space": {
|
| 637 |
"path": "hf_space:data/research_roadmap_interactive.json",
|
| 638 |
"exists": true,
|
| 639 |
-
"bytes":
|
| 640 |
-
"sha256": "
|
| 641 |
},
|
| 642 |
"hf_artifacts": {
|
| 643 |
"path": "hf_artifacts:docs/data/research_roadmap_interactive.json",
|
| 644 |
"exists": true,
|
| 645 |
-
"bytes":
|
| 646 |
-
"sha256": "
|
| 647 |
},
|
| 648 |
"hf_model": {
|
| 649 |
"path": "hf_model:metrics/research_roadmap_interactive.json",
|
| 650 |
"exists": true,
|
| 651 |
-
"bytes":
|
| 652 |
-
"sha256": "
|
| 653 |
}
|
| 654 |
},
|
| 655 |
"failures": []
|
|
@@ -1692,21 +1692,21 @@
|
|
| 1692 |
"local": {
|
| 1693 |
"path": "repo:scripts/build_artifact_index.py",
|
| 1694 |
"exists": true,
|
| 1695 |
-
"bytes":
|
| 1696 |
-
"sha256": "
|
| 1697 |
},
|
| 1698 |
"mirrors": {
|
| 1699 |
"hf_artifacts": {
|
| 1700 |
"path": "hf_artifacts:scripts/build_artifact_index.py",
|
| 1701 |
"exists": true,
|
| 1702 |
-
"bytes":
|
| 1703 |
-
"sha256": "
|
| 1704 |
},
|
| 1705 |
"hf_model": {
|
| 1706 |
"path": "hf_model:scripts/build_artifact_index.py",
|
| 1707 |
"exists": true,
|
| 1708 |
-
"bytes":
|
| 1709 |
-
"sha256": "
|
| 1710 |
}
|
| 1711 |
},
|
| 1712 |
"failures": []
|
|
@@ -2017,21 +2017,21 @@
|
|
| 2017 |
"local": {
|
| 2018 |
"path": "repo:scripts/validate_publication_package.py",
|
| 2019 |
"exists": true,
|
| 2020 |
-
"bytes":
|
| 2021 |
-
"sha256": "
|
| 2022 |
},
|
| 2023 |
"mirrors": {
|
| 2024 |
"hf_artifacts": {
|
| 2025 |
"path": "hf_artifacts:scripts/validate_publication_package.py",
|
| 2026 |
"exists": true,
|
| 2027 |
-
"bytes":
|
| 2028 |
-
"sha256": "
|
| 2029 |
},
|
| 2030 |
"hf_model": {
|
| 2031 |
"path": "hf_model:scripts/validate_publication_package.py",
|
| 2032 |
"exists": true,
|
| 2033 |
-
"bytes":
|
| 2034 |
-
"sha256": "
|
| 2035 |
}
|
| 2036 |
},
|
| 2037 |
"failures": []
|
|
@@ -2217,21 +2217,21 @@
|
|
| 2217 |
"local": {
|
| 2218 |
"path": "repo:docs/index.html",
|
| 2219 |
"exists": true,
|
| 2220 |
-
"bytes":
|
| 2221 |
-
"sha256": "
|
| 2222 |
},
|
| 2223 |
"mirrors": {
|
| 2224 |
"hf_space": {
|
| 2225 |
"path": "hf_space:index.html",
|
| 2226 |
"exists": true,
|
| 2227 |
-
"bytes":
|
| 2228 |
-
"sha256": "
|
| 2229 |
},
|
| 2230 |
"hf_artifacts_docs": {
|
| 2231 |
"path": "hf_artifacts:docs/index.html",
|
| 2232 |
"exists": true,
|
| 2233 |
-
"bytes":
|
| 2234 |
-
"sha256": "
|
| 2235 |
}
|
| 2236 |
},
|
| 2237 |
"failures": []
|
|
@@ -2242,21 +2242,21 @@
|
|
| 2242 |
"local": {
|
| 2243 |
"path": "repo:docs/research_roadmap.html",
|
| 2244 |
"exists": true,
|
| 2245 |
-
"bytes":
|
| 2246 |
-
"sha256": "
|
| 2247 |
},
|
| 2248 |
"mirrors": {
|
| 2249 |
"hf_space": {
|
| 2250 |
"path": "hf_space:research_roadmap.html",
|
| 2251 |
"exists": true,
|
| 2252 |
-
"bytes":
|
| 2253 |
-
"sha256": "
|
| 2254 |
},
|
| 2255 |
"hf_artifacts_docs": {
|
| 2256 |
"path": "hf_artifacts:docs/research_roadmap.html",
|
| 2257 |
"exists": true,
|
| 2258 |
-
"bytes":
|
| 2259 |
-
"sha256": "
|
| 2260 |
}
|
| 2261 |
},
|
| 2262 |
"failures": []
|
|
@@ -2844,27 +2844,27 @@
|
|
| 2844 |
"local": {
|
| 2845 |
"path": "repo:FOUNDATION_MODEL_PLAN.md",
|
| 2846 |
"exists": true,
|
| 2847 |
-
"bytes":
|
| 2848 |
-
"sha256": "
|
| 2849 |
},
|
| 2850 |
"mirrors": {
|
| 2851 |
"hf_space": {
|
| 2852 |
"path": "hf_space:FOUNDATION_MODEL_PLAN.md",
|
| 2853 |
"exists": true,
|
| 2854 |
-
"bytes":
|
| 2855 |
-
"sha256": "
|
| 2856 |
},
|
| 2857 |
"hf_artifacts": {
|
| 2858 |
"path": "hf_artifacts:FOUNDATION_MODEL_PLAN.md",
|
| 2859 |
"exists": true,
|
| 2860 |
-
"bytes":
|
| 2861 |
-
"sha256": "
|
| 2862 |
},
|
| 2863 |
"hf_model": {
|
| 2864 |
"path": "hf_model:FOUNDATION_MODEL_PLAN.md",
|
| 2865 |
"exists": true,
|
| 2866 |
-
"bytes":
|
| 2867 |
-
"sha256": "
|
| 2868 |
}
|
| 2869 |
},
|
| 2870 |
"failures": []
|
|
@@ -2937,27 +2937,27 @@
|
|
| 2937 |
"local": {
|
| 2938 |
"path": "repo:RESEARCH_ROADMAP.md",
|
| 2939 |
"exists": true,
|
| 2940 |
-
"bytes":
|
| 2941 |
-
"sha256": "
|
| 2942 |
},
|
| 2943 |
"mirrors": {
|
| 2944 |
"hf_space": {
|
| 2945 |
"path": "hf_space:RESEARCH_ROADMAP.md",
|
| 2946 |
"exists": true,
|
| 2947 |
-
"bytes":
|
| 2948 |
-
"sha256": "
|
| 2949 |
},
|
| 2950 |
"hf_artifacts": {
|
| 2951 |
"path": "hf_artifacts:RESEARCH_ROADMAP.md",
|
| 2952 |
"exists": true,
|
| 2953 |
-
"bytes":
|
| 2954 |
-
"sha256": "
|
| 2955 |
},
|
| 2956 |
"hf_model": {
|
| 2957 |
"path": "hf_model:RESEARCH_ROADMAP.md",
|
| 2958 |
"exists": true,
|
| 2959 |
-
"bytes":
|
| 2960 |
-
"sha256": "
|
| 2961 |
}
|
| 2962 |
},
|
| 2963 |
"failures": []
|
|
@@ -2968,27 +2968,27 @@
|
|
| 2968 |
"local": {
|
| 2969 |
"path": "repo:PROJECT_STATUS.md",
|
| 2970 |
"exists": true,
|
| 2971 |
-
"bytes":
|
| 2972 |
-
"sha256": "
|
| 2973 |
},
|
| 2974 |
"mirrors": {
|
| 2975 |
"hf_space": {
|
| 2976 |
"path": "hf_space:PROJECT_STATUS.md",
|
| 2977 |
"exists": true,
|
| 2978 |
-
"bytes":
|
| 2979 |
-
"sha256": "
|
| 2980 |
},
|
| 2981 |
"hf_artifacts": {
|
| 2982 |
"path": "hf_artifacts:PROJECT_STATUS.md",
|
| 2983 |
"exists": true,
|
| 2984 |
-
"bytes":
|
| 2985 |
-
"sha256": "
|
| 2986 |
},
|
| 2987 |
"hf_model": {
|
| 2988 |
"path": "hf_model:PROJECT_STATUS.md",
|
| 2989 |
"exists": true,
|
| 2990 |
-
"bytes":
|
| 2991 |
-
"sha256": "
|
| 2992 |
}
|
| 2993 |
},
|
| 2994 |
"failures": []
|
|
|
|
| 1 |
{
|
| 2 |
"status": "pass",
|
| 3 |
+
"generated_at_utc": "2026-06-04T20:45:22+00:00",
|
| 4 |
"hf_root": "hf_publish",
|
| 5 |
"summary": {
|
| 6 |
"group_count": 101,
|
|
|
|
| 71 |
"local": {
|
| 72 |
"path": "repo:docs/data/artifact_index.json",
|
| 73 |
"exists": true,
|
| 74 |
+
"bytes": 32864,
|
| 75 |
+
"sha256": "ec7d17898c42fd76109567c201f9638059b6a9a11a48817b32677a0eb2662178"
|
| 76 |
},
|
| 77 |
"mirrors": {
|
| 78 |
"hf_space": {
|
| 79 |
"path": "hf_space:data/artifact_index.json",
|
| 80 |
"exists": true,
|
| 81 |
+
"bytes": 32864,
|
| 82 |
+
"sha256": "ec7d17898c42fd76109567c201f9638059b6a9a11a48817b32677a0eb2662178"
|
| 83 |
},
|
| 84 |
"hf_artifacts": {
|
| 85 |
"path": "hf_artifacts:docs/data/artifact_index.json",
|
| 86 |
"exists": true,
|
| 87 |
+
"bytes": 32864,
|
| 88 |
+
"sha256": "ec7d17898c42fd76109567c201f9638059b6a9a11a48817b32677a0eb2662178"
|
| 89 |
},
|
| 90 |
"hf_model": {
|
| 91 |
"path": "hf_model:metrics/artifact_index.json",
|
| 92 |
"exists": true,
|
| 93 |
+
"bytes": 32864,
|
| 94 |
+
"sha256": "ec7d17898c42fd76109567c201f9638059b6a9a11a48817b32677a0eb2662178"
|
| 95 |
}
|
| 96 |
},
|
| 97 |
"failures": []
|
|
|
|
| 226 |
"local": {
|
| 227 |
"path": "repo:docs/data/foundation_model_plan.json",
|
| 228 |
"exists": true,
|
| 229 |
+
"bytes": 12981,
|
| 230 |
+
"sha256": "9cce52025a2e2f8afb4660e2af3353aea6ad0a1af380849218dd74c0acc271bb"
|
| 231 |
},
|
| 232 |
"mirrors": {
|
| 233 |
"hf_space": {
|
| 234 |
"path": "hf_space:data/foundation_model_plan.json",
|
| 235 |
"exists": true,
|
| 236 |
+
"bytes": 12981,
|
| 237 |
+
"sha256": "9cce52025a2e2f8afb4660e2af3353aea6ad0a1af380849218dd74c0acc271bb"
|
| 238 |
},
|
| 239 |
"hf_artifacts": {
|
| 240 |
"path": "hf_artifacts:docs/data/foundation_model_plan.json",
|
| 241 |
"exists": true,
|
| 242 |
+
"bytes": 12981,
|
| 243 |
+
"sha256": "9cce52025a2e2f8afb4660e2af3353aea6ad0a1af380849218dd74c0acc271bb"
|
| 244 |
},
|
| 245 |
"hf_model": {
|
| 246 |
"path": "hf_model:metrics/foundation_model_plan.json",
|
| 247 |
"exists": true,
|
| 248 |
+
"bytes": 12981,
|
| 249 |
+
"sha256": "9cce52025a2e2f8afb4660e2af3353aea6ad0a1af380849218dd74c0acc271bb"
|
| 250 |
}
|
| 251 |
},
|
| 252 |
"failures": []
|
|
|
|
| 412 |
"local": {
|
| 413 |
"path": "repo:docs/data/project_status.json",
|
| 414 |
"exists": true,
|
| 415 |
+
"bytes": 9874,
|
| 416 |
+
"sha256": "600c95726eae3404127a8b2110f35468ff2ba02943cae0fbcd3ea43c66109d3e"
|
| 417 |
},
|
| 418 |
"mirrors": {
|
| 419 |
"hf_space": {
|
| 420 |
"path": "hf_space:data/project_status.json",
|
| 421 |
"exists": true,
|
| 422 |
+
"bytes": 9874,
|
| 423 |
+
"sha256": "600c95726eae3404127a8b2110f35468ff2ba02943cae0fbcd3ea43c66109d3e"
|
| 424 |
},
|
| 425 |
"hf_artifacts": {
|
| 426 |
"path": "hf_artifacts:docs/data/project_status.json",
|
| 427 |
"exists": true,
|
| 428 |
+
"bytes": 9874,
|
| 429 |
+
"sha256": "600c95726eae3404127a8b2110f35468ff2ba02943cae0fbcd3ea43c66109d3e"
|
| 430 |
},
|
| 431 |
"hf_model": {
|
| 432 |
"path": "hf_model:metrics/project_status.json",
|
| 433 |
"exists": true,
|
| 434 |
+
"bytes": 9874,
|
| 435 |
+
"sha256": "600c95726eae3404127a8b2110f35468ff2ba02943cae0fbcd3ea43c66109d3e"
|
| 436 |
}
|
| 437 |
},
|
| 438 |
"failures": []
|
|
|
|
| 444 |
"path": "repo:docs/data/publication_audit.json",
|
| 445 |
"exists": true,
|
| 446 |
"bytes": 7237,
|
| 447 |
+
"sha256": "7fbb19f8990b1a4d902e282c010d27e4391755564fa68af97d96c298c6b054f8"
|
| 448 |
},
|
| 449 |
"mirrors": {
|
| 450 |
"hf_space": {
|
| 451 |
"path": "hf_space:data/publication_audit.json",
|
| 452 |
"exists": true,
|
| 453 |
"bytes": 7237,
|
| 454 |
+
"sha256": "7fbb19f8990b1a4d902e282c010d27e4391755564fa68af97d96c298c6b054f8"
|
| 455 |
},
|
| 456 |
"hf_artifacts": {
|
| 457 |
"path": "hf_artifacts:docs/data/publication_audit.json",
|
| 458 |
"exists": true,
|
| 459 |
"bytes": 7237,
|
| 460 |
+
"sha256": "7fbb19f8990b1a4d902e282c010d27e4391755564fa68af97d96c298c6b054f8"
|
| 461 |
},
|
| 462 |
"hf_model": {
|
| 463 |
"path": "hf_model:metrics/publication_audit.json",
|
| 464 |
"exists": true,
|
| 465 |
"bytes": 7237,
|
| 466 |
+
"sha256": "7fbb19f8990b1a4d902e282c010d27e4391755564fa68af97d96c298c6b054f8"
|
| 467 |
}
|
| 468 |
},
|
| 469 |
"failures": []
|
|
|
|
| 598 |
"local": {
|
| 599 |
"path": "repo:docs/data/research_roadmap.json",
|
| 600 |
"exists": true,
|
| 601 |
+
"bytes": 7161,
|
| 602 |
+
"sha256": "cc96118c2c05108c831616151bc027441f7545495adeeb6a4a6a6bffe8da7801"
|
| 603 |
},
|
| 604 |
"mirrors": {
|
| 605 |
"hf_space": {
|
| 606 |
"path": "hf_space:data/research_roadmap.json",
|
| 607 |
"exists": true,
|
| 608 |
+
"bytes": 7161,
|
| 609 |
+
"sha256": "cc96118c2c05108c831616151bc027441f7545495adeeb6a4a6a6bffe8da7801"
|
| 610 |
},
|
| 611 |
"hf_artifacts": {
|
| 612 |
"path": "hf_artifacts:docs/data/research_roadmap.json",
|
| 613 |
"exists": true,
|
| 614 |
+
"bytes": 7161,
|
| 615 |
+
"sha256": "cc96118c2c05108c831616151bc027441f7545495adeeb6a4a6a6bffe8da7801"
|
| 616 |
},
|
| 617 |
"hf_model": {
|
| 618 |
"path": "hf_model:metrics/research_roadmap.json",
|
| 619 |
"exists": true,
|
| 620 |
+
"bytes": 7161,
|
| 621 |
+
"sha256": "cc96118c2c05108c831616151bc027441f7545495adeeb6a4a6a6bffe8da7801"
|
| 622 |
}
|
| 623 |
},
|
| 624 |
"failures": []
|
|
|
|
| 629 |
"local": {
|
| 630 |
"path": "repo:docs/data/research_roadmap_interactive.json",
|
| 631 |
"exists": true,
|
| 632 |
+
"bytes": 134282,
|
| 633 |
+
"sha256": "ff37219a9f1d9b386a9d4c42766e4aa28f10ce6ef338dceeedd6bdb4a1b2c40a"
|
| 634 |
},
|
| 635 |
"mirrors": {
|
| 636 |
"hf_space": {
|
| 637 |
"path": "hf_space:data/research_roadmap_interactive.json",
|
| 638 |
"exists": true,
|
| 639 |
+
"bytes": 134282,
|
| 640 |
+
"sha256": "ff37219a9f1d9b386a9d4c42766e4aa28f10ce6ef338dceeedd6bdb4a1b2c40a"
|
| 641 |
},
|
| 642 |
"hf_artifacts": {
|
| 643 |
"path": "hf_artifacts:docs/data/research_roadmap_interactive.json",
|
| 644 |
"exists": true,
|
| 645 |
+
"bytes": 134282,
|
| 646 |
+
"sha256": "ff37219a9f1d9b386a9d4c42766e4aa28f10ce6ef338dceeedd6bdb4a1b2c40a"
|
| 647 |
},
|
| 648 |
"hf_model": {
|
| 649 |
"path": "hf_model:metrics/research_roadmap_interactive.json",
|
| 650 |
"exists": true,
|
| 651 |
+
"bytes": 134282,
|
| 652 |
+
"sha256": "ff37219a9f1d9b386a9d4c42766e4aa28f10ce6ef338dceeedd6bdb4a1b2c40a"
|
| 653 |
}
|
| 654 |
},
|
| 655 |
"failures": []
|
|
|
|
| 1692 |
"local": {
|
| 1693 |
"path": "repo:scripts/build_artifact_index.py",
|
| 1694 |
"exists": true,
|
| 1695 |
+
"bytes": 27020,
|
| 1696 |
+
"sha256": "0ca7ed96f24caecbab31687cffa99f0eba8471258986412a294614e688c5aff5"
|
| 1697 |
},
|
| 1698 |
"mirrors": {
|
| 1699 |
"hf_artifacts": {
|
| 1700 |
"path": "hf_artifacts:scripts/build_artifact_index.py",
|
| 1701 |
"exists": true,
|
| 1702 |
+
"bytes": 27020,
|
| 1703 |
+
"sha256": "0ca7ed96f24caecbab31687cffa99f0eba8471258986412a294614e688c5aff5"
|
| 1704 |
},
|
| 1705 |
"hf_model": {
|
| 1706 |
"path": "hf_model:scripts/build_artifact_index.py",
|
| 1707 |
"exists": true,
|
| 1708 |
+
"bytes": 27020,
|
| 1709 |
+
"sha256": "0ca7ed96f24caecbab31687cffa99f0eba8471258986412a294614e688c5aff5"
|
| 1710 |
}
|
| 1711 |
},
|
| 1712 |
"failures": []
|
|
|
|
| 2017 |
"local": {
|
| 2018 |
"path": "repo:scripts/validate_publication_package.py",
|
| 2019 |
"exists": true,
|
| 2020 |
+
"bytes": 17197,
|
| 2021 |
+
"sha256": "2a617f3204ffb8c59d1c5bc1828b4441a4d014bb531655fd0613e128a6d9abc2"
|
| 2022 |
},
|
| 2023 |
"mirrors": {
|
| 2024 |
"hf_artifacts": {
|
| 2025 |
"path": "hf_artifacts:scripts/validate_publication_package.py",
|
| 2026 |
"exists": true,
|
| 2027 |
+
"bytes": 17197,
|
| 2028 |
+
"sha256": "2a617f3204ffb8c59d1c5bc1828b4441a4d014bb531655fd0613e128a6d9abc2"
|
| 2029 |
},
|
| 2030 |
"hf_model": {
|
| 2031 |
"path": "hf_model:scripts/validate_publication_package.py",
|
| 2032 |
"exists": true,
|
| 2033 |
+
"bytes": 17197,
|
| 2034 |
+
"sha256": "2a617f3204ffb8c59d1c5bc1828b4441a4d014bb531655fd0613e128a6d9abc2"
|
| 2035 |
}
|
| 2036 |
},
|
| 2037 |
"failures": []
|
|
|
|
| 2217 |
"local": {
|
| 2218 |
"path": "repo:docs/index.html",
|
| 2219 |
"exists": true,
|
| 2220 |
+
"bytes": 174923,
|
| 2221 |
+
"sha256": "099fcc01cbb4d50f62c508b10f343f05b1c883962b85bda294bcede99af2a0f1"
|
| 2222 |
},
|
| 2223 |
"mirrors": {
|
| 2224 |
"hf_space": {
|
| 2225 |
"path": "hf_space:index.html",
|
| 2226 |
"exists": true,
|
| 2227 |
+
"bytes": 174923,
|
| 2228 |
+
"sha256": "099fcc01cbb4d50f62c508b10f343f05b1c883962b85bda294bcede99af2a0f1"
|
| 2229 |
},
|
| 2230 |
"hf_artifacts_docs": {
|
| 2231 |
"path": "hf_artifacts:docs/index.html",
|
| 2232 |
"exists": true,
|
| 2233 |
+
"bytes": 174923,
|
| 2234 |
+
"sha256": "099fcc01cbb4d50f62c508b10f343f05b1c883962b85bda294bcede99af2a0f1"
|
| 2235 |
}
|
| 2236 |
},
|
| 2237 |
"failures": []
|
|
|
|
| 2242 |
"local": {
|
| 2243 |
"path": "repo:docs/research_roadmap.html",
|
| 2244 |
"exists": true,
|
| 2245 |
+
"bytes": 31702,
|
| 2246 |
+
"sha256": "1b20a5cc342b3ba59ad808eed9f5bf978e2d9ac438c88b5c3eeba01f4e14b883"
|
| 2247 |
},
|
| 2248 |
"mirrors": {
|
| 2249 |
"hf_space": {
|
| 2250 |
"path": "hf_space:research_roadmap.html",
|
| 2251 |
"exists": true,
|
| 2252 |
+
"bytes": 31702,
|
| 2253 |
+
"sha256": "1b20a5cc342b3ba59ad808eed9f5bf978e2d9ac438c88b5c3eeba01f4e14b883"
|
| 2254 |
},
|
| 2255 |
"hf_artifacts_docs": {
|
| 2256 |
"path": "hf_artifacts:docs/research_roadmap.html",
|
| 2257 |
"exists": true,
|
| 2258 |
+
"bytes": 31702,
|
| 2259 |
+
"sha256": "1b20a5cc342b3ba59ad808eed9f5bf978e2d9ac438c88b5c3eeba01f4e14b883"
|
| 2260 |
}
|
| 2261 |
},
|
| 2262 |
"failures": []
|
|
|
|
| 2844 |
"local": {
|
| 2845 |
"path": "repo:FOUNDATION_MODEL_PLAN.md",
|
| 2846 |
"exists": true,
|
| 2847 |
+
"bytes": 9075,
|
| 2848 |
+
"sha256": "444d13ab556d2e16a199a7fca191b87c85ab8685d167aab357bc6341839299a2"
|
| 2849 |
},
|
| 2850 |
"mirrors": {
|
| 2851 |
"hf_space": {
|
| 2852 |
"path": "hf_space:FOUNDATION_MODEL_PLAN.md",
|
| 2853 |
"exists": true,
|
| 2854 |
+
"bytes": 9075,
|
| 2855 |
+
"sha256": "444d13ab556d2e16a199a7fca191b87c85ab8685d167aab357bc6341839299a2"
|
| 2856 |
},
|
| 2857 |
"hf_artifacts": {
|
| 2858 |
"path": "hf_artifacts:FOUNDATION_MODEL_PLAN.md",
|
| 2859 |
"exists": true,
|
| 2860 |
+
"bytes": 9075,
|
| 2861 |
+
"sha256": "444d13ab556d2e16a199a7fca191b87c85ab8685d167aab357bc6341839299a2"
|
| 2862 |
},
|
| 2863 |
"hf_model": {
|
| 2864 |
"path": "hf_model:FOUNDATION_MODEL_PLAN.md",
|
| 2865 |
"exists": true,
|
| 2866 |
+
"bytes": 9075,
|
| 2867 |
+
"sha256": "444d13ab556d2e16a199a7fca191b87c85ab8685d167aab357bc6341839299a2"
|
| 2868 |
}
|
| 2869 |
},
|
| 2870 |
"failures": []
|
|
|
|
| 2937 |
"local": {
|
| 2938 |
"path": "repo:RESEARCH_ROADMAP.md",
|
| 2939 |
"exists": true,
|
| 2940 |
+
"bytes": 8388,
|
| 2941 |
+
"sha256": "0b3e3356076998ad94dc39f708cc783a4ebeab76c9da661cdd37ea12a3bb3665"
|
| 2942 |
},
|
| 2943 |
"mirrors": {
|
| 2944 |
"hf_space": {
|
| 2945 |
"path": "hf_space:RESEARCH_ROADMAP.md",
|
| 2946 |
"exists": true,
|
| 2947 |
+
"bytes": 8388,
|
| 2948 |
+
"sha256": "0b3e3356076998ad94dc39f708cc783a4ebeab76c9da661cdd37ea12a3bb3665"
|
| 2949 |
},
|
| 2950 |
"hf_artifacts": {
|
| 2951 |
"path": "hf_artifacts:RESEARCH_ROADMAP.md",
|
| 2952 |
"exists": true,
|
| 2953 |
+
"bytes": 8388,
|
| 2954 |
+
"sha256": "0b3e3356076998ad94dc39f708cc783a4ebeab76c9da661cdd37ea12a3bb3665"
|
| 2955 |
},
|
| 2956 |
"hf_model": {
|
| 2957 |
"path": "hf_model:RESEARCH_ROADMAP.md",
|
| 2958 |
"exists": true,
|
| 2959 |
+
"bytes": 8388,
|
| 2960 |
+
"sha256": "0b3e3356076998ad94dc39f708cc783a4ebeab76c9da661cdd37ea12a3bb3665"
|
| 2961 |
}
|
| 2962 |
},
|
| 2963 |
"failures": []
|
|
|
|
| 2968 |
"local": {
|
| 2969 |
"path": "repo:PROJECT_STATUS.md",
|
| 2970 |
"exists": true,
|
| 2971 |
+
"bytes": 7207,
|
| 2972 |
+
"sha256": "7baaba976ccc254da1a03ee2653057d1e08f3fb0c0cad035886c362442828720"
|
| 2973 |
},
|
| 2974 |
"mirrors": {
|
| 2975 |
"hf_space": {
|
| 2976 |
"path": "hf_space:PROJECT_STATUS.md",
|
| 2977 |
"exists": true,
|
| 2978 |
+
"bytes": 7207,
|
| 2979 |
+
"sha256": "7baaba976ccc254da1a03ee2653057d1e08f3fb0c0cad035886c362442828720"
|
| 2980 |
},
|
| 2981 |
"hf_artifacts": {
|
| 2982 |
"path": "hf_artifacts:PROJECT_STATUS.md",
|
| 2983 |
"exists": true,
|
| 2984 |
+
"bytes": 7207,
|
| 2985 |
+
"sha256": "7baaba976ccc254da1a03ee2653057d1e08f3fb0c0cad035886c362442828720"
|
| 2986 |
},
|
| 2987 |
"hf_model": {
|
| 2988 |
"path": "hf_model:PROJECT_STATUS.md",
|
| 2989 |
"exists": true,
|
| 2990 |
+
"bytes": 7207,
|
| 2991 |
+
"sha256": "7baaba976ccc254da1a03ee2653057d1e08f3fb0c0cad035886c362442828720"
|
| 2992 |
}
|
| 2993 |
},
|
| 2994 |
"failures": []
|
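Each parity group above pairs a `local` record with its `mirrors`, and carries an empty `failures` list when everything matches. The following sketch shows one way such a comparison could be computed. `parity_failures` and its message format are assumptions for illustration, not the project's actual validator; the sample values are copied from the `project_status.json` group in this diff.

```python
import copy

SHA = "600c95726eae3404127a8b2110f35468ff2ba02943cae0fbcd3ea43c66109d3e"

GROUP = {
    "local": {"path": "repo:docs/data/project_status.json",
              "exists": True, "bytes": 9874, "sha256": SHA},
    "mirrors": {
        "hf_space": {"path": "hf_space:data/project_status.json",
                     "exists": True, "bytes": 9874, "sha256": SHA},
        "hf_model": {"path": "hf_model:metrics/project_status.json",
                     "exists": True, "bytes": 9874, "sha256": SHA},
    },
}


def parity_failures(group: dict) -> list[str]:
    """List every mirror that is missing or differs from the local record."""
    local = group["local"]
    failures = []
    for name, mirror in group["mirrors"].items():
        if not mirror.get("exists"):
            failures.append(f"{name}: missing")
            continue
        for field in ("bytes", "sha256"):
            if mirror.get(field) != local.get(field):
                failures.append(f"{name}: {field} mismatch")
    return failures


clean = parity_failures(GROUP)

# Simulate a stale mirror to show what a failing group would report.
stale = copy.deepcopy(GROUP)
stale["mirrors"]["hf_model"]["bytes"] = 9000
stale["mirrors"]["hf_space"]["exists"] = False
broken = parity_failures(stale)
```

A top-level `"status": "pass"` would then correspond to every group returning an empty list.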
metrics/project_status.json
CHANGED
|
@@ -82,7 +82,7 @@
|
|
| 82 |
"RESEARCH_ROADMAP.md",
|
| 83 |
"docs/data/research_roadmap.json"
|
| 84 |
],
|
| 85 |
-
"readout": "The roadmap connects public-sample task development to 128-episode data preparation, Qwen3-Omni LoRA, foundation-model selection, robustness runs, and
|
| 86 |
},
|
| 87 |
{
|
| 88 |
"area": "Foundation-model plan",
|
|
@@ -93,6 +93,14 @@
|
|
| 93 |
],
|
| 94 |
"readout": "Qwen3-Omni remains the first trainable held-out LoRA baseline; Cosmos 3 is added as the first world-model/action-generation branch; OpenVLA/openpi/GR00T are policy candidates after action targets are explicit."
|
| 95 |
},
|
| 96 |
{
|
| 97 |
"area": "Official dataset wording",
|
| 98 |
"status": "verified",
|
|
@@ -167,6 +175,7 @@
|
|
| 167 |
"Inspect RESEARCH_TAKEAWAYS.md and docs/data/research_takeaways.json before interpreting model scores.",
|
| 168 |
"Inspect RESEARCH_ROADMAP.md and docs/data/research_roadmap.json for the path from public-sample task work to multi-episode modeling.",
|
| 169 |
"Inspect FOUNDATION_MODEL_PLAN.md and docs/data/foundation_model_plan.json before choosing a backbone branch.",
|
| 170 |
"Inspect docs/data/summary_metrics.json and results/episode_task_suite/neural_mlp/ to check the 12-task outputs.",
|
| 171 |
"Inspect results/audio_ablation/AUDIO_ABLATION_SUMMARY.md before judging whether audio helps the current task suite.",
|
| 172 |
"Inspect EVALUATION_PROTOCOL.md before judging task metrics or leakage controls.",
|
|
@@ -180,6 +189,7 @@
|
|
| 180 |
"The current reconstruction task reconstructs feature vectors, not pixel-depth, mesh, NeRF, or Gaussian reconstruction.",
|
| 181 |
"Audio is one of the synchronized source modalities in the current task representation.",
|
| 182 |
"The audio ablation report compares audio/no-audio variants across all 12 task contracts in results/audio_ablation/.",
|
| 183 |
-
"Foundation-model selection is explicit: Qwen3-Omni is the immediate trainable pilot, Cosmos 3 is the first world-model branch, and policy models such as OpenVLA/openpi/GR00T wait for action-target conversion."
|
| 184 |
]
|
| 185 |
}
|
|
|
|
| 82 |
"RESEARCH_ROADMAP.md",
|
| 83 |
"docs/data/research_roadmap.json"
|
| 84 |
],
|
| 85 |
+
"readout": "The roadmap connects public-sample task development to 128-episode data preparation, Qwen3-Omni LoRA, foundation-model selection, robustness runs, world/policy branches, and the future Xperience-native pretraining goal."
|
| 86 |
},
|
| 87 |
{
|
| 88 |
"area": "Foundation-model plan",
|
|
|
|
| 93 |
],
|
| 94 |
"readout": "Qwen3-Omni remains the first trainable held-out LoRA baseline; Cosmos 3 is added as the first world-model/action-generation branch; OpenVLA/openpi/GR00T are policy candidates after action targets are explicit."
|
| 95 |
},
|
| 96 |
+
{
|
| 97 |
+
"area": "Xperience Embodied Foundation Model",
|
| 98 |
+
"status": "future_goal",
|
| 99 |
+
"evidence": [
|
| 100 |
+
"XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md"
|
| 101 |
+
],
|
| 102 |
+
"readout": "A future full-corpus pretraining plan describes target modules, objectives, staged scale-up, hardware ranges, and evaluation for a domain-specific embodied foundation model."
|
| 103 |
+
},
|
| 104 |
{
|
| 105 |
"area": "Official dataset wording",
|
| 106 |
"status": "verified",
|
|
|
|
| 175 |
"Inspect RESEARCH_TAKEAWAYS.md and docs/data/research_takeaways.json before interpreting model scores.",
|
| 176 |
"Inspect RESEARCH_ROADMAP.md and docs/data/research_roadmap.json for the path from public-sample task work to multi-episode modeling.",
|
| 177 |
"Inspect FOUNDATION_MODEL_PLAN.md and docs/data/foundation_model_plan.json before choosing a backbone branch.",
|
| 178 |
+
"Inspect XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md for the long-term full-corpus pretraining goal.",
|
| 179 |
"Inspect docs/data/summary_metrics.json and results/episode_task_suite/neural_mlp/ to check the 12-task outputs.",
|
| 180 |
"Inspect results/audio_ablation/AUDIO_ABLATION_SUMMARY.md before judging whether audio helps the current task suite.",
|
| 181 |
"Inspect EVALUATION_PROTOCOL.md before judging task metrics or leakage controls.",
|
|
|
|
| 189 |
"The current reconstruction task reconstructs feature vectors, not pixel-depth, mesh, NeRF, or Gaussian reconstruction.",
|
| 190 |
"Audio is one of the synchronized source modalities in the current task representation.",
|
| 191 |
"The audio ablation report compares audio/no-audio variants across all 12 task contracts in results/audio_ablation/.",
|
| 192 |
+
"Foundation-model selection is explicit: Qwen3-Omni is the immediate trainable pilot, Cosmos 3 is the first world-model branch, and policy models such as OpenVLA/openpi/GR00T wait for action-target conversion.",
|
| 193 |
+
"The Xperience Embodied Foundation Model is a future native-pretraining goal, not a completed model or current benchmark."
|
| 194 |
]
|
| 195 |
}
|
metrics/publication_audit.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
| 1 |
{
|
| 2 |
"status": "pass",
|
| 3 |
-
"generated_at_utc": "2026-06-
|
| 4 |
"checks": [
|
| 5 |
{
|
| 6 |
"name": "required_publication_assets_present",
|
|
@@ -182,8 +182,8 @@
|
|
| 182 |
"github_repo": {
|
| 183 |
"root": "repo",
|
| 184 |
"exists": true,
|
| 185 |
-
"file_count":
|
| 186 |
-
"text_file_count":
|
| 187 |
"largest_file": {
|
| 188 |
"path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
|
| 189 |
"bytes": 55702978
|
|
@@ -193,8 +193,8 @@
|
|
| 193 |
"hf_space_bundle": {
|
| 194 |
"root": "hf_publish/space",
|
| 195 |
"exists": true,
|
| 196 |
-
"file_count":
|
| 197 |
-
"text_file_count":
|
| 198 |
"largest_file": {
|
| 199 |
"path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
|
| 200 |
"bytes": 55702978
|
|
@@ -204,8 +204,8 @@
|
|
| 204 |
"hf_artifact_bundle": {
|
| 205 |
"root": "hf_publish/artifacts",
|
| 206 |
"exists": true,
|
| 207 |
-
"file_count":
|
| 208 |
-
"text_file_count":
|
| 209 |
"largest_file": {
|
| 210 |
"path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
|
| 211 |
"bytes": 55702978
|
|
@@ -215,8 +215,8 @@
|
|
| 215 |
"hf_model_bundle": {
|
| 216 |
"root": "hf_publish/model",
|
| 217 |
"exists": true,
|
| 218 |
-
"file_count":
|
| 219 |
-
"text_file_count":
|
| 220 |
"largest_file": {
|
| 221 |
"path": "pytorch_model.bin",
|
| 222 |
"bytes": 93495480
|
|
|
|
| 1 |
{
|
| 2 |
"status": "pass",
|
| 3 |
+
"generated_at_utc": "2026-06-04T20:43:37+00:00",
|
| 4 |
"checks": [
|
| 5 |
{
|
| 6 |
"name": "required_publication_assets_present",
|
|
|
|
| 182 |
"github_repo": {
|
| 183 |
"root": "repo",
|
| 184 |
"exists": true,
|
| 185 |
+
"file_count": 396,
|
| 186 |
+
"text_file_count": 330,
|
| 187 |
"largest_file": {
|
| 188 |
"path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
|
| 189 |
"bytes": 55702978
|
|
|
|
| 193 |
"hf_space_bundle": {
|
| 194 |
"root": "hf_publish/space",
|
| 195 |
"exists": true,
|
| 196 |
+
"file_count": 317,
|
| 197 |
+
"text_file_count": 251,
|
| 198 |
"largest_file": {
|
| 199 |
"path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
|
| 200 |
"bytes": 55702978
|
|
|
|
| 204 |
"hf_artifact_bundle": {
|
| 205 |
"root": "hf_publish/artifacts",
|
| 206 |
"exists": true,
|
| 207 |
+
"file_count": 418,
|
| 208 |
+
"text_file_count": 330,
|
| 209 |
"largest_file": {
|
| 210 |
"path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
|
| 211 |
"bytes": 55702978
|
|
|
|
| 215 |
"hf_model_bundle": {
|
| 216 |
"root": "hf_publish/model",
|
| 217 |
"exists": true,
|
| 218 |
+
"file_count": 644,
|
| 219 |
+
"text_file_count": 519,
|
| 220 |
"largest_file": {
|
| 221 |
"path": "pytorch_model.bin",
|
| 222 |
"bytes": 93495480
|
metrics/research_roadmap.json
CHANGED
|
@@ -1,7 +1,7 @@
|
|
| 1 |
{
|
| 2 |
"title": "Ropedia Xperience-10M Research Roadmap",
|
| 3 |
-
"summary": "Staged path from the public-sample task lab to multi-episode held-out evaluation, foundation-model selection, and
|
| 4 |
-
"current_decision_point": "Keep the public-sample task suite as the development harness, prepare the selected official Xperience-10M episodes for the held-out Qwen3-Omni pilot, then branch into Cosmos 3 world modeling and policy-model experiments after the data preparation path is stable.",
|
| 5 |
"phases": [
|
| 6 |
{
|
| 7 |
"id": "public_sample_task_lab",
|
|
@@ -126,6 +126,30 @@
|
|
| 126 |
"updated model cards"
|
| 127 |
],
|
| 128 |
"reader_takeaway": "The long-term direction is richer multimodal representation learning for embodied-AI reasoning, with model branches chosen by task fit rather than by a single default backbone."
|
| 129 |
}
|
| 130 |
],
|
| 131 |
"public_surfaces_to_update": [
|
|
@@ -134,6 +158,7 @@
|
|
| 134 |
"RESEARCH_TAKEAWAYS.md",
|
| 135 |
"EVALUATION_PROTOCOL.md",
|
| 136 |
"ARTIFACT_GUIDE.md",
|
| 137 |
"docs/index.html",
|
| 138 |
"docs/data/research_roadmap.json",
|
| 139 |
"Hugging Face Space card",
|
|
|
|
| 1 |
{
|
| 2 |
"title": "Ropedia Xperience-10M Research Roadmap",
|
| 3 |
+
"summary": "Staged path from the public-sample task lab to multi-episode held-out evaluation, foundation-model selection, world/policy branches, and a future Xperience-native embodied foundation model.",
|
| 4 |
+
"current_decision_point": "Keep the public-sample task suite as the development harness, prepare the selected official Xperience-10M episodes for the held-out Qwen3-Omni pilot, then branch into Cosmos 3 world modeling and policy-model experiments after the data preparation path is stable. The Xperience Embodied Foundation Model is a later full-corpus pretraining goal, not a current result.",
|
| 5 |
"phases": [
|
| 6 |
{
|
| 7 |
"id": "public_sample_task_lab",
|
|
|
|
| 126 |
"updated model cards"
|
| 127 |
],
|
| 128 |
"reader_takeaway": "The long-term direction is richer multimodal representation learning for embodied-AI reasoning, with model branches chosen by task fit rather than by a single default backbone."
|
| 129 |
+
},
|
| 130 |
+
{
|
| 131 |
+
"id": "xperience_embodied_foundation_pretraining",
|
| 132 |
+
"name": "Xperience Embodied Foundation Model Pretraining",
|
| 133 |
+
"status": "future",
|
| 134 |
+
"entry_condition": "Full-corpus access, PB-scale storage path, high-throughput data loading, multi-node compute, and positive scaling evidence from smaller multi-episode runs.",
|
| 135 |
+
"deliverables": [
|
| 136 |
+
"full-corpus episode and split manifests",
|
| 137 |
+
"pretraining shard and provenance manifests",
|
| 138 |
+
"0.3B-1B and 1B-3B scaling pilots",
|
| 139 |
+
"3B-7B Xperience-native domain model target",
|
| 140 |
+
"held-out episode/session/activity/object evaluations",
|
| 141 |
+
"missing-modality robustness report",
|
| 142 |
+
"model card and data-boundary report"
|
| 143 |
+
],
|
| 144 |
+
"completion_evidence": [
|
| 145 |
+
"pretraining metadata",
|
| 146 |
+
"checkpoint inventory",
|
| 147 |
+
"scaling curves",
|
| 148 |
+
"held-out evaluation reports",
|
| 149 |
+
"qualitative retrieval or future-state examples",
|
| 150 |
+
"safety and data-boundary report"
|
| 151 |
+
],
|
| 152 |
+
"reader_takeaway": "The final research direction is a domain-specific embodied foundation model trained directly on Xperience-10M, after smaller pilots justify the cost and infrastructure."
|
| 153 |
}
|
| 154 |
],
|
| 155 |
"public_surfaces_to_update": [
|
|
|
|
| 158 |
"RESEARCH_TAKEAWAYS.md",
|
| 159 |
"EVALUATION_PROTOCOL.md",
|
| 160 |
"ARTIFACT_GUIDE.md",
|
| 161 |
+
"XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md",
|
| 162 |
"docs/index.html",
|
| 163 |
"docs/data/research_roadmap.json",
|
| 164 |
"Hugging Face Space card",
|
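The new `xperience_embodied_foundation_pretraining` phase carries the fields `id`, `name`, `status`, `entry_condition`, `deliverables`, `completion_evidence`, and `reader_takeaway`. A small consistency check over `phases` could look like the sketch below. It assumes every phase is expected to carry the same seven keys, which this diff does not confirm for the earlier phases, and the function name is hypothetical.

```python
# Keys observed on the newly added phase; treating them as required for
# every phase is an assumption of this sketch.
REQUIRED_PHASE_KEYS = {
    "id",
    "name",
    "status",
    "entry_condition",
    "deliverables",
    "completion_evidence",
    "reader_takeaway",
}


def check_phases(roadmap: dict) -> list:
    """Return human-readable problems found in roadmap["phases"]."""
    problems = []
    seen_ids = set()
    for index, phase in enumerate(roadmap.get("phases", [])):
        phase_id = phase.get("id", f"<phase {index}>")
        missing = sorted(REQUIRED_PHASE_KEYS - phase.keys())
        if missing:
            problems.append(f"{phase_id}: missing {', '.join(missing)}")
        if phase_id in seen_ids:
            problems.append(f"{phase_id}: duplicate id")
        seen_ids.add(phase_id)
    return problems
```

Running a check like this against `metrics/research_roadmap.json` and its `docs/data/` and `data/` copies would catch a phase that was added to one copy with a field left out.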
metrics/research_roadmap_interactive.json
CHANGED
|
@@ -1837,7 +1837,8 @@
|
|
| 1837 |
"NVIDIA GR00T"
|
| 1838 |
],
|
| 1839 |
"first_world_model_branch": "Cosmos 3",
|
| 1840 |
-
"immediate_trainable_backbone": "Qwen3-Omni"
|
| 1841 |
},
|
| 1842 |
"evaluation_additions": [
|
| 1843 |
{
|
|
@@ -1921,6 +1922,11 @@
|
|
| 1921 |
"action": "Publish branch results only with real manifests, predictions, metrics, and qualitative examples.",
|
| 1922 |
"name": "Publishing threshold",
|
| 1923 |
"step": 6
|
| 1924 |
}
|
| 1925 |
],
|
| 1926 |
"model_families": [
|
|
@@ -2023,6 +2029,21 @@
|
|
| 2023 |
"Useful after action target design.",
|
| 2024 |
"Less directly omni-modal than Qwen3-Omni or Cosmos 3."
|
| 2025 |
]
|
| 2026 |
}
|
| 2027 |
],
|
| 2028 |
"source_links": [
|
|
@@ -2057,11 +2078,15 @@
|
|
| 2057 |
{
|
| 2058 |
"label": "LeRobot / SmolVLA",
|
| 2059 |
"url": "https://github.com/huggingface/lerobot"
|
| 2060 |
}
|
| 2061 |
],
|
| 2062 |
"status": "planning_artifact"
|
| 2063 |
},
|
| 2064 |
-
"generated_at_utc": "2026-06-
|
| 2065 |
"omni_plan": {
|
| 2066 |
"adapter": "LoRA rank 16, alpha 32, dropout 0.05",
|
| 2067 |
"backbone": "Qwen/Qwen3-Omni-30B-A3B-Instruct",
|
|
@@ -2208,6 +2233,31 @@
|
|
| 2208 |
"reader_takeaway": "The long-term direction is richer multimodal representation learning for embodied-AI reasoning, with model branches chosen by task fit rather than by a single default backbone.",
|
| 2209 |
"stage": "future",
|
| 2210 |
"status": "planned"
|
| 2211 |
}
|
| 2212 |
],
|
| 2213 |
"scale_up": {
|
|
|
|
| 1837 |
"NVIDIA GR00T"
|
| 1838 |
],
|
| 1839 |
"first_world_model_branch": "Cosmos 3",
|
| 1840 |
+
"immediate_trainable_backbone": "Qwen3-Omni",
|
| 1841 |
+
"long_term_native_pretraining_goal": "Xperience Embodied Foundation Model"
|
| 1842 |
},
|
| 1843 |
"evaluation_additions": [
|
| 1844 |
{
|
|
|
|
| 1922 |
"action": "Publish branch results only with real manifests, predictions, metrics, and qualitative examples.",
|
| 1923 |
"name": "Publishing threshold",
|
| 1924 |
"step": 6
|
| 1925 |
+
},
|
| 1926 |
+
{
|
| 1927 |
+
"action": "Start a from-scratch Xperience Embodied Foundation Model only after smaller scaling stages, full-corpus storage, multi-node compute, and held-out evaluation protocols are in place.",
|
| 1928 |
+
"name": "Xperience-native pretraining",
|
| 1929 |
+
"step": 7
|
| 1930 |
}
|
| 1931 |
],
|
| 1932 |
"model_families": [
|
|
|
|
| 2029 |
"Useful after action target design.",
|
| 2030 |
"Less directly omni-modal than Qwen3-Omni or Cosmos 3."
|
| 2031 |
]
|
| 2032 |
+
},
|
| 2033 |
+
{
|
| 2034 |
+
"best_role": "Domain model over synchronized embodied experience.",
|
| 2035 |
+
"category": "xperience_native_pretraining_goal",
|
| 2036 |
+
"current_decision": "future_goal_after_scaling_evidence",
|
| 2037 |
+
"entry_condition": "Full-corpus data path, PB-scale storage, multi-node compute, and positive smaller-run scaling evidence.",
|
| 2038 |
+
"family": "Xperience Embodied Foundation Model",
|
| 2039 |
+
"openness": "future project-specific model if full-corpus access and compute exist",
|
| 2040 |
+
"priority": 8,
|
| 2041 |
+
"public_source": "XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md",
|
| 2042 |
+
"xperience10m_fit": [
|
| 2043 |
+
"Uses the full aligned modality stack rather than treating sensors as auxiliary metadata.",
|
| 2044 |
+
"Targets temporal embodied representation learning across perception, motion, geometry, audio, and language.",
|
| 2045 |
+
"Can become the shared pretraining backbone for Qwen-style instruction tasks, Cosmos-style world modeling, and policy/action branches."
|
| 2046 |
+
]
|
| 2047 |
}
|
| 2048 |
],
|
| 2049 |
"source_links": [
|
|
|
|
| 2078 |
{
|
| 2079 |
"label": "LeRobot / SmolVLA",
|
| 2080 |
"url": "https://github.com/huggingface/lerobot"
|
| 2081 |
+
},
|
| 2082 |
+
{
|
| 2083 |
+
"label": "Xperience Embodied Foundation Model pretraining plan",
|
| 2084 |
+
"url": "XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md"
|
| 2085 |
}
|
| 2086 |
],
|
| 2087 |
"status": "planning_artifact"
|
| 2088 |
},
|
| 2089 |
+
"generated_at_utc": "2026-06-04T20:40:29+00:00",
|
| 2090 |
"omni_plan": {
|
| 2091 |
"adapter": "LoRA rank 16, alpha 32, dropout 0.05",
|
| 2092 |
"backbone": "Qwen/Qwen3-Omni-30B-A3B-Instruct",
|
|
|
|
| 2233 |
"reader_takeaway": "The long-term direction is richer multimodal representation learning for embodied-AI reasoning, with model branches chosen by task fit rather than by a single default backbone.",
|
| 2234 |
"stage": "future",
|
| 2235 |
"status": "planned"
|
| 2236 |
+
},
|
| 2237 |
+
{
|
| 2238 |
+
"completion_evidence": [
|
| 2239 |
+
"pretraining metadata",
|
| 2240 |
+
"checkpoint inventory",
|
| 2241 |
+
"scaling curves",
|
| 2242 |
+
"held-out evaluation reports",
|
| 2243 |
+
"qualitative retrieval or future-state examples",
|
| 2244 |
+
"safety and data-boundary report"
|
| 2245 |
+
],
|
| 2246 |
+
"deliverables": [
|
| 2247 |
+
"full-corpus episode and split manifests",
|
| 2248 |
+
"pretraining shard and provenance manifests",
|
| 2249 |
+
"0.3B-1B and 1B-3B scaling pilots",
|
| 2250 |
+
"3B-7B Xperience-native domain model target",
|
| 2251 |
+
"held-out episode/session/activity/object evaluations",
|
| 2252 |
+
"missing-modality robustness report",
|
| 2253 |
+
"model card and data-boundary report"
|
| 2254 |
+
],
|
| 2255 |
+
"entry_condition": "Full-corpus access, PB-scale storage path, high-throughput data loading, multi-node compute, and positive scaling evidence from smaller multi-episode runs.",
|
| 2256 |
+
"id": "xperience_embodied_foundation_pretraining",
|
| 2257 |
+
"name": "Xperience Embodied Foundation Model Pretraining",
|
| 2258 |
+
"reader_takeaway": "The final research direction is a domain-specific embodied foundation model trained directly on Xperience-10M, after smaller pilots justify the cost and infrastructure.",
|
| 2259 |
+
"stage": "future",
|
| 2260 |
+
"status": "future"
|
| 2261 |
}
|
| 2262 |
],
|
| 2263 |
"scale_up": {
|
research_roadmap.html
CHANGED
|
@@ -605,8 +605,9 @@
|
|
| 605 |
<h1>Interactive Research Roadmap.</h1>
|
| 606 |
<p class="hero-copy">
|
| 607 |
This page connects the current public-sample task lab to the four research
|
| 608 |
-
directions, the next multi-episode Qwen3-Omni fine-tuning path,
|
| 609 |
-
|
|
|
|
| 610 |
directly from generated project artifacts, so the track and task views stay
|
| 611 |
tied to the real sample metrics and scale-up status.
|
| 612 |
</p>
|
|
@@ -630,7 +631,7 @@
|
|
| 630 |
</div>
|
| 631 |
<div class="route-step">
|
| 632 |
<strong>03</strong>
|
| 633 |
-
<div><b>Omni + branches</b><span>Qwen3-Omni first, Cosmos 3 and policy models
|
| 634 |
<em id="routeOmni">pending data</em>
|
| 635 |
</div>
|
| 636 |
</div>
|
|
@@ -701,7 +702,7 @@
|
|
| 701 |
},
|
| 702 |
omni: {
|
| 703 |
title: "Omni pilot and foundation branches",
|
| 704 |
-
summary: "Run Qwen3-Omni first for the held-out LoRA pilot,
|
| 705 |
}
|
| 706 |
};
|
| 707 |
|
|
|
|
| 605 |
<h1>Interactive Research Roadmap.</h1>
|
| 606 |
<p class="hero-copy">
|
| 607 |
This page connects the current public-sample task lab to the four research
|
| 608 |
+
directions, the next multi-episode Qwen3-Omni fine-tuning path, the
|
| 609 |
+
later Cosmos 3 / policy-model branch choices, and the future
|
| 610 |
+
Xperience-native foundation-model pretraining goal. It loads
|
| 611 |
directly from generated project artifacts, so the track and task views stay
|
| 612 |
tied to the real sample metrics and scale-up status.
|
| 613 |
</p>
|
|
|
|
| 631 |
</div>
|
| 632 |
<div class="route-step">
|
| 633 |
<strong>03</strong>
|
| 634 |
+
<div><b>Omni + branches</b><span>Qwen3-Omni first, Cosmos 3 and policy models next, native pretraining later</span></div>
|
| 635 |
<em id="routeOmni">pending data</em>
|
| 636 |
</div>
|
| 637 |
</div>
|
|
|
|
| 702 |
},
|
| 703 |
omni: {
|
| 704 |
title: "Omni pilot and foundation branches",
|
| 705 |
+
summary: "Run Qwen3-Omni first for the held-out LoRA pilot, evaluate Cosmos 3 for world modeling and policy candidates after action targets are explicit, then treat Xperience-native pretraining as the full-corpus future goal.",
|
| 706 |
}
|
| 707 |
};
|
| 708 |
|
scripts/build_artifact_index.py
CHANGED
|
@@ -81,6 +81,14 @@ ARTIFACTS = [
|
|
| 81 |
"surface": "website_hf",
|
| 82 |
"shows": "Machine-readable foundation-model selection matrix with source links, entry conditions, and evaluation additions.",
|
| 83 |
},
|
| 84 |
{
|
| 85 |
"id": "evidence_contract",
|
| 86 |
"title": "Evidence contract",
|
|
|
|
| 81 |
"surface": "website_hf",
|
| 82 |
"shows": "Machine-readable foundation-model selection matrix with source links, entry conditions, and evaluation additions.",
|
| 83 |
},
|
| 84 |
+
{
|
| 85 |
+
"id": "xperience_embodied_foundation_pretraining",
|
| 86 |
+
"title": "Xperience Embodied Foundation Model pretraining goal",
|
| 87 |
+
"path": "XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md",
|
| 88 |
+
"kind": "project_path",
|
| 89 |
+
"surface": "repo_hf",
|
| 90 |
+
"shows": "Describes the future full-corpus Xperience-native pretraining goal, target modules, objectives, staged scale-up, hardware ranges, and evaluation protocol.",
|
| 91 |
+
},
|
| 92 |
{
|
| 93 |
"id": "evidence_contract",
|
| 94 |
"title": "Evidence contract",
|
scripts/validate_publication_package.py
CHANGED
|
@@ -221,6 +221,8 @@ def scan(root: Path, *, paths: list[Path] | None = None, display_root: str | Non
|
|
| 221 |
"detail": reason,
|
| 222 |
})
|
| 223 |
for needle, reason in STALE_PRESENTATION_STRINGS.items():
|
| 224 |
if needle in text:
|
| 225 |
violations.append({
|
| 226 |
"kind": "stale_presentation_copy",
|
|
|
|
| 221 |
"detail": reason,
|
| 222 |
})
|
| 223 |
for needle, reason in STALE_PRESENTATION_STRINGS.items():
|
| 224 |
+
if path_rel == ".mailmap":
|
| 225 |
+
continue
|
| 226 |
if needle in text:
|
| 227 |
violations.append({
|
| 228 |
"kind": "stale_presentation_copy",
|
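The change to `scan` skips `.mailmap` before the stale-copy needles are matched, so that file can no longer produce a `stale_presentation_copy` violation. The standalone sketch below reproduces that behavior in isolation. The needle table contents, the `SKIP_PATHS` set, and the `scan_stale_copy` function are hypothetical stand-ins, and the skip is hoisted above the needle loop here, whereas the diff places it inside the loop.

```python
from pathlib import Path

# Hypothetical needle table; the project's real STALE_PRESENTATION_STRINGS
# entries are not shown in this diff.
STALE_PRESENTATION_STRINGS = {
    "old tagline": "replaced by the current hero copy",
}

# Paths exempt from the stale-copy check (only .mailmap appears in the diff).
SKIP_PATHS = {".mailmap"}


def scan_stale_copy(root: Path) -> list:
    """Collect stale_presentation_copy violations under root."""
    violations = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        path_rel = path.relative_to(root).as_posix()
        if path_rel in SKIP_PATHS:
            continue  # exempt file: never matched against the needles
        try:
            text = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            continue  # binary file, nothing to scan
        for needle, reason in STALE_PRESENTATION_STRINGS.items():
            if needle in text:
                violations.append({
                    "kind": "stale_presentation_copy",
                    "path": path_rel,
                    "needle": needle,
                    "detail": reason,
                })
    return violations
```

Matching on the repository-relative path rather than the bare file name means a nested file that happens to be called `.mailmap` would still be scanned, which is the same comparison the diff makes against `path_rel`.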