<svg xmlns="http://www.w3.org/2000/svg" width="1320" height="624" viewBox="0 0 1320 624">
<rect width="100%" height="100%" fill="#07110d"/>
<text x="36" y="42" fill="#e6f7ea" font-family="Arial, sans-serif" font-size="28" font-weight="700">Measured Audio Delta Across 12 Xperience-10M Tasks</text>
<text x="36" y="70" fill="#a7b8ab" font-family="Arial, sans-serif" font-size="15">Positive means audio improved the task's primary metric on the single public sample split.</text>
<line x1="680" y1="92" x2="680" y2="600" stroke="#5b6f61" stroke-width="1"/>
<text x="36" y="130" fill="#d8eadc" font-family="Arial, sans-serif" font-size="15">Current Action Recognition</text>
<rect x="680.00" y="112" width="0.10" height="22" rx="3" fill="#7ae5c3"/>
<text x="950" y="129" fill="#d8eadc" font-family="Arial, sans-serif" font-size="14">+0.0003 macro_f1</text>
<text x="36" y="172" fill="#d8eadc" font-family="Arial, sans-serif" font-size="15">Current Subtask Recognition</text>
<rect x="680.00" y="154" width="0.03" height="22" rx="3" fill="#7ae5c3"/>
<text x="950" y="171" fill="#d8eadc" font-family="Arial, sans-serif" font-size="14">+0.0001 macro_f1</text>
<text x="36" y="214" fill="#d8eadc" font-family="Arial, sans-serif" font-size="15">Action Transition Detection</text>
<rect x="677.58" y="196" width="2.42" height="22" rx="3" fill="#ff8a6a"/>
<text x="950" y="213" fill="#d8eadc" font-family="Arial, sans-serif" font-size="14">-0.0066 macro_f1</text>
<text x="36" y="256" fill="#d8eadc" font-family="Arial, sans-serif" font-size="15">Next-Action Prediction</text>
<rect x="679.95" y="238" width="0.05" height="22" rx="3" fill="#ff8a6a"/>
<text x="950" y="255" fill="#d8eadc" font-family="Arial, sans-serif" font-size="14">-0.0001 macro_f1</text>
<text x="36" y="298" fill="#d8eadc" font-family="Arial, sans-serif" font-size="15">Future Hand Motion Forecasting</text>
<rect x="620.17" y="280" width="59.83" height="22" rx="3" fill="#ff8a6a"/>
<text x="950" y="297" fill="#d8eadc" font-family="Arial, sans-serif" font-size="14">-0.1626 mae</text>
<text x="36" y="340" fill="#d8eadc" font-family="Arial, sans-serif" font-size="15">Contact State Prediction</text>
<rect x="680.00" y="322" width="0.00" height="22" rx="3" fill="#7ae5c3"/>
<text x="950" y="339" fill="#d8eadc" font-family="Arial, sans-serif" font-size="14">+0.0000 macro_f1</text>
<text x="36" y="382" fill="#d8eadc" font-family="Arial, sans-serif" font-size="15">Relevant Object Prediction</text>
<rect x="680.00" y="364" width="3.75" height="22" rx="3" fill="#7ae5c3"/>
<text x="950" y="381" fill="#d8eadc" font-family="Arial, sans-serif" font-size="14">+0.0102 micro_f1</text>
<text x="36" y="424" fill="#d8eadc" font-family="Arial, sans-serif" font-size="15">Language-to-Time Grounding</text>
<rect x="680.00" y="406" width="1.79" height="22" rx="3" fill="#7ae5c3"/>
<text x="950" y="423" fill="#d8eadc" font-family="Arial, sans-serif" font-size="14">+0.0049 mrr</text>
<text x="36" y="466" fill="#d8eadc" font-family="Arial, sans-serif" font-size="15">Cross-Modal Window Retrieval</text>
<rect x="674.82" y="448" width="5.18" height="22" rx="3" fill="#ff8a6a"/>
<text x="950" y="465" fill="#d8eadc" font-family="Arial, sans-serif" font-size="14">-0.0141 mrr</text>
<text x="36" y="508" fill="#d8eadc" font-family="Arial, sans-serif" font-size="15">Sensor-to-Visual Reconstruction</text>
<rect x="680.00" y="490" width="240.00" height="22" rx="3" fill="#7ae5c3"/>
<text x="950" y="507" fill="#d8eadc" font-family="Arial, sans-serif" font-size="14">+0.6524 mae</text>
<text x="36" y="550" fill="#d8eadc" font-family="Arial, sans-serif" font-size="15">Temporal Order Verification</text>
<rect x="680.00" y="532" width="8.46" height="22" rx="3" fill="#7ae5c3"/>
<text x="950" y="549" fill="#d8eadc" font-family="Arial, sans-serif" font-size="14">+0.0230 macro_f1</text>
<text x="36" y="592" fill="#d8eadc" font-family="Arial, sans-serif" font-size="15">Cross-Modal Misalignment Detection</text>
<rect x="678.07" y="574" width="1.93" height="22" rx="3" fill="#ff8a6a"/>
<text x="950" y="591" fill="#d8eadc" font-family="Arial, sans-serif" font-size="14">-0.0052 macro_f1</text>
</svg>
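
The bar geometry in the chart above appears to follow a simple linear scale: the zero axis sits at x = 680, the largest absolute delta (0.6524) maps to a 240-pixel bar, positive bars grow rightward in `#7ae5c3`, and negative bars grow leftward in `#ff8a6a`. The sketch below reproduces that mapping. It is inferred from the SVG coordinates only, not taken from the author's plotting script, and the names `bar_geometry`, `ZERO_X`, and `MAX_BAR_PX` are hypothetical.

```python
# Sketch of the bar layout inferred from the SVG coordinates.
# All names here are illustrative assumptions, not the author's code.

ZERO_X = 680.0       # x position of the vertical zero line in the SVG
MAX_BAR_PX = 240.0   # width of the bar for the largest absolute delta
POSITIVE_FILL = "#7ae5c3"
NEGATIVE_FILL = "#ff8a6a"


def bar_geometry(delta: float, max_abs_delta: float) -> dict:
    """Map a metric delta to the x, width, and fill of its <rect>."""
    if max_abs_delta <= 0:
        raise ValueError("max_abs_delta must be positive")
    width = abs(delta) / max_abs_delta * MAX_BAR_PX
    # Non-negative bars start at the zero line; negative bars end at it.
    x = ZERO_X if delta >= 0 else ZERO_X - width
    fill = POSITIVE_FILL if delta >= 0 else NEGATIVE_FILL
    return {"x": round(x, 2), "width": round(width, 2), "fill": fill}


if __name__ == "__main__":
    # Deltas as printed in the chart labels (rounded to 4 decimals).
    for name, delta in [
        ("Sensor-to-Visual Reconstruction", 0.6524),
        ("Relevant Object Prediction", 0.0102),
        ("Future Hand Motion Forecasting", -0.1626),
    ]:
        print(name, bar_geometry(delta, max_abs_delta=0.6524))
```

Because the labels are rounded to four decimals, widths recomputed from them can differ from the SVG values by about 0.01 px for some tasks.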