papers - a Pechyony Collection

Pechyony's Collections

papers

updated about 12 hours ago

AI for Auto-Research: Roadmap & User Guide

Paper • 2605.18661 • Published May 18 • 71
StableVLA: Towards Robust Vision-Language-Action Models without Extra Data

Paper • 2605.18287 • Published May 18 • 15
MixSD: Mixed Contextual Self-Distillation for Knowledge Injection

Paper • 2605.16865 • Published May 16 • 10
MolmoPoint: Better Pointing for VLMs with Grounding Tokens

Paper • 2603.28069 • Published Mar 30 • 9
VersaViT: Enhancing MLLM Vision Backbones via Task-Guided Optimization

Paper • 2602.09934 • Published Feb 10 • 1
Flash-WAM: Modality-Aware Distillation for World Action Models

Paper • 2606.05254 • Published Jun 3 • 7
Latent Reasoning with Normalizing Flows

Paper • 2606.06447 • Published Jun 4 • 8
Discrete-WAM: Unified Discrete Vision-Action Token Editing for World-Policy Learning

Paper • 2606.05645 • Published Jun 4 • 2
Dream.exe: Can Video Generation Models Dream Executable Robot Manipulation?

Paper • 2606.04811 • Published Jun 4 • 17
Cosmos 3: Omnimodal World Models for Physical AI

Paper • 2606.02800 • Published Jun 1 • 141
Qwen-Image-Flash: Beyond Objective Design

Paper • 2606.03746 • Published Jun 2 • 37
Filter, Then Reweight: Rethinking Optimization Granularity in On-Policy Distillation

Paper • 2606.02684 • Published Jun 1 • 17
GRAIL: Generating Humanoid Loco-Manipulation from 3D Assets and Video Priors

Paper • 2606.05160 • Published Jun 3 • 9
Unlocking Feature Learning in Gated Delta Networks at Scale

Paper • 2606.04048 • Published Jun 2 • 2
SpatialAct: Probing Spatial Reasoning-to-Action Capabilities of VLM Agents in 3D Scenes

Paper • 2605.31148 • Published May 29 • 3
On the Scaling of PEFT: Towards Million Personal Models of Trillion Parameters

Paper • 2606.02437 • Published Jun 1 • 241
PhysBrain 1.0 Technical Report

Paper • 2605.15298 • Published May 14 • 145
Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments

Paper • 2605.30280 • Published May 28 • 146
LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding

Paper • 2605.27365 • Published May 26 • 146
RLDX-1 Technical Report

Paper • 2605.03269 • Published May 5 • 127
Robots Need More than VLA and World Models

Paper • 2606.06556 • Published Jun 4 • 30
World Pilot: Steering Vision-Language-Action Models with World-Action Priors

Paper • 2606.12403 • Published Jun 10 • 27
Reason, Then Re-reason: Cross-view Revisiting Improves Spatial Reasoning

Paper • 2606.11683 • Published Jun 10 • 30
On Subquadratic Architectures: From Applications to Principles

Paper • 2606.12364 • Published Jun 10 • 24
Embodied-R1.5: Evolving Physical Intelligence via Embodied Foundation Models

Paper • 2606.11324 • Published Jun 9 • 172
World Model Self-Distillation: Training World Models to Solve General Tasks

Paper • 2606.12072 • Published Jun 10 • 15
MiniMax Sparse Attention

Paper • 2606.13392 • Published Jun 11 • 153
WEAVER, Better, Faster, Longer: An Effective World Model for Robotic Manipulation

Paper • 2606.13672 • Published Jun 11 • 4
Hy-Embodied-0.5-VLA: From Vision-Language-Action Models to a Real-World Robot Learning Stack

Paper • 2606.14409 • Published Jun 12 • 15
μ_0: A Scalable 3D Interaction-Trace World Model

Paper • 2606.13769 • Published Jun 11 • 11
World Tracing: Generative Pixel-Aligned Geometry Beyond the Visible

Paper • 2606.13652 • Published Jun 11 • 16
World Value Models for Robotic Manipulation

Paper • 2606.24742 • Published Jun 23 • 7
EventVLA: Event-Driven Visual Evidence Memory for Long-Horizon Vision-Language-Action Policies

Paper • 2606.20092 • Published Jun 18 • 6
From Foundation to Application: Improving VLA Models in Practice

Paper • 2607.06403 • Published 21 days ago • 20
GigaWorld-1: A Roadmap to Build World Models for Robot Policy Evaluation

Paper • 2607.02642 • Published 26 days ago • 40
OpenCoF: Learning to Reason Through Video Generation

Paper • 2607.08763 • Published 19 days ago • 28
Linear Attention Architectures: Mechanisms, Trade-offs, and Cross-Layer Routing

Paper • 2607.07953 • Published 20 days ago • 16
RoboDojo: A Unified Sim-and-Real Benchmark for Comprehensive Evaluation of Generalist Robot Manipulation Policies

Paper • 2607.04434 • Published 21 days ago • 15
Dual Latent Memory in Vision-Language-Action Models for Robotic Manipulation

Paper • 2607.07608 • Published 20 days ago • 56
Let RGB Be the Language of Vision

Paper • 2607.12450 • Published 14 days ago • 14
GigaWorld-Policy-0.5: A Faster and Stronger WAM Empowered by AutoResearch

Paper • 2607.13960 • Published 13 days ago • 28
SPEAR: A Simulator for Photorealistic Embodied AI Research

Paper • 2607.06701 • Published 21 days ago • 8
Loop the Loopies!

Paper • 2607.16051 • Published 11 days ago • 73
Xiaomi-Robotics-1: Scaling Vision-Language-Action Models with over 100K Hours of Real-World Trajectories

Paper • 2607.15330 • Published 12 days ago • 70
On-Policy Delta Distillation

Paper • 2607.15161 • Published 12 days ago • 37
Understanding Reasoning from Pretraining to Post-Training

Paper • 2607.16097 • Published 11 days ago • 28
See like a Robot: Robot-Centric Pointmaps for Vision-Language-Action Models

Paper • 2607.11498 • Published 15 days ago • 7
RynnBrain 1.1: Towards More Capable and Generalizable Embodied Foundation Model

Paper • 2607.17977 • Published 8 days ago • 197
Group Entropy-Controlled Policy Optimization

Paper • 2607.16850 • Published 10 days ago • 29
Open-AoE: An Open Egocentric Manipulation Dataset and Toolchain for Embodied Learning

Paper • 2607.14183 • Published 13 days ago • 68
Distilled Reinforcement Learning for LLM Post-training

Paper • 2607.17247 • Published 9 days ago • 9
JoyNexus: Service-Oriented Multi-Tenant Post-Training for VLA Models

Paper • 2607.16074 • Published 11 days ago • 8
ABot-World-0: Infinite Interactive World Rollout on a Single Desktop GPU

Paper • 2607.19191 • Published 7 days ago • 300
Mage-Flow: An Efficient Native-Resolution Foundation Model for Image Generation and Editing

Paper • 2607.19064 • Published 7 days ago • 72
AlayaWorld: Interactive Long-Horizon World Modeling -- Full Technical Report

Paper • 2607.18367 • Published 7 days ago • 57
H^2SD: Hybrid Hindsight Self-Distillation

Paper • 2607.18955 • Published 7 days ago • 7
Appearance Pointers -- Multimodal Region Control of Diffusion Transformers

Paper • 2607.19344 • Published 7 days ago • 4
Masked Visual Actions for Unified World Modeling

Paper • 2607.19343 • Published 7 days ago • 8
SeededGrasp: Language-Guided Grasping in Complex Scenes with Multiple Embodiments

Paper • 2607.20207 • Published 6 days ago • 4
Self-Supervised Learning of Structured Dynamics from Videos

Paper • 2607.21576 • Published 5 days ago • 18
Visual Contrastive Self-Distillation

Paper • 2607.21556 • Published 5 days ago • 44
SANA-Video 2.0: Hybrid Linear Attention with Attention Residuals for Efficient Video Generation

Paper • 2607.21553 • Published 5 days ago • 34
TableVerse: A Large-scale Tabletop Dataset with Real-world Grounded Layouts for Generalizable Manipulation

Paper • 2607.21017 • Published 5 days ago • 5
Molt: A Scalable PyTorch-Native Training Framework for Agentic Reinforcement Learning

Paper • 2607.21653 • Published 6 days ago • 24
Scaling Native Multimodal Pre-Training From Scratch

Paper • 2607.22043 • Published 4 days ago • 18
