PaperLens-7B-Text-arXiv
SFT fine-tune of Qwen/Qwen2.5-7B-Instruct trained to predict ICLR-style Accept/Reject verdicts on academic papers from the paper text.
- Modality: text
- Training data: arxiv-large (71k balanced_per_venue)
- Checkpoint: step 4442, end of epoch 2 of 4 (7B โ ep2 by convention)
- Hyperparams: LR 1e-6, cosine_then_constant scheduler (decay_ratio 0.75, min_lr_rate 0.001), batch size 32, cutoff_len 24480, framework LLaMA-Factory + FSDP2.
Quickstart โ serve + submit a PDF or LaTeX source dir
Easiest path uses the PaperLens orchestrator (paperprep + scoring server + web UI all wired up). Clone, run setup, and chain into the UI:
git clone https://github.com/zlab-princeton/PaperLens.git
cd PaperLens
uv tool install . # installs the `paperlens` CLI globally on PATH
paperlens setup --serve # in the wizard, pick: size=7B ยท modality=text ยท domain=arxiv
# โ web UI on http://localhost:8003 (PDF upload + LaTeX dir browse)
Or hit the API directly (FastAPI on the same port):
# Submit an anonymized PDF; poll for the verdict
JOB=$(curl -s -F file=@anonymized.pdf http://localhost:8003/submit | jq -r .job_id)
curl http://localhost:8003/status/$JOB # โ job dict: state, verdict, p_accept, ...
# Submit a LaTeX source directory (anonymized) or an arXiv id
curl -X POST http://localhost:8003/submit_latex \
-H "Content-Type: application/json" \
-d '{"path": "/abs/path/to/anonymized_latex_dir"}'
curl -X POST http://localhost:8003/submit_arxiv \
-H "Content-Type: application/json" \
-d '{"arxiv_id": "2511.08364"}'
Headless one-shot (no server):
paperlens run /abs/path/to/anonymized.pdf
Lower-level: stand up just a vLLM scoring server with pre-prep'd sharegpt rows (skips paperprep):
vllm serve skonan/PaperLens-7B-Text-arXiv --task generate --gpu-memory-utilization 0.85
# OpenAI-compat API on :8000 โ format prompts per the "Prompt format" section below.
Test results (in-distribution, calibrated)
Evaluated on arxiv-balanced-test. Calibration threshold picked on arxiv-balanced-val. Score = logprob(Accept) โ logprob(Reject) at the decision-token position. pA = predicted accept rate; A_rec / R_rec = accept / reject recall.
| n_test | Acc | AUC | pA | A_rec | R_rec |
|---|---|---|---|---|---|
| 1415 | 74.6% | 0.829 | 55% | 80.7% | 68.8% |
Note on training-size asymmetry
Arxiv-trained models saw ~71k examples (4 epochs โ 8884 / 17724 steps). Arxiv-trained vs ICLR-trained models saw a ~3ร data gap โ direct comparisons should account for it.
Prompt format
Inputs are sharegpt-style 3-turn conversations: system, human, gpt. SYSTEM is the same string across all 8 PaperLens models. USER preamble differs per training domain. Vision variants append one <image> token per page-screenshot at the end of the user message.
SYSTEM (all PaperLens models)
You are an expert academic reviewer tasked with evaluating research papers.
ARXIV-trained USER preamble (verbatim)
I am giving you a paper submitted to a top machine-learning venue. Predict its acceptance outcome.
- Your answer will either be: \boxed{Accept} or \boxed{Reject}
- Note: typical top-tier ML venues have ~25-30% acceptance rates
# <PAPER TITLE>
...paper body in markdown...
ASSISTANT (gold)
Outcome: \boxed{Accept}
or
Outcome: \boxed{Reject}
At inference, the decision logprobs at the boxed-token position (5th generated token under the qwen template) are used for calibration; either parse the text or read logprobs directly.
Concrete example (TEXT, ARXIV-trained)
[SYSTEM]
You are an expert academic reviewer tasked with evaluating research papers.
[USER]
I am giving you a paper submitted to a top machine-learning venue. Predict its acceptance outcome.
- Your answer will either be: \boxed{Accept} or \boxed{Reject}
- Note: typical top-tier ML venues have ~25-30% acceptance rates
# SSAST: SELF-SUPERVISED AUDIO SPECTROGRAM TRANSFORMER
## Abstract
... ~32k chars of paper body ...
[ASSISTANT]
Outcome: \boxed{Accept}
Related models + datasets in the PaperLens collection
All 8 single-domain SFT models (this one plus 7 siblings) plus the companion PaperLens-Text and PaperLens-Vision datasets live in the PaperLens collection. Pairwise comparisons across {3B, 7B} ร {text, vision} ร {arxiv, openreview-iclr} are intended.
- Downloads last month
- 45