PaperLens-7B-Text-arXiv

SFT fine-tune of Qwen/Qwen2.5-7B-Instruct trained to predict ICLR-style Accept/Reject verdicts on academic papers from the paper text.

Modality: text
Training data: arxiv-large (71k balanced_per_venue)
Checkpoint: step 4442, end of epoch 2 of 4 (7B → ep2 by convention)
Hyperparams: LR 1e-6, cosine_then_constant scheduler (decay_ratio 0.75, min_lr_rate 0.001), batch size 32, cutoff_len 24480, framework LLaMA-Factory + FSDP2.

Quickstart — serve + submit a PDF or LaTeX source dir

Easiest path uses the PaperLens orchestrator (paperprep + scoring server + web UI all wired up). Clone, run setup, and chain into the UI:

git clone https://github.com/zlab-princeton/PaperLens.git
cd PaperLens
uv tool install .              # installs the `paperlens` CLI globally on PATH
paperlens setup --serve        # in the wizard, pick: size=7B · modality=text · domain=arxiv
# → web UI on http://localhost:8003 (PDF upload + LaTeX dir browse)

Or hit the API directly (FastAPI on the same port):

# Submit an anonymized PDF; poll for the verdict
JOB=$(curl -s -F file=@anonymized.pdf http://localhost:8003/submit | jq -r .job_id)
curl http://localhost:8003/status/$JOB        # → job dict: state, verdict, p_accept, ...

# Submit a LaTeX source directory (anonymized) or an arXiv id
curl -X POST http://localhost:8003/submit_latex \
     -H "Content-Type: application/json" \
     -d '{"path": "/abs/path/to/anonymized_latex_dir"}'
curl -X POST http://localhost:8003/submit_arxiv \
     -H "Content-Type: application/json" \
     -d '{"arxiv_id": "2511.08364"}'

Headless one-shot (no server):

paperlens run /abs/path/to/anonymized.pdf

Lower-level: stand up just a vLLM scoring server with pre-prep'd sharegpt rows (skips paperprep):

vllm serve skonan/PaperLens-7B-Text-arXiv --task generate --gpu-memory-utilization 0.85
# OpenAI-compat API on :8000 — format prompts per the "Prompt format" section below.

Test results (in-distribution, calibrated)

Evaluated on arxiv-balanced-test. Calibration threshold picked on arxiv-balanced-val. Score = logprob(Accept) − logprob(Reject) at the decision-token position. pA = predicted accept rate; A_rec / R_rec = accept / reject recall.

n_test	Acc	AUC	pA	A_rec	R_rec
1415	74.6%	0.829	55%	80.7%	68.8%

Note on training-size asymmetry

Arxiv-trained models saw ~71k examples (4 epochs ≈ 8884 / 17724 steps). Arxiv-trained vs ICLR-trained models saw a ~3× data gap — direct comparisons should account for it.

Prompt format

Inputs are sharegpt-style 3-turn conversations: system, human, gpt. SYSTEM is the same string across all 8 PaperLens models. USER preamble differs per training domain. Vision variants append one <image> token per page-screenshot at the end of the user message.

SYSTEM (all PaperLens models)

You are an expert academic reviewer tasked with evaluating research papers.

ARXIV-trained USER preamble (verbatim)

I am giving you a paper submitted to a top machine-learning venue. Predict its acceptance outcome.
 - Your answer will either be: \boxed{Accept} or \boxed{Reject}
 - Note: typical top-tier ML venues have ~25-30% acceptance rates

# <PAPER TITLE>
...paper body in markdown...

ASSISTANT (gold)

Outcome: \boxed{Accept}

Outcome: \boxed{Reject}

At inference, the decision logprobs at the boxed-token position (5th generated token under the qwen template) are used for calibration; either parse the text or read logprobs directly.

Concrete example (TEXT, ARXIV-trained)

[SYSTEM]
You are an expert academic reviewer tasked with evaluating research papers.

[USER]
I am giving you a paper submitted to a top machine-learning venue. Predict its acceptance outcome.
 - Your answer will either be: \boxed{Accept} or \boxed{Reject}
 - Note: typical top-tier ML venues have ~25-30% acceptance rates

# SSAST: SELF-SUPERVISED AUDIO SPECTROGRAM TRANSFORMER

## Abstract
... ~32k chars of paper body ...

[ASSISTANT]
Outcome: \boxed{Accept}

Related models + datasets in the PaperLens collection

All 8 single-domain SFT models (this one plus 7 siblings) plus the companion PaperLens-Text and PaperLens-Vision datasets live in the PaperLens collection. Pairwise comparisons across {3B, 7B} × {text, vision} × {arxiv, openreview-iclr} are intended.

Downloads last month: 45

Safetensors

Model size

8B params

Tensor type

F32

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for skonan/PaperLens-7B-Text-arXiv

Base model

Qwen/Qwen2.5-7B

Finetuned

Qwen/Qwen2.5-7B-Instruct

Finetuned

(2603)

this model