This academic paper introduces PROXYTHINKER, a novel inference-time method designed to enhance the visual reasoning abilities of large vision-language models (LVLMs). Unlike computationally expensive fine-tuning approaches such as reinforcement fine-tuning (RFT), PROXYTHINKER lets larger models inherit reasoning skills from smaller, pre-trained reasoning models. It achieves this by adjusting the large model's output at decoding time by the difference between a small RFT expert's output and that of a small base model. The paper demonstrates that this training-free technique significantly improves performance on various visual reasoning benchmarks while remaining computationally efficient.
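As a rough sketch of that adjustment rule: at each decoding step, the large model's next-token logits are shifted by the difference between the small RFT expert's logits and the small base model's logits. The snippet below is an illustrative reading of that description, not the paper's exact implementation; the function name `proxythinker_logits` and the guidance weight `alpha` are assumptions introduced here for clarity.

```python
import torch

def proxythinker_logits(logits_large: torch.Tensor,
                        logits_expert: torch.Tensor,
                        logits_base: torch.Tensor,
                        alpha: float = 1.0) -> torch.Tensor:
    """Combine per-step next-token logits from three models.

    Hypothetical sketch: the large model's logits are shifted by the
    expert-minus-base difference, so the large model "inherits" the
    behavioral delta that RFT induced in the small model. `alpha` is
    an assumed guidance-strength knob, not a confirmed parameter.
    """
    return logits_large + alpha * (logits_expert - logits_base)

# Usage: at each decoding step, run all three models on the same
# prefix, combine their next-token logits, then pick the next token.
vocab_size = 32000
l_large = torch.randn(vocab_size)   # stand-in for the large LVLM's logits
l_expert = torch.randn(vocab_size)  # stand-in for the small RFT expert
l_base = torch.randn(vocab_size)    # stand-in for the small base model
next_token = torch.argmax(proxythinker_logits(l_large, l_expert, l_base))
```

Because the combination happens purely in logit space at inference time, no gradients or weight updates are needed, which is why the method is training-free.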



Neural Intel Pod

ProxyThinker: Guiding Large Models with Small Reasoners
