

Chain of Thought (CoT): How AI Earns Trust

DeepFlame 2026. 3. 21. 20:36

Last year, with my company's support, I completed a seven-week AI course at Yonsei University, held every Friday.
Here I'd like to write up the project topic I found most interesting at the time: Chain of Thought (CoT).
As the one presenting, I worked hard to explain the complex concepts as simply as possible, and I'll share the key points of that material here.

 


🧘 What is Chain-of-Thought (CoT)?

LLM์ด ๋‹จ์ˆœํžˆ ๋‹ต๋งŒ ์ฐพ์•„๋‚ด๋Š” ๊ฒƒ์—์„œ ๊ทธ์น˜์ง€ ์•Š๊ณ , ๋…ผ๋ฆฌ์  ์‚ฌ๊ณ ๋ฅผ ํ•  ์ˆ˜ ์žˆ๋„๋ก ๋•๋Š” ๋ฐฉ๋ฒ•๋ก ์ž…๋‹ˆ๋‹ค.
์•„๋ž˜ ๊ทธ๋ฆผ์—์„œ LLM์—๊ฒŒ ํ’€์ด๊ณผ์ •์„ ์“ฐ๋„๋ก ์œ ๋„ํ–ˆ๋”๋‹ˆ, ๊ธฐ์กด์— ํ‹€๋ ธ๋˜ ๋‹ต์„ ๋งž์ถ”๋ฉฐ ์„ฑ๋Šฅ ํ–ฅ์ƒ์„ ๋ณด์—ฌ์ค๋‹ˆ๋‹ค.

What humanity expects from AI is not that it simply parrots back what it was trained on.
We are pouring trillions of won into the hope that it will one day solve humanity's unsolved problems.
Seen from that angle, this deceptively simple methodology may well be the thread that leads us to that answer.

https://arxiv.org/abs/2201.11903

 

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models


 

 

✨ What is Self-Consistency?

์ž ์ƒ๊ฐํ•ด๋ด…์‹œ๋‹ค. LLM์˜ ๊ฒฐ๊ณผ๋Š” ๊ฒฐ๊ตญ ํ™•๋ฅ ์— ์˜ํ•ด์„œ ๋ณ€๊ฒฝ๋  ์ˆ˜ ์žˆ๋Š” ๊ฐ’์ž…๋‹ˆ๋‹ค. ๋˜‘๊ฐ™์€ ์ž…๋ ฅ๊ฐ’์„ ๋„ฃ์—ˆ์„ ๋•Œ ๋‹ค์–‘ํ•œ ๋‹ต๋ณ€์ด ๋„์ถœ๋  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
๊ทธ๋Ÿผ ์—ฌ๋Ÿฌ ๋ฒˆ ์ž…๋ ฅํ•œ ํ›„ ๋ณต์ˆ˜ ๊ฐœ์˜ ๋‹ต๋ณ€์„ ๋ฐ›๊ณ , ํˆฌํ‘œ๋กœ ๊ฒฐ๊ณผ๋ฅผ ๊ฒฐ์ •ํ•˜๋Š” ๋ฐฉ๋ฒ•๋„ ์ƒ๊ฐํ•ด๋ณผ ์ˆ˜ ์žˆ์„ ๊ฒƒ์ž…๋‹ˆ๋‹ค.
๊ทธ๊ฒƒ์ด Self-consistency ๋ฐฉ๋ฒ•์ž…๋‹ˆ๋‹ค.

As the figure in the paper below shows, greedy decoding explores only a single path, which leaves it relatively vulnerable to mistakes.
Self-consistency instead explores multiple reasoning paths and aggregates the most frequent answer, that is, the answer with the highest self-consistency, as the final result.
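
As a rough illustration, here is a minimal Python sketch of the voting loop, assuming a hypothetical `sample_cot_answer` helper that draws one CoT chain at a temperature above zero:

```python
# A minimal sketch of self-consistency: sample k reasoning chains at a
# temperature above zero (instead of one greedy decode), then take a
# majority vote over the final answers.
from collections import Counter

def sample_cot_answer(question: str) -> str:
    """Hypothetical helper: run one CoT completion with temperature > 0
    and return only the final answer extracted from the chain."""
    raise NotImplementedError("plug in your LLM client here")

def self_consistency(question: str, k: int = 10) -> str:
    answers = [sample_cot_answer(question) for _ in range(k)]
    # The answer that the most reasoning paths agree on wins the vote.
    return Counter(answers).most_common(1)[0][0]
```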

https://arxiv.org/abs/2203.11171

 

Self-Consistency Improves Chain of Thought Reasoning in Language Models


 

🤖 Can a Machine Judge the Results?

Self-consistency ๋ฐฉ๋ฒ•์€ ๋งŒ๋Šฅ์ด ์•„๋‹™๋‹ˆ๋‹ค.
๋งŒ์•ฝ ๋ชจ๋ธ์˜ ๊ธฐ๋ณธ ์„ฑ๋Šฅ์ด ๋„ˆ๋ฌด ๋‚ฎ์•„ ๋‹ค์ˆ˜์˜ ์ƒ˜ํ”Œ์ด ์˜ค๋‹ต์„ ์ƒ์„ฑํ•˜๋ฉด, ์˜ค๋‹ต์ด ๊ฒฐ๊ณผ๋กœ ๋„์ถœ๋˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.

So reward models were devised to grade how accurate an answer is (a comparison sketch follows the list below):

  • ORM (Outcome Reward Model): a model judges the final outcome, and the highest-scoring sample is selected.
  • PRM (Process Reward Model): each reasoning step is treated as its own episode, and the model scores every step (evaluating whether the steps taken so far are correct).
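
To make the contrast concrete, here is a minimal sketch of reranking k sampled chains with each style of reward model; `orm_score`, `prm_step_scores`, and the product aggregation are illustrative assumptions, not the paper's exact recipe:

```python
# A minimal sketch contrasting ORM- and PRM-style reranking over k sampled
# CoT chains. `orm_score` and `prm_step_scores` stand in for hypothetical
# reward-model calls; they are not from the paper's code.

def orm_score(question: str, solution: str) -> float:
    """Hypothetical ORM: a single score for the finished solution."""
    raise NotImplementedError

def prm_step_scores(question: str, steps: list[str]) -> list[float]:
    """Hypothetical PRM: one correctness score per reasoning step."""
    raise NotImplementedError

def rerank_with_orm(question: str, solutions: list[str]) -> str:
    # Pick the sample whose final outcome the ORM rates highest.
    return max(solutions, key=lambda s: orm_score(question, s))

def rerank_with_prm(question: str, solutions: list[str]) -> str:
    # Score every step; a common aggregation multiplies the step scores,
    # so a single bad step sinks the whole chain.
    def chain_score(solution: str) -> float:
        # Assumes steps are newline-separated within a sampled chain.
        scores = prm_step_scores(question, solution.split("\n"))
        total = 1.0
        for p in scores:
            total *= p
        return total
    return max(solutions, key=chain_score)
```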

 

๋…ผ๋ฌธ์˜ ๋‚ด์šฉ์œผ๋กœ๋Š” PRM, ORM ๋ฐฉ๋ฒ•์„ ํ†ตํ•ด ์˜ค๋ฅ˜์œจ์„ ์œ ์˜ํ•˜๊ฒŒ ์ค„์˜€๋‹ค๊ณ  ํ•ฉ๋‹ˆ๋‹ค.
๋˜ํ•œ ์ด ๋ฐฉ๋ฒ•์ด ์ธ๊ฐ„์˜ ์‚ฌ๊ณ  ๋ฐฉ์‹์„ ๋”ฐ๋ฅด๊ธฐ ๋•Œ๋ฌธ์— ํŠนํžˆ ๋ณต์žกํ•œ ์ถ”๋ก (์ˆ˜ํ•™, ์ฝ”๋”ฉ)์—์„œ ํšจ์œจ์„ฑ์ด ๊ทน๋Œ€ํ™”๋˜์—ˆ๋‹ค๊ณ  ํ•ฉ๋‹ˆ๋‹ค.

https://arxiv.org/abs/2211.14275

 

Solving math word problems with process- and outcome-based feedback


 

 

์œ„ ๋…ผ๋ฌธ์—์„œ ๋ฐœํ‘œํ•œ PRM ๋ชจ๋ธ์ด ์ˆ˜ํ•™ ๋ฌธ์ œ์—๋งŒ ํŠนํ™”๋˜์–ด์žˆ๋‹ค๋ณด๋‹ˆ, ๋‹ค์–‘ํ•œ ๋„๋ฉ”์ธ์—๋Š” ์ทจ์•ฝํ•œ ๋ชจ์Šต์„ ๋ณด์˜€์Šต๋‹ˆ๋‹ค.
๊ทธ๋Ÿฌ๋‚˜ ๊ทธ ํ›„ ๋ฐœํ‘œ๋œ Versa PRM์€ ์—ฌ๋Ÿฌ ๋„๋ฉ”์ธ์„ ํ•™์Šต ์‹œํ‚ด์œผ๋กœ์จ ์ด๋ฅผ ํ•ด๊ฒฐํ–ˆ์Šต๋‹ˆ๋‹ค.

์ด ์—ฐ๊ตฌ๋Š” PRM์ด ์—ฌ๋Ÿฌ ๋„๋ฉ”์ธ์—์„œ ๋‘๊ฐ์„ ๋“œ๋Ÿฌ๋‚ผ ์ˆ˜ ์žˆ๋‹ค๋Š” ๊ฒƒ์„ ์‹œ์‚ฌํ–ˆ์Šต๋‹ˆ๋‹ค.

https://arxiv.org/abs/2502.06737

 

VersaPRM: Multi-Domain Process Reward Model via Synthetic Reasoning Data


 

 


🤔 What are the problems with these methods?

์œ„ ๋ฐฉ๋ฒ•๋“ค์„ ๋”ฐ๋ผ๊ฐ€๋‹ค๋ณด๋ฉด ์ ์  ๋” ์ธ๊ฐ„์˜ ์‚ฌ๊ณ  ๋ฐฉ์‹์— ๊ฐ€๊นŒ์›Œ ์ง€๋Š” ๊ฒƒ์ด ๋А๊ปด์ง‘๋‹ˆ๋‹ค.
ํ•˜์ง€๋งŒ ๋น„์šฉ ๋ฌธ์ œ๊ฐ€ ์กด์žฌํ•ฉ๋‹ˆ๋‹ค.

๊ณ ์„ฑ๋Šฅ LLM ํ•˜๋‚˜๋ฅผ ์„œ๋น™ํ•˜๋Š” ๊ฒƒ๋„ ๋ถ€๋‹ด์ธ๋ฐ, CoT์˜ ํŠน์„ฑ์ƒ k๊ฐœ์˜ ์ƒ˜ํ”Œ์„ ์ถ”์ถœํ•˜๊ณ  ์—ฌ๊ธฐ์— ๊ณผ์ •๋งˆ๋‹ค ์ ์ˆ˜๋ฅผ ๋งค๊ธธ PRM ๋ชจ๋ธ๊นŒ์ง€ ๋ณ„๋„๋กœ ์šด์˜ํ•ด์•ผ ํ•œ๋‹ค๋ฉด ์ด๋Š” ์‹œ๊ฐ„์ /์ž์›์  ๋น„์šฉ ๋ฌธ์ œ๊ฐ€ ๋ฐœ์ƒํ•  ๊ฒƒ์ž…๋‹ˆ๋‹ค.
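
A made-up back-of-envelope illustrates the multiplier (every number below is an assumption for illustration only):

```python
# A rough back-of-envelope of the added inference cost, with made-up numbers.
k = 10             # CoT samples drawn per query (self-consistency)
cot_tokens = 500   # average tokens generated per reasoning chain
steps = 8          # reasoning steps the PRM scores per chain

generated_tokens = k * cot_tokens  # ~10x the tokens of one greedy answer
prm_passes = k * steps             # extra forward passes on a second model

print(generated_tokens, prm_passes)  # 5000 80
```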

๊ต์œก์„ ๋“ค์œผ๋ฉด์„œ๋„ ์ด ํฅ๋ฏธ๋กœ์šด ๋ฐฉ๋ฒ•์„ ์–ด๋–ป๊ฒŒ ์‹ค๋ฌด ํ™˜๊ฒฝ์— ๋…น์—ฌ๋‚ผ ์ˆ˜ ์žˆ์„๊นŒ์— ๋Œ€ํ•œ ๊ณ ๋ฏผ์„ ํ–ˆ์Šต๋‹ˆ๋‹ค.
๊ทธ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๋ ค๋Š” ์‹œ๋„๋ฅผ ํ–ˆ๋˜ ๊ฒƒ์ด ์—ฐ๊ตฌ ์ฃผ์ œ์˜€์Šต๋‹ˆ๋‹ค. (๋ฌผ๋ก  ์‹œ๊ฐ„์ด ๋ถ€์กฑํ•ด ์™„๋ฒฝํ•œ ํ•ด๊ฒฐ์€ ๋ชป ํ–ˆ์ง€๋งŒ์š” ๐Ÿ˜…)

 

 

🔥 Nevertheless

Logical reasoning sits at the center of how LLMs are evaluated today.
If you look at the evaluation tasks on the Hugging Face Open LLM Leaderboard, you will find that many of the datasets target reasoning.

That alone is evidence of how important this capability is to current models.

https://huggingface.co/docs/leaderboards/open_llm_leaderboard/about#tasks

 


 

ํ†ต๊ณ„ํ•™์„ ์ „๊ณตํ•œ ์ž…์žฅ์—์„œ, ํ•œ๋™์•ˆ ๋”ฅ๋Ÿฌ๋‹์˜ ๋ถ€์ƒ์„ ๋ณต์žกํ•œ ์‹ฌ๊ฒฝ์œผ๋กœ ์ง€์ผœ๋ณด์•˜์Šต๋‹ˆ๋‹ค.
ํ†ต๊ณ„ํ•™์€ ํ˜„์ƒ์˜ ๋ฐœ์ƒ ์š”์ธ์„ ๋ชจ๋ธ๋งํ•˜๊ณ  ๊ฐ ๋ณ€์ˆ˜์˜ ๊ฒฝํ–ฅ์„ฑ์„ ํŒŒ์•…ํ•˜๋Š” ํ•ด์„์˜ ํ•™๋ฌธ์ธ ๋ฐ˜๋ฉด, ๋”ฅ๋Ÿฌ๋‹์€ ๋›ฐ์–ด๋‚œ ์˜ˆ์ธก๋ ฅ์— ๋น„ํ•ด ๊ทธ ๋‚ด๋ถ€ ๊ธฐ์ œ๋Š” ์•Œ ์ˆ˜ ์—†๋Š” ๋ธ”๋ž™๋ฐ•์Šค์˜€๊ธฐ ๋•Œ๋ฌธ์ž…๋‹ˆ๋‹ค.
์‹œ์žฅ์€ ํ•ด์„๋ณด๋‹ค ์˜ˆ์ธก์˜ ํšจ์œจ์„ฑ์— ์†์„ ๋“ค์–ด์ฃผ์—ˆ์ง€๋งŒ, ์ „๋ฌธ๊ฐ€๋กœ์„œ์˜ ๊ฐˆ์ฆ์€ ์—ฌ์ „ํ–ˆ์Šต๋‹ˆ๋‹ค.

๊ทธ๋Ÿฐ๋ฐ CoT๋ฅผ ํ†ตํ•ด์„œ๋ผ๋ฉด "AI๊ฐ€ ์™œ ์ด๋Ÿฐ ๋‹ต์„ ๋‚ด๋†“์•˜๋Š”๊ฐ€?"์— ๋Œ€ํ•œ ํ•ด๋‹ต์„ ์ค„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
์ด๋ฅผ ํ†ตํ•ด์„œ ๋”ฅ๋Ÿฌ๋‹์˜ ๊ณ ์งˆ์ ์ธ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•  ์ˆ˜ ์žˆ์œผ๋ฉฐ, ์ €์™€ ๊ฐ™์€ ํ†ต๊ณ„์  ์‚ฌ๊ณ ๋ฅผ ํ•˜๋Š” ์ด๋“ค๋„ ๋‚ฉ๋“ํ•  ์ˆ˜ ์žˆ๋Š” ์„ค๋ช… ๊ฐ€๋Šฅํ•œ ์‹ ๋ขฐ์„ฑ์„ ๊ฐ€์ง€๊ฒŒ ๋˜์—ˆ์Šต๋‹ˆ๋‹ค.
ํŠนํžˆ ์˜๋ฃŒ, ๋ฒ•๋ฅ , ๊ธˆ์œต ๋“ฑ ๊ทผ๊ฑฐ๊ฐ€ ์ƒ๋ช…์ธ ์ „๋ฌธ ๋ถ„์•ผ์—์„œ CoT๋Š” ๋‹จ์ˆœํ•œ ๊ธฐ์ˆ ์„ ๋„˜์–ด ํ•ต์‹ฌ์ ์ธ ์•ˆ์ „์žฅ์น˜ ์—ญํ• ์„ ์ˆ˜ํ–‰ํ•  ๊ฒƒ์ž…๋‹ˆ๋‹ค.

The cost and time problems, I believe, can gradually be solved by hardware acceleration and model miniaturization.
The ability to reason logically, on the other hand, strikes me as an irreplaceable value, and I am eager to see where this direction goes.

Course complete!