
mlops 2

Kubernetes, Extended Edition: Gateway API

Introduction: I use KServe for model serving at work. Installing KServe in Serverless mode used to be the standard, and we ran it alongside Istio and Knative. Starting with version 0.15, however, a RawDeployment mode installation (called Standard mode from 0.16 onward) became available, and it is reportedly the recommended installation method for LLM serving. Scale-to-zero, the core feature of Serverless mode, appears to have proven impractical for LLMs, because reloading gigabytes of model weights takes too long. Knative has many components and is hard to debug, so we had wanted to remove it, and it seems many others had similar concerns..

ํด๋ผ์šฐ๋“œ ๋„ค์ดํ‹ฐ๋ธŒ ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜ ๋””์ž์ธ ํŒจํ„ด: AI ํ”Œ๋žซํผ ๊ฐœ๋ฐœ์ž์˜ ์•„ํ‚คํ…์ฒ˜ ์„ฑ์ฐฐ

Introduction: Using Kubernetes does not automatically make you 'cloud native'. I develop a GPU-based platform for LLM training and serving. Because the workloads are GPU-intensive, we naturally run on Kubernetes, and I believed we were developing in a reasonably cloud-native environment. However, focusing only on implementation left the system tightly coupled, which caused problems. For example, I would like to separate the training and serving parts, but that is difficult with what has been built so far. Had we developed loosely coupled services with MSA (MicroService Architecture) in mind, I think the problem would have been easy to solve. I picked up this book because I wanted to go beyond simply using tools and understand the essence of design optimized for cloud environments. Cloud..