Welcome! ๐Ÿ™‹โ€โ™‚๏ธ View more

Engineering ๐Ÿ’ป/BigData

[HIVE] ํ•˜์ด๋ธŒ-Python ์—ฐ๋™ (feat. sqlalchemy)

DeepFlame 2022. 4. 20. 23:17

 

ํ”„๋กœ์ ํŠธ๋ฅผ ์ง„ํ–‰ํ•˜๋ฉด์„œ Python์„ ํ†ตํ•ด Hive๋ฅผ ์ปจํŠธ๋กคํ•˜๋Š” ๋‹ˆ์ฆˆ๊ฐ€ ๋ฐœ์ƒํ–ˆ๋‹ค. 

sqlalchemy๋ฅผ ํ™œ์šฉํ•˜์—ฌ, create select insert๋ฅผ ๊ตฌํ˜„ํ–ˆ๋‹ค.

 

sqlalchemy ๋ž€?

Python์—์„œ ์‚ฌ์šฉ๊ฐ€๋Šฅํ•œ ORM(Object-relational maping)์ด๋‹ค. 

์ฆ‰, ๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค์˜ ๋ฐ์ดํ„ฐ๋ฅผ Objectํ•„๋“œ์™€ ๋งคํ•‘ํ•ด์ค€๋‹ค๋Š” ๊ฒƒ์ด๋‹ค. ์‚ฌ์‹ค DB์˜ ์Šคํ‚ค๋งˆ์™€ Object๋Š” ์„œ๋กœ ๊ธฐ์กด๋ถ€ํ„ฐ ํ˜ธํ™˜๊ฐ€๋Šฅ์„ฑ์„ ๋‘๊ณ  ๋งŒ๋“ค์–ด์ง„ ๊ฒƒ์ด ์•„๋‹ˆ๊ธฐ์— ๋ถˆ์ผ์น˜๊ฐ€ ๋ฐœ์ƒํ•˜๋Š”๋ฐ,, ORM์€ ๊ฐ์ฒด ๊ฐ„์˜ ๊ด€๊ณ„๋ฅผ ๋ฐ”ํƒ•์œผ๋กœ SQL๋ฌธ์„ ์ž๋™์œผ๋กœ ์ƒ์„ฑํ•˜์—ฌ ์ด๋Ÿฌํ•œ ๋ถˆ์ผ์น˜๋ฅผ ํ•ด๊ฒฐํ•œ๋‹ค.

์ด๋ฅผ ํ†ตํ•ด์„œ ์ฟผ๋ฆฌ๋ฅผ ์‹คํ–‰ํ•  ์ˆ˜๋„ ์žˆ๊ณ , Python์˜ Dataframe๊ณผ ๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค๊ฐ„์„ ์—ฐ๊ฒฐํ•  ์ˆ˜ ์žˆ๋‹ค.

ORM

 

 

Hive-Python ์—ฐ๋™

1. select

from sqlalchemy import create_engine
engine = create_engine('hive://localhost:10000')

# select๋ฌธ
df = pd.read_sql("select * from db.table", engine)

 

2. Create

from sqlalchemy import create_engine
engine = create_engine('hive://localhost:10000')

# Create
db = db_name
table = table_name
schema = '(`val1` string, ' \
         '`val2` Date, ' \
         '`val3` int) '

query = ('''CREATE TABLE IF NOT EXISTS %s.%s %s
PARTITIONED BY (year int, month int, day int)
STORED AS PARQUET
''') % (db, table, schema)
engine.execute(query)

# PARTITIONED BY: ํŒŒํ‹ฐ์…”๋‹์„ ํ†ตํ•ด์„œ ๋ฐ์ดํ„ฐ๋ฅผ ๋ถ„ํ• ํ•œ๋‹ค (๋ฐ์ดํ„ฐ ์–‘์„ ์ œํ•œํ•˜์—ฌ ์„ฑ๋Šฅ์„ ํ–ฅ์ƒ์‹œํ‚ฌ์ˆ˜ ์žˆ๋‹ค)
# STORED AS PARQUET: ์นผ๋Ÿผ๊ธฐ๋ฐ˜์œผ๋กœ ๋ฐ์ดํ„ฐ๋ฅผ ์ €์žฅํ•œ๋‹ค

 

3. Insert

from sqlalchemy import create_engine
engine = create_engine('hive://localhost:10000')

# Insert๋ฌธ
df.to_sql(schema=db, name=table, con=engine,
          index=False, method='multi', if_exists='append')
          
# index: df์˜ index๋ฅผ db์˜ ์นผ๋Ÿผ์œผ๋กœ ์ถ”๊ฐ€ํ• ์ง€ ์—ฌ๋ถ€
# method: multi์ผ ๊ฒฝ์šฐ, ๋‹ค์ˆ˜์˜ ํ–‰์„ ๋Œ€์ƒ์œผ๋กœ ํ•œ๋‹ค
# if_exists: ๋งŒ์•ฝ table์ด ์กด์žฌํ•  ๊ฒฝ์šฐ, ๋’ค์— ๋ฐ์ดํ„ฐ๋ฅผ ๋ถ™์ธ๋‹ค

 

 

์ด๋ฅผ ํด๋ž˜์Šคํ™”ํ•˜๋ฉด ์•„๋ž˜์™€ ๊ฐ™๋‹ค.

 

 

์ฐธ๊ณ 
https://moons08.github.io/programming/Hive_Partition/
https://moons08.github.io/programming/Hive_Table/#parquet
https://ulfrid.github.io/python/python-sqlalchemy/