💻 Learning Python Data Analysis with Public Data

2025. 6. 1. 13:45 · Python(AI)

🕵 Analyzing Crime Times from Korean National Police Agency Data with Pandas & Matplotlib

"These days, you can do your own data analysis in Python!"
Study a little, and look at social issues through data. ✨


🎯 Today's goals

  • Load a CSV file downloaded from the Public Data Portal (공공데이터포털)
  • Process it with Pandas
  • Visualize crime counts by time slot with Matplotlib
  • Set up a font so that Korean text renders correctly!

🛠 Development environment

Tool        Details
IDE         VSCode + Jupyter extension
Language    Python 3.11
Libraries   pandas, matplotlib
Data        경찰청_범죄 발생 시간대 및 요일_20191231.csv (cp949 encoding)
 
pip install pandas matplotlib

🧪 Step 1 – Loading the CSV file

 
# %%
# pandas: DataFrame-based data analysis library
import pandas as pd

# %%
# Load the public-data CSV (the file is cp949-encoded, so pass the encoding explicitly)
data = pd.read_csv('C:\\Dev\\ESG\\06_01\\경찰청_범죄 발생 시간대 및 요일_20191231.csv', encoding='cp949')

# %%
# Print the whole DataFrame to check its structure
print(data)

# %%
# Column-wise totals (per time slot, per day of the week);
# numeric_only=True skips text columns such as the crime-type labels
print(data.sum(axis=0, numeric_only=True))

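✔️ encoding='cp949' is required for this file!
Most Korean CSV files from public data portals are not UTF-8. Without the argument, pandas assumes UTF-8 and read_csv fails with a UnicodeDecodeError.
Also, .sum(axis=0) totals each column, which shows the overall crime counts per time slot and per day of the week.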
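
If you are not sure which encoding a downloaded file uses, one option is to try a few candidates in order. The sketch below is only an illustration and not part of the original notebook: `detect_encoding` and its candidate list are hypothetical names, and the check only confirms that the bytes decode without error, not that the resulting text is meaningful. UTF-8 is tried first because it rejects cp949-encoded Korean almost immediately.

```python
from pathlib import Path


def detect_encoding(path, candidates=("utf-8", "cp949")):
    """Return the first candidate encoding that decodes the whole file without error."""
    raw = Path(path).read_bytes()
    for enc in candidates:
        try:
            raw.decode(enc)
            return enc
        except UnicodeDecodeError:
            continue
    raise ValueError(f"none of {candidates} could decode {path}")


# Hypothetical usage with the CSV from this post:
# data = pd.read_csv(csv_path, encoding=detect_encoding(csv_path))
```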


🔍 Step 2 – Inspecting the data structure

# %%
import matplotlib.pyplot as plt

# %%
# Check the column names and the row index
print(data.columns)
print(data.index)

👀 What the data looks like!
Crime types are in the rows, and time slots and days of the week are in the columns.
The eight time slots we want are:

# Columns for the time slots used in the analysis
time_columns = [
    '0시00분-02시59분', '03시00분-05시59분', '06시00분-08시59분',
    '09시00분-11시59분', '12시00분-14시59분', '15시00분-17시59분',
    '18시00분-20시59분', '21시00분-23시59분'
]
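
Before selecting these columns, it can help to confirm that every name really exists in the file, since a single mismatched character makes the selection fail with a KeyError. A small sketch follows. `find_missing` is a hypothetical helper, and the sample header list (including the `범죄대분류` column name) is made up for illustration rather than copied from the real file.

```python
def find_missing(expected, actual):
    """Return the names in `expected` that do not appear in `actual`, in order."""
    actual_set = set(actual)
    return [name for name in expected if name not in actual_set]


# Made-up header list standing in for list(data.columns)
sample_columns = ['범죄대분류', '0시00분-02시59분', '03시00분-05시59분']
wanted = ['0시00분-02시59분', '03시00분-05시59분', '06시00분-08시59분']

missing = find_missing(wanted, sample_columns)
print(missing)  # the one slot that is absent from sample_columns
```

With the real data you would call `find_missing(time_columns, data.columns)` and expect an empty list.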

🧠 Step 3 – Totaling crimes by time slot

# Total for each time slot, summed over every row
violent_by_time = data[time_columns].sum()

✅ violent_by_time is a Series whose index is the time slot and whose values are the number of incidents.
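
To make the shape of that result concrete, here is a toy example with made-up numbers (not the real police data). It also shows one way to restrict the sum to a single category. The column name `범죄대분류` and its values are assumptions about the file layout, so check `data.columns` before relying on them.

```python
import pandas as pd

# Made-up rows: one row per crime type, one column per time slot
toy = pd.DataFrame({
    '범죄대분류': ['강력범죄', '강력범죄', '절도범죄'],  # assumed category column
    '0시00분-02시59분': [10, 20, 30],
    '03시00분-05시59분': [1, 2, 3],
})
slots = ['0시00분-02시59분', '03시00분-05시59분']

# Column-wise sum over all rows -> Series indexed by time slot
total_by_time = toy[slots].sum()

# Same sum, but only for rows in the assumed '강력범죄' (violent crime) category
violent_only = toy.loc[toy['범죄대분류'] == '강력범죄', slots].sum()

print(total_by_time)
print(violent_only)
```

Because `data[time_columns].sum()` adds up every row, the resulting Series covers all crime types in the file. A filtered version like `violent_only` is what you would plot if only violent crimes are wanted.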


😭 Step 4 – Fixing broken Korean text (important!)

 
import matplotlib.font_manager as fm  # not used below; handy for listing installed fonts
import platform


# Pick a Korean font based on the operating system
if platform.system() == 'Windows':
    plt.rcParams['font.family'] = 'Malgun Gothic'  # default Korean font on Windows
elif platform.system() == 'Darwin':  # macOS
    plt.rcParams['font.family'] = 'AppleGothic'
else:  # Linux (e.g. Colab); the font must be installed first if it is missing
    plt.rcParams['font.family'] = 'NanumGothic'

# Keep the minus sign from rendering as a broken glyph
plt.rcParams['axes.unicode_minus'] = False

✔️ Without this setting, Korean text such as 시간대 (time slot), 강력범죄 (violent crime), and 건수 (count) is rendered as broken glyphs.
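
Setting `font.family` to a font that is not installed does not raise an error. Matplotlib falls back to its default font and the Korean text still breaks. One way to catch this early is to ask the font manager which fonts it knows about. This is a minimal sketch, assuming the same three candidate fonts as above; `pick_font` is a hypothetical helper name.

```python
import matplotlib.font_manager as fm
import matplotlib.pyplot as plt


def pick_font(candidates):
    """Return the first candidate font name that matplotlib has registered, else None."""
    installed = {font.name for font in fm.fontManager.ttflist}
    for name in candidates:
        if name in installed:
            return name
    return None


font = pick_font(['Malgun Gothic', 'AppleGothic', 'NanumGothic'])
if font is None:
    print('No candidate Korean font found; Korean labels may render as empty boxes.')
else:
    plt.rcParams['font.family'] = font
```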


📊 Step 5 – Visualization (Bar Chart)

# %%
plt.figure(figsize=(10, 5))  # figure size
plt.bar(violent_by_time.index, violent_by_time.values)  # draw the bar chart

# Title and axis labels (kept in Korean to match the data)
plt.title('강력범죄 시간대별 발생 건수 (전체 합산)')  # violent crime counts by time slot (all rows summed)
plt.xlabel('시간대')  # time slot
plt.ylabel('건수')  # count
plt.xticks(rotation=45)  # tilt the x-axis labels
plt.tight_layout()  # adjust margins automatically
plt.show()
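
If you also want the exact count printed on each bar, or an image file instead of an interactive window, matplotlib's object-oriented interface is one option. The sketch below uses made-up counts and English labels so that it runs without a Korean font. The output file name is hypothetical, and `bar_label` needs matplotlib 3.4 or newer.

```python
import os
import tempfile

import matplotlib
matplotlib.use('Agg')  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt

# Made-up counts for illustration only
slots = ['00-03', '03-06', '06-09']
counts = [120, 45, 80]

fig, ax = plt.subplots(figsize=(10, 5))
bars = ax.bar(slots, counts)
labels = ax.bar_label(bars)  # write each bar's height above it
ax.set_xlabel('time slot')
ax.set_ylabel('count')
fig.tight_layout()

# Saved to the temp directory here; change the path as needed
out_path = os.path.join(tempfile.gettempdir(), 'crime_by_time.png')
fig.savefig(out_path, dpi=150)
```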

📈 Ta-da!
The chart makes the pattern visible at a glance.
Crime happens in the early hours too… stay safe, everyone 🙈
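
The chart gives the overall picture, and the same Series can back it up with numbers. As a rough sketch with made-up totals standing in for `violent_by_time`, `idxmax` returns the busiest time slot and a simple division gives each slot's share of the total.

```python
import pandas as pd

# Made-up totals standing in for violent_by_time
totals = pd.Series({'00-03': 120, '03-06': 40, '06-09': 80, '21-24': 160})

peak_slot = totals.idxmax()                     # index label of the largest value
share = (totals / totals.sum() * 100).round(1)  # percentage of all incidents per slot

print(peak_slot)
print(share)
```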