Topic: Do you play LoL? Do just 'this' and you win every time!
- Finding the League of Legends winning formula through data


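About the data

- This post uses the League of Legends Diamond Ranked Games (10 min) dataset.

- It uses the following single csv file.
high_diamond_ranked_10min.csv

- The columns of the file are as follows (xxx is either blue or red).
gameId: unique ID of the game
blueWins: whether the blue team won (0: loss, 1: win)
xxxWardsPlaced: number of wards placed by team xxx
xxxWardsDestroyed: number of wards destroyed by team xxx
xxxFirstBlood: whether team xxx got the first kill
xxxKills: number of kills by team xxx
xxxDeaths: number of deaths of team xxx
xxxAssists: number of assists by team xxx
xxxEliteMonsters: number of elite monsters killed by team xxx
xxxDragons: number of dragons killed by team xxx
xxxHeralds: number of heralds killed by team xxx
xxxTowersDestroyed: number of towers destroyed by team xxx
xxxTotalGold: total gold earned by team xxx
xxxAvgLevel: average level of team xxx
xxxTotalExperience: total experience earned by team xxx
xxxTotalMinionsKilled: total minions killed by team xxx
xxxTotalJungleMinionsKilled: total jungle minions killed by team xxx
xxxGoldDiff: difference in gold earned between team xxx and the other team
xxxExperienceDiff: difference in experience earned between team xxx and the other team
xxxCSPerMin: CS score per minute of team xxx
xxxGoldPerMin: gold earned per minute by team xxx

Data source: https://www.kaggle.com/bobbyscience/league-of-legends-diamond-ranked-games-10-min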
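Several of these columns appear to be derived from others. In the sample rows shown in this post, the per-minute columns equal the 10-minute totals divided by 10. Here is a minimal sketch of that check, using two red-team rows copied from the df.head() output of this post. The helper name `per_minute_matches` is made up for illustration.

```python
import pandas as pd

# Two rows of red-team values copied from the df.head() output in this post.
sample = pd.DataFrame({
    "redTotalGold": [16567, 17620],
    "redGoldPerMin": [1656.7, 1762.0],
    "redTotalMinionsKilled": [197, 240],
    "redCSPerMin": [19.7, 24.0],
})

def per_minute_matches(frame, total_col, per_min_col, minutes=10):
    """Return True if per_min_col equals total_col / minutes in every row."""
    diff = (frame[total_col] / minutes - frame[per_min_col]).abs()
    return bool((diff < 1e-6).all())

gold_ok = per_minute_matches(sample, "redTotalGold", "redGoldPerMin")
cs_ok = per_minute_matches(sample, "redTotalMinionsKilled", "redCSPerMin")
print(gold_ok, cs_ok)
```

Columns that are exact rescalings of one another carry the same information, which matters later when redundant features are dropped.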

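Final goals

- Putting everyday data to use
- Understanding how to gain insight through data visualization
- Learning how to train models with Scikit-learn
- Understanding how to draw insight from a trained model
- Classifying LoL ranked game results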

Step 1. Preparing the dataset

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

Reading the csv file with the Pandas library

# Read the csv file with pd.read_csv()
df = pd.read_csv('high_diamond_ranked_10min.csv')

Step 2. EDA and basic statistical analysis

Problem 4. Analyzing each column of the DataFrame

# Analyze the columns with the methods DataFrame provides (head(), info(), describe())
df.head()

df.describe(include=[np.number])

df.describe(exclude=[object])

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9879 entries, 0 to 9878
Data columns (total 40 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   gameId                        9879 non-null   int64  
 1   blueWins                      9879 non-null   int64  
 2   blueWardsPlaced               9879 non-null   int64  
 3   blueWardsDestroyed            9879 non-null   int64  
 4   blueFirstBlood                9879 non-null   int64  
 5   blueKills                     9879 non-null   int64  
 6   blueDeaths                    9879 non-null   int64  
 7   blueAssists                   9879 non-null   int64  
 8   blueEliteMonsters             9879 non-null   int64  
 9   blueDragons                   9879 non-null   int64  
 10  blueHeralds                   9879 non-null   int64  
 11  blueTowersDestroyed           9879 non-null   int64  
 12  blueTotalGold                 9879 non-null   int64  
 13  blueAvgLevel                  9879 non-null   float64
 14  blueTotalExperience           9879 non-null   int64  
 15  blueTotalMinionsKilled        9879 non-null   int64  
 16  blueTotalJungleMinionsKilled  9879 non-null   int64  
 17  blueGoldDiff                  9879 non-null   int64  
 18  blueExperienceDiff            9879 non-null   int64  
 19  blueCSPerMin                  9879 non-null   float64
 20  blueGoldPerMin                9879 non-null   float64
 21  redWardsPlaced                9879 non-null   int64  
 22  redWardsDestroyed             9879 non-null   int64  
 23  redFirstBlood                 9879 non-null   int64  
 24  redKills                      9879 non-null   int64  
 25  redDeaths                     9879 non-null   int64  
 26  redAssists                    9879 non-null   int64  
 27  redEliteMonsters              9879 non-null   int64  
 28  redDragons                    9879 non-null   int64  
 29  redHeralds                    9879 non-null   int64  
 30  redTowersDestroyed            9879 non-null   int64  
 31  redTotalGold                  9879 non-null   int64  
 32  redAvgLevel                   9879 non-null   float64
 33  redTotalExperience            9879 non-null   int64  
 34  redTotalMinionsKilled         9879 non-null   int64  
 35  redTotalJungleMinionsKilled   9879 non-null   int64  
 36  redGoldDiff                   9879 non-null   int64  
 37  redExperienceDiff             9879 non-null   int64  
 38  redCSPerMin                   9879 non-null   float64
 39  redGoldPerMin                 9879 non-null   float64
dtypes: float64(6), int64(34)
memory usage: 3.0 MB
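
The info() output shows 9879 non-null values in every column, so there is nothing to impute. The same check, together with the class balance of blueWins, can be done in code. This is a minimal sketch on a small made-up frame (the values are illustrative, not taken from the dataset).

```python
import pandas as pd

# Tiny made-up frame standing in for df; the values are illustrative only.
toy = pd.DataFrame({
    "blueWins": [0, 1, 1, 0, 1, 0],
    "blueKills": [4, 9, 7, 3, 8, 5],
})

# Number of missing values per column (all zeros means nothing to impute).
missing_per_column = toy.isnull().sum()

# Share of each class in the target; values near 0.5 mean balanced classes.
class_share = toy["blueWins"].value_counts(normalize=True).sort_index()

print(missing_per_column)
print(class_share)
```

On the real data, the describe() output in this post reports a blueWins mean of 0.499, which suggests the two classes are close to evenly split.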

Problem 5. Visualizing the column correlations as a heatmap

df.corr().head()

With this many columns, the full heatmap is not intuitive

# Visualize Pearson's correlation with DataFrame's corr() method and Seaborn's heatmap() method

fig = plt.figure(figsize=(10, 10))
ax = sns.heatmap(df.corr(), annot=False, linewidths=.5)
plt.show()

With so many columns it is hard to tell them apart, so plot only against the dependent variable -> readable at a glance

# Visualize Pearson's correlation with DataFrame's corr() method and Seaborn's heatmap() method
fig = plt.figure(figsize=(4, 10))
sns.heatmap(df.corr()[['blueWins']], annot=True)

<matplotlib.axes._subplots.AxesSubplot at 0x2a3f48f7eb0>

df.columns

Index(['gameId', 'blueWins', 'blueWardsPlaced', 'blueWardsDestroyed',
       'blueFirstBlood', 'blueKills', 'blueDeaths', 'blueAssists',
       'blueEliteMonsters', 'blueDragons', 'blueHeralds',
       'blueTowersDestroyed', 'blueTotalGold', 'blueAvgLevel',
       'blueTotalExperience', 'blueTotalMinionsKilled',
       'blueTotalJungleMinionsKilled', 'blueGoldDiff', 'blueExperienceDiff',
       'blueCSPerMin', 'blueGoldPerMin', 'redWardsPlaced', 'redWardsDestroyed',
       'redFirstBlood', 'redKills', 'redDeaths', 'redAssists',
       'redEliteMonsters', 'redDragons', 'redHeralds', 'redTowersDestroyed',
       'redTotalGold', 'redAvgLevel', 'redTotalExperience',
       'redTotalMinionsKilled', 'redTotalJungleMinionsKilled', 'redGoldDiff',
       'redExperienceDiff', 'redCSPerMin', 'redGoldPerMin'],
      dtype='object')

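### The four columns below (besides blueWins itself) have a correlation above 0.4 with the dependent variable
### blueTotalGold, blueGoldDiff, blueExperienceDiff, blueGoldPerMin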

df.columns[df.corr()['blueWins']>0.4]

Index(['blueWins', 'blueTotalGold', 'blueGoldDiff', 'blueExperienceDiff',
       'blueGoldPerMin'],
      dtype='object')
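
The `> 0.4` filter only catches positive correlations. Columns that are strongly negatively correlated with blueWins are skipped (the correlation table in this post shows redGoldDiff at -0.511, for example). Here is a sketch that ranks by absolute correlation instead, on made-up data. `strong_correlates` is a hypothetical helper, not part of pandas.

```python
import pandas as pd

def strong_correlates(frame, target, threshold=0.4):
    """Columns whose absolute Pearson correlation with target exceeds threshold."""
    corr = frame.corr()[target].drop(target)
    selected = corr[corr.abs() > threshold]
    # Sort by strength, strongest first, while keeping the sign visible.
    return selected.reindex(selected.abs().sort_values(ascending=False).index)

# Made-up toy data: 'lead' rises with wins, 'deficit' is its mirror image.
toy = pd.DataFrame({
    "blueWins": [0, 0, 0, 1, 1, 1],
    "lead":     [-3, -1, -2, 2, 3, 1],
    "deficit":  [3, 1, 2, -2, -3, -1],
    "noise":    [1, -1, 0, 0, 1, -1],
})

result = strong_correlates(toy, "blueWins")
print(result)
```

Applied to the real frame, the call would be `strong_correlates(df, 'blueWins')`.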

Problem 6. Visualizing the relationship between each column and winning

df.head()

df.select_dtypes(include='number')

Binary variables get a count plot

The blue team lost more often when it did not kill a dragon.. not a big finding

# Use Seaborn's countplot() and histplot() to visualize the relationship between each column and win/loss

sns.countplot(x='redTowersDestroyed',hue='blueWins',data=df)

<matplotlib.axes._subplots.AxesSubplot at 0x2a396a5f970>
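
A count plot can be backed by numbers: a win-rate table per category says the same thing more precisely. This is a sketch with `pd.crosstab` on made-up data. With the real frame, pass columns of df such as blueDragons and blueWins.

```python
import pandas as pd

# Made-up data: whether blue took a dragon and whether blue won.
toy = pd.DataFrame({
    "blueDragons": [0, 0, 0, 0, 1, 1, 1, 1],
    "blueWins":    [0, 0, 0, 1, 0, 1, 1, 1],
})

# Rows: dragon taken or not. Columns: loss/win. Each row sums to 1.
win_rate = pd.crosstab(toy["blueDragons"], toy["blueWins"], normalize="index")
print(win_rate)
```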

Numeric values get a hist plot

sns.histplot(x='blueExperienceDiff',hue='blueWins',data=df)

<matplotlib.axes._subplots.AxesSubplot at 0x2a392275f10>

To view two variables together with the dependent variable, use a joint plot

It is only natural that blue has more kills when it wins...

sns.jointplot(x='blueKills', y='blueGoldDiff', data=df, hue='blueWins')

<seaborn.axisgrid.JointGrid at 0x2a392206e50>

Step 3. Data preprocessing for model training

Problem 7. Standardizing numeric data with StandardScaler

from sklearn.preprocessing import StandardScaler

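X_num = df.select_dtypes(include='number')
X_cat = df.select_dtypes(include='category')
y = df['blueWins']
Normally the columns would be split by dtype like this, but every column here turns out to be numeric (int or float).... so either list the columns by hand or convert the dtypes beforehand, pick one!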
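
One way to make `select_dtypes` split the columns as intended is to cast the 0/1 flag columns to the `category` dtype first. This is a minimal sketch on a made-up frame. Which columns count as categorical is an assumption made here for illustration.

```python
import pandas as pd

# Made-up rows; only the column names come from the dataset.
toy = pd.DataFrame({
    "blueFirstBlood": [1, 0, 0, 1],
    "blueDragons": [0, 1, 0, 1],
    "blueKills": [9, 5, 7, 4],
    "blueTotalGold": [17000, 15000, 16000, 14000],
})

flag_cols = ["blueFirstBlood", "blueDragons"]  # assumed categorical columns

# astype with a dict converts only the listed columns.
toy = toy.astype({col: "category" for col in flag_cols})

X_num = toy.select_dtypes(include="number")
X_cat = toy.select_dtypes(include="category")
print(list(X_num.columns), list(X_cat.columns))
```

The post takes the other route further down and lists the numeric and categorical columns by hand, which works just as well for a fixed dataset.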

Drop useless variables up front: those that are multicollinear with one another, or meaningless ones like gameId

df[['gameId', 'redFirstBlood', 'redKills', 'redDeaths',
       'redTotalGold', 'redTotalExperience', 'redGoldDiff',
       'redExperienceDiff']]

Standardization has been covered repeatedly in earlier kernels, so a detailed explanation is omitted here

from sklearn.preprocessing import StandardScaler

# Standardize the numeric data with StandardScaler
# Hint) Drop unnecessary columns to avoid multicollinearity.
df.drop(['gameId', 'redFirstBlood', 'redKills', 'redDeaths',
       'redTotalGold', 'redTotalExperience', 'redGoldDiff',
       'redExperienceDiff'], axis=1, inplace=True)
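
Several of the dropped columns look like exact mirrors of blue-team columns: one team's kills are the other team's deaths, and redGoldDiff is the negative of blueGoldDiff. Such pairs have a Pearson correlation of +1 or -1. Here is a sketch that lists them automatically, on made-up data. `redundant_pairs` is a hypothetical helper, and the 0.999 threshold is an arbitrary choice.

```python
import pandas as pd

def redundant_pairs(frame, threshold=0.999):
    """Return (col_a, col_b, corr) for column pairs with |corr| >= threshold."""
    corr = frame.corr()
    cols = list(corr.columns)
    pairs = []
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            if abs(corr.loc[a, b]) >= threshold:
                pairs.append((a, b, round(float(corr.loc[a, b]), 3)))
    return pairs

# Made-up rows that mimic the mirrored structure of the real columns.
toy = pd.DataFrame({
    "blueKills":    [9, 5, 7, 4, 6],
    "redDeaths":    [9, 5, 7, 4, 6],
    "blueGoldDiff": [600, -2900, -1200, -1300, -1000],
    "redGoldDiff":  [-600, 2900, 1200, 1300, 1000],
})

pairs = redundant_pairs(toy)
print(pairs)
```

Keeping only one column from each such pair removes the perfect collinearity without losing information.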

X_num = df[['blueWardsPlaced', 'blueWardsDestroyed', 
       'blueKills', 'blueDeaths', 'blueAssists', 'blueEliteMonsters',
       'blueTowersDestroyed', 'blueTotalGold',
       'blueAvgLevel', 'blueTotalExperience', 'blueTotalMinionsKilled',
       'blueTotalJungleMinionsKilled', 'blueGoldDiff', 'blueExperienceDiff',
       'blueCSPerMin', 'blueGoldPerMin', 'redWardsPlaced', 'redWardsDestroyed',
       'redAssists', 'redEliteMonsters', 'redTowersDestroyed', 'redAvgLevel', 'redTotalMinionsKilled',
       'redTotalJungleMinionsKilled', 'redCSPerMin', 'redGoldPerMin']]
X_cat = df[['blueFirstBlood', 'blueDragons', 'blueHeralds', 'redDragons', 'redHeralds']]

# Fit the scaler on the numeric columns and standardize them (transform() returns a NumPy array)
scaler = StandardScaler()
scaler.fit(X_num)
X_scaled = scaler.transform(X_num)

X_scaled

array([[ 0.31699566, -0.37927514,  0.93530086, ...,  0.36768454,
        -0.9287406 ,  0.05229268],
       [-0.57099219, -0.83906887, -0.39321635, ...,  0.06850362,
         1.0337835 ,  0.75861871],
       [-0.40449447, -1.2988626 ,  0.27104225, ..., -2.32494376,
        -0.65490002,  0.5339091 ],
       ...,
       [ 0.03949946, -0.83906887, -0.06108705, ...,  0.86631941,
         1.9922255 ,  1.22749041],
       [-0.45999371,  0.54031232, -1.38960425, ..., -1.12822007,
         1.35326417, -0.79892075],
       [-0.23799674, -1.2988626 , -0.06108705, ..., -0.52985823,
        -0.74618022, -0.77141898]])
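
For reference, StandardScaler replaces each value x with z = (x - mean) / std, where the mean and the population standard deviation (ddof=0) are computed per column. This is a short check of that formula against scikit-learn, on made-up numbers.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up values: 4 rows, 2 columns.
values = np.array([[28.0, 9.0], [12.0, 5.0], [15.0, 7.0], [43.0, 4.0]])

# Manual standardization: per-column mean and population std (ddof=0).
manual = (values - values.mean(axis=0)) / values.std(axis=0)

# The same transformation done by scikit-learn.
scaled = StandardScaler().fit_transform(values)

print(np.allclose(manual, scaled))
```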

X_scaled=pd.DataFrame(X_scaled,index=X_num.index,columns=X_num.columns)

X=pd.concat([X_scaled,X_cat],axis=1)
y=df['blueWins']

X

Problem 8. Splitting the training and test data

from sklearn.model_selection import train_test_split

# Split the training and test data with the train_test_split() function
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3,random_state=1)
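
Note that the scaler was fit on all rows before this split, so the test rows influenced the mean and standard deviation. The effect is likely small here, but a Pipeline avoids it by fitting the scaler on the training rows only. The sketch below uses synthetic data from `make_classification` (not the LoL data). `stratify=y` is an optional extra that keeps the class ratio similar in both splits.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for X and y.
X, y = make_classification(n_samples=500, n_features=8, random_state=1)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)

# The scaler inside the pipeline is fit on X_train only when fit() is called.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipe.fit(X_train, y_train)

test_accuracy = pipe.score(X_test, y_test)
print(round(test_accuracy, 3))
```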

Step 4. Training classification models

Problem 9. Creating and training a Logistic Regression model

from sklearn.linear_model import LogisticRegression

# Create and train the LogisticRegression model
model_lr = LogisticRegression(random_state=1,max_iter=100)
model_lr.fit(X_train,y_train)

LogisticRegression(random_state=1)
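
Under the hood, the fitted model turns a row into a win probability with the logistic function, p = 1 / (1 + exp(-(w·x + b))). Here is a sketch that reproduces `predict_proba` from `coef_` and `intercept_` for a binary target, on synthetic data rather than the LoL data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary classification data.
X, y = make_classification(n_samples=200, n_features=5, random_state=1)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Linear score w.x + b for every row, then squash it through the sigmoid.
scores = X @ model.coef_[0] + model.intercept_[0]
manual_proba = 1.0 / (1.0 + np.exp(-scores))

# Probability of class 1 as reported by scikit-learn.
sklearn_proba = model.predict_proba(X)[:, 1]
print(np.allclose(manual_proba, sklearn_proba))
```

This is why the coefficients can later be read as effects on the log-odds of a blue win.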

Problem 10. Evaluating the trained model

from sklearn.metrics import classification_report

Accuracy of about 0.74 (the classes are not imbalanced, so plain accuracy is a reasonable reference)

# Run predict and print the classification_report() result
pred =model_lr.predict(X_test)
print(classification_report(y_test,pred))

              precision    recall  f1-score   support

           0       0.74      0.75      0.74      1469
           1       0.75      0.74      0.74      1495

    accuracy                           0.74      2964
   macro avg       0.74      0.74      0.74      2964
weighted avg       0.74      0.74      0.74      2964
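
Each number in the report comes from the confusion matrix. Precision for class 1 is TP / (TP + FP), recall is TP / (TP + FN), and accuracy is the share of correct predictions. This is a worked sketch on ten made-up labels.

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]  # made-up labels
y_pred = [0, 0, 0, 1, 1, 1, 1, 1, 1, 0]  # made-up predictions

# For binary labels, ravel() yields TN, FP, FN, TP in that order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision = tp / (tp + fp)
recall = tp / (tp + fn)
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(precision, recall, accuracy)
```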

Problem 11. Creating and training an XGBoost model

from xgboost import XGBClassifier

# Create and train the XGBClassifier model
model_xgb = XGBClassifier()
model_xgb.fit(X_train,y_train)

C:\Users\Administrator\anaconda3\lib\site-packages\xgboost\sklearn.py:888: UserWarning: The use of label encoder in XGBClassifier is deprecated and will be removed in a future release. To remove this warning, do the following: 1) Pass option use_label_encoder=False when constructing XGBClassifier object; and 2) Encode your labels (y) as integers starting with 0, i.e. 0, 1, 2, ..., [num_class - 1].
  warnings.warn(label_encoder_deprecation_msg, UserWarning)

[10:53:48] WARNING: C:/Users/Administrator/workspace/xgboost-win64_release_1.3.0/src/learner.cc:1061: Starting in XGBoost 1.3.0, the default evaluation metric used with the objective 'binary:logistic' was changed from 'error' to 'logloss'. Explicitly set eval_metric if you'd like to restore the old behavior.

XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,
              importance_type='gain', interaction_constraints='',
              learning_rate=0.300000012, max_delta_step=0, max_depth=6,
              min_child_weight=1, missing=nan, monotone_constraints='()',
              n_estimators=100, n_jobs=8, num_parallel_tree=1, random_state=0,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', validate_parameters=1, verbosity=None)

Using the boosting model actually lowered accuracy slightly (0.74 -> 0.72) - it happens sometimes.. haha

Problem 12. Evaluating the trained model

# Run predict and print the classification_report() result
pred = model_xgb.predict(X_test)
print(classification_report(y_test,pred))

              precision    recall  f1-score   support

           0       0.72      0.72      0.72      1469
           1       0.72      0.73      0.73      1495

    accuracy                           0.72      2964
   macro avg       0.72      0.72      0.72      2964
weighted avg       0.72      0.72      0.72      2964
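
A gap of 0.74 versus 0.72 on a single split may be within noise. Cross-validation gives a mean and a spread per model, which makes the comparison fairer. The sketch below uses synthetic data and scikit-learn's GradientBoostingClassifier as a stand-in for XGBClassifier, so it only illustrates the procedure and says nothing about the results of this post.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for X and y.
X, y = make_classification(n_samples=300, n_features=8, random_state=1)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    # Stand-in for XGBClassifier; swap in the real model if xgboost is installed.
    "gradient_boosting": GradientBoostingClassifier(n_estimators=50, random_state=1),
}

cv_scores = {}
for name, model in models.items():
    # 5-fold cross-validated accuracy for each model.
    cv_scores[name] = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(name, cv_scores[name].mean().round(3), cv_scores[name].std().round(3))
```

If the difference between the two means is smaller than the fold-to-fold spread, the two models are hard to tell apart on accuracy alone.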

Step 5. In-depth analysis of the trained models

Problem 13. Reading feature relationships from the Logistic Regression coefficients

model_lr.coef_

array([[-0.03722312, -0.02280321, -0.16394534,  0.00258791, -0.03347599,
         0.11396334, -0.1138962 ,  0.30622615, -0.07285938,  0.03729869,
        -0.0372248 ,  0.01932872,  0.44377859,  0.42423486, -0.0372248 ,
         0.30622615, -0.02448855, -0.0133212 ,  0.07023347, -0.07907826,
         0.05399199, -0.00977213,  0.03659856,  0.03652838,  0.03659856,
        -0.4148868 ,  0.03551527,  0.17068559, -0.11330011, -0.13009491,
         0.06607254]])

model_coef = pd.DataFrame(data=model_lr.coef_[0], index=X.columns, columns=['Model Coefficient'])

model_coef

model_coef.sort_values(by='Model Coefficient', ascending=False, inplace=True)
model_coef

blueGoldDiff and blueExperienceDiff are the features that matter most in the model

# Plot the coef_ attribute of the Logistic Regression model
plt.bar(model_coef.index, model_coef['Model Coefficient'])
plt.xticks(rotation=90)
plt.grid()
plt.show()
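
Because the numeric features were standardized, each coefficient is the change in the log-odds of a blue win per one standard deviation of that feature. Exponentiating gives an odds ratio, which is easier to read. The sketch uses the two largest positive values in the coef_ output of this post, on the assumption that positions 12 and 13 of that array correspond to blueGoldDiff and blueExperienceDiff in X.columns.

```python
import math

# Coefficients read from the coef_ output in this post (positions 12 and 13,
# assumed to map to blueGoldDiff and blueExperienceDiff in X.columns).
coefficients = {
    "blueGoldDiff": 0.44377859,
    "blueExperienceDiff": 0.42423486,
}

# exp(coef) = multiplicative change in the odds of a blue win per +1 std.
odds_ratios = {name: math.exp(value) for name, value in coefficients.items()}

for name, ratio in odds_ratios.items():
    print(f"{name}: odds x {ratio:.2f} per +1 standard deviation")
```

Since gold-related features such as blueTotalGold, blueGoldDiff and blueGoldPerMin are strongly correlated with one another, they probably share credit, so individual coefficients should be read with some caution.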

Feature importance likewise shows that blueGoldDiff is important

Problem 14. Checking feature importance with the XGBoost model

# Plot the feature_importances_ attribute of the XGBoost model
fig = plt.figure(figsize=(10, 10))
plt.barh(X.columns, model_xgb.feature_importances_)

<BarContainer object of 31 artists>
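
A bar chart of importances is easier to read when the features are sorted. Wrapping the importances in a pandas Series makes that a one-liner. The sketch uses made-up importance values with feature names from this post. With the real model, `model_xgb.feature_importances_` and `X.columns` would take their place.

```python
import pandas as pd

# Made-up importance values for illustration only (not the model's output).
feature_names = ["blueWardsPlaced", "blueGoldDiff", "blueDragons", "blueExperienceDiff"]
importances = [0.05, 0.55, 0.15, 0.25]

# Label each importance with its feature name and sort, largest first.
ranked = pd.Series(importances, index=feature_names).sort_values(ascending=False)
top_features = list(ranked.index[:2])
print(ranked)

# To plot: ranked.sort_values().plot.barh() draws the largest bar on top.
```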

	gameId	blueWardsPlaced	blueWardsDestroyed	blueFirstBlood	blueKills	blueDeaths	blueAssists	blueEliteMonsters	blueDragons	...	redTowersDestroyed	redTotalGold	redAvgLevel	redTotalExperience	redTotalMinionsKilled	redTotalJungleMinionsKilled	redGoldDiff	redExperienceDiff	redCSPerMin	redGoldPerMin
0	4519157822	28	2	1	9	6	11	0	0	...	0	16567	6.800	17047	197	55	-643	8	19.700	1656.700
1	4523371949	12	1	0	5	5	5	0	0	...	1	17620	6.800	17438	240	52	2908	1173	24.000	1762.000
2	4521474530	15	0	0	7	11	4	1	1	...	0	17285	6.800	17254	203	28	1172	1033	20.300	1728.500
3	4524384067	43	1	0	4	5	5	1	0	...	0	16478	7.000	17961	235	47	1321	7	23.500	1647.800
4	4436033771	75	4	0	6	6	6	0	0	...	0	17404	7.000	18313	225	67	1004	-230	22.500	1740.400

	gameId	blueWins	blueWardsPlaced	blueWardsDestroyed	blueFirstBlood	blueKills	blueDeaths	blueAssists	blueEliteMonsters	blueDragons	...	redTowersDestroyed	redTotalGold	redAvgLevel	redTotalExperience	redTotalMinionsKilled	redTotalJungleMinionsKilled	redGoldDiff	redExperienceDiff	redCSPerMin	redGoldPerMin
count	9879.000	9879.000	9879.000	9879.000	9879.000	9879.000	9879.000	9879.000	9879.000	9879.000	...	9879.000	9879.000	9879.000	9879.000	9879.000	9879.000	9879.000	9879.000	9879.000	9879.000
mean	4500084044.846	0.499	22.288	2.825	0.505	6.184	6.138	6.645	0.550	0.362	...	0.043	16489.041	6.925	17961.730	217.349	51.313	-14.414	33.620	21.735	1648.904
std	27573278.491	0.500	18.019	2.175	0.500	3.011	2.934	4.065	0.626	0.481	...	0.217	1490.888	0.305	1198.584	21.912	10.028	2453.349	1920.370	2.191	149.089
min	4295358071.000	0.000	5.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	...	0.000	11212.000	4.800	10465.000	107.000	4.000	-11467.000	-8348.000	10.700	1121.200
25%	4483301169.000	0.000	14.000	1.000	0.000	4.000	4.000	4.000	0.000	0.000	...	0.000	15427.500	6.800	17209.500	203.000	44.000	-1596.000	-1212.000	20.300	1542.750
50%	4510920346.000	0.000	16.000	3.000	1.000	6.000	6.000	6.000	0.000	0.000	...	0.000	16378.000	7.000	17974.000	218.000	51.000	-14.000	28.000	21.800	1637.800
75%	4521733208.500	1.000	20.000	4.000	1.000	8.000	8.000	9.000	1.000	1.000	...	0.000	17418.500	7.200	18764.500	233.000	57.000	1585.500	1290.500	23.300	1741.850
max	4527990640.000	1.000	250.000	27.000	1.000	22.000	22.000	29.000	2.000	1.000	...	2.000	22732.000	8.200	22269.000	289.000	92.000	10830.000	9333.000	28.900	2273.200

	gameId	blueWins	blueWardsPlaced	blueWardsDestroyed	blueFirstBlood	blueKills	blueDeaths	blueAssists	blueEliteMonsters	blueDragons	...	redTowersDestroyed	redTotalGold	redAvgLevel	redTotalExperience	redTotalMinionsKilled	redTotalJungleMinionsKilled	redGoldDiff	redExperienceDiff	redCSPerMin	redGoldPerMin
gameId	1.000	0.001	0.005	-0.012	-0.012	-0.039	-0.013	-0.023	0.017	0.009	...	0.004	-0.011	-0.012	-0.021	-0.005	0.006	0.015	0.012	-0.005	-0.011
blueWins	0.001	1.000	0.000	0.044	0.202	0.337	-0.339	0.277	0.222	0.214	...	-0.104	-0.411	-0.352	-0.388	-0.212	-0.111	-0.511	-0.490	-0.212	-0.411
blueWardsPlaced	0.005	0.000	1.000	0.034	0.003	0.018	-0.003	0.033	0.020	0.018	...	-0.008	-0.006	-0.009	-0.013	-0.012	0.001	-0.016	-0.028	-0.012	-0.006
blueWardsDestroyed	-0.012	0.044	0.034	1.000	0.018	0.034	-0.073	0.068	0.042	0.041	...	-0.024	-0.067	-0.059	-0.057	0.040	-0.036	-0.079	-0.078	0.040	-0.067
blueFirstBlood	-0.012	0.202	0.003	0.018	1.000	0.269	-0.248	0.229	0.152	0.134	...	-0.070	-0.301	-0.183	-0.195	-0.157	-0.025	-0.379	-0.241	-0.157	-0.301

AI/ML Tech Blog

Predicting League of Legends wins from data, and the insights gained


	blueWardsPlaced	blueWardsDestroyed	blueKills	blueDeaths	blueAssists	blueEliteMonsters	blueTowersDestroyed	blueTotalGold	blueAvgLevel	blueTotalExperience	...	redAvgLevel	redTotalMinionsKilled	redTotalJungleMinionsKilled	redCSPerMin	redGoldPerMin	blueFirstBlood	blueDragons	blueHeralds	redDragons	redHeralds
0	0.317	-0.379	0.935	-0.047	1.071	-0.879	-0.210	0.460	-1.036	-0.741	...	-0.410	-0.929	0.368	-0.929	0.052	1	0	0	0	0
1	-0.571	-0.839	-0.393	-0.388	-0.405	-0.879	-0.210	-1.167	-1.036	-1.385	...	-0.410	1.034	0.069	1.034	0.759	0	0	0	1	1
2	-0.404	-1.299	0.271	1.657	-0.651	0.720	-0.210	-0.254	-1.691	-1.422	...	-0.410	-0.655	-2.325	-0.655	0.534	0	1	0	0	0
3	1.149	-0.839	-0.725	-0.388	-0.405	0.720	-0.210	-0.877	0.275	0.022	...	0.245	0.806	-0.430	0.806	-0.007	0	0	1	0	0
4	2.925	0.540	-0.061	-0.047	-0.159	-0.879	-0.210	-0.067	0.275	0.512	...	0.245	0.349	1.564	0.349	0.614	0	0	0	1	0
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
9874	-0.293	-0.379	0.271	-0.729	-0.405	0.720	-0.210	0.822	0.931	0.865	...	-0.410	0.532	-1.727	0.532	-0.834	1	1	0	0	0
9875	1.760	-1.299	-0.061	-0.729	0.333	0.720	-0.210	-0.173	0.931	1.105	...	0.245	-0.518	0.467	-0.518	-0.693	0	1	0	0	0
9876	0.039	-0.839	-0.061	0.294	-0.405	-0.879	-0.210	-0.391	0.275	0.087	...	1.555	1.992	0.866	1.992	1.227	0	0	0	1	0
9877	-0.460	0.540	-1.390	-1.070	-0.897	0.720	-0.210	-1.332	-1.036	-0.582	...	0.900	1.353	-1.128	1.353	-0.799	1	1	0	0	0
9878	-0.238	-1.299	-0.061	-0.047	-0.405	-0.879	-0.210	-0.155	0.275	-0.506	...	-0.410	-0.746	-0.530	-0.746	-0.771	1	0	0	1	0
