Legendary Pokémon Classification

robin0309 2020. 5. 6. 16:27

%matplotlib inline

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

import warnings
warnings.filterwarnings("ignore")

df = pd.read_csv("https://raw.githubusercontent.com/yoonkt200/FastCampusDataset/master/Pokemon.csv")

df.head()

Feature Description

Name : Pokémon name
Type 1 : Pokémon type 1
Type 2 : Pokémon type 2
Total : Pokémon total stats (sum of Attack, Sp. Atk, Defense, Sp. Def, Speed and HP)
HP : Pokémon HP stat
Attack : Pokémon Attack stat
Defense : Pokémon Defense stat
Sp. Atk : Pokémon Sp. Atk stat
Sp. Def : Pokémon Sp. Def stat
Speed : Pokémon Speed stat
Generation : Pokémon generation
Legendary : whether the Pokémon is legendary
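The description above says Total is the sum of the six base stats. A minimal sketch of checking that claim, using three hand-typed rows (Bulbasaur, Charmander, Mewtwo) instead of the full CSV so that it runs offline. The names `sample`, `stat_columns` and `total_matches` are illustrative and do not come from the original notebook.

```python
import pandas as pd

# Three hand-typed rows standing in for the full dataset (illustrative sample).
sample = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Mewtwo"],
    "Total": [318, 309, 680],
    "HP": [45, 39, 106],
    "Attack": [49, 52, 110],
    "Defense": [49, 43, 90],
    "Sp. Atk": [65, 60, 154],
    "Sp. Def": [65, 50, 90],
    "Speed": [45, 65, 130],
})

stat_columns = ["HP", "Attack", "Defense", "Sp. Atk", "Sp. Def", "Speed"]

# Total should equal the row-wise sum of the six stats.
total_matches = sample[stat_columns].sum(axis=1).eq(sample["Total"])
print(total_matches.all())
```

On the full DataFrame the same expression can be applied to `df` directly. When the check holds, Total is an exact linear combination of the other six columns, so it adds no independent information for a linear model.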

2) EDA (Exploratory Data Analysis)

2-1) Exploring basic information

Exploring basic dataset information
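The code for this step is not present in the post. A minimal sketch of what a basic inspection usually looks like in pandas, run on a hypothetical four-row stand-in (`toy_df`) rather than the real `df`:

```python
import numpy as np
import pandas as pd

# Hypothetical four-row stand-in for df (the real data comes from Pokemon.csv).
toy_df = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Charizard", "Mewtwo"],
    "Type 1": ["Grass", "Fire", "Fire", "Psychic"],
    "Type 2": ["Poison", np.nan, "Flying", np.nan],
    "Total": [318, 309, 534, 680],
    "Generation": [1, 1, 1, 1],
    "Legendary": [False, False, False, True],
})

print(toy_df.shape)               # (rows, columns)
toy_df.info()                     # dtypes and non-null counts per column
missing = toy_df.isnull().sum()   # number of missing values per column
print(missing)
print(toy_df.describe())          # summary statistics for the numeric columns
```

The missing-value count is worth checking for Type 2 in particular, since the preprocessing further down treats a missing Type 2 as "single-type Pokémon".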

Exploring individual features

Visualizing the count per Generation
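The plot for this step did not survive in the post. One way to draw it, sketched on made-up Generation values (`toy_generation` is hypothetical; the real column is `df['Generation']`):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for running as a script; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up Generation values standing in for df['Generation'].
toy_generation = pd.Series([1, 1, 1, 2, 2, 3, 3, 3, 3], name="Generation")

# Count rows per generation and keep the bars in generation order.
generation_counts = toy_generation.value_counts().sort_index()

fig, ax = plt.subplots()
generation_counts.plot(kind="bar", ax=ax)
ax.set_xlabel("Generation")
ax.set_ylabel("Number of Pokémon")
```

`sort_index()` matters here: `value_counts()` orders by frequency by default, which would shuffle the generations on the x-axis.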

Visualizing the number of legendary Pokémon per Type 1
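The code for this step is also missing. A sketch of how the per-type counts could be computed, using invented rows (`toy` is hypothetical and its values are not taken from the dataset):

```python
import pandas as pd

# Invented rows; the real columns are df['Type 1'] and df['Legendary'].
toy = pd.DataFrame({
    "Type 1": ["Psychic", "Psychic", "Dragon", "Fire", "Fire", "Grass"],
    "Legendary": [True, False, True, True, False, False],
})

# Summing a boolean column counts the True values within each group.
legendary_per_type = (
    toy.groupby("Type 1")["Legendary"].sum().sort_values(ascending=False)
)
print(legendary_per_type)
```

Calling `legendary_per_type.plot(kind="bar")` on the result would turn the counts into a bar chart.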

Data preprocessing

- Data type conversion

df['Legendary'] = df['Legendary'].astype(int)
df['Generation'] = df['Generation'].astype(str)
# keep only the modeling columns; .copy() avoids SettingWithCopyWarning when columns are added later
preprocessed_df = df[['Type 1', 'Type 2', 'Total', 'HP', 'Attack',
                      'Defense', 'Sp. Atk', 'Sp. Def', 'Speed', 'Generation', 'Legendary']].copy()
preprocessed_df.head() # Generation is now a string and Legendary an int

# one-hot encoding example
encoded_df = pd.get_dummies(preprocessed_df['Type 1'])
encoded_df.head() # Type 1 converted to dummy variables (one-hot encoding)

# build the list of types for each Pokémon
def make_list(x1, x2):
    type_list = [x1]
    # Type 2 is NaN for single-type Pokémon; pd.notna is safer than an identity check against np.nan
    if pd.notna(x2):
        type_list.append(x2)
    return type_list

preprocessed_df['Type'] = preprocessed_df.apply(lambda x: make_list(x['Type 1'], x['Type 2']), axis=1)
preprocessed_df.head() # Type 1 and Type 2 combined into a single list

del preprocessed_df['Type 1']
del preprocessed_df['Type 2']
preprocessed_df.head() # Type 1 and Type 2 removed

# apply MultiLabelBinarizer
from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer()
# one 0/1 column per type; a dual-type Pokémon gets a 1 in two columns
preprocessed_df = preprocessed_df.join(pd.DataFrame(mlb.fit_transform(preprocessed_df.pop('Type')),
                                                    columns=mlb.classes_,
                                                    index=preprocessed_df.index))

preprocessed_df.head()
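To make the effect of MultiLabelBinarizer concrete, here is a small sketch on three hand-written type lists. The names `type_lists`, `toy_mlb` and `encoded` are illustrative only.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Each inner list plays the role of one row of the 'Type' column.
type_lists = [["Grass", "Poison"], ["Fire"], ["Fire", "Flying"]]

toy_mlb = MultiLabelBinarizer()
# fit_transform returns a 0/1 matrix with one column per distinct label,
# ordered as in toy_mlb.classes_ (sorted alphabetically).
encoded = pd.DataFrame(toy_mlb.fit_transform(type_lists), columns=toy_mlb.classes_)
print(encoded)
```

Unlike `pd.get_dummies` on Type 1 alone, this treats the primary and secondary type symmetrically: a Fire/Flying Pokémon and a Flying/Fire Pokémon would receive the same encoding.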

# apply one-hot encoding to 'Generation'
preprocessed_df = pd.get_dummies(preprocessed_df)
preprocessed_df.head()

from sklearn.preprocessing import StandardScaler

# feature standardization
scaler = StandardScaler()
scale_columns = ['Total', 'HP', 'Attack', 'Defense', 'Sp. Atk', 'Sp. Def', 'Speed']
preprocessed_df[scale_columns] = scaler.fit_transform(preprocessed_df[scale_columns])
preprocessed_df.head()

from sklearn.model_selection import train_test_split

# split the dataset into train and test sets
X = preprocessed_df.loc[:, preprocessed_df.columns != 'Legendary']
y = preprocessed_df['Legendary']
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=33)
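The code above standardizes the whole dataset before splitting, so the scaler's mean and variance are computed partly from test rows. A common alternative, sketched here on synthetic data and not taken from the original post, is to split first and fit the scaler on the training rows only. The `stratify` argument is an extra assumption: it keeps the share of the rare positive class the same in both splits. All `toy_*` names are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 40 rows, 2 numeric features, 4 positives (10%).
rng = np.random.default_rng(0)
toy_X = rng.normal(loc=70.0, scale=20.0, size=(40, 2))
toy_y = np.array([0] * 36 + [1] * 4)

# Split first; stratify keeps the 10% positive rate in both splits.
toy_X_train, toy_X_test, toy_y_train, toy_y_test = train_test_split(
    toy_X, toy_y, test_size=0.25, random_state=33, stratify=toy_y
)

# Fit the scaler on the training rows only, then reuse its statistics on the test rows.
toy_scaler = StandardScaler()
toy_X_train_scaled = toy_scaler.fit_transform(toy_X_train)
toy_X_test_scaled = toy_scaler.transform(toy_X_test)
```

With this ordering the training features have mean 0 and unit variance exactly, while the test features are only approximately standardized, which mirrors how the model would see new data.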

3-2) Training the Logistic Regression model

print(x_train.shape)
print(x_test.shape)

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Train LR model
lr = LogisticRegression(random_state=0)
lr.fit(x_train, y_train)

# classification prediction
y_pred = lr.predict(x_test) # predict labels for the test set

# classification result for test dataset
print("accuracy: %.2f" % accuracy_score(y_test, y_pred))
print("Precision : %.3f" % precision_score(y_test, y_pred))
print("Recall : %.3f" % recall_score(y_test, y_pred))
print("F1 : %.3f" % f1_score(y_test, y_pred)) # model evaluation

accuracy: 0.95
Precision : 0.615
Recall : 0.667
F1 : 0.640
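To show how these scores relate to one another, the sketch below recomputes precision, recall and F1 from raw counts. The counts are hypothetical: they were chosen because they reproduce the reported values, and they are not the post's actual confusion matrix, whose printed output is not shown.

```python
# Hypothetical counts chosen to match the reported scores; NOT the post's actual results.
tp, fp, fn = 8, 5, 4   # true positives, false positives, false negatives

precision = tp / (tp + fp)   # share of predicted legendaries that really are legendary
recall = tp / (tp + fn)      # share of real legendaries that the model found
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean of the two

print("Precision : %.3f" % precision)
print("Recall : %.3f" % recall)
print("F1 : %.3f" % f1)
```

The gap between the 0.95 accuracy and the much lower precision and recall is typical when the positive class is rare: accuracy is dominated by the many non-legendary rows that are easy to get right.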

from sklearn.metrics import confusion_matrix

# print confusion matrix
confmat = confusion_matrix(y_true=y_test, y_pred=y_pred)
print(confmat)
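For binary labels, scikit-learn lays the confusion matrix out as `[[TN, FP], [FN, TP]]`, with rows for the true class and columns for the predicted class. A small sketch on made-up labels (`toy_true` and `toy_pred` are hypothetical) shows how to unpack it:

```python
from sklearn.metrics import confusion_matrix

# Made-up labels: 1 = legendary, 0 = not legendary.
toy_true = [0, 0, 0, 0, 1, 1, 1, 0]
toy_pred = [0, 0, 1, 0, 1, 0, 1, 0]

toy_confmat = confusion_matrix(y_true=toy_true, y_pred=toy_pred)
print(toy_confmat)

# ravel() flattens the 2x2 matrix row by row: TN, FP, FN, TP.
tn, fp, fn, tp = toy_confmat.ravel()
print(tn, fp, fn, tp)
```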

3-3) Adjusting for class imbalance
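The body of this section did not survive in the post, so the author's approach is unknown. One common option, offered here only as an assumption, is to reweight the classes with `class_weight="balanced"` in LogisticRegression. The sketch uses synthetic data, and every `toy_*` name is hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils.class_weight import compute_class_weight

# Synthetic imbalanced data: 90 negatives, 10 positives, 3 features.
rng = np.random.default_rng(0)
toy_y = np.array([0] * 90 + [1] * 10)
toy_X = rng.normal(size=(100, 3)) + toy_y[:, None] * 1.5   # shift the positive rows

# "balanced" weights are n_samples / (n_classes * count of each class).
weights = compute_class_weight(class_weight="balanced", classes=np.array([0, 1]), y=toy_y)
print(dict(zip([0, 1], weights)))

# The same weighting applied inside the model: errors on the rare class cost more.
balanced_lr = LogisticRegression(class_weight="balanced", random_state=0)
balanced_lr.fit(toy_X, toy_y)
toy_pred = balanced_lr.predict(toy_X)
```

Reweighting usually trades some precision for higher recall on the rare class. Undersampling the majority class or oversampling the minority class are other options, and which one fits best depends on the evaluation metric that matters for the task.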
