---
title: "XGBoost"
output:
  word_document:
    toc: yes
  html_document:
    df_print: paged
  html_notebook:
    number_sections: yes
    toc: yes
---
# XGBoost
## Set-up
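```{r}
# Gradient boosting, sparse matrices, ROC analysis, model utilities, plotting
library(xgboost)
library(Matrix)
library(pROC)
library(caret)
library(ggplot2)
# Register the dmlc drat repository (needs the drat package installed)
drat:::addRepo("dmlc")
```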
## Read and Clean Data
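```{r}
# Read the training data, treating the literal string "NULL" as missing
train <- read.csv("TRAIN_Numeric.csv", header = TRUE, na.strings = "NULL")
# Drop every row that contains a missing value
train <- na.omit(train)

# Columns to remove before modelling
cols_to_drop <- c("NA_NULL", "NA_", "NA_PINK", "NA_WI", "HASPRIME")
# Return `data` without the named columns (always as a data frame)
drop_columns <- function(data, cols) {
  data[, !names(data) %in% cols, drop = FALSE]
}
train <- drop_columns(train, cols_to_drop)

# Sanity check: should print 0 after na.omit()
sum(is.na(train))
```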
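The helper above removes columns by name. A minimal sketch of the same idea on a toy data frame follows; the column values here are hypothetical and only stand in for the real training file. It shows two details that are easy to miss: names that are not present are silently ignored, and `drop = FALSE` keeps the result a data frame even when a single column survives.

```r
# Toy data frame standing in for the real training data (hypothetical values)
toy <- data.frame(
  IsBadBuy   = c(0, 1, 0),
  VehicleAge = c(3, 5, 2),
  NA_PINK    = c(0, 0, 1)
)

# Drop the named columns; drop = FALSE keeps a data frame
# even if only one column remains
drop_cols <- function(data, cols) {
  data[, !names(data) %in% cols, drop = FALSE]
}

# "HASPRIME" is not a column of `toy`, so it is simply ignored
kept <- drop_cols(toy, c("NA_PINK", "HASPRIME"))
names(kept)

# Only IsBadBuy survives, but the result is still a data frame
one_col <- drop_cols(toy, c("VehicleAge", "NA_PINK"))
is.data.frame(one_col)
```

Without `drop = FALSE`, the second call would return a plain vector, and later calls such as `subset()` or `colnames()` would behave differently.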
## Data Segmentation
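```{r}
# Fix the RNG seed so the split is reproducible (the value is arbitrary)
set.seed(123)
full_data <- train
# Assign each row to group 1 (train, about 70%) or group 2 (test, about 30%)
ind <- sample(2, nrow(full_data), replace = TRUE, prob = c(0.7, 0.3))
train <- full_data[ind == 1, ]
test <- full_data[ind == 2, ]
# Separate the predictors from the IsBadBuy label
train_x <- subset(train, select = -IsBadBuy)
train_y <- subset(train, select = IsBadBuy)
test_x <- subset(test, select = -IsBadBuy)
test_y <- subset(test, select = IsBadBuy)
```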
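The split above draws a group label for every row independently, so the resulting proportions are only approximately 70/30 and change from run to run unless a seed is set. As an alternative sketch (not the method used in this notebook), sampling row indices without replacement gives an exact split. The data frame, the seed, and the names `toy`, `toy_train`, and `toy_test` are assumptions made for illustration.

```r
set.seed(42)  # arbitrary seed, chosen only so the example is repeatable

# Hypothetical data: 100 rows with one predictor and a binary label
toy <- data.frame(x = rnorm(100), IsBadBuy = rbinom(100, 1, 0.2))

n <- nrow(toy)
# Pick exactly 70% of the row indices, without replacement
train_idx <- sample.int(n, size = round(0.7 * n))

toy_train <- toy[train_idx, ]
toy_test  <- toy[-train_idx, ]

# Row counts of the two partitions
c(train = nrow(toy_train), test = nrow(toy_test))
```

Because the indices are drawn without replacement, no row appears in both partitions.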
## Fit and Predict
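```{r}
# Fit a boosted-tree classifier for the binary IsBadBuy label
bst <- xgboost(as.matrix(train_x), label = train_y$IsBadBuy,
               max_depth = 20, nrounds = 20, eta = 0.6,
               objective = "binary:logistic", eval_metric = "logloss")
# Predicted probability of IsBadBuy == 1 for each test row
y_predict <- predict(bst, as.matrix(test_x))
# Threshold the probabilities; 0.00125 is far below the usual 0.5 cutoff
y_predict_binary <- ifelse(y_predict > 0.00125, 1, 0)
y_predict_binary <- as.matrix(y_predict_binary)
colnames(y_predict_binary) <- "IsBadBuy"
```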
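The chunk above produces probabilities and thresholded labels but does not score them against the held-out labels. The sketch below shows one way this could be done in base R: a confusion matrix, accuracy, and AUC computed with the rank (Mann-Whitney) formulation. The vectors `actual` and `prob` are hypothetical stand-ins for `test_y$IsBadBuy` and `y_predict`, and the 0.5 cutoff is an assumption for illustration. The `pROC` package offers `roc()` and `auc()` for the same purpose.

```r
# Hypothetical labels and predicted probabilities, standing in for
# test_y$IsBadBuy and y_predict
actual <- c(0, 0, 1, 1, 0, 1, 0, 0)
prob   <- c(0.10, 0.40, 0.80, 0.35, 0.20, 0.90, 0.05, 0.60)

threshold <- 0.5  # assumed cutoff, for illustration only
predicted <- as.integer(prob > threshold)

# Confusion matrix: rows are predictions, columns are true labels
conf <- table(
  predicted = factor(predicted, levels = c(0, 1)),
  actual    = factor(actual, levels = c(0, 1))
)
accuracy <- sum(diag(conf)) / sum(conf)

# AUC via the rank (Mann-Whitney) formulation: the probability that a
# randomly chosen positive is scored above a randomly chosen negative
r     <- rank(prob)
n_pos <- sum(actual == 1)
n_neg <- sum(actual == 0)
auc   <- (sum(r[actual == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

conf
accuracy
auc
```

AUC does not depend on the threshold, so it is a useful complement to accuracy when the cutoff itself is still being tuned.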
## Result
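```{r}
# Feature importance, labelled with the predictor column names
feature_names <- colnames(train_x)
importance_matrix <- xgb.importance(feature_names = feature_names, model = bst)
importance_matrix
# Plot the 15 most important features
xgb.plot.importance(importance_matrix, top_n = 15)
```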