[R Programming] Histograms and Cross-Tabulation After Aggregating MAU Statistics

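MAU statistics aggregation process

1. Extract every user_id that logged in at least once in December 2016

2. For those user_ids, extract the Joindate <- (first-login table OR character-creation table)

3. Create the install_date and log_month columns

4. Extract each user_id's max LV as of December 2016

5. Extract each user_id's device_type as of December 2016

6. Extract each user_id's country as of December 2016

7. Extract the payment data for December 2016

<------------ Collect and process the data above in Excel, keyed by the December 2016 user_id ----------->

c.f.) Other

8. Extract content usage data for December 2016

- PvP, Trial Battlefield (시련의전장), guild, arena, etc.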
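Steps 1 to 7 are done in Excel in this post, but the same assembly can be sketched in R. The example below is only an illustration under assumptions: the data frames `login`, `joined`, and `payment` and all of their values are made up, and only the column names follow the steps above.

```r
# Hypothetical extracts; the values are invented for illustration only.
login <- data.frame(user_id = c(1, 2, 3),
                    log_month = "2016-12",
                    stringsAsFactors = FALSE)
joined <- data.frame(user_id = c(1, 2, 3),
                     install_date = c("2016-12-03", "2016-10-15", "2016-12-20"),
                     stringsAsFactors = FALSE)
payment <- data.frame(user_id = c(1, 1, 3),
                      payment = c(1000, 2000, 500))

# Collapse payments to one row per user before joining
payment.total <- aggregate(payment ~ user_id, data = payment, FUN = sum)

# Left joins keep every active user, including those who never paid
mau.sketch <- merge(login, joined, by = "user_id", all.x = TRUE)
mau.sketch <- merge(mau.sketch, payment.total, by = "user_id", all.x = TRUE)

# Users without a payment row get 0 instead of NA
mau.sketch$payment[is.na(mau.sketch$payment)] <- 0

mau.sketch
```

Keeping one row per user_id matters here, because every aggregation later in the post counts rows of the MAU table.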

[R Programming]

# Read the MAU CSV file

mau <- read.csv("ik_mau.csv", header=TRUE, stringsAsFactors=FALSE)

head(mau)

# Create the install_month column (the first 7 characters of install_date)

mau$install_month <- substr(mau$install_date, 1,7)

head(mau)

# Classify users as new ("install") or existing for the month

mau$user.type <- ifelse(mau$install_month == mau$log_month, "install", "existing")

head(mau)
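To make the classification concrete, here is a small sketch on made-up rows. It assumes install_date starts with "YYYY-MM-DD" and log_month is stored as "YYYY-MM"; if the real CSV uses another format, the comparison would need adjusting. The `toy` data frame is hypothetical.

```r
# Made-up rows; only the column names come from the script above
toy <- data.frame(user_id = 1:3,
                  install_date = c("2016-12-03", "2016-10-15", "2016-12-20"),
                  log_month = "2016-12",
                  stringsAsFactors = FALSE)

# Characters 1 to 7 of "2016-12-03" are "2016-12"
toy$install_month <- substr(toy$install_date, 1, 7)

# A user who installed in the same month as the log month is new
toy$user.type <- ifelse(toy$install_month == toy$log_month, "install", "existing")

toy$user.type
# [1] "install"  "existing" "install"
```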

# Monthly payment totals for new and existing users

library("plyr")

mau.payment.summary <- ddply(mau, .(log_month, user.type), summarize, total.payment=sum(payment))
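For readers without plyr installed, roughly the same grouped sum can be sketched with base R's aggregate. This is an alternative technique, not the call used in this post, and the `toy` data is made up.

```r
# Made-up rows with the columns that the ddply call groups and sums over
toy <- data.frame(log_month = "2016-12",
                  user.type = c("install", "existing", "install", "existing"),
                  payment = c(1000, 0, 500, 3000),
                  stringsAsFactors = FALSE)

# Sum payment within each (log_month, user.type) group
toy.summary <- aggregate(payment ~ log_month + user.type, data = toy, FUN = sum)

# Rename the summed column to match the name used in the ddply call
names(toy.summary)[names(toy.summary) == "payment"] <- "total.payment"

toy.summary
```

One difference to keep in mind: the formula interface of aggregate drops rows with NA by default, while sum(payment) inside ddply returns NA for any group that contains one.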

# Visualize the monthly payment totals for new and existing users

library("ggplot2")

library("scales")

ggplot(mau.payment.summary, aes(x=log_month, y=total.payment, fill=user.type)) + geom_bar(stat="identity") + scale_y_continuous(labels=comma)

# Histogram of payment amounts for new users who paid

ggplot(mau[mau$payment > 0 & mau$user.type == "install",], aes(x=payment, fill=log_month)) + geom_histogram(position="dodge", binwidth=20000) + scale_x_continuous(labels=comma)

# Segment analysis (counts by level)

table(mau[, c("log_month", "lv")])

# Segment analysis (counts by region)

table(mau[, c("log_month", "region")])

# Segment analysis (counts by device)

table(mau[, c("log_month", "device_type")])

# Segment analysis (counts by device and region combined)

library(reshape2)

dcast(mau, log_month ~ device_type + region, value.var="user_id", length)
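The dcast call counts user_id rows for each device_type and region combination per log_month. For a single month, the same counts can be cross-checked with base R's table. The sketch below uses made-up rows, and the device and region values are assumptions.

```r
# Made-up rows; one row per user, as in the MAU table
toy <- data.frame(log_month = "2016-12",
                  device_type = c("iOS", "Android", "Android", "iOS", "Android"),
                  region = c("KR", "KR", "JP", "JP", "KR"),
                  stringsAsFactors = FALSE)

# Two-way cross-tabulation: rows are devices, columns are regions
device.region <- table(toy$device_type, toy$region,
                       dnn = c("device_type", "region"))
device.region

# Share of each region within a device (each row sums to 1)
prop.table(device.region, margin = 1)
```

The table form keeps device and region on separate axes, which is easier to read than the combined column names that dcast produces.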

# Compute daily user counts by device and convert the date type

# Requires one month of DAU data (extracted separately) in a data frame named dau

dau.device <- merge(dau, mau, by= c("user_id"))

nrow(dau)

nrow(dau.device)

dau.device.summary <- ddply(dau.device, .(log_date, device_type), summarize, dau=length(user_id))

head(dau.device.summary)

dau.device.summary$log_date <- as.Date(dau.device.summary$log_date)
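Comparing nrow(dau) with nrow(dau.device) is a check on the join: merge does an inner join by default, so DAU rows whose user_id is missing from mau are dropped, and duplicate user_ids in mau would multiply rows. A small sketch of that check on made-up data follows; `dau.toy` and `mau.toy` are hypothetical.

```r
# Made-up daily log and user table; user 4 is missing from the user table
dau.toy <- data.frame(log_date = c("2016-12-01", "2016-12-01",
                                   "2016-12-02", "2016-12-02"),
                      user_id = c(1, 2, 1, 4),
                      stringsAsFactors = FALSE)
mau.toy <- data.frame(user_id = c(1, 2, 3),
                      device_type = c("iOS", "Android", "iOS"),
                      stringsAsFactors = FALSE)

# The user table should have one row per user, or the join duplicates log rows
stopifnot(!anyDuplicated(mau.toy$user_id))

# Inner join: the row for user 4 is dropped
joined <- merge(dau.toy, mau.toy, by = "user_id")

# Users present in the daily log but absent from the user table
unmatched <- setdiff(dau.toy$user_id, mau.toy$user_id)

# Daily row counts per device, using base R instead of ddply
daily <- aggregate(user_id ~ log_date + device_type, data = joined, FUN = length)
names(daily)[names(daily) == "user_id"] <- "dau"

nrow(dau.toy)   # 4
nrow(joined)    # 3
unmatched       # 4
```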

# Time-series trend plot of daily user counts by device

library(ggplot2)

library(scales)

limits <- c(0, max(dau.device.summary$dau))

ggplot(dau.device.summary, aes(x=log_date, y=dau, col=device_type, lty=device_type, shape=device_type)) + geom_line(lwd=1) + geom_point(size=4) + scale_y_continuous(labels=comma, limits=limits)

# Compute DAU, convert the date type, and plot the time-series trend

library(ggplot2)

library(scales)

dau.summary <- ddply(dau, .(log_date), summarize, dau=length(user_id))

head(dau.summary)

limits <- c(0, max(dau.summary$dau))

dau.summary$log_date <- as.Date(dau.summary$log_date)

ggplot(dau.summary, aes(x=log_date, y=dau)) + geom_line(lwd=1) + geom_point(size=4) + scale_y_continuous(labels=comma, limits=limits)
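Note that length(user_id) counts rows, so it equals DAU only if the log holds one row per user per day. If the extract can contain several rows for the same user on one day (an assumption about the data, not something this post states), counting distinct users is safer. A minimal sketch on made-up data:

```r
# Made-up log where user 1 appears twice on 2016-12-01
dau.toy <- data.frame(log_date = c("2016-12-01", "2016-12-01",
                                   "2016-12-01", "2016-12-02"),
                      user_id = c(1, 1, 2, 1),
                      stringsAsFactors = FALSE)

# Row count per day: counts the duplicate login twice
rows.per.day <- aggregate(user_id ~ log_date, data = dau.toy, FUN = length)

# Distinct users per day: each user counted once
users.per.day <- aggregate(user_id ~ log_date, data = dau.toy,
                           FUN = function(x) length(unique(x)))

rows.per.day$user_id    # 3 1
users.per.day$user_id   # 2 1
```

With ddply, the equivalent change would be dau=length(unique(user_id)).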

# Compute NRU (new registered users), convert the date type, and plot the time-series trend

library(ggplot2)

library(scales)

mau.nru <- mau[mau$user.type == "install",]

dau.nru <- ddply(mau.nru, .(install_date), summarize, nru=length(user_id))

limits <- c(0, max(dau.nru$nru))

dau.nru$install_date <- as.Date(dau.nru$install_date)

ggplot(dau.nru, aes(x=install_date, y=nru)) + geom_line(lwd=1) + geom_point(size=4) + scale_y_continuous(labels=comma, limits=limits)
