Research Question and Motivation

**The Ten Highest-GDP Countries and Their Olympic Medal Counts**

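#Motivation
#Economic strength matters to every country, and training athletes requires substantial resources. We therefore wondered whether a country's economic strength affects its performance in sports competitions, and we hypothesize that the higher a country's GDP, the better it performs in sporting events.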

Loading Packages and Data

library(readr) # load readr for read_csv()
athlete <- read_csv("../asset/athlete_all.csv") # load the data
## Warning: Missing column names filled in: 'X1' [1]
## Parsed with column specification:
## cols(
##   .default = col_double(),
##   country = col_character(),
##   Name = col_character(),
##   Sex = col_character(),
##   NOC = col_character(),
##   Games = col_character(),
##   Season = col_character(),
##   City = col_character(),
##   Sport = col_character(),
##   Event = col_character(),
##   Medal = col_character()
## )
## See spec(...) for full column specifications.
## Warning: 42 parsing failures.
##  row           col               expected actual                       file
## 1166 female_school no trailing characters      r '../asset/athlete_all.csv'
## 1167 female_school no trailing characters      r '../asset/athlete_all.csv'
## 1168 female_school no trailing characters      r '../asset/athlete_all.csv'
## 1169 female_school no trailing characters      r '../asset/athlete_all.csv'
## 1170 female_school no trailing characters      r '../asset/athlete_all.csv'
## .... ............. ...................... ...... ..........................
## See problems(...) for more details.
pacman::p_load(devtools, dplyr, ggplot2, readr, plotly, googleVis, ggthemes, d3heatmap, magrittr) # install (if missing) and load packages
## Installing package into 'C:/Users/CCCM_3051/Documents/R/win-library/3.6'
## (as 'lib' is unspecified)
## Warning: unable to access index for repository http://www.stats.ox.ac.uk/pub/RWin/bin/windows/contrib/3.6:
##   cannot open URL 'http://www.stats.ox.ac.uk/pub/RWin/bin/windows/contrib/3.6/PACKAGES'
## package 'devtools' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\CCCM_3051\AppData\Local\Temp\Rtmp6VqSiM\downloaded_packages
## 
## devtools installed
## Warning in pacman::p_load(devtools, dplyr, ggplot2, readr, plotly, googleVis, : Failed to install/load:
## devtools

Data Exploration (Descriptive Statistics)

a <- athlete %>%
  filter(income_GDP != 0) %>%                    # keep rows with a nonzero GDP value
  group_by(country) %>%                          # group by country
  summarise(income_GDP = mean(income_GDP)) %>%   # mean GDP per country
  arrange(desc(income_GDP)) %>%                  # sort by GDP, highest first
  head(10)                                       # keep the top 10

Data Processing (dplyr)

b <- athlete %>%
  # keep the ten highest-GDP countries; %in% tests membership, whereas
  # == would recycle the vector and silently drop most matching rows
  filter(country %in% c("Qatar", "United Arab Emirates", "Brunei", "Kuwait", "Saudi Arabia", "Singapore", "Bahrain", "Libya", "Oman", "Switzerland")) %>%
  group_by(country) %>%   # group by country
  summarise(cnt = n())    # rows per country, used as the medal count (assumes each row is a medal record)
c <- a %>% left_join(b, by = "country") # merge the medal counts with the GDP column
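Comparing a column against a vector of names with `==` recycles the shorter vector and compares position by position, while `%in%` checks whether each value appears anywhere in the vector. The sketch below illustrates the difference on a few toy values; the vectors `country` and `top` are made up for illustration and are not taken from athlete_all.csv.

```r
# Toy values for illustration only (not from athlete_all.csv)
country <- c("Qatar", "Oman", "Qatar", "Japan", "Oman")
top <- c("Oman", "Qatar")

# `==` recycles `top` along `country` and compares element by element;
# the length-mismatch warning is suppressed here to keep the output clean
eq_match <- suppressWarnings(country == top)

# `%in%` asks, for each element of `country`, whether it occurs in `top`
in_match <- country %in% top

print(eq_match)
print(in_match)
```

With `==`, a row is kept only when its country happens to line up with the recycled position in the name vector, so most matching rows are lost; `%in%` keeps every row whose country is in the list.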

Data Visualization

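d <- c %>%
  ggplot(aes(country, income_GDP)) +       # map country to x and GDP to y
  geom_point(aes(size = cnt)) +            # scatter plot, point size = medal count
  theme_light() + scale_color_economist() + # set the theme
  theme(axis.text.x = element_text(face = "bold", angle = 45)) # style the x-axis labels
ggplotly(d) # render as an interactive plot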

Conclusions and Insights

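#In our data exploration we found that, among these ten countries, GDP has little relationship with the Olympic medal count: a higher GDP does not bring more medals. Almost all of these countries are in the Middle East. Their GDP rose through oil development, but they have small populations, little usable land, scarce water, and underdeveloped sports resources. Training an athlete there is very difficult, yet they still produce outstanding athletes.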
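One way to put a number on "little relationship" is a rank correlation between GDP and medal count. The following is only a sketch under stated assumptions: the data frame `toy` and its values are invented for illustration, and the report itself does not compute this statistic.

```r
# Hypothetical toy data, NOT results from athlete_all.csv
toy <- data.frame(
  income_GDP = c(10, 20, 30, 40, 50),
  cnt        = c(5, 3, 4, 1, 2)
)

# Spearman's rank correlation: near 0 suggests no monotonic relationship,
# near +1 or -1 suggests a strong one
rho <- cor(toy$income_GDP, toy$cnt, method = "spearman")
print(rho)
```

Applied to the joined table from the data-processing step, the same call would take the `income_GDP` and `cnt` columns, with `use = "complete.obs"` added if some countries have no medal rows. With only ten countries, any such coefficient should be read cautiously.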