(4A) Unit summary: practice with the data from the Week 3 individual assignment
dplyr, tidyr, ggplot2, plotly
mean(), median(), min(), max(), … hist(), table %>% barplot
cor(), plot(x, y)
Load packages
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   0.00    0.60    2.10    8.88   10.18   85.90
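💡 Key point: distribution
■ A distribution is one way to describe a variable
■ Distribution: how frequently each value of the variable occurs
■ It can be reported either as counts or as proportions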
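The two ways of reporting a distribution listed above can be sketched in base R. This is a minimal illustration only: the vector `region` is a hypothetical toy sample, not the course data.

```r
# Hypothetical toy vector (an assumption for illustration, not the course data)
region <- c("South", "West", "South", "Northeast", "South", "West")

counts <- table(region)        # distribution as counts
props  <- prop.table(counts)   # distribution as proportions (they sum to 1)

print(counts)
print(round(props, 2))
```

Either form can be passed to `barplot()` to draw the distribution.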
       Alabama         Alaska        Arizona       Arkansas     California
            67             28             15             75             58
      Colorado    Connecticut       Delaware        Florida        Georgia
            64              8              3             67            159
        Hawaii          Idaho       Illinois        Indiana           Iowa
             5             44            102             92             99
        Kansas       Kentucky      Louisiana          Maine       Maryland
           105            120             64             16             24
 Massachusetts       Michigan      Minnesota    Mississippi       Missouri
            14             83             87             82            115
       Montana       Nebraska         Nevada  New Hampshire     New Jersey
            56             93             17             10             21
    New Mexico       New York North Carolina   North Dakota           Ohio
            33             62            100             53             88
      Oklahoma         Oregon   Pennsylvania   Rhode Island South Carolina
            77             36             67              5             46
  South Dakota      Tennessee          Texas           Utah        Vermont
            65             95            253             29             14
      Virginia     Washington  West Virginia      Wisconsin        Wyoming
           133             39             55             72             23
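# Shrink the text and widen the left margin so state names fit
par(cex=0.7,mar=c(3,8,4,3))
# Count counties per state, keep the 20 largest counts, draw horizontal bars
table(d$state) %>% sort %>% tail(20) %>%
barplot(las=2,horiz=TRUE,main="No. Counties")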
[1] -0.26762
[1] 1.3085e-52
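The code behind the two values above is not shown. They are consistent with a correlation coefficient followed by a p-value, so the sketch below assumes `cor()` and `cor.test()` were used. The vectors `x` and `y` are simulated stand-ins, not the course data, so the numbers it prints will differ from those above.

```r
set.seed(1)                   # simulated data, for illustration only
x <- rnorm(200)
y <- -0.3 * x + rnorm(200)    # built to be negatively related to x

r <- cor(x, y)                # Pearson correlation coefficient, between -1 and 1
p <- cor.test(x, y)$p.value   # p-value for the null hypothesis of zero correlation

print(r)
print(p)
```

A very small p-value says only that the correlation is unlikely to be exactly zero; the size of `r` is what describes how strong the linear relationship is.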
ggplot(d, aes(x=black, y=income_per_cap)) +
geom_point(color='cyan', alpha=0.2) +
geom_smooth(method='lm',se=F)
`geom_smooth()` using formula 'y ~ x'
           North Central Northeast South West
  Metro              302       130   591  142
  Nonmetro           752        87   829  305
# grid.arrange() is provided by the gridExtra package
p1 = ggplot(d, aes(x=region,fill=metro))
p2 = ggplot(d, aes(x=metro,fill=region))
# Each row shows the same data three ways: stacked, dodged, and filled (proportions)
grid.arrange(
p1 + geom_bar(show.legend=F),
p1 + geom_bar(position=position_dodge(),show.legend=F),
p1 + geom_bar(position=position_fill()),
p2 + geom_bar(show.legend=F),
p2 + geom_bar(position=position_dodge(),show.legend=F),
p2 + geom_bar(position=position_fill()),
nrow = 2)
🗿 Questions:
Q: Within each region, compute the proportions of Metro and Nonmetro
Q: Within Metro and within Nonmetro, compute the proportion of each region
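One possible approach to both questions is `prop.table()` with a `margin` argument. The sketch below rebuilds the metro-by-region counts from the table printed earlier, so it runs without the data frame `d`; the object names `tab`, `by_region`, and `by_metro` are illustrative.

```r
# Counts copied from the metro-by-region table above (filled column by column)
tab <- matrix(c(302, 752, 130, 87, 591, 829, 142, 305), nrow = 2,
              dimnames = list(metro  = c("Metro", "Nonmetro"),
                              region = c("North Central", "Northeast",
                                         "South", "West")))

by_region <- prop.table(tab, margin = 2)  # each region (column) sums to 1
by_metro  <- prop.table(tab, margin = 1)  # each metro status (row) sums to 1

print(round(by_region, 3))
print(round(by_metro, 3))
```

With the real data, the same calls should work on `table(d$metro, d$region)`, assuming those are the columns behind the table.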
North Central     Northeast         South          West
       710.08        746.14        611.04       3879.13
                 Metro Nonmetro
North Central   604.64   752.43
Northeast       571.08  1007.72
South           561.92   646.06
West           2741.58  4408.75
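The code that produced the two tables of group means above is not shown. One way to get tables of this shape in base R is `tapply()`, sketched here on a hypothetical four-row data frame `toy` rather than on the course data.

```r
# Hypothetical toy data (an assumption for illustration)
toy <- data.frame(
  region    = c("South", "South", "West", "West"),
  metro     = c("Metro", "Nonmetro", "Metro", "Nonmetro"),
  land_area = c(500, 700, 2000, 4000)
)

# One grouping variable: a named vector of group means
by_region <- tapply(toy$land_area, toy$region, mean)

# Two grouping variables: a region-by-metro matrix of group means
by_both <- tapply(toy$land_area, list(toy$region, toy$metro), mean)

print(by_region)
print(by_both)
```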
group_by(d, region, metro) %>% summarise(
land_area = mean(land_area)) %>%
ggplot(aes(x=region, y=land_area, fill=metro)) +
geom_col(position=position_dodge2())
`summarise()` has grouped output by 'region'. You can override using the `.groups` argument.
group_by(d, region, metro) %>% summarize(
income_per_cap = weighted.mean(income_per_cap, population)
) %>%
ggplot(aes(x=region, y=income_per_cap, fill=metro)) +
geom_col(position=position_dodge2())
`summarise()` has grouped output by 'region'. You can override using the `.groups` argument.
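The chunk above uses `weighted.mean()` with population as the weight rather than a plain `mean()`. A small sketch of the difference, with two hypothetical counties whose values are made up for illustration:

```r
income     <- c(10, 20)   # hypothetical per-capita incomes of two counties
population <- c(1, 3)     # hypothetical populations, used as weights

simple   <- mean(income)                       # every county counts equally
weighted <- weighted.mean(income, population)  # (10*1 + 20*3) / (1 + 3)

print(simple)    # 15
print(weighted)  # 17.5
```

The weighted version gives the per-capita income of the group's combined population, so large counties influence it more than small ones.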
grid.arrange(
ggplot(d, aes(x=land_area)) + geom_histogram() + scale_x_log10(),
ggplot(d, aes(x=land_area)) + geom_density() + scale_x_log10(),
nrow=1
)
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
grid.arrange(
ggplot(d, aes(x=land_area,fill=region,color=region)) +
geom_histogram(alpha=0.5) + scale_x_log10(),
ggplot(d, aes(x=land_area,fill=region,color=region)) +
geom_density(alpha=0.5) + scale_x_log10(),
nrow=2
)
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
ggplot(d, aes(x=black, y=income_per_cap)) +
geom_point(color='cyan', alpha=0.2) +
geom_smooth(method='lm',se=F) +
facet_grid(metro~region)
`geom_smooth()` using formula 'y ~ x'