Assignment 3

Load in packages & data

pacman::p_load(ggplot2, tidyr, dplyr, haven, gridExtra, ggExtra, ggthemes, RColorBrewer)
TEDS_2016 <- read_stata("https://github.com/datageneration/home/blob/master/DataProgramming/data/TEDS_2016.dta?raw=true")
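# as.numeric() strips the Stata value labels; it ignores the labels argument, which is kept here only as a codebook
TEDS_2016$Tondu <- as.numeric(TEDS_2016$Tondu, labels = c("Unification now", "Status quo, unif. in future", "Status quo, decide later", "Status quo forever", "Status quo, indep. in future", "Independence now", "No response"))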
head(TEDS_2016$Tondu)
[1] 3 5 3 5 9 4
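
Because as.numeric() has no use for a labels argument, the call above leaves Tondu as bare numeric codes. If labelled categories are wanted, one option is factor(). The sketch below uses the six values printed by head() as a toy vector, and it assumes the labels map to codes 1 to 6 and 9 in the order they are listed; that mapping is an assumption and should be checked against the TEDS codebook.

```r
# Toy vector: the first six Tondu codes printed by head() above
tondu_codes <- c(3, 5, 3, 5, 9, 4)

# Assumed mapping of codes to labels (order taken from the labels vector above)
tondu_labels <- c("Unification now", "Status quo, unif. in future",
                  "Status quo, decide later", "Status quo forever",
                  "Status quo, indep. in future", "Independence now",
                  "No response")

# factor() attaches each label to its code; as.numeric() would ignore them
tondu_factor <- factor(tondu_codes, levels = c(1:6, 9), labels = tondu_labels)

# Frequency of each labelled category in the toy vector
table(tondu_factor)
```

Keeping a factor version alongside the numeric one makes it easier to see that code 9 is a separate category rather than a point on a scale.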

Recode Tondu & create a subset of the original dataset

sel_dat <- TEDS_2016 %>% select(Tondu, female, DPP, age, income, edu, Taiwanese, Econ_worse, votetsai)

Fit regressions

fit1<-lm(Tondu~age+edu+income, data=sel_dat)
summary(fit1)

Call:
lm(formula = Tondu ~ age + edu + income, data = sel_dat)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.7780 -1.1841 -0.4322  1.1079  5.4157 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  5.302529   0.257369  20.603  < 2e-16 ***
age         -0.004205   0.003194  -1.316   0.1882    
edu         -0.244608   0.037579  -6.509 9.96e-11 ***
income      -0.031855   0.016357  -1.948   0.0516 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.725 on 1676 degrees of freedom
  (10 observations deleted due to missingness)
Multiple R-squared:  0.04287,   Adjusted R-squared:  0.04115 
F-statistic: 25.02 on 3 and 1676 DF,  p-value: 7.771e-16

# Jittered scatterplot with a fitted regression line for each predictor
ta <- ggplot(sel_dat, aes(x = age, y = Tondu)) +
  geom_smooth(method = "lm", se = FALSE, show.legend = FALSE) +
  geom_point(show.legend = FALSE, position = "jitter", alpha = .5, pch = 16) +
  ggthemes::theme_few() +
  labs(x = "Age", y = "TONDU preferences")

te <- ggplot(sel_dat, aes(x = edu, y = Tondu)) +
  geom_smooth(method = "lm", se = FALSE, show.legend = FALSE) +
  geom_point(show.legend = FALSE, position = "jitter", alpha = .5, pch = 16) +
  ggthemes::theme_few() +
  labs(x = "Education", y = "TONDU preferences")

ti <- ggplot(sel_dat, aes(x = income, y = Tondu)) +
  geom_smooth(method = "lm", se = FALSE, show.legend = FALSE) +
  geom_point(show.legend = FALSE, position = "jitter", alpha = .5, pch = 16) +
  ggthemes::theme_few() +
  labs(x = "Income", y = "TONDU preferences")

grid.arrange(ta,te,ti,ncol=3,nrow=1)

Additional Plots

taei<-ggplot(sel_dat, aes(age, Tondu, colour=edu))+
  geom_point()

ggMarginal(taei, type="histogram")

What is the problem here?

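The dependent variable is categorical, not continuous. The unique() function shows that Tondu takes only seven discrete values: the codes 1 to 6 plus 9, which the label list above suggests stands for "No response". Linear regression treats these codes as points on a numeric scale, so the fit is weak (R-squared is about 0.04) and the coefficients are hard to interpret. A multinomial logit, or another model designed for categorical outcomes, would be a better choice here.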

unique(sel_dat$Tondu)
[1] 3 5 9 4 6 2 1
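
A multinomial logit of the kind suggested above could be sketched as follows. This is a minimal illustration, not the assignment's solution: it uses multinom() from the nnet package and a simulated data frame (toy) whose column names mirror sel_dat but whose values are random, so the estimates themselves mean nothing.

```r
library(nnet)  # a recommended package that ships with standard R installations

set.seed(1)
n <- 300

# Simulated stand-in data; names mirror sel_dat, values are made up
toy <- data.frame(
  age    = sample(20:80, n, replace = TRUE),
  edu    = sample(1:5, n, replace = TRUE),
  income = sample(1:10, n, replace = TRUE)
)
toy$Tondu <- factor(sample(1:6, n, replace = TRUE))

# Multinomial logit: Tondu is treated as unordered categories, not a number
fit_mn <- multinom(Tondu ~ age + edu + income, data = toy, trace = FALSE)

# One row of coefficients per non-reference category (level "1" is the baseline)
coef(fit_mn)

# Predicted probabilities for each category; every row sums to 1
head(fitted(fit_mn))
```

With the real data, the same call would presumably be run on sel_dat after converting Tondu to a factor and deciding how to handle code 9, for example by setting it to NA.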