Data

Featured Image

Introduction

In this project, we collected data from two primary sources. First, we collected American National Election Studies Data for Presidential election years from 2000 to 2020. The ANES is a nationwide survey conducted every four years (and in some midterm years) since 1948. For the past 70 years, the team at ANES has focused on providing a comprehensive outlook on American politics and political attitudes. While variables change from year to year, the Studies aim to glean insight into longstanding trends in how the general American voter thinks and votes.

ANES Data

Using the ANES Data, we isolated variables measuring individual respondent’s authoritarianism along with variables measuring their financial status, party, state of registration, religion, and who they voted for. The authoritarian variables varied by year, but broadly consisted of four primary categories of questions: respondents’ opinions on child-rearing (the most consistent of the variables across each year - generally, this consisted of questions asking if respondents preferred children to have one of two opposite traits or both, one being coded as authoritarian, the other as “libertarian” in the antithetical sense), respondents’ opinions in the United States’ global status, the respondents’ opinions on the influence of rural areas, and the respondents’ opinions on questions regarding women’s rights.

As respondents were not consistent across election years, we joined each file vertically using rbind. Then, using this file, we created a “authoritarianism score” for each respondent. To do so, we normalized each variable in each yearly dataset – not all years were coded the same – by setting the most authoritarian answer equal to 1, the least equal to 0, and then respectively scoring the responses in between. NA values were coded as any negative response or as whatever unique values that year’s code book identified. We then calculated five aggregate scores: one for each of the broad classification of authoritarianism questions, and one overall authoritarianism score. Next, we summarized the data by state and year to provide a clean panel dataset.

An example of how we normalized a variable

    ANES_sum$rural.govt[ANES_sum$rural.govt==1]<-0
    ANES_sum$rural.govt[ANES_sum$rural.govt==2]<-.17
    ANES_sum$rural.govt[ANES_sum$rural.govt==3]<-.34
    ANES_sum$rural.govt[ANES_sum$rural.govt==4]<-.51
    ANES_sum$rural.govt[ANES_sum$rural.govt==5]<-.68
    ANES_sum$rural.govt[ANES_sum$rural.govt==6]<-.85
    ANES_sum$rural.govt[ANES_sum$rural.govt==7]<-1

MIT Election Lab Data

Our second data source was the MIT Election Lab, a team at MIT who has focused on collecting and curating decades of election data in the United States in aims to further political sciences research and bolster democracy. From the MIT Election Lab, we downloaded their presidential election data from 1976 to 2020, which includes observations for each candidate in each state in ach year and measures their votes and vote share. We isolated to the years of relevancy (2000 - 2020), and selected the state, year, party simplified variable (identifying each candidate as Republican, Democrat, Libertarian, or Other), and share, measuring each candidate’s respective vote share in the respective year and state.

Next, we pivoted the data wider, making the party-simplified variable into three unique columns, each with the respective vote share – Democrat, Republican, and Libertarian were kept, other was omitted. We then created a “margin” variable, measuring how many more percentage points of the votes the Democratic candidate in each state and year received compared to the Republican one. The scores from the ANES data were combined with this election data to create a panel with election and authoritarianism data for each state and year.

Pivoting and Joining

 MIT_State <- MIT_ELECTION_clean%>%
      filter(party_simplified != "other")%>%
      pivot_wider(names_from = party_simplified,values_from = share)%>%
      mutate(margin = democrat-republican)%>%
      mutate(year = as.factor(year))%>%
      left_join(ANES_state,by = c("state","year"))

Load and Clean

All these steps can be seen in our load_and_clean_data.R file. Additional packages used were naniar and haven. The former was used to replace non-response codes in the ANES data set with NAs, and the former latter was used to import .dta files, as earlier ANES data sets had such file types.