Setup

Load data

Make sure your data and R Markdown files are in the same directory. When loaded, your data file will be called brfss2013.
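A minimal sketch of the loading step, assuming the data file is named brfss2013.RData (the file name is an assumption; only the object name brfss2013 is given above). The existence check is there so the chunk fails with a readable message instead of an error.

```r
# Assumed file name; change it to match the file you downloaded.
data_file <- "brfss2013.RData"

if (file.exists(data_file)) {
  load(data_file)        # creates the object `brfss2013` in the workspace
  print(dim(brfss2013))  # rows (respondents) and columns (variables)
} else {
  message("Data file not found in ", getwd())
}
```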


Part 1: Data

In this project I briefly discuss a data set from the Behavioral Risk Factor Surveillance System (BRFSS): how the observations in the sample were collected, and what the data collection method implies for generalizability and causality.

BRFSS is a telephone survey (landline and cell phone) conducted monthly with a standardized questionnaire across the United States. According to BRFSS, more than 500,000 interviews were conducted in 2011. With a sample this large, the law of large numbers suggests that sample estimates should be close to the population values.

Furthermore, according to “User Guide June 2013”, the methods BRFSS uses to reduce bias include, but are not limited to: varied questionnaire construction (to cover different lifestyles), varied interview times and up to 15 calling attempts (to reduce non-response and convenience bias), and data weighting.

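Conclusion: because BRFSS uses random, multistage sampling together with several methods to reduce bias, results from the BRFSS data set can be generalized to the population. However, BRFSS is an observational survey with random sampling but no random assignment, so it can show associations between variables but cannot establish causality.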


Part 2: Research questions

Research question 1: What’s the relationship between tobacco use and education level, and does it differ between males and females? Variables: smokday2, educa, sex.

Research question 2: How many females have a college or higher degree compared with males? Variables: sex, educa.

Research question 3: What’s the relationship between sleep time and education level, and does it differ between males and females? Variables: sleptim1, educa, sex.


Part 3: Exploratory data analysis

Research question 1:

## # A tibble: 6 x 2
##   educa                                                         count
##   <fct>                                                         <int>
## 1 Never attended school or only kindergarten                      677
## 2 Grades 1 through 8 (Elementary)                               13395
## 3 Grades 9 though 11 (Some high school)                         28141
## 4 Grade 12 or GED (High school graduate)                       142971
## 5 College 1 year to 3 years (Some college or technical school) 134197
## 6 College 4 years or more (College graduate)                   170120
## # A tibble: 3 x 2
##   smokday2    count
##   <fct>       <int>
## 1 Every day   55163
## 2 Some days   21494
## 3 Not at all 138135
## # A tibble: 6 x 3
##   educa                                                count percentage_of_smok~
##   <fct>                                                <int>               <dbl>
## 1 Never attended school or only kindergarten             222               0.338
## 2 Grades 1 through 8 (Elementary)                       5956               0.390
## 3 Grades 9 though 11 (Some high school)                16219               0.521
## 4 Grade 12 or GED (High school graduate)               70714               0.417
## 5 College 1 year to 3 years (Some college or technica~ 63166               0.370
## 6 College 4 years or more (College graduate)           58024               0.222

From these results, smoking habits and education level appear to be related. Among people with less than a college degree, the proportion of smokers (ever smoked) is noticeably higher, peaking at 0.521 in the “some high school” group, compared with 0.222 among college graduates. Because biases may exist, further study may be needed.
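One way such a per-education summary could be computed is sketched below with dplyr. The toy data frame is hypothetical (not real survey values), and treating a non-missing smokday2 as "ever smoked" is an assumption. The calculation behind the table above may use a different denominator, so this is not guaranteed to reproduce those numbers.

```r
library(dplyr)

# Hypothetical toy data standing in for brfss2013 (not real survey values).
toy <- data.frame(
  educa    = c("High school", "High school", "High school",
               "College", "College", "College", "College", NA),
  smokday2 = c("Every day", NA, "Not at all",
               NA, NA, "Some days", NA, "Every day")
)

smoke_by_educa <- toy %>%
  filter(!is.na(educa)) %>%                  # drop respondents with unknown education
  group_by(educa) %>%
  summarise(
    respondents = n(),                       # everyone in this education group
    ever_smoked = sum(!is.na(smokday2)),     # assumption: non-missing smokday2 = ever smoked
    proportion  = ever_smoked / respondents
  )

print(smoke_by_educa)
```

On the real data, `toy` would be replaced by brfss2013. Dividing by all respondents in the group, rather than by those with a recorded smoking answer, is a design choice that changes the result and should be stated next to the table.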

As the plot shows, there are more female smokers than male smokers; the difference is fewer than 10,000.
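A bar chart of this kind could be drawn roughly as follows with ggplot2. The toy data frame is hypothetical, and the axis labels and the choice of side-by-side bars are assumptions rather than a reconstruction of the original plot.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data standing in for brfss2013 (not real survey values).
toy <- data.frame(
  sex      = c("Male", "Male", "Male", "Female", "Female", "Female", "Female"),
  smokday2 = c("Every day", NA, "Not at all", "Some days", NA, "Every day", NA)
)

# Keep only respondents with a recorded sex and a recorded smoking answer.
smokers <- toy %>%
  filter(!is.na(sex), !is.na(smokday2))

# One bar per smoking frequency, grouped by sex.
p <- ggplot(smokers, aes(x = sex, fill = smokday2)) +
  geom_bar(position = "dodge") +
  labs(x = "Sex", y = "Respondents", fill = "Smoking frequency")

print(p)
```

Because the bars show raw counts, a larger number of female respondents in the sample would by itself produce taller bars for females, so comparing shares within each sex may be more informative.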

Research question 2:

As the plot above shows, more females than males have a college or higher degree, by approximately 22,568.
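The count behind this comparison could be obtained along these lines. The toy data frame is hypothetical, and taking "college or higher degree" to mean only the level "College 4 years or more (College graduate)" is an assumption.

```r
library(dplyr)

# Level names taken from the educa table above.
college_level <- "College 4 years or more (College graduate)"
hs_level      <- "Grade 12 or GED (High school graduate)"

# Hypothetical toy data standing in for brfss2013 (not real survey values).
toy <- data.frame(
  sex   = c("Female", "Female", "Female", "Male", "Male", "Male"),
  educa = c(college_level, college_level, hs_level,
            college_level, hs_level, NA)
)

# Count college graduates per sex; rows with missing educa are dropped by filter().
grads_by_sex <- toy %>%
  filter(educa == college_level, !is.na(sex)) %>%
  count(sex, name = "graduates")

# Female count minus male count.
difference <- grads_by_sex$graduates[grads_by_sex$sex == "Female"] -
  grads_by_sex$graduates[grads_by_sex$sex == "Male"]

print(grads_by_sex)
print(difference)
```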

Research question 3:

## [1] 7

Based on the plot, among people with a college or higher degree, more sleep less than 7 hours than sleep more than 7 hours. From a gender perspective, more females than males hold a college or higher degree, regardless of sleep time.
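A sketch of how the 7-hour split could be computed, assuming the value 7 printed above is the median of sleptim1 (an assumption) and again using a hypothetical toy data frame. Respondents exactly at the median are kept in their own group so that "less than" and "more than" are not mixed.

```r
library(dplyr)

# Level names taken from the educa table above.
college_level <- "College 4 years or more (College graduate)"
hs_level      <- "Grade 12 or GED (High school graduate)"

# Hypothetical toy data standing in for brfss2013 (not real survey values).
toy <- data.frame(
  sex      = c("Female", "Female", "Female", "Female",
               "Male", "Male", "Male", "Male"),
  educa    = c(college_level, college_level, hs_level, college_level,
               college_level, college_level, hs_level, college_level),
  sleptim1 = c(6, 7, 8, 5, 7, 9, 6, NA)
)

# Median sleep time over all respondents with a recorded value.
median_sleep <- median(toy$sleptim1, na.rm = TRUE)

# Among college graduates, count respondents below, at, and above the median by sex.
sleep_groups <- toy %>%
  filter(educa == college_level, !is.na(sleptim1), !is.na(sex)) %>%
  mutate(sleep_group = case_when(
    sleptim1 <  median_sleep ~ "below median",
    sleptim1 == median_sleep ~ "at median",
    TRUE                     ~ "above median"
  )) %>%
  count(sex, sleep_group)

print(median_sleep)
print(sleep_groups)
```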