Setup

Load data

Make sure your data and R Markdown files are in the same directory. When loaded your data file will be called brfss2013. Delete this note when before you submit your work.

## [1] 491775    330

The dimentions of the data indicate that there are almost 500,000 samples (rows), and around 330 possible features (columns). Since everyone didn’t answer all of the questions, some samples have missing features as well. This requires us to preprocess our data before using it to make any decisions


Part 1: Data

BRFSS Background

The Behavioral Risk Factor Surveillance System (BRFSS) is the system of health-related telephone surveys that collect state data about U.S. residents regarding their health-related risk behaviors, chronic health conditions, and use of preventive services. The CDC official website statest that, “The Behavioral Risk Factor Surveillance System (BRFSS) is the nation’s premier system of health-related telephone surveys that collect state data about U.S. residents regarding their health-related risk behaviors, chronic health conditions, and use of preventive services. Established in 1984 with 15 states, BRFSS now collects data in all 50 states as well as the District of Columbia and three U.S. territories. BRFSS completes more than 400,000 adult interviews each year, making it the largest continuously conducted health survey system in the world.”

How they collect data

“BRFSS is a cross-sectional telephone survey that state health departments conduct monthly over landline telephones and cellular telephones with a standardized questionnaire and technical and methodological assistance from CDC. In conducting the BRFSS landline telephone survey, interviewers collect data from a randomly selected adult in a household. In conducting the cellular telephone version of the BRFSS questionnaire, interviewers collect data from an adult who participates by using a cellular telephone and resides in a private residence or college housing.”

Generalizability & Causality

Before jumping into the exploratory analysis of the BRFSS data, we need to establish our understanding on the generalizability and causality within the data.

  • The data can be generalized to the broader US population since the surveys are conducted across all 50 US states, which means it captures enough random sample.

  • No causality because the BRFSS is an observational exercise - with no explicit random assignments to treatments - all relationships indicated may indication association, but not causation.

Part 2: Research questions

Research quesion 1:

Is the respondend’s opinion about their health status related to the fact that they smoke? Are the observation similar between genders?

The question is pretty interesting before there is a general perception that people who smoke tend to have a bad health compared to those who don’t. The difference between genders is also interesting because if there is an association between smoking and general health, then it should be similar amongst genders.

The variable required for this analysis are:

Research quesion 2:

Does education have a significant impact on the salaries and employement status of individuals? How is it different among men and women?

The variable required for this analysis are:

Research quesion 3:

Does employement have any impact on the mental health of individuals? In other words, Deos job/salary have any relation with individuals feeling depressed, nervous, or restless? Is the observation similar between genders

This question is about understanding if mental health of individuals is affected by their employement status. The results we see might have multiple interpretations, positive or negative, to further elaborate upon, but we can still analyze the intensity of impact on their mental health.

The variable required for this analysis are:


Part 3: Exploratory data analysis

NOTE: Insert code chunks as needed by clicking on the “Insert a new code chunk” button (green button with orange arrow) above. Make sure that your code is visible in the project you submit. Delete this note when before you submit your work.

Research quesion 1:

Is the respondend’s opinion about their health status related to the fact that they smoke? Are the observation similar between genders?

## [1] 474987      3
## [1] 213876      3
##      
##       Excellent Very good      Good      Fair      Poor
##   Yes 0.3425677 0.4111267 0.4786293 0.5481280 0.6321580
##   No  0.6574323 0.5888733 0.5213707 0.4518720 0.3678420

After the initial load of data (over 470,000 observations), we can take an initial look at the frequency of responses and then consider their proportion.

The way to interpret the table above is that for each column (“Excellent”,“Very good”,“Good”,“Fair”,“Poor”), what is the proportion of respondents that indicated their smoking (Yes, No). In other words, the column sums up to 1.

Lets jump to graphical representations

The plots depicts the comparison between people who smoke everday, some days, and not at all. The categorical bar graph clearly shows the difference in the proportion of poeple with health and smoke everyday/somedays/not at all. There are indeed a very less proportion of people with excellent health and who smoke everyday. The bar chart of the people who don’t smoke at all is a little right skewed, meaning that there are more people towards the excellent/very good health.

This tells us that maybe smoking does have an impact on ones health. Lets explore a little more on this.

The interesting thing to obeserve here is that both male and female have a similar trend. We see that both have a larger proportion of people with Excellent opinion about their health in the No category of smoking. Similarly, the people with Poor health opinion are in abundance in the peoplpe who smoked more than 100 cigarettes.

Another way to look at this is as a categorical bar graph

The thing to observe from the above plots is that from a normal distribution perspective, both the bar graphs of people who did not smoke at least 100 cigarettes are left skewed. This means that people who do not smoke tend to have a positive opinion about their health as compared to the people who fall in the other category.

Research quesion 2:

Does education have a significant impact on the salaries and employement status of individuals? How is it different among men and women?

The aim of this analysis is to identify the intensity of impact educations has on the salary and employement of individuals. Moreoever, we also aim to observe if there’s any difference in the trend among genders, and if males in general are paid higher than females.

Initializing Variables

## [1] 402328      5

For this analysis we have around 400,000 samples, which are pretty enough to generalize our opinion for the whole populations.

Next, lets see the levels (categories) of the variables we’ve selected.

## [1] "Never attended school or only kindergarten"                  
## [2] "Grades 1 through 8 (Elementary)"                             
## [3] "Grades 9 though 11 (Some high school)"                       
## [4] "Grade 12 or GED (High school graduate)"                      
## [5] "College 1 year to 3 years (Some college or technical school)"
## [6] "College 4 years or more (College graduate)"
## [1] "Employed for wages"               "Self-employed"                   
## [3] "Out of work for 1 year or more"   "Out of work for less than 1 year"
## [5] "A homemaker"                      "A student"                       
## [7] "Retired"                          "Unable to work"
## [1] "Less than $10,000" "Less than $15,000" "Less than $20,000"
## [4] "Less than $25,000" "Less than $35,000" "Less than $50,000"
## [7] "Less than $75,000" "$75,000 or more"

Lets start the analysis with observing the proportion of men and women that lie in each category of the salary range.

The results are surprisingly different than what I expected them to. The graphs show that theres almost an equal number of men and women in the highest paying salary category. But looking at the rest of the categories, we understand that all of them have women in abundane. This could potentially have two explanations: - Maybe the number of samples for women in the sample set is more than men - Out of all the men, maybe most of them lie in the highest paying category, and that’s why they are lesser in the rest of the categories.

##   Male Female 
## 175295 227033

The summary table shows us that the number of females does outrun the number of females in the data. Maybe that’s why our plots are not as accurate as we expected them to be.

Lets jump to our actual analysis that relates education with incomes.

This plot can help us understand a lot about the impact of education on wages. We can observe with the very first glimpse at this graph that people who never attended school are ones who have difficulty finding jobs even for a wage of less than $10000. Secondly, we can observe from the graph that people who are college graduates tend to capture the major chunk of salaries greater than $50,000. This clearly shows us that people who finish their college degrees are at better luck than those who didn’t.

Bill Gates and Steve Jobs would disagree

The plot above gives a further insight about our previous conclusion. See from the bars of the height the amount of people lying in each category, and we can clearly observe here that out of the whole sample population, most of them lie in the employeed for wage category, and within them, the number of people who have the college 4 years degree, are the ones dominating with around 78000 population. This clearly shows that education does have an impact of getting a job. There could be multiple interpretations to that. Maybe seeing such a high proportion of college graduated employed, one would say our colleges are training people to work under poeple and not for themselves

The trend for males and females is pretty much the same for all salary ranges, and all education levels, so the deduction is pretty much the one we discussed earlier in this.

Research quesion 3:

Does employement have any impact on the mental health of individuals? In other words, Deos job/salary have any relation with individuals feeling depressed, nervous, hopeless, or restless? Is the observation similar between genders

## [1] 30647     7

For this question, the sample size we have is around 30000 and we’ll use this to make a few observations

Lets identify the levels of income and employement

## [1] "Employed for wages"               "Self-employed"                   
## [3] "Out of work for 1 year or more"   "Out of work for less than 1 year"
## [5] "A homemaker"                      "A student"                       
## [7] "Retired"                          "Unable to work"
## [1] "Less than $10,000" "Less than $15,000" "Less than $20,000"
## [4] "Less than $25,000" "Less than $35,000" "Less than $50,000"
## [7] "Less than $75,000" "$75,000 or more"
## [1] "All"      "Most"     "Some"     "A little" "None"

bfs Now lets compare these level against the people who claim to feel depressed. Lets see if theres any relationship between the income and employement levels and depression.

We can see an intersting observation from the plots that we’ve generated above. The plots essentially show us the proportion of individuals employements status in each category of the depressed variable. We can clearly see from the trents that a major chunk of people who said they felt depressed all the time are also the people who said that they are unable to work. Similarly, the people who said they didn’t feel depresed at all in the last 30 days lie in the category of people who are employed for wages. The people who are unable to work are in a very negligible.

The second plot correlates the depression with the amount of salary individuals are earning. The second plots fill detail is exactly opposite to the first one. We see in the second plot that people who said they are depressed all the time have a significant number of them lying in the “Less than $10,000” category. The poeple who said they’re not depressed at all have most of them lying in the “$75,000 and more” category.

The graphs shows the same trend we saw earlier, just the difference that exists inbetween genders. We can observe that there are significantly more women who have a salary less than 10,000 as compared to men. We can see that men in the range less than $15,000 are more than those in the less than $10,000 range. This does not invalidate out results on how salary range affects the mental health. That is because discrimination in terms of salary exists in the status quo, and that is the reason behind there being such a less number of men in the less than $10,000 range.

Lets see if we find the same trend for people who felt hopeless, nervous

The plots show a similar trend as the one we discussed before. It is indeed visible that employement status and salary range may just have an impact on whether or not they feel depressed, nervous or hopeless. There are various interpretations that we can extract from the plots that we’ve observed, but we can be certian about the fact that being highly payed does result in people having a relatively better mental health compared to those who’re unemployed or have a very low salary.

This can be because in the current status quo, apparently most of the necessities of daily life require you to have money in your pockets. Having no money or less money can bring about significant impact on the mental health because that causes people to be worried all the time, which results in them feeling depressed, hopeless and nervous.

We can further analyze and draw in depth conclusions for this particular problem using numerical variables as well, which could further solidify our understanding on why people get mentally affected by their employement status and salary range.