Mental Health in Tech

Implementation of Replicated SeeDB for Smart Recommendation of Visualizations

Python, PostgreSQL, Psycopg2

This project implements a replication of SeeDB to visualize and extract knowledge from the 'Mental Health in Tech' dataset. It includes exploratory data analysis and several pruning techniques for optimization.

The 'Mental Health in Tech' dataset is a survey dataset that can be used to assess the mental health of people working in the tech industry. It was obtained from Kaggle and originally contains 27 features and 1259 data points. Its attributes/questions include:
* Age
* Gender
* Country
* State (US)
* Are you self-employed?
* Do you have a family history of mental illness?
* Have you sought treatment for a mental health condition?
* Do you work remotely?

A sample of five data points is shown below. The table has 5 rows and 27 columns.

	Timestamp	Age	Gender	Country	state	self_employed	family_history	treatment	work_interfere	no_employees	remote_work	tech_company	benefits	care_options	wellness_program	seek_help	anonymity	leave	mental_health_consequence	phys_health_consequence	coworkers	supervisor	mental_health_interview	phys_health_interview	mental_vs_physical	obs_consequence	comments
0	2014-08-27 11:29:31	37	Female	United States	IL	NaN	No	Yes	Often	6-25	No	Yes	Yes	Not sure	No	Yes	Yes	Somewhat easy	No	No	Some of them	Yes	No	Maybe	Yes	No	NaN
1	2014-08-27 11:29:37	44	M	United States	IN	NaN	No	No	Rarely	More than 1000	No	No	Don't know	No	Don't know	Don't know	Don't know	Don't know	Maybe	No	No	No	No	No	Don't know	No	NaN
2	2014-08-27 11:29:44	32	Male	Canada	NaN	NaN	No	No	Rarely	6-25	No	Yes	No	No	No	No	Don't know	Somewhat difficult	No	No	Yes	Yes	Yes	Yes	No	No	NaN
3	2014-08-27 11:29:46	31	Male	United Kingdom	NaN	NaN	Yes	Yes	Often	26-100	No	Yes	No	Yes	No	No	No	Somewhat difficult	Yes	Yes	Some of them	No	Maybe	Maybe	No	Yes	NaN
4	2014-08-27 11:30:22	31	Male	United States	TX	NaN	No	No	Never	100-500	Yes	Yes	Yes	No	Don't know	Don't know	Don't know	Don't know	No	No	Some of them	Yes	Yes	Yes	Don't know	No	NaN

The answers to these questions give a sense of the mental health of people working in the tech industry. They can also be used to explore thought-provoking questions such as:
* How does the frequency of mental health illness vary by geographic location?
* What are the strongest predictors of mental health illness in the workplace?
* How do attitudes towards mental health vary by geographic location?
* What are the strongest predictors of certain attitudes towards mental health in the workplace?

Four pairs of target/reference view queries were implemented: [self_employed, not_self_employed], [family_history, no_family_history], [treatment, no_treatment] and [remote_work, no_remote_work]. Each pair splits the data into one target table and one reference table, so a total of 4*2 = 8 tables were generated, and four sets of values were defined for the dimension attributes A. The top-3 recommended visualizations for this dataset are:

Figure (A) shows the top-3 recommended visualizations for [self_employed, not_self_employed], (B) those for [family_history, no_family_history], (C) those for [treatment, no_treatment], and (D) those for [remote_work, no_remote_work]. Read more on my GitHub.
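
To make the target/reference idea concrete, here is a minimal sketch of how a SeeDB-style recommender might score and rank views for one pair, such as [treatment, no_treatment]. It is not the project's actual code. The rows in `ROWS` are made up for illustration (only the column names come from the dataset), the helper names `aggregate`, `normalize`, `kl_divergence`, `utility` and `top_k` are hypothetical, and the use of KL divergence as the deviation measure is an assumption, since the distance function is not stated above. The sketch also runs in memory rather than against PostgreSQL.

```python
import math
from collections import defaultdict

# Hypothetical mini-sample: column names follow the dataset, values are invented.
ROWS = [
    {"treatment": "Yes", "remote_work": "Yes", "family_history": "Yes", "Age": 30},
    {"treatment": "Yes", "remote_work": "Yes", "family_history": "No", "Age": 40},
    {"treatment": "No", "remote_work": "Yes", "family_history": "No", "Age": 35},
    {"treatment": "No", "remote_work": "No", "family_history": "No", "Age": 25},
    {"treatment": "No", "remote_work": "No", "family_history": "Yes", "Age": 45},
    {"treatment": "Yes", "remote_work": "No", "family_history": "No", "Age": 50},
]


def aggregate(rows, dimension, func, measure="Age"):
    """Group rows by `dimension` and apply `func` ('count' or 'avg') to `measure`."""
    sums, counts = defaultdict(float), defaultdict(int)
    for row in rows:
        sums[row[dimension]] += row[measure]
        counts[row[dimension]] += 1
    if func == "count":
        return {key: float(counts[key]) for key in counts}
    return {key: sums[key] / counts[key] for key in counts}


def normalize(view, keys, eps=1e-9):
    """Turn aggregated values into a probability distribution over `keys`."""
    values = [view.get(key, 0.0) + eps for key in keys]  # eps avoids log(0)
    total = sum(values)
    return [value / total for value in values]


def kl_divergence(p, q):
    """Kullback-Leibler divergence between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def utility(rows, split_col, dimension, func):
    """Deviation between the target (split_col == 'Yes') and reference ('No') views."""
    target = aggregate([r for r in rows if r[split_col] == "Yes"], dimension, func)
    reference = aggregate([r for r in rows if r[split_col] == "No"], dimension, func)
    keys = sorted(set(target) | set(reference))
    return kl_divergence(normalize(target, keys), normalize(reference, keys))


def top_k(rows, split_col, dimensions, funcs, k=3):
    """Score every (dimension, aggregate) view and return the k highest-utility ones."""
    scored = [
        ((dim, func), utility(rows, split_col, dim, func))
        for dim in dimensions
        if dim != split_col
        for func in funcs
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


if __name__ == "__main__":
    for view, score in top_k(ROWS, "treatment", ["remote_work", "family_history"], ["count", "avg"]):
        print(view, round(score, 4))
```

Each candidate view here is a (dimension attribute, aggregate function) pair over the single numeric measure `Age`. A view scores higher when its normalized target distribution deviates more from the reference distribution. This sketch scores every view exhaustively and does not apply any pruning.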