Mental Health in Tech

Implementation of Replicated SeeDB for Smart Recommendation of Visualizations
Python, PostgreSQL, Psycopg2

This project is the implementation of the replicated SeeDB to visualize and extract knowledge from the 'Mental Health in Tech' dataset. It includes Exploratory Data Analysis and various pruning techniques for optimization.

The 'Mental Health in Tech' dataset is a survey dataset that can be used to assess the mental health of people working in the technological industry. The dataset has been obtained from Kaggle and originally contains 27 features and 1259 data points. It contains attributes/questions like:
   * Age
   * Gender
   * Country
   * State (US), Are you self-employed?
   * Do you have history of mental illness?
   * Have you sought treatment for a mental health condition?
   * Do you work remotely?

A sample of 5 datapoints is given below. The table has 5 rows and 27 columns.

Timestamp Age Gender Country state self_employed family_history treatment work_interfere no_employees remote_work tech_company benefits care_options wellness_program seek_help anonymity leave mental_health_consequence phys_health_consequence coworkers supervisor mental_health_interview phys_health_interview mental_vs_physical obs_consequence comments
0 2014-08-27 11:29:31 37 Female United States IL NaN No Yes Often 6-25 No Yes Yes Not sure No Yes Yes Somewhat easy No No Some of them Yes No Maybe Yes No NaN
1 2014-08-27 11:29:37 44 M United States IN NaN No No Rarely More than 1000 No No Don't know No Don't know Don't know Don't know Don't know Maybe No No No No No Don't know No NaN
2 2014-08-27 11:29:44 32 Male Canada NaN NaN No No Rarely 6-25 No Yes No No No No Don't know Somewhat difficult No No Yes Yes Yes Yes No No NaN
3 2014-08-27 11:29:46 31 Male United Kingdom NaN NaN Yes Yes Often 26-100 No Yes No Yes No No No Somewhat difficult Yes Yes Some of them No Maybe Maybe No Yes NaN
4 2014-08-27 11:30:22 31 Male United States TX NaN No No Never 100-500 Yes Yes Yes No Don't know Don't know Don't know Don't know No No Some of them Yes Yes Yes Don't know No NaN


The answers to these questions give us a feel about the possible mental health condition of the people working in the tech industry. It can also be utilized to answer some thought-provoking questions like:
   * How does the frequency of mental health illness vary by geographic location?
   * What are the strongest predictors of mental health illness in the workplace?
   * How does the attitudes towards mental health vary by geographic location?
   * What are the strongest predictors of certain attitudes towards mental health in the workplace?

Four sets of target-reference view queries have been implemented, that is, [self_employed, not_self_employed], [family_history, no_family_history], [treatment, no_treatment] and [remote_work, no_remote_work]. Therefore, a total of 4*2 = 8 tables were generated, and four sets of values were defined for Dimension Attributes A. The top-3 recommendations for the visualizations on this dataset are:

The figure (A) shows the top-3 recommended visualizations for [self_employed, not_self_employed], (B) shows for [family_history, no_family_history], (C) for [treatment, no_treatment], and (D) for [remote_work, no_remote_work]. Read more on my GitHub.