Step-by-Step Guide for EDA:¶

Understand the Data Structure:¶

We start by loading the dataset to inspect its structure. The data contains columns such as employee demographics (Age, Gender), job characteristics (Job_Role, Industry, Work_Location), and variables directly related to mental health (Mental_Health_Condition, Stress_Level, Satisfaction_with_Remote_Work, etc.).
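A quick structural overview can be sketched as below. The helper name `column_overview` and the small `sample` frame are made up for illustration; only the column names are borrowed from the dataset, and the same call should work on the full `data` frame once it is loaded.

```python
import pandas as pd


def column_overview(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: dtype, number of distinct values, missing count."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_unique": df.nunique(dropna=True),
        "n_missing": df.isna().sum(),
    })


# Tiny made-up sample (hypothetical values, real column names)
sample = pd.DataFrame({
    "Age": [32, 45, 45, 28],
    "Work_Location": ["Remote", "Hybrid", "Onsite", "Remote"],
    "Stress_Level": ["High", "Low", None, "Medium"],
})

overview = column_overview(sample)
print(overview)
```

Looking at dtype and distinct-value counts side by side makes it easier to tell which columns are numeric ratings and which are categorical labels before choosing a plot type.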

Initial Questions to Explore:¶

- How does remote work affect stress levels compared to onsite or hybrid work?
- Is there a relationship between access to mental health resources and mental health conditions?
- What is the distribution of job roles across industries, and how does this correlate with remote work satisfaction?
- How does work-life balance differ by work location?
- Are there regional differences in satisfaction with remote work?
- How does physical activity or sleep quality relate to productivity changes?

Handling Missing Data:¶

The next step is to check for any missing values, as this can affect the analysis.

Descriptive Statistics:¶

Summarize key statistics of the dataset to understand its central tendencies (e.g., average hours worked, stress levels, etc.).
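Note that `describe()` with default arguments only summarizes numeric columns. For the categorical columns, a complementary summary could look like the sketch below. The helper `categorical_summary` and the sample values are hypothetical and only illustrate the idea.

```python
import pandas as pd


def categorical_summary(df: pd.DataFrame) -> pd.DataFrame:
    """For each non-numeric column, report the most frequent label and its share."""
    rows = {}
    for col in df.select_dtypes(exclude="number").columns:
        counts = df[col].value_counts()  # missing values are excluded by default
        if counts.empty:
            continue  # skip columns that contain only missing values
        rows[col] = {
            "top": counts.index[0],
            "top_count": int(counts.iloc[0]),
            "top_share": counts.iloc[0] / counts.sum(),
        }
    return pd.DataFrame.from_dict(rows, orient="index")


# Made-up sample; only the column names come from the dataset
sample = pd.DataFrame({
    "Hours_Worked_Per_Week": [40, 35, 50, 45],
    "Work_Location": ["Remote", "Remote", "Hybrid", "Onsite"],
    "Region": ["Europe", "Asia", "Europe", "Europe"],
})

cat_summary = categorical_summary(sample)
print(cat_summary)
```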

Data Visualizations:¶

Use graphs to visualize the relationships between variables. We can create:

- Bar charts to compare remote vs. onsite work for mental health conditions.
- Pie charts for the distribution of satisfaction with remote work.
- Scatter plots or heatmaps to show correlations between stress level and access to mental health resources.

Key Insights:¶

After visualization and analysis, we can derive insights about the effects of remote work on mental health and productivity.

Now, let's perform the EDA based on the steps above. First, we will check for missing values and generate descriptive statistics.

Python Code to Conduct EDA:¶

In [ ]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset
data = pd.read_csv('Impact_of_Remote_Work_on_Mental_Health.csv')

# Step 1: Check for missing values
missing_values = data.isnull().sum()

# Step 1.1: Descriptive statistics
descriptive_stats = data.describe()

# Step 2: Distribution of mental health conditions based on work location
plt.figure(figsize=(10,6))
sns.countplot(data=data, x='Work_Location', hue='Mental_Health_Condition')
plt.title('Distribution of Mental Health Conditions by Work Location')
plt.xlabel('Work Location')
plt.ylabel('Count')
plt.show()

# Step 3: Work-Life Balance across different work locations
plt.figure(figsize=(10,6))
sns.boxplot(data=data, x='Work_Location', y='Work_Life_Balance_Rating')
plt.title('Work-Life Balance Rating by Work Location')
plt.xlabel('Work Location')
plt.ylabel('Work-Life Balance Rating')
plt.show()

# Step 4: Stress Levels based on Mental Health Resources Access
plt.figure(figsize=(10,6))
sns.countplot(data=data, x='Access_to_Mental_Health_Resources', hue='Stress_Level')
plt.title('Stress Level by Access to Mental Health Resources')
plt.xlabel('Access to Mental Health Resources')
plt.ylabel('Count')
plt.show()

# Step 5: Satisfaction with Remote Work by Region
plt.figure(figsize=(10,6))
sns.countplot(data=data, x='Region', hue='Satisfaction_with_Remote_Work')
plt.title('Satisfaction with Remote Work by Region')
plt.xlabel('Region')
plt.ylabel('Count')
plt.show()

# Step 6: Correlation heatmap to analyze relationships between numerical features
# Select only numerical features for correlation analysis
numerical_data = data.select_dtypes(include=['number'])  # Select numerical columns only

plt.figure(figsize=(12,8))
sns.heatmap(numerical_data.corr(), annot=True, cmap='coolwarm') # Calculate correlation on numerical data
plt.title('Correlation Heatmap of Features')
plt.show()

# Show missing values and descriptive stats
missing_values, descriptive_stats
Out[ ]:
(Employee_ID                             0
 Age                                     0
 Gender                                  0
 Job_Role                                0
 Industry                                0
 Years_of_Experience                     0
 Work_Location                           0
 Hours_Worked_Per_Week                   0
 Number_of_Virtual_Meetings              0
 Work_Life_Balance_Rating                0
 Stress_Level                            0
 Mental_Health_Condition              1196
 Access_to_Mental_Health_Resources       0
 Productivity_Change                     0
 Social_Isolation_Rating                 0
 Satisfaction_with_Remote_Work           0
 Company_Support_for_Remote_Work         0
 Physical_Activity                    1629
 Sleep_Quality                           0
 Region                                  0
 dtype: int64,
                Age  Years_of_Experience  Hours_Worked_Per_Week  \
 count  5000.000000          5000.000000            5000.000000   
 mean     40.995000            17.810200              39.614600   
 std      11.296021            10.020412              11.860194   
 min      22.000000             1.000000              20.000000   
 25%      31.000000             9.000000              29.000000   
 50%      41.000000            18.000000              40.000000   
 75%      51.000000            26.000000              50.000000   
 max      60.000000            35.000000              60.000000   
 
        Number_of_Virtual_Meetings  Work_Life_Balance_Rating  \
 count                 5000.000000               5000.000000   
 mean                     7.559000                  2.984200   
 std                      4.636121                  1.410513   
 min                      0.000000                  1.000000   
 25%                      4.000000                  2.000000   
 50%                      8.000000                  3.000000   
 75%                     12.000000                  4.000000   
 max                     15.000000                  5.000000   
 
        Social_Isolation_Rating  Company_Support_for_Remote_Work  
 count              5000.000000                      5000.000000  
 mean                  2.993800                         3.007800  
 std                   1.394615                         1.399046  
 min                   1.000000                         1.000000  
 25%                   2.000000                         2.000000  
 50%                   3.000000                         3.000000  
 75%                   4.000000                         4.000000  
 max                   5.000000                         5.000000  )

The major takeaways from the EDA:¶

1. Missing Data:¶

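The isnull() output shows missing values in two columns: Mental_Health_Condition (1196 of 5000 rows) and Physical_Activity (1629 of 5000 rows). All other columns are complete. These gaps may be "None" labels that pandas parsed as missing rather than truly absent answers, so they should be checked before imputing or dropping anything. Rows with missing values in a plotted column are left out of the corresponding count plots.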
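The isnull() output above reports 1196 missing entries for Mental_Health_Condition and 1629 for Physical_Activity. One possible explanation, which is an assumption to verify against the raw CSV, is that the file contains the literal text "None" and that it was read as a missing value. The sketch below uses a small made-up CSV string to show the difference. `na_values` and `keep_default_na` are standard `pd.read_csv` parameters, while the sample rows are hypothetical.

```python
import io

import pandas as pd

# Made-up three-row CSV; the labels are placeholders, not real records
csv_text = (
    "Employee_ID,Mental_Health_Condition,Physical_Activity\n"
    "EMP0001,Depression,Weekly\n"
    "EMP0002,None,None\n"
    "EMP0003,Anxiety,Daily\n"
)

# "None" treated as missing (recent pandas versions do this by default)
as_missing = pd.read_csv(io.StringIO(csv_text), na_values=["None"])

# "None" kept as an ordinary category label
as_label = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)

missing_when_parsed = as_missing.isnull().sum()
missing_when_kept = as_label.isnull().sum()
print(missing_when_parsed)
print(missing_when_kept)
```

If the "None" entries turn out to be real answers, keeping them as a label lets them show up as their own category in the count plots instead of being dropped.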

2. Mental Health Conditions by Work Location:¶

Mental health conditions like anxiety and depression are present across all work locations (Remote, Hybrid, and Onsite). Remote workers appear to have slightly higher counts of anxiety and depression than those working onsite, suggesting a potential link between remote work and mental health challenges. Because the plot shows raw counts, this comparison also depends on how many employees fall into each work location.
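Since a count plot compares raw counts, groups of different sizes can look different even when their proportions match. A minimal sketch of the within-group view, using `pd.crosstab` with `normalize="index"` on made-up rows:

```python
import pandas as pd

# Made-up sample: twice as many remote as onsite employees
sample = pd.DataFrame({
    "Work_Location": ["Remote", "Remote", "Remote", "Remote", "Onsite", "Onsite"],
    "Mental_Health_Condition": [
        "Anxiety", "Anxiety", "Depression", "Depression", "Anxiety", "Depression",
    ],
})

# Raw counts per work location and condition
counts = pd.crosstab(sample["Work_Location"], sample["Mental_Health_Condition"])

# Row-wise shares: each work location sums to 1
shares = pd.crosstab(
    sample["Work_Location"],
    sample["Mental_Health_Condition"],
    normalize="index",
)

print(counts)
print(shares)
```

In this toy example the remote group has twice the anxiety count of the onsite group, yet the share of anxiety is identical in both, which is why the shares are worth checking before reading a difference into the bars.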

3. Work-Life Balance Across Work Locations:¶

Onsite workers generally report better work-life balance, as indicated by the boxplot, whereas remote workers show a wider spread, with both high and low ratings. Hybrid workers fall between the two groups in terms of work-life balance ratings.

4. Stress Levels by Access to Mental Health Resources:¶

Employees with access to mental health resources generally report lower stress levels. However, stress is still present even when resources are available, indicating that the resources may not fully alleviate stress for everyone.

5. Satisfaction with Remote Work by Region:¶

Satisfaction with remote work varies significantly across regions. For instance, Asia has a more balanced distribution of satisfied and unsatisfied employees, while North America shows a higher percentage of unsatisfied employees compared to Europe.

6. Correlation Heatmap:¶

The correlation heatmap covers only the numeric columns and shows some interesting relationships. There is a negative correlation between "Hours Worked Per Week" and "Work-Life Balance Rating," indicating that more hours worked are associated with lower work-life balance. "Satisfaction with Remote Work" is a categorical column, so it does not appear in this heatmap. A positive relationship between "Company Support for Remote Work" and "Satisfaction with Remote Work," which would suggest that employees are more satisfied when they feel supported by their company, has to be checked separately, for example after encoding the satisfaction labels as numbers.
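Categorical columns such as Satisfaction_with_Remote_Work are excluded by `select_dtypes(include=['number'])`, so they need an explicit encoding before they can enter a correlation. The sketch below assumes the labels "Unsatisfied", "Neutral" and "Satisfied" in that order. Both the labels and the sample rows are assumptions, so check `data['Satisfaction_with_Remote_Work'].unique()` first. Ranking before calling `corr()` gives a rank-based (Spearman-style) correlation, which suits ordinal data better than plain Pearson.

```python
import pandas as pd

# Assumed label order; verify against the actual values in the dataset
satisfaction_order = {"Unsatisfied": 0, "Neutral": 1, "Satisfied": 2}

# Made-up sample rows for illustration only
sample = pd.DataFrame({
    "Company_Support_for_Remote_Work": [1, 2, 3, 4, 5, 5],
    "Satisfaction_with_Remote_Work": [
        "Unsatisfied", "Unsatisfied", "Neutral", "Neutral", "Satisfied", "Satisfied",
    ],
})

# Encode the ordinal labels as integers
sample["Satisfaction_Score"] = sample["Satisfaction_with_Remote_Work"].map(satisfaction_order)

# Pearson correlation of ranks is equivalent to Spearman's rank correlation
rank_corr = (
    sample[["Company_Support_for_Remote_Work", "Satisfaction_Score"]]
    .rank()
    .corr()
)
support_vs_satisfaction = rank_corr.loc[
    "Company_Support_for_Remote_Work", "Satisfaction_Score"
]
print(rank_corr)
```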

Conclusion:¶

The analysis shows that while remote work offers flexibility, it also brings challenges, particularly regarding mental health, stress levels, and work-life balance. Access to mental health resources and company support for remote work play significant roles in mitigating these issues. Regional differences also influence how employees perceive and handle remote work.

As we are not limited by any constraints in this case and this dataset is way too fun, let's see if we can reveal even more hidden insights, such as:

7. Changes in productivity:¶

Are there differences in productivity between work locations (remote, hybrid, onsite)? Is it possible to identify a link between “Stress Level” and “Productivity Change”?

8. Relationship between physical activity and mental health:¶

How does physical activity affect mental health and stress levels? For example, compare the “Physical Activity” column with the mental health conditions (Anxiety, Depression, None).

9. Sleep quality and stress level:¶

Is there a link between sleep quality and stress level? A scatterplot or heatmap could show whether poor sleep correlates with higher stress.

10. Regional differences in stress and work-life balance:¶

Are there regional differences in work-life balance and stress levels? This analysis could indicate cultural differences in attitudes towards remote work.

11. Satisfaction with remote work by job role:¶

Do certain occupations (Job Role) have higher satisfaction with Remote Work? This could show whether remote work works better in certain professions.

12. Relationship between stress levels and the number of virtual meetings:¶

It could be informative to examine whether a high number of virtual meetings influences the stress level.

In [ ]:
# Step 1: Productivity Change by Work Location
plt.figure(figsize=(10,6))
sns.countplot(data=data, x='Work_Location', hue='Productivity_Change')
plt.title('Productivity Change by Work Location')
plt.xlabel('Work Location')
plt.ylabel('Count')
plt.show()

# Step 2: Relationship between Physical Activity and Mental Health Condition
plt.figure(figsize=(10,6))
sns.countplot(data=data, x='Physical_Activity', hue='Mental_Health_Condition')
plt.title('Physical Activity and Mental Health Condition')
plt.xlabel('Physical Activity')
plt.ylabel('Count')
plt.show()

# Step 3: Sleep Quality vs. Stress Level
plt.figure(figsize=(10,6))
sns.countplot(data=data, x='Sleep_Quality', hue='Stress_Level')
plt.title('Sleep Quality vs. Stress Level')
plt.xlabel('Sleep Quality')
plt.ylabel('Count')
plt.show()

# Step 4: Work-Life Balance and Stress Levels by Region
plt.figure(figsize=(10,6))
sns.boxplot(data=data, x='Region', y='Work_Life_Balance_Rating', hue='Stress_Level')
plt.title('Work-Life Balance and Stress Levels by Region')
plt.xlabel('Region')
plt.ylabel('Work-Life Balance Rating')
plt.show()

# Step 5: Satisfaction with Remote Work by Job Role
plt.figure(figsize=(10,6))
sns.countplot(data=data, x='Job_Role', hue='Satisfaction_with_Remote_Work', order=data['Job_Role'].value_counts().index)
plt.title('Satisfaction with Remote Work by Job Role')
plt.xticks(rotation=90)
plt.xlabel('Job Role')
plt.ylabel('Count')
plt.show()

# Step 6: Number of Virtual Meetings vs. Stress Level
plt.figure(figsize=(10,6))
sns.scatterplot(data=data, x='Number_of_Virtual_Meetings', y='Stress_Level', hue='Work_Location')
plt.title('Number of Virtual Meetings vs. Stress Level by Work Location')
plt.xlabel('Number of Virtual Meetings')
plt.ylabel('Stress Level')
plt.show()

Key takeaways from the additional analysis:¶

7. Productivity Change by Work Location:¶

Remote workers show both increases and decreases in productivity, with a notable number reporting no change. Hybrid workers appear to have the most stable productivity, while onsite workers are less likely to experience an increase in productivity. This suggests that the flexibility of remote work may not universally lead to higher productivity and could vary by individual circumstances.
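One way to put numbers on "stable" versus "mixed" productivity is to score each answer and summarize per work location. The sketch below assumes the labels "Decrease", "No Change" and "Increase". The labels, the scoring and the sample rows are illustrative assumptions, not values taken from the dataset.

```python
import pandas as pd

# Assumed labels mapped to a simple signed score
change_score = {"Decrease": -1, "No Change": 0, "Increase": 1}

# Made-up sample rows
sample = pd.DataFrame({
    "Work_Location": [
        "Remote", "Remote", "Remote", "Remote",
        "Hybrid", "Hybrid",
        "Onsite", "Onsite",
    ],
    "Productivity_Change": [
        "Increase", "Decrease", "Increase", "No Change",
        "No Change", "No Change",
        "Decrease", "No Change",
    ],
})

# Mean shows the net direction, std shows how mixed the answers are
net_productivity = (
    sample.assign(score=sample["Productivity_Change"].map(change_score))
    .groupby("Work_Location")["score"]
    .agg(["mean", "std", "count"])
)
print(net_productivity)
```

A mean near zero with a large standard deviation would match the "both increases and decreases" pattern, while a small standard deviation would match the "stable" one.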

8. Relationship Between Physical Activity and Mental Health Condition:¶

Those who engage in weekly physical activity show higher counts of mental health conditions (such as anxiety and depression) than those who engage in physical activity less frequently. One possible reading is that those experiencing mental health issues are more aware of the need for physical activity. No physical activity appears to be associated with fewer mental health issues, though this should be treated with caution: the 1629 rows with a missing Physical_Activity value and the 1196 rows with a missing Mental_Health_Condition value are not drawn in the count plot, and underreporting or a lack of awareness may also play a role.

9. Sleep Quality vs. Stress Level:¶

Poor sleep quality correlates with higher levels of stress. Most employees with poor sleep report high stress levels, while those with good sleep quality are more likely to report lower stress. This is consistent with sleep playing an important role in managing stress, although the plot alone does not establish a causal direction or a remote-specific effect.

10. Work-Life Balance and Stress Levels by Region:¶

North America shows higher levels of stress across all work-life balance ratings compared to Europe and Asia. Work-life balance is perceived to be better in Europe, with lower stress levels overall. There is a clear indication that regional differences influence how people experience stress in relation to their work-life balance.

11. Satisfaction with Remote Work by Job Role:¶

Data Scientists and Software Engineers report higher satisfaction with remote work compared to other job roles such as Sales and HR. Certain job roles appear more suited to remote work environments, possibly due to the nature of the tasks and the autonomy required in these fields.

12. Number of Virtual Meetings vs. Stress Level:¶

The scatter plot suggests a positive correlation between the number of virtual meetings and stress levels, particularly for onsite workers. This could imply that frequent virtual meetings, especially for those who work onsite, contribute to higher stress. Remote and hybrid workers show a more varied response, with some experiencing higher stress despite fewer meetings, indicating other factors might be contributing to stress.
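Because Stress_Level is a categorical column, a scatter plot places all points on a few horizontal lines, which makes trends hard to judge by eye. A grouped summary is a simple cross-check. The sketch below assumes the labels "Low", "Medium" and "High" and uses made-up rows.

```python
import pandas as pd

# Assumed ordering of the stress labels
stress_order = ["Low", "Medium", "High"]

# Made-up sample rows
sample = pd.DataFrame({
    "Number_of_Virtual_Meetings": [2, 4, 6, 10, 12],
    "Stress_Level": ["Low", "Low", "Medium", "High", "High"],
})

# An ordered categorical keeps the groups in Low < Medium < High order
sample["Stress_Level"] = pd.Categorical(
    sample["Stress_Level"], categories=stress_order, ordered=True
)

meetings_by_stress = (
    sample.groupby("Stress_Level", observed=True)["Number_of_Virtual_Meetings"]
    .agg(["mean", "median", "count"])
)
print(meetings_by_stress)
```

If the mean number of meetings rises steadily from Low to High on the real data, that would support the reading above. A flat profile would suggest the apparent pattern in the scatter plot is mostly overplotting.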

TL;DR: So what now? Can we tell which way of working is "the best"? How do these work arrangements affect productivity and employee satisfaction? Well...

Productivity:¶

Remote Work: Employees working remotely show both increases and decreases in productivity, suggesting a mixed experience. While some may thrive in the remote environment due to fewer distractions or more flexible hours, others may struggle due to a lack of structure or collaboration.¶

Hybrid Work: Hybrid workers seem to experience a more stable productivity level, with fewer reporting significant drops or increases. This balance between working onsite and remotely might provide the best of both worlds for maintaining consistent productivity.¶

Onsite Work: Onsite workers are less likely to report an increase in productivity. The structure of onsite work could lead to more predictable productivity, but it may lack the flexibility that some employees need for optimal performance.¶

Conclusion: Hybrid work appears to provide the most stable productivity outcomes, while remote work offers flexibility that can lead to either significant gains or losses in productivity depending on individual circumstances.¶

Employee Satisfaction:¶

Remote Work: Satisfaction with remote work varies significantly based on job role. Employees in roles like Data Scientists and Software Engineers report higher satisfaction, likely because these jobs can be done independently with minimal in-person collaboration. However, other roles like Sales and HR report lower satisfaction, possibly due to the need for interpersonal interaction.¶

Hybrid Work: Hybrid work allows employees to benefit from both flexibility and social interaction. This model may lead to higher overall satisfaction as employees can tailor their schedules to meet both personal and professional needs.¶

Onsite Work: Onsite workers report lower satisfaction, which may be linked to the lack of flexibility and increased commuting or rigid working hours. This is especially true for roles that do not require constant in-person presence.¶

Conclusion: Employee satisfaction tends to be higher for those working in hybrid models or in remote roles that suit independent work. Onsite work seems to result in lower satisfaction, particularly for employees who value flexibility.¶

Final Verdict:¶

Hybrid Work seems to offer the best balance for both productivity and satisfaction. Remote work can be advantageous for certain roles or personalities but may not suit everyone. Onsite work, while stable, lacks the flexibility that appears to enhance employee satisfaction in many cases. Hybrid for the win!