import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset
data = pd.read_csv('Impact_of_Remote_Work_on_Mental_Health.csv')

# Step 1: Check for missing values
missing_values = data.isnull().sum()

# Step 1.1: Descriptive statistics
descriptive_stats = data.describe()

# Step 2: Distribution of mental health conditions based on work location
plt.figure(figsize=(10,6))
sns.countplot(data=data, x='Work_Location', hue='Mental_Health_Condition')
plt.title('Distribution of Mental Health Conditions by Work Location')
plt.xlabel('Work Location')
plt.ylabel('Count')
plt.show()

# Step 3: Work-Life Balance across different work locations
plt.figure(figsize=(10,6))
sns.boxplot(data=data, x='Work_Location', y='Work_Life_Balance_Rating')
plt.title('Work-Life Balance Rating by Work Location')
plt.xlabel('Work Location')
plt.ylabel('Work-Life Balance Rating')
plt.show()

# Step 4: Stress Levels based on Mental Health Resources Access
plt.figure(figsize=(10,6))
sns.countplot(data=data, x='Access_to_Mental_Health_Resources', hue='Stress_Level')
plt.title('Stress Level by Access to Mental Health Resources')
plt.xlabel('Access to Mental Health Resources')
plt.ylabel('Count')
plt.show()

# Step 5: Satisfaction with Remote Work by Region
plt.figure(figsize=(10,6))
sns.countplot(data=data, x='Region', hue='Satisfaction_with_Remote_Work')
plt.title('Satisfaction with Remote Work by Region')
plt.xlabel('Region')
plt.ylabel('Count')
plt.show()

# Step 6: Correlation heatmap to analyze relationships between numerical features
# Select only numerical features for correlation analysis
numerical_data = data.select_dtypes(include=['number'])  # Select numerical columns only

plt.figure(figsize=(12,8))
sns.heatmap(numerical_data.corr(), annot=True, cmap='coolwarm') # Calculate correlation on numerical data
plt.title('Correlation Heatmap of Features')
plt.show()

# Show missing values and descriptive stats
missing_values, descriptive_stats

(Employee_ID                             0
 Age                                     0
 Gender                                  0
 Job_Role                                0
 Industry                                0
 Years_of_Experience                     0
 Work_Location                           0
 Hours_Worked_Per_Week                   0
 Number_of_Virtual_Meetings              0
 Work_Life_Balance_Rating                0
 Stress_Level                            0
 Mental_Health_Condition              1196
 Access_to_Mental_Health_Resources       0
 Productivity_Change                     0
 Social_Isolation_Rating                 0
 Satisfaction_with_Remote_Work           0
 Company_Support_for_Remote_Work         0
 Physical_Activity                    1629
 Sleep_Quality                           0
 Region                                  0
 dtype: int64,
                Age  Years_of_Experience  Hours_Worked_Per_Week  \
 count  5000.000000          5000.000000            5000.000000   
 mean     40.995000            17.810200              39.614600   
 std      11.296021            10.020412              11.860194   
 min      22.000000             1.000000              20.000000   
 25%      31.000000             9.000000              29.000000   
 50%      41.000000            18.000000              40.000000   
 75%      51.000000            26.000000              50.000000   
 max      60.000000            35.000000              60.000000   
 
        Number_of_Virtual_Meetings  Work_Life_Balance_Rating  \
 count                 5000.000000               5000.000000   
 mean                     7.559000                  2.984200   
 std                      4.636121                  1.410513   
 min                      0.000000                  1.000000   
 25%                      4.000000                  2.000000   
 50%                      8.000000                  3.000000   
 75%                     12.000000                  4.000000   
 max                     15.000000                  5.000000   
 
        Social_Isolation_Rating  Company_Support_for_Remote_Work  
 count              5000.000000                      5000.000000  
 mean                  2.993800                         3.007800  
 std                   1.394615                         1.399046  
 min                   1.000000                         1.000000  
 25%                   2.000000                         2.000000  
 50%                   3.000000                         3.000000  
 75%                   4.000000                         4.000000  
 max                   5.000000                         5.000000  )
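
The counts above show gaps in only two columns: Mental_Health_Condition (1196) and Physical_Activity (1629). One possible cause is that the CSV stores a literal `None` category in these columns and `read_csv` turns it into NaN under its default NA handling. This is an assumption; the output above does not confirm it. The sketch below uses a small made-up CSV, not the real file, to show how `keep_default_na=False` would preserve such a label while still treating empty fields as missing.

```python
import io
import pandas as pd

# Hypothetical three-row sample; the real file's labels may differ.
sample_csv = io.StringIO(
    "Employee_ID,Mental_Health_Condition,Physical_Activity\n"
    "EMP0001,Anxiety,Weekly\n"
    "EMP0002,None,Daily\n"
    "EMP0003,,None\n"
)

# keep_default_na=False switches off pandas' built-in list of NA strings;
# na_values=[''] still marks genuinely empty fields as missing.
sample = pd.read_csv(sample_csv, keep_default_na=False, na_values=[''])

missing_per_column = sample.isnull().sum()
print(missing_per_column)
print(sample['Physical_Activity'].value_counts())
```

If the real file behaves the same way, reloading it with these two arguments and re-running `data.isnull().sum()` would show whether the 1196 and 1629 entries are true gaps or a category that was silently dropped from the count plots.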


# Step 1: Productivity Change by Work Location
plt.figure(figsize=(10,6))
sns.countplot(data=data, x='Work_Location', hue='Productivity_Change')
plt.title('Productivity Change by Work Location')
plt.xlabel('Work Location')
plt.ylabel('Count')
plt.show()

# Step 2: Relationship between Physical Activity and Mental Health Condition
plt.figure(figsize=(10,6))
sns.countplot(data=data, x='Physical_Activity', hue='Mental_Health_Condition')
plt.title('Physical Activity and Mental Health Condition')
plt.xlabel('Physical Activity')
plt.ylabel('Count')
plt.show()

# Step 3: Sleep Quality vs. Stress Level
plt.figure(figsize=(10,6))
sns.countplot(data=data, x='Sleep_Quality', hue='Stress_Level')
plt.title('Sleep Quality vs. Stress Level')
plt.xlabel('Sleep Quality')
plt.ylabel('Count')
plt.show()

# Step 4: Work-Life Balance and Stress Levels by Region
plt.figure(figsize=(10,6))
sns.boxplot(data=data, x='Region', y='Work_Life_Balance_Rating', hue='Stress_Level')
plt.title('Work-Life Balance and Stress Levels by Region')
plt.xlabel('Region')
plt.ylabel('Work-Life Balance Rating')
plt.show()

# Step 5: Satisfaction with Remote Work by Job Role
plt.figure(figsize=(10,6))
sns.countplot(data=data, x='Job_Role', hue='Satisfaction_with_Remote_Work', order=data['Job_Role'].value_counts().index)
plt.title('Satisfaction with Remote Work by Job Role')
plt.xticks(rotation=90)
plt.xlabel('Job Role')
plt.ylabel('Count')
plt.show()

# Step 6: Number of Virtual Meetings vs. Stress Level
plt.figure(figsize=(10,6))
sns.scatterplot(data=data, x='Number_of_Virtual_Meetings', y='Stress_Level', hue='Work_Location')
plt.title('Number of Virtual Meetings vs. Stress Level by Work Location')
plt.xlabel('Number of Virtual Meetings')
plt.ylabel('Stress Level')
plt.show()
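
Stress_Level does not appear in the `describe()` output, so it is a non-numeric column, and a scatter plot against it stacks every point onto a few horizontal rows. A per-level summary of meeting counts may be easier to read. The sketch below uses made-up rows; the labels `Low`, `Medium`, `High` and their ordering are assumptions about how the column is coded.

```python
import pandas as pd

# Made-up rows for illustration only; not taken from the dataset.
toy = pd.DataFrame({
    'Stress_Level': ['Low', 'High', 'Medium', 'High', 'Low', 'Medium'],
    'Number_of_Virtual_Meetings': [2, 12, 7, 10, 4, 9],
})

# Assumed label order, so the summary is sorted Low -> High, not alphabetically.
stress_order = ['Low', 'Medium', 'High']
toy['Stress_Level'] = pd.Categorical(
    toy['Stress_Level'], categories=stress_order, ordered=True
)

# Count, mean, and median of meetings within each stress level.
meetings_by_stress = (
    toy.groupby('Stress_Level', observed=True)['Number_of_Virtual_Meetings']
       .agg(['count', 'mean', 'median'])
)
print(meetings_by_stress)
```

On the real frame, the same grouping can be applied to `data`, and `sns.boxplot(data=data, x='Stress_Level', y='Number_of_Virtual_Meetings', order=stress_order)` would give the visual counterpart.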

Step-by-Step Guide for EDA:

Understand the Data Structure:

Initial Questions to Explore:

Descriptive Statistics:

Data Visualizations:

Key Insights:

Python Code to Conduct EDA:

The major takeaways from the EDA:

1. Missing Data:

2. Mental Health Conditions by Work Location:

3. Work-Life Balance Across Work Locations:

4. Stress Levels by Access to Mental Health Resources:

5. Satisfaction with Remote Work by Region:

6. Correlation Heatmap:

Conclusion:

7. Changes in productivity:

8. Relationship between physical activity and mental health:

9. Sleep quality and stress level:

11. Satisfaction with remote work by job role:

12. Relationship between stress levels and the number of virtual meetings:

Key takeaways from the additional analysis:

7. Productivity Change by Work Location:

8. Relationship Between Physical Activity and Mental Health Condition:

9. Sleep Quality vs. Stress Level:

10. Work-Life Balance and Stress Levels by Region:

11. Satisfaction with Remote Work by Job Role:

12. Number of Virtual Meetings vs. Stress Level:

Productivity:

Remote Work: Employees working remotely show both increases and decreases in productivity, suggesting a mixed experience. While some may thrive in the remote environment due to fewer distractions or more flexible hours, others may struggle due to a lack of structure or collaboration.

Hybrid Work: Hybrid workers seem to experience a more stable productivity level, with fewer reporting significant drops or increases. This balance between working onsite and remotely might provide the best of both worlds for maintaining consistent productivity.

Onsite Work: Onsite workers are less likely to report an increase in productivity. The structure of onsite work could lead to more predictable productivity, but it may lack the flexibility that some employees need for optimal performance.

Conclusion: Hybrid work appears to provide the most stable productivity outcomes, while remote work offers flexibility that can lead to either significant gains or losses in productivity depending on individual circumstances.
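
Raw counts make these comparisons hard to judge when the three groups differ in size, so row-normalised shares are a more direct check. The sketch below runs on a dozen made-up rows and says nothing about the real results; the labels `Increase`, `Decrease`, and `No Change` are assumptions about how Productivity_Change is coded.

```python
import pandas as pd

# Made-up rows for illustration only; not taken from the dataset.
toy = pd.DataFrame({
    'Work_Location': ['Remote'] * 4 + ['Hybrid'] * 4 + ['Onsite'] * 4,
    'Productivity_Change': [
        'Increase', 'Decrease', 'Increase', 'Decrease',     # Remote
        'No Change', 'No Change', 'Increase', 'No Change',  # Hybrid
        'No Change', 'Decrease', 'No Change', 'Decrease',   # Onsite
    ],
})

# normalize='index' divides each row by its total, so every work location
# sums to 1 and the columns are directly comparable across locations.
shares = pd.crosstab(
    toy['Work_Location'], toy['Productivity_Change'], normalize='index'
)
print(shares.round(2))
```

Applied to `data`, a location whose `No Change` share is clearly the largest would support the "stable" reading, while similar shares across locations would weaken it.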

Employee Satisfaction:

Hybrid Work: Hybrid work allows employees to benefit from both flexibility and social interaction. This model may lead to higher overall satisfaction as employees can tailor their schedules to meet both personal and professional needs.

Onsite Work: Onsite workers report lower satisfaction, which may be linked to the lack of flexibility and increased commuting or rigid working hours. This is especially true for roles that do not require constant in-person presence.

Conclusion: Employee satisfaction tends to be higher for those working in hybrid models or in remote roles that suit independent work. Onsite work seems to result in lower satisfaction, particularly for employees who value flexibility.
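
Differences read off a count plot can come from chance alone, so a chi-square test of independence is one way to check whether satisfaction really varies with work location. The sketch below computes the statistic by hand on a made-up contingency table. The function name `chi_square_statistic`, the counts, and the two satisfaction labels are all illustrative assumptions, not values from the dataset.

```python
import numpy as np
import pandas as pd

def chi_square_statistic(table):
    """Return (chi-square statistic, degrees of freedom) for a count table."""
    observed = table.to_numpy(dtype=float)
    row_totals = observed.sum(axis=1, keepdims=True)
    col_totals = observed.sum(axis=0, keepdims=True)
    # Expected counts if rows and columns were independent.
    expected = row_totals @ col_totals / observed.sum()
    statistic = float(((observed - expected) ** 2 / expected).sum())
    dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
    return statistic, dof

# Made-up counts for illustration only.
counts = pd.DataFrame(
    [[30, 10], [20, 20], [10, 30]],
    index=['Hybrid', 'Remote', 'Onsite'],
    columns=['Satisfied', 'Unsatisfied'],
)

statistic, dof = chi_square_statistic(counts)
print(statistic, dof)
```

On the real data, the table could be built with `pd.crosstab(data['Work_Location'], data['Satisfaction_with_Remote_Work'])`. The statistic is then compared against the chi-square distribution with `dof` degrees of freedom (the 5% critical value for 2 degrees of freedom is about 5.99); `scipy.stats.chi2_contingency` performs the same computation and also returns a p-value. The hand-rolled version assumes no row or column total is zero.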

Final Verdict: