OHC Survey overview

Download SAS data

Download SPSS data

Download Excel file

Download txt file

Introduction to OHC Survey sampling design

The sampling design of the OHC Survey is an example of stratified cluster sampling in which both one-stage and two-stage sampling have been used. The pedagogical data set constructed for training purposes in this example includes a total of 250 clusters, i.e. industrial establishments, organized in five strata (strata 2 to 6), and a total of 7841 persons. Stratification is based on the type of industry and cluster size (the number of salaried employees). Only clusters with at least 10 employees are included in the OHC example data set. The number of clusters varies from stratum to stratum. The average cluster sample size is 11.2 employees. A more detailed description of the study design and sampling design of the OHC Survey is given in Section 5.6 of Lehtonen and Pahkinen (2004).
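The stratum and cluster structure described above can be inspected directly once the data are in R. The following is a minimal sketch, not part of the original material: the helper name `design_summary` is hypothetical, and it assumes the data frame has columns named STRATUM and CLUSTER. Because the source does not say whether cluster numbers are unique across strata, clusters are identified here by (STRATUM, CLUSTER) pairs.

```r
# Summarise a stratified cluster sample: clusters and persons per stratum.
# design_summary is a hypothetical helper; it assumes columns STRATUM and CLUSTER.
design_summary <- function(d) {
  # Identify clusters by (STRATUM, CLUSTER) pairs in case cluster numbers
  # restart within each stratum (an assumption; the source does not say).
  cl <- unique(d[, c("STRATUM", "CLUSTER")])
  clusters <- table(cl$STRATUM)   # number of distinct clusters per stratum
  persons  <- table(d$STRATUM)    # number of sampled persons per stratum
  n_cl <- as.vector(clusters[names(persons)])
  data.frame(
    STRATUM   = as.integer(names(persons)),
    clusters  = n_cl,
    persons   = as.vector(persons),
    mean_size = as.vector(persons) / n_cl   # average cluster sample size
  )
}

# Usage, once the data frame `ohc` has been read in:
# design_summary(ohc)
```

The column sums of `clusters` and `persons` can then be compared with the totals quoted in the text.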

To give an idea of the conditional distributions and correlations of the response and predictor variables, basic descriptive statistics and the correlation matrix of AGE, PHYS, CHRON, PSYCH and PSYCH2 are also shown, separately for each sex.

The list of variables in the OHC Survey data set is shown below.

-----Variables Ordered by Position-----
#  Variable  Type  Len  Pos  Label
1  STRATUM   Num   8    8    Stratum Identification, 2 to 6
2  CLUSTER   Num   8    16   Cluster Identification
3  SEX       Num   8    24   1 males, 2 females
4  AGE       Num   8    32   in years, range 15 to 64
5  PHYS      Num   8    40   Physical Health Hazards, 1 present, 0 otherwise
6  CHRON     Num   8    48   Chronic morbidity, 1 present, 0 otherwise
7  PSYCH     Num   8    56   Psychic Strain, standardized first principal component of nine psychic symptoms
8  PSYCH2    Num   8    64   Psychic Strain, constructed from PSYCH such that score below median = 0 and above median = 1

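The construction of PSYCH2 from PSYCH is a median split. A minimal sketch of such a split is given here for illustration; the function name `median_split` is hypothetical, and two details are assumptions because the source does not state them: the median is taken over the pooled sample, and scores exactly equal to the median are coded 0.

```r
# Dichotomise a continuous score at its median: 1 above the median, 0 otherwise.
# median_split is a hypothetical helper. How ties at the median were handled
# in the OHC data is not stated in the source; here they are coded 0.
median_split <- function(x) {
  as.integer(x > median(x, na.rm = TRUE))
}

# Usage, once the data frame `ohc` has been read in:
# table(ohc$PSYCH2, median_split(ohc$PSYCH))   # cross-check against PSYCH2
```

If the cross-tabulation has off-diagonal counts, the original coding probably used a different convention for ties or for the reference median.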
SEX=1 (MALES)
5 Variables: AGE PHYS CHRON PSYCH PSYCH2

Simple Statistics
Variable     N       Mean    Std Dev         Sum    Minimum    Maximum
AGE       4485   37.53467   10.52197      168343   15.00000   64.00000
PHYS      4485    0.45953    0.49842        2061          0    1.00000
CHRON     4485    0.29298    0.45518        1314          0    1.00000
PSYCH     4485   -0.10078    0.96296  -452.01617   -0.99771    4.75788
PSYCH2    4485    0.45485    0.49801        2040          0    1.00000

Pearson Correlation Coefficients, N = 4485
Prob > |r| under H0: Rho=0
               AGE      PHYS     CHRON     PSYCH    PSYCH2
AGE        1.00000  -0.08666   0.27922   0.10566   0.07330
                      <.0001    <.0001    <.0001    <.0001
PHYS      -0.08666   1.00000   0.03065   0.06632   0.05261
            <.0001              0.0401    <.0001    0.0004
CHRON      0.27922   0.03065   1.00000   0.18658   0.13019
            <.0001    0.0401              <.0001    <.0001
PSYCH      0.10566   0.06632   0.18658   1.00000   0.78232
            <.0001    <.0001    <.0001              <.0001
PSYCH2     0.07330   0.05261   0.13019   0.78232   1.00000
            <.0001    0.0004    <.0001    <.0001

 
SEX=2 (FEMALES)
5 Variables: AGE PHYS CHRON PSYCH PSYCH2

Simple Statistics
Variable     N       Mean    Std Dev         Sum    Minimum    Maximum
AGE       3356   37.64511   10.91130      126337   17.00000   64.00000
PHYS      3356    0.19368    0.39524   650.00000          0    1.00000
CHRON     3356    0.29172    0.45462   979.00000          0    1.00000
PSYCH     3356    0.13469    1.03235   452.01617   -0.99771    4.75788
PSYCH2    3356    0.55900    0.49658        1876          0    1.00000

Pearson Correlation Coefficients, N = 3356
Prob > |r| under H0: Rho=0
               AGE      PHYS     CHRON     PSYCH    PSYCH2
AGE        1.00000   0.03530   0.28712   0.05143   0.01324
                      0.0409    <.0001    0.0029    0.4431
PHYS       0.03530   1.00000   0.06533   0.11954   0.08451
            0.0409              0.0002    <.0001    <.0001
CHRON      0.28712   0.06533   1.00000   0.20057   0.13565
            <.0001    0.0002              <.0001    <.0001
PSYCH      0.05143   0.11954   0.20057   1.00000   0.74370
            0.0029    <.0001    <.0001              <.0001
PSYCH2     0.01324   0.08451   0.13565   0.74370   1.00000
            0.4431    <.0001    <.0001    <.0001
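Descriptive statistics and correlation matrices of this kind can be recomputed in R. The sketch below is an illustration only: the function name `describe_by_sex` is hypothetical, and it assumes the data frame carries the column names from the variable list. These are plain sample statistics that do not account for stratification or clustering.

```r
# Descriptive statistics and Pearson correlations, separately by SEX.
# describe_by_sex is a hypothetical helper; column names are assumed to
# match the variable list of the OHC data set.
describe_by_sex <- function(d, vars = c("AGE", "PHYS", "CHRON", "PSYCH", "PSYCH2")) {
  lapply(split(d[vars], d$SEX), function(g) {
    list(
      n    = nrow(g),        # group size
      mean = colMeans(g),    # variable means
      sd   = sapply(g, sd),  # standard deviations
      cor  = cor(g)          # Pearson correlation matrix (the default method)
    )
  })
}

# Usage, once the data frame `ohc` has been read in:
# res <- describe_by_sex(ohc)
# res[["1"]]$cor                      # correlation matrix for males
# p-value for a single pair, comparable to the "Prob > |r|" entries:
# with(subset(ohc, SEX == 1), cor.test(AGE, PHYS)$p.value)
```

The result is a list with one element per value of SEX, which keeps the two groups separate in the same way as the listings above.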




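NOTE: Read the data set into R with these commands:

# Replace the string with the folder that holds ohc.txt on your computer
setwd("path to OHC data on your computer")
# The first line of ohc.txt holds the variable names
ohc <- read.table("ohc.txt", header = TRUE)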
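
After reading the file, it can be useful to confirm that the data frame matches the variable list on this page. The following sketch is an illustration under assumptions: the function name `check_ohc` is hypothetical, and it assumes that the header line of the text file uses the variable names from the list above.

```r
# Check a data frame against the documented variables and value ranges.
# check_ohc is a hypothetical helper; it assumes the file header uses the
# variable names given in the variable list.
check_ohc <- function(d) {
  expected <- c("STRATUM", "CLUSTER", "SEX", "AGE",
                "PHYS", "CHRON", "PSYCH", "PSYCH2")
  stopifnot(
    all(expected %in% names(d)),      # all documented variables present
    all(d$STRATUM %in% 2:6),          # strata 2 to 6
    all(d$SEX %in% 1:2),              # 1 males, 2 females
    all(d$AGE >= 15 & d$AGE <= 64),   # age range 15 to 64
    all(d$PHYS %in% 0:1),             # binary indicators
    all(d$CHRON %in% 0:1),
    all(d$PSYCH2 %in% 0:1)
  )
  invisible(TRUE)
}

# Usage, once the data frame `ohc` has been read in:
# check_ohc(ohc)
# dim(ohc)   # number of persons and number of variables
```

The function stops with an error at the first check that fails and returns silently otherwise.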