Jednofaktorová MANOVA

Máme k dispozícii merania 150 egyptských lebiek z obdobia od r.4000 pnl do 150 nl. Merané veličiny sú: maximalna sirka (mb), bazibregmaticka vyska (bh), bazialnoveolarna dlzka (bl), nazalna vyska (nh) Zaujíma nás, ako sa tieto miery menili v case (systematické zmeny naznačujú, že mohlo dochádzať ku kríženiu s inými národmi).

library(heplots)
## Loading required package: car
## Loading required package: carData
## 
## Attaching package: 'heplots'
## The following object is masked from 'package:rgl':
## 
##     arrow3d
data("skulls", package = "HSAUR")
str(skulls)
## 'data.frame':    150 obs. of  5 variables:
##  $ epoch: Ord.factor w/ 5 levels "c4000BC"<"c3300BC"<..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ mb   : num  131 125 131 119 136 138 139 125 131 134 ...
##  $ bh   : num  138 131 132 132 143 137 130 136 134 134 ...
##  $ bl   : num  89 92 99 96 100 89 108 93 102 99 ...
##  $ nh   : num  49 48 50 44 54 56 48 48 51 51 ...
some(skulls, n=5)
##       epoch  mb  bh  bl nh
## 20  c4000BC 132 131 101 49
## 44  c3300BC 138 134  98 49
## 53  c3300BC 135 136  97 52
## 109  c200BC 136 128  93 54
## 126  cAD150 126 138 101 52

Vizualizácia dát

Scatterplots po dvojiciach

scatterplot(mb ~ bh|epoch, data=skulls, ellipse=TRUE, levels=0.68, smooth=FALSE, legend.coords="topright")

scatterplot(mb ~ bl|epoch, data=skulls, ellipse=TRUE, levels=0.68, smooth=FALSE, legend.coords="topright")

pairs(skulls[c("mb","bh","bl","nh")], main="Egyptian Skull Data: Five Epochs",pch=as.numeric(skulls$epoch), col=as.numeric(skulls$epoch))

Priemery podľa epochy

means <- aggregate(skulls[,c("mb", "bh", "bl", "nh")],list(epoch = skulls$epoch), mean)
rownames(means) <- levels(skulls$epoch)
means
##           epoch       mb       bh       bl       nh
## c4000BC c4000BC 131.3667 133.6000 99.16667 50.53333
## c3300BC c3300BC 132.3667 132.7000 99.06667 50.23333
## c1850BC c1850BC 134.4667 133.8000 96.03333 50.56667
## c200BC   c200BC 135.5000 132.3000 94.53333 51.96667
## cAD150   cAD150 136.1667 130.3333 93.50000 51.36667

Boxploty podľa epochy pre každú premennú

library(lattice)
library(reshape2)
vlab <- c("maxBreadth", "basibHeight", "basialLength", "nasalHeight")
sklong<-melt(skulls,id="epoch")
bwplot(value ~ epoch | variable, data=sklong, scales="free",ylab="Variable value",xlab="Epoch",strip=strip.custom(factor.levels=paste(vlab,"(",levels(sklong$variable),")",sep="")),panel=function(x,y,...){ panel.bwplot(x,y,...)
panel.linejoin(x,y,col="red",...)})

Viacrozmerná analýza rozptylu

Prvý spôsob

sk.manova <- manova(cbind(mb, bh, bl, nh) ~ epoch,data=skulls)
summary(sk.manova, test="Wilks")
##            Df   Wilks approx F num Df den Df   Pr(>F)    
## epoch       4 0.66359   3.9009     16 434.45 7.01e-07 ***
## Residuals 145                                            
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(sk.manova, test="Pillai")
##            Df  Pillai approx F num Df den Df    Pr(>F)    
## epoch       4 0.35331    3.512     16    580 4.675e-06 ***
## Residuals 145                                             
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(sk.manova, test="Roy")
##            Df    Roy approx F num Df den Df    Pr(>F)    
## epoch       4 0.4251    15.41      4    145 1.588e-10 ***
## Residuals 145                                            
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(sk.manova, test="Hotelling-Lawley")
##            Df Hotelling-Lawley approx F num Df den Df    Pr(>F)    
## epoch       4          0.48182    4.231     16    562 8.278e-08 ***
## Residuals 145                                                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Druhý spôsob

sk.mod<-lm(cbind(mb,bh,bl,nh)~epoch,data=skulls)
Manova(sk.mod,test="Wilks")
## 
## Type II MANOVA Tests: Wilks test statistic
##       Df test stat approx F num Df den Df   Pr(>F)    
## epoch  4   0.66359   3.9009     16 434.45 7.01e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
pairs(sk.manova)

Jednorozmerné testy

summary.aov(sk.manova)
##  Response mb :
##              Df  Sum Sq Mean Sq F value    Pr(>F)    
## epoch         4  502.83 125.707  5.9546 0.0001826 ***
## Residuals   145 3061.07  21.111                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##  Response bh :
##              Df Sum Sq Mean Sq F value  Pr(>F)  
## epoch         4  229.9  57.477  2.4474 0.04897 *
## Residuals   145 3405.3  23.485                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##  Response bl :
##              Df Sum Sq Mean Sq F value    Pr(>F)    
## epoch         4  803.3 200.823  8.3057 4.636e-06 ***
## Residuals   145 3506.0  24.179                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##  Response nh :
##              Df Sum Sq Mean Sq F value Pr(>F)
## epoch         4   61.2  15.300   1.507 0.2032
## Residuals   145 1472.1  10.153

Párové viacrozmerné testy pre dvojice hodnôt premennej ‘epoch’.

Výsledky naznačujú, že čím su epochy vzdialenejšie, tým viac sú rozdielne hodnoty jednotlivých rozmerov lebiek

summary(manova(cbind(mb, bh, bl, nh) ~ epoch, data = skulls,subset = epoch %in% c("c4000BC", "c3300BC")))
##           Df   Pillai approx F num Df den Df Pr(>F)
## epoch      1 0.027674  0.39135      4     55 0.8139
## Residuals 58
summary(manova(cbind(mb, bh, bl, nh) ~ epoch, data = skulls,subset = epoch %in% c("c4000BC", "c1850BC")))
##           Df  Pillai approx F num Df den Df  Pr(>F)  
## epoch      1 0.18757   3.1744      4     55 0.02035 *
## Residuals 58                                         
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(manova(cbind(mb, bh, bl, nh) ~ epoch, data = skulls,subset = epoch %in% c("c4000BC", "c200BC")))
##           Df  Pillai approx F num Df den Df    Pr(>F)    
## epoch      1 0.30297   5.9766      4     55 0.0004564 ***
## Residuals 58                                             
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(manova(cbind(mb, bh, bl, nh) ~ epoch, data = skulls, subset = epoch %in% c("c4000BC", "cAD150")))
##           Df  Pillai approx F num Df den Df    Pr(>F)    
## epoch      1 0.36182   7.7956      4     55 4.736e-05 ***
## Residuals 58                                             
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Bonferroniho intervaly spoľahlivosti

skulls.means <- as.matrix(means[,2:5]) 
skulls.S <-  crossprod(sk.manova$residuals) / sk.manova$df.residual
skulls.ns <- table(skulls$epoch)

(n <- sum(skulls.ns))
## [1] 150
(p <- dim(skulls.means)[2])
## [1] 4
(g <- dim(skulls.means)[1])
## [1] 5

Porovnanie c4000BC a c3300BC:

print( outer(skulls.means["c4000BC",]-skulls.means["c3300BC",], c(1,1)) +
outer(qt(1-0.05/(p*g*(g-1)),n-g) *sqrt((1/skulls.ns["c4000BC"]+1/skulls.ns["c3300BC"])*diag(skulls.S)),c(-1,1)),
digits=3 )
##     [,1] [,2]
## mb -4.91 2.91
## bh -3.22 5.02
## bl -4.08 4.28
## nh -2.41 3.01

Porovnanie c3300BC a cAD150:

print( outer(skulls.means["c3300BC",]-skulls.means["cAD150",], c(1,1)) +
       outer(qt(1-0.05/(p*g*(g-1)),n-g) *sqrt((1/skulls.ns["c3300BC"]+1/skulls.ns["cAD150"])*diag(skulls.S)),
             c(-1,1)),digits=3 )
##     [,1]  [,2]
## mb -7.71 0.105
## bh -1.75 6.486
## bl  1.39 9.746
## nh -3.84 1.575

Overenie predpokladov

attach(skulls)
by(skulls[,-1], epoch, var)
## epoch: c4000BC
##           mb         bh         bl         nh
## mb 26.309195  4.1517241  0.4540230  7.2459770
## bh  4.151724 19.9724138 -0.7931034  0.3931034
## bl  0.454023 -0.7931034 34.6264368 -1.9195402
## nh  7.245977  0.3931034 -1.9195402  7.6367816
## ------------------------------------------------------------ 
## epoch: c3300BC
##           mb        bh         bl        nh
## mb 23.136782  1.010345  4.7678161 1.8425287
## bh  1.010345 21.596552  3.3655172 5.6241379
## bl  4.767816  3.365517 18.8919540 0.1908046
## nh  1.842529  5.624138  0.1908046 8.7367816
## ------------------------------------------------------------ 
## epoch: c1850BC
##            mb          bh         bl          nh
## mb 12.1195402  0.78620690 -0.7747126  0.89885057
## bh  0.7862069 24.78620690  3.5931034 -0.08965517
## bl -0.7747126  3.59310345 20.7229885  1.67011494
## nh  0.8988506 -0.08965517  1.6701149 12.59885057
## ------------------------------------------------------------ 
## epoch: c200BC
##           mb        bh        bl       nh
## mb 15.362069 -5.534483 -2.172414 2.051724
## bh -5.534483 26.355172  8.110345 6.148276
## bl -2.172414  8.110345 21.085057 5.328736
## nh  2.051724  6.148276  5.328736 7.964368
## ------------------------------------------------------------ 
## epoch: cAD150
##            mb         bh         bl         nh
## mb 28.6264368 -0.2298851 -1.8793103 -1.9942529
## bh -0.2298851 24.7126437 11.7241379  2.1494253
## bl -1.8793103 11.7241379 25.5689655  0.3965517
## nh -1.9942529  2.1494253  0.3965517 13.8264368
bartlett.test(mb~epoch,data = skulls)
## 
##  Bartlett test of homogeneity of variances
## 
## data:  mb by epoch
## Bartlett's K-squared = 7.3382, df = 4, p-value = 0.1191
bartlett.test(bh~epoch,data = skulls)
## 
##  Bartlett test of homogeneity of variances
## 
## data:  bh by epoch
## Bartlett's K-squared = 0.73148, df = 4, p-value = 0.9474
bartlett.test(bl~epoch,data = skulls)
## 
##  Bartlett test of homogeneity of variances
## 
## data:  bl by epoch
## Bartlett's K-squared = 3.5155, df = 4, p-value = 0.4755
bartlett.test(nh~epoch,data = skulls)
## 
##  Bartlett test of homogeneity of variances
## 
## data:  nh by epoch
## Bartlett's K-squared = 4.3763, df = 4, p-value = 0.3575
(means.by.grps <- cbind(tapply(mb,epoch,mean), tapply(bh,epoch,mean), tapply(bl,epoch,mean), tapply(nh,epoch,mean)))
##             [,1]     [,2]     [,3]     [,4]
## c4000BC 131.3667 133.6000 99.16667 50.53333
## c3300BC 132.3667 132.7000 99.06667 50.23333
## c1850BC 134.4667 133.8000 96.03333 50.56667
## c200BC  135.5000 132.3000 94.53333 51.96667
## cAD150  136.1667 130.3333 93.50000 51.36667
boxM(skulls[ ,2:5], skulls[, 1])
## 
##  Box's M-test for Homogeneity of Covariance Matrices
## 
## data:  skulls[, 2:5]
## Chi-Sq (approx.) = 45.667, df = 40, p-value = 0.2483
library(mvnormtest)
mshapiro.test(t(skulls[,-1]))
## 
##  Shapiro-Wilk normality test
## 
## data:  Z
## W = 0.98687, p-value = 0.1685

Dvojfaktorová MANOVA

data(Plastic)
some(Plastic)
##    tear gloss opacity rate additive
## 2   6.2   9.9     6.4  Low      Low
## 4   6.5   9.6     4.1  Low      Low
## 9   6.1   9.5     1.9  Low     High
## 10  6.3   9.4     5.7  Low     High
## 13  7.2   8.3     3.8 High      Low
## 14  7.1   8.4     1.6 High      Low
## 15  6.8   8.5     3.4 High      Low
## 17  7.0   8.8     5.2 High     High
## 19  7.5  10.1     2.7 High     High
## 20  7.6   9.2     1.9 High     High
plastic.mod<-lm(cbind(tear,gloss,opacity)~rate*additive,data=Plastic)
Anova(plastic.mod,test.statistic="Roy")
## 
## Type II MANOVA Tests: Roy test statistic
##               Df test stat approx F num Df den Df   Pr(>F)   
## rate           1   1.61877   7.5543      3     14 0.003034 **
## additive       1   0.91192   4.2556      3     14 0.024745 * 
## rate:additive  1   0.28683   1.3385      3     14 0.301782   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Usudzujeme, že interakcia rate:additive nie je významná.

Jednorozmerné testy pre tear, gloss a opacity:

Anova(update(plastic.mod,tear~.))
## Anova Table (Type II tests)
## 
## Response: tear
##               Sum Sq Df F value   Pr(>F)   
## rate          1.7405  1 15.7868 0.001092 **
## additive      0.7605  1  6.8980 0.018330 * 
## rate:additive 0.0005  1  0.0045 0.947143   
## Residuals     1.7640 16                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Anova(update(plastic.mod,gloss~.))
## Anova Table (Type II tests)
## 
## Response: gloss
##               Sum Sq Df F value  Pr(>F)  
## rate          1.3005  1  7.9178 0.01248 *
## additive      0.6125  1  3.7291 0.07139 .
## rate:additive 0.5445  1  3.3151 0.08740 .
## Residuals     2.6280 16                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Anova(update(plastic.mod,opacity~.))
## Anova Table (Type II tests)
## 
## Response: opacity
##               Sum Sq Df F value Pr(>F)
## rate           0.421  1  0.1036 0.7517
## additive       4.900  1  1.2077 0.2881
## rate:additive  3.960  1  0.9760 0.3379
## Residuals     64.924 16
colors=c("red","blue","green","brown")
heplot(plastic.mod,size="effect",col=colors,cex=1.25)

colors = c("pink", "darkblue", "darkgreen", "brown")
heplot3d(plastic.mod, col=colors)

You must enable Javascript to view this page properly.

Testy lineárnych hypotéz

plastic.mod
## 
## Call:
## lm(formula = cbind(tear, gloss, opacity) ~ rate * additive, data = Plastic)
## 
## Coefficients:
##                        tear   gloss  opacity
## (Intercept)             6.30   9.56   3.74  
## rateHigh                0.58  -0.84  -0.60  
## additiveHigh            0.38   0.02   0.10  
## rateHigh:additiveHigh   0.02   0.66   1.78
linearHypothesis(plastic.mod, c("rateHigh", "additiveHigh"), verbose=TRUE)
## 
## Hypothesis matrix:
##              (Intercept) rateHigh additiveHigh rateHigh:additiveHigh
## rateHigh               0        1            0                     0
## additiveHigh           0        0            1                     0
## 
## Right-hand-side matrix:
##              tear gloss opacity
## rateHigh        0     0       0
## additiveHigh    0     0       0
## 
## Estimated linear function (hypothesis.matrix %*% coef - rhs):
##              tear gloss opacity
## rateHigh     0.58 -0.84    -0.6
## additiveHigh 0.38  0.02     0.1
## 
## Sum of squares and products for the hypothesis:
##           tear     gloss   opacity
## tear     0.868 -1.086000 -0.750000
## gloss   -1.086  2.409333  1.846667
## opacity -0.750  1.846667  1.433333
## 
## Sum of squares and products for error:
##           tear  gloss opacity
## tear     1.764  0.020  -3.070
## gloss    0.020  2.628  -0.552
## opacity -3.070 -0.552  64.924
## 
## Multivariate Tests: 
##                  Df test stat approx F num Df den Df    Pr(>F)   
## Pillai            2 0.7116133 2.761645      6     30 0.0293945 * 
## Wilks             2 0.3740960 2.963170      6     28 0.0228392 * 
## Hotelling-Lawley  2 1.4440003 3.128667      6     26 0.0191755 * 
## Roy               2 1.2625312 6.312656      3     15 0.0055424 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
linearHypothesis(plastic.mod, c("rateHigh", "additiveHigh", "rateHigh:additiveHigh"), verbose=TRUE)
## 
## Hypothesis matrix:
##                       (Intercept) rateHigh additiveHigh rateHigh:additiveHigh
## rateHigh                        0        1            0                     0
## additiveHigh                    0        0            1                     0
## rateHigh:additiveHigh           0        0            0                     1
## 
## Right-hand-side matrix:
##                       tear gloss opacity
## rateHigh                 0     0       0
## additiveHigh             0     0       0
## rateHigh:additiveHigh    0     0       0
## 
## Estimated linear function (hypothesis.matrix %*% coef - rhs):
##                       tear gloss opacity
## rateHigh              0.58 -0.84   -0.60
## additiveHigh          0.38  0.02    0.10
## rateHigh:additiveHigh 0.02  0.66    1.78
## 
## Sum of squares and products for the hypothesis:
##            tear   gloss opacity
## tear     2.5015 -0.8055  2.8305
## gloss   -0.8055  2.4575  2.4615
## opacity  2.8305  2.4615  9.2815
## 
## Sum of squares and products for error:
##           tear  gloss opacity
## tear     1.764  0.020  -3.070
## gloss    0.020  2.628  -0.552
## opacity -3.070 -0.552  64.924
## 
## Multivariate Tests: 
##                  Df test stat approx F num Df   den Df     Pr(>F)    
## Pillai            3 1.1455982 3.294786      9 48.00000 0.00335033 ** 
## Wilks             3 0.1780187 3.925172      9 34.22293 0.00166294 ** 
## Hotelling-Lawley  3 2.8175163 3.965393      9 38.00000 0.00124500 ** 
## Roy               3 1.8695971 9.971184      3 16.00000 0.00060304 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
heplot(plastic.mod, hypotheses=list("Group" =c("rateHigh", "additiveHigh", "rateHigh:additiveHigh ")),col=c(colors, "purple"),lwd=c(2, 3, 3, 3, 2), cex=1.25)
heplot(plastic.mod, hypotheses=list("Main effects" =c("rateHigh", "additiveHigh")), add=TRUE,col=c(colors, "darkgreen"), cex=1.25)

heplot3d(plastic.mod, hypotheses=list("Group" =c("rateHigh", "additiveHigh", "rateHigh:additiveHigh ")),col=c(colors, "purple"))
heplot3d(plastic.mod, hypotheses=list("Main effects" =c("rateHigh", "additiveHigh")), add=TRUE,col=c(colors, "black"))

You must enable Javascript to view this page properly.

Nevyvážené návrhy

Uvažujme rovnaké dáta ako v predchádzajúcom príklade, ale predpokladajme, že pre kombináciu prvej úrovne faktora rate a prvej úrovne faktora additive máme len 2 pozorovania. Pozrime sa, ako to ovplyvní výpočet:

plastic.mod2 <- lm(cbind(tear,gloss,opacity)~rate*additive, data=Plastic[3:20,])
Anova(plastic.mod2,test.statistic="Pillai")
## 
## Type II MANOVA Tests: Pillai test statistic
##               Df test stat approx F num Df den Df   Pr(>F)   
## rate           1   0.60717   6.1825      3     12 0.008771 **
## additive       1   0.57619   5.4381      3     12 0.013550 * 
## rate:additive  1   0.13991   0.6507      3     12 0.597585   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plastic.mod2a <- lm(cbind(tear,gloss,opacity)~additive*rate, data=Plastic[3:20,])
Anova(plastic.mod2a,test.statistic="Pillai")
## 
## Type II MANOVA Tests: Pillai test statistic
##               Df test stat approx F num Df den Df   Pr(>F)   
## additive       1   0.57619   5.4381      3     12 0.013550 * 
## rate           1   0.60717   6.1825      3     12 0.008771 **
## additive:rate  1   0.13991   0.6507      3     12 0.597585   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
pl2 <- manova(cbind(tear,gloss,opacity)~rate*additive, data=Plastic[3:20,])
summary(pl2)
##               Df  Pillai approx F num Df den Df  Pr(>F)  
## rate           1 0.58571   5.6551      3     12 0.01190 *
## additive       1 0.57619   5.4381      3     12 0.01355 *
## rate:additive  1 0.13991   0.6507      3     12 0.59759  
## Residuals     14                                         
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
pl2a <- manova(cbind(tear,gloss,opacity)~additive*rate, data=Plastic[3:20,])
summary(pl2a)
##               Df  Pillai approx F num Df den Df   Pr(>F)   
## additive       1 0.55110   4.9108      3     12 0.018803 * 
## rate           1 0.60717   6.1825      3     12 0.008771 **
## additive:rate  1 0.13991   0.6507      3     12 0.597585   
## Residuals     14                                           
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Úloha

Porovnávali sme vedomosti študentov psychológie na troch rôznych školách, a to tak, že náhodne vybraným študentom z každej školy sme dali testy z piatich predmetov, v ktorých mohli získať maximálne 15 bodov. V súbore http://www.iam.fmph.uniba.sk/ospm/Filova/vsa/psychology.dat sú uvedené výsledky týchto testov z piatich predmetov (experimentálna psychológia, štatistika, sociálna psychológia, vývinová psychológia a psychológia osobnosti).

  1. Otestujte hypotézu o rozdieloch medzi školami na hladine významnosti 0.05.

  2. Otestujte hypotézu o rozdieloch medzi školami, ak berieme do úvahy len výsledky zo štatistiky.

  3. Vypočítajte Tukeyho intervaly spoľahlivosti pre rozdiely priemerov medzi jednotlivými školami v teste zo štatistiky.