Pri chemickej reakcii pozorujeme v závislosti od prídavnej látky (x) dve premenné: percento nechceného vedľajšieho produktu a zmenu teploty zlúčeniny. Naše dáta sú nasledovné:
x <- 0:4
z1 <- c(1,4,3,8,9)
z2 <- c(-1,-1,2,3,2)
Lineárny model s konštantným členom
X <- cbind(rep(1,5),x)
Y <- cbind(z1, z2)
n <- 5 #pocet pozorovani
p <- 2 #pocet sledovanych znakov
q <- 2 #hodnost matice X
Odhad neznámych parametrov
(B <- solve(t(X)%*%X)%*%t(X)%*%Y)
## z1 z2
## 1 -1
## x 2 1
P <- diag(5) - X%*%solve(t(X)%*%X)%*%t(X)
(H <- t(Y) %*% (diag(5) - P) %*% Y)
## z1 z2
## z1 165 45
## z2 45 15
(E <- t(Y) %*% P %*% Y)
## z1 z2
## z1 6 -2
## z2 -2 4
Môžeme vypočítať hodnoty testovacích štatistík, ale zatiaľ ich nevieme porovnať s kritickými hodnotami:
W <- det(solve(E+H) %*% E) #hodnota Wilksovho lambda
P <- sum(diag(solve(E+H) %*% E)) #hodnota Pillaiovej stopy
HL <- sum(diag(solve(E)%*%H)) #hodnota Hotteling-Lawleyho stopy
R <- max(eigen(solve(E)%*%H)$values) #hodnota Royovho kriteria
Jednorozmerné testy vieme urobiť pomocou príkazu lm:
summary(lm(z1~x))
##
## Call:
## lm(formula = z1 ~ x)
##
## Residuals:
## 1 2 3 4 5
## 5.551e-17 1.000e+00 -2.000e+00 1.000e+00 -4.476e-17
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.0000 1.0954 0.913 0.4286
## x 2.0000 0.4472 4.472 0.0208 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.414 on 3 degrees of freedom
## Multiple R-squared: 0.8696, Adjusted R-squared: 0.8261
## F-statistic: 20 on 1 and 3 DF, p-value: 0.02084
summary(lm(z2~x))
##
## Call:
## lm(formula = z2 ~ x)
##
## Residuals:
## 1 2 3 4 5
## 2.22e-16 -1.00e+00 1.00e+00 1.00e+00 -1.00e+00
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.0000 0.8944 -1.118 0.3450
## x 1.0000 0.3651 2.739 0.0714 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.155 on 3 degrees of freedom
## Multiple R-squared: 0.7143, Adjusted R-squared: 0.619
## F-statistic: 7.5 on 1 and 3 DF, p-value: 0.07142
Máme súbor obsahujúci údaje o predávkovaní amitriptylinom (Rudorfer, 1982). Pre každý zo 17 subjektov sme pozorovali dve odozvy: TOT (celková úroveň TCAD v plazme)a AMI (množstvo amitriptylínu v TCAD). O subjektoch máme ďalej zaznamenaé nasledovné údaje:
GEN: pohlavie (male = 0, female = 1) AMT: dávka, ktorá spôsobila predávkovanie PR: meranie PR vlny DIAP: diastolický krvný tlak QRS: meranie QRS vlny
ami_data <- read.table("http://static.lib.virginia.edu/statlab/materials/data/ami_data.DAT")
names(ami_data) <- c("TOT","AMI","GEN","AMT","PR","DIAP","QRS")
summary(ami_data)
## TOT AMI GEN AMT
## Min. : 500 Min. : 384.0 Min. :0.0000 Min. : 350
## 1st Qu.: 652 1st Qu.: 458.0 1st Qu.:0.0000 1st Qu.: 750
## Median : 896 Median : 653.0 Median :1.0000 Median :1750
## Mean :1120 Mean : 882.4 Mean :0.7059 Mean :2146
## 3rd Qu.:1131 3rd Qu.: 941.0 3rd Qu.:1.0000 3rd Qu.:3000
## Max. :3389 Max. :3149.0 Max. :1.0000 Max. :7500
## PR DIAP QRS
## Min. :135.0 Min. : 0 Min. : 60.00
## 1st Qu.:160.0 1st Qu.:60 1st Qu.: 80.00
## Median :180.0 Median :60 Median : 98.00
## Mean :174.9 Mean :52 Mean : 97.18
## 3rd Qu.:185.0 3rd Qu.:70 3rd Qu.:111.00
## Max. :220.0 Max. :90 Max. :140.00
pairs(ami_data)
Výstup z lm obsahuje dve jednorozmerné regresné analýzy:
mlm1 <- lm(cbind(TOT, AMI) ~ GEN + AMT + PR + DIAP + QRS, data = ami_data)
summary(mlm1)
## Response TOT :
##
## Call:
## lm(formula = TOT ~ GEN + AMT + PR + DIAP + QRS, data = ami_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -399.2 -180.1 4.5 164.1 366.8
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.879e+03 8.933e+02 -3.224 0.008108 **
## GEN 6.757e+02 1.621e+02 4.169 0.001565 **
## AMT 2.848e-01 6.091e-02 4.677 0.000675 ***
## PR 1.027e+01 4.255e+00 2.414 0.034358 *
## DIAP 7.251e+00 3.225e+00 2.248 0.046026 *
## QRS 7.598e+00 3.849e+00 1.974 0.074006 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 281.2 on 11 degrees of freedom
## Multiple R-squared: 0.8871, Adjusted R-squared: 0.8358
## F-statistic: 17.29 on 5 and 11 DF, p-value: 6.983e-05
##
##
## Response AMI :
##
## Call:
## lm(formula = AMI ~ GEN + AMT + PR + DIAP + QRS, data = ami_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -373.85 -247.29 -83.74 217.13 462.72
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.729e+03 9.288e+02 -2.938 0.013502 *
## GEN 7.630e+02 1.685e+02 4.528 0.000861 ***
## AMT 3.064e-01 6.334e-02 4.837 0.000521 ***
## PR 8.896e+00 4.424e+00 2.011 0.069515 .
## DIAP 7.206e+00 3.354e+00 2.149 0.054782 .
## QRS 4.987e+00 4.002e+00 1.246 0.238622
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 292.4 on 11 degrees of freedom
## Multiple R-squared: 0.8764, Adjusted R-squared: 0.8202
## F-statistic: 15.6 on 5 and 11 DF, p-value: 0.0001132
Je to teda to isté ako uvažovať odozvy zvlášť:
m1 <- lm(TOT ~ GEN + AMT + PR + DIAP + QRS, data = ami_data)
summary(m1)
##
## Call:
## lm(formula = TOT ~ GEN + AMT + PR + DIAP + QRS, data = ami_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -399.2 -180.1 4.5 164.1 366.8
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.879e+03 8.933e+02 -3.224 0.008108 **
## GEN 6.757e+02 1.621e+02 4.169 0.001565 **
## AMT 2.848e-01 6.091e-02 4.677 0.000675 ***
## PR 1.027e+01 4.255e+00 2.414 0.034358 *
## DIAP 7.251e+00 3.225e+00 2.248 0.046026 *
## QRS 7.598e+00 3.849e+00 1.974 0.074006 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 281.2 on 11 degrees of freedom
## Multiple R-squared: 0.8871, Adjusted R-squared: 0.8358
## F-statistic: 17.29 on 5 and 11 DF, p-value: 6.983e-05
m2 <- lm(AMI ~ GEN + AMT + PR + DIAP + QRS, data = ami_data)
summary(m2)
##
## Call:
## lm(formula = AMI ~ GEN + AMT + PR + DIAP + QRS, data = ami_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -373.85 -247.29 -83.74 217.13 462.72
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.729e+03 9.288e+02 -2.938 0.013502 *
## GEN 7.630e+02 1.685e+02 4.528 0.000861 ***
## AMT 3.064e-01 6.334e-02 4.837 0.000521 ***
## PR 8.896e+00 4.424e+00 2.011 0.069515 .
## DIAP 7.206e+00 3.354e+00 2.149 0.054782 .
## QRS 4.987e+00 4.002e+00 1.246 0.238622
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 292.4 on 11 degrees of freedom
## Multiple R-squared: 0.8764, Adjusted R-squared: 0.8202
## F-statistic: 15.6 on 5 and 11 DF, p-value: 0.0001132
Keď sa ale vrátime k viacrozmernému modelu, vidíme, že koeificnety zodpovedajúce jednorozmerným modelom sú korelované
coef(mlm1)
## TOT AMI
## (Intercept) -2879.4782461 -2728.7085444
## GEN 675.6507805 763.0297617
## AMT 0.2848511 0.3063734
## PR 10.2721328 8.8961977
## DIAP 7.2511714 7.2055597
## QRS 7.5982397 4.9870508
vcov(mlm1)
## TOT:(Intercept) TOT:GEN TOT:AMT TOT:PR
## TOT:(Intercept) 797914.009706 -61055.389981 11.236944164 -3.157872e+03
## TOT:GEN -61055.389981 26262.026594 1.417050110 1.500225e+02
## TOT:AMT 11.236944 1.417050 0.003710048 -1.096946e-01
## TOT:PR -3157.872390 150.022536 -0.109694600 1.810410e+01
## TOT:DIAP -1625.349406 162.904357 0.081729026 3.131923e+00
## TOT:QRS -1414.943019 49.083508 -0.054166512 -4.282235e-01
## AMI:(Intercept) 702227.760065 -53733.596885 9.889404164 -2.779179e+03
## AMI:GEN -53733.596885 23112.671147 1.247116747 1.320318e+02
## AMI:AMT 9.889404 1.247117 0.003265137 -9.653997e-02
## AMI:PR -2779.178744 132.031758 -0.096539968 1.593304e+01
## AMI:DIAP -1430.436687 143.368784 0.071928040 2.756341e+00
## AMI:QRS -1245.262340 43.197389 -0.047670836 -3.768708e-01
## TOT:DIAP TOT:QRS AMI:(Intercept) AMI:GEN
## TOT:(Intercept) -1.625349e+03 -1.414943e+03 702227.760065 -53733.596885
## TOT:GEN 1.629044e+02 4.908351e+01 -53733.596885 23112.671147
## TOT:AMT 8.172903e-02 -5.416651e-02 9.889404 1.247117
## TOT:PR 3.131923e+00 -4.282235e-01 -2779.178744 132.031758
## TOT:DIAP 1.040163e+01 2.535578e+00 -1430.436687 143.368784
## TOT:QRS 2.535578e+00 1.481381e+01 -1245.262340 43.197389
## AMI:(Intercept) -1.430437e+03 -1.245262e+03 862755.907989 -66017.011582
## AMI:GEN 1.433688e+02 4.319739e+01 -66017.011582 28396.190973
## AMI:AMT 7.192804e-02 -4.767084e-02 12.150106 1.532206
## AMI:PR 2.756341e+00 -3.768708e-01 -3414.494580 162.214008
## AMI:DIAP 9.154259e+00 2.231510e+00 -1757.432236 176.142660
## AMI:QRS 2.231510e+00 1.303733e+01 -1529.927329 53.072244
## AMI:AMT AMI:PR AMI:DIAP AMI:QRS
## TOT:(Intercept) 9.889404164 -2.779179e+03 -1.430437e+03 -1.245262e+03
## TOT:GEN 1.247116747 1.320318e+02 1.433688e+02 4.319739e+01
## TOT:AMT 0.003265137 -9.653997e-02 7.192804e-02 -4.767084e-02
## TOT:PR -0.096539968 1.593304e+01 2.756341e+00 -3.768708e-01
## TOT:DIAP 0.071928040 2.756341e+00 9.154259e+00 2.231510e+00
## TOT:QRS -0.047670836 -3.768708e-01 2.231510e+00 1.303733e+01
## AMI:(Intercept) 12.150106211 -3.414495e+03 -1.757432e+03 -1.529927e+03
## AMI:GEN 1.532205650 1.622140e+02 1.761427e+02 5.307224e+01
## AMI:AMT 0.004011543 -1.186089e-01 8.837068e-02 -5.856831e-02
## AMI:PR -0.118608851 1.957531e+01 3.386437e+00 -4.630228e-01
## AMI:DIAP 0.088370675 3.386437e+00 1.124691e+01 2.741630e+00
## AMI:QRS -0.058568314 -4.630228e-01 2.741630e+00 1.601764e+01
Chceme preto urobiť dvojrozmernú analýzu, ktorá by tieto korelácie zohľadnila. Testovanie viacrozmernej hypotézy \(H_0: B=0\) musíme urobiť pomocou príkazu Anova z balíka car, ktorý premenné na ľavej strane modelu chápe ako viacrozmerné pozorovanie
library(car)
## Loading required package: carData
library(heplots)
##
## Attaching package: 'heplots'
## The following object is masked from 'package:rgl':
##
## arrow3d
Anova(mlm1)
##
## Type II MANOVA Tests: Pillai test statistic
## Df test stat approx F num Df den Df Pr(>F)
## GEN 1 0.65521 9.5015 2 10 0.004873 **
## AMT 1 0.69097 11.1795 2 10 0.002819 **
## PR 1 0.34649 2.6509 2 10 0.119200
## DIAP 1 0.32381 2.3944 2 10 0.141361
## QRS 1 0.29184 2.0606 2 10 0.178092
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
heplot(mlm1,col=c("red", "blue"), lwd=2, cex=1.2,verbose=TRUE)
##
## Type II MANOVA Tests: Pillai test statistic
## Df test stat approx F num Df den Df Pr(>F)
## GEN 1 0.65521 9.5015 2 10 0.004873 **
## AMT 1 0.69097 11.1795 2 10 0.002819 **
## PR 1 0.34649 2.6509 2 10 0.119200
## DIAP 1 0.32381 2.3944 2 10 0.141361
## QRS 1 0.29184 2.0606 2 10 0.178092
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## GEN H matrix ( 1 df):
## TOT AMI
## TOT 152314.7 172012.9
## AMI 172012.9 194258.6
## AMT H matrix ( 1 df):
## TOT AMI
## TOT 191638.0 206117.5
## AMI 206117.5 221691.0
## PR H matrix ( 1 df):
## TOT AMI
## TOT 51070.47 44229.67
## AMI 44229.67 38305.18
## DIAP H matrix ( 1 df):
## TOT AMI
## TOT 44293.55 44014.93
## AMI 44014.93 43738.07
## QRS H matrix ( 1 df):
## TOT AMI
## TOT 34149.53 22413.80
## AMI 22413.80 14711.14
Na základe týchto výsledkov môžeme ďalej testovať hypotézu o vplyve PR, DIAP a QRS na odozvu:
lht(mlm1, hypothesis.matrix = c("PR = 0", "DIAP = 0", "QRS = 0"))
##
## Sum of squares and products for the hypothesis:
## TOT AMI
## TOT 930348.1 780517.7
## AMI 780517.7 679948.4
##
## Sum of squares and products for error:
## TOT AMI
## TOT 870008.3 765676.5
## AMI 765676.5 940708.9
##
## Multivariate Tests:
## Df test stat approx F num Df den Df Pr(>F)
## Pillai 3 0.6038599 1.585910 6 22 0.19830
## Wilks 3 0.4405021 1.688991 6 20 0.17553
## Hotelling-Lawley 3 1.1694286 1.754143 6 18 0.16574
## Roy 3 1.0758181 3.944666 3 11 0.03906 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
heplot(mlm1,hypotheses=list("H0" =c("PR", "DIAP", "QRS")),col=rainbow(10),lwd=c(2, 3, 3, 3, 2), cex=1.25)
Alternatívou je použiť príkaz anovana porovnanie pôvodného modelu s modelom bez premenných PR, DIAP a QRS:
mlm2 <- update(mlm1, . ~ . - PR - DIAP - QRS)
anova(mlm1, mlm2)
## Analysis of Variance Table
##
## Model 1: cbind(TOT, AMI) ~ GEN + AMT + PR + DIAP + QRS
## Model 2: cbind(TOT, AMI) ~ GEN + AMT
## Res.Df Df Gen.var. Pillai approx F num Df den Df Pr(>F)
## 1 11 43803
## 2 14 3 51856 0.60386 1.5859 6 22 0.1983
Vezmime teraz dáta Rohwer z balíka heplots.
data(Rohwer)
some(Rohwer,n=6)
## group SES SAT PPVT Raven n s ns na ss
## 26 1 Lo 60 80 11 3 8 18 28 21
## 28 1 Lo 6 70 16 2 11 9 23 11
## 42 2 Hi 90 82 13 20 7 21 28 16
## 43 2 Hi 77 100 15 4 11 18 32 29
## 47 2 Hi 98 91 18 16 12 16 27 30
## 48 2 Hi 8 87 10 5 3 17 25 24
Zadanie lineárneho modelu urobíme rovnako ako pre jednorozmerné modely pomocou príkazu lm
mod1 <- lm(cbind(SAT, PPVT, Raven) ~ n + s + ns + na + ss,data=Rohwer)
summary(Anova(mod1))
##
## Type II MANOVA Tests:
##
## Sum of squares and products for error:
## SAT PPVT Raven
## SAT 44061.078 6205.5496 1808.5518
## PPVT 6205.550 11259.4196 969.4692
## Raven 1808.552 969.4692 553.5231
##
## ------------------------------------------
##
## Term: n
##
## Sum of squares and products for the hypothesis:
## SAT PPVT Raven
## SAT 2448.44503 760.05215 90.13861
## PPVT 760.05215 235.93720 27.98104
## Raven 90.13861 27.98104 3.31842
##
## Multivariate Tests: n
## Df test stat approx F num Df den Df Pr(>F)
## Pillai 1 0.0599640 1.297044 3 61 0.28358
## Wilks 1 0.9400360 1.297044 3 61 0.28358
## Hotelling-Lawley 1 0.0637891 1.297044 3 61 0.28358
## Roy 1 0.0637891 1.297044 3 61 0.28358
##
## ------------------------------------------
##
## Term: s
##
## Sum of squares and products for the hypothesis:
## SAT PPVT Raven
## SAT 3.208771 27.65146 -8.897894
## PPVT 27.651463 238.28545 -76.677269
## Raven -8.897894 -76.67727 24.673784
##
## Multivariate Tests: s
## Df test stat approx F num Df den Df Pr(>F)
## Pillai 1 0.0977878 2.203862 3 61 0.096703 .
## Wilks 1 0.9022122 2.203862 3 61 0.096703 .
## Hotelling-Lawley 1 0.1083866 2.203862 3 61 0.096703 .
## Roy 1 0.1083866 2.203862 3 61 0.096703 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## ------------------------------------------
##
## Term: ns
##
## Sum of squares and products for the hypothesis:
## SAT PPVT Raven
## SAT 7201.4233 1632.66526 -205.37183
## PPVT 1632.6653 370.14848 -46.56072
## Raven -205.3718 -46.56072 5.85684
##
## Multivariate Tests: ns
## Df test stat approx F num Df den Df Pr(>F)
## Pillai 1 0.2088201 5.36668 3 61 0.002406 **
## Wilks 1 0.7911799 5.36668 3 61 0.002406 **
## Hotelling-Lawley 1 0.2639351 5.36668 3 61 0.002406 **
## Roy 1 0.2639351 5.36668 3 61 0.002406 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## ------------------------------------------
##
## Term: na
##
## Sum of squares and products for the hypothesis:
## SAT PPVT Raven
## SAT 3860.98360 2352.77010 -20.5161806
## PPVT 2352.77010 1433.70906 -12.5019584
## Raven -20.51618 -12.50196 0.1090172
##
## Multivariate Tests: na
## Df test stat approx F num Df den Df Pr(>F)
## Pillai 1 0.1834781 4.56904 3 61 0.0059522 **
## Wilks 1 0.8165219 4.56904 3 61 0.0059522 **
## Hotelling-Lawley 1 0.2247069 4.56904 3 61 0.0059522 **
## Roy 1 0.2247069 4.56904 3 61 0.0059522 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## ------------------------------------------
##
## Term: ss
##
## Sum of squares and products for the hypothesis:
## SAT PPVT Raven
## SAT 1419.37607 1135.36471 46.502231
## PPVT 1135.36471 908.18287 37.197324
## Raven 46.50223 37.19732 1.523527
##
## Multivariate Tests: ss
## Df test stat approx F num Df den Df Pr(>F)
## Pillai 1 0.0917965 2.055187 3 61 0.11552
## Wilks 1 0.9082035 2.055187 3 61 0.11552
## Hotelling-Lawley 1 0.1010748 2.055187 3 61 0.11552
## Roy 1 0.1010748 2.055187 3 61 0.11552
coefplot(mod1, lwd=2, fill=(1:5) %in% 3:4, level=0.95)
Vizuálne porovnanie “veľkosti” matíc H a E
heplot(mod1,col=c("red", "black"), lwd=2, cex=1.2,verbose=TRUE)
##
## Type II MANOVA Tests: Pillai test statistic
## Df test stat approx F num Df den Df Pr(>F)
## n 1 0.059964 1.2970 3 61 0.283582
## s 1 0.097788 2.2039 3 61 0.096703 .
## ns 1 0.208820 5.3667 3 61 0.002406 **
## na 1 0.183478 4.5690 3 61 0.005952 **
## ss 1 0.091796 2.0552 3 61 0.115521
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## n H matrix ( 1 df):
## SAT PPVT
## SAT 286.78803 89.02542
## PPVT 89.02542 27.63548
## s H matrix ( 1 df):
## SAT PPVT
## SAT 0.3758455 3.238835
## PPVT 3.2388346 27.910536
## ns H matrix ( 1 df):
## SAT PPVT
## SAT 843.5076 191.23519
## PPVT 191.2352 43.35574
## na H matrix ( 1 df):
## SAT PPVT
## SAT 452.2396 275.5816
## PPVT 275.5816 167.9313
## ss H matrix ( 1 df):
## SAT PPVT
## SAT 166.2525 132.9860
## PPVT 132.9860 106.3761
pairs(mod1)
L <- matrix(0, nrow=2, ncol=6)
L[1, 5] <- 1
L[2, 4] <- 1
Y <- as.matrix(Rohwer[,3:5])
X <- as.matrix(Rohwer[,6:10])
X <- cbind(rep(1,dim(X)[1]), X)
B <- solve(t(X)%*%X)%*%t(X)%*%Y
H <- t(B) %*% t(L) %*% solve(L %*% solve(t(X)%*%X) %*% t(L)) %*% L %*% B
E <- t(Y) %*% Y - t(B) %*% solve(t(X)%*%X) %*% B
lht(mod1, c("na", "ns"),verbose=TRUE)
##
## Hypothesis matrix:
## (Intercept) n s ns na ss
## na 0 0 0 0 1 0
## ns 0 0 0 1 0 0
##
## Right-hand-side matrix:
## SAT PPVT Raven
## na 0 0 0
## ns 0 0 0
##
## Estimated linear function (hypothesis.matrix %*% coef - rhs):
## SAT PPVT Raven
## na 2.097832 1.2783575 -0.01114729
## ns -2.802803 -0.6354355 0.07993098
##
## Sum of squares and products for the hypothesis:
## SAT PPVT Raven
## SAT 8420.4811 2793.94988 -186.475924
## PPVT 2793.9499 1476.39789 -28.560325
## Raven -186.4759 -28.56033 6.149735
##
## Sum of squares and products for error:
## SAT PPVT Raven
## SAT 44061.078 6205.5496 1808.5518
## PPVT 6205.550 11259.4196 969.4692
## Raven 1808.552 969.4692 553.5231
##
## Multivariate Tests:
## Df test stat approx F num Df den Df Pr(>F)
## Pillai 2 0.2989130 3.631523 6 124 0.00235885 **
## Wilks 2 0.7124255 3.756781 6 122 0.00182225 **
## Hotelling-Lawley 2 0.3877402 3.877402 6 120 0.00142373 **
## Roy 2 0.3410783 7.048952 3 62 0.00037467 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
heplot(mod1,hypotheses=list("H0" =c("na", "ns")),col=rainbow(10),lwd=c(2, 3, 3, 3, 2), cex=1.25)
linearHypothesis(mod1,"na-ns",verbose=TRUE)
##
## Hypothesis matrix:
## (Intercept) n s ns na ss
## na-ns 0 0 0 -1 1 0
##
## Right-hand-side matrix:
## SAT PPVT Raven
## na-ns 0 0 0
##
## Estimated linear function (hypothesis.matrix %*% coef - rhs):
## SAT PPVT Raven
## 4.90063575 1.91379297 -0.09107827
##
## Sum of squares and products for the hypothesis:
## SAT PPVT Raven
## SAT 7997.851 3123.31556 -148.640002
## PPVT 3123.316 1219.71508 -58.046793
## Raven -148.640 -58.04679 2.762473
##
## Sum of squares and products for error:
## SAT PPVT Raven
## SAT 44061.078 6205.5496 1808.5518
## PPVT 6205.550 11259.4196 969.4692
## Raven 1808.552 969.4692 553.5231
##
## Multivariate Tests:
## Df test stat approx F num Df den Df Pr(>F)
## Pillai 1 0.2533129 6.898064 3 61 0.00044825 ***
## Wilks 1 0.7466871 6.898064 3 61 0.00044825 ***
## Hotelling-Lawley 1 0.3392491 6.898064 3 61 0.00044825 ***
## Roy 1 0.3392491 6.898064 3 61 0.00044825 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
heplot(mod1,hypotheses=list("H0" ="na-ns"),col=rainbow(10),lwd=c(2, 3, 3, 3, 2), cex=1.25)
heplot3d(mod1,hypotheses=list("H0" =c("s","na", "ns")),col=rainbow(10))
You must enable Javascript to view this page properly.
rohwer.ses1 <- lm(cbind(SAT, PPVT, Raven) ~ n + s + ns + na + ss, data=Rohwer, subset=SES=="Hi")
Anova(rohwer.ses1)
##
## Type II MANOVA Tests: Pillai test statistic
## Df test stat approx F num Df den Df Pr(>F)
## n 1 0.20177 2.0222 3 24 0.13760
## s 1 0.30970 3.5891 3 24 0.02836 *
## ns 1 0.35795 4.4601 3 24 0.01260 *
## na 1 0.46516 6.9576 3 24 0.00157 **
## ss 1 0.08873 0.7790 3 24 0.51725
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
rohwer.ses2 <- lm(cbind(SAT, PPVT, Raven) ~ n + s + ns + na + ss, data=Rohwer, subset=SES=="Lo")
Anova(rohwer.ses2)
##
## Type II MANOVA Tests: Pillai test statistic
## Df test stat approx F num Df den Df Pr(>F)
## n 1 0.038358 0.3856 3 29 0.76418
## s 1 0.111793 1.2167 3 29 0.32130
## ns 1 0.225221 2.8100 3 29 0.05696 .
## na 1 0.267458 3.5294 3 29 0.02705 *
## ss 1 0.138962 1.5601 3 29 0.22030
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Vypíšeme odhady neznámych parametrov
coef(rohwer.ses1)
## SAT PPVT Raven
## (Intercept) -28.4674707 39.6970896 13.2438356
## n 3.2571323 0.0672825 0.0593474
## s 2.9965832 0.3699843 0.4924440
## ns -5.8590628 -0.3743802 -0.1640219
## na 5.6662226 1.5230090 0.1189805
## ss -0.6226532 0.4101567 -0.1211564
coef(rohwer.ses2)
## SAT PPVT Raven
## (Intercept) 4.15106027 33.00576902 11.17337818
## n -0.60887193 -0.08056682 0.21099550
## s -0.05015614 -0.72105017 0.06456701
## ns -1.73239505 -0.29830269 0.21358423
## na 0.49456476 1.47041758 -0.03731804
## ss 2.24772090 0.32396465 -0.05214253
Porovnanie veličín H a E zvlášť pre normálne (modrá) a znevýhodnené (červená) deti
heplot(rohwer.ses1, ylim=c(40,110),col=c("red", "black"), lwd=2, cex=1.2)
heplot(rohwer.ses2, add=TRUE, col=c("blue", "black"), grand.mean=TRUE, error.ellipse=TRUE, lwd=2, cex=1.2)
means <- aggregate(cbind(SAT,PPVT)~SES, data=Rohwer, mean)
text(means[,2], means[,3], labels=means[,1], pos=3, cex=2, col=c("red", "blue"))
Uvažujme teraz dáta NLSY, v ktorých sa sledovali výsledky detí z matematiky a čítania v závislosti od štyroch ďalších premenných. Urobíme podobnú analýzu ako v predchádzajúcom príklade.
data(NLSY)
some(NLSY,n=6)
## math read antisoc hyperact income educ
## 13 19.05 30.95 2 4 14.594 12
## 75 47.62 34.52 0 0 40.050 12
## 123 19.05 26.19 0 0 21.000 11
## 192 34.52 26.19 1 0 21.012 14
## 218 21.43 21.43 0 0 13.666 15
## 221 22.62 22.62 2 0 24.500 6
mod1 <- lm(cbind(read, math) ~ income + educ + antisoc + hyperact, data=NLSY)
summary(Anova(mod1))
##
## Type II MANOVA Tests:
##
## Sum of squares and products for error:
## read math
## read 22712.12 11790.46
## math 11790.46 23247.77
##
## ------------------------------------------
##
## Term: income
##
## Sum of squares and products for the hypothesis:
## read math
## read 17.34907 116.7099
## math 116.70993 785.1261
##
## Multivariate Tests: income
## Df test stat approx F num Df den Df Pr(>F)
## Pillai 1 0.0382795 4.716675 2 237 0.0098015 **
## Wilks 1 0.9617205 4.716675 2 237 0.0098015 **
## Hotelling-Lawley 1 0.0398032 4.716675 2 237 0.0098015 **
## Roy 1 0.0398032 4.716675 2 237 0.0098015 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## ------------------------------------------
##
## Term: educ
##
## Sum of squares and products for the hypothesis:
## read math
## read 659.4951 895.6072
## math 895.6072 1216.2522
##
## Multivariate Tests: educ
## Df test stat approx F num Df den Df Pr(>F)
## Pillai 1 0.0531518 6.652055 2 237 0.001546 **
## Wilks 1 0.9468482 6.652055 2 237 0.001546 **
## Hotelling-Lawley 1 0.0561355 6.652055 2 237 0.001546 **
## Roy 1 0.0561355 6.652055 2 237 0.001546 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## ------------------------------------------
##
## Term: antisoc
##
## Sum of squares and products for the hypothesis:
## read math
## read 73.68227 182.4647
## math 182.46468 451.8504
##
## Multivariate Tests: antisoc
## Df test stat approx F num Df den Df Pr(>F)
## Pillai 1 0.0193432 2.337379 2 237 0.098803 .
## Wilks 1 0.9806568 2.337379 2 237 0.098803 .
## Hotelling-Lawley 1 0.0197247 2.337379 2 237 0.098803 .
## Roy 1 0.0197247 2.337379 2 237 0.098803 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## ------------------------------------------
##
## Term: hyperact
##
## Sum of squares and products for the hypothesis:
## read math
## read 165.4839 230.4094
## math 230.4094 320.8074
##
## Multivariate Tests: hyperact
## Df test stat approx F num Df den Df Pr(>F)
## Pillai 1 0.0144419 1.73644 2 237 0.17838
## Wilks 1 0.9855581 1.73644 2 237 0.17838
## Hotelling-Lawley 1 0.0146535 1.73644 2 237 0.17838
## Roy 1 0.0146535 1.73644 2 237 0.17838
coefplot(mod1, lwd=2, fill=(1:5) %in% 3:4, level=0.95)
heplot(mod1,col=c("red", "black"), lwd=2, cex=1.2,verbose=TRUE)
##
## Type II MANOVA Tests: Pillai test statistic
## Df test stat approx F num Df den Df Pr(>F)
## income 1 0.038280 4.7167 2 237 0.009801 **
## educ 1 0.053152 6.6521 2 237 0.001546 **
## antisoc 1 0.019343 2.3374 2 237 0.098803 .
## hyperact 1 0.014442 1.7364 2 237 0.178380
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## income H matrix ( 1 df):
## read math
## read 2.84717 19.15336
## math 19.15336 128.84770
## educ H matrix ( 1 df):
## read math
## read 108.2303 146.9789
## math 146.9789 199.6002
## antisoc H matrix ( 1 df):
## read math
## read 12.09206 29.94443
## math 29.94443 74.15354
## hyperact H matrix ( 1 df):
## read math
## read 27.15771 37.81267
## math 37.81267 52.64798
L <- matrix(0, nrow=2, ncol=5)
L[1, 4] <- 1
L[2, 5] <- 1
Y <- as.matrix(NLSY[,1:2])
X <- as.matrix(NLSY[,3:6])
X <- cbind(rep(1,dim(X)[1]), X)
B <- solve(t(X)%*%X)%*%t(X)%*%Y
H <- t(B) %*% t(L) %*% solve(L %*% solve(t(X)%*%X) %*% t(L)) %*% L %*% B
E <- t(Y) %*% Y - t(B) %*% solve(t(X)%*%X) %*% B
lht(mod1, c("hyperact", "antisoc"),verbose=TRUE)
##
## Hypothesis matrix:
## (Intercept) income educ antisoc hyperact
## hyperact 0 0 0 0 1
## antisoc 0 0 0 1 0
##
## Right-hand-side matrix:
## read math
## hyperact 0 0
## antisoc 0 0
##
## Estimated linear function (hypothesis.matrix %*% coef - rhs):
## read math
## hyperact -0.6376073 -0.8877641
## antisoc 0.4502605 1.1150123
##
## Sum of squares and products for the hypothesis:
## read math
## read 170.3478 261.2230
## math 261.2230 516.0188
##
## Sum of squares and products for error:
## read math
## read 22712.12 11790.46
## math 11790.46 23247.77
##
## Multivariate Tests:
## Df test stat approx F num Df den Df Pr(>F)
## Pillai 2 0.0239869 1.444548 4 476 0.218172
## Wilks 2 0.9760624 1.444284 4 474 0.218264
## Hotelling-Lawley 2 0.0244741 1.443972 4 472 0.218372
## Roy 2 0.0221965 2.641385 2 238 0.073351 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
heplot(mod1,hypotheses=list("H0" =c("hyperact", "antisoc")),col=rainbow(10),lwd=c(2, 3, 3, 3, 2), cex=1.25)
linearHypothesis(mod1,"hyperact-antisoc",verbose=TRUE)
##
## Hypothesis matrix:
## (Intercept) income educ antisoc hyperact
## hyperact-antisoc 0 0 0 -1 1
##
## Right-hand-side matrix:
## read math
## hyperact-antisoc 0 0
##
## Estimated linear function (hypothesis.matrix %*% coef - rhs):
## read math
## -1.087868 -2.002776
##
## Sum of squares and products for the hypothesis:
## read math
## read 149.4846 275.2028
## math 275.2028 506.6514
##
## Sum of squares and products for error:
## read math
## read 22712.12 11790.46
## math 11790.46 23247.77
##
## Multivariate Tests:
## Df test stat approx F num Df den Df Pr(>F)
## Pillai 1 0.0213663 2.587189 2 237 0.077355 .
## Wilks 1 0.9786337 2.587189 2 237 0.077355 .
## Hotelling-Lawley 1 0.0218328 2.587189 2 237 0.077355 .
## Roy 1 0.0218328 2.587189 2 237 0.077355 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
heplot(mod1,hypotheses=list("H0" ="hyperact-antisoc"),col=rainbow(10),lwd=c(2, 3, 3, 3, 2), cex=1.25)
U 30 značiek japonského Seishu vína sme zisťovali, aký vplyv na chuť a vôňu majú jeho rôzne charakteristiky, ako napriklad mnozstvo cukru a acidita. V súbore http://www.iam.fmph.uniba.sk/ospm/Filova/vsa/Seishu.txt sú uvedené namerane hodnoty závislých premenných (taste, odor) a nezávislých premenných (pH, acidity 1, acidity 2, sake meter, direct reducing sugar, total sugar,alcohol, formyl-nitrogen).
Z dát extrahujte matice Y a X a vypočítajte odhad neznámych koeficientov modelu metódou najmenších štvorcov.
Zostavte model len pre premennú Y1, určte odhady parametrov v tomto modeli a zostrojte intervaly spoľahlivosti.
Analyzujte viacrozmerný model, ktorý berie do úvahy obe premenné Y1, Y2.