Regresná analýza

Viacrozmerná pregresná analýza

Príklad 1: chemická reakcia

Pri chemickej reakcii pozorujeme v závislosti od prídavnej látky (x) dve premenné: percento nechceného vedľajšieho produktu a zmenu teploty zlúčeniny. Naše dáta sú nasledovné:

x <- 0:4
z1 <- c(1,4,3,8,9)
z2 <- c(-1,-1,2,3,2)

Lineárny model s konštantným členom

X <- cbind(rep(1,5),x)
Y <- cbind(z1, z2)

n <- 5  #pocet pozorovani
p <- 2  #pocet sledovanych znakov
q <- 2  #hodnost matice X

Odhad neznámych parametrov

(B <- solve(t(X)%*%X)%*%t(X)%*%Y)

##   z1 z2
##    1 -1
## x  2  1

P <- diag(5) - X%*%solve(t(X)%*%X)%*%t(X)
(H <- t(Y) %*% (diag(5) - P) %*% Y)

##     z1 z2
## z1 165 45
## z2  45 15

(E <- t(Y) %*% P %*% Y)

##    z1 z2
## z1  6 -2
## z2 -2  4

Môžeme vypočítať hodnoty testovacích štatistík, ale zatiaľ ich nevieme porovnať s kritickými hodnotami:

W <- det(solve(E+H) %*% E) #hodnota Wilksovho lambda
P <- sum(diag(solve(E+H) %*% E)) #hodnota Pillaiovej stopy
HL <- sum(diag(solve(E)%*%H)) #hodnota Hotteling-Lawleyho stopy
R <- max(eigen(solve(E)%*%H)$values) #hodnota Royovho kriteria

Jednorozmerné testy vieme urobiť pomocou príkazu lm:

summary(lm(z1~x))

## 
## Call:
## lm(formula = z1 ~ x)
## 
## Residuals:
##          1          2          3          4          5 
##  5.551e-17  1.000e+00 -2.000e+00  1.000e+00 -4.476e-17 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)   1.0000     1.0954   0.913   0.4286  
## x             2.0000     0.4472   4.472   0.0208 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.414 on 3 degrees of freedom
## Multiple R-squared:  0.8696, Adjusted R-squared:  0.8261 
## F-statistic:    20 on 1 and 3 DF,  p-value: 0.02084

summary(lm(z2~x))

## 
## Call:
## lm(formula = z2 ~ x)
## 
## Residuals:
##         1         2         3         4         5 
##  2.22e-16 -1.00e+00  1.00e+00  1.00e+00 -1.00e+00 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)  -1.0000     0.8944  -1.118   0.3450  
## x             1.0000     0.3651   2.739   0.0714 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.155 on 3 degrees of freedom
## Multiple R-squared:  0.7143, Adjusted R-squared:  0.619 
## F-statistic:   7.5 on 1 and 3 DF,  p-value: 0.07142

Príklad 2: Amitriptylin

Máme súbor obsahujúci údaje o predávkovaní amitriptylinom (Rudorfer, 1982). Pre každý zo 17 subjektov sme pozorovali dve odozvy: TOT (celková úroveň TCAD v plazme)a AMI (množstvo amitriptylínu v TCAD). O subjektoch máme ďalej zaznamenaé nasledovné údaje:

GEN: pohlavie (male = 0, female = 1) AMT: dávka, ktorá spôsobila predávkovanie PR: meranie PR vlny DIAP: diastolický krvný tlak QRS: meranie QRS vlny

ami_data <- read.table("http://static.lib.virginia.edu/statlab/materials/data/ami_data.DAT")
names(ami_data) <- c("TOT","AMI","GEN","AMT","PR","DIAP","QRS")

summary(ami_data)

##       TOT            AMI              GEN              AMT      
##  Min.   : 500   Min.   : 384.0   Min.   :0.0000   Min.   : 350  
##  1st Qu.: 652   1st Qu.: 458.0   1st Qu.:0.0000   1st Qu.: 750  
##  Median : 896   Median : 653.0   Median :1.0000   Median :1750  
##  Mean   :1120   Mean   : 882.4   Mean   :0.7059   Mean   :2146  
##  3rd Qu.:1131   3rd Qu.: 941.0   3rd Qu.:1.0000   3rd Qu.:3000  
##  Max.   :3389   Max.   :3149.0   Max.   :1.0000   Max.   :7500  
##        PR             DIAP         QRS        
##  Min.   :135.0   Min.   : 0   Min.   : 60.00  
##  1st Qu.:160.0   1st Qu.:60   1st Qu.: 80.00  
##  Median :180.0   Median :60   Median : 98.00  
##  Mean   :174.9   Mean   :52   Mean   : 97.18  
##  3rd Qu.:185.0   3rd Qu.:70   3rd Qu.:111.00  
##  Max.   :220.0   Max.   :90   Max.   :140.00

pairs(ami_data)

Výstup z lm obsahuje dve jednorozmerné regresné analýzy:

mlm1 <- lm(cbind(TOT, AMI) ~ GEN + AMT + PR + DIAP + QRS, data = ami_data)
summary(mlm1)

## Response TOT :
## 
## Call:
## lm(formula = TOT ~ GEN + AMT + PR + DIAP + QRS, data = ami_data)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -399.2 -180.1    4.5  164.1  366.8 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -2.879e+03  8.933e+02  -3.224 0.008108 ** 
## GEN          6.757e+02  1.621e+02   4.169 0.001565 ** 
## AMT          2.848e-01  6.091e-02   4.677 0.000675 ***
## PR           1.027e+01  4.255e+00   2.414 0.034358 *  
## DIAP         7.251e+00  3.225e+00   2.248 0.046026 *  
## QRS          7.598e+00  3.849e+00   1.974 0.074006 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 281.2 on 11 degrees of freedom
## Multiple R-squared:  0.8871, Adjusted R-squared:  0.8358 
## F-statistic: 17.29 on 5 and 11 DF,  p-value: 6.983e-05
## 
## 
## Response AMI :
## 
## Call:
## lm(formula = AMI ~ GEN + AMT + PR + DIAP + QRS, data = ami_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -373.85 -247.29  -83.74  217.13  462.72 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -2.729e+03  9.288e+02  -2.938 0.013502 *  
## GEN          7.630e+02  1.685e+02   4.528 0.000861 ***
## AMT          3.064e-01  6.334e-02   4.837 0.000521 ***
## PR           8.896e+00  4.424e+00   2.011 0.069515 .  
## DIAP         7.206e+00  3.354e+00   2.149 0.054782 .  
## QRS          4.987e+00  4.002e+00   1.246 0.238622    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 292.4 on 11 degrees of freedom
## Multiple R-squared:  0.8764, Adjusted R-squared:  0.8202 
## F-statistic:  15.6 on 5 and 11 DF,  p-value: 0.0001132

Je to teda to isté ako uvažovať odozvy zvlášť:

m1 <- lm(TOT ~ GEN + AMT + PR + DIAP + QRS, data = ami_data)
summary(m1)

## 
## Call:
## lm(formula = TOT ~ GEN + AMT + PR + DIAP + QRS, data = ami_data)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -399.2 -180.1    4.5  164.1  366.8 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -2.879e+03  8.933e+02  -3.224 0.008108 ** 
## GEN          6.757e+02  1.621e+02   4.169 0.001565 ** 
## AMT          2.848e-01  6.091e-02   4.677 0.000675 ***
## PR           1.027e+01  4.255e+00   2.414 0.034358 *  
## DIAP         7.251e+00  3.225e+00   2.248 0.046026 *  
## QRS          7.598e+00  3.849e+00   1.974 0.074006 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 281.2 on 11 degrees of freedom
## Multiple R-squared:  0.8871, Adjusted R-squared:  0.8358 
## F-statistic: 17.29 on 5 and 11 DF,  p-value: 6.983e-05

m2 <- lm(AMI ~ GEN + AMT + PR + DIAP + QRS, data = ami_data)
summary(m2)

## 
## Call:
## lm(formula = AMI ~ GEN + AMT + PR + DIAP + QRS, data = ami_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -373.85 -247.29  -83.74  217.13  462.72 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -2.729e+03  9.288e+02  -2.938 0.013502 *  
## GEN          7.630e+02  1.685e+02   4.528 0.000861 ***
## AMT          3.064e-01  6.334e-02   4.837 0.000521 ***
## PR           8.896e+00  4.424e+00   2.011 0.069515 .  
## DIAP         7.206e+00  3.354e+00   2.149 0.054782 .  
## QRS          4.987e+00  4.002e+00   1.246 0.238622    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 292.4 on 11 degrees of freedom
## Multiple R-squared:  0.8764, Adjusted R-squared:  0.8202 
## F-statistic:  15.6 on 5 and 11 DF,  p-value: 0.0001132

Keď sa ale vrátime k viacrozmernému modelu, vidíme, že koeificnety zodpovedajúce jednorozmerným modelom sú korelované

coef(mlm1)

##                       TOT           AMI
## (Intercept) -2879.4782461 -2728.7085444
## GEN           675.6507805   763.0297617
## AMT             0.2848511     0.3063734
## PR             10.2721328     8.8961977
## DIAP            7.2511714     7.2055597
## QRS             7.5982397     4.9870508

vcov(mlm1)

##                 TOT:(Intercept)       TOT:GEN      TOT:AMT        TOT:PR
## TOT:(Intercept)   797914.009706 -61055.389981 11.236944164 -3.157872e+03
## TOT:GEN           -61055.389981  26262.026594  1.417050110  1.500225e+02
## TOT:AMT               11.236944      1.417050  0.003710048 -1.096946e-01
## TOT:PR             -3157.872390    150.022536 -0.109694600  1.810410e+01
## TOT:DIAP           -1625.349406    162.904357  0.081729026  3.131923e+00
## TOT:QRS            -1414.943019     49.083508 -0.054166512 -4.282235e-01
## AMI:(Intercept)   702227.760065 -53733.596885  9.889404164 -2.779179e+03
## AMI:GEN           -53733.596885  23112.671147  1.247116747  1.320318e+02
## AMI:AMT                9.889404      1.247117  0.003265137 -9.653997e-02
## AMI:PR             -2779.178744    132.031758 -0.096539968  1.593304e+01
## AMI:DIAP           -1430.436687    143.368784  0.071928040  2.756341e+00
## AMI:QRS            -1245.262340     43.197389 -0.047670836 -3.768708e-01
##                      TOT:DIAP       TOT:QRS AMI:(Intercept)       AMI:GEN
## TOT:(Intercept) -1.625349e+03 -1.414943e+03   702227.760065 -53733.596885
## TOT:GEN          1.629044e+02  4.908351e+01   -53733.596885  23112.671147
## TOT:AMT          8.172903e-02 -5.416651e-02        9.889404      1.247117
## TOT:PR           3.131923e+00 -4.282235e-01    -2779.178744    132.031758
## TOT:DIAP         1.040163e+01  2.535578e+00    -1430.436687    143.368784
## TOT:QRS          2.535578e+00  1.481381e+01    -1245.262340     43.197389
## AMI:(Intercept) -1.430437e+03 -1.245262e+03   862755.907989 -66017.011582
## AMI:GEN          1.433688e+02  4.319739e+01   -66017.011582  28396.190973
## AMI:AMT          7.192804e-02 -4.767084e-02       12.150106      1.532206
## AMI:PR           2.756341e+00 -3.768708e-01    -3414.494580    162.214008
## AMI:DIAP         9.154259e+00  2.231510e+00    -1757.432236    176.142660
## AMI:QRS          2.231510e+00  1.303733e+01    -1529.927329     53.072244
##                      AMI:AMT        AMI:PR      AMI:DIAP       AMI:QRS
## TOT:(Intercept)  9.889404164 -2.779179e+03 -1.430437e+03 -1.245262e+03
## TOT:GEN          1.247116747  1.320318e+02  1.433688e+02  4.319739e+01
## TOT:AMT          0.003265137 -9.653997e-02  7.192804e-02 -4.767084e-02
## TOT:PR          -0.096539968  1.593304e+01  2.756341e+00 -3.768708e-01
## TOT:DIAP         0.071928040  2.756341e+00  9.154259e+00  2.231510e+00
## TOT:QRS         -0.047670836 -3.768708e-01  2.231510e+00  1.303733e+01
## AMI:(Intercept) 12.150106211 -3.414495e+03 -1.757432e+03 -1.529927e+03
## AMI:GEN          1.532205650  1.622140e+02  1.761427e+02  5.307224e+01
## AMI:AMT          0.004011543 -1.186089e-01  8.837068e-02 -5.856831e-02
## AMI:PR          -0.118608851  1.957531e+01  3.386437e+00 -4.630228e-01
## AMI:DIAP         0.088370675  3.386437e+00  1.124691e+01  2.741630e+00
## AMI:QRS         -0.058568314 -4.630228e-01  2.741630e+00  1.601764e+01

Chceme preto urobiť dvojrozmernú analýzu, ktorá by tieto korelácie zohľadnila. Testovanie viacrozmernej hypotézy \(H_0: B=0\) musíme urobiť pomocou príkazu Anova z balíka car, ktorý premenné na ľavej strane modelu chápe ako viacrozmerné pozorovanie

library(car)

## Loading required package: carData

library(heplots)

## 
## Attaching package: 'heplots'

## The following object is masked from 'package:rgl':
## 
##     arrow3d

Anova(mlm1)

## 
## Type II MANOVA Tests: Pillai test statistic
##      Df test stat approx F num Df den Df   Pr(>F)   
## GEN   1   0.65521   9.5015      2     10 0.004873 **
## AMT   1   0.69097  11.1795      2     10 0.002819 **
## PR    1   0.34649   2.6509      2     10 0.119200   
## DIAP  1   0.32381   2.3944      2     10 0.141361   
## QRS   1   0.29184   2.0606      2     10 0.178092   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

heplot(mlm1,col=c("red", "blue"), lwd=2, cex=1.2,verbose=TRUE)

## 
## Type II MANOVA Tests: Pillai test statistic
##      Df test stat approx F num Df den Df   Pr(>F)   
## GEN   1   0.65521   9.5015      2     10 0.004873 **
## AMT   1   0.69097  11.1795      2     10 0.002819 **
## PR    1   0.34649   2.6509      2     10 0.119200   
## DIAP  1   0.32381   2.3944      2     10 0.141361   
## QRS   1   0.29184   2.0606      2     10 0.178092   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## GEN  H matrix ( 1  df):
##          TOT      AMI
## TOT 152314.7 172012.9
## AMI 172012.9 194258.6
## AMT  H matrix ( 1  df):
##          TOT      AMI
## TOT 191638.0 206117.5
## AMI 206117.5 221691.0
## PR  H matrix ( 1  df):
##          TOT      AMI
## TOT 51070.47 44229.67
## AMI 44229.67 38305.18
## DIAP  H matrix ( 1  df):
##          TOT      AMI
## TOT 44293.55 44014.93
## AMI 44014.93 43738.07
## QRS  H matrix ( 1  df):
##          TOT      AMI
## TOT 34149.53 22413.80
## AMI 22413.80 14711.14

Na základe týchto výsledkov môžeme ďalej testovať hypotézu o vplyve PR, DIAP a QRS na odozvu:

lht(mlm1, hypothesis.matrix = c("PR = 0", "DIAP = 0", "QRS = 0"))

## 
## Sum of squares and products for the hypothesis:
##          TOT      AMI
## TOT 930348.1 780517.7
## AMI 780517.7 679948.4
## 
## Sum of squares and products for error:
##          TOT      AMI
## TOT 870008.3 765676.5
## AMI 765676.5 940708.9
## 
## Multivariate Tests: 
##                  Df test stat approx F num Df den Df  Pr(>F)  
## Pillai            3 0.6038599 1.585910      6     22 0.19830  
## Wilks             3 0.4405021 1.688991      6     20 0.17553  
## Hotelling-Lawley  3 1.1694286 1.754143      6     18 0.16574  
## Roy               3 1.0758181 3.944666      3     11 0.03906 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

heplot(mlm1,hypotheses=list("H0" =c("PR", "DIAP", "QRS")),col=rainbow(10),lwd=c(2, 3, 3, 3, 2), cex=1.25)

Alternatívou je použiť príkaz anovana porovnanie pôvodného modelu s modelom bez premenných PR, DIAP a QRS:

mlm2 <- update(mlm1, . ~ . - PR - DIAP - QRS)
anova(mlm1, mlm2)

## Analysis of Variance Table
## 
## Model 1: cbind(TOT, AMI) ~ GEN + AMT + PR + DIAP + QRS
## Model 2: cbind(TOT, AMI) ~ GEN + AMT
##   Res.Df Df Gen.var.  Pillai approx F num Df den Df Pr(>F)
## 1     11       43803                                      
## 2     14  3    51856 0.60386   1.5859      6     22 0.1983

Príklad 3: Rohwer

Vezmime teraz dáta Rohwer z balíka heplots.

data(Rohwer)
some(Rohwer,n=6)

##    group SES SAT PPVT Raven  n  s ns na ss
## 26     1  Lo  60   80    11  3  8 18 28 21
## 28     1  Lo   6   70    16  2 11  9 23 11
## 42     2  Hi  90   82    13 20  7 21 28 16
## 43     2  Hi  77  100    15  4 11 18 32 29
## 47     2  Hi  98   91    18 16 12 16 27 30
## 48     2  Hi   8   87    10  5  3 17 25 24

Zadanie lineárneho modelu urobíme rovnako ako pre jednorozmerné modely pomocou príkazu lm

mod1 <- lm(cbind(SAT, PPVT, Raven) ~ n + s + ns + na + ss,data=Rohwer)
summary(Anova(mod1))

## 
## Type II MANOVA Tests:
## 
## Sum of squares and products for error:
##             SAT       PPVT     Raven
## SAT   44061.078  6205.5496 1808.5518
## PPVT   6205.550 11259.4196  969.4692
## Raven  1808.552   969.4692  553.5231
## 
## ------------------------------------------
##  
## Term: n 
## 
## Sum of squares and products for the hypothesis:
##              SAT      PPVT    Raven
## SAT   2448.44503 760.05215 90.13861
## PPVT   760.05215 235.93720 27.98104
## Raven   90.13861  27.98104  3.31842
## 
## Multivariate Tests: n
##                  Df test stat approx F num Df den Df  Pr(>F)
## Pillai            1 0.0599640 1.297044      3     61 0.28358
## Wilks             1 0.9400360 1.297044      3     61 0.28358
## Hotelling-Lawley  1 0.0637891 1.297044      3     61 0.28358
## Roy               1 0.0637891 1.297044      3     61 0.28358
## 
## ------------------------------------------
##  
## Term: s 
## 
## Sum of squares and products for the hypothesis:
##             SAT      PPVT      Raven
## SAT    3.208771  27.65146  -8.897894
## PPVT  27.651463 238.28545 -76.677269
## Raven -8.897894 -76.67727  24.673784
## 
## Multivariate Tests: s
##                  Df test stat approx F num Df den Df   Pr(>F)  
## Pillai            1 0.0977878 2.203862      3     61 0.096703 .
## Wilks             1 0.9022122 2.203862      3     61 0.096703 .
## Hotelling-Lawley  1 0.1083866 2.203862      3     61 0.096703 .
## Roy               1 0.1083866 2.203862      3     61 0.096703 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## ------------------------------------------
##  
## Term: ns 
## 
## Sum of squares and products for the hypothesis:
##             SAT       PPVT      Raven
## SAT   7201.4233 1632.66526 -205.37183
## PPVT  1632.6653  370.14848  -46.56072
## Raven -205.3718  -46.56072    5.85684
## 
## Multivariate Tests: ns
##                  Df test stat approx F num Df den Df   Pr(>F)   
## Pillai            1 0.2088201  5.36668      3     61 0.002406 **
## Wilks             1 0.7911799  5.36668      3     61 0.002406 **
## Hotelling-Lawley  1 0.2639351  5.36668      3     61 0.002406 **
## Roy               1 0.2639351  5.36668      3     61 0.002406 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## ------------------------------------------
##  
## Term: na 
## 
## Sum of squares and products for the hypothesis:
##              SAT       PPVT       Raven
## SAT   3860.98360 2352.77010 -20.5161806
## PPVT  2352.77010 1433.70906 -12.5019584
## Raven  -20.51618  -12.50196   0.1090172
## 
## Multivariate Tests: na
##                  Df test stat approx F num Df den Df    Pr(>F)   
## Pillai            1 0.1834781  4.56904      3     61 0.0059522 **
## Wilks             1 0.8165219  4.56904      3     61 0.0059522 **
## Hotelling-Lawley  1 0.2247069  4.56904      3     61 0.0059522 **
## Roy               1 0.2247069  4.56904      3     61 0.0059522 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## ------------------------------------------
##  
## Term: ss 
## 
## Sum of squares and products for the hypothesis:
##              SAT       PPVT     Raven
## SAT   1419.37607 1135.36471 46.502231
## PPVT  1135.36471  908.18287 37.197324
## Raven   46.50223   37.19732  1.523527
## 
## Multivariate Tests: ss
##                  Df test stat approx F num Df den Df  Pr(>F)
## Pillai            1 0.0917965 2.055187      3     61 0.11552
## Wilks             1 0.9082035 2.055187      3     61 0.11552
## Hotelling-Lawley  1 0.1010748 2.055187      3     61 0.11552
## Roy               1 0.1010748 2.055187      3     61 0.11552

coefplot(mod1, lwd=2, fill=(1:5) %in% 3:4, level=0.95)

Vizuálne porovnanie “veľkosti” matíc H a E

heplot(mod1,col=c("red", "black"), lwd=2, cex=1.2,verbose=TRUE)

## 
## Type II MANOVA Tests: Pillai test statistic
##    Df test stat approx F num Df den Df   Pr(>F)   
## n   1  0.059964   1.2970      3     61 0.283582   
## s   1  0.097788   2.2039      3     61 0.096703 . 
## ns  1  0.208820   5.3667      3     61 0.002406 **
## na  1  0.183478   4.5690      3     61 0.005952 **
## ss  1  0.091796   2.0552      3     61 0.115521   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## n  H matrix ( 1  df):
##            SAT     PPVT
## SAT  286.78803 89.02542
## PPVT  89.02542 27.63548
## s  H matrix ( 1  df):
##            SAT      PPVT
## SAT  0.3758455  3.238835
## PPVT 3.2388346 27.910536
## ns  H matrix ( 1  df):
##           SAT      PPVT
## SAT  843.5076 191.23519
## PPVT 191.2352  43.35574
## na  H matrix ( 1  df):
##           SAT     PPVT
## SAT  452.2396 275.5816
## PPVT 275.5816 167.9313
## ss  H matrix ( 1  df):
##           SAT     PPVT
## SAT  166.2525 132.9860
## PPVT 132.9860 106.3761

pairs(mod1)

Testovanie lineárnej hypotézy LB==0

na,ns nemajú žiadny vplyv na pozorovania

L <- matrix(0, nrow=2, ncol=6)
L[1, 5] <- 1
L[2, 4] <- 1
Y <- as.matrix(Rohwer[,3:5])
X <- as.matrix(Rohwer[,6:10])
X <- cbind(rep(1,dim(X)[1]), X)
B <- solve(t(X)%*%X)%*%t(X)%*%Y
H <- t(B) %*% t(L) %*% solve(L %*% solve(t(X)%*%X) %*% t(L)) %*% L %*% B
E <- t(Y) %*% Y - t(B) %*% solve(t(X)%*%X) %*% B

lht(mod1, c("na", "ns"),verbose=TRUE)

## 
## Hypothesis matrix:
##    (Intercept) n s ns na ss
## na           0 0 0  0  1  0
## ns           0 0 0  1  0  0
## 
## Right-hand-side matrix:
##    SAT PPVT Raven
## na   0    0     0
## ns   0    0     0
## 
## Estimated linear function (hypothesis.matrix %*% coef - rhs):
##          SAT       PPVT       Raven
## na  2.097832  1.2783575 -0.01114729
## ns -2.802803 -0.6354355  0.07993098

## 
## Sum of squares and products for the hypothesis:
##             SAT       PPVT       Raven
## SAT   8420.4811 2793.94988 -186.475924
## PPVT  2793.9499 1476.39789  -28.560325
## Raven -186.4759  -28.56033    6.149735
## 
## Sum of squares and products for error:
##             SAT       PPVT     Raven
## SAT   44061.078  6205.5496 1808.5518
## PPVT   6205.550 11259.4196  969.4692
## Raven  1808.552   969.4692  553.5231
## 
## Multivariate Tests: 
##                  Df test stat approx F num Df den Df     Pr(>F)    
## Pillai            2 0.2989130 3.631523      6    124 0.00235885 ** 
## Wilks             2 0.7124255 3.756781      6    122 0.00182225 ** 
## Hotelling-Lawley  2 0.3877402 3.877402      6    120 0.00142373 ** 
## Roy               2 0.3410783 7.048952      3     62 0.00037467 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

heplot(mod1,hypotheses=list("H0" =c("na", "ns")),col=rainbow(10),lwd=c(2, 3, 3, 3, 2), cex=1.25)

test o rovnosti na=ns

linearHypothesis(mod1,"na-ns",verbose=TRUE)

## 
## Hypothesis matrix:
##       (Intercept) n s ns na ss
## na-ns           0 0 0 -1  1  0
## 
## Right-hand-side matrix:
##       SAT PPVT Raven
## na-ns   0    0     0
## 
## Estimated linear function (hypothesis.matrix %*% coef - rhs):
##         SAT        PPVT       Raven 
##  4.90063575  1.91379297 -0.09107827

## 
## Sum of squares and products for the hypothesis:
##            SAT       PPVT       Raven
## SAT   7997.851 3123.31556 -148.640002
## PPVT  3123.316 1219.71508  -58.046793
## Raven -148.640  -58.04679    2.762473
## 
## Sum of squares and products for error:
##             SAT       PPVT     Raven
## SAT   44061.078  6205.5496 1808.5518
## PPVT   6205.550 11259.4196  969.4692
## Raven  1808.552   969.4692  553.5231
## 
## Multivariate Tests: 
##                  Df test stat approx F num Df den Df     Pr(>F)    
## Pillai            1 0.2533129 6.898064      3     61 0.00044825 ***
## Wilks             1 0.7466871 6.898064      3     61 0.00044825 ***
## Hotelling-Lawley  1 0.3392491 6.898064      3     61 0.00044825 ***
## Roy               1 0.3392491 6.898064      3     61 0.00044825 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

heplot(mod1,hypotheses=list("H0" ="na-ns"),col=rainbow(10),lwd=c(2, 3, 3, 3, 2), cex=1.25)

heplot3d(mod1,hypotheses=list("H0" =c("s","na", "ns")),col=rainbow(10))

You must enable Javascript to view this page properly.

Modely pre znevýhodnené a normálne deti zvlášť

rohwer.ses1 <- lm(cbind(SAT, PPVT, Raven) ~ n + s + ns + na + ss, data=Rohwer, subset=SES=="Hi")
Anova(rohwer.ses1)

## 
## Type II MANOVA Tests: Pillai test statistic
##    Df test stat approx F num Df den Df  Pr(>F)   
## n   1   0.20177   2.0222      3     24 0.13760   
## s   1   0.30970   3.5891      3     24 0.02836 * 
## ns  1   0.35795   4.4601      3     24 0.01260 * 
## na  1   0.46516   6.9576      3     24 0.00157 **
## ss  1   0.08873   0.7790      3     24 0.51725   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

rohwer.ses2 <- lm(cbind(SAT, PPVT, Raven) ~ n + s + ns + na + ss, data=Rohwer, subset=SES=="Lo")
Anova(rohwer.ses2)

## 
## Type II MANOVA Tests: Pillai test statistic
##    Df test stat approx F num Df den Df  Pr(>F)  
## n   1  0.038358   0.3856      3     29 0.76418  
## s   1  0.111793   1.2167      3     29 0.32130  
## ns  1  0.225221   2.8100      3     29 0.05696 .
## na  1  0.267458   3.5294      3     29 0.02705 *
## ss  1  0.138962   1.5601      3     29 0.22030  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Vypíšeme odhady neznámych parametrov

coef(rohwer.ses1)

##                     SAT       PPVT      Raven
## (Intercept) -28.4674707 39.6970896 13.2438356
## n             3.2571323  0.0672825  0.0593474
## s             2.9965832  0.3699843  0.4924440
## ns           -5.8590628 -0.3743802 -0.1640219
## na            5.6662226  1.5230090  0.1189805
## ss           -0.6226532  0.4101567 -0.1211564

coef(rohwer.ses2)

##                     SAT        PPVT       Raven
## (Intercept)  4.15106027 33.00576902 11.17337818
## n           -0.60887193 -0.08056682  0.21099550
## s           -0.05015614 -0.72105017  0.06456701
## ns          -1.73239505 -0.29830269  0.21358423
## na           0.49456476  1.47041758 -0.03731804
## ss           2.24772090  0.32396465 -0.05214253

Porovnanie veličín H a E zvlášť pre normálne (modrá) a znevýhodnené (červená) deti

heplot(rohwer.ses1, ylim=c(40,110),col=c("red", "black"), lwd=2, cex=1.2)
heplot(rohwer.ses2, add=TRUE, col=c("blue", "black"), grand.mean=TRUE, error.ellipse=TRUE, lwd=2, cex=1.2)
means <- aggregate(cbind(SAT,PPVT)~SES, data=Rohwer, mean)
text(means[,2], means[,3], labels=means[,1], pos=3, cex=2, col=c("red", "blue"))

Príklad 4: NLSY

Uvažujme teraz dáta NLSY, v ktorých sa sledovali výsledky detí z matematiky a čítania v závislosti od štyroch ďalších premenných. Urobíme podobnú analýzu ako v predchádzajúcom príklade.

data(NLSY)
some(NLSY,n=6)

##      math  read antisoc hyperact income educ
## 13  19.05 30.95       2        4 14.594   12
## 75  47.62 34.52       0        0 40.050   12
## 123 19.05 26.19       0        0 21.000   11
## 192 34.52 26.19       1        0 21.012   14
## 218 21.43 21.43       0        0 13.666   15
## 221 22.62 22.62       2        0 24.500    6

mod1 <- lm(cbind(read, math) ~ income + educ + antisoc + hyperact, data=NLSY)
summary(Anova(mod1))

## 
## Type II MANOVA Tests:
## 
## Sum of squares and products for error:
##          read     math
## read 22712.12 11790.46
## math 11790.46 23247.77
## 
## ------------------------------------------
##  
## Term: income 
## 
## Sum of squares and products for the hypothesis:
##           read     math
## read  17.34907 116.7099
## math 116.70993 785.1261
## 
## Multivariate Tests: income
##                  Df test stat approx F num Df den Df    Pr(>F)   
## Pillai            1 0.0382795 4.716675      2    237 0.0098015 **
## Wilks             1 0.9617205 4.716675      2    237 0.0098015 **
## Hotelling-Lawley  1 0.0398032 4.716675      2    237 0.0098015 **
## Roy               1 0.0398032 4.716675      2    237 0.0098015 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## ------------------------------------------
##  
## Term: educ 
## 
## Sum of squares and products for the hypothesis:
##          read      math
## read 659.4951  895.6072
## math 895.6072 1216.2522
## 
## Multivariate Tests: educ
##                  Df test stat approx F num Df den Df   Pr(>F)   
## Pillai            1 0.0531518 6.652055      2    237 0.001546 **
## Wilks             1 0.9468482 6.652055      2    237 0.001546 **
## Hotelling-Lawley  1 0.0561355 6.652055      2    237 0.001546 **
## Roy               1 0.0561355 6.652055      2    237 0.001546 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## ------------------------------------------
##  
## Term: antisoc 
## 
## Sum of squares and products for the hypothesis:
##           read     math
## read  73.68227 182.4647
## math 182.46468 451.8504
## 
## Multivariate Tests: antisoc
##                  Df test stat approx F num Df den Df   Pr(>F)  
## Pillai            1 0.0193432 2.337379      2    237 0.098803 .
## Wilks             1 0.9806568 2.337379      2    237 0.098803 .
## Hotelling-Lawley  1 0.0197247 2.337379      2    237 0.098803 .
## Roy               1 0.0197247 2.337379      2    237 0.098803 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## ------------------------------------------
##  
## Term: hyperact 
## 
## Sum of squares and products for the hypothesis:
##          read     math
## read 165.4839 230.4094
## math 230.4094 320.8074
## 
## Multivariate Tests: hyperact
##                  Df test stat approx F num Df den Df  Pr(>F)
## Pillai            1 0.0144419  1.73644      2    237 0.17838
## Wilks             1 0.9855581  1.73644      2    237 0.17838
## Hotelling-Lawley  1 0.0146535  1.73644      2    237 0.17838
## Roy               1 0.0146535  1.73644      2    237 0.17838

coefplot(mod1, lwd=2, fill=(1:5) %in% 3:4, level=0.95)

heplot(mod1,col=c("red", "black"), lwd=2, cex=1.2,verbose=TRUE)

## 
## Type II MANOVA Tests: Pillai test statistic
##          Df test stat approx F num Df den Df   Pr(>F)   
## income    1  0.038280   4.7167      2    237 0.009801 **
## educ      1  0.053152   6.6521      2    237 0.001546 **
## antisoc   1  0.019343   2.3374      2    237 0.098803 . 
## hyperact  1  0.014442   1.7364      2    237 0.178380   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## income  H matrix ( 1  df):
##          read      math
## read  2.84717  19.15336
## math 19.15336 128.84770
## educ  H matrix ( 1  df):
##          read     math
## read 108.2303 146.9789
## math 146.9789 199.6002
## antisoc  H matrix ( 1  df):
##          read     math
## read 12.09206 29.94443
## math 29.94443 74.15354
## hyperact  H matrix ( 1  df):
##          read     math
## read 27.15771 37.81267
## math 37.81267 52.64798

Testovanie lineárnej hypotézy LB==0

hyperact, antisoc nemajú žiadny vplyv na pozorovania

L <- matrix(0, nrow=2, ncol=5)
L[1, 4] <- 1
L[2, 5] <- 1
Y <- as.matrix(NLSY[,1:2])
X <- as.matrix(NLSY[,3:6])
X <- cbind(rep(1,dim(X)[1]), X)
B <- solve(t(X)%*%X)%*%t(X)%*%Y
H <- t(B) %*% t(L) %*% solve(L %*% solve(t(X)%*%X) %*% t(L)) %*% L %*% B
E <- t(Y) %*% Y - t(B) %*% solve(t(X)%*%X) %*% B

lht(mod1, c("hyperact", "antisoc"),verbose=TRUE)

## 
## Hypothesis matrix:
##          (Intercept) income educ antisoc hyperact
## hyperact           0      0    0       0        1
## antisoc            0      0    0       1        0
## 
## Right-hand-side matrix:
##          read math
## hyperact    0    0
## antisoc     0    0
## 
## Estimated linear function (hypothesis.matrix %*% coef - rhs):
##                read       math
## hyperact -0.6376073 -0.8877641
## antisoc   0.4502605  1.1150123

## 
## Sum of squares and products for the hypothesis:
##          read     math
## read 170.3478 261.2230
## math 261.2230 516.0188
## 
## Sum of squares and products for error:
##          read     math
## read 22712.12 11790.46
## math 11790.46 23247.77
## 
## Multivariate Tests: 
##                  Df test stat approx F num Df den Df   Pr(>F)  
## Pillai            2 0.0239869 1.444548      4    476 0.218172  
## Wilks             2 0.9760624 1.444284      4    474 0.218264  
## Hotelling-Lawley  2 0.0244741 1.443972      4    472 0.218372  
## Roy               2 0.0221965 2.641385      2    238 0.073351 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

heplot(mod1,hypotheses=list("H0" =c("hyperact", "antisoc")),col=rainbow(10),lwd=c(2, 3, 3, 3, 2), cex=1.25)

test o rovnosti hyperact=antisoc

linearHypothesis(mod1,"hyperact-antisoc",verbose=TRUE)

## 
## Hypothesis matrix:
##                  (Intercept) income educ antisoc hyperact
## hyperact-antisoc           0      0    0      -1        1
## 
## Right-hand-side matrix:
##                  read math
## hyperact-antisoc    0    0
## 
## Estimated linear function (hypothesis.matrix %*% coef - rhs):
##      read      math 
## -1.087868 -2.002776

## 
## Sum of squares and products for the hypothesis:
##          read     math
## read 149.4846 275.2028
## math 275.2028 506.6514
## 
## Sum of squares and products for error:
##          read     math
## read 22712.12 11790.46
## math 11790.46 23247.77
## 
## Multivariate Tests: 
##                  Df test stat approx F num Df den Df   Pr(>F)  
## Pillai            1 0.0213663 2.587189      2    237 0.077355 .
## Wilks             1 0.9786337 2.587189      2    237 0.077355 .
## Hotelling-Lawley  1 0.0218328 2.587189      2    237 0.077355 .
## Roy               1 0.0218328 2.587189      2    237 0.077355 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

heplot(mod1,hypotheses=list("H0" ="hyperact-antisoc"),col=rainbow(10),lwd=c(2, 3, 3, 3, 2), cex=1.25)

Regresná analýza

Lenka Filová

Viacrozmerná pregresná analýza

Príklad 1: chemická reakcia

Príklad 2: Amitriptylin

Príklad 3: Rohwer

Testovanie lineárnej hypotézy LB==0

Modely pre znevýhodnené a normálne deti zvlášť

Príklad 4: NLSY

Testovanie lineárnej hypotézy LB==0

Úloha