Merali sme diastolický krvný tlak na začiatku experimentu (DBP1) a potom počas 4 mesiacov raz za mesiac (DBP2 - DBP5). Máme ešte k dispozícii informáciu o pohlaví a veku každého pacienta. Cieľom experimentu bolo otestovať, či liek A je účinný na zníženie krvného tlaku v porovnaní s placebom (B).
library(RODBC)
dat <- read.table("http://www.iam.fmph.uniba.sk/ospm/Filova/clin/DBP.csv", header=TRUE, sep=",")
head(dat)
## Subject TRT DBP1 DBP2 DBP3 DBP4 DBP5 Age Sex
## 1 1 A 114 115 113 109 105 43 F
## 2 2 A 116 113 112 103 101 51 M
## 3 3 A 119 115 113 104 98 48 F
## 4 4 A 115 113 112 109 101 42 F
## 5 5 A 116 112 107 104 105 49 M
## 6 6 A 117 112 113 104 102 47 M
summary(dat)
## Subject TRT DBP1 DBP2 DBP3
## Min. : 1.00 A:20 Min. :114.0 Min. :111.0 Min. :100.0
## 1st Qu.:10.75 B:20 1st Qu.:115.0 1st Qu.:113.0 1st Qu.:112.0
## Median :20.50 Median :116.5 Median :115.0 Median :113.0
## Mean :20.50 Mean :116.7 Mean :114.3 Mean :112.4
## 3rd Qu.:30.25 3rd Qu.:118.0 3rd Qu.:115.0 3rd Qu.:113.0
## Max. :40.00 Max. :121.0 Max. :119.0 Max. :118.0
## DBP4 DBP5 Age Sex
## Min. :102.0 Min. : 97.0 Min. :38.00 F:18
## 1st Qu.:106.8 1st Qu.:101.8 1st Qu.:42.00 M:22
## Median :109.0 Median :106.5 Median :48.00
## Mean :109.3 Mean :106.7 Mean :47.83
## 3rd Qu.:113.2 3rd Qu.:112.0 3rd Qu.:51.25
## Max. :117.0 Max. :115.0 Max. :63.00
Zmeníme formát dát, aby sme ich vedeli ďalej analyzovať:
Dat <- reshape(dat, direction="long", varying = c("DBP1","DBP2","DBP3","DBP4","DBP5"),
idvar = c("Subject","TRT","Age","Sex"),sep="")
colnames(Dat) <- c("Subject","TRT","Age","Sex","Time","DBP")
head(Dat)
## Subject TRT Age Sex Time DBP
## 1.A.43.F.1 1 A 43 F 1 114
## 2.A.51.M.1 2 A 51 M 1 116
## 3.A.48.F.1 3 A 48 F 1 119
## 4.A.42.F.1 4 A 42 F 1 115
## 5.A.49.M.1 5 A 49 M 1 116
## 6.A.47.M.1 6 A 47 M 1 117
Najskôr sa pozrieme na krvý tlak v závislosti od času pre každého pacienta. Pacientov, ktorí dostali liek A, vykreslíme plnou čiarou a pacientov, ktorí dostali placebo, prerušovanou čiarou.
library(lattice)
print(xyplot(DBP~Time|as.factor(Subject),type="l",groups=TRT,
strip=strip.custom(bg="white"),lty=c(1,8),lwd=2,layout=c(10,4), Dat))
print(xyplot(DBP~Time|TRT,type="l",Dat, groups=as.factor(Subject),strip=strip.custom(bg="white")))
Ďalšiu vizualizáciu môžeme urobiť pomocou boxplotu
print(bwplot(DBP~as.factor(Time)|TRT,Dat, xlab="Time",strip=strip.custom(bg="white")))
Tieto obrázky nám naznačujú, že pokles tlaku u pacientov v skupine A je výraznejší ako u pacientov v skupine B. Môžeme to overiť napríklad tak, že pre každého pacienta fitujeme jeho dátami priamku a zaznamenáme, aký má táto priamka sklon:
num.Subj <- 40
intercept <- slope <- numeric(num.Subj)
for(i in 1:num.Subj){
mod <- lm(DBP~Time, Dat[Dat$Subject==i,])
intercept[i] <- coef(mod)[1]
slope[i] <- coef(mod)[2]
}
dat.coef <- data.frame(Subject=dat$Subject,TRT=dat$TRT,
Intercept <- intercept, Slope=slope)
dat.coef
## Subject TRT Intercept....intercept Slope
## 1 1 A 118.4 -2.4
## 2 2 A 121.0 -4.0
## 3 3 A 125.7 -5.3
## 4 4 A 119.6 -3.2
## 5 5 A 117.8 -3.0
## 6 6 A 121.0 -3.8
## 7 7 A 119.4 -4.0
## 8 8 A 125.1 -4.9
## 9 9 A 117.7 -2.5
## 10 10 A 120.7 -4.3
## 11 11 A 120.3 -3.5
## 12 12 A 121.2 -3.4
## 13 13 A 123.5 -4.1
## 14 14 A 124.7 -5.1
## 15 15 A 118.3 -3.3
## 16 16 A 118.2 -3.2
## 17 17 A 121.0 -3.6
## 18 18 A 124.2 -4.2
## 19 19 A 119.1 -3.7
## 20 20 A 122.4 -3.8
## 21 21 B 115.0 -0.6
## 22 22 B 117.7 -1.7
## 23 23 B 116.6 -1.4
## 24 24 B 113.9 0.1
## 25 25 B 117.4 -1.8
## 26 26 B 116.4 -1.2
## 27 27 B 120.1 -0.9
## 28 28 B 119.9 -1.3
## 29 29 B 116.2 -1.6
## 30 30 B 119.6 -1.6
## 31 31 B 116.3 -0.5
## 32 32 B 118.9 -2.1
## 33 33 B 122.3 -1.7
## 34 34 B 117.7 -1.1
## 35 35 B 119.9 -1.7
## 36 36 B 119.9 -1.7
## 37 37 B 117.9 -1.9
## 38 38 B 116.9 -0.9
## 39 39 B 116.3 -0.5
## 40 40 B 116.8 -0.6
Vidíme, že intercepty sa pohybujú okolo 120mmHG a sklony okolo -2.5mmHG za mesiac. Aby sme lepšie odlíšili tieto dva parametre po skupinách, vykreslíme dvojrozmerný graf s interceptom na osi x a sklonom na osi y:
int.hist <- hist(intercept,plot=F)
slope.hist <- hist(slope,plot=F)
top <- max(c(int.hist$counts, slope.hist$counts))
nf <- layout(matrix(c(2,0,1,3),2,2,byrow=T),c(3,1), c(1,3),T)
par(mar=c(5,4,1,1))
plot(Slope~Intercept,las=1,dat.coef,xlab="Intercept", ylab="Slope",pch=as.character(TRT))
par(mar=c(0,4,1,1))
barplot(int.hist$counts, axes=FALSE, ylim=c(0, top), space=0)
par(mar=c(5,0,1,1))
barplot(slope.hist$counts, axes=FALSE,xlim=c(0, top), space=0, horiz=TRUE)
Môžeme teraz modelovať vzťah medzi interceptom a sklonom lineárnou regresiou s interakciami aj bez interakcií:
# model s interakciami
mod1.coef <- lm(Slope~Intercept*TRT, dat.coef)
summary(mod1.coef)
##
## Call:
## lm(formula = Slope ~ Intercept * TRT, data = dat.coef)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.66359 -0.29475 -0.03143 0.34701 0.75317
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 28.33737 4.68741 6.045 6.04e-07 ***
## Intercept -0.26539 0.03874 -6.850 5.17e-08 ***
## TRTB -8.29639 7.37956 -1.124 0.268
## Intercept:TRTB 0.08475 0.06198 1.367 0.180
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4293 on 36 degrees of freedom
## Multiple R-squared: 0.919, Adjusted R-squared: 0.9122
## F-statistic: 136.1 on 3 and 36 DF, p-value: < 2.2e-16
# model bez interakcii
mod2.coef <- lm(Slope~Intercept+TRT, dat.coef)
summary(mod2.coef)
##
## Call:
## lm(formula = Slope ~ Intercept + TRT, data = dat.coef)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.73316 -0.38494 0.02806 0.33483 0.87272
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 24.33216 3.70220 6.572 1.06e-07 ***
## Intercept -0.23228 0.03059 -7.592 4.68e-09 ***
## TRTB 1.79136 0.16831 10.643 8.16e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4343 on 37 degrees of freedom
## Multiple R-squared: 0.9147, Adjusted R-squared: 0.9101
## F-statistic: 198.5 on 2 and 37 DF, p-value: < 2.2e-16
Z modelu 1 vidíme, že interakcie nie sú významné, môžeme preto použiť model 2, z ktorého vidíme signifikantný rozdiel medzi ošetreniami.
Ďalšie porovnanie ošetrení môžeme urobiť pomocou t-testu:
t.test(Slope~TRT, dat.coef)
##
## Welch Two Sample t-test
##
## data: Slope by TRT
## t = -11.673, df = 35.556, p-value = 1.019e-13
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -2.96976 -2.09024
## sample estimates:
## mean in group A mean in group B
## -3.765 -1.235
t.test(Intercept~TRT, dat.coef)
##
## Welch Two Sample t-test
##
## data: Intercept by TRT
## t = 4.3669, df = 36.266, p-value = 0.0001008
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 1.703495 4.656505
## sample estimates:
## mean in group A mean in group B
## 120.965 117.785