1HrStats2

Slide 1

One Hour BioStats Loren Yamamoto, MD, MPH, MBA Professor of Pediatrics University of Hawaii John A. Burns School of Medicine

Slide 2

Objectives
Different types of data (measurements, observations, etc.)
Descriptive vs. inferential statistics
How to select a statistical test
Study design types

Slide 3

Types of data
Continuous
  e.g., age, weight, BP, peak flow, oxygen saturation, cholesterol
Discrete (categorical)
  e.g., gender, ethnicity, socioeconomic status, medical insurance
  Dichotomous variables - only two categories

Slide 4

Converting continuous data into discrete data
Grouping continuous data; e.g., age groups
  Group 1: ages 0-1 year
  Group 2: ages 2-5 years
  Group 3: ages 6-10 years
  Group 4: ages 11-20 years
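As a minimal sketch of the grouping on this slide, the function below maps an age to its group number. It assumes ages are recorded in whole years (the slide's ranges leave gaps between, say, 1 and 2 years otherwise); the function name `age_group` and the sample ages are hypothetical.

```python
def age_group(age_years: int) -> int:
    """Return the slide's group number (1-4) for an age in whole years."""
    if not 0 <= age_years <= 20:
        raise ValueError("age is outside the 0-20 year range covered by the groups")
    if age_years <= 1:
        return 1   # ages 0-1 year
    if age_years <= 5:
        return 2   # ages 2-5 years
    if age_years <= 10:
        return 3   # ages 6-10 years
    return 4       # ages 11-20 years


ages = [0, 3, 7, 15, 1, 10]            # hypothetical patients
groups = [age_group(a) for a in ages]  # continuous -> discrete
print(groups)
```

Once grouped, the variable is treated as discrete, so it would be summarized with proportions rather than a mean.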

Slide 5

Converting discrete data into continuous data
Ranking hair color:
  Blonde hair = 1
  Brown hair = 3
  Black hair = 5
  What value is Red hair? How is the rank used?
Military rank: Lt, Capt, Major, etc.
Most discrete data cannot be converted into continuous data

Slide 6

Why are data types important ? Data types determine the statistical test that must be used

Slide 7

Descriptive vs. Inferential statistics Description of data using representative numbers Inference via statistical tests yielding p values

Slide 8

Descriptive Statistics
Continuous:
  Mean, median, mode
  Standard deviation, variance
  Confidence interval
Discrete:
  Rates (many kinds)
  Proportions

Slide 9

Examples: Descriptive Statistics
Continuous: Weight in men and women
  Men: Mean 160 lbs, SD 35 lbs
  Women: Mean 110 lbs, SD 40 lbs
Discrete: Eye color in men and women
  Men: brown 50%, grey 35%, other 15%
  Women: brown 45%, grey 30%, other 25%
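The descriptive statistics named on the previous two slides can be computed with Python's standard library. This is only a sketch: the weights are hypothetical values, the eye-color counts are made up so that they reproduce the men's proportions above, and the confidence interval uses the normal multiplier 1.96 as an approximation (a t multiplier would be somewhat wider for a sample this small).

```python
import statistics as st
from collections import Counter
from math import sqrt

# Continuous variable: hypothetical weights in lbs
weights_lb = [150, 160, 160, 175, 180, 145, 160, 190]

mean = st.mean(weights_lb)
median = st.median(weights_lb)
mode = st.mode(weights_lb)
sd = st.stdev(weights_lb)            # sample SD (divides by n - 1)
variance = st.variance(weights_lb)   # sample variance = SD squared
sem = sd / sqrt(len(weights_lb))     # standard error of the mean
ci_95 = (mean - 1.96 * sem, mean + 1.96 * sem)  # approximate 95% CI

# Discrete variable: hypothetical eye-color counts for 20 men
eye_colors = ["brown"] * 10 + ["grey"] * 7 + ["other"] * 3
counts = Counter(eye_colors)
proportions = {color: n / len(eye_colors) for color, n in counts.items()}

print(mean, median, mode, round(sd, 1), ci_95)
print(proportions)
```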

Slide 10

Distribution of measurements

Slide 11

Descriptive vs. Inferential Statistics Are these two groups significantly different ?? Attempts to compare two or more groups of observations to determine if the groups are significantly different

Slide 12

Who is smarter ? Doctors vs. Lawyers Women vs. Men Iolani vs. Punahou Harvard vs. Yale Pediatricians vs. Internists Wookies vs. Jupitrons

Slide 13

Mean IQ: Wookies vs. Jupitrons
100 Wookies: mean IQ 127, SD 45
90 Jupitrons: mean IQ 139, SD 28

Slide 14

Jupitrons have a higher mean IQ
But is this difference significant, or is it due to chance alone?

Slide 15

T-test compares two means
100 Wookies: mean IQ 127, SD 45
90 Jupitrons: mean IQ 139, SD 28
p=0.24 not significant

Slide 16

Is the difference significant?
Magnitude of the difference
Spread of the data (variance, SD)
100 Wookies: mean IQ 127, SD 45 vs. 6
90 Jupitrons: mean IQ 139, SD 28 vs. 5
p=0.24 (larger SDs) vs. p=0.02 (smaller SDs)

Slide 17

p<0.05 is significant
If p<0.05, a difference at least this large would be expected less than 5% of the time if chance alone were at work (that is, if the groups were truly the same). This cutoff value is commonly chosen to distinguish between significant and non-significant. For example, if p=0.052, then the difference is NOT significant. But if p=0.049, then the difference IS significant. This is a bit arbitrary, but we have to decide on a cutoff value somewhere, so it is p<0.05.

Slide 18

Choosing a statistical test T-test Analysis of Variance (ANOVA) Chi-Square (Crosstabulation) Regression

Slide 19

Test selection is based on data type
Continuous variable by 2 groups: t-test
Continuous variable by >2 groups: ANOVA
Discrete variable by 2 groups: chi-square
Discrete variable by >2 groups: chi-square
Relationship between a continuous variable and one or more other continuous variables: regression

Slide 20

t-Test
Compares a continuous variable between two groups:
  BP in men vs. women
  IQ in Wookies vs. Jupitrons
  Annual salary in pediatricians vs. OBs

Slide 21

t-Test
SPSS/PC+
Summaries of BMI
By levels of GROUP

Value  Label          Mean     Std Dev  Sum of Sq  Cases
1      (musicians)    20.6278  3.6412   464.0322   36
2      (tennis)       19.6186  2.3439   379.0859   70
Within Groups Total   19.9613  2.8473   843.1181   106

Analysis of Variance
Source          Sum of Squares  D.F.  Mean Square  F       Sig.
Between Groups  24.2133         1     24.2133      2.9868  .0869
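The SPSS output above can be approximately reproduced from its summary rows. The sketch below assumes a pooled-variance (equal-variance) two-sample t-test, for which F equals t squared. The helper `t_two_sided_p` is a hypothetical name; it integrates the t density numerically with Simpson's rule so that nothing beyond the standard library is needed. Because the inputs are rounded, the results should land close to, not exactly on, the printed F and Sig. values.

```python
from math import exp, lgamma, pi, sqrt

# Summary statistics copied from the SPSS output above
n1, mean1, sd1 = 36, 20.6278, 3.6412   # group 1 (musicians)
n2, mean2, sd2 = 70, 19.6186, 2.3439   # group 2 (tennis)

df = n1 + n2 - 2
# Pooled (within-groups) variance
pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
se_diff = sqrt(pooled_var * (1 / n1 + 1 / n2))
t_stat = (mean1 - mean2) / se_diff
f_stat = t_stat**2   # with two groups, the ANOVA F is t squared


def t_two_sided_p(t_value, df, steps=2000):
    """Two-sided p value for Student's t via Simpson's rule (steps must be even)."""
    c = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)

    def density(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    upper = abs(t_value)
    h = upper / steps
    total = density(0.0) + density(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    area = total * h / 3          # probability between 0 and |t|
    return 1 - 2 * area           # both tails


p_value = t_two_sided_p(t_stat, df)
print(round(t_stat, 3), round(f_stat, 3), round(p_value, 4))
```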

Slide 22

Paired t-Test
Compares a continuous variable between two groups, BUT:
Observations occur in pairs (there is a link between the pair of observations)
  Restaurant study: Calories at McDonald’s vs. Outback
  Jellyfish study: Jellyfish sting relief, heat vs. vinegar

Slide 23

t-Test vs. Paired t-Test
Calories at McDonald’s vs. Outback
  10 subjects go to McDonald’s and 10 go to Outback = t-Test
  10 subjects go to McDonald’s and the next week, the same 10 go to Outback = Paired t-Test
Jellyfish sting remedies
  12 get treated with vinegar and 13 get treated with heat = t-Test
  15 get treated with vinegar on half the sting and heat on the other half of the sting = Paired t-Test

Slide 24

Jellyfish Stings Paired t-Test

Slide 26

Paired t-Test
Paired samples t-test: MCCAL (McDonald’s calories), OUCAL (Outback steakhouse cals)

Variable  Number of Cases  Mean    Standard Deviation  Standard Error
MCCAL     104              1015.6  453.4               44.5
OUCAL     104              1656.3  1124.0              110.2

(Difference) Mean  Std Dev  Std Error  Corr.  2-Tail Prob.  t Value  D.F.  2-Tail Prob.
-640.7             1014.4   99.5       .432   .000          -6.44    103   .000
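The paired t value in the output above follows directly from the difference row: the mean difference divided by its standard error. The sketch below recomputes it from the printed summary numbers, and also shows how the standard deviation of the differences relates to the two individual SDs and their correlation. Variable names are illustrative, and small discrepancies are expected because the printed inputs are rounded.

```python
from math import sqrt

# Values copied from the paired-samples output above
n_pairs = 104
mean_diff = -640.7        # MCCAL minus OUCAL
sd_diff = 1014.4          # SD of the paired differences
sd_mc, sd_ou = 453.4, 1124.0
corr = 0.432              # correlation between the paired measurements

se_diff = sd_diff / sqrt(n_pairs)   # standard error of the mean difference
t_stat = mean_diff / se_diff
df = n_pairs - 1

# The SD of the differences can be rebuilt from the two SDs and the correlation
sd_diff_from_corr = sqrt(sd_mc**2 + sd_ou**2 - 2 * corr * sd_mc * sd_ou)

# For contrast: the standard error if the pairing were ignored
se_unpaired = sqrt((sd_mc**2 + sd_ou**2) / n_pairs)

print(round(se_diff, 1), round(t_stat, 2), df)
print(round(sd_diff_from_corr, 1), round(se_unpaired, 1))
```

Because the paired measurements are positively correlated, the paired standard error comes out smaller than the unpaired one, which is the practical benefit of the paired design.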

Slide 27

Analysis of Variance
Compares a continuous variable between more than two groups
  Annual salary in Peds vs. OBs vs. FPs
  SAT scores among public schools: Moanalua, McKinley, Aiea, Pearl City, etc.
p<0.05 indicates that at least one group is different from the others

Slide 28

ANOVA
Summaries of HTCMCHNG
By levels of BALLTYPE

Value  Label     Mean   Std Dev  Sum of Sq  Cases
1      Std ball  .5681  .3144    4.5457     47
2      RIF-1     .0091  .0314    .0444      46
3      RIF-5     .3805  .1961    1.5386     41
4      RIF-10    .6086  1.3585   119.9568   66
Total            .4145  .8021    126.0855   200

Criterion Variable HTCMCHNG
Analysis of Variance
Source          Sum of Squares  D.F.  Mean Square  F       Sig.
Between Groups  11.2025         3     3.7342       5.8048  .0008
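The F statistic in the ANOVA output above can be rebuilt from the per-group rows. This sketch takes each group's case count, mean, and within-group sum of squares from the table, and computes the between-groups sum of squares from the group means. The dictionary layout is only an illustrative choice, and the result differs slightly from the printed values because the means are rounded.

```python
# (cases, mean, within-group sum of squares), copied from the table above
groups = {
    "Std ball": (47, 0.5681, 4.5457),
    "RIF-1":    (46, 0.0091, 0.0444),
    "RIF-5":    (41, 0.3805, 1.5386),
    "RIF-10":   (66, 0.6086, 119.9568),
}

n_total = sum(n for n, _, _ in groups.values())
grand_mean = sum(n * m for n, m, _ in groups.values()) / n_total

# Between groups: how far each group mean sits from the grand mean
ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups.values())
# Within groups: spread of observations around their own group mean
ss_within = sum(ss for _, _, ss in groups.values())

df_between = len(groups) - 1
df_within = n_total - len(groups)

ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within

print(round(ss_between, 4), df_between, round(ms_between, 4), round(f_stat, 3))
```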

Slide 29

Chi-Square (Crosstabulation)
Compares discrete variable between two or more groups
  Coronary artery disease in joggers vs. non-joggers
  Kawasaki disease in different ethnic groups
  Bacteremia in males vs. females
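A minimal sketch of a chi-square test on a 2x2 crosstabulation follows. The counts are entirely hypothetical (they are not from the presentation), and the statistic is Pearson's chi-square without a continuity correction. For a 2x2 table there is 1 degree of freedom, in which case the p value can be written with the complementary error function.

```python
from math import erfc, sqrt

# Hypothetical counts: rows = joggers, non-joggers; columns = CAD yes, CAD no
observed = [
    [10, 90],
    [25, 75],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Count expected in this cell if the row and column variables were unrelated
        expected = row_totals[i] * col_totals[j] / n
        chi2 += (obs - expected) ** 2 / expected

# Valid only for 1 degree of freedom (a 2x2 table)
p_value = erfc(sqrt(chi2 / 2))

print(round(chi2, 3), round(p_value, 4))
```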

Slide 30

Chi-Square

Slide 31

Regression (Linear, Multiple)
Determines the correlation between a continuous variable and other continuous variables
  BP = 120 + a(Age) + b(Sex) + c(Salt) - d(Activity)
  ED Asthma Census = X + a(Virus) + b(Weather) + c(Season) + d(Air Qual)
Corr Coeff (r)
Least squares method of linear regression
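As a sketch of the least squares method for the simplest case (one predictor), the code below fits a line of systolic BP on age and computes the correlation coefficient r. The data points are hypothetical and chosen only to keep the arithmetic easy to follow; a multiple regression like the BP equation above would need a matrix-based fit instead.

```python
from math import sqrt

# Hypothetical data: age in years and systolic BP in mmHg
ages = [30, 40, 50, 60, 70]
systolic = [118, 124, 131, 135, 142]

n = len(ages)
mean_x = sum(ages) / n
mean_y = sum(systolic) / n

# Sums of squares and cross-products around the means
sxx = sum((x - mean_x) ** 2 for x in ages)
syy = sum((y - mean_y) ** 2 for y in systolic)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, systolic))

slope = sxy / sxx                      # least squares slope
intercept = mean_y - slope * mean_x    # line passes through the means
r = sxy / sqrt(sxx * syy)              # correlation coefficient

print(f"BP = {intercept:.1f} + {slope:.2f}(Age), r = {r:.3f}")
```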

Slide 32

Regression
Regression Statistics
Multiple R    0.897
R Square      0.805
Adj R2        0.804
Std Error     1.062
Observations  178

ANOVA
         df   SSq     MS
Regress  1    819.0   819
Resid    176  198.5   1.13
Total    177  1017.5
F 726.3, Signif 2.29E-64
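The regression statistics in the output above are all derived from the ANOVA rows. The sketch below recomputes them from the printed degrees of freedom and sums of squares; variable names are illustrative. The recomputed F comes out slightly below the printed 726.3, presumably because the printed sums of squares are rounded.

```python
from math import sqrt

# Copied from the ANOVA rows of the regression output above
n_obs = 178
df_regress, ss_regress = 1, 819.0
df_resid, ss_resid = 176, 198.5

ss_total = ss_regress + ss_resid
r_square = ss_regress / ss_total           # share of variance explained
multiple_r = sqrt(r_square)
adj_r_square = 1 - (1 - r_square) * (n_obs - 1) / df_resid

ms_regress = ss_regress / df_regress
ms_resid = ss_resid / df_resid
std_error = sqrt(ms_resid)                 # standard error of the estimate
f_stat = ms_regress / ms_resid

print(round(multiple_r, 3), round(r_square, 3), round(adj_r_square, 3))
print(round(std_error, 3), round(f_stat, 1))
```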

Slide 33

Assumptions of tests
Normal distribution
  Generally NOT true
  Central limit theorem
Non-Parametric methods
  Wilcoxon Rank-Sum test
  Mann Whitney U test
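To illustrate why the non-parametric methods listed above do not depend on a normal distribution, here is a small sketch of the Mann Whitney U statistic. It uses only the ordering of the observations, so an extreme value such as 40 counts no more than any other value that is merely larger. The data and the function name are hypothetical, and no p value is computed.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b): U_a counts the pairs where a value from sample_a
    exceeds a value from sample_b; ties count as one half."""
    u_a = 0.0
    for x in sample_a:
        for y in sample_b:
            if x > y:
                u_a += 1.0
            elif x == y:
                u_a += 0.5
    u_b = len(sample_a) * len(sample_b) - u_a
    return u_a, u_b


group_a = [3, 5, 8, 12, 40]   # hypothetical, skewed by one large value
group_b = [1, 2, 4, 6, 7]     # hypothetical

u_a, u_b = mann_whitney_u(group_a, group_b)
u_stat = min(u_a, u_b)        # the smaller U is the one usually looked up
print(u_a, u_b, u_stat)
```

The Wilcoxon rank-sum test is based on the same ranking information and leads to the same conclusion as the Mann Whitney U test.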

Slide 34

Null hypothesis Ho Generally, this hypothesizes that two groups of measurements are the same. The null hypothesis should be established before the study is carried out. The way the null hypothesis is stated is critical.

Slide 35

Ho example I would like to study whether the WBC is useful in identifying the presence of bacteremia. Ho : The WBC counts in bacteremic and non-bacteremic febrile children are the same.

Slide 36

Results - Mean WBC
Bacteremia: 50 children, mean WBC 17,000
Non-bacteremia: 620 children, mean WBC 11,500
p = 0.06 (not significant)
Accept Ho, i.e., unable to reject Ho

Slide 37

Figure: Mean WBC (thousands): non-bacteremic 11.5 vs. bacteremic 17.0, p=0.06 (NS)

Slide 38

Single sided vs. Double sided probability
Double sided Ho: WBC counts are the same
Single sided Ho: WBC counts in bacteremic children are lower or the same as non-bacteremic children
Figure: p=0.03 in each tail of the distribution
Double sided (2-tail) p=0.06 (NS)
Single sided (1-tail) p=0.03
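The relationship between the 1-tail and 2-tail p values on this slide can be sketched as follows, assuming a test statistic that is approximately standard normal. The value z = 1.88 is hypothetical and was chosen only because it reproduces the slide's p values. The halving holds only when the observed difference lies in the direction named by the single sided hypothesis.

```python
from statistics import NormalDist

z = 1.88   # hypothetical standardized difference (bacteremic minus non-bacteremic)

# Area in the upper tail only: the single sided (1-tail) p value
p_one_tail = 1 - NormalDist().cdf(z)

# Area in both tails: the double sided (2-tail) p value
p_two_tail = 2 * (1 - NormalDist().cdf(abs(z)))

print(round(p_one_tail, 3), round(p_two_tail, 3))
print("1-tail significant:", p_one_tail < 0.05)
print("2-tail significant:", p_two_tail < 0.05)
```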

Slide 39

Rejecting Double vs. Single sided hypotheses
Double sided Ho: WBC counts are the same
Single sided Ho: WBC counts in bacteremic children are lower or the same as non-bacteremic children
If p<0.05:
  Double sided: WBC counts in bacteremic children are significantly DIFFERENT from non-bacteremic patients
  Single sided: WBC counts in bacteremic children are significantly HIGHER than in non-bacteremic patients

Slide 40

Figure: Mean WBC (thousands), non-bacteremic 11.5 vs. bacteremic 17.0. Double sided p=0.06, single sided p=0.03

Slide 41

When to use double vs. single sided test
Single sided test: If you know which way a relationship exists and are only interested in testing this one aspect. Only two possibilities exist: a>b vs. a<=b
  e.g., Hypothesis: Fever (temperature) in bacteremic children is higher than in non-bacteremic children.
Double sided test: If you do not know which way a relationship exists and must test for three possibilities: a=b, a>b, a<b
  e.g., Hypothesis: Using Jamshidi and Cook IO needles requires different IO insertion times.

Slide 42

Statistical Error Types
Type I - The probability of incorrectly concluding that a true difference exists. This is measured with the P value (alpha error).
Type II - The probability of incorrectly concluding that the two groups are the same (i.e., no difference exists). This type of error is generally due to inadequate sample size. There is no perfect way to measure this type of error (beta error).

Slide 43

Examples of possible statistical errors
Type I
  WBC counts in bacteremic patients are higher than in non-bacteremic patients. p=0.03 (alpha error)
Type II
  Gas mileage in Fords & Chevys is the same. p=0.65 (not significant)
  Beta error is not accurately measurable. A "power" calculation is indicated (estimate of beta error).

Slide 44

Estimating Sample Size
Is there really a difference?
Detectable difference (delta).
Variance estimate available.
Power calculation
  Done if no significant difference found
  Likelihood of finding a significant difference for a given delta (difference) value (e.g., 95% power)
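One common way to turn the ingredients on this slide (delta, a variance estimate, and the desired power) into a sample size is the normal-approximation formula for comparing two means, n per group = 2 * ((z_alpha + z_beta) * SD / delta)^2. The sketch below assumes equal group sizes, a common SD, and a two sided test; the function name and the example numbers are hypothetical, and the presentation does not state which formula it had in mind.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate subjects needed per group to detect a mean difference
    of `delta` with a two sided test, given a common SD."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two sided alpha
    z_beta = NormalDist().inv_cdf(power)            # power = 1 - beta
    return ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


# Hypothetical planning values: detect a difference of 5 units when SD is 10
n_80 = n_per_group(delta=5, sd=10, power=0.80)
n_95 = n_per_group(delta=5, sd=10, power=0.95)
print(n_80, n_95)
```

Asking for higher power, or for a smaller detectable difference, raises the required sample size.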

Slide 45

Study Design Types
Experimental design (expensive, clean)
Clinical Trial
  Placebo controlled
  Blinded
Cohort study
Case control study (inexpensive, error-prone)

Slide 46

Time Sequence of Data Observations Prospective Longitudinal Retrospective

Slide 47

Experimental Design Not common in clinical medicine. Usually done in labs (on animals or in vitro). Nearly flawless from a methodological standpoint if done correctly. Conclusions are difficult to dispute in the model used However, the model used may not be applicable to humans.

Slide 48

Clinical Trial
Treatments to be compared are randomized among subjects.
Expensive and resource intensive. Usually published in NEJM.
Better if treatment is blinded, but not always possible.
  Single blind - blinding subject or investigator
  Double blind - blinding both
Trials often not ethically feasible.

Slide 49

Cohort Study
A cohort is identified. Disease outcome and risk factors are assessed within the cohort.
Prospective, longitudinal, or retrospective.
Risk factors are not assigned to cohort members.
Problems with confounding variables (must be corrected or controlled using matching).
Large longitudinal cohort studies include:
  Framingham cohort (Massachusetts)
  Paffenbarger cohort (Stanford)
  Hawaii Heart & Cancer cohort (Honolulu)

Slide 50

Case-Control Cases of disease are compared with cases of non-disease (controls). Exposure (risk) factors compared between the two. Most efficient and least costly study to perform, especially for uncommon disease conditions. High statistical yield. Highly subject to bias due to confounding factors. Matching the cases and controls for confounding factors is the key. Always retrospective.

Slide 51

Does jogging lower MI risk ? Experimental Design Get 2000 rats. Jog 1000 for a year and have the other 1000 watch TV all day. Compare the number of rat MI's in each group. Not easy to get the rats to jog. Not applicable to human disease. 1/5

Slide 52

Does jogging lower MI risk ? Clinical Trial Get 1000 volunteers. Randomize them to jog or non-jog. For 10 years, the joggers must jog and the non-joggers are forbidden from jogging. Compare MI rates at the end of 10 years. Cannot blind subjects. 2/5
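The randomization step described on this slide can be sketched as a shuffle followed by an even split. This is simple randomization under stated assumptions: subject IDs 1 to 1000 are hypothetical, the function name is illustrative, and the fixed seed is there only to make the example repeatable.

```python
import random


def randomize(subject_ids, seed=None):
    """Shuffle the subjects and split them into two equal-sized arms."""
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


volunteers = range(1, 1001)                 # 1000 hypothetical volunteers
joggers, non_joggers = randomize(volunteers, seed=42)
print(len(joggers), len(non_joggers))
```

Because assignment is left to chance rather than to the subjects' own habits, confounding factors should be spread roughly evenly between the two arms.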

Slide 53

Does jogging lower MI risk? Cohort Identify a cohort (e.g., UH faculty). Survey them to determine who jogs and who has had an MI. Compare the MI rates in the joggers with those in the non-joggers. Cohort may only have a few MI cases. Degree of jogging may be heterogeneous. Non-joggers may do other types of comparable activity (e.g., tennis). 3/5

Slide 54

Does jogging lower MI risk ? Case-control Go to the CCU and identify 100 patients with MI. Go to the hernia ward and identify 100 age-sex matched patients who never had an MI. Compare jogging activity in the two groups. Must match for factors such as smoking, diet, alcohol, etc. (confounding variables). 4/5

Slide 55

Does jogging lower MI risk?
Experimental design - Poor choice
Clinical trial - Impossible
Cohort - Probably best choice, but will require longitudinal observation over 10 to 30 years.
Case-control - Fairly easy to do, but highly subject to design flaws.
5/5

Slide 56

THE END Statistics in 1 Hour Loren Yamamoto, MD, MPH, MBA Professor of Pediatrics University of Hawaii John A. Burns School of Medicine
