*=====================================================================.
*LAB COMPUTER SPSS Syntax for Lab 5 of Quantitative Methods 1.
*Hypothesis Tests I.
*-------------------------------------------------------------------------------------------------------------------.
*(c) Gwilym Pryce 31 Oct 05, v5.
*LAB_L5_Hypothesis Tests_SPSS_Gwilym Pryce_31oct05_v5.sps.
*=====================================================================.
*Examples and answers are taken from "Inference and Statistics in SPSS" by Gwilym Pryce, GeeBeeJey Publishing.
*SPSS Syntax downloads, copies of the book and other resources can be obtained from www.geebeejey.co.uk, .
*and from www.gwilympryce.co.uk.
********************************************* NOTE *******************************************************.
*** The macro programs needed for the commands used in this lab are pasted at the end of this file ****.
********************************************* NOTE *******************************************************.
*=====================================================================.
* 5.2.1 Exercise 5.4 Large Sample Hypothesis Tests on One Mean (Pryce).
*-------------------------------------------------------------------------------------------------------------------.
*(see p.5-8 of Gwilym Pryce, 2005, "Inference & Statistics in SPSS: A Course for Business and.
*Social Science", Glasgow: Geebeejey Publishing, www.geebeejey.co.uk, ISBN:0955143306 ).
*=====================================================================.
*Suppose your area of research is the disappearance of thousands of civil servants .
*and other workers during Joseph Stalin's Great Purge in Soviet Russia 1936-38.
*One of the questions you are interested in is the average age of the workers .
*when they disappeared.
*Your thesis is that Stalin felt most threatened by older, more established 'enemies', .
*and so you anticipate their average age to be over 50. Unfortunately, .
*you only have access to 506 records on the age of individuals when they disappeared.
*You have calculated the average age in this sample to be 56.2 years, .
*which would appear to confirm your thesis.
*The standard deviation of your sample was found to be 14.7 years.
*Assuming that your 506 records constitute a random sample .
*from the population of those who disappeared (a questionable assumption?), .
*test your theory about the average age of the Disappeared.
*--------------------------------------------.
*Answer Using the SPSS command.
*--------------------------------------------.
H_L1M n=(506) x_bar=(56.2) m=(50) s=(14.7).
*Large sample sig. test for one mean.
*         N     X_BAR        SE        ZI  SIGZ_2TL  SIGZ_LTL  SIGZ_UTL.
* 506.00000  56.20000    .65349   9.48745    .00000   1.00000    .00000.
*The upper-tail significance level (SIGZ_UTL) is zero to five decimal places, .
*so we can reject the null hypothesis (m = 50) in favour of the alternative that the average age was over 50.
*This corroborates the finding from the last lab that the confidence interval .
*is very narrow because of the large sample size .
*and because of the small standard deviation in the sample:.
*Large sample confidence interval for the population mean.
*         N     X_BAR        ZI        SE       ERR     LOWER     UPPER.
* 506.00000  56.20000   1.96039    .65349   1.28111  54.91889  57.48111.
*=====================================================================.
*5.2.2 Exercise 5.5 Small Sample Hypothesis Tests on One Mean.
*=====================================================================.
*=====================================================================.
*1. Insulin Injections Machine.
*-------------------------------------------------------------------------------------------------------------------.
*(see p.5-9 of Pryce, G. 2005, "Inference and Statistics in SPSS: A Course for Business and.
*Social Science", Glasgow: Geebeejey Publishing, www.geebeejey.co.uk, ISBN:0955143306 ).
*=====================================================================.
*A machine used to fill pre-packaged emergency insulin injections .
*is correctly adjusted if the average net weight of insulin is 11 ounces per syringe.
*A random sample of 107 syringes had an average fill of 10.92 ounces .
*and standard deviation of 0.28 ounces.
*Is the machine adjusted properly? Test at the 5% significance level.
*.
*--------------------------------------------.
*Answer Using the SPSS command.
*--------------------------------------------.
H_L1M n=(107) x_bar=(10.92) m=(11) s=(0.28).
*Large sample sig. test for one mean.
*         N     X_BAR        SE        ZI  SIGZ_2TL  SIGZ_LTL  SIGZ_UTL.
* 107.00000  10.92000    .02707  -2.95545    .00312    .00156    .99844.
*The two-tail significance level (0.003) is well below 0.05, .
*so we reject the null hypothesis that the machine is correctly adjusted.
*We could alternatively do a lower-tail test: that the syringes are being under-filled.
*This leads to an even lower significance level, .
*making it even less likely that our sample is a freak sample.
*Interestingly, if we run a t-test on the same hypothesis we get .
*a slightly more cautious estimate of the significance level, .
*but the result is essentially the same.
*This is because, as the sample size gets larger, .
*the t-distribution estimates of the significance level .
*will tend toward the z-distribution estimates.
*In this case the sample size is fairly large .
*so you'd expect the z and t estimates of P to be similar.
H_S1M n=(107) x_bar=(10.92) m=(11) s=(0.28).
*Small sample sig. test for one mean.
*         N     X_BAR        SE        TI  SIGT_2TL  SIGT_LTL  SIGT_UTL.
* 107.00000  10.92000    .02707  -2.95545    .00385    .00192    .99808.
*=====================================================================.
*2. Average GP Time with Patients.
*-------------------------------------------------------------------------------------------------------------------.
*(see p.5-11 of Pryce, G. 2005, "Inference and Statistics in SPSS: A Course for Business and.
*Social Science", Glasgow: Geebeejey Publishing, www.geebeejey.co.uk, ISBN:0955143306 ).
*=====================================================================.
*A newspaper has claimed that the average time GPs spend with patients .
*in a particular session has fallen to an all-time low of 3.6 minutes.
*A report from the Scottish Executive disagrees, contending that the health service .
*has met New Labour's manifesto target of 4 minutes .
*average consultation time, .
*and that the data used by the newspaper was based on a freak sample.
*The newspaper's survey was based on a random sample of 80 doctors .
*with a mean of 3.60 minutes and a standard deviation of 1.80 minutes.
*Is the government bluffing .
*or is there a good chance that this is indeed a freak sample?.
*Perform the appropriate hypothesis test using a significance level of 0.05.
*--------------------------------------------.
*Answer Using the SPSS command.
*--------------------------------------------.
H_L1M n=(80) x_bar=(3.6) m=(4.0) s=(1.8).
*Large sample sig. test for one mean.
*         N     X_BAR        SE        ZI  SIGZ_2TL  SIGZ_LTL  SIGZ_UTL.
*  80.00000   3.60000    .20125  -1.98762    .04685    .02343    .97657.
*=====================================================================.
*3. Prevalence of American Pronunciation.
*-------------------------------------------------------------------------------------------------------------------.
*(see p.5-12 of Pryce, G. 2005, "Inference & Statistics in SPSS: A Course for Business and.
*Social Science", Glasgow: Geebeejey Publishing, www.geebeejey.co.uk, ISBN:0955143306 ).
*=====================================================================.
*For your PhD, you want to estimate the number of times an American phrase or pronunciation .
*occurs in a typical 5 minute conversation between teenage youths in Liverpool.
*Because of the time taken to build up sufficient rapport with such youths .
*for them to speak in a relaxed way, .
*you only manage to observe 23 such conversations.
*The average number of Americanisms in 5 minute conversations amongst your small sample is 137.71, .
*with a standard deviation of 69.56.
*The last study that was done demonstrated that the average was 128.2 words per 5 minute conversation and this has become the .
*accepted wisdom in the literature.
*Do a hypothesis test to establish whether the average has in fact increased .
*at the 5% significance level.
*Also do a two-tail test for whether there has been any change at all.
*Compare your answer with the 95% confidence interval .
*estimated in the previous set of lab exercises.
*--------------------------------------------.
*Answer Using the SPSS command.
*--------------------------------------------.
*If we use the large sample syntax .
*(i.e. use the z-distribution to calculate the probability of the sample being atypical) .
*then we end up with the following output:.
H_L1M n=(23) x_bar=(137.71) m=(128.2) s=(69.56).
*Large sample sig. test for one mean.
*         N     X_BAR        SE        ZI  SIGZ_2TL  SIGZ_LTL  SIGZ_UTL.
*  23.00000 137.71000  14.50426    .65567    .51204    .74398    .25602.
*Note, however, that the sample size is only 23, .
*and so one should really use the t-distribution:.
H_S1M n=(23) x_bar=(137.71) m=(128.2) s=(69.56).
*Small sample sig. test for one mean.
*         N     X_BAR        SE        TI  SIGT_2TL  SIGT_LTL  SIGT_UTL.
*  23.00000 137.71000  14.50426    .65567    .51884    .74058    .25942.
*The significance level for the upper-tail test was found to be 0.259 .
*(slightly higher than that derived using the z-distribution, .
*reflecting the slightly flatter shape of the t-distribution i.e. more area in the tails).
*This suggests that if we reject the null hypothesis .
*(that the average number of Americanisms amongst Liverpool youth is still 128.2) .
*in favour of the alternative hypothesis .
*(that the average has risen) based on your PhD sample, then there is a 26% chance that we are wrong.
*This is too high a risk to take and so we cannot reject the null hypothesis.
*In other words, we cannot say that the average number of Americanisms has risen.
*If we do a two-tail test, we find that the chance of our sample being a freak sample .
*is more than 50% (sig. = 0.519) .
*and so we have even less of a basis to reject the null hypothesis.
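*--------------------------------------------.
*Optional check (a sketch only, not part of the original lab answers).
*Assuming you want to confirm the small sample test for the 23 conversations without the macro, .
*the t statistic and its significance levels can be computed directly .
*with COMPUTE and the built-in CDF.T function on a one-case dataset.
*The variable names used here (se, ti, sig_ltl, sig_utl, sig_2tl) are illustrative .
*and do not appear elsewhere in this file.
*Note that DATA LIST replaces the active dataset.
*--------------------------------------------.
DATA LIST FREE / n x_bar m s.
BEGIN DATA
23 137.71 128.2 69.56
END DATA.
COMPUTE se = s / SQRT(n).
COMPUTE ti = (x_bar - m) / se.
COMPUTE sig_ltl = CDF.T(ti, n - 1).
COMPUTE sig_utl = 1 - sig_ltl.
COMPUTE sig_2tl = 2 * (1 - CDF.T(ABS(ti), n - 1)).
FORMATS se ti sig_ltl sig_utl sig_2tl (F10.5).
LIST.
*The listed values should agree with the small sample test output reported for this exercise.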
*This result should not surprise us since the 95% confidence interval .
*based on your PhD sample was found to be 108 Americanisms to 168 Americanisms .
*(compare the output below for small and large sample confidence interval estimation):.
CI_L1M n = (23) x_bar = (137.71) s = (69.56) c = (0.95).
*Large sample confidence interval for the population mean.
*         N     X_BAR       ZIL        SE       ERR     LOWER     UPPER.
*  23.00000 137.71000  -1.95996  14.50426  28.42783 109.28217 166.13783.
CI_S1M n = (23) x_bar = (137.71) s = (69.56) c = (0.95).
*Small sample confidence interval for the population mean.
*         N     X_BAR       TIL        SE       ERR     LOWER     UPPER.
*  23.00000 137.71000  -2.07387  14.50426  30.08000 107.63000 167.79000.
*=====================================================================.
*4. Brownfield Contamination.
*-------------------------------------------------------------------------------------------------------------------.
*(see p.5-13 of Pryce, G. 2005, "Inference & Statistics in SPSS: A Course for Business and.
*Social Science", Glasgow: Geebeejey Publishing, www.geebeejey.co.uk, ISBN:0955143306 ).
*=====================================================================.
*As part of your research you seek to analyse the policy of using the planning system .
*to encourage the building of residential properties on brownfield sites .
*(i.e. recycled industrial land), rather than greenfield sites .
*(i.e. former agricultural or park land).
*One of your concerns is that brownfield land is more likely to be contaminated, .
*and the methods for surveying the level of contamination do not preclude the possibility .
*that a site may be declared safe when it is not.
*Surveys gauge contamination by taking bore-extracts every 100m.
*The land is declared safe for residential construction if the average level of toxicity .
*is no more than 1g per extract.
*In your case study area, the former steelworks site in Cambuslang, Glasgow, .
*you find that a random sample of 64 bores has been taken yielding an average of 0.88g .
*with standard deviation of 0.79g. The average is below the safety threshold .
*but do you think the Local Authority should grant this site residential planning permission?.
*--------------------------------------------.
*Answer Using the SPSS command.
*--------------------------------------------.
H_L1M n=(64) x_bar=(0.88) m=(1) s=(0.79).
*Large sample sig. test for one mean.
*         N     X_BAR        SE        ZI  SIGZ_2TL  SIGZ_LTL  SIGZ_UTL.
*  64.00000    .88000    .09875  -1.21519    .22429    .11215    .88785.
*The SIGZ_LTL figure is the probability of observing a value of z smaller than -1.215.
*If we reject the null that the mean level of contamination in extracts equals 1g in favour of .
*the alternative hypothesis that it is less than 1g, .
*there is more than a one in ten chance that we have rejected the null incorrectly.
*If we reject the null in favour of the alternative hypothesis .
*that it is greater than 1g, .
*there is an 88% chance that we will have rejected the null incorrectly.
*=====================================================================.
*5. Steel Worker Exposure to Contamination.
*-------------------------------------------------------------------------------------------------------------------.
*(see p.5-13 of Pryce, G. 2005, "Inference & Statistics in SPSS: A Course for Business and.
*Social Science", Glasgow: Geebeejey Publishing, www.geebeejey.co.uk, ISBN:0955143306 ).
*=====================================================================.
*As part of your research into the contamination levels experienced by workers .
*in the Cambuslang steel industry in the first half of the twentieth century, .
*you examine 273 random medical checks on workers .
*which reveal an average contamination level of 92.7 units .
*with a standard deviation of 39.7 units.
*The legal threshold for exposure is that the average should not exceed 94 units .
*and so the steel industry always claimed on the basis of this sample .
*that their workers were safe.
*How sure can you be that this conclusion is valid?.
*--------------------------------------------.
*Answer Using the SPSS command.
*--------------------------------------------.
*The question, of course, is whether, when we take into account random sampling variation, .
*the population mean is likely to be below the legal threshold.
*Let's do an upper-tail test to see if the population mean is likely to be above the legal limit:.
*H0: m = 94.
*H1: m > 94.
H_L1M n=(273) x_bar=(92.7) m=(94) s=(39.7).
*Large sample sig. test for one mean.
*         N     X_BAR        SE        ZI  SIGZ_2TL  SIGZ_LTL  SIGZ_UTL.
* 273.00000  92.70000   2.40275   -.54105    .58848    .29424    .70576.
*These results show that, because of the large standard deviation in the sample, .
*we cannot reject the null hypothesis that the mean exposure score in the population of workers equals 94 units, .
*whether the alternative hypothesis is upper-, lower- or two-tail.
*If we look at the confidence interval, this result is not surprising since .
*the upper limit on the 95% confidence interval is 97.4 units.
*In short, we cannot be sure on the basis of this sample .
*that the population mean is not well above the legal threshold.
CI_L1M n=(273) x_bar=(92.7) s=(39.7) c=(0.95).
*Large sample confidence interval for the population mean.
*         N     X_BAR       ZIL        SE       ERR     LOWER     UPPER.
* 273.00000  92.70000  -1.95996   2.40275   4.70931  87.99069  97.40931.
*=====================================================================.
*6. Sectarian Attitudes Among Rangers Fans.
*-------------------------------------------------------------------------------------------------------------------.
*(see p.5-14 of Pryce, G. 2005, "Inference & Statistics in SPSS: A Course for Business and.
*Social Science", Glasgow: Geebeejey Publishing, www.geebeejey.co.uk, ISBN:0955143306 ).
*=====================================================================.
*As a research assistant on a project investigating sectarian attitudes .
*amongst Rangers supporters, .
*you interview 24 supporters in pubs after a football match at Ibrox.
*You find that the average age of supporters with sectarian attitudes is 22.7 years .
*(s.d. = 9.2 years), .
*which is well below the most recent estimate published five years ago, .
*which said that the average age of this group was around 30 years.
*Run a lower-tail t-test to see if this difference between estimates is statistically significant and reflect on the .
*robustness of your method.
*--------------------------------------------.
*Answer Using the SPSS command.
*--------------------------------------------.
*H0: m = 30.
*H1: m < 30.
H_S1M n=(24) x_bar=(22.7) m=(30) s=(9.2).
*Small sample sig. test for one mean.
*         N     X_BAR        SE        TI  SIGT_2TL  SIGT_LTL  SIGT_UTL.
*  24.00000  22.70000   1.87794  -3.88723    .00074    .00037    .99963.
*.
*This result tells us that, despite the small sample, .
*you can be pretty confident of rejecting the null hypothesis .
*(that the average age of sectarian Rangers supporters equals 30) .
*in favour of the alternative hypothesis that the average age is less than 30 years.
*There is less than a one in a thousand chance that your rejection of the null is incorrect.
*Note, however, that the t-test assumes a normally distributed variable .
*(i.e. that the age of sectarian Rangers supporters is normally distributed) but this may well not be the case.
*If the normality assumption fails and you have a small sample, then non-parametric methods should be used.
*If you have a large sample, then the normality assumption matters much less .
*because the Central Limit Theorem will start to kick in .
*(this says that the sampling distribution of the mean is approximately normal even if the variable itself is not normal, provided the .
*sample sizes used in the repeated samples are large).
*The t-test (and z-test and confidence interval estimates and any other form of inference) .
*also assumes that your sample is randomly selected from all sectarian Rangers supporters, .
*but this might not be the case. Older supporters, for example, may be less likely to attend the match .
*(rather than watch on television) .
*and may be less (or more?) likely to stay around for a drink at the pub.
*The type of sample may also be affected by the outcome of the match, .
*as may your measurement of sectarianism.
*All these lead to what is known as sample selection bias.
*Such bias undermines our ability to make inferences about the population from a particular sample.
*=====================================================================.
*End of exercises.
*=====================================================================.
*#############################################################################.
*#############################################################################.
*#############################################################################.
*#############################################################################.
*#############################################################################.
*#############################################################################.
*=====================================================================.
*Macro Programs.
*=====================================================================.
*If these macros have not already been installed on the lab machines, simply highlight all the programs below.
*Then run them as one command by pressing CTRL+R.
*You will then be able to use the macro commands.
*---- Highlight from the start of this line... -------------------------------------------.
DEFINE pz_lt_zi (!POSITIONAL !ENCLOSE('(',')')).
GET FILE='Q:\QUANTS\one.sav'.
compute Zi_Var = !1 .
COMPUTE PROB = CDFNORM(Zi_Var).
execute.
MATRIX.
GET PROB_VAR /VARIABLES = PROB.
GET Zi_Var /VARIABLES = Zi_Var.
COMPUTE Prob = PROB_VAR(1).
COMPUTE zi = Zi_Var(1).
COMPUTE ANSWER = {zi, PROB}.
PRINT ANSWER
  / FORMAT "F10.5"
  /Title = " Prob(z < zi) for a given zi "
  / CLABELS = zi, Prob.
END MATRIX.
!ENDDEFINE.
DEFINE pz_gt_zi (!POSITIONAL !ENCLOSE('(',')')).
GET FILE='Q:\QUANTS\one.sav'.
compute Zi_Var = !1 .
COMPUTE PROB = 1 - CDFNORM(Zi_Var).
execute.
MATRIX.
GET PROB_VAR /VARIABLES = PROB.
GET Zi_Var /VARIABLES = Zi_Var.
COMPUTE Prob = PROB_VAR(1).
COMPUTE zi = Zi_Var(1).
COMPUTE ANSWER = {zi, PROB}.
PRINT ANSWER
  / FORMAT "F10.5"
  /Title = " Prob(z > zi) for a given zi "
  / CLABELS = zi, Prob.
END MATRIX.
!ENDDEFINE.
DEFINE pz_lg_zi (zil = !ENCLOSE('(',')')
  / ziu = !ENCLOSE('(',')')).
GET FILE='Q:\QUANTS\one.sav'.
compute ZiL_Var = !zil .
compute ZiU_Var = !ziu .
execute.
COMPUTE PROBL = CDFNORM(ZiL_Var).
COMPUTE PROBU = 1 - CDFNORM(ZiU_Var).
COMPUTE PROBLG = PROBL + PROBU.
execute.
MATRIX.
GET PROB_VAR /VARIABLES = PROBLG.
GET ZiL_Var /VARIABLES = ZiL_Var.
GET ZiU_Var /VARIABLES = ZiU_Var.
COMPUTE Prob = PROB_VAR(1).
COMPUTE ziL = ZiL_Var(1).
COMPUTE ziU = ZiU_Var(1).
COMPUTE ANSWER = {ziL, ziU, PROB}.
PRINT ANSWER
  / FORMAT "F10.5"
  /Title = " Prob((z < ziL) OR (z > ziU)) for a given zi "
  / CLABELS = ziL, ziU, Prob.
END MATRIX.
!ENDDEFINE.
DEFINE pz_gl_zi (zil = !ENCLOSE('(',')')
  / ziu = !ENCLOSE('(',')')).
GET FILE='Q:\QUANTS\one.sav'.
compute ZiL_Var = !zil .
compute ZiU_Var = !ziu .
execute.
COMPUTE PROBL = CDFNORM(ZiL_Var).
COMPUTE PROBU = 1 - CDFNORM(ZiU_Var).
COMPUTE PROBLG = 1 - (PROBL + PROBU).
execute.
MATRIX.
GET PROB_VAR /VARIABLES = PROBLG.
GET ZiL_Var /VARIABLES = ZiL_Var.
GET ZiU_Var /VARIABLES = ZiU_Var.
COMPUTE Prob = PROB_VAR(1).
COMPUTE ziL = ZiL_Var(1).
COMPUTE ziU = ZiU_Var(1).
COMPUTE ANSWER = {ziL, ziU, PROB}.
PRINT ANSWER
  / FORMAT "F10.5"
  /Title = " Prob(ziL < z < ziU) for a given zi "
  / CLABELS = ziL, ziU, Prob.
END MATRIX.
!ENDDEFINE.
DEFINE zi_lt_zp (p = !ENCLOSE('(',')')).
GET FILE='Q:\QUANTS\one.sav'.
COMPUTE Zi = PROBIT(!p).
EXECUTE.
MATRIX.
GET Zi_VAR /VARIABLES = Zi.
COMPUTE Zi = Zi_VAR(1).
COMPUTE PROB= {!p}.
/*Enter the given probability into the curly brackets*/
COMPUTE ANSWER = {Zi, PROB}.
PRINT ANSWER
  / FORMAT "F10.5"
  /Title = " Value of zi such that Prob(z < zi) = PROB when PROB is given"
  / CLABELS = zi, PROB.
END MATRIX.
!END DEFINE.
DEFINE zi_gt_zp (p = !ENCLOSE('(',')')).
GET FILE='Q:\QUANTS\one.sav'.
COMPUTE Zi = PROBIT(1-!p).
EXECUTE.
MATRIX.
GET Zi_VAR /VARIABLES = Zi.
COMPUTE Zi = Zi_VAR(1).
COMPUTE PROB= {!p}. /*Enter the given probability into the curly brackets*/
COMPUTE ANSWER = {Zi, PROB}.
PRINT ANSWER
  / FORMAT "F10.5"
  /Title = " Value of zi such that Prob(z > zi) = PROB when PROB is given"
  / CLABELS = zi, PROB.
END MATRIX.
!END DEFINE.
DEFINE zi_gl_zp (p = !ENCLOSE('(',')')).
GET FILE='Q:\QUANTS\one.sav'.
COMPUTE PROB = !p.
COMPUTE PROBLG = 1 - !p.
COMPUTE PROBL = PROBLG / 2.
COMPUTE ZiL_Var = PROBIT(PROBL).
COMPUTE ZiU_Var = -1 * ZiL_Var .
execute.
MATRIX.
GET PROB_VAR /VARIABLES = PROB.
GET ZiL_Var /VARIABLES = ZiL_Var.
GET ZiU_Var /VARIABLES = ZiU_Var.
COMPUTE Prob = PROB_VAR(1).
COMPUTE ziL = ZiL_Var(1).
COMPUTE ziU = ZiU_Var(1).
COMPUTE ANSWER = {ziL, ziU, PROB}.
PRINT ANSWER
  / FORMAT "F10.5"
  /Title = " Value of zi such that Prob(-zi < z < zi) = PROB, when PROB is given "
  / CLABELS = ziL, ziU, Prob.
END MATRIX.
!ENDDEFINE.
DEFINE CI_L1M (n = !ENCLOSE('(',')')
  /x_bar = !ENCLOSE('(',')')
  /s = !ENCLOSE('(',')')
  /c = !ENCLOSE('(',')')).
GET FILE='Q:\QUANTS\one.sav'.
COMPUTE PROB = !c.
COMPUTE PROBLG = 1 - !c.
COMPUTE PROBL = PROBLG / 2.
COMPUTE ZiL = PROBIT(PROBL).
COMPUTE ZiU = -1 * ZiL .
execute.
MATRIX.
COMPUTE n = {!n}. /* Enter the sample size here (i.e. change the number in curly brackets)*/
COMPUTE x_bar = {!x_bar}. /* Enter the sample mean here*/
COMPUTE s = {!s}. /* Enter the sample standard deviation here*/
COMPUTE SE = s/SQRT(n).
GET ZiL /VARIABLES = ZiL.
COMPUTE ERR = -ZiL * SE.
COMPUTE LOWER = x_bar - err.
COMPUTE UPPER = x_bar + err.
COMPUTE ANSWER = {n, x_bar, ZiL, SE, err, Lower, Upper}.
PRINT ANSWER
  / FORMAT "F10.5"
  /Title = "Large sample confidence interval for the population mean"
  / CLABELS = n, x_bar, ZiL, SE, err, Lower, Upper.
END MATRIX.
!END DEFINE.
DEFINE CI_S1M (n = !ENCLOSE('(',')')
  /x_bar = !ENCLOSE('(',')')
  /s = !ENCLOSE('(',')')
  /c = !ENCLOSE('(',')')).
GET FILE='Q:\QUANTS\one.sav'.
COMPUTE df = !n - 1.
COMPUTE PROB = !c.
COMPUTE PROBLG = 1 - !c.
COMPUTE PROBL = PROBLG / 2.
COMPUTE TiL = IDF.T(PROBL, df).
COMPUTE TiU = -1 * TiL .
execute.
MATRIX.
COMPUTE n = {!n}. /* Enter the sample size here (i.e. change the number in curly brackets)*/
COMPUTE x_bar = {!x_bar}. /* Enter the sample mean here*/
COMPUTE s = {!s}. /* Enter the sample standard deviation here*/
COMPUTE SE = s/SQRT(n).
GET TiL /VARIABLES = TiL.
GET df /VARIABLES = df.
COMPUTE ERR = -TiL * SE.
COMPUTE LOWER = x_bar - err.
COMPUTE UPPER = x_bar + err.
COMPUTE ANSWER = {n, x_bar, TiL, SE, err, Lower, Upper}.
PRINT ANSWER
  / FORMAT "F10.5"
  /Title = "Small sample confidence interval for the population mean"
  / CLABELS = n, x_bar, TiL, SE, err, Lower, Upper.
END MATRIX.
!END DEFINE.
DEFINE CI_S2Mp (n1 = !ENCLOSE('(',')')
  /n2 = !ENCLOSE('(',')')
  /x_bar1 = !ENCLOSE('(',')')
  /x_bar2 = !ENCLOSE('(',')')
  /s1 = !ENCLOSE('(',')')
  /s2 = !ENCLOSE('(',')')
  /c = !ENCLOSE('(',')')).
GET FILE='Q:\QUANTS\one.sav'.
COMPUTE df = !n1 + !n2 - 2.
COMPUTE PROB = !c.
COMPUTE PROBLG = 1 - !c.
COMPUTE PROBL = PROBLG / 2.
COMPUTE TiL = IDF.T(PROBL, df).
COMPUTE TiU = -1 * TiL .
execute.
MATRIX.
GET df / variables = df. /* Enter the df here (i.e. change the number in curly brackets)*/
COMPUTE x_bar1 = {!x_bar1}. /* Enter the sample mean here*/
COMPUTE x_bar2 = {!x_bar2}. /* Enter the sample mean here*/
COMPUTE sp = SQRT(( (!n1 - 1)* !s1**2 + (!n2 - 1) * !s2**2 ) / (!n1 + !n2 - 2) ).
COMPUTE SE = sp*(SQRT((1/!n1) + (1/!n2))).
GET TiL /VARIABLES = TiL.
GET df /VARIABLES = df.
COMPUTE ERR = -TiL * SE.
COMPUTE SAMPDIFF = x_bar1 - x_bar2.
COMPUTE LOWER = SAMPDIFF - err.
COMPUTE UPPER = SAMPDIFF + err.
COMPUTE ANSWER = {SAMPDIFF, SP, TiL, SE, err, Lower, Upper}.
PRINT ANSWER
  / FORMAT "F10.5"
  /Title = "CI for the difference between 2 population means (pooled variance)"
  / CLABELS = SAMPDIFF, SP, TiL, SE, err, Lower, Upper.
END MATRIX.
!END DEFINE.
DEFINE CI_S2Md (n1 = !ENCLOSE('(',')')
  /n2 = !ENCLOSE('(',')')
  /x_bar1 = !ENCLOSE('(',')')
  /x_bar2 = !ENCLOSE('(',')')
  /s1 = !ENCLOSE('(',')')
  /s2 = !ENCLOSE('(',')')
  /c = !ENCLOSE('(',')')).
GET FILE='Q:\QUANTS\one.sav'.
COMPUTE df = min((!n1 -1), (!n2 - 1)).
COMPUTE PROB = !c.
COMPUTE PROBLG = 1 - !c.
COMPUTE PROBL = PROBLG / 2.
COMPUTE TiL = IDF.T(PROBL, df).
COMPUTE TiU = -1 * TiL .
execute.
MATRIX.
GET df / variables = df. /* Enter the df here (i.e. change the number in curly brackets)*/
COMPUTE x_bar1 = {!x_bar1}. /* Enter the sample mean here*/
COMPUTE x_bar2 = {!x_bar2}. /* Enter the sample mean here*/
COMPUTE SE = SQRT((!s1**2/!n1) + (!s2**2/!n2)).
GET TiL /VARIABLES = TiL.
GET df /VARIABLES = df.
COMPUTE ERR = -TiL * SE.
COMPUTE SAMPDIFF = x_bar1 - x_bar2.
COMPUTE LOWER = SAMPDIFF - err.
COMPUTE UPPER = SAMPDIFF + err.
COMPUTE ANSWER = {SAMPDIFF, TiL, SE, err, Lower, Upper}.
PRINT ANSWER
  / FORMAT "F10.5"
  /Title = "CI for the difference between 2 population means (different variances)"
  / CLABELS = SAMPDIFF, TiL, SE, err, Lower, Upper.
END MATRIX.
!END DEFINE.
DEFINE CI_L1P (n = !ENCLOSE('(',')')
  /x = !ENCLOSE('(',')')
  /c = !ENCLOSE('(',')')).
GET FILE='Q:\QUANTS\one.sav'.
COMPUTE PROB = !c.
COMPUTE PROBLG = 1 - !c.
COMPUTE PROBL = PROBLG / 2.
COMPUTE ZiL = PROBIT(PROBL).
COMPUTE ZiU = -1 * ZiL .
execute.
MATRIX.
COMPUTE n = !n. /* Enter the sample size here */
COMPUTE x = !x. /* Enter the number of "successes" or particular outcomes here */
COMPUTE CONFID = !c. /* Enter the desired confidence level here */
COMPUTE pTrad = x/n. /* the traditional estimate of the pop. proportion used in CI estimation */
COMPUTE pWlsn = (x+2)/(n + 4). /* the Wilson estimate */
GET zstar /VARIABLES = ZiL.
COMPUTE SE_Trad = SQRT((pTrad*(1-pTrad))/n).
COMPUTE SE_Wlsn = SQRT((pWlsn*(1-pWlsn))/(n+4)).
COMPUTE eTrad = -zstar * SE_Trad.
COMPUTE eWlsn = -zstar * SE_Wlsn.
COMPUTE LOW_Trad = pTrad - eTrad.
COMPUTE LOW_Wlsn = pWlsn - eWlsn.
COMPUTE UP_Trad = pTrad + eTrad.
COMPUTE UP_Wlsn = pWlsn + eWlsn.
COMPUTE ANSWER = {pTrad, zstar, se_trad, etrad, low_trad, up_trad}.
PRINT ANSWER
  / FORMAT "F10.6"
  /Title = "Traditional Large sample CI for one proportion"
  / CLABELS = ptrad, zstar, se_trad, etrad, low_trad, up_trad.
COMPUTE ANSWER = {pWlsn, zstar, se_wlsn, ewlsn, low_wlsn, up_wlsn}.
PRINT ANSWER
  / FORMAT "F10.6"
  /Title = "Wilson Large sample CI for one proportion"
  / CLABELS = pwlsn, zstar, se_wlsn, ewlsn, low_wlsn, up_wlsn.
END MATRIX.
!ENDDEFINE.
DEFINE N_L1M (e = !ENCLOSE('(',')')
  /c = !ENCLOSE('(',')')
  /s = !ENCLOSE('(',')')).
GET FILE='Q:\QUANTS\one.sav'.
COMPUTE PROB = !c.
COMPUTE E = !E.
COMPUTE PROBLG = 1 - !c.
COMPUTE PROBL = PROBLG / 2.
COMPUTE ZiL_Var = PROBIT(PROBL).
COMPUTE ZiU_Var = -1 * ZiL_Var .
COMPUTE N = (ZiL_Var**2) * (!s**2) / (!e**2).
execute.
MATRIX.
GET PROB_VAR /VARIABLES = PROB.
GET N / VARIABLES = N.
GET E / VARIABLES = E.
GET ZiL_Var /VARIABLES = ZiL_Var.
GET ZiU_Var /VARIABLES = ZiU_Var.
COMPUTE Prob = PROB_VAR(1).
COMPUTE ziL = ZiL_Var(1).
COMPUTE ziU = ZiU_Var(1).
COMPUTE ANSWER = {e, PROB, ziL, ziU, N}.
PRINT ANSWER
  / FORMAT "F10.5"
  /Title = "n_hat = estimated sample size needed to achieve an error of size e given c"
  / CLABELS = e, c, ziL, ziU, n_hat.
END MATRIX.
!ENDDEFINE.
DEFINE H_L1M (n = !ENCLOSE('(',')')
  / x_bar = !ENCLOSE('(',')')
  / m = !ENCLOSE('(',')')
  / s = !ENCLOSE('(',')') ).
GET FILE='Q:\QUANTS\one.sav'.
COMPUTE x_bar = !x_bar.
COMPUTE m = !m.
COMPUTE s = !s.
COMPUTE n = !n.
COMPUTE zi = (!x_bar - !m) / (!s /sqrt(!n)) .
execute.
* This calculates the Large-Sample Significance Test for a Single Population mean.
MATRIX.
GET ZI / VARIABLES = zi.
GET X_BAR / VARIABLES = X_BAR.
GET N / VARIABLES = N.
GET S / VARIABLES = S.
COMPUTE SE = S / (SQRT(N)).
COMPUTE SIGz_2TL = 2 * (1 - CDFNORM(ABS(Zi))).
COMPUTE SIGz_LTL = CDFNORM(Zi).
COMPUTE SIGz_UTL = 1 - CDFNORM(Zi).
COMPUTE ANSWER = {n, x_bar, SE, zi, SIGz_2TL, SIGz_LTL, SIGz_UTL}.
PRINT ANSWER
  / FORMAT "F10.5"
  /Title = "Large sample sig. test for one mean"
  / CLABELS = n, x_bar, SE, zi, SIGz_2TL, SIGz_LTL, SIGz_UTL.
END MATRIX.
!ENDDEFINE.
DEFINE H_S1M (n = !ENCLOSE('(',')')
  / x_bar = !ENCLOSE('(',')')
  / m = !ENCLOSE('(',')')
  / s = !ENCLOSE('(',')')).
GET FILE='Q:\QUANTS\one.sav'.
COMPUTE x_bar = !x_bar.
COMPUTE m = !m.
COMPUTE s = !s.
COMPUTE n = !n.
COMPUTE ti = (!x_bar - !m) / (!s /sqrt(!n)) .
execute.
MATRIX.
GET TI / VARIABLES = ti.
GET X_BAR / VARIABLES = X_BAR.
GET N / VARIABLES = N.
GET S / VARIABLES = S.
COMPUTE SE = S / (SQRT(N)).
COMPUTE SIGt_2TL = 2 * (1 - TCDF(ABS(ti), n-1)).
COMPUTE SIGt_LTL = TCDF(ti, n-1).
COMPUTE SIGt_UTL = 1 - TCDF(ti, n-1).
COMPUTE ANSWER = {n, x_bar, SE, ti, SIGt_2TL, SIGt_LTL, SIGt_UTL}.
PRINT ANSWER
  / FORMAT "F10.5"
  /Title = "Small sample sig. test for one mean"
  / CLABELS = n, x_bar, SE, ti, SIGt_2TL, SIGt_LTL, SIGt_UTL.
END MATRIX.
!ENDDEFINE.
DEFINE H_S2Mp (n1 = !ENCLOSE('(',')')
  / n2 = !ENCLOSE('(',')')
  / x_bar1 = !ENCLOSE('(',')')
  / x_bar2 = !ENCLOSE('(',')')
  / s1 = !ENCLOSE('(',')')
  / s2 = !ENCLOSE('(',')') ).
GET FILE='Q:\QUANTS\one.sav'.
MATRIX.
COMPUTE n1 = {!n1}. /* Enter the first sample size here */
COMPUTE n2 = {!n2}. /* Enter the second sample size here */
COMPUTE x_bar1 = {!x_bar1}. /* Enter the mean of sample 1 here */
COMPUTE x_bar2 = {!x_bar2}. /* Enter the mean of sample 2 here */
COMPUTE s1 = {!s1}. /* Enter the first standard deviation here*/
COMPUTE s2 = {!s2}. /* Enter the second standard deviation here*/
COMPUTE sp = SQRT(((n1-1)*s1**2+(n2-1)*s2**2)/(n1 + n2 -2)).
COMPUTE x1b_x2b = x_bar1 - x_bar2.
COMPUTE SE = sp * SQRT((1/n1)+(1/n2)).
COMPUTE ti = (x_bar1 - x_bar2) /SE.
COMPUTE df = n1 + n2 -2.
COMPUTE SIGt_2TL = 2 * (1 - TCDF(ABS(ti), df)).
COMPUTE SIGt_LTL = TCDF(ti, df).
COMPUTE SIGt_UTL = 1 - TCDF(ti, df).
COMPUTE ANSWER = {df, x1b_x2b, SE, ti, SIGt_2TL, SIGt_LTL, SIGt_UTL}.
PRINT ANSWER
  / FORMAT "F10.5"
  /Title = "Equal vars Indep samples t-test for equal means (H0 pop1ave = pop2ave)"
  / CLABELS = df, x1b_x2b, SE, ti, SIGt_2TL, SIGt_LTL, SIGt_UTL.
END MATRIX.
!ENDDEFINE.
DEFINE H_S2Md (n1 = !ENCLOSE('(',')')
  / n2 = !ENCLOSE('(',')')
  / x_bar1 = !ENCLOSE('(',')')
  / x_bar2 = !ENCLOSE('(',')')
  / s1 = !ENCLOSE('(',')')
  / s2 = !ENCLOSE('(',')') ).
GET FILE='Q:\QUANTS\one.sav'.
COMPUTE n1 = !n1.
COMPUTE n2 = !n2.
COMPUTE x_bar1 = !x_bar1.
COMPUTE x_bar2 = !x_bar2.
COMPUTE s1 = !s1.
COMPUTE s2 = !s2.
COMPUTE SE = SQRT((s1**2/n1)+(s2**2/n2)).
COMPUTE x1b_x2b = x_bar1 - x_bar2.
COMPUTE ti = (x_bar1 - x_bar2) /SE.
COMPUTE df1 = n1 - 1.
COMPUTE df2 = n2 - 1. /* was n2 - 2; now matches the min(n1 - 1, n2 - 1) rule used in CI_S2Md */
COMPUTE df = MIN(df1, df2).
COMPUTE SIGt_2TL = 2 * (1 - CDF.T(ABS(ti), df)).
COMPUTE SIGt_LTL = CDF.T(ti, df).
COMPUTE SIGt_UTL = 1 - CDF.T(ti, df).
execute.
MATRIX.
GET TI / VARIABLES = ti.
GET x1b_x2b / VARIABLES = x1b_x2b .
GET df / VARIABLES = df.
GET SE / VARIABLES = SE.
GET SIGt_2TL / VARIABLES = SIGt_2TL.
GET SIGt_LTL / VARIABLES = SIGt_LTL.
GET SIGt_UTL / VARIABLES = SIGt_UTL.
COMPUTE ANSWER = {df, x1b_x2b, SE, ti, SIGt_2TL, SIGt_LTL, SIGt_UTL}.
PRINT ANSWER
  / FORMAT "F10.6"
  /Title = "Uneq vars Indep samples t-test for equality of means (H0 pop1ave = pop2ave)"
  / CLABELS = df, x1b_x2b, SE, ti, SIGt_2TL, SIGt_LTL, SIGt_UTL.
END MATRIX.
!ENDDEFINE.
DEFINE H_S2VF (n1 = !ENCLOSE('(',')')
  / n2 = !ENCLOSE('(',')')
  / s1 = !ENCLOSE('(',')')
  / s2 = !ENCLOSE('(',')') ).
GET FILE='Q:\QUANTS\one.sav'.
COMPUTE n1 = !n1.
COMPUTE n2 = !n2.
COMPUTE s1 = !s1.
COMPUTE s2 = !s2.
EXECUTE.
DO IF (s1 GE s2) .
COMPUTE V_nmrtor = s1 * s1.
COMPUTE V_dnmtor = s2 * s2.
COMPUTE n_nmrtor = n1 - 1.
COMPUTE n_dnmtor = n2 - 1.
ELSE IF (s1 LT s2) .
COMPUTE V_nmrtor = s2 * s2.
COMPUTE V_dnmtor = s1* s1.
COMPUTE n_nmrtor = n2 -1.
COMPUTE n_dnmtor = n1 - 1.
END IF.
COMPUTE Fc = V_nmrtor / V_dnmtor.
COMPUTE SIGF_UTL = 1 - CDF.F(Fc, n_nmrtor, n_dnmtor).
EXECUTE.
MATRIX.
GET V_nmrtor / VARIABLES = V_nmrtor .
GET V_dnmtor / VARIABLES = V_dnmtor .
GET n_nmrtor / VARIABLES = n_nmrtor .
GET n_dnmtor / VARIABLES = n_dnmtor .
GET Fc / VARIABLES = Fc .
GET SIGF_UTL / VARIABLES = SIGF_UTL .
COMPUTE ANSWER = {V_nmrtor, V_dnmtor, n_nmrtor, n_dnmtor, Fc, SIGF_UTL}.
PRINT ANSWER / FORMAT "F10.5" /Title = "F-Test for equality of variance (H0 V_nmrtor = V_dnmtor)" / CLABELS = V_nmrtor, V_dnmtor, df_nmrtor, df_dnmtor, Fc, SIGF_UTL.
END MATRIX.
!ENDDEFINE.
*H_L1P: large sample significance test for one proportion.
DEFINE H_L1P (n = !ENCLOSE('(',')') / x = !ENCLOSE('(',')') / pi = !ENCLOSE('(',')') ).
GET FILE='Q:\QUANTS\one.sav'.
MATRIX.
COMPUTE n = {!n}. /* Enter the sample size here */
COMPUTE x = {!x}. /* Enter the number of "successes" or particular outcomes here */
COMPUTE pi = {!pi}. /* Enter the hypothesised value for the population proportion */
COMPUTE p = x/n.
COMPUTE s2 = pi*(1-pi).
COMPUTE SE_pi = SQRT(s2/n).
COMPUTE z = (p - pi) /SE_pi.
COMPUTE SIGz_2TL = 2 * (1 - CDFNORM(ABS(Z))).
COMPUTE SIGz_LTL = CDFNORM(Z).
COMPUTE SIGz_UTL = 1 - CDFNORM(Z).
COMPUTE ANSWER = {n, p, SE_pi, z, SIGz_2TL, SIGz_LTL, SIGz_UTL}.
PRINT ANSWER / FORMAT "F10.5" /Title = "Large sample sig. test for one proportion" / CLABELS = n, p, SE_pi, z, SIGz_2TL, SIGz_LTL, SIGz_UTL.
END MATRIX.
!ENDDEFINE.
*H_L2P: large sample significance test for two proportions (pooled proportion under H0).
DEFINE H_L2P (n1 = !ENCLOSE('(',')') / n2 = !ENCLOSE('(',')') / x1 = !ENCLOSE('(',')') / x2 = !ENCLOSE('(',')')).
GET FILE='Q:\QUANTS\one.sav'.
MATRIX.
COMPUTE n1 = {!n1}. /* Enter the first sample size here */
COMPUTE n2 = {!n2}. /* Enter the second sample size here */
COMPUTE x1 = {!x1}. /* Enter the number of "successes" or particular outcomes for sample 1 here */
COMPUTE x2 = {!x2}. /* Enter the number of "successes" or particular outcomes for sample 2 here */
COMPUTE p1 = x1/n1.
COMPUTE p2 = x2/n2.
COMPUTE phat = (x1 + x2) / (n1 + n2).
COMPUTE SE_phat = SQRT(phat * (1 - phat) * ((1/n1) + (1/n2))).
COMPUTE z = (p1 - p2) /SE_phat.
COMPUTE SIGz_2TL = 2 * (1 - CDFNORM(ABS(z))).
COMPUTE SIGz_LTL = CDFNORM(Z).
COMPUTE SIGz_UTL = 1 - CDFNORM(Z).
COMPUTE ANSWER = {p1, p2, SE_phat, z, SIGz_2TL, SIGz_LTL, SIGz_UTL}.
PRINT ANSWER / FORMAT "F10.5" /Title = "Large sample sig. test for two proportions" / CLABELS = p1, p2, SE, z, SIGz_2TL, SIGz_LTL, SIGz_UTL.
END MATRIX.
!ENDDEFINE.
*CLT: draws repeated samples of a variable and plots the distribution of the sample means.
DEFINE CLT (variable = !ENCLOSE('(',')') /nsample = !ENCLOSE('(',')') /Npop = !ENCLOSE('(',')') /reps = !ENCLOSE('(',')') ).
!DO !L = 1 !TO !reps.
- TITLE !reps Repeated Samples of size !nsample .
- temporary.
- sample !nsample from !Npop.
- MATRIX.
- GET VARIABLE / VARIABLES = !variable.
- COMPUTE N = NROW(VARIABLE).
- COMPUTE I = MAKE(n,1,1).
- COMPUTE X_BAR = (1/N)*(TRANSPOS(I) * VARIABLE).
- SAVE {X_BAR} / OUTFILE =!CONCAT('"H:\CLT__', !variable, '_sample', !L, '.sav"') /VARIABLES = X_BAR.
- END MATRIX.
!DOEND.
GET FILE= !CONCAT('"H:\CLT__', !variable, '_sample', '1.sav"').
!DO !J = 2 !TO !reps.
- ADD FILES /FILE=* /FILE=!CONCAT('"H:\CLT__', !variable, '_sample', !J, '.sav"').
- EXECUTE.
!DOEND.
SAVE / OUTFILE =!CONCAT('"H:\CLT__n', !nsample, !variable, '_sample', 'ALL', !reps, '.sav"') .
TITLE !reps Repeated Samples of size !nsample .
GRAPH /HISTOGRAM=X_BAR /TITLE= 'Histogram of Sample Means from Repeated Samples'.
TITLE !reps Repeated Samples of size !nsample .
DESCRIPTIVES VARIABLES=X_BAR /STATISTICS=MEAN STDDEV MIN MAX .
!ENDDEFINE.
*---- ... to the end of this line -------------------------------------------------------.
*=====================================================================.
*LAB COMPUTER SPSS Syntax for Lab 5 of Quantitative Methods 1.
*Hypothesis Tests I.
*(c) Gwilym Pryce 2005.
*=====================================================================.
*Examples and answers are taken from "Inference and Statistics in SPSS" by Gwilym Pryce, GeeBeeJey Publishing.
*SPSS Syntax downloads, copies of the book and other resources can be obtained from www.geebeejey.co.uk, .
*and from the Teaching page of www.gwilympryce.co.uk.
*========================== End of File ===================================.