*=====================================================================.
*LAB COMPUTER SPSS Syntax for Lab 5 of Quantitative Methods 1.
*Hypothesis Tests I.
*-------------------------------------------------------------------------------------------------------------------.
*(c) Gwilym Pryce 31 Oct 05, v5.
*LAB_L5_Hypothesis Tests_SPSS_Gwilym Pryce_31oct05_v5.sps
*=====================================================================.

*Examples and answers are taken from "Inference and Statistics in SPSS" by Gwilym Pryce, Geebeejey Publishing.
*SPSS Syntax downloads, copies of the book and other resources can be obtained from www.geebeejey.co.uk .
*and from www.gwilympryce.co.uk.


********************************************* NOTE *******************************************************.
*** The macro programs needed for the commands used in this lab are pasted at the end of this file ****.
********************************************* NOTE *******************************************************.

 

*=====================================================================.
* 5.2.1 Exercise 5.4 Large Sample Hypothesis Tests on One Mean (Pryce ).
*-------------------------------------------------------------------------------------------------------------------.
*(see p.5-8 of Gwilym Pryce, 2005, "Inference & Statistics in SPSS: A Course for Business and. 
*Social Science", Glasgow: Geebeejey Publishing, www.geebeejey.co.uk, ISBN:0955143306 ).
*=====================================================================.

*Suppose your area of research is the disappearance of thousands of civil servants .
*and other workers during Joseph Stalin's Great Purge in Soviet Russia 1936-38.  .
*One of the questions you are interested in is the average age of the workers .
*when they disappeared.
*Your thesis is that Stalin felt most threatened by older, more established 'enemies', .
*and so you anticipate their average age to be over 50. Unfortunately, .
*you only have access to 506 records on the age of individuals when they disappeared.
*You have calculated the average age in this sample to be 56.2 years, .
*which would appear to confirm your thesis.
*The standard deviation of your sample was found to be 14.7 years.
*Assuming that your 506 records constitute a random sample .
*from the population of those who disappeared (a questionable assumption?), .
*test your theory about the average age of the Disappeared.

*--------------------------------------------.
*Answer Using the SPSS command.
*--------------------------------------------.

H_L1M     n=(506)   x_bar=(56.2)    m=(50)    s=(14.7).

*Large sample sig. test for one mean.
*          N      X_BAR         SE         ZI   SIGZ_2TL   SIGZ_LTL   SIGZ_UTL.
*  506.00000   56.20000     .65349    9.48745     .00000    1.00000     .00000.

*This corroborates the finding from the last lab that the confidence interval .
*is very narrow because of the large sample size .
*and because of the small standard deviation in the sample:.

*Large sample confidence interval for the population mean          .
*        N      X_BAR      ZI          SE        ERR      LOWER      UPPER  .
*506.00000   56.20000    1.96039     .65349    1.28111   54.91889   57.48111.
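
*--------------------------------------------.
*Checking the arithmetic by hand (an illustrative sketch, not part of the original answer).
*--------------------------------------------.
*The block below is a minimal sketch of the calculation behind the large sample test, .
*assuming H0: m = 50 against the upper-tail alternative H1: m > 50.
*It computes the standard error s/sqrt(n), the test statistic zi = (x_bar - m)/SE .
*and the upper-tail significance level 1 - CDFNORM(zi).
*The names used inside the MATRIX block are chosen here for illustration only.
*The SE and zi values should agree with the SE and ZI columns of the H_L1M output above.

MATRIX.
COMPUTE n = 506.
COMPUTE x_bar = 56.2.
COMPUTE m = 50.
COMPUTE s = 14.7.
COMPUTE SE = s / SQRT(n).
COMPUTE zi = (x_bar - m) / SE.
COMPUTE sig_utl = 1 - CDFNORM(zi).
COMPUTE ANSWER = {SE, zi, sig_utl}.
PRINT ANSWER / FORMAT "F10.5" / TITLE = "Hand calculation of the one-mean z test" / CLABELS = SE, zi, sig_utl.
END MATRIX.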







*=====================================================================.
*5.2.2 Exercise 5.5 Small Sample Hypothesis Tests on One Mean.
*=====================================================================.


*=====================================================================.
*1. Insulin Injections Machine .
*-------------------------------------------------------------------------------------------------------------------.
*(see p.5-9 of Pryce, G. 2005, "Inference and Statistics in SPSS: A Course for Business and.
*Social Science", Glasgow: Geebeejey Publishing, www.geebeejey.co.uk, ISBN:0955143306 ).
*=====================================================================.
.
*A machine used to fill pre-packaged emergency insulin injections .
*is correctly adjusted if the average net weight of insulin is 11 ounces per syringe.  .
*A random sample of 107 syringes had an average fill of 10.92 ounces .
*and standard deviation of 0.28 ounces.  .
*Is the machine adjusted properly?  Test at the 5% significance level.
*.
*--------------------------------------------.
*Answer Using the SPSS command.
*--------------------------------------------.

H_L1M       n=(107)  x_bar=(10.92)  m=(11)   s=(0.28).

*Large sample sig. test for one mean.
*          N      X_BAR         SE         ZI   SIGZ_2TL   SIGZ_LTL   SIGZ_UTL.
*  107.00000   10.92000     .02707   -2.95545     .00312     .00156     .99844.

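*The two-tail significance level (0.003) is well below 0.05, so we reject the null hypothesis .
*that the average fill is 11 ounces: the machine does not appear to be properly adjusted.
*We could alternatively do a lower-tail test of whether the syringes are being under-filled.  .
*This gives an even lower significance level (0.002), .
*making it even less likely that our sample is a freak sample.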

*Interestingly, if we run a t-test on the same hypothesis we get .
*a slightly more cautious estimate of the significance level, .
*but the result is essentially the same.  .
*This is because, as the sample size gets larger, .
*the t-distribution estimates of the significance level .
*will tend toward the z-distribution estimates.  .
*In this case the sample size is fairly large .
*so you'd expect the z and t estimates of P to be similar.

H_S1M       n=(107)  x_bar=(10.92)  m=(11)   s=(0.28).

*Small sample sig. test for one mean.
*          N      X_BAR         SE         TI   SIGT_2TL   SIGT_LTL   SIGT_UTL.
*  107.00000   10.92000     .02707   -2.95545     .00385     .00192     .99808.
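
*A rough illustration of the point above (a sketch only; it does not use the lab macros).
*It evaluates the two-tail significance level of the same test statistic (-2.95545) .
*first under the z-distribution and then under t-distributions with 10, 30 and 106 .
*degrees of freedom (106 = n - 1 for this exercise; 10 and 30 are arbitrary choices made here).
*The t-based figures should shrink toward the z-based figure as the degrees of freedom rise.

MATRIX.
COMPUTE ti = -2.95545.
COMPUTE sig_z = 2 * (1 - CDFNORM(ABS(ti))).
COMPUTE sig_t10 = 2 * (1 - TCDF(ABS(ti), 10)).
COMPUTE sig_t30 = 2 * (1 - TCDF(ABS(ti), 30)).
COMPUTE sig_t106 = 2 * (1 - TCDF(ABS(ti), 106)).
COMPUTE ANSWER = {sig_z, sig_t10, sig_t30, sig_t106}.
PRINT ANSWER / FORMAT "F10.5" / TITLE = "Two-tail sig. of -2.95545 under z and t" / CLABELS = sig_z, sig_t10, sig_t30, sig_t106.
END MATRIX.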



*=====================================================================.
*2. Average GP Time with Patients.
*-------------------------------------------------------------------------------------------------------------------.
*(see p.5-11 of Pryce, G. 2005, "Inference and Statistics in SPSS: A Course for Business and.
*Social Science", Glasgow: Geebeejey Publishing, www.geebeejey.co.uk, ISBN:0955143306 ).
*=====================================================================.
.
*A newspaper has claimed that the average time GPs spend with patients .
*in a particular session has fallen to an all-time low of 3.6 minutes.
*A report from the Scottish Executive disagrees, contending that the health service .
*has met New Labour's manifesto target of 4 minutes .
*average consultation time, .
*and that the data used by the newspaper was based on a freak sample.  .
*The newspaper's survey was based on a random sample of 80 doctors .
*with a mean of 3.60 minutes and a standard deviation of 1.80 minutes.  .
*Is the government bluffing .
*or is there a good chance that this is indeed a freak sample? .
*Perform the appropriate hypothesis test using a significance level of 0.05.


*--------------------------------------------.
*Answer Using the SPSS command.
*--------------------------------------------.

H_L1M  n=(80)  x_bar=(3.6)  m=(4.0)   s=(1.8).

*Large sample sig. test for one mean.
*          N      X_BAR         SE         ZI   SIGZ_2TL   SIGZ_LTL   SIGZ_UTL.
*   80.00000    3.60000     .20125   -1.98762     .04685     .02343     .97657.


*=====================================================================.
*3.Prevalence of American Pronunciation.
*-------------------------------------------------------------------------------------------------------------------.
*(see p.5-12 of Pryce, G. 2005, "Inference & Statistics in SPSS: A Course for Business and.
*Social Science", Glasgow: Geebeejey Publishing, www.geebeejey.co.uk, ISBN:0955143306 ).
*=====================================================================.

*For your PhD, you want to estimate the number of times an American phrase or pronunciation .
*occurs in a typical 5 minute conversation between teenage youths in Liverpool.  .
*Because of the time taken to build up sufficient rapport with such youths .
*for them to speak in a relaxed way, .
*you only manage to observe 23 such conversations.  .
*The average number of Americanisms in 5 minute conversations amongst your small sample is 137.71, .
*with a standard deviation of 69.56.
*The last study found an average of 128.2 Americanisms per 5 minute conversation, .
*and this has become the accepted wisdom in the literature.  .
*Do a hypothesis test to establish whether the average has in fact increased .
*at the 5% significance level.  .
*Also do a two-tail test for whether there has been any change at all.
*Compare your answer with the 95% confidence interval .
*estimated in the previous set of lab exercises.


*--------------------------------------------.
*Answer Using the SPSS command.
*--------------------------------------------.

*If we use the large sample syntax .
*(i.e. use the z-distribution to calculate the probability of the sample being atypical) .
*then we end up with the following output:.

H_L1M  n=(23)  x_bar=(137.71)  m=(128.2)   s=(69.56).

*Large sample sig. test for one mean.
*          N      X_BAR         SE         ZI   SIGZ_2TL   SIGZ_LTL   SIGZ_UTL.
*   23.00000  137.71000   14.50426     .65567     .51204     .74398     .25602.

*Note, however, that the sample size is only 23, .
*and so one should really use the t-distribution:.

H_S1M  n=(23)  x_bar=(137.71)  m=(128.2)   s=(69.56).

*Small sample sig. test for one mean.
*      N      X_BAR         SE           TI      SIGT_2TL   SIGT_LTL   SIGT_UTL.
*   23.00000  137.71000   14.50426     .65567     .51884     .74058     .25942.

*The significance level for the upper-tail test was found to be 0.259 .
*(slightly higher than that derived using the z-distribution, .
*reflecting the slightly flatter shape of the t-distribution i.e. more area in the tails).  .
*This suggests that if we reject the null hypothesis .
*(that the average number of Americanisms amongst Liverpool youth is still 128.2) .
*in favour of the alternative hypothesis .
*(that average has risen) based on your PhD sample, then there is a 26% chance that we are wrong.  .
*This is too high a risk to take and so we cannot reject the null hypothesis.  .
*In other words, we cannot say that the average number of Americanisms has risen.  .

*If we do a two-tail test, we find that the chance of our sample being a freak sample .
*is more than 50% (sig. = 0.519), .
*and so we have even less of a basis to reject the null hypothesis.

*This result should not surprise us since the 95% confidence interval .
*based on your PhD sample was found to be 108 Americanisms to 168 Americanisms .
*(compare the output below for small and large sample confidence interval estimation):.

CI_L1M    n = (23) x_bar = (137.71) s = (69.56) c = (0.95).

*Large sample confidence interval for the population mean.
*          N      X_BAR        ZIL         SE        ERR      LOWER      UPPER.
*   23.00000  137.71000   -1.95996   14.50426   28.42783  109.28217  166.13783.

CI_S1M    n = (23) x_bar = (137.71) s = (69.56) c = (0.95).

*Small sample confidence interval for the population mean.
*          N      X_BAR        TIL         SE        ERR      LOWER      UPPER.
*   23.00000  137.71000   -2.07387   14.50426   30.08000  107.63000  167.79000.
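
*The link between the two-tail test and the confidence interval can be sketched as follows .
*(an illustration only; t_crit is simply the absolute TIL value copied from the output above).
*H0 is rejected at the 5% level in a two-tail test only if ABS(ti) exceeds t_crit, .
*which is equivalent to the hypothesised mean (128.2) lying outside the 95% confidence interval.
*The value of reject is 1 if H0 would be rejected and 0 otherwise; .
*since 128.2 lies inside the interval, it should come out as 0 here.

MATRIX.
COMPUTE t_crit = 2.07387.
COMPUTE SE = 69.56 / SQRT(23).
COMPUTE ti = (137.71 - 128.2) / SE.
COMPUTE reject = ABS(ti) > t_crit.
COMPUTE ANSWER = {ti, t_crit, reject}.
PRINT ANSWER / FORMAT "F10.5" / TITLE = "Two-tail t test decision at the 5% level" / CLABELS = ti, t_crit, reject.
END MATRIX.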


*=====================================================================.
*4. Brownfield Contamination.
*-------------------------------------------------------------------------------------------------------------------.
*(see p.5-13 of Pryce, G. 2005, "Inference & Statistics in SPSS: A Course for Business and.
*Social Science", Glasgow: Geebeejey Publishing, www.geebeejey.co.uk, ISBN:0955143306 ).
*=====================================================================.

*As part of your research you seek to analyse the policy of using the planning system .
*to encourage the building of residential properties on brownfield sites .
*(i.e. recycled industrial land), rather than greenfield sites .
*(i.e. former agricultural or park land).  .
*One of your concerns is that brownfield land is more likely to be contaminated, .
*and the methods for surveying the level of contamination do not preclude the possibility .
*that a site may be declared safe when it is not.  .
*Surveys gauge contamination by taking bore-extracts every 100m.
*The land is declared safe for residential construction if the average level of toxicity .
*is no more than 1g per extract.  .
*In your case study area, the former steelworks site in Cambuslang, Glasgow, .
*you find that a random sample of 64 bores has been taken yielding an average of 0.88g .
*with standard deviation of 0.79g.  The average is below the safety threshold .
*but do you think the Local Authority should grant this site residential planning permission?.

*--------------------------------------------.
*Answer Using the SPSS command.
*--------------------------------------------.

H_L1M  n=(64)  x_bar=(0.88)  m=(1)   s=(0.79).

*Large sample sig. test for one mean.
*          N      X_BAR         SE         ZI   SIGZ_2TL   SIGZ_LTL   SIGZ_UTL.
*   64.00000     .88000     .09875   -1.21519     .22429     .11215     .88785.

*The SIGZ_LTL figure is the probability of observing a value of z smaller than -1.215.  .
*If we reject the null that the mean level of contamination in extracts is 1g in favour of .
*the alternative hypothesis that it is less than 1g, .
*there is more than a one in ten chance that we have rejected the null incorrectly.  .

*If we instead reject the null in favour of the alternative hypothesis .
*that it is greater than 1g, .
*there is an 88% chance that we have rejected the null incorrectly.


*=====================================================================.
*5. Steel Worker Exposure to Contamination.
*-------------------------------------------------------------------------------------------------------------------.
*(see p.5-13 of Pryce, G. 2005, "Inference & Statistics in SPSS: A Course for Business and.
*Social Science", Glasgow: Geebeejey Publishing, www.geebeejey.co.uk, ISBN:0955143306 ).
*=====================================================================.

*As part of your research into the contamination levels experienced by workers .
*in the Cambuslang steel industry in the first half of the twentieth century, .
*you examine 273 random medical checks on workers .
*which reveal an average contamination level of 92.7 units .
*with a standard deviation of 39.7 units.  .
*The legal threshold for exposure is that the average should not exceed 94 units .
*and so the steel industry always claimed on the basis of this sample .
*that their workers were safe.  .
*How sure can you be that this conclusion is valid?.

*--------------------------------------------.
*Answer Using the SPSS command.
*--------------------------------------------.

*The question, of course, is whether when we take into account random sampling variation .
*the population mean is likely to be below the legal threshold.  .
*Let's do an upper-tail test to see if the population mean is likely to be above the legal limit:.

*H0: m	=	94.
*H1: m	>	94.

H_L1M  n=(273)  x_bar=(92.7)  m=(94)   s=(39.7).

*Large sample sig. test for one mean.
*          N      X_BAR         SE         ZI   SIGZ_2TL   SIGZ_LTL   SIGZ_UTL.
*  273.00000   92.70000    2.40275    -.54105     .58848     .29424     .70576.

*These results show that, because of the large standard deviation in the sample, .
*we cannot reject the null hypothesis that the mean exposure score in the population of workers equals 94 units,.
*whether the alternative hypothesis is upper-, lower- or two-tail.  .
*If we look at the confidence interval, this result is not surprising since .
*the upper limit on the 95% confidence interval is 97.4 units.  .
*In short, we cannot be sure on the basis of this sample .
*that the population mean is not well above the legal threshold.

CI_L1M  n=(273)  x_bar=(92.7)  s=(39.7) c=(0.95).

*Large sample confidence interval for the population mean.
*          N      X_BAR        ZIL         SE        ERR      LOWER      UPPER.
*  273.00000   92.70000   -1.95996    2.40275    4.70931   87.99069   97.40931.



*=====================================================================.
*6. Sectarian Attitudes Among Rangers Fans.
*-------------------------------------------------------------------------------------------------------------------.
*(see p.5-14 of Pryce, G. 2005, "Inference & Statistics in SPSS: A Course for Business and.
*Social Science", Glasgow: Geebeejey Publishing, www.geebeejey.co.uk, ISBN:0955143306 ).
*=====================================================================.

*As a research assistant on a project investigating sectarian attitudes .
*amongst Rangers supporters, .
*you interview 24 supporters in pubs after a football match at Ibrox.  .
*You find that the average age of supporters with sectarian attitudes is 22.7 years .
*(s.d. = 9.2 years) .
*which is well below the most recent estimate published five years ago .
*which said that the average age of this group was around 30 years.  .
*Run a lower-tail t-test to see if this difference between estimates is statistically significant and reflect on the .
*robustness of your method.

*--------------------------------------------.
*Answer Using the SPSS command.
*--------------------------------------------.

*H0: m	=	30.
*H1: m	<	30.

H_S1M  n=(24)  x_bar=(22.7)  m=(30)   s=(9.2).

*Small sample sig. test for one mean.
*          N      X_BAR         SE         TI   SIGT_2TL   SIGT_LTL   SIGT_UTL.
*   24.00000   22.70000    1.87794   -3.88723     .00074     .00037     .99963.
*.
*This result tells us that, despite the small sample, .
*you can be pretty confident of rejecting the null hypothesis .
*(that the average age of sectarian Rangers supporters equals 30) .
*in favour of the alternative hypothesis that the average age is less than 30 years.  .
*There is less than a one in a thousand chance that your rejection of the null is incorrect.

*Note, however, that the t-test assumes a normally distributed variable .
*(i.e. that the age of sectarian Rangers supporters is normally distributed), but this may well not be the case.  .
*If the normality assumption fails and you have a small sample, then non-parametric methods should be used.  .
*If you have a large sample, then the normality assumption matters much less .
*because the Central Limit Theorem will start to kick in .
*(this says that the sampling distribution of the mean is approximately normal even if the variable itself is not normal, .
*provided the sample sizes used in the repeated samples are large).  .

*The t-test (and z-test and confidence interval estimates and any other form of inference) .
*also assumes that your sample is randomly selected from all sectarian Rangers supporters, .
*but this might not be the case.  Older supporters, for example, may be less likely to attend the match 
*(rather than watch on television) .
*and may be less (or more?) likely to stay around for a drink at the pub.  
*The type of sample may also be affected by the outcome of the match, .
*as may your measurement of sectarianism.  
*All these lead to what is known as sample selection bias.  
*Such bias undermines our ability to make inferences about the population from a particular sample.


*=====================================================================.
*End of exercises.
*=====================================================================.











*#############################################################################.
*#############################################################################.
*#############################################################################.
*#############################################################################.
*#############################################################################.
*#############################################################################.










*=====================================================================.
*Macro Programs.
*=====================================================================.

*If these macros have not already been installed on the lab machines, simply highlight all the programs below.
*Then run them as one command by pressing CTRL+R.
*You will then be able to use macro commands .

*---- Highlight from the start of this line... -------------------------------------------.
DEFINE pz_lt_zi (!POSITIONAL !ENCLOSE('(',')')).
GET FILE='Q:\QUANTS\one.sav'.
compute Zi_Var = !1 .
COMPUTE PROB = CDFNORM(Zi_Var).
execute.
MATRIX.
GET PROB_VAR /VARIABLES = PROB.
GET Zi_Var /VARIABLES = Zi_Var.
COMPUTE Prob = PROB_VAR(1).
COMPUTE zi = Zi_Var(1).
COMPUTE ANSWER = {zi, PROB}.
PRINT ANSWER / FORMAT "F10.5" /Title = " Prob(z < zi) for a given zi " / CLABELS = zi, Prob.
END MATRIX.
!ENDDEFINE.



DEFINE pz_gt_zi (!POSITIONAL !ENCLOSE('(',')')).
GET FILE='Q:\QUANTS\one.sav'.
compute Zi_Var = !1 .
COMPUTE PROB = 1 - CDFNORM(Zi_Var).
execute.
MATRIX.
GET PROB_VAR /VARIABLES = PROB.
GET Zi_Var /VARIABLES = Zi_Var.
COMPUTE Prob = PROB_VAR(1).
COMPUTE zi = Zi_Var(1).
COMPUTE ANSWER = {zi, PROB}.
PRINT ANSWER / FORMAT "F10.5" /Title = " Prob(z > zi) for a given zi " / CLABELS = zi, Prob.
END MATRIX.
!ENDDEFINE.



DEFINE pz_lg_zi (zil = !ENCLOSE('(',')')  / ziu =  !ENCLOSE('(',')')).
GET FILE='Q:\QUANTS\one.sav'.
compute ZiL_Var = !zil .
compute ZiU_Var = !ziu .
execute.
COMPUTE PROBL = CDFNORM(ZiL_Var).
COMPUTE PROBU = 1 - CDFNORM(ZiU_Var).
COMPUTE PROBLG = PROBL + PROBU.
execute.
MATRIX.
GET PROB_VAR /VARIABLES = PROBLG.
GET ZiL_Var /VARIABLES = ZiL_Var.
GET ZiU_Var /VARIABLES = ZiU_Var.
COMPUTE Prob = PROB_VAR(1).
COMPUTE ziL = ZiL_Var(1).
COMPUTE ziU = ZiU_Var(1).
COMPUTE ANSWER = {ziL, ziU, PROB}.
PRINT ANSWER / FORMAT "F10.5" /Title = " Prob((z < ziL) OR (z > ziU)) for a given zi " / CLABELS = ziL, ziU, Prob.
END MATRIX.
!ENDDEFINE.



DEFINE pz_gl_zi (zil = !ENCLOSE ('(',')')  / ziu = !ENCLOSE('(',')')).
GET FILE='Q:\QUANTS\one.sav'.
compute ZiL_Var = !zil .
compute ZiU_Var = !ziu .
execute.
COMPUTE PROBL = CDFNORM(ZiL_Var).
COMPUTE PROBU = 1 - CDFNORM(ZiU_Var).
COMPUTE PROBLG = 1 - (PROBL + PROBU).
execute.
MATRIX.
GET PROB_VAR /VARIABLES = PROBLG.
GET ZiL_Var /VARIABLES = ZiL_Var.
GET ZiU_Var /VARIABLES = ZiU_Var.
COMPUTE Prob = PROB_VAR(1).
COMPUTE ziL = ZiL_Var(1).
COMPUTE ziU = ZiU_Var(1).
COMPUTE ANSWER = {ziL, ziU, PROB}.
PRINT ANSWER / FORMAT "F10.5" /Title = " Prob(ziL < z < ziU) for a given zi " / CLABELS = ziL, ziU, Prob.
END MATRIX.
!ENDDEFINE.


DEFINE zi_lt_zp (p = !ENCLOSE('(',')')).
GET FILE='Q:\QUANTS\one.sav'.
COMPUTE Zi = PROBIT(!p).
EXECUTE.
MATRIX.
GET Zi_VAR /VARIABLES = Zi.
COMPUTE Zi = Zi_VAR(1).
COMPUTE PROB= {!p}.     /*Enter the given probability into the curly brackets*/
COMPUTE ANSWER = {Zi, PROB}.
PRINT ANSWER / FORMAT "F10.5" /Title = " Value of zi such that Prob(z < zi) = PROB when PROB is given" / CLABELS = zi, PROB.
END MATRIX.
!ENDDEFINE.



DEFINE zi_gt_zp (p = !ENCLOSE('(',')')).
GET FILE='Q:\QUANTS\one.sav'.
COMPUTE Zi = PROBIT(1-!p).
EXECUTE.
MATRIX.
GET Zi_VAR /VARIABLES = Zi.
COMPUTE Zi = Zi_VAR(1).
COMPUTE PROB= {!p}.     /*Enter the given probability into the curly brackets*/
COMPUTE ANSWER = {Zi, PROB}.
PRINT ANSWER / FORMAT "F10.5" /Title = " Value of zi such that Prob(z > zi) = PROB when PROB is given" / CLABELS = zi, PROB.
END MATRIX.
!ENDDEFINE.


DEFINE zi_gl_zp (p = !ENCLOSE('(',')')).
GET FILE='Q:\QUANTS\one.sav'.
COMPUTE PROB = !p.
COMPUTE PROBLG = 1 - !p.
COMPUTE PROBL = PROBLG / 2.
COMPUTE ZiL_Var = PROBIT(PROBL).
COMPUTE ZiU_Var = -1 * ZiL_Var .
execute.
MATRIX.
GET PROB_VAR /VARIABLES = PROB.
GET ZiL_Var /VARIABLES = ZiL_Var.
GET ZiU_Var /VARIABLES = ZiU_Var.
COMPUTE Prob = PROB_VAR(1).
COMPUTE ziL = ZiL_Var(1).
COMPUTE ziU = ZiU_Var(1).
COMPUTE ANSWER = {ziL, ziU, PROB}.
PRINT ANSWER / FORMAT "F10.5" /Title = " Value of zi such that Prob(-zi < z < zi)  = PROB, when PROB is given  " / CLABELS = ziL, ziU, Prob.
END MATRIX.
!ENDDEFINE.


DEFINE  CI_L1M (n =  !ENCLOSE('(',')')  /x_bar =  !ENCLOSE('(',')')  /s =  !ENCLOSE('(',')') /c =  !ENCLOSE('(',')')).
GET FILE='Q:\QUANTS\one.sav'.
COMPUTE PROB = !c.
COMPUTE PROBLG = 1 - !c.
COMPUTE PROBL = PROBLG / 2.
COMPUTE ZiL = PROBIT(PROBL).
COMPUTE ZiU = -1 * ZiL .
execute.
MATRIX.
COMPUTE n             =  {!n}.         /* Enter the sample size here (i.e. change the number in curly brackets)*/
COMPUTE x_bar      =   {!x_bar}.   /* Enter the sample mean here*/
COMPUTE s            =   {!s}.          /* Enter the sample standard deviation here*/
COMPUTE SE = s/SQRT(n).
GET ZiL /VARIABLES = ZiL.
COMPUTE ERR = -ZiL * SE.
COMPUTE LOWER = x_bar - err.
COMPUTE UPPER = x_bar + err.
COMPUTE ANSWER = {n, x_bar, ZiL, SE, err, Lower, Upper}.
PRINT ANSWER / FORMAT "F10.5" /Title = "Large sample confidence interval for the population mean" / CLABELS = n, x_bar, ZiL, SE, err, Lower, Upper.
END MATRIX.
!ENDDEFINE.




DEFINE  CI_S1M (n =  !ENCLOSE('(',')') /x_bar =  !ENCLOSE('(',')') /s =  !ENCLOSE('(',')') /c =  !ENCLOSE('(',')')).
GET FILE='Q:\QUANTS\one.sav'.
COMPUTE df = !n - 1.
COMPUTE PROB = !c.
COMPUTE PROBLG = 1 - !c.
COMPUTE PROBL = PROBLG / 2.
COMPUTE TiL = IDF.T(PROBL, df).
COMPUTE TiU = -1 * TiL .
execute.
MATRIX.
COMPUTE n             =  {!n}.         /* Enter the sample size here (i.e. change the number in curly brackets)*/
COMPUTE x_bar      =   {!x_bar}.   /* Enter the sample mean here*/
COMPUTE s            =   {!s}.          /* Enter the sample standard deviation here*/
COMPUTE SE = s/SQRT(n).
GET TiL /VARIABLES = TiL.
GET df /VARIABLES = df.
COMPUTE ERR = -TiL * SE.
COMPUTE LOWER = x_bar - err.
COMPUTE UPPER = x_bar + err.
COMPUTE ANSWER = {n, x_bar, TiL, SE, err, Lower, Upper}.
PRINT ANSWER / FORMAT "F10.5" /Title = "Small sample confidence interval for the population mean" / CLABELS = n, x_bar, TiL, SE, err, Lower, Upper.
END MATRIX.
!ENDDEFINE.



DEFINE  CI_S2Mp (n1 =  !ENCLOSE('(',')')  /n2 =  !ENCLOSE('(',')')  /x_bar1 =  !ENCLOSE('(',')')  /x_bar2 =  !ENCLOSE('(',')')  /s1 =  !ENCLOSE('(',')')  /s2 =  !ENCLOSE('(',')')  /c =  !ENCLOSE('(',')')).
GET FILE='Q:\QUANTS\one.sav'.
COMPUTE df = !n1 + !n2 - 2.
COMPUTE PROB = !c.
COMPUTE PROBLG = 1 - !c.
COMPUTE PROBL = PROBLG / 2.
COMPUTE TiL = IDF.T(PROBL, df).
COMPUTE TiU = -1 * TiL .
execute.
MATRIX.
GET df / variables = df.         /* The df is read from the working file (computed above as n1 + n2 - 2)*/
COMPUTE x_bar1      =   {!x_bar1}.   /* Enter the sample mean here*/
COMPUTE x_bar2      =   {!x_bar2}.   /* Enter the sample mean here*/
COMPUTE sp = SQRT(( (!n1 - 1)* !s1**2 + (!n2 - 1) * !s2**2  ) / (!n1 + !n2 - 2) ). 
COMPUTE SE = sp*(SQRT((1/!n1) + (1/!n2))).
GET TiL /VARIABLES = TiL.
GET df /VARIABLES = df.
COMPUTE ERR = -TiL * SE.
COMPUTE SAMPDIFF = x_bar1 - x_bar2.
COMPUTE LOWER = SAMPDIFF - err.
COMPUTE UPPER = SAMPDIFF + err.
COMPUTE ANSWER = {SAMPDIFF, SP, TiL, SE, err, Lower, Upper}.
PRINT ANSWER / FORMAT "F10.5" /Title = "CI for the difference between 2 population means (pooled variance)" / CLABELS = SAMPDIFF, SP, TiL, SE, err, Lower, Upper.
END MATRIX.
!ENDDEFINE.


DEFINE  CI_S2Md (n1 =  !ENCLOSE('(',')') /n2 =  !ENCLOSE('(',')')  /x_bar1 =  !ENCLOSE('(',')')  /x_bar2 =  !ENCLOSE('(',')')  /s1 =  !ENCLOSE('(',')')  /s2 =  !ENCLOSE ('(',')')  /c =  !ENCLOSE('(',')')).
GET FILE='Q:\QUANTS\one.sav'.
COMPUTE df = min((!n1 -1), (!n2 - 1)).
COMPUTE PROB = !c.
COMPUTE PROBLG = 1 - !c.
COMPUTE PROBL = PROBLG / 2.
COMPUTE TiL = IDF.T(PROBL, df).
COMPUTE TiU = -1 * TiL .
execute.
MATRIX.
GET df / variables = df.         /* The df is read from the working file (computed above as the smaller of n1 - 1 and n2 - 1)*/
COMPUTE x_bar1      =   {!x_bar1}.   /* Enter the sample mean here*/
COMPUTE x_bar2      =   {!x_bar2}.   /* Enter the sample mean here*/
COMPUTE SE = SQRT((!s1**2/!n1) + (!s2**2/!n2)).
GET TiL /VARIABLES = TiL.
GET df /VARIABLES = df.
COMPUTE ERR = -TiL * SE.
COMPUTE SAMPDIFF = x_bar1 - x_bar2.
COMPUTE LOWER = SAMPDIFF - err.
COMPUTE UPPER = SAMPDIFF + err.
COMPUTE ANSWER = {SAMPDIFF, TiL, SE, err, Lower, Upper}.
PRINT ANSWER / FORMAT "F10.5" /Title = "CI for the difference between 2 population means (different variances)" / CLABELS = SAMPDIFF, TiL, SE, err, Lower, Upper.
END MATRIX.
!ENDDEFINE.



DEFINE  CI_L1P (n =  !ENCLOSE ('(',')') /x =  !ENCLOSE('(',')') /c =  !ENCLOSE('(',')')). 
GET FILE='Q:\QUANTS\one.sav'.
COMPUTE PROB = !c.
COMPUTE PROBLG = 1 - !c.
COMPUTE PROBL = PROBLG / 2.
COMPUTE ZiL = PROBIT(PROBL).
COMPUTE ZiU = -1 * ZiL .
execute.
MATRIX.
COMPUTE n = !n.         /* Enter the sample size here */
COMPUTE x = !x.           /* Enter the number of "successes" or particular outcomes here */
COMPUTE CONFID = !c.  /* Enter the desired confidence level here */
COMPUTE pTrad = x/n.                  /* the traditional estimate of the pop. proportion used in CI estimation */
COMPUTE pWlsn = (x+2)/(n + 4).   /* the Wilson estimate */
GET zstar /VARIABLES = ZiL.
COMPUTE SE_Trad = SQRT((pTrad*(1-pTrad))/n).
COMPUTE SE_Wlsn = SQRT((pWlsn*(1-pWlsn))/(n+4)).
COMPUTE eTrad = -zstar * SE_Trad.
COMPUTE eWlsn = -zstar * SE_Wlsn.
COMPUTE LOW_Trad = pTrad - eTrad.
COMPUTE LOW_Wlsn = pWlsn - eWlsn.
COMPUTE UP_Trad = pTrad + eTrad.
COMPUTE UP_Wlsn = pWlsn + eWlsn.
COMPUTE ANSWER = {pTrad, zstar, se_trad, etrad, low_trad, up_trad}.
PRINT ANSWER / FORMAT "F10.6" /Title = "Traditional Large sample CI for one proportion" / CLABELS = ptrad, zstar, se_trad, etrad, low_trad, up_trad.
COMPUTE ANSWER = {pWlsn, zstar, se_wlsn, ewlsn, low_wlsn, up_wlsn}.
PRINT ANSWER / FORMAT "F10.6" /Title = "Wilson Large sample CI for one proportion" / CLABELS = pwlsn, zstar, se_wlsn, ewlsn, low_wlsn, up_wlsn.
END MATRIX.
!ENDDEFINE.



DEFINE N_L1M (e = !ENCLOSE('(',')')  /c = !ENCLOSE('(',')')  /s  = !ENCLOSE('(',')')).
GET FILE='Q:\QUANTS\one.sav'.
COMPUTE PROB = !c.
COMPUTE E = !E.
COMPUTE PROBLG = 1 - !c.
COMPUTE PROBL = PROBLG / 2.
COMPUTE ZiL_Var = PROBIT(PROBL).
COMPUTE ZiU_Var = -1 * ZiL_Var .
COMPUTE N = (ZiL_Var**2) * (!s**2) / (!e**2). 
execute.
MATRIX.
GET PROB_VAR /VARIABLES = PROB.
GET N / VARIABLES = N.
GET E / VARIABLES = E.
GET ZiL_Var /VARIABLES = ZiL_Var.
GET ZiU_Var /VARIABLES = ZiU_Var.
COMPUTE Prob = PROB_VAR(1).
COMPUTE ziL = ZiL_Var(1).
COMPUTE ziU = ZiU_Var(1).
COMPUTE ANSWER = {e, PROB, ziL, ziU,  N}.
PRINT ANSWER / FORMAT "F10.5" /Title = "n_hat = estimated sample size needed to achieve an error of size e given c" / CLABELS = e, c, ziL, ziU, n_hat.
END MATRIX.
!ENDDEFINE.



DEFINE H_L1M (n = !ENCLOSE('(',')')  / x_bar = !ENCLOSE('(',')')  / m  = !ENCLOSE('(',')')  / s  = !ENCLOSE('(',')') ).
GET FILE='Q:\QUANTS\one.sav'.
COMPUTE x_bar = !x_bar.
COMPUTE m = !m.
COMPUTE s = !s.
COMPUTE n = !n.
COMPUTE zi = (!x_bar - !m) / (!s /sqrt(!n)) .
execute.
* This calculates the Large-Sample Significance Test for a Single Population mean.
MATRIX.
GET ZI / VARIABLES = zi. 
GET X_BAR / VARIABLES = X_BAR.
GET N / VARIABLES = N.
GET S / VARIABLES = S.
COMPUTE SE = S / (SQRT(N)).
COMPUTE SIGz_2TL = 2 * (1 - CDFNORM(ABS(Zi))).
COMPUTE SIGz_LTL = CDFNORM(Zi).
COMPUTE SIGz_UTL = 1 - CDFNORM(Zi).
COMPUTE ANSWER = {n, x_bar, SE, zi, SIGz_2TL, SIGz_LTL, SIGz_UTL}.
PRINT ANSWER / FORMAT "F10.5"  /Title = "Large sample sig. test for one mean" / CLABELS = n, x_bar, SE, zi, SIGz_2TL, SIGz_LTL, SIGz_UTL.
END MATRIX.
!ENDDEFINE.


DEFINE H_S1M (n = !ENCLOSE('(',')')  / x_bar = !ENCLOSE('(',')') / m  = !ENCLOSE('(',')')  / s  = !ENCLOSE('(',')')).
GET FILE='Q:\QUANTS\one.sav'.
COMPUTE x_bar = !x_bar.
COMPUTE m = !m.
COMPUTE s = !s.
COMPUTE n = !n.
COMPUTE ti = (!x_bar - !m) / (!s /sqrt(!n)) .
execute.
MATRIX.
GET TI / VARIABLES = ti. 
GET X_BAR / VARIABLES = X_BAR.
GET N / VARIABLES = N.
GET S / VARIABLES = S.
COMPUTE SE = S / (SQRT(N)).
COMPUTE SIGt_2TL = 2 * (1 - TCDF(ABS(ti), n-1)).
COMPUTE SIGt_LTL = TCDF(ti, n-1).
COMPUTE SIGt_UTL = 1 - TCDF(ti, n-1).
COMPUTE ANSWER = {n, x_bar, SE, ti, SIGt_2TL, SIGt_LTL, SIGt_UTL}.
PRINT ANSWER / FORMAT "F10.5" /Title = "Small sample sig. test for one mean" / CLABELS = n, x_bar, SE, ti, SIGt_2TL, SIGt_LTL, SIGt_UTL.
END MATRIX.
!ENDDEFINE.



DEFINE H_S2Mp (n1 = !ENCLOSE('(',')')  / n2 = !ENCLOSE('(',')')  / x_bar1 = !ENCLOSE('(',')')  / x_bar2 = !ENCLOSE('(',')')  / s1  = !ENCLOSE('(',')')  / s2  = !ENCLOSE('(',')') ).
GET FILE='Q:\QUANTS\one.sav'.
MATRIX.
COMPUTE n1 = {!n1}.  /* Enter the first sample size here */
COMPUTE n2 = {!n2}.  /* Enter the second sample size here */
COMPUTE x_bar1 = {!x_bar1}.  /* Enter the mean of sample 1 here */
COMPUTE x_bar2 = {!x_bar2}. /* Enter the mean of sample 2 here */
COMPUTE s1 = {!s1}.  /* Enter the first standard deviation here*/
COMPUTE s2 = {!s2}. /* Enter the  second standard deviation here*/
COMPUTE sp = SQRT(((n1-1)*s1**2+(n2-1)*s2**2)/(n1 + n2 -2)).
COMPUTE x1b_x2b = x_bar1 - x_bar2.
COMPUTE SE = sp * SQRT((1/n1)+(1/n2)).
COMPUTE ti = (x_bar1 - x_bar2) /SE.
COMPUTE df = n1 + n2 -2.
COMPUTE SIGt_2TL = 2 * (1 - TCDF(ABS(ti), df)).
COMPUTE SIGt_LTL = TCDF(ti, df).
COMPUTE SIGt_UTL = 1 - TCDF(ti, df).
COMPUTE ANSWER = {df, x1b_x2b, SE, ti, SIGt_2TL, SIGt_LTL, SIGt_UTL}.
PRINT ANSWER / FORMAT "F10.5" /Title = "Equal vars Indep samples t-test for equal means (H0 pop1ave = pop2ave)"  / CLABELS = df, x1b_x2b, SE, ti, SIGt_2TL, SIGt_LTL, SIGt_UTL.
END MATRIX.
!ENDDEFINE.
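*Worked check of the pooled-variance formulas in the macro above, using made-up values (not from the lab or the book): .
*with n1 = n2 = 10, s1 = 3 and s2 = 4, sp = SQRT((9*9 + 9*16)/18) = SQRT(12.5) = 3.53553, .
*SE = 3.53553 * SQRT(1/10 + 1/10) = 1.58114 and df = 10 + 10 - 2 = 18.
*If the sample means were 25 and 22, ti would be 3/1.58114 = 1.89737.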


DEFINE H_S2Md (n1 = !ENCLOSE('(',')')  / n2 = !ENCLOSE('(',')')  / x_bar1 = !ENCLOSE('(',')')  / x_bar2 = !ENCLOSE('(',')')  / s1  = !ENCLOSE('(',')')  / s2  = !ENCLOSE('(',')') ).
GET FILE='Q:\QUANTS\one.sav'.
COMPUTE n1 = !n1.  
COMPUTE n2 = !n2.  
COMPUTE x_bar1 = !x_bar1.  
COMPUTE x_bar2 = !x_bar2. 
COMPUTE s1 = !s1.  
COMPUTE s2 = !s2. 
COMPUTE SE = SQRT((s1**2/n1)+(s2**2/n2)).
COMPUTE x1b_x2b = x_bar1 - x_bar2.
COMPUTE ti = (x_bar1 - x_bar2) /SE.
COMPUTE df1 = n1 -1.
COMPUTE df2 = n2 - 1.
COMPUTE df = MIN(df1, df2).
COMPUTE SIGt_2TL = 2 * (1 - CDF.T(ABS(ti), df)).
COMPUTE SIGt_LTL = CDF.T(ti, df).
COMPUTE SIGt_UTL = 1 - CDF.T(ti, df).
execute.
MATRIX.
GET TI / VARIABLES = ti. 
GET x1b_x2b / VARIABLES = x1b_x2b .
GET df / VARIABLES = df.
GET SE / VARIABLES = SE.
GET SIGt_2TL / VARIABLES = SIGt_2TL.
GET SIGt_LTL / VARIABLES = SIGt_LTL.
GET SIGt_UTL / VARIABLES = SIGt_UTL.
COMPUTE ANSWER = {df, x1b_x2b, SE, ti, SIGt_2TL, SIGt_LTL, SIGt_UTL}.
PRINT ANSWER / FORMAT "F10.6" /Title = "Uneq vars Indep samples t-test for equality of means (H0 pop1ave = pop2ave)" / CLABELS = df, x1b_x2b, SE, ti, SIGt_2TL, SIGt_LTL, SIGt_UTL.
END MATRIX.
!ENDDEFINE.
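*Worked check of the unequal-variance formulas in the macro above, using made-up values (not from the lab or the book): .
*with n1 = 12, n2 = 20, s1 = 3 and s2 = 4, SE = SQRT(9/12 + 16/20) = SQRT(1.55) = 1.24499.
*The conservative df is the smaller of the two candidate values, which here comes from sample 1: df = 12 - 1 = 11.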


DEFINE H_S2VF (n1 = !ENCLOSE('(',')')  / n2 = !ENCLOSE('(',')')  / s1  = !ENCLOSE('(',')')  / s2  = !ENCLOSE('(',')') ).
GET FILE='Q:\QUANTS\one.sav'.
COMPUTE n1 = !n1.  
COMPUTE n2 = !n2.  
COMPUTE s1 = !s1.  
COMPUTE s2 = !s2. 
EXECUTE.
DO IF (s1 GE s2) .
COMPUTE V_nmrtor = s1 * s1.
COMPUTE V_dnmtor = s2 * s2.
COMPUTE n_nmrtor = n1 - 1.
COMPUTE n_dnmtor = n2 - 1.
ELSE IF (s1 LT s2) .
COMPUTE V_nmrtor = s2 * s2.
COMPUTE V_dnmtor = s1* s1.
COMPUTE n_nmrtor = n2 -1.
COMPUTE n_dnmtor = n1 - 1.
END IF.
COMPUTE Fc = V_nmrtor / V_dnmtor.
COMPUTE SIGF_UTL = 1 - CDF.F(Fc, n_nmrtor, n_dnmtor).
EXECUTE.
MATRIX.
GET V_nmrtor / VARIABLES = V_nmrtor .
GET V_dnmtor / VARIABLES =  V_dnmtor .
GET n_nmrtor  / VARIABLES = n_nmrtor  .
GET n_dnmtor  / VARIABLES = n_dnmtor  .
GET Fc / VARIABLES = Fc .
GET SIGF_UTL / VARIABLES = SIGF_UTL .
COMPUTE ANSWER = {V_nmrtor, V_dnmtor, n_nmrtor, n_dnmtor, Fc, SIGF_UTL}.
PRINT ANSWER / FORMAT "F10.5" /Title = "F-Test for equality of variance (H0 V_nmrtor = V_dnmtor)" / CLABELS = V_nmrtor, V_dnmtor, df_nmrtor, df_dnmtor, Fc, SIGF_UTL.
END MATRIX.
!ENDDEFINE.



DEFINE H_L1P (n = !ENCLOSE('(',')')  / x = !ENCLOSE('(',')')  / pi  = !ENCLOSE('(',')')  ).
GET FILE='Q:\QUANTS\one.sav'.
MATRIX.
COMPUTE n = {!n}.  /* Enter the sample size here */
COMPUTE x = {!x}.    /* Enter the number of "successes" or particular outcomes here */
COMPUTE pi = {!pi}.  /* Enter the hypothesised value for the population proportion */
COMPUTE p = x/n.
COMPUTE s2 = pi*(1-pi).
COMPUTE SE_pi = SQRT(s2/n).
COMPUTE z = (p - pi) /SE_pi.
COMPUTE SIGz_2TL = 2 * (1 - CDFNORM(ABS(Z))).
COMPUTE SIGz_LTL = CDFNORM(Z).
COMPUTE SIGz_UTL = 1 - CDFNORM(Z).
COMPUTE ANSWER = {n, p, SE_pi, z, SIGz_2TL, SIGz_LTL, SIGz_UTL}.
PRINT ANSWER / FORMAT "F10.5" /Title = "Large sample sig. test for one proportion" / CLABELS = n, p, SE_pi, z, SIGz_2TL, SIGz_LTL, SIGz_UTL.
END MATRIX.
!ENDDEFINE.



DEFINE H_L2P (n1 = !ENCLOSE('(',')')  / n2 = !ENCLOSE('(',')')  / x1 = !ENCLOSE('(',')')  / x2 = !ENCLOSE('(',')')).
GET FILE='Q:\QUANTS\one.sav'.
MATRIX.
COMPUTE n1 = {!n1}.  /* Enter the first sample size here */
COMPUTE n2 = {!n2}.  /* Enter the second sample size here */
COMPUTE x1 = {!x1}.  /* Enter the number of "successes" or particular outcomes for sample 1 here */
COMPUTE x2 = {!x2}. /* Enter the number of "successes" or particular outcomes for sample 2 here */
COMPUTE p1 = x1/n1.
COMPUTE p2 = x2/n2.
COMPUTE phat = (x1 + x2) / (n1 + n2).
COMPUTE SE_phat = SQRT(phat * (1 - phat) * ((1/n1) + (1/n2))).
COMPUTE z = (p1 - p2) /SE_phat.
COMPUTE SIGz_2TL = 2 * (1 - CDFNORM(ABS(z))).
COMPUTE SIGz_LTL = CDFNORM(Z).
COMPUTE SIGz_UTL = 1 - CDFNORM(Z).
COMPUTE ANSWER = {p1, p2, SE_phat, z, SIGz_2TL, SIGz_LTL, SIGz_UTL}.
PRINT ANSWER / FORMAT "F10.5" /Title = "Large sample sig. test for two proportions" / CLABELS = p1, p2, SE_phat, z, SIGz_2TL, SIGz_LTL, SIGz_UTL.
END MATRIX.
!ENDDEFINE.
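*Worked check of the pooled-proportion formulas in the macro above, using made-up values (not from the lab or the book): .
*with n1 = n2 = 100, x1 = 60 and x2 = 50, p1 = 0.6, p2 = 0.5 and phat = 110/200 = 0.55, .
*so the standard error is SQRT(0.55 * 0.45 * (1/100 + 1/100)) = SQRT(0.00495) = 0.07036 .
*and z = (0.6 - 0.5)/0.07036, which is about 1.421.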



DEFINE CLT (variable = !ENCLOSE('(',')') /nsample = !ENCLOSE('(',')') /Npop = !ENCLOSE('(',')') /reps = !ENCLOSE('(',')') ).
!DO !L = 1 !TO !reps.
- TITLE !reps Repeated Samples of size !nsample .
- temporary.
- sample !nsample from !Npop.
- MATRIX.
- GET VARIABLE / VARIABLES = !variable.
- COMPUTE N = NROW(VARIABLE).
- COMPUTE I = MAKE(n,1,1).
- COMPUTE X_BAR = (1/N)*(TRANSPOS(I) * VARIABLE).
- SAVE {X_BAR} / OUTFILE =!CONCAT('"H:\CLT__', !variable, '_sample', !L, '.sav"')  /VARIABLES = X_BAR.
- END MATRIX.
!DOEND.
GET FILE= !CONCAT('"H:\CLT__', !variable, '_sample', '1.sav"').
!DO !J = 2 !TO !reps.
- ADD FILES /FILE=*
/FILE=!CONCAT('"H:\CLT__', !variable, '_sample', !J, '.sav"').
- EXECUTE.
!DOEND.
SAVE / OUTFILE =!CONCAT('"H:\CLT__n', !nsample, !variable, '_sample', 'ALL', !reps, '.sav"') .
TITLE !reps Repeated Samples of size !nsample .
GRAPH /HISTOGRAM=X_BAR /TITLE= 'Histogram of Sample Means from Repeated Samples'.
TITLE !reps Repeated Samples of size !nsample .
DESCRIPTIVES VARIABLES=X_BAR /STATISTICS=MEAN STDDEV MIN MAX .
!ENDDEFINE.
*---- ... to the end of this line -------------------------------------------------------.
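
*Example calls for some of the macros defined above: a sketch with made-up values, .
*not taken from the lab exercises or the book. Each call assumes the definitions above .
*have already been run and that Q:\QUANTS\one.sav can be opened.

*Large-sample test on one mean: expect SE = 1, zi = 2 and SIGz_2TL of about 0.0455.
H_L1M n=(100) x_bar=(52) m=(50) s=(10).

*Small-sample test on one mean: expect SE = 1 and ti = 2, referred to a t distribution with 15 degrees of freedom.
H_S1M n=(16) x_bar=(52) m=(50) s=(4).

*Large-sample test on one proportion: expect p = 0.6, SE_pi = 0.05 and z = 2.
H_L1P n=(100) x=(60) pi=(0.5).

*F-test for equality of variances: s2 is the larger value, so expect Fc = 16/9 = 1.77778 with 9 and 9 degrees of freedom.
H_S2VF n1=(10) n2=(10) s1=(3) s2=(4).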


*=====================================================================.
*LAB COMPUTER SPSS Syntax for Lab 5 of Quantitative Methods 1.
*Hypothesis Tests I.
*(c) Gwilym Pryce 2005.
*=====================================================================.

*Examples and answers are taken from "Inference and Statistics in SPSS" by Gwilym Pryce, GeeBeeJey Publishing.
*SPSS Syntax downloads, copies of the book and other resources can be obtained from www.geebeejey.co.uk, .
*and from the Teaching page of www.gwilympryce.co.uk.

*========================== End of File ===================================.