SPSS-Macro MEDIAN (version 1.0)                          D.Enzmann

The SPSS-macro MEDIAN.SPS computes the median, the first and third
quartiles, the minimum and maximum values, the range, the
interquartile range and the semi-(inter)quartile range of a
variable x. If an optional break variable is specified, the
statistics are computed for each value of the break variable.

The user can specify an optional file name to save the results to
an SPSS file. If a break variable is specified, the results for the
subgroups are saved to the file; otherwise the summary statistics
of x are saved.

The median and the first and third quartiles are computed according
to one of six optional methods. Let n be the number of cases, x_i
the i-th smallest value of x, and p the percentile divided by 100
(i.e. .25 for the first quartile, .50 for the median, and .75 for
the third quartile).

1) Weighted average centered at x_np

   Express np as np = j + g, where j is the integer part of np and
   g is the fractional part of np; then compute

      percentile value = (1-g) x_j + g x_(j+1)

   Note that x_0 is replaced by x_1 (if j=0 the formula would be
   (1-g) x_1 + g x_1).

2) Weighted average centered at x_(n+1)p

   Express (n+1)p as (n+1)p = j + g, where j is the integer part of
   (n+1)p and g is the fractional part of (n+1)p; then compute

      percentile value = (1-g) x_j + g x_(j+1)

   Note that x_(n+1) is replaced by x_n (if j=n the formula would be
   (1-g) x_n + g x_n).
3) Empirical distribution function

   Express np as np = j + g, where j is the integer part of np and
   g is the fractional part of np; then choose the percentile value
   as

      percentile value = x_j       if g=0
      percentile value = x_(j+1)   if g>0

4) Empirical distribution function with averaging (default)

   Express np as np = j + g, where j is the integer part of np and
   g is the fractional part of np; then compute the percentile
   value as

      percentile value = (x_j + x_(j+1))/2   if g=0
      percentile value = x_(j+1)             if g>0

5) Empirical distribution function with interpolation

   Express (n-1)p as (n-1)p = j + g, where j is the integer part of
   (n-1)p and g is the fractional part of (n-1)p; then compute the
   percentile value as

      percentile value = x_(j+1)                           if g=0
      percentile value = x_(j+1) + g(x_(j+2) - x_(j+1))    if g>0

6) Observation closest to np

   Compute j as the integer part of np+0.5; then compute

      percentile value = x_j

Method 2) is the method used by the SPSS procedure FREQUENCIES,
method 4) is the default method used by the macro MEDIAN and by
STATISTICA, and method 5) is the method used by MS-Excel.

A computational alternative to method 5) would be

   rank of first quartile = (n+3)/4
   rank of median         = (n+1)/2
   rank of third quartile = (3n+1)/4

where the rank is the position of a value after the values of
variable x have been sorted into increasing order (the rank of the
lowest value is 1 and the rank of the highest value is n). A
fractional rank is resolved by linear interpolation between the two
neighbouring values.

The differences between the methods are most pronounced in small
data sets and vanish in large data sets. The interquartile range of
method 2) tends to be larger than the interquartile range of method
4), which in turn tends to be larger than the interquartile range
of method 5).

-------------------------------------------------------------------

The syntax of MEDIAN is

   MEDIAN VAR=variable
          [/GROUPS=variable]
          [/METHOD={1}
                   {2}
                   {3}
                   {4**}
                   {5}
                   {6}]
          [/OUTFILE='file name'].

where the number of METHOD refers to the methods described above
(** marks the default).
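
The six methods described above can be sketched outside SPSS as
follows. This Python sketch is not part of the macro MEDIAN.SPS and
makes no claim to reproduce its internals: the function name
percentile, its arguments, and the clamping of all out-of-range
indices to x_1 and x_n are assumptions made for illustration (the
description above only states the replacement of x_0 in method 1
and of x_(n+1) in method 2).

```python
import math


def percentile(values, p, method=4):
    """Illustrative sketch of the six percentile methods (not the macro itself).

    values : iterable of numbers without missing values
    p      : percentile divided by 100, e.g. 0.25, 0.50, 0.75
    method : 1..6, numbered as in the method descriptions
    """
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("no valid values")

    def x(i):
        # 1-based access to the sorted values; indices below 1 or above n
        # are clamped to x_1 and x_n (an assumption beyond the two
        # replacements stated for methods 1 and 2).
        return xs[min(max(i, 1), n) - 1]

    def split(t):
        # Integer part j and fractional part g of t.
        j = math.floor(t)
        return j, t - j

    if method == 1:    # weighted average centered at x_np
        j, g = split(n * p)
        return (1 - g) * x(j) + g * x(j + 1)
    if method == 2:    # weighted average centered at x_(n+1)p
        j, g = split((n + 1) * p)
        return (1 - g) * x(j) + g * x(j + 1)
    if method == 3:    # empirical distribution function
        j, g = split(n * p)
        return x(j) if g == 0 else x(j + 1)
    if method == 4:    # empirical distribution function with averaging
        j, g = split(n * p)
        return (x(j) + x(j + 1)) / 2 if g == 0 else x(j + 1)
    if method == 5:    # empirical distribution function with interpolation
        j, g = split((n - 1) * p)
        # For g == 0 the second term vanishes and x_(j+1) is returned.
        return x(j + 1) + g * (x(j + 2) - x(j + 1))
    if method == 6:    # observation closest to np
        j = math.floor(n * p + 0.5)
        return x(j)
    raise ValueError("method must be an integer from 1 to 6")
```

For the values 1 to 10 this sketch gives first and third quartiles
of 2.75 and 8.25 with method 2, of 3 and 8 with method 4, and of
3.25 and 7.75 with method 5, so the interquartile range shrinks
from 5.5 to 5 to 4.5, in line with the ordering noted above.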
Examples:

1) You want to compute the median and distribution measures of
   variable V6 using the default method "empirical distribution
   function with averaging". The syntax would be

      MEDIAN VAR=v6.

2) You want to compute the median and distribution measures of
   variable V6 for all levels of variable V2 using the default
   method "empirical distribution function with averaging". The
   syntax would be

      MEDIAN VAR=v6 /GROUPS=v2.

3) You want to compute the median and distribution measures of
   variable V6 for all levels of variable V2 by using the method
   "empirical distribution function with interpolation". The syntax
   would be

      MEDIAN VAR=v6 /GROUPS=v2 /METHOD=5.

4) You want to compute the median and distribution measures of
   variable V6 for all levels of variable V2 by using the method
   "empirical distribution function with interpolation" and want to
   save the results to the SPSS file "RESULTS.SAV". The syntax
   would be

      MEDIAN VAR=v6 /GROUPS=v2 /METHOD=5 /OUTFILE='results.sav'.

5) You want to aggregate the median and the distribution measures
   of the variables V1, V2, V3, and V4 to the file "RESULTS.SAV" by
   using the SPSS method used in the procedure FREQUENCIES and with
   listwise deletion of missing values. The syntax would be:

      COUNT nmiss = v1,v2,v3,v4 (MISSING).
      SELECT IF nmiss=0.
      MEDIAN VAR=v1 /METHOD=2 /OUTFILE='RESULT1.SAV'.
      MEDIAN VAR=v2 /METHOD=2 /OUTFILE='RESULT2.SAV'.
      MEDIAN VAR=v3 /METHOD=2 /OUTFILE='RESULT3.SAV'.
      MEDIAN VAR=v4 /METHOD=2 /OUTFILE='RESULT4.SAV'.
      ADD FILES /FILE='RESULT1.SAV'
                /FILE='RESULT2.SAV'
                /FILE='RESULT3.SAV'
                /FILE='RESULT4.SAV'.
      EXE.
      ERASE FILE='RESULT1.SAV'.
      ERASE FILE='RESULT2.SAV'.
      ERASE FILE='RESULT3.SAV'.
      ERASE FILE='RESULT4.SAV'.
      SAVE OUTFILE='RESULTS.SAV'.

-------------------------------------------------------------------

CAVEATS:

1) The macro MEDIAN will erase the files DUMMYTMP.VAR and
   DUMMYTMP.GRP if they exist already in the current directory.
   The macro MEDIAN will also delete the variables DUMMYTMP and
   DUMMYGRP if they exist in the current working file of SPSS.

2) If the macro is loaded via

      INCLUDE FILE='MEDIAN.SPS'.

   the maximum number of loops is set to 12,000. If the variable to
   be analysed has more than 12,000 valid values, the macro may
   fail and SPSS will issue an error message. Thus, after (!)
   loading the macro it might be necessary to set the maximum
   number of loops to a higher value, for example via

      SET MXLOOPS=20000.

3) It is possible to call the macro MEDIAN with a temporary
   selection of cases. However, you must not use a temporary
   re-definition of missing values, nor temporarily recode the
   variables specified. By the way, independently of using the
   macro MEDIAN, SPSS versions 10 to 11.0 may yield unreliable and
   inconsistent results if you use the procedure RANK in a sequence
   of temporary transformations.

4) You should not use the option /OUTFILE of the macro MEDIAN if
   you are working with SPSS version 10, because there is a bug in
   the SAVE function of the procedure MATRIX in that version. This
   problem is solved in SPSS version 11 and does not appear in
   versions 6 to 9.

====================================================================
In case of any comments you may contact the author at:
dirk.enzmann@jura.uni-hamburg.de
====================================================================