Dirk Enzmann - Statistical Software (Some Useful Things)

Below you will find some small executables, SPSS macros and scripts, Excel templates, R functions (see: http://www.r-project.org/), and Stata ado-files that I wrote for special calculations in statistical analyses. The executable programs are written in Pascal 7.0 and run under 16- and 32-bit Windows (3.x, 9x, NT4, XP). The files may be downloaded and distributed without further permission, provided that they remain unchanged. They have been tested and found to be virus-free. The author is not liable for any damages caused by their use. Suggestions for improvement are welcome.

For questions / comments please use the following email address: dirk.enzmann(at)uni-hamburg.de

Name Description Application Download
BetaDiff For calculating confidence intervals and testing the significance of the difference of two beta-coefficients from independent samples (description). Executable BetaDiff.zip
Center For centering a set of variables (with listwise deletion of missing cases); useful for computing products of variables for interaction terms in regression analyses. SPSS center.sps
clstop_lbt Stata module to determine via -cluster stop, rule(lbt)- the number of kmeans clusters (or to determine whether there is more than one kmeans cluster) according to the lower bound technique presented in Steinley & Brusco (2011).
(To install you may copy the .ado- and the .sthlp-file into your "\ado\plus\c\" folder - the recommended method, however, is to enter ssc install clstop_lbt in Stata's command window.)
Stata clstop_lbt.ado
clstop_lbt.sthlp
CorrTot For computing pooled means, standard deviations and a pooled correlation matrix from means, standard deviations and correlation matrices of two independent samples (description). R corrtot.r; Executable CorrTot.zip
CovMat For writing a covariance matrix of a set of variables (with listwise deletion of missing cases) to a text file. SPSS covmat.sps
Crosstabs R function to simulate the SPSS procedure CROSSTABS. R crosstabs.r
DivCat Stata module to calculate five measures of diversity for multiple categories: generalized variance (GV), entropy (H), their normalized counterparts (NGV, NH) (see Budescu & Budescu, 2012), and polarization (RQ) (see Montalvo & Reynal-Querol, 2008). (To install you may copy the contents of the .zip-file into your "\ado\plus\d\" folder - the recommended method, however, is to enter ssc install divcat in Stata's command window.) Stata divcat.zip
dta2sav Stata module to create SPSS syntax and a Stata data file to convert Stata data into SPSS data. Labeled extended missing values are recoded into "numeric" values, which the SPSS syntax created by -dta2sav- then defines as missing. This makes it possible to preserve the labels of missing values as defined in Stata for subsequent use in SPSS.
(To install you may copy the .ado- and the .sthlp-file into your "\ado\plus\d\" folder - the recommended method, however, is to enter ssc install dta2sav in Stata's command window.)
Stata dta2sav.do
dta2sav.sthlp
DumCode For creating dummy variables (indicator coding) of a nominal variable. Useful for regression analyses with independent variables that are categorical. SPSS dumcode.sps
Fa.promax To compute a maximum likelihood factor analysis with varimax and promax rotation; allows specification of promax power and sorting of loadings; output includes the correlation matrix of factors and (optionally) matrices of factor scores. R fa.promax.r
Freq R function to simulate the SPSS procedure FREQUENCIES. R freq.r
Hist.kdnc To plot a histogram overlaid by a kernel density and a normal curve. R hist.kdnc.r
IntGraph Template for drawing interaction plots of a regression equation with interaction term (description). Excel intgraph.zip
Kurtosis To compute the unbiased population estimate or biased sample statistic of kurtosis. R kurtosis.r
LogRegR2 To calculate Chi² model fit and R² analogs (pseudo R²: McFadden's R², Cox & Snell index, Nagelkerke index, McKelvey & Zavoina's R²) of a logistic regression model obtained by glm(..., family = 'binomial'). R LogRegR2.r
MeanSD For interactively computing the mean and standard deviation of a combined sample from up to 50 independent samples. Executable meansd.zip
MeanSDF Same as MeanSD, but for up to 1000 samples and with the input read from a file (description). Executable meansdf.zip
Median For calculating the median and quartiles of a variable (optionally for all values of a break variable) according to one of six different methods (description). SPSS median.sps
MEResc To rescale the results of mixed (multilevel) nonlinear probability models such as xtmelogit, xtlogit, or xtprobit to the same scale as the intercept-only model. This makes it possible to compare regression coefficients or variance components across hierarchically nested models [see: Hox, J. J. (2010). Multilevel Analysis: Techniques and Applications (2nd ed., Chapter 6.5, pp. 133-139). New York: Routledge].
(To install you may copy the .ado-, .mo- and .sthlp-files into your "\ado\plus\m\" folder - the recommended method, however, is to enter ssc install meresc in Stata's command window.)
Stata meresc.zip
Miss2Sys Script to recode all missing values of all numeric variables to system missing values (useful if you want to import an SPSS data file with different missing values into R) (description). SPSS Miss2Sys.sbs
Moments2 To calculate the mean, standard deviation, and different types of skewness and kurtosis (according to Joanes & Gill, 1998) of a list of variables. The defaults are the estimates of skewness and kurtosis used in SAS and SPSS.
(To install you may copy the .ado- and the .hlp-file into your "\ado\plus\m\" folder - the recommended method, however, is to enter ssc install moments2 in Stata's command window.)
Stata moments2.ado
moments2.hlp
nb_adjust For identifying and adjusting (or removing) outliers of a variable assumed to have a negative binomial distribution.
(Requires Stata version 13.1 or higher. To install you may copy all files of the .zip-file starting with "n" into the "\ado\plus\n\" folder and all files starting with "r" into the "\ado\plus\r\" folder - the recommended method, however, is to enter ssc install nb_adjust in Stata's command window.)
Stata nb_adjust.zip
Part_tst For testing the difference between two standardized regression coefficients of the same equation (one sample) (description). SPSS part_tst.zip
PCA To compute a principal components "factor" analysis (PCA) with varimax and promax rotation; different options for the number of components (factors): direct specification, parallel test criteria (random eigenvalues), or minimum eigenvalue; optional specification of promax power, sorting of loadings, and matrices of factor scores (see also: RanEigen and Fa.promax). R pca.r
Plot.fitPNB To plot the proportion of the observed counts and the fitted (expected) probabilities of Poisson and negative binomial distributed counts of a variable. R plot.fitPoisNegb.r
Plot.kdnc To plot a kernel density curve overlaid by a normal curve. R plot.kdnc.r
Plot.power To calculate and plot power of a one sample z-test of a sample mean. R plot.power.r
Plot_Power Create graph to demonstrate power analysis (one-sample z-test of a mean) - see demonstration in pow_demo.do. Stata plot_power.do
pow_demo.do
ProfSim For calculating different measures of profile similarity based on two sets of variables (description: see comments at the end of the macro). SPSS profsim.sps
prop.CI To calculate the confidence interval of a single proportion according to one of eleven methods (see: Brown, Cai, & DasGupta, 2001; Newcombe, 1998) (default: likelihood ratio method) (description: see comments of source file). R prop.CI.r
ex_prop.CI.r
R2_mz To compute McKelvey & Zavoina's Pseudo-R² for multilevel logistic regression, random effects, and fixed effects logit and probit models (see Windmeijer, 1995).
(To install you may copy the .ado-, .mo- and .sthlp-files into your "\ado\plus\r\" folder - the recommended method, however, is to enter ssc install r2_mz in Stata's command window.)
Stata r2_mz.zip
RanEigen For determining the number of components (factors) to retain in a principal component analysis (PCA) by using random eigenvalues (parallel analysis) (APM article describing version 1.0) (how to install RanEigen?). Executable pacrit.zip; R RanEigen.r
Rel_Clust Stata module to compute indices of relative clusterability of a set of variables according to Steinley & Brusco (2008) and to transform a set of variables to z-standardized, range-standardized, or variance-to-range ratio weighted variables for use in (K-means) cluster analysis.
(To install you may copy the .ado- and the .sthlp-file into your "\ado\plus\r\" folder - the recommended method, however, is to enter ssc install rel_clust in Stata's command window.)
Stata rel_clust.ado
rel_clust.sthlp
RelDiff For computing the reliability of a difference score (gain score) according to Zimmerman & Williams (1982). Executable reldiff.zip
Reliability R function to simulate the SPSS procedure RELIABILITY. R reliability.r
r_bis For computing a biserial correlation coefficient and its significance. SPSS r_bis.sps
examp_r.sps
R_Prob For calculating the significance, 95%-confidence interval, and Fisher's Z value of a Pearson correlation coefficient r (given sample size n). Executable r_prob.zip
r_tetra For computing a tetrachoric correlation coefficient and its significance (see also: TetCorr). SPSS r_tetra.sps
examp_r.sps
scores (R) To create scores (min, max, sum, sd, or mean) of variables. The user can specify the minimum number of valid values necessary for the score to be valid. If mean scores are requested it is possible to center them at the overall mean, to transform them to z-scores, or to transform them to POMP (percent of maximum possible) scores. R scores.r
test_sc.r
scores (Stata) To create scores (row-wise) of a set of variables. The user can specify the minimum number of valid values necessary for the score to be valid. The scores created can be: minimum, maximum, total (sum), median, percentile, standard deviation, or mean. If mean scores are requested it is possible to center them at the overall mean or to transform them to z-scores, POMP (percent of maximum possible) scores, the proportion of maximum possible scores, or the shrunken proportion of maximum possible scores.
(To install you may copy the .ado- and the .sthlp-file into your "\ado\plus\s\" folder - the recommended method, however, is to enter ssc install scores in Stata's command window.)
Stata scores.ado
scores.sthlp
sim_BE To simulate series of Bernoulli experiments and plot the cumulative sequence of success rates (optionally including confidence intervals). Stata sim_be.do
be_demo.do
sim_CI To demonstrate the concept of confidence intervals (CIs) by simulation. The program creates (animated) plots of confidence intervals (based on either the t- or the normal distribution) by drawing a user-specified number of samples of user-specified size from a normal distribution with user-specified mu and sigma. Optional output contains sample statistics and the coverage rate of the confidence intervals. R sim_CI.r, CI_demo.r; Stata sim_ci.do, ci_demo.do
Skewness To compute the unbiased population estimate or biased sample statistic of skewness. R skewness.r
SortL To sort rotated factor loadings (pattern matrix) or components previously created by the postestimation command -rotate-. Sorting of loadings or components by size facilitates the interpretation of a factor solution.
(To install you may copy the .ado- and the .hlp-file into your "\ado\plus\s\" folder - the recommended method, however, is to enter ssc install sortl in Stata's command window.)
Stata sortl.ado
sortl.hlp
SPSS2Stata Script for converting an SPSS data file (.sav) into a Stata/SE data file (.dta). The script now supports variable names longer than 8 characters. You may also find the Stata ado -usespss- useful (to install it, enter ssc install usespss in Stata's command window). However, unlike this script and like Stat/Transfer, -usespss- ignores value labels of missing values (description). SPSS spss2stata.sbs
t-Test For testing the difference in means between two independent samples (given the means, standard deviations and sample sizes of both samples) (description). Executable t_test.zip
TabNotes To convert .not-files created by the data entry software EpiData (see: http://www.epidata.dk/index.htm), which contain data entry notes, into a tab-delimited file (for example, to export the notes into an Excel file) (description). Executable TabNotes.zip
TetCorr DOS program and source code (Pascal) for computing a matrix of tetrachoric correlation coefficients of up to 50 variables and a maximum of 8,000 cases (see also: r_tetra) (description). Executable tetcorr.zip
TetVNPos To determine which variables are responsible for a matrix of tetrachoric correlations not being positive definite (dependencies: packages -psych- and -mvtnorm-) R TetVNPos.r
TRd For computing the Satorra-Bentler scaled chi-square difference test (TRd) based on the MLM estimators obtained by Mplus (see: http://www.statmodel.com/chidiff.html). Executable trd.zip
VDef2SPS Script for creating SPSS syntax to define the variables (variable labels, value labels, and missing values) according to the definitions of a specific SPSS data file (*.sav) (description). SPSS VDef2SPS.sbs
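
To illustrate the kind of calculation that tools such as MeanSD perform, the following is a minimal Python sketch of combining the means and standard deviations of several independent samples into those of the pooled sample. It is not the author's code: the function name combine_mean_sd is hypothetical, and the sketch assumes that each standard deviation is a sample standard deviation (n - 1 denominator), which may differ from what the executables use.

```python
import math


def combine_mean_sd(groups):
    """Combine (n, mean, sd) summaries of independent samples.

    Assumption: each sd is a sample standard deviation (n - 1 denominator).
    Returns (total n, mean, sd) of the combined sample.
    """
    total_n = sum(n for n, _, _ in groups)
    if total_n < 2:
        raise ValueError("need at least two cases in total")

    # Combined mean: the sample means weighted by sample size.
    grand_mean = sum(n * m for n, m, _ in groups) / total_n

    # Sum of squares within the samples, recovered from each sd.
    within = sum((n - 1) * sd ** 2 for n, _, sd in groups)

    # Sum of squares between the sample means and the combined mean.
    between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)

    combined_sd = math.sqrt((within + between) / (total_n - 1))
    return total_n, grand_mean, combined_sd


if __name__ == "__main__":
    # Summaries of the samples (1, 2, 3) and (4, 5, 6, 7).
    summaries = [(3, 2.0, 1.0), (4, 5.5, math.sqrt(5 / 3))]
    print(combine_mean_sd(summaries))
```

Applied to the summaries of the samples 1, 2, 3 and 4, 5, 6, 7, the function reproduces the mean and standard deviation of the seven pooled values, because the total sum of squares is the sum of the within-sample and between-sample parts.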

Some other useful things:
(last update: August 11, 2018)