
Dirk Enzmann - Statistical Software (Some Useful Things)

Below you will find some small executables, SPSS macros and scripts, Excel templates, R functions (see: http://www.r-project.org/), and Stata ado-files that I wrote for special calculations in statistical analyses. The executable programs are written in Pascal 7.0 and run under 16- and 32-bit Windows (3.x, 9x, NT4, XP). The files may be downloaded and distributed without further permission, provided that they remain unchanged. They have been tested as virus-free. The author is not liable for any damages caused by their use. Suggestions for improvements are welcome.

For questions / comments please use the following email address: dirk.enzmann(at)uni-hamburg.de

Name	Description	Application	Download
BetaDiff	For calculating confidence intervals and testing the significance of the difference of two beta-coefficients from independent samples (description).	Executable	BetaDiff.zip
Center	For centering a set of variables (with listwise deletion of missing cases); useful for computing products of variables for interaction terms in regression analyses.	SPSS	center.sps
clstop_lbt	Stata module to determine via -cluster stop, rule(lbt)- the number of kmeans clusters (or to determine whether there is more than one kmeans cluster) according to the lower bound technique presented in Steinley & Brusco (2011). (To install you may copy the .ado- and the .sthlp-file into your "\ado\plus\c\" folder - the recommended method, however, is to enter ssc install clstop_lbt in Stata's command window.)	Stata	clstop_lbt.ado clstop_lbt.sthlp
CorrTot	For computing pooled means, standard deviations and a pooled correlation matrix from means, standard deviations and correlation matrices of two independent samples (description).	R Executable	corrtot.r CorrTot.zip
CovMat	For writing a covariance matrix of a set of variables (with listwise deletion of missing cases) to a text file.	SPSS	covmat.sps
Crosstabs	R function to simulate the SPSS procedure CROSSTABS.	R	crosstabs.r
DivCat	Stata module to calculate five measures of diversity for multiple categories: generalized variance (GV), entropy (H), their normalized counterparts (NGV, NH) (see Budescu & Budescu, 2012), and polarization (RQ) (see Montalvo & Reynal-Querol, 2008). (To install you may copy the contents of the .zip-file into your "\ado\plus\d\" folder - the recommended method, however, is to enter ssc install divcat in Stata's command window.)	Stata	divcat.zip
dta2sav	Stata module to create SPSS syntax and a Stata data file for converting Stata data into SPSS data. Extended missing values that are labeled are recoded into "numeric" values, which are then defined as missing by the SPSS syntax created by -dta2sav-. This makes it possible to preserve the labels of missing values as defined in Stata for subsequent use in SPSS. (To install you may copy the .ado- and the .sthlp-file into your "\ado\plus\d\" folder - the recommended method, however, is to enter ssc install dta2sav in Stata's command window.)	Stata	dta2sav.do dta2sav.sthlp
DumCode	For creating dummy variables (indicator coding) of a nominal variable. Useful for regression analyses with independent variables that are categorical.	SPSS	dumcode.sps
Fa.promax	To compute a maximum likelihood factor analysis with varimax and promax rotation; allows specification of the promax power and sorting of loadings; output includes the correlation matrix of the factors and (optionally) matrices of factor scores.	R	fa.promax.r
Freq	R function to simulate the SPSS procedure FREQUENCIES.	R	freq.r
Hist.kdnc	To plot a histogram overlaid by a kernel density and a normal curve.	R	hist.kdnc.r
IntGraph	Template for drawing interaction plots of a regression equation with interaction term (description).	Excel	intgraph.zip
Kurtosis	To compute the unbiased population estimate or biased sample statistic of kurtosis.	R	kurtosis.r
LogRegR2	To calculate Chi² model fit and R² analogs (pseudo R²: McFadden's R², Cox & Snell index, Nagelkerke index, McKelvey & Zavoina's R²) of a logistic regression model obtained by glm(..., family = 'binomial').	R	LogRegR2.r
MeanSD	For computing interactively the mean and standard deviation of a combined sample from up to 50 independent samples.	Executable	meansd.zip
MeanSDF	Same as MeanSD, but for up to 1000 samples, with the data read from an input file (description).	Executable	meansdf.zip
Median	For calculating the median and quartiles of a variable (optionally for all values of a break variable) according to one of six different methods (description).	SPSS	median.sps
MEResc	To rescale the results of mixed (multilevel) nonlinear probability models such as xtmelogit, xtlogit, or xtprobit to the same scale as the intercept-only model. This makes it possible to compare regression coefficients or variance components across hierarchically nested models [see: Hox, J. J. (2010). Multilevel Analysis: Techniques and Applications (2nd ed., Chapter 6.5, pp. 133-139). New York: Routledge]. (To install you may copy the .ado-, .mo- and .sthlp-files into your "\ado\plus\m\" folder - the recommended method, however, is to enter ssc install meresc in Stata's command window.)	Stata	meresc.zip
Miss2Sys	Script to recode all missing values of all numeric variables to system missing values (useful if you want to import an SPSS data file with different missing values into R) (description).	SPSS	Miss2Sys.sbs
Moments2	To calculate the mean, standard deviation, and different types of skewness and kurtosis (according to Joanes & Gill, 1998) of a list of variables. The defaults are the estimates of skewness and kurtosis used in SAS and SPSS. (To install you may copy the .ado- and the .hlp-file into your "\ado\plus\m\" folder - the recommended method, however, is to enter ssc install moments2 in Stata's command window.)	Stata	moments2.ado moments2.hlp
nb_adjust	For identifying and adjusting (or removing) outliers of a variable assumed to have a negative binomial distribution. (Requires Stata version 13.1 or higher. To install you may copy all files of the .zip-file starting with "n" into the "\ado\plus\n\" folder and all files starting with "r" into the "\ado\plus\r\" folder - the recommended method, however, is to enter ssc install nb_adjust in Stata's command window.)	Stata	nb_adjust.zip
Part_tst	For testing the difference between two standardized regression coefficients of the same equation (one sample) (description).	SPSS	part_tst.zip
PCA	To compute a principal components "factor" analysis (PCA) with varimax and promax rotation; different options for the number of components (factors): direct specification, parallel test criteria (random eigenvalues), or minimum eigenvalue; optional specification of promax power, sorting of loadings, and matrices of factor scores (see also: RanEigen and Fa.promax).	R	pca.r
Plot.fitPNB	To plot the proportion of the observed counts and the fitted (expected) probabilities of Poisson and negative binomial distributed counts of a variable.	R	plot.fitPoisNegb.r
Plot.kdnc	To plot a kernel density curve overlaid by a normal curve.	R	plot.kdnc.r
Plot.power	To calculate and plot power of a one sample z-test of a sample mean.	R	plot.power.r
Plot_Power	To create a graph that demonstrates power analysis (one-sample z-test of a mean) - see the demonstration in pow_demo.do.	Stata	plot_power.do pow_demo.do
ProfSim	For calculating different measures of profile similarity based on two sets of variables (description: see comments at the end of the macro).	SPSS	profsim.sps
prop.CI	To calculate the confidence interval of a single proportion according to one of eleven methods (see: Brown, Cai, & DasGupta, 2001; Newcombe, 1998) (default: likelihood ratio method) (description: see comments of source file).	R	prop.CI.r ex_prop.CI.r
R2_mz	To compute McKelvey & Zavoina's Pseudo-R² for multilevel logistic regression, random effects, and fixed effects logit and probit models (see Windmeijer, 1995). (To install you may copy the .ado-, .mo- and .sthlp-files into your "\ado\plus\r\" folder - the recommended method, however, is to enter ssc install r2_mz in Stata's command window.)	Stata	r2_mz.zip
RanEigen	For determining the number of components (factors) to retain in a principal component analysis (PCA) by using random eigenvalues (parallel analysis) (APM article describing version 1.0) (how to install RanEigen?).	Executable R	pacrit.zip RanEigen.r
Rel_Clust	Stata module to compute indices of relative clusterability of a set of variables according to Steinley & Brusco (2008) and to transform a set of variables to z-standardized, range-standardized, or variance-to-range ratio weighted variables for use in (K-means) cluster analysis. (To install you may copy the .ado- and the .sthlp-file into your "\ado\plus\r\" folder - the recommended method, however, is to enter ssc install rel_clust in Stata's command window.)	Stata	rel_clust.ado rel_clust.sthlp
RelDiff	For computing the reliability of a difference score (gain score) according to Zimmerman & Williams (1982).	Executable	reldiff.zip
Reliability	R function to simulate the SPSS procedure RELIABILITY.	R	reliability.r
r_bis	For computing a biserial correlation coefficient and its significance.	SPSS	r_bis.sps examp_r.sps
R_Prob	For calculating the significance, 95%-confidence interval, and Fisher's Z value of a Pearson correlation coefficient r (given sample size n).	Executable	r_prob.zip
r_tetra	For computing a tetrachoric correlation coefficient and its significance (see also: TetCorr).	SPSS	r_tetra.sps examp_r.sps
scores (R)	To create scores (min, max, sum, sd, or mean) of variables. The user can specify the minimum number of valid values necessary for the score to be valid. If mean scores are requested it is possible to center them at the overall mean, to transform them to z-scores, or to transform them to POMP (percent of maximum possible) scores.	R	scores.r test_sc.r
scores (Stata)	To create scores (row-wise) of a set of variables. The user can specify the minimum number of valid values necessary for the score to be valid. The scores created can be: minimum, maximum, total (sum), median, percentile, standard deviation, or mean. If mean scores are requested it is possible to center them at the overall mean or to transform them to z-scores, POMP (percent of maximum possible) scores, the proportion of maximum possible scores, or the shrunken proportion of maximum possible scores. (To install you may copy the .ado- and the .sthlp-file into your "\ado\plus\s\" folder - the recommended method, however, is to enter ssc install scores in Stata's command window.)	Stata	scores.ado scores.sthlp
sim_BE	To simulate series of Bernoulli experiments and plot the cumulative sequence of success rates (optionally including confidence intervals).	Stata	sim_be.do be_demo.do
sim_CI	To demonstrate the concept of confidence intervals (CIs) by simulation. The program creates (animated) plots of confidence intervals (employing either t- or normal-distribution) by drawing a user specified number of samples of user specified size from the normal distribution with user specified mu and sigma. Optional output contains sample statistics and coverage rate of confidence intervals.	R Stata	sim_CI.r CI_demo.r sim_ci.do ci_demo.do
Skewness	To compute the unbiased population estimate or biased sample statistic of skewness.	R	skewness.r
SortL	To sort rotated factor loadings (pattern matrix) or components previously created by the postestimation command -rotate-. Sorting of loadings or components by size facilitates the interpretation of a factor solution. (To install you may copy the .ado- and the .hlp-file into your "\ado\plus\s\" folder - the recommended method, however, is to enter ssc install sortl in Stata's command window.)	Stata	sortl.ado sortl.hlp
SPSS2Stata	Script for converting an SPSS data file (.sav) into a Stata/SE data file (.dta). The script now supports variable names longer than 8 characters. Nevertheless, you may find the Stata ado -usespss- useful, too (to install enter ssc install usespss in Stata's command window). However, in contrast to this script, and like StatTransfer, -usespss- ignores value labels of missing values (description).	SPSS	spss2stata.sbs
t-Test	For testing the difference in means between two independent samples (given means, standard deviations and sample sizes of both samples) (description).	Executable	t_test.zip
TabNotes	To convert .not-files created by the data entry software EpiData (see: http://www.epidata.dk/index.htm), which contain data entry notes, into a tab-delimited file (for example, to export the notes into an Excel file) (description).	Executable	TabNotes.zip
TetCorr	DOS program and source code (Pascal) for computing a matrix of tetrachoric correlation coefficients of up to 50 variables and a maximum of 8,000 cases (see also: r_tetra) (description).	Executable	tetcorr.zip
TetVNPos	To determine which variables are responsible for a matrix of tetrachoric correlations not being positive definite (dependencies: packages -psych- and -mvtnorm-).	R	TetVNPos.r
TRd	For computing the Satorra-Bentler scaled chi-square difference test (TRd) based on the MLM estimators obtained by Mplus, see: http://www.statmodel.com/chidiff.html.	Executable	trd.zip
VDef2SPS	Script for creating SPSS syntax to define the variables (variable labels, value labels, and missing values) according to the definitions of a specific SPSS data file (*.sav) (description).	SPSS	VDef2SPS.sbs
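
As an illustration of the kind of calculation MeanSD performs (the mean and standard deviation of a combined sample from several independent samples), here is a minimal Python sketch. It is not the source of the executable and may differ from it in detail. It assumes that each sample's standard deviation was computed with the n - 1 denominator; the function name combine_mean_sd and its input format are hypothetical.

```python
import math


def combine_mean_sd(samples):
    """Combine independent samples given as (n, mean, sd) tuples.

    Assumption: each sd is the sample standard deviation (n - 1 denominator).
    Returns (total_n, combined_mean, combined_sd).
    """
    samples = list(samples)
    total_n = sum(n for n, _, _ in samples)
    if total_n < 2:
        raise ValueError("at least two cases in total are required")

    # The combined mean is the mean of the sample means, weighted by n.
    grand_mean = sum(n * m for n, m, _ in samples) / total_n

    # Total sum of squares = within-sample part + between-sample part.
    sum_sq = sum(
        (n - 1) * sd ** 2 + n * (m - grand_mean) ** 2
        for n, m, sd in samples
    )
    return total_n, grand_mean, math.sqrt(sum_sq / (total_n - 1))


# Usage: two samples with n = 3 and n = 4 cases.
n, mean, sd = combine_mean_sd([(3, 2.0, 1.0), (4, 5.5, math.sqrt(5 / 3))])
print(n, round(mean, 4), round(sd, 4))
```

Because the total sum of squares is split exactly into a within-sample and a between-sample part, the result equals the mean and standard deviation computed from the raw data of all samples taken together.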

Some other useful things:

A very useful utility is the "real-time codebook" ViewSav, written by Karel Asselberghs, which lets you view the variables of SPSS and Stata data files including labels and basic statistics, see: http://www.asselberghs.nl/stuff.htm
For an extremely useful source of SPSS macros see: http://www.spsstools.net

(last update: August 11, 2018)