1. Start Stata (e.g., click on “Intercooled Stata”)
2. In the command line type: set mem 5000k (then press “enter”)
Note: for very large files: set mem 500000k (press “enter”); Stata 12 and later manage memory automatically and ignore set mem
then type: set matsize 150 (press “enter” – allows models with up to 150 independent variables)
3. Click on “file” (upper left corner)
4. Click on “open”
5. Click on “down arrow” to use either the “c” or “a” drives
6. Click so that the desired drive reveals its files
7. Click on the file you want to load
8. To execute an operation (e.g., a regression) type the following in the
command line: regress (now click on the variable names as listed on the left side of the screen, beginning with the dependent variable). Then press “enter.” In regression and in dichotomous logit/probit a constant is automatically included. In older versions of Stata, replacing regress with “fit” delivers many useful diagnostics.
Other estimators: logit (for the odds ratio, instead of the log of the odds ratio that logit yields, replace logit with logistic), probit (for marginal effects replace probit with dprobit in older versions, or in Stata 11 and later run probit and then type margins, dydx(*)), oprobit, ologit (ordered probit/logit), mlogit (multinomial logit), nlogit (nested logit) and tobit.
If using tobit, after the last independent variable type a comma and the letters ll and ul (e.g., tobit ratio ada pover, ll ul). This censors the model at the lower and upper limits (i.e., uses the lowest and highest observed values as the censoring points). You can also select a specific censoring point [e.g., ll(17)]. If censoring is only on one side, you may use just one censoring point (e.g., tobit ratio ada pover, ll).
The quadchk command checks the quadrature approximation used by random-effects panel models such as xttobit (if its results differ greatly from the original estimates, you probably shouldn’t trust those estimates); it does not apply to an ordinary tobit.
After probit or logit commands you can suppress the iteration (convergence) output by placing “nolog” after a comma at the end of the command line: probit vote ada99 bush00, nolog
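A minimal do-file sketch of the estimators listed above. It assumes Stata 11 or later (for margins) and uses Stata's bundled auto dataset in place of the author's variables, so the variable names here are illustrative only.

```stata
* Illustration on Stata's bundled auto data, not the author's variables
sysuse auto, clear

regress mpg weight foreign            // OLS; the constant is added automatically
logit foreign mpg weight, nolog       // coefficients are changes in the log odds
logistic foreign mpg weight           // same model, reported as odds ratios
probit foreign mpg weight, nolog
margins, dydx(*)                      // average marginal effects after probit
tobit mpg weight, ll(17)              // left-censored at a chosen point, 17
tobit mpg weight, ll ul               // censored at the observed minimum and maximum
```

Running the logit and logistic lines back to back is a quick way to see that they fit the same model and differ only in how the results are reported.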
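Long Command Lines: In a do-file, if the command you are entering is so long that it will
not fit on one line, type /// at the end of the first line and continue the command on the next line. The /// continuation works only in do-files; in the interactive command window simply keep typing, since the window accepts long lines.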
Reconfigure Screen: (1) click on “Prefs”; (2) click on “Manage Preferences”; (3)
click on “Load Preferences”; (4) click on “Factory Settings.” If something doesn’t appear as you expect, just run several equations and it should clear up. You may need to exit and then go back into Stata.
Place STATA Results in a Word File: Using the mouse, highlight the Stata
results you want to transfer into Word. Then use “ctrl c” (control c) to copy the Stata results. Go into Word. Use “ctrl v” (control v) to paste the Stata results into Word. You can line up the Stata results by setting the font to “Courier New” 9 (i.e., 9 point): highlight the Stata results you have placed in Word and, with the highlight on, change the font. Now save as a normal Word file.
Command File or “Do File” – Creating the file: (1) Go into Stata; (2) load a
dataset; (3) Click on “Window” (at the top of the screen); (4) Click on “Do-File”; (5) Click on “New Do-File”; (6) Write the commands that you want to save (you can include loading the data file) – for example:
probit grh85 ccus86 par86 medinc85
probit grh87 ccus88 par87 medinc87
Note: instead of loading a Stata file with a “use” command, you could input an Excel spreadsheet that had been saved as a “tab delimited” text file with:
insheet using "c:/taxreceiveratiodata.txt" (the file name doesn’t appear to be case sensitive; in Stata 13 and later, import delimited does the same job)
Further Note: for commands requiring more than one line, type /// at
the end of the first line and continue the command on the line below.
(7) save by clicking on “File” under “Untitled” in the middle of the screen
To make changes in your “Do-File” or to use one or more commands out of the file do the following: (1) Click on “Window” (at the top of the screen); (2) Click on “Do-File”; (3) click on “New Do-File”; (4) Click on “File” under “Untitled” in the middle of the screen; (5) Click on “Open” and double click on the Do-File you want, and the commands in the file will appear – you can either change/add commands or highlight an existing command, “copy” it and then “paste” it into the Stata command line.
To run an entire “Do-File”: (1) Click on File in the upper left side of the screen; (2) Click on “Do” and then double click on the Do-File you want to run.
Show Last Command: “page up” key (“page down” for reverse direction)
Sampling: to randomly select 10.5% of the cases from your dataset type:
sample 10.5 You can save this smaller sample (e.g., for “Small Stata”).
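Because sample drops the unselected cases from memory, a sketch like the following sets a seed so the draw can be repeated and wraps the draw in preserve/restore. It uses the bundled auto data, and the seed value 12345 is an arbitrary assumption.

```stata
sysuse auto, clear
set seed 12345        // hypothetical seed; any fixed number makes the draw repeatable
preserve
sample 10.5           // keep a random 10.5% of the observations
count
restore               // bring back all 74 observations
sample 10, count      // or keep exactly 10 observations instead of a percentage
```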
Select Cases by Scores on a Variable: logit nafta avmich if divrk>25 (only
uses cases where “divrk” is greater than 25; use >=25 for 25 or greater). When selecting cases equal to a particular value (e.g., a particular year) you need to use two consecutive equal signs. Thus to list the scores on variable “race” for 1990 type: list race if year==1990. To select on two variables at the same time use “&”: logit nafta avmich if party==1 & south==0. Missing data are a great problem with this procedure: Stata treats a missing value as larger than any number, so divrk>25 also selects cases where divrk is missing unless you add & !missing(divrk). The if qualifier always goes before the comma that introduces the options. For example, to specify lower and upper censoring points in tobit and use only observations where variable “state” = 1:
tobit ratio ada85 par85 if state==1, ll ul
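The missing-value trap can be seen directly in Stata's bundled auto data, where rep78 has 5 missing values. This sketch uses auto rather than the author's variables.

```stata
sysuse auto, clear
* missing values count as larger than any number
count if rep78 > 3                         // 34: repair records of 4 or 5, plus the 5 missing cases
count if rep78 > 3 & !missing(rep78)       // 29: repair records of 4 or 5 only
* the if qualifier goes before the comma that starts the options
tobit mpg weight if foreign == 0, ll ul
```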
You can also select cases by scores on a variable using the “keep” command prior to statistical analysis. For example, to select cases scoring “2” on variable “brown” (with possible scores of 1, 2 and 3) you could use the following command: keep if (brown==2). To use cases with scores of 1 or 2 on brown type: keep if brown==1 | brown==2 (or keep if inlist(brown, 1, 2)); note that keep if (brown==1 & 2) keeps only the cases scoring 1. To use cases with a score of less than 10,000 on a continuous variable “income” type:
keep if (income <10000) or for 10,000 and less type:
keep if (income <=10000). Make sure you don’t save the data, because you’ll lose all the dropped observations. If you are using a “do” file, you can prevent permanent loss of data by typing preserve before the keep command and restore after the analysis, which reinstates the original dataset.
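A sketch of the preserve/keep/restore pattern, using the bundled auto data in place of the author's variable “brown”:

```stata
sysuse auto, clear
preserve                            // snapshot of the full dataset
keep if inlist(rep78, 1, 2)         // rep78 equal to 1 or 2 (not rep78 == 1 & 2)
regress price mpg
restore                             // the dropped observations come back
count
```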
Select Cases by Observation: In a data set with 100 observations, to use
observations 1, 25-29 and 34-100, drop the later block first, because Stata renumbers the remaining observations after each drop: drop in 30/33 (press “enter”)
drop in 2/24 (press “enter”) and then run the regression
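A small simulated check of the drop order. The variable id is a hypothetical helper that records each observation's original position.

```stata
clear
set obs 100
gen id = _n                  // remember each observation's original position
drop in 30/33                // drop the later block first...
drop in 2/24                 // ...so these numbers still refer to the original rows
list id in 1/7
```

A version that does not depend on order at all is keep if id == 1 | inrange(id, 25, 29) | inrange(id, 34, 100).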
Deleting Observations from a Dataset: Read the dataset into Stata. In a
dataset in which “year” was a variable and I wanted to take data from the years 1985, 1987, 1988, 1991, 1992, 1995, 1996, 2000 and 2005 from a dataset that was annual from 1880 to 2008 I did the following:
drop if year<1985
drop if year==1986
drop if year==1989
drop if year==1990
You get the picture. To drop a run of consecutive years (e.g., 1989 and 1990) in one command, use a range: drop if inrange(year, 1989, 1990). When you’ve deleted all the years you don’t want you will be left with those you do want. Then, using the Data Editor, you can cut and paste the new dataset into Excel. You might “google” both “drop” and “keep” in Stata.
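Going the other way, keep with inlist() can select the wanted years in a single command. This sketch simulates an annual 1880-2008 dataset rather than using the author's data.

```stata
clear
set obs 129
gen year = 1879 + _n          // annual data, 1880 through 2008
keep if inlist(year, 1985, 1987, 1988, 1991, 1992, 1995, 1996, 2000, 2005)
list year
```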
Stacking Data – How to Change it: To change a dataset stacked by state (e.g.,
observations 1-20 are 20 consecutive annual observations on state #1 with observation 21 being the first observation on state #2) to a dataset stacked by year (e.g., observations 1-50 are the scores for 1985 on each of the 50 states – stnum = state number): sort year stnum
Transpose Rows and Columns: xpose, clear Since the xpose command
eliminates letter names for variables, add the varname option (xpose, clear varname), which stores the original variable names in a new variable called _varname. Alternatively, you could use the xpose command on a backup file while keeping a primary file containing the variable names, then transpose the variable name column or row in Excel and cut and paste the variable names into the xpose file.
Show scores on a particular observation: to show the score on variable dlh for
the 19th state (stcode is the variable name) and the year 1972 (year as the variable name) type: list dlh if stcode==19 & year==1972
Entering a Series of Independent Variables: You can enter a series of
consecutive independent variables by joining the first and last of them with a hyphen. For example, suppose you have 50 state dummy variables that appear in consecutive order in your Stata dataset (e.g., ala, ak, etc. through wy). Instead of entering each independent variable by name, you could type: ala-wy and every variable from “ala” through “wy” (in dataset order) would be entered.
String Variables to Numeric Variables: Stata cannot use variables which
appear in the Data Editor in “red” (i.e., string variables – letters or numbers that are in red) in numerical analysis. To convert a “red”/string variable, d3, to a numeric variable (also named “d3”), convert the string variable into a new variable (“place” in the example ahead), drop d3, and rename place as d3. The word “force” is an option, not a variable name; it converts any entries that are not numbers into missing values. Proceed as follows:
destring d3, generate(place) force
drop d3
rename place d3
(destring d3, replace force does the same thing in one step.)
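A sketch of the one-step version on three hypothetical string entries, one of which is not a number:

```stata
clear
input str5 d3
"12"
"7.5"
"n/a"
end
destring d3, replace force     // "n/a" cannot be read as a number, so it becomes missing
list d3
```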
Mathematical Procedures: addition +; subtraction -; multiplication *; division /; exponentiation ^
Recoding a Variable: To convert a score of 1 into 2 and vice versa on variable
grhcum type: recode grhcum (1=2) (2=1) You can also make multiple nonconsecutive changes in one command. For example, if you have a variable “vote” with 4 categories of responses (0, 1, 2, and 3) and want 0, 1 and 3 to read as 0 while 2 reads as 1, type the following: recode vote (3=0) (1=0) (2=1) To combine two consecutive categories use “/”; thus to get 1 and 2 read as “1” type: recode cons (1/2 = 1). To recode a variable that ranged from 0 to 1 (and assumed any value in between) into just 0 and 1, I typed the following:
recode demcont (.0001/.5 = 0) (.5001/1.0 = 1) (values between .5 and .5001 would be left unchanged; an alternative with no gaps is gen demcont01 = (demcont > .5) if !missing(demcont)). Note: the recode command does not recognize “<” and “>”; use the keywords min and max instead [e.g., (min/.5 = 0)]. To create and recode a variable that comprises several variables (e.g., Democratic control includes both houses of the legislature plus the governorship):
gen lhdempc= lhdempro
recode lhdempc (.0000/.5 = 0) (.5001/1 = 1)
gen uhdempc= uhdempro
recode uhdempc (.0000/.5 = 0) (.5001/1 = 1)
gen demcont = lhdempc + uhdempc + demgov
recode demcont (.0001/2.999 = 0) (3 = 1)
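The same Democratic-control indicator can also be sketched with a single logical expression, which avoids the gaps between recode ranges. The values below are hypothetical, and the coding assumes “majority” means strictly more than .5, as in the recodes above.

```stata
clear
input lhdempro uhdempro demgov
.60 .55 1
.60 .40 1
.50 .70 1
.70 .80 0
end
* 1 only when both chambers are more than half Democratic and the governor is a Democrat
gen byte demcont = (lhdempro > .5 & uhdempro > .5 & demgov == 1) ///
    if !missing(lhdempro, uhdempro, demgov)
list
```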
To recode nonconsecutive numbers use multiple commands. In creating state dummy variables from a state number variable (“stnum” coded 1 through 50) I did the following: gen al=stnum then with a second command: recode al (1=1) (else=0) To recode a percentage variable, “cons,” into three categories (e.g., 0-33=1, etc.) type: recode cons (0/33=1) (34/66=2) (67/100=3) (this assumes whole-number scores; a score such as 33.5 would be left unchanged) or you can accomplish the same operation as follows: gen cons1 = 1 if cons < 34 (press “enter”)
replace cons1 = 2 if cons >= 34 & cons < . (press “enter”)
replace cons1 = 3 if cons >= 67 & cons < . (press “enter”) (double check which way the inequality signs point – think through whether it should be “<” or “>” – and remember that Stata treats missing values as larger than any number, which is why “& cons < .” is needed)
Note: if you are selecting a single value you need two
consecutive equal signs. Thus, gen cons1=1 if cons==34 (would be used if you wanted a score of “1” on cons1 for cases scoring 34 on cons).
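A sketch of the three-category recode on hypothetical data, including a missing score. It also shows tabulate with the generate() option, which creates one dummy per state in a single command; the prefix st is an arbitrary choice.

```stata
clear
input stnum cons
1 10
1 34
2 50
2 67
3 90
3 .
end
gen cons1 = 1 if cons < 34
replace cons1 = 2 if cons >= 34 & cons < 67
replace cons1 = 3 if cons >= 67 & cons < .      // "& cons < ." keeps missing as missing
* one dummy per state: st1 = (stnum==1), st2 = (stnum==2), ...
tabulate stnum, generate(st)
list
```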
Recoding to Percentiles: There is a command called the “xtile” command that will recode the observations based on which percentile (ranges) of a distribution the data are in.
xtile lownetinc5 = networthlow_06, nq(5)
Here the new variable is “lownetinc5”, the old variable is “networthlow_06”, and nq(5) requests 5 categories (quintiles); change the number in nq() for a different number of categories.
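A sketch on ten hypothetical net-worth values, which xtile splits into five groups of two:

```stata
clear
set obs 10
gen networth = _n * 1000             // hypothetical values 1000, 2000, ..., 10000
xtile networth5 = networth, nq(5)    // quintile group, 1 (lowest) to 5 (highest)
list networth networth5
```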
Absolute Values: to recode a variable with positive and negative values
to all positive values type: gen margin= abs(diff) This converted
negative values on “diff” to all positive for new variable “margin.”
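Converting electoral “margin” into two categories (3% or less = 1 and greater than 3% = 0), I did the following: (1) gen margin3less=.
(2) replace margin3less=1 if margin<.03001
(3) replace margin3less=0 if margin>.03 & margin<. (the “& margin<.” keeps cases with a missing margin from being coded 0, since Stata treats missing values as larger than any number)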
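Percentage Change Variable:
tsset stnum year, yearly (declares the data to be yearly time series for each state)
gen income = (pcdinc - L.pcdinc)/L.pcdinc
(The lag operator L. respects the panel structure, so each state’s first year is set to missing; the explicit subscript pcdinc[_n-1] would instead borrow the last year of the previous state.)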
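A two-state, two-year sketch with hypothetical income figures, contrasting the lag operator with an explicit subscript (the variable named wrong is there only for comparison):

```stata
clear
input stnum year pcdinc
1 2000 100
1 2001 110
2 2000 200
2 2001 150
end
tsset stnum year, yearly
gen income = (pcdinc - L.pcdinc)/L.pcdinc             // respects state boundaries
gen wrong  = (pcdinc - pcdinc[_n-1])/pcdinc[_n-1]     // explicit subscript does not
list
```

In the third row, income is missing (state 2 has no earlier year) while wrong compares state 2 in 2000 with state 1 in 2001.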
Percentage Change Over Time:
For a dataset stacked by state and year (obs. 1-24 were for state #1, with
obs. 25 being the first year for state #2) to get the percentage change in population (popi) over presidential administrations beginning in 1981 and ending with 2004 (i.e., change over 2001-2004) I did the following (after the data were “tsset”):
gen pchange = (f3.popi-popi)/popi
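list pchange if inlist(year, 1981, 1985, 1989, 1993, 1997, 2001)
(writing year == 1981 & 1985 & 1989 ... would select only 1981, because each bare number after & is simply treated as “true”)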
Square Root: to create spop, the square root of variable pop, type: gen spop = sqrt(pop)
(no space between sqrt and (pop))
Logarithms: to convert variable X to a log: gen lnx = ln(x)
Descriptive Statistics – type: summarize cons party stinc (to get the median –
i.e., 50th percentile – type: summarize cons party stinc, detail)
If you have missing data and want the statistics based on only those cases used in a regression do the following: (1) run the regression/logit, etc.; (2) after receiving the results type the following in the command line: estat summarize
Z Scores: to standardize cons type: egen consst = std(cons) (i.e., the new variable name is consst), or: (1) download the “center” command: ssc install center (2) to standardize cons type: center cons (which produces a new variable called c_cons) (3) then type: summarize cons (to obtain the standard deviation) (4) then type: gen consst=c_cons/standard deviation from step 3
Frequencies: to obtain the frequencies on one variable (e.g. ada94) type:
tabulate ada94 You can add the mean and standard deviation by:
tabulate ada94, sum(ada94) You can obtain frequencies over a limited
range of observations (e.g., 1 through 10) by:
tabulate ada94 in 1/10
Addition/Summation Over Time: given a dataset stacked by state (e.g., obs.
1-47 are 47 consecutive years for Alabama, with obs. 48 being the first year for Alaska, etc.), with variable year giving the year and stnum the number of the state, to find the mean on variable lhdem over the 1985-2002 period for each state type:
tabulate stnum if year>1984 & year<2003, summarize(lhdem)
Note: If you want to make every observation for a particular year have the average value, 63.21, for variable statepop for that year type:
gen statepop = 63.21 if year==1985
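Rather than typing each year's average by hand, egen can compute it for every observation. A sketch with hypothetical numbers:

```stata
clear
input stnum year pop
1 1985 50
2 1985 70
1 1986 60
2 1986 80
end
bysort year: egen statepop = mean(pop)   // each observation gets its year's average
list
```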
Cross Tabulation and Measures of Association:
type: tabulate grh85 par85, row column all (“row” and
“column” yield row and column percentages; “all” yields the chi-square tests and measures of association – Kendall’s tau-b, gamma and Cramer’s V; you can ask for either row or column percentages or, as above, both; if you want Fisher’s exact test, add “exact” after “all”). If an error message says “too many values” you may need to recode one or both variables. For a three-variable table either: tabulate tax1 cons1 if party==1, row column all
or “sort” by the control variable. For example, to cross-tabulate grh85 and grh87 controlling for party type: sort par85 (press “enter”)
by par85: tabulate grh85 grh87, row column all exact (press “enter”)
Correlation: correlate tax cons (to correlate tax and cons – can add more variables)
Partial Correlation: pcorr tax cons party stinc
Kendall’s tau: ktau tax cons (can add more variables)
Spearman rank correlation: spearman tax cons (can add more variables)
Gamma: see “Cross Tabulation and Measures of Association” above, or tabulate
tax cons, gamma (or, abbreviated, tab tax cons, gam)
Note: you can only use two variables at a time and you may need to recode before obtaining a gamma statistic – you can have 5 categories per variable, but I don’t know how many more categories are allowed. If you use the procedure listed at the beginning of “Cross Tabulation and Measures of Association” you can avoid recodes.
Cronbach’s Alpha: Cronbach’s Alpha examines reliability by determining the
internal consistency of a test or the average correlation of items (variables) within the test. In Stata, the alpha command conducts the reliability test. For example, suppose you wish to test the internal reliability of ten variables, v1 through v10. You could run the following:
alpha v1-v10, item In this example, the item option displays the effects of removing each item from the scale. Cronbach’s alpha also tells you whether a group of items can reasonably be thought to form an index/scale. For example: alpha a3e a3g a3j a3o, c i (c, for casewise, uses only cases with no missing items; i is the item option). The “alpha” score in the “Test scale” row (“alpha” is the far right column and “Test scale” is a row) should be about .80 (the maximum is 1.0) to show a high degree of reliability of the components. However, William Jacoby said the .80 threshold is very high. He would go as low as .70 (but would never use a scale with a reliability below .5, because you’d have more error variance than substantive variance). If the variables are measured on different scales you may want to standardize them. If so, add “s” to the above command (i.e., alpha a3e a3g a3j a3o, c i s). Since the entry in the “alpha” column of each item’s row is what the “Test scale” alpha would be if that item were deleted, you can raise the scale’s alpha by deleting any item whose entry is greater than the alpha in the “Test scale” row. You can make the scale into a variable by typing: alpha a3e a3g a3j a3o, c gen(anscale) Note: “anscale” is arbitrary (you can pick any name you want – this will now appear as a variable). If you want to exclude respondents with a particular score on a variable (e.g., using scores 1 and 2 on variable “petition” but excluding 3), do the following: alpha a3e a3g a3j a3o if petition==1 | petition==2, c i s For a better understanding see “Intermediate Social Statistics: Lecture 6. Scale Construction” by Thomas A.B. Snijders – saved as a PDF file: StataMokkenCronbach.
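To see what the alpha command computes, the unstandardized coefficient can be reproduced by hand from the formula alpha = k/(k-1) × (1 − sum of item variances / variance of the summed scale). This sketch uses four positively related variables from the bundled auto data as stand-in “items”, and the asis option so that alpha does not reverse any item's sign.

```stata
sysuse auto, clear
alpha weight length turn displacement, asis
scalar a_cmd = r(alpha)

* by hand: alpha = k/(k-1) * (1 - sum of item variances / variance of the total)
local k 4
scalar sumvar = 0
foreach v of varlist weight length turn displacement {
    quietly summarize `v'
    scalar sumvar = sumvar + r(Var)
}
gen double total = weight + length + turn + displacement
quietly summarize total
scalar a_hand = (`k'/(`k'-1)) * (1 - sumvar/r(Var))
display "alpha command: " a_cmd "   by hand: " a_hand
```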
Factor Analysis: You could factor analyze a group of variables (principal-component factor
method) by typing: factor a3e a3g a3j a3o, pcf
Look for eigenvalues greater than 1.0 (signifying that the factor accounts for more of the variance than any single standardized variable does). The entries in the “Factor” columns (the loadings) are the correlations of that particular variable with the underlying factor. The number in the “Cumulative” column tells how much of the variance in the variables you have factor analyzed is explained by all of the factors up to that point (i.e., the first entry tells how much of the variance is explained by factor 1 while the second entry tells how much of the variance is explained by factors 1 & 2 together). The score in the “Uniqueness” column tells how much of that variable’s variance is not explained by the retained factors. For example, a score of .61 would indicate that 61% of the variance in that particular variable is not accounted for by the common factors. If you then type “rotate” (default approach – varimax with orthogonal factors – i.e., the factors are not correlated with each other) it will maximize the loadings of the dominant variables on each factor. This setting is recommended when you want to identify variables to create indexes or new variables without inter-correlated components. To create new variables (after running “factor” and “rotate”) type: predict factor1 factor2 (you can use whatever names you want to the right of “predict”). They will now appear as variables.
Polychoric Correlations: polychoric a3a-a3o (a user-written command; add it by typing findit polychoric). Note: for dichotomous variables use Stata’s built-in tetrachoric command.
Mokken Scaling: “Mokken scaling is an iterative scale-building technique, and
as it is non-parametric is especially suitable for skewed and binary items. It is based on Guttman scales, which are unidimensional, ordinal scales of binary items along a continuum. A positive answer to one item of a certain ‘difficulty’ indicates that all other items of lesser difficulty have also been
answered positively. For example, a positive response to one particular (rare) item indicates that other (more common) items have also been endorsed. Mokken scaling can also use polytomous items, and is a probabilistic version of Guttman scaling. Loevinger’s H-coefficient is used for interpretation. By convention, 0.3 ≤ H < 0.4, 0.4 ≤ H < 0.5 and H ≥ 0.5 indicate weak, moderate and strong scales respectively. Higher
H values indicate higher item discrimination power, and thus more confidence in ordering of respondents. The H-value equals [1 – (observed Guttman errors/predicted Guttman errors)]. Expected Guttman errors are the probability that the items are chosen by chance, while observed Guttman errors are the number of times items are endorsed as if not in an
ordered sequence. Therefore, a coefficient of ≤ .4 demonstrates a scale with items with a 60% rate of Guttman errors. Following a recommended procedure, which involves increasing the coefficient value until the most interpretable solution is found, items that demonstrate poor discriminability are excluded from the scale. Results can be compared to factor
analysis. In general, factor loadings larger than .5 result in H-coefficients greater than .3. Reported scales are ordered in terms of difficulty, i.e., the most infrequently endorsed items feature at the top.” (Frank Doyle, et al., “Exhaustion, Depression and Hopelessness in Cardiac Patients: A Unidimensional Hierarchy of Symptoms Revealed by Mokken Scaling,” Royal College of Surgeons in Ireland, 2011, pp. 29-30). “Loevinger coefficients: Mokken (1971) proposed to measure the quality of the pair of items $i, j$ by the Loevinger coefficient

$$H_{ij} = 1 - \frac{\text{Observed } N_{ij}(1,0)}{\text{Expected } N_{ij}(1,0)}$$

The ‘expected’ value is calculated under the null model that the items are independent. If no errors are observed, $H_{ij} = 1$; if as many errors are observed as expected under independence, then $H_{ij} = 0$. For example, with two items with means $\bar{X}_{i\cdot} = 0.2$ and $\bar{X}_{j\cdot} = 0.6$, for a sample size of $n = 100$ the expected table is

             X_jh = 0   X_jh = 1   Total
X_ih = 0        32         48        80
X_ih = 1         8         12        20

There are 8 expected errors in the above table (the cell $X_{ih} = 1$, $X_{jh} = 0$). Now suppose the observed errors were just 2 (i.e., 2 in the cell which contains 8). Then $H_{ij} = 1 - 2/8 = 0.75$. Thus, a good scale should have Loevinger H coefficients that are large enough for all pairs $i, j$ with $i < j$. Rules of thumb that have been found useful are as follows:
$H_{ij} < 0.3$ indicates poor/no scalability;
$0.3 < H_{ij} < 0.4$ indicates useful but weak scalability;
$0.4 < H_{ij} < 0.5$ indicates medium scalability;
$0.5 < H_{ij}$ indicates good scalability.
Similarly, Loevinger’s coefficients can be defined for all pairwise errors for a given item (Hi) and for all pairwise errors for the entire scale (H).
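The expected error count of 8 in the quoted example follows from the two item means under independence: the error cell is endorsing the harder item $i$ while not endorsing the easier item $j$. A worked version of the same numbers:

$$E\left[N_{ij}(1,0)\right] = n\,\bar{X}_{i\cdot}\,\left(1-\bar{X}_{j\cdot}\right) = 100 \times 0.2 \times 0.4 = 8, \qquad H_{ij} = 1 - \frac{2}{8} = 0.75$$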
Although you can run the procedure without specifying a threshold for Loevinger’s H, you can set one with the c() option, as in the following command: msp a3f a3h a3i a3k, c(.4)
Below are some additional commands that can be used:
msp a3a-a3o, c(.4)
msp a3a-a3o, pairwise c(.4)
loevh a3a-a3o, pairwise
Mokken scaling can be used in a confirmatory way, with a given set of
items (where the order can be determined empirically) as well as in an exploratory way. In the exploratory method, a set of items is given,
and the aim is to find a well-scalable subset. This is done by first finding the pair with the highest Hij as the starting point for the scale, and by then consecutively adding the items that have the highest Loevinger coefficients with the items already included in the scale. This procedure can then be repeated with the remaining items to find a further scale among those. The reliability can also be estimated from the inter-item correlations. The Mokken scaling module is not part of the normal Stata program and must be downloaded. In the command line type: findit msp
Multidimensional Scaling: Assume we have information about the American
electorate’s perceptions of thirteen prominent political figures from the period of the 2004 presidential election. Specifically, we have the perceived dissimilarities between all pairs of political figures. With 13 figures, there will be 78 distinct pairs of figures. Rank-order pairs of political figures, according to their dissimilarity (from least to most dissimilar). Multidimensional Scaling (MDS) tries to find a set of k points in m-dimensional space such that the distances between pairs of points
approximate the dissimilarities between pairs of objects. (adapted from William Jacoby and David Armstrong, “Multidimensional Scaling 1”, used at the Measurement, Scaling and Dimensional Analysis short course at the 2011 ICPSR Summer Program – available from William Jacoby’s website). The following command generates the mds default estimation:
mds a3a-a3o, id(partic)
Note: you need to specify an “id” – the name of the variable that gives the observation number. In the command line above, the variable “partic” gives the number assigned to each participant in the study.
View Scores on a Variable: list (then click on variable names and press “enter”)
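Graphs/Plots: the residual and diagnostic plots below (rvfplot, rvpplot, lvr2plot and avplots) require that you run a regression first; the other graphs do not.
to graph residuals by predicted values: rvfplot
graph residuals by an ind. variable cons: rvpplot cons
plot cons by tax type: plot cons tax (in current versions, scatter cons tax gives a graphical version)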
leverage vs. residual sq. plot: lvr2plot
added variable plots (useful for uncovering observations exerting disproportionate influence): avplots
box plots: graph box cons party tax (shows median, 25th, 75th percentile)
histogram of cons: graph twoway histogram cons
scatter plot of tax (on y axis) and cons type: scatter tax cons
>>> to get a graph of variables restax and rescons with both dots and
a regression line type:
graph twoway lfit restax rescons || scatter restax rescons
Interaction Term: gen nsnt=nsa*nt1 (+ – / for other mathematical procedures)
Dummy Variables: Automatic Creation: if you have a variable entitled “year”
and want to create dummy year variables type: xi i.year To delete these variables type: drop _I*
Interaction Variables: Automatic Creation: to create dummy variables for
year, gender and their interaction type: xi i.year*i.gender To drop them type: drop _I*
(In Stata 11 and later you can skip xi and use factor-variable notation directly in a model, e.g., regress tax i.year##i.gender.)
Residuals and Predicted Values
1. run main equation: regress tax cons party stinc
(in older versions of Stata you can use “fit” in place of regress, but “fit” works only for regression; you can also use logit, probit, etc., in which case predict yhat gives predicted probabilities)
2. predict yhat
3. gen res=tax-yhat
4. list tax yhat res
Stepwise: allows you to specify significance levels and re-estimate the model,
deleting variables that are less significant than the selected threshold. For example: stepwise, pr(.2): regress tax cons party stinc
would mean Stata would estimate the model with all three independent variables and then re-estimate, removing one at a time any independent variable that was not significant at the .20 level. Adding the hierarchical option (stepwise, pr(.2) hierarchical: regress tax cons party stinc) instead considers the terms in order, starting from the last one listed, and stops at the first term that is significant. Can use with probit, logit, etc.
Regression Diagnostics: Run a regression with regress. Now in the command
line type: dfbeta (new variables such as _dfbeta_1 appear, each labeled with the name of an independent variable – type “list” and then the name of the DFBETA variable you are interested in). For other diagnostics run a regression. For standardized residuals type: predict esta if e(sample), rstandard (in the command line). You will see “esta” appear in the variable list. Now type: list esta and you will see the values. For studentized residuals type the following after a regression: predict estu if e(sample), rstudent (estu will now appear as a variable). For Cook’s distance type: predict cooksd if e(sample), cooksd after running a regression.
Multicollinearity: after running a regression using regress, type: vif (or: estat vif)
in the command line. Subtract the number in the 1/VIF column from 1
to obtain the proportion of variation in that independent variable which is explained by all other independent variables. In the VIF column, a common rule of thumb treats values above 10 as indicating high variance inflation (i.e., high multicollinearity).
This doesn’t work after probit/logit. Since multicollinearity concerns only the independent variables, you can re-estimate a probit/logit equation with regress and then follow the procedure above.
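A sketch of where the VIF comes from, using the bundled auto data: the VIF for one regressor equals 1/(1 − R²) from regressing that regressor on the other independent variables. It assumes estat vif stores each variable's name and VIF in r(name_#) and r(vif_#), listed in the displayed (sorted) order.

```stata
sysuse auto, clear
quietly regress price mpg weight length
estat vif
* estat vif lists the variables sorted by VIF; pick out the entry for weight
forvalues i = 1/3 {
    if "`r(name_`i')'" == "weight" scalar vif_cmd = r(vif_`i')
}
* VIF for weight = 1/(1 - R-squared from regressing weight on the other regressors)
quietly regress weight mpg length
scalar vif_hand = 1/(1 - e(r2))
display "estat vif: " vif_cmd "   by hand: " vif_hand
```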
Autocorrelation: regdw (replaces the regress command and executes the Durbin-Watson test in older versions of Stata). The data need to be dated – for example, if your data are annual and you have a variable called year, then before you do a regression type: tsset year and press “enter” (or, after running a regression with “regress”, type dwstat; in current versions use estat dwatson). The command corc (replaces the regress command and executes the Cochrane-Orcutt correction for first-order autocorrelation – note the data must be dated, see the regdw discussion above) has been superseded by prais with the corc option (e.g., prais tax cons party stinc, corc). You can keep the first observation by using the Prais-Winsten estimator (replace “regress” with “prais”).
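A sketch of the current commands on a simulated annual series (the data, seed and variable names are hypothetical):

```stata
clear
set obs 30
set seed 2013                      // simulated series; the seed is arbitrary
gen year = 1980 + _n
tsset year
gen x = rnormal()
gen u = rnormal()
replace u = u + 0.6*L.u if _n > 1  // build in serial correlation in the errors
gen y = 1 + 2*x + u
regress y x
estat dwatson                      // Durbin-Watson statistic
prais y x, corc                    // Cochrane-Orcutt: loses the first observation
prais y x                          // Prais-Winsten: keeps the first observation
```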
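Heteroscedasticity: run the regression with regress (or fit in older versions) and then, as a next
command, type: estat hettest (hettest in older versions) and press enter. If you have significant heteroscedasticity, use the “robust” standard error option. Thus, for a regression with robust standard errors type: regress tax cons party stinc, vce(robust) (or simply , robust). Note that rreg tax cons party stinc is a different procedure: robust regression that downweights outliers, not a correction for heteroscedasticity.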
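A sketch on the bundled auto data: the test, then the same regression with heteroskedasticity-robust standard errors (the coefficients do not change, only the standard errors).

```stata
sysuse auto, clear
regress price mpg weight
estat hettest                           // Breusch-Pagan / Cook-Weisberg test
regress price mpg weight, vce(robust)   // same coefficients, robust standard errors
```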
Lagged Independent Variable: to lag variable ussr by one time period, type “l.” in front of the variable. Thus, l.ussr is a one-period lag of ussr. You can also do this by typing: gen xussr = ussr[_n-1] which will create a new lagged variable: xussr (there is an underscore before the “n-1”; in panel data use l. instead, because _n-1 ignores where one unit ends and the next begins). Remember that your data must be dated (see the regdw discussion under Autocorrelation above). Lagging costs one data point: when you run the regression, Stata runs it on your sample minus the first observation.
First Differences: To create a variable that tells the difference in scores from
one time period to the next (e.g., 2003-2002), type “d.” in front of the variable. Thus, to difference ussr type d.ussr
Moving Average Variable: To transform state ideology from an annual variable
(stideoan) to a moving average where years 1-4 were the same score (because the data series we needed extended four years prior to the beginning of the time period for which data were collected) and years 5 and beyond were the average of the current year plus the immediately preceding 3 years (each year equally weighted) type:
xtset stnum year (note: stnum = state number and year = year)
tssmooth ma stideoan_ma = stideoan, window(3 1 0)
(window(3 1 0) means 3 lagged years, the current year, and 0 future years)
To generate a 12 year moving average of Democratic control (assuming you have already generated the Democratic control variable – e.g., 1 = Democratic governor + Democratic majority of both houses of the state legislature in year “t” – how to do this is explained earlier in this file, under Recoding a Variable)
tsset stnum year, yearly
gen demcont12 = (L1.demcont + L2.demcont + L3.demcont + L4.demcont + L5.demcont + L6.demcont + L7.demcont + L8.demcont + L9.demcont + L10.demcont + L11.demcont + L12.demcont)/12
(The lag operators stay within each state; explicit subscripts such as demcont[_n-12] would reach back into the previous state’s years. The average covers the 12 years before year “t”, not year “t” itself.)
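A sketch checking that tssmooth ma with window(3 1 0) matches the hand-written lag average once three earlier years are available. The series is hypothetical (scores 1 through 8).

```stata
clear
set obs 8
gen year = 1990 + _n
gen stideoan = _n                                   // hypothetical annual scores
tsset year, yearly
tssmooth ma stideoan_ma = stideoan, window(3 1 0)   // 3 lags + current year
gen check = (L3.stideoan + L2.stideoan + L1.stideoan + stideoan)/4
list year stideoan stideoan_ma check
```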
Equality of Regression Coefficients: use the suest (seemingly unrelated
estimation) post-estimation command. Estimate model 1 and save it with estimates store, then do the same for model 2, and type suest followed by the two stored names; it forms the seemingly unrelated variance-covariance matrix (estimates) for the combined set of coefficients. With that, you can test whether a coefficient from model 1 is equal to a coefficient from model 2.
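A sketch comparing the mpg coefficient for domestic and foreign cars in the bundled auto data; the stored names dom and frgn are arbitrary. For regress, suest labels each model's coefficient equation with the stored name plus “_mean”.

```stata
sysuse auto, clear
regress price mpg weight if foreign == 0
estimates store dom
regress price mpg weight if foreign == 1
estimates store frgn
suest dom frgn
* is the mpg coefficient the same in both models?
test [dom_mean]mpg = [frgn_mean]mpg
```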
Standardized Coefficients: in regression just add ,beta to the end of the
command line – thus: regress tax cons party stinc, beta
To obtain standardized coefficients for probit, logit and multinomial logit
first estimate the desired equation. After the results appear type the following in the command line: listcoef (You may need to download this command, since it is user-written rather than built into Stata – to download you need to be connected to the internet and type the following in the command line: ssc install listcoef – if that doesn’t work try findit listcoef – then you need to click on the appropriate link). If you are interested in the relative value of coefficients, use the coefficients in the “bStdXY” column (the coefficients in this column should be identical to what you receive with the “beta” option in regression). Additionally, two bStdXY coefficients have virtually the same ratio as the same two bStdX coefficients.
Marginal Effects: run regress, probit or logit. Then in the command line type: mfx (in Stata 11 and later, margins, dydx(*) replaces mfx)
In older versions of Stata you can also get the marginal effects of each independent variable in probit by replacing probit with dprobit
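A sketch of the current approach on the bundled auto data:

```stata
sysuse auto, clear
probit foreign mpg weight, nolog
margins, dydx(*)               // average marginal effect of each regressor
matrix me = r(b)
matrix list me
```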
Comparing Models in Probit/Logit (i.e., nested models – like an F test in regression
comparing the R squareds of two nested models) – from page 144 of J. Scott Long and Jeremy Freese, Regression Models for Categorical Dependent Variables Using Stata, 2nd ed. –
probit involvem repcont demcont ablegal fund1 catholic
estimates store fullmodel
probit involvem ablegal fund1 catholic
estimates store smallmodel
lrtest fullmodel smallmodel
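The likelihood-ratio test assumes both models use the same observations. When a variable dropped from the small model has missing values, the small model would otherwise pick up extra cases. A sketch on the bundled auto data, where rep78 has missing values, restricts the small model with if e(sample):

```stata
sysuse auto, clear
probit foreign mpg weight rep78, nolog   // rep78 has missing values: N = 69
estimates store fullmodel
* restrict the smaller model to the same 69 cases
probit foreign mpg if e(sample), nolog
estimates store smallmodel
lrtest fullmodel smallmodel
```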
“Weights”/Downloading Data – If you are downloading data and Stata refuses
to accept the file by saying “weights not allowed” or something like that, put quotation marks around the file name. Thus, if the file name is test, then in the command line type: use "test" (press enter). Put the quotation marks around everything except the word use (thus use "C:/test", not "use C:/test").
Word Responses into Numerical Responses: If the responses are words
(e.g., strongly agree, etc.) and you want to convert them to numerical values, one suggestion is to cut and paste the dataset into Excel and use the “Find and Replace” option – you can ask Excel to “find” words and then “replace” them with numbers. You can see if there is a numerical code that the words translate into (e.g., strongly agree becomes “1,” etc.) by the following procedure: (1) click on “Data” at the top of the screen; (2) click on “Data Editor”; (3) I think you can choose either “Edit” or “Browse”; (4) click on “Tools”; (5) click on “Value Labels”; (6) click on “Hide all value labels” – numbers should appear at this point. There is a way to permanently convert words into numbers. Go to the Data Editor (choose “Edit”, not “Browse”) and select (i.e., highlight) the variable (one variable at a time) you are interested in (e.g., q1). Right click the mouse and choose “Value Labels”, then choose “Assign Value Label to Variable 'q1'” and finally, choose “None.” This will erase the labels for the variable and leave the numeric values.
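The same steps can be typed as commands. In this sketch on the bundled auto data, label values with no label name detaches the labels, and encode turns a string variable of words into a labeled numeric variable (answer and answer_num are hypothetical names, and encode numbers the words in alphabetical order).

```stata
sysuse auto, clear
label list origin              // foreign is stored as 0/1 with labels Domestic/Foreign
list make foreign in 1/3, nolabel
label values foreign           // detach the value label; the 0/1 codes remain
* for a string variable of words, encode creates a labeled numeric version
gen str8 answer = cond(foreign == 1, "agree", "disagree")
encode answer, generate(answer_num)   // "agree" = 1, "disagree" = 2 (alphabetical)
```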