Purpose: Copyright, authors, and disclaimers
SOLAR is Copyright (c) 1995-2007 Southwest Foundation for Biomedical
Research. All rights reserved.
The authors are John Blangero, Kenneth Lange, Laura Almasy, Harald Goring,
Jeff Williams, Tom Dyer, Michael Boehnke, and Charles Peterson. Parts of
SOLAR consist of software developed by others; a complete list is provided
in Appendix Four of the documentation included with this package (use the
"doc" command to find out where that is).
Use of this program should be acknowledged in scientific publications.
Commands, features, performance, and availability are subject to change.
There is absolutely no warranty, express or implied.
There is no commitment to support scripts written using current commands
in future releases.
Shortcuts: abou - about
Purpose: Set up all non-trait variables as covariates
Usage: allcovar
Notes: Phenotypes and trait commands must already have been given.
If there is a variable named "age," it will be set up
as "age^1,2#sex." If undesired variables are made into
covariates, they should be removed with the covariate
delete command.
allcovar will not include names mapped to any of the standard
field variables (see 'help field'). Be sure to set up field
mappings (if required) first to ensure you don't get extra
covariates for the likes of ID, FAMID, etc.
allcovar will also not include any names added to the 'exclude'
list. Use the 'exclude' command to add names to the exclude
list, or to display the exclude list. By default, the exclude
list includes some standard PEDSYS mnemonics.
Shortcuts: allc - allcovars
Purpose: Include all snps as covariates in current model
Usage: allsnp
Notes: allsnp includes all the phenotypes prefixed with snp_ or
hap_ as covariates in the current model. This is often the
first step in a QTN analysis. Afterwards, you can remove
some SNPs using the "covariate delete" command.
It is OK if you have already selected other covariates,
including some of the snps. Every covariate is only added
once no matter what.
allsnp looks at all currently loaded phenotype files.
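For example, a minimal QTN session might look like this (the file and
trait names here are hypothetical):
    solar> phenotypes load snp.phen
    solar> trait q4
    solar> allsnp
    solar> bayesavg -qtn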
Purpose: Evaluate the tail of the normal curve
Usage: alnorm <x> [t | f]
If "t", the curve is evaluated from X to infinity
If "f", the curve is evaluated from minus infinity to X
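For reference, the same tail areas can be computed from the complementary
error function; a minimal Python sketch of what alnorm reports (not
SOLAR's internal code):

```python
import math

def normal_tail(x, upper=True):
    # Area under the standard normal curve:
    #   upper=True  ("t"): from x to infinity
    #   upper=False ("f"): from minus infinity to x
    if upper:
        return 0.5 * math.erfc(x / math.sqrt(2.0))
    return 0.5 * math.erfc(-x / math.sqrt(2.0))
```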
Shortcuts: alnorm - alnorm
Purpose: Describe ascertainment correction using proband(s)
Ascertainment correction is available through use of the proband field in
the phenotypes file. Ascertainment correction by conditioning on
probands is automatically performed if there is a field named 'proband',
'probnd', or 'prband' (in upper or lower case) in the phenotypes file.
Probands are those individuals through whom the pedigree has been
ascertained.
In a proband field, blank ( ) or zero (0) signifies non-proband (normal)
status, and anything else signifies proband status. No decimal point is
permitted after the zero.
If your proband field is named something else, the safest approach is to
modify your data files accordingly. If that is not possible, you can use
the SOLAR 'field' command to map your name. For example, if your proband
field is actually named 'Affected', you would use the following command:
solar> field probnd Affected
(Note that the name 'probnd' is used as a field type selector because that
is the PEDSYS standard mnemonic.)
For routine use, such a field command should be included in a .solar
startup file or user script. Field mappings are not included in model
files. For that reason, it may be safest to modify the data file
itself if possible.
Conversely, if your file has a probnd field but you wish it to be ignored,
you can rename the probnd field or give a command like the following:
solar> field probnd -none
Proband individuals are required to have all the quantitative
variables required of other individuals to be included (as probands) in
the analysis. Probands who are missing any quantitative variables are
not included in the Proband Count and, except for defining the
pedigree structure, do not enter into any calculations.
Unlike the program Fisher, SOLAR does not require probands to be at the
beginning of pedigrees, and does not require you to provide a "proband
count." Other than that, SOLAR uses the ascertainment correction
algorithm built into Fisher.
SOLAR prints a Proband Count in the summary statistics, which are written
to maximization output files and to the terminal unless the verbosity
level is set low.
Shortcuts: ascer - ascertainment
Purpose: Default model setup
Usage: automodel <phenotypes> <trait>
phenotypes is the name of phenotype file
trait is the name of trait
(all other variables will be used as covariates)
Notes: 1. Automodel will create a new model, with all non-trait and
non-pedigree variables as covariates (see note 2).
2. The pedigree-related fields listed by the 'field' command
will not be used as covariates (except for SEX, which will be).
Certain other standard PEDSYS names are in the default exclude
list. You can add additional items to the exclude list with
the exclude command. See 'exclude' and 'allcovar' help.
3. Boundaries and starting points are set automatically by the
maximize command.
4. You can pick and choose from the commands that automodel uses
if you want to do things differently. Here is the body of
automodel:
model new ;# Start a new model
phenotypes load filename ;# load phenotypes file
trait traitname ;# assign trait variable
allcovar ;# assign covariates
polymod ;# set polygenic model type
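For example, with a hypothetical phenotypes file pheno.dat and trait bmi:
    solar> automodel pheno.dat bmi
    solar> maximize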
Shortcuts: autom - automodel
Purpose: Perform bayesian oligogenic model averaging
on covariates or linkage components of the current model.
Usage: bayesavg [-cov[ariates]] [-ov[erwrite]] [-redo]
[-max <max>] [-cutoff <cutoff>] [-symmetric]
[-list <listfile>] [-fix [cov|param]]
[-size_log_n] [-nose] [-old_log_n]
[-sporadic] [-h2rf h2r_factor] [-saveall]
[-qtn] [-stop] [-nostop]
bayesavg -r[estart] ;# (see also -redo)
SPECIAL NOTE: THE ALGORITHMS CHANGED in VERSION 1.7.3. SEE NOTES 1-4.
NUMERIC RESULTS MAY DIFFER FROM PREVIOUS VERSIONS.
-covariates (or -cov) Perform bayesian model averaging
on the covariates only. (The default is to perform
bayesian model averaging on the linkage elements.)
-overwrite (or -ov) means force overwrite of existing output
files
-max Only include this number of components, or fewer,
at one time. This reduces the number of models
enormously, particularly for large N.
-list file contains a list of the elements to use.
There is one line for each covariate or linkage
parameter. Remaining covariates or linkage parameters
in the starting model are automatically fixed.
Covariates need not be present in the starting model,
but linkage parameters (and their matrices, etc.) must
be included in the starting model.
-fix (or -f) fix (lock in) an element. A fixed element, either a
covariate (specified by covariate name, e.g. "age") or a
linkage element (specified by linkage parameter name, e.g. "h2q1"),
is carried through all models. (Note: a -fix or -f
qualifier is required for each covariate to be fixed,
for example: -f age -f sex.) When fixed elements are
included, it is advisable to run "polygenic" on the
starting model first.
-cutoff (optional) sets the BIC limit for occam's window
(default: 6)
-log_n specify the log(n) to use. Normally this is first
estimated from the samplesize of the unsaturated model,
then recalculated from the standard deviation of the
mean and its standard error in the model with the best BIC.
-symmetric (or -sym) use "symmetric" Occam's window.
The default is a "strict" Occam's window, which excludes
superset models with higher BIC; symmetric Occam's window
includes ALL models within BIC cutoff.
-stop Stop when no models in the last group with the same
size (degrees of freedom) have entered the window.
(This is the default for -qtn.)
-nostop Do not stop when no models in the last group with
the same size have entered the window. (Useful
for overriding the default for -qtn.) If -stop
or -qtn is specified, however, the report of whether any
models have entered the window is still given.
-restart (or -r) means restart previous bayesavg run that was
terminated before completion. This begins with the
model after the last one in the output file. Do not use
-restart if last run completed. When restarting, set
the trait or outdir, then give the command "bayesavg
-restart" with no other arguments. The original model
and other arguments are automatically recalled.
Previous command arguments are read from
bayesavg.command and the previous starting model is
c.orig or cov.orig. If you need to change anything, use
the -redo option instead. You will also need to use
the -redo option if the first pass through all models
completed, or if the bayesavg was started under
a previous version of SOLAR.
-redo is a special form of restart that allows you to change
some options. Unlike -restart, -redo REQUIRES YOU TO
SPECIFY ALL OPTIONS AND LOAD ORIGINAL STARTING MODEL.
Only models not already found in the output file will be
maximized.
There are several cases where you must use -redo instead of
-restart: (1) If you need to
re-maximize models which had convergence problems
previously (edit them out of bayesavg*.est file, change
boundaries, then -redo). (2) If previous bayesavg run
completed but you want to try a different window cutoff or
type. (3) You deleted all files except the bayesavg.est
file. (4) You need to restart from a previous version of
SOLAR. Unlike -restart, with -redo you must set up the
starting model and commands either as they were previously
or with desired changes. Since you must set up the
original model EXACTLY, and specify options either EXACTLY
as they were originally specified, or with the desired
changes, you are advised to use this option carefully.
It is a good idea to make a backup copy of the outdir
first.
-saveall will force the saving of all models. Normally only
the models within Occam's window are saved. (Note:
models outside the window will not have standard errors.)
-size_log_n Use the log(n) estimated from sample size as the
final log(n). This bypasses the computation of
log(n) from the S.E. of the SD parameter of the
model with the best BIC.
-nose Do not compute standard errors for any models
(normally they are only computed for models
in the window). Unless you specify a particular
-log_n, the log(n) estimated from sample size
will be used (as with -size_log_n).
-old_log_n This calculates log(n) the old fashioned way,
using the saturated model for covariate analysis
or the unsaturated model for linkage analysis.
This option is provided for comparison with
earlier releases, and may soon be removed.
-h2rf (optional) is used to set upper bound of H2r
(default: 1.1) See notes below. Use of this option
is now unnecessary because of automated boundary control.
-sporadic This option is deprecated; it forces all models
to sporadic. Valid only with -covariate. You can now
accomplish the same thing by making the starting model
sporadic.
-qtn Quantitative Trait Nucleotide Analysis:
A "covariate" analysis is done with "-stop" in effect.
Covariates with name snp_* or hap_* are automatically
included but other covariates are excluded. A special
"windowfile" named bayesavg_cov.win is also
produced. The -stop default can be overridden with
-nostop. To include all snps in the starting model,
use the separate command "allsnp".
Output: In addition to the terminal display, the following files are
created (<outname> is "bayesavg" for linkage analysis or
"bayesavg_cov" for covariate analysis):
<outname>.avg Final averaged results
<outname>.out Final BIC and other info for each model
(standard errors for models in window)
<outname>.history History of analysis messages
<outname>.est Estimated BIC for each model (pass 1)
<outname>.nose Final BIC but no standard errors (pass 2)
Models are saved with "c" prefix for linkage analysis and "cov"
prefix for covariate analysis:
<prefix>0.mod Unsaturated model, with standard errors
<prefix>1.mod Model with element 1 (if saved)
<prefix>12.mod Model with elements 1 and 2 (if saved)
<prefix>12_11.mod Model with elements 1, 2, and 11.
<prefix>.orig.mod Original user model when started
<prefix>.start.mod Base model (unsaturated) before maximization
<prefix>.base.mod Maximized base model
Notes: 1) bayesavg determines the number of variable (non-fixed)
elements and sets N automatically. N and the number of
models are reported near the beginning. A new algorithm
is used to determine all the element combinations; this
results in a more logical ordering in which the smallest
models are evaluated first.
2) The first pass through all models is done with an approximate
log(n) computed from the sample size. The resulting file
is bayesavg.est (or bayesavg_cov.est). The final log(n) is
then computed from the model with the best BIC, and all
BICs are recalculated with the resulting file being
bayesavg.nose (or bayesavg_cov.nose). Then, standard
errors for only the models within Occam's window are
recalculated. The resulting final output file is
bayesavg.out (or bayesavg_cov.out). The output summary
averages are reported in bayesavg.avg (or
bayesavg_cov.avg). This is a new algorithm designed to
save time (by only calculating standard errors from the
models in the window), be more robust, and give more
accurate results. Results may differ somewhat from those
in earlier versions (prior to 1.7.3) of SOLAR. Additional
history of the analysis (the messages starting with "***")
is saved in bayesavg.history (or bayesavg_cov.history).
3) To permit special models (with household effects, epistasis,
etc.) to be handled, bayesavg no longer forces the starting
model to be sporadic first. It merely maximizes the current
model, with all non-fixed elements removed, but with no
change(s) to the starting omega or constraints.
If the starting model cannot be maximized, the user is
advised to run "polygenic" first. Running "polygenic"
first is probably a good idea in all -covariate cases,
particularly if there are non-fixed elements.
4) Models are now "built-up" from the unsaturated model
rather than being "constrained down" from the saturated
model. The unsaturated model itself is usually created
by "constraining down" the starting model.
5) bayesavg may not support bivariate models.
Shortcuts: bayesa - bayesavg
Purpose: Lower priority of SOLAR to allow more CPU for other jobs
or lower priority of one SOLAR run relative to another
Usage: benice ; Set "nice" level to 15
benice <LEVEL> ; LEVEL is between 1 and 20
; 20 is "most nice"
Notes: This is intended for use on Unix systems which support the
"renice" command, including Solaris 2.5 and above
Once you have set a nice level, you cannot go back to a
higher priority on this process. You must exit and restart.
The default unix scheduling allows some time even for
very "nice" jobs. However, they get somewhat less CPU than
other jobs.
On the SFBR Ranch, scheduling is absolute, so that "nice"
jobs will be suspended until all other jobs are done (or
waiting for a system resource such as disk access). Nice
jobs have minimal (<1%) impact on other jobs, unless they
hog huge gobs of memory.
Shortcuts: beni - benice
Purpose: Blank individuals according to variable data conditions
Usage: blank [-o] [-q] [-n] [<conditional expression>]
<conditional expression> can be any solar variable expression (as allowed
by the define command for covariates) that evaluates to zero or non-zero.
If it evaluates to non-zero for a person, that person is removed from the
sample.
[-q] Go about blanking quietly.
[-o] Force overwrite of existing definition having same name
(see below for example of definition naming).
[-n] Make new definition name if this would otherwise
conflict with existing definition name
With no arguments, blank shows blanking definitions currently in
effect. To see all the definitions available, use define command.
Examples:
blank class!=1 ;# include only class=1 in sample
blank age<<55 + sex==1 ;# blank all but old guys
blank age>=55 * sex==2 ;# blank only old guys
Notes:
1. blank creates a definition and a null covariate to achieve the
desired blanking. It shows you what it does, and then suggests
how this blanking may be deleted:
solar> blank age<<55 + sex==1
define blank_age = blank * (0!= (age<<55 + sex==1))
covariate blank_age()
To delete: covariate delete blank_age()
solar>
2. blanking is cumulative through the effect of all blanking covariates
that remain in effect. If you choose a condition which would create
the same name as used by a previous condition (see example above) it
will raise an error. You can force overwrite with -o.
3. To restrict the sample based on the availability of some variable, use a
regular
null covariate for that variable, as documented for the covariate
command, for example:
covariate age()
null covariates (having trailing empty parentheses) are not included
in likelihood estimation, but are used to delimit the available sample,
just as blanking covariates are.
4. You may also create covariate definitions just like blank does. But
be careful because it is easy to do it wrong.
Purpose: Change artificial boundary heuristics
Usage: boundary ; show settings
boundary wide [start|off] ; wide boundaries
boundary null [on|off] ; use boundaries from
; null model
boundary start upper <term> [<term> ...] ; Initial upper bounds
<term> ::== <number> or <number>*h2r
boundary float upper <number> ; Later upper bounds
boundary change <number> ; Amount to change by
boundary crunch <number> ; Crunch bounds +/-
boundary quadratic tol <number> ; quadratic tolerance
boundary max crunch <number> ; Maximum crunches
boundary h2r factor <number> ; Bound h2r
boundary e2 squeeze <number> ; Bound e2
boundary trace [off] ; Trace upper bounds
boundary hints ; More discussion
boundary cov retries <number> ; Max covar retries
boundary cov incr <number> ; On each retry,
; increase cov bound
; by this factor
Examples:
boundary start upper .2 .1 .05
boundary float upper .05
Notes: To function properly, the maximization algorithm used by SOLAR
needs a little bit of help in the form of artificial boundaries.
In general, any variance component can assume any value from
0.0-1.0, but in any particular case the range is more limited, and
artificially limiting the range helps SOLAR maximize successfully.
A set of heuristics and retry algorithms has been developed for
SOLAR to set and adjust artificial boundaries. The heuristics
should not normally require adjustment. If they do, please send
a message to solar@txbiomedgenetics.org so we can improve SOLAR.
You will know if you are having trouble with the boundary
heuristics because you will get 'Boundary' or 'Convergence'
errors. Beginning with SOLAR version 1.4.0, you will not get
'Boundary' errors for variance components because SOLAR will
automatically increase the boundaries up to the theoretic limits
(0.0-1.0) as required. If you get 'Convergence' errors, you
should try setting some of the heuristics to lower values than
they have already. In addition to these heuristics, there are
now also built-in retry mechanisms which automatically increase
bounds if they are found to be too small, or decrease bounds if
they are too big (being too big can cause convergence
problems). SOLAR will always discover if bounds are set too
small to find the correct result and increase them, but it may
not be able to deal automatically with bounds that need to be
set very close to the correct result.
If you get Boundary errors for covariates, you can deal with
them in one of two ways. For one, you can simply set the
covariate upper and lower bounds in the starting model to
reasonable values using the 'parameter' command and then re-run
the analysis. Or, you can use the 'boundary cov retries' or
'boundary cov incr' commands to adjust the covariate boundary
retry mechanism (which is separate from the mechanism for
variance component boundaries). Covariate bounds do not
have obvious theoretic limits, so it is impossible to
automatically prevent all covariate boundary errors.
boundary wide on ... set wide boundaries for variance components (N/A)
boundary wide start ... set wide boundaries at start of chromosome
boundary wide off ... use standard boundary heuristics
"boundary wide on" causes the boundaries for future linkage
models to be set to the full natural range (lower 0 upper 1).
This has no effect on the model currently in memory, but will be
applied to future linkage models created by the multipoint,
twopoint, linkmod, and linkmod2p commands. This supersedes the
standard variance component heuristics ("boundary start upper,"
"boundary float upper," "boundary h2r factor," and "boundary e2
squeeze") and also sets "boundary null off." THIS OPTION IS NOT
YET AVAILABLE (use "boundary wide start" instead).
"boundary wide start" causes the boundaries for future linkage
models to be set to the full natural range for the first QTL on
each chromosome. After the first QTL, the standard heuristics
are applied. (For twopoint linkage, this is the same as
"boundary wide on")
Both "boundary wide on" and "boundary wide start" turn off
"boundary null on," as the options are incompatible.
"boundary wide off" restores the usual variance component
boundary heuristics for future linkage models. "boundary wide
off" does not necessarily restore the exact boundaries
previously in use, and it does not restore "boundary null on" if
that had previously been in effect.
boundary null on ... set boundaries according to null model
boundary null off ... back to standard boundary heuristics
"boundary null on" causes the boundaries for future linkage
models to be taken from the current null model. In pass 1 of
multipoint, for example, the boundaries would be taken from
null0.mod, and in pass 2, they would be taken from null1.mod.
In cases of persistent convergence failure, you can edit the
boundaries in the null model and use "multipoint -restart"
to attempt to resolve the jam.
"boundary null on" turns off "boundary wide on" and
"boundary wide start," if they had been operative, because the
options are incompatible.
In the case of h2q* parameters not defined in the null model
(for example, h2q2 will not be defined in null1.mod, though it
is required for all two-linkage models), the default is to
use the boundaries for the previous h2q parameter. SOLAR
always defines h2q1 in null0.mod.
"boundary null off" restores the usual boundary heuristics for
future linkage models. It does not restore "boundary wide start"
or "boundary wide on" if those had been in effect previously.
boundary start upper ... set upper bound starting point for h2q's
boundary float upper ... set upper bound based on previous h2q value
These commands apply to the upper bounds of h2q* parameters
(e.g. h2q1). The default values are deliberately chosen to
be quite low because they are automatically raised as required
by a retry mechanism. If the starting values were set too high,
convergence errors could occur, and the mechanism for handling
convergence errors is not as robust because it doesn't know which
boundaries to adjust.
'boundary start upper' sets the starting value for the upper
bound of each new h2q parameter at the beginning of each
chromosome. This can be set as a single number (0.0 - 1.0) or
as a term including 'h2r' (such as 0.8*h2r, which is the default).
(H2r will be taken from the preceding null model if one is found.
For example, if there is one linkage component, the null model is
null0.out, which contains no linkage components. If there are
two linkage components, the null model is null1.out which contains
one linkage component.) Multiple values can be specified, one
for each multipoint scan. The last value specified applies to
all remaining scans. The default value of 0.8*h2r means that
the upper bound for each new linkage component is set allowing
for 80% of the current residual heritability to be accounted for
by the first locus.
'boundary float upper' sets the value for the upper bound of
the newest h2q parameter after the beginning of each chromosome.
The upper bound floats above each previously maximized h2q
value by this amount, which defaults to 0.1.
boundary change
'boundary change' sets the value by which a bound changes
after a boundary condition is detected. Upper bounds will be
raised and lower bounds will be lowered by this amount. The
default value is 0.1.
boundary crunch
'boundary crunch' sets the boundaries around each variance
component if a convergence error occurs and then invokes a retry.
The default value is 0.1. For example, if the previous value for
h2r was 0.3, the new boundaries will be set at 0.2 and 0.4.
Boundary crunch is only applied after convergence errors, after
which the boundaries can expand again through the retry mechanism.
boundary max crunch
'boundary max crunch' sets the limit on the number of crunch
attempts for each locus. Any given crunch may be followed by a
series of boundary expansions, so multiple crunches may be
required. The default is 10, to give a reasonably large chance
of success (if success is going to be achievable). Two crunches
in a row are never permitted (that would be meaningless).
boundary quadratic tol
The normalized quadratic (for quantitative traits only) is
normally required to be between 0.999 and 1.001. For some
problems, this is unrealistic. To change the tolerance to
+/- 0.01 (0.99-1.01), you would give the command:
boundary quadratic tol 0.01
The allowed range is 0 - 1.
boundary h2r factor
'boundary h2r factor' sets an upper bound for h2r based on the h2r
value in the null model. The default value of 1.1 means that
h2r is allowed to grow to 1.1x the size it had in the null model.
So far as I know, this has never needed adjustment. In any case,
if it is too small, the automatic retry system will handle it.
boundary e2 squeeze
'boundary e2 squeeze' sets boundaries for e2 based on the previous
e2 value. The default value of 0.1 means that e2 is allowed to
deviate +/- 0.1 from the preceding value.
boundary trace [off]
'boundary trace' enables a trace of the upper bound applied to the
newest h2q for each locus, and shows all retries and perturbations.
This feature may be shut off with 'boundary trace off'.
boundary cov retries <integer>
'boundary cov retries' sets the maximum number of retries during
which the covariate boundaries are increased. The default is
10.
boundary cov incr <number>
'boundary cov incr' sets the factor controlling the amount by
which a covariate boundary is increased during a retry. The
default is 5, which results in at least a five-fold increase
on each retry. (The actual increase depends on the difference
between both boundaries, and so will be larger than 5 in the
beginning. This is subject to change.)
Shortcuts: bou - boundary
Purpose: Discuss boundary error resolution strategies
This is an extension of the help provided for the 'boundary' command, which
you should read first.
When convergence errors occur during a multipoint scan, scanning will
terminate at the end of the scan regardless of whether some LOD scores
reached criterion levels or not, and a message like the following will be
displayed on the terminal (and printed to the multipoint.out file):
*** Exiting because convergence errors occurred in last pass
Also, to the terminal and the applicable output file for the scan, an
error code will be appended to the end of each line on which an error
occurred, for example:
chrom 18 loc 0 0.0000 -2203.917 0.022568 0.268372 ConvrgErr
The code "ConvrgErr" indicates that a Convergence Error occurred such
that it was impossible to find a good maximum likelihood estimation.
Beginning with version 1.4.0, SOLAR now uses a retry mechanism so that
boundary errors (related to variance components) will not occur.
Boundaries will be increased incrementally until their theoretic limits
are reached. It is still possible that convergence errors might occur,
and those may be controlled with the boundary command.
Boundary errors related to covariates are also handled with a retry
mechanism controlled by the 'boundary cov retries' command and the
'boundary cov incr' command. The default values should work in almost
every case, but it is not possible to say they will always work because
with covariates there are no theoretic limits.
If convergence errors occur, you should use the 'boundary' command to
lower the applicable artificial boundary setting heuristic. For example,
if the error is at the beginning of a chromosome, you should use the
'boundary start upper' command to set a lower value than the default.
All you need to do when these errors occur during a multipoint scan is
to "restart" the scan after resetting the applicable heuristic. The
restart will detect those models for which an error occurred, and
redo them with the new heuristics. For example:
solar> boundary start upper 0.1 0.05
solar> boundary float upper 0.05
solar> boundary change 0.05
solar> multipoint 3 -restart
(In earlier releases, you had to edit out the models for which errors
occurred in the multipoint1.out file and then restart. Now SOLAR
recognizes models for which errors occurred and will redo them by
default.)
Purpose: Compute probability for a chi-square value
Usage: chi <value> <degrees>
chi -number <value> <degrees> ; return only the number
chi -inverse <pvalue> <degrees>
Notes: Without the -number argument, the result is returned as a string
like this:
p = 0.0012345
(The sign will be "<" if below available accuracy.)
With the -inverse argument, the chi-square value corresponding
to a given p-value is returned. The -number argument does not
apply when the inverse is computed and should not be given.
chi will raise an error for certain out-of-bound conditions.
You may use a catch {} block to prevent this from ending scripts:
set test [catch {set p [chi $val $deg]}]
if {$test != 0} {set p 1.0}
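For the common 1 degree-of-freedom case, the p-value that chi reports can
be checked against the closed form p = erfc(sqrt(x/2)); a minimal Python
sketch (not SOLAR's implementation):

```python
import math

def chi2_pvalue_1df(x):
    # Upper-tail probability of a chi-square statistic with 1 df,
    # using the closed form p = erfc(sqrt(x/2)).
    return math.erfc(math.sqrt(x / 2.0))
```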
Shortcuts: chi - chi
Purpose: Select chromosome(s) for multipoint scan
Usage: chromosome [<number>|<name>|<low>-<high>|all|*]+ ;select
chromosome ; show currently selected chromosomes
chromosome show ; show all available chromosomes
chromosome showm ; show mibd's in pass (see note 2)
Examples:
chromosome 10
chromosome 10-13 15-17 20
chromosome 11 11p
chromosome all ; select all available chromosomes
chromosome * ; select all available chromosomes
Notes: (1) Use in conjunction with the mibddir, interval, and
multipoint commands.
(2) The showm option lists the mibds that will be selected by
the current "chromosome" and "interval" commands.
(3) Alphanumeric chromosomes may not be in <low>-<high> ranges, but may
be selected individually (for example, 11p), or with "all" or *.
(4) The chromosome specification is not saved from one solar session
to the next unless put in a .solar file.
(5) For convenience, you may specify a chromosome or range of
chromosomes whose mibds are not actually present, and
the gap is ignored silently, as long as there are some mibds
available for other specified chromosomes. The chromosome
command acts as a filter applied to the mibd data actually
available.
Shortcuts: chro - chromosomes
Purpose: Calculate LOD score
Usage: lod [<test-loglike> <null-loglike>] [<options>]
options := [-auto|-off|-raw] [-trait <N>] [-rhoq <N>] [-v]
[-1t|-2t|-3t|-4t|-t1|-t2|-t3|-t4] [-nolodadj]
If no likelihoods are specified, the likelihoods of the
"current" model and the applicable "null" model are used.
-auto Convert multivariate LOD to 1df effective LODs based
on number of traits in current model and constraint
of relevant rhoq's (default)
-off Do not convert LODs to 1df effective
-raw Do not perform LOD conversion or lodadj
-traits <N> Convert multivariate LOD to 1dF assuming <N> traits
-1t or -t1 Assume 1 trait (same as "-traits 1")
-2t or -t2 Assume 2 traits (same as "-traits 2")
-3t or -t3 Assume 3 traits (same as "-traits 3")
-4t or -t4 Assume 4 traits (same as "-traits 4")
-rhoq <N> Convert multivariate LOD to 1df assuming <N>
constraints of relevant rhoq's
-nolodadj Do not perform lod adjustment (lodadj)
-v verbose: Show adjustment and conversion steps
Examples: outdir test1
load model test1/null1
lod
lod -v
lod -2000.51 -2030.87
lod -trait 3 -rhoq 1 -v -2000 -2030
lod -raw -2000 -2030
Notes: If no likelihoods are specified, the current model must have
been maximized through a command such as "maximize", "twopoint",
or "multipoint", and the applicable null model should be saved as
nullX.mod (e.g. null0.mod, null1.mod) where X is the number
of active linkage elements, which is assumed to be one less
linkage element than in the current model. Linkage elements are
parameters named h2q1, h2q2, etc. The null model must have
been saved in the maximization output directory, either named
after the trait or set by the outdir command.
By default, SOLAR provides easily interpreted "1 df effective" LODs
which are equivalent to those in univariate models.
However, you can also have complete control over the LOD
conversion performed either using arguments here or
preferences set globally with the lodp command. Options
specified here override the defaults and lodp preferences.
The correction of 2 trait LODs to 1df effective LODs is based
on this formula: the LOD is converted to chi square with
1/2 1df, 1/4 3df, and 1/4 point mass at zero. If rhoq is
constrained, the formula is 1/2 1df, 1/4 2df, and 1/4
point mass at zero. This is then converted to a 1/2 1df
chi square of equivalent p-value, which is divided by
2ln10 to get the 1df effective lod score.
The correction of 3 trait LODs to 1df effective LODs is based
on the formula: the LOD is converted to chi square with
3/8 1df, 3/8 3df, 1/8 6df, and 1/8 point mass at zero.
For each rhoq constrained, the 6df is changed downward
by 1df.
The conversion of higher multivariate LODs follows a similar
expanding sum. If you wish to see the weights used, use the
lod command with the -v option.
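The 2-trait conversion described above can be sketched numerically with only the standard library. This is an illustration of the stated mixture formula (1/2 chi-square 1df, 1/4 chi-square 3df, 1/4 point mass at zero), not SOLAR's implementation, and it handles only the case where rhoq is unconstrained.

```python
import math

def sf_chi2_1(x):
    # Survival function of chi-square with 1 df
    return math.erfc(math.sqrt(x / 2.0))

def sf_chi2_3(x):
    # Survival function of chi-square with 3 df (closed form for odd df)
    return math.erfc(math.sqrt(x / 2.0)) + \
        math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)

def effective_lod_2trait(lod):
    """Convert a 2-trait LOD to a 1df effective LOD (rhoq unconstrained)."""
    if lod <= 0.0:
        return 0.0
    x = 2.0 * math.log(10.0) * lod                  # LOD -> chi-square
    p = 0.5 * sf_chi2_1(x) + 0.25 * sf_chi2_3(x)    # mixture p-value
    if p >= 0.5:
        return 0.0      # below the resolution of the 1/2 chi2_1 null
    # Invert the univariate null (1/2 chi2_1 + 1/2 point mass at zero)
    # by bisection: find x1 with 0.5 * sf_chi2_1(x1) = p.
    lo, hi = 0.0, x
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if 0.5 * sf_chi2_1(mid) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) / (2.0 * math.log(10.0))
```

For example, a bivariate LOD of 3.0 converts to a 1df effective LOD of roughly 2.1, always smaller than the raw LOD.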
Empirical LOD adjustment, if any, is automatically applied (see
the lodadj command) unless the -raw option is used. Unless you
specify -raw, SOLAR will need to search the output directory for
a lodadj.info file, which means that a trait or outdir must
have been selected.
Empirical LOD adjustment is not yet supported for bivariate
models. The lodadj value is ignored when bivariate LODs are
computed, and, in the cases where the lodadj value would be
shown (such as in the multipoint.out file, or if lod is called
from the command prompt) a warning message is shown instead.
In SOLAR version 3.0.2, the "clod" and "lod" commands were
combined into a new "lod" command. The options allowed
have changed compared with the earlier "clod" ; the original
"lod" command did not allow any arguments.
Use the "lodn" command if the current model may not use
the "h2q1" linkage parameter and you are not specifying
loglikelihoods explicitly.
See also lodn, lodp, lodadj.
Shortcuts: clod - clod
Purpose: Make a list or count combinations of integers 1..N of size K
Usage: combinations <N> [<K>] [-max <maxsize>] [-list list] [-force]
[-count] [-counts] [-start <number>] [-stop <number>]
N defines the range of integers 1..N of interest. If no
other arguments are specified, and N <= 10, the set of
all combinations of this range of integers is returned.
To get a list of combinations where possibly N > 10,
add either the -list or -force option, with -list being
the preferred method.
K only include combinations of exactly this size (as
in traditional "combinations"). If this argument is
not specified, the default is to include combinations
of all sizes, starting from the smallest size.
-count Only return the NUMBER of combinations, not a list
of the actual combinations.
-counts Return a list containing the number of combinations for
each "size" (i.e. "K").
-max include all combinations up to and including this size
(the default is to include combinations of all sizes).
The K and -max arguments may not be used at the
same time.
-list APPEND combinations to this list rather than returning
them. Specify the list variable by name, as with the
Tcl lappend command (see example below). If the variable
is not already set, a new variable is created. When this
argument is used, nothing is returned. For example:
set comblist {}
combinations 20 -max 10 -list comblist
Be sure to empty the list first (as shown above) if you
do not want to append to the previous contents, if the
variable was used previously in the same procedure. This
option may save memory (as compared with -force) for
very large N since only one copy of the list is ever
created.
-force return list ("by value") even if N > 10. This
is required for N > 10 unless the -list, -count, -counts,
-start, or -stop arguments are given. Only use this
option if you are sure this is what you want to do.
Read all the following paragraphs to be sure. Generally,
you would only use it inside a script, where the
returned combinations are going to be immediately saved
to a variable, such as:
catch {set comblist [combinations $N -force]}
The reason to require a -force option is that if a
large N is given in an interactive session, the
terminal window could be locked up for hours displaying
all the combinations, with no way to break out until
the terminal output buffer is empty. If that were to
happen, you would probably want to kill the whole
terminal session from another terminal window. For
some users, that would probably require calling the
system administrator.
The -force option may require more memory than the -list
option because a copy of the list is created in the
process of "returning" it to the caller; that's just
the way Tcl works, and it becomes important when creating
lists with huge numbers of elements.
If you are using this form of the command in a script,
be careful that it is not the last command in the
script, which Tcl automatically returns. Then, if
the user runs the script from the terminal, the
terminal window would be locked up. If you must
use it as the last command in a script, you should
use a "catch" command around it, as in the example
above. The catch command only returns 0 (for success)
or 1 (for error).
The following options are useful when dividing up the set of
combinations into jobs of an equal size. Otherwise, they may
seem a bit obscure.
-start Start with combination number <number>
-stop Stop with combination number <number>
Notes:
CAUTION! The list can get VERY BIG! Be careful if n > 15 because
memory requirements double for each [incr n], unless you are setting k
or -max. ("BIG" means 100's of megabytes, gigabytes, etc. I am not
kidding. On Solaris systems, you can use the SOLAR "memory" command to see
just how much memory SOLAR has consumed.)
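The counts behind the -count and -counts options, and the memory doubling warned about above, follow from standard binomial coefficients. A quick illustration (not SOLAR code):

```python
from math import comb

# The number of size-k combinations of 1..N is C(N, k); summed over all
# sizes 1..N the total is 2^N - 1, which is why the list roughly doubles
# with each increment of N unless K or -max limits the sizes included.

def combination_counts(n, maxsize=None):
    """Per-size counts, like "combinations <N> -counts" with -max."""
    top = n if maxsize is None else maxsize
    return [comb(n, k) for k in range(1, top + 1)]

def total_combinations(n, maxsize=None):
    """Total count, like "combinations <N> -count"."""
    return sum(combination_counts(n, maxsize))
```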
Shortcuts: comb - combinations
Purpose: Create, list, or delete constraints
Usage: constraint term [+ <term>...] = <number> | <term>
<term> is [<factor>*]<parameter>
constraint ; display all constraints
constraint delete <number> ; delete constraint by number
constraint delete <spec>|<left-spec> ; delete specified constraint
constraint delete_all ; delete all constraints
Example: constraint e2 + h2r = 1
constraint bq1 = bq2
constraint delete bq1
constraint delete h2r + e2 = 1
constraint H2 + 3*H2q1 - 2*H2q2 = 5*E2 ; anything is possible
Notes: (1) The constraint numbers are shown when listing constraints.
(2) If a new constraint matches the right hand "body" of an
existing constraint, that existing constraint is replaced by
the new constraint (the old one would be invalid anyway).
solar> constraint sd = 1
solar> constraint
[1] sd = 1
solar> constraint sd = 0
solar> constraint
[1] sd = 0
(3) For the "constraint delete <left-spec>" command, if there
is a constraint matching the entire left specification of a
constraint, it is deleted. Or, you can specify the entire
specification to be deleted. (The "constraint delete <parameter>"
version of the constraint command was ambiguous and is now
obsolete.)
[1] e2 + h2r + h2q1 = 1
[2] h2q1 = 0
constraint delete h2q1 ;# deletes constraint [2]
constraint delete h2q1 = 0 ;# deletes constraint [2]
constraint delete e2 + h2r + h2q1 ;# deletes constraint [1]
(4) Instead of constraining covariate beta values to 0, use
the "covariate suspend" command, as this permits
greater efficiency.
(5) If you need to constrain interaction covariates (e.g. age*sex)
or parameters whose name begins with a number, or parameters
whose name includes other special characters, enclose
the parameter name in angle brackets <>. When deleting the
constraint, angle brackets are optional around the parameter
name. Do not include numeric factors in the delete command.
constraint 3*<bage*sex> = 1
constraint delete bage*sex
(6) Constraints may only be simple linear equations of terms,
each consisting of an optional numeric factor and a parameter name.
Exponents and functions are not supported as operators.
If you need to constrain a power of some model feature,
consider making the parameter itself contain the required
power, then it can be linearly constrained.
(7) Numeric constants (such as 1 or 0) should only appear as
the right hand term.
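Since each term is just an optional numeric factor times a parameter, a constraint can be represented and checked very simply. The sketch below is a hypothetical representation, not SOLAR internals:

```python
# A constraint is held as a mapping of parameter name -> factor plus a
# right-hand constant, with all parameter terms moved to the left side.

def satisfied(terms, rhs, values, tol=1e-9):
    """terms e.g. {'e2': 1.0, 'h2r': 1.0}; rhs e.g. 1.0."""
    total = sum(factor * values[name] for name, factor in terms.items())
    return abs(total - rhs) <= tol
```

For example, "H2 + 3*H2q1 - 2*H2q2 = 5*E2" becomes terms {H2: 1, H2q1: 3, H2q2: -2, E2: -5} with rhs 0.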
Shortcuts: cons - constraints
Purpose: Install new executable file without disturbing current users
Usage: copybin <filename> <directory>
Note: The way this works is quite simple. The original version of the
file is not overwritten or deleted, but instead renamed.
Running processes continue to access the original version
through the inode, regardless of the name change, while new
processes will access the new version. The renaming scheme
simply appends dot followed by a number to the filename.
The first available number starting from 1 is used. For
example, the old "solarmain" becomes "solarmain.1" or
"solarmain.2" if a "solarmain.1" already exists, etc. At some
point you might want to clear out some of the older versions, but
that is up to you, and it would lead to numbering that is not
sequential, since copybin always takes the first available
number.
This is similar in design to the Unix "install -f" command.
It lacks some of install's checking features, but in one way
is much more capable: it allows any number of new versions to
be installed without disturbing users of the first or any other
previous version. The Unix install command only has one level
of backup since it merely prepends "OLD" to the original name.
If you do two installs in a row over a timespan in which jobs
are continuing to run (as, unfortunately, is often required)
copies of the original version are lost and users are likely
to get a memory mapping error of some kind.
This seems to work across NFS mounted filesystems, but it might
not work for you, so be wary. In the early days of Unix this
command might not have been necessary, but now that memory mapping
is used to load image files, it is.
Purpose: determine consistency of number of columns in a comma delimited file
Usage: countfields <filename>
An information report is returned like this:
longest: 8 (#1) x 1497 shortest: 8 (#1) x 1497
This means that the longest record had 8 fields, the first such record was
#1, and it was followed by 1497 others of same length in the file.
As it happens, the shortest record also had 8 fields, was also record
#1, and was followed by 1497 others of the same length in the file.
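The report logic can be sketched like this (an illustration, not the countfields source):

```python
import csv

# Count the fields in each record of a comma-delimited file, then report
# the longest and shortest lengths, the first record (#) at each extreme,
# and how many later records share that length.

def countfields(lines):
    counts = [len(row) for row in csv.reader(lines)]
    report = {}
    for label, n in (("longest", max(counts)), ("shortest", min(counts))):
        first = counts.index(n) + 1      # records are numbered from 1
        followers = counts.count(n) - 1  # records after the first one
        report[label] = (n, first, followers)
    return report
```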
Purpose: Set up covariates (independent variables).
It can handle interactions and polynomial terms.
For other non-polynomial models, use the 'mu' command.
Usage: covariate <variable>[^n | ^1,2[,3...]][*<variable> | #<variable>
[([trait])]]*
Creates a new covariate. See below for examples.
;
covariate ; display all covariate info
covariate delete <string> ; deletes covariate and beta(s)
covariate delete_all ; deletes all covariates and beta(s)
;
covariate <variable>() ; Null Covariate: require var in
; sample without covariation
;
; Covariate Suspension (for
; temporary hypothesis testing).
covariate suspend <string> ; temporarily disable covariate
covariate restore <string> ; re-activate suspended covariate
Examples:
covariate age ; Simple covariate Age
covariate age*sex ; Age by Sex interaction (only)
covariate age*diabet*diameds ; 3-way interaction
covariate age^2 ; Age squared as a simple covariate
covariate age^1,2 ; Shorthand for: age age^2
covariate age#diabet ; Shorthand for the following:
; covariate age diabet age*diabet
covariate age^1,2,3#sex ; Shorthand for all the following:
covariate sex age age*sex age^2 age^2*sex age^3 age^3*sex
covariate sex age(q1) age*sex(q3) ; Trait-specific Covariates:
; covariate sex applied to all traits
; covariate age applied to trait q1
; covariate age*sex applied to q3
In a multivariate analysis, trait-specific covariates are only required
for the sample of their trait. See note (7) below.
covariate q2() ; Null-Covariate:
; require q2 in sample of all traits
Notes: (1) More than one covariate may be specified separated by spaces.
Also more than one covariate command may be used. Adding a
new covariate does not remove previous ones. Spaces are
not allowed within the specification of each covariate term.
(2) Pound (#) and comma (,) are shorthands allowed ONLY if there
are no more than two variables. Further, only the first
variable may have multiple exponents separated by commas.
The following are INVALID:
covariate age^1,2*diabet*diamed ;# INVALID
covariate age#diabet*diamed ;# INVALID
covariate weight*height^1,2 ;# INVALID
Instead, specify multiple covariates as required:
covariate age*diabet*diamed age^2*diabet*diamed
covariate age diabet diamed age*diabet age*diamed
covariate diabet*diamed age*diabet*diamed
covariate height^1,2*weight
(3) N-way interactions are possible to any N.
(4) Covariate commands create beta parameters automatically.
Beta names begin with 'b' followed by the variables
and exponents as in the non-shorthand form (e.g. bage^2*sex).
(5) Quantitative variables are mean-adjusted. Binary variables
are adjusted so that the lowest value is 0 and the highest
value is 1.
(6) Suspended covariate variables are still required in sample.
(7) In a bivariate analysis "unqualified" covariates are applied
to and required by all traits, and trait-specific covariates
(with parenthetically named trait) apply only to the named
trait and are only required for that trait. (This was
changed in SOLAR version 4.0.3.)
In a univariate analysis, ALL covariates are currently
required regardless of whether they apply to the trait.
(The requirement of covariates specific to a different trait
may be removed in a future update.)
Null covariates (such as ef() ) are not applied to any
trait, but are required by all traits.
Variables not declared as covariates, but used in the mu
equation, are (at this time) required by all traits.
(8) When a trait is changed, covariate beta parameters are
reset to force re-evaluation of boundaries on the
next maximization. But trait changes
are not permitted for bivariate models; "model new" is
required. "model new" will remove all covariates.
(9) When created, beta parameters have starting value 0.0 and
no boundaries. Likely boundaries are set automatically
during the first maximization, and expanded if necessary up
to an internally determined maximum (and you may further
expand them if need be but this is highly unlikely).
If desired, you may set boundaries after beta values have
been created, and this will bypass automatic boundary
setting (but not automatic boundary expansion).
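The adjustment in note (5) can be sketched as follows. This is an assumption about the arithmetic described, not SOLAR's internal code:

```python
# Quantitative covariates are mean-adjusted; binary covariates are
# recoded so that the lowest value becomes 0 and the highest becomes 1.

def adjust_covariate(values):
    distinct = sorted(set(values))
    if len(distinct) == 2:                    # binary: recode to 0/1
        low = distinct[0]
        return [0.0 if v == low else 1.0 for v in values]
    mean = sum(values) / float(len(values))   # quantitative: center on 0
    return [v - mean for v in values]
```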
Shortcuts: cov - covariates
Purpose: convert Fortran D style exponents to E form
Usage: d2e <inputfilename> <outputfilename>
d2e2 <inputfilename> <outputfilename>
d2e2 starts with line 2, so as not to disturb D's in the header line of
comma delimited files. d2e is more suitable for pedsys files.
SOLAR now understands Fortran D style in phenotypes files in most
circumstances anyway, so this conversion is not generally needed. You
will know you need this if you see error messages.
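The conversion itself is a simple textual rewrite, sketched below (illustrative, not the d2e source):

```python
import re

# Rewrite Fortran D-style exponents such as 1.5D+03 into E form. The
# d2e2 variant leaves the first line alone so that D's in the header
# line of a comma-delimited file are not disturbed.

_D_EXPONENT = re.compile(r"(?<=[0-9.])[dD](?=[-+]?\d)")

def d2e_line(line):
    return _D_EXPONENT.sub("E", line)

def d2e2_lines(lines):
    """Convert every line except the first (the header)."""
    return lines[:1] + [d2e_line(s) for s in lines[1:]]
```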
Purpose: Define an expression to be used in place of a trait or covariate
Usage: define <name> = <expression> ; create a definition
trait <name> [,<name>]+ ; use definition as trait(s)
define ; show all definitions
define <name> ; show definition for <name>
define delete <name> ; delete define for name
define new ; delete all expressions
define delete_all ; delete all expressions
define rename <name1> <name2> ; rename define name
define names ; return list of all names
<name> can be any alphanumeric string with underscores; do not
use these reserved words as names:
delete delete_all rename names new
<expression> is formatted algebraically using standard
math operators + - * / and ^ (power) and () parentheses, and
also all math functions defined by the C Programming Language
which includes "log" for natural logarithm, trig functions,
and hyperbolic functions, among others. Here is a list:
erfc, erf, lgamma, gamma, j1, j0, y1, y0, rint, floor, ceil,
tanh, cosh, sinh, atan, acos, asin, tan, cos, sin, expm1, exp,
logb, log1p, log10, log, cbrt, sqrt, and abs. In addition,
the inverse normal transformation (see help for "inormal") may
be applied using the "inormal_" prefix (for example,
inormal_q4 for trait q4). "inormal_" may be abbreviated
down to "inor_".
If a phenotype name within the expression contains special
characters (anything other than letters, numbers, and underscore)
it should be enclosed in angle brackets <>, and the angle brackets
must also include any special operator prefix such as "inorm_".
For example, given a trait named q.4 (with a dot), you could
have a define command like this:
define i4 = <inorm_q.4>
Note: similar rules apply within the constraint and omega commands
because those commands also allow expressions that could contain
decimal constant terms and math operators.
A debugging function named "print" is also available which
prints and returns the value of the expression it encloses.
After printing, it pauses until the RETURN key is pressed.
RETURN may be held down to pass through a lot of prints.
Examples of the print command are given in the documentation
for the "omega" command.
The following relational operators may also be used between
any two terms. If the relation is true, 1 is returned,
otherwise 0 is returned. This enables you to construct
compound conditional expressions having the same effect as
could have been done with "if" statements. The C operators
< and > have been replaced with << and >> so as not to be
confused with the <> quotation of variable names in SOLAR.
C Format Fortran Format Test
-------- -------------- ----
== .eq. if equal
!= .ne. if not equal
>= .ge. if greater than or equal
<= .le. if less than or equal
>> .gt. if greater than
<< .lt. if less than
An expression is understood to be quantitative unless the
top level operator is a relational operator, in which case
it is understood to be discrete.
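The operator table maps directly onto ordinary comparisons, so a relational define expression can be understood by translating the spellings first. The helper below is hypothetical (not part of SOLAR): it rewrites the SOLAR/Fortran operators, where << and >> stand for less-than and greater-than, then evaluates, returning 1 for true and 0 for false.

```python
_OPS = [(".eq.", "=="), (".ne.", "!="), (".ge.", ">="), (".le.", "<="),
        (".gt.", ">"), (".lt.", "<"), ("<<", "<"), (">>", ">")]

def relation(expr, **variables):
    """Evaluate a relational define-style expression over named values."""
    for solar_op, py_op in _OPS:
        expr = expr.replace(solar_op, py_op)
    return 1 if eval(expr, {}, variables) else 0
```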
Names used must not match the names of any phenotype. When
there is an unintended match, the definition cannot be used as a
trait name since it would be ambiguous.
Once a valid definition has been created, it can be used in
the trait command. Any or all of the traits can be definitions.
All definitions will be saved in the model file, and loaded back
in when that model is reloaded. Definitions in a model file
will override current definitions. It is possible to save a
model with nothing but definitions if desired. The only
way to delete definitions is with the "new" "delete" or
"delete_all" options, or by restarting SOLAR. The "model new"
command has no effect on definitions.
Expression names are not case sensitive. Assigning a new
expression to a name replaces the expression previously
assigned to that name, even if it differs in letter case.
Renaming a definition to a name differing only in letter
case is possible.
For covariates only, it is possible to include in a definition a
constant called "blank". If an evaluation of the expression
returns blank, that individual is counted as missing from the
sample. The best way to use this constant is with one or
more conditionals like this:
define sample = blank*(age<<22)*(sex==2)
covariate sample()
This blanks any male (sex==2) having age less than 22.
blank is the number -1e-20, so any numerical operation may
change it to a non-blank small number. It should only be
multiplied by 0 or 1. The empty parentheses after sample() mean
that it is not a maximized parameter, it is a null covariate
only used to delimit the sample.
Examples:
define loga = log(a)
define eq1 = (q1 - 3.1)^2
define dq4 = q4 .gt. 12
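The "blank" sample-exclusion mechanism described above can be sketched like this. This is an illustration of the semantics only, not SOLAR internals:

```python
# blank is the sentinel -1e-20; multiplying it by the 0/1 results of the
# conditionals either preserves the sentinel (individual excluded) or
# yields 0 (individual kept), which is why it should only ever be
# multiplied by 0 or 1.

BLANK = -1e-20

def sample_define(age, sex):
    # mirrors: define sample = blank*(age<<22)*(sex==2)
    return BLANK * (1 if age < 22 else 0) * (1 if sex == 2 else 0)

def in_sample(value):
    """An individual is counted as missing when the value is blank."""
    return value != BLANK
```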
Purpose: Make limited user key (for deputy registrars)
Usage: deputy register <deputy-key>
deputy make <access-code> <username>
Notes:
1) Deputy registrar must obtain deputy-key and access-code from
solar@txbiomedgenetics.org. Key is granted for critical collaborators
only for use in cluster systems where normal registration process
is unwieldy.
2) Deputy registrar uses "deputy register" command to register as
deputy. This creates a file named .solar_deputy in home directory.
(Note: It does not move the .solar_deputy file to SOLAR_DEPUTY_HOME
if that is different from the deputy's home directory.)
3) The .solar_deputy file must be made accessible to each user
on all systems. This can be done in one of two ways.
The default way is to access the .solar_deputy file in the home
directory of the deputy, which must be found in a pathname
with the deputy's username replacing the current username. For
example if deputy registrar jsmith has registered the name pmiller,
and the home directory for pmiller is:
/home/pmiller
Then the .solar_deputy file must be found in directory named:
/home/jsmith
If this default method cannot be used, there is an alternate
method involving creating a shell variable SOLAR_DEPUTY_HOME
giving the path to the .solar_deputy file. For example, the
following line could be added to the "solar" startup script:
export SOLAR_DEPUTY_HOME=/home/admin/jsmith
4) The deputy registrar can now make a limited range key for each
user using the "deputy make" command. The user uses the
normal "register" command to install the key into a file named
.solar_reg in the user's home directory. The .solar_reg file
AND the .solar_deputy file (located as described in note 3)
must be found on each system where SOLAR is to be run because
both are used in the validation process for keys created by
deputy registrars.
5) The "deputy make" command adds the usernames registered to a file
named "solar_registrations" in your home directory. The contents
of this file should be sent to solar@txbiomedgenetics.org on at least
a biannual basis.
6) Username must be 2 characters or longer.
Purpose: Describe support for discrete traits
Usage: discrete-notes
Discrete traits are detected automatically by SOLAR. They must be
coded as two integer values separated by exactly 1. Typical codings
are 0,1 or 1,2. If you specify two values that are not separated by
exactly 1, this will be detected as an error. If you specify more than
two values, your trait will not be determined to be discrete. For
this reason, DO NOT specify missing values with a third number.
Missing values should always be coded as blank (" ") or null with no
number or character. DO NOT use "0" to signify missing values. See
toward the bottom of this note for advice regarding analyzing traits
with two values quantitatively.
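The coding rules above can be sketched as a small check. This is an illustration of the stated rules, not SOLAR's detection code:

```python
# A trait is taken as discrete when its non-missing values are exactly
# two integers separated by exactly 1; two values with any other gap is
# an error, and more than two values means the trait is not treated as
# discrete.

def classify_trait(values):
    distinct = sorted(set(v for v in values if v is not None))
    if len(distinct) == 2:
        return "discrete" if distinct[1] - distinct[0] == 1 else "error"
    return "quantitative"
```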
Discrete traits having more than 2 values are not supported by SOLAR.
(This is also true for discrete phenotypic covariates: if discrete, they
should not have more than 2 values. If you have such data, they
should be recoded into N-1 discrete binary covariates or recast into
"household groups." See the documentation for the "house" command.)
Models with discrete traits may be used with any command in SOLAR such as
polygenic, twopoint, multipoint, maximize, etc. Sometimes the
information returned by SOLAR differs. For example, while the
"polygenic" command normally returns "proportion of variance due to all
covariates" when used with a quantitative trait, it instead returns the
"Kullback-Leibler R-squared" when used with a discrete trait. (For
technical reasons, the proportion of variance due to all covariates is
not available for discrete trait models in SOLAR.)
By examining the maximization output files you can determine unambiguously
whether discrete or quantitative methods were used. (An example of
a maximization output file is "null0.out" saved in the maximization
output directory after running "polygenic".) In this file, immediately
after the "Descriptive Statistics" and immediately before the "Model
Parameter Starting Points and Boundaries" there will be one of two
comments, either:
Using SOLAR Quantitative Trait Modeling
or
Using SOLAR Discrete Trait Modeling
When a model with a discrete trait is maximized, special discrete trait
algorithms are used. Unfortunately, these methods are much more prone
to numerical problems than the usual "quantitative trait" methods.
Numerical problems lead to faulty parameter estimates and convergence
failures.
The default discrete method is relatively robust and only infrequently
has the problem where the heritability erroneously gets close to 1.0.
Even if the polygenic heritability (h2r) goes to 1.0, you may still be
able to run a "multipoint" linkage analysis to find important loci.
The heritabilities will be wrong, and the LOD scores will be wrong,
but the "peaks" may be at or near the correct locations.
It is not recommended to use the optional second discrete method, set
by giving the command "option DiscreteMethod 2" prior to running
polygenic. Although it was intended to be more accurate, it more
frequently fails with convergence errors or has the heritability go
to 1.0.
Some people also try analyzing their discrete trait as quantitative.
This can be done by giving the command "option EnableDiscrete 0".
The likelihoods, LODS, and parameter estimates may be inaccurate, but the
LOD peaks should be in the correct places. Better convergence is
sometimes obtained, however, than when using the discrete method.
Beware that there is a fundamental error when analyzing a discrete trait
as quantitative. There are not truly two degrees of freedom for the
mean and SD. Therefore, convergence failure is still more common with these
models than with true quantitative models.
Also beware that if you had previously analyzed the trait as discrete,
and then changed the EnableDiscrete option to 0 without exiting SOLAR
or giving the "model new" command, you will still have parameter SD
constrained to 1.0, which is probably NOT what you need to do. SD is
properly constrained to 1.0 only when you are analyzing a discrete trait
as discrete (and, perhaps, in a few other esoteric cases).
Because of all the pitfalls in using discrete traits, we try to find and
use relevant quantitative traits whenever possible.
Shortcuts: discrete-note - discrete-notes
Purpose: Find the SOLAR documentation
Usage: doc [-whereis]
doc show official documentation URL and
location of documentation files on this system
Notes: This command now tells you the URL where SOLAR may be seen with
any available browser. Previously, it would run Netscape.
Shortcuts: doc - doc
Purpose: Find dominance documentation
Dominance analysis is documented in section 9.4 of the SOLAR
manual. Use the "doc" command or point your browser to
the "doc" directory of your SOLAR directory, then select
"Go to full SOLAR manual", then select Chapter 9.
Dominance analysis is made possible by the "delta7" and "d7" columns
in SOLAR phi2.gz and ibd matrices. For polygenic models, the delta7
matrix column is loaded, a d2r parameter is created and added to the
"e2" constraint, then a delta7*d2r term is added to the omega. The
commands required to do this are described in Section 9.4.
Purpose: execute a script on every ranch machine (usually for /tmp cleanup)
DO NOT USE THIS FOR SUBMISSION OF REGULAR JOBS because it bypasses
the Gridware queuing system, which it must do for cleanup of ALL machines.
MUST BE RUN ON MEDUSA (only medusa addresses all other ranch machines)
See also "stepup -parclean" which uses doranch to clean up junk created by
forcing a "stepup -par" job to quit.
Usage: doranch <procname> <argument>
doranch cleanuser <username> ;# delete ALL user's /tmp files on
;# the ranch (Note: you can only
;# delete files for which you have
;# delete privilege, usually because
;# of owning them.)
doranch finduser <username> ;# find all my /tmp files on the
;# ranch but do not delete them.
;# Findings are written
;# to finduser.out. If -all is
;# used, all users are shown.
doranch cleantmp <dirname>. ;# same as "stepup -parclean"
;# delete all /tmp/<dirname>.*
;# files. (parallel stepup dirs
;# are prefixed with <dirname>
;# followed by dot.)
doranch findtmp <dirname> ;# find all name* directories
;# but do not delete them. Findings
;# are written to findtmp.out.
doranch cleanme now ;# same as
;# doranch cleantmp <username>
make_rhosts ;# make a new .rhosts file, or
;# append to existing one to
;# make it complete. It may be
;# useful to delete old .rhosts
;# file first if it contains errors.
showspace ;# Return sorted list of /tmp
;# storage used by all users
;# in showspace.out. Uses
;# doranch finduser -all, unless
;# existing finduser.out is found.
<procname> is the name of the procedure to be run on every
machine. procedures cleanuser, finduser, cleantmp,
findtmp, and cleanme are provided, but user-written
scripts could be used also.
<username> is the username.
cleantmp is a procedure that deletes all files and directories
in /tmp which match the specified prefix, after which a wildcard
* is assumed. For example "cleantmp charlesp." would delete a
directory named "/tmp/charlesp.11019.2"
Notes: It is useful to run ranch jobs in subdirectories of the /tmp
directory to minimize network traffic. Jobs should be designed to
clean up after themselves in normal operation by deleting the
/tmp subdirectory that was used as a working directory.
However, even when jobs are designed to clean up after themselves,
if the jobs do not run to completion, the cleanup code might never
be run. This is especially true when a user or administrator
shuts down a large array job (such as "stepup -par") because of
a mistake or emergency.
That is when "doranch" may be useful. The "cleanuser" procedure
deletes all files owned by the user in /tmp directories on
all ranch machines. The "cleantmp" procedure deletes all files
and directories in /tmp prefixed by the cleantmp argument on all
ranch machines.
The doranch procedures listed above may be used in creating custom
cleanup options for other scripts.
Such an emergency cleanup option is already built into the stepup
command as option "-parclean". That uses doranch and cleantmp
as shown above. Authors of other parallel scripts for general use
should create similar script options tailored to the names of /tmp
subdirectories they use.
To see what the "finduser" script looks like, in order to write
something similar, use the solar command "showproc finduser".
All the doranch procedures write to a file named by the specified
procname, for example cleanuser writes to a file named cleanuser.out
for each file found. Usually this has two columns, node name
and filename. However, for "finduser" a middle column is added
which lists total diskspace used in kbytes.
Note that a valid .rhosts file is required for usage, and
the make_rhosts procedure will make one. doranch will complain
if the .rhosts file is not present or incomplete.
If doranch reports failure in connecting to some hosts, it is
probably because the passwd and shadow files involved in userid
authentication have not been properly updated on those hosts.
If doranch reports failure in connecting to every ranch host, it
probably means that the .rhosts file is invalid, and you should then
delete the old .rhosts file and run make_rhosts.
If doranch hangs at a particular host, that machine is probably
down in some unusual way that is not known to gridware.
Purpose: Return a random floating-point number between 0 and 1
Usage: drand [ <seed> ]
If no argument is given, drand returns a floating-point
number in the interval [0,1].
When an argument is given, it is taken to be an integer with
which to seed the random number generator. If a seed value
of 0 is specified, the system time is used as the seed. The
random number generator should be seeded prior to its first
use in a SOLAR run. If the random number generator has not
been seeded when it is first called, it will be seeded with
the system time automatically.
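The seeding rules above can be sketched in Python; this is a hypothetical analogue built on Python's own generator, not SOLAR's actual one:

```python
import random
import time

def drand(seed=None):
    # No argument: just draw a number in [0, 1].
    # seed == 0: seed from the system time, as described above.
    # Any other seed: seed the generator with that integer.
    if seed is not None:
        random.seed(time.time() if seed == 0 else seed)
    return random.random()
```

Seeding with the same integer reproduces the same sequence, which is useful for repeatable simulations.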
Shortcuts: dran - drand
Purpose: Use command: multipoint -epistasis <N>
Usage: multipoint -epistasis <N> (<N> is the mibd index of interest)
The 'epistasis' command itself is reserved for future use.
Shortcuts: epista - epistasis
Purpose: Copy the SOLAR example to the current working directory
Usage: example
Notes: The example may be used in conjunction with the SOLAR tutorial
in Chapter 3. To read that, give the command "doc -chapter 3"
The example files are actually located in the doc/Example
subdirectory of the SOLAR installation. To find the "doc"
subdirectory, give the command "doc -whereis"
Shortcuts: examp - example
Purpose: Exclude phenotypes from use as covariates by automodel
and allcovar commands.
Usage: exclude <var> <var> ... ; Add variable(s) to exclude
exclude ; List all excluded variables
exclude -reset ; Reset to default exclude list
exclude -clear ; Remove all variables from list
Notes: You may add to the exclude list with one or more exclude commands.
By default, all variables named and/or mapped by the FIELD command
will be excluded (except for SEX). The exclude command lets you
exclude additional variables. (The FIELD command variables are
pedigree variables such as ID which would never be wanted as
covariates.)
The default exclude list will include the following standard PEDSYS
pedigree mnemonics:
seq fseq mseq sseq dseq ego id fa mo sire dam pedno famno twin
mztwin ibdid fibdid mibdid blank kid1 psib msib fsib birth exit
If you are excluding more variables than you are keeping, you might
consider simply specifying the covariates you want explicitly
rather than using the allcovar or automodel commands, or creating
a new phenotypes file with fewer fields.
The variable name you enter will be converted to lower case. Solar
is intended to handle phenotypic and pedigree variables in a case
insensitive manner.
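The lower-casing and exclude-list filtering described above can be sketched in Python (the function name and behavior are illustrative, not SOLAR's actual implementation):

```python
# Default exclude list of standard PEDSYS mnemonics, as listed above.
DEFAULT_EXCLUDE = {
    "seq", "fseq", "mseq", "sseq", "dseq", "ego", "id", "fa", "mo",
    "sire", "dam", "pedno", "famno", "twin", "mztwin", "ibdid",
    "fibdid", "mibdid", "blank", "kid1", "psib", "msib", "fsib",
    "birth", "exit",
}

def covariate_candidates(phenotype_names, extra_exclude=()):
    # Names are compared case-insensitively, mirroring the
    # lower-casing behavior described above.
    excluded = DEFAULT_EXCLUDE | {n.lower() for n in extra_exclude}
    return [n for n in phenotype_names if n.lower() not in excluded]
```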
Shortcuts: excl - exclude
Purpose: Compute factorial
Usage: factorial N
Example: set big [factorial 10]
Notes: A double precision value is returned, since double precision
can exactly represent larger integers than the integer type can.
Non-integral N is rounded to the nearest integer first, then
the factorial is computed for that integer.
For large enough N, the value returned might not be exact.
(Currently this happens for N > 18.)
Negative N (after rounding) raises a range error.
This may be, but need not be, used in an "expr".
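A minimal Python sketch of the behavior described above (rounding, a double-precision result, and the range check):

```python
def factorial(n):
    # Round non-integral N to the nearest integer first, as described above.
    n = round(n)
    if n < 0:
        raise ValueError("factorial: N must be non-negative")
    result = 1.0  # double precision, matching the command's return type
    for i in range(2, n + 1):
        result *= i
    return result
```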
Purpose: Replace Tcl format with fixed width fields for numbers
Usage: fformat <spec>+ <value1>+
<spec> format specifier(s) as for Tcl format command.
f, e, or g format required for "fixed width"
operation, like this:
%[--][W][.P]T where T is e, f, g, y, or z
default right justification
- specifies left justification
-- specifies center justification
W is desired width
P is desired precision (before and after decimal)
T is format type:
f is floating decimal
e is exponential
g is floating decimal if suitable, then exponential
y same as g, except that exponential format is not
used until the output would otherwise be 0.0 or
nearly so for a non-zero value. At least one
significant digit is preserved for P 1-4, two
digits for P 4-6, and three digits for P 7-*.
This is more consistent with readability,
retaining the fixed format nearly as long as
possible. Sometimes, more space will be used than W,
but this is much less likely than with the standard
G format. However, unlike F format, the result
will not go to zero unless it is zero. When possible,
allow more space in "width" than the precision seems
to require. That way, under special circumstances,
there is extra space for signs, "e", decimal point,
etc.
z same as y, except resulting string is trimmed to
minimum space for csv files
Note: For fractional numbers, make width at least 2
greater than precision, to allow for the leading "0."
Then allow one more for a - sign, if that is possible.
This is intended as a drop-in replacement for the Tcl "format"
command, modifying "minimum width" to "fixed width" for
the f, e, and g formats ("fixed width" makes for more
readable columns) and adding a center justification option.
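The core fixed-width idea, fixed decimal first with an exponential fallback when the fixed form will not fit, can be sketched in Python; this is a simplified illustration, not fformat's actual algorithm:

```python
def fixed_width(value, width, precision):
    # Try fixed decimal first; fall back to exponential notation
    # if the fixed form would overflow the requested width.
    s = f"{value:.{precision}f}"
    if len(s) > width:
        s = f"{value:.{precision}e}"
    return s.rjust(width)  # default right justification, as above
```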
Purpose: Allow non-standard user data field names
Usage: field ; This shows all mappings
field <default name> <user name> ; Create one mapping
field <default name> ; Show one mapping
field <default name> -none ; Ignore this field (see notes)
field <default name> -default ; Restore full default
Examples: field ID Subject
field FA Father
field MO Mother
Notes:
The default names are ID, FA, MO, PROBND, MZTWIN, FAMID, SEX, and HHID.
EGO, SIRE, and DAM are also permitted in place of ID, FA, MO by default.
Unlike the default, you can specify only one name to be
mapped at a time. You can restore the full default for any field
using the -default argument.
The -none argument declares a field to be ignored whether it exists or
not. This is useful, for example, if you want the optional PROBND
field to be ignored:
field PROBND -none
This would signify that there is no PROBND field, i.e. there are no
probands. PROBND, MZTWIN, and HHID are optional fields for which the
-none argument may be used. Most other fields are mandatory and -none
cannot be used for them.
Your field selections are saved for future SOLAR sessions to a file
named field.info in the working directory. Once you have entered
your field selections, you need not enter them again (starting with
version 2.0.6) when you are running SOLAR from the same working
directory. However, if you followed our previous recommendation to
put field commands in a .solar file, note that the settings in the
.solar file take precedence over the settings in field.info.
If you would like to remove an entire old set of field assignments,
you can delete the field.info file BEFORE starting SOLAR. (SOLAR
reads the field.info file when starting.)
The FAMID field may or may not be required depending on whether your ID's
are unique in the entire dataset. If your ID's are unique, you do
not need FAMID. However, if your ID's are sequential within each
family, you need a FAMID field in both your pedigree and phenotypes
files, otherwise they are ambiguous. SOLAR now catches this mistake
when either pedigree or phenotypes files are loaded.
Shortcuts: fie - fields
Purpose: Describe frequency data file requirements
The freq file contains allele frequency data for a set of marker loci,
one line per marker. Each line consists of the following space-delimited
fields:
marker name, all_1 name, all_1 freq, all_2 name, all_2 freq, ...
The allele frequencies for a marker must sum to 1 (a small roundoff error
is tolerated.)
Allele frequency information is used when IBDs are computed for a marker
that is not completely typed, i.e. there are individuals for whom genotype
data is not available.
Example:
D20S101 123 0.2457 127 0.1648 133 0.5895
IGF1 A 0.4 B 0.3 C 0.1 F 0.2
ApoE E1 .125 E2 .25 E3 .625
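A freq-file line of this form can be parsed and checked in Python as follows (a sketch; the tolerance value is an assumption):

```python
def check_freq_line(line, tol=1e-4):
    # Parse "marker allele freq allele freq ..." and verify the
    # frequencies sum to 1 within a small roundoff tolerance.
    fields = line.split()
    marker = fields[0]
    alleles = fields[1::2]
    freqs = [float(f) for f in fields[2::2]]
    if abs(sum(freqs) - 1.0) > tol:
        raise ValueError(f"{marker}: frequencies sum to {sum(freqs)}")
    return marker, dict(zip(alleles, freqs))
```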
Once a freq file has been loaded, it is not necessary to load it again
in subsequent SOLAR runs from the same working directory.
Shortcuts: file-f - file-freq
Purpose: Describe map data file requirements
The map file contains chromosomal locations for a set of marker loci
on a single chromosome. Typically, marker locations are given in cM
and a mapping function is used to convert inter-marker distances to
recombination fractions. Currently, the Kosambi and Haldane mapping
functions are allowed. Marker locations can also be specified in
basepairs. While cM locations can be floating point numbers, basepair
locations must be integers; non-integer locations are truncated to
integers. When basepair locations are used, the mapping function is
called "basepair" rather than Kosambi or Haldane, but in fact there
is no mapping provided from basepairs to recombination fractions and
such maps cannot be used to compute multipoint IBDs. The first line
of the map file contains the chromosome number, and (optionally) the
name of the mapping function. If no mapping function is specified,
the mapping is assumed to be Kosambi. The chromosome number can be
any character string not containing a blank or a forward slash (/),
although the use of integers is recommended. For example, the strings
'01' and '10q' are allowed. Each line after the first line consists
of the following space-delimited fields:
marker name, marker location
Examples:
20
D20S101 0.0
D20S202 34.2
D20S303 57.5
TCF basepair
2448b 19828941
380659 19829489
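The Kosambi and Haldane map functions mentioned above convert an inter-marker distance d (in cM) to a recombination fraction; the standard formulas, sketched in Python:

```python
import math

def haldane(d_cM):
    # Haldane map function: cM distance -> recombination fraction.
    return 0.5 * (1.0 - math.exp(-2.0 * d_cM / 100.0))

def kosambi(d_cM):
    # Kosambi map function: cM distance -> recombination fraction.
    return 0.5 * math.tanh(2.0 * d_cM / 100.0)
```

Both approach 0.5 (free recombination) as the distance grows.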
Shortcuts: file-map - file-map
Purpose: Describe marker data file requirements
The marker file contains genotype data for one or more marker loci.
The file consists of one record for each individual who has been typed
for one or more of these markers. Each record must contain the following
fields:
ego ID, genotype1, genotype2, ...
In addition, a family ID field must be included when ego IDs are not
unique across the entire data set. If, however, each ego ID is unique
to an individual and an individual may appear multiple times in the
data set, then the family ID should not be included. The same genotypic
data is then associated with every occurrence of an individual.
The default field names are ID and FAMID. EGO is also accepted by
default. You can set up SOLAR to use different field names by using
the field command (see 'help field'). You do not necessarily need to
change your names to match ours.
Fields with names other than ID and FAMID are assumed to contain marker
data, with the exception of the following names: FA, MO, SEX, MZTWIN,
HHID, AGE, PEDNO, and GEN. Fields having one of these names are ignored.
The scheme used to encode genotypes may vary from field to field.
SOLAR recognizes many standard coding schemes, but the safest way to
code genotypes is with the forward slash to separate the alleles.
Ex: A/B
E1/E3
123/456
A blank genotype field denotes missing data, as do the genotypes 0/0
and -/-. SOLAR requires that either both alleles are typed or both
alleles are missing, except for male genotypes at X-linked marker loci.
In that case, either a single allele is specified (the other allele is
blank, 0, or -), or the genotype is coded as a "homozygote".
Ex: 237/243 valid female X-linked marker genotype
/251 valid male X-linked marker genotype
251/0 valid male X-linked marker genotype
-/251 valid male X-linked marker genotype
251/251 valid male X-linked marker genotype
The marker loci in the marker file must all be autosomal or all be
X-linked. By default, SOLAR assumes that the markers are autosomal.
If the markers are X-linked, then either the XLinked option must be
set with the ibdoption command prior to loading the marker file, or
the -xlinked option must be given in the load marker command.
Once a marker file has been loaded, it is not necessary to load it
again in subsequent SOLAR runs from the same working directory.
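The slash-coded genotype conventions above, including the X-linked missing-allele cases, can be sketched in Python (illustrative only, not SOLAR's actual parser):

```python
def parse_genotype(g):
    # Blank, 0/0, and -/- mean missing data, per the rules above.
    g = g.strip()
    if g == "" or g in ("0/0", "-/-"):
        return None
    a1, _, a2 = g.partition("/")
    def allele(a):
        # A blank, 0, or - half marks a missing allele
        # (valid for male genotypes at X-linked loci).
        return None if a in ("", "0", "-") else a
    return (allele(a1), allele(a2))
```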
Shortcuts: file-mar - file-markers
Purpose: Describe pedigree data file requirements
The pedigree file consists of one record for each individual in the data
set. Each record must include the following fields:
ego ID, father ID, mother ID, sex
In addition, a family ID is required when ego IDs are not unique across
the entire data set. If the data set contains genetically identical
individuals, an MZ-twin ID must be present (as described below). If an
analysis of household effects is planned, a household ID can be included
(also described below).
The default field names are ID, FA, MO, SEX, FAMID, MZTWIN, and HHID.
EGO, SIRE, and DAM are also accepted by default. You can set up SOLAR to
use different field names by using the field command (see 'help field').
You do not necessarily need to change your names to match ours.
A blank parental ID or a parental ID of 0 (zero) signifies a missing
parent. SOLAR requires that either both parents are unknown, i.e. the
individual is a founder, or both parents are known.
If the pedigree data consists of unrelated individuals with no parental
data, then the father ID and mother ID fields are not required. If there
are parents for whom pedigree file records do not exist, then records
are created internally for those parents, who are assumed to be founders.
Sex may be encoded as M, m, or 1 for males and F, f, or 2 for females.
The missing value for sex is 0, U, u, or blank.
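The sex-coding convention above can be sketched in Python (the function name is hypothetical):

```python
def normalize_sex(code):
    # M, m, or 1 -> male; F, f, or 2 -> female;
    # 0, U, u, or blank -> missing, per the rules above.
    c = str(code).strip().upper()
    if c in ("M", "1"):
        return "M"
    if c in ("F", "2"):
        return "F"
    if c in ("", "0", "U"):
        return None  # missing value for sex
    raise ValueError(f"unrecognized sex code: {code!r}")
```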
The MZ-twin ID is used to designate genetically identical individuals,
e.g. monozygotic twins or triplets. Each member of a group of identical
individuals should be assigned the same MZ-twin ID. Twin IDs must be
unique across the entire data set. If there are no genetically identical
individuals in the data set, this field need not be present in the
pedigree file.
The household ID, if present, will be used to generate a matrix file
(house.gz) that can be used later to include a variance component for
household effects. Household IDs must be unique across the entire data
set.
The family ID field is required only when ego IDs are not unique across
the entire data set. For example, if a data set consists of nuclear
families, and the same ego ID may appear in more than one family, then
the family ID must be included. Or if, for example, IDs are sequential
integers unique only within pedigrees, then the pedigree ID must be
included.
At the time the pedigree file is loaded, SOLAR indexes the data set.
This indexing is internal and should not be confused with any external
indexing the user may have imposed upon the data set. This indexing
information is stored in a file named 'pedindex.out' in the directory
where SOLAR is running when the pedigree data is loaded. Be careful
about deleting files unless you are sure they are not needed by SOLAR!
Once a pedigree file has been loaded, it is not necessary to load
it again in subsequent SOLAR runs from the same working directory.
Shortcuts: file-pe - file-pedigrees
Purpose: Describe phenotypes data file requirements
The phenotypes file may be in either PEDSYS or Comma Delimited format.
The phenotypes file consists of one record for each individual.
Each record must include an ego ID and one or more phenotypic
values (which may be blank to signify missing data).
ego ID, phen 1, phen 2, ...
(The phenotypes file may also contain other data, such as pedigree
data. You could use one file as both your phenotype and your
pedigree file, though that is not necessarily recommended. There
are fewer possible problems with separate files.)
Just as with the pedigree file, a field name FAMID is required when
IDs are not unique across the entire data set. (If your ego IDs
are unique, it is probably better _not_ to include a family ID,
as it just complicates things slightly.)
If your data has probands and you wish to employ ascertainment
correction, the phenotypes file must have a proband field. In this
field, blank ( ) or zero (0) signifies non-proband, and anything
else signifies proband. A decimal point is _not_ permitted after
the zero. The presence of a proband field automatically turns on
ascertainment correction.
The default field names are ID, FAMID, and PROBND. You can set up
SOLAR to use different field names by using the field command.
The phenotype field names may be anything within certain rules
(no spaces, tabs, or slashes; also certain special characters such
as *#,^/-+ can cause problems in the names of phenotypes used
as covariates). If you stick with alphabetic characters, numeric
characters, and underscores you will be safe.
The phenotype data fields must be numbers, either with or without
decimal points. Zero (0) is always considered a permissible value;
blank ( ) or null (e.g. no value in between the commas ",," in a
comma delimited file) must be used to signify missing values.
Floating or fixed point numbers must always include a decimal
point; numbers without a decimal point are assumed to be integers.
Binary, discrete or categorical values should be indicated with
consecutive integers (e.g. 0,1 or 1,2 or 2,3). SOLAR checks all
phenotype fields to see if they contain only two consecutive
integers and judges them "binary" if they do. Binary traits are
automatically handled by the SOLAR discrete trait liability
threshold modeling code; you don't need to do anything special.
See Chapter 9 for discussion on what to do with "categorical"
data that has more than two categories.
Without special scripting, categorical phenotypes with more than two
categories should not be used in SOLAR. (SOLAR will not identify
categorical phenotypes with more than two categories and will
instead treat them as quantitative phenotypes.)
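The "two consecutive integers" test SOLAR applies to detect binary traits can be sketched in Python (illustrative; None marks a missing value):

```python
def is_binary(values):
    # A field is judged binary if it contains exactly two distinct
    # values that are consecutive integers (e.g. 0,1 or 1,2 or 2,3).
    observed = {v for v in values if v is not None}
    if len(observed) != 2:
        return False
    lo, hi = sorted(observed)
    return float(lo).is_integer() and float(hi).is_integer() and hi - lo == 1
```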
The 'load phenotypes' command creates a file named phenotypes.info
in the working directory. Once a phenotypes file has been loaded,
it need not be loaded again in the same working directory, unless
you change the file itself.
SOLAR automatically removes pedigrees in which no non-proband has
all required phenotypic data from the analysis. You need not
remove these pedigrees yourself. You will get a full accounting of
pedigrees and individuals included and excluded in the maximization
output files (described below), by running the 'maximize' command,
or by giving the 'verbosity max' command prior to other commands.
Shortcuts: file-ph - file-phenotypes
Purpose: Set fine mapping threshold for multipoint
Usage: finemap <LOD> [<LOD> ...]
finemap default
finemap off
finemap {displays current finemap setting}
Example: finemap 0.588
Notes: After each multipoint pass when the interval is greater than 1
SOLAR will examine all points in the regions around points
higher than some threshold. This threshold is set with the
finemap command.
The default is 0.588.
Finemapping can also be turned off. The finemap setting is
unimportant when the interval is 1. (Note: versions of SOLAR
prior to 1.1.0 did finemapping only around the single highest
peak by default.)
Shortcuts: fine - finemap
Purpose: Constrain a parameter to its current value
Usage: fix <name> ; Name is the name of the parameter you want to fix
Example: fix h2r
Shortcuts: fix - fix
Purpose: Process the allele frequency data.
Usage: load freq [-nosave] <filename> ; loads freq file
freq unload ; unloads allele frequencies
freq mle [-nose] [-hwe] [<marker> ...]
; computes MLE allele frequencies
freq save <filename> ; saves allele frequencies to a file
freq show [<marker> ...] ; displays allele frequencies
The 'freq unload' command will not unload allele frequency
information for markers with currently loaded genotype data.
Frequency data for such markers is unloaded at the time the
genotype data is unloaded with the 'marker unload' command.
In general, it is not necessary to unload existing frequency
data before loading a new freq file; the unloading will be done
automatically. However, the 'freq load' command will not replace
previously loaded frequency data if MLE allele frequencies have
been computed but not saved for one or more markers. In that
case, the MLEs must be saved to a file, or the -nosave option
must be specified.
MLE allele frequencies are computed only for markers with
currently loaded genotype data. To load genotype data, use
the 'load marker' command. If a marker name is not specified
for the 'freq mle' command, MLE allele frequencies will be
computed for all markers with currently loaded genotype data.
By default, standard errors are computed for the marker allele
frequency estimates. These additional calculations result in
increased compute time, but can be avoided with the '-nose'
option if standard errors are not required.
When the '-hwe' option is given, the 'freq mle' command carries
out an additional likelihood maximization for each marker, in
this case finding MLEs for marker genotype frequencies rather
than allele frequencies. A test of whether the assumption of
Hardy-Weinberg equilibrium holds is provided by comparing the
likelihood of the genotype frequency-based model with that of
the allele frequency-based model (which assumes HWE). When
this test has been conducted, the associated p-values will be
displayed by the 'marker show' command.
The file created by the 'freq save' command is written in a
format suitable for subsequent loading with the 'freq load'
command.
If a marker name is not specified for the 'freq show' command,
all currently loaded frequency information will be displayed.
The currently loaded allele frequency information is stored in
the file 'freq.info' in the current working directory. This
file persists between SOLAR runs, which means that the allele
frequencies which are in effect at the end of a session will
still be in effect the next time SOLAR is invoked (from within
the same working directory.)
Notes: The set of markers in the freq file and the set of markers in
the marker file do not have to be the same. Allele frequencies
will be computed for markers that do not appear in the freq file
at the time these markers are loaded with the 'load marker'
command.
For a description of the freq file, enter 'file-freq'
Shortcuts: fre - freq
Purpose: Prepend the maximization output directory name to filename(s)
Usage: full_filename [<filename>]+
Note: See "help outdir". full_filename is intended for scripts.
Purpose: Find the highest likelihood in the vicinity of marker(s)
Usage: grid <marker1> [<marker2> ...]
Example: grid APOa D6S2436
Notes: outdir (or trait) and ibddir must previously be specified.
ibd matrices for each marker must already have been computed.
A model named "null0.mod" is expected in the output directory.
That can be created with the polygenic command.
Summary output is displayed on your terminal and written to a file
named grid.out. An output and model file are saved for each
marker with the name <markername>_best_theta.
The twopoint command also has a "-grid" option which will grid
around every marker evaluated.
A special "-bsearch" option sets point at which a "golden section"
search begins. By default, grid single-steps by 0.01 from 0.00
to 0.05 and then begins a golden section search. (This is on the
assumption that the peak will be reached before 0.05 in the vast
majority of cases.) If you have a significant number of cases
above 0.05, you might want to change this, for example:
grid mrk -bsearch 0.01
would start the golden section search after 0.01 (which will be
faster if the value is expected to be greater than 0.05, but
slower if the value is going to be less than 0.05). Note: 0.01 is
the smallest value at which the search can begin. On the other
hand if you wanted to single-step all the way to 0.10, you would
give the command:
grid mrk -bsearch .1
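A generic golden-section search, the kind of bracketing method grid switches to after its initial single steps, looks like this in Python (a textbook sketch, not SOLAR's actual implementation):

```python
import math

def golden_section_min(f, a, b, tol=1e-5):
    # Shrink the bracket [a, b] by the inverse golden ratio each
    # step, keeping the interior point with the smaller f value.
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while abs(b - a) > tol:
        if f(c) < f(d):
            b, d = d, c
            c = b - inv_phi * (b - a)
        else:
            a, c = c, d
            d = a + inv_phi * (b - a)
    return (a + b) / 2.0
```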
Shortcuts: grid - grid
Purpose: Grid around the h2r value in polygenic model
Usage: gridh2r [-lower <lower>] [-upper <upper>] [-step <step>]
-lower <lower> Lowest h2r; default is current value less 0.1
-upper <upper> Highest h2r; default is current value plus 0.1
-step <step> step; default is 0.01
Notes: polygenic should be run first. Only univariate models with
only e2,h2r parameters are supported.
Output is written to a file named gridh2r.out in the maximization
output directory. The starting model is listed first regardless
of whether it is in the range specified.
After completion, the model with the best loglikelihood will
be loaded, but with the h2r constraint (if any) deleted. This
might be the starting model even if it isn't in the specified
range.
Each point requires a maximization, so they might come out
slowly. For full maximization detail, give the "verbosity plus"
or "verbosity max" command beforehand.
Purpose: Convert esd,gsd,[qsd1] parameters to standard parameters
Usage: ;# trait only required for multivariate model
gsd2h2r [<trait>] ;# compute h2r from esd,gsd,[qsd1]
gsd2sd [<trait>] ;# compute SD from esd,gsd,[qsd1]
gsd2h2q [<trait>] ;# compute h2q1 from esd,gsd,[qsd1]
Note: Use polygsd command to set up model, and maximize to maximize it
first, followed by linkgsd for linkage models.
See the documentation for the polygsd and linkgsd commands.
Purpose: Perform heritability power calculations
Usage: h2power [-prev] [-grid {<from> <to> <incr>}] [-data <fieldname>]
[-nreps <nreps>] [-seed <seed>] [-overwrite] [-plot]
[-nosmooth]
h2power -restart [-grid {<from> <to> <incr>}] [-nreps <nreps>]
[-plot] [-nosmooth]
This command performs a power calculation for the currently
loaded pedigree, with the following default assumptions:
(1) the trait to be studied is either quantitative or
dichotomous (e.g. affected/unaffected)
(2) the trait to be studied is influenced by additive
genetics
(3) all pedigree members will be phenotyped for the trait
to be studied (unless the -data option is used to
exclude those individuals who will not have phenotypic
data; see the description of this option below)
Simulation is used to estimate the frequency with which one
would expect to obtain a significantly non-zero estimate of
heritability given that a specified fraction of the phenotypic
variance is due to additive genetics. Twice the difference in
the loglikelihoods of the polygenic and sporadic models is
asymptotically distributed as a 1/2:1/2 mixture of a chi-square
random variate with one degree of freedom and a point mass at 0.
A result is considered significant if the probability of
obtaining the observed chi-square value, in the absence of a
genetic effect, is less than or equal to .05.
The default is to perform 10 replicates of the simulation for
each heritability in the range .01, .02, .03, ..., .99. For
each replicate, a polygenic model is fitted to the simulated
data, and the resulting heritability estimate and chi-square
statistic are recorded. The observed chi-squares are converted
to power, i.e. the power to detect the corresponding observed
heritability at a significance level of .05.
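Under the 1/2:1/2 mixture described above, the p-value for an observed chi-square x is half the one-degree-of-freedom chi-square tail probability, which can be written as erfc(sqrt(x/2)). A Python sketch:

```python
import math

def mixture_pvalue(chisq):
    # P(chi2 with 1 df > x) = erfc(sqrt(x/2));
    # halve it for the point mass at 0.
    if chisq <= 0:
        return 1.0
    return 0.5 * math.erfc(math.sqrt(chisq / 2.0))
```

For example, a chi-square of about 2.706 corresponds to the .05 significance level used here.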
The following options give the user some control over the power
calculation procedure:
-prev If the trait to be studied is dichotomous, SOLAR
will assume the existence of an unobserved liability
distribution. Individuals with liabilities above
some threshold value will be "affected", i.e. they
will have the larger of the two trait values (for
example, a 1 for a 0/1 trait.) The -prev option
is used to specify the "disease" prevalence, or
fraction of individuals who are "affected", which
in turn determines the liability threshold.
-grid Specify the set of heritabilities for which power
will be computed. At each grid point, trait data
having that expected heritability are simulated,
sporadic and polygenic models are fitted to the
data, and the loglikelihoods of the models are
compared. The observed chi-square test statistics
are averaged to obtain the expected chi-square
value for that heritability. The grid is given by
a set of three numbers enclosed in curly braces:
{<from> <to> <incr>}
where <from> is the starting heritability, <to>
is the last heritability considered, and <incr>
is the interval between grid points. If the
desired grid consists of a single effect size,
the three-number list can be replaced by that
single number and curly braces are not required.
-data Exclude individuals from the power calculation
who are missing data for phenotype <fieldname>.
-nreps Perform <nreps> simulations at each grid point.
The default number of replicates is 100.
-seed Set the random number generator seed. The default
is to set the seed based on the date and time.
-plot At the end of the power calculations, display a
plot of power versus QTL heritability. To display
this plot for a previously completed calculation,
use the command "plot -h2power".
-nosmooth By default, the power curve is smoothed by fitting
a line through the observed chi-square values as
a function of the heritability squared prior to
converting the chi-square values to power. This
option turns the smoothing off.
-overwrite (or -ov) Overwrite the results of a previous
power calculation.
-restart (or -r) Restart a power calculation.
Notes: It is possible to change the grid of heritabilities and the number
of replicates when restarting a calculation. The calculation
will not be restarted if a grid is chosen that does not include
all the points in the previously specified grid unless the
-overwrite option is included, in which case the simulation
replicates for any extra grid points are discarded. Similarly,
the -overwrite option is necessary if fewer replicates are
requested than were done previously, in which case any extra
replicates are discarded.
The following files are created:
h2power.out A space-delimited file containing a line for
each grid point in the format X Y, which is
suitable for input to plotting packages such
as xmgr. The first (or X) column contains the
heritability. The second (or Y) column contains
the power.
h2power.info Stores the various options selected along with
the chi-square statistic, averaged over the
replicates, at each grid point.
h2power.chis Stores the results of the simulation replicates
run at each grid point. This file, along with
h2power.info, is used to restart an interrupted
power calculation.
During a power calculation, various files named "simqtl.*" are
created along with a trait directory named "simqt". These will
be removed at the end of the run.
Purpose: Display instructional messages
Usage: help ; Displays list of commands
help <command> ; Displays doc for one command
help -output <filename> ; Write list to a file
help <command> -output <filename> ; Write doc to a file
help -user ; Display user commands
Notes: See also the doc and usage commands.
The help for any particular command is quite detailed and may
cover several pages. The usage command provides a very brief
summary of command options, and does not invoke the 'more' pager
so it will stay on your window while you enter the next command.
See help for "helpadd" to see how you can add your own help
messages to the help system. To display all the user commands,
give the command "help -user". To display the help for any
particular user command, the ordinary "help <command>" will
work, because it searches user tcl files if <command> is not
found in SOLAR tcl files.
Shortcuts: hel - help
Purpose: Explain how to add more help messages to SOLAR
SOLAR first looks for a help message for a particular
command in the main SOLAR tcl package file solar.tcl.
Next, it looks for help messages in all *.tcl files in the
current tcl search paths:
. (The current working directory)
~/lib (A lib subdirectory of your home directory)
$SOLAR_LIB (the lib subdirectory of the SOLAR installation.
This symbol is created by SOLAR.)
All help messages are commented text blocks beginning with a
command header and ending with a line containing "#-" or any
line with no leading "#".
# <packagename>::<commandname> [--] [private|public]
The packagename is taken from the filename: it is the segment
of the filename preceding the first period (.). For example,
the packagename for solar.tcl is "solar" and the packagename
for john.solar.tcl is "john".
Each help block should contain at minimum a purpose and usage
section as shown in the example below.
The help display of all commands available is taken from all
the commands defined in solar.tcl and other files in $SOLAR_LIB.
The "help -user" command shows all commands for which help is
available defined in . and ~/lib. To block display of this
command in "help -user" include the word "private" following
the command name.
Example:
File named multiproc.tcl defines a command named mmultipoint. The
file is placed in ~/lib and has a help header that looks like this:
# multiproc::mmultipoint -- public
#
# Purpose: Spawn multipoint jobs per chromosome
#
# Usage: mmultipoint <minlod>
#-
mmultipoint uses a private procedure mspawn:
# multiproc::mspawn -- private
#
# Purpose: Launch job on cluster machine
#
# Usage: mspawn <scriptfile>
#-
Shortcuts: helpa - helpadd
Purpose: Heterogeneity test for linkage
Usage: hlod [-step <stepsize>]
<stepsize> is size of increment in h2q1 and h2r. Default is
0.1, 0.05, 0.02, or 0.01 chosen to be about 1/10 of total
heritability in starting model (with a minimum of
8 test points plus maximized h2q1 and total heritability).
Notes: Linkage test model must be loaded at beginning. Corresponding
null model will be found in current output directory.
Complete results are written to hlod.out in maximization output
directory.
Linkage model may only have 1 linkage element; null model
is "polygenic" (null0) model in current output directory.
H0 vs. H1 test is only considered significant if p < 0.0001
(LOD 3). If this is not significant, there will be a warning
that the H1 vs H2 test makes no sense.
hlod uses the "homo" program written by Harald Goring, for which
documentation is provided in Appendix 5 of the documentation
(use "doc" command to browse, click on Table of Contents).
Purpose: Enable analysis of household effects
Usage: house ; enable analysis of household effects
house -suspend ; temporarily suspend household effects
; (the c2 parameter is constrained to zero)
house -delete ; delete household effects (c2 is deleted)
Examples:
** new model ** Note: you must give the house command after trait or model new
solar> model new
solar> trait weight
solar> covar age^1,2#sex
solar> house ; activates h. effects
solar> polygenic -screen ; polygenic analysis w. screening
** old model **
solar> load model poly
solar> house
solar> polygenic
Notes: This may be used for any common environmental effects (not
necessarily "household" effects). The house command changes the
current model as follows:
1) A parameter c2 (c stands for "common") is created
2) The house matrix is loaded
3) A c2*house term is added to the omega
4) The c2 parameter is added to the e2 + ... = 1 constraint
5) The starting value of c2 is carved away from the value of
e2 so that the constraint remains satisfied
The pedigree file must contain an HHID field (you can map HHID
to any particular field in your pedigree file using the "field"
command). If it does, the 'load pedigree' command produces a
matrix named house.gz, and that matrix will be used. If house.gz
is not present, this command will fail.
WARNING! If you load a pedigree without a HHID field (or a field
mapped to it with the field command) a pre-existing house.gz, now
assumed to be obsolete, will be deleted. This is to prevent you
from making the mistake of using an obsolete house.gz.
The 'house' command should be specified after such commands as
automodel, trait, polymod, or spormod (which initialize polygenic
or sporadic models) and/or just before commands which maximize
models such as 'polygenic,' 'maximize,' or 'multipoint.' This
is because "polygenic" or "sporadic" models, by definition,
do not have household effects. But the polygenic command will
do the household "analysis" if it detects the presence of a c2
parameter which is not constrained to zero.
We define the following model classes:
sporadic (covariates only...and e2)
household (covariates and c2)
polygenic (covariates and h2r)
household polygenic (covariates, c2, and h2r)
To create a pure "household" model with no active genetic component,
give the commands "spormod" and "house" in that order after setting
up the trait(s) and covariate(s).
By default, if a household element is in the model, pedigrees will
be merged whenever individuals in separate pedigrees share the same
household. The resulting groups are called "pedigree-household"
groups. This may significantly increase memory requirements.
Pedigree merging is controlled by two SOLAR options (see the
option command). The default of 1 for MergeHousePeds means that
pedigree merging, as just described, will be done. This feature
may be changed by setting MergeHousePeds to zero prior to the
polygenic or maximize commands:
solar> option mergehousepeds 0
solar> polygenic -screen
The MergeAllPeds option combines all the pedigrees into one large
group if set to 1. This is an alternative simpler method of
merging, but it may increase memory requirements even more.
Shortcuts: hou - houses
Purpose: Compute marker-specific IBDs.
Usage: ibd [-nomle] <marker> [<marker> ...]
; computes IBDs for specified markers
ibd [-nomle] ; computes IBDs for all markers
ibd mito ; computes mitochondrial IBDs
ibd export [-file <filename>] [-overwrite] [-append]
[-nod7] [-ibdid] [<marker> ...]
; writes IBDs for specified markers
; to a file in comma delimited format
ibd import [-file <filename>] [-nod7] [-ibdid] [<marker> ...]
; reads IBDs for specified markers
; from a file in comma delimited format
ibd prep <program> [-version 2.82] [<marker> ...]
; prepares input files needed to compute
; IBDs using <program>, where <program>
; is loki, simwalk (sw), merlin, or
; genehunter (gh)
Before any ibd command can be run, the directory in which to
store the IBDs must be specified with the 'ibddir' command. This
specification is stored in a file ibddir.info in the working
directory, so it need not be repeated in future sessions from
the same working directory.
The first record in all matrix files produced by SOLAR, including
IBD matrix files, is a checksum and not real data; see matcrc
command for details. This checksum is optional in user created
matrix files. If present, it prevents using a matrix with a
different or changed pedigree.
In the absence of prior knowledge of marker allele frequencies,
it is recommended that the 'freq mle' command be used to compute
maximum likelihood estimates of the allele frequencies. This
will improve the accuracy with which missing genotypes are imputed
in the IBD computation process. IBDs will not be computed for
markers with simple-count allele frequencies generated by the
'load marker' command, i.e. MLE allele frequencies are required
when prior frequency data is not available. To compute IBDs
using the simple-count allele frequencies instead of MLEs,
specify the -nomle option. Alternatively, the NoMLE option can
be set using the 'ibdoption' command.
The method used to compute marker-specific IBDs will depend on
the family structure and will be selected automatically. It
is possible to choose the Monte Carlo method regardless of the
automatic selection by using the 'ibdoption' command. For
performance reasons, the Monte Carlo method will be used
automatically for completely-typed markers.
Mitochondrial IBDs are a special case. Each pair of individuals
who share a common maternal lineage, i.e. who have inherited the
same mitochondrial DNA, will be assigned an IBD value of 1, while
all other pairs are assigned an IBD value of 0. The necessary
information is completely contained in the pedigree data. Hence,
there is no mitochondrial marker data to load, nor are allele
frequencies required.
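The maternal-lineage rule above is simple enough to sketch. The following Python fragment is illustrative only (it is not SOLAR code, and the pedigree representation and function names are invented): each individual is traced to the founder of their maternal line, and a pair gets IBD 1 exactly when they share that founder.

```python
def maternal_founder(ped, ind):
    """Follow mother links until a founder (no recorded mother) is reached."""
    while ped[ind]["mother"] is not None:
        ind = ped[ind]["mother"]
    return ind

def mito_ibd(ped, id1, id2):
    """Mitochondrial IBD: 1 if the pair shares a maternal lineage, else 0."""
    return 1 if maternal_founder(ped, id1) == maternal_founder(ped, id2) else 0

# Toy pedigree: mother of each individual (None = founder)
ped = {
    "gma":      {"mother": None},
    "mom":      {"mother": "gma"},
    "aunt":     {"mother": "gma"},
    "kid":      {"mother": "mom"},
    "cousin":   {"mother": "aunt"},
    "stranger": {"mother": None},
}
```

As the text notes, no marker data or allele frequencies enter this calculation; everything comes from the pedigree.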
The 'ibd export' command outputs the IBDs for a specified set
of markers into a comma delimited file. The IBDs must be stored
in the directory named in the 'ibddir' command. If no marker
names are given, then all IBDs found in the 'ibddir' directory
are exported. By default, the SOLAR indexed IDs (IBDIDs) in the
IBD files are translated to permanent IDs, and family IDs are
included when present in the pedigree file. The default name for
the output file is "solar-ibd-export.out". The default fields in
the output file are MARKER, [FAMID,] ID1, ID2, IBD, and D7.
The options for the 'ibd export' command are
-file (or -f) Export IBDs to this filename.
-overwrite (or -ov) Overwrite existing output file.
-append (or -a) Append IBDs to existing output file.
-nod7 Don't include D7 field from IBD files.
-ibdid Write out SOLAR indexed IDs (IBDIDs)
rather than permanent IDs.
The 'ibd import' command inputs the IBDs for a specified set of
markers from a comma delimited file. IBD files are written and
stored in the directory named in the 'ibddir' command. If an
IBD file for an imported marker already exists, it is overwritten.
By default, the permanent IDs in the input file are translated
to SOLAR indexed IDs (IBDIDs). Family IDs must be included in
the input file when they are present in the pedigree file.
The default name for the input file is "solar-ibd-import.in".
The default fields in the input file are MARKER, [FAMID,] ID1,
ID2, and IBD. If the input file does not contain a D7 field,
all D7 values in the IBD files are set to zero. By default,
all IBDs in the input file are imported. However, if markers
are specified on the command line, then IBDs are imported for
those markers only. If one and only one marker is specified on
the command line, a MARKER field is not required in the input
file. The order of the markers in the input file is unimportant,
but all the lines for a given marker must be adjacent. Unless
there is inbreeding in the pedigree file, checks are made to
ensure that imported IBDs for parent-offspring and self pairs
are correct (0.5 and 1, respectively). An option is provided
to make the parent-offspring error checking appropriate for
X-linked markers. Checks are also made to ensure that imported
IBDs for unrelated individuals are equal to 0.
The options for the 'ibd import' command are
-file (or -f) Import IBDs from this filename.
-nod7 Don't take D7 from input file; set D7
to zero instead.
-ibdid Input file contains SOLAR indexed IDs
(IBDIDs) rather than permanent IDs.
-xlinked Use error checking appropriate for
X-linked markers
Notes: The computed IBDs are stored in gzipped files with names of the
form 'ibd.<marker>.gz', where <marker> is the marker name. All
working files created during the IBD computation process will be
stored in the marker-specific subdirectories created by the
'marker load' command.
Mitochondrial IBDs are stored in the gzipped file 'ibd.mito.gz'.
If a marker exists in the marker data that is named 'mito', the
same file name will be used to store the IBDs for that marker.
Hence, the marker name 'mito' should not be used if you intend
to use the 'ibd mito' command.
Shortcuts: ibd - ibd
Purpose: Set directory in which IBD matrix files are stored
(twopoint only; use mibddir to set up multipoint)
Usage: ibddir <dirname> ; set directory for IBD files
ibddir ; show current ibddir
ibddir -session ; show ibddir entered in this session
Notes: The ibddir selected is saved in file ibddir.info for
future SOLAR sessions. Once an ibddir is selected, it
need not be selected again within the same working directory,
EXCEPT for the purposes of writing out ibd files. To
prevent accidentally overwriting pre-existing ibd files,
it is necessary to explicitly enter the ibddir
command before using the ibd command or other commands
which write files into the ibddir.
Shortcuts: ibdd - ibddir
Purpose: Set or display IBD processing options.
Options: XLinked select this option to load X-linked marker data
NoMLE if this option is chosen, MLE allele frequencies are not
required for IBD calculation
MCarlo if this option is chosen, the Monte Carlo method will be
used to calculate IBDs
MibdWin size (in cM) of the multipoint IBD window - the MIBDs at
a given chromosome location depend only on markers inside
or on the boundary of the window centered at that location
Usage: ibdoption ; displays current IBD options
ibdoption xlinked ; toggles the XLinked option
ibdoption xlinked <y/n> ; sets the XLinked option
ibdoption xlinked ? ; displays the current setting of XLinked
ibdoption nomle ; toggles the NoMLE option
ibdoption nomle <y/n> ; sets the NoMLE option
ibdoption nomle ? ; displays the current setting of NoMLE
ibdoption mcarlo ; toggles the MCarlo option
ibdoption mcarlo <y/n> ; sets the MCarlo option
ibdoption mcarlo ? ; displays the current setting of MCarlo
ibdoption mcarlo # <num> ; sets number of imputations
ibdoption mcarlo # ; displays number of imputations
ibdoption mcarlo max <y/n> ; choose max risk for first imputation?
ibdoption mcarlo max ? ; displays max risk option
ibdoption mibdwin ; displays the multipoint IBD window size
ibdoption mibdwin <size> ; sets the multipoint IBD window size
Shortcuts: ibdo - ibdoptions
Purpose: Compute marker-specific IBS matrices.
Usage: ibs <marker> [<marker> ...]
; computes IBSs for specified markers
ibs ; computes IBSs for all markers
Before the ibs command can be run, the directory in which to
store the IBSs must be specified with the 'ibddir' command.
Notes: The computed IBSs are stored in gzipped files with names of the
form 'ibs.<marker>.gz', where <marker> is the marker name.
Shortcuts: ibs - ibs
Purpose: Check if a Tcl global variable exists
Usage: if_global_exists <global_name>
Returns 1 if global exists, 0 otherwise.
Notes: This is used in scripts in an "if" statement. For example:
if {[if_global_exists SPECIAL_CASE]} {
global SPECIAL_CASE
puts "This is case $SPECIAL_CASE"
}
You do not need to declare the variable "global" before
calling if_global_exists. However, you will need to declare it
global before setting or using it in a script. Note that all
variables declared at the interpreter level (at the solar>
prompt) are automatically global. Global variables should
not be confused with "shell" variables such as SOLAR_BIN
(though, all shell variables may be found in the global
array "env", for example, $env(SOLAR_BIN)).
Global variables are a convenient way of passing variables
through many levels of procedure calls without rewriting all
the intervening procedures, or across commands on an ad hoc basis.
Use of global variables is considered "bad style" by programming
purists and other bores. But if they're so smart, why aren't they
writing your program? It is true, however, that use of global
variables can sometimes introduce bugs and other unintended
consequences.
Global variables prefixed with SOLAR_ are reserved for use by
the standard SOLAR procedures defined in solar.tcl. But solar.tcl
might also use unprefixed globals, so it is recommended that users
use their own unique prefix to be safe.
See Also: remove_global
Purpose: Check if a parameter exists without creating it
Usage: if_parameter_exists <parameter_name>
Returns 1 if parameter exists, 0 otherwise.
Notes: This is used in scripts in an "if" statement. For example:
if {[if_parameter_exists h2q1]} {
constraint e2 + h2r + h2q1 = 1
}
Purpose: Save inverse normal transformation to a file (see also define)
IMPORTANT: To create a model using an inverse normal transformation,
it is more convenient to use the "define" command, and NOT
the inormal command. The "inormal" command itself is for
those rare situations where you need to save the inverse
normal transformation to a file for some other purpose.
Usage: define <defname> = inormal_<phenotype>
trait <defname>
inormal -trait <trait> [-file <filename>] -out <filename>
[-phenfile] [-nofamid]
(See notes below for obscure forms of the inormal command not
recommended for most users.)
Notes: For the "define" command, the <defname> can be any name you
can make up. The inormal_ prefix may be abbreviated down to inor_ .
The <phenotype> is any phenotypic variable in the currently
loaded phenotypes file.
For the "inormal" command itself, you must use one of the
arguments "-phenfile" or "-file <filename>". The "-phenfile"
argument is a shorthand way of specifying the currently loaded
phenotypes file. The "-file <filename>" argument is used to
specify any file. In either case, the file must be in the form
of a phenotypes file, with fields for <trait> and ID (and FAMID
if required to make the ID's unique). BE SURE TO SPECIFY THE
"-out" ARGUMENT FOR THE OUTPUT FILE.
The inverse normal transformation of a dataset is performed
by the following procedure:
The trait values are sorted, and for any value V found
at position I in the sorted list, a quantile is computed
for it by the formula I/(N+1). The inverse normal
cumulative density function (see "normal") is computed for
each quantile and stored in an array keyed by ID, and
FAMID if applicable. When the value V occurs multiple times,
the inverse normal is computed for each applicable quantile,
averaged, then the average is what is stored for each ID.
These values are accessed when the ID is provided. The
array for each trait is deleted by the -free option.
See also the "normal" command, which computes normal distribution
functions. inormal uses a "normal -inverse" command.
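The procedure above can be sketched in a few lines of Python (an illustration of the algorithm as described, not SOLAR's implementation; the function names are invented). Trait values are ranked, the quantiles I/(N+1) are passed through the inverse normal CDF, and tied values receive the average of their quantiles' inverse normals:

```python
import math
from collections import defaultdict

def inv_normal_cdf(p):
    """Inverse standard normal CDF via bisection on erf (stdlib only)."""
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if 0.5 * (1.0 + math.erf(mid / math.sqrt(2.0))) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def inormal(values_by_id):
    """Inverse normal transform keyed by ID, averaging tied values."""
    items = sorted(values_by_id.items(), key=lambda kv: kv[1])
    n = len(items)
    # quantile I/(N+1) for sorted position I (1-based), then inverse normal
    z = [inv_normal_cdf((i + 1) / (n + 1.0)) for i in range(n)]
    # group tied trait values and average their inverse normals
    groups = defaultdict(list)
    for idx, (_ident, v) in enumerate(items):
        groups[v].append(idx)
    out = {}
    for v, idxs in groups.items():
        avg = sum(z[i] for i in idxs) / len(idxs)
        for i in idxs:
            out[items[i][0]] = avg
    return out
```

For three distinct values the middle ID gets quantile 0.5 and hence a transformed value of 0, with the others placed symmetrically around it.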
OBSCURE FORMS OF THE INORMAL COMMAND
Largely for internal purposes, such as the implementation of
the define command, there are additional obscure forms of the
inormal command which save the inverse normal results in
a tcl variable for access on an individual ID basis:
inormal -trait <trait> [-file <filename>] -tclvar
[-phenfile] [-nofamid]
inormal -trait <trait> -id <id> [-famid <famid>]
inormal -free <trait>
inormal -reset
The first form above is like the standard form, except that the
-out argument is replaced with a -tclvar argument, signifying
that the inverse normal results are to be saved to a Tcl variable
associated with the trait name. In the second form, a result is
obtained from previously stored data for each ID. In the third
form, stored data for a particular trait is freed. In the
fourth form, all stored data is freed.
The -out and -tclvar arguments cannot be used at the same time.
If the -out argument is used, inverse normals are simply written
to a file and nothing is stored, so the second form cannot be
used.
FAMID should only be specified if required.
The rules regarding FAMID are almost identical to
those used during maximization, so in general you don't
have to think about them. If FAMID field is found in both
pedigree and phenotypes files, or if pedigree file isn't loaded
(which wouldn't be allowed during maximization) and FAMID is
found (only) in phenotypes file, FAMID is automatically required,
unless the -nofamid argument is used. If FAMID is found in
only one of the two files (and both are loaded), a test for
ID uniqueness is performed, then if ID's are unique without FAMID,
it is not required, otherwise FAMID is required and if not present,
it is an error. FAMID can be mapped to any other field name using
the field command.
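The FAMID rules above amount to a small decision procedure, sketched here in Python (the function and its arguments are invented for illustration; SOLAR's internal logic may differ in detail):

```python
def famid_required(ped_fields, phen_fields, ids, nofamid=False):
    """Decide whether FAMID is needed, per the rules described above.

    ped_fields:  set of field names in the pedigree file, or None if
                 no pedigree file is loaded.
    phen_fields: set of field names in the phenotypes file.
    ids:         list of bare (FAMID-less) IDs appearing in the data.
    """
    ped_has = ped_fields is not None and "FAMID" in ped_fields
    phen_has = "FAMID" in phen_fields
    # FAMID in both files, or in the phenotypes file with no pedigree
    # loaded: automatically required, unless -nofamid was given.
    if (ped_has and phen_has) or (ped_fields is None and phen_has):
        return not nofamid
    # FAMID in only one of the two loaded files: required only if the
    # bare IDs are not unique on their own.
    if ped_has != phen_has:
        return len(set(ids)) != len(ids)
    return False
```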
When using these obscure forms of the inormal command, it is
recommended to load the data and then use it in short order,
even though the inormal command doesn't intrinsically require
this. Internal "inormal" data is not saved from one SOLAR
session to the next.
BEWARE that "maximize" or any SOLAR command that performs
maximization, such as "polygenic" or "multipoint", may clear
out inverse normal data stored using -tclvar. Also, if
different layers of procedures get inormals on traits
with the same name from different files, and their inormal
operations overlap, there could be problems.
Purpose: Set cM interval and range for multipoint scanning each chromosome
Usage: interval <count> <range> ; set increment count and range
interval <count> ; default range is 0-* (* means last)
interval ; displays current setting
Examples: interval 5 ; Check every 5 cM
interval 1 101-109 ; Check every 1 cM between 101 and 109
interval 10 200-* ; Check every 10 cM after <200 cM>
interval 0 100 ; Check at position <100 cM>
interval -5 *-100 ; Check every 5 cM from last to 100
Shortcuts: interv - intervals
Purpose: Row and column inversion on comma delimited file
Usage: invert <infile> <outfile>
Note: All records must have same length. First record is treated like all
others. To invert Pedsys file, use ped2csv first. Memory usage for
extremely large files (>100 MB) could be a problem. If memory is exhausted
while caching the file in memory, solar might crash to the shell prompt.
Purpose: Check if value is NaN (Not a Number)
Usage: is_nan <number>
Returns 1 if number is NaN, 0 otherwise.
Notes: This is most useful in scripts. When getting the likelihood or
another value using read_model, you should check that it is
not NaN due to a maximization convergence error.
Purpose: Join files horizontally based on ID's
Usage: joinfiles [-all] [<filename>]* [-out <filename>] [-key <keylist>]
-list <filename> -chunk <chunksize>
-out <filename> Write joined records to this file (default is
joinfiles.out in the working directory)
-key <keylist> Base join on these key(s) (default is ID or EGO,
and also FAMID if FAMID is present in all files)
-all Filenames may be patterns with wildcards
(including * to match any sequence of characters
and ? to match any one character) and/or names of
directories whose files will be included.
(Files in subdirectories are not included.)
When using -all, no system limit on open files
is applicable.
-list <filename> Include all files listed in <filename>, which
has one filename in each line, which may be
a pattern with wildcards. Only one -list
may be used. When using -list, no system limit
on open files is applicable.
-chunk <chunksize> The chunk size used in joining files under
-all and -list options. By joining only one
chunk of files at a time, the system limit
on open files is bypassed. The default is
100.
Some additional esoteric options are described below in Note 7.
Notes:
1) Each file may either be Comma Delimited or Pedsys, and successive files
may use different formats.
2) The output file will be Comma Delimited, thus this command also serves
to translate one or more Pedsys files to Comma Delimited format.
3) Any field mapping of ID and FAMID to some other name through the
"field" command will be applied if the keys are defaulted. Key
matching is case insensitive, so the key might be "ID" in one file
and "id" in the next.
4) Records will be output for every ID encountered regardless of whether
that ID is found in all files.
5) If keys are specified, you'd better know what you are doing.
No field name mapping or testing of whether FAMID is required
will be done. However, whether defaulted or not, the availability
of keys in every file will be tested.
6) If the same filename is repeated in the list of files to join, the
repeats are ignored (for technical reasons). If you must join the
same file to itself for some legitimate reason (???), copy it
to another filename first.
7) If the same field name(s), other than the key(s), are found in more
than one file, the default action is to rename them in the output
file in a way so as to be unique. The following format is used:
<field name>.<filename>[.<number>]
If adding the filename makes the field name unique, that is all that
is done, which makes for a nice new name. For example:
q4.gaw10.phen (phenotype q4 in file gaw10.phen)
Otherwise, <number> is applied, using the first number (starting
from 2 and ending with 30002) that makes the field name unique.
Unless there are more than 30000 matching field names, this will
guarantee that a unique name will be found and used. Also,
with reasonably short names, it is likely that the resulting name
will be unique within 18 characters, which is currently required
for trait and covariate names. However, uniqueness within 18
characters is not guaranteed as that would require ugly renaming
and it's quite possible the 18 character limit may be removed
eventually anyway. Uniqueness testing is case insensitive.
There are two other optional ways of handling field names which are
not unique. These option specifiers may be used anywhere after the
command name but apply globally to all files.
-uniqueonly Remove fields which are not unique among files
(except keys).
-norename Don't rename fields that are not unique, just
include them. (Note: If this option is applied,
the resulting file may cause problems with
various SOLAR commands. For example, the
"residual" command won't like it even if the
duplicated field is NOT used as a trait or
covariate.)
8) If the same fieldname is repeated in one file, that field is
not included in the output. (Such fields could not be selected
as traits or covariates in SOLAR anyway.) This typically occurs
when there is a field named BLANK to separate columns in a Pedsys
file. Also, fields with the "null" name (zero characters or all
blanks) are not included.
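The renaming scheme in Note 7 can be illustrated with a short Python function (a sketch of the scheme as documented, not the actual joinfiles code):

```python
def unique_field_name(field, filename, taken):
    """
    Rename a duplicated field following the <field name>.<filename>[.<number>]
    scheme from Note 7. 'taken' is the set of names already in use;
    comparison is case insensitive, as in joinfiles.
    """
    taken_lc = {t.lower() for t in taken}
    candidate = "%s.%s" % (field, filename)
    if candidate.lower() not in taken_lc:
        return candidate
    # numbers from 2 to 30002, as described above
    for number in range(2, 30003):
        numbered = "%s.%s" % (candidate, number)
        if numbered.lower() not in taken_lc:
            return numbered
    raise ValueError("more than 30000 matching field names")
```

For example, a q4 field from gaw10.phen would first try "q4.gaw10.phen", then "q4.gaw10.phen.2", and so on.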
Purpose: Set up linkage model with esd, gsd, qsd parameters (EXPERIMENTAL)
Usage: linkqsd <path>/<mibdfile>
linkqsd0 <path>/<mibdfile>
Example: model new
trait q4
covar age sex
polygsd
maximize
gsd2h2r
chromosome 9 10
interval 5
mibddir gaw10mibd
multipoint -link linkqsd0 -cparm {esd gsd qsd}
Notes: Polygenic parameters must already have been set up (use the
polygsd command). Preferably it should also have been maximized
(use the maximize command).
linkqsd modifies the model currently in memory. linkqsd0 assumes
the existence of a null0 model in the maximization output directory,
loads that, and then adds the linkage element.
We have not worked out suitable heuristics to force maximization
of difficult models, so at the present time this parameterization
is not as robust as our standard parameterization.
Purpose: Set up parameters and constraints for multipoint linkage model
Usage: linkmod [-add [-epistasis]] [-zerostart] [-2p] <ibdfile>
[-cparm] [-se]
-add means add this to previous linkage elements
(otherwise, it supersedes the previous one)
-zerostart is used for score analysis (see below)
-2p twopoint (ibd not mibd)
-epistasis sets up epistasis parameter between new
element and previous linkage element. Use with
"-add". Not supported for bivariate.
-se Turn on standard error calculation option.
The default is to turn it off.
-cparm "Custom Parameterization" Simply replace old
matrix with new matrix. Parameters, constraints,
and omega are unchanged. A "prototype" model
with suitable matrix, parameters, omega, and
constraints must already be loaded. See
final note below for more information.
Note: if -cparm is specified, standard errors
are NOT turned off, but left in whatever state
they were in when linkmod was called.
Notes: Use the -2p option for twopoint models. linkmod2p is now
obsolescent (linkmod -2p is invoked when you give the linkmod2p
command).
A polygenic or linkage model should already have been
created and maximized first. Boundaries are set around existing
values assuming this has been done.
Multiple linkage terms will be included if Solar_Fixed_Loci
is defined. The script multipoint does this.
By default, standard error is turned off. You may turn it on
again by giving the command 'option standerr 1' after running
linkage and before running maximize.
The -zerostart option starts the new linkage component at 0.
(The linkage component MUST NOT HAVE BEEN ALREADY CREATED!)
This is used for score test analysis.
The -cparm option requires that a prototype linkage model
with all required matrices, parameters, omega terms, and
constraints be already loaded. Other than that, it ATTEMPTS
to be as general as possible. However, it is
necessary to make one assumption regarding the name of
the first matrix. If the -2p option is specified, the
relevant matrix that will be modified must be named
ibd or ibd1, ibd2, etc. Otherwise, the relevant matrix
must be named mibd or mibd1, mibd2, etc. It is the
ibd or mibd matrix with the highest number, if any,
which will be replaced. If a second matrix column such
as d7 or delta7 is included, it will be assumed to be
included in the replacement matrix as well. This option
is used by "multipoint -cparm" and "twopoint -cparm".
Shortcuts: linkm - linkmodel
Purpose: Load a user data file (pedigree, phenotype, marker, etc.)
Usage: load pedigree <filename>
load phenotypes <filename>
load matrix <filename> <name1> [<name2>]
load model <filename>
load freq [-nosave] <filename>
load marker [-xlinked] <filename>
load map [-haldane | -kosambi] <filename>
Notes: There is much more information available for each variant of
the load command. See the documentation for the
particular type of data file, for example, "help pedigree".
For information about a particular file format, see the
applicable file-* documentation, for example, "help file-pedigree".
The "load" command is actually an alias which reverses the
first two words. So "load pedigree" is the same command
as "pedigree load". We feel that "load pedigree" is more
natural, so we allow both.
Shortcuts: load - load
Purpose: Load a matrix named phi2.gz containing phi2 and delta7
Usage: loadkin
Notes: If the file phi2.gz does not exist, this command will be
silently ignored. This command is mainly for scripts. You
can perform the same action with a known matrix file with:
matrix load phi2.gz phi2 delta7
Shortcuts: loadk - loadkinship
Purpose: Calculate LOD score
Usage: lod [<test-loglike> <null-loglike>] [<options>]
options := [-auto|-off|-raw] [-trait <N>] [-rhoq <N>] [-v]
[-1t|-2t|-3t|-4t|-t1|-t2|-t3|-t4] [-nolodadj]
If no likelihoods are specified, the likelihoods of the
"current" model and the applicable "null" model are used.
-auto Convert multivariate LOD to 1df effective LODs based
on number of traits in current model and constraint
of relevant rhoq's (default)
-off Do not convert LODs to 1df effective
-raw Do not perform LOD conversion or lodadj
-traits <N> Convert multivariate LOD to 1df assuming <N> traits
-1t or -t1 Assume 1 trait (same as "-traits 1")
-2t or -t2 Assume 2 traits (same as "-traits 2")
-3t or -t3 Assume 3 traits (same as "-traits 3")
-4t or -t4 Assume 4 traits (same as "-traits 4")
-rhoq <N> Convert multivariate LOD to 1df assuming <N>
constraints of relevant rhoq's
-nolodadj Do not perform lod adjustment (lodadj)
-v verbose: Show adjustment and conversion steps
Examples: outdir test1
load model test1/null1
lod
lod -v
lod -2000.51 -2030.87
lod -trait 3 -rhoq 1 -v -2000 -2030
lod -raw -2000 -2030
Notes: If no likelihoods are specified, the current model must have
been maximized through a command such as "maximize," "twopoint",
or "multipoint", and the applicable null model should be saved as
nullX.mod (e.g. null0.mod, null1.mod) where X is the number
of active linkage elements, which is assumed to be one less
linkage element than in the current model. Linkage elements are
parameters named h2q1, h2q2, etc. The null model must have
been saved in the maximization output directory, either named
after the trait or set by the outdir command.
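The raw LOD itself is just the base-10 log of the likelihood ratio: the difference of the two natural-log likelihoods divided by ln 10. A minimal sketch (not SOLAR code; the function name is ours):

```python
import math

def lod_score(test_loglike, null_loglike):
    """Raw LOD: base-10 log of the likelihood ratio of test vs. null."""
    return (test_loglike - null_loglike) / math.log(10.0)

# e.g. "lod -2000.51 -2030.87" corresponds to about 13.19
```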
By default, SOLAR provides easily interpreted "1 df effective" LODs
which are equivalent to those in univariate models.
However, you can also have complete control over the LOD
conversion performed either using arguments here or
preferences set globally with the lodp command. Options
specified here override the defaults and lodp preferences.
The correction of 2 trait LODs to 1dF effective LODs is based
on this formula: the LOD is converted to chi square with
1/2 1df, 1/4 3df, and 1/4 point mass at zero. If rhoq is
constrained, the formula is 1/2 1df, 1/4 2df, and 1/4
point mass at zero. This is then converted to a 1/2 1df
chi square of equivalent p-value, which is divided by
2ln10 to get the 1df effective lod score.
The correction of 3 trait LODs to 1dF effective LODs is based
on the formula: the LOD is converted to chi square with
3/8 1df, 3/8 3df, 1/8 6df, and 1/8 point mass at zero.
For each rhoq constrained, the 6df is changed downward
by 1df.
The conversion of higher multivariate LODs follows a similar
expanding sum. If you wish to see the weights used, use the
lod command with the -v option.
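The 2-trait conversion above can be sketched as follows (our own Python, using closed-form chi-square tail probabilities; SOLAR's internal implementation may differ in detail):

```python
import math

def sf_chi2(x, df):
    """Upper-tail probability of a chi-square variate (closed forms
    for the 1, 2, and 3 df cases used in the mixtures above)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    if df == 3:
        return math.erfc(math.sqrt(x / 2.0)) + \
               math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)
    raise ValueError("df not supported in this sketch")

def bivariate_lod_to_1df(lod, rhoq_constrained=False):
    """2-trait LOD -> 1df effective LOD via the stated mixture:
    1/2 chi2(1df) + 1/4 chi2(3df, or 2df if rhoq is constrained)
    + 1/4 point mass at zero."""
    if lod <= 0.0:
        return 0.0
    x = 2.0 * math.log(10.0) * lod              # LOD -> chi-square statistic
    p = 0.5 * sf_chi2(x, 1) + 0.25 * sf_chi2(x, 2 if rhoq_constrained else 3)
    # Invert to a 1/2 chi2(1df) statistic of equal p-value (bisection;
    # if p >= 0.5 the effective LOD is driven to zero).
    lo, hi = 0.0, x + 50.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if 0.5 * sf_chi2(mid, 1) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) / (2.0 * math.log(10.0))
```

The converted LOD is always smaller than the raw bivariate LOD, and constraining rhoq (which removes a degree of freedom) yields a slightly larger effective LOD.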
Empirical LOD adjustment, if any, is automatically applied (see
the lodadj command) unless the -raw option is used. Unless you
specify -raw, SOLAR will need to search the output directory for
a lodadj.info file, which means that a trait or outdir must
have been selected.
Empirical LOD adjustment is not yet supported for bivariate
models. The lodadj value is ignored when bivariate LODs are
computed, and, in the cases where the lodadj value would be
shown (such as in the multipoint.out file, or if lod is called
from the command prompt) a warning message is shown instead.
In SOLAR version 3.0.2, the "clod" and "lod" commands were
combined into a new "lod" command. The options allowed
have changed compared with the earlier "clod" ; the original
"lod" command did not allow any arguments.
Use the "lodn" command if you the current model may not use
the "h2q1" linkage parameter and you are not specifying
loglikelihoods explicitly.
See also lodn, lodp, lodadj.
Shortcuts: lod - lod
Purpose: Use or calculate an empirical LOD adjustment
Usage: lodadj [-calc] [-off] [-null <N>] [-nreps <#replicates>] [-restart]
[-cutoff <fraction>] [-overwrite] [-restorenull0] [-query]
[-summary]
[-restorenull0] [-query] [-summary]
lodadj If no arguments are given, this turns ON the
previously calculated empirical LOD adjustment
for the current trait/outdir. This value is
stored in a file named lodadj.info if currently
ON or lodadj.off if currently OFF.
It is an error if the null0 model has a later
timestamp than the lodadj.info file. (You can
update the timestamp of the lodadj.info file with
the Unix "touch" command if you are sure it is OK.)
-off Turn OFF empirical LOD adjustment.
-query Return the LOD adjustment currently in effect
(1.0 if none).
-calc Calculate and use a new empirical LOD adjustment.
(This requires an existing null0.mod file from the
polygenic command.) The adjustment is turned ON.
-null Use an existing nullN model instead of null0.
-nreps Number of replicates. In each replicate, a
fully-informative marker, unlinked to the trait,
is simulated, IBDs are calculated for this marker,
and a LOD is computed for linkage of the trait
to this marker. The default number is 10000.
-restart (or -r) Perform additional replicates, adding the
LODs to the set of previously computed LODS, until
the total number of replicates (old and new) reaches
the number specified by the -nreps argument. The
same null model is used as in the previous replicates;
the -null argument is ignored if present.
-cutoff Specify the fraction of the highest observed LODs
that will not be used to compute the empirical LOD
adjustment. For example, if the cutoff is .01, then
the largest 1% of the observed LODs will be ignored
when the LOD adjustment is calculated. The default
cutoff is .05.
-overwrite (or -ov) Recalculate the empirical LOD
adjustment. Existing LOD adjustment output
files in the trait/outdir will be overwritten.
-restorenull0 Restore the null0 model in effect at the time
the last empirical LOD adjustment was
calculated. This will overwrite a later
null0 model.
-summary Display a summary of the LOD adjustment calculation.
The summary shows the distribution of the original
and adjusted LOD scores, the number of replicates
performed, and the name of the null model.
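The -cutoff trimming described above amounts to discarding the largest fraction of the replicate LODs before the adjustment is computed; a sketch (the function name is ours):

```python
def apply_cutoff(replicate_lods, cutoff=0.05):
    """Drop the largest `cutoff` fraction of replicate LODs, as the
    -cutoff option does before the adjustment is calculated."""
    lods = sorted(replicate_lods)
    drop = int(len(lods) * cutoff)
    return lods[:len(lods) - drop] if drop else lods
```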
Notes: The -calc option produces output files in the trait/outdir:
lodadj.out, lodadj.lods, and lodadj.info. lodadj.out contains
summary information, lodadj.lods contains the raw actual vs.
theoretical LODs, and lodadj.info contains state information
including the null model currently in effect.
The lodadj value and state (on or off) is saved in each
trait/outdir (by the lodadj.info or lodadj.off file). This
is now preserved when restarting SOLAR.
lodadj is now supported for bivariate lods. Since the
correction is always computed with one additional degree of
freedom, the lodadj adjustment is applied AFTER the
lod correction to one degree of freedom, and the user is
advised not to disable the one degree of freedom correction
with the lodp command.
Purpose: Calculate LOD score for current model relative to nullX
Usage: lodn X <options>
X Number indicating index of relevant null model (for
example, 0 for null0, the model having no linkage
elements).
<options> See "help lod".
Notes: In many cases you can more easily use the "lod" command, which
determines the applicable null model automatically, or, you can
specify the loglikelihoods. "lodn" may be useful if you
are not sure whether the current model contains h2q parameters,
for example, if it includes a custom parameterization.
The current model must have been maximized, either through the
"twopoint" or "multipoint" command, or directly with the
"maximize" command.
The null model should be saved as nullX.mod (for example, null0.mod
or null1.mod) where X is the number of active linkage elements.
There are many special options for LOD calculation. See
"help lodp" for more information. The primary LOD calculating
procedure in SOLAR is "lod" which lets you specify the
loglikelihood values and option(s) directly.
See also lod, lodp.
Shortcuts: lodn - lodn
Purpose: Change LOD preferences (such as conversion to 1df)
Usage: lodp [-auto | -off | [-traits <N>] [-rhoq <N>]]
[-1t|-2t|-3t|-4t|-t1|-t2|-t3|-t4]
(If no argument is given, current preferences are displayed.)
-auto Convert LODs to 1 degree of freedom (1dF) effective LODs
automatically based on traits and rhoq constraints
in current model (default).
-off Do not perform LOD conversion to 1 df equivalence
-traits <N> Convert assuming there are/were <N> traits
-1t or -t1 etc. Shortcuts for 1 trait, 2 traits, etc. up to 4
-rhoq <N> Convert assuming <N> rhoq's are constrained
Notes: If -traits is specified without -rhoq, -rhoq is assumed to be 0.
If -rhoq is specified without -traits, trait count is determined
automatically (and might be determined to be 1, in which case
rhoq specification is irrelevant). If you need to set both
-traits and -rhoq, you must give both in the same lodp command.
This should not be confused with lodadj (see). The lodp command
sets global preferences for "degrees of freedom" issues which
arise with multivariate models. The default "-auto" will
apply conversion based on the number of traits in the current
model and the number of relevant rhoq's (defined below) which
are constrained. LODs will be converted to 1 degree of freedom
effective LODs (for which the traditional cutoff in statistical
genetics for a genome-wide linkage scan is 3).
Relevant rhoq's are parameters prefixed rhoq which correspond
to the highest numbered linkage element. For example, in a
bivariate linkage model with one linkage element, the relevant
rhoq whould be "rhoq1", but with two linkage elements, it would
be "rhoq2". For a trivariate model with one linkage element,
the relevant rhoq's would be: rhoq1_12, rhoq1_13, rhoq1_23.
The preferences set by this command will apply to all LOD scores
subsequently calculated by SOLAR, including those reported by
the twopoint and multipoint commands, and the lod and lodn
commands. The lod command, which is what ultimately
calculates all LOD scores, has options which are similar to
lodp.
Changes to lodp preferences apply only within the current
SOLAR session, so the command must be re-entered each time
SOLAR is started or at the beginning of SOLAR scripts when
you need to change the defaults.
For more discussion of how the conversion is performed,
which rhoq constraints are relevant, etc., see help for
the lod command.
See also lod, lodn, and lodadj.
Purpose: Get the log likelihood of the current model
Usage: loglike
Note: This could be used in a TCL script like this:
set firstll [loglike]
All this procedure does is retrieve the stored loglikelihood.
The current model must have been maximized first, either with
the maximize command, or with a command like twopoint or
multipoint which maximizes and compares many models. If
the current model has not been maximized, an error is raised.
Shortcuts: logl - loglikelihood
Purpose: Apply current lodadj to a previous multipoint run
Usage: madj
madj -restore ;# restore ORIGINAL multipoint files
madj -undo ;# restore previous multipoint files
Notes: trait or outdir must already have been selected.
madj applies the lodadj value from the lodadj.info file in trait/outdir.
madj may be used on incomplete multipoint runs prior to restarting.
It is not necessary to -restore before applying another lodadj.
Some roundoff errors occur in last decimal place, but do not
"accumulate" over multiple runs because LOD's are calculated
from loglikelihood values, not previous LOD's. NULL models must
be present.
If there is an error, there should either be a "no files modified"
or "restoring previous files" message. If not, or if it is
desired to restore the ORIGINAL multipoint files for any
reason, use the command "madj -restore." The FIRST time madj is
run, those files were saved as multipoint*.save. (The PREVIOUS
set of multipoint files were also saved as multipoint*.tmp, and
may also be restored with the "madj -undo" command.)
Shortcuts: madj - madj
Purpose: Process the map data.
Usage: load map [-haldane | -kosambi | -basepair] <filename>
; loads map file
map show [<marker> ...] ; displays map data
map unload ; unloads map data
map names ; displays marker names
map fname ; returns name of map file
map chrnum ; returns chromosome identifier
map nloci ; returns number of loci in map
map func ; returns mapping function code
('b'=basepair, 'h'=Haldane,
'k'=Kosambi)
In the map file, marker locations are typically given in cM.
When multipoint IBDs are computed, the distances between pairs
of markers are converted to recombination fractions by means of
a mapping function. By default, the Kosambi mapping function
is assumed. The Haldane mapping function can also be used by
specifying the -haldane option when loading the map file.
Marker locations can also be specified as integer numbers of
basepairs. This is useful, for example, when the markers are
SNPs with known offsets in basepairs from some starting location.
When basepair locations are used, the mapping function is called
"basepair" rather than Kosambi or Haldane, but in fact there is
no mapping provided from basepairs to recombination fractions.
Therefore, such maps cannot be used to compute multipoint IBDs.
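Both mapping functions convert a map distance d (in Morgans) to a recombination fraction theta: Haldane uses theta = (1 - exp(-2d))/2, Kosambi uses theta = tanh(2d)/2. A sketch with distances given in cM, as in a map file:

```python
import math

def haldane_theta(d_cm):
    """Haldane mapping function: theta = (1 - exp(-2d))/2, d in Morgans."""
    d = d_cm / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * d))

def kosambi_theta(d_cm):
    """Kosambi mapping function: theta = tanh(2d)/2, d in Morgans."""
    d = d_cm / 100.0
    return 0.5 * math.tanh(2.0 * d)
```

For small distances both functions give theta approximately equal to the distance in Morgans; both approach the limit of 0.5 (free recombination) as distance grows.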
In map files which contain cM locations, the first line of the
file can optionally include the name of the mapping function,
in which case no command line option is required to specify the
mapping function. If the load command specifies a different
mapping function from that specified in the map file, the load
command option takes precedence.
Map files which contain basepair locations must either have
the basepair mapping function specified on the first line of
the file or be loaded using the -basepair option.
If a marker name is not specified for the 'map show' command,
all currently loaded map data will be displayed.
The name of the currently loaded map file and the mapping
function are stored in the file 'map.info' in the current
working directory. This file persists between SOLAR runs,
which means that the map file will still be loaded the next
time SOLAR is invoked (from within the same working directory.)
For a description of the map file, enter 'file-map'
Shortcuts: map - map
Purpose: Process the marker genotype data.
Usage: load marker [ -xlinked ] <filename> ; loads marker file
marker unload [ -nosave ] ; unloads marker genotype data
marker discrep [<marker> ...] ; checks for marker discrepancies
marker names ; displays marker names
marker show [<marker> ...] ; displays summary of marker data
marker fname ; returns name of marker file
The '-xlinked' option of the 'marker load' command can be given
when loading genotype data for X-linked markers. Alternatively,
the XLinked option can be set with the 'ibdoption' command.
Genotype data will not be unloaded for markers for which MLE
allele frequencies have been computed but not saved to a file.
To save MLE allele frequencies, use the 'freq save' command.
To unload markers without saving MLE allele frequencies, give
the '-nosave' option in the 'marker unload' command.
If a marker name is not specified for the 'marker discrep' or
the 'marker show' command, the command applies to all markers
currently loaded.
The state of the currently loaded marker data is stored in the
file 'marker.info' in the current working directory. This file
persists between SOLAR runs, which means that the markers which
are loaded at the end of a session will still be loaded the
next time SOLAR is invoked (from within the same working
directory.)
Notes: The marker load command creates a subdirectory in the current
working directory for each marker. Marker subdirectories are
named 'd_<marker>', where <marker> is the marker name. The
contents of a subdirectory will depend on the type of marker
processing performed, and will include various input, output,
and (possibly) error files. The marker subdirectories are
deleted when the marker genotype data is unloaded.
The loci in the marker file must all be autosomal or all be
X-linked. By default, SOLAR assumes the loci are autosomal.
The set of markers in the marker file and the set of markers in
the freq file do not have to be the same. Allele frequencies
will be computed for markers that do not appear in the freq file
at the time these markers are loaded.
For a description of the marker file, enter 'file-marker'
Shortcuts: mark - markers
Purpose: Test markerfile for discrepancies; list blankable ID's
Usage: markertest <markerfile> [<marker>]* [<option>]
<option> := -1 | -many | -ped | -fam <famfile> | -2
<markerfile> is either exact filename or pattern including *
<marker> (optional) is either exact name or pattern including *
If no markers are specified, all markers in markerfile are tested.
Each marker is tested individually.
Results are recorded in markertest.out in current directory.
During normal operation, many error and warning messages may
appear. Ignore these messages until markertest has finished.
If no options are specified, a flexible procedure is used that
should work in nearly all cases. It is the same as markertest -1
followed by markertest -many if necessary. IMPORTANT: Read the
following two paragraphs to understand how those options work.
-1 Blank one individual at a time. Report all individual
blankings (if any) that fix discrepancy. If this succeeds
only one of the reported individuals needs to be blanked
and it is up to user to pick the best one. However, this
procedure is good (if it works) because it will list all
the possibilities, and it is relatively fast. But if it
is necessary to blank more than one individual AT THE
SAME TIME, this procedure will fail, so it is frequently
inadequate by itself.
-many Blank the first individual, and, if that doesn't fix
discrepancy, blank the second individual, and then the
third, and so on, until the discrepancy is fixed. Then,
unblank all the individuals that can be unblanked without
a discrepancy returning. The result is one set of
individuals that ALL NEED TO BE BLANKED AT THE SAME TIME.
It is not necessarily the only such set of individuals,
or the smallest set. This procedure should always succeed
in finding a set of blankable individuals. (This
option used to be named -r.)
-ped Rather than blanking only one ID at a time, blank
whole "extended pedigrees" at a time. Blankable
pedigrees are identified by pedindex.out PEDNO
(the index created by "load pedigree") and
by the first ID found in that pedigree. This procedure
is the fastest, and is useful in identifying errant
pedigrees, provided there is only one errant pedigree.
-fam Rather than blanking only one ID at a time, blank
nuclear families (or other groups) identified by
"famno." The "famfile" must contain records
including "ID" (permanent ID) and "FAMNO" (other
fields will be ignored). There may be more than
one record for each ID. Records may not use
"FAMID" to disambiguate ID's.
-2 Try blanking combinations of 2 ID's until one such pair
of blankings fixes the discrepancy. Because this is an
N x N problem, it may take a VERY LONG TIME to finish, but
if you are convinced there is one pair that needs to be
blanked, this procedure will find it.
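The -many strategy can be sketched abstractly: blank individuals in order until the discrepancy disappears, then unblank any who can be unblanked without it returning. Here has_discrepancy stands in for an actual marker check and is hypothetical:

```python
def blank_many(ids, has_discrepancy):
    """Sketch of the -many strategy. has_discrepancy takes a set of
    blanked IDs and reports whether a discrepancy remains."""
    blanked = []
    for i in ids:                            # blank one at a time
        blanked.append(i)
        if not has_discrepancy(set(blanked)):
            break
    for i in list(blanked):                  # unblank all who can be
        if not has_discrepancy(set(blanked) - {i}):
            blanked.remove(i)
    return blanked                           # all need blanking together
```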
Notes: Pedigree file must already have been loaded
Markerfile must have ID and marker(s) only. Each marker is
analyzed separately. Results for all markers are reported in
markertest.out, which is divided into one section for each
marker.
Shortcuts: markert - markertest
Purpose: Prepend pedindex checksum (CRC) to the beginning of a matrix file
Usage: matcrc [<path>/]<filename> [<pedindexdir>]
If pedindex.out is in a different directory, the optional
<pedindexdir> is specified to point to where it is located.
It defaults to the current directory "."
Quick Notes:
Matrix files in SOLAR include phi2.gz, IBD, and MIBD files.
If you have modified a pedindex file directly and are subsequently
having pedigree mismatch errors with your matrix files, but
are sure everything is correct, you can simply run matcrc on
all your matrix files to fix the problem quickly. However it is
recommended you read all about matcrc first.
Examples: matcrc phi2.gz
matcrc gaw10mibd/mibd.1.10.gz
The first two lines in an unzipped matrix file created by SOLAR
since Version 4.0 will look like this:
1 1 0.1268587045 .61377
1 1 1.0000000 1.0000000
The first line is a checksum used to verify that the current
pedindex file is the same as the one used when the matrix was
created. This checksum is not required when the matrix is read,
but if it is present, and it does not match the checksum values
from the pedindex file, an error will stop maximization until the
problem is fixed.
The checksum is immediately followed by the actual data for the
diagonal identity pair 1,1.
When the matrix is actually being read, from beginning to end, the
second line values will overwrite the first line values in the
memory used for matrix storage. The checksum line will therefore
have no effect on maximization results. The numbers in the
checksum line are only used for matrix/pedindex validation in
SOLAR versions 4.0 and beyond.
matcrc is used by all SOLAR procedures that create matrix files
since version 4.0. At this time it is optional for user-created
matrix files, but we recommend that users writing their own
matrix files also use matcrc to postprocess them so that their
matrices are also
protected from being used with a changed pedigree. Even a slight
change to a pedigree file can change the numbering of all
individuals in the pedindex, and thus make a previously created
matrix file entirely incorrect.
Notes: The relevant pedindex.out file is assumed to be in the current
working directory. The matrix file can be in any other directory
so long as the relative or absolute path is given.
The current user must have write access to the matrix file
and to the directory itself, because during matcrc operation
the matrix file is unzipped and then rezipped.
A Posix compliant "cksum" command is assumed to be available
on the system. It produces a quasi-unique polynomial Cyclic
Redundancy Check (CRC) on the pedindex.out file; the chances of
any different pedindex.out file having the same CRC is
astronomically small. This includes changes such as swapping
characters or lines.
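For reference, the POSIX cksum CRC can be reproduced as follows (a sketch of the standard algorithm: CRC-32 with polynomial 0x04C11DB7, zero initial value, no bit reflection, the message length appended least-significant octet first, and a final one's complement):

```python
def posix_cksum(data):
    """CRC as produced by the POSIX cksum utility."""
    poly = 0x04C11DB7
    crc = 0

    def feed(byte):
        nonlocal crc
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ poly) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF

    for b in data:
        feed(b)
    n = len(data)
    while n:                      # append length octets, LSB first
        feed(n & 0xFF)
        n >>= 8
    return (~crc) & 0xFFFFFFFF
```

Any single-byte change to the input (including swapped characters or lines) yields a different CRC, which is why the checksum catches modified pedindex files.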
The CRC and Number of Octets (NOCT) is prepended to the beginning
of the matrix in such a way as to be backward compatible with
earlier versions of SOLAR. The first line in the actual matrix
is used as a template, but the coefficients are replaced with
CRC and NOCT. Since this is followed by the same line with the
actual coefficients, the actual data overwrites the preceding
numbers and the checksum in the preceding line will have no
ill effect with any version of SOLAR, even those which did not
use the checksum.
Beginning with SOLAR version 4.0, when any matrix is loaded,
the CRC and NOCT (if present) are detected and compared with
those from the current pedindex.out file. When SOLAR
version 4.0 loads a matrix with no checksum, no warning is given,
but it may cause a warning or error in some future version. A
mismatch in the CRC or NOCT will cause an error when loading.
The mismatch can be corrected by running matcrc again, because it
detects whether or not the matrix has already been signed, and
removes the previous signature if necessary. However this should
only be done after the user has carefully validated the pedigree
and matrix match. It would be safest to reconstruct all matrix
files with the current pedigree; if that is done by SOLAR
procedures, the user need not run matcrc, since those procedures
run it automatically.
User construction of matrix files is discussed in Sec. 8.3 of
the manual. Matrix files are space delimited with semi-fixed
format. The first two columns are for the two IBDID values,
followed by one or two data value columns. The data value
columns should begin in character position 14 or higher
(counting the first position as 1). Once gzipped, the file
should be processed with the matcrc command after loading the
pedigree file, to generate a safety checksum value.
Purpose: Set up matrix variables (e.g. ibd, mibd, kinship)
Usage: matrix load <filename> <name1> [<name2>]
; loads matrix variable(s) (one or two) from file
matrix ; displays current matrices
matrix delete <name> ; deletes a matrix variable
matrix delete_all ; deletes all matrix variables
matrix debug ; print info about current matrices
matrix -return ; return matrix commands in Tcl list
Notes: A .gz suffix is appended to the filename, if it is not specified.
Matrix files are compressed with (GNU) gzip; gunzip must be
installed in the user's path.
Matrix names beginning with 'ibd' or 'd7' are 'defaultable.'
This means that if they have a value of -1, that value is replaced
with Phi2 or Delta7 respectively. By default, Phi2 and Delta7 are
calculated internally, but may be overwritten with externally
provided matrices.
The preferred way to set up linkage models is simply to let
the multipoint, twopoint, and bayesavg commands do everything
for you. For more control, you can also use the "linkmod"
(multipoint) and "linkmod2p" (twopoint) commands to set up
(but not evaluate) linkage models. linkmod and linkmod2p
set up all the required parameters and constraints for you.
The command "loadkin" will load the phi2.gz matrix file,
bypassing the usual on-the-fly calculations performed by SOLAR.
User construction of matrix files is discussed in Sec. 8.3 of
the manual. Matrix files are space delimited with semi-fixed
format. The first two columns are for the two IBDID values,
followed by one or two data value columns. The data value
columns should begin in character position 14 or higher
(counting the first position as 1). Once gzipped, the file
should be processed with the matcrc command after loading the
pedigree file, to generate a safety checksum value.
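A record in this format can be produced by padding the two IBDID columns so the data begins at character position 14; a sketch (the helper name is ours):

```python
def matrix_line(ibdid1, ibdid2, *values):
    """One matrix record: two space-delimited IBDID columns, then one
    or two data values beginning at character position 14 (1-based)."""
    prefix = "%d %d" % (ibdid1, ibdid2)
    return prefix.ljust(13) + "  ".join("%.7f" % v for v in values)
```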
Shortcuts: matr - matrixes
Purpose: Find the maximum loglikelihood of a model by adjusting
parameter values within specified constraints.
Usage: maximize [-quiet] [-out <filename>]
-quiet (or -q) Use minimum verbosity while maximizing
-out (or -o) Write results to this filename. The default
is 'solar.out' in a subdirectory named after
the current trait, or the current 'outdir.'
If a filename is specified without any /
characters, it will also be written in the
default location. If the filename contains
/ characters, it is used as a full pathname.
-noquad Do not test quadratic
-who Do not maximize, but list who would be
included in analysis. File "who.out" is
written to trait/outdir containing list
of pedindex.out sequential ID's. This
option is used by the "relatives" command
-runwho Maximize, AND produce who.out file as above.
-sampledata Do not maximize, but write out the data that
would be included in the analysis to a file
named "sampledata.out" in the maximization
output directory. [WARNING! The fields in
this file are preliminary and subject to
change!] The trait data (fields trait1,
trait2, etc.) might be from a phenotype or
an expression created by the "define" command.
Notes: This is the key command of solar. It is used by polygenic,
twopoint, multipoint, bayesavg, and other scripts to find
the model parameter values which produce the maximum
loglikelihood.
The final values are not saved to a model file. To do that,
issue a 'save model' command after maximization.
Multiple solar processes should not be writing to the same
directory. Use the outdir command to specify different output
directories.
Advanced SOLAR users sometimes use the raw "cmaximize" command
which bypasses many of the retry mechanisms (and their implicit
assumptions) now built into SOLAR. This is not recommended for
most users.
Shortcuts: maxi - maximize
Purpose: Show total memory used by this SOLAR process
Usage: memory
Notes: This is intended primarily for internal debugging purposes.
Currently it only works with Solaris 2.6 and above, and there
is no intention to extend it to other systems or releases.
Shortcuts: mem - memory
Purpose: Run Measured Genotype (MG) association analysis for every SNP
Usage: mga [-files [<gcovfile>]+ ] [-snps <snp-tcl-list>]
[-out <outfile>] [-snplists [<snplistfile>]+]
[-format csv | pedsys | fortran] [-noevd] [-notsame]
[-saveall] [-slowse] [-evdse]
SPECIAL NOTE: no filenames or snp names should begin with hyphen (-)
To use evd2, specify "option modeltype evd2" just before running mga.
For evd2, both -evdse and -slowse produce the same fast evd2 standard
errors.
Before invoking mga, user should set up trait and covariate(s) for
the null model. It is not necessary to maximize the model prior to running
mga. You may choose to set up the modeltype and omega, but it is not
necessary. The omega will default to polygenic and the modeltype will be
set to evd (evd1).
When default evd models are used, mga can automatically detect when the
sample has changed, and run new null models as needed.
At mga completion, mga_null will be the last null model created.
If you wish to save more information, use the -saveall option
(described below) to save all maximization output files.
-noevd Do not use EVD fast maximization or sample checking. Normally
this is not needed because mga tests whether EVD can be
used and changes to -noevd model when EVD cannot be run. However,
sometimes EVD uses too much memory, so this can help.*
-notsame Samples not same, therefore always run models for each SNP!
Normally this is not needed because mga tests the sample automatically.
If "option samplesametrustme 1" is given prior to running mga,
mga will only run one null model for the first sample, unless
EVD is used and fast checking is available.
-slowse Run standard errors, not using EVD
-evdse Run standard errors, using EVD if possible
Standard Errors are not run by default, and 0 will appear in the
standard error output field.
The -slowse option will disable EVD because standard errors computed when
using EVD in the current EVD1 implementation are sometimes inaccurate.
The -evdse option will compute standard errors and use EVD if
possible, which will be faster if EVD can be used. A warning will be
given about possible SE inaccuracy.
If no -snps or -snplists arguments are given, mga will process
all snp_ prefixed covariates in the currently loaded phenotypes files.
<gcovfile> is one or more snp.genocov file generated by 'snp covar' command.
These files are scanned for snp_ prefixed covariates in addition to the
initially loaded phenotypes files.
<snp-tcl-list> is a tcl-list of snps. If specified, the -snps list
supersedes -snplists.
<snplistfile> is a file listing the snps to be processed. Each snp
is listed on a separate line with no blanks or other characters.
The snp_ prefix is not required, but allowed.
Currently, no checking is done to see if any listed snp is found or
not found, or duplicated in subsequent files.
<outfile> defaults to mga.out in the current output directory, which is
cleared out first, then one line of information is produced for each SNP.
If another filename is specified, it is located relative to the output
directory or full absolute pathname, and it is not erased first to permit
accumulation. Note: if the "mgassoc" (original) command name is used, the
output file is named mgassoc.out by default.
-debug Print extra messages useful in debugging, including all
maximization output
-format csv, fortran, pedsys. csv is default. fortran
and pedsys formats are identical space delimited except
that fortran version has a header line at top, pedsys version
writes a code file (.cde). Actually, both fortran and
pedsys options write the code file in case you need it later.
If you have written pedsys format, it can be converted to
comma delimited with the command ped2csv. If you have written
a comma delimited format file, you can convert it to pedsys
format with the command mg_topedsys.
-saveall save all solar.out output files. Each output file will be named
mga_<snpname>.out. Null model output files will be named
mga_null_<snpname>.out based on the first <snpname> they
were created for (null will be re-used if sample unchanged).
Notes:
The genotype covariates are numeric variables giving the observed
(or imputed) number of rare alleles an individual has at a particular SNP.
These fields must be named 'snp_<snp>' where <snp> is the SNP name.
Ex: snp_rs12345
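Such a covariate is simply the per-individual count of the rarer allele at the SNP. A hypothetical coding helper (not part of SOLAR; in practice these fields come from the 'snp covar' command):

```python
from collections import Counter

def snp_covariate(genotypes):
    """Hypothetical additive coding: for each individual, the count
    (0, 1, or 2) of the rarer allele at this SNP."""
    counts = Counter(a for g in genotypes for a in g.split("/"))
    rare = min(counts, key=counts.get)        # least frequent allele
    return [g.split("/").count(rare) for g in genotypes]
```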
*EVD is not currently available for multivariate or discrete models.
However, discrete traits may be handled as quantitative (at some loss in
accuracy) and therefore used with EVD by using either
"option enablediscrete 0" or (generally preferred, to increase SD):
define qtrait = dtrait * 5
solar::mga
Purpose: Run Measured Genotype (MG) association analysis for every SNP
Usage: mga [-files [<gcovfile>]+ ] [-snps <snp-tcl-list>]
[-out <outfile>] [-snplists [<snplistfile>]+]
[-format csv | pedsys | fortran] [-noevd] [-notsame]
[-saveall] [-slowse] [-evdse]
SPECIAL NOTE: no filenames or snp names should begin with hyphen (-)
To use evd2, specify "option modeltype evd2" just before running mga.
For evd2, both -evdse and -slowse produce the same fast evd2 standard
errors.
Before invoking mga, user should set up trait and covariate(s) for
the null model. It is not necessary to maximize the model prior to running
mga. You may choose to set up the modeltype and omega, but it is not
necessary. The omega will default to polygenic and the modeltype will be
set to evd (evd1).
When default evd models are used, mga can automatically detect when the
sample has changed, and run new null models as needed.
At mga completion, mga_null will be the last null model created.
If you wish to save more information, use the -saveall option
(described below) to save all maximization output files.
-noevd Do not use EVD fast maximization or sample checking. Normally
this is not needed because mga tests whether EVD can be
used and switches to a -noevd model when EVD cannot be run. However,
sometimes EVD uses too much memory, so this can help.*
-notsame Samples not same, therefore always run models for each SNP!
Normally this is not needed because mga tests the sample automatically.
If "option samplesametrustme 1" is given prior to running mga,
mga will only run one null model for the first sample, unless
EVD is used and fast checking is available.
-slowse Run standard errors, not using EVD
-evdse Run standard errors, using EVD if possible
Standard Errors are not run by default, and 0 will appear in the
standard error output field.
The -slowse option will disable EVD because standard errors computed when
using EVD in the current EVD1 implementation are sometimes inaccurate.
The -evdse option will compute standard errors and use EVD if
possible, which will be faster if EVD can be used. A warning will be
given about possible SE inaccuracy.
If no -snps or -snplists arguments are given, mga will process
all snp_ prefixed covariates in the currently loaded phenotypes files.
<gcovfile> is one or more snp.genocov files generated by the 'snp covar' command.
These files are scanned for snp_ prefixed covariates in addition to the
initially loaded phenotypes files.
<snp-tcl-list> is a tcl-list of snps. If specified, the -snps list
supersedes -snplists.
<snplistfile> is a file listing the snps to be processed. Each snp
is listed on a separate line with no blanks or other characters.
The snp_ prefix is allowed but not required.
Currently, no checking is done to see if any listed snp is found or
not found, or duplicated in subsequent files.
<outfile> defaults to mga.out in the current output directory, which is
cleared out first, then one line of information is produced for each SNP.
If another filename is specified, it is located relative to the output
directory or full absolute pathname, and it is not erased first to permit
accumulation. Note: if the "mgassoc" (original) command name is used, the
output file is named mgassoc.out by default.
-debug Print extra messages useful in debugging, including all
maximization output
-format csv, fortran, or pedsys; csv is the default. The fortran
and pedsys formats are identical space-delimited formats, except
that the fortran version has a header line at the top and the
pedsys version writes a code file (.cde). Actually, both the
fortran and pedsys options write the code file in case you need
it later. If you have written pedsys format, it can be converted
to comma-delimited with the ped2csv command. If you have written
a comma-delimited file, you can convert it to pedsys format with
the mg_topedsys command.
-saveall save all solar.out output files. Each output file will be named
mga_<snpname>.out. Null model output files will be named
mga_null_<snpname>.out based on the first <snpname> they
were created for (null will be re-used if sample unchanged).
Notes:
The genotype covariates are numeric variables giving the observed
(or imputed) number of rare alleles an individual has at a particular SNP.
These fields must be named 'snp_<snp>' where <snp> is the SNP name.
Ex: snp_rs12345
*EVD is not currently available for multivariate or discrete models.
However, discrete traits may be handled as quantitative (at some loss in
accuracy) and therefore used with EVD by using either
"option enablediscrete 0" or (generally preferred, to increase SD):
define qtrait = dtrait * 5
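As a sketch of the additive coding described in the Notes (illustrative Python, not SOLAR code; the helper name and genotype representation are hypothetical), each snp_<snp> covariate is simply the count of rare alleles per individual:

```python
from collections import Counter

def snp_covariate(genotypes):
    """Count of rare (minor) alleles per individual: 0, 1, or 2.

    genotypes: list of (allele1, allele2) tuples; None for missing.
    Returns a list of counts, with None where the genotype is missing.
    """
    # Find the rarer allele from the observed allele counts
    # (ties broken alphabetically).
    tally = Counter(a for g in genotypes if g is not None for a in g)
    minor = min(tally, key=lambda a: (tally[a], a))
    return [None if g is None else sum(a == minor for a in g)
            for g in genotypes]

# Three typed individuals and one missing genotype; G is the rare allele.
calls = [("A", "A"), ("A", "A"), ("A", "G"), None]
print(snp_covariate(calls))  # -> [0, 0, 1, None]
```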
Shortcuts: mgas - mgassociation
Purpose: Compute multipoint IBDs.
Usage: mibd relate [-mxnrel <n>] ; creates relative-class file
mibd merge ; merges marker IBDs
mibd means [-typed | -all] ; computes mean IBD by relative-class
mibd [<from> <to>] <incr> ; computes multipoint IBDs
mibd export [-file <filename>] [-overwrite] [-append]
[-nod7] [-ibdid] [-byloc] [<chromo> ...]
; writes MIBDs for specified chromosomes
; to a file in comma delimited format
mibd import [-file <filename>] [-nod7] [-ibdid] [<chromo> ...]
; reads MIBDs for specified chromosomes
; from a file in comma delimited format
mibd prep <program> [-version 2.82] [-usefreq] [-qter]
; prepares input files needed to compute
; MIBDs using <program>, where <program>
; is loki, simwalk (sw), merlin, or
; genehunter (gh)
mibd import <program> [-file <filename>] [-version 2.82]
; imports MIBDs from an output file
; computed by <program>, where <program>
; is loki, simwalk (sw), merlin, or
; genehunter (gh)
Before any mibd command can be run, the directory in which to
store the mIBDs must be specified with the 'mibddir' command. This
specification is stored in a file mibddir.info in the working
directory, so it need not be repeated in future sessions from
the same working directory.
The first record in all matrix files produced by SOLAR, including
mIBD matrix files, is a checksum and not real data; see the matcrc
command for details. This checksum is optional in user created
matrix files. If present, it prevents using a matrix with a
different or changed pedigree.
The 'mibd relate' command can be run after the pedigree
file has been loaded, and only needs to be run once per data set.
A tally of the relative classes present in the data set can then
be displayed with the 'pedigree classes' command.
A pair of individuals may be related in multiple ways, e.g. as
1st cousins and as 2nd cousins. To conserve memory, there is a
default limit on the number of ways any two individuals may be
related. For some complex pedigrees, it may be necessary to
specify a higher limit using the '-mxnrel' option.
The remaining commands in the first group of mibd commands shown
above must be run once for each chromosome. The 'mibd merge'
command must be run first, followed by the 'mibd means' command.
The 'mibd means' command can take one of two options: -typed or -all.
If the -typed option is specified, only the IBDs for pairs of
individuals who are both genotyped will be used to compute mean
IBDs by relative class. If the -all option is specified, the
IBDs for all pairs of individuals are used. The default option
is -all.
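The distinction between -typed and -all can be sketched as follows (illustrative Python, not SOLAR's implementation; the data layout is hypothetical):

```python
from collections import defaultdict

def mean_ibd_by_class(pairs, typed_only=False):
    """Mean IBD per relative class.

    pairs: iterable of (rel_class, ibd, both_typed) tuples.
    typed_only=True mirrors -typed; the default mirrors -all.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for rel_class, ibd, both_typed in pairs:
        if typed_only and not both_typed:
            continue  # -typed: skip pairs unless both are genotyped
        sums[rel_class] += ibd
        counts[rel_class] += 1
    return {c: sums[c] / counts[c] for c in sums}

data = [("sibs", 0.5, True), ("sibs", 0.4, False), ("sibs", 0.6, True)]
print(mean_ibd_by_class(data, typed_only=True))
```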
The following steps are required before computing multipoint IBDs
for chromosome N:
1. Compute the marker-specific IBDs for all markers on
chromosome N. For more information on computing marker-
specific IBDs, enter 'help ibd'.
2. Load the map file for chromosome N.
3. Use the 'ibddir' command to specify the directory where
the marker-specific IBDs are stored.
4. Use the 'mibddir' command to specify the directory where
the multipoint IBDs are to be written.
Only the last of the first four mibd commands shown above need
be entered (for a particular chromosome). If the merged IBD
file does not exist, 'mibd merge' will automatically be run to
create it. If the mean IBD file does not exist or is older than
the merged IBD file, 'mibd means' will be run. Note that when
'mibd means' is run automatically, the default option, -all,
will be used. The 'mibd means' command must be issued directly
in order to use the -typed option. If any of the marker IBD
files is newer than the merged IBD file, a warning message will
be displayed. In order to update the merged IBD file, the
'mibd merge' command must be issued directly - this will not be
done automatically.
The 'mibd export' command outputs the multipoint IBDs for a
specified set of chromosomes into a comma delimited file.
The MIBDs must be stored in the directory named in the 'mibddir'
command. If no chromosomes are specified, then all multipoint
IBDs found in the 'mibddir' directory are exported. By default,
the SOLAR indexed IDs (IBDIDs) in the MIBD files are translated
to permanent IDs, and family IDs are included when present in
the pedigree file. The default name for the output file is
"solar-mibd-export.out". The default fields in the output file
are CHROMO, LOCATION, [FAMID,] ID1, ID2, IBD, and D7, where
LOCATION is the chromosomal location in cM.
WARNING: The file to which MIBDs are exported can become very
large. To keep export files to a manageable size, it may be best
to export MIBDs on a per-chromosome basis, i.e. one export file
per chromosome, or on a per-location basis by using the -byloc
option.
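One way to work with a per-chromosome export file is to parse it with an ordinary CSV reader. A sketch assuming the default field names without FAMID and a header row (the help does not specify whether a header line is written, so check your file first):

```python
import csv
from collections import defaultdict

def ibd_by_location(path):
    """Group IBD coefficients from a 'mibd export' file by (CHROMO, LOCATION).

    Assumes the default fields without FAMID and a header row; adjust
    the field names if your export differs.
    """
    groups = defaultdict(list)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            key = (row["CHROMO"], float(row["LOCATION"]))
            groups[key].append(float(row["IBD"]))
    return groups
```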
The options for the 'mibd export' command are
-file (or -f) Export MIBDs to this filename.
-overwrite (or -ov) Overwrite existing output file.
-append (or -a) Append MIBDs to existing output file.
-nod7 Don't include D7 field from MIBD files.
-ibdid Write out SOLAR indexed IDs (IBDIDs)
rather than permanent IDs.
-byloc Export MIBDs on a per-location basis,
i.e. one export file per location. The
export files are given unique names by
appending the chromosome number and the
location to the filename given by the
-file option.
The 'mibd import' command inputs the multipoint IBDs for a
specified set of chromosomes from a comma delimited file.
MIBD files are written and stored in the directory named in the
'mibddir' command. If an MIBD file for an imported chromosomal
location already exists, it is overwritten. By default, the
permanent IDs in the input file are translated to SOLAR indexed
IDs (IBDIDs). Family IDs must be included in the input file
when they are present in the pedigree file. The default name
for the input file is "solar-mibd-import.in". The default
fields in the input file are CHROMO, LOCATION, [FAMID,] ID1,
ID2, and IBD. If the input file does not contain a D7 field,
all D7 values in the MIBD files are set to zero. By default,
all MIBDs in the input file are imported. If chromosomes are
specified on the command line, however, MIBDs are imported for
those chromosomes only.
NOTE: The order of the chromosomes and chromosomal locations
in the input file is unimportant, but all the lines for a given
chromosomal location MUST BE ADJACENT. To be safe, you may want
to sort the input file by chromosome and chromosomal location
to ensure that the input file is ordered correctly.
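Such a sort can be done outside SOLAR; a sketch assuming a header row with the default CHROMO and LOCATION fields and numeric chromosome codes:

```python
import csv

def sort_mibd_import(src, dst):
    """Sort a comma-delimited MIBD file by chromosome and location so
    that all lines for a given chromosomal location are adjacent.

    Assumes a header row with the default CHROMO and LOCATION fields
    and numeric chromosome codes (adjust the key for X, etc.).
    """
    with open(src, newline="") as f:
        reader = csv.DictReader(f)
        fields = reader.fieldnames
        rows = sorted(reader,
                      key=lambda r: (int(r["CHROMO"]), float(r["LOCATION"])))
    with open(dst, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)
```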
The options for the 'mibd import' command are
-file (or -f) Import MIBDs from this filename.
-nod7 Don't take D7 from input file; set D7
to zero instead.
-ibdid Input file contains SOLAR indexed IDs
(IBDIDs) rather than permanent IDs.
The 'mibd prep' command generates the input files needed to
compute multipoint IBDs using a program other than SOLAR.
The programs currently supported are Loki, SimWalk2, Merlin and
GeneHunter. Before this command can be run, marker data and a
map file must have been loaded. The input files are generated
from various files created by SOLAR when pedigree and marker
data are loaded and so contain SOLAR indexed IDs (IBDIDs).
The marker locations written to an input file will be in Haldane
cM. If the user has loaded a Kosambi map file, the necessary
conversion to Haldane is made automatically. By default, IBDs
will be calculated at every integer cM location from 0 to the
last marker in the map file. The '-qter' option extends the range
of locations to the end of the chromosome. For each chromosome,
SOLAR defines qter as the nearest integer location greater than
or equal to the position (in Haldane cM) of the last marker on
that chromosome in the deCODE map.
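The Kosambi-to-Haldane conversion mentioned above follows from the two standard map functions (this is a generic sketch, not SOLAR's own code): a Kosambi distance d implies recombination fraction r = tanh(2d)/2, and the corresponding Haldane distance is -ln(1-2r)/2.

```python
import math

def kosambi_to_haldane(cm):
    """Convert a Kosambi map distance (in cM) to Haldane cM.

    Kosambi: r = tanh(2d)/2; Haldane: d = -ln(1 - 2r)/2 (d in Morgans).
    """
    d = cm / 100.0                          # cM -> Morgans
    r = math.tanh(2.0 * d) / 2.0            # recombination fraction
    return -math.log(1.0 - 2.0 * r) / 2.0 * 100.0   # back to cM

# Haldane distances are slightly larger, since no interference is assumed.
print(round(kosambi_to_haldane(10.0), 2))  # -> 10.99
```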
The allele frequencies in effect, whether read from a file or
computed by SOLAR, are passed to the multipoint IBD calculation
program, except in the case of Loki, for which the default action
is to let Loki estimate the allele frequencies. The '-usefreq'
option can be used to force Loki to use the current allele
frequencies.
NOTE: After the input files have been created, the user must exit
SOLAR and run the external program to compute the multipoint IBDs.
Once the IBD calculations are complete, the resulting output file
can be imported into SOLAR using the 'mibd import <program>'
command.
The 'mibd import <program>' command reads an output file which
contains the multipoint IBDs computed by a program other than
SOLAR, and imports those IBDs to create SOLAR-format MIBD files.
The programs currently supported are Loki, SimWalk2, Merlin
and GeneHunter. This command is designed to work with the
'mibd prep' command, so the output file is assumed to contain
SOLAR indexed IDs (IBDIDs), not the real IDs from the pedigree
data file. Before this command can be run, the 'mibddir'
command must have been given to specify the directory in which
the MIBD files are to be stored.
NOTE: For both the 'mibd prep' and 'mibd import' commands, if
SimWalk2 is the program chosen, then it is assumed that version
2.91 or a newer version of SimWalk2 will be used to compute the
multipoint IBDs. In previous versions of SOLAR, SimWalk2 version
2.82 was assumed. Due to a backward incompatibility in file
formats that was introduced in later SimWalk2 versions, if you
wish to use the earlier version of SimWalk2, it is now necessary
to include the '-version 2.82' option.
Notes: The computed multipoint IBDs are stored in gzipped files with
names of the form 'mibd.<chromo>.<loc>.gz', where <chromo> is the
chromosome number and <loc> is the chromosomal location.
Several additional files are created and used for multipoint
IBD calculation:
mibdrel.ped relative-class and kinship information
mibdchrN.loc marker locations on chromosome N
mibdchrN.mrg.gz merged marker-specific IBDs for chromosome N
mibdchrN.mean mean IBD by relative class for chromosome N
Shortcuts: mibd - mibd
Purpose: Set directory in which MIBD matrix files are stored
(multipoint only; use ibddir to set up twopoint)
Usage: mibddir <dirname> ; set directory for MIBD files
mibddir ; show current mibddir
mibddir -session ; show mibddir entered in this session
Notes: The mibddir selected is saved in file mibddir.info for
future SOLAR sessions. Once a mibddir is selected, it
need not be selected again within the same working directory,
EXCEPT for the purposes of writing out mibd files. To
prevent accidentally overwriting pre-existing mibd files,
it is necessary to explicitly enter the mibddir
command before using the mibd command or other commands
which write files into the mibddir.
Shortcuts: mibdd - mibddir
Purpose: Arrange miniature plots on a single page
Usage: miniplot [-pass <pass>] [-allpass] [-plots <number>]
[-port] [-land]
See also "plot -all"
-pass Do this pass number (default is 1)
-allpass Do all passes, each on a separate page
-plots Put this many plots on a page
-port Portrait layout
-land Landscape layout
-nodisplay Generate postscript, but don't display
An output file named passN.out (zero-padded, e.g. pass01.out for
pass 1) is created in the trait/outdir directory. The trait or
outdir must have been specified previously
and the plots must have been created previously (see usage for
example).
The individual chromosome plots should have been created previously
using the "plot" command. In fact, "plot -all" or "plot -allpass"
will invoke miniplot automatically.
This requires that Python be installed (version 1.5.2 and later
are known to work; earlier versions may also work). If you do
not have Python, use "plot -string" instead.
Shortcuts: minipl - miniplot
Purpose: Describe, save, or load a model
Usage: save model <modelname> ; save current model to a file
load model <modelname> ; load model from a file
model ; display model on terminal
model new ; reset to new empty model
Notes: An extension .mod is automatically appended if not specified.
You must specify directory path if you want to save model
in a subdirectory of the current directory.
Shortcuts: model - model
Purpose: Set or Display the Mu equation (trait value estimator)
Usually the covariate command is used to set this automatically,
but the mu command provides more advanced capabilities, such as
using log or sine functions.
Usage: mu ; displays Mu equation
mu = mu + <expression> ; add new terms to Mu equation
mu = <expression> ; replaces Mu equation (SEE DISCUSSION!)
mu reset ; restores default Mu equation
<expression> may include mathematical operators (+-*/^),
constants, parentheses, any mathematical functions defined in
the C programming language, any phenotypic variables included
in the analysis, sex, and for any variable "var" x_var (the
sample mean), min_var (the sample minimum), and max_var
(the sample maximum). Parameters whose names include
characters that would otherwise act as operators, such as
parentheses, *, or ^, must be enclosed in angle brackets <> to
prevent them from being parsed as expressions; note this always
happens for bivariate models, or when
there are interaction covariates such as age*sex, or squared
covariates such as age^2. For bivariate models, you can also
include "t1" and "t2": t1 is 1 if the mu is being evaluated
for the first trait, and 0 otherwise, and t2 has this behavior
for the second trait. All variables included in the mu will be
required in the sample.
Also it is possible for the mu to include inequality operators
(such as >=) and the "print" function (for debugging purposes).
In these regards, the mu expression is like the omega expression.
See "help omega" for more about inequalities and print, and
a complete listing of the mathematical functions available.
Discussion:
The "mu" embodies the estimation of the trait value for any individual
based on the sample mean and their covariate values. It does not
normally include genetic effects. The difference between this
estimate and the actual value is used to determine genetic and
other intra-individual effects. Thus, "mu" is evaluated in the
context of a single individual, and NOT a pair of individuals as
with "omega".
You can get examples of many possible "mu" commands by using the
mu command to display the current mu equation for different
models. For example:
solar> model new
solar> trait q1
solar> covar age
solar> mu
mu = \{Mean+bage*(age-x_age)\}
First notice that the entire body of this default mu equation is
delimited by \{ and \} characters. This is the portion which is
automatically generated by SOLAR and will be changed automatically
if your covariates are changed. You should not normally edit this
portion of the mu. If you need to change the mu, you can either
augment this portion with an additional expression, or replace the
mu altogether with a new expression, in which case you must leave
out the \{ and \} delimiters. If you replace the mu altogether with
a new expression, you are then responsible for including terms for
covariates (if any) and it is not necessary to use the "covariate"
command at all.
The Mean and bage terms refer to parameters in the model, the age term
refers to a data variable "age" found in the phenotypes file, and the
term x_age refers to the average age of all individuals in this sample.
You may include similar terms in any new mu expression.
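Read as an ordinary formula, the default mu above predicts the trait from the mean and the mean-centered covariate; a sketch with hypothetical parameter values:

```python
def default_mu(mean, bage, age, x_age):
    """Trait estimate from the default mu: Mean + bage*(age - x_age)."""
    return mean + bage * (age - x_age)

# Hypothetical values: mean and bage would come from the maximized
# model; x_age is the sample mean age.
print(default_mu(mean=100.0, bage=0.5, age=40.0, x_age=45.0))  # -> 97.5
```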
Adding To The mu
You can add additional terms either by appending them onto the mu shown
by the mu command (using terminal cut and paste makes this convenient)
or using the "mu = mu + ..." shorthand. For example, using the
shorthand, you could add a new term for log(weight-100) as follows:
solar> mu = mu + log(weight-100)
OR by entering the following:
solar> mu = \{Mean+bage*(age-x_age)\} + log(weight-100)
in either case, the result would be the same:
solar> mu
mu = \{Mean+bage*(age-x_age)\} + log(weight-100)
If you then added another covariate, that would be included automatically
in the default portion of the mu:
solar> covar sex
solar> mu
mu = \{Mean+bage*(age-x_age)+bsex*Female\} + log(weight-100)
Notice here that the variable "Female" changes according to the sex.
It is "0" for male and "1" for female.
Replacing the Mu
You can also replace the Mu altogether, removing the "default portion."
If you remove the specially delimited "default portion" from the mu,
your covariate commands will have no effect on the mu, and you will
either have to write the beta parameters into the mu yourself or
remove the covariates altogether. All phenotypic variables you
write into the model will be required for all individuals to be
included in the sample.
Continuing our example:
solar> covariate delete_all
solar> mu
mu = \{Mean\}
solar> mu = Mean + log(weight-min_weight)
solar> mu
mu = Mean + log(weight-min_weight)
The Mu can be as elaborate as you like, including any mathematical
functions defined in the "C" programming language. It need not include
the "Mean" parameter (in fact you do not even need a Mean parameter in
SOLAR anymore).
If you removed the default mu by mistake and need to restore it,
use the "mu reset" command.
Bivariate Mu
solar> model new
solar> trait q1 q2
solar> covar sex
solar> mu
mu = \{t1*(<Mean(q1)>+<bsex(q1)>*Female) + t2*(<Mean(q2)>+<bsex(q2)>*Female)\}
Notice that the mu for this bivariate model has separate terms for the first
and second traits, which are identified by "t1" and "t2". (The variable
"t1" is true if the first trait is being estimated, and false if the
second trait is being estimated.) If you replace the mu, any terms not
multiplied by "t1" or "t2" will be applied to the estimation of both
traits, and you may have as many (or as few) t1 and/or t2 terms as you
need.
Additional Notes:
(1) Use the "mu = mu + <expression>" as described above instead of
the now obsolescent "mu = <expression> + mu" to add to the mu.
Also, you may notice that square brackets are no longer used
to delimit the default mu. They did not work as had been intended.
The default portion of the mu is now delimited by \{ and \} which
may be included in a user specified mu. Everything within the
delimiters is maintained by SOLAR and editing this portion will
have no effect. It is simply displayed for informational purposes.
If the mu is defaulted, models will be saved with a mu "comment"
for informational purposes only; the actual mu is determined by
the covariates.
(2) As terms in the mu equation, you may use any constant, any
parameter, Sex, Mean, and any Phenotypic variable. There are
also predefined terms for any phenotype named 'var': x_var
(the sample mean), min_var (the sample min), and max_var (the
sample max). Any math operator (+,-,*,/) and function defined
in the C programming language may be used. Also, the ^ character
may be used to indicate exponentiation.
(3) Parameter names which include * or ^ should be enclosed in
<> angle brackets to prevent the names from being interpreted
as multiplication and/or exponentiation expressions:
mu = Mean + <bage*sex>*(age-x_age)*Female
(4) The default mu expression will display all variables as being
adjusted to their mean (e.g. age-x_age). However, during
maximization, if a variable is found to be binary, the
variable is adjusted to its minimum (e.g. diabet-min_diabet)
instead. This will be reflected after the first maximization.
User-created mu equations must always correctly specify
either the mean (x_) or min (min_) variable as required.
Shortcuts: mu - mu
Purpose: Perform a multipoint analysis.
Scan loci on selected chromosomes at selected interval
(use chromosome, interval, and finemap commands beforehand)
Usage: multipoint [<LOD1> [<LOD2> [<LOD3> ...]]] [-overwrite] [-restart]
[-renew mod] [-nullbase] [-plot] [-score]
[-cparm <plist>] [-rhoq <fixed value>] [-saveall]
[-ctparm <plist>] [-se]
Zero or more criterion LOD scores may be specified. If none are
specified, multipoint will make one full scan and then stop. If
one LOD score is specified, multipoint will continue scanning
until the highest LOD found in the last scan is no longer
greater than or equal to the LOD score specified. If more than
one LOD score is specified, each LOD will apply after one scan
has been completed. Then the last LOD specified will remain in
effect.
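The stopping rule described above can be sketched as follows (illustrative only; the function and parameter names are hypothetical):

```python
def keep_scanning(pass_number, highest_lod, criteria):
    """Decide whether multipoint scans again after a completed pass.

    criteria: the LOD scores given on the command line. Each applies
    after one pass; once exhausted, the last one stays in effect.
    An empty list means a single scan.
    """
    if not criteria:
        return False
    lod = criteria[min(pass_number - 1, len(criteria) - 1)]
    return highest_lod >= lod

# Mirrors "multipoint 3 1.9": rescan only if the best LOD reaches 3,
# then 1.9 for every later pass.
print(keep_scanning(1, 3.2, [3, 1.9]))  # -> True
print(keep_scanning(2, 1.5, [3, 1.9]))  # -> False
```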
-overwrite (or -ov) Overwrite existing multipoint output files.
-restart (or -r) Restart previous multipoint run
-nose Don't bother computing standard errors in best
models (S.E.'s are not normally computing while
scanning anyway). This should not be combined with
-slod option.
-plot plot each point while scanning (uses plot -quick)
Shows the current chromosome in the current pass,
ending with the last chromosome in the last pass.
To view previous passes, or for best quality plot,
use the plot command. The plot command may be run
simultaneously in other SOLAR sessions plotting the
same data from the multipoint*.out files. For more
information, see help for the plot command.
-score Use Score based LOD (S-LOD) defined as:
SLOD = (score(i)^2 * SE(i))/(2 ln(10)), where i is
the index of the new parameter.
-cparm <plist> Custom parameters. (See also -ctparm.) This is
discussed in Section 9.5 of the manual. Scanning
will consist of replacing one matrix with another
matrix, everything else is unchanged. The
starting model MUST be a "prototype" linkage model
with all the desired parameters, omega, and
constraints. Starting points and boundaries for
all variance parameters must be explicitly
specified. Following the -cparm tag, there must be
a list of parameters in curly braces that you want
printed out for each model. The list can be empty
as indicated by an empty pair of curly braces {}.
The matrix to be replaced must have name mibd or
mibd1, mibd2, etc. The highest such mibd will be
replaced. If the loaded matrix has two columns,
each succeeding matrix will also be loaded with two
columns. There must be a model named null0 in
the maximization output directory for LOD
computation. See section 9.5 for an example of
custom parameterization. Note: the user's
starting model is saved in the output directory
as multipoint.template.mod. Any or all parameters
in the <plist> may also be multiple-term
expressions. See second example below.
After revision in version 4.1.5, -cparm now
reloads the prototype model at the beginning of
each chromosome or during finemapping if there is
a gap greater than 11 cM. This provides much more
stable operation of -cparm and fixes the problems
that led most people to use -ctparm. However,
-ctparm may still be preferable in some cases where
there are convergence errors, or vice versa.
Another strategy is to set the interval to 1.
-ctparm <plist> Custom parameters, as -cparm, but rebuilding each
model from the "prototype" linkage model. This
might be slower, but it has the advantage of
greater reliability. If any model ends up with
parameter(s) on boundaries, it has no ill effect
on the remaining models.
-se Calculate standard errors in all linkage models.
Otherwise, they are always NOT calculated. This
is mainly useful in conjunction with -cparm
and -ctparm. See second example below.
-link <proc> Use specified (by name) procedure instead of
linkmod (default) to move to the next locus.
The procedure requires 1 argument which is the
full relative or absolute pathname to the mibd file.
For now, it should ignore additional arguments
(use a trailing "args" argument to do this).
-nullbase Reload null model as base for each linkage model.
The default is to start from the previous linkage
model if on the same chromosome.
-epistasis N Use current loaded model as the base for a one-pass
epistasis scan. N is the index of the mibdN to
be included in epistatic interactions (e.g. 1 for
mibd1). An additional parameter H2qE1 will be
added for the interaction term (for mibdN and
mibd<scan>). Output files will be named
multipointe.out and multipointe1.out. Only one
epistasis pass is currently supported; if
oligogenic scanning is desired that should be
done first before running an epistasis scan. At the
one QTL where mibdN and mibd<scan> are the same,
h2q<scan> is constrained to zero (it is not and
should not be constrained to zero elsewhere).
-rhoq <value> Constrain rhoq parameters to <value>.
-saveall Save the multipoint linkage models for every locus
tested, not just the best ones. The filenames look
like this: multi.pass1.2.3.mod for pass 1, chromosome
2, locus 3. The maximization output files are saved
also following the same naming convention but with
a .out suffix. Warning! This can fill up a lot of
harddrive space quickly. It is recommended to
restrict this to a chromosome and/or range (set with
the interval command) of interest.
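The S-LOD formula quoted in the -score option can be evaluated directly; a sketch taking the formula at face value (the score(i) and SE(i) values here are hypothetical):

```python
import math

def slod(score_i, se_i):
    """S-LOD = (score(i)^2 * SE(i)) / (2 ln 10), as quoted above."""
    return (score_i ** 2 * se_i) / (2.0 * math.log(10.0))

print(round(slod(2.0, 1.0), 3))  # -> 0.869
```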
Examples: multipoint 3 1.9
This will first do a full scan and zero-in scan once, then, if the
highest LOD >= 3, it will scan again. If the next highest
LOD >= 1.9, it will continue scanning until the last highest
LOD < 1.9.
trait q4
polymod
maximize
save model q4/null0
linkmod gaw10mibd/mibd.9.1.gz
option standerr 1
multipoint -ctparm {h2r h2q1 {par h2q1 se}} -se
This illustrates a simple use of the "custom parameterization"
option. Note that unlike the typical use of the multipoint
command, it is necessary to create a "prototype" linkage model
first (here it is done with the linkmod command, but one might
also use linkqsd or build the model "by hand" setting up the
parameters and omega). The list of parameters following -ctparm
may also include commands enclosed in a second level of braces.
The command must include more than one element as it is not
the braces but the element length that determines whether the
element is interpreted as a parameter or a command.
In this example, a command extracts the standard error of h2q1.
Requires: mibddir, chromosome, and interval commands must have been
given to select mibd files to use. finemap may be adjusted
with finemap command.
There must be a null0.mod model in the trait or outdir
directory. This can be created with the polygenic command
prior to running multipoint. (This model may include
household and covariate effects. See the help for the
polygenic command for more information.)
IMPORTANT NOTE: In most cases, multipoint starts by loading a model named
null0.mod from the current output directory. The
model currently in memory is ignored. This is done
because it is absolutely essential that the null0
model be the basis for building all multipoint models. However,
some options, such as -ctparm, use the model currently
in memory when multipoint is invoked because all
models are derived from a custom linkage model that
the multipoint command does not necessarily know how
to build.
Notes: 1. Summary output is written to multipoint.out in a subdirectory
named after the trait variable. You can set another output
directory with the outdir command. Contents of the output
directory will be purged of previous files at the beginning
of each invocation if -overwrite is used.
2. The final "best" linkage model is link.mod. In addition,
a series of additional "null" models is produced, starting
with null1.mod (containing 1 QTL), null2.mod, etc. These
models are produced only if a LOD criterion is specified
and satisfied (so there is more than one pass).
3. If a LOD adjustment is in effect (see lodadj command) it
is applied here.
4. If models have two traits, the 2df LOD scores will be
converted to 1df effective LOD scores. To override this,
use the lodp command (see). This feature
was first included with SOLAR beta version 2.0.1.
5. At the beginning of each pass through the selected genome,
multipoint calls a user script named multipoint_user_start_pass
which takes one argument, the pass number (which starts at 1
for the first pass). Within this routine, the user can change
the selected chromosomes or interval.
Shortcuts: mul - multipoint
Purpose: Keep K2 (phi2) terms from MIBD matrices
Usage: needk2
needk2 off
Notes: This command is now obsolescent and should not be used.
The K2 in MIBD files is obsolescent. We now maintain
a separate phi2.gz file for discrete trait analyses, and
for quantitative trait analyses, the K2 (phi2) values are
computed as needed.
Old Notes:
If you need to use any of the K2_* matrix values, issue the needk2
command before loading the matrix (or running 'multipoint.')
Normally the K2 values from matrix files are not used because
they are identical to the K2 values computed by SOLAR as needed.
The default (of not saving K2) cuts matrix memory usage in half.
Shortcuts: needk - needk2
Purpose: Start a new model
Usage: newmod [<trait>]+
<trait> Set the trait(s) to this/these trait(s). (The trait(s)
can be specified later. If not specified here, they
become <undefined>.)
Notes:
(1) This combines "model new", "outdir -default", and optionally
trait [<trait>]+ . This is now preferred to using the separate
commands, because it is shorter. For example, the command:
newmod q1 q2
takes the place of the commands:
outdir -default
model new
trait q1 q2
Clearly the "newmod" form is superior: it preserves the
essential information while reducing redundant keystrokes.
(2) Since this clears the outdir, it is advisable to use this
command instead of "model new" to be sure that the outdir
is cleared, and not inherited from some previous script.
From now on, the manual advises using "newmod" (and not
"model new") for this reason. However, the behavior of
"model new" itself is unchanged, so that existing scripts
that operate correctly will continue to operate correctly.
When combining previously written scripts that use "model new"
instead of "newmod", the user must be careful to update
"outdir" status if required. New scripts using "newmod" will
not be subject to the error of incorrectly inheriting an
unwanted outdir setting.
Shortcuts: newm - newmodels
Purpose: Recognize new or changed Tcl procedures in Tcl scripts
Usage: newtcl
Notes: At the time a SOLAR session is started, all Tcl scripts
(files ending with ".tcl") are scanned. The newtcl
command forces another such scan in order to recognize
new Tcl procedures (created AFTER the start of the SOLAR
session), or to recognize changes to Tcl procedures since
the first time those procedures were used (see explanation
below). You could also accomplish this by exiting from
and restarting SOLAR, but that is often inconvenient
because it causes the loss of session state.
The following directories are scanned by SOLAR for user scripts:
. (the current working directory)
~/lib (the lib subdirectory of your home directory, if it exists)
A procedure found in "." will supersede one found in "~/lib" having
the same name. Also beware that if the same procedure name is used
in more than one script file, the first one encountered will be
the one actually used. If the same procedure name is found in two
files in the same directory, the precedence is not predictable.
The scanning process simply looks through each script file for
"proc" (procedure) statements. An index of all the procedures
is then written to a file named tclIndex in the working directory.
This file will only be created if user-defined Tcl scripts are found.
Tcl procedures are only loaded into SOLAR the first time they
are used. Once loaded, they stay loaded, and may no longer reflect
the Tcl files in the scan path if those Tcl files are changed.
The newtcl command flushes all currently loaded procedures, so
the next time any procedure is invoked, it will be reloaded from
the file.
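Conceptually, the scan just looks for "proc" statements in each .tcl file and records where each procedure is defined, with earlier files taking precedence. A rough illustrative sketch in Python (SOLAR's actual scanner uses Tcl's own auto-load machinery; the function name scan_procs is made up):

```python
import re
from pathlib import Path

def scan_procs(directory):
    """Map each Tcl procedure name to the file that defines it.
    The first definition encountered wins, mirroring the precedence
    rule described above for duplicate procedure names."""
    index = {}
    for tcl_file in sorted(Path(directory).glob("*.tcl")):
        # Find lines beginning with "proc <name>" in each script file.
        for match in re.finditer(r"^\s*proc\s+(\S+)", tcl_file.read_text(),
                                 re.MULTILINE):
            index.setdefault(match.group(1), tcl_file.name)
    return index
```

This is only a sketch of the indexing idea; the real tclIndex file is written in Tcl's own auto-load format.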
The main Tcl file used by SOLAR is named solar.tcl and is located in
the lib subdirectory of the SOLAR installation. This defines all
the fundamental procedures used by SOLAR. User-defined procedures
having the same name as built-in procedures will supersede them.
Shortcuts: newt - newtcl
Purpose: Normal distribution functions
Usage: normal -i[nverse] <p>
Notes: Currently, the only supported function is the "inverse normal
cumulative distribution function", which maps the open range
0,1 to the whole real line. (The values for 0 and 1 are
out of range because they would be negative and positive
infinity.)
This normal function is used by the inormal procedure to
perform an inverse normal transformation on a dataset.
For further information, see the help for "inormal".
In turn, the inormal procedure is part of the mechanism
behind the "inormal_" prefix which may be applied to
phenotypes in the define command.
We will add additional normal distribution functions here as
we need them.
Our implementation is indirectly based on:
Cody, W.J. (1993). "ALGORITHM 715: SPECFUN - A Portable FORTRAN
Package of Special Function Routines and Test Drivers,"
ACM Transactions on Mathematical Software, 19, 22-32.
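The open-interval restriction and the mapping itself can be sketched with Python's standard library (this uses statistics.NormalDist, not SOLAR's SPECFUN-derived code; the function name inverse_normal is illustrative):

```python
from statistics import NormalDist

def inverse_normal(p):
    """Inverse of the standard normal CDF.
    Defined only on the open interval (0, 1): the endpoints would
    map to negative and positive infinity, as noted above."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return NormalDist().inv_cdf(p)

# The median maps to 0; symmetric tail probabilities map to +/- the same z.
print(inverse_normal(0.5))     # 0.0
print(inverse_normal(0.975))   # about 1.96
```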
solar::zs
Purpose: The old zscore command to zscore current trait
Old Usage: zscore [-off] [-q]
zs [-off] ;# Perform zscore quietly
-off Turn off zscore
-q Perform zscore quietly
Notes: The "Mean" and "SD" values used by zscore are computed only
once, at the time the zscore command is given. Thus they do
not reflect later changes to the phenotypes file, or to the
sample, which might be restricted due to individuals missing
covariates added later. Generally, for this reason the
zscore command should be given after the covariates command
and immediately before a model maximizing command such as
polygenic.
Starting with SOLAR Version 4.0.9, the trait mean and SD
are computed from the actual sample that would be included
in an analysis (at the time the zscore command is given).
As described in the notes below, you can adjust the Mean
and SD by using "option zmean1" and "option zsd1" to set
the values actually used. These values are applied to
the trait values during maximization.
If the trait is changed without giving the "model new"
command, the new trait will be zscored automatically.
This feature is obsolescent. In a future update, zscore
will be turned off when the trait is changed.
An alternative to zscore is to define the trait as the
inverse normal transformation of a variable. See
"help inormal" and "help define" for further details.
zscore will also calculate a number of statistics
for the trait: mean, minimum, maximum, standard deviation,
skewness, and kurtosis. These will be written to the file
zscore.out in the current output directory. As of version
4.0.9, these statistics are no longer written to the terminal.
Instead, a single line is displayed with the trait name,
mean, and SD. Even that line is not shown if zscore is
invoked from a script or the zs abbreviation of the command
is used.
To calculate these statistics for any phenotypic variable without
zscoring and without necessarily making it the trait, use the
"stats" command instead.
A trait must already have been selected with the trait command
or loaded model. Also the phenotypes file must have been loaded.
When a maximization is performed, trait values are replaced with
their zscored values. The formula is:
zscored = (value - Mean) / SD
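The formula above is an ordinary z-score. A minimal sketch, using the sample (n-1) form of the standard deviation (the text does not specify which denominator SOLAR's Mean and SD computation uses):

```python
def zscore(values):
    """Standardize trait values with zscored = (value - Mean) / SD.
    Assumes the sample (n-1) standard deviation; SD must be non-zero."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / (n - 1)) ** 0.5
    return [(v - mean) / sd for v in values]
```

After the transformation, the values have mean 0 and standard deviation 1, which is the point of zscoring a trait before maximization.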
zscore is a model dependent option controlled by "option zscore".
It remains in effect until another model is loaded or the
"model new" command is given. When models maximized with zscore
are reloaded, zscore is again activated.
"option zscore" is set to 1 ("on") by this command, and the
related options zmean1 and zsd1 (mean and standard deviation
for the first trait) and zmean2 and zsd2 (mean and standard
deviation for the second trait) are set as required. You can
adjust these options directly to fine tune the mean and standard
deviation values used, but be sure that zscore is not set to 1
until the mean and (non-zero !) standard deviation values are
set for all traits in the model.
In a multivariate model, zscore will only be applied to the
first two traits.
Whenever zscore is activated or deactivated, parameters mean
and SD are reset to zero to force setting new boundaries and
starting point during the next maximization.
If a new phenotypes file is loaded, the zscore command should be
repeated to reflect the new file.
Purpose: Sets the Omega (Covariance) equation directly
Important: By default, SOLAR (starting with version 2.0.2)
sets up the omega automatically when you give the "trait"
command, and subsequently modifies it as required during
polygenic, multipoint, and other commands. It is only
necessary for the user to use the omega command for
special purpose advanced analyses.
Usage: omega = <expression> ; sets Omega equation
omega ; displays Omega equation
omega reset ; reset default Omega equation
Notes: The default omega for a SOLAR model with one linkage element is:
omega = pvar*(I*e2 + phi2*h2r + mibd1*h2q1)
Notice that each term inside the parentheses has a matrix and a parameter.
Some of the matrices are loaded from external files and some are
computed internally.
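For one pair of individuals, the default omega above reduces to simple arithmetic over the matrix entries and parameters for that pair. A hypothetical sketch with made-up parameter values (the function name omega_entry is illustrative, not SOLAR code):

```python
def omega_entry(pvar, same_person, phi2, mibd1, e2, h2r, h2q1):
    """Covariance for one pair of individuals under the default
    one-linkage-element omega:
        omega = pvar * (I*e2 + phi2*h2r + mibd1*h2q1)"""
    I = 1.0 if same_person else 0.0   # identity matrix entry
    return pvar * (I * e2 + phi2 * h2r + mibd1 * h2q1)

# Diagonal entry (an individual with itself, no inbreeding):
# I = 1, phi2 = 1, mibd1 = 1, so e2 + h2r + h2q1 scales pvar fully.
print(omega_entry(2.0, True, 1.0, 1.0, e2=0.5, h2r=0.3, h2q1=0.2))  # 2.0
```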
The built-in variables are:
pvar...........Phenotypic variance. This is the square of the SD
(trait standard deviation) parameter.
I.............Identity matrix, which equals 1 when the two
individuals of a pair are the same individual, 0 otherwise.
phi2..........Two times the kinship coefficient; for quantitative
models this is normally computed internally and
on-the-fly to reduce storage requirements.
Also found (identically) in the phi2.gz matrix file
created by "load pedigree." SOLAR uses the phi2.gz
file by default for discrete traits; use of the
external file can also be forced by giving the
"loadkin" command or a suitable "load matrix"
command such as "load matrix phi2.gz phi2 delta7".
delta7........Dominance coefficient, equivalent to Jacquard's
delta7 from the series delta1-delta7 when there
is no inbreeding. If there is inbreeding, this
should not be used. As with phi2, this is, by
default, computed internally and on-the-fly for
quantitative models unless "loadkin" or a
comparable "load matrix" command is given. The
delta7 matrix is the 2nd matrix within the phi2.gz
file. This matrix should only be loaded if it is
needed because it is usually not used by SOLAR.
male_i........1 if individual "i" is male, 0 otherwise.
male_j........1 if individual "j" is male, 0 otherwise.
female_i......1 if individual "i" is female, 0 otherwise.
female_j......1 if individual "j" is female, 0 otherwise.
si............index of trait i (1..ntraits)
sj............index of trait j (1..ntraits)
For <phenotype> which is the name of a phenotypic variable:
x_<phenotype>.....sample mean for <phenotype>. For example, x_age.
min_<phenotype>...minimum sample value of <phenotype>.
max_<phenotype>...maximum sample value of <phenotype>.
<phenotype>_i.....value of <phenotype> for individual "i".
<phenotype>_j.....value of <phenotype> for individual "j".
For <parameter> which is the base name of a parameter:
<parameter>(ti)...replace "ti" by name of trait of individual "i".
For example, h2r(ti) may be h2r(weight) in
an analysis of traits height and weight.
<parameter>(tj)...replace "tj" by name of trait of individual "j".
teq...............1 if the traits of individuals i and j are the same.
tne...............1 if the traits of individuals i and j are different.
Matrices may also be used, along with math operators
+ - * / and also ^ (power) and () (parentheses), and also
all math functions defined by the C Programming Language
which includes "log" for natural logarithm, trig functions,
and hyperbolic functions, among others. Here is a list:
erfc, erf, lgamma, gamma, j1, j0, y1, y0, rint, floor, ceil,
tanh, cosh, sinh, atan, acos, asin, tan, cos, sin, expm1, exp,
logb, log1p, log10, log, cbrt, sqrt, and abs.
Parameter names with special characters should be enquoted using
angle brackets so the entire name is enquoted, including any
prefix string. For example, given a variable named age.onset with
dot, the mean value of the variable could be indicated with
<x_age.onset> in angle brackets as shown. This is the same as
the rule used by the define and constraint commands.
Beginning with version 3.0.4, the following equalities and
inequalities may also be used between two terms. If the
operator is true, 1 is returned, otherwise 0 is returned.
This enables you to construct compound conditional expressions
having the same effect as could have been done with "if"
statements. The C operators < and > have been replaced with
<< and >> so as not to be confused with the <> quotation of
variable names in SOLAR.
C Format Fortran Format Test
-------- -------------- ----
== .eq. if equal
!= .ne. if not equal
>= .ge. if greater than or equal
<= .le. if less than or equal
>> .gt. if greater than
<< .lt. if less than
Example of use of inequalities:
omega = pvar * (I*e2 + (h2r >= 0.125)*phi2*h2rc + \
(h2r << 0.125)*phi2*h2rd)
Beware that comparing the equality or inequality of two floating
point numbers sometimes does not work as expected due to
numerical representation limitations. For example, 1/3 might
not equal 2/6.
The precedence of the equality and inequality operators is below
that of all other operations, so their expressions should be
in parentheses as in the example above.
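Because each comparison evaluates to 1 or 0, it acts as a switch between alternative terms. A sketch of the example above in Python, where booleans coerce to 1/0 in arithmetic the same way (parameter values are made up):

```python
def conditional_omega_term(h2r, phi2, h2rc, h2rd, threshold=0.125):
    """Switch between two heritability parameters depending on h2r,
    mirroring: (h2r >= t)*phi2*h2rc + (h2r << t)*phi2*h2rd"""
    # True/False multiply as 1/0, so exactly one branch contributes.
    return (h2r >= threshold) * phi2 * h2rc + (h2r < threshold) * phi2 * h2rd

print(conditional_omega_term(0.2, 0.5, 0.6, 0.4))  # h2rc branch selected
print(conditional_omega_term(0.1, 0.5, 0.6, 0.4))  # h2rd branch selected
```

The same floating-point caveat noted above applies here: comparisons between computed values may not behave as the exact real-number arithmetic would suggest.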
There is also a function named "print" which simply prints
the value of its argument, which may be any expression, and
then returns that value. For example, in place of the standard
univariate omega
omega = pvar*(phi2*h2r + I*e2)
You could have:
omega = pvar*(print(phi2) + I*e2)
and this would print each phi2 value as it is used. An
expression may include any number of print functions, and
they are evaluated in the standard order of evaluation,
starting with the innermost subexpression. If you simply
want to print some value without including it in the rest of
the expression, you can multiply the print function by zero,
for example:
omega = pvar*(phi2*h2r + I*e2 + 0*print(delta7))
At this time, the print function can only print one number,
without any identifying string. After each value is printed,
you must press RETURN to proceed to the next, or you can hold
down RETURN to pass through a lot of prints.
For multivariate models which have 3 or more traits, the automatically
created standard omega includes generic rho parameters rhoe_ij, rhog_ij,
and rhoc_ij (if there are household effects), and rhoq1_ij (to rhoq10_ij) for linkage
models. When the omega is evaluated, the i and j are replaced with the
trait indexes. For example, with traits 1 and 2 rhoe_ij becomes rhoe_12.
It is possible to write omegas without these generic rhos if desired.
There are also 4 additional generic rho's available for custom usage:
rhoa_ij, rhob_ij, rhod_ij, and rhof_ij.
Shortcuts: ome - omega
Purpose: Set or read the value of model-specific options.
Usage: option <option name> <value> ; sets option value
option <option name> ; reads option value
option ; shows all option values
Notes: ibd-specific options are set by ibdoption.
Most options control fairly obscure aspects of SOLAR operation
and are not normally changed directly by SOLAR users. Many are
automatically controlled by other SOLAR commands during
normal operation.
Model-specific options are stored in saved model files.
Starting a new model (with the "model new" command) will
reset all options to the default value. Loading a model
will reset all options to the default, then load the options
in the model file. It is recommended to specify options
after specifying the trait and covariates, but before
giving the "polygenic" or other model maximizing command.
Warning: RARELY USED options may have become buggy in the
context of more current features since they haven't been recently
tested.
Here is a list of options and their default values:
CMDiagonal 0 Covariance matrices diagonal (1=yes; 0=no; automatically
set to 1 for sporadic models, 0 for all others)
StandErr 1 Compute Standard Errors (1=yes; 0=no; defaults to 1
except during a multipoint scan where it is set to
0 while scanning to improve speed)
StandLogLike 0 Standardize Loglikelihood (0=no; 1=yes) RARELY USED.
AutoCovarBound 1.25 Factor used in estimating covariate boundaries:
AutoCovarBound*(tmax-tmin)/(cmax-cmin)
This is a fairly wide but usable estimate
which has never needed adjustment.
Grid 0 Method used in likelihood estimation 0=Search; 1=Grid
RARELY USED, and not to be confused with grid command
GridPoints 1 Points to be used if Grid option (above) is applied
MaxIter 1000 Maximum Iterations allowed in a loglikelihood search
(If you need more than this, something is probably
wrong. Usually only 10-20 iterations of searching
is sufficient. MaxIter is to prevent SOLAR from
iterating forever in troublesome cases.)
Outlier 0 0=keep; 1=remove outliers (It's probably better to
to remove them yourself than rely on this RARELY USED
option.)
CutPeople 0.05 Factor used to remove outlying people
CutPed 1.0 Factor used to remove outlying pedigrees
TDist 0 Automatically set by the "tdist" command. Don't set
this option yourself unless you are an expert. Use
the tdist command instead, which sets this option
and sets up the required parameter for you.
Conv 1.0E-6 Convergence improvement factor for quantitative
models (For experts only! See Chapter 6.)
Conv(Discrete) 1.0e-4 Convergence improvement factor for discrete models
NConv 4 Convergence count (For experts only!) Conv has to
be satisfied this many times.
Tol 1.0E-8 Tolerance for parameters (for experts only!)
MaxStep 5 Maximum steps (for experts only!) This many
decrements are allowed in the attempt to improve
loglikelihood. This may need to be increased in
troublesome cases
BCliffs 0.1 Backup from NaN cliffs by this factor (for experts
only!) currently used only for discrete models
MaxCliffs 15 Maximum steps to backup from NaN cliffs (for
experts only) currently used only for discrete
ScoreOnlyIndex -1 Automatically set by "multipoint -score"; otherwise
don't touch
MergeHousePeds 1 Merge pedigrees sharing households (1=yes; 0=no)
Necessary for accurate C2 estimation; 1 is default.
MergeAllPeds 0 Merge ALL pedigrees (Earlier merging method; use
only if MergeHousePeds fails inexplicably)
RobustEst 0 Robust Estimation (new and barely tested)
Tune 3 Factor used with robust estimation
PedSelect 0 Select only this pedigree for maximization. The
zero default means "all pedigrees." Otherwise
use integer to select pedigree as indexed in
pedindex.out created by "load pedigree."
Alternatively use commands "pedlike" and
"pedlod" to get pedigree specific likelihoods
and lods using the same parameter values. It
is also possible to select multiple pedigrees
by using + operator: option pedselect 1;
option pedselect + 2; and so on. + is optional
before the first selection. The list of selected
pedigrees is saved to model file, and can be
cleared out either with "option pedselect 0" or
"model new."
EnableDiscrete 1 Use SOLAR "Discrete Trait" modeling if trait is
found to be discrete (2 integer values separated
by one, e.g. 0,1). (0) means use quantitative
modeling regardless of inferred trait type.
DiscreteOrder 1 Ordering for discrete pedigrees. You are strongly
discouraged from changing this by Dr. Jeff Williams
who has done considerable study of discrete trait
modeling. The default ordering (1) puts affecteds
first if prevalence < 0.5, else unaffecteds first.
No ordering is done for (0). (-1) reverses the
standard ordering. (2) does per-pedigree ordering.
(-2) does per-pedigree reverse ordering. Per-pedigree
ordering is not available for multiple traits. Set
DiscreteOrder to 3 to create an output file named
"discrete.out" in output directory containing up
to 7 variables of the ordered data.
DiscreteMethod 1 Version of discrete code used. The default (1)
seems the most robust, and use of the alternate
method (2) is discouraged.
UnbalancedTraits 1 Default is to use "unbalanced traits" in bivariate
models. Individuals will be included in the
analysis if they have either trait; they do not have
to have both. Individuals are converted to
individual-traits. (0) Excludes individuals
missing either trait. (-1) Also exclude individuals
missing either trait, use "bivariate" feature
built-in to Search.f.
EnforceBounds 1 Numerical errors during the quadratic-solving phase
of maximization can result in overshooting parameter
boundaries by a tiny amount. The default enforces
boundaries whenever a new "point" of parameter values
is computed. This was essential for bivariate models
to prevent attempted square roots of tiny negative
numbers. (0) turns this feature off. This is
not effective during some phases of maximization,
such as estimation of standard errors, where the
point is not computed in Search.f, which is why
the AbsVarianceParms option was added.
EnforceConstraints 0 Experimental. (1) turns on attempted enforcement of
constraints when the derivatives of the tableau
become too large to maintain constraint numerical
accuracy better than 1e-4. (Otherwise, convergence
may fail with the "Numerical constraint failure"
note.) Unfortunately, when this is done, models are
usually so out of whack for some reason or other that
convergence will ultimately fail by exceeding the
maximum iteration limit anyway. (-1) turns off the
constraint numerical accuracy test at the end of
maximization. The default (0) tests constraint
accuracy at the end of maximization only.
CorrectDeltas 0 Experimental. 1 turns on attempted correction of
deltas for numerical errors during the quadratic
problem solving phase of maximization. The default
(0) leaves this turned off because the default
EnforceBounds option accomplishes the intended
result more efficiently in most cases.
AbsVarianceParms 1 The default (1) forces the abs() function to be
applied to all known parameters used in the omega in
bivariate maximization. This prevents tiny negative
values from causing NaN's to arise from the
application of square root. (-1) forces the abs()
function to be applied in univariate maximization as
well. That might cause trouble with some discrete
models. (0) forces the use of the actual parameter
values in the omega, negative or not.
BounDiff 0 This option controls the method used to compute
"forward" and "central" differences in likelihood
(derivatives) when a parameter is at a boundary.
(Currently, this option is applied only in discrete
trait modeling where it has been shown to be
necessary to ensure complete maximization.)
The default (0) prevents the parameter from going
beyond the boundary if the boundary is -1 or 0 for
lower bounds and 0 or 1 for upper bounds since
these are the typical "hard" boundaries for variance
components, correlations, roots, and logs. Otherwise,
parameters are allowed to go beyond boundaries
during derivative calculation by very tiny amounts.
(1) always enforces the boundaries.
(-1) never enforces the boundaries, allowing
tiny excursions beyond them in all cases (which,
if possible, might give the best results).
(2) applies the historic rule: only upper boundaries
are (always) enforced, but only for forward
differences, and the simple method of substituting
the negated backward difference is used.
For options (0) and (1) a more sophisticated
algorithm is used when otherwise the boundary
would be crossed illegally. An inner difference
is taken AND adjusted for the slope of the next
nearest change in differences. In other words,
the second derivative is used to compute an
expected first derivative.
PedLike 0 Intended for use by pedlike and pedlod commands only.
Produces files "pedexclude.dat" and "pedlike.dat"
during maximization.
SampleSameTrustMe This option declares to SOLAR that the sample for
the previous model is identical to the current model.
Therefore, the same EVD matrices may be used, and
the default checking that is done to determine if the
sample is the same is bypassed. If this option is
specified before EVD matrices have been created for
any model, the required matrices are created, but the
storage of additional data to determine when the
sample changes is bypassed. Thus maximization speed
of all EVD models is increased, but only slightly,
about 1%. The downside is that if the user has
specified this option in error, the results will be
wrong and SOLAR might even crash. Hence the suffix
"TrustMe". THIS IS A SPECIAL NO-WRITE OPTION which is
not written to model files, because when the model
is reloaded it may not be applicable at the time it
is reloaded. It should be used only in scripts
where the null and test models are absolutely certain
to have the same sample.
evdphase 0 If option modeltype is evd, this option when set to 1
will enable the new experimental EVD2. EVD2 is
currently under development and does not actually
complete maximization in the current SOLAR version.
dontallowsamplechange 0 If option modeltype is evd, and this option is
set to 1, model maximization will terminate
prematurely with an error message if the
sample changes from the previous evd
model. If modeltype is evd, and this option
is defaulted at 0, the evd code will simply
create new matrices as required. If the
option samplesametrustme is 1, this option
has no effect unless the size of a pedigree
changes. If the option modeltype is not
evd, this option has no effect.
singulartrait 0 If set to 0, a trait having only one non-blank value
in the entire phenotypes file is an error which will
halt maximization, because most of the time when this
happens it is because there is a user mistake which
needs to be corrected. If set to 1, such a trait
will be considered the "unaffected" value of a
discrete trait and discrete trait maximization will
be performed. If set to 2, the trait will be
considered a quantitative trait and quantitative
trait maximization will be performed, but then you
must also preset fake upper and lower bounds for
the mean and sd parameters because the normal
algorithm for guessing them doesn't work in this
case. Use the "parameter" command to preset these
boundaries. The latter two options are generally only
useful in the presence of covariate variation.
ParameterFormat 16 Number of significant digits used for writing
parameter values to model files and queries.
Prior to Version 6.3.6, the default was 10.
16 is a compromise value that displays almost
all the precision in a double precision floating
point number, while avoiding representational issues
that cause a string of 9's to appear at the end of
a number. The largest useful value is 17, which
always shows all information available, but sometimes
at the expense of being very ugly.
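The 16-versus-17 digit tradeoff is easy to see in any language with IEEE double precision; a short Python illustration:

```python
x = 0.1
# 16 significant digits usually print the "clean" decimal form...
print(f"{x:.16g}")   # 0.1
# ...while 17 digits expose the binary representation's trailing
# digits, but are guaranteed to round-trip the double exactly.
print(f"{x:.17g}")   # 0.10000000000000001
print(float(f"{x:.17g}") == x)   # True
```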
IMPORTANT NOTE: the zscore options below pertain to the now obsolescent
zscore command implementation. Now it is preferred to use the zscore_
prefix operator with the define command.
zscore 0 The default mode has zscore deactivated. Activation
by setting this to a non-zero number should be done
by the zscore command and only when the additionally
required z options have been set.
zmean1 0 When zscore is active, zmean1 should be set to the
expected mean value for trait 1.
zsd1 0 When zscore is active, zsd1 should be set to the
expected standard deviation for trait 1. This option
must not be zero when zscore is non-zero.
zmean2 0 When zscore is active, zmean2 should be set to the
expected mean value for trait 2.
zsd2 0 When zscore is active, zsd2 should be set to the
expected standard deviation value for trait 2. This
option must not be zero when zscore is non-zero.
Shortcuts: opt - options
Purpose: Set maximization output directory (overriding default)
Usage: outdir <dirname>
outdir ; shows current outdir
outdir -default ; restore default: (trait name)
Notes: By default, solar models and related output are written to
the maximization output directory. By default, that directory
is named after the trait.* For bivariate models, the trait
names are separated by a period (".").
The default output directory can be overridden by this command.
Once set, it stays overridden until the "outdir -default" command
is given, or a new SOLAR session is started.
(*The directory will be named after the trait as entered in the
trait command, rather than as it exists in the phenotypes file.
For example, it will be named 'foo' if the command 'trait foo'
has been given, even if the variable is actually named FOO.)
To prepend the name of the maximization output directory to
any filename, use the "full_filename" command.
Shortcuts: outd - outdir
Purpose: Create, modify, or list parameter(s)
Usage: parameter <name> ; display or create a new parameter
parameter <name> start <value> lower <value> upper <value>
; set parameter start and
; boundaries
parameter <name> = ; return current parameter value
parameter <name> = <value> ; set current (start) value
parameter <name> start <value> ; set current (start) value
parameter <name> se ; display last computed std error
parameter <name> ; display all parameter information
parameter delete <name> ; delete parameter
parameter ; display all parameters
parameter -return ; return parameter info in a list
Notes: (1) The most commonly required standard parameters are created
automatically when you give the "covariate" and "polygenic"
commands. Their starting points and boundaries are also
set automatically either then or at the start of maximization
by fairly reliable heuristics. Boundaries are set
automatically only if both boundaries are set to zero at the
beginning of maximization, so if you preset one boundary,
be sure to set the other. The "standard" parameters include:
mean (for trait)
SD (standard deviation)
e2
h2r
h2q1
c2
b* (covariate beta)
(2) "start" and "=" are identical operators. "=" is simply a
more convenient and mnemonic name in most cases. Once
a maximization has been performed, the "start" or "="
value will actually be the maximum likelihood estimate.
It will then be the starting value for the NEXT maximization.
Note that when you are setting the starting value for a
parameter, you must surround the "=" used for assignment
with spaces. For example:
parameter h2r = 0.1
If you did not surround the = with spaces, it would appear
that you were simply trying to create a new parameter named
h2r=0.1. To prevent this kind of mistake, such names are
not allowed. See note 4 below.
(3) When a parameter is deleted, any constraint it appears in is
deleted also. This behavior is obsolescent. In the future,
a single TERM may be deleted from the constraint instead.
In the meantime, it is recommended to edit constraints to
remove parameters slated for deletion, THEN delete the
parameter.
(4) When naming parameters, you are advised to stick to the
usual alphabetic, numeral, and underscore characters unless
there is a good reason not to.
However, other special characters may actually be allowed in
order to accommodate all automatically created parameter
names. The use of some of these characters will
force the requirement that these parameters be quoted in
<> when used in constraint, omega, and mu commands so as
not to imply math operations.
Shortcuts: par - parameters
Purpose: Convert Pedsys format file to comma delimited format
Usage: ped2csv <pedfilename> [<outfilename>]
If <outfilename> is not specified, the output filename is <pedfilename>.csv
Notes: BLANK fields are removed. Duplicate field names will cause an
error.
This command uses the "selectrecords" command, which makes it
very easy: "selectrecords <pedfilename> <outfilename>". Since
no condition is specified, all records are selected, and since
selectrecords uses the tablefile command, it can read pedsys files.
Purpose: Process the pedigree data.
Usage: load pedigree <filename> [-founders] ; loads pedigree file
pedigree show [all | <ped#>] ; displays pedigree data
pedigree classes [-full [-nowarn] [-phi2]] [-model [-meanf]]
; displays relative-class counts
When a pedigree file is loaded, each individual in the file is
assigned a unique integer identifier for internal use by SOLAR.
The mapping from permanent IDs to integer IDs is stored in the
file 'pedindex.out'. Therefore, loading a pedigree data file
named 'pedindex.out' is not allowed since that would result in
the pedigree file being overwritten. Attempting to re-load
a previously created 'pedindex.out' under a different name will
not work either; see Section 8.2.1 in the Manual for discussion.
If the pedigree file contains founders only, i.e. a set of
unrelated individuals with no parental data, parent ID fields
are not required. In this case, the '-founders' option must
be included in the load command. If this option is specified
but the pedigree file does contain parent ID fields, those
fields will be ignored.
If the keyword 'all' is given in the 'pedigree show' command,
detailed info is displayed for all pedigrees. If a pedigree
number is specified, detailed info is displayed for that
pedigree only. If no argument is given, the show command
displays summary information.
The 'pedigree classes' command displays a tally of the relative
classes present in the pedigree data. By default, the counts
for relationships of 3rd degree and higher, as well as some
1st and 2nd degree relationships, are combined. If the '-full'
option is included, then the counts for all relative classes
are given separately. In this case, a warning message will be
displayed if any of the relative classes cannot be handled by
SOLAR's native method for computing multipoint IBDs. The
'-nowarn' option will turn the warning off. If the '-phi2'
option is included with the '-full' option, an additional
column will be displayed which contains the kinship coefficient,
multiplied by 2, for each relative class.
If the '-model' option is included in the 'pedigree classes'
command, the relative class tallies will include only those
pairs of individuals who both enter the polygenic analysis
specified in the null0 model for the current trait. The
'-full' and '-phi2' options work as described above, but the
'-nowarn' option is superfluous since the warning message
described above is never displayed. When the '-meanf' option
is included with the '-full' option, the "Mean F" statistic
is calculated and displayed.
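For non-inbred pedigrees, the doubled kinship coefficients reported by '-phi2' take familiar values; a quick reference sketch (the class names here are illustrative, not SOLAR's exact class labels):

```python
# 2 * kinship coefficient (phi2) for common non-inbred relative classes
from fractions import Fraction

PHI2 = {
    "self":                   Fraction(1, 1),   # kinship 1/2
    "parent-offspring":       Fraction(1, 2),   # kinship 1/4
    "full siblings":          Fraction(1, 2),
    "half siblings":          Fraction(1, 4),
    "grandparent-grandchild": Fraction(1, 4),
    "avuncular":              Fraction(1, 4),
    "first cousins":          Fraction(1, 8),
    "unrelated":              Fraction(0, 1),
}

for cls, phi2 in PHI2.items():
    print(f"{cls:24s} phi2 = {float(phi2)}")
```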
The state of the currently loaded pedigree data is stored in
the file 'pedigree.info' in the current working directory.
This file persists between SOLAR runs, which means that the
pedigree data which is loaded at the end of a session will
still be loaded the next time SOLAR is invoked (from within
the same working directory).
Notes: The pedigree load command creates several files in the current
working directory:
pedindex.out maps each ego ID to a sequential ID (IBDID)
assigned by SOLAR
pedindex.cde PEDSYS code file for pedindex.out
phi2.gz gzipped file containing the kinship matrix
multiplied by 2
house.gz gzipped file containing the household matrix
The household matrix file will be created only if a household
ID field is present in the pedigree file.
The files listed above are specifically created for the
pedigree file being loaded. They will be deleted when a new
pedigree file is loaded. Hence, a different working directory
must be used for each data set.
For a description of the pedigree file, enter 'file-pedigree'
Shortcuts: ped - pedigrees
Purpose: Calculate pedigree-specific loglikelihoods
Usage: pedlike [-q] [<model>]
-q (quiet) Suppress output to terminal
Notes: Default model will be current model, if current model has
been maximized. If changes have been made to current model
since the last maximization, results may not be predictable.
If current model has not been maximized, default model is
the null0 model in current outdir.
Results are written to "pedlike.out" in the outdir
and also shown on terminal with some additional summary info.
The pedigree numbers used are based on SOLAR's pedigree
index "pedindex.out" created by the "load pedigree"
command. These do not necessarily correspond to user
pedigree numbers (and there is not necessarily even
a one-to-one correspondence). Refer to pedindex.out
to associate your ID's with the pedigrees as numbered
by SOLAR. (Note: pedindex.out has a code file pedindex.cde
and is best read using PEDSYS.)
Shortcuts: pedli - pedlike
Purpose: Calculate pedigree-specific LOD scores
Usage: pedlod [<test-model> [<null-model>]]
Notes: If no model is specified, the model currently in memory
is used as the test-model (useful if you have just run
multipoint or twopoint), and its null-model (having
one less linkage element) is used as the null model.
If only one model is specified, the null model is taken
from the outdir after the specified model is loaded.
The pedigree numbers used are based on SOLAR's pedigree
index "pedindex.out" created by the "load pedigree"
command. These do not necessarily correspond to user
pedigree numbers (and there is not necessarily even
a one-to-one correspondence). Refer to pedindex.out
to associate your ID's with the pedigrees as numbered
by SOLAR. (Note: pedindex.out has a code file pedindex.cde
and is best read using PEDSYS, but may also be read fairly
well as a text file if PEDSYS is not available.)
Note that the LOD score calculation may be affected by the
number of traits and the lodp options. See the documentation
for the "lodp" command for further details. When applicable,
SOLAR converts 2df bivariate LODs to "1df effective" LODs.
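For reference, a LOD score is the base-10 logarithm of the likelihood ratio, so each per-pedigree LOD is the difference in natural loglikelihoods divided by ln(10), and the per-pedigree LODs sum to the overall LOD. A minimal sketch (the loglikelihood values are made up):

```python
import math

def lod(loglike_test, loglike_null):
    """LOD = log10 likelihood ratio, computed from natural loglikelihoods."""
    return (loglike_test - loglike_null) / math.log(10)

# Hypothetical per-pedigree natural loglikelihoods (test, null)
pedigrees = [(-100.0, -102.3), (-54.1, -54.1), (-210.7, -213.0)]

per_ped = [lod(t, n) for t, n in pedigrees]
total = sum(per_ped)   # pedigree LODs sum to the overall LOD
print([round(x, 3) for x in per_ped], round(total, 3))
```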
Shortcuts: pedlo - pedlod
Purpose: Perturb starting values for E2, H2r, and H2q's at bounds
Usage: perturb
Notes: perturb is specially tailored to the standard parameterization
of e2, h2r, h2q1, etc. perturb silently does nothing if
parameters e2 and h2r are not present.
It is no longer necessary or possible to specify h2qindex as an
unqualified argument as in earlier versions. If an unqualified
argument is specified, it is ignored.
This is used automatically by the 'linkmod' script, and therefore
also by multipoint and twopoint.
perdelta is the quantity used in the adjustment (this may be set
with the perdelta command). It defaults to 0.001.
Accumulated deltas are distributed only to other parameters whose
values are 3*perdelta away from the relevant bound, and then only
in the perdelta quantity.
This does not handle conditions where parameters exceed boundaries
by more than a very small amount. (Of course, they shouldn't
exceed the boundaries at all, but sometimes they do by very small
amounts. Recent changes to the maximization routines ought to
eliminate that.)
Shortcuts: perturb - perturb
Purpose: Load the phenotypes file or display its variables
Usage: load phenotypes [<filename>]+ ;# sets phenotype file
phenotypes ;# returns phenotype names
;# (and filenames)
phenotypes -files ;# returns filenames used
;# (useful in scripts)
Notes: (1) Pedigree data should be loaded beforehand with the
"load pedigree" command. You may have pedigree and
phenotypes data in the same file if it has all the
required fields, but you will still have to load it
as phenotypes after loading it as the pedigree.
(2) The phenotypes file may be in Comma Delimited or PEDSYS format.
There must be an ID field (and FAMID field, if ID's are not
unique in all the data), and then there may be any number of
phenotypes. Sex is determined by the SEX field in the
pedigree file and a SEX field in the phenotypes file is
ignored. If FAMID is present in both pedigree and
phenotypes files, it is assumed to be necessary to make ID's
unique, and uniqueness is not tested. If FAMID is not present
in both files, uniqueness of ID's is tested during loading
(since this is an often overlooked user error). The fieldname
EGO may be used in place of ID, and the "field" command may be
used to specify a different field name for ID. For more discussion
about the phenotypes file format, see file-phenotypes.
(3) Once a phenotypes file is loaded in a particular working
directory, it remains loaded until another phenotypes file is
loaded, even if SOLAR is restarted there at a later time.
The current phenotypes state is kept in a file named
phenotypes.info, which points to the current phenotypes file(s)
by name. When SOLAR starts, it checks this file and
gets the header from the phenotypes file(s) so that the
available phenotypes are known.
(4) During maximization, the pedigree data and phenotypes file
are joined, so it is possible some errors will not be apparent
until that time.
(5) Individuals missing phenotypic data are removed from the
maximization sample, and need not be included in the
phenotypes file, however they should be included in the
pedigree file as they may contribute to the pedigree
structure and genetic coefficients of those individuals
who are included.
(6) Families in which no non-probands are present are removed
from the maximization sample. Proband status is controlled
by a PROBND field in the phenotypes file. To switch
proband detection off, you may rename that field, or
use the command "field proband -none".
Shortcuts: phen - phenotypes
Purpose: Plot multipoint LOD scores, empirical LOD adjustments, or power
Usage: plot [<chromnum>] [-pass <passnum>] [-write]
[-color <colornum>] [-overlay]
[-title <title>] [-subtitle <subtitle>]
[-yscale <maxy>] [-map <user_map>] [-lodadj]
[-min x] [-max x] [-nomark] [-nomarklab]
[-all | -allpass [-nodisplay] [-nomini]]
[-string [-allpass] [-lod <lod>] [-lodmark] [-lodscale]
[-color <colorname>] [-noconv] [-date] [-name <name>]
[-font <X-font-spec>] [-titlefont <X-font-spec]
[-layers {{<layername> [-color <colorname>]} ... }
[-replay {{<layername> [-color <colorname>]} ... }
[-title <title>] [-dash 1/2] [-linestyle 1/2]
[-liability [-model <name>]]
[-power [-title <plot_title>]]
plot -purge
plot -close
Examples: plot plot chromosome with highest LOD in pass 1
plot 9 plot chromosome 9 in pass 1
plot 9 -pass 2 plot chromosome 9 in pass 2
plot -all plot all chromosomes in pass 1
plot -all -pass 2 plot all chromosomes in pass 2
plot -allpass plot all chromosomes in all passes
plot -string plot all chromosomes in pass 1 using
"string" plot format
plot -string -allpass plot all chromosomes in all passes using
"string" plot format
If postscript output files are saved, they are written to the current
trait or outdir directory with names like these:
chr01.ps chromosome 1 (pass 1)
chr01.pass02.ps chromosome 1 (pass 2)
pass01.ps Miniplot of chromosomes in pass 1 (plot -all -pass 1)
pass01.str.ps String plot of pass 1
chromnum [1-29] Set chromosome number for plotting. The default
is to plot chromosome with highest LOD score.
-pass Set multipoint pass number for plotting. "1" would
mean the first pass in which all models have one
QTL. 1 is the default.
-close Close the XMGR plot window. The miniplot and string plot
display windows must be closed with their close buttons,
but it is better if you close XMGR from the SOLAR
command line. Otherwise, on your next plot, there will
be a delay until SOLAR determines that it cannot
communicate with the old XMGR session. Then, it will
time-out and tell you to use the "tclgr close" command,
which does the same thing as "plot -close".
-write Write postscript output file for plot(s). If there are
no other arguments and if a plot was done previously,
the output file for the previous plot is written.
Miniplot and stringplot files are always written
by default. For plots drawn using XMGR, you can
also choose to write the postscript file from the
XMGR graphical interface, which gives you more options.
See note 8 below.
-nomark Do not show ticks or labels for markers. (This works
for both regular and -string plots.) Unless this
option is selected, there must be a mibddir
selection in the current directory so that SOLAR
can find the map files.
-nomarklab Do not show labels for markers (still show ticks).
-title Set plot title. Title may be blanked with
-title "" or -title " ". This is supported
by regular plots, string plots, and power
plots only. Plots made through XMGR may also
have title set through graphical interface or
by editing .gr file such as multipoint.gr.
-subtitle Set plot subtitle. Supported by regular
multipoint plots only. Subtitle may be blanked
with -subtitle "" or -subtitle " ".
-color Use this color for curve (overrides multipoint.gr default)
For regular plots, this must be integer from 1-15;
colors are defined by XMGR:
0:White 1:Black 2:Red 3:Green 4:Blue 5:Yellow
6:Brown 7:Gray 8:Violet 9:Cyan 10:Magenta 11:Orange
12:Indigo 13:Maroon 14:Turquoise 15:Green4
For string plots, the X11 color names are used. Typical
color names are:
black white blue red green grey orange purple brown violet
magenta yellow cyan
Many mixtures and shades are also available. Find the
rgb.txt file in your X11 installation for a complete list.
-overlay Plot this curve on top of the current graph, which may
already include more than one curve. (Each curve
corresponds to a distinct XMGR set, of which 30 are
available in the custom version of XMGR used by SOLAR.
To control order of sets in Legend, use the -set
option for every plot.)
-purge Delete all previously created plotfiles (not valid with
other options; only valid for multipoint plots).
-string Plot all chromosomes (in pass 1 unless otherwise
specified) using "string plot" format. (An
alternative page of plots in xmgr format can be
produced with the plot -all command.)
-name Name this plot for later use (-string plots only).
-layers <layerlist> Add one or more previous plots to this plot.
This is either a simple list of previous names, or a
nested list of names with other options, in either case
each element of <layerlist> specifies a single layer.
See extended example below under replay.
(-string plots only).
-replay <layerlist> Draw previous plots only, otherwise this is
the same as -layers. (-string plots only) Example:
trait q1
plot -string -name A1
trait q2
plot -string -name A2 -layers {{A1 -color green}}
trait q3
plot -string -name A3 -layers {{A2 -color blue} {A1 -color green}}
plot -string -replay {{A3 -color grey} {A2 -color blue} {A1 -color red}}
plot -string -replay {A3 A2 A1} ;# just default colors
Note that the spaces between close and open braces, as
shown above, are required.
You can specify -color for the new top level plot and/or
for layers in the -layers or -replay list. Any unspecified
colors will default to a built-in set of defaults.
-lod lod Add horizontal scales above this lodscore (for string
plot only)
-noconv Do not mark convergence errors (string plot only)
-date Datestamp (string plot only)
-lodmark Put marker ticks ON TOP of LOD curve (default is to the
left of the plot axis) String plot only.
-lodscale Show the LOD scale when this LOD is exceeded (default
is for the scale only to appear for the highest LOD).
String plot only.
-font (String plot only!) Specify X font to use for title.
Use command "xlsfonts | more" to list X fonts.
Wildcards may be used; however, results are
sometimes unexpected. For example *bold-r* will
match first bold,roman font in list, whatever
that happens to be.
-dash 5/5 (String plot only!) Specify line style dot and
dash. Two or more integers are specified separated
by slash. The first and all odd position numbers
specify the length of drawn segments, whereas the
second and all even position numbers specify
the transparent segments. Results are approximate
depending on system used.
-linestyle 5/5 (String plot only!) Same as -dash (see above).
Note that for regular plots, linestyle can be
changed by editing the linestyle parameter in the
applicable .gr file such as multipoint.gr.
-titlefont (String plot only!) Same as -font, except applies
to title only. Supersedes -font for title only.
-all Plot all chromosomes in xmgr postscript format (in
the first pass only unless -allpass specified). A page
of miniature chromosome plots in postscript is
created (if a python interpreter is available). The
names of all postscript files are listed, and any
of them may be printed with the lp command. Nothing
is displayed on your desktop with this command. An
alternative genome plot is available with
"plot -string".
-allpass Plot all chromosomes in all passes, producing either
miniplots or "string" plots (if -string).
-nodisplay Used with -all or -allpass to skip displaying
miniplots on-screen (has no effect on xmgr graphs).
-nomini Used with -all or -allpass to skip making miniplots.
Automatically sets the "-write" option to write all
individual chromosome plots. Miniplots can always
be made later, and with more options, using the
separate "miniplot" command.
-yscale [NOTE: Ordinarily you do not need to use this.]
This sets the smallest LOD scaling if there is no
LOD above 4.99. Autoscaling will not apply for smaller
values to prevent confusion (e.g. seeing what looks
like a large peak but isn't because the y scale is
so small). The default value is 4.99. You can
set this to 0 if you really want to look at tiny LOD
curves. Larger scaling is applied automatically, as
is the adjustment to allow space for marker labels.
-map Use user_map file (in user map format). By default,
the map information processed by the 'load map' command
(and saved in the mibddir) is used to display marker
labels and locations. However, you can substitute a user
map file (set file-map) by using this argument. This
allows you to delete unimportant markers and add QTL's of
particular interest.
-min m location at which to start plotting
-max m location at which to end plotting
(min and/or max may be used to restrict interval.
These apply to ordinary chromosome plots only.)
-quick Save time when plotting by re-using marker ticks and
labels from previous plot. This option is used
automatically when you are plotting from the multipoint
command using the "-plot" argument.
-lodadj Plot empirical LOD adjustment scores. None of the above
arguments except -close and -color are applicable in
this case. The format file lodadj.gr is used instead
of multipoint.gr, but the rules are applied in the
same way (see notes below).
-liability Plot discrete trait liability function (a different
kind of plot, not related to above). "polygenic"
command must have been run first, and the following
covariates must have been included:
sex age age*sex age^2 age^2*sex
The xmgr parameter file for -liability is liability.gr
-model name Specify a different modelname for the -liability
option. There must be a companion maximization
output file (maximize -output name.out) so, for
example, there is name.mod and name.out. The
default is poly (poly.mod and poly.out are created
by the polygenic command).
-power Plot power versus QTL heritability. Only the -title
argument is applicable in this case. The format file
power.gr is used instead of multipoint.gr.
Notes:
1. The trait or outdir must have previously been specified so
the applicable multipoint file can be found.
2. Marker labels and ticks are taken from the mibdchrN.loc file
(where N is the chromosome number) created from the user's
map file during mibd file creation. These files should be
stored in the mibddir (and the mibddir should be specified
before running plot). If the mibdchrN.loc file(s) cannot
be found, marker ticks and labels will not be shown. In
SOLAR releases 1.1.2-1.2.0 the 'load map' command will create
a mibdchrN.loc file in the current directory.
There will be a tick for each marker, and up to 42 marker
labels will be displayed. If there are more than 42
markers, some labels will not be displayed. Labels are
prioritized based on LOD score and distance from nonzero
value. By plotting after the multipoint session has
completed, one gets the best benefit from this label
prioritization. Marker ticks are always drawn vertically;
an additional line (which might be diagonal) joins the
label to its tick.
You can eliminate the need for the map file by using the
-nomark option.
3. XMGR (ACE/gr) is used for most plotting, using tclgr command.
Each SOLAR process can have only one tclgr session open. You
can change the plot command used with the 'tclgr syscommand'
command (it must be XMGR pipe command compatible).
If SOLAR is exited before closing the plot session, the plot
session may remain active (however, it may print a warning
about not being able to access the named pipe). If the user
terminates the XMGR session through its graphical interface,
the command 'plot -close' must be given to reset it before
another plot command can be given.
4. The XMGR parameter setup file multipoint.gr is loaded.
First, the multipoint.gr from SOLAR_LIB is loaded, then the
multipoint.gr from ~/lib (if any), then the multipoint.gr
from the working directory (if any). You need only include
the parameters you want to change in your local copy.
5. When SOLAR exits, the XMGR session will be terminated. If
the automatic termination of XMGR should fail, the user
should terminate XMGR anyway to prevent it from hogging CPU.
(The custom XMGR in SOLAR_BIN prevents CPU hogging.)
6. NaN's are indicated by X's on the curve. Areas of the curve
in between multiple X's may be invalid. (NaN's are Not A Number,
which means maximization failed to arrive at a valid likelihood
estimate.)
7. There are two additional options, -set and -graph, whose usage
is discouraged except under exceptional circumstances. They
might force the set and graph numbers to specific values.
By default, the set number is 1 (changed in version 1.6.0)
except for overlays. Overlays use the first available set
number counting backwards from 29. The graph number (except
for overlays) is the same as the set number (overlays must use
the preceding graph number). Fooling with these can get you
into trouble, but under difficult circumstances they might
help.
8. Standard postscript landscape mode is used in all output files.
If you want to choose any other output features, such as
Encapsulated Postscript (EPS), portrait mode, etc., for
those plots made by XMGR, you can open the "Printer Setup"
dialog (under the "File" menu). There you can select
portrait output in a pulldown menu, check a "Generate EPS"
check box, etc. Then, to write the file, select the
"File" option in the "Print to:" pulldown, and then press
the "Print" button at the bottom of the dialog box. You
need not go to the separate "Print" option in the file menu,
and sometimes it seems to work better to print directly
from the Printer Setup dialog anyway. All postscript files
can be printed using "lp" command. Displaying postscript
or editing on screen depends on locally available software.
Shortcuts: plo - plotmulti
Purpose: Plot qtld (qtld.out)
Usage: plotqtld <type> [-nolabels] [-nomarkers] [-file filename] [-local]
<type> is one of: strat, mgeno, qtdt, qtld
-nolabels do not include "marker" labels (ticks only)
-nomarkers do not include marker ticks or labels
-file Use named file instead of qtldm.out in outdir
-local Ignore default plot parameters; use only local file
Notes: You must select the trait or outdir first.
The plot parameter file (in SOLAR_LIB) is qtld.gr. You
may override it with a copy in ~/lib or your working
directory. Your version need only include the parameters
you would like to change. This should work in most cases.
If you specify -local, however, the qtld.gr in SOLAR_LIB
is completely ignored, and your qtld.gr must be complete,
which might get around some very obscure conflict between
the two plot parameter files.
Purpose: Plot qtn marginal tests (qtnm.out)
Usage: plotqtn [-nolabels] [-nomarkers] [-file filename] [-local]
-nolabels do not include "marker" labels (ticks only)
-nomarkers do not include marker ticks or labels
-file Use named file instead of qtnm.out in outdir
-local Ignore default plot parameters; use only local file
Notes: You must select the trait or outdir first. See qtnm for
more information. It must be possible to find the qtnm.out
file in the outdir.
The plot parameter file (in SOLAR_LIB) is qtn.gr. You
may override it with a copy in ~/lib or your working
directory. Your version need only include the parameters
you would like to change. This should work in most cases.
If you specify -local, however, the qtn.gr in SOLAR_LIB
is completely ignored, and your qtn.gr must be complete,
which might get around some very obscure conflict between
the two plot parameter files.
plotqtn accepts either the original 4 or the new 5 column qtnm.out
files. The 5 column files begin with the snp name, which is not
necessarily the location.
Purpose: Set up polygenic model with class specific parameterization
Usage: polyclass [-g] [-intrait] [-incovar] [<class>[-<class>]]+
[-comb] [-maxi] [-rincovar] [-maxsnp <snp_name>]
sporclass [-g] [-intrait] [-incovar] [<class>[-<class>]]+
[-comb] [-maxi] [-rincovar] [-maxsnp <snp_name>]
-g Use global phenotypic values to set parameter adjustments
(otherwise, means are determined for each class)
-intrait inormalize trait values on a per-class basis
-resmax inormalize residual values in place of traits
-incovar (NOT WORKING IN version 7.1.2) inormalize covar values
on a per-class basis (only used for simple linear
covariates, no interactions or exponents)
-comb all classes combined model
-max after building the model, maximize it
-maxsnp Maximize and include snp_name as covariate
in the model and determine statistics for it: beta,
beta se, chi, p, and variance explained (varexp).
H2r's are reported for the models with and
without the snp.
Short Example:
trait q4
covariate age sex
polyclass 1-3 9
maximize -q
Notes: One phenotypes file must have a field named "class" which defines
the class value for each person in the sample.
Class specific parameters are given names with _c appended.
User covariates are transformed into class-specific mu addends.
All individuals in sample must have all variables specified as
covariates.
After choosing trait and covariates, do either sporclass or
polyclass. You cannot do a second polyclass on a sporclassed model
to make it polygenic.
Unbalanced covariates for multivariate traits are not supported.
This is different from ordinary covariate behavior for multivariate
traits--which permits covariates to be missing in the sample if they
are specific to a missing trait.
A defined pseudo-covariate named "blank_classes()" restricts the
sample to the union of all classes specified.
The maximized model is saved in the output directory as
polyclassmax.mod with output file polyclassmax.out. Note that if
-intrait option is selected, trait name and default output
directory will have leading i_ prefix (for the inormalization).
If the -resmax option is selected, the trait will be named
"residual" or "i_residual" if -intrait is also selected.
Purpose: Perform polygenic, sporadic, and/or household analysis
Calculate H2r, significance of H2r, and proportion of variance
contributed by covariates.
Optionally performs covariate screening (determine significance
level of each covariate).
Usage: polygenic [-screen] [-all] [-p | -prob <p>] [-fix <covar>]
[-testrhoe] [-testrhog] [-testrhoc]
[-sporadic] [-keephouse] [-testrhop] [-rhopse]
(screencov is an alias for 'polygenic -screen')
(sporadic is an alias for 'polygenic -sporadic')
Typically before giving this command, you will give trait,
covariate, and house (if applicable) commands. You will also load
pedigree and phenotypes files if they have not already been loaded.
solar> load pedigree ped
solar> load phenotypes phen
solar> trait hbp
solar> covariate age sex age*sex smoke
solar> polygenic -screen
Alternatively, you may use the "automodel" command first to
include all available phenotypes as covariates. See note 2
below and "help automodel".
-screen (or -s) Perform covariate screening:
Calculate significance level for each covariate, and run
only the significant covariates in the final analysis.
An inclusive significance threshold of 0.1 is used,
but may be changed with the -prob option. Covariates
may be locked in regardless of significance with the
-fix or -all options.
(An alternative method of covariate analysis using bayesian
model averaging is available with the command:
bayesavg -covariates)
-p (or -prob) p is the probability level for keeping
covariates as "significant." The default is 0.1.
It is set to be generous so that covariates are not
removed unnecessarily. (The probability levels for
H2r and C2 are fixed at 0.05; however, H2r is never
removed from the final model even if it is judged
not significant, and C2 is only removed from the
model if it is zero in the final model and therefore
has no effect at all.)
-fix (or -f) "fix" (lock in) this particular covariate
regardless of significance level. NOTE: a -fix or -f
qualifier is required for each covariate to be fixed,
for example: -f age -f sex
-all (or -a) Keep all covariates in final analysis regardless
of significance level.
-testrhoe (Bivariate only) Test significance of rhoe difference
from 0 by running model where rhoe is constrained to 0.
The p value is shown in the same line as the RhoE value.
-testrhog (Bivariate only) Test significance of rhog differences
from zero and from 1 (if positive) or -1 (if negative).
Because there may be two p values, they are shown
in line(s) below the RhoG result and standard error.
-testrhoc (Bivariate Household only) Test significance of rhoc
differences from zero and from 1 (if positive) or -1 (if
negative). Because there may be two p values, they are
shown in line(s) below the RhoC result and std. error.
-testrhop (Bivariate polygenic only) Test significance of derived
estimate of phenotypic correlation differences
(difference from 0).
-rhopse (-testrhop must be specified also) Get standard error
of rhop, saved in model file rhop.mod and variable
SOLAR_RhoP_SE
-sporadic Only evaluate sporadic models, not polygenic.
-keephouse Keep "household effect" C2 parameter in final model
even if it maximizes to zero in the best polygenic
(or sporadic) model.
Notes: (1) Output is written to directory selected by 'outdir' command,
or, if none is selected, to a directory named by the trait. This
is called the "maximization output directory." Polygenic results
are in file named polygenic.out. Important loglikelihoods and
statistical computations are recorded in polygenic.out.logs. If
the -sporadic option is selected, the files are sporadic.out and
sporadic.out.logs. For univariate models, the residuals are
computed and written to a file named polygenic.residuals (or
sporadic.residuals), then the statistics of those residuals
are written to a file named polygenic.residuals.stats (or
sporadic.residuals.stats). If the residual kurtosis is
above 0.8, you get a special warning (see note 5 below). You
also get a special warning if the trait standard deviation is
below 0.5, which is undesirable for numerical reasons.
(2) Prior to running polygenic, you should set up the trait and
covariates. You may use the trait and covariate commands, or
use the "automodel" command. "automodel" selects all variables
otherwise unaccounted for in the phenotypes file as candidate
covariates, and also sex and the standard interactions with
sex and age. (If you are unfamiliar with "automodel" it would
be a good idea to examine the covariates afterwards with the
covariates command...)
(3) If household effect (see "house") is in effect when the
polygenic command is given, it will be included in the analysis.
If the household parameter C2 is 0 in the household polygenic
model, it will be removed from the final model regardless of
whether "covariate screening" is performed, unless -keephouse
is specified. The p value for C2 will be computed (if C2 is
nonzero), but the p value will not cause C2 to be removed from
the final model. The p value of the C2 parameters is not
computed for bivariate models.
(4) If any covariates have been constrained by the user,
certain tests are not allowed: the determination of total
variance due to covariates, or the Kullback-Leibler R
squared (done for discrete traits). Also, such covariates
are not included in the "screening" if the screening option
is selected.
(5) If you get the message about Residual Kurtosis being too high
because it is above 0.8, there is danger of LOD scores being
estimated too high in a subsequent linkage analysis. You should
start over using either tdist or lodadj or inormal (see
documentation) to protect against this. If you are already
using tdist or lodadj, you may ignore this warning, but it would
be fair to report both the Residual Kurtosis and the method
you are using to deal with it. We most strongly recommend
inormal, which in conjunction with the define command creates
an inverse normalized transformation of your trait(s).
If there are no covariates, the Kurtosis is computed from the
trait itself, and no "residuals" are computed. The same warning
threshold applies. We define Kurtosis as 0 for a standard
normal distribution; 3 has already been subtracted from the
normalized 4th central moment.
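This definition of kurtosis (the excess kurtosis: the normalized 4th central
moment minus 3) can be sketched in Python. The residual values below are
hypothetical; SOLAR computes this internally from the actual residuals.

```python
def excess_kurtosis(values):
    """Kurtosis as defined here: normalized 4th central moment minus 3,
    so a standard normal distribution has kurtosis 0."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 * m2) - 3.0

# SOLAR warns when the residual kurtosis exceeds 0.8 (made-up residuals):
residuals = [0.1, -0.3, 0.2, 1.4, -0.5, 0.05, -0.9, 0.3]
if excess_kurtosis(residuals) > 0.8:
    print("warning: residual kurtosis above 0.8")
```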
(6) The polygenic command only supports our "standard"
parameterizations. If you would like to use the esd,gsd,qsd
parameterization, use the polygsd command (see "help polygsd"
for more information) instead.
(7) For bivariate polygenic models only, a derived estimate of
RhoP, the phenotypic correlation, is displayed on terminal
and written to polygenic.out. This estimate is computed from the
h2r's, rhog, and rhoe according to the following formula:
sqrt(h2r(ti))*sqrt(h2r(tj))*rhog +
sqrt(1-h2r(ti))*sqrt(1-h2r(tj))*rhoe
To determine the significance of RhoP, use the -testrhop option,
which compares a model with a free rhop parameter to one in which
rhop is constrained to zero. Additional models rhop.mod and rhop0.mod
are written to the output directory.
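The RhoP formula above can be checked with a short Python sketch. The input
estimates here are made up; SOLAR derives them from the maximized bivariate model.

```python
import math

def derived_rhop(h2r_i, h2r_j, rhog, rhoe):
    """Derived phenotypic correlation for a bivariate polygenic model,
    following the formula in note (7)."""
    genetic = math.sqrt(h2r_i) * math.sqrt(h2r_j) * rhog
    environmental = math.sqrt(1 - h2r_i) * math.sqrt(1 - h2r_j) * rhoe
    return genetic + environmental

# Hypothetical estimates for traits ti and tj:
print(derived_rhop(0.36, 0.25, 0.5, 0.2))
```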
(8) The polygenic command creates global variables which may
be accessed later (which is often useful in scripts). The
variables are:
SOLAR_Individuals number of individuals included in sample
SOLAR_H2r_P p value for h2r
SOLAR_Kurtosis residual trait kurtosis
SOLAR_Covlist_P list of p values for covariates
SOLAR_Covlist_Chi list of chi values for covariates
SOLAR_RhoP derived estimate of phenotypic correlation
for bivariate polygenic models, {} if
not calculated
SOLAR_RhoP_P -testrhop sets this to p value of rhop
being nonzero
SOLAR_RhoP_SE -rhopse sets this to se value of rhop
SOLAR_RhoP_OK -testrhop sets this if likelihood of rhop
parameterized model matches polygenic,
as it should
The covariate lists are created only if the -screen option
is used. All screened variables are included, regardless of
whether they were retained in the final model. Before you
can access any of these variables in a script, you must
use a "global" command. For example:
global SOLAR_Kurtosis
if {$SOLAR_Kurtosis > 4} {puts "Very bad kurtosis!"}
(9) The default is for the standard error option to be turned
on (and temporarily off, when desirable for certain tests).
However, if you turn the standard error option off before
starting polygenic, it will remain off.
Shortcuts: polyg - polygenic
Purpose: Set up polygenic model esd and gsd parameters (EXPERIMENTAL)
Usage: polygsd
Note: "model new" and "trait" commands should be given first.
After polygsd, you should use "maximize" command.
Use the gsd2h2r command to convert resulting esd,gsd parameters
to h2r value.
Use the linkqsd command to add in linkage element afterwards.
Example: model new
trait q4
covar age sex
polygsd
maximize
linkqsd gaw10mibd/mibd.9.18.gz ;# could maximize after this
chromosome 9 10
interval 5
mibddir gaw10mibd
multipoint -link linkqsd0 -cparm {esd gsd qsd}
Purpose: Set up polygenic model with the standard parameters
Usage: polymod [-d]
IMPORTANT: Phenotypes, trait, and covariate commands must be
given beforehand.
-d Check for discrete trait(s) and make necessary changes.
In most cases, this option is not necessary because
"maximize" later checks for discrete traits and can also
make these changes: constraining SD to 1 and making
sure phi2 matrix is loaded, for each discrete trait.
However, use of -d option can make the constraint or matrix
order inside complex models easier to deal with.
Notes: The starting lower bound for e2 is controlled by e2lower.
Normally you do not use this command directly, but instead use
the "polygenic" command to do a complete polygenic analysis,
which maximizes a polygenic model which was set up using this
command. See the tutorial in Chapter 3.
polymod will modify an existing sporadic or linkage model
to change it to polygenic. Use spormod to set up a
sporadic model, and linkmod to set up a linkage model.
None of these commands maximize models, they just set up
or modify the parameters and omega as required.
This command removes a house parameter (if present) from
the omega, since a "polygenic" model is distinct from a
"household polygenic" model. If you want the latter, call
polymod first, then house. Or call house, THEN polygenic,
since the polygenic command will check for and handle household
effect properly.
Shortcuts: polym - polymodel
Purpose: Perform power calculations
Usage: power [-prev] [-h2t <h2t>] [-h2r <h2r>] [-data <fieldname>]
[-grid {<from> <to> <incr>}] [-lod {<lod> ...}]
[-freq <freq>] [-nreps <nreps>] [-seed <seed>]
[-overwrite] [-plot]
power -restart [-grid {<from> <to> <incr>}]
[-lod {<lod> ...}] [-nreps <nreps>] [-plot]
This command performs a power calculation for the currently
loaded pedigree, with the following assumptions:
(1) the trait to be studied is either quantitative or
dichotomous (e.g. affected/unaffected)
(2) the trait to be studied is influenced by a single
bi-allelic QTL with, optionally, a residual additive
genetic effect due to polygenes
(3) there will be fully informative marker genotype data
available for all study subjects
(4) all study subjects will be phenotyped for the trait
to be studied (unless the -data option is used to
exclude those individuals who will not have phenotypic
data; see the description of this option below)
Simulation is used to estimate the LOD score one would expect
to obtain for a QTL having a certain effect size (i.e. QTL
heritability). The expected LOD is calculated for a range of
effect sizes. The ELODs, in turn, are used to compute the power
to detect a QTL having these various effect sizes with, for
example, a LOD of 3.
The default is to perform 100 replicates of the simulation for
each effect size in the range 0.05, 0.10, 0.15, ..., 0.50. For
each replicate, both a polygenic and a linkage model are fitted
to the simulated data and then compared. The resulting QTL
heritability estimate and LOD score are recorded. The observed
LODs are converted to power, i.e. the power to detect the
corresponding observed effect size with a specified LOD.
The following options give the user some control over the power
calculation procedure:
-prev If the trait to be studied is dichotomous, SOLAR
will assume the existence of an unobserved liability
distribution. Individuals with liabilities above
some threshold value will be "affected", i.e. they
will have the larger of the two trait values (for
example, a 1 for a 0/1 trait.) The -prev option
is used to specify the "disease" prevalence, or
fraction of individuals who are "affected", which
in turn determines the liability threshold.
-grid Specify the set of effect sizes for which ELODs
will be computed. The grid is given by a set of
three numbers enclosed in curly braces:
{<from> <to> <incr>}
where <from> is the starting effect size, <to>
is the last effect size considered, and <incr>
is the interval between grid points. If the
desired grid consists of a single effect size,
the three-number list can be replaced by that
single number and curly braces are not required.
The default grid is from 0.05 through 0.5 by
steps of 0.05.
-h2r At each grid point, add a constant residual
additive genetic heritability <h2r> to the
QTL-specific heritability.
-h2t Set the residual heritability so that the total
heritability (QTL plus residual) is equal to a
constant value <h2t>.
-data Exclude individuals from the power calculation
who are missing data for phenotype <fieldname>.
-lod Specify the set of LODs for which power will be
computed. If more than one LOD is specified, the
set of numbers must be enclosed in curly braces.
The default set of LODs is { 3 2 }. The order of
the LODs is important since it is reflected in
the output file power.out (see below). The set
of LODs can also be changed for a completed power
calculation by using the -lod option in conjunction
with the -restart option.
-freq Specify the frequency of the first of the two
alleles assumed to exist for the QTL. The default
allele frequency is 0.2113; this frequency results
in the simulated trait having kurtosis = 0.
-nreps Perform <nreps> simulations at each grid point.
The default number of replicates is 100.
-seed Set the random number generator seed. The default
is to set the seed based on the date and time.
-plot At the end of the power calculations, display a
plot of power versus QTL heritability. To display
this plot for a previously completed calculation,
use the command "plot -power".
-overwrite (or -ov) Overwrite the results of a previous
power calculation.
-restart (or -r) Restart a power calculation.
Notes: It is possible to change the grid of effect sizes and the number
of replicates when restarting a calculation. The calculation
will not be restarted if a grid is chosen that does not include
all the points in the previously specified grid unless the
-overwrite option is included, in which case the simulation
replicates for any extra grid points are discarded. Similarly,
the -overwrite option is necessary if fewer replicates are
requested than were done previously, in which case any extra
replicates are discarded. The set of LODs for which power
estimates are computed can also be changed in a restart. The
other parameters, e.g. h2t, cannot be changed and are kept the
same as in the original run, with the exception of the seed for
the random number generator which is set based on the date and
time.
The plot of power versus QTL heritability is derived from a
smoothed estimate of the ELODs. Smoothing is achieved with a
least-squares fit of a second degree polynomial to the ELODs as
a function of QTL heritability. It is important to have a
sufficiently large number of replicates to produce a reasonable
curve fit. The default of 100 replicates should suffice in most
cases. To compute power as a function of the unsmoothed ELODs,
include the -nosmooth option.
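The smoothing step can be sketched in plain Python. The ELOD values below
are stand-ins for simulation results, and SOLAR's actual implementation may
differ in detail; the point is the least-squares fit of a second degree
polynomial to ELOD as a function of QTL heritability.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = c0 + c1*x + c2*x**2 via the normal
    equations, as used to smooth ELODs over QTL heritability."""
    # Power sums s[k] = sum(x**k) for k = 0..4 build the 3x3 system A c = b.
    s = [sum(x ** k for x in xs) for k in range(5)]
    A = [[s[i + j] for j in range(3)] for i in range(3)]
    b = [sum(y * (x ** i) for x, y in zip(xs, ys)) for i in range(3)]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, 3):
            f = A[r][col] / A[col][col]
            for c in range(col, 3):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    coef = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        coef[r] = (b[r] - sum(A[r][c] * coef[c]
                              for c in range(r + 1, 3))) / A[r][r]
    return coef

grid = [0.05 * k for k in range(1, 11)]          # default grid 0.05 .. 0.5
elods = [10.0 * h * h + 0.5 * h for h in grid]   # stand-in for simulated ELODs
c0, c1, c2 = fit_quadratic(grid, elods)
smoothed = [c0 + c1 * h + c2 * h * h for h in grid]
```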
The following files are created:
power.out A space-delimited file containing a line for
each grid point in the format X Y1 Y2 ..., which
is suitable for input to plotting packages such
as xmgr. The first (or X) column contains the
QTL heritability. The succeeding columns hold
the power estimates, each corresponding to a
different LOD. These columns are in the order
given by the -lod option.
power.info Stores the various options selected along with
the ELODs, averaged over the replicates, at each
grid point.
power.lods Stores the results of the simulation replicates
run at each grid point. This file, along with
power.info, is used to restart an interrupted
power calculation.
During a power calculation, various files named "simqtl.*" are
created along with a trait directory named "simqt". These will
be removed at the end of the run.
Purpose: Write message to terminal and/or file
Usage: putsout [-q] [-d.] [-nonewline] <filename> <message>
-q No output to terminal
-d. Write file to current directory
-nonewline As with puts command (note: may delay output)
<filename> *name* of file in current output directory (outdir)
<message> string
Simple Example: putsout mine.out "The result was $result"
Advanced Example: (Beginners ignore!)
set q ""
ifverbmax set q "-q"
eval putsout $q \"Iteration: $i Value: $value\"
Note: If using a variable for -q which could be "", be sure to use
eval, otherwise "" would be considered filename argument, then
remember to \ the quotes or they disappear during the eval.
Purpose: Association analysis for snps
Usage: qtld
Notes: Current model is used as starting point. It is saved in output
directory as qtld.start.mod with standard errors turned off.
Snp association phenotypes are prefixed by b_ w_ b2_ and w2_ and
are taken from the currently loaded phenotypes files. If there
is one matching phenotype, the other 3 are expected, and it is an
error if any are missing.
Main output is written to terminal and file qtld.out in the
output directory. An additional file with detailed measured
genotype information is written to mgeno.out in output directory.
These output files are fixed column size and space delimited.
The output fields in the qtld.out file (same as in the terminal
output) are:
Trait, SNP, Stratification, Measured Genotype, QTDT, QTLD
The output fields in the mgeno.out file (also listed at the top
of the file) are:
Trait, SNP, p(mg), h2m, muAA, se(muAA), muAB, se(muAB),
muBB, se(muBB)
Purpose: Marginal tests for bayesavg -qtn
Usage: [allsnp]
bayesavg -qtn -max 1
[load map snp.map]
qtnm [-noplot] [-nomap]
-noplot Do not plot results
-nomap Do not use map file; SNP locations are encoded in names
Notes: You must do bayesavg -qtn [-max 1] first, then qtnm. qtnm
writes a datafile qtnm.out in the outdir, then invokes
plotqtn to plot it. (The -max 1 is optional; include it
if you want this to run quickly.)
To include all snps as covariates in the starting model, use
the separate command "allsnp".
SNP covariate names (after the snp_ or hap_ prefix) will be
mapped to locations using the currently loaded map file,
which must be loaded prior to running qtnm. Map files stay
loaded from one solar session to the next (in the same working
directory) so once you have loaded it, you do not need to load
it again.
Beginning with version 3.0.3, snp names will always be mapped
to locations using a loaded map file. However, you can revert
to the previous method, in which the locations are encoded into
the snp "names" using the -nomap option.
Beginning with SOLAR version 3.0.2, the qtnm.out file
has the following 5 columns:
SNP Name (or location if numeric)
SNP location
Chi Squared
p
log(p)
Previously there was no "SNP Name" column because it was
assumed to be the location. Note that plotqtn accepts
qtnm.out files with either 4 or 5 columns.
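A script reading qtnm.out can accept both layouts. Here is a Python sketch
following the column lists above (the field values in the example are made up):

```python
def parse_qtnm_line(line):
    """Accept either the 5-column qtnm.out format (SOLAR 3.0.2 and later)
    or the older 4-column format, in which the first column is the SNP
    location and also serves as the name."""
    fields = line.split()
    if len(fields) == 5:
        name, loc, chisq, p, logp = fields
    elif len(fields) == 4:
        loc, chisq, p, logp = fields
        name = loc
    else:
        raise ValueError("expected 4 or 5 columns, got %d" % len(fields))
    return {"name": name, "location": float(loc), "chisq": float(chisq),
            "p": float(p), "logp": float(logp)}

row = parse_qtnm_line("snp_rs1 12.5 6.3 0.012 -1.92")  # hypothetical values
```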
Shortcuts: qtn - qtnmarker
Purpose: Get the most recent quadratic form after a maximization
Usage: quadratic
Note: This could be used in a TCL script like this:
set firstq [quadratic]
Shortcuts: quadrat - quadratic
Purpose: Read hyphenated optional arguments and argument-value pairs
Usage: read_arglist arglist [identifier_name value_var]+
value_var := varname | {statement block}
Notes:
This provides a general way of handling argument lists of the
form:
[[arg] | [-identifier [value]]]+
Which is to say that there may be any number of "regular"
arguments and "hyphenated" arguments. The "hyphenated"
arguments may be stand-alone or paired with values. (Unlike
typical Unix syntax, stand-alone hyphenated arguments MAY NOT be
strung together, and hyphenated arguments with values must be
separated by space and not with some other character such as =).
The "regular" arguments (those not hyphenated or part of
a hyphenated pair) are put into a list which is returned by
this procedure.
Hyphenated arguments may either require following "value"
arguments or not allow them (in which case the hyphenated
argument acts like a switch). Value arguments must be separated
from the hyphenated argument by a space (as is typical in Tcl).
For example
bar -height 1.5
There are two ways read_arglist can handle a hyphenated argument.
(1) The first, specified by the 'varname' expansion of value_var,
performs an assignment of the "value" argument to the caller's
varname variable. For example:
read_arglist $args -height h
If $args contains "-height 1.5", then 1.5 will be assigned to the
caller's 'h' variable. Note that this method always requires
a value argument and so does not work for switch arguments.
(2) The second, specified by the '{statement block}' expansion
of value_var executes an arbitrary set of expressions in
the caller's context. This allows a simple switch or more
complex list-building. If the statement block contains the
token VALUE, a value argument is required and the token
VALUE is replaced by the actual value argument. Substitution
is performed only once and only for the first occurrence of
VALUE.
A simple switch is implemented like this:
read_arglist $args -bottom {set bottom 1}
If $args contains "-bottom," bottom (in the caller's context) is
set to 1. A value argument is neither required nor allowed.
A list-building argument is implemented like this:
read_arglist $args -include {lappend ilist VALUE}
If $args contains "-include foo" then "lappend ilist foo" is
executed in the caller's context.
NOTE that in the {statement block} form, the statement block
IS REQUIRED to have more than one list element. A llength is
done to determine which form is being used. Thus, you cannot
have:
read_arglist $args -exit {exit} ;# *** BAD ***
but you could have
read_arglist $args -exit {eval exit}
If -* is used as an identifier_name, it matches any argument
in the argument list and causes that argument to be added to
the return list. Normally -* should be the last identifier
name; all following identifier names will be ignored.
Also, the value_var (or statement block) following -* is never
assigned or executed and so can be anything. This is intended
as an escape to permit only certain arguments to be processed
leaving other variables for processing by a later procedure.
More notes:
It is the responsibility of the caller to assign default
values before calling this procedure.
Hyphenated arguments may not have hyphenated strings for values.
However, hyphenated arguments may have negative numbers (e.g.
-1.2e5) for values. If the value does not parse as an integer
or floating point number, it must not begin with hyphen. If
the token following a hyphenated argument begins with - but is
not a number, it is considered to be another hyphenated argument
(which may cause the preceding argument to be flagged as having
a missing value).
Hyphenated argument names must not be numbers (integer or floating
point). For example, you may not have "-1" or "-1.2e5" as a
hyphenated argument.
Hyphenated arguments which do not match any of the templates
given raise the error "Invalid argument %s".
The identifier matching rule is case insensitive.
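These parsing rules can be illustrated with a simplified Python analogue.
This is not SOLAR's actual Tcl implementation; the -* wildcard and
statement-block forms are omitted, and the option sets here are assumptions
of the sketch.

```python
def parse_args(args, takes_value, switches):
    """Simplified analogue of read_arglist's rules: returns
    (regular_args, values, flags).  A token starting with '-' counts as
    a value only if it parses as a number; otherwise it is treated as
    another hyphenated argument."""
    def is_number(tok):
        try:
            float(tok)
            return True
        except ValueError:
            return False

    regular, values, flags = [], {}, set()
    i = 0
    while i < len(args):
        tok = args[i]
        if tok.startswith("-") and not is_number(tok):
            name = tok[1:].lower()    # identifier matching is case insensitive
            if name in takes_value:
                nxt = args[i + 1] if i + 1 < len(args) else None
                if nxt is None or (nxt.startswith("-") and not is_number(nxt)):
                    raise ValueError("Missing value for -%s" % name)
                values[name] = nxt
                i += 2
                continue
            if name in switches:
                flags.add(name)       # stand-alone switch, no value allowed
                i += 1
                continue
            raise ValueError("Invalid argument %s" % tok)
        regular.append(tok)           # "regular" args are returned as a list
        i += 1
    return regular, values, flags

regular, values, flags = parse_args(
    ["model.out", "-Height", "-1.2e5", "-bottom"],
    takes_value={"height"}, switches={"bottom"})
```

Note how "-1.2e5" is taken as the value of -height, because it parses as a number, while "-bottom" is recognized as another hyphenated argument.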
Purpose: Read a parameter value or likelihood from any saved model
Usage: read_model <model-name> loglike ; returns loglikelihood
read_model <model-name> <parameter> ; returns mle value
read_model <model-name> <parameter> -se ; standard error
read_model <model-name> <parameter> -lower ; lower bound
read_model <model-name> <parameter> -upper ; upper bound
read_model <model-name> <parameter> -score ; score
Model is read from current maximization output directory (see
help outdir).
Example:
trait q4
read_model null0 h2r
Purpose: Read variable statistics from maximization output file
Usage: read_output <outfile> <varname> [-mean | -min | -max | -std]
-mean Get variable mean (default)
-min Get variable minimum
-max Get variable maximum
-std Get variable standard deviation
-d 1 if discrete, 0 otherwise
Note: If outfile is not full pathname, current trait/outdir is assumed.
Statistics pertain to actual sample used in maximization.
Example: read_output null1.out q4 -std
Purpose: Create registration key file
Usage: register <key>
Notes: This creates a file ~/.solar_reg containing the key. Do
not delete this file. You may copy this file to your
home directory on other systems to run SOLAR if the
same username is used. (Each key matches only one
username.)
To obtain a key, please send a request to solar@txbiomedgenetics.org,
specifying the username(s) under which you will be using the
program, the email addresses of the users, and the NIH grant
numbers (if any) that apply to the work for which SOLAR may
be used.
Shortcuts: regi - register
Purpose: Show relationships of relative pairs included in analysis
(having all required variables)
Usage: relatives [-meanf]
-meanf causes Mean f to be reported
relpairs ;# alias for "relatives -meanf"
Notes: output is returned (displayed) and also written to file named
relatives.out in current trait/outdir.
Uses previously stored null0 model in current trait/outdir.
Polygenic command should have been run previously to create
null0 model.
Shortcuts: relat - relatives
Purpose: Remove element from list by name
Usage: remlist <list> <element>
Notes: Input list is not modified, but new list is returned.
Only first matching element is removed. This works well
with setappend for set-like behavior: use setappend to add
elements to "set" and remlist to remove
elements from set.
Match testing is case insensitive.
No error is raised if there is no matching element; input
list is returned unchanged.
See Also: setappend
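The behavior can be sketched in Python (a conceptual analogue of the Tcl
procedure, not its actual implementation):

```python
def remlist(lst, element):
    """Return a new list with the first case-insensitive match removed.
    The input list is not modified, and a missing element is not an
    error: the list is returned unchanged."""
    out = list(lst)
    for i, item in enumerate(out):
        if item.lower() == element.lower():
            del out[i]
            break
    return out
```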
Purpose: Remove a global variable (so it no longer exists)
Usage: remove_global <variable_name>
Notes: It is not necessary to declare variable as global first,
and there is no error if no such global actually exists.
See Also: if_global_exists
Purpose: Compute residuals for a maximized model and phenotypes file
Usage: residual [solarfile] [-out <residualfile>] [-phen <phenfile>]
[-fewest] [-needped]
solarfile solar maximization output file which is expected
in the current outdir. The default is null0.out
which is created by the polygenic command.
The default directory is the current outdir,
but you may specify a relative pathname to
the current directory.
EVD2 models require the actual model to be
currently in memory (as is the case immediately
after running polygenic).
If the "define" command is used to define the
names used for trait(s) or covariate(s), there
must be a model with the same rootname in
the same directory as the output file. The
default is null0.mod.
Handling of the "scale" (and "noscale") commands
also requires the presence of the model with
the same rootname in the same directory as the
output file. If this model is not present,
residual will finish but print a warning if
not inside a script.
residualfile new phenotypes file with trait 'residual' (the
default is 'residual.out' written to the working
directory).
phenfile the base phenotypes file; the default is to use
the currently loaded phenotypes file(s).
-fewest Copy fewest fields to residualfile (this would be
ID, FAMID (if required), trait, and residual).
The default is to include the above along with
all other (unused) variables from the phenfile.
-needped Only include individuals in the pedigree file.
(By default, all individuals in the phenotypes
file would be included, unless there is a
covariate including sex, which requires the
pedigree file.)
Example:
solar> automodel pheno atrait
solar> polygenic -s
solar> residual
MOST IMPORTANT!
This procedure requires a maximization output file.
Unless otherwise specified, the default is assumed to be null0.out
which is produced by the "polygenic" command. If this is not
what you want, you need to specify the maximization output file.
You cannot specify a model file; that is insufficient.
Additional Notes:
Univariate quantitative trait only.
The trait or outdir must be specified first.
Must be run in the directory in which pedigree was loaded.
FAMID is included if present in both phenotypes and pedigree files.
residualfile will be written in comma delimited format.
This procedure does not handle hand-written 'mu' equations, only
standard covariates.
Not applicable to discrete traits.
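Conceptually, for a univariate quantitative trait with one standard
covariate, the residual computation looks like the following Python sketch.
Here mu and beta stand in for the fitted mean and covariate coefficient from
the maximization output, and the covariate is adjusted to its sample mean as
under SOLAR's default scaling; this is an illustration, not SOLAR's code.

```python
def residuals(trait, covar, mu, beta):
    """Residual = observed trait minus the value predicted from the
    fitted mean (mu) and covariate effect (beta), with the covariate
    adjusted to its sample mean."""
    center = sum(covar) / len(covar)
    return [y - (mu + beta * (x - center)) for y, x in zip(trait, covar)]

# Hypothetical fitted values for a perfectly linear trait:
res = residuals([10.0, 12.0, 14.0], [20.0, 30.0, 40.0], mu=12.0, beta=0.2)
```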
Shortcuts: resi - residuals
Purpose: Save data for future use
The save command is an alias to make some other commands nicer.
For example, the command "save model" is actually an alias for
"model save." For more information about a particular "save"
command, see the help for the underlying file type, for example:
"help model".
save model <filename>
Shortcuts: sav - save
Purpose: scale a covariate variable, or disable default scaling
Usage: scale <var> <center> ; scale this variable to this center value
noscale <var> ; use 0 as center value disabling default
scale ; show all non-default scaling in effect
scale <var> ; show scaling for this variable
scale default <var> ; return to default scaling for var
scale default_all ; return to default for all vars
<var> any covariate variable; it may also appear in an interaction
<center> real number
Notes:
By default, SOLAR adjusts all covariate variables to the sample mean.
Using the scale command, you can adjust any covariate variable to
any other fixed value, or disable adjustment altogether (by adjusting
to zero).
The adjustment applies to the variable whether it appears in a simple
covariate (such as "age") or an interaction covariate (such as
"age*sex") or both.
There is currently no way of scaling the trait variable, or scaling
any variable by a factor. Those features could be added in a future
release.
Scaling is saved with the model, and is superseded by whatever scaling
is in effect with a new model.
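The effect of the default and fixed-center adjustment can be sketched in
Python (a conceptual illustration, not SOLAR's internal code):

```python
def center_covariate(values, center=None):
    """Default scaling adjusts a covariate to its sample mean; the scale
    command substitutes a fixed center, and noscale adjusts to 0
    (i.e. disables adjustment)."""
    if center is None:                       # default: sample mean
        center = sum(values) / len(values)
    return [v - center for v in values]

ages = [30.0, 40.0, 50.0]
default = center_covariate(ages)             # default mean centering
fixed = center_covariate(ages, center=45.0)  # like "scale age 45"
raw = center_covariate(ages, center=0.0)     # like "noscale age"
```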
Purpose: Perform polygenic analysis with covariate screening
Same as 'polygenic -screen'
solar::sporadic --
solar::polygenic --
Purpose: Perform polygenic, sporadic, and/or household analysis
Calculate H2r, significance of H2r, and proportion of variance
contributed by covariates.
Optionally performs covariate screening (determine significance
level of each covariate).
Usage: polygenic [-screen] [-all] [-p | -prob <p>] [-fix <covar>]
[-testrhoe] [-testrhog] [-testrhoc]
[-sporadic] [-keephouse] [-testrhop] [-rhopse]
(screencov is an alias for 'polygenic -screen')
(sporadic is an alias for 'polygenic -sporadic')
Typically before giving this command, you will give trait,
covariate, and house (if applicable) commands. You will also load
pedigree and phenotypes files if they have not already been loaded.
solar> load pedigree ped
solar> load phenotypes phen
solar> trait hbp
solar> covariate age sex age*sex smoke
solar> polygenic -screen
Alternatively, you may use the "automodel" command first to
include all available phenotypes as covariates. See note 2
below and "help automodel".
-screen (or -s) Perform covariate screening:
Calculate significance level for each covariate, and run
only the significant covariates in the final analysis.
An inclusive significance threshold of 0.1 is used,
but may be changed with the -prob option. Covariates
may be locked in regardless of significance with the
-fix or -all options.
(An alternative method of covariate analysis using bayesian
model averaging is available with the command:
bayesavg -covariates)
-p (or -prob) p is the probability level for keeping
covariates as "significant." The default is 0.1.
It is set to be generous so that covariates are not
removed unnecessarily. (The probability levels for
H2r and C2 are fixed at 0.05, however, H2r is never
removed from the final model even if it is judged to
be not significant, and C2 is only removed from the
model if it is zero in the final model and therefore
has no effect at all.)
-fix (or -f) "fix" (lock in) this particular covariate
regardless of significance level. NOTE: a -fix or -f
qualifier is required for each covariate to be fixed,
for example: -f age -f sex
-all (or -a) Keep all covariates in final analysis regardless
of significance level.
-testrhoe (Bivariate only) Test significance of rhoe difference
from 0 by running model where rhoe is constrained to 0.
The p value is shown in the same line as the RhoE value.
-testrhog (Bivariate only) Test significance of rhog differences
from zero and from 1 (if positive) or -1 (if negative).
Because there may be two p values, they are shown
in line(s) below the RhoG result and standard error.
-testrhoc (Bivariate Household only) Test significance of rhoc
differences from zero and from 1 (if positive) or -1 (if
negative). Because there may be two p values, they are
shown in line(s) below the RhoC result and std. error.
-testrhop (Bivariate polygenic only) Test significance of derived
estimate of phenotypic correlation differences
(difference from 0).
-rhopse (-testrhop must be specified also) Get standard error
of rhop, saved in model file rhop.mod and variable
SOLAR_RhoP_SE
-sporadic Only evaluate sporadic models, not polygenic.
-keephouse Keep "household effect" C2 parameter in final model
even if it maximizes to zero in the best polygenic
(or sporadic) model.
Notes: (1) Output is written to directory selected by 'outdir' command,
or, if none is selected, to a directory named by the trait. This
is called the "maximization output directory." Polygenic results
are in file named polygenic.out. Important loglikelihoods and
statistical computations are recorded in polygenic.out.logs. If
the -sporadic option is selected, the files are sporadic.out and
sporadic.out.logs. For univariate models, the residuals are
computed and written to a file named polygenic.residuals (or
sporadic.residuals), then the statistics of those residuals
are written to a file named polygenic.residuals.stats (or
sporadic.residuals.stats). If the residual kurtosis is
above 0.8, you get a special warning (see note 5 below). You
also get a special warning if the trait standard deviation is
below 0.5, which is undesirable for numerical reasons.
(2) Prior to running polygenic, you should set up the trait and
covariates. You may use the trait and covariate commands, or
use the "automodel" command. "automodel" selects all variables
otherwise unaccounted for in the phenotypes file as candidate
covariates, and also sex and the standard interactions with
sex and age. (If you are unfamiliar with "automodel" it would
be a good idea to examine the covariates afterwards with the
covariates command...)
(3) If household effect (see "house") is in effect when the
polygenic command is given, it will be included in the analysis.
If the household parameter C2 is 0 in the household polygenic
model, it will be removed from the final model regardless of
whether "covariate screening" is performed, unless -keephouse
is specified. The p value for C2 will be computed (if C2 is
nonzero), but the p value will not cause C2 to be removed from
the final model. The p value of the C2 parameter is not
computed for bivariate models.
(4) If any covariates have been constrained by the user,
certain tests are not allowed: the determination of total
variance due to covariates, or the Kullback-Leibler R
squared (done for discrete traits). Also, such covariates
are not included in the "screening" if the screening option
is selected.
(5) If you get the message about Residual Kurtosis being too high
because it is above 0.8, there is danger of LOD scores being
estimated too high in a subsequent linkage analysis. You should
start over using either tdist or lodadj or inormal (see
documentation) to protect against this. If you are already
using tdist or lodadj, you may ignore this warning, but it would
be fair to report both the Residual Kurtosis and the method
you are using to deal with it. We most strongly recommend
inormal, which in conjunction with the define command creates
an inverse normalized transformation of your trait(s).
If there are no covariates, the Kurtosis is computed from the
trait itself, and no "residuals" are computed. The same warning
threshold applies. We define Kurtosis as 0 for a standard
normal distribution; 3 has already been subtracted from the
normalized 4th central moment.
(6) The polygenic command only supports our "standard"
parameterizations. If you would like to use the esd,gsd,qsd
parameterization, use the polygsd command (see "help polygsd"
for more information) instead.
(7) For bivariate polygenic models only, a derived estimate of
RhoP, the phenotypic correlation, is displayed on terminal
and written to polygenic.out. This estimate is computed from the
h2r's, rhog, and rhoe according to the following formula:
sqrt(h2r(ti))*sqrt(h2r(tj))*rhog +
sqrt(1-h2r(ti))*sqrt(1-h2r(tj))*rhoe
To determine the significance of RhoP, use the -testrhop option,
which compares a model with a free rhop parameter to one in which
rhop is constrained to zero. Additional models rhop.mod and rhop0.mod
are written to the output directory.
(8) The polygenic command creates global variables which may
be accessed later (which is often useful in scripts). The
variables are:
SOLAR_Individuals number of individuals included in sample
SOLAR_H2r_P p value for h2r
SOLAR_Kurtosis residual trait kurtosis
SOLAR_Covlist_P list of p values for covariates
SOLAR_Covlist_Chi list of chi values for covariates
SOLAR_RhoP derived estimate of phenotypic correlation
for bivariate polygenic models, {} if
not calculated
SOLAR_RhoP_P -testrhop sets this to p value of rhop
being nonzero
SOLAR_RhoP_SE -rhopse sets this to se value of rhop
SOLAR_RhoP_OK -testrhop sets this if likelihood of rhop
parameterized model matches polygenic,
as it should
The covariate lists are created only if the -screen option
is used. All screened variables are included, regardless of
whether they were retained in the final model. Before you
can access any of these variables in a script, you must
use a "global" command. For example:
global SOLAR_Kurtosis
if {$SOLAR_Kurtosis > 4} {puts "Very bad kurtosis!"}
(9) The default is for the standard error option to be turned
on (and temporarily off, when desirable for certain tests).
However, if you turn the standard error option off before
starting polygenic, it will remain off.
Shortcuts: scree - screencov
Purpose: Select fields (columns) from data file(s) and copy to a new file
Usage: selectfields [-noid] [<infile>]* [.] [-np] [<field-name>]*
[-o <outfile>] [-sample] [-list filename]
An optional period (aka dot) ends the list of filenames and starts
the list of field names. If there is no dot, the first argument
is assumed to be the one and only data filename. The currently
loaded phenotypes files are automatically included at the end of
the list of files. If nothing precedes the dot, only the
phenotypes files are used. Fields found in multiple files default
to the first file in which they are found, however a warning is
given when this happens. The -np argument forces the loaded
phenotypes files to be ignored. The -sample argument forces
only the inclusion of individuals having all field values
defined. Otherwise, a record is written for every ID encountered
in the file(s) from which data is read; however, one or more
data values might be blank.
-list filename Use all the field names in this file, listed
one per line. These are appended to the list
of field names given in the command line, if
any.
If the -noid switch is given, the old version of selectfields
is used. This takes one and only one <infile> followed by a
list of fieldnames, with no dot in between. The only other
option allowed is -o. No ID field is required in the input
file, and no ID field is written unless included in the list
of fieldnames. The loaded phenotypes file is not used unless
that is the one file named.
If not specified, <outfile> defaults to selectfields.out
<field-names> follow rules for phenotypes files and are also
affected by field command specifications. For example,
if you specify "ID" as field name, this would also match a
field name "EGO" in the file.
Input file may be either PEDSYS or Comma Delimited format.
Output file is comma delimited.
Example: selectfields phen.dat out.dat ID AGE -o age.dat
Purpose: Select records from a file and copy them to a new file
Usage: selectrecords <infile> [<outfile>] [{<conditions>}]*
If not specified, <outfile> defaults to selectrecords.out
Each <condition> is a Tcl conditional expression which includes
field names in the file preceded by dollar sign $.
Field names are case insensitive (you need not match
capitalization used in file itself). Each condition
must be enclosed in curly braces and spaced from other conditions
if any.
Conditions may also include actual Tcl variables, preceded by $$
Tcl variables are Case Sensitive.
Simple examples are shown, but any valid Tcl expression operators
and functions may be used, and expressions may be arbitrarily
complex...they are evaluated by the Tcl expression parser, with
the exception of special pre-substitution of $$ variables.
Internally, $$ variables are upvar'd into local variables having a
leading capital S.
If a condition includes a non-existent field, it will never be
satisfied, producing an empty result file. (In future, error
detection may be added.) If a condition includes an undefined $$
Tcl variable, an error will result.
Input file may be either PEDSYS or Comma Delimited format.
Output file is comma delimited.
If the first condition does not include any dollar signs,
it must include spaces (for example, {1 == 1}). No such
requirement is made for subsequent conditions. It seems pointless
to have condition without dollar signs anyway; if no condition
is given you get all records (the "null condition" is always true).
Example: selectrecords phen.dat out.dat {$bmi > 0.3} {$famid == 10}
for {set F 1} {$F < 100} {incr F} {
selectrecords phen.dat out$F.dat {$bmi > 0.3} {$famid == $$F}
}
Note: Records are "selected" when they match ALL conditions given (unless
a condition includes a non-existent field or another error occurs).
Purpose: Append only new elements to a list (keeping it like a set)
Usage: setappend <listname> element
Note: The list is identified by name, and may be modified, as with
lappend.
Example: set friends "John Tom Laura Jeff"
setappend friends Harald
See Also: remlist
Purpose: Perform exclusive-or (xor) on two sets (Tcl lists)
Usage: setxor aset bset
Note: If an element appears multiple times in one list but not in the
other, it will appear multiple times in the output.
Purpose: Show SOLAR procedure or write to a file
Usage: showproc <procname> [<filename>]
If <filename> is not specified, the procedure is displayed on the terminal
using the 'more' pager. If <filename> is specified, the renamed procedure
is written to that file.
This procedure will show any SOLAR procedure (script), whether built-in
or user-defined. Some, but not all, built-in SOLAR commands
are implemented as scripts, and can be shown by this command. Other
SOLAR commands are implemented in C++ and FORTRAN, and cannot be shown
by this command.
User-defined scripts must be used once before they can be shown.
The formatting shown by showproc may not be as pretty as in the
source file, because lines extended with a backslash are concatenated.
showproc is based on the Tcl command "info body", which has this
"feature."
To protect built-in procedures from being accidentally superseded
through the use of this command, the procedure name is suffixed with
".copy". If you choose to edit the script, IT IS RECOMMENDED THAT
YOU DO NOT RENAME IT TO HAVE THE SAME NAME AS THE ORIGINAL PROCEDURE
UNLESS YOU REALLY KNOW WHAT YOU ARE DOING. If you do that anyway,
it would probably be ignored. SOLAR prevents you from overriding
built-in procedures by putting the directory containing the active
solar.tcl file at the front of the auto-load list. Normally, that
directory is the SOLAR_BIN directory defined when SOLAR starts up.
Even if you did have a copy of the solar.tcl file in your working
directory when SOLAR started up, procedures might be resolved either
to the solar.tcl file or to separate script files in your working
directory, depending on which appears earlier in an alphabetical
list.
Before new procedures can be used in SOLAR you must restart SOLAR or give
the newtcl command.
Purpose: Simulate a fully-informative marker and compute its IBDs
Usage: siminf -out <markerfile> -ibd <ibdfile>
-out Write simulated marker genotypes to this filename.
The default is 'siminf.out' in the current working
directory.
-ibd Write marker-specific IBDs for the simulated marker
to this filename. The default is 'ibd.siminf' in the
current working directory. The file will be gzipped.
Shortcuts: simin - siminf
Purpose: Simulate a QTL and (optionally) a linked marker
Usage: simqtl [-seed <seed>] [-inform] [-gfile <genotype_file>]
simqtl -model
simqtl -freq {<f_1> ...} [-mfreq {<f_1> ...} [-theta <theta>]]
[-ntrt <#traits>] -mean {{<m1_AA> <m1_Aa> <m1_aa>} ...}
-sdev {<sd_1> ...} [-cov {<cov1> ...}]
[-beta {{<b1_AA> <b1_Aa> <b1_aa>} ...}]
[-cmean {<cov1_mean> ...}] [-mage {<mean_age>}]
[-rhog {<gen_corr_2,1> <gen_corr_3,1> <gen_corr_3,2> ...}]
[-rhoe {<env_corr_2,1> <env_corr_3,1> <env_corr_3,2> ...}]
[-h2r {<h2r_1> ...}]
simqtl -nloc <#QTLs> -nall {<nall_1> ...}
[-ntrt <#traits>] -mean {{<m1_AA> <m1_Aa> <m1_aa>} ...}
-sdev {<sd_1> ...} [-cov {<cov1> ...}]
[-beta {{<b1_AA> <b1_Aa> <b1_aa>} ...}]
[-cmean {<cov1_mean> ...}] [-mage {<mean_age>}]
[-rhog {<gen_corr_2,1> <gen_corr_3,1> <gen_corr_3,2> ...}]
[-rhoe {<env_corr_2,1> <env_corr_3,1> <env_corr_3,2> ...}]
[-h2r {<h2r_1> ...}]
There are two steps in the simulation process: (1) specifying
the simulation model, and (2) running the simulation. The first
form of the command shown is used to run the simulation, and
takes the following optional arguments:
-seed An integer seed for the random number generator.
-inform If this argument is given, the simulated marker
genotypes will be fully informative.
-gfile For models of the second type described below, QTL
genotypes are read from a file rather than simulated.
This argument specifies the name of this file.
The simulated trait values are written to the file "simqtl.phn".
A simulated trait value will not be assigned to any individual
who has an unknown age, or who is missing data for any other
covariate specified in the simulation model. If QTL genotypes
are simulated, they will be written to the file "simqtl.qtl".
If the model includes a linked marker, the simulated marker
genotypes are written to "simqtl.mrk". Two additional files are
created and used by this command: "simqtl.dat" and "simqtl.par",
which contain pedigree/covariate data and the model parameters,
respectively. All of these files are created in the current
working directory.
If QTL genotypes are read from a file, that file must contain
an ID field, a FAMID field (if required - see the documentation
for marker files), and, for each QTL, a field containing the
QTL genotype. Unlike SOLAR marker genotypes in general, the
QTL genotypes must have integer alleles numbered consecutively
beginning with 1. Also, if there are multiple QTLs, the position
of the alleles is significant. For example, the QTL genotypes
1/3 and 2/1 are combined to form the two-locus genotype 12/31,
while genotypes 3/1 and 2/1 yield the two-locus genotype 32/11.
The second form of the command displays the simulation model
parameters.
The remaining forms of the command are for the two types of
simulation model that may be specified. In the first model, a
single QTL and, optionally, a single linked marker are simulated.
One or more correlated quantitative traits will be generated,
along with a polygenic background. Adjustments may be made to
the trait means for covariate effects. The covariates sex and
age are always included although no adjustments need be specified
for these covariates. The sex field is a required SOLAR field
and so it is guaranteed to be available. The age field is taken
from the phenotypes file, if one has been loaded. The name of
the age field must be "AGE". It is not an error if there is no
age field in the phenotypes file. The model will still contain
age correction terms (which should be set to zero), but obviously
no adjustment to the trait mean involving age can be made.
If adjustments to the mean are made for sex and/or age, then
betas must be specified for each of 5 effects: sex, male age,
female age, male age^2, and female age^2, in that order. The
parameters for this model are:
-freq The frequency of QTL alleles 1, 2, ..., N-1 where
the QTL has N alleles.
-mfreq The frequency of marker alleles 1, 2, ..., N-1 where
the marker has N alleles.
-theta The recombination fraction between the QTL and the
marker. The default value is 0, i.e. the QTL and marker
are fully linked.
-ntrt The number of traits controlled by the QTL. The default
value is 1.
-mean For each trait, a list of genotypic means. Genotypes
are ordered as follows: 1/1, 2/1, 2/2, 3/1, 3/2, ...
That is, the mean for genotype i/j, i >= j, is the
k-th element in the list, where k is given by
k = i*(i - 1)/2 + j
Because phase is not considered, genotypes i/j and j/i
are the same.
-sdev For each trait, the within-genotype phenotypic standard
deviation.
-cov A list of covariates, in addition to sex and age, for
which adjustments to the trait mean(s) will be made.
-beta For each trait, a set of lists, one for each covariate
including sex and age, of genotype-specific adjustments
to the trait mean. Genotype order is the same as for
genotypic means. If no betas are specified, they will
all default to 0, i.e. no covariate effects. As noted
above, sex and age together require 5 betas for each
genotype. If the betas for a particular trait and
covariate are not genotype-specific, the corresponding
list can be shortened to a single value; this value
will be used for every genotype.
-cmean For each covariate other than sex and age, a mean
value to subtract from the covariate before applying
a covariate correction to the trait means.
-mage A mean age to be subtracted before applying the age
correction to the trait means.
-rhog For each pair of traits, the genetic correlation
between those two traits. If there are N traits, the
order of the pairs is (2,1), (3,1), (3,2), ..., (N,1),
(N,2), ..., (N,N-1). The default is no genetic
correlation.
-rhoe For each pair of traits, the environmental correlation
between those two traits. The default is no
environmental correlation.
-h2r For each trait, the residual heritability expressed as
the fraction of trait variance after the QTL effect has
been accounted for. The default is no residual
heritability.
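The genotype ordering used by -mean and -beta above (mean for genotype i/j, i >= j, at position k = i*(i - 1)/2 + j) can be sketched and enumerated as follows; this is an illustration of the indexing rule, not SOLAR code.

```python
# Sketch of the unphased genotype ordering described for -mean:
# genotype i/j (i >= j) sits at 1-based position k = i*(i - 1)/2 + j.

def genotype_index(i, j):
    i, j = max(i, j), min(i, j)   # phase is ignored: i/j == j/i
    return i * (i - 1) // 2 + j

# Enumerate the order for a 3-allele QTL:
order = sorted((genotype_index(i, j), "%d/%d" % (i, j))
               for i in range(1, 4) for j in range(1, i + 1))
print([g for _, g in order])   # ['1/1', '2/1', '2/2', '3/1', '3/2', '3/3']
```

This reproduces the sequence 1/1, 2/1, 2/2, 3/1, 3/2, ... stated in the -mean description.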
In the second type of model, there may be multiple QTLs. The QTL
genotypes are read from a file rather than simulated. Parameters
unique to this model are:
-nloc The number of QTLs.
-nall A list of the number of alleles at each QTL.
The remaining parameters are the same as in the first model type.
The order of multi-locus genotypes is analogous to the single
locus case. The multi-locus genotype i/j, i >= j, is in the k-th
position, where k is given by
k = i*(i - 1)/2 + j
The i and j refer to the i-th and j-th multi-locus haplotypes.
Haplotypes are ordered so that the alleles of the last locus
vary the fastest, while the alleles of the first locus vary the
slowest. For example, given three bi-allelic loci, the order of
the three-locus haplotypes is
111, 112, 121, 122, 211, 212, 221, 222
The order of the three-locus genotypes is then
111/111, 112/111, 112/112, 121/111, 121/112, 121/121, ...
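The haplotype ordering described above (alleles of the last locus vary fastest, those of the first locus slowest) can be sketched with itertools.product, which enumerates in exactly that order; illustrative only.

```python
from itertools import product

# Sketch of the multi-locus haplotype ordering described above:
# the last locus varies fastest, the first locus slowest.
def haplotype_order(n_alleles_per_locus):
    ranges = [range(1, n + 1) for n in n_alleles_per_locus]
    return ["".join(str(a) for a in h) for h in product(*ranges)]

print(haplotype_order([2, 2, 2]))
# ['111', '112', '121', '122', '211', '212', '221', '222']
```

For the two-QTL case in Example 3 (3 and 2 alleles), this yields the 6 two-locus haplotypes 11, 12, 21, 22, 31, 32.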
Examples:
1. Simulate a QTL with two alleles, call them A and a, where the
frequency of allele A is 0.7. A single trait will be simulated
for which the mean of genotype AA is 90, the mean of genotype
Aa is 100, and the mean of genotype aa is 120. The trait will
have a within-genotype standard deviation of 10, and a residual
heritability of 0.3. A marker with 5 alleles of equal frequency
will be generated which has a recombination fraction of 0.05
with the QTL. The required commands are shown below - one to
create the simulation model, another to actually perform the
simulation. The first command has been broken into two lines
to avoid line-wrapping, but must actually be entered as a
single line.
solar> simqtl -freq .7 -mfreq {.2 .2 .2 .2} -theta .05
-mean {90 100 120} -sdev 10 -h2r .3
solar> simqtl
2. Simulate a QTL with 3 alleles; the allele frequencies are 0.5,
0.3, and 0.2. There is no linked marker. There are two traits
associated with this QTL. Sex and age have an effect on the
first trait; there is no sex-by-age interaction or second-order
age effect. The traits are correlated both genetically and
environmentally. A mean population age of 40 is subtracted
prior to the age correction to the mean of the first trait.
When the simulation is run, the random number generator is
seeded with the integer value 12345.
solar> simqtl -freq {.5 .3} -ntrt 2
-mean {{10 15 12 20 18 30} {50 55 60 60 55 80}}
-sdev {2.5 10} -h2r {.2 .65}
-beta {{{-1.2 .1 -.5 1.4 2 -.5} {2.4 3 1.6 -4 0 -1}
{2.4 3 1.6 -4 0 -1} {0} {0}}
{{0} {0} {0} {0} {0}}}
-mage 40 -rhog .7 -rhoe .4
solar> simqtl -seed 12345
3. Simulate a quantitative trait controlled by two QTLs. The
first QTL has 3 alleles, and the second QTL has 2 alleles.
There are 6 two-locus haplotypes, so we have a total of 21
two-locus genotypes in the order
11/11, 12/11, 12/12, 21/11, 21/12, 21/21, 22/11, 22/12,
22/21, 22/22, 31/11, 31/12, 31/21, 31/22, 31/31, 32/11,
32/12, 32/21, 32/22, 32/31, 32/32
When the simulation is run, the two-locus genotypes are read
from the file "2locgtyp".
solar> simqtl -nloc 2 -nall {3 2}
-mean {26 31 36 28 33 30 33 38 35 40 31 36 33 38 36
36 41 38 43 41 46}
-sdev 2.8
solar> simqtl -gfile 2locgtyp
Shortcuts: simqt - simqtl
Purpose: Process SNP data.
Usage: load snp [-xlinked] <genofile> [<locfile>] ; loads SNP data
snp show ; displays summary of SNP data
snp covar [-nohaplos] [-impute] ; prepare genotype covariates file
snp qtld ; prepare QTLD covariates file
snp ld [-window <window>] [-overwrite]
[-plot [-absrho] [-file <psfile>]
[-title <title>] [-date] [-gray]]
; compute linkage disequilibrium
snp effnum [<method>] [-alpha <alpha>]
; use <method> to compute the
effective number of SNPs
snp unload ; unload SNP data
SNP genotype data is treated as a special case of marker data.
The file containing SNP genotypes must be in the same format as
any SOLAR marker data file, and the SOLAR 'marker' and 'freq'
commands can be used to process the SNP genotype data. However,
the following restriction applies to SNP genotype data: there
must be exactly two allelic types present in the data for each
SNP. If a SNP has only a single allele, i.e. the SNP is not
polymorphic, it will be loaded but cannot be used for further
analysis. If a SNP with more than two alleles is encountered,
the 'load snp' command will fail. After a successful load, a
file named 'snp.typed' is created, which contains a field, named
nGTypes, giving the number of SNPs genotyped for each pedigree
member. This field is empty for untyped individuals.
The locations of the SNPs can be read from a standard SOLAR map
file by the 'load snp' command. Each location must be given as
an integer number of basepairs. Only those SNPs that appear in
the map file will be included in SNP processing commands. While
it is not necessary that every SNP in the marker file be present
in the map, it is required that genotype data be loaded for all
SNPs in the map file. If the user does not specify a map file
in the 'load snp' command but a map file has already been loaded,
that map file will be used. Otherwise, a dummy map file, named
'snp.map', will be created in which consecutive basepair locations
have been assigned to the SNPs, starting at zero.
The 'snp show' command displays a summary of the SNP genotype
data. The contents of this display are also saved in the file
'snp.show.txt'. The information displayed includes SNP name,
location, number of typed individuals, and allele frequencies.
The frequency information is ordered so that the common (higher
frequency) allele is listed first. If available, the standard
error of the allele frequency estimates and the p-value for a
test of Hardy-Weinberg equilibrium are also displayed. The
allele frequency standard errors are computed by the 'freq mle'
command. The HWE test is performed when the '-hwe' option is
included in the 'freq mle' command.
The 'snp covar' command produces a file, 'snp.genocov', in which
the SNP genotypes have been recoded to be suitable for use as
covariates in a SOLAR measured genotype, QTLD, or Bayesian
QTN analysis. This file includes a field for each SNP, named
snp_<name>, where <name> is the SNP name. Genotypes are coded
as the number of copies of the rarer allele: 0, 1, or 2. The
'snp covar' command can be run in one of three different ways.
By default, SNP haplotypes generated by a haplotype estimation
procedure, e.g. SimWalk2, are used to impute as many missing
genotypes as possible. The haplotypes are read from the file
'snp.haplotypes', which is created from the output of the
haplotype estimation procedure using the 'snphap' command.
If SNP haplotypes are not available, the '-nohaplos' option
can be given to the 'snp covar' command, in which case the
covariates are generated solely from the genotype data.
The third method for generating covariates, invoked with the
'-impute' option, is to extend the genotype imputation of the
default method. In this case, a missing genotype which cannot
be assigned exactly from the haplotypes, is imputed from the
weighted average of all haplotypes which are consistent with
that individual's genotype and estimated haplotype data, where
the weights are the estimated haplotype frequencies. These
frequencies are read from the file 'snp.haplofreqs', which is
created by the 'snphap' command. This method of generating
the covariates ensures that each individual has complete data,
i.e. there are no missing covariates. Because covariates will
have been imputed for all pedigree members, whether they were
genotyped or not, it may be desirable to include in an analysis
only those individuals for whom at least some minimum number
of SNPs were typed. This can be done by selecting on the
nGTypes field, which is taken from the file 'snp.typed' and
automatically joined to the covariates file.
In addition to the file 'snp.genocov', the 'snp covar' command
creates a file 'snp.geno-list', which contains the names of
the covariates, one per line. This file can be used to
specify the covariates to be included in various association
analyses, e.g. the 'bayesavg' command.
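The recoding that 'snp covar' applies (each genotype becomes the count of the rarer allele: 0, 1, or 2) can be sketched as below. The genotype format and allele names here are made up for illustration; SOLAR performs this internally on its own files.

```python
# Hedged sketch of the 0/1/2 recoding performed by 'snp covar':
# each SNP genotype becomes the number of copies of the rarer allele.
# Genotype strings and allele names below are hypothetical.

def recode_genotype(genotype, rare_allele):
    """genotype is e.g. 'A/G'; returns 0, 1, or 2 copies of rare_allele."""
    if not genotype:
        return None                       # missing genotype stays missing
    alleles = genotype.split("/")
    return sum(1 for a in alleles if a == rare_allele)

print(recode_genotype("A/G", "G"))   # 1
print(recode_genotype("G/G", "G"))   # 2
print(recode_genotype("A/A", "G"))   # 0
```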
The 'snp qtld' command generates another type of covariates
file, in this case the covariates required for a SOLAR QTLD
analysis. This file, which is named 'snp.qtldcov', contains
four covariates for each SNP: b_<name>, w_<name>, b2_<name>,
and w2_<name>, where <name> is the name of the SNP. As with
the genotypes covariates file, the nGTypes field from the
file 'snp.typed' is automatically included and can be used
to exclude untyped individuals from an analysis.
The 'snp ld' command computes the pairwise correlation among
the SNP genotypes. This measure can be used to identify
those SNPs which are in linkage disequilibrium. The signed
pairwise correlations are written to the file 'snp.ld.dat'.
If 'snp.ld.dat' already exists, the correlations are not
recomputed unless the '-overwrite' option is specified.
The '-window' option limits the number of pairwise correlations
that will be computed by the 'snp ld' command. Only the pairs
of SNPs separated by no more than the number of basepairs in
the window will be considered. A map file containing basepair
locations must be loaded in order to use this option.
When the '-plot' option is added to the 'snp ld' command, a
PostScript LD plot will be displayed. If the file 'snp.ld.dat'
already exists, the genotypic correlations are not recomputed.
The plot is saved in the file specified by the '-file' option,
or, by default, in the file 'snp.ld.ps'. The LD measure shown
in the plot is the square of the genotypic correlation (rho^2),
or, if the '-absrho' option is specified, the absolute value of
the correlation. A plot title, enclosed in quotes, can be
supplied with the '-title' option. The '-date' option adds a
date stamp to the plot. The '-gray' option produces a gray-scale
version of the plot.
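The LD measure shown in the plot, the squared Pearson correlation (rho^2) between two SNPs' genotype codes, can be sketched as follows; the genotype vectors are made up, and SOLAR reads real genotypes from its own files.

```python
import math

# Sketch of the plotted LD measure: squared Pearson correlation (rho^2)
# between two SNPs' 0/1/2 genotype codes. Data below is hypothetical.
def rho(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

snp1 = [0, 1, 2, 1, 0, 2]
snp2 = [0, 1, 2, 1, 1, 2]
print(rho(snp1, snp2) ** 2)   # rho^2, the default measure in the plot
```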
The 'snp effnum' command uses the specified method to estimate
the effective number of SNPs based on the pairwise genotypic
correlations. This is an estimate of the number of independent
tests in an association analysis utilizing these SNPs, and it
can be used to determine an appropriate significance level
for such an analysis. Currently, the
following methods are implemented:
mosk Moskvina & Schmidt (default)
liji Li & Ji
For example: snp effnum liji
The method of Moskvina & Schmidt is the more conservative of the
two and is the default. The Li & Ji method entails computing
the eigenvalues of the genotypic correlation matrix. The number
of SNPs (i.e., the number of rows in the correlation matrix)
cannot exceed the number of genotyped individuals, or the matrix
will be singular.
The 'snp effnum' command also returns the p-value required for
a result to be considered statistically significant after the
correction for multiple testing is applied. This p-value is a
function of the effective number of SNPs and the experiment-wide
significance threshold (target alpha). The '-alpha' option
specifies the target alpha (default value = .05).
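A minimal sketch of a Li & Ji style effective-number estimate and the resulting corrected threshold is shown below. The per-eigenvalue contribution and the Sidak-style threshold are assumptions about how such a correction is typically computed; SOLAR's exact formulas may differ, and the eigenvalues here are made up.

```python
import math

# Hedged sketch of a Li & Ji style effective number of tests: each
# eigenvalue of the genotypic correlation matrix contributes an
# indicator (>= 1) plus its fractional part. The Sidak-style threshold
# below is an assumption, not necessarily SOLAR's exact computation.

def effective_number(eigenvalues):
    return sum((1 if lam >= 1 else 0) + (lam - math.floor(lam))
               for lam in (abs(e) for e in eigenvalues))

def corrected_alpha(m_eff, alpha=0.05):
    # Significance threshold per test given the target alpha
    return 1.0 - (1.0 - alpha) ** (1.0 / m_eff)

eigs = [2.6, 0.8, 0.4, 0.2]        # made-up eigenvalues for 4 SNPs
m_eff = effective_number(eigs)
print(m_eff, corrected_alpha(m_eff))
```

Correlated SNPs concentrate variance in the leading eigenvalues, so m_eff comes out below the raw SNP count (here 3.0 rather than 4).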
Purpose: Compute SNP haplotypes and haplotype frequencies.
Usage: snphap show ; displays summary of SNP haplotypes
snphap prep <program> ; prepares input files needed to compute
; SNP haplotypes using <program>, where
; <program> is simwalk (sw) or merlin
snphap import <program> [-file <filename>] [-overwrite]
; imports SNP haplotypes from an output
; file computed by <program>, where
; <program> is simwalk (sw) or merlin
snphap freq prep ; prepares input file needed to compute
; SNP haplotype frequencies using the
; program snphap
snphap freq import [-file <filename>] [-overwrite]
; imports SNP haplotype frequencies
; from an output file generated by the
; program snphap
snphap count ; computes SNP haplotype frequencies by
; counting haplotypes present in data
snphap covar ; prepare haplotype covariates file
The 'snphap' command assumes that the 'snp load' command has been
used to load SNP genotype data. The main purpose of this command
is to prepare the SNP haplotypes and haplotype frequencies files
used by the 'snp covar' command. SOLAR does not do haplotyping
or haplotype frequency estimation itself, but rather relies on
these functions being provided by external programs.
The 'snphap prep' command is used to generate the input files for
a haplotyping run using either SimWalk2 or Merlin. The output
file created by the haplotyping procedure is then imported into
SOLAR with the 'snphap import' command, which creates the file
'snp.haplotypes'. The '-overwrite' option guards against the
unintentional replacement of an existing haplotypes file.
Haplotype frequencies can be generated in two ways. The program
SNPHAP can be used to compute frequency estimates using an EM
algorithm with the assumption that all individuals are unrelated.
The 'snphap freq prep' command prepares the input file required
by SNPHAP. The 'snphap freq import' command processes the SNPHAP
output to create the file 'snp.haplofreqs'. Alternatively, the
'snphap count' command generates the haplotype frequencies file
by simply counting the haplotypes present in the haplotypes file
'snp.haplotypes'. The haplotype frequencies file is sorted in
descending order of frequency, so that the most common haplotype
appears first.
The 'snphap covar' command generates a haplotype-based covariates
file suitable for use in a SOLAR analysis. This file, which is
named 'snp.haplocov', includes a covariate field for each of the
haplotypes present in the file 'snp.haplotypes'. These fields are
named hap_<hap#> where <hap#> is the position of the corresponding
haplotype in the file 'snp.haplofreqs'. For example, the field
corresponding to the most frequent haplotype is named hap_1 and
the value in this field is the number of copies of this haplotype
that an individual possesses. Only those individuals with two
complete haplotypes in 'snp.haplotypes' are included in the
haplotype covariates file.
Purpose: Read data file applying "field" name mapping
Usage: Same as tablefile (see) but using "solarfile" command name, plus:
solarfile $tablenum establish_name <generic-name>
establish_name returns the actual field name applied or mapped
to the generic name. For example, the generic-name "id" might
actually be "ego" in the file, or a name mapped to "id" using
the field command. The generic-names are listed by the field
command.
Notes: Intended for use in scripts.
This command extends the "tablefile" command by allowing for
user-supplied field name mapping using the "field" command.
This also supports both default field names for basic identifiers:
id,fa,mo and ego,sire,dam.
Purpose: Check SOLAR version compatibility of model
solarmodel appears at the top of all new model files and identifies the
model version. If the version is incompatible with the current
version, an error message is displayed.
solartcl appears at the top of all upgraded script files. SOLAR
programmers are encouraged to use solartcl as well.
To upgrade solar models, use the "upgrade" command.
Shortcuts: solarm - solarmodel
Purpose: Check SOLAR version compatibility of tcl file
solar::solarmodel --
Purpose: Check SOLAR version compatibility of model
solarmodel appears at the top of all new model files and identifies the
model version. If the version is incompatible with the current
version, an error message is displayed.
solartcl appears at the top of all upgraded script files. SOLAR
programmers are encouraged to use solartcl as well.
To upgrade solar models, use the "upgrade" command.
Shortcuts: solart - solartcl
solar::polygenic --
Purpose: Perform polygenic, sporadic, and/or household analysis
Calculate H2r, significance of H2r, and proportion of variance
contributed by covariates.
Optionally performs covariate screening (determine significance
level of each covariate).
Usage: polygenic [-screen] [-all] [-p | -prob <p>] [-fix <covar>]
[-testrhoe] [-testrhog] [-testrhoc]
[-sporadic] [-keephouse] [-testrhop] [-rhopse]
(screencov is an alias for 'polygenic -screen')
(sporadic is an alias for 'polygenic -sporadic')
Typically before giving this command, you will give trait,
covariate, and house (if applicable) commands. You will also load
pedigree and phenotypes files if they have not already been loaded.
solar> load pedigree ped
solar> load phenotypes phen
solar> trait hbp
solar> covariate age sex age*sex smoke
solar> polygenic -screen
Alternatively, you may use the "automodel" command first to
include all available phenotypes as covariates. See note 2
below and "help automodel".
-screen (or -s) Perform covariate screening:
Calculate significance level for each covariate, and run
only the significant covariates in the final analysis.
An inclusive significance threshold of 0.1 is used,
but may be changed with the -prob option. Covariates
may be locked in regardless of significance with the
-fix or -all options.
(An alternative method of covariate analysis using bayesian
model averaging is available with the command:
bayesavg -covariates)
-p (or -prob) p is the probability level for keeping
covariates as "significant." The default is 0.1.
It is set to be generous so that covariates are not
removed unnecessarily. (The probability levels for
H2r and C2 are fixed at 0.05; however, H2r is never
removed from the final model even if it is judged not
significant, and C2 is only removed from the
model if it is zero in the final model and therefore
has no effect at all.)
-fix (or -f) "fix" (lock in) this particular covariate
regardless of significance level. NOTE: a -fix or -f
qualifier is required for each covariate to be fixed,
for example: -f age -f sex
-all (or -a) Keep all covariates in final analysis regardless
of significance level.
-testrhoe (Bivariate only) Test significance of rhoe difference
from 0 by running model where rhoe is constrained to 0.
The p value is shown in the same line as the RhoE value.
-testrhog (Bivariate only) Test significance of rhog differences
from zero and from 1 (if positive) or -1 (if negative).
Because there may be two p values, they are shown
in line(s) below the RhoG result and standard error.
-testrhoc (Bivariate Household only) Test significance of rhoc
differences from zero and from 1 (if positive) or -1 (if
negative). Because there may be two p values, they are
shown in line(s) below the RhoC result and std. error.
-testrhop (Bivariate polygenic only) Test significance of derived
estimate of phenotypic correlation differences
(difference from 0).
-rhopse (-testrhop must be specified also) Get standard error
of rhop, saved in model file rhop.mod and variable
SOLAR_RhoP_SE
-sporadic Only evaluate sporadic models, not polygenic.
-keephouse Keep "household effect" C2 parameter in final model
even if it maximizes to zero in the best polygenic
(or sporadic) model.
Notes: (1) Output is written to directory selected by 'outdir' command,
or, if none is selected, to a directory named by the trait. This
is called the "maximization output directory." Polygenic results
are in file named polygenic.out. Important loglikelihoods and
statistical computations are recorded in polygenic.out.logs. If
the -sporadic option is selected, the files are sporadic.out and
sporadic.out.logs. For univariate models, the residuals are
computed and written to a file named polygenic.residuals (or
sporadic.residuals), then the statistics of those residuals
are written to a file named polygenic.residuals.stats (or
sporadic.residuals.stats). If the residual kurtosis is
above 0.8, you get a special warning (see note 5 below). You
also get a special warning if the trait standard deviation is
below 0.5, which is undesirable for numerical reasons.
(2) Prior to running polygenic, you should set up the trait and
covariates. You may use the trait and covariate commands, or
use the "automodel" command. "automodel" selects all variables
otherwise unaccounted for in the phenotypes file as candidate
covariates, and also sex and the standard interactions with
sex and age. (If you are unfamiliar with "automodel" it would
be a good idea to examine the covariates afterwards with the
covariates command...)
(3) If household effect (see "house") is in effect when the
polygenic command is given, it will be included in the analysis.
If the household parameter C2 is 0 in the household polygenic
model, it will be removed from the final model regardless of
whether "covariate screening" is performed, unless -keephouse
is specified. The p value for C2 will be computed (if C2 is
nonzero), but the p value will not cause C2 to be removed from
the final model. The p value of the C2 parameters is not
computed for bivariate models.
(4) If any covariates have been constrained by the user,
certain tests are not allowed: the determination of total
variance due to covariates, or the Kullback-Leibler R
squared (done for discrete traits). Also, such covariates
are not included in the "screening" if the screening option
is selected.
(5) If you get the message about Residual Kurtosis being too high
because it is above 0.8, there is danger of LOD scores being
estimated too high in a subsequent linkage analysis. You should
start over using either tdist or lodadj or inormal (see
documentation) to protect against this. If you are already
using tdist or lodadj, you may ignore this warning, but it would
be fair to report both the Residual Kurtosis and the method
you are using to deal with it. We most strongly recommend
inormal, which in conjunction with the define command creates
an inverse normalized transformation of your trait(s).
If there are no covariates, the Kurtosis is computed from the
trait itself, and no "residuals" are computed. The same warning
threshold applies. We define Kurtosis as 0 for a standard
normal distribution; 3 has already been subtracted from the
normalized 4th central moment.
(6) The polygenic command only supports our "standard"
parameterizations. If you would like to use the esd,gsd,qsd
parameterization, use the polygsd command (see "help polygsd"
for more information) instead.
(7) For bivariate polygenic models only, a derived estimate of
RhoP, the phenotypic correlation, is displayed on terminal
and written to polygenic.out. This estimate is computed from the
h2r's, rhog, and rhoe according to the following formula:
RhoP = sqrt(h2r(ti))*sqrt(h2r(tj))*rhog +
       sqrt(1-h2r(ti))*sqrt(1-h2r(tj))*rhoe
To determine the significance of RhoP by comparing models with
a rhop parameter and a rhop parameter constrained to zero, use
the -testrhop option. Additional models rhop.mod and rhop0.mod
are written to the output directory.
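The derived estimate in note (7) can also be computed by hand. A minimal Tcl sketch, using made-up illustrative values (not output from any real model):

```tcl
# Hypothetical values for illustration only
set h2r_ti 0.4   ;# h2r of trait i
set h2r_tj 0.3   ;# h2r of trait j
set rhog   0.6   ;# genetic correlation
set rhoe   0.2   ;# environmental correlation
# RhoP formula from note (7)
set rhop [expr {sqrt($h2r_ti)*sqrt($h2r_tj)*$rhog + \
                sqrt(1.0-$h2r_ti)*sqrt(1.0-$h2r_tj)*$rhoe}]
puts "derived RhoP = $rhop"
```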
(8) The polygenic command creates global variables which may
be accessed later (which is often useful in scripts). The
variables are:
SOLAR_Individuals number of individuals included in sample
SOLAR_H2r_P p value for h2r
SOLAR_Kurtosis residual trait kurtosis
SOLAR_Covlist_P list of p values for covariates
SOLAR_Covlist_Chi list of chi values for covariates
SOLAR_RhoP derived estimate of phenotypic correlation
for bivariate polygenic models, {} if
not calculated
SOLAR_RhoP_P -testrhop sets this to p value of rhop
being nonzero
SOLAR_RhoP_SE -rhopse sets this to se value of rhop
SOLAR_RhoP_OK -testrhop sets this if likelihood of rhop
parameterized model matches polygenic,
as it should
The covariate lists are created only if the -screen option
is used. All screened variables are included, regardless of
whether they were retained in the final model. Before you
can access any of these variables in a script, you must
use a "global" command. For example:
global SOLAR_Kurtosis
if {$SOLAR_Kurtosis > 4} {puts "Very bad kurtosis!"}
(9) The default is for the standard error option to be turned
on (and temporarily off, when desirable for certain tests).
However, if you turn the standard error option off before
starting polygenic, it will remain off.
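The notes above can be summarized in a minimal session sketch, assuming a pedigree and phenotypes file are already loaded and contain a quantitative trait q4 with age and sex covariates (names are illustrative only):

```tcl
model new
trait q4
covariate age sex
polygenic -screen
# Script access to results requires "global" first (see note 8)
global SOLAR_Kurtosis SOLAR_H2r_P
if {$SOLAR_Kurtosis > 0.8} {
    puts "High residual kurtosis; consider inormal (see note 5)"
}
puts "h2r p value: $SOLAR_H2r_P"
```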
solar::polyclass -- (EXPERIMENTAL)
Purpose: Set up polygenic model with class specific parameterization
Usage: polyclass [-g] [<class-start>[-<class-end>]]+
sporclass [-g] [<class-start>[-<class-end>]]+ ;# sporadic model
-g Use global phenotypic values to set parameter adjustments
(otherwise, means are determined for each class)
Short Example:
trait q4
covariate age sex
polyclass 1-3 9
maximize -q
Notes: One phenotypes file must have a field named "class" which defines
the class value for each person in the sample.
Class specific parameters are given names with _c<class> appended.
User covariates are transformed into class-specific mu addends.
All individuals in sample must have all variables specified as
covariates.
After choosing trait and covariates, do either sporclass or
polyclass. You cannot do a second polyclass on a sporclassed model
to make it polygenic.
Unbalanced covariates for multivariate traits are not supported.
This is different from ordinary covariate behavior for multivariate
traits--which permits covariates to be missing in the sample if they
are specific to a missing trait.
A defined pseudo-covariate named "blank_classes()" restricts the
sample to the union of all classes specified.
Purpose: Set up a sporadic model with the standard parameters
Usage: spormod
Notes: There are no arguments. You must have previously loaded the
phenotypes file, selected the trait, and specified the
covariates.
Household effects are suspended. If you want a 'household'
model, give the spormod command first, then the 'house' command.
The starting lower bound for e2 is controlled by e2lower.
Normally you do not use this command directly, but instead use
the "polygenic" command to do a complete polygenic analysis,
which maximizes a sporadic model which was set up using this
command. See the tutorial in Chapter 3.
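As noted above, the order of spormod and house matters. A sketch of setting up a sporadic household model, assuming a loaded phenotypes file with an illustrative trait and covariates:

```tcl
model new
trait q4
covariate age sex
spormod      ;# household effects are suspended here
house        ;# so re-add the household effect afterward
maximize -q
```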
Shortcuts: spor - spormodel
Purpose: Get and/or show statistics for any variable in a file
Usage: stats [<variable>+ | -all [-file <filename>]] [-q] [-return]
[-out <outfile>]
-all show stats for all variables in phenotypes file
-return do not write output file, return list of stats;
use stats_get to parse return list
-q do not display to terminal
-out specify alternate output filename; also returns list
of stats
The default variable is the current trait, and the default
filename is the currently loaded phenotypes file. You may also
specify one or more variables.
Results are written to stats.out in the CURRENT WORKING DIRECTORY.
(Not the usual output directory, since the trait need not be set.)
They are also displayed on the terminal.
The statistics computed are mean, minimum, maximum, standard
deviation, skewness, and kurtosis. (Note: We define kurtosis
as 0 for a standard normal distribution; 3 has already been
subtracted from the normalized 4th central moment.)
See also the zscore command, which uses these stats to
zscore the current trait during maximization. The zscore
procedure uses stats with the -out option.
If there are multiple phenotypes files, joinfiles will be
used to create a joined file in the working directory named
joinfiles.stats.[pid].[uname -n].out. Non-unique fieldnames
will be renamed following the rules of joinfiles. Under most
circumstances, this joined file will be deleted before
stats returns. To run through the entire contents (and names)
in the joined file, use the "stats -all" command.
Variables created by a "define" command will work with stats
beginning with SOLAR version 4.2.2. Note that such variables
are not evaluated by the command "stats -all".
Purpose: Retrieve statistics from list returned by stats
Usage: stats_get <stats> <statistic> [<variable>]
<stats> list returned by stats procedure
<statistic> name of statistic desired (see below for list)
<variable> select this variable (default: <first>)
Example: set stat [stats -q -return q1]
set kurt [stats_get $stat kurtosis]
set skew [stats_get $stat skewness]
Notes: The following statistics are available:
variable name of variable
count number of individuals having this variable (sample)
missing number of individuals missing this variable
mean mean
min minimum value
max maximum value
sd standard deviation
skewness skewness
kurtosis kurtosis
discrete 0 if quantitative, 1 if discrete, -1 if not coded
properly
alpha 0 if valid numbers; 1 if alphanumeric
Of course, if a variable is selected, that variable must have
been included in the stats list. When running the stats command
you may select any number of variables or use the -all option.
See the stats command for further information.
Purpose: Forward stepwise covariate screening
Usage: stepfor [-list listfile] [-list list] [-verbose] [-v]
[-fix listfile] [-fix fixlist] [-max maxdim]
[-p pvalue] [-test othertest] [-par] [-parclean]
stepclean ;# Remove fully_typed covariate and unload file
By default, stepfor tests all covariates in the current
model, fixes the best one, and then repeats the process until
the best remaining covariate no longer meets the p-value
threshold (0.05 by default) or a user-specified test (see
below). The final model will contain all the covariates which met
the screening test. A file named stepfor.out is written to the
output directory with all the loglikelihoods, and a file named
stepfor.history is written with other information. All of the
best models for each number of covariates are saved as
stepfor.null<i> where <i> is the number of tested covariates.
To ensure that all models use the same sample, a new file named
fully_typed.out is created in the output directory which
defines a variable named "fully_typed" for each fully typed
individual. This file is added to the list of open phenotypes
files, and the variable "fully_typed" is added to the
model as a "null" covariate which has no effect on the model
other than restricting the sample to fully typed individuals.
To remove the fully_typed covariate and unload the fully_typed.out
phenotypes file, give the command "stepclean" after stepfor has
completed.
-list listfile listfile is a file containing a list of all
covariates to be tested, one on each line.
The filename cannot contain spaces. These
covariates may or may not be in the model when
the command is given. If the -list option is
specified, all other covariates in the starting
model are automatically fixed.
-list list Alternatively, a Tcl list of covariates to
be tested can be specified. Tcl lists are
space delimited and enclosed in quotes or curly
braces.
-fix list list is a Tcl list of covariates to be
included in every model and not tested. Their
values will be estimated by maximum likelihood
for every model, unless you constrain them.
These covariates may or may not be in the model
when the command is given. For -fix, a list
could be simply one phenotype, and that
supersedes a file with the same name.
-fix listfile Alternatively, a file containing a list of all
covariates to be included in every model may
be specified. The filename cannot contain
spaces. The list of covariates to be fixed
will supersede the list of covariates to be
tested if the same covariate occurs on both
lists; however, a warning will be given.
-p pvalue pvalue is the highest p value allowed for
a covariate to be included. The default is 0.05.
-max maxdim maxdim is the maximum number of test covariates
to be included in a model (the maximum dimension).
-verbose Show maximization output during maximizations.
-v Same as -verbose
-par New and EXPERIMENTAL! This option turns on Parallel
processing on the SFBR GCC Compute Ranch.
WARNING! Do not run more than one instance of
stepfor -par from the same working directory.
Parallel stepfor will use many (but not all!) ranch
machines, and access for other users and jobs may
be delayed due to gridware thrashing. The usual
output is not printed to the terminal to save time
but numerous parallel status messages are printed
to help the developers make this program better.
The parallel operation is automatic and the
parallel status messages may be ignored by most
users most of the time unless there is no output
for more than fifteen minutes. Note: If model
includes linkage element matrices loaded from
some mibddir, those matrices should be relocated
to the working directory, or specified with an
absolute pathname in the model file. This is
because in parallel operation the model is loaded
not in the current working directory but in a
subdirectory of /tmp.
-parclean Normally, parallel stepfor cleans up after itself.
However, if it is necessary to force a shutdown
of a parallel stepfor, normal cleanup is not
done. "stepfor -parclean" cleans up all the
junk stepfor files in /tmp directories on all
ranch machines. This must be run on medusa. Do
not run if you have any other running parallel
jobs (parallel stepfor, parallel bayesavg, or any
parallel job using "launch" or "doscript") as
their files may be deleted too.
See also "doranch" for other ranch cleanup
procedures. Cleanup history is written to a file
named cleantmp.out.
-test othertest othertest is a user defined Tcl proc that judges
whether or not a covariate should be included.
The test model with the best covariate is loaded
at the time this procedure is called. This
procedure takes two mandatory arguments (whether
they are needed by the procedure or not).
loglike0 nullmodelname
loglike0 is the loglikelihood of the null model
which does not contain the current test covariate.
nullmodelname is the pathname to the null model
itself. The procedure may obtain the
loglikelihood of the current model with the
loglike command. The default procedure looks
like this:
proc stepfortest {loglike0 nullmodel} {
    set chisq [expr 2.0 * ([loglike] - $loglike0)]
    if {$chisq >= 0} {
        set pvalue [chi -number $chisq 1]
    } else {
        set pvalue 1
    }
    putsout stepfor.history "\n *** p = $pvalue"
    global SOLAR_stepfor_pvalue
    if {$pvalue <= $SOLAR_stepfor_pvalue} {
        return 1
    }
    return 0
}
Note that the default procedure does not use
the nullmodel argument, but it does use a
global variable that you will not have to use.
The global supports the -p argument. The
procedure may load the nullmodel without
restoring the current model; that is handled
by the stepfor procedure itself.
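A minimal stepfor sketch, assuming hypothetical snp covariates snp_a, snp_b, and snp_c in a loaded phenotypes file (all names illustrative):

```tcl
model new
trait q4
covariate age sex
stepfor -list "snp_a snp_b snp_c" -fix "age sex" -p 0.05
stepclean   ;# remove fully_typed covariate and unload fully_typed.out
```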
Purpose: Covariate screening by Step Up algorithm, useful for QTN analysis
Usage: stepup [-list listfile] [-list list] [-verbose]
[-fix listfile] [-fix fixlist]
[-cutoff cutoff] [-logn logn] [-finishlogn]
[-symmetric] [-cornerdf df] [-par]
[-parclean]
stepup is a fast version of bayesavg and may be used in
QTN analysis.
By default, stepup will test all covariates in the current
model one at a time, then add all the new covariate models
within the BIC cutoff to the window. Then the window models are
subjected to another round of testing against all covariates,
and the process repeats until no more models are added to the
window. Unlike bayesavg, this algorithm doesn't test all
possible models, just those that are derived from those in
the window. When completed, it writes files named stepup.win
and stepup.avg to the output directory containing posterior
probabilities for the window models and components.
To ensure that all models use the same sample, a new file named
fully_typed.out is created in the output directory which
defines a variable named "fully_typed" for each fully typed
individual. This file is added to the list of open phenotypes
files, and the variable "fully_typed" is added to the
model as a "null" covariate which has no effect on the model
other than restricting the sample to fully typed individuals.
This covariate is removed from the final best model stepup.best,
so you may get a different likelihood in subsequent maximization.
Up to dimension 3, all models with BIC better than the null model
are retained. (This feature may be controlled with the -cornerdf
option.) Also, the default "strict" rule is only applied to
remove apparently redundant higher dimensional models at the
very end after all important dimensions have been scanned.
-list listfile listfile is a file containing a list of all
covariates to be tested, one on each line.
The filename cannot contain spaces. These
covariates may or may not be in the model when
the command is given. If the -list option is
specified, all other covariates in the starting
model are automatically fixed.
-list list Alternatively, a Tcl list of covariates to
be tested can be specified. Tcl lists are
space delimited and enclosed in quotes or curly
braces.
-fix list list is a Tcl list of covariates to be
included in every model and not tested. Their
values will be estimated by maximum likelihood
for every model, unless you constrain them.
These covariates may or may not be in the model
when the command is given. For -fix, a list
could be simply one phenotype, and that
supersedes a file with the same name.
-fix listfile Alternatively, a file containing a list of all
covariates to be included in every model may
be specified. The filename cannot contain
spaces. The list of covariates to be fixed
will supersede the list of covariates to be
tested if the same covariate occurs on both
lists; however, a warning will be given.
-cutoff cutoff Set the final BIC cutoff. The default is 6.
-logn logn Use this fixed value for log(N) from the
beginning.
-finishlogn logn Recompute results of previous analysis with this
log(N) value. Sometimes stepup fails at the end
because the standard error of the SE parameter
of the best BIC model cannot be computed, and
that is needed to compute the final log(N).
This option allows you to finish such a run that
nearly completed previously. Be sure that
starting conditions (such as loaded pedigree,
phenotypes, model, outdir) and options are
exactly the same as before. The original startup
(stepup.orig.mod) and null models from the output
directory will be loaded. Note that the temporary
log(N) used by stepup by default is simply the
log of the sample size, and this is reported
to the stepup.history file. You may choose to
use that or some other estimate. A special file
required is stepup.winmods.prelim, which was
produced by the previous incomplete run of
stepup.
-verbose Show maximization output during maximizations.
-v Same as -verbose
-cornerdf df EXPERIMENTAL. This sets the last degree of
freedom that uses a loose test to include models
in the window. Models need only have a better
BIC than the null model up to and including
this df. The default is 3.
-symmetric Apply symmetric rule rather than strict. This
results in a larger window.
-par This option turns on Parallel
processing on the SFBR GCC Compute Ranch.
WARNING! Do not run more than one instance of
stepup -par from the same working directory.
Parallel stepup will use many (but not all!) ranch
machines, and access for other users and jobs may
be delayed due to gridware thrashing. The usual
output is not printed to the terminal to save time
but numerous parallel status messages are printed
to help the developers make this program better.
The parallel operation is automatic and the
parallel status messages may be ignored by most
users most of the time unless there is no output
for more than fifteen minutes. Note: If model
includes linkage element matrices loaded from
some mibddir, those matrices should be relocated
to the working directory, or specified with an
absolute pathname in the model file. This is
because in parallel operation the model is loaded
not in the current working directory but in a
subdirectory of /tmp.
-parclean Normally, parallel stepup cleans up after itself.
However, if it is necessary to force a shutdown
of a parallel stepup, normal cleanup is not
done. "stepup -parclean" cleans up all the
junk stepup files in /tmp directories on all
ranch machines. This must be run on medusa. Do
not run if you have any other running parallel
jobs (parallel stepup, parallel bayesavg, or any
parallel job using "launch" or "doscript") as
their files may be deleted too.
See also "doranch" for other ranch cleanup
procedures. Cleanup history is written to a file
named cleantmp.out.
Purpose: Case insensitive string match testing
Usage: string_imatch <string1> <string2>
Returns 1 for case insensitive match, 0 otherwise.
Note: Useful in SOLAR scripts.
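For example, a case-insensitive guard in a script (the variable name is illustrative):

```tcl
if {[string_imatch $fieldname "AGE"]} {
    puts "age field found (any capitalization)"
}
# string_imatch Age AGE returns 1; string_imatch Age BMI returns 0
```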
Purpose: String plot of entire genome scan
Usage: multipoint
stringplot [-pass pass] [-allpass] [-title] [-lod <lod>] [-lodmark]
[-color <name>] [-noconv] [-date] [-nomark]
[-font <X-font-spec>] [-titlefont <X-font-spec>]
[-dash <dash spec>] [-linestyle <dash spec>]
[-mibddir <mibddir>]
Notes: You can also use the command "plot -string" which has the
same options and works identically. For further information on
the options, see "help plot", where all the options are
described. Here are the more important ones. No options are
usually needed; they are mostly for fine-tuning the display.
-pass Multipoint oligogenic pass number, "1" is default
-allpass Plot all multipoint passes (in separate plots)
-title Title of plot
-lod <lod> Show LOD scale for lods this high (default is highest)
-lodmark Put marker ticks ON TOP of LOD curve (default is axis)
-color Takes standard names like "blue" and "red"
-noconv Do not mark convergence errors
-date Datestamp plot
-nomark Do not show marker ticks (very useful for GWAS)
-font X font for text (see xlsfonts | more)
-titlefont X font for title only
-dash Line style (see "help plot" for description of spec)
-linestyle Line style (same as -dash)
-mibddir specify mibddir (default is set with mibddir command)
-mapfile User mapfile
-layers Method of using multiple colors. See help plot.
mibddir and trait (or outdir) must have been specified previously.
String plot graph will be both displayed on screen and written
to file. If you are running on a remote system, you will
need to enable X window forwarding by setting DISPLAY variable
to point back to X display, and enabling acceptance of X
protocol with xhost + command, as described in section
3.8.3.1 of the SOLAR documentation. Sorry, there is no way
to write the file without displaying the plot; the
underlying "tk/wish" program does not allow that.
An encapsulated postscript file is written to the trait/outdir
with the name str.passN.ps where N is the pass number,
such as str.pass01.ps
If a copy of the string plot script, which is named
"stringplotk", is found in the current working directory, that
will be used in place of the standard version in the
SOLAR bin directory. You can customize stringplotk as you
wish. (It is a "wish" script, after all.) Good luck!
Shortcuts: stringp - stringplot
Purpose: Read data file in comma delimited or PEDSYS format
Usage: set tablenum [tablefile open <filename>]
tablefile $tablenum names ; return field names
tablefile $tablenum short_names ; return short names
tablefile $tablenum widths ; return field widths
tablefile $tablenum start_setup ; start user record
tablefile $tablenum setup <name> ; add field to user record
tablefile $tablenum get ; get user record
tablefile $tablenum rewind ; rewind file
tablefile $tablenum close ; close file
tablefile $tablenum test_name <name> ; test for named field
tablefile $tablenum get_position ; get current position
tablefile $tablenum set_position <pos> ; set position
Notes: Intended for use in scripts.
The get command will return data elements in a proper list.
This means that if a data element includes spaces, it will be
enclosed in braces. For best results, data records should be
read using lindex, which removes the braces.
On End of File, get will return an empty list. This should be
tested for. Other file errors will raise Tcl exceptions.
See Also: solarfile
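A sketch of a typical record-reading loop, assuming a comma-delimited file phen.csv with fields ID and AGE (file and field names are hypothetical):

```tcl
set tablenum [tablefile open phen.csv]
tablefile $tablenum start_setup
tablefile $tablenum setup ID
tablefile $tablenum setup AGE
while {1} {
    set record [tablefile $tablenum get]
    if {[llength $record] == 0} break   ;# empty list signals End of File
    puts "[lindex $record 0]: [lindex $record 1]"
}
tablefile $tablenum close
```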
Shortcuts: tabl - tablefile
Purpose: Create xmgr session with pipe connection to SOLAR
Note: This is a low-level plot interface used by other commands.
Most users will use the higher level interfaces such as
'plot' or 'multipoint -plot.'
Usage: tclgr open ;# Start xmgr session
tclgr send <xmgr command line> ;# send command and wait now
tclgr buffer <xmgr command line> ;# add xmgr command to buffer
tclgr flush ;# flush buffer of commands
tclgr close ;# end xmgr session
tclgr syscommand <syscommand> ;# Setup sys command for XMGR
;# 'xmgr' is the default
<XMGR command line> ;# defined in Ace/gr docs
The tclgr open command has a '-buffersize <number>' option. The default
buffersize is 1000.
If the user closes the XMGR session remotely, the 'tclgr close' command
must be used to officially close it before it can be re-opened.
Shortcuts: tclg - tclgr
Purpose: Set up t option for robust estimation of mean and variance
Usage: tdist set up t option
tdist -off turn off t option
Notes: tdist creates a parameter t_param and sets tdist option
tdist -off deletes t_param and unsets tdist option
Shortcuts: tdis - tdist
Purpose: Calculate seconds between two system time strings
Usage: timediff <start-time> <end-time>
See Also: startclock, stopclock
set starttime [exec date]
... procedure to be timed
set endtime [exec date]
return "seconds: [timediff $starttime $endtime]"
Purpose: Write previous commands to a script
Usage: toscript [-ov] <name> [<first>[-<last>]]*
-ov Force overwrite of previous script
<first> First command number to be included
<last> Last command number in sequence to be included
Example: toscript analysis 1 3 9-20 ;# include commands 1, 3, and 9-20
Notes: Command numbers are displayed with the tcl "history" command.
If no numbers are specified, all previous commands in this SOLAR
session will be included in script.
Script will be saved in file named <name>.tcl in the current
directory. After saving, newtcl will automatically be invoked
so that the script can be used immediately.
The script <name> defaults to being the first argument, but may
also be the last argument if the <name> is not a number or range
of numbers so there is no ambiguity. For example:
toscript 1-10 startscript ;# OK
toscript startscript 1-10 ;# OK
toscript 2 1-10 ;# OK script named 2.tcl
toscript 1-10 2 ;# OK script named 1-10.tcl
Purpose: Select the trait (dependent variable)
Usage: trait ; show current trait info
trait <trait1> ; selects one trait
trait [<traiti> ]+ ; multivariate (up to 20)
trait -noparm [<traiti> ]+ ; don't touch parameters at all
[define <defname> = <expression>]+ ; Define any expressions as
trait [<phenotype>|<defname> ]+ ; traits...see "help define"
Notes: Solar is case insensitive to variable names. Thus the command:
trait foo
will match a variable named FOO. Variables cannot be
distinguished on the basis of case in Solar. A phenotypes file
must be loaded before giving the trait command.
Starting with SOLAR version 4.x, arbitrary expressions including
one or more phenotypes may be defined with the "define" command
and then used as trait(s). See "help define" for more details.
If a model has already been created, it is recommended to give
the "model new" command to clear it out prior to giving the trait
command. It is only reasonable to skip the "model new" if the
new trait has similar parameter estimates to the previous trait.
For a change of one trait to another, SOLAR will attempt
to accommodate the change by adjusting parameter values (unless
the -noparm option is used) as described below. Any change of
trait(s) involving two traits is not permitted (if any
trait-specific parameters have been created); you will get an
error message and the trait will go into a special error state
("Must_give_command_model_new") which will require you to give
the "model new" command to clear before any model can be
maximized (however, the model can be examined and saved in this
state...you may wish to repair it offline in a text editor).
Under no circumstances will the trait command create new
parameters or delete old parameters. Normally the "polygenic"
command is given to create and test the standard variance
component parameters.
If changing from one trait to another, the Mean and SD parameters,
if present, will be reset to zero to force setting starting values
and boundaries during the next maximization. Covariate betas
and their boundaries will also be zeroed.
Examples:
trait bmi
trait q1 q2
define a = 10 * log(q4)
trait a q3
Shortcuts: trai - traits
Purpose: Perform "Twopoint" analysis on directory of ibd files
Usage: twopoint [-append] [-overwrite] [-grid] [-cparm {[<parameter>]*}]
-saveall
-overwrite (or -ov) Overwrite existing twopoint.out file.
-append (or -a) Append to existing twopoint.out file.
-cparm {} Custom parameters. Scanning will consist of
replacing one matrix with another matrix, everything
else is unchanged. The starting model MUST be
a "prototype" linkage model with the desired
parameters, omega, and constraints. Starting
points and boundaries for the parameters must be
explicitly specified. Following the -cparm tag,
there must be a list of parameters in curly braces
that you want printed out for each model. The
list can be empty as is indicated with a pair of
curly braces {}. There must be a model named null0
in the maximization output directory for LOD
computation purposes. The matrix to be replaced
must have name ibd or ibd1, ibd2, etc. The highest
such ibd will be replaced. If the matrix is loaded
with two "columns," such as "d7," each succeeding
matrix will be loaded with two columns also.
See section 9.4 for an example involving dominance.
-grid Enables the "grid" option, which estimates recombination
fractions in the range theta=0 to 0.45, finding the
optimal value to the nearest 0.01. (Note: this option is
not important for most twopoint users. It also
increases evaluation time considerably. Consider using
the separate "grid" command with only the markers of
greatest interest.)
-saveall Save all twopoint models in the maximization output
directory. The models are named "ibd.<marker>".
Notes:
The trait or outdir must be specified before running twopoint.
There must be a null0.mod model in the trait or outdir
directory. This can be created with the polygenic command
prior to running multipoint. (This model may include
household and covariate effects. See the help for the
polygenic command for more information.)
An output file named twopoint.out will be created in the trait
or outdir directory. If that file already exists, the user must
choose the -append or -overwrite option.
The best twopoint model is saved as two.mod in the trait or outdir
directory. It is also loaded in memory at the completion of the
twopoint command.
IBDDIR should be set with the ibddir command prior to running
twopoint.
If models have two traits, the 2df LOD scores will be
converted to 1df effective LOD scores, with the assumption
that parameter RhoQ1 is not intentionally constrained.
To override this, use the lodp command (see). This feature
was first included with beta version 2.0.1.
Shortcuts: twop - twopoint
Purpose: Upgrade model files and scripts
Usage: upgrade modelname
upgrade scriptname.tcl
Notes: If successful, the new file will replace the original one.
The old file is saved with ".old" tacked on to the end of
the name (e.g. amodel.mod.old).
If an error is reported, the original file remains unchanged.
If the file is a model, the ".mod" extension is assumed even if
not specified. Solar always tacks on ".mod" to the end of
model filenames.
If the file is a script, it must end with the ".tcl" extension,
and the extension must be specified in the command as shown.
Upgrade looks for this, and if found it assumes that a script
is being upgraded.
solartcl appears at the top of all upgraded script files. SOLAR
programmers are encouraged to use solartcl as well.
Shortcuts: upg - upgrade
Purpose: Print short "usage" message about a command
Usage: usage <command>
Example: usage multipoint ;# shows usage of multipoint command
Notes: Since this is printed directly to the terminal, it will stay visible
for the next command.
If the help message contains no "Usage" section, the first 15 lines
will be printed.
Shortcuts: usag - usage
Purpose: Define unix sort program name
(used for multipoint*.out files)
Usage: usort <program> ; use program
usort "" ; disables sort feature
usort ; show current program
Notes: The default is /usr/bin/sort, which should work on most systems.
It is necessary to include a path for users who have PEDSYS,
which has its own program named "sort." The program must be
compatible with unix sort and support the -n, -o, and -k arguments.
Example: usort /usr/local/bin/sort
Shortcuts: usor - usort
Purpose: Set the output verbosity.
Usage: verbosity default ; Set default verbosity
verbosity plus ; Set above average verbosity
verbosity min ; Set minimum verbosity
verbosity max ; Set maximum verbosity (debugging)
verbosity ; Display current verbosity
verbosity -number ; Show verbosity hex bit-coded number
verbosity <number> ; Set verbosity with bit-coded number
Notes: During analysis scripts such as polygenic and multipoint, the
default verbosity suppresses all the usual maximization output
(such as you would see with the 'maximize' command run by
itself).
The maximization output can be turned on for analysis scripts
using the 'plus' verbosity level. 'plus' is above default,
but below 'max.'
The bit coded numbers used for various verbosity levels are
subject to change. User scripts should use the name (such
as 'default') to be consistent with future releases.
There are now a few reports which are so verbose that they
are not even included in verbosity max. They may be specified
by using hex coded numbers (starting with "0x"). These are
subject to change in future releases.
0x4ffff Max verbosity with per-pedigree likelihoods for
each iteration.
Hex-coded verbosity numbers were not supported prior to version
2.0.2.
Shortcuts: verb - verbosity
Purpose: The old zscore command to zscore current trait
Old Usage: zscore [-off] [-q]
zs [-off] ;# Perform zscore quietly
-off Turn off zscore
-q Perform zscore quietly
Notes: The "Mean" and "SD" values used by zscore are computed only
once, at the time the zscore command is given. Thus they do
not reflect later changes to the phenotypes file, or to the
sample, which might be restricted due to individuals missing
covariates added later. Generally, for this reason the
zscore command should be given after the covariates command
and immediately before a model maximizing command such as
polygenic.
Starting with SOLAR Version 4.0.9, the trait mean and SD
are computed from the actual sample that would be included
in an analysis (at the time the zscore command is given).
As described in the notes below, you can adjust the Mean
and SD by using "option zmean1" and "option zsd1" to set
the values actually used. These values are applied to
the trait values during maximization.
If the trait is changed without giving the "model new"
command, the new trait will be zscored automatically.
This feature is obsolescent. In a future update, zscore
will be turned off when the trait is changed.
An alternative to zscore is to define the trait as the
inverse normal transformation of a variable. See
"help inormal" and "help define" for further details.
zscore will also calculate a number of statistics
for the trait: mean, minimum, maximum, standard deviation,
skewness, and kurtosis. These will be written to the file
zscore.out in the current output directory. As of version
4.0.9, these statistics are no longer written to the terminal.
Instead, a single line is displayed with the trait name,
mean, and SD. Even that line is not shown if zscore is
invoked from a script or the zs abbreviation of the command
is used.
To calculate these statistics for any phenotypic variable without
zscoring and without necessarily making it the trait, use the
"stats" command instead.
A trait must already have been selected with the trait command
or loaded model. Also the phenotypes file must have been loaded.
When a maximization is performed, trait values are replaced with
their zscored values. The formula is:
zscored = (value - Mean) / SD
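As an illustration of this formula (not SOLAR code; the trait values are hypothetical, and the sample standard deviation is used here):

```python
import statistics

def zscore(values):
    # Mean and SD are computed once from the whole sample,
    # then the same transformation is applied to every value.
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    return [(v - mean) / sd for v in values]

# Hypothetical trait values; the zscored values have mean 0 and SD 1.
z = zscore([4.0, 6.0, 8.0, 10.0])
```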
zscore is a model dependent option controlled by "option zscore".
It remains in effect until another model is loaded or the
"model new" command is given. When models maximized with zscore
are reloaded, zscore is again activated.
"option zscore" is set to 1 ("on") by this command, and the
related options zmean1 and zsd1 (mean and standard deviation
for the first trait) and zmean2 and zsd2 (mean and standard
deviation for the second trait) are set as required. You can
adjust these options directly to fine tune the mean and standard
deviation values used, but be sure that zscore is not set to 1
until the mean and (non-zero !) standard deviation values are
set for all traits in the model.
In a multivariate model, zscore will only be applied to the
first two traits.
Whenever zscore is activated or deactivated, parameters mean
and SD are reset to zero to force setting new boundaries and
starting point during the next maximization.
If a new phenotypes file is loaded, the zscore command should be
repeated to reflect the new file.
Purpose: Zscore current trait(s) or covariate(s)
Usage: define defname = zscore_phenotype
trait defname
OR
covariate defname
(defname is any user-defined name; phenotype is any phenotype name)
Notes: zscore_ is a prefix that may be used in the define command,
similar to the inormal_ prefix. Once a definition has been
created, it may be used in either the trait or covariate commands.
For further information, see "help define".
The Mean and SD are obtained from the current maximization sample,
not the entire phenotypes file.
In versions of SOLAR prior to 4.4.0, zscore was a command that
could only be used to zscore the current trait. That command
is still available as before, but is considered obsolescent,
as it was difficult to use and problematic. For information about
that command, and to understand previous uses, see "help old_zscore".
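For example, to zscore a trait this way (using the hypothetical phenotype q4 and covariates from the examples elsewhere in this document):
Example:
    define zq4 = zscore_q4
    trait zq4
    covariate age sex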
Purpose: Calculates p-values on an NVIDIA GPU for SNPs, given a loaded phenotype
file containing columns labeled "snp_" or a PLINK file that contains IDs
corresponding to the phenotype and pedigree IDs.
Warning: In this version of SOLAR, gpu_gwas is still experimental. The stability of
the command is the issue, not the correctness of its results.
Usage: gpu_gwas -header -np -out
Purpose: Calculates p-values for SNPs, given a loaded phenotype file containing
columns labeled "snp_" or a PLINK file that contains IDs corresponding to the
phenotype and pedigree IDs.
Warning: In this version of SOLAR, gwas is still experimental. The stability of
the command is the issue, not the correctness of its results.
Usage: gwas -np [optional: -plink ]
The GWAS function examines a genome-wide set of genetic variants and calculates p-values for
the association between the trait and individual SNPs. The output is saved as gwas.out in the
directory of the trait.
Covariates can be factored in by running sporadic_normalize on the trait beforehand. The number of
permutations determines the precision of the p-value. However, too many permutations can slow this command
down or limit the number of SNPs that can be loaded into memory.
Using a PLINK file to load the SNPs into the function is highly recommended for large data sets, since that
file format reads into memory significantly faster than csv files. This file type requires a .bed, .fam,
and .bim file to work.
Example:
trait AverageFA
covar age*sex
sporadic_normalize -out inorm_averageFA.csv
joinfiles averageFA_snps.csv inorm_averageFA.csv -o phenotype.csv
load phenotype phenotype.csv
trait AverageFA
gwas -np 5000
Purpose: This command creates a pedigree file from a phenotype file taken as input.
Usage: create_fake_pedigree <phenotype filename> [-o <output pedigree filename>]
<phenotype filename>  Phenotype file used to create the pedigree
-o                    Output pedigree filename
Purpose: Fast test and heritability approximation
Usage: fphi
Returns: indicator h2r standard error
Notes:
Trait and covariates must be specified beforehand. However, actual
maximization is not done, so no parameters will be created.
Example:
model new
trait q4
covar age
set results [fphi]
puts "H2r is [lindex $results 1]"
Purpose: Fast test and heritability approximation performed on the GPU
Usage: gpu_fphi [-header ] [-output