A1.1 about
Purpose: Copyright, authors, and disclaimers

SOLAR is Copyright (c) 1995-2014 Southwest Foundation for Biomedical Research. All rights reserved. The authors are John Blangero, Kenneth Lange, Laura Almasy, Harald Goring, Jeff Williams, Tom Dyer, Michael Boehnke, and Charles Peterson. Parts of SOLAR consist of software developed by others; a complete list is provided in Appendix Four of the documentation included with this package (use the "doc" command to find out where that is).

Use of this program should be acknowledged in scientific publications. Commands, features, performance, and availability are subject to change. There is absolutely no warranty, express or implied. There is no commitment to support scripts written using current commands in future releases.

Shortcuts: abou - about
A1.2 allcovar
Purpose: Set up all non-trait variables as covariates

Usage: allcovar

Notes: The phenotypes and trait commands must already have been given. If there is a variable named "age," it will be set up as "age^1,2#sex." If undesired variables are made into covariates, they should be removed with the covariate delete command.

allcovar will not include names mapped to any of the standard field variables (see 'help field'). Be sure to set up field mappings (if required) first to ensure you don't get extra covariates for the likes of ID, FAMID, etc.

allcovar will also not include any names added to the 'exclude' list. Use the 'exclude' command to add names to the exclude list, or to display the exclude list. By default, the exclude list includes some standard PEDSYS mnemonics.

Shortcuts: allc - allcovars
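For illustration, a minimal setup sketch (file, trait, and covariate names here are hypothetical):

    solar> phenotypes load pheno.csv
    solar> trait q1
    solar> allcovar
    solar> covariate delete weight    ;# remove an unwanted covariate afterwards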
A1.3 allsnp
Purpose: Include all snps as covariates in current model

Usage: allsnp

Notes: allsnp includes all the phenotypes prefixed with snp_ or hap_ as covariates in the current model. This is often the first step in a qtn analysis. Afterwards, you can remove some snps using the "covariate delete" command. It is OK if you have already selected other covariates, including some of the snps. Every covariate is only added once no matter what. allsnp looks at all currently loaded phenotype files.
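For illustration, a typical qtn-style sequence (file, trait, and snp names are hypothetical):

    solar> phenotypes load snp_pheno.csv
    solar> trait q1
    solar> allsnp
    solar> covariate delete snp_rare1    ;# optionally drop a snp first
    solar> bayesavg -qtn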
A1.4 alnorm
Purpose: Evaluate the tail of the normal curve

Usage: alnorm <x> [t | f]

    If "t", the curve is evaluated from X to infinity.
    If "f", the curve is evaluated from minus infinity to X.

Shortcuts: alnorm - alnorm
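For illustration, tail areas of the standard normal curve (the values noted in the comments are approximate):

    solar> alnorm 1.96 t    ;# area from 1.96 to infinity, approximately 0.025
    solar> alnorm 1.96 f    ;# area from minus infinity to 1.96, approximately 0.975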
A1.5 annotate_gwas
Purpose: Add annotated gene data to a gwas.out file generated by the gwas command or gpu_gwas command. The command only outputs snps with p-values less than or equal to a threshold. The default threshold is 0.05.

Usage: annotate_gwas -i <gwas data filename> -a <annotate data filename> <annotate field list filename> -o <output filename>
       Optional: -t <pvalue threshold>

    -i <gwas data filename>
        Contains the output from the gwas or gpu_gwas commands.
    -a <annotate data filename> <annotate field list filename>
        The first argument is the filename containing the snp annotation data. The second argument is the filename containing the list of field names separated by commas.
    -o <output filename>
        Annotated gwas data is written out to the filename specified with this argument.
    -t <pvalue threshold>
        This is the threshold that determines whether a snp within the input file will be written to output. Any snp with a p-value less than or equal to the threshold will be included.
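For illustration, a sketch of a typical invocation (all file names are hypothetical):

    solar> annotate_gwas -i gwas.out -a annot.csv fields.txt -o gwas_annotated.out -t 0.01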
A1.6 ascertainment
Purpose: Describe ascertainment correction using proband(s)

Ascertainment correction is available through use of the proband field in the phenotypes file. Ascertainment correction by conditioning on probands is automatically performed if there is a field named 'proband', 'probnd', or 'prband' (in upper or lower case) in the phenotypes file. Probands are those individuals through whom the pedigree has been ascertained. In a proband field, blank ( ) or zero (0) signifies non-proband (normal) status, and anything else signifies proband status. No decimal point is permitted after the zero.

If your proband field is named something else, the safest approach is to modify your data files accordingly. If that is not possible, you can use the SOLAR 'field' command to map your name. For example, if your proband field is actually named 'Affected', you would use the following command:

    solar> field probnd Affected

(Note that the name 'probnd' is used as a field type selector because that is the PEDSYS standard mnemonic.) For routine use, such a field command should be included in a .solar startup file or user script. Field mappings are not included in model files. For that reason, it may be safest to modify the data file if possible. Conversely, if your file has a probnd field but you wish it to be ignored, you can rename the probnd field or give a command like the following:

    solar> field probnd -none

Proband individuals are required to have all the quantitative variables required of other individuals to be included (as probands) in the analysis. Probands who are missing any quantitative variables are not included in the Proband Count and, except for defining the pedigree structure, do not enter into any calculations.

Unlike the program Fisher, SOLAR does not require probands to be at the beginning of pedigrees, and does not require you to provide a "proband count." Other than that, SOLAR uses the ascertainment correction algorithm built into Fisher. SOLAR prints a Proband Count in the summary statistics, which are written to maximization output files and to the terminal unless the verbosity level is set low.

Shortcuts: ascer - ascertainment
A1.7 automodel
Purpose: Default model setup

Usage: automodel <phenotypes> <trait>

    phenotypes is the name of the phenotype file
    trait is the name of the trait (all other variables will be used as covariates)

Notes:
1. Automodel will create a new model, with all non-trait and non-pedigree variables as covariates (see note 2).
2. The pedigree-related fields listed by the 'field' command will not be used as covariates (except for SEX, which will be). Certain other standard PEDSYS names are in the default exclude list. You can add additional items to the exclude list with the exclude command. See 'exclude' and 'allcovar' help.
3. Boundaries and starting points are set automatically by the maximize command.
4. You can pick and choose from the commands that automodel uses if you want to do things differently. Here is the body of automodel:

    model new                   ;# Start a new model
    phenotypes load filename    ;# load phenotypes file
    trait traitname             ;# assign trait variable
    allcovar                    ;# assign covariates
    polymod                     ;# set polygenic model type

Shortcuts: autom - automodel
A1.8 bayesavg
Purpose: Perform bayesian oligogenic model averaging on covariates or linkage components of the current model.

Usage: bayesavg [-cov[ariates]] [-ov[erwrite]] [-redo] [-max <max>]
                [-cutoff <cutoff>] [-symmetric] [-list <listfile>]
                [-fix [cov|param]] [-size_log_n] [-nose] [-old_log_n]
                [-sporadic] [-h2rf h2r_factor] [-saveall] [-qtn]
                [-stop] [-nostop]
       bayesavg -r[estart]    ;# (see also -redo)

SPECIAL NOTE: THE ALGORITHMS CHANGED IN VERSION 1.7.3. SEE NOTES 1-4. NUMERIC RESULTS MAY DIFFER FROM PREVIOUS VERSIONS.

-covariates (or -cov)  Perform bayesian model averaging on the covariates only. (The default is to perform bayesian model averaging on the linkage elements.)

-overwrite (or -ov)  Force overwrite of existing output files.

-max  Only include this number of components, or fewer, at one time. This reduces the number of models enormously, particularly for large N.

-list  The named file contains a list of the elements to use, with one line for each covariate or linkage parameter. Remaining covariates or linkage parameters in the starting model are automatically fixed. Covariates need not be present in the starting model, but linkage parameters (and their matrices, etc.) must be included in the starting model.

-fix (or -f)  Fix (lock in) this element. A fixed covariate (specified by covariate name, e.g. "age") or linkage element (specified by linkage parameter name, e.g. "h2q1") is carried through all models. (Note: a -fix or -f qualifier is required for each covariate to be fixed, for example: -f age -f sex.) When fixed elements are included, it is advisable to run "polygenic" on the starting model first.

-cutoff  (optional) Sets the BIC limit for Occam's window (default: 6).

-log_n  Specify the log(n) to use. Normally this is first estimated from the sample size of the unsaturated model, then recalculated from the standard deviation of the mean and its standard error in the model with the best BIC.

-symmetric (or -sym)  Use "symmetric" Occam's window. The default is a "strict" Occam's window, which excludes superset models with higher BIC; symmetric Occam's window includes ALL models within the BIC cutoff.

-stop  Stop when no models in the last group with the same size (degrees of freedom) have entered the window. (This is the default for -qtn.)

-nostop  Do not stop when no models in the last group with the same size have entered the window. (Useful for overriding the default for -qtn.) If -stop or -qtn is specified, however, the report of whether any models have entered the window is still given.

-restart (or -r)  Restart a previous bayesavg run that was terminated before completion. This begins with the model after the last one in the output file. Do not use -restart if the last run completed. When restarting, set the trait or outdir, then give the command "bayesavg -restart" with no other arguments. The original model and other arguments are automatically recalled. Previous command arguments are read from bayesavg.command and the previous starting model is c.orig or cov.orig. If you need to change anything, use the -redo option instead. You will also need to use the -redo option if the first pass through all models completed, or if the bayesavg was started under a previous version of SOLAR.

-redo  A special form of restart that allows you to change some options. Unlike -restart, -redo REQUIRES YOU TO SPECIFY ALL OPTIONS AND LOAD THE ORIGINAL STARTING MODEL. Only models not already found in the output file will be maximized. There are several cases where you must use -redo instead of -restart: (1) You need to re-maximize models which had convergence problems previously (edit them out of the bayesavg*.est file, change boundaries, then -redo). (2) The previous bayesavg run completed but you want to try a different window cutoff or type. (3) You deleted all files except the bayesavg.est file. (4) You need to restart from a previous version of SOLAR. Unlike -restart, with -redo you must set up the starting model and commands either as they were previously or with the desired changes. Since you must set up the original model EXACTLY, and specify options either EXACTLY as they were originally specified or with the desired changes, you are advised to use this option carefully. It is a good idea to make a backup copy of the outdir first.

-saveall  Force the saving of all models. Normally only the models within Occam's window are saved. (Note: models outside the window will not have standard errors.)

-size_log_n  Use the log(n) estimated from sample size as the final log(n). This bypasses the computation of log(n) from the S.E. of the SD parameter of the model with the best BIC.

-nose  Do not compute standard errors for any models (normally they are only computed for models in the window). Unless you specify a particular -log_n, the log(n) estimated from sample size will be used (as with -size_log_n).

-old_log_n  Calculate log(n) the old-fashioned way, using the saturated model for covariate analysis or the unsaturated model for linkage analysis. This option is provided for comparison with earlier releases, and may soon be removed.

-h2rf  (optional) Sets the upper bound of H2r (default: 1.1). See notes below. Use of this option is now unnecessary because of automated boundary control.

-sporadic  This option is deprecated. Force all models to sporadic. Valid only with -covariate. Now you can accomplish the same thing by making the starting model sporadic.

-qtn  Quantitative Trait Nucleotide Analysis: a "covariate" analysis is done with "-stop" in effect. Covariates with name snp_* or hap_* are automatically included but other covariates are excluded. A special "windowfile" named bayesavg_cov.win is also produced. The -stop default can be overridden with -nostop. To include all snps in the starting model, use the separate command "allsnp".

Output: In addition to the terminal display, the following files are created (<outname> is "bayesavg" for linkage analysis or "bayesavg_cov" for covariate analysis):

    <outname>.avg        Final averaged results
    <outname>.out        Final BIC and other info for each model (standard errors for models in window)
    <outname>.history    History of analysis messages
    <outname>.est        Estimated BIC for each model (pass 1)
    <outname>.nose       Final BIC but no standard errors (pass 2)

Models are saved with "c" prefix for linkage analysis and "cov" prefix for covariate analysis:

    <prefix>0.mod         Unsaturated model, with standard errors
    <prefix>1.mod         Model with element 1 (if saved)
    <prefix>12.mod        Model with elements 1 and 2 (if saved)
    <prefix>12_11.mod     Model with elements 1, 2, and 11
    <prefix>.orig.mod     Original user model when started
    <prefix>.start.mod    Base model (unsaturated) before maximization
    <prefix>.base.mod     Maximized base model

Notes:
1) bayesavg determines the number of variable (non-fixed) elements and sets N automatically. N and the number of models are reported near the beginning. A new algorithm is used to determine all the element combinations; this results in a more logical ordering in which the smallest models are evaluated first.
2) The first pass through all models is done with an approximate log(n) computed from the sample size. The resulting file is bayesavg.est (or bayesavg_cov.est). The final log(n) is then computed from the model with the best BIC, and all BICs are recalculated, with the resulting file being bayesavg.nose (or bayesavg_cov.nose). Then, standard errors for only the models within Occam's window are recalculated. The resulting final output file is bayesavg.out (or bayesavg_cov.out). The output summary averages are reported in bayesavg.avg (or bayesavg_cov.avg). This is a new algorithm designed to save time (by only calculating standard errors for the models in the window), be more robust, and give more accurate results. Results may differ somewhat from those in earlier versions (prior to 1.7.3) of SOLAR. Additional history of the analysis (the messages starting with "***") is saved in bayesavg.history (or bayesavg_cov.history).
3) To permit special models (with household effects, epistasis, etc.) to be handled, bayesavg no longer forces the starting model to be sporadic first. It merely maximizes the current model, with all non-fixed elements removed, but with no change(s) to the starting omega or constraints. If the starting model cannot be maximized, the user is advised to run "polygenic" first. Running "polygenic" first is probably a good idea in all -covariate cases, particularly if there are non-fixed elements.
4) Models are now "built up" from the unsaturated model rather than being "constrained down" from the saturated model. The unsaturated model itself is usually created by "constraining down" the starting model.
5) bayesavg may not support bivariate models.

Shortcuts: bayesa - bayesavg
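For illustration, a minimal covariate-averaging session sketch (file, trait, and covariate names are hypothetical):

    solar> phenotypes load pheno.csv
    solar> trait q1
    solar> covariate age sex age*sex
    solar> polygenic                 ;# advisable before -covariate runs
    solar> bayesavg -cov -f age      ;# average over covariates, carrying age through all models

Results would then appear in bayesavg_cov.avg and bayesavg_cov.out in the maximization output directory, as described above.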
A1.9 benice
Purpose: Lower the priority of SOLAR to allow more CPU for other jobs, or lower the priority of one SOLAR run relative to another

Usage: benice             ; Set "nice" level to 15
       benice <LEVEL>     ; LEVEL is between 1 and 20; 20 is "most nice"

Notes: This is intended for use on Unix systems which support the "renice" command, including Solaris 2.5 and above.

Once you have set a nice level, you cannot go back to a higher priority on this process. You must exit and restart.

The default Unix scheduling allows some time even for very "nice" jobs. However, they get somewhat less CPU than other jobs. On the SFBR Ranch, scheduling is absolute, so that "nice" jobs will be suspended until all other jobs are done (or waiting for a system resource such as disk access). Nice jobs have minimal (<1%) impact on other jobs, unless they hog huge gobs of memory.

Shortcuts: beni - benice
A1.10 blank
Purpose: Blank individuals according to variable data conditions

Usage: blank [-o] [-q] [-n] [<conditional expression>]

<conditional expression> can be any solar variable expression (as allowed by the define command for covariates) that evaluates to zero or non-zero. If it evaluates to non-zero for a person, that person is removed from the sample.

    [-q]  Go about blanking quietly.
    [-o]  Force overwrite of an existing definition having the same name (see below for an example of definition naming).
    [-n]  Make a new definition name if this would otherwise conflict with an existing definition name.

With no arguments, blank shows the blanking definitions currently in effect. To see all the definitions available, use the define command.

Examples:
    blank class!=1            ;# include only class=1 in sample
    blank age<<55 + sex==1    ;# blank all but old guys
    blank age>=55 * sex==2    ;# blank only old guys

Notes:
1. blank creates a definition and a null covariate to achieve the desired blanking. It shows you what it does, and then suggests how this blanking may be deleted:

    solar> blank age<<55 + sex==1
    define blank_age = blank * (0!= (age<<55 + sex==1))
    covariate blank_age()
    To delete: covariate delete blank_age()
    solar>

2. Blanking is cumulative through the effect of all blanking covariates that remain in effect. If you choose a condition which would create the same name as used by a previous condition (see example above), it will raise an error. You can force overwrite with -o.
3. To restrict the sample based on the availability of some variable, use a regular null covariate for that variable, as documented for the covariate command, for example:

    covariate age()

Null covariates (having trailing empty parentheses) are not included in likelihood estimation, but are used to delimit the available sample, just as blanking covariates are.
4. You may also create covariate definitions just like blank does. But be careful, because it is easy to do it wrong.
A1.11 boundary
Purpose: Change artificial boundary heuristics

Usage:
    boundary                          ; show settings
    boundary wide [start|off]         ; wide boundaries
    boundary null [on|off]            ; use boundaries from null model
    boundary start upper <term> [<term> ...]    ; Initial upper bounds
        <term> ::== <number> or <number>*h2r
    boundary float upper <number>     ; Later upper bounds
    boundary change <number>          ; Amount to change by
    boundary crunch <number>          ; Crunch bounds +/-
    boundary quadratic tol <number>   ; quadratic tolerance
    boundary max crunch <number>      ; Maximum crunches
    boundary h2r factor <number>      ; Bound h2r
    boundary e2 squeeze <number>      ; Bound e2
    boundary trace [off]              ; Trace upper bounds
    boundary hints                    ; More discussion
    boundary cov retries <number>     ; Max covar retries
    boundary cov incr <number>        ; On each retry, increase cov bound by this factor

Examples:
    boundary start upper .2 .1 .05
    boundary float upper .05

Notes: To function properly, the maximization algorithm used by SOLAR needs a little bit of help in the form of artificial boundaries. In general, any variance component can assume any value from 0.0-1.0, but in any particular case the range is more limited, and artificially limiting the range helps SOLAR maximize successfully. A set of heuristics and retry algorithms has been developed for SOLAR to set and adjust artificial boundaries. The heuristics should not normally require adjustment. If they do, please send a message to [email protected] so we can improve SOLAR.

You will know if you are having trouble with the boundary heuristics because you will get 'Boundary' or 'Convergence' errors. Beginning with SOLAR version 1.4.0, you will not get 'Boundary' errors for variance components because SOLAR will automatically increase the boundaries up to the theoretic limits (0.0-1.0) as required. If you get 'Convergence' errors, you should try setting some of the heuristics to lower values than they have already. In addition to these heuristics, there are now also built-in retry mechanisms which automatically increase bounds if they are found to be too small, or decrease bounds if they are too big (being too big can cause convergence problems). SOLAR will always discover if bounds are set too small to find the correct result and increase them, but it may not be able to deal automatically with bounds that need to be set very close to the correct result.

If you get Boundary errors for covariates, you can deal with them in one of two ways. For one, you can simply set the covariate upper and lower bounds in the starting model to reasonable values using the 'parameter' command and then re-run the analysis. Or, you can use the 'boundary cov retries' or 'boundary cov incr' commands to adjust the covariate boundary retry mechanism (which is separate from the mechanism for variance component boundaries). Covariate bounds do not have obvious theoretic limits, so it is impossible to automatically prevent all covariate boundary errors.

boundary wide on ...... set wide boundaries for variance components (N/A)
boundary wide start ... set wide boundaries at start of chromosome
boundary wide off ..... use standard boundary heuristics

"boundary wide on" causes the boundaries for future linkage models to be set to the full natural range (lower 0, upper 1). This has no effect on the model currently in memory, but will be applied to future linkage models created by the multipoint, twopoint, linkmod, and linkmod2p commands. This supersedes the standard variance component heuristics ("boundary start upper," "boundary float upper," "boundary h2r factor," and "boundary e2 squeeze") and also sets "boundary null off." THIS OPTION IS NOT YET AVAILABLE (use "boundary wide start" instead).

"boundary wide start" causes the boundaries for future linkage models to be set to the full natural range for the first QTL on each chromosome. After the first QTL, the standard heuristics are applied. (For twopoint linkage, this is the same as "boundary wide on".) Both "boundary wide on" and "boundary wide start" turn off "boundary null on," as the options are incompatible. "boundary wide off" restores the usual variance component boundary heuristics for future linkage models. "boundary wide off" does not necessarily restore the exact boundaries previously in use, and it does not restore "boundary null on" if that had previously been in effect.

boundary null on ...... set boundaries according to null model
boundary null off ..... back to standard boundary heuristics

"boundary null on" causes the boundaries for future linkage models to be taken from the current null model. In pass 1 of multipoint, for example, the boundaries would be taken from null0.mod, and in pass 2, they would be taken from null1.mod. In cases of persistent convergence failure, you can edit the boundaries in the null model and use "multipoint -restart" to attempt to resolve the jam. "boundary null on" turns off "boundary wide on" and "boundary wide start," if they had been operative, because the options are incompatible. In the case of h2q* parameters not defined in the null model (for example, h2q2 will not be defined in null1.mod, though it is required for all two-linkage models), the default is to use the boundaries for the previous h2q parameter. SOLAR always defines h2q1 in null0.mod. "boundary null off" restores the usual boundary heuristics for future linkage models. It does not restore "boundary wide start" or "boundary wide on" if those had been in effect previously.

boundary start upper ... set upper bound starting point for h2q's
boundary float upper ... set upper bound based on previous h2q value

These commands apply to the upper bounds of h2q* parameters (e.g. h2q1). The default values are deliberately chosen to be quite low because they are automatically raised as required by a retry mechanism. If the starting values were set too high, convergence errors could occur, and the mechanism for handling convergence errors is not as robust because it doesn't know which boundaries to adjust.

'boundary start upper' sets the starting value for the upper bound of each new h2q parameter at the beginning of each chromosome. This can be set as a single number (0.0 - 1.0) or as a term including 'h2r' (such as 0.8*h2r, which is the default). (h2r will be taken from the preceding null model if one is found. For example, if there is one linkage component, the null model is null0.out, which contains no linkage components. If there are two linkage components, the null model is null1.out, which contains one linkage component.) Multiple values can be specified, one for each multipoint scan. The last value specified applies to all remaining scans. The default value of 0.8*h2r means that the upper bound for each new linkage component is set allowing for 80% of the current residual heritability to be accounted for by the first locus.

'boundary float upper' sets the value for the upper bound of the newest h2q parameter after the beginning of each chromosome. The upper bound floats above each previously maximized h2q value by this amount, which defaults to 0.1.

boundary change

'boundary change' sets the value by which a bound changes after a boundary condition is detected. Upper bounds will be raised and lower bounds will be lowered by this amount. The default value is 0.1.

boundary crunch

'boundary crunch' sets the boundaries around each variance component if a convergence error occurs and then invokes a retry. The default value is 0.1. For example, if the previous value for h2r was 0.3, the new boundaries will be set at 0.2 and 0.4. Boundary crunch is only applied after convergence errors, after which the boundaries can expand again through the retry mechanism.

boundary max crunch

'boundary max crunch' sets the limit on the number of crunch attempts for each locus. Any given crunch may be followed by a series of boundary expansions, so multiple crunches may be required. The default is 10, to give a reasonable chance of success (if success is going to be achievable). Two crunches in a row are never permitted (that would be meaningless).

boundary quadratic tol

The normalized quadratic (for quantitative traits only) is normally required to be between 0.999 and 1.001. For some problems, this is unrealistic. To change the tolerance to +/- 0.01 (0.99-1.01), you would give the command:

    boundary quadratic tol 0.01

The allowed range is 0 - 1.

boundary h2r factor

'boundary h2r factor' sets an upper bound for h2r based on the h2r value in the null model. The default value of 1.1 means that h2r is allowed to grow to 1.1x the size it had in the null model. So far as I know, this has never needed adjustment. In any case, if it is too small, the automatic retry system will handle it.

boundary e2 squeeze

'boundary e2 squeeze' sets boundaries for e2 based on the previous e2 value. The default value of 0.1 means that e2 is allowed to deviate +/- 0.1 from the preceding value.

boundary trace [off]

'boundary trace' enables a trace of the upper bound applied to the newest h2q for each locus, and shows all retries and perturbations. This feature may be shut off with 'boundary trace off'.

boundary cov retries <integer>

'boundary cov retries' sets the maximum number of retries during which the covariate boundaries are increased. The default is 10.

boundary cov incr <number>

'boundary cov incr' sets the factor controlling the amount by which a covariate boundary is increased during a retry. The default is 5, which results in at least a five-fold increase on each retry. (The actual increase depends on the difference between both boundaries, and so will be larger than 5 in the beginning. This is subject to change.)

Shortcuts: bou - boundary
A1.12 boundary-notes
Purpose: Discuss boundary error resolution strategies

This is an extension of the help provided for the 'boundary' command, which you should read first.

When convergence errors occur during a multipoint scan, scanning will terminate at the end of the scan regardless of whether some LOD scores reached criterion levels or not, and a message like the following will be displayed on the terminal (and printed to the multipoint.out file):

    *** Exiting because convergence errors occurred in last pass

Also, to the terminal and the applicable output file for the scan, an error code will be appended to the end of each line on which an error occurred, for example:

    chrom 18 loc 0 0.0000 -2203.917 0.022568 0.268372 ConvrgErr

The code "ConvrgErr" indicates that a Convergence Error occurred such that it was impossible to find a good maximum likelihood estimation.

Beginning with version 1.4.0, SOLAR now uses a retry mechanism so that boundary errors (related to variance components) will not occur. Boundaries will be increased incrementally until their theoretic limits are reached. It is still possible that convergence errors might occur, and those may be controlled with the boundary command. Boundary errors related to covariates are also handled with a retry mechanism controlled by the 'boundary cov retries' command and the 'boundary cov incr' command. The default values should work in almost every case, but it is not possible to say they will always work because with covariates there are no theoretic limits.

If convergence errors occur, you should use the 'boundary' command to lower the applicable artificial boundary setting heuristic. For example, if the error is at the beginning of a chromosome, you should use the 'boundary start upper' command to set a lower value than the default. All you need to do when these errors occur during a multipoint scan is to "restart" the scan after resetting the applicable heuristic. The restart will detect those models for which an error occurred, and redo them with the new heuristics. For example:

    solar> boundary start upper 0.1 0.05
    solar> boundary float upper 0.05
    solar> boundary change 0.05
    solar> multipoint 3 -restart

(In earlier releases, you had to edit out the models for which errors occurred in the multipoint1.out file and then restart. Now SOLAR recognizes models for which errors occurred and will redo them by default.)
A1.13 build_grm
Purpose: Creates an empirical pedigree matrix from a plink data set

Usage: pedifromsnps -i <input base name of plink data> -o <output csv file name> --freq <file made with plink_freq>
       [optional: -corr <alpha value> -per-chromo -king -method_two -normalize -batch_size <batch size value> -id_list <file w/ subject IDs> -n_threads <number of CPU threads>]

    -i  The base file name of the plink .bed, .bim, and .fam files.
    -o  The base file name for the output.
    -freq  Name of the output file from the plink_freq command.
    -n_threads  Number of CPU threads used for matrix calculation. Default: Automatically set based on hardware
    -per-chromo  Outputs a separate matrix for each chromosome. Default: Disabled
    -corr <alpha value>  Compute the method one correlation GRM using this alpha value. Default: -1
    -method_two  Computes the correlation GRM using a second method described below. Default: Disabled
    -king  Computes the Robust King GRM instead of using a correlation method. Default: Disabled
    -batch_size <batch size value>  Number of loci computed at a single time per CPU thread. Default: 500
    -id_list <file w/ subject IDs>  The specified file contains a list of subject IDs separated by spaces. The resulting GRM will only use these IDs and exclude all others. Default: All IDs are used
    -normalize  When used during the creation of a correlation GRM, the final values are normalized using the square roots of the diagonal values, so that diagonal elements are 1 and off-diagonal elements are bounded by 1 and -1:

        Z*_i_j = Z_i_j/sqrt(Z_i_i*Z_j_j)

        where Z* is the final value and Z is the unnormalized value; i refers to the index of subject i and j refers to the index of subject j.

The default method for GRM creation is correlation method one with default alpha = -1.

Correlation Method One:

    Z_i_j = 1/m * sum_k=0 to k=m (SNP_i_k - 2*f_k)*(SNP_j_k - 2*f_k)*(2*f_k*(1-f_k))**alpha

    Z = GRM value
    i = subject row index
    j = subject column index
    k = locus index
    SNP_i_k = SNP value for subject i and locus k (must be 0, 1, or 2)
    f_k = major allele frequency of locus k
    m = total number of loci, not including loci that are missing values for subjects i or j
    alpha = exponent for variance, best set to the default value of -1

Correlation Method Two:

    Z_i_j = 1/variance_sum * sum_k=0 to k=m (SNP_i_k - 2*f_k)*(SNP_j_k - 2*f_k)

    Note: Variables are the same as above except for variance_sum:
        variance_sum = sum_k=0 to k=m 2*f_k*(1-f_k)
    Note: Locus index k is excluded if there is a missing SNP value for subject i or j. Therefore the variance_sum calculated for one element of the GRM may differ from another element.

Robust King Method:

    Z_i_j = 1 - (sum_k=0 to k=m (SNP_i_k - SNP_j_k)**2)/(2*min(N_i,N_j))

    N_i = number of heterozygotes for subject i
    N_j = number of heterozygotes for subject j
    Note: Heterozygotes for subject i or j are not included if the other subject is missing a value at that locus.

For information on the robust king method see Manichaikul A, Mychaleckyj JC, Rich SS, Daly K, Sale M, Chen WM (2010) Robust relationship inference in genome-wide association studies. Bioinformatics 26(22):2867-2873.
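For illustration, a sketch of a typical invocation using the usage form shown above (file names are hypothetical; the frequency file would be produced beforehand with the plink_freq command):

    solar> pedifromsnps -i mystudy -o mystudy_grm.csv --freq mystudy.freq -king -n_threads 8

This would read mystudy.bed, mystudy.bim, and mystudy.fam and write a Robust King GRM to the output file.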
A1.14 catenate
Purpose: Concatenate strings

Usage: catenate [<string>]*

Example:
    set modelname [catenate $basename 0 .mod]
A1.15 change-notes
Purpose: Display recent changes

Usage: change-notes

Shortcuts: chang - change-notes
A1.16 chi
Purpose: Compute probability for a chi-square value

Usage: chi <value> <degrees>
       chi -number <value> <degrees>     ; return only the number
       chi -inverse <pvalue> <degrees>

Notes: Without the -number argument, the result is returned as a string like this:

    p = 0.0012345

(The sign will be "<" if below available accuracy.)

With the -inverse argument, the chi-square value corresponding to a given p-value is returned. The -number argument does not apply when the inverse is computed and should not be given.

chi will raise an error for certain out-of-bound conditions. You may use a catch {} block to prevent this from ending scripts:

    set test [catch {set p [chi $val $deg]}]
    if {$test != 0} {set p 1.0}

Shortcuts: chi - chi
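For illustration (the numeric values shown are approximate, and the exact number of digits printed may differ):

    solar> chi 3.84 1
    p = 0.0500
    solar> chi -inverse 0.05 1    ;# returns approximately 3.84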
A1.17 chinc
Purpose: Compute probability for a noncentral chi-square value

Usage: chinc <value> <df> <lambda>
A1.18 chromosome
Purpose: Select chromosome(s) for multipoint scan

Usage:
    chromosome [<number>|<name>|<low>-<high>|all|*]+    ; select chromosome(s)
    chromosome          ; show currently selected chromosomes
    chromosome show     ; show all available chromosomes
    chromosome showm    ; show mibd's in pass (see note 2)

Examples:
    chromosome 10
    chromosome 10-13 15-17 20
    chromosome 11 11p
    chromosome all    ; select all available chromosomes
    chromosome *      ; select all available chromosomes

Notes: (1) Use in conjunction with the mibddir, interval, and multipoint commands. (2) The showm option lists the mibds that will be selected by the current "chromosome" and "interval" commands. (3) Alphanumeric chromosomes may not be in <low>-<high> ranges, but may be selected individually (for example, 11p), or with "all" or *. (4) The chromosome specification is not saved from one solar session to the next unless put in a .solar file. (5) For convenience, you may specify a chromosome or range of chromosomes whose mibds are not actually present, and the gap is ignored silently, as long as there are some mibds available for other specified chromosomes. The chromosome command acts as a filter applied to the mibd data actually available.

Shortcuts: chro - chromosomes
A1.19 clod
solar::lod --

Purpose: Calculate LOD score

Usage: lod [<test-loglike> <null-loglike>] [<options>]
       options := [-auto|-off|-raw] [-traits <N>] [-rhoq <N>] [-v]
                  [-1t|-2t|-3t|-4t|-t1|-t2|-t3|-t4] [-nolodadj]

If no likelihoods are specified, the likelihoods of the "current" model and the applicable "null" model are used.

    -auto        Convert multivariate LOD to 1df effective LODs based on the number of traits in the current model and the constraint of relevant rhoq's (default)
    -off         Do not convert LODs to 1df effective
    -raw         Do not perform LOD conversion or lodadj
    -traits <N>  Convert multivariate LOD to 1df assuming <N> traits
    -1t or -t1   Assume 1 trait (same as "-traits 1")
    -2t or -t2   Assume 2 traits (same as "-traits 2")
    -3t or -t3   Assume 3 traits (same as "-traits 3")
    -4t or -t4   Assume 4 traits (same as "-traits 4")
    -rhoq <N>    Convert multivariate LOD to 1df assuming <N> constraints of relevant rhoq's
    -nolodadj    Do not perform lod adjustment (lodadj)
    -v           verbose: Show adjustment and conversion steps

Examples:
    outdir test1
    load model test1/null1
    lod
    lod -v
    lod -2000.51 -2030.87
    lod -traits 3 -rhoq 1 -v -2000 -2030
    lod -raw -2000 -2030

Notes: If no likelihoods are specified, the current model must have been maximized through a command such as "maximize," "twopoint", or "multipoint", and the applicable null model should be saved as nullX.mod (e.g. null0.mod, null1.mod) where X is the number of active linkage elements, which is assumed to be one less linkage element than in the current model. Linkage elements are parameters named h2q1, h2q2, etc. The null model must have been saved in the maximization output directory, either named after the trait or set by the outdir command.

By default, SOLAR provides easily interpreted "1 df effective" LODs which are equivalent to those in univariate models. However, you can also have complete control over the LOD conversion performed, either using arguments here or preferences set globally with the lodp command. Options specified here override the defaults and lodp preferences.

The correction of 2-trait LODs to 1df effective LODs is based on this formula: the LOD is converted to chi square with 1/2 1df, 1/4 3df, and 1/4 point mass at zero. If rhoq is constrained, the formula is 1/2 1df, 1/4 2df, and 1/4 point mass at zero. This is then converted to a 1/2 1df chi square of equivalent p-value, which is divided by 2ln10 to get the 1df effective lod score. The correction of 3-trait LODs to 1df effective LODs is based on the formula: the LOD is converted to chi square with 3/8 1df, 3/8 3df, 1/8 6df, and 1/8 point mass at zero. For each rhoq constrained, the 6df is changed downward by 1df. The conversion of higher multivariate LODs follows a similar expanding sum. If you wish to see the weights used, use the lod command with the -v option.

Empirical LOD adjustment, if any, is automatically applied (see the lodadj command) unless the -raw option is used. Unless you specify -raw, SOLAR will need to search the output directory for a lodadj.info file, which means that a trait or outdir must have been selected. Empirical LOD adjustment is not yet supported for bivariate models. The lodadj value is ignored when bivariate LODs are computed, and, in the cases where the lodadj value would be shown (such as in the multipoint.out file, or if lod is called from the command prompt), a warning message is shown instead.

In SOLAR version 3.0.2, the "clod" and "lod" commands were combined into a new "lod" command. The options allowed have changed compared with the earlier "clod"; the original "lod" command did not allow any arguments. Use the "lodn" command if the current model may not use the "h2q1" linkage parameter and you are not specifying loglikelihoods explicitly.

See also lodn, lodp, lodadj.

Shortcuts: clod - clod
A1.20 combinations
Purpose: Make a list or count combinations of integers 1..N of size K

Usage: combinations <N> [<K>] [-max <maxsize>] [-list list] [-force] [-count] [-counts] [-start <number>] [-stop <number>]

    N  defines the range of integers 1..N of interest. If no other arguments are specified, and N <= 10, the set of all combinations of this range of integers is returned. To get a list of combinations where possibly N > 10, add either the -list or -force option, with -list being the preferred method.

    K  only include combinations of exactly this size (as in traditional "combinations"). If this argument is not specified, the default is to include combinations of all sizes, starting from the smallest size.

    -count   Only return the NUMBER of combinations, not a list of the actual combinations.

    -counts  Return a list containing the number of combinations for each "size" (i.e. "K").

    -max     Include all combinations up to and including this size (the default is to include combinations of all sizes). The K and -max arguments may not be used at the same time.

    -list    APPEND combinations to this list rather than returning them. Specify the list variable by name, as with the Tcl lappend command (see example below). If the variable is not already set, a new variable is created. When this argument is used, nothing is returned. For example:

        set comblist {}
        combinations 20 -max 10 -list comblist

    Be sure to empty the list first (as shown above) if you do not want to append to the previous contents, if the variable was used previously in the same procedure. This option may save memory (as compared with -force) for very large N, since only one copy of the list is ever created.

    -force   Return the list ("by value") even if N > 10. This is required for N > 10 unless the -list, -count, -counts, -start, or -stop arguments are given. Only use this option if you are sure this is what you want to do. Read all the following paragraphs to be sure. Generally, you would only use it inside a script, where the returned combinations are going to be immediately saved to a variable, such as:

        catch {set comblist [combinations $N -force]}

    The reason to require a -force option is that if a large N is given in an interactive session, the terminal window could be locked up for hours displaying all the combinations, with no way to break out until the terminal output buffer is empty. If that were to happen, you would probably want to kill the whole terminal session from another terminal window. For some users, that would probably require calling the system administrator.

    The -force option may require more memory than the -list option because a copy of the list is created in the process of "returning" it to the caller; that's just the way Tcl works, and it becomes important when creating lists with huge numbers of elements. If you are using this form of the command in a script, be careful that it is not the last command in the script, which Tcl automatically returns. Then, if the user runs the script from the terminal, the terminal window would be locked up. If you must use it as the last command in a script, you should use a "catch" command around it, as in the example above. The catch command only returns 0 (for success) or 1 (for error).

    The following options are useful when dividing up the set of combinations into jobs of an equal size. Otherwise, they may seem a bit obscure.

    -start   Start with combination number <number>
    -stop    Stop with combination number <number>

Notes: CAUTION! The list can get VERY BIG! Be careful if n > 15 because memory requirements double for each [incr n], unless you are setting k or -max. ("BIG" means 100's of megabytes, gigabytes, etc. I am not kidding. On Solaris systems, you can use the SOLAR "memory" command to see just how much memory SOLAR has consumed.)

Shortcuts: comb - combinations
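For illustration, a small interactive sketch (the exact ordering and formatting of the returned Tcl list is illustrative, not guaranteed):

    solar> combinations 3 2
    {1 2} {1 3} {2 3}
    solar> combinations 4 2 -count
    6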
A1.21 constraint
Purpose: Create, list, or delete constraints

Usage:
    constraint <term> [+ <term>...] = <number> | <term>
        <term> is [<factor>*]<parameter>
    constraint                             ; display all constraints
    constraint delete <number>             ; delete constraint by number
    constraint delete <spec>|<left-spec>   ; delete specified constraint
    constraint delete_all                  ; delete all constraints

Examples:
    constraint e2 + h2r = 1
    constraint bq1 = bq2
    constraint delete bq1
    constraint delete h2r + e2 = 1
    constraint H2 + 3*H2q1 - 2*H2q2 = 5*E2    ; anything is possible

Notes:
(1) The constraint numbers are shown when listing constraints.
(2) If a new constraint matches the left hand "body" of an existing constraint, that existing constraint is replaced by the new constraint (the old one would be invalid anyway).

    solar> constraint sd = 1
    solar> constraint
    [1] sd = 1
    solar> constraint sd = 0
    solar> constraint
    [1] sd = 0

(3) For the "constraint delete <left-spec>" command, if there is a constraint matching the entire left specification of a constraint, it is deleted. Or, you can specify the entire specification to be deleted. (The "constraint delete <parameter>" version of the constraint command was ambiguous and is now obsolete.)

    [1] e2 + h2r + h2q1 = 1
    [2] h2q1 = 0
    constraint delete h2q1                ;# deletes constraint [2]
    constraint delete h2q1 = 0            ;# deletes constraint [2]
    constraint delete e2 + h2r + h2q1     ;# deletes constraint [1]

(4) Instead of constraining covariate beta values to 0, use the "covariate suspend" command, as this permits greater efficiency.
(5) If you need to constrain interaction covariates (e.g. age*sex), or parameters whose name begins with a number, or parameters whose name includes other special characters, enclose the parameter name in angle brackets <>. When deleting the constraint, angle brackets are optional around the parameter name. Do not include numeric factors in the delete command.

    constraint 3*<bage*sex> = 1
    constraint delete bage*sex

(6) Constraints may only be simple linear equations of terms which include an optional numeric factor and a parameter name. Exponents and functions are not supported. If you need to constrain a power of some model feature, consider making the parameter itself contain the required power; then it can be linearly constrained.
(7) Numeric constants (such as 1 or 0) should only appear as the right hand term.

Shortcuts: cons - constraints
A1.22 copybin
Purpose: Install a new executable file without disturbing current users

Usage: copybin <filename> <directory>

Note: The way this works is quite simple. The original version of the file is not overwritten or deleted, but instead renamed. Running processes continue to access the original version through the inode, regardless of the name change, while new processes will access the new version.

The renaming scheme simply appends a dot followed by a number to the filename. The first available number starting from 1 is used. For example, the old "solarmain" becomes "solarmain.1", or "solarmain.2" if a "solarmain.1" already exists, etc. At some point you might want to clear out some of the older versions, but that is up to you, and it would lead to numbering that is not sequential, since copybin always takes the first available number.

This is similar in design to the Unix "install -f" command. It lacks some of install's checking features, but in one way is much more capable: it allows any number of new versions to be installed without disturbing users of the first or any other previous version. The Unix install command only has one level of backup, since it merely prepends "OLD" to the original name. If you do two installs in a row over a timespan in which jobs are continuing to run (as, unfortunately, is often required), copies of the original version are lost and users are likely to get a memory mapping error of some kind.

This seems to work across NFS-mounted filesystems, but it might not work for you, so be wary. In ancient Unix days this command might not have been necessary, but now that memory mapping is used to load image files, it is.
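For illustration (the directory name is hypothetical):

    solar> copybin solarmain /usr/local/solar/bin

If /usr/local/solar/bin/solarmain already exists, it is renamed to solarmain.1 (or the next available number) and the new file is installed under the original name, so running jobs keep their original image.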
A1.23 countfields
Purpose: Determine consistency of the number of columns in a comma-delimited file

Usage: countfields <filename>

An information report is returned like this:

    longest: 8 (#1) x 1497
    shortest: 8 (#1) x 1497

This means that the longest record had 8 fields, the first such record was #1, and it was followed by 1497 others of the same length in the file. As it happens, the shortest record also had 8 fields, it was #1, and followed by 1497 of the same length in the file.
A1.24 covariate
Purpose: Set up covariates (independent variables). It can handle interactions and polynomial terms. For other non-polynomial models, use the 'mu' command.

Usage:
    covariate <variable>[^n | ^1,2[,3...]][*<variable> | #<variable> [([trait])]]*
                                   ; Creates a new covariate. See below for examples.
    covariate                      ; display all covariate info
    covariate delete <string>      ; deletes covariate and beta(s)
    covariate delete_all           ; deletes all covariates and beta(s)
    covariate <variable>()         ; Null Covariate: require var in sample without covariation

    ; Covariate Suspension (for temporary hypothesis testing):
    covariate suspend <string>     ; temporarily disable covariate
    covariate restore <string>     ; re-activate suspended covariate

Examples:
    covariate age                  ; Simple covariate Age
    covariate age*sex              ; Age by Sex interaction (only)
    covariate age*diabet*diameds   ; 3-way interaction
    covariate age^2                ; Age squared as a simple covariate
    covariate age^1,2              ; Shorthand for: age age^2
    covariate age#diabet           ; Shorthand for the following:
                                   ;   covariate age diabet age*diabet
    covariate age^1,2,3#sex        ; Shorthand for all the following:
                                   ;   covariate sex age age*sex age^2 age^2*sex age^3 age^3*sex

    covariate sex age(q1) age*sex(q3)
                                   ; Trait-specific Covariates:
                                   ;   covariate sex applied to all traits
                                   ;   covariate age applied to trait q1
                                   ;   covariate age*sex applied to q3
                                   ; In a multivariate analysis, trait-specific covariates are only
                                   ; required for the sample of their trait. See note (7) below.

    covariate q2()                 ; Null-Covariate: require q2 in sample of all traits

Notes:
(1) More than one covariate may be specified separated by spaces. Also, more than one covariate command may be used. Adding a new covariate does not remove previous ones. Spaces are not allowed within the specification of each covariate term.
(2) Pound (#) and comma (,) are shorthands allowed ONLY if there are no more than two variables. Further, only the first variable may have multiple exponents separated by commas. The following are INVALID:

    covariate age^1,2*diabet*diamed    ;# INVALID
    covariate age#diabet*diamed        ;# INVALID
    covariate weight*height^1,2        ;# INVALID

Instead, specify multiple covariates as required:

    covariate age*diabet*diamed age^2*diabet*diamed
    covariate age diabet diamed age*diabet age*diamed
    covariate diabet*diamed age*diabet*diamed
    covariate height^1,2*weight

(3) N-way interactions are possible to any N.
(4) Covariate commands create beta parameters automatically. Beta names begin with 'b' followed by the variables and exponents as in the non-shorthand form (e.g. bage^2*sex).
(5) Quantitative variables are mean-adjusted. Binary variables are adjusted so that the lowest value is 0 and the highest value is 1.
(6) Suspended covariate variables are still required in the sample.
(7) In a bivariate analysis, "unqualified" covariates are applied to and required by all traits, and trait-specific covariates (with parenthetically named trait) apply only to the named trait and are only required for that trait. (This was changed in SOLAR version 4.0.3.) In a univariate analysis, ALL covariates are currently required regardless of whether they apply to the trait. (The requirement of covariates specific to a different trait may be removed in a future update.) Null covariates (such as ef() ) are not applied to any trait, but are required by all traits. Variables not declared as covariates, but used in the mu equation, are (at this time) required by all traits.
(8) When a trait is changed, covariate beta parameters are reset to force re-evaluation of boundaries on the next maximization. But trait changes are not permitted for bivariate models; "model new" is required. "model new" will remove all covariates.
(9) When created, beta parameters have starting value 0.0 and no boundaries. Likely boundaries are set automatically during the first maximization, and expanded if necessary up to an internally determined maximum (and you may further expand them if need be, but this is highly unlikely). If desired, you may set boundaries after beta values have been created, and this will bypass automatic boundary setting (but not automatic boundary expansion).

Shortcuts: cov - covariates
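For illustration, a hedged sketch of temporary hypothesis testing with suspend/restore (trait and covariate names are hypothetical; bookkeeping of the loglikelihoods is left to the user):

    solar> covariate suspend age
    solar> maximize             ;# loglikelihood without the age effect
    solar> covariate restore age
    solar> maximize             ;# loglikelihood with the age effect

Comparing the two loglikelihoods gives a likelihood ratio test of the suspended covariate; note (6) above guarantees the sample stays the same in both maximizations.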
A1.25 create_evd_data
Purpose: create_evd_data performs an EVD on the loaded pedigree for the gwas or gpu_gwas commands. This is useful for a data set with a large number of subjects.

Usage: create_evd_data --o <output base filename> --plink <plink set base filename> --use_covs

Prior to running the command, select the trait that you plan to run gwas, gpu_gwas, or gpu_fphi on with the trait command. The --plink option specifies a plink data set that will determine which ids will be used in running the EVD. The --use_covs option will include the ID set of the covariates specified through the covariate command. This command now allows you to enter two traits at once in order to get their corresponding ID set.

Output consists of four files labeled:

    <output base filename>.ids           -- list of subject ids
    <output base filename>.eigenvalues   -- list of eigenvalues
    <output base filename>.eigenvectors  -- list of eigenvectors
    <output base filename>.notes         -- notes on the creation of the EVD data set
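For illustration, a sketch of a typical run (trait, covariate, and base names are hypothetical):

    solar> trait q1
    solar> covariate age sex
    solar> create_evd_data --o q1_evd --plink mystudy --use_covs

This would write q1_evd.ids, q1_evd.eigenvalues, q1_evd.eigenvectors, and q1_evd.notes.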
A1.26 create_fake_pedigree
Purpose: This command creates a pedigree file given a phenotype file taken as input.

Usage: create_fake_pedigree <phenotype filename> [-o <output pedigree filename>]

    <phenotype filename>              Phenotype filename to be used to create the pedigree
    [-o <output pedigree filename>]   Option to name the output pedigree filename
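For illustration (file names hypothetical):

    solar> create_fake_pedigree pheno.csv -o fake_ped.csv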
A1.27 d2e
Purpose: Convert Fortran D style exponents to E form

Usage: d2e <inputfilename> <outputfilename>
       d2e2 <inputfilename> <outputfilename>

d2e2 starts with line 2, so as not to disturb D's in the header line of comma-delimited files. d2e is more suitable for pedsys files.

SOLAR now understands Fortran D style in phenotypes files in most circumstances anyway, so this conversion is not generally needed. You will know you need this if you see error messages.
A1.28 d2e2
solar::d2e --

Purpose: Convert Fortran D style exponents to E form

Usage: d2e <inputfilename> <outputfilename>
       d2e2 <inputfilename> <outputfilename>

d2e2 starts with line 2, so as not to disturb D's in the header line of comma-delimited files. d2e is more suitable for pedsys files.

SOLAR now understands Fortran D style in phenotypes files in most circumstances anyway, so this conversion is not generally needed. You will know you need this if you see error messages.
A1.29 define
Purpose: Define an expression to be used in place of a trait or covariate

Usage:   define <name> = <expression>   ; create a definition
         trait <name> [,<name>]+        ; use definition as trait(s)
         define                         ; show all definitions
         define <name>                  ; show definition for <name>
         define delete <name>           ; delete definition for <name>
         define new                     ; delete all definitions
         define delete_all              ; delete all definitions
         define rename <name1> <name2>  ; rename a definition
         define names                   ; return list of all names

         <name> can be any alphanumeric string with underscore.  Do not
         use these reserved words as names: delete, delete_all, rename,
         names, new.

         <expression> is formatted algebraically using the standard math
         operators + - * / and ^ (power) and () parentheses, and also all
         math functions defined by the C Programming Language, which
         includes "log" for natural logarithm, trig functions, and
         hyperbolic functions, among others.  Here is a list: erfc, erf,
         lgamma, gamma, j1, j0, y1, y0, rint, floor, ceil, tanh, cosh,
         sinh, atan, acos, asin, tan, cos, sin, expm1, exp, logb, log1p,
         log10, log, cbrt, sqrt, and abs.

         In addition, the inverse normal transformation (see help for
         "inormal") may be applied using the "inormal_" prefix (for
         example, inormal_q4 for trait q4).  "inormal_" may be
         abbreviated down to "inor_".

         If a phenotype name within the expression contains special
         characters (anything other than letters, numbers, and
         underscore), it should be enclosed in angle brackets <>, and the
         angle brackets must also include any special operator prefix
         such as "inorm_".  For example, given a trait named q.4 (with a
         dot), you could have a define command like this:

             define i4 = <inorm_q.4>

         Note: similar rules apply within the constraint and omega
         commands because those commands also allow expressions that
         could contain decimal constant terms and math operators.

         A debugging function named "print" is also available, which
         prints and returns the value of the expression it encloses.
         After printing, it pauses until the RETURN key is pressed.
         RETURN may be held down to pass through a lot of prints.
         Examples of the print command are given in the documentation for
         the "omega" command.

         The following relational operators may also be used between any
         two terms.  If the relation is true, 1 is returned, otherwise 0
         is returned.  This enables you to construct compound conditional
         expressions having the same effect as could have been achieved
         with "if" statements.  The C operators < and > have been
         replaced with << and >> so as not to be confused with the <>
         quotation of variable names in SOLAR.

             C Format   Fortran Format   Test
             --------   --------------   ----
             ==         .eq.             if equal
             !=         .ne.             if not equal
             >=         .ge.             if greater than or equal
             <=         .le.             if less than or equal
             >>         .gt.             if greater than
             <<         .lt.             if less than

         An expression is understood to be quantitative unless the top
         level operator is a relational operator, in which case it is
         understood to be discrete.

         Names used must not match the names of any phenotype.  When
         there is an unintended match, the definition cannot be used for
         trait names since it would be ambiguous.

         Once a valid definition has been created, it can be used in the
         trait command.  Any or all of the traits can be definitions.
         All definitions will be saved in the model file, and loaded back
         in when that model is reloaded.  Definitions in a model file
         will override current definitions.  It is possible to save a
         model with nothing but definitions if desired.  The only way to
         delete definitions is with the "new", "delete", or "delete_all"
         options, or by restarting SOLAR.  The "model new" command has no
         effect on definitions.

         Expression names are not case sensitive.  Assigning a new
         expression to a name replaces the expression previously assigned
         to that name, even if it differs in letter case.  Renaming a
         definition to a name differing only in letter case is possible.

         For covariates only, it is possible to include in a definition a
         constant called "blank".  If an evaluation of the expression
         returns blank, that individual is counted as missing from the
         sample.  The best way to use this constant is with one or more
         conditionals, like this:

             define sample = blank*(age<<22)*(sex==2)
             covariate sample()

         This blanks any female (sex==2) having age less than 22.  blank
         is the number -1e-20, so any numerical operation may change it
         to a non-blank small number.  It should only be multiplied by 0
         or 1.  The empty parentheses after sample() mean that it is not
         a maximized parameter; it is a null covariate used only to
         delimit the sample.

Examples:
         define loga = log(a)
         define eq1 = (q1 - 3.1)^2
         define dq4 = q4 .gt. 12
A1.30 deputy
Purpose: Make limited user key (for deputy registrars)

Usage:   deputy register <deputy-key>
         deputy make <access-code> <username>

Notes:   1) The deputy registrar must obtain the deputy-key and
            access-code from [email protected].  A key is granted
            for critical collaborators only, for use in cluster systems
            where the normal registration process is unwieldy.

         2) The deputy registrar uses the "deputy register" command to
            register as deputy.  This creates a file named .solar_deputy
            in the home directory.  (Note: It does not move the
            .solar_deputy file to SOLAR_DEPUTY_HOME if that is different
            from the deputy's home directory.)

         3) The .solar_deputy file must be accessible to each user on all
            systems.  This can be done in one of two ways.  The default
            way is to access the .solar_deputy file in the home directory
            of the deputy, which must be found in a pathname with the
            deputy's username replacing the current username.  For
            example, if deputy registrar jsmith has registered the name
            pmiller, and the home directory for pmiller is:

                /home/pmiller

            then the .solar_deputy file must be found in the directory
            named:

                /home/jsmith

            If this default method cannot be used, there is an alternate
            method involving creating a shell variable SOLAR_DEPUTY_HOME
            giving the path to the .solar_deputy file.  For example, the
            following line could be added to the "solar" startup script:

                export SOLAR_DEPUTY_HOME=/home/admin/jsmith

         4) The deputy registrar can now make a limited range key for
            each user using the "deputy make" command.  The user uses the
            normal "register" command to install the key into a file
            named .solar_reg in the user's home directory.  The
            .solar_reg file AND the .solar_deputy file (located as
            described in note 3) must be found on each system where SOLAR
            is to be run, because both are used in the validation process
            for keys created by deputy registrars.

         5) The "deputy make" command adds the usernames registered to a
            file named "solar_registrations" in your home directory.  The
            contents of this file should be sent to [email protected]
            on at least a biannual basis.

         6) The username must be 2 characters or longer.
A1.31 discrete-notes
Purpose: Describe support for discrete traits

Usage:   discrete-notes

Notes:   Discrete traits are detected automatically by SOLAR.  They must
         be coded as two integer values separated by exactly 1.  Typical
         codings are 0,1 or 1,2.  If you specify two values that are not
         separated by exactly 1, this will be detected as an error.  If
         you specify more than two values, your trait will not be
         determined to be discrete.  For this reason, DO NOT specify
         missing values with a third number.  Missing values should
         always be coded as blank (" ") or null, with no number or
         character.  DO NOT use "0" to signify missing values.  See
         toward the bottom of this note for advice regarding analyzing
         traits with two values quantitatively.

         Discrete traits having more than 2 values are not supported by
         SOLAR.  (This is also true for discrete phenotypic covariates:
         if discrete, they should not have more than 2 values.  If you
         have such data, they should be recoded into N-1 discrete binary
         covariates or recast into "household groups."  See the
         documentation for the "house" command.)

         Models with discrete traits may be used with any command in
         SOLAR, such as polygenic, twopoint, multipoint, maximize, etc.
         Sometimes the information returned by SOLAR differs.  For
         example, while the "polygenic" command normally returns the
         "proportion of variance due to all covariates" when used with a
         quantitative trait, it instead returns the "Kullback-Leibler
         R-squared" when used with a discrete trait.  (For technical
         reasons, the proportion of variance due to all covariates is not
         available for discrete trait models in SOLAR.)

         By examining the maximization output files, you can determine
         unambiguously whether discrete or quantitative methods were
         used.  (An example of a maximization output file is "null0.out",
         saved in the maximization output directory after running
         "polygenic".)  In this file, immediately after the "Descriptive
         Statistics" and immediately before the "Model Parameter Starting
         Points and Boundaries", there will be one of two comments,
         either:

             Using SOLAR Quantitative Trait Modeling
         or
             Using SOLAR Discrete Trait Modeling

         When a model with a discrete trait is maximized, special
         discrete trait algorithms are used.  Unfortunately, these
         methods are much more prone to numerical problems than the usual
         quantitative trait methods.  Numerical problems lead to faulty
         parameter estimates and convergence failures.  The default
         discrete method is relatively robust and only infrequently has
         the problem where the heritability erroneously gets close to
         1.0.  Even if the polygenic heritability (h2r) goes to 1.0, you
         may still be able to run a "multipoint" linkage analysis to find
         important loci.  The heritabilities will be wrong, and the LOD
         scores will be wrong, but the "peaks" may be at or near the
         correct locations.

         It is not recommended to use the optional second discrete
         method, set by giving the command "option DiscreteMethod 2"
         prior to running polygenic.  Although it was intended to be more
         accurate, it more frequently fails with convergence errors or
         has the heritability go to 1.0, and at this time it is not
         recommended.

         Some people also try analyzing their discrete trait as
         quantitative.  This can be done by giving the command "option
         EnableDiscrete 0".  The likelihoods, LODs, and parameter
         estimates may be inaccurate, but the LOD peaks should be in the
         correct places.  Better convergence is sometimes obtained,
         however, than when using the discrete method.  Beware that there
         is a fundamental error in analyzing a discrete trait as
         quantitative: there are not truly two degrees of freedom for the
         mean and SD.  Therefore, convergence failure is still more
         common with these models than with true quantitative models.

         Also beware that if you had previously analyzed the trait as
         discrete, and then changed the EnableDiscrete option to 0
         without exiting SOLAR or giving the "model new" command, you
         will still have the parameter SD constrained to 1.0, which is
         probably NOT what you need.  SD is properly constrained to 1.0
         only when you are analyzing a discrete trait as discrete (and,
         perhaps, in a few other esoteric cases).

         Because of all the pitfalls in using discrete traits, we try to
         find and use relevant quantitative traits whenever possible.

Shortcuts: discrete-note - discrete-notes
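As a sketch of the EnableDiscrete option described above, a 0/1 trait could be analyzed quantitatively as follows (the trait name "affected" here is hypothetical):

    solar> model new
    solar> trait affected
    solar> covariate age sex
    solar> option EnableDiscrete 0
    solar> polygenic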
A1.32 doc
Purpose: Find the SOLAR documentation

Usage:   doc [-whereis]

         doc shows the official documentation URL and the location of the
         documentation files on this system.

Notes:   This command now tells you the URL where the SOLAR documentation
         may be viewed with any available browser.  Previously, it would
         run Netscape.

Shortcuts: doc - doc
A1.33 dominance-notes
Purpose: Find dominance documentation

Notes:   Dominance analysis is documented in Section 9.4 of the SOLAR
         manual.  Use the "doc" command, or point your browser to the
         "doc" directory of your SOLAR directory, then select "Go to full
         SOLAR manual", then select Chapter 9.

         Dominance analysis is made possible by the "delta7" and "d7"
         columns in the SOLAR phi2.gz and ibd matrices.  For polygenic
         models, the delta7 matrix column is loaded, a d2r parameter is
         created and added to the "e2" constraint, then a delta7*d2r term
         is added to the omega.  The commands required to do this are
         described in Section 9.4.
A1.34 doranch
Purpose: Execute a script on every ranch machine (usually for /tmp
         cleanup)

         DO NOT USE THIS FOR SUBMISSION OF REGULAR JOBS, because it
         bypasses the Gridware queuing system, which it must do for
         cleanup of ALL machines.

         MUST BE RUN ON MEDUSA (only medusa addresses all other ranch
         machines).

         See also "stepup -parclean", which uses doranch to clean up junk
         created by forcing a "stepup -par" job to quit.

Usage:   doranch <procname> <argument>

         doranch cleanuser <username>  ;# delete ALL the user's /tmp
                                       ;# files on the ranch.  (Note: you
                                       ;# can only delete files for which
                                       ;# you have delete privilege,
                                       ;# usually because of owning
                                       ;# them.)
         doranch finduser <username>   ;# find all my /tmp files on the
                                       ;# ranch but do not delete them.
                                       ;# Findings are written to
                                       ;# finduser.out.  If -all is used,
                                       ;# all users are shown.
         doranch cleantmp <dirname>.   ;# same as "stepup -parclean":
                                       ;# delete all /tmp/<dirname>.*
                                       ;# files.  (Parallel stepup dirs
                                       ;# are prefixed with <dirname>
                                       ;# followed by a dot.)
         doranch findtmp <dirname>     ;# find all <dirname>* directories
                                       ;# but do not delete them.
                                       ;# Findings are written to
                                       ;# findtmp.out.
         doranch cleanme now           ;# same as
                                       ;# doranch cleantmp <username>
         make_rhosts                   ;# make a new .rhosts file, or
                                       ;# append to an existing one to
                                       ;# make it complete.  It may be
                                       ;# useful to delete the old
                                       ;# .rhosts file first if it
                                       ;# contains errors.
         showspace                     ;# return a sorted list of /tmp
                                       ;# storage used by all users in
                                       ;# showspace.out.  Uses
                                       ;# doranch finduser -all, unless
                                       ;# an existing finduser.out is
                                       ;# found.

         <procname> is the name of the procedure to be run on every
         machine.  The procedures cleanuser, finduser, cleantmp, findtmp,
         and cleanme are provided, but user-written scripts could be used
         also.  <username> is the username.

         cleantmp is a procedure that deletes all files and directories
         in /tmp which match the specified prefix, after which a wildcard
         * is assumed.  For example, "cleantmp charlesp." would delete a
         directory named "/tmp/charlesp.11019.2".

Notes:   It is useful to run ranch jobs in subdirectories of the /tmp
         directory to minimize network traffic.  Jobs should be designed
         to clean up after themselves in normal operation by deleting the
         /tmp subdirectory that was used as a working directory.
         However, even when jobs are designed to clean up after
         themselves, if the jobs do not run to completion, the cleanup
         code might never be run.  This is especially true when a user or
         administrator shuts down a large array job (such as "stepup
         -par") because of a mistake or emergency.  That is when
         "doranch" may be useful.

         The "cleanuser" procedure deletes all files owned by the user in
         /tmp directories on all ranch machines.  The "cleantmp"
         procedure deletes all files and directories in /tmp prefixed by
         the cleantmp argument on all ranch machines.

         The doranch procedures listed above may be used in creating
         custom cleanup options for other scripts.  Such an emergency
         cleanup option is already built into the stepup command as
         option "-parclean"; that uses doranch and cleantmp as shown
         above.  Authors of other parallel scripts can in general create
         similar script options tailored to the names of the /tmp
         subdirectories they use.  To see what the "finduser" script
         looks like, in order to write something similar, use the solar
         command "showproc finduser".

         All the doranch procedures write to a file named by the
         specified procname; for example, cleanuser writes to a file
         named cleanuser.out for each file found.  Usually this has two
         columns: node name and filename.  However, for "finduser" a
         middle column is added which lists the total diskspace used in
         kbytes.

         Note that a valid .rhosts file is required for usage, and the
         make_rhosts procedure will make one.  doranch will complain if
         the .rhosts file is not present or is incomplete.

         If doranch reports failure in connecting to some hosts, it is
         probably because the passwd and shadow files involved in userid
         authentication have not been properly updated on those hosts.
         If doranch reports failure in connecting to every ranch host, it
         probably means that the .rhosts file is invalid; you should then
         delete the old .rhosts file and run make_rhosts.  If doranch
         hangs at a particular host, that machine is probably down in
         some unusual way that is not known to Gridware.
A1.35 drand
Purpose: Return a random floating-point number between 0 and 1

Usage:   drand [ <seed> ]

Notes:   If no argument is given, drand returns a floating-point number
         in the interval [0,1].  When an argument is given, it is taken
         to be an integer with which to seed the random number generator.
         If a seed value of 0 is specified, the system time is used as
         the seed.

         The random number generator should be seeded prior to its first
         use in a SOLAR run.  If the random number generator has not been
         seeded when it is first called, it will be seeded with the
         system time automatically.

Shortcuts: dran - drand
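For example, a reproducible sequence might be drawn like this (the seed value is arbitrary):

    solar> drand 12345        ;# seed the generator
    solar> set r1 [drand]     ;# first number in [0,1]
    solar> set r2 [drand]     ;# second number in [0,1]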
A1.36 empirical_pedigree
Purpose: Load an empirical pedigree

Usage:   load epedigree <epedigree> [-t <threshold>] [-founders] [-1]
         load pedigree <epedigree> [-t <threshold>] [-founders] [-1]
              [-sex <sexvarfile>]

Notes:   Empirical pedigrees are csv files which are representations of a
         kinship matrix, having kinship values for pairs of individuals.
         Empirical pedigrees may be loaded either with the regular
         "pedigree" keyword or with "epedigree".  However, if "load
         pedigree" is used and there is an error, the error message may
         be more helpful if "load epedigree" is used.  Empirical pedigree
         files must have fields named IDA, IDB, and KIN specified in the
         header line to be recognized as an empirical pedigree.

         -t <threshold> sets a minimum kinship value that indicates a
         family relationship.  Levels lower than this are not recorded in
         the phi2.gz matrix produced, and do not indicate a family
         relationship between such individuals.  The default threshold is
         zero, so any non-zero kinship value in the empirical pedigree
         file indicates a relationship.  Given some noise in the kinship
         estimation, this means that families could become too large for
         efficient processing.  In that case, a threshold value can help.
         A possible value would be 0.05.  This will potentially lose
         information, however, and is only ever needed for greater
         efficiency or capacity in later processing.

         -1 (one family) simply bypasses the normal family determination
         and regards all the individuals present as one family.  This can
         save a small amount of time when loading the pedigree, but if
         multiple families would otherwise be identified, it will make
         the phi2.gz matrix file larger and subsequent processing much
         slower.

         -founders (all founders) regards all input individuals as
         separate families having no relationship to anyone else.

         -sex <sexvarfile> permits the loading of a sex variable for this
         pedigree from the named file, which should be a CSV file having
         fields ID and SEX.  Other fields are ignored.  The sex variable
         coding is the same as for normal pedigree files: M, m, or 1 for
         males, and F, f, or 2 for females.  Missing values are not
         allowed.  All individuals in the pedigree must be included.

         This command does not attempt to generate a reconstruction of a
         matching "real" pedigree.  Rather, it creates a skeleton
         pedindex sufficient for the correct operation of SOLAR.  To do
         this, each individual in a kinship-connected group (family) is
         made a sibling and listed together in the pedindex with an
         assigned family number.  This permits the correct
         one-family-at-a-time operation of SOLAR.  This skeleton pedigree
         MUST always be used with the kinship matrix, phi2.gz, which is
         also generated by this command, and which contains the original
         data reformatted for use with the pedindex.  (The default mode
         of SOLAR in some cases is to use an on-the-fly generated kinship
         matrix determined by family relationships.  However, when an
         empirical pedigree is loaded, this is suppressed, as it is for
         discrete and evd models.)

         As with other pedigrees, once an epedigree is "loaded" in a
         working directory, it remains loaded across multiple SOLAR
         sessions launched from that directory.
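For instance, a kinship CSV might be loaded with a relationship threshold and a sex variable file like this (the file names here are hypothetical):

    solar> load pedigree mykin.csv -t 0.05 -sex mysex.csv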
A1.37 empp
Purpose: Calculate an empirical p-value from lodadj results

Usage:   empp <lod>
A1.38 epedigree
Purpose: Load an empirical pedigree

Notes:   epedigree is the same command as empirical_pedigree; see entry
         A1.36 for the full usage, options, and notes.
A1.39 ephenotypes
Purpose: Rewrite the phenotypes file using the IBDIDs in pedindex.out

Notes:   This has many uses.  For example, the gaw10 example can be run
         as an empirical pedigree, using the generated phi2.gz matrix as
         the pedigree, and identical results obtained.  Other
         pedigree/phenotypes files in original SOLAR formats can be
         processed this way.  Also, a sex variable file can be made from
         the original pedigree file.

Usage:   ephenotypes <phenfile>

Example: (starting in solar:)

             example
             load ped gaw10.ped
             ephenotypes gaw10.phen
             mv ephenotypes.out gaw10.ephen
             ephenotypes gaw10.ped
             mv ephenotypes.out gaw10.sex

         Now make the empirical pedigree (in the shell):

             gunzip phi2.gz
             echo "junk,ida,idb,kin" >phi2.csv
             sed -E "s/[ ]+/,/g" <phi2 >phi2.sed
             tail -n +2 phi2.sed >>phi2.csv
             mv phi2.csv gaw10.eped

         Now use it (back in solar):

             load ped gaw10.eped -sex gaw10.sex
             load phen gaw10.ephen
             model new
             trait q4
             covar age sex
             polygenic -s
A1.40 epistasis
Purpose: Use command: multipoint -epistasis <N>

Usage:   multipoint -epistasis <N>    (<N> is the mibd index of interest)

Notes:   The 'epistasis' command itself is reserved for future use.

Shortcuts: epista - epistasis
A1.41 evdinev
Purpose: I/O for EVD data

Usage:   trait ...
         covar ...
         evdout [-evectors [-all]]  ;# write evddata.out, and eigenvectors
                                    ;# if asked
         evdinx [<evdfile>]   ;# return X* matrix from evddata.out file
         evdiny [<evdfile>]   ;# return Y* matrix from evddata.out file
         evdinz [<evdfile>]   ;# return Z* matrix from evddata.out file
         evdinev [-all]       ;# load eigenvector matrix(es) from file(s)
                              ;# saved by evdout -evectors

Note:    You must select the trait, covariates, and anything else that
         would restrict the sample size before invoking evdout.  You do
         not need a maximized model, just trait and covariates.

         evdout writes the EVD-transformed variables and the eigenvalues
         (which are called lambda) to a file named evddata.out in the
         maximization output directory, without actually doing a model
         maximization.  The trait value, which is not demeaned, is
         written to the variable <traitname>_evd.  The covariate values,
         which are demeaned or scaled appropriately, are written to
         variables named evd2_<varname>_evd.  Ignore other fields.

         Additionally, you can write out the eigenvectors, either in
         normal per-pedigree mode (one matrix for each family) or in
         entire-sample mode if you select the -all option.  Filenames are
         evectors.family<n>.mat.csv for per-family matrices and
         evectors.mat.csv for the entire-sample (-all) matrix.  Matrix
         files are written to the outdir.  Current methods rely on the
         EVD transformation of variables and do not require the
         eigenvectors to be output, so generally speaking you should not
         use the -evectors option unless you know you need it.

         evdinx returns the X* matrix corresponding to the current
         evddata.out.  The X* matrix has, from left to right: (1) a
         column of 1's, (2+) one or more columns of EVD-transformed
         covariates which have been scaled to a mean of zero, in the
         order in which the covariates occur in the model.  evdinx has a
         -method2 option needed for the -method2 option of fphi.

         evdiny returns the Y* matrix, which is a one-column matrix (aka
         vector) of EVD-transformed trait values.

         evdinz returns the Z matrix, which has a column of 1's and a
         column of lambdas.

         evdinev returns a list of eigenvector matrices, one for each
         family, or, if the -all option is specified, just one matrix for
         the entire sample.  In order to use the -all option, you must
         have previously done evdout with the -evectors and -all options.

Example:
         foreach covar $testvars {
             model new
             trait q4
             covar $covar
             evdout
             set X [evdinx]
             set Y [evdiny]
             solve $X $Y
         }
A1.42 evdinx
Purpose: Return the X* matrix from an evddata.out file

Notes:   evdinx is one of the EVD I/O commands documented together under
         evdinev (A1.41); see that entry for full usage and notes.
A1.43 evdiny
Purpose: Return the Y* matrix from an evddata.out file

Notes:   evdiny is one of the EVD I/O commands documented together under
         evdinev (A1.41); see that entry for full usage and notes.
A1.44 evdinz
Purpose: Return the Z matrix from an evddata.out file

Notes:   evdinz is one of the EVD I/O commands documented together under
         evdinev (A1.41); see that entry for full usage and notes.
A1.45 evdmat
Purpose: I/O for EVD data

Notes:   evdmat is one of the EVD I/O commands documented together under
         evdinev (A1.41); see that entry for full usage and notes.
A1.46 evdout
Purpose: Write EVD-transformed data (evddata.out) for the current trait
         and covariates

Notes:   evdout is one of the EVD I/O commands documented together under
         evdinev (A1.41); see that entry for full usage and notes.
A1.47 example
Purpose: Copy the SOLAR example to the current working directory

Usage:   example

Notes:   The example may be used in conjunction with the SOLAR tutorial
         in Chapter 3.  To read that, give the command "doc -chapter 3".
         The example files are actually located in the doc/Example
         subdirectory of the SOLAR installation.  To find the "doc"
         subdirectory, give the command "doc -whereis".

Shortcuts: examp - example
A1.48 exclude
Purpose: Exclude phenotypes from use as covariates by the automodel and
         allcovar commands

Usage:   exclude <var> <var> ...   ; Add variable(s) to exclude
         exclude                   ; List all excluded variables
         exclude -reset            ; Reset to default exclude list
         exclude -clear            ; Remove all variables from list

Notes:   You may add to the exclude list with one or more exclude
         commands.  By default, all variables named and/or mapped by the
         FIELD command will be excluded (except for SEX).  The exclude
         command lets you exclude additional variables.  (The FIELD
         command variables are pedigree variables such as ID which would
         never be wanted as covariates.)

         The default exclude list will include the following standard
         PEDSYS pedigree mnemonics:

             seq fseq mseq sseq dseq ego id fa mo sire dam pedno famno
             twin mztwin ibdid fibdid mibdid blank kid1 psib msib fsib
             birth exit

         If you are excluding more variables than you are keeping, you
         might consider simply specifying the covariates you want
         explicitly rather than using the allcovar or automodel commands,
         or creating a new phenotypes file with fewer fields.

         The variable name you enter will be converted to lower case.
         SOLAR is intended to handle phenotypic and pedigree variables in
         a case-insensitive manner.

Shortcuts: excl - exclude
A1.49 factorial
Purpose: Compute factorial

Usage:   factorial N

Example: set big [factorial 10]

Notes:   A double precision value is returned, since double precision can
         represent larger numbers exactly than integers can.
         Non-integral N is rounded to the nearest integer first, then the
         factorial is computed for that integer.  For large enough N, the
         value returned might not be exact.  (Currently this happens for
         N > 18.)  Negative N (after rounding) raises a range error.

         This may be, but need not be, used in an "expr".
A1.50 fakedata
Purpose: Generate a fake pedigree/phenotypes file for testing the memory
         required

Usage:   fakedata <n_individuals> <n_traits>

Notes:   fakedata.out is written with 4-person families.  Trait data is
         random.  The intended use is testing whether memory is
         sufficient to handle a particular number of
         traits/individuals/pedigree-size.
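For instance, to write a test file with 1000 individuals and 5 random traits (the counts here are arbitrary):

    solar> fakedata 1000 5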
A1.51 faketraits
Purpose: Generate a fake pedigree/phenotypes file for testing the memory
         required

Usage:   faketraits <n_individuals> <n_traits>

Notes:   faketraits.out is written with 4-person families.  Trait data is
         random.  The intended use is testing whether memory is
         sufficient to handle a particular number of
         traits/individuals/pedigree-size.
A1.52 fformat
Purpose: Replace Tcl format with fixed-width fields for numbers

Usage:   fformat <spec>+ <value1>+

         <spec>  format specifier(s) as for the Tcl format command.
                 f, e, or g format is required for "fixed width"
                 operation, like this:

                     %[--][W][.P]T

                 where:

                 -   specifies left justification (default is right
                     justification)
                 --  specifies center justification
                 W   is the desired width
                 P   is the desired precision (before and after decimal)
                 T   is the format type: f, e, g, y, or z

                     f   floating decimal
                     e   exponential
                     g   floating decimal if suitable, then exponential
                     y   same as g, except that exponential format is not
                         used until the output would otherwise be 0.0 (or
                         nearly so) for a non-zero value.  At least one
                         significant digit is preserved for P 1-4, two
                         digits for P 4-6, and three digits for P 7-*.
                         This is more consistent with readability,
                         retaining the fixed format nearly as long as
                         possible.  Sometimes more space will be used
                         than W, but this is much less likely than with
                         the standard G format.  However, unlike F
                         format, the result will not go to zero unless it
                         is zero.  When possible, allow more space in
                         "width" than the precision seems to require;
                         that way, under special circumstances, there is
                         extra space for signs, "e", decimal point, etc.
                     z   same as y, except the resulting string is
                         trimmed to minimum space for csv files

Notes:   For fractional numbers, make the width at least 2 more than the
         precision, to allow for the leading "0."  Then allow one more
         for a - sign, if that is possible.

         This is intended as a drop-in replacement for the Tcl "format"
         command, modifying "minimum width" to "fixed width" for the f,
         e, and g formats ("fixed width" makes for more readable columns)
         and adding a center justification option.
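For example, calls might look like this (the values are arbitrary):

    solar> fformat %10.3f 3.14159    ;# right-justified, width 10
    solar> fformat %-10.3f 3.14159   ;# left-justified, width 10
    solar> fformat %--10.3f 3.14159  ;# centered, width 10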
A1.53 field
Purpose: Allow non-standard user data field names

Usage:   field                             ; show all mappings
         field <default name> <user name>  ; create one mapping
         field <default name>              ; show one mapping
         field <default name> -none        ; ignore this field (see Notes)
         field <default name> -default     ; restore full default

Examples:
         field ID Subject
         field FA Father
         field MO Mother

Notes:   The default names are ID, FA, MO, PROBND, MZTWIN, FAMID, SEX,
         and HHID.  EGO, SIRE, and DAM are also permitted in place of ID,
         FA, MO by default.  However, unlike the default, you can only
         specify one name to be mapped.  You can restore the full default
         for any field using the -default argument.

         The -none argument declares a field to be ignored whether it
         exists or not.  This is useful, for example, if you want the
         optional PROBND field to be ignored:

             field PROBND -none

         This would signify that there is no PROBND field, i.e. there are
         no probands.  PROBND, MZTWIN, and HHID are optional fields for
         which the -none argument may be used.  Most other fields are
         mandatory, and -none cannot be used for them.

         Your field selections are saved for future SOLAR sessions in a
         file named field.info in the working directory.  Once you have
         entered your field selections, you need not enter them again
         (starting with version 2.0.6) when you are running SOLAR from
         the same working directory.  However, if you followed our
         previous recommendation to put field commands in a .solar file,
         note that the settings in the .solar file take precedence over
         the settings in field.info.  If you would like to remove an
         entire old set of field assignments, you can delete the
         field.info file BEFORE starting SOLAR.  (SOLAR reads the
         field.info file when starting.)

         The FAMID field may or may not be required, depending on whether
         your IDs are unique in the entire dataset.  If your IDs are
         unique, you do not need FAMID.  However, if your IDs are
         sequential within each family, you need a FAMID field in both
         your pedigree and phenotypes files, otherwise they are
         ambiguous.  SOLAR now catches this mistake when either pedigree
         or phenotypes files are loaded.

Shortcuts: fie - fields
A1.54 file-freq
Purpose: Describe frequency data file requirements

Notes:   The freq file contains allele frequency data for a set of marker
         loci, one line per marker.  Each line consists of the following
         space-delimited fields:

             marker name, all_1 name, all_1 freq, all_2 name,
             all_2 freq, ...

         The allele frequencies for a marker must sum to 1 (a small
         roundoff error is tolerated).  Allele frequency information is
         used when IBDs are computed for a marker that is not completely
         typed, i.e. there are individuals for whom genotype data is not
         available.

Example:
         D20S101  123 0.2457  127 0.1648  133 0.5895
         IGF1     A 0.4  B 0.3  C 0.1  F 0.2
         ApoE     E1 .125  E2 .25  E3 .625

         Once a freq file has been loaded, it is not necessary to load it
         again in subsequent SOLAR runs from the same working directory.

Shortcuts: file-f - file-freq
A1.55 file-map
Purpose: Describe map data file requirements

Notes:   The map file contains chromosomal locations for a set of marker
         loci on a single chromosome.  Typically, marker locations are
         given in cM, and a mapping function is used to convert
         inter-marker distances to recombination fractions.  Currently,
         the Kosambi and Haldane mapping functions are allowed.

         Marker locations can also be specified in basepairs.  While cM
         locations can be floating point numbers, basepair locations must
         be integers; non-integer locations are truncated to integers.
         When basepair locations are used, the mapping function is called
         "basepair" rather than Kosambi or Haldane, but in fact there is
         no mapping provided from basepairs to recombination fractions,
         and such maps cannot be used to compute multipoint IBDs.

         The first line of the map file contains the chromosome number
         and (optionally) the name of the mapping function.  If no
         mapping function is specified, the mapping is assumed to be
         Kosambi.  The chromosome number can be any character string not
         containing a blank or a forward slash (/), although the use of
         integers is recommended.  For example, the strings '01' and
         '10q' are allowed.

         Each line after the first line consists of the following
         space-delimited fields:

             marker name, marker location

Examples:
         20
         D20S101   0.0
         D20S202  34.2
         D20S303  57.5

         TCF basepair
         2448b   19828941
         380659  19829489

Shortcuts: file-map - file-map
A1.56 file-marker
Purpose: Describe marker data file requirements

Notes:   The marker file contains genotype data for one or more marker
         loci.  The file consists of one record for each individual who
         has been typed for one or more of these markers.  Each record
         must contain the following fields:

             ego ID, genotype1, genotype2, ...

         In addition, a family ID field must be included when ego IDs are
         not unique across the entire data set.  If, however, each ego ID
         is unique to an individual, and an individual may appear
         multiple times in the data set, then the family ID should not be
         included.  The same genotypic data is then associated with every
         occurrence of an individual.

         The default field names are ID and FAMID.  EGO is also accepted
         by default.  You can set up SOLAR to use different field names
         by using the field command (see 'help field').  You do not
         necessarily need to change your names to match ours.  Fields
         with names other than ID and FAMID are assumed to contain marker
         data, with the exception of the following names: FA, MO, SEX,
         MZTWIN, HHID, AGE, PEDNO, and GEN.  Fields having one of these
         names are ignored.

         The scheme used to encode genotypes may vary from field to
         field.  SOLAR recognizes many standard coding schemes, but the
         safest way to code genotypes is with a forward slash to separate
         the alleles.  Ex:

             AB
             E1 E3
             123/456

         A blank genotype field denotes missing data, as do the genotypes
         0/0 and -/-.  SOLAR requires that either both alleles are typed
         or both alleles are missing, except for male genotypes at
         X-linked marker loci.  In that case, either a single allele is
         specified (the other allele is blank, 0, or -), or the genotype
         is coded as a "homozygote".  Ex:

             237/243   valid female X-linked marker genotype
             /251      valid male X-linked marker genotype
             251/0     valid male X-linked marker genotype
             -/251     valid male X-linked marker genotype
             251/251   valid male X-linked marker genotype

         The marker loci in the marker file must all be autosomal or all
         be X-linked.  By default, SOLAR assumes that the markers are
         autosomal.  If the markers are X-linked, then either the XLinked
         option must be set with the ibdoption command prior to loading
         the marker file, or the -xlinked option must be given in the
         load marker command.

         Once a marker file has been loaded, it is not necessary to load
         it again in subsequent SOLAR runs from the same working
         directory.

Shortcuts: file-mar - file-markers
A1.57 file-matrix
Purpose: Describe csv matrix file format requirements

Notes:   CSV matrix files were introduced in SOLAR version 7.5.0, and it
         is conditionally recommended that all users who are creating
         their own matrix files use this format, as it is more easily
         understood and created than the previous format.  Matrix writers
         should beware that there are many possible pitfalls in hand
         writing a matrix file, and matrix files should at minimum be
         checked with the "matrix debug" command after they are loaded
         the first time.

         The original space-column-delimited matrix file format is still
         created and used by SOLAR itself and documented in the manual,
         and if you merely wish to modify an existing matrix file, it may
         still be easiest to use the original format.  The format of the
         phi2.gz kinship matrix file is described in Section 8.3 of the
         manual, and the same rules apply to any original format matrix
         file.

         The csv matrix file is an ordinary comma separated variable file
         with the first line being a header which names all the fields
         used, as is common with CSV files.  However, SOLAR requires that
         all matrix files, including csv matrix files, be compressed
         using the gzip program.  Thus all matrix files will have the
         final filename extension ".gz".

         All csv matrix files must have the following three required
         fields: id1, id2, and matrix1.  id1 and id2 are user-assigned
         IDs, as used in pedigree and phenotypes files.  matrix1 is
         typically the primary matrix in the file; for example, the
         primary matrix in the phi2.gz file is the "phi2" matrix.  A csv
         matrix file may also have a second matrix named matrix2.

         If FAMIDs are required to disambiguate IDs in your dataset, you
         must also include famid1 and famid2, the famids corresponding to
         id1 and id2.  SOLAR will determine from the pedigree file
         whether famids are needed.  If famids are not needed, famid1 and
         famid2 will be ignored if they are present.  If famids are
         needed and not present in the matrix file, the load matrix
         command will generate an error and the matrix will not be
         loaded.

         Note that it is the "load matrix" command which actually assigns
         meaningful names to matrixes for use in the SOLAR omega.  For
         example, the phi2 matrix is usually loaded with the following
         command:

             load matrix phi2.gz phi2

         Alternatively, an analysis examining dominance would require a
         second matrix, delta7:

             load matrix phi2.gz phi2 delta7

         In both cases, the first matrix (called matrix1 in a CSV matrix
         file) will be associated with the name phi2 in the omega.  The
         second command will also associate the second matrix (matrix2)
         with the name delta7.  Neither command shown lets you directly
         use the names matrix1 and matrix2 in the omega, unless those
         names were also specified in the load matrix command.  The CSV
         naming scheme is intended to allow the "load matrix" command to
         be used exactly as it was before.  Variable names other than
         id1, id2, matrix1, matrix2, famid1, and famid2 in a CSV matrix
         file are (in this version) ignored.

         There are other semantic requirements for matrix files,
         depending on the type of matrix involved.  One is that every
         individual in an analysis must be included in the matrix file,
         regardless of whether they have pairwise relationships with
         others in the file.  At minimum, every individual has a self
         relationship of value 1.0.  With version 7.5.0, it is required
         that you include these "diagonal" matrix values of 1.0 for every
         individual in your sample, and possibly everyone in your
         pedigree file.  Otherwise, diagonal values will default to -1,
         which could have bad consequences.  (Note: in version 7.5.0 it
         is not possible to check this with matrix debug, because it only
         checks values in the input file, not defaulted values in the
         matrix itself.)  It is not permitted to have individuals in the
         CSV matrix file who are not defined in the pedigree file.

         Historically, matrix files were dependent on the pedindex.out
         files created when the pedigree was loaded.  This is because the
         very IBDIDs used in original format matrix files might be
         assigned to different actual IDs if the pedigree is changed.
         CSV format matrix files are less dependent, but it is still
         likely that changes to a pedigree file would require
         corresponding changes in the matrix file.  And sometimes this
         may be overlooked, causing disastrous results.  SOLAR has long
         prepended a pedigree checksum record to all the matrix files it
         creates, which is checked against the pedigree file when the
         matrix is loaded.  Changes to the pedigree after the matrix file
         was created will cause an error to be raised when attempting to
         load that matrix file.

         The same checksum checking features can optionally be used in
         CSV matrix files.  Once the matrix has been created and
         compressed using gzip, the procedure "matcrc" can be run on it,
         for example:

             solar> matcrc phi2.csv.gz

         This will determine a checksum value from the currently loaded
         pedigree file and prepend it to the matrix file in a record with
         id1=checksum and id2=checksum.  It also will gunzip the matrix
         file to perform this change, and gzip it again after the change
         has been made.

         If the names assigned to matrixes (in the load matrix command)
         begin with "ibd" or "mibd", a special defaulting rule applies:
         -1 values found in these matrixes mean that the actual value
         should be taken from the phi2 matrix (for the first matrix) or
         delta7 (for the second matrix).  Furthermore, if -1 is READ FROM
         THE FILE for a diagonal matrix entry, the default is at that
         time applied to every pair including that individual.  (On the
         other hand, if a -1 occurs on the diagonal only because the
         diagonal entry was missing from the file, defaulting would occur
         for the missing diagonal but not for every other pair including
         that individual; the default value for those other pairs would
         be zero.)  This feature had historical importance but is
         considered obsolescent now and is not recommended for use in new
         matrix files.
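As an illustrative sketch, a minimal hand-built CSV matrix file (the IDs and kinship values here are hypothetical) could look like this before gzip compression, with diagonal self-relationship values of 1.0 for every individual:

    id1,id2,matrix1
    101,101,1.0
    102,102,1.0
    103,103,1.0
    101,102,0.5

After compressing with gzip, it could then be loaded and checked as described above:

    solar> load matrix mymatrix.csv.gz phi2
    solar> matrix debug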
A1.58 file-pedigree
Purpose: Describe pedigree data file requirements

Notes:   The pedigree file consists of one record for each individual in
         the data set.  Each record must include the following fields:

             ego ID, father ID, mother ID, sex

         In addition, a family ID is required when ego IDs are not unique
         across the entire data set.  If the data set contains
         genetically identical individuals, an MZ-twin ID must be present
         (as described below).  If an analysis of household effects is
         planned, a household ID can be included (also described below).

         The default field names are ID, FA, MO, SEX, FAMID, MZTWIN, and
         HHID.  EGO, SIRE, and DAM are also accepted by default.  You can
         set up SOLAR to use different field names by using the field
         command (see 'help field').  You do not necessarily need to
         change your names to match ours.

         A blank parental ID or a parental ID of 0 (zero) signifies a
         missing parent.  SOLAR requires that either both parents are
         unknown, i.e. the individual is a founder, or both parents are
         known.  If the pedigree data consists of unrelated individuals
         with no parental data, then the father ID and mother ID fields
         are not required.  If there are parents for whom pedigree file
         records do not exist, then records are created internally for
         those parents, who are assumed to be founders.

         Sex may be encoded as M, m, or 1 for males and F, f, or 2 for
         females.  The missing value for sex is 0, U, u, or blank.

         The MZ-twin ID is used to designate genetically identical
         individuals, e.g. monozygotic twins or triplets.  Each member of
         a group of identical individuals should be assigned the same
         MZ-twin ID.  Twin IDs must be unique across the entire data set.
         If there are no genetically identical individuals in the data
         set, this field need not be present in the pedigree file.

         The household ID, if present, will be used to generate a matrix
         file (house.gz) that can be used later to include a variance
         component for household effects.  Household IDs must be unique
         across the entire data set.

         The family ID field is required only when ego IDs are not unique
         across the entire data set.  For example, if a data set consists
         of nuclear families, and the same ego ID may appear in more than
         one family, then the family ID must be included.  Or if, for
         example, IDs are sequential integers unique only within
         pedigrees, then the pedigree ID must be included.

         At the time the pedigree file is loaded, SOLAR indexes the data
         set.  This indexing is internal and should not be confused with
         any external indexing the user may have imposed upon the data
         set.  The indexing information is stored in a file named
         'pedindex.out' in the directory where SOLAR is running when the
         pedigree data is loaded.  Be careful about deleting files unless
         you are sure they are not needed by SOLAR!

         Once a pedigree file has been loaded, it is not necessary to
         load it again in subsequent SOLAR runs from the same working
         directory.

Shortcuts: file-pe - file-pedigrees
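For illustration, a minimal comma-delimited pedigree file (the IDs here are hypothetical) might look like this, with two founders (blank parental IDs) and their two children:

    id,fa,mo,sex
    101,,,M
    102,,,F
    103,101,102,M
    104,101,102,F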
A1.59 file-phenotypes
Purpose: Describe phenotypes data file requirements

Notes:   The phenotypes file may be in either PEDSYS or comma-delimited
         format.  It consists of one record for each individual.  Each
         record must include an ego ID and one or more phenotypic values
         (which may be blank to signify missing data):

             ego ID, phen 1, phen 2, ...

         (The phenotypes file may also contain other data, such as
         pedigree data.  You could use one file as both your phenotypes
         and your pedigree file, though that is not necessarily
         recommended.  There are fewer possible problems with separate
         files.)

         Just as with the pedigree file, a field named FAMID is required
         when IDs are not unique across the entire data set.  (If your
         ego IDs are unique, it is probably better _not_ to include a
         family ID, as it just complicates things slightly.)

         If your data has probands and you wish to employ ascertainment
         correction, the phenotypes file must have a proband field.  In
         this field, blank ( ) or zero (0) signifies non-proband, and
         anything else signifies proband.  A decimal point is _not_
         permitted after the zero.  The presence of a proband field
         automatically turns on ascertainment correction.

         The default field names are ID, FAMID, and PROBND.  You can set
         up SOLAR to use different field names by using the field
         command.

         The phenotype field names may be anything within certain rules
         (no spaces, tabs, or slashes; also certain special characters
         such as *#,^/-+ can cause problems in the names of phenotypes
         used as covariates).  If you stick with alphabetic characters,
         numeric characters, and underscores, you will be safe.

         The phenotype data fields must be numbers, either with or
         without decimal points.  Zero (0) is always considered a
         permissible value; blank ( ) or null (e.g. no value in between
         the commas ",," in a comma-delimited file) must be used to
         signify missing values.  Floating or fixed point numbers must
         always include a decimal point; numbers without a decimal point
         are assumed to be integers.

         Binary, discrete, or categorical values should be indicated with
         consecutive integers (e.g. 0,1 or 1,2 or 2,3).  SOLAR checks all
         phenotype fields to see if they contain only two consecutive
         integers, and judges them "binary" if they do.  Binary traits
         are automatically handled by the SOLAR discrete trait liability
         threshold modeling code; you don't need to do anything special.
         See Chapter 9 for discussion of what to do with "categorical"
         data that has more than two categories.  Without special
         scripting, categorical phenotypes with more than two categories
         should not be used in SOLAR.  (SOLAR will not identify
         categorical phenotypes with more than two categories, and will
         instead treat them as quantitative phenotypes.)

         The 'load phenotypes' command creates a file named
         phenotypes.info in the working directory.  Once a phenotypes
         file has been loaded, it need not be loaded again in the same
         working directory, unless you change the file itself.

         SOLAR automatically removes from the analysis those pedigrees in
         which no non-proband has all the required phenotypic data.  You
         need not remove these pedigrees yourself.  You will get a full
         accounting of pedigrees and individuals included and excluded in
         the maximization output files (described below), by running the
         'maximize' command, or by giving the 'verbosity max' command
         prior to other commands.

Shortcuts: file-ph - file-phenotypes
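For illustration, a small comma-delimited phenotypes file (the field names and values here are hypothetical) might look like this, with empty fields marking missing values:

    id,age,bmi,affected
    101,45,27.3,1
    102,41,,0
    103,19,22.1,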
A1.60 finemap
Purpose: Set fine mapping threshold for multipoint Usage: finemap <LOD> [<LOD> ...] finemap default finemap off finemap {displays current finemap setting} Example: finemap 0.588 Notes: After each multipoint pass, when the interval is greater than 1, SOLAR will examine all points in the regions around points higher than some threshold. This threshold is set with the finemap command. The default is 0.588. Finemapping can also be turned off. The finemap setting is unimportant when the interval is 1. (Note: versions of SOLAR prior to 1.1.0 did finemapping only around the single highest peak by default.) Shortcuts: fine - finemapReturn to command index
A1.61 fix
Purpose: Constrain a parameter to its current value Usage: fix <name> ; Name is the name of the parameter you want to fix Example: fix h2r Shortcuts: fix - fixReturn to command index
A1.62 fphi
Purpose: Fast test and heritability approximation Usage: fphi [optional -fast -debug -list <file containing trait names> -precision <h2 decimal count> -mask <name of nifti template volume> -evd_data <base filename of EVD data>] -fast Performs a quick estimation run -debug Displays values at each iteration -list performs fast fphi on a list of traits (does not include covariate data) -precision number of decimals to calculate h2r -mask outputs fphi -fast results for the list of voxels from the -list option -evd_data When using the -list option the EVD data option can be used to avoid having to calculate EVD data within the command Fast permutation and heritability inference (FPHI). FPHI is based on the eigenvalue decomposition of the kinship matrix and a search through values of h2r for accurate approximation of heritability values and statistical inference. The default setting is the full search out to 9 decimal places of h2r. The "-fast" option uses the Wald approximation. Both use log likelihood p-value estimation. The default setting should provide very accurate h2 estimates that are nearly identical to the standard maximum likelihood inference. The h2 values obtained using the Wald approximation are usually within 3% of the classical MLE values. The same functionality is available for GPU computing. Use gpu_fphi for heritability calculations in very large datasets. For details see Ganjgahi et al., "Fast and powerful heritability inference for family-based neuroimaging studies".Return to command index
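A minimal session sketch (trait and file names invented; assumes pedigree and phenotypes files are already loaded):
solar> trait bmi
solar> fphi
For a quick Wald-approximation run over a list of traits, reusing a precomputed eigenvalue decomposition (the base name evddata is assumed to have been produced earlier by create_evd_data):
solar> fphi -fast -list traits.txt -evd_data evddata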
A1.63 freq
Purpose: Process the allele frequency data. Usage: load freq [-nosave] <filename> ; loads freq file freq unload ; unloads allele frequencies freq mle [-nose] [-hwe] [<marker> ...] ; computes MLE allele frequencies freq save <filename> ; saves allele frequencies to a file freq show [<marker> ...] ; displays allele frequencies The 'freq unload' command will not unload allele frequency information for markers with currently loaded genotype data. Frequency data for such markers is unloaded at the time the genotype data is unloaded with the 'marker unload' command. In general, it is not necessary to unload existing frequency data before loading a new freq file; the unloading will be done automatically. However, the 'freq load' command will not replace previously loaded frequency data if MLE allele frequencies have been computed but not saved for one or more markers. In that case, the MLEs must be saved to a file, or the -nosave option must be specified. MLE allele frequencies are computed only for markers with currently loaded genotype data. To load genotype data, use the 'load marker' command. If a marker name is not specified for the 'freq mle' command, MLE allele frequencies will be computed for all markers with currently loaded genotype data. By default, standard errors are computed for the marker allele frequency estimates. These additional calculations result in increased compute time, but can be avoided with the '-nose' option if standard errors are not required. When the '-hwe' option is given, the 'freq mle' command carries out an additional likelihood maximization for each marker, in this case finding MLEs for marker genotype frequencies rather than allele frequencies. A test of whether the assumption of Hardy-Weinberg equilibrium holds is provided by comparing the likelihood of the genotype frequency-based model with that of the allele frequency-based model (which assumes HWE). When this test has been conducted, the associated p-values will be displayed by the 'marker show' command. The file created by the 'freq save' command is written in a format suitable for subsequent loading with the 'freq load' command. If a marker name is not specified for the 'freq show' command, all currently loaded frequency information will be displayed. The currently loaded allele frequency information is stored in the file 'freq.info' in the current working directory. This file persists between SOLAR runs, which means that the allele frequencies which are in effect at the end of a session will still be in effect the next time SOLAR is invoked (from within the same working directory). Notes: The set of markers in the freq file and the set of markers in the marker file do not have to be the same. Allele frequencies will be computed for markers that do not appear in the freq file at the time these markers are loaded with the 'load marker' command. For a description of the freq file, enter 'help file-freq' Shortcuts: fre - freqReturn to command index
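A typical sequence sketch (file and marker names invented):
solar> load marker markers.txt
solar> freq mle -hwe D6S2436
solar> freq save mle.freqs
solar> freq show D6S2436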
A1.64 full_filename
Purpose: Prepend the maximization output directory name to filename(s) Usage: full_filename [<filename>]+ Note: See "help outdir". full_filename is intended for scripts.Return to command index
A1.65 gen_corr
Purpose: Calculates the genetic correlation between two traits. Usage: gen_corr [options --pvalues --debug --evd_data <basename of EVD file names>] All options are optional. --pvalues gives the pvalues of rhog and rhoe --debug displays all output from each iteration for each numerical differentiation delta --evd_data <basename of EVD file names> allows the user to enter the base filename of the output of create_evd_data so that prolonged eigenvalue decompositions across data sets with similar ID sets don't need to be repeated. Example: trait trait_one trait_two gen_corr --debugReturn to command index
A1.66 gpu_fphi
Purpose: Fast test and heritability approximation performed on the GPU Usage: gpu_fphi -list <trait list file name> -o <output file name> [optional -all_gpus -use_covs -thread_size <number of threads used by gpu kernels> --gpus <list of gpu ids separated by commas to be used in analysis> --display-gpus --verbose --batch_size <number of traits to calculate at once per stream> --n_streams <number of gpu streams per gpu> --evd_data <base filename of output of create_evd_data> --nifti <nifti template filename must have extension .nii.gz>] Returns: Outputs H2r, standard error of H2r, and p-value of H2r to <output file name> or the default output filename. Requirements: -Dynamically linked version of solar-eclipse -CUDA capable NVIDIA graphics card with a minimum architecture of 3.5 -Linux version of solar-eclipse -CUDA drivers must be from toolkit 9.0 or later (Not applicable if built locally from source code) They can be downloaded from https://developer.nvidia.com/cuda-10.2-download-archive -list File containing all the traits that gpu_fphi will process -thread_size Number of GPU threads used by GPU kernels; must be 32,64,128,256,512, or 1024 -o <output filename or base output filename if -nifti option is used> default name is gpu-gwas.out -all_gpus Option to select all usable CUDA Capable GPUs -gpus Allows for the selection of certain GPUs from the start using their GPU IDs -batch_size Number of traits to compute at once per stream -n_streams Number of streams to be used per GPU -evd_data Uses precalculated EVD data from the create_evd_data command -nifti Uses a template nifti file to output H2r, standard error, and p-values into image volumes with dimensions and orientation taken from the specified template file -display_gpus Displays all usable GPUs that can be used in analysis -use_covs Includes covariates that were selected with the covariate command Notes: The trait list file should be set up as a file in which all the traits are listed and separated by spaces or newlines. If you are experiencing difficulty in running this program please try the following: -Ensure your GPU/GPUs are NVIDIA CUDA GPUs that have a device architecture greater than or equal to 3.5. -If you wish to run more than one GPU at once be sure your driver's compute mode is set to DEFAULT or 0. This can be checked using the command nvidia-smi -c. -Make sure the CUDA drivers installed are from at least toolkit 9.0 -Try closing all processes currently running on the GPU to free up memory and device occupancy.Return to command index
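An invocation sketch (all filenames invented):
solar> gpu_fphi -list traits.txt -o fphi_results.out -all_gpus -use_covs
To reuse a precomputed eigenvalue decomposition (the base name evddata is assumed to come from an earlier create_evd_data run):
solar> gpu_fphi -list traits.txt -o fphi_results.out -evd_data evddata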
A1.67 gpu_gwas
Purpose: Calculates p-values on an NVIDIA GPU for snps given a loaded phenotype containing columns labeled "snp_" or a plink file that contains ids that correspond to the phenotype and pedigree ids. Warning: With this version of SOLAR gpu_gwas is still in experimental stages. The stability of the command is the issue, not the results of the command. Usage: gpu_gwas -plink <plink base filename> [optional: -all -list <filename containing list of traits> --evd_data <base filename from create_evd_data output> --precision <precision of h2r, must be integer in [1,6]> --stream_count <number of streams per gpu> --batch_size <number of snps processed by a stream at once> --size <number of gpu threads must be 32,64,128,256,512,or 1024> --verbose --gpus <list of gpu ids separated by no space commas> --screen --calibrate <percentage of SNPs to calibrate with> --o <base output filename> --blank_ped ] Requirements: -Dynamically linked version of solar-eclipse -CUDA capable NVIDIA graphics card with a minimum architecture of 3.5 -Linux version of solar-eclipse -CUDA drivers must be from toolkit 9.0 or later (Not applicable if built locally from source code) They can be downloaded from https://developer.nvidia.com/cuda-10.2-download-archive Command Arguments: -plink <plink base filename> Base filename for plink .bed, .bim, and .fam files. This argument is required -o <filename> Uses this filename as opposed to the default output filename -all Selects all available usable NVIDIA GPUs -evd_data <base filename from create_evd_data> Allows gpu_gwas to avoid its own EVD calculation -precision <integer in range [1,6]> Determines how accurately the value of h2 is estimated in null and model calculations. Default: 5 -stream_count <integer number in range [1,6]> Number of GPU streams per GPU. Each GPU has a number of independent streams that read file data, transfer it to the GPU, perform calculations, and finally transfer the results back to the CPU. Default: 2 -batch_size <integer number in range [1,10000]> Number of SNPs calculated per iteration of a GPU stream. It's possible a number within this range could be too large, which will lead to an error or crash. Calibrate it relative to the number of subjects. If no value is specified, then this command will estimate the largest number given memory constraints -size <integer number from set {32,64,128,256,512,1024}> Number of GPU threads used in GPU kernels. Different values cause different runtimes. By default this command will try to estimate the best thread size for the data set -verbose Option that displays progress -gpus <GPU integer IDs separated by commas> List of NVIDIA GPU IDs that the command will use -screen Option to run screen mode, which estimates p-values quickly without having to converge -blank_ped Calculates p-values for SNP data without using a pedigree, must be used with -screen -calibrate <decimal in range (0,1)> Compares the runtimes of different possible options using a fraction of the full data set, which is the argument of the option The GWAS function performs an examination of a genome-wide set of genetic variants and calculates p-values for the association between trait and individual SNPs. The output is saved as a csv file labeled either <trait name>-gpu-gwas.out or <trait name>-gpu-gwas-screen.out unless a name is specified with --o. Covariates can be factored in by running sporadic_normalize on the trait beforehand. Missing SNP values are automatically set to two times the frequency of the major allele.
The -screen option runs a fast version of the algorithm that only computes a rough estimate of the p-value and beta. The -evd_data option is highly recommended for data sets with a large number of subjects. Note that if precision is set to 6 there may be convergence issues, depending on the GPU. By default precision is set to 5. You can test for the optimal thread count (set with the --size option) by running --calibrate. Traits are selected with either the trait command or the --list option if you need to run gpu_gwas on a list of traits. -stream_count is recommended to be set to 2 and --batch_size is recommended to be set to 500 or 1000 depending on the number of subjects. Example: covar age*sex sporadic_normalize -header voxel_list.header -out voxels.csv load phenotype voxels.csv gpu_gwas -list voxel_list.header -plink snp_file --precision 4 --size 256 --batch_size 500 --stream_count 2Return to command index
A1.68 gpu_pedifromsnps
Purpose: Calculates empirical pedigree matrix from a set of PLINK files using NVIDIA GPU hardware Usage: gpu_pedifromsnps -i <filename> -o <filename> -freq <filename> [optional: -corr <decimal> -gpus <list of GPU ids> -all -normalize -batch_size <integer number> -thread_size <integer number> -snp_stride <integer number> -per-chromo -id_list <filename> ] Requirements: -Dynamically linked version of solar-eclipse -CUDA capable NVIDIA graphics card with a minimum architecture of 3.5 -Linux version of solar-eclipse -CUDA drivers must be from toolkit 9.0 or later (Not applicable if built locally from source code) They can be downloaded from https://developer.nvidia.com/cuda-10.2-download-archive Command Arguments: -i <plink base filename> Base filename for plink .bed, .bim, and .fam files. Required -o <filename> Output filename of the matrix calculated. Required -freq <filename from plink_freq command> Filename of the output calculated by the plink_freq command. Required -all Selects all available usable NVIDIA GPUs -snp_stride <integer in range [1,10]> Number of SNPs calculated per GPU kernel block -id_list <Name of file containing subject IDs> Output will only contain subjects with ids included in the specified file -per-chromo Creates a separate matrix for each chromosome found within the plink file -batch_size <integer number in range [1,20000]> Number of SNPs calculated per iteration of a GPU stream. It's possible a number within this range could be too large, which will lead to an error or crash. Calibrate it relative to the number of subjects. If no value is specified, then this command will estimate the largest number given memory constraints -thread_size <integer multiple of 32 no greater than 1024> Number of GPU threads used in GPU kernels. -corr <decimal value> Used to determine the exponent of the variance. Default value is -1. Other values are discouraged -gpus <GPU integer IDs separated by commas> List of NVIDIA GPU IDs that the command will use -normalize Option to normalize the final result such that the diagonal contains only 1's gpu_pedifromsnps is a GPU optimized version of the pedifromsnps command. This command only allows for the method one algorithm that's described in help pedifromsnps. Example (filenames illustrative; snp_freq.out is assumed to have been produced by the plink_freq command): gpu_pedifromsnps -i snp_file -o empirical_pedigree.csv -freq snp_freq.out -all -normalizeReturn to command index
A1.69 grid
Purpose: Find the highest likelihood in the vicinity of marker(s) Usage: grid <marker1> [<marker2> ...] Example: grid APOa D6S2436 Notes: outdir (or trait) and ibddir must previously be specified. ibd matrices for each marker must already have been computed. A model named "null0.mod" is expected in the output directory. That can be created with the polygenic command. Summary output is displayed on your terminal and written to a file named grid.out. An output and model file are saved for each marker with the name <markername>_best_theta. The twopoint command also has a "-grid" option which will grid around every marker evaluated. A special "-bsearch" option sets the point at which a "golden section" search begins. By default, grid single-steps by 0.01 from 0.00 to 0.05 and then begins a golden section search. (This is on the assumption that the peak will be reached before 0.05 in the vast majority of cases.) If you have a significant number of cases above 0.05, you might want to change this, for example: grid mrk -bsearch 0.01 would start the golden section search after 0.01 (which will be faster if the value is expected to be greater than 0.05, but slower if the value is going to be less than 0.05). Note: 0.01 is the smallest value at which the search can begin. On the other hand if you wanted to single-step all the way to 0.10, you would give the command: grid mrk -bsearch .1 Shortcuts: grid - gridReturn to command index
A1.70 gridh2r
Purpose: Grid around the h2r value in polygenic model Usage: gridh2r [-lower <lower>] [-upper <upper>] [-step <step>] -lower <lower> Lowest h2r; default is current value less 0.1 -upper <upper> Highest h2r; default is current value plus 0.1 -step <step> step; default is 0.01 Notes: polygenic should be run first. Only univariate models with only e2,h2r parameters are supported. Output is written to a file named gridh2r.out in the maximization output directory. The starting model is listed first regardless of whether it is in the range specified. After completion, the model with the best loglikelihood will be loaded, but with the h2r constraint (if any) deleted. This might be the starting model even if it isn't in the specified range. Each point requires a maximization, so they might come out slowly. For full maximization detail, give the "verbosity plus" or "verbosity max" command beforehand.Return to command index
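For example, after a polygenic run, to grid h2r from 0.2 to 0.4 in steps of 0.02 (trait name and values illustrative):
solar> trait q4
solar> polygenic
solar> gridh2r -lower 0.2 -upper 0.4 -step 0.02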
A1.71 gsd2h2q
Purpose: Convert esd,gsd,[qsd1] parameters to standard parameters Usage: ;# trait only required for multivariate model gsd2h2r [<trait>] ;# compute h2r from esd,gsd,[qsd1] gsd2sd [<trait>] ;# compute SD from esd,gsd,[qsd1] gsd2h2q [<trait>] ;# compute h2q1 from esd,gsd,[qsd1] Note: Use the polygsd command to set up the model, and maximize to maximize it first, followed by linkgsd for linkage models. See the documentation for polygsd and linkgsd.Return to command index
A1.72 gsd2h2r
Purpose: Convert esd,gsd,[qsd1] parameters to standard parameters Usage: ;# trait only required for multivariate model gsd2h2r [<trait>] ;# compute h2r from esd,gsd,[qsd1] gsd2sd [<trait>] ;# compute SD from esd,gsd,[qsd1] gsd2h2q [<trait>] ;# compute h2q1 from esd,gsd,[qsd1] Note: Use the polygsd command to set up the model, and maximize to maximize it first, followed by linkgsd for linkage models. See the documentation for polygsd and linkgsd.Return to command index
A1.73 gsd2sd
Purpose: Convert esd,gsd,[qsd1] parameters to standard parameters Usage: ;# trait only required for multivariate model gsd2h2r [<trait>] ;# compute h2r from esd,gsd,[qsd1] gsd2sd [<trait>] ;# compute SD from esd,gsd,[qsd1] gsd2h2q [<trait>] ;# compute h2q1 from esd,gsd,[qsd1] Note: Use the polygsd command to set up the model, and maximize to maximize it first, followed by linkgsd for linkage models. See the documentation for polygsd and linkgsd.Return to command index
A1.74 gwas
Purpose: Calculates p-values for snps given a loaded phenotype containing columns labeled "snp_" or a plink file that contains ids that correspond to the phenotype and pedigree ids. Usage: gwas [optional: -plink <plink base file name> -fix -list <trait list filename> -np <number of permutations> -precision <h2 decimal count> -evd_data <base filename output of create_evd_data> -use_covs -batch_size <number of SNPs to compute at once> -screen ] For single mode gwas --single-snp <snp name> --list <trait list filename> --precision <h2r decimal count> The GWAS function performs an examination of a genome-wide set of genetic variants and calculates p-values for the association between trait and individual SNPs. The output is saved as gwas.out in the directory of the trait. Covariates can be factored in by running sporadic_normalize on the trait beforehand. Using a plink file to load the snps into the function is highly recommended for large data sets since the file format reads into memory significantly more quickly than csv files. This file type requires a .bed, .fam, and .bim file to work. This command's default method performs a full model estimate to compute p-values, SNP beta values, standard error of beta, h2r for SNP model, and standard deviation. The permutation option performs a quick permutation method to estimate p-values. The fix option will impute missing SNP values with a mean so null and EVD data can be reused for all SNP data instead of recalculating it. Fix is not applicable for the permutation method because permutation by default imputes missing SNP values. In order to use the -list option the SNP data must be loaded from a plink file set, and the -list option cannot be used with the permutation method. The -precision setting changes the precision of h2 from the default 6 to the specified value. -batch_size sets the number of SNPs to be read and computed at the same time. -use_covs allows for covariates to be included in -fix analysis; covariates are selected through the covariate command -evd_data reads EVD data created through create_evd_data, and can only be used with the -fix option -screen runs screen mode, which quickly computes an estimate of a beta, beta standard error, and a p-value Single SNP mode calculates the GWAS on a list of traits for a single SNP within the loaded phenotype. Output is written to gwas.out in the trait's directory, except when -list is used. In that case output is written to the main directory as *-gwas.out. If you plan to run gwas with the fix option -fix and have a data set with a very large number of subjects, consider using create_evd_data prior to running gwas.Return to command index
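A session sketch of the default (-fix) method (plink base name and trait invented; the evddata base name is assumed to have been produced beforehand by create_evd_data):
solar> trait q4
solar> gwas -plink snp_file -fix -evd_data evddata
For a quick screening pass instead:
solar> gwas -plink snp_file -screen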
A1.75 h2power
Purpose: Perform heritability power calculations Usage: h2power [-prev] [-grid {<from> <to> <incr>}] [-data <fieldname>] [-nreps <nreps>] [-seed <seed>] [-overwrite] [-plot] [-nosmooth] h2power -restart [-grid {<from> <to> <incr>}] [-nreps <nreps>] [-plot] [-nosmooth] This command performs a power calculation for the currently loaded pedigree, with the following default assumptions: (1) the trait to be studied is either quantitative or dichotomous (e.g. affected/unaffected) (2) the trait to be studied is influenced by additive genetics (3) all pedigree members will be phenotyped for the trait to be studied (unless the -data option is used to exclude those individuals who will not have phenotypic data; see the description of this option below) Simulation is used to estimate the frequency with which one would expect to obtain a significantly non-zero estimate of heritability given that a specified fraction of the phenotypic variance is due to additive genetics. Twice the difference in the loglikelihoods of the polygenic and sporadic models is asymptotically distributed as a 1/2:1/2 mixture of a chi-square random variate with one degree of freedom and a point mass at 0. A result is considered significant if the probability of obtaining the observed chi-square value, in the absence of a genetic effect, is less than or equal to .05. The default is to perform 10 replicates of the simulation for each heritability in the range .01, .02, .03, ..., .99. For each replicate, a polygenic model is fitted to the simulated data, and the resulting heritability estimate and chi-square statistic are recorded. The observed chi-squares are converted to power, i.e. the power to detect the corresponding observed heritability at a significance level of .05. The following options give the user some control over the power calculation procedure: -prev If the trait to be studied is dichotomous, SOLAR will assume the existence of an unobserved liability distribution. Individuals with liabilities above some threshold value will be "affected", i.e. they will have the larger of the two trait values (for example, a 1 for a 0/1 trait.) The -prev option is used to specify the "disease" prevalence, or fraction of individuals who are "affected", which in turn determines the liability threshold. -grid Specify the set of heritabilities for which power will be computed. At each grid point, trait data having that expected heritability are simulated, sporadic and polygenic models are fitted to the data, and the loglikelihoods of the models are compared. The observed chi-square test statistics are averaged to obtain the expected chi-square value for that heritability. The grid is given by a set of three numbers enclosed in curly braces: {<from> <to> <incr>} where <from> is the starting heritability, <to> is the last heritability considered, and <incr> is the interval between grid points. If the desired grid consists of a single effect size, the three-number list can be replaced by that single number and curly braces are not required. -data Exclude individuals from the power calculation who are missing data for phenotype <fieldname>. -nreps Perform <nreps> simulations at each grid point. The default number of replicates is 100. -seed Set the random number generator seed. The default is to set the seed based on the date and time. -plot At the end of the power calculations, display a plot of power versus QTL heritability. To display this plot for a previously completed calculation, use the command "plot -h2power". 
-nosmooth By default, the power curve is smoothed by fitting a line through the observed chi-square values as a function of the heritability squared prior to converting the chi-square values to power. This option turns the smoothing off. -overwrite (or -ov) Overwrite the results of a previous power calculation. -restart (or -r) Restart a power calculation. Notes: It is possible to change the grid of heritabilities and the number of replicates when restarting a calculation. The calculation will not be restarted if a grid is chosen that does not include all the points in the previously specified grid unless the -overwrite option is included, in which case the simulation replicates for any extra grid points are discarded. Similarly, the -overwrite option is necessary if fewer replicates are requested than were done previously, in which case any extra replicates are discarded. The following files are created: h2power.out A space-delimited file containing a line for each grid point in the format X Y, which is suitable for input to plotting packages such as xmgr. The first (or X) column contains the heritability. The second (or Y) column contains the power. h2power.info Stores the various options selected along with the chi-square statistic, averaged over the replicates, at each grid point. h2power.chis Stores the results of the simulation replicates run at each grid point. This file, along with h2power.info, is used to restart an interrupted power calculation. During a power calculation, various files named "simqtl.*" are created along with a trait directory named "simqt". These will be removed at the end of the run.Return to command index
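For example, to estimate power over heritabilities from 0.05 to 0.5 in steps of 0.05, with 500 replicates per grid point, a fixed seed, and a plot at the end (all numbers and the pedigree filename illustrative):
solar> load pedigree ped.csv
solar> h2power -grid {0.05 0.5 0.05} -nreps 500 -seed 12345 -plot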
A1.76 help
Purpose: Display instructional messages Usage: help ; Displays list of commands help <command> ; Displays doc for one command help -output <filename> ; Write list to a file help <command> -output <filename> ; Write doc to a file help -user ; Display user commands Notes: See also the doc and usage commands. The help for any particular command is quite detailed and may cover several pages. The usage command provides a very brief summary of command options, and does not invoke the 'more' pager so it will stay on your window while you enter the next command. See help for "helpadd" to see how you can add your own help messages to the help system. To display all the user commands, give the command "help -user". To display the help for any particular user command, the ordinary "help <command>" will work, because it searches user tcl files if <command> is not found in SOLAR tcl files. Shortcuts: hel - helpReturn to command index
A1.77 helpadd
Purpose: Explain how to add more help messages to SOLAR SOLAR first looks for a help message for a particular command in the main SOLAR tcl package file solar.tcl. Next, it looks for help messages in all *.tcl files in the current tcl search paths: . (The current working directory) ~/lib (A lib subdirectory of your home directory) $SOLAR_LIB (the lib subdirectory of the SOLAR installation. This symbol is created by SOLAR.) All help messages are commented text blocks beginning with a command header and ending with a line containing "#-" or any line with no leading "#". # <packagename>::<commandname> [--] [private|public] The packagename is taken from the filename: it is the segment of the filename preceding the first period (.). For example, the packagename for solar.tcl is "solar" and the packagename for john.solar.tcl is "john". Each help block should contain at minimum a purpose and usage section as shown in example below. The help display of all commands available is taken from all the commands defined in solar.tcl and other files in $SOLAR_LIB. The "help -user" command shows all commands for which help is available defined in . and ~/lib. To block display of this command in "help -user" include the word "private" following the command name. Example: File named multiproc.tcl defines a command named mmultipoint. The file is placed in ~/lib and has a help header that looks like this: # multiproc::mmultipoint -- public # # Purpose: Spawn multipoint jobs per chromosome # # Usage: mmultipoint <minlod> #- mmultipoint uses a private procedure mspawn: # multiproc::mspawn -- private # # Purpose: Launch job on cluster machine # # Usage: mspawn <scriptfile> #- Shortcuts: helpa - helpaddReturn to command index
A1.78 hlod
Purpose: Heterogeneity test for linkage Usage: hlod [-step <stepsize>] <stepsize> is size of increment in h2q1 and h2r. Default is 0.1, 0.05, 0.02, or 0.01 chosen to be about 1/10 of total heritability in starting model (with a minimum of 8 test points plus maximized h2q1 and total heritability). Notes: Linkage test model must be loaded at beginning. Corresponding null model will be found in current output directory. Complete results are written to hlod.out in maximization output directory. Linkage model may only have 1 linkage element; null model is "polygenic" (null0) model in current output directory. H0 vs. H1 test is only considered significant if p < 0.0001 (LOD 3). If this is not significant, there will be a warning that the H1 vs H2 test makes no sense. hlod uses the "homo" program written by Harald Goring, for which documentation is provided in Appendix 5 of the documentation (use "doc" command to browse, click on Table of Contents).Return to command index
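A usage sketch (the model name is hypothetical; load whatever linkage model your multipoint run saved in the trait output directory, with its null0 model present there):
solar> load model q4/mylinkmodel
solar> hlod -step 0.05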
A1.79 house
Purpose: Enable analysis of household effects Usage: house ; enable analysis of household effects house -suspend ; temporarily suspend household effects ; (the c2 parameter is constrained to zero) house -delete ; delete household effects (c2 is deleted) Examples: ** new model ** Note you must give house command after trait or model new solar> model new solar> trait weight solar> covar age^1,2#sex solar> house ; activates h. effects solar> polygenic -screen ; polygenic analysis w. screening ** old model ** solar> load model poly solar> house solar> polygenic Notes: This may be used for any common environmental effects (not necessarily "household" effects). The house command changes the current model as follows: 1) A parameter c2 (c stands for "common") is created 2) The house matrix is loaded 3) A c2*house term is added to the omega 4) The c2 parameter is added to the e2 + ... = 1 constraint 5) The starting value of c2 is carved away from the value of e2 so that the constraint remains satisfied The pedigree file must contain a HHID field; if it does, the 'load pedigree' command produces a matrix named house.gz, and that matrix will be used. If house.gz is not present, this command will fail. Note that you can map HHID to any particular field in your pedigree file using the "field" command. WARNING! If you load a pedigree without a HHID field (or a field mapped to it with the field command) a pre-existing house.gz, now assumed to be obsolete, will be deleted. This is to prevent you from making the mistake of using an obsolete house.gz. HHID can be a number or an alphanumeric name (with no internal spaces or tabs) but the number 0 (zero) has a special meaning. Zero indicates singleton households--each individual with HHID of zero is a separate household, not associated with other individuals having HHID of zero. Blank or null has the same effect as zero. The 'house' command should be specified after such commands as automodel, trait, polymod, or spormod (which initialize polygenic or sporadic models) and/or just before commands which maximize models such as 'polygenic,' 'maximize,' or 'multipoint.' This is because "polygenic" or "sporadic" models, by definition, do not have household effects. But the polygenic command will do the household "analysis" if it detects the presence of a c2 parameter which is not constrained to zero. We define the following model classes: sporadic (covariates only...and e2) household (covariates and c2) polygenic (covariates and h2r) household polygenic (covariates, c2, and h2r) To create a pure "household" model with no active genetic component, give the commands "spormod" and "house" in that order after setting up the trait(s) and covariate(s). By default, if a household element is in the model, pedigrees will be merged whenever individuals in separate pedigrees share the same household. The resulting groups are called "pedigree-household" groups. This may significantly increase memory requirements. Pedigree merging is controlled by two SOLAR options (see the option command). The default of 1 for MergeHousePeds means that pedigree merging, as just described, will be done. This feature may be changed by setting MergeHousePeds to zero prior to the polygenic or maximize commands: solar> option mergehousepeds 0 solar> polygenic -screen The MergeAllPeds option combines all the pedigrees into one large group if set to 1. This is an alternative simpler method of merging, but it may increase memory requirements even more. Shortcuts: hou - housesReturn to command index
A1.80 howmanyranch
Purpose: Show how many ranch machines a user is using Usage: howmanyranch <userid> Notes: See also whoranch. Requires use of Grid Engine software.Return to command index
A1.81 ibd
Purpose: Compute marker-specific IBDs. Usage: ibd [-nomle] <marker> [<marker> ...] ; computes IBDs for specified markers ibd [-nomle] ; computes IBDs for all markers ibd mito ; computes mitochondrial IBDs ibd export [-file <filename>] [-overwrite] [-append] [-nod7] [-ibdid] [<marker> ...] ; writes IBDs for specified markers ; to a file in comma delimited format ibd import [-file <filename>] [-nod7] [-ibdid] [<marker> ...] ; reads IBDs for specified markers ; from a file in comma delimited format ibd prep <program> [-version 2.82] [<marker> ...] ; prepares input files needed to compute ; IBDs using <program>, where <program> ; is loki, simwalk (sw), merlin, or ; genehunter (gh) Before any ibd command can be run, the directory in which to store the IBDs must be specified with the 'ibddir' command. This specification is stored in a file ibddir.info in the working directory, so it need not be repeated in future sessions from the same working directory. The first record in all matrix files produced by SOLAR, including IBD matrix files, is a checksum and not real data; see matcrc command for details. This checksum is optional in user created matrix files. If present, it prevents using a matrix with a different or changed pedigree. In the absence of prior knowledge of marker allele frequencies, it is recommended that the 'freq mle' command be used to compute maximum likelihood estimates of the allele frequencies. This will improve the accuracy with which missing genotypes are imputed in the IBD computation process. IBDs will not be computed for markers with simple-count allele frequencies generated by the 'load marker' command, i.e. MLE allele frequencies are required when prior frequency data is not available. To compute IBDs using the simple-count allele frequencies instead of MLEs, specify the -nomle option. Alternatively, the NoMLE option can be set using the 'ibdoption' command. The method used to compute marker-specific IBDs will depend on the family structure and will be selected automatically. It is possible to choose the Monte Carlo method regardless of the automatic selection by using the 'ibdoption' command. For performance reasons, the Monte Carlo method will be used automatically for completely-typed markers. Mitochondrial IBDs are a special case. Each pair of individuals who share a common maternal lineage, i.e. who have inherited the same mitochondrial DNA, will be assigned an IBD value of 1, while all other pairs are assigned an IBD value of 0. The necessary information is completely contained in the pedigree data. Hence, there is no mitochondrial marker data to load, nor are allele frequencies required. The 'ibd export' command outputs the IBDs for a specified set of markers into a comma delimited file. The IBDs must be stored in the directory named in the 'ibddir' command. If no marker names are given, then all IBDs found in the 'ibddir' directory are exported. By default, the SOLAR indexed IDs (IBDIDs) in the IBD files are translated to permanent IDs, and family IDs are included when present in the pedigree file. The default name for the output file is "solar-ibd-export.out". The default fields in the output file are MARKER, [FAMID,] ID1, ID2, IBD, and D7. The options for the 'ibd export' command are -file (or -f) Export IBDs to this filename. -overwrite (or -ov) Overwrite existing output file. -append (or -a) Append IBDs to existing output file. -nod7 Don't include D7 field from IBD files. -ibdid Write out SOLAR indexed IDs (IBDIDs) rather than permanent IDs.
The 'ibd import' command inputs the IBDs for a specified set of markers from a comma delimited file. IBD files are written and stored in the directory named in the 'ibddir' command. If an IBD file for an imported marker already exists, it is overwritten. By default, the permanent IDs in the input file are translated to SOLAR indexed IDs (IBDIDs). Family IDs must be included in the input file when they are present in the pedigree file. The default name for the input file is "solar-ibd-import.in". The default fields in the input file are MARKER, [FAMID,] ID1, ID2, and IBD. If the input file does not contain a D7 field, all D7 values in the IBD files are set to zero. By default, all IBDs in the input file are imported. However, if markers are specified on the command line, then IBDs are imported for those markers only. If one and only one marker is specified on the command line, a MARKER field is not required in the input file. The order of the markers in the input file is unimportant, but all the lines for a given marker must be adjacent. Unless there is inbreeding in the pedigree file, checks are made to ensure that imported IBDs for parent-offspring and self pairs are correct (0.5 and 1, respectively). An option is provided to make the parent-offspring error checking appropriate for X-linked markers. Checks are also made to ensure that imported IBDs for unrelated individuals are equal to 0. The options for the 'ibd import' command are -file (or -f) Import IBDs from this filename. -nod7 Don't take D7 from input file; set D7 to zero instead. -ibdid Input file contains SOLAR indexed IDs (IBDIDs) rather than permanent IDs. -xlinked Use error checking appropriate for X-linked markers Notes: The computed IBDs are stored in gzipped files with names of the form 'ibd.<marker>.gz', where <marker> is the marker name. All working files created during the IBD computation process will be stored in the marker-specific subdirectories created by the 'marker load' command. Mitochondrial IBDs are stored in the gzipped file 'ibd.mito.gz'. If a marker exists in the marker data that is named 'mito', the same file name will be used to store the IBDs for that marker. Hence, the marker name 'mito' should not be used if you intend to use the 'ibd mito' command. Shortcuts: ibd - ibdReturn to command index
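A typical sequence sketch (directory and file names invented):
solar> ibddir gaw10ibd
solar> load marker markers.txt
solar> freq mle
solar> ibd
solar> ibd export -overwrite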
A1.82 ibddir
Purpose: Set directory in which IBD matrix files are stored (twopoint only; use mibddir to set up multipoint) Usage: ibddir <dirname> ; set directory for IBD files ibddir ; show current ibddir ibddir -session ; show ibddir entered in this session Notes: The ibddir selected is saved in file ibddir.info for future SOLAR sessions. Once an ibddir is selected, it need not be selected again within the same working directory, EXCEPT for the purposes of writing out ibd files. To prevent accidentally overwriting pre-existing ibd files, it is necessary to explicitly enter the ibddir command before using the ibd command or other commands which write files into the ibddir. Shortcuts: ibdd - ibddirReturn to command index
A1.83 ibdoption
Purpose: Set or display IBD processing options. Options: XLinked select this option to load X-linked marker data NoMLE if this option is chosen, MLE allele frequencies are not required for IBD calculation MCarlo if this option is chosen, the Monte Carlo method will be used to calculate IBDs MibdWin size (in cM) of the multipoint IBD window - the MIBDs at a given chromosome location depend only on markers inside or on the boundary of the window centered at that location Usage: ibdoption ; displays current IBD options ibdoption xlinked ; toggles the XLinked option ibdoption xlinked <y/n> ; sets the XLinked option ibdoption xlinked ? ; displays the current setting of XLinked ibdoption nomle ; toggles the NoMLE option ibdoption nomle <y/n> ; sets the NoMLE option ibdoption nomle ? ; displays the current setting of NoMLE ibdoption mcarlo ; toggles the MCarlo option ibdoption mcarlo <y/n> ; sets the MCarlo option ibdoption mcarlo ? ; displays the current setting of MCarlo ibdoption mcarlo # <num> ; sets number of imputations ibdoption mcarlo # ; displays number of imputations ibdoption mcarlo max <y/n> ; choose max risk for first imputation? ibdoption mcarlo max ? ; displays max risk option ibdoption mibdwin ; displays the multipoint IBD window size ibdoption mibdwin <size> ; sets the multipoint IBD window size Shortcuts: ibdo - ibdoptionsReturn to command index
A1.84 ibs
Purpose: Compute marker-specific IBS matrices. Usage: ibs <marker> [<marker> ...] ; computes IBSs for specified markers ibs ; computes IBSs for all markers Before the ibs command can be run, the directory in which to store the IBSs must be specified with the 'ibddir' command. Notes: The computed IBSs are stored in gzipped files with names of the form 'ibs.<marker>.gz', where <marker> is the marker name. Shortcuts: ibs - ibsReturn to command index
A1.85 if_global_exists
Purpose: Check if a Tcl global variable exists Usage: if_global_exists <global_name> Returns 1 if global exists, 0 otherwise. Notes: This is used in scripts in an "if" statement. For example: if {[if_global_exists SPECIAL_CASE]} { global SPECIAL_CASE puts "This is case $SPECIAL_CASE" } You do not need to declare the variable "global" before calling if_global_exists. However, you will need to declare it global before setting or using it in a script. Note that all variables declared at the interpreter level (at the solar> prompt) are automatically global. Global variables should not be confused with "shell" variables such as SOLAR_BIN (though all shell variables may be found in the global array "env", for example, $env(SOLAR_BIN)). Global variables are a convenient way of passing variables through many levels of procedure calls without rewriting all the intervening procedures, or across commands on an ad hoc basis. Use of global variables is considered "bad style" by programming purists and other bores. But if they're so smart, why aren't they writing your program? It is true, however, that use of global variables can sometimes introduce bugs and other unintended consequences. Global variables prefixed with SOLAR_ are reserved for use by the standard SOLAR procedures defined in solar.tcl. But solar.tcl might also use unprefixed globals, so it is recommended that users use their own unique prefix to be safe. See Also: remove_globalReturn to command index
A1.86 if_parameter_exists
Purpose: Check if a parameter exists without creating it Usage: if_parameter_exists <parameter_name> Returns 1 if parameter exists, 0 otherwise. Notes: This is used in scripts in an "if" statement. For example: if {[if_parameter_exists h2q1]} { constraint e2 + h2r + h2q1 = 1 }Return to command index
A1.87 inormal
Purpose: Save inverse normal transformation to a file (see also define) IMPORTANT: To create a model using an inverse normal transformation, it is more convenient to use the "define" command, and NOT the inormal command. The "inormal" command itself is for those rare situations where you need to save the inverse normal transformation to a file for some other purpose. Usage: define <defname> = inormal_<phenotype> trait <defname> inormal -trait <trait> [-file <filename>] -out <filename> [-phenfile] [-nofamid] [-class <class>] -class <class> only include ID's when their class variable equals this value (See notes below for obscure forms of the inormal command not recommended for most users.) Notes: For the "define" command, the <defname> can be any name you can make up. The inormal_ prefix may be abbreviated down to inor_ . The <phenotype> is any phenotypic variable in the currently loaded phenotypes file. For the "inormal" command itself, you must use one of the arguments "-phenfile" or "-file <filename>". The "-phenfile" argument is a shorthand way of specifying the currently loaded phenotypes file. The "-file <filename>" argument is used to specify any file. In either case, the file must be in the form of a phenotypes file, with fields for <trait> and ID (and FAMID if required to make the ID's unique). BE SURE TO SPECIFY THE "-out" ARGUMENT FOR THE OUTPUT FILE. The inverse normal transformation of a dataset is performed by the following procedure: The trait values are sorted, and for any value V found at position I in the sorted list, a quantile is computed for it by the formula I/(N+1). The inverse normal cumulative density function (see "normal") is computed for each quantile and stored in an array keyed by ID, and FAMID if applicable. When the value V occurs multiple times, the inverse normal is computed for each applicable quantile, averaged, then the average is what is stored for each ID. These values are accessed when the ID is provided. The array for each trait is deleted by the -free option. See also the "normal" command, which computes normal distribution functions. inormal uses a "normal -inverse" command. OBSCURE FORMS OF THE INORMAL COMMAND Largely for internal purposes, such as the implementation of the define command, there are additional obscure forms of the inormal command which save the inverse normal results in a tcl variable for access on an individual ID basis: inormal -trait <trait> [-file <filename>] -tclvar [-phenfile] [-nofamid] inormal -trait <trait> -id <id> [-famid <famid>] inormal -free <trait> inormal -reset The first form above is like the standard form, except that the -out argument is replaced with a -tclvar argument, signifying that the inverse normal results are to be saved to a Tcl variable associated with the trait name. In the second form, a result is obtained from previously stored data for each ID. In the third form, stored data for a particular trait is freed. In the fourth form, all stored data is freed. The -out and -tclvar arguments cannot be used at the same time. If the -out argument is used, inverse normals are simply written to a file and nothing is stored, so the second form cannot be used. FAMID should only be specified if required. The rules regarding FAMID are almost identical with those used during maximization, so that in general you don't have to think about them. 
If a FAMID field is found in both the pedigree and phenotypes files, or if the pedigree file isn't loaded (which wouldn't be allowed during maximization) and FAMID is found (only) in the phenotypes file, FAMID is automatically required, unless the -nofamid argument is used. If FAMID is found in only one of the two files (and both are loaded), a test for ID uniqueness is performed; if ID's are unique without FAMID, it is not required; otherwise FAMID is required and, if not present, it is an error. FAMID can be mapped to any other field name using the field command. When using these obscure forms of the inormal command, it is recommended to load the data and then use it in short order, even though the inormal command doesn't intrinsically require this. Internal "inormal" data is not saved from one SOLAR session to the next. BEWARE that "maximize" or any SOLAR command that performs maximization, such as "polygenic" or "multipoint", may clear out inverse normal data stored using -tclvar. Also, if different layers of procedures get inormals on traits with the same name from different files, and their inormal operations overlap, there could be problems. When the -class option is used, the traitname is qualified with a suffix like .SOLARclass.1 (where 1 is the class number). To free such a classed trait, the fully suffixed name must be used. For example, for trait q4 and class 1, the command would be: inormal -free q4.SOLARclass.1Return to command index
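Sketches of both forms (trait and file names invented). For modeling, the define form is preferred:
solar> define inbmi = inormal_bmi
solar> trait inbmi
solar> polygenic
To save the transformed values of a phenotype from the currently loaded phenotypes file to a file:
solar> inormal -trait bmi -phenfile -out bmi.inormal.csv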
A1.88 interval
Purpose: Set cM interval and range for multipoint scanning each chromosome Usage: interval <count> <range> ; set increment count and range interval <count> ; default range is 0-* (* means last) interval ; displays current setting Examples: interval 5 ; Check every 5 cM interval 1 101-109 ; Check every 1 cM between 101 and 109 interval 10 200-* ; Check every 10 cM after <200 cM> interval 0 100 ; Check at position <100 cM> interval -5 *-100 ; Check every 5 cM from last to 100 Shortcuts: interv - intervalsReturn to command index
A1.89 intraitclass
Purpose: Set up trait with classwise inormalization Usage: intraitclass <traitname> [<class>]+ Example: intraitclass averagefa 0 2 3 Notes: Do 'model new' before using this command.Return to command index
A1.90 invert
The "invert" command from version 6 has been renamed "transpose" in version 7. See "help transpose" for more information.Return to command index
A1.91 is_nan
Purpose: Check if value is NaN (Not a Number) Usage: is_nan <number> Returns 1 if number is NaN, 0 otherwise. Notes: This is most useful in scripts: when getting the likelihood or another value using read_model, you should check to be sure it is not NaN due to a maximization convergence error.Return to command index
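A script fragment sketch (the argument form shown for read_model is an assumption; see 'help read_model' for the actual syntax):
if {[is_nan [read_model null0 loglike]]} {
    puts "null0 loglikelihood is NaN -- convergence failure"
}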
A1.92 joinfiles
Purpose: Join files horizontally based on ID's Usage: joinfiles [-all] [<filename>]* [-out <filename>] [-key <keylist>] -list <filename> -chunk <chunksize> -out <filename> Write joined records to this file (default is joinfiles.out in the working directory) -key <keylist> Base join on these key(s) (default is ID or EGO, and also FAMID if FAMID is present in all files) -all Filenames may be patterns with wildcards (including * to match any sequence of characters and ? to match any one character) and/or names of directories whose files will be included. (Files in subdirectories are not included.) When using -all, no system limit on open files is applicable. -list <filename> Include all files listed in <filename>, which has one filename in each line, which may be a pattern with wildcards. Only one -list may be used. When using -list, no system limit on open files is applicable. -chunk <chunksize> The chunk size used in joining files under -all and -list options. By joining only one chunk of files at a time, the system limit on open files is bypassed. The default is 100. Some additional esoteric options are described below in Note 7. Notes: 1) Each file may either be Comma Delimited or Pedsys, and successive files may use different formats. 2) The output file will be Comma Delimited, thus this command also serves to translate one or more Pedsys files to Comma Delimited format. 3) Any field mapping of ID and FAMID to some other name through the "field" command will be applied if the keys are defaulted. Key matching is case insensitive, so the key might be "ID" in one file and "id" in the next. 4) Records will be output for every ID encountered regardless of whether that ID is found in all files. 5) If keys are specified, you'd better know what you are doing. No field name mapping or testing of whether FAMID is required will be done. However, whether defaulted or not, the availability of keys in every file will be tested. 6) If the same filename is repeated in the list of files to join, the repeats are ignored (for technical reasons). If you must join the same file to itself for some legitimate reason (???), copy to another filename first. 7) If the same field name(s), other than the key(s), are found in more than one file, the default action is to rename them in the output file in a way so as to be unique. The following format is used: <field name>.<filename>[.<number>] If adding the filename makes the field name unique, that is all that is done, which makes for a nice new name. For example: q4.gaw10.phen (phenotype q4 in file gaw10.phen) Otherwise, <number> is applied, using the first number (starting from 2 and ending with 30002) that makes the field name unique. Unless there are more than 30000 matching field names, this will guarantee that a unique name will be found and used. Also, with reasonably short names, it is likely that the resulting name will be unique within 18 characters, which is currently required for trait and covariate names. However, uniqueness within 18 characters is not guaranteed as that would require ugly renaming and it's quite possible the 18 character limit may be removed eventually anyway. Uniqueness testing is case insensitive. There are two other optional ways of handling field names which are not unique. These option specifiers may be used anywhere after the command name but apply globally to all files. -uniqueonly Remove fields which are not unique among files (except keys). -norename Don't rename fields that are not unique, just include them.
(Note: If this option is applied, the resulting file may cause problems with various SOLAR commands. For example, the "residual" command won't like it even if the duplicated field is NOT used as a trait or covariate.) 8) If the same fieldname is repeated in one file, that field is not included in the output. (Such fields could not be selected as traits or covariates in SOLAR anyway.) This typically occurs when there is a field named BLANK to separate columns in a Pedsys file. Also, fields with the "null" name (zero characters or all blanks) are not included.Return to command index
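Invocation sketches (all filenames invented):
solar> joinfiles phen1.csv phen2.ped -out combined.csv
solar> joinfiles -all batch*.phen -out combined.csv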
A1.93 key
Purpose: Make user key Usage: key <make> <username> Note: This is intended for use only by authorized personnel. Shortcuts: key - keyReturn to command index
A1.94 linkgsd
Purpose: Set up linkage model with esd, gsd, qsd parameters (EXPERIMENTAL) Usage: linkqsd <path>/<mibdfile> linkqsd0 <path>/<mibdfile> Example: model new trait q4 covar age sex polygsd maximize gsd2h2r chromosome 9 10 interval 5 mibddir gaw10mibd multipoint -link linkqsd0 -cparm {esd gsd qsd} Notes: Polygenic parameters must already have been set up (use the polygsd command). Preferably it should have been maximized also (use the maximize command). linkqsd modifies the model currently in memory. linkqsd0 assumes the existence of a null0 model in the maximization output directory, loads that, and then adds the linkage element. We have not worked out suitable heuristics to force maximization of difficult models, so at the present time this parameterization is not as robust as our standard parameterization.Return to command index
A1.95 linkmod
Purpose: Set up parameters and constraints for multipoint linkage model Usage: linkmod [-add [-epistasis]] [-zerostart] [-2p] <ibdfile> [-cparm] [-se] -add means add this to previous linkage elements (otherwise, it supersedes the previous one) -zerostart is used for score analysis (see below) -2p twopoint (ibd not mibd) -epistasis sets up epistasis parameter between new element and previous linkage element. Use with "-add". Not supported for bivariate. -se Turn on standard error calculation option. The default is to turn it off. -cparm "Custom Parameterization" Simply replace old matrix with new matrix. Parameters, constraints, and omega are unchanged. A "prototype" model with suitable matrix, parameters, omega, and constraints must already be loaded. See final note below for more information. Note: if -cparm is specified, standard errors are NOT turned off, but left in whatever state they were in when linkmod was called. Notes: Use the -2p option for twopoint models. linkmod2p is now obsolescent (linkmod -2p is invoked when you give the linkmod2p command). A polygenic or linkage model should already have been created and maximized first. Boundaries are set around existing values assuming this has been done. Multiple linkage terms will be included if Solar_Fixed_Loci is defined. The script multipoint does this. By default, standard error is turned off. You may turn it on again by giving the command 'option standerr 1' after running linkage and before running maximize. The -zerostart option starts the new linkage component at 0. (The linkage component MUST NOT HAVE BEEN ALREADY CREATED!) This is used for score test analysis. The -cparm option requires that a prototype linkage model with all required matrices, parameters, omega terms, and constraints be already loaded. Other than that, however, it ATTEMPTS to be as general as possible. However, it is necessary to make one assumption regarding the name of the first matrix. If the -2p option is specified, the relevant matrix that will be modified must be named ibd or ibd1, ibd2, etc. Otherwise, the relevant matrix must be named mibd or mibd1, mibd2, etc. It is the ibd or mibd matrix with the highest number, if any, which will be replaced. If a second matrix column such as d7 or delta7 is included, it will be assumed to be included in the replacement matrix as well. This option is used by "multipoint -cparm" and "twopoint -cparm". Shortcuts: linkm - linkmodelReturn to command index
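An invocation sketch (the mibd file path is invented; in routine use the multipoint command calls linkmod for you, after a polygenic model has been maximized):
solar> linkmod gaw10mibd/mibd.9.10.gz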
A1.96 linkqsd
Purpose: Set up linkage model with esd, gsd, qsd parameters (EXPERIMENTAL) Usage: linkqsd <path>/<mibdfile> linkqsd0 <path>/<mibdfile> Example:
    model new
    trait q4
    covar age sex
    polygsd
    maximize
    gsd2h2r
    chromosome 9 10
    interval 5
    mibddir gaw10mibd
    multipoint -link linkqsd0 -cparm {esd gsd qsd}
Notes: Polygenic parameters must already have been set up (use the polygsd command). Preferably the polygenic model should also have been maximized (use the maximize command). linkqsd modifies the model currently in memory. linkqsd0 assumes the existence of a null0 model in the maximization output directory, loads that, and then adds the linkage element. We have not worked out suitable heuristics to force maximization of difficult models, so at the present time this parameterization is not as robust as our standard parameterization.Return to command index
A1.97 linkqsd0
solar::linkqsd -- Purpose: Set up linkage model with esd, gsd, qsd parameters (EXPERIMENTAL) Usage: linkqsd <path>/<mibdfile> linkqsd0 <path>/<mibdfile> Example:
    model new
    trait q4
    covar age sex
    polygsd
    maximize
    gsd2h2r
    chromosome 9 10
    interval 5
    mibddir gaw10mibd
    multipoint -link linkqsd0 -cparm {esd gsd qsd}
Notes: Polygenic parameters must already have been set up (use the polygsd command). Preferably the polygenic model should also have been maximized (use the maximize command). linkqsd modifies the model currently in memory. linkqsd0 assumes the existence of a null0 model in the maximization output directory, loads that, and then adds the linkage element. We have not worked out suitable heuristics to force maximization of difficult models, so at the present time this parameterization is not as robust as our standard parameterization.Return to command index
A1.98 load
Purpose: Load a user data file (pedigree, phenotype, marker, etc.) Usage: load <object-type> [<options>] <arguments>
    load pedigree <filename>
    load phenotypes <filename>
    load matrix [-sample | -allow] <filename> <name1> [<name2>]
    load matrix [-cols <tcl-list>] <filename> ;# MathMatrix
    load model <filename>
    load freq [-nosave] <filename>
    load marker [-xlinked] <filename>
    load map [-haldane | -kosambi] <filename>
Notes: There is much more information available for each variant of the load command. See the documentation for the particular object type, for example, "help pedigree". For information about a particular file format, see the applicable file-* documentation, for example, "help file-pedigree". Shortcuts: load - loadReturn to command index
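A typical loading sequence at the start of a session might look like this (file names are hypothetical):
    solar> load pedigree gaw10.ped
    solar> load phenotypes phen.csv
    solar> load map -kosambi map9.loc
    solar> load marker gaw10.mrk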
A1.99 load_house
Purpose: Loads house file when using empirical pedigree This command loads a household matrix to be used with an empirical pedigree. The matrix is expected to be in a two-column csv format with the header:
    ID,HouseID
The ID column should match the IDs of the empirical pedigree file. HouseID is a sequential ID that marks subjects from the same household. This file can include only the subjects for whom household information is available. Subjects omitted from the file will be assumed to be from independent households. For example:
    ID,HouseID
    A101,1
    A102,1
    A103,1
    A121,2
    A122,2
    A123,2
will code a matrix in which subjects A101-3 live in the same household and subjects A121-3 form another household.Return to command index
A1.100 loadkin
Purpose: Load a matrix named phi2.gz containing phi2 and delta7 Usage: loadkin Notes: If the file phi2.gz does not exist, this command will be silently ignored. This command is mainly for scripts. You can perform the same action with a known matrix file with: matrix load phi2.gz phi2 delta7 Shortcuts: loadk - loadkinshipReturn to command index
A1.101 lod
Purpose: Calculate LOD score Usage: lod [<test-loglike> <null-loglike>] [<options>] options := [-auto|-off|-raw] [-trait <N>] [-rhoq <N>] [-v] [-1t|-2t|-3t|-4t|-t1|-t2|-t3|-t4] [-nolodadj] If no likelihoods are specified, the likelihoods of the "current" model and the applicable "null" model are used. -auto Convert multivariate LOD to 1df effective LODs based on number of traits in current model and constraint of relevant rhoq's (default) -off Do not convert LODs to 1df effective -raw Do not perform LOD conversion or lodadj -traits <N> Convert multivariate LOD to 1dF assuming <N> traits -1t or -t1 Assume 1 trait (same as "-traits 1") -2t or -t2 Assume 2 traits (same as "-traits 2") -3t or -t3 Assume 3 traits (same as "-traits 3") -4t or -t4 Assume 4 traits (same as "-traits 4") -rhoq <N> Convert multivariate LOD to 1df assuming <N> constraints of relevant rhoq's -nolodadj Do not perform lod adjustment (lodadj) -v verbose: Show adjustment and conversion steps Examples: outdir test1 load model test1/null1 lod lod -v lod -2000.51 -2030.87 lod -trait 3 -rhoq 1 -v -2000 -2030 lod -raw -2000 -2030 Notes: If no likelihoods are specified, the current model must have been maximized through a command such as "maximize," "twopoint", or "multipoint", and the applicable null model should be saved as nullX.mod (e.g. null0.mod, null1.mod) where X is the number of active linkage elements, which is assumed to be one less linkage element than in the current model. Linkage elements are parameters named h2q1, h2q2, etc. The null model must have been saved in the maximization output directory, either named after the trait or set by the outdir command. By default, SOLAR provides easily interpreted "1 df effective" LODs which are equivalent to those in univariate models. However, you can also have complete control over the LOD conversion performed either using arguments here or preferences set globally with the lodp command. Options specified here override the defaults and lodp preferences. The correction of 2 trait LODs to 1dF effective LODs is based on this formula: the LOD is converted to chi square with 1/2 1df, 1/4 3df, and 1/4 point mass at zero. If rhoq is constrained, the formula is 1/2 1df, 1/4 2df, and 1/4 point mass at zero. This is then converted to a 1/2 1df chi square of equivalent p-value, which is divided by 2ln10 to get the 1df effective lod score. The correction of 3 trait LODs to 1dF effective LODs is based on the formula: the LOD is converted to chi square with 3/8 1df, 3/8 3df, 1/8 6df, and 1/8 point mass at zero. For each rhoq constrained, the 6df is changed downward by 1df. The conversion of higher multivariate LODs follows a similar expanding sum. If you wish to see the weights used, use the lod command with the -v option. Empirical LOD adjustment, if any, is automatically applied (see the lodadj command) unless the -raw option is used. Unless you specify -raw, SOLAR will need to search the output directory for a lodadj.info file, which means that a trait or outdir must have been selected. Empirical LOD adjustment is not yet supported for bivariate models. The lodadj value is ignored when bivariate LODs are computed, and, in the cases where the lodadj value would be shown (such as in the multipoint.out file, or if lod is called from the command prompt) a warning message is shown instead. In SOLAR version 3.0.2, the "clod" and "lod" commands were combined into a new "lod" command. 
The options allowed have changed compared with the earlier "clod"; the original "lod" command did not allow any arguments. Use the "lodn" command if the current model may not use the "h2q1" linkage parameter and you are not specifying loglikelihoods explicitly. See also lodn, lodp, lodadj. Shortcuts: lod - lodReturn to command index
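To make the arithmetic concrete: for a univariate model with no adjustment, the LOD is simply the difference of the natural-log likelihoods divided by ln(10). In the example above, "lod -2000.51 -2030.87" yields (-2000.51 - (-2030.87)) / ln(10) = 30.36 / 2.3026, or a LOD of about 13.19, before any lodadj scaling or multivariate conversion.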
A1.102 lodadj
Purpose: Use or calculate an empirical LOD adjustment Usage: lodadj [-calc] [-off] [-null <N>] [-nreps <#replicates>] [-restart] [-cutoff <fraction>] [-overwrite] [-restorenull0] [-query] [-summary]
    lodadj   If no arguments are given, this turns ON the previously calculated empirical LOD adjustment for the current trait/outdir. This value is stored in a file named lodadj.info if currently ON or lodadj.off if currently OFF. It is an error if the null0 model has a later timestamp than the lodadj.info file. (You can update the timestamp of the lodadj.info file with the Unix "touch" command if you are sure it is OK.)
    -off     Turn OFF empirical LOD adjustment.
    -query   Return the LOD adjustment currently in effect (1.0 if none).
    -calc    Calculate and use a new empirical LOD adjustment. (This requires an existing null0.mod file from the polygenic command.) The adjustment is turned ON.
    -null    Use an existing nullN model instead of null0.
    -nreps   Number of replicates. In each replicate, a fully-informative marker, unlinked to the trait, is simulated, IBDs are calculated for this marker, and a LOD is computed for linkage of the trait to this marker. The default number is 10000.
    -restart (or -r) Perform additional replicates, adding the LODs to the set of previously computed LODs, until the total number of replicates (old and new) reaches the number specified by the -nreps argument. The same null model is used as in the previous replicates; the -null argument is ignored if present.
    -cutoff  Specify the fraction of the highest observed LODs that will not be used to compute the empirical LOD adjustment. For example, if the cutoff is .01, then the largest 1% of the observed LODs will be ignored when the LOD adjustment is calculated. The default cutoff is .05.
    -overwrite (or -ov) Recalculate the empirical LOD adjustment. Existing LOD adjustment output files in the trait/outdir will be overwritten.
    -restorenull0 Restore the null0 model in effect at the time the last empirical LOD adjustment was calculated. This will overwrite a later null0 model.
    -summary Display a summary of the LOD adjustment calculation. The summary shows the distribution of the original and adjusted LOD scores, the number of replicates performed, and the name of the null model.
Notes: The -calc option produces output files in the trait/outdir: lodadj.out, lodadj.lods, and lodadj.info. lodadj.out contains summary information, lodadj.lods contains the raw actual vs. theoretical LODs, and lodadj.info contains state information including the null model currently in effect. The lodadj value and state (on or off) is saved in each trait/outdir (by the lodadj.info or lodadj.off file). This is now preserved when restarting SOLAR. lodadj is now supported for bivariate lods. Since the correction is always computed with one additional degree of freedom, the lodadj adjustment is applied AFTER the lod correction to one degree of freedom, and the user is advised not to disable the one degree of freedom correction with the lodp command.Return to command index
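A minimal lodadj calculation session, assuming the trait and covariates are already set up and polygenic has produced null0.mod (the replicate count shown is just the default made explicit):
    solar> lodadj -calc -nreps 10000
    solar> lodadj -query
A later session in the same trait/outdir can simply give "lodadj" with no arguments to turn the stored adjustment back on.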
A1.103 lodn
Purpose: Calculate LOD score for current model relative to nullX Usage: lodn X <options> X Number indicating index of relevant null model (for example, 0 for null0, the model having no linkage elements). <options> See "help lod". Notes: In many cases you can more easily use the "lod" command, which determines the applicable null model automatically, or, you can specify the loglikelihoods. "lodn" may be useful if you are not sure whether the current model contains h2q parameters, for example, if it includes a custom parameterization. The current model must have been maximized, either through the "twopoint" or "multipoint" command, or directly with the "maximize" command. The null model should be saved as nullX.mod (for example, null0.mod or null1.mod) where X is the number of active linkage elements. There are many special options for LOD calculation. See "help lodp" for more information. The primary LOD calculating procedure in SOLAR is "lod" which lets you specify the loglikelihood values and option(s) directly. See also lod, lodp. Shortcuts: lodn - lodnReturn to command index
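For example, if a custom-parameterized linkage model has just been maximized and its no-linkage null model was saved as null0.mod in the maximization output directory, a sketch is:
    solar> maximize
    solar> lodn 0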
A1.104 lodp
Purpose: Change LOD preferences (such as conversion to 1df) Usage: lodp [-auto | -off | [-traits <N>] [-rhoq <N>]] [-1t|-2t|-3t|-4t|-t1|-t2|-t3|-t4] (If no argument is given, current preferences are displayed.)
    -auto Convert LODs to 1 degree of freedom (1dF) effective LODs automatically based on traits and rhoq constraints in current model (default).
    -off Do not perform LOD conversion to 1 df equivalence
    -traits <N> Convert assuming there are/were <N> traits
    -1t or -t1 etc. Shortcuts for 1 trait, 2 traits, etc. up to 4
    -rhoq <N> Convert assuming <N> rhoq's are constrained
Notes: If -traits is specified without -rhoq, -rhoq is assumed to be 0. If -rhoq is specified without -traits, trait count is determined automatically (and might be determined to be 1, in which case rhoq specification is irrelevant). If you need to set both -traits and -rhoq, you must give both in the same lodp command. This should not be confused with lodadj (see). The lodp command sets global preferences for "degrees of freedom" issues which arise with multivariate models. The default "-auto" will apply conversion based on the number of traits in the current model and the number of relevant rhoq's (defined below) which are constrained. LODs will be converted to 1 degree of freedom effective LODs (for which the traditional cutoff in statistical genetics for a genome-wide linkage scan is 3). Relevant rhoq's are parameters prefixed rhoq which correspond to the highest numbered linkage element. For example, in a bivariate linkage model with one linkage element, the relevant rhoq would be "rhoq1", but with two linkage elements, it would be "rhoq2". For a trivariate model with one linkage element, the relevant rhoq's would be: rhoq1_12, rhoq1_13, rhoq1_23. The preferences set by this command will apply to all LOD scores subsequently calculated by SOLAR, including those reported by the twopoint and multipoint commands, and the lod and lodn commands. The lod command, which is what ultimately calculates all LOD scores, has options which are similar to lodp. Changes to lodp preferences apply only within the current SOLAR session, so the command must be re-entered each time SOLAR is started or at the beginning of SOLAR scripts when you need to change the defaults. For more discussion of how the conversion is performed, which rhoq constraints are relevant, etc., see help for the lod command. See also lod, lodn, and lodadj.Return to command index
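For example, at the top of a script that needs raw (unconverted) LODs, or one that should always treat LODs as coming from a two-trait model with one constrained rhoq (both use options documented above):
    lodp -off
or
    lodp -traits 2 -rhoq 1
Remember that these preferences last only for the current SOLAR session.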
A1.105 loglike
Purpose: Get the log likelihood of the current model Usage: loglike Note: This could be used in a TCL script like this: set firstll [loglike] All this procedure does is retrieve the stored loglikelihood. The current model must have been maximized first, either with the maximize command, or with a command like twopoint or multipoint which maximizes and compares many models. If the current model has not been maximized, an error is raised. Shortcuts: logl - loglikelihoodReturn to command index
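A sketch of loglike inside a Tcl script, computing a likelihood-ratio chi-square between two previously saved and comparable models (model names are hypothetical):
    load model q4/null0
    maximize
    set ll0 [loglike]
    load model q4/null1
    maximize
    set ll1 [loglike]
    puts "chi-square = [expr 2.0 * ($ll1 - $ll0)]"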
A1.106 madj
Purpose: Apply current lodadj to a previous multipoint run Usage:
    madj
    madj -restore ;# restore ORIGINAL multipoint files
    madj -undo ;# restore previous multipoint files
Notes: trait or outdir must already have been selected. madj applies the lodadj value from the lodadj.info file in trait/outdir. madj may be used on incomplete multipoint runs prior to restarting. It is not necessary to -restore before applying another lodadj. Some roundoff errors occur in the last decimal place, but do not "accumulate" over multiple runs because LODs are calculated from loglikelihood values, not previous LODs. NULL models must be present. If there is an error, there should either be a "no files modified" or "restoring previous files" message. If not, or if it is desired to restore the ORIGINAL multipoint files for any reason, use the command "madj -restore." The FIRST time madj is run, those files were saved as multipoint*.save. (The PREVIOUS set of multipoint files were also saved as multipoint*.tmp, and may also be restored with the "madj -undo" command.) Shortcuts: madj - madjReturn to command index
A1.107 map
Purpose: Process the map data. Usage:
    load map [-haldane | -kosambi | -basepair] <filename> ; loads map file
    map show [<marker> ...] ; displays map data
    map unload ; unloads map data
    map names ; displays marker names
    map fname ; returns name of map file
    map chrnum ; returns chromosome identifier
    map nloci ; returns number of loci in map
    map func ; returns mapping function code ('b'=basepair, 'h'=Haldane, 'k'=Kosambi)
In the map file, marker locations are typically given in cM. When multipoint IBDs are computed, the distances between pairs of markers are converted to recombination fractions by means of a mapping function. By default, the Kosambi mapping function is assumed. The Haldane mapping function can also be used by specifying the -haldane option when loading the map file. Marker locations can also be specified as integer numbers of basepairs. This is useful, for example, when the markers are SNPs with known offsets in basepairs from some starting location. When basepair locations are used, the mapping function is called "basepair" rather than Kosambi or Haldane, but in fact there is no mapping provided from basepairs to recombination fractions. Therefore, such maps cannot be used to compute multipoint IBDs. In map files which contain cM locations, the first line of the file can optionally include the name of the mapping function, in which case no command line option is required to specify the mapping function. If the load command specifies a different mapping function from that specified in the map file, the load command option takes precedence. Map files which contain basepair locations must either have the basepair mapping function specified on the first line of the file or be loaded using the -basepair option. If a marker name is not specified for the 'map show' command, all currently loaded map data will be displayed. The name of the currently loaded map file and the mapping function are stored in the file 'map.info' in the current working directory. This file persists between SOLAR runs, which means that the map file will still be loaded the next time SOLAR is invoked (from within the same working directory.) For a description of the map file, enter 'file-map' Shortcuts: map - mapReturn to command index
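For example, to load a Haldane map for one chromosome and inspect it (file name hypothetical):
    solar> load map -haldane map9.loc
    solar> map nloci
    solar> map show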
A1.108 marker
Purpose: Process the marker genotype data. Usage: load marker [ -xlinked ] <filename> ; loads marker file marker unload [ -nosave ] ; unloads marker genotype data marker discrep [<marker> ...] ; checks for marker discrepancies marker names ; displays marker names marker show [<marker> ...] ; displays summary of marker data marker fname ; returns name of marker file The '-xlinked' option of the 'marker load' command can be given when loading genotype data for X-linked markers. Alternatively, the XLinked option can be set with the 'ibdoption' command. Genotype data will not be unloaded for markers for which MLE allele frequencies have been computed but not saved to a file. To save MLE allele frequencies, use the 'freq save' command. To unload markers without saving MLE allele frequencies, give the '-nosave' option in the 'marker unload' command. If a marker name is not specified for the 'marker discrep' or the 'marker show' command, the command applies to all markers currently loaded. The state of the currently loaded marker data is stored in the file 'marker.info' in the current working directory. This file persists between SOLAR runs, which means that the markers which are loaded at the end of a session will still be loaded the next time SOLAR is invoked (from within the same working directory.) Notes: The marker load command creates a subdirectory in the current working directory for each marker. Marker subdirectories are named 'd_<marker>', where <marker> is the marker name. The contents of a subdirectory will depend on the type of marker processing performed, and will include various input, output, and (possibly) error files. The marker subdirectories are deleted when the marker genotype data is unloaded. The loci in the marker file must all be autosomal or all be X-linked. By default, SOLAR assumes the loci are autosomal. The set of markers in the marker file and the set of markers in the freq file do not have to be the same. Allele frequencies will be computed for markers that do not appear in the freq file at the time these markers are loaded. For a description of the marker file, enter 'file-marker' Shortcuts: mark - markersReturn to command index
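A typical marker-processing sketch (file names are hypothetical; note that the pedigree must be loaded first):
    solar> load pedigree gaw10.ped
    solar> load marker gaw10.mrk
    solar> marker discrep
    solar> freq save mle.freq
    solar> marker unload
The "freq save" line assumes MLE allele frequencies were computed at load time and that freq save takes a filename argument; the exact syntax is described under the freq command.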
A1.109 markertest
Purpose: Test markerfile for discrepancies; list blankable ID's Usage: markertest <markerfile> [<marker>]* [<option>]
    <option> := -1 | -many | -ped | -fam <famfile> | -2
    <markerfile> is either exact filename or pattern including *
    <marker> (optional) is either exact name or pattern including *
If no markers are specified, all markers in markerfile are tested. Each marker is tested individually. Results are recorded in markertest.out in the current directory. During normal operation, many error and warning messages may appear. Ignore these messages until markertest has finished. If no options are specified, a flexible procedure is used that should work in nearly all cases. It is the same as markertest -1 followed by markertest -many if necessary. IMPORTANT: Read the following two paragraphs to understand how those options work.
    -1     Blank one individual at a time. Report all individual blankings (if any) that fix the discrepancy. If this succeeds, only one of the reported individuals needs to be blanked and it is up to the user to pick the best one. However, this procedure is good (if it works) because it will list all the possibilities, and it is relatively fast. But if it is necessary to blank more than one individual AT THE SAME TIME, this procedure will fail, so it is frequently inadequate by itself.
    -many  Blank the first individual, and, if that doesn't fix the discrepancy, blank the second individual, and then the third, and so on, until the discrepancy is fixed. Then, unblank all the individuals that can be unblanked without a discrepancy returning. The result is one set of individuals that ALL NEED TO BE BLANKED AT THE SAME TIME. It is not necessarily the only such set of individuals, or the smallest set. This procedure should always succeed in finding a set of blankable individuals. (This option used to be named -r.)
    -ped   Rather than blanking only one ID at a time, blank whole "extended pedigrees" at a time. Blankable pedigrees are identified by pedindex.out PEDNO (the index created by "load pedigree") and by the first ID found in that pedigree. This procedure is the fastest, and is useful in identifying errant pedigrees, provided there is only one errant pedigree.
    -fam   Rather than blanking only one ID at a time, blank nuclear families (or other groups) identified by "famno." The "famfile" must contain records including "ID" (permanent ID) and "FAMNO" (other fields will be ignored). There may be more than one record for each ID. Records may not use "FAMID" to disambiguate ID's.
    -2     Try blanking combinations of 2 ID's until one such pair of blankings fixes the discrepancy. Because this is an N x N problem, it may take a VERY LONG TIME to finish, but if you are convinced there is one pair that needs to be blanked, this procedure will find it.
Notes: Pedigree file must already have been loaded. Markerfile must have ID and marker(s) only. Each marker is analyzed separately. Results for all markers are reported in markertest.out, which is divided into one section for each marker. Shortcuts: markert - markertestReturn to command index
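For instance, to test every marker matching D9* in a genotype file, letting markertest choose its default strategy (file name and marker pattern are hypothetical):
    solar> load pedigree gaw10.ped
    solar> markertest gaw10.mrk D9*
The blankable IDs found for each marker are then listed, one section per marker, in markertest.out.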
A1.110 mask
Purpose: To read image mask file and set current voxel Usage:
    mask [<filename>] [-intensity <intensity>] [-index <index>]
    mask -next
    mask -delete
    <filename> is the name of the file containing the mask
    <intensity> is the integer value that defines this mask
    <index> is position within the set of mask-defined voxels
    -next specifies to advance to the next mask-defined voxel
    -delete deletes the mask and frees all related storage
Currently, the file must be NIFTI/RicVolumeSet. The mask is assumed to be volume zero. If intensity is unspecified, the mask is defined as all non-zero intensities. The default starting index is zero, meaning the first voxel with the specified intensity. Indexes are advanced in the order x-first and z-last. The value returned is the x:y:z voxel value, which is also set as the current voxel value, which is saved to model files. The voxel value determines the voxel obtained when reading image files for image traits. See the voxel command for more information about the voxel and how image traits are included in csv files. The mask remains in effect during the current session unless cleared by -delete; it is not cleared by 'model new'. The intent is that successive models can use successive voxels advanced with 'mask -next', which is the most efficient way to advance to the next voxel. It is possible though not required to specify filename, intensity, and index in one command. It is also possible to change the -intensity or -index in subsequent commands. If the intensity is changed, we go back to the first voxel with that intensity. The index is always absolute relative to the first voxel at the current intensity specification. 'mask -next' starts from the current voxel, which is normally set by the mask command itself, but could also be re-specified with the voxel command. If the <filename> has not been specified in the current mask command or a previous mask command, all other options are invalid.Return to command index
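A sketch of stepping through the voxels of a mask across successive models (the file name and intensity are hypothetical):
    solar> mask brain.nii -intensity 1    ;# first voxel with intensity 1
    (set up and maximize a model for this voxel)
    solar> mask -next                     ;# advance to the next voxel
    (set up and maximize the next model, and so on)
    solar> mask -delete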
A1.111 matcrc
Purpose: Prepend pedindex checksum (CRC) to the beginning of a matrix file Usage: matcrc [<path>/]<filename> [[<pedindexdir>] [-notzipped]] If pedindex.out is in a different directory, the optional <pedindexdir> is specified to point to where it is located. It defaults to the current directory "." If the -notzipped argument is given, the input matrix is not gunzipped first, so it must already be uncompressed. Quick Notes: Matrix files in SOLAR include phi2.gz, IBD, and MIBD files. If you have modified a pedindex file directly and are subsequently having pedigree mismatch errors with your matrix files, but are sure everything is correct, you can simply run matcrc on all your matrix files to fix the problem quickly. However it is recommended you read all about matcrc first. Beginning with version 7.5.0, support for gzipped csv matrix files is added. The checksum will have id1 and id2 set to "checksum" and the checksum value will be in matrix1 after the initial decimal point. All other fields are ignored and are simply copied from the first record. Examples:
    matcrc phi2.gz
    matcrc gaw10mibd/mibd.1.10.gz
The first two lines in an unzipped matrix file created by SOLAR since Version 4.0 will look like this:
    1     1  0.1268587045 .61377
    1     1  1.0000000 1.0000000
The first line is a checksum used to verify that the current pedindex file is the same as the one used when the matrix was created. This checksum is not required when the matrix is read, but if it is present, and it does not match the checksum values from the pedindex file, an error will stop maximization until the problem is fixed. The checksum is immediately followed by the actual data for the diagonal identity pair 1,1. When the matrix is actually being read, from beginning to end, the second line values will overwrite the first line values in the memory used for matrix storage. The checksum line will therefore have no effect on maximization results. The numbers in the checksum line are only used for matrix/pedindex validation in SOLAR versions 4.0 and beyond. matcrc is used by all SOLAR procedures that create matrix files since version 4.0. At this time it is optional for user-created matrix files. But we recommend that users writing their own matrix files also use matcrc to postprocess their matrix files so that their matrices are also protected from being used with a changed pedigree. Even a slight change to a pedigree file can change the numbering of all individuals in the pedindex, and thus make a previously created matrix file entirely incorrect. Notes: The relevant pedindex.out file is assumed to be in the current working directory. The matrix file can be in any other directory so long as the relative or absolute path is given. The current user must have write access to the matrix file and to the directory itself, because during matcrc operation the matrix file is unzipped and then rezipped. A Posix compliant "cksum" command is assumed to be available on the system. It produces a quasi-unique polynomial Cyclic Redundancy Check (CRC) on the pedindex.out file; the chances of any different pedindex.out file having the same CRC are astronomically small. This includes changes such as swapping characters or lines. The CRC and Number of Octets (NOCT) is prepended to the beginning of the matrix in such a way as to be backward compatible with earlier versions of SOLAR. The first line in the actual matrix is used as a template, but the coefficients are replaced with CRC and NOCT.
Since this is followed by the same line with the actual coefficients, the actual data overwrites the preceding numbers, and the checksum in the preceding line will have no ill effect with any version of SOLAR, even those which did not use the checksum. Beginning with SOLAR version 4.0, when any matrix is loaded, the CRC and NOCT (if present) are detected and compared with those from the current pedindex.out file. When SOLAR version 4.0 loads a matrix with no checksum, no warning is given, but it may cause a warning or error in some future version. A mismatch in the CRC or NOCT will cause an error when loading. The mismatch can be corrected by running matcrc again, because it detects whether or not the matrix has already been signed, and removes the previous signature if necessary. However this should only be done after the user has carefully validated that the pedigree and matrix match. It would be safest to reconstruct all matrix files with the current pedigree; if this is done by SOLAR procedures, it is not necessary for the user to run matcrc since those procedures do it automatically. User construction of matrix files is discussed in Sec. 8.3 of the manual. Matrix files are space delimited with semi-fixed format. The first two columns are for the two IBDID values, followed by one or two data value columns. The data value columns should begin in character position 14 or higher (counting the first position as 1). Once gzipped, the file should be processed with the matcrc command after loading the pedigree file, to generate a safety checksum value.Return to command index
A1.112 mathmatrix
Purpose: Create and use matrix objects in algebraic computations Usage: For matrices m, n, y, x, vector v, and scalar s: (vector is a 1 dimensional matrix either row or column)
    times [-e] $m $n ;# m or n can be scalar, -e means elementwise
    plus $m $n ;# elementwise addition, or m and/or n can be scalars added to each element
    plus $v ;# unary plus is vector sum for v (only vectors)
    minus $m $n ;# elementwise subtraction, or m and/or n can be scalars
    transpose $m ;# matrix transpose
    inverse $m ;# matrix inverse
    power $m $s ;# raise or lower each element to power $s
    dinverse $m ;# Fast inverse for diagonal matrix. Matrix must be diagonal! This is not checked!
    ols $y $x ;# ordinary least squares (lldt fastest) for y=xb+e
    solve [<method>] $y $x ;# other methods, default is FullPivHouseholderQR
    evalues [$x] ;# eigenvalues; if no x, take from last evectors
    evectors [$x] ;# eigenvectors; if no x, take from last evalues
    mean $v ;# mean of all elements
    min $m|-where ;# returns min of all elements
    max $m|-where ;# returns max of all elements; -where returns {x y} for last min or max
    max $m $s ;# returns matrix with all values less than $s changed to $s...new matrix only if values changed
    insert $m <row> <column> <number> ;# note one based indexing
    concatenate <1 | 2> [$m]* [$v]* ;# concatenate matrices and/or vectors along specified dimension; in place of 1 | 2 may use vertical | horizontal
    show $m [<row> <column>] ;# echo element or entire matrix: if <row> and <column> specified, scalar element is returned; if no <row> and <column>, return entire matrix pretty printed as a tcl nested list (can be input to matrix new)
    output $m <filename> ;# write out matrix as csv file
    row $m <row> ;# extract row as a column vector: 1,2,...
    col $m <col> ;# extract column as a column vector: 1,2,...
    diagonal $m [<offset>] ;# extract diagonal as a column vector; offset to superdiagonal or subdiagonal
    diagonal $v ;# create matrix with v as diagonal (v must be a Nx1 or 1xN matrix aka vector)
    rows $m ;# number of rows
    cols $m ;# number of columns
    shuffle $v ;# shuffle the elements of vector v
    shuffle $v n ;# shuffle the elements of vector v into n-1 cols retaining first column unshuffled
    identity <rows> ;# create an identity matrix
Matrix commands for creating, loading, and deleting matrices:
    matrix new { <row>* }
        <row> = { <number>* }
    load matrix [<option>]* <csvfile>
        <option> = <column-list>|-noheader
        <column-list> = -cols { <column> }
        <column> = [<name>|'<name>'|<index>|<repeating-sequence>|end]+
        <repeating-sequence> = { <number>+ }
Examples of new and load:
    matrix new {{1 2 3} {4 5 6}}
    load matrix -noheader design.mat.csv ;# .mat.csv are headerless
    load matrix -cols {{1} 1 age bmi '2020' end} phen.csv ;# first column is all 1's
Delete and reset commands:
    matrix delete $m
    matrix reset ;# free all MathMatrix storage and ID's; do reset as much as possible to free memory
MathMatrix commands:
    mathmatrix debug on ;# print messages as operations are performed
Notes You can save each returned matrix in a Tcl variable as you do with other operations, such as:
    set X [load matrix -cols {{1} sex age} phen.csv]
    set Xt [transpose $X]
Matrices are returned as internally numbered identifiers, starting from .mm.1 As shown, you may assign these to variables like any other returns in SOLAR. Or you can simply remember the identifiers themselves and use them directly, skipping the assignments and referencing.
This can be convenient for interactive use as it may reduce typing, but normal assignment is better for scripting.
    solar> matrix reset
    solar> load matrix -cols {{1} sex age} phen.csv
    .mm.1
    solar> rows .mm.1
    1009
Actual matrix objects are stored internally in C++. The storage use will continue to increase as more operations are done, as every intermediate matrix created is saved. As soon as convenient you should delete matrices or do "matrix reset" to clear all matrix storage and identifiers. matrix reset frees all the memory used by MathMatrix objects and resets the MathMatrixID to start at .mm.1. It is recommended to do this at the beginning and/or end of matrix calculations. The -cols option may be used in several ways. You can specify the columns to be taken from a CSV file, and in what order. You can specify the columns by name (the natural way), or by column number (the first column is defined as column 1), which would be required for headerless CSV files as produced by other math programs. In addition to taking columns from the file, you can specify single numbers or sequences of numbers which will be repeated to fill up the column. Here is an example of loading a Design Matrix out of two named columns from a CSV file (age and bmi) with a leading column of all 1's:
    load matrix -cols {{1} age bmi} phen.csv
The entire column list must be enclosed in curly braces, and sequences of numbers to be repeated must be further enclosed in an inner set of curly braces. Here is an example of loading a headerless csv file having 4 columns in reverse column order:
    load matrix -cols {4 3 2 1} -noheader oldmat.csv
Numbers within the column list are assumed to be column indexes, starting from 1 for the first column. If you wish to identify columns in a CSV file which happen to have numbers as their names, you must enclose them in single apostrophes. For example, if phen.csv has a column named "2020" you could use it like this:
    load matrix -cols {age bmi '2020'} phen.csv
The evalues and evectors commands allow you to compute a new real pseudo eigen decomposition, or simply echo the eigenvalues or eigenvectors of the previous decomposition. This way you can get either one first, or only if needed. Generally you should extract your eigenvalues or eigenvectors as quickly as possible and certainly before calling any other routines which might also use eigen decompositions. (The pseudoeigendecomposition is the only type currently supported as it guarantees real numbers. A general eigendecomposition would require complex numbers and is not yet supported.) For example, starting with matrix x, you could get the eigenvalues with the evalues command, then get the eigenvectors:
    set values [evalues $x]
    set vectors [evectors]
Or you could do this in reverse order:
    set vectors [evectors $x]
    set values [evalues]
If a matrix filename ends in ".mat.csv" it will automatically be handled as a headerless matrix file and the -noheader argument is not required.Return to command index
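As a small worked sketch combining the operations above to fit y = xb + e by ordinary least squares (the phenotype file and column names are hypothetical):
    matrix reset
    set x [load matrix -cols {{1} age bmi} phen.csv]
    set y [load matrix -cols {q4} phen.csv]
    set b [ols $y $x]                       ;# coefficient vector
    set e [minus $y [times $x $b]]          ;# residuals
    show $b
    matrix reset
The same coefficients could be computed explicitly as [times [inverse [times [transpose $x] $x]] [times [transpose $x] $y]], but ols is the intended and faster route.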
A1.113 matrix
Purpose: Set up sample matrix (phi2 etc.) or math matrix SOLAR has two kinds of matrix, the original sample matrix variables used during maximization (e.g. phi2) and a newer matrix algebra object used for high speed math calculations. For clarity these may be called "sample matrix" and "math matrix". Usage:
    matrix load [<option>] <filename> <name1> [<name2>] ;# loads sample matrix variable(s) (one or two) from file
    set matname [matrix load <filename>] ;# create math matrix from contents of csv file (header line is ignored)
    set matname [matrix new {{1 2} {3 4}}] ; create math matrix from tcl list of lists
    matrix ; displays sample matrices
    matrix delete <name> ; deletes a matrix (either kind)
    matrix delete_all ; deletes all sample matrices
    matrix reset ; deletes all math matrices
    matrix debug ; print info about sample matrices
    matrix -return ; return sample matrix commands
    <option> == -sample ; remove missing ID's from sample
    <option> == -allow ; default missing ID's to diagonal 1
    <option> == ; error if matrix missing ID
Notes: A .gz suffix is appended to the filename, if it is not specified. Matrix files are compressed with (GNU) gzip; gunzip must be installed in the user's path. CSV Sample Matrix Files (recommended for user written matrix files) As of version 7.5.4, matrices can be in CSV format, using user ID (not pedindex) as the key field. The required fields are id1,id2,matrix1. The optional fields are famid1,famid2,matrix2. If FAMID is required to disambiguate ID's, famid1 and famid2 must be included in each record since not all matrices are limited to family interactions. All other fields in a CSV matrix file are ignored. The mapping from ID to pedindex is obtained from the currently loaded pedigree file. It is recommended that users create matrix files in the CSV format. CSV matrix files must be gzipped and a .gz suffix must be at end of filename. For more information, give the command "file-matrix". As of version 7.5.5, matrix files are checked against the model sample during maximization. By default, if a diagonal matrix entry is missing for anyone who would otherwise be included in the sample, an error occurs and messages displaying the missing individuals will be printed to the terminal. There are two options which permit missing diagonal entries. -allow permits such individuals and their diagonal entry will be defaulted to 1.0. -sample removes such individuals from the sample, and a message indicating the number of such removed individuals will be written to the maximization output file. Matrix names beginning with 'ibd' or 'd7' are 'defaultable.' This means that if they have a value of -1, that value is replaced with Phi2 or Delta7 respectively. By default, Phi2 and Delta7 are calculated internally, but may be overwritten with externally provided matrices. (This feature is obsolescent and should not be used in new code.) Traditional Format SOLAR Sample Matrix Files Traditional format SOLAR matrix files are discussed in Sec. 8.3 of the manual. These are space delimited with semi-fixed format. The first two columns are for the IBDID1 and IBDID2 and they MUST have the same fixed width throughout the file. Then there must be one or two data columns, beginning no earlier than column 14 (counting the first position as column 1). The data columns are not fixed format and can be separated by one or more spaces. The columns for IBDID1 and IBDID2 should be right justified and separated by a space.
To see an example of the traditional matrix file format, you can give these SOLAR commands:
    example
    load pedigree gaw10.ped
    gunzip phi2.gz
    head phi2
This will show output that starts like this:
    1     1  0.1268587045 .61377
    1     1  1.0000000 1.0000000
    2     2  1.0000000 1.0000000
    3     3  1.0000000 1.0000000
    4     4  1.0000000 1.0000000
    5     5  1.0000000 1.0000000
    6     6  1.0000000 1.0000000
    7     7  1.0000000 1.0000000
    8     8  1.0000000 1.0000000
    9     9  1.0000000 1.0000000
This sample shows the following things: IBDID1 starts in the first column (column 1) and is 5 columns wide (allowing IBDID's up to 32000, the max allowed). There is one space separating the IBDID's. This should not be a tab and you should not use tabs at all in this file. IBDID2 starts in column 7 and is allowed 5 columns. There are two spaces. VALUE1 starts in column 14 and is free format. There is one space separating value 1 and 2. The very first record in this file is not real data but a checksum for the matching pedigree file. The second record in the file overwrites the checksum values with the actual data for 1,1. The optional checksum is checked against the pedigree file during loading to ensure the matrix corresponds to the actual loaded pedigree. You can add this checksum to any gzipped matrix file by running the matcrc command on it, for example:
    matcrc phi2.gz
This will prepend a checksum for the currently loaded pedigree to the matrix file, deleting a pre-existing checksum if needed. You must be sure the currently loaded pedigree is the correct pedigree for this matrix. This step is optional but we recommend it. Using Matrix Files The preferred way to set up linkage models is simply to let the multipoint, twopoint, and bayesavg commands do everything for you. For more control, you can also use the "linkmod" (multipoint) and "linkmod2p" (twopoint) commands to set up (but not evaluate) linkage models. linkmod and linkmod2p set up all the required parameters and constraints for you. The command "loadkin" will load the phi2.gz matrix file, bypassing the usual on-the-fly calculations performed by SOLAR. Shortcuts: matr - matrixesReturn to command index
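For instance, a minimal user-written CSV sample matrix file using the required fields described above could look like this before gzipping (IDs and values are hypothetical; famid1 and famid2 would be added only if FAMID is needed to disambiguate IDs):
    id1,id2,matrix1
    1001,1001,1.0
    1002,1002,1.0
    1001,1002,0.5
After "gzip mymatrix.csv" it would be loaded with a command such as "matrix load mymatrix.csv.gz mymat".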
A1.114 maximize
Purpose: Find the maximum loglikelihood of a model by adjusting parameter values within specified constraints. Usage: maximize [-quiet] [-out <filename>]
    -quiet (or -q) Use minimum verbosity while maximizing
    -out (or -o) Write results to this filename. The default is 'solar.out' in a subdirectory named after the current trait, or the current 'outdir.' If a filename is specified without any / characters, it will also be written in the default location. If the filename contains / characters, it is used as a full pathname.
    -noquad Do not test quadratic
    -who Do not maximize, but list who would be included in analysis. File "who.out" is written to trait/outdir containing list of pedindex.out sequential ID's. This option is used by the "relatives" command
    -runwho Maximize, AND produce who.out file as above.
    -sampledata Do not maximize, but write out the data that would be included in the analysis to a file named "sampledata.out" in the maximization output directory. [WARNING! The fields in this file are preliminary and subject to change!] The trait data (fields trait1, trait2, etc.) might be from a phenotype or an expression created by the "define" command.
Notes: This is the key command of solar. It is used by polygenic, twopoint, multipoint, bayesavg, and other scripts to find the model parameter values which produce the maximum loglikelihood. The final values are not saved to a model file. To do that, issue a 'save model' command after maximization. Multiple solar processes should not be writing to the same directory. Use the outdir command to specify different output directories. During maximization, if no phi2 matrix has been loaded, one will be loaded from the file phi2.gz. If that file is invalid or unavailable, starting with Version 8.3.1, maximization will not be performed, unless "option modeltype AutoPhi2" is given. AutoPhi2 brings back the original default of calculating phi2 on-the-fly from the pedigree structure in the operative pedigree file. Beware that the on-the-fly numbers are invalid in all cases except the simplest univariate quantitative models, and they are also invalid with empirical pedigrees. Advanced SOLAR users sometimes use the raw "cmaximize" command which bypasses many of the retry mechanisms (and their implicit assumptions) now built-in to SOLAR. This is not recommended for most users. Shortcuts: maxi - maximizeReturn to command index
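A bare-bones sketch from model setup through saving the maximized model (trait, covariate, and file names are hypothetical):
    solar> trait q4
    solar> covariate age sex
    solar> polygenic                  ;# builds and maximizes the standard models
    solar> maximize -q -out remax.out
    solar> save model q4/remax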
A1.115 memory
Purpose: Show total memory used by this SOLAR process Usage: memory Notes: This is intended primarily for internal debugging purposes. Now works on all supported systems. Shortcuts: mem - memoryReturn to command index
A1.116 mg_topedsys
Purpose: Convert comma delimited mga.out to pedsys format Usage: mg_topedsys <input_filename> <output_filename>Return to command index
A1.117 mga
Purpose: Run Measured Genotype (MG) association analysis for every SNP Usage: mga [-files [<gcovfile>]+ ] [-snps <snp-tcl-list>] [-out <outfile>] [-snplists [<snplistfile>]+] [-format csv | pedsys | fortran] [-noevd] [-notsame] [-saveall] [-slowse] [-evdse] [-fixupper <boundary>] [-fixlower <boundary>] [-ixsnp <SNP>] SPECIAL NOTE: no filenames or snp names should begin with hyphen (-) SNPs should be specified by their actual names, but the corresponding file variables should be prefixed by snp_ The snp_ prefix is added automatically by SOLAR command snp covar. Before invoking mga, the user should set up the trait and covariate(s) for the null model. It is not necessary to maximize the model prior to running mga. You may choose to set up the modeltype and omega, but it is not necessary. The omega will default to polygenic and the modeltype will be set to evd (evd1). When default evd1 models are used, mga can automatically detect when the sample has changed, and run new null models as needed. This may be the fastest approach in most cases. To use evd2 instead, specify "option modeltype evd2" just before running mga. For evd2, both -evdse and -slowse produce the same fast evd2 standard errors. Evd2 may be faster for individual pedigrees larger than 3000. At mga completion, mga_null will be the last null model created. If you wish to save more information, use the -saveall option (described below) to save all maximization output files.
    -ixsnp Do an interaction test with the specified SNP and each one of the snps in the <snp-tcl-list> or <snplistfile>. Additional terms for chi's and p values related to the interaction will be included in the output file. DO NOT include this snp in the starting model; it will be added as needed.
    -noevd Do not use EVD fast maximization or sample checking. Normally this is not needed because mga tests whether EVD can be used and changes to -noevd model when EVD cannot be run. However, sometimes EVD uses too much memory, so this can help.*
    -notsame Samples not same, therefore always run models for each SNP! Normally this is not needed because mga tests the sample automatically. If "option samplesametrustme 1" is given prior to running mga, mga will only run one null model for the first sample, unless EVD is used and fast checking is available.
By default (starting with version 7.4.2) standard errors are computed using the formula sqrt(beta^2/chi^2). The -slowse and -evdse options are now considered obsolescent. If standard errors cannot be computed because chi is zero, the value 10e20 (10^20) will be returned.
    -slowse Estimate standard errors, not using EVD
    -evdse Estimate standard errors, using EVD if possible
The -slowse option will disable EVD because standard errors computed when using EVD in the current EVD1 implementation are sometimes inaccurate. The -evdse option will compute standard errors and use EVD if possible, which will be faster if EVD can be used. A warning will be given about possible SE inaccuracy. If no -snps or -snplists arguments are given, mga will process all snp_ prefixed covariates in the currently loaded phenotypes files. <gcovfile> is one or more snp.genocov files generated by the 'snp covar' command. These files are scanned for snp_ prefixed covariates in addition to the initially loaded phenotypes files. <snp-tcl-list> is a tcl-list of snps. If specified, the -snps list supersedes -snplists. <snplistfile> is a file listing the snps to be processed.
Each snp is listed on a separate line with no blanks or other characters. The snp_ prefix is not required, but allowed. Currently, no checking is done to see if any listed snp is found or not found, or duplicated in subsequent files. <outfile> defaults to mga.out in the current output directory, which is cleared out first, then one line of information is produced for each SNP. If another filename is specified, it is located relative to the output directory or full absolute pathname, and it is not erased first to permit accumulation. Note: if the "mgassoc" (original) command name is used, the output file is named mgassoc.out by default.
    -debug Print extra messages useful in debugging, including all maximization output
    -format csv, fortran, pedsys. csv is default. fortran and pedsys formats are identical space delimited except that fortran version has a header line at top, pedsys version writes a code file (.cde). Actually, both fortran and pedsys options write the code file in case you need it later. If you have written pedsys format, it can be converted to comma delimited with the command ped2csv. If you have written a comma delimited format file, you can convert it to pedsys format with the command mg_topedsys.
    -saveall save all solar.out output files. Each output file will be named mga_<snpname>.out. Null model output files will be named mga_null_<snpname>.out based on the first <snpname> they were created for (null will be re-used if sample unchanged).
    -fixlower fix snp beta parameter lower boundaries to this value
    -fixupper fix snp beta parameter upper boundaries to this value
Notes: The genotype covariates are numeric variables giving the observed (or imputed) number of rare alleles an individual has at a particular SNP. These fields must be named 'snp_<snp>' where <snp> is the SNP name. Ex: snp_rs12345 *EVD is not currently available for multivariate or discrete models. However, discrete traits may be handled as quantitative (at some loss in accuracy) and therefore used with EVD by using either "option enablediscrete 0" or (generally preferred, to increase SD): define qtrait = dtrait * 5Return to command index
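A sketch of a typical mga run (file and list names are hypothetical):
    solar> load phenotypes phen.csv
    solar> trait q4
    solar> covariate age sex
    solar> mga -files snp.genocov -snplists mysnps.lst
Results then appear one line per SNP in mga.out in the output directory.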
A1.118 mgassoc
solar::mga Purpose: Run Measured Genotype (MG) association analysis for every SNP Usage: mga [-files [<gcovfile>]+ ] [-snps <snp-tcl-list>] [-out <outfile>] [-snplists [<snplistfile>]+] [-format csv | pedsys | fortran] [-noevd] [-notsame] [-saveall] [-slowse] [-evdse] [-fixupper <boundary>] [-fixlower <boundary>] [-ixsnp <SNP>] SPECIAL NOTE: no filenames or snp names should begin with hyphen (-) SNPs should be specified by their actual names, but the corresponding file variables should be prefixed by snp_ The snp_ prefix is added automatically by SOLAR command snp covar. Before invoking mga, the user should set up the trait and covariate(s) for the null model. It is not necessary to maximize the model prior to running mga. You may choose to set up the modeltype and omega, but it is not necessary. The omega will default to polygenic and the modeltype will be set to evd (evd1). When default evd1 models are used, mga can automatically detect when the sample has changed, and run new null models as needed. This may be the fastest approach in most cases. To use evd2 instead, specify "option modeltype evd2" just before running mga. For evd2, both -evdse and -slowse produce the same fast evd2 standard errors. Evd2 may be faster for individual pedigrees larger than 3000. At mga completion, mga_null will be the last null model created. If you wish to save more information, use the -saveall option (described below) to save all maximization output files.
    -ixsnp Do an interaction test with the specified SNP and each one of the snps in the <snp-tcl-list> or <snplistfile>. Additional terms for chi's and p values related to the interaction will be included in the output file. DO NOT include this snp in the starting model; it will be added as needed.
    -noevd Do not use EVD fast maximization or sample checking. Normally this is not needed because mga tests whether EVD can be used and changes to -noevd model when EVD cannot be run. However, sometimes EVD uses too much memory, so this can help.*
    -notsame Samples not same, therefore always run models for each SNP! Normally this is not needed because mga tests the sample automatically. If "option samplesametrustme 1" is given prior to running mga, mga will only run one null model for the first sample, unless EVD is used and fast checking is available.
By default (starting with version 7.4.2) standard errors are computed using the formula sqrt(beta^2/chi^2). The -slowse and -evdse options are now considered obsolescent. If standard errors cannot be computed because chi is zero, the value 10e20 (10^20) will be returned.
    -slowse Estimate standard errors, not using EVD
    -evdse Estimate standard errors, using EVD if possible
The -slowse option will disable EVD because standard errors computed when using EVD in the current EVD1 implementation are sometimes inaccurate. The -evdse option will compute standard errors and use EVD if possible, which will be faster if EVD can be used. A warning will be given about possible SE inaccuracy. If no -snps or -snplists arguments are given, mga will process all snp_ prefixed covariates in the currently loaded phenotypes files. <gcovfile> is one or more snp.genocov files generated by the 'snp covar' command. These files are scanned for snp_ prefixed covariates in addition to the initially loaded phenotypes files. <snp-tcl-list> is a tcl-list of snps. If specified, the -snps list supersedes -snplists. <snplistfile> is a file listing the snps to be processed.
Each snp is listed on a separate line with no blanks or other characters. The snp_ prefix is not required, but allowed. Currently, no checking is done to see if any listed snp is found or not found, or duplicated in subsequent files. <outfile> defaults to mga.out in the current output directory, which is cleared out first, then one line of information is produced for each SNP. If another filename is specified, it is located relative to the output directory or full absolute pathname, and it is not erased first to permit accumulation. Note: if the "mgassoc" (original) command name is used, the output file is named mgassoc.out by default.
    -debug Print extra messages useful in debugging, including all maximization output
    -format csv, fortran, pedsys. csv is default. fortran and pedsys formats are identical space delimited except that fortran version has a header line at top, pedsys version writes a code file (.cde). Actually, both fortran and pedsys options write the code file in case you need it later. If you have written pedsys format, it can be converted to comma delimited with the command ped2csv. If you have written a comma delimited format file, you can convert it to pedsys format with the command mg_topedsys.
    -saveall save all solar.out output files. Each output file will be named mga_<snpname>.out. Null model output files will be named mga_null_<snpname>.out based on the first <snpname> they were created for (null will be re-used if sample unchanged).
    -fixlower fix snp beta parameter lower boundaries to this value
    -fixupper fix snp beta parameter upper boundaries to this value
Notes: The genotype covariates are numeric variables giving the observed (or imputed) number of rare alleles an individual has at a particular SNP. These fields must be named 'snp_<snp>' where <snp> is the SNP name. Ex: snp_rs12345 *EVD is not currently available for multivariate or discrete models. However, discrete traits may be handled as quantitative (at some loss in accuracy) and therefore used with EVD by using either "option enablediscrete 0" or (generally preferred, to increase SD): define qtrait = dtrait * 5 Shortcuts: mgas - mgassociationReturn to command index
A1.119 mibd
Purpose: Compute multipoint IBDs. Usage:
    mibd relate [-mxnrel <n>] ; creates relative-class file
    mibd merge ; merges marker IBDs
    mibd means [-typed | -all] ; computes mean IBD by relative-class
    mibd [<from> <to>] <incr> ; computes multipoint IBDs
    mibd export [-file <filename>] [-overwrite] [-append] [-nod7] [-ibdid] [-byloc] [<chromo> ...] ; writes MIBDs for specified chromosomes to a file in comma delimited format
    mibd import [-file <filename>] [-nod7] [-ibdid] [<chromo> ...] ; reads MIBDs for specified chromosomes from a file in comma delimited format
    mibd prep <program> [-version 2.82] [-usefreq] [-qter] ; prepares input files needed to compute MIBDs using <program>, where <program> is loki, simwalk (sw), merlin, or genehunter (gh)
    mibd import <program> [-file <filename>] [-version 2.82] ; imports MIBDs from an output file computed by <program>, where <program> is loki, simwalk (sw), merlin, or genehunter (gh)
Before any mibd command can be run, the directory in which to store the mIBDs must be specified with the 'mibddir' command. This specification is stored in a file mibddir.info in the working directory, so it need not be repeated in future sessions from the same working directory. The first record in all matrix files produced by SOLAR, including mIBD matrix files, is a checksum and not real data; see the matcrc command for details. This checksum is optional in user created matrix files. If present, it prevents using a matrix with a different or changed pedigree. The 'mibd relate' command can be run after the pedigree file has been loaded, and only needs to be run once per data set. A tally of the relative classes present in the data set can then be displayed with the 'pedigree classes' command. A pair of individuals may be related in multiple ways, e.g. as 1st cousins and as 2nd cousins. To conserve memory, there is a default limit on the number of ways any two individuals may be related. For some complex pedigrees, it may be necessary to specify a higher limit using the '-mxnrel' option. The remaining commands in the first group of mibd commands shown above must be run once for each chromosome. The 'mibd merge' command must be run first, followed by the 'mibd means' command. The 'mibd means' can take one of two options: -typed or -all. If the -typed option is specified, only the IBDs for pairs of individuals who are both genotyped will be used to compute mean IBDs by relative class. If the -all option is specified, the IBDs for all pairs of individuals are used. The default option is -all. The following steps are required before computing multipoint IBDs for chromosome N:
    1. Compute the marker-specific IBDs for all markers on chromosome N. For more information on computing marker-specific IBDs, enter 'help ibd'.
    2. Load the map file for chromosome N.
    3. Use the 'ibddir' command to specify the directory where the marker-specific IBDs are stored.
    4. Use the 'mibddir' command to specify the directory where the multipoint IBDs are to be written.
Only the last of the first four mibd commands shown above need be entered (for a particular chromosome.) If the merged IBD file does not exist, 'mibd merge' will automatically be run to create it. If the mean IBD file does not exist or is older than the merged IBD file, 'mibd means' will be run. Note that when 'mibd means' is run automatically, the default option, -all, will be used. The 'mibd means' command must be issued directly in order to use the -typed option.
If any of the marker IBD files is newer than the merged IBD file, a warning message will be displayed. In order to update the merged IBD file, the 'mibd merge' command must be issued directly - this will not be done automatically.

The 'mibd export' command outputs the multipoint IBDs for a specified set of chromosomes into a comma delimited file. The MIBDs must be stored in the directory named in the 'mibddir' command. If no chromosomes are specified, then all multipoint IBDs found in the 'mibddir' directory are exported. By default, the SOLAR indexed IDs (IBDIDs) in the MIBD files are translated to permanent IDs, and family IDs are included when present in the pedigree file. The default name for the output file is "solar-mibd-export.out". The default fields in the output file are CHROMO, LOCATION, [FAMID,] ID1, ID2, IBD, and D7, where LOCATION is the chromosomal location in cM.

WARNING: The file to which MIBDs are exported can become very large. To keep export files to a manageable size, it may be best to export MIBDs on a per-chromosome basis, i.e. one export file per chromosome, or on a per-location basis by using the -byloc option.

The options for the 'mibd export' command are

  -file (or -f)         Export MIBDs to this filename.
  -overwrite (or -ov)   Overwrite existing output file.
  -append (or -a)       Append MIBDs to existing output file.
  -nod7                 Don't include D7 field from MIBD files.
  -ibdid                Write out SOLAR indexed IDs (IBDIDs) rather than
                        permanent IDs.
  -byloc                Export MIBDs on a per-location basis, i.e. one
                        export file per location. The export files are
                        given unique names by appending the chromosome
                        number and the location to the filename given by
                        the -file option.

The 'mibd import' command inputs the multipoint IBDs for a specified set of chromosomes from a comma delimited file. MIBD files are written and stored in the directory named in the 'mibddir' command. If an MIBD file for an imported chromosomal location already exists, it is overwritten. By default, the permanent IDs in the input file are translated to SOLAR indexed IDs (IBDIDs). Family IDs must be included in the input file when they are present in the pedigree file. The default name for the input file is "solar-mibd-import.in". The default fields in the input file are CHROMO, LOCATION, [FAMID,] ID1, ID2, and IBD. If the input file does not contain a D7 field, all D7 values in the MIBD files are set to zero. By default, all MIBDs in the input file are imported. If chromosomes are specified on the command line, however, MIBDs are imported for those chromosomes only.

NOTE: The order of the chromosomes and chromosomal locations in the input file is unimportant, but all the lines for a given chromosomal location MUST BE ADJACENT. To be safe, you may want to sort the input file by chromosome and chromosomal location to ensure that the input file is ordered correctly.

The options for the 'mibd import' command are

  -file (or -f)   Import MIBDs from this filename.
  -nod7           Don't take D7 from input file; set D7 to zero instead.
  -ibdid          Input file contains SOLAR indexed IDs (IBDIDs) rather
                  than permanent IDs.

The 'mibd prep' command generates the input files needed to compute multipoint IBDs using a program other than SOLAR. The programs currently supported are Loki, SimWalk2, Merlin, and GeneHunter. Before this command can be run, marker data and a map file must have been loaded. The input files are generated from various files created by SOLAR when pedigree and marker data are loaded and so contain SOLAR indexed IDs (IBDIDs).
The marker locations written to an input file will be in Haldane cM. If the user has loaded a Kosambi map file, the necessary conversion to Haldane is made automatically. By default, IBDs will be calculated at every integer cM location from 0 to the last marker in the map file. The '-qter' option extends the range of locations to the end of the chromosome. For each chromosome, SOLAR defines qter as the nearest integer location greater than or equal to the position (in Haldane cM) of the last marker on that chromosome in the deCODE map.

The allele frequencies in effect, whether read from a file or computed by SOLAR, are passed to the multipoint IBD calculation program, except in the case of Loki, for which the default action is to let Loki estimate the allele frequencies. The '-usefreq' option can be used to force Loki to use the current allele frequencies.

NOTE: After the input files have been created, the user must exit SOLAR and run the external program to compute the multipoint IBDs. Once the IBD calculations are complete, the resulting output file can be imported into SOLAR using the 'mibd import <program>' command.

The 'mibd import <program>' command reads an output file which contains the multipoint IBDs computed by a program other than SOLAR, and imports those IBDs to create SOLAR-format MIBD files. The programs currently supported are Loki, SimWalk2, Merlin, and GeneHunter. This command is designed to work with the 'mibd prep' command, so the output file is assumed to contain SOLAR indexed IDs (IBDIDs), not the real IDs from the pedigree data file. Before this command can be run, the 'mibddir' command must have been given to specify the directory in which the MIBD files are to be stored.

NOTE: For both the 'mibd prep' and 'mibd import' commands, if SimWalk2 is the program chosen, then it is assumed that version 2.91 or a newer version of SimWalk2 will be used to compute the multipoint IBDs. In previous versions of SOLAR, SimWalk2 version 2.82 was assumed. Due to a backward incompatibility in file formats that was introduced in later SimWalk2 versions, if you wish to use the earlier version of SimWalk2, it is now necessary to include the '-version 2.82' option.

Notes: The computed multipoint IBDs are stored in gzipped files with names of the form 'mibd.<chromo>.<loc>.gz', where <chromo> is the chromosome number and <loc> is the chromosomal location. Several additional files are created and used for multipoint IBD calculation:

  mibdrel.ped       relative-class and kinship information
  mibdchrN.loc      marker locations on chromosome N
  mibdchrN.mrg.gz   merged marker-specific IBDs for chromosome N
  mibdchrN.mean     mean IBD by relative class for chromosome N

Shortcuts: mibd - mibd
Return to command index
A1.120 mibddir
Purpose: Set directory in which MIBD matrix files are stored
         (multipoint only; use ibddir to set up twopoint)

Usage: mibddir <dirname>    ; set directory for MIBD files
       mibddir              ; show current mibddir
       mibddir -session     ; show mibddir entered in this session

Notes: The mibddir selected is saved in file mibddir.info for future SOLAR sessions. Once a mibddir is selected, it need not be selected again within the same working directory, EXCEPT for the purposes of writing out mibd files. To prevent accidentally overwriting pre-existing mibd files, it is necessary to explicitly enter the mibddir command before using the mibd command or other commands which write files into the mibddir.

Shortcuts: mibdd - mibddir
Return to command index
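A minimal example session for the mibddir command above (the directory name is hypothetical):

    solar> mibddir mibds    ; set and save the MIBD directory
    solar> mibddir          ; confirm the current setting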
A1.121 miniplot
Purpose: Arrange miniature plots on a single page

Usage: miniplot [-pass <pass>] [-allpass] [-plots <number>] [-port] [-land]

       See also "plot -all"

  -pass       Do this pass number (default is 1)
  -allpass    Do all passes, each on a separate page
  -plots      Put this many plots on a page
  -port       Portrait layout
  -land       Landscape layout
  -nodisplay  Generate postscript, but don't display

An output file named passN.out (pass01.out for pass 1) is created in the trait/outdir directory. The trait or outdir must have been specified previously, and the individual chromosome plots must have been created previously using the "plot" command. In fact, "plot -all" or "plot -allpass" will invoke miniplot automatically.

This requires that Python be installed (version 1.5.2 and later is known to work; earlier versions may also work). If you do not have Python, use "plot -string" instead.

Shortcuts: minipl - miniplot
Return to command index
A1.122 model
Purpose: Describe, save, or load a model

Usage: save model <modelname>   ; save current model to a file
       load model <modelname>   ; load model from a file
       model                    ; display model on terminal
       model new                ; reset to new empty model

Notes: An extension .mod is automatically appended if not specified. You must specify a directory path if you want to save the model in a subdirectory of the current directory.

Shortcuts: model - model
Return to command index
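For example, a session might save a maximized polygenic model and restore it later (trait name and model path are hypothetical):

    solar> trait q1
    solar> covariate age sex
    solar> polygenic
    solar> save model q1/null0    ; saved as q1/null0.mod
    ...
    solar> load model q1/null0    ; restore it in a later session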
A1.123 mu
Purpose: Set or Display the Mu equation (trait value estimator)

Usually the covariate command is used to set this automatically, but the mu command provides more advanced capabilities, such as using log or sine functions.

Usage: mu                       ; displays Mu equation
       mu = mu + <expression>   ; add new terms to Mu equation
       mu = <expression>        ; replaces Mu equation (SEE DISCUSSION!)
       mu reset                 ; restores default Mu equation

<expression> may include mathematical operators (+-*/^), constants, parentheses, any mathematical functions defined in the C programming language, any phenotypic variables included in the analysis, sex, and, for any variable "var": x_var (the sample mean), min_var (the sample minimum), and max_var (the sample maximum).

Parameters whose names include any erstwhile operators, including parentheses, *, or ^, must be enclosed in angle brackets <> to prevent their being parsed as functions; note this always happens for bivariate models, or when there are interaction covariates such as age*sex, or squared covariates such as age^2.

For bivariate models, you can also include "t1" and "t2": t1 is 1 if the mu is being evaluated for the first trait, and 0 otherwise, and t2 has this behavior for the second trait.

All variables included in the mu will be required in the sample. Also, it is possible for the mu to include inequality operators (such as >=) and the "print" function (for debugging purposes). In these regards, the mu expression is like the omega expression. See "help omega" for more about inequalities and print, and a complete listing of the mathematical functions available.

Discussion: The "mu" embodies the estimation of the trait value for any individual based on the sample mean and their covariate values. It does not normally include genetic effects. The difference between this estimation and the actual value is used to determine genetic and other intra-individual effects. Thus, "mu" is evaluated in the context of a single individual, and NOT a pair of individuals as with "omega".

You can get examples of many possible "mu" commands by using the mu command to display the current mu equation for different models. For example:

    solar> model new
    solar> trait q1
    solar> covar age
    solar> mu
    mu = \{Mean+bage*(age-x_age)\}

First notice that the entire body of this default mu equation is delimited by \{ and \} characters. This is the portion which is automatically generated by SOLAR and will be changed automatically if your covariates are changed. You should not normally edit this portion of the mu. If you need to change the mu, you can either augment this portion with an additional expression, or replace the mu altogether with a new expression, in which case you must leave out the \{ and \} delimiters. If you replace the mu altogether with a new expression, you are then responsible for including terms for covariates (if any) and it is not necessary to use the "covariate" command at all.

The Mean and bage terms refer to parameters in the model, the age term refers to a data variable "age" found in the phenotypes file, and the term x_age refers to the average age of all individuals in this sample. You may include similar terms in any new mu expression.

Adding To The Mu

You can add additional terms either by appending them onto the mu shown by the mu command (using terminal cut and paste makes this convenient) or by using the "mu = mu + ..." shorthand.
For example, using the shorthand, you could add a new term for log(weight-100) as follows:

    solar> mu = mu + log(weight-100)

OR by entering the following:

    solar> mu = \{Mean+bage*(age-x_age)\} + log(weight-100)

In either case, the result would be the same:

    solar> mu
    mu = \{Mean+bage*(age-x_age)\} + log(weight-100)

If you then added another covariate, that would be included automatically in the default portion of the mu:

    solar> covar sex
    solar> mu
    mu = \{Mean+bage*(age-x_age)+bsex*Female\} + log(weight-100)

Notice here that the variable "Female" changes according to the sex. It is "0" for male and "1" for female.

Replacing the Mu

You can also replace the Mu altogether, removing the "default portion." If you remove the specially delimited "default portion" from the mu, your covariate commands will have no effect on the mu, and you will either have to write the beta parameters into the mu yourself or remove the covariates altogether. All phenotypic variables you write into the model will be required for all individuals to be included in the sample. Continuing our example:

    solar> covariate delete_all
    solar> mu
    mu = \{Mean\}
    solar> mu = Mean + log(weight-min_weight)
    solar> mu
    mu = Mean + log(weight-min_weight)

The Mu can be as elaborate as you like, including any mathematical functions defined in the "C" programming language. It need not include the "Mean" parameter (in fact, you do not even need a Mean parameter in SOLAR anymore). If you removed the default mu by mistake and need to restore it, use the "mu reset" command.

Bivariate Mu

    solar> model new
    solar> trait q1 q2
    solar> covar sex
    solar> mu
    mu = \{t1*(<Mean(q1)>+<bsex(q1)>*Female) + t2*(<Mean(q2)>+<bsex(q2)>*Female)\}

Notice that the mu for this bivariate model has separate terms for the first and second traits, which are identified by "t1" and "t2". (The variable "t1" is true if the first trait is being estimated, and false if the second trait is being estimated.) If you replace the mu, any terms not multiplied by "t1" or "t2" will be applied to the estimation of both traits, and you may have as many (or as few) t1 and/or t2 terms as you need.

Additional Notes:

(1) Use the "mu = mu + <expression>" as described above instead of the now obsolescent "mu = <expression> + mu" to add to the mu. Also, you may notice that square brackets are no longer used to delimit the default mu. They did not work as had been intended. The default portion of the mu is now delimited by \{ and \} which may be included in a user specified mu. Everything within the delimiters is maintained by SOLAR and editing this portion will have no effect. It is simply displayed for informational purposes. If the mu is defaulted, models will be saved with a mu "comment" for informational purposes only; the actual mu is determined by the covariates.

(2) As terms in the mu equation, you may use any constant, any parameter, Sex, Mean, and any phenotypic variable. There are also predefined terms for any phenotype named 'var': x_var (the sample mean), min_var (the sample min), and max_var (the sample max). Any math operator (+,-,*,/) and function defined in the C programming language may be used. Also, the ^ character may be used to indicate exponentiation.

(3) Parameter names which include * or ^ should be enclosed in <> angle brackets to prevent the names from being interpreted as multiplication and/or exponentiation expressions:

    mu = Mean + <bage*sex>*(age-x_age)*Female

(4) The default mu expression will display all variables as being adjusted to their mean (e.g. age-x_age).
However, during maximization, if a variable is found to be binary, the variable is adjusted to its minimum (e.g. diabet-min_diabet) instead. This will be reflected after the first maximization. User-created mu equations must always correctly specify either the mean (x_) or min (min_) variable as required.

Shortcuts: mu - mu
Return to command index
A1.124 multipoint
Purpose: Perform a multipoint analysis. Scan loci on selected chromosomes at a selected interval (use the chromosome, interval, and finemap commands beforehand).

Usage: multipoint [<LOD1> [<LOD2> [<LOD3> ...]]] [-overwrite] [-restart]
                  [-renew mod] [-nullbase] [-plot] [-score] [-cparm <plist>]
                  [-rhoq <fixed value>] [-saveall] [-ctparm <plist>] [-se]

Zero or more criterion LOD scores may be specified. If none are specified, multipoint will make one full scan and then stop. If one LOD score is specified, multipoint will continue scanning until the highest LOD found in the last scan is no longer greater than or equal to the LOD score specified. If more than one LOD score is specified, each LOD will apply after one scan has been completed. Then the last LOD specified will remain in effect.

-overwrite (or -ov)   Overwrite existing multipoint output files.

-restart (or -r)      Restart previous multipoint run.

-nose       Don't bother computing standard errors in best models
            (S.E.'s are not normally computed while scanning anyway).
            This should not be combined with the -score option.

-plot       Plot each point while scanning (uses plot -quick). Shows the
            current chromosome in the current pass, ending with the last
            chromosome in the last pass. To view previous passes, or for
            the best quality plot, use the plot command. The plot command
            may be run simultaneously in other SOLAR sessions, plotting
            the same data from the multipoint*.out files. For more
            information, see help for the plot command.

-score      Use Score based LOD (S-LOD) defined as:

                SLOD = (score(i)^2 * SE(i)) / (2 ln(10))

            (where i is the index of the new parameter).

-cparm <plist>   Custom parameters. (See also -ctparm.) This is
            discussed in Section 9.5 of the manual. Scanning will
            consist of replacing one matrix with another matrix;
            everything else is unchanged. The starting model MUST be a
            "prototype" linkage model with all the desired parameters,
            omega, and constraints. Starting points and boundaries for
            all variance parameters must be explicitly specified.
            Following the -cparm tag, there must be a list of parameters
            in curly braces that you want printed out for each model.
            The list can be empty, as indicated by an empty pair of
            curly braces {}. The matrix to be replaced must have name
            mibd or mibd1, mibd2, etc. The highest such mibd will be
            replaced. If the loaded matrix has two columns, each
            succeeding matrix will also be loaded with two columns.
            There must be a model named null0 in the maximization output
            directory for LOD computation. See section 9.5 for an
            example of custom parameterization. Note: the user's
            starting model is saved in the output directory as
            multipoint.template.mod. Any or all parameters in the
            <plist> may also be multiple-term expressions. See the
            second example below.

            After revision in version 4.1.5, -cparm now reloads the
            prototype model at the beginning of each chromosome, or
            during finemapping if there is a gap greater than 11 cM.
            This provides much more stable operation of -cparm and fixes
            the problems that led most people to use -ctparm. However,
            -ctparm may be preferable in some cases where there are
            convergence errors. Or vice versa. Another strategy is to
            set the interval to 1.

-ctparm <plist>  Custom parameters, as -cparm, but rebuilding each model
            from the "prototype" linkage model. This might be slower,
            but it has the advantage of greater reliability. If any
            model ends up with parameter(s) on boundaries, it has no ill
            effect on the remaining models.

-se         Calculate standard errors in all linkage models. Otherwise,
            they are always NOT calculated.
            This is mainly useful in conjunction with -cparm and
            -ctparm. See the second example below.

-link <proc>     Use the specified (by name) procedure instead of
            linkmod (the default) to move to the next locus. The
            procedure takes one argument, which is the full relative or
            absolute pathname of the mibd file. For now, it should
            ignore additional arguments (use a trailing "args" argument
            to do this).

-nullbase   Reload the null model as the base for each linkage model.
            The default is to start from the previous linkage model if
            on the same chromosome.

-epistasis N     Use the currently loaded model as the base for a
            one-pass epistasis scan. N is the index of the mibdN to be
            included in epistatic interactions (e.g. 1 for mibd1). An
            additional parameter H2qE1 will be added for the interaction
            term (for mibdN and mibd<scan>). Output files will be named
            multipointe.out and multipointe1.out. Only one epistasis
            pass is currently supported; if oligogenic scanning is
            desired, that should be done first, before running an
            epistasis scan. At the one QTL where mibdN and mibd<scan>
            are the same, h2q<scan> is constrained to zero (it is not,
            and should not be, constrained to zero elsewhere).

-rhoq <value>    Constrain rhoq parameters to <value>.

-saveall    Save the multipoint linkage models for every locus tested,
            not just the best ones. The filenames look like this:
            multi.pass1.2.3.mod for pass 1, chromosome 2, locus 3. The
            maximization output files are saved also, following the same
            naming convention but with a .out suffix. Warning! This can
            fill up a lot of harddrive space quickly. It is recommended
            to restrict this to a chromosome and/or range (set with the
            interval command) of interest.

Examples:

    multipoint 3 1.9

This will first do a full scan and zero-in scan once; then, if the highest LOD >= 3, it will scan again. If the next highest LOD >= 1.9, it will continue scanning until the last highest LOD < 1.9.

    trait q4
    polymod
    maximize
    save model q4/null0
    linkmod gaw10mibd/mibd.9.1.gz
    option standerr 1
    multipoint -ctparm {h2r h2q1 {par h2q1 se}} -se

This illustrates a simple use of the "custom parameterization" option. Note that unlike the typical use of the multipoint command, it is necessary to create a "prototype" linkage model first (here it is done with the linkmod command, but one might also use linkqsd or build the model "by hand," setting up the parameters and omega). The list of parameters following -ctparm may also include commands enclosed in a second level of braces. The command must include more than one element, as it is not the braces but the element length that determines whether the element is interpreted as a parameter or a command. In this example, a command extracts the standard error of h2q1.

Requires: mibddir, chromosome, and interval commands must have been given to select the mibd files to use. finemap may be adjusted with the finemap command. There must be a null0.mod model in the trait or outdir directory. This can be created with the polygenic command prior to running multipoint. (This model may include household and covariate effects. See the help for the polygenic command for more information.)

IMPORTANT NOTE: In most cases, multipoint starts by loading a model named null0.mod from the current output directory. The model currently in memory is ignored. This is done because it is absolutely essential that the null0 model be the basis for building all multipoint models.
However, some options, such as -ctparm, use the model currently in memory when multipoint is invoked, because all models are derived from a custom linkage model that the multipoint command does not necessarily know how to build.

Notes:

1. Summary output is written to multipoint.out in a subdirectory named after the trait variable. You can set another output directory with the outdir command. Contents of the output directory will be purged of previous files at the beginning of each invocation if -overwrite is used.

2. The final "best" linkage model is link.mod. In addition, a series of additional "null" models is produced, starting with null1.mod (containing 1 QTL), null2.mod, etc. These models are produced only if a LOD criterion is specified and satisfied (so there is more than one pass).

3. If a LOD adjustment is in effect (see lodadj command) it is applied here.

4. If models have two traits, the 2df LOD scores will be converted to 1df effective LOD scores. To override this, use the lodp command (see). This feature was first included with SOLAR beta version 2.0.1.

5. At the beginning of each pass through the selected genome, multipoint calls a user script named multipoint_user_start_pass which takes one argument, the pass number (which starts at 1 for the first pass). Within this routine, the user can change the selected chromosomes or interval.

Shortcuts: mul - multipoint
Return to command index
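A sketch of a typical basic (non-custom) session for the multipoint command above, following the documented requirements; trait, file, and directory names and the interval choice are hypothetical:

    solar> load pedigree ped.ped
    solar> load phenotypes phen.dat
    solar> trait q4
    solar> covariate age sex
    solar> polygenic              ; creates null0.mod in the output directory
    solar> mibddir mibds
    solar> chromosome all
    solar> interval 5
    solar> multipoint 3 1.9 -overwrite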
A1.125 needk2
Purpose: Keep K2 (phi2) terms from MIBD matrices

Usage: needk2
       needk2 off

Notes: This command is now obsolescent and should not be used. The K2 in MIBD files is obsolescent. We now maintain a separate phi2.gz file for discrete trait analyses, and for quantitative trait analyses, the K2 (phi2) values are computed as needed.

Old Notes: If you need to use any of the K2_* matrix values, issue the needk2 command before loading the matrix (or running 'multipoint'). Normally the K2 values from matrix files are not used because they are identical to the K2 values computed by SOLAR as needed. The default (of not saving K2) cuts matrix memory usage in half.

Shortcuts: needk - needk2
Return to command index
A1.126 newmod
Purpose: Start a new model

Usage: newmod [<trait>]+

  <trait>   Set the trait(s) to this/these trait(s). (The trait(s) can
            be specified later. If not specified here, they become
            <undefined>.)

Notes:

(1) This combines "model new", "outdir -default", and optionally trait [<trait>]+ . This is now preferred to using the separate commands, because it is shorter. For example, the command:

    newmod q1 q2

takes the place of the commands:

    outdir -default
    model new
    trait q1 q2

Clearly the "newmod" form is superior: it preserves the essential information while reducing redundant keystrokes.

(2) Since this clears the outdir, it is advisable to use this command instead of "model new" to be sure that the outdir is cleared, and not inherited from some previous script. From now on, the manual advises using "newmod" (and not "model new") for this reason. However, the behavior of "model new" itself is unchanged, so that existing scripts that operate correctly will continue to operate correctly. When combining previously written scripts that use "model new" instead of "newmod", the user must be careful to update the "outdir" status if required. New scripts using "newmod" will not be subject to the error of incorrectly inheriting an unwanted outdir setting.

Shortcuts: newm - newmodels
Return to command index
A1.127 newtcl
Purpose: Recognize new or changed Tcl procedures in Tcl scripts

Usage: newtcl

Notes: At the time a SOLAR session is started, all Tcl scripts (files ending with ".tcl") are scanned. The newtcl command forces another such scan in order to recognize new Tcl procedures (created AFTER the start of the SOLAR session), or to recognize changes to Tcl procedures since the first time those procedures were used (see explanation below). You could also accomplish this by exiting from and restarting SOLAR, but that is often inconvenient because it causes the loss of session state.

The following directories are scanned by SOLAR for user scripts:

    .       (the current working directory)
    ~/lib   (the lib subdirectory of your home directory, if it exists)

A procedure found in "." will supersede one found in "~/lib" having the same name. Also beware that if the same procedure name is used in more than one script file, the first one encountered will be the one actually used. If the same procedure name is found in two files in the same directory, the precedence is not predictable.

The scanning process simply looks through each script file for "proc" (procedure) statements. An index of all the procedures is then written to a file named tclIndex in the working directory. This file will only be created if user-defined Tcl scripts are found.

Tcl procedures are only loaded into SOLAR the first time they are used. Once loaded, they stay loaded, and may no longer reflect the Tcl files in the scan path if those Tcl files are changed. The newtcl command flushes all currently loaded procedures, so the next time any procedure is invoked, it will be reloaded from its file.

The main Tcl file used by SOLAR is named solar.tcl and is located in the lib subdirectory of the SOLAR installation. This defines all the fundamental procedures used by SOLAR. User-defined procedures having the same name as built-in procedures will supersede them.

Shortcuts: newt - newtcl
Return to command index
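To illustrate the newtcl entry above, here is a minimal hypothetical user script, saved as ./myscripts.tcl in the working directory (the file name, procedure name, and covariates are illustrative; the SOLAR commands it calls are documented elsewhere in this appendix):

    # ./myscripts.tcl -- a hypothetical user script
    # Run a polygenic analysis for a given trait and
    # return the null model loglikelihood.
    proc quickpoly {tr} {
        newmod $tr
        covariate age sex
        polygenic
        return [null loglike]
    }

If this file is created or edited during a SOLAR session, give the command "newtcl" so the new or changed procedure is recognized; it can then be invoked like any built-in command, e.g. "solar> quickpoly q1".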
A1.128 nifti_to_csv
Purpose: Converts NIFTI files to SOLAR csv files.

Usage: nifti_to_csv <Mask Volume Name> <Trait Column Name> <Phenotype Filename> <Output Filename or Base Filename> [<max voxels per file>]

Use a binary mask (Mask Volume) to specify the region from which the voxel-wise traits will be extracted (mask>0). Trait Column refers to the column name where the NIFTI file names are specified. The phenotype file is a standard SOLAR format file that includes IDs, covariates, and a column of NIFTI file names. Example:

    ID, Age, Sex, FA
    N001, 24, M, N001_FA.nii.gz

All other columns in the file will be treated as covariates and included in the output files. <max voxels per file> refers to the number of traits per output file. It is suggested to keep this number < 10,000.

See video instructions on how to use this function at www.solar-eclipse-genetics.org
Return to command index
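A sketch of an invocation matching the usage above (all file names are hypothetical):

    solar> nifti_to_csv fa_mask.nii.gz FA pheno.csv fa_traits 5000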
A1.129 normal
Purpose: Normal distribution functions

Usage: normal -i[nverse] <p>

Notes: Currently, the only supported function is the inverse normal cumulative density function, which maps the open interval (0,1) to the whole real line. (The values for 0 and 1 are out of range because they would be negative and positive infinity.)

This normal function is used by the inormal procedure to perform an inverse normal transformation on a dataset. For further information, see the help for "inormal". In turn, the inormal procedure is part of the mechanism behind the "inormal_" prefix which may be applied to phenotypes in the define command. We will add additional normal distribution functions here as we need them.

Our implementation is indirectly based on: Cody, W.D. (1993). "ALGORITHM 715: SPECFUN - A Portable FORTRAN Package of Special Function Routines and Test Drivers" ACM Transactions on Mathematical Software. 19, 22-32.
Return to command index
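For example, the inverse normal cumulative density function of 0.975 is approximately 1.959964 (the familiar two-sided 5% critical value), and the median 0.5 maps to 0, so a session should behave roughly as follows (the returned values, not the display format, are the point here):

    solar> normal -inverse 0.975    ; should return approximately 1.959964
    solar> normal -i 0.5            ; should return 0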
A1.130 null
Purpose: Return an optimized parameter from null.mod

Usage: null loglike
       null h2q

Notes: see also nulln

Shortcuts: null - null
Return to command index
A1.131 nulln
Purpose: Return an optimized parameter from nullX.mod (e.g. null0.mod, null1.mod, etc.)

Usage: nulln 0 loglike
       nulln 1 loglike
       nulln 2 h2q1

Shortcuts: nulln - nulln
Return to command index
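For example, in a Tcl script one might compute a LOD score by hand from two of these models (a sketch; it assumes null0.mod and null1.mod already exist in the output directory, and that the loglikelihoods are natural logs, so LOD = (lnL1 - lnL0)/ln(10)):

    # LOD comparing the 1-QTL model to the polygenic null
    set lod [expr ([nulln 1 loglike] - [nulln 0 loglike]) / log(10)]
    puts "LOD: $lod"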
A1.132 old_zscore
solar::zs

Purpose: The old zscore command to zscore the current trait

Old Usage: zscore [-off] [-q]
           zs [-off]        ;# Perform zscore quietly

  -off   Turn off zscore
  -q     Perform zscore quietly

Notes: The "Mean" and "SD" values used by zscore are computed only once, at the time the zscore command is given. Thus they do not reflect later changes to the phenotypes file, or to the sample, which might be restricted due to individuals missing covariates added later. Generally, for this reason, the zscore command should be given after the covariates command and immediately before a model maximizing command such as polygenic.

Starting with SOLAR Version 4.0.9, the trait mean and SD are computed from the actual sample that would be included in an analysis (at the time the zscore command is given). As described in the notes below, you can adjust the Mean and SD by using "option zmean1" and "option zsd1" to set the values actually used. These values are applied to the trait values during maximization.

If the trait is changed without giving the "model new" command, the new trait will be zscored automatically. This feature is obsolescent. In a future update, zscore will be turned off when the trait is changed. An alternative to zscore is to define the trait as the inverse normal transformation of a variable. See "help inormal" and "help define" for further details.

zscore will also calculate a number of statistics for the trait: mean, minimum, maximum, standard deviation, skewness, and kurtosis. These will be written to the file zscore.out in the current output directory. As of version 4.0.9, these statistics are no longer written to the terminal. Instead, a single line is displayed with the trait name, mean, and SD. Even that line is not shown if zscore is invoked from a script or the zs abbreviation of the command is used. To calculate these statistics for any phenotypic variable without zscoring, and without necessarily making it the trait, use the "stats" command instead.

A trait must already have been selected with the trait command or a loaded model. Also, the phenotypes file must have been loaded. When a maximization is performed, trait values are replaced with their zscored values. The formula is:

    zscored = (value - Mean) / SD

zscore is a model dependent option controlled by "option zscore". It remains in effect until another model is loaded or the "model new" command is given. When models maximized with zscore are reloaded, zscore is again activated. "option zscore" is set to 1 ("on") by this command, and the related options zmean1 and zsd1 (mean and standard deviation for the first trait) and zmean2 and zsd2 (mean and standard deviation for the second trait) are set as required. You can adjust these options directly to fine tune the mean and standard deviation values used, but be sure that zscore is not set to 1 until the mean and (non-zero!) standard deviation values are set for all traits in the model. In a multivariate model, zscore will only be applied to the first two traits.

Whenever zscore is activated or deactivated, parameters mean and SD are reset to zero to force setting new boundaries and a new starting point during the next maximization. If a new phenotypes file is loaded, the zscore command should be repeated to reflect the new file.
Return to command index
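As noted in the entry above (and under the option command), the preferred modern alternative is the zscore_ prefix with the define command; a sketch, with a hypothetical trait name q1:

    solar> define zq1 = zscore_q1    ; zscore_ prefix replaces old zscore
    solar> trait zq1
    solar> polygenic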
A1.133 omega
Purpose: Sets the Omega (Covariance) equation directly

Important: By default, SOLAR (starting with version 2.0.2) sets up the omega automatically when you give the "trait" command, and subsequently modifies it as required during polygenic, multipoint, and other commands. It is only necessary for the user to use the omega command for special purpose advanced analyses.

Usage: omega = <expression>   ; sets Omega equation
       omega                  ; displays Omega equation
       omega reset            ; reset default Omega equation

Notes: The default omega for a SOLAR model with one linkage element is:

    omega = pvar*(I*e2 + phi2*h2r + mibd1*h2q1)

Notice that each term inside the parentheses has a matrix and a parameter. Some of the matrices are loaded from external files and some are computed internally. The built-in variables are:

  pvar..........Phenotypic variance. This is the square of the SD
                (trait standard deviation) parameter.
  I.............Identity matrix, which equals 1 when a pair of
                individuals is the same individual, 0 otherwise.
  phi2..........Two times the kinship coefficient. For quantitative
                models this is normally computed internally and
                on-the-fly to reduce storage requirements. Also found
                (identically) in the phi2.gz matrix file created by
                "load pedigree"; SOLAR uses the phi2.gz by default for
                discrete traits, or the usage of the external file can
                be forced by using the "loadkin" command or giving a
                suitable "load matrix" command such as
                "load matrix phi2.gz phi2 delta7".
  delta7........Dominance coefficient, equivalent to Jacquard's delta7
                from the series delta1-delta7 when there is no
                inbreeding. If there is inbreeding, this should not be
                used. As with phi2, this is, by default, computed
                internally and on-the-fly for quantitative models
                unless "loadkin" or a comparable "load matrix" command
                is given. The delta7 matrix is the 2nd matrix within
                the phi2.gz file. This matrix should only be loaded if
                it is needed, because it is usually not used by SOLAR.
  male_i........1 if individual "i" is male, 0 otherwise.
  male_j........1 if individual "j" is male, 0 otherwise.
  female_i......1 if individual "i" is female, 0 otherwise.
  female_j......1 if individual "j" is female, 0 otherwise.
  si............index of trait i (1..ntraits)
  sj............index of trait j (1..ntraits)

For <phenotype> which is the name of a phenotypic variable:

  x_<phenotype>.....sample mean for <phenotype>. For example, x_age.
  min_<phenotype>...minimum sample value of <phenotype>.
  max_<phenotype>...maximum sample value of <phenotype>.
  <phenotype>_i.....value of <phenotype> for individual "i".
  <phenotype>_j.....value of <phenotype> for individual "j".

For <parameter> which is the base name of a parameter:

  <parameter>(ti)...replace "ti" by the name of the trait of individual
                    "i". For example, h2r(ti) may be h2r(weight) in an
                    analysis of traits height and weight.
  <parameter>(tj)...replace "tj" by the name of the trait of individual
                    "j".
  teq...............trait for individuals i and j is the same.
  tne...............traits for individuals i and j are not the same.

Matrices may also be used, along with math operators + - * / and also ^ (power) and () (parentheses), and also all math functions defined by the C Programming Language, which includes "log" for natural logarithm, trig functions, and hyperbolic functions, among others. Here is a list: erfc, erf, lgamma, gamma, j1, j0, y1, y0, rint, floor, ceil, tanh, cosh, sinh, atan, acos, asin, tan, cos, sin, expm1, exp, logb, log1p, log10, log, cbrt, sqrt, and abs.
Parameter names with special characters should be enquoted using angle brackets so the entire name is enquoted, including any prefix string. For example, given a variable named age.onset with a dot, the mean value of the variable could be indicated with <x_age.onset> in angle brackets as shown. This is the same as the rule used by the define and constraint commands.

Beginning with version 3.0.4, the following equalities and inequalities may also be used between two terms. If the operator is true, 1 is returned, otherwise 0 is returned. This enables you to construct compound conditional expressions having the same effect as could have been done with "if" statements. The C operators < and > have been replaced with << and >> so as not to be confused with the <> quotation of variable names in SOLAR.

  C Format   Fortran Format   Test
  --------   --------------   ----
  ==         .eq.             if equal
  !=         .ne.             if not equal
  >=         .ge.             if greater than or equal
  <=         .le.             if less than or equal
  >>         .gt.             if greater than
  <<         .lt.             if less than

Example of use of inequalities:

    omega = pvar * (I*e2 + (h2r >= 0.125)*phi2*h2rc + \
                    (h2r << 0.125)*phi2*h2rd)

Beware that comparing the equality or inequality of two floating point numbers sometimes does not work as expected due to numerical representation limitations. For example, 1/3 might not equal 2/6. The precedence of the equality and inequality operators is below that of all other operations, so their expressions should be in parentheses as in the example above.

There is also a function named "print" which simply prints the value of its argument, which may be any expression, and then returns that value. For example, in place of the standard univariate omega

    omega = pvar*(phi2*h2r + I*e2)

you could have:

    omega = pvar*(print(phi2) + I*e2)

and this would print each phi2 value as it is used. An expression may include any number of print functions, and they are evaluated in the standard order of evaluation, starting with the innermost subexpression. If you simply want to print some value without including it in the rest of the expression, you can multiply the print function by zero, for example:

    omega = pvar*(phi2*h2r + I*e2 + 0*print(delta7))

At this time, the print function can only print one number, without any identifying string. After each value is printed, you must press RETURN to proceed to the next, or you can hold down RETURN to pass through a lot of prints.

For multivariate models which have 3 or more traits, the automatically created standard omega includes generic rho parameters rhoe_ij, rhog_ij, and rhoc_ij if there are household effects, and rhoq1_ij (to rhoq10_ij) for linkage models. When the omega is evaluated, the i and j are replaced with the trait indexes. For example, with traits 1 and 2, rhoe_ij becomes rhoe_12. It is possible to write omegas without these generic rhos if desired. There are also 4 additional generic rhos available for custom usage: rhoa_ij, rhob_ij, rhod_ij, and rhof_ij.

Shortcuts: ome - omega
Return to command index
A1.134 option
Purpose: Set or read the value of model-specific options.

Usage: option <option name> <value>   ; sets option value
       option <option name>           ; reads option value
       option                         ; shows all option values

Notes: ibd-specific options are set by ibdoption. Most options control fairly obscure aspects of SOLAR operation and are not normally changed directly by SOLAR users. Many are automatically controlled by other SOLAR commands during normal operation.

Model-specific options are stored in saved model files. Starting a new model (with the "model new" command) will reset all options to the default value. Loading a model will reset all options to the default, then load the options in the model file. It is recommended to specify options after specifying the trait and covariates, but before giving the "polygenic" or other model maximizing command.

Warning: RARELY USED options may have become buggy in the context of more current features, since they haven't been recently tested.

Here is a list of options and their default values:

CMDiagonal 0     Covariance matrices diagonal (1=yes; 0=no; automatically
                 set to 1 for sporadic models, 0 for all others)

StandErr 1       Compute Standard Errors (1=yes; 0=no; defaults to 1
                 except during a multipoint scan, where it is set to 0
                 while scanning to improve speed)

ModelType Default   Determines the type of maximization performed.
                 Default performs standard quantitative maximization if
                 the model is univariate quantitative, and discrete/mixed
                 mode otherwise. The original Evd (Evd1) modeltype
                 computes eigenvalues for the kinship matrix only, which
                 requires more ongoing processing, but for pedigrees
                 less than 1000 may be faster. (Note: the mga command by
                 default uses Evd.) The Evd2 modeltype performs complete
                 transformation of the trait and all covariates; this
                 requires more upfront processing but may be faster for
                 pedigrees larger than 1000. AutoPhi2: do not use this
                 option unless you know you need it; it is not generally
                 needed for new work. Beginning with Version 8.3.1, the
                 phi2.gz matrix is loaded during maximization if phi2 is
                 not already loaded by the user. That was already the
                 default for many kinds of maximization, just not the
                 simplest univariate quantitative. The AutoPhi2
                 modeltype brings back the original on-the-fly
                 calculation of phi2, if no phi2.gz has been loaded.
                 That is often not what is intended, but sometimes
                 people who had not saved their phi2.gz were relying
                 on it.

StandLogLike 0   Standardize Loglikelihood (0=no; 1=yes). RARELY USED.

AutoCovarBound 1.25   Factor used in estimating covariate boundaries:
                 AutoCovarBound*(tmax-tmin)/(cmax-cmin). This is a
                 fairly wide but useable estimate which has never needed
                 adjustment.

Grid 0           Method used in likelihood estimation (0=Search;
                 1=Grid). RARELY USED, and not to be confused with the
                 grid command.

GridPoints 1     Points to be used if the Grid option (above) is
                 applied.

MaxIter 1000     Maximum iterations allowed in a loglikelihood search.
                 (If you need more than this, something is probably
                 wrong. Usually only 10-20 iterations of searching is
                 sufficient. MaxIter is to prevent SOLAR from iterating
                 forever in troublesome cases.)

Outlier 0        0=keep; 1=remove outliers. (It's probably better to
                 remove them yourself than rely on this RARELY USED
                 option.)

CutPeople 0.05   Factor used to remove outlying people.

CutPed 1.0       Factor used to remove outlying pedigrees.

TDist 0          Automatically set by the "tdist" command. Don't set
                 this option yourself unless you are an expert. Use the
                 tdist command instead, which sets this option and sets
                 up the required parameter for you.
Conv 1.0E-6      Convergence improvement factor for quantitative models.
                 (For experts only! See Chapter 6.)

Conv(Discrete) 1.0e-4   Convergence improvement factor for discrete
                 models.

NConv 4          Convergence count (for experts only!). Conv has to be
                 satisfied this many times.

Tol 1.0E-8       Tolerance for parameters (for experts only!).

MaxStep 5        Maximum steps (for experts only!). This many decrements
                 are allowed in the attempt to improve loglikelihood.
                 This may need to be increased in troublesome cases.

BCliffs 0.1      Backup from NaN cliffs by this factor (for experts
                 only!). Currently used only for discrete models.

MaxCliffs 15     Maximum steps to backup from NaN cliffs (for experts
                 only). Currently used only for discrete models.

ScoreOnlyIndex -1   Automatically set by "multipoint -score"; otherwise
                 don't touch.

MergeHousePeds 1    Merge pedigrees sharing households (1=yes; 0=no).
                 Necessary for accurate C2 estimation; 1 is the default.

MergeAllPeds 0   Merge ALL pedigrees. (Earlier merging method; use only
                 if MergeHousePeds fails inexplicably.)

RobustEst 0      Robust estimation (new and barely tested).

Tune 3           Factor used with robust estimation.

PedSelect 0      Select only this pedigree for maximization. The zero
                 default means "all pedigrees." Otherwise use an integer
                 to select a pedigree as indexed in pedindex.out,
                 created by "load pedigree." Alternatively, use the
                 commands "pedlike" and "pedlod" to get pedigree-
                 specific likelihoods and lods using the same parameter
                 values. It is also possible to select multiple
                 pedigrees by using the + operator: option pedselect 1;
                 option pedselect + 2; and so on. + is optional before
                 the first selection. The list of selected pedigrees is
                 saved to the model file, and can be cleared out either
                 with "option pedselect 0" or "model new."

EnableDiscrete 1    Use SOLAR "Discrete Trait" modeling if the trait is
                 found to be discrete (2 integer values separated by
                 one, e.g. 0,1). (0) means use quantitative modeling
                 regardless of the inferred trait type.

DiscreteOrder 1  Ordering for discrete pedigrees. You are strongly
                 discouraged from changing this by Dr. Jeff Williams,
                 who has done considerable study of discrete trait
                 modeling. The default ordering (1) puts affecteds first
                 if prevalence < 0.5, else unaffecteds first. No
                 ordering is done for (0). (-1) reverses the standard
                 ordering. (2) does per-pedigree ordering. (-2) does
                 per-pedigree reverse ordering. Per-pedigree ordering is
                 not available for multiple traits. Set DiscreteOrder to
                 3 to create an output file named "discrete.out" in the
                 output directory containing up to 7 variables of the
                 ordered data.

DiscreteMethod 1    Version of the discrete code used. The default (1)
                 seems the most robust, and use of the alternate method
                 (2) is discouraged.

UnbalancedTraits 1  Default is to use "unbalanced traits" in bivariate
                 models. Individuals will be included in the analysis if
                 they have either trait; they do not have to have both.
                 Individuals are converted to individual-traits. (0)
                 Excludes individuals missing either trait. (-1) Also
                 excludes individuals missing either trait, using the
                 "bivariate" feature built-in to Search.f.

EnforceBounds 1  Numerical errors during the quadratic-solving phase of
                 maximization can result in overshooting parameter
                 boundaries by a tiny amount. The default enforces
                 boundaries whenever a new "point" of parameter values
                 is computed. This was essential for bivariate models to
                 prevent attempted square roots of tiny negative
                 numbers. (0) turns this feature off.
                 This is not effective during some phases of
                 maximization, such as estimation of standard errors,
                 where the point is not computed in Search.f, which is
                 why the AbsVarianceParms option was added.

EnforceConstraints 0   Experimental. (1) turns on attempted enforcement
                 of constraints when the derivatives of the tableau
                 become too large to maintain constraint numerical
                 accuracy better than 1e-4. (Otherwise, convergence may
                 fail with the "Numerical constraint failure" note.)
                 Unfortunately, when this is done, models are usually so
                 out of whack for some reason or other that convergence
                 will ultimately fail by exceeding the maximum iteration
                 limit anyway. (-1) turns off the constraint numerical
                 accuracy test at the end of maximization. The default
                 (0) tests constraint accuracy at the end of
                 maximization only.

CorrectDeltas 0  Experimental. (1) turns on attempted correction of
                 deltas for numerical errors during the quadratic
                 problem solving phase of maximization. The default (0)
                 leaves this turned off because the default
                 EnforceBounds option accomplishes the intended result
                 more efficiently in most cases.

AbsVarianceParms 1   The default (1) forces the abs() function to be
                 applied to all known parameters used in the omega in
                 bivariate maximization. This prevents tiny negative
                 values from causing NaN's to arise from the application
                 of square root. (-1) forces the abs() function to be
                 applied in univariate maximization as well. That might
                 cause trouble with some discrete models. (0) forces the
                 use of the actual parameter values in the omega,
                 negative or not.

BounDiff 0       This option controls the method used to compute
                 "forward" and "central" differences in likelihood
                 (derivatives) when a parameter is at a boundary.
                 (Currently, this option is applied only in discrete
                 trait modeling, where it has been shown to be necessary
                 to ensure complete maximization.) The default (0)
                 prevents the parameter from going beyond the boundary
                 if the boundary is -1 or 0 for lower bounds, and 0 or 1
                 for upper bounds, since these are the typical "hard"
                 boundaries for variance components, correlations,
                 roots, and logs. Otherwise, parameters are allowed to
                 go beyond boundaries during derivative calculation by
                 very tiny amounts. (1) always enforces the boundaries.
                 (-1) never enforces the boundaries, allowing tiny
                 excursions beyond them in all cases (which, if
                 possible, might give the best results). (2) applies the
                 historic rule: only upper boundaries are (always)
                 enforced, but only for forward differences, and the
                 simple method of substituting the negated backward
                 difference is used. For options (0) and (1), a more
                 sophisticated algorithm is used when otherwise the
                 boundary would be crossed illegally. An inner
                 difference is taken AND adjusted for the slope of the
                 next nearest change in differences. In other words, the
                 second derivative is used to compute an expected first
                 derivative.

PedLike 0        Intended for use by the pedlike and pedlod commands
                 only. Produces files "pedexclude.dat" and "pedlike.dat"
                 during maximization.

SampleSameTrustMe   This option declares to SOLAR that the sample for
                 the previous model is identical to the current model.
                 Therefore, the same EVD matrices may be used, and the
                 default checking that is done to determine if the
                 sample is the same is bypassed. If this option is
                 specified before EVD matrices have been created for any
                 model, the required matrices are created, but the
                 storage of additional data to determine when the sample
                 changes is bypassed.
                 Thus the maximization speed of all EVD models is
                 increased, but only slightly, about 1%. The downside is
                 that if the user has specified this option in error,
                 the results will be wrong and SOLAR might even crash.
                 Hence the suffix "TrustMe". THIS IS A SPECIAL NO-WRITE
                 OPTION which is not written to model files, because
                 when the model is reloaded it may not be applicable at
                 the time it is reloaded. It should be used only in
                 scripts where the null and test models are absolutely
                 certain to have the same sample.

evdphase 0       If option modeltype is evd2, the evdphase option is
                 used internally to handle the 3 model phases. In
                 evdphase 1, the sample is determined and evddata.out is
                 written. In the second phase, a model is created using
                 evddata.out for traits and data with all unrelated
                 individuals, and this model is actually maximized using
                 standard quantitative maximization. In the third phase,
                 the maximized model is translated back into the
                 original parameters.

Eigenvectors 0   If set to 1, the eigenvectors from EVD will be written
                 to files. To use this option, use the "evdoutev"
                 command (which invokes this option).

EVDmat 0         If set to a non-zero value, this writes EVD data
                 suitable for fphi or a similar procedure to
                 evddata.out. The correct way to use this option is by
                 using the "evdout" command, which invokes it.

FPHIMethod 1     This sets the method for which EVD data should be
                 written. The correct way to use this option is by using
                 the "fphi" command, which has a "-method2" option;
                 however, the default Method 1 works better now.

ResetRandom 0    If this option is set to a non-zero value, the random
                 number sequence used during maximization will be set to
                 its initial default value at the beginning of
                 maximization.

dontallowsamplechange 0   If option modeltype is evd, and this option is
                 set to 1, model maximization will terminate prematurely
                 with an error message if the sample changes from the
                 previous evd model. If modeltype is evd, and this
                 option is defaulted at 0, the evd code will simply
                 create new matrices as required. If the option
                 samplesametrustme is 1, this option has no effect
                 unless the size of a pedigree changes. If the option
                 modeltype is not evd, this option has no effect.

singulartrait 0  If set to 0, a trait having only one non-blank value in
                 the entire phenotypes file is an error which will halt
                 maximization, because most of the time when this
                 happens it is because there is a user mistake which
                 needs to be corrected. If set to 1, such a trait will
                 be considered the "unaffected" value of a discrete
                 trait, and discrete trait maximization will be
                 performed. If set to 2, the trait will be considered a
                 quantitative trait and quantitative trait maximization
                 will be performed, but then you must also preset fake
                 upper and lower bounds for the mean and sd parameters,
                 because the normal algorithm for guessing them doesn't
                 work in this case. Use the "parameter" command to
                 preset these boundaries. The latter two options are
                 generally only useful in the presence of covariate
                 variation.

PolyClasses ""   The polyclasses option is modified by the polyclass
                 command; users should not modify it. It becomes a comma
                 separated list of all the classes in a polyclass model.
                 This is created so that maximize can handle discrete
                 trait and mixed trait models by locating the SD
                 parameters for each trait.

ParameterFormat 16   Number of significant digits used for writing
                 parameter values to model files and queries. Prior to
                 Version 6.3.6, the default was 10.
                 16 is a compromise value that displays almost all the
                 precision in a double precision floating point number,
                 while avoiding representational issues that cause a
                 string of 9's to appear at the end of a number. The
                 largest useful value is 17, which always shows all the
                 information available, but sometimes at the expense of
                 being very ugly.

MatrixNumberFormat 15   Number of significant digits used for writing
                 results of matrix operations. 15 works best.

ExpNotation 0    (1) forces exponential notation (but only for certain
                 commands; mga is the only one currently). (0) is auto
                 mode, which typically uses fixed point while a few
                 digits of precision are shown, then flips to
                 exponential notation if required to show a nonzero
                 value.

ShuffleReseeding 1   (1) gives consistent repeatable results by
                 reseeding the random generator to the default seed at
                 the beginning of each mathmatrix shuffle. Options are:
                    1...seeded on every shuffle to 5489u for consistent
                        results (DEFAULT)
                    0...seeded first shuffle to 5489u, then free running
                   -1...seeded every shuffle to time() for purely
                        stochastic results
                   -2...seeded first shuffle to time(), then free
                        running
                 Other values: seed to this value at the beginning of
                 each shuffle.

IMPORTANT NOTE: the zscore options below pertain to the now obsolescent zscore command implementation. Now it is preferred to use the zscore_ prefix operator with the define command.

zscore 0         The default mode has zscore deactivated. Activation by
                 setting this to a non-zero number should be done by the
                 zscore command, and only when the additionally required
                 z options have been set.

zmean1 0         When zscore is active, zmean1 should be set to the
                 expected mean value for trait 1.

zsd1 0           When zscore is active, zsd1 should be set to the
                 expected standard deviation for trait 1. This option
                 must not be zero when zscore is non-zero.

zmean2 0         When zscore is active, zmean2 should be set to the
                 expected mean value for trait 2.

zsd2 0           When zscore is active, zsd2 should be set to the
                 expected standard deviation for trait 2. This option
                 must not be zero when zscore is non-zero.

Shortcuts: opt - options
Return to command index
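For example, a session using the option command above might raise the iteration limit and skip standard errors before maximizing (option names are taken from the list above; the values are illustrative):

    solar> trait q1
    solar> covariate age
    solar> option maxiter 2000    ; allow more iterations
    solar> option standerr 0      ; skip standard error computation
    solar> polygenic
    solar> option maxiter         ; read back the current value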
A1.135 outdir
Purpose: Set maximization output directory (overriding default)

Usage: outdir <dirname>
       outdir            ; shows current outdir
       outdir -default   ; restore default: (trait name)

Notes: By default, SOLAR models and related output are written to the maximization output directory. By default, that directory is named after the trait.* For bivariate models, the trait names are separated by a period ("."). The default output directory can be overridden by this command. Once set, it stays overridden until the "outdir -default" command is given, or a new SOLAR session is started.

(*The directory will be named after the trait as entered in the trait command, rather than as it exists in the phenotypes file. For example, it will be named 'foo' if the command 'trait foo' has been given, even if the variable is actually named FOO.)

To prepend the name of the maximization output directory to any filename, use the "full_filename" command.

Shortcuts: outd - outdir
Return to command index
A1.136 parameter
Purpose: Create, modify, or list parameter(s) Usage: parameter <name> ; display or create a new parameter parameter <name> start <value> lower <value> upper <value> ; set parameter start and ; boundaries parameter <name> = ; return current parameter value parameter <name> = <value> ; set current (start) value parameter <name> start <value> ; set current (start) value parameter <name> se ; display last computed std error parameter <name> ; display all parameter information parameter delete <name> ; delete parameter parameter ; display all parameters parameter -return ; return parameter info in a list parameter fixupper <value> ; set upper bound as fixed parameter fixlower <value> ; set lower bound as fixed ; fixed bounds are not auto adjusted Notes: (1) The most commonly required standard parameters are created automatically when you give the "covariate" and "polygenic" commands. Their starting points and boundaries are also set automatically either then or at the start of maximization by fairly reliable heuristics. Boundaries are set automatically only if both boundaries are set to zero at the beginning of maximization, so if you preset one boundary, be sure to set the other. The "standard" parameters include: mean (for trait) SD (standard deviation) e2 h2r h2q1 c2 b* (covariate beta) (2) "start" and "=" are identical operators. "=" is simply a more convenient and mnemonic name in most cases. Once a maximization has been performed, the "start" or "=" value will actually be the maximum likelihood estimate. It will then be the starting value for the NEXT maximization. Note that when you are setting the starting value for a parameter, you must surround the "=" used for assignment with spaces. For example: parameter h2r = 0.1 If you did not surround the = with spaces, it would appear that you were simply trying to create a new parameter named h2r=0.1. To prevent this kind of mistake, such names are not allowed. See note 4 below. (3) When a parameter is deleted, any constraint it appears in is deleted also. This behavior is obsolescent. In the future, a single TERM may be deleted from the constraint instead. In the meantime, it is recommended to edit constraints to remove parameters slated for deletion, THEN delete the parameter. (4) When naming parameters, you are advised to stick to the usual alphabetic, numeral, and underscore characters unless there is a good reason not to. However, other special characters may actually be allowed in order to accommodate all automatically created parameter names. The use of some of these characters will force the requirement that these parameters be quoted in <> when used in constraint, omega, and mu commands so as not to imply math operations. Shortcuts: par - parametersReturn to command index
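Example: presetting a start value and both boundaries (the values shown are hypothetical; remember from note 1 that if you preset one boundary you should preset the other):
 solar> parameter h2r start 0.3 lower 0.01 upper 0.99
 solar> parameter h2r ;# display all information for h2r
 solar> parameter h2r se ;# after maximization, show standard error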
A1.137 ped2csv
Purpose: Convert PEDSYS format file to comma delimited format Usage: ped2csv <pedfilename> [<outfilename>] If <outfilename> is not specified, the output filename is <pedfilename>.csv Notes: BLANK fields are removed. Duplicate field names will cause an error. This command uses the "selectrecords" command, which makes it very easy: "selectrecords <pedfilename> <outfilename>". Since no condition is specified, all records are selected, and since selectrecords uses the tablefile command, it can read PEDSYS files.Return to command index
A1.138 pedifromsnps
Purpose: Creates an empirical pedigree matrix (GRM) from a plink data set Usage: pedifromsnps -i <input base name of plink data> -o <output csv file name> -freq <file made with plink_freq> [optional: -corr <alpha value> -per-chromo -king -method_two -normalize -batch_size <batch size value> -id_list <file w/ subject IDs> -n_threads <number of CPU threads>] -i The base file name of the plink .bed, .bim, and .fam files. -o The base file name for the output. -freq Name of the output file from the plink_freq command. -n_threads Number of CPU threads used for matrix calculation. Default: Automatically set based on hardware -per-chromo Outputs a separate matrix for each chromosome. Default: Disabled -corr <alpha value> Compute the method one correlation GRM using this alpha value. Default: -1 -method_two Computes the correlation GRM using a second method described below. Default: Disabled -king Computes the Robust King GRM instead of using a correlation method. Default: Disabled -batch_size <batch size value> Number of loci computed at a single time per CPU thread. Default: 500 -id_list <file w/ subject IDs> The specified file contains a list of subject IDs separated by spaces. The resulting GRM will use only these IDs and exclude all others. Default: All IDs are used -normalize When used during the creation of a correlation GRM, the final values are normalized using the square roots of the diagonal values, so that diagonal elements are 1 and off-diagonal elements are bounded by 1 and -1:
 Z*_i_j = Z_i_j/sqrt(Z_i_i*Z_j_j)
where Z* is the final value and Z is the unnormalized value, i refers to the index of subject i, and j refers to the index of subject j. The default method for GRM creation is correlation method one with default alpha = -1.
Correlation Method One:
 Z_i_j = (1/m) * sum_k=0 to k=m (SNP_i_k - 2*f_k)*(SNP_j_k - 2*f_k)*(2*f_k*(1-f_k))**alpha
 Z = GRM value
 i = subject row index
 j = subject column index
 k = locus index
 SNP_i_k = SNP value for subject i and locus k (must be 0, 1, or 2)
 f_k = major allele frequency of locus k
 m = total number of loci, not including loci with missing values for subject i or j
 alpha = exponent for variance, best set to the default value of -1
Correlation Method Two:
 Z_i_j = (1/variance_sum) * sum_k=0 to k=m (SNP_i_k - 2*f_k)*(SNP_j_k - 2*f_k)
 Note: Variables are the same as above except for variance_sum:
 variance_sum = sum_k=0 to k=m 2*f_k*(1-f_k)
 Note: Locus index k is excluded if there is a missing SNP value for subject i or j. Therefore the variance_sum calculated for one element of the GRM may differ from that of another element.
Robust King Method:
 Z_i_j = 1 - (sum_k=0 to k=m (SNP_i_k - SNP_j_k)**2)/(2*min(N_i,N_j))
 N_i = number of heterozygotes for subject i
 N_j = number of heterozygotes for subject j
 Note: Heterozygotes for subject i or j are not counted if the other subject is missing a value at that locus.
For information on the robust king method see Manichaikul A, Mychaleckyj JC, Rich SS, Daly K, Sale M, Chen WM (2010) Robust relationship inference in genome-wide association studies. Bioinformatics 26(22):2867-2873.Return to command index
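Example: a sketch of building a Robust King GRM (the file base names are hypothetical):
 solar> plink_freq -plink mystudy -o mystudy.freq
 solar> pedifromsnps -i mystudy -o mystudy_grm.csv -freq mystudy.freq -king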
A1.139 pedigree
Purpose: Process the pedigree data. Usage: load pedigree <filename> [-founders] ; loads pedigree file load epedigree <filename> [-t <threshold>] [-1] ; see below ** pedigree show [all | <ped#>] ; displays pedigree data pedigree classes [-full [-nowarn] [-phi2]] [-model [-meanf]] ; displays relative-class counts When a pedigree file is loaded, each individual in the file is assigned a unique integer identifier for internal use by SOLAR. The mapping from permanent IDs to integer IDs is stored in the file 'pedindex.out'. Therefore, loading a pedigree data file named 'pedindex.out' is not allowed since that would result in the pedigree file being overwritten. Attempting to re-load a previously created 'pedindex.out' under a different name will not work either; see Section 8.2.1 in the Manual for discussion. If the pedigree file contains founders only, i.e. a set of unrelated individuals with no parental data, parent ID fields are not required. In this case, the '-founders' option must be included in the load command. If this option is specified but the pedigree file does contain parent ID fields, those fields will be ignored. ** Beginning with version 8.3.0, empirical pedigrees may be loaded. Empirical pedigrees are csv files which are representations of a kinship matrix, having kinship values for pairs of individuals. Empirical pedigrees may be loaded either with the regular "pedigree" keyword or "epedigree". However, if "load pedigree" is used and the file is not obviously an empirical pedigree, an uninformative error message may result. It must have fields IDA, IDB, and KIN specified in the header line to be recognized as an empirical pedigree. Additional options -t <threshold> and -1 are available. Further discussion of empirical pedigrees is given under "help epedigree." If the keyword 'all' is given in the 'pedigree show' command, detailed info is displayed for all pedigrees. If a pedigree number is specified, detailed info is displayed for that pedigree only. If no argument is given, the show command displays summary information. The 'pedigree classes' command displays a tally of the relative classes present in the pedigree data. By default, the counts for relationships of 3rd degree and higher, as well as some 1st and 2nd degree relationships, are combined. If the '-full' option is included, then the counts for all relative classes are given separately. In this case, a warning message will be displayed if any of the relative classes cannot be handled by SOLAR's native method for computing multipoint IBDs. The '-nowarn' option will turn the warning off. If the '-phi2' option is included with the '-full' option, an additional column will be displayed which contains the kinship coefficient, multiplied by 2, for each relative class. If the '-model' option is included in the 'pedigree classes' command, the relative class tallies will include only those pairs of individuals who both enter the polygenic analysis specified in the null0 model for the current trait. The '-full' and '-phi2' options work as described above, but the '-nowarn' option is superfluous since the warning message described above is never displayed. When the '-meanf' option is included with the '-full' option, the "Mean F" statistic is calculated and displayed. The state of the currently loaded pedigree data is stored in the file 'pedigree.info' in the current working directory.
This file persists between SOLAR runs, which means that the pedigree data which is loaded at the end of a session will still be loaded the next time SOLAR is invoked (from within the same working directory). Notes: The pedigree load command creates several files in the current working directory: pedindex.out maps each ego ID to a sequential ID (IBDID) assigned by SOLAR pedindex.cde PEDSYS code file for pedindex.out phi2.gz gzipped file containing the kinship matrix multiplied by 2 house.gz gzipped file containing the household matrix The household matrix file will be created only if a household ID field is present in the pedigree file. The files listed above are specifically created for the pedigree file being loaded. They will be deleted when a new pedigree file is loaded. Hence, a different working directory must be used for each data set. For a description of the pedigree file, enter 'file-pedigree' Shortcuts: ped - pedigreesReturn to command index
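Example (the file name is hypothetical):
 solar> load pedigree ped.csv
 solar> pedigree show ;# summary information
 solar> pedigree classes -full -phi2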
A1.140 pedlike
Purpose: Calculate pedigree-specific loglikelihoods Usage: pedlike [-q] [<model>] -q (quiet) Suppress output to terminal Notes: Default model will be current model, if current model has been maximized. If changes have been made to current model since the last maximization, results may not be predictable. If current model has not been maximized, default model is the null0 model in current outdir. Results are written to "pedlike.out" in the outdir and also shown on terminal with some additional summary info. The pedigree numbers used are based on SOLAR's pedigree index "pedindex.out" created by the "load pedigree" command. These do not necessarily correspond to user pedigree numbers (and there is not necessarily even a one-to-one correspondence). Refer to pedindex.out to associate your ID's with the pedigrees assigned by SOLAR. (Note: pedindex.out has a code file pedindex.cde and is best read using PEDSYS.) Shortcuts: pedli - pedlikeReturn to command index
A1.141 pedlod
Purpose: Calculate pedigree-specific LOD scores Usage: pedlod [<test-model> [<null-model>]] Notes: If no model is specified, the model currently in memory is used as the test-model (useful if you have just run multipoint or twopoint), and its null-model (having one less linkage element) is used as the null model. If only one model is specified, the null model is taken from the outdir after the specified model is loaded. The pedigree numbers used are based on SOLAR's pedigree index "pedindex.out" created by the "load pedigree" command. These do not necessarily correspond to user pedigree numbers (and there is not necessarily even a one-to-one correspondence). Refer to pedindex.out to associate your ID's with the pedigrees assigned by SOLAR. (Note: pedindex.out has a code file pedindex.cde and is best read using PEDSYS, but may also be read fairly well as a text file if PEDSYS is not available.) Note that the LOD score calculation may be affected by the number of trait(s), and the lodp options. See the documentation for the "lodp" command for further details. When applicable, SOLAR converts 2df bivariate LODs to "1df effective" LODs. Shortcuts: pedlo - pedlodReturn to command index
A1.142 perdelta
Purpose: Set delta used by perturb Usage: perdelta <number> Notes: Defaults to 0.001 Shortcuts: perd - perdeltaReturn to command index
A1.143 perturb
Purpose: Perturb starting values for E2, H2r, and H2q's at bounds Usage: perturb Notes: perturb is specially tailored to the standard parameterization of e2, h2r, h2q1, etc. perturb silently does nothing if parameters e2 and h2r are not present. It is no longer necessary or possible to specify h2qindex as an unqualified argument as in earlier versions. If an unqualified argument is specified, it is ignored. This is used automatically by the 'linkmod' script, and therefore also by multipoint and twopoint. perdelta is the quantity used in the adjustment (this may be set with the perdelta command). It defaults to 0.001. Accumulated deltas are distributed only to other parameters whose values are 3*perdelta away from the relevant bound, and then only in the perdelta quantity. This does not handle conditions where parameters exceed boundaries by more than a very small amount. (Of course, they shouldn't exceed the boundaries at all, but sometimes they do by very small amounts. Recent changes to the maximization routines ought to eliminate that.) Shortcuts: perturb - perturbReturn to command index
A1.144 phenotypes
Purpose: Load the phenotypes file or display its variables Usage: load phenotypes [<filename>]+ ;# sets phenotype file phenotypes ;# returns phenotype names ;# (and filenames) phenotypes -files ;# returns filenames used ;# (useful in scripts) Notes: (1) Pedigree data should be loaded beforehand with the "load pedigree" command. You may have pedigree and phenotypes data in the same file if it has all the required fields, but you will still have to load it as phenotypes after loading it as the pedigree. (2) The phenotypes file may be in Comma Delimited or PEDSYS format. There must be an ID field (and FAMID field, if ID's are not unique in all the data), and then there may be any number of phenotypes. Sex is determined by the SEX field in the pedigree file, and a SEX field in the phenotypes file is ignored. If FAMID is present in both pedigree and phenotypes files, it is assumed to be necessary to make ID's unique. If FAMID is not present in both files, uniqueness of ID's is tested during loading (since this is an often overlooked user error). If FAMID is present in both files, uniqueness is not tested. The fieldname EGO may be used in place of ID, and the "field" command may be used to specify a different field name for ID. For more discussion about the phenotypes file format, see file-phenotypes. (3) Once a phenotypes file is loaded in a particular working directory, it remains loaded until another phenotypes file is loaded, even if SOLAR is restarted there at a later time. The current phenotypes state is kept in a file named phenotypes.info, which points to the current phenotypes file(s) by name. When SOLAR starts, it checks this file and gets the header from the phenotypes file(s) so that the available phenotypes are known. (4) During maximization, the pedigree data and phenotypes file are joined, so it is possible some errors will not be apparent until that time. (5) Individuals missing phenotypic data are removed from the maximization sample, and need not be included in the phenotypes file; however, they should be included in the pedigree file as they may contribute to the pedigree structure and genetic coefficients of those individuals who are included. (6) Families in which no non-probands are present are removed from the maximization sample. Proband status is controlled by a PROBND field in the phenotypes file. To switch proband detection off, you may rename that field, or use the command "field proband -none". Shortcuts: phen - phenotypesReturn to command index
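Example (file names hypothetical):
 solar> load pedigree ped
 solar> load phenotypes phen
 solar> phenotypes -files ;# returns filenames used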
A1.145 plink_converter
Purpose: Converts plink .bed, .fam, and .bim files to .csv file(s) Usage: plink_converter -i <input base name> -o <output base name> optional: <-bin> <-max <maximum snps per file>> <-perchromo> <-solar> Example: plink_converter -i test -o test -bin -max 50000 -bin Assigns values 0, 1, 2 to allele combinations. A table with assignment values is provided. -max Sets the maximum number of snps per .csv file. For example, if an input set contains 500 snps and the option -max 100 is used, then five .csv files labeled <output_base_name>_<file number>.csv will be created. -perchromo Switch that separates output by chromosome. -solar Outputs snp data as 1/1, 1/2, or 2/2Return to command index
A1.146 plink_freq
Purpose: Calculates the allele frequencies from a plink file Usage: plink_freq -plink <base filename of plink file set> -o <output file name> pedifromsnps and gpu_pedifromsnps require the output of this command. The text file output consists of an initial value that represents the number of loci in the file. The subsequent values are the frequency values for each locus. If a value could not be calculated for whatever reason, then -1 is written in place of a frequency.Return to command index
A1.147 plot
Purpose: Plot multipoint LOD scores, empirical LOD adjustments, or power Usage: plot [<chromnum>] [-pass <passnum>] [-write] [-color <colornum>] [-overlay] [-title <title>] [-subtitle <subtitle>] [-yscale <maxy>] [-map <user_map>] [-lodadj] [-min x] [-max x] [-nomark] [-nomarklab] [-all | -allpass [-nodisplay] [-nomini]] [-string [-allpass] [-lod <lod>] [-lodmark] [-lodscale] [-color <colorname>] [-noconv] [-date] [-name <name>] [-font <X-font-spec>] [-titlefont <X-font-spec>] [-layers {{<layername> [-color <colorname>]} ... } [-replay {{<layername> [-color <colorname>]} ... } [-title <title>] [-dash 1/2] [-linestyle 1/2] [-liability [-model <name>]] [-power [-title <plot_title>]] plot -purge plot -close Examples: plot plot chromosome with highest LOD in pass 1 plot 9 plot chromosome 9 in pass 1 plot 9 -pass 2 plot chromosome 9 in pass 2 plot -all plot all chromosomes in pass 1 plot -all -pass 2 plot all chromosomes in pass 2 plot -allpass plot all chromosomes in all passes plot -string plot all chromosomes in pass 1 using "string" plot format plot -string -allpass plot all chromosomes in all passes using "string" plot format If postscript output files are saved, they are written to the current trait or outdir directory with names like these: chr01.ps chromosome 1 (pass 1) chr01.pass02.ps chromosome 1 (pass 2) pass01.ps Miniplot of chromosomes in pass 1 (plot -all -pass 1) pass01.str.ps String plot of pass 1 chromnum [1-29] Set chromosome number for plotting. The default is to plot the chromosome with the highest LOD score. -pass Set multipoint pass number for plotting. "1" would mean the first pass in which all models have one QTL. 1 is the default. -close Close the XMGR plot window. The miniplot and string plot display windows must be closed with their close buttons, but it is better if you close XMGR from the SOLAR command line. Otherwise, on your next plot, there will be a delay until SOLAR determines that it cannot communicate with the old XMGR session. Then, it will time-out and tell you to use the "tclgr close" command, which does the same thing as "plot -close". -write Write postscript output file for plot(s). If there are no other arguments and if a plot was done previously, the output file for the previous plot is written. Miniplot and stringplot files are always written by default. For plots drawn using XMGR, you can also choose to write the postscript file from the XMGR graphical interface, which gives you more options. See note 8 below. -nomark Do not show ticks or labels for markers. (This works for both regular and -string plots.) Unless this option is selected, there must be a mibddir selection in the current directory so that SOLAR can find the map files. -nomarklab Do not show labels for markers (still show ticks). -title Set plot title. Title may be blanked with -title "" or -title " ". This is supported by regular plots, string plots, and power plots only. Plots made through XMGR may also have the title set through the graphical interface or by editing a .gr file such as multipoint.gr. -subtitle Set plot subtitle. Supported by regular multipoint plots only. Subtitle may be blanked with -subtitle "" or -subtitle " ". -color Use this color for the curve (overrides multipoint.gr default) For regular plots, this must be an integer from 1-15; colors are defined by XMGR: 0:White 1:Black 2:Red 3:Green 4:Blue 5:Yellow 6:Brown 7:Gray 8:Violet 9:Cyan 10:Magenta 11:Orange 12:Indigo 13:Maroon 14:Turquoise 15:Green4 For string plots, the X11 color names are used.
Typical color names are: black white blue red green grey orange purple brown violet magenta yellow cyan Many mixtures and shades are also available. Find the rgb.txt file in your X11 installation for a complete list. -overlay Plot this curve on top of the current graph, which may already include more than one curve. (Each curve corresponds to a distinct XMGR set, of which 30 are available in the custom version of XMGR used by SOLAR. To control order of sets in Legend, use the -set option for every plot.) -purge Delete all previously created plotfiles (not valid with other options; only valid for multipoint plots). -string Plot all chromosomes (in pass 1 unless otherwise specified) using "string plot" format. (An alternative page of plots in xmgr format can be produced with the plot -all command.) -name Name this plot for later use (-string plots only). -layers <layerlist> Add one or more previous plots to this plot. This is either a simple list of previous names, or a nested list of names with other options; in either case each element of <layerlist> specifies a single layer. See the extended example below under -replay. (-string plots only). -replay <layerlist> Draw previous plots only; otherwise this is the same as -layers. (-string plots only) Example:
 trait q1
 plot -string -name A1
 trait q2
 plot -string -name A2 -layers {{A1 -color green}}
 trait q3
 plot -string -name A3 -layers {{A2 -color blue} {A1 -color green}}
 plot -string -replay {{A3 -color grey} {A2 -color blue} {A1 -color red}}
 plot -string -replay {A3 A2 A1} ;# just default colors
Note that spaces between close and open braces, as shown above, are required. You can specify -color for the new top level plot and/or for layers in the -layers or -replay list. Any unspecified colors will default to a built-in set of defaults. -lod lod Add horizontal scales above this lodscore (for string plot only) -noconv Do not mark convergence errors (string plot only) -date Datestamp (string plot only) -lodmark Put marker ticks ON TOP of LOD curve (default is to the left of the plot axis) String plot only. -lodscale Show the LOD scale when this LOD is exceeded (default is for the scale only to appear for the highest LOD). String plot only. -font (String plot only!) Specify X font to use for title. Use command "xlsfonts | more" to list X fonts. Wildcards may be used; however, results are sometimes unexpected. For example *bold-r* will match the first bold, roman font in the list, whatever that happens to be. -dash 5/5 (String plot only!) Specify line style dot and dash. Two or more integers are specified separated by slashes. The first and all odd position numbers specify the length of drawn segments, whereas the second and all even position numbers specify the transparent segments. Results are approximate depending on system used. -linestyle 5/5 (String plot only!) Same as -dash (see above). Note that for regular plots, linestyle can be changed by editing the linestyle parameter in the applicable .gr file such as multipoint.gr. -titlefont (String plot only!) Same as -font, except applies to title only. Supersedes -font for title only. -all Plot all chromosomes in xmgr postscript format (in the first pass only unless -allpass specified). A page of miniature chromosome plots in postscript is created (if a python interpreter is available). The names of all postscript files are listed, and any of them may be printed with the lp command. Nothing is displayed on your desktop with this command. An alternative genome plot is available with "plot -string".
-allpass Plot all chromosomes in all passes, producing either miniplots or "string" plots (if -string). -nodisplay Used with -all or -allpass to skip displaying miniplots on-screen (has no effect on xmgr graphs). -nomini Used with -all or -allpass to skip making miniplots. Automatically sets the "-write" option to write all individual chromosome plots. Miniplots can always be made later, and with more options, using the separate "miniplot" command. -yscale [NOTE: Ordinarily you do not need to use this.] This sets the smallest LOD scaling if there is no LOD above 4.99. Autoscaling will not apply for smaller values to prevent confusion (e.g. seeing what looks like a large peak but isn't because the y scale is so small). The default value is 4.99. You can set this to 0 if you really want to look at tiny LOD curves. Larger scaling is applied automatically, as is the adjustment to allow space for marker labels. -map Use user_map file (in user map format). By default, the map information processed by the 'load map' command (and saved in the mibddir) is used to display marker labels and locations. However, you can substitute a user map file (set file-map) by using this argument. This allows you to delete unimportant markers and add QTL's of particular interest. -min m location at which to start plotting -max m location at which to end plotting (min and/or max may be used to restrict interval. These apply to ordinary chromosome plots only.) -quick Save time when plotting by re-using marker ticks and labels from previous plot. This option is used automatically when you are plotting from the multipoint command using the "-plot" argument. -lodadj Plot empirical LOD adjustment scores. None of the above arguments except -close and -color are applicable in this case. The format file lodadj.gr is used instead of multipoint.gr, but the rules are applied in the same way (see notes below). -liability Plot discrete trait liability function (a different kind of plot, not related to the above). The "polygenic" command must have been run first, and the following covariates must have been included: sex age age*sex age^2 age^2*sex The xmgr parameter file for -liability is liability.gr -model name Specify a different modelname for the -liability option. There must be a companion maximization output file (maximize -output name.out) so, for example, there is name.mod and name.out. The default is poly (poly.mod and poly.out are created by the polygenic command). -power Plot power versus QTL heritability. Only the -title argument is applicable in this case. The format file power.gr is used instead of multipoint.gr. Notes: 1. The trait or outdir must have previously been specified so the applicable multipoint file can be found. 2. Marker labels and ticks are taken from the mibdchrN.loc file (where N is the chromosome number) created from the user's map file during mibd file creation. These files should be stored in the mibddir (and the mibddir should be specified before running plot). If the mibdchrN.loc file(s) cannot be found, marker ticks and labels will not be shown. In SOLAR releases 1.1.2-1.2.0 the 'load map' command will create a mibdchrN.loc file in the current directory. There will be a tick for each marker, and up to 42 marker labels will be displayed. If there are more than 42 markers, some labels will not be displayed. Labels are prioritized based on LOD score and distance from nonzero value. By plotting after the multipoint session has completed, one gets the best benefit from this label prioritization.
Marker ticks are always drawn vertically; an additional line (which might be diagonal) joins the label to its tick. You can eliminate the need for the map file by using the -nomark option. 3. XMGR (ACE/gr) is used for most plotting, using the tclgr command. Each SOLAR process can have only one tclgr session open. You can change the plot command used with the 'tclgr syscommand' command (it must be XMGR pipe command compatible). If SOLAR is exited before closing the plot session, the plot session may remain active (however, it may print a warning about not being able to access the named pipe). If the user terminates the XMGR session through its graphical interface, the command 'plot -close' must be given to reset it before another plot command can be given. 4. The XMGR parameter setup file multipoint.gr is loaded. First, the multipoint.gr from SOLAR_LIB is loaded, then the multipoint.gr from ~/lib (if any), then the multipoint.gr from the working directory (if any). You need only include the parameters you want to change in your local copy. 5. When SOLAR exits, the XMGR session will be terminated. If the automatic termination of XMGR should fail, the user should terminate XMGR anyway to prevent it from hogging CPU. (The custom XMGR in SOLAR_BIN prevents CPU hogging.) 6. NaN's are indicated by X's on the curve. Areas of the curve in between multiple X's may be invalid. (NaN's are Not A Number, which means maximization failed to arrive at a valid likelihood estimate.) 7. There are two additional options, -set and -graph, whose usage is discouraged except under exceptional circumstances. They might force the set and graph numbers to specific values. By default, the set number is 1 (changed in version 1.6.0) except for overlays. Overlays use the first available set number counting backwards from 29. The graph number (except for overlays) is the same as the set number (overlays must use the preceding graph number). Fooling with these can get you into trouble, but under difficult circumstances they might help. 8. Standard postscript landscape mode is used in all output files. If you want to choose any other output features, such as Encapsulated Postscript (EPS), portrait mode, etc., for those plots made by XMGR, you can open the "Printer Setup" dialog (under the "File" menu). There you can select portrait output in a pulldown menu, check a "Generate EPS" check box, etc. Then, to write the file, select the "File" option in the "Print to:" pulldown, and then press the "Print" button at the bottom of the dialog box. You need not go to the separate "Print" option in the file menu, and sometimes it seems to work better to print directly from the Printer Setup dialog anyway. All postscript files can be printed using the "lp" command. Displaying postscript or editing on screen depends on locally available software. Shortcuts: plo - plotmultiReturn to command index
A1.148 plotqtld
Purpose: Plot qtld (qtld.out) Usage: plotqtld <type> [-nolabels] [-nomarkers] [-file filename] [-local] <type> is one of: strat, mgeno, qtdt, qtld -nolabels do not include "marker" labels (ticks only) -nomarkers do not include marker ticks or labels -file Use named file instead of qtldm.out in outdir -local Ignore default plot parameters; use only local file Notes: You must select the trait or outdir first. The plot parameter file (in SOLAR_LIB) is qtld.gr. You may override it with a copy in ~/lib or your working directory. Your version need only include the parameters you would like to change. This should work in most cases. If you specify -local, however, the qtld.gr in SOLAR_LIB is completely ignored, and your qtld.gr must be complete, which might get around some very obscure conflict between the two plot parameter files.Return to command index
A1.149 plotqtn
Purpose: Plot qtn marginal tests (qtnm.out) Usage: plotqtn [-nolabels] [-nomarkers] [-file filename] [-local] -nolabels do not include "marker" labels (ticks only) -nomarkers do not include marker ticks or labels -file Use named file instead of qtnm.out in outdir -local Ignore default plot parameters; use only local file Notes: You must select the trait or outdir first. See qtnm for more information. It must be possible to find the qtnm.out file in the outdir. The plot parameter file (in SOLAR_LIB) is qtn.gr. You may override it with a copy in ~/lib or your working directory. Your version need only include the parameters you would like to change. This should work in most cases. If you specify -local, however, the qtn.gr in SOLAR_LIB is completely ignored, and your qtn.gr must be complete, which might get around some very obscure conflict between the two plot parameter files. plotqtn accepts either the original 4 or the new 5 column qtnm.out files. The 5 column files begin with the snp name, which is not necessarily the location.Return to command index
A1.150 polyclass
Purpose: Set up polygenic model with class specific parameterization Usage: polyclass [-g] [-intrait] [-incovar] [<class-start>[-<class-end>]]+ [-comb] [-maxi] [-rincovar] [-maxsnp <snp_name>] sporclass [-g] [-intrait] [-incovar] [<class-start>[-<class-end>]]+ [-comb] [-maxi] [-rincovar] [-maxsnp <snp_name>] -g Use global phenotypic values to set parameter adjustments (otherwise, means are determined for each class) -intrait inormalize trait values on a per-class basis -resmax inormalize residual values in place of traits -incovar (NOT WORKING IN version 7.1.2) inormalize covar values on a per-class basis (only used for simple linear covariates, no interactions or exponents) -comb all classes combined model -max after building the model, maximize it -maxsnp <snp_name> Maximize and include snp_name as covariate in the model and determine statistics for it: beta, beta se, chi, p, and variance explained (varexp). H2r's are reported for the models with and without the snp. -append Append results to existing output file(s) if any Short Example:
 trait q4
 covariate age sex
 polyclass 1-3 9
 maximize -q
Notes: One phenotypes file must have a field named "class" which defines the class value for each person in the sample. Class specific parameters are given names with _c<class> appended. User covariates are transformed into class-specific mu addends. All individuals in sample must have all variables specified as covariates. After choosing trait and covariates, do either sporclass or polyclass. You cannot do a second polyclass on a sporclassed model to make it polygenic. Unbalanced covariates for multivariate traits are not supported. This is different from ordinary covariate behavior for multivariate traits--which permits covariates to be missing in the sample if they are specific to a missing trait. A defined pseudo-covariate named "blank_classes()" restricts the sample to the union of all classes specified. The maximized model is saved in the output directory as polyclassmax.mod with output file polyclassmax.out. Note that if the -intrait option is selected, the trait name and default output directory will have a leading i_ prefix (for the inormalization). If the -resmax option is selected, the trait will be named "residual" or "i_residual" if -intrait is also selected.Return to command index
A1.151 polyclass_normalize
Purpose: Runs sporadic model and inormal on a phenotype Usage: polyclass_normalize -out <output filename> optional: -class <class values separated by commas> Example: load phenotypes <phenotypes file> covariates <covariate list> trait <trait to be analyzed> polyclass_normalize -class 1,2,3 -out polyclass_normalize_output.csv The polyclass normalization function performs normalization of datasets collected across different studies in preparation for mega-analysis. See Jahanshad and Kochunov, Neuroimage. 2014 Apr 15;90:470-1 for details. In short, the data for each dataset should be coded by a class variable, with one class value (0, 1, 2, 3 ...) coding each individual dataset. Polyclass normalization performs regression of the covariates for each dataset and then inverse Gaussian normalization of the residuals. The output is written to the file specified by the -out argument.Return to command index
A1.152 polygenic
Purpose: Perform polygenic, sporadic, and/or household analysis Calculate H2r, significance of H2r, and proportion of variance contributed by covariates. Optionally performs covariate screening (determine significance level of each covariate). Usage: polygenic [-screen] [-all] [-p | -prob <p>] [-fix <covar>] [-testcovar <covar>] [-testrhoe] [-testrhog] [-testrhoc] [-sporadic] [-keephouse] [-testrhop] [-rhopse] [-fphi] (screencov is an alias for 'polygenic -screen') (sporadic is an alias for 'polygenic -sporadic') Typically before giving this command, you will give trait, covariate, and house (if applicable) commands. You will also load pedigree and phenotypes files if they have not already been loaded.
 solar> load pedigree ped
 solar> load phenotypes phen
 solar> trait hbp
 solar> covariate age sex age*sex smoke
 solar> polygenic -screen
Alternatively, you may use the "automodel" command first to include all available phenotypes as covariates. See note 2 below and "help automodel". -screen (or -s) Perform covariate screening: Calculate significance level for each covariate, and run only the significant covariates in the final analysis. An inclusive significance threshold of 0.1 is used, but may be changed with the -prob option. Covariates may be locked in regardless of significance with the -fix or -all options. (An alternative method of covariate analysis using bayesian model averaging is available with the command: bayesavg -covariates) -p (or -prob) p is the probability level for keeping covariates as "significant." The default is 0.1. It is set to be generous so that covariates are not removed unnecessarily. (The probability levels for H2r and C2 are fixed at 0.05; however, H2r is never removed from the final model even if it is judged to be not significant, and C2 is only removed from the model if it is zero in the final model and therefore has no effect at all.) -fix (or -f) "fix" (lock in) this particular covariate regardless of significance level. NOTE: a -fix or -f qualifier is required for each covariate to be fixed, for example: -f age -f sex -all (or -a) Keep all covariates in final analysis regardless of significance level. -testcovar <covar> Test the probability of this covariate only. All other covariates are fixed and unscreened. This argument is incompatible with -screen (screen all covariates). The tested covariate is not removed from final model regardless of probability. For -testcovar, the default probability level for declared "significance" is 0.05 (which can be changed with -p option). Also, the reported proportion of variance is for the tested covariate only. -testrhoe (Bivariate only) Test significance of rhoe difference from 0 by running model where rhoe is constrained to 0. The p value is shown in the same line as the RhoE value. -testrhog (Bivariate only) Test significance of rhog differences from zero and from 1 (if positive) or -1 (if negative). Because there may be two p values, they are shown in line(s) below the RhoG result and standard error. -testrhoc (Bivariate Household only) Test significance of rhoc differences from zero and 1 (if positive) and -1 (if negative). Because there may be two p values, they are shown in line(s) below the RhoC result and std. error. -testrhop (Bivariate polygenic only) Test significance of derived estimate of phenotypic correlation differences (difference from 0).
-rhopse (-testrhop must be specified also) Get standard error of rhop, saved in model file rhop.mod and variable SOLAR_RhoP_SE -sporadic Only evaluate sporadic models, not polygenic. -keephouse Keep "household effect" C2 parameter in final model even if it maximizes to zero in the best polygenic (or sporadic) model. -residinor After maximizing final sporadic model (after covariate testing, if that is done), residualize the final model and inormalize the residual trait. (Warning! The phenotypes file loaded at the end of analysis will be the residual phenotypes file.) -fphi Option to run polygenic using the fphi function. Notes: (1) Output is written to directory selected by 'outdir' command, or, if none is selected, to a directory named by the trait. This is called the "maximization output directory." Polygenic results are in file named polygenic.out. Important loglikelihoods and statistical computations are recorded in polygenic.out.logs. If the -sporadic option is selected, the files are sporadic.out and sporadic.out.logs. For univariate models, the residuals are computed and written to a file named polygenic.residuals (or sporadic.residuals), then the statistics of those residuals are written to a file named polygenic.residuals.stats (or sporadic.residuals.stats). If the residual kurtosis is above 0.8, you get a special warning (see note 6 below). You also get a special warning if the trait standard deviation is below 0.5, which is undesirable for numerical reasons. (2) Prior to running polygenic, you should set up the trait and covariates. You may use the trait and covariate commands, or use the "automodel" command. "automodel" selects all variables otherwise unaccounted for in the phenotypes file as candidate covariates, and also sex and the standard interactions with sex and age. (If you are unfamiliar with "automodel" it would be a good idea to examine the covariates afterwards with the covariates command...) (3) Beginning with Version 8.3.1, the phi2.gz matrix is loaded during maximization if phi2 is not already loaded by the user. That was already being done for many kinds of maximization, just not the simplest univariate quantitative. The AutoPhi2 modeltype brings back the original on-the-fly calculation of phi2, if no phi2.gz had been loaded. That is often not what is intended, but sometimes people who had not saved their phi2.gz were relying on it. (4) If household effect (see "house") is in effect when the polygenic command is given, it will be included in the analysis. If the household parameter C2 is 0 in the household polygenic model, it will be removed from the final model regardless of whether "covariate screening" is performed, unless -keephouse is specified. The p value for C2 will be computed (if C2 is nonzero), but the p value will not cause C2 to be removed from the final model. The p value of the C2 parameter is not computed for bivariate models. (5) If any covariates have been constrained by the user, certain tests are not allowed: the determination of total variance due to covariates, or the Kullback-Leibler R squared (done for discrete traits). Also, such covariates are not included in the "screening" if the screening option is selected. (6) If you get the message about Residual Kurtosis being too high because it is above 0.8, there is danger of LOD scores being estimated too high in a subsequent linkage analysis. You should start over using either tdist or lodadj or inormal (see documentation) to protect against this.
If you are already using tdist or lodadj, you may ignore this warning, but it would be fair to report both the Residual Kurtosis and the method you are using to deal with it. We most strongly recommend inormal, which in conjunction with the define command creates an inverse normalized transformation of your trait(s). If there are no covariates, the Kurtosis is computed from the trait itself, and no "residuals" are computed. The same warning threshold applies. We define Kurtosis as 0 for a standard normal distribution; 3 has already been subtracted from the normalized 4th central moment. (7) The polygenic command only supports our "standard" parameterizations. If you would like to use the esd,gsd,qsd parameterization, use the polygsd command (see "help polygsd" for more information) instead. (8) For bivariate polygenic models only, a derived estimate of RhoP, the phenotypic correlation, is displayed on terminal and written to polygenic.out. This estimate is computed from the h2r's, rhog, and rhoe according to the following formula: sqrt(h2r(ti))*sqrt(h2r(tj))*rhog + sqrt(1-h2r(ti))*sqrt(1-h2r(tj))*rhoe To determine the significance of RhoP by comparing models with a rhop parameter and a rhop parameter constrained to zero, use the -testrhop option. Additional models rhop.mod and rhop0.mod are written to the output directory. (9) The polygenic command creates global variables which may be accessed later (which is often useful in scripts). The variables are:
 SOLAR_Individuals number of individuals included in sample
 SOLAR_H2r_P p value for h2r
 SOLAR_Kurtosis residual trait kurtosis
 SOLAR_Covlist_P list of p values for covariates
 SOLAR_Covlist_Chi list of chi values for covariates
 SOLAR_RhoP derived estimate of phenotypic correlation for bivariate polygenic models, {} if not calculated
 SOLAR_RhoP_P -testrhop sets this to p value of rhop being nonzero
 SOLAR_RhoP_SE -rhopse sets this to se value of rhop
 SOLAR_RhoP_OK -testrhop sets this if likelihood of rhop parameterized model matches polygenic, as it should
The covariate lists are created only if the -screen option is used. All screened variables are included, regardless of whether they were retained in the final model. Before you can access any of these variables in a script, you must use a "global" command. For example:
 global SOLAR_Kurtosis
 if {$SOLAR_Kurtosis > 4} {puts "Very bad kurtosis!"}
(10) The default is for the standard error option to be turned on (and temporarily off, when desirable for certain tests). However, if you turn the standard error option off before starting polygenic, it will remain off. Shortcuts: polyg - polygenicReturn to command index
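Example: a minimal script sketch using the bivariate options and global variables described above (the trait names are hypothetical):
 trait q1 q2
 polygenic -testrhop -rhopse
 global SOLAR_RhoP SOLAR_RhoP_P SOLAR_RhoP_SE
 puts "RhoP = $SOLAR_RhoP (p = $SOLAR_RhoP_P, se = $SOLAR_RhoP_SE)"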
A1.153 polygsd
Purpose: Set up polygenic model with esd and gsd parameters (EXPERIMENTAL) Usage: polygsd Note: "model new" and "trait" commands should be given first. After polygsd, you should use the "maximize" command. Use the gsd2h2r command to convert the resulting esd,gsd parameters to an h2r value. Use the linkqsd command to add in a linkage element afterwards. Example:
 model new
 trait q4
 covar age sex
 polygsd
 maximize
 linkqsd gaw10mibd/mibd.9.18.gz ;# could maximize after this
 chromosome 9 10
 interval 5
 mibddir gaw10mibd
 multipoint -link linkqsd0 -cparm {esd gsd qsd}Return to command index
A1.154 polymod
Purpose: Set up polygenic model with the standard parameters Usage: polymod [-d] IMPORTANT: Phenotypes, trait, and covariate commands must be given beforehand. -d Check for discrete trait(s) and make necessary changes. In most cases, this option is not necessary because "maximize" later checks for discrete traits and can also make these changes: constraining SD to 1 and making sure the phi2 matrix is loaded, for each discrete trait. However, use of the -d option can make the constraint or matrix order inside complex models easier to deal with. Notes: The starting lower bound for e2 is controlled by e2lower. Normally you do not use this command directly, but instead use the "polygenic" command to do a complete polygenic analysis, which maximizes a polygenic model set up using this command. See the tutorial in Chapter 3. polymod will modify an existing sporadic or linkage model to change it to polygenic. Use spormod to set up a sporadic model, and linkmod to set up a linkage model. None of these commands maximize models; they just set up or modify the parameters and omega as required. This command removes a house parameter (if present) from the omega, since a "polygenic" model is distinct from a "household polygenic" model. If you want the latter, call polymod first, then house. Or call house, THEN polygenic, since the polygenic command will check for and handle household effect properly. Shortcuts: polym - polymodelReturn to command index
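Example (a sketch; trait and covariate names are hypothetical):
 solar> model new
 solar> trait q4
 solar> covariate age sex
 solar> polymod
 solar> maximize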
A1.155 polyvoxel
Purpose: polygenic analysis of image data Usage: trait ... covariate ... polyvoxel <maskname> <outname> <maskname> the filename of mask image file to use <outname> the filename of image file to write as output Notes: Trait and covariate should be selected first. The number of layers in the output image will be adjusted to match the number of traits and covariatesReturn to command index
A1.156 power
Purpose: Perform power calculations (or MathMatrix power operation) Usage: power [-prev <prev>] [-h2t <h2t>] [-h2r <h2r>] [-data <fieldname>] [-grid {<from> <to> <incr>}] [-lod {<lod> ...}] [-freq <freq>] [-nreps <nreps>] [-seed <seed>] [-overwrite] [-plot] power -restart [-grid {<from> <to> <incr>}] [-lod {<lod> ...}] [-nreps <nreps>] [-plot] power <matrix> <integer-power> ;# see help mathmatrix This command performs a power calculation for the currently loaded pedigree, with the following assumptions: (1) the trait to be studied is either quantitative or dichotomous (e.g. affected/unaffected) (2) the trait to be studied is influenced by a single bi-allelic QTL with, optionally, a residual additive genetic effect due to polygenes (3) there will be fully informative marker genotype data available for all study subjects (4) all study subjects will be phenotyped for the trait to be studied (unless the -data option is used to exclude those individuals who will not have phenotypic data; see the description of this option below) Simulation is used to estimate the LOD score one would expect to obtain for a QTL having a certain effect size (i.e. QTL heritability). The expected LOD is calculated for a range of effect sizes. The ELODs, in turn, are used to compute the power to detect a QTL having these various effect sizes with, for example, a LOD of 3. Replicates of the simulation are performed for each effect size in a grid of effect sizes (see the -grid and -nreps options below for the defaults). For each replicate, both a polygenic and a linkage model are fitted to the simulated data and then compared. The resulting QTL heritability estimate and LOD score are recorded. The observed LODs are converted to power, i.e. the power to detect the corresponding observed effect size with a specified LOD. The following options give the user some control over the power calculation procedure: -prev If the trait to be studied is dichotomous, SOLAR will assume the existence of an unobserved liability distribution. Individuals with liabilities above some threshold value will be "affected", i.e. they will have the larger of the two trait values (for example, a 1 for a 0/1 trait). The -prev option is used to specify the "disease" prevalence, or fraction of individuals who are "affected", which in turn determines the liability threshold. -grid Specify the set of effect sizes for which ELODs will be computed. The grid is given by a set of three numbers enclosed in curly braces: {<from> <to> <incr>} where <from> is the starting effect size, <to> is the last effect size considered, and <incr> is the interval between grid points. If the desired grid consists of a single effect size, the three-number list can be replaced by that single number and curly braces are not required. The default grid is from 0.05 through 0.5 by steps of 0.05. -h2r At each grid point, add a constant residual additive genetic heritability <h2r> to the QTL-specific heritability. -h2t Set the residual heritability so that the total heritability (QTL plus residual) is equal to a constant value <h2t>. -data Exclude individuals from the power calculation who are missing data for phenotype <fieldname>. -lod Specify the set of LODs for which power will be computed. If more than one LOD is specified, the set of numbers must be enclosed in curly braces. The default set of LODs is { 3 2 }. The order of the LODs is important since it is reflected in the output file power.out (see below).
The set of LODs can also be changed for a completed power calculation by using the -lod option in conjunction with the -restart option. -freq Specify the frequency of the first of the two alleles assumed to exist for the QTL. The default allele frequency is 0.2113; this frequency results in the simulated trait having kurtosis = 0. -nreps Perform <nreps> simulations at each grid point. The default number of replicates is 100. -seed Set the random number generator seed. The default is to set the seed based on the date and time. -plot At the end of the power calculations, display a plot of power versus QTL heritability. To display this plot for a previously completed calculation, use the command "plot -power". -overwrite (or -ov) Overwrite the results of a previous power calculation. -restart (or -r) Restart a power calculation. Notes: It is possible to change the grid of effect sizes and the number of replicates when restarting a calculation. The calculation will not be restarted if a grid is chosen that does not include all the points in the previously specified grid unless the -overwrite option is included, in which case the simulation replicates for any extra grid points are discarded. Similarly, the -overwrite option is necessary if fewer replicates are requested than were done previously, in which case any extra replicates are discarded. The set of LODs for which power estimates are computed can also be changed in a restart. The other parameters, e.g. h2t, cannot be changed and are kept the same as in the original run, with the exception of the seed for the random number generator which is set based on the date and time. The plot of power versus QTL heritability is derived from a smoothed estimate of the ELODs. Smoothing is achieved with a least-squares fit of a second degree polynomial to the ELODs as a function of QTL heritability. It is important to have a sufficiently large number of replicates to produce a reasonable curve fit. The default of 100 replicates should suffice in most cases. To compute power as a function of the unsmoothed ELODs, include the -nosmooth option. The following files are created: power.out A space-delimited file containing a line for each grid point in the format X Y1 Y2 ..., which is suitable for input to plotting packages such as xmgr. The first (or X) column contains the QTL heritability. The succeeding columns hold the power estimates, each corresponding to a different LOD. These columns are in the order given by the -lod option. power.info Stores the various options selected along with the ELODs, averaged over the replicates, at each grid point. power.lods Stores the results of the simulation replicates run at each grid point. This file, along with power.info, is used to restart an interrupted power calculation. During a power calculation, various files named "simqtl.*" are created along with a trait directory named "simqt". These will be removed at the end of the run.Return to command index
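Example: a sketch of a power calculation for a dichotomous trait (the pedigree file name and the prevalence, grid, and LOD values are hypothetical):
 solar> load pedigree ped
 solar> power -prev 0.1 -grid {0.05 0.5 0.05} -lod {3 2} -nreps 100 -plot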
A1.157 putsa
Purpose: Write to file and/or terminal without having to open and close Usage: putsnew <filename> ;# create new file or truncate old file putsa <filename> <string> ;# append to file putsat <filename> <string> ;# append to file and write to terminal Notes: 1. putsnew creates the file if not yet created, or truncates existing file to zero length. This is generally needed before starting to append to file using putsa unless you know the file already exists. 2. These procedures are NOT as efficient as Tcl's built-in operations for writing to a file (open, puts, close). If efficiency is paramount, and you have a tight writing loop not subject to exceptions or with fully handled exceptions, it is preferable to use the Tcl operations. However these procedures are believed to be as efficient as shell append piping (>>). These procedures are considerably more efficient than the now obsolescent "putsout". 3. putsa writes (appends) to the existing file 4. putsat writes (appends) to the existing file, and to the terminal 5. If an absolute filename is not specified, the file path is relative to the current working directory. If you need to write to a file in the maximization output directory, use the procedure full_filename first to produce the required pathname. Beware that the pathname produced by full_filename is also relative to the current working directory, so that if you change the current working directory, or the trait, or the outdir, you should run full_filename again, if you want to continue writing to a file in the current maximization directory (though it will then be a different directory and a different file). Example: # writing to file in maximization output directory set myfullname [full_filename myname.out] putsnew $myfullname putsat $myfullname "Writing to $myfullname" putsa $myfullname "writing more, but not to terminal this time" trait newtrait set myfullname [full_filename myname.out] putsnew $myfullname putsat $myfullname "Now writing to $myfullname"Return to command index
A1.158 putsat
solar::putsa -- Purpose: Write to file and/or terminal without having to open and close Usage: putsnew <filename> ;# create new file or truncate old file putsa <filename> <string> ;# append to file putsat <filename> <string> ;# append to file and write to terminal Notes: 1. putsnew creates the file if not yet created, or truncates existing file to zero length. This is generally needed before starting to append to file using putsa unless you know the file already exists. 2. These procedures are NOT as efficient as Tcl's built-in operations for writing to a file (open, puts, close). If efficiency is paramount, and you have a tight writing loop not subject to exceptions or with fully handled exceptions, it is preferable to use the Tcl operations. However these procedures are believed to be as efficient as shell append piping (>>). These procedures are considerably more efficient than the now obsolescent "putsout". 3. putsa writes (appends) to the existing file 4. putsat writes (appends) to the existing file, and to the terminal 5. If an absolute filename is not specified, the file path is relative to the current working directory. If you need to write to a file in the maximization output directory, use the procedure full_filename first to produce the required pathname. Beware that the pathname produced by full_filename is also relative to the current working directory, so that if you change the current working directory, or the trait, or the outdir, you should run full_filename again, if you want to continue writing to a file in the current maximization directory (though it will then be a different directory and a different file). Example: # writing to file in maximization output directory set myfullname [full_filename myname.out] putsnew $myfullname putsat $myfullname "Writing to $myfullname" putsa $myfullname "writing more, but not to terminal this time" trait newtrait set myfullname [full_filename myname.out] putsnew $myfullname putsat $myfullname "Now writing to $myfullname"Return to command index
A1.159 putsnew
solar::putsat -- solar::putsa -- Purpose: Write to file and/or terminal without having to open and close Usage: putsnew <filename> ;# create new file or truncate old file putsa <filename> <string> ;# append to file putsat <filename> <string> ;# append to file and write to terminal Notes: 1. putsnew creates the file if not yet created, or truncates existing file to zero length. This is generally needed before starting to append to file using putsa unless you know the file already exists. 2. These procedures are NOT as efficient as Tcl's built-in operations for writing to a file (open, puts, close). If efficiency is paramount, and you have a tight writing loop not subject to exceptions or with fully handled exceptions, it is preferable to use the Tcl operations. However these procedures are believed to be as efficient as shell append piping (>>). These procedures are considerably more efficient than the now obsolescent "putsout". 3. putsa writes (appends) to the existing file 4. putsat writes (appends) to the existing file, and to the terminal 5. If an absolute filename is not specified, the file path is relative to the current working directory. If you need to write to a file in the maximization output directory, use the procedure full_filename first to produce the required pathname. Beware that the pathname produced by full_filename is also relative to the current working directory, so that if you change the current working directory, or the trait, or the outdir, you should run full_filename again, if you want to continue writing to a file in the current maximization directory (though it will then be a different directory and a different file). Example: # writing to file in maximization output directory set myfullname [full_filename myname.out] putsnew $myfullname putsat $myfullname "Writing to $myfullname" putsa $myfullname "writing more, but not to terminal this time" trait newtrait set myfullname [full_filename myname.out] putsnew $myfullname putsat $myfullname "Now writing to $myfullname"Return to command index
A1.160 putsout
Purpose: Write message to terminal and/or file [obsolescent] NOTE: This is inefficient and no longer recommended. New code should use putsa to append to a file and putsat to append to file and write to terminal. See help putsa. Usage: putsout [-q] [-d.] [-nonewline] <filename> <message> -q No output to terminal -d. Write file to current directory -nonewline As with puts command (note: may delay output) <filename> *name* of file in current output directory (outdir) <message> string Simple Example: putsout mine.out "The result was $result" Advanced Example: (Beginners ignore!) set q "" ifverbmax set q "-q" eval putsout $q \"Iteration: $i Value: $value\" Note: If using a variable for -q which could be "", be sure to use eval, otherwise "" would be considered the filename argument; then remember to \ the quotes or they disappear during the eval.Return to command index
A1.161 qtld
Purpose: Association analysis for snps Usage: qtld Notes: Current model is used as starting point. It is saved in output directory as qtld.start.mod with standard errors turned off. Snp association phenotypes are prefixed by b_ w_ b2_ and w2_ and are taken from the currently loaded phenotypes files. If there is one matching phenotype, the other 3 are expected, and it is an error if any are missing. Main output is written to terminal and file qtld.out in the output directory. An additional file with detailed measured genotype information is written to mgeno.out in output directory. These output files are fixed column size and space delimited. The output fields in the qtld.out file (same as in the terminal output) are: Trait, SNP, Stratification, Measured Genotype, QTDT, QTLD The output fields in the mgeno.out file (also listed at the top of the file) are: Trait, SNP, p(mg), h2m, muAA, se(muAA), muAB, se(muAB), muBB, se(muBB)Return to command index
A1.162 qtnm
Purpose: Marginal tests for bayesavg -qtn Usage: [allsnp] bayesavg -qtn -max 1 [load map snp.map] qtnm [-noplot] [-nomap] -noplot Do not plot results -nomap Do not use map file; SNP locations are encoded in names Notes: You must do bayesavg -qtn [-max 1] first, then qtnm. qtnm writes a datafile qtnm.out in the outdir, then invokes plotqtn to plot it. (The -max 1 is optional, however, if you want to do this quickly, you had best include it.) To include all snps as covariates in the starting model, use the separate command "allsnp". SNP covariate names (after the snp_ or hap_ prefix) will be mapped to locations using the currently loaded map file, which must be loaded prior to running qtnm. Map files stay loaded from one solar session to the next (in the same working directory) so once you have loaded it, you do not need to load it again. Beginning with version 3.0.3, snp names will always be mapped to locations using a loaded map file. However, you can revert to the previous method, in which the locations are encoded into the snp "names" using the -nomap option. Beginning with SOLAR version 3.0.2, the qtnm.out file has the following 5 columns: SNP Name (or location if numeric) SNP location Chi Squared p log(p) Previously there was no "SNP Name" column because it was assumed to be the location. Note that plotqtn accepts qtnm.out files with either 4 or 5 columns. Shortcuts: qtn - qtnmarkerReturn to command index
A1.163 quadratic
Purpose: Get the most recent quadratic form after a maximization Usage: quadratic Note: This could be used in a TCL script like this: set firstq [quadratic] Shortcuts: quadrat - quadraticReturn to command index
A1.164 read_arglist
Purpose: Read hyphenated optional arguments and argument-value pairs Usage: read_arglist arglist [identifier_name value_var]+ value_var := varname | {statement block} Notes: This provides a general way of handling argument lists of the form: [[arg] | [-identifier [value]]]+ Which is to say that there may be any number of "regular" arguments and "hyphenated" arguments. The "hyphenated" arguments may be stand-alone or paired with values. (Unlike typical Unix syntax, stand-alone hyphenated arguments MAY NOT be strung together, and hyphenated arguments with values must be separated by space and not with some other character such as =). The "regular" arguments (those not hyphenated or part of a hyphenated pair) are put into a list which is returned by this procedure. Hyphenated arguments may either require following "value" arguments or not allow them (in which case the hyphenated argument acts like a switch). Value arguments must be separated from the hyphenated argument by a space (as is typical in Tcl). For example: bar -height 1.5 There are two ways read_arglist can handle a hyphenated argument. (1) The first, specified by the 'varname' expansion of value_var, performs an assignment of the "value" argument to the caller's varname variable. For example: read_arglist $args -height h If $args contains "-height 1.5", then 1.5 will be assigned to the caller's 'h' variable. Note that this method always requires a value argument and so does not work for switch arguments. (2) The second, specified by the '{statement block}' expansion of value_var, executes an arbitrary set of expressions in the caller's context. This allows a simple switch or more complex list-building. If the statement block contains the token VALUE, a value argument is required and the token VALUE is replaced by the actual value argument. Substitution is performed only once and only for the first occurrence of VALUE. A simple switch is implemented like this: read_arglist $args -bottom {set bottom 1} If $args contains "-bottom," bottom (in the caller's context) is set to 1. A value argument is neither required nor allowed. A list-building argument is implemented like this: read_arglist $args -include {lappend ilist VALUE} If $args contains "-include foo" then "lappend ilist foo" is executed in the caller's context. NOTE that in the {statement block} form, the statement block IS REQUIRED to have more than one list element. A llength is done to determine which form is being used. Thus, you cannot have: read_arglist $args -exit {exit} ;# *** BAD *** but you could have read_arglist $args -exit {eval exit} If -* is used as an identifier_name, it matches any argument in the argument list and causes that argument to be added to the return list. Normally -* should be the last identifier_name; all following identifier_names will be ignored. Also, the value_var (or statement block) following -* is never assigned or executed and so can be anything. This is intended as an escape to permit only certain arguments to be processed, leaving other arguments for processing by a later procedure. More notes: It is the responsibility of the caller to assign default values before calling this procedure. Hyphenated arguments may not have hyphenated strings for values. However, hyphenated arguments may have negative numbers (e.g. -1.2e5) for values. If the value does not parse as an integer or floating point number, it must not begin with hyphen. 
If the token following a hyphenated argument begins with - but is not a number, it is considered to be another hyphenated argument (which may cause the preceding argument to be flagged as having a missing value). Hyphenated argument names must not be numbers (integer or floating point). For example, you may not have "-1" or "-1.2e5" as a hyphenated argument. Hyphenated arguments which do not match any of the templates given raise the error "Invalid argument %s". The identifier matching rule is case insensitive.Return to command index
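A worked sketch combining the three documented forms (the proc name bar and its options are hypothetical):
    proc bar {args} {
        # defaults must be assigned before calling read_arglist
        set height 1.0
        set bottom 0
        set ilist {}
        set regular [read_arglist $args -height height \
            -bottom {set bottom 1} -include {lappend ilist VALUE}]
        puts "height=$height bottom=$bottom includes=$ilist regular=$regular"
    }
    bar a -height 1.5 -bottom -include foo b
    ;# prints: height=1.5 bottom=1 includes=foo regular=a b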
A1.165 read_model
Purpose: Read a parameter value or likelihood from any saved model Usage: read_model <model-name> loglike ; returns loglikelihood read_model <model-name> <parameter> ; returns mle value read_model <model-name> <parameter> -se ; standard error read_model <model-name> <parameter> -lower ; lower bound read_model <model-name> <parameter> -upper ; upper bound read_model <model-name> <parameter> -score ; score Model is read from current maximization output directory (see help outdir). Example: trait q4 read_model null0 h2rReturn to command index
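A script sketch built on read_model, assuming (as is conventional for SOLAR output) that loglikelihoods are natural logarithms, so a LOD is the loglikelihood difference divided by ln 10. The model name "link" is hypothetical; null0 is created by the polygenic command.
    set ll0 [read_model null0 loglike]    ;# polygenic null loglikelihood
    set ll1 [read_model link loglike]     ;# a saved alternative model
    set lod [expr {($ll1 - $ll0) / log(10)}]
    puts "LOD = $lod"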
A1.166 read_output
Purpose: Read variable statistics from maximization output file Usage: read_output <outfile> <varname> [-mean | -min | -max | -std] -mean Get variable mean (default) -min Get variable minimum -max Get variable maximum -std Get variable standard deviation -d 1 if discrete, 0 otherwise Note: If outfile is not full pathname, current trait/outdir is assumed. Statistics pertain to actual sample used in maximization. Example: read_output null1.out q4 -stdReturn to command index
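A script sketch using read_output to apply the standard-deviation check mentioned under the polygenic command (the output file null0.out and trait name q4 are examples, not fixed names):
    set sd [read_output null0.out q4 -std]
    if {$sd < 0.5} {
        puts "Warning: trait std dev $sd is below 0.5 (see polygenic notes)"
    }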
A1.167 register
Purpose: Create registration key file Usage: register <key> Notes: This creates a file ~/.solar_reg containing the key. Do not delete this file. You may copy this file to your home directory on other systems to run SOLAR if the same username is used. (Each key matches only one username.) To obtain a key, please send a request to [email protected], specifying the username(s) under which you will be using the program, the email addresses of the users, and the NIH grant numbers (if any) that apply to the work for which SOLAR may be used. Shortcuts: regi - registerReturn to command index
A1.168 relatives
Purpose: Show relationships of relative pairs included in analysis (having all required variables) Usage: relatives [-meanf] -meanf causes Mean f to be reported relpairs ;# alias for "relatives -meanf" Notes: output is returned (displayed) and also written to file named relatives.out in current trait/outdir. Uses previously stored null0 model in current trait/outdir. Polygenic command should have been run previously to create null0 model. Shortcuts: relat - relativesReturn to command index
A1.169 relpairs
solar::relatives -- Purpose: Show relationships of relative pairs included in analysis (having all required variables) Usage: relatives [-meanf] -meanf causes Mean f to be reported relpairs ;# alias for "relatives -meanf" Notes: output is returned (displayed) and also written to file named relatives.out in current trait/outdir. Uses previously stored null0 model in current trait/outdir. Polygenic command should have been run previously to create null0 model.Return to command index
A1.170 remlist
Purpose: Remove element from list by name Usage: remlist <list> <element> Notes: Input list is not modified, but new list is returned. Only first matching element is removed. This works well with setappend for set-like behavior: use setappend to add elements to "set" and remlist to remove elements from set. Match testing is case insensitive. No error is raised if there is no matching element; input list is returned unchanged. See Also: setappendReturn to command index
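A sketch of the set-like usage described above (the list and element names are arbitrary):
    set users {}
    setappend users alice              ;# add new element
    setappend users bob
    setappend users alice              ;# no effect: already present
    set users [remlist $users bob]     ;# remlist returns the new list
    puts $users                        ;# => alice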
A1.171 remove_global
Purpose: Remove a global variable (so it no longer exists) Usage: remove_global <variable_name> Notes: It is not necessary to declare variable as global first, and there is no error if no such global actually exists. See Also: if_global_existsReturn to command index
A1.172 residual
Purpose: Compute residuals for a maximized model and phenotypes file Usage: residual [solarfile] [-out <residualfile>] [-phen <phenfile>] [-fewest] [-needped] solarfile solar maximization output file which is expected in the current outdir. The default is null0.out which is created by the polygenic command. The default directory is the current outdir, but you may specify a relative pathname to the current directory. EVD2 models must have actual model currently in memory (such as is the case immediately after running polygenic). If the "define" command is used to define the names used for trait(s) or covariate(s), there must be a model with the same rootname in the same directory as the output file. The default is null0.mod. Handling of the "scale" (and "noscale") commands also requires the presence of the model with the same rootname in the same directory as the output file. If this model is not present, residual will finish but print a warning if not inside a script. residualfile new phenotypes file with trait 'residual' (the default is 'residual.out' written to the working directory). phenfile the base phenotypes file; the default is to use the currently loaded phenotypes file(s). -fewest Copy fewest fields to residualfile (this would be ID, FAMID (if required), trait, and residual). The default is to include the above along with all other (unused) variables from the phenfile. -needped Only include individuals in the pedigree file. (By default, all individuals in the phenotypes file would be included, unless there is a covariate including sex, which requires the pedigree file.) Example: solar> automodel pheno atrait solar> polygenic -s solar> residual MOST IMPORTANT! This procedure requires a maximization output file. Unless otherwise specified, the default is assumed to be null0.out which is produced by the "polygenic" command. If this is not what you want, you need to specify the maximization output file. You cannot specify a model file, that is insufficient. Additional Notes: Univariate quantitative trait only. The trait or outdir must be specified first. Must be run in the directory in which pedigree was loaded. FAMID is included if present in both phenotypes and pedigree files. residualfile will be written in comma delimited format. This procedure does not handle hand-written 'mu' equations, only standard covariates. Not applicable to discrete traits. Shortcuts: resi - residualsReturn to command index
A1.173 retext
solar::retextpc -- Purpose: Translate files with Old Mac line terminators to unix terminators Usage: retext <input-file> [<output-file>] ;# for Old Mac files retextpc <input-file> [<output-file>] ;# for PC files (if needed) Notes: retextpc is for converting Dos/Windows files to the unix format SOLAR likes best (however this is not usually needed for csv files from PC's, which SOLAR can generally handle as is). SOLAR now identifies the old style Mac termination when opening phenotypes files. These files MUST be translated because they are incompatible with the C programming library. If you need to run the retext command, you will get this message: File old.txt has unsupported text line terminators Use retext command to fix file before using The full input filename must be given, including ".txt" extension if present. If no output filename is specified, an additional ".tr" extension is added to the input filename. If a file with the .tr extension already exists, it is overwritten.Return to command index
A1.174 retextpc
Purpose: Translate files with Old Mac line terminators to unix terminators Usage: retext <input-file> [<output-file>] ;# for Old Mac files retextpc <input-file> [<output-file>] ;# for PC files (if needed) Notes: retextpc is for converting Dos/Windows files to the unix format SOLAR likes best (however this is not usually needed for csv files from PC's, which SOLAR can generally handle as is). SOLAR now identifies the old style Mac termination when opening phenotypes files. These files MUST be translated because they are incompatible with the C programming library. If you need to run the retext command, you will get this message: File old.txt has unsupported text line terminators Use retext command to fix file before using The full input filename must be given, including ".txt" extension if present. If no output filename is specified, an additional ".tr" extension is added to the input filename. If a file with the .tr extension already exists, it is overwritten.Return to command index
A1.175 round_epedigree
Purpose: Converts empirical pedigree values to the rounded phi2 values that would be produced by a coded pedigree Usage: round_epedigree -i <input empirical pedigree name> -o <modified empirical pedigree name> -relation <max relationship number> Example: round_epedigree -i epedigree.csv -o modified_epedigree.csv -relation 5 The values written to output are determined by finding the closest value in the set of relationship values and 0. The maximum relationship value is 0.5^<max relation>.Return to command index
A1.176 rvi
Purpose: The rvi command calculates the regional vulnerability index for a group of patients and controls that are coded by the "DX" field. The details of the calculation are provided in Kochunov et al. 2019, PMID: 31352812. Usage: rvi --modality <name of modality> --disorder <name of disorder> --out <output file name> In brief, data from controls (DX=0) is used to perform inverse normalization, followed by Z-normalization after correcting for covariates set up by the ordinary covariate command. Typically age and sex are used. The phenotype file must include the standard names used by ENIGMA workflows. The names of the expected phenotypes are provided in the cortical-effects.csv, subcortical-effects.csv and dtifa-effects.csv files in the SOLAR main directory. The RVI can be calculated for three modalities (1st option) named: cortical, subcortical, and dtifa Disorder names include: ssd, mdd, bp, adReturn to command index
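Example (a sketch using the modality and disorder names documented above; the output filename is arbitrary):
    solar> covariate age sex
    solar> rvi --modality cortical --disorder ssd --out rvi_ssd.out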
A1.177 save
Purpose: Save an object of the given type to a file Usage: save <object-type> <arguments> For example: save model <filename> More information is available under "help model" Shortcuts: sav - saveReturn to command index
A1.178 scale
Purpose: scale a covariate variable, or disable default scaling Usage: scale <var> <center> ; scale this variable to this center value noscale <var> ; use 0 as center value disabling default scale ; show all non-default scaling in effect scale <var> ; show scaling for this variable scale default <var> ; return to default scaling for var scale default_all ; return to default for all vars <var> any covariate variable, might be used in interaction too <center> real number Notes: By default, SOLAR adjusts all covariate variables to the sample mean. Using the scale command, you can adjust any covariate variable to any other fixed value, or disable adjustment altogether (by adjusting to zero). The adjustment applies to the variable whether it appears in a simple covariate (such as "age") or an interaction covariate (such as "age*sex") or both. There is currently no way of scaling the trait variable, or scaling any variable by a factor. Those features could be added in a future release. Scaling is saved with the model, and is superseded by whatever scaling is in effect with a new model.Return to command index
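Example session showing the documented forms (the covariate names and the center value 40 are arbitrary):
    solar> covariate age sex
    solar> scale age 40        ;# adjust age to 40 instead of the sample mean
    solar> noscale sex         ;# disable adjustment for sex (center of 0)
    solar> scale               ;# show all non-default scaling in effect
    solar> scale default age   ;# return age to default sample-mean scaling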
A1.179 screencov
Purpose: Perform polygenic analysis with covariate screening Same as 'polygenic -screen' solar::sporadic -- solar::polygenic -- Purpose: Perform polygenic, sporadic, and/or household analysis Calculate H2r, significance of H2r, and proportion of variance contributed by covariates. Optionally performs covariate screening (determine significance level of each covariate). Usage: polygenic [-screen] [-all] [-p | -prob <p>] [-fix <covar>] [-testcovar <covar>] [-testrhoe] [-testrhog] [-testrhoc] [-sporadic] [-keephouse] [-testrhop] [-rhopse] [-fphi] (screencov is an alias for 'polygenic -screen') (sporadic is an alias for 'polygenic -sporadic') Typically before giving this command, you will give trait, covariate, and house (if applicable) commands. You will also load pedigree and phenotypes files if they have not already been loaded. solar> load pedigree ped solar> load phenotypes phen solar> trait hbp solar> covariate age sex age*sex smoke solar> polygenic -screen Alternatively, you may use the "automodel" command first to include all available phenotypes as covariates. See note 2 below and "help automodel". -screen (or -s) Perform covariate screening: Calculate significance level for each covariate, and run only the significant covariates in the final analysis. An inclusive significance threshold of 0.1 is used, but may be changed with the -prob option. Covariates may be locked in regardless of significance with the -fix or -all options. (An alternative method of covariate analysis using bayesian model averaging is available with the command: bayesavg -covariates) -p (or -prob) p is the probability level for keeping covariates as "significant." The default is 0.1. It is set to be generous so that covariates are not removed unnecessarily. (The probability levels for H2r and C2 are fixed at 0.05, however, H2r is never removed from the final model even if it is judged not to be significant, and C2 is only removed from the model if it is zero in the final model and therefore has no effect at all.) -fix (or -f) "fix" (lock in) this particular covariate regardless of significance level. NOTE: a -fix or -f qualifier is required for each covariate to be fixed, for example: -f age -f sex -all (or -a) Keep all covariates in final analysis regardless of significance level. -testcovar <covar> Test the probability of this covariate only. All other covariates are fixed and unscreened. This argument is incompatible with -screen (screen all covariates). The tested covariate is not removed from final model regardless of probability. For -testcovar, the default probability level for declared "significance" is 0.05 (which can be changed with -p option). Also, the reported proportion of variance is for the tested covariate only. -testrhoe (Bivariate only) Test significance of rhoe difference from 0 by running model where rhoe is constrained to 0. The p value is shown in the same line as the RhoE value. -testrhog (Bivariate only) Test significance of rhog differences from zero and from 1 (if positive) or -1 (if negative). Because there may be two p values, they are shown in line(s) below the RhoG result and standard error. -testrhoc (Bivariate Household only) Test significance of rhoc differences from zero and 1 (if positive) and -1 (if negative). Because there may be two p values, they are shown in line(s) below the RhoC result and std. error. -testrhop (Bivariate polygenic only) Test significance of derived estimate of phenotypic correlation differences (difference from 0). 
-rhopse (-testrhop must be specified also) Get standard error of rhop, saved in model file rhop.mod and variable SOLAR_RhoP_SE -sporadic Only evaluate sporadic models, not polygenic. -keephouse Keep "household effect" C2 parameter in final model even if it maximizes to zero in the best polygenic (or sporadic) model. -residinor After maximizing final sporadic model (after covariate testing, if that is done), residualize the final model and inormalize the residual trait. (Warning! The phenotypes file loaded at the end of analysis will be the residual phenotypes file.) -fphi Option to run polygenic using fphi function. Notes: (1) Output is written to directory selected by 'outdir' command, or, if none is selected, to a directory named by the trait. This is called the "maximization output directory." Polygenic results are in file named polygenic.out. Important loglikelihoods and statistical computations are recorded in polygenic.out.logs. If the -sporadic option is selected, the files are sporadic.out and sporadic.out.logs. For univariate models, the residuals are computed and written to a file named polygenic.residuals (or sporadic.residuals), then the statistics of those residuals are written to a file named polygenic.residuals.stats (or sporadic.residuals.stats). If the residual kurtosis is above 0.8, you get a special warning (see note 6 below). You also get a special warning if the trait standard deviation is below 0.5, which is undesirable for numerical reasons. (2) Prior to running polygenic, you should set up the trait and covariates. You may use the trait and covariate commands, or use the "automodel" command. "automodel" selects all variables otherwise unaccounted for in the phenotypes file as candidate covariates, and also sex and the standard interactions with sex and age. (If you are unfamiliar with "automodel" it would be a good idea to examine the covariates afterwards with the covariates command...) (3) Beginning with Version 8.3.1, the phi2.gz matrix is loaded during maximization if phi2 is not already loaded by the user. That was already being done for many kinds of maximization, just not the simplest univariate quantitative. The AutoPhi2 modeltype brings back the original on-the-fly calculation of phi2, if no phi2.gz had been loaded. That is often not what is intended, but sometimes people who had not saved their phi2.gz were relying on it. (4) If household effect (see "house") is in effect when the polygenic command is given, it will be included in the analysis. If the household parameter C2 is 0 in the household polygenic model, it will be removed from the final model regardless of whether "covariate screening" is performed, unless -keephouse is specified. The p value for C2 will be computed (if C2 is nonzero), but the p value will not cause C2 to be removed from the final model. The p value of the C2 parameters is not computed for bivariate models. (5) If any covariates have been constrained by the user, certain tests are not allowed: the determination of total variance due to covariates, or the Kullback-Leibler R squared (done for discrete traits). Also, such covariates are not included in the "screening" if the screening option is selected. (6) If you get the message about Residual Kurtosis being too high because it is above 0.8, there is danger of LOD scores being estimated too high in a subsequent linkage analysis. You should start over using either tdist or lodadj or inormal (see documentation) to protect against this. 
If you are already using tdist or lodadj, you may ignore this warning, but it would be fair to report both the Residual Kurtosis and the method you are using to deal with it. We most strongly recommend inormal, which in conjunction with the define command creates an inverse normalized transformation of your trait(s). If there are no covariates, the Kurtosis is computed from the trait itself, and no "residuals" are computed. The same warning threshold applies. We define Kurtosis as 0 for a standard normal distribution; 3 has already been subtracted from the normalized 4th central moment. (7) The polygenic command only supports our "standard" parameterizations. If you would like to use the esd,gsd,qsd parameterization, use the polygsd command (see "help polygsd" for more information) instead. (8) For bivariate polygenic models only, a derived estimate of RhoP, the phenotypic correlation, is displayed on terminal and written to polygenic.out. This estimate is computed from the h2r's, rhog, and rhoe according to the following formula: sqrt(h2r(ti))*sqrt(h2r(tj))*rhog + sqrt(1-h2r(ti))*sqrt(1-h2r(tj))*rhoe To determine the significance of RhoP by comparing models with a rhop parameter and a rhop parameter constrained to zero, use the -testrhop option. Additional models rhop.mod and rhop0.mod are written to the output directory. (9) The polygenic command creates global variables which may be accessed later (which is often useful in scripts). The variables are: SOLAR_Individuals number of individuals included in sample SOLAR_H2r_P p value for h2r SOLAR_Kurtosis residual trait kurtosis SOLAR_Covlist_P list of p values for covariates SOLAR_Covlist_Chi list of chi values for covariates SOLAR_RhoP derived estimate of phenotypic correlation for bivariate polygenic models, {} if not calculated SOLAR_RhoP_P -testrhop sets this to p value of rhop being nonzero SOLAR_RhoP_SE -rhopse sets this to se value of rhop SOLAR_RhoP_OK -testrhop sets this if likelihood of rhop parameterized model matches polygenic, as it should The covariate lists are created only if the -screen option is used. All screened variables are included, regardless of whether they were retained in the final model. Before you can access any of these variables in a script, you must use a "global" command. For example: global SOLAR_Kurtosis if {$SOLAR_Kurtosis > 4} {puts "Very bad kurtosis!"} (10) The default is for the standard error option to be turned on (and temporarily off, when desirable for certain tests). However, if you turn the standard error option off before starting polygenic, it will remain off. Shortcuts: scree - screencovReturn to command index
A1.180 selectfields
Purpose: Select fields (columns) from data file(s) and copy to a new file Usage: selectfields [-noid] [<infile>]* [.] [-np] [<field-name>]* [-o <outfile>] [-sample] [-list filename] [-noid] An optional period (aka dot) ends the list of filenames and starts the list of field names. If there is no dot, the first argument is assumed to be the one and only data filename. The currently loaded phenotypes files are automatically included at the end of the list of files. If nothing precedes the dot, only the phenotypes files are used. Fields found in multiple files default to the first file in which they are found, however a warning is given when this happens. The -np argument forces the loaded phenotypes files to be ignored. The -sample argument forces only the inclusion of individuals having all field values defined. Otherwise, a record is written for every ID encountered in the file(s) from which data is read, however one or more data value(s) might be blank. -list filename Use all the field names in this file, listed one per line. These are appended to the list of field names given in the command line, if any. If the -noid switch is given, the old version of selectfields is used. This takes one and only one <infile> followed by a list of fieldnames, with no dot in between. The only other option allowed is -o. No ID field is required in the input file, and no ID field is written unless included in the list of fieldnames. The loaded phenotypes file is not used unless that is the one file named. If not specified, <outfile> defaults to selectfields.out <field-names> follow rules for phenotypes files and are also affected by field command specifications. For example, if you specify "ID" as field name, this would also match a field name "EGO" in the file. Input file may be either PEDSYS or Comma Delimited format. Output file is comma delimited. Example: selectfields phen.dat out.dat ID AGE -o age.datReturn to command index
A1.181 selectrecords
Purpose: Select records from a file and copy them to a new file Usage: selectrecords <infile> [<outfile>] [{<conditions>}]* If not specified, <outfile> defaults to selectrecords.out Each <condition> is a Tcl conditional expression which includes field names in the file preceded by dollar sign $. Field names are case insensitive (you need not match capitalization used in file itself). Each condition must be enclosed in curly braces and spaced from other conditions if any. Conditions may also include actual Tcl variables, preceded by $$. Tcl variables are Case Sensitive. Simple examples are shown, but any valid Tcl expression operators and functions may be used, and expressions may be arbitrarily complex...they are evaluated by the Tcl expression parser, with the exception of special pre-substitution of $$ variables. Internally, $$ variables are upvar'd into local variables having a leading capital S. If a condition includes a non-existent field, it will never be satisfied, producing an empty result file. (In future, error detection may be added.) If a condition includes an undefined $$ tcl variable, an error will result. Input file may be either PEDSYS or Comma Delimited format. Output file is comma delimited. If the first condition does not include any dollar signs, it must include spaces (for example, {1 == 1}). No such requirement is made for subsequent conditions. It seems pointless to have condition without dollar signs anyway; if no condition is given you get all records (the "null condition" is always true). Example: selectrecords phen.dat out.dat {$bmi > 0.3} {$famid == 10} for {set F 1} {$F < 100} {incr F} { selectrecords phen.dat out$F.dat {$bmi > 0.3} {$famid == $$F} } Note: Records are "selected" when they match ALL conditions given (unless condition includes a non-existent field or has other error).Return to command index
A1.182 setappend
Purpose: Append only new elements to a list (keeping it like a set) Usage: setappend <listname> element Note: The list is identified by name, and may be modified, as with lappend. Example: set friends "John Tom Laura Jeff" setappend friends Harald See Also: remlistReturn to command index
A1.183 setxor
Purpose: Perform exclusive-or (xor) on two sets (Tcl lists) Usage: setxor aset bset Note: If element appears multiple times in one list, but not in other, it will appear multiple times in output.Return to command index
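Example (a minimal sketch; the lists are arbitrary and element order in the result may vary):
    set a {id famid age}
    set b {id famid sex}
    puts [setxor $a $b]    ;# elements in exactly one of the two lists: age and sex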
A1.184 shortcut
Purpose: Show command shortcuts legal in scripts Usage: shortcut <command> Shortcuts: shortc - shortcutsReturn to command index
A1.185 showproc
Purpose: Show SOLAR procedure or write to a file Usage: showproc <procname> [<filename>] If <filename> is not specified, procedure is displayed on terminal using the 'more' pager. If <filename> is specified, renamed proc is written to that file. This procedure will show any SOLAR procedure (script), whether built-in or user-defined. Some, but not all, built-in SOLAR commands are implemented as scripts, and can be shown by this command. Other SOLAR commands are implemented in C++ and FORTRAN, and cannot be shown by this command. User-defined scripts must be used once before they can be shown. The formatting shown by showproc may not be as pretty as it actually is in the source file because it will concatenate lines which are extended using backslash. showproc is based on the Tcl command "info body" which has this "feature." To protect built-in procedures from being accidentally superseded through the use of this command, the procedure name is suffixed with ".copy". If you choose to edit the script, IT IS RECOMMENDED THAT YOU DO NOT RENAME IT TO HAVE THE SAME NAME AS THE ORIGINAL PROCEDURE UNLESS YOU REALLY KNOW WHAT YOU ARE DOING. If you do that anyway, it would probably be ignored. SOLAR prevents you from overriding built-in procedures by putting the directory containing the active solar.tcl file to the front of the auto-load list. Normally, that directory is the SOLAR_BIN directory defined when SOLAR starts up. Even if you did have a copy of the solar.tcl file in your working directory when SOLAR started up, procedures might be resolved either to the solar.tcl file or to separate script files in your working directory, depending on which appears earlier in an alphabetical list. Before new procedures can be used in SOLAR you must restart SOLAR or give the newtcl command.Return to command index
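Example (a sketch; it assumes residual is one of the built-in commands implemented as a Tcl script, and residual.tcl is an arbitrary output filename):
    solar> showproc residual                 ;# page through the script with 'more'
    solar> showproc residual residual.tcl    ;# write copy (renamed residual.copy) to file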
A1.186 siminf
Purpose: Simulate a fully-informative marker and compute its IBDs Usage: siminf -out <markerfile> -ibd <ibdfile> -out Write simulated marker genotypes to this filename. The default is 'siminf.out' in the current working directory. -ibd Write marker-specific IBDs for the simulated marker to this filename. The default is 'ibd.siminf' in the current working directory. The file will be gzipped. Shortcuts: simin - siminfReturn to command index
A1.187 simqtl
Purpose: Simulate a QTL and (optionally) a linked marker Usage: simqtl [-seed <seed>] [-inform] [-gfile <genotype_file>] simqtl -model simqtl -freq {<f_1> ...} [-mfreq {<f_1> ...} [-theta <theta>]] [-ntrt <#traits>] -mean {{<m1_AA> <m1_Aa> <m1_aa>} ...} -sdev {<sd_1> ...} [-cov {<cov1> ...}] [-beta {{<b1_AA> <b1_Aa> <b1_aa>} ...}] [-cmean {<cov1_mean> ...}] [-mage {<mean_age>}] [-rhog {<gen_corr_2,1> <gen_corr_3,1> <gen_corr_3,2> ...}] [-rhoe {<env_corr_2,1> <env_corr_3,1> <env_corr_3,2> ...}] [-h2r {<h2r_1> ...}] simqtl -nloc <#QTLs> -nall {<nall_1> ...} [-ntrt <#traits>] -mean {{<m1_AA> <m1_Aa> <m1_aa>} ...} -sdev {<sd_1> ...} [-cov {<cov1> ...}] [-beta {{<b1_AA> <b1_Aa> <b1_aa>} ...}] [-cmean {<cov1_mean> ...}] [-mage {<mean_age>}] [-rhog {<gen_corr_2,1> <gen_corr_3,1> <gen_corr_3,2> ...}] [-rhoe {<env_corr_2,1> <env_corr_3,1> <env_corr_3,2> ...}] [-h2r {<h2r_1> ...}] There are two steps in the simulation process: (1) specifying the simulation model, and (2) running the simulation. The first form of the command shown is used to run the simulation, and takes the following optional arguments: -seed An integer seed for the random number generator. -inform If this argument is given, the simulated marker genotypes will be fully informative. -gfile For models of the second type described below, QTL genotypes are read from a file rather than simulated. This argument specifies the name of this file. The simulated trait values are written to the file "simqtl.phn". A simulated trait value will not be assigned to any individual who has an unknown age, or who is missing data for any other covariate specified in the simulation model. If QTL genotypes are simulated, they will be written to the file "simqtl.qtl". If the model includes a linked marker, the simulated marker genotypes are written to "simqtl.mrk". Two additional files are created and used by this command: "simqtl.dat" and "simqtl.par", which contain pedigree/covariate data and the model parameters, respectively. All of these files are created in the current working directory. If QTL genotypes are read from a file, that file must contain an ID field, a FAMID field (if required - see the documentation for marker files), and, for each QTL, a field containing the QTL genotype. Unlike SOLAR marker genotypes in general, the QTL genotypes must have integer alleles numbered consecutively beginning with 1. Also, if there are multiple QTLs, the position of the alleles is significant. For example, the QTL genotypes 1/3 and 2/1 are combined to form the two-locus genotype 12/31, while genotypes 3/1 and 2/1 yield the two-locus genotype 32/11. The second form of the command displays the simulation model parameters. The remaining forms of the command are for the two types of simulation model that may be specified. In the first model, a single QTL and, optionally, a single linked marker are simulated. One or more correlated quantitative traits will be generated, along with a polygenic background. Adjustments may be made to the trait means for covariate effects. The covariates sex and age are always included although no adjustments need be specified for these covariates. The sex field is a required SOLAR field and so it is guaranteed to be available. The age field is taken from the phenotypes file, if one has been loaded. The name of the age field must be "AGE". It is not an error if there is no age field in the phenotypes file. 
The model will still contain age correction terms (which should be set to zero), but obviously no adjustment to the trait mean involving age can be made. If adjustments to the mean are made for sex and/or age, then betas must be specified for each of 5 effects: sex, male age, female age, male age^2, and female age^2, in that order. The parameters for this model are: -freq The frequency of QTL alleles 1, 2, ..., N-1 where the QTL has N alleles. -mfreq The frequency of marker alleles 1, 2, ..., N-1 where the marker has N alleles. -theta The recombination fraction between the QTL and the marker. The default value is 0, i.e. the QTL and marker are fully linked. -ntrt The number of traits controlled by the QTL. The default value is 1. -mean For each trait, a list of genotypic means. Genotypes are ordered as follows: 1/1, 2/1, 2/2, 3/1, 3/2, ... That is, the mean for genotype i/j, i >= j, is the k-th element in the list, where k is given by k = i*(i - 1)/2 + j Because phase is not considered, genotypes i/j and j/i are the same. -sdev For each trait, the within-genotype phenotypic standard deviation. -cov A list of covariates, in addition to sex and age, for which adjustments to the trait mean(s) will be made. -beta For each trait, a set of lists, one for each covariate including sex and age, of genotype-specific adjustments to the trait mean. Genotype order is the same as for genotypic means. If no betas are specified, they will all default to 0, i.e. no covariate effects. As noted above, sex and age together require 5 betas for each genotype. If the betas for a particular trait and covariate are not genotype-specific, the corresponding list can be shortened to a single value; this value will be used for every genotype. -cmean For each covariate other than sex and age, a mean value to subtract from the covariate before applying a covariate correction to the trait means. -mage A mean age to be subtracted before applying the age correction to the trait means. -rhog For each pair of traits, the genetic correlation between those two traits. If there are N traits, the order of the pairs is (2,1), (3,1), (3,2), ..., (N,1), (N,2), ..., (N,N-1). The default is no genetic correlation. -rhoe For each pair of traits, the environmental correlation between those two traits. The default is no environmental correlation. -h2r For each trait, the residual heritability expressed as the fraction of trait variance after the QTL effect has been accounted for. The default is no residual heritability. In the second type of model, there may be multiple QTLs. The QTL genotypes are read from a file rather than simulated. Parameters unique to this model are: -nloc The number of QTLs. -nall A list of the number of alleles at each QTL. The remaining parameters are the same as in the first model type. The order of multi-locus genotypes is analogous to the single locus case. The multi-locus genotype i/j, i >= j, is in the k-th position, where k is given by k = i*(i - 1)/2 + j The i and j refer to the i-th and j-th multi-locus haplotypes. Haplotypes are ordered so that the alleles of the last locus vary the fastest, while the alleles of the first locus vary the slowest. For example, given three bi-allelic loci, the order of the three-locus haplotypes is 111, 112, 121, 122, 211, 212, 221, 222 The order of the three-locus genotypes is then 111/111, 112/111, 112/112, 121/111, 121/112, 121/121, ... Examples: 1. Simulate a QTL with two alleles, call them A and a, where the frequency of allele A is 0.7. 
A single trait will be simulated for which the mean of genotype AA is 90, the mean of genotype Aa is 100, and the mean of genotype aa is 120. The trait will have a within-genotype standard deviation of 10, and a residual heritability of 0.3. A marker with 5 alleles of equal frequency will be generated which has a recombination fraction of 0.05 with the QTL. The required commands are shown below - one to create the simulation model, another to actually perform the simulation. The first command has been broken into two lines to avoid line-wrapping, but must actually be entered as a single line. solar> simqtl -freq .7 -mfreq {.2 .2 .2 .2} -theta .05 -mean {90 100 120} -sdev 10 -h2r .3 solar> simqtl 2. Simulate a QTL with 3 alleles; the allele frequencies are 0.5, 0.3, and 0.2. There is no linked marker. There are two traits associated with this QTL. Sex and age have an effect on the first trait; there is no sex-by-age interaction or second-order age effect. The traits are correlated both genetically and environmentally. A mean population age of 40 is subtracted prior to the age correction to the mean of the first trait. When the simulation is run, the random number generator is seeded with the integer value 12345. solar> simqtl -freq {.5 .3} -ntrt 2 -mean {{10 15 12 20 18 30} {50 55 60 60 55 80}} -sdev {2.5 10} -h2r {.2 .65} -beta {{{-1.2 .1 -.5 1.4 2 -.5} {2.4 3 1.6 -4 0 -1} {2.4 3 1.6 -4 0 -1} {0} {0}} {{0} {0} {0} {0} {0}}} -mage 40 -rhog .7 -rhoe .4 solar> simqtl -seed 12345 3. Simulate a quantitative trait controlled by two QTLs. The first QTL has 3 alleles, and the second QTL has 2 alleles. There are 6 two-locus haplotypes, so we have a total of 21 two-locus genotypes in the order 11/11, 12/11, 12/12, 21/11, 21/12, 21/21, 22/11, 22/12, 22/21, 22/22, 31/11, 31/12, 31/21, 31/22, 31/31, 32/11, 32/12, 32/21, 32/22, 32/31, 32/32 When the simulation is run, the two-locus genotypes are read from the file "2locgtyp". solar> simqtl -nloc 2 -nall {3 2} -mean {26 31 36 28 33 30 33 38 35 40 31 36 33 38 36 36 41 38 43 41 46} -sdev 2.8 solar> simqtl -gfile 2locgtyp Shortcuts: simqt - simqtlReturn to command index
A1.188 slod
Purpose: Calculate slod (score-based LOD equivalent) on current model Usage: slod newparam Example: slod h2q1 Shortcuts: slod - slodReturn to command index
A1.189 snp
Purpose: Process SNP data. Usage: load snp [-xlinked] <genofile> [<locfile>] ; loads SNP data snp show ; displays summary of SNP data snp covar [-nohaplos] [-impute] ; prepare genotype covariates file snp qtld ; prepare QTLD covariates file snp ld [-window <window>] [-overwrite] [-plot [-absrho] [-file <psfile>] [-title <title>] [-date] [-gray]] ; compute linkage disequilibrium snp effnum [<method>] [-alpha <alpha>] ; use <method> to compute the effective number of SNPs snp unload ; unload SNP data SNP genotype data is treated as a special case of marker data. The file containing SNP genotypes must be in the same format as any SOLAR marker data file, and the SOLAR 'marker' and 'freq' commands can be used to process the SNP genotype data. However, the following restriction applies to SNP genotype data: there must be exactly two allelic types present in the data for each SNP. If a SNP has only a single allele, i.e. the SNP is not polymorphic, it will be loaded but cannot be used for further analysis. If a SNP with more than two alleles is encountered, the 'load snp' command will fail. After a successful load, a file named 'snp.typed' is created, which contains a field, named nGTypes, giving the number of SNPs genotyped for each pedigree member. This field is empty for untyped individuals. The locations of the SNPs can be read from a standard SOLAR map file by the 'load snp' command. Each location must be given as an integer number of basepairs. Only those SNPs that appear in the map file will be included in SNP processing commands. While it is not necessary that every SNP in the marker file be present in the map, it is required that genotype data be loaded for all SNPs in the map file. If the user does not specify a map file in the 'load snp' command but a map file has already been loaded, that map file will be used. Otherwise, a dummy map file, named 'snp.map', will be created in which consecutive basepair locations have been assigned to the SNPs, starting at zero. The 'snp show' command displays a summary of the SNP genotype data. The contents of this display are also saved in the file 'snp.show.txt'. The information displayed includes SNP name, location, number of typed individuals, and allele frequencies. The frequency information is ordered so that the common (higher frequency) allele is listed first. If available, the standard error of the allele frequency estimates and the p-value for a test of Hardy-Weinberg equilibrium are also displayed. The allele frequency standard errors are computed by the 'freq mle' command. The HWE test is performed when the '-hwe' option is included in the 'freq mle' command. The 'snp covar' command produces a file, 'snp.genocov', in which the SNP genotypes have been recoded to be suitable for use as covariates in a SOLAR measured genotype, QTLD, or Bayesian QTN analysis. This file includes a field for each SNP, named snp_<name>, where <name> is the SNP name. Genotypes are coded as the number of copies of the rarer allele: 0, 1, or 2. The 'snp covar' command can be run in one of three different ways. By default, SNP haplotypes generated by a haplotype estimation procedure, e.g. SimWalk2, are used to impute as many missing genotypes as possible. The haplotypes are read from the file 'snp.haplotypes', which is created from the output of the haplotype estimation procedure using the 'snphap' command. 
If SNP haplotypes are not available, the '-nohaplos' option can be given to the 'snp covar' command, in which case the covariates are generated solely from the genotype data. The third method for generating covariates, invoked with the '-impute' option, is to extend the genotype imputation of the default method. In this case, a missing genotype which cannot be assigned exactly from the haplotypes is imputed from the weighted average of all haplotypes which are consistent with that individual's genotype and estimated haplotype data, where the weights are the estimated haplotype frequencies. These frequencies are read from the file 'snp.haplofreqs', which is created by the 'snphap' command. This method of generating the covariates ensures that each individual has complete data, i.e. there are no missing covariates. Because covariates will have been imputed for all pedigree members, whether they were genotyped or not, it may be desirable to include in an analysis only those individuals for whom at least some minimum number of SNPs were typed. This can be done by selecting on the nGTypes field, which is taken from the file 'snp.typed' and automatically joined to the covariates file. In addition to the file 'snp.genocov', the 'snp covar' command creates a file 'snp.geno-list', which contains the names of the covariates, one per line. This file can be used to specify the covariates to be included in various association analyses, e.g. the 'bayesavg' command. The 'snp qtld' command generates another type of covariates file, in this case the covariates required for a SOLAR QTLD analysis. This file, which is named 'snp.qtldcov', contains four covariates for each SNP: b_<name>, w_<name>, b2_<name>, and w2_<name>, where <name> is the name of the SNP. As with the genotypes covariates file, the nGTypes field from the file 'snp.typed' is automatically included and can be used to exclude untyped individuals from an analysis. The 'snp ld' command computes the pairwise correlation among the SNP genotypes. This measure can be used to identify those SNPs which are in linkage disequilibrium. The signed pairwise correlations are written to the file 'snp.ld.dat'. If 'snp.ld.dat' already exists, the correlations are not recomputed unless the '-overwrite' option is specified. The '-window' option limits the number of pairwise correlations that will be computed by the 'snp ld' command. Only the pairs of SNPs separated by no more than the number of basepairs in the window will be considered. A map file containing basepair locations must be loaded in order to use this option. When the '-plot' option is added to the 'snp ld' command, a PostScript LD plot will be displayed. If the file 'snp.ld.dat' already exists, the genotypic correlations are not recomputed. The plot is saved in the file specified by the '-file' option, or, by default, in the file 'snp.ld.ps'. The LD measure shown in the plot is the square of the genotypic correlation (rho^2), or, if the '-absrho' option is specified, the absolute value of the correlation. A plot title, enclosed in quotes, can be supplied with the '-title' option. The '-date' option adds a date stamp to the plot. The '-gray' option produces a gray-scale version of the plot. The 'snp effnum' command uses the specified method to estimate the effective number of SNPs based on the pairwise genotypic correlations. 
This is an estimate of the number of independent tests represented by these SNPs, which can be used to determine an appropriate significance level for an association analysis utilizing these SNP data. Currently, the following methods are implemented: mosk Moskvina & Schmidt (default) liji Li & Ji For example: snp effnum liji The method of Moskvina & Schmidt is the more conservative of the two and is the default. The Li & Ji method entails computing the eigenvalues of the genotypic correlation matrix. The number of SNPs cannot exceed the number of genotyped individuals (i.e., the number of rows in the correlation matrix) or the matrix will be singular. The 'snp effnum' command also returns the p-value required for a result to be considered statistically significant after the correction for multiple testing is applied. This p-value is a function of the effective number of SNPs and the experiment-wide significance threshold (target alpha). The '-alpha' option specifies the target alpha (default value = .05).Return to command index
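A sketch of the overall workflow documented above (the genotype and map filenames are hypothetical; each subcommand is as described in this entry):
    solar> load pedigree ped
    solar> load snp snp.genotypes snp.map
    solar> snp show                      ;# summary, also saved in snp.show.txt
    solar> snp covar -nohaplos           ;# covariates from genotype data only
    solar> snp ld -window 500000 -plot   ;# pairwise LD within 500 kb, with plot
    solar> snp effnum liji               ;# effective number of SNPs (Li & Ji)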
A1.190 snphap
Purpose: Compute SNP haplotypes and haplotype frequencies. Usage: snphap show ; displays summary of SNP haplotypes snphap prep <program> ; prepares input files needed to compute ; SNP haplotypes using <program>, where ; <program> is simwalk (sw) or merlin snphap import <program> [-file <filename>] [-overwrite] ; imports SNP haplotypes from an output ; file computed by <program>, where ; <program> is simwalk (sw) or merlin snphap freq prep ; prepares input file needed to compute ; SNP haplotype frequencies using the ; program snphap snphap freq import [-file <filename>] [-overwrite] ; imports SNP haplotype frequencies ; from an output file generated by the ; program snphap snphap count ; computes SNP haplotype frequencies by ; counting haplotypes present in data snphap covar ; prepare haplotype covariates file The 'snphap' command assumes that the 'load snp' command has been used to load SNP genotype data. The main purpose of this command is to prepare the SNP haplotypes and haplotype frequencies files used by the 'snp covar' command. SOLAR does not do haplotyping or haplotype frequency estimation itself, but rather relies on these functions being provided by external programs. The 'snphap prep' command is used to generate the input files for a haplotyping run using either SimWalk2 or Merlin. The output file created by the haplotyping procedure is then imported into SOLAR with the 'snphap import' command, which creates the file 'snp.haplotypes'. The '-overwrite' option guards against the unintentional replacement of an existing haplotypes file. Haplotype frequencies can be generated in two ways. The program SNPHAP can be used to compute frequency estimates using an EM algorithm with the assumption that all individuals are unrelated. The 'snphap freq prep' command prepares the input file required by SNPHAP. The 'snphap freq import' command processes the SNPHAP output to create the file 'snp.haplofreqs'. Alternatively, the 'snphap count' command generates the haplotype frequencies file by simply counting the haplotypes present in the haplotypes file 'snp.haplotypes'. The haplotype frequencies file is sorted in descending order of frequency, so that the most common haplotype appears first. The 'snphap covar' command generates a haplotype-based covariates file suitable for use in a SOLAR analysis. This file, which is named 'snp.haplocov', includes a covariate field for each of the haplotypes present in the file 'snp.haplotypes'. These fields are named hap_<hap#> where <hap#> is the position of the corresponding haplotype in the file 'snp.haplofreqs'. For example, the field corresponding to the most frequent haplotype is named hap_1 and the value in this field is the number of copies of this haplotype that an individual possesses. Covariates are defined only for those individuals with two complete haplotypes in 'snp.haplotypes'; only these individuals are included in the haplotype covariates file.Return to command index
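A sketch of the haplotype workflow documented above (it assumes SNP data are already loaded and that the external Merlin haplotyping run happens outside SOLAR):
    solar> snphap prep merlin     ;# write input files for Merlin
    (run Merlin haplotyping externally, producing its output file)
    solar> snphap import merlin   ;# creates snp.haplotypes
    solar> snphap count           ;# frequencies by counting, creates snp.haplofreqs
    solar> snphap covar           ;# writes snp.haplocov
    solar> snp covar -impute      ;# genotype covariates with full imputation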
A1.191 solarfile
Purpose: Read data file applying "field" name mapping Usage: Same as tablefile (see) but using "solarfile" command name, plus: solarfile $tablenum establish_name <generic-name> establish_name returns the actual field name applied or mapped to the generic name. For example, the generic-name "id" might actually be "ego" in the file, or a name mapped to "id" using the field command. The generic-names are listed by the field command. Notes: Intended for use in scripts. This command extends the "tablefile" command by allowing for user-supplied field name mapping using the "field" command. This also supports both sets of default field names for the basic identifiers: id, fa, mo and ego, sire, dam.
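A small script sketch (phenotypes filename hypothetical), showing how establish_name resolves a generic name to the actual field name used in the file:
    set tablenum [solarfile open phen.csv]
    set idfield [solarfile $tablenum establish_name id]
    # idfield might be "ego" if that is the name used in the file
    solarfile $tablenum close
Return to command index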
A1.192 solarmodel
Purpose: Check SOLAR version compatibility of model solarmodel appears at the top of all new model files and identifies the model version. If the version is incompatible with the current version, an error message is displayed. solartcl appears at the top of all upgraded script files. SOLAR programmers are encouraged to use solartcl as well. To upgrade solar models, use the "upgrade" command. Shortcuts: solarm - solarmodelReturn to command index
A1.193 solartcl
Purpose: Check SOLAR version compatibility of tcl file solar::solarmodel -- Purpose: Check SOLAR version compatibility of model solarmodel appears at the top of all new model files and identifies the model version. If the version is incompatible with the current version, an error message is displayed. solartcl appears at the top of all upgraded script files. SOLAR programmers are encouraged to use solartcl as well. To upgrade solar models, use the "upgrade" command. Shortcuts: solart - solartclReturn to command index
A1.194 solarversion
Purpose: Display the SOLAR program version Usage: solarversion Shortcuts: solarv - solarversionReturn to command index
A1.195 split_class_file
Purpose: Splits a csv file based on a class column Usage: split_class_file <csv file name> Example: split_class_file pheno.csv pheno.csv is split into 0_pheno.csv, 1_pheno.csv, ..., i_pheno.csvReturn to command index
A1.196 sporadic
solar::polygenic -- Purpose: Perform polygenic, sporadic, and/or household analysis Calculate H2r, significance of H2r, and proportion of variance contributed by covariates. Optionally performs covariate screening (determine significance level of each covariate). Usage: polygenic [-screen] [-all] [-p | -prob <p>] [-fix <covar>] [-testcovar <covar>] [-testrhoe] [-testrhog] [-testrhoc] [-sporadic] [-keephouse] [-testrhop] [-rhopse] [-fphi] (screencov is an alias for 'polygenic -screen') (sporadic is an alias for 'polygenic -sporadic') Typically before giving this command, you will give trait, covariate, and house (if applicable) commands. You will also load pedigree and phenotypes files if they have not already been loaded. solar> load pedigree ped solar> load phenotypes phen solar> trait hbp solar> covariate age sex age*sex smoke solar> polygenic -screen Alternatively, you may use the "automodel" command first to include all available phenotypes as covariates. See note 2 below and "help automodel". -screen (or -s) Perform covariate screening: Calculate significance level for each covariate, and run only the significant covariates in the final analysis. An inclusive significance threshold of 0.1 is used, but may be changed with the -prob option. Covariates may be locked in regardless of significance with the -fix or -all options. (An alternative method of covariate analysis using Bayesian model averaging is available with the command: bayesavg -covariates) -p (or -prob) p is the probability level for keeping covariates as "significant." The default is 0.1. It is set to be generous so that covariates are not removed unnecessarily. (The probability levels for H2r and C2 are fixed at 0.05; however, H2r is never removed from the final model even if it is judged not significant, and C2 is only removed from the model if it is zero in the final model and therefore has no effect at all.) -fix (or -f) "fix" (lock in) this particular covariate regardless of significance level. NOTE: a -fix or -f qualifier is required for each covariate to be fixed, for example: -f age -f sex -all (or -a) Keep all covariates in final analysis regardless of significance level. -testcovar <covar> Test the probability of this covariate only. All other covariates are fixed and unscreened. This argument is incompatible with -screen (screen all covariates). The tested covariate is not removed from the final model regardless of probability. For -testcovar, the default probability level for declared "significance" is 0.05 (which can be changed with the -p option). Also, the reported proportion of variance is for the tested covariate only. -testrhoe (Bivariate only) Test significance of rhoe difference from 0 by running a model where rhoe is constrained to 0. The p value is shown on the same line as the RhoE value. -testrhog (Bivariate only) Test significance of rhog differences from zero and from 1 (if positive) or -1 (if negative). Because there may be two p values, they are shown in line(s) below the RhoG result and standard error. -testrhoc (Bivariate Household only) Test significance of rhoc differences from zero and 1 (if positive) and -1 (if negative). Because there may be two p values, they are shown in line(s) below the RhoC result and std. error. -testrhop (Bivariate polygenic only) Test significance of the derived estimate of phenotypic correlation (difference from 0). 
-rhopse (-testrhop must be specified also) Get standard error of rhop, saved in model file rhop.mod and variable SOLAR_RhoP_SE -sporadic Only evaluate sporadic models, not polygenic. -keephouse Keep "household effect" C2 parameter in final model even if it maximizes to zero in the best polygenic (or sporadic) model. -residinor After maximizing the final sporadic model (after covariate testing, if that is done), residualize the final model and inormalize the residual trait. (Warning! The phenotypes file loaded at the end of analysis will be the residual phenotypes file.) -fphi Option to run polygenic using the fphi function. Notes: (1) Output is written to the directory selected by the 'outdir' command, or, if none is selected, to a directory named by the trait. This is called the "maximization output directory." Polygenic results are in a file named polygenic.out. Important loglikelihoods and statistical computations are recorded in polygenic.out.logs. If the -sporadic option is selected, the files are sporadic.out and sporadic.out.logs. For univariate models, the residuals are computed and written to a file named polygenic.residuals (or sporadic.residuals), then the statistics of those residuals are written to a file named polygenic.residuals.stats (or sporadic.residuals.stats). If the residual kurtosis is above 0.8, you get a special warning (see note 6 below). You also get a special warning if the trait standard deviation is below 0.5, which is undesirable for numerical reasons. (2) Prior to running polygenic, you should set up the trait and covariates. You may use the trait and covariate commands, or use the "automodel" command. "automodel" selects all variables otherwise unaccounted for in the phenotypes file as candidate covariates, and also sex and the standard interactions with sex and age. (If you are unfamiliar with "automodel" it would be a good idea to examine the covariates afterwards with the covariates command...) (3) Beginning with Version 8.3.1, the phi2.gz matrix is loaded during maximization if phi2 is not already loaded by the user. That was already being done for many kinds of maximization, just not the simplest univariate quantitative. The AutoPhi2 modeltype brings back the original on-the-fly calculation of phi2 if no phi2.gz has been loaded. That is often not what is intended, but sometimes people who had not saved their phi2.gz were relying on it. (4) If a household effect (see "house") is in effect when the polygenic command is given, it will be included in the analysis. If the household parameter C2 is 0 in the household polygenic model, it will be removed from the final model regardless of whether "covariate screening" is performed, unless -keephouse is specified. The p value for C2 will be computed (if C2 is nonzero), but the p value will not cause C2 to be removed from the final model. The p value of the C2 parameter is not computed for bivariate models. (5) If any covariates have been constrained by the user, certain tests are not allowed: the determination of total variance due to covariates, or the Kullback-Leibler R squared (done for discrete traits). Also, such covariates are not included in the "screening" if the screening option is selected. (6) If you get the message about Residual Kurtosis being too high because it is above 0.8, there is danger of LOD scores being estimated too high in a subsequent linkage analysis. You should start over using either tdist or lodadj or inormal (see documentation) to protect against this. 
If you are already using tdist or lodadj, you may ignore this warning, but it would be fair to report both the Residual Kurtosis and the method you are using to deal with it. We most strongly recommend inormal, which in conjunction with the define command creates an inverse normalized transformation of your trait(s). If there are no covariates, the Kurtosis is computed from the trait itself, and no "residuals" are computed. The same warning threshold applies. We define Kurtosis as 0 for a standard normal distribution; 3 has already been subtracted from the normalized 4th central moment. (7) The polygenic command only supports our "standard" parameterizations. If you would like to use the esd,gsd,qsd parameterization, use the polygsd command (see "help polygsd" for more information) instead. (8) For bivariate polygenic models only, a derived estimate of RhoP, the phenotypic correlation, is displayed on the terminal and written to polygenic.out. This estimate is computed from the h2r's, rhog, and rhoe according to the following formula: sqrt(h2r(ti))*sqrt(h2r(tj))*rhog + sqrt(1-h2r(ti))*sqrt(1-h2r(tj))*rhoe To determine the significance of RhoP by comparing models with a rhop parameter and a rhop parameter constrained to zero, use the -testrhop option. Additional models rhop.mod and rhop0.mod are written to the output directory. (9) The polygenic command creates global variables which may be accessed later (which is often useful in scripts). The variables are: SOLAR_Individuals number of individuals included in sample SOLAR_H2r_P p value for h2r SOLAR_Kurtosis residual trait kurtosis SOLAR_Covlist_P list of p values for covariates SOLAR_Covlist_Chi list of chi values for covariates SOLAR_RhoP derived estimate of phenotypic correlation for bivariate polygenic models, {} if not calculated SOLAR_RhoP_P -testrhop sets this to p value of rhop being nonzero SOLAR_RhoP_SE -rhopse sets this to se value of rhop SOLAR_RhoP_OK -testrhop sets this if likelihood of rhop parameterized model matches polygenic, as it should The covariate lists are created only if the -screen option is used. All screened variables are included, regardless of whether they were retained in the final model. Before you can access any of these variables in a script, you must use a "global" command. For example: global SOLAR_Kurtosis if {$SOLAR_Kurtosis > 4} {puts "Very bad kurtosis!"} (10) The default is for the standard error option to be turned on (and temporarily off, when desirable for certain tests). However, if you turn the standard error option off before starting polygenic, it will remain off.
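As a worked illustration of the RhoP formula in note 8, this Tcl sketch (with made-up component values) computes the derived estimate by hand:
    # hypothetical values from a bivariate polygenic run
    set h2r_1 0.4; set h2r_2 0.6; set rhog 0.5; set rhoe 0.2
    set rhop [expr {sqrt($h2r_1)*sqrt($h2r_2)*$rhog + sqrt(1.0-$h2r_1)*sqrt(1.0-$h2r_2)*$rhoe}]
    # rhop = 0.4899*0.5 + 0.4899*0.2, approximately 0.343
Return to command index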
A1.197 sporadic_normalize
Purpose: Residualizes the trait with a sporadic model, then inormalizes the residuals. Usage: sporadic_normalize -out <output file name> [optional: -list <file containing a list of traits> -class <perform by class> ] sporadic_normalize is used for factoring out covariates and inormalizing traits prior to performing other calculations. It can be used in two ways. One way is to select a trait with the trait command and covariates with the covariate command prior to calling this command. The second way uses the -list option, which takes a file containing a list of traits separated by white space; sporadic_normalize then residualizes and inormalizes all the traits listed in that file. Covariates are still selected with the covariate command before calling this command. If the phenotypes file contains a class column and you wish to perform the calculations by class, use the -class option, listing the classes separated by commas, for example -class 0,1,2,3 .
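A usage sketch (file and trait names hypothetical):
    solar> load phenotypes phen.csv
    solar> trait bmi
    solar> covariate age sex
    solar> sporadic_normalize -out bmi_normalized.csv
or, when the phenotypes file has a class column:
    solar> sporadic_normalize -out bmi_by_class.csv -class 0,1,2,3
Return to command index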
A1.198 sporclass
solar::polyclass -- (EXPERIMENTAL) Purpose: Set up polygenic model with class specific parameterization Usage: polyclass [-g] [-intrait] [-incovar] [<class-start>[-<class-end>]]+ [-comb] [-maxi] [-rincovar] [-maxsnp <snp_name>] sporclass [-g] [-intrait] [-incovar] [<class-start>[-<class-end>]]+ [-comb] [-maxi] [-rincovar] [-maxsnp <snp_name>] -g Use global phenotypic values to set parameter adjustments (otherwise, means are determined for each class) -intrait inormalize trait values on a per-class basis -resmax inormalize residual values in place of traits -incovar (NOT WORKING IN version 7.1.2) inormalize covar values on a per-class basis (only used for simple linear covariates, no interactions or exponents) -comb all classes combined model -max after building the model, maximize it -maxsnp <snp_name> Maximize and include snp_name as a covariate in the model and determine statistics for it: beta, beta se, chi, p, and variance explained (varexp). H2r's are reported for the models with and without the snp. -append Append results to existing output file(s) if any Short Example: trait q4 covariate age sex polyclass 1-3 9 maximize -q Notes: One phenotypes file must have a field named "class" which defines the class value for each person in the sample. Class specific parameters are given names with _c<class> appended. User covariates are transformed into class-specific mu addends. All individuals in the sample must have all variables specified as covariates. After choosing trait and covariates, do either sporclass or polyclass. You cannot do a second polyclass on a sporclassed model to make it polygenic. Unbalanced covariates for multivariate traits are not supported. This is different from ordinary covariate behavior for multivariate traits, which permits covariates to be missing in the sample if they are specific to a missing trait. A defined pseudo-covariate named "blank_classes()" restricts the sample to the union of all classes specified. The maximized model is saved in the output directory as polyclassmax.mod with output file polyclassmax.out. Note that if the -intrait option is selected, the trait name and default output directory will have a leading i_ prefix (for the inormalization). If the -resmax option is selected, the trait will be named "residual" or "i_residual" if -intrait is also selected.Return to command index
A1.199 spormod
Purpose: Set up a sporadic model with the standard parameters Usage: spormod Notes: There are no arguments. You must have previously loaded the phenotypes file, selected the trait, and specified the covariates. Household effects are suspended. If you want a 'household' model, give the spormod command first, then the 'house' command. The starting lower bound for e2 is controlled by e2lower. Normally you do not use this command directly, but instead use the "polygenic" command to do a complete polygenic analysis, which maximizes a sporadic model which was set up using this command. See the tutorial in Chapter 3. Shortcuts: spor - spormodelReturn to command index
A1.200 startclock
solar::stopclock -- Purpose: simple local timer (elapsed time) Usage: startclock;<timed command>;stopclock See Also: timediffReturn to command index
A1.201 stats
Purpose: Get and/or show statistics for any variable in a file Usage: stats [<variable>+ | -all [-file <filename>]] [-q] [-return] [-out <outfile>] [-sample] -all show stats for all variables in phenotypes file -return do not write output file, return list of stats; use stats_get to parse return list -q do not display to terminal -out specify alternate output filename; also returns list of stats -sample Use only the sample of the current model (there must be a current model, or at least a defined trait, and statistics can only be computed for the trait(s) and covariate(s) in that model) The default variable is the current trait, and the default filename is the currently loaded phenotypes file. You may also specify one or more variables. Results are written to stats.out in the CURRENT WORKING DIRECTORY. (Not the usual output directory, since the trait need not be set.) They are also displayed on the terminal. The statistics computed are mean, minimum, maximum, standard deviation, skewness, and kurtosis. (Note: We define kurtosis as 0 for a standard normal distribution; 3 has already been subtracted from the normalized 4th central moment.) See also the zscore command, which uses these stats to zscore the current trait during maximization. The zscore procedure uses stats with the -out option. If there are multiple phenotypes files, joinfiles will be used to create a joined file in the working directory named joinfiles.stats.[pid].[uname -n].out. Non-unique fieldnames will be renamed following the rules of joinfiles. Under most circumstances, this joined file will be deleted before stats returns. To run through the entire contents (and names) in the joined file, use the "stats -all" command. Variables created by a "define" command will work with stats beginning with SOLAR version 4.2.2. Note that such variables are not evaluated by the command "stats -all".Return to command index
A1.202 stats_get
Purpose: Retrieve statistics from list returned by stats Usage: stats_get <stats> <statistic> [<variable>] <stats> list returned by stats procedure <statistic> name of statistic desired (see below for list) <variable> select this variable (default: <first>) Example: set stat [stats -q -return q1] set kurt [stats_get $stat kurtosis] set skew [stats_get $stat skewness] Notes: The following statistics are available: variable name of variable count number of individuals having this variable (sample) missing number of individuals missing this variable mean mean min minimum value max maximum value sd standard deviation skewness skewness kurtosis kurtosis discrete 0 if quantitative, 1 if discrete, -1 if not coded properly alpha 0 if valid numbers; 1 if alphanumeric Of course, if a variable is selected, that variable must have been included in the stats list. When running the stats command you may select any number of variables or use the -all option. See the stats command for further information.Return to command index
A1.203 stepfor
Purpose: Forward stepwise covariate screening Usage: stepfor [-list listfile] [-list list] [-verbose] [-v] [-fix listfile] [-fix fixlist] [-max maxdim] [-p pvalue] [-test othertest] [-par] [-parclean] stepclean ;# Remove fully_typed covariate and unload file By default, stepfor will test all covariates in the current model, fixing the best one and then repeating the process until the best remaining covariate does not meet the default pvalue of 0.05, or the user specified p-value or test (see below). The final model will contain all the covariates which met the screening test. A file named stepfor.out is written to the output directory with all the loglikelihoods, and a file named stepfor.history is written with other information. All of the best models for each number of covariates are saved as stepfor.null<i> where <i> is the number of tested covariates. To ensure that all models use the same sample, a new file named fully_typed.out is created in the output directory which defines a variable named "fully_typed" for each fully typed individual. This file is added to the list of open phenotypes files, and the variable "fully_typed" is added to the model as a "null" covariate which has no effect on the model other than restricting the sample to fully typed individuals. To remove the fully_typed covariate and unload the fully_typed.out phenotypes file, give the command "stepclean" after stepfor has completed. -list listfile listfile is a file containing a list of all covariates to be tested, one on each line. The filename cannot contain spaces. These covariates may or may not be in the model when the command is given. If the -list option is specified, all other covariates in the starting model are automatically fixed. -list list Alternatively, a Tcl list of covariates to be tested can be specified. Tcl lists are space delimited and enclosed in quotes or curly braces. -fix list list is a Tcl list of covariates to be included in every model and not tested. Their values will be estimated by maximum likelihood for every model, unless you constrain them. These covariates may or may not be in the model when the command is given. For -fix, a list could be simply one phenotype, and that supersedes a file with the same name. -fix listfile Alternatively, a file containing a list of all covariates to be included in every model may be specified. The filename cannot contain spaces. The list of covariates to be fixed will supersede the list of covariates to be tested if the same covariate occurs on both lists; however, a warning will be given. -p pvalue pvalue is the highest p value allowed for a covariate to be included. The default is 0.05. -max maxdim maxdim is the maximum number of test covariates to be included in a model (the maximum dimension). -verbose Show maximization output during maximizations. -v Same as -verbose -par New and EXPERIMENTAL! This option turns on Parallel processing on the SFBR GCC Compute Ranch. WARNING! Do not run more than one instance of stepfor -par from the same working directory. Parallel stepfor will use many (but not all!) ranch machines, and access for other users and jobs may be delayed due to gridware thrashing. The usual output is not printed to the terminal to save time, but numerous parallel status messages are printed to help the developers make this program better. The parallel operation is automatic and the parallel status messages may be ignored by most users most of the time unless there is no output for more than fifteen minutes. 
Note: If the model includes linkage element matrices loaded from some mibddir, those matrices should be relocated to the working directory, or specified with an absolute pathname in the model file. This is because in parallel operation the model is loaded not in the current working directory but in a subdirectory of /tmp. -parclean Normally, parallel stepfor cleans up after itself. However, if it is necessary to force a shutdown of a parallel stepfor, normal cleanup is not done. "stepfor -parclean" cleans up all the junk stepfor files in /tmp directories on all ranch machines. This must be run on medusa. Do not run it if you have any other running parallel jobs (parallel stepfor, parallel bayesavg, or any parallel job using "launch" or "doscript") as their files may be deleted too. See also "doranch" for other ranch cleanup procedures. Cleanup history is written to a file named cleantmp.out. -test othertest othertest is a user defined Tcl proc that judges whether or not a covariate should be included. The test model with the best covariate is loaded at the time this procedure is called. This procedure takes two mandatory arguments (whether they are needed by the procedure or not): loglike0 nullmodelname loglike0 is the loglikelihood of the null model which does not contain the current test covariate. nullmodelname is the pathname to the null model itself. The procedure may obtain the loglikelihood of the current model with the loglike command. The default procedure looks like this:
    proc stepfortest {loglike0 nullmodel} {
        set chisq [expr 2.0 * ([loglike] - $loglike0)]
        if {$chisq >= 0} {
            set pvalue [chi -number $chisq 1]
        } else {
            set pvalue 1
        }
        putsout stepfor.history "\n *** p = $pvalue"
        global SOLAR_stepfor_pvalue
        if {$pvalue <= $SOLAR_stepfor_pvalue} {
            return 1
        }
        return 0
    }
Note that the default procedure does not use the nullmodel argument, but it does use a global variable that your own test procedure will not have to use. The global supports the -p argument. The procedure may load the nullmodel without restoring the current model; that is handled by the stepfor procedure itself.
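For instance, a screening run over a list of candidate covariates using a stricter p-value (list filename hypothetical):
    solar> stepfor -list covlist.txt -p 0.01
    solar> stepclean            ;# remove fully_typed covariate, unload file
Return to command index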
A1.204 stepup
Purpose: Covariate screening by Step Up algorithm, useful for QTN analysis Usage: stepup [-list listfile] [-list list] [-verbose] [-fix listfile] [-fix fixlist] [-cutoff cutoff] [-logn logn] [-finishlogn] [-symmetric] [-cornerdf df] [-par] [-parclean] stepup is a fast version of bayesavg and may be used in QTN analysis. By default, stepup will test all covariates in the current model one at a time, then add all the new covariate models within the BIC cutoff to the window. Then the window models are subjected to another round of testing against all covariates, and the process repeats until no more models are added to the window. Unlike bayesavg, this algorithm doesn't test all possible models, just those that are derived from those in the window. When completed, it writes files named stepup.win and stepup.avg to the output directory containing posterior probabilities for the window models and components. To ensure that all models use the same sample, a new file named fully_typed.out is created in the output directory which defines a variable named "fully_typed" for each fully typed individual. This file is added to the list of open phenotypes files, and the variable "fully_typed" is added to the model as a "null" covariate which has no effect on the model other than restricting the sample to fully typed individuals. This covariate is removed from the final best model stepup.best, so you may get a different likelihood in subsequent maximization. Up to dimension 3, all models with BIC better than the null model are retained. (This feature may be controlled with the -cornerdf option.) Also, the default "strict" rule is only applied to remove apparently redundant higher dimensional models at the very end after all important dimensions have been scanned. -list listfile listfile is a file containing a list of all covariates to be tested, one on each line. The filename cannot contain spaces. These covariates may or may not be in the model when the command is given. If the -list option is specified, all other covariates in the starting model are automatically fixed. -list list Alternatively, a Tcl list of covariates to be tested can be specified. Tcl lists are space delimited and enclosed in quotes or curly braces. -fix list list is a Tcl list of covariates to be included in every model and not tested. Their values will be estimated by maximum likelihood for every model, unless you constrain them. These covariates may or may not be in the model when the command is given. For -fix, a list could be simply one phenotype, and that supersedes a file with the same name. -fix listfile Alternatively, a file containing a list of all covariates to be included in every model may be specified. The filename cannot contain spaces. The list of covariates to be fixed will supersede the list of covariates to be tested if the same covariate occurs on both lists; however, a warning will be given. -cutoff cutoff Set the final BIC cutoff. The default is 6. -logn logn Use this fixed value for log(N) from the beginning. -finishlogn logn Recompute results of previous analysis with this log(N) value. Sometimes stepup fails at the end because the standard error of the SE parameter of the best BIC model cannot be computed, and that is needed to compute the final log(N). This option allows you to finish such a run that nearly completed previously. Be sure that starting conditions (such as loaded pedigree, phenotypes, model, outdir) and options are exactly the same as before. 
The original startup (stepup.orig.mod) and null models from the output directory will be loaded. Note that the temporary log(N) used by stepup by default is simply the log of the sample size, and this is reported to the stepup.history file. You may choose to use that or some other estimate. A special file required is stepup.winmods.prelim, which was produced by the previous incomplete run of stepup. -verbose Show maximization output during maximizations. -v Same as -verbose -cornerdf df EXPERIMENTAL. This sets the last degree of freedom that uses a loose test to include models in the window. Models need only have a better BIC than the null model up to and including this df. The default is 3. -symmetric Apply symmetric rule rather than strict. This results in a larger window. -par This option turns on Parallel processing on the SFBR GCC Compute Ranch. WARNING! Do not run more than one instance of stepup -par from the same working directory. Parallel stepup will use many (but not all!) ranch machines, and access for other users and jobs may be delayed due to gridware thrashing. The usual output is not printed to the terminal to save time, but numerous parallel status messages are printed to help the developers make this program better. The parallel operation is automatic and the parallel status messages may be ignored by most users most of the time unless there is no output for more than fifteen minutes. Note: If the model includes linkage element matrices loaded from some mibddir, those matrices should be relocated to the working directory, or specified with an absolute pathname in the model file. This is because in parallel operation the model is loaded not in the current working directory but in a subdirectory of /tmp. -parclean Normally, parallel stepup cleans up after itself. However, if it is necessary to force a shutdown of a parallel stepup, normal cleanup is not done. "stepup -parclean" cleans up all the junk stepup files in /tmp directories on all ranch machines. This must be run on medusa. Do not run it if you have any other running parallel jobs (parallel stepup, parallel bayesavg, or any parallel job using "launch" or "doscript") as their files may be deleted too. See also "doranch" for other ranch cleanup procedures. Cleanup history is written to a file named cleantmp.out.
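A usage sketch paralleling stepfor (list filename hypothetical):
    solar> stepup -list snplist.txt -cutoff 6
The window and averaging results would then appear as stepup.win and stepup.avg in the output directory, with the best model saved as stepup.best.
Return to command index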
A1.205 stopclock
Purpose: simple local timer (elapsed time) Usage: startclock;<timed command>;stopclock See Also: timediffReturn to command index
A1.206 string_imatch
Purpose: Case insensitive string match testing Usage: string_imatch <string1> <string2> Returns 1 for case insensitive match, 0 otherwise. Note: Useful in SOLAR scripts.Return to command index
A1.207 stringplot
Purpose: String plot of entire genome scan Usage: multipoint stringplot [-pass pass] [-allpass] [-title] [-lod <lod>] [-lodmark] [-color <name>] [-noconv] [-date] [-nomark] [-font <X-font-spec>] [-titlefont <X-font-spec>] [-dash <dash spec>] [-linestyle <dash spec>] [-mibddir <mibddir>] Notes: You can also use the command "plot -string" which has the same options and works identically. For further information on the options, see "help plot", where all the options are described. Here are the more important ones. No options are usually needed; they are for fine-tuning the display. -pass Multipoint oligogenic pass number, "1" is default -allpass Plot all multipoint passes (in separate plots) -title Title of plot -lod <lod> Show LOD scale for lods this high (default is highest) -lodmark Put marker ticks ON TOP of LOD curve (default is axis) -color Takes standard names like "blue" and "red" -noconv Do not mark convergence errors -date Datestamp plot -nomark Do not show marker ticks (very useful for GWAS) -font X font for text (see xlsfonts | more) -titlefont X font for title only -dash Line style (see "help plot" for description of spec) -linestyle Line style (same as -dash) -mibddir specify mibddir (default is set with mibddir command) -mapfile User mapfile -layers Method of using multiple colors. See help plot. mibddir and trait (or outdir) must have been specified previously. The string plot will be both displayed on screen and written to a file. If you are running on a remote system, you will need to enable X window forwarding by setting the DISPLAY variable to point back to your X display, and enabling acceptance of the X protocol with the xhost + command, as described in section 3.8.3.1 of the SOLAR documentation. Sorry, there is no possible way to write the file without displaying the plot; the underlying "tk/wish" program does not allow that. An encapsulated postscript file is written to the trait/outdir with the name str.passN.ps where N is the pass number, such as str.pass01.ps. If a copy of the string plot script, which is named "stringplotk", is found in the current working directory, that will be used in place of the standard version in the SOLAR bin directory. You can customize stringplotk as you wish. (It is a "wish" script, after all.) Good luck! Shortcuts: stringp - stringplot
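For instance, after a completed multipoint run, a hedged example that suppresses marker ticks (helpful for dense GWAS data) and sets a color:
    solar> multipoint stringplot -nomark -color blue
Return to command index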
A1.208 stringsub
Purpose: Simple verbatim string substitution (not regsub) Usage: stringsub <original> <target> <replacement>Return to command index
A1.209 tablefile
Purpose: Read data file in comma delimited or PEDSYS format Usage: set tablenum [tablefile open <filename>] tablefile $tablenum names ; return field names tablefile $tablenum short_names ; return short names tablefile $tablenum widths ; return field widths tablefile $tablenum start_setup ; start user record tablefile $tablenum setup <name> ; add field to user record tablefile $tablenum get ; get user record tablefile $tablenum rewind ; rewind file tablefile $tablenum close ; close file tablefile $tablenum test_name <name> ; test for named field tablefile $tablenum get_position ; get current position tablefile $tablenum set_position <pos> ; set position Notes: Intended for use in scripts. The get command will return data elements in a proper list. This means that if a data element includes spaces, it will be enclosed in braces. For best results, data records should be read using lindex, which removes the braces. On End of File, get will return an empty list. This should be tested for. Other file errors will raise Tcl exceptions. See Also: solarfile Shortcuts: tabl - tablefile
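A sketch of the typical open/setup/get loop in a script (filename and field names hypothetical; it assumes test_name returns true when the field exists):
    set t [tablefile open phen.csv]
    if {![tablefile $t test_name age]} {error "no age field"}
    tablefile $t start_setup
    tablefile $t setup id
    tablefile $t setup age
    while {1} {
        set rec [tablefile $t get]
        if {[llength $rec] == 0} break   ;# empty list means end of file
        set id  [lindex $rec 0]
        set age [lindex $rec 1]
        # ... process one record ...
    }
    tablefile $t close
Return to command index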
A1.210 tclgr
Purpose: Create xmgr session with pipe connection to SOLAR Note: This is a low-level plot interface used by other commands. Most users will use the higher level interfaces such as 'plot' or 'multipoint -plot.' Usage: tclgr open ;# Start xmgr session tclgr send <xmgr command line> ;# send command and wait now tclgr buffer <xmgr command line> ;# add xmgr command to buffer tclgr flush ;# flush buffer of commands tclgr close ;# end xmgr session tclgr syscommand <syscommand> ;# Setup sys command for XMGR ;# 'xmgr' is the default <XMGR command line> ;# defined in Ace/gr docs The tclgr open command has a '-buffersize <number>' option. The default buffersize is 1000. If the user closes the XMGR session remotely, the 'tclgr close' command must be used to officially close it before it can be re-opened. Shortcuts: tclg - tclgrReturn to command index
A1.211 tdist
Purpose: Set up t option for robust estimation of mean and variance Usage: tdist set up t option tdist -off turn off t option Notes: tdist creates a parameter t_param and sets tdist option tdist -off deletes t_param and unsets tdist option Shortcuts: tdis - tdistReturn to command index
A1.212 timediff
Purpose: Calculate seconds between two system time strings Usage: timediff <start-time> <end-time> See Also: startclock, stopclock set starttime [exec date] ... procedure to be timed set endtime [exec date] return "seconds: [timediff $starttime $endtime]"Return to command index
A1.213 toscript
Purpose: Write previous commands to a script Usage: toscript [-ov] <name> [<first>[-<last>]]* -ov Force overwrite of previous script <first> First command number to be included <last> Last command number in sequence to be included Example: toscript analysis 1 3 9-20 ;# include commands 1, 3, and 9-20 Notes: Command numbers are displayed with the tcl "history" command. If no numbers are specified, all previous commands in this SOLAR session will be included in script. Script will be saved in file named <name>.tcl in the current directory. After saving, newtcl will automatically be invoked so that the script can be used immediately. The script <name> defaults to being the first argument, but may also be the last argument if the <name> is not a number or range of numbers so there is no ambiguity. For example: toscript 1-10 startscript ;# OK toscript startscript 1-10 ;# OK toscript 2 1-10 ;# OK script named 2.tcl toscript 1-10 2 ;# OK script named 1-10.tclReturn to command index
A1.214 trait
Purpose: Select the trait (dependent variable) Usage: trait ; show current trait info trait <trait1> ; selects one trait trait [<traiti> ]+ ; multivariate (up to 20) trait -noparm [<traiti> ]+ ; don't touch parameters at all [define <defname> = <expression>]+ ; Define any expressions as trait [<phenotype>|<defname> ]+ ; traits...see "help define" Notes: SOLAR is case insensitive to variable names. Thus the command: trait foo will match a variable named FOO. Variables cannot be distinguished on the basis of case in SOLAR. A phenotypes file must be loaded before giving the trait command. Starting with SOLAR version 4.x, arbitrary expressions including one or more phenotypes may be defined with the "define" command and then used as trait(s). See "help define" for more details. If a model has already been created, it is recommended to give the "model new" command to clear it out prior to giving the trait command. It is only reasonable to skip the "model new" if the new trait has similar parameter estimates to the previous trait. For a change of one trait to another, SOLAR will attempt to accommodate the change by adjusting parameter values (unless the -noparm option is used) as described below. Any change of trait(s) involving two traits is not permitted (if any trait-specific parameters have been created); you will get an error message and the trait will go into a special error state ("Must_give_command_model_new") which will require you to give the "model new" command to clear before any model can be maximized (however, the model can be examined and saved in this state...you may wish to repair it offline in a text editor). Under no circumstances will the trait command create new parameters or delete old parameters. Normally the "polygenic" command is given to create and test the standard variance component parameters. If changing from one trait to another, the Mean and SD parameters, if present, will be reset to zero to force setting starting values and boundaries during the next maximization. Covariate betas and boundaries will also be zeroed. Examples: trait bmi trait q1 q2 define a = 10 * log(q4) trait a q3 Shortcuts: trai - traitsReturn to command index
A1.215 transpose
Purpose: Transpose a MathMatrix or comma delimited file Usage: transpose <MathMatrix> ;# returns id of transposed MathMatrix transpose <infile> <outfile> ;# transposes CSV file Note: All records must have the same length. The first record is treated like all others. To transpose a PEDSYS file, use ped2csv first. Memory usage for extremely large files (>100mb) could be a problem. If memory is exhausted while caching the file in memory, solar might crash to the shell prompt.Return to command index
A1.216 twopoint
Purpose: Perform "Twopoint" analysis on directory of ibd files Usage: twopoint [-append] [-overwrite] [-grid] [-cparm {[<parameter>]*}] -saveall -overwrite (or -ov) Overwrite existing twopoint.out file. -append (or -a) Append to existing twopoint.out file. -cparm {} Custom parameters. Scanning will consist of replacing one matrix with another matrix, everything else is unchanged. The starting model MUST be a "prototype" linkage model with the desired parameters, omega, and constraints. Starting points and boundaries for the parameters must be explicitly specified. Following the -cparm tag, there must be a list of parameters in curly braces that you want printed out for each model. The list can be empty as is indicated with a pair of curly braces {}. There must be a model named null0 in the maximization output directory for LOD computation purposes. The matrix to be replaced must have name ibd or ibd1, ibd2, etc. The highest such ibd will be replaced. If the matrix is loaded with two "columns," such as "d7," each succeeding matrix will be loaded with two columns also. See section 9.4 for an example involving dominance. -grid Enables the "grid" option, which estimates recombination fractions in the range theta=0 to 0.45, finding the optimal value to the nearest 0.01. (Note: this option is not important for most twopoint users. It also increases evaluation time considerably. Consider using the separate "grid" command with only the markers of greatest interest.) -saveall Save all twopoint models in the maximization output directory. The models are named "ibd.<marker>". Notes: The trait or outdir must be specified before running twopoint. There must be a null0.mod model in the trait or outdir directory. This can be created with the polygenic command prior to running multipoint. (This model may include household and covariate effects. See the help for the polygenic command for more information.) An output file named twopoint.out will be created in the trait or outdir directory. If that file already exists, the user must choose the -append or -overwrite option. The best twopoint model is saved as two.mod in the trait or outdir directory. It is also loaded in memory at the completion of the twopoint command. IBDDIR should be set with the ibddir command prior to running twopoint. If models have two traits, the 2df LOD scores will be converted to 1df effective LOD scores, with the assumption that parameter RhoQ1 is not intentionally constrained. To override this, use the lodp command (see). This feature was first included with beta version 2.0.1. Shortcuts: twop - twopointReturn to command index
A1.217 upgrade
Purpose: Upgrade model files and scripts Usage: upgrade modelname upgrade scriptname.tcl Notes: If successful, the new file will replace the original one. The old file is saved with ".old" tacked on to the end of the name (e.g. amodel.mod.old). If an error is reported, the original file remains unchanged. If the file is a model, the ".mod" extension is assumed even if not specified. SOLAR always tacks on ".mod" to the end of model filenames. If the file is a script, it must end with the ".tcl" extension, and the extension must be specified in the command as shown. Upgrade looks for this, and if found it assumes that a script is being upgraded. solartcl appears at the top of all upgraded script files. SOLAR programmers are encouraged to use solartcl as well. Shortcuts: upg - upgradeReturn to command index
A1.218 usage
Purpose: Print short "usage" message about a command Usage: usage <command> Example: usage multipoint ;# shows usage of multipoint command Notes: Since this is printed directly to terminal, it will stay visible for next command. If help message contains no "Usage" section, the first 15 lines will be printed. Shortcuts: usag - usagesReturn to command index
A1.219 usort
Purpose: Define unix sort program name (used for multipoint*.out files) Usage: usort <program> ; use program usort "" ; disables sort feature usort ; show current program Notes: The default is /usr/bin/sort, which should work on most systems. It is necessary to include a path for users who have PEDSYS, which has its own program named "sort." The program must be compatible with unix sort and have -n -o and -k arguments. Example: usort /usr/local/bin/sort Shortcuts: usor - usortReturn to command index
A1.220 vcfinfo
Purpose: extract per-genotype meta information from vcf file Usage: vcfinfo <filename> [-chrom <chrom>] [-info] [-all] -chrom restrict output to chromosome <chrom> -info include INFO and FORMAT fields -all include everything, including sample genotypes (WARNING! OUTPUT CAN BE VERY LARGE!) See also vcfselect to extract genotype/sample data only Notes: (1) output file is named as input file with .gz and .vcf extensions removed and .vcfinfo.csv or .<chrom>.vcfinfo.csv appended (2) vcf files permit commas within fields, which is not permitted in csv files. vcfinfo converts these commas to spaces.
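For example, to summarize one chromosome with the INFO and FORMAT fields included (input filename hypothetical):
    solar> vcfinfo mydata.vcf.gz -chrom 21 -info
    (per note 1, the output file would be named mydata.21.vcfinfo.csv)
Return to command index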
A1.221 vcfselect
Purpose: extract genotype data from vcf file (HIGHLY EXPERIMENTAL!) Usage: vcfselect <filename> <chrom> <pos> See also vcfinfo to extract genotype meta information. Notes: (1) output file is named as input file with .gz and .vcf extensions removed and .<chrom>.<pos>.csv added. (2) File is returned with genotypes as originally coded, and also with 0,1,2 SOLAR coding if there are no blanks. Fields are named sample_id, <chrom>.<pos> and snp_<chrom>.<pos>. (3) Errors in decoding to 0,1,2 are flagged with "error" in the output lines, and an error message. However the output file is still written completely with the original genotype coding.Return to command index
A1.222 verbosity
Purpose: Set the output verbosity. Usage: verbosity default ; Set default verbosity verbosity plus ; Set above average verbosity verbosity min ; Set minimum verbosity verbosity max ; Set maximum verbosity (debugging) verbosity ; Display current verbosity verbosity -number ; Show verbosity hex bit-coded number verbosity <number> ; Set verbosity with bit-coded number Notes: During analysis scripts such as polygenic and multipoint, the default verbosity suppresses all the usual maximization output (such as you would see with the 'maximize' command run by itself). The maximization output can be turned on for analysis scripts using the 'plus' verbosity level. 'plus' is above default, but below 'max.' The bit coded numbers used for various verbosity levels are subject to change. User scripts should use the name (such as 'default') to be consistent with future releases. There are now a few reports which are so verbose that they are not even included in verbosity max. They may be specified by using hex coded numbers (starting with "0x"). These are subject to change in future releases. 0x4ffff Max verbosity with per-pedigree likelihoods for each iteration. Hex-coded verbosity numbers were not supported prior to version 2.0.2. Shortcuts: verb - verbosityReturn to command index
A1.223 voxel
Purpose: To set and save the current voxel position Usage: voxel [<voxel-value>] <voxel-value> is 3 coordinates delimited by colons as x:y:z for example, 12:8:23 If no voxel-value is specified, the current voxel is returned. If no current voxel has been defined, an error is raised. If a voxel has been defined, it is written to model files. The current voxel can also be set with the mask command, and that is the general way it should be done. Notes: 1) Image traits are qualified as to type in the header of the phenotypes file, which must be comma-delimited, and the type qualification follows the trait name separated by a colon. The only type currently supported is NIFTI/RicVolumeSet, which is called "nifti". Such a header could look like this: ID,age,count:nifti Then, each corresponding data field consists of a NIFTI filename followed by a colon and the volume number for that person. For example: A001,19,images.gz:1Return to command index
A1.224 whoranch
Purpose: Show how many ranch machines each ranch user is using Usage: whoranch Notes: See also howmanyranch. Requires use of Grid Engine software.Return to command index
A1.225 write_subject_ids
Purpose: Writes out the subject IDs having non-zero values for the selected trait or traits; a phenotypes file and pedigree must be loaded. The pedigree can be excluded using the -no_ped argument. Usage: write_subject_ids -out <output filename> [optional: -no_ped ] Example: load ped ped.csv load pheno phenotype.csv trait trait_one write_subject_ids -out trait_one_subject_idsReturn to command index
A1.226 zs
Purpose: The old zscore command to zscore current trait Old Usage: zscore [-off] [-q] zs [-off] ;# Perform zscore quietly -off Turn off zscore -q Perform zscore quietly Notes: The "Mean" and "SD" values used by zscore are computed only once, at the time the zscore command is given. Thus they do not reflect later changes to the phenotypes file, or to the sample, which might be restricted due to individuals missing covariates added later. Generally, for this reason the zscore command should be given after the covariates command and immediately before a model maximizing command such as polygenic. Starting with SOLAR Version 4.0.9, the trait mean and SD are computed from the actual sample that would be included in an analysis (at the time the zscore command is given). As described in the notes below, you can adjust the Mean and SD by using "option zmean1" and "option zsd1" to set the values actually used. These values are applied to the trait values during maximization. If the trait is changed without giving the "model new" command, the new trait will be zscored automatically. This feature is obsolescent. In a future update, zscore will be turned off when the trait is changed. An alternative to zscore is to define the trait as the inverse normal transformation of a variable. See "help inormal" and "help define" for further details. zscore will also calculate a number of statistics for the trait: mean, minimum, maximum, standard deviation, skewness, and kurtosis. These will be written to the file zscore.out in the current output directory. As of version 4.0.9, these statistics are no longer written to the terminal. Instead, a single line is displayed with the trait name, mean, and SD. Even that line is not shown if zscore is invoked from a script or the zs abbreviation of the command is used. To calculate these statistics for any phenotypic variable without zscoring and without necessarily making it the trait, use the "stats" command instead. A trait must already have been selected with the trait command or loaded model. Also the phenotypes file must have been loaded. When a maximization is performed, trait values are replaced with their zscored values. The formula is: zscored = (value - Mean) / SD zscore is a model dependent option controlled by "option zscore". It remains in effect until another model is loaded or the "model new" command is given. When models maximized with zscore are reloaded, zscore is again activated. "option zscore" is set to 1 ("on") by this command, and the related options zmean1 and zsd1 (mean and standard deviation for the first trait) and zmean2 and zsd2 (mean and standard deviation for the second trait) are set as required. You can adjust these options directly to fine tune the mean and standard deviation values used, but be sure that zscore is not set to 1 until the mean and (non-zero !) standard deviation values are set for all traits in the model. In a multivariate model, zscore will only be applied to the first two traits. Whenever zscore is activated or deactivated, parameters mean and SD are reset to zero to force setting new boundaries and starting point during the next maximization. If a new phenotypes file is loaded, the zscore command should be repeated to reflect the new file.Return to command index
A1.227 zscore
Purpose: Zscore current trait(s) or covariate(s) Usage: define defname = zscore_phenotype trait defname OR covariate defname (defname is any user defined name, phenotype is any phenotype name) Notes: zscore_ is a prefix that may be used in the define command, similar to the inormal_ prefix. Once a definition has been created, it may be used in either the trait or covariate commands. For further information, see "help define". The Mean and SD are obtained from the current maximization sample, not the entire phenotypes file. In versions of SOLAR prior to 4.4.0, zscore was a command that could only be used to zscore the current trait. That command is still available as before, but is considered obsolescent; it was difficult and problematic to use. For information about that command, e.g. for understanding previous uses, see "help old_zscore".
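For example, to analyze a zscored version of a phenotype (trait name hypothetical):
    solar> define zbmi = zscore_bmi
    solar> trait zbmi
    solar> covariate age sex
    solar> polygenic
Return to command index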