Getting Started with rTASSEL
Overview
One of TASSEL's most powerful features is its ability to fit a variety
of association models. If you have started reading the walkthrough here,
it is strongly suggested that you read the earlier sections first, since
the following parameters rely on objects created there.
If you are not familiar with these methods, more information about how
they operate in base TASSEL can be found at the following links:
The rTASSEL::assocModelFitter() function has several primary components:
- tasObj: a TasselGenotypePhenotype class R object
- formula: an R-based linear model formula
- fitMarkers: a boolean parameter to differentiate between BLUE and GLM analyses
- kinship: a TASSEL kinship object
- fastAssociation: a boolean parameter for data sets that have many traits
Probably the most important concept of this function is the formula
parameter. If you are familiar with standard R linear model functions,
this concept is fairly similar. In TASSEL, a linear model is composed of
the following scheme:
y ~ A
…where y is any TASSEL data type and A is any TASSEL covariate and/or
factor type:
<data> ~ <covariate> and/or <factor>
This model can be written out in several ways. Using an example
phenotype data set, we have the following variables, which TASSEL
represents with the following types:
- Taxon <taxa>
- EarHT <data>
- dpoll <data>
- EarDia <data>
- location <factor>
- Q1 <covariate>
- Q2 <covariate>
- Q3 <covariate>
Using this data, we could write out the following formula in R:
list(EarHT, dpoll, EarDia) ~ location + Q1 + Q2 + Q3
In the above example, we use the base list() function to indicate
analysis of multiple numeric data types. For covariate and factor
information, we use the + operator. One problem with this approach is
that it becomes cumbersome and error-prone if we want to analyze the
entirety of a large data set, or all data and/or factor and covariate
types.
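To make the two sides of such a formula concrete, here is a minimal sketch in base R only (no rTASSEL required). It shows that the expression is an ordinary R formula object whose left- and right-hand sides can be inspected with all.vars(). How rTASSEL itself parses the formula internally is not shown in this walkthrough, so treat this purely as an illustration.

```r
# Build the multi-trait formula from the example above; no data are needed
# because a formula is stored unevaluated.
traitFormula <- list(EarHT, dpoll, EarDia) ~ location + Q1 + Q2 + Q3

# Element 2 of a two-sided formula is the left-hand side, element 3 the
# right-hand side. all.vars() returns variable names and skips functions
# and operators such as list() and +.
responseVars  <- all.vars(traitFormula[[2]])
predictorVars <- all.vars(traitFormula[[3]])

print(responseVars)   # "EarHT" "dpoll" "EarDia"
print(predictorVars)  # "location" "Q1" "Q2" "Q3"
```

Listing every variable by hand like this is exactly what grows unwieldy as the number of traits increases.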
A workaround for this problem is to use a special character, ., to
indicate all elements within the model. By using the . operator, we can
simplify the above model into the following:
. ~ .
This indicates we want to analyze the whole data set and
leave nothing out. If we want to analyze all data types and only a
handful of factor and/or covariates, we can use something like this:
. ~ location + Q1 + Q2
Or vice-versa:
list(EarHT, dpoll) ~ .
Note that we can be very specific about what we include in our trait
model. In the above example, we have deliberately left EarDia out of
our model.
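One way to picture what the . shorthand stands for is the base R sketch below. The expandDot() helper and the traitTypes lookup are hypothetical and written only for this illustration; they are not part of rTASSEL and do not claim to reproduce how rTASSEL resolves the dot internally.

```r
# Hypothetical lookup of variable name -> TASSEL type, mirroring the
# example phenotype data listed earlier.
traitTypes <- c(
  EarHT = "data", dpoll = "data", EarDia = "data",
  location = "factor", Q1 = "covariate", Q2 = "covariate", Q3 = "covariate"
)

# Hypothetical helper (NOT an rTASSEL function): replace a lone `.` on
# either side of the formula with every variable of the matching type.
expandDot <- function(f, types) {
  lhs <- all.vars(f[[2]])
  rhs <- all.vars(f[[3]])
  if (identical(lhs, ".")) {
    lhs <- names(types)[types == "data"]
  }
  if (identical(rhs, ".")) {
    rhs <- names(types)[types %in% c("factor", "covariate")]
  }
  list(response = lhs, predictors = rhs)
}

fullModel    <- expandDot(. ~ ., traitTypes)
partialModel <- expandDot(list(EarHT, dpoll) ~ ., traitTypes)

print(fullModel$response)     # "EarHT" "dpoll" "EarDia"
print(partialModel$response)  # "EarHT" "dpoll"
```

In the second call EarDia is absent from the response set, which matches the deliberate omission described above.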
Additionally, we can fit marker and kinship data to our model, which
changes the analytical method used. Since these options in TASSEL are
binary, they are passed to this function as additional parameters:
genotype/marker data is fitted using the fitMarkers parameter and
kinship is fitted using the kinship parameter.
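As a rough sketch of how these parameters fit together in a single call, consider the snippet below. It assumes that a TasselGenotypePhenotype object from the earlier walkthrough steps exists under the name tasGenoPheno (an assumed name, not confirmed by this section), and it skips the call entirely when that object or the rTASSEL package is not available.

```r
# Formula for a two-trait model; variable names come from the example
# phenotype data described above.
blueFormula <- list(EarHT, dpoll) ~ location + Q1 + Q2 + Q3

# `tasGenoPheno` is an ASSUMED name for the object created in earlier
# steps. The call below only runs if both it and rTASSEL are present.
canRun <- requireNamespace("rTASSEL", quietly = TRUE) &&
  exists("tasGenoPheno")

if (canRun) {
  tasBLUE <- rTASSEL::assocModelFitter(
    tasObj          = tasGenoPheno,
    formula         = blueFormula,
    fitMarkers      = FALSE,  # markers not fitted: BLUE rather than GLM
    kinship         = NULL,   # no kinship object supplied
    fastAssociation = FALSE
  )
}
```

Switching fitMarkers to TRUE, or supplying a TASSEL kinship object, changes which analysis is run, as described above.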
Fast Association implements methods described by Shabalin (2012).
This method provides an ordinary least squares solution for fixed effect
models. For this method to work properly, the following must hold:
- There is no missing data in your phenotype data set.
- Phenotypes and genotypes have been merged using an intersect join. Since this is currently the only option for joining genotype and phenotype data, you do not have to worry about this for now.
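Both prerequisites can be checked on plain R objects before any TASSEL objects are built. The sketch below uses a small made-up phenotype table and a made-up vector of genotype taxa, neither of which is the walkthrough data, and relies on base R only. Dropping incomplete rows is just one way to handle missing values and is an assumption of this example.

```r
# Made-up phenotype table for illustration (not the walkthrough data)
pheno <- data.frame(
  Taxon = c("A", "B", "C", "D"),
  EarHT = c(59.5, NA, 65.0, 71.2),
  dpoll = c(68.0, 70.5, 66.0, 72.0)
)

# Made-up taxa names present in a genotype data set
genoTaxa <- c("B", "C", "D", "E")

# Prerequisite 1: no missing phenotype values
hasMissing <- anyNA(pheno)
print(hasMissing)  # TRUE, because taxon B lacks an EarHT value

# One possible remedy: keep only rows with complete observations
phenoComplete <- pheno[complete.cases(pheno), ]

# Prerequisite 2: an intersect join keeps only taxa found in both sets
sharedTaxa <- intersect(phenoComplete$Taxon, genoTaxa)
print(sharedTaxa)  # "C" "D"
```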
NOTE: since we are working with “toy” data, the following steps do not
attempt to draw empirical conclusions. They simply show how to use these
functions properly and what output they return.
In the following examples, we will run example data and, in return,
obtain TASSEL association table reports in the form of an R list object
containing tibble-based R data frames.