After building CDS information on custom transcripts, one can retrieve the translated sequences of protein-coding transcripts and determine the protein domains and motifs they encode.

`predictDomains` queries the HMMER web server (hmmscan, https://www.ebi.ac.uk/Tools/hmmer/search/hmmscan) for known protein domain families using either the "superfamily" or "pfam" database. The prediction can be run globally on all protein-coding transcripts or restricted to specific genes or transcripts (recommended).

Usage

# S4 method for factR
getAAsequence(object, verbose = FALSE)

# S4 method for factR
predictDomains(object, ..., database = "superfamily", ncores = 4)

Arguments

object

factRObject

...

One or more features to run the prediction on. If none is given, all protein-coding transcripts are used. Can be any of the following:

  • gene_id: ID of the gene to query

  • gene_name: Name of the gene to query

  • transcript_id: ID of the transcript to query

database

HMM database to query. Can be "superfamily" (default) or "pfam".

ncores

Number of cores to run the prediction on (default 4).

Value

Updated factRObject. `getAAsequence` stores an AAStringSet object in the factRObject.

`predictDomains` stores a data frame of predicted protein domains in the factRObject.

Examples

## Load sample factRObject and build CDS
data(factRsample)
factRsample <- buildCDS(factRsample)

## Get peptide sequences
factRsample <- getAAsequence(factRsample)

## Predict domains for a specific gene
factRsample <- predictDomains(factRsample, "Osmr")
#> Set `show_more to TRUE to show more info`
#> Warning: Skipped 1 non-coding transcripts

## Predict domains of entire coding transcriptome
### This takes some time. Increase `ncores` where necessary
factRsample <- predictDomains(factRsample)
#> Set `show_more to TRUE to show more info`
#> Warning: Skipped 70 non-coding transcripts
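
The examples above use the default "superfamily" database. As a minimal sketch of the other documented arguments, the call below restricts the query to one gene, switches `database` to "pfam" and lowers `ncores`. It assumes the package is attached under the name factR2 and that the HMMER web server is reachable from the current session; output is not shown because it depends on the server response.

```r
## Sketch only: the package name factR2 is an assumption, and the
## prediction step needs network access to the HMMER web server
library(factR2)

## Load sample factRObject, build CDS and get peptide sequences
data(factRsample)
factRsample <- buildCDS(factRsample)
factRsample <- getAAsequence(factRsample)

## Query the Pfam database for a single gene, using two cores
factRsample <- predictDomains(factRsample, "Osmr",
                              database = "pfam", ncores = 2)
```

Restricting the query to a gene of interest keeps the number of web requests small, which is why the description recommends it over a transcriptome-wide run.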