Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to deal with NA data for non model species? #21

Open
cagenet opened this issue Jun 7, 2021 · 5 comments
Open

How to deal with NA data for non model species? #21

cagenet opened this issue Jun 7, 2021 · 5 comments

Comments

@cagenet
Copy link

cagenet commented Jun 7, 2021

Hi Federico,
Thank you for this really nice and helpful package. It is really useful.
I used it for agronomical species (pig and bovine).
I encountered a problem when trying to use enhance_table function. I have the following error :

p <- enhance_table(res_enrich_pig,

  •                res,
    
  •                n_gs = 30,
    
  •                gtl<-gtl_pig,
    
  •                annotation_obj = resannot_Pig,
    
  •                chars_limit = 60)
    

Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
les noms de lignes contiennent des valeurs manquantes
I guess the trouble came from the fact that I use ENSEMBLID, convert it in HGNC symbol but as not all genes are identified , I have some missing data (valeurs manquantes in french). Could it be possible to allow missing values in this function ?
Thank you in advance
Regards
Carine

@federicomarini
Copy link
Owner

Hi Carine!
Quick note on the way you call the function - it should be enough to use the gtl object, as the res/res_enrich/annotation are all picked from that.
Re: the missing values problem - does the error persist if you

  • keep the ids as ensembl in the result
  • have the conversion handled internally with the annotation object?

@cagenet
Copy link
Author

cagenet commented Jun 8, 2021

Hi Federico,
Thank you so much for your quick reply.
First, I didn't catch what you mean by "the way I call the function ". I follow your user guide (step 5.1).
For the missing values problem :
I think that there is two issues : the first one is that for the pig specie there is no mapping between Ensembl and symbol in org.Ss.eg.db (see https://support.bioconductor.org/p/9137574/)...
So I used bovine species (I have 2 datasets I applied a treatment in bovine and in pig and want to compare the results obtained in the two species).
Now with the bovine species, the enhance_table work with symbol but not with ENSEMBL ids.
I do have a conversion handled internally in the annotation object (a two table file with gene_id and gene_symbol (call HGNC in my code) and I have in the RowData in dds object an annotation with Ensemblid and gene symbol (call HGNC in my code see below).
What I don't understand is that I have an conversion table in the annotation object and also in the RowData (dds).

Thank you for your help.
Cheers

Here is my code :

library(GeneTonic)
load(file="DESEQ2_dds1_Bov.rda")#dds2

dds2
class: DESeqDataSet
dim: 15779 20
metadata(1): version
assays(4): counts mu H cooks
rownames(15779): ENSBTAG00000000005 ENSBTAG00000000009 ... ENSBTAG00000055312 ENSBTAG00000055315
rowData names(60): ENSEMBLID HGNC ... deviance maxCooks
colnames(20): BE1_CTRL BE2_CTRL ... BE14_BMP15 BE16_BMP15
colData names(3): Cow Treatment sizeFactor

load(file="resBta_04juin.rda")

summary(resBta)
out of 15779 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up) : 2323, 15%
LFC < 0 (down) : 2460, 16%
outliers [1] : 0, 0%
low counts [2] : 306, 1.9%
(mean count < 2)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

resannot_Bov<-read.table(file="annot_bov.txt", header = TRUE, sep="\t")

head(resannot_Bov)
gene_id gene_name
1 ENSBTAG00000000005 GRK3
2 ENSBTAG00000000008 KCNJ1
3 ENSBTAG00000000009 FOXF1
4 ENSBTAG00000000010 UBL7
5 ENSBTAG00000000011 TDH
6 ENSBTAG00000000012 TTC33

de_symbol_bov <- deseqresult2df(resBta, FDR = 0.05)$SYMBOL
bg_bov_sym <- rowData(dds2)$HGNC[rowSums(counts(dds2)) > 0]

resBta$ensemblid<-(row.names(resBta))
bg_ensembl_bov <- rowData(dds2)$ENSEMBLID[rowSums(counts(dds2)) > 0]
de_ensembl_bov <- deseqresult2df(resBta, FDR = 0.05)$ensemblid
library("topGO")
topgoBP_bov <- pcaExplorer::topGOtable(de_symbol_bov, bg_bov_sym,
ontology = "BP",
mapping = "org.Bt.eg.db",
geneID = "symbol",
topTablerows = 500)

topgoBPensembl_bov <- pcaExplorer::topGOtable(de_ensembl_bov, bg_ensembl_bov,
ontology = "BP",
mapping = "org.Bt.eg.db",
geneID = "ENSEMBL",
topTablerows = 500)
res_enrich_bov<-shake_topGOtableResult(topgoBP_bov)
res_enrich_bov_ensembl<-shake_topGOtableResult(topgoBPensembl_bov)
res_enrich_bov <- get_aggrscores(res_enrich = res_enrich_bov, res_de = resBta,
annotation_obj = resannot_Bov,
aggrfun = mean)
res_enrich_bov_ensembl<-get_aggrscores(res_enrich = res_enrich_bov_ensembl, res_de = resBta,annotation_obj = resannot_Bov,aggrfun = mean)
gtl_bov <- GeneTonic_list(
dds = dds2,
res_de = resBta,
res_enrich = res_enrich_bov,
annotation_obj = resannot_Bov
)

gtl_bov_ensembl<-GeneTonic_list(dds=dds2, res_de = resBta, res_enrich = res_enrich_bov_ensembl, annotation_obj = resannot_Bov)
p <- enhance_table(res_enrich_bov,
resBta,
n_gs = 30,
gtl<-gtl_bov,
annotation_obj = resannot_Bov,
chars_limit = 60)
p1<-enhance_table(res_enrich_bov_ensembl, resBta, n_gs=30, gtl = gtl_bov_ensembl, annotation_obj = resannot_Bov, chars_limit = 60)
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
les noms de lignes contiennent des valeurs manquantes

@federicomarini
Copy link
Owner

Hi Federico,
Thank you so much for your quick reply.
First, I didn't catch what you mean by "the way I call the function ". I follow your user guide (step 5.1).

Hi Carine,
in theory you can call

enhance_table(gtl = gtl_bov_ensembl, n_gs=30, chars_limit = 60)

as gtl should have all it needs 😉

For the missing values problem :
I think that there is two issues : the first one is that for the pig specie there is no mapping between Ensembl and symbol in org.Ss.eg.db (see https://support.bioconductor.org/p/9137574/)...
So I used bovine species (I have 2 datasets I applied a treatment in bovine and in pig and want to compare the results obtained in the two species).
Now with the bovine species, the enhance_table work with symbol but not with ENSEMBL ids.
I do have a conversion handled internally in the annotation object (a two table file with gene_id and gene_symbol (call HGNC in my code) and I have in the RowData in dds object an annotation with Ensemblid and gene symbol (call HGNC in my code see below).
What I don't understand is that I have an conversion table in the annotation object and also in the RowData (dds).

Thank you for your help.
Cheers

Here is my code :

library(GeneTonic)
load(file="DESEQ2_dds1_Bov.rda")#dds2

dds2
class: DESeqDataSet
dim: 15779 20
metadata(1): version
assays(4): counts mu H cooks
rownames(15779): ENSBTAG00000000005 ENSBTAG00000000009 ... ENSBTAG00000055312 ENSBTAG00000055315
rowData names(60): ENSEMBLID HGNC ... deviance maxCooks
colnames(20): BE1_CTRL BE2_CTRL ... BE14_BMP15 BE16_BMP15
colData names(3): Cow Treatment sizeFactor

load(file="resBta_04juin.rda")

summary(resBta)
out of 15779 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up) : 2323, 15%
LFC < 0 (down) : 2460, 16%
outliers [1] : 0, 0%
low counts [2] : 306, 1.9%
(mean count < 2)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

resannot_Bov<-read.table(file="annot_bov.txt", header = TRUE, sep="\t")

head(resannot_Bov)
gene_id gene_name
1 ENSBTAG00000000005 GRK3
2 ENSBTAG00000000008 KCNJ1
3 ENSBTAG00000000009 FOXF1
4 ENSBTAG00000000010 UBL7
5 ENSBTAG00000000011 TDH
6 ENSBTAG00000000012 TTC33

de_symbol_bov <- deseqresult2df(resBta, FDR = 0.05)$SYMBOL
bg_bov_sym <- rowData(dds2)$HGNC[rowSums(counts(dds2)) > 0]

resBta$ensemblid<-(row.names(resBta))
bg_ensembl_bov <- rowData(dds2)$ENSEMBLID[rowSums(counts(dds2)) > 0]
de_ensembl_bov <- deseqresult2df(resBta, FDR = 0.05)$ensemblid
library("topGO")
topgoBP_bov <- pcaExplorer::topGOtable(de_symbol_bov, bg_bov_sym,
ontology = "BP",
mapping = "org.Bt.eg.db",
geneID = "symbol",
topTablerows = 500)

Ok, now that you elaborate a little more on this (and since there's no ensembl->symbol conversion) it can be that the gs_genes column in the res_enrich...

topgoBPensembl_bov <- pcaExplorer::topGOtable(de_ensembl_bov, bg_ensembl_bov,
ontology = "BP",
mapping = "org.Bt.eg.db",
geneID = "ENSEMBL",
topTablerows = 500)
res_enrich_bov<-shake_topGOtableResult(topgoBP_bov)
res_enrich_bov_ensembl<-shake_topGOtableResult(topgoBPensembl_bov)

... here ☝️
does not have the gene symbols "as expected".

I am not so familiar with species other than human, mouse & fly, as these are the almost entirety of the catalogue our cooperation partners work with.

If you want to share with me the gtl object, I can have a look (will of course be handled confidentially).

Federico

res_enrich_bov <- get_aggrscores(res_enrich = res_enrich_bov, res_de = resBta,
annotation_obj = resannot_Bov,
aggrfun = mean)
res_enrich_bov_ensembl<-get_aggrscores(res_enrich = res_enrich_bov_ensembl, res_de = resBta,annotation_obj = resannot_Bov,aggrfun = mean)
gtl_bov <- GeneTonic_list(
dds = dds2,
res_de = resBta,
res_enrich = res_enrich_bov,
annotation_obj = resannot_Bov
)

gtl_bov_ensembl<-GeneTonic_list(dds=dds2, res_de = resBta, res_enrich = res_enrich_bov_ensembl, annotation_obj = resannot_Bov)
p <- enhance_table(res_enrich_bov,
resBta,
n_gs = 30,
gtl<-gtl_bov,
annotation_obj = resannot_Bov,
chars_limit = 60)
p1<-enhance_table(res_enrich_bov_ensembl, resBta, n_gs=30, gtl = gtl_bov_ensembl, annotation_obj = resannot_Bov, chars_limit = 60)
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
les noms de lignes contiennent des valeurs manquantes

@cagenet
Copy link
Author

cagenet commented Jun 8, 2021

Hi ;-)
First issue: I guess that it was something like that BUT I still obtained the same error ...
p<-enhance_table(gtl=gtl_bov_ensembl, n_gs = 30, chars_limit = 60)
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
les noms de lignes contiennent des valeurs manquantes

Second issue : Thank you so much for your help ... Of course I can send you my gtl object ...
Will try to import and give you access to gtf object in my repository (GitHub newbie and never used it !)
Thank you

@federicomarini
Copy link
Owner

Sure, no prob.
You can reach me at the email address in the maintainer field of the package (trying to make bots' work to scrape my email a little harder, sorry for the extra step 😛 )

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants