Biojava

4 replies
asandro1501

Hi everyone,

I need code to read the headers in a file with the .gbk extension. It's a project for working with genetic sequencing data. I couldn't find anything that does this, not even on the Biojava.org site. If anyone can help, I'd appreciate it.

Alex


L

Post the layout of the file you want to read.

asandro1501

Hi,

Below is one of the headers; the file contains many of them.
To start, I need to find the “LOCUS” (in red) and its identifier (in green), and the “gene” line (in brown), which carries an identifier with the gene's name in the form “/gene=genename”. After that, I need to find the line containing the mRNA (in blue) and capture the values inside the parentheses. Each mRNA has a line identifying which gene it belongs to, in the same “/gene=genename” form. I think this could be done by hand without much trouble, but the BIOJAVA.ORG project has several APIs and I couldn't find one that would let me do this.
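Capturing the values between the parentheses of an mRNA line amounts to parsing a GenBank location string such as join(207127..207157,215733..216058) or complement(join(...)). The following is a minimal plain-Java sketch (Java 16+ for the record type), not BioJava code; the class name GenbankLocation and the Span record are hypothetical. It assumes, as in the mRNA lines of this record, that complement(...) wraps the whole location; per-part complement(), single-base positions and references to other entries are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of BioJava: turns a GenBank location string into intervals.
public class GenbankLocation {

    /** One interval of a location; start and end are 1-based and inclusive, as in GenBank. */
    public record Span(int start, int end, boolean complement) {}

    // "start..end", allowing the partial-end markers "<" and ">" that GenBank uses
    private static final Pattern RANGE = Pattern.compile("<?(\\d+)\\.\\.>?(\\d+)");

    /**
     * Extracts the intervals from a location such as "join(207127..207157,215733..216058)".
     * Assumption: complement(...), when present, wraps the whole location; per-part
     * complement(), single-base positions and remote references are not handled.
     */
    public static List<Span> parse(String location) {
        String loc = location.replaceAll("\\s+", ""); // locations may wrap over several lines
        boolean complement = loc.startsWith("complement(");
        List<Span> spans = new ArrayList<>();
        Matcher m = RANGE.matcher(loc);
        while (m.find()) {
            spans.add(new Span(Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)), complement));
        }
        return spans;
    }

    public static void main(String[] args) {
        // mRNA location of LOC100133475 from the sample record, wrapped as in a .gbk file
        String loc = "complement(join(257500..258155,258408..258906,\n"
                + "                     273114..273123,273653..273731,274247..274289))";
        for (Span s : parse(loc)) {
            System.out.println(s.start() + "-" + s.end() + (s.complement() ? " (minus strand)" : ""));
        }
    }
}
```

A regular expression over the whole string is used instead of a full grammar because the lines asked about only nest join() inside complement(); anything more complex would call for a real location parser.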

//

[color=red]LOCUS[/color]      [color=green] NW_927395[/color]             611322 bp    DNA     linear   CON 10-JUN-2009

DEFINITION  Homo sapiens chromosome 22 genomic contig, alternate assembly

(based on Celera), whole genome shotgun sequence.

ACCESSION   NW_927395

VERSION     NW_927395.1  GI:89059083

DBLINK      Project:16116

KEYWORDS    WGS.

SOURCE      Homo sapiens (human)

ORGANISM  Homo sapiens

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;

Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;

Catarrhini; Hominidae; Homo.

REFERENCE   1  (bases 1 to 611322)

AUTHORS   Istrail,S., Sutton,G.G., Florea,L., Halpern,A.L., Mobarry,C.M.,

Lippert,R., Walenz,B., Shatkay,H., Dew,I., Miller,J.R.,

Flanigan,M.J., Edwards,N.J., Bolanos,R., Fasulo,D.,

Halldorsson,B.V., Hannenhalli,S., Turner,R., Yooseph,S., Lu,F.,

Nusskern,D.R., Shue,B.C., Zheng,X.H., Zhong,F., Delcher,A.L.,

Huson,D.H., Kravitz,S.A., Mouchard,L., Reinert,K., Remington,K.A.,

Clark,A.G., Waterman,M.S., Eichler,E.E., Adams,M.D.,

Hunkapiller,M.W., Myers,E.W. and Venter,J.C.

TITLE     Whole-genome shotgun assembly and comparison of human genome

assemblies

JOURNAL   Proc. Natl. Acad. Sci. U.S.A. 101 (7), 1916-1921 (2004)

PUBMED   14769938

REFERENCE   2  (bases 1 to 611322)

AUTHORS   Venter,J.C., Adams,M.D., Myers,E.W., Li,P.W., Mural,R.J.,

Sutton,G.G., Smith,H.O., Yandell,M., Evans,C.A., Holt,R.A.,

Gocayne,J.D., Amanatides,P., Ballew,R.M., Huson,D.H., Wortman,J.R.,

Zhang,Q., Kodira,C.D., Zheng,X.H., Chen,L., Skupski,M.,

Subramanian,G., Thomas,P.D., Zhang,J., Gabor Miklos,G.L.,

Nelson,C., Broder,S., Clark,A.G., Nadeau,J., McKusick,V.A.,

Zinder,N., Levine,A.J., Roberts,R.J., Simon,M., Slayman,C.,

Hunkapiller,M., Bolanos,R., Delcher,A., Dew,I., Fasulo,D.,

Flanigan,M., Florea,L., Halpern,A., Hannenhalli,S., Kravitz,S.,

Levy,S., Mobarry,C., Reinert,K., Remington,K., Abu-Threideh,J.,

Beasley,E., Biddick,K., Bonazzi,V., Brandon,R., Cargill,M.,

Chandramouliswaran,I., Charlab,R., Chaturvedi,K., Deng,Z., Di

Francesco,V., Dunn,P., Eilbeck,K., Evangelista,C., Gabrielian,A.E.,

Gan,W., Ge,W., Gong,F., Gu,Z., Guan,P., Heiman,T.J., Higgins,M.E.,

Ji,R.R., Ke,Z., Ketchum,K.A., Lai,Z., Lei,Y., Li,Z., Li,J.,

Liang,Y., Lin,X., Lu,F., Merkulov,G.V., Milshina,N., Moore,H.M.,

Naik,A.K., Narayan,V.A., Neelam,B., Nusskern,D., Rusch,D.B.,

Salzberg,S., Shao,W., Shue,B., Sun,J., Wang,Z., Wang,A., Wang,X.,

Wang,J., Wei,M., Wides,R., Xiao,C., Yan,C., Yao,A., Ye,J., Zhan,M.,

Zhang,W., Zhang,H., Zhao,Q., Zheng,L., Zhong,F., Zhong,W., Zhu,S.,

Zhao,S., Gilbert,D., Baumhueter,S., Spier,G., Carter,C.,

Cravchik,A., Woodage,T., Ali,F., An,H., Awe,A., Baldwin,D.,

Baden,H., Barnstead,M., Barrow,I., Beeson,K., Busam,D., Carver,A.,

Center,A., Cheng,M.L., Curry,L., Danaher,S., Davenport,L.,

Desilets,R., Dietz,S., Dodson,K., Doup,L., Ferriera,S., Garg,N.,

Gluecksmann,A., Hart,B., Haynes,J., Haynes,C., Heiner,C.,

Hladun,S., Hostin,D., Houck,J., Howland,T., Ibegwam,C., Johnson,J.,

Kalush,F., Kline,L., Koduru,S., Love,A., Mann,F., May,D.,

McCawley,S., McIntosh,T., McMullen,I., Moy,M., Moy,L., Murphy,B.,

Nelson,K., Pfannkoch,C., Pratts,E., Puri,V., Qureshi,H.,

Reardon,M., Rodriguez,R., Rogers,Y.H., Romblad,D., Ruhfel,B.,

Scott,R., Sitter,C., Smallwood,M., Stewart,E., Strong,R., Suh,E.,

Thomas,R., Tint,N.N., Tse,S., Vech,C., Wang,G., Wetter,J.,

Williams,S., Williams,M., Windsor,S., Winn-Deen,E., Wolfe,K.,

Zaveri,J., Zaveri,K., Abril,J.F., Guigo,R., Campbell,M.J.,

Sjolander,K.V., Karlak,B., Kejariwal,A., Mi,H., Lazareva,B.,

Hatton,T., Narechania,A., Diemer,K., Muruganujan,A., Guo,N.,

Sato,S., Bafna,V., Istrail,S., Lippert,R., Schwartz,R., Walenz,B.,

Yooseph,S., Allen,D., Basu,A., Baxendale,J., Blick,L., Caminha,M.,

Carnes-Stine,J., Caulk,P., Chiang,Y.H., Coyne,M., Dahlke,C.,

Mays,A., Dombroski,M., Donnelly,M., Ely,D., Esparham,S., Fosler,C.,

Gire,H., Glanowski,S., Glasser,K., Glodek,A., Gorokhov,M.,

Graham,K., Gropman,B., Harris,M., Heil,J., Henderson,S., Hoover,J.,

Jennings,D., Jordan,C., Jordan,J., Kasha,J., Kagan,L., Kraft,C.,

Levitsky,A., Lewis,M., Liu,X., Lopez,J., Ma,D., Majoros,W.,

McDaniel,J., Murphy,S., Newman,M., Nguyen,T., Nguyen,N., Nodell,M.,

Pan,S., Peck,J., Peterson,M., Rowe,W., Sanders,R., Scott,J.,

Simpson,M., Smith,T., Sprague,A., Stockwell,T., Turner,R.,

Venter,E., Wang,M., Wen,M., Wu,D., Wu,M., Xia,A., Zandieh,A. and

Zhu,X.

TITLE     The sequence of the human genome

JOURNAL   Science 291 (5507), 1304-1351 (2001)

PUBMED   11181995

REMARK    Erratum:[Science 2001 Jun 5;292(5523):1838]

COMMENT     GENOME ANNOTATION REFSEQ:   Features on this sequence have been

produced for build 37 version 1 of the NCBI's genome annotation

[see documentation].

The DNA sequence was produced by Celera Genomics. It is included in

the NCBI RefSeq collection as an alternative assembly to the one

produced by the Genome Reference Consortium. The original whole

genome shotgun project has the project accession AADB00000000.2.

FEATURES             Location/Qualifiers

source          1..611322

/organism=Homo sapiens

/mol_type=genomic DNA

/db_xref=taxon:9606

/chromosome=22

gap             10478..12022

/estimated_length=1545

gap             21898..21917

/estimated_length=20

gap             28776..28795

/estimated_length=20

gap             102320..102700

/estimated_length=381

gap             117693..117936

/estimated_length=244

gap             131318..134090

/estimated_length=2773

gap             147144..147163

/estimated_length=20

gap             149879..150344

/estimated_length=466

gap             164002..164360

/estimated_length=359

gap             167157..167176

/estimated_length=20

[color=brown] gene[/color]            193425..207387

/gene=LOC648218

/note=The sequence of the transcript was modified to

remove a frameshift represented in this assembly; Derived

by automated computational analysis using gene prediction

method: GNOMON. Supporting evidence includes similarity

to: 1 mRNA, 1 Protein

/pseudo

/db_xref=GeneID:648218

misc_RNA        join(193425..193889,197932..198046,198209..198383,

200554..200660,201621..201783,204342..204406,

207068..207387)

/gene=LOC648218

/product=similar to hCG1793014

/exception=unclassified transcription discrepancy

/note=Derived by automated computational analysis using

gene prediction method: GNOMON. Supporting evidence

includes similarity to: 1 mRNA, 1 Protein

/pseudo

/transcript_id=XR_038470.2

/db_xref=GI:239752108

/db_xref=GeneID:648218

gene            207127..216058

/gene=LOC100289959

/note=Derived by automated computational analysis using

gene prediction method: GNOMON. Supporting evidence

includes similarity to: 1 Protein

/db_xref=GeneID:100289959

[color=blue]mRNA[/color]            join(207127..207157,215733..216058)

/gene=LOC100289959

/product=similar to hCG1644292

/note=Derived by automated computational analysis using

gene prediction method: GNOMON. Supporting evidence

includes similarity to: 1 Protein

/transcript_id=XM_002348072.1

/db_xref=GI:239752109

/db_xref=GeneID:100289959

CDS             join(207127..207157,215733..216058)

/gene=LOC100289959

/note=Derived by automated computational analysis using

gene prediction method: GNOMON.

/codon_start=1

/product=hypothetical protein XP_002348113

/protein_id=XP_002348113.1

/db_xref=GI:239752110

/db_xref=GeneID:100289959

gene            complement(256329..257494)

/gene=LOC100289992

/note=The sequence of the transcript was modified to

remove frameshifts and prevent a premature stop codon

represented in this assembly; Derived by automated

computational analysis using gene prediction method:

GNOMON. Supporting evidence includes similarity to: 1

Protein

/pseudo

/db_xref=GeneID:100289992

exon            complement(256329..256532)

/gene=LOC100289992

/exception=unclassified transcription discrepancy

/note=Derived by automated computational analysis using

gene prediction method: GNOMON. Supporting evidence

includes similarity to: 1 Protein

/number=1

/pseudo

/db_xref=GeneID:100289992

exon            complement(256611..257494)

/gene=LOC100289992

/exception=unclassified transcription discrepancy

/note=Derived by automated computational analysis using

gene prediction method: GNOMON. Supporting evidence

includes similarity to: 1 Protein

/number=2

/pseudo

/db_xref=GeneID:100289992

gene            complement(257500..274289)

/gene=LOC100133475

/note=Derived by automated computational analysis using

gene prediction method: GNOMON. Supporting evidence

includes similarity to: 1 Protein

/db_xref=GeneID:100133475

mRNA            complement(join(257500..258155,258408..258906,

273114..273123,273653..273731,274247..274289))

/gene=LOC100133475

/product=similar to Putative zinc finger protein

ENSP00000328166
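The whole task in this message (LOCUS identifier, /gene names, and mRNA locations grouped by gene) can also be done by hand, as the poster suggests, with a line scanner. This is a minimal sketch in plain Java 11+, not a BioJava API; GbkScanner and scan are hypothetical names. It assumes the real file keeps the standard GenBank flat-file layout (feature keys starting in column 6, qualifiers and wrapped locations indented to column 22), which the sample pasted into this thread lost, and it strips the double quotes that GenBank normally puts around qualifier values.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical scanner, not part of BioJava; relies on the standard GenBank column layout.
public class GbkScanner {

    /** Returns LOCUS id -> (gene name -> raw mRNA location strings), in file order. */
    public static Map<String, Map<String, List<String>>> scan(BufferedReader in) throws IOException {
        Map<String, Map<String, List<String>>> result = new LinkedHashMap<>();
        Map<String, List<String>> genes = null;       // genes of the current LOCUS
        String key = null;                            // current feature key ("gene", "mRNA", ...)
        StringBuilder location = new StringBuilder(); // location of the current feature
        boolean inLocation = false;                   // true until the feature's first qualifier
        String line;
        while ((line = in.readLine()) != null) {
            if (line.startsWith("LOCUS")) {
                // "LOCUS       NW_927395   611322 bp ..." -> the second token is the identifier
                genes = new LinkedHashMap<>();
                result.put(line.trim().split("\\s+")[1], genes);
                key = null;
            } else if (genes == null || line.isBlank()) {
                continue; // skip blank lines and anything before the first LOCUS
            } else if (!line.startsWith(" ")) {
                key = null; // a new top-level section: FEATURES, ORIGIN, "//", ...
            } else if (line.length() > 5 && line.startsWith("     ") && line.charAt(5) != ' ') {
                // Feature line: key from column 6, location from column 22
                String[] parts = line.trim().split("\\s+", 2);
                key = parts[0];
                location.setLength(0);
                location.append(parts.length > 1 ? parts[1] : "");
                inLocation = true;
            } else if (key != null) {
                String text = line.trim();
                if (text.startsWith("/")) {
                    inLocation = false;
                    if (text.startsWith("/gene=")) {
                        String gene = text.substring("/gene=".length()).replace("\"", "");
                        List<String> mrnas = genes.computeIfAbsent(gene, g -> new ArrayList<>());
                        if (key.equals("mRNA")) {
                            mrnas.add(location.toString());
                        }
                    }
                } else if (inLocation) {
                    location.append(text); // wrapped location such as "join(...,\n ...)"
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        try (BufferedReader in = Files.newBufferedReader(Path.of(args[0]))) {
            scan(in).forEach((locus, genes) -> genes.forEach((gene, mrnas) ->
                    System.out.println(locus + "\t" + gene + "\t" + mrnas)));
        }
    }
}
```

Run it with something like java GbkScanner.java contigs.gbk (contigs.gbk is a placeholder file name). Each output line holds the LOCUS identifier, a gene name and the raw mRNA locations for that gene; the list is empty for genes that, like LOC648218 or LOC100289992 here, only have misc_RNA or exon features.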
L

Take a look at this and see if it does what you need. I got it from GUJ itself, I just don't remember which post.

asandro1501

Unfortunately it doesn't work for me. Have you worked with BioJava?

In any case, thanks for your help.

Alex

Created September 14, 2010
Last reply September 15, 2010
Participants 2