Hi everyone, I need code to read the headers in a file with the .gbk extension. It's for a project that works with genetic sequencing data. I couldn't find anything that does this, not even on the Biojava.org site, if …

Post the layout of the file you want to read.

Take a look at this and see if it helps. I found it here on GUJ, I just don't remember which post it was.

Biojava — GUJ

Hi,

Below is one of the headers from the file. The file contains many of them.
To start, I need to find the "LOCUS" keyword (in red) and its identifier (in green), then the "gene" line (in brown), which carries an identifier with the gene name in the form "/gene=genename". After that I need to find the line containing the mRNA (in blue) and capture the values between the parentheses. Each mRNA has a line identifying which gene it belongs to, in the same "/gene=genename" form. I think this could be done by hand without much trouble, but the BIOJAVA.ORG project has several APIs and I couldn't find one that would let me do this.

//

[color=red]LOCUS[/color]      [color=green] NW_927395[/color]             611322 bp    DNA     linear   CON 10-JUN-2009

DEFINITION  Homo sapiens chromosome 22 genomic contig, alternate assembly

(based on Celera), whole genome shotgun sequence.

ACCESSION   NW_927395

VERSION     NW_927395.1  GI:89059083

DBLINK      Project:16116

KEYWORDS    WGS.

SOURCE      Homo sapiens (human)

ORGANISM  Homo sapiens

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;

Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;

Catarrhini; Hominidae; Homo.

REFERENCE   1  (bases 1 to 611322)

AUTHORS   Istrail,S., Sutton,G.G., Florea,L., Halpern,A.L., Mobarry,C.M.,

Lippert,R., Walenz,B., Shatkay,H., Dew,I., Miller,J.R.,

Flanigan,M.J., Edwards,N.J., Bolanos,R., Fasulo,D.,

Halldorsson,B.V., Hannenhalli,S., Turner,R., Yooseph,S., Lu,F.,

Nusskern,D.R., Shue,B.C., Zheng,X.H., Zhong,F., Delcher,A.L.,

Huson,D.H., Kravitz,S.A., Mouchard,L., Reinert,K., Remington,K.A.,

Clark,A.G., Waterman,M.S., Eichler,E.E., Adams,M.D.,

Hunkapiller,M.W., Myers,E.W. and Venter,J.C.

TITLE     Whole-genome shotgun assembly and comparison of human genome

assemblies

JOURNAL   Proc. Natl. Acad. Sci. U.S.A. 101 (7), 1916-1921 (2004)

PUBMED   14769938

REFERENCE   2  (bases 1 to 611322)

AUTHORS   Venter,J.C., Adams,M.D., Myers,E.W., Li,P.W., Mural,R.J.,

Sutton,G.G., Smith,H.O., Yandell,M., Evans,C.A., Holt,R.A.,

Gocayne,J.D., Amanatides,P., Ballew,R.M., Huson,D.H., Wortman,J.R.,

Zhang,Q., Kodira,C.D., Zheng,X.H., Chen,L., Skupski,M.,

Subramanian,G., Thomas,P.D., Zhang,J., Gabor Miklos,G.L.,

Nelson,C., Broder,S., Clark,A.G., Nadeau,J., McKusick,V.A.,

Zinder,N., Levine,A.J., Roberts,R.J., Simon,M., Slayman,C.,

Hunkapiller,M., Bolanos,R., Delcher,A., Dew,I., Fasulo,D.,

Flanigan,M., Florea,L., Halpern,A., Hannenhalli,S., Kravitz,S.,

Levy,S., Mobarry,C., Reinert,K., Remington,K., Abu-Threideh,J.,

Beasley,E., Biddick,K., Bonazzi,V., Brandon,R., Cargill,M.,

Chandramouliswaran,I., Charlab,R., Chaturvedi,K., Deng,Z., Di

Francesco,V., Dunn,P., Eilbeck,K., Evangelista,C., Gabrielian,A.E.,

Gan,W., Ge,W., Gong,F., Gu,Z., Guan,P., Heiman,T.J., Higgins,M.E.,

Ji,R.R., Ke,Z., Ketchum,K.A., Lai,Z., Lei,Y., Li,Z., Li,J.,

Liang,Y., Lin,X., Lu,F., Merkulov,G.V., Milshina,N., Moore,H.M.,

Naik,A.K., Narayan,V.A., Neelam,B., Nusskern,D., Rusch,D.B.,

Salzberg,S., Shao,W., Shue,B., Sun,J., Wang,Z., Wang,A., Wang,X.,

Wang,J., Wei,M., Wides,R., Xiao,C., Yan,C., Yao,A., Ye,J., Zhan,M.,

Zhang,W., Zhang,H., Zhao,Q., Zheng,L., Zhong,F., Zhong,W., Zhu,S.,

Zhao,S., Gilbert,D., Baumhueter,S., Spier,G., Carter,C.,

Cravchik,A., Woodage,T., Ali,F., An,H., Awe,A., Baldwin,D.,

Baden,H., Barnstead,M., Barrow,I., Beeson,K., Busam,D., Carver,A.,

Center,A., Cheng,M.L., Curry,L., Danaher,S., Davenport,L.,

Desilets,R., Dietz,S., Dodson,K., Doup,L., Ferriera,S., Garg,N.,

Gluecksmann,A., Hart,B., Haynes,J., Haynes,C., Heiner,C.,

Hladun,S., Hostin,D., Houck,J., Howland,T., Ibegwam,C., Johnson,J.,

Kalush,F., Kline,L., Koduru,S., Love,A., Mann,F., May,D.,

McCawley,S., McIntosh,T., McMullen,I., Moy,M., Moy,L., Murphy,B.,

Nelson,K., Pfannkoch,C., Pratts,E., Puri,V., Qureshi,H.,

Reardon,M., Rodriguez,R., Rogers,Y.H., Romblad,D., Ruhfel,B.,

Scott,R., Sitter,C., Smallwood,M., Stewart,E., Strong,R., Suh,E.,

Thomas,R., Tint,N.N., Tse,S., Vech,C., Wang,G., Wetter,J.,

Williams,S., Williams,M., Windsor,S., Winn-Deen,E., Wolfe,K.,

Zaveri,J., Zaveri,K., Abril,J.F., Guigo,R., Campbell,M.J.,

Sjolander,K.V., Karlak,B., Kejariwal,A., Mi,H., Lazareva,B.,

Hatton,T., Narechania,A., Diemer,K., Muruganujan,A., Guo,N.,

Sato,S., Bafna,V., Istrail,S., Lippert,R., Schwartz,R., Walenz,B.,

Yooseph,S., Allen,D., Basu,A., Baxendale,J., Blick,L., Caminha,M.,

Carnes-Stine,J., Caulk,P., Chiang,Y.H., Coyne,M., Dahlke,C.,

Mays,A., Dombroski,M., Donnelly,M., Ely,D., Esparham,S., Fosler,C.,

Gire,H., Glanowski,S., Glasser,K., Glodek,A., Gorokhov,M.,

Graham,K., Gropman,B., Harris,M., Heil,J., Henderson,S., Hoover,J.,

Jennings,D., Jordan,C., Jordan,J., Kasha,J., Kagan,L., Kraft,C.,

Levitsky,A., Lewis,M., Liu,X., Lopez,J., Ma,D., Majoros,W.,

McDaniel,J., Murphy,S., Newman,M., Nguyen,T., Nguyen,N., Nodell,M.,

Pan,S., Peck,J., Peterson,M., Rowe,W., Sanders,R., Scott,J.,

Simpson,M., Smith,T., Sprague,A., Stockwell,T., Turner,R.,

Venter,E., Wang,M., Wen,M., Wu,D., Wu,M., Xia,A., Zandieh,A. and

Zhu,X.

TITLE     The sequence of the human genome

JOURNAL   Science 291 (5507), 1304-1351 (2001)

PUBMED   11181995

REMARK    Erratum:[Science 2001 Jun 5;292(5523):1838]

COMMENT     GENOME ANNOTATION REFSEQ:   Features on this sequence have been

produced for build 37 version 1 of the NCBI’s genome annotation

[see documentation].

The DNA sequence was produced by Celera Genomics. It is included in

the NCBI RefSeq collection as an alternative assembly to the one

produced by the Genome Reference Consortium. The original whole

genome shotgun project has the project accession AADB00000000.2.

FEATURES             Location/Qualifiers

source          1..611322

/organism="Homo sapiens"

/mol_type="genomic DNA"

/db_xref="taxon:9606"

/chromosome="22"

gap             10478..12022

/estimated_length=1545

gap             21898..21917

/estimated_length=20

gap             28776..28795

/estimated_length=20

gap             102320..102700

/estimated_length=381

gap             117693..117936

/estimated_length=244

gap             131318..134090

/estimated_length=2773

gap             147144..147163

/estimated_length=20

gap             149879..150344

/estimated_length=466

gap             164002..164360

/estimated_length=359

gap             167157..167176

/estimated_length=20

[color=brown] gene[/color]            193425..207387

/gene="LOC648218"

/note="The sequence of the transcript was modified to

remove a frameshift represented in this assembly; Derived

by automated computational analysis using gene prediction

method: GNOMON. Supporting evidence includes similarity

to: 1 mRNA, 1 Protein"

/pseudo

/db_xref="GeneID:648218"

misc_RNA        join(193425..193889,197932..198046,198209..198383,

200554..200660,201621..201783,204342..204406,

207068..207387)

/gene="LOC648218"

/product="similar to hCG1793014"

/exception="unclassified transcription discrepancy"

/note="Derived by automated computational analysis using

gene prediction method: GNOMON. Supporting evidence

includes similarity to: 1 mRNA, 1 Protein"

/pseudo

/transcript_id="XR_038470.2"

/db_xref="GI:239752108"

/db_xref="GeneID:648218"

gene            207127..216058

/gene="LOC100289959"

/note="Derived by automated computational analysis using

gene prediction method: GNOMON. Supporting evidence

includes similarity to: 1 Protein"

/db_xref="GeneID:100289959"

[color=blue]mRNA[/color]            join(207127..207157,215733..216058)

/gene="LOC100289959"

/product="similar to hCG1644292"

/note="Derived by automated computational analysis using

gene prediction method: GNOMON. Supporting evidence

includes similarity to: 1 Protein"

/transcript_id="XM_002348072.1"

/db_xref="GI:239752109"

/db_xref="GeneID:100289959"

CDS             join(207127..207157,215733..216058)

/gene="LOC100289959"

/note="Derived by automated computational analysis using

gene prediction method: GNOMON."

/codon_start=1

/product="hypothetical protein XP_002348113"

/protein_id="XP_002348113.1"

/db_xref="GI:239752110"

/db_xref="GeneID:100289959"

gene            complement(256329..257494)

/gene="LOC100289992"

/note="The sequence of the transcript was modified to

remove frameshifts and prevent a premature stop codon

represented in this assembly; Derived by automated

computational analysis using gene prediction method:

GNOMON. Supporting evidence includes similarity to: 1

Protein"

/pseudo

/db_xref="GeneID:100289992"

exon            complement(256329..256532)

/gene="LOC100289992"

/exception="unclassified transcription discrepancy"

/note="Derived by automated computational analysis using

gene prediction method: GNOMON. Supporting evidence

includes similarity to: 1 Protein"

/number=1

/pseudo

/db_xref="GeneID:100289992"

exon            complement(256611..257494)

/gene="LOC100289992"

/exception="unclassified transcription discrepancy"

/note="Derived by automated computational analysis using

gene prediction method: GNOMON. Supporting evidence

includes similarity to: 1 Protein"

/number=2

/pseudo

/db_xref="GeneID:100289992"

gene            complement(257500..274289)

/gene="LOC100133475"

/note="Derived by automated computational analysis using

gene prediction method: GNOMON. Supporting evidence

includes similarity to: 1 Protein"

/db_xref="GeneID:100133475"

mRNA            complement(join(257500..258155,258408..258906,

273114..273123,273653..273731,274247..274289))

/gene="LOC100133475"

/product="similar to Putative zinc finger protein

ENSP00000328166"
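
For anyone who wants to try the "by hand" route described in the post, the steps (find LOCUS and its identifier, find each gene and mRNA feature, read its /gene name and the values inside the parentheses) could be sketched in plain Java roughly as below. This is only a sketch under assumptions, not a BioJava API: the names GbkScanner, Feature, scan and ranges are made up for illustration, and it assumes the file on disk keeps the standard GenBank layout that a forum paste loses, i.e. feature keys indented by five spaces, qualifiers and continuation lines indented further, ranges written with two dots (207127..207157) and straight double quotes around qualifier values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of BioJava): scans GenBank flat-file lines by hand. */
final class GbkScanner {

    /** One gene or mRNA feature, tagged with the LOCUS it belongs to. */
    static final class Feature {
        final String locus;    // identifier taken from the LOCUS line
        final String key;      // "gene" or "mRNA"
        final String gene;     // value of /gene="...", or null if absent
        final String location; // raw location with continuation lines joined

        Feature(String locus, String key, String gene, String location) {
            this.locus = locus;
            this.key = key;
            this.gene = gene;
            this.location = location;
        }

        /** Returns every "start..end" pair found inside the location. */
        List<String> ranges() {
            List<String> out = new ArrayList<>();
            Matcher m = RANGE.matcher(location);
            while (m.find()) out.add(m.group());
            return out;
        }
    }

    // Assumption: feature keys are indented by exactly five spaces.
    private static final Pattern FEATURE = Pattern.compile("^ {5}(\\S+) +(\\S.*)$");
    private static final Pattern GENE = Pattern.compile("^/gene=\"([^\"]*)\"");
    private static final Pattern RANGE = Pattern.compile("\\d+\\.\\.\\d+");

    static List<Feature> scan(Iterable<String> lines) {
        List<Feature> found = new ArrayList<>();
        String locus = null, key = null, gene = null;
        StringBuilder location = new StringBuilder();
        boolean inFeatures = false, inLocation = false;

        for (String line : lines) {
            if (line.trim().isEmpty()) continue;
            Matcher f = FEATURE.matcher(line);
            if (!line.startsWith(" ")) {
                // Top-level keyword (LOCUS, FEATURES, ORIGIN, "//"): close the open feature.
                flush(found, locus, key, gene, location);
                key = null;
                inFeatures = line.startsWith("FEATURES");
                if (line.startsWith("LOCUS")) {
                    String[] tokens = line.trim().split("\\s+");
                    locus = tokens.length > 1 ? tokens[1] : null;
                }
            } else if (inFeatures && f.matches()) {
                // A new feature key starts: store the previous one and reset the state.
                flush(found, locus, key, gene, location);
                key = f.group(1);
                gene = null;
                location.setLength(0);
                location.append(f.group(2).trim());
                inLocation = true;
            } else if (inFeatures && key != null) {
                String text = line.trim();
                if (text.startsWith("/")) {
                    inLocation = false; // the first qualifier ends the location
                    Matcher g = GENE.matcher(text);
                    if (g.find()) gene = g.group(1);
                } else if (inLocation) {
                    location.append(text); // location wrapped onto another line
                }
                // Other continuation lines (wrapped /note text) are ignored.
            }
        }
        flush(found, locus, key, gene, location);
        return found;
    }

    /** Keeps the pending feature only if it is a gene or an mRNA. */
    private static void flush(List<Feature> found, String locus, String key,
                              String gene, StringBuilder location) {
        if ("gene".equals(key) || "mRNA".equals(key)) {
            found.add(new Feature(locus, key, gene, location.toString()));
        }
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "LOCUS       NW_927395             611322 bp    DNA     linear   CON 10-JUN-2009",
            "FEATURES             Location/Qualifiers",
            "     gene            complement(257500..274289)",
            "                     /gene=\"LOC100133475\"",
            "     mRNA            complement(join(257500..258155,258408..258906,",
            "                     273114..273123,273653..273731,274247..274289))",
            "                     /gene=\"LOC100133475\"",
            "//");
        for (Feature f : scan(sample)) {
            System.out.println(f.locus + " " + f.key + " " + f.gene + " " + f.ranges());
        }
    }
}
```

The sketch relies on indentation rather than on trimmed text because wrapped /note lines in this record begin with words such as "gene prediction method", which a trimmed comparison would mistake for a gene feature. For a real file, the lines could come from something like Files.readAllLines on the .gbk path.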
