BioJava

Hi everyone,

I need code to read the headers in a file with the .gbk extension. It is for a project that works with genetic sequencing, and I haven't found anything that does this, not even on the BioJava.org site. If anyone can help, I'd appreciate it.

Alex

Please post the layout of the file you want to read.

Hi,

Below is one of the headers from the file. The file contains many of them.
To start, I need to find "LOCUS" (in red) and its identifier (in green), then the "gene" line (in brown), which carries an identifier with the gene's name in the form /gene="genename". After that I need to find the line that contains the mRNA (in blue) and capture the values inside the parentheses; each mRNA has a line identifying which gene it belongs to, in the same /gene="genename" form. I believe this could be done by hand without much trouble, but the BioJava.org project has several APIs, and I couldn't find one that lets me do this.

//
[color=red]LOCUS[/color] [color=green] NW_927395[/color] 611322 bp DNA linear CON 10-JUN-2009
DEFINITION Homo sapiens chromosome 22 genomic contig, alternate assembly
(based on Celera), whole genome shotgun sequence.
ACCESSION NW_927395
VERSION NW_927395.1 GI:89059083
DBLINK Project:16116
KEYWORDS WGS.
SOURCE Homo sapiens (human)
ORGANISM Homo sapiens
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
Catarrhini; Hominidae; Homo.
REFERENCE 1 (bases 1 to 611322)
AUTHORS Istrail,S., Sutton,G.G., Florea,L., Halpern,A.L., Mobarry,C.M.,
Lippert,R., Walenz,B., Shatkay,H., Dew,I., Miller,J.R.,
Flanigan,M.J., Edwards,N.J., Bolanos,R., Fasulo,D.,
Halldorsson,B.V., Hannenhalli,S., Turner,R., Yooseph,S., Lu,F.,
Nusskern,D.R., Shue,B.C., Zheng,X.H., Zhong,F., Delcher,A.L.,
Huson,D.H., Kravitz,S.A., Mouchard,L., Reinert,K., Remington,K.A.,
Clark,A.G., Waterman,M.S., Eichler,E.E., Adams,M.D.,
Hunkapiller,M.W., Myers,E.W. and Venter,J.C.
TITLE Whole-genome shotgun assembly and comparison of human genome
assemblies
JOURNAL Proc. Natl. Acad. Sci. U.S.A. 101 (7), 1916-1921 (2004)
PUBMED 14769938
REFERENCE 2 (bases 1 to 611322)
AUTHORS Venter,J.C., Adams,M.D., Myers,E.W., Li,P.W., Mural,R.J.,
Sutton,G.G., Smith,H.O., Yandell,M., Evans,C.A., Holt,R.A.,
Gocayne,J.D., Amanatides,P., Ballew,R.M., Huson,D.H., Wortman,J.R.,
Zhang,Q., Kodira,C.D., Zheng,X.H., Chen,L., Skupski,M.,
Subramanian,G., Thomas,P.D., Zhang,J., Gabor Miklos,G.L.,
Nelson,C., Broder,S., Clark,A.G., Nadeau,J., McKusick,V.A.,
Zinder,N., Levine,A.J., Roberts,R.J., Simon,M., Slayman,C.,
Hunkapiller,M., Bolanos,R., Delcher,A., Dew,I., Fasulo,D.,
Flanigan,M., Florea,L., Halpern,A., Hannenhalli,S., Kravitz,S.,
Levy,S., Mobarry,C., Reinert,K., Remington,K., Abu-Threideh,J.,
Beasley,E., Biddick,K., Bonazzi,V., Brandon,R., Cargill,M.,
Chandramouliswaran,I., Charlab,R., Chaturvedi,K., Deng,Z., Di
Francesco,V., Dunn,P., Eilbeck,K., Evangelista,C., Gabrielian,A.E.,
Gan,W., Ge,W., Gong,F., Gu,Z., Guan,P., Heiman,T.J., Higgins,M.E.,
Ji,R.R., Ke,Z., Ketchum,K.A., Lai,Z., Lei,Y., Li,Z., Li,J.,
Liang,Y., Lin,X., Lu,F., Merkulov,G.V., Milshina,N., Moore,H.M.,
Naik,A.K., Narayan,V.A., Neelam,B., Nusskern,D., Rusch,D.B.,
Salzberg,S., Shao,W., Shue,B., Sun,J., Wang,Z., Wang,A., Wang,X.,
Wang,J., Wei,M., Wides,R., Xiao,C., Yan,C., Yao,A., Ye,J., Zhan,M.,
Zhang,W., Zhang,H., Zhao,Q., Zheng,L., Zhong,F., Zhong,W., Zhu,S.,
Zhao,S., Gilbert,D., Baumhueter,S., Spier,G., Carter,C.,
Cravchik,A., Woodage,T., Ali,F., An,H., Awe,A., Baldwin,D.,
Baden,H., Barnstead,M., Barrow,I., Beeson,K., Busam,D., Carver,A.,
Center,A., Cheng,M.L., Curry,L., Danaher,S., Davenport,L.,
Desilets,R., Dietz,S., Dodson,K., Doup,L., Ferriera,S., Garg,N.,
Gluecksmann,A., Hart,B., Haynes,J., Haynes,C., Heiner,C.,
Hladun,S., Hostin,D., Houck,J., Howland,T., Ibegwam,C., Johnson,J.,
Kalush,F., Kline,L., Koduru,S., Love,A., Mann,F., May,D.,
McCawley,S., McIntosh,T., McMullen,I., Moy,M., Moy,L., Murphy,B.,
Nelson,K., Pfannkoch,C., Pratts,E., Puri,V., Qureshi,H.,
Reardon,M., Rodriguez,R., Rogers,Y.H., Romblad,D., Ruhfel,B.,
Scott,R., Sitter,C., Smallwood,M., Stewart,E., Strong,R., Suh,E.,
Thomas,R., Tint,N.N., Tse,S., Vech,C., Wang,G., Wetter,J.,
Williams,S., Williams,M., Windsor,S., Winn-Deen,E., Wolfe,K.,
Zaveri,J., Zaveri,K., Abril,J.F., Guigo,R., Campbell,M.J.,
Sjolander,K.V., Karlak,B., Kejariwal,A., Mi,H., Lazareva,B.,
Hatton,T., Narechania,A., Diemer,K., Muruganujan,A., Guo,N.,
Sato,S., Bafna,V., Istrail,S., Lippert,R., Schwartz,R., Walenz,B.,
Yooseph,S., Allen,D., Basu,A., Baxendale,J., Blick,L., Caminha,M.,
Carnes-Stine,J., Caulk,P., Chiang,Y.H., Coyne,M., Dahlke,C.,
Mays,A., Dombroski,M., Donnelly,M., Ely,D., Esparham,S., Fosler,C.,
Gire,H., Glanowski,S., Glasser,K., Glodek,A., Gorokhov,M.,
Graham,K., Gropman,B., Harris,M., Heil,J., Henderson,S., Hoover,J.,
Jennings,D., Jordan,C., Jordan,J., Kasha,J., Kagan,L., Kraft,C.,
Levitsky,A., Lewis,M., Liu,X., Lopez,J., Ma,D., Majoros,W.,
McDaniel,J., Murphy,S., Newman,M., Nguyen,T., Nguyen,N., Nodell,M.,
Pan,S., Peck,J., Peterson,M., Rowe,W., Sanders,R., Scott,J.,
Simpson,M., Smith,T., Sprague,A., Stockwell,T., Turner,R.,
Venter,E., Wang,M., Wen,M., Wu,D., Wu,M., Xia,A., Zandieh,A. and
Zhu,X.
TITLE The sequence of the human genome
JOURNAL Science 291 (5507), 1304-1351 (2001)
PUBMED 11181995
REMARK Erratum:[Science 2001 Jun 5;292(5523):1838]
COMMENT GENOME ANNOTATION REFSEQ: Features on this sequence have been
produced for build 37 version 1 of the NCBI’s genome annotation
[see documentation].
The DNA sequence was produced by Celera Genomics. It is included in
the NCBI RefSeq collection as an alternative assembly to the one
produced by the Genome Reference Consortium. The original whole
genome shotgun project has the project accession AADB00000000.2.
FEATURES Location/Qualifiers
source 1..611322
/organism="Homo sapiens"
/mol_type="genomic DNA"
/db_xref="taxon:9606"
/chromosome="22"
gap 10478..12022
/estimated_length=1545
gap 21898..21917
/estimated_length=20
gap 28776..28795
/estimated_length=20
gap 102320..102700
/estimated_length=381
gap 117693..117936
/estimated_length=244
gap 131318..134090
/estimated_length=2773
gap 147144..147163
/estimated_length=20
gap 149879..150344
/estimated_length=466
gap 164002..164360
/estimated_length=359
gap 167157..167176
/estimated_length=20
[color=brown] gene[/color] 193425..207387
/gene="LOC648218"
/note="The sequence of the transcript was modified to
remove a frameshift represented in this assembly; Derived
by automated computational analysis using gene prediction
method: GNOMON. Supporting evidence includes similarity
to: 1 mRNA, 1 Protein"
/pseudo
/db_xref="GeneID:648218"
misc_RNA join(193425..193889,197932..198046,198209..198383,
200554..200660,201621..201783,204342..204406,
207068..207387)
/gene="LOC648218"
/product="similar to hCG1793014"
/exception="unclassified transcription discrepancy"
/note="Derived by automated computational analysis using
gene prediction method: GNOMON. Supporting evidence
includes similarity to: 1 mRNA, 1 Protein"
/pseudo
/transcript_id="XR_038470.2"
/db_xref="GI:239752108"
/db_xref="GeneID:648218"
gene 207127..216058
/gene="LOC100289959"
/note="Derived by automated computational analysis using
gene prediction method: GNOMON. Supporting evidence
includes similarity to: 1 Protein"
/db_xref="GeneID:100289959"
[color=blue]mRNA[/color] join(207127..207157,215733..216058)
/gene="LOC100289959"
/product="similar to hCG1644292"
/note="Derived by automated computational analysis using
gene prediction method: GNOMON. Supporting evidence
includes similarity to: 1 Protein"
/transcript_id="XM_002348072.1"
/db_xref="GI:239752109"
/db_xref="GeneID:100289959"
CDS join(207127..207157,215733..216058)
/gene="LOC100289959"
/note="Derived by automated computational analysis using
gene prediction method: GNOMON."
/codon_start=1
/product="hypothetical protein XP_002348113"
/protein_id="XP_002348113.1"
/db_xref="GI:239752110"
/db_xref="GeneID:100289959"
gene complement(256329..257494)
/gene="LOC100289992"
/note="The sequence of the transcript was modified to
remove frameshifts and prevent a premature stop codon
represented in this assembly; Derived by automated
computational analysis using gene prediction method:
GNOMON. Supporting evidence includes similarity to: 1
Protein"
/pseudo
/db_xref="GeneID:100289992"
exon complement(256329..256532)
/gene="LOC100289992"
/exception="unclassified transcription discrepancy"
/note="Derived by automated computational analysis using
gene prediction method: GNOMON. Supporting evidence
includes similarity to: 1 Protein"
/number=1
/pseudo
/db_xref="GeneID:100289992"
exon complement(256611..257494)
/gene="LOC100289992"
/exception="unclassified transcription discrepancy"
/note="Derived by automated computational analysis using
gene prediction method: GNOMON. Supporting evidence
includes similarity to: 1 Protein"
/number=2
/pseudo
/db_xref="GeneID:100289992"
gene complement(257500..274289)
/gene="LOC100133475"
/note="Derived by automated computational analysis using
gene prediction method: GNOMON. Supporting evidence
includes similarity to: 1 Protein"
/db_xref="GeneID:100133475"
mRNA complement(join(257500..258155,258408..258906,
273114..273123,273653..273731,274247..274289))
/gene="LOC100133475"
/product="similar to Putative zinc finger protein
ENSP00000328166"

Take a look at this and see if it meets your needs. I got it right here on GUJ; I just don't remember which post it was.

Unfortunately that doesn't work for me. Have you worked with BioJava?

Either way, thanks for your help.

Alex
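
For reference, the by-hand extraction described in the question (the LOCUS identifier, each gene's /gene="..." name, and each mRNA's location together with the gene it belongs to) can be sketched in plain Java without BioJava. This is only a minimal sketch under stated assumptions: the class `GbkHeaderSketch` and its nested types are hypothetical names, not part of BioJava, and the code assumes the raw .gbk file uses ASCII double quotes and `..` in locations, as in the standard GenBank flat-file format, rather than the forum-formatted copy above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not a BioJava class): scans GenBank flat-file lines by hand. */
final class GbkHeaderSketch {

    /** One mRNA feature: the gene it belongs to plus its raw location text. */
    static final class Mrna {
        final String gene;
        final String location;
        Mrna(String gene, String location) { this.gene = gene; this.location = location; }
    }

    /** One LOCUS record with the gene names and mRNA features found under it. */
    static final class Locus {
        final String id;
        final List<String> genes = new ArrayList<>();
        final List<Mrna> mrnas = new ArrayList<>();
        Locus(String id) { this.id = id; }
    }

    private static final Pattern GENE = Pattern.compile("^/gene=\"([^\"]+)\"");
    private static final Pattern RANGE = Pattern.compile("\\d+\\.\\.\\d+");

    static List<Locus> parse(List<String> lines) {
        List<Locus> loci = new ArrayList<>();
        Locus current = null;
        boolean inFeatures = false; // true between FEATURES and ORIGIN or "//"
        boolean inQuote = false;    // inside a multi-line quoted qualifier value
        String feature = null;      // "gene" or "mRNA" while waiting for its /gene
        StringBuilder location = new StringBuilder();

        for (String raw : lines) {
            String line = raw.trim();
            if (line.startsWith("LOCUS")) {
                // The identifier is the second whitespace-separated token.
                String[] parts = line.split("\\s+");
                current = new Locus(parts.length > 1 ? parts[1] : "");
                loci.add(current);
                inFeatures = false;
                inQuote = false;
                feature = null;
            } else if (line.startsWith("FEATURES")) {
                inFeatures = true;
            } else if (line.startsWith("ORIGIN") || line.equals("//")) {
                inFeatures = false;
            } else if (!inFeatures || current == null || line.isEmpty()) {
                continue; // header text, sequence data, or blank line
            } else if (inQuote) {
                // Continuation of a quoted value such as a long /note; skip it.
                if (oddQuotes(line)) inQuote = false;
            } else if (line.startsWith("/")) {
                inQuote = oddQuotes(line);
                Matcher m = GENE.matcher(line);
                if (feature != null && m.find()) {
                    if (feature.equals("gene")) {
                        current.genes.add(m.group(1));
                    } else {
                        current.mrnas.add(new Mrna(m.group(1), location.toString()));
                    }
                    feature = null; // only the first /gene of a feature is used
                }
            } else if (feature != null && depth(location) > 0) {
                location.append(line); // location wrapped onto another line
            } else {
                // A new feature line: key followed by its location.
                String[] parts = line.split("\\s+", 2);
                boolean wanted = parts[0].equals("gene") || parts[0].equals("mRNA");
                feature = (wanted && parts.length == 2) ? parts[0] : null;
                location.setLength(0);
                if (feature != null) location.append(parts[1].trim());
            }
        }
        return loci;
    }

    /** Splits a location such as join(1..5,9..12) into its start..end pieces. */
    static List<String> ranges(String location) {
        List<String> out = new ArrayList<>();
        Matcher m = RANGE.matcher(location);
        while (m.find()) out.add(m.group());
        return out;
    }

    private static boolean oddQuotes(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) if (s.charAt(i) == '"') n++;
        return n % 2 == 1;
    }

    private static int depth(CharSequence s) {
        int d = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == '(') d++; else if (s.charAt(i) == ')') d--;
        }
        return d;
    }
}
```

Calling `GbkHeaderSketch.parse(...)` with the lines of a file (for example from `Files.readAllLines`) would return one `Locus` per record, and `ranges(...)` pulls the values out of the parentheses of an mRNA location. The quote tracking is there because wrapped /note text can itself begin with the word "gene", which a naive line-prefix check would mistake for a gene feature. BioJava also ships its own GenBank readers, which may be a better fit than this hand-rolled scan if the full record is needed.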