PDF search

July 19, 2011, 11 replies

denisspitfire July 19, 2011

Folks, here is the situation: I searched the forum and found a few things, but not exactly what I want.
I just need a method that searches by the file's title or by the text inside the PDF file, and returns every match. It is a collection of PDFs on which many different searches are run, and I would like to build this in Java.
Thanks in advance.

11 Replies

leoramos Jul 19, 2011

Look into Apache Lucene.
I don't know whether it is the best option, but it is the one that came to mind.
See you!

drsmachado Jul 19, 2011

Man, with java.io.File you can get the names of the files.
With ICEpdf you can read the content.
Study these APIs and you will get there.
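
A minimal sketch of the first half of that suggestion, using only java.io.File: it lists the PDFs in one folder whose file name contains a search term. The class name PdfTitleSearch, the method findByTitle, and the command-line handling are hypothetical names chosen for illustration. Reading the text inside each PDF would still need a PDF library such as ICEpdf and is not shown here.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class PdfTitleSearch {

    // Returns the PDF files directly inside 'folder' whose name contains 'term'
    // (case-insensitive). Returns an empty list if 'folder' is not a readable directory.
    static List<File> findByTitle(File folder, String term) {
        List<File> matches = new ArrayList<File>();
        File[] entries = folder.listFiles(); // null when the folder is missing or unreadable
        if (entries == null) {
            return matches;
        }
        String wanted = term.toLowerCase(Locale.ROOT);
        for (File entry : entries) {
            String name = entry.getName().toLowerCase(Locale.ROOT);
            if (entry.isFile() && name.endsWith(".pdf") && name.contains(wanted)) {
                matches.add(entry);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Hypothetical usage: folder and term are taken from the command line.
        File folder = new File(args.length > 0 ? args[0] : ".");
        String term = args.length > 1 ? args[1] : "";
        for (File pdf : findByTitle(folder, term)) {
            System.out.println(pdf.getPath());
        }
    }
}
```

An empty term matches every PDF in the folder, so the same method can double as a plain listing.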

finotti Jul 19, 2011

To locate files by name, the best option is java.io.File, as drsmachado said.
To search the content of the files, one option is Hibernate Search, which works together with Lucene.
Here is an example

andredecotia Jul 19, 2011

I remember using Apache Lucene to do this…

denisspitfire Jul 20, 2011

Where could I find a tutorial on how to search for files by name?
For example, I need to know how many files are named teste.pdf.
I saw the Hibernate link, but I have not built anything with a database yet. :?
Could someone help me?

finotti Jul 20, 2011

http://www.guj.com.br/articles/13

nel Jul 20, 2011

Hi!

Basically it would be this:

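import java.io.File;
import java.io.FileFilter;

// Class name added only so the snippet compiles on its own.
public class ContaArquivos {
    public static void main(String[] args) {
        String fileName = "teste.pdf";

        // Accept only entries whose name ends with ".pdf"
        FileFilter filter = new FileFilter() {
            @Override
            public boolean accept(File pathname) {
                return pathname.getName().endsWith(".pdf");
            }
        };

        int totalFiles = 0;
        File file = new File(System.getProperty("user.dir"));
        // listFiles returns null if the path is not a readable directory
        File[] filesOnDir = file.listFiles(filter);
        if (filesOnDir != null) {
            for (File f : filesOnDir) {
                if (f.getName().equalsIgnoreCase(fileName)) {
                    totalFiles++;
                }
            }
        }
        System.out.println("Total number of files named " + fileName + ": " + totalFiles);
    }
}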

The System.getProperty("user.dir") call was only there to illustrate the example. I used a filter that accepts only files with the .pdf extension. This helps because, if the directory holds 1000 files, your loop does not walk a list of 1000 entries, only the PDF files.

Cheers.
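
Since a single folder can hold at most one file with a given name on most file systems, counting files called teste.pdf usually means looking through subfolders as well. The following is a sketch of that variation, assuming Java 8 or later and swapping listFiles for java.nio.file.Files.walk; the class name PdfNameCounter and the method countByName are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

class PdfNameCounter {

    // Counts regular files named 'fileName' (case-insensitive) under 'root',
    // including all of its subfolders.
    static long countByName(Path root, String fileName) {
        // The stream holds open directories, so it is closed via try-with-resources.
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().equalsIgnoreCase(fileName))
                    .count();
        } catch (IOException e) {
            // For example, 'root' does not exist.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path root = Paths.get(System.getProperty("user.dir"));
        System.out.println("Files named teste.pdf: " + countByName(root, "teste.pdf"));
    }
}
```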

denisspitfire Jul 20, 2011

Thanks finotti and Nel. That helped a lot, I'll study it.

denisspitfire Sep 6, 2011

Folks, I'm back. After some doubts about reading files and some research, I got this result with the help of the people at GUJ.

import java.io.FileReader;
import java.util.ArrayList;
import java.util.Scanner;

// Not shown in the original post; assumed here to be a simple holder for one line's fields.
class Caracteristica {
	String[] caracteristicas;
}

public class principal {
	public static void main(String[] args) throws Exception {
		Scanner arquivo = new Scanner(new FileReader("arquivo.txt"));
		ArrayList<Caracteristica> pilha = new ArrayList<Caracteristica>();
		long inicio = System.currentTimeMillis();

		// Read every line and split it into its ';'-separated fields.
		while (arquivo.hasNextLine()) {
			String linha = arquivo.nextLine();
			Caracteristica debenture = new Caracteristica();
			debenture.caracteristicas = linha.split(";");
			pilha.add(debenture);
		}
		arquivo.close();

		// Print the fields separated by spaces, ending each record with a period.
		for (int p = 0; p < pilha.size(); p++) {
			String[] campos = pilha.get(p).caracteristicas;
			for (int i = 0; i < campos.length; i++) {
				String pontuacao = (i == campos.length - 1) ? "." : " ";
				System.out.print(campos[i] + pontuacao);
			}
			System.out.println("");
		}

		long fim = System.currentTimeMillis();
		long total = fim - inicio;
		System.out.println("Total time (ms): " + total);
	}
}

This reader loads the contents of a txt file into a list (named pilha, "stack", in the code) so that it does not hit the fixed capacity of an array…
At the start it already expects a txt to exist… how can I make it search a root folder for a PDF… and, if the PDF is not there, warn the user with a friendly message? I know it is done with try and catch… but which exception should I use? And… if that is not the way, please correct me.
Thanks.
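
One possible answer, offered as a sketch rather than the definitive approach: new FileReader(...) throws java.io.FileNotFoundException when the file is missing or cannot be opened, so that is the exception to catch for the friendly message. The class name FriendlyOpen, the method tryOpen, and the file name arquivo.pdf below are hypothetical. Opening the file this way only checks that it is present and readable; extracting text from a PDF still needs a PDF library.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;

class FriendlyOpen {

    // Tries to open 'file' and read one character from it.
    // Returns null on success, or a user-friendly message on failure.
    static String tryOpen(File file) {
        try (FileReader reader = new FileReader(file)) {
            reader.read();
            return null;
        } catch (FileNotFoundException e) {
            // Thrown when the file is missing, is a directory, or cannot be opened.
            return "Could not find or open the file: " + file.getAbsolutePath();
        } catch (IOException e) {
            // Any other I/O failure while reading or closing.
            return "The file exists but could not be read: " + file.getAbsolutePath();
        }
    }

    public static void main(String[] args) {
        // "arquivo.pdf" is a placeholder name, not a file from the thread.
        String problem = tryOpen(new File("arquivo.pdf"));
        System.out.println(problem == null ? "File opened successfully." : problem);
    }
}
```

FileNotFoundException is a subclass of IOException, so it has to be caught first; checking File.exists() beforehand is an alternative, but the file can still disappear between the check and the open.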

denisspitfire Sep 6, 2011

drsmachado:
Man, with java.io.File you can get the names of the files.
With ICEpdf you can read the content.
Study these APIs and you will get there.

I downloaded the jars. It is open source… but is there any problem using it at a company? Or is it only for study?

denisspitfire Sep 6, 2011

Guys… I took this example code from inside the ICEpdf download.

But it throws an error and I don't know where to start studying it ò.ó. Shed some light, please.

/*
 * Version: MPL 1.1/GPL 2.0/LGPL 2.1
 *
 * "The contents of this file are subject to the Mozilla Public License
 * Version 1.1 (the "License"); you may not use this file except in
 * compliance with the License. You may obtain a copy of the License at
 * http://www.mozilla.org/MPL/
 *
 * Software distributed under the License is distributed on an "AS IS"
 * basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the
 * License for the specific language governing rights and limitations under
 * the License.
 *
 * The Original Code is ICEpdf 3.0 open source software code, released
 * May 1st, 2009. The Initial Developer of the Original Code is ICEsoft
 * Technologies Canada, Corp. Portions created by ICEsoft are Copyright (C)
 * 2004-2011 ICEsoft Technologies Canada, Corp. All Rights Reserved.
 *
 * Contributor(s): _____________________.
 *
 * Alternatively, the contents of this file may be used under the terms of
 * the GNU Lesser General Public License Version 2.1 or later (the "LGPL"
 * License), in which case the provisions of the LGPL License are
 * applicable instead of those above. If you wish to allow use of your
 * version of this file only under the terms of the LGPL License and not to
 * allow others to use your version of this file under the MPL, indicate
 * your decision by deleting the provisions above and replace them with
 * the notice and other provisions required by the LGPL License. If you do
 * not delete the provisions above, a recipient may use your version of
 * this file under either the MPL or the LGPL License."
 *
 */


import org.icepdf.core.pobjects.Document;
import org.icepdf.core.pobjects.graphics.text.WordText;
import org.icepdf.core.search.DocumentSearchController;
import org.icepdf.ri.common.SwingController;
import org.icepdf.ri.common.SwingViewBuilder;

import javax.swing.*;
import java.util.ArrayList;

/**
 * The <code>SearchHighlight</code> class is an example of how to use
 * <code>DocumentSearchController</code> to highlight search terms in a
 * Document view.  A file specified at the command line is
 * opened in a JFrame which contains the viewer component and any number
 * of search terms can be specefied after the file name.
 * <p/>
 * Example:
 *   SearchHighlight "c:\DevelopersGuide.pdf" "PDF" "ICEsoft" "ICEfaces" "ICEsoft technologies"
 *
 * @since 4.0
 */
public class SearchController {
    public static void main(String[] args) {

        if (args.length < 2) {
            System.out.println("At leasts two command line arguments must " +
                    "be specified. ");
            System.out.println("<filename> <term1> ... <termN>");
        }

        // Get a file from the command line to open
        String filePath = args[0];

        // get search terms from command line
        String[] terms = new String[args.length - 1];
        for (int i = 1, max = args.length; i < max; i++) {
            terms[i - 1] = args[i];
        }

        // build a component controller
        SwingController controller = new SwingController();

        SwingViewBuilder factory = new SwingViewBuilder(controller);

        JPanel viewerComponentPanel = factory.buildViewerPanel();

        JFrame applicationFrame = new JFrame();
        applicationFrame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
        applicationFrame.getContentPane().add(viewerComponentPanel);

        // Now that the GUI is all in place, we can try opening the PDF
        controller.openDocument(filePath);

        // show the component
        applicationFrame.pack();
        applicationFrame.setVisible(true);

        /**
         * Start of a simple search for the loaded file
         */
        // get the search controller
        DocumentSearchController searchController =
                controller.getDocumentSearchController();
        // add a specified search terms.
        for (String term : terms) {
            searchController.addSearchTerm(term, false, false);
        }
        // search the pages in the document or a subset
        Document document = controller.getDocument();
        // list of founds words to print out
        ArrayList<WordText> foundWords;
        for (int pageIndex = 0; pageIndex < document.getNumberOfPages();
             pageIndex++) {
            foundWords = searchController.searchPage(pageIndex);
            System.out.println("Page " + pageIndex);
            if (foundWords != null){
                for (WordText wordText : foundWords){
                    System.out.println("    found hit: " + wordText.toString());
                }
            }
        }

    }
}

Error

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
At leasts two command line arguments must be specified. 
<filename> <term1> ... <termN>
	at SearchController.main(SearchController.java:66)

How will it know which word to search for? How will it know where the document is? I think something is missing from the code.
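
Reading the sample and the stack trace together suggests an answer: the file path comes from args[0] and the search terms from the remaining command-line arguments, as in the Javadoc example (SearchHighlight "c:\DevelopersGuide.pdf" "PDF" ...). The program was started with no arguments, the usage check printed its message but did not stop, and args[0] then failed. Below is a sketch of a stricter check using only the standard library; the class name SearchArgs and the method termsFrom are hypothetical and are not part of ICEpdf.

```java
import java.util.Arrays;

class SearchArgs {

    // Returns the search terms (everything after the file name),
    // or null when fewer than two arguments were given.
    static String[] termsFrom(String[] args) {
        if (args.length < 2) {
            return null;
        }
        return Arrays.copyOfRange(args, 1, args.length);
    }

    public static void main(String[] args) {
        String[] terms = termsFrom(args);
        if (terms == null) {
            System.out.println("Usage: <filename> <term1> ... <termN>");
            return; // stop here instead of falling through to args[0]
        }
        String filePath = args[0];
        System.out.println("File: " + filePath + ", terms: " + Arrays.toString(terms));
    }
}
```

When running from an IDE, the same values go in the run configuration's program arguments field rather than on a command line.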
