    Open Source Search Engines in Java
        Compass

        The Compass Framework is a first-class open source Java framework that brings the power of search-engine semantics to your application stack declaratively. Built on top of the amazing Lucene search engine, Compass integrates seamlessly with popular development frameworks such as Hibernate and Spring. It provides search capability over your application's data model and synchronises changes with the datasource. With Compass: write less code, find data quicker.
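
        A minimal sketch of how this looks in code, assuming the Compass 2.x annotation-based API (the annotation and class names below, the "target/compass-index" connection string and the Article class are illustrative and may differ in your Compass version):

        import org.compass.annotations.Searchable;
        import org.compass.annotations.SearchableId;
        import org.compass.annotations.SearchableProperty;
        import org.compass.core.Compass;
        import org.compass.core.CompassHits;
        import org.compass.core.CompassSession;
        import org.compass.core.CompassTransaction;
        import org.compass.core.config.CompassConfiguration;

        public class CompassSketch {

            // A plain domain object marked as searchable via Compass annotations
            @Searchable
            public static class Article {
                @SearchableId
                private Long id = 1L;
                @SearchableProperty
                private String title = "Open source search engines in Java";
            }

            public static void main(String[] args) {
                // Build a Compass instance backed by a file-system Lucene index
                Compass compass = new CompassConfiguration()
                        .setConnection("target/compass-index")
                        .addClass(Article.class)
                        .buildCompass();

                CompassSession session = compass.openSession();
                CompassTransaction tx = session.beginTransaction();
                session.save(new Article());                // index the object
                CompassHits hits = session.find("search");  // query the index
                for (int i = 0; i < hits.length(); i++) {
                    System.out.println(hits.data(i));
                }
                tx.commit();
                session.close();
                compass.close();
            }
        }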

        Go To Compass
        Oxyus

        Oxyus Search Engine is a Java-based application for indexing web documents so they can be searched from an intranet or the Internet, similar to other proprietary search engines in the industry. Oxyus has a web module that presents search results to clients through web browsers, using JavaServer Pages that access a JDBC repository through JavaBeans.

        Go To Oxyus
        BDDBot



        BDDBot is a web robot, search engine, and web server written entirely in Java. It was written as an example for a chapter on how to write your own search engine, and as such it is very simplistic.

        Go To BDDBot
        Egothor

        Egothor is an Open Source, high-performance, full-featured text search engine written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform. It can be configured as a standalone engine, a metasearcher, or a peer-to-peer HUB; moreover, it can be used as a library for an application that needs full-text search.

        Go To Egothor
        Nutch

        Nutch is a nascent effort to implement an open-source web search engine. Nutch provides a transparent alternative to commercial web search engines.

        Go To Nutch
        Lucene

        Jakarta Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.
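
        Every Lucene-based project in this list builds on the same two-step workflow: write documents into an index with an IndexWriter, then query it with an IndexSearcher. A minimal sketch against a recent Lucene release (exact class names vary between versions; for instance, ByteBuffersDirectory here replaces the older RAMDirectory):

        import org.apache.lucene.analysis.standard.StandardAnalyzer;
        import org.apache.lucene.document.Document;
        import org.apache.lucene.document.Field;
        import org.apache.lucene.document.TextField;
        import org.apache.lucene.index.DirectoryReader;
        import org.apache.lucene.index.IndexWriter;
        import org.apache.lucene.index.IndexWriterConfig;
        import org.apache.lucene.queryparser.classic.QueryParser;
        import org.apache.lucene.search.IndexSearcher;
        import org.apache.lucene.search.Query;
        import org.apache.lucene.search.ScoreDoc;
        import org.apache.lucene.store.ByteBuffersDirectory;
        import org.apache.lucene.store.Directory;

        public class LuceneSketch {
            public static void main(String[] args) throws Exception {
                StandardAnalyzer analyzer = new StandardAnalyzer();
                Directory index = new ByteBuffersDirectory(); // in-memory index for the example

                // Step 1: index a document with a single "title" field
                try (IndexWriter writer = new IndexWriter(index, new IndexWriterConfig(analyzer))) {
                    Document doc = new Document();
                    doc.add(new TextField("title", "Open source search engines in Java", Field.Store.YES));
                    writer.addDocument(doc);
                }

                // Step 2: parse a query and search the index
                Query query = new QueryParser("title", analyzer).parse("search");
                try (DirectoryReader reader = DirectoryReader.open(index)) {
                    IndexSearcher searcher = new IndexSearcher(reader);
                    for (ScoreDoc hit : searcher.search(query, 10).scoreDocs) {
                        System.out.println(searcher.doc(hit.doc).get("title"));
                    }
                }
            }
        }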

        Go To Lucene
        Zilverline

        Zilverline is what you could call a 'Reverse Search Engine'.

        It indexes documents from your local disks (and UNC-path-style network disks) and allows you to search through them locally or, if you're away from your machine, through a webserver running on your machine.

        Zilverline supports collections. A collection is a set of files and directories within a directory. PDF, Word, txt, java, CHM and HTML files are supported, as well as zip and rar archives. A collection can be indexed and searched. The results of a search can be retrieved from local disk or remotely, if you run a webserver on your machine. Files inside zip, rar and chm files are extracted, indexed and can be cached. The cache can be mapped to sit behind your webserver as well.

        Go To Zilverline
        YaCy

        YaCy is a distributed web crawler and a caching HTTP proxy. Its web interface lets you configure your personal settings, proxy settings, access control and crawling properties, start crawls, send messages to other peers, and monitor your index, cache status and crawling processes. Most importantly, you can use the search page to search either your own index or the global index.
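
        A peer can also be queried programmatically over HTTP rather than through the search page. A minimal sketch, assuming a peer running locally on YaCy's default port 8090 and exposing the yacysearch.rss endpoint (both the port and the endpoint path are assumptions to check against your installation):

        import java.io.BufferedReader;
        import java.io.InputStreamReader;
        import java.net.URL;
        import java.net.URLEncoder;

        public class YaCyQuery {
            public static void main(String[] args) throws Exception {
                // Query the local peer's search API and print the raw RSS/XML response
                String query = URLEncoder.encode("java search engine", "UTF-8");
                URL url = new URL("http://localhost:8090/yacysearch.rss?query=" + query);
                try (BufferedReader in = new BufferedReader(
                        new InputStreamReader(url.openStream(), "UTF-8"))) {
                    String line;
                    while ((line = in.readLine()) != null) {
                        System.out.println(line);
                    }
                }
            }
        }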

        Go To YaCy
        Lius

        LIUS - Lucene Index Update and Search
        LIUS is an indexing Java framework based on the Jakarta Lucene project. The LIUS framework adds many file-format indexing capabilities to Lucene, such as: MS Word, MS Excel, MS PowerPoint, RTF, PDF, XML, HTML, TXT, the OpenOffice suite and JavaBeans.
        LIUS is very easy to use; all the configuration of the indexing (types of files to be indexed, fields, etc.) as well as of the search is defined in an XML file, so the user only has to write a few lines of code to carry out indexing or searching.

        LIUS has been developed from a range of Java technologies and fully open source applications.

        Go To Lius
        Solr

        Solr is an open source enterprise search server based on the Lucene Java search library, with XML/HTTP APIs, caching, replication, and a web administration interface.
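
        Since Solr is driven over XML/HTTP, it can also be used from Java through its SolrJ client library. A minimal sketch, assuming a core named "mycore" running on the default port 8983 with id and title fields in its schema (the core name, host, port and fields are placeholders, and the HttpSolrClient class is specific to newer SolrJ releases):

        import org.apache.solr.client.solrj.SolrQuery;
        import org.apache.solr.client.solrj.impl.HttpSolrClient;
        import org.apache.solr.client.solrj.response.QueryResponse;
        import org.apache.solr.common.SolrDocument;
        import org.apache.solr.common.SolrInputDocument;

        public class SolrSketch {
            public static void main(String[] args) throws Exception {
                // Point the client at a running core; adjust host, port and core name to your setup
                try (HttpSolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build()) {
                    // Add a document over HTTP and commit it
                    SolrInputDocument doc = new SolrInputDocument();
                    doc.addField("id", "1");
                    doc.addField("title", "Open source search engines in Java");
                    solr.add(doc);
                    solr.commit();

                    // Query the core and print matching titles
                    QueryResponse response = solr.query(new SolrQuery("title:search"));
                    for (SolrDocument d : response.getResults()) {
                        System.out.println(d.getFieldValue("title"));
                    }
                }
            }
        }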

        Go To Solr
        regain

        'regain' is a fast search engine on top of Jakarta Lucene. It crawls through files or web pages using a plugin architecture of preparators for several file formats and data sources. Search requests are handled via a browser-based user interface using Java Server Pages. 'regain' is released under the LGPL and comes in two versions:

        1. a standalone desktop search program including a crawler and HTTP server
        2. a server-based installation providing full-text search functionality for a website or intranet fileserver, using XML configuration files

        Go To regain
        MG4J

        MG4J (Managing Gigabytes for Java) is a collaborative effort aimed at providing a free Java implementation of inverted-index compression techniques; as a by-product, it offers several general-purpose optimised classes, including fast and compact mutable strings, bit-level I/O, fast unsynchronised buffered streams, (possibly signed) minimal perfect hashing, etc. MG4J functions as a full-fledged text-indexing system. It can analyze, index, and query consistently large document collections.

        Go To MG4J
        Piscator

        Piscator is a small SQL/XML search engine. Once an XML feed is loaded, it can be queried using plain SQL. The setup is almost identical to the DB2 side tables approach.

        Go To Piscator
        Hounder

        Hounder is a simple and complete search system. Out of the box, Hounder crawls the web, targeting only those documents of interest, and presents them through a simple search web page and through an API, ideal for integration into other projects. It is designed to scale on all fronts: the number of indexed pages, the crawling speed and the number of simultaneous search queries. It is in use in many large-scale search systems.

        Go To Hounder
        HSearch
        HSearch is an open source, NoSQL Search Engine built on Hadoop and HBase. HSearch features include:



         * Multiple document formats
         * Record and document level search access control
         * Continuous index updating
         * Parallel indexing using multiple machines
         * Embeddable application
         * A REST-ful Web service gateway that supports XML
         * Auto sharding
         * Auto replication
