Search and Cache Engines development by Kevin Duraj since 1994

MG4J: Managing Gigabytes for Java

MG4J (Managing Gigabytes for Java) is a free full-text search engine for large document collections written in Java. MG4J is a highly customisable, high-performance, full-fledged search engine providing state-of-the-art features (such as BM25/BM25F scoring) and new research algorithms.
Reference: http://mg4j.dsi.unimi.it/

Splitting terms into separate B-Trees

Reference: http://lists.xapian.org/pipermail/xapian-discuss/2007-May/003889.html
If we partition the databases by each term’s first characters, some
terms will of course generate larger B-trees than others. We could then
keep partitioning those terms by more than one or two characters, like
this …, until each B-tree database is small enough to be handled by
many small servers …
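
The idea above can be sketched roughly as follows. This is only an illustration, not code from Xapian or from the referenced thread: the function name `partition_terms`, the `max_size` threshold, and the use of a plain term count as a stand-in for B-tree size are all assumptions made for the example.

```python
from collections import defaultdict


def partition_terms(terms, max_size, depth=1):
    """Group terms by prefix, lengthening the prefix for oversized groups.

    Hypothetical sketch: "size" here is just the number of terms in a group,
    standing in for the real on-disk size of a B-tree.
    """
    buckets = defaultdict(list)
    for term in terms:
        # Terms shorter than `depth` keep their full text as the prefix.
        buckets[term[:depth]].append(term)

    partitions = {}
    for prefix, group in buckets.items():
        # Stop when the group fits, or when the prefix cannot grow any
        # further because the terms are shorter than the current depth.
        if len(group) <= max_size or len(prefix) < depth:
            partitions[prefix] = group
        else:
            # Group is still too large: split it by one more character.
            partitions.update(partition_terms(group, max_size, depth + 1))
    return partitions


if __name__ == "__main__":
    sample = ["google", "gopher", "golang", "apple", "zebra"]
    for prefix, group in sorted(partition_terms(sample, max_size=2).items()):
        print(prefix, group)
```

Each resulting prefix could then be mapped to its own small database or server. A real deployment would presumably split on posting-list or B-tree size rather than on the number of terms.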

Example terms go to:
- google [...]