Category Archives: Search Engine Architecture
MG4J: Managing Gigabytes for Java
MG4J (Managing Gigabytes for Java) is a free full-text search engine, written in Java, for large document collections. MG4J is a highly customisable, high-performance, full-fledged search engine providing state-of-the-art features (such as BM25/BM25F scoring) and new research algorithms.
Reference: http://mg4j.dsi.unimi.it/
Splitting terms into separate B-Trees.
Reference: http://lists.xapian.org/pipermail/xapian-discuss/2007-May/003889.html
If we partitioned the databases by each term's first character, some
prefixes would of course produce larger B-trees than others. We could
then keep partitioning those by more than one character (two, three,
and so on) until each B-tree database is small enough to be handled
by one of many small servers.
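
The splitting rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Xapian or MG4J stores terms: the class name `PrefixPartitioner`, the `max_terms_per_shard` threshold, and the use of a plain `set` in place of each B-tree database are all hypothetical.

```python
class PrefixPartitioner:
    """Route terms to shards keyed by a leading-character prefix.

    Each shard is a plain set standing in for one B-tree database
    (hypothetical; a real deployment would put each on its own server).
    """

    def __init__(self, max_terms_per_shard=2):
        self.max_terms = max_terms_per_shard
        # A single catch-all shard for the empty prefix to start with.
        self.shards = {"": set()}

    def shard_key(self, term):
        # Longest registered prefix of the term; "" always matches.
        for length in range(len(term), -1, -1):
            if term[:length] in self.shards:
                return term[:length]

    def add(self, term):
        key = self.shard_key(term)
        self.shards[key].add(term)
        if len(self.shards[key]) > self.max_terms:
            self._split(key)

    def _split(self, key):
        # Re-partition an oversized shard using one more character.
        depth = len(key) + 1
        keep, touched = set(), set()
        for term in self.shards[key]:
            if len(term) < depth:
                keep.add(term)  # the term equals the prefix itself
            else:
                child = term[:depth]
                self.shards.setdefault(child, set()).add(term)
                touched.add(child)
        self.shards[key] = keep
        # A child may still be too large; split it again if so.
        for child in touched:
            if len(self.shards[child]) > self.max_terms:
                self._split(child)


p = PrefixPartitioner(max_terms_per_shard=2)
for term in ["google", "gnu", "yahoo", "gopher"]:
    p.add(term)
# "google" and "gopher" now share shard "go"; "gnu" sits alone in "gn".
```

The parent shard is kept after a split so that terms with a not-yet-seen next character still have somewhere to go. The cost of that choice is that a lookup must try the longest prefix first.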
For example, terms would go to:
- google [...]