Thoughts about technology, business and all that's life.

This blog has moved to http://shal.in.

Thursday, November 26, 2009

Apache Lucene Java 3.0 Released


Apache Lucene Java 3.0.0 has been released. Lucene Java 3.0.0 is mostly a clean-up release without any new features. It paves the way for refactoring and adding new features without the shackles of backwards compatibility. All APIs deprecated in Lucene 2.9 have been removed, and Lucene Java now officially requires Java 5 as a minimum.

See the announcement email for more details. Congratulations, Lucene devs!

Wednesday, November 18, 2009

Apache Mahout 0.2 Released



Apache Mahout 0.2 has been released. Apache Mahout is a project which attempts to make machine learning both scalable and accessible. It is a sub-project of the excellent Apache Lucene project, which provides open source search software.

From the project website:

The Apache Lucene project is pleased to announce the release of Apache Mahout 0.2.

Highlights include:

  • Significant performance increase (and API changes) in collaborative filtering engine
  • K-nearest-neighbor and SVD recommenders
  • Much code cleanup, bug fixing
  • Random forests, frequent pattern mining using parallel FP growth
  • Latent Dirichlet Allocation
  • Updates for Hadoop 0.20.x

Details on what's included can be found in the release notes.

Downloads are available from the Apache Mirrors.


Tuesday, November 10, 2009

Apache Solr 1.4 Released


From the official announcement:

Apache Solr 1.4 has been released and is now available for public download!
http://www.apache.org/dyn/closer.cgi/lucene/solr/

Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g., Word, PDF) handling. Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world's largest internet sites.

Solr is written in Java and runs as a standalone full-text search server within a servlet container such as Tomcat. Solr uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it easy to use from virtually any programming language. Solr's powerful external configuration allows it to be tailored to almost any type of application without Java coding, and it has an extensive plugin architecture when more advanced customization is required.

New Solr 1.4 features include:
  • Major performance enhancements in indexing, searching, and faceting
  • Revamped all-Java index replication that's simple to configure and can replicate configuration files
  • Greatly improved database integration via the DataImportHandler
  • Rich document processing (Word, PDF, HTML) via Apache Tika
  • Dynamic search results clustering via Carrot2
  • Multi-select faceting (support for multiple items in a single category to be selected)
  • Many powerful query enhancements, including ranges over arbitrary functions, and nested queries of different syntaxes
  • Many other plugins including Terms for auto-suggest, Statistics, TermVectors, Deduplication
Performance Enhancements
  1. A simple FieldCache load test
  2. Filtered query performance increases
  3. Solr scalability improvements
  4. Solr faceted search performance improvements
  5. Improvements in Solr Faceting Search
Revamped All-Java Replication
  1. SolrReplication wiki page
  2. Works on Microsoft Windows Platforms too!
DataImportHandler improvements
  1. What's new in DataImportHandler in Solr
  2. DataImportHandler wiki page
Rich document processing
  1. ExtractingRequestHandler Wiki page
  2. Posting Rich Documents to Apache Solr using SolrJ and Solr Cell
Dynamic Search Results Clustering
  1. ClusteringComponent Wiki page
  2. Solr's new Clustering Capabilities
Multi-select Faceting
  1. Local params for faceting
  2. Tagging and excluding filters
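
The tagging and excluding idea behind multi-select faceting can be sketched as a set of request parameters. This is only an illustration under stated assumptions, not code from the Solr distribution: the field name `doctype`, the tag name `dt`, and the class and method names are made up for the example. The filter is tagged with `{!tag=...}` and the facet excludes that tag with `{!ex=...}`, so the counts for the other values of the same field are still returned after one value has been selected.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class MultiSelectFacetParams {

    // Builds parameters for a facet whose own filter is excluded from its counts.
    // field, value and tag are hypothetical inputs supplied by the caller.
    public static Map<String, String> build(String field, String value, String tag) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "*:*");
        // Tag the filter query so that it can be referred to by name.
        params.put("fq", "{!tag=" + tag + "}" + field + ":" + value);
        params.put("facet", "true");
        // Exclude the tagged filter when counting values for this field.
        params.put("facet.field", "{!ex=" + tag + "}" + field);
        return params;
    }

    // URL-encodes the parameters into a query string.
    public static String toQueryString(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        System.out.println(toQueryString(build("doctype", "pdf", "dt")));
    }
}
```

A map keeps the sketch short, but it allows only one `fq` and one `facet.field`; a real client would need a multi-valued parameter structure.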
Query Enhancements
  1. Ranges over functions
  2. Nested query support for any type of query parser (via QParserPlugin). Quotes will often be necessary to encapsulate the nested query if it contains reserved characters. Example: _query_:"{!dismax qf=myfield}how now brown cow"
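
As a rough sketch of the quoting described above, the helper below wraps a query for another parser in the `_query_:"..."` form and escapes characters that would otherwise end the quoted string early. The class name, the method name and the `inStock` field are assumptions for illustration; only the `_query_` syntax and the dismax example come from the text above.

```java
public class NestedQuery {

    // Wraps a query for another parser in the _query_:"..." form.
    // localParams is the text between "{!" and "}", e.g. "dismax qf=myfield".
    public static String nested(String localParams, String query) {
        String body = "{!" + localParams + "}" + query;
        // Escape backslashes first, then double quotes, so the body survives the outer quoting.
        String escaped = body.replace("\\", "\\\\").replace("\"", "\\\"");
        return "_query_:\"" + escaped + "\"";
    }

    public static void main(String[] args) {
        // Combine a clause in the default syntax with a nested dismax query.
        // "inStock" is a hypothetical field name.
        String q = "inStock:true AND " + nested("dismax qf=myfield", "how now brown cow");
        System.out.println(q);
    }
}
```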
New Plugins
  1. TermsComponent (can be used for auto-suggest)
  2. TermVectorComponent
  3. Statistics
  4. Deduplication
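
To give a feel for how the TermsComponent might serve auto-suggest, here is a small sketch that builds a request URL asking for indexed terms starting with a typed prefix. It assumes a request handler registered at `/terms`, a base URL of `http://localhost:8983/solr` and a field called `name`; all three depend on your own configuration, and the class and method names are invented for the example.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class TermsSuggestUrl {

    // Builds a URL that asks for up to `limit` terms in `field` starting with `prefix`.
    // The "/terms" handler path is an assumption about the server configuration.
    public static String build(String baseUrl, String field, String prefix, int limit) {
        return baseUrl + "/terms"
                + "?terms=true"
                + "&terms.fl=" + enc(field)
                + "&terms.prefix=" + enc(prefix)
                + "&terms.limit=" + limit;
    }

    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(build("http://localhost:8983/solr", "name", "at", 10));
    }
}
```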
SolrJ - Java client
  1. Faster, more efficient Binary Update format
  2. Javabean (POJO) binding support
  3. Fast multi-threaded updates through StreamingUpdateSolrServer
  4. Simple round-robin load balancing client - LBHttpSolrServer
  5. Stream documents through an Iterator API
  6. Many performance optimizations
Miscellaneous
  1. Rollback command in UpdateHandler
  2. More configurable logging through the use of the SLF4J library
  3. 'commitWithin' parameter on add document command allows setting a per-request auto-commit time limit.
  4. TokenFilter factories for Arabic language
  5. Improved Thai language tokenization (SOLR-1078)
  6. Merge multiple indexes
  7. Expunge Deletes command
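
Several of the items above surface as XML update messages posted to Solr's update handler. The sketch below only builds those message strings, as an illustration of what they might look like; it does not send anything. The class and method names are hypothetical, and the `id` field is an assumption about the schema.

```java
public class UpdateMessages {

    // Minimal XML escaping for text and attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Add a one-field document and ask for a commit within the given number of milliseconds.
    public static String addWithin(String id, int commitWithinMs) {
        return "<add commitWithin=\"" + commitWithinMs + "\">"
                + "<doc><field name=\"id\">" + escape(id) + "</field></doc>"
                + "</add>";
    }

    // Discard all changes made since the last commit.
    public static String rollback() {
        return "<rollback/>";
    }

    // Commit and merge away segments that contain deleted documents.
    public static String expungeDeletes() {
        return "<commit expungeDeletes=\"true\"/>";
    }

    public static void main(String[] args) {
        System.out.println(addWithin("doc1", 10000));
        System.out.println(rollback());
        System.out.println(expungeDeletes());
    }
}
```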
Upgrade instructions

Although Solr 1.4 is backwards-compatible with previous releases, users are encouraged to read the upgrading notes in the Solr Change Log.

There are so many more new features, optimizations, bug fixes and refactorings that it is not possible to cover them all in a single blog post.

A large amount of effort has gone into this release. Many congratulations to the entire Solr community for making this happen!

Great things are planned for the next release and it is a great time to get involved. See http://wiki.apache.org/solr/HowToContribute for how to get started.

Enjoy Solr 1.4 and let us know on the mailing lists if you have any questions!

About Me

Committer on Apache Solr. Principal Software Engineer at AOL.
