Thoughts about technology, business and all that's life.
This blog has moved to http://shal.in.
Thursday, November 26, 2009
Apache Lucene Java 3.0 Released
Apache Lucene Java 3.0.0 has been released. Lucene Java 3.0.0 is mostly a clean-up release without any new features. It paves the way for refactoring and adding new features without the shackles of backwards compatibility. All APIs deprecated in Lucene 2.9 have been removed, and Java 5 is now officially the minimum requirement.
See the announcement email for more details. Congratulations Lucene Devs!
Labels: Apache Lucene
Wednesday, November 18, 2009
Apache Mahout 0.2 Released
Apache Mahout 0.2 has been released. Apache Mahout is a project which attempts to make machine learning both scalable and accessible. It is a sub-project of the excellent Apache Lucene project which provides open source search software.
From the project website:
The Apache Lucene project is pleased to announce the release of Apache Mahout 0.2.
Highlights include:
- Significant performance increase (and API changes) in collaborative filtering engine
- K-nearest-neighbor and SVD recommenders
- Much code cleanup, bug fixing
- Random forests, frequent pattern mining using parallel FP growth
- Latent Dirichlet Allocation
- Updates for Hadoop 0.20.x
Details on what's included can be found in the release notes.
Downloads are available from the Apache Mirrors.
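For readers new to recommenders: the k-nearest-neighbor approach mentioned above predicts how much a user will like an item from the ratings of the k users most similar to them. Below is a minimal, self-contained sketch of that idea in plain Java. It is not Mahout's API; the class name, the choice of cosine similarity (computed over co-rated items only) and the toy ratings are assumptions made purely for illustration. `Map.of` needs Java 9 or later.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy user-based k-nearest-neighbor recommender (illustrative sketch, not Mahout's API).
class KnnRecommenderSketch {

    // Cosine similarity computed only over the items both users have rated.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
                normA += e.getValue() * e.getValue();
                normB += other * other;
            }
        }
        return (normA == 0 || normB == 0) ? 0 : dot / Math.sqrt(normA * normB);
    }

    // Predicts the user's rating for an item as the similarity-weighted average of
    // the ratings from the k most similar users who rated it; NaN if nobody did.
    static double predict(Map<String, Map<String, Double>> ratings,
                          String user, String item, int k) {
        Map<String, Double> target = ratings.get(user);
        List<String> candidates = new ArrayList<>();
        Map<String, Double> similarity = new HashMap<>();
        for (Map.Entry<String, Map<String, Double>> e : ratings.entrySet()) {
            if (!e.getKey().equals(user) && e.getValue().containsKey(item)) {
                candidates.add(e.getKey());
                similarity.put(e.getKey(), cosine(target, e.getValue()));
            }
        }
        // Most similar neighbors first.
        candidates.sort((x, y) -> Double.compare(similarity.get(y), similarity.get(x)));
        double weightedSum = 0, similaritySum = 0;
        for (String neighbor : candidates.subList(0, Math.min(k, candidates.size()))) {
            double sim = similarity.get(neighbor);
            weightedSum += sim * ratings.get(neighbor).get(item);
            similaritySum += Math.abs(sim);
        }
        return similaritySum == 0 ? Double.NaN : weightedSum / similaritySum;
    }

    // Hypothetical toy data: user -> (item -> rating on a 1-5 scale).
    static Map<String, Map<String, Double>> sampleRatings() {
        Map<String, Map<String, Double>> ratings = new HashMap<>();
        ratings.put("alice", Map.of("a", 5.0, "b", 3.0));
        ratings.put("bob", Map.of("a", 5.0, "b", 3.0, "c", 4.0));
        ratings.put("carol", Map.of("a", 1.0, "b", 5.0, "c", 2.0));
        ratings.put("dave", Map.of("a", 4.0, "b", 3.0, "c", 5.0));
        return ratings;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Double>> ratings = sampleRatings();
        System.out.println("k=1: " + predict(ratings, "alice", "c", 1));
        System.out.println("k=2: " + predict(ratings, "alice", "c", 2));
    }
}
```

With k=1 only bob, whose ratings match alice's exactly, contributes, so the prediction is simply his rating of 4.0; raising k blends in dave's and carol's opinions. Production recommenders such as Mahout's add many refinements (other similarity measures, thresholds, scalable data models) on top of this basic scheme.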
Labels: Apache Mahout
Tuesday, November 10, 2009
Apache Solr 1.4 Released
From the official announcement:
Apache Solr 1.4 has been released and is now available for public download!
http://www.apache.org/dyn/
Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g., Word, PDF) handling. Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world's largest internet sites.
Solr is written in Java and runs as a standalone full-text search server within a servlet container such as Tomcat. Solr uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it easy to use from virtually any programming language. Solr's powerful external configuration allows it to be tailored to almost any type of application without Java coding, and it has an extensive plugin architecture when more advanced customization is required.
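Because the API is plain HTTP, a Solr query is really just a URL. Here is a minimal sketch that builds a select URL in Java using the JDK's URLEncoder. The base URL http://localhost:8983/solr (8983 is the port used by the example server bundled with Solr), the field name title and the helper name are assumptions for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Builds a Solr select URL from alternating parameter names and values (illustrative helper).
class SolrSelectUrlSketch {

    static String selectUrl(String baseUrl, String... nameValuePairs) {
        if (nameValuePairs.length % 2 != 0) {
            throw new IllegalArgumentException("expected name/value pairs");
        }
        StringBuilder url = new StringBuilder(baseUrl).append("/select");
        for (int i = 0; i < nameValuePairs.length; i += 2) {
            url.append(i == 0 ? '?' : '&')
               .append(encode(nameValuePairs[i]))
               .append('=')
               .append(encode(nameValuePairs[i + 1]));
        }
        return url.toString();
    }

    // Percent-encodes a query-string component (spaces become '+').
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        // Assumed local instance on the example server's default port.
        String url = selectUrl("http://localhost:8983/solr", "q", "title:solr", "wt", "json");
        System.out.println(url);
        // prints http://localhost:8983/solr/select?q=title%3Asolr&wt=json
    }
}
```

Opening the printed URL in a browser or with curl against a running Solr instance should return results as JSON; leaving out wt=json gives the default XML response. The same URL works from any language with an HTTP client, which is the point of the HTTP API.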
New Solr 1.4 features include:
- Major performance enhancements in indexing, searching, and faceting
- Revamped all-Java index replication that's simple to configure and can replicate configuration files
- Greatly improved database integration via the DataImportHandler
- Rich document processing (Word, PDF, HTML) via Apache Tika
- Dynamic search results clustering via Carrot2
- Multi-select faceting (support for multiple items in a single category to be selected)
- Many powerful query enhancements, including ranges over arbitrary functions, and nested queries of different syntaxes
- Many other plugins including Terms for auto-suggest, Statistics, TermVectors, Deduplication
Performance
- A simple FieldCache load test
- Filtered query performance increases
- Solr scalability improvements
- Solr faceted search performance improvements
- Improvements in Solr Faceting Search
Java-based Replication
- SolrReplication wiki page
- Works on Microsoft Windows Platforms too!
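Multi-select faceting works by tagging a filter query and then excluding that tag when computing facet counts, using the local params syntax {!tag=...} and {!ex=...}. The sketch below builds the query string for such a request in Java. The field doctype, the selected value pdf and the tag name dt are hypothetical, as is the helper itself.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Builds the query string for a multi-select facet request (illustrative helper).
class MultiSelectFacetSketch {

    static String multiSelectQuery(String q, String field, String value, String tag) {
        String[][] params = {
            {"q", q},
            // Tag the filter query so that it can be excluded later.
            {"fq", "{!tag=" + tag + "}" + field + ":" + value},
            {"facet", "true"},
            // Exclude the tagged filter when counting values of this field, so the
            // other values still get counts and remain selectable in the UI.
            {"facet.field", "{!ex=" + tag + "}" + field}
        };
        StringBuilder query = new StringBuilder();
        for (String[] p : params) {
            if (query.length() > 0) {
                query.append('&');
            }
            query.append(p[0]).append('=').append(encode(p[1]));
        }
        return query.toString();
    }

    // Percent-encodes a parameter value for use in a URL query string.
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical schema: a "doctype" field, with the user having selected "pdf".
        System.out.println(multiSelectQuery("*:*", "doctype", "pdf", "dt"));
    }
}
```

The search results are restricted to PDFs, while the doctype facet still reports counts for every document type, so a user can add a second selection instead of being stuck with a single value.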
Query Enhancements
- Ranges over functions
- Nested query support for any type of query parser (via QParserPlugin). Quotes will often be necessary to encapsulate the nested query if it contains reserved characters. Example: _query_:"{!dismax qf=myfield}how now brown cow"
Other New Plugins
- TermsComponent (can be used for auto-suggest)
- TermVectorComponent
- Statistics
- Deduplication
SolrJ (Java client)
- Faster, more efficient Binary Update format
- Javabean (POJO) binding support
- Fast multi-threaded updates through StreamingUpdateSolrServer
- Simple round-robin load-balancing client: LBHttpSolrServer
- Stream documents through an Iterator API
- Many performance optimizations
Miscellaneous
- Rollback command in UpdateHandler
- More configurable logging through the SLF4J library
- 'commitWithin' parameter on the add command allows setting a per-request auto-commit time limit
- TokenFilter factories for the Arabic language
- Improved Thai language tokenization (SOLR-1078)
- Merge multiple indexes
- Expunge Deletes command
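The nested query support listed under Query Enhancements notes that quotes are often needed around the nested query. Here is a minimal sketch of a helper that wraps a sub-query in _query_:"..." and backslash-escapes any double quotes or backslashes inside it. It assumes the standard Lucene query parser's escaping rules for quoted strings (worth checking against your Solr version); the helper name and the field names text and myfield are hypothetical.

```java
// Wraps a sub-query for a different query parser inside _query_:"..." so it can
// be combined with a standard Lucene-syntax query (illustrative helper).
class NestedQuerySketch {

    static String nested(String localParams, String subQuery) {
        StringBuilder sb = new StringBuilder("_query_:\"");
        for (char c : (localParams + subQuery).toCharArray()) {
            // Inside a quoted string, backslashes and double quotes must be escaped.
            if (c == '\\' || c == '"') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        // "text" and "myfield" are hypothetical field names.
        String q = "text:solr AND " + nested("{!dismax qf=myfield}", "how now brown cow");
        System.out.println(q);
        // prints text:solr AND _query_:"{!dismax qf=myfield}how now brown cow"
    }
}
```

Escaping matters as soon as the user's input contains its own quotes: nested("{!dismax qf=myfield}", "say \"hi\"") yields _query_:"{!dismax qf=myfield}say \"hi\"", which keeps the outer quoted string intact.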
Although Solr 1.4 is backwards-compatible with previous releases, users are encouraged to read the upgrading notes in the Solr Change Log.
There are so many more new features, optimizations, bug fixes and refactorings that it is not possible to cover them all in a single blog post.
A large amount of effort has gone into this release. Many congratulations to the entire Solr community for making this happen!
Great things are planned for the next release and it is a great time to get involved. See http://wiki.apache.org/solr/HowToContribute for how to get started.
Enjoy Solr 1.4 and let us know on the mailing lists if you have any questions!
Labels: Apache Solr
About Me
- Shalin Shekhar Mangar
- Committer on Apache Solr. Principal Software Engineer at AOL.