12 December 2011

Apache Solr - committing changes


Auto-Commit
Solr's configuration file (solrconfig.xml) provides the following option for automatic commits:

<autoCommit>
    <maxDocs>10000</maxDocs>
    <maxTime>1000</maxTime>
</autoCommit>

maxDocs - Maximum number of documents to add since the last commit before automatically triggering a new commit.
maxTime - Maximum amount of time (in milliseconds) that is allowed to pass since a document was added before automatically triggering a new commit.


commitWithin
commitWithin = "(milliseconds)": if the "commitWithin" attribute is present on the <add> element, the added documents will be committed within that time.
e.g.

<add commitWithin="1000">
  <doc>
    <field name="employeeId">05991</field>
    <field name="office" boost="2.0">Bridgewater</field>
  </doc>
</add>
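
The add message above could be posted with curl in the same way as the commit examples in this post. The following is only a rough sketch: build_add_xml and send_add are hypothetical helper names, the field names are the ones from the example above, and the attribute is assumed to sit on the <add> element, which is where Solr's XML update format accepts commitWithin.

```shell
# Hypothetical helper: wraps one document in an <add> message that asks
# Solr to commit it within the given number of milliseconds.
# Note: values are inserted as-is, with no XML escaping.
build_add_xml() {
  within_ms="$1"
  employee_id="$2"
  office="$3"
  printf '<add commitWithin="%s"><doc><field name="employeeId">%s</field><field name="office">%s</field></doc></add>\n' \
    "$within_ms" "$employee_id" "$office"
}

# Hypothetical helper: posts the message to the update handler.
# The base URL is an assumption taken from the other examples in this post.
send_add() {
  curl "${SOLR_URL:-http://localhost:8983/solr}/update" \
    --data-binary "$(build_add_xml "$@")" \
    -H 'Content-type:text/xml; charset=utf-8'
}
```

With a running Solr instance, send_add 1000 05991 Bridgewater would post the document and leave the commit to Solr, so no explicit commit request is needed afterwards.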


Committing manually
We can issue a commit by adding commit=true to the update query string, or by posting a <commit/> XML message with curl.

As Query String
$ curl 'http://localhost:8983/solr/update?commit=true'

As XML request
$ curl 'http://localhost:8983/solr/update' --data-binary '<commit/>' -H 'Content-type:text/xml; charset=utf-8'


Commit Policy
A commit operation makes index changes visible to new search requests.
A hard commit also calls fsync on the index files to ensure they have been flushed to stable storage and no data loss will result from a power failure.

A soft commit is much faster since it only makes index changes visible and does not fsync index files or write a new index descriptor. If the JVM crashes or there is a loss of power, changes that occurred after the last hard commit will be lost. Search collections that have near-real-time requirements (that want index changes to be quickly visible to searches) will want to soft commit often but hard commit less frequently.
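
As a small illustration of the difference, the two kinds of commit can be requested through the same update URL. This is a sketch only: commit_url is a hypothetical helper name, the base URL is assumed from the other examples in this post, and softCommit is the optional commit attribute described in the next section.

```shell
# Assumed base URL, taken from the curl examples in this post.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"

# Hypothetical helper: prints the update URL for a "soft" or "hard" commit.
commit_url() {
  case "$1" in
    soft) printf '%s\n' "$SOLR_URL/update?commit=true&softCommit=true" ;;
    hard) printf '%s\n' "$SOLR_URL/update?commit=true" ;;
    *)    echo "usage: commit_url soft|hard" >&2; return 1 ;;
  esac
}

# Example (needs a running Solr instance):
#   curl "$(commit_url soft)"
```

A near-real-time setup might call the soft variant frequently and the hard variant only occasionally, accepting that changes made since the last hard commit can be lost on a crash.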

Optional attributes for commit

waitFlush = "true" | "false" (default: true): block until index changes are flushed to disk.
softCommit = "true" | "false" (default: false): perform a soft commit. This refreshes the 'view' of the index more quickly, but without "on-disk" guarantees.
expungeDeletes = "true" | "false" (default: false): merge away segments that contain deleted documents.

e.g.
As Query String:
$ curl 'http://localhost:8983/solr/update?commit=true&waitFlush=true&expungeDeletes=true'

As XML
$ curl 'http://localhost:8983/solr/update' --data-binary '<commit waitFlush="true" expungeDeletes="true"/>' -H 'Content-type:text/xml; charset=utf-8'
