12 December 2011

Apache Solr - Master / Slave configuration

Master Slave Configuration (Replication) :


Java-based replication is implemented as a RequestHandler. The setup below uses a single solrconfig.xml for both master and slave.

Changes in solrconfig.xml:

<requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
        <!-- Enabled by starting Solr with -Denable.master=true -->
        <str name="enable">${enable.master:false}</str>
        <str name="replicateAfter">commit</str>
        <str name="replicateAfter">startup</str>
        <str name="confFiles">schema.xml,stopwords.txt,elevate.xml</str>
        <str name="commitReserveDuration">00:00:20</str>
        <str name="backupAfter">optimize</str>
    </lst>
    <lst name="slave">
        <!-- Enabled by starting Solr with -Denable.slave=true -->
        <str name="enable">${enable.slave:false}</str>
        <str name="masterUrl">http://192.168.1.120:8983/solr/replication</str>
        <str name="pollInterval">00:00:60</str>
        <str name="compression">internal</str>
        <str name="httpConnTimeout">5000</str>
        <str name="httpReadTimeout">10000</str>
        <str name="httpBasicAuthUser">username</str>
        <str name="httpBasicAuthPassword">password</str>
    </lst>
</requestHandler>


Tag description:

Master

enable
Set this when a server may need to be turned from a slave into a master, or when you want to use the same solrconfig.xml for both master and slave. Start the master with -Denable.master=true and the slave with -Denable.slave=true. In the fragment above both properties default to false.

replicateAfter
Triggers replication after the given event. Valid values are "commit", "startup" and "optimize". Multiple values can be given by repeating the entry. If only "startup" is given, replication is not triggered by subsequent commits or optimizes once it has been done for the first time at startup.

confFiles
The configuration files listed here are replicated. Files must be named explicitly in the 'confFiles' parameter, and only files in the 'conf' directory of the Solr instance are replicated. They are replicated only along with a fresh index, so a changed file reaches the slave only after the next commit or optimize on the master.

commitReserveDuration
If your commits are very frequent and the network is particularly slow, you can adjust this extra attribute, for example <str name="commitReserveDuration">00:00:20</str>. The default is 10 seconds.

backupAfter
Creates a backup after the specified action. Valid values are 'optimize', 'commit' and 'startup'. This entry can appear multiple times. It is for backup only; replication does not require it.

Slave

enable
Same as for the master. Pass -Denable.slave=true to run Solr in slave mode.

masterUrl
URL of the replication handler on the master Apache Solr server, for example http://192.168.1.120:8983/solr/replication.

pollInterval
Interval at which the slave polls the master, in HH:mm:ss format. If it is absent, the slave does not poll automatically, but a fetchindex can still be triggered from the admin page or the HTTP API.
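The manual trigger can be issued over HTTP with the replication handler's fetchindex command. The following is a minimal sketch, assuming curl is installed and that the slave is reachable at http://192.168.1.121:8983/solr (a hypothetical address, not part of the configuration above). The replication_url and trigger_fetch helpers are illustrative names only.

```shell
#!/bin/sh
# Hypothetical slave base URL; override it by exporting SLAVE_SOLR.
SLAVE_SOLR="${SLAVE_SOLR:-http://192.168.1.121:8983/solr}"

# Build the URL for a ReplicationHandler command.
#   $1 = Solr base URL, $2 = command name (e.g. fetchindex)
replication_url() {
    printf '%s/replication?command=%s\n' "$1" "$2"
}

# Ask the slave to pull the index from its configured masterUrl right away,
# without waiting for the next poll.
trigger_fetch() {
    curl -s "$(replication_url "$SLAVE_SOLR" fetchindex)"
}

# Usage (needs a running slave):
#   trigger_fetch
```

Nothing is sent until trigger_fetch is called, so the script can be sourced safely and the slave address adjusted first.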

compression
Possible values are 'internal' and 'external'. If the value is 'external', make sure the master Solr is set up to honour the Accept-Encoding header. If it is 'internal', everything is taken care of automatically. Use compression only if your bandwidth is low.

httpConnTimeout/httpReadTimeout
These timeouts apply when the slave connects to the master to download the index files. They default to 5000 ms and 10000 ms respectively. You do not need to set them unless bandwidth is extremely low or latency is extremely high.

httpBasicAuthUser/httpBasicAuthPassword
Specify these parameters only if HTTP Basic authentication is enabled on the master; otherwise leave them out.

Just copy the above XML fragment into solrconfig.xml.

Starting Apache Solr in master mode. The property name must match the ${enable.master:false} placeholder in solrconfig.xml:
$ java -Denable.master=true -jar start.jar

Starting Apache Solr in slave mode:
$ java -Denable.slave=true -jar start.jar
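Once both instances are running and a document has been added to the master, one way to check that replication works is to compare document counts on both sides after the poll interval has passed. The following is a rough sketch, assuming curl is installed, the default XML response format, and a slave at http://192.168.1.121:8983/solr (a hypothetical address). The helper names are made up for this example.

```shell
#!/bin/sh
# Master address taken from masterUrl above; the slave address is hypothetical.
MASTER_SOLR="${MASTER_SOLR:-http://192.168.1.120:8983/solr}"
SLAVE_SOLR="${SLAVE_SOLR:-http://192.168.1.121:8983/solr}"

# Read a Solr XML query response on stdin and print the numFound value.
# Prints nothing if the response contains no numFound attribute.
extract_num_found() {
    sed -n 's/.*numFound="\([0-9][0-9]*\)".*/\1/p'
}

# Print the total number of documents in the index at base URL $1.
doc_count() {
    curl -s "$1/select?q=*:*&rows=0&wt=xml" | extract_num_found
}

# Succeed only if both counts could be read and they are equal.
same_doc_count() {
    m=$(doc_count "$MASTER_SOLR")
    s=$(doc_count "$SLAVE_SOLR")
    [ -n "$m" ] && [ "$m" = "$s" ]
}

# Usage (needs both instances running):
#   same_doc_count && echo "slave matches master" || echo "slave differs"
```

Equal counts are only a quick sanity check. Querying the slave for the specific document that was added is the more direct test.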

For more information, please refer to:
http://wiki.apache.org/solr/SolrReplication

5 comments:

  1. Your site was very helpful, how would I test whether the master/slave configuration works?

    1. Run the master and slave instances of Apache Solr, then add or delete a document on the master. After pollInterval has elapsed, the change should be reflected on the slave.

      Query the slave instance and you should see the added document.

  2. Ok, I added the following document using curl:

    curl "http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true" -F "myfile=@troubleshooting_performance.doc"

    I did a query on my master server and found it but when I queried the slave I didn't see anything. I'm pretty sure my settings solrconfig.xml are correct and polling shows 00:00:60. Any ideas?

  3. I have also indexed a pdf file using tika:

    curl http://www.manning.com/hatcher3/hatcher_meapch1.pdf|/usr/java/6/bin/java -jar tika-app-1.0.jar --text|grep -q keyword
    % Total % Received % Xferd Average Speed Time Time Time Current
    Dload Upload Total Spent Left Speed
    100 621k 100 621k 0 0 359k 0 0:00:01 0:00:01 --:--:-- 423k

    Can I perform this same task without using tika on command line?

    Thanks in advance, there isn't a lot of documentation that is straightforward.

  4. You can check following link:

    http://www.lucidimagination.com/blog/2009/09/14/posting-rich-documents-to-apache-solr-using-solrj-and-solr-cell-apache-tika/

    You can do it with curl:
    curl "http://localhost:8983/solr/update/extract?literal.id=doc1&uprefix=attr_&fmap.content=attr_content&commit=true" -F "myfile=@sample.pdf"

    Reference:
    http://wiki.apache.org/solr/ExtractingRequestHandler

    update/extract will extract the content of the PDF file (using Tika) and derive some attributes that you can check by querying Apache Solr.
