Master/Slave Configuration (Replication):
Java-based replication is implemented as a RequestHandler. A single solrconfig.xml can be used for both master and slave.
Changes in solrconfig.xml:
<requestHandler name="/replication" class="solr.ReplicationHandler" >
<lst name="master">
<str name="enable">${enable.master:false}</str>
<str name="replicateAfter">commit</str>
<str name="replicateAfter">startup</str>
<str name="confFiles">schema.xml,stopwords.txt,elevate.xml</str>
<str name="commitReserveDuration">00:00:20</str>
<str name="backupAfter">optimize</str>
</lst>
<lst name="slave">
<str name="enable">${enable.slave:false}</str>
<str name="masterUrl">http://192.168.1.120:8983/solr/replication</str>
<str name="pollInterval">00:00:60</str>
<str name="compression">internal</str>
<str name="httpConnTimeout">5000</str>
<str name="httpReadTimeout">10000</str>
<str name="httpBasicAuthUser">username</str>
<str name="httpBasicAuthPassword">password</str>
</lst>
</requestHandler>
Tag description:
Master
enable
Set this if a server needs to be switched from slave to master, or if you want to use the same solrconfig.xml for both master and slave. Start the master with -Denable.master=true and the slave with -Denable.slave=true.
replicateAfter
Replicate on the given event. Valid values are "commit", "startup" and "optimize". You can give multiple values. If only the startup option is given, replication will not be triggered on subsequent commits or optimizes after it has run once at startup.
confFiles
The configuration files named here are replicated. Files to be replicated must be listed explicitly in the 'confFiles' parameter. Only files in the 'conf' directory of the Solr instance are replicated, and they are replicated only along with a fresh index.
commitReserveDuration
If your commits are very frequent and the network is particularly slow, you can tweak the extra attribute <str name="commitReserveDuration">00:00:20</str>. The default is 10 seconds.
backupAfter
Create a backup after the action specified in backupAfter. Valid values are 'optimize', 'commit' and 'startup'. This config string can appear multiple times. Note that this is only for backups; replication does not require it.
Slave
enable
Same as for the master. Pass in -Denable.slave=true to run Solr in slave mode.
masterUrl
URL of the replication handler on the master Apache Solr server.
pollInterval
Interval at which the slave polls the master, in HH:mm:ss format. If this is absent, the slave does not poll automatically, but a fetchindex can still be triggered from the admin page or the HTTP API.
compression
The possible values are 'internal' and 'external'. If the value is 'external', make sure that your master Solr is set up to honour the Accept-Encoding header. If it is 'internal', everything is taken care of automatically. Use this only if your bandwidth is low.
httpConnTimeout/httpReadTimeout
These values are used when the slave connects to the master to download the index files. The defaults are implicitly 5000 ms and 10000 ms respectively. You do not need to specify them unless the bandwidth is extremely low or the latency is extremely high.
httpBasicAuthUser/httpBasicAuthPassword
If HTTP Basic authentication is enabled on the master, specify these parameters; otherwise leave them out.
Just copy the above XML fragment into solrconfig.xml.
Starting Apache Solr in master mode (the property name must match ${enable.master:false} in the config above):
$ java -Denable.master=true -jar start.jar
Starting Apache Solr in slave mode:
$ java -Denable.slave=true -jar start.jar
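Once both instances are running, the replication handler can also be driven over HTTP. The following is a minimal sketch, assuming a single-core setup where the handler is reachable at /solr/replication as in masterUrl above. The slave address and the helper names (replication_url, replication_details, force_fetch) are placeholders invented for illustration; only the fetchindex and details commands come from the ReplicationHandler HTTP API.

```shell
#!/bin/sh
# Base URLs are assumptions: the master address mirrors masterUrl in the
# config, the slave address is a placeholder for your own slave host.
MASTER_BASE="${MASTER_BASE:-http://192.168.1.120:8983/solr}"
SLAVE_BASE="${SLAVE_BASE:-http://localhost:8983/solr}"

# Build the ReplicationHandler URL for a base URL ($1) and a command name ($2).
replication_url() {
  printf '%s/replication?command=%s\n' "$1" "$2"
}

# Print the replication status reported by one node ($1 = base URL).
replication_details() {
  curl -s "$(replication_url "$1" details)"
}

# Ask the slave to pull the latest index from the master right away,
# without waiting for pollInterval.
force_fetch() {
  curl -s "$(replication_url "$SLAVE_BASE" fetchindex)"
}
```

For example, replication_details "$SLAVE_BASE" prints the slave's view of the replication state, and force_fetch triggers an immediate pull, which is handy when pollInterval is not configured.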
For more information, please refer to:
http://wiki.apache.org/solr/SolrReplication
Your site was very helpful, how would I test whether the master/slave configuration works?
Run the master and slave instances of Apache Solr. Then add a document to the master, or delete one from it. After pollInterval has elapsed, the change should be reflected on the slave.
Query the slave instance and you should see the added document.
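The check described above can be sketched as a small script that runs the same query against both instances and compares the hit counts. This is only an illustration: it assumes a single-core setup with the standard /select handler and the JSON response writer (wt=json), and the helper names (num_found, count_docs, compare_nodes) are made up for this example.

```shell
#!/bin/sh
# Extract the numFound value from a Solr JSON response read on stdin.
num_found() {
  grep -o '"numFound":[0-9]*' | head -n 1 | cut -d: -f2
}

# Count the documents matching a query on one node.
# $1 = base URL (for example http://localhost:8983/solr)
# $2 = query (for example id:doc1)
count_docs() {
  curl -s "$1/select?q=$2&rows=0&wt=json" | num_found
}

# Report whether master ($1) and slave ($2) return the same count for query $3.
compare_nodes() {
  m=$(count_docs "$1" "$3")
  s=$(count_docs "$2" "$3")
  if [ -n "$m" ] && [ "$m" = "$s" ]; then
    echo "in sync: $m"
  else
    echo "out of sync: master=$m slave=$s"
  fi
}
```

Calling compare_nodes with the master base URL, the slave base URL and a query such as id:doc1 shows whether the slave has caught up. Run it again after pollInterval has passed if the counts differ.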
Ok, I added the following document using curl:
curl "http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true" -F "myfile=@troubleshooting_performance.doc"
I did a query on my master server and found it, but when I queried the slave I didn't see anything. I'm pretty sure my settings in solrconfig.xml are correct, and polling shows 00:00:60. Any ideas?
I have also indexed a pdf file using tika:
curl http://www.manning.com/hatcher3/hatcher_meapch1.pdf | /usr/java/6/bin/java -jar tika-app-1.0.jar --text | grep -q keyword
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 621k 100 621k 0 0 359k 0 0:00:01 0:00:01 --:--:-- 423k
Can I perform this same task without using tika on command line?
Thanks in advance, there isn't a lot of documentation that is straight forward.
You can check the following link:
http://www.lucidimagination.com/blog/2009/09/14/posting-rich-documents-to-apache-solr-using-solrj-and-solr-cell-apache-tika/
You can also do it with curl:
curl "http://localhost:8983/solr/update/extract?literal.id=doc1&uprefix=attr_&fmap.content=attr_content&commit=true" -F "myfile=@sample.pdf"
Reference:
http://wiki.apache.org/solr/ExtractingRequestHandler
update/extract will extract the content of the PDF file (using Tika) and derive some attributes that you can check by querying Apache Solr.