Solr is the classic open-source search system. Perhaps too classic (old).
See Elasticsearch too.
- Solr UI
- Solr CLI
- Commercial Offerings
- Start SolrCloud Commands
- Solr / SolrCloud Docker Images
- SolrCloud
- Hadoop MapReduce Indexer to SolrCloud
- Monitoring
- Troubleshooting
Uses an embedded Jetty webapp on port 8983
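A quick way to check a node is up (a sketch assuming a default local install; the system info handler returns version and JVM details):
curl -s "http://localhost:8983/solr/admin/info/system?wt=json"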
If you administer Solr a lot, you may find this CLI makes your day-to-day work significantly easier, and it can use environment variables to shorten your commands.
git clone https://github.com/HariSekhon/DevOps-Perl-tools.git perl-tools
cd perl-tools
make # installs CPAN dependencies on all major Linux and Mac systems
./solr_cli.pl --help
On the first node:
Notice the /solr suffix on zkHost, which chroots the SolrCloud to the /solr path inside ZooKeeper; otherwise it'll write junk all over the top level.
java -DzkHost=zk1:2181,zk2:2181,zk3:2181/solr \
-DnumShards=2 \
-Dbootstrap_confdir=$SOLR_HOME/example-solrcloud1/solr/collection1/conf \
-Dcollection.configName=myconf \
-jar start.jar
On subsequent nodes:
java -DzkHost=zk1:2181,zk2:2181,zk3:2181/solr -jar start.jar
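Depending on version, the /solr chroot znode may need to exist in ZooKeeper before pointing zkHost at it. A minimal sketch using the zkcli script bundled with Solr (script location varies by release, e.g. example/scripts/cloud-scripts/ in 4.x, server/scripts/cloud-scripts/ later):
./zkcli.sh -zkhost zk1:2181,zk2:2181,zk3:2181 -cmd makepath /solr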
HariSekhon/Dockerfiles - SolrCloud
HariSekhon/Dockerfiles - SolrCloud Dev
Clustered Solr using ZooKeeper for coordination of nodes and shards.
Expect to do shard management, recoveries and ZooKeeper contents investigation.
Strongly recommended that you use Elasticsearch instead of SolrCloud.
The problem with SolrCloud is that clustering was tacked on to classic Solr as an afterthought and it shows, compared to Elasticsearch where this was a primary consideration.
Shard management is difficult in SolrCloud and it's common to have shard outages and shard loss requiring re-indexing from an external data store.
You will need the Solr CLI above. There is even a --request-core-recovery switch that uses an API endpoint that wasn't documented at the time of writing.
- Leader uses the last 100 transaction log updates to sync replicas (peer sync)
- if replicas fall too far behind the leader then full replication of index segments is needed instead
- CDCR - Cross DataCenter Replication
- https://sematext.com/blog/2016/04/20/solr-6-datacenter-replication/
- 6.0+
- stores unlimited update log
- 'replicator' configured on source collection sends batch updates to target collection(s)
- shard leader receives indexing command, processes + replicates to local replicas + writes to update log for CDCR (synchronously)
- async 'replicator' checks update log, if new creates batch + sends to target collection
- data received by target collection leaders is replicated locally to replicas at the other DC using standard SolrCloud replication
- 'replicator' batches for max scalability
- update log only captures new indexed docs
- indexing existing collection requires shutting down source cluster, copying source leader data dirs to target leader data dirs
- solr.CdcrRequestHandler, insert update request processor chain in UpdateHandler (see link above for details)
- Limitations:
- Active - Passive
- not bi-directional, so no indexing if the source cluster goes down (or perhaps indexing at the target, but no replication back to the original primary)
- shards must be manually migrated (Elasticsearch auto-migrates shards)
- start CDCR
http://.../<collection>/cdcr?action=START
- stop CDCR
http://.../<collection>/cdcr?action=STOP
- enable buffering
http://.../<collection>/cdcr?action=ENABLEBUFFER
- disable buffering
http://.../<collection>/cdcr?action=DISABLEBUFFER
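A hedged curl sketch of these calls (host, port and collection name are placeholders; a STATUS action is also available to check the current CDCR state):
curl "http://localhost:8983/solr/<collection>/cdcr?action=START"
curl "http://localhost:8983/solr/<collection>/cdcr?action=STATUS"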
- Parallel SQL:
- 6.0+
- SQL Handler
/solr/<collection>/select?q=<traditional_query>
/solr/<collection>/sql
- request body: stmt=<sql_query>
- request body must be urlencoded (see the curl sketch after this list)
- Solr JDBC driver provided with SolrJ
- GROUP BY
- aggregates count / sum / min / max / avg
- no JOIN
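For example with curl (collection and field names are placeholders; --data-urlencode takes care of the required URL encoding of the statement):
curl --data-urlencode 'stmt=SELECT myfield, count(*) FROM <collection> GROUP BY myfield' "http://localhost:8983/solr/<collection>/sql"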
NRT DR cross-site recovery: take down the entire cluster, point 1 node per shard at the ZK at the other DC => replicate to catch up => take down + reconfigure back to local ZK and start again
zkRun uses embedded zookeeper (just for testing):
java -DzkRun -DnumShards=2 -Dbootstrap_confdir=$SOLR_HOME/example-solrcloud1/solr/collection1/conf -Dcollection.configName=myconf -jar start.jar
java -Djetty.port=8984 -DzkHost=localhost:9983 -jar start.jar
Shortcut:
solr -e cloud -noprompt
info:
solr -i
solr stop -c -all
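To sanity-check the example cloud, the healthcheck subcommand can be used (a sketch assuming the -e cloud defaults: collection named gettingstarted, embedded ZooKeeper on port 9983):
solr healthcheck -c gettingstarted -z localhost:9983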
Routing using Murmur hash:
id=<shard_key>!<doc_id>
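A hedged indexing example via the JSON update handler (collection name and fields are placeholders); all docs sharing the tenantA! prefix hash to the same shard:
curl "http://localhost:8983/solr/<collection>/update?commit=true" -H 'Content-Type: application/json' -d '[{"id": "tenantA!doc1", "title": "example doc"}]'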
cd $SOLR_HOME
For a dry-run to get the libs locally, then reuse them in -libjars for the distributed job:
export HADOOP_CLASSPATH="$HADOOP_CLASSPATH:$(ls dist/*.jar \
contrib/map-reduce/lib/*.jar \
dist/solrj-lib/*.jar \
contrib/morphlines-core/lib/*.jar \
contrib/morphlines-cell/lib/*.jar \
contrib/extraction/lib/*.jar \
example/solr-webapp/webapp/WEB-INF/lib/*.jar |
tr '\n' ':' |
sed 's/:$//'
)"
hadoop jar dist/solr-map-reduce-*.jar org.apache.solr.hadoop.MapReduceIndexerTool --libjars ...
hadoop jar dist/solr-map-reduce-*.jar \
-libjars $(sed 's/:/,/g' <<< "$HADOOP_CLASSPATH") \
--mappers 12 \
--morphline-file myFile.conf \
--morphline-id morphline1 \
--zk-host $ZOOKEEPERS \
--collection $SOLR_COLLECTION \
--go-live --go-live-threads 6 \
--output-dir hdfs://nameservice1/tmp/blah \
--verbose \
--dry-run \
hdfs://nameservice1/data
- When not on HDFS (online indexing performance sucks on HDFS), --output hdfs://nameservice1/ causes:
org.apache.solr.common.SolrException: Directory: org.apache.lucene.store.MMapDirectory. but hdfs lock factory can only be used with HdfsDirectory
- MRIndexer writes tmp2/full-import-list.txt to the dir from only 1 mapper; this causes FileNotFoundException in the other mappers since they can't see the file when using file:/// - this rules out local index creation
- URI error was due to specifying the MapReduceIndexer path as the first arg, must be hdfs:///data/...
java -cp dist/*:contrib/map-reduce/lib/*:$(hadoop classpath) org.apache.solr.hadoop.MapReduceIndexerTool
java -cp dist/*:contrib/map-reduce/lib/*:$(hadoop classpath) org.apache.solr.hadoop.HdfsFindTool --help
Morphline commands of note:
- readMultiLine
- grok
- generateUUID
- convertTimestamp
HariSekhon/Nagios-Plugins check_solr* scripts.
Some more things to monitor:
- Number of queries per second
- Solr Write (check_solr_write.pl)
- Average response time (check_solr_query.pl returns QTime query response time)
- Number of updates
- Cache hit ratios
- Replication status
- Synthetic queries
Shards with no filled in circle = no leader
https://solr.apache.org/guide/8_7/shard-management.html#forceleader
Forcing a leader election can lead to data loss:
curl "http://$HOST:8983/solr/admin/collections?action=FORCELEADER&collection=$COLLECTION&shard=$SHARD"
Old script restart_local_solr.sh example:
#!/bin/bash
set -x;
pgrep -l -f start.jar | awk '{print $1}' | xargs --no-run-if-empty kill;
count=0;
while ps -ef|grep start.ja[r]; do
let count+=1;
sleep 3;
[ $count -ge 5 ] && break;
done;
pgrep -l -f start.jar | awk '{print $1}' | xargs --no-run-if-empty kill -9;
rm -fv /data*/solr/*/index/write.lock
sleep 1;
cd /opt/solr/hdp; java -Xmx30g -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -DzkHost=$SOLR_ZOOKEEPER -jar start.jar &
exit 0
This file in the data directory needs to exist for the core to come up - it is only created by the solr script if the dir doesn't already exist:
touch /var/solr/data/<core>/core.properties
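A hedged one-liner to create any that are missing (assumes the /var/solr/data layout from the line above):
for dir in /var/solr/data/*/; do [ -e "$dir/core.properties" ] || touch "$dir/core.properties"; done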
org.apache.solr.common.SolrException; org.apache.lucene.index.CorruptIndexException: codec header mismatch: actual header \d+ vs expected header \d+
This is a bug - disable the HDFS write cache to work around it, or switch back to using local disk (faster anyway)
After restarting Solr instances, some cores don't load and you see this exception. This happens because killing the Solr instance leaves <dataDir>/index/write.lock files behind, which prevents cores from loading on restart:
org.apache.solr.common.SolrException: Index locked for write for core Blah_shard3_replica2
FIX:
- stop Solr
- then run rm /data*/solr/*/index/write.lock
- start Solr
Trigger recovery manually (not currently documented, but I've coded it into solr_cli.pl):
/solr/admin/cores?action=REQUESTRECOVERY&core=<name>
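For example (host, port and core name are placeholders):
curl "http://$HOST:8983/solr/admin/cores?action=REQUESTRECOVERY&core=<core_name>"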
CLUSTERSTATUS / OVERSEERSTATUS returns 400 Bad Request:
"error": { "msg": "Unknown action: CLUSTERSTATUS" }
There was a serial mismatch in the logs between Solr 4.10.3 and 4.7.2 on the rest of the cluster (CLUSTERSTATUS did not exist before Solr 4.8).
- dfs.replication setting not respected: SOLR-6305 and SOLR-6528
- autoAddReplicas added in 4.10 didn't work when tested in 4.10.3
- Missing authority in path URI when using hdfs:/tmp => needs the NameNode part, which is the "authority" => hdfs://nameservice1/tmp
Partial port from private Knowledge Base page 2013+