Laura Wrubel edited this page Sep 28, 2021 · 2 revisions

Spark

Spark loader cannot find file

  1. Make sure the NFS mounts for the dataset_loading and tweetset_data/full_datasets directories are accessible from all nodes in the cluster. Mount the directories before starting the Spark Docker containers; otherwise they will not be available inside the containers.
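
A quick pre-flight check along these lines can be run on each node before starting the Spark containers. This is only a sketch: check_dirs is a hypothetical helper, and the /path/to/... locations are placeholders for wherever the NFS shares are actually mounted on your nodes.

```shell
# check_dirs: hypothetical helper. Prints every directory that is missing
# or unreadable and returns non-zero if any were found.
check_dirs() {
    failed=0
    for dir in "$@"; do
        if [ ! -d "$dir" ] || [ ! -r "$dir" ]; then
            echo "not accessible: $dir"
            failed=1
        fi
    done
    return "$failed"
}

# Example (placeholder paths; substitute the real mount points):
# check_dirs /path/to/dataset_loading /path/to/tweetset_data/full_datasets
```

The check only confirms that the directories exist and are readable; it does not verify that they are NFS mounts rather than empty local directories.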

Spark master cannot connect to worker

  1. Run docker logs ts_spark-master_ on the master node and docker logs ts_spark-worker_1 on each worker node, and look for connection problems. Restarting all the Spark containers may resolve the problem. Start the workers before the master node.

Custom datasets

Determining how many datasets are currently being generated

cd /storage/datasets
find . -name "generate_tasks.json" | wc -l
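
To see which datasets are in progress, not just how many, the same find can print the directory names. This is a sketch that assumes generate_tasks.json sits directly inside each /storage/datasets/<dataset-id> directory, as the cleanup step under "Revoking an export" indicates; list_in_progress is a hypothetical helper name.

```shell
# list_in_progress: hypothetical helper. Prints the ID (directory name) of
# each dataset that still has a generate_tasks.json file, one per line.
list_in_progress() {
    base="${1:-/storage/datasets}"
    find "$base" -name "generate_tasks.json" | while IFS= read -r f; do
        basename "$(dirname "$f")"
    done | sort
}

# Example: list_in_progress            (uses /storage/datasets)
#          list_in_progress | wc -l    (same count as the command above)
```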

Revoking an export

  1. Set up Flower
docker exec -it ts_server_1 /bin/bash
pip install flower
celery --broker=redis://redis:6379/0 flower
  2. Start another shell session to run commands: docker exec -it ts_server_1 /bin/bash
  • Review tasks in progress: curl localhost:5555/api/tasks
  • View a particular task: curl localhost:5555/api/task/info/<task-id>
  3. Revoke the task: curl -X POST "localhost:5555/api/task/revoke/<task-id>?terminate=true" For example: curl -X POST "localhost:5555/api/task/revoke/91949eb0-7cd2-4f90-ae32-84e2a931da5a?terminate=true"
  4. Once revoked, remove the generate_tasks.json file in /storage/datasets/<dataset-id>. This keeps later counts of in-progress datasets accurate.
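
The revoke and cleanup steps can be combined so that the task file is removed only when the revoke request succeeds. This is a sketch, not part of TweetSets: revoke_url, clear_task_file, and revoke_export are hypothetical helper names, and it assumes Flower is listening on localhost:5555 as in the steps above.

```shell
# revoke_url: builds the Flower revoke endpoint for a task ID.
revoke_url() {
    printf 'http://localhost:5555/api/task/revoke/%s?terminate=true\n' "$1"
}

# clear_task_file: removes generate_tasks.json for a dataset ID.
# The optional second argument overrides the /storage/datasets base directory.
clear_task_file() {
    rm -f "${2:-/storage/datasets}/$1/generate_tasks.json"
}

# revoke_export: hypothetical wrapper. Revokes the task first and cleans up
# only if curl reports success (-f makes HTTP errors a failure).
revoke_export() {
    curl -fsS -X POST "$(revoke_url "$1")" && clear_task_file "$2"
}

# Usage: revoke_export TASK_ID DATASET_ID
```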

Elasticsearch

Monitoring: Use ElasticHQ locally as a monitoring tool.

  1. docker run -p 5000:5000 elastichq/elasticsearch-hq
  2. Access with: http://localhost:5000
  3. Provide the address of the TweetSets primary VM with port 9200, using http rather than https.

Reallocating unassigned shards

This may occur if storage fills up and Elasticsearch is restarted. Unassigned shards can be viewed in ElasticHQ or via the API.

  1. Identify which nodes have allocation problems: curl -X GET "http://<hostname for VM>:9200/_cluster/allocation/explain?pretty=true"
  2. If allocation has failed on particular nodes, run the _cluster/reroute command on prod1 and/or on any nodes with problems. Example: curl -X POST "http://<hostname for VM>:9200/_cluster/reroute?retry_failed=true"
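
Unassigned shards can also be listed from the command line with the _cat/shards API, before and after the reroute. This is a sketch: unassigned_only is a hypothetical filter, and it assumes the shard state is the fourth column, which holds for the column list requested in the usage line.

```shell
# unassigned_only: hypothetical filter. Keeps only the rows whose fourth
# column (the shard state) is UNASSIGNED.
unassigned_only() {
    awk '$4 == "UNASSIGNED"'
}

# Usage (replace the placeholder with the VM hostname):
# curl -s "http://<hostname for VM>:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason" | unassigned_only
```

An empty result after rerouting suggests that all shards have been assigned.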

Adding a temporary downtime alert on the home page

  1. Set up dependencies
apt-get update
apt-get -y install vim
  2. Insert the alert HTML into templates/about.html. Example alert text to place in about.html before maintenance:
<div class="alert alert-warning" role="status">TweetSets will be briefly down for maintenance on the morning of Dec 21, 2020.
Datasets in process at that time may not complete, so please wait until after Dec 21 to start any new critical exports.
Questions? <a target="_blank" href="mailto:sfm@gwu.edu">Email us at sfm@gwu.edu</a>.
 </div>
  3. Restart gunicorn by sending HUP to the pid of its master process
ps -ef | grep gunicorn
kill -HUP <pid>