Troubleshooting
Laura Wrubel edited this page Sep 28, 2021 · 2 revisions
- Make sure the NFS mounts for the `dataset_loading` and `tweetset_data/full_datasets` directories are accessible from all nodes in the cluster. The directories must be mounted before the Spark Docker containers start so that they are available inside the containers.
- Check `docker logs ts_spark-master_` and (on each worker node) `docker logs ts_spark-worker_1` to look for connection problems. Restarting all the Spark containers may resolve this problem. Workers should be started before the master node.
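The mount check in the first item can be sketched as a small shell function to run on each node. This is only an illustration: `check_dirs` is a hypothetical helper name, and the full paths to the two directories depend on where the NFS shares are mounted on your hosts. It verifies that each directory exists and is readable; it does not confirm that the directory is an NFS mount.

```shell
# Hypothetical helper: report whether each given directory exists and is readable.
# Returns non-zero if any directory is missing or unreadable.
check_dirs() {
    rc=0
    for dir in "$@"; do
        if [ -d "$dir" ] && [ -r "$dir" ]; then
            echo "ok: $dir"
        else
            echo "MISSING: $dir" >&2
            rc=1
        fi
    done
    return $rc
}

# Example call (paths are assumptions; substitute the real mount locations):
# check_dirs /path/to/dataset_loading /path/to/tweetset_data/full_datasets
```

Running this before starting the Spark containers gives a quick signal that a mount is absent on a particular node.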
cd /storage/datasets
find . -name "generate_tasks.json" | wc -l
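When the count alone is not enough, a variant of the command above can print the dataset directories that still hold a `generate_tasks.json` file. This is a minimal sketch; `list_in_process_datasets` is a hypothetical name, and the default base path is taken from the `cd` command above.

```shell
# Hypothetical helper: list directories that contain a generate_tasks.json marker.
# The optional first argument overrides the base directory.
list_in_process_datasets() {
    base="${1:-/storage/datasets}"
    find "$base" -name "generate_tasks.json" -exec dirname {} \; | sort
}

# Example call:
# list_in_process_datasets
```

Each printed directory name corresponds to a dataset whose export has not finished or was interrupted.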
- Set up flower
docker exec -it ts_server_1 /bin/bash
pip install flower
celery --broker=redis://redis:6379/0 flower
- Start another shell session to run commands.
docker exec -it ts_server_1 /bin/bash
- Review tasks in progress:
curl localhost:5555/api/tasks
- View a particular task:
curl localhost:5555/api/task/info/<task-id>
- Revoke task:
curl -X POST localhost:5555/api/task/revoke/<task-id>?terminate=true
For example: curl -X POST localhost:5555/api/task/revoke/91949eb0-7cd2-4f90-ae32-84e2a931da5a?terminate=true
- Once revoked, remove the `generate_tasks.json` file in `/storage/datasets/<dataset-id>`. This keeps later counts of in-process datasets accurate.
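The cleanup step after revoking a task could be wrapped in a small function so that a mistyped dataset ID is reported rather than silently ignored. This is a sketch under assumptions: `remove_generate_marker` is a hypothetical name, and the default base path is the `/storage/datasets` location mentioned above.

```shell
# Hypothetical helper: remove the generate_tasks.json marker for one dataset.
# Usage: remove_generate_marker <dataset-id> [base-dir]
remove_generate_marker() {
    dataset_id="$1"
    base="${2:-/storage/datasets}"
    marker="$base/$dataset_id/generate_tasks.json"
    if [ -f "$marker" ]; then
        rm -- "$marker" && echo "removed $marker"
    else
        echo "no marker found at $marker" >&2
        return 1
    fi
}
```

The function returns a non-zero status when no marker exists, which makes it safe to call from a loop over several dataset IDs.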
Monitoring: Use ElasticHQ locally as a monitoring tool.
docker run -p 5000:5000 elastichq/elasticsearch-hq
- Access with:
http://localhost:5000
- Provide the TweetSets primary VM address with port 9200, using http rather than https.
Unassigned shards may occur if storage fills up and Elasticsearch is restarted. They can be viewed in ElasticHQ or via the API.
- Identify which nodes have allocation problems
curl -X GET "http://<hostname for VM>:9200/_cluster/allocation/explain?pretty=true"
- If there are allocation errors on particular nodes, run the _cluster/reroute command on prod1 and/or on any nodes with problems. Example:
curl -X POST http://<hostname for VM>:9200/_cluster/reroute?retry_failed=true
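To view unassigned shards through the API before and after the reroute, one option is the Elasticsearch `_cat/shards` endpoint, filtered down to rows in the `UNASSIGNED` state. The sketch below assumes the column order requested with the `h=` parameter shown in the comment; `filter_unassigned` is a hypothetical helper name.

```shell
# Hypothetical helper: keep only rows whose 4th column (state) is UNASSIGNED.
# Expects input with the columns: index shard prirep state unassigned.reason
filter_unassigned() {
    awk '$4 == "UNASSIGNED"'
}

# Example call (replace the hostname placeholder with the VM's hostname):
# curl -s "http://<hostname for VM>:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason" | filter_unassigned
```

An empty result after running the reroute command suggests that the retried shards were allocated.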
- Set up dependencies
apt-get update
apt-get -y install vim
- Insert HTML into
templates/about.html
Example text for alert to be placed in about.html before maintenance:
<div class="alert alert-warning" role="status">TweetSets will be briefly down for maintenance on the morning of Dec 21, 2020.
Datasets in process at that time may not complete, so please wait until after Dec 21 to start any new critical exports.
Questions? <a target="_blank" href="mailto:sfm@gwu.edu">Email us at sfm@gwu.edu</a>.
</div>
- Restart gunicorn using its pid
ps -ef | grep gunicorn
kill -HUP <pid>