Brainstorming
This page contains a list of DRAFT recommendations for development we could do to improve performance and/or stability. They are currently NOT IN ANY ORDER; our goal is to use them to prompt discussion and keep track of any brainstorming we do. Note that they are all dependent on implementing monitoring, as we want to establish a baseline before we deploy any improvements.
Our analyses are here.
A majority of the objects being fetched from `/dor/reindex/:pid` are actually `Dor::Workflow` objects, which very rarely change. We could cache these within the `Dor.load_instance` method in dor-services. One complication is that Passenger respawns threads, so we would need to use some kind of shared caching strategy, such as `ActiveSupport::Cache::FileStore` or `ActiveSupport::Cache::MemCacheStore`.
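The caching idea above can be sketched in miniature. This is a hypothetical illustration only: the `WorkflowCache` class and `fetch_from_fedora` lambda are stand-ins, and a production version would swap the in-memory store for `ActiveSupport::Cache::FileStore` (or `MemCacheStore`) so the cache is shared across Passenger workers.

```ruby
# Hypothetical sketch: cache rarely-changing Dor::Workflow content keyed by
# pid, so repeated /dor/reindex/:pid calls skip the Fedora round trip.
# A plain in-memory Hash stands in for a shared ActiveSupport::Cache store.
require 'monitor'

class WorkflowCache
  include MonitorMixin

  def initialize
    super()     # set up the monitor lock
    @store = {} # stand-in for a process-shared cache backend
  end

  # Return the cached value for pid, computing it once on a miss.
  def fetch(pid)
    synchronize do
      @store[pid] ||= yield
    end
  end
end

WORKFLOWS = WorkflowCache.new

fetches = 0
fetch_from_fedora = lambda do |pid|
  fetches += 1                        # count expensive Fedora round trips
  "<workflows objectId=\"#{pid}\"/>"  # pretend this came from Fedora
end

xml1 = WORKFLOWS.fetch('druid:ab123cd4567') { fetch_from_fedora.call('druid:ab123cd4567') }
xml2 = WORKFLOWS.fetch('druid:ab123cd4567') { fetch_from_fedora.call('druid:ab123cd4567') }
```

The second `fetch` hits the cache, so Fedora is only consulted once per pid.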
We may also want to cache collections and APOs, since they are looked up for every object -- see this issue. In the context of `to_solr`, only the title information is extracted from the object. One complication is that these titles may change, so, for example, the TTLs would need to be short.
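A short-TTL cache for those titles could look something like the following. This is a sketch, not dor-services code: the `TtlCache` class is invented here, and in practice `ActiveSupport::Cache`'s `fetch(key, expires_in: ...)` would do the same job.

```ruby
# Hypothetical sketch: a short-TTL cache for collection/APO titles.
# Titles can change, so entries expire after `ttl` seconds.
class TtlCache
  Entry = Struct.new(:value, :expires_at)

  def initialize(ttl:)
    @ttl = ttl
    @entries = {}
  end

  # Return a fresh cached value, or compute and store a new one.
  def fetch(key)
    entry = @entries[key]
    return entry.value if entry && Time.now < entry.expires_at

    value = yield
    @entries[key] = Entry.new(value, Time.now + @ttl)
    value
  end
end

titles = TtlCache.new(ttl: 60)
lookups = 0
title = titles.fetch('druid:apo123') { lookups += 1; 'Example APO title' }
again = titles.fetch('druid:apo123') { lookups += 1; 'Example APO title' }
```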
We make many requests to Fedora for each `/dor/reindex/:pid` transaction, but we are not using "Keep-Alive" connections to Fedora (need to double check this in `rubydora`). We could implement persistent connections and connection caching.
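As a rough illustration of connection reuse, Ruby's stdlib `Net::HTTP` keeps the socket open for every request made inside a `start` block. The host and paths below are assumptions, not our real Fedora endpoints, and `rubydora`'s actual HTTP stack may differ.

```ruby
require 'net/http'

# Hypothetical sketch: reuse one keep-alive connection for the many Fedora
# requests a single /dor/reindex/:pid transaction makes, instead of opening
# a new TCP (and TLS) connection per request.
def fetch_datastreams(host, paths)
  Net::HTTP.start(host, 443, use_ssl: true) do |http|
    # Every get inside this block reuses the same underlying socket.
    paths.map { |path| http.get(path).body }
  end
end
```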
Argo has its own `/dor/reindex/:pid` route that calls `dor-services`, etc. It's currently used for Bulk Reindexing and the Reindex blue button in the Argo UI. We could implement the internals of that route to use an HTTP GET on dor-indexing-app's `/dor/reindex/:pid` to consolidate our code and streamline performance analysis (i.e., ensure all reindexing is going through the same process).
When objects are active in the workflows, they change frequently. We could research whether our update messages are being de-duplicated and collapsed effectively. That is, if messages are occurring in rapid succession, we may be able to treat some of them as "duplicates" when the queue is slow.
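One possible collapsing strategy, sketched below under the assumption that a later message for a pid supersedes earlier ones (the message shape is invented for illustration):

```ruby
require 'set'

# Hypothetical sketch: collapse a burst of update messages so each pid is
# reindexed once, keeping only the LAST message per pid in original order.
def collapse_duplicates(messages)
  seen = Set.new
  messages.reverse_each.with_object([]) do |msg, kept|
    # Set#add? returns nil if the pid was already seen, skipping duplicates.
    kept.unshift(msg) if seen.add?(msg[:pid])
  end
end

burst = [
  { pid: 'druid:aa111', seq: 1 },
  { pid: 'druid:bb222', seq: 2 },
  { pid: 'druid:aa111', seq: 3 }
]
collapsed = collapse_duplicates(burst)
```

Here the first message for `druid:aa111` is dropped because a later one for the same pid is already queued.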
This may improve performance and reduce confusion, as so many services are deployed on lyberservices. Currently, roughly 95% of the traffic on the lyberservices web server is for the workflow service (mainly coming from `robot-master`).
The `to_solr` method currently queries Solr while it's building the Solr document (need to verify exactly where it makes the query). This adds an additional external service to the pipeline that might be replaced with data already in the ActiveFedora object.
The Ruby 2.3.x releases have performance improvements over the 2.2.x line that we currently have deployed.
The `generate_dublin_core` method parses the XSLT for every indexing request. We could cache that XSLT object and reuse it. One complication is whether its `.transform` method is thread-safe.
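A cautious version of that caching could parse once and serialize access to `.transform` behind a mutex until thread-safety is confirmed. The sketch below uses a stand-in "stylesheet" lambda so it runs without the real XSLT file or Nokogiri:

```ruby
# Hypothetical sketch: parse the Dublin Core XSLT once and reuse it, guarding
# transform with a mutex in case the underlying .transform is not thread-safe.
class CachedStylesheet
  def initialize(&parser)
    @parser = parser
    @mutex = Mutex.new
  end

  def transform(doc)
    @mutex.synchronize do
      @sheet ||= @parser.call # parse the XSLT only on first use
      @sheet.call(doc)
    end
  end
end

parses = 0
sheet = CachedStylesheet.new do
  parses += 1                   # count expensive XSLT parses
  ->(doc) { "<dc>#{doc}</dc>" } # stand-in for a real stylesheet's transform
end

first  = sheet.transform('title')
second = sheet.transform('title')
```

If `.transform` turns out to be thread-safe, the mutex could be narrowed to cover only the one-time parse.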
Our performance analysis shows that some methods, such as `generate_dublin_core`, are slow beyond what network latency explains. We could identify these slow methods and simplify and/or improve their implementations.
`OpenSSL::X509::Store#set_default_paths` is a time-consuming operation that's called for every HTTP request. We might be able to reuse the certificate store without having to rebuild it each time.
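The reuse idea amounts to building the store once at boot and handing the same object to every client. A minimal sketch, with the constant and helper names invented for illustration:

```ruby
require 'openssl'

# Hypothetical sketch: build the default certificate store once, at load time,
# instead of calling set_default_paths on every HTTP request.
DEFAULT_CERT_STORE = OpenSSL::X509::Store.new.tap(&:set_default_paths)

def http_client_store
  DEFAULT_CERT_STORE # hand the shared, already-built store to each client
end
```

One caveat to verify: whether the HTTP library ever mutates the store it is given, which would make sharing unsafe.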
We are not fully utilizing our system (CPU) resources. We could change our concurrency configuration to make better use of them.
We might be able to use higher-performance NAS hardware for the DOR `/data` NFS mount.
The workflow service hammers the Oracle server, and we might be able to improve performance through configuration and/or application changes.