This repository has been archived by the owner on May 28, 2024. It is now read-only.

Brainstorming

Darren Hardy edited this page Nov 15, 2016 · 1 revision

This page contains a list of DRAFT recommendations for development work that could improve performance and/or stability. They are currently NOT IN ANY ORDER; our goal is to use them to prompt discussion and to keep track of our brainstorming. Note that they all depend on first implementing monitoring, since we want to establish a baseline before we deploy any improvements.

Our analyses are here.

Cache Dor::Workflow objects

Most of the objects fetched by /dor/reindex/:pid are actually Dor::Workflow objects, which very rarely change. We could cache these within the Dor.load_instance method in dor-services. One complication is that Passenger spawns and respawns its worker processes, so a plain in-memory cache would be neither shared nor durable; we would need a shared caching strategy, such as ActiveSupport::Cache::FileStore or ActiveSupport::Cache::MemCacheStore.
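A minimal sketch of a read-through cache around object loading, assuming a store that responds to `read(key)` and `write(key, value, expires_in:)` the way `ActiveSupport::Cache::Store` does. The `loader` lambda stands in for `Dor.load_instance`, and the `CachingLoader` class, the `dor:load_instance:` key prefix, the one-day TTL and the `MemoryStore` test double are all hypothetical. The `cacheable` predicate keeps only the objects we want cached (e.g. Dor::Workflow objects), since we can't know an object's type before loading it.

```ruby
# Read-through cache around object loading (hypothetical names throughout).
class CachingLoader
  TTL = 24 * 60 * 60 # hypothetical expiry: one day, in seconds

  # cache:     ActiveSupport-style store (read / write with expires_in:)
  # loader:    callable standing in for Dor.load_instance
  # cacheable: callable deciding which loaded objects to keep
  def initialize(cache:, loader:, cacheable:)
    @cache = cache
    @loader = loader
    @cacheable = cacheable
  end

  def load_instance(pid)
    key = "dor:load_instance:#{pid}"
    cached = @cache.read(key)
    return cached unless cached.nil?

    obj = @loader.call(pid)
    # Only rarely-changing objects are written back to the shared store.
    @cache.write(key, obj, expires_in: TTL) if @cacheable.call(obj)
    obj
  end
end

# In-process test double; production would use a shared store such as FileStore.
class MemoryStore
  def initialize
    @data = {}
  end

  def read(key)
    @data[key]
  end

  def write(key, value, expires_in: nil) # expiry ignored by this double
    @data[key] = value
    true
  end
end
```

One thing to check before adopting this: shared stores serialize their entries, so the value we cache may need to be something simpler than a live ActiveFedora object (for example, the object's XML).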

Cache Dor::Collection and Dor::AdminPolicyObject

We may also want to cache collections and APOs, since they are looked up for every object (see this issue). In the context of to_solr, only the title information is extracted from these objects. One complication is that these titles may change, so the cache TTLs would need to be short.
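Since to_solr only needs the title, one option is to cache just the title string per collection/APO pid with a short TTL. The sketch below is a small thread-safe, per-process TTL cache; the `TtlCache` name, the injectable `clock`, and any TTL value (e.g. 300 seconds) are assumptions for illustration, and a shared ActiveSupport store with `expires_in` would serve the same purpose across processes.

```ruby
# Small thread-safe cache whose entries expire `ttl` seconds after being stored.
class TtlCache
  def initialize(ttl:, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @ttl = ttl
    @clock = clock # injectable so expiry can be tested without sleeping
    @entries = {}
    @lock = Mutex.new
  end

  # Returns the cached value for key, or computes and stores it on a miss/expiry.
  def fetch(key)
    now = @clock.call
    entry = @lock.synchronize { @entries[key] }
    return entry[:value] if entry && now < entry[:expires_at]

    # Computed outside the lock so a slow lookup doesn't block other keys;
    # two concurrent misses for the same key may both compute the value.
    value = yield
    @lock.synchronize { @entries[key] = { value: value, expires_at: now + @ttl } }
    value
  end
end
```

A call site might look like `TITLES.fetch(apo_pid) { title_for(apo_pid) }`, where `TITLES` and `title_for` are hypothetical.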

Reuse connections for Fedora HTTP client

We make many requests to Fedora for each /dor/reindex/:pid transaction, but we are not using "Keep-Alive" (persistent) connections to Fedora (need to double-check this in rubydora). We could implement persistent connections and connection caching.
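We haven't confirmed which HTTP client rubydora uses under the hood, so this sketch only illustrates the idea with the standard library's Net::HTTP: every request made inside a `Net::HTTP.start` block reuses the same TCP connection (HTTP/1.1 keep-alive) instead of reconnecting each time. The `fetch_all` helper is hypothetical.

```ruby
require "net/http"
require "uri"

# Fetches several paths from one host over a single persistent connection.
# All requests inside the start block share one TCP (and TLS) session.
def fetch_all(base_url, paths)
  uri = URI(base_url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    paths.map { |path| http.get(path).body }
  end
end
```

For Fedora, the savings come from skipping a TCP (and possibly TLS) handshake on every one of the many requests in a single reindex.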

Replace reindexing code in Argo to call dor_indexing_app

Argo has its own /dor/reindex/:pid route that calls dor-services, etc. It's currently used for Bulk Reindexing and for the blue Reindex button in the Argo UI. We could reimplement that route to issue an HTTP GET to dor-indexing-app's /dor/reindex/:pid, which would consolidate our code and streamline performance analysis (i.e., ensure that all reindexing goes through the same process).
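A minimal sketch of what Argo's route could delegate to, using Net::HTTP. The helper names and the base URL (`https://indexer.example.edu`) are hypothetical; only the /dor/reindex/:pid path and the use of GET come from this page.

```ruby
require "net/http"
require "uri"

# Builds the dor-indexing-app reindex URL for a pid, e.g.
# https://indexer.example.edu/dor/reindex/druid:ab123cd4567
def reindex_uri(base_url, pid)
  URI("#{base_url.chomp('/')}/dor/reindex/#{pid}")
end

# One HTTP GET against dor-indexing-app; raises on a non-2xx response
# so Argo can surface the failure to the user.
def reindex_via_indexing_app(base_url, pid)
  response = Net::HTTP.get_response(reindex_uri(base_url, pid))
  unless response.is_a?(Net::HTTPSuccess)
    raise "reindex of #{pid} failed: HTTP #{response.code}"
  end
  response
end
```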

Reduce duplication of reindexing requests in the pipeline

When objects are active in the workflows, they change frequently. We could research whether duplicate update messages are being collapsed effectively. That is, if messages for the same object occur in rapid succession while the queue is slow, we may be able to treat the earlier ones as "duplicates".
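Because a reindex reads the object's current state, only the last pending message per object really matters. A minimal sketch of collapsing a batch, assuming (hypothetically) that messages arrive as hashes with a `:pid` key; we haven't checked how our queue actually delivers them.

```ruby
# Collapses a batch of update messages so each pid is reindexed once.
# Keeps the last message per pid and preserves the order of the survivors.
def collapse_duplicates(messages)
  messages.reverse.uniq { |m| m[:pid] }.reverse
end
```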

Relocate workflow-service off of lyberservices

This may improve performance and reduce confusion, since so many services are deployed on lyberservices. Currently, roughly 95% of the traffic on lyberservices' web server is for the workflow service (mainly from robot-master).

Remove the Solr GET operation

The to_solr method currently queries Solr while it's building the Solr document (need to verify exactly where it makes the query). This adds another external service dependency to the pipeline, and the data it fetches might instead be available in the ActiveFedora object.

✔️(?) Upgrade VMs to Ruby 2.3.x

The Ruby 2.3.x releases have performance improvements over the 2.2.x line that we currently have deployed.

✔️ Cache XSLT objects

The generate_dublin_core method parses the XSLT on every indexing request. We could cache the parsed XSLT object and reuse it. One complication is whether its .transform method is thread-safe.
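One way to sidestep the thread-safety question is to compile the stylesheet once per thread rather than once per process. A minimal sketch, assuming the compile step is passed in as a block; if generate_dublin_core uses Nokogiri (not verified here), that block would be something like `Nokogiri::XSLT(File.read(path))`. The `cached_stylesheet` helper and the `:dc_xslt` key are hypothetical.

```ruby
# Compiles a stylesheet at most once per thread and reuses it afterwards.
# Each thread gets its own copy, so concurrent #transform calls never share
# a compiled stylesheet.
def cached_stylesheet(key)
  thread = Thread.current
  stylesheet = thread.thread_variable_get(key)
  return stylesheet if stylesheet

  stylesheet = yield # the expensive parse/compile step
  thread.thread_variable_set(key, stylesheet)
  stylesheet
end
```

The trade-off is one compiled copy per worker thread instead of one per process, which is still far fewer parses than one per request.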

Simplify implementation of slow methods

Our performance analysis shows that some methods, such as generate_dublin_core, are slow for reasons beyond network latency. We could identify these slow methods and simplify and/or improve their implementation.
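To confirm which methods are slow (and later, whether a rewrite helped), one lightweight option is to wrap suspected calls in a timer. A minimal sketch using a monotonic clock; the `MethodTimer` class and its labels are hypothetical and not how our existing analyses were produced.

```ruby
# Accumulates elapsed wall-clock time per label so slow methods stand out.
class MethodTimer
  attr_reader :totals

  def initialize
    @totals = Hash.new(0.0)
    @lock = Mutex.new # totals may be updated from several request threads
  end

  # Runs the block, records its duration under label, and returns its value.
  def time(label)
    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
  ensure
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
    @lock.synchronize { @totals[label] += elapsed }
  end
end
```

For example, `TIMER.time(:generate_dublin_core) { generate_dublin_core }` (with `TIMER` a hypothetical shared instance) would accumulate time for that method alone.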

Reuse the certificate store

OpenSSL::X509::Store#set_default_paths is a time-consuming operation that's called for every HTTP request. We might be able to reuse the certificate store without having to rebuild it each time.
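A minimal sketch of building the store once per process and handing the same instance to each connection. It uses Net::HTTP's `cert_store=` for illustration; which client in our stack actually rebuilds the store per request still needs to be tracked down, and `SharedCertStore` and `build_http` are hypothetical names.

```ruby
require "net/http"
require "openssl"
require "uri"

# Builds the certificate store (including the costly set_default_paths)
# once per process and returns the same instance on every call.
module SharedCertStore
  @store = nil
  @lock = Mutex.new

  def self.store
    @lock.synchronize do
      @store ||= OpenSSL::X509::Store.new.tap(&:set_default_paths)
    end
  end
end

# Creates (but does not open) a connection that reuses the shared store.
def build_http(uri)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == "https")
  http.cert_store = SharedCertStore.store
  http
end
```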

✔️ Tune our concurrency configuration to improve throughput

We are not fully utilizing our system (CPU) resources. We can change our concurrency configuration to make better use of them.

Improve disk read/write performance on DOR production

We might be able to use higher performance NAS hardware for the DOR /data NFS mount.

Improve Workflow Service's Oracle instance performance

The workflow service places a heavy load on the Oracle server, and we might be able to improve performance through configuration and/or application changes.