
ReqMgr2 MicroService Transferor

Alan Malta Rodrigues edited this page May 15, 2019 · 23 revisions

This page describes the architecture, behaviour and APIs of the ReqMgr2 MicroService Transferor module, which is responsible for looking up input data at the global workqueue level and creating PhEDEx subscriptions.

For the record - given that those abbreviations will be mentioned several times in this document - here is their meaning:

  • GQE: global workqueue element
  • LQE: local workqueue element

Relying on global workqueue (first proposal)

This is the initial proposal discussed with Valentin. It assumes we let the Global WorkQueue discover exactly which input datasets/blocks are to be used by the workflow. It also adds a new workqueue element status which would drive work acquisition.

Here is how we envision it to work:

  1. Unified assigns a workflow (request transition from assignment-approved to assigned).
  2. Global WorkQueue queries ReqMgr2 for workflows in the assigned status. It then parses the request spec and creates chunks of work (GQE) in the New status. Once ALL elements have been created, the request goes to the acquired status (right now this happens asynchronously in a ReqMgr2 CherryPy thread; TBD: investigate whether it is safe to keep it that way).
  3. The MicroService kicks in and queries ReqMgr2 for requests in the acquired status.
      • IF there are no requests, it exits and waits until the next cycle.
  4. Otherwise, it has to fetch the Campaign configuration(s) for every request (this can be cached!), which will be used to decide how many replicas have to be made and where the data has to be subscribed.
      • IF a Campaign configuration cannot be found, then we have to create an alert and skip that workflow (leaving it in the acquired status).
  5. Otherwise, given the list of requests in acquired, the MicroService then talks to the Global WorkQueue and fetches all their GQE in the New status.
      • If there are none, then all subscriptions have been made and the request should soon move to the staging status.
      • Otherwise, run the Transferor algorithm - taking into account the Campaign configuration, from CouchDB - and create PhEDEx subscriptions.
      • PS.: Check whether parent blocks are available in the GQE.
  6. Set the GQE status to Available for all the PhEDEx subscriptions that were successfully made (or for GQE without any input data!).
  7. TBD: update the request status to staging. Or let the ReqMgr2 CherryPy thread take care of that...

The process above has to be executed for every single workflow. By design, the agents won't pull any GQE work that is still without a PhEDEx subscription (in New status).

If a PhEDEx subscription fails, the GQE remains in the New status, and the request should likewise remain in the acquired status.
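The cycle described above can be sketched as a single function. This is only an illustrative sketch, not the WMCore implementation: the function name `transferor_cycle`, the `subscribe` callback (standing in for the Transferor algorithm plus the PhEDEx call), the `alerts` list and the dictionary layout of requests and GQE (`RequestName`, `Campaign`, `Inputs`) are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Optional


def transferor_cycle(
    acquired_requests: List[dict],
    campaigns: Dict[str, dict],
    new_gqes_by_request: Dict[str, List[dict]],
    subscribe: Callable[[dict, dict], bool],
    alerts: Optional[List[str]] = None,
) -> Dict[str, List[dict]]:
    """Run one cycle and return, per request, the GQE that may become Available.

    All names and data layouts here are hypothetical, chosen for illustration.
    """
    alerts = alerts if alerts is not None else []
    available: Dict[str, List[dict]] = {}
    for request in acquired_requests:
        name = request["RequestName"]
        campaign = campaigns.get(request["Campaign"])
        if campaign is None:
            # No campaign configuration: alert and skip, request stays acquired.
            alerts.append(f"No campaign configuration for {name}")
            continue
        ready = []
        for gqe in new_gqes_by_request.get(name, []):
            # GQE without input data need no subscription at all.
            # A failed subscription leaves the GQE in New for the next cycle.
            if not gqe.get("Inputs") or subscribe(gqe, campaign):
                ready.append(gqe)
        available[name] = ready
    return available
```

A request that is skipped for lack of a campaign configuration does not appear in the returned mapping, and a GQE whose subscription failed is simply left out, so both are picked up again on the next cycle.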

Relying on ReqMgr2 (second proposal)

This model assumes that the MicroService would have to parse the request spec and find out exactly which input datasets/blocks are needed. It also assumes there would be another (micro)service monitoring those transfer requests and driving work acquisition. Here is how we envision it to work:

  1. Unified assigns a workflow (request transition from assignment-approved to assigned).
  2. MS Transferor queries for requests in assigned, and parses their spec in order to find whether there are any input files that need to be transferred. The information that needs to be taken into consideration is:
      • zero or one input primary dataset
      • zero or one parent dataset
      • zero to many pileup datasets
  3. With the overall list of data to replicate, create transfer requests based on:
      • campaign configuration
      • unified configuration
      • SiteWhitelist used during the workflow assignment
      • estimated amount of work
      • anything else from the "black box logic"
  4. Persist the transfer request IDs somewhere (Oracle, CouchDB, or anywhere else that is not in memory). We might have to distinguish standard input subscriptions from pileup subscriptions.
  5. Once all transfer requests have been successfully made, update the request status assigned -> staging
      • if there is nothing to be transferred (no input at all), then update the request status once again staging -> staged
  6. MS TransferMonit (yes, a new service - or we keep relying on Unified for a while to monitor those) fetches workflows in status staging, finds out their transfer request IDs and checks the transfer completion (updating the transfer completion status in the database). Some use cases would be (this is tricky!!):
      • if all transfers are completed, move the request status staging -> staged
      • if pileup transfers are completed AND some(?) input blocks are completed, move the request status staging -> staged
      • transfers not completed, just update the database with their completion
      • allow Ops to bypass this request transition, if needed
  7. Global WorkQueue queries ReqMgr2 for workflows in the staged status and performs its normal action, creating GQE in the usual Available status.
  8. From this point on, agents can start pulling work down and processing it.
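The request status transitions in this proposal can be sketched as a small pure function. This is a minimal sketch under stated assumptions: the name `next_status` and the transfer record layout (`kind`, `completion` as a percentage) are hypothetical, and only the simplest rule is implemented (all transfers completed). The partial-completion rule and the Ops bypass are left out because the text above still marks them as open.

```python
from typing import Dict, List


def next_status(status: str, transfers: List[Dict]) -> str:
    """Return the request status after one pass of the (hypothetical) services.

    Each transfer is assumed to look like {"kind": "pileup", "completion": 100.0}.
    """
    if status == "assigned":
        # Assumes every transfer request was made successfully by MS Transferor.
        return "staging"
    if status == "staging":
        # An empty list (no input at all) also satisfies this condition,
        # which gives the staging -> staged shortcut described above.
        if all(t["completion"] >= 100 for t in transfers):
            return "staged"
    # Transfers still running, or a status this sketch does not handle.
    return status
```

For a workflow without any input, two consecutive calls take it from assigned to staging and then to staged, matching the shortcut in the list above.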

Open questions

Do we want to keep track of the subscriptions (persisting data somewhere)?

Do we want to monitor the subscriptions and act upon issues and/or stuck transfers? Or do we just assume transfers will eventually succeed? Alerts have to be created for bad input placement (bad transfers) as well.

ReqMgr2 MicroService APIs

The MicroService is a data-service which provides a set of APIs to perform certain actions. Its general architecture is shown in the figure below:

[Figure: MicroServiceArchitecture]

In particular, the WMCore MicroService provides an interface to perform Unified actions, such as fetching requests from the ReqMgr2 data-services, obtaining the information needed for data placement, and placing requests for assigned workflows into the PhEDEx data placement system.

Available APIs

GET APIs

  • /microservice/data provides basic information about MicroService. It returns the following information:
{"result": [
 {"microservice": "UnifiedTransferorManager", "request": {}, "results": {"status": {}}}
]}
  • /microservice/data/status provides detailed information about requests in MicroService. It returns the following information:
curl --cert $X509_USER_CERT --key $X509_USER_KEY -X GET -H "Content-type: application/json" https://cmsweb-testbed.cern.ch/microservice/data/status

{"result": [
 {"microservice": "UnifiedTransferorManager", "request": {}, "results": {"status": {}}}
]}
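The same status query can be issued from Python using only the standard library. This is a hedged sketch rather than an official client: the helper names `build_status_request` and `fetch_status` are made up for this example, and it assumes the default SSL context trusts the server certificate, which may require additional CA configuration on your machine.

```python
import json
import ssl
import urllib.request


def build_status_request(base_url: str) -> urllib.request.Request:
    """Build (but do not send) the GET request for /microservice/data/status."""
    url = base_url.rstrip("/") + "/microservice/data/status"
    return urllib.request.Request(
        url, headers={"Content-type": "application/json"}, method="GET"
    )


def fetch_status(base_url: str, cert: str, key: str) -> dict:
    """Send the request, authenticating with an X509 certificate and key."""
    context = ssl.create_default_context()
    context.load_cert_chain(certfile=cert, keyfile=key)
    request = build_status_request(base_url)
    with urllib.request.urlopen(request, context=context) as response:
        return json.load(response)
```

A call mirroring the curl command above would be `fetch_status("https://cmsweb-testbed.cern.ch", os.environ["X509_USER_CERT"], os.environ["X509_USER_KEY"])`, after `import os`.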

POST APIs

  • /microservice/data allows sending a specific request to the MicroService.

Post a request to process a given state:

curl -X POST -H "Content-type: application/json" -d '{"request":{"process":"assignment-approved"}}' http://localhost:8822/microservice/data

Obtain results for a specific workflow:

curl --cert $X509_USER_CERT --key $X509_USER_KEY -X POST -H "Content-type: application/json" -d '{"request":{"task":"amaltaro_StepChain_DupOutMod_Mar2019_Validation_190322_105219_7255"}}' https://cmsweb-testbed.cern.ch/microservice/data

{"result": [
 {"amaltaro_StepChain_DupOutMod_Mar2019_Validation_190322_105219_7255": {"completed": 100}}
]}
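For scripting, the POST body and the response shown above can be handled with a couple of small helpers. This is a sketch based only on the example payloads on this page: the function names are hypothetical, and reading `completed` as a completion percentage is an assumption.

```python
import json
from typing import Dict


def task_payload(workflow: str) -> bytes:
    """Encode the JSON body used to ask about one workflow."""
    return json.dumps({"request": {"task": workflow}}).encode("utf-8")


def completion_by_workflow(response_text: str) -> Dict[str, float]:
    """Map each workflow name in a response to its "completed" value."""
    completion = {}
    for entry in json.loads(response_text)["result"]:
        for workflow, info in entry.items():
            # Skip entries that do not follow the {"completed": ...} layout.
            if isinstance(info, dict) and "completed" in info:
                completion[workflow] = info["completed"]
    return completion
```

The bytes returned by `task_payload` can be passed as the `data` argument of a `urllib.request.Request` with `method="POST"`.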