doc: refacto doc dev
steuxyo authored and dyoussef committed Nov 22, 2024
1 parent 4ae4173 commit 779cfa4
Showing 17 changed files with 740 additions and 522 deletions.
125 changes: 125 additions & 0 deletions docs/source/developer_guide/algorithm_conception.rst
@@ -0,0 +1,125 @@
.. role:: raw-html(raw)
:format: html


.. _algorithm_conception:

====================
Algorithm Conception
====================


.. tabs::

.. tab:: Grid Generation

:raw-html:`<h1>Method</h1>`


:raw-html:`<h1>Limits of the method</h1>`


:raw-html:`<h1>Implementation</h1>`




.. tab:: Resampling

:raw-html:`<h1>Method</h1>`


:raw-html:`<h1>Limits of the method</h1>`


:raw-html:`<h1>Implementation</h1>`


.. tab:: Sparse matching

:raw-html:`<h1>Method</h1>`


:raw-html:`<h1>Limits of the method</h1>`


:raw-html:`<h1>Implementation</h1>`


.. tab:: Grid correction

:raw-html:`<h1>Method</h1>`


:raw-html:`<h1>Limits of the method</h1>`


:raw-html:`<h1>Implementation</h1>`


.. tab:: Disparity a priori computation

:raw-html:`<h1>Method</h1>`


:raw-html:`<h1>Limits of the method</h1>`


:raw-html:`<h1>Implementation</h1>`

.. tab:: DEM generation

:raw-html:`<h1>Method</h1>`


:raw-html:`<h1>Limits of the method</h1>`


:raw-html:`<h1>Implementation</h1>`


.. tab:: Dense matching

:raw-html:`<h1>Method</h1>`


:raw-html:`<h1>Limits of the method</h1>`


:raw-html:`<h1>Implementation</h1>`


.. tab:: Triangulation

:raw-html:`<h1>Method</h1>`


:raw-html:`<h1>Limits of the method</h1>`


:raw-html:`<h1>Implementation</h1>`


.. tab:: Point cloud filtering

:raw-html:`<h1>Method</h1>`


:raw-html:`<h1>Limits of the method</h1>`


:raw-html:`<h1>Implementation</h1>`


.. tab:: Rasterization

:raw-html:`<h1>Method</h1>`


:raw-html:`<h1>Limits of the method</h1>`


:raw-html:`<h1>Implementation</h1>`



@@ -1,10 +1,14 @@

.. role:: raw-html(raw)
:format: html

.. _application:

Application
===========
:raw-html:`<h1>Application</h1>`


**Overview**

Overview
--------

An *application* is a main step of the CARS 3D reconstruction framework.
It contains algorithm methods.
@@ -17,13 +21,13 @@ It is composed of:
* Some abstract applications (each one defines a main 3D step)
* Some subclasses associated with each abstract application, each containing a specific algorithm
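
The abstract application / subclass split listed above can be illustrated with a minimal sketch. The class names `AbstractDenseMatching` and `ConstantDisparity`, the `run` signature, and the toy algorithm are all hypothetical and chosen only for illustration; they are not the CARS API.

```python
from abc import ABC, abstractmethod


class AbstractDenseMatching(ABC):
    """Hypothetical abstract application: defines one main 3D step."""

    @abstractmethod
    def run(self, left_tile, right_tile):
        """Each subclass provides its own algorithm for this step."""


class ConstantDisparity(AbstractDenseMatching):
    """Hypothetical subclass: a toy algorithm returning a constant map."""

    def __init__(self, disparity=0):
        self.disparity = disparity

    def run(self, left_tile, right_tile):
        # Return one disparity value per pixel of the left tile.
        return [[self.disparity for _ in row] for row in left_tile]
```

The abstract class fixes the interface of the step, so callers can swap one algorithm for another without changing how the step is invoked.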

.. figure:: ../images/application_concept.png
.. figure:: ../../images/application_concept.png
:align: center
:alt: Applications


Example
-------
**Example**


Let's take an example of `dense_matching` application to describe the main steps:

@@ -1,10 +1,15 @@
.. _cars_dataset:

CarsDataset
===========

Goals
-----
.. role:: raw-html(raw)
:format: html

:raw-html:`<h1>CarsDataset</h1>`



**Goals**


*CarsDataset* is the CARS internal data structure.
The data used within CARS imposes some restrictions that the structure must manage:
@@ -16,17 +21,17 @@ The data used within CARS imposes some restrictions that the structure must mana

*CarsDataset* aims at defining a generic data structure that takes these constraints into account.

Details
-------
**Details**


.. figure:: ../images/Carsdataset.png
.. figure:: ../../images/Carsdataset.png
:align: center
:alt: CarsDataset concept

Here is an example of one dataset with all needed information.

Attributes
^^^^^^^^^^
*Attributes*


* *type* : CarsDataset can manage `xarray.Dataset` or `pandas.DataFrame`
* *tiles* : List of lists of `xarray.Dataset` or `pandas.DataFrame`. Includes overlaps.
@@ -41,8 +46,8 @@ Attributes
It is important to note that a tile, even if you decided to use `xarray.Dataset` or `pandas.DataFrame`, could be a `delayed` or a `future` in the
`dask` sense. See next sections.

Functions
^^^^^^^^^
*Functions*


*CarsDataset* integrates all functions for manipulating the data throughout the framework:

@@ -1,63 +1,62 @@
.. _cluster_mp:

Cluster Multiprocessing
=======================
**Cluster Multiprocessing**

Goals
-----

The multiprocessing (MP) cluster facilitates the distribution of computing for the :ref:`application` and the management of :ref:`cars_dataset` data.
**Goals**


Details
-------
The multiprocessing (MP) cluster facilitates the distribution of computing for the application and the management of cars_dataset data.


**Details**

The MP cluster is built upon `Python's multiprocessing`_ module using the forkserver mode. In this mode, a pool of worker processes handles the parallel execution of functions. Each worker process is single-threaded, and only essential resources are inherited.
By design, CARS uses a disk-based registry for data storage, distributing data across the processes. If specified in the configuration, data distribution can be done in memory instead, with degraded performance.
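
As a minimal sketch of what a forkserver worker pool looks like with the standard library alone: the helper names `get_worker_context` and `run_in_pool` are hypothetical, and this is not the CARS implementation, which adds timeouts, logging wrappers, and the disk-based registry.

```python
import multiprocessing as mp


def get_worker_context():
    """Return a multiprocessing context that uses the forkserver start method."""
    return mp.get_context("forkserver")


def run_in_pool(func, items, nb_workers=2):
    """Apply func to each item in a pool of worker processes.

    func must be defined at module level so that the workers can import it.
    """
    with get_worker_context().Pool(processes=nb_workers) as pool:
        return pool.map(func, items)
```

With the forkserver start method, `run_in_pool` should be called from code protected by `if __name__ == "__main__":`, because worker processes re-import the main module. The forkserver method is only available on platforms that support it (typically Unix).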


.. _`Python's multiprocessing`: https://docs.python.org/3/library/multiprocessing.html

How it works
------------
**How it works**


The main class is the MP Cluster, which inherits from the AbstractCluster class. It is instantiated within the orchestrator.

Inspired by the Dask cluster approach, the MP cluster builds a list of delayed tasks and factorizes the tasks that can be run sequentially.
Factorizing tasks reduces the number of tasks at no cost in run time. Fewer tasks means fewer dumps to disk, which saves time.
For each task that has available data (intermediate results input from the linked previous task), the MP cluster transforms the delayed task into an MpFutureTask.
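
The factorization idea can be sketched as follows. `Delayed`, `evaluate`, and `factorize` are hypothetical names introduced for this illustration, not the CARS classes, and the sketch fuses every delayed input without applying the real criteria that decide whether a task is factorizable.

```python
from dataclasses import dataclass
from functools import partial
from typing import Any, Callable, Tuple


@dataclass
class Delayed:
    """Hypothetical delayed task: a function and its (possibly delayed) arguments."""

    func: Callable[..., Any]
    args: Tuple[Any, ...] = ()


def evaluate(task):
    """Run a task after recursively running the delayed tasks it depends on."""
    resolved = [evaluate(a) if isinstance(a, Delayed) else a for a in task.args]
    return task.func(*resolved)


def factorize(task):
    """Fuse a task and its delayed inputs into one task with no pending input."""
    return Delayed(func=partial(evaluate, task))
```

Here two chained tasks become a single task executed by one worker, so the intermediate result never needs to be dumped and reloaded between them.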

Upon completion of these jobs, the results are saved on disk, and the reference is passed to the next job. The :ref:`refresh_task_cache` function serves as the primary control function of the MP cluster.
Upon completion of these jobs, the results are saved on disk, and the reference is passed to the next job. The refresh_task_cache function serves as the primary control function of the MP cluster.

The next sections illustrate the architecture of the MP cluster, while the API provides detailed functions that offer more insight into interactions and operations.

Class diagram
^^^^^^^^^^^^^
.. image:: ../images/mp_cluster.svg
*Class diagram*

.. image:: ../../images/mp_cluster.svg
:align: center

API detailed functions
^^^^^^^^^^^^^^^^^^^^^^
*API detailed functions*


**init**
++++++++

Cluster allocation using a Python thread pool.
The worker pool is set up in forkserver mode with a specified number of workers, job timeouts, and wrapper configuration for cluster logging.

**create_task_wrapped**
+++++++++++++++++++++++

Declare task as **MpDelayed** within the cluster.
**MpDelayed** are instantiated using the **mp_delayed_builder** wrapper builder.
Furthermore, the wrapper provides parameters for the job logger.


**start_tasks**
+++++++++++++++

Factorize tasks with **mp_factorizer.factorize_tasks** and add future tasks to the cluster queue. The cluster processes tasks from the queue.
Transform **MpDelayed** into **MpJob** with rec_start, and calculate task dependencies for each job.


**mp_factorizer.factorize_tasks**
+++++++++++++++++++++++++++++++++

Take as input a list of final **MpDelayed** and factorize all the dependent tasks that are *factorizable*.

A task **t** of the class **MpDelayedTask** is *factorizable* if :
@@ -81,7 +80,7 @@ will be replaced by output of **t1** and then **t2** will be computed. Thus, the


**rec_start**
+++++++++++++

Transform delayed tasks to MpJob and create MpFuture objects to retrieve results.

For each task:
@@ -98,9 +97,9 @@ For each task:

.. _refresh_task_cache:

# refresh_task_cache


**refresh_task_cache**
++++++++++++++++++++++
At each refresh:

1. Sleep (refresh time).
@@ -137,41 +136,41 @@ At each refresh:


**get_ready_failed_tasks**
++++++++++++++++++++++++++

Retrieve the new ready tasks and failed tasks.


**get_tasks_without_deps**
++++++++++++++++++++++++++

A static method that returns the list of tasks that are ready and have no pending dependencies, excluding those identified as initial tasks.
The initial tasks of the graph have no priority. To make disk usage more efficient, the cluster starts with N initial tasks (where N equals the number of workers) and gives priority to the tasks connected to them. After finishing a segment of the task graph, the cluster introduces N new initial tasks to continue the process.
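
A simplified sketch of this priority rule is given below. The function names, the dictionary-based task graph, and the `done` set are assumptions made for the illustration; the sketch also ignores tasks that are still running, which the real cluster has to track.

```python
def tasks_without_deps(dependencies, done, initial_tasks):
    """Return non-initial tasks whose dependencies are all completed."""
    ready = []
    for task, deps in dependencies.items():
        if task in done or task in initial_tasks:
            continue
        if all(dep in done for dep in deps):
            ready.append(task)
    return ready


def next_batch(dependencies, done, initial_tasks, nb_workers):
    """Prefer connected ready tasks; otherwise start up to nb_workers initial tasks."""
    ready = tasks_without_deps(dependencies, done, initial_tasks)
    if ready:
        return ready
    pending_initial = [task for task in initial_tasks if task not in done]
    return pending_initial[:nb_workers]
```

Finishing the tasks downstream of an initial task before starting new initial tasks limits how many intermediate results sit on disk at the same time.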


**future_iterator**
+++++++++++++++++++

Enable the initiation of all tasks from the orchestrator controller.


**get_job_ids_from_futures**
++++++++++++++++++++++++++++

Obtain a list of job IDs from the future list.

**replace_job_by_data**
+++++++++++++++++++++++

Substitute MpJob instances in lists or dicts with their actual data.
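
A minimal sketch of such a substitution, assuming a hypothetical `JobRef` placeholder class and a `results` dictionary keyed by job ID (neither is the CARS data model):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobRef:
    """Hypothetical placeholder standing for the result of a job."""

    job_id: int


def substitute_jobs(obj, results):
    """Replace every JobRef found in nested lists or dicts by its result."""
    if isinstance(obj, JobRef):
        return results[obj.job_id]
    if isinstance(obj, list):
        return [substitute_jobs(item, results) for item in obj]
    if isinstance(obj, dict):
        return {key: substitute_jobs(value, results) for key, value in obj.items()}
    # Any other value is passed through unchanged.
    return obj
```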


**compute_dependencies**
++++++++++++++++++++++++

Compute job result dependencies from args and kw_args.
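
One way to picture this step is a traversal that collects the job IDs referenced by the arguments. `JobRef` and `collect_dependencies` are hypothetical names used only in this sketch, which assumes dependencies appear as placeholders nested in lists, tuples, or dicts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobRef:
    """Hypothetical placeholder standing for the result of a job."""

    job_id: int


def collect_dependencies(args, kwargs):
    """Return the job IDs referenced in args and kwargs, in first-seen order."""
    found = []

    def visit(obj):
        if isinstance(obj, JobRef):
            if obj.job_id not in found:
                found.append(obj.job_id)
        elif isinstance(obj, (list, tuple)):
            for item in obj:
                visit(item)
        elif isinstance(obj, dict):
            for value in obj.values():
                visit(value)

    visit(args)
    visit(kwargs)
    return found
```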


**MpFutureTask**
++++++++++++++++

A multiprocessing version of the Dask distributed.future.
This class encapsulates data and references to job cluster threads.
It also facilitates the sharing of references between jobs and cache-cleaning operations.

**log_error_hook**
++++++++++++++++++

A custom Exception hook to manage cluster thread exceptions.