-
Notifications
You must be signed in to change notification settings - Fork 10
/
Copy pathinternal.html
354 lines (240 loc) · 24.2 KB
/
internal.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
<!DOCTYPE html>
<!--[if IE 8]><html class="no-js lt-ie9" lang="en" > <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js" lang="en" > <!--<![endif]-->
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Internal architecture — Digdag 0.10.5 documentation</title>
<script type="text/javascript" src="_static/js/modernizr.min.js"></script>
<script type="text/javascript" id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
<script type="text/javascript" src="_static/jquery.js"></script>
<script type="text/javascript" src="_static/underscore.js"></script>
<script type="text/javascript" src="_static/doctools.js"></script>
<script type="text/javascript" src="_static/language_data.js"></script>
<script type="text/javascript" src="_static/js/theme.js"></script>
<link rel="stylesheet" href="_static/css/theme.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="_static/custom.css" type="text/css" />
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="Digdag metrics (experimental)" href="metrics.html" />
<link rel="prev" title="Command Executor" href="command_executor.html" />
</head>
<body class="wy-body-for-nav">
<div class="wy-grid-for-nav">
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-scroll">
<div class="wy-side-nav-search" >
<a href="index.html" class="icon icon-home"> Digdag
</a>
<div class="version">
0.10
</div>
<div role="search">
<form id="rtd-search-form" class="wy-form" action="search.html" method="get">
<input type="text" name="q" placeholder="Search docs" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
</div>
</div>
<div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
<ul class="current">
<li class="toctree-l1"><a class="reference internal" href="getting_started.html">Getting started</a></li>
<li class="toctree-l1"><a class="reference internal" href="architecture.html">Architecture</a></li>
<li class="toctree-l1"><a class="reference internal" href="concepts.html">Concepts</a></li>
<li class="toctree-l1"><a class="reference internal" href="workflow_definition.html">Workflow definition</a></li>
<li class="toctree-l1"><a class="reference internal" href="scheduling_workflow.html">Scheduling workflow</a></li>
<li class="toctree-l1"><a class="reference internal" href="operators.html">Operators</a></li>
<li class="toctree-l1"><a class="reference internal" href="command_reference.html">Command reference</a></li>
<li class="toctree-l1"><a class="reference internal" href="python_api.html">Language API - Python</a></li>
<li class="toctree-l1"><a class="reference internal" href="ruby_api.html">Language API - Ruby</a></li>
<li class="toctree-l1"><a class="reference internal" href="rest_api.html">REST API</a></li>
<li class="toctree-l1"><a class="reference internal" href="command_executor.html">Command Executor</a></li>
<li class="toctree-l1 current"><a class="current reference internal" href="#">Internal architecture</a><ul>
<li class="toctree-l2"><a class="reference internal" href="#task-logging">Task logging</a></li>
<li class="toctree-l2"><a class="reference internal" href="#database">Database</a><ul>
<li class="toctree-l3"><a class="reference internal" href="#project-repository">Project repository</a></li>
<li class="toctree-l3"><a class="reference internal" href="#schedule-store">Schedule store</a></li>
<li class="toctree-l3"><a class="reference internal" href="#session-store">Session store</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="#api-server-agent-workflow-executor-and-schedule-executor">API server, agent, workflow executor, and schedule executor</a></li>
<li class="toctree-l2"><a class="reference internal" href="#task-queue">Task queue</a></li>
<li class="toctree-l2"><a class="reference internal" href="#project-archive-storage">Project archive storage</a></li>
<li class="toctree-l2"><a class="reference internal" href="#command-executor">Command executor</a></li>
<li class="toctree-l2"><a class="reference internal" href="#extension-mechanisms">Extension mechanisms</a><ul>
<li class="toctree-l3"><a class="reference internal" href="#extension">Extension</a></li>
<li class="toctree-l3"><a class="reference internal" href="#system-plugins">System plugins</a></li>
<li class="toctree-l3"><a class="reference internal" href="#dynamic-plugins">Dynamic plugins</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="#next-steps">Next steps</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="metrics.html">Digdag metrics (experimental)</a></li>
<li class="toctree-l1"><a class="reference internal" href="community_contributions.html">Community Contributions</a></li>
<li class="toctree-l1"><a class="reference internal" href="releases.html">Release Notes</a></li>
<li class="toctree-l1"><a class="reference internal" href="logo.html">Logo</a></li>
</ul>
</div>
</div>
</nav>
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap">
<nav class="wy-nav-top" aria-label="top navigation">
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="index.html">Digdag</a>
</nav>
<div class="wy-nav-content">
<div class="rst-content">
<div role="navigation" aria-label="breadcrumbs navigation">
<ul class="wy-breadcrumbs">
<li><a href="index.html">Docs</a> »</li>
<li>Internal architecture</li>
<li class="wy-breadcrumbs-aside">
<a href="_sources/internal.md.txt" rel="nofollow"> View page source</a>
</li>
</ul>
<hr/>
</div>
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
<div itemprop="articleBody">
<div class="section" id="internal-architecture">
<h1>Internal architecture<a class="headerlink" href="#internal-architecture" title="Permalink to this headline">¶</a></h1>
<p>This guide explains implementation details of Digdag. Understanding the internal architecture is helpful for Digdag system administrators and developers to understand why Digdag behaves as described at <a class="reference external" href="architecture.html">architecture</a> page.</p>
<div class="section" id="task-logging">
<h2>Task logging<a class="headerlink" href="#task-logging" title="Permalink to this headline">¶</a></h2>
<p>When a task runs, a logger (SLF4j + logback) collects its log messages and passes them to a task logger set at a thread-local storage. When a task starts, task logger creates a new file and writes messages to the file. As the file grows larger than certain limit, it uploads the file to a storage (note: there is an optimization for local execution with local file system logger: logs are directly appended to the final destionation file). LogServer is the SPI interface that implements this storage.</p>
<p>When the task logger uploads a file, it requests a digdag server to send a direct upload URL first. If LogServer supports a temporary pre-signed HTTP URL to upload a file, the server returns a pre-signed URL. Then, the task logger uploads the file to the URL directly. Otherwise, the task logger uploads the file to a server, and the server uploads the contents using LogServer. LogServer stores the file using full name of the task as its path name.</p>
<p>When a client wants log files, the client requests a digdag server to send list of files that have a common path prefix. This prefix may be full name of a task to get logs of a task, or name of a parent task to get all logs of its children. When requested, digdag server gets the list of files from LogServer. LogServer must be capable to list files by prefix.</p>
<p>If LogServer supports a temporary pre-signed HTTP URL to download files, LogServer returns the URL in addition to file name in the list of files. Digdag server returns the list to a client. With this way, downloading traffic won’t go throw the server. S3 LogServer supports pre-signed URL, for example. Otherwise, Digdag server returns the list without direct download URL. Clients will request the server to send the files, and the server fetches the contents using LogServer. Default local filesystem LogServer doesn’t support pre-signed URL, for example.</p>
</div>
<div class="section" id="database">
<h2>Database<a class="headerlink" href="#database" title="Permalink to this headline">¶</a></h2>
<p>Digdag stores all data in a database (PostgreSQL or H2 database). When digdag runs in local mode, it uses in-memory H2 database.</p>
<div class="section" id="project-repository">
<h3>Project repository<a class="headerlink" href="#project-repository" title="Permalink to this headline">¶</a></h3>
<p>Tables <code class="docutils literal notranslate"><span class="pre">projects</span></code>, <code class="docutils literal notranslate"><span class="pre">revisions</span></code>, <code class="docutils literal notranslate"><span class="pre">revision_archives</span></code>, <code class="docutils literal notranslate"><span class="pre">workflow_definitions</span></code>, and <code class="docutils literal notranslate"><span class="pre">workflow_configs</span></code> tables store projects.</p>
<p><code class="docutils literal notranslate"><span class="pre">projects</span></code> table stores names of projects.</p>
<p><code class="docutils literal notranslate"><span class="pre">revisions</span></code> table stores revisions of projects. It contains information about where project archive is stored. If <code class="docutils literal notranslate"><span class="pre">archive_type</span></code> column is “db”, project archive is stored in <code class="docutils literal notranslate"><span class="pre">revision_archives</span></code> table. Otherwise, a project archive is stored using a storage plugin (such as S3).</p>
<p><code class="docutils literal notranslate"><span class="pre">workflow_definitions</span></code> table stores names of workflows. This table is always used with <code class="docutils literal notranslate"><span class="pre">workflow_configs</span></code> table because actual configurations of workflows are stored there. When a new revision is uploaded, multiple rows are inserted to <code class="docutils literal notranslate"><span class="pre">workflow_definitions</span></code> table because a project can contain multiple workflows. However, in most of cases, only a few workflows are changed. To optimize this workload, a same <code class="docutils literal notranslate"><span class="pre">workflow_configs</span></code> row is shared by multiple <code class="docutils literal notranslate"><span class="pre">workflow_definitions</span></code> if their configurations are identical.</p>
</div>
<div class="section" id="schedule-store">
<h3>Schedule store<a class="headerlink" href="#schedule-store" title="Permalink to this headline">¶</a></h3>
<p><code class="docutils literal notranslate"><span class="pre">schedules</span></code> table stores status of active schedules. Actual configuration of schedules are stored in project repository. <code class="docutils literal notranslate"><span class="pre">schedules</span></code> table takes only the latest revision and stores their current status.</p>
<p>When a new revision is uploaded, digdag uses workflow names to update schedules. If a new revision contains a workflow and there is an existent schedule of a old workflow with the same name, digdag keeps status of the schedule. If there’re no old workflows with the same name, digdag creates a new schedule. If opposite, an old schedule exists but new revision doesn’t include workflows with the same name, the schedule will be deleted.</p>
</div>
<div class="section" id="session-store">
<h3>Session store<a class="headerlink" href="#session-store" title="Permalink to this headline">¶</a></h3>
<p>Tables <code class="docutils literal notranslate"><span class="pre">sessions</span></code>, <code class="docutils literal notranslate"><span class="pre">session_attempts</span></code>, <code class="docutils literal notranslate"><span class="pre">tasks</span></code>, <code class="docutils literal notranslate"><span class="pre">task_details</span></code>, <code class="docutils literal notranslate"><span class="pre">task_dependencies</span></code>, and <code class="docutils literal notranslate"><span class="pre">task_archives</span></code> tables store session information.</p>
<p><code class="docutils literal notranslate"><span class="pre">sessoins</span></code> table stores history of sessions.</p>
<p><code class="docutils literal notranslate"><span class="pre">session_attempts</span></code> table stores history of attempts.</p>
<p><code class="docutils literal notranslate"><span class="pre">tasks</span></code>, <code class="docutils literal notranslate"><span class="pre">task_details</span></code>, and <code class="docutils literal notranslate"><span class="pre">task_dependencies</span></code> tables stores tasks of running attempts. <code class="docutils literal notranslate"><span class="pre">tasks</span></code> table stores state of tasks. <code class="docutils literal notranslate"><span class="pre">task_details</span></code> stores config and parameters of tasks, and <code class="docutils literal notranslate"><span class="pre">task_dependencies</span></code> stores dependency between multiple tasks under an parent tasks. Workflow executor checks these tables periodically to run workflows. When an attempt finishes, its tasks will be removed from the tables and archived in <code class="docutils literal notranslate"><span class="pre">task_archives</span></code> table.</p>
</div>
</div>
<div class="section" id="api-server-agent-workflow-executor-and-schedule-executor">
<h2>API server, agent, workflow executor, and schedule executor<a class="headerlink" href="#api-server-agent-workflow-executor-and-schedule-executor" title="Permalink to this headline">¶</a></h2>
<p>When Digdag runs as a server, it has 3 major thread pools:</p>
<ul class="simple">
<li><p>API server: REST API server. This is the only component that receives requests from external systems.</p></li>
<li><p>Agent: Agent fetches tasks from a queue and runs them. This is planned to be running on untrusted remote environment. Thus agents won’t communicate with database directly.</p></li>
<li><p>Workflow executor: Workflow executor checks state of tasks of running attempts on database and pushes ready tasks to a queue.</p></li>
<li><p>Schedule executor: Schedule executor checks state of active schedules on database and starts tasks.</p></li>
</ul>
<p>By default, <code class="docutils literal notranslate"><span class="pre">digdag</span> <span class="pre">server</span></code> runs all of them. There’re some options to disable the components:</p>
<ul class="simple">
<li><p><code class="docutils literal notranslate"><span class="pre">--disable-executor-loop</span></code> disables workflow executor and schedule executor.</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">--disable-scheduler</span></code> disables schedule executor.</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">--disable-local-agent</span></code> disables agent.</p></li>
</ul>
<p>API server, workflow executor, and schedule executor use database (H2 or PostgreSQL) to communicate each other.</p>
<p>Agent and API server use task queue to communicate. And because task queue is built on top of database, they will be a single cluster as long as a database is shared.</p>
</div>
<div class="section" id="task-queue">
<h2>Task queue<a class="headerlink" href="#task-queue" title="Permalink to this headline">¶</a></h2>
<p>Task queue makes sure that a task runs at least once.</p>
<p>When an attempt starts, WorkflowExecutor pushes the root task to a task queue using TaskQueueServer interface. When the task finished, WorkflowExecutor will get a callback when a task finishes at least once. As tasks become ready to execute, WorkflowExecutor pushes the tasks to a task queue.</p>
<p>An agent uses TaskQueueClient interface to fetch tasks. The default implementation of TaskQueueClient is TaskQueueServer itself which fetches tasks from the underlaying storage directly. An intended implementation is HTTP-based client that fetches tasks through Digdag’s REST API.</p>
<p>When a task is pushed to a queue, it pushes ID of the task. TaskRequest will be instanciated when an agent fetches it.</p>
<p>When an agent fetches a task, it locks the task first. A locked task won’t be taken by other agents for a while. The agent must send heartbeat to extend the lock expiration time until execution of the task finishes. When an agent crashes, heartbeat breaks out. In this case, the task will be taken by another agent. The task is deleted from the queue by WorkflowExecutor when task finish callback is sent.</p>
</div>
<div class="section" id="project-archive-storage">
<h2>Project archive storage<a class="headerlink" href="#project-archive-storage" title="Permalink to this headline">¶</a></h2>
<p>Storage (<code class="docutils literal notranslate"><span class="pre">io.digdag.spi.Storage</span></code>) is a plugin interface that is used to store task log files and project archives. Task log files use storage through StorageFileLogServer (<code class="docutils literal notranslate"><span class="pre">io.digdag.core.log</span></code>) class, and project archives use storage through ArchiveManager (<code class="docutils literal notranslate"><span class="pre">io.digdag.core.storage</span></code>) class.</p>
<p>S3Storage is a storage implementation that stores files on Amazon S3.</p>
<p>Storage is injected using Extension mechanism (see below).</p>
</div>
<div class="section" id="command-executor">
<h2>Command executor<a class="headerlink" href="#command-executor" title="Permalink to this headline">¶</a></h2>
<p>Command executor (<code class="docutils literal notranslate"><span class="pre">io.digdag.spi.CommandExecutor</span></code>) is a plugin interface that is used to execute a command in a sandbox environment.</p>
<p>A sandbox is expected to provide following functionalities to provide multi-user task execution environment:</p>
<ul class="simple">
<li><p>Isolated OS image</p></li>
<li><p>Limited CPU consumption</p></li>
<li><p>Limited memory consumption</p></li>
<li><p>Limited disk consumption</p></li>
</ul>
<p>An expected use case is that a system administrator of agent configure agent to enforce use of a secure command executor such as Docker, although it is not implemented yet.</p>
<p>Currently, operator code itself (Java code) doesn’t run in a sandbox. Operators run in the same environment with agent. Thus operators are required to be secure so that it doesn’t leak security information or badly impact on execution environment. If it seems hard to achieve, agent needs another mechanism to isolate execution environment.</p>
</div>
<div class="section" id="extension-mechanisms">
<h2>Extension mechanisms<a class="headerlink" href="#extension-mechanisms" title="Permalink to this headline">¶</a></h2>
<div class="section" id="extension">
<h3>Extension<a class="headerlink" href="#extension" title="Permalink to this headline">¶</a></h3>
<p>Extension (<code class="docutils literal notranslate"><span class="pre">io.digdag.spi.Extension</span></code>) is an interface to statically customize digdag using dependency injection (Guice). This is useful to override some built-in behavior, add built-in operators, or override default parameters.</p>
<p>Extension needs least code to make some extension possible. But it’s the hardest to use because users need to write program to use.</p>
<p>A typical use case is for system integrators to customize digdag for their internal use.</p>
<p>Many of customization points in digdag are assuming Extension to override (i.e. most of what’s bound using guice) because it needs less code. But for ease of use, they should also accept system plugins, eventually.</p>
</div>
<div class="section" id="system-plugins">
<h3>System plugins<a class="headerlink" href="#system-plugins" title="Permalink to this headline">¶</a></h3>
<p>System plugins are plugins loaded when digdag starts. System plugins are used to customize global behavior of digdag. Adding an authenticator is one of the use cases.</p>
<p>If <code class="docutils literal notranslate"><span class="pre">io.digdag.spi.Plugin</span></code> implementations are available using Java’s ServiceLoader mechanism in classpath, digdag uses them automatically.</p>
<p>If plugins are written in config file (<code class="docutils literal notranslate"><span class="pre">system-plugin.repositories</span></code> and <code class="docutils literal notranslate"><span class="pre">system-plugin.dependencies</span></code>), digdag downloads them from remote maven repositories, and loads <code class="docutils literal notranslate"><span class="pre">io.digdag.spi.Plugin</span></code> implementations using ServiceLoader. These plugins are loaded using isolated classloaders.</p>
<p>Plugins share the same Guice instance with digdag core. Thus plugins can get any internal resources as long as an instance is available through <code class="docutils literal notranslate"><span class="pre">io.digdag.spi</span></code> interface which are shared even with isolated classloaders.</p>
<p>A system plugin is instantiated once when digdag starts.</p>
<p>All dynamic plugins can be loaded as system plugins. But system plugins can’t be loaded as dynamic plugins.</p>
</div>
<div class="section" id="dynamic-plugins">
<h3>Dynamic plugins<a class="headerlink" href="#dynamic-plugins" title="Permalink to this headline">¶</a></h3>
<p>Dynamic plugins are plugins loaded when a task runs. Dynamic plugins are loaded using an isolated class loader and isolated Guice instance. Dynamic plugins can access to only subset of internal resources. For example, dynamic plugin loader of operators allows access to CommandExecutor and TemplateEngine.</p>
<p>An intended use case is loading operators.</p>
<p>A dynamic plugin may be instantiated multiple times depending on cache size of DynamicPluginLoader.</p>
</div>
</div>
<div class="section" id="next-steps">
<h2>Next steps<a class="headerlink" href="#next-steps" title="Permalink to this headline">¶</a></h2>
<ul class="simple">
<li><p><a class="reference external" href="command_reference.html">Command reference</a></p></li>
</ul>
</div>
</div>
</div>
</div>
<footer>
<div class="rst-footer-buttons" role="navigation" aria-label="footer navigation">
<a href="metrics.html" class="btn btn-neutral float-right" title="Digdag metrics (experimental)" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right"></span></a>
<a href="command_executor.html" class="btn btn-neutral float-left" title="Command Executor" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left"></span> Previous</a>
</div>
<hr/>
<div role="contentinfo">
<p>
© Copyright 2016-2024, Digdag Project
</p>
</div>
Built with <a href="http://sphinx-doc.org/">Sphinx</a> using a <a href="https://github.com/rtfd/sphinx_rtd_theme">theme</a> provided by <a href="https://readthedocs.org">Read the Docs</a>.
<br/>
<br/>
<p><a href="https://digdag.io/">Digdag</a> is an open source project, invented and sponsored by <a href="https://www.treasuredata.com/">Treasure Data, Inc.</a> under the Apache 2.0 Licence.</p>
</footer>
</div>
</div>
</section>
</div>
<script type="text/javascript">
jQuery(function () {
SphinxRtdTheme.Navigation.enable(true);
});
</script>
</body>
</html>