fix: Improve Lock Manager HTTP layer
pit-tester committed Feb 13, 2024
1 parent 447a48a commit 718e9ec
Showing 19 changed files with 754 additions and 225 deletions.
59 changes: 37 additions & 22 deletions README.md
@@ -26,16 +26,14 @@ At this point we have:

Upon deployment, Lock Manager will prepare its database. It is expected that a permanent database server is prepared upfront and is accessible from the namespace. The DB is permanent: it survives the lifespan of the temporary namespace where Lock Manager is running. The DB is used to implement exclusivity over the components (graph) used in the tests. Multiple instances of Lock Manager may be present in the K8s cluster, each sitting in its own temporary namespace. With the help of locking, only one test suite will ever run at the same time unless there is no dependency between the tests.
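For illustration only, a single lock acquisition round trip could look like the sketch below. The request and response field names (`owner`, `lockId`, `acquired`) mirror the `AcquireResponse` handling in `k8s-deployer/src/locks/lock-manager.ts` further down in this commit; the endpoint URL is a placeholder assumption.

```typescript
// Hypothetical sketch of one lock acquisition attempt against Lock Manager.
// Field names mirror the AcquireResponse handling in lock-manager.ts;
// the endpoint URL is a placeholder.
import fetch from "node-fetch"

type AcquireOutcome = { acquired: boolean, lockId: string }

export const tryAcquire = async (owner: string, lockId: string): Promise<boolean> => {
  const resp = await fetch("http://lock-manager.local/acquire", { // placeholder endpoint
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ owner, lockId })
  })
  const outcome = await resp.json() as AcquireOutcome
  return outcome.acquired // the deployer keeps retrying until the lock is granted
}
```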

There could be multiple Test Runner Apps deployed in the namespace. These apps may be designed to test different graphs. In such a setup the invocation of these multiple Test Runners is orchestrated by PIT and is subject to the locking strategy.

All tests are divided into test suites as defined in the relevant section of the pitfile. A pitfile may contain a mixed definition of local and remote test suites.

_Local_ test suites are defined in the same repository as the pitfile.
_Remote_ test suites are defined as references to a remote pitfile. In this case PIT will download the file from the remote repository and use its "testSuites" section. It is also possible to specify which test suites to run from the remote pitfile by applying a name filter.

Once the Test Runner App finishes executing, the test report is available via a dedicated endpoint, for example `GET /reports/{$execution_id}`

Test reports are stored permanently. Multiple reports which are obtained from different Test Runner Apps but produced in the same test session may be stitched together before storing.
Test reports are stored permanently.
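As a hypothetical sketch (only the `GET /reports/{$execution_id}` route comes from the paragraph above; the base URL and the payload shape are assumptions), retrieving a stored report could look like this:

```typescript
// Hypothetical sketch: retrieving a stored test report by execution id.
// The base URL and the JSON payload shape are assumptions.
import fetch from "node-fetch"

export const fetchReport = async (baseUrl: string, executionId: string): Promise<unknown> => {
  const resp = await fetch(`${ baseUrl }/reports/${ executionId }`)
  if (!resp.ok) {
    throw new Error(`Failed to fetch report ${ executionId }: ${ resp.status } ${ resp.statusText }`)
  }
  return await resp.json()
}

// Example (placeholder values):
// const report = await fetchReport("http://report-store.local", "execution-123")
```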

The responsibilities of all mentioned components are defined as:

@@ -48,9 +46,9 @@ The responsibilities of all mentioned components are defined as:

- Parses `pitfile.yml`
- Executes main PIT logic described above
- Creates K8s namepsace
- Creates K8s namespace (one for each test suite)
- Deploys Lock Manager
- Deploys Test Runner App
- Deploys Test Runner App (one for each test suite)
- Deploys graph
- Collects test report and stores it permanently
- Cleans up namespace
@@ -67,9 +65,8 @@ The responsibilities of all mentioned components are defined as:

**The YAML specification (pitfile)**

- Controls whether PIT should run at all based on optional filters
- Encapsulates the location of graph within each test suite
- Defines what to do with test report
- Defines test suites
- Describes components graph for each test suite

![](./docs/arch.png)

@@ -120,45 +117,63 @@ testSuites:
deployment:
graph:
testApp:
componentName: Talos Certifier Test App
name: Talos Certifier Test App
id: talos-certifier-test-app
location:
type: LOCAL # optional, defaults to 'LOCAL'
deploy:
command: deployment/talos-certifier-test-app/pit/deploy.sh
timeoutSeconds: 120
command: deployment/pit/deploy.sh
params: # Optional command line parameters
- param1
- param2
statusCheck:
command: deployment/talos-certifier-test-app/pit/is-deployment-ready.sh
command: deployment/pit/is-deployment-ready.sh
undeploy:
timeoutSeconds: 120
command: deployment/pit/undeploy.sh

components:
- componentName: Talos Certifier
- name: Talos Certifier
id: talos-certifier
# Let's assume that the pipeline fired as a result of a push into the Talos Certifier project
location:
type: LOCAL
deploy:
command: deployment/talos-certifier/pit/deploy.sh
command: deployment/pit/deploy.sh
statusCheck:
command: deployment/talos-certifier/pit/is-deployment-ready.sh
command: deployment/pit/is-deployment-ready.sh
undeploy:
timeoutSeconds: 120
command: deployment/pit/undeploy.sh

- componentName: Talos Replicator
- name: Talos Replicator
id: talos-replicator
# Let's assume Talos Certifier and Replicator (made for testing Talos Certifier) are in the same repository
location:
type: LOCAL
deploy:
command: deployment/talos-replicator/pit/deploy.sh
command: deployment/pit/deploy.sh
statusCheck:
command: deployment/talos-replicator/pit/is-deployment-ready.sh
command: deployment/pit/is-deployment-ready.sh
undeploy:
timeoutSeconds: 120
command: deployment/pit/undeploy.sh

- componentName: Some Other Component
- name: Some Other Component
id: some-id
# This is an example of how to define a remote component
location:
type: REMOTE
gitRepository: git://127.0.0.1/some-other-component.git
gitRef: # Optional, defaults to "refs/remotes/origin/master"
deploy:
command: deployment/pit/deploy.sh

statusCheck:
command: deployment/pit/is-deployment-ready.sh
undeploy:
timeoutSeconds: 120
command: deployment/pit/undeploy.sh
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
- name: Testset for Talos Certifier integrated with Messenger
id: testset-talos-certifier-and-messenger
@@ -220,11 +235,11 @@ testSuites:
- Ingress Controller named "kubernetes-ingress". [See instructions here](https://kubernetes.github.io/ingress-nginx/deploy/_)
- All your namespaces are being observed by the HNC system. When installing the NGINX ingress controller it will create its own namespace and the HNC system will complain. To prevent that we need to exclude the namespace used by the NGINX ingress controller from HNC.
- Use the inline editing method: `kubectl edit -n hnc-system deploy hnc-controller-manager`, find the deployment named "hnc-controller-manager" and add one more entry to the list under `spec/containers/args`. The entry looks like this: `--excluded-namespace=ingress-nginx`
- Custom resource definition "External secrets" is installed in your local cluster:
- `helm repo add external-secrets https://charts.external-secrets.io`
- `helm install external-secrets external-secrets/external-secrets -n external-secrets --create-namespace`
- Also exclude the namespace "external-secrets" from hnc-controller-manager as you did with the nginx controller.

- The port 80 is free. Port 80 is used by the ingress controller in your local docker-desktop.

## Ports used
@@ -408,7 +423,7 @@ Similarly run git server for "remote test app"
scripts/host-project-in-git.sh /tmp/remote-sample $(pwd)/../examples/graph-perf-test-app
```

Please note that when these projects are fetched from local git by `k8-deployer`, the fetched project will have no `.env` file! However, our deployment scripts under `deployment/pit/*` can locate the `.env` file by reading global environment variables. This is intended for local development. Below is the list of global environment variables controlling the location of `.env` files. Export these either globally or in the terminal where you will be launching `k8-deployer`.

| Project | Variable |
91 changes: 91 additions & 0 deletions k8s-deployer/src/locks/http-client.ts
@@ -0,0 +1,91 @@
import fetch, { RequestInit, Response } from "node-fetch"

import { logger } from "../logger.js"

export type FetchParams = {
endpoint: string,
options: RequestInit
}

export type RetryOptions = {
retries: number,
retryDelay: number,
fetchParams: FetchParams
}

export class HttpErrorOptions {
cause?: Error
responseData?: string | unknown
}

export class HttpError extends Error {
readonly type: string = "HttpError"
responseData?: string | unknown
constructor(
readonly method: string,
readonly endpoint: string,
readonly status: number,
readonly text: string,
private options: HttpErrorOptions
){
super(text, { cause: options.cause })
this.responseData = options.responseData
}

toString(): string {
return JSON.stringify({ method: this.method, endpoint: this.endpoint, status: this.status, text: this.text, responseData: this.responseData, cause: this.cause })
}
}

export const invoke = async (options: RetryOptions, apiBody: Object) => {
let attempts = 1
while (options.retries > 0) {
logger.info("http-client.invoke(): attempt %s of %s invoking %s", attempts, options.retries, JSON.stringify(options.fetchParams))
try {
return await invokeApi(options.fetchParams, apiBody)
} catch (error: unknown) {
logger.warn("http-client.invoke(): attempt %s of %s invoking %s. Error: %s", attempts, options.retries, JSON.stringify(options.fetchParams), error)
if (attempts < options.retries) {
attempts++
await sleep(options.retryDelay)
continue
}

logger.warn("http-client.invoke(): failed to invoke %s. No more attempts left, giving up", JSON.stringify(options.fetchParams))
throw new Error(`Failed to fetch ${ options.fetchParams.endpoint } after ${ attempts } attempts`, { cause: error })
}
}
}

const invokeApi = async (params: FetchParams, apiBody: Object): Promise<unknown | string> => {
const resp = await fetch(params.endpoint, { ...params.options, body: JSON.stringify(apiBody) })
const readPayload = async (response: Response): Promise<unknown | string> => {
const ctHeader = "content-type"
const hdrs = response.headers
if (hdrs.has(ctHeader) && `${ hdrs.get(ctHeader) }`.toLowerCase().startsWith("application/json")) {
try {
return await response.json()
} catch (e) {
logger.warn("Unable to parse JSON when handling response from '%s', handling will fallback to reading the payload as text. Parsing error: %s", params.endpoint, e)
return await response.text()
}
} else {
return await response.text()
}
}

if (resp.ok) {
logger.info("invokeApi(): resp was ok, reading payload")
return await readPayload(resp)
} else {
logger.info("invokeApi(): resp was not ok, raising error")
throw new HttpError(
params.options.method,
params.endpoint,
resp.status,
resp.statusText,
{ responseData: await readPayload(resp) })
}
}

export const sleep = (time: number) => new Promise(resolve => setTimeout(resolve, time))
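A minimal usage sketch for this client follows. `RetryOptions` and `FetchParams` come from the file above; the endpoint URL and payload values are placeholders, with the `{ owner, lockIds, expiryInSec }` shape taken from the keep-alive request in `lock-manager.ts`.

```typescript
// Usage sketch for invoke(); endpoint and payload values are placeholders.
import * as HttpClient from "./http-client.js"

const retryOptions: HttpClient.RetryOptions = {
  retries: 3,
  retryDelay: 1000, // milliseconds between attempts
  fetchParams: {
    endpoint: "http://lock-manager.local/keep-alive", // placeholder endpoint
    options: { method: "POST", headers: { "Content-Type": "application/json" } }
  }
}

// invoke() serialises the body, retries failed attempts after a delay and throws
// a wrapping Error (with the last failure as `cause`) once all retries are exhausted.
const response = await HttpClient.invoke(retryOptions, { owner: "pit-ns-1", lockIds: [ "talos-certifier" ], expiryInSec: 30 })
```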
42 changes: 0 additions & 42 deletions k8s-deployer/src/locks/lock-manager-api-client.ts

This file was deleted.

40 changes: 21 additions & 19 deletions k8s-deployer/src/locks/lock-manager.ts
@@ -1,6 +1,8 @@
import {logger} from "../logger.js"
import {Schema} from "../model.js"
import * as LockManagerApi from "./lock-manager-api-client.js"
import * as HttpClient from "./http-client.js"
import { FetchParams } from "./http-client.js"
import { AcquireResponse, KeepAliveRequest, KeepAliveResponse, ReleaseResponse } from "./schema-v1.js"

const KEEP_ALIVE_INTERVAL = 10 // seconds
const RETRY_TIMEOUT = 1000 // milliseconds
@@ -10,10 +12,10 @@ export class LockManager {
}

private api = {
check: { endpoint: "", options: {} },
acquire: { endpoint: "", options: {} },
keepAlive: { endpoint: "", options: {} },
release: { endpoint: "", options: {} },
acquire: {} as FetchParams,
check: {} as FetchParams,
keepAlive: {} as FetchParams,
release: {} as FetchParams
}

// Handle to the timer
@@ -44,9 +46,9 @@
this.validateArgs(this.lockOwner, lock)
lock.ids = this.cleanupIds(lock.ids)

let locksAcquired = new Array<string>()
const locksAcquired = new Array<string>()

let retryOptions: LockManagerApi.RetryOptions = {
const retryOptions: HttpClient.RetryOptions = {
retries: this.apiRetries,
retryDelay: RETRY_TIMEOUT,
fetchParams: this.api.acquire,
@@ -60,7 +62,7 @@

while (locksAcquired.indexOf(lockId) == -1) {
try {
let response = await LockManagerApi.invoke(retryOptions, { owner: this.lockOwner, lockId }) as any
const response = await HttpClient.invoke(retryOptions, { owner: this.lockOwner, lockId }) as AcquireResponse
logger.info("LockManager.lock(): %s of %s. Outcome: %s", i, lock.ids.length, JSON.stringify({ request: { owner: this.lockOwner, lockId }, response }))
if (response.acquired) {
locksAcquired.push(response.lockId)
@@ -93,7 +95,7 @@
this.validateArgs(this.lockOwner, lock)
lock.ids = this.cleanupIds(lock.ids)

let retryOptions: LockManagerApi.RetryOptions = {
const retryOptions: HttpClient.RetryOptions = {
retries: this.apiRetries,
retryDelay: RETRY_TIMEOUT,
fetchParams: this.api.release,
@@ -105,9 +107,9 @@

logger.info("LockManager.release(): Releasing lock for %s", lock.ids)
try {
let respJson = await LockManagerApi.invoke(retryOptions, { owner: this.lockOwner, lockIds: lock.ids })
const lockIds = await HttpClient.invoke(retryOptions, { owner: this.lockOwner, lockIds: lock.ids }) as ReleaseResponse
logger.info("LockManager.release(): %s is released for %s", lock.ids, this.lockOwner)
return respJson
return lockIds
} catch (error) {
logger.error("LockManager.release(): Failed to release lock for %s", lock.ids, error)
throw new Error(`Failed to release lock for ${ lock.ids }`, { cause: error })
@@ -133,15 +135,15 @@
expiryInSec = timeValue
}

let retryOptions: LockManagerApi.RetryOptions = {
let retryOptions: HttpClient.RetryOptions = {
retries: this.apiRetries,
retryDelay: RETRY_TIMEOUT,
fetchParams: this.api.keepAlive,
}
try {
let params = { owner: this.lockOwner, lockIds: lock.ids, expiryInSec }
let resp = (await LockManagerApi.invoke(retryOptions, params)) as any
logger.debug("LockManager.keepAliveFetch(): keepAlive api for: %s with resp %s", JSON.stringify(params), resp)
const request: KeepAliveRequest = { owner: this.lockOwner, lockIds: lock.ids, expiryInSec }
const resp = (await HttpClient.invoke(retryOptions, request)) as KeepAliveResponse
logger.debug("LockManager.keepAliveFetch(): keepAlive api for: %s with resp %s", JSON.stringify(request), resp)
return resp
} catch (error) {
logger.error("LockManager.keepAliveFetch(): failed to keep alive for %s with %s", { lock, owner: this.lockOwner }, error)
@@ -165,8 +167,8 @@

this.keepAliveJobHandle = setInterval(async () => {
try {
logger.info("LockManager. Heartbeat for: %s", JSON.stringify({ owner, lock }))
return await this.keepAliveFetch(lock)
logger.info("LockManager. Heartbeat for: %s", JSON.stringify({ owner, lock }))
return await this.keepAliveFetch(lock)
} catch (error) {
clearInterval(this.keepAliveJobHandle)
logger.error("LockManager. Heartbeat failed for %s with %s", JSON.stringify({ lock, owner }), error)
@@ -190,10 +192,10 @@
const msg = `Timeout ${ lock.timeout } should be a string eg 1h, 1m or 1s`
if (typeof(lock.timeout) !== "string") throw new Error(msg)

let timeUnit = lock.timeout.trim().slice(-1)
const timeUnit = lock.timeout.trim().slice(-1)
if (![ "h", "m", "s" ].includes(timeUnit)) throw new Error(msg)

let timeValue = parseInt(lock.timeout.slice(0, -1), 10)
const timeValue = parseInt(lock.timeout.slice(0, -1), 10)
if (isNaN(timeValue) || timeValue < 0) throw new Error(msg)
}
}
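As a side note on the timeout convention validated above ("1h", "1m" or "1s"), a hedged sketch of converting such a value to seconds is shown below; the hour and minute multipliers are assumptions, since only a fragment of the conversion (`expiryInSec = timeValue`) is visible in the hunk above.

```typescript
// Hedged sketch: converting a lock timeout such as "1h", "1m" or "1s" to seconds.
// Validation mirrors LockManager.validateArgs(); the multipliers are assumptions.
export const timeoutToSeconds = (timeout: string): number => {
  const msg = `Timeout ${ timeout } should be a string eg 1h, 1m or 1s`
  const timeUnit = timeout.trim().slice(-1)
  if (![ "h", "m", "s" ].includes(timeUnit)) throw new Error(msg)

  const timeValue = parseInt(timeout.slice(0, -1), 10)
  if (isNaN(timeValue) || timeValue < 0) throw new Error(msg)

  if (timeUnit === "h") return timeValue * 3600
  if (timeUnit === "m") return timeValue * 60
  return timeValue
}

// timeoutToSeconds("2m") === 120
```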