Docs: generated by furiosa-runtime#1456
furiosa-runtime commit: 5b864736005496368a0f6b2dad49760642787215
furiosa-infra committed Feb 19, 2025
1 parent 7fe5dd4 commit 25bf5d1
Showing 221 changed files with 35,611 additions and 0 deletions.
Binary file added v2025.1.0/en/_images/rngd_card.avif
Binary file not shown.
67 changes: 67 additions & 0 deletions v2025.1.0/en/_images/sw_stack.svg
2,840 changes: 2,840 additions & 0 deletions v2025.1.0/en/_modules/furiosa_llm/api.html

Large diffs are not rendered by default.

1,080 changes: 1,080 additions & 0 deletions v2025.1.0/en/_modules/furiosa_llm/artifact/builder.html

Large diffs are not rendered by default.

559 changes: 559 additions & 0 deletions v2025.1.0/en/_modules/furiosa_llm/artifact/types.html

Large diffs are not rendered by default.

955 changes: 955 additions & 0 deletions v2025.1.0/en/_modules/furiosa_llm/llm_engine.html

Large diffs are not rendered by default.

631 changes: 631 additions & 0 deletions v2025.1.0/en/_modules/furiosa_llm/sampling_params.html

Large diffs are not rendered by default.

403 changes: 403 additions & 0 deletions v2025.1.0/en/_modules/index.html

Large diffs are not rendered by default.

7 changes: 7 additions & 0 deletions v2025.1.0/en/_sources/cloud_native_toolkit/intro.rst
@@ -0,0 +1,7 @@
.. _CloudNativeToolkit:

####################################
Cloud Native Toolkit
####################################

The FuriosaAI Cloud Native Toolkit is a software stack that enables FuriosaAI NPU products in the Kubernetes and container ecosystem.
28 changes: 28 additions & 0 deletions v2025.1.0/en/_sources/cloud_native_toolkit/kubernetes.rst
@@ -0,0 +1,28 @@
.. _Kubernetes:

####################################
Kubernetes Support
####################################

The following versions of Kubernetes, Helm, and CRI runtimes are supported:

* Kubernetes: v1.24.0 or later
* Helm: v3.0.0 or later
* CRI Runtime: `containerd <https://github.com/containerd/containerd>`_ or `CRI-O <https://github.com/cri-o/cri-o>`_

.. note::

   Docker is officially deprecated as a container runtime in Kubernetes.
   We recommend using containerd or CRI-O as the container runtime;
   otherwise, you may encounter unexpected issues with the device plugin.
   For more information, see `Don't Panic: Kubernetes and Docker <https://kubernetes.io/blog/2020/12/02/dont-panic-kubernetes-and-docker/>`_.


.. toctree::
   :maxdepth: 1
   :caption: Kubernetes Support

   /cloud_native_toolkit/kubernetes/feature_discovery
   /cloud_native_toolkit/kubernetes/device_plugin
   /cloud_native_toolkit/kubernetes/metrics_exporter
   /cloud_native_toolkit/kubernetes/scheduling_npus
@@ -0,0 +1,88 @@
.. _DevicePlugin:

################################
Installing Furiosa Device Plugin
################################


Furiosa Device Plugin
================================================================
The Furiosa device plugin implements the `Kubernetes Device Plugin <https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/>`_
interface for FuriosaAI NPU devices, and its features are as follows:

* Discovering the Furiosa NPU devices and registering them with a Kubernetes cluster.
* Tracking the health of the devices and reporting it to the Kubernetes cluster.
* Running AI workloads on top of the Furiosa NPU devices within a Kubernetes cluster.

Configuration
----------------------------------------------
The Furiosa NPU can be integrated into the Kubernetes cluster in various configurations.
A single NPU card can either be exposed as a single resource or partitioned into multiple resources.
Partitioning into multiple resources allows for more granular control.

The configuration structure is as follows:

.. code-block:: yaml

   config:
     resourceStrategy: generic
     debugMode: false
     disabledDeviceUUIDListMap:

``resourceStrategy`` defines the resource unit of NPU scheduling in the cluster. The following table shows the available resource strategies:

.. list-table::
   :align: center
   :widths: 200 200 200
   :header-rows: 1

   * - NPU Configuration
     - Resource Name
     - Resource Count Per Card
   * - generic
     - furiosa.ai/rngd
     - 1
   * - single-core
     - furiosa.ai/rngd-1core.6gb
     - 8
   * - dual-core
     - furiosa.ai/rngd-2core.12gb
     - 4
   * - quad-core
     - furiosa.ai/rngd-4core.24gb
     - 2
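
As a rough illustration of how the table translates into schedulable resources, the following Python sketch computes the resource name and count that a node would advertise under a given strategy. The ``RESOURCE_STRATEGIES`` mapping is copied from the table above; the function name ``allocatable_resources`` and its ``cards`` parameter are hypothetical and are not part of the device plugin.

```python
# Strategy -> (resource name, resource count per card), copied from the table above.
RESOURCE_STRATEGIES = {
    "generic": ("furiosa.ai/rngd", 1),
    "single-core": ("furiosa.ai/rngd-1core.6gb", 8),
    "dual-core": ("furiosa.ai/rngd-2core.12gb", 4),
    "quad-core": ("furiosa.ai/rngd-4core.24gb", 2),
}


def allocatable_resources(strategy: str, cards: int) -> dict:
    """Hypothetical helper: resources a node with `cards` NPU cards would expose."""
    if strategy not in RESOURCE_STRATEGIES:
        raise ValueError(f"unknown resourceStrategy: {strategy!r}")
    if cards < 0:
        raise ValueError("cards must be non-negative")
    name, per_card = RESOURCE_STRATEGIES[strategy]
    return {name: per_card * cards}


if __name__ == "__main__":
    # A node with two cards under the dual-core strategy.
    print(allocatable_resources("dual-core", 2))
```

Under these assumptions, a finer-grained strategy yields more, smaller resources per card, which is what allows several pods to share one card.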

``debugMode`` enables or disables debug mode. The default value is ``false``.

``disabledDeviceUUIDListMap`` allows disabling specific devices on a per-node basis. This is structured as follows:

.. code-block:: yaml

   disabledDeviceUUIDListMap:
     node_a:
       - "uuid1"
       - "uuid2"
     node_b:
       - "uuid3"
       - "uuid4"

If ``disabledDeviceUUIDListMap`` is not configured, all devices are enabled by default.
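
The per-node semantics described above can be sketched in Python as follows. This only models the documented behavior (devices listed under a node's name are excluded on that node, and a missing map leaves everything enabled); the function ``enabled_devices`` and its parameters are hypothetical and do not reflect the plugin's actual implementation.

```python
from typing import Dict, List, Optional


def enabled_devices(
    node_name: str,
    device_uuids: List[str],
    disabled_map: Optional[Dict[str, List[str]]] = None,
) -> List[str]:
    """Hypothetical helper: return the UUIDs that stay enabled on `node_name`."""
    # A missing map, or a node without an entry, disables nothing.
    disabled = set((disabled_map or {}).get(node_name, []))
    return [uuid for uuid in device_uuids if uuid not in disabled]


if __name__ == "__main__":
    config = {"node_a": ["uuid1", "uuid2"], "node_b": ["uuid3", "uuid4"]}
    print(enabled_devices("node_a", ["uuid1", "uuid5"], config))
```

Note that an entry for ``node_b`` has no effect on ``node_a``, even if the same UUID string appears on both nodes.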


Deploying Furiosa Device Plugin with Helm
-----------------------------------------

The Furiosa device plugin Helm chart is available at https://github.com/furiosa-ai/helm-charts. To customize the deployment, modify ``charts/furiosa-device-plugin/values.yaml``.

* If ``resourceStrategy`` is not specified, the default value is ``"generic"``.
* If ``debugMode`` is not specified, the default value is ``false``.
* If ``disabledDeviceUUIDListMap`` is not specified, the default value is an empty list ``[]``.

You can deploy the Furiosa Device Plugin by running the following commands:

.. code-block:: sh

   helm repo add furiosa https://furiosa-ai.github.io/helm-charts
   helm repo update
   helm install furiosa-device-plugin furiosa/furiosa-device-plugin -n kube-system
@@ -0,0 +1,70 @@
.. _FeatureDiscovery:

####################################
Installing Furiosa Feature Discovery
####################################


Furiosa Feature Discovery and NFD
================================================================

The Furiosa Feature Discovery automatically labels Kubernetes nodes with information
about FuriosaAI NPU properties, such as the NPU family, count, and driver versions.
Using these labels, you can schedule your Kubernetes workloads based on specific NPU requirements.

The Furiosa Feature Discovery leverages NFD (Node Feature Discovery), a tool that detects
hardware features and labels Kubernetes nodes accordingly. It is recommended to use NFD and
Furiosa Feature Discovery to ensure that the Cloud Native Toolkit is deployed only on nodes
equipped with FuriosaAI NPUs.


Labels
-----------------------------

The following table lists the labels that the Furiosa Feature Discovery attaches and what they mean.

.. list-table:: Labels
   :align: center
   :header-rows: 1
   :widths: 130 160 260

   * - Label
     - Value
     - Description
   * - furiosa.ai/npu.count
     - n
     - Number of NPU devices
   * - furiosa.ai/npu.family
     - warboy, rngd
     - Chip family
   * - furiosa.ai/npu.product
     - warboy, rngd, rngd-s, rngd-max
     - Chip product name
   * - furiosa.ai/npu.driver.version
     - x.y.z
     - NPU device driver version
   * - furiosa.ai/npu.driver.version.major
     - x
     - Major part of the NPU device driver version
   * - furiosa.ai/npu.driver.version.minor
     - y
     - Minor part of the NPU device driver version
   * - furiosa.ai/npu.driver.version.patch
     - z
     - Patch part of the NPU device driver version
   * - furiosa.ai/npu.driver.version.metadata
     - abcxyz
     - NPU device driver version metadata
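
To illustrate how these labels can drive scheduling, the following Python sketch mimics the equality matching that a Kubernetes ``nodeSelector`` performs: a node is eligible only if every selector key is present with exactly the requested value. The label keys come from the table above; the helper ``node_matches``, the node names, and the label values are made up for the example.

```python
from typing import Dict


def node_matches(node_labels: Dict[str, str], node_selector: Dict[str, str]) -> bool:
    """Return True if every selector key/value pair is present on the node."""
    return all(node_labels.get(key) == value for key, value in node_selector.items())


# Hypothetical nodes and label values, for illustration only.
NODES = {
    "npu-node": {
        "furiosa.ai/npu.count": "8",
        "furiosa.ai/npu.family": "rngd",
        "furiosa.ai/npu.product": "rngd",
    },
    "cpu-node": {},
}

# Selector a workload could place under `spec.nodeSelector` in its Pod spec.
SELECTOR = {"furiosa.ai/npu.family": "rngd"}

if __name__ == "__main__":
    eligible = [name for name, labels in NODES.items() if node_matches(labels, SELECTOR)]
    print(eligible)
```

Label values are always strings in Kubernetes, so numeric labels such as ``furiosa.ai/npu.count`` have to be compared as strings in a plain ``nodeSelector``.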


Deploying Furiosa Feature Discovery with Helm
----------------------------------------------
You can install the Furiosa Feature Discovery and NFD into your Kubernetes cluster with the Helm chart.
The Furiosa Feature Discovery Helm chart is available at https://github.com/furiosa-ai/helm-charts. To customize the deployment, modify ``charts/furiosa-feature-discovery/values.yaml``.

The following commands install them:

.. code-block:: sh

   helm repo add furiosa https://furiosa-ai.github.io/helm-charts
   helm repo update
   helm install furiosa-feature-discovery furiosa/furiosa-feature-discovery -n kube-system
@@ -0,0 +1,135 @@
.. _MetricsExporter:

###################################
Installing Furiosa Metrics Exporter
###################################


Furiosa Metrics Exporter
================================================================
The Furiosa metrics exporter exposes a collection of metrics related to
FuriosaAI NPU devices in `Prometheus <https://prometheus.io/>`_ format.
In a Kubernetes cluster, you can scrape the metrics provided by furiosa-metrics-exporter
using Prometheus and visualize them with a Grafana dashboard.
This can be set up using the `Prometheus <https://github.com/prometheus-community/helm-charts/tree/main/charts/prometheus>`_
and `Grafana <https://github.com/grafana/helm-charts/tree/main/charts/grafana>`_
Helm charts, along with the furiosa-metrics-exporter Helm chart.


Metrics
-----------------------------------
The exporter is composed of a chain of collectors. Each collector is responsible
for collecting specific metrics from the Furiosa NPU devices.
The following table shows the available collectors and metrics:


.. list-table:: NPU Metrics
   :align: center
   :widths: 100 100 100 100 200
   :header-rows: 1

   * - Collector Name
     - Metric
     - Type
     - Metric Labels
     - Description
   * - Liveness
     - furiosa_npu_alive
     - gauge
     - arch, core, device, uuid, kubernetes_node_name
     - The liveness of the Furiosa NPU device.
   * - Temperature
     - furiosa_npu_hw_temperature
     - gauge
     - arch, core, device, uuid, kubernetes_node_name, label
     - The temperature of the Furiosa NPU device.
   * - Power
     - furiosa_npu_hw_power
     - gauge
     - arch, core, device, uuid, kubernetes_node_name, label
     - The power consumption of the Furiosa NPU device.
   * - Core Utilization
     - furiosa_npu_core_utilization
     - gauge
     - arch, core, device, uuid, kubernetes_node_name
     - The core utilization of the Furiosa NPU device.

All metrics share common metric labels such as arch, core, device, kubernetes_node_name, and uuid.
The following table describes the common metric labels:

.. list-table:: Common NPU Metrics Label
   :align: center
   :widths: 100 300
   :header-rows: 1

   * - Common Metric Label
     - Description
   * - arch
     - The architecture of the Furiosa NPU device, e.g. warboy, rngd
   * - core
     - The core number of the Furiosa NPU device, e.g. 0, 1, 2, 3, 4, 5, 6, 7, 0-1, 2-3, 0-3, 4-5, 6-7, 4-7, 0-7
   * - device
     - The device name of the Furiosa NPU device, e.g. npu0
   * - kubernetes_node_name
     - The name of the Kubernetes node where the exporter is running. This label can be missing if the exporter is running on the host machine or in a standalone container outside Kubernetes.
   * - uuid
     - The UUID of the Furiosa NPU device.

The metric label ``label`` describes additional attributes specific to each metric.
This approach avoids defining too many separate metrics and groups metrics that share common characteristics under a single name.

.. list-table:: NPU Metrics Type
   :align: center
   :widths: 100 120 200
   :header-rows: 1

   * - Metric Type
     - Label Attribute
     - Description
   * - Temperature
     - peak
     - The highest temperature observed from SoC sensors
   * - Temperature
     - ambient
     - The temperature observed from sensors attached to the board
   * - Power
     - rms
     - Root Mean Square (RMS) value of the power consumed by the device, providing an average power consumption metric over a period of time.


The following shows an example of the exported metrics:

.. code-block:: sh

   #liveness
   furiosa_npu_alive{arch="rngd",core="0-7",device="npu0",kubernetes_node_name="node",uuid="uuid"} 1
   #temperature
   furiosa_npu_hw_temperature{arch="rngd",core="0-7",device="npu0",kubernetes_node_name="node",label="peak",uuid="uuid"} 39
   furiosa_npu_hw_temperature{arch="rngd",core="0-7",device="npu0",kubernetes_node_name="node",label="ambient",uuid="uuid"} 35
   #power
   furiosa_npu_hw_power{arch="rngd",core="0-7",device="npu0",kubernetes_node_name="node",label="rms",uuid="uuid"} 4795000
   #core utilization
   furiosa_npu_core_utilization{arch="rngd",core="0",device="npu0",kubernetes_node_name="node",uuid="uuid"} 90
   furiosa_npu_core_utilization{arch="rngd",core="1",device="npu0",kubernetes_node_name="node",uuid="uuid"} 90
   furiosa_npu_core_utilization{arch="rngd",core="2",device="npu0",kubernetes_node_name="node",uuid="uuid"} 90
   furiosa_npu_core_utilization{arch="rngd",core="3",device="npu0",kubernetes_node_name="node",uuid="uuid"} 90
   furiosa_npu_core_utilization{arch="rngd",core="4",device="npu0",kubernetes_node_name="node",uuid="uuid"} 90
   furiosa_npu_core_utilization{arch="rngd",core="5",device="npu0",kubernetes_node_name="node",uuid="uuid"} 90
   furiosa_npu_core_utilization{arch="rngd",core="6",device="npu0",kubernetes_node_name="node",uuid="uuid"} 90
   furiosa_npu_core_utilization{arch="rngd",core="7",device="npu0",kubernetes_node_name="node",uuid="uuid"} 90
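
For quick checks without a Prometheus server, samples in this format can be parsed with a few lines of Python. The sketch below is a deliberately simplified parser that assumes every sample carries a label set and that label values contain no escaped quotes; the helpers ``parse_sample`` and ``mean_core_utilization`` are hypothetical names. For anything beyond ad hoc inspection, a full parser such as the one in the ``prometheus_client`` package is a safer choice.

```python
import re
from typing import Dict, Optional, Tuple

# Simplified patterns: `name{key="value",...} number`, no escaped quotes.
_SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)$'
)
_LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_sample(line: str) -> Optional[Tuple[str, Dict[str, str], float]]:
    """Parse one sample line into (metric name, labels, value), or None."""
    match = _SAMPLE_RE.match(line.strip())
    if match is None:
        return None
    labels = dict(_LABEL_RE.findall(match.group("labels")))
    return match.group("name"), labels, float(match.group("value"))


def mean_core_utilization(text: str, device: str) -> Optional[float]:
    """Average furiosa_npu_core_utilization over all cores of `device`."""
    values = []
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comment lines
        parsed = parse_sample(line)
        if parsed is None:
            continue
        name, labels, value = parsed
        if name == "furiosa_npu_core_utilization" and labels.get("device") == device:
            values.append(value)
    return sum(values) / len(values) if values else None
```

Applied to the example output above, ``mean_core_utilization(text, "npu0")`` averages the eight per-core samples for ``npu0``.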

Deploying Furiosa Metrics Exporter with Helm
---------------------------------------------------------
The Furiosa metrics exporter Helm chart is available at https://github.com/furiosa-ai/helm-charts. To customize the deployment, modify ``charts/furiosa-metrics-exporter/values.yaml``.
For example, the chart creates a Service object with Prometheus annotations so that metrics are scraped automatically. You can edit ``values.yaml`` to change the port or disable the Prometheus annotations if needed.
You can deploy the Furiosa Metrics Exporter by running the following commands:

.. code-block:: sh

   helm repo add furiosa https://furiosa-ai.github.io/helm-charts
   helm repo update
   helm install furiosa-metrics-exporter furiosa/furiosa-metrics-exporter -n kube-system