Let's assume we're running $W_{veeam}$ worker instances, each providing a maximum throughput of $\psi_{worker}$ MB/s, so the total throughput $\Psi_{total}$ can be calculated as
$$\Psi_{total} = W_{veeam} \cdot \psi_{worker}$$ (1)
By default, Veeam Backup for Azure starts a dedicated worker instance for each VM to be backed up, i.e., $W_{veeam}$ would equal the total number $N_{vm}$ of VMs. But as we are limiting the number of worker instances to a value $\leq N_{vm}$, we have to run the chosen number of workers multiple times in sequence, because each worker instance can process only a single workload at a time. The required number $\rho_{workers}$ of these sequential "worker runs" is determined by
$$\rho_{workers} = \frac{N_{vm}}{W_{veeam}}$$ (3)
Hence, we need to multiply the duration $T_{parallel}$ by the number $\rho_{workers}$ of runs to obtain the duration $T_{seq}$ (in hours) required to process our set of $N_{vm}$ VMs (with $N_{vm}\geq W_{veeam}$):
$$T_{seq} = T_{parallel} \cdot \rho_{workers}$$ (4)
To be clear: $T_{seq}$ defines the backup window $T_{backup}$ required to process the given workload, i.e.
$$T_{backup} = T_{seq} = T_{parallel} \cdot \rho_{workers}$$
By substituting $T_{parallel}$ with (2) and $\rho_{workers}$ with (3), we get
By solving this equation for $W_{veeam}$, we can now calculate the required number of workers $W_{veeam}$ as a function of their throughput $\psi_{worker}$ (in MB/s), the number $N_{vm}$ and (incremental) backup volume $V_{incr}$ (in TB) of protected VMs, and the limiting backup window $T_{backup}$ (in hours):
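As a rough numerical sketch of both directions of this calculation, the Python snippet below assumes that (2) has the form $T_{parallel} = V_{incr} / \Psi_{total}$ after unit conversion, and that 1 TB equals $1024^2$ MB. Both are assumptions, as are the function names and the rounding up to whole workers; switch to decimal units if those fit your environment better.

```python
import math

SECONDS_PER_HOUR = 3600
MB_PER_TB = 1024 * 1024  # assumption: binary units; use 1_000_000 for decimal


def backup_window_hours(w_veeam, n_vm, v_incr_tb, psi_worker_mbs):
    """Backup window T_backup = T_parallel * rho_workers (in hours).

    Assumes T_parallel is the total incremental volume divided by the
    combined throughput of all workers running in parallel.
    """
    psi_total = w_veeam * psi_worker_mbs                # total throughput
    t_parallel = v_incr_tb * MB_PER_TB / (psi_total * SECONDS_PER_HOUR)
    rho_workers = n_vm / w_veeam                        # sequential worker runs
    return t_parallel * rho_workers


def required_workers(n_vm, v_incr_tb, psi_worker_mbs, t_backup_h):
    """Solve the relation above for W_veeam and round up to whole workers."""
    w_exact = math.sqrt(
        n_vm * v_incr_tb * MB_PER_TB
        / (psi_worker_mbs * SECONDS_PER_HOUR * t_backup_h)
    )
    return math.ceil(w_exact)
```

With hypothetical inputs of 100 VMs, 10 TB of incremental data, 100 MB/s per worker and an 8-hour window, `required_workers(100, 10, 100, 8)` yields 20 workers, and `backup_window_hours(20, 100, 10, 100)` confirms that 20 workers stay within the window.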
We must not forget that our target storage accounts have an ingress limit $\psi_{stacc}$ (in MB/s). To determine the minimum number $N_{stacc}$ of target storage accounts capable of ingesting the data at the required speed, we divide the maximum throughput provided by the workers running in parallel by this limit:
$$N_{stacc} = \frac{W_{veeam} \cdot \psi_{worker}}{\psi_{stacc}}$$ (7)
By using (6) to substitute $W_{veeam}$ in (7), the number of required storage accounts can be calculated as a function of the well-known input parameters $N_{vm}$, $V_{incr}$ [TB], $\psi_{worker}$ [MB/s], $\psi_{stacc}$ [MB/s], and $T_{backup}$ [hours]:
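The throughput-based count can be sketched in a few lines of Python. The function name is hypothetical, the worker count is taken as an input rather than derived, and rounding up to a whole storage account is an added assumption (a fractional account cannot be provisioned).

```python
import math


def storage_accounts_for_throughput(w_veeam, psi_worker_mbs, psi_stacc_mbs):
    """Minimum number of storage accounts whose combined ingress limit
    covers the combined throughput of all workers running in parallel."""
    psi_total = w_veeam * psi_worker_mbs          # MB/s produced by the workers
    return math.ceil(psi_total / psi_stacc_mbs)   # round up to whole accounts
```

The ingress limit passed as `psi_stacc_mbs` should come from the current Azure documentation for the storage account type in use; no particular value is assumed here.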
There is another storage account limitation that needs to be considered: each storage account has a maximum "request rate" of $r_{stacc/iops}$ ($= 20{,}000$ IOPS), as described in the User Guide. The request rate $r_{veeam/iops}$ created by Veeam Backup for Azure during incremental backup processing can be calculated from the data volume $V_{incr}$ (total amount of data written during incremental backup, in TB), Veeam's block size value $\beta_{veeam}$ ($=1024$ kB at source) and the backup time $T_{backup}$ (in hours) as
This request rate has to be lower than or equal to $r_{stacc/iops}$ multiplied by the number of storage accounts $N_{stacc}$ we're targeting in parallel:
$$r_{veeam/iops} \leq r_{stacc/iops} \cdot N_{stacc}$$
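A minimal Python sketch of the request-rate check follows. It assumes one storage request per block of $\beta_{veeam}$ kB, binary units (1 TB = $1024^3$ kB) and an average rate spread evenly over the backup window; these assumptions and the function names are illustrative only.

```python
import math

SECONDS_PER_HOUR = 3600
KB_PER_TB = 1024 ** 3  # assumption: binary units


def veeam_request_rate(v_incr_tb, t_backup_h, block_size_kb=1024):
    """Average requests per second when v_incr_tb is written in blocks of
    block_size_kb over a window of t_backup_h hours (one request per block)."""
    blocks = v_incr_tb * KB_PER_TB / block_size_kb
    return blocks / (t_backup_h * SECONDS_PER_HOUR)


def storage_accounts_for_requests(v_incr_tb, t_backup_h,
                                  r_stacc_iops=20_000, block_size_kb=1024):
    """Smallest whole N_stacc with r_veeam/iops <= r_stacc/iops * N_stacc."""
    rate = veeam_request_rate(v_incr_tb, t_backup_h, block_size_kb)
    return math.ceil(rate / r_stacc_iops)
```

For a hypothetical 10 TB increment written within 8 hours, the average rate is roughly 364 requests per second, far below the limit of a single storage account.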
While (8) provides a result based on the storage accounts' ingress throughput limit (in MB/s), equation (9) is based on their request rate limit (in IOPS). For a real-world calculation, we have to evaluate both formulas and pick the higher result, because neither limit may be exceeded. We can write this down as
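Picking the stricter of the two limits can be sketched as follows. The function name and the floor of one storage account are assumptions added for illustration; the two inputs are the (possibly fractional) results of the throughput-based and the request-rate-based calculation.

```python
import math


def required_storage_accounts(n_by_throughput, n_by_request_rate):
    """Return the whole number of storage accounts that satisfies both
    the throughput limit and the request rate limit."""
    stricter = max(n_by_throughput, n_by_request_rate)
    return max(1, math.ceil(stricter))  # at least one account is always needed
```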
A single policy within Veeam Backup for Azure can only target a single storage account (as a primary backup target). So, the number of storage accounts $N_{stacc}$ we just derived also defines the minimum number $N_{pol}$ of required policies, in other words
$$ N_{pol} \geq N_{stacc}$$
Assuming we do not want to create more policies than required (i.e., we're maxing out the policies as much as possible), the number $N_{pol}$ of policies is given by
$$ N_{pol} = N_{stacc}$$ (11)
In an ideal world, we can also assume an equal distribution of all $N_{vm}$ VMs across all $N_{pol}$ policies, with each policy containing the same number $N_{vm/pol}$ of VMs, defined as
$$N_{vm/pol} = \frac{N_{vm}}{N_{pol}} = \frac{N_{vm}}{N_{stacc}}$$ (12)
It's as simple as that, and it's all we need, because $N_{stacc}$ is already known from (10) and $N_{vm}$ should be a well-known input parameter.
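In practice $N_{vm}$ rarely divides evenly by the number of policies. The sketch below shows one way to spread the VMs so that policy sizes differ by at most one; the function name and the remainder handling are assumptions that go beyond the ideal equal split described above.

```python
def distribute_vms(n_vm, n_pol):
    """Split n_vm VMs across n_pol policies as evenly as possible.

    Returns a list with one VM count per policy; the first `extra`
    policies receive one VM more than the rest.
    """
    base, extra = divmod(n_vm, n_pol)
    return [base + 1 if i < extra else base for i in range(n_pol)]
```

For example, `distribute_vms(100, 3)` gives `[34, 33, 33]`, so the largest policy holds 34 VMs and should be the one used for sizing.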
Appliance Requirements
According to the sizing and scalability guidelines provided in the User Guide, the amount $M_{pol}$ of RAM required to run $N_{pol}$ policies (each processing $N_{vm/pol}$ VMs) in parallel on a single Veeam Backup for Azure appliance can be calculated as
Additionally, some overhead RAM $M_{appl/oh}$, required by the appliance OS, the WebUI and the REST API service, needs to be taken into account. This value depends on the amount $M_{appl/total}$ of RAM provisioned to the appliance VM.
This provides us with a theoretical value for the amount of required appliance RAM based on our assumptions and input parameters. The User Guide recommends allocating an additional margin of $20\%$ of RAM for production environments.
In addition, I would recommend also increasing the number of policies $N_{pol}$ used in (16), since our assumptions from above (all policies are maximally utilized, and all VMs are equally distributed across all policies) are unlikely to hold in reality. To be on the safe side, I'd start by doubling the value of $N_{pol}$ calculated by (11) while keeping (not reducing) the number of VMs per policy $N_{vm/pol}$.
Putting these two additional thoughts (doubling $N_{pol}$ and adding $20\%$ overall) into (16) results in