Commit fd23ce3

Martin KaFai Lau committed
Merge branch 'bpf-qdisc'
Amery Hung says:

====================
bpf qdisc

Hi all,

This patchset aims to support implementing qdisc using bpf struct_ops.
This version takes a step back and only implements the minimum support
for bpf qdisc. 1) support of adding skb to bpf_list and bpf_rbtree
directly and 2) classful qdisc are deferred to future patchsets. In
addition, we only allow attaching bpf qdisc to root or mq for now.
This is to prevent accidentally breaking existing classful qdiscs
that rely on data in a child qdisc. This limit may be lifted in the
future after careful inspection.

* Overview *

This series supports implementing qdisc using bpf struct_ops. bpf qdisc
aims to be a flexible and easy-to-use infrastructure that allows users to
quickly experiment with different scheduling algorithms/policies. It only
requires users to implement core qdisc logic using bpf and implements the
mundane part for them. In addition, the ability to easily communicate
between qdisc and other components will also bring new opportunities for
new applications and optimizations.

* Performance of bpf qdisc *

This patchset includes two qdisc examples, bpf_fifo and bpf_fq, for
__testing__ purposes. For the performance test, we compare the selftests
with their kernel counterparts to give you a sense of the performance of
qdisc implemented in bpf.

The implementation of bpf_fq is fairly complex and slightly different from
fq, so later we only compare the two fifo qdiscs. bpf_fq implements a
scheduling algorithm similar to fq before commit 29f834a ("net_sched:
sch_fq: add 3 bands and WRR scheduling") was introduced. bpf_fifo uses a
single bpf_list as a queue instead of three queues for different
priorities in pfifo_fast. The time complexity of fifo, however, should be
similar since the queue selection time is negligible.

Test setup:

    client -> qdisc ------------->  server
    ~~~~~~~~~~~~~~~                 ~~~~~~
    nested VM1 @ DC1                VM2 @ DC2

Throughput: iperf3 -t 600, 5 times

    Qdisc        Average (GBits/sec)
    ----------   -------------------
    pfifo_fast   12.52 ± 0.26
    bpf_fifo     11.72 ± 0.32
    fq           10.24 ± 0.13
    bpf_fq       11.92 ± 0.64

Latency: sockperf pp --tcp -t 600, 5 times

    Qdisc        Average (usec)
    ----------   --------------
    pfifo_fast   244.58 ± 7.93
    bpf_fifo     244.92 ± 15.22
    fq           234.30 ± 19.25
    bpf_fq       221.34 ± 10.76

Looking at the two fifo qdiscs, the 6.4% drop in throughput in the bpf
implementation is consistent with the previous observation (v8 throughput
test on a loopback device). This could be mitigated by supporting adding
skb to bpf_list or bpf_rbtree directly in the future.

* Clean up skb in bpf qdisc during reset *

The current implementation relies on bpf qdisc implementors to correctly
release skbs in queues (bpf graphs or maps) in .reset, which might not be
a safe thing to do. The solution, as Martin has suggested, would be
supporting private data in struct_ops. This can also help simplify the
implementation of qdiscs that work with mq. For example, qdiscs in the
selftest mostly use global data. Therefore, even if a user adds multiple
qdisc instances under mq, they would still share the same queue.
====================

Link: https://patch.msgid.link/20250409214606.2000194-1-ameryhung@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

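
The 6.4% figure quoted in the message can be checked directly from the throughput averages. The helper below is a minimal user-space sketch for that arithmetic; the function name `relative_change_pct` is hypothetical and is not part of the patchset.

```c
#include <assert.h>

/*
 * Hypothetical helper (not from the patchset): relative change of
 * `measured` against `baseline`, in percent. A negative result means
 * `measured` is lower than `baseline`.
 */
double relative_change_pct(double baseline, double measured)
{
	assert(baseline > 0.0);	/* a zero baseline has no relative change */
	return (measured - baseline) / baseline * 100.0;
}
```

Calling `relative_change_pct(12.52, 11.72)` with the pfifo_fast and bpf_fifo averages gives roughly -6.4, matching the drop stated in the message. The same call with the fq and bpf_fq averages (10.24 and 11.92) gives roughly +16.4, although the message notes that those two implement different algorithms and are not compared directly.
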
2 parents ab734b4 + 2b7b5b7 commit fd23ce3

14 files changed, +1585 -11 lines changed

include/linux/btf.h (+1)

@@ -522,6 +522,7 @@ bool btf_param_match_suffix(const struct btf *btf,
 			    const char *suffix);
 int btf_ctx_arg_offset(const struct btf *btf, const struct btf_type *func_proto,
 		       u32 arg_no);
+u32 btf_ctx_arg_idx(struct btf *btf, const struct btf_type *func_proto, int off);
 
 struct bpf_verifier_log;
 

kernel/bpf/btf.c (+3 -3)

@@ -6391,8 +6391,8 @@ static bool is_int_ptr(struct btf *btf, const struct btf_type *t)
 	return btf_type_is_int(t);
 }
 
-static u32 get_ctx_arg_idx(struct btf *btf, const struct btf_type *func_proto,
-			   int off)
+u32 btf_ctx_arg_idx(struct btf *btf, const struct btf_type *func_proto,
+		    int off)
 {
 	const struct btf_param *args;
 	const struct btf_type *t;
@@ -6671,7 +6671,7 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
 				tname, off);
 			return false;
 		}
-		arg = get_ctx_arg_idx(btf, t, off);
+		arg = btf_ctx_arg_idx(btf, t, off);
 		args = (const struct btf_param *)(t + 1);
 		/* if (t == NULL) Fall back to default BPF prog with
 		 * MAX_BPF_FUNC_REG_ARGS u64 arguments.
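
The hunks above only rename `get_ctx_arg_idx()` to `btf_ctx_arg_idx()` and drop `static` so the function can be called from outside btf.c; the function body is not part of this diff. As a rough illustration of what mapping a context offset to an argument index can look like, here is a simplified user-space model. It assumes the context is a run of argument slots, each rounded up to 8 bytes, and it takes plain sizes instead of BTF types. The name `ctx_arg_idx_sketch` and its parameters are hypothetical, and the kernel function may handle cases this model ignores.

```c
#include <stdint.h>

/*
 * Hypothetical model, not the kernel function: given the byte size of
 * each argument, return the index of the argument whose slot contains
 * byte offset `off`. Each argument occupies its size rounded up to a
 * multiple of 8 bytes. Offsets past the last argument return nr_args.
 */
uint32_t ctx_arg_idx_sketch(const uint32_t *arg_sizes, uint32_t nr_args,
			    uint32_t off)
{
	uint32_t end = 0;	/* end offset of the slots seen so far */
	uint32_t i;

	for (i = 0; i < nr_args; i++) {
		end += (arg_sizes[i] + 7u) & ~7u;	/* round up to 8 bytes */
		if (off < end)
			return i;
	}
	return nr_args;
}
```

With three 8-byte arguments, offsets 0, 8 and 16 map to indexes 0, 1 and 2. If the second argument is 16 bytes wide, offsets 8 and 16 both map to index 1, which is why a plain `off / 8` is not enough in this model.
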

net/sched/Kconfig (+12)

@@ -403,6 +403,18 @@ config NET_SCH_ETS
 
 	  If unsure, say N.
 
+config NET_SCH_BPF
+	bool "BPF-based Qdisc"
+	depends on BPF_SYSCALL && BPF_JIT && DEBUG_INFO_BTF
+	help
+	  This option allows BPF-based queueing disiplines. With BPF struct_ops,
+	  users can implement supported operators in Qdisc_ops using BPF programs.
+	  The queue holding skb can be built with BPF maps or graphs.
+
+	  Say Y here if you want to use BPF-based Qdisc.
+
+	  If unsure, say N.
+
 menuconfig NET_SCH_DEFAULT
 	bool "Allow override default queue discipline"
 	help

net/sched/Makefile (+1)

@@ -62,6 +62,7 @@ obj-$(CONFIG_NET_SCH_FQ_PIE)	+= sch_fq_pie.o
 obj-$(CONFIG_NET_SCH_CBS)	+= sch_cbs.o
 obj-$(CONFIG_NET_SCH_ETF)	+= sch_etf.o
 obj-$(CONFIG_NET_SCH_TAPRIO)	+= sch_taprio.o
+obj-$(CONFIG_NET_SCH_BPF)	+= bpf_qdisc.o
 
 obj-$(CONFIG_NET_CLS_U32)	+= cls_u32.o
 obj-$(CONFIG_NET_CLS_ROUTE4)	+= cls_route.o
