| author | Maciej Żenczykowski <maze@google.com> | 2021-06-18 02:12:35 -0700 |
|---|---|---|
| committer | Maciej Żenczykowski <maze@google.com> | 2021-06-22 04:48:28 +0000 |
| commit | 0f104d5bbbc3da8f7a125c6b8967112823ae2f7e | |
| tree | 0ffa78b3c1a6bb6ae8b50152b30f7c83571c9530 /server/Controllers.cpp | |
| parent | 4e7c160003e83e473ea719ec9b4f73d3c125cedb | |
| download | netd-0f104d5bbbc3da8f7a125c6b8967112823ae2f7e.tar.gz | |
make failing to start TrafficController critical.
I don't want to debug any more kernel regressions, like the one
introduced by "bpf: program: Refuse non-O_RDWR flags in BPF_OBJ_GET":
06-18 07:35:36.014 360 360 E TrafficController: Failed to get program from /sys/fs/bpf/prog_netd_cgroupskb_egress_stats: Invalid argument
06-18 07:35:36.023 360 360 E netd : Failed to start trafficcontroller: (Status[code: 22, msg: "[Invalid argument] : cgroup program get failed"])
which fail to get caught and cause failures in bpf_module_test
(that test seems to be run very unreliably; people will much more
easily notice a device simply failing to boot due to netd/systemserver
crash looping).
see also:
https://android-review.googlesource.com/c/kernel/common/+/1738604
https://android-review.googlesource.com/c/kernel/common/+/1741194
https://patchwork.kernel.org/project/netdevbpf/patch/20210618105526.265003-1-zenczykowski@gmail.com/
Test: atest
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Change-Id: If87f02ea6fb713ebbb9a98dbc6c9b6ac477b727e
Diffstat (limited to 'server/Controllers.cpp')
-rw-r--r-- | server/Controllers.cpp | 6 |
1 files changed, 6 insertions, 0 deletions
```diff
diff --git a/server/Controllers.cpp b/server/Controllers.cpp
index 7e2780f0..1f2bac22 100644
--- a/server/Controllers.cpp
+++ b/server/Controllers.cpp
@@ -285,6 +285,12 @@ void Controllers::init() {
     netdutils::Status tcStatus = trafficCtrl.start();
     if (!isOk(tcStatus)) {
         gLog.error("Failed to start trafficcontroller: (%s)", toString(tcStatus).c_str());
+        gLog.error("CRITICAL: sleeping 60 seconds, netd exiting with failure, crash loop likely!");
+        // The expected reason we get here is a major kernel or other code bug, as such
+        // the probability that things will succeed on restart of netd is pretty small.
+        // So, let's wait a minute to at least try to limit the log spam a little bit.
+        sleep(60);
+        exit(1);
     }
     gLog.info("Initializing traffic control: %" PRId64 "us", s.getTimeAndResetUs());
```
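The pattern in this patch (log the failure, pause, then exit non-zero so the daemon is restarted and the crash loop becomes visible) could be factored into a small reusable helper. The sketch below is only an illustration under stated assumptions: `CriticalFailurePolicy` and `failCritically` are hypothetical names that do not exist in netd, it logs to stderr instead of netd's `gLog`, and it uses `std::this_thread::sleep_for` / `std::exit` in place of the patch's POSIX `sleep()` / `exit()`. The sleep and exit steps are injectable so the behaviour can be exercised without actually terminating the process.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <functional>
#include <string>
#include <thread>

// Hypothetical policy object (not part of netd): how long to wait before
// exiting, plus the sleep/exit actions, which tests can replace.
struct CriticalFailurePolicy {
    std::chrono::seconds delay{60};
    std::function<void(std::chrono::seconds)> sleepFn =
        [](std::chrono::seconds d) { std::this_thread::sleep_for(d); };
    std::function<void(int)> exitFn = [](int code) { std::exit(code); };
};

// Hypothetical helper mirroring the patch: log, wait to limit log spam from
// the expected crash loop, then exit with status 1.
void failCritically(const std::string& component,
                    const CriticalFailurePolicy& policy = {}) {
    assert(policy.sleepFn && policy.exitFn);
    std::fprintf(stderr, "Failed to start %s\n", component.c_str());
    std::fprintf(stderr,
                 "CRITICAL: sleeping %lld seconds, exiting with failure, "
                 "crash loop likely!\n",
                 static_cast<long long>(policy.delay.count()));
    policy.sleepFn(policy.delay);  // delay before exit, as in the patch
    policy.exitFn(1);              // non-zero status signals failure
}
```

In production the default policy would sleep for 60 seconds and then terminate the process; replacing `sleepFn` and `exitFn` with recording lambdas lets a test confirm the delay and exit code instead.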