author | Maciej Żenczykowski <maze@google.com> | 2021-06-22 05:23:30 +0000 |
---|---|---|
committer | Maciej Zenczykowski <maze@google.com> | 2021-06-23 02:57:30 +0000 |
commit | 8449fac88ee70e329d856377a85456d27f2eef35 (patch) | |
tree | 9b0e25b1a12515fba3598ff96802fdf83f254758 | |
parent | bc0954cad3839951b81f11ee410a0189eb4b3b15 (diff) | |
make failing to start TrafficController critical.
I don't want to debug any more kernel regressions, like the one
introduced by "bpf: program: Refuse non-O_RDWR flags in BPF_OBJ_GET":
06-18 07:35:36.014 360 360 E TrafficController: Failed to get program from /sys/fs/bpf/prog_netd_cgroupskb_egress_stats: Invalid argument
06-18 07:35:36.023 360 360 E netd : Failed to start trafficcontroller: (Status[code: 22, msg: "[Invalid argument] : cgroup program get failed"])
Such regressions currently go uncaught: they only cause failures in
bpf_module_test, and that test seems to be run very unreliably. People
will much more readily notice a device that simply fails to boot because
netd/systemserver is crash looping.
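To make the failing call concrete, here is a minimal sketch of fetching a pinned BPF object through the raw bpf(2) syscall with BPF_OBJ_GET. It is not netd's or libbpf_android's actual code. The helper name `bpfObjGet` is hypothetical, and whether netd's loader passed `BPF_F_RDONLY` for programs is an assumption here. The sketch only shows where `file_flags` enters the request: the regressing kernel change rejects anything other than read/write access for programs, which surfaces as EINVAL ("Invalid argument").

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <linux/bpf.h>
#include <sys/syscall.h>
#include <unistd.h>

// Hypothetical helper (sketch): open a BPF object pinned in bpffs.
// Returns a new fd on success, or -1 with errno set on failure.
// fileFlags == 0 requests read/write access; BPF_F_RDONLY requests
// read-only access. On a kernel carrying "bpf: program: Refuse non-O_RDWR
// flags in BPF_OBJ_GET", a program fetched with BPF_F_RDONLY fails with EINVAL.
int bpfObjGet(const char* pathname, std::uint32_t fileFlags) {
    assert(pathname != nullptr);
    union bpf_attr attr;
    // Fields past the ones BPF_OBJ_GET uses must be zero, so clear everything.
    std::memset(&attr, 0, sizeof(attr));
    attr.pathname = static_cast<std::uint64_t>(reinterpret_cast<std::uintptr_t>(pathname));
    attr.file_flags = fileFlags;
    return static_cast<int>(syscall(__NR_bpf, BPF_OBJ_GET, &attr, sizeof(attr)));
}
```

For example, `bpfObjGet("/sys/fs/bpf/prog_netd_cgroupskb_egress_stats", BPF_F_RDONLY)` would presumably return -1 with errno EINVAL on an affected kernel, consistent with the TrafficController log lines quoted in this message.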
see also:
https://android-review.googlesource.com/c/kernel/common/+/1738604
https://android-review.googlesource.com/c/kernel/common/+/1741194
https://patchwork.kernel.org/project/netdevbpf/patch/20210618105526.265003-1-zenczykowski@gmail.com/
Test: atest
Bug: 191826861
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Original-Change: https://android-review.googlesource.com/1740456
Merged-In: If87f02ea6fb713ebbb9a98dbc6c9b6ac477b727e
Change-Id: If87f02ea6fb713ebbb9a98dbc6c9b6ac477b727e
-rw-r--r-- | server/Controllers.cpp | 6 |
1 files changed, 6 insertions, 0 deletions
diff --git a/server/Controllers.cpp b/server/Controllers.cpp
index 7e2780f0..1f2bac22 100644
--- a/server/Controllers.cpp
+++ b/server/Controllers.cpp
@@ -285,6 +285,12 @@ void Controllers::init() {
     netdutils::Status tcStatus = trafficCtrl.start();
     if (!isOk(tcStatus)) {
         gLog.error("Failed to start trafficcontroller: (%s)", toString(tcStatus).c_str());
+        gLog.error("CRITICAL: sleeping 60 seconds, netd exiting with failure, crash loop likely!");
+        // The expected reason we get here is a major kernel or other code bug, as such
+        // the probability that things will succeed on restart of netd is pretty small.
+        // So, let's wait a minute to at least try to limit the log spam a little bit.
+        sleep(60);
+        exit(1);
     }
 
     gLog.info("Initializing traffic control: %" PRId64 "us", s.getTimeAndResetUs());
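The policy this patch adds can be sketched as a small reusable helper. When a critical component fails to start, the helper logs loudly, waits out a back-off period so a likely crash loop produces less log spam, and then exits non-zero so that whatever supervises the daemon (init, in netd's case) restarts it. The sketch assumes a simplified stand-in for `netdutils::Status`. The names `Status`, `isOk` and `failFastIfNotOk` are hypothetical and are not netd's API. Sleep and exit are injected so the policy can be exercised without actually terminating the process.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <functional>
#include <string>
#include <unistd.h>

// Hypothetical stand-in for netdutils::Status: code 0 means success,
// anything else is an errno-style error code plus a message.
struct Status {
    int code;
    std::string msg;
};

bool isOk(const Status& s) { return s.code == 0; }

// Sketch of the fail-fast-with-back-off policy: on failure, log,
// sleep for backoffSeconds, then exit with EXIT_FAILURE (1 on POSIX).
void failFastIfNotOk(const Status& status, unsigned backoffSeconds,
                     const std::function<void(unsigned)>& sleepFn,
                     const std::function<void(int)>& exitFn) {
    assert(sleepFn && exitFn);
    if (isOk(status)) return;  // healthy start: nothing to do
    std::fprintf(stderr, "Failed to start trafficcontroller: (%s)\n", status.msg.c_str());
    std::fprintf(stderr, "CRITICAL: sleeping %u seconds, exiting with failure\n",
                 backoffSeconds);
    // A restart is unlikely to help if the cause is a kernel bug,
    // so the delay only throttles the crash loop's log output.
    sleepFn(backoffSeconds);
    exitFn(EXIT_FAILURE);
}

// Production wiring: real sleep(3) and exit(3), 60-second back-off as in the patch.
void failFastIfNotOk(const Status& status) {
    failFastIfNotOk(
        status, 60, [](unsigned s) { sleep(s); }, [](int code) { std::exit(code); });
}
```

Making the failure fatal trades a partially working netd for an obvious boot failure. The commit message argues that this is preferable because such a failure cannot be missed, while a failing bpf_module_test can be.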