author Maciej Żenczykowski <maze@google.com> 2021-06-18 02:12:35 -0700
committer Maciej Żenczykowski <maze@google.com> 2021-06-22 04:48:28 +0000
commit 0f104d5bbbc3da8f7a125c6b8967112823ae2f7e
tree 0ffa78b3c1a6bb6ae8b50152b30f7c83571c9530 /server/Controllers.cpp
parent 4e7c160003e83e473ea719ec9b4f73d3c125cedb
make failing to start TrafficController critical.

I don't want to debug any more kernel regressions, like the one
introduced by "bpf: program: Refuse non-O_RDWR flags in BPF_OBJ_GET":

  06-18 07:35:36.014 360 360 E TrafficController: Failed to get program from /sys/fs/bpf/prog_netd_cgroupskb_egress_stats: Invalid argument
  06-18 07:35:36.023 360 360 E netd : Failed to start trafficcontroller: (Status[code: 22, msg: "[Invalid argument] : cgroup program get failed"])

which fail to get caught and cause failures in bpf_module_test
(the test seems to get run very unreliably, people will much more
easily notice device simply failing to boot due to netd/systemserver
crash looping).

see also:
  https://android-review.googlesource.com/c/kernel/common/+/1738604
  https://android-review.googlesource.com/c/kernel/common/+/1741194
  https://patchwork.kernel.org/project/netdevbpf/patch/20210618105526.265003-1-zenczykowski@gmail.com/

Test: atest
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Change-Id: If87f02ea6fb713ebbb9a98dbc6c9b6ac477b727e

Diffstat (limited to 'server/Controllers.cpp')
-rw-r--r-- server/Controllers.cpp | 6
1 files changed, 6 insertions, 0 deletions
diff --git a/server/Controllers.cpp b/server/Controllers.cpp
index 7e2780f0..1f2bac22 100644
--- a/server/Controllers.cpp
+++ b/server/Controllers.cpp
@@ -285,6 +285,12 @@ void Controllers::init() {
     netdutils::Status tcStatus = trafficCtrl.start();
     if (!isOk(tcStatus)) {
         gLog.error("Failed to start trafficcontroller: (%s)", toString(tcStatus).c_str());
+        gLog.error("CRITICAL: sleeping 60 seconds, netd exiting with failure, crash loop likely!");
+        // The expected reason we get here is a major kernel or other code bug, as such
+        // the probability that things will succeed on restart of netd is pretty small.
+        // So, let's wait a minute to at least try to limit the log spam a little bit.
+        sleep(60);
+        exit(1);
     }
     gLog.info("Initializing traffic control: %" PRId64 "us", s.getTimeAndResetUs());
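
The pattern in the hunk above (log the failure, delay to limit log spam, then exit non-zero so the supervisor restarts the daemon) can be sketched in isolation roughly as follows. This is only an illustration, not netd code: `StartResult`, `FailFastHooks`, `defaultHooks` and `startOrDie` are hypothetical names, `StartResult` is a stand-in for `netdutils::Status`, and the sleep and exit calls are passed in as hooks purely so the behavior can be exercised without killing the process.

```cpp
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <functional>
#include <string>
#include <thread>

// Hypothetical stand-in for netdutils::Status (not the real type).
struct StartResult {
    int code;         // 0 means success, otherwise an errno-style code
    std::string msg;  // human-readable reason
};

// Hypothetical hooks so the delay and the exit can be replaced in tests.
struct FailFastHooks {
    std::function<void(std::chrono::seconds)> sleepFor;
    std::function<void(int)> terminate;
};

// Hooks a real daemon would plausibly use: actually sleep, actually exit.
inline FailFastHooks defaultHooks() {
    return FailFastHooks{
        [](std::chrono::seconds d) { std::this_thread::sleep_for(d); },
        [](int status) { std::exit(status); },
    };
}

// Runs `start`. On success returns true. On failure logs the error, waits
// `delay` to limit log spam from a likely crash loop, then calls
// hooks.terminate(1). Returns false only if terminate() returns, which
// happens with test hooks but not with defaultHooks().
inline bool startOrDie(const std::function<StartResult()>& start,
                       const FailFastHooks& hooks,
                       std::chrono::seconds delay = std::chrono::seconds(60)) {
    const StartResult result = start();
    if (result.code == 0) return true;

    std::fprintf(stderr, "Failed to start component: (%d: %s)\n",
                 result.code, result.msg.c_str());
    std::fprintf(stderr, "CRITICAL: sleeping %lld seconds, exiting with failure\n",
                 static_cast<long long>(delay.count()));
    hooks.sleepFor(delay);
    hooks.terminate(1);
    return false;
}
```

A caller in this sketch would write something like `startOrDie(startFn, defaultHooks());` during initialization; the 60-second default mirrors the `sleep(60)` in the commit.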