author    Maciej Żenczykowski <maze@google.com>  2021-06-22 05:23:30 +0000
committer Maciej Zenczykowski <maze@google.com>  2021-06-23 02:57:30 +0000
commit    8449fac88ee70e329d856377a85456d27f2eef35
tree      9b0e25b1a12515fba3598ff96802fdf83f254758
parent    bc0954cad3839951b81f11ee410a0189eb4b3b15
make failing to start TrafficController critical.

I don't want to debug any more kernel regressions, like the one
introduced by "bpf: program: Refuse non-O_RDWR flags in BPF_OBJ_GET":

  06-18 07:35:36.014 360 360 E TrafficController: Failed to get program from /sys/fs/bpf/prog_netd_cgroupskb_egress_stats: Invalid argument
  06-18 07:35:36.023 360 360 E netd : Failed to start trafficcontroller: (Status[code: 22, msg: "[Invalid argument] : cgroup program get failed"])

which fail to get caught and cause failures in bpf_module_test
(the test seems to get run very unreliably, people will much
more easily notice device simply failing to boot due to
netd/systemserver crash looping).

see also:
  https://android-review.googlesource.com/c/kernel/common/+/1738604
  https://android-review.googlesource.com/c/kernel/common/+/1741194
  https://patchwork.kernel.org/project/netdevbpf/patch/20210618105526.265003-1-zenczykowski@gmail.com/

Test: atest
Bug: 191826861
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Original-Change: https://android-review.googlesource.com/1740456
Merged-In: If87f02ea6fb713ebbb9a98dbc6c9b6ac477b727e
Change-Id: If87f02ea6fb713ebbb9a98dbc6c9b6ac477b727e

-rw-r--r-- server/Controllers.cpp 6
1 files changed, 6 insertions, 0 deletions
diff --git a/server/Controllers.cpp b/server/Controllers.cpp
index 7e2780f0..1f2bac22 100644
--- a/server/Controllers.cpp
+++ b/server/Controllers.cpp
@@ -285,6 +285,12 @@ void Controllers::init() {
     netdutils::Status tcStatus = trafficCtrl.start();
     if (!isOk(tcStatus)) {
         gLog.error("Failed to start trafficcontroller: (%s)", toString(tcStatus).c_str());
+        gLog.error("CRITICAL: sleeping 60 seconds, netd exiting with failure, crash loop likely!");
+        // The expected reason we get here is a major kernel or other code bug, as such
+        // the probability that things will succeed on restart of netd is pretty small.
+        // So, let's wait a minute to at least try to limit the log spam a little bit.
+        sleep(60);
+        exit(1);
     }
     gLog.info("Initializing traffic control: %" PRId64 "us", s.getTimeAndResetUs());
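
The policy this change adds (log the failure, pause, then exit with a non-zero status) can be sketched in isolation. The block below is an illustration only, not netd code: `InitStatus`, `FatalPolicy`, and `startOrDie` are hypothetical names, `InitStatus` is a simplified stand-in for `netdutils::Status`, and the injectable sleep and exit hooks are an assumption added so the behaviour can be exercised without actually waiting or terminating the process.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <functional>
#include <string>
#include <thread>

// Hypothetical stand-in for netdutils::Status: errno-style code plus message.
struct InitStatus {
    int code;         // 0 means success
    std::string msg;
    bool ok() const { return code == 0; }
};

// Hypothetical policy object. The defaults mirror the patch: wait 60 seconds,
// then terminate. The hooks are replaceable so a test can observe the calls.
struct FatalPolicy {
    std::chrono::seconds delay{60};
    std::function<void(std::chrono::seconds)> sleepFn =
        [](std::chrono::seconds d) { std::this_thread::sleep_for(d); };
    std::function<void(int)> exitFn = [](int rc) { std::exit(rc); };
};

// Runs `start`. Returns true on success. On failure it logs, waits for
// policy.delay, and calls policy.exitFn(1); with the default exitFn the
// function does not return in that case.
inline bool startOrDie(const char* name,
                       const std::function<InitStatus()>& start,
                       const FatalPolicy& policy = FatalPolicy{}) {
    assert(policy.delay.count() >= 0);
    const InitStatus st = start();
    if (st.ok()) return true;

    std::fprintf(stderr, "Failed to start %s: (code %d: %s)\n",
                 name, st.code, st.msg.c_str());
    std::fprintf(stderr, "CRITICAL: sleeping %lld seconds, exiting with failure\n",
                 static_cast<long long>(policy.delay.count()));
    policy.sleepFn(policy.delay);  // slows the restart loop and its log spam
    policy.exitFn(1);
    return false;  // reached only when exitFn has been replaced and returns
}
```

A caller would wrap the component's start routine, for example `startOrDie("trafficcontroller", [] { return InitStatus{22, "cgroup program get failed"}; })`, which with the default policy blocks for a minute and then exits with status 1. The delay does not make a restart any more likely to succeed; it only spaces out the restarts, which is the trade-off the comment in the patch describes.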