summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorRobert Carr <racarr@google.com>2021-12-31 16:59:34 -0800
committerRob Carr <racarr@google.com>2022-09-28 21:03:54 +0000
commit405e2f68fbe2992e3b06f1f0a55df331d5150a64 (patch)
treefb2c2e42edd72b8cfa2ebb7e06c892e46aad9199
parent59e3eb4a04ab5e8d5ef569f19d5ab71b2e5b472e (diff)
downloadnative-405e2f68fbe2992e3b06f1f0a55df331d5150a64.tar.gz
BlastBufferQueue: Fake release if not received by complete
This is a cherry-pick from of a CL from sc-v2 adding a safety mechanism for lost buffers. The original CL probably should have been merged in to master, and I can't relate to my thought process adding DO NOT MERGE on the original CL. Recently we have started seeing some of these ANRs again. Well actually we have seen them continuously, but most have been due to GPU hangs. Recently we started seeing some which aren't due to GPU hangs. We found one trace, which showed the sort of situation described below, which lead me to realizing this code was never merged in to master. It shoud be relatively safe since we released it on sc-v2. Perfetto telemetry shows a cluster where a transactionComplete is arriving but an earlier releaseBufferCallback has never arrived. This leads to blocking and ANRs. We try to work around this by generating a fake release buffer callback for previous buffers when we receive a TransactionCompleteCallback. How can we know this is safe? The criteria to safely release buffers are as follows: Condition 1. SurfaceFlinger does not plan to use the buffer Condition 2. Have the acquire fence (or a later signalling fence)if dropped, have the HWC release fence if presented. Now lets break down cases where the transaction complete callback arrives before the release buffer callback. All release buffer callbacks fall in to two categories: Category 1. In band with the complete callback. In which case the client library in SurfaceComposerClient always sends release callbacks first. Category 2. Out of band, e.g. from setBuffer or merge when dropping buffers In category 2 there are two scenarios: Scenario 1: Release callback was never going to arrive (bug) Scenario 2. Release callback descheduled, e.g. a. Transaction is filled with buffer and held in VRI/WM/SysUI b. A second transaction with buffer is merged in, the release buffer callback is emitted but not scheduled (async binder) c. The merged transaction is applied d. SurfaceFlinger processes a frame and emits the transaction complete callback e. The transaction complete callback is scheduled before the release callback is ever scheduled (since the 1 way binders are from different processes). In both scenarios, we can satisfy Conditions 1 and 2, because the HWC present fence is a later signalling fence than the acquire fence which the out of band callback will pass. While we may block extra on this fence. We will be safe by condition 2. Receiving the transaction complete callback should indicate condition 1 is satisfied. In category 1 there should only be a single scenario, the release buffer callback for frame N-1 is emitted immediately before the transaction callback for frame N, and so if it hasn't come at that time (and isn't out of band/scheduled e.g. category 2) then it will never come. We can further break this down in to two scenarios: 1. stat.previousReleaseFence is set correctly. This is the expected case. In this case, this is the same fence we would receive from the release buffer callback, and so we can satisfy condition 1 and 2 and safely release. 2. stat.previousReleaseFence is not set correctly. There is no known reason for this to happen, but there is also no known reason why we would receive a complete callback without the corresponding release, and so we have to explore the option as a potential risk/behavior change from the change. Hypothetically in category 1 case 2 we could experience some tearing. In category 1 we are currently experiencing ANR, and so the tradeoff is a risk of tearing vs a certainty of ANR. To tear the following would have to happen: 1. Enter the case where we previously ANRed 2. Enter the subcase where the release fence is not set correctly 3. Be scheduled very fast on the client side, in practicality HWC present has already returned and the fence should be firing very soon 4. The previous frame not be GL comp, in which case we were already done with the buffer and don't need the fence anyway. Conditions 2 and 3 should already make the occurence of tearing much lower than the occurence of ANRs. Furthermore 100% of observed ANRs were during GL comp (rotation animation) and so the telemetry indicates the risk of introducing tearing is very low. To review, we have shown that we can break down the side effects from the change in to two buckets (category 1, and category 2). In category 2, the possible negative side effect is to use too late of a fence. However this is still safe, and should be exceedingly rare. In category 1 we have shown that there are two scenarios, and in one of these scenarios we can experience tearing. We have furthermore argued that the risk of tearing should be exceedingly low even in this scenario, while the risk of ANR in this scenario was nearly 100%. Bug: 247246160 Bug: 244218818 Test: Existing tests pass Change-Id: I757ed132ac076aa811df816e04a68f57b38e47e7
-rw-r--r--libs/gui/BLASTBufferQueue.cpp28
-rw-r--r--libs/gui/SurfaceComposerClient.cpp6
-rw-r--r--libs/gui/include/gui/BLASTBufferQueue.h9
-rw-r--r--libs/gui/include/gui/SurfaceComposerClient.h7
-rw-r--r--libs/gui/include/gui/test/CallbackUtils.h3
5 files changed, 46 insertions, 7 deletions
diff --git a/libs/gui/BLASTBufferQueue.cpp b/libs/gui/BLASTBufferQueue.cpp
index aba81f60d8..a51bbb1553 100644
--- a/libs/gui/BLASTBufferQueue.cpp
+++ b/libs/gui/BLASTBufferQueue.cpp
@@ -307,7 +307,6 @@ void BLASTBufferQueue::transactionCommittedCallback(nsecs_t /*latchTime*/,
BQA_LOGE("No matching SurfaceControls found: mSurfaceControlsWithPendingCallback was "
"empty.");
}
-
decStrong((void*)transactionCommittedCallbackThunk);
}
}
@@ -350,6 +349,20 @@ void BLASTBufferQueue::transactionCallback(nsecs_t /*latchTime*/, const sp<Fence
stat.latchTime,
stat.frameEventStats.dequeueReadyTime);
}
+ auto currFrameNumber = stat.frameEventStats.frameNumber;
+ std::vector<ReleaseCallbackId> staleReleases;
+ for (const auto& [key, value]: mSubmitted) {
+ if (currFrameNumber > key.framenumber) {
+ staleReleases.push_back(key);
+ }
+ }
+ for (const auto& staleRelease : staleReleases) {
+ BQA_LOGE("Faking releaseBufferCallback from transactionCompleteCallback");
+ BBQ_TRACE("FakeReleaseCallback");
+ releaseBufferCallbackLocked(staleRelease,
+ stat.previousReleaseFence ? stat.previousReleaseFence : Fence::NO_FENCE,
+ stat.currentMaxAcquiredBufferCount);
+ }
} else {
BQA_LOGE("Failed to find matching SurfaceControl in transactionCallback");
}
@@ -390,7 +403,14 @@ void BLASTBufferQueue::releaseBufferCallback(
const ReleaseCallbackId& id, const sp<Fence>& releaseFence,
std::optional<uint32_t> currentMaxAcquiredBufferCount) {
BBQ_TRACE();
+
std::unique_lock _lock{mMutex};
+ releaseBufferCallbackLocked(id, releaseFence, currentMaxAcquiredBufferCount);
+}
+
+void BLASTBufferQueue::releaseBufferCallbackLocked(const ReleaseCallbackId& id,
+ const sp<Fence>& releaseFence, std::optional<uint32_t> currentMaxAcquiredBufferCount) {
+ ATRACE_CALL();
BQA_LOGV("releaseBufferCallback %s", id.to_string().c_str());
// Calculate how many buffers we need to hold before we release them back
@@ -408,7 +428,11 @@ void BLASTBufferQueue::releaseBufferCallback(
const auto numPendingBuffersToHold =
isEGL ? std::max(0u, mMaxAcquiredBuffers - mCurrentMaxAcquiredBufferCount) : 0;
- mPendingRelease.emplace_back(ReleasedBuffer{id, releaseFence});
+
+ auto rb = ReleasedBuffer{id, releaseFence};
+ if (std::find(mPendingRelease.begin(), mPendingRelease.end(), rb) == mPendingRelease.end()) {
+ mPendingRelease.emplace_back(rb);
+ }
// Release all buffers that are beyond the ones that we need to hold
while (mPendingRelease.size() > numPendingBuffersToHold) {
diff --git a/libs/gui/SurfaceComposerClient.cpp b/libs/gui/SurfaceComposerClient.cpp
index 9358e29030..0f5192d41c 100644
--- a/libs/gui/SurfaceComposerClient.cpp
+++ b/libs/gui/SurfaceComposerClient.cpp
@@ -352,7 +352,8 @@ void TransactionCompletedListener::onTransactionCompleted(ListenerStats listener
transactionStats.latchTime, surfaceStats.acquireTimeOrFence,
transactionStats.presentFence,
surfaceStats.previousReleaseFence, surfaceStats.transformHint,
- surfaceStats.eventStats);
+ surfaceStats.eventStats,
+ surfaceStats.currentMaxAcquiredBufferCount);
}
callbackFunction(transactionStats.latchTime, transactionStats.presentFence,
@@ -377,7 +378,8 @@ void TransactionCompletedListener::onTransactionCompleted(ListenerStats listener
transactionStats.latchTime, surfaceStats.acquireTimeOrFence,
transactionStats.presentFence,
surfaceStats.previousReleaseFence, surfaceStats.transformHint,
- surfaceStats.eventStats);
+ surfaceStats.eventStats,
+ surfaceStats.currentMaxAcquiredBufferCount);
if (callbacksMap[callbackId].surfaceControls[surfaceStats.surfaceControl]) {
callbacksMap[callbackId]
.surfaceControls[surfaceStats.surfaceControl]
diff --git a/libs/gui/include/gui/BLASTBufferQueue.h b/libs/gui/include/gui/BLASTBufferQueue.h
index 9d287910a5..f5898d20f1 100644
--- a/libs/gui/include/gui/BLASTBufferQueue.h
+++ b/libs/gui/include/gui/BLASTBufferQueue.h
@@ -95,9 +95,12 @@ public:
const std::vector<SurfaceControlStats>& stats);
void releaseBufferCallback(const ReleaseCallbackId& id, const sp<Fence>& releaseFence,
std::optional<uint32_t> currentMaxAcquiredBufferCount);
+ void releaseBufferCallbackLocked(const ReleaseCallbackId& id, const sp<Fence>& releaseFence,
+ std::optional<uint32_t> currentMaxAcquiredBufferCount);
void syncNextTransaction(std::function<void(SurfaceComposerClient::Transaction*)> callback,
bool acquireSingleBuffer = true);
void stopContinuousSyncTransaction();
+
void mergeWithNextTransaction(SurfaceComposerClient::Transaction* t, uint64_t frameNumber);
void applyPendingTransactions(uint64_t frameNumber);
SurfaceComposerClient::Transaction* gatherPendingTransactions(uint64_t frameNumber);
@@ -177,6 +180,12 @@ private:
struct ReleasedBuffer {
ReleaseCallbackId callbackId;
sp<Fence> releaseFence;
+ bool operator==(const ReleasedBuffer& rhs) const {
+ // Only compare Id so if we somehow got two callbacks
+ // with different fences we don't decrement mNumAcquired
+ // too far.
+ return rhs.callbackId == callbackId;
+ }
};
std::deque<ReleasedBuffer> mPendingRelease GUARDED_BY(mMutex);
diff --git a/libs/gui/include/gui/SurfaceComposerClient.h b/libs/gui/include/gui/SurfaceComposerClient.h
index b598b43403..9033e17c53 100644
--- a/libs/gui/include/gui/SurfaceComposerClient.h
+++ b/libs/gui/include/gui/SurfaceComposerClient.h
@@ -65,14 +65,16 @@ struct SurfaceControlStats {
SurfaceControlStats(const sp<SurfaceControl>& sc, nsecs_t latchTime,
std::variant<nsecs_t, sp<Fence>> acquireTimeOrFence,
const sp<Fence>& presentFence, const sp<Fence>& prevReleaseFence,
- uint32_t hint, FrameEventHistoryStats eventStats)
+ uint32_t hint, FrameEventHistoryStats eventStats,
+ uint32_t currentMaxAcquiredBufferCount)
: surfaceControl(sc),
latchTime(latchTime),
acquireTimeOrFence(std::move(acquireTimeOrFence)),
presentFence(presentFence),
previousReleaseFence(prevReleaseFence),
transformHint(hint),
- frameEventStats(eventStats) {}
+ frameEventStats(eventStats),
+ currentMaxAcquiredBufferCount(currentMaxAcquiredBufferCount) {}
sp<SurfaceControl> surfaceControl;
nsecs_t latchTime = -1;
@@ -81,6 +83,7 @@ struct SurfaceControlStats {
sp<Fence> previousReleaseFence;
uint32_t transformHint = 0;
FrameEventHistoryStats frameEventStats;
+ uint32_t currentMaxAcquiredBufferCount = 0;
};
using TransactionCompletedCallbackTakesContext =
diff --git a/libs/gui/include/gui/test/CallbackUtils.h b/libs/gui/include/gui/test/CallbackUtils.h
index 62d1496ccd..08785b49c1 100644
--- a/libs/gui/include/gui/test/CallbackUtils.h
+++ b/libs/gui/include/gui/test/CallbackUtils.h
@@ -135,7 +135,8 @@ private:
void verifySurfaceControlStats(const SurfaceControlStats& surfaceControlStats,
nsecs_t latchTime) const {
const auto& [surfaceControl, latch, acquireTimeOrFence, presentFence,
- previousReleaseFence, transformHint, frameEvents] = surfaceControlStats;
+ previousReleaseFence, transformHint, frameEvents, ignore] =
+ surfaceControlStats;
ASSERT_TRUE(std::holds_alternative<nsecs_t>(acquireTimeOrFence));
ASSERT_EQ(std::get<nsecs_t>(acquireTimeOrFence) > 0,