[Official] MIDCET-4458, GPUCORE-36402: Check for process exit before page alloc from kthread

The backing pages for native GPU allocations aren't always allocated in the ioctl context. A JIT_ALLOC softjob or KCPU command can be processed in a kernel worker thread, and GPU page fault handling is done in a kernel thread in any case.

Userspace can make Kbase allocate a large number of backing pages from a kernel thread to cause an out-of-memory situation, which would eventually lead to a kernel panic once the OoM killer runs out of suitable processes to kill. Although Kbase accounts for the backing pages and the OoM killer will try to kill the culprit process, the memory already allocated by the process won't be freed, because context termination remains blocked, or doesn't kick in, for as long as the kernel thread keeps trying to allocate the backing pages.

For an allocation done from the context of a kernel thread, the OoM killer won't consider the kernel thread for killing, and the kernel keeps retrying to allocate a physical page as long as the OoM killer is able to kill processes. For a memory allocation done from the ioctl context, the kernel eventually stops retrying when it sees that the process has been marked for killing by the OoM killer.

This commit adds a check for process exit in the page allocation loop. The check allows the kernel thread to exit the page allocation loop swiftly once the OoM killer has started killing the culprit process (the one for which the kernel thread is trying to allocate pages), thereby unblocking context termination and the freeing of GPU memory already allocated by the process. This helps prevent the kernel panic and also limits the number of innocent processes that get killed.

The use of the __GFP_RETRY_MAYFAIL flag didn't help in all scenarios. The flag ensures that the OoM killer is not invoked directly and that the kernel doesn't keep retrying to allocate the page. But when the system is running low on memory, other threads can invoke the OoM killer, and the page allocation request from the kthread can continue to be satisfied by the killing of other processes, so the kthread may not always exit the page allocation loop in a timely manner.

(cherry picked from commit 3c5c9328a7fc552e61972c1bbff4b56696682d30)

GPUCORE-36402: Fix potential memleak and NULL ptr deref issue in Kbase

Commit 3c5c9328a7fc552e61972c1bbff4b56696682d30 updated Kbase to check for process exit in every iteration of the page allocation loop when the allocation is done from the context of a kernel worker thread. That commit introduced a potential memory leak and a NULL pointer dereference issue (reported by Coverity).

This commit adds the required fix for the two issues and also sets the task pointer only for userspace-created contexts, not for contexts created by Kbase, i.e. the privileged contexts created for HW counter dumping and for the workaround (WA) of HW issue TRYM-3485.

Bug: 275614526
Change-Id: I8107edce09a2cb52d8586fc9f7990a25166f590e
Signed-off-by: Guus Sliepen <gsliepen@google.com>
Provenance: https://code.ipdelivery.arm.com/c/GPU/mali-ddk/+/5169
(cherry picked from commit 8294169160ebb0d11d7d22b11311ddf887fb0b63)
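
The process-exit check described in the commit message can be illustrated with a minimal userspace sketch. This is not the Kbase implementation: `mock_task`, `mock_pool`, `MOCK_PF_EXITING`, `can_alloc_page` and `mock_pool_grow` are hypothetical stand-ins invented for illustration, and the `from_kthread` argument is an assumption about how the caller distinguishes kernel-thread context from ioctl context. It only models the idea that the loop re-checks the owning process on every iteration and bails out once that process is being killed.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for kernel types; not the Kbase definitions. */
#define MOCK_PF_EXITING 0x4u /* models a "task is exiting" flag bit */

struct mock_task {
	unsigned int flags;        /* MOCK_PF_EXITING is set once the process is exiting */
	bool fatal_signal_pending; /* models a fatal signal queued by the OoM killer */
};

struct mock_pool {
	size_t cur_size; /* pages currently held by the pool */
	size_t sys_free; /* pages the simulated system can still hand out */
};

/* Allocation is always allowed from the owner's own (ioctl) context and for
 * contexts without an owner task (modelling Kbase-created contexts, whose
 * task pointer is left unset). Otherwise, stop once the owner is being killed.
 */
static bool can_alloc_page(const struct mock_task *owner, bool from_kthread)
{
	if (!from_kthread || owner == NULL)
		return true;
	return !((owner->flags & MOCK_PF_EXITING) || owner->fatal_signal_pending);
}

/* Grow the pool by nr_to_grow pages, re-checking the owner on every iteration.
 * Returns 0 on success, or -ENOMEM if the owner is exiting or memory runs out.
 */
static int mock_pool_grow(struct mock_pool *pool, size_t nr_to_grow,
			  const struct mock_task *owner, bool from_kthread)
{
	size_t i;

	for (i = 0; i < nr_to_grow; i++) {
		if (!can_alloc_page(owner, from_kthread))
			return -ENOMEM;
		if (pool->sys_free == 0)
			return -ENOMEM;
		pool->sys_free--;
		pool->cur_size++;
	}
	return 0;
}
```

In this model, a grow request issued on behalf of a task with a pending fatal signal fails immediately when made from kthread context, while the same request with a NULL owner or from the owner's own context proceeds as before.
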
author: Suzanne Candanedo <suzanne.candanedo@arm.com> 2023-04-12 12:31:53 +0100
committer: Guus Sliepen <gsliepen@google.com> 2023-05-09 07:46:56 +0000
commit: 41f159f6de2788d7ce6993ba20218bcb8392ace1 (patch)
tree: 74f90aee88180d6e4dabbb2d7e326abfb2e70d78 /mali_kbase/mali_kbase_mem.c
parent: b08aa4e87a4adc0af4fea283d3af26637e2fdd8a (diff)
1 files changed, 7 insertions, 12 deletions
diff --git a/mali_kbase/mali_kbase_mem.c b/mali_kbase/mali_kbase_mem.c
index ce6e94c..fc25a71 100644
--- a/mali_kbase/mali_kbase_mem.c
+++ b/mali_kbase/mali_kbase_mem.c
@@ -2471,11 +2471,8 @@ int kbase_alloc_phy_pages_helper(struct kbase_mem_phy_alloc *alloc,
 	if (nr_left >= (SZ_2M / SZ_4K)) {
 		int nr_lp = nr_left / (SZ_2M / SZ_4K);
 
-		res = kbase_mem_pool_alloc_pages(
-			&kctx->mem_pools.large[alloc->group_id],
-			 nr_lp * (SZ_2M / SZ_4K),
-			 tp,
-			 true);
+		res = kbase_mem_pool_alloc_pages(&kctx->mem_pools.large[alloc->group_id],
+						 nr_lp * (SZ_2M / SZ_4K), tp, true, kctx->task);
 
 		if (res > 0) {
 			nr_left -= res;
@@ -2527,9 +2524,8 @@ int kbase_alloc_phy_pages_helper(struct kbase_mem_phy_alloc *alloc,
 				if (np)
 					break;
 
-				err = kbase_mem_pool_grow(
-					&kctx->mem_pools.large[alloc->group_id],
-					1);
+				err = kbase_mem_pool_grow(&kctx->mem_pools.large[alloc->group_id],
+							  1, kctx->task);
 				if (err)
 					break;
 			} while (1);
@@ -2574,9 +2570,8 @@ no_new_partial:
 #endif
 
 	if (nr_left) {
-		res = kbase_mem_pool_alloc_pages(
-			&kctx->mem_pools.small[alloc->group_id],
-			nr_left, tp, false);
+		res = kbase_mem_pool_alloc_pages(&kctx->mem_pools.small[alloc->group_id], nr_left,
+						 tp, false, kctx->task);
 		if (res <= 0)
 			goto alloc_failed;
 	}
@@ -4074,7 +4069,7 @@ static int kbase_jit_grow(struct kbase_context *kctx,
 		spin_unlock(&kctx->mem_partials_lock);
 
 		kbase_gpu_vm_unlock(kctx);
-		ret = kbase_mem_pool_grow(pool, pool_delta);
+		ret = kbase_mem_pool_grow(pool, pool_delta, kctx->task);
 		kbase_gpu_vm_lock(kctx);
 
 		if (ret)