path: root/mali_kbase
Age | Commit message | Author
2023-07-12 | Constrain protected memory allocation during FW initialization | Jörg Wagner
For protected FW sections, avoid trying to grab a 2MB page and instead continue to use small pages with a tight size. If an allocation fails, do not fail the whole device initialization; treat this case like not finding a protected allocator: remove the allocator reference from the device and continue. Bug 264977054 Commit-Topic: R43P0_KMD Change-Id: I024503ef833eb01d2e36e3075e39aea30d891a80 Signed-off-by: Debarshi Dutta <debarshid@google.com>
2023-07-12 | Merge upstream DDK R43P0 KMD | Debarshi Dutta
Merge DDK version R43P0 from upstream branch Provenance: 48a9c7e25986318c8475bc245de51e7bec2606e8 (ipdelivery/EAC/v_r43p0) VX504X08X-BU-00000-r43p0-01eac0 - Valhall Android DDK VX504X08X-BU-60000-r43p0-01eac0 - Valhall Android Document Bundle VX504X08X-DC-11001-r43p0-01eac0 - Valhall Android DDK Software Errata VX504X08X-SW-99006-r43p0-01eac0 - Valhall Android Renderscript AOSP parts Bug 278174418 Commit-Topic: R43P0_KMD Signed-off-by: Debarshi Dutta <debarshid@google.com> Change-Id: I84fb19e7ce5f28e735d44a4993d51bd985aac80b
2023-07-11 | Mali allocations: unconditionally check for pending kill signals | Jörg Wagner
Remove differentiation between kernel thread and ioctl triggered allocations - if the owner has a kill signal pending, stop requesting pages. Bug: 265224675 Change-Id: I70acfc9f3e6dc07dc040c456f11e3ddac5d49494
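A minimal user-space sketch of the rule described here: the kill check runs on every iteration of the page-allocation loop, with no special case for whether the allocation came from an ioctl or a kernel thread. The `sim_owner` struct and its `kill_after_pages` threshold are hypothetical stand-ins for the owning process and the moment the OoM killer marks it. In the kernel, such a check would typically use `fatal_signal_pending()`, but this is an illustration, not kbase's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the process that owns the GPU allocation: it is
 * marked for killing once kill_after_pages pages have been handed out. */
struct sim_owner {
	size_t pages_allocated;
	size_t kill_after_pages;
};

/* Stand-in for a real "fatal signal pending" check on the owning task. */
static bool sim_kill_pending(const struct sim_owner *owner)
{
	return owner->pages_allocated >= owner->kill_after_pages;
}

/* Allocates up to nr_pages backing pages and returns how many were obtained.
 * The kill check runs on every iteration, regardless of whether the caller
 * is an ioctl or a kernel worker thread. */
static size_t alloc_backing_pages(struct sim_owner *owner, size_t nr_pages)
{
	size_t i;

	assert(owner != NULL);
	for (i = 0; i < nr_pages; i++) {
		if (sim_kill_pending(owner))
			break; /* stop requesting pages so teardown can proceed */
		owner->pages_allocated++; /* models one successful page allocation */
	}
	return i;
}
```

Returning a short count leaves it to the caller to unwind the partial allocation, rather than looping while the OoM killer frees memory elsewhere.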
2023-07-09 | Merge android13-gs-pixel-5.10-udc into android13-gs-pixel-5.10-udc-qpr1 | PixelBot AutoMerger
SBMerger: 526756187 Change-Id: I3cbd7d81818ce93bc2ab9d95bc2cc3dd8d2aaa61 Signed-off-by: SecurityBot <android-nexus-securitybot@system.gserviceaccount.com>
2023-07-06 | pixel_gpu_uevent: Increase uevent ratelimiting timeout to 20mins | Varad Gautam
Bug: 276704984 Change-Id: Id86861197e8f0929b3594fa28d21b8e3b6bee0f9 Signed-off-by: Varad Gautam <varadgautam@google.com>
2023-06-25 | Merge android13-gs-pixel-5.10-udc into android13-gs-pixel-5.10-udc-qpr1 | PixelBot AutoMerger
SBMerger: 526756187 Change-Id: I2aef3b329e47c52ef205c6849552ab82feab7675 Signed-off-by: SecurityBot <android-nexus-securitybot@system.gserviceaccount.com>
2023-06-23 | GPUCORE-38292 Fix Use-After-Free Race with Memory-Pool Grow | Nongji Chen
This commit fixes a race condition in kbase_mmu_page_fault_worker when a memory pool is required to grow. It closes a window in which the worker is still handling a region's growable pages during fault recovery while the application side triggers a buffer close on that same region. Change-Id: I25234396defd874ade30cf5075ed918e1142d96c Bug: 287629203 Provenance: https://code.ipdelivery.arm.com/c/GPU/mali-ddk/+/5549 (cherry picked from commit 221aa13af3d02f6b820adba0f50db7d203c41ba6)
2023-06-20 | kbase: csf: Reboot on failed GPU reset | Varad Gautam
Tags: android-u-beta-4_r0.4, android-u-beta-4_r0.3, android-u-beta-4_r0.2, android-gs-pantah-5.10-u-beta4, android-gs-lynx-5.10-u-beta4, android-gs-felix-5.10-u-beta4
If reset failed, both KMD and the hardware are in an unrecoverable state. Any future attempts to process work or reset the GPU will fail, and it may take a long time (30mins) for the device to reboot and return to normal. Collect a system ramdump and reboot the device immediately when reset fails. Bug: 276855700 Test: Simulated failed reset and checked that a ramdump was generated. Change-Id: Iba901e1654d150b834303e0caa8fba2dc468b5ac Signed-off-by: Varad Gautam <varadgautam@google.com>
2023-06-18 | Merge android13-gs-pixel-5.10-udc into android13-gs-pixel-5.10-udc-qpr1 | PixelBot AutoMerger
SBMerger: 526756187 Change-Id: Iddb56c0a11fefedd9d44d653ddf327d075e4d919 Signed-off-by: SecurityBot <android-nexus-securitybot@system.gserviceaccount.com>
2023-06-15 | Add missing hwaccess_lock around atom_flags updates. | Michael Stokes
Commit 0935897 (pa/1761483) added two additional katom flags, but updates to these new flags were not protected by hwaccess_lock, and could thus race with other updates and ultimately corrupt atom_flags. Bug: 265931966 Test: SST soak test Change-Id: I95acc5e335d8013394b11149abf5d9b793648c6f
2023-06-15 | GPUCORE-35754: Add barrier before updating GLB_DB_REQ to ring CSG DB | Suzanne Candanedo
GPUCORE-35974: Add Memory Barrier between CS_REQ/ACK and CSG_DB_REQ/ACK. The access to GLB_DB_REQ/ACK needs to be ordered with respect to CSG_REQ/ACK and CSG_DB_REQ/ACK to avoid a scenario where a CSI request overlaps with a CSG request, or two CSI requests overlap, and FW ends up missing the second request. A memory barrier is required on both the host and FW sides to guarantee the ordering. Bug: 286056062 Test: SST soak test Provenance: https://code.ipdelivery.arm.com/c/GPU/mali-ddk/+/4688 Provenance: https://code.ipdelivery.arm.com/c/GPU/mali-ddk/+/5435 Change-Id: I4de23e3f37b81749c6d668952b4f8dd21c669fea
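The ordering requirement can be modelled with C11 atomics. The host publishes the per-CSG doorbell request, issues a release fence, and only then updates the global request word. A firmware reader that observes the new GLB_DB_REQ value and pairs it with an acquire fence is then guaranteed to also see the CSG_DB_REQ update. This is a hypothetical user-space sketch. The `fw_iface` struct, the toggle-bit request convention, and the use of bit 0 of GLB_DB_REQ are assumptions for illustration. They are not the kbase register layout or its actual kernel barrier primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical model of two request words in host/FW shared memory. */
struct fw_iface {
	_Atomic uint32_t csg_db_req; /* one bit per CSG doorbell (assumed) */
	_Atomic uint32_t glb_db_req; /* bit 0: "CSG doorbells pending" (assumed) */
};

/* Host side: request a doorbell for CSG csg_nr, then notify FW globally. */
static void host_ring_csg_doorbell(struct fw_iface *iface, unsigned int csg_nr)
{
	assert(csg_nr < 32);
	/* Toggle the CSG request bit (assumed convention: REQ != ACK = pending). */
	atomic_fetch_xor_explicit(&iface->csg_db_req, UINT32_C(1) << csg_nr,
				  memory_order_relaxed);
	/* Release fence: the CSG update above is ordered before the GLB update
	 * below for any reader that uses a matching acquire fence. */
	atomic_thread_fence(memory_order_release);
	atomic_fetch_xor_explicit(&iface->glb_db_req, UINT32_C(1),
				  memory_order_relaxed);
}

/* FW side (modelled): read GLB first, then CSG, with an acquire fence between. */
static uint32_t fw_read_csg_requests(struct fw_iface *iface, uint32_t *glb_out)
{
	*glb_out = atomic_load_explicit(&iface->glb_db_req, memory_order_relaxed);
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&iface->csg_db_req, memory_order_relaxed);
}
```

Both fences are needed. This mirrors the commit's point that the barrier is required on the host side and the FW side.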
2023-06-11 | Merge android13-gs-pixel-5.10-udc into android13-gs-pixel-5.10-udc-qpr1 | PixelBot AutoMerger
SBMerger: 526756187 Change-Id: I78a4e882d943b157a365612055ea922088ca2bff Signed-off-by: SecurityBot <android-nexus-securitybot@system.gserviceaccount.com>
2023-06-11 | Merge android13-gs-pixel-5.10-tm-qpr3 into android13-gs-pixel-5.10-udc | PixelBot AutoMerger
SBMerger: 526756187 Change-Id: Ibe152c3a5f6bde3b32b1349e33175811bc895c38 Signed-off-by: SecurityBot <android-nexus-securitybot@system.gserviceaccount.com>
2023-06-08 | mali_kbase: Enable kutf modules | Yunju Lee
Rename kutf to mali_kutf. Enable mali_kutf and mali_kutf_clk_rate_trace_test_portal. Bug: 267758398 Test: insmod Change-Id: I36fecd89bce4f87d31d452f5a913c95c22513c53 Signed-off-by: Yunju Lee <yunjulee@google.com>
2023-06-06 | GPUCORE-36682 Lock MMU while disabling AS to prevent use after free | Suzanne Candanedo
Tags: android-13.0.0_r0.117, android-13.0.0_r0.116, android-13.0.0_r0.115, android-13.0.0_r0.114, android-13.0.0_r0.113, android-13.0.0_r0.112, android-gs-felix-5.10-android13-qpr3
During an invalid GPU page fault, kbase will try to flush the GPU cache and disable the faulting address space (AS). There is a small window between flushing of the GPU L2 cache (MMU resumes) and when the AS is disabled where existing jobs on the GPU may access memory for that AS, dirtying the GPU cache. This is a problem as the kctx->as_nr is marked as KBASEP_AS_NR_INVALID and thus no cache maintenance will be performed on the AS of the faulty context when cleaning up the csg_slot and releasing the context. This patch addresses that issue by: 1. locking the AS via a GPU command 2. flushing the cache 3. disabling the AS 4. unlocking the AS This ensures that any jobs remaining on the GPU will not be able to access the memory due to the locked AS. Once the AS is unlocked, any memory access will fail as the AS is now disabled. The issue only happens on CSF GPUs. To avoid any issues, the code path for non-CSF GPUs is left undisturbed. (cherry picked from commit 566789dffda3dfec00ecf00f9819e7a515fb2c61) Provenance: https://code.ipdelivery.arm.com/c/GPU/mali-ddk/+/5071 Bug: 274014055 Change-Id: I2028182878b4f88505cc135a5f53ae4c7e734650
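The following toy, single-threaded model shows why the order of the steps matters. `struct addr_space` and `job_touches_memory()` are hypothetical. The latter stands in for a job still running on the GPU and is called between every step to simulate it racing with fault handling. With the old order (flush, then disable), the job can dirty the cache after the flush. With lock, flush, disable, unlock, every intermediate access is rejected.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one GPU address space and its cache state. */
struct addr_space {
	bool locked;      /* AS locked via a GPU command: accesses cannot proceed */
	bool enabled;     /* AS enabled: translations are valid */
	bool cache_dirty; /* GPU cache holds dirty lines for this AS */
};

/* A job still on the GPU tries to write through the AS. */
static void job_touches_memory(struct addr_space *as)
{
	if (!as->locked && as->enabled)
		as->cache_dirty = true;
}

/* Old sequence: flush, then disable; a job can sneak in between. */
static void fault_handler_old(struct addr_space *as)
{
	as->cache_dirty = false; /* flush GPU cache */
	job_touches_memory(as);  /* window: AS still enabled and unlocked */
	as->enabled = false;     /* disable AS */
	job_touches_memory(as);
}

/* Sequence from the commit: lock, flush, disable, unlock. */
static void fault_handler_fixed(struct addr_space *as)
{
	as->locked = true;       /* 1. lock the AS */
	job_touches_memory(as);
	as->cache_dirty = false; /* 2. flush the cache */
	job_touches_memory(as);
	as->enabled = false;     /* 3. disable the AS */
	job_touches_memory(as);
	as->locked = false;      /* 4. unlock; later accesses fail (AS disabled) */
	job_touches_memory(as);
}
```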
2023-05-30 | GPUCORE-37961 Deadlock issue due to lock ordering issue | kirdev01
This patch addresses the deadlock caused by a circular locking dependency between hwaccess_lock and clk_rtm->lock. hwaccess_lock must be taken before clk_rtm->lock to avoid the dependency. Change-Id: I1064dbbac7800282bf3a1ac167c9c476177aefd8 (cherry picked from commit e0dfe9669c3456ada4b860f6ba9859c59ffec9a7) Bug: 274687461 Provenance: https://code.ipdelivery.arm.com/c/GPU/mali-ddk/+/5258
2023-05-24 | Make sure jobs are flushed before kbasep_platform_context_term | Mattias Simonsson
If kbase_release is called while jobs are in progress, the driver will start by calling kbasep_platform_context_term before waiting for jobs to finish in kbase_context_flush_jobs. When the jobs do finish, the driver will call kbasep_platform_event_work_end, which leads to issues since the platform callback has already cleaned up resources for the kbase_context. Make sure kbase_context_flush_jobs is called before kbasep_platform_context_term. Test: start/stop processes over and over Bug: 278366794 Change-Id: Iee0297f4b64a3f6b59a5df0c26e46d446257a652
2023-05-23 | [Official] MIDCET-4546, GPUCORE-37946: Synchronize GPU cache flush cmds with silent reset on GPU power up | Suzanne Candanedo
Commands for GPU cache maintenance and TLB invalidation were sent after acquiring 'hwaccess_lock' and checking whether the 'gpu_powered' flag is set. The combination of the lock and the flag ensured that GPU registers remained accessible while the commands were in progress. If the flag was not set, the GPU had not been powered up and the commands were rightly skipped. The 'gpu_powered' flag is set immediately after the top-level power-up of the GPU is done by the platform-specific power_on_callback(), so the registers can be safely accessed. If the callback returns 1, a silent soft reset of the GPU is performed after setting the flag. This led to a race between the cache maintenance commands and the soft reset of the GPU, due to which the commands did not complete or got lost, and there was a timeout. This commit replaces the 'gpu_powered' flag with the 'gpu_ready' flag, as the latter is set after the soft reset is done and all the in-use GPU address spaces have been enabled. It is okay to skip the commands when the flag is false, as the L2 cache would be in the powered-down state. The page migrate function is also updated to use the 'gpu_ready' flag, as it was also affected by a similar race with silent reset in GPUCORE-35861 and 'kbdev->pm.lock' had to be used. Change-Id: I4cefe3add2863d7b29f111d437061031b66e7080 (cherry picked from commit e31494f5b7b9e9101aab4bd75fa4dc7d7f47b66a) Provenance: https://code.ipdelivery.arm.com/c/GPU/mali-ddk/+/5284 Bug: 281540759
2023-05-18 | Merge android13-gs-pixel-5.10-tm-qpr3 into android13-gs-pixel-5.10-udc | PixelBot AutoMerger
Bug: 281607159 SBMerger: 526756187 Change-Id: I15bc929d24d73e636f12cf880126e8192ba7d9cb Signed-off-by: SecurityBot <android-nexus-securitybot@system.gserviceaccount.com>
2023-05-11 | mali_kbase: hold GPU utilization for premature update. | Wei Wang
Tags: android-u-beta-3_r0.3, android-u-beta-3_r0.2, android-u-beta-2.1_r0.4, android-u-beta-2.1_r0.3, android-u-beta-2.1_r0.2, android-gs-raviole-5.10-u-beta3, android-gs-raviole-5.10-u-beta2, android-gs-pantah-5.10-u-beta2, android-gs-bluejay-5.10-u-beta3, android-gs-bluejay-5.10-u-beta2
The GPU internal counter is updated every IPA_CONTROL_TIMER_DEFAULT_VALUE_MS milliseconds. If a utilization update occurs prematurely, before the counter has been updated, the same counter value will be read, resulting in a difference of zero. To handle this scenario, this change skips the utilization update if the counter difference is zero and the update occurred less than 1.5 times the internal update period (IPA_CONTROL_TIMER_DEFAULT_VALUE_MS) after the previous one. Bug: 277649158 Test: boot and trace Change-Id: I6e3063355f560a2872297fd32d66be8a468cdf79 Signed-off-by: Wei Wang <wvw@google.com>
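The skip condition reduces to a small predicate. In this sketch, the counter period is a parameter that stands in for IPA_CONTROL_TIMER_DEFAULT_VALUE_MS, whose actual value is not given here. The function name is hypothetical. The 1.5× threshold is written as `2 * elapsed < 3 * period` to stay in integer arithmetic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if a utilization update should be skipped because the GPU
 * counter has probably not been refreshed yet: the counter delta is zero and
 * less than 1.5 counter periods have elapsed since the previous update.
 * period_ms stands in for IPA_CONTROL_TIMER_DEFAULT_VALUE_MS. */
static bool skip_premature_update(uint64_t prev_count, uint64_t cur_count,
				  uint64_t elapsed_ms, uint64_t period_ms)
{
	assert(period_ms > 0);
	/* elapsed < 1.5 * period, expressed without floating point */
	return cur_count == prev_count && 2 * elapsed_ms < 3 * period_ms;
}
```

Outside that window, a zero delta is passed through as a genuine sample rather than being held.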
2023-05-09 | mali_kbase: Remove incorrect WARN() | Debarshi Dutta
The WARN() call at the beginning of the function schedule_on_tick() is incorrect, as kbase_gpu_interrupt() might enqueue another tick_work(2) into the scheduler before the already in-flight worker tick_work(1) sets the tick_timer_active variable to true. This could result in a condition where the hrtimer has not yet expired when tick_work(1) starts executing, causing the WARN_ON() to fire. The timer works asynchronously with tick_work(), so this warning can be removed. Bug 207824944 Change-Id: I873624c76b0de102bbcdd451a8402cb1c096edda
2023-05-09 | MIDCET-4324/GPUCORE-35611 Unmapping of aliased sink-page memory | Suzanne Candanedo
Aliased regions containing the BASE_MEM_WRITE_ALLOC_PAGES_HANDLE MMU sink-page were previously not being unmapped correctly, in particular their PGD entries. This change addresses that issue. Further, care is taken to ensure the flush_pa_range path operates correctly on applicable GPUs. Various WARN_ONs in the MMU layer were also changed to WARN_ONCEs in places where they could fire rapidly in large numbers, reducing the risk of the kind of system stress this issue could have caused. GPUCORE-36048 Remove SAME_VA flag from regular allocation. This patchset removes the SAME_VA flag from the regular allocation done in the defect test for GPUCORE-35611. The test was failing on 32-bit systems because there was no way to enforce that the aliased memory and the regular allocation would fall into the same region, and thus a later assumption in the test would not hold. Change-Id: Ie665fb9330a7338b7e148d1c1db13fe3cc98ee5c (cherry picked from commit 823c7b2de1933ca42cf179862d033d79d1289073) Provenance: https://code.ipdelivery.arm.com/c/GPU/mali-ddk/+/4800 Bug: 260122837
2023-05-09 | [Official] MIDCET-4458, GPUCORE-36402: Check for process exit before page alloc from kthread | Suzanne Candanedo
The backing pages for native GPU allocations aren't always allocated in the ioctl context. A JIT_ALLOC softjob or KCPU command can get processed in the kernel worker thread, and GPU page fault handling is done in a kernel thread in any case. Userspace can make Kbase allocate a large number of backing pages from the kernel thread to cause an out-of-memory situation, which would eventually lead to a kernel panic as the OoM killer would run out of suitable processes to kill. Though Kbase will account for the backing pages and the OoM killer will try to kill the culprit process, the memory already allocated by the process won't get freed, because context termination remains blocked (or doesn't kick in) for as long as the kernel thread keeps trying to allocate the backing pages. For an allocation done from the context of a kernel thread, the OoM killer won't consider the kernel thread for killing, and the kernel would keep retrying to allocate a physical page as long as the OoM killer is able to kill processes. For a memory allocation done from the ioctl context, the kernel would eventually stop retrying when it sees that the process has been marked for killing by the OoM killer. This commit adds a check for process exit in the page allocation loop. The check allows the kernel thread to swiftly exit the page allocation loop once the OoM killer has initiated the killing of the culprit process (for which the kernel thread is trying to allocate pages), thereby unblocking context termination and the freeing of GPU memory already allocated by the process. This helps prevent the kernel panic and also limits the number of innocent processes that get killed. The use of the __GFP_RETRY_MAYFAIL flag didn't help in all scenarios. The flag ensures that the OoM killer is not invoked directly and the kernel doesn't keep retrying to allocate the page.
But when the system is running low on memory, other threads can invoke the OoM killer, and the page allocation request from the kthread could continue to be satisfied due to the killing of other processes, so the kthread may not always exit the page allocation loop in a timely manner. (cherry picked from commit 3c5c9328a7fc552e61972c1bbff4b56696682d30) GPUCORE-36402: Fix potential memleak and NULL ptr deref issue in Kbase. The commit 3c5c9328a7fc552e61972c1bbff4b56696682d30 updated Kbase to check for the process exit in every iteration of the page allocation loop when the allocation is done from the context of the kernel worker thread. The commit introduced a potential memleak and NULL pointer dereference issue (which was reported by Coverity). This commit adds the required fix for the 2 issues and also sets the task pointer only for userspace-created contexts and not for the contexts created by Kbase, i.e. the privileged context created for HW counter dumping and for the WA of HW issue TRYM-3485. Bug: 275614526 Change-Id: I8107edce09a2cb52d8586fc9f7990a25166f590e Signed-off-by: Guus Sliepen <gsliepen@google.com> Provenance: https://code.ipdelivery.arm.com/c/GPU/mali-ddk/+/5169 (cherry picked from commit 8294169160ebb0d11d7d22b11311ddf887fb0b63)
2023-05-05 | Revert "Revert "GPUCORE-36748 Fix kbase_gpu_mmap() error handling"" | Guus Sliepen
Tags: android-13.0.0_r0.107, android-13.0.0_r0.106, android-13.0.0_r0.105, android-13.0.0_r0.104, android-13.0.0_r0.103, android-13.0.0_r0.100
This reverts commit 75b4a4ab15df252b112439300203dbc9b6d46922. Bug: 274002431 Change-Id: I7055294a6615e8ff282b47f822d67ecb709307a3
2023-05-01 | Mali Valhall Android DDK r43p0-01eac0 KMD | Toby Sunrise
Provenance: 48a9c7e25986318c8475bc245de51e7bec2606e8 (ipdelivery/EAC/v_r43p0) VX504X08X-BU-00000-r43p0-01eac0 - Valhall Android DDK VX504X08X-BU-60000-r43p0-01eac0 - Valhall Android Document Bundle VX504X08X-DC-11001-r43p0-01eac0 - Valhall Android DDK Software Errata VX504X08X-SW-99006-r43p0-01eac0 - Valhall Android Renderscript AOSP parts Change-Id: I5df1914eba386e0bf507d4951240e1744f666a29
2023-05-01 | Mali Valhall Android DDK r42p0-01eac0 KMD | Toby Sunrise
Provenance: 300534375857cb2963042df7b788b1ab5616c500 (ipdelivery/EAC/v_r42p0) VX504X08X-BU-00000-r42p0-01eac0 - Valhall Android DDK VX504X08X-BU-60000-r42p0-01eac0 - Valhall Android Document Bundle VX504X08X-DC-11001-r42p0-01eac0 - Valhall Android DDK Software Errata VX504X08X-SW-99006-r42p0-01eac0 - Valhall Android Renderscript AOSP parts Change-Id: I3b15e01574f03706574a8edaf50dae4ba16e30c0
2023-04-27 | mali_kbase: [SLC-VK] Add CCTX memory class for explicit SLC allocations. | Aleks Todorov
Bug: 265007605 Test: build_slider.sh UMD: http://ag/22336262 Change-Id: Ifc22c6b961860ad7955e974d21c2b7960fa55647
2023-04-21 | Revert "GPUCORE-36748 Fix kbase_gpu_mmap() error handling" | Kevin DuBois
Tags: android-t-qpr3-beta-3.1_r0.5, android-t-qpr3-beta-3.1_r0.4, android-t-qpr3-beta-3.1_r0.3, android-13.0.0_r0.92, android-13.0.0_r0.85, android-13.0.0_r0.84, android-13.0.0_r0.83, android-13.0.0_r0.82, android-gs-raviole-5.10-t-qpr3-beta-3, android-gs-pantah-5.10-t-qpr3-beta-3, android-gs-bluejay-5.10-t-qpr3-beta-3
This reverts commit 04bf4049652e9aa3e952bdc30c560054e1c0f060. Bug: 274827412 Reason for revert: stability Change-Id: I923387539eabbf72f51376decf95526f13339656
2023-04-21 | Revert "GPUCORE-36682 Lock MMU while disabling AS to prevent use after free" | Kevin DuBois
Tags: android-u-beta-2_r0.4, android-u-beta-2_r0.3, android-u-beta-2_r0.2
This reverts commit d4a9cc691fdde6aae0f5d40ad3d949ab76518e42. Bug: 274827412 Reason for revert: stability Change-Id: Id952d2656a642b0f363d579a51843a03e7750c2c
2023-04-21 | Revert "GPUCORE-36748 Fix kbase_gpu_mmap() error handling" | Kevin DuBois
This reverts commit 04bf4049652e9aa3e952bdc30c560054e1c0f060. Bug: 274827412 Reason for revert: stability Change-Id: I530dc9425d9cb52ab88e8211c789def29b7607ac
2023-04-21 | Revert "GPUCORE-36682 Lock MMU while disabling AS to prevent use after free" | Kevin DuBois
This reverts commit d4a9cc691fdde6aae0f5d40ad3d949ab76518e42. Bug: 274827412 Reason for revert: stability Change-Id: I929c4e7b11bd5b62a0c14a5b960b32127b26233a
2023-04-21 | [Official] MIDCET-4458, GPUCORE-36429: Prevent JIT allocations following unmap of tracking page | Suzanne Candanedo
This commit introduces new checks to ensure that, like allocations of native memory, JIT memory allocations are blocked after the unmap of the tracking page. Bug: 275615867 Provenance: https://code.ipdelivery.arm.com/c/GPU/mali-ddk/+/5168/ Change-Id: I32460df4e8898784e75084193e038a912f67b33e (cherry picked from commit 240d4e9206528a43340c22aa69b124436f9a4e01)
2023-04-21 | [Official] MIDCET-4458, GPUCORE-36635 Fix memory leak via GROUP_SUSPEND | Suzanne Candanedo
Userspace can cause a memory leak for physical pages of SAME_VA allocations through the GROUP_SUSPEND KCPU command. This commit fixes the memleak issue. Bug: 275620394 Provenance: https://code.ipdelivery.arm.com/c/GPU/mali-ddk/+/5167 Change-Id: Iec155e23ea135cf1ea7592f38934dc617cc6b10e (cherry picked from commit 1f565b867e7bff3b3307db0960fabf028f95d981)
2023-04-17 | Flush mmu updates regardless of coherency mode | Varad Gautam
kbase avoids flushing MMU updates on coherent systems, as these systems are expected to snoop CPU caches instead. This presents a problem on GS101/GS201 devices, where GPU->CPU cache snoop requests do not work as intended when the GPU is in protected mode (b/192236116) and the GPU ends up seeing stale memory / runs into page faults. As a software workaround, always flush MMU updates regardless of coherency mode, so that the GPU page tables are accurate. Note: This was initially added in I5473345d and reverted in I2a41a2044. Bug: 200555454 Change-Id: I51187cd7c042bde42c4fcdf976a9f7f8828155e1 Signed-off-by: Varad Gautam <varadgautam@google.com>
2023-04-17 | kbase: Add a debugfs file to test GPU uevents | Varad Gautam
This adds a trigger_uevent debugfs node that takes the uevent type and info as write parameter and fires the corresponding uevent. Bug: 275367216 Bug: 275367223 Test: Combined with userspace patches: b/276704984#comment2 Change-Id: Ic1e069259e5d068a4677c8d1472d74485b8a904c Signed-off-by: Varad Gautam <varadgautam@google.com>
2023-04-17 | kbase: Add new GPU uevents to kbase | Varad Gautam
Add the following types of GPU uevents: 1. KMD_ERROR: Reports incidents where kbase runs into an error (includes FW errors). 2. GPU_RESET: Reports failed or successful GPU reset incidents. Bug: 275367216 Bug: 275367223 Test: Combined with userspace patches: b/276704984#comment2 Change-Id: Ie0d18f96c590cba561e8425eba210136bfef039d Signed-off-by: Varad Gautam <varadgautam@google.com>
2023-04-17 | pixel: Introduce GPU uevents to notify userspace of GPU failures | Varad Gautam
Add an interface to emit uevents with env GPU_UEVENT_TYPE and GPU_UEVENT_INFO from kbase. This will be used to report common GPU failure conditions. To avoid flooding the userspace with uevents, these are ratelimited to one uevent per GPU_UEVENT_TYPE per GPU_UEVENT_TIMEOUT_MS. Bug: 275367216 Bug: 275367223 Test: Combined with userspace patches: b/276704984#comment2 Change-Id: I557df22c87f435aca4d05e0038609e1c9f82de54 Doc: go/pixel-gpu-instability-monitoring Signed-off-by: Varad Gautam <varadgautam@google.com>
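The ratelimit amounts to remembering, per event type, when the last uevent was emitted. The sketch below is minimal. All `demo_` names are hypothetical, and the two types mirror KMD_ERROR and GPU_RESET from the "Add new GPU uevents to kbase" commit. The timeout is a field, so the sketch can model either the original GPU_UEVENT_TIMEOUT_MS or the later 20-minute value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical uevent types mirroring KMD_ERROR and GPU_RESET. */
enum demo_uevent_type {
	DEMO_UEVENT_KMD_ERROR,
	DEMO_UEVENT_GPU_RESET,
	DEMO_UEVENT_TYPE_COUNT,
};

struct demo_uevent_ratelimit {
	uint64_t timeout_ms;                      /* window per event type */
	bool sent[DEMO_UEVENT_TYPE_COUNT];        /* has this type fired yet? */
	uint64_t last_ms[DEMO_UEVENT_TYPE_COUNT]; /* time of last emitted event */
};

/* Returns true (and records the time) if a uevent of this type may be sent
 * now; returns false if one was already sent within timeout_ms. Each type is
 * limited independently of the others. */
static bool demo_uevent_allowed(struct demo_uevent_ratelimit *rl,
				enum demo_uevent_type type, uint64_t now_ms)
{
	assert(type < DEMO_UEVENT_TYPE_COUNT);
	if (rl->sent[type] && now_ms - rl->last_ms[type] < rl->timeout_ms)
		return false;
	rl->sent[type] = true;
	rl->last_ms[type] = now_ms;
	return true;
}
```

The separate `sent` flag ensures that the very first event of each type is never suppressed, even at time zero.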
2023-04-17 | [Official] MIDCET-4458, GPUCORE-36654 Use %pK on GPU bus fault | Suzanne Candanedo
The physical address of a GPU bus fault is useful for debugging purposes. However, the physical address (emitted via 0x%016llX) is also sensitive information, so it should be exposed to user space cautiously. The Linux kernel provides control over physical address exposure via 'kptr_restrict'. For this control to work, '%pK' must be used. Bug: 275623256 Provenance: https://code.ipdelivery.arm.com/c/GPU/mali-ddk/+/5166 Change-Id: I23171bafc47e96045e42dad533ed28fc8bbcef6b (cherry picked from commit d35be16de81d9bc55dc0a586d661391e1989d6c0)
2023-04-14 | Merge "Merge android13-gs-pixel-5.10-tm-qpr3 into android13-gs-pixel-5.10-udc" into android13-gs-pixel-5.10-udc | Pindar Yang
2023-04-13 | mali_kbase: platform: Init GPU SLC context | Jack Diver
Init call for SLC portion of the GPU context was missing. Bug: 276392249 Change-Id: I7f9b8a89a463f66845f5da91adca63d30f138c83 Signed-off-by: Jack Diver <diverj@google.com>
2023-04-13 | Add partial term support to pixel gpu init | Ankit Goyal
Bug: 254279889 Test: Boots to home Signed-off-by: Jack Diver <diverj@google.com> (cherry picked from https://partner-android-review.googlesource.com/q/commit:08be62386b8b087e1979c0396a23847246ca36bb) Change-Id: I1427019107b67139381390a5a73bf518b99927c8
2023-04-12 | mali_kbase: Add missing wake_up(poweroff_wait) when cancelling poweroff. | Michael Stokes
kbase_pm_update_active() may cancel an ongoing poweroff before the poweroff has completed and enqueued a gpu_poweroff_wait_work item, which when executed would have unblocked any waiters in kbase_pm_wait_for_poweroff_work_complete(). kbase_pm_update_active() must therefore also call wake_up(poweroff_wait) after resetting poweroff_wait_in_progress to false, to prevent kbase_pm_wait_for_poweroff_work_complete() from waiting indefinitely. This change also modifies the diagnostic patch in kbase_pm_wait_for_poweroff_work_complete() to avoid triggering a subsystem coredump if a gpu_poweroff_wait_work item is actually pending. Bug: 274137481 Test: Stability soak testing Change-Id: I9009a6eed7aa305ae04179263e308ba4259afc6a
2023-04-09 | Merge android13-gs-pixel-5.10-tm-qpr3 into android13-gs-pixel-5.10-udc | PixelBot AutoMerger
SBMerger: 516612970 Change-Id: I2e07185ea841a4f0de9998a41ddfbef7d9e6aa8e Signed-off-by: SecurityBot <android-nexus-securitybot@system.gserviceaccount.com>
2023-04-06 | mali_kbase: platform: mgm: Get accurate SLC partition size | Jack Diver
Use mgm_resize_callback to update memory group size. Add entry point allowing memory group size to be queried. Bug: 264990406 Test: Boot to home Test: gfx-bench mh3.1 Change-Id: I80f595724c7418b97e07679719d2b76e4ee7b96f Signed-off-by: Jack Diver <diverj@google.com>
2023-04-05 | mali_kbase: Remove redundant if check to unblock suspend | Jack Diver
Completed atoms are expected to always have a flag indicating they were submitted. A warning is present to assert this fact. Currently, if the flag is not present it will block GPU suspend. Remove the if to unblock suspend and prevent a kernel lockup. Bug: 233522199 Change-Id: I541ac835ec36562f7724b35e171d71537e763ed9 Signed-off-by: Jack Diver <diverj@google.com>
2023-04-04 | mali_kbase: reset: Flush SSCD worker before resetting the GPU | Varad Gautam
coredump_work isn't guaranteed to run before reset, which means the resulting SSCD can contain either pre-reset or post-reset state. Post-reset state isn't helpful in debugging a GPU hang. Ensure that we always collect the pre-reset state by flushing the coredump worker before resetting the GPU. Bug: 264595878 Test: Raced reset debugfs write with trigger_core_dump sysfs write to check that the device is stable and coredump happens before reset. Change-Id: I7a553f8dd156d5dbee2d8008a70545641ed8dbe9 Signed-off-by: Varad Gautam <varadgautam@google.com>
2023-03-31 | pixel_gpu_sscd: Prevent dumping multiple SSCDs when the GPU hangs | Varad Gautam
Add a heuristic to ratelimit SSCD generation for "GPU hang"-type coredumps. Typically when the GPU hangs, this codepath is hit multiple times, leading to unnecessary SSCD generation per hang (sometimes > 200 coredumps for a single incident). The heuristic skips SSCD generation depending on: 1. whether there was a "GPU hang" coredump recently within the GPU_HANG_SSCD_TIMEOUT_MS time window. 2. whether there was an unsuccessful GPU reset, which implies the system will end up rebooting soon. Change-Id: I761057aee9c4ff9f32d658c49b99eb162486033b Bug: 264595878 Signed-off-by: Varad Gautam <varadgautam@google.com> Test: b/264595878#comment7
2023-03-31 | mali_kbase: reset: Add a helper to check GPU reset failure | Varad Gautam
Kbase upstreaming: Pending Change-Id: I867d64897785348d499ad4d9a4f4c95f95e8df85 Signed-off-by: Varad Gautam <varadgautam@google.com> Bug: 264595878
2023-03-29 | Revert "mali_kbase: mem: Prevent vma splits" | Debarshi Dutta
In the original bug, protected memory imports via Base were ignoring the actual size of the import returned by the kernel memory import routines. This caused errors: when these imports were freed, the incorrect size was passed, so only sub-regions of the original mapped range were unmapped, leaving the GPU and CPU VAs inconsistent. A WAR was added to prevent VMA splits temporarily until a fix was provided for the protected memory size mismatch. With that fix in place, this WAR is no longer necessary. The WAR is now causing failures when an application calls mprotect(restrictive) on memory already allocated and mmapped by Vulkan API calls. Vulkan alloc() invokes the cmem_heap_alloc() function, which in the general case allocates some extra memory to fulfil worst-case alignment requirements. As a result, invoking mprotect on the partial user-provided range always results in VMA splits. For further reference, see this article: https://lwn.net/Articles/182847/ Bug 269535398 This reverts commit 6d1d889156e68493842f5bb18fc9aed74cc57454. Change-Id: Ic5749fab2613d6495fd3669356697ff40bfafcb7
2023-03-28 | GPUCORE-36682 Lock MMU while disabling AS to prevent use after free | Suzanne Candanedo
Tags: android-t-qpr3-beta-3_r0.5, android-t-qpr3-beta-3_r0.4, android-t-qpr3-beta-3_r0.3
*Affects CSF GPUs only, but changes to common code.* During an invalid GPU page fault, kbase will try to flush the GPU cache and disable the faulting address space (AS). There is a small window between flushing of the GPU L2 cache (MMU resumes) and when the AS is disabled where existing jobs on the GPU may access memory for that AS, dirtying the GPU cache. This is a problem as the kctx->as_nr is marked as KBASEP_AS_NR_INVALID and thus no cache maintenance will be performed on the AS of the faulty context when cleaning up the csg_slot and releasing the context. This patch addresses that issue by: 1. locking the AS via a GPU command 2. flushing the cache 3. disabling the AS 4. unlocking the AS This ensures that any jobs remaining on the GPU will not be able to access the memory due to the locked AS. Once the AS is unlocked, any memory access will fail as the AS is now disabled. Change-Id: I5e02face6ca0fa4526576dd70d0261ea3ee69506 (cherry picked from commit 566789dffda3dfec00ecf00f9819e7a515fb2c61) Provenance: https://code.ipdelivery.arm.com/c/GPU/mali-ddk/+/5071 Bug: 274014055