Age | Commit message (Collapse) | Author |
|
SBMerger: 526756187
Change-Id: Iddb56c0a11fefedd9d44d653ddf327d075e4d919
Signed-off-by: SecurityBot <android-nexus-securitybot@system.gserviceaccount.com>
|
|
Commit 0935897 (pa/1761483) added two additional katom flags, but
updates to these new flags were not protected by hwaccess_lock, and
could thus race with other updates and ultimately corrupt atom_flags.
Bug: 265931966
Test: SST soak test
Change-Id: I95acc5e335d8013394b11149abf5d9b793648c6f
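The fix pattern described above, doing every read-modify-write of a shared flags word under one lock, can be sketched as a small user-space model. This is only an illustration, not the driver code: `Atom`, `set_flag`, `clear_flag`, `hammer` and the two flag names are hypothetical, and a `threading.Lock` stands in for hwaccess_lock.

```python
import threading

# Hypothetical flag bits; the real katom flag names are not given in the commit message.
FLAG_A = 1 << 0
FLAG_B = 1 << 1


class Atom:
    """Toy stand-in for a katom whose atom_flags word is shared between threads."""

    def __init__(self):
        self.atom_flags = 0
        self.hwaccess_lock = threading.Lock()  # stands in for hwaccess_lock

    def set_flag(self, flag):
        # The read-modify-write happens entirely under the lock, so a
        # concurrent update cannot be lost between the read and the write.
        with self.hwaccess_lock:
            self.atom_flags |= flag

    def clear_flag(self, flag):
        with self.hwaccess_lock:
            self.atom_flags &= ~flag


def hammer(atom, flag, rounds):
    """Repeatedly toggle one flag, leaving it set at the end."""
    for _ in range(rounds):
        atom.set_flag(flag)
        atom.clear_flag(flag)
    atom.set_flag(flag)
```

Running `hammer` from two threads on different flags should leave both bits set, because neither thread can overwrite the other's update.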
|
|
GPUCORE-35974: Add Memory Barrier between CS_REQ/ACK and CSG_DB_REQ/ACK
The access to GLB_DB_REQ/ACK needs to be ordered with respect to
CSG_REQ/ACK and CSG_DB_REQ/ACK to avoid a scenario where a CSI
request overlaps with a CSG request or 2 CSI requests overlap and
FW ends up missing the 2nd request. A memory barrier is required,
both on Host and FW side, to guarantee the ordering.
Bug: 286056062
Test: SST soak test
Provenance: https://code.ipdelivery.arm.com/c/GPU/mali-ddk/+/4688
Provenance: https://code.ipdelivery.arm.com/c/GPU/mali-ddk/+/5435
Change-Id: I4de23e3f37b81749c6d668952b4f8dd21c669fea
|
|
SBMerger: 526756187
Change-Id: I78a4e882d943b157a365612055ea922088ca2bff
Signed-off-by: SecurityBot <android-nexus-securitybot@system.gserviceaccount.com>
|
|
SBMerger: 526756187
Change-Id: Ibe152c3a5f6bde3b32b1349e33175811bc895c38
Signed-off-by: SecurityBot <android-nexus-securitybot@system.gserviceaccount.com>
|
|
Rename kutf to mali_kutf. Enable mali_kutf and
mali_kutf_clk_rate_trace_test_portal.
Bug: 267758398
Test: insmod
Change-Id: I36fecd89bce4f87d31d452f5a913c95c22513c53
Signed-off-by: Yunju Lee <yunjulee@google.com>
|
|
android13-gs-pixel-5.10-udc-qpr1" into android13-gs-pixel-5.10-udc-qpr1
|
|
During an invalid GPU page fault, kbase will try to flush the GPU cache
and disable the faulting address space (AS). There is a small window
between flushing of the GPU L2 cache (MMU resumes) and when the AS is
disabled where existing jobs on the GPU may access memory for that AS,
dirtying the GPU cache.
This is a problem as the kctx->as_nr is marked as KBASEP_AS_NR_INVALID
and thus no cache maintenance will be performed on the AS of the faulty
context when cleaning up the csg_slot and releasing the context.
This patch addresses that issue by:
1. locking the AS via a GPU command
2. flushing the cache
3. disabling the AS
4. unlocking the AS
This ensures that any jobs remaining on the GPU will not be able to
access the memory due to the locked AS. Once the AS is unlocked, any
memory access will fail as the AS is now disabled.
The issue only happens on CSF GPUs. To avoid any issues, the code path
for non-CSF GPUs is left undisturbed.
(cherry picked from commit 566789dffda3dfec00ecf00f9819e7a515fb2c61)
Provenance: https://code.ipdelivery.arm.com/c/GPU/mali-ddk/+/5071
Bug: 274014055
Change-Id: I2028182878b4f88505cc135a5f53ae4c7e734650
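The four-step sequence above (lock, flush, disable, unlock) can be modelled in a few lines to show why the ordering closes the window. This is a toy sketch under stated assumptions: `AddressSpace`, `gpu_access` and `handle_invalid_page_fault` are hypothetical names, and the "GPU access" is just a function call made between the flush and the disable.

```python
class AddressSpace:
    """Toy model of a GPU address space (AS); not the kbase data structure."""

    def __init__(self):
        self.locked = False
        self.enabled = True
        self.cache_dirty = True
        self.log = []

    def gpu_access(self):
        """Return True if a job still on the GPU could touch memory via this AS."""
        if self.locked or not self.enabled:
            return False
        self.cache_dirty = True  # a successful access may dirty the cache again
        return True

    def lock(self):
        self.locked = True
        self.log.append("lock")

    def flush_cache(self):
        self.cache_dirty = False
        self.log.append("flush")

    def disable(self):
        self.enabled = False
        self.log.append("disable")

    def unlock(self):
        self.locked = False
        self.log.append("unlock")


def handle_invalid_page_fault(as_, racing_access=None):
    """Apply the four steps from the commit message, in order.

    racing_access, if given, is called between flush and disable to mimic
    a job that is still running on the GPU during the fault handling.
    """
    as_.lock()
    as_.flush_cache()
    if racing_access is not None:
        racing_access()
    as_.disable()
    as_.unlock()
```

In this model an access attempted between the flush and the disable is rejected because the AS is locked, and any access after the unlock is rejected because the AS is disabled, so the cache stays clean.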
|
|
Only register/unregister enough shrinkers to mitigate shrinker_rwsem
contention and potentially improve memory reclaim time.
Bug: 285850873
Test: Verify /d/mali0/ctx/$id/mem_pool_size
Change-Id: I52b7cd7c0c6965397a84efef7e9545c3698c7c2c
Signed-off-by: liangjlee <liangjlee@google.com>
|
|
pt_size_init ignores size update unless the size is invalidated,
therefore invalidate the size before resetting the partition to
size 0 when disabling it.
Before this change startup logs are as follows:
[ 2.648001] google,slc-acpm slc-acpm: ptid 3 size 0K
[ 2.648164] mali-mgm physical-memory-group-manager: pt_size_init: tried to set size to 0 and got 262144
After this change startup logs are as follows:
[ 2.625674] google,slc-acpm slc-acpm: ptid 3 size 0K
[ 2.625852] mali-mgm physical-memory-group-manager: pt_size_invalidate: set size to sentinel (18446744073709551615)
[ 2.626263] mali-mgm physical-memory-group-manager: pt_size_init: tried to set size to 0 and got 0
Bug: 284108328
Test: ./build_slider.sh
Test: ./build_cloudripper.sh
Test: boot to home
Change-Id: Iaf506c4a148c215ed94b7a5af469dc73cf482b67
Merged-In: Iaf506c4a148c215ed94b7a5af469dc73cf482b67
Signed-off-by: Aleks Todorov <aleksbgbg@google.com>
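The behaviour described above, where pt_size_init only accepts a new size once the recorded size has been invalidated, can be sketched as follows. The `Partition` class and `disable_partition` helper are hypothetical; only the function names pt_size_init and pt_size_invalidate and the sentinel value come from the log lines quoted in the message.

```python
SIZE_SENTINEL = 2**64 - 1  # 18446744073709551615, the value shown in the log


class Partition:
    """Toy model of one partition's size bookkeeping (hypothetical class)."""

    def __init__(self, size):
        self.size = size

    def pt_size_invalidate(self):
        # Mark the recorded size as invalid so the next init is accepted.
        self.size = SIZE_SENTINEL

    def pt_size_init(self, new_size):
        # Modelled behaviour: an update is ignored unless the size was invalidated.
        if self.size == SIZE_SENTINEL:
            self.size = new_size
        return self.size


def disable_partition(pt, invalidate_first):
    """Reset the partition to size 0, optionally invalidating first."""
    if invalidate_first:
        pt.pt_size_invalidate()
    return pt.pt_size_init(0)
```

Without the invalidation the model reproduces the "tried to set size to 0 and got 262144" case; with it, the size ends up at 0.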
|
|
SBMerger: 526756187
Change-Id: I8c1147988e6e9bcbaa15a491b1f05af84afe9c65
Signed-off-by: SecurityBot <android-nexus-securitybot@system.gserviceaccount.com>
|
|
This patch addresses the deadlock condition caused by a circular locking
dependency between hwaccess_lock and clk_rtm->lock. hwaccess_lock needs
to be taken before clk_rtm->lock to avoid the circular dependency.
Change-Id: I1064dbbac7800282bf3a1ac167c9c476177aefd8
(cherry picked from commit e0dfe9669c3456ada4b860f6ba9859c59ffec9a7)
Bug: 274687461
Provenance: https://code.ipdelivery.arm.com/c/GPU/mali-ddk/+/5258
|
|
Bug: 277560840
Change-Id: I20dc55f7dfd84b84fade6d1213cfabead37a4423
Signed-off-by: Wilson Sung <wilsonsung@google.com>
|
|
If kbase_release is called while jobs are in progress, the driver
will start by calling kbasep_platform_context_term before waiting
for jobs to finish in kbase_context_flush_jobs. When the jobs do
finish, the driver will call kbasep_platform_event_work_end, which
leads to issues since the platform callback has already cleaned
up resources for the kbase_context.
Make sure kbase_context_flush_jobs is called before
kbasep_platform_context_term.
Test: start/stop processes over and over
Bug: 278366794
Change-Id: Iee0297f4b64a3f6b59a5df0c26e46d446257a652
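The teardown ordering problem above can be illustrated with a toy model. Everything here is a sketch under assumptions: `Context`, `release_fixed`, `release_buggy` and the `platform_data` dictionary are hypothetical, and job completion is simulated synchronously inside `flush_jobs`.

```python
class Context:
    """Toy kbase_context: platform data must outlive in-flight jobs."""

    def __init__(self, jobs_in_flight):
        self.jobs_in_flight = jobs_in_flight
        self.platform_data = {"events": 0}  # hypothetical per-context platform state

    def platform_event_work_end(self):
        # Models the platform callback that runs when a job finishes.
        if self.platform_data is None:
            raise RuntimeError("platform data used after context_term")
        self.platform_data["events"] += 1

    def flush_jobs(self):
        # Wait for every in-flight job; each completion fires the callback.
        while self.jobs_in_flight > 0:
            self.jobs_in_flight -= 1
            self.platform_event_work_end()

    def platform_context_term(self):
        self.platform_data = None


def release_fixed(ctx):
    ctx.flush_jobs()             # jobs finish while platform data is still valid
    ctx.platform_context_term()  # only then tear the platform data down


def release_buggy(ctx):
    ctx.platform_context_term()
    ctx.flush_jobs()
```

In the buggy order the completion callback touches state that has already been torn down, which the model surfaces as a `RuntimeError`; the fixed order completes cleanly.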
|
|
silent reset on GPU power up
Commands for GPU cache maintenance and TLB invalidation were sent after
acquiring 'hwaccess_lock' and checking if the 'gpu_powered' flag is set.
The combination of lock and the flag ensured that GPU registers remained
accessible whilst the commands were in progress. If the flag was not set
then the GPU power up was not performed and the commands were rightfully
skipped.
The 'gpu_powered' flag is set immediately after the Top-level power up
of GPU is done by the platform specific power_on_callback() and so the
registers can be safely accessed. If the callback returns 1 then a
silent soft-reset of the GPU is performed after setting the flag.
This led to a race between the cache maintenance commands and the soft
reset of GPU, due to which the commands did not complete or got lost
and there was a timeout.
This commit replaces the 'gpu_powered' flag with the 'gpu_ready' flag
as the latter is set after the soft-reset is done and all the in-use
GPU address spaces have been enabled. It is okay to skip the commands
when the flag is false, as L2 cache would be in powered down state.
The page migrate function is also updated to use 'gpu_ready' flag as
that was also affected by the similar race with silent reset in
GPUCORE-35861 and 'kbdev->pm.lock' had to be used.
Change-Id: I4cefe3add2863d7b29f111d437061031b66e7080
(cherry picked from commit e31494f5b7b9e9101aab4bd75fa4dc7d7f47b66a)
Provenance: https://code.ipdelivery.arm.com/c/GPU/mali-ddk/+/5284
Bug: 281540759
|
|
Bug: 281607159
SBMerger: 526756187
Change-Id: I15bc929d24d73e636f12cf880126e8192ba7d9cb
Signed-off-by: SecurityBot <android-nexus-securitybot@system.gserviceaccount.com>
|
|
The GPU internal counter is updated every IPA_CONTROL_TIMER_DEFAULT_VALUE_MS milliseconds.
If a utilization update occurs prematurely and
the counter has not been updated, the same counter
value will be obtained, resulting in a difference
of zero.
To handle this scenario, this change will skip the
utilization update if the counter difference is zero
and the update occurred less than 1.5 times the internal
update period (IPA_CONTROL_TIMER_DEFAULT_VALUE_MS) after the previous one.
Bug: 277649158
Test: boot and trace
Change-Id: I6e3063355f560a2872297fd32d66be8a468cdf79
Signed-off-by: Wei Wang <wvw@google.com>
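The skip condition described above can be written as a single predicate. This is a minimal sketch, not the driver code: `should_skip_update` is a hypothetical helper, and the 10 ms period is an assumed value used only for illustration, since the message names the constant without stating its value.

```python
# Assumed value for illustration only; the commit message names the constant
# but does not state its value.
IPA_CONTROL_TIMER_DEFAULT_VALUE_MS = 10


def should_skip_update(counter_diff, elapsed_ms,
                       period_ms=IPA_CONTROL_TIMER_DEFAULT_VALUE_MS):
    """Return True when a zero counter difference is likely just a stale read."""
    # Compare 2 * elapsed against 3 * period to express "less than 1.5 periods"
    # without floating point.
    return counter_diff == 0 and 2 * elapsed_ms < 3 * period_ms
```

With the assumed 10 ms period, a zero difference seen 14 ms after the previous update would be skipped, while a zero difference seen at 15 ms or later would be treated as a genuine reading.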
|
|
The WARN() call at the beginning of the function schedule_on_tick()
is incorrect as kbase_gpu_interrupt() might enqueue another
tick_work(2) into the scheduler before the already inflight worker
tick_work(1) sets the tick_timer_active variable to true.
This could result in a condition where the hrtimer still hasn't expired
and tick_work(1) starts executing, resulting in the WARN_ON() being fired.
The timer works asynchronously with the tick_work() and hence this warning
can be removed from here.
Bug: 207824944
Change-Id: I873624c76b0de102bbcdd451a8402cb1c096edda
|
|
Aliased regions containing the BASE_MEM_WRITE_ALLOC_PAGES_HANDLE MMU
sink-page were not previously being unmapped correctly, in particular
the PGD entries for these pages. This change addresses
that issue. Further, care is taken to ensure the flush_pa_range
path operates correctly, for applicable GPUs.
Also updated various WARN_ONs to WARN_ONCEs in the MMU layer, in places
where these could potentially fire rapidly and in large numbers, to
reduce the chance of system stress of the kind this issue could have
caused.
GPUCORE-36048 Remove SAME_VA flag from regular allocation
This patchset removes the SAME_VA flag from the regular allocation
done in the defect test for GPUCORE-35611. The test was failing on
32-bit systems because there was no way to enforce that the aliased
memory and the regular allocation would fall into the same region,
and thus a later assumption in the test would not hold.
Change-Id: Ie665fb9330a7338b7e148d1c1db13fe3cc98ee5c
(cherry picked from commit 823c7b2de1933ca42cf179862d033d79d1289073)
Provenance: https://code.ipdelivery.arm.com/c/GPU/mali-ddk/+/4800
Bug: 260122837
|
|
alloc from kthread
The backing pages for native GPU allocations aren't always allocated in
the ioctl context. A JIT_ALLOC softjob or KCPU command can get processed
in the kernel worker thread. GPU page fault handling is done in a kernel
thread anyway.
Userspace can make Kbase allocate a large number of backing pages from the
kernel thread to cause an out-of-memory situation, which would eventually
lead to a kernel panic as the OoM killer would run out of suitable processes
to kill.
Though Kbase will account for the backing pages and OoM killer will try
to kill the culprit process, the memory already allocated by the process
won't get freed, as context termination would remain blocked or won't
kick in as long as the kernel thread keeps trying to allocate the backing pages.
For the allocation that is done from the context of kernel thread,
OoM killer won't consider the kernel thread for killing and kernel
would keep retrying to allocate physical page as long as the OoM
killer is able to kill processes.
For the memory allocation done from the ioctl context, kernel would
eventually stop retrying when it sees that process has been marked
for killing by the OoM killer.
This commit adds a check for process exit in the page allocation loop.
The check allows kernel thread to swiftly exit the page allocation loop
once OoM killer has initiated the killing of culprit process (for which
kernel thread is trying to allocate pages) thereby unblocking context
termination and freeing of GPU memory already allocated by the process.
This helps in preventing the kernel panic and also limits the number of
innocent processes that get killed.
The use of __GFP_RETRY_MAYFAIL flag didn't help in all the scenarios.
The flag ensures that OoM killer is not invoked directly and kernel
doesn't keep retrying to allocate the page. But when system is running
low on memory, other threads can invoke the OoM killer and the page
allocation request from kthread could continue to get satisfied due to
the killing of other processes and so the kthread may not always exit
the page allocation loop in time.
(cherry picked from commit 3c5c9328a7fc552e61972c1bbff4b56696682d30)
GPUCORE-36402: Fix potential memleak and NULL ptr deref issue in Kbase
The commit 3c5c9328a7fc552e61972c1bbff4b56696682d30 updated Kbase to
check for the process exit in every iteration of the page allocation
loop when the allocation is done from the context of kernel worker
thread. The commit introduced a potential memleak and NULL pointer
dereference issue (which was reported by Coverity).
This commit adds the required fix for the 2 issues and also sets the
task pointer only for the Userspace created contexts and not for the
contexts created by Kbase i.e. privileged context created for the HW
counter dumping and for the WA of HW issue TRYM-3485.
Bug: 275614526
Change-Id: I8107edce09a2cb52d8586fc9f7990a25166f590e
Signed-off-by: Guus Sliepen <gsliepen@google.com>
Provenance: https://code.ipdelivery.arm.com/c/GPU/mali-ddk/+/5169
(cherry picked from commit 8294169160ebb0d11d7d22b11311ddf887fb0b63)
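The process-exit check in the page allocation loop described above can be sketched as follows. This is a user-space illustration only: `Task`, its `exiting` attribute, `alloc_pages_from_kthread` and the `alloc_one_page` callable are all hypothetical, and the rollback stands in for freeing the pages allocated so far.

```python
class Task:
    """Toy process descriptor; 'exiting' models the task being killed by OoM."""

    def __init__(self):
        self.exiting = False


def alloc_pages_from_kthread(task, nr_pages, alloc_one_page):
    """Allocate nr_pages on behalf of task, bailing out if task is exiting.

    alloc_one_page is a hypothetical callable returning a page object.
    Returns the list of pages, or None after rolling back on early exit.
    """
    pages = []
    for _ in range(nr_pages):
        # Checked on every iteration so the worker leaves the loop promptly
        # once the owning process has been marked for killing.
        if task.exiting:
            pages.clear()  # roll back so nothing allocated so far leaks
            return None
        pages.append(alloc_one_page())
    return pages
```

The point of the per-iteration check is that the worker stops as soon as the owning process is on its way out, instead of continuing to consume pages freed by killing other processes.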
|
|
This reverts commit 75b4a4ab15df252b112439300203dbc9b6d46922.
Bug: 274002431
Change-Id: I7055294a6615e8ff282b47f822d67ecb709307a3
|
|
Provenance: 48a9c7e25986318c8475bc245de51e7bec2606e8 (ipdelivery/EAC/v_r43p0)
VX504X08X-BU-00000-r43p0-01eac0 - Valhall Android DDK
VX504X08X-BU-60000-r43p0-01eac0 - Valhall Android Document Bundle
VX504X08X-DC-11001-r43p0-01eac0 - Valhall Android DDK Software Errata
VX504X08X-SW-99006-r43p0-01eac0 - Valhall Android Renderscript AOSP parts
Change-Id: I5df1914eba386e0bf507d4951240e1744f666a29
|
|
Provenance: 300534375857cb2963042df7b788b1ab5616c500 (ipdelivery/EAC/v_r42p0)
VX504X08X-BU-00000-r42p0-01eac0 - Valhall Android DDK
VX504X08X-BU-60000-r42p0-01eac0 - Valhall Android Document Bundle
VX504X08X-DC-11001-r42p0-01eac0 - Valhall Android DDK Software Errata
VX504X08X-SW-99006-r42p0-01eac0 - Valhall Android Renderscript AOSP parts
Change-Id: I3b15e01574f03706574a8edaf50dae4ba16e30c0
|
|
allocations.
Bug: 265007605
Test: build_slider.sh
Change-Id: Ie75bb74248e5bdc98b226f9907c3831d38f5905f
|
|
Bug: 265007605
Test: build_slider.sh
UMD: http://ag/22335635
Change-Id: I032ab48a850ba3918cb056c72e719fbb978b3d77
|
|
Bug: 265007605
Test: build_slider.sh
UMD: http://ag/22336262
Change-Id: Ifc22c6b961860ad7955e974d21c2b7960fa55647
|
|
android13-gs-pixel-5.10-udc" into android13-gs-pixel-5.10-udc
|
|
pt_client_free expects a partition index rather than the allocated ptid.
Currently enabled partitions should be disabled rather than freed,
freeing the ptid of an enabled partition is a bug.
Bug: 279416508
Signed-off-by: Jack Diver <diverj@google.com>
(cherry picked from https://partner-android-review.googlesource.com/q/commit:e74cbbaef43c1445cb474c2b2fd0cbab785a5858)
Merged-In: Ib90ebc6e90a9a213d78b8983ca01b00cd81fb5b9
Change-Id: Ib90ebc6e90a9a213d78b8983ca01b00cd81fb5b9
|
|
Bug: 277936698
Test: gfx-bench shmoo
Signed-off-by: Jack Diver <diverj@google.com>
(cherry picked from https://partner-android-review.googlesource.com/q/commit:1f65451f5891d8053975405b60ac364ed96aa148)
Merged-In: Id83704cffde39a279e91eb19b1ae5a4a130992e0
Change-Id: Id83704cffde39a279e91eb19b1ae5a4a130992e0
|
|
SBMerger: 516612970
Change-Id: I93b384e082bc3acc9cf505ed2d1ac57fb0a0488b
Signed-off-by: SecurityBot <android-nexus-securitybot@system.gserviceaccount.com>
|
|
This reverts commit 04bf4049652e9aa3e952bdc30c560054e1c0f060.
Bug: 274827412
Reason for revert: stability
Change-Id: I923387539eabbf72f51376decf95526f13339656
|
|
This reverts commit d4a9cc691fdde6aae0f5d40ad3d949ab76518e42.
Bug: 274827412
Reason for revert: stability
Change-Id: Id952d2656a642b0f363d579a51843a03e7750c2c
|
|
This reverts commit 04bf4049652e9aa3e952bdc30c560054e1c0f060.
Bug: 274827412
Reason for revert: stability
Change-Id: I530dc9425d9cb52ab88e8211c789def29b7607ac
|
|
This reverts commit d4a9cc691fdde6aae0f5d40ad3d949ab76518e42.
Bug: 274827412
Reason for revert: stability
Change-Id: I929c4e7b11bd5b62a0c14a5b960b32127b26233a
|
|
unmap of tracking page
This commit introduces new checks to ensure that,
like allocations of native memory, JIT memory
allocations are blocked after the unmap of the
tracking page.
Bug: 275615867
Provenance: https://code.ipdelivery.arm.com/c/GPU/mali-ddk/+/5168/
Change-Id: I32460df4e8898784e75084193e038a912f67b33e
(cherry picked from commit 240d4e9206528a43340c22aa69b124436f9a4e01)
|
|
Userspace can cause a memory leak for physical pages of SAME_VA
allocations through the GROUP_SUSPEND KCPU command.
This commit fixes the memleak issue.
Bug: 275620394
Provenance: https://code.ipdelivery.arm.com/c/GPU/mali-ddk/+/5167
Change-Id: Iec155e23ea135cf1ea7592f38934dc617cc6b10e
(cherry picked from commit 1f565b867e7bff3b3307db0960fabf028f95d981)
|
|
kbase avoids flushing MMU updates on coherent systems, as these
systems are expected to snoop CPU caches instead.
This presents a problem on GS101/GS201 devices, where GPU->CPU cache
snoop requests do not work as intended when the GPU is in protected mode
(b/192236116) and the GPU ends up seeing stale memory / runs into page
faults.
As a software workaround, always flush MMU updates regardless of
coherency mode, so that the GPU page tables are accurate.
Note: This was initially added in I5473345d and reverted in I2a41a2044.
Bug: 200555454
Change-Id: I51187cd7c042bde42c4fcdf976a9f7f8828155e1
Signed-off-by: Varad Gautam <varadgautam@google.com>
|
|
This adds a trigger_uevent debugfs node that takes the uevent type and
info as write parameters and fires the corresponding uevent.
Bug: 275367216
Bug: 275367223
Test: Combined with userspace patches: b/276704984#comment2
Change-Id: Ic1e069259e5d068a4677c8d1472d74485b8a904c
Signed-off-by: Varad Gautam <varadgautam@google.com>
|
|
Add the following types of GPU uevents:
1. KMD_ERROR: Reports incidents where kbase runs into an error
(includes FW errors).
2. GPU_RESET: Reports failed or successful GPU reset incidents.
Bug: 275367216
Bug: 275367223
Test: Combined with userspace patches: b/276704984#comment2
Change-Id: Ie0d18f96c590cba561e8425eba210136bfef039d
Signed-off-by: Varad Gautam <varadgautam@google.com>
|
|
Add an interface to emit uevents with env GPU_UEVENT_TYPE and
GPU_UEVENT_INFO from kbase. This will be used to report common
GPU failure conditions.
To avoid flooding the userspace with uevents, these are ratelimited
to one uevent per GPU_UEVENT_TYPE per GPU_UEVENT_TIMEOUT_MS.
Bug: 275367216
Bug: 275367223
Test: Combined with userspace patches: b/276704984#comment2
Change-Id: I557df22c87f435aca4d05e0038609e1c9f82de54
Doc: go/pixel-gpu-instability-monitoring
Signed-off-by: Varad Gautam <varadgautam@google.com>
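The per-type rate limiting described above can be sketched as a small class. This is an illustrative model, not the kbase interface: `UeventLimiter` and `try_send` are hypothetical, the 1000 ms timeout is an assumed value, and time is passed in explicitly rather than read from a clock.

```python
GPU_UEVENT_TIMEOUT_MS = 1000  # assumed value; not stated in the commit message


class UeventLimiter:
    """Allow at most one uevent per type per timeout window."""

    def __init__(self, timeout_ms=GPU_UEVENT_TIMEOUT_MS):
        self.timeout_ms = timeout_ms
        self.last_sent_ms = {}  # uevent type -> time the last event was emitted

    def try_send(self, uevent_type, uevent_info, now_ms):
        """Return the env that would be emitted, or None if rate limited."""
        last = self.last_sent_ms.get(uevent_type)
        if last is not None and now_ms - last < self.timeout_ms:
            return None
        self.last_sent_ms[uevent_type] = now_ms
        return {"GPU_UEVENT_TYPE": uevent_type, "GPU_UEVENT_INFO": uevent_info}
```

Keeping one timestamp per type means a burst of one event type does not suppress a different type that fires in the same window.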
|
|
The physical address of a GPU bus fault is useful for debugging.
However, the physical address (emitted via 0x%016llX) is also sensitive
information, so it should be exposed to user space cautiously.
The Linux kernel provides control over physical address exposure
via 'kptr_restrict'. To allow this control to work, '%pK' must be used.
Bug: 275623256
Provenance: https://code.ipdelivery.arm.com/c/GPU/mali-ddk/+/5166
Change-Id: I23171bafc47e96045e42dad533ed28fc8bbcef6b
(cherry picked from commit d35be16de81d9bc55dc0a586d661391e1989d6c0)
|
|
android13-gs-pixel-5.10-udc" into android13-gs-pixel-5.10-udc
|
|
Init call for SLC portion of the GPU context was missing.
Bug: 276392249
Change-Id: I7f9b8a89a463f66845f5da91adca63d30f138c83
Signed-off-by: Jack Diver <diverj@google.com>
|
|
Bug: 254279889
Test: Boots to home
Signed-off-by: Jack Diver <diverj@google.com>
(cherry picked from https://partner-android-review.googlesource.com/q/commit:08be62386b8b087e1979c0396a23847246ca36bb)
Change-Id: I1427019107b67139381390a5a73bf518b99927c8
|
|
kbase_pm_update_active() may cancel an ongoing poweroff before the
poweroff has completed and enqueued a gpu_poweroff_wait_work item,
which when executed would have unblocked any waiters in
kbase_pm_wait_for_poweroff_work_complete().
kbase_pm_update_active() must therefore also call wake_up(poweroff_wait)
after resetting poweroff_wait_in_progress to false, to prevent
kbase_pm_wait_for_poweroff_work_complete() from waiting indefinitely.
This change also modifies the diagnostic patch in
kbase_pm_wait_for_poweroff_work_complete() to avoid triggering a
subsystem coredump if a gpu_poweroff_wait_work item is actually pending.
Bug: 274137481
Test: Stability soak testing
Change-Id: I9009a6eed7aa305ae04179263e308ba4259afc6a
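The missing wake-up described above can be modelled with a condition variable. This is a sketch under assumptions: the `PowerManager` class and its method names are hypothetical stand-ins for the kbase functions named in the message, and a `threading.Condition` stands in for the poweroff_wait wait queue.

```python
import threading


class PowerManager:
    """Toy model of the poweroff wait; names mirror the commit message."""

    def __init__(self):
        self.poweroff_wait = threading.Condition()  # stands in for the wait queue
        self.poweroff_wait_in_progress = False

    def start_poweroff(self):
        with self.poweroff_wait:
            self.poweroff_wait_in_progress = True

    def update_active_cancel_poweroff(self):
        # Cancelling means no poweroff work item will run later to wake the
        # waiters, so the canceller has to wake them itself.
        with self.poweroff_wait:
            self.poweroff_wait_in_progress = False
            self.poweroff_wait.notify_all()

    def wait_for_poweroff_work_complete(self, timeout_s):
        """Return True if the wait finished, False if it timed out."""
        with self.poweroff_wait:
            return self.poweroff_wait.wait_for(
                lambda: not self.poweroff_wait_in_progress, timeout=timeout_s)
```

In this model the waiter returns promptly because the canceller both clears the flag and notifies; without the notification it would only notice the cleared flag once its timeout expired.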
|
|
SBMerger: 516612970
Change-Id: I2e07185ea841a4f0de9998a41ddfbef7d9e6aa8e
Signed-off-by: SecurityBot <android-nexus-securitybot@system.gserviceaccount.com>
|
|
Bug: 264990406
Test: Boot to home
Test: gfx-bench shmoo
Change-Id: I502031f0f5ade7053c487f4c50981c1c05eea7d4
Signed-off-by: Jack Diver <diverj@google.com>
|
|
Bug: 264990406
Test: Boot to home
Test: gfx-bench mh3.1
Change-Id: Icce0f68a07f33ec8cd9f85ae7d0436ab58891adb
Signed-off-by: Jack Diver <diverj@google.com>
|
|
Use mgm_resize_callback to update memory group size.
Add entry point allowing memory group size to be queried.
Bug: 264990406
Test: Boot to home
Test: gfx-bench mh3.1
Change-Id: I80f595724c7418b97e07679719d2b76e4ee7b96f
Signed-off-by: Jack Diver <diverj@google.com>
|
|
Completed atoms are expected to always have a flag indicating they were
submitted.
A warning is present to assert this fact.
Currently, if the flag is not present it will block GPU suspend.
Remove the if to unblock suspend and prevent a kernel lockup.
Bug: 233522199
Change-Id: I541ac835ec36562f7724b35e171d71537e763ed9
Signed-off-by: Jack Diver <diverj@google.com>
|