author | Will McVicker <willmcvicker@google.com> | 2024-04-15 11:41:22 -0700
committer | Will McVicker <willmcvicker@google.com> | 2024-04-16 10:17:07 -0700
commit | 0aa4c41c172f1e2acdf976c655f75a7a21db9791 (patch)
tree | 878a00410737d020c7be8fa0e2ab6849e310645e
parent | de85b3c05698f1ce2829d3ff977dee90be48b2d8 (diff)
parent | cfb55729953d62d99f66b0adc59963b189e9394b (diff)
download | gpu-android14-gs-pixel-6.1.tar.gz
Merge aosp/android-gs-raviole-5.10-android14-qpr2 into aosp/android14-gs-pixel-6.1
* aosp/android-gs-raviole-5.10-android14-qpr2: (354 commits)
[Official] MIDCET-5090, GPUCORE-40350: Flushes for L2 powerdown
Fix invalid page table entries from occurring.
Fix deadlock BTW user thread and page fault worker
Fix deadlock BTW user thread and page fault worker
csf: Fix kbase_kcpu_command_queue UaF due to bad queue creation
Fix kernel build warnings
Fix kernel build warnings
Add firmware core dump error code in sscd
GPUCORE-39469 Error handling for invalid slot when parsing trace data
mali_kbase: platform: Add missing bounds check
mali_kbase: Zero-initialize the dump_bufs_meta array
mali_kbase: Fix OOB write in kbase_csf_cpu_queue_dump()
mali_kbase: Move epoll-consumed waitqueue to struct kbase_file
Integrate firmware core dump into sscd
MIDCET-4870: Fix GPU page fault issue due to reclaiming of Tiler heap chunks
mali_kbase: platform: Fix integer overflow
mali_kbase: Tracepoints for governor recommendation
mali_kbase: Add tracepoints to hint_min_freq / hint_max_freq
mali_kbase: Enable mali_kutf_clk_rate_trace_test_portal build
mali_kbase: restore CSF ftrace events
Refactor helpers for creating RT threads
Update KMD to 'mini release: update r44p1-00dev2 to r44p1-00dev3'
mali_kbase: Use kthread for protm_event_worker
GPUCORE-34589 jit_lock all JIT operations
[Official] MIDCET-4458, GPUCORE-36765: Stop the use of tracking page for GPU memory accounting
mali_kbase: Unmask RESET_COMPLETED irq before resetting the GPU
[Official] MIDCET-4820,GPUCORE-36255 Sync whole USER_BUFFER pages upon GPU mapping
mali_kbase: Use rt_mutex for scheduler lock
mali_kbase: fix incorrect auto-merger change
mali_pixel: Disable mgm debugfs by default
mali_kbase: platform: Batch MMU flushes after liveness update
mali_kbase: refactor kbase_mmu_update_pages
[Official] MIDCET-4806,GPUCORE-38732 Continue FLUSH_MEM after power transition timeout
mali_pixel: mgm: Compensate for group migration
mali_pixel: mgm: Remove race condition
mali_pixel: mgm: Refactor update_size
mali_kbase: add missing deinitialization
[Official] MIDCET-4458, GPUCORE-36765: Stop the use of tracking page for GPU memory accounting
mali_kbase: restore hysteresis time.
Update KMD to 'mini release: update r44p1-01bet1 to r44p1-00dev2'
mali_kbase: Reduce kernel log spam.
csf: Setup kcpu_fence->metadata before accessing it
mali_kbase: Add an ITMON notifier callback to check GPU page tables.
mali_kbase: shorten 'mali_kbase_*' thread names
Constrain protected memory allocation during FW initialization
Merge upstream DDK R43P0 KMD
Mali allocations: unconditionally check for pending kill signals
pixel_gpu_uevent: Increase uevent ratelimiting timeout to 20mins
GPUCORE-38292 Fix Use-After-Free Race with Memory-Pool Grow
kbase: csf: Reboot on failed GPU reset
Add missing hwaccess_lock around atom_flags updates.
GPUCORE-35754: Add barrier before updating GLB_DB_REQ to ring CSG DB
mali_kbase: Enable kutf modules
GPUCORE-36682 Lock MMU while disabling AS to prevent use after free
kbase_mem: Reduce per-memory-group pool size to 4.
mali_pixel: mgm: Ensure partition size is set to 0 when disabled.
GPUCORE-37961 Deadlock issue due to lock ordering issue
Make sure jobs are flushed before kbasep_platform_context_term
[Official] MIDCET-4546, GPUCORE-37946: Synchronize GPU cache flush cmds with silent reset on GPU power up
mali_kbase: hold GPU utilization for premature update.
mali_kbase: Remove incorrect WARN()
MIDCET-4324/GPUCORE-35611 Unmapping of aliased sink-page memory
[Official] MIDCET-4458, GPUCORE-36402: Check for process exit before page alloc from kthread
Revert "Revert "GPUCORE-36748 Fix kbase_gpu_mmap() error handling""
Mali Valhall Android DDK r43p0-01eac0 KMD
Mali Valhall Android DDK r42p0-01eac0 KMD
mali_kbase: platform: [SLC-VK] Add new MGM group id for explicit SLC allocations.
mali_kbase: [SLC-VK] Add new BASE_MEM_GROUP for explicit SLC allocations.
mali_kbase: [SLC-VK] Add CCTX memory class for explicit SLC allocations.
platform: Fix mgm_term_data behavior
platform: Disable the GPU SLC partition when not in demand
Revert "GPUCORE-36748 Fix kbase_gpu_mmap() error handling"
Revert "GPUCORE-36682 Lock MMU while disabling AS to prevent use after free"
Revert "GPUCORE-36748 Fix kbase_gpu_mmap() error handling"
Revert "GPUCORE-36682 Lock MMU while disabling AS to prevent use after free"
[Official] MIDCET-4458, GPUCORE-36429: Prevent JIT allocations following unmap of tracking page
[Official] MIDCET-4458, GPUCORE-36635 Fix memory leak via GROUP_SUSPEND
Flush mmu updates regardless of coherency mode
kbase: Add a debugfs file to test GPU uevents
kbase: Add new GPU uevents to kbase
pixel: Introduce GPU uevents to notify userspace of GPU failures
[Official] MIDCET-4458, GPUCORE-36654 Use %pK on GPU bus fault
mali_kbase: platform: Init GPU SLC context
Add partial term support to pixel gpu init
mali_kbase: Add missing wake_up(poweroff_wait) when cancelling poweroff.
mali_pixel: mgm: Factor out common code between enabling/mutating partitions
mali_pixel: mgm: Get accurate size from slc pt mutate
mali_kbase: platform: mgm: Get accurate SLC partition size
mali_kbase: Remove redundant if check to unblock suspend
mali_kbase: reset: Flush SSCD worker before resetting the GPU
pixel_gpu_sscd: Prevent dumping multiple SSCDs when the GPU hangs
mali_kbase: reset: Add a helper to check GPU reset failure
mali-pma: Defer probing until the dma_heap is found
Revert "mali_kbase: mem: Prevent vma splits"
GPUCORE-36682 Lock MMU while disabling AS to prevent use after free
GPUCORE-36748 Fix kbase_gpu_mmap() error handling
Powercycle mali to recover from a PM timeout
mali_pixel: Downgrade invalid region warning to dev_dbg
mali_pixel: Fix PBHA bit pos for ZUMA and PRO
mali_kbase: platform: Perform partition resize and region migration
...
Test: Verify `git diff aosp/android-gs-raviole-5.10-android14-qpr2..HEAD`
Change-Id: I0711654dd45ae2996e837ce3353f0790394d7c72
Signed-off-by: Will McVicker <willmcvicker@google.com>
332 files changed, 40150 insertions, 15611 deletions
diff --git a/mali_kbase/arbiter/mali_kbase_arbiter_interface.h b/common/include/linux/mali_arbiter_interface.h index a0ca1cc..8e675ec 100644 --- a/mali_kbase/arbiter/mali_kbase_arbiter_interface.h +++ b/common/include/linux/mali_arbiter_interface.h @@ -41,7 +41,7 @@ * 4 - Added max_config support * 5 - Added GPU clock frequency reporting support from arbiter */ -#define MALI_KBASE_ARBITER_INTERFACE_VERSION 5 +#define MALI_ARBITER_INTERFACE_VERSION 5 /** * DOC: NO_FREQ is used in case platform doesn't support reporting frequency diff --git a/common/include/linux/mali_kbase_debug_coresight_csf.h b/common/include/linux/mali_kbase_debug_coresight_csf.h new file mode 100644 index 0000000..8356fd4 --- /dev/null +++ b/common/include/linux/mali_kbase_debug_coresight_csf.h @@ -0,0 +1,241 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +/* + * + * (C) COPYRIGHT 2022 ARM Limited. All rights reserved. + * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. + * + */ + +#ifndef _KBASE_DEBUG_CORESIGHT_CSF_ +#define _KBASE_DEBUG_CORESIGHT_CSF_ + +#include <linux/types.h> +#include <linux/list.h> + +#define KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_NOP 0U +#define KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_WRITE_IMM 1U +#define KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_WRITE_IMM_RANGE 2U +#define KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_WRITE 3U +#define KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_READ 4U +#define KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_POLL 5U +#define KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_BIT_OR 6U +#define KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_BIT_XOR 7U +#define KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_BIT_AND 8U +#define KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_BIT_NOT 9U + +/** + * struct kbase_debug_coresight_csf_write_imm_op - Coresight immediate write operation structure + * + * @reg_addr: Register address to write to. + * @val: Value to write at @reg_addr. + */ +struct kbase_debug_coresight_csf_write_imm_op { + __u32 reg_addr; + __u32 val; +}; + +/** + * struct kbase_debug_coresight_csf_write_imm_range_op - Coresight immediate write range + * operation structure + * + * @reg_start: Register address to start writing from. + * @reg_end: Register address to stop writing from. End address included in the write range. + * @val: Value to write at @reg_addr. + */ +struct kbase_debug_coresight_csf_write_imm_range_op { + __u32 reg_start; + __u32 reg_end; + __u32 val; +}; + +/** + * struct kbase_debug_coresight_csf_write_op - Coresight write operation structure + * + * @reg_addr: Register address to write to. + * @ptr: Pointer to the value to write at @reg_addr. + */ +struct kbase_debug_coresight_csf_write_op { + __u32 reg_addr; + __u32 *ptr; +}; + +/** + * struct kbase_debug_coresight_csf_read_op - Coresight read operation structure + * + * @reg_addr: Register address to read. + * @ptr: Pointer where to store the read value. 
+ */ +struct kbase_debug_coresight_csf_read_op { + __u32 reg_addr; + __u32 *ptr; +}; + +/** + * struct kbase_debug_coresight_csf_poll_op - Coresight poll operation structure + * + * @reg_addr: Register address to poll. + * @val: Expected value after poll. + * @mask: Mask to apply on the read value from @reg_addr when comparing against @val. + */ +struct kbase_debug_coresight_csf_poll_op { + __u32 reg_addr; + __u32 val; + __u32 mask; +}; + +/** + * struct kbase_debug_coresight_csf_bitw_op - Coresight bitwise operation structure + * + * @ptr: Pointer to the variable on which to execute the bit operation. + * @val: Value with which the operation should be executed against @ptr value. + */ +struct kbase_debug_coresight_csf_bitw_op { + __u32 *ptr; + __u32 val; +}; + +/** + * struct kbase_debug_coresight_csf_op - Coresight supported operations + * + * @type: Operation type. + * @padding: Padding for 64bit alignment. + * @op: Operation union. + * @op.write_imm: Parameters for immediate write operation. + * @op.write_imm_range: Parameters for immediate range write operation. + * @op.write: Parameters for write operation. + * @op.read: Parameters for read operation. + * @op.poll: Parameters for poll operation. + * @op.bitw: Parameters for bitwise operation. + * @op.padding: Padding for 64bit alignment. + * + * All operation structures should include padding to ensure they are the same size. + */ +struct kbase_debug_coresight_csf_op { + __u8 type; + __u8 padding[7]; + union { + struct kbase_debug_coresight_csf_write_imm_op write_imm; + struct kbase_debug_coresight_csf_write_imm_range_op write_imm_range; + struct kbase_debug_coresight_csf_write_op write; + struct kbase_debug_coresight_csf_read_op read; + struct kbase_debug_coresight_csf_poll_op poll; + struct kbase_debug_coresight_csf_bitw_op bitw; + u32 padding[3]; + } op; +}; + +/** + * struct kbase_debug_coresight_csf_sequence - Coresight sequence of operations + * + * @ops: Arrays containing Coresight operations. + * @nr_ops: Size of @ops. + */ +struct kbase_debug_coresight_csf_sequence { + struct kbase_debug_coresight_csf_op *ops; + int nr_ops; +}; + +/** + * struct kbase_debug_coresight_csf_address_range - Coresight client address range + * + * @start: Start offset of the address range. + * @end: End offset of the address range. + */ +struct kbase_debug_coresight_csf_address_range { + __u32 start; + __u32 end; +}; + +/** + * kbase_debug_coresight_csf_register - Register as a client for set ranges of MCU memory. + * + * @drv_data: Pointer to driver device data. + * @ranges: Pointer to an array of struct kbase_debug_coresight_csf_address_range + * that contains start and end addresses that the client will manage. + * @nr_ranges: Size of @ranges array. + * + * This function checks @ranges against current client claimed ranges. If there + * are no overlaps, a new client is created and added to the list. + * + * Return: A pointer of the registered client instance on success. NULL on failure. + */ +void *kbase_debug_coresight_csf_register(void *drv_data, + struct kbase_debug_coresight_csf_address_range *ranges, + int nr_ranges); + +/** + * kbase_debug_coresight_csf_unregister - Removes a coresight client. + * + * @client_data: A pointer to a coresight client. + * + * This function removes a client from the client list and frees the client struct. + */ +void kbase_debug_coresight_csf_unregister(void *client_data); + +/** + * kbase_debug_coresight_csf_config_create - Creates a configuration containing + * enable and disable sequence. 
+ * + * @client_data: Pointer to a coresight client. + * @enable_seq: Pointer to a struct containing the ops needed to enable coresight blocks. + * It's optional so could be NULL. + * @disable_seq: Pointer to a struct containing ops to run to disable coresight blocks. + * It's optional so could be NULL. + * + * Return: Valid pointer on success. NULL on failure. + */ +void * +kbase_debug_coresight_csf_config_create(void *client_data, + struct kbase_debug_coresight_csf_sequence *enable_seq, + struct kbase_debug_coresight_csf_sequence *disable_seq); +/** + * kbase_debug_coresight_csf_config_free - Frees a configuration containing + * enable and disable sequence. + * + * @config_data: Pointer to a coresight configuration. + */ +void kbase_debug_coresight_csf_config_free(void *config_data); + +/** + * kbase_debug_coresight_csf_config_enable - Enables a coresight configuration + * + * @config_data: Pointer to coresight configuration. + * + * If GPU is turned on, the configuration is immediately applied the CoreSight blocks. + * If the GPU is turned off, the configuration is scheduled to be applied on the next + * time the GPU is turned on. + * + * A configuration is enabled by executing read/write/poll ops defined in config->enable_seq. + * + * Return: 0 if success. Error code on failure. + */ +int kbase_debug_coresight_csf_config_enable(void *config_data); +/** + * kbase_debug_coresight_csf_config_disable - Disables a coresight configuration + * + * @config_data: Pointer to coresight configuration. + * + * If the GPU is turned off, this is effective a NOP as kbase should have disabled + * the configuration when GPU is off. + * If the GPU is on, the configuration will be disabled. + * + * A configuration is disabled by executing read/write/poll ops defined in config->disable_seq. + * + * Return: 0 if success. Error code on failure. + */ +int kbase_debug_coresight_csf_config_disable(void *config_data); + +#endif /* _KBASE_DEBUG_CORESIGHT_CSF_ */ diff --git a/common/include/linux/memory_group_manager.h b/common/include/linux/memory_group_manager.h index efa35f5..7561363 100644 --- a/common/include/linux/memory_group_manager.h +++ b/common/include/linux/memory_group_manager.h @@ -30,7 +30,7 @@ typedef int vm_fault_t; #endif -#define MEMORY_GROUP_MANAGER_NR_GROUPS (16) +#define MEMORY_GROUP_MANAGER_NR_GROUPS (4) struct memory_group_manager_device; struct memory_group_manager_import_data; @@ -43,6 +43,8 @@ struct memory_group_manager_import_data; * @mgm_free_page: Callback to free physical memory in a group * @mgm_get_import_memory_id: Callback to get the group ID for imported memory * @mgm_update_gpu_pte: Callback to modify a GPU page table entry + * @mgm_pte_to_original_pte: Callback to get the original PTE entry as given + * to mgm_update_gpu_pte * @mgm_vmf_insert_pfn_prot: Callback to map a physical memory page for the CPU */ struct memory_group_manager_ops { @@ -120,7 +122,8 @@ struct memory_group_manager_ops { * This function allows the memory group manager to modify a GPU page * table entry before it is stored by the kbase module (controller * driver). It may set certain bits in the page table entry attributes - * or in the physical address, based on the physical memory group ID. + * or modify the physical address, based on the physical memory group ID + * and/or additional data in struct memory_group_manager_device. * * Return: A modified GPU page table entry to be stored in a page table. 
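The kbase_debug_coresight_csf.h interface added above is a small client API: claim MCU address ranges, build a configuration from enable/disable operation sequences, then enable it. Below is a minimal, hypothetical kernel-side sketch based only on the declarations in that header; the driver data pointer, the claimed range, and the register write are placeholders, not values taken from this diff.

```c
/* Hypothetical CoreSight client sketch using only the declarations above.
 * drv_data, the claimed range, and the register write are placeholders.
 */
#include <linux/errno.h>
#include <linux/kernel.h>
#include <linux/mali_kbase_debug_coresight_csf.h>

static int example_coresight_attach(void *drv_data)
{
	struct kbase_debug_coresight_csf_address_range range = {
		.start = 0x1000, .end = 0x1fff, /* placeholder MCU range */
	};
	struct kbase_debug_coresight_csf_op enable_ops[] = {
		{ .type = KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_WRITE_IMM,
		  .op.write_imm = { .reg_addr = 0x1000, .val = 0x1 } },
	};
	struct kbase_debug_coresight_csf_sequence enable_seq = {
		.ops = enable_ops, .nr_ops = ARRAY_SIZE(enable_ops),
	};
	void *client, *config;
	int err;

	client = kbase_debug_coresight_csf_register(drv_data, &range, 1);
	if (!client)
		return -EBUSY; /* range overlaps an existing client */

	/* disable_seq is optional, so NULL is passed in this sketch. */
	config = kbase_debug_coresight_csf_config_create(client, &enable_seq, NULL);
	if (!config) {
		kbase_debug_coresight_csf_unregister(client);
		return -ENOMEM;
	}

	/* Applied now if the GPU is powered, otherwise on the next power-up. */
	err = kbase_debug_coresight_csf_config_enable(config);
	if (err) {
		kbase_debug_coresight_csf_config_free(config);
		kbase_debug_coresight_csf_unregister(client);
	}
	return err;
}
```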
*/ @@ -128,6 +131,27 @@ struct memory_group_manager_ops { int group_id, int mmu_level, u64 pte); /* + * mgm_pte_to_original_pte - Undo any modification done during mgm_update_gpu_pte() + * + * @mgm_dev: The memory group manager through which the request + * is being made. + * @group_id: A physical memory group ID. The meaning of this is + * defined by the systems integrator. Its valid range is + * 0 .. MEMORY_GROUP_MANAGER_NR_GROUPS-1. + * @mmu_level: The level of the page table entry in @ate. + * @pte: The page table entry to restore the original representation for, + * in LPAE or AArch64 format (depending on the driver's configuration). + * + * Undo any modifications done during mgm_update_gpu_pte(). + * This function allows getting back the original PTE entry as given + * to mgm_update_gpu_pte(). + * + * Return: PTE entry as originally specified to mgm_update_gpu_pte() + */ + u64 (*mgm_pte_to_original_pte)(struct memory_group_manager_device *mgm_dev, int group_id, + int mmu_level, u64 pte); + + /* * mgm_vmf_insert_pfn_prot - Map a physical page in a group for the CPU * * @mgm_dev: The memory group manager through which the request diff --git a/common/include/linux/version_compat_defs.h b/common/include/linux/version_compat_defs.h index 8d289f2..47551f2 100644 --- a/common/include/linux/version_compat_defs.h +++ b/common/include/linux/version_compat_defs.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2022-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -23,11 +23,46 @@ #define _VERSION_COMPAT_DEFS_H_ #include <linux/version.h> +#include <linux/highmem.h> +#include <linux/timer.h> -#if KERNEL_VERSION(4, 16, 0) >= LINUX_VERSION_CODE +#if (KERNEL_VERSION(4, 4, 267) < LINUX_VERSION_CODE) +#include <linux/overflow.h> +#endif + +#include <linux/bitops.h> +#if (KERNEL_VERSION(4, 19, 0) <= LINUX_VERSION_CODE) +#include <linux/bits.h> +#endif + +#ifndef BITS_PER_TYPE +#define BITS_PER_TYPE(type) (sizeof(type) * BITS_PER_BYTE) +#endif + +#if KERNEL_VERSION(4, 16, 0) > LINUX_VERSION_CODE typedef unsigned int __poll_t; #endif +#if KERNEL_VERSION(4, 9, 78) >= LINUX_VERSION_CODE + +#ifndef EPOLLHUP +#define EPOLLHUP POLLHUP +#endif + +#ifndef EPOLLERR +#define EPOLLERR POLLERR +#endif + +#ifndef EPOLLIN +#define EPOLLIN POLLIN +#endif + +#ifndef EPOLLRDNORM +#define EPOLLRDNORM POLLRDNORM +#endif + +#endif + #if KERNEL_VERSION(6, 1, 0) <= LINUX_VERSION_CODE /* This is defined inside kbase for matching the default to kernel's * mmap_min_addr, used inside file mali_kbase_mmap.c. @@ -36,21 +71,173 @@ typedef unsigned int __poll_t; */ #ifdef CONFIG_MMU #define kbase_mmap_min_addr CONFIG_DEFAULT_MMAP_MIN_ADDR + #ifdef CONFIG_LSM_MMAP_MIN_ADDR #if (CONFIG_LSM_MMAP_MIN_ADDR > CONFIG_DEFAULT_MMAP_MIN_ADDR) /* Replace the default definition with CONFIG_LSM_MMAP_MIN_ADDR */ #undef kbase_mmap_min_addr #define kbase_mmap_min_addr CONFIG_LSM_MMAP_MIN_ADDR -#pragma message "kbase_mmap_min_addr compiled to CONFIG_LSM_MMAP_MIN_ADDR, no runtime update!" +#define KBASE_COMPILED_MMAP_MIN_ADDR_MSG \ + "* MALI kbase_mmap_min_addr compiled to CONFIG_LSM_MMAP_MIN_ADDR, no runtime update possible! 
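The memory_group_manager.h change above pairs mgm_update_gpu_pte with a new mgm_pte_to_original_pte callback that must undo whatever the update did. A minimal sketch of such a pair is shown below, assuming a provider that stamps the group ID into otherwise-unused PTE bits; the bit position is a placeholder and this is not the Pixel MGM implementation.

```c
/* Sketch of a paired mgm_update_gpu_pte / mgm_pte_to_original_pte.
 * EXAMPLE_PTE_GROUP_SHIFT is a placeholder bit position, not the real layout.
 */
#include <linux/memory_group_manager.h>
#include <linux/types.h>

#define EXAMPLE_PTE_GROUP_SHIFT 54
#define EXAMPLE_PTE_GROUP_MASK  (0x3ull << EXAMPLE_PTE_GROUP_SHIFT)

static u64 example_mgm_update_gpu_pte(struct memory_group_manager_device *mgm_dev,
				      int group_id, int mmu_level, u64 pte)
{
	/* Stamp the group ID into software-defined PTE bits. */
	return (pte & ~EXAMPLE_PTE_GROUP_MASK) |
	       (((u64)group_id << EXAMPLE_PTE_GROUP_SHIFT) & EXAMPLE_PTE_GROUP_MASK);
}

static u64 example_mgm_pte_to_original_pte(struct memory_group_manager_device *mgm_dev,
					   int group_id, int mmu_level, u64 pte)
{
	/* Undo the modification so kbase sees the PTE it originally supplied. */
	return pte & ~EXAMPLE_PTE_GROUP_MASK;
}
```

Since MEMORY_GROUP_MANAGER_NR_GROUPS is reduced to 4 in this change, two bits are enough for the group ID in this sketch.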
*" #endif /* (CONFIG_LSM_MMAP_MIN_ADDR > CONFIG_DEFAULT_MMAP_MIN_ADDR) */ #endif /* CONFIG_LSM_MMAP_MIN_ADDR */ + #if (kbase_mmap_min_addr == CONFIG_DEFAULT_MMAP_MIN_ADDR) -#pragma message "kbase_mmap_min_addr compiled to CONFIG_DEFAULT_MMAP_MIN_ADDR, no runtime update!" +#define KBASE_COMPILED_MMAP_MIN_ADDR_MSG \ + "* MALI kbase_mmap_min_addr compiled to CONFIG_DEFAULT_MMAP_MIN_ADDR, no runtime update possible! *" #endif + #else /* CONFIG_MMU */ #define kbase_mmap_min_addr (0UL) -#pragma message "kbase_mmap_min_addr compiled to (0UL), no runtime update!" +#define KBASE_COMPILED_MMAP_MIN_ADDR_MSG \ + "* MALI kbase_mmap_min_addr compiled to (0UL), no runtime update possible! *" #endif /* CONFIG_MMU */ #endif /* KERNEL_VERSION(6, 1, 0) <= LINUX_VERSION_CODE */ +static inline void kbase_timer_setup(struct timer_list *timer, + void (*callback)(struct timer_list *timer)) +{ +#if KERNEL_VERSION(4, 14, 0) > LINUX_VERSION_CODE + setup_timer(timer, (void (*)(unsigned long))callback, (unsigned long)timer); +#else + timer_setup(timer, callback, 0); +#endif +} + +#ifndef WRITE_ONCE +#ifdef ASSIGN_ONCE +#define WRITE_ONCE(x, val) ASSIGN_ONCE(val, x) +#else +#define WRITE_ONCE(x, val) (ACCESS_ONCE(x) = (val)) +#endif +#endif + +#ifndef READ_ONCE +#define READ_ONCE(x) ACCESS_ONCE(x) +#endif + +static inline void *kbase_kmap(struct page *p) +{ +#if KERNEL_VERSION(5, 11, 0) <= LINUX_VERSION_CODE + return kmap_local_page(p); +#else + return kmap(p); +#endif /* KERNEL_VERSION(5, 11, 0) */ +} + +static inline void *kbase_kmap_atomic(struct page *p) +{ +#if KERNEL_VERSION(5, 11, 0) <= LINUX_VERSION_CODE + return kmap_local_page(p); +#else + return kmap_atomic(p); +#endif /* KERNEL_VERSION(5, 11, 0) */ +} + +static inline void kbase_kunmap(struct page *p, void *address) +{ +#if KERNEL_VERSION(5, 11, 0) <= LINUX_VERSION_CODE + kunmap_local(address); +#else + kunmap(p); +#endif /* KERNEL_VERSION(5, 11, 0) */ +} + +static inline void kbase_kunmap_atomic(void *address) +{ +#if KERNEL_VERSION(5, 11, 0) <= LINUX_VERSION_CODE + kunmap_local(address); +#else + kunmap_atomic(address); +#endif /* KERNEL_VERSION(5, 11, 0) */ +} + +/* Some of the older 4.4 kernel patch versions do + * not contain the overflow check functions. However, + * they are based on compiler instrinsics, so they + * are simple to reproduce. + */ +#if (KERNEL_VERSION(4, 4, 267) >= LINUX_VERSION_CODE) +/* Some of the older 4.4 kernel patch versions do + * not contain the overflow check functions. However, + * they are based on compiler instrinsics, so they + * are simple to reproduce. + */ +#define check_mul_overflow(a, b, d) __builtin_mul_overflow(a, b, d) +#endif + +/* + * There was a big rename in the 4.10 kernel (fence* -> dma_fence*), + * with most of the related functions keeping the same signatures. + */ + +#if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) + +#include <linux/fence.h> + +#define dma_fence fence +#define dma_fence_ops fence_ops +#define dma_fence_context_alloc(a) fence_context_alloc(a) +#define dma_fence_init(a, b, c, d, e) fence_init(a, b, c, d, e) +#define dma_fence_get(a) fence_get(a) +#define dma_fence_put(a) fence_put(a) +#define dma_fence_signal(a) fence_signal(a) +#define dma_fence_is_signaled(a) fence_is_signaled(a) +#define dma_fence_add_callback(a, b, c) fence_add_callback(a, b, c) +#define dma_fence_remove_callback(a, b) fence_remove_callback(a, b) +#define dma_fence_default_wait fence_default_wait + +#if (KERNEL_VERSION(4, 9, 68) <= LINUX_VERSION_CODE) +#define dma_fence_get_status(a) (fence_is_signaled(a) ? 
(a)->error ?: 1 : 0) +#else +#define dma_fence_get_status(a) (fence_is_signaled(a) ? (a)->status ?: 1 : 0) +#endif + +#else + +#include <linux/dma-fence.h> + +#if (KERNEL_VERSION(4, 11, 0) > LINUX_VERSION_CODE) +#define dma_fence_get_status(a) (dma_fence_is_signaled(a) ? (a)->status ?: 1 : 0) +#endif + +#endif /* < 4.10.0 */ + +static inline void dma_fence_set_error_helper( +#if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) + struct fence *fence, +#else + struct dma_fence *fence, +#endif + int error) +{ +#if (KERNEL_VERSION(4, 11, 0) <= LINUX_VERSION_CODE) + dma_fence_set_error(fence, error); +#elif (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE && \ + KERNEL_VERSION(4, 9, 68) <= LINUX_VERSION_CODE) + fence_set_error(fence, error); +#else + fence->status = error; +#endif +} + +#include <linux/mm.h> +#if !((KERNEL_VERSION(6, 3, 0) <= LINUX_VERSION_CODE) || \ + ((KERNEL_VERSION(6, 1, 25) <= LINUX_VERSION_CODE) && defined(__ANDROID_COMMON_KERNEL__))) +static inline void vm_flags_set(struct vm_area_struct *vma, vm_flags_t flags) +{ + vma->vm_flags |= flags; +} +static inline void vm_flags_clear(struct vm_area_struct *vma, vm_flags_t flags) +{ + vma->vm_flags &= ~flags; +} +#endif + +#if (KERNEL_VERSION(6, 4, 0) <= LINUX_VERSION_CODE) +#define KBASE_CLASS_CREATE(owner, name) class_create(name) +#else +#define KBASE_CLASS_CREATE(owner, name) class_create(owner, name) +#endif + #endif /* _VERSION_COMPAT_DEFS_H_ */ diff --git a/common/include/linux/dma-buf-test-exporter.h b/common/include/uapi/base/arm/dma_buf_test_exporter/dma-buf-test-exporter.h index aae12f9..a92e296 100644 --- a/common/include/linux/dma-buf-test-exporter.h +++ b/common/include/uapi/base/arm/dma_buf_test_exporter/dma-buf-test-exporter.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2012-2013, 2017, 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2012-2013, 2017, 2020-2022 ARM Limited. All rights reserved. 
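version_compat_defs.h now hides several kernel API transitions behind wrappers (kmap_local_page vs kmap, the 6.3-era vm_flags accessors, class_create, the dma_fence rename). Below is an illustrative caller of the wrappers; the surrounding function and its payload are invented for the example and assume the header is on the include path.

```c
/* Illustrative use of the compat wrappers; the surrounding function is invented. */
#include <linux/mm.h>
#include <linux/types.h>
#include <linux/version_compat_defs.h>

static void example_fill_page(struct page *p, struct vm_area_struct *vma)
{
	/* kmap_local_page() on >= 5.11 kernels, plain kmap() on older ones. */
	u32 *va = kbase_kmap(p);

	va[0] = 0xdeadbeef; /* placeholder payload */
	kbase_kunmap(p, va);

	/* Direct vm_flags writes were removed in 6.3 (and android14-6.1.25),
	 * so the header provides fallbacks under the new accessor names.
	 */
	vm_flags_set(vma, VM_DONTCOPY | VM_DONTEXPAND);
}
```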
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -19,21 +19,22 @@ * */ -#ifndef _LINUX_DMA_BUF_TEST_EXPORTER_H_ -#define _LINUX_DMA_BUF_TEST_EXPORTER_H_ +#ifndef _UAPI_DMA_BUF_TEST_EXPORTER_H_ +#define _UAPI_DMA_BUF_TEST_EXPORTER_H_ #include <linux/types.h> #include <linux/ioctl.h> -#define DMA_BUF_TE_VER_MAJOR 1 -#define DMA_BUF_TE_VER_MINOR 0 #define DMA_BUF_TE_ENQ 0x642d7465 #define DMA_BUF_TE_ACK 0x68692100 struct dma_buf_te_ioctl_version { - int op; /**< Must be set to DMA_BUF_TE_ENQ by client, driver will set it to DMA_BUF_TE_ACK */ - int major; /**< Major version */ - int minor; /**< Minor version */ + /** Must be set to DMA_BUF_TE_ENQ by client, driver will set it to DMA_BUF_TE_ACK */ + int op; + /** Major version */ + int major; + /** Minor version */ + int minor; }; struct dma_buf_te_ioctl_alloc { @@ -46,7 +47,7 @@ struct dma_buf_te_ioctl_status { /* out */ int attached_devices; /* number of devices attached (active 'dma_buf_attach's) */ int device_mappings; /* number of device mappings (active 'dma_buf_map_attachment's) */ - int cpu_mappings; /* number of cpu mappings (active 'mmap's) */ + int cpu_mappings; /* number of cpu mappings (active 'mmap's) */ }; struct dma_buf_te_ioctl_set_failing { @@ -66,11 +67,12 @@ struct dma_buf_te_ioctl_fill { #define DMA_BUF_TE_IOCTL_BASE 'E' /* Below all returning 0 if successful or -errcode except DMA_BUF_TE_ALLOC which will return fd or -errcode */ -#define DMA_BUF_TE_VERSION _IOR(DMA_BUF_TE_IOCTL_BASE, 0x00, struct dma_buf_te_ioctl_version) -#define DMA_BUF_TE_ALLOC _IOR(DMA_BUF_TE_IOCTL_BASE, 0x01, struct dma_buf_te_ioctl_alloc) -#define DMA_BUF_TE_QUERY _IOR(DMA_BUF_TE_IOCTL_BASE, 0x02, struct dma_buf_te_ioctl_status) -#define DMA_BUF_TE_SET_FAILING _IOW(DMA_BUF_TE_IOCTL_BASE, 0x03, struct dma_buf_te_ioctl_set_failing) -#define DMA_BUF_TE_ALLOC_CONT _IOR(DMA_BUF_TE_IOCTL_BASE, 0x04, struct dma_buf_te_ioctl_alloc) -#define DMA_BUF_TE_FILL _IOR(DMA_BUF_TE_IOCTL_BASE, 0x05, struct dma_buf_te_ioctl_fill) +#define DMA_BUF_TE_VERSION _IOR(DMA_BUF_TE_IOCTL_BASE, 0x00, struct dma_buf_te_ioctl_version) +#define DMA_BUF_TE_ALLOC _IOR(DMA_BUF_TE_IOCTL_BASE, 0x01, struct dma_buf_te_ioctl_alloc) +#define DMA_BUF_TE_QUERY _IOR(DMA_BUF_TE_IOCTL_BASE, 0x02, struct dma_buf_te_ioctl_status) +#define DMA_BUF_TE_SET_FAILING \ + _IOW(DMA_BUF_TE_IOCTL_BASE, 0x03, struct dma_buf_te_ioctl_set_failing) +#define DMA_BUF_TE_ALLOC_CONT _IOR(DMA_BUF_TE_IOCTL_BASE, 0x04, struct dma_buf_te_ioctl_alloc) +#define DMA_BUF_TE_FILL _IOR(DMA_BUF_TE_IOCTL_BASE, 0x05, struct dma_buf_te_ioctl_fill) -#endif /* _LINUX_DMA_BUF_TEST_EXPORTER_H_ */ +#endif /* _UAPI_DMA_BUF_TEST_EXPORTER_H_ */ diff --git a/common/include/uapi/gpu/arm/midgard/backend/gpu/mali_kbase_model_dummy.h b/common/include/uapi/gpu/arm/midgard/backend/gpu/mali_kbase_model_dummy.h index 9d677ca..a44da7b 100644 --- a/common/include/uapi/gpu/arm/midgard/backend/gpu/mali_kbase_model_dummy.h +++ b/common/include/uapi/gpu/arm/midgard/backend/gpu/mali_kbase_model_dummy.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2021-2022 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -29,7 +29,11 @@ #include <linux/types.h> #define KBASE_DUMMY_MODEL_COUNTER_HEADER_DWORDS (4) +#if MALI_USE_CSF +#define KBASE_DUMMY_MODEL_COUNTER_PER_CORE (65) +#else /* MALI_USE_CSF */ #define KBASE_DUMMY_MODEL_COUNTER_PER_CORE (60) +#endif /* !MALI_USE_CSF */ #define KBASE_DUMMY_MODEL_COUNTERS_PER_BIT (4) #define KBASE_DUMMY_MODEL_COUNTER_ENABLED(enable_mask, ctr_idx) \ (enable_mask & (1 << (ctr_idx / KBASE_DUMMY_MODEL_COUNTERS_PER_BIT))) @@ -43,13 +47,29 @@ (KBASE_DUMMY_MODEL_VALUES_PER_BLOCK * sizeof(__u32)) #define KBASE_DUMMY_MODEL_MAX_MEMSYS_BLOCKS 8 #define KBASE_DUMMY_MODEL_MAX_SHADER_CORES 32 -#define KBASE_DUMMY_MODEL_MAX_NUM_PERF_BLOCKS \ +#define KBASE_DUMMY_MODEL_MAX_FIRMWARE_BLOCKS 0 +#define KBASE_DUMMY_MODEL_MAX_NUM_HARDWARE_BLOCKS \ (1 + 1 + KBASE_DUMMY_MODEL_MAX_MEMSYS_BLOCKS + KBASE_DUMMY_MODEL_MAX_SHADER_CORES) +#define KBASE_DUMMY_MODEL_MAX_NUM_PERF_BLOCKS \ + (KBASE_DUMMY_MODEL_MAX_NUM_HARDWARE_BLOCKS + KBASE_DUMMY_MODEL_MAX_FIRMWARE_BLOCKS) #define KBASE_DUMMY_MODEL_COUNTER_TOTAL \ (KBASE_DUMMY_MODEL_MAX_NUM_PERF_BLOCKS * \ KBASE_DUMMY_MODEL_COUNTER_PER_CORE) +#define KBASE_DUMMY_MODEL_MAX_VALUES_PER_SAMPLE \ + (KBASE_DUMMY_MODEL_MAX_NUM_PERF_BLOCKS * KBASE_DUMMY_MODEL_VALUES_PER_BLOCK) +#define KBASE_DUMMY_MODEL_MAX_SAMPLE_SIZE \ + (KBASE_DUMMY_MODEL_MAX_NUM_PERF_BLOCKS * KBASE_DUMMY_MODEL_BLOCK_SIZE) +/* + * Bit mask - no. bits set is no. cores + * Values obtained from talking to HW team + * Example: tODx has 10 cores, 0b11 1111 1111 -> 0x3FF + */ #define DUMMY_IMPLEMENTATION_SHADER_PRESENT (0xFull) +#define DUMMY_IMPLEMENTATION_SHADER_PRESENT_TBEX (0x7FFFull) +#define DUMMY_IMPLEMENTATION_SHADER_PRESENT_TODX (0x3FFull) +#define DUMMY_IMPLEMENTATION_SHADER_PRESENT_TTUX (0x7FFull) +#define DUMMY_IMPLEMENTATION_SHADER_PRESENT_TTIX (0xFFFull) #define DUMMY_IMPLEMENTATION_TILER_PRESENT (0x1ull) #define DUMMY_IMPLEMENTATION_L2_PRESENT (0x1ull) #define DUMMY_IMPLEMENTATION_STACK_PRESENT (0xFull) diff --git a/mali_kbase/mali_kbase_bits.h b/common/include/uapi/gpu/arm/midgard/backend/gpu/mali_kbase_model_linux.h index a085fd8..c83cedd 100644 --- a/mali_kbase/mali_kbase_bits.h +++ b/common/include/uapi/gpu/arm/midgard/backend/gpu/mali_kbase_model_linux.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2022 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -19,13 +19,18 @@ * */ -#ifndef _KBASE_BITS_H_ -#define _KBASE_BITS_H_ +/* + * Dummy Model interface + */ + +#ifndef _UAPI_KBASE_MODEL_LINUX_H_ +#define _UAPI_KBASE_MODEL_LINUX_H_ + +/* Generic model IRQs */ +#define MODEL_LINUX_JOB_IRQ (0x1 << 0) +#define MODEL_LINUX_GPU_IRQ (0x1 << 1) +#define MODEL_LINUX_MMU_IRQ (0x1 << 2) -#if (KERNEL_VERSION(4, 19, 0) <= LINUX_VERSION_CODE) -#include <linux/bits.h> -#else -#include <linux/bitops.h> -#endif +#define MODEL_LINUX_IRQ_MASK (MODEL_LINUX_JOB_IRQ | MODEL_LINUX_GPU_IRQ | MODEL_LINUX_MMU_IRQ) -#endif /* _KBASE_BITS_H_ */ +#endif /* _UAPI_KBASE_MODEL_LINUX_H_ */ diff --git a/common/include/uapi/gpu/arm/midgard/csf/mali_base_csf_kernel.h b/common/include/uapi/gpu/arm/midgard/csf/mali_base_csf_kernel.h index 7f7b9dd..a8e5802 100644 --- a/common/include/uapi/gpu/arm/midgard/csf/mali_base_csf_kernel.h +++ b/common/include/uapi/gpu/arm/midgard/csf/mali_base_csf_kernel.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2020-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2020-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -23,99 +23,16 @@ #define _UAPI_BASE_CSF_KERNEL_H_ #include <linux/types.h> +#include "../mali_base_common_kernel.h" -/* Memory allocation, access/hint flags. +/* Memory allocation, access/hint flags & mask specific to CSF GPU. * * See base_mem_alloc_flags. */ -/* IN */ -/* Read access CPU side - */ -#define BASE_MEM_PROT_CPU_RD ((base_mem_alloc_flags)1 << 0) - -/* Write access CPU side - */ -#define BASE_MEM_PROT_CPU_WR ((base_mem_alloc_flags)1 << 1) - -/* Read access GPU side - */ -#define BASE_MEM_PROT_GPU_RD ((base_mem_alloc_flags)1 << 2) - -/* Write access GPU side - */ -#define BASE_MEM_PROT_GPU_WR ((base_mem_alloc_flags)1 << 3) - -/* Execute allowed on the GPU side - */ -#define BASE_MEM_PROT_GPU_EX ((base_mem_alloc_flags)1 << 4) - -/* Will be permanently mapped in kernel space. - * Flag is only allowed on allocations originating from kbase. - */ -#define BASEP_MEM_PERMANENT_KERNEL_MAPPING ((base_mem_alloc_flags)1 << 5) - -/* The allocation will completely reside within the same 4GB chunk in the GPU - * virtual space. - * Since this flag is primarily required only for the TLS memory which will - * not be used to contain executable code and also not used for Tiler heap, - * it can't be used along with BASE_MEM_PROT_GPU_EX and TILER_ALIGN_TOP flags. - */ -#define BASE_MEM_GPU_VA_SAME_4GB_PAGE ((base_mem_alloc_flags)1 << 6) - -/* Userspace is not allowed to free this memory. - * Flag is only allowed on allocations originating from kbase. - */ -#define BASEP_MEM_NO_USER_FREE ((base_mem_alloc_flags)1 << 7) - /* Must be FIXED memory. 
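The relocated dummy-model header (mali_kbase_model_linux.h) exposes three generic IRQ bits plus a combined mask. A trivial sketch decoding a status word with them is shown below; the include path assumes the uapi directory is on the search path.

```c
/* Sketch: decode a dummy-model IRQ status word into its three sources. */
#include <linux/printk.h>
#include <uapi/gpu/arm/midgard/backend/gpu/mali_kbase_model_linux.h>

static void example_report_model_irqs(unsigned int status)
{
	if (status & MODEL_LINUX_JOB_IRQ)
		pr_info("model: JOB irq\n");
	if (status & MODEL_LINUX_GPU_IRQ)
		pr_info("model: GPU irq\n");
	if (status & MODEL_LINUX_MMU_IRQ)
		pr_info("model: MMU irq\n");
	if (status & ~MODEL_LINUX_IRQ_MASK)
		pr_warn("model: unknown irq bits 0x%x\n", status & ~MODEL_LINUX_IRQ_MASK);
}
```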
*/ #define BASE_MEM_FIXED ((base_mem_alloc_flags)1 << 8) -/* Grow backing store on GPU Page Fault - */ -#define BASE_MEM_GROW_ON_GPF ((base_mem_alloc_flags)1 << 9) - -/* Page coherence Outer shareable, if available - */ -#define BASE_MEM_COHERENT_SYSTEM ((base_mem_alloc_flags)1 << 10) - -/* Page coherence Inner shareable - */ -#define BASE_MEM_COHERENT_LOCAL ((base_mem_alloc_flags)1 << 11) - -/* IN/OUT */ -/* Should be cached on the CPU, returned if actually cached - */ -#define BASE_MEM_CACHED_CPU ((base_mem_alloc_flags)1 << 12) - -/* IN/OUT */ -/* Must have same VA on both the GPU and the CPU - */ -#define BASE_MEM_SAME_VA ((base_mem_alloc_flags)1 << 13) - -/* OUT */ -/* Must call mmap to acquire a GPU address for the alloc - */ -#define BASE_MEM_NEED_MMAP ((base_mem_alloc_flags)1 << 14) - -/* IN */ -/* Page coherence Outer shareable, required. - */ -#define BASE_MEM_COHERENT_SYSTEM_REQUIRED ((base_mem_alloc_flags)1 << 15) - -/* Protected memory - */ -#define BASE_MEM_PROTECTED ((base_mem_alloc_flags)1 << 16) - -/* Not needed physical memory - */ -#define BASE_MEM_DONT_NEED ((base_mem_alloc_flags)1 << 17) - -/* Must use shared CPU/GPU zone (SAME_VA zone) but doesn't require the - * addresses to be the same - */ -#define BASE_MEM_IMPORT_SHARED ((base_mem_alloc_flags)1 << 18) - /* CSF event memory * * If Outer shareable coherence is not specified or not available, then on @@ -131,46 +48,15 @@ #define BASE_MEM_RESERVED_BIT_20 ((base_mem_alloc_flags)1 << 20) -/* Should be uncached on the GPU, will work only for GPUs using AARCH64 mmu - * mode. Some components within the GPU might only be able to access memory - * that is GPU cacheable. Refer to the specific GPU implementation for more - * details. The 3 shareability flags will be ignored for GPU uncached memory. - * If used while importing USER_BUFFER type memory, then the import will fail - * if the memory is not aligned to GPU and CPU cache line width. - */ -#define BASE_MEM_UNCACHED_GPU ((base_mem_alloc_flags)1 << 21) - -/* - * Bits [22:25] for group_id (0~15). - * - * base_mem_group_id_set() should be used to pack a memory group ID into a - * base_mem_alloc_flags value instead of accessing the bits directly. - * base_mem_group_id_get() should be used to extract the memory group ID from - * a base_mem_alloc_flags value. - */ -#define BASEP_MEM_GROUP_ID_SHIFT 22 -#define BASE_MEM_GROUP_ID_MASK \ - ((base_mem_alloc_flags)0xF << BASEP_MEM_GROUP_ID_SHIFT) - -/* Must do CPU cache maintenance when imported memory is mapped/unmapped - * on GPU. Currently applicable to dma-buf type only. - */ -#define BASE_MEM_IMPORT_SYNC_ON_MAP_UNMAP ((base_mem_alloc_flags)1 << 26) - -/* OUT */ -/* Kernel side cache sync ops required */ -#define BASE_MEM_KERNEL_SYNC ((base_mem_alloc_flags)1 << 28) /* Must be FIXABLE memory: its GPU VA will be determined at a later point, * at which time it will be at a fixed GPU VA. */ #define BASE_MEM_FIXABLE ((base_mem_alloc_flags)1 << 29) -/* Number of bits used as flags for base memory management - * - * Must be kept in sync with the base_mem_alloc_flags flags +/* Note that the number of bits used for base_mem_alloc_flags + * must be less than BASE_MEM_FLAGS_NR_BITS !!! */ -#define BASE_MEM_FLAGS_NR_BITS 30 /* A mask of all the flags which are only valid for allocations within kbase, * and may not be passed from user space. @@ -178,62 +64,23 @@ #define BASEP_MEM_FLAGS_KERNEL_ONLY \ (BASEP_MEM_PERMANENT_KERNEL_MAPPING | BASEP_MEM_NO_USER_FREE) -/* A mask for all output bits, excluding IN/OUT bits. 
- */ -#define BASE_MEM_FLAGS_OUTPUT_MASK BASE_MEM_NEED_MMAP - -/* A mask for all input bits, including IN/OUT bits. - */ -#define BASE_MEM_FLAGS_INPUT_MASK \ - (((1 << BASE_MEM_FLAGS_NR_BITS) - 1) & ~BASE_MEM_FLAGS_OUTPUT_MASK) - /* A mask of all currently reserved flags */ #define BASE_MEM_FLAGS_RESERVED BASE_MEM_RESERVED_BIT_20 -#define BASEP_MEM_INVALID_HANDLE (0ul) -#define BASE_MEM_MMU_DUMP_HANDLE (1ul << LOCAL_PAGE_SHIFT) -#define BASE_MEM_TRACE_BUFFER_HANDLE (2ul << LOCAL_PAGE_SHIFT) -#define BASE_MEM_MAP_TRACKING_HANDLE (3ul << LOCAL_PAGE_SHIFT) -#define BASEP_MEM_WRITE_ALLOC_PAGES_HANDLE (4ul << LOCAL_PAGE_SHIFT) -/* reserved handles ..-47<<PAGE_SHIFT> for future special handles */ +/* Special base mem handles specific to CSF. + */ #define BASEP_MEM_CSF_USER_REG_PAGE_HANDLE (47ul << LOCAL_PAGE_SHIFT) #define BASEP_MEM_CSF_USER_IO_PAGES_HANDLE (48ul << LOCAL_PAGE_SHIFT) -#define BASE_MEM_COOKIE_BASE (64ul << LOCAL_PAGE_SHIFT) -#define BASE_MEM_FIRST_FREE_ADDRESS \ - ((BITS_PER_LONG << LOCAL_PAGE_SHIFT) + BASE_MEM_COOKIE_BASE) #define KBASE_CSF_NUM_USER_IO_PAGES_HANDLE \ ((BASE_MEM_COOKIE_BASE - BASEP_MEM_CSF_USER_IO_PAGES_HANDLE) >> \ LOCAL_PAGE_SHIFT) -/** - * Valid set of just-in-time memory allocation flags - */ +/* Valid set of just-in-time memory allocation flags */ #define BASE_JIT_ALLOC_VALID_FLAGS ((__u8)0) -/* Flags to pass to ::base_context_init. - * Flags can be ORed together to enable multiple things. - * - * These share the same space as BASEP_CONTEXT_FLAG_*, and so must - * not collide with them. - */ -typedef __u32 base_context_create_flags; - -/* No flags set */ -#define BASE_CONTEXT_CREATE_FLAG_NONE ((base_context_create_flags)0) - -/* Base context is embedded in a cctx object (flag used for CINSTR - * software counter macros) - */ -#define BASE_CONTEXT_CCTX_EMBEDDED ((base_context_create_flags)1 << 0) - -/* Base context is a 'System Monitor' context for Hardware counters. - * - * One important side effect of this is that job submission is disabled. - */ -#define BASE_CONTEXT_SYSTEM_MONITOR_SUBMIT_DISABLED \ - ((base_context_create_flags)1 << 1) +/* flags for base context specific to CSF */ /* Base context creates a CSF event notification thread. * @@ -242,22 +89,6 @@ typedef __u32 base_context_create_flags; */ #define BASE_CONTEXT_CSF_EVENT_THREAD ((base_context_create_flags)1 << 2) -/* Bit-shift used to encode a memory group ID in base_context_create_flags - */ -#define BASEP_CONTEXT_MMU_GROUP_ID_SHIFT (3) - -/* Bitmask used to encode a memory group ID in base_context_create_flags - */ -#define BASEP_CONTEXT_MMU_GROUP_ID_MASK \ - ((base_context_create_flags)0xF << BASEP_CONTEXT_MMU_GROUP_ID_SHIFT) - -/* Bitpattern describing the base_context_create_flags that can be - * passed to the kernel - */ -#define BASEP_CONTEXT_CREATE_KERNEL_FLAGS \ - (BASE_CONTEXT_SYSTEM_MONITOR_SUBMIT_DISABLED | \ - BASEP_CONTEXT_MMU_GROUP_ID_MASK) - /* Bitpattern describing the ::base_context_create_flags that can be * passed to base_context_init() */ @@ -266,15 +97,7 @@ typedef __u32 base_context_create_flags; BASE_CONTEXT_CSF_EVENT_THREAD | \ BASEP_CONTEXT_CREATE_KERNEL_FLAGS) -/* Enable additional tracepoints for latency measurements (TL_ATOM_READY, - * TL_ATOM_DONE, TL_ATOM_PRIO_CHANGE, TL_ATOM_EVENT_POST) - */ -#define BASE_TLSTREAM_ENABLE_LATENCY_TRACEPOINTS (1 << 0) - -/* Indicate that job dumping is enabled. This could affect certain timers - * to account for the performance impact. 
- */ -#define BASE_TLSTREAM_JOB_DUMPING_ENABLED (1 << 1) +/* Flags for base tracepoint specific to CSF */ /* Enable KBase tracepoints for CSF builds */ #define BASE_TLSTREAM_ENABLE_CSF_TRACEPOINTS (1 << 2) @@ -295,9 +118,21 @@ typedef __u32 base_context_create_flags; #define BASE_QUEUE_MAX_PRIORITY (15U) -/* CQS Sync object is an array of __u32 event_mem[2], error field index is 1 */ -#define BASEP_EVENT_VAL_INDEX (0U) -#define BASEP_EVENT_ERR_INDEX (1U) +/* Sync32 object fields definition */ +#define BASEP_EVENT32_VAL_OFFSET (0U) +#define BASEP_EVENT32_ERR_OFFSET (4U) +#define BASEP_EVENT32_SIZE_BYTES (8U) + +/* Sync64 object fields definition */ +#define BASEP_EVENT64_VAL_OFFSET (0U) +#define BASEP_EVENT64_ERR_OFFSET (8U) +#define BASEP_EVENT64_SIZE_BYTES (16U) + +/* Sync32 object alignment, equal to its size */ +#define BASEP_EVENT32_ALIGN_BYTES (8U) + +/* Sync64 object alignment, equal to its size */ +#define BASEP_EVENT64_ALIGN_BYTES (16U) /* The upper limit for number of objects that could be waited/set per command. * This limit is now enforced as internally the error inherit inputs are @@ -306,6 +141,13 @@ typedef __u32 base_context_create_flags; */ #define BASEP_KCPU_CQS_MAX_NUM_OBJS ((size_t)32) +/* CSF CSI EXCEPTION_HANDLER_FLAGS */ +#define BASE_CSF_TILER_OOM_EXCEPTION_FLAG (1u << 0) +#define BASE_CSF_EXCEPTION_HANDLER_FLAGS_MASK (BASE_CSF_TILER_OOM_EXCEPTION_FLAG) + +/* Initial value for LATEST_FLUSH register */ +#define POWER_DOWN_LATEST_FLUSH_VALUE ((uint32_t)1) + /** * enum base_kcpu_command_type - Kernel CPU queue command type. * @BASE_KCPU_COMMAND_TYPE_FENCE_SIGNAL: fence_signal, @@ -335,7 +177,7 @@ enum base_kcpu_command_type { BASE_KCPU_COMMAND_TYPE_JIT_ALLOC, BASE_KCPU_COMMAND_TYPE_JIT_FREE, BASE_KCPU_COMMAND_TYPE_GROUP_SUSPEND, - BASE_KCPU_COMMAND_TYPE_ERROR_BARRIER + BASE_KCPU_COMMAND_TYPE_ERROR_BARRIER, }; /** @@ -725,4 +567,47 @@ struct base_csf_notification { } payload; }; +/** + * struct mali_base_gpu_core_props - GPU core props info + * + * @product_id: Pro specific value. + * @version_status: Status of the GPU release. No defined values, but starts at + * 0 and increases by one for each release status (alpha, beta, EAC, etc.). + * 4 bit values (0-15). + * @minor_revision: Minor release number of the GPU. "P" part of an "RnPn" + * release number. + * 8 bit values (0-255). + * @major_revision: Major release number of the GPU. "R" part of an "RnPn" + * release number. + * 4 bit values (0-15). + * @padding: padding to align to 8-byte + * @gpu_freq_khz_max: The maximum GPU frequency. Reported to applications by + * clGetDeviceInfo() + * @log2_program_counter_size: Size of the shader program counter, in bits. + * @texture_features: TEXTURE_FEATURES_x registers, as exposed by the GPU. This + * is a bitpattern where a set bit indicates that the format is supported. + * Before using a texture format, it is recommended that the corresponding + * bit be checked. + * @paddings: Padding bytes. + * @gpu_available_memory_size: Theoretical maximum memory available to the GPU. + * It is unlikely that a client will be able to allocate all of this memory + * for their own purposes, but this at least provides an upper bound on the + * memory available to the GPU. + * This is required for OpenCL's clGetDeviceInfo() call when + * CL_DEVICE_GLOBAL_MEM_SIZE is requested, for OpenCL GPU devices. The + * client will not be expecting to allocate anywhere near this value. 
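The CQS sync object is now described by explicit offset/size/alignment constants instead of the old __u32 event_mem[2] layout. A hypothetical C view of the 64-bit variant matching those offsets is shown below; allocations must still honour BASEP_EVENT64_ALIGN_BYTES, which a plain struct declaration does not enforce by itself.

```c
/* Hypothetical layout matching the Sync64 offsets above; allocations must be
 * 16-byte aligned per BASEP_EVENT64_ALIGN_BYTES.
 */
#include <linux/types.h>

struct example_basep_sync64 {
	__u64 val; /* BASEP_EVENT64_VAL_OFFSET (0) */
	__u64 err; /* BASEP_EVENT64_ERR_OFFSET (8) */
};                 /* BASEP_EVENT64_SIZE_BYTES (16) */
```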
+ */ +struct mali_base_gpu_core_props { + __u32 product_id; + __u16 version_status; + __u16 minor_revision; + __u16 major_revision; + __u16 padding; + __u32 gpu_freq_khz_max; + __u32 log2_program_counter_size; + __u32 texture_features[BASE_GPU_NUM_TEXTURE_FEATURES_REGISTERS]; + __u8 paddings[4]; + __u64 gpu_available_memory_size; +}; + #endif /* _UAPI_BASE_CSF_KERNEL_H_ */ diff --git a/common/include/uapi/gpu/arm/midgard/csf/mali_kbase_csf_errors_dumpfault.h b/common/include/uapi/gpu/arm/midgard/csf/mali_kbase_csf_errors_dumpfault.h new file mode 100644 index 0000000..f49ab00 --- /dev/null +++ b/common/include/uapi/gpu/arm/midgard/csf/mali_kbase_csf_errors_dumpfault.h @@ -0,0 +1,81 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +/* + * + * (C) COPYRIGHT 2022 ARM Limited. All rights reserved. + * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. + * + */ + +#ifndef _UAPI_KBASE_CSF_ERRORS_DUMPFAULT_H_ +#define _UAPI_KBASE_CSF_ERRORS_DUMPFAULT_H_ + +/** + * enum dumpfault_error_type - Enumeration to define errors to be dumped + * + * @DF_NO_ERROR: No pending error + * @DF_CSG_SUSPEND_TIMEOUT: CSG suspension timeout + * @DF_CSG_TERMINATE_TIMEOUT: CSG group termination timeout + * @DF_CSG_START_TIMEOUT: CSG start timeout + * @DF_CSG_RESUME_TIMEOUT: CSG resume timeout + * @DF_CSG_EP_CFG_TIMEOUT: CSG end point configuration timeout + * @DF_CSG_STATUS_UPDATE_TIMEOUT: CSG status update timeout + * @DF_PROGRESS_TIMER_TIMEOUT: Progress timer timeout + * @DF_FW_INTERNAL_ERROR: Firmware internal error + * @DF_CS_FATAL: CS fatal error + * @DF_CS_FAULT: CS fault error + * @DF_FENCE_WAIT_TIMEOUT: Fence wait timeout + * @DF_PROTECTED_MODE_EXIT_TIMEOUT: P.mode exit timeout + * @DF_PROTECTED_MODE_ENTRY_FAILURE: P.mode entrance failure + * @DF_PING_REQUEST_TIMEOUT: Ping request timeout + * @DF_CORE_DOWNSCALE_REQUEST_TIMEOUT: DCS downscale request timeout + * @DF_TILER_OOM: Tiler Out-of-memory error + * @DF_GPU_PAGE_FAULT: GPU page fault + * @DF_BUS_FAULT: MMU BUS Fault + * @DF_GPU_PROTECTED_FAULT: GPU P.mode fault + * @DF_AS_ACTIVE_STUCK: AS active stuck + * @DF_GPU_SOFT_RESET_FAILURE: GPU soft reset falure + * + * This is used for kbase to notify error type of an event whereby + * user space client will dump relevant debugging information via debugfs. + * @DF_NO_ERROR is used to indicate no pending fault, thus the client will + * be blocked on reading debugfs file till a fault happens. 
+ */ +enum dumpfault_error_type { + DF_NO_ERROR = 0, + DF_CSG_SUSPEND_TIMEOUT, + DF_CSG_TERMINATE_TIMEOUT, + DF_CSG_START_TIMEOUT, + DF_CSG_RESUME_TIMEOUT, + DF_CSG_EP_CFG_TIMEOUT, + DF_CSG_STATUS_UPDATE_TIMEOUT, + DF_PROGRESS_TIMER_TIMEOUT, + DF_FW_INTERNAL_ERROR, + DF_CS_FATAL, + DF_CS_FAULT, + DF_FENCE_WAIT_TIMEOUT, + DF_PROTECTED_MODE_EXIT_TIMEOUT, + DF_PROTECTED_MODE_ENTRY_FAILURE, + DF_PING_REQUEST_TIMEOUT, + DF_CORE_DOWNSCALE_REQUEST_TIMEOUT, + DF_TILER_OOM, + DF_GPU_PAGE_FAULT, + DF_BUS_FAULT, + DF_GPU_PROTECTED_FAULT, + DF_AS_ACTIVE_STUCK, + DF_GPU_SOFT_RESET_FAILURE, +}; + +#endif /* _UAPI_KBASE_CSF_ERRORS_DUMPFAULT_H_ */ diff --git a/common/include/uapi/gpu/arm/midgard/csf/mali_kbase_csf_ioctl.h b/common/include/uapi/gpu/arm/midgard/csf/mali_kbase_csf_ioctl.h index 1794ddc..c9de5fd 100644 --- a/common/include/uapi/gpu/arm/midgard/csf/mali_kbase_csf_ioctl.h +++ b/common/include/uapi/gpu/arm/midgard/csf/mali_kbase_csf_ioctl.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2020-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2020-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -56,10 +56,44 @@ * - Added new Base memory allocation interface * 1.10: * - First release of new HW performance counters interface. + * 1.11: + * - Dummy model (no mali) backend will now clear HWC values after each sample + * 1.12: + * - Added support for incremental rendering flag in CSG create call + * 1.13: + * - Added ioctl to query a register of USER page. + * 1.14: + * - Added support for passing down the buffer descriptor VA in tiler heap init + * 1.15: + * - Enable new sync_wait GE condition + * 1.16: + * - Remove legacy definitions: + * - base_jit_alloc_info_10_2 + * - base_jit_alloc_info_11_5 + * - kbase_ioctl_mem_jit_init_10_2 + * - kbase_ioctl_mem_jit_init_11_5 + * 1.17: + * - Fix kinstr_prfcnt issues: + * - Missing implicit sample for CMD_STOP when HWCNT buffer is full. + * - Race condition when stopping periodic sampling. + * - prfcnt_block_metadata::block_idx gaps. + * - PRFCNT_CONTROL_CMD_SAMPLE_ASYNC is removed. + * 1.18: + * - Relax the requirement to create a mapping with BASE_MEM_MAP_TRACKING_HANDLE + * before allocating GPU memory for the context. + * - CPU mappings of USER_BUFFER imported memory handles must be cached. + * 1.19: + * - Add NE support in queue_group_create IOCTL fields + * - Previous version retained as KBASE_IOCTL_CS_QUEUE_GROUP_CREATE_1_18 for + * backward compatibility. + * 1.20: + * - Restrict child process from doing supported file operations (like mmap, ioctl, + * read, poll) on the file descriptor of mali device file that was inherited + * from the parent process. */ #define BASE_UK_VERSION_MAJOR 1 -#define BASE_UK_VERSION_MINOR 10 +#define BASE_UK_VERSION_MINOR 20 /** * struct kbase_ioctl_version_check - Check version compatibility between @@ -232,6 +266,56 @@ union kbase_ioctl_cs_queue_group_create_1_6 { _IOWR(KBASE_IOCTL_TYPE, 42, union kbase_ioctl_cs_queue_group_create_1_6) /** + * union kbase_ioctl_cs_queue_group_create_1_18 - Create a GPU command queue group + * @in: Input parameters + * @in.tiler_mask: Mask of tiler endpoints the group is allowed to use. + * @in.fragment_mask: Mask of fragment endpoints the group is allowed to use. + * @in.compute_mask: Mask of compute endpoints the group is allowed to use. 
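dumpfault_error_type is what a user-space client sees after blocking on the dump-on-fault debugfs file (DF_NO_ERROR means keep waiting). A short helper mapping a few of the values to strings is sketched below; the selection and wording are illustrative only, and the debugfs path itself is not part of this hunk.

```c
/* Sketch: turn a handful of dumpfault_error_type values into readable strings.
 * Assumes an installed copy of the uapi header is on the include path.
 */
#include "mali_kbase_csf_errors_dumpfault.h"

static const char *example_df_error_name(enum dumpfault_error_type err)
{
	switch (err) {
	case DF_NO_ERROR:               return "no pending error";
	case DF_PROGRESS_TIMER_TIMEOUT: return "progress timer timeout";
	case DF_FW_INTERNAL_ERROR:      return "firmware internal error";
	case DF_TILER_OOM:              return "tiler out of memory";
	case DF_GPU_PAGE_FAULT:         return "GPU page fault";
	case DF_BUS_FAULT:              return "MMU bus fault";
	default:                        return "other fault (see header)";
	}
}
```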
+ * @in.cs_min: Minimum number of CSs required. + * @in.priority: Queue group's priority within a process. + * @in.tiler_max: Maximum number of tiler endpoints the group is allowed + * to use. + * @in.fragment_max: Maximum number of fragment endpoints the group is + * allowed to use. + * @in.compute_max: Maximum number of compute endpoints the group is allowed + * to use. + * @in.csi_handlers: Flags to signal that the application intends to use CSI + * exception handlers in some linear buffers to deal with + * the given exception types. + * @in.padding: Currently unused, must be zero + * @out: Output parameters + * @out.group_handle: Handle of a newly created queue group. + * @out.padding: Currently unused, must be zero + * @out.group_uid: UID of the queue group available to base. + */ +union kbase_ioctl_cs_queue_group_create_1_18 { + struct { + __u64 tiler_mask; + __u64 fragment_mask; + __u64 compute_mask; + __u8 cs_min; + __u8 priority; + __u8 tiler_max; + __u8 fragment_max; + __u8 compute_max; + __u8 csi_handlers; + __u8 padding[2]; + /** + * @in.dvs_buf: buffer for deferred vertex shader + */ + __u64 dvs_buf; + } in; + struct { + __u8 group_handle; + __u8 padding[3]; + __u32 group_uid; + } out; +}; + +#define KBASE_IOCTL_CS_QUEUE_GROUP_CREATE_1_18 \ + _IOWR(KBASE_IOCTL_TYPE, 58, union kbase_ioctl_cs_queue_group_create_1_18) + +/** * union kbase_ioctl_cs_queue_group_create - Create a GPU command queue group * @in: Input parameters * @in.tiler_mask: Mask of tiler endpoints the group is allowed to use. @@ -245,6 +329,9 @@ union kbase_ioctl_cs_queue_group_create_1_6 { * allowed to use. * @in.compute_max: Maximum number of compute endpoints the group is allowed * to use. + * @in.csi_handlers: Flags to signal that the application intends to use CSI + * exception handlers in some linear buffers to deal with + * the given exception types. * @in.padding: Currently unused, must be zero * @out: Output parameters * @out.group_handle: Handle of a newly created queue group. @@ -261,11 +348,16 @@ union kbase_ioctl_cs_queue_group_create { __u8 tiler_max; __u8 fragment_max; __u8 compute_max; - __u8 padding[3]; + __u8 csi_handlers; + /** + * @in.reserved: Reserved, currently unused, must be zero. + */ + __u16 reserved; /** - * @reserved: Reserved + * @in.dvs_buf: buffer for deferred vertex shader */ - __u64 reserved; + __u64 dvs_buf; + __u64 padding[9]; } in; struct { __u8 group_handle; @@ -353,6 +445,7 @@ struct kbase_ioctl_kcpu_queue_enqueue { * allowed. * @in.group_id: Group ID to be used for physical allocations. * @in.padding: Padding + * @in.buf_desc_va: Buffer descriptor GPU VA for tiler heap reclaims. * @out: Output parameters * @out.gpu_heap_va: GPU VA (virtual address) of Heap context that was set up * for the heap. @@ -368,6 +461,7 @@ union kbase_ioctl_cs_tiler_heap_init { __u16 target_in_flight; __u8 group_id; __u8 padding; + __u64 buf_desc_va; } in; struct { __u64 gpu_heap_va; @@ -379,6 +473,43 @@ union kbase_ioctl_cs_tiler_heap_init { _IOWR(KBASE_IOCTL_TYPE, 48, union kbase_ioctl_cs_tiler_heap_init) /** + * union kbase_ioctl_cs_tiler_heap_init_1_13 - Initialize chunked tiler memory heap, + * earlier version upto 1.13 + * @in: Input parameters + * @in.chunk_size: Size of each chunk. + * @in.initial_chunks: Initial number of chunks that heap will be created with. + * @in.max_chunks: Maximum number of chunks that the heap is allowed to use. + * @in.target_in_flight: Number of render-passes that the driver should attempt to + * keep in flight for which allocation of new chunks is + * allowed. 
+ * @in.group_id: Group ID to be used for physical allocations. + * @in.padding: Padding + * @out: Output parameters + * @out.gpu_heap_va: GPU VA (virtual address) of Heap context that was set up + * for the heap. + * @out.first_chunk_va: GPU VA of the first chunk allocated for the heap, + * actually points to the header of heap chunk and not to + * the low address of free memory in the chunk. + */ +union kbase_ioctl_cs_tiler_heap_init_1_13 { + struct { + __u32 chunk_size; + __u32 initial_chunks; + __u32 max_chunks; + __u16 target_in_flight; + __u8 group_id; + __u8 padding; + } in; + struct { + __u64 gpu_heap_va; + __u64 first_chunk_va; + } out; +}; + +#define KBASE_IOCTL_CS_TILER_HEAP_INIT_1_13 \ + _IOWR(KBASE_IOCTL_TYPE, 48, union kbase_ioctl_cs_tiler_heap_init_1_13) + +/** * struct kbase_ioctl_cs_tiler_heap_term - Terminate a chunked tiler heap * instance * @@ -479,6 +610,29 @@ union kbase_ioctl_mem_alloc_ex { #define KBASE_IOCTL_MEM_ALLOC_EX _IOWR(KBASE_IOCTL_TYPE, 59, union kbase_ioctl_mem_alloc_ex) +/** + * union kbase_ioctl_read_user_page - Read a register of USER page + * + * @in: Input parameters. + * @in.offset: Register offset in USER page. + * @in.padding: Padding to round up to a multiple of 8 bytes, must be zero. + * @out: Output parameters. + * @out.val_lo: Value of 32bit register or the 1st half of 64bit register to be read. + * @out.val_hi: Value of the 2nd half of 64bit register to be read. + */ +union kbase_ioctl_read_user_page { + struct { + __u32 offset; + __u32 padding; + } in; + struct { + __u32 val_lo; + __u32 val_hi; + } out; +}; + +#define KBASE_IOCTL_READ_USER_PAGE _IOWR(KBASE_IOCTL_TYPE, 60, union kbase_ioctl_read_user_page) + /*************** * test ioctls * ***************/ diff --git a/mali_kbase/mali_kbase_strings.h b/common/include/uapi/gpu/arm/midgard/gpu/backend/mali_kbase_gpu_regmap_csf.h index c3f94f9..eaa4b2d 100644 --- a/mali_kbase/mali_kbase_strings.h +++ b/common/include/uapi/gpu/arm/midgard/gpu/backend/mali_kbase_gpu_regmap_csf.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2010-2016, 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2021-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -19,5 +19,18 @@ * */ -extern const char kbase_drv_name[]; -extern const char kbase_timeline_name[]; +#ifndef _UAPI_KBASE_GPU_REGMAP_CSF_H_ +#define _UAPI_KBASE_GPU_REGMAP_CSF_H_ + +/* USER base address */ +#define USER_BASE 0x0010000 +#define USER_REG(r) (USER_BASE + (r)) + +/* USER register offsets */ +#define LATEST_FLUSH 0x0000 /* () Flush ID of latest clean-and-invalidate operation */ + +/* DOORBELLS base address */ +#define DOORBELLS_BASE 0x0080000 +#define DOORBELLS_REG(r) (DOORBELLS_BASE + (r)) + +#endif /* _UAPI_KBASE_GPU_REGMAP_CSF_H_ */ diff --git a/common/include/uapi/gpu/arm/midgard/gpu/backend/mali_kbase_gpu_regmap_jm.h b/common/include/uapi/gpu/arm/midgard/gpu/backend/mali_kbase_gpu_regmap_jm.h index f466389..d24afcc 100644 --- a/common/include/uapi/gpu/arm/midgard/gpu/backend/mali_kbase_gpu_regmap_jm.h +++ b/common/include/uapi/gpu/arm/midgard/gpu/backend/mali_kbase_gpu_regmap_jm.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2023 ARM Limited. All rights reserved. 
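Interface version 1.13 adds KBASE_IOCTL_READ_USER_PAGE, and the CSF regmap above places LATEST_FLUSH at offset 0 of the USER page. A hedged user-space sketch that reads it follows; /dev/mali0, the include path, and the omission of the usual version-check/set-flags handshake are assumptions, not taken from this diff.

```c
/* Userspace sketch: read LATEST_FLUSH through KBASE_IOCTL_READ_USER_PAGE.
 * /dev/mali0 and the header location are assumptions; the normal kbase
 * version-check/set-flags handshake is omitted for brevity.
 */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

#include "mali_kbase_csf_ioctl.h" /* KBASE_IOCTL_READ_USER_PAGE */

int main(void)
{
	union kbase_ioctl_read_user_page args = {
		.in = { .offset = 0 /* LATEST_FLUSH */, .padding = 0 },
	};
	int fd = open("/dev/mali0", O_RDWR);

	if (fd < 0)
		return 1;
	if (ioctl(fd, KBASE_IOCTL_READ_USER_PAGE, &args) == 0)
		printf("LATEST_FLUSH = 0x%x\n", args.out.val_lo);
	close(fd);
	return 0;
}
```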
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -22,9 +22,4 @@ #ifndef _UAPI_KBASE_GPU_REGMAP_JM_H_ #define _UAPI_KBASE_GPU_REGMAP_JM_H_ -/* GPU control registers */ -#define LATEST_FLUSH 0x038 /* (RO) Flush ID of latest - * clean-and-invalidate operation - */ - #endif /* _UAPI_KBASE_GPU_REGMAP_JM_H_ */ diff --git a/common/include/uapi/gpu/arm/midgard/gpu/mali_kbase_gpu_id.h b/common/include/uapi/gpu/arm/midgard/gpu/mali_kbase_gpu_id.h index 1a99e56..784e09a 100644 --- a/common/include/uapi/gpu/arm/midgard/gpu/mali_kbase_gpu_id.h +++ b/common/include/uapi/gpu/arm/midgard/gpu/mali_kbase_gpu_id.h @@ -119,13 +119,14 @@ #define GPU_ID2_PRODUCT_TBEX GPU_ID2_MODEL_MAKE(9, 2) #define GPU_ID2_PRODUCT_LBEX GPU_ID2_MODEL_MAKE(9, 4) #define GPU_ID2_PRODUCT_TBAX GPU_ID2_MODEL_MAKE(9, 5) -#define GPU_ID2_PRODUCT_TDUX GPU_ID2_MODEL_MAKE(10, 1) #define GPU_ID2_PRODUCT_TODX GPU_ID2_MODEL_MAKE(10, 2) #define GPU_ID2_PRODUCT_TGRX GPU_ID2_MODEL_MAKE(10, 3) #define GPU_ID2_PRODUCT_TVAX GPU_ID2_MODEL_MAKE(10, 4) #define GPU_ID2_PRODUCT_LODX GPU_ID2_MODEL_MAKE(10, 7) #define GPU_ID2_PRODUCT_TTUX GPU_ID2_MODEL_MAKE(11, 2) #define GPU_ID2_PRODUCT_LTUX GPU_ID2_MODEL_MAKE(11, 3) +#define GPU_ID2_PRODUCT_TTIX GPU_ID2_MODEL_MAKE(12, 0) +#define GPU_ID2_PRODUCT_LTIX GPU_ID2_MODEL_MAKE(12, 1) /** * GPU_ID_MAKE - Helper macro to generate GPU_ID using id, major, minor, status diff --git a/common/include/uapi/gpu/arm/midgard/gpu/mali_kbase_gpu_regmap.h b/common/include/uapi/gpu/arm/midgard/gpu/mali_kbase_gpu_regmap.h index deca665..8256191 100644 --- a/common/include/uapi/gpu/arm/midgard/gpu/mali_kbase_gpu_regmap.h +++ b/common/include/uapi/gpu/arm/midgard/gpu/mali_kbase_gpu_regmap.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -22,13 +22,10 @@ #ifndef _UAPI_KBASE_GPU_REGMAP_H_ #define _UAPI_KBASE_GPU_REGMAP_H_ -#if !MALI_USE_CSF +#if MALI_USE_CSF +#include "backend/mali_kbase_gpu_regmap_csf.h" +#else #include "backend/mali_kbase_gpu_regmap_jm.h" #endif /* !MALI_USE_CSF */ -/* MMU control registers */ -#define MEMORY_MANAGEMENT_BASE 0x2000 -#define MMU_REG(r) (MEMORY_MANAGEMENT_BASE + (r)) -#define MMU_IRQ_RAWSTAT 0x000 /* (RW) Raw interrupt status register */ - #endif /* _UAPI_KBASE_GPU_REGMAP_H_ */ diff --git a/common/include/uapi/gpu/arm/midgard/jm/mali_base_jm_kernel.h b/common/include/uapi/gpu/arm/midgard/jm/mali_base_jm_kernel.h index 94f4dc7..1a3098d 100644 --- a/common/include/uapi/gpu/arm/midgard/jm/mali_base_jm_kernel.h +++ b/common/include/uapi/gpu/arm/midgard/jm/mali_base_jm_kernel.h @@ -23,100 +23,16 @@ #define _UAPI_BASE_JM_KERNEL_H_ #include <linux/types.h> +#include "../mali_base_common_kernel.h" -/* Memory allocation, access/hint flags. +/* Memory allocation, access/hint flags & mask specific to JM GPU. * * See base_mem_alloc_flags. 
*/ -/* IN */ -/* Read access CPU side - */ -#define BASE_MEM_PROT_CPU_RD ((base_mem_alloc_flags)1 << 0) - -/* Write access CPU side - */ -#define BASE_MEM_PROT_CPU_WR ((base_mem_alloc_flags)1 << 1) - -/* Read access GPU side - */ -#define BASE_MEM_PROT_GPU_RD ((base_mem_alloc_flags)1 << 2) - -/* Write access GPU side - */ -#define BASE_MEM_PROT_GPU_WR ((base_mem_alloc_flags)1 << 3) - -/* Execute allowed on the GPU side - */ -#define BASE_MEM_PROT_GPU_EX ((base_mem_alloc_flags)1 << 4) - -/* Will be permanently mapped in kernel space. - * Flag is only allowed on allocations originating from kbase. - */ -#define BASEP_MEM_PERMANENT_KERNEL_MAPPING ((base_mem_alloc_flags)1 << 5) - -/* The allocation will completely reside within the same 4GB chunk in the GPU - * virtual space. - * Since this flag is primarily required only for the TLS memory which will - * not be used to contain executable code and also not used for Tiler heap, - * it can't be used along with BASE_MEM_PROT_GPU_EX and TILER_ALIGN_TOP flags. - */ -#define BASE_MEM_GPU_VA_SAME_4GB_PAGE ((base_mem_alloc_flags)1 << 6) - -/* Userspace is not allowed to free this memory. - * Flag is only allowed on allocations originating from kbase. - */ -#define BASEP_MEM_NO_USER_FREE ((base_mem_alloc_flags)1 << 7) - -/* Used as BASE_MEM_FIXED in other backends - */ +/* Used as BASE_MEM_FIXED in other backends */ #define BASE_MEM_RESERVED_BIT_8 ((base_mem_alloc_flags)1 << 8) -/* Grow backing store on GPU Page Fault - */ -#define BASE_MEM_GROW_ON_GPF ((base_mem_alloc_flags)1 << 9) - -/* Page coherence Outer shareable, if available - */ -#define BASE_MEM_COHERENT_SYSTEM ((base_mem_alloc_flags)1 << 10) - -/* Page coherence Inner shareable - */ -#define BASE_MEM_COHERENT_LOCAL ((base_mem_alloc_flags)1 << 11) - -/* IN/OUT */ -/* Should be cached on the CPU, returned if actually cached - */ -#define BASE_MEM_CACHED_CPU ((base_mem_alloc_flags)1 << 12) - -/* IN/OUT */ -/* Must have same VA on both the GPU and the CPU - */ -#define BASE_MEM_SAME_VA ((base_mem_alloc_flags)1 << 13) - -/* OUT */ -/* Must call mmap to acquire a GPU address for the allocation - */ -#define BASE_MEM_NEED_MMAP ((base_mem_alloc_flags)1 << 14) - -/* IN */ -/* Page coherence Outer shareable, required. - */ -#define BASE_MEM_COHERENT_SYSTEM_REQUIRED ((base_mem_alloc_flags)1 << 15) - -/* Protected memory - */ -#define BASE_MEM_PROTECTED ((base_mem_alloc_flags)1 << 16) - -/* Not needed physical memory - */ -#define BASE_MEM_DONT_NEED ((base_mem_alloc_flags)1 << 17) - -/* Must use shared CPU/GPU zone (SAME_VA zone) but doesn't require the - * addresses to be the same - */ -#define BASE_MEM_IMPORT_SHARED ((base_mem_alloc_flags)1 << 18) - /** * BASE_MEM_RESERVED_BIT_19 - Bit 19 is reserved. * @@ -131,47 +47,15 @@ */ #define BASE_MEM_TILER_ALIGN_TOP ((base_mem_alloc_flags)1 << 20) -/* Should be uncached on the GPU, will work only for GPUs using AARCH64 mmu - * mode. Some components within the GPU might only be able to access memory - * that is GPU cacheable. Refer to the specific GPU implementation for more - * details. The 3 shareability flags will be ignored for GPU uncached memory. - * If used while importing USER_BUFFER type memory, then the import will fail - * if the memory is not aligned to GPU and CPU cache line width. - */ -#define BASE_MEM_UNCACHED_GPU ((base_mem_alloc_flags)1 << 21) - -/* - * Bits [22:25] for group_id (0~15). - * - * base_mem_group_id_set() should be used to pack a memory group ID into a - * base_mem_alloc_flags value instead of accessing the bits directly. 
- * base_mem_group_id_get() should be used to extract the memory group ID from - * a base_mem_alloc_flags value. - */ -#define BASEP_MEM_GROUP_ID_SHIFT 22 -#define BASE_MEM_GROUP_ID_MASK \ - ((base_mem_alloc_flags)0xF << BASEP_MEM_GROUP_ID_SHIFT) - -/* Must do CPU cache maintenance when imported memory is mapped/unmapped - * on GPU. Currently applicable to dma-buf type only. - */ -#define BASE_MEM_IMPORT_SYNC_ON_MAP_UNMAP ((base_mem_alloc_flags)1 << 26) - /* Use the GPU VA chosen by the kernel client */ #define BASE_MEM_FLAG_MAP_FIXED ((base_mem_alloc_flags)1 << 27) -/* OUT */ -/* Kernel side cache sync ops required */ -#define BASE_MEM_KERNEL_SYNC ((base_mem_alloc_flags)1 << 28) - /* Force trimming of JIT allocations when creating a new allocation */ #define BASEP_MEM_PERFORM_JIT_TRIM ((base_mem_alloc_flags)1 << 29) -/* Number of bits used as flags for base memory management - * - * Must be kept in sync with the base_mem_alloc_flags flags +/* Note that the number of bits used for base_mem_alloc_flags + * must be less than BASE_MEM_FLAGS_NR_BITS !!! */ -#define BASE_MEM_FLAGS_NR_BITS 30 /* A mask of all the flags which are only valid for allocations within kbase, * and may not be passed from user space. @@ -180,29 +64,11 @@ (BASEP_MEM_PERMANENT_KERNEL_MAPPING | BASEP_MEM_NO_USER_FREE | \ BASE_MEM_FLAG_MAP_FIXED | BASEP_MEM_PERFORM_JIT_TRIM) -/* A mask for all output bits, excluding IN/OUT bits. - */ -#define BASE_MEM_FLAGS_OUTPUT_MASK BASE_MEM_NEED_MMAP - -/* A mask for all input bits, including IN/OUT bits. - */ -#define BASE_MEM_FLAGS_INPUT_MASK \ - (((1 << BASE_MEM_FLAGS_NR_BITS) - 1) & ~BASE_MEM_FLAGS_OUTPUT_MASK) - /* A mask of all currently reserved flags */ #define BASE_MEM_FLAGS_RESERVED \ (BASE_MEM_RESERVED_BIT_8 | BASE_MEM_RESERVED_BIT_19) -#define BASEP_MEM_INVALID_HANDLE (0ul) -#define BASE_MEM_MMU_DUMP_HANDLE (1ul << LOCAL_PAGE_SHIFT) -#define BASE_MEM_TRACE_BUFFER_HANDLE (2ul << LOCAL_PAGE_SHIFT) -#define BASE_MEM_MAP_TRACKING_HANDLE (3ul << LOCAL_PAGE_SHIFT) -#define BASEP_MEM_WRITE_ALLOC_PAGES_HANDLE (4ul << LOCAL_PAGE_SHIFT) -/* reserved handles ..-47<<PAGE_SHIFT> for future special handles */ -#define BASE_MEM_COOKIE_BASE (64ul << LOCAL_PAGE_SHIFT) -#define BASE_MEM_FIRST_FREE_ADDRESS \ - ((BITS_PER_LONG << LOCAL_PAGE_SHIFT) + BASE_MEM_COOKIE_BASE) /* Similar to BASE_MEM_TILER_ALIGN_TOP, memory starting from the end of the * initial commit is aligned to 'extension' pages, where 'extension' must be a power @@ -227,47 +93,6 @@ #define BASE_JIT_ALLOC_VALID_FLAGS \ (BASE_JIT_ALLOC_MEM_TILER_ALIGN_TOP | BASE_JIT_ALLOC_HEAP_INFO_IS_SIZE) -/** - * typedef base_context_create_flags - Flags to pass to ::base_context_init. - * - * Flags can be ORed together to enable multiple things. - * - * These share the same space as BASEP_CONTEXT_FLAG_*, and so must - * not collide with them. - */ -typedef __u32 base_context_create_flags; - -/* No flags set */ -#define BASE_CONTEXT_CREATE_FLAG_NONE ((base_context_create_flags)0) - -/* Base context is embedded in a cctx object (flag used for CINSTR - * software counter macros) - */ -#define BASE_CONTEXT_CCTX_EMBEDDED ((base_context_create_flags)1 << 0) - -/* Base context is a 'System Monitor' context for Hardware counters. - * - * One important side effect of this is that job submission is disabled. 
- */ -#define BASE_CONTEXT_SYSTEM_MONITOR_SUBMIT_DISABLED \ - ((base_context_create_flags)1 << 1) - -/* Bit-shift used to encode a memory group ID in base_context_create_flags - */ -#define BASEP_CONTEXT_MMU_GROUP_ID_SHIFT (3) - -/* Bitmask used to encode a memory group ID in base_context_create_flags - */ -#define BASEP_CONTEXT_MMU_GROUP_ID_MASK \ - ((base_context_create_flags)0xF << BASEP_CONTEXT_MMU_GROUP_ID_SHIFT) - -/* Bitpattern describing the base_context_create_flags that can be - * passed to the kernel - */ -#define BASEP_CONTEXT_CREATE_KERNEL_FLAGS \ - (BASE_CONTEXT_SYSTEM_MONITOR_SUBMIT_DISABLED | \ - BASEP_CONTEXT_MMU_GROUP_ID_MASK) - /* Bitpattern describing the ::base_context_create_flags that can be * passed to base_context_init() */ @@ -287,16 +112,7 @@ typedef __u32 base_context_create_flags; #define BASEP_CONTEXT_FLAG_JOB_DUMP_DISABLED \ ((base_context_create_flags)(1 << 31)) -/* Enable additional tracepoints for latency measurements (TL_ATOM_READY, - * TL_ATOM_DONE, TL_ATOM_PRIO_CHANGE, TL_ATOM_EVENT_POST) - */ -#define BASE_TLSTREAM_ENABLE_LATENCY_TRACEPOINTS (1 << 0) - -/* Indicate that job dumping is enabled. This could affect certain timers - * to account for the performance impact. - */ -#define BASE_TLSTREAM_JOB_DUMPING_ENABLED (1 << 1) - +/* Flags for base tracepoint specific to JM */ #define BASE_TLSTREAM_FLAGS_MASK (BASE_TLSTREAM_ENABLE_LATENCY_TRACEPOINTS | \ BASE_TLSTREAM_JOB_DUMPING_ENABLED) /* @@ -509,9 +325,6 @@ typedef __u32 base_jd_core_req; * takes priority * * This is only guaranteed to work for BASE_JD_REQ_ONLY_COMPUTE atoms. - * - * If the core availability policy is keeping the required core group turned - * off, then the job will fail with a BASE_JD_EVENT_PM_EVENT error code. */ #define BASE_JD_REQ_SPECIFIC_COHERENT_GROUP ((base_jd_core_req)1 << 11) @@ -770,6 +583,9 @@ typedef __u8 base_jd_prio; */ #define BASE_JD_PRIO_REALTIME ((base_jd_prio)3) +/* Invalid atom priority (max uint8_t value) */ +#define BASE_JD_PRIO_INVALID ((base_jd_prio)255) + /* Count of the number of priority levels. This itself is not a valid * base_jd_prio setting */ @@ -1016,11 +832,6 @@ enum { * BASE_JD_EVENT_JOB_CONFIG_FAULT, or if the * platform doesn't support the feature specified in * the atom. 
- * @BASE_JD_EVENT_PM_EVENT: TODO: remove as it's not used - * @BASE_JD_EVENT_TIMED_OUT: TODO: remove as it's not used - * @BASE_JD_EVENT_BAG_INVALID: TODO: remove as it's not used - * @BASE_JD_EVENT_PROGRESS_REPORT: TODO: remove as it's not used - * @BASE_JD_EVENT_BAG_DONE: TODO: remove as it's not used * @BASE_JD_EVENT_DRV_TERMINATED: this is a special event generated to indicate * to userspace that the KBase context has been * destroyed and Base should stop listening for @@ -1115,17 +926,10 @@ enum base_jd_event_code { /* SW defined exceptions */ BASE_JD_EVENT_MEM_GROWTH_FAILED = BASE_JD_SW_EVENT | BASE_JD_SW_EVENT_JOB | 0x000, - BASE_JD_EVENT_TIMED_OUT = - BASE_JD_SW_EVENT | BASE_JD_SW_EVENT_JOB | 0x001, BASE_JD_EVENT_JOB_CANCELLED = BASE_JD_SW_EVENT | BASE_JD_SW_EVENT_JOB | 0x002, BASE_JD_EVENT_JOB_INVALID = BASE_JD_SW_EVENT | BASE_JD_SW_EVENT_JOB | 0x003, - BASE_JD_EVENT_PM_EVENT = - BASE_JD_SW_EVENT | BASE_JD_SW_EVENT_JOB | 0x004, - - BASE_JD_EVENT_BAG_INVALID = - BASE_JD_SW_EVENT | BASE_JD_SW_EVENT_BAG | 0x003, BASE_JD_EVENT_RANGE_HW_FAULT_OR_SW_ERROR_END = BASE_JD_SW_EVENT | BASE_JD_SW_EVENT_RESERVED | 0x3FF, @@ -1133,10 +937,6 @@ enum base_jd_event_code { BASE_JD_EVENT_RANGE_SW_SUCCESS_START = BASE_JD_SW_EVENT | BASE_JD_SW_EVENT_SUCCESS | 0x000, - BASE_JD_EVENT_PROGRESS_REPORT = BASE_JD_SW_EVENT | - BASE_JD_SW_EVENT_SUCCESS | BASE_JD_SW_EVENT_JOB | 0x000, - BASE_JD_EVENT_BAG_DONE = BASE_JD_SW_EVENT | BASE_JD_SW_EVENT_SUCCESS | - BASE_JD_SW_EVENT_BAG | 0x000, BASE_JD_EVENT_DRV_TERMINATED = BASE_JD_SW_EVENT | BASE_JD_SW_EVENT_SUCCESS | BASE_JD_SW_EVENT_INFO | 0x000, @@ -1203,4 +1003,53 @@ struct base_dump_cpu_gpu_counters { __u8 padding[36]; }; +/** + * struct mali_base_gpu_core_props - GPU core props info + * + * @product_id: Pro specific value. + * @version_status: Status of the GPU release. No defined values, but starts at + * 0 and increases by one for each release status (alpha, beta, EAC, etc.). + * 4 bit values (0-15). + * @minor_revision: Minor release number of the GPU. "P" part of an "RnPn" + * release number. + * 8 bit values (0-255). + * @major_revision: Major release number of the GPU. "R" part of an "RnPn" + * release number. + * 4 bit values (0-15). + * @padding: padding to align to 8-byte + * @gpu_freq_khz_max: The maximum GPU frequency. Reported to applications by + * clGetDeviceInfo() + * @log2_program_counter_size: Size of the shader program counter, in bits. + * @texture_features: TEXTURE_FEATURES_x registers, as exposed by the GPU. This + * is a bitpattern where a set bit indicates that the format is supported. + * Before using a texture format, it is recommended that the corresponding + * bit be checked. + * @paddings_1: Padding bytes. + * @gpu_available_memory_size: Theoretical maximum memory available to the GPU. + * It is unlikely that a client will be able to allocate all of this memory + * for their own purposes, but this at least provides an upper bound on the + * memory available to the GPU. + * This is required for OpenCL's clGetDeviceInfo() call when + * CL_DEVICE_GLOBAL_MEM_SIZE is requested, for OpenCL GPU devices. The + * client will not be expecting to allocate anywhere near this value. + * @num_exec_engines: The number of execution engines. Only valid for tGOX + * (Bifrost) GPUs, where GPU_HAS_REG_CORE_FEATURES is defined. Otherwise, + * this is always 0. + * @paddings_2: Padding bytes. 
+ */ +struct mali_base_gpu_core_props { + __u32 product_id; + __u16 version_status; + __u16 minor_revision; + __u16 major_revision; + __u16 padding; + __u32 gpu_freq_khz_max; + __u32 log2_program_counter_size; + __u32 texture_features[BASE_GPU_NUM_TEXTURE_FEATURES_REGISTERS]; + __u8 paddings_1[4]; + __u64 gpu_available_memory_size; + __u8 num_exec_engines; + __u8 paddings_2[7]; +}; + #endif /* _UAPI_BASE_JM_KERNEL_H_ */ diff --git a/common/include/uapi/gpu/arm/midgard/jm/mali_kbase_jm_ioctl.h b/common/include/uapi/gpu/arm/midgard/jm/mali_kbase_jm_ioctl.h index 215f12d..f2329f9 100644 --- a/common/include/uapi/gpu/arm/midgard/jm/mali_kbase_jm_ioctl.h +++ b/common/include/uapi/gpu/arm/midgard/jm/mali_kbase_jm_ioctl.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2020-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2020-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -125,9 +125,32 @@ * - Removed Kernel legacy HWC interface * 11.34: * - First release of new HW performance counters interface. + * 11.35: + * - Dummy model (no mali) backend will now clear HWC values after each sample + * 11.36: + * - Remove legacy definitions: + * - base_jit_alloc_info_10_2 + * - base_jit_alloc_info_11_5 + * - kbase_ioctl_mem_jit_init_10_2 + * - kbase_ioctl_mem_jit_init_11_5 + * 11.37: + * - Fix kinstr_prfcnt issues: + * - Missing implicit sample for CMD_STOP when HWCNT buffer is full. + * - Race condition when stopping periodic sampling. + * - prfcnt_block_metadata::block_idx gaps. + * - PRFCNT_CONTROL_CMD_SAMPLE_ASYNC is removed. + * 11.38: + * - Relax the requirement to create a mapping with BASE_MEM_MAP_TRACKING_HANDLE + * before allocating GPU memory for the context. + * - CPU mappings of USER_BUFFER imported memory handles must be cached. + * 11.39: + * - Restrict child process from doing supported file operations (like mmap, ioctl, + * read, poll) on the file descriptor of mali device file that was inherited + * from the parent process. */ + #define BASE_UK_VERSION_MAJOR 11 -#define BASE_UK_VERSION_MINOR 34 +#define BASE_UK_VERSION_MINOR 39 /** * struct kbase_ioctl_version_check - Check version compatibility between diff --git a/common/include/uapi/gpu/arm/midgard/mali_base_common_kernel.h b/common/include/uapi/gpu/arm/midgard/mali_base_common_kernel.h new file mode 100644 index 0000000..f837814 --- /dev/null +++ b/common/include/uapi/gpu/arm/midgard/mali_base_common_kernel.h @@ -0,0 +1,231 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +/* + * + * (C) COPYRIGHT 2022 ARM Limited. All rights reserved. + * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. 
+ * + */ + +#ifndef _UAPI_BASE_COMMON_KERNEL_H_ +#define _UAPI_BASE_COMMON_KERNEL_H_ + +#include <linux/types.h> + +struct base_mem_handle { + struct { + __u64 handle; + } basep; +}; + +#define BASE_GPU_NUM_TEXTURE_FEATURES_REGISTERS 4 + +/* Memory allocation, access/hint flags & mask. + * + * See base_mem_alloc_flags. + */ + +/* IN */ +/* Read access CPU side + */ +#define BASE_MEM_PROT_CPU_RD ((base_mem_alloc_flags)1 << 0) + +/* Write access CPU side + */ +#define BASE_MEM_PROT_CPU_WR ((base_mem_alloc_flags)1 << 1) + +/* Read access GPU side + */ +#define BASE_MEM_PROT_GPU_RD ((base_mem_alloc_flags)1 << 2) + +/* Write access GPU side + */ +#define BASE_MEM_PROT_GPU_WR ((base_mem_alloc_flags)1 << 3) + +/* Execute allowed on the GPU side + */ +#define BASE_MEM_PROT_GPU_EX ((base_mem_alloc_flags)1 << 4) + +/* Will be permanently mapped in kernel space. + * Flag is only allowed on allocations originating from kbase. + */ +#define BASEP_MEM_PERMANENT_KERNEL_MAPPING ((base_mem_alloc_flags)1 << 5) + +/* The allocation will completely reside within the same 4GB chunk in the GPU + * virtual space. + * Since this flag is primarily required only for the TLS memory which will + * not be used to contain executable code and also not used for Tiler heap, + * it can't be used along with BASE_MEM_PROT_GPU_EX and TILER_ALIGN_TOP flags. + */ +#define BASE_MEM_GPU_VA_SAME_4GB_PAGE ((base_mem_alloc_flags)1 << 6) + +/* Userspace is not allowed to free this memory. + * Flag is only allowed on allocations originating from kbase. + */ +#define BASEP_MEM_NO_USER_FREE ((base_mem_alloc_flags)1 << 7) + +/* Grow backing store on GPU Page Fault + */ +#define BASE_MEM_GROW_ON_GPF ((base_mem_alloc_flags)1 << 9) + +/* Page coherence Outer shareable, if available + */ +#define BASE_MEM_COHERENT_SYSTEM ((base_mem_alloc_flags)1 << 10) + +/* Page coherence Inner shareable + */ +#define BASE_MEM_COHERENT_LOCAL ((base_mem_alloc_flags)1 << 11) + +/* IN/OUT */ +/* Should be cached on the CPU, returned if actually cached + */ +#define BASE_MEM_CACHED_CPU ((base_mem_alloc_flags)1 << 12) + +/* IN/OUT */ +/* Must have same VA on both the GPU and the CPU + */ +#define BASE_MEM_SAME_VA ((base_mem_alloc_flags)1 << 13) + +/* OUT */ +/* Must call mmap to acquire a GPU address for the allocation + */ +#define BASE_MEM_NEED_MMAP ((base_mem_alloc_flags)1 << 14) + +/* IN */ +/* Page coherence Outer shareable, required. + */ +#define BASE_MEM_COHERENT_SYSTEM_REQUIRED ((base_mem_alloc_flags)1 << 15) + +/* Protected memory + */ +#define BASE_MEM_PROTECTED ((base_mem_alloc_flags)1 << 16) + +/* Not needed physical memory + */ +#define BASE_MEM_DONT_NEED ((base_mem_alloc_flags)1 << 17) + +/* Must use shared CPU/GPU zone (SAME_VA zone) but doesn't require the + * addresses to be the same + */ +#define BASE_MEM_IMPORT_SHARED ((base_mem_alloc_flags)1 << 18) + +/* Should be uncached on the GPU, will work only for GPUs using AARCH64 mmu + * mode. Some components within the GPU might only be able to access memory + * that is GPU cacheable. Refer to the specific GPU implementation for more + * details. The 3 shareability flags will be ignored for GPU uncached memory. + * If used while importing USER_BUFFER type memory, then the import will fail + * if the memory is not aligned to GPU and CPU cache line width. + */ +#define BASE_MEM_UNCACHED_GPU ((base_mem_alloc_flags)1 << 21) + +/* + * Bits [22:25] for group_id (0~15). 
+ * + * base_mem_group_id_set() should be used to pack a memory group ID into a + * base_mem_alloc_flags value instead of accessing the bits directly. + * base_mem_group_id_get() should be used to extract the memory group ID from + * a base_mem_alloc_flags value. + */ +#define BASEP_MEM_GROUP_ID_SHIFT 22 +#define BASE_MEM_GROUP_ID_MASK ((base_mem_alloc_flags)0xF << BASEP_MEM_GROUP_ID_SHIFT) + +/* Must do CPU cache maintenance when imported memory is mapped/unmapped + * on GPU. Currently applicable to dma-buf type only. + */ +#define BASE_MEM_IMPORT_SYNC_ON_MAP_UNMAP ((base_mem_alloc_flags)1 << 26) + +/* OUT */ +/* Kernel side cache sync ops required */ +#define BASE_MEM_KERNEL_SYNC ((base_mem_alloc_flags)1 << 28) + +/* Number of bits used as flags for base memory management + * + * Must be kept in sync with the base_mem_alloc_flags flags + */ +#define BASE_MEM_FLAGS_NR_BITS 30 + +/* A mask for all output bits, excluding IN/OUT bits. + */ +#define BASE_MEM_FLAGS_OUTPUT_MASK BASE_MEM_NEED_MMAP + +/* A mask for all input bits, including IN/OUT bits. + */ +#define BASE_MEM_FLAGS_INPUT_MASK \ + (((1 << BASE_MEM_FLAGS_NR_BITS) - 1) & ~BASE_MEM_FLAGS_OUTPUT_MASK) + +/* Special base mem handles. + */ +#define BASEP_MEM_INVALID_HANDLE (0ul) +#define BASE_MEM_MMU_DUMP_HANDLE (1ul << LOCAL_PAGE_SHIFT) +#define BASE_MEM_TRACE_BUFFER_HANDLE (2ul << LOCAL_PAGE_SHIFT) +#define BASE_MEM_MAP_TRACKING_HANDLE (3ul << LOCAL_PAGE_SHIFT) +#define BASEP_MEM_WRITE_ALLOC_PAGES_HANDLE (4ul << LOCAL_PAGE_SHIFT) +/* reserved handles ..-47<<PAGE_SHIFT> for future special handles */ +#define BASE_MEM_COOKIE_BASE (64ul << LOCAL_PAGE_SHIFT) +#define BASE_MEM_FIRST_FREE_ADDRESS ((BITS_PER_LONG << LOCAL_PAGE_SHIFT) + BASE_MEM_COOKIE_BASE) + +/* Flags to pass to ::base_context_init. + * Flags can be ORed together to enable multiple things. + * + * These share the same space as BASEP_CONTEXT_FLAG_*, and so must + * not collide with them. + */ +typedef __u32 base_context_create_flags; + +/* Flags for base context */ + +/* No flags set */ +#define BASE_CONTEXT_CREATE_FLAG_NONE ((base_context_create_flags)0) + +/* Base context is embedded in a cctx object (flag used for CINSTR + * software counter macros) + */ +#define BASE_CONTEXT_CCTX_EMBEDDED ((base_context_create_flags)1 << 0) + +/* Base context is a 'System Monitor' context for Hardware counters. + * + * One important side effect of this is that job submission is disabled. + */ +#define BASE_CONTEXT_SYSTEM_MONITOR_SUBMIT_DISABLED ((base_context_create_flags)1 << 1) + +/* Bit-shift used to encode a memory group ID in base_context_create_flags + */ +#define BASEP_CONTEXT_MMU_GROUP_ID_SHIFT (3) + +/* Bitmask used to encode a memory group ID in base_context_create_flags + */ +#define BASEP_CONTEXT_MMU_GROUP_ID_MASK \ + ((base_context_create_flags)0xF << BASEP_CONTEXT_MMU_GROUP_ID_SHIFT) + +/* Bitpattern describing the base_context_create_flags that can be + * passed to the kernel + */ +#define BASEP_CONTEXT_CREATE_KERNEL_FLAGS \ + (BASE_CONTEXT_SYSTEM_MONITOR_SUBMIT_DISABLED | BASEP_CONTEXT_MMU_GROUP_ID_MASK) + +/* Flags for base tracepoint + */ + +/* Enable additional tracepoints for latency measurements (TL_ATOM_READY, + * TL_ATOM_DONE, TL_ATOM_PRIO_CHANGE, TL_ATOM_EVENT_POST) + */ +#define BASE_TLSTREAM_ENABLE_LATENCY_TRACEPOINTS (1 << 0) + +/* Indicate that job dumping is enabled. This could affect certain timers + * to account for the performance impact. 
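/*
 * Illustrative sketch (not part of this merge): packing an MMU memory group
 * ID into base_context_create_flags and handing it to the driver via
 * KBASE_IOCTL_SET_FLAGS (struct kbase_ioctl_set_flags, defined further down
 * in mali_kbase_ioctl.h). The fd and helper name are assumptions; the
 * shift/mask definitions are the ones relocated into
 * mali_base_common_kernel.h above.
 */
#include <sys/ioctl.h>
#include <linux/types.h>

static int example_set_context_flags(int mali_fd, __u32 mmu_group_id)
{
	struct kbase_ioctl_set_flags args;
	base_context_create_flags flags = BASE_CONTEXT_CREATE_FLAG_NONE;

	/* Encode the MMU group ID into bits [3:6] of the create flags */
	flags |= ((base_context_create_flags)mmu_group_id
		  << BASEP_CONTEXT_MMU_GROUP_ID_SHIFT) &
		 BASEP_CONTEXT_MMU_GROUP_ID_MASK;

	/* Only bits covered by BASEP_CONTEXT_CREATE_KERNEL_FLAGS may be set */
	args.create_flags = flags;

	return ioctl(mali_fd, KBASE_IOCTL_SET_FLAGS, &args);
}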
+ */ +#define BASE_TLSTREAM_JOB_DUMPING_ENABLED (1 << 1) + +#endif /* _UAPI_BASE_COMMON_KERNEL_H_ */ diff --git a/common/include/uapi/gpu/arm/midgard/mali_base_kernel.h b/common/include/uapi/gpu/arm/midgard/mali_base_kernel.h index f3ffb36..b1b2912 100644 --- a/common/include/uapi/gpu/arm/midgard/mali_base_kernel.h +++ b/common/include/uapi/gpu/arm/midgard/mali_base_kernel.h @@ -27,19 +27,10 @@ #define _UAPI_BASE_KERNEL_H_ #include <linux/types.h> - -struct base_mem_handle { - struct { - __u64 handle; - } basep; -}; - #include "mali_base_mem_priv.h" #include "gpu/mali_kbase_gpu_id.h" #include "gpu/mali_kbase_gpu_coherency.h" -#define BASE_GPU_NUM_TEXTURE_FEATURES_REGISTERS 4 - #define BASE_MAX_COHERENT_GROUPS 16 #if defined(PAGE_MASK) && defined(PAGE_SHIFT) @@ -62,9 +53,13 @@ struct base_mem_handle { */ #define BASE_MEM_GROUP_DEFAULT (0) +/* Physical memory group ID for explicit SLC allocations. + */ +#define BASE_MEM_GROUP_PIXEL_SLC_EXPLICIT (2) + /* Number of physical memory groups. */ -#define BASE_MEM_GROUP_COUNT (16) +#define BASE_MEM_GROUP_COUNT (4) /** * typedef base_mem_alloc_flags - Memory allocation, access/hint flags. @@ -206,55 +201,6 @@ struct base_mem_aliasing_info { */ #define BASE_JIT_ALLOC_COUNT (255) -/* base_jit_alloc_info in use for kernel driver versions 10.2 to early 11.5 - * - * jit_version is 1 - * - * Due to the lack of padding specified, user clients between 32 and 64-bit - * may have assumed a different size of the struct - * - * An array of structures was not supported - */ -struct base_jit_alloc_info_10_2 { - __u64 gpu_alloc_addr; - __u64 va_pages; - __u64 commit_pages; - __u64 extension; - __u8 id; -}; - -/* base_jit_alloc_info introduced by kernel driver version 11.5, and in use up - * to 11.19 - * - * This structure had a number of modifications during and after kernel driver - * version 11.5, but remains size-compatible throughout its version history, and - * with earlier variants compatible with future variants by requiring - * zero-initialization to the unused space in the structure. - * - * jit_version is 2 - * - * Kernel driver version history: - * 11.5: Initial introduction with 'usage_id' and padding[5]. All padding bytes - * must be zero. Kbase minor version was not incremented, so some - * versions of 11.5 do not have this change. - * 11.5: Added 'bin_id' and 'max_allocations', replacing 2 padding bytes (Kbase - * minor version not incremented) - * 11.6: Added 'flags', replacing 1 padding byte - * 11.10: Arrays of this structure are supported - */ -struct base_jit_alloc_info_11_5 { - __u64 gpu_alloc_addr; - __u64 va_pages; - __u64 commit_pages; - __u64 extension; - __u8 id; - __u8 bin_id; - __u8 max_allocations; - __u8 flags; - __u8 padding[2]; - __u16 usage_id; -}; - /** * struct base_jit_alloc_info - Structure which describes a JIT allocation * request. @@ -284,16 +230,6 @@ struct base_jit_alloc_info_11_5 { * @heap_info_gpu_addr: Pointer to an object in GPU memory describing * the actual usage of the region. * - * jit_version is 3. - * - * When modifications are made to this structure, it is still compatible with - * jit_version 3 when: a) the size is unchanged, and b) new members only - * replace the padding bytes. - * - * Previous jit_version history: - * jit_version == 1, refer to &base_jit_alloc_info_10_2 - * jit_version == 2, refer to &base_jit_alloc_info_11_5 - * * Kbase version history: * 11.20: added @heap_info_gpu_addr */ @@ -458,49 +394,6 @@ struct base_jd_debug_copy_buffer { * 16 coherent groups, since core groups are typically 4 cores. 
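/*
 * Illustrative sketch (not part of this merge): how an allocation could
 * target the new explicit-SLC physical memory group
 * (BASE_MEM_GROUP_PIXEL_SLC_EXPLICIT == 2) by packing the group ID into
 * base_mem_alloc_flags. The helper below is only an illustration of what
 * base_mem_group_id_set(), referenced in mali_base_common_kernel.h above,
 * is expected to do; the shift and mask values are taken from this merge.
 */
static inline base_mem_alloc_flags
example_mem_flags_with_group(base_mem_alloc_flags flags, unsigned int group_id)
{
	/* group_id occupies bits [22:25]; BASE_MEM_GROUP_COUNT is now 4 */
	return (flags & ~BASE_MEM_GROUP_ID_MASK) |
	       (((base_mem_alloc_flags)group_id << BASEP_MEM_GROUP_ID_SHIFT) &
		BASE_MEM_GROUP_ID_MASK);
}

/*
 * Usage: a CPU/GPU readable and writable allocation placed in the explicit
 * SLC group would carry flags such as:
 *
 *   example_mem_flags_with_group(BASE_MEM_PROT_CPU_RD | BASE_MEM_PROT_CPU_WR |
 *                                BASE_MEM_PROT_GPU_RD | BASE_MEM_PROT_GPU_WR,
 *                                BASE_MEM_GROUP_PIXEL_SLC_EXPLICIT);
 */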
*/ -/** - * struct mali_base_gpu_core_props - GPU core props info - * - * @product_id: Pro specific value. - * @version_status: Status of the GPU release. No defined values, but starts at - * 0 and increases by one for each release status (alpha, beta, EAC, etc.). - * 4 bit values (0-15). - * @minor_revision: Minor release number of the GPU. "P" part of an "RnPn" - * release number. - * 8 bit values (0-255). - * @major_revision: Major release number of the GPU. "R" part of an "RnPn" - * release number. - * 4 bit values (0-15). - * @padding: padding to allign to 8-byte - * @gpu_freq_khz_max: The maximum GPU frequency. Reported to applications by - * clGetDeviceInfo() - * @log2_program_counter_size: Size of the shader program counter, in bits. - * @texture_features: TEXTURE_FEATURES_x registers, as exposed by the GPU. This - * is a bitpattern where a set bit indicates that the format is supported. - * Before using a texture format, it is recommended that the corresponding - * bit be checked. - * @gpu_available_memory_size: Theoretical maximum memory available to the GPU. - * It is unlikely that a client will be able to allocate all of this memory - * for their own purposes, but this at least provides an upper bound on the - * memory available to the GPU. - * This is required for OpenCL's clGetDeviceInfo() call when - * CL_DEVICE_GLOBAL_MEM_SIZE is requested, for OpenCL GPU devices. The - * client will not be expecting to allocate anywhere near this value. - * @num_exec_engines: The number of execution engines. - */ -struct mali_base_gpu_core_props { - __u32 product_id; - __u16 version_status; - __u16 minor_revision; - __u16 major_revision; - __u16 padding; - __u32 gpu_freq_khz_max; - __u32 log2_program_counter_size; - __u32 texture_features[BASE_GPU_NUM_TEXTURE_FEATURES_REGISTERS]; - __u64 gpu_available_memory_size; - __u8 num_exec_engines; -}; - /* * More information is possible - but associativity and bus width are not * required by upper-level apis. @@ -531,7 +424,7 @@ struct mali_base_gpu_tiler_props { * field. * @impl_tech: 0 = Not specified, 1 = Silicon, 2 = FPGA, * 3 = SW Model/Emulation - * @padding: padding to allign to 8-byte + * @padding: padding to align to 8-byte * @tls_alloc: Number of threads per core that TLS must be * allocated for */ @@ -551,7 +444,7 @@ struct mali_base_gpu_thread_props { * struct mali_base_gpu_coherent_group - descriptor for a coherent group * @core_mask: Core restriction mask required for the group * @num_cores: Number of cores in the group - * @padding: padding to allign to 8-byte + * @padding: padding to align to 8-byte * * \c core_mask exposes all cores in that coherent group, and \c num_cores * provides a cached population-count for that mask. @@ -581,7 +474,7 @@ struct mali_base_gpu_coherent_group { * are in the group[] member. Use num_groups instead. * @coherency: Coherency features of the memory, accessed by gpu_mem_features * methods - * @padding: padding to allign to 8-byte + * @padding: padding to align to 8-byte * @group: Descriptors of coherent groups * * Note that the sizes of the members could be reduced. However, the \c group @@ -599,6 +492,12 @@ struct mali_base_gpu_coherent_group_info { struct mali_base_gpu_coherent_group group[BASE_MAX_COHERENT_GROUPS]; }; +#if MALI_USE_CSF +#include "csf/mali_base_csf_kernel.h" +#else +#include "jm/mali_base_jm_kernel.h" +#endif + /** * struct gpu_raw_gpu_props - A complete description of the GPU's Hardware * Configuration Discovery registers. 
@@ -696,12 +595,6 @@ struct base_gpu_props { struct mali_base_gpu_coherent_group_info coherency_info; }; -#if MALI_USE_CSF -#include "csf/mali_base_csf_kernel.h" -#else -#include "jm/mali_base_jm_kernel.h" -#endif - #define BASE_MEM_GROUP_ID_GET(flags) \ ((flags & BASE_MEM_GROUP_ID_MASK) >> BASEP_MEM_GROUP_ID_SHIFT) diff --git a/common/include/uapi/gpu/arm/midgard/mali_base_mem_priv.h b/common/include/uapi/gpu/arm/midgard/mali_base_mem_priv.h index 304a334..70f5b09 100644 --- a/common/include/uapi/gpu/arm/midgard/mali_base_mem_priv.h +++ b/common/include/uapi/gpu/arm/midgard/mali_base_mem_priv.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2010-2015, 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2010-2015, 2020-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -23,8 +23,7 @@ #define _UAPI_BASE_MEM_PRIV_H_ #include <linux/types.h> - -#include "mali_base_kernel.h" +#include "mali_base_common_kernel.h" #define BASE_SYNCSET_OP_MSYNC (1U << 0) #define BASE_SYNCSET_OP_CSYNC (1U << 1) diff --git a/common/include/uapi/gpu/arm/midgard/mali_kbase_hwcnt_reader.h b/common/include/uapi/gpu/arm/midgard/mali_kbase_hwcnt_reader.h index 42d93ba..5089bf2 100644 --- a/common/include/uapi/gpu/arm/midgard/mali_kbase_hwcnt_reader.h +++ b/common/include/uapi/gpu/arm/midgard/mali_kbase_hwcnt_reader.h @@ -221,6 +221,7 @@ struct prfcnt_enum_sample_info { /** * struct prfcnt_enum_item - Performance counter enumeration item. + * @padding: Padding bytes. * @hdr: Header describing the type of item in the list. * @u: Structure containing discriptor for enumeration item type. * @u.block_counter: Performance counter block descriptor. @@ -229,6 +230,7 @@ struct prfcnt_enum_sample_info { */ struct prfcnt_enum_item { struct prfcnt_item_header hdr; + __u8 padding[4]; /** union u - union of block_counter and request */ union { struct prfcnt_enum_block_counter block_counter; @@ -305,6 +307,7 @@ struct prfcnt_request_scope { /** * struct prfcnt_request_item - Performance counter request item. + * @padding: Padding bytes. * @hdr: Header describing the type of item in the list. * @u: Structure containing descriptor for request type. * @u.req_mode: Mode request descriptor. @@ -313,6 +316,7 @@ struct prfcnt_request_scope { */ struct prfcnt_request_item { struct prfcnt_item_header hdr; + __u8 padding[4]; /** union u - union on req_mode and req_enable */ union { struct prfcnt_request_mode req_mode; @@ -417,6 +421,7 @@ struct prfcnt_block_metadata { /** * struct prfcnt_metadata - Performance counter metadata item. + * @padding: Padding bytes. * @hdr: Header describing the type of item in the list. * @u: Structure containing descriptor for metadata type. * @u.sample_md: Counter sample data metadata descriptor. @@ -425,6 +430,7 @@ struct prfcnt_block_metadata { */ struct prfcnt_metadata { struct prfcnt_item_header hdr; + __u8 padding[4]; union { struct prfcnt_sample_metadata sample_md; struct prfcnt_clock_metadata clock_md; @@ -439,7 +445,7 @@ struct prfcnt_metadata { * @PRFCNT_CONTROL_CMD_STOP: Stop the counter data dump run for the * calling client session. * @PRFCNT_CONTROL_CMD_SAMPLE_SYNC: Trigger a synchronous manual sample. - * @PRFCNT_CONTROL_CMD_SAMPLE_ASYNC: Trigger an asynchronous manual sample. + * @PRFCNT_CONTROL_CMD_RESERVED: Previously SAMPLE_ASYNC not supported any more. 
* @PRFCNT_CONTROL_CMD_DISCARD: Discard all samples which have not yet * been consumed by userspace. Note that * this can race with new samples if @@ -449,7 +455,7 @@ enum prfcnt_control_cmd_code { PRFCNT_CONTROL_CMD_START = 1, PRFCNT_CONTROL_CMD_STOP, PRFCNT_CONTROL_CMD_SAMPLE_SYNC, - PRFCNT_CONTROL_CMD_SAMPLE_ASYNC, + PRFCNT_CONTROL_CMD_RESERVED, PRFCNT_CONTROL_CMD_DISCARD, }; diff --git a/common/include/uapi/gpu/arm/midgard/mali_kbase_ioctl.h b/common/include/uapi/gpu/arm/midgard/mali_kbase_ioctl.h index d1d5f3d..e72c82e 100644 --- a/common/include/uapi/gpu/arm/midgard/mali_kbase_ioctl.h +++ b/common/include/uapi/gpu/arm/midgard/mali_kbase_ioctl.h @@ -46,8 +46,7 @@ struct kbase_ioctl_set_flags { __u32 create_flags; }; -#define KBASE_IOCTL_SET_FLAGS \ - _IOW(KBASE_IOCTL_TYPE, 1, struct kbase_ioctl_set_flags) +#define KBASE_IOCTL_SET_FLAGS _IOW(KBASE_IOCTL_TYPE, 1, struct kbase_ioctl_set_flags) /** * struct kbase_ioctl_get_gpuprops - Read GPU properties from the kernel @@ -81,8 +80,7 @@ struct kbase_ioctl_get_gpuprops { __u32 flags; }; -#define KBASE_IOCTL_GET_GPUPROPS \ - _IOW(KBASE_IOCTL_TYPE, 3, struct kbase_ioctl_get_gpuprops) +#define KBASE_IOCTL_GET_GPUPROPS _IOW(KBASE_IOCTL_TYPE, 3, struct kbase_ioctl_get_gpuprops) /** * union kbase_ioctl_mem_alloc - Allocate memory on the GPU @@ -108,8 +106,7 @@ union kbase_ioctl_mem_alloc { } out; }; -#define KBASE_IOCTL_MEM_ALLOC \ - _IOWR(KBASE_IOCTL_TYPE, 5, union kbase_ioctl_mem_alloc) +#define KBASE_IOCTL_MEM_ALLOC _IOWR(KBASE_IOCTL_TYPE, 5, union kbase_ioctl_mem_alloc) /** * struct kbase_ioctl_mem_query - Query properties of a GPU memory region @@ -131,12 +128,11 @@ union kbase_ioctl_mem_query { } out; }; -#define KBASE_IOCTL_MEM_QUERY \ - _IOWR(KBASE_IOCTL_TYPE, 6, union kbase_ioctl_mem_query) +#define KBASE_IOCTL_MEM_QUERY _IOWR(KBASE_IOCTL_TYPE, 6, union kbase_ioctl_mem_query) -#define KBASE_MEM_QUERY_COMMIT_SIZE ((__u64)1) -#define KBASE_MEM_QUERY_VA_SIZE ((__u64)2) -#define KBASE_MEM_QUERY_FLAGS ((__u64)3) +#define KBASE_MEM_QUERY_COMMIT_SIZE ((__u64)1) +#define KBASE_MEM_QUERY_VA_SIZE ((__u64)2) +#define KBASE_MEM_QUERY_FLAGS ((__u64)3) /** * struct kbase_ioctl_mem_free - Free a memory region @@ -146,8 +142,7 @@ struct kbase_ioctl_mem_free { __u64 gpu_addr; }; -#define KBASE_IOCTL_MEM_FREE \ - _IOW(KBASE_IOCTL_TYPE, 7, struct kbase_ioctl_mem_free) +#define KBASE_IOCTL_MEM_FREE _IOW(KBASE_IOCTL_TYPE, 7, struct kbase_ioctl_mem_free) /** * struct kbase_ioctl_hwcnt_reader_setup - Setup HWC dumper/reader @@ -167,7 +162,7 @@ struct kbase_ioctl_hwcnt_reader_setup { __u32 mmu_l2_bm; }; -#define KBASE_IOCTL_HWCNT_READER_SETUP \ +#define KBASE_IOCTL_HWCNT_READER_SETUP \ _IOW(KBASE_IOCTL_TYPE, 8, struct kbase_ioctl_hwcnt_reader_setup) /** @@ -182,8 +177,7 @@ struct kbase_ioctl_hwcnt_values { __u32 padding; }; -#define KBASE_IOCTL_HWCNT_SET \ - _IOW(KBASE_IOCTL_TYPE, 32, struct kbase_ioctl_hwcnt_values) +#define KBASE_IOCTL_HWCNT_SET _IOW(KBASE_IOCTL_TYPE, 32, struct kbase_ioctl_hwcnt_values) /** * struct kbase_ioctl_disjoint_query - Query the disjoint counter @@ -193,8 +187,7 @@ struct kbase_ioctl_disjoint_query { __u32 counter; }; -#define KBASE_IOCTL_DISJOINT_QUERY \ - _IOR(KBASE_IOCTL_TYPE, 12, struct kbase_ioctl_disjoint_query) +#define KBASE_IOCTL_DISJOINT_QUERY _IOR(KBASE_IOCTL_TYPE, 12, struct kbase_ioctl_disjoint_query) /** * struct kbase_ioctl_get_ddk_version - Query the kernel version @@ -215,54 +208,7 @@ struct kbase_ioctl_get_ddk_version { __u32 padding; }; -#define KBASE_IOCTL_GET_DDK_VERSION \ - _IOW(KBASE_IOCTL_TYPE, 13, struct 
kbase_ioctl_get_ddk_version) - -/** - * struct kbase_ioctl_mem_jit_init_10_2 - Initialize the just-in-time memory - * allocator (between kernel driver - * version 10.2--11.4) - * @va_pages: Number of VA pages to reserve for JIT - * - * Note that depending on the VA size of the application and GPU, the value - * specified in @va_pages may be ignored. - * - * New code should use KBASE_IOCTL_MEM_JIT_INIT instead, this is kept for - * backwards compatibility. - */ -struct kbase_ioctl_mem_jit_init_10_2 { - __u64 va_pages; -}; - -#define KBASE_IOCTL_MEM_JIT_INIT_10_2 \ - _IOW(KBASE_IOCTL_TYPE, 14, struct kbase_ioctl_mem_jit_init_10_2) - -/** - * struct kbase_ioctl_mem_jit_init_11_5 - Initialize the just-in-time memory - * allocator (between kernel driver - * version 11.5--11.19) - * @va_pages: Number of VA pages to reserve for JIT - * @max_allocations: Maximum number of concurrent allocations - * @trim_level: Level of JIT allocation trimming to perform on free (0 - 100%) - * @group_id: Group ID to be used for physical allocations - * @padding: Currently unused, must be zero - * - * Note that depending on the VA size of the application and GPU, the value - * specified in @va_pages may be ignored. - * - * New code should use KBASE_IOCTL_MEM_JIT_INIT instead, this is kept for - * backwards compatibility. - */ -struct kbase_ioctl_mem_jit_init_11_5 { - __u64 va_pages; - __u8 max_allocations; - __u8 trim_level; - __u8 group_id; - __u8 padding[5]; -}; - -#define KBASE_IOCTL_MEM_JIT_INIT_11_5 \ - _IOW(KBASE_IOCTL_TYPE, 14, struct kbase_ioctl_mem_jit_init_11_5) +#define KBASE_IOCTL_GET_DDK_VERSION _IOW(KBASE_IOCTL_TYPE, 13, struct kbase_ioctl_get_ddk_version) /** * struct kbase_ioctl_mem_jit_init - Initialize the just-in-time memory @@ -287,8 +233,7 @@ struct kbase_ioctl_mem_jit_init { __u64 phys_pages; }; -#define KBASE_IOCTL_MEM_JIT_INIT \ - _IOW(KBASE_IOCTL_TYPE, 14, struct kbase_ioctl_mem_jit_init) +#define KBASE_IOCTL_MEM_JIT_INIT _IOW(KBASE_IOCTL_TYPE, 14, struct kbase_ioctl_mem_jit_init) /** * struct kbase_ioctl_mem_sync - Perform cache maintenance on memory @@ -308,8 +253,7 @@ struct kbase_ioctl_mem_sync { __u8 padding[7]; }; -#define KBASE_IOCTL_MEM_SYNC \ - _IOW(KBASE_IOCTL_TYPE, 15, struct kbase_ioctl_mem_sync) +#define KBASE_IOCTL_MEM_SYNC _IOW(KBASE_IOCTL_TYPE, 15, struct kbase_ioctl_mem_sync) /** * union kbase_ioctl_mem_find_cpu_offset - Find the offset of a CPU pointer @@ -332,7 +276,7 @@ union kbase_ioctl_mem_find_cpu_offset { } out; }; -#define KBASE_IOCTL_MEM_FIND_CPU_OFFSET \ +#define KBASE_IOCTL_MEM_FIND_CPU_OFFSET \ _IOWR(KBASE_IOCTL_TYPE, 16, union kbase_ioctl_mem_find_cpu_offset) /** @@ -344,8 +288,7 @@ struct kbase_ioctl_get_context_id { __u32 id; }; -#define KBASE_IOCTL_GET_CONTEXT_ID \ - _IOR(KBASE_IOCTL_TYPE, 17, struct kbase_ioctl_get_context_id) +#define KBASE_IOCTL_GET_CONTEXT_ID _IOR(KBASE_IOCTL_TYPE, 17, struct kbase_ioctl_get_context_id) /** * struct kbase_ioctl_tlstream_acquire - Acquire a tlstream fd @@ -358,11 +301,9 @@ struct kbase_ioctl_tlstream_acquire { __u32 flags; }; -#define KBASE_IOCTL_TLSTREAM_ACQUIRE \ - _IOW(KBASE_IOCTL_TYPE, 18, struct kbase_ioctl_tlstream_acquire) +#define KBASE_IOCTL_TLSTREAM_ACQUIRE _IOW(KBASE_IOCTL_TYPE, 18, struct kbase_ioctl_tlstream_acquire) -#define KBASE_IOCTL_TLSTREAM_FLUSH \ - _IO(KBASE_IOCTL_TYPE, 19) +#define KBASE_IOCTL_TLSTREAM_FLUSH _IO(KBASE_IOCTL_TYPE, 19) /** * struct kbase_ioctl_mem_commit - Change the amount of memory backing a region @@ -379,8 +320,7 @@ struct kbase_ioctl_mem_commit { __u64 pages; }; -#define 
KBASE_IOCTL_MEM_COMMIT \ - _IOW(KBASE_IOCTL_TYPE, 20, struct kbase_ioctl_mem_commit) +#define KBASE_IOCTL_MEM_COMMIT _IOW(KBASE_IOCTL_TYPE, 20, struct kbase_ioctl_mem_commit) /** * union kbase_ioctl_mem_alias - Create an alias of memory regions @@ -408,8 +348,7 @@ union kbase_ioctl_mem_alias { } out; }; -#define KBASE_IOCTL_MEM_ALIAS \ - _IOWR(KBASE_IOCTL_TYPE, 21, union kbase_ioctl_mem_alias) +#define KBASE_IOCTL_MEM_ALIAS _IOWR(KBASE_IOCTL_TYPE, 21, union kbase_ioctl_mem_alias) /** * union kbase_ioctl_mem_import - Import memory for use by the GPU @@ -437,8 +376,7 @@ union kbase_ioctl_mem_import { } out; }; -#define KBASE_IOCTL_MEM_IMPORT \ - _IOWR(KBASE_IOCTL_TYPE, 22, union kbase_ioctl_mem_import) +#define KBASE_IOCTL_MEM_IMPORT _IOWR(KBASE_IOCTL_TYPE, 22, union kbase_ioctl_mem_import) /** * struct kbase_ioctl_mem_flags_change - Change the flags for a memory region @@ -452,8 +390,7 @@ struct kbase_ioctl_mem_flags_change { __u64 mask; }; -#define KBASE_IOCTL_MEM_FLAGS_CHANGE \ - _IOW(KBASE_IOCTL_TYPE, 23, struct kbase_ioctl_mem_flags_change) +#define KBASE_IOCTL_MEM_FLAGS_CHANGE _IOW(KBASE_IOCTL_TYPE, 23, struct kbase_ioctl_mem_flags_change) /** * struct kbase_ioctl_stream_create - Create a synchronisation stream @@ -470,8 +407,7 @@ struct kbase_ioctl_stream_create { char name[32]; }; -#define KBASE_IOCTL_STREAM_CREATE \ - _IOW(KBASE_IOCTL_TYPE, 24, struct kbase_ioctl_stream_create) +#define KBASE_IOCTL_STREAM_CREATE _IOW(KBASE_IOCTL_TYPE, 24, struct kbase_ioctl_stream_create) /** * struct kbase_ioctl_fence_validate - Validate a fd refers to a fence @@ -481,8 +417,7 @@ struct kbase_ioctl_fence_validate { int fd; }; -#define KBASE_IOCTL_FENCE_VALIDATE \ - _IOW(KBASE_IOCTL_TYPE, 25, struct kbase_ioctl_fence_validate) +#define KBASE_IOCTL_FENCE_VALIDATE _IOW(KBASE_IOCTL_TYPE, 25, struct kbase_ioctl_fence_validate) /** * struct kbase_ioctl_mem_profile_add - Provide profiling information to kernel @@ -498,8 +433,7 @@ struct kbase_ioctl_mem_profile_add { __u32 padding; }; -#define KBASE_IOCTL_MEM_PROFILE_ADD \ - _IOW(KBASE_IOCTL_TYPE, 27, struct kbase_ioctl_mem_profile_add) +#define KBASE_IOCTL_MEM_PROFILE_ADD _IOW(KBASE_IOCTL_TYPE, 27, struct kbase_ioctl_mem_profile_add) /** * struct kbase_ioctl_sticky_resource_map - Permanently map an external resource @@ -511,7 +445,7 @@ struct kbase_ioctl_sticky_resource_map { __u64 address; }; -#define KBASE_IOCTL_STICKY_RESOURCE_MAP \ +#define KBASE_IOCTL_STICKY_RESOURCE_MAP \ _IOW(KBASE_IOCTL_TYPE, 29, struct kbase_ioctl_sticky_resource_map) /** @@ -525,7 +459,7 @@ struct kbase_ioctl_sticky_resource_unmap { __u64 address; }; -#define KBASE_IOCTL_STICKY_RESOURCE_UNMAP \ +#define KBASE_IOCTL_STICKY_RESOURCE_UNMAP \ _IOW(KBASE_IOCTL_TYPE, 30, struct kbase_ioctl_sticky_resource_unmap) /** @@ -553,17 +487,16 @@ union kbase_ioctl_mem_find_gpu_start_and_offset { } out; }; -#define KBASE_IOCTL_MEM_FIND_GPU_START_AND_OFFSET \ +#define KBASE_IOCTL_MEM_FIND_GPU_START_AND_OFFSET \ _IOWR(KBASE_IOCTL_TYPE, 31, union kbase_ioctl_mem_find_gpu_start_and_offset) -#define KBASE_IOCTL_CINSTR_GWT_START \ - _IO(KBASE_IOCTL_TYPE, 33) +#define KBASE_IOCTL_CINSTR_GWT_START _IO(KBASE_IOCTL_TYPE, 33) -#define KBASE_IOCTL_CINSTR_GWT_STOP \ - _IO(KBASE_IOCTL_TYPE, 34) +#define KBASE_IOCTL_CINSTR_GWT_STOP _IO(KBASE_IOCTL_TYPE, 34) /** - * union kbase_ioctl_gwt_dump - Used to collect all GPU write fault addresses. + * union kbase_ioctl_cinstr_gwt_dump - Used to collect all GPU write fault + * addresses. 
* @in: Input parameters * @in.addr_buffer: Address of buffer to hold addresses of gpu modified areas. * @in.size_buffer: Address of buffer to hold size of modified areas (in pages) @@ -592,8 +525,7 @@ union kbase_ioctl_cinstr_gwt_dump { } out; }; -#define KBASE_IOCTL_CINSTR_GWT_DUMP \ - _IOWR(KBASE_IOCTL_TYPE, 35, union kbase_ioctl_cinstr_gwt_dump) +#define KBASE_IOCTL_CINSTR_GWT_DUMP _IOWR(KBASE_IOCTL_TYPE, 35, union kbase_ioctl_cinstr_gwt_dump) /** * struct kbase_ioctl_mem_exec_init - Initialise the EXEC_VA memory zone @@ -604,8 +536,7 @@ struct kbase_ioctl_mem_exec_init { __u64 va_pages; }; -#define KBASE_IOCTL_MEM_EXEC_INIT \ - _IOW(KBASE_IOCTL_TYPE, 38, struct kbase_ioctl_mem_exec_init) +#define KBASE_IOCTL_MEM_EXEC_INIT _IOW(KBASE_IOCTL_TYPE, 38, struct kbase_ioctl_mem_exec_init) /** * union kbase_ioctl_get_cpu_gpu_timeinfo - Request zero or more types of @@ -634,7 +565,7 @@ union kbase_ioctl_get_cpu_gpu_timeinfo { } out; }; -#define KBASE_IOCTL_GET_CPU_GPU_TIMEINFO \ +#define KBASE_IOCTL_GET_CPU_GPU_TIMEINFO \ _IOWR(KBASE_IOCTL_TYPE, 50, union kbase_ioctl_get_cpu_gpu_timeinfo) /** @@ -646,7 +577,7 @@ struct kbase_ioctl_context_priority_check { __u8 priority; }; -#define KBASE_IOCTL_CONTEXT_PRIORITY_CHECK \ +#define KBASE_IOCTL_CONTEXT_PRIORITY_CHECK \ _IOWR(KBASE_IOCTL_TYPE, 54, struct kbase_ioctl_context_priority_check) /** @@ -658,7 +589,7 @@ struct kbase_ioctl_set_limited_core_count { __u8 max_core_count; }; -#define KBASE_IOCTL_SET_LIMITED_CORE_COUNT \ +#define KBASE_IOCTL_SET_LIMITED_CORE_COUNT \ _IOW(KBASE_IOCTL_TYPE, 55, struct kbase_ioctl_set_limited_core_count) /** @@ -679,11 +610,11 @@ struct kbase_ioctl_kinstr_prfcnt_enum_info { __u64 info_list_ptr; }; -#define KBASE_IOCTL_KINSTR_PRFCNT_ENUM_INFO \ +#define KBASE_IOCTL_KINSTR_PRFCNT_ENUM_INFO \ _IOWR(KBASE_IOCTL_TYPE, 56, struct kbase_ioctl_kinstr_prfcnt_enum_info) /** - * struct kbase_ioctl_hwcnt_reader_setup - Setup HWC dumper/reader + * struct kbase_ioctl_kinstr_prfcnt_setup - Setup HWC dumper/reader * @in: input parameters. * @in.request_item_count: Number of requests in the requests array. * @in.request_item_size: Size in bytes of each request in the requests array. 
@@ -708,7 +639,7 @@ union kbase_ioctl_kinstr_prfcnt_setup { } out; }; -#define KBASE_IOCTL_KINSTR_PRFCNT_SETUP \ +#define KBASE_IOCTL_KINSTR_PRFCNT_SETUP \ _IOWR(KBASE_IOCTL_TYPE, 57, union kbase_ioctl_kinstr_prfcnt_setup) /*************** @@ -727,6 +658,27 @@ struct kbase_ioctl_apc_request { #define KBASE_IOCTL_APC_REQUEST \ _IOW(KBASE_IOCTL_TYPE, 66, struct kbase_ioctl_apc_request) +/** + * struct kbase_ioctl_buffer_liveness_update - Update the live ranges of buffers from previous frame + * + * @live_ranges_address: Array of live ranges + * @live_ranges_count: Number of elements in the live ranges buffer + * @buffer_va_address: Array of buffer base virtual addresses + * @buffer_sizes_address: Array of buffer sizes + * @buffer_count: Number of buffers + * @padding: Unused + */ +struct kbase_ioctl_buffer_liveness_update { + __u64 live_ranges_address; + __u64 live_ranges_count; + __u64 buffer_va_address; + __u64 buffer_sizes_address; + __u64 buffer_count; +}; + +#define KBASE_IOCTL_BUFFER_LIVENESS_UPDATE \ + _IOW(KBASE_IOCTL_TYPE, 67, struct kbase_ioctl_buffer_liveness_update) + /*************** * test ioctls * ***************/ @@ -748,8 +700,7 @@ struct kbase_ioctl_tlstream_stats { __u32 bytes_generated; }; -#define KBASE_IOCTL_TLSTREAM_STATS \ - _IOR(KBASE_IOCTL_TEST_TYPE, 2, struct kbase_ioctl_tlstream_stats) +#define KBASE_IOCTL_TLSTREAM_STATS _IOR(KBASE_IOCTL_TEST_TYPE, 2, struct kbase_ioctl_tlstream_stats) #endif /* MALI_UNIT_TEST */ @@ -767,108 +718,107 @@ struct kbase_ioctl_tlstream_stats { * _IOWR(KBASE_IOCTL_EXTRA_TYPE, 0, struct my_ioctl_args) */ - /********************************** * Definitions for GPU properties * **********************************/ -#define KBASE_GPUPROP_VALUE_SIZE_U8 (0x0) -#define KBASE_GPUPROP_VALUE_SIZE_U16 (0x1) -#define KBASE_GPUPROP_VALUE_SIZE_U32 (0x2) -#define KBASE_GPUPROP_VALUE_SIZE_U64 (0x3) - -#define KBASE_GPUPROP_PRODUCT_ID 1 -#define KBASE_GPUPROP_VERSION_STATUS 2 -#define KBASE_GPUPROP_MINOR_REVISION 3 -#define KBASE_GPUPROP_MAJOR_REVISION 4 +#define KBASE_GPUPROP_VALUE_SIZE_U8 (0x0) +#define KBASE_GPUPROP_VALUE_SIZE_U16 (0x1) +#define KBASE_GPUPROP_VALUE_SIZE_U32 (0x2) +#define KBASE_GPUPROP_VALUE_SIZE_U64 (0x3) + +#define KBASE_GPUPROP_PRODUCT_ID 1 +#define KBASE_GPUPROP_VERSION_STATUS 2 +#define KBASE_GPUPROP_MINOR_REVISION 3 +#define KBASE_GPUPROP_MAJOR_REVISION 4 /* 5 previously used for GPU speed */ -#define KBASE_GPUPROP_GPU_FREQ_KHZ_MAX 6 +#define KBASE_GPUPROP_GPU_FREQ_KHZ_MAX 6 /* 7 previously used for minimum GPU speed */ -#define KBASE_GPUPROP_LOG2_PROGRAM_COUNTER_SIZE 8 -#define KBASE_GPUPROP_TEXTURE_FEATURES_0 9 -#define KBASE_GPUPROP_TEXTURE_FEATURES_1 10 -#define KBASE_GPUPROP_TEXTURE_FEATURES_2 11 -#define KBASE_GPUPROP_GPU_AVAILABLE_MEMORY_SIZE 12 - -#define KBASE_GPUPROP_L2_LOG2_LINE_SIZE 13 -#define KBASE_GPUPROP_L2_LOG2_CACHE_SIZE 14 -#define KBASE_GPUPROP_L2_NUM_L2_SLICES 15 - -#define KBASE_GPUPROP_TILER_BIN_SIZE_BYTES 16 -#define KBASE_GPUPROP_TILER_MAX_ACTIVE_LEVELS 17 - -#define KBASE_GPUPROP_MAX_THREADS 18 -#define KBASE_GPUPROP_MAX_WORKGROUP_SIZE 19 -#define KBASE_GPUPROP_MAX_BARRIER_SIZE 20 -#define KBASE_GPUPROP_MAX_REGISTERS 21 -#define KBASE_GPUPROP_MAX_TASK_QUEUE 22 -#define KBASE_GPUPROP_MAX_THREAD_GROUP_SPLIT 23 -#define KBASE_GPUPROP_IMPL_TECH 24 - -#define KBASE_GPUPROP_RAW_SHADER_PRESENT 25 -#define KBASE_GPUPROP_RAW_TILER_PRESENT 26 -#define KBASE_GPUPROP_RAW_L2_PRESENT 27 -#define KBASE_GPUPROP_RAW_STACK_PRESENT 28 -#define KBASE_GPUPROP_RAW_L2_FEATURES 29 -#define KBASE_GPUPROP_RAW_CORE_FEATURES 30 
-#define KBASE_GPUPROP_RAW_MEM_FEATURES 31 -#define KBASE_GPUPROP_RAW_MMU_FEATURES 32 -#define KBASE_GPUPROP_RAW_AS_PRESENT 33 -#define KBASE_GPUPROP_RAW_JS_PRESENT 34 -#define KBASE_GPUPROP_RAW_JS_FEATURES_0 35 -#define KBASE_GPUPROP_RAW_JS_FEATURES_1 36 -#define KBASE_GPUPROP_RAW_JS_FEATURES_2 37 -#define KBASE_GPUPROP_RAW_JS_FEATURES_3 38 -#define KBASE_GPUPROP_RAW_JS_FEATURES_4 39 -#define KBASE_GPUPROP_RAW_JS_FEATURES_5 40 -#define KBASE_GPUPROP_RAW_JS_FEATURES_6 41 -#define KBASE_GPUPROP_RAW_JS_FEATURES_7 42 -#define KBASE_GPUPROP_RAW_JS_FEATURES_8 43 -#define KBASE_GPUPROP_RAW_JS_FEATURES_9 44 -#define KBASE_GPUPROP_RAW_JS_FEATURES_10 45 -#define KBASE_GPUPROP_RAW_JS_FEATURES_11 46 -#define KBASE_GPUPROP_RAW_JS_FEATURES_12 47 -#define KBASE_GPUPROP_RAW_JS_FEATURES_13 48 -#define KBASE_GPUPROP_RAW_JS_FEATURES_14 49 -#define KBASE_GPUPROP_RAW_JS_FEATURES_15 50 -#define KBASE_GPUPROP_RAW_TILER_FEATURES 51 -#define KBASE_GPUPROP_RAW_TEXTURE_FEATURES_0 52 -#define KBASE_GPUPROP_RAW_TEXTURE_FEATURES_1 53 -#define KBASE_GPUPROP_RAW_TEXTURE_FEATURES_2 54 -#define KBASE_GPUPROP_RAW_GPU_ID 55 -#define KBASE_GPUPROP_RAW_THREAD_MAX_THREADS 56 -#define KBASE_GPUPROP_RAW_THREAD_MAX_WORKGROUP_SIZE 57 -#define KBASE_GPUPROP_RAW_THREAD_MAX_BARRIER_SIZE 58 -#define KBASE_GPUPROP_RAW_THREAD_FEATURES 59 -#define KBASE_GPUPROP_RAW_COHERENCY_MODE 60 - -#define KBASE_GPUPROP_COHERENCY_NUM_GROUPS 61 -#define KBASE_GPUPROP_COHERENCY_NUM_CORE_GROUPS 62 -#define KBASE_GPUPROP_COHERENCY_COHERENCY 63 -#define KBASE_GPUPROP_COHERENCY_GROUP_0 64 -#define KBASE_GPUPROP_COHERENCY_GROUP_1 65 -#define KBASE_GPUPROP_COHERENCY_GROUP_2 66 -#define KBASE_GPUPROP_COHERENCY_GROUP_3 67 -#define KBASE_GPUPROP_COHERENCY_GROUP_4 68 -#define KBASE_GPUPROP_COHERENCY_GROUP_5 69 -#define KBASE_GPUPROP_COHERENCY_GROUP_6 70 -#define KBASE_GPUPROP_COHERENCY_GROUP_7 71 -#define KBASE_GPUPROP_COHERENCY_GROUP_8 72 -#define KBASE_GPUPROP_COHERENCY_GROUP_9 73 -#define KBASE_GPUPROP_COHERENCY_GROUP_10 74 -#define KBASE_GPUPROP_COHERENCY_GROUP_11 75 -#define KBASE_GPUPROP_COHERENCY_GROUP_12 76 -#define KBASE_GPUPROP_COHERENCY_GROUP_13 77 -#define KBASE_GPUPROP_COHERENCY_GROUP_14 78 -#define KBASE_GPUPROP_COHERENCY_GROUP_15 79 - -#define KBASE_GPUPROP_TEXTURE_FEATURES_3 80 -#define KBASE_GPUPROP_RAW_TEXTURE_FEATURES_3 81 - -#define KBASE_GPUPROP_NUM_EXEC_ENGINES 82 - -#define KBASE_GPUPROP_RAW_THREAD_TLS_ALLOC 83 -#define KBASE_GPUPROP_TLS_ALLOC 84 -#define KBASE_GPUPROP_RAW_GPU_FEATURES 85 +#define KBASE_GPUPROP_LOG2_PROGRAM_COUNTER_SIZE 8 +#define KBASE_GPUPROP_TEXTURE_FEATURES_0 9 +#define KBASE_GPUPROP_TEXTURE_FEATURES_1 10 +#define KBASE_GPUPROP_TEXTURE_FEATURES_2 11 +#define KBASE_GPUPROP_GPU_AVAILABLE_MEMORY_SIZE 12 + +#define KBASE_GPUPROP_L2_LOG2_LINE_SIZE 13 +#define KBASE_GPUPROP_L2_LOG2_CACHE_SIZE 14 +#define KBASE_GPUPROP_L2_NUM_L2_SLICES 15 + +#define KBASE_GPUPROP_TILER_BIN_SIZE_BYTES 16 +#define KBASE_GPUPROP_TILER_MAX_ACTIVE_LEVELS 17 + +#define KBASE_GPUPROP_MAX_THREADS 18 +#define KBASE_GPUPROP_MAX_WORKGROUP_SIZE 19 +#define KBASE_GPUPROP_MAX_BARRIER_SIZE 20 +#define KBASE_GPUPROP_MAX_REGISTERS 21 +#define KBASE_GPUPROP_MAX_TASK_QUEUE 22 +#define KBASE_GPUPROP_MAX_THREAD_GROUP_SPLIT 23 +#define KBASE_GPUPROP_IMPL_TECH 24 + +#define KBASE_GPUPROP_RAW_SHADER_PRESENT 25 +#define KBASE_GPUPROP_RAW_TILER_PRESENT 26 +#define KBASE_GPUPROP_RAW_L2_PRESENT 27 +#define KBASE_GPUPROP_RAW_STACK_PRESENT 28 +#define KBASE_GPUPROP_RAW_L2_FEATURES 29 +#define KBASE_GPUPROP_RAW_CORE_FEATURES 30 +#define KBASE_GPUPROP_RAW_MEM_FEATURES 31 
+#define KBASE_GPUPROP_RAW_MMU_FEATURES 32 +#define KBASE_GPUPROP_RAW_AS_PRESENT 33 +#define KBASE_GPUPROP_RAW_JS_PRESENT 34 +#define KBASE_GPUPROP_RAW_JS_FEATURES_0 35 +#define KBASE_GPUPROP_RAW_JS_FEATURES_1 36 +#define KBASE_GPUPROP_RAW_JS_FEATURES_2 37 +#define KBASE_GPUPROP_RAW_JS_FEATURES_3 38 +#define KBASE_GPUPROP_RAW_JS_FEATURES_4 39 +#define KBASE_GPUPROP_RAW_JS_FEATURES_5 40 +#define KBASE_GPUPROP_RAW_JS_FEATURES_6 41 +#define KBASE_GPUPROP_RAW_JS_FEATURES_7 42 +#define KBASE_GPUPROP_RAW_JS_FEATURES_8 43 +#define KBASE_GPUPROP_RAW_JS_FEATURES_9 44 +#define KBASE_GPUPROP_RAW_JS_FEATURES_10 45 +#define KBASE_GPUPROP_RAW_JS_FEATURES_11 46 +#define KBASE_GPUPROP_RAW_JS_FEATURES_12 47 +#define KBASE_GPUPROP_RAW_JS_FEATURES_13 48 +#define KBASE_GPUPROP_RAW_JS_FEATURES_14 49 +#define KBASE_GPUPROP_RAW_JS_FEATURES_15 50 +#define KBASE_GPUPROP_RAW_TILER_FEATURES 51 +#define KBASE_GPUPROP_RAW_TEXTURE_FEATURES_0 52 +#define KBASE_GPUPROP_RAW_TEXTURE_FEATURES_1 53 +#define KBASE_GPUPROP_RAW_TEXTURE_FEATURES_2 54 +#define KBASE_GPUPROP_RAW_GPU_ID 55 +#define KBASE_GPUPROP_RAW_THREAD_MAX_THREADS 56 +#define KBASE_GPUPROP_RAW_THREAD_MAX_WORKGROUP_SIZE 57 +#define KBASE_GPUPROP_RAW_THREAD_MAX_BARRIER_SIZE 58 +#define KBASE_GPUPROP_RAW_THREAD_FEATURES 59 +#define KBASE_GPUPROP_RAW_COHERENCY_MODE 60 + +#define KBASE_GPUPROP_COHERENCY_NUM_GROUPS 61 +#define KBASE_GPUPROP_COHERENCY_NUM_CORE_GROUPS 62 +#define KBASE_GPUPROP_COHERENCY_COHERENCY 63 +#define KBASE_GPUPROP_COHERENCY_GROUP_0 64 +#define KBASE_GPUPROP_COHERENCY_GROUP_1 65 +#define KBASE_GPUPROP_COHERENCY_GROUP_2 66 +#define KBASE_GPUPROP_COHERENCY_GROUP_3 67 +#define KBASE_GPUPROP_COHERENCY_GROUP_4 68 +#define KBASE_GPUPROP_COHERENCY_GROUP_5 69 +#define KBASE_GPUPROP_COHERENCY_GROUP_6 70 +#define KBASE_GPUPROP_COHERENCY_GROUP_7 71 +#define KBASE_GPUPROP_COHERENCY_GROUP_8 72 +#define KBASE_GPUPROP_COHERENCY_GROUP_9 73 +#define KBASE_GPUPROP_COHERENCY_GROUP_10 74 +#define KBASE_GPUPROP_COHERENCY_GROUP_11 75 +#define KBASE_GPUPROP_COHERENCY_GROUP_12 76 +#define KBASE_GPUPROP_COHERENCY_GROUP_13 77 +#define KBASE_GPUPROP_COHERENCY_GROUP_14 78 +#define KBASE_GPUPROP_COHERENCY_GROUP_15 79 + +#define KBASE_GPUPROP_TEXTURE_FEATURES_3 80 +#define KBASE_GPUPROP_RAW_TEXTURE_FEATURES_3 81 + +#define KBASE_GPUPROP_NUM_EXEC_ENGINES 82 + +#define KBASE_GPUPROP_RAW_THREAD_TLS_ALLOC 83 +#define KBASE_GPUPROP_TLS_ALLOC 84 +#define KBASE_GPUPROP_RAW_GPU_FEATURES 85 #ifdef __cpluscplus } #endif diff --git a/mali_kbase/mali_kbase_mem_profile_debugfs_buf_size.h b/common/include/uapi/gpu/arm/midgard/mali_kbase_mem_profile_debugfs_buf_size.h index c2fb3f5..1649100 100644 --- a/mali_kbase/mali_kbase_mem_profile_debugfs_buf_size.h +++ b/common/include/uapi/gpu/arm/midgard/mali_kbase_mem_profile_debugfs_buf_size.h @@ -23,14 +23,13 @@ * DOC: Header file for the size of the buffer to accumulate the histogram report text in */ -#ifndef _KBASE_MEM_PROFILE_DEBUGFS_BUF_SIZE_H_ -#define _KBASE_MEM_PROFILE_DEBUGFS_BUF_SIZE_H_ +#ifndef _UAPI_KBASE_MEM_PROFILE_DEBUGFS_BUF_SIZE_H_ +#define _UAPI_KBASE_MEM_PROFILE_DEBUGFS_BUF_SIZE_H_ /** * KBASE_MEM_PROFILE_MAX_BUF_SIZE - The size of the buffer to accumulate the histogram report text * in @see @ref CCTXP_HIST_BUF_SIZE_MAX_LENGTH_REPORT */ -#define KBASE_MEM_PROFILE_MAX_BUF_SIZE ((size_t)(64 + ((80 + (56 * 64)) * 54) + 56)) - -#endif /*_KBASE_MEM_PROFILE_DEBUGFS_BUF_SIZE_H_*/ +#define KBASE_MEM_PROFILE_MAX_BUF_SIZE ((size_t)(64 + ((80 + (56 * 64)) * 55) + 56)) +#endif /*_UAPI_KBASE_MEM_PROFILE_DEBUGFS_BUF_SIZE_H_*/ diff --git 
a/common/include/uapi/gpu/arm/midgard/mali_uk.h b/common/include/uapi/gpu/arm/midgard/mali_uk.h deleted file mode 100644 index 78946f6..0000000 --- a/common/include/uapi/gpu/arm/midgard/mali_uk.h +++ /dev/null @@ -1,70 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ -/* - * - * (C) COPYRIGHT 2010, 2012-2015, 2018, 2020-2022 ARM Limited. All rights reserved. - * - * This program is free software and is provided to you under the terms of the - * GNU General Public License version 2 as published by the Free Software - * Foundation, and any use by you of this program is subject to the terms - * of such GNU license. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with this program; if not, you can access it online at - * http://www.gnu.org/licenses/gpl-2.0.html. - * - */ - -/** - * DOC: Types and definitions that are common across OSs for both the user - * and kernel side of the User-Kernel interface. - */ - -#ifndef _UAPI_UK_H_ -#define _UAPI_UK_H_ - -#ifdef __cplusplus -extern "C" { -#endif /* __cplusplus */ - -/** - * DOC: uk_api User-Kernel Interface API - * - * The User-Kernel Interface abstracts the communication mechanism between the user and kernel-side code of device - * drivers developed as part of the Midgard DDK. Currently that includes the Base driver. - * - * It exposes an OS independent API to user-side code (UKU) which routes functions calls to an OS-independent - * kernel-side API (UKK) via an OS-specific communication mechanism. - * - * This API is internal to the Midgard DDK and is not exposed to any applications. - * - */ - -/** - * enum uk_client_id - These are identifiers for kernel-side drivers - * implementing a UK interface, aka UKK clients. - * @UK_CLIENT_MALI_T600_BASE: Value used to identify the Base driver UK client. - * @UK_CLIENT_COUNT: The number of uk clients supported. This must be - * the last member of the enum - * - * The UK module maps this to an OS specific device name, e.g. "gpu_base" -> "GPU0:". Specify this - * identifier to select a UKK client to the uku_open() function. - * - * When a new UKK client driver is created a new identifier needs to be added to the uk_client_id - * enumeration and the uku_open() implemenation for the various OS ports need to be updated to - * provide a mapping of the identifier to the OS specific device name. - * - */ -enum uk_client_id { - UK_CLIENT_MALI_T600_BASE, - UK_CLIENT_COUNT -}; - -#ifdef __cplusplus -} -#endif /* __cplusplus */ -#endif /* _UAPI_UK_H_ */ diff --git a/common/include/uapi/gpu/arm/midgard/platform/pixel/pixel_gpu_common_kernel.h b/common/include/uapi/gpu/arm/midgard/platform/pixel/pixel_gpu_common_kernel.h new file mode 100644 index 0000000..d2de578 --- /dev/null +++ b/common/include/uapi/gpu/arm/midgard/platform/pixel/pixel_gpu_common_kernel.h @@ -0,0 +1,12 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright 2022-2023 Google LLC. 
+ * + * Author: Jack Diver <diverj@google.com> + */ +#ifndef _UAPI_PIXEL_GPU_COMMON_KERNEL_H_ +#define _UAPI_PIXEL_GPU_COMMON_KERNEL_H_ + +#include "pixel_gpu_common_slc.h" + +#endif /* _UAPI_PIXEL_GPU_COMMON_KERNEL_H_ */ diff --git a/common/include/uapi/gpu/arm/midgard/platform/pixel/pixel_gpu_common_slc.h b/common/include/uapi/gpu/arm/midgard/platform/pixel/pixel_gpu_common_slc.h new file mode 100644 index 0000000..76e631d --- /dev/null +++ b/common/include/uapi/gpu/arm/midgard/platform/pixel/pixel_gpu_common_slc.h @@ -0,0 +1,36 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright 2022-2023 Google LLC. + * + * Author: Jack Diver <diverj@google.com> + */ +#ifndef _UAPI_PIXEL_GPU_COMMON_SLC_H_ +#define _UAPI_PIXEL_GPU_COMMON_SLC_H_ + +#include <linux/types.h> + +/** + * enum kbase_pixel_gpu_slc_liveness_mark_type - Determines the type of a live range mark + * + * @KBASE_PIXEL_GPU_LIVE_RANGE_BEGIN: Signifies that a mark is the start of a live range + * @KBASE_PIXEL_GPU_LIVE_RANGE_END: Signifies that a mark is the end of a live range + * + */ +enum kbase_pixel_gpu_slc_liveness_mark_type { + KBASE_PIXEL_GPU_LIVE_RANGE_BEGIN, + KBASE_PIXEL_GPU_LIVE_RANGE_END, +}; + +/** + * struct kbase_pixel_gpu_slc_liveness_mark - Live range marker + * + * @type: See @struct kbase_pixel_gpu_slc_liveness_mark_type + * @index: Buffer index (within liveness update array) that this mark represents + * + */ +struct kbase_pixel_gpu_slc_liveness_mark { + __u32 type : 1; + __u32 index : 31; +}; + +#endif /* _UAPI_PIXEL_GPU_COMMON_SLC_H_ */ diff --git a/common/include/uapi/gpu/arm/midgard/platform/pixel/pixel_memory_group_manager.h b/common/include/uapi/gpu/arm/midgard/platform/pixel/pixel_memory_group_manager.h new file mode 100644 index 0000000..b575c79 --- /dev/null +++ b/common/include/uapi/gpu/arm/midgard/platform/pixel/pixel_memory_group_manager.h @@ -0,0 +1,55 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright 2022-2023 Google LLC. + * + * Author: Jack Diver <diverj@google.com> + */ +#ifndef _UAPI_PIXEL_MEMORY_GROUP_MANAGER_H_ +#define _UAPI_PIXEL_MEMORY_GROUP_MANAGER_H_ + +/** + * enum pixel_mgm_group_id - Symbolic names for used memory groups + */ +enum pixel_mgm_group_id +{ + /* The Mali driver requires that allocations made on one of the groups + * are not treated specially. + */ + MGM_RESERVED_GROUP_ID = 0, + + /* Group for memory that should be cached in the system level cache. */ + MGM_SLC_GROUP_ID = 1, + + /* Group for memory explicitly allocated in SLC. */ + MGM_SLC_EXPLICIT_GROUP_ID = 2, + + /* Imported memory is handled by the allocator of the memory, and the Mali + * DDK will request a group_id for such memory via mgm_get_import_memory_id(). + * We specify which group we want to use for this here. + */ + MGM_IMPORTED_MEMORY_GROUP_ID = (MEMORY_GROUP_MANAGER_NR_GROUPS - 1), +}; + +/** + * pixel_mgm_query_group_size - Query the current size of a memory group + * + * @mgm_dev: The memory group manager through which the request is being made. + * @group_id: Memory group to query. + * + * Returns the actual size of the memory group's active partition + */ +extern u64 pixel_mgm_query_group_size(struct memory_group_manager_device* mgm_dev, + enum pixel_mgm_group_id group_id); + +/** + * pixel_mgm_resize_group_to_fit - Resize a memory group to meet @demand, if possible + * + * @mgm_dev: The memory group manager through which the request is being made. + * @group_id: Memory group for which we will change the backing partition. + * @demand: The demanded space from the memory group. 
+ */ +extern void pixel_mgm_resize_group_to_fit(struct memory_group_manager_device* mgm_dev, + enum pixel_mgm_group_id group_id, + u64 demand); + +#endif /* _UAPI_PIXEL_MEMORY_GROUP_MANAGER_H_ */ diff --git a/mali_kbase/BUILD.bazel b/mali_kbase/BUILD.bazel index 86d8658..b987493 100644 --- a/mali_kbase/BUILD.bazel +++ b/mali_kbase/BUILD.bazel @@ -1,25 +1,59 @@ -# SPDX-License-Identifier: GPL-2.0-or-later +# This program is free software and is provided to you under the terms of the +# GNU General Public License version 2 as published by the Free Software +# Foundation, and any use by you of this program is subject to the terms +# of such GNU license. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, you can access it online at +# http://www.gnu.org/licenses/gpl-2.0.html. +# +# -load("//build/kernel/kleaf:kernel.bzl", "kernel_module") +load( + "//build/kernel/kleaf:kernel.bzl", + "kernel_module", +) + +_midgard_modules = [ + "mali_kbase.ko", + "tests/kutf/mali_kutf.ko", + "tests/mali_kutf_clk_rate_trace/kernel/mali_kutf_clk_rate_trace_test_portal.ko", +] kernel_module( name = "mali_kbase", srcs = glob([ "**/*.c", "**/*.h", - "**/Kbuild", + "**/*Kbuild", + "**/*Makefile", ]) + [ "//private/google-modules/gpu/common:headers", "//private/google-modules/soc/gs:gs_soc_headers", ], - outs = [ - "mali_kbase.ko", - ], + outs = _midgard_modules, kernel_build = "//private/google-modules/soc/gs:gs_kernel_build", visibility = [ "//private/google-modules/soc/gs:__pkg__", ], deps = [ + "//private/google-modules/gpu/mali_pixel", "//private/google-modules/soc/gs:gs_soc_module", ], ) + +filegroup( + name = "midgard_kconfig.cloudripper", + srcs = glob([ + "**/*Kconfig", + ]), + visibility = [ + "//common:__pkg__", + "//common-modules/mali:__subpackages__", + ], +) diff --git a/mali_kbase/Kbuild b/mali_kbase/Kbuild index e0703ab..666498c 100644 --- a/mali_kbase/Kbuild +++ b/mali_kbase/Kbuild @@ -1,6 +1,6 @@ # SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note # -# (C) COPYRIGHT 2012-2021 ARM Limited. All rights reserved. +# (C) COPYRIGHT 2012-2023 ARM Limited. All rights reserved. # # This program is free software and is provided to you under the terms of the # GNU General Public License version 2 as published by the Free Software @@ -59,10 +59,8 @@ ifeq ($(CONFIG_MALI_PRFCNT_SET_SELECT_VIA_DEBUG_FS), y) endif ifeq ($(CONFIG_MALI_FENCE_DEBUG), y) - ifneq ($(CONFIG_SYNC), y) - ifneq ($(CONFIG_SYNC_FILE), y) - $(error CONFIG_MALI_FENCE_DEBUG depends on CONFIG_SYNC || CONFIG_SYNC_FILE to be set in Kernel configuration) - endif + ifneq ($(CONFIG_SYNC_FILE), y) + $(error CONFIG_MALI_FENCE_DEBUG depends on CONFIG_SYNC_FILE to be set in Kernel configuration) endif endif @@ -70,12 +68,11 @@ endif # Configurations # -# Driver version string which is returned to userspace via an ioctl -MALI_RELEASE_NAME ?= '"r36p0-01eac0"' - # We are building for Pixel CONFIG_MALI_PLATFORM_NAME="pixel" +# Driver version string which is returned to userspace via an ioctl +MALI_RELEASE_NAME ?= '"r44p1-00dev3"' # Set up defaults if not defined by build system ifeq ($(CONFIG_MALI_DEBUG), y) MALI_UNIT_TEST = 1 @@ -89,9 +86,19 @@ MALI_COVERAGE ?= 0 # Kconfig passes in the name with quotes for in-tree builds - remove them. 
MALI_PLATFORM_DIR := $(shell echo $(CONFIG_MALI_PLATFORM_NAME)) +ifneq ($(CONFIG_SOC_GS101),y) + CONFIG_MALI_CSF_SUPPORT ?= y +endif + ifeq ($(CONFIG_MALI_CSF_SUPPORT),y) MALI_JIT_PRESSURE_LIMIT_BASE = 0 MALI_USE_CSF = 1 + ccflags-y += -DCONFIG_MALI_PIXEL_GPU_SSCD +ifeq ($(CONFIG_SOC_GS201),y) +ifeq ($(CONFIG_MALI_HOST_CONTROLS_SC_RAILS),y) + ccflags-y += -DCONFIG_MALI_HOST_CONTROLS_SC_RAILS +endif +endif else MALI_JIT_PRESSURE_LIMIT_BASE ?= 1 MALI_USE_CSF ?= 0 @@ -110,12 +117,12 @@ endif # # Experimental features must default to disabled, e.g.: # MALI_EXPERIMENTAL_FEATURE ?= 0 -MALI_INCREMENTAL_RENDERING ?= 0 +MALI_INCREMENTAL_RENDERING_JM ?= 0 # # ccflags # -ccflags-y = \ +ccflags-y += \ -DMALI_CUSTOMER_RELEASE=$(MALI_CUSTOMER_RELEASE) \ -DMALI_USE_CSF=$(MALI_USE_CSF) \ -DMALI_KERNEL_TEST_API=$(MALI_KERNEL_TEST_API) \ @@ -123,10 +130,9 @@ ccflags-y = \ -DMALI_COVERAGE=$(MALI_COVERAGE) \ -DMALI_RELEASE_NAME=$(MALI_RELEASE_NAME) \ -DMALI_JIT_PRESSURE_LIMIT_BASE=$(MALI_JIT_PRESSURE_LIMIT_BASE) \ - -DMALI_INCREMENTAL_RENDERING=$(MALI_INCREMENTAL_RENDERING) \ + -DMALI_INCREMENTAL_RENDERING_JM=$(MALI_INCREMENTAL_RENDERING_JM) \ -DMALI_PLATFORM_DIR=$(MALI_PLATFORM_DIR) - ifeq ($(KBUILD_EXTMOD),) # in-tree ccflags-y +=-DMALI_KBASE_PLATFORM_PATH=../../$(src)/platform/$(CONFIG_MALI_PLATFORM_NAME) @@ -139,7 +145,8 @@ ccflags-y += \ -I$(src) \ -I$(src)/platform/$(MALI_PLATFORM_DIR) \ -I$(src)/../../../base \ - -I$(src)/../../../../include + -I$(src)/../../../../include \ + -I$(src)/tests/include # Add include path for related GPU modules ccflags-y += -I$(src)/../common/include @@ -150,13 +157,14 @@ subdir-ccflags-y += $(ccflags-y) # Kernel Modules # obj-$(CONFIG_MALI_MIDGARD) += mali_kbase.o -obj-$(CONFIG_MALI_ARBITRATION) += arbitration/ +obj-$(CONFIG_MALI_ARBITRATION) += ../arbitration/ obj-$(CONFIG_MALI_KUTF) += tests/ mali_kbase-y := \ mali_kbase_cache_policy.o \ mali_kbase_ccswe.o \ mali_kbase_mem.o \ + mali_kbase_mem_migrate.o \ mali_kbase_mem_pool_group.o \ mali_kbase_native_mgm.o \ mali_kbase_ctx_sched.o \ @@ -165,12 +173,6 @@ mali_kbase-y := \ mali_kbase_config.o \ mali_kbase_kinstr_prfcnt.o \ mali_kbase_vinstr.o \ - mali_kbase_hwcnt.o \ - mali_kbase_hwcnt_gpu.o \ - mali_kbase_hwcnt_gpu_narrow.o \ - mali_kbase_hwcnt_types.o \ - mali_kbase_hwcnt_virtualizer.o \ - mali_kbase_hwcnt_watchdog_if_timer.o \ mali_kbase_softjobs.o \ mali_kbase_hw.o \ mali_kbase_debug.o \ @@ -180,11 +182,12 @@ mali_kbase-y := \ mali_kbase_mem_profile_debugfs.o \ mali_kbase_disjoint_events.o \ mali_kbase_debug_mem_view.o \ + mali_kbase_debug_mem_zones.o \ + mali_kbase_debug_mem_allocs.o \ mali_kbase_smc.o \ mali_kbase_mem_pool.o \ mali_kbase_mem_pool_debugfs.o \ mali_kbase_debugfs_helper.o \ - mali_kbase_strings.o \ mali_kbase_as_fault_debugfs.o \ mali_kbase_regs_history_debugfs.o \ mali_kbase_dvfs_debugfs.o \ @@ -196,24 +199,18 @@ mali_kbase-$(CONFIG_DEBUG_FS) += mali_kbase_pbha_debugfs.o mali_kbase-$(CONFIG_MALI_CINSTR_GWT) += mali_kbase_gwt.o -mali_kbase-$(CONFIG_SYNC) += \ - mali_kbase_sync_android.o \ - mali_kbase_sync_common.o - mali_kbase-$(CONFIG_SYNC_FILE) += \ mali_kbase_fence_ops.o \ mali_kbase_sync_file.o \ mali_kbase_sync_common.o -ifeq ($(CONFIG_MALI_CSF_SUPPORT),y) - mali_kbase-y += \ - mali_kbase_hwcnt_backend_csf.o \ - mali_kbase_hwcnt_backend_csf_if_fw.o -else +mali_kbase-$(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) += \ + mali_power_gpu_work_period_trace.o \ + mali_kbase_gpu_metrics.o + +ifneq ($(CONFIG_MALI_CSF_SUPPORT),y) mali_kbase-y += \ mali_kbase_jm.o \ - mali_kbase_hwcnt_backend_jm.o \ - 
mali_kbase_hwcnt_backend_jm_watchdog.o \ mali_kbase_dummy_job_wa.o \ mali_kbase_debug_job_fault.o \ mali_kbase_event.o \ @@ -223,11 +220,6 @@ else mali_kbase_js_ctx_attr.o \ mali_kbase_kinstr_jm.o - mali_kbase-$(CONFIG_MALI_DMA_FENCE) += \ - mali_kbase_fence_ops.o \ - mali_kbase_dma_fence.o \ - mali_kbase_fence.o - mali_kbase-$(CONFIG_SYNC_FILE) += \ mali_kbase_fence_ops.o \ mali_kbase_fence.o @@ -241,6 +233,7 @@ INCLUDE_SUBDIR = \ $(src)/backend/gpu/Kbuild \ $(src)/mmu/Kbuild \ $(src)/tl/Kbuild \ + $(src)/hwcnt/Kbuild \ $(src)/gpu/Kbuild \ $(src)/thirdparty/Kbuild \ $(src)/platform/$(MALI_PLATFORM_DIR)/Kbuild diff --git a/mali_kbase/Kconfig b/mali_kbase/Kconfig index a563d35..bb25ef4 100644 --- a/mali_kbase/Kconfig +++ b/mali_kbase/Kconfig @@ -1,6 +1,6 @@ # SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note # -# (C) COPYRIGHT 2012-2021 ARM Limited. All rights reserved. +# (C) COPYRIGHT 2012-2023 ARM Limited. All rights reserved. # # This program is free software and is provided to you under the terms of the # GNU General Public License version 2 as published by the Free Software @@ -43,12 +43,40 @@ config MALI_PLATFORM_NAME include in the build. 'platform/$(MALI_PLATFORM_NAME)/Kbuild' must exist. +choice + prompt "Mali HW backend" + depends on MALI_MIDGARD + default MALI_REAL_HW + config MALI_REAL_HW + bool "Enable build of Mali kernel driver for real HW" depends on MALI_MIDGARD - def_bool !MALI_NO_MALI + help + This is the default HW backend. + +config MALI_NO_MALI + bool "Enable build of Mali kernel driver for No Mali" + depends on MALI_MIDGARD && MALI_EXPERT + help + This can be used to test the driver in a simulated environment + whereby the hardware is not physically present. If the hardware is physically + present it will not be used. This can be used to test the majority of the + driver without needing actual hardware or for software benchmarking. + All calls to the simulated hardware will complete immediately as if the hardware + completed the task. + +config MALI_NO_MALI_DEFAULT_GPU + string "Default GPU for No Mali" + depends on MALI_NO_MALI + default "tMIx" + help + This option sets the default GPU to identify as for No Mali builds. + + +endchoice menu "Platform specific options" -source "drivers/gpu/arm/midgard/platform/Kconfig" +source "$(MALI_KCONFIG_EXT_PREFIX)drivers/gpu/arm/midgard/platform/Kconfig" endmenu config MALI_CSF_SUPPORT @@ -94,16 +122,6 @@ config MALI_MIDGARD_ENABLE_TRACE Enables tracing in kbase. Trace log available through the "mali_trace" debugfs file, when the CONFIG_DEBUG_FS is enabled -config MALI_DMA_FENCE - bool "Enable DMA_BUF fence support for Mali" - depends on MALI_MIDGARD - default n - help - Support DMA_BUF fences for Mali. - - This option should only be enabled if the Linux Kernel has built in - support for DMA_BUF fences. - config MALI_ARBITER_SUPPORT bool "Enable arbiter support for Mali" depends on MALI_MIDGARD && !MALI_CSF_SUPPORT @@ -120,7 +138,7 @@ config MALI_DMA_BUF_MAP_ON_DEMAND depends on MALI_MIDGARD default n help - This option caused kbase to set up the GPU mapping of imported + This option will cause kbase to set up the GPU mapping of imported dma-buf when needed to run atoms. This is the legacy behavior. This is intended for testing and the option will get removed in the @@ -140,6 +158,11 @@ config MALI_DMA_BUF_LEGACY_COMPAT flushes in other drivers. This only has an effect for clients using UK 11.18 or older. For later UK versions it is not possible. 
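The "Mali HW backend" choice added in the Kconfig hunk above replaces the old standalone MALI_NO_MALI option and introduces MALI_NO_MALI_DEFAULT_GPU, the GPU name a simulated (No Mali) build identifies as by default. As a rough illustration only (the symbol names below are assumptions, not code from this patch), such a default is typically surfaced as a read-only module parameter:

#include <linux/module.h>
#include <linux/moduleparam.h>

#ifndef CONFIG_MALI_NO_MALI_DEFAULT_GPU
#define CONFIG_MALI_NO_MALI_DEFAULT_GPU "tMIx" /* mirrors the Kconfig default above */
#endif

/* GPU the dummy model reports; can be overridden on the insmod command line. */
static char *no_mali_gpu = CONFIG_MALI_NO_MALI_DEFAULT_GPU;
module_param(no_mali_gpu, charp, 0444);
MODULE_PARM_DESC(no_mali_gpu, "GPU to identify as when no hardware is present");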
+config MALI_CORESIGHT + depends on MALI_MIDGARD && MALI_CSF_SUPPORT && !MALI_NO_MALI + bool "Enable Kbase CoreSight tracing support" + default n + menuconfig MALI_EXPERT depends on MALI_MIDGARD bool "Enable Expert Settings" @@ -150,7 +173,19 @@ menuconfig MALI_EXPERT if MALI_EXPERT -config MALI_2MB_ALLOC +config LARGE_PAGE_ALLOC_OVERRIDE + bool "Override default setting of 2MB pages" + depends on MALI_MIDGARD && MALI_EXPERT + default n + help + An override config for LARGE_PAGE_ALLOC config. + When LARGE_PAGE_ALLOC_OVERRIDE is Y, 2MB page allocation will be + enabled by LARGE_PAGE_ALLOC. When this is N, the feature will be + enabled when GPU HW satisfies requirements. + + If in doubt, say N + +config LARGE_PAGE_ALLOC bool "Attempt to allocate 2MB pages" depends on MALI_MIDGARD && MALI_EXPERT default n @@ -159,8 +194,28 @@ config MALI_2MB_ALLOC allocate 2MB pages from the kernel. This reduces TLB pressure and helps to prevent memory fragmentation. + Note this config applies only when LARGE_PAGE_ALLOC_OVERRIDE config + is enabled and enabling this on a GPU HW that does not satisfy + requirements can cause serious problem. + If in doubt, say N +config PAGE_MIGRATION_SUPPORT + bool "Enable support for page migration" + depends on MALI_MIDGARD && MALI_EXPERT + default y + default n if ANDROID + help + Compile in support for page migration. + If set to disabled ('n') then page migration cannot + be enabled at all, and related symbols are not compiled in. + If not set, page migration is compiled in by default, and + if not explicitly enabled or disabled with the insmod parameter, + page migration becomes automatically enabled with large pages. + + If in doubt, say Y. To strip out page migration symbols and support, + say N. + config MALI_MEMORY_FULLY_BACKED bool "Enable memory fully physically-backed" depends on MALI_MIDGARD && MALI_EXPERT @@ -187,18 +242,6 @@ config MALI_CORESTACK comment "Platform options" depends on MALI_MIDGARD && MALI_EXPERT -config MALI_NO_MALI - bool "Enable No Mali" - depends on MALI_MIDGARD && MALI_EXPERT - default n - help - This can be used to test the driver in a simulated environment - whereby the hardware is not physically present. If the hardware is physically - present it will not be used. This can be used to test the majority of the - driver without needing actual hardware or for software benchmarking. - All calls to the simulated hardware will complete immediately as if the hardware - completed the task. - config MALI_ERROR_INJECT bool "Enable No Mali error injection" depends on MALI_MIDGARD && MALI_EXPERT && MALI_NO_MALI @@ -206,14 +249,6 @@ config MALI_ERROR_INJECT help Enables insertion of errors to test module failure and recovery mechanisms. -config MALI_GEM5_BUILD - bool "Enable build of Mali kernel driver for GEM5" - depends on MALI_MIDGARD && MALI_EXPERT - default n - help - This option is to do a Mali GEM5 build. - If unsure, say N. - comment "Debug options" depends on MALI_MIDGARD && MALI_EXPERT @@ -226,7 +261,7 @@ config MALI_DEBUG config MALI_FENCE_DEBUG bool "Enable debug sync fence usage" - depends on MALI_MIDGARD && MALI_EXPERT && (SYNC || SYNC_FILE) + depends on MALI_MIDGARD && MALI_EXPERT && SYNC_FILE default y if MALI_DEBUG help Select this option to enable additional checking and reporting on the @@ -363,6 +398,15 @@ config MALI_HW_ERRATA_1485982_USE_CLOCK_ALTERNATIVE tree using the property, opp-mali-errata-1485982. Otherwise the slowest clock will be selected. 
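The MALI_HW_ERRATA_1485982_USE_CLOCK_ALTERNATIVE help text above depends on the device tree flagging one OPP with the opp-mali-errata-1485982 property. A minimal sketch of how a driver could test an OPP node for that marker with the standard OF helpers follows; the function name is invented for illustration and is not part of this patch.

#include <linux/of.h>
#include <linux/types.h>

/* True if this OPP node is flagged as the errata-1485982 alternative clock. */
static bool opp_has_mali_errata_1485982(const struct device_node *opp_node)
{
        return of_property_read_bool(opp_node, "opp-mali-errata-1485982");
}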
+config MALI_HOST_CONTROLS_SC_RAILS + bool "Enable Host based control of the shader core power rails" + depends on MALI_CSF_SUPPORT + default n + help + This option enables the Host based control of the power rails for + shader cores. It is recommended to use PDCA (Power Domain Control + Adapter) inside the GPU to handshake with SoC PMU to control the + power of cores. endif config MALI_ARBITRATION @@ -374,10 +418,16 @@ config MALI_ARBITRATION virtualization setup for Mali If unsure, say N. -if MALI_ARBITRATION -source "drivers/gpu/arm/midgard/arbitration/Kconfig" -endif +config MALI_TRACE_POWER_GPU_WORK_PERIOD + bool "Enable per-application GPU metrics tracepoints" + depends on MALI_MIDGARD + default y + help + This option enables per-application GPU metrics tracepoints. + + If unsure, say N. + -source "drivers/gpu/arm/midgard/tests/Kconfig" +source "$(MALI_KCONFIG_EXT_PREFIX)drivers/gpu/arm/midgard/tests/Kconfig" endif diff --git a/mali_kbase/Makefile b/mali_kbase/Makefile index ae4609c..6ee3a2d 100644 --- a/mali_kbase/Makefile +++ b/mali_kbase/Makefile @@ -1,6 +1,6 @@ # SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note # -# (C) COPYRIGHT 2010-2021 ARM Limited. All rights reserved. +# (C) COPYRIGHT 2010-2023 ARM Limited. All rights reserved. # # This program is free software and is provided to you under the terms of the # GNU General Public License version 2 as published by the Free Software @@ -20,8 +20,6 @@ KERNEL_SRC ?= /lib/modules/$(shell uname -r)/build KDIR ?= $(KERNEL_SRC) - -# Ensure build intermediates are in OUT_DIR instead of alongside the source M ?= $(shell pwd) ifeq ($(KDIR),) @@ -33,17 +31,21 @@ endif # Pixel integration configuration values # +# Debug Ftrace configuration options +CONFIG_MALI_SYSTEM_TRACE=y + # Core kbase configuration options CONFIG_MALI_EXPERT=y CONFIG_MALI_MIDGARD_DVFS=y +CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD = y # Pixel integration specific configuration options CONFIG_MALI_PLATFORM_NAME="pixel" -CONFIG_MALI_PIXEL_GPU_QOS=y -CONFIG_MALI_PIXEL_GPU_BTS=y -CONFIG_MALI_PIXEL_GPU_THERMAL=y -CONFIG_MALI_PIXEL_GPU_SECURE_RENDERING=y - +CONFIG_MALI_PIXEL_GPU_QOS ?= y +CONFIG_MALI_PIXEL_GPU_BTS ?= y +CONFIG_MALI_PIXEL_GPU_SECURE_RENDERING ?= y +CONFIG_MALI_PIXEL_GPU_THERMAL ?= y +CONFIG_MALI_PIXEL_GPU_SLC ?= y # # Default configuration values @@ -51,175 +53,179 @@ CONFIG_MALI_PIXEL_GPU_SECURE_RENDERING=y # Dependency resolution is done through statements as Kconfig # is not supported for out-of-tree builds. 
# +CONFIGS := +ifeq ($(MALI_KCONFIG_EXT_PREFIX),) + CONFIG_MALI_MIDGARD ?= m + ifeq ($(CONFIG_MALI_MIDGARD),m) + CONFIG_MALI_PLATFORM_NAME ?= "devicetree" + CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD ?= y + CONFIG_MALI_GATOR_SUPPORT ?= y + CONFIG_MALI_ARBITRATION ?= n + CONFIG_MALI_PARTITION_MANAGER ?= n -CONFIG_MALI_MIDGARD ?= m -ifeq ($(CONFIG_MALI_MIDGARD),m) - CONFIG_MALI_PLATFORM_NAME ?= "devicetree" - CONFIG_MALI_GATOR_SUPPORT ?= y - CONFIG_MALI_ARBITRATION ?= n - CONFIG_MALI_PARTITION_MANAGER ?= n - - ifeq ($(origin CONFIG_MALI_ABITER_MODULES), undefined) - CONFIG_MALI_ARBITER_MODULES := $(CONFIG_MALI_ARBITRATION) - endif - - ifeq ($(origin CONFIG_MALI_GPU_POWER_MODULES), undefined) - CONFIG_MALI_GPU_POWER_MODULES := $(CONFIG_MALI_ARBITRATION) - endif - - ifneq ($(CONFIG_MALI_NO_MALI),y) - # Prevent misuse when CONFIG_MALI_NO_MALI=y - CONFIG_MALI_REAL_HW ?= y - endif - - ifeq ($(CONFIG_MALI_MIDGARD_DVFS),y) - # Prevent misuse when CONFIG_MALI_MIDGARD_DVFS=y - CONFIG_MALI_DEVFREQ ?= n - else - CONFIG_MALI_DEVFREQ ?= y - endif - - ifeq ($(CONFIG_MALI_DMA_BUF_MAP_ON_DEMAND), y) - # Prevent misuse when CONFIG_MALI_DMA_BUF_MAP_ON_DEMAND=y - CONFIG_MALI_DMA_BUF_LEGACY_COMPAT = n - endif - - ifeq ($(CONFIG_XEN),y) - ifneq ($(CONFIG_MALI_ARBITRATION), n) - CONFIG_MALI_XEN ?= m + ifneq ($(CONFIG_MALI_NO_MALI),y) + # Prevent misuse when CONFIG_MALI_NO_MALI=y + CONFIG_MALI_REAL_HW ?= y + CONFIG_MALI_CORESIGHT = n endif - endif - # - # Expert/Debug/Test released configurations - # - ifeq ($(CONFIG_MALI_EXPERT), y) - ifeq ($(CONFIG_MALI_NO_MALI), y) - CONFIG_MALI_REAL_HW = n + ifeq ($(CONFIG_MALI_MIDGARD_DVFS),y) + # Prevent misuse when CONFIG_MALI_MIDGARD_DVFS=y + CONFIG_MALI_DEVFREQ ?= n else - # Prevent misuse when CONFIG_MALI_NO_MALI=n - CONFIG_MALI_REAL_HW = y - CONFIG_MALI_ERROR_INJECT = n + CONFIG_MALI_DEVFREQ ?= y endif - ifeq ($(CONFIG_MALI_HW_ERRATA_1485982_NOT_AFFECTED), y) - # Prevent misuse when CONFIG_MALI_HW_ERRATA_1485982_NOT_AFFECTED=y - CONFIG_MALI_HW_ERRATA_1485982_USE_CLOCK_ALTERNATIVE = n + ifeq ($(CONFIG_MALI_DMA_BUF_MAP_ON_DEMAND), y) + # Prevent misuse when CONFIG_MALI_DMA_BUF_MAP_ON_DEMAND=y + CONFIG_MALI_DMA_BUF_LEGACY_COMPAT = n endif - ifeq ($(CONFIG_MALI_DEBUG), y) - CONFIG_MALI_MIDGARD_ENABLE_TRACE ?= y - CONFIG_MALI_SYSTEM_TRACE ?= y + ifeq ($(CONFIG_MALI_CSF_SUPPORT), y) + CONFIG_MALI_CORESIGHT ?= n + endif + + # + # Expert/Debug/Test released configurations + # + ifeq ($(CONFIG_MALI_EXPERT), y) + ifeq ($(CONFIG_MALI_NO_MALI), y) + CONFIG_MALI_REAL_HW = n + CONFIG_MALI_NO_MALI_DEFAULT_GPU ?= "tMIx" - ifeq ($(CONFIG_SYNC), y) - CONFIG_MALI_FENCE_DEBUG ?= y else + # Prevent misuse when CONFIG_MALI_NO_MALI=n + CONFIG_MALI_REAL_HW = y + CONFIG_MALI_ERROR_INJECT = n + endif + + + ifeq ($(CONFIG_MALI_HW_ERRATA_1485982_NOT_AFFECTED), y) + # Prevent misuse when CONFIG_MALI_HW_ERRATA_1485982_NOT_AFFECTED=y + CONFIG_MALI_HW_ERRATA_1485982_USE_CLOCK_ALTERNATIVE = n + endif + + ifeq ($(CONFIG_MALI_DEBUG), y) + CONFIG_MALI_MIDGARD_ENABLE_TRACE ?= y + CONFIG_MALI_SYSTEM_TRACE ?= y + ifeq ($(CONFIG_SYNC_FILE), y) CONFIG_MALI_FENCE_DEBUG ?= y else CONFIG_MALI_FENCE_DEBUG = n endif + else + # Prevent misuse when CONFIG_MALI_DEBUG=n + CONFIG_MALI_MIDGARD_ENABLE_TRACE = n + CONFIG_MALI_FENCE_DEBUG = n endif else - # Prevent misuse when CONFIG_MALI_DEBUG=n + # Prevent misuse when CONFIG_MALI_EXPERT=n + CONFIG_MALI_CORESTACK = n + CONFIG_LARGE_PAGE_ALLOC_OVERRIDE = n + CONFIG_LARGE_PAGE_ALLOC = n + CONFIG_MALI_PWRSOFT_765 = n + CONFIG_MALI_MEMORY_FULLY_BACKED = n + CONFIG_MALI_JOB_DUMP = 
n + CONFIG_MALI_NO_MALI = n + CONFIG_MALI_REAL_HW = y + CONFIG_MALI_ERROR_INJECT = n + CONFIG_MALI_HW_ERRATA_1485982_NOT_AFFECTED = n + CONFIG_MALI_HW_ERRATA_1485982_USE_CLOCK_ALTERNATIVE = n + CONFIG_MALI_HOST_CONTROLS_SC_RAILS = n + CONFIG_MALI_PRFCNT_SET_SELECT_VIA_DEBUG_FS = n + CONFIG_MALI_DEBUG = n CONFIG_MALI_MIDGARD_ENABLE_TRACE = n - CONFIG_MALI_SYSTEM_TRACE = n CONFIG_MALI_FENCE_DEBUG = n endif - else - # Prevent misuse when CONFIG_MALI_EXPERT=n - CONFIG_MALI_CORESTACK = n - CONFIG_MALI_2MB_ALLOC = n - CONFIG_MALI_PWRSOFT_765 = n - CONFIG_MALI_MEMORY_FULLY_BACKED = n - CONFIG_MALI_JOB_DUMP = n - CONFIG_MALI_NO_MALI = n - CONFIG_MALI_REAL_HW = y - CONFIG_MALI_ERROR_INJECT = n - CONFIG_MALI_HW_ERRATA_1485982_NOT_AFFECTED = n - CONFIG_MALI_HW_ERRATA_1485982_USE_CLOCK_ALTERNATIVE = n - CONFIG_MALI_PRFCNT_SET_SELECT_VIA_DEBUG_FS = n - CONFIG_MALI_DEBUG = n - CONFIG_MALI_MIDGARD_ENABLE_TRACE = n - CONFIG_MALI_SYSTEM_TRACE = n - CONFIG_MALI_FENCE_DEBUG = n - endif - ifeq ($(CONFIG_MALI_DEBUG), y) - CONFIG_MALI_KUTF ?= y - ifeq ($(CONFIG_MALI_KUTF), y) - CONFIG_MALI_KUTF_IRQ_TEST ?= y - CONFIG_MALI_KUTF_CLK_RATE_TRACE ?= y + ifeq ($(CONFIG_MALI_DEBUG), y) + CONFIG_MALI_KUTF ?= y + ifeq ($(CONFIG_MALI_KUTF), y) + CONFIG_MALI_KUTF_IRQ_TEST ?= y + CONFIG_MALI_KUTF_CLK_RATE_TRACE ?= y + CONFIG_MALI_KUTF_MGM_INTEGRATION_TEST ?= y + ifeq ($(CONFIG_MALI_DEVFREQ), y) + ifeq ($(CONFIG_MALI_NO_MALI), y) + CONFIG_MALI_KUTF_IPA_UNIT_TEST ?= y + endif + endif + + else + # Prevent misuse when CONFIG_MALI_KUTF=n + CONFIG_MALI_KUTF_IRQ_TEST = n + CONFIG_MALI_KUTF_CLK_RATE_TRACE = n + CONFIG_MALI_KUTF_MGM_INTEGRATION_TEST = n + endif else - # Prevent misuse when CONFIG_MALI_KUTF=n + # Prevent misuse when CONFIG_MALI_DEBUG=n + CONFIG_MALI_KUTF = y CONFIG_MALI_KUTF_IRQ_TEST = n - CONFIG_MALI_KUTF_CLK_RATE_TRACE = n + CONFIG_MALI_KUTF_CLK_RATE_TRACE = y + CONFIG_MALI_KUTF_MGM_INTEGRATION_TEST = n endif else - # Prevent misuse when CONFIG_MALI_DEBUG=n + # Prevent misuse when CONFIG_MALI_MIDGARD=n + CONFIG_MALI_ARBITRATION = n CONFIG_MALI_KUTF = n CONFIG_MALI_KUTF_IRQ_TEST = n - CONFIG_MALI_KUTF_CLK_RATE_TRACE = n + CONFIG_MALI_KUTF_CLK_RATE_TRACE = y + CONFIG_MALI_KUTF_MGM_INTEGRATION_TEST = n endif -else - # Prevent misuse when CONFIG_MALI_MIDGARD=n - CONFIG_MALI_ARBITRATION = n - CONFIG_MALI_ARBITER_MODULES = n - CONFIG_MALI_GPU_POWER_MODULES = n - CONFIG_MALI_KUTF = n - CONFIG_MALI_KUTF_IRQ_TEST = n - CONFIG_MALI_KUTF_CLK_RATE_TRACE = n -endif -# All Mali CONFIG should be listed here -CONFIGS := \ - CONFIG_MALI_MIDGARD \ - CONFIG_MALI_CSF_SUPPORT \ - CONFIG_MALI_GATOR_SUPPORT \ - CONFIG_MALI_DMA_FENCE \ - CONFIG_MALI_ARBITER_SUPPORT \ - CONFIG_MALI_ARBITRATION \ - CONFIG_MALI_ARBITER_MODULES \ - CONFIG_MALI_GPU_POWER_MODULES \ - CONFIG_MALI_PARTITION_MANAGER \ - CONFIG_MALI_REAL_HW \ - CONFIG_MALI_GEM5_BUILD \ - CONFIG_MALI_DEVFREQ \ - CONFIG_MALI_MIDGARD_DVFS \ - CONFIG_MALI_DMA_BUF_MAP_ON_DEMAND \ - CONFIG_MALI_DMA_BUF_LEGACY_COMPAT \ - CONFIG_MALI_EXPERT \ - CONFIG_MALI_CORESTACK \ - CONFIG_MALI_2MB_ALLOC \ - CONFIG_MALI_PWRSOFT_765 \ - CONFIG_MALI_MEMORY_FULLY_BACKED \ - CONFIG_MALI_JOB_DUMP \ - CONFIG_MALI_NO_MALI \ - CONFIG_MALI_ERROR_INJECT \ - CONFIG_MALI_HW_ERRATA_1485982_NOT_AFFECTED \ - CONFIG_MALI_HW_ERRATA_1485982_USE_CLOCK_ALTERNATIVE \ - CONFIG_MALI_PRFCNT_SET_PRIMARY \ - CONFIG_MALI_PRFCNT_SET_SECONDARY \ - CONFIG_MALI_PRFCNT_SET_TERTIARY \ - CONFIG_MALI_PRFCNT_SET_SELECT_VIA_DEBUG_FS \ - CONFIG_MALI_DEBUG \ - CONFIG_MALI_MIDGARD_ENABLE_TRACE \ - CONFIG_MALI_SYSTEM_TRACE \ - 
CONFIG_MALI_FENCE_DEBUG \ - CONFIG_MALI_KUTF \ - CONFIG_MALI_KUTF_IRQ_TEST \ - CONFIG_MALI_KUTF_CLK_RATE_TRACE \ - CONFIG_MALI_XEN - -# Pixel integration CONFIG options -CONFIGS += \ - CONFIG_MALI_PIXEL_GPU_QOS \ - CONFIG_MALI_PIXEL_GPU_BTS \ - CONFIG_MALI_PIXEL_GPU_THERMAL \ - CONFIG_MALI_PIXEL_GPU_SECURE_RENDERING + # All Mali CONFIG should be listed here + CONFIGS := \ + CONFIG_MALI_MIDGARD \ + CONFIG_MALI_GATOR_SUPPORT \ + CONFIG_MALI_ARBITER_SUPPORT \ + CONFIG_MALI_ARBITRATION \ + CONFIG_MALI_PARTITION_MANAGER \ + CONFIG_MALI_REAL_HW \ + CONFIG_MALI_DEVFREQ \ + CONFIG_MALI_MIDGARD_DVFS \ + CONFIG_MALI_DMA_BUF_MAP_ON_DEMAND \ + CONFIG_MALI_DMA_BUF_LEGACY_COMPAT \ + CONFIG_MALI_EXPERT \ + CONFIG_MALI_CORESTACK \ + CONFIG_LARGE_PAGE_ALLOC_OVERRIDE \ + CONFIG_LARGE_PAGE_ALLOC \ + CONFIG_MALI_PWRSOFT_765 \ + CONFIG_MALI_MEMORY_FULLY_BACKED \ + CONFIG_MALI_JOB_DUMP \ + CONFIG_MALI_NO_MALI \ + CONFIG_MALI_ERROR_INJECT \ + CONFIG_MALI_HW_ERRATA_1485982_NOT_AFFECTED \ + CONFIG_MALI_HW_ERRATA_1485982_USE_CLOCK_ALTERNATIVE \ + CONFIG_MALI_HOST_CONTROLS_SC_RAILS \ + CONFIG_MALI_PRFCNT_SET_PRIMARY \ + CONFIG_MALI_PRFCNT_SET_SECONDARY \ + CONFIG_MALI_PRFCNT_SET_TERTIARY \ + CONFIG_MALI_PRFCNT_SET_SELECT_VIA_DEBUG_FS \ + CONFIG_MALI_DEBUG \ + CONFIG_MALI_MIDGARD_ENABLE_TRACE \ + CONFIG_MALI_SYSTEM_TRACE \ + CONFIG_MALI_FENCE_DEBUG \ + CONFIG_MALI_KUTF \ + CONFIG_MALI_KUTF_IRQ_TEST \ + CONFIG_MALI_KUTF_CLK_RATE_TRACE \ + CONFIG_MALI_KUTF_MGM_INTEGRATION_TEST \ + CONFIG_MALI_XEN \ + CONFIG_MALI_CORESIGHT \ + CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD + # Pixel integration CONFIG options + CONFIGS += \ + CONFIG_MALI_PIXEL_GPU_QOS \ + CONFIG_MALI_PIXEL_GPU_BTS \ + CONFIG_MALI_PIXEL_GPU_THERMAL \ + CONFIG_MALI_PIXEL_GPU_SECURE_RENDERING \ + CONFIG_MALI_PIXEL_GPU_SLC + +endif + +THIS_DIR := $(dir $(lastword $(MAKEFILE_LIST))) +-include $(THIS_DIR)/../arbitration/Makefile -# # MAKE_ARGS to pass the custom CONFIGs on out-of-tree build # # Generate the list of CONFIGs and values. 
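The CONFIGS list above feeds the MAKE_ARGS and EXTRA_CFLAGS rules in the hunks that follow: for out-of-tree builds, every option that resolves to y or m is handed to the compiler as -DCONFIG_<NAME>=1, so driver code can gate features with ordinary preprocessor checks. A minimal sketch using the Pixel SLC option as an example (the helper below is illustrative, not code from this patch):

#include <linux/kconfig.h>
#include <linux/types.h>

/* Evaluates to true when CONFIG_MALI_PIXEL_GPU_SLC is set to y/m by Kconfig
 * (in-tree) or passed as -DCONFIG_MALI_PIXEL_GPU_SLC=1 (out-of-tree).
 */
static inline bool pixel_gpu_slc_enabled(void)
{
        return IS_ENABLED(CONFIG_MALI_PIXEL_GPU_SLC);
}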
@@ -231,7 +237,9 @@ MAKE_ARGS := $(foreach config,$(CONFIGS), \ $(value config)=$(value $(value config)), \ $(value config)=n)) -MAKE_ARGS += CONFIG_MALI_PLATFORM_NAME=$(CONFIG_MALI_PLATFORM_NAME) +ifeq ($(MALI_KCONFIG_EXT_PREFIX),) + MAKE_ARGS += CONFIG_MALI_PLATFORM_NAME=$(CONFIG_MALI_PLATFORM_NAME) +endif # # EXTRA_CFLAGS to define the custom CONFIGs on out-of-tree build @@ -243,13 +251,71 @@ EXTRA_CFLAGS := $(foreach config,$(CONFIGS), \ $(if $(filter y m,$(value $(value config))), \ -D$(value config)=1)) -EXTRA_CFLAGS += -DCONFIG_MALI_PLATFORM_NAME=$(CONFIG_MALI_PLATFORM_NAME) +ifeq ($(MALI_KCONFIG_EXT_PREFIX),) + EXTRA_CFLAGS += -DCONFIG_MALI_PLATFORM_NAME='\"$(CONFIG_MALI_PLATFORM_NAME)\"' + EXTRA_CFLAGS += -DCONFIG_MALI_NO_MALI_DEFAULT_GPU='\"$(CONFIG_MALI_NO_MALI_DEFAULT_GPU)\"' +endif include $(KDIR)/../private/google-modules/soc/gs/Makefile.include # # KBUILD_EXTRA_SYMBOLS to prevent warnings about unknown functions # +EXTRA_SYMBOLS += $(OUT_DIR)/../private/google-modules/gpu/mali_pixel/Module.symvers + +CFLAGS_MODULE += -Wall -Werror + +# The following were added to align with W=1 in scripts/Makefile.extrawarn +# from the Linux source tree (v5.18.14) +CFLAGS_MODULE += -Wextra -Wunused -Wno-unused-parameter +CFLAGS_MODULE += -Wmissing-declarations +CFLAGS_MODULE += -Wmissing-format-attribute +CFLAGS_MODULE += -Wmissing-prototypes +CFLAGS_MODULE += -Wold-style-definition +# The -Wmissing-include-dirs cannot be enabled as the path to some of the +# included directories change depending on whether it is an in-tree or +# out-of-tree build. +CFLAGS_MODULE += $(call cc-option, -Wunused-but-set-variable) +CFLAGS_MODULE += $(call cc-option, -Wunused-const-variable) +CFLAGS_MODULE += $(call cc-option, -Wpacked-not-aligned) +CFLAGS_MODULE += $(call cc-option, -Wstringop-truncation) +# The following turn off the warnings enabled by -Wextra +CFLAGS_MODULE += -Wno-sign-compare +CFLAGS_MODULE += -Wno-shift-negative-value +# This flag is needed to avoid build errors on older kernels +CFLAGS_MODULE += $(call cc-option, -Wno-cast-function-type) + +KBUILD_CPPFLAGS += -DKBUILD_EXTRA_WARN1 + +# The following were added to align with W=2 in scripts/Makefile.extrawarn +# from the Linux source tree (v5.18.14) +CFLAGS_MODULE += -Wdisabled-optimization +# The -Wshadow flag cannot be enabled unless upstream kernels are +# patched to fix redefinitions of certain built-in functions and +# global variables. 
+CFLAGS_MODULE += $(call cc-option, -Wlogical-op) +CFLAGS_MODULE += -Wmissing-field-initializers +# -Wtype-limits must be disabled due to build failures on kernel 5.x +CFLAGS_MODULE += -Wno-type-limits +CFLAGS_MODULE += $(call cc-option, -Wmaybe-uninitialized) +CFLAGS_MODULE += $(call cc-option, -Wunused-macros) + +KBUILD_CPPFLAGS += -DKBUILD_EXTRA_WARN2 + +# This warning is disabled to avoid build failures in some kernel versions +CFLAGS_MODULE += -Wno-ignored-qualifiers + +ifeq ($(CONFIG_GCOV_KERNEL),y) + CFLAGS_MODULE += $(call cc-option, -ftest-coverage) + CFLAGS_MODULE += $(call cc-option, -fprofile-arcs) + EXTRA_CFLAGS += -DGCOV_PROFILE=1 +endif + +ifeq ($(CONFIG_MALI_KCOV),y) + CFLAGS_MODULE += $(call cc-option, -fsanitize-coverage=trace-cmp) + EXTRA_CFLAGS += -DKCOV=1 + EXTRA_CFLAGS += -DKCOV_ENABLE_COMPARISONS=1 +endif modules modules_install clean: $(MAKE) -C $(KDIR) M=$(M) W=1 $(MAKE_ARGS) EXTRA_CFLAGS="$(EXTRA_CFLAGS)" KBUILD_EXTRA_SYMBOLS="$(EXTRA_SYMBOLS)" $(@) diff --git a/mali_kbase/Mconfig b/mali_kbase/Mconfig index 0f8f273..2d6fca0 100644 --- a/mali_kbase/Mconfig +++ b/mali_kbase/Mconfig @@ -1,6 +1,6 @@ # SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note # -# (C) COPYRIGHT 2012-2021 ARM Limited. All rights reserved. +# (C) COPYRIGHT 2012-2023 ARM Limited. All rights reserved. # # This program is free software and is provided to you under the terms of the # GNU General Public License version 2 as published by the Free Software @@ -41,11 +41,31 @@ config MALI_PLATFORM_NAME When PLATFORM_CUSTOM is set, this needs to be set manually to pick up the desired platform files. +choice + prompt "Mali HW backend" + depends on MALI_MIDGARD + default MALI_NO_MALI if NO_MALI + default MALI_REAL_HW + config MALI_REAL_HW - bool + bool "Enable build of Mali kernel driver for real HW" depends on MALI_MIDGARD - default y - default n if NO_MALI + help + This is the default HW backend. + +config MALI_NO_MALI + bool "Enable build of Mali kernel driver for No Mali" + depends on MALI_MIDGARD && MALI_EXPERT + help + This can be used to test the driver in a simulated environment + whereby the hardware is not physically present. If the hardware is physically + present it will not be used. This can be used to test the majority of the + driver without needing actual hardware or for software benchmarking. + All calls to the simulated hardware will complete immediately as if the hardware + completed the task. + + +endchoice config MALI_PLATFORM_DT_PIN_RST bool "Enable Juno GPU Pin reset" @@ -65,8 +85,7 @@ config MALI_CSF_SUPPORT config MALI_DEVFREQ bool "Enable devfreq support for Mali" depends on MALI_MIDGARD - default y if PLATFORM_JUNO - default y if PLATFORM_CUSTOM + default y help Support devfreq for Mali. @@ -98,16 +117,6 @@ config MALI_MIDGARD_ENABLE_TRACE Enables tracing in kbase. Trace log available through the "mali_trace" debugfs file, when the CONFIG_DEBUG_FS is enabled -config MALI_DMA_FENCE - bool "Enable DMA_BUF fence support for Mali" - depends on MALI_MIDGARD - default n - help - Support DMA_BUF fences for Mali. - - This option should only be enabled if the Linux Kernel has built in - support for DMA_BUF fences. 
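The MALI_DMA_FENCE option removed above provided implicit dma-buf fencing for atoms; with it gone, and the legacy CONFIG_SYNC path also dropped from the Kbuild earlier in this patch, fence objects are exported to user space through the mainline sync_file interface guarded by CONFIG_SYNC_FILE. A rough sketch of that export path is given below for orientation only; it is illustrative and not taken from this patch.

#include <linux/dma-fence.h>
#include <linux/sync_file.h>
#include <linux/file.h>
#include <linux/fcntl.h>

/* Wrap a fence in a sync_file and hand it to user space as a file descriptor. */
static int export_fence_as_fd(struct dma_fence *fence)
{
        struct sync_file *sfile = sync_file_create(fence);
        int fd;

        if (!sfile)
                return -ENOMEM;

        fd = get_unused_fd_flags(O_CLOEXEC);
        if (fd < 0) {
                fput(sfile->file);
                return fd;
        }
        fd_install(fd, sfile->file);
        return fd;
}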
- config MALI_ARBITER_SUPPORT bool "Enable arbiter support for Mali" depends on MALI_MIDGARD && !MALI_CSF_SUPPORT @@ -130,7 +139,7 @@ config MALI_DMA_BUF_MAP_ON_DEMAND default n default y if !DMA_BUF_SYNC_IOCTL_SUPPORTED help - This option caused kbase to set up the GPU mapping of imported + This option will cause kbase to set up the GPU mapping of imported dma-buf when needed to run atoms. This is the legacy behavior. This is intended for testing and the option will get removed in the @@ -150,6 +159,12 @@ config MALI_DMA_BUF_LEGACY_COMPAT flushes in other drivers. This only has an effect for clients using UK 11.18 or older. For later UK versions it is not possible. +config MALI_CORESIGHT + depends on MALI_MIDGARD && MALI_CSF_SUPPORT && !NO_MALI + select CSFFW_DEBUG_FW_AS_RW + bool "Enable Kbase CoreSight tracing support" + default n + menuconfig MALI_EXPERT depends on MALI_MIDGARD bool "Enable Expert Settings" @@ -158,17 +173,6 @@ menuconfig MALI_EXPERT Enabling this option and modifying the default settings may produce a driver with performance or other limitations. -config MALI_2MB_ALLOC - bool "Attempt to allocate 2MB pages" - depends on MALI_MIDGARD && MALI_EXPERT - default n - help - Rather than allocating all GPU memory page-by-page, attempt to - allocate 2MB pages from the kernel. This reduces TLB pressure and - helps to prevent memory fragmentation. - - If in doubt, say N - config MALI_MEMORY_FULLY_BACKED bool "Enable memory fully physically-backed" depends on MALI_MIDGARD && MALI_EXPERT @@ -192,6 +196,18 @@ config MALI_CORESTACK If unsure, say N. +config PAGE_MIGRATION_SUPPORT + bool "Compile with page migration support" + depends on BACKEND_KERNEL + default y + default n if ANDROID + help + Compile in support for page migration. + If set to disabled ('n') then page migration cannot + be enabled at all. If set to enabled, then page migration + support is explicitly compiled in. This has no effect when + PAGE_MIGRATION_OVERRIDE is disabled. + choice prompt "Error injection level" depends on MALI_MIDGARD && MALI_EXPERT @@ -231,14 +247,6 @@ config MALI_ERROR_INJECT depends on MALI_MIDGARD && MALI_EXPERT default y if !MALI_ERROR_INJECT_NONE -config MALI_GEM5_BUILD - bool "Enable build of Mali kernel driver for GEM5" - depends on MALI_MIDGARD && MALI_EXPERT - default n - help - This option is to do a Mali GEM5 build. - If unsure, say N. - config MALI_DEBUG bool "Enable debug build" depends on MALI_MIDGARD && MALI_EXPERT @@ -247,6 +255,23 @@ config MALI_DEBUG help Select this option for increased checking and reporting of errors. +config MALI_GCOV_KERNEL + bool "Enable branch coverage via gcov" + depends on MALI_MIDGARD && MALI_DEBUG + default n + help + Choose this option to enable building kbase with branch + coverage information. When built against a supporting kernel, + the coverage information will be available via debugfs. + +config MALI_KCOV + bool "Enable kcov coverage to support fuzzers" + depends on MALI_MIDGARD && MALI_DEBUG + default n + help + Choose this option to enable building with fuzzing-oriented + coverage, to improve the random test cases that are generated. + config MALI_FENCE_DEBUG bool "Enable debug sync fence usage" depends on MALI_MIDGARD && MALI_EXPERT @@ -329,6 +354,55 @@ config MALI_HW_ERRATA_1485982_USE_CLOCK_ALTERNATIVE tree using the property, opp-mali-errata-1485982. Otherwise the slowest clock will be selected. 
+config MALI_HOST_CONTROLS_SC_RAILS + bool "Enable Host based control of the shader core power rails" + depends on MALI_EXPERT && MALI_CSF_SUPPORT + default n + help + This option enables the Host based control of the power rails for + shader cores. It is recommended to use PDCA (Power Domain Control + Adapter) inside the GPU to handshake with SoC PMU to control the + power of cores. + +config MALI_TRACE_POWER_GPU_WORK_PERIOD + bool "Enable per-application GPU metrics tracepoints" + depends on MALI_MIDGARD + default y + help + This option enables per-application GPU metrics tracepoints. + + If unsure, say N. + +choice + prompt "CSF Firmware trace mode" + depends on MALI_MIDGARD + default MALI_FW_TRACE_MODE_MANUAL + help + CSF Firmware log operating mode. + +config MALI_FW_TRACE_MODE_MANUAL + bool "manual mode" + depends on MALI_MIDGARD + help + firmware log can be read manually by the userspace (and it will + also be dumped automatically into dmesg on GPU reset). + +config MALI_FW_TRACE_MODE_AUTO_PRINT + bool "automatic printing mode" + depends on MALI_MIDGARD + help + firmware log will be periodically emptied into dmesg, manual + reading through debugfs is disabled. + +config MALI_FW_TRACE_MODE_AUTO_DISCARD + bool "automatic discarding mode" + depends on MALI_MIDGARD + help + firmware log will be periodically discarded, the remaining log can be + read manually by the userspace (and it will also be dumped + automatically into dmesg on GPU reset). + +endchoice -source "kernel/drivers/gpu/arm/midgard/arbitration/Mconfig" +source "kernel/drivers/gpu/arm/arbitration/Mconfig" source "kernel/drivers/gpu/arm/midgard/tests/Mconfig" diff --git a/mali_kbase/arbiter/mali_kbase_arbif.c b/mali_kbase/arbiter/mali_kbase_arbif.c index 64e11ce..b5d3cd6 100644 --- a/mali_kbase/arbiter/mali_kbase_arbif.c +++ b/mali_kbase/arbiter/mali_kbase_arbif.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -28,12 +28,12 @@ #include <tl/mali_kbase_tracepoints.h> #include <linux/of.h> #include <linux/of_platform.h> -#include "mali_kbase_arbiter_interface.h" +#include "linux/mali_arbiter_interface.h" /* Arbiter interface version against which was implemented this module */ #define MALI_REQUIRED_KBASE_ARBITER_INTERFACE_VERSION 5 #if MALI_REQUIRED_KBASE_ARBITER_INTERFACE_VERSION != \ - MALI_KBASE_ARBITER_INTERFACE_VERSION + MALI_ARBITER_INTERFACE_VERSION #error "Unsupported Mali Arbiter interface version." 
#endif @@ -205,6 +205,7 @@ int kbase_arbif_init(struct kbase_device *kbdev) if (!pdev->dev.driver || !try_module_get(pdev->dev.driver->owner)) { dev_err(kbdev->dev, "arbiter_if driver not available\n"); + put_device(&pdev->dev); return -EPROBE_DEFER; } kbdev->arb.arb_dev = &pdev->dev; @@ -212,6 +213,7 @@ int kbase_arbif_init(struct kbase_device *kbdev) if (!arb_if) { dev_err(kbdev->dev, "arbiter_if driver not ready\n"); module_put(pdev->dev.driver->owner); + put_device(&pdev->dev); return -EPROBE_DEFER; } @@ -233,6 +235,7 @@ int kbase_arbif_init(struct kbase_device *kbdev) if (err) { dev_err(&pdev->dev, "Failed to register with arbiter\n"); module_put(pdev->dev.driver->owner); + put_device(&pdev->dev); if (err != -EPROBE_DEFER) err = -EFAULT; return err; @@ -262,8 +265,10 @@ void kbase_arbif_destroy(struct kbase_device *kbdev) arb_if->vm_ops.vm_arb_unregister_dev(kbdev->arb.arb_if); } kbdev->arb.arb_if = NULL; - if (kbdev->arb.arb_dev) + if (kbdev->arb.arb_dev) { module_put(kbdev->arb.arb_dev->driver->owner); + put_device(kbdev->arb.arb_dev); + } kbdev->arb.arb_dev = NULL; } diff --git a/mali_kbase/arbiter/mali_kbase_arbiter_pm.c b/mali_kbase/arbiter/mali_kbase_arbiter_pm.c index d813a04..667552c 100644 --- a/mali_kbase/arbiter/mali_kbase_arbiter_pm.c +++ b/mali_kbase/arbiter/mali_kbase_arbiter_pm.c @@ -955,7 +955,6 @@ static inline bool kbase_arbiter_pm_vm_gpu_assigned_lockheld( int kbase_arbiter_pm_ctx_active_handle_suspend(struct kbase_device *kbdev, enum kbase_pm_suspend_handler suspend_handler) { - struct kbasep_js_device_data *js_devdata = &kbdev->js_data; struct kbase_arbiter_vm_state *arb_vm_state = kbdev->pm.arb_vm_state; int res = 0; @@ -1008,11 +1007,9 @@ int kbase_arbiter_pm_ctx_active_handle_suspend(struct kbase_device *kbdev, /* Need to synchronously wait for GPU assignment */ atomic_inc(&kbdev->pm.gpu_users_waiting); mutex_unlock(&arb_vm_state->vm_state_lock); - mutex_unlock(&kbdev->pm.lock); - mutex_unlock(&js_devdata->runpool_mutex); + kbase_pm_unlock(kbdev); kbase_arbiter_pm_vm_wait_gpu_assignment(kbdev); - mutex_lock(&js_devdata->runpool_mutex); - mutex_lock(&kbdev->pm.lock); + kbase_pm_lock(kbdev); mutex_lock(&arb_vm_state->vm_state_lock); atomic_dec(&kbdev->pm.gpu_users_waiting); } @@ -1111,7 +1108,7 @@ static int arb_gpu_clk_notifier_register(struct kbase_device *kbdev, } /** - * gpu_clk_notifier_unregister() - Unregister clock rate change notifier + * arb_gpu_clk_notifier_unregister() - Unregister clock rate change notifier * @kbdev: kbase_device pointer * @gpu_clk_handle: Handle unique to the enumerated GPU clock * @nb: notifier block containing the callback function pointer diff --git a/mali_kbase/arbitration/Kconfig b/mali_kbase/arbitration/Kconfig deleted file mode 100644 index 1935c81..0000000 --- a/mali_kbase/arbitration/Kconfig +++ /dev/null @@ -1,49 +0,0 @@ -# SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note OR MIT -# -# (C) COPYRIGHT 2012-2021 ARM Limited. All rights reserved. -# -# This program is free software and is provided to you under the terms of the -# GNU General Public License version 2 as published by the Free Software -# Foundation, and any use by you of this program is subject to the terms -# of such GNU license. -# -# This program is distributed in the hope that it will be useful, -# but WITHOUT ANY WARRANTY; without even the implied warranty of -# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -# GNU General Public License for more details. 
-# -# You should have received a copy of the GNU General Public License -# along with this program; if not, you can access it online at -# http://www.gnu.org/licenses/gpl-2.0.html. -# -# - -config MALI_XEN - tristate "Enable Xen Interface reference code" - depends on MALI_ARBITRATION && XEN - default n - help - Enables the build of xen interface modules used in the reference - virtualization setup for Mali - If unsure, say N. - -config MALI_ARBITER_MODULES - tristate "Enable mali arbiter modules" - depends on MALI_ARBITRATION - default y - help - Enables the build of the arbiter modules used in the reference - virtualization setup for Mali - If unsure, say N - -config MALI_GPU_POWER_MODULES - tristate "Enable gpu power modules" - depends on MALI_ARBITRATION - default y - help - Enables the build of the gpu power modules used in the reference - virtualization setup for Mali - If unsure, say N - - -source "drivers/gpu/arm/midgard/arbitration/ptm/Kconfig" diff --git a/mali_kbase/backend/gpu/Kbuild b/mali_kbase/backend/gpu/Kbuild index 49abc1c..c37cc59 100644 --- a/mali_kbase/backend/gpu/Kbuild +++ b/mali_kbase/backend/gpu/Kbuild @@ -1,6 +1,6 @@ # SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note # -# (C) COPYRIGHT 2014-2021 ARM Limited. All rights reserved. +# (C) COPYRIGHT 2014-2022 ARM Limited. All rights reserved. # # This program is free software and is provided to you under the terms of the # GNU General Public License version 2 as published by the Free Software @@ -22,7 +22,6 @@ mali_kbase-y += \ backend/gpu/mali_kbase_cache_policy_backend.o \ backend/gpu/mali_kbase_gpuprops_backend.o \ backend/gpu/mali_kbase_irq_linux.o \ - backend/gpu/mali_kbase_js_backend.o \ backend/gpu/mali_kbase_pm_backend.o \ backend/gpu/mali_kbase_pm_driver.o \ backend/gpu/mali_kbase_pm_metrics.o \ @@ -31,6 +30,7 @@ mali_kbase-y += \ backend/gpu/mali_kbase_pm_coarse_demand.o \ backend/gpu/mali_kbase_pm_adaptive.o \ backend/gpu/mali_kbase_pm_policy.o \ + backend/gpu/mali_kbase_pm_event_log.o \ backend/gpu/mali_kbase_time.o \ backend/gpu/mali_kbase_l2_mmu_config.o \ backend/gpu/mali_kbase_clk_rate_trace_mgr.o @@ -41,15 +41,20 @@ ifeq ($(MALI_USE_CSF),0) backend/gpu/mali_kbase_jm_as.o \ backend/gpu/mali_kbase_debug_job_fault_backend.o \ backend/gpu/mali_kbase_jm_hw.o \ - backend/gpu/mali_kbase_jm_rb.o + backend/gpu/mali_kbase_jm_rb.o \ + backend/gpu/mali_kbase_js_backend.o endif mali_kbase-$(CONFIG_MALI_DEVFREQ) += \ backend/gpu/mali_kbase_devfreq.o -# Dummy model +ifneq ($(CONFIG_MALI_REAL_HW),y) + mali_kbase-y += backend/gpu/mali_kbase_model_linux.o +endif + +# NO_MALI Dummy model interface mali_kbase-$(CONFIG_MALI_NO_MALI) += backend/gpu/mali_kbase_model_dummy.o -mali_kbase-$(CONFIG_MALI_NO_MALI) += backend/gpu/mali_kbase_model_linux.o # HW error simulation mali_kbase-$(CONFIG_MALI_NO_MALI) += backend/gpu/mali_kbase_model_error_generator.o + diff --git a/mali_kbase/backend/gpu/mali_kbase_cache_policy_backend.c b/mali_kbase/backend/gpu/mali_kbase_cache_policy_backend.c index 9587c70..86539d5 100644 --- a/mali_kbase/backend/gpu/mali_kbase_cache_policy_backend.c +++ b/mali_kbase/backend/gpu/mali_kbase_cache_policy_backend.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2014-2016, 2018, 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2014-2023 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -22,22 +22,59 @@ #include "backend/gpu/mali_kbase_cache_policy_backend.h" #include <device/mali_kbase_device.h> +/** + * kbasep_amba_register_present() - Check AMBA_<> register is present + * in the GPU. + * @kbdev: Device pointer + * + * Note: Only for arch version 12.x.1 onwards. + * + * Return: true if AMBA_FEATURES/ENABLE registers are present. + */ +static bool kbasep_amba_register_present(struct kbase_device *kbdev) +{ + return (ARCH_MAJOR_REV_REG(kbdev->gpu_props.props.raw_props.gpu_id) >= + GPU_ID2_ARCH_MAJOR_REV_MAKE(12, 1)); +} void kbase_cache_set_coherency_mode(struct kbase_device *kbdev, u32 mode) { kbdev->current_gpu_coherency_mode = mode; - kbase_reg_write(kbdev, COHERENCY_ENABLE, mode); + if (kbasep_amba_register_present(kbdev)) { + u32 val = kbase_reg_read(kbdev, GPU_CONTROL_REG(AMBA_ENABLE)); + + val = AMBA_ENABLE_COHERENCY_PROTOCOL_SET(val, mode); + kbase_reg_write(kbdev, GPU_CONTROL_REG(AMBA_ENABLE), val); + } else + kbase_reg_write(kbdev, GPU_CONTROL_REG(COHERENCY_ENABLE), mode); } u32 kbase_cache_get_coherency_features(struct kbase_device *kbdev) { u32 coherency_features; + if (kbasep_amba_register_present(kbdev)) + coherency_features = + kbase_reg_read(kbdev, GPU_CONTROL_REG(AMBA_FEATURES)); + else coherency_features = kbase_reg_read( kbdev, GPU_CONTROL_REG(COHERENCY_FEATURES)); return coherency_features; } +void kbase_amba_set_memory_cache_support(struct kbase_device *kbdev, + bool enable) +{ + if (kbasep_amba_register_present(kbdev)) { + u32 val = kbase_reg_read(kbdev, GPU_CONTROL_REG(AMBA_ENABLE)); + + val = AMBA_ENABLE_MEMORY_CACHE_SUPPORT_SET(val, enable); + kbase_reg_write(kbdev, GPU_CONTROL_REG(AMBA_ENABLE), val); + + } else { + WARN(1, "memory_cache_support not supported"); + } +} diff --git a/mali_kbase/backend/gpu/mali_kbase_cache_policy_backend.h b/mali_kbase/backend/gpu/mali_kbase_cache_policy_backend.h index 13c79d6..0103695 100644 --- a/mali_kbase/backend/gpu/mali_kbase_cache_policy_backend.h +++ b/mali_kbase/backend/gpu/mali_kbase_cache_policy_backend.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2014-2016, 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2014-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -43,4 +43,14 @@ void kbase_cache_set_coherency_mode(struct kbase_device *kbdev, */ u32 kbase_cache_get_coherency_features(struct kbase_device *kbdev); +/** + * kbase_amba_set_memory_cache_support() - Sets AMBA memory cache support + * in the GPU. + * @kbdev: Device pointer + * @enable: true for enable. + * + * Note: Only for arch version 12.x.1 onwards. + */ +void kbase_amba_set_memory_cache_support(struct kbase_device *kbdev, + bool enable); #endif /* _KBASE_CACHE_POLICY_BACKEND_H_ */ diff --git a/mali_kbase/backend/gpu/mali_kbase_clk_rate_trace_mgr.c b/mali_kbase/backend/gpu/mali_kbase_clk_rate_trace_mgr.c index d6b9750..cca4f74 100644 --- a/mali_kbase/backend/gpu/mali_kbase_clk_rate_trace_mgr.c +++ b/mali_kbase/backend/gpu/mali_kbase_clk_rate_trace_mgr.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2020-2023 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -58,8 +58,10 @@ get_clk_rate_trace_callbacks(__maybe_unused struct kbase_device *kbdev) if (WARN_ON(!kbdev) || WARN_ON(!kbdev->dev)) return callbacks; - arbiter_if_node = - of_get_property(kbdev->dev->of_node, "arbiter_if", NULL); + arbiter_if_node = of_get_property(kbdev->dev->of_node, "arbiter-if", NULL); + if (!arbiter_if_node) + arbiter_if_node = of_get_property(kbdev->dev->of_node, "arbiter_if", NULL); + /* Arbitration enabled, override the callback pointer.*/ if (arbiter_if_node) callbacks = &arb_clk_rate_trace_ops; @@ -72,49 +74,6 @@ get_clk_rate_trace_callbacks(__maybe_unused struct kbase_device *kbdev) return callbacks; } -int kbase_lowest_gpu_freq_init(struct kbase_device *kbdev) -{ - /* Uses default reference frequency defined in below macro */ - u64 lowest_freq_khz = DEFAULT_REF_TIMEOUT_FREQ_KHZ; - - /* Only check lowest frequency in cases when OPPs are used and - * present in the device tree. - */ -#ifdef CONFIG_PM_OPP - struct dev_pm_opp *opp_ptr; - unsigned long found_freq = 0; - - /* find lowest frequency OPP */ - opp_ptr = dev_pm_opp_find_freq_ceil(kbdev->dev, &found_freq); - if (IS_ERR(opp_ptr)) { - dev_err(kbdev->dev, - "No OPPs found in device tree! Scaling timeouts using %llu kHz", - (unsigned long long)lowest_freq_khz); - } else { -#if KERNEL_VERSION(4, 11, 0) <= LINUX_VERSION_CODE - dev_pm_opp_put(opp_ptr); /* decrease OPP refcount */ -#endif - /* convert found frequency to KHz */ - found_freq /= 1000; - - /* If lowest frequency in OPP table is still higher - * than the reference, then keep the reference frequency - * as the one to use for scaling . - */ - if (found_freq < lowest_freq_khz) - lowest_freq_khz = found_freq; - } -#else - dev_err(kbdev->dev, - "No operating-points-v2 node or operating-points property in DT"); -#endif - - kbdev->lowest_gpu_freq_khz = lowest_freq_khz; - dev_dbg(kbdev->dev, "Lowest frequency identified is %llu kHz", - kbdev->lowest_gpu_freq_khz); - return 0; -} - static int gpu_clk_rate_change_notifier(struct notifier_block *nb, unsigned long event, void *data) { diff --git a/mali_kbase/backend/gpu/mali_kbase_clk_rate_trace_mgr.h b/mali_kbase/backend/gpu/mali_kbase_clk_rate_trace_mgr.h index a6ee959..35b3b8d 100644 --- a/mali_kbase/backend/gpu/mali_kbase_clk_rate_trace_mgr.h +++ b/mali_kbase/backend/gpu/mali_kbase_clk_rate_trace_mgr.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2020-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -61,20 +61,6 @@ struct kbase_clk_data { int kbase_clk_rate_trace_manager_init(struct kbase_device *kbdev); /** - * kbase_init_lowest_gpu_freq() - Find the lowest frequency that the GPU can - * run as using the device tree, and save this - * within kbdev. - * @kbdev: Pointer to kbase device. - * - * This function could be called from kbase_clk_rate_trace_manager_init, - * but is left separate as it can be called as soon as - * dev_pm_opp_of_add_table() has been called to initialize the OPP table. - * - * Return: 0 in any case. - */ -int kbase_lowest_gpu_freq_init(struct kbase_device *kbdev); - -/** * kbase_clk_rate_trace_manager_term - Terminate GPU clock rate trace manager. 
* * @kbdev: Device pointer diff --git a/mali_kbase/backend/gpu/mali_kbase_debug_job_fault_backend.c b/mali_kbase/backend/gpu/mali_kbase_debug_job_fault_backend.c index e121b41..cd3b29d 100644 --- a/mali_kbase/backend/gpu/mali_kbase_debug_job_fault_backend.c +++ b/mali_kbase/backend/gpu/mali_kbase_debug_job_fault_backend.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2012-2015, 2018-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2012-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -59,7 +59,7 @@ static int job_slot_reg_snapshot[] = { JS_CONFIG_NEXT }; -/*MMU_REG(r)*/ +/*MMU_CONTROL_REG(r)*/ static int mmu_reg_snapshot[] = { MMU_IRQ_MASK, MMU_IRQ_STATUS @@ -118,15 +118,14 @@ bool kbase_debug_job_fault_reg_snapshot_init(struct kbase_context *kctx, /* get the MMU registers*/ for (i = 0; i < sizeof(mmu_reg_snapshot)/4; i++) { - kctx->reg_dump[offset] = MMU_REG(mmu_reg_snapshot[i]); + kctx->reg_dump[offset] = MMU_CONTROL_REG(mmu_reg_snapshot[i]); offset += 2; } /* get the Address space registers*/ for (j = 0; j < as_number; j++) { for (i = 0; i < sizeof(as_reg_snapshot)/4; i++) { - kctx->reg_dump[offset] = - MMU_AS_REG(j, as_reg_snapshot[i]); + kctx->reg_dump[offset] = MMU_STAGE1_REG(MMU_AS_REG(j, as_reg_snapshot[i])); offset += 2; } } diff --git a/mali_kbase/backend/gpu/mali_kbase_devfreq.c b/mali_kbase/backend/gpu/mali_kbase_devfreq.c index 00b32b9..a389cd9 100644 --- a/mali_kbase/backend/gpu/mali_kbase_devfreq.c +++ b/mali_kbase/backend/gpu/mali_kbase_devfreq.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2014-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2014-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -57,7 +57,7 @@ static unsigned long get_voltage(struct kbase_device *kbdev, unsigned long freq) opp = dev_pm_opp_find_freq_exact(kbdev->dev, freq, true); if (IS_ERR_OR_NULL(opp)) - dev_err(kbdev->dev, "Failed to get opp (%ld)\n", PTR_ERR(opp)); + dev_err(kbdev->dev, "Failed to get opp (%d)\n", PTR_ERR_OR_ZERO(opp)); else { voltage = dev_pm_opp_get_voltage(opp); #if KERNEL_VERSION(4, 11, 0) <= LINUX_VERSION_CODE @@ -133,8 +133,8 @@ kbase_devfreq_target(struct device *dev, unsigned long *target_freq, u32 flags) rcu_read_unlock(); #endif if (IS_ERR_OR_NULL(opp)) { - dev_err(dev, "Failed to get opp (%ld)\n", PTR_ERR(opp)); - return PTR_ERR(opp); + dev_err(dev, "Failed to get opp (%d)\n", PTR_ERR_OR_ZERO(opp)); + return IS_ERR(opp) ? PTR_ERR(opp) : -ENODEV; } #if KERNEL_VERSION(4, 11, 0) <= LINUX_VERSION_CODE dev_pm_opp_put(opp); @@ -317,6 +317,7 @@ static int kbase_devfreq_init_freq_table(struct kbase_device *kbdev, dp->max_state = i; + /* Have the lowest clock as suspend clock. * It may be overridden by 'opp-mali-errata-1485982'. 
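The devfreq changes that follow replace PTR_ERR() with PTR_ERR_OR_ZERO() when a lookup is checked with IS_ERR_OR_NULL(), and turn the NULL case into an explicit -ENODEV. The reason is that PTR_ERR(NULL) evaluates to 0, so the old code could log "error 0" and even return success for a NULL OPP. A self-contained sketch of the corrected idiom is below; the error-pointer helpers are re-implemented here only so the example compiles outside the kernel, and find_opp() is an invented stand-in for dev_pm_opp_find_freq_exact().

#include <errno.h>
#include <stdio.h>

/* User-space re-implementation of the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095
static void *ERR_PTR(long err)               { return (void *)err; }
static long  PTR_ERR(const void *p)          { return (long)p; }
static int   IS_ERR(const void *p)           { return (unsigned long)p >= (unsigned long)-MAX_ERRNO; }
static int   IS_ERR_OR_NULL(const void *p)   { return !p || IS_ERR(p); }
static long  PTR_ERR_OR_ZERO(const void *p)  { return IS_ERR(p) ? PTR_ERR(p) : 0; }

/* Pretend OPP lookup: may return a valid pointer, an error pointer, or NULL. */
static void *find_opp(int variant)
{
    static int dummy_opp;
    if (variant == 0) return &dummy_opp;
    if (variant == 1) return ERR_PTR(-EINVAL);
    return NULL;
}

static int use_opp(int variant)
{
    void *opp = find_opp(variant);

    if (IS_ERR_OR_NULL(opp)) {
        /* PTR_ERR() on NULL would print 0 and the caller would "succeed";
         * PTR_ERR_OR_ZERO() plus an explicit -ENODEV keeps both cases as errors. */
        fprintf(stderr, "Failed to get opp (%ld)\n", PTR_ERR_OR_ZERO(opp));
        return IS_ERR(opp) ? (int)PTR_ERR(opp) : -ENODEV;
    }
    return 0;
}

int main(void)
{
    for (int v = 0; v < 3; v++)
        printf("variant %d -> %d\n", v, use_opp(v));
    return 0;
}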
*/ @@ -630,12 +631,12 @@ static void kbase_devfreq_work_term(struct kbase_device *kbdev) destroy_workqueue(workq); } - int kbase_devfreq_init(struct kbase_device *kbdev) { struct devfreq_dev_profile *dp; int err; unsigned int i; + bool free_devfreq_freq_table = true; if (kbdev->nr_clocks == 0) { dev_err(kbdev->dev, "Clock not available for devfreq\n"); @@ -669,32 +670,35 @@ int kbase_devfreq_init(struct kbase_device *kbdev) dp->freq_table[0] / 1000; } - err = kbase_devfreq_init_core_mask_table(kbdev); +#if IS_ENABLED(CONFIG_DEVFREQ_THERMAL) + err = kbase_ipa_init(kbdev); if (err) { - kbase_devfreq_term_freq_table(kbdev); - return err; + dev_err(kbdev->dev, "IPA initialization failed"); + goto ipa_init_failed; } +#endif + + err = kbase_devfreq_init_core_mask_table(kbdev); + if (err) + goto init_core_mask_table_failed; kbdev->devfreq = devfreq_add_device(kbdev->dev, dp, "simple_ondemand", NULL); if (IS_ERR(kbdev->devfreq)) { err = PTR_ERR(kbdev->devfreq); kbdev->devfreq = NULL; - kbase_devfreq_term_core_mask_table(kbdev); - kbase_devfreq_term_freq_table(kbdev); - dev_err(kbdev->dev, "Fail to add devfreq device(%d)\n", err); - return err; + dev_err(kbdev->dev, "Fail to add devfreq device(%d)", err); + goto devfreq_add_dev_failed; } + /* Explicit free of freq table isn't needed after devfreq_add_device() */ + free_devfreq_freq_table = false; + /* Initialize devfreq suspend/resume workqueue */ err = kbase_devfreq_work_init(kbdev); if (err) { - if (devfreq_remove_device(kbdev->devfreq)) - dev_err(kbdev->dev, "Fail to rm devfreq\n"); - kbdev->devfreq = NULL; - kbase_devfreq_term_core_mask_table(kbdev); - dev_err(kbdev->dev, "Fail to init devfreq workqueue\n"); - return err; + dev_err(kbdev->dev, "Fail to init devfreq workqueue"); + goto devfreq_work_init_failed; } /* devfreq_add_device only copies a few of kbdev->dev's fields, so @@ -705,26 +709,20 @@ int kbase_devfreq_init(struct kbase_device *kbdev) err = devfreq_register_opp_notifier(kbdev->dev, kbdev->devfreq); if (err) { dev_err(kbdev->dev, - "Failed to register OPP notifier (%d)\n", err); + "Failed to register OPP notifier (%d)", err); goto opp_notifier_failed; } #if IS_ENABLED(CONFIG_DEVFREQ_THERMAL) - err = kbase_ipa_init(kbdev); - if (err) { - dev_err(kbdev->dev, "IPA initialization failed\n"); - goto ipa_init_failed; - } - kbdev->devfreq_cooling = of_devfreq_cooling_register_power( kbdev->dev->of_node, kbdev->devfreq, &kbase_ipa_power_model_ops); if (IS_ERR_OR_NULL(kbdev->devfreq_cooling)) { - err = PTR_ERR(kbdev->devfreq_cooling); + err = PTR_ERR_OR_ZERO(kbdev->devfreq_cooling); dev_err(kbdev->dev, - "Failed to register cooling device (%d)\n", - err); + "Failed to register cooling device (%d)", err); + err = err == 0 ? 
-ENODEV : err; goto cooling_reg_failed; } #endif @@ -733,21 +731,29 @@ int kbase_devfreq_init(struct kbase_device *kbdev) #if IS_ENABLED(CONFIG_DEVFREQ_THERMAL) cooling_reg_failed: - kbase_ipa_term(kbdev); -ipa_init_failed: devfreq_unregister_opp_notifier(kbdev->dev, kbdev->devfreq); #endif /* CONFIG_DEVFREQ_THERMAL */ opp_notifier_failed: kbase_devfreq_work_term(kbdev); +devfreq_work_init_failed: if (devfreq_remove_device(kbdev->devfreq)) - dev_err(kbdev->dev, "Failed to terminate devfreq (%d)\n", err); + dev_err(kbdev->dev, "Failed to terminate devfreq (%d)", err); kbdev->devfreq = NULL; +devfreq_add_dev_failed: kbase_devfreq_term_core_mask_table(kbdev); +init_core_mask_table_failed: +#if IS_ENABLED(CONFIG_DEVFREQ_THERMAL) + kbase_ipa_term(kbdev); +ipa_init_failed: +#endif + if (free_devfreq_freq_table) + kbase_devfreq_term_freq_table(kbdev); + return err; } @@ -760,8 +766,6 @@ void kbase_devfreq_term(struct kbase_device *kbdev) #if IS_ENABLED(CONFIG_DEVFREQ_THERMAL) if (kbdev->devfreq_cooling) devfreq_cooling_unregister(kbdev->devfreq_cooling); - - kbase_ipa_term(kbdev); #endif devfreq_unregister_opp_notifier(kbdev->dev, kbdev->devfreq); @@ -775,4 +779,8 @@ void kbase_devfreq_term(struct kbase_device *kbdev) kbdev->devfreq = NULL; kbase_devfreq_term_core_mask_table(kbdev); + +#if IS_ENABLED(CONFIG_DEVFREQ_THERMAL) + kbase_ipa_term(kbdev); +#endif } diff --git a/mali_kbase/backend/gpu/mali_kbase_gpuprops_backend.c b/mali_kbase/backend/gpu/mali_kbase_gpuprops_backend.c index 0ea14bc..10e92ec 100644 --- a/mali_kbase/backend/gpu/mali_kbase_gpuprops_backend.c +++ b/mali_kbase/backend/gpu/mali_kbase_gpuprops_backend.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2014-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2014-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -40,19 +40,7 @@ int kbase_backend_gpuprops_get(struct kbase_device *kbdev, registers.l2_features = kbase_reg_read(kbdev, GPU_CONTROL_REG(L2_FEATURES)); - registers.core_features = 0; -#if !MALI_USE_CSF - /* TGOx */ - registers.core_features = kbase_reg_read(kbdev, - GPU_CONTROL_REG(CORE_FEATURES)); -#else /* !MALI_USE_CSF */ - if (!(((registers.gpu_id & GPU_ID2_PRODUCT_MODEL) == - GPU_ID2_PRODUCT_TDUX) || - ((registers.gpu_id & GPU_ID2_PRODUCT_MODEL) == - GPU_ID2_PRODUCT_TODX))) - registers.core_features = - kbase_reg_read(kbdev, GPU_CONTROL_REG(CORE_FEATURES)); -#endif /* MALI_USE_CSF */ + registers.tiler_features = kbase_reg_read(kbdev, GPU_CONTROL_REG(TILER_FEATURES)); registers.mem_features = kbase_reg_read(kbdev, @@ -170,6 +158,11 @@ int kbase_backend_gpuprops_get_features(struct kbase_device *kbdev, regdump->coherency_features = coherency_features; + if (kbase_hw_has_feature(kbdev, BASE_HW_FEATURE_CORE_FEATURES)) + regdump->core_features = kbase_reg_read(kbdev, GPU_CONTROL_REG(CORE_FEATURES)); + else + regdump->core_features = 0; + kbase_pm_register_access_disable(kbdev); return error; diff --git a/mali_kbase/backend/gpu/mali_kbase_instr_backend.c b/mali_kbase/backend/gpu/mali_kbase_instr_backend.c index 0ece571..b89b917 100644 --- a/mali_kbase/backend/gpu/mali_kbase_instr_backend.c +++ b/mali_kbase/backend/gpu/mali_kbase_instr_backend.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2014-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2014-2022 ARM Limited. 
All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -29,6 +29,20 @@ #include <device/mali_kbase_device.h> #include <backend/gpu/mali_kbase_instr_internal.h> +static int wait_prfcnt_ready(struct kbase_device *kbdev) +{ + u32 loops; + + for (loops = 0; loops < KBASE_PRFCNT_ACTIVE_MAX_LOOPS; loops++) { + const u32 prfcnt_active = kbase_reg_read(kbdev, GPU_CONTROL_REG(GPU_STATUS)) & + GPU_STATUS_PRFCNT_ACTIVE; + if (!prfcnt_active) + return 0; + } + + dev_err(kbdev->dev, "PRFCNT_ACTIVE bit stuck\n"); + return -EBUSY; +} int kbase_instr_hwcnt_enable_internal(struct kbase_device *kbdev, struct kbase_context *kctx, @@ -43,20 +57,20 @@ int kbase_instr_hwcnt_enable_internal(struct kbase_device *kbdev, /* alignment failure */ if ((enable->dump_buffer == 0ULL) || (enable->dump_buffer & (2048 - 1))) - goto out_err; + return err; spin_lock_irqsave(&kbdev->hwcnt.lock, flags); if (kbdev->hwcnt.backend.state != KBASE_INSTR_STATE_DISABLED) { /* Instrumentation is already enabled */ spin_unlock_irqrestore(&kbdev->hwcnt.lock, flags); - goto out_err; + return err; } if (kbase_is_gpu_removed(kbdev)) { /* GPU has been removed by Arbiter */ spin_unlock_irqrestore(&kbdev->hwcnt.lock, flags); - goto out_err; + return err; } /* Enable interrupt */ @@ -81,9 +95,19 @@ int kbase_instr_hwcnt_enable_internal(struct kbase_device *kbdev, prfcnt_config |= enable->counter_set << PRFCNT_CONFIG_SETSELECT_SHIFT; #endif + /* Wait until prfcnt config register can be written */ + err = wait_prfcnt_ready(kbdev); + if (err) + return err; + kbase_reg_write(kbdev, GPU_CONTROL_REG(PRFCNT_CONFIG), prfcnt_config | PRFCNT_CONFIG_MODE_OFF); + /* Wait until prfcnt is disabled before writing configuration registers */ + err = wait_prfcnt_ready(kbdev); + if (err) + return err; + kbase_reg_write(kbdev, GPU_CONTROL_REG(PRFCNT_BASE_LO), enable->dump_buffer & 0xFFFFFFFF); kbase_reg_write(kbdev, GPU_CONTROL_REG(PRFCNT_BASE_HI), @@ -111,12 +135,8 @@ int kbase_instr_hwcnt_enable_internal(struct kbase_device *kbdev, spin_unlock_irqrestore(&kbdev->hwcnt.lock, flags); - err = 0; - dev_dbg(kbdev->dev, "HW counters dumping set-up for context %pK", kctx); - return err; - out_err: - return err; + return 0; } static void kbasep_instr_hwc_disable_hw_prfcnt(struct kbase_device *kbdev) @@ -135,7 +155,10 @@ static void kbasep_instr_hwc_disable_hw_prfcnt(struct kbase_device *kbdev) kbase_reg_write(kbdev, GPU_CONTROL_REG(GPU_IRQ_MASK), irq_mask & ~PRFCNT_SAMPLE_COMPLETED); - /* Disable the counters */ + /* Wait until prfcnt config register can be written, then disable the counters. + * Return value is ignored as we are disabling anyway. 
+ */ + wait_prfcnt_ready(kbdev); kbase_reg_write(kbdev, GPU_CONTROL_REG(PRFCNT_CONFIG), 0); kbdev->hwcnt.kctx = NULL; @@ -146,7 +169,6 @@ static void kbasep_instr_hwc_disable_hw_prfcnt(struct kbase_device *kbdev) int kbase_instr_hwcnt_disable_internal(struct kbase_context *kctx) { unsigned long flags, pm_flags; - int err = -EINVAL; struct kbase_device *kbdev = kctx->kbdev; while (1) { @@ -167,14 +189,14 @@ int kbase_instr_hwcnt_disable_internal(struct kbase_context *kctx) /* Instrumentation is not enabled */ spin_unlock_irqrestore(&kbdev->hwcnt.lock, flags); spin_unlock_irqrestore(&kbdev->hwaccess_lock, pm_flags); - return err; + return -EINVAL; } if (kbdev->hwcnt.kctx != kctx) { /* Instrumentation has been setup for another context */ spin_unlock_irqrestore(&kbdev->hwcnt.lock, flags); spin_unlock_irqrestore(&kbdev->hwaccess_lock, pm_flags); - return err; + return -EINVAL; } if (kbdev->hwcnt.backend.state == KBASE_INSTR_STATE_IDLE) @@ -233,6 +255,11 @@ int kbase_instr_hwcnt_request_dump(struct kbase_context *kctx) */ kbdev->hwcnt.backend.state = KBASE_INSTR_STATE_DUMPING; + /* Wait until prfcnt is ready to request dump */ + err = wait_prfcnt_ready(kbdev); + if (err) + goto unlock; + /* Reconfigure the dump address */ kbase_reg_write(kbdev, GPU_CONTROL_REG(PRFCNT_BASE_LO), kbdev->hwcnt.addr & 0xFFFFFFFF); @@ -248,11 +275,8 @@ int kbase_instr_hwcnt_request_dump(struct kbase_context *kctx) dev_dbg(kbdev->dev, "HW counters dumping done for context %pK", kctx); - err = 0; - unlock: spin_unlock_irqrestore(&kbdev->hwcnt.lock, flags); - return err; } KBASE_EXPORT_SYMBOL(kbase_instr_hwcnt_request_dump); @@ -346,21 +370,24 @@ int kbase_instr_hwcnt_clear(struct kbase_context *kctx) */ if (kbdev->hwcnt.kctx != kctx || kbdev->hwcnt.backend.state != KBASE_INSTR_STATE_IDLE) - goto out; + goto unlock; if (kbase_is_gpu_removed(kbdev)) { /* GPU has been removed by Arbiter */ - goto out; + goto unlock; } + /* Wait until prfcnt is ready to clear */ + err = wait_prfcnt_ready(kbdev); + if (err) + goto unlock; + /* Clear the counters */ KBASE_KTRACE_ADD(kbdev, CORE_GPU_PRFCNT_CLEAR, NULL, 0); kbase_reg_write(kbdev, GPU_CONTROL_REG(GPU_COMMAND), GPU_COMMAND_PRFCNT_CLEAR); - err = 0; - -out: +unlock: spin_unlock_irqrestore(&kbdev->hwcnt.lock, flags); return err; } diff --git a/mali_kbase/backend/gpu/mali_kbase_instr_defs.h b/mali_kbase/backend/gpu/mali_kbase_instr_defs.h index 7190f42..bd2eb8a 100644 --- a/mali_kbase/backend/gpu/mali_kbase_instr_defs.h +++ b/mali_kbase/backend/gpu/mali_kbase_instr_defs.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2014, 2016, 2018-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2014, 2016, 2018-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -26,7 +26,7 @@ #ifndef _KBASE_INSTR_DEFS_H_ #define _KBASE_INSTR_DEFS_H_ -#include <mali_kbase_hwcnt_gpu.h> +#include <hwcnt/mali_kbase_hwcnt_gpu.h> /* * Instrumentation State Machine States diff --git a/mali_kbase/backend/gpu/mali_kbase_irq_linux.c b/mali_kbase/backend/gpu/mali_kbase_irq_linux.c index a29f7ef..b95277c 100644 --- a/mali_kbase/backend/gpu/mali_kbase_irq_linux.c +++ b/mali_kbase/backend/gpu/mali_kbase_irq_linux.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2014-2016, 2018-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2014-2023 ARM Limited. 
All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -25,12 +25,12 @@ #include <linux/interrupt.h> -#if !IS_ENABLED(CONFIG_MALI_NO_MALI) +#if IS_ENABLED(CONFIG_MALI_REAL_HW) /* GPU IRQ Tags */ -#define JOB_IRQ_TAG 0 -#define MMU_IRQ_TAG 1 -#define GPU_IRQ_TAG 2 +#define JOB_IRQ_TAG 0 +#define MMU_IRQ_TAG 1 +#define GPU_IRQ_TAG 2 static void *kbase_tag(void *ptr, u32 tag) { @@ -99,7 +99,7 @@ static irqreturn_t kbase_mmu_irq_handler(int irq, void *data) atomic_inc(&kbdev->faults_pending); - val = kbase_reg_read(kbdev, MMU_REG(MMU_IRQ_STATUS)); + val = kbase_reg_read(kbdev, MMU_CONTROL_REG(MMU_IRQ_STATUS)); #ifdef CONFIG_MALI_DEBUG if (!kbdev->pm.backend.driver_ready_for_irqs) @@ -163,7 +163,6 @@ static irq_handler_t kbase_handler_table[] = { #ifdef CONFIG_MALI_DEBUG #define JOB_IRQ_HANDLER JOB_IRQ_TAG -#define MMU_IRQ_HANDLER MMU_IRQ_TAG #define GPU_IRQ_HANDLER GPU_IRQ_TAG /** @@ -299,7 +298,7 @@ static irqreturn_t kbase_mmu_irq_test_handler(int irq, void *data) return IRQ_NONE; } - val = kbase_reg_read(kbdev, MMU_REG(MMU_IRQ_STATUS)); + val = kbase_reg_read(kbdev, MMU_CONTROL_REG(MMU_IRQ_STATUS)); spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); @@ -311,7 +310,7 @@ static irqreturn_t kbase_mmu_irq_test_handler(int irq, void *data) kbasep_irq_test_data.triggered = 1; wake_up(&kbasep_irq_test_data.wait); - kbase_reg_write(kbdev, MMU_REG(MMU_IRQ_CLEAR), val); + kbase_reg_write(kbdev, MMU_CONTROL_REG(MMU_IRQ_CLEAR), val); return IRQ_HANDLED; } @@ -345,8 +344,8 @@ static int kbasep_common_test_interrupt( break; case MMU_IRQ_TAG: test_handler = kbase_mmu_irq_test_handler; - rawstat_offset = MMU_REG(MMU_IRQ_RAWSTAT); - mask_offset = MMU_REG(MMU_IRQ_MASK); + rawstat_offset = MMU_CONTROL_REG(MMU_IRQ_RAWSTAT); + mask_offset = MMU_CONTROL_REG(MMU_IRQ_MASK); break; case GPU_IRQ_TAG: /* already tested by pm_driver - bail out */ @@ -501,4 +500,4 @@ void kbase_synchronize_irqs(struct kbase_device *kbdev) KBASE_EXPORT_TEST_API(kbase_synchronize_irqs); -#endif /* !IS_ENABLED(CONFIG_MALI_NO_MALI) */ +#endif /* IS_ENABLED(CONFIG_MALI_REAL_HW) */ diff --git a/mali_kbase/backend/gpu/mali_kbase_jm_as.c b/mali_kbase/backend/gpu/mali_kbase_jm_as.c index 309e5c7..7059c84 100644 --- a/mali_kbase/backend/gpu/mali_kbase_jm_as.c +++ b/mali_kbase/backend/gpu/mali_kbase_jm_as.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2014-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2014-2022 ARM Limited. All rights reserved. 
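The wait_prfcnt_ready() helper introduced in mali_kbase_instr_backend.c above bounds every access to the performance-counter registers with a polling loop on the PRFCNT_ACTIVE status bit, returning -EBUSY instead of touching PRFCNT_CONFIG while a sample is still in flight (the disable path deliberately ignores the result). A compact user-space sketch of that bounded busy-wait follows; the status source, loop limit and bit position are placeholders rather than the real register interface.

#include <errno.h>
#include <stdint.h>
#include <stdio.h>

#define PRFCNT_ACTIVE    (1u << 6)   /* assumed bit position, illustration only */
#define PRFCNT_MAX_LOOPS 100000u

/* Stand-in for a GPU status register whose busy bit eventually clears. */
static uint32_t read_gpu_status(void)
{
    static int countdown = 25;
    return (countdown-- > 0) ? PRFCNT_ACTIVE : 0;
}

/* Poll until the counters are idle, or give up after a fixed number of reads. */
static int wait_prfcnt_ready(void)
{
    for (uint32_t loops = 0; loops < PRFCNT_MAX_LOOPS; loops++) {
        if (!(read_gpu_status() & PRFCNT_ACTIVE))
            return 0;
    }
    fprintf(stderr, "PRFCNT_ACTIVE bit stuck\n");
    return -EBUSY;
}

int main(void)
{
    /* Callers bail out (or ignore the result when disabling) before writing config. */
    printf("wait_prfcnt_ready() = %d\n", wait_prfcnt_ready());
    return 0;
}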
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -67,9 +67,8 @@ static void assign_and_activate_kctx_addr_space(struct kbase_device *kbdev, kbase_js_runpool_inc_context_count(kbdev, kctx); } -bool kbase_backend_use_ctx_sched(struct kbase_device *kbdev, - struct kbase_context *kctx, - int js) +bool kbase_backend_use_ctx_sched(struct kbase_device *kbdev, struct kbase_context *kctx, + unsigned int js) { int i; @@ -240,4 +239,3 @@ bool kbase_backend_use_ctx(struct kbase_device *kbdev, return true; } - diff --git a/mali_kbase/backend/gpu/mali_kbase_jm_hw.c b/mali_kbase/backend/gpu/mali_kbase_jm_hw.c index 32bdf72..dd8f4d9 100644 --- a/mali_kbase/backend/gpu/mali_kbase_jm_hw.c +++ b/mali_kbase/backend/gpu/mali_kbase_jm_hw.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2010-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2010-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -34,7 +34,7 @@ #include <mali_kbase_ctx_sched.h> #include <mali_kbase_kinstr_jm.h> #include <mali_kbase_hwaccess_instr.h> -#include <mali_kbase_hwcnt_context.h> +#include <hwcnt/mali_kbase_hwcnt_context.h> #include <device/mali_kbase_device.h> #include <backend/gpu/mali_kbase_irq_internal.h> #include <backend/gpu/mali_kbase_jm_internal.h> @@ -44,9 +44,8 @@ static void kbasep_try_reset_gpu_early_locked(struct kbase_device *kbdev); static u64 kbasep_apply_limited_core_mask(const struct kbase_device *kbdev, const u64 affinity, const u64 limited_core_mask); -static u64 kbase_job_write_affinity(struct kbase_device *kbdev, - base_jd_core_req core_req, - int js, const u64 limited_core_mask) +static u64 kbase_job_write_affinity(struct kbase_device *kbdev, base_jd_core_req core_req, + unsigned int js, const u64 limited_core_mask) { u64 affinity; bool skip_affinity_check = false; @@ -191,9 +190,28 @@ static u64 select_job_chain(struct kbase_jd_atom *katom) return jc; } -void kbase_job_hw_submit(struct kbase_device *kbdev, - struct kbase_jd_atom *katom, - int js) +static inline bool kbasep_jm_wait_js_free(struct kbase_device *kbdev, unsigned int js, + struct kbase_context *kctx) +{ + const ktime_t wait_loop_start = ktime_get_raw(); + const s64 max_timeout = (s64)kbdev->js_data.js_free_wait_time_ms; + s64 diff = 0; + + /* wait for the JS_COMMAND_NEXT register to reach the given status value */ + do { + if (!kbase_reg_read(kbdev, JOB_SLOT_REG(js, JS_COMMAND_NEXT))) + return true; + + diff = ktime_to_ms(ktime_sub(ktime_get_raw(), wait_loop_start)); + } while (diff < max_timeout); + + dev_err(kbdev->dev, "Timeout in waiting for job slot %u to become free for ctx %d_%u", js, + kctx->tgid, kctx->id); + + return false; +} + +int kbase_job_hw_submit(struct kbase_device *kbdev, struct kbase_jd_atom *katom, unsigned int js) { struct kbase_context *kctx; u32 cfg; @@ -202,13 +220,12 @@ void kbase_job_hw_submit(struct kbase_device *kbdev, struct slot_rb *ptr_slot_rb = &kbdev->hwaccess.backend.slot_rb[js]; lockdep_assert_held(&kbdev->hwaccess_lock); - KBASE_DEBUG_ASSERT(kbdev); - KBASE_DEBUG_ASSERT(katom); kctx = katom->kctx; /* Command register must be available */ - KBASE_DEBUG_ASSERT(kbasep_jm_is_js_free(kbdev, js, kctx)); + if (!kbasep_jm_wait_js_free(kbdev, js, kctx)) + return -EPERM; dev_dbg(kctx->kbdev->dev, "Write JS_HEAD_NEXT 0x%llx 
for atom %pK\n", jc_head, (void *)katom); @@ -226,36 +243,47 @@ void kbase_job_hw_submit(struct kbase_device *kbdev, */ cfg = kctx->as_nr; - if (kbase_hw_has_feature(kbdev, BASE_HW_FEATURE_FLUSH_REDUCTION) && - !(kbdev->serialize_jobs & KBASE_SERIALIZE_RESET)) - cfg |= JS_CONFIG_ENABLE_FLUSH_REDUCTION; + if(!kbase_jd_katom_is_protected(katom)) { + if (kbase_hw_has_feature(kbdev, BASE_HW_FEATURE_FLUSH_REDUCTION) && + !(kbdev->serialize_jobs & KBASE_SERIALIZE_RESET)) + cfg |= JS_CONFIG_ENABLE_FLUSH_REDUCTION; + + if (0 != (katom->core_req & BASE_JD_REQ_SKIP_CACHE_START)) { + /* Force a cache maintenance operation if the newly submitted + * katom to the slot is from a different kctx. For a JM GPU + * that has the feature BASE_HW_FEATURE_FLUSH_INV_SHADER_OTHER, + * applies a FLUSH_INV_SHADER_OTHER. Otherwise, do a + * FLUSH_CLEAN_INVALIDATE. + */ + u64 tagged_kctx = ptr_slot_rb->last_kctx_tagged; + + if (tagged_kctx != SLOT_RB_NULL_TAG_VAL && + tagged_kctx != SLOT_RB_TAG_KCTX(kctx)) { + if (kbase_hw_has_feature(kbdev, + BASE_HW_FEATURE_FLUSH_INV_SHADER_OTHER)) + cfg |= JS_CONFIG_START_FLUSH_INV_SHADER_OTHER; + else + cfg |= JS_CONFIG_START_FLUSH_CLEAN_INVALIDATE; + } else + cfg |= JS_CONFIG_START_FLUSH_NO_ACTION; + } else + cfg |= JS_CONFIG_START_FLUSH_CLEAN_INVALIDATE; - if (0 != (katom->core_req & BASE_JD_REQ_SKIP_CACHE_START)) { - /* Force a cache maintenance operation if the newly submitted - * katom to the slot is from a different kctx. For a JM GPU - * that has the feature BASE_HW_FEATURE_FLUSH_INV_SHADER_OTHER, - * applies a FLUSH_INV_SHADER_OTHER. Otherwise, do a - * FLUSH_CLEAN_INVALIDATE. + if (0 != (katom->core_req & BASE_JD_REQ_SKIP_CACHE_END) && + !(kbdev->serialize_jobs & KBASE_SERIALIZE_RESET)) + cfg |= JS_CONFIG_END_FLUSH_NO_ACTION; + else if (kbase_hw_has_feature(kbdev, BASE_HW_FEATURE_CLEAN_ONLY_SAFE)) + cfg |= JS_CONFIG_END_FLUSH_CLEAN; + else + cfg |= JS_CONFIG_END_FLUSH_CLEAN_INVALIDATE; + } else { + /* Force cache flush on job chain start/end if katom is protected. + * Valhall JM GPUs have BASE_HW_FEATURE_CLEAN_ONLY_SAFE feature, + * so DDK set JS_CONFIG_END_FLUSH_CLEAN config */ - u64 tagged_kctx = ptr_slot_rb->last_kctx_tagged; - - if (tagged_kctx != SLOT_RB_NULL_TAG_VAL && tagged_kctx != SLOT_RB_TAG_KCTX(kctx)) { - if (kbase_hw_has_feature(kbdev, BASE_HW_FEATURE_FLUSH_INV_SHADER_OTHER)) - cfg |= JS_CONFIG_START_FLUSH_INV_SHADER_OTHER; - else - cfg |= JS_CONFIG_START_FLUSH_CLEAN_INVALIDATE; - } else - cfg |= JS_CONFIG_START_FLUSH_NO_ACTION; - } else cfg |= JS_CONFIG_START_FLUSH_CLEAN_INVALIDATE; - - if (0 != (katom->core_req & BASE_JD_REQ_SKIP_CACHE_END) && - !(kbdev->serialize_jobs & KBASE_SERIALIZE_RESET)) - cfg |= JS_CONFIG_END_FLUSH_NO_ACTION; - else if (kbase_hw_has_feature(kbdev, BASE_HW_FEATURE_CLEAN_ONLY_SAFE)) cfg |= JS_CONFIG_END_FLUSH_CLEAN; - else - cfg |= JS_CONFIG_END_FLUSH_CLEAN_INVALIDATE; + } cfg |= JS_CONFIG_THREAD_PRI(8); @@ -281,7 +309,7 @@ void kbase_job_hw_submit(struct kbase_device *kbdev, /* Write an approximate start timestamp. * It's approximate because there might be a job in the HEAD register. */ - katom->start_timestamp = ktime_get(); + katom->start_timestamp = ktime_get_raw(); /* GO ! 
*/ dev_dbg(kbdev->dev, "JS: Submitting atom %pK from ctx %pK to js[%d] with head=0x%llx", @@ -329,6 +357,8 @@ void kbase_job_hw_submit(struct kbase_device *kbdev, kbase_reg_write(kbdev, JOB_SLOT_REG(js, JS_COMMAND_NEXT), JS_COMMAND_START); + + return 0; } /** @@ -344,10 +374,8 @@ void kbase_job_hw_submit(struct kbase_device *kbdev, * work out the best estimate (which might still result in an over-estimate to * the calculated time spent) */ -static void kbasep_job_slot_update_head_start_timestamp( - struct kbase_device *kbdev, - int js, - ktime_t end_timestamp) +static void kbasep_job_slot_update_head_start_timestamp(struct kbase_device *kbdev, unsigned int js, + ktime_t end_timestamp) { ktime_t timestamp_diff; struct kbase_jd_atom *katom; @@ -377,8 +405,7 @@ static void kbasep_job_slot_update_head_start_timestamp( * Make a tracepoint call to the instrumentation module informing that * softstop happened on given lpu (job slot). */ -static void kbasep_trace_tl_event_lpu_softstop(struct kbase_device *kbdev, - int js) +static void kbasep_trace_tl_event_lpu_softstop(struct kbase_device *kbdev, unsigned int js) { KBASE_TLSTREAM_TL_EVENT_LPU_SOFTSTOP( kbdev, @@ -387,19 +414,17 @@ static void kbasep_trace_tl_event_lpu_softstop(struct kbase_device *kbdev, void kbase_job_done(struct kbase_device *kbdev, u32 done) { - int i; u32 count = 0; ktime_t end_timestamp; lockdep_assert_held(&kbdev->hwaccess_lock); - KBASE_DEBUG_ASSERT(kbdev); - KBASE_KTRACE_ADD_JM(kbdev, JM_IRQ, NULL, NULL, 0, done); - end_timestamp = ktime_get(); + end_timestamp = ktime_get_raw(); while (done) { + unsigned int i; u32 failed = done >> 16; /* treat failed slots as finished slots */ @@ -409,7 +434,6 @@ void kbase_job_done(struct kbase_device *kbdev, u32 done) * numbered interrupts before the higher numbered ones. */ i = ffs(finished) - 1; - KBASE_DEBUG_ASSERT(i >= 0); do { int nr_done; @@ -561,7 +585,7 @@ void kbase_job_done(struct kbase_device *kbdev, u32 done) count += nr_done; while (nr_done) { - if (nr_done == 1) { + if (likely(nr_done == 1)) { kbase_gpu_complete_hw(kbdev, i, completion_code, job_tail, @@ -580,6 +604,14 @@ void kbase_job_done(struct kbase_device *kbdev, u32 done) BASE_JD_EVENT_DONE, 0, &end_timestamp); +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) + /* Increment the end timestamp value by 1 ns to + * avoid having the same value for 'start_time_ns' + * and 'end_time_ns' for the 2nd atom whose job + * completion IRQ got merged with the 1st atom. 
+ */ + end_timestamp = ktime_add(end_timestamp, ns_to_ktime(1)); +#endif } nr_done--; } @@ -590,7 +622,7 @@ void kbase_job_done(struct kbase_device *kbdev, u32 done) failed = done >> 16; finished = (done & 0xFFFF) | failed; if (done) - end_timestamp = ktime_get(); + end_timestamp = ktime_get_raw(); } while (finished & (1 << i)); kbasep_job_slot_update_head_start_timestamp(kbdev, i, @@ -608,18 +640,16 @@ void kbase_job_done(struct kbase_device *kbdev, u32 done) KBASE_KTRACE_ADD_JM(kbdev, JM_IRQ_END, NULL, NULL, 0, count); } -void kbasep_job_slot_soft_or_hard_stop_do_action(struct kbase_device *kbdev, - int js, - u32 action, - base_jd_core_req core_reqs, - struct kbase_jd_atom *target_katom) +void kbasep_job_slot_soft_or_hard_stop_do_action(struct kbase_device *kbdev, unsigned int js, + u32 action, base_jd_core_req core_reqs, + struct kbase_jd_atom *target_katom) { #if KBASE_KTRACE_ENABLE u32 status_reg_before; u64 job_in_head_before; u32 status_reg_after; - KBASE_DEBUG_ASSERT(!(action & (~JS_COMMAND_MASK))); + WARN_ON(action & (~JS_COMMAND_MASK)); /* Check the head pointer */ job_in_head_before = ((u64) kbase_reg_read(kbdev, @@ -670,6 +700,10 @@ void kbasep_job_slot_soft_or_hard_stop_do_action(struct kbase_device *kbdev, struct kbase_context *head_kctx; head = kbase_gpu_inspect(kbdev, js, 0); + if (unlikely(!head)) { + dev_err(kbdev->dev, "Can't get a katom from js(%d)\n", js); + return; + } head_kctx = head->kctx; if (status_reg_before == BASE_JD_EVENT_ACTIVE) @@ -697,7 +731,8 @@ void kbasep_job_slot_soft_or_hard_stop_do_action(struct kbase_device *kbdev, KBASE_KTRACE_ADD_JM_SLOT(kbdev, JM_HARDSTOP_1, head_kctx, head, head->jc, js); break; default: - BUG(); + WARN(1, "Unknown action %d on atom %pK in kctx %pK\n", action, + (void *)target_katom, (void *)target_katom->kctx); break; } } else { @@ -726,7 +761,8 @@ void kbasep_job_slot_soft_or_hard_stop_do_action(struct kbase_device *kbdev, KBASE_KTRACE_ADD_JM_SLOT(kbdev, JM_HARDSTOP_1, NULL, NULL, 0, js); break; default: - BUG(); + WARN(1, "Unknown action %d on atom %pK in kctx %pK\n", action, + (void *)target_katom, (void *)target_katom->kctx); break; } } @@ -736,7 +772,7 @@ void kbasep_job_slot_soft_or_hard_stop_do_action(struct kbase_device *kbdev, void kbase_backend_jm_kill_running_jobs_from_kctx(struct kbase_context *kctx) { struct kbase_device *kbdev = kctx->kbdev; - int i; + unsigned int i; lockdep_assert_held(&kbdev->hwaccess_lock); @@ -748,13 +784,11 @@ void kbase_job_slot_ctx_priority_check_locked(struct kbase_context *kctx, struct kbase_jd_atom *target_katom) { struct kbase_device *kbdev; - int target_js = target_katom->slot_nr; + unsigned int target_js = target_katom->slot_nr; int i; bool stop_sent = false; - KBASE_DEBUG_ASSERT(kctx != NULL); kbdev = kctx->kbdev; - KBASE_DEBUG_ASSERT(kbdev != NULL); lockdep_assert_held(&kbdev->hwaccess_lock); @@ -884,11 +918,11 @@ u32 kbase_backend_get_current_flush_id(struct kbase_device *kbdev) u32 flush_id = 0; if (kbase_hw_has_feature(kbdev, BASE_HW_FEATURE_FLUSH_REDUCTION)) { - mutex_lock(&kbdev->pm.lock); + rt_mutex_lock(&kbdev->pm.lock); if (kbdev->pm.backend.gpu_powered) flush_id = kbase_reg_read(kbdev, GPU_CONTROL_REG(LATEST_FLUSH)); - mutex_unlock(&kbdev->pm.lock); + rt_mutex_unlock(&kbdev->pm.lock); } return flush_id; @@ -928,13 +962,17 @@ KBASE_EXPORT_TEST_API(kbase_job_slot_term); * * Where possible any job in the next register is evicted before the soft-stop. 
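The kbase_job_done() changes above move the completion timestamps to the raw monotonic clock and, when CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD is enabled, nudge the shared end timestamp forward by 1 ns for the second atom whose completion IRQ was merged with the first, so the two reported work periods never have identical start and end times. The sketch below shows only that disambiguation step; the job structure and clock plumbing are simplified stand-ins.

#include <stdint.h>
#include <stdio.h>
#include <time.h>

struct job { uint64_t start_ns, end_ns; };

static uint64_t now_raw_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC_RAW, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

int main(void)
{
    struct job jobs[2] = { { now_raw_ns(), 0 }, { now_raw_ns(), 0 } };
    uint64_t end_timestamp = now_raw_ns();   /* one IRQ covered both completions */

    for (int i = 0; i < 2; i++) {
        jobs[i].end_ns = end_timestamp;
        /* Bump by 1 ns so the next merged completion gets a distinct interval. */
        end_timestamp += 1;
    }

    printf("job0 end %llu, job1 end %llu\n",
           (unsigned long long)jobs[0].end_ns, (unsigned long long)jobs[1].end_ns);
    return 0;
}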
*/ -void kbase_job_slot_softstop_swflags(struct kbase_device *kbdev, int js, - struct kbase_jd_atom *target_katom, u32 sw_flags) +void kbase_job_slot_softstop_swflags(struct kbase_device *kbdev, unsigned int js, + struct kbase_jd_atom *target_katom, u32 sw_flags) { dev_dbg(kbdev->dev, "Soft-stop atom %pK with flags 0x%x (s:%d)\n", target_katom, sw_flags, js); - KBASE_DEBUG_ASSERT(!(sw_flags & JS_COMMAND_MASK)); + if (sw_flags & JS_COMMAND_MASK) { + WARN(true, "Atom %pK in kctx %pK received non-NOP flags %d\n", (void *)target_katom, + target_katom ? (void *)target_katom->kctx : NULL, sw_flags); + sw_flags &= ~((u32)JS_COMMAND_MASK); + } kbase_backend_soft_hard_stop_slot(kbdev, NULL, js, target_katom, JS_COMMAND_SOFT_STOP | sw_flags); } @@ -945,8 +983,8 @@ void kbase_job_slot_softstop(struct kbase_device *kbdev, int js, kbase_job_slot_softstop_swflags(kbdev, js, target_katom, 0u); } -void kbase_job_slot_hardstop(struct kbase_context *kctx, int js, - struct kbase_jd_atom *target_katom) +void kbase_job_slot_hardstop(struct kbase_context *kctx, unsigned int js, + struct kbase_jd_atom *target_katom) { struct kbase_device *kbdev = kctx->kbdev; @@ -1031,12 +1069,12 @@ static void kbase_debug_dump_registers(struct kbase_device *kbdev) i, kbase_reg_read(kbdev, JOB_SLOT_REG(i, JS_HEAD_LO))); } dev_err(kbdev->dev, " MMU_IRQ_RAWSTAT=0x%08x GPU_FAULTSTATUS=0x%08x", - kbase_reg_read(kbdev, MMU_REG(MMU_IRQ_RAWSTAT)), + kbase_reg_read(kbdev, MMU_CONTROL_REG(MMU_IRQ_RAWSTAT)), kbase_reg_read(kbdev, GPU_CONTROL_REG(GPU_FAULTSTATUS))); dev_err(kbdev->dev, " GPU_IRQ_MASK=0x%08x JOB_IRQ_MASK=0x%08x MMU_IRQ_MASK=0x%08x", kbase_reg_read(kbdev, GPU_CONTROL_REG(GPU_IRQ_MASK)), kbase_reg_read(kbdev, JOB_CONTROL_REG(JOB_IRQ_MASK)), - kbase_reg_read(kbdev, MMU_REG(MMU_IRQ_MASK))); + kbase_reg_read(kbdev, MMU_CONTROL_REG(MMU_IRQ_MASK))); dev_err(kbdev->dev, " PWR_OVERRIDE0=0x%08x PWR_OVERRIDE1=0x%08x", kbase_reg_read(kbdev, GPU_CONTROL_REG(PWR_OVERRIDE0)), kbase_reg_read(kbdev, GPU_CONTROL_REG(PWR_OVERRIDE1))); @@ -1052,17 +1090,14 @@ static void kbasep_reset_timeout_worker(struct work_struct *data) { unsigned long flags; struct kbase_device *kbdev; - ktime_t end_timestamp = ktime_get(); + ktime_t end_timestamp = ktime_get_raw(); struct kbasep_js_device_data *js_devdata; bool silent = false; u32 max_loops = KBASE_CLEAN_CACHE_MAX_LOOPS; - KBASE_DEBUG_ASSERT(data); - kbdev = container_of(data, struct kbase_device, hwaccess.backend.reset_work); - KBASE_DEBUG_ASSERT(kbdev); js_devdata = &kbdev->js_data; if (atomic_read(&kbdev->hwaccess.backend.reset_gpu) == @@ -1097,7 +1132,7 @@ static void kbasep_reset_timeout_worker(struct work_struct *data) return; } - KBASE_DEBUG_ASSERT(kbdev->irq_reset_flush == false); + WARN(kbdev->irq_reset_flush, "%s: GPU reset already in flight\n", __func__); spin_lock_irqsave(&kbdev->hwaccess_lock, flags); spin_lock(&kbdev->mmu_mask_change); @@ -1136,9 +1171,10 @@ static void kbasep_reset_timeout_worker(struct work_struct *data) WARN(!max_loops, "L2 power transition timed out while trying to reset\n"); } - mutex_lock(&kbdev->pm.lock); + rt_mutex_lock(&kbdev->pm.lock); /* We hold the pm lock, so there ought to be a current policy */ - KBASE_DEBUG_ASSERT(kbdev->pm.backend.pm_current_policy); + if (unlikely(!kbdev->pm.backend.pm_current_policy)) + dev_warn(kbdev->dev, "No power policy set!"); /* All slot have been soft-stopped and we've waited * SOFT_STOP_RESET_TIMEOUT for the slots to clear, at this point we @@ -1174,7 +1210,7 @@ static void kbasep_reset_timeout_worker(struct work_struct *data) /* 
Reset the GPU */ kbase_pm_init_hw(kbdev, 0); - mutex_unlock(&kbdev->pm.lock); + rt_mutex_unlock(&kbdev->pm.lock); mutex_lock(&js_devdata->runpool_mutex); @@ -1190,7 +1226,7 @@ static void kbasep_reset_timeout_worker(struct work_struct *data) mutex_unlock(&js_devdata->runpool_mutex); - mutex_lock(&kbdev->pm.lock); + rt_mutex_lock(&kbdev->pm.lock); kbase_pm_reset_complete(kbdev); @@ -1202,7 +1238,7 @@ static void kbasep_reset_timeout_worker(struct work_struct *data) */ kbase_pm_wait_for_desired_state(kbdev); - mutex_unlock(&kbdev->pm.lock); + rt_mutex_unlock(&kbdev->pm.lock); atomic_set(&kbdev->hwaccess.backend.reset_gpu, KBASE_RESET_GPU_NOT_PENDING); @@ -1235,8 +1271,6 @@ static enum hrtimer_restart kbasep_reset_timer_callback(struct hrtimer *timer) struct kbase_device *kbdev = container_of(timer, struct kbase_device, hwaccess.backend.reset_timer); - KBASE_DEBUG_ASSERT(kbdev); - /* Reset still pending? */ if (atomic_cmpxchg(&kbdev->hwaccess.backend.reset_gpu, KBASE_RESET_GPU_COMMITTED, KBASE_RESET_GPU_HAPPENING) == @@ -1254,11 +1288,9 @@ static enum hrtimer_restart kbasep_reset_timer_callback(struct hrtimer *timer) static void kbasep_try_reset_gpu_early_locked(struct kbase_device *kbdev) { - int i; + unsigned int i; int pending_jobs = 0; - KBASE_DEBUG_ASSERT(kbdev); - /* Count the number of jobs */ for (i = 0; i < kbdev->gpu_props.num_job_slots; i++) pending_jobs += kbase_backend_nr_atoms_submitted(kbdev, i); @@ -1316,8 +1348,6 @@ bool kbase_prepare_to_reset_gpu_locked(struct kbase_device *kbdev, { int i; - KBASE_DEBUG_ASSERT(kbdev); - #ifdef CONFIG_MALI_ARBITER_SUPPORT if (kbase_pm_is_gpu_lost(kbdev)) { /* GPU access has been removed, reset will be done by @@ -1371,13 +1401,11 @@ KBASE_EXPORT_TEST_API(kbase_prepare_to_reset_gpu); */ void kbase_reset_gpu(struct kbase_device *kbdev) { - KBASE_DEBUG_ASSERT(kbdev); - /* Note this is an assert/atomic_set because it is a software issue for * a race to be occurring here */ - KBASE_DEBUG_ASSERT(atomic_read(&kbdev->hwaccess.backend.reset_gpu) == - KBASE_RESET_GPU_PREPARED); + if (WARN_ON(atomic_read(&kbdev->hwaccess.backend.reset_gpu) != KBASE_RESET_GPU_PREPARED)) + return; atomic_set(&kbdev->hwaccess.backend.reset_gpu, KBASE_RESET_GPU_COMMITTED); @@ -1395,13 +1423,11 @@ KBASE_EXPORT_TEST_API(kbase_reset_gpu); void kbase_reset_gpu_locked(struct kbase_device *kbdev) { - KBASE_DEBUG_ASSERT(kbdev); - /* Note this is an assert/atomic_set because it is a software issue for * a race to be occurring here */ - KBASE_DEBUG_ASSERT(atomic_read(&kbdev->hwaccess.backend.reset_gpu) == - KBASE_RESET_GPU_PREPARED); + if (WARN_ON(atomic_read(&kbdev->hwaccess.backend.reset_gpu) != KBASE_RESET_GPU_PREPARED)) + return; atomic_set(&kbdev->hwaccess.backend.reset_gpu, KBASE_RESET_GPU_COMMITTED); @@ -1442,6 +1468,11 @@ bool kbase_reset_gpu_is_active(struct kbase_device *kbdev) return true; } +bool kbase_reset_gpu_is_not_pending(struct kbase_device *kbdev) +{ + return atomic_read(&kbdev->hwaccess.backend.reset_gpu) == KBASE_RESET_GPU_NOT_PENDING; +} + int kbase_reset_gpu_wait(struct kbase_device *kbdev) { wait_event(kbdev->hwaccess.backend.reset_wait, diff --git a/mali_kbase/backend/gpu/mali_kbase_jm_internal.h b/mali_kbase/backend/gpu/mali_kbase_jm_internal.h index 1039e85..380a530 100644 --- a/mali_kbase/backend/gpu/mali_kbase_jm_internal.h +++ b/mali_kbase/backend/gpu/mali_kbase_jm_internal.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2011-2016, 2018-2021 ARM Limited. All rights reserved. 
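The reset path above also drops KBASE_DEBUG_ASSERT in favour of WARN_ON plus an early return: kbase_reset_gpu() only advances the reset state machine when it is actually in PREPARED, and kbasep_reset_timer_callback() uses atomic_cmpxchg() so that exactly one path claims the COMMITTED-to-HAPPENING transition. A C11-atomics sketch of that claim-the-transition idiom follows; the state names mirror the driver, but the rest (including expressing the PREPARED check as a compare-exchange) is illustrative.

#include <stdatomic.h>
#include <stdio.h>

enum reset_state { RESET_NOT_PENDING, RESET_PREPARED, RESET_COMMITTED, RESET_HAPPENING };

static _Atomic int reset_gpu = RESET_NOT_PENDING;

/* Only the caller that successfully swaps COMMITTED -> HAPPENING runs the reset. */
static int try_claim_reset(void)
{
    int expected = RESET_COMMITTED;
    return atomic_compare_exchange_strong(&reset_gpu, &expected, RESET_HAPPENING);
}

int main(void)
{
    atomic_store(&reset_gpu, RESET_PREPARED);

    /* kbase_reset_gpu(): warn and bail out unless a reset was prepared first
     * (the driver checks with WARN_ON + atomic_set; a compare-exchange expresses
     * the same intent in this sketch). */
    int expected = RESET_PREPARED;
    if (!atomic_compare_exchange_strong(&reset_gpu, &expected, RESET_COMMITTED)) {
        fprintf(stderr, "reset requested without prepare\n");
        return 1;
    }

    printf("first claim: %d, second claim: %d\n", try_claim_reset(), try_claim_reset());
    return 0;
}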
+ * (C) COPYRIGHT 2011-2016, 2018-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -34,21 +34,6 @@ #include <device/mali_kbase_device.h> /** - * kbase_job_submit_nolock() - Submit a job to a certain job-slot - * @kbdev: Device pointer - * @katom: Atom to submit - * @js: Job slot to submit on - * - * The caller must check kbasep_jm_is_submit_slots_free() != false before - * calling this. - * - * The following locking conditions are made on the caller: - * - it must hold the hwaccess_lock - */ -void kbase_job_submit_nolock(struct kbase_device *kbdev, - struct kbase_jd_atom *katom, int js); - -/** * kbase_job_done_slot() - Complete the head job on a particular job-slot * @kbdev: Device pointer * @s: Job slot @@ -60,23 +45,13 @@ void kbase_job_done_slot(struct kbase_device *kbdev, int s, u32 completion_code, u64 job_tail, ktime_t *end_timestamp); #if IS_ENABLED(CONFIG_GPU_TRACEPOINTS) -static inline char *kbasep_make_job_slot_string(int js, char *js_string, - size_t js_size) +static inline char *kbasep_make_job_slot_string(unsigned int js, char *js_string, size_t js_size) { - snprintf(js_string, js_size, "job_slot_%i", js); + (void)scnprintf(js_string, js_size, "job_slot_%u", js); return js_string; } #endif -#if !MALI_USE_CSF -static inline int kbasep_jm_is_js_free(struct kbase_device *kbdev, int js, - struct kbase_context *kctx) -{ - return !kbase_reg_read(kbdev, JOB_SLOT_REG(js, JS_COMMAND_NEXT)); -} -#endif - - /** * kbase_job_hw_submit() - Submit a job to the GPU * @kbdev: Device pointer @@ -88,10 +63,10 @@ static inline int kbasep_jm_is_js_free(struct kbase_device *kbdev, int js, * * The following locking conditions are made on the caller: * - it must hold the hwaccess_lock + * + * Return: 0 if the job was successfully submitted to hardware, an error otherwise. 
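kbase_job_hw_submit() now returns an int because submission is guarded by kbasep_jm_wait_js_free(), which spins until the slot's JS_COMMAND_NEXT register reads zero or the js_free_wait_time_ms budget elapses, instead of asserting that the slot is already free; on timeout the submit path returns -EPERM. Below is a self-contained sketch of that elapsed-time-bounded wait; the register read, the timeout value and the chosen error code are illustrative assumptions.

#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

static int64_t now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC_RAW, &ts);
    return (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Stand-in for reading JS_COMMAND_NEXT: non-zero means the slot is still busy. */
static uint32_t read_js_command_next(void)
{
    static int busy_reads = 3;
    return busy_reads-- > 0 ? 1u : 0u;
}

/* Wait for the slot to drain, but never longer than the configured budget. */
static bool wait_js_free(int64_t max_timeout_ms)
{
    const int64_t start = now_ms();

    do {
        if (!read_js_command_next())
            return true;
    } while (now_ms() - start < max_timeout_ms);

    fprintf(stderr, "Timeout waiting for job slot to become free\n");
    return false;
}

static int job_hw_submit(void)
{
    if (!wait_js_free(100))
        return -EPERM;       /* caller keeps the atom and releases any held resources */
    /* ... write affinity, config and JS_COMMAND_START here ... */
    return 0;
}

int main(void)
{
    printf("job_hw_submit() = %d\n", job_hw_submit());
    return 0;
}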
*/ -void kbase_job_hw_submit(struct kbase_device *kbdev, - struct kbase_jd_atom *katom, - int js); +int kbase_job_hw_submit(struct kbase_device *kbdev, struct kbase_jd_atom *katom, unsigned int js); #if !MALI_USE_CSF /** @@ -107,11 +82,9 @@ void kbase_job_hw_submit(struct kbase_device *kbdev, * The following locking conditions are made on the caller: * - it must hold the hwaccess_lock */ -void kbasep_job_slot_soft_or_hard_stop_do_action(struct kbase_device *kbdev, - int js, - u32 action, - base_jd_core_req core_reqs, - struct kbase_jd_atom *target_katom); +void kbasep_job_slot_soft_or_hard_stop_do_action(struct kbase_device *kbdev, unsigned int js, + u32 action, base_jd_core_req core_reqs, + struct kbase_jd_atom *target_katom); #endif /* !MALI_USE_CSF */ /** @@ -135,11 +108,8 @@ void kbasep_job_slot_soft_or_hard_stop_do_action(struct kbase_device *kbdev, * * Return: true if an atom was stopped, false otherwise */ -bool kbase_backend_soft_hard_stop_slot(struct kbase_device *kbdev, - struct kbase_context *kctx, - int js, - struct kbase_jd_atom *katom, - u32 action); +bool kbase_backend_soft_hard_stop_slot(struct kbase_device *kbdev, struct kbase_context *kctx, + unsigned int js, struct kbase_jd_atom *katom, u32 action); /** * kbase_job_slot_init - Initialise job slot framework diff --git a/mali_kbase/backend/gpu/mali_kbase_jm_rb.c b/mali_kbase/backend/gpu/mali_kbase_jm_rb.c index eaa3640..66f068a 100644 --- a/mali_kbase/backend/gpu/mali_kbase_jm_rb.c +++ b/mali_kbase/backend/gpu/mali_kbase_jm_rb.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2014-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2014-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -29,9 +29,12 @@ #include <mali_kbase_jm.h> #include <mali_kbase_js.h> #include <tl/mali_kbase_tracepoints.h> -#include <mali_kbase_hwcnt_context.h> +#include <hwcnt/mali_kbase_hwcnt_context.h> #include <mali_kbase_reset_gpu.h> #include <mali_kbase_kinstr_jm.h> +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) +#include <mali_kbase_gpu_metrics.h> +#endif #include <backend/gpu/mali_kbase_cache_policy_backend.h> #include <device/mali_kbase_device.h> #include <backend/gpu/mali_kbase_jm_internal.h> @@ -93,9 +96,8 @@ static void kbase_gpu_enqueue_atom(struct kbase_device *kbdev, * * Return: Atom removed from ringbuffer */ -static struct kbase_jd_atom *kbase_gpu_dequeue_atom(struct kbase_device *kbdev, - int js, - ktime_t *end_timestamp) +static struct kbase_jd_atom *kbase_gpu_dequeue_atom(struct kbase_device *kbdev, unsigned int js, + ktime_t *end_timestamp) { struct slot_rb *rb = &kbdev->hwaccess.backend.slot_rb[js]; struct kbase_jd_atom *katom; @@ -118,8 +120,7 @@ static struct kbase_jd_atom *kbase_gpu_dequeue_atom(struct kbase_device *kbdev, return katom; } -struct kbase_jd_atom *kbase_gpu_inspect(struct kbase_device *kbdev, int js, - int idx) +struct kbase_jd_atom *kbase_gpu_inspect(struct kbase_device *kbdev, unsigned int js, int idx) { struct slot_rb *rb = &kbdev->hwaccess.backend.slot_rb[js]; @@ -131,8 +132,7 @@ struct kbase_jd_atom *kbase_gpu_inspect(struct kbase_device *kbdev, int js, return rb->entries[(rb->read_idx + idx) & SLOT_RB_MASK].katom; } -struct kbase_jd_atom *kbase_backend_inspect_tail(struct kbase_device *kbdev, - int js) +struct kbase_jd_atom *kbase_backend_inspect_tail(struct kbase_device *kbdev, unsigned int js) { 
struct slot_rb *rb = &kbdev->hwaccess.backend.slot_rb[js]; @@ -144,12 +144,13 @@ struct kbase_jd_atom *kbase_backend_inspect_tail(struct kbase_device *kbdev, bool kbase_gpu_atoms_submitted_any(struct kbase_device *kbdev) { - int js; - int i; + unsigned int js; lockdep_assert_held(&kbdev->hwaccess_lock); for (js = 0; js < kbdev->gpu_props.num_job_slots; js++) { + int i; + for (i = 0; i < SLOT_RB_SIZE; i++) { struct kbase_jd_atom *katom = kbase_gpu_inspect(kbdev, js, i); @@ -160,7 +161,7 @@ bool kbase_gpu_atoms_submitted_any(struct kbase_device *kbdev) return false; } -int kbase_backend_nr_atoms_submitted(struct kbase_device *kbdev, int js) +int kbase_backend_nr_atoms_submitted(struct kbase_device *kbdev, unsigned int js) { int nr = 0; int i; @@ -178,7 +179,7 @@ int kbase_backend_nr_atoms_submitted(struct kbase_device *kbdev, int js) return nr; } -int kbase_backend_nr_atoms_on_slot(struct kbase_device *kbdev, int js) +int kbase_backend_nr_atoms_on_slot(struct kbase_device *kbdev, unsigned int js) { int nr = 0; int i; @@ -193,8 +194,8 @@ int kbase_backend_nr_atoms_on_slot(struct kbase_device *kbdev, int js) return nr; } -static int kbase_gpu_nr_atoms_on_slot_min(struct kbase_device *kbdev, int js, - enum kbase_atom_gpu_rb_state min_rb_state) +static int kbase_gpu_nr_atoms_on_slot_min(struct kbase_device *kbdev, unsigned int js, + enum kbase_atom_gpu_rb_state min_rb_state) { int nr = 0; int i; @@ -244,9 +245,11 @@ static bool check_secure_atom(struct kbase_jd_atom *katom, bool secure) static bool kbase_gpu_check_secure_atoms(struct kbase_device *kbdev, bool secure) { - int js, i; + unsigned int js; for (js = 0; js < kbdev->gpu_props.num_job_slots; js++) { + int i; + for (i = 0; i < SLOT_RB_SIZE; i++) { struct kbase_jd_atom *katom = kbase_gpu_inspect(kbdev, js, i); @@ -261,7 +264,7 @@ static bool kbase_gpu_check_secure_atoms(struct kbase_device *kbdev, return false; } -int kbase_backend_slot_free(struct kbase_device *kbdev, int js) +int kbase_backend_slot_free(struct kbase_device *kbdev, unsigned int js) { lockdep_assert_held(&kbdev->hwaccess_lock); @@ -274,6 +277,59 @@ int kbase_backend_slot_free(struct kbase_device *kbdev, int js) return SLOT_RB_SIZE - kbase_backend_nr_atoms_on_slot(kbdev, js); } +/** + * trace_atom_completion_for_gpu_metrics - Report the completion of atom for the + * purpose of emitting power/gpu_work_period + * tracepoint. + * + * @katom: Pointer to the atom that completed execution on GPU. + * @end_timestamp: Pointer to the timestamp of atom completion. May be NULL, in + * which case current time will be used. + * + * The function would also report the start for an atom that was in the HEAD_NEXT + * register. + * + * Note: Caller must hold the HW access lock. + */ +static inline void trace_atom_completion_for_gpu_metrics( + struct kbase_jd_atom *const katom, + ktime_t *end_timestamp) +{ +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) + u64 complete_ns; + struct kbase_context *kctx = katom->kctx; + struct kbase_jd_atom *queued = + kbase_gpu_inspect(kctx->kbdev, katom->slot_nr, 1); + +#ifdef CONFIG_MALI_DEBUG + WARN_ON(!kbase_gpu_inspect(kctx->kbdev, katom->slot_nr, 0)); +#endif + + lockdep_assert_held(&kctx->kbdev->hwaccess_lock); + + if (unlikely(queued == katom)) + return; + + /* A protected atom and a non-protected atom cannot be in the RB_SUBMITTED + * state at the same time in the job slot ringbuffer. 
Atom submission state + * machine prevents the submission of a non-protected atom until all + * protected atoms have completed and GPU has exited the protected mode. + * This implies that if the queued atom is in RB_SUBMITTED state, it shall + * be a protected atom and so we can return early. + */ + if (unlikely(kbase_jd_katom_is_protected(katom))) + return; + + if (likely(end_timestamp)) + complete_ns = ktime_to_ns(*end_timestamp); + else + complete_ns = ktime_get_raw_ns(); + + kbase_gpu_metrics_ctx_end_activity(kctx, complete_ns); + if (queued && queued->gpu_rb_state == KBASE_ATOM_GPU_RB_SUBMITTED) + kbase_gpu_metrics_ctx_start_activity(queued->kctx, complete_ns); +#endif +} static void kbase_gpu_release_atom(struct kbase_device *kbdev, struct kbase_jd_atom *katom, @@ -290,6 +346,7 @@ static void kbase_gpu_release_atom(struct kbase_device *kbdev, break; case KBASE_ATOM_GPU_RB_SUBMITTED: + trace_atom_completion_for_gpu_metrics(katom, end_timestamp); kbase_kinstr_jm_atom_hw_release(katom); /* Inform power management at start/finish of atom so it can * update its GPU utilisation metrics. Mark atom as not @@ -298,8 +355,7 @@ static void kbase_gpu_release_atom(struct kbase_device *kbdev, katom->gpu_rb_state = KBASE_ATOM_GPU_RB_READY; kbase_pm_metrics_update(kbdev, end_timestamp); - /* Inform platform at start/finish of atom */ - kbasep_platform_event_atom_complete(katom); + kbasep_platform_event_work_end(katom); if (katom->core_req & BASE_JD_REQ_PERMON) kbase_pm_release_gpu_cycle_counter_nolock(kbdev); @@ -347,16 +403,35 @@ static void kbase_gpu_release_atom(struct kbase_device *kbdev, katom->protected_state.exit != KBASE_ATOM_EXIT_PROTECTED_CHECK) kbdev->protected_mode_transition = false; + + /* If the atom is at KBASE_ATOM_ENTER_PROTECTED_HWCNT state, it means + * one of two events prevented it from progressing to the next state and + * ultimately reach protected mode: + * - hwcnts were enabled, and the atom had to schedule a worker to + * disable them. + * - the hwcnts were already disabled, but some other error occurred. + * In the first case, if the worker has not yet completed + * (kbdev->protected_mode_hwcnt_disabled == false), we need to re-enable + * them and signal to the worker they have already been enabled + */ + if (kbase_jd_katom_is_protected(katom) && + (katom->protected_state.enter == KBASE_ATOM_ENTER_PROTECTED_HWCNT)) { + kbdev->protected_mode_hwcnt_desired = true; + if (kbdev->protected_mode_hwcnt_disabled) { + kbase_hwcnt_context_enable(kbdev->hwcnt_gpu_ctx); + kbdev->protected_mode_hwcnt_disabled = false; + } + } + /* If the atom has suspended hwcnt but has not yet entered * protected mode, then resume hwcnt now. If the GPU is now in * protected mode then hwcnt will be resumed by GPU reset so * don't resume it here. 
*/ if (kbase_jd_katom_is_protected(katom) && - ((katom->protected_state.enter == - KBASE_ATOM_ENTER_PROTECTED_IDLE_L2) || - (katom->protected_state.enter == - KBASE_ATOM_ENTER_PROTECTED_SET_COHERENCY))) { + ((katom->protected_state.enter == KBASE_ATOM_ENTER_PROTECTED_IDLE_L2) || + (katom->protected_state.enter == KBASE_ATOM_ENTER_PROTECTED_SET_COHERENCY) || + (katom->protected_state.enter == KBASE_ATOM_ENTER_PROTECTED_FINISHED))) { WARN_ON(!kbdev->protected_mode_hwcnt_disabled); kbdev->protected_mode_hwcnt_desired = true; if (kbdev->protected_mode_hwcnt_disabled) { @@ -411,9 +486,9 @@ static void kbase_gpu_mark_atom_for_return(struct kbase_device *kbdev, * * Return: true if any slots other than @js are busy, false otherwise */ -static inline bool other_slots_busy(struct kbase_device *kbdev, int js) +static inline bool other_slots_busy(struct kbase_device *kbdev, unsigned int js) { - int slot; + unsigned int slot; for (slot = 0; slot < kbdev->gpu_props.num_job_slots; slot++) { if (slot == js) @@ -507,17 +582,14 @@ static int kbase_jm_protected_entry(struct kbase_device *kbdev, KBASE_TLSTREAM_AUX_PROTECTED_ENTER_END(kbdev, kbdev); if (err) { /* - * Failed to switch into protected mode, resume - * GPU hwcnt and fail atom. + * Failed to switch into protected mode. + * + * At this point we expect: + * katom->gpu_rb_state = KBASE_ATOM_GPU_RB_WAITING_PROTECTED_MODE_TRANSITION && + * katom->protected_state.enter = KBASE_ATOM_ENTER_PROTECTED_FINISHED + * ==> + * kbdev->protected_mode_hwcnt_disabled = false */ - WARN_ON(!kbdev->protected_mode_hwcnt_disabled); - kbdev->protected_mode_hwcnt_desired = true; - if (kbdev->protected_mode_hwcnt_disabled) { - kbase_hwcnt_context_enable( - kbdev->hwcnt_gpu_ctx); - kbdev->protected_mode_hwcnt_disabled = false; - } - katom[idx]->event_code = BASE_JD_EVENT_JOB_INVALID; kbase_gpu_mark_atom_for_return(kbdev, katom[idx]); /* @@ -537,12 +609,9 @@ static int kbase_jm_protected_entry(struct kbase_device *kbdev, /* * Protected mode sanity checks. 
*/ - KBASE_DEBUG_ASSERT_MSG( - kbase_jd_katom_is_protected(katom[idx]) == - kbase_gpu_in_protected_mode(kbdev), - "Protected mode of atom (%d) doesn't match protected mode of GPU (%d)", - kbase_jd_katom_is_protected(katom[idx]), - kbase_gpu_in_protected_mode(kbdev)); + WARN(kbase_jd_katom_is_protected(katom[idx]) != kbase_gpu_in_protected_mode(kbdev), + "Protected mode of atom (%d) doesn't match protected mode of GPU (%d)", + kbase_jd_katom_is_protected(katom[idx]), kbase_gpu_in_protected_mode(kbdev)); katom[idx]->gpu_rb_state = KBASE_ATOM_GPU_RB_READY; @@ -831,7 +900,7 @@ static int kbase_jm_exit_protected_mode(struct kbase_device *kbdev, void kbase_backend_slot_update(struct kbase_device *kbdev) { - int js; + unsigned int js; lockdep_assert_held(&kbdev->hwaccess_lock); @@ -853,6 +922,9 @@ void kbase_backend_slot_update(struct kbase_device *kbdev) for (idx = 0; idx < SLOT_RB_SIZE; idx++) { bool cores_ready; +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) + bool trace_atom_submit_for_gpu_metrics = true; +#endif int ret; if (!katom[idx]) @@ -952,18 +1024,6 @@ void kbase_backend_slot_update(struct kbase_device *kbdev) cores_ready = kbase_pm_cores_requested(kbdev, true); - if (katom[idx]->event_code == - BASE_JD_EVENT_PM_EVENT) { - KBASE_KTRACE_ADD_JM_SLOT_INFO( - kbdev, JM_MARK_FOR_RETURN_TO_JS, - katom[idx]->kctx, katom[idx], - katom[idx]->jc, js, - katom[idx]->event_code); - katom[idx]->gpu_rb_state = - KBASE_ATOM_GPU_RB_RETURN_TO_JS; - break; - } - if (!cores_ready) break; @@ -975,12 +1035,21 @@ void kbase_backend_slot_update(struct kbase_device *kbdev) case KBASE_ATOM_GPU_RB_READY: if (idx == 1) { + enum kbase_atom_gpu_rb_state atom_0_gpu_rb_state = + katom[0]->gpu_rb_state; + +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) + trace_atom_submit_for_gpu_metrics = + (atom_0_gpu_rb_state == + KBASE_ATOM_GPU_RB_NOT_IN_SLOT_RB); +#endif + /* Only submit if head atom or previous * atom already submitted */ - if ((katom[0]->gpu_rb_state != + if ((atom_0_gpu_rb_state != KBASE_ATOM_GPU_RB_SUBMITTED && - katom[0]->gpu_rb_state != + atom_0_gpu_rb_state != KBASE_ATOM_GPU_RB_NOT_IN_SLOT_RB)) break; @@ -1000,36 +1069,42 @@ void kbase_backend_slot_update(struct kbase_device *kbdev) other_slots_busy(kbdev, js)) break; -#ifdef CONFIG_MALI_GEM5_BUILD - if (!kbasep_jm_is_js_free(kbdev, js, - katom[idx]->kctx)) - break; -#endif /* Check if this job needs the cycle counter * enabled before submission */ if (katom[idx]->core_req & BASE_JD_REQ_PERMON) - kbase_pm_request_gpu_cycle_counter_l2_is_on( - kbdev); + kbase_pm_request_gpu_cycle_counter_l2_is_on(kbdev); - kbase_job_hw_submit(kbdev, katom[idx], js); - katom[idx]->gpu_rb_state = - KBASE_ATOM_GPU_RB_SUBMITTED; + if (!kbase_job_hw_submit(kbdev, katom[idx], js)) { + katom[idx]->gpu_rb_state = KBASE_ATOM_GPU_RB_SUBMITTED; + + /* Inform power management at start/finish of + * atom so it can update its GPU utilisation + * metrics. 
+ */ + kbase_pm_metrics_update(kbdev, + &katom[idx]->start_timestamp); + + /* Inform platform at start/finish of atom */ + + kbasep_platform_event_work_begin(katom[idx]); +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) + if (likely(trace_atom_submit_for_gpu_metrics && + !kbase_jd_katom_is_protected(katom[idx]))) + kbase_gpu_metrics_ctx_start_activity( + katom[idx]->kctx, + ktime_to_ns(katom[idx]->start_timestamp)); +#endif + } else { + if (katom[idx]->core_req & BASE_JD_REQ_PERMON) + kbase_pm_release_gpu_cycle_counter_nolock(kbdev); + + break; + } /* ***TRANSITION TO HIGHER STATE*** */ fallthrough; case KBASE_ATOM_GPU_RB_SUBMITTED: - - /* Inform power management at start/finish of - * atom so it can update its GPU utilisation - * metrics. - */ - kbase_pm_metrics_update(kbdev, - &katom[idx]->start_timestamp); - - /* Inform platform at start/finish of atom */ - kbasep_platform_event_atom_submit(katom[idx]); - break; case KBASE_ATOM_GPU_RB_RETURN_TO_JS: @@ -1081,6 +1156,25 @@ kbase_rb_atom_might_depend(const struct kbase_jd_atom *katom_a, KBASE_KATOM_FLAG_FAIL_BLOCKER))); } +static inline void kbase_gpu_remove_atom(struct kbase_device *kbdev, + struct kbase_jd_atom *katom, + u32 action, + bool disjoint) +{ + struct kbase_context *kctx = katom->kctx; + + lockdep_assert_held(&kbdev->hwaccess_lock); + + katom->event_code = BASE_JD_EVENT_REMOVED_FROM_NEXT; + kbase_gpu_mark_atom_for_return(kbdev, katom); + kbase_jsctx_slot_prio_blocked_set(kctx, katom->slot_nr, + katom->sched_priority); + + if (disjoint) + kbase_job_check_enter_disjoint(kbdev, action, katom->core_req, + katom); +} + /** * kbase_gpu_irq_evict - evict a slot's JSn_HEAD_NEXT atom from the HW if it is * related to a failed JSn_HEAD atom @@ -1109,8 +1203,7 @@ kbase_rb_atom_might_depend(const struct kbase_jd_atom *katom_a, * * Return: true if an atom was evicted, false otherwise. 
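kbase_gpu_irq_evict() and kbase_gpu_complete_hw() below now treat an empty slot ring buffer as a recoverable condition: kbase_gpu_inspect() returns NULL when the requested index is past the occupied entries, and the callers log and bail out rather than dereferencing the result. A small stand-alone sketch of that bounded ring-buffer peek follows; the ring size, field names and stored element type are invented for the example.

#include <stdio.h>

#define SLOT_RB_SIZE 2
#define SLOT_RB_MASK (SLOT_RB_SIZE - 1)

struct slot_rb {
    const char *entries[SLOT_RB_SIZE];
    unsigned char read_idx;
    unsigned char write_idx;
};

/* Peek at the idx-th oldest entry, or NULL if the ring holds fewer entries. */
static const char *rb_inspect(const struct slot_rb *rb, int idx)
{
    if ((rb->read_idx + idx) >= rb->write_idx)
        return NULL;
    return rb->entries[(rb->read_idx + idx) & SLOT_RB_MASK];
}

int main(void)
{
    struct slot_rb rb = { { "atom0", NULL }, 0, 1 };   /* one atom submitted */
    const char *head = rb_inspect(&rb, 0);
    const char *next = rb_inspect(&rb, 1);

    if (!head) {
        /* The driver logs and returns early here rather than crashing. */
        fprintf(stderr, "Can't get a katom from the slot\n");
        return 1;
    }
    printf("head=%s next=%s\n", head, next ? next : "(empty)");
    return 0;
}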
*/ -bool kbase_gpu_irq_evict(struct kbase_device *kbdev, int js, - u32 completion_code) +bool kbase_gpu_irq_evict(struct kbase_device *kbdev, unsigned int js, u32 completion_code) { struct kbase_jd_atom *katom; struct kbase_jd_atom *next_katom; @@ -1118,6 +1211,10 @@ bool kbase_gpu_irq_evict(struct kbase_device *kbdev, int js, lockdep_assert_held(&kbdev->hwaccess_lock); katom = kbase_gpu_inspect(kbdev, js, 0); + if (!katom) { + dev_err(kbdev->dev, "Can't get a katom from js(%u)\n", js); + return false; + } next_katom = kbase_gpu_inspect(kbdev, js, 1); if (next_katom && @@ -1128,9 +1225,9 @@ bool kbase_gpu_irq_evict(struct kbase_device *kbdev, int js, kbase_reg_read(kbdev, JOB_SLOT_REG(js, JS_HEAD_NEXT_HI)) != 0)) { kbase_reg_write(kbdev, JOB_SLOT_REG(js, JS_COMMAND_NEXT), JS_COMMAND_NOP); - next_katom->gpu_rb_state = KBASE_ATOM_GPU_RB_READY; if (completion_code == BASE_JD_EVENT_STOPPED) { + kbase_gpu_remove_atom(kbdev, next_katom, JS_COMMAND_SOFT_STOP, false); KBASE_TLSTREAM_TL_NRET_ATOM_LPU(kbdev, next_katom, &kbdev->gpu_props.props.raw_props.js_features [next_katom->slot_nr]); @@ -1139,10 +1236,12 @@ bool kbase_gpu_irq_evict(struct kbase_device *kbdev, int js, KBASE_TLSTREAM_TL_NRET_CTX_LPU(kbdev, next_katom->kctx, &kbdev->gpu_props.props.raw_props.js_features [next_katom->slot_nr]); - } + } else { + next_katom->gpu_rb_state = KBASE_ATOM_GPU_RB_READY; - if (next_katom->core_req & BASE_JD_REQ_PERMON) - kbase_pm_release_gpu_cycle_counter_nolock(kbdev); + if (next_katom->core_req & BASE_JD_REQ_PERMON) + kbase_pm_release_gpu_cycle_counter_nolock(kbdev); + } /* On evicting the next_katom, the last submission kctx on the * given job slot then reverts back to the one that owns katom. @@ -1181,13 +1280,19 @@ bool kbase_gpu_irq_evict(struct kbase_device *kbdev, int js, * otherwise we would be in the incorrect state of having an atom both running * on the HW and returned to the JS. 
*/ -void kbase_gpu_complete_hw(struct kbase_device *kbdev, int js, - u32 completion_code, - u64 job_tail, - ktime_t *end_timestamp) + +void kbase_gpu_complete_hw(struct kbase_device *kbdev, unsigned int js, u32 completion_code, + u64 job_tail, ktime_t *end_timestamp) { struct kbase_jd_atom *katom = kbase_gpu_inspect(kbdev, js, 0); - struct kbase_context *kctx = katom->kctx; + struct kbase_context *kctx = NULL; + + if (unlikely(!katom)) { + dev_err(kbdev->dev, "Can't get a katom from js(%d)\n", js); + return; + } + + kctx = katom->kctx; dev_dbg(kbdev->dev, "Atom %pK completed on hw with code 0x%x and job_tail 0x%llx (s:%d)\n", @@ -1240,7 +1345,7 @@ void kbase_gpu_complete_hw(struct kbase_device *kbdev, int js, } } else if (completion_code != BASE_JD_EVENT_DONE) { struct kbasep_js_device_data *js_devdata = &kbdev->js_data; - int i; + unsigned int i; if (!kbase_ctx_flag(katom->kctx, KCTX_DYING)) { dev_warn(kbdev->dev, "error detected from slot %d, job status 0x%08x (%s)", @@ -1348,11 +1453,9 @@ void kbase_gpu_complete_hw(struct kbase_device *kbdev, int js, } else { char js_string[16]; - trace_gpu_sched_switch(kbasep_make_job_slot_string(js, - js_string, - sizeof(js_string)), - ktime_to_ns(ktime_get()), 0, 0, - 0); + trace_gpu_sched_switch(kbasep_make_job_slot_string(js, js_string, + sizeof(js_string)), + ktime_to_ns(ktime_get_raw()), 0, 0, 0); } } #endif @@ -1387,7 +1490,7 @@ void kbase_gpu_complete_hw(struct kbase_device *kbdev, int js, void kbase_backend_reset(struct kbase_device *kbdev, ktime_t *end_timestamp) { - int js; + unsigned int js; lockdep_assert_held(&kbdev->hwaccess_lock); @@ -1408,14 +1511,14 @@ void kbase_backend_reset(struct kbase_device *kbdev, ktime_t *end_timestamp) if (katom->protected_state.exit == KBASE_ATOM_EXIT_PROTECTED_RESET_WAIT) { /* protected mode sanity checks */ - KBASE_DEBUG_ASSERT_MSG( - kbase_jd_katom_is_protected(katom) == kbase_gpu_in_protected_mode(kbdev), - "Protected mode of atom (%d) doesn't match protected mode of GPU (%d)", - kbase_jd_katom_is_protected(katom), kbase_gpu_in_protected_mode(kbdev)); - KBASE_DEBUG_ASSERT_MSG( - (kbase_jd_katom_is_protected(katom) && js == 0) || - !kbase_jd_katom_is_protected(katom), - "Protected atom on JS%d not supported", js); + WARN(kbase_jd_katom_is_protected(katom) != + kbase_gpu_in_protected_mode(kbdev), + "Protected mode of atom (%d) doesn't match protected mode of GPU (%d)", + kbase_jd_katom_is_protected(katom), + kbase_gpu_in_protected_mode(kbdev)); + WARN(!(kbase_jd_katom_is_protected(katom) && js == 0) && + kbase_jd_katom_is_protected(katom), + "Protected atom on JS%u not supported", js); } if ((katom->gpu_rb_state < KBASE_ATOM_GPU_RB_SUBMITTED) && !kbase_ctx_flag(katom->kctx, KCTX_DYING)) @@ -1511,10 +1614,8 @@ static bool should_stop_next_atom(struct kbase_device *kbdev, return ret; } -static inline void kbase_gpu_stop_atom(struct kbase_device *kbdev, - int js, - struct kbase_jd_atom *katom, - u32 action) +static inline void kbase_gpu_stop_atom(struct kbase_device *kbdev, unsigned int js, + struct kbase_jd_atom *katom, u32 action) { struct kbase_context *kctx = katom->kctx; u32 hw_action = action & JS_COMMAND_MASK; @@ -1525,25 +1626,6 @@ static inline void kbase_gpu_stop_atom(struct kbase_device *kbdev, kbase_jsctx_slot_prio_blocked_set(kctx, js, katom->sched_priority); } -static inline void kbase_gpu_remove_atom(struct kbase_device *kbdev, - struct kbase_jd_atom *katom, - u32 action, - bool disjoint) -{ - struct kbase_context *kctx = katom->kctx; - - lockdep_assert_held(&kbdev->hwaccess_lock); - - 
katom->event_code = BASE_JD_EVENT_REMOVED_FROM_NEXT; - kbase_gpu_mark_atom_for_return(kbdev, katom); - kbase_jsctx_slot_prio_blocked_set(kctx, katom->slot_nr, - katom->sched_priority); - - if (disjoint) - kbase_job_check_enter_disjoint(kbdev, action, katom->core_req, - katom); -} - static int should_stop_x_dep_slot(struct kbase_jd_atom *katom) { if (katom->x_post_dep) { @@ -1558,11 +1640,8 @@ static int should_stop_x_dep_slot(struct kbase_jd_atom *katom) return -1; } -bool kbase_backend_soft_hard_stop_slot(struct kbase_device *kbdev, - struct kbase_context *kctx, - int js, - struct kbase_jd_atom *katom, - u32 action) +bool kbase_backend_soft_hard_stop_slot(struct kbase_device *kbdev, struct kbase_context *kctx, + unsigned int js, struct kbase_jd_atom *katom, u32 action) { struct kbase_jd_atom *katom_idx0; struct kbase_context *kctx_idx0 = NULL; @@ -1806,18 +1885,16 @@ void kbase_backend_complete_wq_post_sched(struct kbase_device *kbdev, base_jd_core_req core_req) { if (!kbdev->pm.active_count) { - mutex_lock(&kbdev->js_data.runpool_mutex); - mutex_lock(&kbdev->pm.lock); + kbase_pm_lock(kbdev); kbase_pm_update_active(kbdev); - mutex_unlock(&kbdev->pm.lock); - mutex_unlock(&kbdev->js_data.runpool_mutex); + kbase_pm_unlock(kbdev); } } void kbase_gpu_dump_slots(struct kbase_device *kbdev) { unsigned long flags; - int js; + unsigned int js; spin_lock_irqsave(&kbdev->hwaccess_lock, flags); @@ -1832,12 +1909,10 @@ void kbase_gpu_dump_slots(struct kbase_device *kbdev) idx); if (katom) - dev_info(kbdev->dev, - " js%d idx%d : katom=%pK gpu_rb_state=%d\n", - js, idx, katom, katom->gpu_rb_state); + dev_info(kbdev->dev, " js%u idx%d : katom=%pK gpu_rb_state=%d\n", + js, idx, katom, katom->gpu_rb_state); else - dev_info(kbdev->dev, " js%d idx%d : empty\n", - js, idx); + dev_info(kbdev->dev, " js%u idx%d : empty\n", js, idx); } } @@ -1846,7 +1921,7 @@ void kbase_gpu_dump_slots(struct kbase_device *kbdev) void kbase_backend_slot_kctx_purge_locked(struct kbase_device *kbdev, struct kbase_context *kctx) { - int js; + unsigned int js; bool tracked = false; lockdep_assert_held(&kbdev->hwaccess_lock); diff --git a/mali_kbase/backend/gpu/mali_kbase_jm_rb.h b/mali_kbase/backend/gpu/mali_kbase_jm_rb.h index d3ff203..32be0bf 100644 --- a/mali_kbase/backend/gpu/mali_kbase_jm_rb.h +++ b/mali_kbase/backend/gpu/mali_kbase_jm_rb.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2014-2018, 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2014-2018, 2020-2022 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -40,8 +40,7 @@ * * Return: true if job evicted from NEXT registers, false otherwise */ -bool kbase_gpu_irq_evict(struct kbase_device *kbdev, int js, - u32 completion_code); +bool kbase_gpu_irq_evict(struct kbase_device *kbdev, unsigned int js, u32 completion_code); /** * kbase_gpu_complete_hw - Complete an atom on job slot js @@ -53,10 +52,8 @@ bool kbase_gpu_irq_evict(struct kbase_device *kbdev, int js, * completed * @end_timestamp: Time of completion */ -void kbase_gpu_complete_hw(struct kbase_device *kbdev, int js, - u32 completion_code, - u64 job_tail, - ktime_t *end_timestamp); +void kbase_gpu_complete_hw(struct kbase_device *kbdev, unsigned int js, u32 completion_code, + u64 job_tail, ktime_t *end_timestamp); /** * kbase_gpu_inspect - Inspect the contents of the HW access ringbuffer @@ -68,8 +65,7 @@ void kbase_gpu_complete_hw(struct kbase_device *kbdev, int js, * Return: The atom at that position in the ringbuffer * or NULL if no atom present */ -struct kbase_jd_atom *kbase_gpu_inspect(struct kbase_device *kbdev, int js, - int idx); +struct kbase_jd_atom *kbase_gpu_inspect(struct kbase_device *kbdev, unsigned int js, int idx); /** * kbase_gpu_dump_slots - Print the contents of the slot ringbuffers diff --git a/mali_kbase/backend/gpu/mali_kbase_js_backend.c b/mali_kbase/backend/gpu/mali_kbase_js_backend.c index 02d7cdb..ff4e114 100644 --- a/mali_kbase/backend/gpu/mali_kbase_js_backend.c +++ b/mali_kbase/backend/gpu/mali_kbase_js_backend.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2014-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2014-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -28,28 +28,18 @@ #include <mali_kbase_reset_gpu.h> #include <backend/gpu/mali_kbase_jm_internal.h> #include <backend/gpu/mali_kbase_js_internal.h> +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) +#include <mali_kbase_gpu_metrics.h> + +#endif -#if !MALI_USE_CSF /* * Hold the runpool_mutex for this */ -static inline bool timer_callback_should_run(struct kbase_device *kbdev) +static inline bool timer_callback_should_run(struct kbase_device *kbdev, int nr_running_ctxs) { - struct kbase_backend_data *backend = &kbdev->hwaccess.backend; - int nr_running_ctxs; - lockdep_assert_held(&kbdev->js_data.runpool_mutex); - /* Timer must stop if we are suspending */ - if (backend->suspend_timer) - return false; - - /* nr_contexts_pullable is updated with the runpool_mutex. 
However, the - * locking in the caller gives us a barrier that ensures - * nr_contexts_pullable is up-to-date for reading - */ - nr_running_ctxs = atomic_read(&kbdev->js_data.nr_contexts_runnable); - #ifdef CONFIG_MALI_DEBUG if (kbdev->js_data.softstop_always) { /* Debug support for allowing soft-stop on a single context */ @@ -91,7 +81,7 @@ static enum hrtimer_restart timer_callback(struct hrtimer *timer) struct kbase_device *kbdev; struct kbasep_js_device_data *js_devdata; struct kbase_backend_data *backend; - int s; + unsigned int s; bool reset_needed = false; KBASE_DEBUG_ASSERT(timer != NULL); @@ -273,18 +263,20 @@ static enum hrtimer_restart timer_callback(struct hrtimer *timer) return HRTIMER_NORESTART; } -#endif /* !MALI_USE_CSF */ void kbase_backend_ctx_count_changed(struct kbase_device *kbdev) { -#if !MALI_USE_CSF struct kbasep_js_device_data *js_devdata = &kbdev->js_data; struct kbase_backend_data *backend = &kbdev->hwaccess.backend; unsigned long flags; + /* Timer must stop if we are suspending */ + const bool suspend_timer = backend->suspend_timer; + const int nr_running_ctxs = + atomic_read(&kbdev->js_data.nr_contexts_runnable); lockdep_assert_held(&js_devdata->runpool_mutex); - if (!timer_callback_should_run(kbdev)) { + if (suspend_timer || !timer_callback_should_run(kbdev, nr_running_ctxs)) { /* Take spinlock to force synchronisation with timer */ spin_lock_irqsave(&kbdev->hwaccess_lock, flags); backend->timer_running = false; @@ -298,7 +290,8 @@ void kbase_backend_ctx_count_changed(struct kbase_device *kbdev) hrtimer_cancel(&backend->scheduling_timer); } - if (timer_callback_should_run(kbdev) && !backend->timer_running) { + if (!suspend_timer && timer_callback_should_run(kbdev, nr_running_ctxs) && + !backend->timer_running) { /* Take spinlock to force synchronisation with timer */ spin_lock_irqsave(&kbdev->hwaccess_lock, flags); backend->timer_running = true; @@ -309,36 +302,59 @@ void kbase_backend_ctx_count_changed(struct kbase_device *kbdev) KBASE_KTRACE_ADD_JM(kbdev, JS_POLICY_TIMER_START, NULL, NULL, 0u, 0u); } -#else /* !MALI_USE_CSF */ - CSTD_UNUSED(kbdev); -#endif /* !MALI_USE_CSF */ + +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) + if (unlikely(suspend_timer)) { + js_devdata->gpu_metrics_timer_needed = false; + /* Cancel the timer as System suspend is happening */ + hrtimer_cancel(&js_devdata->gpu_metrics_timer); + js_devdata->gpu_metrics_timer_running = false; + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); + /* Explicitly emit the tracepoint on System suspend */ + kbase_gpu_metrics_emit_tracepoint(kbdev, ktime_get_raw_ns()); + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + return; + } + + if (!nr_running_ctxs) { + /* Just set the flag to not restart the timer on expiry */ + js_devdata->gpu_metrics_timer_needed = false; + return; + } + + /* There are runnable contexts so the timer is needed */ + if (!js_devdata->gpu_metrics_timer_needed) { + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); + js_devdata->gpu_metrics_timer_needed = true; + /* No need to restart the timer if it is already running. 
*/ + if (!js_devdata->gpu_metrics_timer_running) { + hrtimer_start(&js_devdata->gpu_metrics_timer, + HR_TIMER_DELAY_NSEC(kbase_gpu_metrics_get_emit_interval()), + HRTIMER_MODE_REL); + js_devdata->gpu_metrics_timer_running = true; + } + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + } +#endif } int kbase_backend_timer_init(struct kbase_device *kbdev) { -#if !MALI_USE_CSF struct kbase_backend_data *backend = &kbdev->hwaccess.backend; hrtimer_init(&backend->scheduling_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL); backend->scheduling_timer.function = timer_callback; backend->timer_running = false; -#else /* !MALI_USE_CSF */ - CSTD_UNUSED(kbdev); -#endif /* !MALI_USE_CSF */ return 0; } void kbase_backend_timer_term(struct kbase_device *kbdev) { -#if !MALI_USE_CSF struct kbase_backend_data *backend = &kbdev->hwaccess.backend; hrtimer_cancel(&backend->scheduling_timer); -#else /* !MALI_USE_CSF */ - CSTD_UNUSED(kbdev); -#endif /* !MALI_USE_CSF */ } void kbase_backend_timer_suspend(struct kbase_device *kbdev) @@ -365,4 +381,3 @@ void kbase_backend_timeouts_changed(struct kbase_device *kbdev) backend->timeouts_updated = true; } - diff --git a/mali_kbase/backend/gpu/mali_kbase_l2_mmu_config.c b/mali_kbase/backend/gpu/mali_kbase_l2_mmu_config.c index 9ce5075..6eedc00 100644 --- a/mali_kbase/backend/gpu/mali_kbase_l2_mmu_config.c +++ b/mali_kbase/backend/gpu/mali_kbase_l2_mmu_config.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -19,8 +19,9 @@ * */ +#include <linux/version_compat_defs.h> + #include <mali_kbase.h> -#include <mali_kbase_bits.h> #include <mali_kbase_config_defaults.h> #include <device/mali_kbase_device.h> #include "mali_kbase_l2_mmu_config.h" diff --git a/mali_kbase/backend/gpu/mali_kbase_model_dummy.c b/mali_kbase/backend/gpu/mali_kbase_model_dummy.c index 603ffcf..46bcdc7 100644 --- a/mali_kbase/backend/gpu/mali_kbase_model_dummy.c +++ b/mali_kbase/backend/gpu/mali_kbase_model_dummy.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2018-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2014-2023 ARM Limited. All rights reserved. 
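The kbase_backend_ctx_count_changed() changes above (mali_kbase_js_backend.c) arm the gpu_metrics hrtimer only while runnable contexts exist, and cancel it outright when the backend is suspending. Below is a minimal, self-contained sketch of that conditional start/cancel pattern using only standard hrtimer and spinlock APIs; the names (struct metrics_state, EMIT_PERIOD_NS, metrics_timer_cb) are hypothetical and this is not the driver's actual code.

#include <linux/hrtimer.h>
#include <linux/ktime.h>
#include <linux/spinlock.h>

#define EMIT_PERIOD_NS (8 * NSEC_PER_MSEC)	/* hypothetical emission period */

struct metrics_state {
	spinlock_t lock;
	struct hrtimer timer;
	bool timer_needed;	/* set while periodic emission is wanted */
	bool timer_running;	/* timer started and not yet cancelled */
};

static enum hrtimer_restart metrics_timer_cb(struct hrtimer *t)
{
	struct metrics_state *ms = container_of(t, struct metrics_state, timer);

	/* Stop re-arming once nobody needs periodic emission any more. */
	if (!READ_ONCE(ms->timer_needed)) {
		WRITE_ONCE(ms->timer_running, false);
		return HRTIMER_NORESTART;
	}
	hrtimer_forward_now(t, ns_to_ktime(EMIT_PERIOD_NS));
	return HRTIMER_RESTART;
}

static void metrics_init(struct metrics_state *ms)
{
	spin_lock_init(&ms->lock);
	hrtimer_init(&ms->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
	ms->timer.function = metrics_timer_cb;
	ms->timer_needed = false;
	ms->timer_running = false;
}

static void metrics_ctx_count_changed(struct metrics_state *ms, bool suspending,
				      int nr_runnable)
{
	unsigned long flags;

	if (suspending) {
		/* Synchronously stop the timer across system suspend. */
		WRITE_ONCE(ms->timer_needed, false);
		hrtimer_cancel(&ms->timer);
		ms->timer_running = false;
		return;
	}

	if (!nr_runnable) {
		/* Let the callback retire itself on its next expiry. */
		WRITE_ONCE(ms->timer_needed, false);
		return;
	}

	spin_lock_irqsave(&ms->lock, flags);
	ms->timer_needed = true;
	if (!ms->timer_running) {
		hrtimer_start(&ms->timer, ns_to_ktime(EMIT_PERIOD_NS), HRTIMER_MODE_REL);
		ms->timer_running = true;
	}
	spin_unlock_irqrestore(&ms->lock, flags);
}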
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -62,8 +62,9 @@ * document */ #include <mali_kbase.h> +#include <device/mali_kbase_device.h> #include <gpu/mali_kbase_gpu_regmap.h> -#include <backend/gpu/mali_kbase_model_dummy.h> +#include <backend/gpu/mali_kbase_model_linux.h> #include <mali_kbase_mem_linux.h> #if MALI_USE_CSF @@ -80,71 +81,23 @@ static bool ipa_control_timer_enabled; #endif #define LO_MASK(M) ((M) & 0xFFFFFFFF) - -static u32 get_implementation_register(u32 reg) -{ - switch (reg) { - case GPU_CONTROL_REG(SHADER_PRESENT_LO): - return LO_MASK(DUMMY_IMPLEMENTATION_SHADER_PRESENT); - case GPU_CONTROL_REG(TILER_PRESENT_LO): - return LO_MASK(DUMMY_IMPLEMENTATION_TILER_PRESENT); - case GPU_CONTROL_REG(L2_PRESENT_LO): - return LO_MASK(DUMMY_IMPLEMENTATION_L2_PRESENT); - case GPU_CONTROL_REG(STACK_PRESENT_LO): - return LO_MASK(DUMMY_IMPLEMENTATION_STACK_PRESENT); - - case GPU_CONTROL_REG(SHADER_PRESENT_HI): - case GPU_CONTROL_REG(TILER_PRESENT_HI): - case GPU_CONTROL_REG(L2_PRESENT_HI): - case GPU_CONTROL_REG(STACK_PRESENT_HI): - /* *** FALLTHROUGH *** */ - default: - return 0; - } -} - -struct { - unsigned long prfcnt_base; - u32 *prfcnt_base_cpu; - struct kbase_device *kbdev; - struct tagged_addr *pages; - size_t page_count; - - u32 time; - - struct { - u32 jm; - u32 tiler; - u32 l2; - u32 shader; - } prfcnt_en; - - u64 l2_present; - u64 shader_present; - #if !MALI_USE_CSF - u64 jm_counters[KBASE_DUMMY_MODEL_COUNTER_PER_CORE]; -#else - u64 cshw_counters[KBASE_DUMMY_MODEL_COUNTER_PER_CORE]; -#endif /* !MALI_USE_CSF */ - u64 tiler_counters[KBASE_DUMMY_MODEL_COUNTER_PER_CORE]; - u64 l2_counters[KBASE_DUMMY_MODEL_MAX_MEMSYS_BLOCKS * - KBASE_DUMMY_MODEL_COUNTER_PER_CORE]; - u64 shader_counters[KBASE_DUMMY_MODEL_MAX_SHADER_CORES * - KBASE_DUMMY_MODEL_COUNTER_PER_CORE]; +#define HI_MASK(M) ((M) & 0xFFFFFFFF00000000) +#endif -} performance_counters = { - .l2_present = DUMMY_IMPLEMENTATION_L2_PRESENT, - .shader_present = DUMMY_IMPLEMENTATION_SHADER_PRESENT, -}; +/* Construct a value for the THREAD_FEATURES register, *except* the two most + * significant bits, which are set to IMPLEMENTATION_MODEL in + * midgard_model_read_reg(). 
+ */ +#if MALI_USE_CSF +#define THREAD_FEATURES_PARTIAL(MAX_REGISTERS, MAX_TASK_QUEUE, MAX_TG_SPLIT) \ + ((MAX_REGISTERS) | ((MAX_TASK_QUEUE) << 24)) +#else +#define THREAD_FEATURES_PARTIAL(MAX_REGISTERS, MAX_TASK_QUEUE, MAX_TG_SPLIT) \ + ((MAX_REGISTERS) | ((MAX_TASK_QUEUE) << 16) | ((MAX_TG_SPLIT) << 24)) +#endif -struct job_slot { - int job_active; - int job_queued; - int job_complete_irq_asserted; - int job_irq_mask; - int job_disabled; -}; +struct error_status_t hw_error_status; /** * struct control_reg_values_t - control register values specific to the GPU being 'emulated' @@ -162,6 +115,9 @@ struct job_slot { * @mmu_features: MMU features * @gpu_features_lo: GPU features (low) * @gpu_features_hi: GPU features (high) + * @shader_present: Available shader bitmap + * @stack_present: Core stack present bitmap + * */ struct control_reg_values_t { const char *name; @@ -176,16 +132,32 @@ struct control_reg_values_t { u32 mmu_features; u32 gpu_features_lo; u32 gpu_features_hi; + u32 shader_present; + u32 stack_present; +}; + +struct job_slot { + int job_active; + int job_queued; + int job_complete_irq_asserted; + int job_irq_mask; + int job_disabled; }; struct dummy_model_t { int reset_completed; int reset_completed_mask; +#if !MALI_USE_CSF int prfcnt_sample_completed; +#endif /* !MALI_USE_CSF */ int power_changed_mask; /* 2bits: _ALL,_SINGLE */ int power_changed; /* 1bit */ bool clean_caches_completed; bool clean_caches_completed_irq_enabled; +#if MALI_USE_CSF + bool flush_pa_range_completed; + bool flush_pa_range_completed_irq_enabled; +#endif int power_on; /* 6bits: SHADER[4],TILER,L2 */ u32 stack_power_on_lo; u32 coherency_enable; @@ -196,45 +168,6 @@ struct dummy_model_t { void *data; }; -void gpu_device_set_data(void *model, void *data) -{ - struct dummy_model_t *dummy = (struct dummy_model_t *)model; - - dummy->data = data; -} - -void *gpu_device_get_data(void *model) -{ - struct dummy_model_t *dummy = (struct dummy_model_t *)model; - - return dummy->data; -} - -#define signal_int(m, s) m->slots[(s)].job_complete_irq_asserted = 1 - -/* SCons should pass in a default GPU, but other ways of building (e.g. - * in-tree) won't, so define one here in case. - */ -#ifndef CONFIG_MALI_NO_MALI_DEFAULT_GPU -#define CONFIG_MALI_NO_MALI_DEFAULT_GPU "tMIx" -#endif - -static char *no_mali_gpu = CONFIG_MALI_NO_MALI_DEFAULT_GPU; -module_param(no_mali_gpu, charp, 0000); -MODULE_PARM_DESC(no_mali_gpu, "GPU to identify as"); - -/* Construct a value for the THREAD_FEATURES register, *except* the two most - * significant bits, which are set to IMPLEMENTATION_MODEL in - * midgard_model_read_reg(). - */ -#if MALI_USE_CSF -#define THREAD_FEATURES_PARTIAL(MAX_REGISTERS, MAX_TASK_QUEUE, MAX_TG_SPLIT) \ - ((MAX_REGISTERS) | ((MAX_TASK_QUEUE) << 24)) -#else -#define THREAD_FEATURES_PARTIAL(MAX_REGISTERS, MAX_TASK_QUEUE, MAX_TG_SPLIT) \ - ((MAX_REGISTERS) | ((MAX_TASK_QUEUE) << 16) | ((MAX_TG_SPLIT) << 24)) -#endif - /* Array associating GPU names with control register values. The first * one is used in the case of no match. 
*/ @@ -251,6 +184,8 @@ static const struct control_reg_values_t all_control_reg_values[] = { .mmu_features = 0x2830, .gpu_features_lo = 0, .gpu_features_hi = 0, + .shader_present = DUMMY_IMPLEMENTATION_SHADER_PRESENT, + .stack_present = DUMMY_IMPLEMENTATION_STACK_PRESENT, }, { .name = "tHEx", @@ -264,6 +199,8 @@ static const struct control_reg_values_t all_control_reg_values[] = { .mmu_features = 0x2830, .gpu_features_lo = 0, .gpu_features_hi = 0, + .shader_present = DUMMY_IMPLEMENTATION_SHADER_PRESENT, + .stack_present = DUMMY_IMPLEMENTATION_STACK_PRESENT, }, { .name = "tSIx", @@ -277,6 +214,8 @@ static const struct control_reg_values_t all_control_reg_values[] = { .mmu_features = 0x2821, .gpu_features_lo = 0, .gpu_features_hi = 0, + .shader_present = DUMMY_IMPLEMENTATION_SHADER_PRESENT, + .stack_present = DUMMY_IMPLEMENTATION_STACK_PRESENT, }, { .name = "tDVx", @@ -290,6 +229,8 @@ static const struct control_reg_values_t all_control_reg_values[] = { .mmu_features = 0x2821, .gpu_features_lo = 0, .gpu_features_hi = 0, + .shader_present = DUMMY_IMPLEMENTATION_SHADER_PRESENT, + .stack_present = DUMMY_IMPLEMENTATION_STACK_PRESENT, }, { .name = "tNOx", @@ -303,6 +244,8 @@ static const struct control_reg_values_t all_control_reg_values[] = { .mmu_features = 0x2830, .gpu_features_lo = 0, .gpu_features_hi = 0, + .shader_present = DUMMY_IMPLEMENTATION_SHADER_PRESENT, + .stack_present = DUMMY_IMPLEMENTATION_STACK_PRESENT, }, { .name = "tGOx_r0p0", @@ -316,6 +259,8 @@ static const struct control_reg_values_t all_control_reg_values[] = { .mmu_features = 0x2830, .gpu_features_lo = 0, .gpu_features_hi = 0, + .shader_present = DUMMY_IMPLEMENTATION_SHADER_PRESENT, + .stack_present = DUMMY_IMPLEMENTATION_STACK_PRESENT, }, { .name = "tGOx_r1p0", @@ -330,6 +275,8 @@ static const struct control_reg_values_t all_control_reg_values[] = { .mmu_features = 0x2823, .gpu_features_lo = 0, .gpu_features_hi = 0, + .shader_present = DUMMY_IMPLEMENTATION_SHADER_PRESENT, + .stack_present = DUMMY_IMPLEMENTATION_STACK_PRESENT, }, { .name = "tTRx", @@ -343,6 +290,8 @@ static const struct control_reg_values_t all_control_reg_values[] = { .mmu_features = 0x2830, .gpu_features_lo = 0, .gpu_features_hi = 0, + .shader_present = DUMMY_IMPLEMENTATION_SHADER_PRESENT, + .stack_present = DUMMY_IMPLEMENTATION_STACK_PRESENT, }, { .name = "tNAx", @@ -356,6 +305,8 @@ static const struct control_reg_values_t all_control_reg_values[] = { .mmu_features = 0x2830, .gpu_features_lo = 0, .gpu_features_hi = 0, + .shader_present = DUMMY_IMPLEMENTATION_SHADER_PRESENT, + .stack_present = DUMMY_IMPLEMENTATION_STACK_PRESENT, }, { .name = "tBEx", @@ -369,6 +320,8 @@ static const struct control_reg_values_t all_control_reg_values[] = { .mmu_features = 0x2830, .gpu_features_lo = 0, .gpu_features_hi = 0, + .shader_present = DUMMY_IMPLEMENTATION_SHADER_PRESENT_TBEX, + .stack_present = DUMMY_IMPLEMENTATION_STACK_PRESENT, }, { .name = "tBAx", @@ -382,19 +335,8 @@ static const struct control_reg_values_t all_control_reg_values[] = { .mmu_features = 0x2830, .gpu_features_lo = 0, .gpu_features_hi = 0, - }, - { - .name = "tDUx", - .gpu_id = GPU_ID2_MAKE(10, 2, 0, 1, 0, 0, 0), - .as_present = 0xFF, - .thread_max_threads = 0x180, - .thread_max_workgroup_size = 0x180, - .thread_max_barrier_size = 0x180, - .thread_features = THREAD_FEATURES_PARTIAL(0x6000, 4, 0), - .tiler_features = 0x809, - .mmu_features = 0x2830, - .gpu_features_lo = 0, - .gpu_features_hi = 0, + .shader_present = DUMMY_IMPLEMENTATION_SHADER_PRESENT, + .stack_present = 
DUMMY_IMPLEMENTATION_STACK_PRESENT, }, { .name = "tODx", @@ -408,6 +350,8 @@ static const struct control_reg_values_t all_control_reg_values[] = { .mmu_features = 0x2830, .gpu_features_lo = 0, .gpu_features_hi = 0, + .shader_present = DUMMY_IMPLEMENTATION_SHADER_PRESENT_TODX, + .stack_present = DUMMY_IMPLEMENTATION_STACK_PRESENT, }, { .name = "tGRx", @@ -422,6 +366,8 @@ static const struct control_reg_values_t all_control_reg_values[] = { .mmu_features = 0x2830, .gpu_features_lo = 0, .gpu_features_hi = 0, + .shader_present = DUMMY_IMPLEMENTATION_SHADER_PRESENT, + .stack_present = DUMMY_IMPLEMENTATION_STACK_PRESENT, }, { .name = "tVAx", @@ -436,6 +382,8 @@ static const struct control_reg_values_t all_control_reg_values[] = { .mmu_features = 0x2830, .gpu_features_lo = 0, .gpu_features_hi = 0, + .shader_present = DUMMY_IMPLEMENTATION_SHADER_PRESENT, + .stack_present = DUMMY_IMPLEMENTATION_STACK_PRESENT, }, { .name = "tTUx", @@ -450,10 +398,95 @@ static const struct control_reg_values_t all_control_reg_values[] = { .mmu_features = 0x2830, .gpu_features_lo = 0xf, .gpu_features_hi = 0, + .shader_present = DUMMY_IMPLEMENTATION_SHADER_PRESENT_TTUX, + .stack_present = 0xF, + }, + { + .name = "tTIx", + .gpu_id = GPU_ID2_MAKE(12, 8, 1, 0, 0, 0, 0), + .as_present = 0xFF, + .thread_max_threads = 0x800, + .thread_max_workgroup_size = 0x400, + .thread_max_barrier_size = 0x400, + .thread_features = THREAD_FEATURES_PARTIAL(0x10000, 16, 0), + .core_features = 0x1, /* core_1e64fma4tex */ + .tiler_features = 0x809, + .mmu_features = 0x2830, + .gpu_features_lo = 0xf, + .gpu_features_hi = 0, + .shader_present = DUMMY_IMPLEMENTATION_SHADER_PRESENT_TTIX, + .stack_present = 0xF, }, }; -struct error_status_t hw_error_status; +static struct { + spinlock_t access_lock; +#if !MALI_USE_CSF + unsigned long prfcnt_base; +#endif /* !MALI_USE_CSF */ + u32 *prfcnt_base_cpu; + + u32 time; + + struct gpu_model_prfcnt_en prfcnt_en; + + u64 l2_present; + u64 shader_present; + +#if !MALI_USE_CSF + u64 jm_counters[KBASE_DUMMY_MODEL_COUNTER_PER_CORE]; +#else + u64 cshw_counters[KBASE_DUMMY_MODEL_COUNTER_PER_CORE]; +#endif /* !MALI_USE_CSF */ + u64 tiler_counters[KBASE_DUMMY_MODEL_COUNTER_PER_CORE]; + u64 l2_counters[KBASE_DUMMY_MODEL_MAX_MEMSYS_BLOCKS * + KBASE_DUMMY_MODEL_COUNTER_PER_CORE]; + u64 shader_counters[KBASE_DUMMY_MODEL_MAX_SHADER_CORES * + KBASE_DUMMY_MODEL_COUNTER_PER_CORE]; +} performance_counters; + +static u32 get_implementation_register(u32 reg, + const struct control_reg_values_t *const control_reg_values) +{ + switch (reg) { + case GPU_CONTROL_REG(SHADER_PRESENT_LO): + return LO_MASK(control_reg_values->shader_present); + case GPU_CONTROL_REG(TILER_PRESENT_LO): + return LO_MASK(DUMMY_IMPLEMENTATION_TILER_PRESENT); + case GPU_CONTROL_REG(L2_PRESENT_LO): + return LO_MASK(DUMMY_IMPLEMENTATION_L2_PRESENT); + case GPU_CONTROL_REG(STACK_PRESENT_LO): + return LO_MASK(control_reg_values->stack_present); + + case GPU_CONTROL_REG(SHADER_PRESENT_HI): + case GPU_CONTROL_REG(TILER_PRESENT_HI): + case GPU_CONTROL_REG(L2_PRESENT_HI): + case GPU_CONTROL_REG(STACK_PRESENT_HI): + /* *** FALLTHROUGH *** */ + default: + return 0; + } +} + +void gpu_device_set_data(void *model, void *data) +{ + struct dummy_model_t *dummy = (struct dummy_model_t *)model; + + dummy->data = data; +} + +void *gpu_device_get_data(void *model) +{ + struct dummy_model_t *dummy = (struct dummy_model_t *)model; + + return dummy->data; +} + +#define signal_int(m, s) m->slots[(s)].job_complete_irq_asserted = 1 + +static char *no_mali_gpu = 
CONFIG_MALI_NO_MALI_DEFAULT_GPU; +module_param(no_mali_gpu, charp, 0000); +MODULE_PARM_DESC(no_mali_gpu, "GPU to identify as"); #if MALI_USE_CSF static u32 gpu_model_get_prfcnt_value(enum kbase_ipa_core_type core_type, @@ -464,6 +497,7 @@ static u32 gpu_model_get_prfcnt_value(enum kbase_ipa_core_type core_type, u32 event_index; u64 value = 0; u32 core; + unsigned long flags; if (WARN_ON(core_type >= KBASE_IPA_CORE_TYPE_NUM)) return 0; @@ -475,17 +509,20 @@ static u32 gpu_model_get_prfcnt_value(enum kbase_ipa_core_type core_type, (ipa_ctl_select_config[core_type] >> (cnt_idx * 8)) & 0xFF; /* Currently only primary counter blocks are supported */ - if (WARN_ON(event_index >= 64)) + if (WARN_ON(event_index >= + (KBASE_DUMMY_MODEL_COUNTER_HEADER_DWORDS + KBASE_DUMMY_MODEL_COUNTER_PER_CORE))) return 0; /* The actual events start index 4 onwards. Spec also says PRFCNT_EN, * TIMESTAMP_LO or TIMESTAMP_HI pseudo-counters do not make sense for * IPA counters. If selected, the value returned for them will be zero. */ - if (WARN_ON(event_index <= 3)) + if (WARN_ON(event_index < KBASE_DUMMY_MODEL_COUNTER_HEADER_DWORDS)) return 0; - event_index -= 4; + event_index -= KBASE_DUMMY_MODEL_COUNTER_HEADER_DWORDS; + + spin_lock_irqsave(&performance_counters.access_lock, flags); switch (core_type) { case KBASE_IPA_CORE_TYPE_CSHW: @@ -514,28 +551,46 @@ static u32 gpu_model_get_prfcnt_value(enum kbase_ipa_core_type core_type, event_index += KBASE_DUMMY_MODEL_COUNTER_PER_CORE; } + spin_unlock_irqrestore(&performance_counters.access_lock, flags); + if (is_low_word) return (value & U32_MAX); else return (value >> 32); } +#endif /* MALI_USE_CSF */ -void gpu_model_clear_prfcnt_values(void) +/** + * gpu_model_clear_prfcnt_values_nolock - Clear performance counter values + * + * Sets all performance counter values to zero. The performance counter access + * lock must be held when calling this function. 
+ */ +static void gpu_model_clear_prfcnt_values_nolock(void) { - memset(performance_counters.cshw_counters, 0, - sizeof(performance_counters.cshw_counters)); - - memset(performance_counters.tiler_counters, 0, - sizeof(performance_counters.tiler_counters)); - - memset(performance_counters.l2_counters, 0, - sizeof(performance_counters.l2_counters)); - + lockdep_assert_held(&performance_counters.access_lock); +#if !MALI_USE_CSF + memset(performance_counters.jm_counters, 0, sizeof(performance_counters.jm_counters)); +#else + memset(performance_counters.cshw_counters, 0, sizeof(performance_counters.cshw_counters)); +#endif /* !MALI_USE_CSF */ + memset(performance_counters.tiler_counters, 0, sizeof(performance_counters.tiler_counters)); + memset(performance_counters.l2_counters, 0, sizeof(performance_counters.l2_counters)); memset(performance_counters.shader_counters, 0, sizeof(performance_counters.shader_counters)); } + +#if MALI_USE_CSF +void gpu_model_clear_prfcnt_values(void) +{ + unsigned long flags; + + spin_lock_irqsave(&performance_counters.access_lock, flags); + gpu_model_clear_prfcnt_values_nolock(); + spin_unlock_irqrestore(&performance_counters.access_lock, flags); +} KBASE_EXPORT_TEST_API(gpu_model_clear_prfcnt_values); -#endif +#endif /* MALI_USE_CSF */ /** * gpu_model_dump_prfcnt_blocks() - Dump performance counter values to buffer @@ -545,17 +600,20 @@ KBASE_EXPORT_TEST_API(gpu_model_clear_prfcnt_values); * @block_count: Number of blocks to dump * @prfcnt_enable_mask: Counter enable mask * @blocks_present: Available blocks bit mask + * + * The performance counter access lock must be held before calling this + * function. */ -static void gpu_model_dump_prfcnt_blocks(u64 *values, u32 *out_index, - u32 block_count, - u32 prfcnt_enable_mask, - u64 blocks_present) +static void gpu_model_dump_prfcnt_blocks(u64 *values, u32 *out_index, u32 block_count, + u32 prfcnt_enable_mask, u64 blocks_present) { u32 block_idx, counter; u32 counter_value = 0; u32 *prfcnt_base; u32 index = 0; + lockdep_assert_held(&performance_counters.access_lock); + prfcnt_base = performance_counters.prfcnt_base_cpu; for (block_idx = 0; block_idx < block_count; block_idx++) { @@ -594,35 +652,18 @@ static void gpu_model_dump_prfcnt_blocks(u64 *values, u32 *out_index, } } -/** - * gpu_model_sync_dummy_prfcnt() - Synchronize dumped performance counter values - * - * Used to ensure counter values are not lost if cache invalidation is performed - * prior to reading. 
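The dummy model's counter arrays above are now guarded by performance_counters.access_lock, split into a *_nolock worker that asserts the lock via lockdep and a thin public wrapper that takes it. A minimal sketch of that locked-wrapper/_nolock idiom follows; the names (counter_bank, bank_lock) are hypothetical and it is illustrative only, not the model's actual code.

#include <linux/lockdep.h>
#include <linux/spinlock.h>
#include <linux/string.h>
#include <linux/types.h>

static DEFINE_SPINLOCK(bank_lock);
static u64 counter_bank[64];

/* Callers must already hold bank_lock; lockdep verifies this on debug kernels. */
static void counter_bank_clear_nolock(void)
{
	lockdep_assert_held(&bank_lock);
	memset(counter_bank, 0, sizeof(counter_bank));
}

/* Public entry point: takes the lock, then defers to the _nolock worker, so
 * paths that already hold the lock can call the worker directly instead.
 */
void counter_bank_clear(void)
{
	unsigned long flags;

	spin_lock_irqsave(&bank_lock, flags);
	counter_bank_clear_nolock();
	spin_unlock_irqrestore(&bank_lock, flags);
}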
- */ -static void gpu_model_sync_dummy_prfcnt(void) -{ - int i; - struct page *pg; - - for (i = 0; i < performance_counters.page_count; i++) { - pg = as_page(performance_counters.pages[i]); - kbase_sync_single_for_device(performance_counters.kbdev, - kbase_dma_addr(pg), PAGE_SIZE, - DMA_BIDIRECTIONAL); - } -} - -static void midgard_model_dump_prfcnt(void) +static void gpu_model_dump_nolock(void) { u32 index = 0; + lockdep_assert_held(&performance_counters.access_lock); + #if !MALI_USE_CSF - gpu_model_dump_prfcnt_blocks(performance_counters.jm_counters, &index, - 1, 0xffffffff, 0x1); + gpu_model_dump_prfcnt_blocks(performance_counters.jm_counters, &index, 1, + performance_counters.prfcnt_en.fe, 0x1); #else - gpu_model_dump_prfcnt_blocks(performance_counters.cshw_counters, &index, - 1, 0xffffffff, 0x1); + gpu_model_dump_prfcnt_blocks(performance_counters.cshw_counters, &index, 1, + performance_counters.prfcnt_en.fe, 0x1); #endif /* !MALI_USE_CSF */ gpu_model_dump_prfcnt_blocks(performance_counters.tiler_counters, &index, 1, @@ -637,12 +678,48 @@ static void midgard_model_dump_prfcnt(void) performance_counters.prfcnt_en.shader, performance_counters.shader_present); - gpu_model_sync_dummy_prfcnt(); + /* Counter values are cleared after each dump */ + gpu_model_clear_prfcnt_values_nolock(); /* simulate a 'long' time between samples */ performance_counters.time += 10; } +#if !MALI_USE_CSF +static void midgard_model_dump_prfcnt(void) +{ + unsigned long flags; + + spin_lock_irqsave(&performance_counters.access_lock, flags); + gpu_model_dump_nolock(); + spin_unlock_irqrestore(&performance_counters.access_lock, flags); +} +#else +void gpu_model_prfcnt_dump_request(u32 *sample_buf, struct gpu_model_prfcnt_en enable_maps) +{ + unsigned long flags; + + if (WARN_ON(!sample_buf)) + return; + + spin_lock_irqsave(&performance_counters.access_lock, flags); + performance_counters.prfcnt_base_cpu = sample_buf; + performance_counters.prfcnt_en = enable_maps; + gpu_model_dump_nolock(); + spin_unlock_irqrestore(&performance_counters.access_lock, flags); +} + +void gpu_model_glb_request_job_irq(void *model) +{ + unsigned long flags; + + spin_lock_irqsave(&hw_error_status.access_lock, flags); + hw_error_status.job_irq_status |= JOB_IRQ_GLOBAL_IF; + spin_unlock_irqrestore(&hw_error_status.access_lock, flags); + gpu_device_raise_irq(model, MODEL_LINUX_JOB_IRQ); +} +#endif /* !MALI_USE_CSF */ + static void init_register_statuses(struct dummy_model_t *dummy) { int i; @@ -671,8 +748,10 @@ static void init_register_statuses(struct dummy_model_t *dummy) performance_counters.time = 0; } -static void update_register_statuses(struct dummy_model_t *dummy, int job_slot) +static void update_register_statuses(struct dummy_model_t *dummy, unsigned int job_slot) { + lockdep_assert_held(&hw_error_status.access_lock); + if (hw_error_status.errors_mask & IS_A_JOB_ERROR) { if (job_slot == hw_error_status.current_job_slot) { #if !MALI_USE_CSF @@ -922,6 +1001,7 @@ static void update_job_irq_js_state(struct dummy_model_t *dummy, int mask) { int i; + lockdep_assert_held(&hw_error_status.access_lock); pr_debug("%s", "Updating the JS_ACTIVE register"); for (i = 0; i < NUM_SLOTS; i++) { @@ -967,6 +1047,21 @@ static const struct control_reg_values_t *find_control_reg_values(const char *gp size_t i; const struct control_reg_values_t *ret = NULL; + /* Edge case for tGOx, as it has 2 entries in the table for its R0 and R1 + * revisions respectively. As none of them are named "tGOx" the name comparison + * needs to be fixed in these cases. 
CONFIG_GPU_HWVER should be one of "r0p0" + * or "r1p0" and is derived from the DDK's build configuration. In cases + * where it is unavailable, it defaults to tGOx r1p0. + */ + if (!strcmp(gpu, "tGOx")) { +#ifdef CONFIG_GPU_HWVER + if (!strcmp(CONFIG_GPU_HWVER, "r0p0")) + gpu = "tGOx_r0p0"; + else if (!strcmp(CONFIG_GPU_HWVER, "r1p0")) +#endif /* CONFIG_GPU_HWVER defined */ + gpu = "tGOx_r1p0"; + } + for (i = 0; i < ARRAY_SIZE(all_control_reg_values); ++i) { const struct control_reg_values_t * const fcrv = &all_control_reg_values[i]; @@ -986,17 +1081,29 @@ static const struct control_reg_values_t *find_control_reg_values(const char *gp return ret; } -void *midgard_model_create(const void *config) +void *midgard_model_create(struct kbase_device *kbdev) { struct dummy_model_t *dummy = NULL; + spin_lock_init(&hw_error_status.access_lock); + spin_lock_init(&performance_counters.access_lock); + dummy = kzalloc(sizeof(*dummy), GFP_KERNEL); if (dummy) { dummy->job_irq_js_state = 0; init_register_statuses(dummy); dummy->control_reg_values = find_control_reg_values(no_mali_gpu); + performance_counters.l2_present = get_implementation_register( + GPU_CONTROL_REG(L2_PRESENT_LO), dummy->control_reg_values); + performance_counters.shader_present = get_implementation_register( + GPU_CONTROL_REG(SHADER_PRESENT_LO), dummy->control_reg_values); + + gpu_device_set_data(dummy, kbdev); + + dev_info(kbdev->dev, "Using Dummy Model"); } + return dummy; } @@ -1009,18 +1116,24 @@ static void midgard_model_get_outputs(void *h) { struct dummy_model_t *dummy = (struct dummy_model_t *)h; + lockdep_assert_held(&hw_error_status.access_lock); + if (hw_error_status.job_irq_status) - gpu_device_raise_irq(dummy, GPU_DUMMY_JOB_IRQ); + gpu_device_raise_irq(dummy, MODEL_LINUX_JOB_IRQ); if ((dummy->power_changed && dummy->power_changed_mask) || (dummy->reset_completed & dummy->reset_completed_mask) || hw_error_status.gpu_error_irq || - (dummy->clean_caches_completed && dummy->clean_caches_completed_irq_enabled) || - dummy->prfcnt_sample_completed) - gpu_device_raise_irq(dummy, GPU_DUMMY_GPU_IRQ); +#if !MALI_USE_CSF + dummy->prfcnt_sample_completed || +#else + (dummy->flush_pa_range_completed && dummy->flush_pa_range_completed_irq_enabled) || +#endif + (dummy->clean_caches_completed && dummy->clean_caches_completed_irq_enabled)) + gpu_device_raise_irq(dummy, MODEL_LINUX_GPU_IRQ); if (hw_error_status.mmu_irq_rawstat & hw_error_status.mmu_irq_mask) - gpu_device_raise_irq(dummy, GPU_DUMMY_MMU_IRQ); + gpu_device_raise_irq(dummy, MODEL_LINUX_MMU_IRQ); } static void midgard_model_update(void *h) @@ -1028,6 +1141,8 @@ static void midgard_model_update(void *h) struct dummy_model_t *dummy = (struct dummy_model_t *)h; int i; + lockdep_assert_held(&hw_error_status.access_lock); + for (i = 0; i < NUM_SLOTS; i++) { if (!dummy->slots[i].job_active) continue; @@ -1074,6 +1189,8 @@ static void invalidate_active_jobs(struct dummy_model_t *dummy) { int i; + lockdep_assert_held(&hw_error_status.access_lock); + for (i = 0; i < NUM_SLOTS; i++) { if (dummy->slots[i].job_active) { hw_error_status.job_irq_rawstat |= (1 << (16 + i)); @@ -1083,13 +1200,17 @@ static void invalidate_active_jobs(struct dummy_model_t *dummy) } } -u8 midgard_model_write_reg(void *h, u32 addr, u32 value) +void midgard_model_write_reg(void *h, u32 addr, u32 value) { + unsigned long flags; struct dummy_model_t *dummy = (struct dummy_model_t *)h; + + spin_lock_irqsave(&hw_error_status.access_lock, flags); + #if !MALI_USE_CSF if ((addr >= JOB_CONTROL_REG(JOB_SLOT0)) && (addr < 
(JOB_CONTROL_REG(JOB_SLOT15) + 0x80))) { - int slot_idx = (addr >> 7) & 0xf; + unsigned int slot_idx = (addr >> 7) & 0xf; KBASE_DEBUG_ASSERT(slot_idx < NUM_SLOTS); if (addr == JOB_SLOT_REG(slot_idx, JS_HEAD_NEXT_LO)) { @@ -1176,6 +1297,9 @@ u8 midgard_model_write_reg(void *h, u32 addr, u32 value) dummy->reset_completed_mask = (value >> 8) & 0x01; dummy->power_changed_mask = (value >> 9) & 0x03; dummy->clean_caches_completed_irq_enabled = (value & (1u << 17)) != 0u; +#if MALI_USE_CSF + dummy->flush_pa_range_completed_irq_enabled = (value & (1u << 20)) != 0u; +#endif } else if (addr == GPU_CONTROL_REG(COHERENCY_ENABLE)) { dummy->coherency_enable = value; } else if (addr == GPU_CONTROL_REG(GPU_IRQ_CLEAR)) { @@ -1188,8 +1312,16 @@ u8 midgard_model_write_reg(void *h, u32 addr, u32 value) if (value & (1 << 17)) dummy->clean_caches_completed = false; - if (value & (1 << 16)) + +#if MALI_USE_CSF + if (value & (1u << 20)) + dummy->flush_pa_range_completed = false; +#endif /* MALI_USE_CSF */ + +#if !MALI_USE_CSF + if (value & PRFCNT_SAMPLE_COMPLETED) /* (1 << 16) */ dummy->prfcnt_sample_completed = 0; +#endif /* !MALI_USE_CSF */ /*update error status */ hw_error_status.gpu_error_irq &= ~(value); @@ -1214,21 +1346,42 @@ u8 midgard_model_write_reg(void *h, u32 addr, u32 value) pr_debug("clean caches requested"); dummy->clean_caches_completed = true; break; +#if MALI_USE_CSF + case GPU_COMMAND_FLUSH_PA_RANGE_CLN_INV_L2: + case GPU_COMMAND_FLUSH_PA_RANGE_CLN_INV_L2_LSC: + case GPU_COMMAND_FLUSH_PA_RANGE_CLN_INV_FULL: + pr_debug("pa range flush requested"); + dummy->flush_pa_range_completed = true; + break; +#endif /* MALI_USE_CSF */ +#if !MALI_USE_CSF case GPU_COMMAND_PRFCNT_SAMPLE: midgard_model_dump_prfcnt(); dummy->prfcnt_sample_completed = 1; +#endif /* !MALI_USE_CSF */ default: break; } +#if MALI_USE_CSF + } else if (addr >= GPU_CONTROL_REG(GPU_COMMAND_ARG0_LO) && + addr <= GPU_CONTROL_REG(GPU_COMMAND_ARG1_HI)) { + /* Writes ignored */ +#endif } else if (addr == GPU_CONTROL_REG(L2_CONFIG)) { dummy->l2_config = value; } #if MALI_USE_CSF - else if (addr >= GPU_CONTROL_REG(CSF_HW_DOORBELL_PAGE_OFFSET) && - addr < GPU_CONTROL_REG(CSF_HW_DOORBELL_PAGE_OFFSET + - (CSF_NUM_DOORBELL * CSF_HW_DOORBELL_PAGE_SIZE))) { - if (addr == GPU_CONTROL_REG(CSF_HW_DOORBELL_PAGE_OFFSET)) + else if (addr >= CSF_HW_DOORBELL_PAGE_OFFSET && + addr < CSF_HW_DOORBELL_PAGE_OFFSET + + (CSF_NUM_DOORBELL * CSF_HW_DOORBELL_PAGE_SIZE)) { + if (addr == CSF_HW_DOORBELL_PAGE_OFFSET) hw_error_status.job_irq_status = JOB_IRQ_GLOBAL_IF; + } else if ((addr >= GPU_CONTROL_REG(SYSC_ALLOC0)) && + (addr < GPU_CONTROL_REG(SYSC_ALLOC(SYSC_ALLOC_COUNT)))) { + /* Do nothing */ + } else if ((addr >= GPU_CONTROL_REG(ASN_HASH_0)) && + (addr < GPU_CONTROL_REG(ASN_HASH(ASN_HASH_COUNT)))) { + /* Do nothing */ } else if (addr == IPA_CONTROL_REG(COMMAND)) { pr_debug("Received IPA_CONTROL command"); } else if (addr == IPA_CONTROL_REG(TIMER)) { @@ -1249,14 +1402,13 @@ u8 midgard_model_write_reg(void *h, u32 addr, u32 value) } } #endif - else if (addr == MMU_REG(MMU_IRQ_MASK)) { + else if (addr == MMU_CONTROL_REG(MMU_IRQ_MASK)) { hw_error_status.mmu_irq_mask = value; - } else if (addr == MMU_REG(MMU_IRQ_CLEAR)) { + } else if (addr == MMU_CONTROL_REG(MMU_IRQ_CLEAR)) { hw_error_status.mmu_irq_rawstat &= (~value); - } else if ((addr >= MMU_AS_REG(0, AS_TRANSTAB_LO)) && - (addr <= MMU_AS_REG(15, AS_STATUS))) { - int mem_addr_space = (addr - MMU_AS_REG(0, AS_TRANSTAB_LO)) - >> 6; + } else if ((addr >= MMU_STAGE1_REG(MMU_AS_REG(0, AS_TRANSTAB_LO))) && + (addr <= 
MMU_STAGE1_REG(MMU_AS_REG(15, AS_STATUS)))) { + int mem_addr_space = (addr - MMU_STAGE1_REG(MMU_AS_REG(0, AS_TRANSTAB_LO))) >> 6; switch (addr & 0x3F) { case AS_COMMAND: @@ -1346,20 +1498,24 @@ u8 midgard_model_write_reg(void *h, u32 addr, u32 value) mem_addr_space, addr, value); break; } - } else if (addr >= GPU_CONTROL_REG(PRFCNT_BASE_LO) && - addr <= GPU_CONTROL_REG(PRFCNT_MMU_L2_EN)) { + } else { switch (addr) { +#if !MALI_USE_CSF case PRFCNT_BASE_LO: - performance_counters.prfcnt_base |= value; + performance_counters.prfcnt_base = + HI_MASK(performance_counters.prfcnt_base) | value; + performance_counters.prfcnt_base_cpu = + (u32 *)(uintptr_t)performance_counters.prfcnt_base; break; case PRFCNT_BASE_HI: - performance_counters.prfcnt_base |= ((u64) value) << 32; + performance_counters.prfcnt_base = + LO_MASK(performance_counters.prfcnt_base) | (((u64)value) << 32); + performance_counters.prfcnt_base_cpu = + (u32 *)(uintptr_t)performance_counters.prfcnt_base; break; -#if !MALI_USE_CSF case PRFCNT_JM_EN: - performance_counters.prfcnt_en.jm = value; + performance_counters.prfcnt_en.fe = value; break; -#endif /* !MALI_USE_CSF */ case PRFCNT_SHADER_EN: performance_counters.prfcnt_en.shader = value; break; @@ -1369,9 +1525,7 @@ u8 midgard_model_write_reg(void *h, u32 addr, u32 value) case PRFCNT_MMU_L2_EN: performance_counters.prfcnt_en.l2 = value; break; - } - } else { - switch (addr) { +#endif /* !MALI_USE_CSF */ case TILER_PWRON_LO: dummy->power_on |= (value & 1) << 1; /* Also ensure L2 is powered on */ @@ -1379,7 +1533,8 @@ u8 midgard_model_write_reg(void *h, u32 addr, u32 value) dummy->power_changed = 1; break; case SHADER_PWRON_LO: - dummy->power_on |= (value & 0xF) << 2; + dummy->power_on |= + (value & dummy->control_reg_values->shader_present) << 2; dummy->power_changed = 1; break; case L2_PWRON_LO: @@ -1395,7 +1550,8 @@ u8 midgard_model_write_reg(void *h, u32 addr, u32 value) dummy->power_changed = 1; break; case SHADER_PWROFF_LO: - dummy->power_on &= ~((value & 0xF) << 2); + dummy->power_on &= + ~((value & dummy->control_reg_values->shader_present) << 2); dummy->power_changed = 1; break; case L2_PWROFF_LO: @@ -1416,6 +1572,7 @@ u8 midgard_model_write_reg(void *h, u32 addr, u32 value) case PWR_OVERRIDE0: #if !MALI_USE_CSF case JM_CONFIG: + case PRFCNT_CONFIG: #else /* !MALI_USE_CSF */ case CSF_CONFIG: #endif /* !MALI_USE_CSF */ @@ -1434,13 +1591,16 @@ u8 midgard_model_write_reg(void *h, u32 addr, u32 value) midgard_model_update(dummy); midgard_model_get_outputs(dummy); - - return 1; + spin_unlock_irqrestore(&hw_error_status.access_lock, flags); } -u8 midgard_model_read_reg(void *h, u32 addr, u32 * const value) +void midgard_model_read_reg(void *h, u32 addr, u32 *const value) { + unsigned long flags; struct dummy_model_t *dummy = (struct dummy_model_t *)h; + + spin_lock_irqsave(&hw_error_status.access_lock, flags); + *value = 0; /* 0 by default */ #if !MALI_USE_CSF if (addr == JOB_CONTROL_REG(JOB_IRQ_JS_STATE)) { @@ -1475,24 +1635,44 @@ u8 midgard_model_read_reg(void *h, u32 addr, u32 * const value) #endif /* !MALI_USE_CSF */ else if (addr == GPU_CONTROL_REG(GPU_IRQ_MASK)) { *value = (dummy->reset_completed_mask << 8) | - (dummy->power_changed_mask << 9) | (1 << 7) | 1; + ((dummy->clean_caches_completed_irq_enabled ? 1u : 0u) << 17) | +#if MALI_USE_CSF + ((dummy->flush_pa_range_completed_irq_enabled ? 
1u : 0u) << 20) | +#endif + (dummy->power_changed_mask << 9) | (1 << 7) | 1; pr_debug("GPU_IRQ_MASK read %x", *value); } else if (addr == GPU_CONTROL_REG(GPU_IRQ_RAWSTAT)) { *value = (dummy->power_changed << 9) | (dummy->power_changed << 10) | (dummy->reset_completed << 8) | +#if !MALI_USE_CSF + (dummy->prfcnt_sample_completed ? PRFCNT_SAMPLE_COMPLETED : 0) | +#endif /* !MALI_USE_CSF */ ((dummy->clean_caches_completed ? 1u : 0u) << 17) | - (dummy->prfcnt_sample_completed << 16) | hw_error_status.gpu_error_irq; +#if MALI_USE_CSF + ((dummy->flush_pa_range_completed ? 1u : 0u) << 20) | +#endif + hw_error_status.gpu_error_irq; pr_debug("GPU_IRQ_RAWSTAT read %x", *value); } else if (addr == GPU_CONTROL_REG(GPU_IRQ_STATUS)) { *value = ((dummy->power_changed && (dummy->power_changed_mask & 0x1)) << 9) | ((dummy->power_changed && (dummy->power_changed_mask & 0x2)) << 10) | ((dummy->reset_completed & dummy->reset_completed_mask) << 8) | +#if !MALI_USE_CSF + (dummy->prfcnt_sample_completed ? PRFCNT_SAMPLE_COMPLETED : 0) | +#endif /* !MALI_USE_CSF */ (((dummy->clean_caches_completed && dummy->clean_caches_completed_irq_enabled) ? 1u : 0u) << 17) | - (dummy->prfcnt_sample_completed << 16) | hw_error_status.gpu_error_irq; +#if MALI_USE_CSF + (((dummy->flush_pa_range_completed && + dummy->flush_pa_range_completed_irq_enabled) ? + 1u : + 0u) + << 20) | +#endif + hw_error_status.gpu_error_irq; pr_debug("GPU_IRQ_STAT read %x", *value); } else if (addr == GPU_CONTROL_REG(GPU_STATUS)) { *value = 0; @@ -1504,8 +1684,18 @@ u8 midgard_model_read_reg(void *h, u32 addr, u32 * const value) *value = hw_error_status.gpu_fault_status; } else if (addr == GPU_CONTROL_REG(L2_CONFIG)) { *value = dummy->l2_config; - } else if ((addr >= GPU_CONTROL_REG(SHADER_PRESENT_LO)) && - (addr <= GPU_CONTROL_REG(L2_MMU_CONFIG))) { + } +#if MALI_USE_CSF + else if ((addr >= GPU_CONTROL_REG(SYSC_ALLOC0)) && + (addr < GPU_CONTROL_REG(SYSC_ALLOC(SYSC_ALLOC_COUNT)))) { + *value = 0; + } else if ((addr >= GPU_CONTROL_REG(ASN_HASH_0)) && + (addr < GPU_CONTROL_REG(ASN_HASH(ASN_HASH_COUNT)))) { + *value = 0; + } +#endif + else if ((addr >= GPU_CONTROL_REG(SHADER_PRESENT_LO)) && + (addr <= GPU_CONTROL_REG(L2_MMU_CONFIG))) { switch (addr) { case GPU_CONTROL_REG(SHADER_PRESENT_LO): case GPU_CONTROL_REG(SHADER_PRESENT_HI): @@ -1515,27 +1705,27 @@ u8 midgard_model_read_reg(void *h, u32 addr, u32 * const value) case GPU_CONTROL_REG(L2_PRESENT_HI): case GPU_CONTROL_REG(STACK_PRESENT_LO): case GPU_CONTROL_REG(STACK_PRESENT_HI): - *value = get_implementation_register(addr); + *value = get_implementation_register(addr, dummy->control_reg_values); break; case GPU_CONTROL_REG(SHADER_READY_LO): *value = (dummy->power_on >> 0x02) & - get_implementation_register( - GPU_CONTROL_REG(SHADER_PRESENT_LO)); + get_implementation_register(GPU_CONTROL_REG(SHADER_PRESENT_LO), + dummy->control_reg_values); break; case GPU_CONTROL_REG(TILER_READY_LO): *value = (dummy->power_on >> 0x01) & - get_implementation_register( - GPU_CONTROL_REG(TILER_PRESENT_LO)); + get_implementation_register(GPU_CONTROL_REG(TILER_PRESENT_LO), + dummy->control_reg_values); break; case GPU_CONTROL_REG(L2_READY_LO): *value = dummy->power_on & - get_implementation_register( - GPU_CONTROL_REG(L2_PRESENT_LO)); + get_implementation_register(GPU_CONTROL_REG(L2_PRESENT_LO), + dummy->control_reg_values); break; case GPU_CONTROL_REG(STACK_READY_LO): *value = dummy->stack_power_on_lo & - get_implementation_register( - GPU_CONTROL_REG(STACK_PRESENT_LO)); + 
get_implementation_register(GPU_CONTROL_REG(STACK_PRESENT_LO), + dummy->control_reg_values); break; case GPU_CONTROL_REG(SHADER_READY_HI): @@ -1729,10 +1919,9 @@ u8 midgard_model_read_reg(void *h, u32 addr, u32 * const value) } else if (addr >= GPU_CONTROL_REG(CYCLE_COUNT_LO) && addr <= GPU_CONTROL_REG(TIMESTAMP_HI)) { *value = 0; - } else if (addr >= MMU_AS_REG(0, AS_TRANSTAB_LO) - && addr <= MMU_AS_REG(15, AS_STATUS)) { - int mem_addr_space = (addr - MMU_AS_REG(0, AS_TRANSTAB_LO)) - >> 6; + } else if (addr >= MMU_STAGE1_REG(MMU_AS_REG(0, AS_TRANSTAB_LO)) && + addr <= MMU_STAGE1_REG(MMU_AS_REG(15, AS_STATUS))) { + int mem_addr_space = (addr - MMU_STAGE1_REG(MMU_AS_REG(0, AS_TRANSTAB_LO))) >> 6; switch (addr & 0x3F) { case AS_TRANSTAB_LO: @@ -1776,11 +1965,11 @@ u8 midgard_model_read_reg(void *h, u32 addr, u32 * const value) *value = 0; break; } - } else if (addr == MMU_REG(MMU_IRQ_MASK)) { + } else if (addr == MMU_CONTROL_REG(MMU_IRQ_MASK)) { *value = hw_error_status.mmu_irq_mask; - } else if (addr == MMU_REG(MMU_IRQ_RAWSTAT)) { + } else if (addr == MMU_CONTROL_REG(MMU_IRQ_RAWSTAT)) { *value = hw_error_status.mmu_irq_rawstat; - } else if (addr == MMU_REG(MMU_IRQ_STATUS)) { + } else if (addr == MMU_CONTROL_REG(MMU_IRQ_STATUS)) { *value = hw_error_status.mmu_irq_mask & hw_error_status.mmu_irq_rawstat; } @@ -1788,8 +1977,7 @@ u8 midgard_model_read_reg(void *h, u32 addr, u32 * const value) else if (addr == IPA_CONTROL_REG(STATUS)) { *value = (ipa_control_timer_enabled << 31); } else if ((addr >= IPA_CONTROL_REG(VALUE_CSHW_REG_LO(0))) && - (addr <= IPA_CONTROL_REG(VALUE_CSHW_REG_HI( - IPA_CTL_MAX_VAL_CNT_IDX)))) { + (addr <= IPA_CONTROL_REG(VALUE_CSHW_REG_HI(IPA_CTL_MAX_VAL_CNT_IDX)))) { u32 counter_index = (addr - IPA_CONTROL_REG(VALUE_CSHW_REG_LO(0))) >> 3; bool is_low_word = @@ -1798,8 +1986,7 @@ u8 midgard_model_read_reg(void *h, u32 addr, u32 * const value) *value = gpu_model_get_prfcnt_value(KBASE_IPA_CORE_TYPE_CSHW, counter_index, is_low_word); } else if ((addr >= IPA_CONTROL_REG(VALUE_MEMSYS_REG_LO(0))) && - (addr <= IPA_CONTROL_REG(VALUE_MEMSYS_REG_HI( - IPA_CTL_MAX_VAL_CNT_IDX)))) { + (addr <= IPA_CONTROL_REG(VALUE_MEMSYS_REG_HI(IPA_CTL_MAX_VAL_CNT_IDX)))) { u32 counter_index = (addr - IPA_CONTROL_REG(VALUE_MEMSYS_REG_LO(0))) >> 3; bool is_low_word = @@ -1808,8 +1995,7 @@ u8 midgard_model_read_reg(void *h, u32 addr, u32 * const value) *value = gpu_model_get_prfcnt_value(KBASE_IPA_CORE_TYPE_MEMSYS, counter_index, is_low_word); } else if ((addr >= IPA_CONTROL_REG(VALUE_TILER_REG_LO(0))) && - (addr <= IPA_CONTROL_REG(VALUE_TILER_REG_HI( - IPA_CTL_MAX_VAL_CNT_IDX)))) { + (addr <= IPA_CONTROL_REG(VALUE_TILER_REG_HI(IPA_CTL_MAX_VAL_CNT_IDX)))) { u32 counter_index = (addr - IPA_CONTROL_REG(VALUE_TILER_REG_LO(0))) >> 3; bool is_low_word = @@ -1818,8 +2004,7 @@ u8 midgard_model_read_reg(void *h, u32 addr, u32 * const value) *value = gpu_model_get_prfcnt_value(KBASE_IPA_CORE_TYPE_TILER, counter_index, is_low_word); } else if ((addr >= IPA_CONTROL_REG(VALUE_SHADER_REG_LO(0))) && - (addr <= IPA_CONTROL_REG(VALUE_SHADER_REG_HI( - IPA_CTL_MAX_VAL_CNT_IDX)))) { + (addr <= IPA_CONTROL_REG(VALUE_SHADER_REG_HI(IPA_CTL_MAX_VAL_CNT_IDX)))) { u32 counter_index = (addr - IPA_CONTROL_REG(VALUE_SHADER_REG_LO(0))) >> 3; bool is_low_word = @@ -1840,18 +2025,18 @@ u8 midgard_model_read_reg(void *h, u32 addr, u32 * const value) *value = 0; } + spin_unlock_irqrestore(&hw_error_status.access_lock, flags); CSTD_UNUSED(dummy); - - return 1; } -static u32 set_user_sample_core_type(u64 *counters, - u32 
*usr_data_start, u32 usr_data_offset, - u32 usr_data_size, u32 core_count) +static u32 set_user_sample_core_type(u64 *counters, u32 *usr_data_start, u32 usr_data_offset, + u32 usr_data_size, u32 core_count) { u32 sample_size; u32 *usr_data = NULL; + lockdep_assert_held(&performance_counters.access_lock); + sample_size = core_count * KBASE_DUMMY_MODEL_COUNTER_PER_CORE * sizeof(u32); @@ -1866,11 +2051,7 @@ static u32 set_user_sample_core_type(u64 *counters, u32 i; for (i = 0; i < loop_cnt; i++) { - if (copy_from_user(&counters[i], &usr_data[i], - sizeof(u32))) { - model_error_log(KBASE_CORE, "Unable to set counter sample 2"); - break; - } + counters[i] = usr_data[i]; } } @@ -1884,6 +2065,8 @@ static u32 set_kernel_sample_core_type(u64 *counters, u32 sample_size; u64 *usr_data = NULL; + lockdep_assert_held(&performance_counters.access_lock); + sample_size = core_count * KBASE_DUMMY_MODEL_COUNTER_PER_CORE * sizeof(u64); @@ -1900,49 +2083,70 @@ static u32 set_kernel_sample_core_type(u64 *counters, } /* Counter values injected through ioctl are of 32 bits */ -void gpu_model_set_dummy_prfcnt_sample(u32 *usr_data, u32 usr_data_size) +int gpu_model_set_dummy_prfcnt_user_sample(u32 __user *data, u32 size) { + unsigned long flags; + u32 *user_data; u32 offset = 0; + if (data == NULL || size == 0 || size > KBASE_DUMMY_MODEL_COUNTER_TOTAL * sizeof(u32)) + return -EINVAL; + + /* copy_from_user might sleep so can't be called from inside a spinlock + * allocate a temporary buffer for user data and copy to that before taking + * the lock + */ + user_data = kmalloc(size, GFP_KERNEL); + if (!user_data) + return -ENOMEM; + + if (copy_from_user(user_data, data, size)) { + model_error_log(KBASE_CORE, "Unable to copy prfcnt data from userspace"); + kfree(user_data); + return -EINVAL; + } + + spin_lock_irqsave(&performance_counters.access_lock, flags); #if !MALI_USE_CSF - offset = set_user_sample_core_type(performance_counters.jm_counters, - usr_data, offset, usr_data_size, 1); + offset = set_user_sample_core_type(performance_counters.jm_counters, user_data, offset, + size, 1); #else - offset = set_user_sample_core_type(performance_counters.cshw_counters, - usr_data, offset, usr_data_size, 1); + offset = set_user_sample_core_type(performance_counters.cshw_counters, user_data, offset, + size, 1); #endif /* !MALI_USE_CSF */ - offset = set_user_sample_core_type(performance_counters.tiler_counters, - usr_data, offset, usr_data_size, - hweight64(DUMMY_IMPLEMENTATION_TILER_PRESENT)); - offset = set_user_sample_core_type(performance_counters.l2_counters, - usr_data, offset, usr_data_size, - KBASE_DUMMY_MODEL_MAX_MEMSYS_BLOCKS); - offset = set_user_sample_core_type(performance_counters.shader_counters, - usr_data, offset, usr_data_size, - KBASE_DUMMY_MODEL_MAX_SHADER_CORES); + offset = set_user_sample_core_type(performance_counters.tiler_counters, user_data, offset, + size, hweight64(DUMMY_IMPLEMENTATION_TILER_PRESENT)); + offset = set_user_sample_core_type(performance_counters.l2_counters, user_data, offset, + size, KBASE_DUMMY_MODEL_MAX_MEMSYS_BLOCKS); + offset = set_user_sample_core_type(performance_counters.shader_counters, user_data, offset, + size, KBASE_DUMMY_MODEL_MAX_SHADER_CORES); + spin_unlock_irqrestore(&performance_counters.access_lock, flags); + + kfree(user_data); + return 0; } /* Counter values injected through kutf are of 64 bits */ -void gpu_model_set_dummy_prfcnt_kernel_sample(u64 *usr_data, u32 usr_data_size) +void gpu_model_set_dummy_prfcnt_kernel_sample(u64 *data, u32 size) { + unsigned long flags; 
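gpu_model_set_dummy_prfcnt_user_sample() above copies the userspace buffer into a kmalloc'd scratch area before taking the counter spinlock, because copy_from_user() can fault and sleep, and sleeping calls must not happen under a spinlock. A compact sketch of that copy-then-lock pattern using standard uaccess/slab/spinlock APIs; the names (apply_user_sample, sample_lock, MAX_SAMPLE_BYTES) are hypothetical, not the driver's.

#include <linux/slab.h>
#include <linux/spinlock.h>
#include <linux/string.h>
#include <linux/types.h>
#include <linux/uaccess.h>

#define MAX_SAMPLE_BYTES 4096	/* hypothetical upper bound on a sample */

static DEFINE_SPINLOCK(sample_lock);
static u32 sample_values[MAX_SAMPLE_BYTES / sizeof(u32)];

int apply_user_sample(const u32 __user *data, u32 size)
{
	unsigned long flags;
	u32 *tmp;

	if (!data || !size || size > MAX_SAMPLE_BYTES)
		return -EINVAL;

	/* kmalloc() and copy_from_user() may sleep, so do both unlocked. */
	tmp = kmalloc(size, GFP_KERNEL);
	if (!tmp)
		return -ENOMEM;

	if (copy_from_user(tmp, data, size)) {
		kfree(tmp);
		return -EFAULT;
	}

	/* Only the copy into the shared state happens under the lock. */
	spin_lock_irqsave(&sample_lock, flags);
	memcpy(sample_values, tmp, size);
	spin_unlock_irqrestore(&sample_lock, flags);

	kfree(tmp);
	return 0;
}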
u32 offset = 0; + spin_lock_irqsave(&performance_counters.access_lock, flags); #if !MALI_USE_CSF - offset = set_kernel_sample_core_type(performance_counters.jm_counters, - usr_data, offset, usr_data_size, 1); + offset = set_kernel_sample_core_type(performance_counters.jm_counters, data, offset, size, + 1); #else - offset = set_kernel_sample_core_type(performance_counters.cshw_counters, - usr_data, offset, usr_data_size, 1); + offset = set_kernel_sample_core_type(performance_counters.cshw_counters, data, offset, size, + 1); #endif /* !MALI_USE_CSF */ - offset = set_kernel_sample_core_type(performance_counters.tiler_counters, - usr_data, offset, usr_data_size, - hweight64(DUMMY_IMPLEMENTATION_TILER_PRESENT)); - offset = set_kernel_sample_core_type(performance_counters.l2_counters, - usr_data, offset, usr_data_size, - hweight64(performance_counters.l2_present)); - offset = set_kernel_sample_core_type(performance_counters.shader_counters, - usr_data, offset, usr_data_size, - hweight64(performance_counters.shader_present)); + offset = set_kernel_sample_core_type(performance_counters.tiler_counters, data, offset, + size, hweight64(DUMMY_IMPLEMENTATION_TILER_PRESENT)); + offset = set_kernel_sample_core_type(performance_counters.l2_counters, data, offset, size, + hweight64(performance_counters.l2_present)); + offset = set_kernel_sample_core_type(performance_counters.shader_counters, data, offset, + size, hweight64(performance_counters.shader_present)); + spin_unlock_irqrestore(&performance_counters.access_lock, flags); } KBASE_EXPORT_TEST_API(gpu_model_set_dummy_prfcnt_kernel_sample); @@ -1977,21 +2181,12 @@ void gpu_model_set_dummy_prfcnt_cores(struct kbase_device *kbdev, } KBASE_EXPORT_TEST_API(gpu_model_set_dummy_prfcnt_cores); -void gpu_model_set_dummy_prfcnt_base_cpu(u32 *base, struct kbase_device *kbdev, - struct tagged_addr *pages, - size_t page_count) -{ - performance_counters.prfcnt_base_cpu = base; - performance_counters.kbdev = kbdev; - performance_counters.pages = pages; - performance_counters.page_count = page_count; -} - int gpu_model_control(void *model, struct kbase_model_control_params *params) { struct dummy_model_t *dummy = (struct dummy_model_t *)model; int i; + unsigned long flags; if (params->command == KBASE_MC_DISABLE_JOBS) { for (i = 0; i < NUM_SLOTS; i++) @@ -2000,8 +2195,10 @@ int gpu_model_control(void *model, return -EINVAL; } + spin_lock_irqsave(&hw_error_status.access_lock, flags); midgard_model_update(dummy); midgard_model_get_outputs(dummy); + spin_unlock_irqrestore(&hw_error_status.access_lock, flags); return 0; } diff --git a/mali_kbase/backend/gpu/mali_kbase_model_dummy.h b/mali_kbase/backend/gpu/mali_kbase_model_dummy.h index 87690f4..2a3351b 100644 --- a/mali_kbase/backend/gpu/mali_kbase_model_dummy.h +++ b/mali_kbase/backend/gpu/mali_kbase_model_dummy.h @@ -21,11 +21,24 @@ /* * Dummy Model interface + * + * Support for NO_MALI dummy Model interface. + * + * +-----------------------------------+ + * | Kbase read/write/IRQ | + * +-----------------------------------+ + * | Model Linux Framework | + * +-----------------------------------+ + * | Model Dummy interface definitions | + * +-----------------+-----------------+ + * | Fake R/W | Fake IRQ | + * +-----------------+-----------------+ */ #ifndef _KBASE_MODEL_DUMMY_H_ #define _KBASE_MODEL_DUMMY_H_ +#include <uapi/gpu/arm/midgard/backend/gpu/mali_kbase_model_linux.h> #include <uapi/gpu/arm/midgard/backend/gpu/mali_kbase_model_dummy.h> #define model_error_log(module, ...) 
pr_err(__VA_ARGS__) @@ -116,6 +129,8 @@ struct kbase_error_atom { /*struct to track the system error state*/ struct error_status_t { + spinlock_t access_lock; + u32 errors_mask; u32 mmu_table_level; int faulty_mmu_as; @@ -138,38 +153,71 @@ struct error_status_t { u64 as_transtab[NUM_MMU_AS]; }; -void *midgard_model_create(const void *config); -void midgard_model_destroy(void *h); -u8 midgard_model_write_reg(void *h, u32 addr, u32 value); -u8 midgard_model_read_reg(void *h, u32 addr, - u32 * const value); +/** + * struct gpu_model_prfcnt_en - Performance counter enable masks + * @fe: Enable mask for front-end block + * @tiler: Enable mask for tiler block + * @l2: Enable mask for L2/Memory system blocks + * @shader: Enable mask for shader core blocks + */ +struct gpu_model_prfcnt_en { + u32 fe; + u32 tiler; + u32 l2; + u32 shader; +}; + void midgard_set_error(int job_slot); int job_atom_inject_error(struct kbase_error_params *params); int gpu_model_control(void *h, struct kbase_model_control_params *params); -void gpu_model_set_dummy_prfcnt_sample(u32 *usr_data, u32 usr_data_size); -void gpu_model_set_dummy_prfcnt_kernel_sample(u64 *usr_data, u32 usr_data_size); +/** + * gpu_model_set_dummy_prfcnt_user_sample() - Set performance counter values + * @data: Userspace pointer to array of counter values + * @size: Size of counter value array + * + * Counter values set by this function will be used for one sample dump only + * after which counters will be cleared back to zero. + * + * Return: 0 on success, else error code. + */ +int gpu_model_set_dummy_prfcnt_user_sample(u32 __user *data, u32 size); + +/** + * gpu_model_set_dummy_prfcnt_kernel_sample() - Set performance counter values + * @data: Pointer to array of counter values + * @size: Size of counter value array + * + * Counter values set by this function will be used for one sample dump only + * after which counters will be cleared back to zero. + */ +void gpu_model_set_dummy_prfcnt_kernel_sample(u64 *data, u32 size); + void gpu_model_get_dummy_prfcnt_cores(struct kbase_device *kbdev, u64 *l2_present, u64 *shader_present); void gpu_model_set_dummy_prfcnt_cores(struct kbase_device *kbdev, u64 l2_present, u64 shader_present); -void gpu_model_set_dummy_prfcnt_base_cpu(u32 *base, struct kbase_device *kbdev, - struct tagged_addr *pages, - size_t page_count); + /* Clear the counter values array maintained by the dummy model */ void gpu_model_clear_prfcnt_values(void); -enum gpu_dummy_irq { - GPU_DUMMY_JOB_IRQ, - GPU_DUMMY_GPU_IRQ, - GPU_DUMMY_MMU_IRQ -}; +#if MALI_USE_CSF +/** + * gpu_model_prfcnt_dump_request() - Request performance counter sample dump. + * @sample_buf: Pointer to KBASE_DUMMY_MODEL_MAX_VALUES_PER_SAMPLE sized array + * in which to store dumped performance counter values. + * @enable_maps: Physical enable maps for performance counter blocks. + */ +void gpu_model_prfcnt_dump_request(uint32_t *sample_buf, struct gpu_model_prfcnt_en enable_maps); -void gpu_device_raise_irq(void *model, - enum gpu_dummy_irq irq); -void gpu_device_set_data(void *model, void *data); -void *gpu_device_get_data(void *model); +/** + * gpu_model_glb_request_job_irq() - Trigger job interrupt with global request + * flag set. + * @model: Model pointer returned by midgard_model_create(). 
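The two prfcnt injection entry points declared above differ only in where the counter values come from: the user-sample variant copies a __user buffer into a temporary allocation before taking the dummy model's spinlock, while the kernel-sample variant (used via kutf) takes 64-bit values directly. The following is only an illustrative sketch of a caller, assuming nothing beyond the two prototypes shown in this hunk; inject_fake_counters() and the buffer contents are invented for illustration and are not part of the driver.

static int inject_fake_counters(struct kbase_device *kbdev, u32 __user *ubuf, u32 ubuf_bytes)
{
	/* 64-bit values injected from kernel context, e.g. a kutf test */
	u64 kvals[4] = { 10, 20, 30, 40 };

	gpu_model_set_dummy_prfcnt_kernel_sample(kvals, sizeof(kvals));

	/*
	 * 32-bit values injected on behalf of user space; per the hunk above
	 * this returns -EINVAL for a NULL/oversized buffer or a failed
	 * copy_from_user(), and -ENOMEM if the temporary buffer allocation
	 * fails.
	 */
	return gpu_model_set_dummy_prfcnt_user_sample(ubuf, ubuf_bytes);
}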
+ */ +void gpu_model_glb_request_job_irq(void *model); +#endif /* MALI_USE_CSF */ extern struct error_status_t hw_error_status; diff --git a/mali_kbase/backend/gpu/mali_kbase_model_error_generator.c b/mali_kbase/backend/gpu/mali_kbase_model_error_generator.c index c91c0d8..f310cc7 100644 --- a/mali_kbase/backend/gpu/mali_kbase_model_error_generator.c +++ b/mali_kbase/backend/gpu/mali_kbase_model_error_generator.c @@ -21,30 +21,29 @@ #include <mali_kbase.h> #include <linux/random.h> -#include "backend/gpu/mali_kbase_model_dummy.h" +#include "backend/gpu/mali_kbase_model_linux.h" -/* all the error conditions supported by the model */ -#define TOTAL_FAULTS 27 -/* maximum number of levels in the MMU translation table tree */ -#define MAX_MMU_TABLE_LEVEL 4 -/* worst case scenario is <1 MMU fault + 1 job fault + 2 GPU faults> */ -#define MAX_CONCURRENT_FAULTS 3 +static struct kbase_error_atom *error_track_list; + +#ifdef CONFIG_MALI_ERROR_INJECT_RANDOM /** Kernel 6.1.0 has dropped prandom_u32(), use get_random_u32() */ #if (KERNEL_VERSION(6, 1, 0) <= LINUX_VERSION_CODE) #define prandom_u32 get_random_u32 #endif -static struct kbase_error_atom *error_track_list; - -unsigned int rand_seed; - /*following error probability are set quite high in order to stress the driver*/ -unsigned int error_probability = 50; /* to be set between 0 and 100 */ +static unsigned int error_probability = 50; /* to be set between 0 and 100 */ /* probability to have multiple error give that there is an error */ -unsigned int multiple_error_probability = 50; +static unsigned int multiple_error_probability = 50; + +/* all the error conditions supported by the model */ +#define TOTAL_FAULTS 27 +/* maximum number of levels in the MMU translation table tree */ +#define MAX_MMU_TABLE_LEVEL 4 +/* worst case scenario is <1 MMU fault + 1 job fault + 2 GPU faults> */ +#define MAX_CONCURRENT_FAULTS 3 -#ifdef CONFIG_MALI_ERROR_INJECT_RANDOM /** * gpu_generate_error - Generate GPU error */ diff --git a/mali_kbase/backend/gpu/mali_kbase_model_linux.c b/mali_kbase/backend/gpu/mali_kbase_model_linux.c index 7887cb2..67e00e9 100644 --- a/mali_kbase/backend/gpu/mali_kbase_model_linux.c +++ b/mali_kbase/backend/gpu/mali_kbase_model_linux.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2010, 2012-2015, 2017-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2010-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -20,12 +20,12 @@ */ /* - * Model interface + * Model Linux Framework interfaces. 
*/ #include <mali_kbase.h> #include <gpu/mali_kbase_gpu_regmap.h> -#include <backend/gpu/mali_kbase_model_dummy.h> + #include "backend/gpu/mali_kbase_model_linux.h" #include "device/mali_kbase_device.h" #include "mali_kbase_irq_internal.h" @@ -95,8 +95,7 @@ static void serve_mmu_irq(struct work_struct *work) if (atomic_cmpxchg(&kbdev->serving_mmu_irq, 1, 0) == 1) { u32 val; - while ((val = kbase_reg_read(kbdev, - MMU_REG(MMU_IRQ_STATUS)))) { + while ((val = kbase_reg_read(kbdev, MMU_CONTROL_REG(MMU_IRQ_STATUS)))) { /* Handle the IRQ */ kbase_mmu_interrupt(kbdev, val); } @@ -105,8 +104,7 @@ static void serve_mmu_irq(struct work_struct *work) kmem_cache_free(kbdev->irq_slab, data); } -void gpu_device_raise_irq(void *model, - enum gpu_dummy_irq irq) +void gpu_device_raise_irq(void *model, u32 irq) { struct model_irq_data *data; struct kbase_device *kbdev = gpu_device_get_data(model); @@ -120,15 +118,15 @@ void gpu_device_raise_irq(void *model, data->kbdev = kbdev; switch (irq) { - case GPU_DUMMY_JOB_IRQ: + case MODEL_LINUX_JOB_IRQ: INIT_WORK(&data->work, serve_job_irq); atomic_set(&kbdev->serving_job_irq, 1); break; - case GPU_DUMMY_GPU_IRQ: + case MODEL_LINUX_GPU_IRQ: INIT_WORK(&data->work, serve_gpu_irq); atomic_set(&kbdev->serving_gpu_irq, 1); break; - case GPU_DUMMY_MMU_IRQ: + case MODEL_LINUX_MMU_IRQ: INIT_WORK(&data->work, serve_mmu_irq); atomic_set(&kbdev->serving_mmu_irq, 1); break; @@ -157,7 +155,7 @@ KBASE_EXPORT_TEST_API(kbase_reg_write); u32 kbase_reg_read(struct kbase_device *kbdev, u32 offset) { unsigned long flags; - u32 val; + u32 val = 0; spin_lock_irqsave(&kbdev->reg_op_lock, flags); midgard_model_read_reg(kbdev->model, offset, &val); @@ -165,22 +163,8 @@ u32 kbase_reg_read(struct kbase_device *kbdev, u32 offset) return val; } - KBASE_EXPORT_TEST_API(kbase_reg_read); -/** - * kbase_is_gpu_removed - Has the GPU been removed. - * @kbdev: Kbase device pointer - * - * This function would return true if the GPU has been removed. - * It is stubbed here - * Return: Always false - */ -bool kbase_is_gpu_removed(struct kbase_device *kbdev) -{ - return false; -} - int kbase_install_interrupts(struct kbase_device *kbdev) { KBASE_DEBUG_ASSERT(kbdev); @@ -239,16 +223,12 @@ KBASE_EXPORT_TEST_API(kbase_gpu_irq_test_handler); int kbase_gpu_device_create(struct kbase_device *kbdev) { - kbdev->model = midgard_model_create(NULL); + kbdev->model = midgard_model_create(kbdev); if (kbdev->model == NULL) return -ENOMEM; - gpu_device_set_data(kbdev->model, kbdev); - spin_lock_init(&kbdev->reg_op_lock); - dev_warn(kbdev->dev, "Using Dummy Model"); - return 0; } diff --git a/mali_kbase/backend/gpu/mali_kbase_model_linux.h b/mali_kbase/backend/gpu/mali_kbase_model_linux.h index dcb2e7c..4cf1235 100644 --- a/mali_kbase/backend/gpu/mali_kbase_model_linux.h +++ b/mali_kbase/backend/gpu/mali_kbase_model_linux.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -20,13 +20,132 @@ */ /* - * Model interface + * Model Linux Framework interfaces. + * + * This framework is used to provide generic Kbase Models interfaces. + * Note: Backends cannot be used together; the selection is done at build time. 
+ * + * - Without Model Linux Framework: + * +-----------------------------+ + * | Kbase read/write/IRQ | + * +-----------------------------+ + * | HW interface definitions | + * +-----------------------------+ + * + * - With Model Linux Framework: + * +-----------------------------+ + * | Kbase read/write/IRQ | + * +-----------------------------+ + * | Model Linux Framework | + * +-----------------------------+ + * | Model interface definitions | + * +-----------------------------+ */ #ifndef _KBASE_MODEL_LINUX_H_ #define _KBASE_MODEL_LINUX_H_ +/* + * Include Model definitions + */ + +#if IS_ENABLED(CONFIG_MALI_NO_MALI) +#include <backend/gpu/mali_kbase_model_dummy.h> +#endif /* IS_ENABLED(CONFIG_MALI_NO_MALI) */ + +#if !IS_ENABLED(CONFIG_MALI_REAL_HW) +/** + * kbase_gpu_device_create() - Generic create function. + * + * @kbdev: Kbase device. + * + * Specific model hook is implemented by midgard_model_create() + * + * Return: 0 on success, error code otherwise. + */ int kbase_gpu_device_create(struct kbase_device *kbdev); + +/** + * kbase_gpu_device_destroy() - Generic create function. + * + * @kbdev: Kbase device. + * + * Specific model hook is implemented by midgard_model_destroy() + */ void kbase_gpu_device_destroy(struct kbase_device *kbdev); -#endif /* _KBASE_MODEL_LINUX_H_ */ +/** + * midgard_model_create() - Private create function. + * + * @kbdev: Kbase device. + * + * This hook is specific to the model built in Kbase. + * + * Return: Model handle. + */ +void *midgard_model_create(struct kbase_device *kbdev); + +/** + * midgard_model_destroy() - Private destroy function. + * + * @h: Model handle. + * + * This hook is specific to the model built in Kbase. + */ +void midgard_model_destroy(void *h); + +/** + * midgard_model_write_reg() - Private model write function. + * + * @h: Model handle. + * @addr: Address at which to write. + * @value: value to write. + * + * This hook is specific to the model built in Kbase. + */ +void midgard_model_write_reg(void *h, u32 addr, u32 value); + +/** + * midgard_model_read_reg() - Private model read function. + * + * @h: Model handle. + * @addr: Address from which to read. + * @value: Pointer where to store the read value. + * + * This hook is specific to the model built in Kbase. + */ +void midgard_model_read_reg(void *h, u32 addr, u32 *const value); + +/** + * gpu_device_raise_irq() - Private IRQ raise function. + * + * @model: Model handle. + * @irq: IRQ type to raise. + * + * This hook is global to the model Linux framework. + */ +void gpu_device_raise_irq(void *model, u32 irq); + +/** + * gpu_device_set_data() - Private model set data function. + * + * @model: Model handle. + * @data: Data carried by model. + * + * This hook is global to the model Linux framework. + */ +void gpu_device_set_data(void *model, void *data); + +/** + * gpu_device_get_data() - Private model get data function. + * + * @model: Model handle. + * + * This hook is global to the model Linux framework. + * + * Return: Pointer to the data carried by model. + */ +void *gpu_device_get_data(void *model); +#endif /* !IS_ENABLED(CONFIG_MALI_REAL_HW) */ + +#endif /* _KBASE_MODEL_LINUX_H_ */ diff --git a/mali_kbase/backend/gpu/mali_kbase_pm_backend.c b/mali_kbase/backend/gpu/mali_kbase_pm_backend.c index 2d52eca..311ce90 100644 --- a/mali_kbase/backend/gpu/mali_kbase_pm_backend.c +++ b/mali_kbase/backend/gpu/mali_kbase_pm_backend.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2010-2022 ARM Limited. All rights reserved. 
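The header changes above formalise the split between the generic Model Linux Framework (register read/write plumbing, IRQ workers, device data) and the model backing it. The stand-in below is only a sketch of which hooks a backend is expected to provide and how it signals interrupts; struct toy_model and its trivial behaviour are invented for illustration, and only the hook prototypes and the MODEL_LINUX_*_IRQ values are taken from the hunks above (where the real set/get_data helpers live is not shown in this diff).

struct toy_model {
	void *data;	/* the kbase_device, see gpu_device_{set,get}_data() */
	u32 last_write;
};

void gpu_device_set_data(void *model, void *data)
{
	((struct toy_model *)model)->data = data;
}

void *gpu_device_get_data(void *model)
{
	return ((struct toy_model *)model)->data;
}

void *midgard_model_create(struct kbase_device *kbdev)
{
	struct toy_model *m = kzalloc(sizeof(*m), GFP_KERNEL);

	if (m)
		gpu_device_set_data(m, kbdev);
	return m;
}

void midgard_model_destroy(void *h)
{
	kfree(h);
}

void midgard_model_read_reg(void *h, u32 addr, u32 *const value)
{
	*value = ((struct toy_model *)h)->last_write;
}

void midgard_model_write_reg(void *h, u32 addr, u32 value)
{
	struct toy_model *m = h;

	m->last_write = value;
	/* pretend the write completed some work and raise the job IRQ */
	gpu_device_raise_irq(m, MODEL_LINUX_JOB_IRQ);
}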
+ * (C) COPYRIGHT 2010-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -36,7 +36,7 @@ #include <linux/pm_runtime.h> #include <mali_kbase_reset_gpu.h> #endif /* !MALI_USE_CSF */ -#include <mali_kbase_hwcnt_context.h> +#include <hwcnt/mali_kbase_hwcnt_context.h> #include <backend/gpu/mali_kbase_pm_internal.h> #include <backend/gpu/mali_kbase_devfreq.h> #include <mali_kbase_dummy_job_wa.h> @@ -72,10 +72,18 @@ int kbase_pm_runtime_init(struct kbase_device *kbdev) callbacks->power_runtime_idle_callback; kbdev->pm.backend.callback_soft_reset = callbacks->soft_reset_callback; + kbdev->pm.backend.callback_hardware_reset = + callbacks->hardware_reset_callback; kbdev->pm.backend.callback_power_runtime_gpu_idle = callbacks->power_runtime_gpu_idle_callback; kbdev->pm.backend.callback_power_runtime_gpu_active = callbacks->power_runtime_gpu_active_callback; +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + kbdev->pm.backend.callback_power_on_sc_rails = + callbacks->power_on_sc_rails_callback; + kbdev->pm.backend.callback_power_off_sc_rails = + callbacks->power_off_sc_rails_callback; +#endif if (callbacks->power_runtime_init_callback) return callbacks->power_runtime_init_callback(kbdev); @@ -93,8 +101,13 @@ int kbase_pm_runtime_init(struct kbase_device *kbdev) kbdev->pm.backend.callback_power_runtime_off = NULL; kbdev->pm.backend.callback_power_runtime_idle = NULL; kbdev->pm.backend.callback_soft_reset = NULL; + kbdev->pm.backend.callback_hardware_reset = NULL; kbdev->pm.backend.callback_power_runtime_gpu_idle = NULL; kbdev->pm.backend.callback_power_runtime_gpu_active = NULL; +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + kbdev->pm.backend.callback_power_on_sc_rails = NULL; + kbdev->pm.backend.callback_power_off_sc_rails = NULL; +#endif return 0; } @@ -140,7 +153,9 @@ int kbase_hwaccess_pm_init(struct kbase_device *kbdev) KBASE_DEBUG_ASSERT(kbdev != NULL); - mutex_init(&kbdev->pm.lock); + rt_mutex_init(&kbdev->pm.lock); + + kbase_pm_init_event_log(kbdev); kbdev->pm.backend.gpu_poweroff_wait_wq = alloc_workqueue("kbase_pm_poweroff_wait", WQ_HIGHPRI | WQ_UNBOUND, 1); @@ -154,6 +169,7 @@ int kbase_hwaccess_pm_init(struct kbase_device *kbdev) kbdev->pm.backend.gpu_powered = false; kbdev->pm.backend.gpu_ready = false; kbdev->pm.suspending = false; + kbdev->pm.resuming = false; #ifdef CONFIG_MALI_ARBITER_SUPPORT kbase_pm_set_gpu_lost(kbdev, false); #endif @@ -207,6 +223,10 @@ int kbase_hwaccess_pm_init(struct kbase_device *kbdev) !kbase_hw_has_issue(kbdev, BASE_HW_ISSUE_TURSEHW_1997) && kbdev->pm.backend.callback_power_runtime_gpu_active && kbdev->pm.backend.callback_power_runtime_gpu_idle; + + kbdev->pm.backend.apply_hw_issue_TITANHW_2938_wa = + kbase_hw_has_issue(kbdev, BASE_HW_ISSUE_TITANHW_2938) && + kbdev->pm.backend.gpu_sleep_supported; #endif if (IS_ENABLED(CONFIG_MALI_HW_ERRATA_1485982_NOT_AFFECTED)) { @@ -422,8 +442,7 @@ static void kbase_pm_l2_clock_slow(struct kbase_device *kbdev) return; /* Stop the metrics gathering framework */ - if (kbase_pm_metrics_is_active(kbdev)) - kbase_pm_metrics_stop(kbdev); + kbase_pm_metrics_stop(kbdev); /* Keep the current freq to restore it upon resume */ kbdev->previous_frequency = clk_get_rate(clk); @@ -576,11 +595,13 @@ static int kbase_pm_do_poweroff_sync(struct kbase_device *kbdev) { struct kbase_pm_backend_data *backend = &kbdev->pm.backend; unsigned long flags; - int ret = 0; + int ret; WARN_ON(kbdev->pm.active_count); - 
kbase_pm_wait_for_poweroff_work_complete(kbdev); + ret = kbase_pm_wait_for_poweroff_work_complete(kbdev); + if (ret) + return ret; kbase_pm_lock(kbdev); spin_lock_irqsave(&kbdev->hwaccess_lock, flags); @@ -665,25 +686,6 @@ unlock_hwaccess: spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); } -static bool is_poweroff_in_progress(struct kbase_device *kbdev) -{ - bool ret; - unsigned long flags; - - spin_lock_irqsave(&kbdev->hwaccess_lock, flags); - ret = (kbdev->pm.backend.poweroff_wait_in_progress == false); - spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); - - return ret; -} - -void kbase_pm_wait_for_poweroff_work_complete(struct kbase_device *kbdev) -{ - wait_event_killable(kbdev->pm.backend.poweroff_wait, - is_poweroff_in_progress(kbdev)); -} -KBASE_EXPORT_TEST_API(kbase_pm_wait_for_poweroff_work_complete); - /** * is_gpu_powered_down - Check whether GPU is powered down * @@ -807,9 +809,9 @@ void kbase_hwaccess_pm_halt(struct kbase_device *kbdev) #if MALI_USE_CSF && defined(KBASE_PM_RUNTIME) WARN_ON(kbase_pm_do_poweroff_sync(kbdev)); #else - mutex_lock(&kbdev->pm.lock); + rt_mutex_lock(&kbdev->pm.lock); kbase_pm_do_poweroff(kbdev); - mutex_unlock(&kbdev->pm.lock); + rt_mutex_unlock(&kbdev->pm.lock); kbase_pm_wait_for_poweroff_work_complete(kbdev); #endif @@ -865,7 +867,7 @@ void kbase_pm_power_changed(struct kbase_device *kbdev) kbase_pm_update_state(kbdev); #if !MALI_USE_CSF - kbase_backend_slot_update(kbdev); + kbase_backend_slot_update(kbdev); #endif /* !MALI_USE_CSF */ spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); @@ -937,7 +939,13 @@ int kbase_hwaccess_pm_suspend(struct kbase_device *kbdev) kbase_pm_unlock(kbdev); - kbase_pm_wait_for_poweroff_work_complete(kbdev); + ret = kbase_pm_wait_for_poweroff_work_complete(kbdev); + if (ret) { +#if !MALI_USE_CSF + kbase_backend_timer_resume(kbdev); +#endif /* !MALI_USE_CSF */ + return ret; + } #endif WARN_ON(kbdev->pm.backend.gpu_powered); @@ -953,6 +961,8 @@ void kbase_hwaccess_pm_resume(struct kbase_device *kbdev) { kbase_pm_lock(kbdev); + /* System resume callback has begun */ + kbdev->pm.resuming = true; kbdev->pm.suspending = false; #ifdef CONFIG_MALI_ARBITER_SUPPORT if (kbase_pm_is_gpu_lost(kbdev)) { @@ -967,7 +977,6 @@ void kbase_hwaccess_pm_resume(struct kbase_device *kbdev) kbase_backend_timer_resume(kbdev); #endif /* !MALI_USE_CSF */ - wake_up_all(&kbdev->pm.resume_wait); kbase_pm_unlock(kbdev); } @@ -975,13 +984,13 @@ void kbase_hwaccess_pm_resume(struct kbase_device *kbdev) void kbase_pm_handle_gpu_lost(struct kbase_device *kbdev) { unsigned long flags; - ktime_t end_timestamp = ktime_get(); + ktime_t end_timestamp = ktime_get_raw(); struct kbase_arbiter_vm_state *arb_vm_state = kbdev->pm.arb_vm_state; if (!kbdev->arb.arb_if) return; - mutex_lock(&kbdev->pm.lock); + rt_mutex_lock(&kbdev->pm.lock); mutex_lock(&arb_vm_state->vm_state_lock); if (kbdev->pm.backend.gpu_powered && !kbase_pm_is_gpu_lost(kbdev)) { @@ -1021,7 +1030,7 @@ void kbase_pm_handle_gpu_lost(struct kbase_device *kbdev) spin_unlock_irqrestore(&kbdev->hwcnt.lock, flags); } mutex_unlock(&arb_vm_state->vm_state_lock); - mutex_unlock(&kbdev->pm.lock); + rt_mutex_unlock(&kbdev->pm.lock); } #endif /* CONFIG_MALI_ARBITER_SUPPORT */ @@ -1050,6 +1059,7 @@ static int pm_handle_mcu_sleep_on_runtime_suspend(struct kbase_device *kbdev) lockdep_assert_held(&kbdev->csf.scheduler.lock); lockdep_assert_held(&kbdev->pm.lock); +#ifdef CONFIG_MALI_DEBUG /* In case of no active CSG on slot, powering up L2 could be skipped and * proceed directly to suspend GPU. 
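A knock-on effect of the power-off rework in the hunks above is that kbase_pm_wait_for_poweroff_work_complete() now reports failure instead of waiting indefinitely, so callers have to propagate the error (the suspend path above also re-arms the JM backend timer before bailing out). A hedged sketch of the calling pattern only; do_suspend_prep() is a hypothetical caller, not a function in this tree.

static int do_suspend_prep(struct kbase_device *kbdev)
{
	int ret = kbase_pm_wait_for_poweroff_work_complete(kbdev);

	if (ret) {
		/* Power-off work did not complete in time; undo any
		 * suspend preparation and report the error upwards.
		 */
		return ret;
	}

	WARN_ON(kbdev->pm.backend.gpu_powered);
	return 0;
}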
* ToDo: firmware has to be reloaded after wake-up as no halt command @@ -1059,6 +1069,7 @@ static int pm_handle_mcu_sleep_on_runtime_suspend(struct kbase_device *kbdev) dev_info( kbdev->dev, "No active CSGs. Can skip the power up of L2 and go for suspension directly"); +#endif ret = kbase_pm_force_mcu_wakeup_after_sleep(kbdev); if (ret) { diff --git a/mali_kbase/backend/gpu/mali_kbase_pm_ca.c b/mali_kbase/backend/gpu/mali_kbase_pm_ca.c index 7d14be9..b02f77f 100644 --- a/mali_kbase/backend/gpu/mali_kbase_pm_ca.c +++ b/mali_kbase/backend/gpu/mali_kbase_pm_ca.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2013-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2013-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -26,9 +26,7 @@ #include <mali_kbase.h> #include <mali_kbase_pm.h> #include <backend/gpu/mali_kbase_pm_internal.h> -#if IS_ENABLED(CONFIG_MALI_NO_MALI) -#include <backend/gpu/mali_kbase_model_dummy.h> -#endif /* CONFIG_MALI_NO_MALI */ +#include <backend/gpu/mali_kbase_model_linux.h> #include <mali_kbase_dummy_job_wa.h> int kbase_pm_ca_init(struct kbase_device *kbdev) @@ -92,29 +90,10 @@ void kbase_devfreq_set_core_mask(struct kbase_device *kbdev, u64 core_mask) * for those cores to get powered down */ if ((core_mask & old_core_mask) != old_core_mask) { - bool can_wait; - - spin_lock_irqsave(&kbdev->hwaccess_lock, flags); - can_wait = kbdev->pm.backend.gpu_ready && kbase_pm_is_mcu_desired(kbdev); - spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); - - /* This check is ideally not required, the wait function can - * deal with the GPU power down. But it has been added to - * address the scenario where down-scaling request comes from - * the platform specific code soon after the GPU power down - * and at the time same time application thread tries to - * power up the GPU (on the flush of GPU queue). - * The platform specific @ref callback_power_on that gets - * invoked on power up does not return until down-scaling - * request is complete. The check mitigates the race caused by - * the problem in platform specific code. - */ - if (likely(can_wait)) { - if (kbase_pm_wait_for_desired_state(kbdev)) { - dev_warn(kbdev->dev, - "Wait for update of core_mask from %llx to %llx failed", - old_core_mask, core_mask); - } + if (kbase_pm_wait_for_cores_down_scale(kbdev)) { + dev_warn(kbdev->dev, + "Wait for update of core_mask from %llx to %llx failed", + old_core_mask, core_mask); } } #endif diff --git a/mali_kbase/backend/gpu/mali_kbase_pm_defs.h b/mali_kbase/backend/gpu/mali_kbase_pm_defs.h index 80da093..ad49019 100644 --- a/mali_kbase/backend/gpu/mali_kbase_pm_defs.h +++ b/mali_kbase/backend/gpu/mali_kbase_pm_defs.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2014-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2014-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -136,7 +136,7 @@ struct kbasep_pm_metrics { * or removed from a GPU slot. * @active_cl_ctx: number of CL jobs active on the GPU. Array is per-device. * @active_gl_ctx: number of GL jobs active on the GPU. Array is per-slot. 
- * @lock: spinlock protecting the kbasep_pm_metrics_data structure + * @lock: spinlock protecting the kbasep_pm_metrics_state structure * @platform_data: pointer to data controlled by platform specific code * @kbdev: pointer to kbase device for which metrics are collected * @values: The current values of the power management metrics. The @@ -145,7 +145,7 @@ struct kbasep_pm_metrics { * @initialized: tracks whether metrics_state has been initialized or not. * @timer: timer to regularly make DVFS decisions based on the power * management metrics. - * @timer_active: boolean indicating @timer is running + * @timer_state: atomic indicating current @timer state, on, off, or stopped. * @dvfs_last: values of the PM metrics from the last DVFS tick * @dvfs_diff: different between the current and previous PM metrics. */ @@ -169,7 +169,7 @@ struct kbasep_pm_metrics_state { #ifdef CONFIG_MALI_MIDGARD_DVFS bool initialized; struct hrtimer timer; - bool timer_active; + atomic_t timer_state; struct kbasep_pm_metrics dvfs_last; struct kbasep_pm_metrics dvfs_diff; #endif @@ -215,6 +215,60 @@ union kbase_pm_policy_data { }; /** + * enum kbase_pm_log_event_type - The types of core in a GPU. + * + * @KBASE_PM_LOG_EVENT_NONE: an unused log event, default state at + * initialization. Carries no data. + * @KBASE_PM_LOG_EVENT_SHADERS_STATE: a transition of the JM shader state + * machine. .state is populated. + * @KBASE_PM_LOG_EVENT_L2_STATE: a transition of the L2 state machine. + * .state is populated. + * @KBASE_PM_LOG_EVENT_MCU_STATE: a transition of the MCU state machine. + * .state is populated. + * @KBASE_PM_LOG_EVENT_CORES: a transition of core availability. + * .cores is populated. + * + * Each event log event has a type which determines the data it carries. + */ +enum kbase_pm_log_event_type { + KBASE_PM_LOG_EVENT_NONE = 0, + KBASE_PM_LOG_EVENT_SHADERS_STATE, + KBASE_PM_LOG_EVENT_L2_STATE, + KBASE_PM_LOG_EVENT_MCU_STATE, + KBASE_PM_LOG_EVENT_CORES +}; + +/** + * struct kbase_pm_event_log_event - One event in the PM log. + * + * @type: The type of the event, from &enum kbase_pm_log_event_type. + * @timestamp: The time the log event was generated. + **/ +struct kbase_pm_event_log_event { + u8 type; + ktime_t timestamp; + union { + struct { + u8 next; + u8 prev; + } state; + struct { + u64 l2; + u64 shader; + u64 tiler; + u64 stack; + } cores; + }; +}; + +#define EVENT_LOG_MAX (PAGE_SIZE / sizeof(struct kbase_pm_event_log_event)) + +struct kbase_pm_event_log { + u32 last_event; + struct kbase_pm_event_log_event events[EVENT_LOG_MAX]; +}; + +/** * struct kbase_pm_backend_data - Data stored per device for power management. * * @pm_current_policy: The policy that is currently actively controlling the @@ -279,6 +333,8 @@ union kbase_pm_policy_data { * &struct kbase_pm_callback_conf * @callback_soft_reset: Optional callback to software reset the GPU. See * &struct kbase_pm_callback_conf + * @callback_hardware_reset: Optional callback to hardware reset the GPU. See + * &struct kbase_pm_callback_conf * @callback_power_runtime_gpu_idle: Callback invoked by Kbase when GPU has * become idle. * See &struct kbase_pm_callback_conf. @@ -286,7 +342,13 @@ union kbase_pm_policy_data { * @callback_power_runtime_gpu_idle was * called previously. * See &struct kbase_pm_callback_conf. + * @callback_power_on_sc_rails: Callback invoked to turn on the shader core + * power rails. See &struct kbase_pm_callback_conf. + * @callback_power_off_sc_rails: Callback invoked to turn off the shader core + * power rails. 
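The event log introduced above is a fixed-size ring holding one page worth of entries, with @last_event tracking the most recently written slot; the state machines later in this diff obtain a slot via kbase_pm_add_log_event() and fill in the union. That helper comes from mali_kbase_pm_event_log.h, which is not part of this hunk, so the following is only a plausible sketch under the assumption that last_event wraps modulo EVENT_LOG_MAX; the _sketch suffix marks it as illustrative rather than the in-tree implementation.

static struct kbase_pm_event_log_event *
kbase_pm_add_log_event_sketch(struct kbase_device *kbdev)
{
	struct kbase_pm_event_log *log = &kbdev->pm.backend.event_log;
	struct kbase_pm_event_log_event *event;

	/* Call sites in this diff appear to run under hwaccess_lock, so no
	 * extra locking is added here (an assumption, not shown in the hunk).
	 */
	log->last_event = (log->last_event + 1) % EVENT_LOG_MAX;
	event = &log->events[log->last_event];

	memset(event, 0, sizeof(*event));
	event->timestamp = ktime_get();
	return event;
}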
See &struct kbase_pm_callback_conf. * @ca_cores_enabled: Cores that are currently available + * @apply_hw_issue_TITANHW_2938_wa: Indicates if the workaround for BASE_HW_ISSUE_TITANHW_2938 + * needs to be applied when unmapping memory from GPU. * @mcu_state: The current state of the micro-control unit, only applicable * to GPUs that have such a component * @l2_state: The current state of the L2 cache state machine. See @@ -391,6 +453,7 @@ union kbase_pm_policy_data { * work function, kbase_pm_gpu_clock_control_worker. * @gpu_clock_control_work: work item to set GPU clock during L2 power cycle * using gpu_clock_control + * @event_log: data for the always-on event log * * This structure contains data for the power management framework. There is one * instance of this structure per device in the system. @@ -444,12 +507,18 @@ struct kbase_pm_backend_data { void (*callback_power_runtime_off)(struct kbase_device *kbdev); int (*callback_power_runtime_idle)(struct kbase_device *kbdev); int (*callback_soft_reset)(struct kbase_device *kbdev); + void (*callback_hardware_reset)(struct kbase_device *kbdev); void (*callback_power_runtime_gpu_idle)(struct kbase_device *kbdev); void (*callback_power_runtime_gpu_active)(struct kbase_device *kbdev); +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + void (*callback_power_on_sc_rails)(struct kbase_device *kbdev); + void (*callback_power_off_sc_rails)(struct kbase_device *kbdev); +#endif u64 ca_cores_enabled; #if MALI_USE_CSF + bool apply_hw_issue_TITANHW_2938_wa; enum kbase_mcu_state mcu_state; #endif enum kbase_l2_core_state l2_state; @@ -463,6 +532,11 @@ struct kbase_pm_backend_data { struct mutex policy_change_lock; struct workqueue_struct *core_idle_wq; struct work_struct core_idle_work; +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + struct work_struct sc_rails_on_work; + bool sc_power_rails_off; + bool sc_pwroff_safe; +#endif #ifdef KBASE_PM_RUNTIME bool gpu_sleep_supported; @@ -496,10 +570,12 @@ struct kbase_pm_backend_data { bool gpu_clock_slow_down_desired; bool gpu_clock_slowed_down; struct work_struct gpu_clock_control_work; + + struct kbase_pm_event_log event_log; }; #if MALI_USE_CSF -/* CSF PM flag, signaling that the MCU CORE should be kept on */ +/* CSF PM flag, signaling that the MCU shader Core should be kept on */ #define CSF_DYNAMIC_PM_CORE_KEEP_ON (1 << 0) /* CSF PM flag, signaling no scheduler suspension on idle groups */ #define CSF_DYNAMIC_PM_SCHED_IGNORE_IDLE (1 << 1) diff --git a/mali_kbase/backend/gpu/mali_kbase_pm_driver.c b/mali_kbase/backend/gpu/mali_kbase_pm_driver.c index 240c31a..7c891c1 100644 --- a/mali_kbase/backend/gpu/mali_kbase_pm_driver.c +++ b/mali_kbase/backend/gpu/mali_kbase_pm_driver.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2010-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2010-2023 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -39,20 +39,18 @@ #include <mali_kbase_reset_gpu.h> #include <mali_kbase_ctx_sched.h> -#include <mali_kbase_hwcnt_context.h> +#include <hwcnt/mali_kbase_hwcnt_context.h> #include <mali_kbase_pbha.h> #include <backend/gpu/mali_kbase_cache_policy_backend.h> #include <device/mali_kbase_device.h> #include <backend/gpu/mali_kbase_irq_internal.h> #include <backend/gpu/mali_kbase_pm_internal.h> #include <backend/gpu/mali_kbase_l2_mmu_config.h> +#include <backend/gpu/mali_kbase_pm_event_log.h> #include <mali_kbase_dummy_job_wa.h> #ifdef CONFIG_MALI_ARBITER_SUPPORT #include <arbiter/mali_kbase_arbiter_pm.h> #endif /* CONFIG_MALI_ARBITER_SUPPORT */ -#if MALI_USE_CSF -#include <csf/ipa_control/mali_kbase_csf_ipa_control.h> -#endif #if MALI_USE_CSF #include <linux/delay.h> @@ -148,9 +146,9 @@ bool kbase_pm_is_l2_desired(struct kbase_device *kbdev) if (unlikely(kbdev->pm.backend.policy_change_clamp_state_to_off)) return false; - /* Power up the L2 cache only when MCU is desired */ - if (likely(kbdev->csf.firmware_inited)) - return kbase_pm_is_mcu_desired(kbdev); + /* We need to power up the L2 when the MCU is desired */ + if (kbase_pm_is_mcu_desired(kbdev)) + return true; #endif return kbdev->pm.backend.l2_desired; @@ -538,6 +536,14 @@ static void kbase_pm_l2_config_override(struct kbase_device *kbdev) if (!kbase_hw_has_feature(kbdev, BASE_HW_FEATURE_L2_CONFIG)) return; +#if MALI_USE_CSF + if (kbase_hw_has_feature(kbdev, BASE_HW_FEATURE_PBHA_HWU)) { + val = kbase_reg_read(kbdev, GPU_CONTROL_REG(L2_CONFIG)); + kbase_reg_write(kbdev, GPU_CONTROL_REG(L2_CONFIG), + L2_CONFIG_PBHA_HWU_SET(val, kbdev->pbha_propagate_bits)); + } +#endif /* MALI_USE_CSF */ + /* * Skip if size and hash are not given explicitly, * which means default values are used. @@ -599,6 +605,21 @@ static const char *kbase_mcu_state_to_string(enum kbase_mcu_state state) return strings[state]; } +static +void kbase_ktrace_log_mcu_state(struct kbase_device *kbdev, enum kbase_mcu_state state) +{ +#if KBASE_KTRACE_ENABLE + switch (state) { +#define KBASEP_MCU_STATE(n) \ + case KBASE_MCU_ ## n: \ + KBASE_KTRACE_ADD(kbdev, PM_MCU_ ## n, NULL, state); \ + break; +#include "mali_kbase_pm_mcu_states.h" +#undef KBASEP_MCU_STATE + } +#endif +} + static inline bool kbase_pm_handle_mcu_core_attr_update(struct kbase_device *kbdev) { struct kbase_pm_backend_data *backend = &kbdev->pm.backend; @@ -655,8 +676,39 @@ static void kbase_pm_enable_mcu_db_notification(struct kbase_device *kbdev) val &= ~MCU_CNTRL_DOORBELL_DISABLE_MASK; kbase_reg_write(kbdev, GPU_CONTROL_REG(MCU_CONTROL), val); } -#endif +/** + * wait_mcu_as_inactive - Wait for AS used by MCU FW to get configured + * + * @kbdev: Pointer to the device. + * + * This function is called to wait for the AS used by MCU FW to get configured + * before DB notification on MCU is enabled, as a workaround for HW issue. 
+ */ +static void wait_mcu_as_inactive(struct kbase_device *kbdev) +{ + unsigned int max_loops = KBASE_AS_INACTIVE_MAX_LOOPS; + + lockdep_assert_held(&kbdev->hwaccess_lock); + + if (!kbase_hw_has_issue(kbdev, BASE_HW_ISSUE_TURSEHW_2716)) + return; + + /* Wait for the AS_ACTIVE_INT bit to become 0 for the AS used by MCU FW */ + while (--max_loops && + kbase_reg_read(kbdev, MMU_STAGE1_REG(MMU_AS_REG(MCU_AS_NR, AS_STATUS))) & + AS_STATUS_AS_ACTIVE_INT) + ; + + if (!WARN_ON_ONCE(max_loops == 0)) + return; + + dev_err(kbdev->dev, "AS_ACTIVE_INT bit stuck for AS %d used by MCU FW", MCU_AS_NR); + + if (kbase_prepare_to_reset_gpu(kbdev, 0)) + kbase_reset_gpu(kbdev); +} +#endif /** * kbasep_pm_toggle_power_interrupt - Toggles the IRQ mask for power interrupts @@ -665,10 +717,10 @@ static void kbase_pm_enable_mcu_db_notification(struct kbase_device *kbdev) * @kbdev: Pointer to the device * @enable: boolean indicating to enable interrupts or not * - * The POWER_CHANGED_ALL and POWER_CHANGED_SINGLE interrupts can be disabled - * after L2 has been turned on when FW is controlling the power for the shader - * cores. Correspondingly, the interrupts can be re-enabled after the MCU has - * been disabled before the power down of L2. + * The POWER_CHANGED_ALL interrupt can be disabled after L2 has been turned on + * when FW is controlling the power for the shader cores. Correspondingly, the + * interrupts can be re-enabled after the MCU has been disabled before the + * power down of L2. */ static void kbasep_pm_toggle_power_interrupt(struct kbase_device *kbdev, bool enable) { @@ -678,10 +730,16 @@ static void kbasep_pm_toggle_power_interrupt(struct kbase_device *kbdev, bool en irq_mask = kbase_reg_read(kbdev, GPU_CONTROL_REG(GPU_IRQ_MASK)); - if (enable) - irq_mask |= POWER_CHANGED_ALL | POWER_CHANGED_SINGLE; - else - irq_mask &= ~(POWER_CHANGED_ALL | POWER_CHANGED_SINGLE); +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + /* For IFPO, we require the POWER_CHANGED_ALL interrupt to be always on */ + enable = true; +#endif + if (enable) { + irq_mask |= POWER_CHANGED_ALL; + kbase_reg_write(kbdev, GPU_CONTROL_REG(GPU_IRQ_CLEAR), POWER_CHANGED_ALL); + } else { + irq_mask &= ~POWER_CHANGED_ALL; + } kbase_reg_write(kbdev, GPU_CONTROL_REG(GPU_IRQ_MASK), irq_mask); } @@ -742,12 +800,31 @@ static int kbase_pm_mcu_update_state(struct kbase_device *kbdev) backend->shaders_desired_mask; backend->pm_shaders_core_mask = 0; if (kbdev->csf.firmware_hctl_core_pwr) { +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + /* On rail up, this state machine will be re-invoked */ + if (backend->sc_power_rails_off) { + /* The work should already be queued or executing */ + WARN_ON(!work_busy(&backend->sc_rails_on_work)); + break; + } +#endif kbase_pm_invoke(kbdev, KBASE_PM_CORE_SHADER, backend->shaders_avail, ACTION_PWRON); backend->mcu_state = KBASE_MCU_HCTL_SHADERS_PEND_ON; } else backend->mcu_state = KBASE_MCU_ON_HWCNT_ENABLE; +#if IS_ENABLED(CONFIG_MALI_CORESIGHT) + if (kbase_debug_coresight_csf_state_check( + kbdev, KBASE_DEBUG_CORESIGHT_CSF_DISABLED)) { + kbase_debug_coresight_csf_state_request( + kbdev, KBASE_DEBUG_CORESIGHT_CSF_ENABLED); + backend->mcu_state = KBASE_MCU_CORESIGHT_ENABLE; + } else if (kbase_debug_coresight_csf_state_check( + kbdev, KBASE_DEBUG_CORESIGHT_CSF_ENABLED)) { + backend->mcu_state = KBASE_MCU_CORESIGHT_ENABLE; + } +#endif /* IS_ENABLED(CONFIG_MALI_CORESIGHT) */ } break; @@ -776,8 +853,7 @@ static int kbase_pm_mcu_update_state(struct kbase_device *kbdev) unsigned long flags; kbase_csf_scheduler_spin_lock(kbdev, 
&flags); - kbase_hwcnt_context_enable( - kbdev->hwcnt_gpu_ctx); + kbase_hwcnt_context_enable(kbdev->hwcnt_gpu_ctx); kbase_csf_scheduler_spin_unlock(kbdev, flags); backend->hwcnt_disabled = false; } @@ -798,9 +874,19 @@ static int kbase_pm_mcu_update_state(struct kbase_device *kbdev) backend->mcu_state = KBASE_MCU_HCTL_MCU_ON_RECHECK; } - } else if (kbase_pm_handle_mcu_core_attr_update(kbdev)) { + } else if (kbase_pm_handle_mcu_core_attr_update(kbdev)) backend->mcu_state = KBASE_MCU_ON_CORE_ATTR_UPDATE_PEND; +#if IS_ENABLED(CONFIG_MALI_CORESIGHT) + else if (kbdev->csf.coresight.disable_on_pmode_enter) { + kbase_debug_coresight_csf_state_request( + kbdev, KBASE_DEBUG_CORESIGHT_CSF_DISABLED); + backend->mcu_state = KBASE_MCU_ON_PMODE_ENTER_CORESIGHT_DISABLE; + } else if (kbdev->csf.coresight.enable_on_pmode_exit) { + kbase_debug_coresight_csf_state_request( + kbdev, KBASE_DEBUG_CORESIGHT_CSF_ENABLED); + backend->mcu_state = KBASE_MCU_ON_PMODE_EXIT_CORESIGHT_ENABLE; } +#endif break; case KBASE_MCU_HCTL_MCU_ON_RECHECK: @@ -891,12 +977,46 @@ static int kbase_pm_mcu_update_state(struct kbase_device *kbdev) #ifdef KBASE_PM_RUNTIME if (backend->gpu_sleep_mode_active) backend->mcu_state = KBASE_MCU_ON_SLEEP_INITIATE; - else + else { #endif backend->mcu_state = KBASE_MCU_ON_HALT; +#if IS_ENABLED(CONFIG_MALI_CORESIGHT) + kbase_debug_coresight_csf_state_request( + kbdev, KBASE_DEBUG_CORESIGHT_CSF_DISABLED); + backend->mcu_state = KBASE_MCU_CORESIGHT_DISABLE; +#endif /* IS_ENABLED(CONFIG_MALI_CORESIGHT) */ + } } break; +#if IS_ENABLED(CONFIG_MALI_CORESIGHT) + case KBASE_MCU_ON_PMODE_ENTER_CORESIGHT_DISABLE: + if (kbase_debug_coresight_csf_state_check( + kbdev, KBASE_DEBUG_CORESIGHT_CSF_DISABLED)) { + backend->mcu_state = KBASE_MCU_ON; + kbdev->csf.coresight.disable_on_pmode_enter = false; + } + break; + case KBASE_MCU_ON_PMODE_EXIT_CORESIGHT_ENABLE: + if (kbase_debug_coresight_csf_state_check( + kbdev, KBASE_DEBUG_CORESIGHT_CSF_ENABLED)) { + backend->mcu_state = KBASE_MCU_ON; + kbdev->csf.coresight.enable_on_pmode_exit = false; + } + break; + case KBASE_MCU_CORESIGHT_DISABLE: + if (kbase_debug_coresight_csf_state_check( + kbdev, KBASE_DEBUG_CORESIGHT_CSF_DISABLED)) + backend->mcu_state = KBASE_MCU_ON_HALT; + break; + + case KBASE_MCU_CORESIGHT_ENABLE: + if (kbase_debug_coresight_csf_state_check( + kbdev, KBASE_DEBUG_CORESIGHT_CSF_ENABLED)) + backend->mcu_state = KBASE_MCU_ON_HWCNT_ENABLE; + break; +#endif /* IS_ENABLED(CONFIG_MALI_CORESIGHT) */ + case KBASE_MCU_ON_HALT: if (!kbase_pm_is_mcu_desired(kbdev)) { kbase_csf_firmware_trigger_mcu_halt(kbdev); @@ -907,7 +1027,7 @@ static int kbase_pm_mcu_update_state(struct kbase_device *kbdev) case KBASE_MCU_ON_PEND_HALT: if (kbase_csf_firmware_mcu_halted(kbdev)) { - KBASE_KTRACE_ADD(kbdev, MCU_HALTED, NULL, + KBASE_KTRACE_ADD(kbdev, CSF_FIRMWARE_MCU_HALTED, NULL, kbase_csf_ktrace_gpu_cycle_cnt(kbdev)); if (kbdev->csf.firmware_hctl_core_pwr) backend->mcu_state = @@ -954,7 +1074,7 @@ static int kbase_pm_mcu_update_state(struct kbase_device *kbdev) case KBASE_MCU_ON_PEND_SLEEP: if (kbase_csf_firmware_is_mcu_in_sleep(kbdev)) { - KBASE_KTRACE_ADD(kbdev, MCU_IN_SLEEP, NULL, + KBASE_KTRACE_ADD(kbdev, CSF_FIRMWARE_MCU_SLEEP, NULL, kbase_csf_ktrace_gpu_cycle_cnt(kbdev)); backend->mcu_state = KBASE_MCU_IN_SLEEP; kbase_pm_enable_db_mirror_interrupt(kbdev); @@ -970,6 +1090,7 @@ static int kbase_pm_mcu_update_state(struct kbase_device *kbdev) case KBASE_MCU_IN_SLEEP: if (kbase_pm_is_mcu_desired(kbdev) && backend->l2_state == KBASE_L2_ON) { + wait_mcu_as_inactive(kbdev); 
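The error path of wait_mcu_as_inactive() shown a little earlier is easy to misread: WARN_ON_ONCE() evaluates to its condition, so the early return fires when the loop budget was not exhausted, and the fall-through (error log plus GPU reset) runs only when AS_ACTIVE_INT never cleared. Below is an equivalent, more explicit shape of the same bounded-poll-then-reset pattern; the identifiers are reused from the hunk above purely to illustrate the control flow, this is not a proposed change to the driver.

	while (--max_loops &&
	       (kbase_reg_read(kbdev, MMU_STAGE1_REG(MMU_AS_REG(MCU_AS_NR, AS_STATUS))) &
		AS_STATUS_AS_ACTIVE_INT))
		;

	if (max_loops != 0)
		return;	/* bit cleared in time, nothing to do */

	WARN_ON_ONCE(1);
	dev_err(kbdev->dev, "AS_ACTIVE_INT bit stuck for AS %d used by MCU FW", MCU_AS_NR);
	if (kbase_prepare_to_reset_gpu(kbdev, 0))
		kbase_reset_gpu(kbdev);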
KBASE_TLSTREAM_TL_KBASE_CSFFW_FW_REQUEST_WAKEUP( kbdev, kbase_backend_get_cycle_cnt(kbdev)); kbase_pm_enable_mcu_db_notification(kbdev); @@ -980,6 +1101,7 @@ static int kbase_pm_mcu_update_state(struct kbase_device *kbdev) if (!kbdev->csf.firmware_hctl_core_pwr) kbasep_pm_toggle_power_interrupt(kbdev, false); backend->mcu_state = KBASE_MCU_ON_HWCNT_ENABLE; + kbase_csf_ring_doorbell(kbdev, CSF_KERNEL_DOORBELL_NR); } break; #endif @@ -987,6 +1109,11 @@ static int kbase_pm_mcu_update_state(struct kbase_device *kbdev) /* Reset complete */ if (!backend->in_reset) backend->mcu_state = KBASE_MCU_OFF; + +#if IS_ENABLED(CONFIG_MALI_CORESIGHT) + kbdev->csf.coresight.disable_on_pmode_enter = false; + kbdev->csf.coresight.enable_on_pmode_exit = false; +#endif /* IS_ENABLED(CONFIG_MALI_CORESIGHT) */ break; default: @@ -994,10 +1121,18 @@ static int kbase_pm_mcu_update_state(struct kbase_device *kbdev) backend->mcu_state); } - if (backend->mcu_state != prev_state) + if (backend->mcu_state != prev_state) { + struct kbase_pm_event_log_event *event = + kbase_pm_add_log_event(kbdev); + event->type = KBASE_PM_LOG_EVENT_MCU_STATE; + event->state.prev = prev_state; + event->state.next = backend->mcu_state; + dev_dbg(kbdev->dev, "MCU state transition: %s to %s\n", kbase_mcu_state_to_string(prev_state), kbase_mcu_state_to_string(backend->mcu_state)); + kbase_ktrace_log_mcu_state(kbdev, backend->mcu_state); + } } while (backend->mcu_state != prev_state); @@ -1032,6 +1167,31 @@ static void core_idle_worker(struct work_struct *work) } #endif +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS +static void sc_rails_on_worker(struct work_struct *work) +{ + struct kbase_device *kbdev = + container_of(work, struct kbase_device, pm.backend.sc_rails_on_work); + unsigned long flags; + + /* + * Intentionally not synchronized using the scheduler.lock, as the scheduler may be waiting + * on the SC rail to power up + */ + kbase_pm_lock(kbdev); + + kbase_pm_turn_on_sc_power_rails_locked(kbdev); + + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); + /* Push the state machine forward in case it was waiting on SC rail power up */ + kbase_pm_update_state(kbdev); + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + + kbase_pm_unlock(kbdev); +} +#endif /* CONFIG_MALI_HOST_CONTROLS_SC_RAILS */ + + static const char *kbase_l2_core_state_to_string(enum kbase_l2_core_state state) { const char *const strings[] = { @@ -1045,6 +1205,21 @@ static const char *kbase_l2_core_state_to_string(enum kbase_l2_core_state state) return strings[state]; } +static +void kbase_ktrace_log_l2_core_state(struct kbase_device *kbdev, enum kbase_l2_core_state state) +{ +#if KBASE_KTRACE_ENABLE + switch (state) { +#define KBASEP_L2_STATE(n) \ + case KBASE_L2_ ## n: \ + KBASE_KTRACE_ADD(kbdev, PM_L2_ ## n, NULL, state); \ + break; +#include "mali_kbase_pm_l2_states.h" +#undef KBASEP_L2_STATE + } +#endif +} + #if !MALI_USE_CSF /* On powering on the L2, the tracked kctx becomes stale and can be cleared. * This enables the backend to spare the START_FLUSH.INV_SHADER_OTHER @@ -1062,13 +1237,82 @@ static void kbase_pm_l2_clear_backend_slot_submit_kctx(struct kbase_device *kbde } #endif +/* wait_as_active_int - Wait for AS_ACTIVE_INT bits to become 0 for all AS + * + * @kbdev: Pointer to the device. + * + * This function is supposed to be called before the write to L2_PWROFF register + * to wait for AS_ACTIVE_INT bit to become 0 for all the GPU address space slots. 
+ * AS_ACTIVE_INT bit can become 1 for an AS, only when L2_READY becomes 1, based + * on the value in TRANSCFG register and would become 0 once AS has been reconfigured. + */ +static void wait_as_active_int(struct kbase_device *kbdev) +{ +#if MALI_USE_CSF && !IS_ENABLED(CONFIG_MALI_NO_MALI) + int as_no; + + lockdep_assert_held(&kbdev->hwaccess_lock); + + if (!kbase_hw_has_issue(kbdev, BASE_HW_ISSUE_GPU2019_3878)) + return; + + for (as_no = 0; as_no != kbdev->nr_hw_address_spaces; as_no++) { + unsigned int max_loops = KBASE_AS_INACTIVE_MAX_LOOPS; + + /* Wait for the AS_ACTIVE_INT bit to become 0 for the AS. + * The wait is actually needed only for the enabled AS. + */ + while (--max_loops && + kbase_reg_read(kbdev, MMU_AS_REG(as_no, AS_STATUS)) & + AS_STATUS_AS_ACTIVE_INT) + ; + +#ifdef CONFIG_MALI_DEBUG + /* For a disabled AS the loop should run for a single iteration only. */ + if (!kbdev->as_to_kctx[as_no] && (max_loops != (KBASE_AS_INACTIVE_MAX_LOOPS -1))) + dev_warn(kbdev->dev, "AS_ACTIVE_INT bit found to be set for disabled AS %d", as_no); +#endif + + if (max_loops) + continue; + + dev_warn(kbdev->dev, "AS_ACTIVE_INT bit stuck for AS %d", as_no); + + if (kbase_prepare_to_reset_gpu(kbdev, 0)) + kbase_reset_gpu(kbdev); + return; + } +#endif +} + static bool can_power_down_l2(struct kbase_device *kbdev) { #if MALI_USE_CSF /* Due to the HW issue GPU2019-3878, need to prevent L2 power off * whilst MMU command is in progress. + * Also defer the power-down if MMU is in process of page migration. */ - return !kbdev->mmu_hw_operation_in_progress; + return !kbdev->mmu_hw_operation_in_progress && !kbdev->mmu_page_migrate_in_progress; +#else + return !kbdev->mmu_page_migrate_in_progress; +#endif +} + +static bool can_power_up_l2(struct kbase_device *kbdev) +{ + lockdep_assert_held(&kbdev->hwaccess_lock); + + /* Avoiding l2 transition if MMU is undergoing page migration */ + return !kbdev->mmu_page_migrate_in_progress; +} + +static bool need_tiler_control(struct kbase_device *kbdev) +{ +#if MALI_USE_CSF + if (kbase_pm_no_mcu_core_pwroff(kbdev)) + return true; + else + return false; #else return true; #endif @@ -1078,9 +1322,8 @@ static int kbase_pm_l2_update_state(struct kbase_device *kbdev) { struct kbase_pm_backend_data *backend = &kbdev->pm.backend; u64 l2_present = kbdev->gpu_props.curr_config.l2_present; -#if !MALI_USE_CSF u64 tiler_present = kbdev->gpu_props.props.raw_props.tiler_present; -#endif + bool l2_power_up_done; enum kbase_l2_core_state prev_state; lockdep_assert_held(&kbdev->hwaccess_lock); @@ -1092,23 +1335,12 @@ static int kbase_pm_l2_update_state(struct kbase_device *kbdev) u64 l2_ready = kbase_pm_get_ready_cores(kbdev, KBASE_PM_CORE_L2); -#if !MALI_USE_CSF - u64 tiler_trans = kbase_pm_get_trans_cores(kbdev, - KBASE_PM_CORE_TILER); - u64 tiler_ready = kbase_pm_get_ready_cores(kbdev, - KBASE_PM_CORE_TILER); -#endif - +#ifdef CONFIG_MALI_ARBITER_SUPPORT /* * kbase_pm_get_ready_cores and kbase_pm_get_trans_cores * are vulnerable to corruption if gpu is lost */ - if (kbase_is_gpu_removed(kbdev) -#ifdef CONFIG_MALI_ARBITER_SUPPORT - || kbase_pm_is_gpu_lost(kbdev)) { -#else - ) { -#endif + if (kbase_is_gpu_removed(kbdev) || kbase_pm_is_gpu_lost(kbdev)) { backend->shaders_state = KBASE_SHADERS_OFF_CORESTACK_OFF; backend->hwcnt_desired = false; @@ -1122,41 +1354,59 @@ static int kbase_pm_l2_update_state(struct kbase_device *kbdev) */ backend->l2_state = KBASE_L2_ON_HWCNT_DISABLE; + KBASE_KTRACE_ADD(kbdev, PM_L2_ON_HWCNT_DISABLE, NULL, + backend->l2_state); 
kbase_pm_trigger_hwcnt_disable(kbdev); } if (backend->hwcnt_disabled) { backend->l2_state = KBASE_L2_OFF; + KBASE_KTRACE_ADD(kbdev, PM_L2_OFF, NULL, backend->l2_state); dev_dbg(kbdev->dev, "GPU lost has occurred - L2 off\n"); } break; } +#endif /* mask off ready from trans in case transitions finished * between the register reads */ l2_trans &= ~l2_ready; -#if !MALI_USE_CSF - tiler_trans &= ~tiler_ready; -#endif + prev_state = backend->l2_state; switch (backend->l2_state) { case KBASE_L2_OFF: - if (kbase_pm_is_l2_desired(kbdev)) { + if (kbase_pm_is_l2_desired(kbdev) && can_power_up_l2(kbdev)) { +#if MALI_USE_CSF && defined(KBASE_PM_RUNTIME) + // Workaround: give a short pause here before starting L2 transition. + udelay(200); + /* Enable HW timer of IPA control before + * L2 cache is powered-up. + */ + kbase_ipa_control_handle_gpu_sleep_exit(kbdev); +#endif /* * Set the desired config for L2 before * powering it on */ kbase_pm_l2_config_override(kbdev); kbase_pbha_write_settings(kbdev); -#if !MALI_USE_CSF - /* L2 is required, power on. Powering on the - * tiler will also power the first L2 cache. - */ - kbase_pm_invoke(kbdev, KBASE_PM_CORE_TILER, - tiler_present, ACTION_PWRON); + /* If Host is controlling the power for shader + * cores, then it also needs to control the + * power for Tiler. + * Powering on the tiler will also power the + * L2 cache. + */ + if (need_tiler_control(kbdev)) { + kbase_pm_invoke(kbdev, KBASE_PM_CORE_TILER, tiler_present, + ACTION_PWRON); + } else { + kbase_pm_invoke(kbdev, KBASE_PM_CORE_L2, l2_present, + ACTION_PWRON); + } +#if !MALI_USE_CSF /* If we have more than one L2 cache then we * must power them on explicitly. */ @@ -1166,30 +1416,34 @@ static int kbase_pm_l2_update_state(struct kbase_device *kbdev) ACTION_PWRON); /* Clear backend slot submission kctx */ kbase_pm_l2_clear_backend_slot_submit_kctx(kbdev); -#else - /* With CSF firmware, Host driver doesn't need to - * handle power management with both shader and tiler cores. - * The CSF firmware will power up the cores appropriately. - * So only power the l2 cache explicitly. - */ - kbase_pm_invoke(kbdev, KBASE_PM_CORE_L2, - l2_present, ACTION_PWRON); #endif backend->l2_state = KBASE_L2_PEND_ON; } break; case KBASE_L2_PEND_ON: -#if !MALI_USE_CSF - if (!l2_trans && l2_ready == l2_present && !tiler_trans - && tiler_ready == tiler_present) { - KBASE_KTRACE_ADD(kbdev, PM_CORES_CHANGE_AVAILABLE_TILER, NULL, - tiler_ready); -#else + l2_power_up_done = false; if (!l2_trans && l2_ready == l2_present) { - KBASE_KTRACE_ADD(kbdev, PM_CORES_CHANGE_AVAILABLE_L2, NULL, - l2_ready); -#endif + if (need_tiler_control(kbdev)) { + u64 tiler_trans = kbase_pm_get_trans_cores( + kbdev, KBASE_PM_CORE_TILER); + u64 tiler_ready = kbase_pm_get_ready_cores( + kbdev, KBASE_PM_CORE_TILER); + tiler_trans &= ~tiler_ready; + + if (!tiler_trans && tiler_ready == tiler_present) { + KBASE_KTRACE_ADD(kbdev, + PM_CORES_CHANGE_AVAILABLE_TILER, + NULL, tiler_ready); + l2_power_up_done = true; + } + } else { + KBASE_KTRACE_ADD(kbdev, PM_CORES_CHANGE_AVAILABLE_L2, NULL, + l2_ready); + l2_power_up_done = true; + } + } + if (l2_power_up_done) { /* * Ensure snoops are enabled after L2 is powered * up. 
Note that kbase keeps track of the snoop @@ -1356,14 +1610,15 @@ static int kbase_pm_l2_update_state(struct kbase_device *kbdev) if (kbase_pm_is_l2_desired(kbdev)) backend->l2_state = KBASE_L2_PEND_ON; else if (can_power_down_l2(kbdev)) { - if (!backend->l2_always_on) + if (!backend->l2_always_on) { + wait_as_active_int(kbdev); /* Powering off the L2 will also power off the * tiler. */ kbase_pm_invoke(kbdev, KBASE_PM_CORE_L2, l2_present, ACTION_PWROFF); - else + } else /* If L2 cache is powered then we must flush it * before we power off the GPU. Normally this * would have been handled when the L2 was @@ -1385,12 +1640,26 @@ static int kbase_pm_l2_update_state(struct kbase_device *kbdev) /* We only need to check the L2 here - if the L2 * is off then the tiler is definitely also off. */ - if (!l2_trans && !l2_ready) + if (!l2_trans && !l2_ready) { +#if MALI_USE_CSF && defined(KBASE_PM_RUNTIME) + /* Allow clock gating within the GPU and prevent it + * from being seen as active during sleep. + */ + kbase_ipa_control_handle_gpu_sleep_enter(kbdev); +#endif /* L2 is now powered off */ backend->l2_state = KBASE_L2_OFF; + } } else { - if (!kbdev->cache_clean_in_progress) + if (!kbdev->cache_clean_in_progress) { +#if MALI_USE_CSF && defined(KBASE_PM_RUNTIME) + /* Allow clock gating within the GPU and prevent it + * from being seen as active during sleep. + */ + kbase_ipa_control_handle_gpu_sleep_enter(kbdev); +#endif backend->l2_state = KBASE_L2_OFF; + } } break; @@ -1405,11 +1674,19 @@ static int kbase_pm_l2_update_state(struct kbase_device *kbdev) backend->l2_state); } - if (backend->l2_state != prev_state) + if (backend->l2_state != prev_state) { + struct kbase_pm_event_log_event *event = + kbase_pm_add_log_event(kbdev); + event->type = KBASE_PM_LOG_EVENT_L2_STATE; + event->state.prev = prev_state; + event->state.next = backend->l2_state; + dev_dbg(kbdev->dev, "L2 state transition: %s to %s\n", kbase_l2_core_state_to_string(prev_state), kbase_l2_core_state_to_string( backend->l2_state)); + kbase_ktrace_log_l2_core_state(kbdev, backend->l2_state); + } } while (backend->l2_state != prev_state); @@ -1845,11 +2122,18 @@ static int kbase_pm_shaders_update_state(struct kbase_device *kbdev) break; } - if (backend->shaders_state != prev_state) + if (backend->shaders_state != prev_state) { + struct kbase_pm_event_log_event *event = + kbase_pm_add_log_event(kbdev); + event->type = KBASE_PM_LOG_EVENT_SHADERS_STATE; + event->state.prev = prev_state; + event->state.next = backend->shaders_state; + dev_dbg(kbdev->dev, "Shader state transition: %s to %s\n", kbase_shader_core_state_to_string(prev_state), kbase_shader_core_state_to_string( backend->shaders_state)); + } } while (backend->shaders_state != prev_state); @@ -1873,7 +2157,7 @@ static bool kbase_pm_is_in_desired_state_nolock(struct kbase_device *kbdev) kbdev->pm.backend.shaders_state != KBASE_SHADERS_OFF_CORESTACK_OFF) in_desired_state = false; #else - in_desired_state = kbase_pm_mcu_is_in_desired_state(kbdev); + in_desired_state &= kbase_pm_mcu_is_in_desired_state(kbdev); #endif return in_desired_state; @@ -1910,6 +2194,22 @@ static void kbase_pm_trace_power_state(struct kbase_device *kbdev) { lockdep_assert_held(&kbdev->hwaccess_lock); + { + struct kbase_pm_event_log_event *event = + kbase_pm_add_log_event(kbdev); + event->type = KBASE_PM_LOG_EVENT_CORES; + event->cores.l2 = kbase_pm_get_state( + kbdev, KBASE_PM_CORE_L2, ACTION_READY); + event->cores.shader = kbase_pm_get_state( + kbdev, KBASE_PM_CORE_SHADER, ACTION_READY); + event->cores.tiler = 
kbase_pm_get_state( + kbdev, KBASE_PM_CORE_TILER, ACTION_READY); + if (corestack_driver_control) { + event->cores.stack = kbase_pm_get_state( + kbdev, KBASE_PM_CORE_STACK, ACTION_READY); + } + } + KBASE_TLSTREAM_AUX_PM_STATE( kbdev, KBASE_PM_CORE_L2, @@ -2048,6 +2348,9 @@ int kbase_pm_state_machine_init(struct kbase_device *kbdev) } INIT_WORK(&kbdev->pm.backend.core_idle_work, core_idle_worker); +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + INIT_WORK(&kbdev->pm.backend.sc_rails_on_work, sc_rails_on_worker); +#endif #endif return 0; @@ -2070,6 +2373,7 @@ void kbase_pm_reset_start_locked(struct kbase_device *kbdev) backend->in_reset = true; backend->l2_state = KBASE_L2_RESET_WAIT; + KBASE_KTRACE_ADD(kbdev, PM_L2_RESET_WAIT, NULL, backend->l2_state); #if !MALI_USE_CSF backend->shaders_state = KBASE_SHADERS_RESET_WAIT; #else @@ -2078,6 +2382,7 @@ void kbase_pm_reset_start_locked(struct kbase_device *kbdev) */ if (likely(kbdev->csf.firmware_inited)) { backend->mcu_state = KBASE_MCU_RESET_WAIT; + KBASE_KTRACE_ADD(kbdev, PM_MCU_RESET_WAIT, NULL, backend->mcu_state); #ifdef KBASE_PM_RUNTIME backend->exit_gpu_sleep_mode = true; #endif @@ -2134,22 +2439,38 @@ void kbase_pm_reset_complete(struct kbase_device *kbdev) #define PM_TIMEOUT_MS (5000) /* 5s */ #endif -static void kbase_pm_timed_out(struct kbase_device *kbdev) +void kbase_gpu_timeout_debug_message(struct kbase_device *kbdev, const char *timeout_msg) { unsigned long flags; - dev_err(kbdev->dev, "Power transition timed out unexpectedly\n"); + dev_err(kbdev->dev, "%s", timeout_msg); #if !MALI_USE_CSF CSTD_UNUSED(flags); dev_err(kbdev->dev, "Desired state :\n"); dev_err(kbdev->dev, "\tShader=%016llx\n", kbdev->pm.backend.shaders_desired ? kbdev->pm.backend.shaders_avail : 0); #else + dev_err(kbdev->dev, "GPU pm state :\n"); spin_lock_irqsave(&kbdev->hwaccess_lock, flags); + dev_err(kbdev->dev, "\tscheduler.pm_active_count = %d", kbdev->csf.scheduler.pm_active_count); + dev_err(kbdev->dev, "\tpoweron_required %d pm.active_count %d invoke_poweroff_wait_wq_when_l2_off %d", + kbdev->pm.backend.poweron_required, + kbdev->pm.active_count, + kbdev->pm.backend.invoke_poweroff_wait_wq_when_l2_off); + dev_err(kbdev->dev, "\tgpu_poweroff_wait_work pending %d", + work_pending(&kbdev->pm.backend.gpu_poweroff_wait_work)); dev_err(kbdev->dev, "\tMCU desired = %d\n", kbase_pm_is_mcu_desired(kbdev)); dev_err(kbdev->dev, "\tMCU sw state = %d\n", kbdev->pm.backend.mcu_state); + dev_err(kbdev->dev, "\tL2 desired = %d (locked_off: %d)\n", + kbase_pm_is_l2_desired(kbdev), kbdev->pm.backend.policy_change_clamp_state_to_off); + dev_err(kbdev->dev, "\tL2 sw state = %d\n", + kbdev->pm.backend.l2_state); +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + dev_err(kbdev->dev, "\tbackend.sc_power_rails_off = %d\n", + kbdev->pm.backend.sc_power_rails_off); +#endif spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); #endif dev_err(kbdev->dev, "Current state :\n"); @@ -2169,8 +2490,7 @@ static void kbase_pm_timed_out(struct kbase_device *kbdev) kbase_reg_read(kbdev, GPU_CONTROL_REG(L2_READY_LO))); #if MALI_USE_CSF - dev_err(kbdev->dev, "\tMCU status = %d\n", - kbase_reg_read(kbdev, GPU_CONTROL_REG(MCU_STATUS))); + kbase_csf_debug_dump_registers(kbdev); #endif dev_err(kbdev->dev, "Cores transitioning :\n"); dev_err(kbdev->dev, "\tShader=%08x%08x\n", @@ -2189,9 +2509,28 @@ static void kbase_pm_timed_out(struct kbase_device *kbdev) kbase_reg_read(kbdev, GPU_CONTROL_REG( L2_PWRTRANS_LO))); + dump_stack(); +} + +static void kbase_pm_timed_out(struct kbase_device *kbdev, const char 
*timeout_msg) +{ + kbase_gpu_timeout_debug_message(kbdev, timeout_msg); + /* pixel: If either: + * 1. L2/MCU power transition timed out, or, + * 2. kbase state machine fell out of sync with the hw state, + * a soft/hard reset (ie writing to SOFT/HARD_RESET regs) is insufficient to resume + * operation. + * + * Besides, Odin TRM advises against touching SOFT/HARD_RESET + * regs if L2_PWRTRANS is 1 to avoid undefined state. + * + * We have already lost work if we end up here, so send a powercycle to reset the hw, + * which is more reliable. + */ dev_err(kbdev->dev, "Sending reset to GPU - all running jobs will be lost\n"); if (kbase_prepare_to_reset_gpu(kbdev, - RESET_FLAGS_HWC_UNRECOVERABLE_ERROR)) + RESET_FLAGS_HWC_UNRECOVERABLE_ERROR | + RESET_FLAGS_FORCE_PM_HW_RESET)) kbase_reset_gpu(kbdev); } @@ -2214,15 +2553,22 @@ int kbase_pm_wait_for_l2_powered(struct kbase_device *kbdev) /* Wait for cores */ #if KERNEL_VERSION(4, 13, 1) <= LINUX_VERSION_CODE - remaining = wait_event_killable_timeout( + remaining = wait_event_killable_timeout(kbdev->pm.backend.gpu_in_desired_state_wait, + kbase_pm_is_in_desired_state_with_l2_powered(kbdev), + timeout); #else remaining = wait_event_timeout( -#endif kbdev->pm.backend.gpu_in_desired_state_wait, kbase_pm_is_in_desired_state_with_l2_powered(kbdev), timeout); +#endif if (!remaining) { - kbase_pm_timed_out(kbdev); + const struct gpu_uevent evt = { + .type = GPU_UEVENT_TYPE_KMD_ERROR, + .info = GPU_UEVENT_INFO_L2_PM_TIMEOUT + }; + pixel_gpu_uevent_send(kbdev, &evt); + kbase_pm_timed_out(kbdev, "Wait for desired PM state with L2 powered timed out"); err = -ETIMEDOUT; } else if (remaining < 0) { dev_info( @@ -2234,7 +2580,7 @@ int kbase_pm_wait_for_l2_powered(struct kbase_device *kbdev) return err; } -int kbase_pm_wait_for_desired_state(struct kbase_device *kbdev) +static int pm_wait_for_desired_state(struct kbase_device *kbdev, bool killable_wait) { unsigned long flags; long remaining; @@ -2252,27 +2598,193 @@ int kbase_pm_wait_for_desired_state(struct kbase_device *kbdev) /* Wait for cores */ #if KERNEL_VERSION(4, 13, 1) <= LINUX_VERSION_CODE + if (killable_wait) + remaining = wait_event_killable_timeout(kbdev->pm.backend.gpu_in_desired_state_wait, + kbase_pm_is_in_desired_state(kbdev), + timeout); +#else + killable_wait = false; +#endif + if (!killable_wait) + remaining = wait_event_timeout(kbdev->pm.backend.gpu_in_desired_state_wait, + kbase_pm_is_in_desired_state(kbdev), timeout); + if (!remaining) { + const struct gpu_uevent evt = { + .type = GPU_UEVENT_TYPE_KMD_ERROR, + .info = GPU_UEVENT_INFO_PM_TIMEOUT + }; + pixel_gpu_uevent_send(kbdev, &evt); + kbase_pm_timed_out(kbdev, "Wait for power transition timed out"); + err = -ETIMEDOUT; + } else if (remaining < 0) { + WARN_ON_ONCE(!killable_wait); + dev_info(kbdev->dev, "Wait for power transition got interrupted"); + err = (int)remaining; + } + + return err; +} + +int kbase_pm_killable_wait_for_desired_state(struct kbase_device *kbdev) +{ + return pm_wait_for_desired_state(kbdev, true); +} + +int kbase_pm_wait_for_desired_state(struct kbase_device *kbdev) +{ + return pm_wait_for_desired_state(kbdev, false); +} +KBASE_EXPORT_TEST_API(kbase_pm_wait_for_desired_state); + +#if MALI_USE_CSF +/** + * core_mask_update_done - Check if downscaling of shader cores is done + * + * @kbdev: The kbase device structure for the device. + * + * This function checks if the downscaling of cores is effectively complete. + * + * Return: true if the downscale is done. 
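The killable and plain waits introduced above differ only in which wait primitive is used and in how an interrupted wait is reported. A minimal sketch of that wrapper pattern, using placeholder names (my_dev, my_cond(), MY_TIMEOUT_JIFFIES) rather than real driver symbols:

	static int my_timed_wait(struct my_dev *dev, bool killable)
	{
		long remaining = 0;

	#if KERNEL_VERSION(4, 13, 1) <= LINUX_VERSION_CODE
		if (killable)
			remaining = wait_event_killable_timeout(dev->wq, my_cond(dev),
								MY_TIMEOUT_JIFFIES);
	#else
		/* older kernels lack the killable variant, use a plain wait */
		killable = false;
	#endif
		if (!killable)
			remaining = wait_event_timeout(dev->wq, my_cond(dev),
						       MY_TIMEOUT_JIFFIES);

		if (!remaining)
			return -ETIMEDOUT;	/* condition not met before the timeout */
		if (remaining < 0)
			return (int)remaining;	/* -ERESTARTSYS: SIGKILL during the wait */
		return 0;
	}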
+ */ +static bool core_mask_update_done(struct kbase_device *kbdev) +{ + bool update_done = false; + unsigned long flags; + + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); + /* If MCU is in stable ON state then it implies that the downscale + * request had completed. + * If MCU is not active then it implies all cores are off, so can + * consider the downscale request as complete. + */ + if ((kbdev->pm.backend.mcu_state == KBASE_MCU_ON) || + kbase_pm_is_mcu_inactive(kbdev, kbdev->pm.backend.mcu_state)) + update_done = true; + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + + return update_done; +} + +int kbase_pm_wait_for_cores_down_scale(struct kbase_device *kbdev) +{ + long timeout = kbase_csf_timeout_in_jiffies(kbase_get_timeout_ms(kbdev, CSF_PM_TIMEOUT)); + long remaining; + int err = 0; + + /* Wait for core mask update to complete */ +#if KERNEL_VERSION(4, 13, 1) <= LINUX_VERSION_CODE remaining = wait_event_killable_timeout( kbdev->pm.backend.gpu_in_desired_state_wait, - kbase_pm_is_in_desired_state(kbdev), timeout); + core_mask_update_done(kbdev), timeout); #else remaining = wait_event_timeout( kbdev->pm.backend.gpu_in_desired_state_wait, - kbase_pm_is_in_desired_state(kbdev), timeout); + core_mask_update_done(kbdev), timeout); #endif if (!remaining) { - kbase_pm_timed_out(kbdev); + kbase_pm_timed_out(kbdev, "Wait for cores down scaling timed out"); err = -ETIMEDOUT; } else if (remaining < 0) { - dev_info(kbdev->dev, - "Wait for desired PM state got interrupted"); + dev_info( + kbdev->dev, + "Wait for cores down scaling got interrupted"); err = (int)remaining; } return err; } -KBASE_EXPORT_TEST_API(kbase_pm_wait_for_desired_state); +#endif + +static bool is_poweroff_wait_in_progress(struct kbase_device *kbdev) +{ + bool ret; + unsigned long flags; + + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); + ret = kbdev->pm.backend.poweroff_wait_in_progress; + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + + return ret; +} + +static int pm_wait_for_poweroff_work_complete(struct kbase_device *kbdev, bool killable_wait) +{ + long remaining; +#if MALI_USE_CSF + /* gpu_poweroff_wait_work would be subjected to the kernel scheduling + * and so the wait time can't only be the function of GPU frequency. + */ + const unsigned int extra_wait_time_ms = 2000; + const long timeout = kbase_csf_timeout_in_jiffies( + kbase_get_timeout_ms(kbdev, CSF_PM_TIMEOUT) + extra_wait_time_ms); +#else +#ifdef CONFIG_MALI_ARBITER_SUPPORT + /* Handling of timeout error isn't supported for arbiter builds */ + const long timeout = MAX_SCHEDULE_TIMEOUT; +#else + const long timeout = msecs_to_jiffies(PM_TIMEOUT_MS); +#endif +#endif + int err = 0; + +#if KERNEL_VERSION(4, 13, 1) <= LINUX_VERSION_CODE + if (killable_wait) + remaining = wait_event_killable_timeout(kbdev->pm.backend.poweroff_wait, + !is_poweroff_wait_in_progress(kbdev), + timeout); +#else + killable_wait = false; +#endif + + if (!killable_wait) + remaining = wait_event_timeout(kbdev->pm.backend.poweroff_wait, + !is_poweroff_wait_in_progress(kbdev), timeout); + if (!remaining) { + /* If work is now pending, kbase_pm_gpu_poweroff_wait_wq() will + * definitely be called, so it's safe to continue waiting for it. 
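kbase_pm_wait_for_cores_down_scale() above exists so that a platform's DVFS path can delay lowering the voltage until the smaller core mask has actually taken effect. A hedged usage sketch, where the surrounding governor callback and my_platform_set_voltage() are hypothetical:

	/* Hypothetical: called after a core-mask downscale request has been
	 * pushed through kbase_pm_update_state(). Only drop the rail once
	 * the downscale is confirmed complete.
	 */
	static int my_set_lower_opp(struct kbase_device *kbdev)
	{
		int err = kbase_pm_wait_for_cores_down_scale(kbdev);

		if (err)
			return err;	/* keep the current voltage on timeout/interrupt */

		return my_platform_set_voltage(kbdev);	/* placeholder */
	}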
+ */ + if (work_pending(&kbdev->pm.backend.gpu_poweroff_wait_work)) { + wait_event_killable(kbdev->pm.backend.poweroff_wait, + !is_poweroff_wait_in_progress(kbdev)); + } else { + unsigned long flags; + kbasep_platform_event_core_dump(kbdev, "poweroff work timeout"); + kbase_gpu_timeout_debug_message(kbdev, "failed to wait for poweroff worker"); +#if MALI_USE_CSF + //csf.scheduler.state should be accessed with scheduler lock! + //callchains go through this function though holding that lock + //so just print without locking. + dev_err(kbdev->dev, "scheduler.state %d", kbdev->csf.scheduler.state); + dev_err(kbdev->dev, "Firmware ping %d", kbase_csf_firmware_ping_wait(kbdev, 0)); +#endif + //Attempt another state machine transition prompt. + dev_err(kbdev->dev, "Attempt to prompt state machine"); + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); + kbase_pm_update_state(kbdev); + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + + kbase_gpu_timeout_debug_message(kbdev, "GPU state after re-prompt of state machine"); + err = -ETIMEDOUT; + } + } else if (remaining < 0) { + WARN_ON_ONCE(!killable_wait); + dev_info(kbdev->dev, "Wait for poweroff work got interrupted"); + err = (int)remaining; + } + return err; +} + +int kbase_pm_killable_wait_for_poweroff_work_complete(struct kbase_device *kbdev) +{ + return pm_wait_for_poweroff_work_complete(kbdev, true); +} + +int kbase_pm_wait_for_poweroff_work_complete(struct kbase_device *kbdev) +{ + return pm_wait_for_poweroff_work_complete(kbdev, false); +} +KBASE_EXPORT_TEST_API(kbase_pm_wait_for_poweroff_work_complete); void kbase_pm_enable_interrupts(struct kbase_device *kbdev) { @@ -2291,12 +2803,12 @@ void kbase_pm_enable_interrupts(struct kbase_device *kbdev) kbase_reg_write(kbdev, JOB_CONTROL_REG(JOB_IRQ_CLEAR), 0xFFFFFFFF); kbase_reg_write(kbdev, JOB_CONTROL_REG(JOB_IRQ_MASK), 0xFFFFFFFF); - kbase_reg_write(kbdev, MMU_REG(MMU_IRQ_CLEAR), 0xFFFFFFFF); + kbase_reg_write(kbdev, MMU_CONTROL_REG(MMU_IRQ_CLEAR), 0xFFFFFFFF); #if MALI_USE_CSF /* Enable only the Page fault bits part */ - kbase_reg_write(kbdev, MMU_REG(MMU_IRQ_MASK), 0xFFFF); + kbase_reg_write(kbdev, MMU_CONTROL_REG(MMU_IRQ_MASK), 0xFFFF); #else - kbase_reg_write(kbdev, MMU_REG(MMU_IRQ_MASK), 0xFFFFFFFF); + kbase_reg_write(kbdev, MMU_CONTROL_REG(MMU_IRQ_MASK), 0xFFFFFFFF); #endif } @@ -2316,8 +2828,8 @@ void kbase_pm_disable_interrupts_nolock(struct kbase_device *kbdev) kbase_reg_write(kbdev, JOB_CONTROL_REG(JOB_IRQ_MASK), 0); kbase_reg_write(kbdev, JOB_CONTROL_REG(JOB_IRQ_CLEAR), 0xFFFFFFFF); - kbase_reg_write(kbdev, MMU_REG(MMU_IRQ_MASK), 0); - kbase_reg_write(kbdev, MMU_REG(MMU_IRQ_CLEAR), 0xFFFFFFFF); + kbase_reg_write(kbdev, MMU_CONTROL_REG(MMU_IRQ_MASK), 0); + kbase_reg_write(kbdev, MMU_CONTROL_REG(MMU_IRQ_CLEAR), 0xFFFFFFFF); } void kbase_pm_disable_interrupts(struct kbase_device *kbdev) @@ -2332,24 +2844,37 @@ void kbase_pm_disable_interrupts(struct kbase_device *kbdev) KBASE_EXPORT_TEST_API(kbase_pm_disable_interrupts); #if MALI_USE_CSF +/** + * update_user_reg_page_mapping - Update the mapping for USER Register page + * + * @kbdev: The kbase device structure for the device. + * + * This function must be called to unmap the dummy or real page from USER Register page + * mapping whenever GPU is powered up or down. The dummy or real page would get + * appropriately mapped in when Userspace reads the LATEST_FLUSH value. 
+ */ static void update_user_reg_page_mapping(struct kbase_device *kbdev) { + struct kbase_context *kctx, *n; + lockdep_assert_held(&kbdev->pm.lock); mutex_lock(&kbdev->csf.reg_lock); - if (kbdev->csf.mali_file_inode) { - /* This would zap the pte corresponding to the mapping of User - * register page for all the Kbase contexts. + list_for_each_entry_safe(kctx, n, &kbdev->csf.user_reg.list, csf.user_reg.link) { + /* This would zap the PTE corresponding to the mapping of User + * Register page of the kbase context. The mapping will be reestablished + * when the context (user process) needs to access to the page. */ - unmap_mapping_range(kbdev->csf.mali_file_inode->i_mapping, - BASEP_MEM_CSF_USER_REG_PAGE_HANDLE, - PAGE_SIZE, 1); + unmap_mapping_range(kbdev->csf.user_reg.filp->f_inode->i_mapping, + kctx->csf.user_reg.file_offset << PAGE_SHIFT, PAGE_SIZE, 1); + list_del_init(&kctx->csf.user_reg.link); + dev_dbg(kbdev->dev, "Updated USER Reg page mapping of ctx %d_%d", kctx->tgid, + kctx->id); } mutex_unlock(&kbdev->csf.reg_lock); } #endif - /* * pmu layout: * 0x0000: PMU TAG (RO) (0xCAFECAFE) @@ -2487,7 +3012,6 @@ void kbase_pm_clock_on(struct kbase_device *kbdev, bool is_resume) backend->gpu_idled = false; } #endif - } KBASE_EXPORT_TEST_API(kbase_pm_clock_on); @@ -2722,9 +3246,13 @@ static int kbase_pm_hw_issues_detect(struct kbase_device *kbdev) kbdev->hw_quirks_tiler = 0; kbdev->hw_quirks_mmu = 0; - if (!of_property_read_u32(np, "quirks_gpu", &kbdev->hw_quirks_gpu)) { - dev_info(kbdev->dev, - "Found quirks_gpu = [0x%x] in Devicetree\n", + /* Read the "-" versions of the properties and fall back to + * the "_" versions if these are not found + */ + + if (!of_property_read_u32(np, "quirks-gpu", &kbdev->hw_quirks_gpu) || + !of_property_read_u32(np, "quirks_gpu", &kbdev->hw_quirks_gpu)) { + dev_info(kbdev->dev, "Found quirks_gpu = [0x%x] in Devicetree\n", kbdev->hw_quirks_gpu); } else { error = kbase_set_gpu_quirks(kbdev, prod_id); @@ -2732,33 +3260,30 @@ static int kbase_pm_hw_issues_detect(struct kbase_device *kbdev) return error; } - if (!of_property_read_u32(np, "quirks_sc", - &kbdev->hw_quirks_sc)) { - dev_info(kbdev->dev, - "Found quirks_sc = [0x%x] in Devicetree\n", - kbdev->hw_quirks_sc); + if (!of_property_read_u32(np, "quirks-sc", &kbdev->hw_quirks_sc) || + !of_property_read_u32(np, "quirks_sc", &kbdev->hw_quirks_sc)) { + dev_info(kbdev->dev, "Found quirks_sc = [0x%x] in Devicetree\n", + kbdev->hw_quirks_sc); } else { error = kbase_set_sc_quirks(kbdev, prod_id); if (error) return error; } - if (!of_property_read_u32(np, "quirks_tiler", - &kbdev->hw_quirks_tiler)) { - dev_info(kbdev->dev, - "Found quirks_tiler = [0x%x] in Devicetree\n", - kbdev->hw_quirks_tiler); + if (!of_property_read_u32(np, "quirks-tiler", &kbdev->hw_quirks_tiler) || + !of_property_read_u32(np, "quirks_tiler", &kbdev->hw_quirks_tiler)) { + dev_info(kbdev->dev, "Found quirks_tiler = [0x%x] in Devicetree\n", + kbdev->hw_quirks_tiler); } else { error = kbase_set_tiler_quirks(kbdev); if (error) return error; } - if (!of_property_read_u32(np, "quirks_mmu", - &kbdev->hw_quirks_mmu)) { - dev_info(kbdev->dev, - "Found quirks_mmu = [0x%x] in Devicetree\n", - kbdev->hw_quirks_mmu); + if (!of_property_read_u32(np, "quirks-mmu", &kbdev->hw_quirks_mmu) || + !of_property_read_u32(np, "quirks_mmu", &kbdev->hw_quirks_mmu)) { + dev_info(kbdev->dev, "Found quirks_mmu = [0x%x] in Devicetree\n", + kbdev->hw_quirks_mmu); } else { error = kbase_set_mmu_quirks(kbdev); } @@ -2827,15 +3352,73 @@ static void 
reenable_protected_mode_hwcnt(struct kbase_device *kbdev) } #endif +static int kbase_pm_hw_reset(struct kbase_device *kbdev) +{ + unsigned long flags; + bool gpu_ready; + + lockdep_assert_held(&kbdev->pm.lock); + + if (!kbdev->pm.backend.callback_hardware_reset) { + dev_warn(kbdev->dev, "No hardware reset provided"); + return -EINVAL; + } + + /* Save GPU power state */ + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); + WARN_ON(!kbdev->pm.backend.gpu_powered); + gpu_ready = kbdev->pm.backend.gpu_ready; + kbdev->pm.backend.gpu_ready = false; + kbdev->pm.backend.gpu_powered = false; + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + +#if MALI_USE_CSF + /* Swap for dummy page */ + update_user_reg_page_mapping(kbdev); +#endif + + /* Delegate hardware reset to platform */ + kbdev->pm.backend.callback_hardware_reset(kbdev); + +#if MALI_USE_CSF + /* Swap for real page */ + update_user_reg_page_mapping(kbdev); +#endif + + /* GPU is powered again, restore state */ + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); + kbdev->pm.backend.gpu_powered = true; + kbdev->pm.backend.gpu_ready = gpu_ready; + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + + /* Check register access for success */ + if (kbase_is_gpu_removed(kbdev)) { + dev_err(kbdev->dev, "Registers in-accessible after platform reset"); + return -EINVAL; + } + return 0; +} + static int kbase_pm_do_reset(struct kbase_device *kbdev) { struct kbasep_reset_timeout_data rtdata; int ret; +#if MALI_USE_CSF + if (kbdev->csf.reset.force_pm_hw_reset && kbdev->pm.backend.callback_hardware_reset) { + dev_err(kbdev->dev, "Power Cycle reset mali"); + kbdev->csf.reset.force_pm_hw_reset = false; + return kbase_pm_hw_reset(kbdev); + } +#endif + KBASE_KTRACE_ADD(kbdev, CORE_GPU_SOFT_RESET, NULL, 0); KBASE_TLSTREAM_JD_GPU_SOFT_RESET(kbdev, kbdev); + /* Unmask the reset complete interrupt only */ + kbase_reg_write(kbdev, GPU_CONTROL_REG(GPU_IRQ_MASK), RESET_COMPLETED); + if (kbdev->pm.backend.callback_soft_reset) { ret = kbdev->pm.backend.callback_soft_reset(kbdev); if (ret < 0) @@ -2847,9 +3430,6 @@ static int kbase_pm_do_reset(struct kbase_device *kbdev) GPU_COMMAND_SOFT_RESET); } - /* Unmask the reset complete interrupt only */ - kbase_reg_write(kbdev, GPU_CONTROL_REG(GPU_IRQ_MASK), RESET_COMPLETED); - /* Initialize a structure for tracking the status of the reset */ rtdata.kbdev = kbdev; rtdata.timed_out = false; @@ -2921,8 +3501,12 @@ static int kbase_pm_do_reset(struct kbase_device *kbdev) destroy_hrtimer_on_stack(&rtdata.timer); - dev_err(kbdev->dev, "Failed to hard-reset the GPU (timed out after %d ms)\n", - RESET_TIMEOUT); + dev_err(kbdev->dev, + "Failed to hard-reset the GPU (timed out after %d ms) GPU_IRQ_RAWSTAT: %d\n", + RESET_TIMEOUT, kbase_reg_read(kbdev, GPU_CONTROL_REG(GPU_IRQ_RAWSTAT))); + + /* Last resort, trigger a hardware reset of the GPU */ + return kbase_pm_hw_reset(kbdev); #ifdef CONFIG_MALI_ARBITER_SUPPORT } #endif /* CONFIG_MALI_ARBITER_SUPPORT */ @@ -2959,6 +3543,10 @@ int kbase_pm_init_hw(struct kbase_device *kbdev, unsigned int flags) kbdev->pm.backend.gpu_powered = true; } +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + /* Ensure the SC rail is up otherwise the FW will get stuck during reset */ + kbase_pm_turn_on_sc_power_rails_locked(kbdev); +#endif /* Ensure interrupts are off to begin with, this also clears any * outstanding interrupts diff --git a/mali_kbase/backend/gpu/mali_kbase_pm_event_log.c b/mali_kbase/backend/gpu/mali_kbase_pm_event_log.c new file mode 100644 index 0000000..b752af8 --- /dev/null +++ 
b/mali_kbase/backend/gpu/mali_kbase_pm_event_log.c @@ -0,0 +1,108 @@ +// SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note +/* + * + * (C) COPYRIGHT 2022 Google LLC. All rights reserved. + * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. + * + */ + +#include <backend/gpu/mali_kbase_pm_event_log.h> + +static inline u32 kbase_pm_next_log_event( + struct kbase_pm_event_log *log) +{ + u32 ret = log->last_event; + ++ret; + ret %= EVENT_LOG_MAX; + log->last_event = ret; + return ret; +} + +struct kbase_pm_event_log_event *kbase_pm_add_log_event( + struct kbase_device *kbdev) +{ + struct kbase_pm_event_log *log = &kbdev->pm.backend.event_log; + struct kbase_pm_event_log_event *ret = NULL; + + lockdep_assert_held(&kbdev->hwaccess_lock); + ret = &log->events[kbase_pm_next_log_event(log)]; + + memset(ret, 0, sizeof(*ret)); + ret->timestamp = ktime_get(); + return ret; +} + +/** + * struct kbase_pm_event_log_metadata - Info about the event log. + * + * @magic: always 'kpel', helps find the log in memory dumps + * @version: updated whenever the binary layout changes + * @events_address: the memory address of the log, or in a file the offset + * from the start of the metadata to the log + * @num_events: the capacity of the event log + * @event_stride: distance between log entries, to aid in parsing if only some + * entry types are supported by the parser + **/ +struct kbase_pm_event_log_metadata { + char magic[4]; + u8 version; + u64 events_address; + u32 num_events; + u32 event_stride; +} __attribute__((packed)); + +static struct kbase_pm_event_log_metadata global_event_log_metadata; + +void kbase_pm_init_event_log(struct kbase_device *kbdev) +{ + struct kbase_pm_event_log_metadata *md = + &global_event_log_metadata; + kbdev->pm.backend.event_log.last_event = -1; + md->magic[0] = 'k'; + md->magic[1] = 'p'; + md->magic[2] = 'e'; + md->magic[3] = 'l'; + md->version = 1; + md->num_events = EVENT_LOG_MAX; + md->events_address = (u64)kbdev->pm.backend.event_log.events; + md->event_stride = ((u8*)&kbdev->pm.backend.event_log.events[1] - + (u8*)&kbdev->pm.backend.event_log.events[0]); +} + +u64 kbase_pm_max_event_log_size(struct kbase_device *kbdev) +{ + return sizeof(struct kbase_pm_event_log_metadata) + + sizeof(kbdev->pm.backend.event_log.events); +} + +int kbase_pm_copy_event_log(struct kbase_device *kbdev, + void *buffer, u64 size) +{ + lockdep_assert_held(&kbdev->hwaccess_lock); + if (size < kbase_pm_max_event_log_size(kbdev)) { + return -EINVAL; + } + memcpy(buffer, &global_event_log_metadata, + sizeof(global_event_log_metadata)); + memcpy(((u8*)buffer) + sizeof(global_event_log_metadata), + &kbdev->pm.backend.event_log.events, + sizeof(kbdev->pm.backend.event_log.events)); + ((struct kbase_pm_event_log_metadata*)buffer)->events_address = + sizeof(struct kbase_pm_event_log_metadata); + + return 0; +} + diff --git 
a/mali_kbase/backend/gpu/mali_kbase_pm_event_log.h b/mali_kbase/backend/gpu/mali_kbase_pm_event_log.h new file mode 100644 index 0000000..072efa5 --- /dev/null +++ b/mali_kbase/backend/gpu/mali_kbase_pm_event_log.h @@ -0,0 +1,44 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +/* + * + * (C) COPYRIGHT 2022 Google LLC. All rights reserved. + * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. + * + */ + +/* + * Power management API definitions used internally by GPU backend + */ + +#ifndef _KBASE_BACKEND_PM_EVENT_LOG_H_ +#define _KBASE_BACKEND_PM_EVENT_LOG_H_ + +#include <mali_kbase.h> +#include <mali_kbase_pm.h> + +/** + * kbase_pm_add_log_event - Add a newly-initialized event to the event log. + * + * @kbdev: Device pointer + * + * Return: a pointer to the event, which has been nulled out and had its + * timestamp set to the current time. + * + */ +struct kbase_pm_event_log_event *kbase_pm_add_log_event( + struct kbase_device *kbdev); + +#endif /* _KBASE_BACKEND_PM_INTERNAL_H_ */ diff --git a/mali_kbase/backend/gpu/mali_kbase_pm_internal.h b/mali_kbase/backend/gpu/mali_kbase_pm_internal.h index 68ded7d..d7f19fb 100644 --- a/mali_kbase/backend/gpu/mali_kbase_pm_internal.h +++ b/mali_kbase/backend/gpu/mali_kbase_pm_internal.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2010-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2010-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -224,7 +224,7 @@ void kbase_pm_reset_done(struct kbase_device *kbdev); * power off in progress and kbase_pm_context_active() was called instead of * kbase_csf_scheduler_pm_active(). * - * Return: 0 on success, error code on error + * Return: 0 on success, or -ETIMEDOUT code on timeout error. */ int kbase_pm_wait_for_desired_state(struct kbase_device *kbdev); #else @@ -247,12 +247,27 @@ int kbase_pm_wait_for_desired_state(struct kbase_device *kbdev); * must ensure that this is not the case by, for example, calling * kbase_pm_wait_for_poweroff_work_complete() * - * Return: 0 on success, error code on error + * Return: 0 on success, or -ETIMEDOUT error code on timeout error. */ int kbase_pm_wait_for_desired_state(struct kbase_device *kbdev); #endif /** + * kbase_pm_killable_wait_for_desired_state - Wait for the desired power state to be + * reached in a killable state. + * @kbdev: The kbase device structure for the device (must be a valid pointer) + * + * This function is same as kbase_pm_wait_for_desired_state(), expect that it would + * allow the SIGKILL signal to interrupt the wait. + * This function is supposed to be called from the code that is executed in ioctl or + * Userspace context, wherever it is safe to do so. 
+ * + * Return: 0 on success, or -ETIMEDOUT code on timeout error or -ERESTARTSYS if the + * wait was interrupted. + */ +int kbase_pm_killable_wait_for_desired_state(struct kbase_device *kbdev); + +/** * kbase_pm_wait_for_l2_powered - Wait for the L2 cache to be powered on * * @kbdev: The kbase device structure for the device (must be a valid pointer) @@ -269,6 +284,37 @@ int kbase_pm_wait_for_desired_state(struct kbase_device *kbdev); */ int kbase_pm_wait_for_l2_powered(struct kbase_device *kbdev); +#if MALI_USE_CSF +/** + * kbase_pm_wait_for_cores_down_scale - Wait for the downscaling of shader cores + * + * @kbdev: The kbase device structure for the device (must be a valid pointer) + * + * This function can be called to ensure that the downscaling of cores is + * effectively complete and it would be safe to lower the voltage. + * The function assumes that caller had exercised the MCU state machine for the + * downscale request through the kbase_pm_update_state() function. + * + * This function needs to be used by the caller to safely wait for the completion + * of downscale request, instead of kbase_pm_wait_for_desired_state(). + * The downscale request would trigger a state change in MCU state machine + * and so when MCU reaches the stable ON state, it can be inferred that + * downscaling is complete. But it has been observed that the wake up of the + * waiting thread can get delayed by few milli seconds and by the time the + * thread wakes up the power down transition could have started (after the + * completion of downscale request). + * On the completion of power down transition another wake up signal would be + * sent, but again by the time thread wakes up the power up transition can begin. + * And the power up transition could then get blocked inside the platform specific + * callback_power_on() function due to the thread that called into Kbase (from the + * platform specific code) to perform the downscaling and then ended up waiting + * for the completion of downscale request. + * + * Return: 0 on success, error code on error or remaining jiffies on timeout. + */ +int kbase_pm_wait_for_cores_down_scale(struct kbase_device *kbdev); +#endif + /** * kbase_pm_update_dynamic_cores_onoff - Update the L2 and shader power state * machines after changing shader core @@ -436,8 +482,26 @@ void kbase_pm_release_gpu_cycle_counter_nolock(struct kbase_device *kbdev); * This function effectively just waits for the @gpu_poweroff_wait_work work * item to complete, if it was enqueued. GPU may not have been powered down * before this function returns. + * + * Return: 0 on success, error code on error + */ +int kbase_pm_wait_for_poweroff_work_complete(struct kbase_device *kbdev); + +/** + * kbase_pm_killable_wait_for_poweroff_work_complete - Wait for the poweroff workqueue to + * complete in killable state. + * + * @kbdev: The kbase device structure for the device (must be a valid pointer) + * + * This function is same as kbase_pm_wait_for_poweroff_work_complete(), expect that + * it would allow the SIGKILL signal to interrupt the wait. + * This function is supposed to be called from the code that is executed in ioctl or + * Userspace context, wherever it is safe to do so. + * + * Return: 0 on success, or -ETIMEDOUT code on timeout error or -ERESTARTSYS if the + * wait was interrupted. 
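Both killable variants are meant for ioctl/user context, where a SIGKILL'd caller should not be left stuck in the kernel. A hypothetical call site (my_ioctl_flush() is illustrative, not a real entry point):

	static int my_ioctl_flush(struct kbase_device *kbdev)
	{
		int err = kbase_pm_killable_wait_for_poweroff_work_complete(kbdev);

		if (err == -ERESTARTSYS)
			return err;	/* caller is being killed, bail out quietly */
		if (err == -ETIMEDOUT)
			dev_err(kbdev->dev, "poweroff work did not complete in time");

		return err;
	}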
*/ -void kbase_pm_wait_for_poweroff_work_complete(struct kbase_device *kbdev); +int kbase_pm_killable_wait_for_poweroff_work_complete(struct kbase_device *kbdev); /** * kbase_pm_wait_for_gpu_power_down - Wait for the GPU power down to complete @@ -800,7 +864,7 @@ bool kbase_pm_no_runnables_sched_suspendable(struct kbase_device *kbdev) /** * kbase_pm_no_mcu_core_pwroff - Check whether the PM is required to keep the - * MCU core powered in accordance to the active + * MCU shader Core powered in accordance to the active * power management policy * * @kbdev: Device pointer @@ -826,6 +890,8 @@ static inline bool kbase_pm_mcu_is_in_desired_state(struct kbase_device *kbdev) { bool in_desired_state = true; + lockdep_assert_held(&kbdev->hwaccess_lock); + if (kbase_pm_is_mcu_desired(kbdev) && kbdev->pm.backend.mcu_state != KBASE_MCU_ON) in_desired_state = false; else if (!kbase_pm_is_mcu_desired(kbdev) && @@ -869,7 +935,7 @@ static inline void kbase_pm_lock(struct kbase_device *kbdev) #if !MALI_USE_CSF mutex_lock(&kbdev->js_data.runpool_mutex); #endif /* !MALI_USE_CSF */ - mutex_lock(&kbdev->pm.lock); + rt_mutex_lock(&kbdev->pm.lock); } /** @@ -879,7 +945,7 @@ static inline void kbase_pm_lock(struct kbase_device *kbdev) */ static inline void kbase_pm_unlock(struct kbase_device *kbdev) { - mutex_unlock(&kbdev->pm.lock); + rt_mutex_unlock(&kbdev->pm.lock); #if !MALI_USE_CSF mutex_unlock(&kbdev->js_data.runpool_mutex); #endif /* !MALI_USE_CSF */ @@ -964,4 +1030,27 @@ static inline void kbase_pm_disable_db_mirror_interrupt(struct kbase_device *kbd } #endif +/** + * kbase_pm_l2_allow_mmu_page_migration - L2 state allows MMU page migration or not + * + * @kbdev: The kbase device structure for the device (must be a valid pointer) + * + * Check whether the L2 state is in power transition phase or not. If it is, the MMU + * page migration should be deferred. The caller must hold hwaccess_lock, and, if MMU + * page migration is intended, immediately start the MMU migration action without + * dropping the lock. When page migration begins, a flag is set in kbdev that would + * prevent the L2 state machine traversing into power transition phases, until + * the MMU migration action ends. + * + * Return: true if MMU page migration is allowed + */ +static inline bool kbase_pm_l2_allow_mmu_page_migration(struct kbase_device *kbdev) +{ + struct kbase_pm_backend_data *backend = &kbdev->pm.backend; + + lockdep_assert_held(&kbdev->hwaccess_lock); + + return (backend->l2_state != KBASE_L2_PEND_ON && backend->l2_state != KBASE_L2_PEND_OFF); +} + #endif /* _KBASE_BACKEND_PM_INTERNAL_H_ */ diff --git a/mali_kbase/backend/gpu/mali_kbase_pm_mcu_states.h b/mali_kbase/backend/gpu/mali_kbase_pm_mcu_states.h index 5e57c9d..3b448e3 100644 --- a/mali_kbase/backend/gpu/mali_kbase_pm_mcu_states.h +++ b/mali_kbase/backend/gpu/mali_kbase_pm_mcu_states.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2020-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -66,6 +66,13 @@ * is being put to sleep. * @ON_PEND_SLEEP: MCU sleep is in progress. * @IN_SLEEP: Sleep request is completed and MCU has halted. + * @ON_PMODE_ENTER_CORESIGHT_DISABLE: The MCU is on, protected mode enter is about to + * be requested, Coresight is being disabled. 
+ * @ON_PMODE_EXIT_CORESIGHT_ENABLE : The MCU is on, protected mode exit has happened + * Coresight is being enabled. + * @CORESIGHT_DISABLE: The MCU is on and Coresight is being disabled. + * @CORESIGHT_ENABLE: The MCU is on, host does not have control and + * Coresight is being enabled. */ KBASEP_MCU_STATE(OFF) KBASEP_MCU_STATE(PEND_ON_RELOAD) @@ -92,3 +99,10 @@ KBASEP_MCU_STATE(HCTL_SHADERS_CORE_OFF_PEND) KBASEP_MCU_STATE(ON_SLEEP_INITIATE) KBASEP_MCU_STATE(ON_PEND_SLEEP) KBASEP_MCU_STATE(IN_SLEEP) +#if IS_ENABLED(CONFIG_MALI_CORESIGHT) +/* Additional MCU states for Coresight */ +KBASEP_MCU_STATE(ON_PMODE_ENTER_CORESIGHT_DISABLE) +KBASEP_MCU_STATE(ON_PMODE_EXIT_CORESIGHT_ENABLE) +KBASEP_MCU_STATE(CORESIGHT_DISABLE) +KBASEP_MCU_STATE(CORESIGHT_ENABLE) +#endif /* IS_ENABLED(CONFIG_MALI_CORESIGHT) */ diff --git a/mali_kbase/backend/gpu/mali_kbase_pm_metrics.c b/mali_kbase/backend/gpu/mali_kbase_pm_metrics.c index f85b466..5d98bd7 100644 --- a/mali_kbase/backend/gpu/mali_kbase_pm_metrics.c +++ b/mali_kbase/backend/gpu/mali_kbase_pm_metrics.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2011-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2011-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -24,6 +24,7 @@ */ #include <mali_kbase.h> +#include <mali_kbase_config_defaults.h> #include <mali_kbase_pm.h> #include <backend/gpu/mali_kbase_pm_internal.h> @@ -37,38 +38,64 @@ #include <backend/gpu/mali_kbase_pm_defs.h> #include <mali_linux_trace.h> +#if defined(CONFIG_MALI_DEVFREQ) || defined(CONFIG_MALI_MIDGARD_DVFS) || !MALI_USE_CSF /* Shift used for kbasep_pm_metrics_data.time_busy/idle - units of (1 << 8) ns * This gives a maximum period between samples of 2^(32+8)/100 ns = slightly * under 11s. Exceeding this will cause overflow */ #define KBASE_PM_TIME_SHIFT 8 +#endif #if MALI_USE_CSF /* To get the GPU_ACTIVE value in nano seconds unit */ #define GPU_ACTIVE_SCALING_FACTOR ((u64)1E9) #endif +/* + * Possible state transitions + * ON -> ON | OFF | STOPPED + * STOPPED -> ON | OFF + * OFF -> ON + * + * + * ┌─e─┐┌────────────f─────────────┐ + * │ v│ v + * └───ON ──a──> STOPPED ──b──> OFF + * ^^ │ │ + * │└──────c─────┘ │ + * │ │ + * └─────────────d─────────────┘ + * + * Transition effects: + * a. None + * b. Timer expires without restart + * c. Timer is not stopped, timer period is unaffected + * d. Timer must be restarted + * e. Callback is executed and the timer is restarted + * f. Timer is cancelled, or the callback is waited on if currently executing. 
This is called during + * tear-down and should not be subject to a race from an OFF->ON transition + */ +enum dvfs_metric_timer_state { TIMER_OFF, TIMER_STOPPED, TIMER_ON }; + #ifdef CONFIG_MALI_MIDGARD_DVFS static enum hrtimer_restart dvfs_callback(struct hrtimer *timer) { - unsigned long flags; struct kbasep_pm_metrics_state *metrics; - KBASE_DEBUG_ASSERT(timer != NULL); + if (WARN_ON(!timer)) + return HRTIMER_NORESTART; metrics = container_of(timer, struct kbasep_pm_metrics_state, timer); - kbase_pm_get_dvfs_action(metrics->kbdev); - - spin_lock_irqsave(&metrics->lock, flags); - if (metrics->timer_active) - hrtimer_start(timer, - HR_TIMER_DELAY_MSEC(metrics->kbdev->pm.dvfs_period), - HRTIMER_MODE_REL); + /* Transition (b) to fully off if timer was stopped, don't restart the timer in this case */ + if (atomic_cmpxchg(&metrics->timer_state, TIMER_STOPPED, TIMER_OFF) != TIMER_ON) + return HRTIMER_NORESTART; - spin_unlock_irqrestore(&metrics->lock, flags); + kbase_pm_get_dvfs_action(metrics->kbdev); - return HRTIMER_NORESTART; + /* Set the new expiration time and restart (transition e) */ + hrtimer_forward_now(timer, HR_TIMER_DELAY_MSEC(metrics->kbdev->pm.dvfs_period)); + return HRTIMER_RESTART; } #endif /* CONFIG_MALI_MIDGARD_DVFS */ @@ -83,7 +110,7 @@ int kbasep_pm_metrics_init(struct kbase_device *kbdev) KBASE_DEBUG_ASSERT(kbdev != NULL); kbdev->pm.backend.metrics.kbdev = kbdev; - kbdev->pm.backend.metrics.time_period_start = ktime_get(); + kbdev->pm.backend.metrics.time_period_start = ktime_get_raw(); kbdev->pm.backend.metrics.values.time_busy = 0; kbdev->pm.backend.metrics.values.time_idle = 0; kbdev->pm.backend.metrics.values.time_in_protm = 0; @@ -111,7 +138,7 @@ int kbasep_pm_metrics_init(struct kbase_device *kbdev) #else KBASE_DEBUG_ASSERT(kbdev != NULL); kbdev->pm.backend.metrics.kbdev = kbdev; - kbdev->pm.backend.metrics.time_period_start = ktime_get(); + kbdev->pm.backend.metrics.time_period_start = ktime_get_raw(); kbdev->pm.backend.metrics.gpu_active = false; kbdev->pm.backend.metrics.active_cl_ctx[0] = 0; @@ -134,6 +161,7 @@ int kbasep_pm_metrics_init(struct kbase_device *kbdev) HRTIMER_MODE_REL); kbdev->pm.backend.metrics.timer.function = dvfs_callback; kbdev->pm.backend.metrics.initialized = true; + atomic_set(&kbdev->pm.backend.metrics.timer_state, TIMER_OFF); kbase_pm_metrics_start(kbdev); #endif /* CONFIG_MALI_MIDGARD_DVFS */ @@ -152,16 +180,12 @@ KBASE_EXPORT_TEST_API(kbasep_pm_metrics_init); void kbasep_pm_metrics_term(struct kbase_device *kbdev) { #ifdef CONFIG_MALI_MIDGARD_DVFS - unsigned long flags; - KBASE_DEBUG_ASSERT(kbdev != NULL); - spin_lock_irqsave(&kbdev->pm.backend.metrics.lock, flags); - kbdev->pm.backend.metrics.timer_active = false; - spin_unlock_irqrestore(&kbdev->pm.backend.metrics.lock, flags); - - hrtimer_cancel(&kbdev->pm.backend.metrics.timer); + /* Cancel the timer, and block if the callback is currently executing (transition f) */ kbdev->pm.backend.metrics.initialized = false; + atomic_set(&kbdev->pm.backend.metrics.timer_state, TIMER_OFF); + hrtimer_cancel(&kbdev->pm.backend.metrics.timer); #endif /* CONFIG_MALI_MIDGARD_DVFS */ #if MALI_USE_CSF @@ -177,7 +201,7 @@ KBASE_EXPORT_TEST_API(kbasep_pm_metrics_term); */ #if MALI_USE_CSF #if defined(CONFIG_MALI_DEVFREQ) || defined(CONFIG_MALI_MIDGARD_DVFS) -static void kbase_pm_get_dvfs_utilisation_calc(struct kbase_device *kbdev) +static bool kbase_pm_get_dvfs_utilisation_calc(struct kbase_device *kbdev) { int err; u64 gpu_active_counter; @@ -199,7 +223,7 @@ static void 
kbase_pm_get_dvfs_utilisation_calc(struct kbase_device *kbdev) * elapsed time. The lock taken inside kbase_ipa_control_query() * function can cause lot of variation. */ - now = ktime_get(); + now = ktime_get_raw(); if (err) { dev_err(kbdev->dev, @@ -215,7 +239,20 @@ static void kbase_pm_get_dvfs_utilisation_calc(struct kbase_device *kbdev) diff_ns_signed = ktime_to_ns(diff); if (diff_ns_signed < 0) - return; + return false; + + /* + * The GPU internal counter is updated every IPA_CONTROL_TIMER_DEFAULT_VALUE_MS + * milliseconds. If an update occurs prematurely and the counter has not been + * updated, the same counter value will be obtained, resulting in a difference + * of zero. To handle this scenario, we will skip the update if the difference + * is zero and the update occurred less than 1.5 times the internal update period + * (IPA_CONTROL_TIMER_DEFAULT_VALUE_MS). Ideally, we should check the counter + * update timestamp in the GPU internal register to ensure accurate updates. + */ + if (gpu_active_counter == 0 && + diff_ns_signed < IPA_CONTROL_TIMER_DEFAULT_VALUE_MS * NSEC_PER_MSEC * 3 / 2) + return false; diff_ns = (u64)diff_ns_signed; @@ -231,12 +268,14 @@ static void kbase_pm_get_dvfs_utilisation_calc(struct kbase_device *kbdev) * time. */ if (!kbdev->pm.backend.metrics.skip_gpu_active_sanity_check) { - /* Use a margin value that is approximately 1% of the time - * difference. + /* The margin is scaled to allow for the worst-case + * scenario where the samples are maximally separated, + * plus a small offset for sampling errors. */ - u64 margin_ns = diff_ns >> 6; + u64 const MARGIN_NS = + IPA_CONTROL_TIMER_DEFAULT_VALUE_MS * NSEC_PER_MSEC * 3 / 2; - if (gpu_active_counter > (diff_ns + margin_ns)) { + if (gpu_active_counter > (diff_ns + MARGIN_NS)) { dev_info( kbdev->dev, "GPU activity takes longer than time interval: %llu ns > %llu ns", @@ -282,10 +321,11 @@ static void kbase_pm_get_dvfs_utilisation_calc(struct kbase_device *kbdev) } kbdev->pm.backend.metrics.time_period_start = now; + return true; } #endif /* defined(CONFIG_MALI_DEVFREQ) || defined(CONFIG_MALI_MIDGARD_DVFS) */ #else -static void kbase_pm_get_dvfs_utilisation_calc(struct kbase_device *kbdev, +static bool kbase_pm_get_dvfs_utilisation_calc(struct kbase_device *kbdev, ktime_t now) { ktime_t diff; @@ -294,7 +334,7 @@ static void kbase_pm_get_dvfs_utilisation_calc(struct kbase_device *kbdev, diff = ktime_sub(now, kbdev->pm.backend.metrics.time_period_start); if (ktime_to_ns(diff) < 0) - return; + return false; if (kbdev->pm.backend.metrics.gpu_active) { u32 ns_time = (u32) (ktime_to_ns(diff) >> KBASE_PM_TIME_SHIFT); @@ -316,6 +356,7 @@ static void kbase_pm_get_dvfs_utilisation_calc(struct kbase_device *kbdev, } kbdev->pm.backend.metrics.time_period_start = now; + return true; } #endif /* MALI_USE_CSF */ @@ -329,10 +370,13 @@ void kbase_pm_get_dvfs_metrics(struct kbase_device *kbdev, spin_lock_irqsave(&kbdev->pm.backend.metrics.lock, flags); #if MALI_USE_CSF - kbase_pm_get_dvfs_utilisation_calc(kbdev); + if (!kbase_pm_get_dvfs_utilisation_calc(kbdev)) { #else - kbase_pm_get_dvfs_utilisation_calc(kbdev, ktime_get()); + if (!kbase_pm_get_dvfs_utilisation_calc(kbdev, ktime_get_raw())) { #endif + spin_unlock_irqrestore(&kbdev->pm.backend.metrics.lock, flags); + return; + } memset(diff, 0, sizeof(*diff)); diff->time_busy = cur->time_busy - last->time_busy; @@ -396,57 +440,33 @@ void kbase_pm_get_dvfs_action(struct kbase_device *kbdev) bool kbase_pm_metrics_is_active(struct kbase_device *kbdev) { - bool isactive; - unsigned 
long flags; - KBASE_DEBUG_ASSERT(kbdev != NULL); - spin_lock_irqsave(&kbdev->pm.backend.metrics.lock, flags); - isactive = kbdev->pm.backend.metrics.timer_active; - spin_unlock_irqrestore(&kbdev->pm.backend.metrics.lock, flags); - - return isactive; + return atomic_read(&kbdev->pm.backend.metrics.timer_state) == TIMER_ON; } KBASE_EXPORT_TEST_API(kbase_pm_metrics_is_active); void kbase_pm_metrics_start(struct kbase_device *kbdev) { - unsigned long flags; - bool update = true; + struct kbasep_pm_metrics_state *metrics = &kbdev->pm.backend.metrics; - if (unlikely(!kbdev->pm.backend.metrics.initialized)) + if (unlikely(!metrics->initialized)) return; - spin_lock_irqsave(&kbdev->pm.backend.metrics.lock, flags); - if (!kbdev->pm.backend.metrics.timer_active) - kbdev->pm.backend.metrics.timer_active = true; - else - update = false; - spin_unlock_irqrestore(&kbdev->pm.backend.metrics.lock, flags); - - if (update) - hrtimer_start(&kbdev->pm.backend.metrics.timer, - HR_TIMER_DELAY_MSEC(kbdev->pm.dvfs_period), - HRTIMER_MODE_REL); + /* Transition to ON, from a stopped state (transition c) */ + if (atomic_xchg(&metrics->timer_state, TIMER_ON) == TIMER_OFF) + /* Start the timer only if it's been fully stopped (transition d)*/ + hrtimer_start(&metrics->timer, HR_TIMER_DELAY_MSEC(kbdev->pm.dvfs_period), + HRTIMER_MODE_REL); } void kbase_pm_metrics_stop(struct kbase_device *kbdev) { - unsigned long flags; - bool update = true; - if (unlikely(!kbdev->pm.backend.metrics.initialized)) return; - spin_lock_irqsave(&kbdev->pm.backend.metrics.lock, flags); - if (kbdev->pm.backend.metrics.timer_active) - kbdev->pm.backend.metrics.timer_active = false; - else - update = false; - spin_unlock_irqrestore(&kbdev->pm.backend.metrics.lock, flags); - - if (update) - hrtimer_cancel(&kbdev->pm.backend.metrics.timer); + /* Timer is Stopped if its currently on (transition a) */ + atomic_cmpxchg(&kbdev->pm.backend.metrics.timer_state, TIMER_ON, TIMER_STOPPED); } @@ -462,7 +482,7 @@ void kbase_pm_metrics_stop(struct kbase_device *kbdev) */ static void kbase_pm_metrics_active_calc(struct kbase_device *kbdev) { - int js; + unsigned int js; lockdep_assert_held(&kbdev->pm.backend.metrics.lock); @@ -512,7 +532,7 @@ void kbase_pm_metrics_update(struct kbase_device *kbdev, ktime_t *timestamp) spin_lock_irqsave(&kbdev->pm.backend.metrics.lock, flags); if (!timestamp) { - now = ktime_get(); + now = ktime_get_raw(); timestamp = &now; } diff --git a/mali_kbase/backend/gpu/mali_kbase_pm_policy.c b/mali_kbase/backend/gpu/mali_kbase_pm_policy.c index cb38c6e..7d7650c 100644 --- a/mali_kbase/backend/gpu/mali_kbase_pm_policy.c +++ b/mali_kbase/backend/gpu/mali_kbase_pm_policy.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2010-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2010-2023 ARM Limited. All rights reserved. 
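The DVFS metrics timer rework above replaces a spinlock-protected boolean with a single atomic tri-state so that start, stop and the expiry callback can race safely without taking a lock in the hrtimer path. An illustrative reduction of that pattern with generic names (my_metrics, do_sampling()):

	enum { T_OFF, T_STOPPED, T_ON };

	static void my_metrics_start(struct my_metrics *m)
	{
		/* Only (re)arm when coming from fully OFF (transition d in the
		 * diagram above); STOPPED->ON (transition c) leaves the still
		 * pending timer alone.
		 */
		if (atomic_xchg(&m->timer_state, T_ON) == T_OFF)
			hrtimer_start(&m->timer, ms_to_ktime(m->period_ms),
				      HRTIMER_MODE_REL);
	}

	static void my_metrics_stop(struct my_metrics *m)
	{
		/* Ask the callback to let the timer lapse at its next expiry */
		atomic_cmpxchg(&m->timer_state, T_ON, T_STOPPED);
	}

	static enum hrtimer_restart my_metrics_cb(struct hrtimer *timer)
	{
		struct my_metrics *m = container_of(timer, struct my_metrics, timer);

		/* If stop() ran since the last expiry, move to OFF and stop */
		if (atomic_cmpxchg(&m->timer_state, T_STOPPED, T_OFF) != T_ON)
			return HRTIMER_NORESTART;

		do_sampling(m);		/* placeholder for the real work */
		hrtimer_forward_now(timer, ms_to_ktime(m->period_ms));
		return HRTIMER_RESTART;
	}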
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -54,7 +54,9 @@ void kbase_pm_policy_init(struct kbase_device *kbdev) unsigned long flags; int i; - if (of_property_read_string(np, "power_policy", &power_policy_name) == 0) { + /* Read "power-policy" property and fallback to "power_policy" if not found */ + if ((of_property_read_string(np, "power-policy", &power_policy_name) == 0) || + (of_property_read_string(np, "power_policy", &power_policy_name) == 0)) { for (i = 0; i < ARRAY_SIZE(all_policy_list); i++) if (sysfs_streq(all_policy_list[i]->name, power_policy_name)) { default_policy = all_policy_list[i]; @@ -117,10 +119,12 @@ void kbase_pm_update_active(struct kbase_device *kbdev) } else { /* Cancel the invocation of * kbase_pm_gpu_poweroff_wait_wq() from the L2 state - * machine. This is safe - it + * machine. This is safe - if * invoke_poweroff_wait_wq_when_l2_off is true, then * the poweroff work hasn't even been queued yet, - * meaning we can go straight to powering on. + * meaning we can go straight to powering on. We must + * however wake_up(poweroff_wait) in case someone was + * waiting for poweroff_wait_in_progress to become false. */ pm->backend.invoke_poweroff_wait_wq_when_l2_off = false; pm->backend.poweroff_wait_in_progress = false; @@ -130,6 +134,7 @@ void kbase_pm_update_active(struct kbase_device *kbdev) #endif spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + wake_up(&kbdev->pm.backend.poweroff_wait); kbase_pm_do_poweron(kbdev, false); } } else { @@ -293,6 +298,10 @@ void kbase_pm_set_policy(struct kbase_device *kbdev, unsigned int new_policy_csf_pm_sched_flags; bool sched_suspend; bool reset_gpu = false; + bool reset_op_prevented = true; + struct kbase_csf_scheduler *scheduler = NULL; + u32 pwroff; + bool switching_to_always_on; #endif KBASE_DEBUG_ASSERT(kbdev != NULL); @@ -301,9 +310,33 @@ void kbase_pm_set_policy(struct kbase_device *kbdev, KBASE_KTRACE_ADD(kbdev, PM_SET_POLICY, NULL, new_policy->id); #if MALI_USE_CSF + pwroff = kbase_csf_firmware_get_mcu_core_pwroff_time(kbdev); + switching_to_always_on = new_policy == &kbase_pm_always_on_policy_ops; + if (pwroff == 0 && !switching_to_always_on) { + dev_warn(kbdev->dev, + "power_policy: cannot switch away from always_on with mcu_shader_pwroff_timeout set to 0\n"); + dev_warn(kbdev->dev, + "power_policy: resetting mcu_shader_pwroff_timeout to default value to switch policy from always_on\n"); + kbase_csf_firmware_reset_mcu_core_pwroff_time(kbdev); + } + + scheduler = &kbdev->csf.scheduler; + KBASE_DEBUG_ASSERT(scheduler != NULL); + /* Serialize calls on kbase_pm_set_policy() */ mutex_lock(&kbdev->pm.backend.policy_change_lock); + if (kbase_reset_gpu_prevent_and_wait(kbdev)) { + dev_warn(kbdev->dev, "Set PM policy failing to prevent gpu reset"); + reset_op_prevented = false; + } + + /* In case of CSF, the scheduler may be invoked to suspend. In that + * case, there is a risk that the L2 may be turned on by the time we + * check it here. So we hold the scheduler lock to avoid other operations + * interfering with the policy change and vice versa. 
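The "power-policy"/"power_policy" lookup above follows the same devicetree convention as the quirks properties earlier in this patch: prefer the dash spelling and fall back to the legacy underscore spelling. Since of_property_read_string() returns 0 on success, the second lookup only runs when the first name is absent; a minimal sketch:

	const char *name;

	if (!of_property_read_string(np, "power-policy", &name) ||
	    !of_property_read_string(np, "power_policy", &name))
		dev_info(kbdev->dev, "power policy from DT: %s\n", name);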
+ */ + rt_mutex_lock(&scheduler->lock); spin_lock_irqsave(&kbdev->hwaccess_lock, flags); /* policy_change_clamp_state_to_off, when needed, is set/cleared in * this function, a very limited temporal scope for covering the @@ -316,23 +349,22 @@ void kbase_pm_set_policy(struct kbase_device *kbdev, * the always_on policy, reflected by the CSF_DYNAMIC_PM_CORE_KEEP_ON * flag bit. */ - sched_suspend = kbdev->csf.firmware_inited && + sched_suspend = reset_op_prevented && (CSF_DYNAMIC_PM_CORE_KEEP_ON & - (new_policy_csf_pm_sched_flags | - kbdev->pm.backend.csf_pm_sched_flags)); + (new_policy_csf_pm_sched_flags | kbdev->pm.backend.csf_pm_sched_flags)); spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); - if (sched_suspend) - kbase_csf_scheduler_pm_suspend(kbdev); + if (sched_suspend) { + /* Update the suspend flag to reflect actually suspend being done ! */ + sched_suspend = !kbase_csf_scheduler_pm_suspend_no_lock(kbdev); + /* Set the reset recovery flag if the required suspend failed */ + reset_gpu = !sched_suspend; + } spin_lock_irqsave(&kbdev->hwaccess_lock, flags); - /* If the current active policy is always_on, one needs to clamp the - * MCU/L2 for reaching off-state - */ - if (sched_suspend) - kbdev->pm.backend.policy_change_clamp_state_to_off = - CSF_DYNAMIC_PM_CORE_KEEP_ON & kbdev->pm.backend.csf_pm_sched_flags; + + kbdev->pm.backend.policy_change_clamp_state_to_off = sched_suspend; kbase_pm_update_state(kbdev); spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); @@ -392,13 +424,19 @@ void kbase_pm_set_policy(struct kbase_device *kbdev, #if MALI_USE_CSF /* Reverse the suspension done */ + if (sched_suspend) + kbase_csf_scheduler_pm_resume_no_lock(kbdev); + rt_mutex_unlock(&scheduler->lock); + + if (reset_op_prevented) + kbase_reset_gpu_allow(kbdev); + if (reset_gpu) { dev_warn(kbdev->dev, "Resorting to GPU reset for policy change\n"); if (kbase_prepare_to_reset_gpu(kbdev, RESET_FLAGS_NONE)) kbase_reset_gpu(kbdev); kbase_reset_gpu_wait(kbdev); - } else if (sched_suspend) - kbase_csf_scheduler_pm_resume(kbdev); + } mutex_unlock(&kbdev->pm.backend.policy_change_lock); #endif diff --git a/mali_kbase/backend/gpu/mali_kbase_time.c b/mali_kbase/backend/gpu/mali_kbase_time.c index a83206a..28365c0 100644 --- a/mali_kbase/backend/gpu/mali_kbase_time.c +++ b/mali_kbase/backend/gpu/mali_kbase_time.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2014-2016, 2018-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2014-2023 ARM Limited. All rights reserved. 
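The reset handling added to kbase_pm_set_policy() above brackets the policy change with reset prevention so the change cannot overlap a concurrent GPU reset. A simplified sketch of that bracket (error handling trimmed; 0 from kbase_reset_gpu_prevent_and_wait() means the reset was successfully prevented):

	if (!kbase_reset_gpu_prevent_and_wait(kbdev)) {
		/* ... work that must not race a GPU reset ... */
		kbase_reset_gpu_allow(kbdev);
	} else {
		dev_warn(kbdev->dev, "could not prevent GPU reset, skipping");
	}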
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -21,9 +21,47 @@ #include <mali_kbase.h> #include <mali_kbase_hwaccess_time.h> +#if MALI_USE_CSF +#include <asm/arch_timer.h> +#include <linux/gcd.h> +#include <csf/mali_kbase_csf_timeout.h> +#endif #include <device/mali_kbase_device.h> #include <backend/gpu/mali_kbase_pm_internal.h> #include <mali_kbase_config_defaults.h> +#include <linux/version_compat_defs.h> + +struct kbase_timeout_info { + char *selector_str; + u64 timeout_cycles; +}; + +#if MALI_USE_CSF +static struct kbase_timeout_info timeout_info[KBASE_TIMEOUT_SELECTOR_COUNT] = { + [CSF_FIRMWARE_TIMEOUT] = { "CSF_FIRMWARE_TIMEOUT", MIN(CSF_FIRMWARE_TIMEOUT_CYCLES, + CSF_FIRMWARE_PING_TIMEOUT_CYCLES) }, + [CSF_PM_TIMEOUT] = { "CSF_PM_TIMEOUT", CSF_PM_TIMEOUT_CYCLES }, + [CSF_GPU_RESET_TIMEOUT] = { "CSF_GPU_RESET_TIMEOUT", CSF_GPU_RESET_TIMEOUT_CYCLES }, + [CSF_CSG_SUSPEND_TIMEOUT] = { "CSF_CSG_SUSPEND_TIMEOUT", CSF_CSG_SUSPEND_TIMEOUT_CYCLES }, + [CSF_FIRMWARE_BOOT_TIMEOUT] = { "CSF_FIRMWARE_BOOT_TIMEOUT", + CSF_FIRMWARE_BOOT_TIMEOUT_CYCLES }, + [CSF_FIRMWARE_PING_TIMEOUT] = { "CSF_FIRMWARE_PING_TIMEOUT", + CSF_FIRMWARE_PING_TIMEOUT_CYCLES }, + [CSF_SCHED_PROTM_PROGRESS_TIMEOUT] = { "CSF_SCHED_PROTM_PROGRESS_TIMEOUT", + DEFAULT_PROGRESS_TIMEOUT_CYCLES }, + [MMU_AS_INACTIVE_WAIT_TIMEOUT] = { "MMU_AS_INACTIVE_WAIT_TIMEOUT", + MMU_AS_INACTIVE_WAIT_TIMEOUT_CYCLES }, + [KCPU_FENCE_SIGNAL_TIMEOUT] = { "KCPU_FENCE_SIGNAL_TIMEOUT", + KCPU_FENCE_SIGNAL_TIMEOUT_CYCLES }, +}; +#else +static struct kbase_timeout_info timeout_info[KBASE_TIMEOUT_SELECTOR_COUNT] = { + [MMU_AS_INACTIVE_WAIT_TIMEOUT] = { "MMU_AS_INACTIVE_WAIT_TIMEOUT", + MMU_AS_INACTIVE_WAIT_TIMEOUT_CYCLES }, + [JM_DEFAULT_JS_FREE_TIMEOUT] = { "JM_DEFAULT_JS_FREE_TIMEOUT", + JM_DEFAULT_JS_FREE_TIMEOUT_CYCLES }, +}; +#endif void kbase_backend_get_gpu_time_norequest(struct kbase_device *kbdev, u64 *cycle_counter, @@ -103,72 +141,132 @@ void kbase_backend_get_gpu_time(struct kbase_device *kbdev, u64 *cycle_counter, #endif } -unsigned int kbase_get_timeout_ms(struct kbase_device *kbdev, - enum kbase_timeout_selector selector) +static u64 kbase_device_get_scaling_frequency(struct kbase_device *kbdev) { + u64 freq_khz = kbdev->lowest_gpu_freq_khz; + + if (!freq_khz) { + dev_dbg(kbdev->dev, + "Lowest frequency uninitialized! 
Using reference frequency for scaling"); + return DEFAULT_REF_TIMEOUT_FREQ_KHZ; + } + + return freq_khz; +} + +void kbase_device_set_timeout_ms(struct kbase_device *kbdev, enum kbase_timeout_selector selector, + unsigned int timeout_ms) +{ + char *selector_str; + + if (unlikely(selector >= KBASE_TIMEOUT_SELECTOR_COUNT)) { + selector = KBASE_DEFAULT_TIMEOUT; + selector_str = timeout_info[selector].selector_str; + dev_warn(kbdev->dev, + "Unknown timeout selector passed, falling back to default: %s\n", + timeout_info[selector].selector_str); + } + selector_str = timeout_info[selector].selector_str; + + kbdev->backend_time.device_scaled_timeouts[selector] = timeout_ms; + dev_dbg(kbdev->dev, "\t%-35s: %ums\n", selector_str, timeout_ms); +} + +void kbase_device_set_timeout(struct kbase_device *kbdev, enum kbase_timeout_selector selector, + u64 timeout_cycles, u32 cycle_multiplier) +{ + u64 final_cycles; + u64 timeout; + u64 freq_khz = kbase_device_get_scaling_frequency(kbdev); + + if (unlikely(selector >= KBASE_TIMEOUT_SELECTOR_COUNT)) { + selector = KBASE_DEFAULT_TIMEOUT; + dev_warn(kbdev->dev, + "Unknown timeout selector passed, falling back to default: %s\n", + timeout_info[selector].selector_str); + } + + /* If the multiplication overflows, we will have unsigned wrap-around, and so might + * end up with a shorter timeout. In those cases, we then want to have the largest + * timeout possible that will not run into these issues. Note that this will not + * wait for U64_MAX/frequency ms, as it will be clamped to a max of UINT_MAX + * milliseconds by subsequent steps. + */ + if (check_mul_overflow(timeout_cycles, (u64)cycle_multiplier, &final_cycles)) + final_cycles = U64_MAX; + /* Timeout calculation: * dividing number of cycles by freq in KHz automatically gives value * in milliseconds. nr_cycles will have to be multiplied by 1e3 to * get result in microseconds, and 1e6 to get result in nanoseconds. */ + timeout = div_u64(final_cycles, freq_khz); - u64 timeout, nr_cycles = 0; - /* Default value to mean 'no cap' */ - u64 timeout_cap = U64_MAX; - u64 freq_khz = kbdev->lowest_gpu_freq_khz; - /* Only for debug messages, safe default in case it's mis-maintained */ - const char *selector_str = "(unknown)"; + if (unlikely(timeout > UINT_MAX)) { + dev_dbg(kbdev->dev, + "Capping excessive timeout %llums for %s at freq %llukHz to UINT_MAX ms", + timeout, timeout_info[selector].selector_str, + kbase_device_get_scaling_frequency(kbdev)); + timeout = UINT_MAX; + } - WARN_ON(!freq_khz); + kbase_device_set_timeout_ms(kbdev, selector, (unsigned int)timeout); +} - switch (selector) { - case KBASE_TIMEOUT_SELECTOR_COUNT: - default: -#if !MALI_USE_CSF - WARN(1, "Invalid timeout selector used! Using default value"); - nr_cycles = JM_DEFAULT_TIMEOUT_CYCLES; - break; -#else - /* Use Firmware timeout if invalid selection */ - WARN(1, - "Invalid timeout selector used! Using CSF Firmware timeout"); - fallthrough; - case CSF_FIRMWARE_TIMEOUT: - selector_str = "CSF_FIRMWARE_TIMEOUT"; - nr_cycles = CSF_FIRMWARE_TIMEOUT_CYCLES; - /* Setup a cap on CSF FW timeout to FIRMWARE_PING_INTERVAL_MS, - * if calculated timeout exceeds it. This should be adapted to - * a direct timeout comparison once the - * FIRMWARE_PING_INTERVAL_MS option is added to this timeout - * function. A compile-time check such as BUILD_BUG_ON can also - * be done once the firmware ping interval in cycles becomes - * available as a macro. 
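The cycle-to-millisecond scaling above works because dividing a cycle count by a frequency expressed in kHz yields milliseconds directly. A worked example with made-up numbers (the real cycle budgets live in the *_CYCLES macros):

	u64 cycles = 250000000ULL;	/* hypothetical cycle budget */
	u32 freq_khz = 100000;		/* hypothetical lowest GPU clock, 100 MHz */
	u64 timeout_ms = div_u64(cycles, freq_khz);	/* 250000000 / 100000 == 2500 ms */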
+/** + * kbase_timeout_scaling_init - Initialize the table of scaled timeout + * values associated with a @kbase_device. + * + * @kbdev: KBase device pointer. + * + * Return: 0 on success, negative error code otherwise. + */ +static int kbase_timeout_scaling_init(struct kbase_device *kbdev) +{ + int err; + enum kbase_timeout_selector selector; + + /* First, we initialize the minimum and maximum device frequencies, which + * are used to compute the timeouts. + */ + err = kbase_pm_gpu_freq_init(kbdev); + if (unlikely(err < 0)) { + dev_dbg(kbdev->dev, "Could not initialize GPU frequency\n"); + return err; + } + + dev_dbg(kbdev->dev, "Scaling kbase timeouts:\n"); + for (selector = 0; selector < KBASE_TIMEOUT_SELECTOR_COUNT; selector++) { + u32 cycle_multiplier = 1; + u64 nr_cycles = timeout_info[selector].timeout_cycles; +#if MALI_USE_CSF + /* Special case: the scheduler progress timeout can be set manually, + * and does not have a canonical length defined in the headers. Hence, + * we query it once upon startup to get a baseline, and change it upon + * every invocation of the appropriate functions */ - timeout_cap = FIRMWARE_PING_INTERVAL_MS; - break; - case CSF_PM_TIMEOUT: - selector_str = "CSF_PM_TIMEOUT"; - nr_cycles = CSF_PM_TIMEOUT_CYCLES; - break; - case CSF_GPU_RESET_TIMEOUT: - selector_str = "CSF_GPU_RESET_TIMEOUT"; - nr_cycles = CSF_GPU_RESET_TIMEOUT_CYCLES; - break; + if (selector == CSF_SCHED_PROTM_PROGRESS_TIMEOUT) + nr_cycles = kbase_csf_timeout_get(kbdev); #endif + + /* Since we are in control of the iteration bounds for the selector, + * we don't have to worry about bounds checking when setting the timeout. + */ + kbase_device_set_timeout(kbdev, selector, nr_cycles, cycle_multiplier); } + return 0; +} - timeout = div_u64(nr_cycles, freq_khz); - if (timeout > timeout_cap) { - dev_dbg(kbdev->dev, "Capped %s %llu to %llu", selector_str, - (unsigned long long)timeout, (unsigned long long)timeout_cap); - timeout = timeout_cap; +unsigned int kbase_get_timeout_ms(struct kbase_device *kbdev, enum kbase_timeout_selector selector) +{ + if (unlikely(selector >= KBASE_TIMEOUT_SELECTOR_COUNT)) { + dev_warn(kbdev->dev, "Querying wrong selector, falling back to default\n"); + selector = KBASE_DEFAULT_TIMEOUT; } - if (WARN(timeout > UINT_MAX, - "Capping excessive timeout %llums for %s at freq %llukHz to UINT_MAX ms", - (unsigned long long)timeout, selector_str, (unsigned long long)freq_khz)) - timeout = UINT_MAX; - return (unsigned int)timeout; + + return kbdev->backend_time.device_scaled_timeouts[selector]; } +KBASE_EXPORT_TEST_API(kbase_get_timeout_ms); u64 kbase_backend_get_cycle_cnt(struct kbase_device *kbdev) { @@ -186,3 +284,79 @@ u64 kbase_backend_get_cycle_cnt(struct kbase_device *kbdev) return lo | (((u64) hi1) << 32); } + +#if MALI_USE_CSF +u64 __maybe_unused kbase_backend_time_convert_gpu_to_cpu(struct kbase_device *kbdev, u64 gpu_ts) +{ + if (WARN_ON(!kbdev)) + return 0; + + return div64_u64(gpu_ts * kbdev->backend_time.multiplier, kbdev->backend_time.divisor) + + kbdev->backend_time.offset; +} + +/** + * get_cpu_gpu_time() - Get current CPU and GPU timestamps. + * + * @kbdev: Kbase device. + * @cpu_ts: Output CPU timestamp. + * @gpu_ts: Output GPU timestamp. + * @gpu_cycle: Output GPU cycle counts. 
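kbase_backend_time_convert_gpu_to_cpu() above applies a linear mapping, cpu_ns = gpu_ts * multiplier / divisor + offset, where multiplier and divisor are NSEC_PER_SEC and the arch timer frequency reduced by their GCD (computed in kbase_backend_time_init() further below). Illustrative arithmetic assuming a hypothetical 26 MHz arch timer:

	u64 freq = 26000000ULL;				/* hypothetical CNTFRQ */
	u64 common = gcd(NSEC_PER_SEC, freq);		/* 2,000,000 */
	u64 multiplier = div64_u64(NSEC_PER_SEC, common);	/* 500 */
	u64 divisor = div64_u64(freq, common);			/* 13 */

Reducing by the GCD first keeps gpu_ts * multiplier within 64 bits for as long as possible.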
+ */ +static void get_cpu_gpu_time(struct kbase_device *kbdev, u64 *cpu_ts, u64 *gpu_ts, u64 *gpu_cycle) +{ + struct timespec64 ts; + + kbase_backend_get_gpu_time(kbdev, gpu_cycle, gpu_ts, &ts); + + if (cpu_ts) + *cpu_ts = ts.tv_sec * NSEC_PER_SEC + ts.tv_nsec; +} +#endif + +int kbase_backend_time_init(struct kbase_device *kbdev) +{ + int err = 0; +#if MALI_USE_CSF + u64 cpu_ts = 0; + u64 gpu_ts = 0; + u64 freq; + u64 common_factor; + + kbase_pm_register_access_enable(kbdev); + get_cpu_gpu_time(kbdev, &cpu_ts, &gpu_ts, NULL); + freq = arch_timer_get_cntfrq(); + + if (!freq) { + dev_warn(kbdev->dev, "arch_timer_get_rate() is zero!"); + err = -EINVAL; + goto disable_registers; + } + + common_factor = gcd(NSEC_PER_SEC, freq); + + kbdev->backend_time.multiplier = div64_u64(NSEC_PER_SEC, common_factor); + kbdev->backend_time.divisor = div64_u64(freq, common_factor); + + if (!kbdev->backend_time.divisor) { + dev_warn(kbdev->dev, "CPU to GPU divisor is zero!"); + err = -EINVAL; + goto disable_registers; + } + + kbdev->backend_time.offset = cpu_ts - div64_u64(gpu_ts * kbdev->backend_time.multiplier, + kbdev->backend_time.divisor); +#endif + + if (kbase_timeout_scaling_init(kbdev)) { + dev_warn(kbdev->dev, "Could not initialize timeout scaling"); + err = -EINVAL; + } + +#if MALI_USE_CSF +disable_registers: + kbase_pm_register_access_disable(kbdev); +#endif + + return err; +} diff --git a/mali_kbase/build.bp b/mali_kbase/build.bp index 5dd5fd5..381b1fe 100644 --- a/mali_kbase/build.bp +++ b/mali_kbase/build.bp @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2017-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2017-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -28,10 +28,11 @@ bob_defaults { defaults: [ "kernel_defaults", ], - no_mali: { + mali_no_mali: { kbuild_options: [ "CONFIG_MALI_NO_MALI=y", "CONFIG_MALI_NO_MALI_DEFAULT_GPU={{.gpu}}", + "CONFIG_GPU_HWVER={{.hwver}}", ], }, mali_platform_dt_pin_rst: { @@ -52,9 +53,6 @@ bob_defaults { mali_midgard_enable_trace: { kbuild_options: ["CONFIG_MALI_MIDGARD_ENABLE_TRACE=y"], }, - mali_dma_fence: { - kbuild_options: ["CONFIG_MALI_DMA_FENCE=y"], - }, mali_arbiter_support: { kbuild_options: ["CONFIG_MALI_ARBITER_SUPPORT=y"], }, @@ -64,8 +62,14 @@ bob_defaults { mali_dma_buf_legacy_compat: { kbuild_options: ["CONFIG_MALI_DMA_BUF_LEGACY_COMPAT=y"], }, - mali_2mb_alloc: { - kbuild_options: ["CONFIG_MALI_2MB_ALLOC=y"], + large_page_alloc_override: { + kbuild_options: ["CONFIG_LARGE_PAGE_ALLOC_OVERRIDE=y"], + }, + large_page_alloc: { + kbuild_options: ["CONFIG_LARGE_PAGE_ALLOC=y"], + }, + page_migration_support: { + kbuild_options: ["CONFIG_PAGE_MIGRATION_SUPPORT=y"], }, mali_memory_fully_backed: { kbuild_options: ["CONFIG_MALI_MEMORY_FULLY_BACKED=y"], @@ -88,9 +92,6 @@ bob_defaults { mali_error_inject: { kbuild_options: ["CONFIG_MALI_ERROR_INJECT=y"], }, - mali_gem5_build: { - kbuild_options: ["CONFIG_MALI_GEM5_BUILD=y"], - }, mali_debug: { kbuild_options: [ "CONFIG_MALI_DEBUG=y", @@ -136,6 +137,27 @@ bob_defaults { mali_hw_errata_1485982_use_clock_alternative: { kbuild_options: ["CONFIG_MALI_HW_ERRATA_1485982_USE_CLOCK_ALTERNATIVE=y"], }, + mali_host_controls_sc_rails: { + kbuild_options: ["CONFIG_MALI_HOST_CONTROLS_SC_RAILS=y"], + }, + platform_is_fpga: { + kbuild_options: ["CONFIG_MALI_IS_FPGA=y"], + }, + mali_coresight: { + kbuild_options: 
["CONFIG_MALI_CORESIGHT=y"], + }, + mali_fw_trace_mode_manual: { + kbuild_options: ["CONFIG_MALI_FW_TRACE_MODE_MANUAL=y"], + }, + mali_fw_trace_mode_auto_print: { + kbuild_options: ["CONFIG_MALI_FW_TRACE_MODE_AUTO_PRINT=y"], + }, + mali_fw_trace_mode_auto_discard: { + kbuild_options: ["CONFIG_MALI_FW_TRACE_MODE_AUTO_DISCARD=y"], + }, + mali_trace_power_gpu_work_period: { + kbuild_options: ["CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD=y"], + }, kbuild_options: [ "CONFIG_MALI_PLATFORM_NAME={{.mali_platform_name}}", "MALI_CUSTOMER_RELEASE={{.release}}", @@ -156,10 +178,8 @@ bob_defaults { // is an umbrella feature that would be open for inappropriate use // (catch-all for experimental CS code without separating it into // different features). - "MALI_INCREMENTAL_RENDERING={{.incremental_rendering}}", - "MALI_GPU_TIMESTAMP_CORRECTION={{.gpu_timestamp_correction}}", + "MALI_INCREMENTAL_RENDERING_JM={{.incremental_rendering_jm}}", "MALI_BASE_CSF_PERFORMANCE_TESTS={{.base_csf_performance_tests}}", - "MALI_GPU_TIMESTAMP_INTERPOLATION={{.gpu_timestamp_interpolation}}", ], } @@ -178,6 +198,10 @@ bob_kernel_module { "context/*.c", "context/*.h", "context/Kbuild", + "hwcnt/*.c", + "hwcnt/*.h", + "hwcnt/backend/*.h", + "hwcnt/Kbuild", "ipa/*.c", "ipa/*.h", "ipa/Kbuild", @@ -185,6 +209,15 @@ bob_kernel_module { "platform/*/*.c", "platform/*/*.h", "platform/*/Kbuild", + "platform/*/*/*.c", + "platform/*/*/*.h", + "platform/*/*/Kbuild", + "platform/*/*/*.c", + "platform/*/*/*.h", + "platform/*/*/Kbuild", + "platform/*/*/*/*.c", + "platform/*/*/*/*.h", + "platform/*/*/*/Kbuild", "thirdparty/*.c", "thirdparty/Kbuild", "debug/*.c", @@ -211,6 +244,10 @@ bob_kernel_module { "device/backend/*_jm.c", "gpu/backend/*_jm.c", "gpu/backend/*_jm.h", + "hwcnt/backend/*_jm.c", + "hwcnt/backend/*_jm.h", + "hwcnt/backend/*_jm_*.c", + "hwcnt/backend/*_jm_*.h", "jm/*.h", "tl/backend/*_jm.c", "mmu/backend/*_jm.c", @@ -232,6 +269,10 @@ bob_kernel_module { "device/backend/*_csf.c", "gpu/backend/*_csf.c", "gpu/backend/*_csf.h", + "hwcnt/backend/*_csf.c", + "hwcnt/backend/*_csf.h", + "hwcnt/backend/*_csf_*.c", + "hwcnt/backend/*_csf_*.h", "tl/backend/*_csf.c", "mmu/backend/*_csf.c", "ipa/backend/*_csf.c", diff --git a/mali_kbase/context/backend/mali_kbase_context_csf.c b/mali_kbase/context/backend/mali_kbase_context_csf.c index 7d45a08..45a5a6c 100644 --- a/mali_kbase/context/backend/mali_kbase_context_csf.c +++ b/mali_kbase/context/backend/mali_kbase_context_csf.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2023 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -26,27 +26,33 @@ #include <context/mali_kbase_context_internal.h> #include <gpu/mali_kbase_gpu_regmap.h> #include <mali_kbase.h> -#include <mali_kbase_dma_fence.h> #include <mali_kbase_mem_linux.h> #include <mali_kbase_mem_pool_group.h> #include <mmu/mali_kbase_mmu.h> #include <tl/mali_kbase_timeline.h> +#include <backend/gpu/mali_kbase_pm_internal.h> #if IS_ENABLED(CONFIG_DEBUG_FS) #include <csf/mali_kbase_csf_csg_debugfs.h> #include <csf/mali_kbase_csf_kcpu_debugfs.h> +#include <csf/mali_kbase_csf_sync_debugfs.h> #include <csf/mali_kbase_csf_tiler_heap_debugfs.h> #include <csf/mali_kbase_csf_cpu_queue_debugfs.h> #include <mali_kbase_debug_mem_view.h> +#include <mali_kbase_debug_mem_zones.h> +#include <mali_kbase_debug_mem_allocs.h> #include <mali_kbase_mem_pool_debugfs.h> void kbase_context_debugfs_init(struct kbase_context *const kctx) { kbase_debug_mem_view_init(kctx); + kbase_debug_mem_zones_init(kctx); + kbase_debug_mem_allocs_init(kctx); kbase_mem_pool_debugfs_init(kctx->kctx_dentry, kctx); kbase_jit_debugfs_init(kctx); kbase_csf_queue_group_debugfs_init(kctx); kbase_csf_kcpu_debugfs_init(kctx); + kbase_csf_sync_debugfs_init(kctx); kbase_csf_tiler_heap_debugfs_init(kctx); kbase_csf_tiler_heap_total_debugfs_init(kctx); kbase_csf_cpu_queue_debugfs_init(kctx); @@ -96,6 +102,8 @@ static const struct kbase_context_init context_init[] = { { kbase_sticky_resource_init, kbase_context_sticky_resource_term, "Sticky resource initialization failed" }, { kbase_jit_init, kbase_jit_term, "JIT initialization failed" }, + { kbasep_platform_context_init, kbasep_platform_context_term, + "Platform callback for kctx initialization failed" }, { kbase_csf_ctx_init, kbase_csf_ctx_term, "CSF context initialization failed" }, { kbase_context_add_to_dev_list, kbase_context_remove_from_dev_list, @@ -116,7 +124,7 @@ struct kbase_context *kbase_create_context(struct kbase_device *kbdev, bool is_compat, base_context_create_flags const flags, unsigned long const api_version, - struct file *const filp) + struct kbase_file *const kfile) { struct kbase_context *kctx; unsigned int i = 0; @@ -135,9 +143,11 @@ struct kbase_context *kbase_create_context(struct kbase_device *kbdev, kctx->kbdev = kbdev; kctx->api_version = api_version; - kctx->filp = filp; + kctx->kfile = kfile; kctx->create_flags = flags; + memcpy(kctx->comm, current->comm, sizeof(current->comm)); + if (is_compat) kbase_ctx_flag_set(kctx, KCTX_COMPAT); #if defined(CONFIG_64BIT) @@ -172,6 +182,7 @@ KBASE_EXPORT_SYMBOL(kbase_create_context); void kbase_destroy_context(struct kbase_context *kctx) { struct kbase_device *kbdev; + int err; if (WARN_ON(!kctx)) return; @@ -192,6 +203,27 @@ void kbase_destroy_context(struct kbase_context *kctx) wait_event(kbdev->pm.resume_wait, !kbase_pm_is_suspending(kbdev)); } + /* + * Taking a pm reference does not guarantee that the GPU has finished powering up. + * It's possible that the power up has been deferred until after a scheduled power down. + * We must wait here for the L2 to be powered up, and holding a pm reference guarantees that + * it will not be powered down afterwards. + */ + err = kbase_pm_wait_for_l2_powered(kbdev); + if (err) { + dev_err(kbdev->dev, "Wait for L2 power up failed on term of ctx %d_%d", + kctx->tgid, kctx->id); + } + + /* Have synchronized against the System suspend and incremented the + * pm.active_count. 
So any subsequent invocation of System suspend + * callback would get blocked. + * If System suspend callback was already in progress then the above loop + * would have waited till the System resume callback has begun. + * So wait for the System resume callback to also complete as we want to + * avoid context termination during System resume also. + */ + wait_event(kbdev->pm.resume_wait, !kbase_pm_is_resuming(kbdev)); kbase_mem_pool_group_mark_dying(&kctx->mem_pools); diff --git a/mali_kbase/context/backend/mali_kbase_context_jm.c b/mali_kbase/context/backend/mali_kbase_context_jm.c index 74402ec..39595d9 100644 --- a/mali_kbase/context/backend/mali_kbase_context_jm.c +++ b/mali_kbase/context/backend/mali_kbase_context_jm.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2019-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -27,7 +27,6 @@ #include <gpu/mali_kbase_gpu_regmap.h> #include <mali_kbase.h> #include <mali_kbase_ctx_sched.h> -#include <mali_kbase_dma_fence.h> #include <mali_kbase_kinstr_jm.h> #include <mali_kbase_mem_linux.h> #include <mali_kbase_mem_pool_group.h> @@ -36,11 +35,15 @@ #if IS_ENABLED(CONFIG_DEBUG_FS) #include <mali_kbase_debug_mem_view.h> +#include <mali_kbase_debug_mem_zones.h> +#include <mali_kbase_debug_mem_allocs.h> #include <mali_kbase_mem_pool_debugfs.h> void kbase_context_debugfs_init(struct kbase_context *const kctx) { kbase_debug_mem_view_init(kctx); + kbase_debug_mem_zones_init(kctx); + kbase_debug_mem_allocs_init(kctx); kbase_mem_pool_debugfs_init(kctx->kctx_dentry, kctx); kbase_jit_debugfs_init(kctx); kbasep_jd_debugfs_ctx_init(kctx); @@ -126,8 +129,6 @@ static const struct kbase_context_init context_init[] = { { NULL, kbase_context_free, NULL }, { kbase_context_common_init, kbase_context_common_term, "Common context initialization failed" }, - { kbase_dma_fence_init, kbase_dma_fence_term, - "DMA fence initialization failed" }, { kbase_context_mem_pool_group_init, kbase_context_mem_pool_group_term, "Memory pool group initialization failed" }, { kbase_mem_evictable_init, kbase_mem_evictable_deinit, @@ -157,11 +158,11 @@ static const struct kbase_context_init context_init[] = { kbase_debug_job_fault_context_term, "Job fault context initialization failed" }, #endif + { kbasep_platform_context_init, kbasep_platform_context_term, + "Platform callback for kctx initialization failed" }, { NULL, kbase_context_flush_jobs, NULL }, { kbase_context_add_to_dev_list, kbase_context_remove_from_dev_list, "Adding kctx to device failed" }, - { kbasep_platform_context_init, kbasep_platform_context_term, - "Platform callback for kctx initialization failed" }, }; static void kbase_context_term_partial( @@ -178,7 +179,7 @@ struct kbase_context *kbase_create_context(struct kbase_device *kbdev, bool is_compat, base_context_create_flags const flags, unsigned long const api_version, - struct file *const filp) + struct kbase_file *const kfile) { struct kbase_context *kctx; unsigned int i = 0; @@ -197,7 +198,7 @@ struct kbase_context *kbase_create_context(struct kbase_device *kbdev, kctx->kbdev = kbdev; kctx->api_version = api_version; - kctx->filp = filp; + kctx->kfile = kfile; kctx->create_flags = flags; if (is_compat) @@ -257,6 +258,17 @@ void kbase_destroy_context(struct kbase_context *kctx) 
wait_event(kbdev->pm.resume_wait, !kbase_pm_is_suspending(kbdev)); } + + /* Have synchronized against the System suspend and incremented the + * pm.active_count. So any subsequent invocation of System suspend + * callback would get blocked. + * If System suspend callback was already in progress then the above loop + * would have waited till the System resume callback has begun. + * So wait for the System resume callback to also complete as we want to + * avoid context termination during System resume also. + */ + wait_event(kbdev->pm.resume_wait, !kbase_pm_is_resuming(kbdev)); + #ifdef CONFIG_MALI_ARBITER_SUPPORT atomic_dec(&kbdev->pm.gpu_users_waiting); #endif /* CONFIG_MALI_ARBITER_SUPPORT */ diff --git a/mali_kbase/context/mali_kbase_context.c b/mali_kbase/context/mali_kbase_context.c index c7d7585..d227084 100644 --- a/mali_kbase/context/mali_kbase_context.c +++ b/mali_kbase/context/mali_kbase_context.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -22,6 +22,12 @@ /* * Base kernel context APIs */ +#include <linux/version.h> +#if KERNEL_VERSION(4, 11, 0) <= LINUX_VERSION_CODE +#include <linux/sched/task.h> +#else +#include <linux/sched.h> +#endif #include <mali_kbase.h> #include <gpu/mali_kbase_gpu_regmap.h> @@ -176,17 +182,51 @@ int kbase_context_common_init(struct kbase_context *kctx) /* creating a context is considered a disjoint event */ kbase_disjoint_event(kctx->kbdev); - kctx->as_nr = KBASEP_AS_NR_INVALID; - - atomic_set(&kctx->refcount, 0); - - spin_lock_init(&kctx->mm_update_lock); kctx->process_mm = NULL; + kctx->task = NULL; atomic_set(&kctx->nonmapped_pages, 0); atomic_set(&kctx->permanent_mapped_pages, 0); kctx->tgid = current->tgid; kctx->pid = current->pid; + /* Check if this is a Userspace created context */ + if (likely(kctx->kfile)) { + struct pid *pid_struct; + + rcu_read_lock(); + pid_struct = find_get_pid(kctx->tgid); + if (likely(pid_struct)) { + struct task_struct *task = pid_task(pid_struct, PIDTYPE_PID); + + if (likely(task)) { + /* Take a reference on the task to avoid slow lookup + * later on from the page allocation loop. 
+ */ + get_task_struct(task); + kctx->task = task; + } else { + dev_err(kctx->kbdev->dev, + "Failed to get task pointer for %s/%d", + current->comm, current->pid); + err = -ESRCH; + } + + put_pid(pid_struct); + } else { + dev_err(kctx->kbdev->dev, + "Failed to get pid pointer for %s/%d", + current->comm, current->pid); + err = -ESRCH; + } + rcu_read_unlock(); + + if (unlikely(err)) + return err; + + kbase_mem_mmgrab(); + kctx->process_mm = current->mm; + } + atomic_set(&kctx->used_pages, 0); mutex_init(&kctx->reg_lock); @@ -197,7 +237,6 @@ int kbase_context_common_init(struct kbase_context *kctx) spin_lock_init(&kctx->waiting_soft_jobs_lock); INIT_LIST_HEAD(&kctx->waiting_soft_jobs); - init_waitqueue_head(&kctx->event_queue); atomic_set(&kctx->event_count, 0); #if !MALI_USE_CSF @@ -212,18 +251,23 @@ int kbase_context_common_init(struct kbase_context *kctx) atomic64_set(&kctx->num_fixed_allocs, 0); #endif + kbase_gpu_vm_lock(kctx); bitmap_copy(kctx->cookies, &cookies_mask, BITS_PER_LONG); + kbase_gpu_vm_unlock(kctx); kctx->id = atomic_add_return(1, &(kctx->kbdev->ctx_num)) - 1; mutex_lock(&kctx->kbdev->kctx_list_lock); - err = kbase_insert_kctx_to_process(kctx); - if (err) - dev_err(kctx->kbdev->dev, - "(err:%d) failed to insert kctx to kbase_process\n", err); - mutex_unlock(&kctx->kbdev->kctx_list_lock); + if (err) { + dev_err(kctx->kbdev->dev, + "(err:%d) failed to insert kctx to kbase_process", err); + if (likely(kctx->kfile)) { + mmdrop(kctx->process_mm); + put_task_struct(kctx->task); + } + } return err; } @@ -286,7 +330,9 @@ static void kbase_remove_kctx_from_process(struct kbase_context *kctx) /* Add checks, so that the terminating process Should not * hold any gpu_memory. */ + spin_lock(&kctx->kbdev->gpu_mem_usage_lock); WARN_ON(kprcs->total_gpu_pages); + spin_unlock(&kctx->kbdev->gpu_mem_usage_lock); WARN_ON(!RB_EMPTY_ROOT(&kprcs->dma_buf_root)); kobject_del(&kprcs->kobj); kobject_put(&kprcs->kobj); @@ -296,15 +342,8 @@ static void kbase_remove_kctx_from_process(struct kbase_context *kctx) void kbase_context_common_term(struct kbase_context *kctx) { - unsigned long flags; int pages; - mutex_lock(&kctx->kbdev->mmu_hw_mutex); - spin_lock_irqsave(&kctx->kbdev->hwaccess_lock, flags); - kbase_ctx_sched_remove_ctx(kctx); - spin_unlock_irqrestore(&kctx->kbdev->hwaccess_lock, flags); - mutex_unlock(&kctx->kbdev->mmu_hw_mutex); - pages = atomic_read(&kctx->used_pages); if (pages != 0) dev_warn(kctx->kbdev->dev, @@ -316,15 +355,18 @@ void kbase_context_common_term(struct kbase_context *kctx) kbase_remove_kctx_from_process(kctx); mutex_unlock(&kctx->kbdev->kctx_list_lock); + if (likely(kctx->kfile)) { + mmdrop(kctx->process_mm); + put_task_struct(kctx->task); + } + KBASE_KTRACE_ADD(kctx->kbdev, CORE_CTX_DESTROY, kctx, 0u); } int kbase_context_mem_pool_group_init(struct kbase_context *kctx) { - return kbase_mem_pool_group_init(&kctx->mem_pools, - kctx->kbdev, - &kctx->kbdev->mem_pool_defaults, - &kctx->kbdev->mem_pools); + return kbase_mem_pool_group_init(&kctx->mem_pools, kctx->kbdev, + &kctx->kbdev->mem_pool_defaults, &kctx->kbdev->mem_pools); } void kbase_context_mem_pool_group_term(struct kbase_context *kctx) diff --git a/mali_kbase/context/mali_kbase_context.h b/mali_kbase/context/mali_kbase_context.h index a0c51c9..22cb00c 100644 --- a/mali_kbase/context/mali_kbase_context.h +++ b/mali_kbase/context/mali_kbase_context.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2011-2017, 2019-2021 ARM Limited. All rights reserved. 
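For contexts created from userspace (kctx->kfile non-NULL), the code above pins both the creating task and its mm at context creation so later page-accounting paths can use them without a PID lookup; the matching put_task_struct()/mmdrop() appear in the error path and in kbase_context_common_term(). A hedged sketch of that reference-taking pattern using standard kernel APIs (pin_creator is a made-up name, not a driver function):

#include <linux/errno.h>
#include <linux/pid.h>
#include <linux/rcupdate.h>
#include <linux/sched.h>
#include <linux/sched/mm.h>
#include <linux/sched/task.h>

/* Illustrative only: take references on the current thread-group leader and its
 * mm_struct so both outlive this call. The caller must later call
 * put_task_struct() and mmdrop() in the reverse teardown path.
 */
static int pin_creator(struct task_struct **out_task, struct mm_struct **out_mm)
{
	struct pid *pid_struct;
	struct task_struct *task = NULL;

	rcu_read_lock();
	pid_struct = find_get_pid(current->tgid);
	if (pid_struct) {
		task = pid_task(pid_struct, PIDTYPE_PID);
		if (task)
			get_task_struct(task);	/* hold the task */
		put_pid(pid_struct);
	}
	rcu_read_unlock();

	if (!task)
		return -ESRCH;

	mmgrab(current->mm);	/* pin the mm_struct itself, not its pages */
	*out_task = task;
	*out_mm = current->mm;
	return 0;
}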
+ * (C) COPYRIGHT 2011-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -56,8 +56,9 @@ void kbase_context_debugfs_term(struct kbase_context *const kctx); * BASEP_CONTEXT_CREATE_KERNEL_FLAGS. * @api_version: Application program interface version, as encoded in * a single integer by the KBASE_API_VERSION macro. - * @filp: Pointer to the struct file corresponding to device file - * /dev/malixx instance, passed to the file's open method. + * @kfile: Pointer to the object representing the /dev/malixx device + * file instance. Shall be passed as NULL for internally created + * contexts. * * Up to one context can be created for each client that opens the device file * /dev/malixx. Context creation is deferred until a special ioctl() system call @@ -69,7 +70,7 @@ struct kbase_context * kbase_create_context(struct kbase_device *kbdev, bool is_compat, base_context_create_flags const flags, unsigned long api_version, - struct file *filp); + struct kbase_file *const kfile); /** * kbase_destroy_context - Destroy a kernel base context. @@ -93,6 +94,19 @@ static inline bool kbase_ctx_flag(struct kbase_context *kctx, } /** + * kbase_ctx_compat_mode - Indicate whether a kbase context needs to operate + * in compatibility mode for 32-bit userspace. + * @kctx: kbase context + * + * Return: True if needs to maintain compatibility, False otherwise. + */ +static inline bool kbase_ctx_compat_mode(struct kbase_context *kctx) +{ + return !IS_ENABLED(CONFIG_64BIT) || + (IS_ENABLED(CONFIG_64BIT) && kbase_ctx_flag(kctx, KCTX_COMPAT)); +} + +/** * kbase_ctx_flag_clear - Clear @flag on @kctx * @kctx: Pointer to kbase context * @flag: Flag to clear diff --git a/mali_kbase/csf/Kbuild b/mali_kbase/csf/Kbuild index 29983fb..c626092 100644 --- a/mali_kbase/csf/Kbuild +++ b/mali_kbase/csf/Kbuild @@ -1,6 +1,6 @@ # SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note # -# (C) COPYRIGHT 2018-2021 ARM Limited. All rights reserved. +# (C) COPYRIGHT 2018-2023 ARM Limited. All rights reserved. 
# # This program is free software and is provided to you under the terms of the # GNU General Public License version 2 as published by the Free Software @@ -31,14 +31,24 @@ mali_kbase-y += \ csf/mali_kbase_csf_reset_gpu.o \ csf/mali_kbase_csf_csg_debugfs.o \ csf/mali_kbase_csf_kcpu_debugfs.o \ + csf/mali_kbase_csf_sync_debugfs.o \ + csf/mali_kbase_csf_kcpu_fence_debugfs.o \ csf/mali_kbase_csf_protected_memory.o \ csf/mali_kbase_csf_tiler_heap_debugfs.o \ csf/mali_kbase_csf_cpu_queue_debugfs.o \ - csf/mali_kbase_csf_event.o + csf/mali_kbase_csf_event.o \ + csf/mali_kbase_csf_firmware_log.o \ + csf/mali_kbase_csf_firmware_core_dump.o \ + csf/mali_kbase_csf_tiler_heap_reclaim.o \ + csf/mali_kbase_csf_mcu_shared_reg.o -mali_kbase-$(CONFIG_MALI_REAL_HW) += csf/mali_kbase_csf_firmware.o +ifeq ($(CONFIG_MALI_NO_MALI),y) +mali_kbase-y += csf/mali_kbase_csf_firmware_no_mali.o +else +mali_kbase-y += csf/mali_kbase_csf_firmware.o +endif -mali_kbase-$(CONFIG_MALI_NO_MALI) += csf/mali_kbase_csf_firmware_no_mali.o +mali_kbase-$(CONFIG_DEBUG_FS) += csf/mali_kbase_debug_csf_fault.o ifeq ($(KBUILD_EXTMOD),) # in-tree diff --git a/mali_kbase/csf/ipa_control/mali_kbase_csf_ipa_control.c b/mali_kbase/csf/ipa_control/mali_kbase_csf_ipa_control.c index a56b689..bbf2e4e 100644 --- a/mali_kbase/csf/ipa_control/mali_kbase_csf_ipa_control.c +++ b/mali_kbase/csf/ipa_control/mali_kbase_csf_ipa_control.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2020-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -20,6 +20,7 @@ */ #include <mali_kbase.h> +#include <mali_kbase_config_defaults.h> #include "backend/gpu/mali_kbase_clk_rate_trace_mgr.h" #include "mali_kbase_csf_ipa_control.h" @@ -27,8 +28,6 @@ * Status flags from the STATUS register of the IPA Control interface. */ #define STATUS_COMMAND_ACTIVE ((u32)1 << 0) -#define STATUS_TIMER_ACTIVE ((u32)1 << 1) -#define STATUS_AUTO_ACTIVE ((u32)1 << 2) #define STATUS_PROTECTED_MODE ((u32)1 << 8) #define STATUS_RESET ((u32)1 << 9) #define STATUS_TIMER_ENABLED ((u32)1 << 31) @@ -36,27 +35,15 @@ /* * Commands for the COMMAND register of the IPA Control interface. */ -#define COMMAND_NOP ((u32)0) #define COMMAND_APPLY ((u32)1) -#define COMMAND_CLEAR ((u32)2) #define COMMAND_SAMPLE ((u32)3) #define COMMAND_PROTECTED_ACK ((u32)4) #define COMMAND_RESET_ACK ((u32)5) /* - * Default value for the TIMER register of the IPA Control interface, - * expressed in milliseconds. - * - * The chosen value is a trade off between two requirements: the IPA Control - * interface should sample counters with a resolution in the order of - * milliseconds, while keeping GPU overhead as limited as possible. - */ -#define TIMER_DEFAULT_VALUE_MS ((u32)10) /* 10 milliseconds */ - -/* * Number of timer events per second. */ -#define TIMER_EVENTS_PER_SECOND ((u32)1000 / TIMER_DEFAULT_VALUE_MS) +#define TIMER_EVENTS_PER_SECOND ((u32)1000 / IPA_CONTROL_TIMER_DEFAULT_VALUE_MS) /* * Maximum number of loops polling the GPU before we assume the GPU has hung. @@ -77,12 +64,19 @@ * struct kbase_ipa_control_listener_data - Data for the GPU clock frequency * listener * - * @listener: GPU clock frequency listener. - * @kbdev: Pointer to kbase device. + * @listener: GPU clock frequency listener. + * @kbdev: Pointer to kbase device. 
+ * @clk_chg_wq: Dedicated workqueue to process the work item corresponding to + * a clock rate notification. + * @clk_chg_work: Work item to process the clock rate change + * @rate: The latest notified rate change, in unit of Hz */ struct kbase_ipa_control_listener_data { struct kbase_clk_rate_listener listener; struct kbase_device *kbdev; + struct workqueue_struct *clk_chg_wq; + struct work_struct clk_chg_work; + atomic_t rate; }; static u32 timer_value(u32 gpu_rate) @@ -284,58 +278,61 @@ kbase_ipa_control_rate_change_notify(struct kbase_clk_rate_listener *listener, u32 clk_index, u32 clk_rate_hz) { if ((clk_index == KBASE_CLOCK_DOMAIN_TOP) && (clk_rate_hz != 0)) { - size_t i; - unsigned long flags; struct kbase_ipa_control_listener_data *listener_data = - container_of(listener, - struct kbase_ipa_control_listener_data, - listener); - struct kbase_device *kbdev = listener_data->kbdev; - struct kbase_ipa_control *ipa_ctrl = &kbdev->csf.ipa_control; + container_of(listener, struct kbase_ipa_control_listener_data, listener); - spin_lock_irqsave(&kbdev->hwaccess_lock, flags); + /* Save the rate and delegate the job to a work item */ + atomic_set(&listener_data->rate, clk_rate_hz); + queue_work(listener_data->clk_chg_wq, &listener_data->clk_chg_work); + } +} - if (!kbdev->pm.backend.gpu_ready) { - dev_err(kbdev->dev, - "%s: GPU frequency cannot change while GPU is off", - __func__); - spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); - return; - } +static void kbase_ipa_ctrl_rate_change_worker(struct work_struct *data) +{ + struct kbase_ipa_control_listener_data *listener_data = + container_of(data, struct kbase_ipa_control_listener_data, clk_chg_work); + struct kbase_device *kbdev = listener_data->kbdev; + struct kbase_ipa_control *ipa_ctrl = &kbdev->csf.ipa_control; + unsigned long flags; + u32 rate; + size_t i; - /* Interrupts are already disabled and interrupt state is also saved */ - spin_lock(&ipa_ctrl->lock); + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); - for (i = 0; i < KBASE_IPA_CONTROL_MAX_SESSIONS; i++) { - struct kbase_ipa_control_session *session = &ipa_ctrl->sessions[i]; + if (!kbdev->pm.backend.gpu_ready) { + dev_err(kbdev->dev, "%s: GPU frequency cannot change while GPU is off", __func__); + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + return; + } - if (session->active) { - size_t j; + spin_lock(&ipa_ctrl->lock); + /* Picking up the latest notified rate */ + rate = (u32)atomic_read(&listener_data->rate); - for (j = 0; j < session->num_prfcnts; j++) { - struct kbase_ipa_control_prfcnt *prfcnt = - &session->prfcnts[j]; + for (i = 0; i < KBASE_IPA_CONTROL_MAX_SESSIONS; i++) { + struct kbase_ipa_control_session *session = &ipa_ctrl->sessions[i]; - if (prfcnt->gpu_norm) - calc_prfcnt_delta(kbdev, prfcnt, true); - } - } - } + if (session->active) { + size_t j; - ipa_ctrl->cur_gpu_rate = clk_rate_hz; + for (j = 0; j < session->num_prfcnts; j++) { + struct kbase_ipa_control_prfcnt *prfcnt = &session->prfcnts[j]; - /* Update the timer for automatic sampling if active sessions - * are present. Counters have already been manually sampled. - */ - if (ipa_ctrl->num_active_sessions > 0) { - kbase_reg_write(kbdev, IPA_CONTROL_REG(TIMER), - timer_value(ipa_ctrl->cur_gpu_rate)); + if (prfcnt->gpu_norm) + calc_prfcnt_delta(kbdev, prfcnt, true); + } } + } - spin_unlock(&ipa_ctrl->lock); + ipa_ctrl->cur_gpu_rate = rate; + /* Update the timer for automatic sampling if active sessions + * are present. Counters have already been manually sampled. 
+ */ + if (ipa_ctrl->num_active_sessions > 0) + kbase_reg_write(kbdev, IPA_CONTROL_REG(TIMER), timer_value(rate)); - spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); - } + spin_unlock(&ipa_ctrl->lock); + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); } void kbase_ipa_control_init(struct kbase_device *kbdev) @@ -344,6 +341,7 @@ void kbase_ipa_control_init(struct kbase_device *kbdev) struct kbase_clk_rate_trace_manager *clk_rtm = &kbdev->pm.clk_rtm; struct kbase_ipa_control_listener_data *listener_data; size_t i, j; + unsigned long flags; for (i = 0; i < KBASE_IPA_CORE_TYPE_NUM; i++) { for (j = 0; j < KBASE_IPA_CONTROL_NUM_BLOCK_COUNTERS; j++) { @@ -362,20 +360,35 @@ void kbase_ipa_control_init(struct kbase_device *kbdev) listener_data = kmalloc(sizeof(struct kbase_ipa_control_listener_data), GFP_KERNEL); if (listener_data) { - listener_data->listener.notify = - kbase_ipa_control_rate_change_notify; - listener_data->kbdev = kbdev; - ipa_ctrl->rtm_listener_data = listener_data; - } + listener_data->clk_chg_wq = + alloc_workqueue("ipa_ctrl_wq", WQ_HIGHPRI | WQ_UNBOUND, 1); + if (listener_data->clk_chg_wq) { + INIT_WORK(&listener_data->clk_chg_work, kbase_ipa_ctrl_rate_change_worker); + listener_data->listener.notify = kbase_ipa_control_rate_change_notify; + listener_data->kbdev = kbdev; + ipa_ctrl->rtm_listener_data = listener_data; + /* Initialise to 0, which is out of normal notified rates */ + atomic_set(&listener_data->rate, 0); + } else { + dev_warn(kbdev->dev, + "%s: failed to allocate workqueue, clock rate update disabled", + __func__); + kfree(listener_data); + listener_data = NULL; + } + } else + dev_warn(kbdev->dev, + "%s: failed to allocate memory, IPA control clock rate update disabled", + __func__); - spin_lock(&clk_rtm->lock); + spin_lock_irqsave(&clk_rtm->lock, flags); if (clk_rtm->clks[KBASE_CLOCK_DOMAIN_TOP]) ipa_ctrl->cur_gpu_rate = clk_rtm->clks[KBASE_CLOCK_DOMAIN_TOP]->clock_val; if (listener_data) kbase_clk_rate_trace_manager_subscribe_no_lock( clk_rtm, &listener_data->listener); - spin_unlock(&clk_rtm->lock); + spin_unlock_irqrestore(&clk_rtm->lock, flags); } KBASE_EXPORT_TEST_API(kbase_ipa_control_init); @@ -389,8 +402,10 @@ void kbase_ipa_control_term(struct kbase_device *kbdev) WARN_ON(ipa_ctrl->num_active_sessions); - if (listener_data) + if (listener_data) { kbase_clk_rate_trace_manager_unsubscribe(clk_rtm, &listener_data->listener); + destroy_workqueue(listener_data->clk_chg_wq); + } kfree(ipa_ctrl->rtm_listener_data); spin_lock_irqsave(&kbdev->hwaccess_lock, flags); @@ -602,9 +617,10 @@ int kbase_ipa_control_register( */ for (session_idx = 0; session_idx < KBASE_IPA_CONTROL_MAX_SESSIONS; session_idx++) { - session = &ipa_ctrl->sessions[session_idx]; - if (!session->active) + if (!ipa_ctrl->sessions[session_idx].active) { + session = &ipa_ctrl->sessions[session_idx]; break; + } } if (!session) { @@ -659,7 +675,7 @@ int kbase_ipa_control_register( /* Reports to this client for GPU time spent in protected mode * should begin from the point of registration. */ - session->last_query_time = ktime_get_ns(); + session->last_query_time = ktime_get_raw_ns(); /* Initially, no time has been spent in protected mode */ session->protm_time = 0; @@ -829,7 +845,7 @@ int kbase_ipa_control_query(struct kbase_device *kbdev, const void *client, } if (protected_time) { - u64 time_now = ktime_get_ns(); + u64 time_now = ktime_get_raw_ns(); /* This is the amount of protected-mode time spent prior to * the current protm period. 
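The restructuring above takes the heavy work out of the clock-rate notifier: the callback just stores the newest rate atomically and queues a work item on a dedicated high-priority workqueue, and the worker re-reads the rate under the locks it actually needs. A minimal sketch of that defer-to-worker shape (generic type and function names, not the driver's):

#include <linux/atomic.h>
#include <linux/errno.h>
#include <linux/types.h>
#include <linux/workqueue.h>

struct rate_listener {
	struct workqueue_struct *wq;	/* dedicated high-priority, unbound queue */
	struct work_struct work;
	atomic_t rate_hz;		/* latest notified rate */
};

/* Notifier side: may run in a context where sleeping or lock ordering is awkward,
 * so only record the value and defer.
 */
static void rate_notify(struct rate_listener *l, u32 rate_hz)
{
	atomic_set(&l->rate_hz, rate_hz);
	queue_work(l->wq, &l->work);
}

/* Worker side: process context, free to take the subsystem's own locks. */
static void rate_worker(struct work_struct *work)
{
	struct rate_listener *l = container_of(work, struct rate_listener, work);
	u32 rate = (u32)atomic_read(&l->rate_hz);

	/* ... apply 'rate': update cached frequency, reprogram sampling timer ... */
	(void)rate;
}

static int rate_listener_init(struct rate_listener *l)
{
	l->wq = alloc_workqueue("rate_wq", WQ_HIGHPRI | WQ_UNBOUND, 1);
	if (!l->wq)
		return -ENOMEM;
	INIT_WORK(&l->work, rate_worker);
	atomic_set(&l->rate_hz, 0);	/* 0 is outside the range of real notifications */
	return 0;
}

Because only the latest rate matters, coalescing several notifications into one pending work item is harmless; the test helper in the next hunk uses flush_work() to make the deferred update observable before returning.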
@@ -973,16 +989,53 @@ void kbase_ipa_control_handle_gpu_reset_post(struct kbase_device *kbdev) } KBASE_EXPORT_TEST_API(kbase_ipa_control_handle_gpu_reset_post); +#ifdef KBASE_PM_RUNTIME +void kbase_ipa_control_handle_gpu_sleep_enter(struct kbase_device *kbdev) +{ + lockdep_assert_held(&kbdev->hwaccess_lock); + + if (kbdev->pm.backend.mcu_state == KBASE_MCU_IN_SLEEP) { + /* GPU Sleep is treated as a power down */ + kbase_ipa_control_handle_gpu_power_off(kbdev); + + /* SELECT_CSHW register needs to be cleared to prevent any + * IPA control message to be sent to the top level GPU HWCNT. + */ + kbase_reg_write(kbdev, IPA_CONTROL_REG(SELECT_CSHW_LO), 0); + kbase_reg_write(kbdev, IPA_CONTROL_REG(SELECT_CSHW_HI), 0); + + /* No need to issue the APPLY command here */ + } +} +KBASE_EXPORT_TEST_API(kbase_ipa_control_handle_gpu_sleep_enter); + +void kbase_ipa_control_handle_gpu_sleep_exit(struct kbase_device *kbdev) +{ + lockdep_assert_held(&kbdev->hwaccess_lock); + + if (kbdev->pm.backend.mcu_state == KBASE_MCU_IN_SLEEP) { + /* To keep things simple, currently exit from + * GPU Sleep is treated as a power on event where + * all 4 SELECT registers are reconfigured. + * On exit from sleep, reconfiguration is needed + * only for the SELECT_CSHW register. + */ + kbase_ipa_control_handle_gpu_power_on(kbdev); + } +} +KBASE_EXPORT_TEST_API(kbase_ipa_control_handle_gpu_sleep_exit); +#endif + #if MALI_UNIT_TEST void kbase_ipa_control_rate_change_notify_test(struct kbase_device *kbdev, u32 clk_index, u32 clk_rate_hz) { struct kbase_ipa_control *ipa_ctrl = &kbdev->csf.ipa_control; - struct kbase_ipa_control_listener_data *listener_data = - ipa_ctrl->rtm_listener_data; + struct kbase_ipa_control_listener_data *listener_data = ipa_ctrl->rtm_listener_data; - kbase_ipa_control_rate_change_notify(&listener_data->listener, - clk_index, clk_rate_hz); + kbase_ipa_control_rate_change_notify(&listener_data->listener, clk_index, clk_rate_hz); + /* Ensure the callback has taken effect before returning back to the test caller */ + flush_work(&listener_data->clk_chg_work); } KBASE_EXPORT_TEST_API(kbase_ipa_control_rate_change_notify_test); #endif @@ -992,14 +1045,14 @@ void kbase_ipa_control_protm_entered(struct kbase_device *kbdev) struct kbase_ipa_control *ipa_ctrl = &kbdev->csf.ipa_control; lockdep_assert_held(&kbdev->hwaccess_lock); - ipa_ctrl->protm_start = ktime_get_ns(); + ipa_ctrl->protm_start = ktime_get_raw_ns(); } void kbase_ipa_control_protm_exited(struct kbase_device *kbdev) { struct kbase_ipa_control *ipa_ctrl = &kbdev->csf.ipa_control; size_t i; - u64 time_now = ktime_get_ns(); + u64 time_now = ktime_get_raw_ns(); u32 status; lockdep_assert_held(&kbdev->hwaccess_lock); @@ -1035,4 +1088,3 @@ void kbase_ipa_control_protm_exited(struct kbase_device *kbdev) } } } - diff --git a/mali_kbase/csf/ipa_control/mali_kbase_csf_ipa_control.h b/mali_kbase/csf/ipa_control/mali_kbase_csf_ipa_control.h index 0469c48..69ff897 100644 --- a/mali_kbase/csf/ipa_control/mali_kbase_csf_ipa_control.h +++ b/mali_kbase/csf/ipa_control/mali_kbase_csf_ipa_control.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2020-2022 ARM Limited. All rights reserved. 
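The ktime_get_ns() to ktime_get_raw_ns() switches in this file move the protected-mode time accounting onto CLOCK_MONOTONIC_RAW, which is not frequency-slewed by NTP, so measured protm durations stay consistent between the entry/exit markers and later queries. The pattern, as a hedged one-liner (illustrative helper name):

#include <linux/ktime.h>
#include <linux/types.h>

/* protm_start_ns is assumed to have been captured with ktime_get_raw_ns() at
 * protected-mode entry; the difference is the un-slewed elapsed time.
 */
static u64 protm_elapsed_ns(u64 protm_start_ns)
{
	return ktime_get_raw_ns() - protm_start_ns;
}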
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -198,6 +198,33 @@ void kbase_ipa_control_handle_gpu_reset_pre(struct kbase_device *kbdev); */ void kbase_ipa_control_handle_gpu_reset_post(struct kbase_device *kbdev); +#ifdef KBASE_PM_RUNTIME +/** + * kbase_ipa_control_handle_gpu_sleep_enter - Handle the pre GPU Sleep event + * + * @kbdev: Pointer to kbase device. + * + * This function is called after MCU has been put to sleep state & L2 cache has + * been powered down. The top level part of GPU is still powered up when this + * function is called. + */ +void kbase_ipa_control_handle_gpu_sleep_enter(struct kbase_device *kbdev); + +/** + * kbase_ipa_control_handle_gpu_sleep_exit - Handle the post GPU Sleep event + * + * @kbdev: Pointer to kbase device. + * + * This function is called when L2 needs to be powered up and MCU can exit the + * sleep state. The top level part of GPU is powered up when this function is + * called. + * + * This function must be called only if kbase_ipa_control_handle_gpu_sleep_enter() + * was called previously. + */ +void kbase_ipa_control_handle_gpu_sleep_exit(struct kbase_device *kbdev); +#endif + #if MALI_UNIT_TEST /** * kbase_ipa_control_rate_change_notify_test - Notify GPU rate change diff --git a/mali_kbase/csf/mali_kbase_csf.c b/mali_kbase/csf/mali_kbase_csf.c index 1a92267..91d5c43 100644 --- a/mali_kbase/csf/mali_kbase_csf.c +++ b/mali_kbase/csf/mali_kbase_csf.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2018-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2018-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -34,10 +34,19 @@ #include <csf/ipa_control/mali_kbase_csf_ipa_control.h> #include <mali_kbase_hwaccess_time.h> #include "mali_kbase_csf_event.h" +#include <mali_linux_trace.h> +#include <linux/protected_memory_allocator.h> +#include <tl/mali_kbase_tracepoints.h> +#include "mali_kbase_csf_mcu_shared_reg.h" +#include <linux/version_compat_defs.h> #define CS_REQ_EXCEPTION_MASK (CS_REQ_FAULT_MASK | CS_REQ_FATAL_MASK) #define CS_ACK_EXCEPTION_MASK (CS_ACK_FAULT_MASK | CS_ACK_FATAL_MASK) -#define POWER_DOWN_LATEST_FLUSH_VALUE ((u32)1) + +#define CS_RING_BUFFER_MAX_SIZE ((uint32_t)(1 << 31)) /* 2GiB */ +#define CS_RING_BUFFER_MIN_SIZE ((uint32_t)4096) + +#define PROTM_ALLOC_MAX_RETRIES ((u8)5) const u8 kbasep_csf_queue_group_priority_to_relative[BASE_QUEUE_GROUP_PRIORITY_COUNT] = { KBASE_QUEUE_GROUP_PRIORITY_HIGH, @@ -52,6 +61,55 @@ const u8 kbasep_csf_relative_to_queue_group_priority[KBASE_QUEUE_GROUP_PRIORITY_ BASE_QUEUE_GROUP_PRIORITY_LOW }; +/* + * struct irq_idle_and_protm_track - Object that tracks the idle and protected mode + * request information in an interrupt case across + * groups. + * + * @protm_grp: Possibly schedulable group that requested protected mode in the interrupt. + * If NULL, no such case observed in the tracked interrupt case. + * @idle_seq: The highest priority group that notified idle. If no such instance in the + * interrupt case, marked with the largest field value: U32_MAX. + * @idle_slot: The slot number if @p idle_seq is valid in the given tracking case. 
+ */ +struct irq_idle_and_protm_track { + struct kbase_queue_group *protm_grp; + u32 idle_seq; + s8 idle_slot; +}; + +/** + * kbasep_ctx_user_reg_page_mapping_term() - Terminate resources for USER Register Page. + * + * @kctx: Pointer to the kbase context + */ +static void kbasep_ctx_user_reg_page_mapping_term(struct kbase_context *kctx) +{ + struct kbase_device *kbdev = kctx->kbdev; + + if (unlikely(kctx->csf.user_reg.vma)) + dev_err(kbdev->dev, "VMA for USER Register page exist on termination of ctx %d_%d", + kctx->tgid, kctx->id); + if (WARN_ON_ONCE(!list_empty(&kctx->csf.user_reg.link))) + list_del_init(&kctx->csf.user_reg.link); +} + +/** + * kbasep_ctx_user_reg_page_mapping_init() - Initialize resources for USER Register Page. + * + * @kctx: Pointer to the kbase context + * + * @return: 0 on success. + */ +static int kbasep_ctx_user_reg_page_mapping_init(struct kbase_context *kctx) +{ + INIT_LIST_HEAD(&kctx->csf.user_reg.link); + kctx->csf.user_reg.vma = NULL; + kctx->csf.user_reg.file_offset = 0; + + return 0; +} + static void put_user_pages_mmap_handle(struct kbase_context *kctx, struct kbase_queue *queue) { @@ -112,116 +170,32 @@ static int get_user_pages_mmap_handle(struct kbase_context *kctx, return 0; } -static void gpu_munmap_user_io_pages(struct kbase_context *kctx, - struct kbase_va_region *reg) -{ - size_t num_pages = 2; - - kbase_mmu_teardown_pages(kctx->kbdev, &kctx->kbdev->csf.mcu_mmu, - reg->start_pfn, num_pages, MCU_AS_NR); - - WARN_ON(reg->flags & KBASE_REG_FREE); - - mutex_lock(&kctx->kbdev->csf.reg_lock); - kbase_remove_va_region(kctx->kbdev, reg); - mutex_unlock(&kctx->kbdev->csf.reg_lock); -} - static void init_user_io_pages(struct kbase_queue *queue) { - u32 *input_addr = (u32 *)(queue->user_io_addr); - u32 *output_addr = (u32 *)(queue->user_io_addr + PAGE_SIZE); - - input_addr[CS_INSERT_LO/4] = 0; - input_addr[CS_INSERT_HI/4] = 0; - - input_addr[CS_EXTRACT_INIT_LO/4] = 0; - input_addr[CS_EXTRACT_INIT_HI/4] = 0; - - output_addr[CS_EXTRACT_LO/4] = 0; - output_addr[CS_EXTRACT_HI/4] = 0; - - output_addr[CS_ACTIVE/4] = 0; -} - -/* Map the input/output pages in the shared interface segment of MCU firmware - * address space. - */ -static int gpu_mmap_user_io_pages(struct kbase_device *kbdev, - struct tagged_addr *phys, struct kbase_va_region *reg) -{ - unsigned long mem_flags = KBASE_REG_GPU_RD; - const size_t num_pages = 2; - int ret; + u64 *input_addr = queue->user_io_addr; + u64 *output_addr64 = queue->user_io_addr + PAGE_SIZE / sizeof(u64); + u32 *output_addr32 = (u32 *)(queue->user_io_addr + PAGE_SIZE / sizeof(u64)); - /* Calls to this function are inherently asynchronous, with respect to - * MMU operations. + /* + * CS_INSERT and CS_EXTRACT registers contain 64-bit memory addresses which + * should be accessed atomically. Here we update them 32-bits at a time, but + * as this is initialisation code, non-atomic accesses are safe. 
*/ - const enum kbase_caller_mmu_sync_info mmu_sync_info = CALLER_MMU_ASYNC; - -#if ((KERNEL_VERSION(4, 4, 147) >= LINUX_VERSION_CODE) || \ - ((KERNEL_VERSION(4, 6, 0) > LINUX_VERSION_CODE) && \ - (KERNEL_VERSION(4, 5, 0) <= LINUX_VERSION_CODE))) - mem_flags |= - KBASE_REG_MEMATTR_INDEX(AS_MEMATTR_INDEX_NON_CACHEABLE); -#else - if (kbdev->system_coherency == COHERENCY_NONE) { - mem_flags |= - KBASE_REG_MEMATTR_INDEX(AS_MEMATTR_INDEX_NON_CACHEABLE); - } else { - mem_flags |= KBASE_REG_SHARE_BOTH | - KBASE_REG_MEMATTR_INDEX(AS_MEMATTR_INDEX_SHARED); - } -#endif - - mutex_lock(&kbdev->csf.reg_lock); - ret = kbase_add_va_region_rbtree(kbdev, reg, 0, num_pages, 1); - reg->flags &= ~KBASE_REG_FREE; - mutex_unlock(&kbdev->csf.reg_lock); - - if (ret) - return ret; - - /* Map input page */ - ret = kbase_mmu_insert_pages(kbdev, &kbdev->csf.mcu_mmu, reg->start_pfn, - &phys[0], 1, mem_flags, MCU_AS_NR, - KBASE_MEM_GROUP_CSF_IO, mmu_sync_info); - if (ret) - goto bad_insert; - - /* Map output page, it needs rw access */ - mem_flags |= KBASE_REG_GPU_WR; - ret = kbase_mmu_insert_pages(kbdev, &kbdev->csf.mcu_mmu, - reg->start_pfn + 1, &phys[1], 1, mem_flags, - MCU_AS_NR, KBASE_MEM_GROUP_CSF_IO, - mmu_sync_info); - if (ret) - goto bad_insert_output_page; - - return 0; - -bad_insert_output_page: - kbase_mmu_teardown_pages(kbdev, &kbdev->csf.mcu_mmu, - reg->start_pfn, 1, MCU_AS_NR); -bad_insert: - mutex_lock(&kbdev->csf.reg_lock); - kbase_remove_va_region(kbdev, reg); - mutex_unlock(&kbdev->csf.reg_lock); - - return ret; + input_addr[CS_INSERT_LO / sizeof(*input_addr)] = 0; + input_addr[CS_EXTRACT_INIT_LO / sizeof(*input_addr)] = 0; + output_addr64[CS_EXTRACT_LO / sizeof(*output_addr64)] = 0; + output_addr32[CS_ACTIVE / sizeof(*output_addr32)] = 0; } static void kernel_unmap_user_io_pages(struct kbase_context *kctx, struct kbase_queue *queue) { - const size_t num_pages = 2; - kbase_gpu_vm_lock(kctx); vunmap(queue->user_io_addr); - WARN_ON(num_pages > atomic_read(&kctx->permanent_mapped_pages)); - atomic_sub(num_pages, &kctx->permanent_mapped_pages); + WARN_ON(atomic_read(&kctx->permanent_mapped_pages) < KBASEP_NUM_CS_USER_IO_PAGES); + atomic_sub(KBASEP_NUM_CS_USER_IO_PAGES, &kctx->permanent_mapped_pages); kbase_gpu_vm_unlock(kctx); } @@ -231,6 +205,8 @@ static int kernel_map_user_io_pages(struct kbase_context *kctx, { struct page *page_list[2]; pgprot_t cpu_map_prot; + unsigned long flags; + uint64_t *user_io_addr; int ret = 0; size_t i; @@ -245,27 +221,25 @@ static int kernel_map_user_io_pages(struct kbase_context *kctx, /* The pages are mapped to Userspace also, so use the same mapping * attributes as used inside the CPU page fault handler. 
*/ -#if ((KERNEL_VERSION(4, 4, 147) >= LINUX_VERSION_CODE) || \ - ((KERNEL_VERSION(4, 6, 0) > LINUX_VERSION_CODE) && \ - (KERNEL_VERSION(4, 5, 0) <= LINUX_VERSION_CODE))) - cpu_map_prot = pgprot_device(PAGE_KERNEL); -#else if (kctx->kbdev->system_coherency == COHERENCY_NONE) cpu_map_prot = pgprot_writecombine(PAGE_KERNEL); else cpu_map_prot = PAGE_KERNEL; -#endif for (i = 0; i < ARRAY_SIZE(page_list); i++) page_list[i] = as_page(queue->phys[i]); - queue->user_io_addr = vmap(page_list, ARRAY_SIZE(page_list), VM_MAP, cpu_map_prot); + user_io_addr = vmap(page_list, ARRAY_SIZE(page_list), VM_MAP, cpu_map_prot); - if (!queue->user_io_addr) + if (!user_io_addr) ret = -ENOMEM; else atomic_add(ARRAY_SIZE(page_list), &kctx->permanent_mapped_pages); + kbase_csf_scheduler_spin_lock(kctx->kbdev, &flags); + queue->user_io_addr = user_io_addr; + kbase_csf_scheduler_spin_unlock(kctx->kbdev, flags); + unlock: kbase_gpu_vm_unlock(kctx); return ret; @@ -273,7 +247,7 @@ unlock: static void term_queue_group(struct kbase_queue_group *group); static void get_queue(struct kbase_queue *queue); -static void release_queue(struct kbase_queue *queue); +static bool release_queue(struct kbase_queue *queue); /** * kbase_csf_free_command_stream_user_pages() - Free the resources allocated @@ -297,70 +271,62 @@ static void release_queue(struct kbase_queue *queue); * If an explicit or implicit unbind was missed by the userspace then the * mapping will persist. On process exit kernel itself will remove the mapping. */ -static void kbase_csf_free_command_stream_user_pages(struct kbase_context *kctx, - struct kbase_queue *queue) +void kbase_csf_free_command_stream_user_pages(struct kbase_context *kctx, struct kbase_queue *queue) { - const size_t num_pages = 2; - - gpu_munmap_user_io_pages(kctx, queue->reg); kernel_unmap_user_io_pages(kctx, queue); kbase_mem_pool_free_pages( &kctx->mem_pools.small[KBASE_MEM_GROUP_CSF_IO], - num_pages, queue->phys, true, false); + KBASEP_NUM_CS_USER_IO_PAGES, queue->phys, true, false); + kbase_process_page_usage_dec(kctx, KBASEP_NUM_CS_USER_IO_PAGES); - kfree(queue->reg); - queue->reg = NULL; + /* The user_io_gpu_va should have been unmapped inside the scheduler */ + WARN_ONCE(queue->user_io_gpu_va, "Userio pages appears still have mapping"); /* If the queue has already been terminated by userspace * then the ref count for queue object will drop to 0 here. 
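kernel_map_user_io_pages() above gives the kernel a linear alias of the two per-queue user I/O pages using the same attributes the CPU fault handler applies to the userspace mapping: write-combined when the system is not I/O coherent, normal cacheable otherwise. A hedged sketch of that mapping choice (simplified; 'coherent' stands in for the driver's system_coherency check):

#include <linux/mm.h>
#include <linux/types.h>
#include <linux/vmalloc.h>

/* Map two already-allocated pages into one contiguous kernel virtual range.
 * The caller vunmap()s the returned address when the mapping is torn down.
 */
static void *map_io_pages(struct page *pages[2], bool coherent)
{
	pgprot_t prot = coherent ? PAGE_KERNEL : pgprot_writecombine(PAGE_KERNEL);

	return vmap(pages, 2, VM_MAP, prot);
}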
*/ release_queue(queue); } +KBASE_EXPORT_TEST_API(kbase_csf_free_command_stream_user_pages); -int kbase_csf_alloc_command_stream_user_pages(struct kbase_context *kctx, - struct kbase_queue *queue) +int kbase_csf_alloc_command_stream_user_pages(struct kbase_context *kctx, struct kbase_queue *queue) { struct kbase_device *kbdev = kctx->kbdev; - struct kbase_va_region *reg; - const size_t num_pages = 2; int ret; lockdep_assert_held(&kctx->csf.lock); - reg = kbase_alloc_free_region(&kctx->kbdev->csf.shared_reg_rbtree, 0, - num_pages, KBASE_REG_ZONE_MCU_SHARED); - if (!reg) + ret = kbase_mem_pool_alloc_pages(&kctx->mem_pools.small[KBASE_MEM_GROUP_CSF_IO], + KBASEP_NUM_CS_USER_IO_PAGES, + queue->phys, false, kctx->task); + if (ret != KBASEP_NUM_CS_USER_IO_PAGES) { + /* Marking both the phys to zero for indicating there is no phys allocated */ + queue->phys[0].tagged_addr = 0; + queue->phys[1].tagged_addr = 0; return -ENOMEM; - - ret = kbase_mem_pool_alloc_pages( - &kctx->mem_pools.small[KBASE_MEM_GROUP_CSF_IO], - num_pages, queue->phys, false); - - if (ret != num_pages) - goto phys_alloc_failed; + } ret = kernel_map_user_io_pages(kctx, queue); if (ret) goto kernel_map_failed; + kbase_process_page_usage_inc(kctx, KBASEP_NUM_CS_USER_IO_PAGES); init_user_io_pages(queue); - ret = gpu_mmap_user_io_pages(kctx->kbdev, queue->phys, reg); - if (ret) - goto gpu_mmap_failed; - - queue->reg = reg; + /* user_io_gpu_va is only mapped when scheduler decides to put the queue + * on slot at runtime. Initialize it to 0, signalling no mapping. + */ + queue->user_io_gpu_va = 0; mutex_lock(&kbdev->csf.reg_lock); - if (kbdev->csf.db_file_offsets > - (U32_MAX - BASEP_QUEUE_NR_MMAP_USER_PAGES + 1)) + if (kbdev->csf.db_file_offsets > (U32_MAX - BASEP_QUEUE_NR_MMAP_USER_PAGES + 1)) kbdev->csf.db_file_offsets = 0; queue->db_file_offset = kbdev->csf.db_file_offsets; kbdev->csf.db_file_offsets += BASEP_QUEUE_NR_MMAP_USER_PAGES; - - WARN(atomic_read(&queue->refcount) != 1, "Incorrect refcounting for queue object\n"); + WARN(kbase_refcount_read(&queue->refcount) != 1, + "Incorrect refcounting for queue object\n"); /* This is the second reference taken on the queue object and * would be dropped only when the IO mapping is removed either * explicitly by userspace or implicitly by kernel on process exit. 
@@ -371,19 +337,16 @@ int kbase_csf_alloc_command_stream_user_pages(struct kbase_context *kctx, return 0; -gpu_mmap_failed: - kernel_unmap_user_io_pages(kctx, queue); - kernel_map_failed: - kbase_mem_pool_free_pages( - &kctx->mem_pools.small[KBASE_MEM_GROUP_CSF_IO], - num_pages, queue->phys, false, false); - -phys_alloc_failed: - kfree(reg); + kbase_mem_pool_free_pages(&kctx->mem_pools.small[KBASE_MEM_GROUP_CSF_IO], + KBASEP_NUM_CS_USER_IO_PAGES, queue->phys, false, false); + /* Marking both the phys to zero for indicating there is no phys allocated */ + queue->phys[0].tagged_addr = 0; + queue->phys[1].tagged_addr = 0; - return -ENOMEM; + return ret; } +KBASE_EXPORT_TEST_API(kbase_csf_alloc_command_stream_user_pages); static struct kbase_queue_group *find_queue_group(struct kbase_context *kctx, u8 group_handle) @@ -401,14 +364,20 @@ static struct kbase_queue_group *find_queue_group(struct kbase_context *kctx, return NULL; } +struct kbase_queue_group *kbase_csf_find_queue_group(struct kbase_context *kctx, u8 group_handle) +{ + return find_queue_group(kctx, group_handle); +} +KBASE_EXPORT_TEST_API(kbase_csf_find_queue_group); + int kbase_csf_queue_group_handle_is_valid(struct kbase_context *kctx, u8 group_handle) { struct kbase_queue_group *group; - mutex_lock(&kctx->csf.lock); + rt_mutex_lock(&kctx->csf.lock); group = find_queue_group(kctx, group_handle); - mutex_unlock(&kctx->csf.lock); + rt_mutex_unlock(&kctx->csf.lock); return group ? 0 : -EINVAL; } @@ -429,25 +398,49 @@ static struct kbase_queue *find_queue(struct kbase_context *kctx, u64 base_addr) static void get_queue(struct kbase_queue *queue) { - WARN_ON(!atomic_inc_not_zero(&queue->refcount)); + WARN_ON(!kbase_refcount_inc_not_zero(&queue->refcount)); } -static void release_queue(struct kbase_queue *queue) +/** + * release_queue() - Release a reference to a GPU queue + * + * @queue: The queue to release. + * + * Return: true if the queue has been released. + * + * The queue will be released when its reference count reaches zero. + */ +static bool release_queue(struct kbase_queue *queue) { lockdep_assert_held(&queue->kctx->csf.lock); - - WARN_ON(atomic_read(&queue->refcount) <= 0); - - if (atomic_dec_and_test(&queue->refcount)) { + if (kbase_refcount_dec_and_test(&queue->refcount)) { /* The queue can't still be on the per context list. */ WARN_ON(!list_empty(&queue->link)); WARN_ON(queue->group); + dev_dbg(queue->kctx->kbdev->dev, + "Remove any pending command queue fatal from ctx %d_%d", + queue->kctx->tgid, queue->kctx->id); + + /* After this the Userspace would be able to free the + * memory for GPU queue. In case the Userspace missed + * terminating the queue, the cleanup will happen on + * context termination where tear down of region tracker + * would free up the GPU queue memory. 
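get_queue()/release_queue() above are ordinary reference counting: the creator holds the first reference, extra references are only taken while one is already held, and the final put frees the object (and, in the driver, drops the no-user-free count on the backing region). Sketched with the generic refcount_t API rather than the driver's kbase_refcount wrappers:

#include <linux/bug.h>
#include <linux/refcount.h>
#include <linux/slab.h>

struct queue_obj {
	refcount_t refcount;
	/* ... payload ... */
};

static struct queue_obj *queue_create(void)
{
	struct queue_obj *q = kzalloc(sizeof(*q), GFP_KERNEL);

	if (q)
		refcount_set(&q->refcount, 1);	/* creator owns the first reference */
	return q;
}

static void queue_get(struct queue_obj *q)
{
	/* Only legal while at least one reference is already held. */
	WARN_ON(!refcount_inc_not_zero(&q->refcount));
}

/* Returns true when this call dropped the last reference and freed the object. */
static bool queue_put(struct queue_obj *q)
{
	if (refcount_dec_and_test(&q->refcount)) {
		kfree(q);
		return true;
	}
	return false;
}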
+ */ + kbase_gpu_vm_lock(queue->kctx); + kbase_va_region_no_user_free_dec(queue->queue_reg); + kbase_gpu_vm_unlock(queue->kctx); + kfree(queue); + + return true; } + + return false; } static void oom_event_worker(struct work_struct *data); -static void fatal_event_worker(struct work_struct *data); +static void cs_error_worker(struct work_struct *data); /* Between reg and reg_ex, one and only one must be null */ static int csf_queue_register_internal(struct kbase_context *kctx, @@ -482,7 +475,7 @@ static int csf_queue_register_internal(struct kbase_context *kctx, queue_addr = reg->buffer_gpu_addr; queue_size = reg->buffer_size >> PAGE_SHIFT; - mutex_lock(&kctx->csf.lock); + rt_mutex_lock(&kctx->csf.lock); /* Check if queue is already registered */ if (find_queue(kctx, queue_addr) != NULL) { @@ -495,7 +488,8 @@ static int csf_queue_register_internal(struct kbase_context *kctx, region = kbase_region_tracker_find_region_enclosing_address(kctx, queue_addr); - if (kbase_is_region_invalid_or_free(region)) { + if (kbase_is_region_invalid_or_free(region) || kbase_is_region_shrinkable(region) || + region->gpu_alloc->type != KBASE_MEM_TYPE_NATIVE) { ret = -ENOENT; goto out_unlock_vm; } @@ -544,41 +538,31 @@ static int csf_queue_register_internal(struct kbase_context *kctx, queue->kctx = kctx; queue->base_addr = queue_addr; + queue->queue_reg = region; + kbase_va_region_no_user_free_inc(region); + queue->size = (queue_size << PAGE_SHIFT); queue->csi_index = KBASEP_IF_NR_INVALID; - queue->enabled = false; queue->priority = reg->priority; - atomic_set(&queue->refcount, 1); + /* Default to a safe value, this would be updated on binding */ + queue->group_priority = KBASE_QUEUE_GROUP_PRIORITY_LOW; + kbase_refcount_set(&queue->refcount, 1); - queue->group = NULL; queue->bind_state = KBASE_CSF_QUEUE_UNBOUND; queue->handle = BASEP_MEM_INVALID_HANDLE; queue->doorbell_nr = KBASEP_USER_DB_NR_INVALID; - queue->status_wait = 0; - queue->sync_ptr = 0; - queue->sync_value = 0; - -#if IS_ENABLED(CONFIG_DEBUG_FS) - queue->saved_cmd_ptr = 0; -#endif - - queue->sb_status = 0; queue->blocked_reason = CS_STATUS_BLOCKED_REASON_REASON_UNBLOCKED; - atomic_set(&queue->pending, 0); - INIT_LIST_HEAD(&queue->link); - INIT_LIST_HEAD(&queue->error.link); + atomic_set(&queue->pending_kick, 0); + INIT_LIST_HEAD(&queue->pending_kick_link); INIT_WORK(&queue->oom_event_work, oom_event_worker); - INIT_WORK(&queue->fatal_event_work, fatal_event_worker); + INIT_WORK(&queue->cs_error_work, cs_error_worker); list_add(&queue->link, &kctx->csf.queue_list); - queue->extract_ofs = 0; - - region->flags |= KBASE_REG_NO_USER_FREE; region->user_data = queue; /* Initialize the cs_trace configuration parameters, When buffer_size @@ -600,7 +584,7 @@ static int csf_queue_register_internal(struct kbase_context *kctx, out_unlock_vm: kbase_gpu_vm_unlock(kctx); out: - mutex_unlock(&kctx->csf.lock); + rt_mutex_unlock(&kctx->csf.lock); return ret; } @@ -608,6 +592,13 @@ out: int kbase_csf_queue_register(struct kbase_context *kctx, struct kbase_ioctl_cs_queue_register *reg) { + /* Validate the ring buffer configuration parameters */ + if (reg->buffer_size < CS_RING_BUFFER_MIN_SIZE || + reg->buffer_size > CS_RING_BUFFER_MAX_SIZE || + reg->buffer_size & (reg->buffer_size - 1) || !reg->buffer_gpu_addr || + reg->buffer_gpu_addr & ~PAGE_MASK) + return -EINVAL; + return csf_queue_register_internal(kctx, reg, NULL); } @@ -626,6 +617,13 @@ int kbase_csf_queue_register_ex(struct kbase_context *kctx, if (glb_version < kbase_csf_interface_version(1, 1, 0)) return 
-EINVAL; + /* Validate the ring buffer configuration parameters */ + if (reg->buffer_size < CS_RING_BUFFER_MIN_SIZE || + reg->buffer_size > CS_RING_BUFFER_MAX_SIZE || + reg->buffer_size & (reg->buffer_size - 1) || !reg->buffer_gpu_addr || + reg->buffer_gpu_addr & ~PAGE_MASK) + return -EINVAL; + /* Validate the cs_trace configuration parameters */ if (reg->ex_buffer_size && ((reg->ex_event_size > max_size) || @@ -639,6 +637,22 @@ int kbase_csf_queue_register_ex(struct kbase_context *kctx, static void unbind_queue(struct kbase_context *kctx, struct kbase_queue *queue); +static void wait_pending_queue_kick(struct kbase_queue *queue) +{ + struct kbase_context *const kctx = queue->kctx; + + /* Drain a pending queue kick if any. It should no longer be + * possible to issue further queue kicks at this point: either the + * queue has been unbound, or the context is being terminated. + * + * Signal kbase_csf_scheduler_kthread() to allow for the + * eventual completion of the current iteration. Once it's done the + * event_wait wait queue shall be signalled. + */ + complete(&kctx->kbdev->csf.scheduler.kthread_signal); + wait_event(kctx->kbdev->csf.event_wait, atomic_read(&queue->pending_kick) == 0); +} + void kbase_csf_queue_terminate(struct kbase_context *kctx, struct kbase_ioctl_cs_queue_terminate *term) { @@ -656,7 +670,7 @@ void kbase_csf_queue_terminate(struct kbase_context *kctx, else reset_prevented = true; - mutex_lock(&kctx->csf.lock); + rt_mutex_lock(&kctx->csf.lock); queue = find_queue(kctx, term->buffer_gpu_addr); if (queue) { @@ -672,27 +686,26 @@ void kbase_csf_queue_terminate(struct kbase_context *kctx, unbind_queue(kctx, queue); kbase_gpu_vm_lock(kctx); - if (!WARN_ON(!queue->queue_reg)) { - /* After this the Userspace would be able to free the - * memory for GPU queue. In case the Userspace missed - * terminating the queue, the cleanup will happen on - * context termination where tear down of region tracker - * would free up the GPU queue memory. - */ - queue->queue_reg->flags &= ~KBASE_REG_NO_USER_FREE; + if (!WARN_ON(!queue->queue_reg)) queue->queue_reg->user_data = NULL; - } kbase_gpu_vm_unlock(kctx); - dev_dbg(kctx->kbdev->dev, - "Remove any pending command queue fatal from context %pK\n", - (void *)kctx); - kbase_csf_event_remove_error(kctx, &queue->error); + rt_mutex_unlock(&kctx->csf.lock); + /* The GPU reset can be allowed now as the queue has been unbound. 
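The new up-front checks in kbase_csf_queue_register() and kbase_csf_queue_register_ex() reject malformed ring buffers before any state is created: the size must be a power of two between CS_RING_BUFFER_MIN_SIZE (4 KiB) and CS_RING_BUFFER_MAX_SIZE (2 GiB), and the buffer GPU address must be non-zero and page aligned. A standalone restatement of that predicate (illustrative; assumes 4 KiB pages, which is what the driver's PAGE_MASK test expresses):

#include <stdbool.h>
#include <stdint.h>

#define RB_PAGE_SIZE	4096ULL
#define RB_MIN_SIZE	4096ULL		/* CS_RING_BUFFER_MIN_SIZE */
#define RB_MAX_SIZE	(1ULL << 31)	/* CS_RING_BUFFER_MAX_SIZE, 2 GiB */

static bool ring_buffer_params_valid(uint64_t gpu_addr, uint64_t size)
{
	if (size < RB_MIN_SIZE || size > RB_MAX_SIZE)
		return false;
	if (size & (size - 1))					/* must be a power of two */
		return false;
	if (!gpu_addr || (gpu_addr & (RB_PAGE_SIZE - 1)))	/* non-zero, page aligned */
		return false;
	return true;
}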
*/ + if (reset_prevented) { + kbase_reset_gpu_allow(kbdev); + reset_prevented = false; + } + wait_pending_queue_kick(queue); + /* The work items can be cancelled as Userspace is terminating the queue */ + cancel_work_sync(&queue->oom_event_work); + cancel_work_sync(&queue->cs_error_work); + rt_mutex_lock(&kctx->csf.lock); release_queue(queue); } - mutex_unlock(&kctx->csf.lock); + rt_mutex_unlock(&kctx->csf.lock); if (reset_prevented) kbase_reset_gpu_allow(kbdev); } @@ -704,7 +717,7 @@ int kbase_csf_queue_bind(struct kbase_context *kctx, union kbase_ioctl_cs_queue_ u8 max_streams; int ret = -EINVAL; - mutex_lock(&kctx->csf.lock); + rt_mutex_lock(&kctx->csf.lock); group = find_queue_group(kctx, bind->in.group_handle); queue = find_queue(kctx, bind->in.buffer_gpu_addr); @@ -733,21 +746,30 @@ int kbase_csf_queue_bind(struct kbase_context *kctx, union kbase_ioctl_cs_queue_ bind->out.mmap_handle = queue->handle; group->bound_queues[bind->in.csi_index] = queue; queue->group = group; + queue->group_priority = group->priority; queue->csi_index = bind->in.csi_index; queue->bind_state = KBASE_CSF_QUEUE_BIND_IN_PROGRESS; out: - mutex_unlock(&kctx->csf.lock); + rt_mutex_unlock(&kctx->csf.lock); return ret; } -static struct kbase_queue_group *get_bound_queue_group( - struct kbase_queue *queue) +/** + * get_bound_queue_group - Get the group to which a queue was bound + * + * @queue: Pointer to the queue for this group + * + * Return: The group to which this queue was bound, or NULL on error. + */ +static struct kbase_queue_group *get_bound_queue_group(struct kbase_queue *queue) { struct kbase_context *kctx = queue->kctx; struct kbase_queue_group *group; + lockdep_assert_held(&kctx->csf.lock); + if (queue->bind_state == KBASE_CSF_QUEUE_UNBOUND) return NULL; @@ -769,53 +791,13 @@ static struct kbase_queue_group *get_bound_queue_group( return group; } -/** - * pending_submission_worker() - Work item to process pending kicked GPU command queues. - * - * @work: Pointer to pending_submission_work. - * - * This function starts all pending queues, for which the work - * was previously submitted via ioctl call from application thread. - * If the queue is already scheduled and resident, it will be started - * right away, otherwise once the group is made resident. - */ -static void pending_submission_worker(struct work_struct *work) -{ - struct kbase_context *kctx = - container_of(work, struct kbase_context, csf.pending_submission_work); - struct kbase_device *kbdev = kctx->kbdev; - struct kbase_queue *queue; - int err = kbase_reset_gpu_prevent_and_wait(kbdev); - - if (err) { - dev_err(kbdev->dev, "Unsuccessful GPU reset detected when kicking queue "); - return; - } - - mutex_lock(&kctx->csf.lock); - - /* Iterate through the queue list and schedule the pending ones for submission. 
*/ - list_for_each_entry(queue, &kctx->csf.queue_list, link) { - if (atomic_cmpxchg(&queue->pending, 1, 0) == 1) { - struct kbase_queue_group *group = get_bound_queue_group(queue); - - if (!group || queue->bind_state != KBASE_CSF_QUEUE_BOUND) - dev_dbg(kbdev->dev, "queue is not bound to a group"); - else - WARN_ON(kbase_csf_scheduler_queue_start(queue)); - } - } - - mutex_unlock(&kctx->csf.lock); - - kbase_reset_gpu_allow(kbdev); -} - void kbase_csf_ring_csg_doorbell(struct kbase_device *kbdev, int slot) { if (WARN_ON(slot < 0)) return; + kbase_csf_scheduler_spin_lock_assert_held(kbdev); + kbase_csf_ring_csg_slots_doorbell(kbdev, (u32) (1 << slot)); } @@ -828,9 +810,20 @@ void kbase_csf_ring_csg_slots_doorbell(struct kbase_device *kbdev, (u32) ((1U << kbdev->csf.global_iface.group_num) - 1); u32 value; + kbase_csf_scheduler_spin_lock_assert_held(kbdev); + if (WARN_ON(slot_bitmap > allowed_bitmap)) return; + /* The access to GLB_DB_REQ/ACK needs to be ordered with respect to CSG_REQ/ACK and + * CSG_DB_REQ/ACK to avoid a scenario where a CSI request overlaps with a CSG request + * or 2 CSI requests overlap and FW ends up missing the 2nd request. + * Memory barrier is required, both on Host and FW side, to guarantee the ordering. + * + * 'osh' is used as CPU and GPU would be in the same Outer shareable domain. + */ + dmb(osh); + value = kbase_csf_firmware_global_output(global_iface, GLB_DB_ACK); value ^= slot_bitmap; kbase_csf_firmware_global_input_mask(global_iface, GLB_DB_REQ, value, @@ -857,6 +850,8 @@ void kbase_csf_ring_cs_kernel_doorbell(struct kbase_device *kbdev, struct kbase_csf_cmd_stream_group_info *ginfo; u32 value; + kbase_csf_scheduler_spin_lock_assert_held(kbdev); + if (WARN_ON(csg_nr < 0) || WARN_ON(csg_nr >= kbdev->csf.global_iface.group_num)) return; @@ -867,6 +862,14 @@ void kbase_csf_ring_cs_kernel_doorbell(struct kbase_device *kbdev, WARN_ON(csi_index >= ginfo->stream_num)) return; + /* The access to CSG_DB_REQ/ACK needs to be ordered with respect to + * CS_REQ/ACK to avoid a scenario where CSG_DB_REQ/ACK becomes visible to + * FW before CS_REQ/ACK is set. + * + * 'osh' is used as CPU and GPU would be in the same outer shareable domain. + */ + dmb(osh); + value = kbase_csf_firmware_csg_output(ginfo, CSG_DB_ACK); value ^= (1 << csi_index); kbase_csf_firmware_csg_input_mask(ginfo, CSG_DB_REQ, value, @@ -876,19 +879,15 @@ void kbase_csf_ring_cs_kernel_doorbell(struct kbase_device *kbdev, kbase_csf_ring_csg_doorbell(kbdev, csg_nr); } -static void enqueue_gpu_submission_work(struct kbase_context *const kctx) -{ - queue_work(system_highpri_wq, &kctx->csf.pending_submission_work); -} - int kbase_csf_queue_kick(struct kbase_context *kctx, struct kbase_ioctl_cs_queue_kick *kick) { struct kbase_device *kbdev = kctx->kbdev; - bool trigger_submission = false; struct kbase_va_region *region; int err = 0; + KBASE_TLSTREAM_TL_KBASE_GPUCMDQUEUE_KICK(kbdev, kctx->id, kick->buffer_gpu_addr); + /* GPU work submission happening asynchronously to prevent the contention with * scheduler lock and as the result blocking application thread. 
For this reason, * the vm_lock is used here to get the reference to the queue based on its buffer_gpu_addr @@ -901,9 +900,19 @@ int kbase_csf_queue_kick(struct kbase_context *kctx, if (!kbase_is_region_invalid_or_free(region)) { struct kbase_queue *queue = region->user_data; - if (queue) { - atomic_cmpxchg(&queue->pending, 0, 1); - trigger_submission = true; + if (queue && (queue->bind_state == KBASE_CSF_QUEUE_BOUND)) { + spin_lock(&kbdev->csf.pending_gpuq_kicks_lock); + if (list_empty(&queue->pending_kick_link)) { + /* Queue termination shall block until this + * kick has been handled. + */ + atomic_inc(&queue->pending_kick); + list_add_tail( + &queue->pending_kick_link, + &kbdev->csf.pending_gpuq_kicks[queue->group_priority]); + complete(&kbdev->csf.scheduler.kthread_signal); + } + spin_unlock(&kbdev->csf.pending_gpuq_kicks_lock); } } else { dev_dbg(kbdev->dev, @@ -912,9 +921,6 @@ int kbase_csf_queue_kick(struct kbase_context *kctx, } kbase_gpu_vm_unlock(kctx); - if (likely(trigger_submission)) - enqueue_gpu_submission_work(kctx); - return err; } @@ -923,19 +929,23 @@ static void unbind_stopped_queue(struct kbase_context *kctx, { lockdep_assert_held(&kctx->csf.lock); + if (WARN_ON(queue->csi_index < 0)) + return; + if (queue->bind_state != KBASE_CSF_QUEUE_UNBOUND) { unsigned long flags; kbase_csf_scheduler_spin_lock(kctx->kbdev, &flags); bitmap_clear(queue->group->protm_pending_bitmap, queue->csi_index, 1); - KBASE_KTRACE_ADD_CSF_GRP_Q(kctx->kbdev, PROTM_PENDING_CLEAR, + KBASE_KTRACE_ADD_CSF_GRP_Q(kctx->kbdev, CSI_PROTM_PEND_CLEAR, queue->group, queue, queue->group->protm_pending_bitmap[0]); queue->group->bound_queues[queue->csi_index] = NULL; queue->group = NULL; kbase_csf_scheduler_spin_unlock(kctx->kbdev, flags); put_user_pages_mmap_handle(kctx, queue); + WARN_ON_ONCE(queue->doorbell_nr != KBASEP_USER_DB_NR_INVALID); queue->bind_state = KBASE_CSF_QUEUE_UNBOUND; } } @@ -977,7 +987,16 @@ static void unbind_queue(struct kbase_context *kctx, struct kbase_queue *queue) } } -void kbase_csf_queue_unbind(struct kbase_queue *queue) +static bool kbase_csf_queue_phys_allocated(struct kbase_queue *queue) +{ + /* The queue's phys are zeroed when allocation fails. Both of them being + * zero is an impossible condition for a successful allocated set of phy pages. + */ + + return (queue->phys[0].tagged_addr | queue->phys[1].tagged_addr); +} + +void kbase_csf_queue_unbind(struct kbase_queue *queue, bool process_exit) { struct kbase_context *kctx = queue->kctx; @@ -991,7 +1010,7 @@ void kbase_csf_queue_unbind(struct kbase_queue *queue) * whereas CSG TERM request would result in an immediate abort or * cancellation of the pending work. */ - if (current->flags & PF_EXITING) { + if (process_exit) { struct kbase_queue_group *group = get_bound_queue_group(queue); if (group) @@ -1002,8 +1021,8 @@ void kbase_csf_queue_unbind(struct kbase_queue *queue) unbind_queue(kctx, queue); } - /* Free the resources, if allocated for this queue. */ - if (queue->reg) + /* Free the resources, if allocated phys for this queue */ + if (kbase_csf_queue_phys_allocated(queue)) kbase_csf_free_command_stream_user_pages(kctx, queue); } @@ -1016,8 +1035,8 @@ void kbase_csf_queue_unbind_stopped(struct kbase_queue *queue) WARN_ON(queue->bind_state == KBASE_CSF_QUEUE_BOUND); unbind_stopped_queue(kctx, queue); - /* Free the resources, if allocated for this queue. 
*/ - if (queue->reg) + /* Free the resources, if allocated phys for this queue */ + if (kbase_csf_queue_phys_allocated(queue)) kbase_csf_free_command_stream_user_pages(kctx, queue); } @@ -1080,172 +1099,43 @@ static bool iface_has_enough_streams(struct kbase_device *const kbdev, * @kctx: Pointer to kbase context where the queue group is created at * @s_buf: Pointer to suspend buffer that is attached to queue group * - * Return: 0 if suspend buffer is successfully allocated and reflected to GPU - * MMU page table. Otherwise -ENOMEM. + * Return: 0 if phy-pages for the suspend buffer is successfully allocated. + * Otherwise -ENOMEM or error code. */ static int create_normal_suspend_buffer(struct kbase_context *const kctx, struct kbase_normal_suspend_buffer *s_buf) { - struct kbase_va_region *reg = NULL; - const unsigned long mem_flags = KBASE_REG_GPU_RD | KBASE_REG_GPU_WR; const size_t nr_pages = PFN_UP(kctx->kbdev->csf.global_iface.groups[0].suspend_size); - int err = 0; - - /* Calls to this function are inherently asynchronous, with respect to - * MMU operations. - */ - const enum kbase_caller_mmu_sync_info mmu_sync_info = CALLER_MMU_ASYNC; + int err; lockdep_assert_held(&kctx->csf.lock); - /* Allocate and initialize Region Object */ - reg = kbase_alloc_free_region(&kctx->kbdev->csf.shared_reg_rbtree, 0, - nr_pages, KBASE_REG_ZONE_MCU_SHARED); - - if (!reg) - return -ENOMEM; - - s_buf->phy = kcalloc(nr_pages, sizeof(*s_buf->phy), GFP_KERNEL); - - if (!s_buf->phy) { - err = -ENOMEM; - goto phy_alloc_failed; - } - - /* Get physical page for a normal suspend buffer */ - err = kbase_mem_pool_alloc_pages( - &kctx->mem_pools.small[KBASE_MEM_GROUP_CSF_FW], - nr_pages, &s_buf->phy[0], false); - - if (err < 0) - goto phy_pages_alloc_failed; - - /* Insert Region Object into rbtree and make virtual address available - * to map it to physical page - */ - mutex_lock(&kctx->kbdev->csf.reg_lock); - err = kbase_add_va_region_rbtree(kctx->kbdev, reg, 0, nr_pages, 1); - reg->flags &= ~KBASE_REG_FREE; - mutex_unlock(&kctx->kbdev->csf.reg_lock); - - if (err) - goto add_va_region_failed; - - /* Update MMU table */ - err = kbase_mmu_insert_pages(kctx->kbdev, &kctx->kbdev->csf.mcu_mmu, - reg->start_pfn, &s_buf->phy[0], nr_pages, - mem_flags, MCU_AS_NR, - KBASE_MEM_GROUP_CSF_FW, mmu_sync_info); - if (err) - goto mmu_insert_failed; - - s_buf->reg = reg; - - return 0; - -mmu_insert_failed: - mutex_lock(&kctx->kbdev->csf.reg_lock); - kbase_remove_va_region(kctx->kbdev, reg); - mutex_unlock(&kctx->kbdev->csf.reg_lock); - -add_va_region_failed: - kbase_mem_pool_free_pages( - &kctx->mem_pools.small[KBASE_MEM_GROUP_CSF_FW], nr_pages, - &s_buf->phy[0], false, false); - -phy_pages_alloc_failed: - kfree(s_buf->phy); -phy_alloc_failed: - kfree(reg); - - return err; -} - -/** - * create_protected_suspend_buffer() - Create protected-mode suspend buffer - * per queue group - * - * @kbdev: Instance of a GPU platform device that implements a CSF interface. - * @s_buf: Pointer to suspend buffer that is attached to queue group - * - * Return: 0 if suspend buffer is successfully allocated and reflected to GPU - * MMU page table. Otherwise -ENOMEM. 
- */ -static int create_protected_suspend_buffer(struct kbase_device *const kbdev, - struct kbase_protected_suspend_buffer *s_buf) -{ - struct kbase_va_region *reg = NULL; - struct tagged_addr *phys = NULL; - const unsigned long mem_flags = KBASE_REG_GPU_RD | KBASE_REG_GPU_WR; - const size_t nr_pages = - PFN_UP(kbdev->csf.global_iface.groups[0].suspend_size); - int err = 0; - - /* Calls to this function are inherently asynchronous, with respect to - * MMU operations. + /* The suspend buffer's mapping address is valid only when the CSG is to + * run on slot, initializing it 0, signalling the buffer is not mapped. */ - const enum kbase_caller_mmu_sync_info mmu_sync_info = CALLER_MMU_ASYNC; + s_buf->gpu_va = 0; - /* Allocate and initialize Region Object */ - reg = kbase_alloc_free_region(&kbdev->csf.shared_reg_rbtree, 0, - nr_pages, KBASE_REG_ZONE_MCU_SHARED); + s_buf->phy = kcalloc(nr_pages, sizeof(*s_buf->phy), GFP_KERNEL); - if (!reg) + if (!s_buf->phy) return -ENOMEM; - phys = kcalloc(nr_pages, sizeof(*phys), GFP_KERNEL); - if (!phys) { - err = -ENOMEM; - goto phy_alloc_failed; - } + /* Get physical page for a normal suspend buffer */ + err = kbase_mem_pool_alloc_pages(&kctx->mem_pools.small[KBASE_MEM_GROUP_CSF_FW], nr_pages, + &s_buf->phy[0], false, kctx->task); - s_buf->pma = kbase_csf_protected_memory_alloc(kbdev, phys, - nr_pages, true); - if (s_buf->pma == NULL) { - err = -ENOMEM; - goto pma_alloc_failed; + if (err < 0) { + kfree(s_buf->phy); + return err; } - /* Insert Region Object into rbtree and make virtual address available - * to map it to physical page - */ - mutex_lock(&kbdev->csf.reg_lock); - err = kbase_add_va_region_rbtree(kbdev, reg, 0, nr_pages, 1); - reg->flags &= ~KBASE_REG_FREE; - mutex_unlock(&kbdev->csf.reg_lock); - - if (err) - goto add_va_region_failed; - - /* Update MMU table */ - err = kbase_mmu_insert_pages(kbdev, &kbdev->csf.mcu_mmu, reg->start_pfn, - phys, nr_pages, mem_flags, MCU_AS_NR, - KBASE_MEM_GROUP_CSF_FW, mmu_sync_info); - if (err) - goto mmu_insert_failed; - - s_buf->reg = reg; - kfree(phys); + kbase_process_page_usage_inc(kctx, nr_pages); return 0; - -mmu_insert_failed: - mutex_lock(&kbdev->csf.reg_lock); - kbase_remove_va_region(kbdev, reg); - mutex_unlock(&kbdev->csf.reg_lock); - -add_va_region_failed: - kbase_csf_protected_memory_free(kbdev, s_buf->pma, nr_pages, true); -pma_alloc_failed: - kfree(phys); -phy_alloc_failed: - kfree(reg); - - return err; } static void timer_event_worker(struct work_struct *data); -static void protm_event_worker(struct work_struct *data); +static void protm_event_worker(struct kthread_work *work); static void term_normal_suspend_buffer(struct kbase_context *const kctx, struct kbase_normal_suspend_buffer *s_buf); @@ -1262,26 +1152,17 @@ static void term_normal_suspend_buffer(struct kbase_context *const kctx, static int create_suspend_buffers(struct kbase_context *const kctx, struct kbase_queue_group * const group) { - int err = 0; - if (create_normal_suspend_buffer(kctx, &group->normal_suspend_buf)) { dev_err(kctx->kbdev->dev, "Failed to create normal suspend buffer\n"); return -ENOMEM; } - if (kctx->kbdev->csf.pma_dev) { - err = create_protected_suspend_buffer(kctx->kbdev, - &group->protected_suspend_buf); - if (err) { - term_normal_suspend_buffer(kctx, - &group->normal_suspend_buf); - dev_err(kctx->kbdev->dev, "Failed to create protected suspend buffer\n"); - } - } else { - group->protected_suspend_buf.reg = NULL; - } + /* Protected suspend buffer, runtime binding so just initialize it */ + 
group->protected_suspend_buf.gpu_va = 0; + group->protected_suspend_buf.pma = NULL; + group->protected_suspend_buf.alloc_retries = 0; - return err; + return 0; } /** @@ -1328,6 +1209,9 @@ static int create_queue_group(struct kbase_context *const kctx, } else { int err = 0; +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) + group->prev_act = false; +#endif group->kctx = kctx; group->handle = group_handle; group->csg_nr = KBASEP_CSG_NR_INVALID; @@ -1339,11 +1223,23 @@ static int create_queue_group(struct kbase_context *const kctx, group->tiler_max = create->in.tiler_max; group->fragment_max = create->in.fragment_max; group->compute_max = create->in.compute_max; + group->csi_handlers = create->in.csi_handlers; group->priority = kbase_csf_priority_queue_group_priority_to_relative( kbase_csf_priority_check(kctx->kbdev, create->in.priority)); group->doorbell_nr = KBASEP_USER_DB_NR_INVALID; group->faulted = false; + group->cs_unrecoverable = false; + group->reevaluate_idle_status = false; + + group->csg_reg = NULL; + group->csg_reg_bind_retries = 0; + group->dvs_buf = create->in.dvs_buf; + + +#if IS_ENABLED(CONFIG_DEBUG_FS) + group->deschedule_deferred_cnt = 0; +#endif group->group_uid = generate_group_uid(); create->out.group_uid = group->group_uid; @@ -1351,14 +1247,15 @@ static int create_queue_group(struct kbase_context *const kctx, INIT_LIST_HEAD(&group->link); INIT_LIST_HEAD(&group->link_to_schedule); INIT_LIST_HEAD(&group->error_fatal.link); - INIT_LIST_HEAD(&group->error_timeout.link); - INIT_LIST_HEAD(&group->error_tiler_oom.link); INIT_WORK(&group->timer_event_work, timer_event_worker); - INIT_WORK(&group->protm_event_work, protm_event_worker); + kthread_init_work(&group->protm_event_work, protm_event_worker); bitmap_zero(group->protm_pending_bitmap, MAX_SUPPORTED_STREAMS_PER_GROUP); group->run_state = KBASE_CSF_GROUP_INACTIVE; + KBASE_KTRACE_ADD_CSF_GRP(group->kctx->kbdev, CSF_GROUP_INACTIVE, group, + group->run_state); + err = create_suspend_buffers(kctx, group); if (err < 0) { @@ -1378,6 +1275,17 @@ static int create_queue_group(struct kbase_context *const kctx, return group_handle; } +static bool dvs_supported(u32 csf_version) +{ + if (GLB_VERSION_MAJOR_GET(csf_version) < 3) + return false; + + if (GLB_VERSION_MAJOR_GET(csf_version) == 3) + if (GLB_VERSION_MINOR_GET(csf_version) < 2) + return false; + + return true; +} int kbase_csf_queue_group_create(struct kbase_context *const kctx, union kbase_ioctl_cs_queue_group_create *const create) @@ -1386,11 +1294,18 @@ int kbase_csf_queue_group_create(struct kbase_context *const kctx, const u32 tiler_count = hweight64(create->in.tiler_mask); const u32 fragment_count = hweight64(create->in.fragment_mask); const u32 compute_count = hweight64(create->in.compute_mask); + size_t i; - mutex_lock(&kctx->csf.lock); + for (i = 0; i < ARRAY_SIZE(create->in.padding); i++) { + if (create->in.padding[i] != 0) { + dev_warn(kctx->kbdev->dev, "Invalid padding not 0 in queue group create\n"); + return -EINVAL; + } + } - if ((create->in.tiler_max > tiler_count) || - (create->in.fragment_max > fragment_count) || + rt_mutex_lock(&kctx->csf.lock); + + if ((create->in.tiler_max > tiler_count) || (create->in.fragment_max > fragment_count) || (create->in.compute_max > compute_count)) { dev_dbg(kctx->kbdev->dev, "Invalid maximum number of endpoints for a queue group"); @@ -1404,8 +1319,20 @@ int kbase_csf_queue_group_create(struct kbase_context *const kctx, "No CSG has at least %d CSs", create->in.cs_min); err = -EINVAL; - } else if (create->in.reserved) { 
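The padding loop added to kbase_csf_queue_group_create() above applies the usual UAPI forward-compatibility rule: reserved bytes must be zero today so they can be given meaning in a later revision without old kernels silently accepting garbage. A minimal sketch of that rule only, using a hypothetical payload rather than the kbase ioctl structures:

#include <linux/types.h>
#include <linux/errno.h>
#include <linux/kernel.h>

/* Hypothetical ioctl input struct; only the zero-padding rule is of
 * interest here. This is not a kbase structure.
 */
struct demo_group_create_in {
	__u64 tiler_mask;
	__u8 priority;
	__u8 padding[7];	/* must be 0 for forward compatibility */
};

static int demo_validate_create(const struct demo_group_create_in *in)
{
	size_t i;

	/* Reject requests with bits set that this driver version does not
	 * understand, so the same bytes can carry new fields later.
	 */
	for (i = 0; i < ARRAY_SIZE(in->padding); i++) {
		if (in->padding[i] != 0)
			return -EINVAL;
	}

	return 0;
}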
- dev_warn(kctx->kbdev->dev, "Reserved field was set to non-0"); + } else if (create->in.csi_handlers & ~BASE_CSF_EXCEPTION_HANDLER_FLAGS_MASK) { + dev_warn(kctx->kbdev->dev, "Unknown exception handler flags set: %u", + create->in.csi_handlers & ~BASE_CSF_EXCEPTION_HANDLER_FLAGS_MASK); + err = -EINVAL; + } else if (!dvs_supported(kctx->kbdev->csf.global_iface.version) && create->in.dvs_buf) { + dev_warn( + kctx->kbdev->dev, + "GPU does not support DVS but userspace is trying to use it"); + err = -EINVAL; + } else if (dvs_supported(kctx->kbdev->csf.global_iface.version) && + !CSG_DVS_BUF_BUFFER_POINTER_GET(create->in.dvs_buf) && + CSG_DVS_BUF_BUFFER_SIZE_GET(create->in.dvs_buf)) { + dev_warn(kctx->kbdev->dev, + "DVS buffer pointer is null but size is not 0"); err = -EINVAL; } else { /* For the CSG which satisfies the condition for having @@ -1423,7 +1350,7 @@ int kbase_csf_queue_group_create(struct kbase_context *const kctx, err = group_handle; } - mutex_unlock(&kctx->csf.lock); + rt_mutex_unlock(&kctx->csf.lock); return err; } @@ -1435,60 +1362,39 @@ int kbase_csf_queue_group_create(struct kbase_context *const kctx, * @s_buf: Pointer to queue group suspend buffer to be freed */ static void term_normal_suspend_buffer(struct kbase_context *const kctx, - struct kbase_normal_suspend_buffer *s_buf) + struct kbase_normal_suspend_buffer *s_buf) { - const size_t nr_pages = - PFN_UP(kctx->kbdev->csf.global_iface.groups[0].suspend_size); + const size_t nr_pages = PFN_UP(kctx->kbdev->csf.global_iface.groups[0].suspend_size); lockdep_assert_held(&kctx->csf.lock); - WARN_ON(kbase_mmu_teardown_pages( - kctx->kbdev, &kctx->kbdev->csf.mcu_mmu, - s_buf->reg->start_pfn, nr_pages, MCU_AS_NR)); - - WARN_ON(s_buf->reg->flags & KBASE_REG_FREE); + /* The group should not have a bind remaining on any suspend buf region */ + WARN_ONCE(s_buf->gpu_va, "Suspend buffer address should be 0 at termination"); - mutex_lock(&kctx->kbdev->csf.reg_lock); - kbase_remove_va_region(kctx->kbdev, s_buf->reg); - mutex_unlock(&kctx->kbdev->csf.reg_lock); - - kbase_mem_pool_free_pages( - &kctx->mem_pools.small[KBASE_MEM_GROUP_CSF_FW], - nr_pages, &s_buf->phy[0], false, false); + kbase_mem_pool_free_pages(&kctx->mem_pools.small[KBASE_MEM_GROUP_CSF_FW], nr_pages, + &s_buf->phy[0], false, false); + kbase_process_page_usage_dec(kctx, nr_pages); kfree(s_buf->phy); s_buf->phy = NULL; - kfree(s_buf->reg); - s_buf->reg = NULL; } /** - * term_protected_suspend_buffer() - Free normal-mode suspend buffer of + * term_protected_suspend_buffer() - Free protected-mode suspend buffer of * queue group * * @kbdev: Instance of a GPU platform device that implements a CSF interface. 
- * @s_buf: Pointer to queue group suspend buffer to be freed + * @sbuf: Pointer to queue group suspend buffer to be freed */ static void term_protected_suspend_buffer(struct kbase_device *const kbdev, - struct kbase_protected_suspend_buffer *s_buf) + struct kbase_protected_suspend_buffer *sbuf) { - const size_t nr_pages = - PFN_UP(kbdev->csf.global_iface.groups[0].suspend_size); - - WARN_ON(kbase_mmu_teardown_pages( - kbdev, &kbdev->csf.mcu_mmu, - s_buf->reg->start_pfn, nr_pages, MCU_AS_NR)); - - WARN_ON(s_buf->reg->flags & KBASE_REG_FREE); - - mutex_lock(&kbdev->csf.reg_lock); - kbase_remove_va_region(kbdev, s_buf->reg); - mutex_unlock(&kbdev->csf.reg_lock); - - kbase_csf_protected_memory_free(kbdev, s_buf->pma, nr_pages, true); - s_buf->pma = NULL; - kfree(s_buf->reg); - s_buf->reg = NULL; + WARN_ONCE(sbuf->gpu_va, "Suspend buf should have been unmapped inside scheduler!"); + if (sbuf->pma) { + const size_t nr_pages = PFN_UP(kbdev->csf.global_iface.groups[0].suspend_size); + kbase_csf_protected_memory_free(kbdev, sbuf->pma, nr_pages, true); + sbuf->pma = NULL; + } } void kbase_csf_term_descheduled_queue_group(struct kbase_queue_group *group) @@ -1520,6 +1426,7 @@ void kbase_csf_term_descheduled_queue_group(struct kbase_queue_group *group) &group->protected_suspend_buf); group->run_state = KBASE_CSF_GROUP_TERMINATED; + KBASE_KTRACE_ADD_CSF_GRP(group->kctx->kbdev, CSF_GROUP_TERMINATED, group, group->run_state); } /** @@ -1550,10 +1457,38 @@ static void term_queue_group(struct kbase_queue_group *group) kbase_csf_term_descheduled_queue_group(group); } +/** + * wait_group_deferred_deschedule_completion - Wait for the refcount of the group, + * taken when the group deschedule had to be deferred, to drop to 0. + * + * @group: Pointer to GPU command queue group that is being deleted. + * + * This function is called when Userspace deletes the group and after the group + * has been descheduled. The function synchronizes with the other threads that were + * also trying to deschedule the group whilst the dumping was going on for a fault. + * Please refer to the documentation of wait_for_dump_complete_on_group_deschedule() + * for more details.
+ */ +static void wait_group_deferred_deschedule_completion(struct kbase_queue_group *group) +{ +#if IS_ENABLED(CONFIG_DEBUG_FS) + struct kbase_context *kctx = group->kctx; + + lockdep_assert_held(&kctx->csf.lock); + + if (likely(!group->deschedule_deferred_cnt)) + return; + + rt_mutex_unlock(&kctx->csf.lock); + wait_event(kctx->kbdev->csf.event_wait, !group->deschedule_deferred_cnt); + rt_mutex_lock(&kctx->csf.lock); +#endif +} + static void cancel_queue_group_events(struct kbase_queue_group *group) { cancel_work_sync(&group->timer_event_work); - cancel_work_sync(&group->protm_event_work); + kthread_cancel_work_sync(&group->protm_event_work); } static void remove_pending_group_fatal_error(struct kbase_queue_group *group) @@ -1564,8 +1499,6 @@ static void remove_pending_group_fatal_error(struct kbase_queue_group *group) "Remove any pending group fatal error from context %pK\n", (void *)group->kctx); - kbase_csf_event_remove_error(kctx, &group->error_tiler_oom); - kbase_csf_event_remove_error(kctx, &group->error_timeout); kbase_csf_event_remove_error(kctx, &group->error_fatal); } @@ -1586,32 +1519,49 @@ void kbase_csf_queue_group_terminate(struct kbase_context *kctx, else reset_prevented = true; - mutex_lock(&kctx->csf.lock); + rt_mutex_lock(&kctx->csf.lock); group = find_queue_group(kctx, group_handle); if (group) { - remove_pending_group_fatal_error(group); - term_queue_group(group); kctx->csf.queue_groups[group_handle] = NULL; + /* Stop the running of the given group */ + term_queue_group(group); + rt_mutex_unlock(&kctx->csf.lock); + + if (reset_prevented) { + /* Allow GPU reset before cancelling the group specific + * work item to avoid potential deadlock. + * Reset prevention isn't needed after group termination. + */ + kbase_reset_gpu_allow(kbdev); + reset_prevented = false; + } + + /* Cancel any pending event callbacks. If one is in progress + * then this thread waits synchronously for it to complete (which + * is why we must unlock the context first). We already ensured + * that no more callbacks can be enqueued by terminating the group. + */ + cancel_queue_group_events(group); + + rt_mutex_lock(&kctx->csf.lock); + + /* Clean up after the termination */ + remove_pending_group_fatal_error(group); + + wait_group_deferred_deschedule_completion(group); } - mutex_unlock(&kctx->csf.lock); + rt_mutex_unlock(&kctx->csf.lock); if (reset_prevented) kbase_reset_gpu_allow(kbdev); - if (!group) - return; - - /* Cancel any pending event callbacks. If one is in progress - * then this thread waits synchronously for it to complete (which - * is why we must unlock the context first). We already ensured - * that no more callbacks can be enqueued by terminating the group. 
- */ - cancel_queue_group_events(group); kfree(group); } +KBASE_EXPORT_TEST_API(kbase_csf_queue_group_terminate); +#if IS_ENABLED(CONFIG_MALI_VECTOR_DUMP) || MALI_UNIT_TEST int kbase_csf_queue_group_suspend(struct kbase_context *kctx, struct kbase_suspend_copy_buffer *sus_buf, u8 group_handle) @@ -1628,7 +1578,7 @@ int kbase_csf_queue_group_suspend(struct kbase_context *kctx, group_handle); return err; } - mutex_lock(&kctx->csf.lock); + rt_mutex_lock(&kctx->csf.lock); group = find_queue_group(kctx, group_handle); if (group) @@ -1637,11 +1587,12 @@ int kbase_csf_queue_group_suspend(struct kbase_context *kctx, else err = -EINVAL; - mutex_unlock(&kctx->csf.lock); + rt_mutex_unlock(&kctx->csf.lock); kbase_reset_gpu_allow(kbdev); return err; } +#endif void kbase_csf_add_group_fatal_error( struct kbase_queue_group *const group, @@ -1677,7 +1628,7 @@ void kbase_csf_active_queue_groups_reset(struct kbase_device *kbdev, INIT_LIST_HEAD(&evicted_groups); - mutex_lock(&kctx->csf.lock); + rt_mutex_lock(&kctx->csf.lock); kbase_csf_scheduler_evict_ctx_slots(kbdev, kctx, &evicted_groups); while (!list_empty(&evicted_groups)) { @@ -1698,12 +1649,11 @@ void kbase_csf_active_queue_groups_reset(struct kbase_device *kbdev, kbase_csf_term_descheduled_queue_group(group); } - mutex_unlock(&kctx->csf.lock); + rt_mutex_unlock(&kctx->csf.lock); } int kbase_csf_ctx_init(struct kbase_context *kctx) { - struct kbase_device *kbdev = kctx->kbdev; int err = -ENOMEM; INIT_LIST_HEAD(&kctx->csf.queue_list); @@ -1711,21 +1661,6 @@ int kbase_csf_ctx_init(struct kbase_context *kctx) kbase_csf_event_init(kctx); - kctx->csf.user_reg_vma = NULL; - mutex_lock(&kbdev->pm.lock); - /* The inode information for /dev/malixx file is not available at the - * time of device probe as the inode is created when the device node - * is created by udevd (through mknod). 
- */ - if (kctx->filp) { - if (!kbdev->csf.mali_file_inode) - kbdev->csf.mali_file_inode = kctx->filp->f_inode; - - /* inode is unique for a file */ - WARN_ON(kbdev->csf.mali_file_inode != kctx->filp->f_inode); - } - mutex_unlock(&kbdev->pm.lock); - /* Mark all the cookies as 'free' */ bitmap_fill(kctx->csf.cookies, KBASE_CSF_NUM_USER_IO_PAGES_HANDLE); @@ -1742,10 +1677,24 @@ int kbase_csf_ctx_init(struct kbase_context *kctx) err = kbase_csf_tiler_heap_context_init(kctx); if (likely(!err)) { - mutex_init(&kctx->csf.lock); - INIT_WORK(&kctx->csf.pending_submission_work, - pending_submission_worker); - } else + rt_mutex_init(&kctx->csf.lock); + + err = kbasep_ctx_user_reg_page_mapping_init(kctx); + + if (likely(!err)) { + err = kbase_kthread_run_worker_rt(kctx->kbdev, + &kctx->csf.protm_event_worker, "mali_protm_event"); + if (unlikely(err)) { + dev_err(kctx->kbdev->dev, "error initializing protm event worker thread"); + kbasep_ctx_user_reg_page_mapping_term(kctx); + } + } + + if (unlikely(err)) + kbase_csf_tiler_heap_context_term(kctx); + } + + if (unlikely(err)) kbase_csf_kcpu_queue_context_term(kctx); } @@ -1760,6 +1709,36 @@ int kbase_csf_ctx_init(struct kbase_context *kctx) return err; } +void kbase_csf_ctx_report_page_fault_for_active_groups(struct kbase_context *kctx, + struct kbase_fault *fault) +{ + struct base_gpu_queue_group_error err_payload = + (struct base_gpu_queue_group_error){ .error_type = BASE_GPU_QUEUE_GROUP_ERROR_FATAL, + .payload = { .fatal_group = { + .sideband = fault->addr, + .status = fault->status, + } } }; + struct kbase_device *kbdev = kctx->kbdev; + const u32 num_groups = kbdev->csf.global_iface.group_num; + unsigned long flags; + int csg_nr; + + lockdep_assert_held(&kbdev->hwaccess_lock); + + kbase_csf_scheduler_spin_lock(kbdev, &flags); + for (csg_nr = 0; csg_nr < num_groups; csg_nr++) { + struct kbase_queue_group *const group = + kbdev->csf.scheduler.csg_slots[csg_nr].resident_group; + + if (!group || (group->kctx != kctx)) + continue; + + group->faulted = true; + kbase_csf_add_group_fatal_error(group, &err_payload); + } + kbase_csf_scheduler_spin_unlock(kbdev, flags); +} + void kbase_csf_ctx_handle_fault(struct kbase_context *kctx, struct kbase_fault *fault) { @@ -1793,7 +1772,7 @@ void kbase_csf_ctx_handle_fault(struct kbase_context *kctx, } }; - mutex_lock(&kctx->csf.lock); + rt_mutex_lock(&kctx->csf.lock); for (gr = 0; gr < MAX_QUEUE_GROUP_NUM; gr++) { struct kbase_queue_group *const group = @@ -1801,12 +1780,15 @@ void kbase_csf_ctx_handle_fault(struct kbase_context *kctx, if (group && group->run_state != KBASE_CSF_GROUP_TERMINATED) { term_queue_group(group); + /* This would effectively be a NOP if the fatal error was already added to + * the error_list by kbase_csf_ctx_report_page_fault_for_active_groups(). + */ kbase_csf_add_group_fatal_error(group, &err_payload); reported = true; } } - mutex_unlock(&kctx->csf.lock); + rt_mutex_unlock(&kctx->csf.lock); if (reported) kbase_event_wakeup_sync(kctx); @@ -1839,9 +1821,7 @@ void kbase_csf_ctx_term(struct kbase_context *kctx) else reset_prevented = true; - cancel_work_sync(&kctx->csf.pending_submission_work); - - mutex_lock(&kctx->csf.lock); + rt_mutex_lock(&kctx->csf.lock); /* Iterate through the queue groups that were not terminated by * userspace and issue the term request to firmware for them. 
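The reworked kbase_csf_ctx_init() above chains several sub-initialisations (KCPU queue context, tiler heap context, the USER_REG page mapping and the new protm event kthread) and tears the earlier stages back down when a later one fails. A rough sketch of that init/unwind shape only, with stand-in helpers rather than the kbase ones:

/* demo_ctx and the demo_stage_* helpers are illustrative stand-ins,
 * not kbase symbols.
 */
struct demo_ctx;

static int demo_stage_a_init(struct demo_ctx *c);
static int demo_stage_b_init(struct demo_ctx *c);
static int demo_stage_c_init(struct demo_ctx *c);
static void demo_stage_a_term(struct demo_ctx *c);
static void demo_stage_b_term(struct demo_ctx *c);

static int demo_ctx_init(struct demo_ctx *c)
{
	int err;

	err = demo_stage_a_init(c);
	if (unlikely(err))
		return err;

	err = demo_stage_b_init(c);
	if (unlikely(err))
		goto err_stage_a;

	err = demo_stage_c_init(c);
	if (unlikely(err))
		goto err_stage_b;

	return 0;

err_stage_b:
	demo_stage_b_term(c);	/* undo stage B before stage A */
err_stage_a:
	demo_stage_a_term(c);
	return err;
}

kbase_csf_ctx_init() expresses the same ordering with nested if (likely(!err)) blocks instead of gotos, but the invariant is identical: whatever was initialised before the failing stage is terminated again, in reverse order, before the error is returned.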
@@ -1854,7 +1834,7 @@ void kbase_csf_ctx_term(struct kbase_context *kctx) term_queue_group(group); } } - mutex_unlock(&kctx->csf.lock); + rt_mutex_unlock(&kctx->csf.lock); if (reset_prevented) kbase_reset_gpu_allow(kbdev); @@ -1881,7 +1861,7 @@ void kbase_csf_ctx_term(struct kbase_context *kctx) if (as) flush_workqueue(as->pf_wq); - mutex_lock(&kctx->csf.lock); + rt_mutex_lock(&kctx->csf.lock); for (i = 0; i < MAX_QUEUE_GROUP_NUM; i++) { kfree(kctx->csf.queue_groups[i]); @@ -1897,34 +1877,40 @@ void kbase_csf_ctx_term(struct kbase_context *kctx) queue = list_first_entry(&kctx->csf.queue_list, struct kbase_queue, link); + list_del_init(&queue->link); + + rt_mutex_unlock(&kctx->csf.lock); + wait_pending_queue_kick(queue); + rt_mutex_lock(&kctx->csf.lock); + /* The reference held when the IO mapping was created on bind * would have been dropped otherwise the termination of Kbase * context itself wouldn't have kicked-in. So there shall be * only one reference left that was taken when queue was * registered. */ - if (atomic_read(&queue->refcount) != 1) - dev_warn(kctx->kbdev->dev, - "Releasing queue with incorrect refcounting!\n"); - list_del_init(&queue->link); + WARN_ON(kbase_refcount_read(&queue->refcount) != 1); + release_queue(queue); } - mutex_unlock(&kctx->csf.lock); + rt_mutex_unlock(&kctx->csf.lock); + kbase_destroy_kworker_stack(&kctx->csf.protm_event_worker); + kbasep_ctx_user_reg_page_mapping_term(kctx); kbase_csf_tiler_heap_context_term(kctx); kbase_csf_kcpu_queue_context_term(kctx); kbase_csf_scheduler_context_term(kctx); kbase_csf_event_term(kctx); - mutex_destroy(&kctx->csf.lock); + rt_mutex_destroy(&kctx->csf.lock); } /** * handle_oom_event - Handle the OoM event generated by the firmware for the * CSI. * - * @kctx: Pointer to the kbase context in which the tiler heap was initialized. + * @group: Pointer to the CSG group the oom-event belongs to. * @stream: Pointer to the structure containing info provided by the firmware * about the CSI. * @@ -1939,9 +1925,10 @@ void kbase_csf_ctx_term(struct kbase_context *kctx) * Return: 0 if successfully handled the request, otherwise a negative error * code on failure. 
*/ -static int handle_oom_event(struct kbase_context *const kctx, - struct kbase_csf_cmd_stream_info const *const stream) +static int handle_oom_event(struct kbase_queue_group *const group, + struct kbase_csf_cmd_stream_info const *const stream) { + struct kbase_context *const kctx = group->kctx; u64 gpu_heap_va = kbase_csf_firmware_cs_output(stream, CS_HEAP_ADDRESS_LO) | ((u64)kbase_csf_firmware_cs_output(stream, CS_HEAP_ADDRESS_HI) << 32); @@ -1968,12 +1955,18 @@ static int handle_oom_event(struct kbase_context *const kctx, err = kbase_csf_tiler_heap_alloc_new_chunk(kctx, gpu_heap_va, renderpasses_in_flight, pending_frag_count, &new_chunk_ptr); - /* It is okay to acknowledge with a NULL chunk (firmware will then wait - * for the fragment jobs to complete and release chunks) - */ - if (err == -EBUSY) + if ((group->csi_handlers & BASE_CSF_TILER_OOM_EXCEPTION_FLAG) && + (pending_frag_count == 0) && (err == -ENOMEM || err == -EBUSY)) { + /* The group allows incremental rendering, trigger it */ + new_chunk_ptr = 0; + dev_dbg(kctx->kbdev->dev, "Group-%d (slot-%d) enter incremental render\n", + group->handle, group->csg_nr); + } else if (err == -EBUSY) { + /* Acknowledge with a NULL chunk (firmware will then wait for + * the fragment jobs to complete and release chunks) + */ new_chunk_ptr = 0; - else if (err) + } else if (err) return err; kbase_csf_firmware_cs_input(stream, CS_TILER_HEAP_START_LO, @@ -2007,11 +2000,33 @@ static void report_tiler_oom_error(struct kbase_queue_group *group) } } } }; kbase_csf_event_add_error(group->kctx, - &group->error_tiler_oom, + &group->error_fatal, &error); kbase_event_wakeup_sync(group->kctx); } +static void flush_gpu_cache_on_fatal_error(struct kbase_device *kbdev) +{ + kbase_pm_lock(kbdev); + /* With the advent of partial cache flush, dirty cache lines could + * be left in the GPU L2 caches by terminating the queue group here + * without waiting for proper cache maintenance. A full cache flush + * here will prevent these dirty cache lines from being arbitrarily + * evicted later and possibly causing memory corruption. + */ + if (kbdev->pm.backend.gpu_powered) { + kbase_gpu_start_cache_clean(kbdev, GPU_COMMAND_CACHE_CLN_INV_L2_LSC); + if (kbase_gpu_wait_cache_clean_timeout(kbdev, + kbdev->mmu_or_gpu_cache_op_wait_time_ms)) + dev_warn( + kbdev->dev, + "[%llu] Timeout waiting for CACHE_CLN_INV_L2_LSC to complete after fatal error", + kbase_backend_get_cycle_cnt(kbdev)); + } + + kbase_pm_unlock(kbdev); +} + /** * kbase_queue_oom_event - Handle tiler out-of-memory for a GPU command queue. * @@ -2024,8 +2039,8 @@ static void report_tiler_oom_error(struct kbase_queue_group *group) * notification to allow the firmware to report out-of-memory again in future. * If the out-of-memory condition was successfully handled then this function * rings the relevant doorbell to notify the firmware; otherwise, it terminates - the GPU command queue group to which the queue is bound. See - term_queue_group() for details. + the GPU command queue group to which the queue is bound and notifies a waiting + * user space client of the failure.
*/ static void kbase_queue_oom_event(struct kbase_queue *const queue) { @@ -2037,6 +2052,7 @@ static void kbase_queue_oom_event(struct kbase_queue *const queue) struct kbase_csf_cmd_stream_info const *stream; int csi_index = queue->csi_index; u32 cs_oom_ack, cs_oom_req; + unsigned long flags; lockdep_assert_held(&kctx->csf.lock); @@ -2048,6 +2064,13 @@ static void kbase_queue_oom_event(struct kbase_queue *const queue) kbase_csf_scheduler_lock(kbdev); +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + if (kbdev->csf.scheduler.sc_power_rails_off) { + dev_warn(kctx->kbdev->dev, "SC power rails off unexpectedly when handling OoM event"); + goto unlock; + } +#endif + slot_num = kbase_csf_scheduler_group_get_slot(group); /* The group could have gone off slot before this work item got @@ -2080,22 +2103,25 @@ static void kbase_queue_oom_event(struct kbase_queue *const queue) if (cs_oom_ack == cs_oom_req) goto unlock; - err = handle_oom_event(kctx, stream); + err = handle_oom_event(group, stream); + kbase_csf_scheduler_spin_lock(kbdev, &flags); kbase_csf_firmware_cs_input_mask(stream, CS_REQ, cs_oom_ack, CS_REQ_TILER_OOM_MASK); + kbase_csf_ring_cs_kernel_doorbell(kbdev, csi_index, slot_num, true); + kbase_csf_scheduler_spin_unlock(kbdev, flags); - if (err) { + if (unlikely(err)) { dev_warn( kbdev->dev, "Queue group to be terminated, couldn't handle the OoM event\n"); + kbase_debug_csf_fault_notify(kbdev, kctx, DF_TILER_OOM); kbase_csf_scheduler_unlock(kbdev); term_queue_group(group); + flush_gpu_cache_on_fatal_error(kbdev); report_tiler_oom_error(group); return; } - - kbase_csf_ring_cs_kernel_doorbell(kbdev, csi_index, slot_num, true); unlock: kbase_csf_scheduler_unlock(kbdev); } @@ -2117,18 +2143,18 @@ static void oom_event_worker(struct work_struct *data) struct kbase_device *const kbdev = kctx->kbdev; int err = kbase_reset_gpu_try_prevent(kbdev); + /* Regardless of whether reset failed or is currently happening, exit * early */ if (err) return; - mutex_lock(&kctx->csf.lock); + rt_mutex_lock(&kctx->csf.lock); kbase_queue_oom_event(queue); - release_queue(queue); - mutex_unlock(&kctx->csf.lock); + rt_mutex_unlock(&kctx->csf.lock); kbase_reset_gpu_allow(kbdev); } @@ -2153,7 +2179,7 @@ static void report_group_timeout_error(struct kbase_queue_group *const group) "Notify the event notification thread, forward progress timeout (%llu cycles)\n", kbase_csf_timeout_get(group->kctx->kbdev)); - kbase_csf_event_add_error(group->kctx, &group->error_timeout, &error); + kbase_csf_event_add_error(group->kctx, &group->error_fatal, &error); kbase_event_wakeup_sync(group->kctx); } @@ -2169,25 +2195,27 @@ static void timer_event_worker(struct work_struct *data) struct kbase_queue_group *const group = container_of(data, struct kbase_queue_group, timer_event_work); struct kbase_context *const kctx = group->kctx; + struct kbase_device *const kbdev = kctx->kbdev; bool reset_prevented = false; - int err = kbase_reset_gpu_prevent_and_wait(kctx->kbdev); + int err = kbase_reset_gpu_prevent_and_wait(kbdev); if (err) dev_warn( - kctx->kbdev->dev, + kbdev->dev, "Unsuccessful GPU reset detected when terminating group %d on progress timeout, attempting to terminate regardless", group->handle); else reset_prevented = true; - mutex_lock(&kctx->csf.lock); + rt_mutex_lock(&kctx->csf.lock); term_queue_group(group); + flush_gpu_cache_on_fatal_error(kbdev); report_group_timeout_error(group); - mutex_unlock(&kctx->csf.lock); + rt_mutex_unlock(&kctx->csf.lock); if (reset_prevented) - kbase_reset_gpu_allow(kctx->kbdev); + 
kbase_reset_gpu_allow(kbdev); } /** @@ -2195,30 +2223,125 @@ static void timer_event_worker(struct work_struct *data) * * @group: Pointer to GPU queue group for which the timeout event is received. * + * Notify a waiting user space client of the timeout. * Enqueue a work item to terminate the group and notify the event notification * thread of progress timeout fault for the GPU command queue group. */ static void handle_progress_timer_event(struct kbase_queue_group *const group) { + kbase_debug_csf_fault_notify(group->kctx->kbdev, group->kctx, + DF_PROGRESS_TIMER_TIMEOUT); + queue_work(group->kctx->csf.wq, &group->timer_event_work); } /** + * alloc_grp_protected_suspend_buffer_pages() - Allocate physical pages from the protected + * memory for the protected mode suspend buffer. + * @group: Pointer to the GPU queue group. + * + * Return: 0 if suspend buffer allocation is successful or if it's already allocated, otherwise + * a negative error value. + */ +static int alloc_grp_protected_suspend_buffer_pages(struct kbase_queue_group *const group) +{ + struct kbase_device *const kbdev = group->kctx->kbdev; + struct kbase_context *kctx = group->kctx; + struct tagged_addr *phys = NULL; + struct kbase_protected_suspend_buffer *sbuf = &group->protected_suspend_buf; + size_t nr_pages; + int err = 0; + + if (likely(sbuf->pma)) + return 0; + + nr_pages = PFN_UP(kbdev->csf.global_iface.groups[0].suspend_size); + phys = kcalloc(nr_pages, sizeof(*phys), GFP_KERNEL); + if (unlikely(!phys)) { + err = -ENOMEM; + goto phys_free; + } + + rt_mutex_lock(&kctx->csf.lock); + kbase_csf_scheduler_lock(kbdev); + + if (unlikely(!group->csg_reg)) { + /* The only way the bound csg_reg can have been removed from the group is + * that it has been put off slot by the scheduler and the csg_reg resource + * is contended by other groups. In this case, another occasion is needed for + * mapping the pma, which needs a bound csg_reg. Since the group is already + * off-slot, returning no error is harmless as the scheduler, when placing the + * group back on slot again, will do the required MMU map operation on the + * allocated and retained pma. + */ + WARN_ON(group->csg_nr >= 0); + dev_dbg(kbdev->dev, "No bound csg_reg for group_%d_%d_%d to enter protected mode", + group->kctx->tgid, group->kctx->id, group->handle); + goto unlock; + } + + /* Allocate the protected mode pages */ + sbuf->pma = kbase_csf_protected_memory_alloc(kbdev, phys, nr_pages, true); + if (unlikely(!sbuf->pma)) { + err = -ENOMEM; + goto unlock; + } + + /* Map the bound susp_reg to the just allocated pma pages */ + err = kbase_csf_mcu_shared_group_update_pmode_map(kbdev, group); + +unlock: + kbase_csf_scheduler_unlock(kbdev); + rt_mutex_unlock(&kctx->csf.lock); +phys_free: + kfree(phys); + return err; +} + +static void report_group_fatal_error(struct kbase_queue_group *const group) +{ + struct base_gpu_queue_group_error const + err_payload = { .error_type = BASE_GPU_QUEUE_GROUP_ERROR_FATAL, + .payload = { .fatal_group = { + .status = GPU_EXCEPTION_TYPE_SW_FAULT_0, + } } }; + + kbase_csf_add_group_fatal_error(group, &err_payload); + kbase_event_wakeup_sync(group->kctx); +} + +/** * protm_event_worker - Protected mode switch request event handler - * called from a workqueue. + * called from a kthread. * - * @data: Pointer to a work_struct embedded in GPU command queue group data. + * @work: Pointer to a kthread_work struct embedded in GPU command queue group data. * * Request to switch to protected mode.
*/ -static void protm_event_worker(struct work_struct *data) +static void protm_event_worker(struct kthread_work *work) { struct kbase_queue_group *const group = - container_of(data, struct kbase_queue_group, protm_event_work); + container_of(work, struct kbase_queue_group, protm_event_work); + struct kbase_protected_suspend_buffer *sbuf = &group->protected_suspend_buf; + int err = 0; - KBASE_KTRACE_ADD_CSF_GRP(group->kctx->kbdev, PROTM_EVENT_WORKER_BEGIN, + KBASE_KTRACE_ADD_CSF_GRP(group->kctx->kbdev, PROTM_EVENT_WORKER_START, group, 0u); - kbase_csf_scheduler_group_protm_enter(group); + + err = alloc_grp_protected_suspend_buffer_pages(group); + if (!err) { + kbase_csf_scheduler_group_protm_enter(group); + } else if (err == -ENOMEM && sbuf->alloc_retries <= PROTM_ALLOC_MAX_RETRIES) { + sbuf->alloc_retries++; + /* try again to allocate pages */ + kthread_queue_work(&group->kctx->csf.protm_event_worker, &group->protm_event_work); + } else if (sbuf->alloc_retries >= PROTM_ALLOC_MAX_RETRIES || err != -ENOMEM) { + dev_err(group->kctx->kbdev->dev, + "Failed to allocate physical pages for Protected mode suspend buffer for the group %d of context %d_%d", + group->handle, group->kctx->tgid, group->kctx->id); + report_group_fatal_error(group); + } + KBASE_KTRACE_ADD_CSF_GRP(group->kctx->kbdev, PROTM_EVENT_WORKER_END, group, 0u); } @@ -2227,16 +2350,20 @@ static void protm_event_worker(struct work_struct *data) * handle_fault_event - Handler for CS fault. * * @queue: Pointer to queue for which fault event was received. - * @stream: Pointer to the structure containing info provided by the - * firmware about the CSI. - * - * Prints meaningful CS fault information. + * @cs_ack: Value of the CS_ACK register in the CS kernel input page used for + * the queue. * + * Print required information about the CS fault and notify the user space client + * about the fault. */ static void -handle_fault_event(struct kbase_queue *const queue, - struct kbase_csf_cmd_stream_info const *const stream) +handle_fault_event(struct kbase_queue *const queue, const u32 cs_ack) { + struct kbase_device *const kbdev = queue->kctx->kbdev; + struct kbase_csf_cmd_stream_group_info const *ginfo = + &kbdev->csf.global_iface.groups[queue->group->csg_nr]; + struct kbase_csf_cmd_stream_info const *stream = + &ginfo->streams[queue->csi_index]; const u32 cs_fault = kbase_csf_firmware_cs_output(stream, CS_FAULT); const u64 cs_fault_info = kbase_csf_firmware_cs_output(stream, CS_FAULT_INFO_LO) | @@ -2248,7 +2375,6 @@ handle_fault_event(struct kbase_queue *const queue, CS_FAULT_EXCEPTION_DATA_GET(cs_fault); const u64 cs_fault_info_exception_data = CS_FAULT_INFO_EXCEPTION_DATA_GET(cs_fault_info); - struct kbase_device *const kbdev = queue->kctx->kbdev; kbase_csf_scheduler_spin_lock_assert_held(kbdev); @@ -2263,53 +2389,82 @@ handle_fault_event(struct kbase_queue *const queue, kbase_gpu_exception_name(cs_fault_exception_type), cs_fault_exception_data, cs_fault_info_exception_data); + +#if IS_ENABLED(CONFIG_DEBUG_FS) + /* CS_RESOURCE_TERMINATED type fault event can be ignored from the + * standpoint of dump on error. It is used to report fault for the CSIs + * that are associated with the same CSG as the CSI for which the actual + * fault was reported by the Iterator. + * Dumping would be triggered when the actual fault is reported. + * + * CS_INHERIT_FAULT can also be ignored. It could happen due to the error + * in other types of queues (cpu/kcpu). 
If a fault had occurred in some + * other GPU queue then the dump would have been performed anyways when + * that fault was reported. + */ + if ((cs_fault_exception_type != CS_FAULT_EXCEPTION_TYPE_CS_INHERIT_FAULT) && + (cs_fault_exception_type != CS_FAULT_EXCEPTION_TYPE_CS_RESOURCE_TERMINATED)) { + if (unlikely(kbase_debug_csf_fault_notify(kbdev, queue->kctx, DF_CS_FAULT))) { + queue->cs_error = cs_fault; + queue->cs_error_info = cs_fault_info; + queue->cs_error_fatal = false; + queue_work(queue->kctx->csf.wq, &queue->cs_error_work); + return; + } + } +#endif + + kbase_csf_firmware_cs_input_mask(stream, CS_REQ, cs_ack, + CS_REQ_FAULT_MASK); + kbase_csf_ring_cs_kernel_doorbell(kbdev, queue->csi_index, queue->group->csg_nr, true); } -static void report_queue_fatal_error(struct kbase_queue *const queue, - u32 cs_fatal, u64 cs_fatal_info, - u8 group_handle) +static void report_queue_fatal_error(struct kbase_queue *const queue, u32 cs_fatal, + u64 cs_fatal_info, struct kbase_queue_group *group) { - struct base_csf_notification error = { - .type = BASE_CSF_NOTIFICATION_GPU_QUEUE_GROUP_ERROR, - .payload = { - .csg_error = { - .handle = group_handle, - .error = { - .error_type = - BASE_GPU_QUEUE_GROUP_QUEUE_ERROR_FATAL, - .payload = { - .fatal_queue = { - .sideband = cs_fatal_info, - .status = cs_fatal, - .csi_index = queue->csi_index, - } - } - } - } - } - }; + struct base_csf_notification + error = { .type = BASE_CSF_NOTIFICATION_GPU_QUEUE_GROUP_ERROR, + .payload = { + .csg_error = { + .error = { .error_type = + BASE_GPU_QUEUE_GROUP_QUEUE_ERROR_FATAL, + .payload = { .fatal_queue = { + .sideband = cs_fatal_info, + .status = cs_fatal, + } } } } } }; + + if (!queue) + return; - kbase_csf_event_add_error(queue->kctx, &queue->error, &error); - kbase_event_wakeup(queue->kctx); + if (WARN_ON_ONCE(!group)) + return; + + error.payload.csg_error.handle = group->handle; + error.payload.csg_error.error.payload.fatal_queue.csi_index = queue->csi_index; + kbase_csf_event_add_error(queue->kctx, &group->error_fatal, &error); + kbase_event_wakeup_sync(queue->kctx); } /** - * fatal_event_worker - Handle the fatal error for the GPU queue + * cs_error_worker - Handle the CS_FATAL/CS_FAULT error for the GPU queue * * @data: Pointer to a work_struct embedded in GPU command queue. * * Terminate the CSG and report the error to userspace. 
*/ -static void fatal_event_worker(struct work_struct *const data) +static void cs_error_worker(struct work_struct *const data) { struct kbase_queue *const queue = - container_of(data, struct kbase_queue, fatal_event_work); + container_of(data, struct kbase_queue, cs_error_work); + const u32 cs_fatal_exception_type = CS_FATAL_EXCEPTION_TYPE_GET(queue->cs_error); struct kbase_context *const kctx = queue->kctx; struct kbase_device *const kbdev = kctx->kbdev; struct kbase_queue_group *group; - u8 group_handle; bool reset_prevented = false; - int err = kbase_reset_gpu_prevent_and_wait(kbdev); + int err; + + kbase_debug_csf_fault_wait_completion(kbdev); + err = kbase_reset_gpu_prevent_and_wait(kbdev); if (err) dev_warn( @@ -2318,7 +2473,7 @@ static void fatal_event_worker(struct work_struct *const data) else reset_prevented = true; - mutex_lock(&kctx->csf.lock); + rt_mutex_lock(&kctx->csf.lock); group = get_bound_queue_group(queue); if (!group) { @@ -2326,14 +2481,48 @@ static void fatal_event_worker(struct work_struct *const data) goto unlock; } - group_handle = group->handle; +#if IS_ENABLED(CONFIG_DEBUG_FS) + if (!queue->cs_error_fatal) { + unsigned long flags; + int slot_num; + + kbase_csf_scheduler_spin_lock(kbdev, &flags); + slot_num = kbase_csf_scheduler_group_get_slot_locked(group); + if (slot_num >= 0) { + struct kbase_csf_cmd_stream_group_info const *ginfo = + &kbdev->csf.global_iface.groups[slot_num]; + struct kbase_csf_cmd_stream_info const *stream = + &ginfo->streams[queue->csi_index]; + u32 const cs_ack = + kbase_csf_firmware_cs_output(stream, CS_ACK); + + kbase_csf_firmware_cs_input_mask(stream, CS_REQ, cs_ack, + CS_REQ_FAULT_MASK); + kbase_csf_ring_cs_kernel_doorbell(kbdev, queue->csi_index, + slot_num, true); + } + kbase_csf_scheduler_spin_unlock(kbdev, flags); + goto unlock; + } +#endif + term_queue_group(group); - report_queue_fatal_error(queue, queue->cs_fatal, queue->cs_fatal_info, - group_handle); + flush_gpu_cache_on_fatal_error(kbdev); + /* For an invalid GPU page fault, CS_BUS_FAULT fatal error is expected after the + * page fault handler disables the AS of faulty context. Need to skip reporting the + * CS_BUS_FAULT fatal error to the Userspace as it doesn't have the full fault info. + * Page fault handler will report the fatal error with full page fault info. + */ + if ((cs_fatal_exception_type == CS_FATAL_EXCEPTION_TYPE_CS_BUS_FAULT) && group->faulted) { + dev_dbg(kbdev->dev, + "Skipped reporting CS_BUS_FAULT for queue %d of group %d of ctx %d_%d", + queue->csi_index, group->handle, kctx->tgid, kctx->id); + } else { + report_queue_fatal_error(queue, queue->cs_error, queue->cs_error_info, group); + } unlock: - release_queue(queue); - mutex_unlock(&kctx->csf.lock); + rt_mutex_unlock(&kctx->csf.lock); if (reset_prevented) kbase_reset_gpu_allow(kbdev); } @@ -2344,14 +2533,18 @@ unlock: * @queue: Pointer to queue for which fatal event was received. * @stream: Pointer to the structure containing info provided by the * firmware about the CSI. + * @cs_ack: Value of the CS_ACK register in the CS kernel input page used for + * the queue. * - * Prints meaningful CS fatal information. + * Notify a waiting user space client of the CS fatal and prints meaningful + * information. * Enqueue a work item to terminate the group and report the fatal error * to user space. 
*/ static void handle_fatal_event(struct kbase_queue *const queue, - struct kbase_csf_cmd_stream_info const *const stream) + struct kbase_csf_cmd_stream_info const *const stream, + u32 cs_ack) { const u32 cs_fatal = kbase_csf_firmware_cs_output(stream, CS_FATAL); const u64 cs_fatal_info = @@ -2381,52 +2574,24 @@ handle_fatal_event(struct kbase_queue *const queue, if (cs_fatal_exception_type == CS_FATAL_EXCEPTION_TYPE_FIRMWARE_INTERNAL_ERROR) { + kbase_debug_csf_fault_notify(kbdev, queue->kctx, DF_FW_INTERNAL_ERROR); queue_work(system_wq, &kbdev->csf.fw_error_work); } else { - get_queue(queue); - queue->cs_fatal = cs_fatal; - queue->cs_fatal_info = cs_fatal_info; - if (!queue_work(queue->kctx->csf.wq, &queue->fatal_event_work)) - release_queue(queue); + kbase_debug_csf_fault_notify(kbdev, queue->kctx, DF_CS_FATAL); + if (cs_fatal_exception_type == CS_FATAL_EXCEPTION_TYPE_CS_UNRECOVERABLE) { + queue->group->cs_unrecoverable = true; + if (kbase_prepare_to_reset_gpu(queue->kctx->kbdev, RESET_FLAGS_NONE)) + kbase_reset_gpu(queue->kctx->kbdev); + } + queue->cs_error = cs_fatal; + queue->cs_error_info = cs_fatal_info; + queue->cs_error_fatal = true; + queue_work(queue->kctx->csf.wq, &queue->cs_error_work); } -} - -/** - * handle_queue_exception_event - Handler for CS fatal/fault exception events. - * - * @queue: Pointer to queue for which fatal/fault event was received. - * @cs_req: Value of the CS_REQ register from the CS's input page. - * @cs_ack: Value of the CS_ACK register from the CS's output page. - */ -static void handle_queue_exception_event(struct kbase_queue *const queue, - const u32 cs_req, const u32 cs_ack) -{ - struct kbase_csf_cmd_stream_group_info const *ginfo; - struct kbase_csf_cmd_stream_info const *stream; - struct kbase_context *const kctx = queue->kctx; - struct kbase_device *const kbdev = kctx->kbdev; - struct kbase_queue_group *group = queue->group; - int csi_index = queue->csi_index; - int slot_num = group->csg_nr; + kbase_csf_firmware_cs_input_mask(stream, CS_REQ, cs_ack, + CS_REQ_FATAL_MASK); - kbase_csf_scheduler_spin_lock_assert_held(kbdev); - - ginfo = &kbdev->csf.global_iface.groups[slot_num]; - stream = &ginfo->streams[csi_index]; - - if ((cs_ack & CS_ACK_FATAL_MASK) != (cs_req & CS_REQ_FATAL_MASK)) { - handle_fatal_event(queue, stream); - kbase_csf_firmware_cs_input_mask(stream, CS_REQ, cs_ack, - CS_REQ_FATAL_MASK); - } - - if ((cs_ack & CS_ACK_FAULT_MASK) != (cs_req & CS_REQ_FAULT_MASK)) { - handle_fault_event(queue, stream); - kbase_csf_firmware_cs_input_mask(stream, CS_REQ, cs_ack, - CS_REQ_FAULT_MASK); - kbase_csf_ring_cs_kernel_doorbell(kbdev, csi_index, slot_num, true); - } } /** @@ -2436,6 +2601,9 @@ static void handle_queue_exception_event(struct kbase_queue *const queue, * @ginfo: The CSG interface provided by the firmware. * @irqreq: CSG's IRQ request bitmask (one bit per CS). * @irqack: CSG's IRQ acknowledge bitmask (one bit per CS). + * @track: Pointer that tracks the highest scanout priority idle CSG + * and any newly potentially viable protected mode requesting + * CSG in current IRQ context. * * If the interrupt request bitmask differs from the acknowledge bitmask * then the firmware is notifying the host of an event concerning those @@ -2444,8 +2612,9 @@ static void handle_queue_exception_event(struct kbase_queue *const queue, * the request and acknowledge registers for the individual CS(s). 
*/ static void process_cs_interrupts(struct kbase_queue_group *const group, - struct kbase_csf_cmd_stream_group_info const *const ginfo, - u32 const irqreq, u32 const irqack) + struct kbase_csf_cmd_stream_group_info const *const ginfo, + u32 const irqreq, u32 const irqack, + struct irq_idle_and_protm_track *track) { struct kbase_device *const kbdev = group->kctx->kbdev; u32 remaining = irqreq ^ irqack; @@ -2475,10 +2644,16 @@ static void process_cs_interrupts(struct kbase_queue_group *const group, kbase_csf_firmware_cs_output(stream, CS_ACK); struct workqueue_struct *wq = group->kctx->csf.wq; - if ((cs_req & CS_REQ_EXCEPTION_MASK) ^ - (cs_ack & CS_ACK_EXCEPTION_MASK)) { - KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, CSI_FAULT_INTERRUPT, group, queue, cs_req ^ cs_ack); - handle_queue_exception_event(queue, cs_req, cs_ack); + if ((cs_ack & CS_ACK_FATAL_MASK) != (cs_req & CS_REQ_FATAL_MASK)) { + KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, CSI_INTERRUPT_FAULT, + group, queue, cs_req ^ cs_ack); + handle_fatal_event(queue, stream, cs_ack); + } + + if ((cs_ack & CS_ACK_FAULT_MASK) != (cs_req & CS_REQ_FAULT_MASK)) { + KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, CSI_INTERRUPT_FAULT, + group, queue, cs_req ^ cs_ack); + handle_fault_event(queue, cs_ack); } /* PROTM_PEND and TILER_OOM can be safely ignored @@ -2489,30 +2664,35 @@ static void process_cs_interrupts(struct kbase_queue_group *const group, u32 const cs_req_remain = cs_req & ~CS_REQ_EXCEPTION_MASK; u32 const cs_ack_remain = cs_ack & ~CS_ACK_EXCEPTION_MASK; - KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, CSI_IGNORED_INTERRUPTS_GROUP_SUSPEND, - group, queue, cs_req_remain ^ cs_ack_remain); + KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, + CSI_INTERRUPT_GROUP_SUSPENDS_IGNORED, + group, queue, + cs_req_remain ^ cs_ack_remain); continue; } if (((cs_req & CS_REQ_TILER_OOM_MASK) ^ (cs_ack & CS_ACK_TILER_OOM_MASK))) { - get_queue(queue); - KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, CSI_TILER_OOM_INTERRUPT, group, queue, - cs_req ^ cs_ack); - if (WARN_ON(!queue_work(wq, &queue->oom_event_work))) { + KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, CSI_INTERRUPT_TILER_OOM, + group, queue, cs_req ^ cs_ack); + if (!queue_work(wq, &queue->oom_event_work)) { /* The work item shall not have been * already queued, there can be only * one pending OoM event for a * queue. 
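The tiler-OOM branch above relies on queue_work() returning false when the work item is already pending; since at most one OOM event per queue should ever be outstanding, the new code logs that case instead of the old WARN_ON plus queue release. A sketch of the idiom (not kbase code):

#include <linux/printk.h>
#include <linux/workqueue.h>

static void example_submit_oom_work(struct workqueue_struct *wq, struct work_struct *oom_work)
{
	/* queue_work() refuses to double-queue: a false return means an
	 * earlier OOM event is still pending and will cover this one too. */
	if (!queue_work(wq, oom_work))
		pr_debug("tiler OOM work already pending\n");
}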
*/ - release_queue(queue); + dev_warn( + kbdev->dev, + "Tiler OOM work pending: queue %d group %d (ctx %d_%d)", + queue->csi_index, group->handle, queue->kctx->tgid, + queue->kctx->id); } } if ((cs_req & CS_REQ_PROTM_PEND_MASK) ^ (cs_ack & CS_ACK_PROTM_PEND_MASK)) { - KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, CSI_PROTM_PEND_INTERRUPT, group, queue, - cs_req ^ cs_ack); + KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, CSI_INTERRUPT_PROTM_PEND, + group, queue, cs_req ^ cs_ack); dev_dbg(kbdev->dev, "Protected mode entry request for queue on csi %d bound to group-%d on slot %d", @@ -2520,7 +2700,7 @@ static void process_cs_interrupts(struct kbase_queue_group *const group, group->csg_nr); bitmap_set(group->protm_pending_bitmap, i, 1); - KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, PROTM_PENDING_SET, group, queue, + KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, CSI_PROTM_PEND_SET, group, queue, group->protm_pending_bitmap[0]); protm_pend = true; } @@ -2529,17 +2709,21 @@ static void process_cs_interrupts(struct kbase_queue_group *const group, if (protm_pend) { struct kbase_csf_scheduler *scheduler = &kbdev->csf.scheduler; - u32 current_protm_pending_seq = - scheduler->tick_protm_pending_seq; - if (current_protm_pending_seq > group->scan_seq_num) { + if (scheduler->tick_protm_pending_seq > group->scan_seq_num) { scheduler->tick_protm_pending_seq = group->scan_seq_num; - queue_work(group->kctx->csf.wq, &group->protm_event_work); + track->protm_grp = group; } + if (!group->protected_suspend_buf.pma) + kthread_queue_work(&group->kctx->csf.protm_event_worker, + &group->protm_event_work); + if (test_bit(group->csg_nr, scheduler->csg_slots_idle_mask)) { clear_bit(group->csg_nr, scheduler->csg_slots_idle_mask); + KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSG_SLOT_IDLE_CLEAR, group, + scheduler->csg_slots_idle_mask[0]); dev_dbg(kbdev->dev, "Group-%d on slot %d de-idled by protm request", group->handle, group->csg_nr); @@ -2552,6 +2736,8 @@ static void process_cs_interrupts(struct kbase_queue_group *const group, * * @kbdev: Instance of a GPU platform device that implements a CSF interface. * @csg_nr: CSG number. + * @track: Pointer that tracks the highest idle CSG and the newly possible viable + * protected mode requesting group, in current IRQ context. * * Handles interrupts for a CSG and for CSs within it. * @@ -2562,8 +2748,8 @@ static void process_cs_interrupts(struct kbase_queue_group *const group, * * See process_cs_interrupts() for details of per-stream interrupt handling. */ -static void process_csg_interrupts(struct kbase_device *const kbdev, - int const csg_nr) +static void process_csg_interrupts(struct kbase_device *const kbdev, int const csg_nr, + struct irq_idle_and_protm_track *track) { struct kbase_csf_cmd_stream_group_info *ginfo; struct kbase_queue_group *group = NULL; @@ -2574,8 +2760,6 @@ static void process_csg_interrupts(struct kbase_device *const kbdev, if (WARN_ON(csg_nr >= kbdev->csf.global_iface.group_num)) return; - KBASE_KTRACE_ADD(kbdev, CSG_INTERRUPT_PROCESS, NULL, csg_nr); - ginfo = &kbdev->csf.global_iface.groups[csg_nr]; req = kbase_csf_firmware_csg_input_read(ginfo, CSG_REQ); ack = kbase_csf_firmware_csg_output(ginfo, CSG_ACK); @@ -2584,7 +2768,7 @@ static void process_csg_interrupts(struct kbase_device *const kbdev, /* There may not be any pending CSG/CS interrupts to process */ if ((req == ack) && (irqreq == irqack)) - goto out; + return; /* Immediately set IRQ_ACK bits to be same as the IRQ_REQ bits before * examining the CS_ACK & CS_REQ bits. 
This would ensure that Host @@ -2605,33 +2789,28 @@ static void process_csg_interrupts(struct kbase_device *const kbdev, * slot scheduler spinlock is required. */ if (!group) - goto out; + return; if (WARN_ON(kbase_csf_scheduler_group_get_slot_locked(group) != csg_nr)) - goto out; - - if ((req ^ ack) & CSG_REQ_SYNC_UPDATE_MASK) { - kbase_csf_firmware_csg_input_mask(ginfo, - CSG_REQ, ack, CSG_REQ_SYNC_UPDATE_MASK); + return; - KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSG_SYNC_UPDATE_INTERRUPT, group, req ^ ack); + KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSG_INTERRUPT_PROCESS_START, group, csg_nr); - /* SYNC_UPDATE events shall invalidate GPU idle event */ - atomic_set(&kbdev->csf.scheduler.gpu_no_longer_idle, true); - - kbase_csf_event_signal_cpu_only(group->kctx); - } + kbase_csf_handle_csg_sync_update(kbdev, ginfo, group, req, ack); if ((req ^ ack) & CSG_REQ_IDLE_MASK) { struct kbase_csf_scheduler *scheduler = &kbdev->csf.scheduler; + KBASE_TLSTREAM_TL_KBASE_DEVICE_CSG_IDLE( + kbdev, kbdev->gpu_props.props.raw_props.gpu_id, csg_nr); + kbase_csf_firmware_csg_input_mask(ginfo, CSG_REQ, ack, CSG_REQ_IDLE_MASK); set_bit(csg_nr, scheduler->csg_slots_idle_mask); KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSG_SLOT_IDLE_SET, group, scheduler->csg_slots_idle_mask[0]); - KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSG_IDLE_INTERRUPT, group, req ^ ack); + KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSG_INTERRUPT_IDLE, group, req ^ ack); dev_dbg(kbdev->dev, "Idle notification received for Group %u on slot %d\n", group->handle, csg_nr); @@ -2639,42 +2818,37 @@ static void process_csg_interrupts(struct kbase_device *const kbdev, /* If there are non-idle CSGs waiting for a slot, fire * a tock for a replacement. */ - mod_delayed_work(scheduler->wq, &scheduler->tock_work, 0); + KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSG_INTERRUPT_NON_IDLE_GROUPS, + group, req ^ ack); + kbase_csf_scheduler_invoke_tock(kbdev); } else { - u32 current_protm_pending_seq = - scheduler->tick_protm_pending_seq; - - if ((current_protm_pending_seq != - KBASEP_TICK_PROTM_PEND_SCAN_SEQ_NR_INVALID) && - (group->scan_seq_num < current_protm_pending_seq)) { - /* If the protm enter was prevented due to groups - * priority, then fire a tock for the scheduler - * to re-examine the case. 
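The CSG idle handling above is bitmap bookkeeping: a slot's bit in csg_slots_idle_mask is set when its CSG reports IDLE and cleared again when a protected-mode request shows the group has become active. A reduced sketch of that bookkeeping with stand-in names:

#include <linux/bitmap.h>
#include <linux/bitops.h>

#define EXAMPLE_MAX_CSG_SLOTS 32

static DECLARE_BITMAP(example_idle_mask, EXAMPLE_MAX_CSG_SLOTS);

static void example_csg_reported_idle(unsigned int csg_nr)
{
	set_bit(csg_nr, example_idle_mask);
}

static void example_csg_deidled(unsigned int csg_nr)
{
	if (test_bit(csg_nr, example_idle_mask))
		clear_bit(csg_nr, example_idle_mask);
}

/* A caller can then ask whether every in-use slot has gone idle. */
static bool example_all_idle(const unsigned long *in_use, unsigned int nbits)
{
	return bitmap_subset(in_use, example_idle_mask, nbits);
}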
- */ - mod_delayed_work(scheduler->wq, - &scheduler->tock_work, 0); - } + KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSG_INTERRUPT_NO_NON_IDLE_GROUPS, + group, req ^ ack); + } + + if (group->scan_seq_num < track->idle_seq) { + track->idle_seq = group->scan_seq_num; + track->idle_slot = csg_nr; } } if ((req ^ ack) & CSG_REQ_PROGRESS_TIMER_EVENT_MASK) { kbase_csf_firmware_csg_input_mask(ginfo, CSG_REQ, ack, - CSG_REQ_PROGRESS_TIMER_EVENT_MASK); + CSG_REQ_PROGRESS_TIMER_EVENT_MASK); - KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSG_PROGRESS_TIMER_INTERRUPT, - group, req ^ ack); - dev_info(kbdev->dev, + KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSG_INTERRUPT_PROGRESS_TIMER_EVENT, group, + req ^ ack); + dev_info( + kbdev->dev, "[%llu] Iterator PROGRESS_TIMER timeout notification received for group %u of ctx %d_%d on slot %d\n", - kbase_backend_get_cycle_cnt(kbdev), - group->handle, group->kctx->tgid, group->kctx->id, csg_nr); + kbase_backend_get_cycle_cnt(kbdev), group->handle, group->kctx->tgid, + group->kctx->id, csg_nr); handle_progress_timer_event(group); } - process_cs_interrupts(group, ginfo, irqreq, irqack); + process_cs_interrupts(group, ginfo, irqreq, irqack, track); -out: - /* group may still be NULL here */ KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSG_INTERRUPT_PROCESS_END, group, ((u64)req ^ ack) | (((u64)irqreq ^ irqack) << 32)); } @@ -2793,6 +2967,7 @@ static inline void check_protm_enter_req_complete(struct kbase_device *kbdev, dev_dbg(kbdev->dev, "Protected mode entry interrupt received"); kbdev->protected_mode = true; + trace_mali_protected_mode(kbdev->protected_mode); kbase_ipa_protection_mode_switch_event(kbdev); kbase_ipa_control_protm_entered(kbdev); kbase_hwcnt_backend_csf_protm_entered(&kbdev->hwcnt_gpu_iface); @@ -2822,7 +2997,7 @@ static inline void process_protm_exit(struct kbase_device *kbdev, u32 glb_ack) GLB_REQ_PROTM_EXIT_MASK); if (likely(scheduler->active_protm_grp)) { - KBASE_KTRACE_ADD_CSF_GRP(kbdev, SCHEDULER_EXIT_PROTM, + KBASE_KTRACE_ADD_CSF_GRP(kbdev, SCHEDULER_PROTM_EXIT, scheduler->active_protm_grp, 0u); scheduler->active_protm_grp = NULL; } else { @@ -2831,80 +3006,230 @@ static inline void process_protm_exit(struct kbase_device *kbdev, u32 glb_ack) if (!WARN_ON(!kbdev->protected_mode)) { kbdev->protected_mode = false; + trace_mali_protected_mode(kbdev->protected_mode); kbase_ipa_control_protm_exited(kbdev); kbase_hwcnt_backend_csf_protm_exited(&kbdev->hwcnt_gpu_iface); } + +#if IS_ENABLED(CONFIG_MALI_CORESIGHT) + kbase_debug_coresight_csf_enable_pmode_exit(kbdev); +#endif /* IS_ENABLED(CONFIG_MALI_CORESIGHT) */ } -void kbase_csf_interrupt(struct kbase_device *kbdev, u32 val) +static inline void process_tracked_info_for_protm(struct kbase_device *kbdev, + struct irq_idle_and_protm_track *track) { - unsigned long flags; - u32 csg_interrupts = val & ~JOB_IRQ_GLOBAL_IF; + struct kbase_csf_scheduler *scheduler = &kbdev->csf.scheduler; + struct kbase_queue_group *group = track->protm_grp; + u32 current_protm_pending_seq = scheduler->tick_protm_pending_seq; - lockdep_assert_held(&kbdev->hwaccess_lock); + kbase_csf_scheduler_spin_lock_assert_held(kbdev); - KBASE_KTRACE_ADD(kbdev, CSF_INTERRUPT, NULL, val); - kbase_reg_write(kbdev, JOB_CONTROL_REG(JOB_IRQ_CLEAR), val); + if (likely(current_protm_pending_seq == KBASEP_TICK_PROTM_PEND_SCAN_SEQ_NR_INVALID)) + return; - if (csg_interrupts != 0) { - kbase_csf_scheduler_spin_lock(kbdev, &flags); - while (csg_interrupts != 0) { - int const csg_nr = ffs(csg_interrupts) - 1; + /* Handle protm from the tracked information */ + if (track->idle_seq < 
current_protm_pending_seq) { + /* If the protm enter was prevented due to groups priority, then fire a tock + * for the scheduler to re-examine the case. + */ + dev_dbg(kbdev->dev, "Attempt pending protm from idle slot %d\n", track->idle_slot); + kbase_csf_scheduler_invoke_tock(kbdev); + } else if (group) { + u32 i, num_groups = kbdev->csf.global_iface.group_num; + struct kbase_queue_group *grp; + bool tock_triggered = false; + + /* A new protm request, and track->idle_seq is not sufficient, check across + * previously notified idle CSGs in the current tick/tock cycle. + */ + for_each_set_bit(i, scheduler->csg_slots_idle_mask, num_groups) { + if (i == track->idle_slot) + continue; + grp = kbase_csf_scheduler_get_group_on_slot(kbdev, i); + /* If not NULL then the group pointer cannot disappear as the + * scheduler spinlock is held. + */ + if (grp == NULL) + continue; - process_csg_interrupts(kbdev, csg_nr); - csg_interrupts &= ~(1 << csg_nr); + if (grp->scan_seq_num < current_protm_pending_seq) { + tock_triggered = true; + dev_dbg(kbdev->dev, + "Attempt new protm from tick/tock idle slot %d\n", i); + kbase_csf_scheduler_invoke_tock(kbdev); + break; + } + } + + if (!tock_triggered) { + dev_dbg(kbdev->dev, "Group-%d on slot-%d start protm work\n", + group->handle, group->csg_nr); + kthread_queue_work(&group->kctx->csf.protm_event_worker, + &group->protm_event_work); } - kbase_csf_scheduler_spin_unlock(kbdev, flags); } +} - if (val & JOB_IRQ_GLOBAL_IF) { - const struct kbase_csf_global_iface *const global_iface = - &kbdev->csf.global_iface; +static void order_job_irq_clear_with_iface_mem_read(void) +{ + /* Ensure that write to the JOB_IRQ_CLEAR is ordered with regards to the + * read from interface memory. The ordering is needed considering the way + * FW & Kbase writes to the JOB_IRQ_RAWSTAT and JOB_IRQ_CLEAR registers + * without any synchronization. Without the barrier there is no guarantee + * about the ordering, the write to IRQ_CLEAR can take effect after the read + * from interface memory and that could cause a problem for the scenario where + * FW sends back to back notifications for the same CSG for events like + * SYNC_UPDATE and IDLE, but Kbase gets a single IRQ and observes only the + * first event. Similar thing can happen with glb events like CFG_ALLOC_EN + * acknowledgment and GPU idle notification. 
+ * + * MCU CPU + * --------------- ---------------- + * Update interface memory Write to IRQ_CLEAR to clear current IRQ + * <barrier> <barrier> + * Write to IRQ_RAWSTAT to raise new IRQ Read interface memory + */ - kbdev->csf.interrupt_received = true; + /* CPU and GPU would be in the same Outer shareable domain */ + dmb(osh); +} - if (!kbdev->csf.firmware_reloaded) - kbase_csf_firmware_reload_completed(kbdev); - else if (global_iface->output) { - u32 glb_req, glb_ack; +void kbase_csf_interrupt(struct kbase_device *kbdev, u32 val) +{ + bool deferred_handling_glb_idle_irq = false; - kbase_csf_scheduler_spin_lock(kbdev, &flags); - glb_req = kbase_csf_firmware_global_input_read( - global_iface, GLB_REQ); - glb_ack = kbase_csf_firmware_global_output( - global_iface, GLB_ACK); - KBASE_KTRACE_ADD(kbdev, GLB_REQ_ACQ, NULL, glb_req ^ glb_ack); + lockdep_assert_held(&kbdev->hwaccess_lock); - check_protm_enter_req_complete(kbdev, glb_req, glb_ack); + KBASE_KTRACE_ADD(kbdev, CSF_INTERRUPT_START, NULL, val); - if ((glb_req ^ glb_ack) & GLB_REQ_PROTM_EXIT_MASK) - process_protm_exit(kbdev, glb_ack); + do { + unsigned long flags; + u32 csg_interrupts = val & ~JOB_IRQ_GLOBAL_IF; + bool glb_idle_irq_received = false; - /* Handle IDLE Hysteresis notification event */ - if ((glb_req ^ glb_ack) & GLB_REQ_IDLE_EVENT_MASK) { - dev_dbg(kbdev->dev, "Idle-hysteresis event flagged"); - kbase_csf_firmware_global_input_mask( - global_iface, GLB_REQ, glb_ack, - GLB_REQ_IDLE_EVENT_MASK); + kbase_reg_write(kbdev, JOB_CONTROL_REG(JOB_IRQ_CLEAR), val); + order_job_irq_clear_with_iface_mem_read(); - kbase_csf_scheduler_process_gpu_idle_event(kbdev); - } + if (csg_interrupts != 0) { + struct irq_idle_and_protm_track track = { .protm_grp = NULL, + .idle_seq = U32_MAX, + .idle_slot = S8_MAX }; - process_prfcnt_interrupts(kbdev, glb_req, glb_ack); + kbase_csf_scheduler_spin_lock(kbdev, &flags); + /* Looping through and track the highest idle and protm groups */ + while (csg_interrupts != 0) { + int const csg_nr = ffs(csg_interrupts) - 1; + + process_csg_interrupts(kbdev, csg_nr, &track); + csg_interrupts &= ~(1 << csg_nr); + } + /* Handle protm from the tracked information */ + process_tracked_info_for_protm(kbdev, &track); kbase_csf_scheduler_spin_unlock(kbdev, flags); + } - /* Invoke the MCU state machine as a state transition - * might have completed. 
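The dmb(osh) introduced above pairs the CPU's write to JOB_IRQ_CLEAR with its subsequent reads of the firmware interface memory, mirroring the MCU-side barrier shown in the comment. A stripped-down illustration of that ordering under those assumptions; the example_* structure and field names are stand-ins, only the barrier and accessors are real kernel primitives, and the snippet is arm64-specific because of dmb(osh):

#include <linux/io.h>
#include <linux/types.h>
#include <asm/barrier.h>

struct example_dev {
	void __iomem *job_irq_clear;   /* MMIO clear register              */
	u32 *iface_mem;                /* shared firmware interface memory */
};

static u32 example_clear_irq_then_read_iface(struct example_dev *dev, u32 irq_bits)
{
	/* 1) Tell the GPU the current IRQ has been observed. */
	writel_relaxed(irq_bits, dev->job_irq_clear);

	/* 2) Order the clear against the interface-memory read below. CPU and
	 *    GPU sit in the same outer-shareable domain, hence dmb(osh);
	 *    without it the read could complete before the clear and a
	 *    back-to-back notification for the same CSG would be missed. */
	dmb(osh);

	/* 3) Now it is safe to sample CSG/CS state from interface memory. */
	return READ_ONCE(dev->iface_mem[0]);
}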
- */ - kbase_pm_update_state(kbdev); + if (val & JOB_IRQ_GLOBAL_IF) { + const struct kbase_csf_global_iface *const global_iface = + &kbdev->csf.global_iface; + + kbdev->csf.interrupt_received = true; + + if (!kbdev->csf.firmware_reloaded) + kbase_csf_firmware_reload_completed(kbdev); + else if (global_iface->output) { + u32 glb_req, glb_ack; + + kbase_csf_scheduler_spin_lock(kbdev, &flags); + glb_req = + kbase_csf_firmware_global_input_read(global_iface, GLB_REQ); + glb_ack = kbase_csf_firmware_global_output(global_iface, GLB_ACK); + KBASE_KTRACE_ADD(kbdev, CSF_INTERRUPT_GLB_REQ_ACK, NULL, + glb_req ^ glb_ack); + + check_protm_enter_req_complete(kbdev, glb_req, glb_ack); + + if ((glb_req ^ glb_ack) & GLB_REQ_PROTM_EXIT_MASK) + process_protm_exit(kbdev, glb_ack); + + /* Handle IDLE Hysteresis notification event */ + if ((glb_req ^ glb_ack) & GLB_REQ_IDLE_EVENT_MASK) { + dev_dbg(kbdev->dev, "Idle-hysteresis event flagged"); +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + if (kbase_csf_scheduler_process_gpu_idle_event(kbdev)) { + kbase_csf_firmware_global_input_mask( + global_iface, GLB_REQ, glb_ack, + GLB_REQ_IDLE_EVENT_MASK); + } +#else + kbase_csf_firmware_global_input_mask( + global_iface, GLB_REQ, glb_ack, + GLB_REQ_IDLE_EVENT_MASK); +#endif + + glb_idle_irq_received = true; + /* Defer handling this IRQ to account for a race condition + * where the idle worker could be executed before we have + * finished handling all pending IRQs (including CSG IDLE + * IRQs). + */ + deferred_handling_glb_idle_irq = true; + } + + process_prfcnt_interrupts(kbdev, glb_req, glb_ack); + + kbase_csf_scheduler_spin_unlock(kbdev, flags); + + /* Invoke the MCU state machine as a state transition + * might have completed. + */ + kbase_pm_update_state(kbdev); + } } + + if (!glb_idle_irq_received) + break; + /* Attempt to serve potential IRQs that might have occurred + * whilst handling the previous IRQ. In case we have observed + * the GLB IDLE IRQ without all CSGs having been marked as + * idle, the GPU would be treated as no longer idle and left + * powered on. 
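The comment above explains why the handler now loops: the GLB idle action must not run until every IRQ that was pending alongside it has been consumed. Reduced to its control flow (the example_* helpers are stand-ins, not the kbase functions), the loop looks like this:

#include <linux/types.h>

struct example_dev;
u32 example_read_irq_status(struct example_dev *dev);
void example_clear_irq(struct example_dev *dev, u32 val);
void example_handle_csg_bits(struct example_dev *dev, u32 val);
void example_handle_global_bits(struct example_dev *dev, u32 val, bool *glb_idle_seen);
void example_process_gpu_idle(struct example_dev *dev);

static void example_handle_job_irq(struct example_dev *dev, u32 val)
{
	bool deferred_idle = false;

	do {
		bool glb_idle_seen = false;

		example_clear_irq(dev, val);
		example_handle_csg_bits(dev, val);
		example_handle_global_bits(dev, val, &glb_idle_seen);

		if (!glb_idle_seen)
			break;

		/* Remember the idle event, but serve any IRQ that arrived
		 * while we were busy before acting on it. */
		deferred_idle = true;
		val = example_read_irq_status(dev);
	} while (val);

	if (deferred_idle)
		example_process_gpu_idle(dev);
}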
+ */ + val = kbase_reg_read(kbdev, JOB_CONTROL_REG(JOB_IRQ_STATUS)); + } while (val); + + if (deferred_handling_glb_idle_irq) { + unsigned long flags; + + kbase_csf_scheduler_spin_lock(kbdev, &flags); + kbase_csf_scheduler_process_gpu_idle_event(kbdev); + kbase_csf_scheduler_spin_unlock(kbdev, flags); } wake_up_all(&kbdev->csf.event_wait); + KBASE_KTRACE_ADD(kbdev, CSF_INTERRUPT_END, NULL, val); } +void kbase_csf_handle_csg_sync_update(struct kbase_device *const kbdev, + struct kbase_csf_cmd_stream_group_info *ginfo, + struct kbase_queue_group *group, u32 req, u32 ack) +{ + kbase_csf_scheduler_spin_lock_assert_held(kbdev); + + if ((req ^ ack) & CSG_REQ_SYNC_UPDATE_MASK) { + kbase_csf_firmware_csg_input_mask(ginfo, CSG_REQ, ack, CSG_REQ_SYNC_UPDATE_MASK); + + KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSG_INTERRUPT_SYNC_UPDATE, group, req ^ ack); + + /* SYNC_UPDATE events shall invalidate GPU idle event */ + atomic_set(&kbdev->csf.scheduler.gpu_no_longer_idle, true); + + kbase_csf_event_signal_cpu_only(group->kctx); + } +} + void kbase_csf_doorbell_mapping_term(struct kbase_device *kbdev) { if (kbdev->csf.db_filp) { @@ -2924,13 +3249,12 @@ int kbase_csf_doorbell_mapping_init(struct kbase_device *kbdev) struct file *filp; int ret; - filp = shmem_file_setup("mali csf", MAX_LFS_FILESIZE, VM_NORESERVE); + filp = shmem_file_setup("mali csf db", MAX_LFS_FILESIZE, VM_NORESERVE); if (IS_ERR(filp)) return PTR_ERR(filp); - ret = kbase_mem_pool_alloc_pages( - &kbdev->mem_pools.small[KBASE_MEM_GROUP_CSF_FW], - 1, &phys, false); + ret = kbase_mem_pool_alloc_pages(&kbdev->mem_pools.small[KBASE_MEM_GROUP_CSF_FW], 1, &phys, + false, NULL); if (ret <= 0) { fput(filp); @@ -2944,47 +3268,74 @@ int kbase_csf_doorbell_mapping_init(struct kbase_device *kbdev) return 0; } +void kbase_csf_pending_gpuq_kicks_init(struct kbase_device *kbdev) +{ + size_t i; + + for (i = 0; i != ARRAY_SIZE(kbdev->csf.pending_gpuq_kicks); ++i) + INIT_LIST_HEAD(&kbdev->csf.pending_gpuq_kicks[i]); + spin_lock_init(&kbdev->csf.pending_gpuq_kicks_lock); +} + +void kbase_csf_pending_gpuq_kicks_term(struct kbase_device *kbdev) +{ + size_t i; + + spin_lock(&kbdev->csf.pending_gpuq_kicks_lock); + for (i = 0; i != ARRAY_SIZE(kbdev->csf.pending_gpuq_kicks); ++i) { + if (!list_empty(&kbdev->csf.pending_gpuq_kicks[i])) + dev_warn(kbdev->dev, + "Some GPU queue kicks for priority %zu were not handled", i); + } + spin_unlock(&kbdev->csf.pending_gpuq_kicks_lock); +} + void kbase_csf_free_dummy_user_reg_page(struct kbase_device *kbdev) { - if (as_phys_addr_t(kbdev->csf.dummy_user_reg_page)) { - struct page *page = as_page(kbdev->csf.dummy_user_reg_page); + if (kbdev->csf.user_reg.filp) { + struct page *page = as_page(kbdev->csf.user_reg.dummy_page); - kbase_mem_pool_free( - &kbdev->mem_pools.small[KBASE_MEM_GROUP_CSF_FW], page, - false); + kbase_mem_pool_free(&kbdev->mem_pools.small[KBASE_MEM_GROUP_CSF_FW], page, false); + fput(kbdev->csf.user_reg.filp); } } int kbase_csf_setup_dummy_user_reg_page(struct kbase_device *kbdev) { struct tagged_addr phys; + struct file *filp; struct page *page; u32 *addr; - int ret; - kbdev->csf.dummy_user_reg_page = as_tagged(0); + kbdev->csf.user_reg.filp = NULL; - ret = kbase_mem_pool_alloc_pages( - &kbdev->mem_pools.small[KBASE_MEM_GROUP_CSF_FW], 1, &phys, - false); + filp = shmem_file_setup("mali csf user_reg", MAX_LFS_FILESIZE, VM_NORESERVE); + if (IS_ERR(filp)) { + dev_err(kbdev->dev, "failed to get an unlinked file for user_reg"); + return PTR_ERR(filp); + } - if (ret <= 0) - return ret; + if 
(kbase_mem_pool_alloc_pages(&kbdev->mem_pools.small[KBASE_MEM_GROUP_CSF_FW], 1, &phys, + false, NULL) <= 0) { + fput(filp); + return -ENOMEM; + } page = as_page(phys); - addr = kmap_atomic(page); + addr = kbase_kmap_atomic(page); /* Write a special value for the latest flush register inside the * dummy page */ addr[LATEST_FLUSH / sizeof(u32)] = POWER_DOWN_LATEST_FLUSH_VALUE; - kbase_sync_single_for_device(kbdev, kbase_dma_addr(page), sizeof(u32), + kbase_sync_single_for_device(kbdev, kbase_dma_addr(page) + LATEST_FLUSH, sizeof(u32), DMA_BIDIRECTIONAL); - kunmap_atomic(addr); - - kbdev->csf.dummy_user_reg_page = phys; + kbase_kunmap_atomic(addr); + kbdev->csf.user_reg.filp = filp; + kbdev->csf.user_reg.dummy_page = phys; + kbdev->csf.user_reg.file_offset = 0; return 0; } @@ -3001,3 +3352,60 @@ u8 kbase_csf_priority_check(struct kbase_device *kbdev, u8 req_priority) return out_priority; } + +void kbase_csf_process_queue_kick(struct kbase_queue *queue) +{ + struct kbase_context *kctx = queue->kctx; + struct kbase_device *kbdev = kctx->kbdev; + bool retry_kick = false; + int err = kbase_reset_gpu_prevent_and_wait(kbdev); + + if (err) { + dev_err(kbdev->dev, "Unsuccessful GPU reset detected when kicking queue"); + goto out_release_queue; + } + + rt_mutex_lock(&kctx->csf.lock); + + if (queue->bind_state != KBASE_CSF_QUEUE_BOUND) + goto out_allow_gpu_reset; + + err = kbase_csf_scheduler_queue_start(queue); + if (unlikely(err)) { + dev_dbg(kbdev->dev, "Failed to start queue"); + if (err == -EBUSY) { + retry_kick = true; + + spin_lock(&kbdev->csf.pending_gpuq_kicks_lock); + if (list_empty(&queue->pending_kick_link)) { + /* A failed queue kick shall be pushed to the + * back of the queue to avoid potential abuse. + */ + list_add_tail( + &queue->pending_kick_link, + &kbdev->csf.pending_gpuq_kicks[queue->group_priority]); + spin_unlock(&kbdev->csf.pending_gpuq_kicks_lock); + } else { + spin_unlock(&kbdev->csf.pending_gpuq_kicks_lock); + WARN_ON(atomic_read(&queue->pending_kick) == 0); + } + + complete(&kbdev->csf.scheduler.kthread_signal); + } + } + +out_allow_gpu_reset: + if (likely(!retry_kick)) { + WARN_ON(atomic_read(&queue->pending_kick) == 0); + atomic_dec(&queue->pending_kick); + } + + rt_mutex_unlock(&kctx->csf.lock); + + kbase_reset_gpu_allow(kbdev); + + return; +out_release_queue: + WARN_ON(atomic_read(&queue->pending_kick) == 0); + atomic_dec(&queue->pending_kick); +} diff --git a/mali_kbase/csf/mali_kbase_csf.h b/mali_kbase/csf/mali_kbase_csf.h index 46a0529..29119e1 100644 --- a/mali_kbase/csf/mali_kbase_csf.h +++ b/mali_kbase/csf/mali_kbase_csf.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2018-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2018-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -40,14 +40,17 @@ */ #define KBASEP_USER_DB_NR_INVALID ((s8)-1) +/* Number of pages used for GPU command queue's User input & output data */ +#define KBASEP_NUM_CS_USER_IO_PAGES (2) + /* Indicates an invalid value for the scan out sequence number, used to * signify there is no group that has protected mode execution pending. 
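The kbase_csf_process_queue_kick() hunk above re-queues a kick that failed with -EBUSY onto the per-priority pending list, and uses list_empty() on the element's own link to avoid listing it twice. A small sketch of that idiom (stand-in types; the link must be initialised with INIT_LIST_HEAD() and removed with list_del_init() for the emptiness test to work):

#include <linux/list.h>
#include <linux/spinlock.h>

struct example_kick_lists {
	spinlock_t lock;
	struct list_head per_prio[4];    /* one list per group priority */
};

static void example_requeue_kick(struct example_kick_lists *kl,
				 struct list_head *kick_link, unsigned int prio)
{
	spin_lock(&kl->lock);
	/* An empty (self-linked) link means the kick is not queued yet; push
	 * it to the back of its priority list so it is retried after the
	 * kicks that are already waiting. */
	if (list_empty(kick_link))
		list_add_tail(kick_link, &kl->per_prio[prio]);
	spin_unlock(&kl->lock);
}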
*/ #define KBASEP_TICK_PROTM_PEND_SCAN_SEQ_NR_INVALID (U32_MAX) -#define FIRMWARE_PING_INTERVAL_MS (12000) /* 12 seconds */ - -#define FIRMWARE_IDLE_HYSTERESIS_TIME_MS (10) /* Default 10 milliseconds */ +/* 60ms optimizes power while minimizing latency impact for UI test cases. */ +#define MALI_HOST_CONTROLS_SC_RAILS_IDLE_TIMER_NS (600 * 1000) +#define FIRMWARE_IDLE_HYSTERESIS_TIME_NS (60 * 1000 * 1000) /* Default 60 milliseconds */ /* Idle hysteresis time can be scaled down when GPU sleep feature is used */ #define FIRMWARE_IDLE_HYSTERESIS_GPU_SLEEP_SCALER (5) @@ -75,6 +78,18 @@ void kbase_csf_ctx_handle_fault(struct kbase_context *kctx, struct kbase_fault *fault); /** + * kbase_csf_ctx_report_page_fault_for_active_groups - Notify Userspace about GPU page fault + * for active groups of the faulty context. + * + * @kctx: Pointer to faulty kbase context. + * @fault: Pointer to the fault. + * + * This function notifies the event notification thread of the GPU page fault. + */ +void kbase_csf_ctx_report_page_fault_for_active_groups(struct kbase_context *kctx, + struct kbase_fault *fault); + +/** * kbase_csf_ctx_term - Terminate the CSF interface for a GPU address space. * * @kctx: Pointer to the kbase context which is being terminated. @@ -126,6 +141,25 @@ void kbase_csf_queue_terminate(struct kbase_context *kctx, struct kbase_ioctl_cs_queue_terminate *term); /** + * kbase_csf_free_command_stream_user_pages() - Free the resources allocated + * for a queue at the time of bind. + * + * @kctx: Address of the kbase context within which the queue was created. + * @queue: Pointer to the queue to be unlinked. + * + * This function will free the pair of physical pages allocated for a GPU + * command queue, and also release the hardware doorbell page, that were mapped + * into the process address space to enable direct submission of commands to + * the hardware. Also releases the reference taken on the queue when the mapping + * was created. + * + * If an explicit or implicit unbind was missed by the userspace then the + * mapping will persist. On process exit kernel itself will remove the mapping. + */ +void kbase_csf_free_command_stream_user_pages(struct kbase_context *kctx, + struct kbase_queue *queue); + +/** * kbase_csf_alloc_command_stream_user_pages - Allocate resources for a * GPU command queue. * * @@ -161,8 +195,9 @@ int kbase_csf_queue_bind(struct kbase_context *kctx, * are any. * * @queue: Pointer to queue to be unbound. + * @process_exit: Flag to indicate if process exit is happening. */ -void kbase_csf_queue_unbind(struct kbase_queue *queue); +void kbase_csf_queue_unbind(struct kbase_queue *queue, bool process_exit); /** * kbase_csf_queue_unbind_stopped - Unbind a GPU command queue in the case @@ -187,6 +222,20 @@ int kbase_csf_queue_kick(struct kbase_context *kctx, struct kbase_ioctl_cs_queue_kick *kick); /** + * kbase_csf_find_queue_group - Find the queue group corresponding + * to the indicated handle. + * + * @kctx: The kbase context under which the queue group exists. + * @group_handle: Handle for the group which uniquely identifies it within + * the context with which it was created. + * + * This function is used to find the queue group when passed a handle. + * + * Return: Pointer to a queue group on success, NULL on failure + */ +struct kbase_queue_group *kbase_csf_find_queue_group(struct kbase_context *kctx, u8 group_handle); + +/** * kbase_csf_queue_group_handle_is_valid - Find if the given queue group handle * is valid.
* @@ -239,6 +288,7 @@ void kbase_csf_queue_group_terminate(struct kbase_context *kctx, */ void kbase_csf_term_descheduled_queue_group(struct kbase_queue_group *group); +#if IS_ENABLED(CONFIG_MALI_VECTOR_DUMP) || MALI_UNIT_TEST /** * kbase_csf_queue_group_suspend - Suspend a GPU command queue group * @@ -256,6 +306,7 @@ void kbase_csf_term_descheduled_queue_group(struct kbase_queue_group *group); */ int kbase_csf_queue_group_suspend(struct kbase_context *kctx, struct kbase_suspend_copy_buffer *sus_buf, u8 group_handle); +#endif /** * kbase_csf_add_group_fatal_error - Report a fatal group error to userspace @@ -276,6 +327,19 @@ void kbase_csf_add_group_fatal_error( void kbase_csf_interrupt(struct kbase_device *kbdev, u32 val); /** + * kbase_csf_handle_csg_sync_update - Handle SYNC_UPDATE notification for the group. + * + * @kbdev: The kbase device to handle the SYNC_UPDATE interrupt. + * @ginfo: Pointer to the CSG interface used by the @group + * @group: Pointer to the GPU command queue group. + * @req: CSG_REQ register value corresponding to @group. + * @ack: CSG_ACK register value corresponding to @group. + */ +void kbase_csf_handle_csg_sync_update(struct kbase_device *const kbdev, + struct kbase_csf_cmd_stream_group_info *ginfo, + struct kbase_queue_group *group, u32 req, u32 ack); + +/** * kbase_csf_doorbell_mapping_init - Initialize the fields that facilitates * the update of userspace mapping of HW * doorbell page. @@ -324,6 +388,22 @@ int kbase_csf_setup_dummy_user_reg_page(struct kbase_device *kbdev); void kbase_csf_free_dummy_user_reg_page(struct kbase_device *kbdev); /** + * kbase_csf_pending_gpuq_kicks_init - Initialize the data used for handling + * GPU queue kicks. + * + * @kbdev: Instance of a GPU platform device that implements a CSF interface. + */ +void kbase_csf_pending_gpuq_kicks_init(struct kbase_device *kbdev); + +/** + * kbase_csf_pending_gpuq_kicks_init - De-initialize the data used for handling + * GPU queue kicks. + * + * @kbdev: Instance of a GPU platform device that implements a CSF interface. + */ +void kbase_csf_pending_gpuq_kicks_term(struct kbase_device *kbdev); + +/** * kbase_csf_ring_csg_doorbell - ring the doorbell for a CSG interface. * * @kbdev: Instance of a GPU platform device that implements a CSF interface. @@ -465,4 +545,18 @@ static inline u64 kbase_csf_ktrace_gpu_cycle_cnt(struct kbase_device *kbdev) return 0; #endif } + +/** + * kbase_csf_process_queue_kick() - Process a pending kicked GPU command queue. + * + * @queue: Pointer to the queue to process. + * + * This function starts the pending queue, for which the work + * was previously submitted via ioctl call from application thread. + * If the queue is already scheduled and resident, it will be started + * right away, otherwise once the group is made resident. + */ +void kbase_csf_process_queue_kick(struct kbase_queue *queue); + + #endif /* _KBASE_CSF_H_ */ diff --git a/mali_kbase/csf/mali_kbase_csf_cpu_queue_debugfs.c b/mali_kbase/csf/mali_kbase_csf_cpu_queue_debugfs.c index 66b671d..d783650 100644 --- a/mali_kbase/csf/mali_kbase_csf_cpu_queue_debugfs.c +++ b/mali_kbase/csf/mali_kbase_csf_cpu_queue_debugfs.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2020-2023 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -51,18 +51,18 @@ static int kbasep_csf_cpu_queue_debugfs_show(struct seq_file *file, void *data) { struct kbase_context *kctx = file->private; - mutex_lock(&kctx->csf.lock); + rt_mutex_lock(&kctx->csf.lock); if (atomic_read(&kctx->csf.cpu_queue.dump_req_status) != BASE_CSF_CPU_QUEUE_DUMP_COMPLETE) { seq_puts(file, "Dump request already started! (try again)\n"); - mutex_unlock(&kctx->csf.lock); + rt_mutex_unlock(&kctx->csf.lock); return -EBUSY; } atomic_set(&kctx->csf.cpu_queue.dump_req_status, BASE_CSF_CPU_QUEUE_DUMP_ISSUED); init_completion(&kctx->csf.cpu_queue.dump_cmp); kbase_event_wakeup_nosync(kctx); - mutex_unlock(&kctx->csf.lock); + rt_mutex_unlock(&kctx->csf.lock); seq_puts(file, "CPU Queues table (version:v" __stringify(MALI_CSF_CPU_QUEUE_DEBUGFS_VERSION) "):\n"); @@ -70,7 +70,7 @@ static int kbasep_csf_cpu_queue_debugfs_show(struct seq_file *file, void *data) wait_for_completion_timeout(&kctx->csf.cpu_queue.dump_cmp, msecs_to_jiffies(3000)); - mutex_lock(&kctx->csf.lock); + rt_mutex_lock(&kctx->csf.lock); if (kctx->csf.cpu_queue.buffer) { WARN_ON(atomic_read(&kctx->csf.cpu_queue.dump_req_status) != BASE_CSF_CPU_QUEUE_DUMP_PENDING); @@ -86,7 +86,7 @@ static int kbasep_csf_cpu_queue_debugfs_show(struct seq_file *file, void *data) atomic_set(&kctx->csf.cpu_queue.dump_req_status, BASE_CSF_CPU_QUEUE_DUMP_COMPLETE); - mutex_unlock(&kctx->csf.lock); + rt_mutex_unlock(&kctx->csf.lock); return 0; } @@ -126,33 +126,30 @@ void kbase_csf_cpu_queue_debugfs_init(struct kbase_context *kctx) int kbase_csf_cpu_queue_dump(struct kbase_context *kctx, u64 buffer, size_t buf_size) { - int err = 0; - size_t alloc_size = buf_size; char *dump_buffer; if (!buffer || !alloc_size) - goto done; + return 0; + + if (alloc_size > SIZE_MAX - PAGE_SIZE) + return -ENOMEM; alloc_size = (alloc_size + PAGE_SIZE) & ~(PAGE_SIZE - 1); dump_buffer = kzalloc(alloc_size, GFP_KERNEL); - if (ZERO_OR_NULL_PTR(dump_buffer)) { - err = -ENOMEM; - goto done; - } + if (!dump_buffer) + return -ENOMEM; WARN_ON(kctx->csf.cpu_queue.buffer != NULL); - err = copy_from_user(dump_buffer, + if (copy_from_user(dump_buffer, u64_to_user_ptr(buffer), - buf_size); - if (err) { + buf_size)) { kfree(dump_buffer); - err = -EFAULT; - goto done; + return -EFAULT; } - mutex_lock(&kctx->csf.lock); + rt_mutex_lock(&kctx->csf.lock); kfree(kctx->csf.cpu_queue.buffer); @@ -161,13 +158,12 @@ int kbase_csf_cpu_queue_dump(struct kbase_context *kctx, kctx->csf.cpu_queue.buffer = dump_buffer; kctx->csf.cpu_queue.buffer_size = buf_size; complete_all(&kctx->csf.cpu_queue.dump_cmp); - } else { + } else kfree(dump_buffer); - } - mutex_unlock(&kctx->csf.lock); -done: - return err; + rt_mutex_unlock(&kctx->csf.lock); + + return 0; } #else /* diff --git a/mali_kbase/csf/mali_kbase_csf_csg_debugfs.c b/mali_kbase/csf/mali_kbase_csf_csg_debugfs.c index 2075797..c94e656 100644 --- a/mali_kbase/csf/mali_kbase_csf_csg_debugfs.c +++ b/mali_kbase/csf/mali_kbase_csf_csg_debugfs.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2023 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -23,11 +23,137 @@ #include <mali_kbase.h> #include <linux/seq_file.h> #include <linux/delay.h> -#include <csf/mali_kbase_csf_trace_buffer.h> #include <backend/gpu/mali_kbase_pm_internal.h> #if IS_ENABLED(CONFIG_DEBUG_FS) #include "mali_kbase_csf_tl_reader.h" +#include <linux/version_compat_defs.h> + +/* Wait time to be used cumulatively for all the CSG slots. + * Since scheduler lock is held when STATUS_UPDATE request is sent, there won't be + * any other Host request pending on the FW side and usually FW would be responsive + * to the Doorbell IRQs as it won't do any polling for a long time and also it won't + * have to wait for any HW state transition to complete for publishing the status. + * So it is reasonable to expect that handling of STATUS_UPDATE request would be + * relatively very quick. + */ +#define STATUS_UPDATE_WAIT_TIMEOUT 500 + +/* The bitmask of CSG slots for which the STATUS_UPDATE request completed. + * The access to it is serialized with scheduler lock, so at a time it would + * get used either for "active_groups" or per context "groups" debugfs file. + */ +static DECLARE_BITMAP(csg_slots_status_updated, MAX_SUPPORTED_CSGS); + +static +bool csg_slot_status_update_finish(struct kbase_device *kbdev, u32 csg_nr) +{ + struct kbase_csf_cmd_stream_group_info const *const ginfo = + &kbdev->csf.global_iface.groups[csg_nr]; + + return !((kbase_csf_firmware_csg_input_read(ginfo, CSG_REQ) ^ + kbase_csf_firmware_csg_output(ginfo, CSG_ACK)) & + CSG_REQ_STATUS_UPDATE_MASK); +} + +static +bool csg_slots_status_update_finish(struct kbase_device *kbdev, + const unsigned long *slots_mask) +{ + const u32 max_csg_slots = kbdev->csf.global_iface.group_num; + bool changed = false; + u32 csg_nr; + + lockdep_assert_held(&kbdev->csf.scheduler.lock); + + for_each_set_bit(csg_nr, slots_mask, max_csg_slots) { + if (csg_slot_status_update_finish(kbdev, csg_nr)) { + set_bit(csg_nr, csg_slots_status_updated); + changed = true; + } + } + + return changed; +} + +static void wait_csg_slots_status_update_finish(struct kbase_device *kbdev, + unsigned long *slots_mask) +{ + const u32 max_csg_slots = kbdev->csf.global_iface.group_num; + long remaining = kbase_csf_timeout_in_jiffies(STATUS_UPDATE_WAIT_TIMEOUT); + + lockdep_assert_held(&kbdev->csf.scheduler.lock); + + bitmap_zero(csg_slots_status_updated, max_csg_slots); + + while (!bitmap_empty(slots_mask, max_csg_slots) && remaining) { + remaining = wait_event_timeout(kbdev->csf.event_wait, + csg_slots_status_update_finish(kbdev, slots_mask), + remaining); + if (likely(remaining)) { + bitmap_andnot(slots_mask, slots_mask, + csg_slots_status_updated, max_csg_slots); + } else { + dev_warn(kbdev->dev, + "STATUS_UPDATE request timed out for slots 0x%lx", + slots_mask[0]); + } + } +} + +void kbase_csf_debugfs_update_active_groups_status(struct kbase_device *kbdev) +{ + u32 max_csg_slots = kbdev->csf.global_iface.group_num; + DECLARE_BITMAP(used_csgs, MAX_SUPPORTED_CSGS) = { 0 }; + u32 csg_nr; + unsigned long flags; + + lockdep_assert_held(&kbdev->csf.scheduler.lock); + + /* Global doorbell ring for CSG STATUS_UPDATE request or User doorbell + * ring for Extract offset update, shall not be made when MCU has been + * put to sleep otherwise it will undesirably make MCU exit the sleep + * state. 
Also it isn't really needed as FW will implicitly update the + * status of all on-slot groups when MCU sleep request is sent to it. + */ + if (kbdev->csf.scheduler.state == SCHED_SLEEPING) { + /* Wait for the MCU sleep request to complete. */ + kbase_pm_wait_for_desired_state(kbdev); + bitmap_copy(csg_slots_status_updated, + kbdev->csf.scheduler.csg_inuse_bitmap, max_csg_slots); + return; + } + + for (csg_nr = 0; csg_nr < max_csg_slots; csg_nr++) { + struct kbase_queue_group *const group = + kbdev->csf.scheduler.csg_slots[csg_nr].resident_group; + if (!group) + continue; + /* Ring the User doorbell for FW to update the Extract offset */ + kbase_csf_ring_doorbell(kbdev, group->doorbell_nr); + set_bit(csg_nr, used_csgs); + } + + /* Return early if there are no on-slot groups */ + if (bitmap_empty(used_csgs, max_csg_slots)) + return; + + kbase_csf_scheduler_spin_lock(kbdev, &flags); + for_each_set_bit(csg_nr, used_csgs, max_csg_slots) { + struct kbase_csf_cmd_stream_group_info const *const ginfo = + &kbdev->csf.global_iface.groups[csg_nr]; + kbase_csf_firmware_csg_input_mask(ginfo, CSG_REQ, + ~kbase_csf_firmware_csg_output(ginfo, CSG_ACK), + CSG_REQ_STATUS_UPDATE_MASK); + } + + BUILD_BUG_ON(MAX_SUPPORTED_CSGS > (sizeof(used_csgs[0]) * BITS_PER_BYTE)); + kbase_csf_ring_csg_slots_doorbell(kbdev, used_csgs[0]); + kbase_csf_scheduler_spin_unlock(kbdev, flags); + wait_csg_slots_status_update_finish(kbdev, used_csgs); + /* Wait for the User doobell ring to take effect */ + msleep(100); +} #define MAX_SCHED_STATE_STRING_LEN (16) static const char *scheduler_state_to_string(struct kbase_device *kbdev, @@ -77,16 +203,32 @@ static const char *blocked_reason_to_string(u32 reason_id) return cs_blocked_reason[reason_id]; } +static bool sb_source_supported(u32 glb_version) +{ + bool supported = false; + + if (((GLB_VERSION_MAJOR_GET(glb_version) == 3) && + (GLB_VERSION_MINOR_GET(glb_version) >= 5)) || + ((GLB_VERSION_MAJOR_GET(glb_version) == 2) && + (GLB_VERSION_MINOR_GET(glb_version) >= 6)) || + ((GLB_VERSION_MAJOR_GET(glb_version) == 1) && + (GLB_VERSION_MINOR_GET(glb_version) >= 3))) + supported = true; + + return supported; +} + static void kbasep_csf_scheduler_dump_active_queue_cs_status_wait( - struct seq_file *file, u32 wait_status, u32 wait_sync_value, - u64 wait_sync_live_value, u64 wait_sync_pointer, u32 sb_status, - u32 blocked_reason) + struct seq_file *file, u32 glb_version, u32 wait_status, u32 wait_sync_value, + u64 wait_sync_live_value, u64 wait_sync_pointer, u32 sb_status, u32 blocked_reason) { #define WAITING "Waiting" #define NOT_WAITING "Not waiting" seq_printf(file, "SB_MASK: %d\n", CS_STATUS_WAIT_SB_MASK_GET(wait_status)); + if (sb_source_supported(glb_version)) + seq_printf(file, "SB_SOURCE: %d\n", CS_STATUS_WAIT_SB_SOURCE_GET(wait_status)); seq_printf(file, "PROGRESS_WAIT: %s\n", CS_STATUS_WAIT_PROGRESS_WAIT_GET(wait_status) ? 
WAITING : NOT_WAITING); @@ -145,7 +287,8 @@ static void kbasep_csf_scheduler_dump_active_cs_trace(struct seq_file *file, static void kbasep_csf_scheduler_dump_active_queue(struct seq_file *file, struct kbase_queue *queue) { - u32 *addr; + u64 *addr; + u32 *addr32; u64 cs_extract; u64 cs_insert; u32 cs_active; @@ -156,20 +299,25 @@ static void kbasep_csf_scheduler_dump_active_queue(struct seq_file *file, struct kbase_vmap_struct *mapping; u64 *evt; u64 wait_sync_live_value; + u32 glb_version; if (!queue) return; + glb_version = queue->kctx->kbdev->csf.global_iface.version; + if (WARN_ON(queue->csi_index == KBASEP_IF_NR_INVALID || !queue->group)) return; - addr = (u32 *)queue->user_io_addr; - cs_insert = addr[CS_INSERT_LO/4] | ((u64)addr[CS_INSERT_HI/4] << 32); + addr = queue->user_io_addr; + cs_insert = addr[CS_INSERT_LO / sizeof(*addr)]; + + addr = queue->user_io_addr + PAGE_SIZE / sizeof(*addr); + cs_extract = addr[CS_EXTRACT_LO / sizeof(*addr)]; - addr = (u32 *)(queue->user_io_addr + PAGE_SIZE); - cs_extract = addr[CS_EXTRACT_LO/4] | ((u64)addr[CS_EXTRACT_HI/4] << 32); - cs_active = addr[CS_ACTIVE/4]; + addr32 = (u32 *)(queue->user_io_addr + PAGE_SIZE / sizeof(*addr)); + cs_active = addr32[CS_ACTIVE / sizeof(*addr32)]; #define KBASEP_CSF_DEBUGFS_CS_HEADER_USER_IO \ "Bind Idx, Ringbuf addr, Size, Prio, Insert offset, Extract offset, Active, Doorbell\n" @@ -200,9 +348,8 @@ static void kbasep_csf_scheduler_dump_active_queue(struct seq_file *file, } kbasep_csf_scheduler_dump_active_queue_cs_status_wait( - file, wait_status, wait_sync_value, - wait_sync_live_value, wait_sync_pointer, - sb_status, blocked_reason); + file, glb_version, wait_status, wait_sync_value, + wait_sync_live_value, wait_sync_pointer, sb_status, blocked_reason); } } else { struct kbase_device const *const kbdev = @@ -257,9 +404,8 @@ static void kbasep_csf_scheduler_dump_active_queue(struct seq_file *file, } kbasep_csf_scheduler_dump_active_queue_cs_status_wait( - file, wait_status, wait_sync_value, - wait_sync_live_value, wait_sync_pointer, sb_status, - blocked_reason); + file, glb_version, wait_status, wait_sync_value, wait_sync_live_value, + wait_sync_pointer, sb_status, blocked_reason); /* Dealing with cs_trace */ if (kbase_csf_scheduler_queue_has_trace(queue)) kbasep_csf_scheduler_dump_active_cs_trace(file, stream); @@ -270,54 +416,6 @@ static void kbasep_csf_scheduler_dump_active_queue(struct seq_file *file, seq_puts(file, "\n"); } -static void update_active_group_status(struct seq_file *file, - struct kbase_queue_group *const group) -{ - struct kbase_device *const kbdev = group->kctx->kbdev; - struct kbase_csf_cmd_stream_group_info const *const ginfo = - &kbdev->csf.global_iface.groups[group->csg_nr]; - long remaining = kbase_csf_timeout_in_jiffies(kbdev->csf.fw_timeout_ms); - unsigned long flags; - - /* Global doorbell ring for CSG STATUS_UPDATE request or User doorbell - * ring for Extract offset update, shall not be made when MCU has been - * put to sleep otherwise it will undesirably make MCU exit the sleep - * state. Also it isn't really needed as FW will implicitly update the - * status of all on-slot groups when MCU sleep request is sent to it. - */ - if (kbdev->csf.scheduler.state == SCHED_SLEEPING) - return; - - /* Ring the User doobell shared between the queues bound to this - * group, to have FW update the CS_EXTRACT for all the queues - * bound to the group. Ring early so that FW gets adequate time - * for the handling. 
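The debugfs dump above now reads the ring-buffer pointers through u64 loads from the queue's user I/O pages: the first page is the input page carrying CS_INSERT, the page after it is the output page carrying CS_EXTRACT and CS_ACTIVE, presumably so each 64-bit pointer is read in a single access rather than as two u32 halves. The same access pattern in isolation (the offsets are the CS_* constants from the CSF interface headers):

static void example_read_queue_pointers(u64 *user_io_addr, u64 *insert, u64 *extract,
					u32 *active)
{
	u64 *input_page = user_io_addr;
	u64 *output_page = user_io_addr + PAGE_SIZE / sizeof(u64);
	u32 *output_page32 = (u32 *)output_page;

	*insert = input_page[CS_INSERT_LO / sizeof(u64)];     /* host-written insert offset     */
	*extract = output_page[CS_EXTRACT_LO / sizeof(u64)];  /* firmware-written extract offset */
	*active = output_page32[CS_ACTIVE / sizeof(u32)];     /* firmware-written active flag    */
}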
- */ - kbase_csf_ring_doorbell(kbdev, group->doorbell_nr); - - kbase_csf_scheduler_spin_lock(kbdev, &flags); - kbase_csf_firmware_csg_input_mask(ginfo, CSG_REQ, - ~kbase_csf_firmware_csg_output(ginfo, CSG_ACK), - CSG_REQ_STATUS_UPDATE_MASK); - kbase_csf_scheduler_spin_unlock(kbdev, flags); - kbase_csf_ring_csg_doorbell(kbdev, group->csg_nr); - - remaining = wait_event_timeout(kbdev->csf.event_wait, - !((kbase_csf_firmware_csg_input_read(ginfo, CSG_REQ) ^ - kbase_csf_firmware_csg_output(ginfo, CSG_ACK)) & - CSG_REQ_STATUS_UPDATE_MASK), remaining); - - if (!remaining) { - dev_err(kbdev->dev, - "Timed out for STATUS_UPDATE on group %d on slot %d", - group->handle, group->csg_nr); - - seq_printf(file, "*** Warn: Timed out for STATUS_UPDATE on slot %d\n", - group->csg_nr); - seq_puts(file, "*** The following group-record is likely stale\n"); - } -} - static void kbasep_csf_scheduler_dump_active_group(struct seq_file *file, struct kbase_queue_group *const group) { @@ -331,8 +429,6 @@ static void kbasep_csf_scheduler_dump_active_group(struct seq_file *file, u8 slot_priority = kbdev->csf.scheduler.csg_slots[group->csg_nr].priority; - update_active_group_status(file, group); - ep_c = kbase_csf_firmware_csg_output(ginfo, CSG_STATUS_EP_CURRENT); ep_r = kbase_csf_firmware_csg_output(ginfo, CSG_STATUS_EP_REQ); @@ -348,25 +444,25 @@ static void kbasep_csf_scheduler_dump_active_group(struct seq_file *file, CSG_STATUS_STATE_IDLE_MASK) idle = 'Y'; - seq_puts(file, "GroupID, CSG NR, CSG Prio, Run State, Priority, C_EP(Alloc/Req), F_EP(Alloc/Req), T_EP(Alloc/Req), Exclusive, Idle\n"); - seq_printf(file, "%7d, %6d, %8d, %9d, %8d, %11d/%3d, %11d/%3d, %11d/%3d, %9c, %4c\n", - group->handle, - group->csg_nr, - slot_priority, - group->run_state, - group->priority, - CSG_STATUS_EP_CURRENT_COMPUTE_EP_GET(ep_c), - CSG_STATUS_EP_REQ_COMPUTE_EP_GET(ep_r), - CSG_STATUS_EP_CURRENT_FRAGMENT_EP_GET(ep_c), - CSG_STATUS_EP_REQ_FRAGMENT_EP_GET(ep_r), - CSG_STATUS_EP_CURRENT_TILER_EP_GET(ep_c), - CSG_STATUS_EP_REQ_TILER_EP_GET(ep_r), - exclusive, - idle); - - /* Wait for the User doobell ring to take effect */ - if (kbdev->csf.scheduler.state != SCHED_SLEEPING) - msleep(100); + if (!test_bit(group->csg_nr, csg_slots_status_updated)) { + seq_printf(file, "*** Warn: Timed out for STATUS_UPDATE on slot %d\n", + group->csg_nr); + seq_puts(file, "*** The following group-record is likely stale\n"); + } + seq_puts( + file, + "GroupID, CSG NR, CSG Prio, Run State, Priority, C_EP(Alloc/Req), F_EP(Alloc/Req), T_EP(Alloc/Req), Exclusive, Idle\n"); + seq_printf( + file, + "%7d, %6d, %8d, %9d, %8d, %11d/%3d, %11d/%3d, %11d/%3d, %9c, %4c\n", + group->handle, group->csg_nr, slot_priority, group->run_state, + group->priority, CSG_STATUS_EP_CURRENT_COMPUTE_EP_GET(ep_c), + CSG_STATUS_EP_REQ_COMPUTE_EP_GET(ep_r), + CSG_STATUS_EP_CURRENT_FRAGMENT_EP_GET(ep_c), + CSG_STATUS_EP_REQ_FRAGMENT_EP_GET(ep_r), + CSG_STATUS_EP_CURRENT_TILER_EP_GET(ep_c), + CSG_STATUS_EP_REQ_TILER_EP_GET(ep_r), exclusive, idle); + } else { seq_puts(file, "GroupID, CSG NR, Run State, Priority\n"); seq_printf(file, "%7d, %6d, %9d, %8d\n", @@ -404,22 +500,19 @@ static int kbasep_csf_queue_group_debugfs_show(struct seq_file *file, { u32 gr; struct kbase_context *const kctx = file->private; - struct kbase_device *const kbdev = kctx->kbdev; + struct kbase_device *kbdev; if (WARN_ON(!kctx)) return -EINVAL; + kbdev = kctx->kbdev; + seq_printf(file, "MALI_CSF_CSG_DEBUGFS_VERSION: v%u\n", MALI_CSF_CSG_DEBUGFS_VERSION); - mutex_lock(&kctx->csf.lock); + 
rt_mutex_lock(&kctx->csf.lock); kbase_csf_scheduler_lock(kbdev); - if (kbdev->csf.scheduler.state == SCHED_SLEEPING) { - /* Wait for the MCU sleep request to complete. Please refer the - * update_active_group_status() function for the explanation. - */ - kbase_pm_wait_for_desired_state(kbdev); - } + kbase_csf_debugfs_update_active_groups_status(kbdev); for (gr = 0; gr < MAX_QUEUE_GROUP_NUM; gr++) { struct kbase_queue_group *const group = kctx->csf.queue_groups[gr]; @@ -428,7 +521,7 @@ static int kbasep_csf_queue_group_debugfs_show(struct seq_file *file, kbasep_csf_scheduler_dump_active_group(file, group); } kbase_csf_scheduler_unlock(kbdev); - mutex_unlock(&kctx->csf.lock); + rt_mutex_unlock(&kctx->csf.lock); return 0; } @@ -453,12 +546,7 @@ static int kbasep_csf_scheduler_dump_active_groups(struct seq_file *file, MALI_CSF_CSG_DEBUGFS_VERSION); kbase_csf_scheduler_lock(kbdev); - if (kbdev->csf.scheduler.state == SCHED_SLEEPING) { - /* Wait for the MCU sleep request to complete. Please refer the - * update_active_group_status() function for the explanation. - */ - kbase_pm_wait_for_desired_state(kbdev); - } + kbase_csf_debugfs_update_active_groups_status(kbdev); for (csg_nr = 0; csg_nr < num_groups; csg_nr++) { struct kbase_queue_group *const group = kbdev->csf.scheduler.csg_slots[csg_nr].resident_group; @@ -500,11 +588,7 @@ static const struct file_operations kbasep_csf_queue_group_debugfs_fops = { void kbase_csf_queue_group_debugfs_init(struct kbase_context *kctx) { struct dentry *file; -#if (KERNEL_VERSION(4, 7, 0) <= LINUX_VERSION_CODE) const mode_t mode = 0444; -#else - const mode_t mode = 0400; -#endif if (WARN_ON(!kctx || IS_ERR_OR_NULL(kctx->kctx_dentry))) return; @@ -556,14 +640,11 @@ static int kbasep_csf_debugfs_scheduling_timer_kick_set( return 0; } -DEFINE_SIMPLE_ATTRIBUTE(kbasep_csf_debugfs_scheduling_timer_enabled_fops, - &kbasep_csf_debugfs_scheduling_timer_enabled_get, - &kbasep_csf_debugfs_scheduling_timer_enabled_set, - "%llu\n"); -DEFINE_SIMPLE_ATTRIBUTE(kbasep_csf_debugfs_scheduling_timer_kick_fops, - NULL, - &kbasep_csf_debugfs_scheduling_timer_kick_set, - "%llu\n"); +DEFINE_DEBUGFS_ATTRIBUTE(kbasep_csf_debugfs_scheduling_timer_enabled_fops, + &kbasep_csf_debugfs_scheduling_timer_enabled_get, + &kbasep_csf_debugfs_scheduling_timer_enabled_set, "%llu\n"); +DEFINE_DEBUGFS_ATTRIBUTE(kbasep_csf_debugfs_scheduling_timer_kick_fops, NULL, + &kbasep_csf_debugfs_scheduling_timer_kick_set, "%llu\n"); /** * kbase_csf_debugfs_scheduler_state_get() - Get the state of scheduler. @@ -671,7 +752,6 @@ void kbase_csf_debugfs_init(struct kbase_device *kbdev) &kbasep_csf_debugfs_scheduler_state_fops); kbase_csf_tl_reader_debugfs_init(kbdev); - kbase_csf_firmware_trace_buffer_debugfs_init(kbdev); } #else diff --git a/mali_kbase/csf/mali_kbase_csf_csg_debugfs.h b/mali_kbase/csf/mali_kbase_csf_csg_debugfs.h index 397e657..16a548b 100644 --- a/mali_kbase/csf/mali_kbase_csf_csg_debugfs.h +++ b/mali_kbase/csf/mali_kbase_csf_csg_debugfs.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2022 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -44,4 +44,11 @@ void kbase_csf_queue_group_debugfs_init(struct kbase_context *kctx); */ void kbase_csf_debugfs_init(struct kbase_device *kbdev); +/** + * kbase_csf_debugfs_update_active_groups_status() - Update on-slot group statuses + * + * @kbdev: Pointer to the device + */ +void kbase_csf_debugfs_update_active_groups_status(struct kbase_device *kbdev); + #endif /* _KBASE_CSF_CSG_DEBUGFS_H_ */ diff --git a/mali_kbase/csf/mali_kbase_csf_defs.h b/mali_kbase/csf/mali_kbase_csf_defs.h index 07b5874..fdaa10f 100644 --- a/mali_kbase/csf/mali_kbase_csf_defs.h +++ b/mali_kbase/csf/mali_kbase_csf_defs.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2018-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2018-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -30,7 +30,13 @@ #include <linux/wait.h> #include "mali_kbase_csf_firmware.h" +#include "mali_kbase_refcount_defs.h" #include "mali_kbase_csf_event.h" +#include <uapi/gpu/arm/midgard/csf/mali_kbase_csf_errors_dumpfault.h> + +#if IS_ENABLED(CONFIG_MALI_CORESIGHT) +#include <debug/backend/mali_kbase_debug_coresight_internal_csf.h> +#endif /* IS_ENABLED(CONFIG_MALI_CORESIGHT) */ /* Maximum number of KCPU command queues to be created per GPU address space. */ @@ -55,7 +61,7 @@ #define CSF_FIRMWARE_ENTRY_ZERO (1ul << 31) /** - * enum kbase_csf_bind_state - bind state of the queue + * enum kbase_csf_queue_bind_state - bind state of the queue * * @KBASE_CSF_QUEUE_UNBOUND: Set when the queue is registered or when the link * between queue and the group to which it was bound or being bound is removed. @@ -259,16 +265,33 @@ enum kbase_queue_group_priority { * @CSF_PM_TIMEOUT: Timeout for GPU Power Management to reach the desired * Shader, L2 and MCU state. * @CSF_GPU_RESET_TIMEOUT: Waiting timeout for GPU reset to complete. + * @CSF_CSG_SUSPEND_TIMEOUT: Timeout given for a CSG to be suspended. + * @CSF_FIRMWARE_BOOT_TIMEOUT: Maximum time to wait for firmware to boot. + * @CSF_FIRMWARE_PING_TIMEOUT: Maximum time to wait for firmware to respond + * to a ping from KBase. + * @CSF_SCHED_PROTM_PROGRESS_TIMEOUT: Timeout used to prevent protected mode execution hang. + * @MMU_AS_INACTIVE_WAIT_TIMEOUT: Maximum waiting time in ms for the completion + * of a MMU operation. + * @KCPU_FENCE_SIGNAL_TIMEOUT: Waiting time in ms for triggering a KCPU queue sync state dump * @KBASE_TIMEOUT_SELECTOR_COUNT: Number of timeout selectors. Must be last in * the enum. + * @KBASE_DEFAULT_TIMEOUT: Default timeout used when an invalid selector is passed + * to the pre-computed timeout getter. */ enum kbase_timeout_selector { CSF_FIRMWARE_TIMEOUT, CSF_PM_TIMEOUT, CSF_GPU_RESET_TIMEOUT, + CSF_CSG_SUSPEND_TIMEOUT, + CSF_FIRMWARE_BOOT_TIMEOUT, + CSF_FIRMWARE_PING_TIMEOUT, + CSF_SCHED_PROTM_PROGRESS_TIMEOUT, + MMU_AS_INACTIVE_WAIT_TIMEOUT, + KCPU_FENCE_SIGNAL_TIMEOUT, /* Must be the last in the enum */ - KBASE_TIMEOUT_SELECTOR_COUNT + KBASE_TIMEOUT_SELECTOR_COUNT, + KBASE_DEFAULT_TIMEOUT = CSF_FIRMWARE_TIMEOUT }; /** @@ -288,9 +311,9 @@ struct kbase_csf_notification { * * @kctx: Pointer to the base context with which this GPU command queue * is associated. 
- * @reg: Pointer to the region allocated from the shared - * interface segment for mapping the User mode - * input/output pages in MCU firmware address space. + * @user_io_gpu_va: The start GPU VA address of this queue's userio pages. Only + * valid (i.e. not 0 ) when the queue is enabled and its owner + * group has a runtime bound csg_reg (group region). * @phys: Pointer to the physical pages allocated for the * pair or User mode input/output page * @user_io_addr: Pointer to the permanent kernel mapping of User mode @@ -306,6 +329,14 @@ struct kbase_csf_notification { * It is in page units. * @link: Link to the linked list of GPU command queues created per * GPU address space. + * @pending_kick: Indicates whether there is a pending kick to be handled. + * @pending_kick_link: Link to the linked list of GPU command queues that have + * been kicked, but the kick has not yet been processed. + * This link would be deleted right before the kick is + * handled to allow for future kicks to occur in the mean + * time. For this reason, this must not be used to check + * for the presence of a pending queue kick. @pending_kick + * should be used instead. * @refcount: Reference count, stands for the number of times the queue * has been referenced. The reference is taken when it is * created, when it is bound to the group and also when the @@ -318,6 +349,7 @@ struct kbase_csf_notification { * @base_addr: Base address of the CS buffer. * @size: Size of the CS buffer. * @priority: Priority of this queue within the group. + * @group_priority: Priority of the group to which this queue has been bound. * @bind_state: Bind state of the queue as enum @kbase_csf_queue_bind_state * @csi_index: The ID of the assigned CS hardware interface. * @enabled: Indicating whether the CS is running, or not. @@ -345,15 +377,18 @@ struct kbase_csf_notification { * @trace_offset_ptr: Pointer to the CS trace buffer offset variable. * @trace_buffer_size: CS trace buffer size for the queue. * @trace_cfg: CS trace configuration parameters. - * @error: GPU command queue fatal information to pass to user space. - * @fatal_event_work: Work item to handle the CS fatal event reported for this - * queue. - * @cs_fatal_info: Records additional information about the CS fatal event. - * @cs_fatal: Records information about the CS fatal event. - * @pending: Indicating whether the queue has new submitted work. - * @extract_ofs: The current EXTRACT offset, this is updated during certain - * events such as GPU idle IRQ in order to help detect a - * queue's true idle status. + * @cs_error_work: Work item to handle the CS fatal event reported for this + * queue or the CS fault event if dump on fault is enabled + * and acknowledgment for CS fault event needs to be done + * after dumping is complete. + * @cs_error_info: Records additional information about the CS fatal event or + * about CS fault event if dump on fault is enabled. + * @cs_error: Records information about the CS fatal event or + * about CS fault event if dump on fault is enabled. + * @cs_error_fatal: Flag to track if the CS fault or CS fatal event occurred. + * @extract_ofs: The current EXTRACT offset, this is only updated when handling + * the GLB IDLE IRQ if the idle timeout value is non-0 in order + * to help detect a queue's true idle status. * @saved_cmd_ptr: The command pointer value for the GPU queue, saved when the * group to which queue is bound is suspended. 
* This can be useful in certain cases to know that till which @@ -361,20 +396,23 @@ struct kbase_csf_notification { */ struct kbase_queue { struct kbase_context *kctx; - struct kbase_va_region *reg; + u64 user_io_gpu_va; struct tagged_addr phys[2]; - char *user_io_addr; + u64 *user_io_addr; u64 handle; int doorbell_nr; unsigned long db_file_offset; struct list_head link; - atomic_t refcount; + atomic_t pending_kick; + struct list_head pending_kick_link; + kbase_refcount_t refcount; struct kbase_queue_group *group; struct kbase_va_region *queue_reg; struct work_struct oom_event_work; u64 base_addr; u32 size; u8 priority; + u8 group_priority; s8 csi_index; enum kbase_csf_queue_bind_state bind_state; bool enabled; @@ -387,40 +425,46 @@ struct kbase_queue { u64 trace_offset_ptr; u32 trace_buffer_size; u32 trace_cfg; - struct kbase_csf_notification error; - struct work_struct fatal_event_work; - u64 cs_fatal_info; - u32 cs_fatal; - atomic_t pending; + struct work_struct cs_error_work; + u64 cs_error_info; + u32 cs_error; + bool cs_error_fatal; u64 extract_ofs; #if IS_ENABLED(CONFIG_DEBUG_FS) u64 saved_cmd_ptr; -#endif +#endif /* CONFIG_DEBUG_FS */ }; /** * struct kbase_normal_suspend_buffer - Object representing a normal * suspend buffer for queue group. - * @reg: Memory region allocated for the normal-mode suspend buffer. + * @gpu_va: The start GPU VA address of the bound suspend buffer. Note, this + * field is only valid when the owner group has a region bound at + * runtime. * @phy: Array of physical memory pages allocated for the normal- * mode suspend buffer. */ struct kbase_normal_suspend_buffer { - struct kbase_va_region *reg; + u64 gpu_va; struct tagged_addr *phy; }; /** * struct kbase_protected_suspend_buffer - Object representing a protected * suspend buffer for queue group. - * @reg: Memory region allocated for the protected-mode suspend buffer. + * @gpu_va: The start GPU VA address of the bound protected mode suspend buffer. + * Note, this field is only valid when the owner group has a region + * bound at runtime. * @pma: Array of pointer to protected mode allocations containing * information about memory pages allocated for protected mode * suspend buffer. + * @alloc_retries: Number of times we retried allocing physical pages + * for protected suspend buffers. */ struct kbase_protected_suspend_buffer { - struct kbase_va_region *reg; + u64 gpu_va; struct protected_memory_allocation **pma; + u8 alloc_retries; }; /** @@ -446,6 +490,7 @@ struct kbase_protected_suspend_buffer { * allowed to use. * @compute_max: Maximum number of compute endpoints the group is * allowed to use. + * @csi_handlers: Requested CSI exception handler flags for the group. * @tiler_mask: Mask of tiler endpoints the group is allowed to use. * @fragment_mask: Mask of fragment endpoints the group is allowed to use. * @compute_mask: Mask of compute endpoints the group is allowed to use. @@ -467,6 +512,12 @@ struct kbase_protected_suspend_buffer { * @faulted: Indicates that a GPU fault occurred for the queue group. * This flag persists until the fault has been queued to be * reported to userspace. + * @cs_unrecoverable: Flag to unblock the thread waiting for CSG termination in + * case of CS_FATAL_EXCEPTION_TYPE_CS_UNRECOVERABLE + * @reevaluate_idle_status : Flag set when work is submitted for the normal group + * or it becomes unblocked during protected mode. The + * flag helps Scheduler confirm if the group actually + * became non idle or not. * @bound_queues: Array of registered queues bound to this queue group. 
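The @pending_kick flag and @pending_kick_link added to struct kbase_queue above follow a pattern worth spelling out: a kicked queue goes onto a device-wide list, but its link is detached before the kick is processed, so only the flag can be trusted to answer "is a kick outstanding". A single-threaded sketch of that bookkeeping (the real driver serialises it with a spinlock and an atomic; every name here is illustrative):

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct gpu_queue {
	const char *name;
	bool pending_kick;           /* source of truth for "kick outstanding" */
	struct gpu_queue *kick_next; /* link on the kicked list, detached early */
};

static struct gpu_queue *kick_list_head;

static void queue_kick(struct gpu_queue *q)
{
	if (q->pending_kick)
		return;              /* already queued, nothing to do */
	q->pending_kick = true;
	q->kick_next = kick_list_head;
	kick_list_head = q;
}

static void process_pending_kicks(void)
{
	while (kick_list_head) {
		struct gpu_queue *q = kick_list_head;

		/* Detach *before* handling so a new kick can be queued in the
		 * meantime; this is why list membership must not be used to
		 * test whether a kick is pending.
		 */
		kick_list_head = q->kick_next;
		q->kick_next = NULL;

		printf("handling kick for %s\n", q->name);
		q->pending_kick = false;
	}
}

int main(void)
{
	struct gpu_queue a = { .name = "q0" }, b = { .name = "q1" };

	queue_kick(&a);
	queue_kick(&b);
	queue_kick(&a);              /* ignored: still pending */
	process_pending_kicks();
	return 0;
}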
* @doorbell_nr: Index of the hardware doorbell page assigned to the * group. @@ -476,12 +527,18 @@ struct kbase_protected_suspend_buffer { * have pending protected mode entry requests. * @error_fatal: An error of type BASE_GPU_QUEUE_GROUP_ERROR_FATAL to be * returned to userspace if such an error has occurred. - * @error_timeout: An error of type BASE_GPU_QUEUE_GROUP_ERROR_TIMEOUT - * to be returned to userspace if such an error has occurred. - * @error_tiler_oom: An error of type BASE_GPU_QUEUE_GROUP_ERROR_TILER_HEAP_OOM - * to be returned to userspace if such an error has occurred. * @timer_event_work: Work item to handle the progress timeout fatal event * for the group. + * @deschedule_deferred_cnt: Counter keeping a track of the number of threads + * that tried to deschedule the group and had to defer + * the descheduling due to the dump on fault. + * @csg_reg: An opaque pointer to the runtime bound shared regions. It is + * dynamically managed by the scheduler and can be NULL if the + * group is off-slot. + * @csg_reg_bind_retries: Runtime MCU shared region map operation attempted counts. + * It is accumulated on consecutive mapping attempt failures. On + * reaching a preset limit, the group is regarded as suffered + * a fatal error and triggers a fatal error notification. */ struct kbase_queue_group { struct kbase_context *kctx; @@ -494,6 +551,8 @@ struct kbase_queue_group { u8 tiler_max; u8 fragment_max; u8 compute_max; + u8 csi_handlers; + u64 tiler_mask; u64 fragment_mask; @@ -507,19 +566,36 @@ struct kbase_queue_group { u32 prepared_seq_num; u32 scan_seq_num; bool faulted; + bool cs_unrecoverable; + bool reevaluate_idle_status; struct kbase_queue *bound_queues[MAX_SUPPORTED_STREAMS_PER_GROUP]; int doorbell_nr; - struct work_struct protm_event_work; + struct kthread_work protm_event_work; DECLARE_BITMAP(protm_pending_bitmap, MAX_SUPPORTED_STREAMS_PER_GROUP); struct kbase_csf_notification error_fatal; - struct kbase_csf_notification error_timeout; - struct kbase_csf_notification error_tiler_oom; struct work_struct timer_event_work; + /** + * @dvs_buf: Address and size of scratch memory. + * + * Used to store intermediate DVS data by the GPU. + */ + u64 dvs_buf; +#if IS_ENABLED(CONFIG_DEBUG_FS) + u32 deschedule_deferred_cnt; +#endif + void *csg_reg; + u8 csg_reg_bind_retries; +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) + /** + * @prev_act: Previous CSG activity transition in a GPU metrics. + */ + bool prev_act; +#endif }; /** @@ -529,10 +605,10 @@ struct kbase_queue_group { * @lock: Lock preventing concurrent access to @array and the @in_use bitmap. * @array: Array of pointers to kernel CPU command queues. * @in_use: Bitmap which indicates which kernel CPU command queues are in use. - * @wq: Dedicated workqueue for processing kernel CPU command queues. - * @num_cmds: The number of commands that have been enqueued across - * all the KCPU command queues. This could be used as a - * timestamp to determine the command's enqueueing time. + * @cmd_seq_num: The sequence number assigned to an enqueued command, + * in incrementing order (older commands shall have a + * smaller number). + * @jit_lock: Lock to serialise JIT operations. * @jit_cmds_head: A list of the just-in-time memory commands, both * allocate & free, in submission order, protected * by kbase_csf_kcpu_queue_context.lock. 
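The @cmd_seq_num counter described above replaces the old running command count: every enqueued KCPU command is stamped with a monotonically increasing number so relative age can be compared without a timestamp. A hedged sketch with C11 atomics; the structure and function names are invented for illustration:

#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

struct kcpu_ctx {
	atomic_uint_fast64_t cmd_seq_num; /* next sequence number to hand out */
};

struct kcpu_cmd {
	uint64_t seq; /* an older command always carries a smaller value */
};

static void enqueue_cmd(struct kcpu_ctx *ctx, struct kcpu_cmd *cmd)
{
	cmd->seq = atomic_fetch_add(&ctx->cmd_seq_num, 1);
}

int main(void)
{
	struct kcpu_ctx ctx;
	struct kcpu_cmd first, second;

	atomic_init(&ctx.cmd_seq_num, 0);
	enqueue_cmd(&ctx, &first);
	enqueue_cmd(&ctx, &second);
	printf("first=%llu second=%llu\n",
	       (unsigned long long)first.seq,
	       (unsigned long long)second.seq);
	return 0;
}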
@@ -545,9 +621,9 @@ struct kbase_csf_kcpu_queue_context { struct mutex lock; struct kbase_kcpu_command_queue *array[KBASEP_MAX_KCPU_QUEUES]; DECLARE_BITMAP(in_use, KBASEP_MAX_KCPU_QUEUES); - struct workqueue_struct *wq; - u64 num_cmds; + atomic64_t cmd_seq_num; + struct mutex jit_lock; struct list_head jit_cmds_head; struct list_head jit_blocked_queues; }; @@ -581,6 +657,8 @@ struct kbase_csf_cpu_queue_context { * @lock: Lock preventing concurrent access to the @in_use bitmap. * @in_use: Bitmap that indicates which heap context structures are currently * allocated (in @region). + * @heap_context_size_aligned: Size of a heap context structure, in bytes, + * aligned to GPU cacheline size. * * Heap context structures are allocated by the kernel for use by the firmware. * The current implementation subdivides a single GPU memory region for use as @@ -592,6 +670,7 @@ struct kbase_csf_heap_context_allocator { u64 gpu_va; struct mutex lock; DECLARE_BITMAP(in_use, MAX_TILER_HEAPS); + u32 heap_context_size_aligned; }; /** @@ -618,6 +697,28 @@ struct kbase_csf_tiler_heap_context { }; /** + * struct kbase_csf_ctx_heap_reclaim_info - Object representing the data section of + * a kctx for tiler heap reclaim manger + * @mgr_link: Link for hooking up to the heap reclaim manger's kctx lists + * @nr_freed_pages: Number of freed pages from the the kctx, after its attachment + * to the reclaim manager. This is used for tracking reclaim's + * free operation progress. + * @nr_est_unused_pages: Estimated number of pages that could be freed for the kctx + * when all its CSGs are off-slot, on attaching to the reclaim + * manager. + * @on_slot_grps: Number of on-slot groups from this kctx. In principle, if a + * kctx has groups on-slot, the scheduler will detach it from + * the tiler heap reclaim manager, i.e. no tiler heap memory + * reclaiming operations on the kctx. + */ +struct kbase_csf_ctx_heap_reclaim_info { + struct list_head mgr_link; + u32 nr_freed_pages; + u32 nr_est_unused_pages; + u8 on_slot_grps; +}; + +/** * struct kbase_csf_scheduler_context - Object representing the scheduler's * context for a GPU address space. * @@ -629,7 +730,7 @@ struct kbase_csf_tiler_heap_context { * GPU command queues are idle and at least one of them * is blocked on a sync wait operation. * @num_idle_wait_grps: Length of the @idle_wait_groups list. - * @sync_update_wq: Dedicated workqueue to process work items corresponding + * @sync_update_worker: Dedicated workqueue to process work items corresponding * to the sync_update events by sync_set/sync_add * instruction execution on CSs bound to groups * of @idle_wait_groups list. @@ -638,15 +739,20 @@ struct kbase_csf_tiler_heap_context { * streams bound to groups of @idle_wait_groups list. * @ngrp_to_schedule: Number of groups added for the context to the * 'groups_to_schedule' list of scheduler instance. + * @heap_info: Heap reclaim information data of the kctx. As the + * reclaim action needs to be coordinated with the scheduler + * operations, any manipulations on the data needs holding + * the scheduler's mutex lock. 
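The @heap_context_size_aligned field added to the heap context allocator earlier in this hunk caches the per-context stride (the context size rounded up to the GPU cache line) used when sub-dividing the single backing region into slots. A small sketch of that slot arithmetic, assuming a 64-byte cache line and a 32-bit in-use bitmap purely for illustration:

#include <stdint.h>
#include <stdio.h>

#define GPU_CACHELINE_SIZE 64u   /* assumption for illustration only */
#define MAX_TILER_HEAPS    32u

struct heap_ctx_allocator {
	uint64_t region_gpu_va;    /* start of the backing region */
	uint32_t ctx_size_aligned; /* per-context stride */
	uint32_t in_use;           /* one bit per heap context slot */
};

static uint32_t align_up(uint32_t x, uint32_t a)
{
	return (x + a - 1) & ~(a - 1);
}

static uint64_t heap_ctx_alloc(struct heap_ctx_allocator *hca)
{
	for (uint32_t idx = 0; idx < MAX_TILER_HEAPS; idx++) {
		if (!(hca->in_use & (1u << idx))) {
			hca->in_use |= 1u << idx;
			return hca->region_gpu_va +
			       (uint64_t)idx * hca->ctx_size_aligned;
		}
	}
	return 0; /* no free slot */
}

int main(void)
{
	struct heap_ctx_allocator hca = {
		.region_gpu_va = 0x100000,
		.ctx_size_aligned = align_up(40, GPU_CACHELINE_SIZE),
	};

	printf("stride=%u first=0x%llx second=0x%llx\n",
	       (unsigned)hca.ctx_size_aligned,
	       (unsigned long long)heap_ctx_alloc(&hca),
	       (unsigned long long)heap_ctx_alloc(&hca));
	return 0;
}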
*/ struct kbase_csf_scheduler_context { struct list_head runnable_groups[KBASE_QUEUE_GROUP_PRIORITY_COUNT]; u32 num_runnable_grps; struct list_head idle_wait_groups; u32 num_idle_wait_grps; - struct workqueue_struct *sync_update_wq; - struct work_struct sync_update_work; + struct kthread_worker sync_update_worker; + struct kthread_work sync_update_work; u32 ngrp_to_schedule; + struct kbase_csf_ctx_heap_reclaim_info heap_info; }; /** @@ -687,6 +793,23 @@ struct kbase_csf_event { }; /** + * struct kbase_csf_user_reg_context - Object containing members to manage the mapping + * of USER Register page for a context. + * + * @vma: Pointer to the VMA corresponding to the virtual mapping + * of the USER register page. + * @file_offset: File offset value that is assigned to userspace mapping + * of the USER Register page. It is in page units. + * @link: Links the context to the device list when mapping is pointing to + * either the dummy or the real Register page. + */ +struct kbase_csf_user_reg_context { + struct vm_area_struct *vma; + u32 file_offset; + struct list_head link; +}; + +/** * struct kbase_csf_context - Object representing CSF for a GPU address space. * * @event_pages_head: A list of pages allocated for the event memory used by @@ -724,20 +847,18 @@ struct kbase_csf_event { * used by GPU command queues, and progress timeout events. * @link: Link to this csf context in the 'runnable_kctxs' list of * the scheduler instance - * @user_reg_vma: Pointer to the vma corresponding to the virtual mapping - * of the USER register page. Currently used only for sanity - * checking. * @sched: Object representing the scheduler's context - * @pending_submission_work: Work item to process pending kicked GPU command queues. + * @protm_event_worker: Worker to process requests to enter protected mode. * @cpu_queue: CPU queue information. Only be available when DEBUG_FS * is enabled. + * @user_reg: Collective information to support mapping to USER Register page. */ struct kbase_csf_context { struct list_head event_pages_head; DECLARE_BITMAP(cookies, KBASE_CSF_NUM_USER_IO_PAGES_HANDLE); struct kbase_queue *user_pages_info[ KBASE_CSF_NUM_USER_IO_PAGES_HANDLE]; - struct mutex lock; + struct rt_mutex lock; struct kbase_queue_group *queue_groups[MAX_QUEUE_GROUP_NUM]; struct list_head queue_list; struct kbase_csf_kcpu_queue_context kcpu_queues; @@ -745,12 +866,12 @@ struct kbase_csf_context { struct kbase_csf_tiler_heap_context tiler_heaps; struct workqueue_struct *wq; struct list_head link; - struct vm_area_struct *user_reg_vma; struct kbase_csf_scheduler_context sched; - struct work_struct pending_submission_work; + struct kthread_worker protm_event_worker; #if IS_ENABLED(CONFIG_DEBUG_FS) struct kbase_csf_cpu_queue_context cpu_queue; #endif + struct kbase_csf_user_reg_context user_reg; }; /** @@ -765,6 +886,7 @@ struct kbase_csf_context { * mechanism to check for deadlocks involving reset waits. * @state: Tracks if the GPU reset is in progress or not. * The state is represented by enum @kbase_csf_reset_gpu_state. + * @force_pm_hw_reset: pixel: Powercycle the GPU instead of attempting a soft/hard reset. */ struct kbase_csf_reset_gpu { struct workqueue_struct *workq; @@ -772,6 +894,7 @@ struct kbase_csf_reset_gpu { wait_queue_head_t wait; struct rw_semaphore sem; atomic_t state; + bool force_pm_hw_reset; }; /** @@ -790,6 +913,49 @@ struct kbase_csf_csg_slot { }; /** + * struct kbase_csf_sched_heap_reclaim_mgr - Object for managing tiler heap reclaim + * kctx lists inside the CSF device's scheduler. 
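struct kbase_csf_ctx_heap_reclaim_info from the previous hunk and the reclaim manager documented here feed a memory shrinker: each context estimates how many tiler heap pages it could free once all of its groups are off-slot, and the manager keeps the aggregate count the shrinker would report. A condensed model of that accounting only (no shrinker registration, invented names):

#include <stdio.h>

struct ctx_heap_reclaim_info {
	unsigned int nr_est_unused_pages; /* estimate while all CSGs are off-slot */
	unsigned int on_slot_grps;        /* non-zero: not eligible for reclaim */
};

struct heap_reclaim_mgr {
	unsigned long unused_pages;       /* what the shrinker's count() reports */
};

static void mgr_attach_ctx(struct heap_reclaim_mgr *mgr,
			   const struct ctx_heap_reclaim_info *info)
{
	/* Only contexts with every group off-slot contribute to the estimate. */
	if (info->on_slot_grps == 0)
		mgr->unused_pages += info->nr_est_unused_pages;
}

int main(void)
{
	struct heap_reclaim_mgr mgr = { 0 };
	struct ctx_heap_reclaim_info idle_ctx = { .nr_est_unused_pages = 64 };
	struct ctx_heap_reclaim_info busy_ctx = { .nr_est_unused_pages = 32,
						  .on_slot_grps = 1 };

	mgr_attach_ctx(&mgr, &idle_ctx);
	mgr_attach_ctx(&mgr, &busy_ctx); /* ignored: still has on-slot groups */
	printf("reclaimable pages: %lu\n", mgr.unused_pages);
	return 0;
}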
+ * + * @heap_reclaim: Tiler heap reclaim shrinker object. + * @ctx_lists: Array of kctx lists, size matching CSG defined priorities. The + * lists track the kctxs attached to the reclaim manager. + * @unused_pages: Estimated number of unused pages from the @ctxlist array. The + * number is indicative for use with reclaim shrinker's count method. + */ +struct kbase_csf_sched_heap_reclaim_mgr { + struct shrinker heap_reclaim; + struct list_head ctx_lists[KBASE_QUEUE_GROUP_PRIORITY_COUNT]; + atomic_t unused_pages; +}; + +/** + * struct kbase_csf_mcu_shared_regions - Control data for managing the MCU shared + * interface segment regions for scheduler + * operations + * + * @array_csg_regs: Base pointer of an internally created array_csg_regs[]. + * @unused_csg_regs: List contains unused csg_regs items. When an item is bound to a + * group that is placed onto on-slot by the scheduler, it is dropped + * from the list (i.e busy active). The Scheduler will put an active + * item back when it's becoming off-slot (not in use). + * @dummy_phys: An array of dummy phys[nr_susp_pages] pages for use with normal + * and pmode suspend buffers, as a default replacement of a CSG's pages + * for the MMU mapping when the csg_reg is not bound to a group. + * @pma_phys: Pre-allocated array phy[nr_susp_pages] for transitional use with + * protected suspend buffer MMU map operations. + * @userio_mem_rd_flags: Userio input page's read access mapping configuration flags. + * @dummy_phys_allocated: Indicating the @p dummy_phy page is allocated when true. + */ +struct kbase_csf_mcu_shared_regions { + void *array_csg_regs; + struct list_head unused_csg_regs; + struct tagged_addr *dummy_phys; + struct tagged_addr *pma_phys; + unsigned long userio_mem_rd_flags; + bool dummy_phys_allocated; +}; + +/** * struct kbase_csf_scheduler - Object representing the scheduler used for * CSF for an instance of GPU platform device. * @lock: Lock to serialize the scheduler operations and @@ -848,19 +1014,19 @@ struct kbase_csf_csg_slot { * "tock" schedule operation concluded. Used for * evaluating the exclusion window for in-cycle * schedule operation. + * @csf_worker: Dedicated kthread_worker to execute the @tick_work. * @timer_enabled: Whether the CSF scheduler wakes itself up for * periodic scheduling tasks. If this value is 0 * then it will only perform scheduling under the * influence of external factors e.g., IRQs, IOCTLs. - * @wq: Dedicated workqueue to execute the @tick_work. * @tick_timer: High-resolution timer employed to schedule tick * workqueue items (kernel-provided delayed_work * items do not use hrtimer and for some reason do * not provide sufficiently reliable periodicity). - * @tick_work: Work item that performs the "schedule on tick" - * operation to implement timeslice-based scheduling. - * @tock_work: Work item that would perform the schedule on tock - * operation to implement the asynchronous scheduling. + * @pending_tick_work: Indicates that kbase_csf_scheduler_kthread() should perform + * a scheduling tick. + * @pending_tock_work: Indicates that kbase_csf_scheduler_kthread() should perform + * a scheduling tock. * @ping_work: Work item that would ping the firmware at regular * intervals, only if there is a single active CSG * slot, to check if firmware is alive and would @@ -870,8 +1036,6 @@ struct kbase_csf_csg_slot { * @top_grp. * @top_grp: Pointer to queue group inside @groups_to_schedule * list that was assigned the highest slot priority. 
- * @tock_pending_request: A "tock" request is pending: a group that is not - * currently on the GPU demands to be scheduled. * @active_protm_grp: Indicates if firmware has been permitted to let GPU * enter protected mode with the given group. On exit * from protected mode the pointer is reset to NULL. @@ -884,6 +1048,13 @@ struct kbase_csf_csg_slot { * handler. * @gpu_idle_work: Work item for facilitating the scheduler to bring * the GPU to a low-power mode on becoming idle. + * @fast_gpu_idle_handling: Indicates whether to relax many of the checks + * normally done in the GPU idle worker. This is + * set to true when handling the GLB IDLE IRQ if the + * idle hysteresis timeout is 0, since it makes it + * possible to receive this IRQ before the extract + * offset is published (which would cause more + * extensive GPU idle checks to fail). * @gpu_no_longer_idle: Effective only when the GPU idle worker has been * queued for execution, this indicates whether the * GPU has become non-idle since the last time the @@ -901,22 +1072,41 @@ struct kbase_csf_csg_slot { * after GPU and L2 cache have been powered up. So when * this count is zero, MCU will not be powered up. * @csg_scheduling_period_ms: Duration of Scheduling tick in milliseconds. - * @tick_timer_active: Indicates whether the @tick_timer is effectively - * active or not, as the callback function of - * @tick_timer will enqueue @tick_work only if this - * flag is true. This is mainly useful for the case - * when scheduling tick needs to be advanced from - * interrupt context, without actually deactivating - * the @tick_timer first and then enqueing @tick_work. * @tick_protm_pending_seq: Scan out sequence number of the group that has * protected mode execution pending for the queue(s) * bound to it and will be considered first for the * protected mode execution compared to other such * groups. It is updated on every tick/tock. * @interrupt_lock is used to serialize the access. + * @sc_rails_off_work: Work item enqueued on GPU idle notification to + * turn off the shader core power rails. + * @sc_power_rails_off: Flag to keep a track of the status of shader core + * power rails, set to true when power rails are + * turned off. + * @gpu_idle_work_pending: Flag to indicate that the power down of GPU is + * pending and it is set after turning off the + * shader core power rails. The power down is skipped + * if the flag is cleared. @lock is used to serialize + * the access. Scheduling actions are skipped whilst + * this flag is set. + * @gpu_idle_fw_timer_enabled: Flag to keep a track if GPU idle event reporting + * is disabled on FW side. It is set for the power + * policy where the power managment of shader cores + * needs to be done by the Host. + * @protm_enter_time: GPU protected mode enter time. + * @reclaim_mgr: CSGs tiler heap manager object. + * @mcu_regs_data: Scheduler MCU shared regions data for managing the + * shared interface mappings for on-slot queues and + * CSG suspend buffers. + * @kthread_signal: Used to wake up the GPU queue submission + * thread when a queue needs attention. + * @kthread_running: Whether the GPU queue submission thread should keep + * executing. + * @gpuq_kthread: High-priority thread used to handle GPU queue + * submissions. 
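The scheduler fields listed above trade per-event work items for one long-lived thread: @pending_tick_work and @pending_tock_work are flags the thread polls, @kthread_signal wakes it, and @kthread_running tells it when to stop. A pthread-based approximation of that loop, using a condition variable where the kernel code uses a completion; all identifiers are illustrative:

#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t sig_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  sig_cond = PTHREAD_COND_INITIALIZER;
static atomic_bool pending_tick;
static atomic_bool pending_tock;
static atomic_bool kthread_running;

static void scheduler_wake(void)
{
	pthread_mutex_lock(&sig_lock);
	pthread_cond_signal(&sig_cond);
	pthread_mutex_unlock(&sig_lock);
}

static void *scheduler_kthread(void *arg)
{
	(void)arg;
	while (atomic_load(&kthread_running)) {
		pthread_mutex_lock(&sig_lock);
		while (atomic_load(&kthread_running) &&
		       !atomic_load(&pending_tick) &&
		       !atomic_load(&pending_tock))
			pthread_cond_wait(&sig_cond, &sig_lock);
		pthread_mutex_unlock(&sig_lock);

		if (atomic_exchange(&pending_tick, false))
			puts("scheduling tick");
		if (atomic_exchange(&pending_tock, false))
			puts("scheduling tock");
	}
	return NULL;
}

int main(void)
{
	pthread_t th;

	atomic_store(&kthread_running, true);
	pthread_create(&th, NULL, scheduler_kthread, NULL);

	atomic_store(&pending_tick, true);
	scheduler_wake();

	atomic_store(&kthread_running, false);
	scheduler_wake();
	pthread_join(th, NULL);
	return 0;
}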
*/ struct kbase_csf_scheduler { - struct mutex lock; + struct rt_mutex lock; spinlock_t interrupt_lock; enum kbase_csf_scheduler_state state; DECLARE_BITMAP(doorbell_inuse_bitmap, CSF_NUM_DOORBELL); @@ -935,25 +1125,46 @@ struct kbase_csf_scheduler { DECLARE_BITMAP(csg_slots_idle_mask, MAX_SUPPORTED_CSGS); DECLARE_BITMAP(csg_slots_prio_update, MAX_SUPPORTED_CSGS); unsigned long last_schedule; - bool timer_enabled; - struct workqueue_struct *wq; + struct kthread_worker csf_worker; + atomic_t timer_enabled; struct hrtimer tick_timer; - struct work_struct tick_work; - struct delayed_work tock_work; + atomic_t pending_tick_work; + atomic_t pending_tock_work; struct delayed_work ping_work; struct kbase_context *top_ctx; struct kbase_queue_group *top_grp; - bool tock_pending_request; struct kbase_queue_group *active_protm_grp; - struct workqueue_struct *idle_wq; +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + struct delayed_work gpu_idle_work; +#else struct work_struct gpu_idle_work; +#endif + struct workqueue_struct *idle_wq; + bool fast_gpu_idle_handling; atomic_t gpu_no_longer_idle; atomic_t non_idle_offslot_grps; u32 non_idle_scanout_grps; u32 pm_active_count; unsigned int csg_scheduling_period_ms; - bool tick_timer_active; u32 tick_protm_pending_seq; +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + struct work_struct sc_rails_off_work; + bool sc_power_rails_off; + bool gpu_idle_work_pending; + bool gpu_idle_fw_timer_enabled; +#endif + ktime_t protm_enter_time; + struct kbase_csf_sched_heap_reclaim_mgr reclaim_mgr; + struct kbase_csf_mcu_shared_regions mcu_regs_data; + struct completion kthread_signal; + bool kthread_running; + struct task_struct *gpuq_kthread; +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) + /** + * @gpu_metrics_tb: Handler of firmware trace buffer for gpu_metrics + */ + struct firmware_trace_buffer *gpu_metrics_tb; +#endif /* CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD */ }; /* @@ -970,9 +1181,9 @@ struct kbase_csf_scheduler { GLB_PROGRESS_TIMER_TIMEOUT_SCALE) /* - * Default GLB_PWROFF_TIMER_TIMEOUT value in unit of micro-seconds. + * Default GLB_PWROFF_TIMER_TIMEOUT value in unit of nanosecond. */ -#define DEFAULT_GLB_PWROFF_TIMEOUT_US (800) +#define DEFAULT_GLB_PWROFF_TIMEOUT_NS (800 * 1000) /* * In typical operations, the management of the shader core power transitions @@ -1140,6 +1351,7 @@ struct kbase_ipa_control { * @flags: bitmask of CSF_FIRMWARE_ENTRY_* conveying the interface attributes * @data_start: Offset into firmware image at which the interface data starts * @data_end: Offset into firmware image at which the interface data ends + * @virtual_exe_start: Starting GPU execution virtual address of this interface * @kernel_map: A kernel mapping of the memory or NULL if not required to be * mapped in the kernel * @pma: Array of pointers to protected memory allocations. @@ -1156,6 +1368,7 @@ struct kbase_csf_firmware_interface { u32 flags; u32 data_start; u32 data_end; + u32 virtual_exe_start; void *kernel_map; struct protected_memory_allocation **pma; }; @@ -1174,6 +1387,144 @@ struct kbase_csf_hwcnt { bool enable_pending; }; +/* + * struct kbase_csf_mcu_fw - Object containing device loaded MCU firmware data. + * + * @size: Loaded firmware data size. Meaningful only when the + * other field @p data is not NULL. + * @data: Pointer to the device retained firmware data. If NULL + * means not loaded yet or error in loading stage. + */ +struct kbase_csf_mcu_fw { + size_t size; + u8 *data; +}; + +/* + * Firmware log polling period. 
+ */ +#define KBASE_CSF_FIRMWARE_LOG_POLL_PERIOD_MS_DEFAULT 25 + +/** + * enum kbase_csf_firmware_log_mode - Firmware log operating mode + * + * @KBASE_CSF_FIRMWARE_LOG_MODE_MANUAL: Manual mode, firmware log can be read + * manually by the userspace (and it will also be dumped automatically into + * dmesg on GPU reset). + * + * @KBASE_CSF_FIRMWARE_LOG_MODE_AUTO_PRINT: Automatic printing mode, firmware log + * will be periodically emptied into dmesg, manual reading through debugfs is + * disabled. + * + * @KBASE_CSF_FIRMWARE_LOG_MODE_AUTO_DISCARD: Automatic discarding mode, firmware + * log will be periodically discarded, the remaining log can be read manually by + * the userspace (and it will also be dumped automatically into dmesg on GPU + * reset). + */ +enum kbase_csf_firmware_log_mode { + KBASE_CSF_FIRMWARE_LOG_MODE_MANUAL, + KBASE_CSF_FIRMWARE_LOG_MODE_AUTO_PRINT, + KBASE_CSF_FIRMWARE_LOG_MODE_AUTO_DISCARD +}; + +/** + * struct kbase_csf_firmware_log - Object containing members for handling firmware log. + * + * @mode: Firmware log operating mode. + * @busy: Indicating whether a firmware log operation is in progress. + * @poll_work: Work item that would poll firmware log buffer + * at regular intervals to perform any periodic + * activities required by current log mode. + * @dump_buf: Buffer used for dumping the log. + * @func_call_list_va_start: Virtual address of the start of the call list of FW log functions. + * @func_call_list_va_end: Virtual address of the end of the call list of FW log functions. + * @poll_period_ms: Firmware log polling period in milliseconds. + */ +struct kbase_csf_firmware_log { + enum kbase_csf_firmware_log_mode mode; + atomic_t busy; + struct delayed_work poll_work; + u8 *dump_buf; + u32 func_call_list_va_start; + u32 func_call_list_va_end; + atomic_t poll_period_ms; +}; + +/** + * struct kbase_csf_firmware_core_dump - Object containing members for handling + * firmware core dump. + * + * @mcu_regs_addr: GPU virtual address of the start of the MCU registers buffer + * in Firmware. + * @version: Version of the FW image header core dump data format. Bits + * 7:0 specify version minor and 15:8 specify version major. + * @available: Flag to identify if the FW core dump buffer is available. + * True if entry is available in the FW image header and version + * is supported, False otherwise. + */ +struct kbase_csf_firmware_core_dump { + u32 mcu_regs_addr; + u16 version; + bool available; +}; + +#if IS_ENABLED(CONFIG_DEBUG_FS) +/** + * struct kbase_csf_dump_on_fault - Faulty information to deliver to the daemon + * + * @error_code: Error code. + * @kctx_tgid: tgid value of the Kbase context for which the fault happened. + * @kctx_id: id of the Kbase context for which the fault happened. + * @enabled: Flag to indicate that 'csf_fault' debugfs has been opened + * so dump on fault is enabled. + * @fault_wait_wq: Waitqueue on which user space client is blocked till kbase + * reports a fault. + * @dump_wait_wq: Waitqueue on which kbase threads are blocked till user space client + * completes the dump on fault. + * @lock: Lock to protect this struct members from concurrent access. 
+ */ +struct kbase_csf_dump_on_fault { + enum dumpfault_error_type error_code; + u32 kctx_tgid; + u32 kctx_id; + atomic_t enabled; + wait_queue_head_t fault_wait_wq; + wait_queue_head_t dump_wait_wq; + spinlock_t lock; +}; +#endif /* CONFIG_DEBUG_FS*/ + +/** + * struct kbase_csf_user_reg - Object containing members to manage the mapping + * of USER Register page for all contexts + * + * @dummy_page: Address of a dummy page that is mapped in place + * of the real USER Register page just before the GPU + * is powered down. The USER Register page is mapped + * in the address space of every process, that created + * a Base context, to enable the access to LATEST_FLUSH + * register from userspace. + * @filp: Pointer to a dummy file, that along with @file_offset, + * facilitates the use of unique file offset for the userspace mapping + * created for USER Register page. + * The userspace mapping is made to point to this file + * inside the mmap handler. + * @file_offset: Counter that is incremented every time Userspace creates a mapping of + * USER Register page, to provide a unique file offset range for + * @filp file, so that the CPU PTE of the Userspace mapping can be zapped + * through the kernel function unmap_mapping_range(). + * It is incremented in page units. + * @list: Linked list to maintain user processes(contexts) + * having the mapping to USER Register page. + * It's protected by &kbase_csf_device.reg_lock. + */ +struct kbase_csf_user_reg { + struct tagged_addr dummy_page; + struct file *filp; + u32 file_offset; + struct list_head list; +}; + /** * struct kbase_csf_device - Object representing CSF for an instance of GPU * platform device. @@ -1192,7 +1543,7 @@ struct kbase_csf_hwcnt { * image. * @shared_interface: Pointer to the interface object containing info for * the memory area shared between firmware & host. - * @shared_reg_rbtree: RB tree of the memory regions allocated from the + * @mcu_shared_zone: Memory zone tracking memory regions allocated from the * shared interface segment in MCU firmware address * space. * @db_filp: Pointer to a dummy file, that alongwith @@ -1211,17 +1562,6 @@ struct kbase_csf_hwcnt { * of the real Hw doorbell page for the active GPU * command queues after they are stopped or after the * GPU is powered down. - * @dummy_user_reg_page: Address of the dummy page that is mapped in place - * of the real User register page just before the GPU - * is powered down. The User register page is mapped - * in the address space of every process, that created - * a Base context, to enable the access to LATEST_FLUSH - * register from userspace. - * @mali_file_inode: Pointer to the inode corresponding to mali device - * file. This is needed in order to switch to the - * @dummy_user_reg_page on GPU power down. - * All instances of the mali device file will point to - * the same inode. * @reg_lock: Lock to serialize the MCU firmware related actions * that affect all contexts such as allocation of * regions from shared interface area, assignment of @@ -1264,27 +1604,48 @@ struct kbase_csf_hwcnt { * acknowledgement is pending. * @fw_error_work: Work item for handling the firmware internal error * fatal event. + * @coredump_work: Work item for initiating a platform core dump. * @ipa_control: IPA Control component manager. - * @mcu_core_pwroff_dur_us: Sysfs attribute for the glb_pwroff timeout input - * in unit of micro-seconds. The firmware does not use + * @mcu_core_pwroff_dur_ns: Sysfs attribute for the glb_pwroff timeout input + * in unit of nanoseconds. 
The firmware does not use * it directly. * @mcu_core_pwroff_dur_count: The counterpart of the glb_pwroff timeout input * in interface required format, ready to be used * directly in the firmware. + * @mcu_core_pwroff_dur_count_modifier: Update csffw_glb_req_cfg_pwroff_timer + * to make the shr(10) modifier conditional + * on new flag in GLB_PWROFF_TIMER_CONFIG * @mcu_core_pwroff_reg_shadow: The actual value that has been programed into * the glb_pwoff register. This is separated from * the @p mcu_core_pwroff_dur_count as an update * to the latter is asynchronous. - * @gpu_idle_hysteresis_ms: Sysfs attribute for the idle hysteresis time - * window in unit of ms. The firmware does not use it - * directly. + * @gpu_idle_hysteresis_ns: Sysfs attribute for the idle hysteresis time + * window in unit of nanoseconds. The firmware does not + * use it directly. * @gpu_idle_dur_count: The counterpart of the hysteresis time window in * interface required format, ready to be used * directly in the firmware. + * @gpu_idle_dur_count_modifier: Update csffw_glb_req_idle_enable to make the shr(10) + * modifier conditional on the new flag + * in GLB_IDLE_TIMER_CONFIG. * @fw_timeout_ms: Timeout value (in milliseconds) used when waiting * for any request sent to the firmware. * @hwcnt: Contain members required for handling the dump of * HW counters. + * @fw: Copy of the loaded MCU firmware image. + * @fw_log: Contain members required for handling firmware log. + * @fw_core_dump: Contain members required for handling the firmware + * core dump. + * @dof: Structure for dump on fault. + * @user_reg: Collective information to support the mapping to + * USER Register page for user processes. + * @pending_gpuq_kicks: Lists of GPU queue that have been kicked but not + * yet processed, categorised by queue group's priority. + * @pending_gpuq_kicks_lock: Protect @pending_gpu_kicks and + * kbase_queue.pending_kick_link. + * @quirks_ext: Pointer to an allocated buffer containing the firmware + * workarounds configuration. + * @pmode_sync_sem: RW Semaphore to prevent MMU operations during P.Mode entrance. */ struct kbase_csf_device { struct kbase_mmu_table mcu_mmu; @@ -1294,12 +1655,10 @@ struct kbase_csf_device { struct kobject *fw_cfg_kobj; struct kbase_csf_trace_buffers firmware_trace_buffers; void *shared_interface; - struct rb_root shared_reg_rbtree; + struct kbase_reg_zone mcu_shared_zone; struct file *db_filp; u32 db_file_offsets; struct tagged_addr dummy_db_page; - struct tagged_addr dummy_user_reg_page; - struct inode *mali_file_inode; struct mutex reg_lock; wait_queue_head_t event_wait; bool interrupt_received; @@ -1316,14 +1675,34 @@ struct kbase_csf_device { struct work_struct firmware_reload_work; bool glb_init_request_pending; struct work_struct fw_error_work; + struct work_struct coredump_work; struct kbase_ipa_control ipa_control; - u32 mcu_core_pwroff_dur_us; + u32 mcu_core_pwroff_dur_ns; u32 mcu_core_pwroff_dur_count; + u32 mcu_core_pwroff_dur_count_modifier; u32 mcu_core_pwroff_reg_shadow; - u32 gpu_idle_hysteresis_ms; + u32 gpu_idle_hysteresis_ns; u32 gpu_idle_dur_count; + u32 gpu_idle_dur_count_modifier; unsigned int fw_timeout_ms; struct kbase_csf_hwcnt hwcnt; + struct kbase_csf_mcu_fw fw; + struct kbase_csf_firmware_log fw_log; + struct kbase_csf_firmware_core_dump fw_core_dump; +#if IS_ENABLED(CONFIG_DEBUG_FS) + struct kbase_csf_dump_on_fault dof; +#endif /* CONFIG_DEBUG_FS */ +#if IS_ENABLED(CONFIG_MALI_CORESIGHT) + /** + * @coresight: Coresight device structure. 
+ */ + struct kbase_debug_coresight_device coresight; +#endif /* IS_ENABLED(CONFIG_MALI_CORESIGHT) */ + struct kbase_csf_user_reg user_reg; + struct list_head pending_gpuq_kicks[KBASE_QUEUE_GROUP_PRIORITY_COUNT]; + spinlock_t pending_gpuq_kicks_lock; + u32 *quirks_ext; + struct rw_semaphore pmode_sync_sem; }; /** diff --git a/mali_kbase/csf/mali_kbase_csf_event.c b/mali_kbase/csf/mali_kbase_csf_event.c index 5c86688..63e6c15 100644 --- a/mali_kbase/csf/mali_kbase_csf_event.c +++ b/mali_kbase/csf/mali_kbase_csf_event.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2021-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -102,7 +102,7 @@ static void sync_update_notify_gpu(struct kbase_context *kctx) if (can_notify_gpu) { kbase_csf_ring_doorbell(kctx->kbdev, CSF_KERNEL_DOORBELL_NR); - KBASE_KTRACE_ADD(kctx->kbdev, SYNC_UPDATE_EVENT_NOTIFY_GPU, kctx, 0u); + KBASE_KTRACE_ADD(kctx->kbdev, CSF_SYNC_UPDATE_NOTIFY_GPU_EVENT, kctx, 0u); } spin_unlock_irqrestore(&kctx->kbdev->hwaccess_lock, flags); @@ -120,7 +120,7 @@ void kbase_csf_event_signal(struct kbase_context *kctx, bool notify_gpu) /* First increment the signal count and wake up event thread. */ atomic_set(&kctx->event_count, 1); - kbase_event_wakeup(kctx); + kbase_event_wakeup_nosync(kctx); /* Signal the CSF firmware. This is to ensure that pending command * stream synch object wait operations are re-evaluated. @@ -169,7 +169,8 @@ void kbase_csf_event_term(struct kbase_context *kctx) kfree(event_cb); } - WARN_ON(!list_empty(&kctx->csf.event.error_list)); + WARN(!list_empty(&kctx->csf.event.error_list), + "Error list not empty for ctx %d_%d\n", kctx->tgid, kctx->id); spin_unlock_irqrestore(&kctx->csf.event.lock, flags); } @@ -226,12 +227,15 @@ void kbase_csf_event_add_error(struct kbase_context *const kctx, return; spin_lock_irqsave(&kctx->csf.event.lock, flags); - if (!WARN_ON(!list_empty(&error->link))) { + if (list_empty(&error->link)) { error->data = *data; list_add_tail(&error->link, &kctx->csf.event.error_list); dev_dbg(kctx->kbdev->dev, "Added error %pK of type %d in context %pK\n", (void *)error, data->type, (void *)kctx); + } else { + dev_dbg(kctx->kbdev->dev, "Error %pK of type %d already pending in context %pK", + (void *)error, error->data.type, (void *)kctx); } spin_unlock_irqrestore(&kctx->csf.event.lock, flags); } @@ -241,6 +245,14 @@ bool kbase_csf_event_error_pending(struct kbase_context *kctx) bool error_pending = false; unsigned long flags; + /* Withhold the error event if the dump on fault is ongoing. + * This would prevent the Userspace from taking error recovery actions + * (which can potentially affect the state that is being dumped). + * Event handling thread would eventually notice the error event. 
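The comment above documents the new early return in kbase_csf_event_error_pending(): while a dump-on-fault is in progress, pending errors are withheld so userspace does not start recovery and disturb the state being dumped, and the event handling thread picks them up once the dump completes. Reduced to its essence (locking and the real API omitted, names invented):

#include <stdbool.h>
#include <stdio.h>

struct dev_state {
	bool dump_on_fault_in_progress;
	int  pending_error_count;
};

/* Report "no error yet" while a dump is ongoing so userspace does not start
 * recovery; the error is reported again once the dump has completed.
 */
static bool error_pending(const struct dev_state *dev)
{
	if (dev->dump_on_fault_in_progress)
		return false;
	return dev->pending_error_count > 0;
}

int main(void)
{
	struct dev_state dev = { .dump_on_fault_in_progress = true,
				 .pending_error_count = 1 };

	printf("pending while dumping: %d\n", error_pending(&dev));
	dev.dump_on_fault_in_progress = false;
	printf("pending after dump:    %d\n", error_pending(&dev));
	return 0;
}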
+ */ + if (unlikely(!kbase_debug_csf_fault_dump_complete(kctx->kbdev))) + return false; + spin_lock_irqsave(&kctx->csf.event.lock, flags); error_pending = !list_empty(&kctx->csf.event.error_list); diff --git a/mali_kbase/csf/mali_kbase_csf_event.h b/mali_kbase/csf/mali_kbase_csf_event.h index 4c853b5..52122a9 100644 --- a/mali_kbase/csf/mali_kbase_csf_event.h +++ b/mali_kbase/csf/mali_kbase_csf_event.h @@ -30,8 +30,8 @@ struct kbase_csf_event; enum kbase_csf_event_callback_action; /** - * kbase_csf_event_callback_action - type for callback functions to be - * called upon CSF events. + * kbase_csf_event_callback - type for callback functions to be + * called upon CSF events. * @param: Generic parameter to pass to the callback function. * * This is the type of callback functions that can be registered diff --git a/mali_kbase/csf/mali_kbase_csf_firmware.c b/mali_kbase/csf/mali_kbase_csf_firmware.c index bf7cdf4..cf4bb4c 100644 --- a/mali_kbase/csf/mali_kbase_csf_firmware.c +++ b/mali_kbase/csf/mali_kbase_csf_firmware.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2018-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2018-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -21,6 +21,8 @@ #include "mali_kbase.h" #include "mali_kbase_csf_firmware_cfg.h" +#include "mali_kbase_csf_firmware_log.h" +#include "mali_kbase_csf_firmware_core_dump.h" #include "mali_kbase_csf_trace_buffer.h" #include "mali_kbase_csf_timeout.h" #include "mali_kbase_mem.h" @@ -37,27 +39,29 @@ #include "backend/gpu/mali_kbase_clk_rate_trace_mgr.h" #include <csf/ipa_control/mali_kbase_csf_ipa_control.h> #include <csf/mali_kbase_csf_registers.h> - #include <linux/list.h> #include <linux/slab.h> #include <linux/firmware.h> #include <linux/mman.h> #include <linux/string.h> #include <linux/mutex.h> +#include <linux/ctype.h> #if (KERNEL_VERSION(4, 13, 0) <= LINUX_VERSION_CODE) #include <linux/set_memory.h> #endif #include <mmu/mali_kbase_mmu.h> #include <asm/arch_timer.h> +#include <linux/delay.h> +#include <linux/version_compat_defs.h> -#define MALI_MAX_FIRMWARE_NAME_LEN ((size_t)20) +#define MALI_MAX_DEFAULT_FIRMWARE_NAME_LEN ((size_t)20) -static char fw_name[MALI_MAX_FIRMWARE_NAME_LEN] = "mali_csffw.bin"; -module_param_string(fw_name, fw_name, sizeof(fw_name), 0644); +static char default_fw_name[MALI_MAX_DEFAULT_FIRMWARE_NAME_LEN] = "mali_csffw.bin"; +module_param_string(fw_name, default_fw_name, sizeof(default_fw_name), 0644); MODULE_PARM_DESC(fw_name, "firmware image"); /* The waiting time for firmware to boot */ -static unsigned int csf_firmware_boot_timeout_ms = 500; +static unsigned int csf_firmware_boot_timeout_ms; module_param(csf_firmware_boot_timeout_ms, uint, 0444); MODULE_PARM_DESC(csf_firmware_boot_timeout_ms, "Maximum time to wait for firmware to boot."); @@ -75,9 +79,10 @@ MODULE_PARM_DESC(fw_debug, "Enables effective use of a debugger for debugging firmware code."); #endif -#define FIRMWARE_HEADER_MAGIC (0xC3F13A6Eul) -#define FIRMWARE_HEADER_VERSION (0ul) -#define FIRMWARE_HEADER_LENGTH (0x14ul) +#define FIRMWARE_HEADER_MAGIC (0xC3F13A6Eul) +#define FIRMWARE_HEADER_VERSION_MAJOR (0ul) +#define FIRMWARE_HEADER_VERSION_MINOR (3ul) +#define FIRMWARE_HEADER_LENGTH (0x14ul) #define CSF_FIRMWARE_ENTRY_SUPPORTED_FLAGS \ (CSF_FIRMWARE_ENTRY_READ | \ @@ -88,11 +93,13 @@ MODULE_PARM_DESC(fw_debug, CSF_FIRMWARE_ENTRY_ZERO 
| \ CSF_FIRMWARE_ENTRY_CACHE_MODE) -#define CSF_FIRMWARE_ENTRY_TYPE_INTERFACE (0) -#define CSF_FIRMWARE_ENTRY_TYPE_CONFIGURATION (1) -#define CSF_FIRMWARE_ENTRY_TYPE_FUTF_TEST (2) -#define CSF_FIRMWARE_ENTRY_TYPE_TRACE_BUFFER (3) -#define CSF_FIRMWARE_ENTRY_TYPE_TIMELINE_METADATA (4) +#define CSF_FIRMWARE_ENTRY_TYPE_INTERFACE (0) +#define CSF_FIRMWARE_ENTRY_TYPE_CONFIGURATION (1) +#define CSF_FIRMWARE_ENTRY_TYPE_TRACE_BUFFER (3) +#define CSF_FIRMWARE_ENTRY_TYPE_TIMELINE_METADATA (4) +#define CSF_FIRMWARE_ENTRY_TYPE_BUILD_INFO_METADATA (6) +#define CSF_FIRMWARE_ENTRY_TYPE_FUNC_CALL_LIST (7) +#define CSF_FIRMWARE_ENTRY_TYPE_CORE_DUMP (9) #define CSF_FIRMWARE_CACHE_MODE_NONE (0ul << 3) #define CSF_FIRMWARE_CACHE_MODE_CACHED (1ul << 3) @@ -109,6 +116,8 @@ MODULE_PARM_DESC(fw_debug, (GLB_REQ_CFG_ALLOC_EN_MASK | GLB_REQ_CFG_PROGRESS_TIMER_MASK | \ GLB_REQ_CFG_PWROFF_TIMER_MASK | GLB_REQ_IDLE_ENABLE_MASK) +char fw_git_sha[BUILD_INFO_GIT_SHA_LEN]; + static inline u32 input_page_read(const u32 *const input, const u32 offset) { WARN_ON(offset % sizeof(u32)); @@ -176,7 +185,7 @@ struct firmware_timeline_metadata { /* The shared interface area, used for communicating with firmware, is managed * like a virtual memory zone. Reserve the virtual space from that zone * corresponding to shared interface entry parsed from the firmware image. - * The shared_reg_rbtree should have been initialized before calling this + * The MCU_SHARED_ZONE should have been initialized before calling this * function. */ static int setup_shared_iface_static_region(struct kbase_device *kbdev) @@ -189,8 +198,7 @@ static int setup_shared_iface_static_region(struct kbase_device *kbdev) if (!interface) return -EINVAL; - reg = kbase_alloc_free_region(&kbdev->csf.shared_reg_rbtree, 0, - interface->num_pages_aligned, KBASE_REG_ZONE_MCU_SHARED); + reg = kbase_alloc_free_region(&kbdev->csf.mcu_shared_zone, 0, interface->num_pages_aligned); if (reg) { mutex_lock(&kbdev->csf.reg_lock); ret = kbase_add_va_region_rbtree(kbdev, reg, @@ -249,10 +257,15 @@ static void stop_csf_firmware(struct kbase_device *kbdev) static void wait_for_firmware_boot(struct kbase_device *kbdev) { - const long wait_timeout = - kbase_csf_timeout_in_jiffies(csf_firmware_boot_timeout_ms); + long wait_timeout; long remaining; + if (!csf_firmware_boot_timeout_ms) + csf_firmware_boot_timeout_ms = + kbase_get_timeout_ms(kbdev, CSF_FIRMWARE_BOOT_TIMEOUT); + + wait_timeout = kbase_csf_timeout_in_jiffies(csf_firmware_boot_timeout_ms); + /* Firmware will generate a global interface interrupt once booting * is complete */ @@ -269,22 +282,53 @@ static void boot_csf_firmware(struct kbase_device *kbdev) { kbase_csf_firmware_enable_mcu(kbdev); +#if IS_ENABLED(CONFIG_MALI_CORESIGHT) + kbase_debug_coresight_csf_state_request(kbdev, KBASE_DEBUG_CORESIGHT_CSF_ENABLED); + + if (!kbase_debug_coresight_csf_state_wait(kbdev, KBASE_DEBUG_CORESIGHT_CSF_ENABLED)) + dev_err(kbdev->dev, "Timeout waiting for CoreSight to be enabled"); +#endif /* IS_ENABLED(CONFIG_MALI_CORESIGHT) */ + wait_for_firmware_boot(kbdev); } -static void wait_ready(struct kbase_device *kbdev) +/** + * wait_ready() - Wait for previously issued MMU command to complete. + * + * @kbdev: Kbase device to wait for a MMU command to complete. + * + * Reset GPU if the wait for previously issued command times out. + * + * Return: 0 on success, error code otherwise. 
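The reworked wait_ready() whose kernel-doc ends above swaps the old fixed iteration budget for a wall-clock deadline: poll AS_STATUS in short bursts and, once the configured number of milliseconds has passed, give up and trigger a GPU reset. A userspace approximation with clock_gettime; the register read is simulated and the timeout value is arbitrary:

#include <stdio.h>
#include <time.h>

#define STATUS_ACTIVE_BIT 0x1u

static unsigned int fake_status = STATUS_ACTIVE_BIT; /* stand-in for AS_STATUS */

static unsigned int read_status(void)
{
	/* Pretend the MMU command completes after a few polls. */
	static int polls;

	if (++polls > 3)
		fake_status = 0;
	return fake_status;
}

static long elapsed_ms(const struct timespec *start)
{
	struct timespec now;

	clock_gettime(CLOCK_MONOTONIC, &now);
	return (now.tv_sec - start->tv_sec) * 1000 +
	       (now.tv_nsec - start->tv_nsec) / 1000000;
}

static int wait_ready(long timeout_ms)
{
	struct timespec start;

	clock_gettime(CLOCK_MONOTONIC, &start);
	do {
		/* Poll in a tight inner burst before re-checking the clock. */
		for (int i = 0; i < 1000; i++) {
			if (!(read_status() & STATUS_ACTIVE_BIT))
				return 0;
		}
	} while (elapsed_ms(&start) < timeout_ms);

	return -1; /* timed out; the driver would reset the GPU here */
}

int main(void)
{
	printf("wait_ready -> %d\n", wait_ready(100));
	return 0;
}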
+ */ +static int wait_ready(struct kbase_device *kbdev) { - u32 max_loops = KBASE_AS_INACTIVE_MAX_LOOPS; - u32 val; + const ktime_t wait_loop_start = ktime_get_raw(); + const u32 mmu_as_inactive_wait_time_ms = kbdev->mmu_or_gpu_cache_op_wait_time_ms; + s64 diff; - val = kbase_reg_read(kbdev, MMU_AS_REG(MCU_AS_NR, AS_STATUS)); + do { + unsigned int i; - /* Wait for a while for the update command to take effect */ - while (--max_loops && (val & AS_STATUS_AS_ACTIVE)) - val = kbase_reg_read(kbdev, MMU_AS_REG(MCU_AS_NR, AS_STATUS)); + for (i = 0; i < 1000; i++) { + /* Wait for the MMU status to indicate there is no active command */ + if (!(kbase_reg_read(kbdev, + MMU_STAGE1_REG(MMU_AS_REG(MCU_AS_NR, AS_STATUS))) & + AS_STATUS_AS_ACTIVE)) + return 0; + } + + diff = ktime_to_ms(ktime_sub(ktime_get_raw(), wait_loop_start)); + } while (diff < mmu_as_inactive_wait_time_ms); - if (max_loops == 0) - dev_err(kbdev->dev, "AS_ACTIVE bit stuck, might be caused by slow/unstable GPU clock or possible faulty FPGA connector\n"); + dev_err(kbdev->dev, + "AS_ACTIVE bit stuck for MCU AS. Might be caused by unstable GPU clk/pwr or faulty system"); + queue_work(system_highpri_wq, &kbdev->csf.coredump_work); + + if (kbase_prepare_to_reset_gpu_locked(kbdev, RESET_FLAGS_HWC_UNRECOVERABLE_ERROR)) + kbase_reset_gpu_locked(kbdev); + + return -ETIMEDOUT; } static void unload_mmu_tables(struct kbase_device *kbdev) @@ -299,7 +343,7 @@ static void unload_mmu_tables(struct kbase_device *kbdev) mutex_unlock(&kbdev->mmu_hw_mutex); } -static void load_mmu_tables(struct kbase_device *kbdev) +static int load_mmu_tables(struct kbase_device *kbdev) { unsigned long irq_flags; @@ -310,7 +354,7 @@ static void load_mmu_tables(struct kbase_device *kbdev) mutex_unlock(&kbdev->mmu_hw_mutex); /* Wait for a while for the update command to take effect */ - wait_ready(kbdev); + return wait_ready(kbdev); } /** @@ -402,7 +446,7 @@ static void load_fw_image_section(struct kbase_device *kbdev, const u8 *data, for (page_num = 0; page_num < page_limit; ++page_num) { struct page *const page = as_page(phys[page_num]); - char *const p = kmap_atomic(page); + char *const p = kbase_kmap_atomic(page); u32 const copy_len = min_t(u32, PAGE_SIZE, data_len); if (copy_len > 0) { @@ -417,9 +461,9 @@ static void load_fw_image_section(struct kbase_device *kbdev, const u8 *data, memset(p + copy_len, 0, zi_len); } - kbase_sync_single_for_device(kbdev, kbase_dma_addr(page), - PAGE_SIZE, DMA_TO_DEVICE); - kunmap_atomic(p); + kbase_sync_single_for_device(kbdev, kbase_dma_addr_from_tagged(phys[page_num]), + PAGE_SIZE, DMA_TO_DEVICE); + kbase_kunmap_atomic(p); } } @@ -427,24 +471,17 @@ static int reload_fw_image(struct kbase_device *kbdev) { const u32 magic = FIRMWARE_HEADER_MAGIC; struct kbase_csf_firmware_interface *interface; - const struct firmware *firmware; + struct kbase_csf_mcu_fw *const mcu_fw = &kbdev->csf.fw; int ret = 0; - if (request_firmware(&firmware, fw_name, kbdev->dev) != 0) { - dev_err(kbdev->dev, - "Failed to reload firmware image '%s'\n", - fw_name); - return -ENOENT; - } - - /* Do couple of basic sanity checks */ - if (firmware->size < FIRMWARE_HEADER_LENGTH) { - dev_err(kbdev->dev, "Firmware image unexpectedly too small\n"); + if (WARN_ON(mcu_fw->data == NULL)) { + dev_err(kbdev->dev, "Firmware image copy not loaded\n"); ret = -EINVAL; goto out; } - if (memcmp(firmware->data, &magic, sizeof(magic)) != 0) { + /* Do a basic sanity check on MAGIC signature */ + if (memcmp(mcu_fw->data, &magic, sizeof(magic)) != 0) { dev_err(kbdev->dev, "Incorrect 
magic value, firmware image could have been corrupted\n"); ret = -EINVAL; goto out; @@ -459,16 +496,14 @@ static int reload_fw_image(struct kbase_device *kbdev) continue; } - load_fw_image_section(kbdev, firmware->data, interface->phys, - interface->num_pages, interface->flags, - interface->data_start, interface->data_end); + load_fw_image_section(kbdev, mcu_fw->data, interface->phys, interface->num_pages, + interface->flags, interface->data_start, interface->data_end); } kbdev->csf.firmware_full_reload_needed = false; kbase_csf_firmware_reload_trace_buffers_data(kbdev); out: - release_firmware(firmware); return ret; } @@ -480,6 +515,7 @@ out: * @kbdev: Kbase device structure * @virtual_start: Start of the virtual address range required for an entry allocation * @virtual_end: End of the virtual address range required for an entry allocation + * @flags: Firmware entry flags for comparison with the reusable pages found * @phys: Pointer to the array of physical (tagged) addresses making up the new * FW interface entry. It is an output parameter which would be made to * point to an already existing array allocated for the previously parsed @@ -494,16 +530,19 @@ out: * within the 2MB pages aligned allocation. * @is_small_page: This is an output flag used to select between the small and large page * to be used for the FW entry allocation. + * @force_small_page: Use 4kB pages to allocate memory needed for FW loading * * Go through all the already initialized interfaces and find if a previously * allocated large page can be used to store contents of new FW interface entry. * * Return: true if a large page can be reused, false otherwise. */ -static inline bool entry_find_large_page_to_reuse( - struct kbase_device *kbdev, const u32 virtual_start, const u32 virtual_end, - struct tagged_addr **phys, struct protected_memory_allocation ***pma, - u32 num_pages, u32 *num_pages_aligned, bool *is_small_page) +static inline bool entry_find_large_page_to_reuse(struct kbase_device *kbdev, + const u32 virtual_start, const u32 virtual_end, + const u32 flags, struct tagged_addr **phys, + struct protected_memory_allocation ***pma, + u32 num_pages, u32 *num_pages_aligned, + bool *is_small_page, bool force_small_page) { struct kbase_csf_firmware_interface *interface = NULL; struct kbase_csf_firmware_interface *target_interface = NULL; @@ -519,7 +558,61 @@ static inline bool entry_find_large_page_to_reuse( *phys = NULL; *pma = NULL; + if (force_small_page) + goto out; + + /* If the section starts at 2MB aligned boundary, + * then use 2MB page(s) for it. + */ + if (!(virtual_start & (SZ_2M - 1))) { + *num_pages_aligned = + round_up(*num_pages_aligned, NUM_4K_PAGES_IN_2MB_PAGE); + *is_small_page = false; + goto out; + } + + /* If the section doesn't lie within the same 2MB aligned boundary, + * then use 4KB pages as it would be complicated to use a 2MB page + * for such section. + */ + if ((virtual_start & ~(SZ_2M - 1)) != (virtual_end & ~(SZ_2M - 1))) + goto out; + + /* Find the nearest 2MB aligned section which comes before the current + * section. 
+ */ + list_for_each_entry(interface, &kbdev->csf.firmware_interfaces, node) { + const u32 virtual_diff = virtual_start - interface->virtual; + + if (interface->virtual > virtual_end) + continue; + + if (interface->virtual & (SZ_2M - 1)) + continue; + + if ((virtual_diff < virtual_diff_min) && (interface->flags == flags)) { + target_interface = interface; + virtual_diff_min = virtual_diff; + } + } + + if (target_interface) { + const u32 page_index = virtual_diff_min >> PAGE_SHIFT; + + if (page_index >= target_interface->num_pages_aligned) + goto out; + if (target_interface->phys) + *phys = &target_interface->phys[page_index]; + + if (target_interface->pma) + *pma = &target_interface->pma[page_index / NUM_4K_PAGES_IN_2MB_PAGE]; + + *is_small_page = false; + reuse_large_page = true; + } + +out: return reuse_large_page; } @@ -538,8 +631,8 @@ static inline bool entry_find_large_page_to_reuse( * Return: 0 if successful, negative error code on failure */ static int parse_memory_setup_entry(struct kbase_device *kbdev, - const struct firmware *fw, - const u32 *entry, unsigned int size) + const struct kbase_csf_mcu_fw *const fw, const u32 *entry, + unsigned int size) { int ret = 0; const u32 flags = entry[0]; @@ -550,6 +643,8 @@ static int parse_memory_setup_entry(struct kbase_device *kbdev, u32 num_pages; u32 num_pages_aligned; char *name; + void *name_entry; + unsigned int name_len; struct tagged_addr *phys = NULL; struct kbase_csf_firmware_interface *interface = NULL; bool allocated_pages = false, protected_mode = false; @@ -558,6 +653,7 @@ static int parse_memory_setup_entry(struct kbase_device *kbdev, struct protected_memory_allocation **pma = NULL; bool reuse_pages = false; bool is_small_page = true; + bool force_small_page = false; if (data_end < data_start) { dev_err(kbdev->dev, "Firmware corrupt, data_end < data_start (0x%x<0x%x)\n", @@ -592,7 +688,7 @@ static int parse_memory_setup_entry(struct kbase_device *kbdev, protected_mode = true; if (protected_mode && kbdev->csf.pma_dev == NULL) { - dev_err(kbdev->dev, + dev_warn(kbdev->dev, "Protected memory allocator not found, Firmware protected mode entry will not be supported"); return 0; } @@ -600,9 +696,15 @@ static int parse_memory_setup_entry(struct kbase_device *kbdev, num_pages = (virtual_end - virtual_start) >> PAGE_SHIFT; - reuse_pages = entry_find_large_page_to_reuse( - kbdev, virtual_start, virtual_end, &phys, &pma, - num_pages, &num_pages_aligned, &is_small_page); + if(protected_mode) { + force_small_page = true; + dev_warn(kbdev->dev, "Protected memory allocation requested for %u bytes (%u pages), serving with small pages and tight allocation.", (virtual_end - virtual_start), num_pages); + } + +retry_alloc: + reuse_pages = entry_find_large_page_to_reuse(kbdev, virtual_start, virtual_end, flags, + &phys, &pma, num_pages, &num_pages_aligned, + &is_small_page, force_small_page); if (!reuse_pages) phys = kmalloc_array(num_pages_aligned, sizeof(*phys), GFP_KERNEL); @@ -613,23 +715,41 @@ static int parse_memory_setup_entry(struct kbase_device *kbdev, if (!reuse_pages) { pma = kbase_csf_protected_memory_alloc( kbdev, phys, num_pages_aligned, is_small_page); + if (!pma) { + /* If we can't allocate sufficient memory for FW - bail out and leave protected execution unsupported by termintating the allocator. 
*/ + dev_warn(kbdev->dev, + "Protected memory allocation failed during FW initialization - Firmware protected mode entry will not be supported"); + kbase_csf_protected_memory_term(kbdev); + kbdev->csf.pma_dev = NULL; + kfree(phys); + return 0; + } + } else if (WARN_ON(!pma)) { + ret = -EINVAL; + goto out; } - - if (!pma) - ret = -ENOMEM; } else { if (!reuse_pages) { ret = kbase_mem_pool_alloc_pages( - kbase_mem_pool_group_select( - kbdev, KBASE_MEM_GROUP_CSF_FW, is_small_page), - num_pages_aligned, phys, false); + kbase_mem_pool_group_select(kbdev, KBASE_MEM_GROUP_CSF_FW, + is_small_page), + num_pages_aligned, phys, false, NULL); } } if (ret < 0) { - dev_err(kbdev->dev, - "Failed to allocate %u physical pages for the firmware interface entry at VA 0x%x\n", - num_pages_aligned, virtual_start); + dev_warn( + kbdev->dev, + "Failed to allocate %u physical pages for the firmware interface entry at VA 0x%x using %s ", + num_pages_aligned, virtual_start, + is_small_page ? "small pages" : "large page"); + WARN_ON(reuse_pages); + if (!is_small_page) { + dev_warn(kbdev->dev, "Retrying by using small pages"); + force_small_page = true; + kfree(phys); + goto retry_alloc; + } goto out; } @@ -638,21 +758,24 @@ static int parse_memory_setup_entry(struct kbase_device *kbdev, data_start, data_end); /* Allocate enough memory for the struct kbase_csf_firmware_interface and - * the name of the interface. An extra byte is allocated to place a - * NUL-terminator in. This should already be included according to the - * specification but here we add it anyway to be robust against a - * corrupt firmware image. + * the name of the interface. */ - interface = kmalloc(sizeof(*interface) + - size - INTERFACE_ENTRY_NAME_OFFSET + 1, GFP_KERNEL); + name_entry = (void *)entry + INTERFACE_ENTRY_NAME_OFFSET; + name_len = strnlen(name_entry, size - INTERFACE_ENTRY_NAME_OFFSET); + if (size < (INTERFACE_ENTRY_NAME_OFFSET + name_len + 1 + sizeof(u32))) { + dev_err(kbdev->dev, "Memory setup entry too short to contain virtual_exe_start"); + ret = -EINVAL; + goto out; + } + + interface = kmalloc(sizeof(*interface) + name_len + 1, GFP_KERNEL); if (!interface) { ret = -ENOMEM; goto out; } name = (void *)(interface + 1); - memcpy(name, entry + (INTERFACE_ENTRY_NAME_OFFSET / sizeof(*entry)), - size - INTERFACE_ENTRY_NAME_OFFSET); - name[size - INTERFACE_ENTRY_NAME_OFFSET] = 0; + memcpy(name, name_entry, name_len); + name[name_len] = 0; interface->name = name; interface->phys = phys; @@ -667,6 +790,11 @@ static int parse_memory_setup_entry(struct kbase_device *kbdev, interface->data_end = data_end; interface->pma = pma; + /* Discover the virtual execution address field after the end of the name + * field taking into account the NULL-termination character. 
+ */ + interface->virtual_exe_start = *((u32 *)(name_entry + name_len + 1)); + mem_flags = convert_mem_flags(kbdev, flags, &cache_mode); if (flags & CSF_FIRMWARE_ENTRY_SHARED) { @@ -722,8 +850,9 @@ static int parse_memory_setup_entry(struct kbase_device *kbdev, if (!reuse_pages) { ret = kbase_mmu_insert_pages_no_flush(kbdev, &kbdev->csf.mcu_mmu, - virtual_start >> PAGE_SHIFT, phys, num_pages_aligned, mem_flags, - KBASE_MEM_GROUP_CSF_FW); + virtual_start >> PAGE_SHIFT, phys, + num_pages_aligned, mem_flags, + KBASE_MEM_GROUP_CSF_FW, NULL, NULL); if (ret != 0) { dev_err(kbdev->dev, "Failed to insert firmware pages\n"); @@ -770,7 +899,8 @@ out: * @size: Size (in bytes) of the section */ static int parse_timeline_metadata_entry(struct kbase_device *kbdev, - const struct firmware *fw, const u32 *entry, unsigned int size) + const struct kbase_csf_mcu_fw *const fw, const u32 *entry, + unsigned int size) { const u32 data_start = entry[0]; const u32 data_size = entry[1]; @@ -813,6 +943,59 @@ static int parse_timeline_metadata_entry(struct kbase_device *kbdev, } /** + * parse_build_info_metadata_entry() - Process a "build info metadata" section + * @kbdev: Kbase device structure + * @fw: Firmware image containing the section + * @entry: Pointer to the section + * @size: Size (in bytes) of the section + * + * This prints the git SHA of the firmware on frimware load. + * + * Return: 0 if successful, negative error code on failure + */ +static int parse_build_info_metadata_entry(struct kbase_device *kbdev, + const struct kbase_csf_mcu_fw *const fw, + const u32 *entry, unsigned int size) +{ + const u32 meta_start_addr = entry[0]; + char *ptr = NULL; + size_t sha_pattern_len = strlen(BUILD_INFO_GIT_SHA_PATTERN); + + /* Only print git SHA to avoid releasing sensitive information */ + ptr = strstr(fw->data + meta_start_addr, BUILD_INFO_GIT_SHA_PATTERN); + /* Check that we won't overrun the found string */ + if (ptr && + strlen(ptr) >= BUILD_INFO_GIT_SHA_LEN + BUILD_INFO_GIT_DIRTY_LEN + sha_pattern_len) { + char git_sha[BUILD_INFO_GIT_SHA_LEN + BUILD_INFO_GIT_DIRTY_LEN + 1]; + int i = 0; + + /* Move ptr to start of SHA */ + ptr += sha_pattern_len; + for (i = 0; i < BUILD_INFO_GIT_SHA_LEN; i++) { + /* Ensure that the SHA is made up of hex digits */ + if (!isxdigit(ptr[i])) + break; + + git_sha[i] = ptr[i]; + } + + /* Check if the next char indicates git SHA is dirty */ + if (ptr[i] == ' ' || ptr[i] == '+') { + git_sha[i] = ptr[i]; + i++; + } + git_sha[i] = '\0'; + + memcpy(fw_git_sha, git_sha, BUILD_INFO_GIT_SHA_LEN); + + dev_info(kbdev->dev, "Mali firmware git_sha: %s\n", git_sha); + } else + dev_info(kbdev->dev, "Mali firmware git_sha not found or invalid\n"); + + return 0; +} + +/** * load_firmware_entry() - Process an entry from a firmware image * * @kbdev: Kbase device @@ -828,9 +1011,8 @@ static int parse_timeline_metadata_entry(struct kbase_device *kbdev, * * Return: 0 if successful, negative error code on failure */ -static int load_firmware_entry(struct kbase_device *kbdev, - const struct firmware *fw, - u32 offset, u32 header) +static int load_firmware_entry(struct kbase_device *kbdev, const struct kbase_csf_mcu_fw *const fw, + u32 offset, u32 header) { const unsigned int type = entry_type(header); unsigned int size = entry_size(header); @@ -892,13 +1074,35 @@ static int load_firmware_entry(struct kbase_device *kbdev, return -EINVAL; } return parse_timeline_metadata_entry(kbdev, fw, entry, size); - } - - if (!optional) { - dev_err(kbdev->dev, - "Unsupported non-optional entry type %u in firmware\n", 
- type); - return -EINVAL; + case CSF_FIRMWARE_ENTRY_TYPE_BUILD_INFO_METADATA: + if (size < BUILD_INFO_METADATA_SIZE_OFFSET + sizeof(*entry)) { + dev_err(kbdev->dev, "Build info metadata entry too short (size=%u)\n", + size); + return -EINVAL; + } + return parse_build_info_metadata_entry(kbdev, fw, entry, size); + case CSF_FIRMWARE_ENTRY_TYPE_FUNC_CALL_LIST: + /* Function call list section */ + if (size < FUNC_CALL_LIST_ENTRY_NAME_OFFSET + sizeof(*entry)) { + dev_err(kbdev->dev, "Function call list entry too short (size=%u)\n", + size); + return -EINVAL; + } + kbase_csf_firmware_log_parse_logging_call_list_entry(kbdev, entry); + return 0; + case CSF_FIRMWARE_ENTRY_TYPE_CORE_DUMP: + /* Core Dump section */ + if (size < CORE_DUMP_ENTRY_START_ADDR_OFFSET + sizeof(*entry)) { + dev_err(kbdev->dev, "FW Core dump entry too short (size=%u)\n", size); + return -EINVAL; + } + return kbase_csf_firmware_core_dump_entry_parse(kbdev, entry); + default: + if (!optional) { + dev_err(kbdev->dev, "Unsupported non-optional entry type %u in firmware\n", + type); + return -EINVAL; + } } return 0; @@ -1115,40 +1319,80 @@ static int parse_capabilities(struct kbase_device *kbdev) return 0; } +static inline void access_firmware_memory_common(struct kbase_device *kbdev, + struct kbase_csf_firmware_interface *interface, u32 offset_bytes, + u32 *value, const bool read) +{ + u32 page_num = offset_bytes >> PAGE_SHIFT; + u32 offset_in_page = offset_bytes & ~PAGE_MASK; + struct page *target_page = as_page(interface->phys[page_num]); + uintptr_t cpu_addr = (uintptr_t)kbase_kmap_atomic(target_page); + u32 *addr = (u32 *)(cpu_addr + offset_in_page); + + if (read) { + kbase_sync_single_for_device(kbdev, + kbase_dma_addr_from_tagged(interface->phys[page_num]) + offset_in_page, + sizeof(u32), DMA_BIDIRECTIONAL); + *value = *addr; + } else { + *addr = *value; + kbase_sync_single_for_device(kbdev, + kbase_dma_addr_from_tagged(interface->phys[page_num]) + offset_in_page, + sizeof(u32), DMA_BIDIRECTIONAL); + } + + kbase_kunmap_atomic((u32 *)cpu_addr); +} + static inline void access_firmware_memory(struct kbase_device *kbdev, u32 gpu_addr, u32 *value, const bool read) { - struct kbase_csf_firmware_interface *interface; + struct kbase_csf_firmware_interface *interface, *access_interface = NULL; + u32 offset_bytes = 0; list_for_each_entry(interface, &kbdev->csf.firmware_interfaces, node) { if ((gpu_addr >= interface->virtual) && (gpu_addr < interface->virtual + (interface->num_pages << PAGE_SHIFT))) { - u32 offset_bytes = gpu_addr - interface->virtual; - u32 page_num = offset_bytes >> PAGE_SHIFT; - u32 offset_in_page = offset_bytes & ~PAGE_MASK; - struct page *target_page = as_page( - interface->phys[page_num]); - u32 *cpu_addr = kmap_atomic(target_page); - - if (read) { - kbase_sync_single_for_device(kbdev, - kbase_dma_addr(target_page) + offset_in_page, - sizeof(u32), DMA_BIDIRECTIONAL); - - *value = cpu_addr[offset_in_page >> 2]; - } else { - cpu_addr[offset_in_page >> 2] = *value; + offset_bytes = gpu_addr - interface->virtual; + access_interface = interface; + break; + } + } - kbase_sync_single_for_device(kbdev, - kbase_dma_addr(target_page) + offset_in_page, - sizeof(u32), DMA_BIDIRECTIONAL); - } + if (access_interface) + access_firmware_memory_common(kbdev, access_interface, offset_bytes, value, read); + else + dev_warn(kbdev->dev, "Invalid GPU VA %x passed", gpu_addr); +} + +static inline void access_firmware_memory_exe(struct kbase_device *kbdev, + u32 gpu_addr, u32 *value, const bool read) +{ + struct 
kbase_csf_firmware_interface *interface, *access_interface = NULL; + u32 offset_bytes = 0; - kunmap_atomic(cpu_addr); - return; + list_for_each_entry(interface, &kbdev->csf.firmware_interfaces, node) { + if ((gpu_addr >= interface->virtual_exe_start) && + (gpu_addr < interface->virtual_exe_start + + (interface->num_pages << PAGE_SHIFT))) { + offset_bytes = gpu_addr - interface->virtual_exe_start; + access_interface = interface; + + /* If there's an overlap in execution address range between a moved and a + * non-moved areas, always prefer the moved one. The idea is that FW may + * move sections around during init time, but after the layout is settled, + * any moved sections are going to override non-moved areas at the same + * location. + */ + if (interface->virtual_exe_start != interface->virtual) + break; } } - dev_warn(kbdev->dev, "Invalid GPU VA %x passed\n", gpu_addr); + + if (access_interface) + access_firmware_memory_common(kbdev, access_interface, offset_bytes, value, read); + else + dev_warn(kbdev->dev, "Invalid GPU VA %x passed", gpu_addr); } void kbase_csf_read_firmware_memory(struct kbase_device *kbdev, @@ -1163,6 +1407,18 @@ void kbase_csf_update_firmware_memory(struct kbase_device *kbdev, access_firmware_memory(kbdev, gpu_addr, &value, false); } +void kbase_csf_read_firmware_memory_exe(struct kbase_device *kbdev, + u32 gpu_addr, u32 *value) +{ + access_firmware_memory_exe(kbdev, gpu_addr, value, true); +} + +void kbase_csf_update_firmware_memory_exe(struct kbase_device *kbdev, + u32 gpu_addr, u32 value) +{ + access_firmware_memory_exe(kbdev, gpu_addr, &value, false); +} + void kbase_csf_firmware_cs_input( const struct kbase_csf_cmd_stream_info *const info, const u32 offset, const u32 value) @@ -1295,6 +1551,26 @@ u32 kbase_csf_firmware_global_output( KBASE_EXPORT_TEST_API(kbase_csf_firmware_global_output); /** + * csf_doorbell_offset() - Calculate the offset to the CSF host doorbell + * @doorbell_nr: Doorbell number + * + * Return: CSF host register offset for the specified doorbell number. + */ +static u32 csf_doorbell_offset(int doorbell_nr) +{ + WARN_ON(doorbell_nr < 0); + WARN_ON(doorbell_nr >= CSF_NUM_DOORBELL); + + return CSF_HW_DOORBELL_PAGE_OFFSET + (doorbell_nr * CSF_HW_DOORBELL_PAGE_SIZE); +} + +void kbase_csf_ring_doorbell(struct kbase_device *kbdev, int doorbell_nr) +{ + kbase_reg_write(kbdev, csf_doorbell_offset(doorbell_nr), (u32)1); +} +EXPORT_SYMBOL(kbase_csf_ring_doorbell); + +/** * handle_internal_firmware_fatal - Handler for CS internal firmware fault. 
* * @kbdev: Pointer to kbase device @@ -1306,6 +1582,8 @@ static void handle_internal_firmware_fatal(struct kbase_device *const kbdev) { int as; + kbasep_platform_event_core_dump(kbdev, "Internal firmware error"); + for (as = 0; as < kbdev->nr_hw_address_spaces; as++) { unsigned long flags; struct kbase_context *kctx; @@ -1378,11 +1656,10 @@ static bool global_request_complete(struct kbase_device *const kbdev, return complete; } -static int wait_for_global_request(struct kbase_device *const kbdev, - u32 const req_mask) +static int wait_for_global_request_with_timeout(struct kbase_device *const kbdev, + u32 const req_mask, unsigned int timeout_ms) { - const long wait_timeout = - kbase_csf_timeout_in_jiffies(kbdev->csf.fw_timeout_ms); + const long wait_timeout = kbase_csf_timeout_in_jiffies(timeout_ms); long remaining; int err = 0; @@ -1391,10 +1668,9 @@ static int wait_for_global_request(struct kbase_device *const kbdev, wait_timeout); if (!remaining) { - dev_warn(kbdev->dev, "[%llu] Timeout (%d ms) waiting for global request %x to complete", - kbase_backend_get_cycle_cnt(kbdev), - kbdev->csf.fw_timeout_ms, - req_mask); + dev_warn(kbdev->dev, + "[%llu] Timeout (%d ms) waiting for global request %x to complete", + kbase_backend_get_cycle_cnt(kbdev), timeout_ms, req_mask); err = -ETIMEDOUT; } @@ -1402,6 +1678,11 @@ static int wait_for_global_request(struct kbase_device *const kbdev, return err; } +static int wait_for_global_request(struct kbase_device *const kbdev, u32 const req_mask) +{ + return wait_for_global_request_with_timeout(kbdev, req_mask, kbdev->csf.fw_timeout_ms); +} + static void set_global_request( const struct kbase_csf_global_iface *const global_iface, u32 const req_mask) @@ -1442,6 +1723,11 @@ static void enable_shader_poweroff_timer(struct kbase_device *const kbdev, kbase_csf_firmware_global_input(global_iface, GLB_PWROFF_TIMER, pwroff_reg); + + kbase_csf_firmware_global_input_mask(global_iface, GLB_PWROFF_TIMER_CONFIG, + kbdev->csf.mcu_core_pwroff_dur_count_modifier, + GLB_PWROFF_TIMER_CONFIG_NO_MODIFIER_MASK); + set_global_request(global_iface, GLB_REQ_CFG_PWROFF_TIMER_MASK); /* Save the programed reg value in its shadow field */ @@ -1468,12 +1754,102 @@ static void enable_gpu_idle_timer(struct kbase_device *const kbdev) kbase_csf_firmware_global_input(global_iface, GLB_IDLE_TIMER, kbdev->csf.gpu_idle_dur_count); + + kbase_csf_firmware_global_input_mask(global_iface, GLB_IDLE_TIMER_CONFIG, + kbdev->csf.gpu_idle_dur_count_modifier, + GLB_IDLE_TIMER_CONFIG_NO_MODIFIER_MASK); + kbase_csf_firmware_global_input_mask(global_iface, GLB_REQ, GLB_REQ_REQ_IDLE_ENABLE, GLB_REQ_IDLE_ENABLE_MASK); dev_dbg(kbdev->dev, "Enabling GPU idle timer with count-value: 0x%.8x", kbdev->csf.gpu_idle_dur_count); } +static bool global_debug_request_complete(struct kbase_device *const kbdev, u32 const req_mask) +{ + struct kbase_csf_global_iface *global_iface = &kbdev->csf.global_iface; + bool complete = false; + unsigned long flags; + + kbase_csf_scheduler_spin_lock(kbdev, &flags); + + if ((kbase_csf_firmware_global_output(global_iface, GLB_DEBUG_ACK) & req_mask) == + (kbase_csf_firmware_global_input_read(global_iface, GLB_DEBUG_REQ) & req_mask)) + complete = true; + + kbase_csf_scheduler_spin_unlock(kbdev, flags); + + return complete; +} + +static void set_global_debug_request(const struct kbase_csf_global_iface *const global_iface, + u32 const req_mask) +{ + u32 glb_debug_req; + + kbase_csf_scheduler_spin_lock_assert_held(global_iface->kbdev); + + glb_debug_req = 
kbase_csf_firmware_global_output(global_iface, GLB_DEBUG_ACK); + glb_debug_req ^= req_mask; + + kbase_csf_firmware_global_input_mask(global_iface, GLB_DEBUG_REQ, glb_debug_req, req_mask); +} + +static void request_fw_core_dump( + const struct kbase_csf_global_iface *const global_iface) +{ + uint32_t run_mode = GLB_DEBUG_REQ_RUN_MODE_SET(0, GLB_DEBUG_RUN_MODE_TYPE_CORE_DUMP); + + set_global_debug_request(global_iface, GLB_DEBUG_REQ_DEBUG_RUN_MASK | run_mode); + + set_global_request(global_iface, GLB_REQ_DEBUG_CSF_REQ_MASK); +} + +int kbase_csf_firmware_req_core_dump(struct kbase_device *const kbdev) +{ + const struct kbase_csf_global_iface *const global_iface = + &kbdev->csf.global_iface; + unsigned long flags; + int ret; + + /* Serialize CORE_DUMP requests. */ + mutex_lock(&kbdev->csf.reg_lock); + + /* Update GLB_REQ with CORE_DUMP request and make firmware act on it. */ + kbase_csf_scheduler_spin_lock(kbdev, &flags); + request_fw_core_dump(global_iface); + kbase_csf_ring_doorbell(kbdev, CSF_KERNEL_DOORBELL_NR); + kbase_csf_scheduler_spin_unlock(kbdev, flags); + + /* Wait for firmware to acknowledge completion of the CORE_DUMP request. */ + ret = wait_for_global_request(kbdev, GLB_REQ_DEBUG_CSF_REQ_MASK); + if (!ret) + WARN_ON(!global_debug_request_complete(kbdev, GLB_DEBUG_REQ_DEBUG_RUN_MASK)); + + mutex_unlock(&kbdev->csf.reg_lock); + + return ret; +} + +/** + * kbasep_enable_rtu - Enable Ray Tracing Unit on powering up shader core + * + * @kbdev: The kbase device structure of the device + * + * This function needs to be called to enable the Ray Tracing Unit + * by writing SHADER_PWRFEATURES only when host controls shader cores power. + */ +static void kbasep_enable_rtu(struct kbase_device *kbdev) +{ + const u32 gpu_id = kbdev->gpu_props.props.raw_props.gpu_id; + + if (gpu_id < GPU_ID2_PRODUCT_MAKE(12, 8, 3, 0)) + return; + + if (kbdev->csf.firmware_hctl_core_pwr) + kbase_reg_write(kbdev, GPU_CONTROL_REG(SHADER_PWRFEATURES), 1); +} + static void global_init(struct kbase_device *const kbdev, u64 core_mask) { u32 const ack_irq_mask = @@ -1481,30 +1857,49 @@ static void global_init(struct kbase_device *const kbdev, u64 core_mask) GLB_ACK_IRQ_MASK_CFG_PROGRESS_TIMER_MASK | GLB_ACK_IRQ_MASK_PROTM_ENTER_MASK | GLB_ACK_IRQ_MASK_PROTM_EXIT_MASK | GLB_ACK_IRQ_MASK_FIRMWARE_CONFIG_UPDATE_MASK | GLB_ACK_IRQ_MASK_CFG_PWROFF_TIMER_MASK | GLB_ACK_IRQ_MASK_IDLE_EVENT_MASK | - GLB_ACK_IRQ_MASK_IDLE_ENABLE_MASK; + GLB_REQ_DEBUG_CSF_REQ_MASK | GLB_ACK_IRQ_MASK_IDLE_ENABLE_MASK; const struct kbase_csf_global_iface *const global_iface = &kbdev->csf.global_iface; unsigned long flags; +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + /* If the power_policy will grant host control over FW PM, we need to turn on the SC rail*/ + if (kbdev->csf.firmware_hctl_core_pwr) { + queue_work(system_highpri_wq, &kbdev->pm.backend.sc_rails_on_work); + } +#endif + kbase_csf_scheduler_spin_lock(kbdev, &flags); + kbasep_enable_rtu(kbdev); + /* Update shader core allocation enable mask */ enable_endpoints_global(global_iface, core_mask); enable_shader_poweroff_timer(kbdev, global_iface); - set_timeout_global(global_iface, kbase_csf_timeout_get(kbdev)); - +#ifndef CONFIG_MALI_HOST_CONTROLS_SC_RAILS /* The GPU idle timer is always enabled for simplicity. Checks will be * done before scheduling the GPU idle worker to see if it is * appropriate for the current power policy. 
*/ enable_gpu_idle_timer(kbdev); +#endif + + set_timeout_global(global_iface, kbase_csf_timeout_get(kbdev)); /* Unmask the interrupts */ kbase_csf_firmware_global_input(global_iface, GLB_ACK_IRQ_MASK, ack_irq_mask); +#if IS_ENABLED(CONFIG_MALI_CORESIGHT) + /* Enable FW MCU read/write debug interfaces */ + kbase_csf_firmware_global_input_mask( + global_iface, GLB_DEBUG_ACK_IRQ_MASK, + GLB_DEBUG_REQ_FW_AS_READ_MASK | GLB_DEBUG_REQ_FW_AS_WRITE_MASK, + GLB_DEBUG_REQ_FW_AS_READ_MASK | GLB_DEBUG_REQ_FW_AS_WRITE_MASK); +#endif /* IS_ENABLED(CONFIG_MALI_CORESIGHT) */ + kbase_csf_ring_doorbell(kbdev, CSF_KERNEL_DOORBELL_NR); kbase_csf_scheduler_spin_unlock(kbdev, flags); @@ -1550,7 +1945,9 @@ void kbase_csf_firmware_global_reinit(struct kbase_device *kbdev, bool kbase_csf_firmware_global_reinit_complete(struct kbase_device *kbdev) { lockdep_assert_held(&kbdev->hwaccess_lock); +#ifndef CONFIG_MALI_HOST_CONTROLS_SC_RAILS WARN_ON(!kbdev->csf.glb_init_request_pending); +#endif if (global_request_complete(kbdev, CSF_GLB_REQ_CFG_MASK)) kbdev->csf.glb_init_request_pending = false; @@ -1613,6 +2010,20 @@ static void kbase_csf_firmware_reload_worker(struct work_struct *work) kbase_csf_tl_reader_reset(&kbdev->timeline->csf_tl_reader); + err = kbasep_platform_fw_config_init(kbdev); + if (WARN_ON(err)) + return; + +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + err = kbase_csf_firmware_cfg_enable_host_ctrl_sc_rails(kbdev); + if (WARN_ON(err)) + return; +#endif + + err = kbase_csf_firmware_cfg_fw_wa_enable(kbdev); + if (WARN_ON(err)) + return; + /* Reboot the firmware */ kbase_csf_firmware_enable_mcu(kbdev); } @@ -1625,7 +2036,7 @@ void kbase_csf_firmware_trigger_reload(struct kbase_device *kbdev) if (kbdev->csf.firmware_reload_needed) { kbdev->csf.firmware_reload_needed = false; - queue_work(system_wq, &kbdev->csf.firmware_reload_work); + queue_work(system_highpri_wq, &kbdev->csf.firmware_reload_work); } else { kbase_csf_firmware_enable_mcu(kbdev); } @@ -1648,19 +2059,20 @@ void kbase_csf_firmware_reload_completed(struct kbase_device *kbdev) if (version != kbdev->csf.global_iface.version) dev_err(kbdev->dev, "Version check failed in firmware reboot."); - KBASE_KTRACE_ADD(kbdev, FIRMWARE_REBOOT, NULL, 0u); + KBASE_KTRACE_ADD(kbdev, CSF_FIRMWARE_REBOOT, NULL, 0u); /* Tell MCU state machine to transit to next state */ kbdev->csf.firmware_reloaded = true; kbase_pm_update_state(kbdev); } -static u32 convert_dur_to_idle_count(struct kbase_device *kbdev, const u32 dur_ms) +static u32 convert_dur_to_idle_count(struct kbase_device *kbdev, const u32 dur_ns, u32 *modifier) { +#define MICROSECONDS_PER_SECOND 1000000u #define HYSTERESIS_VAL_UNIT_SHIFT (10) /* Get the cntfreq_el0 value, which drives the SYSTEM_TIMESTAMP */ u64 freq = arch_timer_get_cntfrq(); - u64 dur_val = dur_ms; + u64 dur_val = dur_ns; u32 cnt_val_u32, reg_val_u32; bool src_system_timestamp = freq > 0; @@ -1673,25 +2085,29 @@ static u32 convert_dur_to_idle_count(struct kbase_device *kbdev, const u32 dur_m dev_warn(kbdev->dev, "No GPU clock, unexpected intregration issue!"); spin_unlock(&kbdev->pm.clk_rtm.lock); - dev_info(kbdev->dev, "Can't get the timestamp frequency, " - "use cycle counter format with firmware idle hysteresis!"); + dev_info( + kbdev->dev, + "Can't get the timestamp frequency, use cycle counter format with firmware idle hysteresis!"); } - /* Formula for dur_val = ((dur_ms/1000) * freq_HZ) >> 10) */ - dur_val = (dur_val * freq) >> HYSTERESIS_VAL_UNIT_SHIFT; - dur_val = div_u64(dur_val, 1000); + /* Formula for dur_val = (dur/1e9) * freq_HZ) 
*/ + dur_val = dur_val * freq; + dur_val = div_u64(dur_val, NSEC_PER_SEC); + if (dur_val < S32_MAX) { + *modifier = 1; + } else { + dur_val = dur_val >> HYSTERESIS_VAL_UNIT_SHIFT; + *modifier = 0; + } /* Interface limits the value field to S32_MAX */ cnt_val_u32 = (dur_val > S32_MAX) ? S32_MAX : (u32)dur_val; reg_val_u32 = GLB_IDLE_TIMER_TIMEOUT_SET(0, cnt_val_u32); /* add the source flag */ - if (src_system_timestamp) - reg_val_u32 = GLB_IDLE_TIMER_TIMER_SOURCE_SET(reg_val_u32, - GLB_IDLE_TIMER_TIMER_SOURCE_SYSTEM_TIMESTAMP); - else - reg_val_u32 = GLB_IDLE_TIMER_TIMER_SOURCE_SET(reg_val_u32, - GLB_IDLE_TIMER_TIMER_SOURCE_GPU_COUNTER); + reg_val_u32 = GLB_IDLE_TIMER_TIMER_SOURCE_SET( + reg_val_u32, (src_system_timestamp ? GLB_IDLE_TIMER_TIMER_SOURCE_SYSTEM_TIMESTAMP : + GLB_IDLE_TIMER_TIMER_SOURCE_GPU_COUNTER)); return reg_val_u32; } @@ -1702,16 +2118,22 @@ u32 kbase_csf_firmware_get_gpu_idle_hysteresis_time(struct kbase_device *kbdev) u32 dur; kbase_csf_scheduler_spin_lock(kbdev, &flags); - dur = kbdev->csf.gpu_idle_hysteresis_ms; + dur = kbdev->csf.gpu_idle_hysteresis_ns; kbase_csf_scheduler_spin_unlock(kbdev, flags); return dur; } -u32 kbase_csf_firmware_set_gpu_idle_hysteresis_time(struct kbase_device *kbdev, u32 dur) +u32 kbase_csf_firmware_set_gpu_idle_hysteresis_time(struct kbase_device *kbdev, u32 dur_ns) { unsigned long flags; - const u32 hysteresis_val = convert_dur_to_idle_count(kbdev, dur); + u32 modifier = 0; + +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + const u32 hysteresis_val = convert_dur_to_idle_count(kbdev, MALI_HOST_CONTROLS_SC_RAILS_IDLE_TIMER_NS, &modifier); +#else + const u32 hysteresis_val = convert_dur_to_idle_count(kbdev, dur_ns, &modifier); +#endif /* The 'fw_load_lock' is taken to synchronize against the deferred * loading of FW, where the idle timer will be enabled. @@ -1719,22 +2141,32 @@ u32 kbase_csf_firmware_set_gpu_idle_hysteresis_time(struct kbase_device *kbdev, mutex_lock(&kbdev->fw_load_lock); if (unlikely(!kbdev->csf.firmware_inited)) { kbase_csf_scheduler_spin_lock(kbdev, &flags); - kbdev->csf.gpu_idle_hysteresis_ms = dur; + kbdev->csf.gpu_idle_hysteresis_ns = dur_ns; kbdev->csf.gpu_idle_dur_count = hysteresis_val; + kbdev->csf.gpu_idle_dur_count_modifier = modifier; kbase_csf_scheduler_spin_unlock(kbdev, flags); mutex_unlock(&kbdev->fw_load_lock); goto end; } mutex_unlock(&kbdev->fw_load_lock); + if (kbase_reset_gpu_prevent_and_wait(kbdev)) { + dev_warn(kbdev->dev, + "Failed to prevent GPU reset when updating idle_hysteresis_time"); + return kbdev->csf.gpu_idle_dur_count; + } + kbase_csf_scheduler_pm_active(kbdev); - if (kbase_csf_scheduler_wait_mcu_active(kbdev)) { + if (kbase_csf_scheduler_killable_wait_mcu_active(kbdev)) { dev_err(kbdev->dev, "Unable to activate the MCU, the idle hysteresis value shall remain unchanged"); kbase_csf_scheduler_pm_idle(kbdev); + kbase_reset_gpu_allow(kbdev); + return kbdev->csf.gpu_idle_dur_count; } +#ifndef CONFIG_MALI_HOST_CONTROLS_SC_RAILS /* The 'reg_lock' is also taken and is held till the update is not * complete, to ensure the update of idle timer value by multiple Users * gets serialized. @@ -1743,22 +2175,49 @@ u32 kbase_csf_firmware_set_gpu_idle_hysteresis_time(struct kbase_device *kbdev, /* The firmware only reads the new idle timer value when the timer is * disabled. 
*/ - kbase_csf_scheduler_spin_lock(kbdev, &flags); - kbase_csf_firmware_disable_gpu_idle_timer(kbdev); - kbase_csf_scheduler_spin_unlock(kbdev, flags); - /* Ensure that the request has taken effect */ - wait_for_global_request(kbdev, GLB_REQ_IDLE_DISABLE_MASK); +#endif - kbase_csf_scheduler_spin_lock(kbdev, &flags); - kbdev->csf.gpu_idle_hysteresis_ms = dur; - kbdev->csf.gpu_idle_dur_count = hysteresis_val; - kbase_csf_firmware_enable_gpu_idle_timer(kbdev); - kbase_csf_scheduler_spin_unlock(kbdev, flags); - wait_for_global_request(kbdev, GLB_REQ_IDLE_ENABLE_MASK); +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + kbase_csf_scheduler_lock(kbdev); + if (kbdev->csf.scheduler.gpu_idle_fw_timer_enabled) { +#endif + /* The firmware only reads the new idle timer value when the timer is + * disabled. + */ + kbase_csf_scheduler_spin_lock(kbdev, &flags); + kbase_csf_firmware_disable_gpu_idle_timer(kbdev); + kbase_csf_scheduler_spin_unlock(kbdev, flags); + /* Ensure that the request has taken effect */ + wait_for_global_request(kbdev, GLB_REQ_IDLE_DISABLE_MASK); + + kbase_csf_scheduler_spin_lock(kbdev, &flags); + kbdev->csf.gpu_idle_hysteresis_ns = dur_ns; + kbdev->csf.gpu_idle_dur_count = hysteresis_val; + kbdev->csf.gpu_idle_dur_count_modifier = modifier; + kbase_csf_firmware_enable_gpu_idle_timer(kbdev); + kbase_csf_scheduler_spin_unlock(kbdev, flags); + wait_for_global_request(kbdev, GLB_REQ_IDLE_ENABLE_MASK); +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + } else { + /* Record the new values. Would be used later when timer is + * enabled + */ + kbase_csf_scheduler_spin_lock(kbdev, &flags); + kbdev->csf.gpu_idle_hysteresis_ns = dur_ns; + kbdev->csf.gpu_idle_dur_count = hysteresis_val; + kbdev->csf.gpu_idle_dur_count_modifier = modifier; + kbase_csf_scheduler_spin_unlock(kbdev, flags); + } + kbase_csf_scheduler_unlock(kbdev); +#else mutex_unlock(&kbdev->csf.reg_lock); +#endif + dev_dbg(kbdev->dev, "GPU suspend timeout updated: %i ns (0x%.8x)", + kbdev->csf.gpu_idle_hysteresis_ns, + kbdev->csf.gpu_idle_dur_count); kbase_csf_scheduler_pm_idle(kbdev); - + kbase_reset_gpu_allow(kbdev); end: dev_dbg(kbdev->dev, "CSF set firmware idle hysteresis count-value: 0x%.8x", hysteresis_val); @@ -1766,15 +2225,18 @@ end: return hysteresis_val; } -static u32 convert_dur_to_core_pwroff_count(struct kbase_device *kbdev, const u32 dur_us) +static u32 convert_dur_to_core_pwroff_count(struct kbase_device *kbdev, const u32 dur_ns, + u32 *modifier) { -#define PWROFF_VAL_UNIT_SHIFT (10) /* Get the cntfreq_el0 value, which drives the SYSTEM_TIMESTAMP */ u64 freq = arch_timer_get_cntfrq(); - u64 dur_val = dur_us; + u64 dur_val = dur_ns; u32 cnt_val_u32, reg_val_u32; bool src_system_timestamp = freq > 0; + const struct kbase_pm_policy *current_policy = kbase_pm_get_policy(kbdev); + bool always_on = current_policy == &kbase_pm_always_on_policy_ops; + if (!src_system_timestamp) { /* Get the cycle_counter source alternative */ spin_lock(&kbdev->pm.clk_rtm.lock); @@ -1784,49 +2246,76 @@ static u32 convert_dur_to_core_pwroff_count(struct kbase_device *kbdev, const u3 dev_warn(kbdev->dev, "No GPU clock, unexpected integration issue!"); spin_unlock(&kbdev->pm.clk_rtm.lock); - dev_info(kbdev->dev, "Can't get the timestamp frequency, " - "use cycle counter with MCU Core Poweroff timer!"); + dev_info( + kbdev->dev, + "Can't get the timestamp frequency, use cycle counter with MCU shader Core Poweroff timer!"); } - /* Formula for dur_val = ((dur_us/1e6) * freq_HZ) >> 10) */ - dur_val = (dur_val * freq) >> HYSTERESIS_VAL_UNIT_SHIFT; - dur_val = 
div_u64(dur_val, 1000000); + /* Formula for dur_val = (dur/1e9) * freq_HZ) */ + dur_val = dur_val * freq; + dur_val = div_u64(dur_val, NSEC_PER_SEC); + if (dur_val < S32_MAX) { + *modifier = 1; + } else { + dur_val = dur_val >> HYSTERESIS_VAL_UNIT_SHIFT; + *modifier = 0; + } - /* Interface limits the value field to S32_MAX */ - cnt_val_u32 = (dur_val > S32_MAX) ? S32_MAX : (u32)dur_val; + if (dur_val == 0 && !always_on) { + /* Lower Bound - as 0 disables timeout and host controls shader-core power management. */ + cnt_val_u32 = 1; + } else if (dur_val > S32_MAX) { + /* Upper Bound - as interface limits the field to S32_MAX */ + cnt_val_u32 = S32_MAX; + } else { + cnt_val_u32 = (u32)dur_val; + } reg_val_u32 = GLB_PWROFF_TIMER_TIMEOUT_SET(0, cnt_val_u32); /* add the source flag */ - if (src_system_timestamp) - reg_val_u32 = GLB_PWROFF_TIMER_TIMER_SOURCE_SET(reg_val_u32, - GLB_PWROFF_TIMER_TIMER_SOURCE_SYSTEM_TIMESTAMP); - else - reg_val_u32 = GLB_PWROFF_TIMER_TIMER_SOURCE_SET(reg_val_u32, - GLB_PWROFF_TIMER_TIMER_SOURCE_GPU_COUNTER); + reg_val_u32 = GLB_PWROFF_TIMER_TIMER_SOURCE_SET( + reg_val_u32, + (src_system_timestamp ? GLB_PWROFF_TIMER_TIMER_SOURCE_SYSTEM_TIMESTAMP : + GLB_PWROFF_TIMER_TIMER_SOURCE_GPU_COUNTER)); return reg_val_u32; } u32 kbase_csf_firmware_get_mcu_core_pwroff_time(struct kbase_device *kbdev) { - return kbdev->csf.mcu_core_pwroff_dur_us; + u32 pwroff; + unsigned long flags; + + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); + pwroff = kbdev->csf.mcu_core_pwroff_dur_ns; + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + + return pwroff; } -u32 kbase_csf_firmware_set_mcu_core_pwroff_time(struct kbase_device *kbdev, u32 dur) +u32 kbase_csf_firmware_set_mcu_core_pwroff_time(struct kbase_device *kbdev, u32 dur_ns) { unsigned long flags; - const u32 pwroff = convert_dur_to_core_pwroff_count(kbdev, dur); + u32 modifier = 0; + + const u32 pwroff = convert_dur_to_core_pwroff_count(kbdev, dur_ns, &modifier); spin_lock_irqsave(&kbdev->hwaccess_lock, flags); - kbdev->csf.mcu_core_pwroff_dur_us = dur; + kbdev->csf.mcu_core_pwroff_dur_ns = dur_ns; kbdev->csf.mcu_core_pwroff_dur_count = pwroff; + kbdev->csf.mcu_core_pwroff_dur_count_modifier = modifier; spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); - dev_dbg(kbdev->dev, "MCU Core Poweroff input update: 0x%.8x", pwroff); + dev_dbg(kbdev->dev, "MCU shader Core Poweroff input update: 0x%.8x", pwroff); return pwroff; } +u32 kbase_csf_firmware_reset_mcu_core_pwroff_time(struct kbase_device *kbdev) +{ + return kbase_csf_firmware_set_mcu_core_pwroff_time(kbdev, DEFAULT_GLB_PWROFF_TIMEOUT_NS); +} + /** * kbase_device_csf_iterator_trace_init - Send request to enable iterator * trace port. @@ -1838,19 +2327,25 @@ u32 kbase_csf_firmware_set_mcu_core_pwroff_time(struct kbase_device *kbdev, u32 static int kbase_device_csf_iterator_trace_init(struct kbase_device *kbdev) { /* Enable the iterator trace port if supported by the GPU. - * It requires the GPU to have a nonzero "iter_trace_enable" + * It requires the GPU to have a nonzero "iter-trace-enable" * property in the device tree, and the FW must advertise * this feature in GLB_FEATURES. 
*/ if (kbdev->pm.backend.gpu_powered) { - /* check device tree for iterator trace enable property */ + /* check device tree for iterator trace enable property + * and fallback to "iter_trace_enable" if it is not found + */ const void *iter_trace_param = of_get_property( kbdev->dev->of_node, - "iter_trace_enable", NULL); + "iter-trace-enable", NULL); const struct kbase_csf_global_iface *iface = &kbdev->csf.global_iface; + if (!iter_trace_param) + iter_trace_param = + of_get_property(kbdev->dev->of_node, "iter_trace_enable", NULL); + if (iter_trace_param) { u32 iter_trace_value = be32_to_cpup(iter_trace_param); @@ -1889,50 +2384,105 @@ static int kbase_device_csf_iterator_trace_init(struct kbase_device *kbdev) return 0; } +static void coredump_worker(struct work_struct *data) +{ + struct kbase_device *kbdev = container_of(data, struct kbase_device, csf.coredump_work); + + kbasep_platform_event_core_dump(kbdev, "GPU hang"); +} + int kbase_csf_firmware_early_init(struct kbase_device *kbdev) { + u32 modifier = 0; + init_waitqueue_head(&kbdev->csf.event_wait); kbdev->csf.interrupt_received = false; kbdev->csf.fw_timeout_ms = kbase_get_timeout_ms(kbdev, CSF_FIRMWARE_TIMEOUT); - kbdev->csf.gpu_idle_hysteresis_ms = FIRMWARE_IDLE_HYSTERESIS_TIME_MS; -#ifdef KBASE_PM_RUNTIME - if (kbase_pm_gpu_sleep_allowed(kbdev)) - kbdev->csf.gpu_idle_hysteresis_ms /= - FIRMWARE_IDLE_HYSTERESIS_GPU_SLEEP_SCALER; -#endif - WARN_ON(!kbdev->csf.gpu_idle_hysteresis_ms); - kbdev->csf.gpu_idle_dur_count = convert_dur_to_idle_count( - kbdev, kbdev->csf.gpu_idle_hysteresis_ms); - - kbdev->csf.mcu_core_pwroff_dur_us = DEFAULT_GLB_PWROFF_TIMEOUT_US; +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + /* Set to the lowest possible value for FW to immediately write + * to the power off register to disable the cores. 
+ */ + kbdev->csf.mcu_core_pwroff_dur_count = 1; +#else + kbdev->csf.mcu_core_pwroff_dur_ns = DEFAULT_GLB_PWROFF_TIMEOUT_NS; kbdev->csf.mcu_core_pwroff_dur_count = convert_dur_to_core_pwroff_count( - kbdev, DEFAULT_GLB_PWROFF_TIMEOUT_US); + kbdev, DEFAULT_GLB_PWROFF_TIMEOUT_NS, &modifier); + kbdev->csf.mcu_core_pwroff_dur_count_modifier = modifier; +#endif + kbase_csf_firmware_reset_mcu_core_pwroff_time(kbdev); INIT_LIST_HEAD(&kbdev->csf.firmware_interfaces); INIT_LIST_HEAD(&kbdev->csf.firmware_config); INIT_LIST_HEAD(&kbdev->csf.firmware_timeline_metadata); INIT_LIST_HEAD(&kbdev->csf.firmware_trace_buffers.list); + INIT_LIST_HEAD(&kbdev->csf.user_reg.list); INIT_WORK(&kbdev->csf.firmware_reload_work, kbase_csf_firmware_reload_worker); INIT_WORK(&kbdev->csf.fw_error_work, firmware_error_worker); + INIT_WORK(&kbdev->csf.coredump_work, coredump_worker); + init_rwsem(&kbdev->csf.pmode_sync_sem); mutex_init(&kbdev->csf.reg_lock); + kbase_csf_pending_gpuq_kicks_init(kbdev); + + kbdev->csf.fw = (struct kbase_csf_mcu_fw){ .data = NULL }; + + return 0; +} + +void kbase_csf_firmware_early_term(struct kbase_device *kbdev) +{ + kbase_csf_pending_gpuq_kicks_term(kbdev); + mutex_destroy(&kbdev->csf.reg_lock); +} + +int kbase_csf_firmware_late_init(struct kbase_device *kbdev) +{ + u32 modifier = 0; + + kbdev->csf.gpu_idle_hysteresis_ns = FIRMWARE_IDLE_HYSTERESIS_TIME_NS; + +#ifdef KBASE_PM_RUNTIME + if (kbase_pm_gpu_sleep_allowed(kbdev)) + kbdev->csf.gpu_idle_hysteresis_ns /= FIRMWARE_IDLE_HYSTERESIS_GPU_SLEEP_SCALER; +#endif + WARN_ON(!kbdev->csf.gpu_idle_hysteresis_ns); + +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + kbdev->csf.gpu_idle_dur_count = convert_dur_to_idle_count( + kbdev, MALI_HOST_CONTROLS_SC_RAILS_IDLE_TIMER_NS, &modifier); + + /* Set to the lowest possible value for FW to immediately write + * to the power off register to disable the cores. + */ + kbdev->csf.mcu_core_pwroff_dur_count = 1; +#else + kbdev->csf.gpu_idle_dur_count = convert_dur_to_idle_count( + kbdev, kbdev->csf.gpu_idle_hysteresis_ns, &modifier); + kbdev->csf.gpu_idle_dur_count_modifier = modifier; + kbdev->csf.mcu_core_pwroff_dur_ns = DEFAULT_GLB_PWROFF_TIMEOUT_NS; + kbdev->csf.mcu_core_pwroff_dur_count = convert_dur_to_core_pwroff_count( + kbdev, DEFAULT_GLB_PWROFF_TIMEOUT_NS, &modifier); + kbdev->csf.mcu_core_pwroff_dur_count_modifier = modifier; +#endif return 0; } -int kbase_csf_firmware_init(struct kbase_device *kbdev) +int kbase_csf_firmware_load_init(struct kbase_device *kbdev) { - const struct firmware *firmware; + const struct firmware *firmware = NULL; + struct kbase_csf_mcu_fw *const mcu_fw = &kbdev->csf.fw; const u32 magic = FIRMWARE_HEADER_MAGIC; u8 version_major, version_minor; u32 version_hash; u32 entry_end_offset; u32 entry_offset; int ret; + const char *fw_name = default_fw_name; lockdep_assert_held(&kbdev->fw_load_lock); @@ -1953,51 +2503,95 @@ int kbase_csf_firmware_init(struct kbase_device *kbdev) if (ret != 0) { dev_err(kbdev->dev, "Failed to setup the rb tree for managing shared interface segment\n"); - goto error; + goto err_out; + } + +#if IS_ENABLED(CONFIG_OF) + /* If we can't read CSF firmware name from DTB, + * fw_name is not modified and remains the default. + */ + ret = of_property_read_string(kbdev->dev->of_node, "firmware-name", &fw_name); + if (ret == -EINVAL) { + /* Property doesn't exist in DTB, and fw_name already points to default FW name + * so just reset return value and continue. 
+ */ + ret = 0; + } else if (ret == -ENODATA) { + dev_warn(kbdev->dev, + "\"firmware-name\" DTB property contains no data, using default FW name"); + /* Reset return value so FW does not fail to load */ + ret = 0; + } else if (ret == -EILSEQ) { + /* This is reached when the size of the fw_name buffer is too small for the string + * stored in the DTB and the null terminator. + */ + dev_warn(kbdev->dev, + "\"firmware-name\" DTB property value too long, using default FW name."); + /* Reset return value so FW does not fail to load */ + ret = 0; } +#endif /* IS_ENABLED(CONFIG_OF) */ + if (request_firmware(&firmware, fw_name, kbdev->dev) != 0) { dev_err(kbdev->dev, "Failed to load firmware image '%s'\n", fw_name); ret = -ENOENT; - goto error; + } else { + /* Try to save a copy and then release the loaded firmware image */ + mcu_fw->size = firmware->size; + mcu_fw->data = vmalloc((unsigned long)mcu_fw->size); + + if (mcu_fw->data == NULL) { + ret = -ENOMEM; + } else { + memcpy(mcu_fw->data, firmware->data, mcu_fw->size); + dev_dbg(kbdev->dev, "Firmware image (%zu-bytes) retained in csf.fw\n", + mcu_fw->size); + } + + release_firmware(firmware); } - if (firmware->size < FIRMWARE_HEADER_LENGTH) { + /* If error in loading or saving the image, branches to error out */ + if (ret) + goto err_out; + + if (mcu_fw->size < FIRMWARE_HEADER_LENGTH) { dev_err(kbdev->dev, "Firmware too small\n"); ret = -EINVAL; - goto error; + goto err_out; } - if (memcmp(firmware->data, &magic, sizeof(magic)) != 0) { + if (memcmp(mcu_fw->data, &magic, sizeof(magic)) != 0) { dev_err(kbdev->dev, "Incorrect firmware magic\n"); ret = -EINVAL; - goto error; + goto err_out; } - version_minor = firmware->data[4]; - version_major = firmware->data[5]; + version_minor = mcu_fw->data[4]; + version_major = mcu_fw->data[5]; - if (version_major != FIRMWARE_HEADER_VERSION) { + if (version_major != FIRMWARE_HEADER_VERSION_MAJOR || + version_minor != FIRMWARE_HEADER_VERSION_MINOR) { dev_err(kbdev->dev, "Firmware header version %d.%d not understood\n", version_major, version_minor); ret = -EINVAL; - goto error; + goto err_out; } - memcpy(&version_hash, &firmware->data[8], sizeof(version_hash)); + memcpy(&version_hash, &mcu_fw->data[8], sizeof(version_hash)); dev_notice(kbdev->dev, "Loading Mali firmware 0x%x", version_hash); - memcpy(&entry_end_offset, &firmware->data[0x10], - sizeof(entry_end_offset)); + memcpy(&entry_end_offset, &mcu_fw->data[0x10], sizeof(entry_end_offset)); - if (entry_end_offset > firmware->size) { + if (entry_end_offset > mcu_fw->size) { dev_err(kbdev->dev, "Firmware image is truncated\n"); ret = -EINVAL; - goto error; + goto err_out; } entry_offset = FIRMWARE_HEADER_LENGTH; @@ -2005,15 +2599,14 @@ int kbase_csf_firmware_init(struct kbase_device *kbdev) u32 header; unsigned int size; - memcpy(&header, &firmware->data[entry_offset], sizeof(header)); + memcpy(&header, &mcu_fw->data[entry_offset], sizeof(header)); size = entry_size(header); - ret = load_firmware_entry(kbdev, firmware, entry_offset, - header); + ret = load_firmware_entry(kbdev, mcu_fw, entry_offset, header); if (ret != 0) { dev_err(kbdev->dev, "Failed to load firmware image\n"); - goto error; + goto err_out; } entry_offset += size; } @@ -2021,75 +2614,104 @@ int kbase_csf_firmware_init(struct kbase_device *kbdev) if (!kbdev->csf.shared_interface) { dev_err(kbdev->dev, "Shared interface region not found\n"); ret = -EINVAL; - goto error; + goto err_out; } else { ret = setup_shared_iface_static_region(kbdev); if (ret != 0) { dev_err(kbdev->dev, "Failed to 
insert a region for shared iface entry parsed from fw image\n"); - goto error; + goto err_out; } } ret = kbase_csf_firmware_trace_buffers_init(kbdev); if (ret != 0) { dev_err(kbdev->dev, "Failed to initialize trace buffers\n"); + goto err_out; + } + + ret = kbasep_platform_fw_config_init(kbdev); + if (ret != 0) { + dev_err(kbdev->dev, "Failed to perform platform specific FW configuration"); + goto err_out; + } + +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + ret = kbase_csf_firmware_cfg_enable_host_ctrl_sc_rails(kbdev); + if (ret != 0) { + dev_err(kbdev->dev, "Failed to enable SC PM WA"); goto error; } +#endif + + ret = kbase_csf_firmware_cfg_fw_wa_init(kbdev); + if (ret != 0) { + dev_err(kbdev->dev, "Failed to initialize firmware workarounds"); + goto err_out; + } /* Make sure L2 cache is powered up */ kbase_pm_wait_for_l2_powered(kbdev); /* Load the MMU tables into the selected address space */ - load_mmu_tables(kbdev); + ret = load_mmu_tables(kbdev); + if (ret != 0) + goto err_out; boot_csf_firmware(kbdev); ret = parse_capabilities(kbdev); if (ret != 0) - goto error; + goto err_out; ret = kbase_csf_doorbell_mapping_init(kbdev); if (ret != 0) - goto error; + goto err_out; ret = kbase_csf_scheduler_init(kbdev); if (ret != 0) - goto error; + goto err_out; ret = kbase_csf_setup_dummy_user_reg_page(kbdev); if (ret != 0) - goto error; + goto err_out; ret = kbase_csf_timeout_init(kbdev); if (ret != 0) - goto error; + goto err_out; ret = global_init_on_boot(kbdev); if (ret != 0) - goto error; + goto err_out; + + ret = kbase_csf_firmware_log_init(kbdev); + if (ret != 0) { + dev_err(kbdev->dev, "Failed to initialize FW trace (err %d)", ret); + goto err_out; + } ret = kbase_csf_firmware_cfg_init(kbdev); if (ret != 0) - goto error; + goto err_out; ret = kbase_device_csf_iterator_trace_init(kbdev); if (ret != 0) - goto error; + goto err_out; + + if (kbdev->csf.fw_core_dump.available) + kbase_csf_firmware_core_dump_init(kbdev); - /* Firmware loaded successfully */ - release_firmware(firmware); - KBASE_KTRACE_ADD(kbdev, FIRMWARE_BOOT, NULL, + /* Firmware loaded successfully, ret = 0 */ + KBASE_KTRACE_ADD(kbdev, CSF_FIRMWARE_BOOT, NULL, (((u64)version_hash) << 32) | (((u64)version_major) << 8) | version_minor); return 0; -error: - kbase_csf_firmware_term(kbdev); - release_firmware(firmware); +err_out: + kbase_csf_firmware_unload_term(kbdev); return ret; } -void kbase_csf_firmware_term(struct kbase_device *kbdev) +void kbase_csf_firmware_unload_term(struct kbase_device *kbdev) { unsigned long flags; int ret = 0; @@ -2102,6 +2724,8 @@ void kbase_csf_firmware_term(struct kbase_device *kbdev) kbase_csf_firmware_cfg_term(kbdev); + kbase_csf_firmware_log_term(kbdev); + kbase_csf_timeout_term(kbdev); kbase_csf_free_dummy_user_reg_page(kbdev); @@ -2129,6 +2753,8 @@ void kbase_csf_firmware_term(struct kbase_device *kbdev) unload_mmu_tables(kbdev); + kbase_csf_firmware_cfg_fw_wa_term(kbdev); + kbase_csf_firmware_trace_buffers_term(kbdev); while (!list_empty(&kbdev->csf.firmware_interfaces)) { @@ -2175,19 +2801,137 @@ void kbase_csf_firmware_term(struct kbase_device *kbdev) kfree(metadata); } + if (kbdev->csf.fw.data) { + /* Free the copy of the firmware image */ + vfree(kbdev->csf.fw.data); + kbdev->csf.fw.data = NULL; + dev_dbg(kbdev->dev, "Free retained image csf.fw (%zu-bytes)\n", kbdev->csf.fw.size); + } + /* This will also free up the region allocated for the shared interface * entry parsed from the firmware image. 
*/ kbase_mcu_shared_interface_region_tracker_term(kbdev); - mutex_destroy(&kbdev->csf.reg_lock); - kbase_mmu_term(kbdev, &kbdev->csf.mcu_mmu); /* Release the address space */ kbdev->as_free |= MCU_AS_BITMASK; } +#if IS_ENABLED(CONFIG_MALI_CORESIGHT) +int kbase_csf_firmware_mcu_register_write(struct kbase_device *const kbdev, u32 const reg_addr, + u32 const reg_val) +{ + struct kbase_csf_global_iface *global_iface = &kbdev->csf.global_iface; + unsigned long flags; + int err; + u32 glb_req; + + mutex_lock(&kbdev->csf.reg_lock); + kbase_csf_scheduler_spin_lock(kbdev, &flags); + + /* Set the address and value to write */ + kbase_csf_firmware_global_input(global_iface, GLB_DEBUG_ARG_IN0, reg_addr); + kbase_csf_firmware_global_input(global_iface, GLB_DEBUG_ARG_IN1, reg_val); + + /* Set the Global Debug request for FW MCU write */ + glb_req = kbase_csf_firmware_global_output(global_iface, GLB_DEBUG_ACK); + glb_req ^= GLB_DEBUG_REQ_FW_AS_WRITE_MASK; + kbase_csf_firmware_global_input_mask(global_iface, GLB_DEBUG_REQ, glb_req, + GLB_DEBUG_REQ_FW_AS_WRITE_MASK); + + set_global_request(global_iface, GLB_REQ_DEBUG_CSF_REQ_MASK); + + /* Notify FW about the Global Debug request */ + kbase_csf_ring_doorbell(kbdev, CSF_KERNEL_DOORBELL_NR); + + kbase_csf_scheduler_spin_unlock(kbdev, flags); + + err = wait_for_global_request(kbdev, GLB_REQ_DEBUG_CSF_REQ_MASK); + + mutex_unlock(&kbdev->csf.reg_lock); + + dev_dbg(kbdev->dev, "w: reg %08x val %08x", reg_addr, reg_val); + + return err; +} + +int kbase_csf_firmware_mcu_register_read(struct kbase_device *const kbdev, u32 const reg_addr, + u32 *reg_val) +{ + struct kbase_csf_global_iface *global_iface = &kbdev->csf.global_iface; + unsigned long flags; + int err; + u32 glb_req; + + if (WARN_ON(reg_val == NULL)) + return -EINVAL; + + mutex_lock(&kbdev->csf.reg_lock); + kbase_csf_scheduler_spin_lock(kbdev, &flags); + + /* Set the address to read */ + kbase_csf_firmware_global_input(global_iface, GLB_DEBUG_ARG_IN0, reg_addr); + + /* Set the Global Debug request for FW MCU read */ + glb_req = kbase_csf_firmware_global_output(global_iface, GLB_DEBUG_ACK); + glb_req ^= GLB_DEBUG_REQ_FW_AS_READ_MASK; + kbase_csf_firmware_global_input_mask(global_iface, GLB_DEBUG_REQ, glb_req, + GLB_DEBUG_REQ_FW_AS_READ_MASK); + + set_global_request(global_iface, GLB_REQ_DEBUG_CSF_REQ_MASK); + + /* Notify FW about the Global Debug request */ + kbase_csf_ring_doorbell(kbdev, CSF_KERNEL_DOORBELL_NR); + + kbase_csf_scheduler_spin_unlock(kbdev, flags); + + err = wait_for_global_request(kbdev, GLB_REQ_DEBUG_CSF_REQ_MASK); + + if (!err) { + kbase_csf_scheduler_spin_lock(kbdev, &flags); + *reg_val = kbase_csf_firmware_global_output(global_iface, GLB_DEBUG_ARG_OUT0); + kbase_csf_scheduler_spin_unlock(kbdev, flags); + } + + mutex_unlock(&kbdev->csf.reg_lock); + + dev_dbg(kbdev->dev, "r: reg %08x val %08x", reg_addr, *reg_val); + + return err; +} + +int kbase_csf_firmware_mcu_register_poll(struct kbase_device *const kbdev, u32 const reg_addr, + u32 const val_mask, u32 const reg_val) +{ + unsigned long remaining = kbase_csf_timeout_in_jiffies(kbdev->csf.fw_timeout_ms) + jiffies; + u32 read_val; + + dev_dbg(kbdev->dev, "p: reg %08x val %08x mask %08x", reg_addr, reg_val, val_mask); + + while (time_before(jiffies, remaining)) { + int err = kbase_csf_firmware_mcu_register_read(kbdev, reg_addr, &read_val); + + if (err) { + dev_err(kbdev->dev, + "Error reading MCU register value (read_val = %u, expect = %u)\n", + read_val, reg_val); + return err; + } + + if ((read_val & val_mask) == reg_val) + return 0; + 
} + + dev_err(kbdev->dev, + "Timeout waiting for MCU register value to be set (read_val = %u, expect = %u)\n", + read_val, reg_val); + + return -ETIMEDOUT; +} +#endif /* IS_ENABLED(CONFIG_MALI_CORESIGHT) */ + void kbase_csf_firmware_enable_gpu_idle_timer(struct kbase_device *kbdev) { struct kbase_csf_global_iface *global_iface = &kbdev->csf.global_iface; @@ -2233,10 +2977,11 @@ void kbase_csf_firmware_ping(struct kbase_device *const kbdev) kbase_csf_scheduler_spin_unlock(kbdev, flags); } -int kbase_csf_firmware_ping_wait(struct kbase_device *const kbdev) +int kbase_csf_firmware_ping_wait(struct kbase_device *const kbdev, unsigned int wait_timeout_ms) { kbase_csf_firmware_ping(kbdev); - return wait_for_global_request(kbdev, GLB_REQ_PING_MASK); + + return wait_for_global_request_with_timeout(kbdev, GLB_REQ_PING_MASK, wait_timeout_ms); } int kbase_csf_firmware_set_timeout(struct kbase_device *const kbdev, @@ -2275,16 +3020,52 @@ void kbase_csf_enter_protected_mode(struct kbase_device *kbdev) kbase_csf_ring_doorbell(kbdev, CSF_KERNEL_DOORBELL_NR); } -void kbase_csf_wait_protected_mode_enter(struct kbase_device *kbdev) +int kbase_csf_wait_protected_mode_enter(struct kbase_device *kbdev) { - int err = wait_for_global_request(kbdev, GLB_REQ_PROTM_ENTER_MASK); + int err; + + err = wait_for_global_request(kbdev, GLB_REQ_PROTM_ENTER_MASK); + + if (!err) { +#define WAIT_TIMEOUT 5000 /* 50ms timeout */ +#define DELAY_TIME_IN_US 10 + const int max_iterations = WAIT_TIMEOUT; + int loop; - if (err) { - if (kbase_prepare_to_reset_gpu(kbdev, RESET_FLAGS_NONE)) + /* Wait for the GPU to actually enter protected mode */ + for (loop = 0; loop < max_iterations; loop++) { + unsigned long flags; + bool pmode_exited; + + if (kbase_reg_read(kbdev, GPU_CONTROL_REG(GPU_STATUS)) & + GPU_STATUS_PROTECTED_MODE_ACTIVE) + break; + + /* Check if GPU already exited the protected mode */ + kbase_csf_scheduler_spin_lock(kbdev, &flags); + pmode_exited = + !kbase_csf_scheduler_protected_mode_in_use(kbdev); + kbase_csf_scheduler_spin_unlock(kbdev, flags); + if (pmode_exited) + break; + + udelay(DELAY_TIME_IN_US); + } + + if (loop == max_iterations) { + dev_err(kbdev->dev, "Timeout for actual pmode entry after PROTM_ENTER ack"); + err = -ETIMEDOUT; + } + } + + if (unlikely(err)) { + if (kbase_prepare_to_reset_gpu(kbdev, RESET_FLAGS_HWC_UNRECOVERABLE_ERROR)) kbase_reset_gpu(kbdev); } KBASE_TLSTREAM_AUX_PROTECTED_ENTER_END(kbdev, kbdev); + + return err; } void kbase_csf_firmware_trigger_mcu_halt(struct kbase_device *kbdev) @@ -2348,7 +3129,9 @@ int kbase_csf_trigger_firmware_config_update(struct kbase_device *kbdev) /* Ensure GPU is powered-up until we complete config update.*/ kbase_csf_scheduler_pm_active(kbdev); - kbase_csf_scheduler_wait_mcu_active(kbdev); + err = kbase_csf_scheduler_killable_wait_mcu_active(kbdev); + if (err) + goto exit; /* The 'reg_lock' is also taken and is held till the update is * complete, to ensure the config update gets serialized. 
@@ -2365,6 +3148,7 @@ int kbase_csf_trigger_firmware_config_update(struct kbase_device *kbdev) GLB_REQ_FIRMWARE_CONFIG_UPDATE_MASK); mutex_unlock(&kbdev->csf.reg_lock); +exit: kbase_csf_scheduler_pm_idle(kbdev); return err; } @@ -2488,7 +3272,7 @@ int kbase_csf_firmware_mcu_shared_mapping_init( gpu_map_prot = KBASE_REG_MEMATTR_INDEX(AS_MEMATTR_INDEX_NON_CACHEABLE); cpu_map_prot = pgprot_writecombine(cpu_map_prot); - }; + } phys = kmalloc_array(num_pages, sizeof(*phys), GFP_KERNEL); if (!phys) @@ -2498,9 +3282,8 @@ int kbase_csf_firmware_mcu_shared_mapping_init( if (!page_list) goto page_list_alloc_error; - ret = kbase_mem_pool_alloc_pages( - &kbdev->mem_pools.small[KBASE_MEM_GROUP_CSF_FW], - num_pages, phys, false); + ret = kbase_mem_pool_alloc_pages(&kbdev->mem_pools.small[KBASE_MEM_GROUP_CSF_FW], num_pages, + phys, false, NULL); if (ret <= 0) goto phys_mem_pool_alloc_error; @@ -2511,8 +3294,7 @@ int kbase_csf_firmware_mcu_shared_mapping_init( if (!cpu_addr) goto vmap_error; - va_reg = kbase_alloc_free_region(&kbdev->csf.shared_reg_rbtree, 0, - num_pages, KBASE_REG_ZONE_MCU_SHARED); + va_reg = kbase_alloc_free_region(&kbdev->csf.mcu_shared_zone, 0, num_pages); if (!va_reg) goto va_region_alloc_error; @@ -2526,9 +3308,9 @@ int kbase_csf_firmware_mcu_shared_mapping_init( gpu_map_properties &= (KBASE_REG_GPU_RD | KBASE_REG_GPU_WR); gpu_map_properties |= gpu_map_prot; - ret = kbase_mmu_insert_pages_no_flush(kbdev, &kbdev->csf.mcu_mmu, - va_reg->start_pfn, &phys[0], num_pages, - gpu_map_properties, KBASE_MEM_GROUP_CSF_FW); + ret = kbase_mmu_insert_pages_no_flush(kbdev, &kbdev->csf.mcu_mmu, va_reg->start_pfn, + &phys[0], num_pages, gpu_map_properties, + KBASE_MEM_GROUP_CSF_FW, NULL, NULL); if (ret) goto mmu_insert_pages_error; diff --git a/mali_kbase/csf/mali_kbase_csf_firmware.h b/mali_kbase/csf/mali_kbase_csf_firmware.h index 74bae39..15d7b58 100644 --- a/mali_kbase/csf/mali_kbase_csf_firmware.h +++ b/mali_kbase/csf/mali_kbase_csf_firmware.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2018-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2018-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -56,7 +56,7 @@ #define CSF_NUM_DOORBELL ((u8)24) /* Offset to the first HW doorbell page */ -#define CSF_HW_DOORBELL_PAGE_OFFSET ((u32)0x80000) +#define CSF_HW_DOORBELL_PAGE_OFFSET ((u32)DOORBELLS_BASE) /* Size of HW Doorbell page, used to calculate the offset to subsequent pages */ #define CSF_HW_DOORBELL_PAGE_SIZE ((u32)0x10000) @@ -78,6 +78,13 @@ /* MAX_SUPPORTED_STREAMS_PER_GROUP: Maximum CSs per csg. */ #define MAX_SUPPORTED_STREAMS_PER_GROUP 32 +#define BUILD_INFO_METADATA_SIZE_OFFSET (0x4) +#define BUILD_INFO_GIT_SHA_LEN (40U) +#define BUILD_INFO_GIT_DIRTY_LEN (1U) +#define BUILD_INFO_GIT_SHA_PATTERN "git_sha: " + +extern char fw_git_sha[BUILD_INFO_GIT_SHA_LEN]; + struct kbase_device; @@ -324,24 +331,13 @@ u32 kbase_csf_firmware_global_input_read( u32 kbase_csf_firmware_global_output( const struct kbase_csf_global_iface *iface, u32 offset); -/* Calculate the offset to the Hw doorbell page corresponding to the - * doorbell number. 
+/** + * kbase_csf_ring_doorbell() - Ring the doorbell + * + * @kbdev: An instance of the GPU platform device + * @doorbell_nr: Index of the HW doorbell page */ -static u32 csf_doorbell_offset(int doorbell_nr) -{ - WARN_ON(doorbell_nr >= CSF_NUM_DOORBELL); - - return CSF_HW_DOORBELL_PAGE_OFFSET + - (doorbell_nr * CSF_HW_DOORBELL_PAGE_SIZE); -} - -static inline void kbase_csf_ring_doorbell(struct kbase_device *kbdev, - int doorbell_nr) -{ - WARN_ON(doorbell_nr >= CSF_NUM_DOORBELL); - - kbase_reg_write(kbdev, csf_doorbell_offset(doorbell_nr), (u32)1); -} +void kbase_csf_ring_doorbell(struct kbase_device *kbdev, int doorbell_nr); /** * kbase_csf_read_firmware_memory - Read a value in a GPU address @@ -374,7 +370,45 @@ void kbase_csf_update_firmware_memory(struct kbase_device *kbdev, u32 gpu_addr, u32 value); /** - * kbase_csf_firmware_early_init() - Early initializatin for the firmware. + * kbase_csf_read_firmware_memory_exe - Read a value in a GPU address in the + * region of its final execution location. + * + * @kbdev: Device pointer + * @gpu_addr: GPU address to read + * @value: Output pointer to which the read value will be written + * + * This function read a value in a GPU address that belongs to a private loaded + * firmware memory region based on its final execution location. The function + * assumes that the location is not permanently mapped on the CPU address space, + * therefore it maps it and then unmaps it to access it independently. This function + * needs to be used when accessing firmware memory regions which will be moved to + * their final execution location during firmware boot using an address based on the + * final execution location. + */ +void kbase_csf_read_firmware_memory_exe(struct kbase_device *kbdev, + u32 gpu_addr, u32 *value); + +/** + * kbase_csf_update_firmware_memory_exe - Write a value in a GPU address in the + * region of its final execution location. + * + * @kbdev: Device pointer + * @gpu_addr: GPU address to write + * @value: Value to write + * + * This function writes a value in a GPU address that belongs to a private loaded + * firmware memory region based on its final execution location. The function + * assumes that the location is not permanently mapped on the CPU address space, + * therefore it maps it and then unmaps it to access it independently. This function + * needs to be used when accessing firmware memory regions which will be moved to + * their final execution location during firmware boot using an address based on the + * final execution location. + */ +void kbase_csf_update_firmware_memory_exe(struct kbase_device *kbdev, + u32 gpu_addr, u32 value); + +/** + * kbase_csf_firmware_early_init() - Early initialization for the firmware. * @kbdev: Kbase device * * Initialize resources related to the firmware. Must be called at kbase probe. @@ -384,22 +418,87 @@ void kbase_csf_update_firmware_memory(struct kbase_device *kbdev, int kbase_csf_firmware_early_init(struct kbase_device *kbdev); /** - * kbase_csf_firmware_init() - Load the firmware for the CSF MCU + * kbase_csf_firmware_early_term() - Terminate resources related to the firmware + * after the firmware unload has been done. + * + * @kbdev: Device pointer + * + * This should be called only when kbase probe fails or gets rmmoded. + */ +void kbase_csf_firmware_early_term(struct kbase_device *kbdev); + +/** + * kbase_csf_firmware_late_init() - Late initialization for the firmware. + * @kbdev: Kbase device + * + * Initialize resources related to the firmware. 
But must be called after + * backend late init is done. Must be used at probe time only. + * + * Return: 0 if successful, negative error code on failure + */ +int kbase_csf_firmware_late_init(struct kbase_device *kbdev); + +/** + * kbase_csf_firmware_load_init() - Load the firmware for the CSF MCU * @kbdev: Kbase device * * Request the firmware from user space and load it into memory. * * Return: 0 if successful, negative error code on failure */ -int kbase_csf_firmware_init(struct kbase_device *kbdev); +int kbase_csf_firmware_load_init(struct kbase_device *kbdev); /** - * kbase_csf_firmware_term() - Unload the firmware + * kbase_csf_firmware_unload_term() - Unload the firmware * @kbdev: Kbase device * - * Frees the memory allocated by kbase_csf_firmware_init() + * Frees the memory allocated by kbase_csf_firmware_load_init() + */ +void kbase_csf_firmware_unload_term(struct kbase_device *kbdev); + +#if IS_ENABLED(CONFIG_MALI_CORESIGHT) +/** + * kbase_csf_firmware_mcu_register_write - Write to MCU register + * + * @kbdev: Instance of a gpu platform device that implements a csf interface. + * @reg_addr: Register address to write into + * @reg_val: Value to be written + * + * Write a desired value to a register in MCU address space. + * + * return: 0 on success, or negative on failure. + */ +int kbase_csf_firmware_mcu_register_write(struct kbase_device *const kbdev, u32 const reg_addr, + u32 const reg_val); +/** + * kbase_csf_firmware_mcu_register_read - Read from MCU register + * + * @kbdev: Instance of a gpu platform device that implements a csf interface. + * @reg_addr: Register address to read from + * @reg_val: Value as present in reg_addr register + * + * Read a value from MCU address space. + * + * return: 0 on success, or negative on failure. + */ +int kbase_csf_firmware_mcu_register_read(struct kbase_device *const kbdev, u32 const reg_addr, + u32 *reg_val); + +/** + * kbase_csf_firmware_mcu_register_poll - Poll MCU register + * + * @kbdev: Instance of a gpu platform device that implements a csf interface. + * @reg_addr: Register address to read from + * @val_mask: Value to mask the read value for comparison + * @reg_val: Value to be compared against + * + * Continue to read a value from MCU address space until it matches given mask and value. + * + * return: 0 on success, or negative on failure. */ -void kbase_csf_firmware_term(struct kbase_device *kbdev); +int kbase_csf_firmware_mcu_register_poll(struct kbase_device *const kbdev, u32 const reg_addr, + u32 const val_mask, u32 const reg_val); +#endif /* IS_ENABLED(CONFIG_MALI_CORESIGHT) */ /** * kbase_csf_firmware_ping - Send the ping request to firmware. @@ -414,13 +513,14 @@ void kbase_csf_firmware_ping(struct kbase_device *kbdev); * kbase_csf_firmware_ping_wait - Send the ping request to firmware and waits. * * @kbdev: Instance of a GPU platform device that implements a CSF interface. + * @wait_timeout_ms: Timeout to get the acknowledgment for PING request from FW. * * The function sends the ping request to firmware and waits to confirm it is * alive. * * Return: 0 on success, or negative on failure. */ -int kbase_csf_firmware_ping_wait(struct kbase_device *kbdev); +int kbase_csf_firmware_ping_wait(struct kbase_device *kbdev, unsigned int wait_timeout_ms); /** * kbase_csf_firmware_set_timeout - Set a hardware endpoint progress timeout. @@ -454,11 +554,13 @@ void kbase_csf_enter_protected_mode(struct kbase_device *kbdev); * * @kbdev: Instance of a GPU platform device that implements a CSF interface. 
* - * This function needs to be called after kbase_csf_wait_protected_mode_enter() - * to wait for the protected mode entry to complete. GPU reset is triggered if + * This function needs to be called after kbase_csf_enter_protected_mode() to + * wait for the GPU to actually enter protected mode. GPU reset is triggered if * the wait is unsuccessful. + * + * Return: 0 on success, or negative on failure. */ -void kbase_csf_wait_protected_mode_enter(struct kbase_device *kbdev); +int kbase_csf_wait_protected_mode_enter(struct kbase_device *kbdev); static inline bool kbase_csf_firmware_mcu_halted(struct kbase_device *kbdev) { @@ -523,9 +625,9 @@ bool kbase_csf_firmware_is_mcu_in_sleep(struct kbase_device *kbdev); #endif /** - * kbase_trigger_firmware_reload - Trigger the reboot of MCU firmware, for the - * cold boot case firmware image would be - * reloaded from filesystem into memory. + * kbase_csf_firmware_trigger_reload() - Trigger the reboot of MCU firmware, for + * the cold boot case firmware image would + * be reloaded from filesystem into memory. * * @kbdev: Instance of a GPU platform device that implements a CSF interface. */ @@ -738,18 +840,18 @@ u32 kbase_csf_firmware_get_gpu_idle_hysteresis_time(struct kbase_device *kbdev); u32 kbase_csf_firmware_set_gpu_idle_hysteresis_time(struct kbase_device *kbdev, u32 dur); /** - * kbase_csf_firmware_get_mcu_core_pwroff_time - Get the MCU core power-off + * kbase_csf_firmware_get_mcu_core_pwroff_time - Get the MCU shader Core power-off * time value * * @kbdev: Instance of a GPU platform device that implements a CSF interface. * - * Return: the internally recorded MCU core power-off (nominal) value. The unit + * Return: the internally recorded MCU shader Core power-off (nominal) timeout value. The unit * of the value is in micro-seconds. */ u32 kbase_csf_firmware_get_mcu_core_pwroff_time(struct kbase_device *kbdev); /** - * kbase_csf_firmware_set_mcu_core_pwroff_time - Set the MCU core power-off + * kbase_csf_firmware_set_mcu_core_pwroff_time - Set the MCU shader Core power-off * time value * * @kbdev: Instance of a GPU platform device that implements a CSF interface. @@ -766,7 +868,7 @@ u32 kbase_csf_firmware_get_mcu_core_pwroff_time(struct kbase_device *kbdev); * returned value is the source configuration flag, and it is set to '1' * when CYCLE_COUNTER alternative source is used. * - * The configured MCU core power-off timer will only have effect when the host + * The configured MCU shader Core power-off timer will only have effect when the host * driver has delegated the shader cores' power management to MCU. * * Return: the actual internal core power-off timer value in register defined @@ -775,6 +877,22 @@ u32 kbase_csf_firmware_get_mcu_core_pwroff_time(struct kbase_device *kbdev); u32 kbase_csf_firmware_set_mcu_core_pwroff_time(struct kbase_device *kbdev, u32 dur); /** + * kbase_csf_firmware_reset_mcu_core_pwroff_time - Reset the MCU shader Core power-off + * time value + * + * @kbdev: Instance of a GPU platform device that implements a CSF interface. + * + * Sets the MCU Shader Core power-off time value to the default. + * + * The configured MCU shader Core power-off timer will only have effect when the host + * driver has delegated the shader cores' power management to MCU. + * + * Return: the actual internal core power-off timer value in register defined + * format. 
+ */ +u32 kbase_csf_firmware_reset_mcu_core_pwroff_time(struct kbase_device *kbdev); + +/** * kbase_csf_interface_version - Helper function to build the full firmware * interface version in a format compatible with * GLB_VERSION register @@ -805,4 +923,27 @@ static inline u32 kbase_csf_interface_version(u32 major, u32 minor, u32 patch) * Return: 0 if success, or negative error code on failure. */ int kbase_csf_trigger_firmware_config_update(struct kbase_device *kbdev); + +/** + * kbase_csf_debug_dump_registers - Print CSF debug message. + * + * @kbdev: Instance of a GPU platform device that implements a CSF interface. + * + * Prints CSF debug message cccontaining critical CSF firmware information. + * GPU must be powered during this call. + */ +void kbase_csf_debug_dump_registers(struct kbase_device *kbdev); + +/** + * kbase_csf_firmware_req_core_dump - Request a firmware core dump + * + * @kbdev: Instance of a GPU platform device that implements a CSF interface. + * + * Request a firmware core dump and wait for for firmware to acknowledge. + * Firmware will enter infinite loop after the firmware core dump is created. + * + * Return: 0 if success, or negative error code on failure. + */ +int kbase_csf_firmware_req_core_dump(struct kbase_device *const kbdev); + #endif diff --git a/mali_kbase/csf/mali_kbase_csf_firmware_cfg.c b/mali_kbase/csf/mali_kbase_csf_firmware_cfg.c index b114817..48ddbb5 100644 --- a/mali_kbase/csf/mali_kbase_csf_firmware_cfg.c +++ b/mali_kbase/csf/mali_kbase_csf_firmware_cfg.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2020-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -20,12 +20,23 @@ */ #include <mali_kbase.h> -#include "mali_kbase_csf_firmware_cfg.h" #include <mali_kbase_reset_gpu.h> +#include <linux/version.h> + +#include "mali_kbase_csf_firmware_cfg.h" +#include "mali_kbase_csf_firmware_log.h" #if CONFIG_SYSFS #define CSF_FIRMWARE_CFG_SYSFS_DIR_NAME "firmware_config" +#define CSF_FIRMWARE_CFG_LOG_VERBOSITY_ENTRY_NAME "Log verbosity" + +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS +#define HOST_CONTROLS_SC_RAILS_CFG_ENTRY_NAME "Host controls SC rails" +#endif + +#define CSF_FIRMWARE_CFG_WA_CFG0_ENTRY_NAME "WA_CFG0" + /** * struct firmware_config - Configuration item within the MCU firmware * @@ -107,7 +118,7 @@ static ssize_t show_fw_cfg(struct kobject *kobj, return -EINVAL; } - return snprintf(buf, PAGE_SIZE, "%u\n", val); + return scnprintf(buf, PAGE_SIZE, "%u\n", val); } static ssize_t store_fw_cfg(struct kobject *kobj, @@ -124,7 +135,7 @@ static ssize_t store_fw_cfg(struct kobject *kobj, if (attr == &fw_cfg_attr_cur) { unsigned long flags; - u32 val; + u32 val, cur_val; int ret = kstrtouint(buf, 0, &val); if (ret) { @@ -135,11 +146,22 @@ static ssize_t store_fw_cfg(struct kobject *kobj, return -EINVAL; } +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + if (!strcmp(config->name, + HOST_CONTROLS_SC_RAILS_CFG_ENTRY_NAME)) + return -EPERM; +#endif + if (!strcmp(config->name, + CSF_FIRMWARE_CFG_WA_CFG0_ENTRY_NAME)) + return -EPERM; + if ((val < config->min) || (val > config->max)) return -EINVAL; spin_lock_irqsave(&kbdev->hwaccess_lock, flags); - if (config->cur_val == val) { + + cur_val = config->cur_val; + if (cur_val == val) { spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); return count; } @@ -176,6 
+198,20 @@ static ssize_t store_fw_cfg(struct kobject *kobj, spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + /* Enable FW logging only if Log verbosity is non-zero */ + if (!strcmp(config->name, CSF_FIRMWARE_CFG_LOG_VERBOSITY_ENTRY_NAME) && + (!cur_val || !val)) { + ret = kbase_csf_firmware_log_toggle_logging_calls(kbdev, val); + if (ret) { + /* Undo FW configuration changes */ + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); + config->cur_val = cur_val; + kbase_csf_update_firmware_memory(kbdev, config->address, cur_val); + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + return ret; + } + } + /* If we can update the config without firmware reset then * we need to just trigger FIRMWARE_CONFIG_UPDATE. */ @@ -209,11 +245,18 @@ static struct attribute *fw_cfg_attrs[] = { &fw_cfg_attr_cur, NULL, }; +#if (KERNEL_VERSION(5, 2, 0) <= LINUX_VERSION_CODE) +ATTRIBUTE_GROUPS(fw_cfg); +#endif static struct kobj_type fw_cfg_kobj_type = { .release = &fw_cfg_kobj_release, .sysfs_ops = &fw_cfg_ops, +#if (KERNEL_VERSION(5, 2, 0) <= LINUX_VERSION_CODE) + .default_groups = fw_cfg_groups, +#else .default_attrs = fw_cfg_attrs, +#endif }; int kbase_csf_firmware_cfg_init(struct kbase_device *kbdev) @@ -236,6 +279,19 @@ int kbase_csf_firmware_cfg_init(struct kbase_device *kbdev) kbase_csf_read_firmware_memory(kbdev, config->address, &config->cur_val); + if (!strcmp(config->name, CSF_FIRMWARE_CFG_LOG_VERBOSITY_ENTRY_NAME) && + (config->cur_val)) { + err = kbase_csf_firmware_log_toggle_logging_calls(config->kbdev, + config->cur_val); + + if (err) { + kobject_put(&config->kobj); + dev_err(kbdev->dev, "Failed to enable logging (result: %d)", err); + return err; + } + } + + err = kobject_init_and_add(&config->kobj, &fw_cfg_kobj_type, kbdev->csf.fw_cfg_kobj, "%s", config->name); if (err) { @@ -273,9 +329,8 @@ void kbase_csf_firmware_cfg_term(struct kbase_device *kbdev) } int kbase_csf_firmware_cfg_option_entry_parse(struct kbase_device *kbdev, - const struct firmware *fw, - const u32 *entry, - unsigned int size, bool updatable) + const struct kbase_csf_mcu_fw *const fw, + const u32 *entry, unsigned int size, bool updatable) { const char *name = (char *)&entry[3]; struct firmware_config *config; @@ -307,6 +362,108 @@ int kbase_csf_firmware_cfg_option_entry_parse(struct kbase_device *kbdev, return 0; } + +int kbase_csf_firmware_cfg_find_config_address(struct kbase_device *kbdev, const char *name, u32* addr) +{ + struct firmware_config *config; + + list_for_each_entry(config, &kbdev->csf.firmware_config, node) { + if (strcmp(config->name, name) || !config->address) + continue; + + *addr = config->address; + return 0; + } + + return -ENOENT; +} + +int kbase_csf_firmware_cfg_fw_wa_enable(struct kbase_device *kbdev) +{ + struct firmware_config *config; + + /* "quirks_ext" property is optional */ + if (!kbdev->csf.quirks_ext) + return 0; + + list_for_each_entry(config, &kbdev->csf.firmware_config, node) { + if (strcmp(config->name, CSF_FIRMWARE_CFG_WA_CFG0_ENTRY_NAME)) + continue; + dev_info(kbdev->dev, "External quirks 0: 0x%08x", kbdev->csf.quirks_ext[0]); + kbase_csf_update_firmware_memory(kbdev, config->address, kbdev->csf.quirks_ext[0]); + return 0; + } + + return -ENOENT; +} + +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS +int kbase_csf_firmware_cfg_enable_host_ctrl_sc_rails(struct kbase_device *kbdev) +{ + struct firmware_config *config; + + list_for_each_entry(config, &kbdev->csf.firmware_config, node) { + if (strcmp(config->name, + HOST_CONTROLS_SC_RAILS_CFG_ENTRY_NAME)) + continue; + + 
kbase_csf_update_firmware_memory(kbdev, config->address, 1); + return 0; + } + + return -ENOENT; +} +#endif + +int kbase_csf_firmware_cfg_fw_wa_init(struct kbase_device *kbdev) +{ + int ret; + int entry_count; + size_t entry_bytes; + + /* "quirks-ext" property is optional and may have no value. + * Also try fallback "quirks_ext" property if it doesn't exist. + */ + entry_count = of_property_count_u32_elems(kbdev->dev->of_node, "quirks-ext"); + + if (entry_count == -EINVAL) + entry_count = of_property_count_u32_elems(kbdev->dev->of_node, "quirks_ext"); + + if (entry_count == -EINVAL || entry_count == -ENODATA) + return 0; + + entry_bytes = entry_count * sizeof(u32); + kbdev->csf.quirks_ext = kzalloc(entry_bytes, GFP_KERNEL); + if (!kbdev->csf.quirks_ext) + return -ENOMEM; + + ret = of_property_read_u32_array(kbdev->dev->of_node, "quirks-ext", kbdev->csf.quirks_ext, + entry_count); + + if (ret == -EINVAL) + ret = of_property_read_u32_array(kbdev->dev->of_node, "quirks_ext", + kbdev->csf.quirks_ext, entry_count); + + if (ret == -EINVAL || ret == -ENODATA) { + /* This is unexpected since the property is already accessed for counting the number + * of its elements. + */ + dev_err(kbdev->dev, "\"quirks_ext\" DTB property data read failed"); + return ret; + } + if (ret == -EOVERFLOW) { + dev_err(kbdev->dev, "\"quirks_ext\" DTB property data size exceeds 32 bits"); + return ret; + } + + return kbase_csf_firmware_cfg_fw_wa_enable(kbdev); +} + +void kbase_csf_firmware_cfg_fw_wa_term(struct kbase_device *kbdev) +{ + kfree(kbdev->csf.quirks_ext); +} + #else int kbase_csf_firmware_cfg_init(struct kbase_device *kbdev) { @@ -319,9 +476,27 @@ void kbase_csf_firmware_cfg_term(struct kbase_device *kbdev) } int kbase_csf_firmware_cfg_option_entry_parse(struct kbase_device *kbdev, - const struct firmware *fw, - const u32 *entry, unsigned int size) + const struct kbase_csf_mcu_fw *const fw, + const u32 *entry, unsigned int size) +{ + return 0; +} + +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS +int kbase_csf_firmware_cfg_enable_host_ctrl_sc_rails(struct kbase_device *kbdev) { return 0; } +#endif + +int kbase_csf_firmware_cfg_fw_wa_enable(struct kbase_device *kbdev) +{ + return 0; +} + +int kbase_csf_firmware_cfg_fw_wa_init(struct kbase_device *kbdev) +{ + return 0; +} + #endif /* CONFIG_SYSFS */ diff --git a/mali_kbase/csf/mali_kbase_csf_firmware_cfg.h b/mali_kbase/csf/mali_kbase_csf_firmware_cfg.h index c2d2fc5..f565290 100644 --- a/mali_kbase/csf/mali_kbase_csf_firmware_cfg.h +++ b/mali_kbase/csf/mali_kbase_csf_firmware_cfg.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2020-2023 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -67,8 +67,67 @@ void kbase_csf_firmware_cfg_term(struct kbase_device *kbdev); * Return: 0 if successful, negative error code on failure */ int kbase_csf_firmware_cfg_option_entry_parse(struct kbase_device *kbdev, - const struct firmware *fw, - const u32 *entry, - unsigned int size, - bool updatable); + const struct kbase_csf_mcu_fw *const fw, + const u32 *entry, unsigned int size, bool updatable); + +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS +/** + * kbase_csf_firmware_cfg_enable_host_ctrl_sc_rails() - Enable the config in FW to support + * Host based control of SC power rails + * + * Look for the config entry that enables support in FW for the Host based + * control of shader core power rails and set it before the intial boot + * or reload of firmware. + * + * @kbdev: Kbase device structure + * + * Return: 0 if successful, negative error code on failure + */ +int kbase_csf_firmware_cfg_enable_host_ctrl_sc_rails(struct kbase_device *kbdev); +#endif + +/** + * kbase_csf_firmware_cfg_find_config_address() - Get a FW config option address + * + * @kbdev: Kbase device structure + * @name: Name of cfg option to find + * @addr: Pointer to store the address + * + * Return: 0 if successful, negative error code on failure + */ +int kbase_csf_firmware_cfg_find_config_address(struct kbase_device *kbdev, const char *name, + u32 *addr); +/** + * kbase_csf_firmware_cfg_fw_wa_enable() - Enable firmware workarounds configuration. + * + * @kbdev: Kbase device structure + * + * Look for the config entry that enables support in FW for workarounds and set it according to + * the firmware workaround configuration before the initial boot or reload of firmware. + * + * Return: 0 if successful, negative error code on failure + */ +int kbase_csf_firmware_cfg_fw_wa_enable(struct kbase_device *kbdev); + +/** + * kbase_csf_firmware_cfg_fw_wa_init() - Initialize firmware workarounds configuration. + * + * @kbdev: Kbase device structure + * + * Retrieve and save the firmware workarounds configuration from device-tree "quirks_ext" property. + * Then, look for the config entry that enables support in FW for workarounds and set it according + * to the configuration before the initial firmware boot. + * + * Return: 0 if successful, negative error code on failure + */ +int kbase_csf_firmware_cfg_fw_wa_init(struct kbase_device *kbdev); + +/** + * kbase_csf_firmware_cfg_fw_wa_term - Delete local cache for firmware workarounds configuration. + * + * @kbdev: Pointer to the Kbase device + * + */ +void kbase_csf_firmware_cfg_fw_wa_term(struct kbase_device *kbdev); + #endif /* _KBASE_CSF_FIRMWARE_CFG_H_ */ diff --git a/mali_kbase/csf/mali_kbase_csf_firmware_core_dump.c b/mali_kbase/csf/mali_kbase_csf_firmware_core_dump.c new file mode 100644 index 0000000..e371db2 --- /dev/null +++ b/mali_kbase/csf/mali_kbase_csf_firmware_core_dump.c @@ -0,0 +1,833 @@ +// SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note +/* + * + * (C) COPYRIGHT 2021-2023 ARM Limited. All rights reserved. + * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. 
+ * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. + * + */ + +#include <linux/kernel.h> +#include <linux/device.h> +#include <linux/list.h> +#include <linux/file.h> +#include <linux/elf.h> +#include <linux/elfcore.h> +#include <linux/version_compat_defs.h> + +#include "mali_kbase.h" +#include "mali_kbase_csf_firmware_core_dump.h" +#include "backend/gpu/mali_kbase_pm_internal.h" + +/* + * FW image header core dump data format supported. + * Currently only version 0.1 is supported. + */ +#define FW_CORE_DUMP_DATA_VERSION_MAJOR 0 +#define FW_CORE_DUMP_DATA_VERSION_MINOR 1 + +/* Full version of the image header core dump data format */ +#define FW_CORE_DUMP_DATA_VERSION \ + ((FW_CORE_DUMP_DATA_VERSION_MAJOR << 8) | FW_CORE_DUMP_DATA_VERSION_MINOR) + +/* Validity flag to indicate if the MCU registers in the buffer are valid */ +#define FW_MCU_STATUS_MASK 0x1 +#define FW_MCU_STATUS_VALID (1 << 0) + +/* Core dump entry fields */ +#define FW_CORE_DUMP_VERSION_INDEX 0 +#define FW_CORE_DUMP_START_ADDR_INDEX 1 + +/* MCU registers stored by a firmware core dump */ +struct fw_core_dump_mcu { + u32 r0; + u32 r1; + u32 r2; + u32 r3; + u32 r4; + u32 r5; + u32 r6; + u32 r7; + u32 r8; + u32 r9; + u32 r10; + u32 r11; + u32 r12; + u32 sp; + u32 lr; + u32 pc; +}; + +/* Any ELF definitions used in this file are from elf.h/elfcore.h except + * when specific 32-bit versions are required (mainly for the + * ELF_PRSTATUS32 note that is used to contain the MCU registers). + */ + +/* - 32-bit version of timeval structures used in ELF32 PRSTATUS note. */ +struct prstatus32_timeval { + int tv_sec; + int tv_usec; +}; + +/* - Structure defining ELF32 PRSTATUS note contents, as defined by the + * GNU binutils BFD library used by GDB, in bfd/hosts/x86-64linux.h. + * Note: GDB checks for the size of this structure to be 0x94. + * Modified pr_reg (array containing the Arm 32-bit MCU registers) to + * use u32[18] instead of elf_gregset32_t to prevent introducing new typedefs. + */ +struct elf_prstatus32 { + struct elf_siginfo pr_info; /* Info associated with signal. */ + short int pr_cursig; /* Current signal. */ + unsigned int pr_sigpend; /* Set of pending signals. */ + unsigned int pr_sighold; /* Set of held signals. */ + pid_t pr_pid; + pid_t pr_ppid; + pid_t pr_pgrp; + pid_t pr_sid; + struct prstatus32_timeval pr_utime; /* User time. */ + struct prstatus32_timeval pr_stime; /* System time. */ + struct prstatus32_timeval pr_cutime; /* Cumulative user time. */ + struct prstatus32_timeval pr_cstime; /* Cumulative system time. */ + u32 pr_reg[18]; /* GP registers. */ + int pr_fpvalid; /* True if math copro being used. */ +}; + +/* + * struct fw_core_dump_seq_off - Iterator for seq_file operations used on 'fw_core_dump' + * debugfs file. + * @interface: current firmware memory interface + * @page_num: current page number (0..) within @interface + */ +struct fw_core_dump_seq_off { + struct kbase_csf_firmware_interface *interface; + u32 page_num; +}; + +/** + * fw_get_core_dump_mcu - Get the MCU registers saved by a firmware core dump + * + * @kbdev: Instance of a GPU platform device that implements a CSF interface. 
+ * @regs: Pointer to a core dump mcu struct where the MCU registers are copied + * to. Should be allocated by the called. + * + * Return: 0 if successfully copied the MCU registers, negative error code otherwise. + */ +static int fw_get_core_dump_mcu(struct kbase_device *kbdev, struct fw_core_dump_mcu *regs) +{ + unsigned int i; + u32 status = 0; + u32 data_addr = kbdev->csf.fw_core_dump.mcu_regs_addr; + u32 *data = (u32 *)regs; + + /* Check if the core dump entry exposed the buffer */ + if (!regs || !kbdev->csf.fw_core_dump.available) + return -EPERM; + + /* Check if the data in the buffer is valid, if not, return error */ + kbase_csf_read_firmware_memory(kbdev, data_addr, &status); + if ((status & FW_MCU_STATUS_MASK) != FW_MCU_STATUS_VALID) + return -EPERM; + + /* According to image header documentation, the MCU registers core dump + * buffer is 32-bit aligned. + */ + for (i = 1; i <= sizeof(struct fw_core_dump_mcu) / sizeof(u32); ++i) + kbase_csf_read_firmware_memory(kbdev, data_addr + i * sizeof(u32), &data[i - 1]); + + return 0; +} + +/** + * fw_core_dump_fill_elf_header - Initializes an ELF32 header + * @hdr: ELF32 header to initialize + * @sections: Number of entries in the ELF program header table + * + * Initializes an ELF32 header for an ARM 32-bit little-endian + * 'Core file' object file. + */ +static void fw_core_dump_fill_elf_header(struct elf32_hdr *hdr, unsigned int sections) +{ + /* Reset all members in header. */ + memset(hdr, 0, sizeof(*hdr)); + + /* Magic number identifying file as an ELF object. */ + memcpy(hdr->e_ident, ELFMAG, SELFMAG); + + /* Identify file as 32-bit, little-endian, using current + * ELF header version, with no OS or ABI specific ELF + * extensions used. + */ + hdr->e_ident[EI_CLASS] = ELFCLASS32; + hdr->e_ident[EI_DATA] = ELFDATA2LSB; + hdr->e_ident[EI_VERSION] = EV_CURRENT; + hdr->e_ident[EI_OSABI] = ELFOSABI_NONE; + + /* 'Core file' type of object file. */ + hdr->e_type = ET_CORE; + + /* ARM 32-bit architecture (AARCH32) */ + hdr->e_machine = EM_ARM; + + /* Object file version: the original format. */ + hdr->e_version = EV_CURRENT; + + /* Offset of program header table in file. */ + hdr->e_phoff = sizeof(struct elf32_hdr); + + /* No processor specific flags. */ + hdr->e_flags = 0; + + /* Size of the ELF header in bytes. */ + hdr->e_ehsize = sizeof(struct elf32_hdr); + + /* Size of the ELF program header entry in bytes. */ + hdr->e_phentsize = sizeof(struct elf32_phdr); + + /* Number of entries in the program header table. */ + hdr->e_phnum = sections; +} + +/** + * fw_core_dump_fill_elf_program_header_note - Initializes an ELF32 program header + * for holding auxiliary information + * @phdr: ELF32 program header + * @file_offset: Location of the note in the file in bytes + * @size: Size of the note in bytes. + * + * Initializes an ELF32 program header describing auxiliary information (containing + * one or more notes) of @size bytes alltogether located in the file at offset + * @file_offset. + */ +static void fw_core_dump_fill_elf_program_header_note(struct elf32_phdr *phdr, u32 file_offset, + u32 size) +{ + /* Auxiliary information (note) in program header. */ + phdr->p_type = PT_NOTE; + + /* Location of first note in file in bytes. */ + phdr->p_offset = file_offset; + + /* Size of all notes combined in bytes. */ + phdr->p_filesz = size; + + /* Other members not relevant for a note. 
*/ + phdr->p_vaddr = 0; + phdr->p_paddr = 0; + phdr->p_memsz = 0; + phdr->p_align = 0; + phdr->p_flags = 0; +} + +/** + * fw_core_dump_fill_elf_program_header - Initializes an ELF32 program header for a loadable segment + * @phdr: ELF32 program header to initialize. + * @file_offset: Location of loadable segment in file in bytes + * (aligned to FW_PAGE_SIZE bytes) + * @vaddr: 32-bit virtual address where to write the segment + * (aligned to FW_PAGE_SIZE bytes) + * @size: Size of the segment in bytes. + * @flags: CSF_FIRMWARE_ENTRY_* flags describing access permissions. + * + * Initializes an ELF32 program header describing a loadable segment of + * @size bytes located in the file at offset @file_offset to be loaded + * at virtual address @vaddr with access permissions as described by + * CSF_FIRMWARE_ENTRY_* flags in @flags. + */ +static void fw_core_dump_fill_elf_program_header(struct elf32_phdr *phdr, u32 file_offset, + u32 vaddr, u32 size, u32 flags) +{ + /* Loadable segment in program header. */ + phdr->p_type = PT_LOAD; + + /* Location of segment in file in bytes. Aligned to p_align bytes. */ + phdr->p_offset = file_offset; + + /* Virtual address of segment. Aligned to p_align bytes. */ + phdr->p_vaddr = vaddr; + + /* Physical address of segment. Not relevant. */ + phdr->p_paddr = 0; + + /* Size of segment in file and memory. */ + phdr->p_filesz = size; + phdr->p_memsz = size; + + /* Alignment of segment in the file and memory in bytes (integral power of 2). */ + phdr->p_align = FW_PAGE_SIZE; + + /* Set segment access permissions. */ + phdr->p_flags = 0; + if (flags & CSF_FIRMWARE_ENTRY_READ) + phdr->p_flags |= PF_R; + if (flags & CSF_FIRMWARE_ENTRY_WRITE) + phdr->p_flags |= PF_W; + if (flags & CSF_FIRMWARE_ENTRY_EXECUTE) + phdr->p_flags |= PF_X; +} + +/** + * fw_core_dump_get_prstatus_note_size - Calculates size of a ELF32 PRSTATUS note + * @name: Name given to the PRSTATUS note. + * + * Calculates the size of a 32-bit PRSTATUS note (which contains information + * about a process like the current MCU registers) taking into account + * @name must be padded to a 4-byte multiple. + * + * Return: size of 32-bit PRSTATUS note in bytes. + */ +static unsigned int fw_core_dump_get_prstatus_note_size(char *name) +{ + return sizeof(struct elf32_note) + roundup(strlen(name) + 1, 4) + + sizeof(struct elf_prstatus32); +} + +/** + * fw_core_dump_fill_elf_prstatus - Initializes an ELF32 PRSTATUS structure + * @prs: ELF32 PRSTATUS note to initialize + * @regs: MCU registers to copy into the PRSTATUS note + * + * Initializes an ELF32 PRSTATUS structure with MCU registers @regs. + * Other process information is N/A for CSF Firmware. + */ +static void fw_core_dump_fill_elf_prstatus(struct elf_prstatus32 *prs, + struct fw_core_dump_mcu *regs) +{ + /* Only fill in registers (32-bit) of PRSTATUS note. 
*/ + memset(prs, 0, sizeof(*prs)); + prs->pr_reg[0] = regs->r0; + prs->pr_reg[1] = regs->r1; + prs->pr_reg[2] = regs->r2; + prs->pr_reg[3] = regs->r3; + prs->pr_reg[4] = regs->r4; + prs->pr_reg[5] = regs->r5; + prs->pr_reg[6] = regs->r0; + prs->pr_reg[7] = regs->r7; + prs->pr_reg[8] = regs->r8; + prs->pr_reg[9] = regs->r9; + prs->pr_reg[10] = regs->r10; + prs->pr_reg[11] = regs->r11; + prs->pr_reg[12] = regs->r12; + prs->pr_reg[13] = regs->sp; + prs->pr_reg[14] = regs->lr; + prs->pr_reg[15] = regs->pc; +} + +/** + * fw_core_dump_create_prstatus_note - Creates an ELF32 PRSTATUS note + * @name: Name for the PRSTATUS note + * @prs: ELF32 PRSTATUS structure to put in the PRSTATUS note + * @created_prstatus_note: + * Pointer to the allocated ELF32 PRSTATUS note + * + * Creates an ELF32 note with one PRSTATUS entry containing the + * ELF32 PRSTATUS structure @prs. Caller needs to free the created note in + * @created_prstatus_note. + * + * Return: 0 on failure, otherwise size of ELF32 PRSTATUS note in bytes. + */ +static unsigned int fw_core_dump_create_prstatus_note(char *name, struct elf_prstatus32 *prs, + struct elf32_note **created_prstatus_note) +{ + struct elf32_note *note; + unsigned int note_name_sz; + unsigned int note_sz; + + /* Allocate memory for ELF32 note containing a PRSTATUS note. */ + note_name_sz = strlen(name) + 1; + note_sz = sizeof(struct elf32_note) + roundup(note_name_sz, 4) + + sizeof(struct elf_prstatus32); + note = kmalloc(note_sz, GFP_KERNEL); + if (!note) + return 0; + + /* Fill in ELF32 note with one entry for a PRSTATUS note. */ + note->n_namesz = note_name_sz; + note->n_descsz = sizeof(struct elf_prstatus32); + note->n_type = NT_PRSTATUS; + memcpy(note + 1, name, note_name_sz); + memcpy((char *)(note + 1) + roundup(note_name_sz, 4), prs, sizeof(*prs)); + + /* Return pointer and size of the created ELF32 note. */ + *created_prstatus_note = note; + return note_sz; +} + +/** + * fw_core_dump_write_elf_header - Writes ELF header for the FW core dump + * @m: the seq_file handle + * + * Writes the ELF header of the core dump including program headers for + * memory sections and a note containing the current MCU register + * values. + * + * Excludes memory sections without read access permissions or + * are for protected memory. + * + * The data written is as follows: + * - ELF header + * - ELF PHDRs for memory sections + * - ELF PHDR for program header NOTE + * - ELF PRSTATUS note + * - 0-bytes padding to multiple of ELF_EXEC_PAGESIZE + * + * The actual memory section dumps should follow this (not written + * by this function). + * + * Retrieves the necessary information via the struct + * fw_core_dump_data stored in the private member of the seq_file + * handle. + * + * Return: + * * 0 - success + * * -ENOMEM - not enough memory for allocating ELF32 note + */ +int fw_core_dump_write_elf_header(struct seq_file *m) +{ + struct elf32_hdr hdr; + struct elf32_phdr phdr; + struct fw_core_dump_data *dump_data = m->private; + struct kbase_device *const kbdev = dump_data->kbdev; + struct kbase_csf_firmware_interface *interface; + struct elf_prstatus32 elf_prs; + struct elf32_note *elf_prstatus_note; + unsigned int sections = 0; + unsigned int elf_prstatus_note_size; + u32 elf_prstatus_offset; + u32 elf_phdr_note_offset; + u32 elf_memory_sections_data_offset; + u32 total_pages = 0; + u32 padding_size, *padding; + struct fw_core_dump_mcu regs = { 0 }; + + CSTD_UNUSED(total_pages); + + /* Count number of memory sections. 
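For reference only (not part of the patch): a standalone user-space sketch of the ELF32 core layout that fw_core_dump_write_elf_header() above emits — an ELF header, a PT_NOTE program header and a PRSTATUS note whose name is padded to a 4-byte multiple, matching fw_core_dump_get_prstatus_note_size(). The 0x94-byte descriptor is a stand-in for the kernel-internal struct elf_prstatus32; a real dump additionally carries one PT_LOAD header per readable firmware section.

#include <elf.h>
#include <stdio.h>
#include <string.h>

#define ROUND4(x) (((x) + 3u) & ~3u)

int main(void)
{
	static const char name[] = "CORE";
	unsigned char desc[0x94] = { 0 };	/* stand-in for struct elf_prstatus32 (GDB expects 0x94 bytes) */
	char padded_name[ROUND4(sizeof(name))] = { 0 };
	Elf32_Ehdr ehdr;
	Elf32_Phdr phdr;
	Elf32_Nhdr nhdr;
	FILE *f = fopen("example-core.elf", "wb");

	if (!f)
		return 1;
	memcpy(padded_name, name, sizeof(name));

	/* ELF header: 32-bit little-endian Arm 'core file' with one program header. */
	memset(&ehdr, 0, sizeof(ehdr));
	memcpy(ehdr.e_ident, ELFMAG, SELFMAG);
	ehdr.e_ident[EI_CLASS] = ELFCLASS32;
	ehdr.e_ident[EI_DATA] = ELFDATA2LSB;
	ehdr.e_ident[EI_VERSION] = EV_CURRENT;
	ehdr.e_ident[EI_OSABI] = ELFOSABI_NONE;
	ehdr.e_type = ET_CORE;
	ehdr.e_machine = EM_ARM;
	ehdr.e_version = EV_CURRENT;
	ehdr.e_phoff = sizeof(ehdr);
	ehdr.e_ehsize = sizeof(ehdr);
	ehdr.e_phentsize = sizeof(phdr);
	ehdr.e_phnum = 1;

	/* PT_NOTE program header: the note data directly follows the header table. */
	memset(&phdr, 0, sizeof(phdr));
	phdr.p_type = PT_NOTE;
	phdr.p_offset = sizeof(ehdr) + sizeof(phdr);
	phdr.p_filesz = sizeof(nhdr) + sizeof(padded_name) + sizeof(desc);

	/* PRSTATUS note: note header, name padded to 4 bytes, then the descriptor. */
	nhdr.n_namesz = sizeof(name);
	nhdr.n_descsz = sizeof(desc);
	nhdr.n_type = NT_PRSTATUS;

	fwrite(&ehdr, sizeof(ehdr), 1, f);
	fwrite(&phdr, sizeof(phdr), 1, f);
	fwrite(&nhdr, sizeof(nhdr), 1, f);
	fwrite(padded_name, sizeof(padded_name), 1, f);
	fwrite(desc, sizeof(desc), 1, f);
	fclose(f);

	printf("PRSTATUS note size: %zu bytes\n",
	       sizeof(nhdr) + sizeof(padded_name) + sizeof(desc));
	return 0;
}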
*/ + list_for_each_entry(interface, &kbdev->csf.firmware_interfaces, node) { + /* Skip memory sections that cannot be read or are protected. */ + if ((interface->flags & CSF_FIRMWARE_ENTRY_PROTECTED) || + (interface->flags & CSF_FIRMWARE_ENTRY_READ) == 0) + continue; + sections++; + } + + /* Prepare ELF header. */ + fw_core_dump_fill_elf_header(&hdr, sections + 1); + seq_write(m, &hdr, sizeof(struct elf32_hdr)); + + elf_prstatus_note_size = fw_core_dump_get_prstatus_note_size("CORE"); + /* PHDRs of PT_LOAD type. */ + elf_phdr_note_offset = sizeof(struct elf32_hdr) + sections * sizeof(struct elf32_phdr); + /* PHDR of PT_NOTE type. */ + elf_prstatus_offset = elf_phdr_note_offset + sizeof(struct elf32_phdr); + elf_memory_sections_data_offset = elf_prstatus_offset + elf_prstatus_note_size; + + /* Calculate padding size to page offset. */ + padding_size = roundup(elf_memory_sections_data_offset, ELF_EXEC_PAGESIZE) - + elf_memory_sections_data_offset; + elf_memory_sections_data_offset += padding_size; + + /* Prepare ELF program header table. */ + list_for_each_entry(interface, &kbdev->csf.firmware_interfaces, node) { + /* Skip memory sections that cannot be read or are protected. */ + if ((interface->flags & CSF_FIRMWARE_ENTRY_PROTECTED) || + (interface->flags & CSF_FIRMWARE_ENTRY_READ) == 0) + continue; + + fw_core_dump_fill_elf_program_header(&phdr, elf_memory_sections_data_offset, + interface->virtual, + interface->num_pages * FW_PAGE_SIZE, + interface->flags); + + seq_write(m, &phdr, sizeof(struct elf32_phdr)); + + elf_memory_sections_data_offset += interface->num_pages * FW_PAGE_SIZE; + total_pages += interface->num_pages; + } + + /* Prepare PHDR of PT_NOTE type. */ + fw_core_dump_fill_elf_program_header_note(&phdr, elf_prstatus_offset, + elf_prstatus_note_size); + seq_write(m, &phdr, sizeof(struct elf32_phdr)); + + /* Prepare ELF note of PRSTATUS type. */ + if (fw_get_core_dump_mcu(kbdev, ®s)) + dev_dbg(kbdev->dev, "MCU Registers not available, all registers set to zero"); + /* Even if MCU Registers are not available the ELF prstatus is still + * filled with the registers equal to zero. + */ + fw_core_dump_fill_elf_prstatus(&elf_prs, ®s); + elf_prstatus_note_size = + fw_core_dump_create_prstatus_note("CORE", &elf_prs, &elf_prstatus_note); + if (elf_prstatus_note_size == 0) + return -ENOMEM; + + seq_write(m, elf_prstatus_note, elf_prstatus_note_size); + kfree(elf_prstatus_note); + + /* Pad file to page size. */ + padding = kzalloc(padding_size, GFP_KERNEL); + seq_write(m, padding, padding_size); + kfree(padding); + + return 0; +} + +#define MAX_FW_CORE_DUMP_HEADER_SIZE (1 << 14) + +/** + * get_fw_core_dump_size - Get firmware core dump size + * @kbdev: Instance of a GPU platform device that implements a CSF interface. + * + * Return: size on success, -1 otherwise. + */ +size_t get_fw_core_dump_size(struct kbase_device *kbdev) +{ + static char buffer[MAX_FW_CORE_DUMP_HEADER_SIZE]; + size_t size; + struct fw_core_dump_data private = {.kbdev = kbdev}; + struct seq_file m = {.private = &private, .buf = buffer, .size = MAX_FW_CORE_DUMP_HEADER_SIZE}; + struct kbase_csf_firmware_interface *interface; + + fw_core_dump_write_elf_header(&m); + if (unlikely(m.count >= m.size)) { + dev_warn(kbdev->dev, "firmware core dump header may be larger than buffer size"); + return -1; + } + size = m.count; + + list_for_each_entry(interface, &kbdev->csf.firmware_interfaces, node) { + /* Skip memory sections that cannot be read or are protected. 
*/ + if ((interface->flags & CSF_FIRMWARE_ENTRY_PROTECTED) || + (interface->flags & CSF_FIRMWARE_ENTRY_READ) == 0) + continue; + + size += interface->num_pages * FW_PAGE_SIZE; + } + + return size; +} + +/** + * fw_core_dump_create - Requests firmware to save state for a firmware core dump + * @kbdev: Instance of a GPU platform device that implements a CSF interface. + * + * Return: 0 on success, error code otherwise. + */ +int fw_core_dump_create(struct kbase_device *kbdev) +{ + int err; + + /* Ensure MCU is active before requesting the core dump. */ + kbase_csf_scheduler_pm_active(kbdev); + err = kbase_csf_scheduler_killable_wait_mcu_active(kbdev); + if (!err) + err = kbase_csf_firmware_req_core_dump(kbdev); + + kbase_csf_scheduler_pm_idle(kbdev); + + return err; +} + +/** + * fw_core_dump_seq_start - seq_file start operation for firmware core dump file + * @m: the seq_file handle + * @_pos: holds the current position in pages + * (0 or most recent position used in previous session) + * + * Starts a seq_file session, positioning the iterator for the session to page @_pos - 1 + * within the firmware interface memory sections. @_pos value 0 is used to indicate the + * position of the ELF header at the start of the file. + * + * Retrieves the necessary information via the struct fw_core_dump_data stored in + * the private member of the seq_file handle. + * + * Return: + * * iterator pointer - pointer to iterator struct fw_core_dump_seq_off + * * SEQ_START_TOKEN - special iterator pointer indicating its is the start of the file + * * NULL - iterator could not be allocated + */ +static void *fw_core_dump_seq_start(struct seq_file *m, loff_t *_pos) +{ + struct fw_core_dump_data *dump_data = m->private; + struct fw_core_dump_seq_off *data; + struct kbase_csf_firmware_interface *interface; + loff_t pos = *_pos; + + if (pos == 0) + return SEQ_START_TOKEN; + + /* Move iterator in the right position based on page number within + * available pages of firmware interface memory sections. + */ + pos--; /* ignore start token */ + list_for_each_entry(interface, &dump_data->kbdev->csf.firmware_interfaces, node) { + /* Skip memory sections that cannot be read or are protected. */ + if ((interface->flags & CSF_FIRMWARE_ENTRY_PROTECTED) || + (interface->flags & CSF_FIRMWARE_ENTRY_READ) == 0) + continue; + + if (pos >= interface->num_pages) { + pos -= interface->num_pages; + } else { + data = kmalloc(sizeof(*data), GFP_KERNEL); + if (!data) + return NULL; + data->interface = interface; + data->page_num = pos; + return data; + } + } + + return NULL; +} + +/** + * fw_core_dump_seq_stop - seq_file stop operation for firmware core dump file + * @m: the seq_file handle + * @v: the current iterator (pointer to struct fw_core_dump_seq_off) + * + * Closes the current session and frees any memory related. + */ +static void fw_core_dump_seq_stop(struct seq_file *m, void *v) +{ + kfree(v); +} + +/** + * fw_core_dump_seq_next - seq_file next operation for firmware core dump file + * @m: the seq_file handle + * @v: the current iterator (pointer to struct fw_core_dump_seq_off) + * @pos: holds the current position in pages + * (0 or most recent position used in previous session) + * + * Moves the iterator @v forward to the next page within the firmware interface + * memory sections and returns the updated position in @pos. + * @v value SEQ_START_TOKEN indicates the ELF header position. 
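A small standalone sketch (not part of the patch) of the size arithmetic in get_fw_core_dump_size() above: the ELF header material is padded up to ELF_EXEC_PAGESIZE and each readable, unprotected firmware section then contributes num_pages * FW_PAGE_SIZE. The figures below are invented and a 4 KiB ELF_EXEC_PAGESIZE is assumed.

#include <stdio.h>

int main(void)
{
	const unsigned long elf_page = 4096;	/* assumed ELF_EXEC_PAGESIZE (4 KiB pages) */
	const unsigned long fw_page = 4096;	/* FW_PAGE_SIZE used by the MCU */
	unsigned long header_bytes = 1234;	/* ELF header + PHDRs + PRSTATUS note (invented) */
	unsigned long readable_pages = 28;	/* pages across readable, unprotected sections (invented) */
	unsigned long header_padded = ((header_bytes + elf_page - 1) / elf_page) * elf_page;

	/* Total = padded header area followed by every dumped section, page by page. */
	printf("core dump size = %lu bytes\n", header_padded + readable_pages * fw_page);
	return 0;
}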
+ * + * Return: + * * iterator pointer - pointer to iterator struct fw_core_dump_seq_off + * * NULL - iterator could not be allocated + */ +static void *fw_core_dump_seq_next(struct seq_file *m, void *v, loff_t *pos) +{ + struct fw_core_dump_data *dump_data = m->private; + struct fw_core_dump_seq_off *data = v; + struct kbase_csf_firmware_interface *interface; + struct list_head *interfaces = &dump_data->kbdev->csf.firmware_interfaces; + + /* Is current position at the ELF header ? */ + if (v == SEQ_START_TOKEN) { + if (list_empty(interfaces)) + return NULL; + + /* Prepare iterator for starting at first page in firmware interface + * memory sections. + */ + data = kmalloc(sizeof(*data), GFP_KERNEL); + if (!data) + return NULL; + data->interface = + list_first_entry(interfaces, struct kbase_csf_firmware_interface, node); + data->page_num = 0; + ++*pos; + return data; + } + + /* First attempt to satisfy from current firmware interface memory section. */ + interface = data->interface; + if (data->page_num + 1 < interface->num_pages) { + data->page_num++; + ++*pos; + return data; + } + + /* Need next firmware interface memory section. This could be the last one. */ + if (list_is_last(&interface->node, interfaces)) { + kfree(data); + return NULL; + } + + /* Move to first page in next firmware interface memory section. */ + data->interface = list_next_entry(interface, node); + data->page_num = 0; + ++*pos; + + return data; +} + +/** + * fw_core_dump_seq_show - seq_file show operation for firmware core dump file + * @m: the seq_file handle + * @v: the current iterator (pointer to struct fw_core_dump_seq_off) + * + * Writes the current page in a firmware interface memory section indicated + * by the iterator @v to the file. If @v is SEQ_START_TOKEN the ELF + * header is written. + * + * Return: 0 on success, error code otherwise. + */ +static int fw_core_dump_seq_show(struct seq_file *m, void *v) +{ + struct fw_core_dump_seq_off *data = v; + struct page *page; + u32 *p; + + /* Either write the ELF header or current page. */ + if (v == SEQ_START_TOKEN) + return fw_core_dump_write_elf_header(m); + + /* Write the current page. */ + page = as_page(data->interface->phys[data->page_num]); + p = kbase_kmap_atomic(page); + seq_write(m, p, FW_PAGE_SIZE); + kbase_kunmap_atomic(p); + + return 0; +} + +/* Sequence file operations for firmware core dump file. */ +static const struct seq_operations fw_core_dump_seq_ops = { + .start = fw_core_dump_seq_start, + .next = fw_core_dump_seq_next, + .stop = fw_core_dump_seq_stop, + .show = fw_core_dump_seq_show, +}; + +/** + * fw_core_dump_debugfs_open - callback for opening the 'fw_core_dump' debugfs file + * @inode: inode of the file + * @file: file pointer + * + * Prepares for servicing a write request to request a core dump from firmware and + * a read request to retrieve the core dump. + * + * Returns an error if the firmware is not initialized yet. + * + * Return: 0 on success, error code otherwise. + */ +static int fw_core_dump_debugfs_open(struct inode *inode, struct file *file) +{ + struct kbase_device *const kbdev = inode->i_private; + struct fw_core_dump_data *dump_data; + int ret; + + /* Fail if firmware is not initialized yet. */ + if (!kbdev->csf.firmware_inited) { + ret = -ENODEV; + goto open_fail; + } + + /* Open a sequence file for iterating through the pages in the + * firmware interface memory pages. seq_open stores a + * struct seq_file * in the private_data field of @file. 
+ */ + ret = seq_open(file, &fw_core_dump_seq_ops); + if (ret) + goto open_fail; + + /* Allocate a context for sequence file operations. */ + dump_data = kmalloc(sizeof(*dump_data), GFP_KERNEL); + if (!dump_data) { + ret = -ENOMEM; + goto out; + } + + /* Kbase device will be shared with sequence file operations. */ + dump_data->kbdev = kbdev; + + /* Link our sequence file context. */ + ((struct seq_file *)file->private_data)->private = dump_data; + + return 0; +out: + seq_release(inode, file); +open_fail: + return ret; +} + +/** + * fw_core_dump_debugfs_write - callback for a write to the 'fw_core_dump' debugfs file + * @file: file pointer + * @ubuf: user buffer containing data to store + * @count: number of bytes in user buffer + * @ppos: file position + * + * Any data written to the file triggers a firmware core dump request which + * subsequently can be retrieved by reading from the file. + * + * Return: @count if the function succeeded. An error code on failure. + */ +static ssize_t fw_core_dump_debugfs_write(struct file *file, const char __user *ubuf, size_t count, + loff_t *ppos) +{ + int err; + struct fw_core_dump_data *dump_data = ((struct seq_file *)file->private_data)->private; + struct kbase_device *const kbdev = dump_data->kbdev; + + CSTD_UNUSED(ppos); + + err = fw_core_dump_create(kbdev); + + return err ? err : count; +} + +/** + * fw_core_dump_debugfs_release - callback for releasing the 'fw_core_dump' debugfs file + * @inode: inode of the file + * @file: file pointer + * + * Return: 0 on success, error code otherwise. + */ +static int fw_core_dump_debugfs_release(struct inode *inode, struct file *file) +{ + struct fw_core_dump_data *dump_data = ((struct seq_file *)file->private_data)->private; + + seq_release(inode, file); + + kfree(dump_data); + + return 0; +} +/* Debugfs file operations for firmware core dump file. */ +static const struct file_operations kbase_csf_fw_core_dump_fops = { + .owner = THIS_MODULE, + .open = fw_core_dump_debugfs_open, + .read = seq_read, + .write = fw_core_dump_debugfs_write, + .llseek = seq_lseek, + .release = fw_core_dump_debugfs_release, +}; + +void kbase_csf_firmware_core_dump_init(struct kbase_device *const kbdev) +{ +#if IS_ENABLED(CONFIG_DEBUG_FS) + debugfs_create_file("fw_core_dump", 0600, kbdev->mali_debugfs_directory, kbdev, + &kbase_csf_fw_core_dump_fops); +#endif /* CONFIG_DEBUG_FS */ +} + +int kbase_csf_firmware_core_dump_entry_parse(struct kbase_device *kbdev, const u32 *entry) +{ + /* Casting to u16 as version is defined by bits 15:0 */ + kbdev->csf.fw_core_dump.version = (u16)entry[FW_CORE_DUMP_VERSION_INDEX]; + + if (kbdev->csf.fw_core_dump.version != FW_CORE_DUMP_DATA_VERSION) + return -EPERM; + + kbdev->csf.fw_core_dump.mcu_regs_addr = entry[FW_CORE_DUMP_START_ADDR_INDEX]; + kbdev->csf.fw_core_dump.available = true; + + return 0; +} diff --git a/mali_kbase/csf/mali_kbase_csf_firmware_core_dump.h b/mali_kbase/csf/mali_kbase_csf_firmware_core_dump.h new file mode 100644 index 0000000..940e8af --- /dev/null +++ b/mali_kbase/csf/mali_kbase_csf_firmware_core_dump.h @@ -0,0 +1,124 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +/* + * + * (C) COPYRIGHT 2021-2022 ARM Limited. All rights reserved. + * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. 
+ * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. + * + */ + +#ifndef _KBASE_CSF_FIRMWARE_CORE_DUMP_H_ +#define _KBASE_CSF_FIRMWARE_CORE_DUMP_H_ + +struct kbase_device; + +/** Offset of the last field of core dump entry from the image header */ +#define CORE_DUMP_ENTRY_START_ADDR_OFFSET (0x4) + +/* Page size in bytes in use by MCU. */ +#define FW_PAGE_SIZE 4096 + +/** + * struct fw_core_dump_data - Context for seq_file operations used on 'fw_core_dump' + * debugfs file. + * @kbdev: Instance of a GPU platform device that implements a CSF interface. + */ +struct fw_core_dump_data { + struct kbase_device *kbdev; +}; + +/** + * kbase_csf_firmware_core_dump_entry_parse() - Parse a "core dump" entry from + * the image header. + * + * @kbdev: Instance of a GPU platform device that implements a CSF interface. + * @entry: Pointer to section. + * + * Read a "core dump" entry from the image header, check the version for + * compatibility and store the address pointer. + * + * Return: 0 if the entry was parsed successfully, negative error code otherwise. + */ +int kbase_csf_firmware_core_dump_entry_parse(struct kbase_device *kbdev, const u32 *entry); + +/** + * kbase_csf_firmware_core_dump_init() - Initialize firmware core dump support + * + * @kbdev: Instance of a GPU platform device that implements a CSF interface. + * Must be zero-initialized. + * + * Creates the fw_core_dump debugfs file through which to request a firmware + * core dump. The created debugfs file is cleaned up as part of kbdev debugfs + * cleanup. + * + * The fw_core_dump debugfs file can be used in the following way: + * + * To explicitly request core dump: + * echo 1 >/sys/kernel/debug/mali0/fw_core_dump + * + * To output current core dump (after explicitly requesting a core dump, or + * kernel driver reported an internal firmware error): + * cat /sys/kernel/debug/mali0/fw_core_dump + */ +void kbase_csf_firmware_core_dump_init(struct kbase_device *const kbdev); + +/** + * get_fw_core_dump_size - Get firmware core dump size + * @kbdev: Instance of a GPU platform device that implements a CSF interface. + * + * Return: size on success, -1 otherwise. + */ +size_t get_fw_core_dump_size(struct kbase_device *kbdev); + +/** + * fw_core_dump_create - Requests firmware to save state for a firmware core dump + * @kbdev: Instance of a GPU platform device that implements a CSF interface. + * + * Return: 0 on success, error code otherwise. + */ +int fw_core_dump_create(struct kbase_device *kbdev); + +/** + * fw_core_dump_write_elf_header - Writes ELF header for the FW core dump + * @m: the seq_file handle + * + * Writes the ELF header of the core dump including program headers for + * memory sections and a note containing the current MCU register + * values. + * + * Excludes memory sections without read access permissions or + * are for protected memory. + * + * The data written is as follows: + * - ELF header + * - ELF PHDRs for memory sections + * - ELF PHDR for program header NOTE + * - ELF PRSTATUS note + * - 0-bytes padding to multiple of ELF_EXEC_PAGESIZE + * + * The actual memory section dumps should follow this (not written + * by this function). 
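A standalone user-space sketch (not part of the patch) that follows the debugfs usage documented above: any write to fw_core_dump requests a dump, and a subsequent read returns the ELF image. It assumes a mounted debugfs, root privileges, and the mali0 directory name shown above.

#include <stdio.h>

int main(void)
{
	const char *path = "/sys/kernel/debug/mali0/fw_core_dump";
	FILE *req = fopen(path, "w");
	FILE *dump, *out;
	char buf[4096];
	size_t n;

	if (!req) {
		perror("open fw_core_dump for write");
		return 1;
	}
	fputs("1\n", req);	/* any write triggers a firmware core dump request */
	fclose(req);

	dump = fopen(path, "r");
	out = fopen("fw_core_dump.elf", "w");
	if (!dump || !out) {
		perror("open");
		return 1;
	}

	/* Stream the ELF core image out of debugfs into a regular file. */
	while ((n = fread(buf, 1, sizeof(buf), dump)) > 0)
		fwrite(buf, 1, n, out);

	fclose(dump);
	fclose(out);
	return 0;
}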
+ * + * Retrieves the necessary information via the struct + * fw_core_dump_data stored in the private member of the seq_file + * handle. + * + * Return: + * * 0 - success + * * -ENOMEM - not enough memory for allocating ELF32 note + */ +int fw_core_dump_write_elf_header(struct seq_file *m); + +#endif /* _KBASE_CSF_FIRMWARE_CORE_DUMP_H_ */ diff --git a/mali_kbase/csf/mali_kbase_csf_firmware_log.c b/mali_kbase/csf/mali_kbase_csf_firmware_log.c new file mode 100644 index 0000000..89df839 --- /dev/null +++ b/mali_kbase/csf/mali_kbase_csf_firmware_log.c @@ -0,0 +1,547 @@ +// SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note +/* + * + * (C) COPYRIGHT 2021-2023 ARM Limited. All rights reserved. + * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. + * + */ + +#include <mali_kbase.h> +#include "backend/gpu/mali_kbase_pm_internal.h" +#include <csf/mali_kbase_csf_firmware_log.h> +#include <csf/mali_kbase_csf_trace_buffer.h> +#include <linux/debugfs.h> +#include <linux/string.h> +#include <linux/workqueue.h> + +/* + * ARMv7 instruction: Branch with Link calls a subroutine at a PC-relative address. + */ +#define ARMV7_T1_BL_IMM_INSTR 0xd800f000 + +/* + * ARMv7 instruction: Branch with Link calls a subroutine at a PC-relative address, maximum + * negative jump offset. + */ +#define ARMV7_T1_BL_IMM_RANGE_MIN -16777216 + +/* + * ARMv7 instruction: Branch with Link calls a subroutine at a PC-relative address, maximum + * positive jump offset. + */ +#define ARMV7_T1_BL_IMM_RANGE_MAX 16777214 + +/* + * ARMv7 instruction: Double NOP instructions. 
+ */ +#define ARMV7_DOUBLE_NOP_INSTR 0xbf00bf00 + +#if defined(CONFIG_DEBUG_FS) + +static int kbase_csf_firmware_log_enable_mask_read(void *data, u64 *val) +{ + struct kbase_device *kbdev = (struct kbase_device *)data; + struct firmware_trace_buffer *tb = + kbase_csf_firmware_get_trace_buffer(kbdev, KBASE_CSFFW_LOG_BUF_NAME); + + if (tb == NULL) { + dev_err(kbdev->dev, "Couldn't get the firmware trace buffer"); + return -EIO; + } + /* The enabled traces limited to u64 here, regarded practical */ + *val = kbase_csf_firmware_trace_buffer_get_active_mask64(tb); + return 0; +} + +static int kbase_csf_firmware_log_enable_mask_write(void *data, u64 val) +{ + struct kbase_device *kbdev = (struct kbase_device *)data; + struct firmware_trace_buffer *tb = + kbase_csf_firmware_get_trace_buffer(kbdev, KBASE_CSFFW_LOG_BUF_NAME); + u64 new_mask; + unsigned int enable_bits_count; + + if (tb == NULL) { + dev_err(kbdev->dev, "Couldn't get the firmware trace buffer"); + return -EIO; + } + + /* Ignore unsupported types */ + enable_bits_count = kbase_csf_firmware_trace_buffer_get_trace_enable_bits_count(tb); + if (enable_bits_count > 64) { + dev_dbg(kbdev->dev, "Limit enabled bits count from %u to 64", enable_bits_count); + enable_bits_count = 64; + } + new_mask = val & (UINT64_MAX >> (64 - enable_bits_count)); + + if (new_mask != kbase_csf_firmware_trace_buffer_get_active_mask64(tb)) + return kbase_csf_firmware_trace_buffer_set_active_mask64(tb, new_mask); + else + return 0; +} + +static int kbasep_csf_firmware_log_debugfs_open(struct inode *in, struct file *file) +{ + struct kbase_device *kbdev = in->i_private; + + file->private_data = kbdev; + dev_dbg(kbdev->dev, "Opened firmware trace buffer dump debugfs file"); + + return 0; +} + +static ssize_t kbasep_csf_firmware_log_debugfs_read(struct file *file, char __user *buf, + size_t size, loff_t *ppos) +{ + struct kbase_device *kbdev = file->private_data; + struct kbase_csf_firmware_log *fw_log = &kbdev->csf.fw_log; + unsigned int n_read; + unsigned long not_copied; + /* Limit reads to the kernel dump buffer size */ + size_t mem = MIN(size, FIRMWARE_LOG_DUMP_BUF_SIZE); + int ret; + + struct firmware_trace_buffer *tb = + kbase_csf_firmware_get_trace_buffer(kbdev, KBASE_CSFFW_LOG_BUF_NAME); + + if (tb == NULL) { + dev_err(kbdev->dev, "Couldn't get the firmware trace buffer"); + return -EIO; + } + + if (atomic_cmpxchg(&fw_log->busy, 0, 1) != 0) + return -EBUSY; + + /* Reading from userspace is only allowed in manual mode or auto-discard mode */ + if (fw_log->mode != KBASE_CSF_FIRMWARE_LOG_MODE_MANUAL && + fw_log->mode != KBASE_CSF_FIRMWARE_LOG_MODE_AUTO_DISCARD) { + ret = -EINVAL; + goto out; + } + + n_read = kbase_csf_firmware_trace_buffer_read_data(tb, fw_log->dump_buf, mem); + + /* Do the copy, if we have obtained some trace data */ + not_copied = (n_read) ? 
copy_to_user(buf, fw_log->dump_buf, n_read) : 0; + + if (not_copied) { + dev_err(kbdev->dev, "Couldn't copy trace buffer data to user space buffer"); + ret = -EFAULT; + goto out; + } + + *ppos += n_read; + ret = n_read; + +out: + atomic_set(&fw_log->busy, 0); + return ret; +} + +static int kbase_csf_firmware_log_mode_read(void *data, u64 *val) +{ + struct kbase_device *kbdev = (struct kbase_device *)data; + struct kbase_csf_firmware_log *fw_log = &kbdev->csf.fw_log; + + *val = fw_log->mode; + return 0; +} + +static int kbase_csf_firmware_log_mode_write(void *data, u64 val) +{ + struct kbase_device *kbdev = (struct kbase_device *)data; + struct kbase_csf_firmware_log *fw_log = &kbdev->csf.fw_log; + int ret = 0; + + if (atomic_cmpxchg(&fw_log->busy, 0, 1) != 0) + return -EBUSY; + + if (val == fw_log->mode) + goto out; + + switch (val) { + case KBASE_CSF_FIRMWARE_LOG_MODE_MANUAL: + cancel_delayed_work_sync(&fw_log->poll_work); + break; + case KBASE_CSF_FIRMWARE_LOG_MODE_AUTO_PRINT: + case KBASE_CSF_FIRMWARE_LOG_MODE_AUTO_DISCARD: + schedule_delayed_work(&fw_log->poll_work, + msecs_to_jiffies(atomic_read(&fw_log->poll_period_ms))); + break; + default: + ret = -EINVAL; + goto out; + } + + fw_log->mode = val; + +out: + atomic_set(&fw_log->busy, 0); + return ret; +} + +static int kbase_csf_firmware_log_poll_period_read(void *data, u64 *val) +{ + struct kbase_device *kbdev = (struct kbase_device *)data; + struct kbase_csf_firmware_log *fw_log = &kbdev->csf.fw_log; + + *val = atomic_read(&fw_log->poll_period_ms); + return 0; +} + +static int kbase_csf_firmware_log_poll_period_write(void *data, u64 val) +{ + struct kbase_device *kbdev = (struct kbase_device *)data; + struct kbase_csf_firmware_log *fw_log = &kbdev->csf.fw_log; + + atomic_set(&fw_log->poll_period_ms, val); + return 0; +} + +DEFINE_DEBUGFS_ATTRIBUTE(kbase_csf_firmware_log_enable_mask_fops, + kbase_csf_firmware_log_enable_mask_read, + kbase_csf_firmware_log_enable_mask_write, "%llx\n"); + +static const struct file_operations kbasep_csf_firmware_log_debugfs_fops = { + .owner = THIS_MODULE, + .open = kbasep_csf_firmware_log_debugfs_open, + .read = kbasep_csf_firmware_log_debugfs_read, + .llseek = no_llseek, +}; + +DEFINE_DEBUGFS_ATTRIBUTE(kbase_csf_firmware_log_mode_fops, kbase_csf_firmware_log_mode_read, + kbase_csf_firmware_log_mode_write, "%llu\n"); +DEFINE_DEBUGFS_ATTRIBUTE(kbase_csf_firmware_log_poll_period_fops, + kbase_csf_firmware_log_poll_period_read, + kbase_csf_firmware_log_poll_period_write, "%llu\n"); + +#endif /* CONFIG_DEBUG_FS */ + +static void kbase_csf_firmware_log_discard_buffer(struct kbase_device *kbdev) +{ + struct kbase_csf_firmware_log *fw_log = &kbdev->csf.fw_log; + struct firmware_trace_buffer *tb = + kbase_csf_firmware_get_trace_buffer(kbdev, KBASE_CSFFW_LOG_BUF_NAME); + + if (tb == NULL) { + dev_dbg(kbdev->dev, "Can't get the trace buffer, firmware log discard skipped"); + return; + } + + if (atomic_cmpxchg(&fw_log->busy, 0, 1) != 0) + return; + + kbase_csf_firmware_trace_buffer_discard(tb); + + atomic_set(&fw_log->busy, 0); +} + +static void kbase_csf_firmware_log_poll(struct work_struct *work) +{ + struct kbase_device *kbdev = + container_of(work, struct kbase_device, csf.fw_log.poll_work.work); + struct kbase_csf_firmware_log *fw_log = &kbdev->csf.fw_log; + + if (fw_log->mode == KBASE_CSF_FIRMWARE_LOG_MODE_AUTO_PRINT) + kbase_csf_firmware_log_dump_buffer(kbdev); + else if (fw_log->mode == KBASE_CSF_FIRMWARE_LOG_MODE_AUTO_DISCARD) + kbase_csf_firmware_log_discard_buffer(kbdev); + else + return; + + 
schedule_delayed_work(&fw_log->poll_work, + msecs_to_jiffies(atomic_read(&fw_log->poll_period_ms))); +} + +int kbase_csf_firmware_log_init(struct kbase_device *kbdev) +{ + struct kbase_csf_firmware_log *fw_log = &kbdev->csf.fw_log; + int err = 0; +#if defined(CONFIG_DEBUG_FS) + struct dentry *dentry; +#endif /* CONFIG_DEBUG_FS */ + + /* Add one byte for null-termination */ + fw_log->dump_buf = kmalloc(FIRMWARE_LOG_DUMP_BUF_SIZE + 1, GFP_KERNEL); + if (fw_log->dump_buf == NULL) { + err = -ENOMEM; + goto out; + } + + /* Ensure null-termination for all strings */ + fw_log->dump_buf[FIRMWARE_LOG_DUMP_BUF_SIZE] = 0; + + /* Set default log polling period */ + atomic_set(&fw_log->poll_period_ms, KBASE_CSF_FIRMWARE_LOG_POLL_PERIOD_MS_DEFAULT); + + INIT_DEFERRABLE_WORK(&fw_log->poll_work, kbase_csf_firmware_log_poll); +#ifdef CONFIG_MALI_FW_TRACE_MODE_AUTO_DISCARD + fw_log->mode = KBASE_CSF_FIRMWARE_LOG_MODE_AUTO_DISCARD; + schedule_delayed_work(&fw_log->poll_work, + msecs_to_jiffies(KBASE_CSF_FIRMWARE_LOG_POLL_PERIOD_MS_DEFAULT)); +#elif defined(CONFIG_MALI_FW_TRACE_MODE_AUTO_PRINT) + fw_log->mode = KBASE_CSF_FIRMWARE_LOG_MODE_AUTO_PRINT; + schedule_delayed_work(&fw_log->poll_work, + msecs_to_jiffies(KBASE_CSF_FIRMWARE_LOG_POLL_PERIOD_MS_DEFAULT)); +#else /* CONFIG_MALI_FW_TRACE_MODE_MANUAL */ + fw_log->mode = KBASE_CSF_FIRMWARE_LOG_MODE_MANUAL; +#endif + + atomic_set(&fw_log->busy, 0); + +#if !defined(CONFIG_DEBUG_FS) + return 0; +#else /* !CONFIG_DEBUG_FS */ + dentry = debugfs_create_file("fw_trace_enable_mask", 0644, kbdev->mali_debugfs_directory, + kbdev, &kbase_csf_firmware_log_enable_mask_fops); + if (IS_ERR_OR_NULL(dentry)) { + dev_err(kbdev->dev, "Unable to create fw_trace_enable_mask\n"); + err = -ENOENT; + goto free_out; + } + dentry = debugfs_create_file("fw_traces", 0444, kbdev->mali_debugfs_directory, kbdev, + &kbasep_csf_firmware_log_debugfs_fops); + if (IS_ERR_OR_NULL(dentry)) { + dev_err(kbdev->dev, "Unable to create fw_traces\n"); + err = -ENOENT; + goto free_out; + } + dentry = debugfs_create_file("fw_trace_mode", 0644, kbdev->mali_debugfs_directory, kbdev, + &kbase_csf_firmware_log_mode_fops); + if (IS_ERR_OR_NULL(dentry)) { + dev_err(kbdev->dev, "Unable to create fw_trace_mode\n"); + err = -ENOENT; + goto free_out; + } + dentry = debugfs_create_file("fw_trace_poll_period_ms", 0644, kbdev->mali_debugfs_directory, + kbdev, &kbase_csf_firmware_log_poll_period_fops); + if (IS_ERR_OR_NULL(dentry)) { + dev_err(kbdev->dev, "Unable to create fw_trace_poll_period_ms"); + err = -ENOENT; + goto free_out; + } + + return 0; + +free_out: + kfree(fw_log->dump_buf); + fw_log->dump_buf = NULL; +#endif /* CONFIG_DEBUG_FS */ +out: + return err; +} + +void kbase_csf_firmware_log_term(struct kbase_device *kbdev) +{ + struct kbase_csf_firmware_log *fw_log = &kbdev->csf.fw_log; + + if (fw_log->dump_buf) { + cancel_delayed_work_sync(&fw_log->poll_work); + kfree(fw_log->dump_buf); + fw_log->dump_buf = NULL; + } +} + +void kbase_csf_firmware_log_dump_buffer(struct kbase_device *kbdev) +{ + struct kbase_csf_firmware_log *fw_log = &kbdev->csf.fw_log; + u8 *buf = fw_log->dump_buf, *p, *pnewline, *pend, *pendbuf; + unsigned int read_size, remaining_size; + struct firmware_trace_buffer *tb = + kbase_csf_firmware_get_trace_buffer(kbdev, KBASE_CSFFW_LOG_BUF_NAME); + + if (tb == NULL) { + dev_dbg(kbdev->dev, "Can't get the trace buffer, firmware trace dump skipped"); + return; + } + + if (atomic_cmpxchg(&fw_log->busy, 0, 1) != 0) + return; + + /* FW should only print complete messages, so there's no need to 
handle + * partial messages over multiple invocations of this function + */ + + p = buf; + pendbuf = &buf[FIRMWARE_LOG_DUMP_BUF_SIZE]; + + while ((read_size = kbase_csf_firmware_trace_buffer_read_data(tb, p, pendbuf - p))) { + pend = p + read_size; + p = buf; + + while (p < pend && (pnewline = memchr(p, '\n', pend - p))) { + /* Null-terminate the string */ + *pnewline = 0; + + dev_err(kbdev->dev, "FW> %s", p); + + p = pnewline + 1; + } + + remaining_size = pend - p; + + if (!remaining_size) { + p = buf; + } else if (remaining_size < FIRMWARE_LOG_DUMP_BUF_SIZE) { + /* Copy unfinished string to the start of the buffer */ + memmove(buf, p, remaining_size); + p = &buf[remaining_size]; + } else { + /* Print abnormally long string without newlines */ + dev_err(kbdev->dev, "FW> %s", buf); + p = buf; + } + } + + if (p != buf) { + /* Null-terminate and print last unfinished string */ + *p = 0; + dev_err(kbdev->dev, "FW> %s", buf); + } + + atomic_set(&fw_log->busy, 0); +} + +void kbase_csf_firmware_log_parse_logging_call_list_entry(struct kbase_device *kbdev, + const uint32_t *entry) +{ + kbdev->csf.fw_log.func_call_list_va_start = entry[0]; + kbdev->csf.fw_log.func_call_list_va_end = entry[1]; +} + +/** + * toggle_logging_calls_in_loaded_image - Toggles FW log func calls in loaded FW image. + * + * @kbdev: Instance of a GPU platform device that implements a CSF interface. + * @enable: Whether to enable or disable the function calls. + */ +static void toggle_logging_calls_in_loaded_image(struct kbase_device *kbdev, bool enable) +{ + uint32_t bl_instruction, diff; + uint32_t imm11, imm10, i1, i2, j1, j2, sign; + uint32_t calling_address = 0, callee_address = 0; + uint32_t list_entry = kbdev->csf.fw_log.func_call_list_va_start; + const uint32_t list_va_end = kbdev->csf.fw_log.func_call_list_va_end; + + if (list_entry == 0 || list_va_end == 0) + return; + + if (enable) { + for (; list_entry < list_va_end; list_entry += 2 * sizeof(uint32_t)) { + /* Read calling address */ + kbase_csf_read_firmware_memory(kbdev, list_entry, &calling_address); + /* Read callee address */ + kbase_csf_read_firmware_memory(kbdev, list_entry + sizeof(uint32_t), + &callee_address); + + diff = callee_address - calling_address - 4; + sign = !!(diff & 0x80000000); + if (ARMV7_T1_BL_IMM_RANGE_MIN > (int32_t)diff || + ARMV7_T1_BL_IMM_RANGE_MAX < (int32_t)diff) { + dev_warn(kbdev->dev, "FW log patch 0x%x out of range, skipping", + calling_address); + continue; + } + + i1 = (diff & 0x00800000) >> 23; + j1 = !i1 ^ sign; + i2 = (diff & 0x00400000) >> 22; + j2 = !i2 ^ sign; + imm11 = (diff & 0xffe) >> 1; + imm10 = (diff & 0x3ff000) >> 12; + + /* Compose BL instruction */ + bl_instruction = ARMV7_T1_BL_IMM_INSTR; + bl_instruction |= j1 << 29; + bl_instruction |= j2 << 27; + bl_instruction |= imm11 << 16; + bl_instruction |= sign << 10; + bl_instruction |= imm10; + + /* Patch logging func calls in their load location */ + dev_dbg(kbdev->dev, "FW log patch 0x%x: 0x%x\n", calling_address, + bl_instruction); + kbase_csf_update_firmware_memory_exe(kbdev, calling_address, + bl_instruction); + } + } else { + for (; list_entry < list_va_end; list_entry += 2 * sizeof(uint32_t)) { + /* Read calling address */ + kbase_csf_read_firmware_memory(kbdev, list_entry, &calling_address); + + /* Overwrite logging func calls with 2 NOP instructions */ + kbase_csf_update_firmware_memory_exe(kbdev, calling_address, + ARMV7_DOUBLE_NOP_INSTR); + } + } +} + +int kbase_csf_firmware_log_toggle_logging_calls(struct kbase_device *kbdev, u32 val) +{ + unsigned long 
flags; + struct kbase_csf_firmware_log *fw_log = &kbdev->csf.fw_log; + bool mcu_inactive; + bool resume_needed = false; + int ret = 0; + struct kbase_csf_scheduler *scheduler = &kbdev->csf.scheduler; + + if (atomic_cmpxchg(&fw_log->busy, 0, 1) != 0) + return -EBUSY; + + /* Suspend all the active CS groups */ + dev_dbg(kbdev->dev, "Suspend all the active CS groups"); + + kbase_csf_scheduler_lock(kbdev); + while (scheduler->state != SCHED_SUSPENDED) { + kbase_csf_scheduler_unlock(kbdev); + kbase_csf_scheduler_pm_suspend(kbdev); + kbase_csf_scheduler_lock(kbdev); + resume_needed = true; + } + + /* Wait for the MCU to get disabled */ + dev_info(kbdev->dev, "Wait for the MCU to get disabled"); + ret = kbase_pm_killable_wait_for_desired_state(kbdev); + if (ret) { + dev_err(kbdev->dev, + "wait for PM state failed when toggling FW logging calls"); + ret = -EAGAIN; + goto out; + } + + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); + mcu_inactive = + kbase_pm_is_mcu_inactive(kbdev, kbdev->pm.backend.mcu_state); + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + if (!mcu_inactive) { + dev_err(kbdev->dev, + "MCU not inactive after PM state wait when toggling FW logging calls"); + ret = -EAGAIN; + goto out; + } + + /* Toggle FW logging call in the loaded FW image */ + toggle_logging_calls_in_loaded_image(kbdev, val); + dev_dbg(kbdev->dev, "FW logging: %s", val ? "enabled" : "disabled"); + +out: + kbase_csf_scheduler_unlock(kbdev); + if (resume_needed) + /* Resume queue groups and start mcu */ + kbase_csf_scheduler_pm_resume(kbdev); + atomic_set(&fw_log->busy, 0); + return ret; +} diff --git a/mali_kbase/csf/mali_kbase_csf_firmware_log.h b/mali_kbase/csf/mali_kbase_csf_firmware_log.h new file mode 100644 index 0000000..1008320 --- /dev/null +++ b/mali_kbase/csf/mali_kbase_csf_firmware_log.h @@ -0,0 +1,77 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +/* + * + * (C) COPYRIGHT 2022 ARM Limited. All rights reserved. + * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. + * + */ + +#ifndef _KBASE_CSF_FIRMWARE_LOG_H_ +#define _KBASE_CSF_FIRMWARE_LOG_H_ + +#include <mali_kbase.h> + +/** Offset of the last field of functions call list entry from the image header */ +#define FUNC_CALL_LIST_ENTRY_NAME_OFFSET (0x8) + +/* + * Firmware log dumping buffer size. + */ +#define FIRMWARE_LOG_DUMP_BUF_SIZE PAGE_SIZE + +/** + * kbase_csf_firmware_log_init - Initialize firmware log handling. + * + * @kbdev: Pointer to the Kbase device + * + * Return: The initialization error code. + */ +int kbase_csf_firmware_log_init(struct kbase_device *kbdev); + +/** + * kbase_csf_firmware_log_term - Terminate firmware log handling. + * + * @kbdev: Pointer to the Kbase device + */ +void kbase_csf_firmware_log_term(struct kbase_device *kbdev); + +/** + * kbase_csf_firmware_log_dump_buffer - Read remaining data in the firmware log + * buffer and print it to dmesg. 
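+ *
+ * Descriptive note (not part of the original patch): each newline-terminated
+ * message is printed as its own "FW> " line; an unterminated tail left at the
+ * end of a read is printed as-is rather than carried over to a later call, and
+ * the dump is skipped entirely if another log operation holds the busy flag.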
+ * + * @kbdev: Pointer to the Kbase device + */ +void kbase_csf_firmware_log_dump_buffer(struct kbase_device *kbdev); + +/** + * kbase_csf_firmware_log_parse_logging_call_list_entry - Parse FW logging function call list entry. + * + * @kbdev: Instance of a GPU platform device that implements a CSF interface. + * @entry: Pointer to section. + */ +void kbase_csf_firmware_log_parse_logging_call_list_entry(struct kbase_device *kbdev, + const uint32_t *entry); +/** + * kbase_csf_firmware_log_toggle_logging_calls - Enables/Disables FW logging function calls. + * + * @kbdev: Instance of a GPU platform device that implements a CSF interface. + * @val: Configuration option value. + * + * Return: 0 if successful, negative error code on failure + */ +int kbase_csf_firmware_log_toggle_logging_calls(struct kbase_device *kbdev, u32 val); + +#endif /* _KBASE_CSF_FIRMWARE_LOG_H_ */ diff --git a/mali_kbase/csf/mali_kbase_csf_firmware_no_mali.c b/mali_kbase/csf/mali_kbase_csf_firmware_no_mali.c index 0fd848f..93d7c36 100644 --- a/mali_kbase/csf/mali_kbase_csf_firmware_no_mali.c +++ b/mali_kbase/csf/mali_kbase_csf_firmware_no_mali.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2018-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2018-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -32,6 +32,8 @@ #include "mali_kbase_csf_scheduler.h" #include "mmu/mali_kbase_mmu.h" #include "backend/gpu/mali_kbase_clk_rate_trace_mgr.h" +#include <backend/gpu/mali_kbase_model_linux.h> +#include <csf/mali_kbase_csf_registers.h> #include <linux/list.h> #include <linux/slab.h> @@ -227,7 +229,8 @@ static int invent_capabilities(struct kbase_device *kbdev) iface->version = 1; iface->kbdev = kbdev; iface->features = 0; - iface->prfcnt_size = 64; + iface->prfcnt_size = + GLB_PRFCNT_SIZE_HARDWARE_SIZE_SET(0, KBASE_DUMMY_MODEL_MAX_SAMPLE_SIZE); if (iface->version >= kbase_csf_interface_version(1, 1, 0)) { /* update rate=1, max event size = 1<<8 = 256 */ @@ -270,6 +273,18 @@ void kbase_csf_update_firmware_memory(struct kbase_device *kbdev, /* NO_MALI: Nothing to do here */ } +void kbase_csf_read_firmware_memory_exe(struct kbase_device *kbdev, + u32 gpu_addr, u32 *value) +{ + /* NO_MALI: Nothing to do here */ +} + +void kbase_csf_update_firmware_memory_exe(struct kbase_device *kbdev, + u32 gpu_addr, u32 value) +{ + /* NO_MALI: Nothing to do here */ +} + void kbase_csf_firmware_cs_input( const struct kbase_csf_cmd_stream_info *const info, const u32 offset, const u32 value) @@ -371,37 +386,6 @@ u32 kbase_csf_firmware_csg_output( } KBASE_EXPORT_TEST_API(kbase_csf_firmware_csg_output); -static void -csf_firmware_prfcnt_process(const struct kbase_csf_global_iface *const iface, - const u32 glb_req) -{ - struct kbase_device *kbdev = iface->kbdev; - u32 glb_ack = output_page_read(iface->output, GLB_ACK); - /* If the value of GLB_REQ.PRFCNT_SAMPLE is different from the value of - * GLB_ACK.PRFCNT_SAMPLE, the CSF will sample the performance counters. - */ - if ((glb_req ^ glb_ack) & GLB_REQ_PRFCNT_SAMPLE_MASK) { - /* NO_MALI only uses the first buffer in the ring buffer. */ - input_page_write(iface->input, GLB_PRFCNT_EXTRACT, 0); - output_page_write(iface->output, GLB_PRFCNT_INSERT, 1); - kbase_reg_write(kbdev, GPU_COMMAND, GPU_COMMAND_PRFCNT_SAMPLE); - } - - /* Propagate enable masks to model if request to enable. 
*/ - if (glb_req & GLB_REQ_PRFCNT_ENABLE_MASK) { - u32 tiler_en, l2_en, sc_en; - - tiler_en = input_page_read(iface->input, GLB_PRFCNT_TILER_EN); - l2_en = input_page_read(iface->input, GLB_PRFCNT_MMU_L2_EN); - sc_en = input_page_read(iface->input, GLB_PRFCNT_SHADER_EN); - - /* NO_MALI platform enabled all CSHW counters by default. */ - kbase_reg_write(kbdev, PRFCNT_TILER_EN, tiler_en); - kbase_reg_write(kbdev, PRFCNT_MMU_L2_EN, l2_en); - kbase_reg_write(kbdev, PRFCNT_SHADER_EN, sc_en); - } -} - void kbase_csf_firmware_global_input( const struct kbase_csf_global_iface *const iface, const u32 offset, const u32 value) @@ -412,9 +396,17 @@ void kbase_csf_firmware_global_input( input_page_write(iface->input, offset, value); if (offset == GLB_REQ) { - csf_firmware_prfcnt_process(iface, value); - /* NO_MALI: Immediately acknowledge requests */ - output_page_write(iface->output, GLB_ACK, value); + /* NO_MALI: Immediately acknowledge requests - except for PRFCNT_ENABLE + * and PRFCNT_SAMPLE. These will be processed along with the + * corresponding performance counter registers when the global doorbell + * is rung in order to emulate the performance counter sampling behavior + * of the real firmware. + */ + const u32 ack = output_page_read(iface->output, GLB_ACK); + const u32 req_mask = ~(GLB_REQ_PRFCNT_ENABLE_MASK | GLB_REQ_PRFCNT_SAMPLE_MASK); + const u32 toggled = (value ^ ack) & req_mask; + + output_page_write(iface->output, GLB_ACK, ack ^ toggled); } } KBASE_EXPORT_TEST_API(kbase_csf_firmware_global_input); @@ -455,6 +447,99 @@ u32 kbase_csf_firmware_global_output( KBASE_EXPORT_TEST_API(kbase_csf_firmware_global_output); /** + * csf_doorbell_prfcnt() - Process CSF performance counter doorbell request + * + * @kbdev: An instance of the GPU platform device + */ +static void csf_doorbell_prfcnt(struct kbase_device *kbdev) +{ + struct kbase_csf_global_iface *iface; + u32 req; + u32 ack; + u32 extract_index; + + if (WARN_ON(!kbdev)) + return; + + iface = &kbdev->csf.global_iface; + + req = input_page_read(iface->input, GLB_REQ); + ack = output_page_read(iface->output, GLB_ACK); + extract_index = input_page_read(iface->input, GLB_PRFCNT_EXTRACT); + + /* Process enable bit toggle */ + if ((req ^ ack) & GLB_REQ_PRFCNT_ENABLE_MASK) { + if (req & GLB_REQ_PRFCNT_ENABLE_MASK) { + /* Reset insert index to zero on enable bit set */ + output_page_write(iface->output, GLB_PRFCNT_INSERT, 0); + WARN_ON(extract_index != 0); + } + ack ^= GLB_REQ_PRFCNT_ENABLE_MASK; + } + + /* Process sample request */ + if ((req ^ ack) & GLB_REQ_PRFCNT_SAMPLE_MASK) { + const u32 ring_size = GLB_PRFCNT_CONFIG_SIZE_GET( + input_page_read(iface->input, GLB_PRFCNT_CONFIG)); + u32 insert_index = output_page_read(iface->output, GLB_PRFCNT_INSERT); + + const bool prev_overflow = (req ^ ack) & GLB_ACK_IRQ_MASK_PRFCNT_OVERFLOW_MASK; + const bool prev_threshold = (req ^ ack) & GLB_ACK_IRQ_MASK_PRFCNT_THRESHOLD_MASK; + + /* If ringbuffer is full toggle PRFCNT_OVERFLOW and skip sample */ + if (insert_index - extract_index >= ring_size) { + WARN_ON(insert_index - extract_index > ring_size); + if (!prev_overflow) + ack ^= GLB_ACK_IRQ_MASK_PRFCNT_OVERFLOW_MASK; + } else { + struct gpu_model_prfcnt_en enable_maps = { + .fe = input_page_read(iface->input, GLB_PRFCNT_CSF_EN), + .tiler = input_page_read(iface->input, GLB_PRFCNT_TILER_EN), + .l2 = input_page_read(iface->input, GLB_PRFCNT_MMU_L2_EN), + .shader = input_page_read(iface->input, GLB_PRFCNT_SHADER_EN), + }; + + const u64 prfcnt_base = + input_page_read(iface->input, GLB_PRFCNT_BASE_LO) + 
+ ((u64)input_page_read(iface->input, GLB_PRFCNT_BASE_HI) << 32); + + u32 *sample_base = (u32 *)(uintptr_t)prfcnt_base + + (KBASE_DUMMY_MODEL_MAX_VALUES_PER_SAMPLE * + (insert_index % ring_size)); + + /* trigger sample dump in the dummy model */ + gpu_model_prfcnt_dump_request(sample_base, enable_maps); + + /* increment insert index and toggle PRFCNT_SAMPLE bit in ACK */ + output_page_write(iface->output, GLB_PRFCNT_INSERT, ++insert_index); + ack ^= GLB_ACK_IRQ_MASK_PRFCNT_SAMPLE_MASK; + } + + /* When the ringbuffer reaches 50% capacity toggle PRFCNT_THRESHOLD */ + if (!prev_threshold && (insert_index - extract_index >= (ring_size / 2))) + ack ^= GLB_ACK_IRQ_MASK_PRFCNT_THRESHOLD_MASK; + } + + /* Update GLB_ACK */ + output_page_write(iface->output, GLB_ACK, ack); +} + +void kbase_csf_ring_doorbell(struct kbase_device *kbdev, int doorbell_nr) +{ + WARN_ON(doorbell_nr < 0); + WARN_ON(doorbell_nr >= CSF_NUM_DOORBELL); + + if (WARN_ON(!kbdev)) + return; + + if (doorbell_nr == CSF_KERNEL_DOORBELL_NR) { + csf_doorbell_prfcnt(kbdev); + gpu_model_glb_request_job_irq(kbdev->model); + } +} +EXPORT_SYMBOL(kbase_csf_ring_doorbell); + +/** * handle_internal_firmware_fatal - Handler for CS internal firmware fault. * * @kbdev: Pointer to kbase device @@ -631,17 +716,80 @@ static void enable_gpu_idle_timer(struct kbase_device *const kbdev) kbdev->csf.gpu_idle_dur_count); } +static bool global_debug_request_complete(struct kbase_device *const kbdev, u32 const req_mask) +{ + struct kbase_csf_global_iface *global_iface = &kbdev->csf.global_iface; + bool complete = false; + unsigned long flags; + + kbase_csf_scheduler_spin_lock(kbdev, &flags); + + if ((kbase_csf_firmware_global_output(global_iface, GLB_DEBUG_ACK) & req_mask) == + (kbase_csf_firmware_global_input_read(global_iface, GLB_DEBUG_REQ) & req_mask)) + complete = true; + + kbase_csf_scheduler_spin_unlock(kbdev, flags); + + return complete; +} + +static void set_global_debug_request(const struct kbase_csf_global_iface *const global_iface, + u32 const req_mask) +{ + u32 glb_debug_req; + + kbase_csf_scheduler_spin_lock_assert_held(global_iface->kbdev); + + glb_debug_req = kbase_csf_firmware_global_output(global_iface, GLB_DEBUG_ACK); + glb_debug_req ^= req_mask; + + kbase_csf_firmware_global_input_mask(global_iface, GLB_DEBUG_REQ, glb_debug_req, req_mask); +} + +static void request_fw_core_dump( + const struct kbase_csf_global_iface *const global_iface) +{ + uint32_t run_mode = GLB_DEBUG_REQ_RUN_MODE_SET(0, GLB_DEBUG_RUN_MODE_TYPE_CORE_DUMP); + + set_global_debug_request(global_iface, GLB_DEBUG_REQ_DEBUG_RUN_MASK | run_mode); + + set_global_request(global_iface, GLB_REQ_DEBUG_CSF_REQ_MASK); +} + +int kbase_csf_firmware_req_core_dump(struct kbase_device *const kbdev) +{ + const struct kbase_csf_global_iface *const global_iface = + &kbdev->csf.global_iface; + unsigned long flags; + int ret; + + /* Serialize CORE_DUMP requests. */ + mutex_lock(&kbdev->csf.reg_lock); + + /* Update GLB_REQ with CORE_DUMP request and make firmware act on it. */ + kbase_csf_scheduler_spin_lock(kbdev, &flags); + request_fw_core_dump(global_iface); + kbase_csf_ring_doorbell(kbdev, CSF_KERNEL_DOORBELL_NR); + kbase_csf_scheduler_spin_unlock(kbdev, flags); + + /* Wait for firmware to acknowledge completion of the CORE_DUMP request. 
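+ * The debug request is edge-triggered: set_global_debug_request() toggles the
+ * DEBUG_RUN bit in GLB_DEBUG_REQ away from the current GLB_DEBUG_ACK value and
+ * the firmware is expected to mirror it back, so global_debug_request_complete()
+ * reports completion once the masked REQ and ACK bits match again (illustrative
+ * summary of the helpers above, not part of the original patch).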
*/ + ret = wait_for_global_request(kbdev, GLB_REQ_DEBUG_CSF_REQ_MASK); + if (!ret) + WARN_ON(!global_debug_request_complete(kbdev, GLB_DEBUG_REQ_DEBUG_RUN_MASK)); + + mutex_unlock(&kbdev->csf.reg_lock); + + return ret; +} + static void global_init(struct kbase_device *const kbdev, u64 core_mask) { - u32 const ack_irq_mask = GLB_ACK_IRQ_MASK_CFG_ALLOC_EN_MASK | - GLB_ACK_IRQ_MASK_PING_MASK | - GLB_ACK_IRQ_MASK_CFG_PROGRESS_TIMER_MASK | - GLB_ACK_IRQ_MASK_PROTM_ENTER_MASK | - GLB_ACK_IRQ_MASK_FIRMWARE_CONFIG_UPDATE_MASK | - GLB_ACK_IRQ_MASK_PROTM_EXIT_MASK | - GLB_ACK_IRQ_MASK_CFG_PWROFF_TIMER_MASK | - GLB_ACK_IRQ_MASK_IDLE_EVENT_MASK | - GLB_ACK_IRQ_MASK_IDLE_ENABLE_MASK; + u32 const ack_irq_mask = + GLB_ACK_IRQ_MASK_CFG_ALLOC_EN_MASK | GLB_ACK_IRQ_MASK_PING_MASK | + GLB_ACK_IRQ_MASK_CFG_PROGRESS_TIMER_MASK | GLB_ACK_IRQ_MASK_PROTM_ENTER_MASK | + GLB_ACK_IRQ_MASK_PROTM_EXIT_MASK | GLB_ACK_IRQ_MASK_FIRMWARE_CONFIG_UPDATE_MASK | + GLB_ACK_IRQ_MASK_CFG_PWROFF_TIMER_MASK | GLB_ACK_IRQ_MASK_IDLE_EVENT_MASK | + GLB_ACK_IRQ_MASK_IDLE_ENABLE_MASK | GLB_REQ_DEBUG_CSF_REQ_MASK; const struct kbase_csf_global_iface *const global_iface = &kbdev->csf.global_iface; @@ -655,11 +803,14 @@ static void global_init(struct kbase_device *const kbdev, u64 core_mask) set_timeout_global(global_iface, kbase_csf_timeout_get(kbdev)); +#ifndef CONFIG_MALI_HOST_CONTROLS_SC_RAILS /* The GPU idle timer is always enabled for simplicity. Checks will be * done before scheduling the GPU idle worker to see if it is * appropriate for the current power policy. */ enable_gpu_idle_timer(kbdev); +#endif + /* Unmask the interrupts */ kbase_csf_firmware_global_input(global_iface, @@ -785,7 +936,7 @@ void kbase_csf_firmware_reload_completed(struct kbase_device *kbdev) kbase_pm_update_state(kbdev); } -static u32 convert_dur_to_idle_count(struct kbase_device *kbdev, const u32 dur_ms) +static u32 convert_dur_to_idle_count(struct kbase_device *kbdev, const u32 dur_ms, u32 *modifier) { #define HYSTERESIS_VAL_UNIT_SHIFT (10) /* Get the cntfreq_el0 value, which drives the SYSTEM_TIMESTAMP */ @@ -803,14 +954,17 @@ static u32 convert_dur_to_idle_count(struct kbase_device *kbdev, const u32 dur_m dev_warn(kbdev->dev, "No GPU clock, unexpected intregration issue!"); spin_unlock(&kbdev->pm.clk_rtm.lock); - dev_info(kbdev->dev, "Can't get the timestamp frequency, " - "use cycle counter format with firmware idle hysteresis!"); + dev_info( + kbdev->dev, + "Can't get the timestamp frequency, use cycle counter format with firmware idle hysteresis!"); } /* Formula for dur_val = ((dur_ms/1000) * freq_HZ) >> 10) */ dur_val = (dur_val * freq) >> HYSTERESIS_VAL_UNIT_SHIFT; dur_val = div_u64(dur_val, 1000); + *modifier = 0; + /* Interface limits the value field to S32_MAX */ cnt_val_u32 = (dur_val > S32_MAX) ? 
S32_MAX : (u32)dur_val; @@ -832,7 +986,7 @@ u32 kbase_csf_firmware_get_gpu_idle_hysteresis_time(struct kbase_device *kbdev) u32 dur; kbase_csf_scheduler_spin_lock(kbdev, &flags); - dur = kbdev->csf.gpu_idle_hysteresis_ms; + dur = kbdev->csf.gpu_idle_hysteresis_ns; kbase_csf_scheduler_spin_unlock(kbdev, flags); return dur; @@ -841,7 +995,9 @@ u32 kbase_csf_firmware_get_gpu_idle_hysteresis_time(struct kbase_device *kbdev) u32 kbase_csf_firmware_set_gpu_idle_hysteresis_time(struct kbase_device *kbdev, u32 dur) { unsigned long flags; - const u32 hysteresis_val = convert_dur_to_idle_count(kbdev, dur); + u32 modifier = 0; + + const u32 hysteresis_val = convert_dur_to_idle_count(kbdev, dur, &modifier); /* The 'fw_load_lock' is taken to synchronize against the deferred * loading of FW, where the idle timer will be enabled. @@ -849,46 +1005,77 @@ u32 kbase_csf_firmware_set_gpu_idle_hysteresis_time(struct kbase_device *kbdev, mutex_lock(&kbdev->fw_load_lock); if (unlikely(!kbdev->csf.firmware_inited)) { kbase_csf_scheduler_spin_lock(kbdev, &flags); - kbdev->csf.gpu_idle_hysteresis_ms = dur; + kbdev->csf.gpu_idle_hysteresis_ns = dur; kbdev->csf.gpu_idle_dur_count = hysteresis_val; + kbdev->csf.gpu_idle_dur_count_modifier = modifier; kbase_csf_scheduler_spin_unlock(kbdev, flags); mutex_unlock(&kbdev->fw_load_lock); goto end; } mutex_unlock(&kbdev->fw_load_lock); + if (kbase_reset_gpu_prevent_and_wait(kbdev)) { + dev_warn(kbdev->dev, + "Failed to prevent GPU reset when updating idle_hysteresis_time"); + return kbdev->csf.gpu_idle_dur_count; + } + kbase_csf_scheduler_pm_active(kbdev); - if (kbase_csf_scheduler_wait_mcu_active(kbdev)) { + if (kbase_csf_scheduler_killable_wait_mcu_active(kbdev)) { dev_err(kbdev->dev, "Unable to activate the MCU, the idle hysteresis value shall remain unchanged"); kbase_csf_scheduler_pm_idle(kbdev); + kbase_reset_gpu_allow(kbdev); + return kbdev->csf.gpu_idle_dur_count; } +#ifndef CONFIG_MALI_HOST_CONTROLS_SC_RAILS /* The 'reg_lock' is also taken and is held till the update is not * complete, to ensure the update of idle timer value by multiple Users * gets serialized. */ mutex_lock(&kbdev->csf.reg_lock); - /* The firmware only reads the new idle timer value when the timer is - * disabled. - */ - kbase_csf_scheduler_spin_lock(kbdev, &flags); - kbase_csf_firmware_disable_gpu_idle_timer(kbdev); - kbase_csf_scheduler_spin_unlock(kbdev, flags); - /* Ensure that the request has taken effect */ - wait_for_global_request(kbdev, GLB_REQ_IDLE_DISABLE_MASK); +#endif - kbase_csf_scheduler_spin_lock(kbdev, &flags); - kbdev->csf.gpu_idle_hysteresis_ms = dur; - kbdev->csf.gpu_idle_dur_count = hysteresis_val; - kbase_csf_firmware_enable_gpu_idle_timer(kbdev); - kbase_csf_scheduler_spin_unlock(kbdev, flags); - wait_for_global_request(kbdev, GLB_REQ_IDLE_ENABLE_MASK); +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + kbase_csf_scheduler_lock(kbdev); + if (kbdev->csf.scheduler.gpu_idle_fw_timer_enabled) { +#endif /* CONFIG_MALI_HOST_CONTROLS_SC_RAILS */ + /* The firmware only reads the new idle timer value when the timer is + * disabled. 
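+ * The sequence below therefore disables the idle timer, waits for the
+ * IDLE_DISABLE request to be acknowledged, publishes the new duration count
+ * and modifier under the scheduler spinlock, then re-enables the timer and
+ * waits for the IDLE_ENABLE acknowledgement (descriptive note, not part of
+ * the original patch).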
+ */ + kbase_csf_scheduler_spin_lock(kbdev, &flags); + kbase_csf_firmware_disable_gpu_idle_timer(kbdev); + kbase_csf_scheduler_spin_unlock(kbdev, flags); + /* Ensure that the request has taken effect */ + wait_for_global_request(kbdev, GLB_REQ_IDLE_DISABLE_MASK); + + kbase_csf_scheduler_spin_lock(kbdev, &flags); + kbdev->csf.gpu_idle_hysteresis_us = dur; + kbdev->csf.gpu_idle_dur_count = hysteresis_val; + kbdev->csf.gpu_idle_dur_count_modifier = modifier; + kbase_csf_firmware_enable_gpu_idle_timer(kbdev); + kbase_csf_scheduler_spin_unlock(kbdev, flags); + wait_for_global_request(kbdev, GLB_REQ_IDLE_ENABLE_MASK); +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + } else { + /* Record the new values. Would be used later when timer is + * enabled + */ + kbase_csf_scheduler_spin_lock(kbdev, &flags); + kbdev->csf.gpu_idle_hysteresis_us = dur; + kbdev->csf.gpu_idle_dur_count = hysteresis_val; + kbdev->csf.gpu_idle_dur_count_modifier = modifier; + kbase_csf_scheduler_spin_unlock(kbdev, flags); + } + kbase_csf_scheduler_unlock(kbdev); +#else mutex_unlock(&kbdev->csf.reg_lock); +#endif kbase_csf_scheduler_pm_idle(kbdev); - + kbase_reset_gpu_allow(kbdev); end: dev_dbg(kbdev->dev, "CSF set firmware idle hysteresis count-value: 0x%.8x", hysteresis_val); @@ -896,9 +1083,9 @@ end: return hysteresis_val; } -static u32 convert_dur_to_core_pwroff_count(struct kbase_device *kbdev, const u32 dur_us) +static u32 convert_dur_to_core_pwroff_count(struct kbase_device *kbdev, const u32 dur_us, + u32 *modifier) { -#define PWROFF_VAL_UNIT_SHIFT (10) /* Get the cntfreq_el0 value, which drives the SYSTEM_TIMESTAMP */ u64 freq = arch_timer_get_cntfrq(); u64 dur_val = dur_us; @@ -914,14 +1101,17 @@ static u32 convert_dur_to_core_pwroff_count(struct kbase_device *kbdev, const u3 dev_warn(kbdev->dev, "No GPU clock, unexpected integration issue!"); spin_unlock(&kbdev->pm.clk_rtm.lock); - dev_info(kbdev->dev, "Can't get the timestamp frequency, " - "use cycle counter with MCU Core Poweroff timer!"); + dev_info( + kbdev->dev, + "Can't get the timestamp frequency, use cycle counter with MCU shader Core Poweroff timer!"); } /* Formula for dur_val = ((dur_us/1e6) * freq_HZ) >> 10) */ dur_val = (dur_val * freq) >> HYSTERESIS_VAL_UNIT_SHIFT; dur_val = div_u64(dur_val, 1000000); + *modifier = 0; + /* Interface limits the value field to S32_MAX */ cnt_val_u32 = (dur_val > S32_MAX) ? 
S32_MAX : (u32)dur_val; @@ -939,24 +1129,39 @@ static u32 convert_dur_to_core_pwroff_count(struct kbase_device *kbdev, const u3 u32 kbase_csf_firmware_get_mcu_core_pwroff_time(struct kbase_device *kbdev) { - return kbdev->csf.mcu_core_pwroff_dur_us; + u32 pwroff; + unsigned long flags; + + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); + pwroff = kbdev->csf.mcu_core_pwroff_dur_ns; + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + + return pwroff; } u32 kbase_csf_firmware_set_mcu_core_pwroff_time(struct kbase_device *kbdev, u32 dur) { unsigned long flags; - const u32 pwroff = convert_dur_to_core_pwroff_count(kbdev, dur); + u32 modifier = 0; + + const u32 pwroff = convert_dur_to_core_pwroff_count(kbdev, dur, &modifier); spin_lock_irqsave(&kbdev->hwaccess_lock, flags); - kbdev->csf.mcu_core_pwroff_dur_us = dur; + kbdev->csf.mcu_core_pwroff_dur_ns = dur; kbdev->csf.mcu_core_pwroff_dur_count = pwroff; + kbdev->csf.mcu_core_pwroff_dur_count_modifier = modifier; spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); - dev_dbg(kbdev->dev, "MCU Core Poweroff input update: 0x%.8x", pwroff); + dev_dbg(kbdev->dev, "MCU shader Core Poweroff input update: 0x%.8x", pwroff); return pwroff; } +u32 kbase_csf_firmware_reset_mcu_core_pwroff_time(struct kbase_device *kbdev) +{ + return kbase_csf_firmware_set_mcu_core_pwroff_time(kbdev, DEFAULT_GLB_PWROFF_TIMEOUT_NS); +} + int kbase_csf_firmware_early_init(struct kbase_device *kbdev) { init_waitqueue_head(&kbdev->csf.event_wait); @@ -965,29 +1170,46 @@ int kbase_csf_firmware_early_init(struct kbase_device *kbdev) kbdev->csf.fw_timeout_ms = kbase_get_timeout_ms(kbdev, CSF_FIRMWARE_TIMEOUT); - kbdev->csf.gpu_idle_hysteresis_ms = FIRMWARE_IDLE_HYSTERESIS_TIME_MS; -#ifdef KBASE_PM_RUNTIME - if (kbase_pm_gpu_sleep_allowed(kbdev)) - kbdev->csf.gpu_idle_hysteresis_ms /= - FIRMWARE_IDLE_HYSTERESIS_GPU_SLEEP_SCALER; -#endif - WARN_ON(!kbdev->csf.gpu_idle_hysteresis_ms); - kbdev->csf.gpu_idle_dur_count = convert_dur_to_idle_count( - kbdev, kbdev->csf.gpu_idle_hysteresis_ms); - + kbase_csf_firmware_reset_mcu_core_pwroff_time(kbdev); INIT_LIST_HEAD(&kbdev->csf.firmware_interfaces); INIT_LIST_HEAD(&kbdev->csf.firmware_config); INIT_LIST_HEAD(&kbdev->csf.firmware_trace_buffers.list); + INIT_LIST_HEAD(&kbdev->csf.user_reg.list); INIT_WORK(&kbdev->csf.firmware_reload_work, kbase_csf_firmware_reload_worker); INIT_WORK(&kbdev->csf.fw_error_work, firmware_error_worker); + init_rwsem(&kbdev->csf.pmode_sync_sem); mutex_init(&kbdev->csf.reg_lock); + kbase_csf_pending_gpuq_kicks_init(kbdev); + + return 0; +} + +void kbase_csf_firmware_early_term(struct kbase_device *kbdev) +{ + kbase_csf_pending_gpuq_kicks_term(kbdev); + mutex_destroy(&kbdev->csf.reg_lock); +} + +int kbase_csf_firmware_late_init(struct kbase_device *kbdev) +{ + u32 modifier = 0; + + kbdev->csf.gpu_idle_hysteresis_ns = FIRMWARE_IDLE_HYSTERESIS_TIME_NS; +#ifdef KBASE_PM_RUNTIME + if (kbase_pm_gpu_sleep_allowed(kbdev)) + kbdev->csf.gpu_idle_hysteresis_ns /= FIRMWARE_IDLE_HYSTERESIS_GPU_SLEEP_SCALER; +#endif + WARN_ON(!kbdev->csf.gpu_idle_hysteresis_ns); + kbdev->csf.gpu_idle_dur_count = + convert_dur_to_idle_count(kbdev, kbdev->csf.gpu_idle_hysteresis_ns, &modifier); + kbdev->csf.gpu_idle_dur_count_modifier = modifier; return 0; } -int kbase_csf_firmware_init(struct kbase_device *kbdev) +int kbase_csf_firmware_load_init(struct kbase_device *kbdev) { int ret; @@ -1053,11 +1275,11 @@ int kbase_csf_firmware_init(struct kbase_device *kbdev) return 0; error: - kbase_csf_firmware_term(kbdev); + 
kbase_csf_firmware_unload_term(kbdev); return ret; } -void kbase_csf_firmware_term(struct kbase_device *kbdev) +void kbase_csf_firmware_unload_term(struct kbase_device *kbdev) { cancel_work_sync(&kbdev->csf.fw_error_work); @@ -1065,12 +1287,10 @@ void kbase_csf_firmware_term(struct kbase_device *kbdev) /* NO_MALI: Don't stop firmware or unload MMU tables */ - kbase_mmu_term(kbdev, &kbdev->csf.mcu_mmu); + kbase_csf_free_dummy_user_reg_page(kbdev); kbase_csf_scheduler_term(kbdev); - kbase_csf_free_dummy_user_reg_page(kbdev); - kbase_csf_doorbell_mapping_term(kbdev); free_global_iface(kbdev); @@ -1092,12 +1312,12 @@ void kbase_csf_firmware_term(struct kbase_device *kbdev) /* NO_MALI: No trace buffers to terminate */ - mutex_destroy(&kbdev->csf.reg_lock); - /* This will also free up the region allocated for the shared interface * entry parsed from the firmware image. */ kbase_mcu_shared_interface_region_tracker_term(kbdev); + + kbase_mmu_term(kbdev, &kbdev->csf.mcu_mmu); } void kbase_csf_firmware_enable_gpu_idle_timer(struct kbase_device *kbdev) @@ -1146,8 +1366,9 @@ void kbase_csf_firmware_ping(struct kbase_device *const kbdev) kbase_csf_scheduler_spin_unlock(kbdev, flags); } -int kbase_csf_firmware_ping_wait(struct kbase_device *const kbdev) +int kbase_csf_firmware_ping_wait(struct kbase_device *const kbdev, unsigned int wait_timeout_ms) { + CSTD_UNUSED(wait_timeout_ms); kbase_csf_firmware_ping(kbdev); return wait_for_global_request(kbdev, GLB_REQ_PING_MASK); } @@ -1186,7 +1407,7 @@ void kbase_csf_enter_protected_mode(struct kbase_device *kbdev) kbase_csf_ring_doorbell(kbdev, CSF_KERNEL_DOORBELL_NR); } -void kbase_csf_wait_protected_mode_enter(struct kbase_device *kbdev) +int kbase_csf_wait_protected_mode_enter(struct kbase_device *kbdev) { int err = wait_for_global_request(kbdev, GLB_REQ_PROTM_ENTER_MASK); @@ -1194,6 +1415,8 @@ void kbase_csf_wait_protected_mode_enter(struct kbase_device *kbdev) if (kbase_prepare_to_reset_gpu(kbdev, RESET_FLAGS_NONE)) kbase_reset_gpu(kbdev); } + + return err; } void kbase_csf_firmware_trigger_mcu_halt(struct kbase_device *kbdev) @@ -1392,7 +1615,7 @@ int kbase_csf_firmware_mcu_shared_mapping_init( gpu_map_prot = KBASE_REG_MEMATTR_INDEX(AS_MEMATTR_INDEX_NON_CACHEABLE); cpu_map_prot = pgprot_writecombine(cpu_map_prot); - }; + } phys = kmalloc_array(num_pages, sizeof(*phys), GFP_KERNEL); if (!phys) @@ -1402,9 +1625,8 @@ int kbase_csf_firmware_mcu_shared_mapping_init( if (!page_list) goto page_list_alloc_error; - ret = kbase_mem_pool_alloc_pages( - &kbdev->mem_pools.small[KBASE_MEM_GROUP_CSF_FW], - num_pages, phys, false); + ret = kbase_mem_pool_alloc_pages(&kbdev->mem_pools.small[KBASE_MEM_GROUP_CSF_FW], num_pages, + phys, false, NULL); if (ret <= 0) goto phys_mem_pool_alloc_error; @@ -1415,8 +1637,7 @@ int kbase_csf_firmware_mcu_shared_mapping_init( if (!cpu_addr) goto vmap_error; - va_reg = kbase_alloc_free_region(&kbdev->csf.shared_reg_rbtree, 0, - num_pages, KBASE_REG_ZONE_MCU_SHARED); + va_reg = kbase_alloc_free_region(&kbdev->csf.mcu_shared_zone, 0, num_pages); if (!va_reg) goto va_region_alloc_error; @@ -1430,9 +1651,9 @@ int kbase_csf_firmware_mcu_shared_mapping_init( gpu_map_properties &= (KBASE_REG_GPU_RD | KBASE_REG_GPU_WR); gpu_map_properties |= gpu_map_prot; - ret = kbase_mmu_insert_pages_no_flush(kbdev, &kbdev->csf.mcu_mmu, - va_reg->start_pfn, &phys[0], num_pages, - gpu_map_properties, KBASE_MEM_GROUP_CSF_FW); + ret = kbase_mmu_insert_pages_no_flush(kbdev, &kbdev->csf.mcu_mmu, va_reg->start_pfn, + &phys[0], num_pages, gpu_map_properties, + 
KBASE_MEM_GROUP_CSF_FW, NULL, NULL); if (ret) goto mmu_insert_pages_error; diff --git a/mali_kbase/csf/mali_kbase_csf_heap_context_alloc.c b/mali_kbase/csf/mali_kbase_csf_heap_context_alloc.c index 4b3931f..7c14b8e 100644 --- a/mali_kbase/csf/mali_kbase_csf_heap_context_alloc.c +++ b/mali_kbase/csf/mali_kbase_csf_heap_context_alloc.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -23,10 +23,7 @@ #include "mali_kbase_csf_heap_context_alloc.h" /* Size of one heap context structure, in bytes. */ -#define HEAP_CTX_SIZE ((size_t)32) - -/* Total size of the GPU memory region allocated for heap contexts, in bytes. */ -#define HEAP_CTX_REGION_SIZE (MAX_TILER_HEAPS * HEAP_CTX_SIZE) +#define HEAP_CTX_SIZE ((u32)32) /** * sub_alloc - Sub-allocate a heap context from a GPU memory region @@ -38,8 +35,8 @@ static u64 sub_alloc(struct kbase_csf_heap_context_allocator *const ctx_alloc) { struct kbase_context *const kctx = ctx_alloc->kctx; - int heap_nr = 0; - size_t ctx_offset = 0; + unsigned long heap_nr = 0; + u32 ctx_offset = 0; u64 heap_gpu_va = 0; struct kbase_vmap_struct mapping; void *ctx_ptr = NULL; @@ -55,30 +52,65 @@ static u64 sub_alloc(struct kbase_csf_heap_context_allocator *const ctx_alloc) return 0; } - ctx_offset = heap_nr * HEAP_CTX_SIZE; + ctx_offset = heap_nr * ctx_alloc->heap_context_size_aligned; heap_gpu_va = ctx_alloc->gpu_va + ctx_offset; ctx_ptr = kbase_vmap_prot(kctx, heap_gpu_va, - HEAP_CTX_SIZE, KBASE_REG_CPU_WR, &mapping); + ctx_alloc->heap_context_size_aligned, KBASE_REG_CPU_WR, &mapping); if (unlikely(!ctx_ptr)) { dev_err(kctx->kbdev->dev, - "Failed to map tiler heap context %d (0x%llX)\n", + "Failed to map tiler heap context %lu (0x%llX)\n", heap_nr, heap_gpu_va); return 0; } - memset(ctx_ptr, 0, HEAP_CTX_SIZE); + memset(ctx_ptr, 0, ctx_alloc->heap_context_size_aligned); kbase_vunmap(ctx_ptr, &mapping); bitmap_set(ctx_alloc->in_use, heap_nr, 1); - dev_dbg(kctx->kbdev->dev, "Allocated tiler heap context %d (0x%llX)\n", + dev_dbg(kctx->kbdev->dev, "Allocated tiler heap context %lu (0x%llX)\n", heap_nr, heap_gpu_va); return heap_gpu_va; } /** + * evict_heap_context - Evict the data of heap context from GPU's L2 cache. + * + * @ctx_alloc: Pointer to the heap context allocator. + * @heap_gpu_va: The GPU virtual address of a heap context structure to free. + * + * This function is called when memory for the heap context is freed. It uses the + * FLUSH_PA_RANGE command to evict the data of heap context, so on older CSF GPUs + * there is nothing done. The whole GPU cache is anyways expected to be flushed + * on older GPUs when initial chunks of the heap are freed just before the memory + * for heap context is freed. 
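+ *
+ * The context's physical address is derived from its byte offset within the
+ * backing region: the offset picks a page index into the region's physical
+ * pages plus a byte offset inside that page. As an illustration (not from the
+ * original patch), assuming a 64-byte GPU L2 line, the 32-byte heap context is
+ * padded to 64 bytes, so context N starts at byte N * 64 of the region and
+ * never straddles a page boundary.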
+ */ +static void evict_heap_context(struct kbase_csf_heap_context_allocator *const ctx_alloc, + u64 const heap_gpu_va) +{ + struct kbase_context *const kctx = ctx_alloc->kctx; + u32 offset_in_bytes = (u32)(heap_gpu_va - ctx_alloc->gpu_va); + u32 offset_within_page = offset_in_bytes & ~PAGE_MASK; + u32 page_index = offset_in_bytes >> PAGE_SHIFT; + struct tagged_addr page = + kbase_get_gpu_phy_pages(ctx_alloc->region)[page_index]; + phys_addr_t heap_context_pa = as_phys_addr_t(page) + offset_within_page; + + lockdep_assert_held(&ctx_alloc->lock); + + /* There is no need to take vm_lock here as the ctx_alloc region is protected + * via a nonzero no_user_free_count. The region and the backing page can't + * disappear whilst this function is executing. Flush type is passed as FLUSH_PT + * to CLN+INV L2 only. + */ + kbase_mmu_flush_pa_range(kctx->kbdev, kctx, + heap_context_pa, ctx_alloc->heap_context_size_aligned, + KBASE_MMU_OP_FLUSH_PT); +} + +/** * sub_free - Free a heap context sub-allocated from a GPU memory region * * @ctx_alloc: Pointer to the heap context allocator. @@ -88,7 +120,7 @@ static void sub_free(struct kbase_csf_heap_context_allocator *const ctx_alloc, u64 const heap_gpu_va) { struct kbase_context *const kctx = ctx_alloc->kctx; - u64 ctx_offset = 0; + u32 ctx_offset = 0; unsigned int heap_nr = 0; lockdep_assert_held(&ctx_alloc->lock); @@ -99,13 +131,15 @@ static void sub_free(struct kbase_csf_heap_context_allocator *const ctx_alloc, if (WARN_ON(heap_gpu_va < ctx_alloc->gpu_va)) return; - ctx_offset = heap_gpu_va - ctx_alloc->gpu_va; + ctx_offset = (u32)(heap_gpu_va - ctx_alloc->gpu_va); - if (WARN_ON(ctx_offset >= HEAP_CTX_REGION_SIZE) || - WARN_ON(ctx_offset % HEAP_CTX_SIZE)) + if (WARN_ON(ctx_offset >= (ctx_alloc->region->nr_pages << PAGE_SHIFT)) || + WARN_ON(ctx_offset % ctx_alloc->heap_context_size_aligned)) return; - heap_nr = ctx_offset / HEAP_CTX_SIZE; + evict_heap_context(ctx_alloc, heap_gpu_va); + + heap_nr = ctx_offset / ctx_alloc->heap_context_size_aligned; dev_dbg(kctx->kbdev->dev, "Freed tiler heap context %d (0x%llX)\n", heap_nr, heap_gpu_va); @@ -116,12 +150,17 @@ int kbase_csf_heap_context_allocator_init( struct kbase_csf_heap_context_allocator *const ctx_alloc, struct kbase_context *const kctx) { + const u32 gpu_cache_line_size = + (1U << kctx->kbdev->gpu_props.props.l2_props.log2_line_size); + /* We cannot pre-allocate GPU memory here because the * custom VA zone may not have been created yet. 
*/ ctx_alloc->kctx = kctx; ctx_alloc->region = NULL; ctx_alloc->gpu_va = 0; + ctx_alloc->heap_context_size_aligned = + (HEAP_CTX_SIZE + gpu_cache_line_size - 1) & ~(gpu_cache_line_size - 1); mutex_init(&ctx_alloc->lock); bitmap_zero(ctx_alloc->in_use, MAX_TILER_HEAPS); @@ -142,7 +181,9 @@ void kbase_csf_heap_context_allocator_term( if (ctx_alloc->region) { kbase_gpu_vm_lock(kctx); - ctx_alloc->region->flags &= ~KBASE_REG_NO_USER_FREE; + WARN_ON(!kbase_va_region_is_no_user_free(ctx_alloc->region)); + + kbase_va_region_no_user_free_dec(ctx_alloc->region); kbase_mem_free_region(kctx, ctx_alloc->region); kbase_gpu_vm_unlock(kctx); } @@ -154,9 +195,9 @@ u64 kbase_csf_heap_context_allocator_alloc( struct kbase_csf_heap_context_allocator *const ctx_alloc) { struct kbase_context *const kctx = ctx_alloc->kctx; - u64 flags = BASE_MEM_PROT_GPU_RD | BASE_MEM_PROT_GPU_WR | - BASE_MEM_PROT_CPU_WR | BASEP_MEM_NO_USER_FREE; - u64 nr_pages = PFN_UP(HEAP_CTX_REGION_SIZE); + u64 flags = BASE_MEM_PROT_GPU_RD | BASE_MEM_PROT_GPU_WR | BASE_MEM_PROT_CPU_WR | + BASEP_MEM_NO_USER_FREE | BASE_MEM_PROT_CPU_RD; + u64 nr_pages = PFN_UP(MAX_TILER_HEAPS * ctx_alloc->heap_context_size_aligned); u64 heap_gpu_va = 0; /* Calls to this function are inherently asynchronous, with respect to @@ -164,10 +205,6 @@ u64 kbase_csf_heap_context_allocator_alloc( */ const enum kbase_caller_mmu_sync_info mmu_sync_info = CALLER_MMU_ASYNC; -#ifdef CONFIG_MALI_VECTOR_DUMP - flags |= BASE_MEM_PROT_CPU_RD; -#endif - mutex_lock(&ctx_alloc->lock); /* If the pool of heap contexts wasn't already allocated then diff --git a/mali_kbase/csf/mali_kbase_csf_kcpu.c b/mali_kbase/csf/mali_kbase_csf_kcpu.c index 5380994..0b08dba 100644 --- a/mali_kbase/csf/mali_kbase_csf_kcpu.c +++ b/mali_kbase/csf/mali_kbase_csf_kcpu.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2018-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2018-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -24,7 +24,9 @@ #include <mali_kbase_ctx_sched.h> #include "device/mali_kbase_device.h" #include "mali_kbase_csf.h" +#include "mali_kbase_csf_sync_debugfs.h" #include <linux/export.h> +#include <linux/version_compat_defs.h> #if IS_ENABLED(CONFIG_SYNC_FILE) #include "mali_kbase_fence.h" @@ -33,10 +35,14 @@ static DEFINE_SPINLOCK(kbase_csf_fence_lock); #endif +#ifdef CONFIG_MALI_FENCE_DEBUG +#define FENCE_WAIT_TIMEOUT_MS 3000 +#endif + static void kcpu_queue_process(struct kbase_kcpu_command_queue *kcpu_queue, bool drain_queue); -static void kcpu_queue_process_worker(struct work_struct *data); +static void kcpu_queue_process_worker(struct kthread_work *data); static int kbase_kcpu_map_import_prepare( struct kbase_kcpu_command_queue *kcpu_queue, @@ -51,7 +57,7 @@ static int kbase_kcpu_map_import_prepare( long i; int ret = 0; - lockdep_assert_held(&kctx->csf.kcpu_queues.lock); + lockdep_assert_held(&kcpu_queue->lock); /* Take the processes mmap lock */ down_read(kbase_mem_get_process_mmap_lock()); @@ -76,7 +82,14 @@ static int kbase_kcpu_map_import_prepare( * on the physical pages tracking object. When the last * reference to the tracking object is dropped the pages * would be unpinned if they weren't unpinned before. + * + * Region should be CPU cached: abort if it isn't. 
*/ + if (WARN_ON(!(reg->flags & KBASE_REG_CPU_CACHED))) { + ret = -EINVAL; + goto out; + } + ret = kbase_jd_user_buf_pin_pages(kctx, reg); if (ret) goto out; @@ -110,7 +123,7 @@ static int kbase_kcpu_unmap_import_prepare_internal( struct kbase_va_region *reg; int ret = 0; - lockdep_assert_held(&kctx->csf.kcpu_queues.lock); + lockdep_assert_held(&kcpu_queue->lock); kbase_gpu_vm_lock(kctx); @@ -178,7 +191,8 @@ static void kbase_jit_add_to_pending_alloc_list( &kctx->csf.kcpu_queues.jit_blocked_queues; struct kbase_kcpu_command_queue *blocked_queue; - lockdep_assert_held(&kctx->csf.kcpu_queues.lock); + lockdep_assert_held(&queue->lock); + lockdep_assert_held(&kctx->csf.kcpu_queues.jit_lock); list_for_each_entry(blocked_queue, &kctx->csf.kcpu_queues.jit_blocked_queues, @@ -223,25 +237,28 @@ static int kbase_kcpu_jit_allocate_process( u32 i; int ret; - lockdep_assert_held(&kctx->csf.kcpu_queues.lock); - - if (alloc_info->blocked) { - list_del(&queue->jit_blocked); - alloc_info->blocked = false; - } + lockdep_assert_held(&queue->lock); if (WARN_ON(!info)) return -EINVAL; + mutex_lock(&kctx->csf.kcpu_queues.jit_lock); + /* Check if all JIT IDs are not in use */ for (i = 0; i < count; i++, info++) { /* The JIT ID is still in use so fail the allocation */ if (kctx->jit_alloc[info->id]) { dev_dbg(kctx->kbdev->dev, "JIT ID still in use"); - return -EINVAL; + ret = -EINVAL; + goto fail; } } + if (alloc_info->blocked) { + list_del(&queue->jit_blocked); + alloc_info->blocked = false; + } + /* Now start the allocation loop */ for (i = 0, info = alloc_info->info; i < count; i++, info++) { /* Create a JIT allocation */ @@ -276,7 +293,7 @@ static int kbase_kcpu_jit_allocate_process( */ dev_warn_ratelimited(kctx->kbdev->dev, "JIT alloc command failed: %pK\n", cmd); ret = -ENOMEM; - goto fail; + goto fail_rollback; } /* There are pending frees for an active allocation @@ -294,7 +311,8 @@ static int kbase_kcpu_jit_allocate_process( kctx->jit_alloc[info->id] = NULL; } - return -EAGAIN; + ret = -EAGAIN; + goto fail; } /* Bind it to the user provided ID. */ @@ -310,7 +328,7 @@ static int kbase_kcpu_jit_allocate_process( KBASE_REG_CPU_WR, &mapping); if (!ptr) { ret = -ENOMEM; - goto fail; + goto fail_rollback; } reg = kctx->jit_alloc[info->id]; @@ -319,9 +337,11 @@ static int kbase_kcpu_jit_allocate_process( kbase_vunmap(kctx, &mapping); } + mutex_unlock(&kctx->csf.kcpu_queues.jit_lock); + return 0; -fail: +fail_rollback: /* Roll back completely */ for (i = 0, info = alloc_info->info; i < count; i++, info++) { /* Free the allocations that were successful. 
@@ -334,6 +354,8 @@ fail: kctx->jit_alloc[info->id] = KBASE_RESERVED_REG_JIT_ALLOC; } +fail: + mutex_unlock(&kctx->csf.kcpu_queues.jit_lock); return ret; } @@ -345,15 +367,16 @@ static int kbase_kcpu_jit_allocate_prepare( { struct kbase_context *const kctx = kcpu_queue->kctx; void __user *data = u64_to_user_ptr(alloc_info->info); - struct base_jit_alloc_info *info; + struct base_jit_alloc_info *info = NULL; u32 count = alloc_info->count; int ret = 0; u32 i; - lockdep_assert_held(&kctx->csf.kcpu_queues.lock); + lockdep_assert_held(&kcpu_queue->lock); - if (!data || count > kcpu_queue->kctx->jit_max_allocations || - count > ARRAY_SIZE(kctx->jit_alloc)) { + if ((count == 0) || (count > ARRAY_SIZE(kctx->jit_alloc)) || + (count > kcpu_queue->kctx->jit_max_allocations) || (!data) || + !kbase_mem_allow_alloc(kctx)) { ret = -EINVAL; goto out; } @@ -388,11 +411,13 @@ static int kbase_kcpu_jit_allocate_prepare( } current_command->type = BASE_KCPU_COMMAND_TYPE_JIT_ALLOC; - list_add_tail(¤t_command->info.jit_alloc.node, - &kctx->csf.kcpu_queues.jit_cmds_head); current_command->info.jit_alloc.info = info; current_command->info.jit_alloc.count = count; current_command->info.jit_alloc.blocked = false; + mutex_lock(&kctx->csf.kcpu_queues.jit_lock); + list_add_tail(¤t_command->info.jit_alloc.node, + &kctx->csf.kcpu_queues.jit_cmds_head); + mutex_unlock(&kctx->csf.kcpu_queues.jit_lock); return 0; out_free: @@ -411,7 +436,9 @@ static void kbase_kcpu_jit_allocate_finish( struct kbase_kcpu_command_queue *queue, struct kbase_kcpu_command *cmd) { - lockdep_assert_held(&queue->kctx->csf.kcpu_queues.lock); + lockdep_assert_held(&queue->lock); + + mutex_lock(&queue->kctx->csf.kcpu_queues.jit_lock); /* Remove this command from the jit_cmds_head list */ list_del(&cmd->info.jit_alloc.node); @@ -425,6 +452,8 @@ static void kbase_kcpu_jit_allocate_finish( cmd->info.jit_alloc.blocked = false; } + mutex_unlock(&queue->kctx->csf.kcpu_queues.jit_lock); + kfree(cmd->info.jit_alloc.info); } @@ -437,18 +466,17 @@ static void kbase_kcpu_jit_retry_pending_allocs(struct kbase_context *kctx) { struct kbase_kcpu_command_queue *blocked_queue; - lockdep_assert_held(&kctx->csf.kcpu_queues.lock); + lockdep_assert_held(&kctx->csf.kcpu_queues.jit_lock); /* * Reschedule all queues blocked by JIT_ALLOC commands. * NOTE: This code traverses the list of blocked queues directly. It * only works as long as the queued works are not executed at the same * time. This precondition is true since we're holding the - * kbase_csf_kcpu_queue_context.lock . + * kbase_csf_kcpu_queue_context.jit_lock . 
*/ - list_for_each_entry(blocked_queue, - &kctx->csf.kcpu_queues.jit_blocked_queues, jit_blocked) - queue_work(kctx->csf.kcpu_queues.wq, &blocked_queue->work); + list_for_each_entry(blocked_queue, &kctx->csf.kcpu_queues.jit_blocked_queues, jit_blocked) + kthread_queue_work(&blocked_queue->csf_kcpu_worker, &blocked_queue->work); } static int kbase_kcpu_jit_free_process(struct kbase_kcpu_command_queue *queue, @@ -465,7 +493,8 @@ static int kbase_kcpu_jit_free_process(struct kbase_kcpu_command_queue *queue, if (WARN_ON(!ids)) return -EINVAL; - lockdep_assert_held(&kctx->csf.kcpu_queues.lock); + lockdep_assert_held(&queue->lock); + mutex_lock(&kctx->csf.kcpu_queues.jit_lock); KBASE_TLSTREAM_TL_KBASE_ARRAY_BEGIN_KCPUQUEUE_EXECUTE_JIT_FREE_END(queue->kctx->kbdev, queue); @@ -497,9 +526,6 @@ static int kbase_kcpu_jit_free_process(struct kbase_kcpu_command_queue *queue, queue->kctx->kbdev, queue, item_err, pages_used); } - /* Free the list of ids */ - kfree(ids); - /* * Remove this command from the jit_cmds_head list and retry pending * allocations. @@ -507,6 +533,11 @@ static int kbase_kcpu_jit_free_process(struct kbase_kcpu_command_queue *queue, list_del(&cmd->info.jit_free.node); kbase_kcpu_jit_retry_pending_allocs(kctx); + mutex_unlock(&kctx->csf.kcpu_queues.jit_lock); + + /* Free the list of ids */ + kfree(ids); + return rc; } @@ -522,7 +553,7 @@ static int kbase_kcpu_jit_free_prepare( int ret; u32 i; - lockdep_assert_held(&kctx->csf.kcpu_queues.lock); + lockdep_assert_held(&kcpu_queue->lock); /* Sanity checks */ if (!count || count > ARRAY_SIZE(kctx->jit_alloc)) { @@ -568,10 +599,12 @@ static int kbase_kcpu_jit_free_prepare( } current_command->type = BASE_KCPU_COMMAND_TYPE_JIT_FREE; - list_add_tail(¤t_command->info.jit_free.node, - &kctx->csf.kcpu_queues.jit_cmds_head); current_command->info.jit_free.ids = ids; current_command->info.jit_free.count = count; + mutex_lock(&kctx->csf.kcpu_queues.jit_lock); + list_add_tail(¤t_command->info.jit_free.node, + &kctx->csf.kcpu_queues.jit_cmds_head); + mutex_unlock(&kctx->csf.kcpu_queues.jit_lock); return 0; out_free: @@ -580,6 +613,7 @@ out: return ret; } +#if IS_ENABLED(CONFIG_MALI_VECTOR_DUMP) || MALI_UNIT_TEST static int kbase_csf_queue_group_suspend_prepare( struct kbase_kcpu_command_queue *kcpu_queue, struct base_kcpu_command_group_suspend_info *suspend_buf, @@ -597,7 +631,7 @@ static int kbase_csf_queue_group_suspend_prepare( int pinned_pages = 0, ret = 0; struct kbase_va_region *reg; - lockdep_assert_held(&kctx->csf.kcpu_queues.lock); + lockdep_assert_held(&kcpu_queue->lock); if (suspend_buf->size < csg_suspend_buf_size) return -EINVAL; @@ -647,10 +681,11 @@ static int kbase_csf_queue_group_suspend_prepare( struct tagged_addr *page_array; u64 start, end, i; - if (((reg->flags & KBASE_REG_ZONE_MASK) != KBASE_REG_ZONE_SAME_VA) || - reg->nr_pages < nr_pages || - kbase_reg_current_backed_size(reg) != - reg->nr_pages) { + if ((kbase_bits_to_zone(reg->flags) != SAME_VA_ZONE) || + (kbase_reg_current_backed_size(reg) < nr_pages) || + !(reg->flags & KBASE_REG_CPU_WR) || + (reg->gpu_alloc->type != KBASE_MEM_TYPE_NATIVE) || + (kbase_is_region_shrinkable(reg)) || (kbase_va_region_is_no_user_free(reg))) { ret = -EINVAL; goto out_clean_pages; } @@ -694,14 +729,14 @@ static int kbase_csf_queue_group_suspend_process(struct kbase_context *kctx, { return kbase_csf_queue_group_suspend(kctx, sus_buf, group_handle); } +#endif static enum kbase_csf_event_callback_action event_cqs_callback(void *param) { struct kbase_kcpu_command_queue *kcpu_queue = (struct 
kbase_kcpu_command_queue *)param; - struct kbase_context *const kctx = kcpu_queue->kctx; - queue_work(kctx->csf.kcpu_queues.wq, &kcpu_queue->work); + kthread_queue_work(&kcpu_queue->csf_kcpu_worker, &kcpu_queue->work); return KBASE_CSF_EVENT_CALLBACK_KEEP; } @@ -731,7 +766,7 @@ static int kbase_kcpu_cqs_wait_process(struct kbase_device *kbdev, { u32 i; - lockdep_assert_held(&queue->kctx->csf.kcpu_queues.lock); + lockdep_assert_held(&queue->lock); if (WARN_ON(!cqs_wait->objs)) return -EINVAL; @@ -748,7 +783,7 @@ static int kbase_kcpu_cqs_wait_process(struct kbase_device *kbdev, KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_WAIT_START(kbdev, queue); queue->command_started = true; - KBASE_KTRACE_ADD_CSF_KCPU(kbdev, CQS_WAIT_START, + KBASE_KTRACE_ADD_CSF_KCPU(kbdev, KCPU_CQS_WAIT_START, queue, cqs_wait->nr_objs, 0); } @@ -759,23 +794,24 @@ static int kbase_kcpu_cqs_wait_process(struct kbase_device *kbdev, return -EINVAL; } - sig_set = evt[BASEP_EVENT_VAL_INDEX] > cqs_wait->objs[i].val; + sig_set = + evt[BASEP_EVENT32_VAL_OFFSET / sizeof(u32)] > cqs_wait->objs[i].val; if (sig_set) { bool error = false; bitmap_set(cqs_wait->signaled, i, 1); if ((cqs_wait->inherit_err_flags & (1U << i)) && - evt[BASEP_EVENT_ERR_INDEX] > 0) { + evt[BASEP_EVENT32_ERR_OFFSET / sizeof(u32)] > 0) { queue->has_error = true; error = true; } - KBASE_KTRACE_ADD_CSF_KCPU(kbdev, CQS_WAIT_END, + KBASE_KTRACE_ADD_CSF_KCPU(kbdev, KCPU_CQS_WAIT_END, queue, cqs_wait->objs[i].addr, error); KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_WAIT_END( - kbdev, queue, evt[BASEP_EVENT_ERR_INDEX]); + kbdev, queue, evt[BASEP_EVENT32_ERR_OFFSET / sizeof(u32)]); queue->command_started = false; } @@ -792,14 +828,36 @@ static int kbase_kcpu_cqs_wait_process(struct kbase_device *kbdev, return bitmap_full(cqs_wait->signaled, cqs_wait->nr_objs); } +static inline bool kbase_kcpu_cqs_is_data_type_valid(u8 data_type) +{ + return data_type == BASEP_CQS_DATA_TYPE_U32 || data_type == BASEP_CQS_DATA_TYPE_U64; +} + +static inline bool kbase_kcpu_cqs_is_aligned(u64 addr, u8 data_type) +{ + BUILD_BUG_ON(BASEP_EVENT32_ALIGN_BYTES != BASEP_EVENT32_SIZE_BYTES); + BUILD_BUG_ON(BASEP_EVENT64_ALIGN_BYTES != BASEP_EVENT64_SIZE_BYTES); + WARN_ON(!kbase_kcpu_cqs_is_data_type_valid(data_type)); + + switch (data_type) { + default: + return false; + case BASEP_CQS_DATA_TYPE_U32: + return (addr & (BASEP_EVENT32_ALIGN_BYTES - 1)) == 0; + case BASEP_CQS_DATA_TYPE_U64: + return (addr & (BASEP_EVENT64_ALIGN_BYTES - 1)) == 0; + } +} + static int kbase_kcpu_cqs_wait_prepare(struct kbase_kcpu_command_queue *queue, struct base_kcpu_command_cqs_wait_info *cqs_wait_info, struct kbase_kcpu_command *current_command) { struct base_cqs_wait_info *objs; unsigned int nr_objs = cqs_wait_info->nr_objs; + unsigned int i; - lockdep_assert_held(&queue->kctx->csf.kcpu_queues.lock); + lockdep_assert_held(&queue->lock); if (nr_objs > BASEP_KCPU_CQS_MAX_NUM_OBJS) return -EINVAL; @@ -817,6 +875,17 @@ static int kbase_kcpu_cqs_wait_prepare(struct kbase_kcpu_command_queue *queue, return -ENOMEM; } + /* Check the CQS objects as early as possible. By checking their alignment + * (required alignment equals to size for Sync32 and Sync64 objects), we can + * prevent overrunning the supplied event page. 
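+ * For instance (illustrative, not from the original patch), a 4-byte Sync32
+ * object at a 4-byte-aligned address such as PAGE_SIZE - 4 still lies wholly
+ * within the page, whereas an address like PAGE_SIZE - 2 would straddle the
+ * page boundary and is rejected by the check below.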
+ */ + for (i = 0; i < nr_objs; i++) { + if (!kbase_kcpu_cqs_is_aligned(objs[i].addr, BASEP_CQS_DATA_TYPE_U32)) { + kfree(objs); + return -EINVAL; + } + } + if (++queue->cqs_wait_count == 1) { if (kbase_csf_event_wait_add(queue->kctx, event_cqs_callback, queue)) { @@ -853,7 +922,7 @@ static void kbase_kcpu_cqs_set_process(struct kbase_device *kbdev, { unsigned int i; - lockdep_assert_held(&queue->kctx->csf.kcpu_queues.lock); + lockdep_assert_held(&queue->lock); if (WARN_ON(!cqs_set->objs)) return; @@ -872,14 +941,13 @@ static void kbase_kcpu_cqs_set_process(struct kbase_device *kbdev, "Sync memory %llx already freed", cqs_set->objs[i].addr); queue->has_error = true; } else { - evt[BASEP_EVENT_ERR_INDEX] = queue->has_error; + evt[BASEP_EVENT32_ERR_OFFSET / sizeof(u32)] = queue->has_error; /* Set to signaled */ - evt[BASEP_EVENT_VAL_INDEX]++; + evt[BASEP_EVENT32_VAL_OFFSET / sizeof(u32)]++; kbase_phy_alloc_mapping_put(queue->kctx, mapping); - KBASE_KTRACE_ADD_CSF_KCPU(kbdev, CQS_SET, - queue, cqs_set->objs[i].addr, - evt[BASEP_EVENT_ERR_INDEX]); + KBASE_KTRACE_ADD_CSF_KCPU(kbdev, KCPU_CQS_SET, queue, cqs_set->objs[i].addr, + evt[BASEP_EVENT32_ERR_OFFSET / sizeof(u32)]); } } @@ -894,11 +962,11 @@ static int kbase_kcpu_cqs_set_prepare( struct base_kcpu_command_cqs_set_info *cqs_set_info, struct kbase_kcpu_command *current_command) { - struct kbase_context *const kctx = kcpu_queue->kctx; struct base_cqs_set *objs; unsigned int nr_objs = cqs_set_info->nr_objs; + unsigned int i; - lockdep_assert_held(&kctx->csf.kcpu_queues.lock); + lockdep_assert_held(&kcpu_queue->lock); if (nr_objs > BASEP_KCPU_CQS_MAX_NUM_OBJS) return -EINVAL; @@ -916,6 +984,17 @@ static int kbase_kcpu_cqs_set_prepare( return -ENOMEM; } + /* Check the CQS objects as early as possible. By checking their alignment + * (required alignment equals to size for Sync32 and Sync64 objects), we can + * prevent overrunning the supplied event page. 
+ */ + for (i = 0; i < nr_objs; i++) { + if (!kbase_kcpu_cqs_is_aligned(objs[i].addr, BASEP_CQS_DATA_TYPE_U32)) { + kfree(objs); + return -EINVAL; + } + } + current_command->type = BASE_KCPU_COMMAND_TYPE_CQS_SET; current_command->info.cqs_set.nr_objs = nr_objs; current_command->info.cqs_set.objs = objs; @@ -948,7 +1027,7 @@ static int kbase_kcpu_cqs_wait_operation_process(struct kbase_device *kbdev, { u32 i; - lockdep_assert_held(&queue->kctx->csf.kcpu_queues.lock); + lockdep_assert_held(&queue->lock); if (WARN_ON(!cqs_wait_operation->objs)) return -EINVAL; @@ -958,12 +1037,16 @@ static int kbase_kcpu_cqs_wait_operation_process(struct kbase_device *kbdev, if (!test_bit(i, cqs_wait_operation->signaled)) { struct kbase_vmap_struct *mapping; bool sig_set; - u64 *evt = (u64 *)kbase_phy_alloc_mapping_get(queue->kctx, - cqs_wait_operation->objs[i].addr, &mapping); + uintptr_t evt = (uintptr_t)kbase_phy_alloc_mapping_get( + queue->kctx, cqs_wait_operation->objs[i].addr, &mapping); + u64 val = 0; - /* GPUCORE-28172 RDT to review */ - if (!queue->command_started) + if (!queue->command_started) { queue->command_started = true; + KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_WAIT_OPERATION_START( + kbdev, queue); + } + if (!evt) { dev_warn(kbdev->dev, @@ -972,12 +1055,29 @@ static int kbase_kcpu_cqs_wait_operation_process(struct kbase_device *kbdev, return -EINVAL; } + switch (cqs_wait_operation->objs[i].data_type) { + default: + WARN_ON(!kbase_kcpu_cqs_is_data_type_valid( + cqs_wait_operation->objs[i].data_type)); + kbase_phy_alloc_mapping_put(queue->kctx, mapping); + queue->has_error = true; + return -EINVAL; + case BASEP_CQS_DATA_TYPE_U32: + val = *(u32 *)evt; + evt += BASEP_EVENT32_ERR_OFFSET - BASEP_EVENT32_VAL_OFFSET; + break; + case BASEP_CQS_DATA_TYPE_U64: + val = *(u64 *)evt; + evt += BASEP_EVENT64_ERR_OFFSET - BASEP_EVENT64_VAL_OFFSET; + break; + } + switch (cqs_wait_operation->objs[i].operation) { case BASEP_CQS_WAIT_OPERATION_LE: - sig_set = *evt <= cqs_wait_operation->objs[i].val; + sig_set = val <= cqs_wait_operation->objs[i].val; break; case BASEP_CQS_WAIT_OPERATION_GT: - sig_set = *evt > cqs_wait_operation->objs[i].val; + sig_set = val > cqs_wait_operation->objs[i].val; break; default: dev_dbg(kbdev->dev, @@ -989,28 +1089,15 @@ static int kbase_kcpu_cqs_wait_operation_process(struct kbase_device *kbdev, return -EINVAL; } - /* Increment evt up to the error_state value depending on the CQS data type */ - switch (cqs_wait_operation->objs[i].data_type) { - default: - dev_dbg(kbdev->dev, "Unreachable data_type=%d", cqs_wait_operation->objs[i].data_type); - /* Fallthrough - hint to compiler that there's really only 2 options at present */ - fallthrough; - case BASEP_CQS_DATA_TYPE_U32: - evt = (u64 *)((u8 *)evt + sizeof(u32)); - break; - case BASEP_CQS_DATA_TYPE_U64: - evt = (u64 *)((u8 *)evt + sizeof(u64)); - break; - } - if (sig_set) { bitmap_set(cqs_wait_operation->signaled, i, 1); if ((cqs_wait_operation->inherit_err_flags & (1U << i)) && - *evt > 0) { + *(u32 *)evt > 0) { queue->has_error = true; } - /* GPUCORE-28172 RDT to review */ + KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_WAIT_OPERATION_END( + kbdev, queue, *(u32 *)evt); queue->command_started = false; } @@ -1034,8 +1121,9 @@ static int kbase_kcpu_cqs_wait_operation_prepare(struct kbase_kcpu_command_queue { struct base_cqs_wait_operation_info *objs; unsigned int nr_objs = cqs_wait_operation_info->nr_objs; + unsigned int i; - lockdep_assert_held(&queue->kctx->csf.kcpu_queues.lock); + lockdep_assert_held(&queue->lock); if 
(nr_objs > BASEP_KCPU_CQS_MAX_NUM_OBJS) return -EINVAL; @@ -1053,6 +1141,18 @@ static int kbase_kcpu_cqs_wait_operation_prepare(struct kbase_kcpu_command_queue return -ENOMEM; } + /* Check the CQS objects as early as possible. By checking their alignment + * (required alignment equals to size for Sync32 and Sync64 objects), we can + * prevent overrunning the supplied event page. + */ + for (i = 0; i < nr_objs; i++) { + if (!kbase_kcpu_cqs_is_data_type_valid(objs[i].data_type) || + !kbase_kcpu_cqs_is_aligned(objs[i].addr, objs[i].data_type)) { + kfree(objs); + return -EINVAL; + } + } + if (++queue->cqs_wait_count == 1) { if (kbase_csf_event_wait_add(queue->kctx, event_cqs_callback, queue)) { @@ -1083,6 +1183,44 @@ static int kbase_kcpu_cqs_wait_operation_prepare(struct kbase_kcpu_command_queue return 0; } +static void kbasep_kcpu_cqs_do_set_operation_32(struct kbase_kcpu_command_queue *queue, + uintptr_t evt, u8 operation, u64 val) +{ + struct kbase_device *kbdev = queue->kctx->kbdev; + + switch (operation) { + case BASEP_CQS_SET_OPERATION_ADD: + *(u32 *)evt += (u32)val; + break; + case BASEP_CQS_SET_OPERATION_SET: + *(u32 *)evt = val; + break; + default: + dev_dbg(kbdev->dev, "Unsupported CQS set operation %d", operation); + queue->has_error = true; + break; + } +} + +static void kbasep_kcpu_cqs_do_set_operation_64(struct kbase_kcpu_command_queue *queue, + uintptr_t evt, u8 operation, u64 val) +{ + struct kbase_device *kbdev = queue->kctx->kbdev; + + switch (operation) { + case BASEP_CQS_SET_OPERATION_ADD: + *(u64 *)evt += val; + break; + case BASEP_CQS_SET_OPERATION_SET: + *(u64 *)evt = val; + break; + default: + dev_dbg(kbdev->dev, "Unsupported CQS set operation %d", operation); + queue->has_error = true; + break; + } +} + static void kbase_kcpu_cqs_set_operation_process( struct kbase_device *kbdev, struct kbase_kcpu_command_queue *queue, @@ -1090,58 +1228,49 @@ static void kbase_kcpu_cqs_set_operation_process( { unsigned int i; - lockdep_assert_held(&queue->kctx->csf.kcpu_queues.lock); + lockdep_assert_held(&queue->lock); if (WARN_ON(!cqs_set_operation->objs)) return; for (i = 0; i < cqs_set_operation->nr_objs; i++) { struct kbase_vmap_struct *mapping; - u64 *evt; + uintptr_t evt; - evt = (u64 *)kbase_phy_alloc_mapping_get( + evt = (uintptr_t)kbase_phy_alloc_mapping_get( queue->kctx, cqs_set_operation->objs[i].addr, &mapping); - /* GPUCORE-28172 RDT to review */ - if (!evt) { dev_warn(kbdev->dev, "Sync memory %llx already freed", cqs_set_operation->objs[i].addr); queue->has_error = true; } else { - switch (cqs_set_operation->objs[i].operation) { - case BASEP_CQS_SET_OPERATION_ADD: - *evt += cqs_set_operation->objs[i].val; - break; - case BASEP_CQS_SET_OPERATION_SET: - *evt = cqs_set_operation->objs[i].val; - break; - default: - dev_dbg(kbdev->dev, - "Unsupported CQS set operation %d", cqs_set_operation->objs[i].operation); - queue->has_error = true; - break; - } + struct base_cqs_set_operation_info *obj = &cqs_set_operation->objs[i]; - /* Increment evt up to the error_state value depending on the CQS data type */ - switch (cqs_set_operation->objs[i].data_type) { + switch (obj->data_type) { default: - dev_dbg(kbdev->dev, "Unreachable data_type=%d", cqs_set_operation->objs[i].data_type); - /* Fallthrough - hint to compiler that there's really only 2 options at present */ - fallthrough; + WARN_ON(!kbase_kcpu_cqs_is_data_type_valid(obj->data_type)); + queue->has_error = true; + goto skip_err_propagation; case BASEP_CQS_DATA_TYPE_U32: - evt = (u64 *)((u8 *)evt + sizeof(u32)); + 
kbasep_kcpu_cqs_do_set_operation_32(queue, evt, obj->operation, + obj->val); + evt += BASEP_EVENT32_ERR_OFFSET - BASEP_EVENT32_VAL_OFFSET; break; case BASEP_CQS_DATA_TYPE_U64: - evt = (u64 *)((u8 *)evt + sizeof(u64)); + kbasep_kcpu_cqs_do_set_operation_64(queue, evt, obj->operation, + obj->val); + evt += BASEP_EVENT64_ERR_OFFSET - BASEP_EVENT64_VAL_OFFSET; break; } - /* GPUCORE-28172 RDT to review */ + KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_SET_OPERATION( + kbdev, queue, *(u32 *)evt ? 1 : 0); /* Always propagate errors */ - *evt = queue->has_error; + *(u32 *)evt = queue->has_error; +skip_err_propagation: kbase_phy_alloc_mapping_put(queue->kctx, mapping); } } @@ -1157,11 +1286,11 @@ static int kbase_kcpu_cqs_set_operation_prepare( struct base_kcpu_command_cqs_set_operation_info *cqs_set_operation_info, struct kbase_kcpu_command *current_command) { - struct kbase_context *const kctx = kcpu_queue->kctx; struct base_cqs_set_operation_info *objs; unsigned int nr_objs = cqs_set_operation_info->nr_objs; + unsigned int i; - lockdep_assert_held(&kctx->csf.kcpu_queues.lock); + lockdep_assert_held(&kcpu_queue->lock); if (nr_objs > BASEP_KCPU_CQS_MAX_NUM_OBJS) return -EINVAL; @@ -1179,6 +1308,18 @@ static int kbase_kcpu_cqs_set_operation_prepare( return -ENOMEM; } + /* Check the CQS objects as early as possible. By checking their alignment + * (required alignment equals to size for Sync32 and Sync64 objects), we can + * prevent overrunning the supplied event page. + */ + for (i = 0; i < nr_objs; i++) { + if (!kbase_kcpu_cqs_is_data_type_valid(objs[i].data_type) || + !kbase_kcpu_cqs_is_aligned(objs[i].addr, objs[i].data_type)) { + kfree(objs); + return -EINVAL; + } + } + current_command->type = BASE_KCPU_COMMAND_TYPE_CQS_SET_OPERATION; current_command->info.cqs_set_operation.nr_objs = nr_objs; current_command->info.cqs_set_operation.objs = objs; @@ -1200,20 +1341,24 @@ static void kbase_csf_fence_wait_callback(struct dma_fence *fence, struct kbase_kcpu_command_queue *kcpu_queue = fence_info->kcpu_queue; struct kbase_context *const kctx = kcpu_queue->kctx; - KBASE_KTRACE_ADD_CSF_KCPU(kctx->kbdev, FENCE_WAIT_END, kcpu_queue, +#ifdef CONFIG_MALI_FENCE_DEBUG + /* Fence gets signaled. Deactivate the timer for fence-wait timeout */ + del_timer(&kcpu_queue->fence_timeout); +#endif + + KBASE_KTRACE_ADD_CSF_KCPU(kctx->kbdev, KCPU_FENCE_WAIT_END, kcpu_queue, fence->context, fence->seqno); /* Resume kcpu command queue processing. */ - queue_work(kctx->csf.kcpu_queues.wq, &kcpu_queue->work); + kthread_queue_work(&kcpu_queue->csf_kcpu_worker, &kcpu_queue->work); } -static void kbase_kcpu_fence_wait_cancel( - struct kbase_kcpu_command_queue *kcpu_queue, - struct kbase_kcpu_command_fence_info *fence_info) +static void kbasep_kcpu_fence_wait_cancel(struct kbase_kcpu_command_queue *kcpu_queue, + struct kbase_kcpu_command_fence_info *fence_info) { struct kbase_context *const kctx = kcpu_queue->kctx; - lockdep_assert_held(&kctx->csf.kcpu_queues.lock); + lockdep_assert_held(&kcpu_queue->lock); if (WARN_ON(!fence_info->fence)) return; @@ -1222,8 +1367,15 @@ static void kbase_kcpu_fence_wait_cancel( bool removed = dma_fence_remove_callback(fence_info->fence, &fence_info->fence_cb); +#ifdef CONFIG_MALI_FENCE_DEBUG + /* Fence-wait cancelled or fence signaled. In the latter case + * the timer would already have been deactivated inside + * kbase_csf_fence_wait_callback(). 
+ */ + del_timer_sync(&kcpu_queue->fence_timeout); +#endif if (removed) - KBASE_KTRACE_ADD_CSF_KCPU(kctx->kbdev, FENCE_WAIT_END, + KBASE_KTRACE_ADD_CSF_KCPU(kctx->kbdev, KCPU_FENCE_WAIT_END, kcpu_queue, fence_info->fence->context, fence_info->fence->seqno); } @@ -1235,6 +1387,80 @@ static void kbase_kcpu_fence_wait_cancel( fence_info->fence = NULL; } +#ifdef CONFIG_MALI_FENCE_DEBUG +/** + * fence_timeout_callback() - Timeout callback function for fence-wait + * + * @timer: Timer struct + * + * Context and seqno of the timed-out fence will be displayed in dmesg. + * If the fence has been signalled a work will be enqueued to process + * the fence-wait without displaying debugging information. + */ +static void fence_timeout_callback(struct timer_list *timer) +{ + struct kbase_kcpu_command_queue *kcpu_queue = + container_of(timer, struct kbase_kcpu_command_queue, fence_timeout); + struct kbase_context *const kctx = kcpu_queue->kctx; + struct kbase_kcpu_command *cmd = &kcpu_queue->commands[kcpu_queue->start_offset]; + struct kbase_kcpu_command_fence_info *fence_info; +#if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) + struct fence *fence; +#else + struct dma_fence *fence; +#endif + struct kbase_sync_fence_info info; + + if (cmd->type != BASE_KCPU_COMMAND_TYPE_FENCE_WAIT) { + dev_err(kctx->kbdev->dev, + "%s: Unexpected command type %d in ctx:%d_%d kcpu queue:%u", __func__, + cmd->type, kctx->tgid, kctx->id, kcpu_queue->id); + return; + } + + fence_info = &cmd->info.fence; + + fence = kbase_fence_get(fence_info); + if (!fence) { + dev_err(kctx->kbdev->dev, "no fence found in ctx:%d_%d kcpu queue:%u", kctx->tgid, + kctx->id, kcpu_queue->id); + return; + } + + kbase_sync_fence_info_get(fence, &info); + + if (info.status == 1) { + kthread_queue_work(&kcpu_queue->csf_kcpu_worker, &kcpu_queue->work); + } else if (info.status == 0) { + dev_warn(kctx->kbdev->dev, "fence has not yet signalled in %ums", + FENCE_WAIT_TIMEOUT_MS); + dev_warn(kctx->kbdev->dev, + "ctx:%d_%d kcpu queue:%u still waiting for fence[%pK] context#seqno:%s", + kctx->tgid, kctx->id, kcpu_queue->id, fence, info.name); + } else { + dev_warn(kctx->kbdev->dev, "fence has got error"); + dev_warn(kctx->kbdev->dev, + "ctx:%d_%d kcpu queue:%u faulty fence[%pK] context#seqno:%s error(%d)", + kctx->tgid, kctx->id, kcpu_queue->id, fence, info.name, info.status); + } + + kbase_fence_put(fence); +} + +/** + * fence_wait_timeout_start() - Start a timer to check fence-wait timeout + * + * @cmd: KCPU command queue + * + * Activate a timer to check whether a fence-wait command in the queue + * gets completed within FENCE_WAIT_TIMEOUT_MS + */ +static void fence_wait_timeout_start(struct kbase_kcpu_command_queue *cmd) +{ + mod_timer(&cmd->fence_timeout, jiffies + msecs_to_jiffies(FENCE_WAIT_TIMEOUT_MS)); +} +#endif + /** * kbase_kcpu_fence_wait_process() - Process the kcpu fence wait command * @@ -1254,8 +1480,9 @@ static int kbase_kcpu_fence_wait_process( #else struct dma_fence *fence; #endif + struct kbase_context *const kctx = kcpu_queue->kctx; - lockdep_assert_held(&kcpu_queue->kctx->csf.kcpu_queues.lock); + lockdep_assert_held(&kcpu_queue->lock); if (WARN_ON(!fence_info->fence)) return -EINVAL; @@ -1265,18 +1492,38 @@ static int kbase_kcpu_fence_wait_process( if (kcpu_queue->fence_wait_processed) { fence_status = dma_fence_get_status(fence); } else { - int cb_err = dma_fence_add_callback(fence, + int cb_err; + + KBASE_KTRACE_ADD_CSF_KCPU(kctx->kbdev, KCPU_FENCE_WAIT_START, kcpu_queue, + fence->context, fence->seqno); + + cb_err = 
dma_fence_add_callback(fence, &fence_info->fence_cb, kbase_csf_fence_wait_callback); - KBASE_KTRACE_ADD_CSF_KCPU(kcpu_queue->kctx->kbdev, - FENCE_WAIT_START, kcpu_queue, - fence->context, fence->seqno); fence_status = cb_err; - if (cb_err == 0) + if (cb_err == 0) { kcpu_queue->fence_wait_processed = true; - else if (cb_err == -ENOENT) +#ifdef CONFIG_MALI_FENCE_DEBUG + fence_wait_timeout_start(kcpu_queue); +#endif + } else if (cb_err == -ENOENT) { fence_status = dma_fence_get_status(fence); + if (!fence_status) { + struct kbase_sync_fence_info info; + + kbase_sync_fence_info_get(fence, &info); + dev_warn(kctx->kbdev->dev, + "Unexpected status for fence %s of ctx:%d_%d kcpu queue:%u", + info.name, kctx->tgid, kctx->id, kcpu_queue->id); + } + + KBASE_KTRACE_ADD_CSF_KCPU(kctx->kbdev, KCPU_FENCE_WAIT_END, kcpu_queue, + fence->context, fence->seqno); + } else { + KBASE_KTRACE_ADD_CSF_KCPU(kctx->kbdev, KCPU_FENCE_WAIT_END, kcpu_queue, + fence->context, fence->seqno); + } } /* @@ -1289,17 +1536,15 @@ static int kbase_kcpu_fence_wait_process( */ if (fence_status) - kbase_kcpu_fence_wait_cancel(kcpu_queue, fence_info); + kbasep_kcpu_fence_wait_cancel(kcpu_queue, fence_info); return fence_status; } -static int kbase_kcpu_fence_wait_prepare( - struct kbase_kcpu_command_queue *kcpu_queue, - struct base_kcpu_command_fence_info *fence_info, - struct kbase_kcpu_command *current_command) +static int kbase_kcpu_fence_wait_prepare(struct kbase_kcpu_command_queue *kcpu_queue, + struct base_kcpu_command_fence_info *fence_info, + struct kbase_kcpu_command *current_command) { - struct kbase_context *const kctx = kcpu_queue->kctx; #if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) struct fence *fence_in; #else @@ -1307,10 +1552,9 @@ static int kbase_kcpu_fence_wait_prepare( #endif struct base_fence fence; - lockdep_assert_held(&kctx->csf.kcpu_queues.lock); + lockdep_assert_held(&kcpu_queue->lock); - if (copy_from_user(&fence, u64_to_user_ptr(fence_info->fence), - sizeof(fence))) + if (copy_from_user(&fence, u64_to_user_ptr(fence_info->fence), sizeof(fence))) return -ENOMEM; fence_in = sync_file_get_fence(fence.basep.fd); @@ -1321,62 +1565,267 @@ static int kbase_kcpu_fence_wait_prepare( current_command->type = BASE_KCPU_COMMAND_TYPE_FENCE_WAIT; current_command->info.fence.fence = fence_in; current_command->info.fence.kcpu_queue = kcpu_queue; - return 0; } -static int kbase_kcpu_fence_signal_process( +/** + * fence_signal_timeout_start() - Start a timer to check enqueued fence-signal command is + * blocked for too long a duration + * + * @kcpu_queue: KCPU command queue + * + * Activate the queue's fence_signal_timeout timer to check whether a fence-signal command + * enqueued has been blocked for longer than a configured wait duration. 
+ */ +static void fence_signal_timeout_start(struct kbase_kcpu_command_queue *kcpu_queue) +{ + struct kbase_device *kbdev = kcpu_queue->kctx->kbdev; + unsigned int wait_ms = kbase_get_timeout_ms(kbdev, KCPU_FENCE_SIGNAL_TIMEOUT); + + if (atomic_read(&kbdev->fence_signal_timeout_enabled)) + mod_timer(&kcpu_queue->fence_signal_timeout, jiffies + msecs_to_jiffies(wait_ms)); +} + +static void kbase_kcpu_command_fence_force_signaled_set( + struct kbase_kcpu_command_fence_info *fence_info, + bool has_force_signaled) +{ + fence_info->fence_has_force_signaled = has_force_signaled; +} + +bool kbase_kcpu_command_fence_has_force_signaled(struct kbase_kcpu_command_fence_info *fence_info) +{ + return fence_info->fence_has_force_signaled; +} + +static int kbase_kcpu_fence_force_signal_process( struct kbase_kcpu_command_queue *kcpu_queue, struct kbase_kcpu_command_fence_info *fence_info) { struct kbase_context *const kctx = kcpu_queue->kctx; int ret; + /* already force signaled just return*/ + if (kbase_kcpu_command_fence_has_force_signaled(fence_info)) + return 0; + + if (WARN_ON(!fence_info->fence)) + return -EINVAL; + + ret = dma_fence_signal(fence_info->fence); + if (unlikely(ret < 0)) { + dev_warn(kctx->kbdev->dev, "dma_fence(%d) has been signalled already\n", ret); + /* Treated as a success */ + ret = 0; + } + + KBASE_KTRACE_ADD_CSF_KCPU(kctx->kbdev, KCPU_FENCE_SIGNAL, kcpu_queue, + fence_info->fence->context, + fence_info->fence->seqno); + +#if (KERNEL_VERSION(5, 1, 0) > LINUX_VERSION_CODE) + dev_info(kctx->kbdev->dev, + "ctx:%d_%d kcpu queue[%pK]:%u signal fence[%pK] context#seqno:%llu#%u\n", + kctx->tgid, kctx->id, kcpu_queue, kcpu_queue->id, fence_info->fence, + fence_info->fence->context, fence_info->fence->seqno); +#else + dev_info(kctx->kbdev->dev, + "ctx:%d_%d kcpu queue[%pK]:%u signal fence[%pK] context#seqno:%llu#%llu\n", + kctx->tgid, kctx->id, kcpu_queue, kcpu_queue->id, fence_info->fence, + fence_info->fence->context, fence_info->fence->seqno); +#endif + + /* dma_fence refcount needs to be decreased to release it. */ + dma_fence_put(fence_info->fence); + fence_info->fence = NULL; + + return ret; +} + +static void kcpu_force_signal_fence(struct kbase_kcpu_command_queue *kcpu_queue) +{ + int status; + int i; +#if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) + struct fence *fence; +#else + struct dma_fence *fence; +#endif + struct kbase_context *const kctx = kcpu_queue->kctx; +#ifdef CONFIG_MALI_FENCE_DEBUG + int del; +#endif + + /* Force trigger all pending fence-signal commands */ + for (i = 0; i != kcpu_queue->num_pending_cmds; ++i) { + struct kbase_kcpu_command *cmd = + &kcpu_queue->commands[(u8)(kcpu_queue->start_offset + i)]; + + if (cmd->type == BASE_KCPU_COMMAND_TYPE_FENCE_SIGNAL) { + /* If a fence had already force-signalled previously, + * just skip it in this round of force signalling. 
+ */ + if (kbase_kcpu_command_fence_has_force_signaled(&cmd->info.fence)) + continue; + + fence = kbase_fence_get(&cmd->info.fence); + + dev_info(kctx->kbdev->dev, "kbase KCPU[%pK] cmd%d fence[%pK] force signaled\n", + kcpu_queue, i+1, fence); + + /* Set the ETIMEDOUT error flag before signalling the fence */ + dma_fence_set_error_helper(fence, -ETIMEDOUT); + + /* Force-signal the fence */ + status = kbase_kcpu_fence_force_signal_process( + kcpu_queue, &cmd->info.fence); + if (status < 0) + dev_err(kctx->kbdev->dev, "kbase signal failed\n"); + else + kbase_kcpu_command_fence_force_signaled_set(&cmd->info.fence, true); + + kcpu_queue->has_error = true; + } + } + + /* Set fence_signal_pending_cnt to 0 + * and delete the kcpu_queue's fence-signal timer, + * because all the pending fences in the queue have been signalled + */ + atomic_set(&kcpu_queue->fence_signal_pending_cnt, 0); +#ifdef CONFIG_MALI_FENCE_DEBUG + del = del_timer_sync(&kcpu_queue->fence_signal_timeout); + dev_info(kctx->kbdev->dev, "kbase KCPU [%pK] delete fence signal timeout timer ret: %d", + kcpu_queue, del); +#else + del_timer_sync(&kcpu_queue->fence_signal_timeout); +#endif +} + +static void kcpu_queue_force_fence_signal(struct kbase_kcpu_command_queue *kcpu_queue) +{ + struct kbase_context *const kctx = kcpu_queue->kctx; + char buff[] = "surfaceflinger"; + + /* Force-signal unsignalled fences, except for surfaceflinger */ + if (memcmp(kctx->comm, buff, sizeof(buff))) { + mutex_lock(&kcpu_queue->lock); + kcpu_force_signal_fence(kcpu_queue); + mutex_unlock(&kcpu_queue->lock); + } +} + +/** + * fence_signal_timeout_cb() - Timeout callback function for fence-signal-wait + * + * @timer: Timer struct + * + * Callback function invoked when an enqueued fence-signal command has exceeded its configured + * wait duration. At the moment it is just a simple placeholder, for other tasks to expand into an + * actual sync-state dump via a bottom-half workqueue item. + */ +static void fence_signal_timeout_cb(struct timer_list *timer) +{ + struct kbase_kcpu_command_queue *kcpu_queue = + container_of(timer, struct kbase_kcpu_command_queue, fence_signal_timeout); + struct kbase_context *const kctx = kcpu_queue->kctx; +#ifdef CONFIG_MALI_FENCE_DEBUG + dev_warn(kctx->kbdev->dev, "kbase KCPU fence signal timeout callback triggered"); +#endif + + /* If we have additional pending fence signal commands in the queue, re-arm for the + * remaining fence signal commands, and dump the work to dmesg, only if the + * global configuration option is set.
+ */ + if (atomic_read(&kctx->kbdev->fence_signal_timeout_enabled)) { + if (atomic_read(&kcpu_queue->fence_signal_pending_cnt) > 1) + fence_signal_timeout_start(kcpu_queue); + + kthread_queue_work(&kcpu_queue->csf_kcpu_worker, &kcpu_queue->timeout_work); + } +} + +static int kbasep_kcpu_fence_signal_process(struct kbase_kcpu_command_queue *kcpu_queue, + struct kbase_kcpu_command_fence_info *fence_info) +{ + struct kbase_context *const kctx = kcpu_queue->kctx; + int ret; + + /* already force signaled */ + if (kbase_kcpu_command_fence_has_force_signaled(fence_info)) + return 0; + if (WARN_ON(!fence_info->fence)) return -EINVAL; ret = dma_fence_signal(fence_info->fence); if (unlikely(ret < 0)) { - dev_warn(kctx->kbdev->dev, - "fence_signal() failed with %d\n", ret); + dev_warn(kctx->kbdev->dev, "dma_fence(%d) has been signalled already\n", ret); + /* Treated as a success */ + ret = 0; } - KBASE_KTRACE_ADD_CSF_KCPU(kctx->kbdev, FENCE_SIGNAL, kcpu_queue, + KBASE_KTRACE_ADD_CSF_KCPU(kctx->kbdev, KCPU_FENCE_SIGNAL, kcpu_queue, fence_info->fence->context, fence_info->fence->seqno); - dma_fence_put(fence_info->fence); + /* If one has multiple enqueued fence signal commands, re-arm the timer */ + if (atomic_dec_return(&kcpu_queue->fence_signal_pending_cnt) > 0) { + fence_signal_timeout_start(kcpu_queue); +#ifdef CONFIG_MALI_FENCE_DEBUG + dev_dbg(kctx->kbdev->dev, + "kbase re-arm KCPU fence signal timeout timer for next signal command"); +#endif + } else { +#ifdef CONFIG_MALI_FENCE_DEBUG + int del = del_timer_sync(&kcpu_queue->fence_signal_timeout); + + dev_dbg(kctx->kbdev->dev, "kbase KCPU delete fence signal timeout timer ret: %d", + del); + CSTD_UNUSED(del); +#else + del_timer_sync(&kcpu_queue->fence_signal_timeout); +#endif + } + + /* dma_fence refcount needs to be decreased to release it. 
*/ + kbase_fence_put(fence_info->fence); fence_info->fence = NULL; return ret; } -static int kbase_kcpu_fence_signal_prepare( - struct kbase_kcpu_command_queue *kcpu_queue, - struct base_kcpu_command_fence_info *fence_info, - struct kbase_kcpu_command *current_command) +static int kbasep_kcpu_fence_signal_init(struct kbase_kcpu_command_queue *kcpu_queue, + struct kbase_kcpu_command *current_command, + struct base_fence *fence, struct sync_file **sync_file, + int *fd) { - struct kbase_context *const kctx = kcpu_queue->kctx; #if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) struct fence *fence_out; #else struct dma_fence *fence_out; #endif - struct base_fence fence; - struct sync_file *sync_file; + struct kbase_kcpu_dma_fence *kcpu_fence; int ret = 0; - int fd; - lockdep_assert_held(&kctx->csf.kcpu_queues.lock); + lockdep_assert_held(&kcpu_queue->lock); - if (copy_from_user(&fence, u64_to_user_ptr(fence_info->fence), - sizeof(fence))) - return -EFAULT; - - fence_out = kzalloc(sizeof(*fence_out), GFP_KERNEL); - if (!fence_out) + kcpu_fence = kzalloc(sizeof(*kcpu_fence), GFP_KERNEL); + if (!kcpu_fence) return -ENOMEM; + /* Set reference to KCPU metadata */ + kcpu_fence->metadata = kcpu_queue->metadata; + + /* Set reference to KCPU metadata and increment refcount */ + kcpu_fence->metadata = kcpu_queue->metadata; + WARN_ON(!kbase_refcount_inc_not_zero(&kcpu_fence->metadata->refcount)); + +#if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) + fence_out = (struct fence *)kcpu_fence; +#else + fence_out = (struct dma_fence *)kcpu_fence; +#endif dma_fence_init(fence_out, &kbase_fence_ops, @@ -1394,55 +1843,197 @@ static int kbase_kcpu_fence_signal_prepare( #endif /* create a sync_file fd representing the fence */ - sync_file = sync_file_create(fence_out); - if (!sync_file) { -#if (KERNEL_VERSION(4, 9, 67) >= LINUX_VERSION_CODE) - dma_fence_put(fence_out); -#endif + *sync_file = sync_file_create(fence_out); + if (!(*sync_file)) { ret = -ENOMEM; goto file_create_fail; } - fd = get_unused_fd_flags(O_CLOEXEC); - if (fd < 0) { - ret = fd; + *fd = get_unused_fd_flags(O_CLOEXEC); + if (*fd < 0) { + ret = *fd; goto fd_flags_fail; } - fd_install(fd, sync_file->file); - - fence.basep.fd = fd; + fence->basep.fd = *fd; current_command->type = BASE_KCPU_COMMAND_TYPE_FENCE_SIGNAL; current_command->info.fence.fence = fence_out; + kbase_kcpu_command_fence_force_signaled_set(¤t_command->info.fence, false); + + return 0; + +fd_flags_fail: + fput((*sync_file)->file); +file_create_fail: + /* + * Upon failure, dma_fence refcount that was increased by + * dma_fence_get() or sync_file_create() needs to be decreased + * to release it. 
+ */ + kbase_fence_put(fence_out); + current_command->info.fence.fence = NULL; + + return ret; +} + +static int kbase_kcpu_fence_signal_prepare(struct kbase_kcpu_command_queue *kcpu_queue, + struct base_kcpu_command_fence_info *fence_info, + struct kbase_kcpu_command *current_command) +{ + struct base_fence fence; + struct sync_file *sync_file = NULL; + int fd; + int ret = 0; + + lockdep_assert_held(&kcpu_queue->lock); + + if (copy_from_user(&fence, u64_to_user_ptr(fence_info->fence), sizeof(fence))) + return -EFAULT; + + ret = kbasep_kcpu_fence_signal_init(kcpu_queue, current_command, &fence, &sync_file, &fd); + if (ret) + return ret; if (copy_to_user(u64_to_user_ptr(fence_info->fence), &fence, sizeof(fence))) { ret = -EFAULT; - goto fd_flags_fail; + goto fail; } + /* 'sync_file' pointer can't be safely dereferenced once 'fd' is + * installed, so the install step needs to be done at the last + * before returning success. + */ + fd_install(fd, sync_file->file); + + if (atomic_inc_return(&kcpu_queue->fence_signal_pending_cnt) == 1) + fence_signal_timeout_start(kcpu_queue); + return 0; -fd_flags_fail: +fail: fput(sync_file->file); -file_create_fail: - dma_fence_put(fence_out); + kbase_fence_put(current_command->info.fence.fence); + current_command->info.fence.fence = NULL; return ret; } + +int kbase_kcpu_fence_signal_process(struct kbase_kcpu_command_queue *kcpu_queue, + struct kbase_kcpu_command_fence_info *fence_info) +{ + if (!kcpu_queue || !fence_info) + return -EINVAL; + + return kbasep_kcpu_fence_signal_process(kcpu_queue, fence_info); +} +KBASE_EXPORT_TEST_API(kbase_kcpu_fence_signal_process); + +int kbase_kcpu_fence_signal_init(struct kbase_kcpu_command_queue *kcpu_queue, + struct kbase_kcpu_command *current_command, + struct base_fence *fence, struct sync_file **sync_file, int *fd) +{ + if (!kcpu_queue || !current_command || !fence || !sync_file || !fd) + return -EINVAL; + + return kbasep_kcpu_fence_signal_init(kcpu_queue, current_command, fence, sync_file, fd); +} +KBASE_EXPORT_TEST_API(kbase_kcpu_fence_signal_init); #endif /* CONFIG_SYNC_FILE */ -static void kcpu_queue_process_worker(struct work_struct *data) +static void kcpu_queue_dump(struct kbase_kcpu_command_queue *queue) +{ + struct kbase_context *kctx = queue->kctx; + struct kbase_kcpu_command *cmd; + struct kbase_kcpu_command_fence_info *fence_info; + struct kbase_kcpu_dma_fence *kcpu_fence; +#if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) + struct fence *fence; +#else + struct dma_fence *fence; +#endif + struct kbase_sync_fence_info info; + size_t i; + + mutex_lock(&queue->lock); + + /* Find the next fence signal command in the queue */ + for (i = 0; i != queue->num_pending_cmds; ++i) { + cmd = &queue->commands[(u8)(queue->start_offset + i)]; + if (cmd->type == BASE_KCPU_COMMAND_TYPE_FENCE_SIGNAL) { + fence_info = &cmd->info.fence; + /* find the first unforce signaled fence */ + if (!kbase_kcpu_command_fence_has_force_signaled(fence_info)) + break; + } + } + + if (i == queue->num_pending_cmds) { + dev_err(kctx->kbdev->dev, + "%s: No fence signal command found in ctx:%d_%d kcpu queue:%u", __func__, + kctx->tgid, kctx->id, queue->id); + mutex_unlock(&queue->lock); + return; + } + + + fence = kbase_fence_get(fence_info); + if (!fence) { + dev_err(kctx->kbdev->dev, "no fence found in ctx:%d_%d kcpu queue:%u", kctx->tgid, + kctx->id, queue->id); + mutex_unlock(&queue->lock); + return; + } + + kcpu_fence = kbase_kcpu_dma_fence_get(fence); + if (!kcpu_fence) { + dev_err(kctx->kbdev->dev, "no fence metadata found in ctx:%d_%d 
kcpu queue:%u", + kctx->tgid, kctx->id, queue->id); + kbase_fence_put(fence); + mutex_unlock(&queue->lock); + return; + } + + kbase_sync_fence_info_get(fence, &info); + + dev_warn(kctx->kbdev->dev, "------------------------------------------------\n"); + dev_warn(kctx->kbdev->dev, "KCPU Fence signal timeout detected for ctx:%d_%d\n", kctx->tgid, + kctx->id); + dev_warn(kctx->kbdev->dev, "------------------------------------------------\n"); + dev_warn(kctx->kbdev->dev, "Kcpu queue:%u still waiting for fence[%pK] context#seqno:%s\n", + queue->id, fence, info.name); + dev_warn(kctx->kbdev->dev, "Fence metadata timeline name: %s\n", + kcpu_fence->metadata->timeline_name); + + kbase_fence_put(fence); + mutex_unlock(&queue->lock); + + mutex_lock(&kctx->csf.kcpu_queues.lock); + kbasep_csf_sync_kcpu_dump_locked(kctx, NULL); + mutex_unlock(&kctx->csf.kcpu_queues.lock); + + dev_warn(kctx->kbdev->dev, "-----------------------------------------------\n"); +} + +static void kcpu_queue_timeout_worker(struct kthread_work *data) +{ + struct kbase_kcpu_command_queue *queue = + container_of(data, struct kbase_kcpu_command_queue, timeout_work); + + kcpu_queue_dump(queue); + + kcpu_queue_force_fence_signal(queue); +} + +static void kcpu_queue_process_worker(struct kthread_work *data) { struct kbase_kcpu_command_queue *queue = container_of(data, struct kbase_kcpu_command_queue, work); - mutex_lock(&queue->kctx->csf.kcpu_queues.lock); - + mutex_lock(&queue->lock); kcpu_queue_process(queue, false); - - mutex_unlock(&queue->kctx->csf.kcpu_queues.lock); + mutex_unlock(&queue->lock); } static int delete_queue(struct kbase_context *kctx, u32 id) @@ -1455,9 +2046,23 @@ static int delete_queue(struct kbase_context *kctx, u32 id) struct kbase_kcpu_command_queue *queue = kctx->csf.kcpu_queues.array[id]; - KBASE_KTRACE_ADD_CSF_KCPU(kctx->kbdev, KCPU_QUEUE_DESTROY, + KBASE_KTRACE_ADD_CSF_KCPU(kctx->kbdev, KCPU_QUEUE_DELETE, queue, queue->num_pending_cmds, queue->cqs_wait_count); + /* Disassociate the queue from the system to prevent further + * submissions. Draining pending commands would be acceptable + * even if a new queue is created using the same ID. + */ + kctx->csf.kcpu_queues.array[id] = NULL; + bitmap_clear(kctx->csf.kcpu_queues.in_use, id, 1); + + mutex_unlock(&kctx->csf.kcpu_queues.lock); + + mutex_lock(&queue->lock); + + /* Metadata struct may outlive KCPU queue. */ + kbase_kcpu_dma_fence_meta_put(queue->metadata); + /* Drain the remaining work for this queue first and go past * all the waits. */ @@ -1469,17 +2074,16 @@ static int delete_queue(struct kbase_context *kctx, u32 id) /* All CQS wait commands should have been cleaned up */ WARN_ON(queue->cqs_wait_count); - kctx->csf.kcpu_queues.array[id] = NULL; - bitmap_clear(kctx->csf.kcpu_queues.in_use, id, 1); - /* Fire the tracepoint with the mutex held to enforce correct * ordering with the summary stream. 
*/ KBASE_TLSTREAM_TL_KBASE_DEL_KCPUQUEUE(kctx->kbdev, queue); - mutex_unlock(&kctx->csf.kcpu_queues.lock); + mutex_unlock(&queue->lock); + + kbase_destroy_kworker_stack(&queue->csf_kcpu_worker); - cancel_work_sync(&queue->work); + mutex_destroy(&queue->lock); kfree(queue); } else { @@ -1546,7 +2150,7 @@ static void kcpu_queue_process(struct kbase_kcpu_command_queue *queue, bool process_next = true; size_t i; - lockdep_assert_held(&queue->kctx->csf.kcpu_queues.lock); + lockdep_assert_held(&queue->lock); for (i = 0; i != queue->num_pending_cmds; ++i) { struct kbase_kcpu_command *cmd = @@ -1564,8 +2168,7 @@ static void kcpu_queue_process(struct kbase_kcpu_command_queue *queue, status = 0; #if IS_ENABLED(CONFIG_SYNC_FILE) if (drain_queue) { - kbase_kcpu_fence_wait_cancel(queue, - &cmd->info.fence); + kbasep_kcpu_fence_wait_cancel(queue, &cmd->info.fence); } else { status = kbase_kcpu_fence_wait_process(queue, &cmd->info.fence); @@ -1595,8 +2198,7 @@ static void kcpu_queue_process(struct kbase_kcpu_command_queue *queue, status = 0; #if IS_ENABLED(CONFIG_SYNC_FILE) - status = kbase_kcpu_fence_signal_process( - queue, &cmd->info.fence); + status = kbasep_kcpu_fence_signal_process(queue, &cmd->info.fence); if (status < 0) queue->has_error = true; @@ -1668,10 +2270,10 @@ static void kcpu_queue_process(struct kbase_kcpu_command_queue *queue, KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_EXECUTE_MAP_IMPORT_START(kbdev, queue); - kbase_gpu_vm_lock(queue->kctx); - meta = kbase_sticky_resource_acquire( - queue->kctx, cmd->info.import.gpu_va); - kbase_gpu_vm_unlock(queue->kctx); + kbase_gpu_vm_lock_with_pmode_sync(queue->kctx); + meta = kbase_sticky_resource_acquire(queue->kctx, + cmd->info.import.gpu_va); + kbase_gpu_vm_unlock_with_pmode_sync(queue->kctx); if (meta == NULL) { queue->has_error = true; @@ -1690,10 +2292,10 @@ static void kcpu_queue_process(struct kbase_kcpu_command_queue *queue, KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_EXECUTE_UNMAP_IMPORT_START(kbdev, queue); - kbase_gpu_vm_lock(queue->kctx); - ret = kbase_sticky_resource_release( - queue->kctx, NULL, cmd->info.import.gpu_va); - kbase_gpu_vm_unlock(queue->kctx); + kbase_gpu_vm_lock_with_pmode_sync(queue->kctx); + ret = kbase_sticky_resource_release(queue->kctx, NULL, + cmd->info.import.gpu_va); + kbase_gpu_vm_unlock_with_pmode_sync(queue->kctx); if (!ret) { queue->has_error = true; @@ -1711,10 +2313,10 @@ static void kcpu_queue_process(struct kbase_kcpu_command_queue *queue, KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_EXECUTE_UNMAP_IMPORT_FORCE_START(kbdev, queue); - kbase_gpu_vm_lock(queue->kctx); - ret = kbase_sticky_resource_release_force( - queue->kctx, NULL, cmd->info.import.gpu_va); - kbase_gpu_vm_unlock(queue->kctx); + kbase_gpu_vm_lock_with_pmode_sync(queue->kctx); + ret = kbase_sticky_resource_release_force(queue->kctx, NULL, + cmd->info.import.gpu_va); + kbase_gpu_vm_unlock_with_pmode_sync(queue->kctx); if (!ret) { queue->has_error = true; @@ -1756,7 +2358,7 @@ static void kcpu_queue_process(struct kbase_kcpu_command_queue *queue, break; } - case BASE_KCPU_COMMAND_TYPE_JIT_FREE: + case BASE_KCPU_COMMAND_TYPE_JIT_FREE: { KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_EXECUTE_JIT_FREE_START(kbdev, queue); status = kbase_kcpu_jit_free_process(queue, cmd); @@ -1766,6 +2368,8 @@ static void kcpu_queue_process(struct kbase_kcpu_command_queue *queue, KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_EXECUTE_JIT_FREE_END( kbdev, queue); break; + } +#if IS_ENABLED(CONFIG_MALI_VECTOR_DUMP) || MALI_UNIT_TEST case BASE_KCPU_COMMAND_TYPE_GROUP_SUSPEND: { struct kbase_suspend_copy_buffer *sus_buf = 
cmd->info.suspend_buf_copy.sus_buf; @@ -1777,29 +2381,31 @@ static void kcpu_queue_process(struct kbase_kcpu_command_queue *queue, status = kbase_csf_queue_group_suspend_process( queue->kctx, sus_buf, cmd->info.suspend_buf_copy.group_handle); + if (status) queue->has_error = true; KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_EXECUTE_GROUP_SUSPEND_END( kbdev, queue, status); + } - if (!sus_buf->cpu_alloc) { - int i; + if (!sus_buf->cpu_alloc) { + int i; - for (i = 0; i < sus_buf->nr_pages; i++) - put_page(sus_buf->pages[i]); - } else { - kbase_mem_phy_alloc_kernel_unmapped( - sus_buf->cpu_alloc); - kbase_mem_phy_alloc_put( - sus_buf->cpu_alloc); - } + for (i = 0; i < sus_buf->nr_pages; i++) + put_page(sus_buf->pages[i]); + } else { + kbase_mem_phy_alloc_kernel_unmapped( + sus_buf->cpu_alloc); + kbase_mem_phy_alloc_put( + sus_buf->cpu_alloc); } kfree(sus_buf->pages); kfree(sus_buf); break; } +#endif default: dev_dbg(kbdev->dev, "Unrecognized command type"); @@ -1874,12 +2480,29 @@ static void KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_COMMAND( } case BASE_KCPU_COMMAND_TYPE_CQS_WAIT_OPERATION: { - /* GPUCORE-28172 RDT to review */ + const struct base_cqs_wait_operation_info *waits = + cmd->info.cqs_wait_operation.objs; + u32 inherit_err_flags = cmd->info.cqs_wait_operation.inherit_err_flags; + unsigned int i; + + for (i = 0; i < cmd->info.cqs_wait_operation.nr_objs; i++) { + KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_CQS_WAIT_OPERATION( + kbdev, queue, waits[i].addr, waits[i].val, + waits[i].operation, waits[i].data_type, + (inherit_err_flags & ((uint32_t)1 << i)) ? 1 : 0); + } break; } case BASE_KCPU_COMMAND_TYPE_CQS_SET_OPERATION: { - /* GPUCORE-28172 RDT to review */ + const struct base_cqs_set_operation_info *sets = cmd->info.cqs_set_operation.objs; + unsigned int i; + + for (i = 0; i < cmd->info.cqs_set_operation.nr_objs; i++) { + KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_CQS_SET_OPERATION( + kbdev, queue, sets[i].addr, sets[i].val, + sets[i].operation, sets[i].data_type); + } break; } case BASE_KCPU_COMMAND_TYPE_ERROR_BARRIER: @@ -1926,11 +2549,13 @@ static void KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_COMMAND( KBASE_TLSTREAM_TL_KBASE_ARRAY_END_KCPUQUEUE_ENQUEUE_JIT_FREE(kbdev, queue); break; } +#if IS_ENABLED(CONFIG_MALI_VECTOR_DUMP) || MALI_UNIT_TEST case BASE_KCPU_COMMAND_TYPE_GROUP_SUSPEND: KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_GROUP_SUSPEND( kbdev, queue, cmd->info.suspend_buf_copy.sus_buf, cmd->info.suspend_buf_copy.group_handle); break; +#endif default: dev_dbg(kbdev->dev, "Unknown command type %u", cmd->type); break; @@ -1947,9 +2572,11 @@ int kbase_csf_kcpu_queue_enqueue(struct kbase_context *kctx, /* The offset to the first command that is being processed or yet to * be processed is of u8 type, so the number of commands inside the - * queue cannot be more than 256. + * queue cannot be more than 256. The current implementation expects + * exactly 256, any other size will require the addition of wrapping + * logic. */ - BUILD_BUG_ON(KBASEP_KCPU_QUEUE_SIZE > 256); + BUILD_BUG_ON(KBASEP_KCPU_QUEUE_SIZE != 256); /* Whilst the backend interface allows enqueueing multiple commands in * a single operation, the Base interface does not expose any mechanism @@ -1964,14 +2591,30 @@ int kbase_csf_kcpu_queue_enqueue(struct kbase_context *kctx, return -EINVAL; } + /* There might be a race between one thread trying to enqueue commands to the queue + * and other thread trying to delete the same queue. 
+ * This racing could lead to use-after-free problem by enqueuing thread if + * resources for the queue has already been freed by deleting thread. + * + * To prevent the issue, two mutexes are acquired/release asymmetrically as follows. + * + * Lock A (kctx mutex) + * Lock B (queue mutex) + * Unlock A + * Unlock B + * + * With the kctx mutex being held, enqueuing thread will check the queue + * and will return error code if the queue had already been deleted. + */ mutex_lock(&kctx->csf.kcpu_queues.lock); - - if (!kctx->csf.kcpu_queues.array[enq->id]) { - ret = -EINVAL; - goto out; - } - queue = kctx->csf.kcpu_queues.array[enq->id]; + if (queue == NULL) { + dev_dbg(kctx->kbdev->dev, "Invalid KCPU queue (id:%u)", enq->id); + mutex_unlock(&kctx->csf.kcpu_queues.lock); + return -EINVAL; + } + mutex_lock(&queue->lock); + mutex_unlock(&kctx->csf.kcpu_queues.lock); if (kcpu_queue_get_space(queue) < enq->nr_commands) { ret = -EBUSY; @@ -1986,7 +2629,7 @@ int kbase_csf_kcpu_queue_enqueue(struct kbase_context *kctx, * for the possibility to roll back. */ - for (i = 0; (i != enq->nr_commands) && !ret; ++i, ++kctx->csf.kcpu_queues.num_cmds) { + for (i = 0; (i != enq->nr_commands) && !ret; ++i) { struct kbase_kcpu_command *kcpu_cmd = &queue->commands[(u8)(queue->start_offset + queue->num_pending_cmds + i)]; struct base_kcpu_command command; @@ -2009,7 +2652,7 @@ int kbase_csf_kcpu_queue_enqueue(struct kbase_context *kctx, } } - kcpu_cmd->enqueue_ts = kctx->csf.kcpu_queues.num_cmds; + kcpu_cmd->enqueue_ts = atomic64_inc_return(&kctx->csf.kcpu_queues.cmd_seq_num); switch (command.type) { case BASE_KCPU_COMMAND_TYPE_FENCE_WAIT: #if IS_ENABLED(CONFIG_SYNC_FILE) @@ -2069,11 +2712,13 @@ int kbase_csf_kcpu_queue_enqueue(struct kbase_context *kctx, ret = kbase_kcpu_jit_free_prepare(queue, &command.info.jit_free, kcpu_cmd); break; +#if IS_ENABLED(CONFIG_MALI_VECTOR_DUMP) || MALI_UNIT_TEST case BASE_KCPU_COMMAND_TYPE_GROUP_SUSPEND: ret = kbase_csf_queue_group_suspend_prepare(queue, &command.info.suspend_buf_copy, kcpu_cmd); break; +#endif default: dev_dbg(queue->kctx->kbdev->dev, "Unknown command type %u", command.type); @@ -2097,13 +2742,10 @@ int kbase_csf_kcpu_queue_enqueue(struct kbase_context *kctx, queue->num_pending_cmds += enq->nr_commands; kcpu_queue_process(queue, false); - } else { - /* Roll back the number of enqueued commands */ - kctx->csf.kcpu_queues.num_cmds -= i; } out: - mutex_unlock(&kctx->csf.kcpu_queues.lock); + mutex_unlock(&queue->lock); return ret; } @@ -2117,14 +2759,9 @@ int kbase_csf_kcpu_queue_context_init(struct kbase_context *kctx) for (idx = 0; idx < KBASEP_MAX_KCPU_QUEUES; ++idx) kctx->csf.kcpu_queues.array[idx] = NULL; - kctx->csf.kcpu_queues.wq = alloc_workqueue("mali_kbase_csf_kcpu", - WQ_UNBOUND | WQ_HIGHPRI, 0); - if (!kctx->csf.kcpu_queues.wq) - return -ENOMEM; - mutex_init(&kctx->csf.kcpu_queues.lock); - kctx->csf.kcpu_queues.num_cmds = 0; + atomic64_set(&kctx->csf.kcpu_queues.cmd_seq_num, 0); return 0; } @@ -2142,9 +2779,9 @@ void kbase_csf_kcpu_queue_context_term(struct kbase_context *kctx) (void)delete_queue(kctx, id); } - destroy_workqueue(kctx->csf.kcpu_queues.wq); mutex_destroy(&kctx->csf.kcpu_queues.lock); } +KBASE_EXPORT_TEST_API(kbase_csf_kcpu_queue_context_term); int kbase_csf_kcpu_queue_delete(struct kbase_context *kctx, struct kbase_ioctl_kcpu_queue_delete *del) @@ -2157,8 +2794,11 @@ int kbase_csf_kcpu_queue_new(struct kbase_context *kctx, { struct kbase_kcpu_command_queue *queue; int idx; + int n; int ret = 0; - +#if IS_ENABLED(CONFIG_SYNC_FILE) + struct 
kbase_kcpu_dma_fence_meta *metadata; +#endif /* The queue id is of u8 type and we use the index of the kcpu_queues * array as an id, so the number of elements in the array can't be * more than 256. @@ -2186,8 +2826,14 @@ int kbase_csf_kcpu_queue_new(struct kbase_context *kctx, goto out; } - bitmap_set(kctx->csf.kcpu_queues.in_use, idx, 1); - kctx->csf.kcpu_queues.array[idx] = queue; + ret = kbase_kthread_run_worker_rt(kctx->kbdev, &queue->csf_kcpu_worker, "csf_kcpu_%i", idx); + + if (ret) { + kfree(queue); + goto out; + } + + mutex_init(&queue->lock); queue->kctx = kctx; queue->start_offset = 0; queue->num_pending_cmds = 0; @@ -2195,12 +2841,37 @@ int kbase_csf_kcpu_queue_new(struct kbase_context *kctx, queue->fence_context = dma_fence_context_alloc(1); queue->fence_seqno = 0; queue->fence_wait_processed = false; -#endif + + metadata = kzalloc(sizeof(*metadata), GFP_KERNEL); + if (!metadata) { + kbase_destroy_kworker_stack(&queue->csf_kcpu_worker); + kfree(queue); + ret = -ENOMEM; + goto out; + } + + metadata->kbdev = kctx->kbdev; + metadata->kctx_id = kctx->id; + n = snprintf(metadata->timeline_name, MAX_TIMELINE_NAME, "%d-%d_%d-%lld-kcpu", + kctx->kbdev->id, kctx->tgid, kctx->id, queue->fence_context); + if (WARN_ON(n >= MAX_TIMELINE_NAME)) { + kbase_destroy_kworker_stack(&queue->csf_kcpu_worker); + kfree(queue); + kfree(metadata); + ret = -EINVAL; + goto out; + } + + kbase_refcount_set(&metadata->refcount, 1); + queue->metadata = metadata; + atomic_inc(&kctx->kbdev->live_fence_metadata); +#endif /* CONFIG_SYNC_FILE */ queue->enqueue_failed = false; queue->command_started = false; INIT_LIST_HEAD(&queue->jit_blocked); queue->has_error = false; - INIT_WORK(&queue->work, kcpu_queue_process_worker); + kthread_init_work(&queue->work, kcpu_queue_process_worker); + kthread_init_work(&queue->timeout_work, kcpu_queue_timeout_worker); queue->id = idx; newq->id = idx; @@ -2211,10 +2882,103 @@ int kbase_csf_kcpu_queue_new(struct kbase_context *kctx, KBASE_TLSTREAM_TL_KBASE_NEW_KCPUQUEUE(kctx->kbdev, queue, queue->id, kctx->id, queue->num_pending_cmds); - KBASE_KTRACE_ADD_CSF_KCPU(kctx->kbdev, KCPU_QUEUE_NEW, queue, + KBASE_KTRACE_ADD_CSF_KCPU(kctx->kbdev, KCPU_QUEUE_CREATE, queue, queue->fence_context, 0); +#ifdef CONFIG_MALI_FENCE_DEBUG + kbase_timer_setup(&queue->fence_timeout, fence_timeout_callback); +#endif + +#if IS_ENABLED(CONFIG_SYNC_FILE) + atomic_set(&queue->fence_signal_pending_cnt, 0); + kbase_timer_setup(&queue->fence_signal_timeout, fence_signal_timeout_cb); +#endif + bitmap_set(kctx->csf.kcpu_queues.in_use, idx, 1); + kctx->csf.kcpu_queues.array[idx] = queue; out: mutex_unlock(&kctx->csf.kcpu_queues.lock); return ret; } +KBASE_EXPORT_TEST_API(kbase_csf_kcpu_queue_new); + +int kbase_csf_kcpu_queue_halt_timers(struct kbase_device *kbdev) +{ + struct kbase_context *kctx; + + list_for_each_entry(kctx, &kbdev->kctx_list, kctx_list_link) { + unsigned long queue_idx; + struct kbase_csf_kcpu_queue_context *kcpu_ctx = &kctx->csf.kcpu_queues; + + mutex_lock(&kcpu_ctx->lock); + + for_each_set_bit(queue_idx, kcpu_ctx->in_use, KBASEP_MAX_KCPU_QUEUES) { + struct kbase_kcpu_command_queue *kcpu_queue = kcpu_ctx->array[queue_idx]; + + if (unlikely(!kcpu_queue)) + continue; + + mutex_lock(&kcpu_queue->lock); + + if (atomic_read(&kcpu_queue->fence_signal_pending_cnt)) { + int ret = del_timer_sync(&kcpu_queue->fence_signal_timeout); + + dev_dbg(kbdev->dev, + "Fence signal timeout on KCPU queue(%lu), kctx (%d_%d) was %s on suspend", + queue_idx, kctx->tgid, kctx->id, + ret ? 
"pending" : "not pending"); + } + +#ifdef CONFIG_MALI_FENCE_DEBUG + if (kcpu_queue->fence_wait_processed) { + int ret = del_timer_sync(&kcpu_queue->fence_timeout); + + dev_dbg(kbdev->dev, + "Fence wait timeout on KCPU queue(%lu), kctx (%d_%d) was %s on suspend", + queue_idx, kctx->tgid, kctx->id, + ret ? "pending" : "not pending"); + } +#endif + mutex_unlock(&kcpu_queue->lock); + } + mutex_unlock(&kcpu_ctx->lock); + } + return 0; +} + +void kbase_csf_kcpu_queue_resume_timers(struct kbase_device *kbdev) +{ + struct kbase_context *kctx; + + list_for_each_entry(kctx, &kbdev->kctx_list, kctx_list_link) { + unsigned long queue_idx; + struct kbase_csf_kcpu_queue_context *kcpu_ctx = &kctx->csf.kcpu_queues; + + mutex_lock(&kcpu_ctx->lock); + + for_each_set_bit(queue_idx, kcpu_ctx->in_use, KBASEP_MAX_KCPU_QUEUES) { + struct kbase_kcpu_command_queue *kcpu_queue = kcpu_ctx->array[queue_idx]; + + if (unlikely(!kcpu_queue)) + continue; + + mutex_lock(&kcpu_queue->lock); +#ifdef CONFIG_MALI_FENCE_DEBUG + if (kcpu_queue->fence_wait_processed) { + fence_wait_timeout_start(kcpu_queue); + dev_dbg(kbdev->dev, + "Fence wait timeout on KCPU queue(%lu), kctx (%d_%d) has been resumed on system resume", + queue_idx, kctx->tgid, kctx->id); + } +#endif + if (atomic_read(&kbdev->fence_signal_timeout_enabled) && + atomic_read(&kcpu_queue->fence_signal_pending_cnt)) { + fence_signal_timeout_start(kcpu_queue); + dev_dbg(kbdev->dev, + "Fence signal timeout on KCPU queue(%lu), kctx (%d_%d) has been resumed on system resume", + queue_idx, kctx->tgid, kctx->id); + } + mutex_unlock(&kcpu_queue->lock); + } + mutex_unlock(&kcpu_ctx->lock); + } +} diff --git a/mali_kbase/csf/mali_kbase_csf_kcpu.h b/mali_kbase/csf/mali_kbase_csf_kcpu.h index 2216cb7..4a8d937 100644 --- a/mali_kbase/csf/mali_kbase_csf_kcpu.h +++ b/mali_kbase/csf/mali_kbase_csf_kcpu.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2018-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2018-2023 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -22,6 +22,9 @@ #ifndef _KBASE_CSF_KCPU_H_ #define _KBASE_CSF_KCPU_H_ +#include <mali_kbase_fence.h> +#include <mali_kbase_sync.h> + #if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) #include <linux/fence.h> #else @@ -44,12 +47,13 @@ struct kbase_kcpu_command_import_info { }; /** - * struct kbase_kcpu_command_fence_info - Structure which holds information - * about the fence object enqueued in the kcpu command queue + * struct kbase_kcpu_command_fence_info - Structure which holds information about the + * fence object enqueued in the kcpu command queue * - * @fence_cb: Fence callback - * @fence: Fence - * @kcpu_queue: kcpu command queue + * @fence_cb: Fence callback + * @fence: Fence + * @kcpu_queue: kcpu command queue + * @fence_has_force_signaled: fence has forced signaled after fence timeouted */ struct kbase_kcpu_command_fence_info { #if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) @@ -60,6 +64,7 @@ struct kbase_kcpu_command_fence_info { struct dma_fence *fence; #endif /* LINUX_VERSION_CODE < KERNEL_VERSION(4, 10, 0) */ struct kbase_kcpu_command_queue *kcpu_queue; + bool fence_has_force_signaled; }; /** @@ -183,8 +188,9 @@ struct kbase_suspend_copy_buffer { struct kbase_mem_phy_alloc *cpu_alloc; }; +#if IS_ENABLED(CONFIG_MALI_VECTOR_DUMP) || MALI_UNIT_TEST /** - * struct base_kcpu_command_group_suspend - structure which contains + * struct kbase_kcpu_command_group_suspend_info - structure which contains * suspend buffer data captured for a suspended queue group. * * @sus_buf: Pointer to the structure which contains details of the @@ -195,10 +201,11 @@ struct kbase_kcpu_command_group_suspend_info { struct kbase_suspend_copy_buffer *sus_buf; u8 group_handle; }; +#endif /** - * struct kbase_cpu_command - Command which is to be part of the kernel + * struct kbase_kcpu_command - Command which is to be part of the kernel * command queue * * @type: Type of the command. @@ -229,20 +236,28 @@ struct kbase_kcpu_command { struct kbase_kcpu_command_import_info import; struct kbase_kcpu_command_jit_alloc_info jit_alloc; struct kbase_kcpu_command_jit_free_info jit_free; +#if IS_ENABLED(CONFIG_MALI_VECTOR_DUMP) || MALI_UNIT_TEST struct kbase_kcpu_command_group_suspend_info suspend_buf_copy; +#endif } info; }; /** * struct kbase_kcpu_command_queue - a command queue executed by the kernel * + * @lock: Lock to protect accesses to this queue. * @kctx: The context to which this command queue belongs. * @commands: Array of commands which have been successfully * enqueued to this command queue. - * @work: struct work_struct which contains a pointer to + * @csf_kcpu_worker: Dedicated worker for processing kernel CPU command + * queues. + * @work: struct kthread_work which contains a pointer to * the function which handles processing of kcpu * commands enqueued into a kcpu command queue; * part of kernel API for processing workqueues + * @timeout_work: struct kthread_work which contains a pointer to the + * function which handles post-timeout actions + * queue when a fence signal timeout occurs. * @start_offset: Index of the command to be executed next * @id: KCPU command queue ID. * @num_pending_cmds: The number of commands enqueued but not yet @@ -271,11 +286,20 @@ struct kbase_kcpu_command { * or without errors since last cleaned. * @jit_blocked: Used to keep track of command queues blocked * by a pending JIT allocation command. 
+ * @fence_timeout: Timer used to detect the fence wait timeout. + * @metadata: Metadata structure containing basic information about + * this queue for any fence objects associated with this queue. + * @fence_signal_timeout: Timer used for detect a fence signal command has + * been blocked for too long. + * @fence_signal_pending_cnt: Enqueued fence signal commands in the queue. */ struct kbase_kcpu_command_queue { + struct mutex lock; struct kbase_context *kctx; struct kbase_kcpu_command commands[KBASEP_KCPU_QUEUE_SIZE]; - struct work_struct work; + struct kthread_worker csf_kcpu_worker; + struct kthread_work work; + struct kthread_work timeout_work; u8 start_offset; u8 id; u16 num_pending_cmds; @@ -287,6 +311,14 @@ struct kbase_kcpu_command_queue { bool command_started; struct list_head jit_blocked; bool has_error; +#ifdef CONFIG_MALI_FENCE_DEBUG + struct timer_list fence_timeout; +#endif /* CONFIG_MALI_FENCE_DEBUG */ +#if IS_ENABLED(CONFIG_SYNC_FILE) + struct kbase_kcpu_dma_fence_meta *metadata; +#endif /* CONFIG_SYNC_FILE */ + struct timer_list fence_signal_timeout; + atomic_t fence_signal_pending_cnt; }; /** @@ -351,4 +383,42 @@ int kbase_csf_kcpu_queue_context_init(struct kbase_context *kctx); */ void kbase_csf_kcpu_queue_context_term(struct kbase_context *kctx); +#if IS_ENABLED(CONFIG_SYNC_FILE) +/* Test wrappers for dma fence operations. */ +int kbase_kcpu_fence_signal_process(struct kbase_kcpu_command_queue *kcpu_queue, + struct kbase_kcpu_command_fence_info *fence_info); + +int kbase_kcpu_fence_signal_init(struct kbase_kcpu_command_queue *kcpu_queue, + struct kbase_kcpu_command *current_command, + struct base_fence *fence, struct sync_file **sync_file, int *fd); +#endif /* CONFIG_SYNC_FILE */ + +/* + * kbase_csf_kcpu_queue_halt_timers - Halt the KCPU fence timers associated with + * the kbase device. + * + * @kbdev: Kbase device + * + * Note that this function assumes that the caller has ensured that the + * kbase_device::kctx_list does not get updated during this function's runtime. + * At the moment, the function is only safe to call during system suspend, when + * the device PM active count has reached zero. + * + * Return: 0 on success, negative value otherwise. + */ +int kbase_csf_kcpu_queue_halt_timers(struct kbase_device *kbdev); + +/* + * kbase_csf_kcpu_queue_resume_timers - Resume the KCPU fence timers associated + * with the kbase device. + * + * @kbdev: Kbase device + * + * Note that this function assumes that the caller has ensured that the + * kbase_device::kctx_list does not get updated during this function's runtime. + * At the moment, the function is only safe to call during system resume. + */ +void kbase_csf_kcpu_queue_resume_timers(struct kbase_device *kbdev); + +bool kbase_kcpu_command_fence_has_force_signaled(struct kbase_kcpu_command_fence_info *fence_info); #endif /* _KBASE_CSF_KCPU_H_ */ diff --git a/mali_kbase/csf/mali_kbase_csf_kcpu_debugfs.c b/mali_kbase/csf/mali_kbase_csf_kcpu_debugfs.c index 0a2cde0..fa87777 100644 --- a/mali_kbase/csf/mali_kbase_csf_kcpu_debugfs.c +++ b/mali_kbase/csf/mali_kbase_csf_kcpu_debugfs.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2022 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -30,7 +30,7 @@ #if IS_ENABLED(CONFIG_DEBUG_FS) /** - * kbasep_csf_kcpu_debugfs_print_queue() - Print additional info for KCPU + * kbasep_csf_kcpu_debugfs_print_cqs_waits() - Print additional info for KCPU * queues blocked on CQS wait commands. * * @file: The seq_file to print to @@ -167,11 +167,7 @@ static const struct file_operations kbasep_csf_kcpu_debugfs_fops = { void kbase_csf_kcpu_debugfs_init(struct kbase_context *kctx) { struct dentry *file; -#if (KERNEL_VERSION(4, 7, 0) <= LINUX_VERSION_CODE) const mode_t mode = 0444; -#else - const mode_t mode = 0400; -#endif if (WARN_ON(!kctx || IS_ERR_OR_NULL(kctx->kctx_dentry))) return; diff --git a/mali_kbase/csf/mali_kbase_csf_kcpu_fence_debugfs.c b/mali_kbase/csf/mali_kbase_csf_kcpu_fence_debugfs.c new file mode 100644 index 0000000..cd55f62 --- /dev/null +++ b/mali_kbase/csf/mali_kbase_csf_kcpu_fence_debugfs.c @@ -0,0 +1,151 @@ +// SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note +/* + * + * (C) COPYRIGHT 2018-2023 ARM Limited. All rights reserved. + * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. 
+ * + */ +#include <linux/fs.h> +#include <linux/version.h> +#include <linux/module.h> +#if IS_ENABLED(CONFIG_DEBUG_FS) +#include <linux/debugfs.h> +#endif + +#include <mali_kbase.h> +#include <csf/mali_kbase_csf_kcpu_fence_debugfs.h> +#include <mali_kbase_hwaccess_time.h> + +#define BUF_SIZE 10 + +#if IS_ENABLED(CONFIG_DEBUG_FS) +static ssize_t kbase_csf_kcpu_queue_fence_signal_enabled_get(struct file *file, char __user *buf, + size_t count, loff_t *ppos) +{ + int ret; + struct kbase_device *kbdev = file->private_data; + + if (atomic_read(&kbdev->fence_signal_timeout_enabled)) + ret = simple_read_from_buffer(buf, count, ppos, "1\n", 2); + else + ret = simple_read_from_buffer(buf, count, ppos, "0\n", 2); + + return ret; +}; + +static ssize_t kbase_csf_kcpu_queue_fence_signal_enabled_set(struct file *file, + const char __user *buf, size_t count, + loff_t *ppos) +{ + int ret; + unsigned int enabled; + struct kbase_device *kbdev = file->private_data; + + ret = kstrtouint_from_user(buf, count, 10, &enabled); + if (ret < 0) + return ret; + + atomic_set(&kbdev->fence_signal_timeout_enabled, enabled); + + return count; +} + +static const struct file_operations kbase_csf_kcpu_queue_fence_signal_fops = { + .owner = THIS_MODULE, + .read = kbase_csf_kcpu_queue_fence_signal_enabled_get, + .write = kbase_csf_kcpu_queue_fence_signal_enabled_set, + .open = simple_open, + .llseek = default_llseek, +}; + +static ssize_t kbase_csf_kcpu_queue_fence_signal_timeout_get(struct file *file, char __user *buf, + size_t count, loff_t *ppos) +{ + int size; + char buffer[BUF_SIZE]; + struct kbase_device *kbdev = file->private_data; + unsigned int timeout_ms = kbase_get_timeout_ms(kbdev, KCPU_FENCE_SIGNAL_TIMEOUT); + + size = scnprintf(buffer, sizeof(buffer), "%u\n", timeout_ms); + return simple_read_from_buffer(buf, count, ppos, buffer, size); +} + +static ssize_t kbase_csf_kcpu_queue_fence_signal_timeout_set(struct file *file, + const char __user *buf, size_t count, + loff_t *ppos) +{ + int ret; + unsigned int timeout_ms; + struct kbase_device *kbdev = file->private_data; + + ret = kstrtouint_from_user(buf, count, 10, &timeout_ms); + if (ret < 0) + return ret; + + /* The timeout passed by the user is bounded when trying to insert it into + * the precomputed timeout table, so we don't need to do any more validation + * before-hand. 
+ */ + kbase_device_set_timeout_ms(kbdev, KCPU_FENCE_SIGNAL_TIMEOUT, timeout_ms); + + return count; +} + +static const struct file_operations kbase_csf_kcpu_queue_fence_signal_timeout_fops = { + .owner = THIS_MODULE, + .read = kbase_csf_kcpu_queue_fence_signal_timeout_get, + .write = kbase_csf_kcpu_queue_fence_signal_timeout_set, + .open = simple_open, + .llseek = default_llseek, +}; + +int kbase_csf_fence_timer_debugfs_init(struct kbase_device *kbdev) +{ + struct dentry *file; + const mode_t mode = 0644; + + if (WARN_ON(IS_ERR_OR_NULL(kbdev->mali_debugfs_directory))) + return -1; + + file = debugfs_create_file("fence_signal_timeout_enable", mode, + kbdev->mali_debugfs_directory, kbdev, + &kbase_csf_kcpu_queue_fence_signal_fops); + + if (IS_ERR_OR_NULL(file)) { + dev_warn(kbdev->dev, "Unable to create fence signal timer toggle entry"); + return -1; + } + + file = debugfs_create_file("fence_signal_timeout_ms", mode, kbdev->mali_debugfs_directory, + kbdev, &kbase_csf_kcpu_queue_fence_signal_timeout_fops); + + if (IS_ERR_OR_NULL(file)) { + dev_warn(kbdev->dev, "Unable to create fence signal timeout entry"); + return -1; + } + return 0; +} + +#else +int kbase_csf_fence_timer_debugfs_init(struct kbase_device *kbdev) +{ + return 0; +} + +#endif +void kbase_csf_fence_timer_debugfs_term(struct kbase_device *kbdev) +{ +} diff --git a/mali_kbase/csf/mali_kbase_csf_kcpu_fence_debugfs.h b/mali_kbase/csf/mali_kbase_csf_kcpu_fence_debugfs.h new file mode 100644 index 0000000..e3799fb --- /dev/null +++ b/mali_kbase/csf/mali_kbase_csf_kcpu_fence_debugfs.h @@ -0,0 +1,42 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +/* + * + * (C) COPYRIGHT 2018-2023 ARM Limited. All rights reserved. + * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. + * + */ +#ifndef _KBASE_CSF_KCPU_FENCE_SIGNAL_DEBUGFS_H_ +#define _KBASE_CSF_KCPU_FENCE_SIGNAL_DEBUGFS_H_ + +struct kbase_device; + +/* + * kbase_csf_fence_timer_debugfs_init - Initialize fence signal timeout debugfs + * entries. + * @kbdev: Kbase device. + * + * Return: 0 on success, -1 on failure. + */ +int kbase_csf_fence_timer_debugfs_init(struct kbase_device *kbdev); + +/* + * kbase_csf_fence_timer_debugfs_term - Terminate fence signal timeout debugfs + * entries. + * @kbdev: Kbase device. + */ +void kbase_csf_fence_timer_debugfs_term(struct kbase_device *kbdev); + +#endif /* _KBASE_CSF_KCPU_FENCE_SIGNAL_DEBUGFS_H_ */ diff --git a/mali_kbase/csf/mali_kbase_csf_mcu_shared_reg.c b/mali_kbase/csf/mali_kbase_csf_mcu_shared_reg.c new file mode 100644 index 0000000..863cf10 --- /dev/null +++ b/mali_kbase/csf/mali_kbase_csf_mcu_shared_reg.c @@ -0,0 +1,818 @@ +// SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note +/* + * + * (C) COPYRIGHT 2022-2023 ARM Limited. All rights reserved. 
+ * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. + * + */ + +#include <linux/protected_memory_allocator.h> +#include <mali_kbase.h> +#include "mali_kbase_csf.h" +#include "mali_kbase_csf_mcu_shared_reg.h" +#include <mali_kbase_mem_migrate.h> + +/* Scaling factor in pre-allocating shared regions for suspend bufs and userios */ +#define MCU_SHARED_REGS_PREALLOCATE_SCALE (8) + +/* MCU shared region map attempt limit */ +#define MCU_SHARED_REGS_BIND_ATTEMPT_LIMIT (4) + +/* Convert a VPFN to its start addr */ +#define GET_VPFN_VA(vpfn) ((vpfn) << PAGE_SHIFT) + +/* Macros for extract the corresponding VPFNs from a CSG_REG */ +#define CSG_REG_SUSP_BUF_VPFN(reg, nr_susp_pages) (reg->start_pfn) +#define CSG_REG_PMOD_BUF_VPFN(reg, nr_susp_pages) (reg->start_pfn + nr_susp_pages) +#define CSG_REG_USERIO_VPFN(reg, csi, nr_susp_pages) (reg->start_pfn + 2 * (nr_susp_pages + csi)) + +/* MCU shared segment dummy page mapping flags */ +#define DUMMY_PAGE_MAP_FLAGS (KBASE_REG_MEMATTR_INDEX(AS_MEMATTR_INDEX_DEFAULT) | KBASE_REG_GPU_NX) + +/* MCU shared segment suspend buffer mapping flags */ +#define SUSP_PAGE_MAP_FLAGS \ + (KBASE_REG_GPU_RD | KBASE_REG_GPU_WR | KBASE_REG_GPU_NX | \ + KBASE_REG_MEMATTR_INDEX(AS_MEMATTR_INDEX_DEFAULT)) + +/** + * struct kbase_csg_shared_region - Wrapper object for use with a CSG on runtime + * resources for suspend buffer pages, userio pages + * and their corresponding mapping GPU VA addresses + * from the MCU shared interface segment + * + * @link: Link to the managing list for the wrapper object. + * @reg: pointer to the region allocated from the shared interface segment, which + * covers the normal/P-mode suspend buffers, userio pages of the queues + * @grp: Pointer to the bound kbase_queue_group, or NULL if no binding (free). + * @pmode_mapped: Boolean for indicating the region has MMU mapped with the bound group's + * protected mode suspend buffer pages. 
+ */ +struct kbase_csg_shared_region { + struct list_head link; + struct kbase_va_region *reg; + struct kbase_queue_group *grp; + bool pmode_mapped; +}; + +static unsigned long get_userio_mmu_flags(struct kbase_device *kbdev) +{ + unsigned long userio_map_flags; + + if (kbdev->system_coherency == COHERENCY_NONE) + userio_map_flags = + KBASE_REG_GPU_RD | KBASE_REG_MEMATTR_INDEX(AS_MEMATTR_INDEX_NON_CACHEABLE); + else + userio_map_flags = KBASE_REG_GPU_RD | KBASE_REG_SHARE_BOTH | + KBASE_REG_MEMATTR_INDEX(AS_MEMATTR_INDEX_SHARED); + + return (userio_map_flags | KBASE_REG_GPU_NX); +} + +static void set_page_meta_status_not_movable(struct tagged_addr phy) +{ + if (kbase_is_page_migration_enabled()) { + struct kbase_page_metadata *page_md = kbase_page_private(as_page(phy)); + + if (page_md) { + spin_lock(&page_md->migrate_lock); + page_md->status = PAGE_STATUS_SET(page_md->status, (u8)NOT_MOVABLE); + spin_unlock(&page_md->migrate_lock); + } + } +} + +static struct kbase_csg_shared_region *get_group_bound_csg_reg(struct kbase_queue_group *group) +{ + return (struct kbase_csg_shared_region *)group->csg_reg; +} + +static inline int update_mapping_with_dummy_pages(struct kbase_device *kbdev, u64 vpfn, + u32 nr_pages) +{ + struct kbase_csf_mcu_shared_regions *shared_regs = &kbdev->csf.scheduler.mcu_regs_data; + const unsigned long mem_flags = DUMMY_PAGE_MAP_FLAGS; + + return kbase_mmu_update_csf_mcu_pages(kbdev, vpfn, shared_regs->dummy_phys, nr_pages, + mem_flags, KBASE_MEM_GROUP_CSF_FW); +} + +static inline int insert_dummy_pages(struct kbase_device *kbdev, u64 vpfn, u32 nr_pages) +{ + struct kbase_csf_mcu_shared_regions *shared_regs = &kbdev->csf.scheduler.mcu_regs_data; + const unsigned long mem_flags = DUMMY_PAGE_MAP_FLAGS; + const enum kbase_caller_mmu_sync_info mmu_sync_info = CALLER_MMU_ASYNC; + + return kbase_mmu_insert_pages(kbdev, &kbdev->csf.mcu_mmu, vpfn, shared_regs->dummy_phys, + nr_pages, mem_flags, MCU_AS_NR, KBASE_MEM_GROUP_CSF_FW, + mmu_sync_info, NULL); +} + +/* Reset consecutive retry count to zero */ +static void notify_group_csg_reg_map_done(struct kbase_queue_group *group) +{ + lockdep_assert_held(&group->kctx->kbdev->csf.scheduler.lock); + + /* Just clear the internal map retry count */ + group->csg_reg_bind_retries = 0; +} + +/* Return true if a fatal group error has already been triggered */ +static bool notify_group_csg_reg_map_error(struct kbase_queue_group *group) +{ + struct kbase_device *kbdev = group->kctx->kbdev; + + lockdep_assert_held(&kbdev->csf.scheduler.lock); + + if (group->csg_reg_bind_retries < U8_MAX) + group->csg_reg_bind_retries++; + + /* Allow only one fatal error notification */ + if (group->csg_reg_bind_retries == MCU_SHARED_REGS_BIND_ATTEMPT_LIMIT) { + struct base_gpu_queue_group_error const err_payload = { + .error_type = BASE_GPU_QUEUE_GROUP_ERROR_FATAL, + .payload = { .fatal_group = { .status = GPU_EXCEPTION_TYPE_SW_FAULT_0 } } + }; + + dev_err(kbdev->dev, "Fatal: group_%d_%d_%d exceeded shared region map retry limit", + group->kctx->tgid, group->kctx->id, group->handle); + kbase_csf_add_group_fatal_error(group, &err_payload); + kbase_event_wakeup_nosync(group->kctx); + } + + return group->csg_reg_bind_retries >= MCU_SHARED_REGS_BIND_ATTEMPT_LIMIT; +} + +/* Replace the given phys at vpfn (reflecting a queue's userio_pages) mapping. + * If phys is NULL, the internal dummy_phys is used, which effectively + * restores back to the initialized state for the given queue's userio_pages + * (i.e. mapped to the default dummy page). 
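For readers following the CSG_REG_*_VPFN macros above, each pre-allocated CSG shared region is laid out as [normal suspend buffer][protected-mode suspend buffer][two userio pages per CS]. The small standalone sketch below is editorial and only works the offsets through; the page counts are illustrative assumptions, not values taken from this patch.

/* Editorial sketch, not part of this patch: per-CSG region layout implied by
 * the CSG_REG_*_VPFN macros. nr_susp_pages and nr_csis are assumed values.
 */
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	const uint64_t start_pfn = 0x1000;	/* hypothetical reg->start_pfn */
	const uint32_t nr_susp_pages = 8;	/* assumed suspend buffer pages */
	const uint32_t nr_csis = 4;		/* assumed streams per group    */
	uint32_t csi;

	/* CSG_REG_SUSP_BUF_VPFN: normal suspend buffer at the region start */
	printf("susp buf  : pfn %#llx\n", (unsigned long long)start_pfn);

	/* CSG_REG_PMOD_BUF_VPFN: protected-mode suspend buffer follows it */
	printf("pmode buf : pfn %#llx\n",
	       (unsigned long long)(start_pfn + nr_susp_pages));

	/* CSG_REG_USERIO_VPFN: two userio pages (input/output) per CS */
	for (csi = 0; csi < nr_csis; csi++)
		printf("userio[%u]: pfn %#llx\n", csi,
		       (unsigned long long)(start_pfn + 2 * (nr_susp_pages + csi)));

	/* Matches nr_csg_reg_pages = 2 * (nr_susp_pages + nr_csis) used when
	 * the region is allocated later in this file. */
	printf("total pages: %u\n", 2 * (nr_susp_pages + nr_csis));
	return 0;
}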
+ * In case of CSF mmu update error on a queue, the dummy phy is used to restore + * back the default 'unbound' (i.e. mapped to dummy) condition. + * + * It's the caller's responsibility to ensure that the given vpfn is extracted + * correctly from a CSG_REG object, for example, using CSG_REG_USERIO_VPFN(). + */ +static int userio_pages_replace_phys(struct kbase_device *kbdev, u64 vpfn, struct tagged_addr *phys) +{ + struct kbase_csf_mcu_shared_regions *shared_regs = &kbdev->csf.scheduler.mcu_regs_data; + int err = 0, err1; + + lockdep_assert_held(&kbdev->csf.scheduler.lock); + + if (phys) { + unsigned long mem_flags_input = shared_regs->userio_mem_rd_flags; + unsigned long mem_flags_output = mem_flags_input | KBASE_REG_GPU_WR; + + /* Dealing with a queue's INPUT page */ + err = kbase_mmu_update_csf_mcu_pages(kbdev, vpfn, &phys[0], 1, mem_flags_input, + KBASE_MEM_GROUP_CSF_IO); + /* Dealing with a queue's OUTPUT page */ + err1 = kbase_mmu_update_csf_mcu_pages(kbdev, vpfn + 1, &phys[1], 1, + mem_flags_output, KBASE_MEM_GROUP_CSF_IO); + if (unlikely(err1)) + err = err1; + } + + if (unlikely(err) || !phys) { + /* Restore back to dummy_userio_phy */ + update_mapping_with_dummy_pages(kbdev, vpfn, KBASEP_NUM_CS_USER_IO_PAGES); + } + + return err; +} + +/* Update a group's queues' mappings for a group with its runtime bound group region */ +static int csg_reg_update_on_csis(struct kbase_device *kbdev, struct kbase_queue_group *group, + struct kbase_queue_group *prev_grp) +{ + struct kbase_csg_shared_region *csg_reg = get_group_bound_csg_reg(group); + const u32 nr_susp_pages = PFN_UP(kbdev->csf.global_iface.groups[0].suspend_size); + const u32 nr_csis = kbdev->csf.global_iface.groups[0].stream_num; + struct tagged_addr *phy; + int err = 0, err1; + u32 i; + + lockdep_assert_held(&kbdev->csf.scheduler.lock); + + if (WARN_ONCE(!csg_reg, "Update_userio pages: group has no bound csg_reg")) + return -EINVAL; + + for (i = 0; i < nr_csis; i++) { + struct kbase_queue *queue = group->bound_queues[i]; + struct kbase_queue *prev_queue = prev_grp ? prev_grp->bound_queues[i] : NULL; + + /* Set the phy if the group's queue[i] needs mapping, otherwise NULL */ + phy = (queue && queue->enabled && !queue->user_io_gpu_va) ? queue->phys : NULL; + + /* Either phy is valid, or this update is for a transition change from + * prev_group, and the prev_queue was mapped, so an update is required. + */ + if (phy || (prev_queue && prev_queue->user_io_gpu_va)) { + u64 vpfn = CSG_REG_USERIO_VPFN(csg_reg->reg, i, nr_susp_pages); + + err1 = userio_pages_replace_phys(kbdev, vpfn, phy); + + if (unlikely(err1)) { + dev_warn(kbdev->dev, + "%s: Error in update queue-%d mapping for csg_%d_%d_%d", + __func__, i, group->kctx->tgid, group->kctx->id, + group->handle); + err = err1; + } else if (phy) + queue->user_io_gpu_va = GET_VPFN_VA(vpfn); + + /* Mark prev_group's queue has lost its mapping */ + if (prev_queue) + prev_queue->user_io_gpu_va = 0; + } + } + + return err; +} + +/* Bind a group to a given csg_reg, any previous mappings with the csg_reg are replaced + * with the given group's phy pages, or, if no replacement, the default dummy pages. + * Note, the csg_reg's fields are in transition step-by-step from the prev_grp to its + * new binding owner in this function. At the end, the prev_grp would be completely + * detached away from the previously bound csg_reg. 
+ */ +static int group_bind_csg_reg(struct kbase_device *kbdev, struct kbase_queue_group *group, + struct kbase_csg_shared_region *csg_reg) +{ + const unsigned long mem_flags = SUSP_PAGE_MAP_FLAGS; + const u32 nr_susp_pages = PFN_UP(kbdev->csf.global_iface.groups[0].suspend_size); + struct kbase_queue_group *prev_grp = csg_reg->grp; + struct kbase_va_region *reg = csg_reg->reg; + struct tagged_addr *phy; + int err = 0, err1; + + lockdep_assert_held(&kbdev->csf.scheduler.lock); + + /* The csg_reg is expected still on the unused list so its link is not empty */ + if (WARN_ON_ONCE(list_empty(&csg_reg->link))) { + dev_dbg(kbdev->dev, "csg_reg is marked in active use"); + return -EINVAL; + } + + if (WARN_ON_ONCE(prev_grp && prev_grp->csg_reg != csg_reg)) { + dev_dbg(kbdev->dev, "Unexpected bound lost on prev_group"); + prev_grp->csg_reg = NULL; + return -EINVAL; + } + + /* Replacing the csg_reg bound group to the newly given one */ + csg_reg->grp = group; + group->csg_reg = csg_reg; + + /* Resolving mappings, deal with protected mode first */ + if (group->protected_suspend_buf.pma) { + /* We are binding a new group with P-mode ready, the prev_grp's P-mode mapping + * status is now stale during this transition of ownership. For the new owner, + * its mapping would have been updated away when it lost its binding previously. + * So it needs an update to this pma map. By clearing here the mapped flag + * ensures it reflects the new owner's condition. + */ + csg_reg->pmode_mapped = false; + err = kbase_csf_mcu_shared_group_update_pmode_map(kbdev, group); + } else if (csg_reg->pmode_mapped) { + /* Need to unmap the previous one, use the dummy pages */ + err = update_mapping_with_dummy_pages( + kbdev, CSG_REG_PMOD_BUF_VPFN(reg, nr_susp_pages), nr_susp_pages); + + if (unlikely(err)) + dev_warn(kbdev->dev, "%s: Failed to update P-mode dummy for csg_%d_%d_%d", + __func__, group->kctx->tgid, group->kctx->id, group->handle); + + csg_reg->pmode_mapped = false; + } + + /* Unlike the normal suspend buf, the mapping of the protected mode suspend buffer is + * actually reflected by a specific mapped flag (due to phys[] is only allocated on + * in-need basis). So the GPU_VA is always updated to the bound region's corresponding + * VA, as a reflection of the binding to the csg_reg. 
+ */ + group->protected_suspend_buf.gpu_va = + GET_VPFN_VA(CSG_REG_PMOD_BUF_VPFN(reg, nr_susp_pages)); + + /* Deal with normal mode suspend buffer */ + phy = group->normal_suspend_buf.phy; + err1 = kbase_mmu_update_csf_mcu_pages(kbdev, CSG_REG_SUSP_BUF_VPFN(reg, nr_susp_pages), phy, + nr_susp_pages, mem_flags, KBASE_MEM_GROUP_CSF_FW); + + if (unlikely(err1)) { + dev_warn(kbdev->dev, "%s: Failed to update suspend buffer for csg_%d_%d_%d", + __func__, group->kctx->tgid, group->kctx->id, group->handle); + + /* Attempt a restore to default dummy for removing previous mapping */ + if (prev_grp) + update_mapping_with_dummy_pages( + kbdev, CSG_REG_SUSP_BUF_VPFN(reg, nr_susp_pages), nr_susp_pages); + err = err1; + /* Marking the normal suspend buffer is not mapped (due to error) */ + group->normal_suspend_buf.gpu_va = 0; + } else { + /* Marking the normal suspend buffer is actually mapped */ + group->normal_suspend_buf.gpu_va = + GET_VPFN_VA(CSG_REG_SUSP_BUF_VPFN(reg, nr_susp_pages)); + } + + /* Deal with queue uerio_pages */ + err1 = csg_reg_update_on_csis(kbdev, group, prev_grp); + if (likely(!err1)) + err = err1; + + /* Reset the previous group's suspend buffers' GPU_VAs as it has lost its bound */ + if (prev_grp) { + prev_grp->normal_suspend_buf.gpu_va = 0; + prev_grp->protected_suspend_buf.gpu_va = 0; + prev_grp->csg_reg = NULL; + } + + return err; +} + +/* Notify the group is placed on-slot, hence the bound csg_reg is active in use */ +void kbase_csf_mcu_shared_set_group_csg_reg_active(struct kbase_device *kbdev, + struct kbase_queue_group *group) +{ + struct kbase_csg_shared_region *csg_reg = get_group_bound_csg_reg(group); + + lockdep_assert_held(&kbdev->csf.scheduler.lock); + + if (WARN_ONCE(!csg_reg || csg_reg->grp != group, "Group_%d_%d_%d has no csg_reg bounding", + group->kctx->tgid, group->kctx->id, group->handle)) + return; + + /* By dropping out the csg_reg from the unused list, it becomes active and is tracked + * by its bound group that is on-slot. The design is that, when this on-slot group is + * moved to off-slot, the scheduler slot-clean up will add it back to the tail of the + * unused list. + */ + if (!WARN_ON_ONCE(list_empty(&csg_reg->link))) + list_del_init(&csg_reg->link); +} + +/* Notify the group is placed off-slot, hence the bound csg_reg is not in active use + * anymore. Existing bounding/mappings are left untouched. These would only be dealt with + * if the bound csg_reg is to be reused with another group. + */ +void kbase_csf_mcu_shared_set_group_csg_reg_unused(struct kbase_device *kbdev, + struct kbase_queue_group *group) +{ + struct kbase_csg_shared_region *csg_reg = get_group_bound_csg_reg(group); + struct kbase_csf_mcu_shared_regions *shared_regs = &kbdev->csf.scheduler.mcu_regs_data; + + lockdep_assert_held(&kbdev->csf.scheduler.lock); + + if (WARN_ONCE(!csg_reg || csg_reg->grp != group, "Group_%d_%d_%d has no csg_reg bound", + group->kctx->tgid, group->kctx->id, group->handle)) + return; + + /* By adding back the csg_reg to the unused list, it becomes available for another + * group to break its existing binding and set up a new one. 
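The active/unused transitions above amount to an LRU-style recycling scheme over a single list_head: a region leaves unused_csg_regs while its bound group is on-slot and rejoins the tail when the group goes off-slot, so the list head is always the least-recently-used candidate for rebinding. The sketch below is an editorial restatement of that idiom with hypothetical types, not part of this patch.

/* Editorial sketch, not part of this patch: LRU-style recycling over a
 * list_head, mirroring how unused_csg_regs is used above. The types and
 * function names are hypothetical stand-ins.
 */
#include <linux/list.h>

struct demo_region {
	struct list_head link;	/* empty <=> region is in active use;
				 * INIT_LIST_HEAD() it at creation time. */
};

static LIST_HEAD(demo_unused);

/* Pick the least-recently used region, as group_bind_csg_reg() does */
static struct demo_region *demo_acquire(void)
{
	return list_first_entry_or_null(&demo_unused, struct demo_region, link);
}

/* Group placed on-slot: take the region out of the recycling pool */
static void demo_set_active(struct demo_region *r)
{
	if (!list_empty(&r->link))
		list_del_init(&r->link);
}

/* Group moved off-slot: make the region reclaimable again, at the tail */
static void demo_set_unused(struct demo_region *r)
{
	if (list_empty(&r->link))
		list_add_tail(&r->link, &demo_unused);
	else
		list_move_tail(&r->link, &demo_unused);
}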
+ */ + if (!list_empty(&csg_reg->link)) { + WARN_ONCE(group->csg_nr >= 0, "Group is assumed vacated from slot"); + list_move_tail(&csg_reg->link, &shared_regs->unused_csg_regs); + } else + list_add_tail(&csg_reg->link, &shared_regs->unused_csg_regs); +} + +/* Adding a new queue to an existing on-slot group */ +int kbase_csf_mcu_shared_add_queue(struct kbase_device *kbdev, struct kbase_queue *queue) +{ + struct kbase_queue_group *group = queue->group; + struct kbase_csg_shared_region *csg_reg; + const u32 nr_susp_pages = PFN_UP(kbdev->csf.global_iface.groups[0].suspend_size); + u64 vpfn; + int err; + + lockdep_assert_held(&kbdev->csf.scheduler.lock); + + if (WARN_ONCE(!group || group->csg_nr < 0, "No bound group, or group is not on-slot")) + return -EIO; + + csg_reg = get_group_bound_csg_reg(group); + if (WARN_ONCE(!csg_reg || !list_empty(&csg_reg->link), + "No bound csg_reg, or in wrong state")) + return -EIO; + + vpfn = CSG_REG_USERIO_VPFN(csg_reg->reg, queue->csi_index, nr_susp_pages); + err = userio_pages_replace_phys(kbdev, vpfn, queue->phys); + if (likely(!err)) { + /* Mark the queue has been successfully mapped */ + queue->user_io_gpu_va = GET_VPFN_VA(vpfn); + } else { + /* Mark the queue has no mapping on its phys[] */ + queue->user_io_gpu_va = 0; + dev_dbg(kbdev->dev, + "%s: Error in mapping userio pages for queue-%d of csg_%d_%d_%d", __func__, + queue->csi_index, group->kctx->tgid, group->kctx->id, group->handle); + + /* notify the error for the bound group */ + if (notify_group_csg_reg_map_error(group)) + err = -EIO; + } + + return err; +} + +/* Unmap a given queue's userio pages, when the queue is deleted */ +void kbase_csf_mcu_shared_drop_stopped_queue(struct kbase_device *kbdev, struct kbase_queue *queue) +{ + struct kbase_queue_group *group; + struct kbase_csg_shared_region *csg_reg; + const u32 nr_susp_pages = PFN_UP(kbdev->csf.global_iface.groups[0].suspend_size); + u64 vpfn; + + lockdep_assert_held(&kbdev->csf.scheduler.lock); + + /* The queue has no existing mapping, nothing to do */ + if (!queue || !queue->user_io_gpu_va) + return; + + group = queue->group; + if (WARN_ONCE(!group || !group->csg_reg, "Queue/Group has no bound region")) + return; + + csg_reg = get_group_bound_csg_reg(group); + + vpfn = CSG_REG_USERIO_VPFN(csg_reg->reg, queue->csi_index, nr_susp_pages); + + WARN_ONCE(userio_pages_replace_phys(kbdev, vpfn, NULL), + "Unexpected restoring to dummy map update error"); + queue->user_io_gpu_va = 0; +} + +int kbase_csf_mcu_shared_group_update_pmode_map(struct kbase_device *kbdev, + struct kbase_queue_group *group) +{ + struct kbase_csf_mcu_shared_regions *shared_regs = &kbdev->csf.scheduler.mcu_regs_data; + struct kbase_csg_shared_region *csg_reg = get_group_bound_csg_reg(group); + const u32 nr_susp_pages = PFN_UP(kbdev->csf.global_iface.groups[0].suspend_size); + int err = 0, err1; + + lockdep_assert_held(&kbdev->csf.scheduler.lock); + + if (WARN_ONCE(!csg_reg, "Update_pmode_map: the bound csg_reg can't be NULL")) + return -EINVAL; + + /* If the pmode already mapped, nothing to do */ + if (csg_reg->pmode_mapped) + return 0; + + /* P-mode map not in place and the group has allocated P-mode pages, map it */ + if (group->protected_suspend_buf.pma) { + unsigned long mem_flags = SUSP_PAGE_MAP_FLAGS; + struct tagged_addr *phy = shared_regs->pma_phys; + struct kbase_va_region *reg = csg_reg->reg; + u64 vpfn = CSG_REG_PMOD_BUF_VPFN(reg, nr_susp_pages); + u32 i; + + /* Populate the protected phys from pma to phy[] */ + for (i = 0; i < nr_susp_pages; i++) + phy[i] = 
as_tagged(group->protected_suspend_buf.pma[i]->pa); + + /* Add the P-mode suspend buffer mapping */ + err = kbase_mmu_update_csf_mcu_pages(kbdev, vpfn, phy, nr_susp_pages, mem_flags, + KBASE_MEM_GROUP_CSF_FW); + + /* If error, restore to default dummpy */ + if (unlikely(err)) { + err1 = update_mapping_with_dummy_pages(kbdev, vpfn, nr_susp_pages); + if (unlikely(err1)) + dev_warn( + kbdev->dev, + "%s: Failed in recovering to P-mode dummy for csg_%d_%d_%d", + __func__, group->kctx->tgid, group->kctx->id, + group->handle); + + csg_reg->pmode_mapped = false; + } else + csg_reg->pmode_mapped = true; + } + + return err; +} + +void kbase_csf_mcu_shared_clear_evicted_group_csg_reg(struct kbase_device *kbdev, + struct kbase_queue_group *group) +{ + struct kbase_csf_mcu_shared_regions *shared_regs = &kbdev->csf.scheduler.mcu_regs_data; + struct kbase_csg_shared_region *csg_reg = get_group_bound_csg_reg(group); + struct kbase_va_region *reg; + const u32 nr_susp_pages = PFN_UP(kbdev->csf.global_iface.groups[0].suspend_size); + u32 nr_csis = kbdev->csf.global_iface.groups[0].stream_num; + int err = 0; + u32 i; + + lockdep_assert_held(&kbdev->csf.scheduler.lock); + + /* Nothing to do for clearing up if no bound csg_reg */ + if (!csg_reg) + return; + + reg = csg_reg->reg; + /* Restore mappings default dummy pages for any mapped pages */ + if (csg_reg->pmode_mapped) { + err = update_mapping_with_dummy_pages( + kbdev, CSG_REG_PMOD_BUF_VPFN(reg, nr_susp_pages), nr_susp_pages); + WARN_ONCE(unlikely(err), "Restore dummy failed for clearing pmod buffer mapping"); + + csg_reg->pmode_mapped = false; + } + + if (group->normal_suspend_buf.gpu_va) { + err = update_mapping_with_dummy_pages( + kbdev, CSG_REG_SUSP_BUF_VPFN(reg, nr_susp_pages), nr_susp_pages); + WARN_ONCE(err, "Restore dummy failed for clearing suspend buffer mapping"); + } + + /* Deal with queue uerio pages */ + for (i = 0; i < nr_csis; i++) + kbase_csf_mcu_shared_drop_stopped_queue(kbdev, group->bound_queues[i]); + + group->normal_suspend_buf.gpu_va = 0; + group->protected_suspend_buf.gpu_va = 0; + + /* Break the binding */ + group->csg_reg = NULL; + csg_reg->grp = NULL; + + /* Put the csg_reg to the front of the unused list */ + if (WARN_ON_ONCE(list_empty(&csg_reg->link))) + list_add(&csg_reg->link, &shared_regs->unused_csg_regs); + else + list_move(&csg_reg->link, &shared_regs->unused_csg_regs); +} + +int kbase_csf_mcu_shared_group_bind_csg_reg(struct kbase_device *kbdev, + struct kbase_queue_group *group) +{ + struct kbase_csf_mcu_shared_regions *shared_regs = &kbdev->csf.scheduler.mcu_regs_data; + struct kbase_csg_shared_region *csg_reg; + int err; + + lockdep_assert_held(&kbdev->csf.scheduler.lock); + + csg_reg = get_group_bound_csg_reg(group); + if (!csg_reg) + csg_reg = list_first_entry_or_null(&shared_regs->unused_csg_regs, + struct kbase_csg_shared_region, link); + + if (!WARN_ON_ONCE(!csg_reg)) { + struct kbase_queue_group *prev_grp = csg_reg->grp; + + /* Deal with the previous binding and lazy unmap, i.e if the previous mapping not + * the required one, unmap it. 
+ */ + if (prev_grp == group) { + /* Update existing bindings, if there have been some changes */ + err = kbase_csf_mcu_shared_group_update_pmode_map(kbdev, group); + if (likely(!err)) + err = csg_reg_update_on_csis(kbdev, group, NULL); + } else + err = group_bind_csg_reg(kbdev, group, csg_reg); + } else { + /* This should not have been possible if the code operates rightly */ + dev_err(kbdev->dev, "%s: Unexpected NULL csg_reg for group %d of context %d_%d", + __func__, group->handle, group->kctx->tgid, group->kctx->id); + return -EIO; + } + + if (likely(!err)) + notify_group_csg_reg_map_done(group); + else + notify_group_csg_reg_map_error(group); + + return err; +} + +static int shared_mcu_csg_reg_init(struct kbase_device *kbdev, + struct kbase_csg_shared_region *csg_reg) +{ + struct kbase_csf_mcu_shared_regions *shared_regs = &kbdev->csf.scheduler.mcu_regs_data; + const u32 nr_susp_pages = PFN_UP(kbdev->csf.global_iface.groups[0].suspend_size); + u32 nr_csis = kbdev->csf.global_iface.groups[0].stream_num; + const size_t nr_csg_reg_pages = 2 * (nr_susp_pages + nr_csis); + struct kbase_va_region *reg; + u64 vpfn; + int err, i; + + INIT_LIST_HEAD(&csg_reg->link); + reg = kbase_alloc_free_region(&kbdev->csf.mcu_shared_zone, 0, nr_csg_reg_pages); + + if (!reg) { + dev_err(kbdev->dev, "%s: Failed to allocate a MCU shared region for %zu pages\n", + __func__, nr_csg_reg_pages); + return -ENOMEM; + } + + /* Insert the region into rbtree, so it becomes ready to use */ + mutex_lock(&kbdev->csf.reg_lock); + err = kbase_add_va_region_rbtree(kbdev, reg, 0, nr_csg_reg_pages, 1); + reg->flags &= ~KBASE_REG_FREE; + mutex_unlock(&kbdev->csf.reg_lock); + if (err) { + kfree(reg); + dev_err(kbdev->dev, "%s: Failed to add a region of %zu pages into rbtree", __func__, + nr_csg_reg_pages); + return err; + } + + /* Initialize the mappings so MMU only need to update the the corresponding + * mapped phy-pages at runtime. + * Map the normal suspend buffer pages to the prepared dummy phys[]. 
+ */ + vpfn = CSG_REG_SUSP_BUF_VPFN(reg, nr_susp_pages); + err = insert_dummy_pages(kbdev, vpfn, nr_susp_pages); + + if (unlikely(err)) + goto fail_susp_map_fail; + + /* Map the protected suspend buffer pages to the prepared dummy phys[] */ + vpfn = CSG_REG_PMOD_BUF_VPFN(reg, nr_susp_pages); + err = insert_dummy_pages(kbdev, vpfn, nr_susp_pages); + + if (unlikely(err)) + goto fail_pmod_map_fail; + + for (i = 0; i < nr_csis; i++) { + vpfn = CSG_REG_USERIO_VPFN(reg, i, nr_susp_pages); + err = insert_dummy_pages(kbdev, vpfn, KBASEP_NUM_CS_USER_IO_PAGES); + + if (unlikely(err)) + goto fail_userio_pages_map_fail; + } + + /* Replace the previous NULL-valued field with the successully initialized reg */ + csg_reg->reg = reg; + + return 0; + +fail_userio_pages_map_fail: + while (i-- > 0) { + vpfn = CSG_REG_USERIO_VPFN(reg, i, nr_susp_pages); + kbase_mmu_teardown_firmware_pages(kbdev, &kbdev->csf.mcu_mmu, vpfn, + shared_regs->dummy_phys, + KBASEP_NUM_CS_USER_IO_PAGES, + KBASEP_NUM_CS_USER_IO_PAGES, MCU_AS_NR); + } + + vpfn = CSG_REG_PMOD_BUF_VPFN(reg, nr_susp_pages); + kbase_mmu_teardown_firmware_pages(kbdev, &kbdev->csf.mcu_mmu, vpfn, shared_regs->dummy_phys, + nr_susp_pages, nr_susp_pages, MCU_AS_NR); +fail_pmod_map_fail: + vpfn = CSG_REG_SUSP_BUF_VPFN(reg, nr_susp_pages); + kbase_mmu_teardown_firmware_pages(kbdev, &kbdev->csf.mcu_mmu, vpfn, shared_regs->dummy_phys, + nr_susp_pages, nr_susp_pages, MCU_AS_NR); +fail_susp_map_fail: + mutex_lock(&kbdev->csf.reg_lock); + kbase_remove_va_region(kbdev, reg); + mutex_unlock(&kbdev->csf.reg_lock); + kfree(reg); + + return err; +} + +/* Note, this helper can only be called on scheduler shutdown */ +static void shared_mcu_csg_reg_term(struct kbase_device *kbdev, + struct kbase_csg_shared_region *csg_reg) +{ + struct kbase_csf_mcu_shared_regions *shared_regs = &kbdev->csf.scheduler.mcu_regs_data; + struct kbase_va_region *reg = csg_reg->reg; + const u32 nr_susp_pages = PFN_UP(kbdev->csf.global_iface.groups[0].suspend_size); + const u32 nr_csis = kbdev->csf.global_iface.groups[0].stream_num; + u64 vpfn; + int i; + + for (i = 0; i < nr_csis; i++) { + vpfn = CSG_REG_USERIO_VPFN(reg, i, nr_susp_pages); + kbase_mmu_teardown_firmware_pages(kbdev, &kbdev->csf.mcu_mmu, vpfn, + shared_regs->dummy_phys, + KBASEP_NUM_CS_USER_IO_PAGES, + KBASEP_NUM_CS_USER_IO_PAGES, MCU_AS_NR); + } + + vpfn = CSG_REG_PMOD_BUF_VPFN(reg, nr_susp_pages); + kbase_mmu_teardown_firmware_pages(kbdev, &kbdev->csf.mcu_mmu, vpfn, shared_regs->dummy_phys, + nr_susp_pages, nr_susp_pages, MCU_AS_NR); + vpfn = CSG_REG_SUSP_BUF_VPFN(reg, nr_susp_pages); + kbase_mmu_teardown_firmware_pages(kbdev, &kbdev->csf.mcu_mmu, vpfn, shared_regs->dummy_phys, + nr_susp_pages, nr_susp_pages, MCU_AS_NR); + + mutex_lock(&kbdev->csf.reg_lock); + kbase_remove_va_region(kbdev, reg); + mutex_unlock(&kbdev->csf.reg_lock); + kfree(reg); +} + +int kbase_csf_mcu_shared_regs_data_init(struct kbase_device *kbdev) +{ + struct kbase_csf_scheduler *scheduler = &kbdev->csf.scheduler; + struct kbase_csf_mcu_shared_regions *shared_regs = &scheduler->mcu_regs_data; + struct kbase_csg_shared_region *array_csg_regs; + const size_t nr_susp_pages = PFN_UP(kbdev->csf.global_iface.groups[0].suspend_size); + const u32 nr_groups = kbdev->csf.global_iface.group_num; + const u32 nr_csg_regs = MCU_SHARED_REGS_PREALLOCATE_SCALE * nr_groups; + const u32 nr_dummy_phys = MAX(nr_susp_pages, KBASEP_NUM_CS_USER_IO_PAGES); + u32 i; + int err; + + shared_regs->userio_mem_rd_flags = get_userio_mmu_flags(kbdev); + 
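shared_mcu_csg_reg_init() above uses the conventional goto-based unwind: each mapping that was successfully inserted gets a matching teardown, executed in strict reverse order when a later step fails. The compact sketch below is editorial, with placeholder step names standing in for the insert_dummy_pages() / kbase_mmu_teardown_firmware_pages() pairs; it is not part of this patch.

/* Editorial sketch, not part of this patch: reverse-order goto unwind as used
 * by shared_mcu_csg_reg_init(). The step names are placeholders.
 */
#include <stdio.h>

static int step_setup(const char *name) { printf("setup %s\n", name); return 0; }
static void step_teardown(const char *name) { printf("teardown %s\n", name); }

static int demo_init(void)
{
	int err;

	err = step_setup("susp_buf");
	if (err)
		return err;		/* nothing established yet */

	err = step_setup("pmod_buf");
	if (err)
		goto undo_susp;

	err = step_setup("userio");
	if (err)
		goto undo_pmod;

	return 0;

undo_pmod:
	step_teardown("pmod_buf");	/* undo in strict reverse order */
undo_susp:
	step_teardown("susp_buf");
	return err;
}

int main(void) { return demo_init(); }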
INIT_LIST_HEAD(&shared_regs->unused_csg_regs); + + shared_regs->dummy_phys = + kcalloc(nr_dummy_phys, sizeof(*shared_regs->dummy_phys), GFP_KERNEL); + if (!shared_regs->dummy_phys) + return -ENOMEM; + + if (kbase_mem_pool_alloc_pages(&kbdev->mem_pools.small[KBASE_MEM_GROUP_CSF_FW], 1, + &shared_regs->dummy_phys[0], false, NULL) <= 0) + return -ENOMEM; + + shared_regs->dummy_phys_allocated = true; + set_page_meta_status_not_movable(shared_regs->dummy_phys[0]); + + /* Replicate the allocated single shared_regs->dummy_phys[0] to the full array */ + for (i = 1; i < nr_dummy_phys; i++) + shared_regs->dummy_phys[i] = shared_regs->dummy_phys[0]; + + shared_regs->pma_phys = kcalloc(nr_susp_pages, sizeof(*shared_regs->pma_phys), GFP_KERNEL); + if (!shared_regs->pma_phys) + return -ENOMEM; + + array_csg_regs = kcalloc(nr_csg_regs, sizeof(*array_csg_regs), GFP_KERNEL); + if (!array_csg_regs) + return -ENOMEM; + shared_regs->array_csg_regs = array_csg_regs; + + /* All fields in scheduler->mcu_regs_data except the shared_regs->array_csg_regs + * are properly populated and ready to use. Now initialize the items in + * shared_regs->array_csg_regs[] + */ + for (i = 0; i < nr_csg_regs; i++) { + err = shared_mcu_csg_reg_init(kbdev, &array_csg_regs[i]); + if (err) + return err; + + list_add_tail(&array_csg_regs[i].link, &shared_regs->unused_csg_regs); + } + + return 0; +} + +void kbase_csf_mcu_shared_regs_data_term(struct kbase_device *kbdev) +{ + struct kbase_csf_scheduler *scheduler = &kbdev->csf.scheduler; + struct kbase_csf_mcu_shared_regions *shared_regs = &scheduler->mcu_regs_data; + struct kbase_csg_shared_region *array_csg_regs = + (struct kbase_csg_shared_region *)shared_regs->array_csg_regs; + const u32 nr_groups = kbdev->csf.global_iface.group_num; + const u32 nr_csg_regs = MCU_SHARED_REGS_PREALLOCATE_SCALE * nr_groups; + + if (array_csg_regs) { + struct kbase_csg_shared_region *csg_reg; + u32 i, cnt_csg_regs = 0; + + for (i = 0; i < nr_csg_regs; i++) { + csg_reg = &array_csg_regs[i]; + /* There should not be any group mapping bindings */ + WARN_ONCE(csg_reg->grp, "csg_reg has a bound group"); + + if (csg_reg->reg) { + shared_mcu_csg_reg_term(kbdev, csg_reg); + cnt_csg_regs++; + } + } + + /* The nr_susp_regs counts should match the array_csg_regs' length */ + list_for_each_entry(csg_reg, &shared_regs->unused_csg_regs, link) + cnt_csg_regs--; + + WARN_ONCE(cnt_csg_regs, "Unmatched counts of susp_regs"); + kfree(shared_regs->array_csg_regs); + } + + if (shared_regs->dummy_phys_allocated) { + struct page *page = as_page(shared_regs->dummy_phys[0]); + + kbase_mem_pool_free(&kbdev->mem_pools.small[KBASE_MEM_GROUP_CSF_FW], page, false); + } + + kfree(shared_regs->dummy_phys); + kfree(shared_regs->pma_phys); +} diff --git a/mali_kbase/csf/mali_kbase_csf_mcu_shared_reg.h b/mali_kbase/csf/mali_kbase_csf_mcu_shared_reg.h new file mode 100644 index 0000000..61943cb --- /dev/null +++ b/mali_kbase/csf/mali_kbase_csf_mcu_shared_reg.h @@ -0,0 +1,139 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +/* + * + * (C) COPYRIGHT 2022 ARM Limited. All rights reserved. + * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. 
+ * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. + * + */ + +#ifndef _KBASE_CSF_MCU_SHARED_REG_H_ +#define _KBASE_CSF_MCU_SHARED_REG_H_ + +/** + * kbase_csf_mcu_shared_set_group_csg_reg_active - Notify that the group is active on-slot with + * scheduling action. Essential runtime resources + * are bound with the group for it to run + * + * @kbdev: Instance of a GPU platform device that implements a CSF interface. + * @group: Pointer to the group that is placed into active on-slot running by the scheduler. + * + */ +void kbase_csf_mcu_shared_set_group_csg_reg_active(struct kbase_device *kbdev, + struct kbase_queue_group *group); + +/** + * kbase_csf_mcu_shared_set_group_csg_reg_unused - Notify that the group is placed off-slot with + * scheduling action. Some of bound runtime + * resources can be reallocated for others to use + * + * @kbdev: Instance of a GPU platform device that implements a CSF interface. + * @group: Pointer to the group that is placed off-slot by the scheduler. + * + */ +void kbase_csf_mcu_shared_set_group_csg_reg_unused(struct kbase_device *kbdev, + struct kbase_queue_group *group); + +/** + * kbase_csf_mcu_shared_group_update_pmode_map - Request to update the given group's protected + * suspend buffer pages to be mapped for supporting + * protected mode operations. + * + * @kbdev: Instance of a GPU platform device that implements a CSF interface. + * @group: Pointer to the group for attempting a protected mode suspend buffer binding/mapping. + * + * Return: 0 for success, the group has a protected suspend buffer region mapped. Otherwise an + * error code is returned. + */ +int kbase_csf_mcu_shared_group_update_pmode_map(struct kbase_device *kbdev, + struct kbase_queue_group *group); + +/** + * kbase_csf_mcu_shared_clear_evicted_group_csg_reg - Clear any bound regions/mappings as the + * given group is evicted out of the runtime + * operations. + * + * @kbdev: Instance of a GPU platform device that implements a CSF interface. + * @group: Pointer to the group that has been evicted out of set of operational groups. + * + * This function will taken away any of the bindings/mappings immediately so the resources + * are not tied up to the given group, which has been evicted out of scheduling action for + * termination. + */ +void kbase_csf_mcu_shared_clear_evicted_group_csg_reg(struct kbase_device *kbdev, + struct kbase_queue_group *group); + +/** + * kbase_csf_mcu_shared_add_queue - Request to add a newly activated queue for a group to be + * run on slot. + * + * @kbdev: Instance of a GPU platform device that implements a CSF interface. + * @queue: Pointer to the queue that requires some runtime resource to be bound for joining + * others that are already running on-slot with their bound group. + * + * Return: 0 on success, or negative on failure. + */ +int kbase_csf_mcu_shared_add_queue(struct kbase_device *kbdev, struct kbase_queue *queue); + +/** + * kbase_csf_mcu_shared_drop_stopped_queue - Request to drop a queue after it has been stopped + * from its operational state from a group. + * + * @kbdev: Instance of a GPU platform device that implements a CSF interface. 
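Taken together, the kernel-doc in this header describes a per-group lifecycle for the MCU shared-region helpers. The sketch below is an editorial, hedged reading of the expected call order from a scheduler path; it is not part of this patch, the surrounding function is hypothetical, and locking (the scheduler lock these helpers assert) plus error handling are elided.

/* Editorial sketch, not part of this patch: call order implied by the
 * kernel-doc above, seen from a hypothetical scheduler path. Locking and
 * error handling are deliberately elided.
 */
#include <mali_kbase.h>
#include "mali_kbase_csf_mcu_shared_reg.h"

static void demo_group_lifecycle(struct kbase_device *kbdev,
				 struct kbase_queue_group *group,
				 struct kbase_queue *new_queue)
{
	/* Device init (once): kbase_csf_mcu_shared_regs_data_init(kbdev); */

	/* Before programming the group on a CSG slot: bind suspend-buffer and
	 * userio mappings, then mark the bound region as actively in use. */
	if (!kbase_csf_mcu_shared_group_bind_csg_reg(kbdev, group))
		kbase_csf_mcu_shared_set_group_csg_reg_active(kbdev, group);

	/* A queue enabled while the group is already on-slot gets its userio
	 * pages mapped individually. */
	kbase_csf_mcu_shared_add_queue(kbdev, new_queue);

	/* Entering protected mode requires the P-mode suspend buffer map. */
	kbase_csf_mcu_shared_group_update_pmode_map(kbdev, group);

	/* Group taken off-slot: the region becomes reclaimable again. */
	kbase_csf_mcu_shared_set_group_csg_reg_unused(kbdev, group);

	/* Group terminated/evicted: drop all of its bindings immediately. */
	kbase_csf_mcu_shared_clear_evicted_group_csg_reg(kbdev, group);

	/* Device term (once): kbase_csf_mcu_shared_regs_data_term(kbdev); */
}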
+ * @queue: Pointer to the queue that has been stopped from operational state. + * + */ +void kbase_csf_mcu_shared_drop_stopped_queue(struct kbase_device *kbdev, struct kbase_queue *queue); + +/** + * kbase_csf_mcu_shared_group_bind_csg_reg - Bind some required runtime resources to the given + * group for ready to run on-slot. + * + * @kbdev: Instance of a GPU platform device that implements a CSF interface. + * @group: Pointer to the queue group that requires the runtime resources. + * + * This function binds/maps the required suspend buffer pages and userio pages for the given + * group, readying it to run on-slot. + * + * Return: 0 on success, or negative on failure. + */ +int kbase_csf_mcu_shared_group_bind_csg_reg(struct kbase_device *kbdev, + struct kbase_queue_group *group); + +/** + * kbase_csf_mcu_shared_regs_data_init - Allocate and initialize the MCU shared regions data for + * the given device. + * + * @kbdev: Instance of a GPU platform device that implements a CSF interface. + * + * This function allocate and initialize the MCU shared VA regions for runtime operations + * of the CSF scheduler. + * + * Return: 0 on success, or an error code. + */ +int kbase_csf_mcu_shared_regs_data_init(struct kbase_device *kbdev); + +/** + * kbase_csf_mcu_shared_regs_data_term - Terminate the allocated MCU shared regions data for + * the given device. + * + * @kbdev: Instance of a GPU platform device that implements a CSF interface. + * + * This function terminates the MCU shared VA regions allocated for runtime operations + * of the CSF scheduler. + */ +void kbase_csf_mcu_shared_regs_data_term(struct kbase_device *kbdev); + +#endif /* _KBASE_CSF_MCU_SHARED_REG_H_ */ diff --git a/mali_kbase/csf/mali_kbase_csf_protected_memory.c b/mali_kbase/csf/mali_kbase_csf_protected_memory.c index bf1835b..1bb1c03 100644 --- a/mali_kbase/csf/mali_kbase_csf_protected_memory.c +++ b/mali_kbase/csf/mali_kbase_csf_protected_memory.c @@ -51,7 +51,12 @@ int kbase_csf_protected_memory_init(struct kbase_device *const kbdev) dev_err(kbdev->dev, "Failed to get Protected memory allocator module\n"); err = -ENODEV; } else { - dev_info(kbdev->dev, "Protected memory allocator successfully loaded\n"); + err = dma_set_mask_and_coherent(&pdev->dev, + DMA_BIT_MASK(kbdev->gpu_props.mmu.pa_bits)); + if (err) + dev_err(&(pdev->dev), "protected_memory_allocator set dma fail\n"); + else + dev_info(kbdev->dev, "Protected memory allocator successfully loaded\n"); } } of_node_put(pma_node); diff --git a/mali_kbase/csf/mali_kbase_csf_registers.h b/mali_kbase/csf/mali_kbase_csf_registers.h index 99de444..b5ca885 100644 --- a/mali_kbase/csf/mali_kbase_csf_registers.h +++ b/mali_kbase/csf/mali_kbase_csf_registers.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2018-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2018-2023 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -31,10 +31,6 @@ * Begin register sets */ -/* DOORBELLS base address */ -#define DOORBELLS_BASE 0x0080000 -#define DOORBELLS_REG(r) (DOORBELLS_BASE + (r)) - /* CS_KERNEL_INPUT_BLOCK base address */ #define CS_KERNEL_INPUT_BLOCK_BASE 0x0000 #define CS_KERNEL_INPUT_BLOCK_REG(r) (CS_KERNEL_INPUT_BLOCK_BASE + (r)) @@ -71,10 +67,6 @@ #define GLB_OUTPUT_BLOCK_BASE 0x0000 #define GLB_OUTPUT_BLOCK_REG(r) (GLB_OUTPUT_BLOCK_BASE + (r)) -/* USER base address */ -#define USER_BASE 0x0010000 -#define USER_REG(r) (USER_BASE + (r)) - /* End register sets */ /* @@ -151,18 +143,23 @@ #define CSG_ACK_IRQ_MASK 0x0004 /* () Global acknowledge interrupt mask */ #define CSG_DB_REQ 0x0008 /* () Global doorbell request */ #define CSG_IRQ_ACK 0x000C /* () CS IRQ acknowledge */ + + #define CSG_ALLOW_COMPUTE_LO 0x0020 /* () Allowed compute endpoints, low word */ #define CSG_ALLOW_COMPUTE_HI 0x0024 /* () Allowed compute endpoints, high word */ #define CSG_ALLOW_FRAGMENT_LO 0x0028 /* () Allowed fragment endpoints, low word */ #define CSG_ALLOW_FRAGMENT_HI 0x002C /* () Allowed fragment endpoints, high word */ #define CSG_ALLOW_OTHER 0x0030 /* () Allowed other endpoints */ -#define CSG_EP_REQ 0x0034 /* () Maximum number of endpoints allowed */ +#define CSG_EP_REQ_LO 0x0034 /* () Maximum number of endpoints allowed, low word */ +#define CSG_EP_REQ_HI 0x0038 /* () Maximum number of endpoints allowed, high word */ #define CSG_SUSPEND_BUF_LO 0x0040 /* () Normal mode suspend buffer, low word */ #define CSG_SUSPEND_BUF_HI 0x0044 /* () Normal mode suspend buffer, high word */ #define CSG_PROTM_SUSPEND_BUF_LO 0x0048 /* () Protected mode suspend buffer, low word */ #define CSG_PROTM_SUSPEND_BUF_HI 0x004C /* () Protected mode suspend buffer, high word */ #define CSG_CONFIG 0x0050 /* () CSG configuration options */ #define CSG_ITER_TRACE_CONFIG 0x0054 /* () CSG trace configuration */ +#define CSG_DVS_BUF_LO 0x0060 /* () Normal mode deferred vertex shading work buffer, low word */ +#define CSG_DVS_BUF_HI 0x0064 /* () Normal mode deferred vertex shading work buffer, high word */ /* CSG_OUTPUT_BLOCK register offsets */ #define CSG_ACK 0x0000 /* () CSG acknowledge flags */ @@ -227,24 +224,43 @@ #define GLB_PRFCNT_TILER_EN 0x0058 /* () Performance counter enable for tiler */ #define GLB_PRFCNT_MMU_L2_EN 0x005C /* () Performance counter enable for MMU/L2 cache */ -#define GLB_DEBUG_FWUTF_DESTROY 0x0FE0 /* () Test fixture destroy function address */ -#define GLB_DEBUG_FWUTF_TEST 0x0FE4 /* () Test index */ -#define GLB_DEBUG_FWUTF_FIXTURE 0x0FE8 /* () Test fixture index */ -#define GLB_DEBUG_FWUTF_CREATE 0x0FEC /* () Test fixture create function address */ +#define GLB_DEBUG_ARG_IN0 0x0FE0 /* Firmware Debug argument array element 0 */ +#define GLB_DEBUG_ARG_IN1 0x0FE4 /* Firmware Debug argument array element 1 */ +#define GLB_DEBUG_ARG_IN2 0x0FE8 /* Firmware Debug argument array element 2 */ +#define GLB_DEBUG_ARG_IN3 0x0FEC /* Firmware Debug argument array element 3 */ + +/* Mappings based on GLB_DEBUG_REQ.FWUTF_RUN bit being different from GLB_DEBUG_ACK.FWUTF_RUN */ +#define GLB_DEBUG_FWUTF_DESTROY GLB_DEBUG_ARG_IN0 /* () Test fixture destroy function address */ +#define GLB_DEBUG_FWUTF_TEST GLB_DEBUG_ARG_IN1 /* () Test index */ +#define GLB_DEBUG_FWUTF_FIXTURE GLB_DEBUG_ARG_IN2 /* () Test fixture index */ +#define GLB_DEBUG_FWUTF_CREATE GLB_DEBUG_ARG_IN3 /* () Test fixture create 
function address */ + #define GLB_DEBUG_ACK_IRQ_MASK 0x0FF8 /* () Global debug acknowledge interrupt mask */ #define GLB_DEBUG_REQ 0x0FFC /* () Global debug request */ /* GLB_OUTPUT_BLOCK register offsets */ +#define GLB_DEBUG_ARG_OUT0 0x0FE0 /* Firmware debug result element 0 */ +#define GLB_DEBUG_ARG_OUT1 0x0FE4 /* Firmware debug result element 1 */ +#define GLB_DEBUG_ARG_OUT2 0x0FE8 /* Firmware debug result element 2 */ +#define GLB_DEBUG_ARG_OUT3 0x0FEC /* Firmware debug result element 3 */ + #define GLB_ACK 0x0000 /* () Global acknowledge */ #define GLB_DB_ACK 0x0008 /* () Global doorbell acknowledge */ #define GLB_HALT_STATUS 0x0010 /* () Global halt status */ #define GLB_PRFCNT_STATUS 0x0014 /* () Performance counter status */ #define GLB_PRFCNT_INSERT 0x0018 /* () Performance counter buffer insert index */ -#define GLB_DEBUG_FWUTF_RESULT 0x0FE0 /* () Firmware debug test result */ +#define GLB_DEBUG_FWUTF_RESULT GLB_DEBUG_ARG_OUT0 /* () Firmware debug test result */ #define GLB_DEBUG_ACK 0x0FFC /* () Global debug acknowledge */ -/* USER register offsets */ -#define LATEST_FLUSH 0x0000 /* () Flush ID of latest clean-and-invalidate operation */ +#ifdef CONFIG_MALI_CORESIGHT +#define GLB_DEBUG_REQ_FW_AS_WRITE_SHIFT 4 +#define GLB_DEBUG_REQ_FW_AS_WRITE_MASK (0x1 << GLB_DEBUG_REQ_FW_AS_WRITE_SHIFT) +#define GLB_DEBUG_REQ_FW_AS_READ_SHIFT 5 +#define GLB_DEBUG_REQ_FW_AS_READ_MASK (0x1 << GLB_DEBUG_REQ_FW_AS_READ_SHIFT) +#define GLB_DEBUG_ARG_IN0 0x0FE0 +#define GLB_DEBUG_ARG_IN1 0x0FE4 +#define GLB_DEBUG_ARG_OUT0 0x0FE0 +#endif /* CONFIG_MALI_CORESIGHT */ /* End register offsets */ @@ -302,10 +318,17 @@ #define CS_REQ_IDLE_RESOURCE_REQ_SHIFT 11 #define CS_REQ_IDLE_RESOURCE_REQ_MASK (0x1 << CS_REQ_IDLE_RESOURCE_REQ_SHIFT) #define CS_REQ_IDLE_RESOURCE_REQ_GET(reg_val) \ - (((reg_val)&CS_REQ_IDLE_RESOURCE_REQ_MASK) >> CS_REQ_IDLE_RESOURCE_REQ_SHIFT) + (((reg_val) & CS_REQ_IDLE_RESOURCE_REQ_MASK) >> CS_REQ_IDLE_RESOURCE_REQ_SHIFT) #define CS_REQ_IDLE_RESOURCE_REQ_SET(reg_val, value) \ (((reg_val) & ~CS_REQ_IDLE_RESOURCE_REQ_MASK) | \ (((value) << CS_REQ_IDLE_RESOURCE_REQ_SHIFT) & CS_REQ_IDLE_RESOURCE_REQ_MASK)) +#define CS_REQ_IDLE_SHARED_SB_DEC_SHIFT 12 +#define CS_REQ_IDLE_SHARED_SB_DEC_MASK (0x1 << CS_REQ_IDLE_SHARED_SB_DEC_SHIFT) +#define CS_REQ_IDLE_SHARED_SB_DEC_GET(reg_val) \ + (((reg_val) & CS_REQ_IDLE_SHARED_SB_DEC_MASK) >> CS_REQ_IDLE_SHARED_SB_DEC_SHIFT) +#define CS_REQ_IDLE_SHARED_SB_DEC_REQ_SET(reg_val, value) \ + (((reg_val) & ~CS_REQ_IDLE_SHARED_SB_DEC_MASK) | \ + (((value) << CS_REQ_IDLE_SHARED_SB_DEC_SHIFT) & CS_REQ_IDLE_SHARED_SB_DEC_MASK)) #define CS_REQ_TILER_OOM_SHIFT 26 #define CS_REQ_TILER_OOM_MASK (0x1 << CS_REQ_TILER_OOM_SHIFT) #define CS_REQ_TILER_OOM_GET(reg_val) (((reg_val)&CS_REQ_TILER_OOM_MASK) >> CS_REQ_TILER_OOM_SHIFT) @@ -387,7 +410,7 @@ /* CS_BASE register */ #define CS_BASE_POINTER_SHIFT 0 -#define CS_BASE_POINTER_MASK (0xFFFFFFFFFFFFFFFF << CS_BASE_POINTER_SHIFT) +#define CS_BASE_POINTER_MASK (GPU_ULL(0xFFFFFFFFFFFFFFFF) << CS_BASE_POINTER_SHIFT) #define CS_BASE_POINTER_GET(reg_val) (((reg_val)&CS_BASE_POINTER_MASK) >> CS_BASE_POINTER_SHIFT) #define CS_BASE_POINTER_SET(reg_val, value) \ (((reg_val) & ~CS_BASE_POINTER_MASK) | (((value) << CS_BASE_POINTER_SHIFT) & CS_BASE_POINTER_MASK)) @@ -401,7 +424,8 @@ /* CS_TILER_HEAP_START register */ #define CS_TILER_HEAP_START_POINTER_SHIFT 0 -#define CS_TILER_HEAP_START_POINTER_MASK (0xFFFFFFFFFFFFFFFF << CS_TILER_HEAP_START_POINTER_SHIFT) +#define CS_TILER_HEAP_START_POINTER_MASK \ + (GPU_ULL(0xFFFFFFFFFFFFFFFF) << 
CS_TILER_HEAP_START_POINTER_SHIFT) #define CS_TILER_HEAP_START_POINTER_GET(reg_val) \ (((reg_val)&CS_TILER_HEAP_START_POINTER_MASK) >> CS_TILER_HEAP_START_POINTER_SHIFT) #define CS_TILER_HEAP_START_POINTER_SET(reg_val, value) \ @@ -412,7 +436,8 @@ /* CS_TILER_HEAP_END register */ #define CS_TILER_HEAP_END_POINTER_SHIFT 0 -#define CS_TILER_HEAP_END_POINTER_MASK (0xFFFFFFFFFFFFFFFF << CS_TILER_HEAP_END_POINTER_SHIFT) +#define CS_TILER_HEAP_END_POINTER_MASK \ + (GPU_ULL(0xFFFFFFFFFFFFFFFF) << CS_TILER_HEAP_END_POINTER_SHIFT) #define CS_TILER_HEAP_END_POINTER_GET(reg_val) \ (((reg_val)&CS_TILER_HEAP_END_POINTER_MASK) >> CS_TILER_HEAP_END_POINTER_SHIFT) #define CS_TILER_HEAP_END_POINTER_SET(reg_val, value) \ @@ -423,7 +448,7 @@ /* CS_USER_INPUT register */ #define CS_USER_INPUT_POINTER_SHIFT 0 -#define CS_USER_INPUT_POINTER_MASK (0xFFFFFFFFFFFFFFFF << CS_USER_INPUT_POINTER_SHIFT) +#define CS_USER_INPUT_POINTER_MASK (GPU_ULL(0xFFFFFFFFFFFFFFFF) << CS_USER_INPUT_POINTER_SHIFT) #define CS_USER_INPUT_POINTER_GET(reg_val) (((reg_val)&CS_USER_INPUT_POINTER_MASK) >> CS_USER_INPUT_POINTER_SHIFT) #define CS_USER_INPUT_POINTER_SET(reg_val, value) \ (((reg_val) & ~CS_USER_INPUT_POINTER_MASK) | \ @@ -431,7 +456,7 @@ /* CS_USER_OUTPUT register */ #define CS_USER_OUTPUT_POINTER_SHIFT 0 -#define CS_USER_OUTPUT_POINTER_MASK (0xFFFFFFFFFFFFFFFF << CS_USER_OUTPUT_POINTER_SHIFT) +#define CS_USER_OUTPUT_POINTER_MASK (GPU_ULL(0xFFFFFFFFFFFFFFFF) << CS_USER_OUTPUT_POINTER_SHIFT) #define CS_USER_OUTPUT_POINTER_GET(reg_val) (((reg_val)&CS_USER_OUTPUT_POINTER_MASK) >> CS_USER_OUTPUT_POINTER_SHIFT) #define CS_USER_OUTPUT_POINTER_SET(reg_val, value) \ (((reg_val) & ~CS_USER_OUTPUT_POINTER_MASK) | \ @@ -470,7 +495,8 @@ /* CS_INSTR_BUFFER_BASE register */ #define CS_INSTR_BUFFER_BASE_POINTER_SHIFT (0) -#define CS_INSTR_BUFFER_BASE_POINTER_MASK ((u64)0xFFFFFFFFFFFFFFFF << CS_INSTR_BUFFER_BASE_POINTER_SHIFT) +#define CS_INSTR_BUFFER_BASE_POINTER_MASK \ + (GPU_ULL(0xFFFFFFFFFFFFFFFF) << CS_INSTR_BUFFER_BASE_POINTER_SHIFT) #define CS_INSTR_BUFFER_BASE_POINTER_GET(reg_val) \ (((reg_val)&CS_INSTR_BUFFER_BASE_POINTER_MASK) >> CS_INSTR_BUFFER_BASE_POINTER_SHIFT) #define CS_INSTR_BUFFER_BASE_POINTER_SET(reg_val, value) \ @@ -479,8 +505,8 @@ /* CS_INSTR_BUFFER_OFFSET_POINTER register */ #define CS_INSTR_BUFFER_OFFSET_POINTER_POINTER_SHIFT (0) -#define CS_INSTR_BUFFER_OFFSET_POINTER_POINTER_MASK \ - (((u64)0xFFFFFFFFFFFFFFFF) << CS_INSTR_BUFFER_OFFSET_POINTER_POINTER_SHIFT) +#define CS_INSTR_BUFFER_OFFSET_POINTER_POINTER_MASK \ + ((GPU_ULL(0xFFFFFFFFFFFFFFFF)) << CS_INSTR_BUFFER_OFFSET_POINTER_POINTER_SHIFT) #define CS_INSTR_BUFFER_OFFSET_POINTER_POINTER_GET(reg_val) \ (((reg_val)&CS_INSTR_BUFFER_OFFSET_POINTER_POINTER_MASK) >> CS_INSTR_BUFFER_OFFSET_POINTER_POINTER_SHIFT) #define CS_INSTR_BUFFER_OFFSET_POINTER_POINTER_SET(reg_val, value) \ @@ -529,7 +555,8 @@ /* CS_STATUS_CMD_PTR register */ #define CS_STATUS_CMD_PTR_POINTER_SHIFT 0 -#define CS_STATUS_CMD_PTR_POINTER_MASK (0xFFFFFFFFFFFFFFFF << CS_STATUS_CMD_PTR_POINTER_SHIFT) +#define CS_STATUS_CMD_PTR_POINTER_MASK \ + (GPU_ULL(0xFFFFFFFFFFFFFFFF) << CS_STATUS_CMD_PTR_POINTER_SHIFT) #define CS_STATUS_CMD_PTR_POINTER_GET(reg_val) \ (((reg_val)&CS_STATUS_CMD_PTR_POINTER_MASK) >> CS_STATUS_CMD_PTR_POINTER_SHIFT) #define CS_STATUS_CMD_PTR_POINTER_SET(reg_val, value) \ @@ -543,6 +570,13 @@ #define CS_STATUS_WAIT_SB_MASK_SET(reg_val, value) \ (((reg_val) & ~CS_STATUS_WAIT_SB_MASK_MASK) | \ (((value) << CS_STATUS_WAIT_SB_MASK_SHIFT) & CS_STATUS_WAIT_SB_MASK_MASK)) +#define 
CS_STATUS_WAIT_SB_SOURCE_SHIFT 16 +#define CS_STATUS_WAIT_SB_SOURCE_MASK (0xF << CS_STATUS_WAIT_SB_SOURCE_SHIFT) +#define CS_STATUS_WAIT_SB_SOURCE_GET(reg_val) \ + (((reg_val)&CS_STATUS_WAIT_SB_SOURCE_MASK) >> CS_STATUS_WAIT_SB_SOURCE_SHIFT) +#define CS_STATUS_WAIT_SB_SOURCE_SET(reg_val, value) \ + (((reg_val) & ~CS_STATUS_WAIT_SB_SOURCE_MASK) | \ + (((value) << CS_STATUS_WAIT_SB_SOURCE_SHIFT) & CS_STATUS_WAIT_SB_SOURCE_MASK)) #define CS_STATUS_WAIT_SYNC_WAIT_CONDITION_SHIFT 24 #define CS_STATUS_WAIT_SYNC_WAIT_CONDITION_MASK (0xF << CS_STATUS_WAIT_SYNC_WAIT_CONDITION_SHIFT) #define CS_STATUS_WAIT_SYNC_WAIT_CONDITION_GET(reg_val) \ @@ -553,6 +587,7 @@ /* CS_STATUS_WAIT_SYNC_WAIT_CONDITION values */ #define CS_STATUS_WAIT_SYNC_WAIT_CONDITION_LE 0x0 #define CS_STATUS_WAIT_SYNC_WAIT_CONDITION_GT 0x1 +#define CS_STATUS_WAIT_SYNC_WAIT_CONDITION_GE 0x5 /* End of CS_STATUS_WAIT_SYNC_WAIT_CONDITION values */ #define CS_STATUS_WAIT_PROGRESS_WAIT_SHIFT 28 #define CS_STATUS_WAIT_PROGRESS_WAIT_MASK (0x1 << CS_STATUS_WAIT_PROGRESS_WAIT_SHIFT) @@ -568,6 +603,13 @@ #define CS_STATUS_WAIT_PROTM_PEND_SET(reg_val, value) \ (((reg_val) & ~CS_STATUS_WAIT_PROTM_PEND_MASK) | \ (((value) << CS_STATUS_WAIT_PROTM_PEND_SHIFT) & CS_STATUS_WAIT_PROTM_PEND_MASK)) +#define CS_STATUS_WAIT_SYNC_WAIT_SIZE_SHIFT 30 +#define CS_STATUS_WAIT_SYNC_WAIT_SIZE_MASK (0x1 << CS_STATUS_WAIT_SYNC_WAIT_SIZE_SHIFT) +#define CS_STATUS_WAIT_SYNC_WAIT_SIZE_GET(reg_val) \ + (((reg_val)&CS_STATUS_WAIT_SYNC_WAIT_SIZE_MASK) >> CS_STATUS_WAIT_SYNC_WAIT_SIZE_SHIFT) +#define CS_STATUS_WAIT_SYNC_WAIT_SIZE_SET(reg_val, value) \ + (((reg_val) & ~CS_STATUS_WAIT_SYNC_WAIT_SIZE_MASK) | \ + (((value) << CS_STATUS_WAIT_SYNC_WAIT_SIZE_SHIFT) & CS_STATUS_WAIT_SYNC_WAIT_SIZE_MASK)) #define CS_STATUS_WAIT_SYNC_WAIT_SHIFT 31 #define CS_STATUS_WAIT_SYNC_WAIT_MASK (0x1 << CS_STATUS_WAIT_SYNC_WAIT_SHIFT) #define CS_STATUS_WAIT_SYNC_WAIT_GET(reg_val) \ @@ -606,9 +648,11 @@ (((reg_val) & ~CS_STATUS_REQ_RESOURCE_IDVS_RESOURCES_MASK) | \ (((value) << CS_STATUS_REQ_RESOURCE_IDVS_RESOURCES_SHIFT) & CS_STATUS_REQ_RESOURCE_IDVS_RESOURCES_MASK)) + /* CS_STATUS_WAIT_SYNC_POINTER register */ #define CS_STATUS_WAIT_SYNC_POINTER_POINTER_SHIFT 0 -#define CS_STATUS_WAIT_SYNC_POINTER_POINTER_MASK (0xFFFFFFFFFFFFFFFF << CS_STATUS_WAIT_SYNC_POINTER_POINTER_SHIFT) +#define CS_STATUS_WAIT_SYNC_POINTER_POINTER_MASK \ + (GPU_ULL(0xFFFFFFFFFFFFFFFF) << CS_STATUS_WAIT_SYNC_POINTER_POINTER_SHIFT) #define CS_STATUS_WAIT_SYNC_POINTER_POINTER_GET(reg_val) \ (((reg_val)&CS_STATUS_WAIT_SYNC_POINTER_POINTER_MASK) >> CS_STATUS_WAIT_SYNC_POINTER_POINTER_SHIFT) #define CS_STATUS_WAIT_SYNC_POINTER_POINTER_SET(reg_val, value) \ @@ -677,6 +721,27 @@ #define CS_FAULT_EXCEPTION_TYPE_ADDR_RANGE_FAULT 0x5A #define CS_FAULT_EXCEPTION_TYPE_IMPRECISE_FAULT 0x5B #define CS_FAULT_EXCEPTION_TYPE_RESOURCE_EVICTION_TIMEOUT 0x69 +#define CS_FAULT_EXCEPTION_TYPE_TRANSLATION_FAULT_L0 0xC0 +#define CS_FAULT_EXCEPTION_TYPE_TRANSLATION_FAULT_L1 0xC1 +#define CS_FAULT_EXCEPTION_TYPE_TRANSLATION_FAULT_L2 0xC2 +#define CS_FAULT_EXCEPTION_TYPE_TRANSLATION_FAULT_L3 0xC3 +#define CS_FAULT_EXCEPTION_TYPE_TRANSLATION_FAULT_L4 0xC4 +#define CS_FAULT_EXCEPTION_TYPE_PERMISSION_FAULT_0 0xC8 +#define CS_FAULT_EXCEPTION_TYPE_PERMISSION_FAULT_1 0xC9 +#define CS_FAULT_EXCEPTION_TYPE_PERMISSION_FAULT_2 0xCA +#define CS_FAULT_EXCEPTION_TYPE_PERMISSION_FAULT_3 0xCB +#define CS_FAULT_EXCEPTION_TYPE_ACCESS_FLAG_1 0xD9 +#define CS_FAULT_EXCEPTION_TYPE_ACCESS_FLAG_2 0xDA +#define CS_FAULT_EXCEPTION_TYPE_ACCESS_FLAG_3 0xDB +#define 
CS_FAULT_EXCEPTION_TYPE_ADDRESS_SIZE_FAULT_IN 0xE0 +#define CS_FAULT_EXCEPTION_TYPE_ADDRESS_SIZE_FAULT_OUT_0 0xE4 +#define CS_FAULT_EXCEPTION_TYPE_ADDRESS_SIZE_FAULT_OUT_1 0xE5 +#define CS_FAULT_EXCEPTION_TYPE_ADDRESS_SIZE_FAULT_OUT_2 0xE6 +#define CS_FAULT_EXCEPTION_TYPE_ADDRESS_SIZE_FAULT_OUT_3 0xE7 +#define CS_FAULT_EXCEPTION_TYPE_MEMORY_ATTRIBUTE_FAULT_0 0xE8 +#define CS_FAULT_EXCEPTION_TYPE_MEMORY_ATTRIBUTE_FAULT_1 0xE9 +#define CS_FAULT_EXCEPTION_TYPE_MEMORY_ATTRIBUTE_FAULT_2 0xEA +#define CS_FAULT_EXCEPTION_TYPE_MEMORY_ATTRIBUTE_FAULT_3 0xEB /* End of CS_FAULT_EXCEPTION_TYPE values */ #define CS_FAULT_EXCEPTION_DATA_SHIFT 8 #define CS_FAULT_EXCEPTION_DATA_MASK (0xFFFFFF << CS_FAULT_EXCEPTION_DATA_SHIFT) @@ -694,6 +759,7 @@ (((value) << CS_FATAL_EXCEPTION_TYPE_SHIFT) & CS_FATAL_EXCEPTION_TYPE_MASK)) /* CS_FATAL_EXCEPTION_TYPE values */ #define CS_FATAL_EXCEPTION_TYPE_CS_CONFIG_FAULT 0x40 +#define CS_FATAL_EXCEPTION_TYPE_CS_UNRECOVERABLE 0x41 #define CS_FATAL_EXCEPTION_TYPE_CS_ENDPOINT_FAULT 0x44 #define CS_FATAL_EXCEPTION_TYPE_CS_BUS_FAULT 0x48 #define CS_FATAL_EXCEPTION_TYPE_CS_INVALID_INSTRUCTION 0x49 @@ -709,7 +775,8 @@ /* CS_FAULT_INFO register */ #define CS_FAULT_INFO_EXCEPTION_DATA_SHIFT 0 -#define CS_FAULT_INFO_EXCEPTION_DATA_MASK (0xFFFFFFFFFFFFFFFF << CS_FAULT_INFO_EXCEPTION_DATA_SHIFT) +#define CS_FAULT_INFO_EXCEPTION_DATA_MASK \ + (GPU_ULL(0xFFFFFFFFFFFFFFFF) << CS_FAULT_INFO_EXCEPTION_DATA_SHIFT) #define CS_FAULT_INFO_EXCEPTION_DATA_GET(reg_val) \ (((reg_val)&CS_FAULT_INFO_EXCEPTION_DATA_MASK) >> CS_FAULT_INFO_EXCEPTION_DATA_SHIFT) #define CS_FAULT_INFO_EXCEPTION_DATA_SET(reg_val, value) \ @@ -718,7 +785,8 @@ /* CS_FATAL_INFO register */ #define CS_FATAL_INFO_EXCEPTION_DATA_SHIFT 0 -#define CS_FATAL_INFO_EXCEPTION_DATA_MASK (0xFFFFFFFFFFFFFFFF << CS_FATAL_INFO_EXCEPTION_DATA_SHIFT) +#define CS_FATAL_INFO_EXCEPTION_DATA_MASK \ + (GPU_ULL(0xFFFFFFFFFFFFFFFF) << CS_FATAL_INFO_EXCEPTION_DATA_SHIFT) #define CS_FATAL_INFO_EXCEPTION_DATA_GET(reg_val) \ (((reg_val)&CS_FATAL_INFO_EXCEPTION_DATA_MASK) >> CS_FATAL_INFO_EXCEPTION_DATA_SHIFT) #define CS_FATAL_INFO_EXCEPTION_DATA_SET(reg_val, value) \ @@ -750,7 +818,7 @@ /* CS_HEAP_ADDRESS register */ #define CS_HEAP_ADDRESS_POINTER_SHIFT 0 -#define CS_HEAP_ADDRESS_POINTER_MASK (0xFFFFFFFFFFFFFFFF << CS_HEAP_ADDRESS_POINTER_SHIFT) +#define CS_HEAP_ADDRESS_POINTER_MASK (GPU_ULL(0xFFFFFFFFFFFFFFFF) << CS_HEAP_ADDRESS_POINTER_SHIFT) #define CS_HEAP_ADDRESS_POINTER_GET(reg_val) (((reg_val)&CS_HEAP_ADDRESS_POINTER_MASK) >> CS_HEAP_ADDRESS_POINTER_SHIFT) #define CS_HEAP_ADDRESS_POINTER_SET(reg_val, value) \ (((reg_val) & ~CS_HEAP_ADDRESS_POINTER_MASK) | \ @@ -761,14 +829,14 @@ /* CS_INSERT register */ #define CS_INSERT_VALUE_SHIFT 0 -#define CS_INSERT_VALUE_MASK (0xFFFFFFFFFFFFFFFF << CS_INSERT_VALUE_SHIFT) +#define CS_INSERT_VALUE_MASK (GPU_ULL(0xFFFFFFFFFFFFFFFF) << CS_INSERT_VALUE_SHIFT) #define CS_INSERT_VALUE_GET(reg_val) (((reg_val)&CS_INSERT_VALUE_MASK) >> CS_INSERT_VALUE_SHIFT) #define CS_INSERT_VALUE_SET(reg_val, value) \ (((reg_val) & ~CS_INSERT_VALUE_MASK) | (((value) << CS_INSERT_VALUE_SHIFT) & CS_INSERT_VALUE_MASK)) /* CS_EXTRACT_INIT register */ #define CS_EXTRACT_INIT_VALUE_SHIFT 0 -#define CS_EXTRACT_INIT_VALUE_MASK (0xFFFFFFFFFFFFFFFF << CS_EXTRACT_INIT_VALUE_SHIFT) +#define CS_EXTRACT_INIT_VALUE_MASK (GPU_ULL(0xFFFFFFFFFFFFFFFF) << CS_EXTRACT_INIT_VALUE_SHIFT) #define CS_EXTRACT_INIT_VALUE_GET(reg_val) (((reg_val)&CS_EXTRACT_INIT_VALUE_MASK) >> CS_EXTRACT_INIT_VALUE_SHIFT) #define CS_EXTRACT_INIT_VALUE_SET(reg_val, value) \ 
(((reg_val) & ~CS_EXTRACT_INIT_VALUE_MASK) | \ @@ -779,7 +847,7 @@ /* CS_EXTRACT register */ #define CS_EXTRACT_VALUE_SHIFT 0 -#define CS_EXTRACT_VALUE_MASK (0xFFFFFFFFFFFFFFFF << CS_EXTRACT_VALUE_SHIFT) +#define CS_EXTRACT_VALUE_MASK (GPU_ULL(0xFFFFFFFFFFFFFFFF) << CS_EXTRACT_VALUE_SHIFT) #define CS_EXTRACT_VALUE_GET(reg_val) (((reg_val)&CS_EXTRACT_VALUE_MASK) >> CS_EXTRACT_VALUE_SHIFT) #define CS_EXTRACT_VALUE_SET(reg_val, value) \ (((reg_val) & ~CS_EXTRACT_VALUE_MASK) | (((value) << CS_EXTRACT_VALUE_SHIFT) & CS_EXTRACT_VALUE_MASK)) @@ -827,11 +895,6 @@ #define CSG_REQ_IDLE_GET(reg_val) (((reg_val)&CSG_REQ_IDLE_MASK) >> CSG_REQ_IDLE_SHIFT) #define CSG_REQ_IDLE_SET(reg_val, value) \ (((reg_val) & ~CSG_REQ_IDLE_MASK) | (((value) << CSG_REQ_IDLE_SHIFT) & CSG_REQ_IDLE_MASK)) -#define CSG_REQ_DOORBELL_SHIFT 30 -#define CSG_REQ_DOORBELL_MASK (0x1 << CSG_REQ_DOORBELL_SHIFT) -#define CSG_REQ_DOORBELL_GET(reg_val) (((reg_val)&CSG_REQ_DOORBELL_MASK) >> CSG_REQ_DOORBELL_SHIFT) -#define CSG_REQ_DOORBELL_SET(reg_val, value) \ - (((reg_val) & ~CSG_REQ_DOORBELL_MASK) | (((value) << CSG_REQ_DOORBELL_SHIFT) & CSG_REQ_DOORBELL_MASK)) #define CSG_REQ_PROGRESS_TIMER_EVENT_SHIFT 31 #define CSG_REQ_PROGRESS_TIMER_EVENT_MASK (0x1 << CSG_REQ_PROGRESS_TIMER_EVENT_SHIFT) #define CSG_REQ_PROGRESS_TIMER_EVENT_GET(reg_val) \ @@ -894,45 +957,50 @@ /* CSG_EP_REQ register */ #define CSG_EP_REQ_COMPUTE_EP_SHIFT 0 -#define CSG_EP_REQ_COMPUTE_EP_MASK (0xFF << CSG_EP_REQ_COMPUTE_EP_SHIFT) +#define CSG_EP_REQ_COMPUTE_EP_MASK ((u64)0xFF << CSG_EP_REQ_COMPUTE_EP_SHIFT) #define CSG_EP_REQ_COMPUTE_EP_GET(reg_val) (((reg_val)&CSG_EP_REQ_COMPUTE_EP_MASK) >> CSG_EP_REQ_COMPUTE_EP_SHIFT) -#define CSG_EP_REQ_COMPUTE_EP_SET(reg_val, value) \ - (((reg_val) & ~CSG_EP_REQ_COMPUTE_EP_MASK) | \ - (((value) << CSG_EP_REQ_COMPUTE_EP_SHIFT) & CSG_EP_REQ_COMPUTE_EP_MASK)) +#define CSG_EP_REQ_COMPUTE_EP_SET(reg_val, value) \ + (((reg_val) & ~CSG_EP_REQ_COMPUTE_EP_MASK) | \ + ((((u64)value) << CSG_EP_REQ_COMPUTE_EP_SHIFT) & CSG_EP_REQ_COMPUTE_EP_MASK)) #define CSG_EP_REQ_FRAGMENT_EP_SHIFT 8 -#define CSG_EP_REQ_FRAGMENT_EP_MASK (0xFF << CSG_EP_REQ_FRAGMENT_EP_SHIFT) +#define CSG_EP_REQ_FRAGMENT_EP_MASK ((u64)0xFF << CSG_EP_REQ_FRAGMENT_EP_SHIFT) #define CSG_EP_REQ_FRAGMENT_EP_GET(reg_val) (((reg_val)&CSG_EP_REQ_FRAGMENT_EP_MASK) >> CSG_EP_REQ_FRAGMENT_EP_SHIFT) -#define CSG_EP_REQ_FRAGMENT_EP_SET(reg_val, value) \ - (((reg_val) & ~CSG_EP_REQ_FRAGMENT_EP_MASK) | \ - (((value) << CSG_EP_REQ_FRAGMENT_EP_SHIFT) & CSG_EP_REQ_FRAGMENT_EP_MASK)) +#define CSG_EP_REQ_FRAGMENT_EP_SET(reg_val, value) \ + (((reg_val) & ~CSG_EP_REQ_FRAGMENT_EP_MASK) | \ + ((((u64)value) << CSG_EP_REQ_FRAGMENT_EP_SHIFT) & CSG_EP_REQ_FRAGMENT_EP_MASK)) #define CSG_EP_REQ_TILER_EP_SHIFT 16 -#define CSG_EP_REQ_TILER_EP_MASK (0xF << CSG_EP_REQ_TILER_EP_SHIFT) +#define CSG_EP_REQ_TILER_EP_MASK ((u64)0xF << CSG_EP_REQ_TILER_EP_SHIFT) #define CSG_EP_REQ_TILER_EP_GET(reg_val) (((reg_val)&CSG_EP_REQ_TILER_EP_MASK) >> CSG_EP_REQ_TILER_EP_SHIFT) -#define CSG_EP_REQ_TILER_EP_SET(reg_val, value) \ - (((reg_val) & ~CSG_EP_REQ_TILER_EP_MASK) | (((value) << CSG_EP_REQ_TILER_EP_SHIFT) & CSG_EP_REQ_TILER_EP_MASK)) +#define CSG_EP_REQ_TILER_EP_SET(reg_val, value) \ + (((reg_val) & ~CSG_EP_REQ_TILER_EP_MASK) | \ + ((((u64)value) << CSG_EP_REQ_TILER_EP_SHIFT) & CSG_EP_REQ_TILER_EP_MASK)) #define CSG_EP_REQ_EXCLUSIVE_COMPUTE_SHIFT 20 -#define CSG_EP_REQ_EXCLUSIVE_COMPUTE_MASK (0x1 << CSG_EP_REQ_EXCLUSIVE_COMPUTE_SHIFT) +#define CSG_EP_REQ_EXCLUSIVE_COMPUTE_MASK ((u64)0x1 << 
CSG_EP_REQ_EXCLUSIVE_COMPUTE_SHIFT) #define CSG_EP_REQ_EXCLUSIVE_COMPUTE_GET(reg_val) \ (((reg_val)&CSG_EP_REQ_EXCLUSIVE_COMPUTE_MASK) >> CSG_EP_REQ_EXCLUSIVE_COMPUTE_SHIFT) -#define CSG_EP_REQ_EXCLUSIVE_COMPUTE_SET(reg_val, value) \ - (((reg_val) & ~CSG_EP_REQ_EXCLUSIVE_COMPUTE_MASK) | \ - (((value) << CSG_EP_REQ_EXCLUSIVE_COMPUTE_SHIFT) & CSG_EP_REQ_EXCLUSIVE_COMPUTE_MASK)) +#define CSG_EP_REQ_EXCLUSIVE_COMPUTE_SET(reg_val, value) \ + (((reg_val) & ~CSG_EP_REQ_EXCLUSIVE_COMPUTE_MASK) | \ + ((((u64)value) << CSG_EP_REQ_EXCLUSIVE_COMPUTE_SHIFT) & \ + CSG_EP_REQ_EXCLUSIVE_COMPUTE_MASK)) #define CSG_EP_REQ_EXCLUSIVE_FRAGMENT_SHIFT 21 -#define CSG_EP_REQ_EXCLUSIVE_FRAGMENT_MASK (0x1 << CSG_EP_REQ_EXCLUSIVE_FRAGMENT_SHIFT) +#define CSG_EP_REQ_EXCLUSIVE_FRAGMENT_MASK ((u64)0x1 << CSG_EP_REQ_EXCLUSIVE_FRAGMENT_SHIFT) #define CSG_EP_REQ_EXCLUSIVE_FRAGMENT_GET(reg_val) \ (((reg_val)&CSG_EP_REQ_EXCLUSIVE_FRAGMENT_MASK) >> CSG_EP_REQ_EXCLUSIVE_FRAGMENT_SHIFT) -#define CSG_EP_REQ_EXCLUSIVE_FRAGMENT_SET(reg_val, value) \ - (((reg_val) & ~CSG_EP_REQ_EXCLUSIVE_FRAGMENT_MASK) | \ - (((value) << CSG_EP_REQ_EXCLUSIVE_FRAGMENT_SHIFT) & CSG_EP_REQ_EXCLUSIVE_FRAGMENT_MASK)) +#define CSG_EP_REQ_EXCLUSIVE_FRAGMENT_SET(reg_val, value) \ + (((reg_val) & ~CSG_EP_REQ_EXCLUSIVE_FRAGMENT_MASK) | \ + ((((u64)value) << CSG_EP_REQ_EXCLUSIVE_FRAGMENT_SHIFT) & \ + CSG_EP_REQ_EXCLUSIVE_FRAGMENT_MASK)) #define CSG_EP_REQ_PRIORITY_SHIFT 28 -#define CSG_EP_REQ_PRIORITY_MASK (0xF << CSG_EP_REQ_PRIORITY_SHIFT) +#define CSG_EP_REQ_PRIORITY_MASK ((u64)0xF << CSG_EP_REQ_PRIORITY_SHIFT) #define CSG_EP_REQ_PRIORITY_GET(reg_val) (((reg_val)&CSG_EP_REQ_PRIORITY_MASK) >> CSG_EP_REQ_PRIORITY_SHIFT) -#define CSG_EP_REQ_PRIORITY_SET(reg_val, value) \ - (((reg_val) & ~CSG_EP_REQ_PRIORITY_MASK) | (((value) << CSG_EP_REQ_PRIORITY_SHIFT) & CSG_EP_REQ_PRIORITY_MASK)) +#define CSG_EP_REQ_PRIORITY_SET(reg_val, value) \ + (((reg_val) & ~CSG_EP_REQ_PRIORITY_MASK) | \ + ((((u64)value) << CSG_EP_REQ_PRIORITY_SHIFT) & CSG_EP_REQ_PRIORITY_MASK)) + /* CSG_SUSPEND_BUF register */ #define CSG_SUSPEND_BUF_POINTER_SHIFT 0 -#define CSG_SUSPEND_BUF_POINTER_MASK (0xFFFFFFFFFFFFFFFF << CSG_SUSPEND_BUF_POINTER_SHIFT) +#define CSG_SUSPEND_BUF_POINTER_MASK (GPU_ULL(0xFFFFFFFFFFFFFFFF) << CSG_SUSPEND_BUF_POINTER_SHIFT) #define CSG_SUSPEND_BUF_POINTER_GET(reg_val) (((reg_val)&CSG_SUSPEND_BUF_POINTER_MASK) >> CSG_SUSPEND_BUF_POINTER_SHIFT) #define CSG_SUSPEND_BUF_POINTER_SET(reg_val, value) \ (((reg_val) & ~CSG_SUSPEND_BUF_POINTER_MASK) | \ @@ -940,13 +1008,29 @@ /* CSG_PROTM_SUSPEND_BUF register */ #define CSG_PROTM_SUSPEND_BUF_POINTER_SHIFT 0 -#define CSG_PROTM_SUSPEND_BUF_POINTER_MASK (0xFFFFFFFFFFFFFFFF << CSG_PROTM_SUSPEND_BUF_POINTER_SHIFT) +#define CSG_PROTM_SUSPEND_BUF_POINTER_MASK \ + (GPU_ULL(0xFFFFFFFFFFFFFFFF) << CSG_PROTM_SUSPEND_BUF_POINTER_SHIFT) #define CSG_PROTM_SUSPEND_BUF_POINTER_GET(reg_val) \ (((reg_val)&CSG_PROTM_SUSPEND_BUF_POINTER_MASK) >> CSG_PROTM_SUSPEND_BUF_POINTER_SHIFT) #define CSG_PROTM_SUSPEND_BUF_POINTER_SET(reg_val, value) \ (((reg_val) & ~CSG_PROTM_SUSPEND_BUF_POINTER_MASK) | \ (((value) << CSG_PROTM_SUSPEND_BUF_POINTER_SHIFT) & CSG_PROTM_SUSPEND_BUF_POINTER_MASK)) +/* CSG_DVS_BUF_BUFFER register */ +#define CSG_DVS_BUF_BUFFER_SIZE_SHIFT GPU_U(0) +#define CSG_DVS_BUF_BUFFER_SIZE_MASK (GPU_U(0xFFF) << CSG_DVS_BUF_BUFFER_SIZE_SHIFT) +#define CSG_DVS_BUF_BUFFER_SIZE_GET(reg_val) (((reg_val)&CSG_DVS_BUF_BUFFER_SIZE_MASK) >> CSG_DVS_BUF_BUFFER_SIZE_SHIFT) +#define CSG_DVS_BUF_BUFFER_SIZE_SET(reg_val, value) \ + (((reg_val) & 
~CSG_DVS_BUF_BUFFER_SIZE_MASK) | \ + (((value) << CSG_DVS_BUF_BUFFER_SIZE_SHIFT) & CSG_DVS_BUF_BUFFER_SIZE_MASK)) +#define CSG_DVS_BUF_BUFFER_POINTER_SHIFT GPU_U(12) +#define CSG_DVS_BUF_BUFFER_POINTER_MASK \ + (GPU_ULL(0xFFFFFFFFFFFFF) << CSG_DVS_BUF_BUFFER_POINTER_SHIFT) +#define CSG_DVS_BUF_BUFFER_POINTER_GET(reg_val) \ + (((reg_val)&CSG_DVS_BUF_BUFFER_POINTER_MASK) >> CSG_DVS_BUF_BUFFER_POINTER_SHIFT) +#define CSG_DVS_BUF_BUFFER_POINTER_SET(reg_val, value) \ + (((reg_val) & ~CSG_DVS_BUF_BUFFER_POINTER_MASK) | \ + (((value) << CSG_DVS_BUF_BUFFER_POINTER_SHIFT) & CSG_DVS_BUF_BUFFER_POINTER_MASK)) /* End of CSG_INPUT_BLOCK register set definitions */ @@ -1021,6 +1105,7 @@ (((reg_val) & ~CSG_STATUS_EP_CURRENT_TILER_EP_MASK) | \ (((value) << CSG_STATUS_EP_CURRENT_TILER_EP_SHIFT) & CSG_STATUS_EP_CURRENT_TILER_EP_MASK)) + /* CSG_STATUS_EP_REQ register */ #define CSG_STATUS_EP_REQ_COMPUTE_EP_SHIFT 0 #define CSG_STATUS_EP_REQ_COMPUTE_EP_MASK (0xFF << CSG_STATUS_EP_REQ_COMPUTE_EP_SHIFT) @@ -1058,6 +1143,7 @@ (((reg_val) & ~CSG_STATUS_EP_REQ_EXCLUSIVE_FRAGMENT_MASK) | \ (((value) << CSG_STATUS_EP_REQ_EXCLUSIVE_FRAGMENT_SHIFT) & CSG_STATUS_EP_REQ_EXCLUSIVE_FRAGMENT_MASK)) + /* End of CSG_OUTPUT_BLOCK register set definitions */ /* STREAM_CONTROL_BLOCK register set definitions */ @@ -1406,9 +1492,23 @@ #define GLB_PWROFF_TIMER_TIMER_SOURCE_GPU_COUNTER 0x1 /* End of GLB_PWROFF_TIMER_TIMER_SOURCE values */ +/* GLB_PWROFF_TIMER_CONFIG register */ +#ifndef GLB_PWROFF_TIMER_CONFIG +#define GLB_PWROFF_TIMER_CONFIG 0x0088 /* () Configuration fields for GLB_PWROFF_TIMER */ +#define GLB_PWROFF_TIMER_CONFIG_NO_MODIFIER_SHIFT 0 +#define GLB_PWROFF_TIMER_CONFIG_NO_MODIFIER_MASK (0x1 << GLB_PWROFF_TIMER_CONFIG_NO_MODIFIER_SHIFT) +#define GLB_PWROFF_TIMER_CONFIG_NO_MODIFIER_GET(reg_val) \ + (((reg_val)&GLB_PWROFF_TIMER_CONFIG_NO_MODIFIER_MASK) >> \ + GLB_PWROFF_TIMER_CONFIG_NO_MODIFIER_SHIFT) +#define GLB_PWROFF_TIMER_CONFIG_NO_MODIFIER_SET(reg_val, value) \ + (((reg_val) & ~GLB_PWROFF_TIMER_CONFIG_NO_MODIFIER_MASK) | \ + (((value) << GLB_PWROFF_TIMER_CONFIG_NO_MODIFIER_SHIFT) & \ + GLB_PWROFF_TIMER_CONFIG_NO_MODIFIER_MASK)) +#endif /* End of GLB_PWROFF_TIMER_CONFIG values */ + /* GLB_ALLOC_EN register */ #define GLB_ALLOC_EN_MASK_SHIFT 0 -#define GLB_ALLOC_EN_MASK_MASK (0xFFFFFFFFFFFFFFFF << GLB_ALLOC_EN_MASK_SHIFT) +#define GLB_ALLOC_EN_MASK_MASK (GPU_ULL(0xFFFFFFFFFFFFFFFF) << GLB_ALLOC_EN_MASK_SHIFT) #define GLB_ALLOC_EN_MASK_GET(reg_val) (((reg_val)&GLB_ALLOC_EN_MASK_MASK) >> GLB_ALLOC_EN_MASK_SHIFT) #define GLB_ALLOC_EN_MASK_SET(reg_val, value) \ (((reg_val) & ~GLB_ALLOC_EN_MASK_MASK) | (((value) << GLB_ALLOC_EN_MASK_SHIFT) & GLB_ALLOC_EN_MASK_MASK)) @@ -1471,6 +1571,20 @@ #define GLB_IDLE_TIMER_TIMER_SOURCE_GPU_COUNTER 0x1 /* End of GLB_IDLE_TIMER_TIMER_SOURCE values */ +/* GLB_IDLE_TIMER_CONFIG values */ +#ifndef GLB_IDLE_TIMER_CONFIG +#define GLB_IDLE_TIMER_CONFIG 0x0084 /* () Configuration fields for GLB_IDLE_TIMER */ +#define GLB_IDLE_TIMER_CONFIG_NO_MODIFIER_SHIFT 0 +#define GLB_IDLE_TIMER_CONFIG_NO_MODIFIER_MASK (0x1 << GLB_IDLE_TIMER_CONFIG_NO_MODIFIER_SHIFT) +#define GLB_IDLE_TIMER_CONFIG_NO_MODIFIER_GET(reg_val) \ + (((reg_val)&GLB_IDLE_TIMER_CONFIG_NO_MODIFIER_MASK) >> \ + GLB_IDLE_TIMER_CONFIG_NO_MODIFIER_SHIFT) +#define GLB_IDLE_TIMER_CONFIG_NO_MODIFIER_SET(reg_val, value) \ + (((reg_val) & ~GLB_IDLE_TIMER_CONFIG_NO_MODIFIER_MASK) | \ + (((value) << GLB_IDLE_TIMER_CONFIG_NO_MODIFIER_SHIFT) & \ + GLB_IDLE_TIMER_CONFIG_NO_MODIFIER_MASK)) +#endif /* End of GLB_IDLE_TIMER_CONFIG values */ + /* 
GLB_INSTR_FEATURES register */ #define GLB_INSTR_FEATURES_OFFSET_UPDATE_RATE_SHIFT (0) #define GLB_INSTR_FEATURES_OFFSET_UPDATE_RATE_MASK ((u32)0xF << GLB_INSTR_FEATURES_OFFSET_UPDATE_RATE_SHIFT) @@ -1521,4 +1635,84 @@ (((value) << GLB_REQ_ITER_TRACE_ENABLE_SHIFT) & \ GLB_REQ_ITER_TRACE_ENABLE_MASK)) +/* GLB_PRFCNT_CONFIG register */ +#define GLB_PRFCNT_CONFIG_SIZE_SHIFT (0) +#define GLB_PRFCNT_CONFIG_SIZE_MASK (0xFF << GLB_PRFCNT_CONFIG_SIZE_SHIFT) +#define GLB_PRFCNT_CONFIG_SIZE_GET(reg_val) \ + (((reg_val)&GLB_PRFCNT_CONFIG_SIZE_MASK) >> GLB_PRFCNT_CONFIG_SIZE_SHIFT) +#define GLB_PRFCNT_CONFIG_SIZE_SET(reg_val, value) \ + (((reg_val) & ~GLB_PRFCNT_CONFIG_SIZE_MASK) | \ + (((value) << GLB_PRFCNT_CONFIG_SIZE_SHIFT) & GLB_PRFCNT_CONFIG_SIZE_MASK)) +#define GLB_PRFCNT_CONFIG_SET_SELECT_SHIFT GPU_U(8) +#define GLB_PRFCNT_CONFIG_SET_SELECT_MASK (GPU_U(0x3) << GLB_PRFCNT_CONFIG_SET_SELECT_SHIFT) +#define GLB_PRFCNT_CONFIG_SET_SELECT_GET(reg_val) \ + (((reg_val)&GLB_PRFCNT_CONFIG_SET_SELECT_MASK) >> GLB_PRFCNT_CONFIG_SET_SELECT_SHIFT) +#define GLB_PRFCNT_CONFIG_SET_SELECT_SET(reg_val, value) \ + (((reg_val) & ~GLB_PRFCNT_CONFIG_SET_SELECT_MASK) | \ + (((value) << GLB_PRFCNT_CONFIG_SET_SELECT_SHIFT) & GLB_PRFCNT_CONFIG_SET_SELECT_MASK)) + +/* GLB_PRFCNT_SIZE register */ +#define GLB_PRFCNT_SIZE_HARDWARE_SIZE_SET_MOD(value) ((value) >> 8) +#define GLB_PRFCNT_SIZE_HARDWARE_SIZE_GET_MOD(value) ((value) << 8) +#define GLB_PRFCNT_SIZE_HARDWARE_SIZE_SHIFT GPU_U(0) +#define GLB_PRFCNT_SIZE_HARDWARE_SIZE_MASK (GPU_U(0xFFFF) << GLB_PRFCNT_SIZE_HARDWARE_SIZE_SHIFT) +#define GLB_PRFCNT_SIZE_HARDWARE_SIZE_GET(reg_val) \ + (GLB_PRFCNT_SIZE_HARDWARE_SIZE_GET_MOD(((reg_val)&GLB_PRFCNT_SIZE_HARDWARE_SIZE_MASK) >> \ + GLB_PRFCNT_SIZE_HARDWARE_SIZE_SHIFT)) +#define GLB_PRFCNT_SIZE_HARDWARE_SIZE_SET(reg_val, value) \ + (((reg_val) & ~GLB_PRFCNT_SIZE_HARDWARE_SIZE_MASK) | \ + ((GLB_PRFCNT_SIZE_HARDWARE_SIZE_SET_MOD(value) << GLB_PRFCNT_SIZE_HARDWARE_SIZE_SHIFT) & \ + GLB_PRFCNT_SIZE_HARDWARE_SIZE_MASK)) +#define GLB_PRFCNT_SIZE_FIRMWARE_SIZE_SET_MOD(value) ((value) >> 8) +#define GLB_PRFCNT_SIZE_FIRMWARE_SIZE_GET_MOD(value) ((value) << 8) +#define GLB_PRFCNT_SIZE_FIRMWARE_SIZE_SHIFT GPU_U(16) +#define GLB_PRFCNT_SIZE_FIRMWARE_SIZE_MASK (GPU_U(0xFFFF) << GLB_PRFCNT_SIZE_FIRMWARE_SIZE_SHIFT) +#define GLB_PRFCNT_SIZE_FIRMWARE_SIZE_GET(reg_val) \ + (GLB_PRFCNT_SIZE_FIRMWARE_SIZE_GET_MOD(((reg_val)&GLB_PRFCNT_SIZE_FIRMWARE_SIZE_MASK) >> \ + GLB_PRFCNT_SIZE_FIRMWARE_SIZE_SHIFT)) +#define GLB_PRFCNT_SIZE_FIRMWARE_SIZE_SET(reg_val, value) \ + (((reg_val) & ~GLB_PRFCNT_SIZE_FIRMWARE_SIZE_MASK) | \ + ((GLB_PRFCNT_SIZE_FIRMWARE_SIZE_SET_MOD(value) << GLB_PRFCNT_SIZE_FIRMWARE_SIZE_SHIFT) & \ + GLB_PRFCNT_SIZE_FIRMWARE_SIZE_MASK)) + +/* GLB_DEBUG_REQ register */ +#define GLB_DEBUG_REQ_DEBUG_RUN_SHIFT GPU_U(23) +#define GLB_DEBUG_REQ_DEBUG_RUN_MASK (GPU_U(0x1) << GLB_DEBUG_REQ_DEBUG_RUN_SHIFT) +#define GLB_DEBUG_REQ_DEBUG_RUN_GET(reg_val) \ + (((reg_val)&GLB_DEBUG_REQ_DEBUG_RUN_MASK) >> GLB_DEBUG_REQ_DEBUG_RUN_SHIFT) +#define GLB_DEBUG_REQ_DEBUG_RUN_SET(reg_val, value) \ + (((reg_val) & ~GLB_DEBUG_REQ_DEBUG_RUN_MASK) | \ + (((value) << GLB_DEBUG_REQ_DEBUG_RUN_SHIFT) & GLB_DEBUG_REQ_DEBUG_RUN_MASK)) + +#define GLB_DEBUG_REQ_RUN_MODE_SHIFT GPU_U(24) +#define GLB_DEBUG_REQ_RUN_MODE_MASK (GPU_U(0xFF) << GLB_DEBUG_REQ_RUN_MODE_SHIFT) +#define GLB_DEBUG_REQ_RUN_MODE_GET(reg_val) \ + (((reg_val)&GLB_DEBUG_REQ_RUN_MODE_MASK) >> GLB_DEBUG_REQ_RUN_MODE_SHIFT) +#define GLB_DEBUG_REQ_RUN_MODE_SET(reg_val, value) \ + (((reg_val) & 
~GLB_DEBUG_REQ_RUN_MODE_MASK) | \ + (((value) << GLB_DEBUG_REQ_RUN_MODE_SHIFT) & GLB_DEBUG_REQ_RUN_MODE_MASK)) + +/* GLB_DEBUG_ACK register */ +#define GLB_DEBUG_ACK_DEBUG_RUN_SHIFT GPU_U(23) +#define GLB_DEBUG_ACK_DEBUG_RUN_MASK (GPU_U(0x1) << GLB_DEBUG_ACK_DEBUG_RUN_SHIFT) +#define GLB_DEBUG_ACK_DEBUG_RUN_GET(reg_val) \ + (((reg_val)&GLB_DEBUG_ACK_DEBUG_RUN_MASK) >> GLB_DEBUG_ACK_DEBUG_RUN_SHIFT) +#define GLB_DEBUG_ACK_DEBUG_RUN_SET(reg_val, value) \ + (((reg_val) & ~GLB_DEBUG_ACK_DEBUG_RUN_MASK) | \ + (((value) << GLB_DEBUG_ACK_DEBUG_RUN_SHIFT) & GLB_DEBUG_ACK_DEBUG_RUN_MASK)) + +#define GLB_DEBUG_ACK_RUN_MODE_SHIFT GPU_U(24) +#define GLB_DEBUG_ACK_RUN_MODE_MASK (GPU_U(0xFF) << GLB_DEBUG_ACK_RUN_MODE_SHIFT) +#define GLB_DEBUG_ACK_RUN_MODE_GET(reg_val) \ + (((reg_val)&GLB_DEBUG_ACK_RUN_MODE_MASK) >> GLB_DEBUG_ACK_RUN_MODE_SHIFT) +#define GLB_DEBUG_ACK_RUN_MODE_SET(reg_val, value) \ + (((reg_val) & ~GLB_DEBUG_ACK_RUN_MODE_MASK) | \ + (((value) << GLB_DEBUG_ACK_RUN_MODE_SHIFT) & GLB_DEBUG_ACK_RUN_MODE_MASK)) + + +/* RUN_MODE values */ +#define GLB_DEBUG_RUN_MODE_TYPE_NOP 0x0 +#define GLB_DEBUG_RUN_MODE_TYPE_CORE_DUMP 0x1 +/* End of RUN_MODE values */ + #endif /* _KBASE_CSF_REGISTERS_H_ */ diff --git a/mali_kbase/csf/mali_kbase_csf_reset_gpu.c b/mali_kbase/csf/mali_kbase_csf_reset_gpu.c index 10de93f..8ed65b1 100644 --- a/mali_kbase/csf/mali_kbase_csf_reset_gpu.c +++ b/mali_kbase/csf/mali_kbase_csf_reset_gpu.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -21,7 +21,7 @@ #include <mali_kbase.h> #include <mali_kbase_ctx_sched.h> -#include <mali_kbase_hwcnt_context.h> +#include <hwcnt/mali_kbase_hwcnt_context.h> #include <device/mali_kbase_device.h> #include <backend/gpu/mali_kbase_irq_internal.h> #include <backend/gpu/mali_kbase_pm_internal.h> @@ -29,7 +29,10 @@ #include <csf/mali_kbase_csf_trace_buffer.h> #include <csf/ipa_control/mali_kbase_csf_ipa_control.h> #include <mali_kbase_reset_gpu.h> -#include <linux/string.h> +#include <csf/mali_kbase_csf_firmware_log.h> +#include "mali_kbase_config_platform.h" + +#include <soc/google/debug-snapshot.h> enum kbasep_soft_reset_status { RESET_SUCCESS = 0, @@ -163,6 +166,11 @@ void kbase_reset_gpu_assert_failed_or_prevented(struct kbase_device *kbdev) WARN_ON(kbase_reset_gpu_is_active(kbdev)); } +bool kbase_reset_gpu_failed(struct kbase_device *kbdev) +{ + return (atomic_read(&kbdev->csf.reset.state) == KBASE_CSF_RESET_GPU_FAILED); +} + /* Mark the reset as now happening, and synchronize with other threads that * might be trying to access the GPU */ @@ -173,6 +181,9 @@ static void kbase_csf_reset_begin_hw_access_sync( unsigned long hwaccess_lock_flags; unsigned long scheduler_spin_lock_flags; + /* Flush any pending coredumps */ + flush_work(&kbdev->csf.coredump_work); + /* Note this is a WARN/atomic_set because it is a software issue for a * race to be occurring here */ @@ -185,7 +196,7 @@ static void kbase_csf_reset_begin_hw_access_sync( */ spin_lock_irqsave(&kbdev->hwaccess_lock, hwaccess_lock_flags); kbase_csf_scheduler_spin_lock(kbdev, &scheduler_spin_lock_flags); - atomic_set(&kbdev->csf.reset.state, KBASE_RESET_GPU_HAPPENING); + atomic_set(&kbdev->csf.reset.state, KBASE_CSF_RESET_GPU_HAPPENING); kbase_csf_scheduler_spin_unlock(kbdev, 
scheduler_spin_lock_flags); spin_unlock_irqrestore(&kbdev->hwaccess_lock, hwaccess_lock_flags); } @@ -215,6 +226,9 @@ static void kbase_csf_reset_end_hw_access(struct kbase_device *kbdev, } else { dev_err(kbdev->dev, "Reset failed to complete"); atomic_set(&kbdev->csf.reset.state, KBASE_CSF_RESET_GPU_FAILED); + + /* pixel: This is unrecoverable, collect a ramdump and reboot. */ + dbg_snapshot_emergency_reboot("mali: reset failed - unrecoverable GPU"); } kbase_csf_scheduler_spin_unlock(kbdev, scheduler_spin_lock_flags); @@ -231,23 +245,27 @@ static void kbase_csf_reset_end_hw_access(struct kbase_device *kbdev, kbase_csf_scheduler_enable_tick_timer(kbdev); } -static void kbase_csf_debug_dump_registers(struct kbase_device *kbdev) +void kbase_csf_debug_dump_registers(struct kbase_device *kbdev) { +#define DOORBELL_CFG_BASE 0x20000 +#define MCUC_DB_VALUE_0 0x80 + struct kbase_csf_global_iface *global_iface = &kbdev->csf.global_iface; kbase_io_history_dump(kbdev); - dev_err(kbdev->dev, "Register state:"); + dev_err(kbdev->dev, "MCU state:"); dev_err(kbdev->dev, " GPU_IRQ_RAWSTAT=0x%08x GPU_STATUS=0x%08x MCU_STATUS=0x%08x", kbase_reg_read(kbdev, GPU_CONTROL_REG(GPU_IRQ_RAWSTAT)), kbase_reg_read(kbdev, GPU_CONTROL_REG(GPU_STATUS)), kbase_reg_read(kbdev, GPU_CONTROL_REG(MCU_STATUS))); - dev_err(kbdev->dev, " JOB_IRQ_RAWSTAT=0x%08x MMU_IRQ_RAWSTAT=0x%08x GPU_FAULTSTATUS=0x%08x", + dev_err(kbdev->dev, + " JOB_IRQ_RAWSTAT=0x%08x MMU_IRQ_RAWSTAT=0x%08x GPU_FAULTSTATUS=0x%08x", kbase_reg_read(kbdev, JOB_CONTROL_REG(JOB_IRQ_RAWSTAT)), - kbase_reg_read(kbdev, MMU_REG(MMU_IRQ_RAWSTAT)), + kbase_reg_read(kbdev, MMU_CONTROL_REG(MMU_IRQ_RAWSTAT)), kbase_reg_read(kbdev, GPU_CONTROL_REG(GPU_FAULTSTATUS))); dev_err(kbdev->dev, " GPU_IRQ_MASK=0x%08x JOB_IRQ_MASK=0x%08x MMU_IRQ_MASK=0x%08x", kbase_reg_read(kbdev, GPU_CONTROL_REG(GPU_IRQ_MASK)), kbase_reg_read(kbdev, JOB_CONTROL_REG(JOB_IRQ_MASK)), - kbase_reg_read(kbdev, MMU_REG(MMU_IRQ_MASK))); + kbase_reg_read(kbdev, MMU_CONTROL_REG(MMU_IRQ_MASK))); dev_err(kbdev->dev, " PWR_OVERRIDE0=0x%08x PWR_OVERRIDE1=0x%08x", kbase_reg_read(kbdev, GPU_CONTROL_REG(PWR_OVERRIDE0)), kbase_reg_read(kbdev, GPU_CONTROL_REG(PWR_OVERRIDE1))); @@ -255,68 +273,12 @@ static void kbase_csf_debug_dump_registers(struct kbase_device *kbdev) kbase_reg_read(kbdev, GPU_CONTROL_REG(SHADER_CONFIG)), kbase_reg_read(kbdev, GPU_CONTROL_REG(L2_MMU_CONFIG)), kbase_reg_read(kbdev, GPU_CONTROL_REG(TILER_CONFIG))); -} - -static void kbase_csf_dump_firmware_trace_buffer(struct kbase_device *kbdev) -{ - u8 *buf, *p, *pnewline, *pend, *pendbuf; - unsigned int read_size, remaining_size; - struct firmware_trace_buffer *tb = - kbase_csf_firmware_get_trace_buffer(kbdev, FW_TRACE_BUF_NAME); - - if (tb == NULL) { - dev_dbg(kbdev->dev, "Can't get the trace buffer, firmware trace dump skipped"); - return; - } - - buf = kmalloc(PAGE_SIZE + 1, GFP_KERNEL); - if (buf == NULL) { - dev_err(kbdev->dev, "Short of memory, firmware trace dump skipped"); - return; - } - - buf[PAGE_SIZE] = 0; - - p = buf; - pendbuf = &buf[PAGE_SIZE]; - - dev_err(kbdev->dev, "Firmware trace buffer dump:"); - while ((read_size = kbase_csf_firmware_trace_buffer_read_data(tb, p, - pendbuf - p))) { - pend = p + read_size; - p = buf; - - while (p < pend && (pnewline = memchr(p, '\n', pend - p))) { - /* Null-terminate the string */ - *pnewline = 0; - - dev_err(kbdev->dev, "FW> %s", p); - - p = pnewline + 1; - } - - remaining_size = pend - p; - - if (!remaining_size) { - p = buf; - } else if (remaining_size < PAGE_SIZE) { - /* Copy unfinished 
string to the start of the buffer */ - memmove(buf, p, remaining_size); - p = &buf[remaining_size]; - } else { - /* Print abnormal page-long string without newlines */ - dev_err(kbdev->dev, "FW> %s", buf); - p = buf; - } - } - - if (p != buf) { - /* Null-terminate and print last unfinished string */ - *p = 0; - dev_err(kbdev->dev, "FW> %s", buf); - } - - kfree(buf); + dev_err(kbdev->dev, " MCU DB0: %x", kbase_reg_read(kbdev, DOORBELL_CFG_BASE + MCUC_DB_VALUE_0)); + dev_err(kbdev->dev, " MCU GLB_REQ %x GLB_ACK %x", + kbase_csf_firmware_global_input_read(global_iface, GLB_REQ), + kbase_csf_firmware_global_output(global_iface, GLB_ACK)); +#undef MCUC_DB_VALUE_0 +#undef DOORBELL_CFG_BASE } /** @@ -378,7 +340,6 @@ static enum kbasep_soft_reset_status kbase_csf_reset_gpu_once(struct kbase_devic "The flush has completed so reset the active indicator\n"); kbdev->irq_reset_flush = false; - mutex_lock(&kbdev->pm.lock); if (!silent) dev_err(kbdev->dev, "Resetting GPU (allowing up to %d ms)", RESET_TIMEOUT); @@ -389,7 +350,7 @@ static enum kbasep_soft_reset_status kbase_csf_reset_gpu_once(struct kbase_devic if (!silent) { kbase_csf_debug_dump_registers(kbdev); if (likely(firmware_inited)) - kbase_csf_dump_firmware_trace_buffer(kbdev); + kbase_csf_firmware_log_dump_buffer(kbdev); } spin_lock_irqsave(&kbdev->hwaccess_lock, flags); @@ -403,10 +364,11 @@ static enum kbasep_soft_reset_status kbase_csf_reset_gpu_once(struct kbase_devic */ kbase_hwcnt_backend_csf_on_before_reset(&kbdev->hwcnt_gpu_iface); + rt_mutex_lock(&kbdev->pm.lock); /* Reset the GPU */ err = kbase_pm_init_hw(kbdev, 0); - mutex_unlock(&kbdev->pm.lock); + rt_mutex_unlock(&kbdev->pm.lock); if (WARN_ON(err)) return SOFT_RESET_FAILED; @@ -420,17 +382,19 @@ static enum kbasep_soft_reset_status kbase_csf_reset_gpu_once(struct kbase_devic kbase_pm_enable_interrupts(kbdev); - mutex_lock(&kbdev->pm.lock); + rt_mutex_lock(&kbdev->pm.lock); kbase_pm_reset_complete(kbdev); /* Synchronously wait for the reload of firmware to complete */ err = kbase_pm_wait_for_desired_state(kbdev); - mutex_unlock(&kbdev->pm.lock); + rt_mutex_unlock(&kbdev->pm.lock); if (err) { + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); if (!kbase_pm_l2_is_in_desired_state(kbdev)) ret = L2_ON_FAILED; else if (!kbase_pm_mcu_is_in_desired_state(kbdev)) ret = MCU_REINIT_FAILED; + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); } return ret; @@ -512,6 +476,7 @@ static void kbase_csf_reset_gpu_worker(struct work_struct *data) atomic_read(&kbdev->csf.reset.state); const bool silent = kbase_csf_reset_state_is_silent(initial_reset_state); + struct gpu_uevent evt; /* Ensure any threads (e.g. 
executing the CSF scheduler) have finished * using the HW @@ -549,6 +514,16 @@ static void kbase_csf_reset_gpu_worker(struct work_struct *data) kbase_disjoint_state_down(kbdev); + if (err) { + evt.type = GPU_UEVENT_TYPE_GPU_RESET; + evt.info = GPU_UEVENT_INFO_CSF_RESET_FAILED; + } else { + evt.type = GPU_UEVENT_TYPE_GPU_RESET; + evt.info = GPU_UEVENT_INFO_CSF_RESET_OK; + } + if (!silent) + pixel_gpu_uevent_send(kbdev, &evt); + /* Allow other threads to once again use the GPU */ kbase_csf_reset_end_hw_access(kbdev, err, firmware_inited); } @@ -566,6 +541,9 @@ bool kbase_prepare_to_reset_gpu(struct kbase_device *kbdev, unsigned int flags) /* Some other thread is already resetting the GPU */ return false; + if (flags & RESET_FLAGS_FORCE_PM_HW_RESET) + kbdev->csf.reset.force_pm_hw_reset = true; + return true; } KBASE_EXPORT_TEST_API(kbase_prepare_to_reset_gpu); @@ -633,6 +611,11 @@ bool kbase_reset_gpu_is_active(struct kbase_device *kbdev) return kbase_csf_reset_state_is_active(reset_state); } +bool kbase_reset_gpu_is_not_pending(struct kbase_device *kbdev) +{ + return atomic_read(&kbdev->csf.reset.state) == KBASE_CSF_RESET_GPU_NOT_PENDING; +} + int kbase_reset_gpu_wait(struct kbase_device *kbdev) { const long wait_timeout = @@ -676,7 +659,7 @@ KBASE_EXPORT_TEST_API(kbase_reset_gpu_wait); int kbase_reset_gpu_init(struct kbase_device *kbdev) { - kbdev->csf.reset.workq = alloc_workqueue("Mali reset workqueue", 0, 1); + kbdev->csf.reset.workq = alloc_workqueue("Mali reset workqueue", WQ_HIGHPRI, 1); if (kbdev->csf.reset.workq == NULL) return -ENOMEM; @@ -684,6 +667,7 @@ int kbase_reset_gpu_init(struct kbase_device *kbdev) init_waitqueue_head(&kbdev->csf.reset.wait); init_rwsem(&kbdev->csf.reset.sem); + kbdev->csf.reset.force_pm_hw_reset = false; return 0; } diff --git a/mali_kbase/csf/mali_kbase_csf_scheduler.c b/mali_kbase/csf/mali_kbase_csf_scheduler.c index 237b7be..01d6feb 100644 --- a/mali_kbase/csf/mali_kbase_csf_scheduler.c +++ b/mali_kbase/csf/mali_kbase_csf_scheduler.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2018-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2018-2023 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -19,6 +19,8 @@ * */ +#include <linux/kthread.h> + #include <mali_kbase.h> #include "mali_kbase_config_defaults.h" #include <mali_kbase_ctx_sched.h> @@ -28,9 +30,19 @@ #include <tl/mali_kbase_tracepoints.h> #include <backend/gpu/mali_kbase_pm_internal.h> #include <linux/export.h> +#include <linux/delay.h> #include <csf/mali_kbase_csf_registers.h> #include <uapi/gpu/arm/midgard/mali_base_kernel.h> #include <mali_kbase_hwaccess_time.h> +#include <trace/events/power.h> +#include "mali_kbase_csf_tiler_heap.h" +#include "mali_kbase_csf_tiler_heap_reclaim.h" +#include "mali_kbase_csf_mcu_shared_reg.h" +#include <linux/version_compat_defs.h> +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) +#include <mali_kbase_gpu_metrics.h> +#include <csf/mali_kbase_csf_trace_buffer.h> +#endif /* CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD */ /* Value to indicate that a queue group is not groups_to_schedule list */ #define KBASEP_GROUP_PREPARED_SEQ_NUM_INVALID (U32_MAX) @@ -50,36 +62,18 @@ /* CSF scheduler time slice value */ #define CSF_SCHEDULER_TIME_TICK_MS (100) /* 100 milliseconds */ -/* - * CSF scheduler time threshold for converting "tock" requests into "tick" if - * they come too close to the end of a tick interval. This avoids scheduling - * twice in a row. - */ -#define CSF_SCHEDULER_TIME_TICK_THRESHOLD_MS \ - CSF_SCHEDULER_TIME_TICK_MS - -#define CSF_SCHEDULER_TIME_TICK_THRESHOLD_JIFFIES \ - msecs_to_jiffies(CSF_SCHEDULER_TIME_TICK_THRESHOLD_MS) - -/* Nanoseconds per millisecond */ -#define NS_PER_MS ((u64)1000 * 1000) - -/* - * CSF minimum time to reschedule for a new "tock" request. Bursts of "tock" - * requests are not serviced immediately, but shall wait for a minimum time in - * order to reduce load on the CSF scheduler thread. - */ -#define CSF_SCHEDULER_TIME_TOCK_JIFFIES 1 /* 1 jiffies-time */ - -/* CS suspended and is idle (empty ring buffer) */ -#define CS_IDLE_FLAG (1 << 0) - -/* CS suspended and is wait for a CQS condition */ -#define CS_WAIT_SYNC_FLAG (1 << 1) +/* CSG_REQ:STATUS_UPDATE timeout */ +#define CSG_STATUS_UPDATE_REQ_TIMEOUT_MS (250) /* 250 milliseconds */ /* A GPU address space slot is reserved for MCU. 
*/ #define NUM_RESERVED_AS_SLOTS (1) +/* Time to wait for completion of PING req before considering MCU as hung */ +#define FW_PING_AFTER_ERROR_TIMEOUT_MS (10) + +/* Explicitly defining this blocked_reason code as SB_WAIT for clarity */ +#define CS_STATUS_BLOCKED_ON_SB_WAIT CS_STATUS_BLOCKED_REASON_REASON_WAIT + static int scheduler_group_schedule(struct kbase_queue_group *group); static void remove_group_from_idle_wait(struct kbase_queue_group *const group); static @@ -97,9 +91,441 @@ static int suspend_active_queue_groups(struct kbase_device *kbdev, static int suspend_active_groups_on_powerdown(struct kbase_device *kbdev, bool system_suspend); static void schedule_in_cycle(struct kbase_queue_group *group, bool force); +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS +static bool evaluate_sync_update(struct kbase_queue *queue); +#endif +static bool queue_group_scheduled_locked(struct kbase_queue_group *group); #define kctx_as_enabled(kctx) (!kbase_ctx_flag(kctx, KCTX_AS_DISABLED_ON_FAULT)) +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS +void turn_on_sc_power_rails(struct kbase_device *kbdev) +{ + lockdep_assert_held(&kbdev->csf.scheduler.lock); + + WARN_ON(kbdev->csf.scheduler.state == SCHED_SUSPENDED); + + if (kbdev->csf.scheduler.sc_power_rails_off) { + if (kbdev->pm.backend.callback_power_on_sc_rails) + kbdev->pm.backend.callback_power_on_sc_rails(kbdev); + kbdev->csf.scheduler.sc_power_rails_off = false; + } +} + +/** + * turn_off_sc_power_rails - Turn off the shader core power rails. + * + * @kbdev: Pointer to the device. + * + * This function is called to synchronously turn off the shader core power rails. + */ +static void turn_off_sc_power_rails(struct kbase_device *kbdev) +{ + lockdep_assert_held(&kbdev->csf.scheduler.lock); + + WARN_ON(kbdev->csf.scheduler.state == SCHED_SUSPENDED); + + if (!kbdev->csf.scheduler.sc_power_rails_off) { + if (kbdev->pm.backend.callback_power_off_sc_rails) + kbdev->pm.backend.callback_power_off_sc_rails(kbdev); + kbdev->csf.scheduler.sc_power_rails_off = true; + } +} + +/** + * gpu_idle_event_is_pending - Check if there is a pending GPU idle event + * + * @kbdev: Pointer to the device. + */ +static bool gpu_idle_event_is_pending(struct kbase_device *kbdev) +{ + struct kbase_csf_global_iface *global_iface = &kbdev->csf.global_iface; + + lockdep_assert_held(&kbdev->csf.scheduler.lock); + lockdep_assert_held(&kbdev->csf.scheduler.interrupt_lock); + + return (kbase_csf_firmware_global_input_read(global_iface, GLB_REQ) ^ + kbase_csf_firmware_global_output(global_iface, GLB_ACK)) & + GLB_REQ_IDLE_EVENT_MASK; +} + +/** + * ack_gpu_idle_event - Acknowledge the GPU idle event + * + * @kbdev: Pointer to the device. + * + * This function is called to acknowledge the GPU idle event. It is expected + * that firmware will re-enable the User submission only when it receives a + * CSI kernel doorbell after the idle event acknowledgement. 
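+ *
+ * The IDLE_EVENT bit follows the usual GLB_REQ/GLB_ACK toggle convention: the
+ * event is pending while the bit differs between the two registers (see
+ * gpu_idle_event_is_pending() above), and it is acknowledged by copying the
+ * GLB_ACK value of the bit back into GLB_REQ, which is what the masked write
+ * in the body below does.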
+ */ +static void ack_gpu_idle_event(struct kbase_device *kbdev) +{ + struct kbase_csf_global_iface *global_iface = &kbdev->csf.global_iface; + u32 glb_req, glb_ack; + unsigned long flags; + + lockdep_assert_held(&kbdev->csf.scheduler.lock); + + spin_lock_irqsave(&kbdev->csf.scheduler.interrupt_lock, flags); + glb_req = kbase_csf_firmware_global_input_read(global_iface, GLB_REQ); + glb_ack = kbase_csf_firmware_global_output(global_iface, GLB_ACK); + if ((glb_req ^ glb_ack) & GLB_REQ_IDLE_EVENT_MASK) { + kbase_csf_firmware_global_input_mask( + global_iface, GLB_REQ, glb_ack, + GLB_REQ_IDLE_EVENT_MASK); + } + spin_unlock_irqrestore(&kbdev->csf.scheduler.interrupt_lock, flags); +} + +static void cancel_gpu_idle_work(struct kbase_device *kbdev) +{ + lockdep_assert_held(&kbdev->csf.scheduler.lock); + + kbdev->csf.scheduler.gpu_idle_work_pending = false; + cancel_delayed_work(&kbdev->csf.scheduler.gpu_idle_work); +} + +static bool queue_empty_or_blocked(struct kbase_queue *queue) +{ + bool empty = false; + bool blocked = false; + + if (CS_STATUS_WAIT_SYNC_WAIT_GET(queue->status_wait)) { + if (!evaluate_sync_update(queue)) + blocked = true; + else + queue->status_wait = 0; + } + + if (!blocked) { + u64 *input_addr = (u64 *)queue->user_io_addr; + u64 *output_addr = (u64 *)(queue->user_io_addr + PAGE_SIZE); + + empty = (input_addr[CS_INSERT_LO / sizeof(u64)] == + output_addr[CS_EXTRACT_LO / sizeof(u64)]); + } + + return (empty || blocked); +} +#endif + +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) +/** + * gpu_metrics_ctx_init() - Take a reference on GPU metrics context if it exists, + * otherwise allocate and initialise one. + * + * @kctx: Pointer to the Kbase context. + * + * The GPU metrics context represents an "Application" for the purposes of GPU metrics + * reporting. There may be multiple kbase_contexts contributing data to a single GPU + * metrics context. + * This function takes a reference on GPU metrics context if it already exists + * corresponding to the Application that is creating the Kbase context, otherwise + * memory is allocated for it and initialised. + * + * Return: 0 on success, or negative on failure. + */ +static inline int gpu_metrics_ctx_init(struct kbase_context *kctx) +{ + struct kbase_gpu_metrics_ctx *gpu_metrics_ctx; + struct kbase_device *kbdev = kctx->kbdev; + int ret = 0; + + const struct cred *cred = get_current_cred(); + const unsigned int aid = cred->euid.val; + + put_cred(cred); + + /* Return early if this is not a Userspace created context */ + if (unlikely(!kctx->kfile)) + return 0; + + /* Serialize against the other threads trying to create/destroy Kbase contexts. */ + mutex_lock(&kbdev->kctx_list_lock); + rt_mutex_lock(&kbdev->csf.scheduler.lock); + gpu_metrics_ctx = kbase_gpu_metrics_ctx_get(kbdev, aid); + rt_mutex_unlock(&kbdev->csf.scheduler.lock); + + if (!gpu_metrics_ctx) { + gpu_metrics_ctx = kmalloc(sizeof(*gpu_metrics_ctx), GFP_KERNEL); + + if (gpu_metrics_ctx) { + rt_mutex_lock(&kbdev->csf.scheduler.lock); + kbase_gpu_metrics_ctx_init(kbdev, gpu_metrics_ctx, aid); + rt_mutex_unlock(&kbdev->csf.scheduler.lock); + } else { + dev_err(kbdev->dev, "Allocation for gpu_metrics_ctx failed"); + ret = -ENOMEM; + } + } + + kctx->gpu_metrics_ctx = gpu_metrics_ctx; + mutex_unlock(&kbdev->kctx_list_lock); + + return ret; +} + +/** + * gpu_metrics_ctx_term() - Drop a reference on a GPU metrics context and free it + * if the refcount becomes 0. + * + * @kctx: Pointer to the Kbase context. 
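+ *
+ * This function is the counterpart of gpu_metrics_ctx_init(): a context that
+ * successfully ran the init function on creation is expected to call this one
+ * exactly once on teardown. A rough sketch of the pairing (the caller paths
+ * named here are illustrative, not taken from this change):
+ *
+ *   ret = gpu_metrics_ctx_init(kctx);    <- context creation path
+ *   ...
+ *   gpu_metrics_ctx_term(kctx);          <- context teardown path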
+ */ +static inline void gpu_metrics_ctx_term(struct kbase_context *kctx) +{ + /* Return early if this is not a Userspace created context */ + if (unlikely(!kctx->kfile)) + return; + + /* Serialize against the other threads trying to create/destroy Kbase contexts. */ + mutex_lock(&kctx->kbdev->kctx_list_lock); + rt_mutex_lock(&kctx->kbdev->csf.scheduler.lock); + kbase_gpu_metrics_ctx_put(kctx->kbdev, kctx->gpu_metrics_ctx); + rt_mutex_unlock(&kctx->kbdev->csf.scheduler.lock); + mutex_unlock(&kctx->kbdev->kctx_list_lock); +} + +/** + * struct gpu_metrics_event - A GPU metrics event recorded in trace buffer. + * + * @csg_slot_act: The 32bit data consisting of a GPU metrics event. + * 5 bits[4:0] represents CSG slot number. + * 1 bit [5] represents the transition of the CSG group on the slot. + * '1' means idle->active whilst '0' does active->idle. + * @timestamp: 64bit timestamp consisting of a GPU metrics event. + * + * Note: It's packed and word-aligned as agreed layout with firmware. + */ +struct gpu_metrics_event { + u32 csg_slot_act; + u64 timestamp; +} __packed __aligned(4); +#define GPU_METRICS_EVENT_SIZE sizeof(struct gpu_metrics_event) + +#define GPU_METRICS_ACT_SHIFT 5 +#define GPU_METRICS_ACT_MASK (0x1 << GPU_METRICS_ACT_SHIFT) +#define GPU_METRICS_ACT_GET(val) (((val)&GPU_METRICS_ACT_MASK) >> GPU_METRICS_ACT_SHIFT) + +#define GPU_METRICS_CSG_MASK 0x1f +#define GPU_METRICS_CSG_GET(val) ((val)&GPU_METRICS_CSG_MASK) + +/** + * gpu_metrics_read_event() - Read a GPU metrics trace from trace buffer + * + * @kbdev: Pointer to the device + * @kctx: Kcontext that is derived from CSG slot field of a GPU metrics. + * @prev_act: Previous CSG activity transition in a GPU metrics. + * @cur_act: Current CSG activity transition in a GPU metrics. + * @ts: CSG activity transition timestamp in a GPU metrics. + * + * This function reads firmware trace buffer, named 'gpu_metrics' and + * parse one 12-byte data packet into following information. + * - The number of CSG slot on which CSG was transitioned to active or idle. + * - Activity transition (1: idle->active, 0: active->idle). + * - Timestamp in nanoseconds when the transition occurred. + * + * Return: true on success. + */ +static bool gpu_metrics_read_event(struct kbase_device *kbdev, struct kbase_context **kctx, + bool *prev_act, bool *cur_act, uint64_t *ts) +{ + struct firmware_trace_buffer *tb = kbdev->csf.scheduler.gpu_metrics_tb; + struct gpu_metrics_event e; + + if (kbase_csf_firmware_trace_buffer_read_data(tb, (u8 *)&e, GPU_METRICS_EVENT_SIZE) == + GPU_METRICS_EVENT_SIZE) { + const u8 slot = GPU_METRICS_CSG_GET(e.csg_slot_act); + struct kbase_queue_group *group; + + if (WARN_ON_ONCE(slot >= kbdev->csf.global_iface.group_num)) { + dev_err(kbdev->dev, "invalid CSG slot (%u)", slot); + return false; + } + + group = kbdev->csf.scheduler.csg_slots[slot].resident_group; + + if (unlikely(!group)) { + dev_err(kbdev->dev, "failed to find CSG group from CSG slot (%u)", slot); + return false; + } + + *cur_act = GPU_METRICS_ACT_GET(e.csg_slot_act); + *ts = kbase_backend_time_convert_gpu_to_cpu(kbdev, e.timestamp); + *kctx = group->kctx; + + *prev_act = group->prev_act; + group->prev_act = *cur_act; + + return true; + } + + dev_err(kbdev->dev, "failed to read a GPU metrics from trace buffer"); + + return false; +} + +/** + * emit_gpu_metrics_to_frontend() - Emit GPU metrics events to the frontend. + * + * @kbdev: Pointer to the device + * + * This function must be called to emit GPU metrics data to the + * frontend whenever needed. 
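+ *
+ * Each drained event is decoded with the GPU_METRICS_CSG_GET() and
+ * GPU_METRICS_ACT_GET() helpers defined above. As a worked example (the value
+ * is illustrative, not taken from a real trace):
+ *
+ *   csg_slot_act == 0x23:  slot = 0x23 & 0x1f = 3
+ *                          act  = (0x23 & 0x20) >> 5 = 1  (idle -> active)
+ *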
+ * Calls to this function will be serialized by scheduler lock. + * + * Kbase reports invalid activity traces when detected. + */ +static void emit_gpu_metrics_to_frontend(struct kbase_device *kbdev) +{ + u64 system_time = 0; + u64 ts_before_drain; + u64 ts = 0; + + lockdep_assert_held(&kbdev->csf.scheduler.lock); + +#if IS_ENABLED(CONFIG_MALI_NO_MALI) + return; +#endif + + if (WARN_ON_ONCE(kbdev->csf.scheduler.state == SCHED_SUSPENDED)) + return; + + kbase_backend_get_gpu_time_norequest(kbdev, NULL, &system_time, NULL); + ts_before_drain = kbase_backend_time_convert_gpu_to_cpu(kbdev, system_time); + + while (!kbase_csf_firmware_trace_buffer_is_empty(kbdev->csf.scheduler.gpu_metrics_tb)) { + struct kbase_context *kctx; + bool prev_act; + bool cur_act; + + if (gpu_metrics_read_event(kbdev, &kctx, &prev_act, &cur_act, &ts)) { + if (prev_act == cur_act) { + /* Error handling + * + * In case of active CSG, Kbase will try to recover the + * lost event by ending previously active event and + * starting a new one. + * + * In case of inactive CSG, the event is drop as Kbase + * cannot recover. + */ + dev_err(kbdev->dev, + "Invalid activity state transition. (prev_act = %u, cur_act = %u)", + prev_act, cur_act); + if (cur_act) { + kbase_gpu_metrics_ctx_end_activity(kctx, ts); + kbase_gpu_metrics_ctx_start_activity(kctx, ts); + } + } else { + /* Normal handling */ + if (cur_act) + kbase_gpu_metrics_ctx_start_activity(kctx, ts); + else + kbase_gpu_metrics_ctx_end_activity(kctx, ts); + } + } else + break; + } + + kbase_gpu_metrics_emit_tracepoint(kbdev, ts >= ts_before_drain ? ts + 1 : ts_before_drain); +} +#endif /* CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD */ + +/** + * wait_for_dump_complete_on_group_deschedule() - Wait for dump on fault and + * scheduling tick/tock to complete before the group deschedule. + * + * @group: Pointer to the group that is being descheduled. + * + * This function blocks the descheduling of the group until the dump on fault is + * completed and scheduling tick/tock has completed. + * To deschedule an on slot group CSG termination request would be sent and that + * might time out if the fault had occurred and also potentially affect the state + * being dumped. Moreover the scheduler lock would be held, so the access to debugfs + * files would get blocked. + * Scheduler lock and 'kctx->csf.lock' are released before this function starts + * to wait. When a request sent by the Scheduler to the FW times out, Scheduler + * would also wait for the dumping to complete and release the Scheduler lock + * before the wait. Meanwhile Userspace can try to delete the group, this function + * would ensure that the group doesn't exit the Scheduler until scheduling + * tick/tock has completed. Though very unlikely, group deschedule can be triggered + * from multiple threads around the same time and after the wait Userspace thread + * can win the race and get the group descheduled and free the memory for group + * pointer before the other threads wake up and notice that group has already been + * descheduled. To avoid the freeing in such a case, a sort of refcount is used + * for the group which is incremented & decremented across the wait. 
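+ * (The group's deschedule_deferred_cnt, incremented and decremented around
+ * the wait in the body below, is the refcount referred to here.)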
+ */ +static +void wait_for_dump_complete_on_group_deschedule(struct kbase_queue_group *group) +{ +#if IS_ENABLED(CONFIG_DEBUG_FS) + struct kbase_device *kbdev = group->kctx->kbdev; + struct kbase_context *kctx = group->kctx; + struct kbase_csf_scheduler *scheduler = &kbdev->csf.scheduler; + + lockdep_assert_held(&kctx->csf.lock); + lockdep_assert_held(&scheduler->lock); + + if (likely(!kbase_debug_csf_fault_dump_enabled(kbdev))) + return; + + while ((!kbase_debug_csf_fault_dump_complete(kbdev) || + (scheduler->state == SCHED_BUSY)) && + queue_group_scheduled_locked(group)) { + group->deschedule_deferred_cnt++; + rt_mutex_unlock(&scheduler->lock); + rt_mutex_unlock(&kctx->csf.lock); + kbase_debug_csf_fault_wait_completion(kbdev); + rt_mutex_lock(&kctx->csf.lock); + rt_mutex_lock(&scheduler->lock); + group->deschedule_deferred_cnt--; + } +#endif +} + +/** + * schedule_actions_trigger_df() - Notify the client about the fault and + * wait for the dumping to complete. + * + * @kbdev: Pointer to the device + * @kctx: Pointer to the context associated with the CSG slot for which + * the timeout was seen. + * @error: Error code indicating the type of timeout that occurred. + * + * This function notifies the Userspace client waiting for the faults and wait + * for the Client to complete the dumping. + * The function is called only from Scheduling tick/tock when a request sent by + * the Scheduler to FW times out or from the protm event work item of the group + * when the protected mode entry request times out. + * In the latter case there is no wait done as scheduler lock would be released + * immediately. In the former case the function waits and releases the scheduler + * lock before the wait. It has been ensured that the Scheduler view of the groups + * won't change meanwhile, so no group can enter/exit the Scheduler, become + * runnable or go off slot. 
+ */ +static void schedule_actions_trigger_df(struct kbase_device *kbdev, + struct kbase_context *kctx, enum dumpfault_error_type error) +{ +#if IS_ENABLED(CONFIG_DEBUG_FS) + struct kbase_csf_scheduler *scheduler = &kbdev->csf.scheduler; + + lockdep_assert_held(&scheduler->lock); + + if (!kbase_debug_csf_fault_notify(kbdev, kctx, error)) + return; + + if (unlikely(scheduler->state != SCHED_BUSY)) { + WARN_ON(error != DF_PROTECTED_MODE_ENTRY_FAILURE); + return; + } + + rt_mutex_unlock(&scheduler->lock); + kbase_debug_csf_fault_wait_completion(kbdev); + rt_mutex_lock(&scheduler->lock); + WARN_ON(scheduler->state != SCHED_BUSY); +#endif +} + #ifdef KBASE_PM_RUNTIME /** * wait_for_scheduler_to_exit_sleep() - Wait for Scheduler to exit the @@ -143,12 +569,12 @@ static int wait_for_scheduler_to_exit_sleep(struct kbase_device *kbdev) remaining = kbase_csf_timeout_in_jiffies(sleep_exit_wait_time); while ((scheduler->state == SCHED_SLEEPING) && !ret) { - mutex_unlock(&scheduler->lock); + rt_mutex_unlock(&scheduler->lock); remaining = wait_event_timeout( kbdev->csf.event_wait, (scheduler->state != SCHED_SLEEPING), remaining); - mutex_lock(&scheduler->lock); + rt_mutex_lock(&scheduler->lock); if (!remaining && (scheduler->state == SCHED_SLEEPING)) ret = -ETIMEDOUT; } @@ -187,7 +613,8 @@ static int force_scheduler_to_exit_sleep(struct kbase_device *kbdev) goto out; } - if (suspend_active_groups_on_powerdown(kbdev, true)) + ret = suspend_active_groups_on_powerdown(kbdev, true); + if (ret) goto out; kbase_pm_lock(kbdev); @@ -206,6 +633,7 @@ static int force_scheduler_to_exit_sleep(struct kbase_device *kbdev) } scheduler->state = SCHED_SUSPENDED; + KBASE_KTRACE_ADD(kbdev, SCHED_SUSPENDED, NULL, scheduler->state); return 0; @@ -225,80 +653,20 @@ out: * * @timer: Pointer to the scheduling tick hrtimer * - * This function will enqueue the scheduling tick work item for immediate - * execution, if it has not been queued already. + * This function will wake up kbase_csf_scheduler_kthread() to process a + * pending scheduling tick. It will be restarted manually once a tick has been + * processed if appropriate. * * Return: enum value to indicate that timer should not be restarted. */ static enum hrtimer_restart tick_timer_callback(struct hrtimer *timer) { - struct kbase_device *kbdev = container_of(timer, struct kbase_device, - csf.scheduler.tick_timer); - - kbase_csf_scheduler_advance_tick(kbdev); - return HRTIMER_NORESTART; -} - -/** - * start_tick_timer() - Start the scheduling tick hrtimer. - * - * @kbdev: Pointer to the device - * - * This function will start the scheduling tick hrtimer and is supposed to - * be called only from the tick work item function. The tick hrtimer should - * not be active already. 
- */ -static void start_tick_timer(struct kbase_device *kbdev) -{ - struct kbase_csf_scheduler *const scheduler = &kbdev->csf.scheduler; - unsigned long flags; - - lockdep_assert_held(&scheduler->lock); - - spin_lock_irqsave(&scheduler->interrupt_lock, flags); - WARN_ON(scheduler->tick_timer_active); - if (likely(!work_pending(&scheduler->tick_work))) { - scheduler->tick_timer_active = true; - - hrtimer_start(&scheduler->tick_timer, - HR_TIMER_DELAY_MSEC(scheduler->csg_scheduling_period_ms), - HRTIMER_MODE_REL); - } - spin_unlock_irqrestore(&scheduler->interrupt_lock, flags); -} - -/** - * cancel_tick_timer() - Cancel the scheduling tick hrtimer - * - * @kbdev: Pointer to the device - */ -static void cancel_tick_timer(struct kbase_device *kbdev) -{ - struct kbase_csf_scheduler *const scheduler = &kbdev->csf.scheduler; - unsigned long flags; - - spin_lock_irqsave(&scheduler->interrupt_lock, flags); - scheduler->tick_timer_active = false; - spin_unlock_irqrestore(&scheduler->interrupt_lock, flags); - hrtimer_cancel(&scheduler->tick_timer); -} - -/** - * enqueue_tick_work() - Enqueue the scheduling tick work item - * - * @kbdev: Pointer to the device - * - * This function will queue the scheduling tick work item for immediate - * execution. This shall only be called when both the tick hrtimer and tick - * work item are not active/pending. - */ -static void enqueue_tick_work(struct kbase_device *kbdev) -{ - struct kbase_csf_scheduler *const scheduler = &kbdev->csf.scheduler; - - lockdep_assert_held(&scheduler->lock); + struct kbase_device *kbdev = + container_of(timer, struct kbase_device, csf.scheduler.tick_timer); kbase_csf_scheduler_invoke_tick(kbdev); + + return HRTIMER_NORESTART; } static void release_doorbell(struct kbase_device *kbdev, int doorbell_nr) @@ -398,14 +766,15 @@ static void scheduler_doorbell_init(struct kbase_device *kbdev) bitmap_zero(kbdev->csf.scheduler.doorbell_inuse_bitmap, CSF_NUM_DOORBELL); - mutex_lock(&kbdev->csf.scheduler.lock); + rt_mutex_lock(&kbdev->csf.scheduler.lock); /* Reserve doorbell 0 for use by kernel driver */ doorbell_nr = acquire_doorbell(kbdev); - mutex_unlock(&kbdev->csf.scheduler.lock); + rt_mutex_unlock(&kbdev->csf.scheduler.lock); WARN_ON(doorbell_nr != CSF_KERNEL_DOORBELL_NR); } +#ifndef CONFIG_MALI_HOST_CONTROLS_SC_RAILS /** * update_on_slot_queues_offsets - Update active queues' INSERT & EXTRACT ofs * @@ -441,48 +810,90 @@ static void update_on_slot_queues_offsets(struct kbase_device *kbdev) for (j = 0; j < max_streams; ++j) { struct kbase_queue *const queue = group->bound_queues[j]; - if (queue) { + if (queue && queue->user_io_addr) { u64 const *const output_addr = - (u64 const *)(queue->user_io_addr + PAGE_SIZE); + (u64 const *)(queue->user_io_addr + + PAGE_SIZE / sizeof(u64)); + /* + * This 64-bit read will be atomic on a 64-bit kernel but may not + * be atomic on 32-bit kernels. Support for 32-bit kernels is + * limited to build-only. 
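+ *
+ * The PAGE_SIZE / sizeof(u64) offset reflects that user_io_addr is a pointer
+ * to u64 in this version (see e.g. program_cs_extract_init() further down),
+ * so the second, output page of the user I/O region starts PAGE_SIZE bytes,
+ * i.e. PAGE_SIZE / sizeof(u64) elements, past it; with 4 KiB pages that is an
+ * offset of 512 u64 slots.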
+ */ queue->extract_ofs = output_addr[CS_EXTRACT_LO / sizeof(u64)]; } } } } +#endif -static void enqueue_gpu_idle_work(struct kbase_csf_scheduler *const scheduler) +static void enqueue_gpu_idle_work(struct kbase_csf_scheduler *const scheduler, + unsigned long delay) { +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + lockdep_assert_held(&scheduler->lock); + + scheduler->gpu_idle_work_pending = true; + mod_delayed_work(system_highpri_wq, &scheduler->gpu_idle_work, delay); +#else + CSTD_UNUSED(delay); atomic_set(&scheduler->gpu_no_longer_idle, false); queue_work(scheduler->idle_wq, &scheduler->gpu_idle_work); +#endif } -void kbase_csf_scheduler_process_gpu_idle_event(struct kbase_device *kbdev) +bool kbase_csf_scheduler_process_gpu_idle_event(struct kbase_device *kbdev) { struct kbase_csf_scheduler *scheduler = &kbdev->csf.scheduler; int non_idle_offslot_grps; bool can_suspend_on_idle; + bool ack_gpu_idle_event = true; + lockdep_assert_held(&kbdev->hwaccess_lock); lockdep_assert_held(&scheduler->interrupt_lock); non_idle_offslot_grps = atomic_read(&scheduler->non_idle_offslot_grps); can_suspend_on_idle = kbase_pm_idle_groups_sched_suspendable(kbdev); - KBASE_KTRACE_ADD(kbdev, SCHEDULER_CAN_IDLE, NULL, + KBASE_KTRACE_ADD(kbdev, SCHEDULER_GPU_IDLE_EVENT_CAN_SUSPEND, NULL, ((u64)(u32)non_idle_offslot_grps) | (((u64)can_suspend_on_idle) << 32)); if (!non_idle_offslot_grps) { +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + /* If FW is managing the cores then we need to turn off the + * the power rails. + */ + if (!kbase_pm_no_mcu_core_pwroff(kbdev)) { + queue_work(system_highpri_wq, + &scheduler->sc_rails_off_work); + ack_gpu_idle_event = false; + } +#else if (can_suspend_on_idle) { + /* fast_gpu_idle_handling is protected by the + * interrupt_lock, which would prevent this from being + * updated whilst gpu_idle_worker() is executing. + */ + scheduler->fast_gpu_idle_handling = + (kbdev->csf.gpu_idle_hysteresis_ns == 0) || + !kbase_csf_scheduler_all_csgs_idle(kbdev); + /* The GPU idle worker relies on update_on_slot_queues_offsets() to have * finished. It's queued before to reduce the time it takes till execution * but it'll eventually be blocked by the scheduler->interrupt_lock. 
*/ - enqueue_gpu_idle_work(scheduler); - update_on_slot_queues_offsets(kbdev); + enqueue_gpu_idle_work(scheduler, 0); + + /* The extract offsets are unused in fast GPU idle handling */ + if (!scheduler->fast_gpu_idle_handling) + update_on_slot_queues_offsets(kbdev); } +#endif } else { - /* Advance the scheduling tick to get the non-idle suspended groups loaded soon */ - kbase_csf_scheduler_advance_tick_nolock(kbdev); + /* Invoke the scheduling tick to get the non-idle suspended groups loaded soon */ + kbase_csf_scheduler_invoke_tick(kbdev); } + + return ack_gpu_idle_event; } u32 kbase_csf_scheduler_get_nr_active_csgs_locked(struct kbase_device *kbdev) @@ -551,6 +962,12 @@ static bool on_slot_group_idle_locked(struct kbase_queue_group *group) return (group->run_state == KBASE_CSF_GROUP_IDLE); } +static bool can_schedule_idle_group(struct kbase_queue_group *group) +{ + return (on_slot_group_idle_locked(group) || + (group->priority == KBASE_QUEUE_GROUP_PRIORITY_REALTIME)); +} + static bool queue_group_scheduled(struct kbase_queue_group *group) { return (group->run_state != KBASE_CSF_GROUP_INACTIVE && @@ -565,35 +982,52 @@ static bool queue_group_scheduled_locked(struct kbase_queue_group *group) return queue_group_scheduled(group); } +static void update_idle_protm_group_state_to_runnable(struct kbase_queue_group *group) +{ + lockdep_assert_held(&group->kctx->kbdev->csf.scheduler.lock); + + group->run_state = KBASE_CSF_GROUP_RUNNABLE; + KBASE_KTRACE_ADD_CSF_GRP(group->kctx->kbdev, CSF_GROUP_RUNNABLE, group, group->run_state); +} + /** - * scheduler_wait_protm_quit() - Wait for GPU to exit protected mode. + * scheduler_protm_wait_quit() - Wait for GPU to exit protected mode. * * @kbdev: Pointer to the GPU device * * This function waits for the GPU to exit protected mode which is confirmed * when active_protm_grp is set to NULL. + * + * Return: true on success, false otherwise. */ -static void scheduler_wait_protm_quit(struct kbase_device *kbdev) +static bool scheduler_protm_wait_quit(struct kbase_device *kbdev) { struct kbase_csf_scheduler *const scheduler = &kbdev->csf.scheduler; long wt = kbase_csf_timeout_in_jiffies(kbdev->csf.fw_timeout_ms); long remaining; + bool success = true; lockdep_assert_held(&scheduler->lock); - KBASE_KTRACE_ADD(kbdev, SCHEDULER_WAIT_PROTM_QUIT, NULL, - jiffies_to_msecs(wt)); + KBASE_KTRACE_ADD(kbdev, SCHEDULER_PROTM_WAIT_QUIT_START, NULL, jiffies_to_msecs(wt)); remaining = wait_event_timeout(kbdev->csf.event_wait, !kbase_csf_scheduler_protected_mode_in_use(kbdev), wt); - if (!remaining) + if (unlikely(!remaining)) { + struct kbase_queue_group *group = kbdev->csf.scheduler.active_protm_grp; + struct kbase_context *kctx = group ? group->kctx : NULL; + dev_warn(kbdev->dev, "[%llu] Timeout (%d ms), protm_quit wait skipped", kbase_backend_get_cycle_cnt(kbdev), kbdev->csf.fw_timeout_ms); + schedule_actions_trigger_df(kbdev, kctx, DF_PROTECTED_MODE_EXIT_TIMEOUT); + success = false; + } - KBASE_KTRACE_ADD(kbdev, SCHEDULER_WAIT_PROTM_QUIT_DONE, NULL, - jiffies_to_msecs(remaining)); + KBASE_KTRACE_ADD(kbdev, SCHEDULER_PROTM_WAIT_QUIT_END, NULL, jiffies_to_msecs(remaining)); + + return success; } /** @@ -603,31 +1037,39 @@ static void scheduler_wait_protm_quit(struct kbase_device *kbdev) * * This function sends a ping request to the firmware and waits for the GPU * to exit protected mode. + * + * If the GPU does not exit protected mode, it is considered as hang. + * A GPU reset would then be triggered. 
*/ static void scheduler_force_protm_exit(struct kbase_device *kbdev) { + unsigned long flags; + lockdep_assert_held(&kbdev->csf.scheduler.lock); kbase_csf_firmware_ping(kbdev); - scheduler_wait_protm_quit(kbdev); -} -/** - * scheduler_timer_is_enabled_nolock() - Check if the scheduler wakes up - * automatically for periodic tasks. - * - * @kbdev: Pointer to the device - * - * This is a variant of kbase_csf_scheduler_timer_is_enabled() that assumes the - * CSF scheduler lock to already have been held. - * - * Return: true if the scheduler is configured to wake up periodically - */ -static bool scheduler_timer_is_enabled_nolock(struct kbase_device *kbdev) -{ - lockdep_assert_held(&kbdev->csf.scheduler.lock); + if (scheduler_protm_wait_quit(kbdev)) + return; + + dev_err(kbdev->dev, "Possible GPU hang in Protected mode"); + + spin_lock_irqsave(&kbdev->csf.scheduler.interrupt_lock, flags); + if (kbdev->csf.scheduler.active_protm_grp) { + dev_err(kbdev->dev, + "Group-%d of context %d_%d ran in protected mode for too long on slot %d", + kbdev->csf.scheduler.active_protm_grp->handle, + kbdev->csf.scheduler.active_protm_grp->kctx->tgid, + kbdev->csf.scheduler.active_protm_grp->kctx->id, + kbdev->csf.scheduler.active_protm_grp->csg_nr); + } + spin_unlock_irqrestore(&kbdev->csf.scheduler.interrupt_lock, flags); - return kbdev->csf.scheduler.timer_enabled; + /* The GPU could be stuck in Protected mode. To prevent a hang, + * a GPU reset is performed. + */ + if (kbase_prepare_to_reset_gpu(kbdev, RESET_FLAGS_NONE)) + kbase_reset_gpu(kbdev); } /** @@ -682,7 +1124,8 @@ static int scheduler_pm_active_handle_suspend(struct kbase_device *kbdev, * Scheduler * * @kbdev: Pointer to the device - * @flags: flags containing previous interrupt state + * @flags: Pointer to the flags variable containing the interrupt state + * when hwaccess lock was acquired. * * This function is called when Scheduler needs to be activated from the * sleeping state. @@ -690,14 +1133,14 @@ static int scheduler_pm_active_handle_suspend(struct kbase_device *kbdev, * MCU is initiated. It resets the flag that indicates to the MCU state * machine that MCU needs to be put in sleep state. * - * Note: This function shall be called with hwaccess lock held and it will - * release that lock. + * Note: This function shall be called with hwaccess lock held and it may + * release that lock and reacquire it. * * Return: zero when the PM reference was taken and non-zero when the * system is being suspending/suspended. 
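 *
 * @flags is passed by pointer because the hwaccess lock may be dropped and
 * re-acquired inside this function; the caller's later
 * spin_unlock_irqrestore() then has to use the IRQ state saved by that
 * re-acquisition rather than its original value.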
*/ static int scheduler_pm_active_after_sleep(struct kbase_device *kbdev, - unsigned long flags) + unsigned long *flags) { u32 prev_count; int ret = 0; @@ -708,20 +1151,20 @@ static int scheduler_pm_active_after_sleep(struct kbase_device *kbdev, prev_count = kbdev->csf.scheduler.pm_active_count; if (!WARN_ON(prev_count == U32_MAX)) kbdev->csf.scheduler.pm_active_count++; - spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); /* On 0 => 1, make a pm_ctx_active request */ if (!prev_count) { + spin_unlock_irqrestore(&kbdev->hwaccess_lock, *flags); + ret = kbase_pm_context_active_handle_suspend(kbdev, KBASE_PM_SUSPEND_HANDLER_DONT_REACTIVATE); - spin_lock_irqsave(&kbdev->hwaccess_lock, flags); + spin_lock_irqsave(&kbdev->hwaccess_lock, *flags); if (ret) kbdev->csf.scheduler.pm_active_count--; else kbdev->pm.backend.gpu_sleep_mode_active = false; kbase_pm_update_state(kbdev); - spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); } return ret; @@ -801,6 +1244,71 @@ static void scheduler_pm_idle_before_sleep(struct kbase_device *kbdev) } #endif +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS +static void enable_gpu_idle_fw_timer(struct kbase_device *kbdev) +{ + struct kbase_csf_scheduler *const scheduler = &kbdev->csf.scheduler; + unsigned long flags; + + lockdep_assert_held(&scheduler->lock); + + spin_lock_irqsave(&scheduler->interrupt_lock, flags); + if (!scheduler->gpu_idle_fw_timer_enabled) { + kbase_csf_firmware_enable_gpu_idle_timer(kbdev); + scheduler->gpu_idle_fw_timer_enabled = true; + } + spin_unlock_irqrestore(&scheduler->interrupt_lock, flags); +} + +static void disable_gpu_idle_fw_timer(struct kbase_device *kbdev) +{ + struct kbase_csf_scheduler *const scheduler = &kbdev->csf.scheduler; + unsigned long flags; + + lockdep_assert_held(&scheduler->lock); + + spin_lock_irqsave(&scheduler->interrupt_lock, flags); + if (scheduler->gpu_idle_fw_timer_enabled) { + kbase_csf_firmware_disable_gpu_idle_timer(kbdev); + scheduler->gpu_idle_fw_timer_enabled = false; + } + spin_unlock_irqrestore(&scheduler->interrupt_lock, flags); +} + +/** + * update_gpu_idle_timer_on_scheduler_wakeup() - Update the GPU idle state + * reporting as per the power policy in use. + * + * @kbdev: Pointer to the device + * + * This function disables the GPU idle state reporting in FW if as per the + * power policy the power management of shader cores needs to be done by the + * Host. This prevents the needless disabling of User submissions in FW on + * reporting the GPU idle event to Host if power rail for shader cores is + * controlled by the Host. + * Scheduler is suspended when switching and out of such power policy, so on + * the wakeup of Scheduler can enable or disable the GPU idle state reporting. 
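+ * In other words, the power policy can only change while the Scheduler is
+ * suspended, so re-evaluating the idle timer state once here, on wakeup, is
+ * sufficient.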
+ */ +static void update_gpu_idle_timer_on_scheduler_wakeup(struct kbase_device *kbdev) +{ + struct kbase_csf_scheduler *const scheduler = &kbdev->csf.scheduler; + unsigned long flags; + + lockdep_assert_held(&scheduler->lock); + + WARN_ON(scheduler->state != SCHED_SUSPENDED); + + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); + if (kbase_pm_no_mcu_core_pwroff(kbdev)) + disable_gpu_idle_fw_timer(kbdev); + else + enable_gpu_idle_fw_timer(kbdev); + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + + return; +} +#endif + static void scheduler_wakeup(struct kbase_device *kbdev, bool kick) { struct kbase_csf_scheduler *const scheduler = &kbdev->csf.scheduler; @@ -825,8 +1333,8 @@ static void scheduler_wakeup(struct kbase_device *kbdev, bool kick) "Re-activating the Scheduler out of sleep"); spin_lock_irqsave(&kbdev->hwaccess_lock, flags); - ret = scheduler_pm_active_after_sleep(kbdev, flags); - /* hwaccess_lock is released in the previous function call. */ + ret = scheduler_pm_active_after_sleep(kbdev, &flags); + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); #endif } @@ -839,7 +1347,12 @@ static void scheduler_wakeup(struct kbase_device *kbdev, bool kick) return; } +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + update_gpu_idle_timer_on_scheduler_wakeup(kbdev); +#endif + scheduler->state = SCHED_INACTIVE; + KBASE_KTRACE_ADD(kbdev, SCHED_INACTIVE, NULL, scheduler->state); if (kick) scheduler_enable_tick_timer_nolock(kbdev); @@ -855,6 +1368,7 @@ static void scheduler_suspend(struct kbase_device *kbdev) dev_dbg(kbdev->dev, "Suspending the Scheduler"); scheduler_pm_idle(kbdev); scheduler->state = SCHED_SUSPENDED; + KBASE_KTRACE_ADD(kbdev, SCHED_SUSPENDED, NULL, scheduler->state); } } @@ -885,8 +1399,10 @@ static void update_idle_suspended_group_state(struct kbase_queue_group *group) KBASE_CSF_GROUP_SUSPENDED); } else if (group->run_state == KBASE_CSF_GROUP_SUSPENDED_ON_IDLE) { group->run_state = KBASE_CSF_GROUP_SUSPENDED; + KBASE_KTRACE_ADD_CSF_GRP(group->kctx->kbdev, CSF_GROUP_SUSPENDED, group, + group->run_state); - /* If scheduler is not suspended and the given group's + /* If scheduler is not suspended and the given group's * static priority (reflected by the scan_seq_num) is inside * the current tick slot-range, or there are some on_slot * idle groups, schedule an async tock. 
@@ -916,8 +1432,8 @@ static void update_idle_suspended_group_state(struct kbase_queue_group *group) return; new_val = atomic_inc_return(&scheduler->non_idle_offslot_grps); - KBASE_KTRACE_ADD_CSF_GRP(group->kctx->kbdev, SCHEDULER_NONIDLE_OFFSLOT_INC, - group, new_val); + KBASE_KTRACE_ADD_CSF_GRP(group->kctx->kbdev, SCHEDULER_NONIDLE_OFFSLOT_GRP_INC, group, + new_val); } int kbase_csf_scheduler_group_get_slot_locked(struct kbase_queue_group *group) @@ -1009,6 +1525,7 @@ static int halt_stream_sync(struct kbase_queue *queue) struct kbase_csf_cmd_stream_info *stream; int csi_index = queue->csi_index; long remaining = kbase_csf_timeout_in_jiffies(kbdev->csf.fw_timeout_ms); + unsigned long flags; if (WARN_ON(!group) || WARN_ON(!kbasep_csf_scheduler_group_is_on_slot_locked(group))) @@ -1026,6 +1543,11 @@ static int halt_stream_sync(struct kbase_queue *queue) == CS_ACK_STATE_START), remaining); if (!remaining) { + const struct gpu_uevent evt = { + .type = GPU_UEVENT_TYPE_KMD_ERROR, + .info = GPU_UEVENT_INFO_QUEUE_START + }; + pixel_gpu_uevent_send(kbdev, &evt); dev_warn(kbdev->dev, "[%llu] Timeout (%d ms) waiting for queue to start on csi %d bound to group %d on slot %d", kbase_backend_get_cycle_cnt(kbdev), kbdev->csf.fw_timeout_ms, csi_index, group->handle, group->csg_nr); @@ -1040,12 +1562,15 @@ static int halt_stream_sync(struct kbase_queue *queue) kbase_csf_timeout_in_jiffies(kbdev->csf.fw_timeout_ms); } + spin_lock_irqsave(&kbdev->csf.scheduler.interrupt_lock, flags); /* Set state to STOP */ kbase_csf_firmware_cs_input_mask(stream, CS_REQ, CS_REQ_STATE_STOP, CS_REQ_STATE_MASK); - KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, CSI_STOP_REQUESTED, group, queue, 0u); kbase_csf_ring_cs_kernel_doorbell(kbdev, csi_index, group->csg_nr, true); + spin_unlock_irqrestore(&kbdev->csf.scheduler.interrupt_lock, flags); + + KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, CSI_STOP_REQ, group, queue, 0u); /* Timed wait */ remaining = wait_event_timeout(kbdev->csf.event_wait, @@ -1053,6 +1578,11 @@ static int halt_stream_sync(struct kbase_queue *queue) == CS_ACK_STATE_STOP), remaining); if (!remaining) { + const struct gpu_uevent evt = { + .type = GPU_UEVENT_TYPE_KMD_ERROR, + .info = GPU_UEVENT_INFO_QUEUE_STOP + }; + pixel_gpu_uevent_send(kbdev, &evt); dev_warn(kbdev->dev, "[%llu] Timeout (%d ms) waiting for queue to stop on csi %d bound to group %d on slot %d", kbase_backend_get_cycle_cnt(kbdev), kbdev->csf.fw_timeout_ms, queue->csi_index, group->handle, group->csg_nr); @@ -1117,8 +1647,7 @@ static int sched_halt_stream(struct kbase_queue *queue) long remaining; int slot; int err = 0; - const u32 group_schedule_timeout = - 20 * kbdev->csf.scheduler.csg_scheduling_period_ms; + const u32 group_schedule_timeout = kbase_get_timeout_ms(kbdev, CSF_CSG_SUSPEND_TIMEOUT); if (WARN_ON(!group)) return -EINVAL; @@ -1141,7 +1670,7 @@ retry: /* Update the group state so that it can get scheduled soon */ update_idle_suspended_group_state(group); - mutex_unlock(&scheduler->lock); + rt_mutex_unlock(&scheduler->lock); /* This function is called when the queue group is either not on a CSG * slot or is on the slot but undergoing transition. 
@@ -1164,7 +1693,7 @@ retry:
 			kbdev->csf.event_wait,
 			can_halt_stream(kbdev, group),
 			kbase_csf_timeout_in_jiffies(group_schedule_timeout));
-		mutex_lock(&scheduler->lock);
+		rt_mutex_lock(&scheduler->lock);
 
 		if (remaining && queue_group_scheduled_locked(group)) {
 			slot = kbase_csf_scheduler_group_get_slot(group);
@@ -1227,6 +1756,11 @@ retry:
 				kbase_csf_timeout_in_jiffies(kbdev->csf.fw_timeout_ms));
 
 			if (!remaining) {
+				const struct gpu_uevent evt = {
+					.type = GPU_UEVENT_TYPE_KMD_ERROR,
+					.info = GPU_UEVENT_INFO_QUEUE_STOP_ACK
+				};
+				pixel_gpu_uevent_send(kbdev, &evt);
 				dev_warn(kbdev->dev, "[%llu] Timeout (%d ms) waiting for queue stop ack on csi %d bound to group %d on slot %d",
 					 kbase_backend_get_cycle_cnt(kbdev), kbdev->csf.fw_timeout_ms,
 					 queue->csi_index, group->handle, group->csg_nr);
@@ -1292,7 +1826,7 @@ int kbase_csf_scheduler_queue_stop(struct kbase_queue *queue)
 	kbase_reset_gpu_assert_failed_or_prevented(kbdev);
 	lockdep_assert_held(&queue->kctx->csf.lock);
-	mutex_lock(&kbdev->csf.scheduler.lock);
+	rt_mutex_lock(&kbdev->csf.scheduler.lock);
 
 	queue->enabled = false;
 	KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, CSI_STOP, group, queue, cs_enabled);
@@ -1314,9 +1848,11 @@ int kbase_csf_scheduler_queue_stop(struct kbase_queue *queue)
 			err = sched_halt_stream(queue);
 
 		unassign_user_doorbell_from_queue(kbdev, queue);
+		kbase_csf_mcu_shared_drop_stopped_queue(kbdev, queue);
 	}
 
-	mutex_unlock(&kbdev->csf.scheduler.lock);
+	rt_mutex_unlock(&kbdev->csf.scheduler.lock);
+	KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, QUEUE_STOP, group, queue, group->run_state);
 	return err;
 }
 
@@ -1324,9 +1860,9 @@ static void update_hw_active(struct kbase_queue *queue, bool active)
 {
 #if IS_ENABLED(CONFIG_MALI_NO_MALI)
 	if (queue && queue->enabled) {
-		u32 *output_addr = (u32 *)(queue->user_io_addr + PAGE_SIZE);
+		u64 *output_addr = queue->user_io_addr + PAGE_SIZE / sizeof(u64);
 
-		output_addr[CS_ACTIVE / sizeof(u32)] = active;
+		output_addr[CS_ACTIVE / sizeof(*output_addr)] = active;
 	}
 #else
 	CSTD_UNUSED(queue);
@@ -1336,11 +1872,16 @@ static void update_hw_active(struct kbase_queue *queue, bool active)
 
 static void program_cs_extract_init(struct kbase_queue *queue)
 {
-	u64 *input_addr = (u64 *)queue->user_io_addr;
-	u64 *output_addr = (u64 *)(queue->user_io_addr + PAGE_SIZE);
+	u64 *input_addr = queue->user_io_addr;
+	u64 *output_addr = queue->user_io_addr + PAGE_SIZE / sizeof(u64);
 
-	input_addr[CS_EXTRACT_INIT_LO / sizeof(u64)] =
-		output_addr[CS_EXTRACT_LO / sizeof(u64)];
+	/*
+	 * These 64-bit reads and writes will be atomic on a 64-bit kernel but may
+	 * not be atomic on 32-bit kernels. Support for 32-bit kernels is limited to
+	 * build-only.
+	 */
+	input_addr[CS_EXTRACT_INIT_LO / sizeof(*input_addr)] =
+		output_addr[CS_EXTRACT_LO / sizeof(*output_addr)];
 }
 
 static void program_cs_trace_cfg(struct kbase_csf_cmd_stream_info *stream,
@@ -1394,6 +1935,7 @@ static void program_cs(struct kbase_device *kbdev,
 	struct kbase_csf_cmd_stream_group_info *ginfo;
 	struct kbase_csf_cmd_stream_info *stream;
 	int csi_index = queue->csi_index;
+	unsigned long flags;
 	u64 user_input;
 	u64 user_output;
 
@@ -1411,11 +1953,13 @@ static void program_cs(struct kbase_device *kbdev,
 	    WARN_ON(csi_index >= ginfo->stream_num))
 		return;
 
-	assign_user_doorbell_to_queue(kbdev, queue);
-	if (queue->doorbell_nr == KBASEP_USER_DB_NR_INVALID)
-		return;
+	if (queue->enabled) {
+		assign_user_doorbell_to_queue(kbdev, queue);
+		if (queue->doorbell_nr == KBASEP_USER_DB_NR_INVALID)
+			return;
 
-	WARN_ON(queue->doorbell_nr != queue->group->doorbell_nr);
+		WARN_ON(queue->doorbell_nr != queue->group->doorbell_nr);
+	}
 
 	if (queue->enabled && queue_group_suspended_locked(group))
 		program_cs_extract_init(queue);
@@ -1429,17 +1973,15 @@ static void program_cs(struct kbase_device *kbdev,
 	kbase_csf_firmware_cs_input(stream, CS_SIZE, queue->size);
 
-	user_input = (queue->reg->start_pfn << PAGE_SHIFT);
-	kbase_csf_firmware_cs_input(stream, CS_USER_INPUT_LO,
-				    user_input & 0xFFFFFFFF);
-	kbase_csf_firmware_cs_input(stream, CS_USER_INPUT_HI,
-				    user_input >> 32);
+	user_input = queue->user_io_gpu_va;
+	WARN_ONCE(!user_input && queue->enabled, "Enabled queue should have a valid gpu_va");
 
-	user_output = ((queue->reg->start_pfn + 1) << PAGE_SHIFT);
-	kbase_csf_firmware_cs_input(stream, CS_USER_OUTPUT_LO,
-				    user_output & 0xFFFFFFFF);
-	kbase_csf_firmware_cs_input(stream, CS_USER_OUTPUT_HI,
-				    user_output >> 32);
+	kbase_csf_firmware_cs_input(stream, CS_USER_INPUT_LO, user_input & 0xFFFFFFFF);
+	kbase_csf_firmware_cs_input(stream, CS_USER_INPUT_HI, user_input >> 32);
+
+	user_output = user_input + PAGE_SIZE;
+	kbase_csf_firmware_cs_input(stream, CS_USER_OUTPUT_LO, user_output & 0xFFFFFFFF);
+	kbase_csf_firmware_cs_input(stream, CS_USER_OUTPUT_HI, user_output >> 32);
 
 	kbase_csf_firmware_cs_input(stream, CS_CONFIG,
 		(queue->doorbell_nr << 8) | (queue->priority & 0xF));
@@ -1450,27 +1992,104 @@ static void program_cs(struct kbase_device *kbdev,
 	/* Enable all interrupts for now */
 	kbase_csf_firmware_cs_input(stream, CS_ACK_IRQ_MASK, ~((u32)0));
 
+	spin_lock_irqsave(&kbdev->csf.scheduler.interrupt_lock, flags);
+
+	/* The fault bit could be misaligned between CS_REQ and CS_ACK if the
+	 * acknowledgment was deferred due to dump on fault and the group was
+	 * removed from the CSG slot before the fault could be acknowledged.
+	 */
+	if (queue->enabled) {
+		u32 const cs_ack =
+			kbase_csf_firmware_cs_output(stream, CS_ACK);
+
+		kbase_csf_firmware_cs_input_mask(stream, CS_REQ, cs_ack,
+						 CS_REQ_FAULT_MASK);
+	}
+
 	/*
 	 * Enable the CSG idle notification once the CS's ringbuffer
 	 * becomes empty or the CS becomes sync_idle, waiting sync update
	 * or protected mode switch.
	 */
 	kbase_csf_firmware_cs_input_mask(stream, CS_REQ,
-			CS_REQ_IDLE_EMPTY_MASK | CS_REQ_IDLE_SYNC_WAIT_MASK,
-			CS_REQ_IDLE_EMPTY_MASK | CS_REQ_IDLE_SYNC_WAIT_MASK);
+			CS_REQ_IDLE_EMPTY_MASK | CS_REQ_IDLE_SYNC_WAIT_MASK |
+			CS_REQ_IDLE_SHARED_SB_DEC_MASK,
+			CS_REQ_IDLE_EMPTY_MASK | CS_REQ_IDLE_SYNC_WAIT_MASK |
+			CS_REQ_IDLE_SHARED_SB_DEC_MASK);
 
 	/* Set state to START/STOP */
 	kbase_csf_firmware_cs_input_mask(stream, CS_REQ, queue->enabled ?
 		CS_REQ_STATE_START : CS_REQ_STATE_STOP, CS_REQ_STATE_MASK);
+	kbase_csf_ring_cs_kernel_doorbell(kbdev, csi_index, group->csg_nr,
+					  ring_csg_doorbell);
+	spin_unlock_irqrestore(&kbdev->csf.scheduler.interrupt_lock, flags);
 
 	KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, CSI_START, group, queue, queue->enabled);
 
-	kbase_csf_ring_cs_kernel_doorbell(kbdev, csi_index, group->csg_nr,
-					  ring_csg_doorbell);
 	update_hw_active(queue, true);
 }
 
+#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS
+static void start_stream_sync(struct kbase_queue *queue)
+{
+	struct kbase_queue_group *group = queue->group;
+	struct kbase_device *kbdev = queue->kctx->kbdev;
+	struct kbase_csf_global_iface *global_iface = &kbdev->csf.global_iface;
+	struct kbase_csf_cmd_stream_group_info *ginfo;
+	struct kbase_csf_cmd_stream_info *stream;
+	int csi_index = queue->csi_index;
+	long remaining = kbase_csf_timeout_in_jiffies(kbdev->csf.fw_timeout_ms);
+
+	lockdep_assert_held(&kbdev->csf.scheduler.lock);
+
+	if (WARN_ON(!group) ||
+	    WARN_ON(!kbasep_csf_scheduler_group_is_on_slot_locked(group)))
+		return;
+
+	ginfo = &global_iface->groups[group->csg_nr];
+	stream = &ginfo->streams[csi_index];
+
+	program_cs(kbdev, queue, true);
+
+	/* Timed wait */
+	remaining = wait_event_timeout(kbdev->csf.event_wait,
+		(CS_ACK_STATE_GET(kbase_csf_firmware_cs_output(stream, CS_ACK))
+		 == CS_ACK_STATE_START), remaining);
+
+	if (!remaining) {
+		const struct gpu_uevent evt = {
+			.type = GPU_UEVENT_TYPE_KMD_ERROR,
+			.info = GPU_UEVENT_INFO_QUEUE_START
+		};
+		pixel_gpu_uevent_send(kbdev, &evt);
+		dev_warn(kbdev->dev, "[%llu] Timeout (%d ms) waiting for queue to start on csi %d bound to group %d on slot %d",
+			 kbase_backend_get_cycle_cnt(kbdev), kbdev->csf.fw_timeout_ms,
+			 csi_index, group->handle, group->csg_nr);
+
+		/* TODO GPUCORE-25328: The CSG can't be terminated, the GPU
		 * will be reset as a work-around.
+		 */
+		if (kbase_prepare_to_reset_gpu(kbdev, RESET_FLAGS_NONE))
+			kbase_reset_gpu(kbdev);
+	}
+}
+#endif
+
+static int onslot_csg_add_new_queue(struct kbase_queue *queue)
+{
+	struct kbase_device *kbdev = queue->kctx->kbdev;
+	int err;
+
+	lockdep_assert_held(&kbdev->csf.scheduler.lock);
+
+	err = kbase_csf_mcu_shared_add_queue(kbdev, queue);
+	if (!err)
+		program_cs(kbdev, queue, true);
+
+	return err;
+}
+
 int kbase_csf_scheduler_queue_start(struct kbase_queue *queue)
 {
 	struct kbase_queue_group *group = queue->group;
@@ -1482,15 +2101,22 @@ int kbase_csf_scheduler_queue_start(struct kbase_queue *queue)
 	kbase_reset_gpu_assert_prevented(kbdev);
 	lockdep_assert_held(&queue->kctx->csf.lock);
 
-	if (WARN_ON(!group || queue->bind_state != KBASE_CSF_QUEUE_BOUND))
+	if (WARN_ON_ONCE(!group || queue->bind_state != KBASE_CSF_QUEUE_BOUND))
 		return -EINVAL;
 
-	mutex_lock(&kbdev->csf.scheduler.lock);
+	rt_mutex_lock(&kbdev->csf.scheduler.lock);
+
+#if IS_ENABLED(CONFIG_DEBUG_FS)
+	if (unlikely(kbdev->csf.scheduler.state == SCHED_BUSY)) {
+		rt_mutex_unlock(&kbdev->csf.scheduler.lock);
+		return -EBUSY;
+	}
+#endif
 
 	KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, QUEUE_START, group, queue, group->run_state);
-	KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, QUEUE_SYNC_STATUS_WAIT, queue->group,
-				   queue, queue->status_wait);
+	KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, QUEUE_SYNC_UPDATE_WAIT_STATUS, queue->group, queue,
+				   queue->status_wait);
 
 	if (group->run_state == KBASE_CSF_GROUP_FAULT_EVICTED) {
 		err = -EIO;
@@ -1504,8 +2130,34 @@ int kbase_csf_scheduler_queue_start(struct kbase_queue *queue)
 	if (!err) {
 		queue->enabled = true;
+#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS
+		/* If the kicked GPU queue can make progress, then only
		 * need to abort the GPU power down.
		 */
+		if (!queue_empty_or_blocked(queue))
+			cancel_gpu_idle_work(kbdev);
+#endif
 		if (kbasep_csf_scheduler_group_is_on_slot_locked(group)) {
+#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS
+			/* The shader core power rails need to be turned
			 * on before FW resumes the execution on HW and
			 * that would happen when the CSI kernel doorbell
			 * is rung from the following code.
			 */
+			turn_on_sc_power_rails(kbdev);
+#endif
 			if (cs_enabled) {
+#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS
+				spin_lock_irqsave(&kbdev->csf.scheduler.interrupt_lock,
+						  flags);
+				kbase_csf_ring_cs_kernel_doorbell(kbdev,
+					queue->csi_index, group->csg_nr,
+					true);
+				spin_unlock_irqrestore(
+					&kbdev->csf.scheduler.interrupt_lock, flags);
+			} else {
+				start_stream_sync(queue);
+#else
 				/* In normal situation, when a queue is
				 * already running, the queue update
				 * would be a doorbell kick on user
@@ -1519,16 +2171,37 @@ int kbase_csf_scheduler_queue_start(struct kbase_queue *queue)
				 * user door-bell on such a case.
				 */
 				kbase_csf_ring_cs_user_doorbell(kbdev, queue);
-			} else
-				program_cs(kbdev, queue, true);
+			} else {
+				err = onslot_csg_add_new_queue(queue);
+				/* For an on slot CSG, the only error in adding a new
+				 * queue to run is that the scheduler could not map
+				 * the required userio pages, likely due to some
+				 * resource constraints. In such a case, and if the
+				 * group has yet to enter its fatal error state, we
+				 * return -EBUSY to the submitter for another kick.
+				 * The queue itself has yet to be programmed and hence
+				 * needs to remain in its previous (disabled) state.
+				 * If the error persists, the group will eventually
+				 * report a fatal error via the group's error reporting
+				 * mechanism, when the MCU shared region map retry
+				 * limit of the group is exceeded. For such a case,
+				 * the expected error value is -EIO.
+				 */
+				if (unlikely(err)) {
+					queue->enabled = cs_enabled;
+					rt_mutex_unlock(&kbdev->csf.scheduler.lock);
+					return (err != -EIO) ? -EBUSY : err;
+				}
+#endif
+			}
 		}
-		queue_delayed_work(system_long_wq,
-				   &kbdev->csf.scheduler.ping_work,
-				   msecs_to_jiffies(FIRMWARE_PING_INTERVAL_MS));
+		queue_delayed_work(system_long_wq, &kbdev->csf.scheduler.ping_work,
+				   msecs_to_jiffies(kbase_get_timeout_ms(
					   kbdev, CSF_FIRMWARE_PING_TIMEOUT)));
 	}
 
-	mutex_unlock(&kbdev->csf.scheduler.lock);
+	rt_mutex_unlock(&kbdev->csf.scheduler.lock);
 
 	if (evicted)
 		kbase_csf_term_descheduled_queue_group(group);
@@ -1559,7 +2232,8 @@ static enum kbase_csf_csg_slot_state update_csg_slot_status(
 		slot_state = CSG_SLOT_RUNNING;
 		atomic_set(&csg_slot->state, slot_state);
 		csg_slot->trigger_jiffies = jiffies;
-		KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSG_SLOT_STARTED, csg_slot->resident_group, state);
+		KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSG_SLOT_RUNNING, csg_slot->resident_group,
+					 state);
 		dev_dbg(kbdev->dev, "Group %u running on slot %d\n",
			csg_slot->resident_group->handle, slot);
 	}
@@ -1649,17 +2323,24 @@ static void halt_csg_slot(struct kbase_queue_group *group, bool suspend)
 		dev_dbg(kbdev->dev, "slot %d wait for up-running\n", slot);
 		remaining = wait_event_timeout(kbdev->csf.event_wait,
				csg_slot_running(kbdev, slot), remaining);
-		if (!remaining)
+		if (!remaining) {
+			const struct gpu_uevent evt = {
+				.type = GPU_UEVENT_TYPE_KMD_ERROR,
+				.info = GPU_UEVENT_INFO_CSG_SLOT_READY
+			};
+			pixel_gpu_uevent_send(kbdev, &evt);
 			dev_warn(kbdev->dev,
				 "[%llu] slot %d timeout (%d ms) on up-running\n",
				 kbase_backend_get_cycle_cnt(kbdev),
				 slot, kbdev->csf.fw_timeout_ms);
+		}
 	}
 
 	if (csg_slot_running(kbdev, slot)) {
 		unsigned long flags;
 		struct kbase_csf_cmd_stream_group_info *ginfo =
			&global_iface->groups[slot];
+
 		u32 halt_cmd = suspend ? CSG_REQ_STATE_SUSPEND :
					 CSG_REQ_STATE_TERMINATE;
@@ -1670,15 +2351,15 @@ static void halt_csg_slot(struct kbase_queue_group *group, bool suspend)
 		/* Set state to SUSPEND/TERMINATE */
 		kbase_csf_firmware_csg_input_mask(ginfo, CSG_REQ, halt_cmd,
						  CSG_REQ_STATE_MASK);
+		kbase_csf_ring_csg_doorbell(kbdev, slot);
 		spin_unlock_irqrestore(&kbdev->csf.scheduler.interrupt_lock,
				       flags);
 		atomic_set(&csg_slot[slot].state, CSG_SLOT_DOWN2STOP);
 		csg_slot[slot].trigger_jiffies = jiffies;
-		KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSG_SLOT_STOP, group, halt_cmd);
+		KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSG_SLOT_STOP_REQ, group, halt_cmd);
 
-		KBASE_TLSTREAM_TL_KBASE_DEVICE_HALT_CSG(
-			kbdev, kbdev->gpu_props.props.raw_props.gpu_id, slot);
-		kbase_csf_ring_csg_doorbell(kbdev, slot);
+		KBASE_TLSTREAM_TL_KBASE_DEVICE_HALTING_CSG(
+			kbdev, kbdev->gpu_props.props.raw_props.gpu_id, slot, suspend);
 	}
 }
 
@@ -1692,6 +2373,31 @@ static void suspend_csg_slot(struct kbase_queue_group *group)
 	halt_csg_slot(group, true);
 }
 
+static bool csf_wait_ge_condition_supported(struct kbase_device *kbdev)
+{
+	const uint32_t glb_major = GLB_VERSION_MAJOR_GET(kbdev->csf.global_iface.version);
+	const uint32_t glb_minor = GLB_VERSION_MINOR_GET(kbdev->csf.global_iface.version);
+
+	switch (glb_major) {
+	case 0:
+		break;
+	case 1:
+		if (glb_minor >= 4)
+			return true;
+		break;
+	case 2:
+		if (glb_minor >= 6)
+			return true;
+		break;
+	case 3:
+		if (glb_minor >= 6)
+			return true;
+		break;
+	default:
+		return true;
+	}
+	return false;
+}
 /**
 * evaluate_sync_update() - Evaluate the sync wait condition the GPU command
 * queue has been blocked on.
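Editor's note on csf_wait_ge_condition_supported() above: the new GE sync-wait condition is only honoured on GLB interface versions 1.4+, 2.6+, 3.6+, or any newer major version. A standalone sketch of that gate and the GT/LE/GE evaluation it feeds is given below; it is not driver code, and the enum/function names in the sketch are hypothetical stand-ins for the CS_STATUS_WAIT_SYNC_WAIT_CONDITION_* handling in evaluate_sync_update().

/* Standalone sketch, compile with: cc -O2 ge_gate.c */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

enum wait_cond { WAIT_GT, WAIT_LE, WAIT_GE };

/* Mirrors the (major, minor) table in csf_wait_ge_condition_supported(). */
static bool ge_condition_supported(uint32_t major, uint32_t minor)
{
	switch (major) {
	case 0:  return false;
	case 1:  return minor >= 4;
	case 2:
	case 3:  return minor >= 6;
	default: return true;	/* any newer interface supports GE */
	}
}

static bool sync_wait_satisfied(enum wait_cond cond, uint32_t current_val,
				uint32_t wait_val, uint32_t major, uint32_t minor)
{
	switch (cond) {
	case WAIT_GT: return current_val > wait_val;
	case WAIT_LE: return current_val <= wait_val;
	case WAIT_GE: return ge_condition_supported(major, minor) &&
			     current_val >= wait_val;
	}
	return false;
}

int main(void)
{
	/* GE on a 1.3 interface is not honoured; on 1.4 it is. */
	printf("%d %d\n",
	       sync_wait_satisfied(WAIT_GE, 5, 5, 1, 3),
	       sync_wait_satisfied(WAIT_GE, 5, 5, 1, 4));
	return 0;
}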
@@ -1705,23 +2411,38 @@ static bool evaluate_sync_update(struct kbase_queue *queue)
 	struct kbase_vmap_struct *mapping;
 	bool updated = false;
 	u32 *sync_ptr;
+	u32 sync_wait_size;
+	u32 sync_wait_align_mask;
 	u32 sync_wait_cond;
 	u32 sync_current_val;
 	struct kbase_device *kbdev;
+	bool sync_wait_align_valid = false;
+	bool sync_wait_cond_valid = false;
 
 	if (WARN_ON(!queue))
 		return false;
 
 	kbdev = queue->kctx->kbdev;
+	lockdep_assert_held(&kbdev->csf.scheduler.lock);
+
+	sync_wait_size = CS_STATUS_WAIT_SYNC_WAIT_SIZE_GET(queue->status_wait);
+	sync_wait_align_mask =
+		(sync_wait_size == 0 ? BASEP_EVENT32_ALIGN_BYTES : BASEP_EVENT64_ALIGN_BYTES) - 1;
+	sync_wait_align_valid = ((uintptr_t)queue->sync_ptr & sync_wait_align_mask) == 0;
+	if (!sync_wait_align_valid) {
+		dev_dbg(queue->kctx->kbdev->dev, "sync memory VA 0x%016llX is misaligned",
+			queue->sync_ptr);
+		goto out;
+	}
 
 	sync_ptr = kbase_phy_alloc_mapping_get(queue->kctx, queue->sync_ptr,
					&mapping);
 
-	KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, QUEUE_SYNC_UPDATE, queue->group,
-				   queue, queue->sync_ptr);
-	KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, QUEUE_SYNC_BLOCKED_REASON,
-				   queue->group, queue, queue->blocked_reason);
+	KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, QUEUE_SYNC_UPDATE_EVAL_START, queue->group, queue,
+				   queue->sync_ptr);
+	KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, QUEUE_SYNC_UPDATE_BLOCKED_REASON, queue->group, queue,
+				   queue->blocked_reason);
 
 	if (!sync_ptr) {
 		dev_dbg(queue->kctx->kbdev->dev, "sync memory VA 0x%016llX already freed",
@@ -1731,19 +2452,24 @@ static bool evaluate_sync_update(struct kbase_queue *queue)
 
 	sync_wait_cond =
		CS_STATUS_WAIT_SYNC_WAIT_CONDITION_GET(queue->status_wait);
+	sync_wait_cond_valid = (sync_wait_cond == CS_STATUS_WAIT_SYNC_WAIT_CONDITION_GT) ||
+			       (sync_wait_cond == CS_STATUS_WAIT_SYNC_WAIT_CONDITION_LE) ||
+			       ((sync_wait_cond == CS_STATUS_WAIT_SYNC_WAIT_CONDITION_GE) &&
+				csf_wait_ge_condition_supported(kbdev));
 
-	WARN_ON((sync_wait_cond != CS_STATUS_WAIT_SYNC_WAIT_CONDITION_GT) &&
-		(sync_wait_cond != CS_STATUS_WAIT_SYNC_WAIT_CONDITION_LE));
+	WARN_ON(!sync_wait_cond_valid);
 
 	sync_current_val = READ_ONCE(*sync_ptr);
-	KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, QUEUE_SYNC_CURRENT_VAL, queue->group,
-				   queue, sync_current_val);
+	KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, QUEUE_SYNC_UPDATE_CUR_VAL, queue->group, queue,
+				   sync_current_val);
 
-	KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, QUEUE_SYNC_TEST_VAL, queue->group,
-				   queue, queue->sync_value);
+	KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, QUEUE_SYNC_UPDATE_TEST_VAL, queue->group, queue,
+				   queue->sync_value);
 
 	if (((sync_wait_cond == CS_STATUS_WAIT_SYNC_WAIT_CONDITION_GT) &&
	     (sync_current_val > queue->sync_value)) ||
+	    ((sync_wait_cond == CS_STATUS_WAIT_SYNC_WAIT_CONDITION_GE) &&
+	     (sync_current_val >= queue->sync_value) && csf_wait_ge_condition_supported(kbdev)) ||
	    ((sync_wait_cond == CS_STATUS_WAIT_SYNC_WAIT_CONDITION_LE) &&
	     (sync_current_val <= queue->sync_value))) {
		/* The sync wait condition is satisfied so the group to which
@@ -1757,8 +2483,7 @@ static bool evaluate_sync_update(struct kbase_queue *queue)
 	kbase_phy_alloc_mapping_put(queue->kctx, mapping);
 
 out:
-	KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, QUEUE_SYNC_UPDATE_EVALUATED,
-				   queue->group, queue, updated);
+	KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, QUEUE_SYNC_UPDATE_EVAL_END, queue->group, queue, updated);
 
 	return updated;
 }
@@ -1792,10 +2517,10 @@ bool save_slot_cs(struct kbase_csf_cmd_stream_group_info const *const ginfo,
 	queue->saved_cmd_ptr = cmd_ptr;
 #endif
 
-	KBASE_KTRACE_ADD_CSF_GRP_Q(stream->kbdev, QUEUE_SYNC_STATUS_WAIT,
-				   queue->group, queue, status);
+	KBASE_KTRACE_ADD_CSF_GRP_Q(stream->kbdev, QUEUE_SYNC_UPDATE_WAIT_STATUS, queue->group,
+				   queue, status);
 
-	if (CS_STATUS_WAIT_SYNC_WAIT_GET(status)) {
+	if (CS_STATUS_WAIT_SYNC_WAIT_GET(status) || CS_STATUS_WAIT_SB_MASK_GET(status)) {
 		queue->status_wait = status;
 		queue->sync_ptr = kbase_csf_firmware_cs_output(stream,
			CS_STATUS_WAIT_SYNC_POINTER_LO);
@@ -1811,7 +2536,8 @@ bool save_slot_cs(struct kbase_csf_cmd_stream_group_info const *const ginfo,
			kbase_csf_firmware_cs_output(stream,
						     CS_STATUS_BLOCKED_REASON));
 
-		if (!evaluate_sync_update(queue)) {
+		if ((queue->blocked_reason == CS_STATUS_BLOCKED_ON_SB_WAIT) ||
+		    !evaluate_sync_update(queue)) {
			is_waiting = true;
		} else {
			/* Sync object already got updated & met the condition
@@ -1847,12 +2573,48 @@ static void schedule_in_cycle(struct kbase_queue_group *group, bool force)
	 * of work needs to be enforced in situation such as entering into
	 * protected mode).
	 */
-	if ((likely(scheduler_timer_is_enabled_nolock(kbdev)) || force) &&
-	    !scheduler->tock_pending_request) {
-		scheduler->tock_pending_request = true;
+	if (likely(kbase_csf_scheduler_timer_is_enabled(kbdev)) || force) {
		dev_dbg(kbdev->dev, "Kicking async for group %d\n",
			group->handle);
-		mod_delayed_work(scheduler->wq, &scheduler->tock_work, 0);
+		kbase_csf_scheduler_invoke_tock(kbdev);
+	}
+}
+
+static void ktrace_log_group_state(struct kbase_queue_group *const group)
+{
+	switch (group->run_state) {
+	case KBASE_CSF_GROUP_INACTIVE:
+		KBASE_KTRACE_ADD_CSF_GRP(group->kctx->kbdev, CSF_GROUP_INACTIVE, group,
+					 group->run_state);
+		break;
+	case KBASE_CSF_GROUP_RUNNABLE:
+		KBASE_KTRACE_ADD_CSF_GRP(group->kctx->kbdev, CSF_GROUP_RUNNABLE, group,
+					 group->run_state);
+		break;
+	case KBASE_CSF_GROUP_IDLE:
+		KBASE_KTRACE_ADD_CSF_GRP(group->kctx->kbdev, CSF_GROUP_IDLE, group,
+					 group->run_state);
+		break;
+	case KBASE_CSF_GROUP_SUSPENDED:
+		KBASE_KTRACE_ADD_CSF_GRP(group->kctx->kbdev, CSF_GROUP_SUSPENDED, group,
+					 group->run_state);
+		break;
+	case KBASE_CSF_GROUP_SUSPENDED_ON_IDLE:
+		KBASE_KTRACE_ADD_CSF_GRP(group->kctx->kbdev, CSF_GROUP_SUSPENDED_ON_IDLE, group,
+					 group->run_state);
+		break;
+	case KBASE_CSF_GROUP_SUSPENDED_ON_WAIT_SYNC:
+		KBASE_KTRACE_ADD_CSF_GRP(group->kctx->kbdev, CSF_GROUP_SUSPENDED_ON_WAIT_SYNC,
+					 group, group->run_state);
+		break;
+	case KBASE_CSF_GROUP_FAULT_EVICTED:
+		KBASE_KTRACE_ADD_CSF_GRP(group->kctx->kbdev, CSF_GROUP_FAULT_EVICTED, group,
+					 group->run_state);
+		break;
+	case KBASE_CSF_GROUP_TERMINATED:
+		KBASE_KTRACE_ADD_CSF_GRP(group->kctx->kbdev, CSF_GROUP_TERMINATED, group,
+					 group->run_state);
+		break;
	}
 }
 
@@ -1873,13 +2635,15 @@ void insert_group_to_runnable(struct kbase_csf_scheduler *const scheduler,
 
 	group->run_state = run_state;
 
+	ktrace_log_group_state(group);
+
 	if (run_state == KBASE_CSF_GROUP_RUNNABLE)
		group->prepared_seq_num = KBASEP_GROUP_PREPARED_SEQ_NUM_INVALID;
 
 	list_add_tail(&group->link,
			&kctx->csf.sched.runnable_groups[group->priority]);
 	kctx->csf.sched.num_runnable_grps++;
-	KBASE_KTRACE_ADD_CSF_GRP(kbdev, GROUP_INSERT_RUNNABLE, group,
+	KBASE_KTRACE_ADD_CSF_GRP(kbdev, GROUP_RUNNABLE_INSERT, group,
				 kctx->csf.sched.num_runnable_grps);
 
 	/* Add the kctx if not yet in runnable kctxs */
@@ -1887,18 +2651,17 @@ void insert_group_to_runnable(struct kbase_csf_scheduler *const scheduler,
		/* First runnable csg, adds to the runnable_kctxs */
		INIT_LIST_HEAD(&kctx->csf.link);
		list_add_tail(&kctx->csf.link, &scheduler->runnable_kctxs);
-		KBASE_KTRACE_ADD(kbdev, SCHEDULER_INSERT_RUNNABLE, kctx, 0u);
+		KBASE_KTRACE_ADD(kbdev, SCHEDULER_RUNNABLE_KCTX_INSERT, kctx, 0u);
	}
 
 	scheduler->total_runnable_grps++;
 
-	if (likely(scheduler_timer_is_enabled_nolock(kbdev)) &&
-	    (scheduler->total_runnable_grps == 1 ||
-	     scheduler->state == SCHED_SUSPENDED ||
+	if (likely(kbase_csf_scheduler_timer_is_enabled(kbdev)) &&
+	    (scheduler->total_runnable_grps == 1 || scheduler->state == SCHED_SUSPENDED ||
	     scheduler->state == SCHED_SLEEPING)) {
		dev_dbg(kbdev->dev, "Kicking scheduler on first runnable group\n");
		/* Fire a scheduling to start the time-slice */
-		enqueue_tick_work(kbdev);
+		kbase_csf_scheduler_invoke_tick(kbdev);
	} else
		schedule_in_cycle(group, false);
 
@@ -1908,6 +2671,17 @@ void insert_group_to_runnable(struct kbase_csf_scheduler *const scheduler,
		scheduler_wakeup(kbdev, false);
 }
 
+static void cancel_tick_work(struct kbase_csf_scheduler *const scheduler)
+{
+	hrtimer_cancel(&scheduler->tick_timer);
+	atomic_set(&scheduler->pending_tick_work, false);
+}
+
+static void cancel_tock_work(struct kbase_csf_scheduler *const scheduler)
+{
+	atomic_set(&scheduler->pending_tock_work, false);
+}
+
 static void
 remove_group_from_runnable(struct kbase_csf_scheduler *const scheduler,
			   struct kbase_queue_group *group,
@@ -1924,6 +2698,9 @@ void remove_group_from_runnable(struct kbase_csf_scheduler *const scheduler,
 	WARN_ON(!queue_group_scheduled_locked(group));
 
 	group->run_state = run_state;
+
+	ktrace_log_group_state(group);
+
 	list_del_init(&group->link);
 
 	spin_lock_irqsave(&scheduler->interrupt_lock, flags);
@@ -1944,7 +2721,7 @@ void remove_group_from_runnable(struct kbase_csf_scheduler *const scheduler,
		if (kbase_prepare_to_reset_gpu(kctx->kbdev, RESET_FLAGS_NONE))
			kbase_reset_gpu(kctx->kbdev);
 
-		KBASE_KTRACE_ADD_CSF_GRP(kctx->kbdev, SCHEDULER_EXIT_PROTM,
+		KBASE_KTRACE_ADD_CSF_GRP(kctx->kbdev, SCHEDULER_PROTM_EXIT,
					 scheduler->active_protm_grp, 0u);
		scheduler->active_protm_grp = NULL;
	}
@@ -1974,13 +2751,12 @@ void remove_group_from_runnable(struct kbase_csf_scheduler *const scheduler,
	}
 
 	kctx->csf.sched.num_runnable_grps--;
-	KBASE_KTRACE_ADD_CSF_GRP(kctx->kbdev, GROUP_REMOVE_RUNNABLE, group,
+	KBASE_KTRACE_ADD_CSF_GRP(kctx->kbdev, GROUP_RUNNABLE_REMOVE, group,
				 kctx->csf.sched.num_runnable_grps);
 	new_head_grp = (!list_empty(list)) ?
				list_first_entry(list, struct kbase_queue_group, link) :
				NULL;
-	KBASE_KTRACE_ADD_CSF_GRP(kctx->kbdev, GROUP_HEAD_RUNNABLE, new_head_grp,
-				 0u);
+	KBASE_KTRACE_ADD_CSF_GRP(kctx->kbdev, GROUP_RUNNABLE_HEAD, new_head_grp, 0u);
 
 	if (kctx->csf.sched.num_runnable_grps == 0) {
		struct kbase_context *new_head_kctx;
@@ -1989,23 +2765,21 @@ void remove_group_from_runnable(struct kbase_csf_scheduler *const scheduler,
		list_del_init(&kctx->csf.link);
		if (scheduler->top_ctx == kctx)
			scheduler->top_ctx = NULL;
-		KBASE_KTRACE_ADD(kctx->kbdev, SCHEDULER_REMOVE_RUNNABLE, kctx,
-				 0u);
+		KBASE_KTRACE_ADD(kctx->kbdev, SCHEDULER_RUNNABLE_KCTX_REMOVE, kctx, 0u);
		new_head_kctx = (!list_empty(kctx_list)) ?
					list_first_entry(kctx_list, struct kbase_context, csf.link) :
					NULL;
-		KBASE_KTRACE_ADD(kctx->kbdev, SCHEDULER_HEAD_RUNNABLE,
-				 new_head_kctx, 0u);
+		KBASE_KTRACE_ADD(kctx->kbdev, SCHEDULER_RUNNABLE_KCTX_HEAD, new_head_kctx, 0u);
	}
 
 	WARN_ON(scheduler->total_runnable_grps == 0);
 	scheduler->total_runnable_grps--;
 	if (!scheduler->total_runnable_grps) {
		dev_dbg(kctx->kbdev->dev, "Scheduler idle has no runnable groups");
-		cancel_tick_timer(kctx->kbdev);
+		cancel_tick_work(scheduler);
		WARN_ON(atomic_read(&scheduler->non_idle_offslot_grps));
		if (scheduler->state != SCHED_SUSPENDED)
-			enqueue_gpu_idle_work(scheduler);
+			enqueue_gpu_idle_work(scheduler, 0);
	}
 	KBASE_KTRACE_ADD_CSF_GRP(kctx->kbdev, SCHEDULER_TOP_GRP, scheduler->top_grp,
			scheduler->num_active_address_spaces |
@@ -2022,9 +2796,11 @@ static void insert_group_to_idle_wait(struct kbase_queue_group *const group)
 
 	list_add_tail(&group->link, &kctx->csf.sched.idle_wait_groups);
 	kctx->csf.sched.num_idle_wait_grps++;
-	KBASE_KTRACE_ADD_CSF_GRP(kctx->kbdev, GROUP_INSERT_IDLE_WAIT, group,
+	KBASE_KTRACE_ADD_CSF_GRP(kctx->kbdev, GROUP_IDLE_WAIT_INSERT, group,
				 kctx->csf.sched.num_idle_wait_grps);
 	group->run_state = KBASE_CSF_GROUP_SUSPENDED_ON_WAIT_SYNC;
+	KBASE_KTRACE_ADD_CSF_GRP(kctx->kbdev, CSF_GROUP_SUSPENDED_ON_WAIT_SYNC, group,
+				 group->run_state);
 	dev_dbg(kctx->kbdev->dev,
		"Group-%d suspended on sync_wait, total wait_groups: %u\n",
		group->handle, kctx->csf.sched.num_idle_wait_grps);
@@ -2043,14 +2819,14 @@ static void remove_group_from_idle_wait(struct kbase_queue_group *const group)
 	list_del_init(&group->link);
 	WARN_ON(kctx->csf.sched.num_idle_wait_grps == 0);
 	kctx->csf.sched.num_idle_wait_grps--;
-	KBASE_KTRACE_ADD_CSF_GRP(kctx->kbdev, GROUP_REMOVE_IDLE_WAIT, group,
+	KBASE_KTRACE_ADD_CSF_GRP(kctx->kbdev, GROUP_IDLE_WAIT_REMOVE, group,
				 kctx->csf.sched.num_idle_wait_grps);
 	new_head_grp = (!list_empty(list)) ?
				list_first_entry(list, struct kbase_queue_group, link) :
				NULL;
-	KBASE_KTRACE_ADD_CSF_GRP(kctx->kbdev, GROUP_HEAD_IDLE_WAIT,
-				 new_head_grp, 0u);
+	KBASE_KTRACE_ADD_CSF_GRP(kctx->kbdev, GROUP_IDLE_WAIT_HEAD, new_head_grp, 0u);
 	group->run_state = KBASE_CSF_GROUP_INACTIVE;
+	KBASE_KTRACE_ADD_CSF_GRP(kctx->kbdev, CSF_GROUP_INACTIVE, group, group->run_state);
 }
 
 static void deschedule_idle_wait_group(struct kbase_csf_scheduler *scheduler,
@@ -2065,7 +2841,7 @@ static void deschedule_idle_wait_group(struct kbase_csf_scheduler *scheduler,
 	insert_group_to_idle_wait(group);
 }
 
-static void update_offslot_non_idle_cnt_for_faulty_grp(struct kbase_queue_group *group)
+static void update_offslot_non_idle_cnt(struct kbase_queue_group *group)
 {
 	struct kbase_device *kbdev = group->kctx->kbdev;
 	struct kbase_csf_scheduler *const scheduler = &kbdev->csf.scheduler;
@@ -2075,8 +2851,7 @@ static void update_offslot_non_idle_cnt_for_faulty_grp(struct kbase_queue_group
 	if (group->prepared_seq_num < scheduler->non_idle_scanout_grps) {
		int new_val = atomic_dec_return(&scheduler->non_idle_offslot_grps);
 
-		KBASE_KTRACE_ADD_CSF_GRP(kbdev, SCHEDULER_NONIDLE_OFFSLOT_DEC,
-					 group, new_val);
+		KBASE_KTRACE_ADD_CSF_GRP(kbdev, SCHEDULER_NONIDLE_OFFSLOT_GRP_DEC, group, new_val);
	}
 }
 
@@ -2092,8 +2867,7 @@ static void update_offslot_non_idle_cnt_for_onslot_grp(struct kbase_queue_group
 	if (group->prepared_seq_num < scheduler->non_idle_scanout_grps) {
		int new_val = atomic_dec_return(&scheduler->non_idle_offslot_grps);
 
-		KBASE_KTRACE_ADD_CSF_GRP(kbdev, SCHEDULER_NONIDLE_OFFSLOT_DEC,
-					 group, new_val);
+		KBASE_KTRACE_ADD_CSF_GRP(kbdev, SCHEDULER_NONIDLE_OFFSLOT_GRP_DEC, group, new_val);
	}
 }
 
@@ -2113,15 +2887,15 @@ static void update_offslot_non_idle_cnt_on_grp_suspend(
		if (group->run_state == KBASE_CSF_GROUP_SUSPENDED) {
			int new_val = atomic_inc_return(
				&scheduler->non_idle_offslot_grps);
-			KBASE_KTRACE_ADD_CSF_GRP(kbdev, SCHEDULER_NONIDLE_OFFSLOT_INC,
-						 group, new_val);
+			KBASE_KTRACE_ADD_CSF_GRP(kbdev, SCHEDULER_NONIDLE_OFFSLOT_GRP_INC,
						 group, new_val);
		}
	} else {
		if (group->run_state != KBASE_CSF_GROUP_SUSPENDED) {
			int new_val = atomic_dec_return(
				&scheduler->non_idle_offslot_grps);
-			KBASE_KTRACE_ADD_CSF_GRP(kbdev, SCHEDULER_NONIDLE_OFFSLOT_DEC,
-						 group, new_val);
+			KBASE_KTRACE_ADD_CSF_GRP(kbdev, SCHEDULER_NONIDLE_OFFSLOT_GRP_DEC,
						 group, new_val);
		}
	}
 } else {
@@ -2129,8 +2903,8 @@ static void update_offslot_non_idle_cnt_on_grp_suspend(
		if (group->run_state == KBASE_CSF_GROUP_SUSPENDED) {
			int new_val = atomic_inc_return(
				&scheduler->non_idle_offslot_grps);
-			KBASE_KTRACE_ADD_CSF_GRP(kbdev, SCHEDULER_NONIDLE_OFFSLOT_INC,
-						 group, new_val);
+			KBASE_KTRACE_ADD_CSF_GRP(kbdev, SCHEDULER_NONIDLE_OFFSLOT_GRP_INC, group,
						 new_val);
		}
	}
 }
@@ -2148,7 +2922,7 @@ static bool confirm_cmd_buf_empty(struct kbase_queue const *queue)
 	u32 glb_version = iface->version;
 
 	u64 const *input_addr = (u64 const *)queue->user_io_addr;
-	u64 const *output_addr = (u64 const *)(queue->user_io_addr + PAGE_SIZE);
+	u64 const *output_addr = (u64 const *)(queue->user_io_addr + PAGE_SIZE / sizeof(u64));
 
 	if (glb_version >= kbase_csf_interface_version(1, 0, 0)) {
		/* CS_STATUS_SCOREBOARD supported from CSF 1.0 */
@@ -2162,6 +2936,11 @@ static bool confirm_cmd_buf_empty(struct kbase_queue const *queue)
						    CS_STATUS_SCOREBOARDS));
	}
 
+	/*
	 * These 64-bit reads and writes will be atomic on a 64-bit kernel but may
	 * not be atomic on 32-bit kernels. Support for 32-bit kernels is limited to
	 * build-only.
+	 */
 	cs_empty = (input_addr[CS_INSERT_LO / sizeof(u64)] ==
		    output_addr[CS_EXTRACT_LO / sizeof(u64)]);
 	cs_idle = cs_empty && (!sb_status);
@@ -2204,9 +2983,14 @@ static void save_csg_slot(struct kbase_queue_group *group)
			if (!queue || !queue->enabled)
				continue;
 
-			if (save_slot_cs(ginfo, queue))
-				sync_wait = true;
-			else {
+			if (save_slot_cs(ginfo, queue)) {
+				/* sync_wait is only true if the queue is blocked on
				 * a CQS and not a scoreboard.
				 */
+				if (queue->blocked_reason !=
				    CS_STATUS_BLOCKED_ON_SB_WAIT)
+					sync_wait = true;
+			} else {
				/* Need to confirm if ringbuffer of the GPU
				 * queue is empty or not. A race can arise
				 * between the flush of GPU queue and suspend
@@ -2231,14 +3015,19 @@ static void save_csg_slot(struct kbase_queue_group *group)
			else {
				group->run_state =
					KBASE_CSF_GROUP_SUSPENDED_ON_IDLE;
+				KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSF_GROUP_SUSPENDED_ON_IDLE, group,
							 group->run_state);
				dev_dbg(kbdev->dev, "Group-%d suspended: idle",
					group->handle);
			}
		} else {
			group->run_state = KBASE_CSF_GROUP_SUSPENDED;
+			KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSF_GROUP_SUSPENDED, group,
						 group->run_state);
		}
 
		update_offslot_non_idle_cnt_on_grp_suspend(group);
+		kbase_csf_tiler_heap_reclaim_sched_notify_grp_suspend(group);
	}
 }
 
@@ -2255,7 +3044,7 @@ static bool cleanup_csg_slot(struct kbase_queue_group *group)
 	s8 slot;
 	struct kbase_csf_csg_slot *csg_slot;
 	unsigned long flags;
-	u32 i;
+	u32 csg_req, csg_ack, i;
 	bool as_fault = false;
 
 	lockdep_assert_held(&kbdev->csf.scheduler.lock);
@@ -2285,6 +3074,8 @@ static bool cleanup_csg_slot(struct kbase_queue_group *group)
 
 	unassign_user_doorbell_from_group(kbdev, group);
 
+	kbasep_platform_event_work_end(group);
+
 	/* The csg does not need cleanup other than drop its AS */
 	spin_lock_irqsave(&kctx->kbdev->hwaccess_lock, flags);
 	as_fault = kbase_ctx_flag(kctx, KCTX_AS_DISABLED_ON_FAULT);
@@ -2293,8 +3084,17 @@ static bool cleanup_csg_slot(struct kbase_queue_group *group)
		as_fault = true;
 	spin_unlock_irqrestore(&kctx->kbdev->hwaccess_lock, flags);
 
+#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD)
+	emit_gpu_metrics_to_frontend(kbdev);
+#endif /* CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD */
+
 	/* now marking the slot is vacant */
 	spin_lock_irqsave(&kbdev->csf.scheduler.interrupt_lock, flags);
+	/* Process pending SYNC_UPDATE, if any */
+	csg_req = kbase_csf_firmware_csg_input_read(ginfo, CSG_REQ);
+	csg_ack = kbase_csf_firmware_csg_output(ginfo, CSG_ACK);
+	kbase_csf_handle_csg_sync_update(kbdev, ginfo, group, csg_req, csg_ack);
+
 	kbdev->csf.scheduler.csg_slots[slot].resident_group = NULL;
 	clear_bit(slot, kbdev->csf.scheduler.csg_slots_idle_mask);
 	KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSG_SLOT_IDLE_CLEAR, group,
@@ -2315,6 +3115,11 @@ static bool cleanup_csg_slot(struct kbase_queue_group *group)
 	KBASE_TLSTREAM_TL_KBASE_DEVICE_DEPROGRAM_CSG(kbdev,
		kbdev->gpu_props.props.raw_props.gpu_id, slot);
 
+	/* Notify the group is off-slot and the csg_reg might be available for
	 * reuse with other groups in a 'lazy unbinding' style.
+	 */
+	kbase_csf_mcu_shared_set_group_csg_reg_unused(kbdev, group);
+
 	return as_fault;
 }
 
@@ -2351,16 +3156,17 @@ static void update_csg_slot_priority(struct kbase_queue_group *group, u8 prio)
		return;
 
 	/* Read the csg_ep_cfg back for updating the priority field */
-	ep_cfg = kbase_csf_firmware_csg_input_read(ginfo, CSG_EP_REQ);
+	ep_cfg = kbase_csf_firmware_csg_input_read(ginfo, CSG_EP_REQ_LO);
 	prev_prio = CSG_EP_REQ_PRIORITY_GET(ep_cfg);
 	ep_cfg = CSG_EP_REQ_PRIORITY_SET(ep_cfg, prio);
-	kbase_csf_firmware_csg_input(ginfo, CSG_EP_REQ, ep_cfg);
+	kbase_csf_firmware_csg_input(ginfo, CSG_EP_REQ_LO, ep_cfg);
 
 	spin_lock_irqsave(&kbdev->csf.scheduler.interrupt_lock, flags);
 	csg_req = kbase_csf_firmware_csg_output(ginfo, CSG_ACK);
 	csg_req ^= CSG_REQ_EP_CFG_MASK;
 	kbase_csf_firmware_csg_input_mask(ginfo, CSG_REQ, csg_req,
					  CSG_REQ_EP_CFG_MASK);
+	kbase_csf_ring_csg_doorbell(kbdev, slot);
 	spin_unlock_irqrestore(&kbdev->csf.scheduler.interrupt_lock, flags);
 
 	csg_slot->priority = prio;
@@ -2369,9 +3175,8 @@ static void update_csg_slot_priority(struct kbase_queue_group *group, u8 prio)
		group->handle, group->kctx->tgid, group->kctx->id, slot,
		prev_prio, prio);
 
-	KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSG_PRIO_UPDATE, group, prev_prio);
+	KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSG_SLOT_PRIO_UPDATE, group, prev_prio);
 
-	kbase_csf_ring_csg_doorbell(kbdev, slot);
 	set_bit(slot, kbdev->csf.scheduler.csg_slots_prio_update);
 }
 
@@ -2388,18 +3193,17 @@ static void program_csg_slot(struct kbase_queue_group *group, s8 slot,
 	const u64 compute_mask = shader_core_mask & group->compute_mask;
 	const u64 fragment_mask = shader_core_mask & group->fragment_mask;
 	const u64 tiler_mask = tiler_core_mask & group->tiler_mask;
-	const u8 num_cores = kbdev->gpu_props.num_cores;
-	const u8 compute_max = min(num_cores, group->compute_max);
-	const u8 fragment_max = min(num_cores, group->fragment_max);
+	const u8 compute_max = min(kbdev->gpu_props.num_cores, group->compute_max);
+	const u8 fragment_max = min(kbdev->gpu_props.num_cores, group->fragment_max);
 	const u8 tiler_max = min(CSG_TILER_MAX, group->tiler_max);
 	struct kbase_csf_cmd_stream_group_info *ginfo;
-	u32 ep_cfg = 0;
+	u64 ep_cfg = 0;
 	u32 csg_req;
 	u32 state;
 	int i;
 	unsigned long flags;
-	const u64 normal_suspend_buf =
-		group->normal_suspend_buf.reg->start_pfn << PAGE_SHIFT;
+	u64 normal_suspend_buf;
+	u64 protm_suspend_buf;
 	struct kbase_csf_csg_slot *csg_slot =
		&kbdev->csf.scheduler.csg_slots[slot];
 
@@ -2411,6 +3215,19 @@ static void program_csg_slot(struct kbase_queue_group *group, s8 slot,
 
 	WARN_ON(atomic_read(&csg_slot->state) != CSG_SLOT_READY);
 
+	if (unlikely(kbase_csf_mcu_shared_group_bind_csg_reg(kbdev, group))) {
+		dev_warn(kbdev->dev,
			 "Couldn't bind MCU shared csg_reg for group %d of context %d_%d, slot=%u",
			 group->handle, group->kctx->tgid, kctx->id, slot);
+		kbase_csf_mcu_shared_set_group_csg_reg_unused(kbdev, group);
+		return;
+	}
+
+	/* The suspend buf has already been mapped through binding to csg_reg */
+	normal_suspend_buf = group->normal_suspend_buf.gpu_va;
+	protm_suspend_buf = group->protected_suspend_buf.gpu_va;
+	WARN_ONCE(!normal_suspend_buf, "Normal suspend buffer not mapped");
+
 	ginfo = &global_iface->groups[slot];
 
 	/* Pick an available address space for this context */
@@ -2423,6 +3240,7 @@ static void program_csg_slot(struct kbase_queue_group *group, s8 slot,
 	if (kctx->as_nr == KBASEP_AS_NR_INVALID) {
		dev_warn(kbdev->dev, "Could not get a valid AS for group %d of context %d_%d on slot %d\n",
			 group->handle, kctx->tgid, kctx->id, slot);
+		kbase_csf_mcu_shared_set_group_csg_reg_unused(kbdev, group);
		return;
	}
 
@@ -2430,6 +3248,7 @@ static void program_csg_slot(struct kbase_queue_group *group, s8 slot,
 	set_bit(slot, kbdev->csf.scheduler.csg_inuse_bitmap);
 	kbdev->csf.scheduler.csg_slots[slot].resident_group = group;
 	group->csg_nr = slot;
+
 	spin_unlock_irqrestore(&kbdev->csf.scheduler.interrupt_lock, flags);
 
 	assign_user_doorbell_to_group(kbdev, group);
@@ -2452,6 +3271,7 @@ static void program_csg_slot(struct kbase_queue_group *group, s8 slot,
					     fragment_mask & U32_MAX);
 	kbase_csf_firmware_csg_input(ginfo, CSG_ALLOW_FRAGMENT_HI,
				     fragment_mask >> 32);
+
 	kbase_csf_firmware_csg_input(ginfo, CSG_ALLOW_OTHER,
				     tiler_mask & U32_MAX);
 
@@ -2463,7 +3283,7 @@ static void program_csg_slot(struct kbase_queue_group *group, s8 slot,
 	ep_cfg = CSG_EP_REQ_FRAGMENT_EP_SET(ep_cfg, fragment_max);
 	ep_cfg = CSG_EP_REQ_TILER_EP_SET(ep_cfg, tiler_max);
 	ep_cfg = CSG_EP_REQ_PRIORITY_SET(ep_cfg, prio);
-	kbase_csf_firmware_csg_input(ginfo, CSG_EP_REQ, ep_cfg);
+	kbase_csf_firmware_csg_input(ginfo, CSG_EP_REQ_LO, ep_cfg & U32_MAX);
 
 	/* Program the address space number assigned to the context */
 	kbase_csf_firmware_csg_input(ginfo, CSG_CONFIG, kctx->as_nr);
@@ -2473,16 +3293,22 @@ static void program_csg_slot(struct kbase_queue_group *group, s8 slot,
 	kbase_csf_firmware_csg_input(ginfo, CSG_SUSPEND_BUF_HI,
				     normal_suspend_buf >> 32);
 
-	if (group->protected_suspend_buf.reg) {
-		const u64 protm_suspend_buf =
-			group->protected_suspend_buf.reg->start_pfn <<
-				PAGE_SHIFT;
-		kbase_csf_firmware_csg_input(ginfo, CSG_PROTM_SUSPEND_BUF_LO,
-					     protm_suspend_buf & U32_MAX);
-		kbase_csf_firmware_csg_input(ginfo, CSG_PROTM_SUSPEND_BUF_HI,
-					     protm_suspend_buf >> 32);
-	}
+	/* Note: we program the P-mode suspend buffer pointer here, but actual
	 * entry into P-mode execution is gated on the P-mode phy pages being
	 * allocated and mapped with the bound csg_reg, which has a specific flag
	 * indicating this P-mode runnable condition before a group is granted
	 * its P-mode section entry. Without a P-mode entry, the buffer pointed
	 * to is not going to be accessed at all.
	 */
+	kbase_csf_firmware_csg_input(ginfo, CSG_PROTM_SUSPEND_BUF_LO, protm_suspend_buf & U32_MAX);
+	kbase_csf_firmware_csg_input(ginfo, CSG_PROTM_SUSPEND_BUF_HI, protm_suspend_buf >> 32);
+
+	if (group->dvs_buf) {
+		kbase_csf_firmware_csg_input(ginfo, CSG_DVS_BUF_LO,
					     group->dvs_buf & U32_MAX);
+		kbase_csf_firmware_csg_input(ginfo, CSG_DVS_BUF_HI,
					     group->dvs_buf >> 32);
+	}
 
 	/* Enable all interrupts for now */
 	kbase_csf_firmware_csg_input(ginfo, CSG_ACK_IRQ_MASK, ~((u32)0));
@@ -2503,6 +3329,7 @@ static void program_csg_slot(struct kbase_queue_group *group, s8 slot,
 
 	kbase_csf_firmware_csg_input_mask(ginfo, CSG_REQ, state,
			CSG_REQ_STATE_MASK);
+	kbase_csf_ring_csg_doorbell(kbdev, slot);
 	spin_unlock_irqrestore(&kbdev->csf.scheduler.interrupt_lock, flags);
 
 	/* Update status before rings the door-bell, marking ready => run */
@@ -2518,15 +3345,19 @@ static void program_csg_slot(struct kbase_queue_group *group, s8 slot,
 	dev_dbg(kbdev->dev, "Starting group %d of context %d_%d on slot %d with priority %u\n",
		group->handle, kctx->tgid, kctx->id, slot, prio);
 
-	KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSG_SLOT_START, group,
-				 (((u64)ep_cfg) << 32) |
-				 ((((u32)kctx->as_nr) & 0xF) << 16) |
-				 (state & (CSG_REQ_STATE_MASK >> CS_REQ_STATE_SHIFT)));
+	KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSG_SLOT_START_REQ, group,
				 (((u64)ep_cfg) << 32) | ((((u32)kctx->as_nr) & 0xF) << 16) |
					 (state & (CSG_REQ_STATE_MASK >> CS_REQ_STATE_SHIFT)));
 
-	kbase_csf_ring_csg_doorbell(kbdev, slot);
+	kbasep_platform_event_work_begin(group);
+
+	/* Update the heap reclaim manager */
+	kbase_csf_tiler_heap_reclaim_sched_notify_grp_active(group);
 
 	/* Programming a slot consumes a group from scanout */
 	update_offslot_non_idle_cnt_for_onslot_grp(group);
+
+	/* Notify the group's bound csg_reg is now in active use */
+	kbase_csf_mcu_shared_set_group_csg_reg_active(kbdev, group);
 }
 
 static void remove_scheduled_group(struct kbase_device *kbdev,
@@ -2547,7 +3378,7 @@ static void remove_scheduled_group(struct kbase_device *kbdev,
 }
 
 static void sched_evict_group(struct kbase_queue_group *group, bool fault,
-			      bool update_non_idle_offslot_grps_cnt)
+			      bool update_non_idle_offslot_grps_cnt_from_run_state)
 {
 	struct kbase_context *kctx = group->kctx;
 	struct kbase_device *kbdev = kctx->kbdev;
@@ -2558,13 +3389,13 @@ static void sched_evict_group(struct kbase_queue_group *group, bool fault,
 	if (queue_group_scheduled_locked(group)) {
		u32 i;
 
-		if (update_non_idle_offslot_grps_cnt &&
+		if (update_non_idle_offslot_grps_cnt_from_run_state &&
		    (group->run_state == KBASE_CSF_GROUP_SUSPENDED ||
		     group->run_state == KBASE_CSF_GROUP_RUNNABLE)) {
			int new_val = atomic_dec_return(
				&scheduler->non_idle_offslot_grps);
-			KBASE_KTRACE_ADD_CSF_GRP(kbdev, SCHEDULER_NONIDLE_OFFSLOT_DEC,
-						 group, new_val);
+			KBASE_KTRACE_ADD_CSF_GRP(kbdev, SCHEDULER_NONIDLE_OFFSLOT_GRP_DEC, group,
						 new_val);
		}
 
		for (i = 0; i < MAX_SUPPORTED_STREAMS_PER_GROUP; i++) {
@@ -2573,8 +3404,11 @@ static void sched_evict_group(struct kbase_queue_group *group, bool fault,
		}
 
		if (group->prepared_seq_num !=
-				KBASEP_GROUP_PREPARED_SEQ_NUM_INVALID)
+				KBASEP_GROUP_PREPARED_SEQ_NUM_INVALID) {
+			if (!update_non_idle_offslot_grps_cnt_from_run_state)
+				update_offslot_non_idle_cnt(group);
			remove_scheduled_group(kbdev, group);
+		}
 
		if (group->run_state == KBASE_CSF_GROUP_SUSPENDED_ON_WAIT_SYNC)
			remove_group_from_idle_wait(group);
@@ -2585,17 +3419,25 @@ static void sched_evict_group(struct kbase_queue_group *group, bool fault,
 
		WARN_ON(group->run_state != KBASE_CSF_GROUP_INACTIVE);
 
-		if (fault)
+		if (fault) {
			group->run_state = KBASE_CSF_GROUP_FAULT_EVICTED;
+			KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSF_GROUP_FAULT_EVICTED, group,
						 scheduler->total_runnable_grps);
+		}
 
-		KBASE_KTRACE_ADD_CSF_GRP(kbdev, GROUP_EVICT_SCHED, group,
-				(((u64)scheduler->total_runnable_grps) << 32) |
-				((u32)group->run_state));
+		KBASE_KTRACE_ADD_CSF_GRP(kbdev, GROUP_EVICT, group,
					 (((u64)scheduler->total_runnable_grps) << 32) |
						 ((u32)group->run_state));
		dev_dbg(kbdev->dev, "group %d exited scheduler, num_runnable_grps %d\n",
			group->handle, scheduler->total_runnable_grps);
		/* Notify a group has been evicted */
		wake_up_all(&kbdev->csf.event_wait);
	}
+
+	kbase_csf_tiler_heap_reclaim_sched_notify_grp_evict(group);
+
+	/* Clear all the bound shared regions and unmap any in-place MMU maps */
+	kbase_csf_mcu_shared_clear_evicted_group_csg_reg(kbdev, group);
 }
 
 static int term_group_sync(struct kbase_queue_group *group)
@@ -2607,13 +3449,23 @@ static int term_group_sync(struct kbase_queue_group *group)
 	term_csg_slot(group);
 
 	remaining = wait_event_timeout(kbdev->csf.event_wait,
-		csg_slot_stopped_locked(kbdev, group->csg_nr), remaining);
-
-	if (!remaining) {
+		group->cs_unrecoverable || csg_slot_stopped_locked(kbdev, group->csg_nr),
+		remaining);
+
+	if (unlikely(!remaining)) {
+		enum dumpfault_error_type error_type = DF_CSG_TERMINATE_TIMEOUT;
+		const struct gpu_uevent evt = {
+			.type = GPU_UEVENT_TYPE_KMD_ERROR,
+			.info = GPU_UEVENT_INFO_GROUP_TERM
+		};
+		pixel_gpu_uevent_send(kbdev, &evt);
		dev_warn(kbdev->dev, "[%llu] term request timeout (%d ms) for group %d of context %d_%d on slot %d",
			 kbase_backend_get_cycle_cnt(kbdev), kbdev->csf.fw_timeout_ms,
			 group->handle, group->kctx->tgid,
			 group->kctx->id, group->csg_nr);
+		if (kbase_csf_firmware_ping_wait(kbdev, FW_PING_AFTER_ERROR_TIMEOUT_MS))
+			error_type = DF_PING_REQUEST_TIMEOUT;
+		kbase_debug_csf_fault_notify(kbdev, group->kctx, error_type);
		if (kbase_prepare_to_reset_gpu(kbdev, RESET_FLAGS_NONE))
			kbase_reset_gpu(kbdev);
 
@@ -2628,13 +3480,15 @@ void kbase_csf_scheduler_group_deschedule(struct kbase_queue_group *group)
 {
 	struct kbase_device *kbdev = group->kctx->kbdev;
 	struct kbase_csf_scheduler *scheduler = &kbdev->csf.scheduler;
+	bool wait_for_termination = true;
 	bool on_slot;
 
 	kbase_reset_gpu_assert_failed_or_prevented(kbdev);
 	lockdep_assert_held(&group->kctx->csf.lock);
-	mutex_lock(&scheduler->lock);
+	rt_mutex_lock(&scheduler->lock);
 
 	KBASE_KTRACE_ADD_CSF_GRP(kbdev, GROUP_DESCHEDULE, group, group->run_state);
+	wait_for_dump_complete_on_group_deschedule(group);
 	if (!queue_group_scheduled_locked(group))
		goto unlock;
 
@@ -2642,39 +3496,28 @@ void kbase_csf_scheduler_group_deschedule(struct kbase_queue_group *group)
 
 #ifdef KBASE_PM_RUNTIME
 	/* If the queue group is on slot and Scheduler is in SLEEPING state,
-	 * then we need to wait here for Scheduler to exit the sleep state
-	 * (i.e. wait for the runtime suspend or power down of GPU). This would
-	 * be better than aborting the power down. The group will be suspended
-	 * anyways on power down, so won't have to send the CSG termination
-	 * request to FW.
+	 * then we need to wake up the Scheduler to exit the sleep state rather
	 * than waiting for the runtime suspend or power down of GPU.
	 * The group termination is usually triggered in the context of Application
	 * thread and it has been seen that certain Apps can destroy groups at
	 * random points and not necessarily when the App is exiting.
	 */
 	if (on_slot && (scheduler->state == SCHED_SLEEPING)) {
-		if (wait_for_scheduler_to_exit_sleep(kbdev)) {
+		scheduler_wakeup(kbdev, true);
+
+		/* Wait for MCU firmware to start running */
+		if (kbase_csf_scheduler_wait_mcu_active(kbdev)) {
			dev_warn(
				kbdev->dev,
-				"Wait for scheduler to exit sleep state timedout when terminating group %d of context %d_%d on slot %d",
+				"[%llu] Wait for MCU active failed when terminating group %d of context %d_%d on slot %d",
+				kbase_backend_get_cycle_cnt(kbdev),
				group->handle, group->kctx->tgid,
				group->kctx->id, group->csg_nr);
-
-			scheduler_wakeup(kbdev, true);
-
-			/* Wait for MCU firmware to start running */
-			if (kbase_csf_scheduler_wait_mcu_active(kbdev))
-				dev_warn(
-					kbdev->dev,
-					"[%llu] Wait for MCU active failed when terminating group %d of context %d_%d on slot %d",
-					kbase_backend_get_cycle_cnt(kbdev),
-					group->handle, group->kctx->tgid,
-					group->kctx->id, group->csg_nr);
+			/* No point in waiting for CSG termination if MCU didn't
			 * become active.
			 */
+			wait_for_termination = false;
		}
-
-		/* Check the group state again as scheduler lock would have been
-		 * released when waiting for the exit from SLEEPING state.
-		 */
-		if (!queue_group_scheduled_locked(group))
-			goto unlock;
-
-		on_slot = kbasep_csf_scheduler_group_is_on_slot_locked(group);
	}
 #endif
 	if (!on_slot) {
@@ -2682,7 +3525,11 @@ void kbase_csf_scheduler_group_deschedule(struct kbase_queue_group *group)
	} else {
		bool as_faulty;
 
-		term_group_sync(group);
+		if (likely(wait_for_termination))
+			term_group_sync(group);
+		else
+			term_csg_slot(group);
+
		/* Treat the CSG as terminated */
		as_faulty = cleanup_csg_slot(group);
		/* remove from the scheduler list */
@@ -2692,7 +3539,7 @@ void kbase_csf_scheduler_group_deschedule(struct kbase_queue_group *group)
 	WARN_ON(queue_group_scheduled_locked(group));
 
 unlock:
-	mutex_unlock(&scheduler->lock);
+	rt_mutex_unlock(&scheduler->lock);
 }
 
 /**
@@ -2731,6 +3578,8 @@ static int scheduler_group_schedule(struct kbase_queue_group *group)
							 group));
 
		group->run_state = KBASE_CSF_GROUP_RUNNABLE;
+		KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSF_GROUP_RUNNABLE, group,
					 group->run_state);
 
		/* A normal mode CSG could be idle onslot during
		 * protected mode. In this case clear the
@@ -2741,6 +3590,8 @@ static int scheduler_group_schedule(struct kbase_queue_group *group)
		if (protm_grp && protm_grp != group) {
			clear_bit((unsigned int)group->csg_nr,
				  scheduler->csg_slots_idle_mask);
+			/* Request the update to confirm the condition inferred. */
+			group->reevaluate_idle_status = true;
			KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSG_SLOT_IDLE_CLEAR, group,
						 scheduler->csg_slots_idle_mask[0]);
		}
@@ -2767,8 +3618,7 @@ static int scheduler_group_schedule(struct kbase_queue_group *group)
		/* A new group into the scheduler */
		new_val = atomic_inc_return(
			&kbdev->csf.scheduler.non_idle_offslot_grps);
-		KBASE_KTRACE_ADD_CSF_GRP(kbdev, SCHEDULER_NONIDLE_OFFSLOT_INC,
-					 group, new_val);
+		KBASE_KTRACE_ADD_CSF_GRP(kbdev, SCHEDULER_NONIDLE_OFFSLOT_GRP_INC, group, new_val);
	}
 
 	/* Since a group has become active now, check if GPU needs to be
@@ -2971,8 +3821,7 @@ static void program_group_on_vacant_csg_slot(struct kbase_device *kbdev,
				scheduler->remaining_tick_slots--;
			}
		} else {
-			update_offslot_non_idle_cnt_for_faulty_grp(
-				group);
+			update_offslot_non_idle_cnt(group);
			remove_scheduled_group(kbdev, group);
		}
	}
@@ -3064,7 +3913,6 @@ static void program_suspending_csg_slots(struct kbase_device *kbdev)
 	DECLARE_BITMAP(slot_mask, MAX_SUPPORTED_CSGS);
 	DECLARE_BITMAP(evicted_mask, MAX_SUPPORTED_CSGS) = {0};
 	bool suspend_wait_failed = false;
-	long remaining = kbase_csf_timeout_in_jiffies(kbdev->csf.fw_timeout_ms);
 
 	lockdep_assert_held(&kbdev->csf.scheduler.lock);
 
@@ -3076,6 +3924,7 @@ static void program_suspending_csg_slots(struct kbase_device *kbdev)
 
 	while (!bitmap_empty(slot_mask, MAX_SUPPORTED_CSGS)) {
		DECLARE_BITMAP(changed, MAX_SUPPORTED_CSGS);
+		long remaining = kbase_csf_timeout_in_jiffies(kbase_get_timeout_ms(kbdev, CSF_CSG_SUSPEND_TIMEOUT));
 
		bitmap_copy(changed, slot_mask, MAX_SUPPORTED_CSGS);
 
@@ -3084,7 +3933,7 @@ static void program_suspending_csg_slots(struct kbase_device *kbdev)
					 csg_slot_stopped_raw),
			remaining);
 
-		if (remaining) {
+		if (likely(remaining)) {
			u32 i;
 
			for_each_set_bit(i, changed, num_groups) {
@@ -3103,6 +3952,12 @@ static void program_suspending_csg_slots(struct kbase_device *kbdev)
					 * group is not terminated during
					 * the sleep.
					 */
+
+					/* Only emit suspend, if there was no AS fault */
+					if (kctx_as_enabled(group->kctx) && !group->faulted)
+						KBASE_TLSTREAM_TL_KBASE_DEVICE_SUSPEND_CSG(
							kbdev,
							kbdev->gpu_props.props.raw_props.gpu_id, i);
					save_csg_slot(group);
					as_fault = cleanup_csg_slot(group);
					/* If AS fault detected, evict it */
@@ -3115,6 +3970,7 @@ static void program_suspending_csg_slots(struct kbase_device *kbdev)
				program_vacant_csg_slot(kbdev, (s8)i);
			}
		} else {
+			struct gpu_uevent evt;
			u32 i;
 
			/* Groups that have failed to suspend in time shall
@@ -3124,6 +3980,7 @@ static void program_suspending_csg_slots(struct kbase_device *kbdev)
			for_each_set_bit(i, slot_mask, num_groups) {
				struct kbase_queue_group *const group =
					scheduler->csg_slots[i].resident_group;
+				enum dumpfault_error_type error_type = DF_CSG_SUSPEND_TIMEOUT;
 
				struct base_gpu_queue_group_error const
					err_payload = { .error_type =
@@ -3137,14 +3994,13 @@ static void program_suspending_csg_slots(struct kbase_device *kbdev)
				if (unlikely(group == NULL))
					continue;
 
-				kbase_csf_add_group_fatal_error(group,
-								&err_payload);
-				kbase_event_wakeup_nosync(group->kctx);
-
				/* TODO GPUCORE-25328: The CSG can't be
				 * terminated, the GPU will be reset as a
				 * work-around.
				 */
+				evt.type = GPU_UEVENT_TYPE_KMD_ERROR;
+				evt.info = GPU_UEVENT_INFO_CSG_GROUP_SUSPEND;
+				pixel_gpu_uevent_send(kbdev, &evt);
				dev_warn(
					kbdev->dev,
					"[%llu] Group %d of context %d_%d on slot %u failed to suspend (timeout %d ms)",
@@ -3152,14 +4008,19 @@ static void program_suspending_csg_slots(struct kbase_device *kbdev)
					group->handle, group->kctx->tgid,
					group->kctx->id, i,
					kbdev->csf.fw_timeout_ms);
+				if (kbase_csf_firmware_ping_wait(kbdev,
								 FW_PING_AFTER_ERROR_TIMEOUT_MS))
+					error_type = DF_PING_REQUEST_TIMEOUT;
+				schedule_actions_trigger_df(kbdev, group->kctx, error_type);
+
+				kbase_csf_add_group_fatal_error(group, &err_payload);
+				kbase_event_wakeup_nosync(group->kctx);
 
				/* The group has failed suspension, stop
				 * further examination.
				 */
				clear_bit(i, slot_mask);
				set_bit(i, scheduler->csgs_events_enable_mask);
-				update_offslot_non_idle_cnt_for_onslot_grp(
-					group);
			}
 
			suspend_wait_failed = true;
@@ -3239,7 +4100,7 @@ static void wait_csg_slots_start(struct kbase_device *kbdev)
			slots_state_changed(kbdev, changed, csg_slot_running),
			remaining);
 
-		if (remaining) {
+		if (likely(remaining)) {
			for_each_set_bit(i, changed, num_groups) {
				struct kbase_queue_group *group =
					scheduler->csg_slots[i].resident_group;
@@ -3247,12 +4108,27 @@ static void wait_csg_slots_start(struct kbase_device *kbdev)
				/* The on slot csg is now running */
				clear_bit(i, slot_mask);
				group->run_state = KBASE_CSF_GROUP_RUNNABLE;
+				KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSF_GROUP_RUNNABLE, group,
							 group->run_state);
			}
		} else {
+			const struct gpu_uevent evt = {
+				.type = GPU_UEVENT_TYPE_KMD_ERROR,
+				.info = GPU_UEVENT_INFO_CSG_SLOTS_START
+			};
+			const int csg_nr = ffs(slot_mask[0]) - 1;
+			struct kbase_queue_group *group =
+				scheduler->csg_slots[csg_nr].resident_group;
+			enum dumpfault_error_type error_type = DF_CSG_START_TIMEOUT;
+
+			pixel_gpu_uevent_send(kbdev, &evt);
			dev_warn(kbdev->dev, "[%llu] Timeout (%d ms) waiting for CSG slots to start, slots: 0x%*pb\n",
				 kbase_backend_get_cycle_cnt(kbdev),
				 kbdev->csf.fw_timeout_ms,
				 num_groups, slot_mask);
+			if (kbase_csf_firmware_ping_wait(kbdev, FW_PING_AFTER_ERROR_TIMEOUT_MS))
+				error_type = DF_PING_REQUEST_TIMEOUT;
+			schedule_actions_trigger_df(kbdev, group->kctx, error_type);
 
			if (kbase_prepare_to_reset_gpu(kbdev, RESET_FLAGS_NONE))
				kbase_reset_gpu(kbdev);
@@ -3369,11 +4245,10 @@ static int wait_csg_slots_handshake_ack(struct kbase_device *kbdev,
				slot_mask, dones),
			remaining);
 
-		if (remaining)
+		if (likely(remaining))
			bitmap_andnot(slot_mask, slot_mask, dones, num_groups);
		else {
-			/* Timed-out on the wait */
			return -ETIMEDOUT;
		}
 
@@ -3392,20 +4267,47 @@ static void wait_csg_slots_finish_prio_update(struct kbase_device *kbdev)
 
 	lockdep_assert_held(&kbdev->csf.scheduler.lock);
 
-	if (ret != 0) {
+	if (unlikely(ret != 0)) {
+		const int csg_nr = ffs(slot_mask[0]) - 1;
+		struct kbase_queue_group *group =
			kbdev->csf.scheduler.csg_slots[csg_nr].resident_group;
+		enum dumpfault_error_type error_type = DF_CSG_EP_CFG_TIMEOUT;
		/* The update timeout is not regarded as a serious
		 * issue, no major consequences are expected as a
		 * result, so just warn the case.
		 */
+		const struct gpu_uevent evt = {
+			.type = GPU_UEVENT_TYPE_KMD_ERROR,
+			.info = GPU_UEVENT_INFO_CSG_EP_CFG
+		};
+		pixel_gpu_uevent_send(kbdev, &evt);
		dev_warn(
			kbdev->dev,
			"[%llu] Timeout (%d ms) on CSG_REQ:EP_CFG, skipping the update wait: slot mask=0x%lx",
			kbase_backend_get_cycle_cnt(kbdev),
			kbdev->csf.fw_timeout_ms,
			slot_mask[0]);
+		if (kbase_csf_firmware_ping_wait(kbdev, FW_PING_AFTER_ERROR_TIMEOUT_MS))
+			error_type = DF_PING_REQUEST_TIMEOUT;
+		schedule_actions_trigger_df(kbdev, group->kctx, error_type);
+
+		/* Timeout could indicate firmware is unresponsive so trigger a GPU reset. */
+		if (kbase_prepare_to_reset_gpu(kbdev, RESET_FLAGS_HWC_UNRECOVERABLE_ERROR))
+			kbase_reset_gpu(kbdev);
	}
 }
 
+static void report_csg_termination(struct kbase_queue_group *const group)
+{
+	struct base_gpu_queue_group_error
		err = { .error_type = BASE_GPU_QUEUE_GROUP_ERROR_FATAL,
			.payload = { .fatal_group = {
					     .status = GPU_EXCEPTION_TYPE_SW_FAULT_2,
				     } } };
+
+	kbase_csf_add_group_fatal_error(group, &err);
+}
+
 void kbase_csf_scheduler_evict_ctx_slots(struct kbase_device *kbdev,
		struct kbase_context *kctx, struct list_head *evicted_groups)
 {
@@ -3416,23 +4318,28 @@ void kbase_csf_scheduler_evict_ctx_slots(struct kbase_device *kbdev,
 	DECLARE_BITMAP(slot_mask, MAX_SUPPORTED_CSGS) = {0};
 
 	lockdep_assert_held(&kctx->csf.lock);
-	mutex_lock(&scheduler->lock);
+	rt_mutex_lock(&scheduler->lock);
 
 	/* This code is only called during reset, so we don't wait for the CSG
	 * slots to be stopped
	 */
 	WARN_ON(!kbase_reset_gpu_is_active(kbdev));
 
-	KBASE_KTRACE_ADD(kbdev, EVICT_CTX_SLOTS, kctx, 0u);
+	KBASE_KTRACE_ADD(kbdev, SCHEDULER_EVICT_CTX_SLOTS_START, kctx, 0u);
 	for (slot = 0; slot < num_groups; slot++) {
		group = kbdev->csf.scheduler.csg_slots[slot].resident_group;
		if (group && group->kctx == kctx) {
			bool as_fault;
 
+			dev_dbg(kbdev->dev, "Evicting group [%d] running on slot [%d] due to reset",
				group->handle, group->csg_nr);
+
			term_csg_slot(group);
			as_fault = cleanup_csg_slot(group);
			/* remove the group from the scheduler list */
			sched_evict_group(group, as_fault, false);
+			/* signal Userspace that CSG is being terminated */
+			report_csg_termination(group);
			/* return the evicted group to the caller */
			list_add_tail(&group->link, evicted_groups);
			set_bit(slot, slot_mask);
@@ -3442,7 +4349,17 @@ void kbase_csf_scheduler_evict_ctx_slots(struct kbase_device *kbdev,
 	dev_info(kbdev->dev, "Evicting context %d_%d slots: 0x%*pb\n",
			kctx->tgid, kctx->id, num_groups, slot_mask);
 
-	mutex_unlock(&scheduler->lock);
+	/* Fatal errors may have been the cause of the GPU reset
	 * taking place, in which case we want to make sure that
	 * we wake up the fatal event queue to notify userspace
	 * only once. Otherwise, we may have duplicate event
	 * notifications between the time the first notification
	 * occurs and the time the GPU is reset.
+ */ + kbase_event_wakeup_nosync(kctx); + + rt_mutex_unlock(&scheduler->lock); + KBASE_KTRACE_ADD(kbdev, SCHEDULER_EVICT_CTX_SLOTS_END, kctx, num_groups); } /** @@ -3486,8 +4403,8 @@ static bool scheduler_slot_protm_ack(struct kbase_device *const kbdev, struct kbase_queue *queue = group->bound_queues[i]; clear_bit(i, group->protm_pending_bitmap); - KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, PROTM_PENDING_CLEAR, group, - queue, group->protm_pending_bitmap[0]); + KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, CSI_PROTM_PEND_CLEAR, group, queue, + group->protm_pending_bitmap[0]); if (!WARN_ON(!queue) && queue->enabled) { struct kbase_csf_cmd_stream_info *stream = @@ -3523,6 +4440,39 @@ static bool scheduler_slot_protm_ack(struct kbase_device *const kbdev, } /** + * protm_enter_set_next_pending_seq - Update the scheduler's field of + * tick_protm_pending_seq to that from the next available on-slot protm + * pending CSG. + * + * @kbdev: Pointer to the GPU device. + * + * If applicable, the function updates the scheduler's tick_protm_pending_seq + * field from the next available on-slot protm pending CSG. If not, the field + * is set to KBASEP_TICK_PROTM_PEND_SCAN_SEQ_NR_INVALID. + */ +static void protm_enter_set_next_pending_seq(struct kbase_device *const kbdev) +{ + struct kbase_csf_scheduler *scheduler = &kbdev->csf.scheduler; + u32 num_groups = kbdev->csf.global_iface.group_num; + u32 num_csis = kbdev->csf.global_iface.groups[0].stream_num; + u32 i; + + kbase_csf_scheduler_spin_lock_assert_held(kbdev); + + /* Reset the tick's pending protm seq number to invalid initially */ + scheduler->tick_protm_pending_seq = KBASEP_TICK_PROTM_PEND_SCAN_SEQ_NR_INVALID; + for_each_set_bit(i, scheduler->csg_inuse_bitmap, num_groups) { + struct kbase_queue_group *group = scheduler->csg_slots[i].resident_group; + + /* Set to the next pending protm group's scan_seq_number */ + if ((group != scheduler->active_protm_grp) && + (!bitmap_empty(group->protm_pending_bitmap, num_csis)) && + (group->scan_seq_num < scheduler->tick_protm_pending_seq)) + scheduler->tick_protm_pending_seq = group->scan_seq_num; + } +} + +/** * scheduler_group_check_protm_enter - Request the given group to be evaluated * for triggering the protected mode. * @@ -3540,11 +4490,22 @@ static void scheduler_group_check_protm_enter(struct kbase_device *const kbdev, struct kbase_queue_group *const input_grp) { struct kbase_csf_scheduler *scheduler = &kbdev->csf.scheduler; + struct kbase_protected_suspend_buffer *sbuf = &input_grp->protected_suspend_buf; unsigned long flags; bool protm_in_use; lockdep_assert_held(&scheduler->lock); + /* Return early if the physical pages have not been allocated yet */ + if (unlikely(!sbuf->pma)) + return; + + /* This lock is taken to prevent the issuing of MMU command during the + * transition to protected mode. This helps avoid the scenario where the + * entry to protected mode happens with a memory region being locked and + * the same region is then accessed by the GPU in protected mode. 
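The new pmode_sync_sem is taken for writing here; the matching read side is the MMU command path, which is not part of this hunk. The pairing below is a sketch of the intended usage inferred from the comment above — the reader-side call sites are an assumption, not shown in this diff:

	/* MMU command issuers (assumed): may run concurrently with each
	 * other, but never across the protected-mode transition.
	 */
	down_read(&kbdev->csf.pmode_sync_sem);
	/* ... issue MMU command ... */
	up_read(&kbdev->csf.pmode_sync_sem);

	/* Protected-mode entry (this hunk): exclusive, so no MMU command
	 * can be in flight against a locked region when the GPU switches.
	 */
	down_write(&kbdev->csf.pmode_sync_sem);
	/* ... PROTM_ENTER sequence ... */
	up_write(&kbdev->csf.pmode_sync_sem);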
+ */ + down_write(&kbdev->csf.pmode_sync_sem); spin_lock_irqsave(&scheduler->interrupt_lock, flags); /* Check if the previous transition to enter & exit the protected @@ -3552,8 +4513,7 @@ static void scheduler_group_check_protm_enter(struct kbase_device *const kbdev, */ protm_in_use = kbase_csf_scheduler_protected_mode_in_use(kbdev) || kbdev->protected_mode; - KBASE_KTRACE_ADD_CSF_GRP(kbdev, SCHEDULER_CHECK_PROTM_ENTER, input_grp, - protm_in_use); + KBASE_KTRACE_ADD_CSF_GRP(kbdev, SCHEDULER_PROTM_ENTER_CHECK, input_grp, protm_in_use); /* Firmware samples the PROTM_PEND ACK bit for CSs when * Host sends PROTM_ENTER global request. So if PROTM_PEND ACK bit @@ -3584,6 +4544,8 @@ static void scheduler_group_check_protm_enter(struct kbase_device *const kbdev, CSG_SLOT_RUNNING) { if (kctx_as_enabled(input_grp->kctx) && scheduler_slot_protm_ack(kbdev, input_grp, slot)) { + int err; + /* Option of acknowledging to multiple * CSGs from the same kctx is dropped, * after consulting with the @@ -3593,22 +4555,75 @@ static void scheduler_group_check_protm_enter(struct kbase_device *const kbdev, /* Switch to protected mode */ scheduler->active_protm_grp = input_grp; - KBASE_KTRACE_ADD_CSF_GRP(kbdev, SCHEDULER_ENTER_PROTM, - input_grp, 0u); - /* Reset the tick's pending protm seq number */ - scheduler->tick_protm_pending_seq = - KBASEP_TICK_PROTM_PEND_SCAN_SEQ_NR_INVALID; + KBASE_KTRACE_ADD_CSF_GRP(kbdev, SCHEDULER_PROTM_ENTER, input_grp, + 0u); + +#if IS_ENABLED(CONFIG_MALI_CORESIGHT) + spin_unlock_irqrestore(&scheduler->interrupt_lock, flags); + + /* Coresight must be disabled before entering protected mode. */ + kbase_debug_coresight_csf_disable_pmode_enter(kbdev); + + spin_lock_irqsave(&scheduler->interrupt_lock, flags); +#endif /* IS_ENABLED(CONFIG_MALI_CORESIGHT) */ kbase_csf_enter_protected_mode(kbdev); + /* Set the pending protm seq number to the next one */ + protm_enter_set_next_pending_seq(kbdev); + spin_unlock_irqrestore(&scheduler->interrupt_lock, flags); - kbase_csf_wait_protected_mode_enter(kbdev); + err = kbase_csf_wait_protected_mode_enter(kbdev); + up_write(&kbdev->csf.pmode_sync_sem); + + if (err) + schedule_actions_trigger_df(kbdev, input_grp->kctx, + DF_PROTECTED_MODE_ENTRY_FAILURE); + + scheduler->protm_enter_time = ktime_get_raw(); + return; } } } spin_unlock_irqrestore(&scheduler->interrupt_lock, flags); + up_write(&kbdev->csf.pmode_sync_sem); +} + +/** + * scheduler_check_pmode_progress - Check if protected mode execution is progressing + * + * @kbdev: Pointer to the GPU device. + * + * This function is called when the GPU is in protected mode. + * + * It will check if the time spent in protected mode is less + * than CSF_SCHED_PROTM_PROGRESS_TIMEOUT. If not, a PROTM_EXIT + * request is sent to the FW. 
+ */ +static void scheduler_check_pmode_progress(struct kbase_device *kbdev) +{ + u64 protm_spent_time_ms; + u64 protm_progress_timeout = + kbase_get_timeout_ms(kbdev, CSF_SCHED_PROTM_PROGRESS_TIMEOUT); + s64 diff_ms_signed = + ktime_ms_delta(ktime_get_raw(), kbdev->csf.scheduler.protm_enter_time); + + if (diff_ms_signed < 0) + return; + + lockdep_assert_held(&kbdev->csf.scheduler.lock); + + protm_spent_time_ms = (u64)diff_ms_signed; + if (protm_spent_time_ms < protm_progress_timeout) + return; + + dev_dbg(kbdev->dev, "Protected mode progress timeout: %llu >= %llu", + protm_spent_time_ms, protm_progress_timeout); + + /* Prompt the FW to exit protected mode */ + scheduler_force_protm_exit(kbdev); } static void scheduler_apply(struct kbase_device *kbdev) @@ -3616,8 +4631,6 @@ static void scheduler_apply(struct kbase_device *kbdev) struct kbase_csf_scheduler *scheduler = &kbdev->csf.scheduler; const u32 total_csg_slots = kbdev->csf.global_iface.group_num; const u32 available_csg_slots = scheduler->num_csg_slots_for_tick; - u32 suspend_cnt = 0; - u32 remain_cnt = 0; u32 resident_cnt = 0; struct kbase_queue_group *group; u32 i; @@ -3630,11 +4643,8 @@ static void scheduler_apply(struct kbase_device *kbdev) group = scheduler->csg_slots[i].resident_group; if (group) { resident_cnt++; - if (group->prepared_seq_num >= available_csg_slots) { + if (group->prepared_seq_num >= available_csg_slots) suspend_queue_group(group); - suspend_cnt++; - } else - remain_cnt++; } } @@ -3664,8 +4674,7 @@ static void scheduler_apply(struct kbase_device *kbdev) if (!kctx_as_enabled(group->kctx) || group->faulted) { /* Drop the head group and continue */ - update_offslot_non_idle_cnt_for_faulty_grp( - group); + update_offslot_non_idle_cnt(group); remove_scheduled_group(kbdev, group); continue; } @@ -3688,8 +4697,9 @@ static void scheduler_apply(struct kbase_device *kbdev) program_suspending_csg_slots(kbdev); } -static void scheduler_ctx_scan_groups(struct kbase_device *kbdev, - struct kbase_context *kctx, int priority) +static void scheduler_ctx_scan_groups(struct kbase_device *kbdev, struct kbase_context *kctx, + int priority, struct list_head *privileged_groups, + struct list_head *active_groups) { struct kbase_csf_scheduler *scheduler = &kbdev->csf.scheduler; struct kbase_queue_group *group; @@ -3703,8 +4713,9 @@ static void scheduler_ctx_scan_groups(struct kbase_device *kbdev, if (!kctx_as_enabled(kctx)) return; - list_for_each_entry(group, &kctx->csf.sched.runnable_groups[priority], - link) { + list_for_each_entry(group, &kctx->csf.sched.runnable_groups[priority], link) { + bool protm_req; + if (WARN_ON(!list_empty(&group->link_to_schedule))) /* This would be a bug */ list_del_init(&group->link_to_schedule); @@ -3715,33 +4726,30 @@ static void scheduler_ctx_scan_groups(struct kbase_device *kbdev, /* Set the scanout sequence number, starting from 0 */ group->scan_seq_num = scheduler->csg_scan_count_for_tick++; + protm_req = !bitmap_empty(group->protm_pending_bitmap, + kbdev->csf.global_iface.groups[0].stream_num); + if (scheduler->tick_protm_pending_seq == - KBASEP_TICK_PROTM_PEND_SCAN_SEQ_NR_INVALID) { - if (!bitmap_empty(group->protm_pending_bitmap, - kbdev->csf.global_iface.groups[0].stream_num)) - scheduler->tick_protm_pending_seq = - group->scan_seq_num; + KBASEP_TICK_PROTM_PEND_SCAN_SEQ_NR_INVALID) { + if (protm_req) + scheduler->tick_protm_pending_seq = group->scan_seq_num; } - if (queue_group_idle_locked(group)) { - if (on_slot_group_idle_locked(group)) + if (protm_req && on_slot_group_idle_locked(group)) + 
update_idle_protm_group_state_to_runnable(group); + else if (queue_group_idle_locked(group)) { + if (can_schedule_idle_group(group)) list_add_tail(&group->link_to_schedule, &scheduler->idle_groups_to_schedule); continue; } - if (!scheduler->ngrp_to_schedule) { - /* keep the top csg's origin */ - scheduler->top_ctx = kctx; - scheduler->top_grp = group; + if (protm_req && (group->priority == KBASE_QUEUE_GROUP_PRIORITY_REALTIME)) { + list_add_tail(&group->link_to_schedule, privileged_groups); + continue; } - list_add_tail(&group->link_to_schedule, - &scheduler->groups_to_schedule); - group->prepared_seq_num = scheduler->ngrp_to_schedule++; - - kctx->csf.sched.ngrp_to_schedule++; - count_active_address_space(kbdev, kctx); + list_add_tail(&group->link_to_schedule, active_groups); } } @@ -3810,10 +4818,9 @@ static void scheduler_rotate_groups(struct kbase_device *kbdev) new_head_grp = (!list_empty(list)) ? list_first_entry(list, struct kbase_queue_group, link) : NULL; - KBASE_KTRACE_ADD_CSF_GRP(kbdev, GROUP_ROTATE_RUNNABLE, - top_grp, top_ctx->csf.sched.num_runnable_grps); - KBASE_KTRACE_ADD_CSF_GRP(kbdev, GROUP_HEAD_RUNNABLE, - new_head_grp, 0u); + KBASE_KTRACE_ADD_CSF_GRP(kbdev, GROUP_RUNNABLE_ROTATE, top_grp, + top_ctx->csf.sched.num_runnable_grps); + KBASE_KTRACE_ADD_CSF_GRP(kbdev, GROUP_RUNNABLE_HEAD, new_head_grp, 0u); dev_dbg(kbdev->dev, "groups rotated for a context, num_runnable_groups: %u\n", scheduler->top_ctx->csf.sched.num_runnable_grps); @@ -3844,13 +4851,12 @@ static void scheduler_rotate_ctxs(struct kbase_device *kbdev) struct kbase_context *new_head_kctx; list_move_tail(&pos->csf.link, list); - KBASE_KTRACE_ADD(kbdev, SCHEDULER_ROTATE_RUNNABLE, pos, - 0u); + KBASE_KTRACE_ADD(kbdev, SCHEDULER_RUNNABLE_KCTX_ROTATE, pos, 0u); new_head_kctx = (!list_empty(list)) ? list_first_entry(list, struct kbase_context, csf.link) : NULL; - KBASE_KTRACE_ADD(kbdev, SCHEDULER_HEAD_RUNNABLE, - new_head_kctx, 0u); + KBASE_KTRACE_ADD(kbdev, SCHEDULER_RUNNABLE_KCTX_HEAD, new_head_kctx, + 0u); dev_dbg(kbdev->dev, "contexts rotated\n"); } } @@ -3865,12 +4871,17 @@ static void scheduler_rotate_ctxs(struct kbase_device *kbdev) * @kbdev: Pointer to the GPU device. * @csg_bitmap: Bitmap of the CSG slots for which * the status update request completed successfully. - * @failed_csg_bitmap: Bitmap of the CSG slots for which + * @failed_csg_bitmap: Bitmap of the idle CSG slots for which * the status update request timedout. * * This function sends a CSG status update request for all the CSG slots - * present in the bitmap scheduler->csg_slots_idle_mask and wait for the - * request to complete. + * present in the bitmap scheduler->csg_slots_idle_mask. Additionally, if + * the group's 'reevaluate_idle_status' field is set, the nominally non-idle + * slots are also included in the status update for a confirmation of their + * status. The function wait for the status update request to complete and + * returns the update completed slots bitmap and any timed out idle-flagged + * slots bitmap. + * * The bits set in the scheduler->csg_slots_idle_mask bitmap are cleared by * this function. 
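The status update relies on the CSG request/acknowledge toggle handshake used elsewhere in the interface: the host flips the STATUS_UPDATE bit of CSG_REQ so it differs from CSG_ACK, rings the doorbell, and the request completes once firmware toggles the same bit in CSG_ACK. A condensed single-slot sketch using the accessors visible in this diff (slot is a hypothetical slot index; locking and timeout handling omitted):

	struct kbase_csf_cmd_stream_group_info *ginfo =
		&kbdev->csf.global_iface.groups[slot];
	u32 csg_req = kbase_csf_firmware_csg_output(ginfo, CSG_ACK);

	/* Toggle the STATUS_UPDATE bit so that REQ != ACK, which firmware
	 * treats as a new request for this slot.
	 */
	csg_req ^= CSG_REQ_STATUS_UPDATE_MASK;
	kbase_csf_firmware_csg_input_mask(ginfo, CSG_REQ, csg_req,
					  CSG_REQ_STATUS_UPDATE_MASK);
	kbase_csf_ring_csg_slots_doorbell(kbdev, BIT(slot));

	/* Completion: firmware toggles the bit in CSG_ACK so REQ == ACK
	 * again, which is the condition wait_csg_slots_handshake_ack polls.
	 */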
*/ @@ -3882,60 +4893,119 @@ static void scheduler_update_idle_slots_status(struct kbase_device *kbdev, struct kbase_csf_global_iface *const global_iface = &kbdev->csf.global_iface; unsigned long flags, i; + u32 active_chk = 0; lockdep_assert_held(&scheduler->lock); spin_lock_irqsave(&scheduler->interrupt_lock, flags); - for_each_set_bit(i, scheduler->csg_slots_idle_mask, num_groups) { + + for_each_set_bit(i, scheduler->csg_inuse_bitmap, num_groups) { struct kbase_csf_csg_slot *csg_slot = &scheduler->csg_slots[i]; struct kbase_queue_group *group = csg_slot->resident_group; struct kbase_csf_cmd_stream_group_info *const ginfo = &global_iface->groups[i]; u32 csg_req; + bool idle_flag; - clear_bit(i, scheduler->csg_slots_idle_mask); - KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSG_SLOT_IDLE_CLEAR, group, - scheduler->csg_slots_idle_mask[0]); - if (WARN_ON(!group)) + if (WARN_ON(!group)) { + clear_bit(i, scheduler->csg_inuse_bitmap); + clear_bit(i, scheduler->csg_slots_idle_mask); continue; + } - KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSG_SLOT_STATUS_UPDATE, group, - i); + idle_flag = test_bit(i, scheduler->csg_slots_idle_mask); + if (idle_flag || group->reevaluate_idle_status) { + if (idle_flag) { +#ifdef CONFIG_MALI_DEBUG + if (!bitmap_empty(group->protm_pending_bitmap, + ginfo->stream_num)) { + dev_warn(kbdev->dev, + "Idle bit set for group %d of ctx %d_%d on slot %d with pending protm execution", + group->handle, group->kctx->tgid, + group->kctx->id, (int)i); + } +#endif + clear_bit(i, scheduler->csg_slots_idle_mask); + KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSG_SLOT_IDLE_CLEAR, group, + scheduler->csg_slots_idle_mask[0]); + } else { + /* Updates include slots for which reevaluation is needed. + * Here one tracks the extra included slots in active_chk. + * For protm pending slots, their status of activeness are + * assured so no need to request an update. + */ + active_chk |= BIT(i); + group->reevaluate_idle_status = false; + } - csg_req = kbase_csf_firmware_csg_output(ginfo, CSG_ACK); - csg_req ^= CSG_REQ_STATUS_UPDATE_MASK; - kbase_csf_firmware_csg_input_mask(ginfo, CSG_REQ, csg_req, - CSG_REQ_STATUS_UPDATE_MASK); + KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSG_UPDATE_IDLE_SLOT_REQ, group, i); + csg_req = kbase_csf_firmware_csg_output(ginfo, CSG_ACK); + csg_req ^= CSG_REQ_STATUS_UPDATE_MASK; + kbase_csf_firmware_csg_input_mask(ginfo, CSG_REQ, csg_req, + CSG_REQ_STATUS_UPDATE_MASK); - set_bit(i, csg_bitmap); + /* Track the slot update requests in csg_bitmap. + * Note, if the scheduler requested extended update, the resulting + * csg_bitmap would be the idle_flags + active_chk. Otherwise it's + * identical to the idle_flags. 
+ */ + set_bit(i, csg_bitmap); + } else { + group->run_state = KBASE_CSF_GROUP_RUNNABLE; + KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSF_GROUP_RUNNABLE, group, + group->run_state); + } } - spin_unlock_irqrestore(&scheduler->interrupt_lock, flags); /* The groups are aggregated into a single kernel doorbell request */ if (!bitmap_empty(csg_bitmap, num_groups)) { long wt = - kbase_csf_timeout_in_jiffies(kbdev->csf.fw_timeout_ms); + kbase_csf_timeout_in_jiffies(CSG_STATUS_UPDATE_REQ_TIMEOUT_MS); u32 db_slots = (u32)csg_bitmap[0]; kbase_csf_ring_csg_slots_doorbell(kbdev, db_slots); + spin_unlock_irqrestore(&scheduler->interrupt_lock, flags); if (wait_csg_slots_handshake_ack(kbdev, CSG_REQ_STATUS_UPDATE_MASK, csg_bitmap, wt)) { + const struct gpu_uevent evt = { + .type = GPU_UEVENT_TYPE_KMD_ERROR, + .info = GPU_UEVENT_INFO_CSG_REQ_STATUS_UPDATE + }; + const int csg_nr = ffs(csg_bitmap[0]) - 1; + struct kbase_queue_group *group = + scheduler->csg_slots[csg_nr].resident_group; + pixel_gpu_uevent_send(kbdev, &evt); + dev_warn( kbdev->dev, "[%llu] Timeout (%d ms) on CSG_REQ:STATUS_UPDATE, treat groups as not idle: slot mask=0x%lx", kbase_backend_get_cycle_cnt(kbdev), - kbdev->csf.fw_timeout_ms, + CSG_STATUS_UPDATE_REQ_TIMEOUT_MS, csg_bitmap[0]); + schedule_actions_trigger_df(kbdev, group->kctx, + DF_CSG_STATUS_UPDATE_TIMEOUT); /* Store the bitmap of timed out slots */ bitmap_copy(failed_csg_bitmap, csg_bitmap, num_groups); csg_bitmap[0] = ~csg_bitmap[0] & db_slots; + + /* Mask off any failed bit position contributed from active ones, as the + * intention is to retain the failed bit pattern contains only those from + * idle flags reporting back to the caller. This way, any failed to update + * original idle flag would be kept as 'idle' (an informed guess, as the + * update did not come to a conclusive result). So will be the failed + * active ones be treated as still 'non-idle'. This is for a graceful + * handling to the unexpected timeout condition. 
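A small worked example of the masking described above, with purely hypothetical slot numbers: suppose slots 1 and 4 were included because their idle flags were set, slot 6 was included only for reevaluation, and none of them acknowledged before the timeout:

	/* Illustrative values only. */
	csg_bitmap[0] = BIT(1) | BIT(4) | BIT(6);   /* all timed out          */
	active_chk    = BIT(6);                     /* reevaluation-only slot */

	/* failed_csg_bitmap keeps only the idle-flagged slots. */
	failed_csg_bitmap[0] = csg_bitmap[0] & ~active_chk;  /* BIT(1) | BIT(4) */

	/* Slots 1 and 4 retain their 'idle' guess; slot 6 is simply treated
	 * as still non-idle, since no conclusive status arrived.
	 */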
+ */ + failed_csg_bitmap[0] &= ~active_chk; + } else { - KBASE_KTRACE_ADD(kbdev, SLOTS_STATUS_UPDATE_ACK, NULL, - db_slots); + KBASE_KTRACE_ADD(kbdev, SCHEDULER_UPDATE_IDLE_SLOTS_ACK, NULL, db_slots); csg_bitmap[0] = db_slots; } + } else { + spin_unlock_irqrestore(&scheduler->interrupt_lock, flags); } } @@ -3990,34 +5060,35 @@ static void scheduler_handle_idle_slots(struct kbase_device *kbdev) if (group_on_slot_is_idle(kbdev, i)) { group->run_state = KBASE_CSF_GROUP_IDLE; + KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSF_GROUP_IDLE, group, group->run_state); set_bit(i, scheduler->csg_slots_idle_mask); KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSG_SLOT_IDLE_SET, group, scheduler->csg_slots_idle_mask[0]); - } else + } else { group->run_state = KBASE_CSF_GROUP_RUNNABLE; + KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSF_GROUP_RUNNABLE, group, + group->run_state); + } } bitmap_or(scheduler->csg_slots_idle_mask, scheduler->csg_slots_idle_mask, failed_csg_bitmap, num_groups); - KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSG_SLOT_IDLE_SET, NULL, + KBASE_KTRACE_ADD_CSF_GRP(kbdev, SCHEDULER_HANDLE_IDLE_SLOTS, NULL, scheduler->csg_slots_idle_mask[0]); spin_unlock_irqrestore(&scheduler->interrupt_lock, flags); } -static void scheduler_scan_idle_groups(struct kbase_device *kbdev) +static void scheduler_scan_group_list(struct kbase_device *kbdev, struct list_head *groups) { struct kbase_csf_scheduler *scheduler = &kbdev->csf.scheduler; struct kbase_queue_group *group, *n; - list_for_each_entry_safe(group, n, &scheduler->idle_groups_to_schedule, - link_to_schedule) { - - WARN_ON(!on_slot_group_idle_locked(group)); - + list_for_each_entry_safe(group, n, groups, link_to_schedule) { if (!scheduler->ngrp_to_schedule) { /* keep the top csg's origin */ scheduler->top_ctx = group->kctx; + /* keep the top csg''s origin */ scheduler->top_grp = group; } @@ -4087,14 +5158,27 @@ static int suspend_active_groups_on_powerdown(struct kbase_device *kbdev, int ret = suspend_active_queue_groups(kbdev, slot_mask); - if (ret) { + if (unlikely(ret)) { + const int csg_nr = ffs(slot_mask[0]) - 1; + struct kbase_queue_group *group = + scheduler->csg_slots[csg_nr].resident_group; + enum dumpfault_error_type error_type = DF_CSG_SUSPEND_TIMEOUT; + /* The suspend of CSGs failed, * trigger the GPU reset to be in a deterministic state. 
*/ + const struct gpu_uevent evt = { + .type = GPU_UEVENT_TYPE_KMD_ERROR, + .info = GPU_UEVENT_INFO_CSG_SLOTS_SUSPEND + }; + pixel_gpu_uevent_send(kbdev, &evt); dev_warn(kbdev->dev, "[%llu] Timeout (%d ms) waiting for CSG slots to suspend on power down, slot_mask: 0x%*pb\n", kbase_backend_get_cycle_cnt(kbdev), kbdev->csf.fw_timeout_ms, kbdev->csf.global_iface.group_num, slot_mask); + if (kbase_csf_firmware_ping_wait(kbdev, FW_PING_AFTER_ERROR_TIMEOUT_MS)) + error_type = DF_PING_REQUEST_TIMEOUT; + schedule_actions_trigger_df(kbdev, group->kctx, error_type); if (kbase_prepare_to_reset_gpu(kbdev, RESET_FLAGS_NONE)) kbase_reset_gpu(kbdev); @@ -4111,6 +5195,7 @@ static int suspend_active_groups_on_powerdown(struct kbase_device *kbdev, return 0; } +#ifndef CONFIG_MALI_HOST_CONTROLS_SC_RAILS /** * all_on_slot_groups_remained_idle - Live check for all groups' idleness * @@ -4147,10 +5232,15 @@ static bool all_on_slot_groups_remained_idle(struct kbase_device *kbdev) u64 const *output_addr; u64 cur_extract_ofs; - if (!queue) + if (!queue || !queue->user_io_addr) continue; - output_addr = (u64 const *)(queue->user_io_addr + PAGE_SIZE); + output_addr = (u64 const *)(queue->user_io_addr + PAGE_SIZE / sizeof(u64)); + /* + * These 64-bit reads and writes will be atomic on a 64-bit kernel + * but may not be atomic on 32-bit kernels. Support for 32-bit + * kernels is limited to build-only. + */ cur_extract_ofs = output_addr[CS_EXTRACT_LO / sizeof(u64)]; if (cur_extract_ofs != queue->extract_ofs) { /* More work has been executed since the idle @@ -4163,6 +5253,7 @@ static bool all_on_slot_groups_remained_idle(struct kbase_device *kbdev) return true; } +#endif static bool scheduler_idle_suspendable(struct kbase_device *kbdev) { @@ -4178,6 +5269,21 @@ static bool scheduler_idle_suspendable(struct kbase_device *kbdev) spin_lock_irqsave(&kbdev->hwaccess_lock, flags); spin_lock(&scheduler->interrupt_lock); + + if (scheduler->fast_gpu_idle_handling) { + scheduler->fast_gpu_idle_handling = false; + + if (scheduler->total_runnable_grps) { + suspend = !atomic_read(&scheduler->non_idle_offslot_grps) && + kbase_pm_idle_groups_sched_suspendable(kbdev); + } else + suspend = kbase_pm_no_runnables_sched_suspendable(kbdev); + spin_unlock(&scheduler->interrupt_lock); + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + + return suspend; + } + if (scheduler->total_runnable_grps) { /* Check both on-slots and off-slots groups idle status */ @@ -4187,16 +5293,18 @@ static bool scheduler_idle_suspendable(struct kbase_device *kbdev) } else suspend = kbase_pm_no_runnables_sched_suspendable(kbdev); +#ifndef CONFIG_MALI_HOST_CONTROLS_SC_RAILS /* Confirm that all groups are actually idle before proceeding with * suspension as groups might potentially become active again without * informing the scheduler in case userspace rings a doorbell directly. 
*/ if (suspend && (unlikely(atomic_read(&scheduler->gpu_no_longer_idle)) || unlikely(!all_on_slot_groups_remained_idle(kbdev)))) { - dev_info(kbdev->dev, + dev_dbg(kbdev->dev, "GPU suspension skipped due to active CSGs"); suspend = false; } +#endif spin_unlock(&scheduler->interrupt_lock); spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); @@ -4224,9 +5332,13 @@ static void scheduler_sleep_on_idle(struct kbase_device *kbdev) dev_dbg(kbdev->dev, "Scheduler to be put to sleep on GPU becoming idle"); - cancel_tick_timer(kbdev); + cancel_tick_work(scheduler); scheduler_pm_idle_before_sleep(kbdev); scheduler->state = SCHED_SLEEPING; + KBASE_KTRACE_ADD(kbdev, SCHED_SLEEPING, NULL, scheduler->state); +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) + emit_gpu_metrics_to_frontend(kbdev); +#endif /* CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD */ } #endif @@ -4244,6 +5356,7 @@ static void scheduler_sleep_on_idle(struct kbase_device *kbdev) */ static bool scheduler_suspend_on_idle(struct kbase_device *kbdev) { + struct kbase_csf_scheduler *const scheduler = &kbdev->csf.scheduler; int ret = suspend_active_groups_on_powerdown(kbdev, false); if (ret) { @@ -4251,62 +5364,352 @@ static bool scheduler_suspend_on_idle(struct kbase_device *kbdev) atomic_read( &kbdev->csf.scheduler.non_idle_offslot_grps)); /* Bring forward the next tick */ - kbase_csf_scheduler_advance_tick(kbdev); + kbase_csf_scheduler_invoke_tick(kbdev); return false; } +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + turn_off_sc_power_rails(kbdev); + ack_gpu_idle_event(kbdev); +#endif + dev_dbg(kbdev->dev, "Scheduler to be suspended on GPU becoming idle"); scheduler_suspend(kbdev); - cancel_tick_timer(kbdev); + cancel_tick_work(scheduler); return true; } static void gpu_idle_worker(struct work_struct *work) { +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + struct kbase_device *kbdev = container_of( + work, struct kbase_device, csf.scheduler.gpu_idle_work.work); +#else struct kbase_device *kbdev = container_of( work, struct kbase_device, csf.scheduler.gpu_idle_work); +#endif struct kbase_csf_scheduler *const scheduler = &kbdev->csf.scheduler; bool scheduler_is_idle_suspendable = false; bool all_groups_suspended = false; - KBASE_KTRACE_ADD(kbdev, IDLE_WORKER_BEGIN, NULL, 0u); + KBASE_KTRACE_ADD(kbdev, SCHEDULER_GPU_IDLE_WORKER_START, NULL, 0u); #define __ENCODE_KTRACE_INFO(reset, idle, all_suspend) \ (((u32)reset) | (((u32)idle) << 4) | (((u32)all_suspend) << 8)) if (kbase_reset_gpu_try_prevent(kbdev)) { dev_warn(kbdev->dev, "Quit idle for failing to prevent gpu reset.\n"); - KBASE_KTRACE_ADD(kbdev, IDLE_WORKER_END, NULL, + KBASE_KTRACE_ADD(kbdev, SCHEDULER_GPU_IDLE_WORKER_END, NULL, __ENCODE_KTRACE_INFO(true, false, false)); return; } - mutex_lock(&scheduler->lock); + kbase_debug_csf_fault_wait_completion(kbdev); + rt_mutex_lock(&scheduler->lock); + +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + if (!scheduler->gpu_idle_work_pending) + goto unlock; + + scheduler->gpu_idle_work_pending = false; +#endif + +#if IS_ENABLED(CONFIG_DEBUG_FS) + if (unlikely(scheduler->state == SCHED_BUSY)) { + rt_mutex_unlock(&scheduler->lock); + kbase_reset_gpu_allow(kbdev); + return; + } +#endif scheduler_is_idle_suspendable = scheduler_idle_suspendable(kbdev); if (scheduler_is_idle_suspendable) { - KBASE_KTRACE_ADD(kbdev, GPU_IDLE_HANDLING_START, NULL, + KBASE_KTRACE_ADD(kbdev, SCHEDULER_GPU_IDLE_WORKER_HANDLING_START, NULL, kbase_csf_ktrace_gpu_cycle_cnt(kbdev)); #ifdef KBASE_PM_RUNTIME if (kbase_pm_gpu_sleep_allowed(kbdev) && - scheduler->total_runnable_grps) + 
kbase_csf_scheduler_get_nr_active_csgs(kbdev)) scheduler_sleep_on_idle(kbdev); else #endif all_groups_suspended = scheduler_suspend_on_idle(kbdev); + + KBASE_KTRACE_ADD(kbdev, SCHEDULER_GPU_IDLE_WORKER_HANDLING_END, NULL, 0u); } - mutex_unlock(&scheduler->lock); +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS +unlock: +#endif + rt_mutex_unlock(&scheduler->lock); kbase_reset_gpu_allow(kbdev); - KBASE_KTRACE_ADD(kbdev, IDLE_WORKER_END, NULL, - __ENCODE_KTRACE_INFO(false, - scheduler_is_idle_suspendable, + KBASE_KTRACE_ADD(kbdev, SCHEDULER_GPU_IDLE_WORKER_END, NULL, + __ENCODE_KTRACE_INFO(false, scheduler_is_idle_suspendable, all_groups_suspended)); #undef __ENCODE_KTRACE_INFO } +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS +/** + * wait_csg_db_ack - Wait for the previously sent CSI kernel DBs for a CSG to + * get acknowledged. + * + * @kbdev: Pointer to the device. + * @csg_nr: The CSG number. + * + * This function is called to wait for the previously sent CSI kernel DBs + * for a CSG to get acknowledged before acknowledging the GPU idle event. + * This is to ensure when @sc_rails_off_worker is doing the GPU idleness + * reevaluation the User submissions remain disabled. + * For firmware to re-enable User submission, two conditions are required to + * be met. + * 1. GLB_IDLE_EVENT acknowledgement + * 2. CSI kernel DB ring + * + * If GLB_IDLE_EVENT is acknowledged and FW notices the previously rung CS kernel + * DB, then it would re-enable the User submission and @sc_rails_off_worker might + * end up turning off the SC rails. + */ +static void wait_csg_db_ack(struct kbase_device *kbdev, int csg_nr) +{ +#define WAIT_TIMEOUT 10 /* 1ms timeout */ +#define DELAY_TIME_IN_US 100 + struct kbase_csf_cmd_stream_group_info *const ginfo = + &kbdev->csf.global_iface.groups[csg_nr]; + const int max_iterations = WAIT_TIMEOUT; + int loop; + + for (loop = 0; loop < max_iterations; loop++) { + if (kbase_csf_firmware_csg_input_read(ginfo, CSG_DB_REQ) == + kbase_csf_firmware_csg_output(ginfo, CSG_DB_ACK)) + break; + + udelay(DELAY_TIME_IN_US); + } + + if (loop == max_iterations) { + dev_err(kbdev->dev, + "Timeout for csg %d CSG_DB_REQ %x != CSG_DB_ACK %x", + csg_nr, + kbase_csf_firmware_csg_input_read(ginfo, CSG_DB_REQ), + kbase_csf_firmware_csg_output(ginfo, CSG_DB_ACK)); + } +} + +/** + * recheck_gpu_idleness - Recheck the idleness of the GPU before turning off + * the SC power rails. + * + * @kbdev: Pointer to the device. + * + * This function is called on the GPU idle notification to recheck the idleness + * of GPU before turning off the SC power rails. The reevaluation of idleness + * is done by sending CSG status update requests. An additional check is done + * for the CSGs that are reported as idle that whether the associated queues + * are empty or blocked. + * + * Return: true if the GPU was reevaluated as idle. 
+ */ +static bool recheck_gpu_idleness(struct kbase_device *kbdev) +{ + struct kbase_csf_scheduler *scheduler = &kbdev->csf.scheduler; + DECLARE_BITMAP(csg_bitmap, MAX_SUPPORTED_CSGS) = { 0 }; + long wt = kbase_csf_timeout_in_jiffies(kbdev->csf.fw_timeout_ms); + u32 num_groups = kbdev->csf.global_iface.group_num; + unsigned long flags, i; + + lockdep_assert_held(&scheduler->lock); + + spin_lock_irqsave(&scheduler->interrupt_lock, flags); + for_each_set_bit(i, scheduler->csg_slots_idle_mask, num_groups) { + struct kbase_csf_cmd_stream_group_info *const ginfo = + &kbdev->csf.global_iface.groups[i]; + u32 csg_req = kbase_csf_firmware_csg_output(ginfo, CSG_ACK); + + csg_req ^= CSG_REQ_STATUS_UPDATE_MASK; + kbase_csf_firmware_csg_input_mask(ginfo, CSG_REQ, csg_req, + CSG_REQ_STATUS_UPDATE_MASK); + set_bit(i, csg_bitmap); + wait_csg_db_ack(kbdev, i); + } + kbase_csf_ring_csg_slots_doorbell(kbdev, csg_bitmap[0]); + spin_unlock_irqrestore(&scheduler->interrupt_lock, flags); + + if (wait_csg_slots_handshake_ack(kbdev, + CSG_REQ_STATUS_UPDATE_MASK, csg_bitmap, wt)) { + dev_warn( + kbdev->dev, + "[%llu] Timeout (%d ms) on STATUS_UPDATE, treat GPU as not idle: slot mask=0x%lx", + kbase_backend_get_cycle_cnt(kbdev), + kbdev->csf.fw_timeout_ms, + csg_bitmap[0]); + return false; + } + + KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSG_SLOT_IDLE_SET, NULL, + scheduler->csg_slots_idle_mask[0]); + + ack_gpu_idle_event(kbdev); + for_each_set_bit(i, scheduler->csg_slots_idle_mask, num_groups) { + struct kbase_csf_cmd_stream_group_info *const ginfo = + &kbdev->csf.global_iface.groups[i]; + struct kbase_csf_csg_slot *csg_slot = &scheduler->csg_slots[i]; + struct kbase_queue_group *group = csg_slot->resident_group; + bool group_idle = true; + int j; + + if (!group_on_slot_is_idle(kbdev, i)) + group_idle = false; + + for (j = 0; j < ginfo->stream_num; j++) { + struct kbase_queue *const queue = + group->bound_queues[j]; + u32 *output_addr; + + if (!queue || !queue->enabled) + continue; + + output_addr = (u32 *)(queue->user_io_addr + PAGE_SIZE); + + if (output_addr[CS_ACTIVE / sizeof(u32)]) { + dev_warn( + kbdev->dev, + "queue %d bound to group %d on slot %d active unexpectedly", + queue->csi_index, queue->group->handle, + queue->group->csg_nr); + group_idle = false; + } + + if (group_idle) { + if (!save_slot_cs(ginfo, queue) && + !confirm_cmd_buf_empty(queue)) + group_idle = false; + } + + if (!group_idle) { + spin_lock_irqsave(&scheduler->interrupt_lock, flags); + kbase_csf_ring_cs_kernel_doorbell(kbdev, + queue->csi_index, group->csg_nr, true); + spin_unlock_irqrestore(&scheduler->interrupt_lock, flags); + KBASE_KTRACE_ADD_CSF_GRP(kbdev, SC_RAIL_RECHECK_NOT_IDLE, group, i); + return false; + } + } + } + KBASE_KTRACE_ADD_CSF_GRP(kbdev, SC_RAIL_RECHECK_IDLE, NULL, (u64)scheduler->csg_slots_idle_mask); + return true; +} + +/** + * can_turn_off_sc_rails - Check if the conditions are met to turn off the + * SC power rails. + * + * @kbdev: Pointer to the device. + * + * This function checks both the on-slots and off-slots groups idle status and + * if firmware is managing the cores. If the groups are not idle or Host is + * managing the cores then the rails need to be kept on. + * Additionally, we must check that the Idle event has not already been acknowledged + * as that would indicate that the idle worker has run and potentially re-enabled + * user-submission. + * + * Return: true if the SC power rails can be turned off. 
+ */ +static bool can_turn_off_sc_rails(struct kbase_device *kbdev) +{ + struct kbase_csf_scheduler *const scheduler = &kbdev->csf.scheduler; + bool turn_off_sc_rails; + bool idle_event_pending; + bool all_csg_idle; + bool non_idle_offslot; + unsigned long flags; + + lockdep_assert_held(&scheduler->lock); + + if (scheduler->state == SCHED_SUSPENDED) + return false; + + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); + spin_lock(&scheduler->interrupt_lock); + /* Ensure the SC power off sequence is complete before powering off the rail. + * If shader rail is turned off during job, APM generates fatal error and GPU firmware + * will generate error interrupt and try to reset. + * Note that this will avert the case when a power off is not complete, but it is not + * designed to handle a situation where a power on races with this code. That situation + * should be prevented by trapping new work through the kernel. + */ + if (!kbdev->pm.backend.sc_pwroff_safe) { + trace_clock_set_rate("rail_off_aborted.", 1, raw_smp_processor_id()); + dev_info(kbdev->dev, "SC Rail off aborted, power sequence incomplete"); + } + + idle_event_pending = gpu_idle_event_is_pending(kbdev); + all_csg_idle = kbase_csf_scheduler_all_csgs_idle(kbdev); + non_idle_offslot = !atomic_read(&scheduler->non_idle_offslot_grps); + turn_off_sc_rails = kbdev->pm.backend.sc_pwroff_safe && + idle_event_pending && + all_csg_idle && + non_idle_offslot && + !kbase_pm_no_mcu_core_pwroff(kbdev) && + !scheduler->sc_power_rails_off; + KBASE_KTRACE_ADD_CSF_GRP(kbdev, SC_RAIL_CAN_TURN_OFF, NULL, + kbdev->pm.backend.sc_pwroff_safe | + idle_event_pending << 1 | + all_csg_idle << 2 | + non_idle_offslot << 3 | + !kbase_pm_no_mcu_core_pwroff(kbdev) << 4 | + !scheduler->sc_power_rails_off << 5); + + spin_unlock(&scheduler->interrupt_lock); + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + + return turn_off_sc_rails; +} + +static void sc_rails_off_worker(struct work_struct *work) +{ + struct kbase_device *kbdev = container_of( + work, struct kbase_device, csf.scheduler.sc_rails_off_work); + struct kbase_csf_scheduler *const scheduler = &kbdev->csf.scheduler; + + KBASE_KTRACE_ADD(kbdev, SCHEDULER_ENTER_SC_RAIL, NULL, + kbase_csf_ktrace_gpu_cycle_cnt(kbdev)); + if (kbase_reset_gpu_try_prevent(kbdev)) { + dev_warn(kbdev->dev, "Skip SC rails off for failing to prevent gpu reset"); + return; + } + + rt_mutex_lock(&scheduler->lock); + /* All the previously sent CSG/CSI level requests are expected to have + * completed at this point. + */ + + if (can_turn_off_sc_rails(kbdev)) { + if (recheck_gpu_idleness(kbdev)) { + /* The GPU idle work, enqueued after previous idle + * notification, could already be pending if GPU became + * active momentarily after the previous idle notification + * and all CSGs were reported as idle. 
+ */ + if (!scheduler->gpu_idle_work_pending) + WARN_ON(scheduler->sc_power_rails_off); + turn_off_sc_power_rails(kbdev); + enqueue_gpu_idle_work(scheduler, + kbdev->csf.gpu_idle_hysteresis_ms); + } + } else { + ack_gpu_idle_event(kbdev); + } + + rt_mutex_unlock(&scheduler->lock); + kbase_reset_gpu_allow(kbdev); + KBASE_KTRACE_ADD(kbdev, SCHEDULER_EXIT_SC_RAIL, NULL, + kbase_csf_ktrace_gpu_cycle_cnt(kbdev)); +} +#endif + static int scheduler_prepare(struct kbase_device *kbdev) { struct kbase_csf_scheduler *scheduler = &kbdev->csf.scheduler; + struct list_head privileged_groups, active_groups; unsigned long flags; int i; @@ -4332,6 +5735,8 @@ static int scheduler_prepare(struct kbase_device *kbdev) scheduler->num_active_address_spaces = 0; scheduler->num_csg_slots_for_tick = 0; bitmap_zero(scheduler->csg_slots_prio_update, MAX_SUPPORTED_CSGS); + INIT_LIST_HEAD(&privileged_groups); + INIT_LIST_HEAD(&active_groups); spin_lock_irqsave(&scheduler->interrupt_lock, flags); scheduler->tick_protm_pending_seq = @@ -4341,10 +5746,17 @@ static int scheduler_prepare(struct kbase_device *kbdev) struct kbase_context *kctx; list_for_each_entry(kctx, &scheduler->runnable_kctxs, csf.link) - scheduler_ctx_scan_groups(kbdev, kctx, i); + scheduler_ctx_scan_groups(kbdev, kctx, i, &privileged_groups, + &active_groups); } spin_unlock_irqrestore(&scheduler->interrupt_lock, flags); + /* Adds privileged (RT + p.mode) groups to the scanout list */ + scheduler_scan_group_list(kbdev, &privileged_groups); + + /* Adds remainder of active groups to the scanout list */ + scheduler_scan_group_list(kbdev, &active_groups); + /* Update this tick's non-idle groups */ scheduler->non_idle_scanout_grps = scheduler->ngrp_to_schedule; @@ -4355,11 +5767,11 @@ static int scheduler_prepare(struct kbase_device *kbdev) */ atomic_set(&scheduler->non_idle_offslot_grps, scheduler->non_idle_scanout_grps); - KBASE_KTRACE_ADD_CSF_GRP(kbdev, SCHEDULER_NONIDLE_OFFSLOT_INC, NULL, + KBASE_KTRACE_ADD_CSF_GRP(kbdev, SCHEDULER_NONIDLE_OFFSLOT_GRP_INC, NULL, scheduler->non_idle_scanout_grps); /* Adds those idle but runnable groups to the scanout list */ - scheduler_scan_idle_groups(kbdev); + scheduler_scan_group_list(kbdev, &scheduler->idle_groups_to_schedule); WARN_ON(scheduler->csg_scan_count_for_tick < scheduler->ngrp_to_schedule); @@ -4451,14 +5863,176 @@ static int prepare_fast_local_tock(struct kbase_device *kbdev) struct kbase_csf_csg_slot *csg_slot = &scheduler->csg_slots[i]; struct kbase_queue_group *group = csg_slot->resident_group; - if (!queue_group_idle_locked(group)) + if (!queue_group_idle_locked(group)) { group->run_state = KBASE_CSF_GROUP_IDLE; + KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSF_GROUP_IDLE, group, group->run_state); + } } /* Return the number of idle slots for potential replacement */ return bitmap_weight(csg_bitmap, num_groups); } +static int wait_csg_slots_suspend(struct kbase_device *kbdev, unsigned long *slot_mask) +{ + struct kbase_csf_scheduler *const scheduler = &kbdev->csf.scheduler; + u32 num_groups = kbdev->csf.global_iface.group_num; + int err = 0; + DECLARE_BITMAP(slot_mask_local, MAX_SUPPORTED_CSGS); + + lockdep_assert_held(&scheduler->lock); + + bitmap_copy(slot_mask_local, slot_mask, MAX_SUPPORTED_CSGS); + + while (!bitmap_empty(slot_mask_local, MAX_SUPPORTED_CSGS)) { + long remaining = kbase_csf_timeout_in_jiffies(kbase_get_timeout_ms(kbdev, CSF_CSG_SUSPEND_TIMEOUT)); + DECLARE_BITMAP(changed, MAX_SUPPORTED_CSGS); + + bitmap_copy(changed, slot_mask_local, MAX_SUPPORTED_CSGS); + remaining = wait_event_timeout( + 
kbdev->csf.event_wait, + slots_state_changed(kbdev, changed, csg_slot_stopped_locked), remaining); + + if (likely(remaining)) { + u32 i; + + for_each_set_bit(i, changed, num_groups) { + struct kbase_queue_group *group; + + if (WARN_ON(!csg_slot_stopped_locked(kbdev, (s8)i))) + continue; + + /* The on slot csg is now stopped */ + clear_bit(i, slot_mask_local); + + group = scheduler->csg_slots[i].resident_group; + if (likely(group)) { + /* Only do save/cleanup if the + * group is not terminated during + * the sleep. + */ + + /* Only emit suspend, if there was no AS fault */ + if (kctx_as_enabled(group->kctx) && !group->faulted) + KBASE_TLSTREAM_TL_KBASE_DEVICE_SUSPEND_CSG( + kbdev, + kbdev->gpu_props.props.raw_props.gpu_id, i); + + save_csg_slot(group); + if (cleanup_csg_slot(group)) { + sched_evict_group(group, true, true); + } + } + } + } else { + dev_warn( + kbdev->dev, + "[%llu] Suspend request sent on CSG slots 0x%lx timed out for slots 0x%lx", + kbase_backend_get_cycle_cnt(kbdev), slot_mask[0], + slot_mask_local[0]); + /* Return the bitmask of the timed out slots to the caller */ + bitmap_copy(slot_mask, slot_mask_local, MAX_SUPPORTED_CSGS); + err = -ETIMEDOUT; + break; + } + } + + return err; +} + +/** + * evict_lru_or_blocked_csg() - Evict the least-recently-used idle or blocked CSG + * + * @kbdev: Pointer to the device + * + * Used to allow for speedier starting/resumption of another CSG. The worst-case + * scenario of the evicted CSG being scheduled next is expected to be rare. + * Also, the eviction will not be applied if the GPU is running in protected mode. + * Otherwise the the eviction attempt would force the MCU to quit the execution of + * the protected mode, and likely re-request to enter it again. + */ +static void evict_lru_or_blocked_csg(struct kbase_device *kbdev) +{ + struct kbase_csf_scheduler *scheduler = &kbdev->csf.scheduler; + size_t i; + struct kbase_queue_group *lru_idle_group = NULL; + const u32 total_csg_slots = kbdev->csf.global_iface.group_num; + const bool all_addr_spaces_used = (scheduler->num_active_address_spaces >= + (kbdev->nr_hw_address_spaces - NUM_RESERVED_AS_SLOTS)); + u8 as_usage[BASE_MAX_NR_AS] = { 0 }; + + lockdep_assert_held(&scheduler->lock); + if (kbase_csf_scheduler_protected_mode_in_use(kbdev)) + return; + + BUILD_BUG_ON(MAX_SUPPORTED_CSGS > (sizeof(int) * BITS_PER_BYTE)); + if (fls(scheduler->csg_inuse_bitmap[0]) != total_csg_slots) + return; /* Some CSG slots remain unused */ + + if (all_addr_spaces_used) { + for (i = 0; i != total_csg_slots; ++i) { + if (scheduler->csg_slots[i].resident_group != NULL) { + if (WARN_ON(scheduler->csg_slots[i].resident_group->kctx->as_nr < + 0)) + continue; + + as_usage[scheduler->csg_slots[i].resident_group->kctx->as_nr]++; + } + } + } + + for (i = 0; i != total_csg_slots; ++i) { + struct kbase_queue_group *const group = scheduler->csg_slots[i].resident_group; + + /* We expect that by this point all groups would normally be + * assigned a physical CSG slot, but if circumstances have + * changed then bail out of this optimisation. + */ + if (group == NULL) + return; + + /* Real-time priority CSGs must be kept on-slot even when + * idle. 
+ */ + if ((group->run_state == KBASE_CSF_GROUP_IDLE) && + (group->priority != KBASE_QUEUE_GROUP_PRIORITY_REALTIME) && + ((lru_idle_group == NULL) || + (lru_idle_group->prepared_seq_num < group->prepared_seq_num))) { + if (WARN_ON(group->kctx->as_nr < 0)) + continue; + + /* If all address spaces are used, we need to ensure the group does not + * share the AS with other active CSGs. Or CSG would be freed without AS + * and this optimization would not work. + */ + if ((!all_addr_spaces_used) || (as_usage[group->kctx->as_nr] == 1)) + lru_idle_group = group; + } + } + + if (lru_idle_group != NULL) { + unsigned long slot_mask = 1 << lru_idle_group->csg_nr; + + dev_dbg(kbdev->dev, "Suspending LRU idle group %d of context %d_%d on slot %d", + lru_idle_group->handle, lru_idle_group->kctx->tgid, + lru_idle_group->kctx->id, lru_idle_group->csg_nr); + suspend_queue_group(lru_idle_group); + if (wait_csg_slots_suspend(kbdev, &slot_mask)) { + enum dumpfault_error_type error_type = DF_CSG_SUSPEND_TIMEOUT; + + dev_warn( + kbdev->dev, + "[%llu] LRU idle group %d of context %d_%d failed to suspend on slot %d (timeout %d ms)", + kbase_backend_get_cycle_cnt(kbdev), lru_idle_group->handle, + lru_idle_group->kctx->tgid, lru_idle_group->kctx->id, + lru_idle_group->csg_nr, kbdev->csf.fw_timeout_ms); + if (kbase_csf_firmware_ping_wait(kbdev, FW_PING_AFTER_ERROR_TIMEOUT_MS)) + error_type = DF_PING_REQUEST_TIMEOUT; + schedule_actions_trigger_df(kbdev, lru_idle_group->kctx, error_type); + } + } +} + static void schedule_actions(struct kbase_device *kbdev, bool is_tick) { struct kbase_csf_scheduler *scheduler = &kbdev->csf.scheduler; @@ -4473,6 +6047,11 @@ static void schedule_actions(struct kbase_device *kbdev, bool is_tick) kbase_reset_gpu_assert_prevented(kbdev); lockdep_assert_held(&scheduler->lock); +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + if (scheduler->gpu_idle_work_pending) + return; +#endif + ret = kbase_csf_scheduler_wait_mcu_active(kbdev); if (ret) { dev_err(kbdev->dev, @@ -4480,6 +6059,10 @@ static void schedule_actions(struct kbase_device *kbdev, bool is_tick) return; } +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + turn_on_sc_power_rails(kbdev); +#endif + spin_lock_irqsave(&scheduler->interrupt_lock, flags); skip_idle_slots_update = kbase_csf_scheduler_protected_mode_in_use(kbdev); skip_scheduling_actions = @@ -4522,7 +6105,7 @@ redo_local_tock: if (unlikely(!scheduler->ngrp_to_schedule && scheduler->total_runnable_grps)) { dev_dbg(kbdev->dev, "No groups to schedule in the tick"); - enqueue_gpu_idle_work(scheduler); + enqueue_gpu_idle_work(scheduler, 0); return; } spin_lock_irqsave(&scheduler->interrupt_lock, flags); @@ -4539,13 +6122,13 @@ redo_local_tock: * queue jobs. 
*/ if (protm_grp && scheduler->top_grp == protm_grp) { - int new_val; - dev_dbg(kbdev->dev, "Scheduler keep protm exec: group-%d", protm_grp->handle); - new_val = atomic_dec_return(&scheduler->non_idle_offslot_grps); - KBASE_KTRACE_ADD_CSF_GRP(kbdev, SCHEDULER_NONIDLE_OFFSLOT_DEC, - protm_grp, new_val); + spin_unlock_irqrestore(&scheduler->interrupt_lock, flags); + + update_offslot_non_idle_cnt_for_onslot_grp(protm_grp); + remove_scheduled_group(kbdev, protm_grp); + scheduler_check_pmode_progress(kbdev); } else if (scheduler->top_grp) { if (protm_grp) dev_dbg(kbdev->dev, "Scheduler drop protm exec: group-%d", @@ -4599,11 +6182,11 @@ redo_local_tock: goto redo_local_tock; } } - - return; + } else { + spin_unlock_irqrestore(&scheduler->interrupt_lock, flags); } - spin_unlock_irqrestore(&scheduler->interrupt_lock, flags); + evict_lru_or_blocked_csg(kbdev); } /** @@ -4625,6 +6208,9 @@ static bool can_skip_scheduling(struct kbase_device *kbdev) lockdep_assert_held(&scheduler->lock); + if (unlikely(!kbase_reset_gpu_is_not_pending(kbdev))) + return true; + if (scheduler->state == SCHED_SUSPENDED) return true; @@ -4634,12 +6220,12 @@ static bool can_skip_scheduling(struct kbase_device *kbdev) spin_lock_irqsave(&kbdev->hwaccess_lock, flags); if (kbdev->pm.backend.exit_gpu_sleep_mode) { - int ret = scheduler_pm_active_after_sleep(kbdev, flags); - /* hwaccess_lock is released in the previous function - * call. - */ + int ret = scheduler_pm_active_after_sleep(kbdev, &flags); + + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); if (!ret) { scheduler->state = SCHED_INACTIVE; + KBASE_KTRACE_ADD(kbdev, SCHED_INACTIVE, NULL, scheduler->state); return false; } @@ -4655,16 +6241,11 @@ static bool can_skip_scheduling(struct kbase_device *kbdev) return false; } -static void schedule_on_tock(struct work_struct *work) +static void schedule_on_tock(struct kbase_device *kbdev) { - struct kbase_device *kbdev = container_of(work, struct kbase_device, - csf.scheduler.tock_work.work); struct kbase_csf_scheduler *const scheduler = &kbdev->csf.scheduler; int err; - /* Tock work item is serviced */ - scheduler->tock_pending_request = false; - err = kbase_reset_gpu_try_prevent(kbdev); /* Regardless of whether reset failed or is currently happening, exit * early @@ -4672,41 +6253,46 @@ static void schedule_on_tock(struct work_struct *work) if (err) return; - mutex_lock(&scheduler->lock); + kbase_debug_csf_fault_wait_completion(kbdev); + rt_mutex_lock(&scheduler->lock); if (can_skip_scheduling(kbdev)) + { + atomic_set(&scheduler->pending_tock_work, false); goto exit_no_schedule_unlock; + } WARN_ON(!(scheduler->state == SCHED_INACTIVE)); scheduler->state = SCHED_BUSY; + KBASE_KTRACE_ADD(kbdev, SCHED_BUSY, NULL, scheduler->state); /* Undertaking schedule action steps */ - KBASE_KTRACE_ADD(kbdev, SCHEDULER_TOCK, NULL, 0u); - schedule_actions(kbdev, false); + KBASE_KTRACE_ADD(kbdev, SCHEDULER_TOCK_START, NULL, 0u); + while (atomic_cmpxchg(&scheduler->pending_tock_work, true, false) == true) + schedule_actions(kbdev, false); /* Record time information on a non-skipped tock */ scheduler->last_schedule = jiffies; scheduler->state = SCHED_INACTIVE; + KBASE_KTRACE_ADD(kbdev, SCHED_INACTIVE, NULL, scheduler->state); if (!scheduler->total_runnable_grps) - enqueue_gpu_idle_work(scheduler); - mutex_unlock(&scheduler->lock); + enqueue_gpu_idle_work(scheduler, 0); +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) + emit_gpu_metrics_to_frontend(kbdev); +#endif /* CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD */ + 
rt_mutex_unlock(&scheduler->lock); kbase_reset_gpu_allow(kbdev); - dev_dbg(kbdev->dev, - "Waking up for event after schedule-on-tock completes."); - wake_up_all(&kbdev->csf.event_wait); KBASE_KTRACE_ADD(kbdev, SCHEDULER_TOCK_END, NULL, 0u); return; exit_no_schedule_unlock: - mutex_unlock(&scheduler->lock); + rt_mutex_unlock(&scheduler->lock); kbase_reset_gpu_allow(kbdev); } -static void schedule_on_tick(struct work_struct *work) +static void schedule_on_tick(struct kbase_device *kbdev) { - struct kbase_device *kbdev = container_of(work, struct kbase_device, - csf.scheduler.tick_work); struct kbase_csf_scheduler *const scheduler = &kbdev->csf.scheduler; int err = kbase_reset_gpu_try_prevent(kbdev); @@ -4716,109 +6302,51 @@ static void schedule_on_tick(struct work_struct *work) if (err) return; - mutex_lock(&scheduler->lock); + kbase_debug_csf_fault_wait_completion(kbdev); + rt_mutex_lock(&scheduler->lock); - WARN_ON(scheduler->tick_timer_active); if (can_skip_scheduling(kbdev)) goto exit_no_schedule_unlock; scheduler->state = SCHED_BUSY; + KBASE_KTRACE_ADD(kbdev, SCHED_BUSY, NULL, scheduler->state); /* Undertaking schedule action steps */ - KBASE_KTRACE_ADD(kbdev, SCHEDULER_TICK, NULL, - scheduler->total_runnable_grps); + KBASE_KTRACE_ADD(kbdev, SCHEDULER_TICK_START, NULL, scheduler->total_runnable_grps); schedule_actions(kbdev, true); /* Record time information */ scheduler->last_schedule = jiffies; /* Kicking next scheduling if needed */ - if (likely(scheduler_timer_is_enabled_nolock(kbdev)) && - (scheduler->total_runnable_grps > 0)) { - start_tick_timer(kbdev); - dev_dbg(kbdev->dev, - "scheduling for next tick, num_runnable_groups:%u\n", + if (likely(kbase_csf_scheduler_timer_is_enabled(kbdev)) && + (scheduler->total_runnable_grps > 0)) { + hrtimer_start(&scheduler->tick_timer, + HR_TIMER_DELAY_MSEC(scheduler->csg_scheduling_period_ms), + HRTIMER_MODE_REL); + dev_dbg(kbdev->dev, "scheduling for next tick, num_runnable_groups:%u\n", scheduler->total_runnable_grps); } else if (!scheduler->total_runnable_grps) { - enqueue_gpu_idle_work(scheduler); + enqueue_gpu_idle_work(scheduler, 0); } scheduler->state = SCHED_INACTIVE; - mutex_unlock(&scheduler->lock); +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) + emit_gpu_metrics_to_frontend(kbdev); +#endif /* CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD */ + rt_mutex_unlock(&scheduler->lock); + KBASE_KTRACE_ADD(kbdev, SCHED_INACTIVE, NULL, scheduler->state); kbase_reset_gpu_allow(kbdev); - dev_dbg(kbdev->dev, "Waking up for event after schedule-on-tick completes."); - wake_up_all(&kbdev->csf.event_wait); KBASE_KTRACE_ADD(kbdev, SCHEDULER_TICK_END, NULL, scheduler->total_runnable_grps); return; exit_no_schedule_unlock: - mutex_unlock(&scheduler->lock); + rt_mutex_unlock(&scheduler->lock); kbase_reset_gpu_allow(kbdev); } -static int wait_csg_slots_suspend(struct kbase_device *kbdev, - const unsigned long *slot_mask, - unsigned int timeout_ms) -{ - struct kbase_csf_scheduler *const scheduler = &kbdev->csf.scheduler; - long remaining = kbase_csf_timeout_in_jiffies(timeout_ms); - u32 num_groups = kbdev->csf.global_iface.group_num; - int err = 0; - DECLARE_BITMAP(slot_mask_local, MAX_SUPPORTED_CSGS); - - lockdep_assert_held(&scheduler->lock); - - bitmap_copy(slot_mask_local, slot_mask, MAX_SUPPORTED_CSGS); - - while (!bitmap_empty(slot_mask_local, MAX_SUPPORTED_CSGS) - && remaining) { - DECLARE_BITMAP(changed, MAX_SUPPORTED_CSGS); - - bitmap_copy(changed, slot_mask_local, MAX_SUPPORTED_CSGS); - - remaining = wait_event_timeout(kbdev->csf.event_wait, 
- slots_state_changed(kbdev, changed, - csg_slot_stopped_locked), - remaining); - - if (remaining) { - u32 i; - - for_each_set_bit(i, changed, num_groups) { - struct kbase_queue_group *group; - - if (WARN_ON(!csg_slot_stopped_locked(kbdev, (s8)i))) - continue; - - /* The on slot csg is now stopped */ - clear_bit(i, slot_mask_local); - - group = scheduler->csg_slots[i].resident_group; - if (likely(group)) { - /* Only do save/cleanup if the - * group is not terminated during - * the sleep. - */ - save_csg_slot(group); - if (cleanup_csg_slot(group)) - sched_evict_group(group, true, true); - } - } - } else { - dev_warn(kbdev->dev, "[%llu] Timeout waiting for CSG slots to suspend, slot_mask: 0x%*pb\n", - kbase_backend_get_cycle_cnt(kbdev), - num_groups, slot_mask_local); - - - err = -ETIMEDOUT; - } - } - - return err; -} - static int suspend_active_queue_groups(struct kbase_device *kbdev, unsigned long *slot_mask) { @@ -4839,7 +6367,7 @@ static int suspend_active_queue_groups(struct kbase_device *kbdev, } } - ret = wait_csg_slots_suspend(kbdev, slot_mask, kbdev->reset_timeout_ms); + ret = wait_csg_slots_suspend(kbdev, slot_mask); return ret; } @@ -4850,13 +6378,18 @@ static int suspend_active_queue_groups_on_reset(struct kbase_device *kbdev) int ret; int ret2; - mutex_lock(&scheduler->lock); + rt_mutex_lock(&scheduler->lock); ret = suspend_active_queue_groups(kbdev, slot_mask); if (ret) { dev_warn(kbdev->dev, "Timeout waiting for CSG slots to suspend before reset, slot_mask: 0x%*pb\n", kbdev->csf.global_iface.group_num, slot_mask); + //TODO: should introduce SSCD report if this happens. + kbase_gpu_timeout_debug_message(kbdev, ""); + dev_warn(kbdev->dev, "[%llu] Firmware ping %d", + kbase_backend_get_cycle_cnt(kbdev), + kbase_csf_firmware_ping_wait(kbdev, 0)); } /* Need to flush the GPU cache to ensure suspend buffer @@ -4874,16 +6407,15 @@ static int suspend_active_queue_groups_on_reset(struct kbase_device *kbdev) * overflow. */ kbase_gpu_start_cache_clean(kbdev, GPU_COMMAND_CACHE_CLN_INV_L2_LSC); - ret2 = kbase_gpu_wait_cache_clean_timeout(kbdev, - kbdev->reset_timeout_ms); + ret2 = kbase_gpu_wait_cache_clean_timeout(kbdev, kbdev->mmu_or_gpu_cache_op_wait_time_ms); if (ret2) { - dev_warn(kbdev->dev, "[%llu] Timeout waiting for cache clean to complete before reset", - kbase_backend_get_cycle_cnt(kbdev)); + dev_err(kbdev->dev, "[%llu] Timeout waiting for CACHE_CLN_INV_L2_LSC", + kbase_backend_get_cycle_cnt(kbdev)); if (!ret) ret = ret2; } - mutex_unlock(&scheduler->lock); + rt_mutex_unlock(&scheduler->lock); return ret; } @@ -4920,7 +6452,7 @@ static bool scheduler_handle_reset_in_protected_mode(struct kbase_device *kbdev) unsigned long flags; u32 csg_nr; - mutex_lock(&scheduler->lock); + rt_mutex_lock(&scheduler->lock); spin_lock_irqsave(&scheduler->interrupt_lock, flags); protm_grp = scheduler->active_protm_grp; @@ -4981,27 +6513,21 @@ static bool scheduler_handle_reset_in_protected_mode(struct kbase_device *kbdev) cleanup_csg_slot(group); group->run_state = KBASE_CSF_GROUP_SUSPENDED; + KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSF_GROUP_SUSPENDED, group, group->run_state); /* Simply treat the normal mode groups as non-idle. The tick * scheduled after the reset will re-initialize the counter * anyways. 
*/ new_val = atomic_inc_return(&scheduler->non_idle_offslot_grps); - KBASE_KTRACE_ADD_CSF_GRP(kbdev, SCHEDULER_NONIDLE_OFFSLOT_INC, - group, new_val); + KBASE_KTRACE_ADD_CSF_GRP(kbdev, SCHEDULER_NONIDLE_OFFSLOT_GRP_INC, group, new_val); } unlock: - mutex_unlock(&scheduler->lock); + rt_mutex_unlock(&scheduler->lock); return suspend_on_slot_groups; } -static void cancel_tock_work(struct kbase_csf_scheduler *const scheduler) -{ - cancel_delayed_work_sync(&scheduler->tock_work); - scheduler->tock_pending_request = false; -} - static void scheduler_inner_reset(struct kbase_device *kbdev) { u32 const num_groups = kbdev->csf.global_iface.group_num; @@ -5011,19 +6537,22 @@ static void scheduler_inner_reset(struct kbase_device *kbdev) WARN_ON(kbase_csf_scheduler_get_nr_active_csgs(kbdev)); /* Cancel any potential queued delayed work(s) */ +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + cancel_delayed_work_sync(&scheduler->gpu_idle_work); +#else cancel_work_sync(&kbdev->csf.scheduler.gpu_idle_work); - cancel_tick_timer(kbdev); - cancel_work_sync(&scheduler->tick_work); +#endif + cancel_tick_work(scheduler); cancel_tock_work(scheduler); cancel_delayed_work_sync(&scheduler->ping_work); - mutex_lock(&scheduler->lock); + rt_mutex_lock(&scheduler->lock); spin_lock_irqsave(&scheduler->interrupt_lock, flags); bitmap_fill(scheduler->csgs_events_enable_mask, MAX_SUPPORTED_CSGS); if (scheduler->active_protm_grp) - KBASE_KTRACE_ADD_CSF_GRP(kbdev, SCHEDULER_EXIT_PROTM, - scheduler->active_protm_grp, 0u); + KBASE_KTRACE_ADD_CSF_GRP(kbdev, SCHEDULER_PROTM_EXIT, scheduler->active_protm_grp, + 0u); scheduler->active_protm_grp = NULL; memset(kbdev->csf.scheduler.csg_slots, 0, num_groups * sizeof(struct kbase_csf_csg_slot)); @@ -5037,7 +6566,7 @@ static void scheduler_inner_reset(struct kbase_device *kbdev) scheduler->num_active_address_spaces | (((u64)scheduler->total_runnable_grps) << 32)); - mutex_unlock(&scheduler->lock); + rt_mutex_unlock(&scheduler->lock); } void kbase_csf_scheduler_reset(struct kbase_device *kbdev) @@ -5046,7 +6575,9 @@ void kbase_csf_scheduler_reset(struct kbase_device *kbdev) WARN_ON(!kbase_reset_gpu_is_active(kbdev)); - KBASE_KTRACE_ADD(kbdev, SCHEDULER_RESET, NULL, 0u); + KBASE_KTRACE_ADD(kbdev, SCHEDULER_RESET_START, NULL, 0u); + + kbase_debug_csf_fault_wait_completion(kbdev); if (scheduler_handle_reset_in_protected_mode(kbdev) && !suspend_active_queue_groups_on_reset(kbdev)) { @@ -5084,6 +6615,8 @@ void kbase_csf_scheduler_reset(struct kbase_device *kbdev) mutex_unlock(&kbdev->kctx_list_lock); + KBASE_KTRACE_ADD(kbdev, SCHEDULER_RESET_END, NULL, 0u); + /* After queue groups reset, the scheduler data fields clear out */ scheduler_inner_reset(kbdev); } @@ -5111,7 +6644,7 @@ static void firmware_aliveness_monitor(struct work_struct *work) return; } - mutex_lock(&kbdev->csf.scheduler.lock); + rt_mutex_lock(&kbdev->csf.scheduler.lock); #ifdef CONFIG_MALI_DEBUG if (fw_debug) { @@ -5138,7 +6671,7 @@ static void firmware_aliveness_monitor(struct work_struct *work) kbase_csf_scheduler_wait_mcu_active(kbdev); - err = kbase_csf_firmware_ping_wait(kbdev); + err = kbase_csf_firmware_ping_wait(kbdev, kbdev->csf.fw_timeout_ms); if (err) { /* It is acceptable to enqueue a reset whilst we've prevented @@ -5148,14 +6681,14 @@ static void firmware_aliveness_monitor(struct work_struct *work) kbdev, RESET_FLAGS_HWC_UNRECOVERABLE_ERROR)) kbase_reset_gpu(kbdev); } else if (kbase_csf_scheduler_get_nr_active_csgs(kbdev) == 1) { - queue_delayed_work(system_long_wq, - &kbdev->csf.scheduler.ping_work, - 
msecs_to_jiffies(FIRMWARE_PING_INTERVAL_MS)); + queue_delayed_work( + system_long_wq, &kbdev->csf.scheduler.ping_work, + msecs_to_jiffies(kbase_get_timeout_ms(kbdev, CSF_FIRMWARE_PING_TIMEOUT))); } kbase_pm_context_idle(kbdev); exit: - mutex_unlock(&kbdev->csf.scheduler.lock); + rt_mutex_unlock(&kbdev->csf.scheduler.lock); kbase_reset_gpu_allow(kbdev); } @@ -5170,7 +6703,7 @@ int kbase_csf_scheduler_group_copy_suspend_buf(struct kbase_queue_group *group, kbase_reset_gpu_assert_prevented(kbdev); lockdep_assert_held(&kctx->csf.lock); - mutex_lock(&scheduler->lock); + rt_mutex_lock(&scheduler->lock); on_slot = kbasep_csf_scheduler_group_is_on_slot_locked(group); @@ -5207,9 +6740,13 @@ int kbase_csf_scheduler_group_copy_suspend_buf(struct kbase_queue_group *group, if (!WARN_ON(scheduler->state == SCHED_SUSPENDED)) suspend_queue_group(group); - err = wait_csg_slots_suspend(kbdev, slot_mask, - kbdev->csf.fw_timeout_ms); + err = wait_csg_slots_suspend(kbdev, slot_mask); if (err) { + const struct gpu_uevent evt = { + .type = GPU_UEVENT_TYPE_KMD_ERROR, + .info = GPU_UEVENT_INFO_CSG_GROUP_SUSPEND + }; + pixel_gpu_uevent_send(kbdev, &evt); dev_warn(kbdev->dev, "[%llu] Timeout waiting for the group %d to suspend on slot %d", kbase_backend_get_cycle_cnt(kbdev), group->handle, group->csg_nr); @@ -5248,7 +6785,7 @@ int kbase_csf_scheduler_group_copy_suspend_buf(struct kbase_queue_group *group, target_page_nr < sus_buf->nr_pages; i++) { struct page *pg = as_page(group->normal_suspend_buf.phy[i]); - void *sus_page = kmap(pg); + void *sus_page = kbase_kmap(pg); if (sus_page) { kbase_sync_single_for_cpu(kbdev, @@ -5259,7 +6796,7 @@ int kbase_csf_scheduler_group_copy_suspend_buf(struct kbase_queue_group *group, sus_buf->pages, sus_page, &to_copy, sus_buf->nr_pages, &target_page_nr, offset); - kunmap(pg); + kbase_kunmap(pg, sus_page); if (err) break; } else { @@ -5274,7 +6811,7 @@ int kbase_csf_scheduler_group_copy_suspend_buf(struct kbase_queue_group *group, } exit: - mutex_unlock(&scheduler->lock); + rt_mutex_unlock(&scheduler->lock); return err; } @@ -5375,15 +6912,31 @@ static struct kbase_queue_group *scheduler_get_protm_enter_async_group( spin_lock_irqsave(&scheduler->interrupt_lock, flags); - if (kbase_csf_scheduler_protected_mode_in_use(kbdev) || - bitmap_empty(pending, ginfo->stream_num)) + if (bitmap_empty(pending, ginfo->stream_num)) { + dev_dbg(kbdev->dev, + "Pmode requested for group %d of ctx %d_%d with no pending queues", + input_grp->handle, input_grp->kctx->tgid, input_grp->kctx->id); + input_grp = NULL; + } else if (kbase_csf_scheduler_protected_mode_in_use(kbdev)) { + kbase_csf_scheduler_invoke_tock(kbdev); input_grp = NULL; + } spin_unlock_irqrestore(&scheduler->interrupt_lock, flags); } else { + if (group && (group->priority == KBASE_QUEUE_GROUP_PRIORITY_REALTIME)) + kbase_csf_scheduler_invoke_tock(kbdev); + input_grp = NULL; } +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + if (input_grp && kbdev->csf.scheduler.sc_power_rails_off) { + dev_warn(kbdev->dev, "SC power rails unexpectedly off in async protm enter"); + return NULL; + } +#endif + return input_grp; } @@ -5399,15 +6952,15 @@ void kbase_csf_scheduler_group_protm_enter(struct kbase_queue_group *group) if (err) return; - mutex_lock(&scheduler->lock); + rt_mutex_lock(&scheduler->lock); - if (group->run_state == KBASE_CSF_GROUP_IDLE) - group->run_state = KBASE_CSF_GROUP_RUNNABLE; + if (on_slot_group_idle_locked(group)) + update_idle_protm_group_state_to_runnable(group); /* Check if the group is now eligible for execution in protected 
mode. */ if (scheduler_get_protm_enter_async_group(kbdev, group)) scheduler_group_check_protm_enter(kbdev, group); - mutex_unlock(&scheduler->lock); + rt_mutex_unlock(&scheduler->lock); kbase_reset_gpu_allow(kbdev); } @@ -5450,7 +7003,7 @@ static bool check_sync_update_for_on_slot_group( stream, CS_STATUS_WAIT); unsigned long flags; - KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, QUEUE_SYNC_STATUS_WAIT, + KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, QUEUE_SYNC_UPDATE_WAIT_STATUS, queue->group, queue, status); if (!CS_STATUS_WAIT_SYNC_WAIT_GET(status)) @@ -5477,6 +7030,10 @@ static bool check_sync_update_for_on_slot_group( if (!evaluate_sync_update(queue)) continue; +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + queue->status_wait = 0; +#endif + /* Update csg_slots_idle_mask and group's run_state */ if (group->run_state != KBASE_CSF_GROUP_RUNNABLE) { /* Only clear the group's idle flag if it has been dealt @@ -5492,11 +7049,34 @@ static bool check_sync_update_for_on_slot_group( scheduler->csg_slots_idle_mask[0]); spin_unlock_irqrestore( &scheduler->interrupt_lock, flags); + /* Request the scheduler to confirm the condition inferred + * here inside the protected mode. + */ + group->reevaluate_idle_status = true; group->run_state = KBASE_CSF_GROUP_RUNNABLE; + KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSF_GROUP_RUNNABLE, group, + group->run_state); } KBASE_KTRACE_ADD_CSF_GRP(kbdev, GROUP_SYNC_UPDATE_DONE, group, 0u); sync_update_done = true; + +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + /* As the queue of an on-slot group has become unblocked, + * the power rails can be turned on and the execution can + * be resumed on HW. + */ + if (kbdev->csf.scheduler.sc_power_rails_off) { + cancel_gpu_idle_work(kbdev); + turn_on_sc_power_rails(kbdev); + spin_lock_irqsave(&scheduler->interrupt_lock, + flags); + kbase_csf_ring_cs_kernel_doorbell(kbdev, + queue->csi_index, group->csg_nr, true); + spin_unlock_irqrestore(&scheduler->interrupt_lock, + flags); + } +#endif } } @@ -5571,17 +7151,34 @@ static void check_sync_update_in_sleep_mode(struct kbase_device *kbdev) continue; if (check_sync_update_for_on_slot_group(group)) { - /* As sync update has been performed for an on-slot - * group, when MCU is in sleep state, ring the doorbell - * so that FW can re-evaluate the SYNC_WAIT on wakeup. - */ - kbase_csf_ring_doorbell(kbdev, CSF_KERNEL_DOORBELL_NR); scheduler_wakeup(kbdev, true); return; } } } +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS +static void check_sync_update_after_sc_power_down(struct kbase_device *kbdev) +{ + struct kbase_csf_scheduler *scheduler = &kbdev->csf.scheduler; + u32 const num_groups = kbdev->csf.global_iface.group_num; + u32 csg_nr; + + lockdep_assert_held(&scheduler->lock); + + for (csg_nr = 0; csg_nr < num_groups; csg_nr++) { + struct kbase_queue_group *const group = + kbdev->csf.scheduler.csg_slots[csg_nr].resident_group; + + if (!group) + continue; + + if (check_sync_update_for_on_slot_group(group)) + return; + } +} +#endif + /** * check_group_sync_update_worker() - Check the sync wait condition for all the * blocked queue groups @@ -5597,7 +7194,7 @@ static void check_sync_update_in_sleep_mode(struct kbase_device *kbdev) * runnable groups so that Scheduler can consider scheduling the group * in next tick or exit protected mode. 
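check_sync_update_for_on_slot_group() above re-reads CS_STATUS_WAIT and only unblocks a queue once evaluate_sync_update() sees the CQS object's live value satisfy the recorded condition. A standalone sketch of that kind of check is below; the le/gt comparison directions are an assumption made for illustration and all names are made up:

/* Build: cc -std=c11 cqs_wait_demo.c */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

enum cqs_wait_cond { CQS_WAIT_LE, CQS_WAIT_GT };

/* A queue blocked on a CQS object becomes runnable again once the live
 * value satisfies the recorded condition against the compare value.
 */
static bool cqs_wait_satisfied(uint64_t live_val, uint64_t compare_val,
			       enum cqs_wait_cond cond)
{
	switch (cond) {
	case CQS_WAIT_LE:
		return live_val <= compare_val;
	case CQS_WAIT_GT:
		return live_val > compare_val;
	default:
		return false;
	}
}

int main(void)
{
	/* Signalling side bumped the object from 3 to 5, waiter wants > 4. */
	printf("unblocked: %d\n", cqs_wait_satisfied(5, 4, CQS_WAIT_GT));
	return 0;
}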
*/ -static void check_group_sync_update_worker(struct work_struct *work) +static void check_group_sync_update_worker(struct kthread_work *work) { struct kbase_context *const kctx = container_of(work, struct kbase_context, csf.sched.sync_update_work); @@ -5605,9 +7202,18 @@ static void check_group_sync_update_worker(struct work_struct *work) struct kbase_csf_scheduler *const scheduler = &kbdev->csf.scheduler; bool sync_updated = false; - mutex_lock(&scheduler->lock); + rt_mutex_lock(&scheduler->lock); + +#if IS_ENABLED(CONFIG_DEBUG_FS) + if (unlikely(scheduler->state == SCHED_BUSY)) { + kthread_queue_work(&kctx->csf.sched.sync_update_worker, + &kctx->csf.sched.sync_update_work); + rt_mutex_unlock(&scheduler->lock); + return; + } +#endif - KBASE_KTRACE_ADD(kbdev, GROUP_SYNC_UPDATE_WORKER_BEGIN, kctx, 0u); + KBASE_KTRACE_ADD(kbdev, SCHEDULER_GROUP_SYNC_UPDATE_WORKER_START, kctx, 0u); if (kctx->csf.sched.num_idle_wait_grps != 0) { struct kbase_queue_group *group, *temp; @@ -5620,6 +7226,14 @@ static void check_group_sync_update_worker(struct work_struct *work) */ update_idle_suspended_group_state(group); KBASE_KTRACE_ADD_CSF_GRP(kbdev, GROUP_SYNC_UPDATE_DONE, group, 0u); +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + cancel_gpu_idle_work(kbdev); + /* As an off-slot group has become runnable, + * the rails will be turned on and the CS + * kernel doorbell will be rung from the + * scheduling tick. + */ +#endif } } } else { @@ -5637,9 +7251,18 @@ static void check_group_sync_update_worker(struct work_struct *work) if (!sync_updated && (scheduler->state == SCHED_SLEEPING)) check_sync_update_in_sleep_mode(kbdev); - KBASE_KTRACE_ADD(kbdev, GROUP_SYNC_UPDATE_WORKER_END, kctx, 0u); +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + /* Check if the sync update happened for a blocked on-slot group, + * after the shader core power rails were turned off and reactivate + * the GPU if the wait condition is met for the blocked group. 
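check_group_sync_update_worker() is retyped above from a workqueue item to a kthread_work so it can run on a dedicated, RT-capable worker thread. A kernel-style sketch of the same pattern using the stock kthread_worker API follows; the demo_* names are made up, and the kbase wrappers such as kbase_kthread_run_worker_rt layer error handling and priority setup on top of something like this:

#include <linux/err.h>
#include <linux/kthread.h>
#include <linux/sched.h>

struct demo_ctx {
	struct kthread_worker *worker;
	struct kthread_work sync_update_work;
};

static void demo_sync_update_fn(struct kthread_work *work)
{
	struct demo_ctx *ctx = container_of(work, struct demo_ctx, sync_update_work);

	/* ... re-evaluate blocked groups for this context ... */
	(void)ctx;
}

static int demo_ctx_init(struct demo_ctx *ctx)
{
	ctx->worker = kthread_create_worker(0, "demo_sync_update");
	if (IS_ERR(ctx->worker))
		return PTR_ERR(ctx->worker);

	sched_set_fifo(ctx->worker->task);	/* elevate the worker thread */
	kthread_init_work(&ctx->sync_update_work, demo_sync_update_fn);
	return 0;
}

static void demo_ctx_kick(struct demo_ctx *ctx)
{
	kthread_queue_work(ctx->worker, &ctx->sync_update_work);
}

static void demo_ctx_term(struct demo_ctx *ctx)
{
	kthread_cancel_work_sync(&ctx->sync_update_work);
	kthread_destroy_worker(ctx->worker);
}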
+ */ + if (!sync_updated && scheduler->sc_power_rails_off) + check_sync_update_after_sc_power_down(kbdev); +#endif + + KBASE_KTRACE_ADD(kbdev, SCHEDULER_GROUP_SYNC_UPDATE_WORKER_END, kctx, 0u); - mutex_unlock(&scheduler->lock); + rt_mutex_unlock(&scheduler->lock); } static @@ -5647,9 +7270,9 @@ enum kbase_csf_event_callback_action check_group_sync_update_cb(void *param) { struct kbase_context *const kctx = param; - KBASE_KTRACE_ADD(kctx->kbdev, SYNC_UPDATE_EVENT, kctx, 0u); + KBASE_KTRACE_ADD(kctx->kbdev, SCHEDULER_GROUP_SYNC_UPDATE_EVENT, kctx, 0u); - queue_work(kctx->csf.sched.sync_update_wq, + kthread_queue_work(&kctx->csf.sched.sync_update_worker, &kctx->csf.sched.sync_update_work); return KBASE_CSF_EVENT_CALLBACK_KEEP; @@ -5659,6 +7282,15 @@ int kbase_csf_scheduler_context_init(struct kbase_context *kctx) { int priority; int err; + struct kbase_device *kbdev = kctx->kbdev; + +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) + err = gpu_metrics_ctx_init(kctx); + if (err) + return err; +#endif /* CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD */ + + kbase_ctx_sched_init_ctx(kctx); for (priority = 0; priority < KBASE_QUEUE_GROUP_PRIORITY_COUNT; ++priority) { @@ -5670,34 +7302,113 @@ int kbase_csf_scheduler_context_init(struct kbase_context *kctx) kctx->csf.sched.num_idle_wait_grps = 0; kctx->csf.sched.ngrp_to_schedule = 0; - kctx->csf.sched.sync_update_wq = - alloc_ordered_workqueue("mali_kbase_csf_sync_update_wq", - WQ_HIGHPRI); - if (!kctx->csf.sched.sync_update_wq) { + err = kbase_kthread_run_worker_rt(kctx->kbdev, &kctx->csf.sched.sync_update_worker, "csf_sync_update"); + if (err) { dev_err(kctx->kbdev->dev, "Failed to initialize scheduler context workqueue"); - return -ENOMEM; + err = -ENOMEM; + goto alloc_wq_failed; } - INIT_WORK(&kctx->csf.sched.sync_update_work, + kthread_init_work(&kctx->csf.sched.sync_update_work, check_group_sync_update_worker); + kbase_csf_tiler_heap_reclaim_ctx_init(kctx); + err = kbase_csf_event_wait_add(kctx, check_group_sync_update_cb, kctx); if (err) { - dev_err(kctx->kbdev->dev, - "Failed to register a sync update callback"); - destroy_workqueue(kctx->csf.sched.sync_update_wq); + dev_err(kbdev->dev, "Failed to register a sync update callback"); + goto event_wait_add_failed; } return err; + +event_wait_add_failed: + kbase_destroy_kworker_stack(&kctx->csf.sched.sync_update_worker); +alloc_wq_failed: + kbase_ctx_sched_remove_ctx(kctx); +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) + gpu_metrics_ctx_term(kctx); +#endif /* CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD */ + return err; } void kbase_csf_scheduler_context_term(struct kbase_context *kctx) { kbase_csf_event_wait_remove(kctx, check_group_sync_update_cb, kctx); - cancel_work_sync(&kctx->csf.sched.sync_update_work); - destroy_workqueue(kctx->csf.sched.sync_update_wq); + kthread_cancel_work_sync(&kctx->csf.sched.sync_update_work); + kbase_destroy_kworker_stack(&kctx->csf.sched.sync_update_worker); + + kbase_ctx_sched_remove_ctx(kctx); +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) + gpu_metrics_ctx_term(kctx); +#endif /* CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD */ +} + +static int kbase_csf_scheduler_kthread(void *data) +{ + struct kbase_device *const kbdev = data; + struct kbase_csf_scheduler *const scheduler = &kbdev->csf.scheduler; + + while (scheduler->kthread_running) { + struct kbase_queue *queue; + + if (wait_for_completion_interruptible(&scheduler->kthread_signal) != 0) + continue; + reinit_completion(&scheduler->kthread_signal); + + /* Iterate through queues with pending kicks 
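The new submission thread, whose loop body continues below, drains kbdev->csf.pending_gpuq_kicks one entry at a time, always taking the oldest kick from the highest-priority non-empty list while holding pending_gpuq_kicks_lock. A standalone model of just that selection step (singly linked lists in place of the kernel list and lock machinery, made-up names):

/* Build: cc -std=c11 kick_prio_demo.c */
#include <stddef.h>
#include <stdio.h>

#define NUM_PRIORITIES 4	/* e.g. realtime, high, medium, low */

struct kick {
	int queue_id;
	struct kick *next;
};

/* Pop the oldest kick from the highest-priority non-empty list; this is
 * how the submission thread decides which queue to service next.
 */
static struct kick *pop_next_kick(struct kick *pending[NUM_PRIORITIES])
{
	for (int prio = 0; prio < NUM_PRIORITIES; prio++) {
		if (pending[prio]) {
			struct kick *k = pending[prio];

			pending[prio] = k->next;
			k->next = NULL;
			return k;
		}
	}
	return NULL;
}

int main(void)
{
	struct kick low = { .queue_id = 7 };
	struct kick high = { .queue_id = 2 };
	struct kick *pending[NUM_PRIORITIES] = { NULL, &high, NULL, &low };
	struct kick *k;

	while ((k = pop_next_kick(pending)) != NULL)
		printf("servicing queue %d\n", k->queue_id);
	return 0;
}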
*/ + do { + u8 prio; + + spin_lock(&kbdev->csf.pending_gpuq_kicks_lock); + queue = NULL; + for (prio = 0; prio != KBASE_QUEUE_GROUP_PRIORITY_COUNT; ++prio) { + if (!list_empty(&kbdev->csf.pending_gpuq_kicks[prio])) { + queue = list_first_entry( + &kbdev->csf.pending_gpuq_kicks[prio], + struct kbase_queue, pending_kick_link); + list_del_init(&queue->pending_kick_link); + break; + } + } + spin_unlock(&kbdev->csf.pending_gpuq_kicks_lock); + + if (queue != NULL) { + WARN_ONCE( + prio != queue->group_priority, + "Queue %pK has priority %hhu but instead its kick was handled at priority %hhu", + (void *)queue, queue->group_priority, prio); + + kbase_csf_process_queue_kick(queue); + + /* Perform a scheduling tock for high-priority queue groups if + * required. + */ + BUILD_BUG_ON(KBASE_QUEUE_GROUP_PRIORITY_REALTIME != 0); + BUILD_BUG_ON(KBASE_QUEUE_GROUP_PRIORITY_HIGH != 1); + if ((prio <= KBASE_QUEUE_GROUP_PRIORITY_HIGH) && + atomic_read(&scheduler->pending_tock_work)) + schedule_on_tock(kbdev); + } + } while (queue != NULL); + + /* Check if we need to perform a scheduling tick/tock. A tick + * event shall override a tock event but not vice-versa. + */ + if (atomic_cmpxchg(&scheduler->pending_tick_work, true, false) == true) { + atomic_set(&scheduler->pending_tock_work, false); + schedule_on_tick(kbdev); + } else if (atomic_read(&scheduler->pending_tock_work)) { + schedule_on_tock(kbdev); + } + + dev_dbg(kbdev->dev, "Waking up for event after a scheduling iteration."); + wake_up_all(&kbdev->csf.event_wait); + } + + return 0; } int kbase_csf_scheduler_init(struct kbase_device *kbdev) @@ -5716,35 +7427,56 @@ int kbase_csf_scheduler_init(struct kbase_device *kbdev) return -ENOMEM; } - return 0; + init_completion(&scheduler->kthread_signal); + scheduler->kthread_running = true; + scheduler->gpuq_kthread = + kbase_kthread_run_rt(kbdev, &kbase_csf_scheduler_kthread, kbdev, "mali-gpuq-kthread"); + if (IS_ERR(scheduler->gpuq_kthread)) { + kfree(scheduler->csg_slots); + scheduler->csg_slots = NULL; + + dev_err(kbdev->dev, "Failed to spawn the GPU queue submission worker thread"); + return -ENOMEM; + } +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) && !IS_ENABLED(CONFIG_MALI_NO_MALI) + scheduler->gpu_metrics_tb = + kbase_csf_firmware_get_trace_buffer(kbdev, KBASE_CSFFW_GPU_METRICS_BUF_NAME); + if (!scheduler->gpu_metrics_tb) { + scheduler->kthread_running = false; + complete(&scheduler->kthread_signal); + kthread_stop(scheduler->gpuq_kthread); + scheduler->gpuq_kthread = NULL; + + kfree(scheduler->csg_slots); + scheduler->csg_slots = NULL; + + dev_err(kbdev->dev, "Failed to get the handler of gpu_metrics from trace buffer"); + return -ENOENT; + } +#endif /* CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD */ + + return kbase_csf_mcu_shared_regs_data_init(kbdev); } int kbase_csf_scheduler_early_init(struct kbase_device *kbdev) { struct kbase_csf_scheduler *scheduler = &kbdev->csf.scheduler; - scheduler->timer_enabled = true; + atomic_set(&scheduler->timer_enabled, true); - scheduler->wq = alloc_ordered_workqueue("csf_scheduler_wq", WQ_HIGHPRI); - if (!scheduler->wq) { - dev_err(kbdev->dev, "Failed to allocate scheduler workqueue\n"); - return -ENOMEM; - } scheduler->idle_wq = alloc_ordered_workqueue( "csf_scheduler_gpu_idle_wq", WQ_HIGHPRI); if (!scheduler->idle_wq) { - dev_err(kbdev->dev, - "Failed to allocate GPU idle scheduler workqueue\n"); - destroy_workqueue(kbdev->csf.scheduler.wq); + dev_err(kbdev->dev, "Failed to allocate GPU idle scheduler workqueue\n"); return -ENOMEM; } - 
INIT_WORK(&scheduler->tick_work, schedule_on_tick); - INIT_DEFERRABLE_WORK(&scheduler->tock_work, schedule_on_tock); + atomic_set(&scheduler->pending_tick_work, false); + atomic_set(&scheduler->pending_tock_work, false); INIT_DEFERRABLE_WORK(&scheduler->ping_work, firmware_aliveness_monitor); - mutex_init(&scheduler->lock); + rt_mutex_init(&scheduler->lock); spin_lock_init(&scheduler->interrupt_lock); /* Internal lists */ @@ -5756,30 +7488,48 @@ int kbase_csf_scheduler_early_init(struct kbase_device *kbdev) (sizeof(scheduler->csgs_events_enable_mask) * BITS_PER_BYTE)); bitmap_fill(scheduler->csgs_events_enable_mask, MAX_SUPPORTED_CSGS); scheduler->state = SCHED_SUSPENDED; + KBASE_KTRACE_ADD(kbdev, SCHED_SUSPENDED, NULL, scheduler->state); scheduler->pm_active_count = 0; scheduler->ngrp_to_schedule = 0; scheduler->total_runnable_grps = 0; scheduler->top_ctx = NULL; scheduler->top_grp = NULL; scheduler->last_schedule = 0; - scheduler->tock_pending_request = false; scheduler->active_protm_grp = NULL; scheduler->csg_scheduling_period_ms = CSF_SCHEDULER_TIME_TICK_MS; scheduler_doorbell_init(kbdev); +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + INIT_DEFERRABLE_WORK(&scheduler->gpu_idle_work, gpu_idle_worker); + INIT_WORK(&scheduler->sc_rails_off_work, sc_rails_off_worker); + scheduler->sc_power_rails_off = true; + scheduler->gpu_idle_work_pending = false; + scheduler->gpu_idle_fw_timer_enabled = false; +#else INIT_WORK(&scheduler->gpu_idle_work, gpu_idle_worker); +#endif + scheduler->fast_gpu_idle_handling = false; atomic_set(&scheduler->gpu_no_longer_idle, false); atomic_set(&scheduler->non_idle_offslot_grps, 0); hrtimer_init(&scheduler->tick_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL); scheduler->tick_timer.function = tick_timer_callback; - scheduler->tick_timer_active = false; + + kbase_csf_tiler_heap_reclaim_mgr_init(kbdev); return 0; } void kbase_csf_scheduler_term(struct kbase_device *kbdev) { + struct kbase_csf_scheduler *scheduler = &kbdev->csf.scheduler; + + if (scheduler->gpuq_kthread) { + scheduler->kthread_running = false; + complete(&scheduler->kthread_signal); + kthread_stop(scheduler->gpuq_kthread); + } + if (kbdev->csf.scheduler.csg_slots) { WARN_ON(atomic_read(&kbdev->csf.scheduler.non_idle_offslot_grps)); /* The unload of Driver can take place only when all contexts have @@ -5788,34 +7538,42 @@ void kbase_csf_scheduler_term(struct kbase_device *kbdev) * to be active at the time of Driver unload. */ WARN_ON(kbase_csf_scheduler_get_nr_active_csgs(kbdev)); +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + flush_work(&kbdev->csf.scheduler.sc_rails_off_work); + flush_delayed_work(&kbdev->csf.scheduler.gpu_idle_work); +#else flush_work(&kbdev->csf.scheduler.gpu_idle_work); - mutex_lock(&kbdev->csf.scheduler.lock); +#endif + rt_mutex_lock(&kbdev->csf.scheduler.lock); if (kbdev->csf.scheduler.state != SCHED_SUSPENDED) { + unsigned long flags; /* The power policy could prevent the Scheduler from * getting suspended when GPU becomes idle. 
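kbase_csf_scheduler_term() above stops the submission thread with a three-step handshake: clear kthread_running, post kthread_signal so the thread notices, then reap it with kthread_stop(). A kernel-style sketch of that shutdown (and the matching start-up), with made-up demo_* names and none of the driver's RT priority handling:

#include <linux/completion.h>
#include <linux/err.h>
#include <linux/kthread.h>
#include <linux/types.h>

struct demo_sched {
	struct task_struct *thread;
	struct completion signal;
	bool running;
};

static int demo_thread_fn(void *data)
{
	struct demo_sched *s = data;

	while (s->running) {
		if (wait_for_completion_interruptible(&s->signal) != 0)
			continue;
		reinit_completion(&s->signal);
		/* ... service pending kicks, ticks and tocks ... */
	}
	return 0;
}

static int demo_sched_start(struct demo_sched *s)
{
	init_completion(&s->signal);
	s->running = true;
	s->thread = kthread_run(demo_thread_fn, s, "demo-sched");
	return IS_ERR(s->thread) ? PTR_ERR(s->thread) : 0;
}

static void demo_sched_stop(struct demo_sched *s)
{
	s->running = false;
	complete(&s->signal);		/* wake the thread so it sees running == false */
	kthread_stop(s->thread);	/* reap the thread and collect its exit code */
}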
*/ + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); WARN_ON(kbase_pm_idle_groups_sched_suspendable(kbdev)); + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); scheduler_suspend(kbdev); } - mutex_unlock(&kbdev->csf.scheduler.lock); + rt_mutex_unlock(&kbdev->csf.scheduler.lock); cancel_delayed_work_sync(&kbdev->csf.scheduler.ping_work); - cancel_tick_timer(kbdev); - cancel_work_sync(&kbdev->csf.scheduler.tick_work); - cancel_tock_work(&kbdev->csf.scheduler); - mutex_destroy(&kbdev->csf.scheduler.lock); kfree(kbdev->csf.scheduler.csg_slots); kbdev->csf.scheduler.csg_slots = NULL; } + KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSF_GROUP_TERMINATED, NULL, + kbase_csf_scheduler_get_nr_active_csgs(kbdev)); + /* Terminating the MCU shared regions, following the release of slots */ + kbase_csf_mcu_shared_regs_data_term(kbdev); } void kbase_csf_scheduler_early_term(struct kbase_device *kbdev) { if (kbdev->csf.scheduler.idle_wq) destroy_workqueue(kbdev->csf.scheduler.idle_wq); - if (kbdev->csf.scheduler.wq) - destroy_workqueue(kbdev->csf.scheduler.wq); + + kbase_csf_tiler_heap_reclaim_mgr_term(kbdev); } /** @@ -5834,7 +7592,7 @@ static void scheduler_enable_tick_timer_nolock(struct kbase_device *kbdev) lockdep_assert_held(&kbdev->csf.scheduler.lock); - if (unlikely(!scheduler_timer_is_enabled_nolock(kbdev))) + if (unlikely(!kbase_csf_scheduler_timer_is_enabled(kbdev))) return; WARN_ON((scheduler->state != SCHED_INACTIVE) && @@ -5842,30 +7600,18 @@ static void scheduler_enable_tick_timer_nolock(struct kbase_device *kbdev) (scheduler->state != SCHED_SLEEPING)); if (scheduler->total_runnable_grps > 0) { - enqueue_tick_work(kbdev); + kbase_csf_scheduler_invoke_tick(kbdev); dev_dbg(kbdev->dev, "Re-enabling the scheduler timer\n"); } else if (scheduler->state != SCHED_SUSPENDED) { - enqueue_gpu_idle_work(scheduler); + enqueue_gpu_idle_work(scheduler, 0); } } void kbase_csf_scheduler_enable_tick_timer(struct kbase_device *kbdev) { - mutex_lock(&kbdev->csf.scheduler.lock); + rt_mutex_lock(&kbdev->csf.scheduler.lock); scheduler_enable_tick_timer_nolock(kbdev); - mutex_unlock(&kbdev->csf.scheduler.lock); -} - -bool kbase_csf_scheduler_timer_is_enabled(struct kbase_device *kbdev) -{ - struct kbase_csf_scheduler *scheduler = &kbdev->csf.scheduler; - bool enabled; - - mutex_lock(&scheduler->lock); - enabled = scheduler_timer_is_enabled_nolock(kbdev); - mutex_unlock(&scheduler->lock); - - return enabled; + rt_mutex_unlock(&kbdev->csf.scheduler.lock); } void kbase_csf_scheduler_timer_set_enabled(struct kbase_device *kbdev, @@ -5874,66 +7620,52 @@ void kbase_csf_scheduler_timer_set_enabled(struct kbase_device *kbdev, struct kbase_csf_scheduler *const scheduler = &kbdev->csf.scheduler; bool currently_enabled; - mutex_lock(&scheduler->lock); + /* This lock is taken to prevent this code being executed concurrently + * by userspace. + */ + rt_mutex_lock(&scheduler->lock); - currently_enabled = scheduler_timer_is_enabled_nolock(kbdev); + currently_enabled = kbase_csf_scheduler_timer_is_enabled(kbdev); if (currently_enabled && !enable) { - scheduler->timer_enabled = false; - cancel_tick_timer(kbdev); - cancel_delayed_work(&scheduler->tock_work); - scheduler->tock_pending_request = false; - mutex_unlock(&scheduler->lock); - /* The non-sync version to cancel the normal work item is not - * available, so need to drop the lock before cancellation. 
- */ - cancel_work_sync(&scheduler->tick_work); - return; - } - - if (!currently_enabled && enable) { - scheduler->timer_enabled = true; - - scheduler_enable_tick_timer_nolock(kbdev); + atomic_set(&scheduler->timer_enabled, false); + cancel_tick_work(scheduler); + } else if (!currently_enabled && enable) { + atomic_set(&scheduler->timer_enabled, true); + kbase_csf_scheduler_invoke_tick(kbdev); } - mutex_unlock(&scheduler->lock); + rt_mutex_unlock(&scheduler->lock); } void kbase_csf_scheduler_kick(struct kbase_device *kbdev) { struct kbase_csf_scheduler *scheduler = &kbdev->csf.scheduler; - mutex_lock(&scheduler->lock); + if (unlikely(kbase_csf_scheduler_timer_is_enabled(kbdev))) + return; - if (unlikely(scheduler_timer_is_enabled_nolock(kbdev))) - goto out; + /* This lock is taken to prevent this code being executed concurrently + * by userspace. + */ + rt_mutex_lock(&scheduler->lock); - if (scheduler->total_runnable_grps > 0) { - enqueue_tick_work(kbdev); - dev_dbg(kbdev->dev, "Kicking the scheduler manually\n"); - } + kbase_csf_scheduler_invoke_tick(kbdev); + dev_dbg(kbdev->dev, "Kicking the scheduler manually\n"); -out: - mutex_unlock(&scheduler->lock); + rt_mutex_unlock(&scheduler->lock); } -int kbase_csf_scheduler_pm_suspend(struct kbase_device *kbdev) +int kbase_csf_scheduler_pm_suspend_no_lock(struct kbase_device *kbdev) { - int result = 0; struct kbase_csf_scheduler *scheduler = &kbdev->csf.scheduler; + int result = 0; - /* Cancel any potential queued delayed work(s) */ - cancel_work_sync(&scheduler->tick_work); - cancel_tock_work(scheduler); - - result = kbase_reset_gpu_prevent_and_wait(kbdev); - if (result) { - dev_warn(kbdev->dev, - "Stop PM suspending for failing to prevent gpu reset.\n"); - return result; - } + lockdep_assert_held(&scheduler->lock); - mutex_lock(&scheduler->lock); +#if IS_ENABLED(CONFIG_DEBUG_FS) + if (unlikely(scheduler->state == SCHED_BUSY)) + return -EBUSY; +#endif #ifdef KBASE_PM_RUNTIME /* If scheduler is in sleeping state, then MCU needs to be activated @@ -5954,14 +7686,35 @@ int kbase_csf_scheduler_pm_suspend(struct kbase_device *kbdev) dev_warn(kbdev->dev, "failed to suspend active groups"); goto exit; } else { - dev_info(kbdev->dev, "Scheduler PM suspend"); + dev_dbg(kbdev->dev, "Scheduler PM suspend"); scheduler_suspend(kbdev); - cancel_tick_timer(kbdev); + cancel_tick_work(scheduler); } } exit: - mutex_unlock(&scheduler->lock); + return result; +} + +int kbase_csf_scheduler_pm_suspend(struct kbase_device *kbdev) +{ + int result = 0; + struct kbase_csf_scheduler *scheduler = &kbdev->csf.scheduler; + + /* Cancel any potential queued delayed work(s) */ + cancel_tick_work(scheduler); + cancel_tock_work(scheduler); + + result = kbase_reset_gpu_prevent_and_wait(kbdev); + if (result) { + dev_warn(kbdev->dev, "Stop PM suspending for failing to prevent gpu reset.\n"); + return result; + } + + rt_mutex_lock(&scheduler->lock); + + result = kbase_csf_scheduler_pm_suspend_no_lock(kbdev); + rt_mutex_unlock(&scheduler->lock); kbase_reset_gpu_allow(kbdev); @@ -5969,17 +7722,23 @@ exit: } KBASE_EXPORT_TEST_API(kbase_csf_scheduler_pm_suspend); -void kbase_csf_scheduler_pm_resume(struct kbase_device *kbdev) +void kbase_csf_scheduler_pm_resume_no_lock(struct kbase_device *kbdev) { struct kbase_csf_scheduler *scheduler = &kbdev->csf.scheduler; - mutex_lock(&scheduler->lock); + lockdep_assert_held(&scheduler->lock); if ((scheduler->total_runnable_grps > 0) && (scheduler->state == SCHED_SUSPENDED)) { - dev_info(kbdev->dev, "Scheduler PM resume"); + dev_dbg(kbdev->dev, 
"Scheduler PM resume"); scheduler_wakeup(kbdev, true); } - mutex_unlock(&scheduler->lock); +} + +void kbase_csf_scheduler_pm_resume(struct kbase_device *kbdev) +{ + rt_mutex_lock(&kbdev->csf.scheduler.lock); + kbase_csf_scheduler_pm_resume_no_lock(kbdev); + rt_mutex_unlock(&kbdev->csf.scheduler.lock); } KBASE_EXPORT_TEST_API(kbase_csf_scheduler_pm_resume); @@ -5989,10 +7748,10 @@ void kbase_csf_scheduler_pm_active(struct kbase_device *kbdev) * callback function, which may need to wake up the MCU for suspending * the CSGs before powering down the GPU. */ - mutex_lock(&kbdev->csf.scheduler.lock); + rt_mutex_lock(&kbdev->csf.scheduler.lock); scheduler_pm_active_handle_suspend(kbdev, KBASE_PM_SUSPEND_HANDLER_NOT_POSSIBLE); - mutex_unlock(&kbdev->csf.scheduler.lock); + rt_mutex_unlock(&kbdev->csf.scheduler.lock); } KBASE_EXPORT_TEST_API(kbase_csf_scheduler_pm_active); @@ -6001,13 +7760,13 @@ void kbase_csf_scheduler_pm_idle(struct kbase_device *kbdev) /* Here the lock is taken just to maintain symmetry with * kbase_csf_scheduler_pm_active(). */ - mutex_lock(&kbdev->csf.scheduler.lock); + rt_mutex_lock(&kbdev->csf.scheduler.lock); scheduler_pm_idle(kbdev); - mutex_unlock(&kbdev->csf.scheduler.lock); + rt_mutex_unlock(&kbdev->csf.scheduler.lock); } KBASE_EXPORT_TEST_API(kbase_csf_scheduler_pm_idle); -int kbase_csf_scheduler_wait_mcu_active(struct kbase_device *kbdev) +static int scheduler_wait_mcu_active(struct kbase_device *kbdev, bool killable_wait) { struct kbase_csf_scheduler *const scheduler = &kbdev->csf.scheduler; unsigned long flags; @@ -6020,9 +7779,17 @@ int kbase_csf_scheduler_wait_mcu_active(struct kbase_device *kbdev) spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); kbase_pm_unlock(kbdev); - kbase_pm_wait_for_poweroff_work_complete(kbdev); + if (killable_wait) + err = kbase_pm_killable_wait_for_poweroff_work_complete(kbdev); + else + err = kbase_pm_wait_for_poweroff_work_complete(kbdev); + if (err) + return err; - err = kbase_pm_wait_for_desired_state(kbdev); + if (killable_wait) + err = kbase_pm_killable_wait_for_desired_state(kbdev); + else + err = kbase_pm_wait_for_desired_state(kbdev); if (!err) { spin_lock_irqsave(&kbdev->hwaccess_lock, flags); WARN_ON(kbdev->pm.backend.mcu_state != KBASE_MCU_ON); @@ -6031,6 +7798,17 @@ int kbase_csf_scheduler_wait_mcu_active(struct kbase_device *kbdev) return err; } + +int kbase_csf_scheduler_killable_wait_mcu_active(struct kbase_device *kbdev) +{ + return scheduler_wait_mcu_active(kbdev, true); +} + +int kbase_csf_scheduler_wait_mcu_active(struct kbase_device *kbdev) +{ + return scheduler_wait_mcu_active(kbdev, false); +} + KBASE_EXPORT_TEST_API(kbase_csf_scheduler_wait_mcu_active); #ifdef KBASE_PM_RUNTIME @@ -6066,6 +7844,7 @@ int kbase_csf_scheduler_handle_runtime_suspend(struct kbase_device *kbdev) } scheduler->state = SCHED_SUSPENDED; + KBASE_KTRACE_ADD(kbdev, SCHED_SUSPENDED, NULL, scheduler->state); spin_lock_irqsave(&kbdev->hwaccess_lock, flags); kbdev->pm.backend.gpu_sleep_mode_active = false; spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); @@ -6107,11 +7886,10 @@ void kbase_csf_scheduler_force_sleep(struct kbase_device *kbdev) { struct kbase_csf_scheduler *const scheduler = &kbdev->csf.scheduler; - mutex_lock(&scheduler->lock); - if (kbase_pm_gpu_sleep_allowed(kbdev) && - (scheduler->state == SCHED_INACTIVE)) + rt_mutex_lock(&scheduler->lock); + if (kbase_pm_gpu_sleep_allowed(kbdev) && (scheduler->state == SCHED_INACTIVE)) scheduler_sleep_on_idle(kbdev); - mutex_unlock(&scheduler->lock); + rt_mutex_unlock(&scheduler->lock); 
} #endif @@ -6119,7 +7897,7 @@ void kbase_csf_scheduler_force_wakeup(struct kbase_device *kbdev) { struct kbase_csf_scheduler *const scheduler = &kbdev->csf.scheduler; - mutex_lock(&scheduler->lock); + rt_mutex_lock(&scheduler->lock); scheduler_wakeup(kbdev, true); - mutex_unlock(&scheduler->lock); + rt_mutex_unlock(&scheduler->lock); } diff --git a/mali_kbase/csf/mali_kbase_csf_scheduler.h b/mali_kbase/csf/mali_kbase_csf_scheduler.h index a00a9ca..88521f0 100644 --- a/mali_kbase/csf/mali_kbase_csf_scheduler.h +++ b/mali_kbase/csf/mali_kbase_csf_scheduler.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -36,7 +36,9 @@ * If the CSG is already scheduled and resident, the CSI will be started * right away, otherwise once the group is made resident. * - * Return: 0 on success, or negative on failure. + * Return: 0 on success, or negative on failure. -EBUSY is returned to + * indicate to the caller that queue could not be enabled due to Scheduler + * state and the caller can try to enable the queue after sometime. */ int kbase_csf_scheduler_queue_start(struct kbase_queue *queue); @@ -274,7 +276,7 @@ int kbase_csf_scheduler_group_copy_suspend_buf(struct kbase_queue_group *group, */ static inline void kbase_csf_scheduler_lock(struct kbase_device *kbdev) { - mutex_lock(&kbdev->csf.scheduler.lock); + rt_mutex_lock(&kbdev->csf.scheduler.lock); } /** @@ -284,7 +286,7 @@ static inline void kbase_csf_scheduler_lock(struct kbase_device *kbdev) */ static inline void kbase_csf_scheduler_unlock(struct kbase_device *kbdev) { - mutex_unlock(&kbdev->csf.scheduler.lock); + rt_mutex_unlock(&kbdev->csf.scheduler.lock); } /** @@ -336,7 +338,10 @@ kbase_csf_scheduler_spin_lock_assert_held(struct kbase_device *kbdev) * * Return: true if the scheduler is configured to wake up periodically */ -bool kbase_csf_scheduler_timer_is_enabled(struct kbase_device *kbdev); +static inline bool kbase_csf_scheduler_timer_is_enabled(struct kbase_device *kbdev) +{ + return atomic_read(&kbdev->csf.scheduler.timer_enabled); +} /** * kbase_csf_scheduler_timer_set_enabled() - Enable/disable periodic @@ -410,6 +415,33 @@ void kbase_csf_scheduler_pm_idle(struct kbase_device *kbdev); int kbase_csf_scheduler_wait_mcu_active(struct kbase_device *kbdev); /** + * kbase_csf_scheduler_killable_wait_mcu_active - Wait for the MCU to actually become + * active in killable state. + * + * @kbdev: Instance of a GPU platform device that implements a CSF interface. + * + * This function is same as kbase_csf_scheduler_wait_mcu_active(), expect that + * it would allow the SIGKILL signal to interrupt the wait. + * This function is supposed to be called from the code that is executed in ioctl or + * Userspace context, wherever it is safe to do so. + * + * Return: 0 if the MCU was successfully activated, or -ETIMEDOUT code on timeout error or + * -ERESTARTSYS if the wait was interrupted. + */ +int kbase_csf_scheduler_killable_wait_mcu_active(struct kbase_device *kbdev); + +/** + * kbase_csf_scheduler_pm_resume_no_lock - Reactivate the scheduler on system resume + * + * @kbdev: Instance of a GPU platform device that implements a CSF interface. 
+ * + * This function will make the scheduler resume the scheduling of queue groups + * and take the power managemenet reference, if there are any runnable groups. + * The caller must have acquired the global Scheduler lock. + */ +void kbase_csf_scheduler_pm_resume_no_lock(struct kbase_device *kbdev); + +/** * kbase_csf_scheduler_pm_resume - Reactivate the scheduler on system resume * * @kbdev: Instance of a GPU platform device that implements a CSF interface. @@ -420,6 +452,19 @@ int kbase_csf_scheduler_wait_mcu_active(struct kbase_device *kbdev); void kbase_csf_scheduler_pm_resume(struct kbase_device *kbdev); /** + * kbase_csf_scheduler_pm_suspend_no_lock - Idle the scheduler on system suspend + * + * @kbdev: Instance of a GPU platform device that implements a CSF interface. + * + * This function will make the scheduler suspend all the running queue groups + * and drop its power managemenet reference. + * The caller must have acquired the global Scheduler lock. + * + * Return: 0 on success. + */ +int kbase_csf_scheduler_pm_suspend_no_lock(struct kbase_device *kbdev); + +/** * kbase_csf_scheduler_pm_suspend - Idle the scheduler on system suspend * * @kbdev: Instance of a GPU platform device that implements a CSF interface. @@ -448,68 +493,44 @@ static inline bool kbase_csf_scheduler_all_csgs_idle(struct kbase_device *kbdev) } /** - * kbase_csf_scheduler_advance_tick_nolock() - Advance the scheduling tick + * kbase_csf_scheduler_invoke_tick() - Invoke the scheduling tick * * @kbdev: Pointer to the device * - * This function advances the scheduling tick by enqueing the tick work item for - * immediate execution, but only if the tick hrtimer is active. If the timer - * is inactive then the tick work item is already in flight. - * The caller must hold the interrupt lock. - */ -static inline void -kbase_csf_scheduler_advance_tick_nolock(struct kbase_device *kbdev) -{ - struct kbase_csf_scheduler *const scheduler = &kbdev->csf.scheduler; - - lockdep_assert_held(&scheduler->interrupt_lock); - - if (scheduler->tick_timer_active) { - KBASE_KTRACE_ADD(kbdev, SCHEDULER_ADVANCE_TICK, NULL, 0u); - scheduler->tick_timer_active = false; - queue_work(scheduler->wq, &scheduler->tick_work); - } else { - KBASE_KTRACE_ADD(kbdev, SCHEDULER_NOADVANCE_TICK, NULL, 0u); - } -} - -/** - * kbase_csf_scheduler_advance_tick() - Advance the scheduling tick - * - * @kbdev: Pointer to the device + * This function wakes up kbase_csf_scheduler_kthread() to perform a scheduling + * tick regardless of whether the tick timer is enabled. This can be called + * from interrupt context to resume the scheduling after GPU was put to sleep. * - * This function advances the scheduling tick by enqueing the tick work item for - * immediate execution, but only if the tick hrtimer is active. If the timer - * is inactive then the tick work item is already in flight. + * Caller is expected to check kbase_csf_scheduler.timer_enabled as required + * to see whether it is appropriate before calling this function. 
*/ -static inline void kbase_csf_scheduler_advance_tick(struct kbase_device *kbdev) +static inline void kbase_csf_scheduler_invoke_tick(struct kbase_device *kbdev) { struct kbase_csf_scheduler *const scheduler = &kbdev->csf.scheduler; - unsigned long flags; - spin_lock_irqsave(&scheduler->interrupt_lock, flags); - kbase_csf_scheduler_advance_tick_nolock(kbdev); - spin_unlock_irqrestore(&scheduler->interrupt_lock, flags); + KBASE_KTRACE_ADD(kbdev, SCHEDULER_TICK_INVOKE, NULL, 0u); + if (atomic_cmpxchg(&scheduler->pending_tick_work, false, true) == false) + complete(&scheduler->kthread_signal); } /** - * kbase_csf_scheduler_invoke_tick() - Invoke the scheduling tick + * kbase_csf_scheduler_invoke_tock() - Invoke the scheduling tock * * @kbdev: Pointer to the device * - * This function will queue the scheduling tick work item for immediate - * execution if tick timer is not active. This can be called from interrupt - * context to resume the scheduling after GPU was put to sleep. + * This function wakes up kbase_csf_scheduler_kthread() to perform a scheduling + * tock. + * + * Caller is expected to check kbase_csf_scheduler.timer_enabled as required + * to see whether it is appropriate before calling this function. */ -static inline void kbase_csf_scheduler_invoke_tick(struct kbase_device *kbdev) +static inline void kbase_csf_scheduler_invoke_tock(struct kbase_device *kbdev) { struct kbase_csf_scheduler *const scheduler = &kbdev->csf.scheduler; - unsigned long flags; - spin_lock_irqsave(&scheduler->interrupt_lock, flags); - if (!scheduler->tick_timer_active) - queue_work(scheduler->wq, &scheduler->tick_work); - spin_unlock_irqrestore(&scheduler->interrupt_lock, flags); + KBASE_KTRACE_ADD(kbdev, SCHEDULER_TOCK_INVOKE, NULL, 0u); + if (atomic_cmpxchg(&scheduler->pending_tock_work, false, true) == false) + complete(&scheduler->kthread_signal); } /** @@ -570,15 +591,6 @@ int kbase_csf_scheduler_handle_runtime_suspend(struct kbase_device *kbdev); #endif /** - * kbase_csf_scheduler_process_gpu_idle_event() - Process GPU idle IRQ - * - * @kbdev: Pointer to the device - * - * This function is called when a GPU idle IRQ has been raised. - */ -void kbase_csf_scheduler_process_gpu_idle_event(struct kbase_device *kbdev); - -/** * kbase_csf_scheduler_get_nr_active_csgs() - Get the number of active CSGs * * @kbdev: Pointer to the device @@ -634,4 +646,28 @@ void kbase_csf_scheduler_force_wakeup(struct kbase_device *kbdev); void kbase_csf_scheduler_force_sleep(struct kbase_device *kbdev); #endif +/** + * kbase_csf_scheduler_process_gpu_idle_event() - Process GPU idle event + * + * @kbdev: Pointer to the device + * + * This function is called when a IRQ for GPU idle event has been raised. + * + * Return: true if the GPU idle event can be acknowledged. + */ +bool kbase_csf_scheduler_process_gpu_idle_event(struct kbase_device *kbdev); + +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS +/** + * turn_on_sc_power_rails - Turn on the shader core power rails. + * + * @kbdev: Pointer to the device. + * + * This function is called to synchronously turn on the shader core power rails, + * before execution is resumed on the cores. 
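The new inline helpers above coalesce tick/tock requests: only the caller that flips the pending flag from false to true completes kthread_signal, and the scheduler thread later claims the flag back with another compare-and-swap, letting a due tick absorb any pending tock but never the other way round. A standalone model of that handshake using C11 atomics, with a counter standing in for the completion:

/* Build: cc -std=c11 tick_tock_demo.c */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static atomic_bool pending_tick, pending_tock;
static int wakeups;	/* stands in for complete(&kthread_signal) */

static void invoke_tick(void)
{
	bool expected = false;

	/* Only the caller that flips the flag wakes the thread, so
	 * repeated requests coalesce into a single tick.
	 */
	if (atomic_compare_exchange_strong(&pending_tick, &expected, true))
		wakeups++;
}

static void dispatch(void)
{
	bool due = true;

	if (atomic_compare_exchange_strong(&pending_tick, &due, false)) {
		atomic_store(&pending_tock, false);	/* a tick covers any pending tock */
		printf("full tick\n");
	} else if (atomic_load(&pending_tock)) {
		printf("tock only\n");
	}
}

int main(void)
{
	atomic_store(&pending_tock, true);
	invoke_tick();
	invoke_tick();			/* coalesced */
	printf("wakeups: %d\n", wakeups);
	dispatch();			/* tick runs, pending tock absorbed */
	dispatch();			/* nothing left to do */
	return 0;
}

Running this prints "wakeups: 1" followed by "full tick", matching the coalescing and tick-over-tock behaviour described in the scheduler thread's comments.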
+ * + * scheduler lock must be held when calling this function + */ +void turn_on_sc_power_rails(struct kbase_device *kbdev); +#endif #endif /* _KBASE_CSF_SCHEDULER_H_ */ diff --git a/mali_kbase/csf/mali_kbase_csf_sync_debugfs.c b/mali_kbase/csf/mali_kbase_csf_sync_debugfs.c new file mode 100644 index 0000000..0615d5f --- /dev/null +++ b/mali_kbase/csf/mali_kbase_csf_sync_debugfs.c @@ -0,0 +1,878 @@ +// SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note +/* + * + * (C) COPYRIGHT 2022-2023 ARM Limited. All rights reserved. + * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. + * + */ + +#include "mali_kbase_csf_sync_debugfs.h" +#include "mali_kbase_csf_csg_debugfs.h" +#include <mali_kbase.h> +#include <linux/seq_file.h> +#include <linux/version_compat_defs.h> + +#if IS_ENABLED(CONFIG_SYNC_FILE) +#include "mali_kbase_sync.h" +#endif + +#define CQS_UNREADABLE_LIVE_VALUE "(unavailable)" + +#define CSF_SYNC_DUMP_SIZE 256 + +/** + * kbasep_print() - Helper function to print to either debugfs file or dmesg. + * + * @kctx: The kbase context + * @file: The seq_file for printing to. This is NULL if printing to dmesg. + * @fmt: The message to print. + * @...: Arguments to format the message. + */ +__attribute__((format(__printf__, 3, 4))) static void +kbasep_print(struct kbase_context *kctx, struct seq_file *file, const char *fmt, ...) +{ + int len = 0; + char buffer[CSF_SYNC_DUMP_SIZE]; + va_list arglist; + + va_start(arglist, fmt); + len = vsnprintf(buffer, CSF_SYNC_DUMP_SIZE, fmt, arglist); + if (len <= 0) { + pr_err("message write to the buffer failed"); + goto exit; + } + + if (file) + seq_printf(file, buffer); + else + dev_warn(kctx->kbdev->dev, buffer); + +exit: + va_end(arglist); +} + +/** + * kbasep_csf_debugfs_get_cqs_live_u32() - Obtain live (u32) value for a CQS object. + * + * @kctx: The context of the queue. + * @obj_addr: Pointer to the CQS live 32-bit value. + * @live_val: Pointer to the u32 that will be set to the CQS object's current, live + * value. + * + * Return: 0 if successful or a negative error code on failure. + */ +static int kbasep_csf_debugfs_get_cqs_live_u32(struct kbase_context *kctx, u64 obj_addr, + u32 *live_val) +{ + struct kbase_vmap_struct *mapping; + u32 *const cpu_ptr = (u32 *)kbase_phy_alloc_mapping_get(kctx, obj_addr, &mapping); + + if (!cpu_ptr) + return -1; + + *live_val = *cpu_ptr; + kbase_phy_alloc_mapping_put(kctx, mapping); + return 0; +} + +/** + * kbasep_csf_debugfs_get_cqs_live_u64() - Obtain live (u64) value for a CQS object. + * + * @kctx: The context of the queue. + * @obj_addr: Pointer to the CQS live value (32 or 64-bit). + * @live_val: Pointer to the u64 that will be set to the CQS object's current, live + * value. + * + * Return: 0 if successful or a negative error code on failure. 
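kbasep_print() above formats once into a fixed CSF_SYNC_DUMP_SIZE buffer with vsnprintf() and then routes the result either to the debugfs seq_file or to dmesg. A standalone sketch of the same dual-sink idea, using a FILE pointer and stderr in place of seq_file and dev_warn():

/* Build: cc -std=c11 print_route_demo.c */
#include <stdarg.h>
#include <stdio.h>

#define DUMP_SIZE 256

__attribute__((format(printf, 2, 3)))
static void demo_print(FILE *dump_file, const char *fmt, ...)
{
	char buffer[DUMP_SIZE];
	va_list args;
	int len;

	va_start(args, fmt);
	len = vsnprintf(buffer, sizeof(buffer), fmt, args);
	va_end(args);

	if (len <= 0)
		return;

	/* Emit the already-formatted text to whichever sink is available. */
	fputs(buffer, dump_file ? dump_file : stderr);
}

int main(void)
{
	demo_print(NULL, "queue:KCPU-%d-%d exec:%c\n", 3, 0, 'P');
	return 0;
}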
+ */ +static int kbasep_csf_debugfs_get_cqs_live_u64(struct kbase_context *kctx, u64 obj_addr, + u64 *live_val) +{ + struct kbase_vmap_struct *mapping; + u64 *cpu_ptr = (u64 *)kbase_phy_alloc_mapping_get(kctx, obj_addr, &mapping); + + if (!cpu_ptr) + return -1; + + *live_val = *cpu_ptr; + kbase_phy_alloc_mapping_put(kctx, mapping); + return 0; +} + +/** + * kbasep_csf_sync_print_kcpu_fence_wait_or_signal() - Print details of a CSF SYNC Fence Wait + * or Fence Signal command, contained in a + * KCPU queue. + * + * @buffer: The buffer to write to. + * @length: The length of text in the buffer. + * @cmd: The KCPU Command to be printed. + * @cmd_name: The name of the command: indicates either a fence SIGNAL or WAIT. + */ +static void kbasep_csf_sync_print_kcpu_fence_wait_or_signal(char *buffer, int *length, + struct kbase_kcpu_command *cmd, + const char *cmd_name) +{ +#if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) + struct fence *fence = NULL; +#else + struct dma_fence *fence = NULL; +#endif /* LINUX_VERSION_CODE < KERNEL_VERSION(4, 10, 0) */ + struct kbase_kcpu_command_fence_info *fence_info; + struct kbase_sync_fence_info info; + const char *timeline_name = NULL; + bool is_signaled = false; + + fence_info = &cmd->info.fence; + if (kbase_kcpu_command_fence_has_force_signaled(fence_info)) + return; + + fence = kbase_fence_get(fence_info); + if (WARN_ON(!fence)) + return; + + kbase_sync_fence_info_get(fence, &info); + timeline_name = fence->ops->get_timeline_name(fence); + is_signaled = info.status > 0; + + *length += snprintf(buffer + *length, CSF_SYNC_DUMP_SIZE - *length, + "cmd:%s obj:0x%pK live_value:0x%.8x | ", cmd_name, fence, is_signaled); + + /* Note: fence->seqno was u32 until 5.1 kernel, then u64 */ + *length += snprintf(buffer + *length, CSF_SYNC_DUMP_SIZE - *length, + "timeline_name:%s timeline_context:0x%.16llx fence_seqno:0x%.16llx", + timeline_name, fence->context, (u64)fence->seqno); + + kbase_fence_put(fence); +} + +/** + * kbasep_csf_sync_print_kcpu_cqs_wait() - Print details of a CSF SYNC CQS Wait command, + * contained in a KCPU queue. + * + * @kctx: The kbase context. + * @buffer: The buffer to write to. + * @length: The length of text in the buffer. + * @cmd: The KCPU Command to be printed. + */ +static void kbasep_csf_sync_print_kcpu_cqs_wait(struct kbase_context *kctx, char *buffer, + int *length, struct kbase_kcpu_command *cmd) +{ + size_t i; + + for (i = 0; i < cmd->info.cqs_wait.nr_objs; i++) { + struct base_cqs_wait_info *cqs_obj = &cmd->info.cqs_wait.objs[i]; + + u32 live_val; + int ret = kbasep_csf_debugfs_get_cqs_live_u32(kctx, cqs_obj->addr, &live_val); + bool live_val_valid = (ret >= 0); + + *length += + snprintf(buffer + *length, CSF_SYNC_DUMP_SIZE - *length, + "cmd:CQS_WAIT_OPERATION obj:0x%.16llx live_value:", cqs_obj->addr); + + if (live_val_valid) + *length += snprintf(buffer + *length, CSF_SYNC_DUMP_SIZE - *length, + "0x%.16llx", (u64)live_val); + else + *length += snprintf(buffer + *length, CSF_SYNC_DUMP_SIZE - *length, + CQS_UNREADABLE_LIVE_VALUE); + + *length += snprintf(buffer + *length, CSF_SYNC_DUMP_SIZE - *length, + " | op:gt arg_value:0x%.8x", cqs_obj->val); + } +} + +/** + * kbasep_csf_sync_print_kcpu_cqs_set() - Print details of a CSF SYNC CQS + * Set command, contained in a KCPU queue. + * + * @kctx: The kbase context. + * @buffer: The buffer to write to. + * @length: The length of text in the buffer. + * @cmd: The KCPU Command to be printed. 
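The per-command printers that follow all build one output line by chaining snprintf() calls and advancing a running length, as in "*length += snprintf(buffer + *length, CSF_SYNC_DUMP_SIZE - *length, ...)". A small standalone version of that append pattern, with a guard so the offset never walks past the end of the buffer:

/* Build: cc -std=c11 snprintf_chain_demo.c */
#include <stdio.h>

#define DUMP_SIZE 256

/* Append one formatted field and return the new running length. */
static int append(char *buf, int len, const char *fmt, unsigned long long v)
{
	if (len < 0 || len >= DUMP_SIZE)
		return len;	/* buffer already full, stop writing */
	return len + snprintf(buf + len, DUMP_SIZE - len, fmt, v);
}

int main(void)
{
	char line[DUMP_SIZE];
	int len = 0;

	len = append(line, len, "obj:0x%.16llx ", 0x41000ULL);
	len = append(line, len, "live_value:0x%.16llx", 5ULL);
	puts(line);
	return 0;
}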
+ */ +static void kbasep_csf_sync_print_kcpu_cqs_set(struct kbase_context *kctx, char *buffer, + int *length, struct kbase_kcpu_command *cmd) +{ + size_t i; + + for (i = 0; i < cmd->info.cqs_set.nr_objs; i++) { + struct base_cqs_set *cqs_obj = &cmd->info.cqs_set.objs[i]; + + u32 live_val; + int ret = kbasep_csf_debugfs_get_cqs_live_u32(kctx, cqs_obj->addr, &live_val); + bool live_val_valid = (ret >= 0); + + *length += + snprintf(buffer + *length, CSF_SYNC_DUMP_SIZE - *length, + "cmd:CQS_SET_OPERATION obj:0x%.16llx live_value:", cqs_obj->addr); + + if (live_val_valid) + *length += snprintf(buffer + *length, CSF_SYNC_DUMP_SIZE - *length, + "0x%.16llx", (u64)live_val); + else + *length += snprintf(buffer + *length, CSF_SYNC_DUMP_SIZE - *length, + CQS_UNREADABLE_LIVE_VALUE); + + *length += snprintf(buffer + *length, CSF_SYNC_DUMP_SIZE - *length, + " | op:add arg_value:0x%.8x", 1); + } +} + +/** + * kbasep_csf_sync_get_wait_op_name() - Print the name of a CQS Wait Operation. + * + * @op: The numerical value of operation. + * + * Return: const static pointer to the command name, or '??' if unknown. + */ +static const char *kbasep_csf_sync_get_wait_op_name(basep_cqs_wait_operation_op op) +{ + const char *string; + + switch (op) { + case BASEP_CQS_WAIT_OPERATION_LE: + string = "le"; + break; + case BASEP_CQS_WAIT_OPERATION_GT: + string = "gt"; + break; + default: + string = "??"; + break; + } + return string; +} + +/** + * kbasep_csf_sync_get_set_op_name() - Print the name of a CQS Set Operation. + * + * @op: The numerical value of operation. + * + * Return: const static pointer to the command name, or '??' if unknown. + */ +static const char *kbasep_csf_sync_get_set_op_name(basep_cqs_set_operation_op op) +{ + const char *string; + + switch (op) { + case BASEP_CQS_SET_OPERATION_ADD: + string = "add"; + break; + case BASEP_CQS_SET_OPERATION_SET: + string = "set"; + break; + default: + string = "???"; + break; + } + return string; +} + +/** + * kbasep_csf_sync_print_kcpu_cqs_wait_op() - Print details of a CSF SYNC CQS + * Wait Operation command, contained + * in a KCPU queue. + * + * @kctx: The kbase context. + * @buffer: The buffer to write to. + * @length: The length of text in the buffer. + * @cmd: The KCPU Command to be printed. + */ +static void kbasep_csf_sync_print_kcpu_cqs_wait_op(struct kbase_context *kctx, char *buffer, + int *length, struct kbase_kcpu_command *cmd) +{ + size_t i; + + for (i = 0; i < cmd->info.cqs_wait.nr_objs; i++) { + struct base_cqs_wait_operation_info *wait_op = + &cmd->info.cqs_wait_operation.objs[i]; + const char *op_name = kbasep_csf_sync_get_wait_op_name(wait_op->operation); + + u64 live_val; + int ret = kbasep_csf_debugfs_get_cqs_live_u64(kctx, wait_op->addr, &live_val); + + bool live_val_valid = (ret >= 0); + + *length += + snprintf(buffer + *length, CSF_SYNC_DUMP_SIZE - *length, + "cmd:CQS_WAIT_OPERATION obj:0x%.16llx live_value:", wait_op->addr); + + if (live_val_valid) + *length += snprintf(buffer + *length, CSF_SYNC_DUMP_SIZE - *length, + "0x%.16llx", live_val); + else + *length += snprintf(buffer + *length, CSF_SYNC_DUMP_SIZE - *length, + CQS_UNREADABLE_LIVE_VALUE); + + *length += snprintf(buffer + *length, CSF_SYNC_DUMP_SIZE - *length, + " | op:%s arg_value:0x%.16llx", op_name, wait_op->val); + } +} + +/** + * kbasep_csf_sync_print_kcpu_cqs_set_op() - Print details of a CSF SYNC CQS + * Set Operation command, contained + * in a KCPU queue. + * + * @kctx: The kbase context. + * @buffer: The buffer to write to. + * @length: The length of text in the buffer. 
+ * @cmd: The KCPU Command to be printed. + */ +static void kbasep_csf_sync_print_kcpu_cqs_set_op(struct kbase_context *kctx, char *buffer, + int *length, struct kbase_kcpu_command *cmd) +{ + size_t i; + + for (i = 0; i < cmd->info.cqs_set_operation.nr_objs; i++) { + struct base_cqs_set_operation_info *set_op = &cmd->info.cqs_set_operation.objs[i]; + const char *op_name = kbasep_csf_sync_get_set_op_name( + (basep_cqs_set_operation_op)set_op->operation); + + u64 live_val; + int ret = kbasep_csf_debugfs_get_cqs_live_u64(kctx, set_op->addr, &live_val); + + bool live_val_valid = (ret >= 0); + + *length += + snprintf(buffer + *length, CSF_SYNC_DUMP_SIZE - *length, + "cmd:CQS_SET_OPERATION obj:0x%.16llx live_value:", set_op->addr); + + if (live_val_valid) + *length += snprintf(buffer + *length, CSF_SYNC_DUMP_SIZE - *length, + "0x%.16llx", live_val); + else + *length += snprintf(buffer + *length, CSF_SYNC_DUMP_SIZE - *length, + CQS_UNREADABLE_LIVE_VALUE); + + *length += snprintf(buffer + *length, CSF_SYNC_DUMP_SIZE - *length, + " | op:%s arg_value:0x%.16llx", op_name, set_op->val); + } +} + +/** + * kbasep_csf_sync_kcpu_debugfs_print_queue() - Print debug data for a KCPU queue + * + * @kctx: The kbase context. + * @file: The seq_file to print to. + * @queue: Pointer to the KCPU queue. + */ +static void kbasep_csf_sync_kcpu_debugfs_print_queue(struct kbase_context *kctx, + struct seq_file *file, + struct kbase_kcpu_command_queue *queue) +{ + char started_or_pending; + struct kbase_kcpu_command *cmd; + size_t i; + + if (WARN_ON(!queue)) + return; + + lockdep_assert_held(&kctx->csf.kcpu_queues.lock); + mutex_lock(&queue->lock); + + for (i = 0; i != queue->num_pending_cmds; ++i) { + char buffer[CSF_SYNC_DUMP_SIZE]; + int length = 0; + started_or_pending = ((i == 0) && queue->command_started) ? 
'S' : 'P'; + length += snprintf(buffer, CSF_SYNC_DUMP_SIZE, "queue:KCPU-%d-%d exec:%c ", + kctx->id, queue->id, started_or_pending); + + cmd = &queue->commands[(u8)(queue->start_offset + i)]; + switch (cmd->type) { +#if IS_ENABLED(CONFIG_SYNC_FILE) + case BASE_KCPU_COMMAND_TYPE_FENCE_SIGNAL: + kbasep_csf_sync_print_kcpu_fence_wait_or_signal(buffer, &length, cmd, + "FENCE_SIGNAL"); + break; + case BASE_KCPU_COMMAND_TYPE_FENCE_WAIT: + kbasep_csf_sync_print_kcpu_fence_wait_or_signal(buffer, &length, cmd, + "FENCE_WAIT"); + break; +#endif + case BASE_KCPU_COMMAND_TYPE_CQS_WAIT: + kbasep_csf_sync_print_kcpu_cqs_wait(kctx, buffer, &length, cmd); + break; + case BASE_KCPU_COMMAND_TYPE_CQS_SET: + kbasep_csf_sync_print_kcpu_cqs_set(kctx, buffer, &length, cmd); + break; + case BASE_KCPU_COMMAND_TYPE_CQS_WAIT_OPERATION: + kbasep_csf_sync_print_kcpu_cqs_wait_op(kctx, buffer, &length, cmd); + break; + case BASE_KCPU_COMMAND_TYPE_CQS_SET_OPERATION: + kbasep_csf_sync_print_kcpu_cqs_set_op(kctx, buffer, &length, cmd); + break; + default: + length += snprintf(buffer + length, CSF_SYNC_DUMP_SIZE - length, + ", U, Unknown blocking command"); + break; + } + + length += snprintf(buffer + length, CSF_SYNC_DUMP_SIZE - length, "\n"); + kbasep_print(kctx, file, buffer); + } + + mutex_unlock(&queue->lock); +} + +int kbasep_csf_sync_kcpu_dump_locked(struct kbase_context *kctx, struct seq_file *file) +{ + unsigned long queue_idx; + + lockdep_assert_held(&kctx->csf.kcpu_queues.lock); + + kbasep_print(kctx, file, "KCPU queues for ctx %d:\n", kctx->id); + + queue_idx = find_first_bit(kctx->csf.kcpu_queues.in_use, KBASEP_MAX_KCPU_QUEUES); + + while (queue_idx < KBASEP_MAX_KCPU_QUEUES) { + kbasep_csf_sync_kcpu_debugfs_print_queue(kctx, file, + kctx->csf.kcpu_queues.array[queue_idx]); + + queue_idx = find_next_bit(kctx->csf.kcpu_queues.in_use, KBASEP_MAX_KCPU_QUEUES, + queue_idx + 1); + } + + return 0; +} + +int kbasep_csf_sync_kcpu_dump(struct kbase_context *kctx, struct seq_file *file) +{ + mutex_lock(&kctx->csf.kcpu_queues.lock); + kbasep_csf_sync_kcpu_dump_locked(kctx, file); + mutex_unlock(&kctx->csf.kcpu_queues.lock); + return 0; +} + +#if IS_ENABLED(CONFIG_DEBUG_FS) + +/* GPU queue related values */ +#define GPU_CSF_MOVE_OPCODE ((u64)0x1) +#define GPU_CSF_MOVE32_OPCODE ((u64)0x2) +#define GPU_CSF_SYNC_ADD_OPCODE ((u64)0x25) +#define GPU_CSF_SYNC_SET_OPCODE ((u64)0x26) +#define GPU_CSF_SYNC_WAIT_OPCODE ((u64)0x27) +#define GPU_CSF_SYNC_ADD64_OPCODE ((u64)0x33) +#define GPU_CSF_SYNC_SET64_OPCODE ((u64)0x34) +#define GPU_CSF_SYNC_WAIT64_OPCODE ((u64)0x35) +#define GPU_CSF_CALL_OPCODE ((u64)0x20) + +#define MAX_NR_GPU_CALLS (5) +#define INSTR_OPCODE_MASK ((u64)0xFF << 56) +#define INSTR_OPCODE_GET(value) ((value & INSTR_OPCODE_MASK) >> 56) +#define MOVE32_IMM_MASK ((u64)0xFFFFFFFFFUL) +#define MOVE_DEST_MASK ((u64)0xFF << 48) +#define MOVE_DEST_GET(value) ((value & MOVE_DEST_MASK) >> 48) +#define MOVE_IMM_MASK ((u64)0xFFFFFFFFFFFFUL) +#define SYNC_SRC0_MASK ((u64)0xFF << 40) +#define SYNC_SRC1_MASK ((u64)0xFF << 32) +#define SYNC_SRC0_GET(value) (u8)((value & SYNC_SRC0_MASK) >> 40) +#define SYNC_SRC1_GET(value) (u8)((value & SYNC_SRC1_MASK) >> 32) +#define SYNC_WAIT_CONDITION_MASK ((u64)0xF << 28) +#define SYNC_WAIT_CONDITION_GET(value) (u8)((value & SYNC_WAIT_CONDITION_MASK) >> 28) + +/* Enumeration for types of GPU queue sync events for + * the purpose of dumping them through debugfs. 
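The INSTR_OPCODE_GET()/SYNC_SRC*_GET() macros above pull fixed-width fields out of a 64-bit CSF instruction word with shifts and masks: the opcode sits in bits 63..56, the MOVE destination in 55..48, the SYNC source registers in 47..40 and 39..32, and the wait condition in 31..28. A standalone illustration of the same decoding (the instruction word below is made up, not a captured one):

/* Build: cc -std=c11 csf_decode_demo.c */
#include <stdint.h>
#include <stdio.h>

static inline uint8_t field8(uint64_t instr, unsigned int shift)
{
	return (uint8_t)((instr >> shift) & 0xFF);
}

int main(void)
{
	/* opcode in bits 63..56, SRC0 in 47..40, SRC1 in 39..32 */
	uint64_t instr = ((uint64_t)0x27 << 56) |	/* SYNC_WAIT opcode */
			 ((uint64_t)0x10 << 40) |	/* register holding the CQS address */
			 ((uint64_t)0x11 << 32);	/* register holding the compare value */

	printf("opcode=0x%02x src0=r%u src1=r%u\n",
	       field8(instr, 56), field8(instr, 40), field8(instr, 32));
	return 0;
}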
+ */ +enum debugfs_gpu_sync_type { + DEBUGFS_GPU_SYNC_WAIT, + DEBUGFS_GPU_SYNC_SET, + DEBUGFS_GPU_SYNC_ADD, + NUM_DEBUGFS_GPU_SYNC_TYPES +}; + +/** + * kbasep_csf_get_move_immediate_value() - Get the immediate values for sync operations + * from a MOVE instruction. + * + * @move_cmd: Raw MOVE instruction. + * @sync_addr_reg: Register identifier from SYNC_* instruction. + * @compare_val_reg: Register identifier from SYNC_* instruction. + * @sync_val: Pointer to store CQS object address for sync operation. + * @compare_val: Pointer to store compare value for sync operation. + * + * Return: True if value is obtained by checking for correct register identifier, + * or false otherwise. + */ +static bool kbasep_csf_get_move_immediate_value(u64 move_cmd, u64 sync_addr_reg, + u64 compare_val_reg, u64 *sync_val, + u64 *compare_val) +{ + u64 imm_mask; + + /* Verify MOVE instruction and get immediate mask */ + if (INSTR_OPCODE_GET(move_cmd) == GPU_CSF_MOVE32_OPCODE) + imm_mask = MOVE32_IMM_MASK; + else if (INSTR_OPCODE_GET(move_cmd) == GPU_CSF_MOVE_OPCODE) + imm_mask = MOVE_IMM_MASK; + else + /* Error return */ + return false; + + /* Verify value from MOVE instruction and assign to variable */ + if (sync_addr_reg == MOVE_DEST_GET(move_cmd)) + *sync_val = move_cmd & imm_mask; + else if (compare_val_reg == MOVE_DEST_GET(move_cmd)) + *compare_val = move_cmd & imm_mask; + else + /* Error return */ + return false; + + return true; +} + +/** kbasep_csf_read_ringbuffer_value() - Reads a u64 from the ringbuffer at a provided + * offset. + * + * @queue: Pointer to the queue. + * @ringbuff_offset: Ringbuffer offset. + * + * Return: the u64 in the ringbuffer at the desired offset. + */ +static u64 kbasep_csf_read_ringbuffer_value(struct kbase_queue *queue, u32 ringbuff_offset) +{ + u64 page_off = ringbuff_offset >> PAGE_SHIFT; + u64 offset_within_page = ringbuff_offset & ~PAGE_MASK; + struct page *page = as_page(queue->queue_reg->gpu_alloc->pages[page_off]); + u64 *ringbuffer = vmap(&page, 1, VM_MAP, pgprot_noncached(PAGE_KERNEL)); + u64 value; + + if (!ringbuffer) { + struct kbase_context *kctx = queue->kctx; + + dev_err(kctx->kbdev->dev, "%s failed to map the buffer page for read a command!", + __func__); + /* Return an alternative 0 for dumpping operation*/ + value = 0; + } else { + value = ringbuffer[offset_within_page / sizeof(u64)]; + vunmap(ringbuffer); + } + + return value; +} + +/** + * kbasep_csf_print_gpu_sync_op() - Print sync operation info for given sync command. + * + * @file: Pointer to debugfs seq_file file struct for writing output. + * @kctx: Pointer to kbase context. + * @queue: Pointer to the GPU command queue. + * @ringbuff_offset: Offset to index the ring buffer with, for the given sync command. + * (Useful for finding preceding MOVE commands) + * @sync_cmd: Entire u64 of the sync command, which has both sync address and + * comparison-value encoded in it. + * @type: Type of GPU sync command (e.g. SYNC_SET, SYNC_ADD, SYNC_WAIT). + * @is_64bit: Bool to indicate if operation is 64 bit (true) or 32 bit (false). + * @follows_wait: Bool to indicate if the operation follows at least one wait + * operation. Used to determine whether it's pending or started. 
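kbasep_csf_read_ringbuffer_value() below maps a single backing page of the ring buffer, so a byte offset is first split into a page index (offset >> PAGE_SHIFT) and an offset within that page before indexing the mapped u64 array. The same arithmetic in standalone form, assuming 4 KiB pages purely for the example:

/* Build: cc -std=c11 ring_page_demo.c */
#include <stdint.h>
#include <stdio.h>

#define DEMO_PAGE_SHIFT 12u
#define DEMO_PAGE_SIZE  (1u << DEMO_PAGE_SHIFT)

int main(void)
{
	uint32_t ringbuff_offset = 0x1A38;	/* arbitrary example offset */
	uint32_t page_idx = ringbuff_offset >> DEMO_PAGE_SHIFT;
	uint32_t in_page = ringbuff_offset & (DEMO_PAGE_SIZE - 1);

	printf("page %u, byte %u, u64 slot %u\n",
	       page_idx, in_page, (unsigned int)(in_page / sizeof(uint64_t)));
	return 0;
}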
+ */ +static void kbasep_csf_print_gpu_sync_op(struct seq_file *file, struct kbase_context *kctx, + struct kbase_queue *queue, u32 ringbuff_offset, + u64 sync_cmd, enum debugfs_gpu_sync_type type, + bool is_64bit, bool follows_wait) +{ + u64 sync_addr = 0, compare_val = 0, live_val = 0; + u64 move_cmd; + u8 sync_addr_reg, compare_val_reg, wait_condition = 0; + int err; + + static const char *const gpu_sync_type_name[] = { "SYNC_WAIT", "SYNC_SET", "SYNC_ADD" }; + static const char *const gpu_sync_type_op[] = { + "wait", /* This should never be printed, only included to simplify indexing */ + "set", "add" + }; + + if (type >= NUM_DEBUGFS_GPU_SYNC_TYPES) { + dev_warn(kctx->kbdev->dev, "Expected GPU queue sync type is unknown!"); + return; + } + + /* We expect there to be at least 2 preceding MOVE instructions, and + * Base will always arrange for the 2 MOVE + SYNC instructions to be + * contiguously located, and is therefore never expected to be wrapped + * around the ringbuffer boundary. + */ + if (unlikely(ringbuff_offset < (2 * sizeof(u64)))) { + dev_warn(kctx->kbdev->dev, + "Unexpected wraparound detected between %s & MOVE instruction", + gpu_sync_type_name[type]); + return; + } + + /* 1. Get Register identifiers from SYNC_* instruction */ + sync_addr_reg = SYNC_SRC0_GET(sync_cmd); + compare_val_reg = SYNC_SRC1_GET(sync_cmd); + + /* 2. Get values from first MOVE command */ + ringbuff_offset -= sizeof(u64); + move_cmd = kbasep_csf_read_ringbuffer_value(queue, ringbuff_offset); + if (!kbasep_csf_get_move_immediate_value(move_cmd, sync_addr_reg, compare_val_reg, + &sync_addr, &compare_val)) + return; + + /* 3. Get values from next MOVE command */ + ringbuff_offset -= sizeof(u64); + move_cmd = kbasep_csf_read_ringbuffer_value(queue, ringbuff_offset); + if (!kbasep_csf_get_move_immediate_value(move_cmd, sync_addr_reg, compare_val_reg, + &sync_addr, &compare_val)) + return; + + /* 4. Get CQS object value */ + if (is_64bit) + err = kbasep_csf_debugfs_get_cqs_live_u64(kctx, sync_addr, &live_val); + else + err = kbasep_csf_debugfs_get_cqs_live_u32(kctx, sync_addr, (u32 *)(&live_val)); + + if (err) + return; + + /* 5. Print info */ + kbasep_print(kctx, file, "queue:GPU-%u-%u-%u exec:%c cmd:%s ", kctx->id, + queue->group->handle, queue->csi_index, + queue->enabled && !follows_wait ? 'S' : 'P', gpu_sync_type_name[type]); + + if (queue->group->csg_nr == KBASEP_CSG_NR_INVALID) + kbasep_print(kctx, file, "slot:-"); + else + kbasep_print(kctx, file, "slot:%d", (int)queue->group->csg_nr); + + kbasep_print(kctx, file, " obj:0x%.16llx live_value:0x%.16llx | ", sync_addr, live_val); + + if (type == DEBUGFS_GPU_SYNC_WAIT) { + wait_condition = SYNC_WAIT_CONDITION_GET(sync_cmd); + kbasep_print(kctx, file, "op:%s ", + kbasep_csf_sync_get_wait_op_name(wait_condition)); + } else + kbasep_print(kctx, file, "op:%s ", gpu_sync_type_op[type]); + + kbasep_print(kctx, file, "arg_value:0x%.16llx\n", compare_val); +} + +/** + * kbasep_csf_dump_active_queue_sync_info() - Print GPU command queue sync information. + * + * @file: seq_file for printing to. + * @queue: Address of a GPU command queue to examine. + * + * This function will iterate through each command in the ring buffer of the given GPU queue from + * CS_EXTRACT, and if is a SYNC_* instruction it will attempt to decode the sync operation and + * print relevant information to the debugfs file. + * This function will stop iterating once the CS_INSERT address is reached by the cursor (i.e. 
+ * when there are no more commands to view) or a number of consumed GPU CALL commands have + * been observed. + */ +static void kbasep_csf_dump_active_queue_sync_info(struct seq_file *file, struct kbase_queue *queue) +{ + struct kbase_context *kctx; + u64 *addr; + u64 cs_extract, cs_insert, instr, cursor; + bool follows_wait = false; + int nr_calls = 0; + + if (!queue) + return; + + kctx = queue->kctx; + + addr = queue->user_io_addr; + cs_insert = addr[CS_INSERT_LO / sizeof(*addr)]; + + addr = queue->user_io_addr + PAGE_SIZE / sizeof(*addr); + cs_extract = addr[CS_EXTRACT_LO / sizeof(*addr)]; + + cursor = cs_extract; + + if (!is_power_of_2(queue->size)) { + dev_warn(kctx->kbdev->dev, "GPU queue %u size of %u not a power of 2", + queue->csi_index, queue->size); + return; + } + + while ((cursor < cs_insert) && (nr_calls < MAX_NR_GPU_CALLS)) { + bool instr_is_64_bit = false; + /* Calculate offset into ringbuffer from the absolute cursor, + * by finding the remainder of the cursor divided by the + * ringbuffer size. The ringbuffer size is guaranteed to be + * a power of 2, so the remainder can be calculated without an + * explicit modulo. queue->size - 1 is the ringbuffer mask. + */ + u32 cursor_ringbuff_offset = (u32)(cursor & (queue->size - 1)); + + /* Find instruction that cursor is currently on */ + instr = kbasep_csf_read_ringbuffer_value(queue, cursor_ringbuff_offset); + + switch (INSTR_OPCODE_GET(instr)) { + case GPU_CSF_SYNC_ADD64_OPCODE: + case GPU_CSF_SYNC_SET64_OPCODE: + case GPU_CSF_SYNC_WAIT64_OPCODE: + instr_is_64_bit = true; + break; + default: + break; + } + + switch (INSTR_OPCODE_GET(instr)) { + case GPU_CSF_SYNC_ADD_OPCODE: + case GPU_CSF_SYNC_ADD64_OPCODE: + kbasep_csf_print_gpu_sync_op(file, kctx, queue, cursor_ringbuff_offset, + instr, DEBUGFS_GPU_SYNC_ADD, instr_is_64_bit, + follows_wait); + break; + case GPU_CSF_SYNC_SET_OPCODE: + case GPU_CSF_SYNC_SET64_OPCODE: + kbasep_csf_print_gpu_sync_op(file, kctx, queue, cursor_ringbuff_offset, + instr, DEBUGFS_GPU_SYNC_SET, instr_is_64_bit, + follows_wait); + break; + case GPU_CSF_SYNC_WAIT_OPCODE: + case GPU_CSF_SYNC_WAIT64_OPCODE: + kbasep_csf_print_gpu_sync_op(file, kctx, queue, cursor_ringbuff_offset, + instr, DEBUGFS_GPU_SYNC_WAIT, instr_is_64_bit, + follows_wait); + follows_wait = true; /* Future commands will follow at least one wait */ + break; + case GPU_CSF_CALL_OPCODE: + nr_calls++; + break; + default: + /* Unrecognized command, skip past it */ + break; + } + + cursor += sizeof(u64); + } +} + +/** + * kbasep_csf_dump_active_group_sync_state() - Prints SYNC commands in all GPU queues of + * the provided queue group. + * + * @kctx: The kbase context + * @file: seq_file for printing to. + * @group: Address of a GPU command group to iterate through. + * + * This function will iterate through each queue in the provided GPU queue group and + * print its SYNC related commands. + */ +static void kbasep_csf_dump_active_group_sync_state(struct kbase_context *kctx, + struct seq_file *file, + struct kbase_queue_group *const group) +{ + unsigned int i; + + kbasep_print(kctx, file, "GPU queues for group %u (slot %d) of ctx %d_%d\n", group->handle, + group->csg_nr, kctx->tgid, kctx->id); + + for (i = 0; i < MAX_SUPPORTED_STREAMS_PER_GROUP; i++) + kbasep_csf_dump_active_queue_sync_info(file, group->bound_queues[i]); +} + +/** + * kbasep_csf_sync_gpu_dump() - Print CSF GPU queue sync info + * + * @kctx: The kbase context + * @file: The seq_file for printing to. + * + * Return: Negative error code or 0 on success. 
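The dump loop above treats CS_EXTRACT and CS_INSERT as free-running byte counters and maps each cursor value into the ring buffer with cursor & (queue->size - 1), which is only valid because the queue size is checked to be a power of two. A small self-contained sketch of that mapping, with an invented 64-byte ring and counter values:

/* Userspace sketch of the cursor-to-ringbuffer-offset mapping used in the
 * dump loop: the ring size must be a power of two so (size - 1) acts as a
 * wraparound mask. The ring size and counter values are illustrative only.
 */
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	const uint32_t ring_size = 64;               /* must be a power of two   */
	uint64_t cs_extract = 120, cs_insert = 152;  /* free-running byte counts */
	uint64_t cursor;

	for (cursor = cs_extract; cursor < cs_insert; cursor += sizeof(uint64_t)) {
		uint32_t ring_offset = (uint32_t)(cursor & (ring_size - 1));

		printf("cursor %llu -> ring offset %u\n",
		       (unsigned long long)cursor, ring_offset);
	}
	return 0;
}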
+ */ +static int kbasep_csf_sync_gpu_dump(struct kbase_context *kctx, struct seq_file *file) +{ + u32 gr; + struct kbase_device *kbdev; + + if (WARN_ON(!kctx)) + return -EINVAL; + + kbdev = kctx->kbdev; + kbase_csf_scheduler_lock(kbdev); + kbase_csf_debugfs_update_active_groups_status(kbdev); + + for (gr = 0; gr < kbdev->csf.global_iface.group_num; gr++) { + struct kbase_queue_group *const group = + kbdev->csf.scheduler.csg_slots[gr].resident_group; + if (!group || group->kctx != kctx) + continue; + kbasep_csf_dump_active_group_sync_state(kctx, file, group); + } + + kbase_csf_scheduler_unlock(kbdev); + return 0; +} + +/** + * kbasep_csf_sync_debugfs_show() - Print CSF queue sync information + * + * @file: The seq_file for printing to. + * @data: The debugfs dentry private data, a pointer to kbase_context. + * + * Return: Negative error code or 0 on success. + */ +static int kbasep_csf_sync_debugfs_show(struct seq_file *file, void *data) +{ + struct kbase_context *kctx = file->private; + + kbasep_print(kctx, file, "MALI_CSF_SYNC_DEBUGFS_VERSION: v%u\n", + MALI_CSF_SYNC_DEBUGFS_VERSION); + + kbasep_csf_sync_kcpu_dump(kctx, file); + kbasep_csf_sync_gpu_dump(kctx, file); + return 0; +} + +static int kbasep_csf_sync_debugfs_open(struct inode *in, struct file *file) +{ + return single_open(file, kbasep_csf_sync_debugfs_show, in->i_private); +} + +static const struct file_operations kbasep_csf_sync_debugfs_fops = { + .open = kbasep_csf_sync_debugfs_open, + .read = seq_read, + .llseek = seq_lseek, + .release = single_release, +}; + +/** + * kbase_csf_sync_debugfs_init() - Initialise debugfs file. + * + * @kctx: Kernel context pointer. + */ +void kbase_csf_sync_debugfs_init(struct kbase_context *kctx) +{ + struct dentry *file; + const mode_t mode = 0444; + + if (WARN_ON(!kctx || IS_ERR_OR_NULL(kctx->kctx_dentry))) + return; + + file = debugfs_create_file("csf_sync", mode, kctx->kctx_dentry, kctx, + &kbasep_csf_sync_debugfs_fops); + + if (IS_ERR_OR_NULL(file)) + dev_warn(kctx->kbdev->dev, "Unable to create CSF Sync debugfs entry"); +} + +#else +/* + * Stub functions for when debugfs is disabled + */ +void kbase_csf_sync_debugfs_init(struct kbase_context *kctx) +{ +} + +#endif /* CONFIG_DEBUG_FS */ diff --git a/mali_kbase/csf/mali_kbase_csf_sync_debugfs.h b/mali_kbase/csf/mali_kbase_csf_sync_debugfs.h new file mode 100644 index 0000000..2fe5060 --- /dev/null +++ b/mali_kbase/csf/mali_kbase_csf_sync_debugfs.h @@ -0,0 +1,62 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +/* + * + * (C) COPYRIGHT 2022-2023 ARM Limited. All rights reserved. + * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. 
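The csf_sync entry follows the usual debugfs single_open() pattern: the open handler binds a show callback and the inode's private data, and seq_read/seq_lseek/single_release complete the file_operations. A minimal, self-contained kernel-module sketch of the same pattern is shown below; the "example_sync" name, the printed line and the top-level placement are invented, and this is not how the driver itself registers its per-context entry.

/* Minimal module sketch of the debugfs + single_open() pattern, assuming an
 * invented entry name and no parent directory. Structure only mirrors the patch.
 */
#include <linux/debugfs.h>
#include <linux/err.h>
#include <linux/fs.h>
#include <linux/module.h>
#include <linux/seq_file.h>

static struct dentry *example_dentry;

static int example_show(struct seq_file *m, void *data)
{
	seq_printf(m, "EXAMPLE_DEBUGFS_VERSION: v%u\n", 0);
	return 0;
}

static int example_open(struct inode *in, struct file *file)
{
	return single_open(file, example_show, in->i_private);
}

static const struct file_operations example_fops = {
	.open = example_open,
	.read = seq_read,
	.llseek = seq_lseek,
	.release = single_release,
};

static int __init example_init(void)
{
	example_dentry = debugfs_create_file("example_sync", 0444, NULL, NULL,
					     &example_fops);
	if (IS_ERR_OR_NULL(example_dentry))
		pr_warn("Unable to create example debugfs entry\n");
	return 0;
}

static void __exit example_exit(void)
{
	debugfs_remove(example_dentry);
}

module_init(example_init);
module_exit(example_exit);
MODULE_LICENSE("GPL");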
+ * + */ + +#ifndef _KBASE_CSF_SYNC_DEBUGFS_H_ +#define _KBASE_CSF_SYNC_DEBUGFS_H_ + +#include <linux/seq_file.h> + +/* Forward declaration */ +struct kbase_context; + +#define MALI_CSF_SYNC_DEBUGFS_VERSION 0 + +/** + * kbase_csf_sync_debugfs_init() - Create a debugfs entry for CSF queue sync info + * + * @kctx: The kbase_context for which to create the debugfs entry + */ +void kbase_csf_sync_debugfs_init(struct kbase_context *kctx); + +/** + * kbasep_csf_sync_kcpu_dump() - Print CSF KCPU queue sync info + * + * @kctx: The kbase context. + * @file: The seq_file for printing to. + * + * Return: Negative error code or 0 on success. + * + * Note: This function should not be used if kcpu_queues.lock is held. Use + * kbasep_csf_sync_kcpu_dump_locked() instead. + */ +int kbasep_csf_sync_kcpu_dump(struct kbase_context *kctx, struct seq_file *file); + +/** + * kbasep_csf_sync_kcpu_dump() - Print CSF KCPU queue sync info + * + * @kctx: The kbase context. + * @file: The seq_file for printing to. + * + * Return: Negative error code or 0 on success. + */ +int kbasep_csf_sync_kcpu_dump_locked(struct kbase_context *kctx, struct seq_file *file); + +#endif /* _KBASE_CSF_SYNC_DEBUGFS_H_ */ diff --git a/mali_kbase/csf/mali_kbase_csf_tiler_heap.c b/mali_kbase/csf/mali_kbase_csf_tiler_heap.c index 85babf9..f7e1a8d 100644 --- a/mali_kbase/csf/mali_kbase_csf_tiler_heap.c +++ b/mali_kbase/csf/mali_kbase_csf_tiler_heap.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -25,6 +25,26 @@ #include "mali_kbase_csf_tiler_heap_def.h" #include "mali_kbase_csf_heap_context_alloc.h" +/* Tiler heap shrink stop limit for maintaining a minimum number of chunks */ +#define HEAP_SHRINK_STOP_LIMIT (1) + +/** + * struct kbase_csf_gpu_buffer_heap - A gpu buffer object specific to tiler heap + * + * @cdsbp_0: Descriptor_type and buffer_type + * @size: The size of the current heap chunk + * @pointer: Pointer to the current heap chunk + * @low_pointer: Pointer to low end of current heap chunk + * @high_pointer: Pointer to high end of current heap chunk + */ +struct kbase_csf_gpu_buffer_heap { + u32 cdsbp_0; + u32 size; + u64 pointer; + u64 low_pointer; + u64 high_pointer; +} __packed; + /** * encode_chunk_ptr - Encode the address and size of a chunk as an integer. * @@ -74,6 +94,35 @@ static struct kbase_csf_tiler_heap_chunk *get_last_chunk( } /** + * remove_external_chunk_mappings - Remove external mappings from a chunk that + * is being transitioned to the tiler heap + * memory system. + * + * @kctx: kbase context the chunk belongs to. + * @chunk: The chunk whose external mappings are going to be removed. + * + * This function marks the region as DONT NEED. Along with NO_USER_FREE, this indicates + * that the VA region is owned by the tiler heap and could potentially be shrunk at any time. Other + * parts of kbase outside of tiler heap management should not take references on its physical + * pages, and should not modify them. 
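struct kbase_csf_gpu_buffer_heap above is declared __packed because its layout must match, byte for byte, the buffer descriptor object that lives in GPU memory. A userspace sketch that simply makes the implied offsets explicit; the offsets are derived from the field order in the patch and the struct name is changed to mark it as an illustration.

/* Userspace sketch checking the byte layout implied by the __packed buffer
 * descriptor struct. Uses the GCC/Clang packed attribute and C11 static_assert.
 */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct gpu_buffer_heap_example {
	uint32_t cdsbp_0;      /* descriptor_type and buffer_type    */
	uint32_t size;         /* size of the current heap chunk     */
	uint64_t pointer;      /* pointer to the current heap chunk  */
	uint64_t low_pointer;  /* low end of the current heap chunk  */
	uint64_t high_pointer; /* high end of the current heap chunk */
} __attribute__((packed));

int main(void)
{
	static_assert(offsetof(struct gpu_buffer_heap_example, size) == 4, "size at 4");
	static_assert(offsetof(struct gpu_buffer_heap_example, pointer) == 8, "pointer at 8");
	static_assert(offsetof(struct gpu_buffer_heap_example, high_pointer) == 24, "high at 24");
	static_assert(sizeof(struct gpu_buffer_heap_example) == 32, "32 bytes total");

	printf("descriptor is %zu bytes\n", sizeof(struct gpu_buffer_heap_example));
	return 0;
}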
+ */ +static void remove_external_chunk_mappings(struct kbase_context *const kctx, + struct kbase_csf_tiler_heap_chunk *chunk) +{ + lockdep_assert_held(&kctx->reg_lock); + + if (chunk->region->cpu_alloc != NULL) { + kbase_mem_shrink_cpu_mapping(kctx, chunk->region, 0, + chunk->region->cpu_alloc->nents); + } +#if !defined(CONFIG_MALI_VECTOR_DUMP) + chunk->region->flags |= KBASE_REG_DONT_NEED; +#endif + + dev_dbg(kctx->kbdev->dev, "Removed external mappings from chunk 0x%llX", chunk->gpu_va); +} + +/** * link_chunk - Link a chunk into a tiler heap * * @heap: Pointer to the tiler heap. @@ -93,19 +142,12 @@ static int link_chunk(struct kbase_csf_tiler_heap *const heap, if (prev) { struct kbase_context *const kctx = heap->kctx; - struct kbase_vmap_struct map; - u64 *const prev_hdr = kbase_vmap_prot(kctx, prev->gpu_va, - sizeof(*prev_hdr), KBASE_REG_CPU_WR, &map); + u64 *prev_hdr = prev->map.addr; - if (unlikely(!prev_hdr)) { - dev_err(kctx->kbdev->dev, - "Failed to map tiler heap chunk 0x%llX\n", - prev->gpu_va); - return -ENOMEM; - } + WARN((prev->region->flags & KBASE_REG_CPU_CACHED), + "Cannot support CPU cached chunks without sync operations"); *prev_hdr = encode_chunk_ptr(heap->chunk_size, chunk->gpu_va); - kbase_vunmap(kctx, &map); dev_dbg(kctx->kbdev->dev, "Linked tiler heap chunks, 0x%llX -> 0x%llX\n", @@ -132,152 +174,284 @@ static int link_chunk(struct kbase_csf_tiler_heap *const heap, static int init_chunk(struct kbase_csf_tiler_heap *const heap, struct kbase_csf_tiler_heap_chunk *const chunk, bool link_with_prev) { - struct kbase_vmap_struct map; - struct u64 *chunk_hdr = NULL; + int err = 0; + u64 *chunk_hdr; struct kbase_context *const kctx = heap->kctx; + lockdep_assert_held(&kctx->csf.tiler_heaps.lock); + if (unlikely(chunk->gpu_va & ~CHUNK_ADDR_MASK)) { dev_err(kctx->kbdev->dev, "Tiler heap chunk address is unusable\n"); return -EINVAL; } - chunk_hdr = kbase_vmap_prot(kctx, - chunk->gpu_va, CHUNK_HDR_SIZE, KBASE_REG_CPU_WR, &map); - - if (unlikely(!chunk_hdr)) { - dev_err(kctx->kbdev->dev, - "Failed to map a tiler heap chunk header\n"); - return -ENOMEM; + WARN((chunk->region->flags & KBASE_REG_CPU_CACHED), + "Cannot support CPU cached chunks without sync operations"); + chunk_hdr = chunk->map.addr; + if (WARN(chunk->map.size < CHUNK_HDR_SIZE, + "Tiler chunk kernel mapping was not large enough for zero-init")) { + return -EINVAL; } memset(chunk_hdr, 0, CHUNK_HDR_SIZE); - kbase_vunmap(kctx, &map); + INIT_LIST_HEAD(&chunk->link); if (link_with_prev) - return link_chunk(heap, chunk); - else - return 0; + err = link_chunk(heap, chunk); + + if (unlikely(err)) { + dev_err(kctx->kbdev->dev, "Failed to link a chunk to a tiler heap\n"); + return -EINVAL; + } + + list_add_tail(&chunk->link, &heap->chunks_list); + heap->chunk_count++; + + return err; } /** - * create_chunk - Create a tiler heap chunk + * remove_unlinked_chunk - Remove a chunk that is not currently linked into a + * heap. * - * @heap: Pointer to the tiler heap for which to allocate memory. - * @link_with_prev: Flag to indicate if the chunk to be allocated needs to be - * linked with the previously allocated chunk. + * @kctx: Kbase context that was used to allocate the memory. + * @chunk: Chunk that has been allocated, but not linked into a heap. 
+ */ +static void remove_unlinked_chunk(struct kbase_context *kctx, + struct kbase_csf_tiler_heap_chunk *chunk) +{ + if (WARN_ON(!list_empty(&chunk->link))) + return; + + kbase_gpu_vm_lock_with_pmode_sync(kctx); + kbase_vunmap(kctx, &chunk->map); + /* KBASE_REG_DONT_NEED regions will be confused with ephemeral regions (inc freed JIT + * regions), and so we must clear that flag too before freeing. + * For "no user free count", we check that the count is 1 as it is a shrinkable region; + * no other code part within kbase can take a reference to it. + */ + WARN_ON(atomic_read(&chunk->region->no_user_free_count) > 1); + kbase_va_region_no_user_free_dec(chunk->region); +#if !defined(CONFIG_MALI_VECTOR_DUMP) + chunk->region->flags &= ~KBASE_REG_DONT_NEED; +#endif + kbase_mem_free_region(kctx, chunk->region); + kbase_gpu_vm_unlock_with_pmode_sync(kctx); + + kfree(chunk); +} + +/** + * alloc_new_chunk - Allocate new chunk metadata for the tiler heap, reserve a fully backed VA + * region for the chunk, and provide a kernel mapping. + * @kctx: kbase context with which the chunk will be linked + * @chunk_size: the size of the chunk from the corresponding heap * - * This function allocates a chunk of memory for a tiler heap and adds it to - * the end of the list of chunks associated with that heap. The size of the - * chunk is not a parameter because it is configured per-heap not per-chunk. + * Allocate the chunk tracking metadata and a corresponding fully backed VA region for the + * chunk. The kernel may need to invoke the reclaim path while trying to fulfill the allocation, so + * we cannot hold any lock that would be held in the shrinker paths (JIT evict lock or tiler heap + * lock). * - * Return: 0 if successful or a negative error code on failure. + * Since the chunk may have its physical backing removed, to prevent use-after-free scenarios we + * ensure that it is protected from being mapped by other parts of kbase. + * + * The chunk's GPU memory can be accessed via its 'map' member, but should only be done so by the + * shrinker path, as it may be otherwise shrunk at any time. + * + * Return: pointer to kbase_csf_tiler_heap_chunk on success or a NULL pointer + * on failure */ -static int create_chunk(struct kbase_csf_tiler_heap *const heap, - bool link_with_prev) +static struct kbase_csf_tiler_heap_chunk *alloc_new_chunk(struct kbase_context *kctx, + u64 chunk_size) { - int err = 0; - struct kbase_context *const kctx = heap->kctx; - u64 nr_pages = PFN_UP(heap->chunk_size); - u64 flags = BASE_MEM_PROT_GPU_RD | BASE_MEM_PROT_GPU_WR | - BASE_MEM_PROT_CPU_WR | BASEP_MEM_NO_USER_FREE | - BASE_MEM_COHERENT_LOCAL; + u64 nr_pages = PFN_UP(chunk_size); + u64 flags = BASE_MEM_PROT_GPU_RD | BASE_MEM_PROT_GPU_WR | BASE_MEM_PROT_CPU_WR | + BASEP_MEM_NO_USER_FREE | BASE_MEM_COHERENT_LOCAL | BASE_MEM_PROT_CPU_RD; struct kbase_csf_tiler_heap_chunk *chunk = NULL; + /* The chunk kernel mapping needs to be large enough to: + * - initially zero the CHUNK_HDR_SIZE area + * - on shrinking, access the NEXT_CHUNK_ADDR_SIZE area + */ + const size_t chunk_kernel_map_size = max(CHUNK_HDR_SIZE, NEXT_CHUNK_ADDR_SIZE); /* Calls to this function are inherently synchronous, with respect to * MMU operations. 
*/ const enum kbase_caller_mmu_sync_info mmu_sync_info = CALLER_MMU_SYNC; - flags |= kbase_mem_group_id_set(kctx->jit_group_id); -#if defined(CONFIG_MALI_DEBUG) || defined(CONFIG_MALI_VECTOR_DUMP) - flags |= BASE_MEM_PROT_CPU_RD; -#endif - chunk = kzalloc(sizeof(*chunk), GFP_KERNEL); if (unlikely(!chunk)) { dev_err(kctx->kbdev->dev, "No kernel memory for a new tiler heap chunk\n"); - return -ENOMEM; + return NULL; } /* Allocate GPU memory for the new chunk. */ - INIT_LIST_HEAD(&chunk->link); chunk->region = kbase_mem_alloc(kctx, nr_pages, nr_pages, 0, &flags, &chunk->gpu_va, mmu_sync_info); if (unlikely(!chunk->region)) { - dev_err(kctx->kbdev->dev, - "Failed to allocate a tiler heap chunk\n"); - err = -ENOMEM; - } else { - err = init_chunk(heap, chunk, link_with_prev); - if (unlikely(err)) { - kbase_gpu_vm_lock(kctx); - chunk->region->flags &= ~KBASE_REG_NO_USER_FREE; - kbase_mem_free_region(kctx, chunk->region); - kbase_gpu_vm_unlock(kctx); - } + dev_err(kctx->kbdev->dev, "Failed to allocate a tiler heap chunk!\n"); + goto unroll_chunk; } - if (unlikely(err)) { - kfree(chunk); - } else { - list_add_tail(&chunk->link, &heap->chunks_list); - heap->chunk_count++; + kbase_gpu_vm_lock(kctx); - dev_dbg(kctx->kbdev->dev, "Created tiler heap chunk 0x%llX\n", - chunk->gpu_va); + /* Some checks done here as NO_USER_FREE still allows such things to be made + * whilst we had dropped the region lock + */ + if (unlikely(atomic_read(&chunk->region->gpu_alloc->kernel_mappings) > 0)) { + dev_err(kctx->kbdev->dev, "Chunk region has active kernel mappings!\n"); + goto unroll_region; } - return err; + /* There is a race condition with regard to KBASE_REG_DONT_NEED, where another + * thread can have the "no user free" refcount increased between kbase_mem_alloc + * and kbase_gpu_vm_lock (above) and before KBASE_REG_DONT_NEED is set by + * remove_external_chunk_mappings (below). + * + * It should be fine and not a security risk if we let the region leak till + * region tracker termination in such a case. + */ + if (unlikely(atomic_read(&chunk->region->no_user_free_count) > 1)) { + dev_err(kctx->kbdev->dev, "Chunk region has no_user_free_count > 1!\n"); + goto unroll_region; + } + + /* Whilst we can be sure of a number of other restrictions due to BASEP_MEM_NO_USER_FREE + * being requested, it's useful to document in code what those restrictions are, and ensure + * they remain in place in future. 
+ */ + if (WARN(!chunk->region->gpu_alloc, + "NO_USER_FREE chunks should not have had their alloc freed")) { + goto unroll_region; + } + + if (WARN(chunk->region->gpu_alloc->type != KBASE_MEM_TYPE_NATIVE, + "NO_USER_FREE chunks should not have been freed and then reallocated as imported/non-native regions")) { + goto unroll_region; + } + + if (WARN((chunk->region->flags & KBASE_REG_ACTIVE_JIT_ALLOC), + "NO_USER_FREE chunks should not have been freed and then reallocated as JIT regions")) { + goto unroll_region; + } + + if (WARN((chunk->region->flags & KBASE_REG_DONT_NEED), + "NO_USER_FREE chunks should not have been made ephemeral")) { + goto unroll_region; + } + + if (WARN(atomic_read(&chunk->region->cpu_alloc->gpu_mappings) > 1, + "NO_USER_FREE chunks should not have been aliased")) { + goto unroll_region; + } + + if (unlikely(!kbase_vmap_reg(kctx, chunk->region, chunk->gpu_va, chunk_kernel_map_size, + (KBASE_REG_CPU_RD | KBASE_REG_CPU_WR), &chunk->map, + KBASE_VMAP_FLAG_PERMANENT_MAP_ACCOUNTING))) { + dev_err(kctx->kbdev->dev, "Failed to map chunk header for shrinking!\n"); + goto unroll_region; + } + + remove_external_chunk_mappings(kctx, chunk); + kbase_gpu_vm_unlock(kctx); + + /* If page migration is enabled, we don't want to migrate tiler heap pages. + * This does not change if the constituent pages are already marked as isolated. + */ + if (kbase_is_page_migration_enabled()) + kbase_set_phy_alloc_page_status(chunk->region->gpu_alloc, NOT_MOVABLE); + + return chunk; + +unroll_region: + /* KBASE_REG_DONT_NEED regions will be confused with ephemeral regions (inc freed JIT + * regions), and so we must clear that flag too before freeing. + */ + kbase_va_region_no_user_free_dec(chunk->region); +#if !defined(CONFIG_MALI_VECTOR_DUMP) + chunk->region->flags &= ~KBASE_REG_DONT_NEED; +#endif + kbase_mem_free_region(kctx, chunk->region); + kbase_gpu_vm_unlock(kctx); +unroll_chunk: + kfree(chunk); + return NULL; } /** - * delete_chunk - Delete a tiler heap chunk + * create_chunk - Create a tiler heap chunk * - * @heap: Pointer to the tiler heap for which @chunk was allocated. - * @chunk: Pointer to a chunk to be deleted. + * @heap: Pointer to the tiler heap for which to allocate memory. * - * This function frees a tiler heap chunk previously allocated by @create_chunk - * and removes it from the list of chunks associated with the heap. + * This function allocates a chunk of memory for a tiler heap, adds it to the + * the list of chunks associated with that heap both on the host side and in GPU + * memory. * - * WARNING: The deleted chunk is not unlinked from the list of chunks used by - * the GPU, therefore it is only safe to use this function when - * deleting a heap. + * Return: 0 if successful or a negative error code on failure. 
*/ -static void delete_chunk(struct kbase_csf_tiler_heap *const heap, - struct kbase_csf_tiler_heap_chunk *const chunk) +static int create_chunk(struct kbase_csf_tiler_heap *const heap) { - struct kbase_context *const kctx = heap->kctx; + int err = 0; + struct kbase_csf_tiler_heap_chunk *chunk = NULL; - kbase_gpu_vm_lock(kctx); - chunk->region->flags &= ~KBASE_REG_NO_USER_FREE; - kbase_mem_free_region(kctx, chunk->region); - kbase_gpu_vm_unlock(kctx); - list_del(&chunk->link); - heap->chunk_count--; - kfree(chunk); + chunk = alloc_new_chunk(heap->kctx, heap->chunk_size); + if (unlikely(!chunk)) { + err = -ENOMEM; + goto allocation_failure; + } + + mutex_lock(&heap->kctx->csf.tiler_heaps.lock); + err = init_chunk(heap, chunk, true); + mutex_unlock(&heap->kctx->csf.tiler_heaps.lock); + + if (unlikely(err)) + goto initialization_failure; + + dev_dbg(heap->kctx->kbdev->dev, "Created tiler heap chunk 0x%llX\n", chunk->gpu_va); + + return 0; +initialization_failure: + remove_unlinked_chunk(heap->kctx, chunk); +allocation_failure: + return err; } /** - * delete_all_chunks - Delete all chunks belonging to a tiler heap + * delete_all_chunks - Delete all chunks belonging to an unlinked tiler heap * * @heap: Pointer to a tiler heap. * - * This function empties the list of chunks associated with a tiler heap by - * freeing all chunks previously allocated by @create_chunk. + * This function empties the list of chunks associated with a tiler heap by freeing all chunks + * previously allocated by @create_chunk. + * + * The heap must not be reachable from a &struct kbase_context.csf.tiler_heaps.list, as the + * tiler_heaps lock cannot be held whilst deleting its chunks due to also needing the &struct + * kbase_context.region_lock. + * + * WARNING: Whilst the deleted chunks are unlinked from host memory, they are not unlinked from the + * list of chunks used by the GPU, therefore it is only safe to use this function when + * deleting a heap. */ static void delete_all_chunks(struct kbase_csf_tiler_heap *heap) { + struct kbase_context *const kctx = heap->kctx; struct list_head *entry = NULL, *tmp = NULL; + WARN(!list_empty(&heap->link), + "Deleting a heap's chunks when that heap is still linked requires the tiler_heaps lock, which cannot be held by the caller"); + list_for_each_safe(entry, tmp, &heap->chunks_list) { struct kbase_csf_tiler_heap_chunk *chunk = list_entry( entry, struct kbase_csf_tiler_heap_chunk, link); - delete_chunk(heap, chunk); + list_del_init(&chunk->link); + heap->chunk_count--; + + remove_unlinked_chunk(kctx, chunk); } } @@ -299,7 +473,7 @@ static int create_initial_chunks(struct kbase_csf_tiler_heap *const heap, u32 i; for (i = 0; (i < nchunks) && likely(!err); i++) - err = create_chunk(heap, true); + err = create_chunk(heap); if (unlikely(err)) delete_all_chunks(heap); @@ -308,14 +482,17 @@ static int create_initial_chunks(struct kbase_csf_tiler_heap *const heap, } /** - * delete_heap - Delete a tiler heap + * delete_heap - Delete an unlinked tiler heap * * @heap: Pointer to a tiler heap to be deleted. * * This function frees any chunks allocated for a tiler heap previously - * initialized by @kbase_csf_tiler_heap_init and removes it from the list of - * heaps associated with the kbase context. The heap context structure used by + * initialized by @kbase_csf_tiler_heap_init. The heap context structure used by * the firmware is also freed. 
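The reworked create_chunk() unwinds with labelled gotos so that each failure point releases exactly what was acquired before it (an unlinked chunk is removed, a failed allocation falls straight through). A generic, self-contained sketch of that idiom; the two "resources" are ordinary heap allocations chosen purely for illustration.

/* Self-contained sketch of labelled-goto unwinding: each failure label frees
 * only what was successfully acquired before it.
 */
#include <stdio.h>
#include <stdlib.h>

static int create_pair(void)
{
	int err = 0;
	char *first = NULL, *second = NULL;

	first = malloc(64);
	if (!first) {
		err = -1;
		goto first_alloc_failed;
	}

	second = malloc(64);
	if (!second) {
		err = -1;
		goto second_alloc_failed;
	}

	printf("both resources acquired\n");
	free(second);
	free(first);
	return 0;

second_alloc_failed:
	free(first);
first_alloc_failed:
	return err;
}

int main(void)
{
	return create_pair() ? EXIT_FAILURE : EXIT_SUCCESS;
}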
+ * + * The heap must not be reachable from a &struct kbase_context.csf.tiler_heaps.list, as the + * tiler_heaps lock cannot be held whilst deleting it due to also needing the &struct + * kbase_context.region_lock. */ static void delete_heap(struct kbase_csf_tiler_heap *heap) { @@ -323,23 +500,41 @@ static void delete_heap(struct kbase_csf_tiler_heap *heap) dev_dbg(kctx->kbdev->dev, "Deleting tiler heap 0x%llX\n", heap->gpu_va); - lockdep_assert_held(&kctx->csf.tiler_heaps.lock); + WARN(!list_empty(&heap->link), + "Deleting a heap that is still linked requires the tiler_heaps lock, which cannot be held by the caller"); + /* Make sure that all of the VA regions corresponding to the chunks are + * freed at this time and that the work queue is not trying to access freed + * memory. + * + * Note: since the heap is unlinked, and that no references are made to chunks other + * than from their heap, there is no need to separately move the chunks out of the + * heap->chunks_list to delete them. + */ delete_all_chunks(heap); + kbase_vunmap(kctx, &heap->gpu_va_map); /* We could optimize context destruction by not freeing leaked heap - * contexts but it doesn't seem worth the extra complexity. + * contexts but it doesn't seem worth the extra complexity. After this + * point, the suballocation is returned to the heap context allocator and + * may be overwritten with new data, meaning heap->gpu_va should not + * be used past this point. */ kbase_csf_heap_context_allocator_free(&kctx->csf.tiler_heaps.ctx_alloc, heap->gpu_va); - list_del(&heap->link); - WARN_ON(heap->chunk_count); KBASE_TLSTREAM_AUX_TILER_HEAP_STATS(kctx->kbdev, kctx->id, heap->heap_id, 0, 0, heap->max_chunks, heap->chunk_size, 0, heap->target_in_flight, 0); + if (heap->buf_desc_reg) { + kbase_vunmap(kctx, &heap->buf_desc_map); + kbase_gpu_vm_lock(kctx); + kbase_va_region_no_user_free_dec(heap->buf_desc_reg); + kbase_gpu_vm_unlock(kctx); + } + kfree(heap); } @@ -375,6 +570,23 @@ static struct kbase_csf_tiler_heap *find_tiler_heap( return NULL; } +static struct kbase_csf_tiler_heap_chunk *find_chunk(struct kbase_csf_tiler_heap *heap, + u64 const chunk_gpu_va) +{ + struct kbase_csf_tiler_heap_chunk *chunk = NULL; + + lockdep_assert_held(&heap->kctx->csf.tiler_heaps.lock); + + list_for_each_entry(chunk, &heap->chunks_list, link) { + if (chunk->gpu_va == chunk_gpu_va) + return chunk; + } + + dev_dbg(heap->kctx->kbdev->dev, "Tiler heap chunk 0x%llX was not found\n", chunk_gpu_va); + + return NULL; +} + int kbase_csf_tiler_heap_context_init(struct kbase_context *const kctx) { int err = kbase_csf_heap_context_allocator_init( @@ -393,37 +605,88 @@ int kbase_csf_tiler_heap_context_init(struct kbase_context *const kctx) void kbase_csf_tiler_heap_context_term(struct kbase_context *const kctx) { + LIST_HEAD(local_heaps_list); struct list_head *entry = NULL, *tmp = NULL; dev_dbg(kctx->kbdev->dev, "Terminating a context for tiler heaps\n"); mutex_lock(&kctx->csf.tiler_heaps.lock); + list_splice_init(&kctx->csf.tiler_heaps.list, &local_heaps_list); + mutex_unlock(&kctx->csf.tiler_heaps.lock); - list_for_each_safe(entry, tmp, &kctx->csf.tiler_heaps.list) { + list_for_each_safe(entry, tmp, &local_heaps_list) { struct kbase_csf_tiler_heap *heap = list_entry( entry, struct kbase_csf_tiler_heap, link); + + list_del_init(&heap->link); delete_heap(heap); } - mutex_unlock(&kctx->csf.tiler_heaps.lock); mutex_destroy(&kctx->csf.tiler_heaps.lock); kbase_csf_heap_context_allocator_term(&kctx->csf.tiler_heaps.ctx_alloc); } -int kbase_csf_tiler_heap_init(struct 
kbase_context *const kctx, - u32 const chunk_size, u32 const initial_chunks, u32 const max_chunks, - u16 const target_in_flight, u64 *const heap_gpu_va, - u64 *const first_chunk_va) +/** + * kbasep_is_buffer_descriptor_region_suitable - Check if a VA region chosen to house + * the tiler heap buffer descriptor + * is suitable for the purpose. + * @kctx: kbase context of the tiler heap + * @reg: VA region being checked for suitability + * + * The tiler heap buffer descriptor memory does not admit page faults according + * to its design, so it must have the entirety of the backing upon allocation, + * and it has to remain alive as long as the tiler heap is alive, meaning it + * cannot be allocated from JIT/Ephemeral, or user freeable memory. + * + * Return: true on suitability, false otherwise. + */ +static bool kbasep_is_buffer_descriptor_region_suitable(struct kbase_context *const kctx, + struct kbase_va_region *const reg) +{ + if (kbase_is_region_invalid_or_free(reg)) { + dev_err(kctx->kbdev->dev, "Region is either invalid or free!\n"); + return false; + } + + if (!(reg->flags & KBASE_REG_CPU_RD) || kbase_is_region_shrinkable(reg) || + (reg->flags & KBASE_REG_PF_GROW)) { + dev_err(kctx->kbdev->dev, "Region has invalid flags: 0x%lX!\n", reg->flags); + return false; + } + + if (reg->gpu_alloc->type != KBASE_MEM_TYPE_NATIVE) { + dev_err(kctx->kbdev->dev, "Region has invalid type!\n"); + return false; + } + + if ((reg->nr_pages != kbase_reg_current_backed_size(reg)) || + (reg->nr_pages < PFN_UP(sizeof(struct kbase_csf_gpu_buffer_heap)))) { + dev_err(kctx->kbdev->dev, "Region has invalid backing!\n"); + return false; + } + + return true; +} + +#define TILER_BUF_DESC_SIZE (sizeof(struct kbase_csf_gpu_buffer_heap)) + +int kbase_csf_tiler_heap_init(struct kbase_context *const kctx, u32 const chunk_size, + u32 const initial_chunks, u32 const max_chunks, + u16 const target_in_flight, u64 const buf_desc_va, + u64 *const heap_gpu_va, u64 *const first_chunk_va) { int err = 0; struct kbase_csf_tiler_heap *heap = NULL; struct kbase_csf_heap_context_allocator *const ctx_alloc = &kctx->csf.tiler_heaps.ctx_alloc; + struct kbase_csf_tiler_heap_chunk *chunk = NULL; + struct kbase_va_region *gpu_va_reg = NULL; + void *vmap_ptr = NULL; dev_dbg(kctx->kbdev->dev, - "Creating a tiler heap with %u chunks (limit: %u) of size %u\n", - initial_chunks, max_chunks, chunk_size); + "Creating a tiler heap with %u chunks (limit: %u) of size %u, buf_desc_va: 0x%llx\n", + initial_chunks, max_chunks, chunk_size, buf_desc_va); if (!kbase_mem_allow_alloc(kctx)) return -EINVAL; @@ -445,8 +708,7 @@ int kbase_csf_tiler_heap_init(struct kbase_context *const kctx, heap = kzalloc(sizeof(*heap), GFP_KERNEL); if (unlikely(!heap)) { - dev_err(kctx->kbdev->dev, - "No kernel memory for a new tiler heap\n"); + dev_err(kctx->kbdev->dev, "No kernel memory for a new tiler heap"); return -ENOMEM; } @@ -454,57 +716,130 @@ int kbase_csf_tiler_heap_init(struct kbase_context *const kctx, heap->chunk_size = chunk_size; heap->max_chunks = max_chunks; heap->target_in_flight = target_in_flight; + heap->buf_desc_checked = false; INIT_LIST_HEAD(&heap->chunks_list); + INIT_LIST_HEAD(&heap->link); - heap->gpu_va = kbase_csf_heap_context_allocator_alloc(ctx_alloc); + /* Check on the buffer descriptor virtual Address */ + if (buf_desc_va) { + struct kbase_va_region *buf_desc_reg; + + kbase_gpu_vm_lock(kctx); + buf_desc_reg = + kbase_region_tracker_find_region_enclosing_address(kctx, buf_desc_va); + + if (!kbasep_is_buffer_descriptor_region_suitable(kctx, 
buf_desc_reg)) { + kbase_gpu_vm_unlock(kctx); + dev_err(kctx->kbdev->dev, + "Could not find a suitable VA region for the tiler heap buf desc!\n"); + err = -EINVAL; + goto buf_desc_not_suitable; + } + + /* If we don't prevent userspace from unmapping this, we may run into + * use-after-free, as we don't check for the existence of the region throughout. + */ + + heap->buf_desc_va = buf_desc_va; + heap->buf_desc_reg = buf_desc_reg; + kbase_va_region_no_user_free_inc(buf_desc_reg); + vmap_ptr = kbase_vmap_reg(kctx, buf_desc_reg, buf_desc_va, TILER_BUF_DESC_SIZE, + KBASE_REG_CPU_RD, &heap->buf_desc_map, + KBASE_VMAP_FLAG_PERMANENT_MAP_ACCOUNTING); + + if (kbase_is_page_migration_enabled()) + kbase_set_phy_alloc_page_status(buf_desc_reg->gpu_alloc, NOT_MOVABLE); + + kbase_gpu_vm_unlock(kctx); + + if (unlikely(!vmap_ptr)) { + dev_err(kctx->kbdev->dev, + "Could not vmap buffer descriptor into kernel memory (err %d)\n", + err); + err = -ENOMEM; + goto buf_desc_vmap_failed; + } + } + + heap->gpu_va = kbase_csf_heap_context_allocator_alloc(ctx_alloc); if (unlikely(!heap->gpu_va)) { - dev_dbg(kctx->kbdev->dev, - "Failed to allocate a tiler heap context"); + dev_dbg(kctx->kbdev->dev, "Failed to allocate a tiler heap context\n"); err = -ENOMEM; - } else { - err = create_initial_chunks(heap, initial_chunks); - if (unlikely(err)) - kbase_csf_heap_context_allocator_free(ctx_alloc, heap->gpu_va); + goto heap_context_alloc_failed; + } + + gpu_va_reg = ctx_alloc->region; + + kbase_gpu_vm_lock(kctx); + /* gpu_va_reg was created with BASEP_MEM_NO_USER_FREE, the code to unset this only happens + * on kctx termination (after all syscalls on kctx have finished), and so it is safe to + * assume that gpu_va_reg is still present. + */ + vmap_ptr = kbase_vmap_reg(kctx, gpu_va_reg, heap->gpu_va, NEXT_CHUNK_ADDR_SIZE, + (KBASE_REG_CPU_RD | KBASE_REG_CPU_WR), &heap->gpu_va_map, + KBASE_VMAP_FLAG_PERMANENT_MAP_ACCOUNTING); + kbase_gpu_vm_unlock(kctx); + if (unlikely(!vmap_ptr)) { + dev_dbg(kctx->kbdev->dev, "Failed to vmap the correct heap GPU VA address\n"); + err = -ENOMEM; + goto heap_context_vmap_failed; } + err = create_initial_chunks(heap, initial_chunks); if (unlikely(err)) { - kfree(heap); - } else { - struct kbase_csf_tiler_heap_chunk const *chunk = list_first_entry( - &heap->chunks_list, struct kbase_csf_tiler_heap_chunk, link); + dev_dbg(kctx->kbdev->dev, "Failed to create the initial tiler heap chunks\n"); + goto create_chunks_failed; + } + chunk = list_first_entry(&heap->chunks_list, struct kbase_csf_tiler_heap_chunk, link); - *heap_gpu_va = heap->gpu_va; - *first_chunk_va = chunk->gpu_va; + *heap_gpu_va = heap->gpu_va; + *first_chunk_va = chunk->gpu_va; - mutex_lock(&kctx->csf.tiler_heaps.lock); - kctx->csf.tiler_heaps.nr_of_heaps++; - heap->heap_id = kctx->csf.tiler_heaps.nr_of_heaps; - list_add(&heap->link, &kctx->csf.tiler_heaps.list); + mutex_lock(&kctx->csf.tiler_heaps.lock); + kctx->csf.tiler_heaps.nr_of_heaps++; + heap->heap_id = kctx->csf.tiler_heaps.nr_of_heaps; + list_add(&heap->link, &kctx->csf.tiler_heaps.list); - KBASE_TLSTREAM_AUX_TILER_HEAP_STATS( - kctx->kbdev, kctx->id, heap->heap_id, - PFN_UP(heap->chunk_size * heap->max_chunks), - PFN_UP(heap->chunk_size * heap->chunk_count), heap->max_chunks, - heap->chunk_size, heap->chunk_count, heap->target_in_flight, 0); + KBASE_TLSTREAM_AUX_TILER_HEAP_STATS(kctx->kbdev, kctx->id, heap->heap_id, + PFN_UP(heap->chunk_size * heap->max_chunks), + PFN_UP(heap->chunk_size * heap->chunk_count), + heap->max_chunks, heap->chunk_size, heap->chunk_count, + 
heap->target_in_flight, 0); #if defined(CONFIG_MALI_VECTOR_DUMP) - list_for_each_entry(chunk, &heap->chunks_list, link) { - KBASE_TLSTREAM_JD_TILER_HEAP_CHUNK_ALLOC( - kctx->kbdev, kctx->id, heap->heap_id, chunk->gpu_va); - } + list_for_each_entry(chunk, &heap->chunks_list, link) { + KBASE_TLSTREAM_JD_TILER_HEAP_CHUNK_ALLOC(kctx->kbdev, kctx->id, heap->heap_id, + chunk->gpu_va); + } #endif + kctx->running_total_tiler_heap_nr_chunks += heap->chunk_count; + kctx->running_total_tiler_heap_memory += (u64)heap->chunk_size * heap->chunk_count; + if (kctx->running_total_tiler_heap_memory > kctx->peak_total_tiler_heap_memory) + kctx->peak_total_tiler_heap_memory = kctx->running_total_tiler_heap_memory; - dev_dbg(kctx->kbdev->dev, "Created tiler heap 0x%llX\n", heap->gpu_va); - mutex_unlock(&kctx->csf.tiler_heaps.lock); - kctx->running_total_tiler_heap_nr_chunks += heap->chunk_count; - kctx->running_total_tiler_heap_memory += - heap->chunk_size * heap->chunk_count; - if (kctx->running_total_tiler_heap_memory > - kctx->peak_total_tiler_heap_memory) - kctx->peak_total_tiler_heap_memory = - kctx->running_total_tiler_heap_memory; + dev_dbg(kctx->kbdev->dev, + "Created tiler heap 0x%llX, buffer descriptor 0x%llX, ctx_%d_%d\n", heap->gpu_va, + buf_desc_va, kctx->tgid, kctx->id); + mutex_unlock(&kctx->csf.tiler_heaps.lock); + + return 0; + +create_chunks_failed: + kbase_vunmap(kctx, &heap->gpu_va_map); +heap_context_vmap_failed: + kbase_csf_heap_context_allocator_free(ctx_alloc, heap->gpu_va); +heap_context_alloc_failed: + if (heap->buf_desc_reg) + kbase_vunmap(kctx, &heap->buf_desc_map); +buf_desc_vmap_failed: + if (heap->buf_desc_reg) { + kbase_gpu_vm_lock(kctx); + kbase_va_region_no_user_free_dec(heap->buf_desc_reg); + kbase_gpu_vm_unlock(kctx); } +buf_desc_not_suitable: + kfree(heap); return err; } @@ -517,16 +852,19 @@ int kbase_csf_tiler_heap_term(struct kbase_context *const kctx, u64 heap_size = 0; mutex_lock(&kctx->csf.tiler_heaps.lock); - heap = find_tiler_heap(kctx, heap_gpu_va); if (likely(heap)) { chunk_count = heap->chunk_count; heap_size = heap->chunk_size * chunk_count; - delete_heap(heap); - } else + + list_del_init(&heap->link); + } else { err = -EINVAL; + } - mutex_unlock(&kctx->csf.tiler_heaps.lock); + /* Update stats whilst still holding the lock so they are in sync with the tiler_heaps.list + * at all times + */ if (likely(kctx->running_total_tiler_heap_memory >= heap_size)) kctx->running_total_tiler_heap_memory -= heap_size; else @@ -537,36 +875,46 @@ int kbase_csf_tiler_heap_term(struct kbase_context *const kctx, else dev_warn(kctx->kbdev->dev, "Running total tiler chunk count lower than expected!"); + if (!err) + dev_dbg(kctx->kbdev->dev, + "Terminated tiler heap 0x%llX, buffer descriptor 0x%llX, ctx_%d_%d\n", + heap->gpu_va, heap->buf_desc_va, kctx->tgid, kctx->id); + mutex_unlock(&kctx->csf.tiler_heaps.lock); + + /* Deletion requires the kctx->reg_lock, so must only operate on it whilst unlinked from + * the kctx's csf.tiler_heaps.list, and without holding the csf.tiler_heaps.lock + */ + if (likely(heap)) + delete_heap(heap); + return err; } /** - * alloc_new_chunk - Allocate a new chunk for the tiler heap. - * - * @heap: Pointer to the tiler heap. - * @nr_in_flight: Number of render passes that are in-flight, must not be zero. - * @pending_frag_count: Number of render passes in-flight with completed vertex/tiler stage. 
- * The minimum value is zero but it must be less or equal to - * the total number of render passes in flight - * @new_chunk_ptr: Where to store the GPU virtual address & size of the new - * chunk allocated for the heap. - * - * This function will allocate a new chunk for the chunked tiler heap depending - * on the settings provided by userspace when the heap was created and the - * heap's statistics (like number of render passes in-flight). - * - * Return: 0 if a new chunk was allocated otherwise an appropriate negative - * error code. + * validate_allocation_request - Check whether the chunk allocation request + * received on tiler OOM should be handled at + * current time. + * + * @heap: The tiler heap the OOM is associated with + * @nr_in_flight: Number of fragment jobs in flight + * @pending_frag_count: Number of pending fragment jobs + * + * Context: must hold the tiler heap lock to guarantee its lifetime + * + * Return: + * * 0 - allowed to allocate an additional chunk + * * -EINVAL - invalid + * * -EBUSY - there are fragment jobs still in flight, which may free chunks + * after completing + * * -ENOMEM - the targeted number of in-flight chunks has been reached and + * no new ones will be allocated */ -static int alloc_new_chunk(struct kbase_csf_tiler_heap *heap, - u32 nr_in_flight, u32 pending_frag_count, u64 *new_chunk_ptr) +static int validate_allocation_request(struct kbase_csf_tiler_heap *heap, u32 nr_in_flight, + u32 pending_frag_count) { - int err = -ENOMEM; - lockdep_assert_held(&heap->kctx->csf.tiler_heaps.lock); - if (WARN_ON(!nr_in_flight) || - WARN_ON(pending_frag_count > nr_in_flight)) + if (WARN_ON(!nr_in_flight) || WARN_ON(pending_frag_count > nr_in_flight)) return -EINVAL; if (nr_in_flight <= heap->target_in_flight) { @@ -574,66 +922,452 @@ static int alloc_new_chunk(struct kbase_csf_tiler_heap *heap, /* Not exceeded the target number of render passes yet so be * generous with memory. */ - err = create_chunk(heap, false); - - if (likely(!err)) { - struct kbase_csf_tiler_heap_chunk *new_chunk = - get_last_chunk(heap); - if (!WARN_ON(!new_chunk)) { - *new_chunk_ptr = - encode_chunk_ptr(heap->chunk_size, - new_chunk->gpu_va); - return 0; - } - } + return 0; } else if (pending_frag_count > 0) { - err = -EBUSY; + return -EBUSY; } else { - err = -ENOMEM; + return -ENOMEM; } } else { /* Reached target number of render passes in flight. * Wait for some of them to finish */ - err = -EBUSY; + return -EBUSY; } - - return err; + return -ENOMEM; } int kbase_csf_tiler_heap_alloc_new_chunk(struct kbase_context *kctx, u64 gpu_heap_va, u32 nr_in_flight, u32 pending_frag_count, u64 *new_chunk_ptr) { struct kbase_csf_tiler_heap *heap; + struct kbase_csf_tiler_heap_chunk *chunk; int err = -EINVAL; + u64 chunk_size = 0; + u64 heap_id = 0; + + /* To avoid potential locking issues during allocation, this is handled + * in three phases: + * 1. Take the lock, find the corresponding heap, and find its chunk size + * (this is always 2 MB, but may change down the line). + * 2. Allocate memory for the chunk and its region. + * 3. If the heap still exists, link it to the end of the list. If it + * doesn't, roll back the allocation. 
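validate_allocation_request() above reduces the tiler OOM decision to three outcomes: allow a chunk while under the in-flight target, ask the firmware to retry later (-EBUSY) while fragment work that may free chunks is still running, and report -ENOMEM only when nothing more can be done. A small userspace sketch of that decision shape; the target and chunk-limit values are invented, and the chunk-limit check is an assumption based on the surrounding (elided) context of the hunk.

/* Sketch of the OOM allocation decision. Error numbers mirror Linux values;
 * EXAMPLE_TARGET_IN_FLIGHT, EXAMPLE_MAX_CHUNKS and the sample calls are invented.
 */
#include <stdio.h>

#define EXAMPLE_EINVAL 22
#define EXAMPLE_EBUSY  16
#define EXAMPLE_ENOMEM 12
#define EXAMPLE_TARGET_IN_FLIGHT 4u
#define EXAMPLE_MAX_CHUNKS       8u

static int validate_request(unsigned int nr_in_flight, unsigned int pending_frag_count,
			    unsigned int chunk_count)
{
	if (!nr_in_flight || pending_frag_count > nr_in_flight)
		return -EXAMPLE_EINVAL;               /* malformed request            */

	if (nr_in_flight <= EXAMPLE_TARGET_IN_FLIGHT) {
		if (chunk_count < EXAMPLE_MAX_CHUNKS) /* assumed chunk-limit check    */
			return 0;                     /* under target: allow a chunk  */
		if (pending_frag_count > 0)
			return -EXAMPLE_EBUSY;        /* fragment work may free chunks */
		return -EXAMPLE_ENOMEM;               /* heap is at its maximum size  */
	}

	return -EXAMPLE_EBUSY;                        /* target reached: wait          */
}

int main(void)
{
	printf("%d %d %d\n",
	       validate_request(2, 1, 3),   /* under target          -> 0       */
	       validate_request(3, 0, 8),   /* chunk limit, no frags -> -ENOMEM */
	       validate_request(6, 2, 3));  /* over target           -> -EBUSY  */
	return 0;
}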
+ */ mutex_lock(&kctx->csf.tiler_heaps.lock); + heap = find_tiler_heap(kctx, gpu_heap_va); + if (likely(heap)) { + chunk_size = heap->chunk_size; + heap_id = heap->heap_id; + } else { + dev_err(kctx->kbdev->dev, "Heap 0x%llX does not exist", gpu_heap_va); + mutex_unlock(&kctx->csf.tiler_heaps.lock); + goto prelink_failure; + } + err = validate_allocation_request(heap, nr_in_flight, pending_frag_count); + if (unlikely(err)) { + /* The allocation request can be legitimate, but be invoked on a heap + * that has already reached the maximum pre-configured capacity. This + * is useful debug information, but should not be treated as an error, + * since the request will be re-sent at a later point. + */ + dev_dbg(kctx->kbdev->dev, + "Not allocating new chunk for heap 0x%llX due to current heap state (err %d)", + gpu_heap_va, err); + mutex_unlock(&kctx->csf.tiler_heaps.lock); + goto prelink_failure; + } + mutex_unlock(&kctx->csf.tiler_heaps.lock); + /* this heap must not be used whilst we have dropped the lock */ + heap = NULL; + + chunk = alloc_new_chunk(kctx, chunk_size); + if (unlikely(!chunk)) { + dev_err(kctx->kbdev->dev, "Could not allocate chunk of size %lld for ctx %d_%d", + chunk_size, kctx->tgid, kctx->id); + goto prelink_failure; + } + + /* After this point, the heap that we were targeting could already have had the needed + * chunks allocated, if we were handling multiple OoM events on multiple threads, so + * we need to revalidate the need for the allocation. + */ + mutex_lock(&kctx->csf.tiler_heaps.lock); heap = find_tiler_heap(kctx, gpu_heap_va); - if (likely(heap)) { - err = alloc_new_chunk(heap, nr_in_flight, pending_frag_count, - new_chunk_ptr); - if (likely(!err)) { - /* update total and peak tiler heap memory record */ - kctx->running_total_tiler_heap_nr_chunks++; - kctx->running_total_tiler_heap_memory += heap->chunk_size; - - if (kctx->running_total_tiler_heap_memory > - kctx->peak_total_tiler_heap_memory) - kctx->peak_total_tiler_heap_memory = - kctx->running_total_tiler_heap_memory; - } + if (unlikely(!heap)) { + dev_err(kctx->kbdev->dev, "Tiler heap 0x%llX no longer exists!\n", gpu_heap_va); + mutex_unlock(&kctx->csf.tiler_heaps.lock); + goto unroll_chunk; + } - KBASE_TLSTREAM_AUX_TILER_HEAP_STATS( - kctx->kbdev, kctx->id, heap->heap_id, - PFN_UP(heap->chunk_size * heap->max_chunks), - PFN_UP(heap->chunk_size * heap->chunk_count), - heap->max_chunks, heap->chunk_size, heap->chunk_count, - heap->target_in_flight, nr_in_flight); + if (heap_id != heap->heap_id) { + dev_err(kctx->kbdev->dev, + "Tiler heap 0x%llX was removed from ctx %d_%d while allocating chunk of size %lld!", + gpu_heap_va, kctx->tgid, kctx->id, chunk_size); + mutex_unlock(&kctx->csf.tiler_heaps.lock); + goto unroll_chunk; } + if (WARN_ON(chunk_size != heap->chunk_size)) { + mutex_unlock(&kctx->csf.tiler_heaps.lock); + goto unroll_chunk; + } + + err = validate_allocation_request(heap, nr_in_flight, pending_frag_count); + if (unlikely(err)) { + dev_warn( + kctx->kbdev->dev, + "Aborting linking chunk to heap 0x%llX: heap state changed during allocation (err %d)", + gpu_heap_va, err); + mutex_unlock(&kctx->csf.tiler_heaps.lock); + goto unroll_chunk; + } + + err = init_chunk(heap, chunk, false); + + /* On error, the chunk would not be linked, so we can still treat it as an unlinked + * chunk for error handling. 
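The OoM path above deliberately drops the tiler_heaps lock while the chunk is allocated, then re-finds the heap and checks its identity before linking, rolling the allocation back if anything changed. A compact pthread sketch of that "record identity under lock, allocate unlocked, revalidate before linking" shape; the registry, heap id and chunk allocation are all invented stand-ins.

/* Sketch of the three-phase allocate-then-revalidate flow. Everything here
 * (the lock, the id lookup, the 2 MiB allocation) is illustrative only.
 */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

static pthread_mutex_t heaps_lock = PTHREAD_MUTEX_INITIALIZER;
static uint64_t current_heap_id = 42;   /* stands in for the heap list lookup */

static uint64_t find_heap_id_locked(void) { return current_heap_id; }

int main(void)
{
	/* Phase 1: look the heap up and remember its identity */
	pthread_mutex_lock(&heaps_lock);
	uint64_t heap_id = find_heap_id_locked();
	pthread_mutex_unlock(&heaps_lock);

	/* Phase 2: allocate without holding the lock (may be slow, may reclaim) */
	void *chunk = malloc(2 * 1024 * 1024);
	if (!chunk)
		return EXIT_FAILURE;

	/* Phase 3: retake the lock, revalidate, and only then link the chunk */
	pthread_mutex_lock(&heaps_lock);
	if (find_heap_id_locked() != heap_id) {
		pthread_mutex_unlock(&heaps_lock);
		free(chunk);                 /* heap changed: roll the allocation back */
		return EXIT_FAILURE;
	}
	printf("chunk linked to heap %llu\n", (unsigned long long)heap_id);
	pthread_mutex_unlock(&heaps_lock);

	free(chunk);
	return EXIT_SUCCESS;
}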
+ */ + if (unlikely(err)) { + dev_err(kctx->kbdev->dev, + "Could not link chunk(0x%llX) with tiler heap 0%llX in ctx %d_%d due to error %d", + chunk->gpu_va, gpu_heap_va, kctx->tgid, kctx->id, err); + mutex_unlock(&kctx->csf.tiler_heaps.lock); + goto unroll_chunk; + } + + *new_chunk_ptr = encode_chunk_ptr(heap->chunk_size, chunk->gpu_va); + + /* update total and peak tiler heap memory record */ + kctx->running_total_tiler_heap_nr_chunks++; + kctx->running_total_tiler_heap_memory += heap->chunk_size; + + if (kctx->running_total_tiler_heap_memory > kctx->peak_total_tiler_heap_memory) + kctx->peak_total_tiler_heap_memory = kctx->running_total_tiler_heap_memory; + + KBASE_TLSTREAM_AUX_TILER_HEAP_STATS(kctx->kbdev, kctx->id, heap->heap_id, + PFN_UP(heap->chunk_size * heap->max_chunks), + PFN_UP(heap->chunk_size * heap->chunk_count), + heap->max_chunks, heap->chunk_size, heap->chunk_count, + heap->target_in_flight, nr_in_flight); + mutex_unlock(&kctx->csf.tiler_heaps.lock); return err; +unroll_chunk: + remove_unlinked_chunk(kctx, chunk); +prelink_failure: + return err; +} + +static bool delete_chunk_physical_pages(struct kbase_csf_tiler_heap *heap, u64 chunk_gpu_va, + u64 *hdr_val) +{ + int err; + u64 *chunk_hdr; + struct kbase_context *kctx = heap->kctx; + struct kbase_csf_tiler_heap_chunk *chunk = NULL; + + lockdep_assert_held(&heap->kctx->csf.tiler_heaps.lock); + lockdep_assert_held(&kctx->kbdev->csf.scheduler.lock); + + chunk = find_chunk(heap, chunk_gpu_va); + if (unlikely(!chunk)) { + dev_warn(kctx->kbdev->dev, + "Failed to find tiler heap(0x%llX) chunk(0x%llX) for reclaim-delete\n", + heap->gpu_va, chunk_gpu_va); + return false; + } + + WARN((chunk->region->flags & KBASE_REG_CPU_CACHED), + "Cannot support CPU cached chunks without sync operations"); + chunk_hdr = chunk->map.addr; + *hdr_val = *chunk_hdr; + + dev_dbg(kctx->kbdev->dev, + "Reclaim: delete chunk(0x%llx) in heap(0x%llx), header value(0x%llX)\n", + chunk_gpu_va, heap->gpu_va, *hdr_val); + + err = kbase_mem_shrink_gpu_mapping(kctx, chunk->region, 0, chunk->region->gpu_alloc->nents); + if (unlikely(err)) { + dev_warn( + kctx->kbdev->dev, + "Reclaim: shrinking GPU mapping failed on chunk(0x%llx) in heap(0x%llx) (err %d)\n", + chunk_gpu_va, heap->gpu_va, err); + + /* Cannot free the pages whilst references on the GPU remain, so keep the chunk on + * the heap's chunk list and try a different heap. + */ + + return false; + } + /* Destroy the mapping before the physical pages which are mapped are destroyed. */ + kbase_vunmap(kctx, &chunk->map); + + err = kbase_free_phy_pages_helper(chunk->region->gpu_alloc, + chunk->region->gpu_alloc->nents); + if (unlikely(err)) { + dev_warn( + kctx->kbdev->dev, + "Reclaim: remove physical backing failed on chunk(0x%llx) in heap(0x%llx) (err %d), continuing with deferred removal\n", + chunk_gpu_va, heap->gpu_va, err); + + /* kbase_free_phy_pages_helper() should only fail on invalid input, and WARNs + * anyway, so continue instead of returning early. + * + * Indeed, we don't want to leave the chunk on the heap's chunk list whilst it has + * its mapping removed, as that could lead to problems. It's safest to instead + * continue with deferred destruction of the chunk. 
+ */ + } + + dev_dbg(kctx->kbdev->dev, + "Reclaim: delete chunk(0x%llx) in heap(0x%llx), header value(0x%llX)\n", + chunk_gpu_va, heap->gpu_va, *hdr_val); + + mutex_lock(&heap->kctx->jit_evict_lock); + list_move(&chunk->region->jit_node, &kctx->jit_destroy_head); + mutex_unlock(&heap->kctx->jit_evict_lock); + + list_del(&chunk->link); + heap->chunk_count--; + kfree(chunk); + + return true; +} + +static void sanity_check_gpu_buffer_heap(struct kbase_csf_tiler_heap *heap, + struct kbase_csf_gpu_buffer_heap *desc) +{ + u64 first_hoarded_chunk_gpu_va = desc->pointer & CHUNK_ADDR_MASK; + + lockdep_assert_held(&heap->kctx->csf.tiler_heaps.lock); + + if (first_hoarded_chunk_gpu_va) { + struct kbase_csf_tiler_heap_chunk *chunk = + find_chunk(heap, first_hoarded_chunk_gpu_va); + + if (likely(chunk)) { + dev_dbg(heap->kctx->kbdev->dev, + "Buffer descriptor 0x%llX sanity check ok, HW reclaim allowed\n", + heap->buf_desc_va); + + heap->buf_desc_checked = true; + return; + } + } + /* If there is no match, defer the check to next time */ + dev_dbg(heap->kctx->kbdev->dev, "Buffer descriptor 0x%llX runtime sanity check deferred\n", + heap->buf_desc_va); +} + +static bool can_read_hw_gpu_buffer_heap(struct kbase_csf_tiler_heap *heap, u64 *chunk_gpu_va_ptr) +{ + struct kbase_context *kctx = heap->kctx; + + lockdep_assert_held(&kctx->csf.tiler_heaps.lock); + + /* Initialize the descriptor pointer value to 0 */ + *chunk_gpu_va_ptr = 0; + + /* The BufferDescriptor on heap is a hint on creation, do a sanity check at runtime */ + if (heap->buf_desc_reg && !heap->buf_desc_checked) { + struct kbase_csf_gpu_buffer_heap *desc = heap->buf_desc_map.addr; + + /* BufferDescriptor is supplied by userspace, so could be CPU-cached */ + if (heap->buf_desc_map.flags & KBASE_VMAP_FLAG_SYNC_NEEDED) + kbase_sync_mem_regions(kctx, &heap->buf_desc_map, KBASE_SYNC_TO_CPU); + + sanity_check_gpu_buffer_heap(heap, desc); + if (heap->buf_desc_checked) + *chunk_gpu_va_ptr = desc->pointer & CHUNK_ADDR_MASK; + } + + return heap->buf_desc_checked; +} + +static u32 delete_hoarded_chunks(struct kbase_csf_tiler_heap *heap) +{ + u32 freed = 0; + u64 chunk_gpu_va = 0; + struct kbase_context *kctx = heap->kctx; + struct kbase_csf_tiler_heap_chunk *chunk = NULL; + + lockdep_assert_held(&kctx->csf.tiler_heaps.lock); + + if (can_read_hw_gpu_buffer_heap(heap, &chunk_gpu_va)) { + u64 chunk_hdr_val; + u64 *hw_hdr; + + if (!chunk_gpu_va) { + struct kbase_csf_gpu_buffer_heap *desc = heap->buf_desc_map.addr; + + /* BufferDescriptor is supplied by userspace, so could be CPU-cached */ + if (heap->buf_desc_map.flags & KBASE_VMAP_FLAG_SYNC_NEEDED) + kbase_sync_mem_regions(kctx, &heap->buf_desc_map, + KBASE_SYNC_TO_CPU); + chunk_gpu_va = desc->pointer & CHUNK_ADDR_MASK; + + if (!chunk_gpu_va) { + dev_dbg(kctx->kbdev->dev, + "Buffer descriptor 0x%llX has no chunks (NULL) for reclaim scan\n", + heap->buf_desc_va); + goto out; + } + } + + chunk = find_chunk(heap, chunk_gpu_va); + if (unlikely(!chunk)) + goto out; + + WARN((chunk->region->flags & KBASE_REG_CPU_CACHED), + "Cannot support CPU cached chunks without sync operations"); + hw_hdr = chunk->map.addr; + + /* Move onto the next chunk relevant information */ + chunk_hdr_val = *hw_hdr; + chunk_gpu_va = chunk_hdr_val & CHUNK_ADDR_MASK; + + while (chunk_gpu_va && heap->chunk_count > HEAP_SHRINK_STOP_LIMIT) { + bool success = + delete_chunk_physical_pages(heap, chunk_gpu_va, &chunk_hdr_val); + + if (!success) + break; + + freed++; + /* On success, chunk_hdr_val is updated, extract the next chunk address */ + 
chunk_gpu_va = chunk_hdr_val & CHUNK_ADDR_MASK; + } + + /* Update the existing hardware chunk header, after reclaim deletion of chunks */ + *hw_hdr = chunk_hdr_val; + + dev_dbg(heap->kctx->kbdev->dev, + "HW reclaim scan freed chunks: %u, set hw_hdr[0]: 0x%llX\n", freed, + chunk_hdr_val); + } else { + dev_dbg(kctx->kbdev->dev, + "Skip HW reclaim scan, (disabled: buffer descriptor 0x%llX)\n", + heap->buf_desc_va); + } +out: + return freed; +} + +static u64 delete_unused_chunk_pages(struct kbase_csf_tiler_heap *heap) +{ + u32 freed_chunks = 0; + u64 freed_pages = 0; + u64 chunk_gpu_va; + u64 chunk_hdr_val; + struct kbase_context *kctx = heap->kctx; + u64 *ctx_ptr; + + lockdep_assert_held(&kctx->csf.tiler_heaps.lock); + + WARN(heap->gpu_va_map.flags & KBASE_VMAP_FLAG_SYNC_NEEDED, + "Cannot support CPU cached heap context without sync operations"); + + ctx_ptr = heap->gpu_va_map.addr; + + /* Extract the first chunk address from the context's free_list_head */ + chunk_hdr_val = *ctx_ptr; + chunk_gpu_va = chunk_hdr_val & CHUNK_ADDR_MASK; + + while (chunk_gpu_va) { + u64 hdr_val; + bool success = delete_chunk_physical_pages(heap, chunk_gpu_va, &hdr_val); + + if (!success) + break; + + freed_chunks++; + chunk_hdr_val = hdr_val; + /* extract the next chunk address */ + chunk_gpu_va = chunk_hdr_val & CHUNK_ADDR_MASK; + } + + /* Update the post-scan deletion to context header */ + *ctx_ptr = chunk_hdr_val; + + /* Try to scan the HW hoarded list of unused chunks */ + freed_chunks += delete_hoarded_chunks(heap); + freed_pages = freed_chunks * PFN_UP(heap->chunk_size); + dev_dbg(heap->kctx->kbdev->dev, + "Scan reclaim freed chunks/pages %u/%llu, set heap-ctx_u64[0]: 0x%llX\n", + freed_chunks, freed_pages, chunk_hdr_val); + + /* Update context tiler heaps memory usage */ + kctx->running_total_tiler_heap_memory -= freed_pages << PAGE_SHIFT; + kctx->running_total_tiler_heap_nr_chunks -= freed_chunks; + return freed_pages; +} + +u32 kbase_csf_tiler_heap_scan_kctx_unused_pages(struct kbase_context *kctx, u32 to_free) +{ + u64 freed = 0; + struct kbase_csf_tiler_heap *heap; + + mutex_lock(&kctx->csf.tiler_heaps.lock); + + list_for_each_entry(heap, &kctx->csf.tiler_heaps.list, link) { + freed += delete_unused_chunk_pages(heap); + + /* If freed enough, then stop here */ + if (freed >= to_free) + break; + } + + mutex_unlock(&kctx->csf.tiler_heaps.lock); + /* The scan is surely not more than 4-G pages, but for logic flow limit it */ + if (WARN_ON(unlikely(freed > U32_MAX))) + return U32_MAX; + else + return (u32)freed; +} + +static u64 count_unused_heap_pages(struct kbase_csf_tiler_heap *heap) +{ + u32 chunk_cnt = 0; + u64 page_cnt = 0; + + lockdep_assert_held(&heap->kctx->csf.tiler_heaps.lock); + + /* Here the count is basically an informed estimate, avoiding the costly mapping/unmaping + * in the chunk list walk. The downside is that the number is a less reliable guide for + * later on scan (free) calls on this heap for what actually is freeable. 
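The reclaim scan above follows a singly linked list that lives in GPU memory: each chunk's first u64 packs the address of the next chunk, and traversal stops at a NULL link or once only HEAP_SHRINK_STOP_LIMIT chunks would remain. A userspace sketch of walking such packed header words; the mask value, header contents and chunk table are invented, and only the traversal shape mirrors the patch.

/* Sketch of the reclaim-scan walk over packed chunk headers with a stop limit. */
#include <stdint.h>
#include <stdio.h>

#define EXAMPLE_ADDR_MASK   0xFFFFFFFFFFFFF000ULL /* stand-in for CHUNK_ADDR_MASK    */
#define EXAMPLE_STOP_LIMIT  1u                    /* mirrors HEAP_SHRINK_STOP_LIMIT  */

struct example_chunk {
	uint64_t gpu_va;
	uint64_t header;  /* next chunk address (plus size bits) written by the tiler */
};

static struct example_chunk *lookup(struct example_chunk *chunks, int n, uint64_t gpu_va)
{
	for (int i = 0; i < n; i++)
		if (chunks[i].gpu_va == gpu_va)
			return &chunks[i];
	return NULL;
}

int main(void)
{
	struct example_chunk chunks[] = {
		{ 0x10000, 0x11000 }, /* chunk 0 links to chunk 1      */
		{ 0x11000, 0x12000 }, /* chunk 1 links to chunk 2      */
		{ 0x12000, 0x0     }, /* chunk 2 terminates the list   */
	};
	unsigned int chunk_count = 3, freed = 0;
	uint64_t next = chunks[0].gpu_va & EXAMPLE_ADDR_MASK;

	while (next && chunk_count > EXAMPLE_STOP_LIMIT) {
		struct example_chunk *c = lookup(chunks, 3, next);

		if (!c)
			break;
		next = c->header & EXAMPLE_ADDR_MASK;  /* follow the packed link */
		chunk_count--;
		freed++;
		printf("freed chunk at 0x%llx\n", (unsigned long long)c->gpu_va);
	}
	printf("freed %u chunks, %u remain\n", freed, chunk_count);
	return 0;
}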
+ */ + if (heap->chunk_count > HEAP_SHRINK_STOP_LIMIT) { + chunk_cnt = heap->chunk_count - HEAP_SHRINK_STOP_LIMIT; + page_cnt = chunk_cnt * PFN_UP(heap->chunk_size); + } + + dev_dbg(heap->kctx->kbdev->dev, + "Reclaim count chunks/pages %u/%llu (estimated), heap_va: 0x%llX\n", chunk_cnt, + page_cnt, heap->gpu_va); + + return page_cnt; +} + +u32 kbase_csf_tiler_heap_count_kctx_unused_pages(struct kbase_context *kctx) +{ + u64 page_cnt = 0; + struct kbase_csf_tiler_heap *heap; + + mutex_lock(&kctx->csf.tiler_heaps.lock); + + list_for_each_entry(heap, &kctx->csf.tiler_heaps.list, link) + page_cnt += count_unused_heap_pages(heap); + + mutex_unlock(&kctx->csf.tiler_heaps.lock); + + /* The count is surely not more than 4-G pages, but for logic flow limit it */ + if (WARN_ON(unlikely(page_cnt > U32_MAX))) + return U32_MAX; + else + return (u32)page_cnt; } diff --git a/mali_kbase/csf/mali_kbase_csf_tiler_heap.h b/mali_kbase/csf/mali_kbase_csf_tiler_heap.h index 4031ad4..1b5cb56 100644 --- a/mali_kbase/csf/mali_kbase_csf_tiler_heap.h +++ b/mali_kbase/csf/mali_kbase_csf_tiler_heap.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -23,7 +23,6 @@ #define _KBASE_CSF_TILER_HEAP_H_ #include <mali_kbase.h> - /** * kbase_csf_tiler_heap_context_init - Initialize the tiler heaps context for a * GPU address space @@ -58,6 +57,12 @@ void kbase_csf_tiler_heap_context_term(struct kbase_context *kctx); * @target_in_flight: Number of render-passes that the driver should attempt to * keep in flight for which allocation of new chunks is * allowed. Must not be zero. + * @buf_desc_va: Buffer descriptor GPU virtual address. This is a hint for + * indicating that the caller is intending to perform tiler heap + * chunks reclaim for those that are hoarded with hardware while + * the associated shader activites are suspended and the CSGs are + * off slots. If the referred reclaiming is not desired, can + * set it to 0. * @gpu_heap_va: Where to store the GPU virtual address of the context that was * set up for the tiler heap. * @first_chunk_va: Where to store the GPU virtual address of the first chunk @@ -66,13 +71,12 @@ void kbase_csf_tiler_heap_context_term(struct kbase_context *kctx); * * Return: 0 if successful or a negative error code on failure. */ -int kbase_csf_tiler_heap_init(struct kbase_context *kctx, - u32 chunk_size, u32 initial_chunks, u32 max_chunks, - u16 target_in_flight, u64 *gpu_heap_va, - u64 *first_chunk_va); +int kbase_csf_tiler_heap_init(struct kbase_context *kctx, u32 chunk_size, u32 initial_chunks, + u32 max_chunks, u16 target_in_flight, u64 const buf_desc_va, + u64 *gpu_heap_va, u64 *first_chunk_va); /** - * kbasep_cs_tiler_heap_term - Terminate a chunked tiler memory heap. + * kbase_csf_tiler_heap_term - Terminate a chunked tiler memory heap. * * @kctx: Pointer to the kbase context in which the tiler heap was initialized. 
* @gpu_heap_va: The GPU virtual address of the context that was set up for the @@ -112,4 +116,27 @@ int kbase_csf_tiler_heap_term(struct kbase_context *kctx, u64 gpu_heap_va); */ int kbase_csf_tiler_heap_alloc_new_chunk(struct kbase_context *kctx, u64 gpu_heap_va, u32 nr_in_flight, u32 pending_frag_count, u64 *new_chunk_ptr); + +/** + * kbase_csf_tiler_heap_scan_kctx_unused_pages - Performs the tiler heap shrinker reclaim's scan + * functionality. + * + * @kctx: Pointer to the kbase context for which the tiler heap reclaim is to be + * performed. + * @to_free: Number of pages suggested for the reclaim scan (free) method to reach. + * + * Return: the actual number of pages the scan method has freed from the call. + */ +u32 kbase_csf_tiler_heap_scan_kctx_unused_pages(struct kbase_context *kctx, u32 to_free); + +/** + * kbase_csf_tiler_heap_count_kctx_unused_pages - Performs the tiler heap shrinker reclaim's count + * functionality. + * + * @kctx: Pointer to the kbase context for which the tiler heap reclaim is to be + * performed. + * + * Return: a number of pages that could likely be freed on the subsequent scan method call. + */ +u32 kbase_csf_tiler_heap_count_kctx_unused_pages(struct kbase_context *kctx); #endif diff --git a/mali_kbase/csf/mali_kbase_csf_tiler_heap_def.h b/mali_kbase/csf/mali_kbase_csf_tiler_heap_def.h index 2c006d9..96f2b03 100644 --- a/mali_kbase/csf/mali_kbase_csf_tiler_heap_def.h +++ b/mali_kbase/csf/mali_kbase_csf_tiler_heap_def.h @@ -56,12 +56,20 @@ ((CHUNK_HDR_NEXT_ADDR_MASK >> CHUNK_HDR_NEXT_ADDR_POS) << \ CHUNK_HDR_NEXT_ADDR_ENCODE_SHIFT) +/* The size of the area needed to be vmapped prior to handing the tiler heap + * over to the tiler, so that the shrinker could be invoked. + */ +#define NEXT_CHUNK_ADDR_SIZE (sizeof(u64)) + /** * struct kbase_csf_tiler_heap_chunk - A tiler heap chunk managed by the kernel * * @link: Link to this chunk in a list of chunks belonging to a * @kbase_csf_tiler_heap. * @region: Pointer to the GPU memory region allocated for the chunk. + * @map: Kernel VA mapping so that we would not need to use vmap in the + * shrinker callback, which can allocate. This maps only the header + * of the chunk, so it could be traversed. * @gpu_va: GPU virtual address of the start of the memory region. * This points to the header of the chunk and not to the low address * of free memory within it. @@ -75,9 +83,12 @@ struct kbase_csf_tiler_heap_chunk { struct list_head link; struct kbase_va_region *region; + struct kbase_vmap_struct map; u64 gpu_va; }; + +#define HEAP_BUF_DESCRIPTOR_CHECKED (1 << 0) + /** * struct kbase_csf_tiler_heap - A tiler heap managed by the kernel * * @kctx: Pointer to the kbase context with which this heap is * associated. * @link: Link to this heap in a list of tiler heaps belonging to * the @kbase_csf_tiler_heap_context. + * @chunks_list: Linked list of allocated chunks. + * @gpu_va: The GPU virtual address of the heap context structure that + * was allocated for the firmware. This is also used to + * uniquely identify the heap. + * @heap_id: Unique id representing the heap, assigned during heap + * initialization. + * @buf_desc_va: Buffer descriptor GPU VA. Can be 0 for backward compatibility + * with earlier versions of the base interface. + * @buf_desc_reg: Pointer to the VA region that covers the provided buffer + * descriptor memory object pointed to by buf_desc_va. + * @gpu_va_map: Kernel VA mapping of the GPU VA region. + * @buf_desc_map: Kernel VA mapping of the buffer descriptor, read from + * during the tiler heap shrinker.
Sync operations may need + * to be done before each read. * @chunk_size: Size of each chunk, in bytes. Must be page-aligned. * @chunk_count: The number of chunks currently allocated. Must not be * zero or greater than @max_chunks. @@ -93,22 +118,23 @@ struct kbase_csf_tiler_heap_chunk { * @target_in_flight: Number of render-passes that the driver should attempt * to keep in flight for which allocation of new chunks is * allowed. Must not be zero. - * @gpu_va: The GPU virtual address of the heap context structure that - * was allocated for the firmware. This is also used to - * uniquely identify the heap. - * @heap_id: Unique id representing the heap, assigned during heap - * initialization. - * @chunks_list: Linked list of allocated chunks. + * @buf_desc_checked: Indicates if runtime check on buffer descriptor has been done. */ struct kbase_csf_tiler_heap { struct kbase_context *kctx; struct list_head link; + struct list_head chunks_list; + u64 gpu_va; + u64 heap_id; + u64 buf_desc_va; + struct kbase_va_region *buf_desc_reg; + struct kbase_vmap_struct buf_desc_map; + struct kbase_vmap_struct gpu_va_map; u32 chunk_size; u32 chunk_count; u32 max_chunks; u16 target_in_flight; - u64 gpu_va; - u64 heap_id; - struct list_head chunks_list; + bool buf_desc_checked; }; + #endif /* !_KBASE_CSF_TILER_HEAP_DEF_H_ */ diff --git a/mali_kbase/csf/mali_kbase_csf_tiler_heap_reclaim.c b/mali_kbase/csf/mali_kbase_csf_tiler_heap_reclaim.c new file mode 100644 index 0000000..39db1a0 --- /dev/null +++ b/mali_kbase/csf/mali_kbase_csf_tiler_heap_reclaim.c @@ -0,0 +1,394 @@ +// SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note +/* + * + * (C) COPYRIGHT 2022 ARM Limited. All rights reserved. + * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. 
+ * + */ + +#include <mali_kbase.h> +#include "backend/gpu/mali_kbase_pm_internal.h" +#include "mali_kbase_csf.h" +#include "mali_kbase_csf_tiler_heap.h" +#include "mali_kbase_csf_tiler_heap_reclaim.h" + +/* Tiler heap shrinker seek value, needs to be higher than jit and memory pools */ +#define HEAP_SHRINKER_SEEKS (DEFAULT_SEEKS + 2) + +/* Tiler heap shrinker batch value */ +#define HEAP_SHRINKER_BATCH (512) + +/* Tiler heap reclaim scan (free) method size for limiting a scan run length */ +#define HEAP_RECLAIM_SCAN_BATCH_SIZE (HEAP_SHRINKER_BATCH << 7) + +static u8 get_kctx_highest_csg_priority(struct kbase_context *kctx) +{ + u8 prio; + + for (prio = KBASE_QUEUE_GROUP_PRIORITY_REALTIME; prio < KBASE_QUEUE_GROUP_PRIORITY_LOW; + prio++) + if (!list_empty(&kctx->csf.sched.runnable_groups[prio])) + break; + + if (prio != KBASE_QUEUE_GROUP_PRIORITY_REALTIME && kctx->csf.sched.num_idle_wait_grps) { + struct kbase_queue_group *group; + + list_for_each_entry(group, &kctx->csf.sched.idle_wait_groups, link) { + if (group->priority < prio) + prio = group->priority; + } + } + + return prio; +} + +static void detach_ctx_from_heap_reclaim_mgr(struct kbase_context *kctx) +{ + struct kbase_csf_scheduler *const scheduler = &kctx->kbdev->csf.scheduler; + struct kbase_csf_ctx_heap_reclaim_info *info = &kctx->csf.sched.heap_info; + + lockdep_assert_held(&scheduler->lock); + + if (!list_empty(&info->mgr_link)) { + u32 remaining = (info->nr_est_unused_pages > info->nr_freed_pages) ? + info->nr_est_unused_pages - info->nr_freed_pages : + 0; + + list_del_init(&info->mgr_link); + if (remaining) + WARN_ON(atomic_sub_return(remaining, &scheduler->reclaim_mgr.unused_pages) < + 0); + + dev_dbg(kctx->kbdev->dev, + "Reclaim_mgr_detach: ctx_%d_%d, est_pages=0%u, freed_pages=%u", kctx->tgid, + kctx->id, info->nr_est_unused_pages, info->nr_freed_pages); + } +} + +static void attach_ctx_to_heap_reclaim_mgr(struct kbase_context *kctx) +{ + struct kbase_csf_ctx_heap_reclaim_info *const info = &kctx->csf.sched.heap_info; + struct kbase_csf_scheduler *const scheduler = &kctx->kbdev->csf.scheduler; + u8 const prio = get_kctx_highest_csg_priority(kctx); + + lockdep_assert_held(&scheduler->lock); + + if (WARN_ON(!list_empty(&info->mgr_link))) + list_del_init(&info->mgr_link); + + /* Count the pages that could be freed */ + info->nr_est_unused_pages = kbase_csf_tiler_heap_count_kctx_unused_pages(kctx); + /* Initialize the scan operation tracking pages */ + info->nr_freed_pages = 0; + + list_add_tail(&info->mgr_link, &scheduler->reclaim_mgr.ctx_lists[prio]); + /* Accumulate the estimated pages to the manager total field */ + atomic_add(info->nr_est_unused_pages, &scheduler->reclaim_mgr.unused_pages); + + dev_dbg(kctx->kbdev->dev, "Reclaim_mgr_attach: ctx_%d_%d, est_count_pages=%u", kctx->tgid, + kctx->id, info->nr_est_unused_pages); +} + +void kbase_csf_tiler_heap_reclaim_sched_notify_grp_active(struct kbase_queue_group *group) +{ + struct kbase_context *kctx = group->kctx; + struct kbase_csf_ctx_heap_reclaim_info *info = &kctx->csf.sched.heap_info; + + lockdep_assert_held(&kctx->kbdev->csf.scheduler.lock); + + info->on_slot_grps++; + /* If the kctx has an on-slot change from 0 => 1, detach it from reclaim_mgr */ + if (info->on_slot_grps == 1) { + dev_dbg(kctx->kbdev->dev, "CSG_%d_%d_%d on-slot, remove kctx from reclaim manager", + group->kctx->tgid, group->kctx->id, group->handle); + + detach_ctx_from_heap_reclaim_mgr(kctx); + } +} + +void kbase_csf_tiler_heap_reclaim_sched_notify_grp_evict(struct kbase_queue_group *group) +{ + 
struct kbase_context *kctx = group->kctx; + struct kbase_csf_ctx_heap_reclaim_info *const info = &kctx->csf.sched.heap_info; + struct kbase_csf_scheduler *const scheduler = &kctx->kbdev->csf.scheduler; + const u32 num_groups = kctx->kbdev->csf.global_iface.group_num; + u32 on_slot_grps = 0; + u32 i; + + lockdep_assert_held(&scheduler->lock); + + /* Group eviction from the scheduler is a bit more complex, but fairly less + * frequent in operations. Taking the opportunity to actually count the + * on-slot CSGs from the given kctx, for robustness and clearer code logic. + */ + for_each_set_bit(i, scheduler->csg_inuse_bitmap, num_groups) { + struct kbase_csf_csg_slot *csg_slot = &scheduler->csg_slots[i]; + struct kbase_queue_group *grp = csg_slot->resident_group; + + if (unlikely(!grp)) + continue; + + if (grp->kctx == kctx) + on_slot_grps++; + } + + info->on_slot_grps = on_slot_grps; + + /* If the kctx has no other CSGs on-slot, handle the heap reclaim related actions */ + if (!info->on_slot_grps) { + if (kctx->csf.sched.num_runnable_grps || kctx->csf.sched.num_idle_wait_grps) { + /* The kctx has other operational CSGs, attach it if not yet done */ + if (list_empty(&info->mgr_link)) { + dev_dbg(kctx->kbdev->dev, + "CSG_%d_%d_%d evict, add kctx to reclaim manager", + group->kctx->tgid, group->kctx->id, group->handle); + + attach_ctx_to_heap_reclaim_mgr(kctx); + } + } else { + /* The kctx is a zombie after the group eviction, drop it out */ + dev_dbg(kctx->kbdev->dev, + "CSG_%d_%d_%d evict leading to zombie kctx, dettach from reclaim manager", + group->kctx->tgid, group->kctx->id, group->handle); + + detach_ctx_from_heap_reclaim_mgr(kctx); + } + } +} + +void kbase_csf_tiler_heap_reclaim_sched_notify_grp_suspend(struct kbase_queue_group *group) +{ + struct kbase_context *kctx = group->kctx; + struct kbase_csf_ctx_heap_reclaim_info *info = &kctx->csf.sched.heap_info; + + lockdep_assert_held(&kctx->kbdev->csf.scheduler.lock); + + if (!WARN_ON(info->on_slot_grps == 0)) + info->on_slot_grps--; + /* If the kctx has no CSGs on-slot, attach it to scheduler's reclaim manager */ + if (info->on_slot_grps == 0) { + dev_dbg(kctx->kbdev->dev, "CSG_%d_%d_%d off-slot, add kctx to reclaim manager", + group->kctx->tgid, group->kctx->id, group->handle); + + attach_ctx_to_heap_reclaim_mgr(kctx); + } +} + +static unsigned long reclaim_unused_heap_pages(struct kbase_device *kbdev) +{ + struct kbase_csf_scheduler *const scheduler = &kbdev->csf.scheduler; + struct kbase_csf_sched_heap_reclaim_mgr *const mgr = &scheduler->reclaim_mgr; + unsigned long total_freed_pages = 0; + int prio; + + lockdep_assert_held(&scheduler->lock); + + if (scheduler->state != SCHED_SUSPENDED) { + /* Clean and invalidate the L2 cache before reading from the heap contexts, + * headers of the individual chunks and buffer descriptors. + */ + kbase_gpu_start_cache_clean(kbdev, GPU_COMMAND_CACHE_CLN_INV_L2); + if (kbase_gpu_wait_cache_clean_timeout(kbdev, + kbdev->mmu_or_gpu_cache_op_wait_time_ms)) + dev_warn( + kbdev->dev, + "[%llu] Timeout waiting for CACHE_CLN_INV_L2 to complete before Tiler heap reclaim", + kbase_backend_get_cycle_cnt(kbdev)); + + } else { + /* Make sure power down transitions have completed, i.e. L2 has been + * powered off as that would ensure its contents are flushed to memory. + * This is needed as Scheduler doesn't wait for the power down to finish. 
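A quick aside on the bookkeeping performed by the attach/detach helpers earlier in this file: when a context goes fully off-slot its estimated reclaimable pages are added to a single atomic total, and on detach only the not-yet-freed remainder is subtracted, so the shrinker's count callback can read one counter instead of walking every heap. A rough standalone model (stand-in names, C11 atomics in place of the kernel's atomic_t):

#include <stdatomic.h>
#include <stdint.h>

struct model_ctx_info {
	uint32_t est_unused_pages;
	uint32_t freed_pages;
	int attached;
};

static atomic_uint total_unused_pages; /* models reclaim_mgr.unused_pages */

static void model_attach(struct model_ctx_info *info, uint32_t estimate)
{
	info->est_unused_pages = estimate;
	info->freed_pages = 0;
	info->attached = 1;
	atomic_fetch_add(&total_unused_pages, estimate);
}

static void model_detach(struct model_ctx_info *info)
{
	uint32_t remaining;

	if (!info->attached)
		return;
	remaining = info->est_unused_pages > info->freed_pages ?
			    info->est_unused_pages - info->freed_pages : 0;
	atomic_fetch_sub(&total_unused_pages, remaining);
	info->attached = 0;
}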
+ */ + if (kbase_pm_wait_for_desired_state(kbdev)) + dev_warn(kbdev->dev, + "Wait for power down transition failed before Tiler heap reclaim"); + } + + for (prio = KBASE_QUEUE_GROUP_PRIORITY_LOW; + total_freed_pages < HEAP_RECLAIM_SCAN_BATCH_SIZE && + prio >= KBASE_QUEUE_GROUP_PRIORITY_REALTIME; + prio--) { + struct kbase_csf_ctx_heap_reclaim_info *info, *tmp; + u32 cnt_ctxs = 0; + + list_for_each_entry_safe(info, tmp, &scheduler->reclaim_mgr.ctx_lists[prio], + mgr_link) { + struct kbase_context *kctx = + container_of(info, struct kbase_context, csf.sched.heap_info); + u32 freed_pages = kbase_csf_tiler_heap_scan_kctx_unused_pages( + kctx, info->nr_est_unused_pages); + + if (freed_pages) { + /* Remove the freed pages from the manager retained estimate. The + * accumulated removals from the kctx should not exceed the kctx + * initially notified contribution amount: + * info->nr_est_unused_pages. + */ + u32 rm_cnt = MIN(info->nr_est_unused_pages - info->nr_freed_pages, + freed_pages); + + WARN_ON(atomic_sub_return(rm_cnt, &mgr->unused_pages) < 0); + + /* tracking the freed pages, before a potential detach call */ + info->nr_freed_pages += freed_pages; + total_freed_pages += freed_pages; + + schedule_work(&kctx->jit_work); + } + + /* If the kctx can't offer anymore, drop it from the reclaim manger, + * otherwise leave it remaining in. If the kctx changes its state (i.e. + * some CSGs becoming on-slot), the scheduler will pull it out. + */ + if (info->nr_freed_pages >= info->nr_est_unused_pages || freed_pages == 0) + detach_ctx_from_heap_reclaim_mgr(kctx); + + cnt_ctxs++; + + /* Enough has been freed, break to avoid holding the lock too long */ + if (total_freed_pages >= HEAP_RECLAIM_SCAN_BATCH_SIZE) + break; + } + + dev_dbg(kbdev->dev, "Reclaim free heap pages: %lu (cnt_ctxs: %u, prio: %d)", + total_freed_pages, cnt_ctxs, prio); + } + + dev_dbg(kbdev->dev, "Reclaim free total heap pages: %lu (across all CSG priority)", + total_freed_pages); + + return total_freed_pages; +} + +static unsigned long kbase_csf_tiler_heap_reclaim_count_free_pages(struct kbase_device *kbdev, + struct shrink_control *sc) +{ + struct kbase_csf_sched_heap_reclaim_mgr *mgr = &kbdev->csf.scheduler.reclaim_mgr; + unsigned long page_cnt = atomic_read(&mgr->unused_pages); + + dev_dbg(kbdev->dev, "Reclaim count unused pages (estimate): %lu", page_cnt); + + return page_cnt; +} + +static unsigned long kbase_csf_tiler_heap_reclaim_scan_free_pages(struct kbase_device *kbdev, + struct shrink_control *sc) +{ + struct kbase_csf_sched_heap_reclaim_mgr *mgr = &kbdev->csf.scheduler.reclaim_mgr; + unsigned long freed = 0; + unsigned long avail = 0; + + /* If Scheduler is busy in action, return 0 */ + if (!rt_mutex_trylock(&kbdev->csf.scheduler.lock)) { + struct kbase_csf_scheduler *const scheduler = &kbdev->csf.scheduler; + + /* Wait for roughly 2-ms */ + wait_event_timeout(kbdev->csf.event_wait, (scheduler->state != SCHED_BUSY), + msecs_to_jiffies(2)); + if (!rt_mutex_trylock(&kbdev->csf.scheduler.lock)) { + dev_dbg(kbdev->dev, "Tiler heap reclaim scan see device busy (freed: 0)"); + return 0; + } + } + + avail = atomic_read(&mgr->unused_pages); + if (avail) + freed = reclaim_unused_heap_pages(kbdev); + + rt_mutex_unlock(&kbdev->csf.scheduler.lock); + +#if (KERNEL_VERSION(4, 14, 0) <= LINUX_VERSION_CODE) + if (freed > sc->nr_to_scan) + sc->nr_scanned = freed; +#endif /* (KERNEL_VERSION(4, 14, 0) <= LINUX_VERSION_CODE) */ + + dev_dbg(kbdev->dev, "Tiler heap reclaim scan freed pages: %lu (unused: %lu)", freed, + avail); + + /* On 
estimate suggesting available, yet actual free failed, return STOP */ + if (avail && !freed) + return SHRINK_STOP; + else + return freed; +} + +static unsigned long kbase_csf_tiler_heap_reclaim_count_objects(struct shrinker *s, + struct shrink_control *sc) +{ + struct kbase_device *kbdev = + container_of(s, struct kbase_device, csf.scheduler.reclaim_mgr.heap_reclaim); + + return kbase_csf_tiler_heap_reclaim_count_free_pages(kbdev, sc); +} + +static unsigned long kbase_csf_tiler_heap_reclaim_scan_objects(struct shrinker *s, + struct shrink_control *sc) +{ + struct kbase_device *kbdev = + container_of(s, struct kbase_device, csf.scheduler.reclaim_mgr.heap_reclaim); + + return kbase_csf_tiler_heap_reclaim_scan_free_pages(kbdev, sc); +} + +void kbase_csf_tiler_heap_reclaim_ctx_init(struct kbase_context *kctx) +{ + /* Per-kctx heap_info object initialization */ + memset(&kctx->csf.sched.heap_info, 0, sizeof(struct kbase_csf_ctx_heap_reclaim_info)); + INIT_LIST_HEAD(&kctx->csf.sched.heap_info.mgr_link); +} + +void kbase_csf_tiler_heap_reclaim_mgr_init(struct kbase_device *kbdev) +{ + struct kbase_csf_scheduler *scheduler = &kbdev->csf.scheduler; + struct shrinker *reclaim = &scheduler->reclaim_mgr.heap_reclaim; + u8 prio; + + for (prio = KBASE_QUEUE_GROUP_PRIORITY_REALTIME; prio < KBASE_QUEUE_GROUP_PRIORITY_COUNT; + prio++) + INIT_LIST_HEAD(&scheduler->reclaim_mgr.ctx_lists[prio]); + + atomic_set(&scheduler->reclaim_mgr.unused_pages, 0); + + reclaim->count_objects = kbase_csf_tiler_heap_reclaim_count_objects; + reclaim->scan_objects = kbase_csf_tiler_heap_reclaim_scan_objects; + reclaim->seeks = HEAP_SHRINKER_SEEKS; + reclaim->batch = HEAP_SHRINKER_BATCH; + +#if !defined(CONFIG_MALI_VECTOR_DUMP) +#if KERNEL_VERSION(6, 0, 0) > LINUX_VERSION_CODE + register_shrinker(reclaim); +#else + register_shrinker(reclaim, "mali-csf-tiler-heap"); +#endif +#endif +} + +void kbase_csf_tiler_heap_reclaim_mgr_term(struct kbase_device *kbdev) +{ + struct kbase_csf_scheduler *scheduler = &kbdev->csf.scheduler; + u8 prio; + +#if !defined(CONFIG_MALI_VECTOR_DUMP) + unregister_shrinker(&scheduler->reclaim_mgr.heap_reclaim); +#endif + + for (prio = KBASE_QUEUE_GROUP_PRIORITY_REALTIME; prio < KBASE_QUEUE_GROUP_PRIORITY_COUNT; + prio++) + WARN_ON(!list_empty(&scheduler->reclaim_mgr.ctx_lists[prio])); + + WARN_ON(atomic_read(&scheduler->reclaim_mgr.unused_pages)); +} diff --git a/mali_kbase/csf/mali_kbase_csf_tiler_heap_reclaim.h b/mali_kbase/csf/mali_kbase_csf_tiler_heap_reclaim.h new file mode 100644 index 0000000..b6e580e --- /dev/null +++ b/mali_kbase/csf/mali_kbase_csf_tiler_heap_reclaim.h @@ -0,0 +1,80 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +/* + * + * (C) COPYRIGHT 2022 ARM Limited. All rights reserved. + * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. 
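The registration code above follows the stock Linux shrinker pattern: count_objects returns a cheap estimate, scan_objects does the actual freeing and may return SHRINK_STOP, and register_shrinker() gained a name argument in kernel 6.0, hence the version split. For reference, a minimal self-contained shrinker module using the same wiring is sketched below; it is a generic example with made-up names, not the Mali callbacks, and it assumes a kernel in the 6.0 to 6.6 range where this register_shrinker() form applies.

#include <linux/module.h>
#include <linux/shrinker.h>
#include <linux/atomic.h>

static atomic_long_t demo_unused_pages = ATOMIC_LONG_INIT(0);

static unsigned long demo_count(struct shrinker *s, struct shrink_control *sc)
{
	/* Cheap estimate only; returning 0 tells the core there is nothing to scan. */
	return atomic_long_read(&demo_unused_pages);
}

static unsigned long demo_scan(struct shrinker *s, struct shrink_control *sc)
{
	unsigned long avail = atomic_long_read(&demo_unused_pages);
	unsigned long freed = avail < sc->nr_to_scan ? avail : sc->nr_to_scan;

	/* Real code would release memory here; SHRINK_STOP aborts further scanning. */
	atomic_long_sub(freed, &demo_unused_pages);
	return freed ? freed : SHRINK_STOP;
}

static struct shrinker demo_shrinker = {
	.count_objects = demo_count,
	.scan_objects = demo_scan,
	.seeks = DEFAULT_SEEKS,
};

static int __init demo_shrinker_init(void)
{
	/* Two-argument form since 6.0; older kernels take only the shrinker pointer. */
	return register_shrinker(&demo_shrinker, "demo-shrinker");
}

static void __exit demo_shrinker_exit(void)
{
	unregister_shrinker(&demo_shrinker);
}

module_init(demo_shrinker_init);
module_exit(demo_shrinker_exit);
MODULE_LICENSE("GPL");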
+ * + */ + +#ifndef _KBASE_CSF_TILER_HEAP_RECLAIM_H_ +#define _KBASE_CSF_TILER_HEAP_RECLAIM_H_ + +#include <mali_kbase.h> + +/** + * kbase_csf_tiler_heap_reclaim_sched_notify_grp_active - Notifier function for the scheduler + * to use when a group is put on-slot. + * + * @group: Pointer to the group object that has been placed on-slot for running. + * + */ +void kbase_csf_tiler_heap_reclaim_sched_notify_grp_active(struct kbase_queue_group *group); + +/** + * kbase_csf_tiler_heap_reclaim_sched_notify_grp_evict - Notifier function for the scheduler + * to use when a group is evicted out of the scheduler's scope, i.e. no run of + * the group is possible afterwards. + * + * @group: Pointer to the group object that has been evicted. + * + */ +void kbase_csf_tiler_heap_reclaim_sched_notify_grp_evict(struct kbase_queue_group *group); + +/** + * kbase_csf_tiler_heap_reclaim_sched_notify_grp_suspend - Notifier function for the scheduler + * to use when a group is suspended from running, but could resume in the future. + * + * @group: Pointer to the group object that is in suspended state. + * + */ +void kbase_csf_tiler_heap_reclaim_sched_notify_grp_suspend(struct kbase_queue_group *group); + +/** + * kbase_csf_tiler_heap_reclaim_ctx_init - Initializer for the per-context data fields used + * with the tiler heap reclaim manager. + * + * @kctx: Pointer to the kbase_context. + * + */ +void kbase_csf_tiler_heap_reclaim_ctx_init(struct kbase_context *kctx); + +/** + * kbase_csf_tiler_heap_reclaim_mgr_init - Initializer for the tiler heap reclaim manager. + * + * @kbdev: Pointer to the device. + * + */ +void kbase_csf_tiler_heap_reclaim_mgr_init(struct kbase_device *kbdev); + +/** + * kbase_csf_tiler_heap_reclaim_mgr_term - Termination call for the tiler heap reclaim manager. + * + * @kbdev: Pointer to the device. + * + */ +void kbase_csf_tiler_heap_reclaim_mgr_term(struct kbase_device *kbdev); + +#endif diff --git a/mali_kbase/csf/mali_kbase_csf_timeout.c b/mali_kbase/csf/mali_kbase_csf_timeout.c index ea6c116..f7fcbb1 100644 --- a/mali_kbase/csf/mali_kbase_csf_timeout.c +++ b/mali_kbase/csf/mali_kbase_csf_timeout.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -52,6 +52,7 @@ static int set_timeout(struct kbase_device *const kbdev, u64 const timeout) dev_dbg(kbdev->dev, "New progress timeout: %llu cycles\n", timeout); atomic64_set(&kbdev->csf.progress_timeout, timeout); + kbase_device_set_timeout(kbdev, CSF_SCHED_PROTM_PROGRESS_TIMEOUT, timeout, 1); return 0; } @@ -100,7 +101,7 @@ static ssize_t progress_timeout_store(struct device * const dev, if (!err) { kbase_csf_scheduler_pm_active(kbdev); - err = kbase_csf_scheduler_wait_mcu_active(kbdev); + err = kbase_csf_scheduler_killable_wait_mcu_active(kbdev); if (!err) err = kbase_csf_firmware_set_timeout(kbdev, timeout); @@ -147,8 +148,14 @@ int kbase_csf_timeout_init(struct kbase_device *const kbdev) int err; #if IS_ENABLED(CONFIG_OF) - err = of_property_read_u64(kbdev->dev->of_node, - "progress_timeout", &timeout); + /* Read "progress-timeout" property and fall back to "progress_timeout" + * if not found.
+ */ + err = of_property_read_u64(kbdev->dev->of_node, "progress-timeout", &timeout); + + if (err == -EINVAL) + err = of_property_read_u64(kbdev->dev->of_node, "progress_timeout", &timeout); + if (!err) dev_info(kbdev->dev, "Found progress_timeout = %llu in Devicetree\n", timeout); diff --git a/mali_kbase/csf/mali_kbase_csf_tl_reader.c b/mali_kbase/csf/mali_kbase_csf_tl_reader.c index f40be8f..ce50683 100644 --- a/mali_kbase/csf/mali_kbase_csf_tl_reader.c +++ b/mali_kbase/csf/mali_kbase_csf_tl_reader.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -31,21 +31,14 @@ #include "mali_kbase_pm.h" #include "mali_kbase_hwaccess_time.h" -#include <linux/gcd.h> #include <linux/math64.h> -#include <asm/arch_timer.h> #if IS_ENABLED(CONFIG_DEBUG_FS) #include "tl/mali_kbase_timeline_priv.h" #include <linux/debugfs.h> - -#if (KERNEL_VERSION(4, 7, 0) > LINUX_VERSION_CODE) -#define DEFINE_DEBUGFS_ATTRIBUTE DEFINE_SIMPLE_ATTRIBUTE -#endif +#include <linux/version_compat_defs.h> #endif -/* Name of the CSFFW timeline tracebuffer. */ -#define KBASE_CSFFW_TRACEBUFFER_NAME "timeline" /* Name of the timeline header metatadata */ #define KBASE_CSFFW_TIMELINE_HEADER_NAME "timeline_header" @@ -92,93 +85,15 @@ DEFINE_DEBUGFS_ATTRIBUTE(kbase_csf_tl_poll_interval_fops, kbase_csf_tl_debugfs_poll_interval_read, kbase_csf_tl_debugfs_poll_interval_write, "%llu\n"); - void kbase_csf_tl_reader_debugfs_init(struct kbase_device *kbdev) { debugfs_create_file("csf_tl_poll_interval_in_ms", 0644, kbdev->debugfs_instr_directory, kbdev, &kbase_csf_tl_poll_interval_fops); - } #endif /** - * get_cpu_gpu_time() - Get current CPU and GPU timestamps. - * - * @kbdev: Kbase device. - * @cpu_ts: Output CPU timestamp. - * @gpu_ts: Output GPU timestamp. - * @gpu_cycle: Output GPU cycle counts. - */ -static void get_cpu_gpu_time( - struct kbase_device *kbdev, - u64 *cpu_ts, - u64 *gpu_ts, - u64 *gpu_cycle) -{ - struct timespec64 ts; - - kbase_pm_context_active(kbdev); - kbase_backend_get_gpu_time(kbdev, gpu_cycle, gpu_ts, &ts); - kbase_pm_context_idle(kbdev); - - if (cpu_ts) - *cpu_ts = ts.tv_sec * NSEC_PER_SEC + ts.tv_nsec; -} - - -/** - * kbase_ts_converter_init() - Initialize system timestamp converter. - * - * @self: System Timestamp Converter instance. - * @kbdev: Kbase device pointer - * - * Return: Zero on success, -1 otherwise. - */ -static int kbase_ts_converter_init( - struct kbase_ts_converter *self, - struct kbase_device *kbdev) -{ - u64 cpu_ts = 0; - u64 gpu_ts = 0; - u64 freq; - u64 common_factor; - - get_cpu_gpu_time(kbdev, &cpu_ts, &gpu_ts, NULL); - freq = arch_timer_get_cntfrq(); - - if (!freq) { - dev_warn(kbdev->dev, "arch_timer_get_rate() is zero!"); - return -1; - } - - common_factor = gcd(NSEC_PER_SEC, freq); - - self->multiplier = div64_u64(NSEC_PER_SEC, common_factor); - self->divisor = div64_u64(freq, common_factor); - self->offset = - cpu_ts - div64_u64(gpu_ts * self->multiplier, self->divisor); - - return 0; -} - -/** - * kbase_ts_converter_convert() - Convert GPU timestamp to CPU timestamp. - * - * @self: System Timestamp Converter instance. - * @gpu_ts: System timestamp value to converter. - * - * Return: The CPU timestamp. 
- */ -static void __maybe_unused -kbase_ts_converter_convert(const struct kbase_ts_converter *self, u64 *gpu_ts) -{ - u64 old_gpu_ts = *gpu_ts; - *gpu_ts = div64_u64(old_gpu_ts * self->multiplier, self->divisor) + - self->offset; -} - -/** * tl_reader_overflow_notify() - Emit stream overflow tracepoint. * * @self: CSFFW TL Reader instance. @@ -254,7 +169,6 @@ static void tl_reader_reset(struct kbase_csf_tl_reader *self) self->tl_header.btc = 0; } - int kbase_csf_tl_reader_flush_buffer(struct kbase_csf_tl_reader *self) { int ret = 0; @@ -279,7 +193,6 @@ int kbase_csf_tl_reader_flush_buffer(struct kbase_csf_tl_reader *self) return -EBUSY; } - /* Copying the whole buffer in a single shot. We assume * that the buffer will not contain partially written messages. */ @@ -301,7 +214,7 @@ int kbase_csf_tl_reader_flush_buffer(struct kbase_csf_tl_reader *self) dev_warn( kbdev->dev, "Unable to parse CSFFW tracebuffer event header."); - ret = -EBUSY; + ret = -EBUSY; break; } @@ -322,7 +235,7 @@ int kbase_csf_tl_reader_flush_buffer(struct kbase_csf_tl_reader *self) dev_warn(kbdev->dev, "event_id: %u, can't read with event_size: %u.", event_id, event_size); - ret = -EBUSY; + ret = -EBUSY; break; } @@ -330,8 +243,8 @@ int kbase_csf_tl_reader_flush_buffer(struct kbase_csf_tl_reader *self) { struct kbase_csffw_tl_message *msg = (struct kbase_csffw_tl_message *) csffw_data_it; - kbase_ts_converter_convert(&self->ts_converter, - &msg->timestamp); + msg->timestamp = + kbase_backend_time_convert_gpu_to_cpu(kbdev, msg->timestamp); } /* Copy the message out to the tl_stream. */ @@ -384,16 +297,13 @@ static int tl_reader_init_late( if (self->kbdev) return 0; - tb = kbase_csf_firmware_get_trace_buffer( - kbdev, KBASE_CSFFW_TRACEBUFFER_NAME); + tb = kbase_csf_firmware_get_trace_buffer(kbdev, KBASE_CSFFW_TIMELINE_BUF_NAME); hdr = kbase_csf_firmware_get_timeline_metadata( kbdev, KBASE_CSFFW_TIMELINE_HEADER_NAME, &hdr_size); if (!tb) { - dev_warn( - kbdev->dev, - "'%s' tracebuffer is not present in the firmware image.", - KBASE_CSFFW_TRACEBUFFER_NAME); + dev_warn(kbdev->dev, "'%s' tracebuffer is not present in the firmware image.", + KBASE_CSFFW_TIMELINE_BUF_NAME); return -1; } @@ -405,9 +315,6 @@ static int tl_reader_init_late( return -1; } - if (kbase_ts_converter_init(&self->ts_converter, kbdev)) - return -1; - self->kbdev = kbdev; self->trace_buffer = tb; self->tl_header.data = hdr; diff --git a/mali_kbase/csf/mali_kbase_csf_tl_reader.h b/mali_kbase/csf/mali_kbase_csf_tl_reader.h index d554d56..12b285f 100644 --- a/mali_kbase/csf/mali_kbase_csf_tl_reader.h +++ b/mali_kbase/csf/mali_kbase_csf_tl_reader.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2019-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -40,37 +40,6 @@ struct kbase_tlstream; struct kbase_device; /** - * struct kbase_ts_converter - System timestamp to CPU timestamp converter state. - * - * @multiplier: Numerator of the converter's fraction. - * @divisor: Denominator of the converter's fraction. - * @offset: Converter's offset term. - * - * According to Generic timer spec, system timer: - * - Increments at a fixed frequency - * - Starts operating from zero - * - * Hence CPU time is a linear function of System Time. 
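For reference, the converter being deleted here implements exactly the linear relation spelled out in the comment below (CPU_ts = alpha * SYS_ts + beta, with alpha = NSEC_PER_SEC divided by the counter frequency and the ratio reduced by their GCD to limit rounding error); the helper this diff switches to is kbase_backend_time_convert_gpu_to_cpu(). A standalone userspace model of the arithmetic, with illustrative sample values, might look like this:

#include <stdint.h>
#include <stdio.h>

#define NSEC_PER_SEC 1000000000ULL

static uint64_t gcd_u64(uint64_t a, uint64_t b)
{
	while (b) {
		uint64_t t = a % b;
		a = b;
		b = t;
	}
	return a;
}

struct ts_converter {
	uint64_t multiplier; /* NSEC_PER_SEC / gcd */
	uint64_t divisor;    /* counter freq / gcd */
	int64_t offset;      /* beta = CPU_ts_sample - SYS_ts_sample * alpha */
};

static void converter_init(struct ts_converter *c, uint64_t freq, uint64_t cpu_ts, uint64_t sys_ts)
{
	uint64_t g = gcd_u64(NSEC_PER_SEC, freq);

	c->multiplier = NSEC_PER_SEC / g;
	c->divisor = freq / g;
	c->offset = (int64_t)(cpu_ts - (sys_ts * c->multiplier) / c->divisor);
}

static uint64_t converter_convert(const struct ts_converter *c, uint64_t sys_ts)
{
	return (sys_ts * c->multiplier) / c->divisor + (uint64_t)c->offset;
}

int main(void)
{
	struct ts_converter c;

	/* Illustrative sample: 25 MHz system counter, one paired timestamp sample. */
	converter_init(&c, 25000000ULL, 123456789ULL, 1000ULL);
	printf("%llu\n", (unsigned long long)converter_convert(&c, 2000ULL));
	return 0;
}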
- * - * CPU_ts = alpha * SYS_ts + beta - * - * Where - * - alpha = 10^9/SYS_ts_freq - * - beta is calculated by two timer samples taken at the same time: - * beta = CPU_ts_s - SYS_ts_s * alpha - * - * Since alpha is a rational number, we minimizing possible - * rounding error by simplifying the ratio. Thus alpha is stored - * as a simple `multiplier / divisor` ratio. - * - */ -struct kbase_ts_converter { - u64 multiplier; - u64 divisor; - s64 offset; -}; - -/** * struct kbase_csf_tl_reader - CSFFW timeline reader state. * * @read_timer: Timer used for periodical tracebufer reading. @@ -106,7 +75,6 @@ struct kbase_csf_tl_reader { size_t size; size_t btc; } tl_header; - struct kbase_ts_converter ts_converter; bool got_first_event; bool is_active; diff --git a/mali_kbase/csf/mali_kbase_csf_trace_buffer.c b/mali_kbase/csf/mali_kbase_csf_trace_buffer.c index e90d30d..2b63f19 100644 --- a/mali_kbase/csf/mali_kbase_csf_trace_buffer.c +++ b/mali_kbase/csf/mali_kbase_csf_trace_buffer.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2018-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2018-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -28,12 +28,7 @@ #include <linux/list.h> #include <linux/mman.h> - -#if IS_ENABLED(CONFIG_DEBUG_FS) -#if (KERNEL_VERSION(4, 7, 0) > LINUX_VERSION_CODE) -#define DEFINE_DEBUGFS_ATTRIBUTE DEFINE_SIMPLE_ATTRIBUTE -#endif -#endif +#include <linux/version_compat_defs.h> /** * struct firmware_trace_buffer - Trace Buffer within the MCU firmware @@ -94,7 +89,7 @@ struct firmware_trace_buffer { } cpu_va; u32 num_pages; u32 trace_enable_init_mask[CSF_FIRMWARE_TRACE_ENABLE_INIT_MASK_MAX]; - char name[1]; /* this field must be last */ + char name[]; /* this field must be last */ }; /** @@ -123,11 +118,19 @@ struct firmware_trace_buffer_data { */ static const struct firmware_trace_buffer_data trace_buffer_data[] = { #if MALI_UNIT_TEST - { "fwutf", { 0 }, 1 }, + { KBASE_CSFFW_UTF_BUF_NAME, { 0 }, 1 }, #endif - { FW_TRACE_BUF_NAME, { 0 }, 4 }, - { "benchmark", { 0 }, 2 }, - { "timeline", { 0 }, KBASE_CSF_TL_BUFFER_NR_PAGES }, +#ifdef CONFIG_MALI_PIXEL_GPU_SSCD + /* Enable all the logs */ + { KBASE_CSFFW_LOG_BUF_NAME, { 0xFFFFFFFF }, FW_TRACE_BUF_NR_PAGES }, +#else + { KBASE_CSFFW_LOG_BUF_NAME, { 0 }, FW_TRACE_BUF_NR_PAGES }, +#endif /* CONFIG_MALI_PIXEL_GPU_SSCD */ + { KBASE_CSFFW_BENCHMARK_BUF_NAME, { 0 }, 2 }, + { KBASE_CSFFW_TIMELINE_BUF_NAME, { 0 }, KBASE_CSF_TL_BUFFER_NR_PAGES }, +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) + { KBASE_CSFFW_GPU_METRICS_BUF_NAME, { 0 }, 8 }, +#endif /* CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD */ }; int kbase_csf_firmware_trace_buffers_init(struct kbase_device *kbdev) @@ -265,7 +268,7 @@ int kbase_csf_firmware_parse_trace_buffer_entry(struct kbase_device *kbdev, * trace buffer name (with NULL termination). 
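The allocation change visible just after this comment swaps the old one-element `char name[1]` trailer for a C99 flexible array member allocated with struct_size(), which removes the off-by-one ambiguity and saturates on arithmetic overflow. A small userspace analogue of the pattern (plain malloc, so the overflow protection is only noted in the comment) is sketched below.

#include <stdlib.h>
#include <string.h>

struct named_buffer {
	unsigned int num_pages;
	char name[]; /* flexible array member: must be the last field */
};

static struct named_buffer *named_buffer_alloc(const char *name, unsigned int num_pages)
{
	size_t name_len = strlen(name);
	/* Kernel code uses struct_size(buf, name, name_len + 1) here, which
	 * additionally saturates on arithmetic overflow. */
	struct named_buffer *buf = malloc(sizeof(*buf) + name_len + 1);

	if (!buf)
		return NULL;
	buf->num_pages = num_pages;
	memcpy(buf->name, name, name_len + 1);
	return buf;
}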
*/ trace_buffer = - kmalloc(sizeof(*trace_buffer) + name_len + 1, GFP_KERNEL); + kmalloc(struct_size(trace_buffer, name, name_len + 1), GFP_KERNEL); if (!trace_buffer) return -ENOMEM; @@ -512,10 +515,47 @@ unsigned int kbase_csf_firmware_trace_buffer_read_data( } EXPORT_SYMBOL(kbase_csf_firmware_trace_buffer_read_data); -#if IS_ENABLED(CONFIG_DEBUG_FS) +void kbase_csf_firmware_trace_buffer_discard(struct firmware_trace_buffer *trace_buffer) +{ + unsigned int bytes_discarded; + u32 buffer_size = trace_buffer->num_pages << PAGE_SHIFT; + u32 extract_offset = *(trace_buffer->cpu_va.extract_cpu_va); + u32 insert_offset = *(trace_buffer->cpu_va.insert_cpu_va); + unsigned int trace_size; + + if (insert_offset >= extract_offset) { + trace_size = insert_offset - extract_offset; + if (trace_size > buffer_size / 2) { + bytes_discarded = trace_size - buffer_size / 2; + extract_offset += bytes_discarded; + *(trace_buffer->cpu_va.extract_cpu_va) = extract_offset; + } + } else { + unsigned int bytes_tail; + + bytes_tail = buffer_size - extract_offset; + trace_size = bytes_tail + insert_offset; + if (trace_size > buffer_size / 2) { + bytes_discarded = trace_size - buffer_size / 2; + extract_offset += bytes_discarded; + if (extract_offset >= buffer_size) + extract_offset = extract_offset - buffer_size; + *(trace_buffer->cpu_va.extract_cpu_va) = extract_offset; + } + } +} +EXPORT_SYMBOL(kbase_csf_firmware_trace_buffer_discard); + +static void update_trace_buffer_active_mask64(struct firmware_trace_buffer *tb, u64 mask) +{ + unsigned int i; + + for (i = 0; i < tb->trace_enable_entry_count; i++) + kbasep_csf_firmware_trace_buffer_update_trace_enable_bit(tb, i, (mask >> i) & 1); +} #define U32_BITS 32 -static u64 get_trace_buffer_active_mask64(struct firmware_trace_buffer *tb) +u64 kbase_csf_firmware_trace_buffer_get_active_mask64(struct firmware_trace_buffer *tb) { u64 active_mask = tb->trace_enable_init_mask[0]; @@ -525,18 +565,7 @@ static u64 get_trace_buffer_active_mask64(struct firmware_trace_buffer *tb) return active_mask; } -static void update_trace_buffer_active_mask64(struct firmware_trace_buffer *tb, - u64 mask) -{ - unsigned int i; - - for (i = 0; i < tb->trace_enable_entry_count; i++) - kbasep_csf_firmware_trace_buffer_update_trace_enable_bit( - tb, i, (mask >> i) & 1); -} - -static int set_trace_buffer_active_mask64(struct firmware_trace_buffer *tb, - u64 mask) +int kbase_csf_firmware_trace_buffer_set_active_mask64(struct firmware_trace_buffer *tb, u64 mask) { struct kbase_device *kbdev = tb->kbdev; unsigned long flags; @@ -564,124 +593,3 @@ static int set_trace_buffer_active_mask64(struct firmware_trace_buffer *tb, return err; } - -static int kbase_csf_firmware_trace_enable_mask_read(void *data, u64 *val) -{ - struct kbase_device *kbdev = (struct kbase_device *)data; - struct firmware_trace_buffer *tb = - kbase_csf_firmware_get_trace_buffer(kbdev, FW_TRACE_BUF_NAME); - - if (tb == NULL) { - dev_err(kbdev->dev, "Couldn't get the firmware trace buffer"); - return -EIO; - } - /* The enabled traces limited to u64 here, regarded practical */ - *val = get_trace_buffer_active_mask64(tb); - return 0; -} - -static int kbase_csf_firmware_trace_enable_mask_write(void *data, u64 val) -{ - struct kbase_device *kbdev = (struct kbase_device *)data; - struct firmware_trace_buffer *tb = - kbase_csf_firmware_get_trace_buffer(kbdev, FW_TRACE_BUF_NAME); - u64 new_mask; - unsigned int enable_bits_count; - - if (tb == NULL) { - dev_err(kbdev->dev, "Couldn't get the firmware trace buffer"); - return -EIO; - } - - /* 
Ignore unsupported types */ - enable_bits_count = - kbase_csf_firmware_trace_buffer_get_trace_enable_bits_count(tb); - if (enable_bits_count > 64) { - dev_dbg(kbdev->dev, "Limit enabled bits count from %u to 64", - enable_bits_count); - enable_bits_count = 64; - } - new_mask = val & ((1 << enable_bits_count) - 1); - - if (new_mask != get_trace_buffer_active_mask64(tb)) - return set_trace_buffer_active_mask64(tb, new_mask); - else - return 0; -} - -static int kbasep_csf_firmware_trace_debugfs_open(struct inode *in, - struct file *file) -{ - struct kbase_device *kbdev = in->i_private; - - file->private_data = kbdev; - dev_dbg(kbdev->dev, "Opened firmware trace buffer dump debugfs file"); - - return 0; -} - -static ssize_t kbasep_csf_firmware_trace_debugfs_read(struct file *file, - char __user *buf, size_t size, loff_t *ppos) -{ - struct kbase_device *kbdev = file->private_data; - u8 *pbyte; - unsigned int n_read; - unsigned long not_copied; - /* Limit the kernel buffer to no more than two pages */ - size_t mem = MIN(size, 2 * PAGE_SIZE); - unsigned long flags; - - struct firmware_trace_buffer *tb = - kbase_csf_firmware_get_trace_buffer(kbdev, FW_TRACE_BUF_NAME); - - if (tb == NULL) { - dev_err(kbdev->dev, "Couldn't get the firmware trace buffer"); - return -EIO; - } - - pbyte = kmalloc(mem, GFP_KERNEL); - if (pbyte == NULL) { - dev_err(kbdev->dev, "Couldn't allocate memory for trace buffer dump"); - return -ENOMEM; - } - - spin_lock_irqsave(&kbdev->hwaccess_lock, flags); - n_read = kbase_csf_firmware_trace_buffer_read_data(tb, pbyte, mem); - spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); - - /* Do the copy, if we have obtained some trace data */ - not_copied = (n_read) ? copy_to_user(buf, pbyte, n_read) : 0; - kfree(pbyte); - - if (!not_copied) { - *ppos += n_read; - return n_read; - } - - dev_err(kbdev->dev, "Couldn't copy trace buffer data to user space buffer"); - return -EFAULT; -} - - -DEFINE_SIMPLE_ATTRIBUTE(kbase_csf_firmware_trace_enable_mask_fops, - kbase_csf_firmware_trace_enable_mask_read, - kbase_csf_firmware_trace_enable_mask_write, "%llx\n"); - -static const struct file_operations kbasep_csf_firmware_trace_debugfs_fops = { - .owner = THIS_MODULE, - .open = kbasep_csf_firmware_trace_debugfs_open, - .read = kbasep_csf_firmware_trace_debugfs_read, - .llseek = no_llseek, -}; - -void kbase_csf_firmware_trace_buffer_debugfs_init(struct kbase_device *kbdev) -{ - debugfs_create_file("fw_trace_enable_mask", 0644, - kbdev->mali_debugfs_directory, kbdev, - &kbase_csf_firmware_trace_enable_mask_fops); - - debugfs_create_file("fw_traces", 0444, - kbdev->mali_debugfs_directory, kbdev, - &kbasep_csf_firmware_trace_debugfs_fops); -} -#endif /* CONFIG_DEBUG_FS */ diff --git a/mali_kbase/csf/mali_kbase_csf_trace_buffer.h b/mali_kbase/csf/mali_kbase_csf_trace_buffer.h index 823ace7..c0a42ca 100644 --- a/mali_kbase/csf/mali_kbase_csf_trace_buffer.h +++ b/mali_kbase/csf/mali_kbase_csf_trace_buffer.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2018-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2018-2023 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -25,7 +25,16 @@ #include <linux/types.h> #define CSF_FIRMWARE_TRACE_ENABLE_INIT_MASK_MAX (4) -#define FW_TRACE_BUF_NAME "fwlog" +#define FW_TRACE_BUF_NR_PAGES 4 +#if MALI_UNIT_TEST +#define KBASE_CSFFW_UTF_BUF_NAME "fwutf" +#endif +#define KBASE_CSFFW_LOG_BUF_NAME "fwlog" +#define KBASE_CSFFW_BENCHMARK_BUF_NAME "benchmark" +#define KBASE_CSFFW_TIMELINE_BUF_NAME "timeline" +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) +#define KBASE_CSFFW_GPU_METRICS_BUF_NAME "gpu_metrics" +#endif /* CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD */ /* Forward declarations */ struct firmware_trace_buffer; @@ -58,7 +67,7 @@ struct kbase_device; int kbase_csf_firmware_trace_buffers_init(struct kbase_device *kbdev); /** - * kbase_csf_firmware_trace_buffer_term - Terminate trace buffers + * kbase_csf_firmware_trace_buffers_term - Terminate trace buffers * * @kbdev: Device pointer */ @@ -116,7 +125,8 @@ struct firmware_trace_buffer *kbase_csf_firmware_get_trace_buffer( struct kbase_device *kbdev, const char *name); /** - * kbase_csf_firmware_trace_buffer_get_trace_enable_bits_count - Get number of trace enable bits for a trace buffer + * kbase_csf_firmware_trace_buffer_get_trace_enable_bits_count - Get number of trace enable bits + * for a trace buffer * * @trace_buffer: Trace buffer handle * @@ -165,15 +175,32 @@ bool kbase_csf_firmware_trace_buffer_is_empty( unsigned int kbase_csf_firmware_trace_buffer_read_data( struct firmware_trace_buffer *trace_buffer, u8 *data, unsigned int num_bytes); -#if IS_ENABLED(CONFIG_DEBUG_FS) /** - * kbase_csf_fw_trace_buffer_debugfs_init() - Add debugfs entries for setting - * enable mask and dumping the binary - * firmware trace buffer + * kbase_csf_firmware_trace_buffer_discard - Discard data from a trace buffer * - * @kbdev: Pointer to the device + * @trace_buffer: Trace buffer handle + * + * Discard part of the data in the trace buffer to reduce its utilization to half of its size. + */ +void kbase_csf_firmware_trace_buffer_discard(struct firmware_trace_buffer *trace_buffer); + +/** + * kbase_csf_firmware_trace_buffer_get_active_mask64 - Get trace buffer active mask + * + * @tb: Trace buffer handle + * + * Return: Trace buffer active mask. + */ +u64 kbase_csf_firmware_trace_buffer_get_active_mask64(struct firmware_trace_buffer *tb); + +/** + * kbase_csf_firmware_trace_buffer_set_active_mask64 - Set trace buffer active mask + * + * @tb: Trace buffer handle + * @mask: New active mask + * + * Return: 0 if successful, negative error code on failure. */ -void kbase_csf_firmware_trace_buffer_debugfs_init(struct kbase_device *kbdev); -#endif /* CONFIG_DEBUG_FS */ +int kbase_csf_firmware_trace_buffer_set_active_mask64(struct firmware_trace_buffer *tb, u64 mask); #endif /* _KBASE_CSF_TRACE_BUFFER_H_ */ diff --git a/mali_kbase/csf/mali_kbase_debug_csf_fault.c b/mali_kbase/csf/mali_kbase_debug_csf_fault.c new file mode 100644 index 0000000..185779c --- /dev/null +++ b/mali_kbase/csf/mali_kbase_debug_csf_fault.c @@ -0,0 +1,271 @@ +// SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note +/* + * + * (C) COPYRIGHT 2022 ARM Limited. All rights reserved. + * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. 
+ * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. + * + */ + +#include <mali_kbase.h> + +#if IS_ENABLED(CONFIG_DEBUG_FS) + +/** + * kbasep_fault_occurred - Check if fault occurred. + * + * @kbdev: Device pointer + * + * Return: true if a fault occurred. + */ +static bool kbasep_fault_occurred(struct kbase_device *kbdev) +{ + unsigned long flags; + bool ret; + + spin_lock_irqsave(&kbdev->csf.dof.lock, flags); + ret = (kbdev->csf.dof.error_code != DF_NO_ERROR); + spin_unlock_irqrestore(&kbdev->csf.dof.lock, flags); + + return ret; +} + +void kbase_debug_csf_fault_wait_completion(struct kbase_device *kbdev) +{ + if (likely(!kbase_debug_csf_fault_dump_enabled(kbdev))) { + dev_dbg(kbdev->dev, "No userspace client for dumping exists"); + return; + } + + wait_event(kbdev->csf.dof.dump_wait_wq, kbase_debug_csf_fault_dump_complete(kbdev)); +} +KBASE_EXPORT_TEST_API(kbase_debug_csf_fault_wait_completion); + +/** + * kbase_debug_csf_fault_wakeup - Wake up a waiting user space client. + * + * @kbdev: Kbase device + */ +static void kbase_debug_csf_fault_wakeup(struct kbase_device *kbdev) +{ + wake_up_interruptible(&kbdev->csf.dof.fault_wait_wq); +} + +bool kbase_debug_csf_fault_notify(struct kbase_device *kbdev, + struct kbase_context *kctx, enum dumpfault_error_type error) +{ + unsigned long flags; + + if (likely(!kbase_debug_csf_fault_dump_enabled(kbdev))) + return false; + + if (WARN_ON(error == DF_NO_ERROR)) + return false; + + if (kctx && kbase_ctx_flag(kctx, KCTX_DYING)) { + dev_info(kbdev->dev, "kctx %d_%d is dying when error %d is reported", + kctx->tgid, kctx->id, error); + kctx = NULL; + } + + spin_lock_irqsave(&kbdev->csf.dof.lock, flags); + + /* Only one fault at a time can be processed */ + if (kbdev->csf.dof.error_code) { + dev_info(kbdev->dev, "skip this fault as there's a pending fault"); + goto unlock; + } + + kbdev->csf.dof.kctx_tgid = kctx ? kctx->tgid : 0; + kbdev->csf.dof.kctx_id = kctx ? 
kctx->id : 0; + kbdev->csf.dof.error_code = error; + kbase_debug_csf_fault_wakeup(kbdev); + +unlock: + spin_unlock_irqrestore(&kbdev->csf.dof.lock, flags); + return true; +} + +static ssize_t debug_csf_fault_read(struct file *file, char __user *buffer, size_t size, + loff_t *f_pos) +{ +#define BUF_SIZE 64 + struct kbase_device *kbdev; + unsigned long flags; + int count; + char buf[BUF_SIZE]; + u32 tgid, ctx_id; + enum dumpfault_error_type error_code; + + if (unlikely(!file)) { + pr_warn("%s: file is NULL", __func__); + return -EINVAL; + } + + kbdev = file->private_data; + if (unlikely(!buffer)) { + dev_warn(kbdev->dev, "%s: buffer is NULL", __func__); + return -EINVAL; + } + + if (unlikely(*f_pos < 0)) { + dev_warn(kbdev->dev, "%s: f_pos is negative", __func__); + return -EINVAL; + } + + if (size < sizeof(buf)) { + dev_warn(kbdev->dev, "%s: buffer is too small", __func__); + return -EINVAL; + } + + if (wait_event_interruptible(kbdev->csf.dof.fault_wait_wq, kbasep_fault_occurred(kbdev))) + return -ERESTARTSYS; + + spin_lock_irqsave(&kbdev->csf.dof.lock, flags); + tgid = kbdev->csf.dof.kctx_tgid; + ctx_id = kbdev->csf.dof.kctx_id; + error_code = kbdev->csf.dof.error_code; + BUILD_BUG_ON(sizeof(buf) < (sizeof(tgid) + sizeof(ctx_id) + sizeof(error_code))); + count = scnprintf(buf, sizeof(buf), "%u_%u_%u\n", tgid, ctx_id, error_code); + spin_unlock_irqrestore(&kbdev->csf.dof.lock, flags); + + dev_info(kbdev->dev, "debug csf fault info read"); + return simple_read_from_buffer(buffer, size, f_pos, buf, count); +} + +static int debug_csf_fault_open(struct inode *in, struct file *file) +{ + struct kbase_device *kbdev; + + if (unlikely(!in)) { + pr_warn("%s: inode is NULL", __func__); + return -EINVAL; + } + + kbdev = in->i_private; + if (unlikely(!file)) { + dev_warn(kbdev->dev, "%s: file is NULL", __func__); + return -EINVAL; + } + + if (atomic_cmpxchg(&kbdev->csf.dof.enabled, 0, 1) == 1) { + dev_warn(kbdev->dev, "Only one client is allowed for dump on fault"); + return -EBUSY; + } + + dev_info(kbdev->dev, "debug csf fault file open"); + + return simple_open(in, file); +} + +static ssize_t debug_csf_fault_write(struct file *file, const char __user *ubuf, size_t count, + loff_t *ppos) +{ + struct kbase_device *kbdev; + unsigned long flags; + + if (unlikely(!file)) { + pr_warn("%s: file is NULL", __func__); + return -EINVAL; + } + + kbdev = file->private_data; + spin_lock_irqsave(&kbdev->csf.dof.lock, flags); + kbdev->csf.dof.error_code = DF_NO_ERROR; + kbdev->csf.dof.kctx_tgid = 0; + kbdev->csf.dof.kctx_id = 0; + dev_info(kbdev->dev, "debug csf fault dump complete"); + spin_unlock_irqrestore(&kbdev->csf.dof.lock, flags); + + /* User space finished the dump. + * Wake up blocked kernel threads to proceed. + */ + wake_up(&kbdev->csf.dof.dump_wait_wq); + + return count; +} + +static int debug_csf_fault_release(struct inode *in, struct file *file) +{ + struct kbase_device *kbdev; + unsigned long flags; + + if (unlikely(!in)) { + pr_warn("%s: inode is NULL", __func__); + return -EINVAL; + } + + kbdev = in->i_private; + spin_lock_irqsave(&kbdev->csf.dof.lock, flags); + kbdev->csf.dof.kctx_tgid = 0; + kbdev->csf.dof.kctx_id = 0; + kbdev->csf.dof.error_code = DF_NO_ERROR; + spin_unlock_irqrestore(&kbdev->csf.dof.lock, flags); + + atomic_set(&kbdev->csf.dof.enabled, 0); + dev_info(kbdev->dev, "debug csf fault file close"); + + /* User space closed the debugfs file. + * Wake up blocked kernel threads to resume. 
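Taken together, the debugfs handlers in this new file define a simple handshake: a single client opens the node, a blocking read returns a "tgid_ctxid_errorcode" string once a fault is recorded, and any subsequent write acknowledges it so the waiting kernel threads can continue. A hypothetical user-space client illustrating that flow is sketched below; the debugfs mount point and per-device directory name are assumptions, not something this patch defines.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	/* Assumed path: the mali0 debugfs directory name is device-specific and may differ. */
	const char *node = "/sys/kernel/debug/mali0/csf_fault";
	char info[128] = { 0 };
	unsigned int tgid, ctx_id, error_code;
	int fd = open(node, O_RDWR);

	if (fd < 0)
		return 1;

	/* Blocks until the driver records a fault, then reports it. */
	if (read(fd, info, sizeof(info)) > 0 &&
	    sscanf(info, "%u_%u_%u", &tgid, &ctx_id, &error_code) == 3)
		printf("fault: tgid=%u ctx=%u error=%u\n", tgid, ctx_id, error_code);

	/* ... collect whatever dump state is needed here, then acknowledge ... */
	write(fd, "done", 4);

	close(fd);
	return 0;
}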
+ */ + wake_up(&kbdev->csf.dof.dump_wait_wq); + + return 0; +} + +static const struct file_operations kbasep_debug_csf_fault_fops = { + .owner = THIS_MODULE, + .open = debug_csf_fault_open, + .read = debug_csf_fault_read, + .write = debug_csf_fault_write, + .llseek = default_llseek, + .release = debug_csf_fault_release, +}; + +void kbase_debug_csf_fault_debugfs_init(struct kbase_device *kbdev) +{ + const char *fname = "csf_fault"; + + if (unlikely(!kbdev)) { + pr_warn("%s: kbdev is NULL", __func__); + return; + } + + debugfs_create_file(fname, 0600, kbdev->mali_debugfs_directory, kbdev, + &kbasep_debug_csf_fault_fops); +} + +int kbase_debug_csf_fault_init(struct kbase_device *kbdev) +{ + if (unlikely(!kbdev)) { + pr_warn("%s: kbdev is NULL", __func__); + return -EINVAL; + } + + init_waitqueue_head(&(kbdev->csf.dof.fault_wait_wq)); + init_waitqueue_head(&(kbdev->csf.dof.dump_wait_wq)); + spin_lock_init(&kbdev->csf.dof.lock); + kbdev->csf.dof.kctx_tgid = 0; + kbdev->csf.dof.kctx_id = 0; + kbdev->csf.dof.error_code = DF_NO_ERROR; + atomic_set(&kbdev->csf.dof.enabled, 0); + + return 0; +} + +void kbase_debug_csf_fault_term(struct kbase_device *kbdev) +{ +} +#endif /* CONFIG_DEBUG_FS */ diff --git a/mali_kbase/csf/mali_kbase_debug_csf_fault.h b/mali_kbase/csf/mali_kbase_debug_csf_fault.h new file mode 100644 index 0000000..6e9b1a9 --- /dev/null +++ b/mali_kbase/csf/mali_kbase_debug_csf_fault.h @@ -0,0 +1,137 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +/* + * + * (C) COPYRIGHT 2022 ARM Limited. All rights reserved. + * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. + * + */ + +#ifndef _KBASE_DEBUG_CSF_FAULT_H +#define _KBASE_DEBUG_CSF_FAULT_H + +#if IS_ENABLED(CONFIG_DEBUG_FS) +/** + * kbase_debug_csf_fault_debugfs_init - Initialize CSF fault debugfs + * @kbdev: Device pointer + */ +void kbase_debug_csf_fault_debugfs_init(struct kbase_device *kbdev); + +/** + * kbase_debug_csf_fault_init - Create the fault event wait queue per device + * and initialize the required resources. + * @kbdev: Device pointer + * + * Return: Zero on success or a negative error code. + */ +int kbase_debug_csf_fault_init(struct kbase_device *kbdev); + +/** + * kbase_debug_csf_fault_term - Clean up resources created by + * @kbase_debug_csf_fault_init. + * @kbdev: Device pointer + */ +void kbase_debug_csf_fault_term(struct kbase_device *kbdev); + +/** + * kbase_debug_csf_fault_wait_completion - Wait for the client to complete. + * + * @kbdev: Device Pointer + * + * Wait for the user space client to finish reading the fault information. + * This function must be called in thread context. + */ +void kbase_debug_csf_fault_wait_completion(struct kbase_device *kbdev); + +/** + * kbase_debug_csf_fault_notify - Notify client of a fault. + * + * @kbdev: Device pointer + * @kctx: Faulty context (can be NULL) + * @error: Error code. 
+ * + * Store fault information and wake up the user space client. + * + * Return: true if a dump on fault was initiated or was is in progress and + * so caller can opt to wait for the dumping to complete. + */ +bool kbase_debug_csf_fault_notify(struct kbase_device *kbdev, + struct kbase_context *kctx, enum dumpfault_error_type error); + +/** + * kbase_debug_csf_fault_dump_enabled - Check if dump on fault is enabled. + * + * @kbdev: Device pointer + * + * Return: true if debugfs file is opened so dump on fault is enabled. + */ +static inline bool kbase_debug_csf_fault_dump_enabled(struct kbase_device *kbdev) +{ + return atomic_read(&kbdev->csf.dof.enabled); +} + +/** + * kbase_debug_csf_fault_dump_complete - Check if dump on fault is completed. + * + * @kbdev: Device pointer + * + * Return: true if dump on fault completes or file is closed. + */ +static inline bool kbase_debug_csf_fault_dump_complete(struct kbase_device *kbdev) +{ + unsigned long flags; + bool ret; + + if (likely(!kbase_debug_csf_fault_dump_enabled(kbdev))) + return true; + + spin_lock_irqsave(&kbdev->csf.dof.lock, flags); + ret = (kbdev->csf.dof.error_code == DF_NO_ERROR); + spin_unlock_irqrestore(&kbdev->csf.dof.lock, flags); + + return ret; +} +#else /* CONFIG_DEBUG_FS */ +static inline int kbase_debug_csf_fault_init(struct kbase_device *kbdev) +{ + return 0; +} + +static inline void kbase_debug_csf_fault_term(struct kbase_device *kbdev) +{ +} + +static inline void kbase_debug_csf_fault_wait_completion(struct kbase_device *kbdev) +{ +} + +static inline bool kbase_debug_csf_fault_notify(struct kbase_device *kbdev, + struct kbase_context *kctx, enum dumpfault_error_type error) +{ + return false; +} + +static inline bool kbase_debug_csf_fault_dump_enabled(struct kbase_device *kbdev) +{ + return false; +} + +static inline bool kbase_debug_csf_fault_dump_complete(struct kbase_device *kbdev) +{ + return true; +} +#endif /* CONFIG_DEBUG_FS */ + +#endif /*_KBASE_DEBUG_CSF_FAULT_H*/ diff --git a/mali_kbase/debug/Kbuild b/mali_kbase/debug/Kbuild index 1682c0f..8beee2d 100644 --- a/mali_kbase/debug/Kbuild +++ b/mali_kbase/debug/Kbuild @@ -1,6 +1,6 @@ # SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note # -# (C) COPYRIGHT 2021 ARM Limited. All rights reserved. +# (C) COPYRIGHT 2021-2022 ARM Limited. All rights reserved. # # This program is free software and is provided to you under the terms of the # GNU General Public License version 2 as published by the Free Software @@ -22,6 +22,7 @@ mali_kbase-y += debug/mali_kbase_debug_ktrace.o ifeq ($(CONFIG_MALI_CSF_SUPPORT),y) mali_kbase-y += debug/backend/mali_kbase_debug_ktrace_csf.o + mali_kbase-$(CONFIG_MALI_CORESIGHT) += debug/backend/mali_kbase_debug_coresight_csf.o else mali_kbase-y += debug/backend/mali_kbase_debug_ktrace_jm.o endif diff --git a/mali_kbase/debug/backend/mali_kbase_debug_coresight_csf.c b/mali_kbase/debug/backend/mali_kbase_debug_coresight_csf.c new file mode 100644 index 0000000..ff5f947 --- /dev/null +++ b/mali_kbase/debug/backend/mali_kbase_debug_coresight_csf.c @@ -0,0 +1,851 @@ +// SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note +/* + * + * (C) COPYRIGHT 2022 ARM Limited. All rights reserved. + * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. 
+ * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. + * + */ + +#include <mali_kbase.h> +#include <linux/slab.h> +#include <csf/mali_kbase_csf_registers.h> +#include <csf/mali_kbase_csf_firmware.h> +#include <backend/gpu/mali_kbase_pm_internal.h> +#include <linux/mali_kbase_debug_coresight_csf.h> +#include <debug/backend/mali_kbase_debug_coresight_internal_csf.h> + +static const char *coresight_state_to_string(enum kbase_debug_coresight_csf_state state) +{ + switch (state) { + case KBASE_DEBUG_CORESIGHT_CSF_DISABLED: + return "DISABLED"; + case KBASE_DEBUG_CORESIGHT_CSF_ENABLED: + return "ENABLED"; + default: + break; + } + + return "UNKNOWN"; +} + +static bool validate_reg_addr(struct kbase_debug_coresight_csf_client *client, + struct kbase_device *kbdev, u32 reg_addr, u8 op_type) +{ + int i; + + if (reg_addr & 0x3) { + dev_err(kbdev->dev, "Invalid operation %d: reg_addr (0x%x) not 32bit aligned", + op_type, reg_addr); + return false; + } + + for (i = 0; i < client->nr_ranges; i++) { + struct kbase_debug_coresight_csf_address_range *range = &client->addr_ranges[i]; + + if ((range->start <= reg_addr) && (reg_addr <= range->end)) + return true; + } + + dev_err(kbdev->dev, "Invalid operation %d: reg_addr (0x%x) not in client range", op_type, + reg_addr); + + return false; +} + +static bool validate_op(struct kbase_debug_coresight_csf_client *client, + struct kbase_debug_coresight_csf_op *op) +{ + struct kbase_device *kbdev; + u32 reg; + + if (!op) + return false; + + if (!client) + return false; + + kbdev = (struct kbase_device *)client->drv_data; + + switch (op->type) { + case KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_NOP: + return true; + case KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_WRITE_IMM: + if (validate_reg_addr(client, kbdev, op->op.write_imm.reg_addr, op->type)) + return true; + + break; + case KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_WRITE_IMM_RANGE: + for (reg = op->op.write_imm_range.reg_start; reg <= op->op.write_imm_range.reg_end; + reg += sizeof(u32)) { + if (!validate_reg_addr(client, kbdev, reg, op->type)) + return false; + } + + return true; + case KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_WRITE: + if (!op->op.write.ptr) { + dev_err(kbdev->dev, "Invalid operation %d: ptr not set", op->type); + break; + } + + if (validate_reg_addr(client, kbdev, op->op.write.reg_addr, op->type)) + return true; + + break; + case KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_READ: + if (!op->op.read.ptr) { + dev_err(kbdev->dev, "Invalid operation %d: ptr not set", op->type); + break; + } + + if (validate_reg_addr(client, kbdev, op->op.read.reg_addr, op->type)) + return true; + + break; + case KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_POLL: + if (validate_reg_addr(client, kbdev, op->op.poll.reg_addr, op->type)) + return true; + + break; + case KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_BIT_AND: + fallthrough; + case KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_BIT_OR: + fallthrough; + case KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_BIT_XOR: + fallthrough; + case KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_BIT_NOT: + if (op->op.bitw.ptr != NULL) + return true; + + dev_err(kbdev->dev, "Invalid bitwise operation pointer"); + + break; + default: + dev_err(kbdev->dev, "Invalid operation %d", op->type); + 
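validate_reg_addr() and validate_op() above only accept register operations whose addresses are 32-bit aligned and fall inside one of the address ranges the client registered, and WRITE/READ/bitwise operations must also carry a non-NULL pointer. A minimal, illustrative client-side sketch of a sequence that would pass these checks follows; the structure and field names are the ones dereferenced by validate_op()/execute_op() in this file, while the register addresses and values are placeholders rather than real CoreSight offsets.

#include <linux/mali_kbase_debug_coresight_csf.h>

/* Placeholder trace-unit window; must cover every reg_addr used below. */
static struct kbase_debug_coresight_csf_address_range my_ranges[] = {
	{ .start = 0xE0000000, .end = 0xE0000FFC },
};

static u32 saved_ctrl; /* filled in by the READ op when the sequence runs */

static struct kbase_debug_coresight_csf_op my_enable_ops[] = {
	{
		.type = KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_READ,
		.op.read = { .reg_addr = 0xE0000010, .ptr = &saved_ctrl },
	},
	{
		/* reg_addr must be 4-byte aligned and inside my_ranges[],
		 * otherwise validate_reg_addr() rejects the whole sequence.
		 */
		.type = KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_WRITE_IMM,
		.op.write_imm = { .reg_addr = 0xE0000010, .val = 0x1 },
	},
};

static struct kbase_debug_coresight_csf_sequence my_enable_seq = {
	.ops = my_enable_ops,
	.nr_ops = ARRAY_SIZE(my_enable_ops),
};

/* client = kbase_debug_coresight_csf_register(kbdev, my_ranges,
 *                                             ARRAY_SIZE(my_ranges));
 * config = kbase_debug_coresight_csf_config_create(client, &my_enable_seq,
 *                                                  NULL);
 */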
break; + } + + return false; +} + +static bool validate_seq(struct kbase_debug_coresight_csf_client *client, + struct kbase_debug_coresight_csf_sequence *seq) +{ + struct kbase_debug_coresight_csf_op *ops = seq->ops; + int nr_ops = seq->nr_ops; + int i; + + for (i = 0; i < nr_ops; i++) { + if (!validate_op(client, &ops[i])) + return false; + } + + return true; +} + +static int execute_op(struct kbase_device *kbdev, struct kbase_debug_coresight_csf_op *op) +{ + int result = -EINVAL; + u32 reg; + + dev_dbg(kbdev->dev, "Execute operation %d", op->type); + + switch (op->type) { + case KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_NOP: + result = 0; + break; + case KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_WRITE_IMM: + result = kbase_csf_firmware_mcu_register_write(kbdev, op->op.write.reg_addr, + op->op.write_imm.val); + break; + case KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_WRITE_IMM_RANGE: + for (reg = op->op.write_imm_range.reg_start; reg <= op->op.write_imm_range.reg_end; + reg += sizeof(u32)) { + result = kbase_csf_firmware_mcu_register_write(kbdev, reg, + op->op.write_imm_range.val); + if (!result) + break; + } + break; + case KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_WRITE: + result = kbase_csf_firmware_mcu_register_write(kbdev, op->op.write.reg_addr, + *op->op.write.ptr); + break; + case KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_READ: + result = kbase_csf_firmware_mcu_register_read(kbdev, op->op.read.reg_addr, + op->op.read.ptr); + break; + case KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_POLL: + result = kbase_csf_firmware_mcu_register_poll(kbdev, op->op.poll.reg_addr, + op->op.poll.mask, op->op.poll.val); + break; + case KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_BIT_AND: + *op->op.bitw.ptr &= op->op.bitw.val; + result = 0; + break; + case KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_BIT_OR: + *op->op.bitw.ptr |= op->op.bitw.val; + result = 0; + break; + case KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_BIT_XOR: + *op->op.bitw.ptr ^= op->op.bitw.val; + result = 0; + break; + case KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_BIT_NOT: + *op->op.bitw.ptr = ~(*op->op.bitw.ptr); + result = 0; + break; + default: + dev_err(kbdev->dev, "Invalid operation %d", op->type); + break; + } + + return result; +} + +static int coresight_config_enable(struct kbase_device *kbdev, + struct kbase_debug_coresight_csf_config *config) +{ + int ret = 0; + int i; + + if (!config) + return -EINVAL; + + if (config->state == KBASE_DEBUG_CORESIGHT_CSF_ENABLED) + return ret; + + for (i = 0; config->enable_seq && !ret && i < config->enable_seq->nr_ops; i++) + ret = execute_op(kbdev, &config->enable_seq->ops[i]); + + if (!ret) { + dev_dbg(kbdev->dev, "Coresight config (0x%pK) state transition: %s to %s", config, + coresight_state_to_string(config->state), + coresight_state_to_string(KBASE_DEBUG_CORESIGHT_CSF_ENABLED)); + config->state = KBASE_DEBUG_CORESIGHT_CSF_ENABLED; + } + + /* Always assign the return code during config enable. + * It gets propagated when calling config disable. 
+ */ + config->error = ret; + + return ret; +} + +static int coresight_config_disable(struct kbase_device *kbdev, + struct kbase_debug_coresight_csf_config *config) +{ + int ret = 0; + int i; + + if (!config) + return -EINVAL; + + if (config->state == KBASE_DEBUG_CORESIGHT_CSF_DISABLED) + return ret; + + for (i = 0; config->disable_seq && !ret && i < config->disable_seq->nr_ops; i++) + ret = execute_op(kbdev, &config->disable_seq->ops[i]); + + if (!ret) { + dev_dbg(kbdev->dev, "Coresight config (0x%pK) state transition: %s to %s", config, + coresight_state_to_string(config->state), + coresight_state_to_string(KBASE_DEBUG_CORESIGHT_CSF_DISABLED)); + config->state = KBASE_DEBUG_CORESIGHT_CSF_DISABLED; + } else { + /* Only assign the error if ret is not 0. + * As we don't want to overwrite an error from config enable + */ + if (!config->error) + config->error = ret; + } + + return ret; +} + +void *kbase_debug_coresight_csf_register(void *drv_data, + struct kbase_debug_coresight_csf_address_range *ranges, + int nr_ranges) +{ + struct kbase_debug_coresight_csf_client *client, *client_entry; + struct kbase_device *kbdev; + unsigned long flags; + int k; + + if (unlikely(!drv_data)) { + pr_err("NULL drv_data"); + return NULL; + } + + kbdev = (struct kbase_device *)drv_data; + + if (unlikely(!ranges)) { + dev_err(kbdev->dev, "NULL ranges"); + return NULL; + } + + if (unlikely(!nr_ranges)) { + dev_err(kbdev->dev, "nr_ranges is 0"); + return NULL; + } + + for (k = 0; k < nr_ranges; k++) { + if (ranges[k].end < ranges[k].start) { + dev_err(kbdev->dev, "Invalid address ranges 0x%08x - 0x%08x", + ranges[k].start, ranges[k].end); + return NULL; + } + } + + client = kzalloc(sizeof(struct kbase_debug_coresight_csf_client), GFP_KERNEL); + + if (!client) + return NULL; + + spin_lock_irqsave(&kbdev->csf.coresight.lock, flags); + list_for_each_entry(client_entry, &kbdev->csf.coresight.clients, link) { + struct kbase_debug_coresight_csf_address_range *client_ranges = + client_entry->addr_ranges; + int i; + + for (i = 0; i < client_entry->nr_ranges; i++) { + int j; + + for (j = 0; j < nr_ranges; j++) { + if ((ranges[j].start < client_ranges[i].end) && + (client_ranges[i].start < ranges[j].end)) { + spin_unlock_irqrestore(&kbdev->csf.coresight.lock, flags); + kfree(client); + dev_err(kbdev->dev, + "Client with range 0x%08x - 0x%08x already present at address range 0x%08x - 0x%08x", + client_ranges[i].start, client_ranges[i].end, + ranges[j].start, ranges[j].end); + + return NULL; + } + } + } + } + + client->drv_data = drv_data; + client->addr_ranges = ranges; + client->nr_ranges = nr_ranges; + list_add(&client->link, &kbdev->csf.coresight.clients); + spin_unlock_irqrestore(&kbdev->csf.coresight.lock, flags); + + return client; +} +EXPORT_SYMBOL(kbase_debug_coresight_csf_register); + +void kbase_debug_coresight_csf_unregister(void *client_data) +{ + struct kbase_debug_coresight_csf_client *client; + struct kbase_debug_coresight_csf_config *config_entry; + struct kbase_device *kbdev; + unsigned long flags; + bool retry = true; + + if (unlikely(!client_data)) { + pr_err("NULL client"); + return; + } + + client = (struct kbase_debug_coresight_csf_client *)client_data; + + kbdev = (struct kbase_device *)client->drv_data; + if (unlikely(!kbdev)) { + pr_err("NULL drv_data in client"); + return; + } + + /* check for active config from client */ + spin_lock_irqsave(&kbdev->csf.coresight.lock, flags); + list_del_init(&client->link); + + while (retry && !list_empty(&kbdev->csf.coresight.configs)) { + retry = false; + 
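The loop here frees any configuration still owned by the departing client, but kbase_debug_coresight_csf_config_free() can sleep (it runs the disable sequence), so it cannot be called with the coresight spinlock held. The code therefore drops the lock around each free and rescans from the head of the list, because the iterator is stale once the lock has been released. The same logic, extracted into a self-contained helper purely for illustration (the helper name is not part of this patch), looks like:

static void release_client_configs(spinlock_t *lock, struct list_head *configs,
				   void *client)
{
	struct kbase_debug_coresight_csf_config *entry;
	unsigned long flags;
	bool again = true;

	while (again) {
		again = false;
		spin_lock_irqsave(lock, flags);
		list_for_each_entry(entry, configs, link) {
			if (entry->client == client) {
				spin_unlock_irqrestore(lock, flags);
				/* May sleep and unlinks 'entry' from 'configs'. */
				kbase_debug_coresight_csf_config_free(entry);
				spin_lock_irqsave(lock, flags);
				again = true;
				break;	/* iterator is stale; rescan from head */
			}
		}
		spin_unlock_irqrestore(lock, flags);
	}
}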
list_for_each_entry(config_entry, &kbdev->csf.coresight.configs, link) { + if (config_entry->client == client) { + spin_unlock_irqrestore(&kbdev->csf.coresight.lock, flags); + kbase_debug_coresight_csf_config_free(config_entry); + spin_lock_irqsave(&kbdev->csf.coresight.lock, flags); + retry = true; + break; + } + } + } + spin_unlock_irqrestore(&kbdev->csf.coresight.lock, flags); + + kfree(client); +} +EXPORT_SYMBOL(kbase_debug_coresight_csf_unregister); + +void * +kbase_debug_coresight_csf_config_create(void *client_data, + struct kbase_debug_coresight_csf_sequence *enable_seq, + struct kbase_debug_coresight_csf_sequence *disable_seq) +{ + struct kbase_debug_coresight_csf_client *client; + struct kbase_debug_coresight_csf_config *config; + struct kbase_device *kbdev; + + if (unlikely(!client_data)) { + pr_err("NULL client"); + return NULL; + } + + client = (struct kbase_debug_coresight_csf_client *)client_data; + + kbdev = (struct kbase_device *)client->drv_data; + if (unlikely(!kbdev)) { + pr_err("NULL drv_data in client"); + return NULL; + } + + if (enable_seq) { + if (!validate_seq(client, enable_seq)) { + dev_err(kbdev->dev, "Invalid enable_seq"); + return NULL; + } + } + + if (disable_seq) { + if (!validate_seq(client, disable_seq)) { + dev_err(kbdev->dev, "Invalid disable_seq"); + return NULL; + } + } + + config = kzalloc(sizeof(struct kbase_debug_coresight_csf_config), GFP_KERNEL); + if (WARN_ON(!client)) + return NULL; + + config->client = client; + config->enable_seq = enable_seq; + config->disable_seq = disable_seq; + config->error = 0; + config->state = KBASE_DEBUG_CORESIGHT_CSF_DISABLED; + + INIT_LIST_HEAD(&config->link); + + return config; +} +EXPORT_SYMBOL(kbase_debug_coresight_csf_config_create); + +void kbase_debug_coresight_csf_config_free(void *config_data) +{ + struct kbase_debug_coresight_csf_config *config; + + if (unlikely(!config_data)) { + pr_err("NULL config"); + return; + } + + config = (struct kbase_debug_coresight_csf_config *)config_data; + + kbase_debug_coresight_csf_config_disable(config); + + kfree(config); +} +EXPORT_SYMBOL(kbase_debug_coresight_csf_config_free); + +int kbase_debug_coresight_csf_config_enable(void *config_data) +{ + struct kbase_debug_coresight_csf_config *config; + struct kbase_debug_coresight_csf_client *client; + struct kbase_device *kbdev; + struct kbase_debug_coresight_csf_config *config_entry; + unsigned long flags; + int ret = 0; + + if (unlikely(!config_data)) { + pr_err("NULL config"); + return -EINVAL; + } + + config = (struct kbase_debug_coresight_csf_config *)config_data; + client = (struct kbase_debug_coresight_csf_client *)config->client; + + if (unlikely(!client)) { + pr_err("NULL client in config"); + return -EINVAL; + } + + kbdev = (struct kbase_device *)client->drv_data; + if (unlikely(!kbdev)) { + pr_err("NULL drv_data in client"); + return -EINVAL; + } + + /* Check to prevent double entry of config */ + spin_lock_irqsave(&kbdev->csf.coresight.lock, flags); + list_for_each_entry(config_entry, &kbdev->csf.coresight.configs, link) { + if (config_entry == config) { + spin_unlock_irqrestore(&kbdev->csf.coresight.lock, flags); + dev_err(kbdev->dev, "Config already enabled"); + return -EINVAL; + } + } + spin_unlock_irqrestore(&kbdev->csf.coresight.lock, flags); + + kbase_csf_scheduler_lock(kbdev); + kbase_csf_scheduler_spin_lock(kbdev, &flags); + + /* Check the state of Scheduler to confirm the desired state of MCU */ + if (((kbdev->csf.scheduler.state != SCHED_SUSPENDED) && + (kbdev->csf.scheduler.state != SCHED_SLEEPING) && 
+ !kbase_csf_scheduler_protected_mode_in_use(kbdev)) || + kbase_pm_get_policy(kbdev) == &kbase_pm_always_on_policy_ops) { + kbase_csf_scheduler_spin_unlock(kbdev, flags); + /* Wait for MCU to reach the stable ON state */ + ret = kbase_pm_wait_for_desired_state(kbdev); + + if (ret) + dev_err(kbdev->dev, + "Wait for PM state failed when enabling coresight config"); + else + ret = coresight_config_enable(kbdev, config); + + kbase_csf_scheduler_spin_lock(kbdev, &flags); + } + + /* Add config to next enable sequence */ + if (!ret) { + spin_lock(&kbdev->csf.coresight.lock); + list_add(&config->link, &kbdev->csf.coresight.configs); + spin_unlock(&kbdev->csf.coresight.lock); + } + + kbase_csf_scheduler_spin_unlock(kbdev, flags); + kbase_csf_scheduler_unlock(kbdev); + + return ret; +} +EXPORT_SYMBOL(kbase_debug_coresight_csf_config_enable); + +int kbase_debug_coresight_csf_config_disable(void *config_data) +{ + struct kbase_debug_coresight_csf_config *config; + struct kbase_debug_coresight_csf_client *client; + struct kbase_device *kbdev; + struct kbase_debug_coresight_csf_config *config_entry; + bool found_in_list = false; + unsigned long flags; + int ret = 0; + + if (unlikely(!config_data)) { + pr_err("NULL config"); + return -EINVAL; + } + + config = (struct kbase_debug_coresight_csf_config *)config_data; + + /* Exit early if not enabled prior */ + if (list_empty(&config->link)) + return ret; + + client = (struct kbase_debug_coresight_csf_client *)config->client; + + if (unlikely(!client)) { + pr_err("NULL client in config"); + return -EINVAL; + } + + kbdev = (struct kbase_device *)client->drv_data; + if (unlikely(!kbdev)) { + pr_err("NULL drv_data in client"); + return -EINVAL; + } + + /* Check if the config is in the correct list */ + spin_lock_irqsave(&kbdev->csf.coresight.lock, flags); + list_for_each_entry(config_entry, &kbdev->csf.coresight.configs, link) { + if (config_entry == config) { + found_in_list = true; + break; + } + } + spin_unlock_irqrestore(&kbdev->csf.coresight.lock, flags); + + if (!found_in_list) { + dev_err(kbdev->dev, "Config looks corrupted"); + return -EINVAL; + } + + kbase_csf_scheduler_lock(kbdev); + kbase_csf_scheduler_spin_lock(kbdev, &flags); + + /* Check the state of Scheduler to confirm the desired state of MCU */ + if (((kbdev->csf.scheduler.state != SCHED_SUSPENDED) && + (kbdev->csf.scheduler.state != SCHED_SLEEPING) && + !kbase_csf_scheduler_protected_mode_in_use(kbdev)) || + kbase_pm_get_policy(kbdev) == &kbase_pm_always_on_policy_ops) { + kbase_csf_scheduler_spin_unlock(kbdev, flags); + /* Wait for MCU to reach the stable ON state */ + ret = kbase_pm_wait_for_desired_state(kbdev); + + if (ret) + dev_err(kbdev->dev, + "Wait for PM state failed when disabling coresight config"); + else + ret = coresight_config_disable(kbdev, config); + + kbase_csf_scheduler_spin_lock(kbdev, &flags); + } else if (kbdev->pm.backend.mcu_state == KBASE_MCU_OFF) { + /* MCU is OFF, so the disable sequence was already executed. + * + * Propagate any error that would have occurred during the enable + * or disable sequence. + * + * This is done as part of the disable sequence, since the call from + * client is synchronous. 
+ */ + ret = config->error; + } + + /* Remove config from next disable sequence */ + spin_lock(&kbdev->csf.coresight.lock); + list_del_init(&config->link); + spin_unlock(&kbdev->csf.coresight.lock); + + kbase_csf_scheduler_spin_unlock(kbdev, flags); + kbase_csf_scheduler_unlock(kbdev); + + return ret; +} +EXPORT_SYMBOL(kbase_debug_coresight_csf_config_disable); + +static void coresight_config_enable_all(struct work_struct *data) +{ + struct kbase_device *kbdev = + container_of(data, struct kbase_device, csf.coresight.enable_work); + struct kbase_debug_coresight_csf_config *config_entry; + unsigned long flags; + + spin_lock_irqsave(&kbdev->csf.coresight.lock, flags); + + list_for_each_entry(config_entry, &kbdev->csf.coresight.configs, link) { + spin_unlock_irqrestore(&kbdev->csf.coresight.lock, flags); + if (coresight_config_enable(kbdev, config_entry)) + dev_err(kbdev->dev, "enable config (0x%pK) failed", config_entry); + spin_lock_irqsave(&kbdev->csf.coresight.lock, flags); + } + + spin_unlock_irqrestore(&kbdev->csf.coresight.lock, flags); + + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); + kbase_pm_update_state(kbdev); + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + + wake_up_all(&kbdev->csf.coresight.event_wait); +} + +static void coresight_config_disable_all(struct work_struct *data) +{ + struct kbase_device *kbdev = + container_of(data, struct kbase_device, csf.coresight.disable_work); + struct kbase_debug_coresight_csf_config *config_entry; + unsigned long flags; + + spin_lock_irqsave(&kbdev->csf.coresight.lock, flags); + + list_for_each_entry(config_entry, &kbdev->csf.coresight.configs, link) { + spin_unlock_irqrestore(&kbdev->csf.coresight.lock, flags); + if (coresight_config_disable(kbdev, config_entry)) + dev_err(kbdev->dev, "disable config (0x%pK) failed", config_entry); + spin_lock_irqsave(&kbdev->csf.coresight.lock, flags); + } + + spin_unlock_irqrestore(&kbdev->csf.coresight.lock, flags); + + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); + kbase_pm_update_state(kbdev); + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + + wake_up_all(&kbdev->csf.coresight.event_wait); +} + +void kbase_debug_coresight_csf_disable_pmode_enter(struct kbase_device *kbdev) +{ + unsigned long flags; + + dev_dbg(kbdev->dev, "Coresight state %s before protected mode enter", + coresight_state_to_string(KBASE_DEBUG_CORESIGHT_CSF_ENABLED)); + + lockdep_assert_held(&kbdev->csf.scheduler.lock); + + kbase_pm_lock(kbdev); + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); + + kbdev->csf.coresight.disable_on_pmode_enter = true; + kbdev->csf.coresight.enable_on_pmode_exit = false; + kbase_pm_update_state(kbdev); + + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + + kbase_pm_wait_for_desired_state(kbdev); + + kbase_pm_unlock(kbdev); +} + +void kbase_debug_coresight_csf_enable_pmode_exit(struct kbase_device *kbdev) +{ + dev_dbg(kbdev->dev, "Coresight state %s after protected mode exit", + coresight_state_to_string(KBASE_DEBUG_CORESIGHT_CSF_DISABLED)); + + lockdep_assert_held(&kbdev->hwaccess_lock); + + WARN_ON(kbdev->csf.coresight.disable_on_pmode_enter); + + kbdev->csf.coresight.enable_on_pmode_exit = true; + kbase_pm_update_state(kbdev); +} + +void kbase_debug_coresight_csf_state_request(struct kbase_device *kbdev, + enum kbase_debug_coresight_csf_state state) +{ + if (unlikely(!kbdev)) + return; + + if (unlikely(!kbdev->csf.coresight.workq)) + return; + + dev_dbg(kbdev->dev, "Coresight state %s requested", coresight_state_to_string(state)); + + switch (state) { + case 
KBASE_DEBUG_CORESIGHT_CSF_DISABLED: + queue_work(kbdev->csf.coresight.workq, &kbdev->csf.coresight.disable_work); + break; + case KBASE_DEBUG_CORESIGHT_CSF_ENABLED: + queue_work(kbdev->csf.coresight.workq, &kbdev->csf.coresight.enable_work); + break; + default: + dev_err(kbdev->dev, "Invalid Coresight state %d", state); + break; + } +} + +bool kbase_debug_coresight_csf_state_check(struct kbase_device *kbdev, + enum kbase_debug_coresight_csf_state state) +{ + struct kbase_debug_coresight_csf_config *config_entry; + unsigned long flags; + bool success = true; + + dev_dbg(kbdev->dev, "Coresight check for state: %s", coresight_state_to_string(state)); + + spin_lock_irqsave(&kbdev->csf.coresight.lock, flags); + + list_for_each_entry(config_entry, &kbdev->csf.coresight.configs, link) { + if (state != config_entry->state) { + success = false; + break; + } + } + + spin_unlock_irqrestore(&kbdev->csf.coresight.lock, flags); + + return success; +} +KBASE_EXPORT_TEST_API(kbase_debug_coresight_csf_state_check); + +bool kbase_debug_coresight_csf_state_wait(struct kbase_device *kbdev, + enum kbase_debug_coresight_csf_state state) +{ + const long wait_timeout = kbase_csf_timeout_in_jiffies(kbdev->csf.fw_timeout_ms); + struct kbase_debug_coresight_csf_config *config_entry, *next_config_entry; + unsigned long flags; + bool success = true; + + dev_dbg(kbdev->dev, "Coresight wait for state: %s", coresight_state_to_string(state)); + + spin_lock_irqsave(&kbdev->csf.coresight.lock, flags); + + list_for_each_entry_safe(config_entry, next_config_entry, &kbdev->csf.coresight.configs, + link) { + const enum kbase_debug_coresight_csf_state prev_state = config_entry->state; + long remaining; + + spin_unlock_irqrestore(&kbdev->csf.coresight.lock, flags); + remaining = wait_event_timeout(kbdev->csf.coresight.event_wait, + state == config_entry->state, wait_timeout); + spin_lock_irqsave(&kbdev->csf.coresight.lock, flags); + + if (!remaining) { + success = false; + dev_err(kbdev->dev, + "Timeout waiting for Coresight state transition %s to %s", + coresight_state_to_string(prev_state), + coresight_state_to_string(state)); + } + } + + spin_unlock_irqrestore(&kbdev->csf.coresight.lock, flags); + + return success; +} +KBASE_EXPORT_TEST_API(kbase_debug_coresight_csf_state_wait); + +int kbase_debug_coresight_csf_init(struct kbase_device *kbdev) +{ + kbdev->csf.coresight.workq = alloc_ordered_workqueue("Mali CoreSight workqueue", 0); + if (kbdev->csf.coresight.workq == NULL) + return -ENOMEM; + + INIT_LIST_HEAD(&kbdev->csf.coresight.clients); + INIT_LIST_HEAD(&kbdev->csf.coresight.configs); + INIT_WORK(&kbdev->csf.coresight.enable_work, coresight_config_enable_all); + INIT_WORK(&kbdev->csf.coresight.disable_work, coresight_config_disable_all); + init_waitqueue_head(&kbdev->csf.coresight.event_wait); + spin_lock_init(&kbdev->csf.coresight.lock); + + kbdev->csf.coresight.disable_on_pmode_enter = false; + kbdev->csf.coresight.enable_on_pmode_exit = false; + + return 0; +} + +void kbase_debug_coresight_csf_term(struct kbase_device *kbdev) +{ + struct kbase_debug_coresight_csf_client *client_entry, *next_client_entry; + struct kbase_debug_coresight_csf_config *config_entry, *next_config_entry; + unsigned long flags; + + kbdev->csf.coresight.disable_on_pmode_enter = false; + kbdev->csf.coresight.enable_on_pmode_exit = false; + + cancel_work_sync(&kbdev->csf.coresight.enable_work); + cancel_work_sync(&kbdev->csf.coresight.disable_work); + destroy_workqueue(kbdev->csf.coresight.workq); + kbdev->csf.coresight.workq = NULL; + + 
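kbase_debug_coresight_csf_init() and kbase_debug_coresight_csf_term() are intended to bracket the lifetime of the kbase device: init allocates the ordered workqueue and the empty client/config lists, while term cancels outstanding enable/disable work before tearing the lists down. Note that kbase_debug_coresight_csf_state_request() only queues work; a caller needing the transition to be complete is expected to pair it with kbase_debug_coresight_csf_state_wait(), which blocks up to fw_timeout_ms per config. A hedged sketch of the pairing (the surrounding probe/teardown context and the warning text are placeholders, not taken from this patch):

	/* During device initialisation, before any CoreSight client registers. */
	err = kbase_debug_coresight_csf_init(kbdev);
	if (err)
		return err;	/* -ENOMEM if the ordered workqueue cannot be allocated */

	/* Later, when the MCU is up and traces should start flowing. */
	kbase_debug_coresight_csf_state_request(kbdev, KBASE_DEBUG_CORESIGHT_CSF_ENABLED);
	if (!kbase_debug_coresight_csf_state_wait(kbdev, KBASE_DEBUG_CORESIGHT_CSF_ENABLED))
		dev_warn(kbdev->dev, "CoreSight enable did not complete in time");

	/* During device teardown, after the GPU has stopped using the MCU. */
	kbase_debug_coresight_csf_term(kbdev);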
spin_lock_irqsave(&kbdev->csf.coresight.lock, flags); + + list_for_each_entry_safe(config_entry, next_config_entry, &kbdev->csf.coresight.configs, + link) { + list_del_init(&config_entry->link); + kfree(config_entry); + } + + list_for_each_entry_safe(client_entry, next_client_entry, &kbdev->csf.coresight.clients, + link) { + list_del_init(&client_entry->link); + kfree(client_entry); + } + + spin_unlock_irqrestore(&kbdev->csf.coresight.lock, flags); +} diff --git a/mali_kbase/debug/backend/mali_kbase_debug_coresight_internal_csf.h b/mali_kbase/debug/backend/mali_kbase_debug_coresight_internal_csf.h new file mode 100644 index 0000000..06d62dc --- /dev/null +++ b/mali_kbase/debug/backend/mali_kbase_debug_coresight_internal_csf.h @@ -0,0 +1,182 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +/* + * + * (C) COPYRIGHT 2022 ARM Limited. All rights reserved. + * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. + * + */ + +#ifndef _KBASE_DEBUG_CORESIGHT_INTERNAL_CSF_H_ +#define _KBASE_DEBUG_CORESIGHT_INTERNAL_CSF_H_ + +#include <mali_kbase.h> +#include <linux/mali_kbase_debug_coresight_csf.h> + +/** + * struct kbase_debug_coresight_csf_client - Coresight client definition + * + * @drv_data: Pointer to driver device data. + * @addr_ranges: Arrays of address ranges used by the registered client. + * @nr_ranges: Size of @addr_ranges array. + * @link: Link item of a Coresight client. + * Linked to &struct_kbase_device.csf.coresight.clients. + */ +struct kbase_debug_coresight_csf_client { + void *drv_data; + struct kbase_debug_coresight_csf_address_range *addr_ranges; + u32 nr_ranges; + struct list_head link; +}; + +/** + * enum kbase_debug_coresight_csf_state - Coresight configuration states + * + * @KBASE_DEBUG_CORESIGHT_CSF_DISABLED: Coresight configuration is disabled. + * @KBASE_DEBUG_CORESIGHT_CSF_ENABLED: Coresight configuration is enabled. + */ +enum kbase_debug_coresight_csf_state { + KBASE_DEBUG_CORESIGHT_CSF_DISABLED = 0, + KBASE_DEBUG_CORESIGHT_CSF_ENABLED, +}; + +/** + * struct kbase_debug_coresight_csf_config - Coresight configuration definition + * + * @client: Pointer to the client for which the configuration is created. + * @enable_seq: Array of operations for Coresight client enable sequence. Can be NULL. + * @disable_seq: Array of operations for Coresight client disable sequence. Can be NULL. + * @state: Current Coresight configuration state. + * @error: Error code used to know if an error occurred during the execution + * of the enable or disable sequences. + * @link: Link item of a Coresight configuration. + * Linked to &struct_kbase_device.csf.coresight.configs. 
+ */ +struct kbase_debug_coresight_csf_config { + void *client; + struct kbase_debug_coresight_csf_sequence *enable_seq; + struct kbase_debug_coresight_csf_sequence *disable_seq; + enum kbase_debug_coresight_csf_state state; + int error; + struct list_head link; +}; + +/** + * struct kbase_debug_coresight_device - Object representing the Coresight device + * + * @clients: List head to maintain Coresight clients. + * @configs: List head to maintain Coresight configs. + * @lock: A lock to protect client/config lists. + * Lists can be accessed concurrently by + * Coresight kernel modules and kernel threads. + * @workq: Work queue for Coresight enable/disable execution. + * @enable_work: Work item used to enable Coresight. + * @disable_work: Work item used to disable Coresight. + * @event_wait: Wait queue for Coresight events. + * @enable_on_pmode_exit: Flag used by the PM state machine to + * identify if Coresight enable is needed. + * @disable_on_pmode_enter: Flag used by the PM state machine to + * identify if Coresight disable is needed. + */ +struct kbase_debug_coresight_device { + struct list_head clients; + struct list_head configs; + spinlock_t lock; + struct workqueue_struct *workq; + struct work_struct enable_work; + struct work_struct disable_work; + wait_queue_head_t event_wait; + bool enable_on_pmode_exit; + bool disable_on_pmode_enter; +}; + +/** + * kbase_debug_coresight_csf_init - Initialize Coresight resources. + * + * @kbdev: Instance of a GPU platform device that implements a CSF interface. + * + * This function should be called once at device initialization. + * + * Return: 0 on success. + */ +int kbase_debug_coresight_csf_init(struct kbase_device *kbdev); + +/** + * kbase_debug_coresight_csf_term - Terminate Coresight resources. + * + * @kbdev: Instance of a GPU platform device that implements a CSF interface. + * + * This function should be called at device termination to prevent any + * memory leaks if Coresight module would have been removed without calling + * kbasep_debug_coresight_csf_trace_disable(). + */ +void kbase_debug_coresight_csf_term(struct kbase_device *kbdev); + +/** + * kbase_debug_coresight_csf_disable_pmode_enter - Disable Coresight on Protected + * mode enter. + * + * @kbdev: Instance of a GPU platform device that implements a CSF interface. + * + * This function should be called just before requesting to enter protected mode. + * It will trigger a PM state machine transition from MCU_ON + * to ON_PMODE_ENTER_CORESIGHT_DISABLE. + */ +void kbase_debug_coresight_csf_disable_pmode_enter(struct kbase_device *kbdev); + +/** + * kbase_debug_coresight_csf_enable_pmode_exit - Enable Coresight on Protected + * mode enter. + * + * @kbdev: Instance of a GPU platform device that implements a CSF interface. + * + * This function should be called after protected mode exit is acknowledged. + * It will trigger a PM state machine transition from MCU_ON + * to ON_PMODE_EXIT_CORESIGHT_ENABLE. + */ +void kbase_debug_coresight_csf_enable_pmode_exit(struct kbase_device *kbdev); + +/** + * kbase_debug_coresight_csf_state_request - Request Coresight state transition. + * + * @kbdev: Instance of a GPU platform device that implements a CSF interface. + * @state: Coresight state to check for. + */ +void kbase_debug_coresight_csf_state_request(struct kbase_device *kbdev, + enum kbase_debug_coresight_csf_state state); + +/** + * kbase_debug_coresight_csf_state_check - Check Coresight state. + * + * @kbdev: Instance of a GPU platform device that implements a CSF interface. 
+ * @state: Coresight state to check for. + * + * Return: true if all states of configs are @state. + */ +bool kbase_debug_coresight_csf_state_check(struct kbase_device *kbdev, + enum kbase_debug_coresight_csf_state state); + +/** + * kbase_debug_coresight_csf_state_wait - Wait for Coresight state transition to complete. + * + * @kbdev: Instance of a GPU platform device that implements a CSF interface. + * @state: Coresight state to wait for. + * + * Return: true if all configs become @state in pre-defined time period. + */ +bool kbase_debug_coresight_csf_state_wait(struct kbase_device *kbdev, + enum kbase_debug_coresight_csf_state state); + +#endif /* _KBASE_DEBUG_CORESIGHT_INTERNAL_CSF_H_ */ diff --git a/mali_kbase/debug/backend/mali_kbase_debug_ktrace_codes_csf.h b/mali_kbase/debug/backend/mali_kbase_debug_ktrace_codes_csf.h index 2506ce1..87e13e5 100644 --- a/mali_kbase/debug/backend/mali_kbase_debug_ktrace_codes_csf.h +++ b/mali_kbase/debug/backend/mali_kbase_debug_ktrace_codes_csf.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2020-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -42,67 +42,75 @@ int dummy_array[] = { /* * Generic CSF events */ - KBASE_KTRACE_CODE_MAKE_CODE(EVICT_CTX_SLOTS), + /* info_val = 0 */ + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_EVICT_CTX_SLOTS_START), + /* info_val == number of CSGs supported */ + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_EVICT_CTX_SLOTS_END), /* info_val[0:7] == fw version_minor * info_val[15:8] == fw version_major * info_val[63:32] == fw version_hash */ - KBASE_KTRACE_CODE_MAKE_CODE(FIRMWARE_BOOT), - KBASE_KTRACE_CODE_MAKE_CODE(FIRMWARE_REBOOT), - KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_TOCK), + KBASE_KTRACE_CODE_MAKE_CODE(CSF_FIRMWARE_BOOT), + KBASE_KTRACE_CODE_MAKE_CODE(CSF_FIRMWARE_REBOOT), + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_TOCK_INVOKE), + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_TICK_INVOKE), + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_TOCK_START), KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_TOCK_END), /* info_val == total number of runnable groups across all kctxs */ - KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_TICK), + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_TICK_START), KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_TICK_END), - KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_RESET), + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_RESET_START), + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_RESET_END), /* info_val = timeout in ms */ - KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_WAIT_PROTM_QUIT), + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_PROTM_WAIT_QUIT_START), /* info_val = remaining ms timeout, or 0 if timedout */ - KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_WAIT_PROTM_QUIT_DONE), - KBASE_KTRACE_CODE_MAKE_CODE(SYNC_UPDATE_EVENT), - KBASE_KTRACE_CODE_MAKE_CODE(SYNC_UPDATE_EVENT_NOTIFY_GPU), + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_PROTM_WAIT_QUIT_END), + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_GROUP_SYNC_UPDATE_EVENT), + KBASE_KTRACE_CODE_MAKE_CODE(CSF_SYNC_UPDATE_NOTIFY_GPU_EVENT), /* info_val = JOB_IRQ_STATUS */ - KBASE_KTRACE_CODE_MAKE_CODE(CSF_INTERRUPT), + KBASE_KTRACE_CODE_MAKE_CODE(CSF_INTERRUPT_START), /* info_val = JOB_IRQ_STATUS */ KBASE_KTRACE_CODE_MAKE_CODE(CSF_INTERRUPT_END), /* info_val = JOB_IRQ_STATUS */ - KBASE_KTRACE_CODE_MAKE_CODE(CSG_INTERRUPT_PROCESS), + KBASE_KTRACE_CODE_MAKE_CODE(CSG_INTERRUPT_PROCESS_START), /* info_val 
= GLB_REQ ^ GLB_ACQ */ - KBASE_KTRACE_CODE_MAKE_CODE(GLB_REQ_ACQ), + KBASE_KTRACE_CODE_MAKE_CODE(CSF_INTERRUPT_GLB_REQ_ACK), /* info_val[31:0] = num non idle offslot groups * info_val[32] = scheduler can suspend on idle */ - KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_CAN_IDLE), - KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_ADVANCE_TICK), - KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_NOADVANCE_TICK), + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_GPU_IDLE_EVENT_CAN_SUSPEND), + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_TICK_ADVANCE), + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_TICK_NOADVANCE), /* kctx is added to the back of the list */ - KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_INSERT_RUNNABLE), - KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_REMOVE_RUNNABLE), + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_RUNNABLE_KCTX_INSERT), + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_RUNNABLE_KCTX_REMOVE), /* kctx is moved to the back of the list */ - KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_ROTATE_RUNNABLE), - KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_HEAD_RUNNABLE), + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_RUNNABLE_KCTX_ROTATE), + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_RUNNABLE_KCTX_HEAD), - KBASE_KTRACE_CODE_MAKE_CODE(IDLE_WORKER_BEGIN), + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_GPU_IDLE_WORKER_START), /* 4-bit encoding of boolean values (ease of reading as hex values) * * info_val[3:0] = was reset active/failed to be prevented * info_val[7:4] = whether scheduler was both idle and suspendable * info_val[11:8] = whether all groups were suspended */ - KBASE_KTRACE_CODE_MAKE_CODE(IDLE_WORKER_END), - KBASE_KTRACE_CODE_MAKE_CODE(GROUP_SYNC_UPDATE_WORKER_BEGIN), - KBASE_KTRACE_CODE_MAKE_CODE(GROUP_SYNC_UPDATE_WORKER_END), + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_GPU_IDLE_WORKER_END), + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_GROUP_SYNC_UPDATE_WORKER_START), + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_GROUP_SYNC_UPDATE_WORKER_END), /* info_val = bitmask of slots that gave an ACK for STATUS_UPDATE */ - KBASE_KTRACE_CODE_MAKE_CODE(SLOTS_STATUS_UPDATE_ACK), + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_UPDATE_IDLE_SLOTS_ACK), /* info_val[63:0] = GPU cycle counter, used mainly for benchmarking * purpose. 
*/ - KBASE_KTRACE_CODE_MAKE_CODE(GPU_IDLE_HANDLING_START), - KBASE_KTRACE_CODE_MAKE_CODE(MCU_HALTED), - KBASE_KTRACE_CODE_MAKE_CODE(MCU_IN_SLEEP), + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_GPU_IDLE_WORKER_HANDLING_START), + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_GPU_IDLE_WORKER_HANDLING_END), + + KBASE_KTRACE_CODE_MAKE_CODE(CSF_FIRMWARE_MCU_HALTED), + KBASE_KTRACE_CODE_MAKE_CODE(CSF_FIRMWARE_MCU_SLEEP), /* * Group events @@ -111,21 +119,23 @@ int dummy_array[] = { * info_val[19:16] == as_nr * info_val[63:32] == endpoint config (max number of endpoints allowed) */ - KBASE_KTRACE_CODE_MAKE_CODE(CSG_SLOT_START), + KBASE_KTRACE_CODE_MAKE_CODE(CSG_SLOT_START_REQ), /* info_val == CSG_REQ state issued */ - KBASE_KTRACE_CODE_MAKE_CODE(CSG_SLOT_STOP), + KBASE_KTRACE_CODE_MAKE_CODE(CSG_SLOT_STOP_REQ), /* info_val == CSG_ACK state */ - KBASE_KTRACE_CODE_MAKE_CODE(CSG_SLOT_STARTED), + KBASE_KTRACE_CODE_MAKE_CODE(CSG_SLOT_RUNNING), /* info_val == CSG_ACK state */ KBASE_KTRACE_CODE_MAKE_CODE(CSG_SLOT_STOPPED), /* info_val == slot cleaned */ KBASE_KTRACE_CODE_MAKE_CODE(CSG_SLOT_CLEANED), /* info_val = slot requesting STATUS_UPDATE */ - KBASE_KTRACE_CODE_MAKE_CODE(CSG_SLOT_STATUS_UPDATE), + KBASE_KTRACE_CODE_MAKE_CODE(CSG_UPDATE_IDLE_SLOT_REQ), /* info_val = scheduler's new csg_slots_idle_mask[0] * group->csg_nr indicates which bit was set */ KBASE_KTRACE_CODE_MAKE_CODE(CSG_SLOT_IDLE_SET), + KBASE_KTRACE_CODE_MAKE_CODE(CSG_INTERRUPT_NO_NON_IDLE_GROUPS), + KBASE_KTRACE_CODE_MAKE_CODE(CSG_INTERRUPT_NON_IDLE_GROUPS), /* info_val = scheduler's new csg_slots_idle_mask[0] * group->csg_nr indicates which bit was cleared * @@ -133,13 +143,13 @@ int dummy_array[] = { */ KBASE_KTRACE_CODE_MAKE_CODE(CSG_SLOT_IDLE_CLEAR), /* info_val == previous priority */ - KBASE_KTRACE_CODE_MAKE_CODE(CSG_PRIO_UPDATE), + KBASE_KTRACE_CODE_MAKE_CODE(CSG_SLOT_PRIO_UPDATE), /* info_val == CSG_REQ ^ CSG_ACK */ - KBASE_KTRACE_CODE_MAKE_CODE(CSG_SYNC_UPDATE_INTERRUPT), + KBASE_KTRACE_CODE_MAKE_CODE(CSG_INTERRUPT_SYNC_UPDATE), /* info_val == CSG_REQ ^ CSG_ACK */ - KBASE_KTRACE_CODE_MAKE_CODE(CSG_IDLE_INTERRUPT), + KBASE_KTRACE_CODE_MAKE_CODE(CSG_INTERRUPT_IDLE), /* info_val == CSG_REQ ^ CSG_ACK */ - KBASE_KTRACE_CODE_MAKE_CODE(CSG_PROGRESS_TIMER_INTERRUPT), + KBASE_KTRACE_CODE_MAKE_CODE(CSG_INTERRUPT_PROGRESS_TIMER_EVENT), /* info_val[31:0] == CSG_REQ ^ CSG_ACQ * info_val[63:32] == CSG_IRQ_REQ ^ CSG_IRQ_ACK */ @@ -152,34 +162,34 @@ int dummy_array[] = { /* info_val[31:0] == new run state of the evicted group * info_val[63:32] == number of runnable groups */ - KBASE_KTRACE_CODE_MAKE_CODE(GROUP_EVICT_SCHED), + KBASE_KTRACE_CODE_MAKE_CODE(GROUP_EVICT), /* info_val == new num_runnable_grps * group is added to the back of the list for its priority level */ - KBASE_KTRACE_CODE_MAKE_CODE(GROUP_INSERT_RUNNABLE), + KBASE_KTRACE_CODE_MAKE_CODE(GROUP_RUNNABLE_INSERT), /* info_val == new num_runnable_grps */ - KBASE_KTRACE_CODE_MAKE_CODE(GROUP_REMOVE_RUNNABLE), + KBASE_KTRACE_CODE_MAKE_CODE(GROUP_RUNNABLE_REMOVE), /* info_val == num_runnable_grps * group is moved to the back of the list for its priority level */ - KBASE_KTRACE_CODE_MAKE_CODE(GROUP_ROTATE_RUNNABLE), - KBASE_KTRACE_CODE_MAKE_CODE(GROUP_HEAD_RUNNABLE), + KBASE_KTRACE_CODE_MAKE_CODE(GROUP_RUNNABLE_ROTATE), + KBASE_KTRACE_CODE_MAKE_CODE(GROUP_RUNNABLE_HEAD), /* info_val == new num_idle_wait_grps * group is added to the back of the list */ - KBASE_KTRACE_CODE_MAKE_CODE(GROUP_INSERT_IDLE_WAIT), + KBASE_KTRACE_CODE_MAKE_CODE(GROUP_IDLE_WAIT_INSERT), /* info_val == new num_idle_wait_grps * group 
is added to the back of the list */ - KBASE_KTRACE_CODE_MAKE_CODE(GROUP_REMOVE_IDLE_WAIT), - KBASE_KTRACE_CODE_MAKE_CODE(GROUP_HEAD_IDLE_WAIT), + KBASE_KTRACE_CODE_MAKE_CODE(GROUP_IDLE_WAIT_REMOVE), + KBASE_KTRACE_CODE_MAKE_CODE(GROUP_IDLE_WAIT_HEAD), /* info_val == is scheduler running with protected mode tasks */ - KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_CHECK_PROTM_ENTER), - KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_ENTER_PROTM), - KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_EXIT_PROTM), + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_PROTM_ENTER_CHECK), + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_PROTM_ENTER), + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_PROTM_EXIT), /* info_val[31:0] == number of GPU address space slots in use * info_val[63:32] == number of runnable groups */ @@ -187,13 +197,40 @@ int dummy_array[] = { /* info_val == new count of off-slot non-idle groups * no group indicates it was set rather than incremented */ - KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_NONIDLE_OFFSLOT_INC), + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_NONIDLE_OFFSLOT_GRP_INC), /* info_val == new count of off-slot non-idle groups */ - KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_NONIDLE_OFFSLOT_DEC), + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_NONIDLE_OFFSLOT_GRP_DEC), + /* info_val = scheduler's new csg_slots_idle_mask[0] + * group->csg_nr indicates which bit was set + */ + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_HANDLE_IDLE_SLOTS), - KBASE_KTRACE_CODE_MAKE_CODE(PROTM_EVENT_WORKER_BEGIN), + KBASE_KTRACE_CODE_MAKE_CODE(PROTM_EVENT_WORKER_START), KBASE_KTRACE_CODE_MAKE_CODE(PROTM_EVENT_WORKER_END), + /* info_val = scheduler state */ + KBASE_KTRACE_CODE_MAKE_CODE(SCHED_BUSY), + KBASE_KTRACE_CODE_MAKE_CODE(SCHED_INACTIVE), + KBASE_KTRACE_CODE_MAKE_CODE(SCHED_SUSPENDED), + KBASE_KTRACE_CODE_MAKE_CODE(SCHED_SLEEPING), + + /* info_val = mcu state */ +#define KBASEP_MCU_STATE(n) KBASE_KTRACE_CODE_MAKE_CODE(PM_MCU_ ## n), +#include "backend/gpu/mali_kbase_pm_mcu_states.h" +#undef KBASEP_MCU_STATE + + /* info_val = number of runnable groups */ + KBASE_KTRACE_CODE_MAKE_CODE(CSF_GROUP_INACTIVE), + KBASE_KTRACE_CODE_MAKE_CODE(CSF_GROUP_RUNNABLE), + KBASE_KTRACE_CODE_MAKE_CODE(CSF_GROUP_IDLE), + KBASE_KTRACE_CODE_MAKE_CODE(CSF_GROUP_SUSPENDED), + KBASE_KTRACE_CODE_MAKE_CODE(CSF_GROUP_SUSPENDED_ON_IDLE), + KBASE_KTRACE_CODE_MAKE_CODE(CSF_GROUP_SUSPENDED_ON_WAIT_SYNC), + /* info_val = new run state of the evicted group */ + KBASE_KTRACE_CODE_MAKE_CODE(CSF_GROUP_FAULT_EVICTED), + /* info_val = get the number of active CSGs */ + KBASE_KTRACE_CODE_MAKE_CODE(CSF_GROUP_TERMINATED), + /* * Group + Queue events */ @@ -201,42 +238,42 @@ int dummy_array[] = { KBASE_KTRACE_CODE_MAKE_CODE(CSI_START), /* info_val == queue->enabled before stop */ KBASE_KTRACE_CODE_MAKE_CODE(CSI_STOP), - KBASE_KTRACE_CODE_MAKE_CODE(CSI_STOP_REQUESTED), + KBASE_KTRACE_CODE_MAKE_CODE(CSI_STOP_REQ), /* info_val == CS_REQ ^ CS_ACK that were not processed due to the group * being suspended */ - KBASE_KTRACE_CODE_MAKE_CODE(CSI_IGNORED_INTERRUPTS_GROUP_SUSPEND), + KBASE_KTRACE_CODE_MAKE_CODE(CSI_INTERRUPT_GROUP_SUSPENDS_IGNORED), /* info_val == CS_REQ ^ CS_ACK */ - KBASE_KTRACE_CODE_MAKE_CODE(CSI_FAULT_INTERRUPT), + KBASE_KTRACE_CODE_MAKE_CODE(CSI_INTERRUPT_FAULT), /* info_val == CS_REQ ^ CS_ACK */ - KBASE_KTRACE_CODE_MAKE_CODE(CSI_TILER_OOM_INTERRUPT), + KBASE_KTRACE_CODE_MAKE_CODE(CSI_INTERRUPT_TILER_OOM), /* info_val == CS_REQ ^ CS_ACK */ - KBASE_KTRACE_CODE_MAKE_CODE(CSI_PROTM_PEND_INTERRUPT), + KBASE_KTRACE_CODE_MAKE_CODE(CSI_INTERRUPT_PROTM_PEND), /* info_val == CS_ACK_PROTM_PEND ^ 
CS_REQ_PROTM_PEND */ KBASE_KTRACE_CODE_MAKE_CODE(CSI_PROTM_ACK), /* info_val == group->run_State (for group the queue is bound to) */ KBASE_KTRACE_CODE_MAKE_CODE(QUEUE_START), KBASE_KTRACE_CODE_MAKE_CODE(QUEUE_STOP), /* info_val == contents of CS_STATUS_WAIT_SYNC_POINTER */ - KBASE_KTRACE_CODE_MAKE_CODE(QUEUE_SYNC_UPDATE), + KBASE_KTRACE_CODE_MAKE_CODE(QUEUE_SYNC_UPDATE_EVAL_START), /* info_val == bool for result of the evaluation */ - KBASE_KTRACE_CODE_MAKE_CODE(QUEUE_SYNC_UPDATE_EVALUATED), + KBASE_KTRACE_CODE_MAKE_CODE(QUEUE_SYNC_UPDATE_EVAL_END), /* info_val == contents of CS_STATUS_WAIT */ - KBASE_KTRACE_CODE_MAKE_CODE(QUEUE_SYNC_STATUS_WAIT), + KBASE_KTRACE_CODE_MAKE_CODE(QUEUE_SYNC_UPDATE_WAIT_STATUS), /* info_val == current sync value pointed to by queue->sync_ptr */ - KBASE_KTRACE_CODE_MAKE_CODE(QUEUE_SYNC_CURRENT_VAL), + KBASE_KTRACE_CODE_MAKE_CODE(QUEUE_SYNC_UPDATE_CUR_VAL), /* info_val == current value of CS_STATUS_WAIT_SYNC_VALUE */ - KBASE_KTRACE_CODE_MAKE_CODE(QUEUE_SYNC_TEST_VAL), + KBASE_KTRACE_CODE_MAKE_CODE(QUEUE_SYNC_UPDATE_TEST_VAL), /* info_val == current value of CS_STATUS_BLOCKED_REASON */ - KBASE_KTRACE_CODE_MAKE_CODE(QUEUE_SYNC_BLOCKED_REASON), + KBASE_KTRACE_CODE_MAKE_CODE(QUEUE_SYNC_UPDATE_BLOCKED_REASON), /* info_val = group's new protm_pending_bitmap[0] * queue->csi_index indicates which bit was set */ - KBASE_KTRACE_CODE_MAKE_CODE(PROTM_PENDING_SET), + KBASE_KTRACE_CODE_MAKE_CODE(CSI_PROTM_PEND_SET), /* info_val = group's new protm_pending_bitmap[0] * queue->csi_index indicates which bit was cleared */ - KBASE_KTRACE_CODE_MAKE_CODE(PROTM_PENDING_CLEAR), + KBASE_KTRACE_CODE_MAKE_CODE(CSI_PROTM_PEND_CLEAR), /* * KCPU queue events @@ -244,42 +281,49 @@ int dummy_array[] = { /* KTrace info_val == KCPU queue fence context * KCPU extra_info_val == N/A. */ - KBASE_KTRACE_CODE_MAKE_CODE(KCPU_QUEUE_NEW), + KBASE_KTRACE_CODE_MAKE_CODE(KCPU_QUEUE_CREATE), /* KTrace info_val == Number of pending commands in KCPU queue when * it is destroyed. * KCPU extra_info_val == Number of CQS wait operations present in * the KCPU queue when it is destroyed. */ - KBASE_KTRACE_CODE_MAKE_CODE(KCPU_QUEUE_DESTROY), + KBASE_KTRACE_CODE_MAKE_CODE(KCPU_QUEUE_DELETE), /* KTrace info_val == CQS event memory address * KCPU extra_info_val == Upper 32 bits of event memory, i.e. contents * of error field. */ - KBASE_KTRACE_CODE_MAKE_CODE(CQS_SET), + KBASE_KTRACE_CODE_MAKE_CODE(KCPU_CQS_SET), /* KTrace info_val == Number of CQS objects to be waited upon * KCPU extra_info_val == N/A. */ - KBASE_KTRACE_CODE_MAKE_CODE(CQS_WAIT_START), + KBASE_KTRACE_CODE_MAKE_CODE(KCPU_CQS_WAIT_START), /* KTrace info_val == CQS event memory address * KCPU extra_info_val == 1 if CQS was signaled with an error and queue * inherited the error, otherwise 0. */ - KBASE_KTRACE_CODE_MAKE_CODE(CQS_WAIT_END), + KBASE_KTRACE_CODE_MAKE_CODE(KCPU_CQS_WAIT_END), /* KTrace info_val == Fence context * KCPU extra_info_val == Fence seqno. */ - KBASE_KTRACE_CODE_MAKE_CODE(FENCE_SIGNAL), + KBASE_KTRACE_CODE_MAKE_CODE(KCPU_FENCE_SIGNAL), /* KTrace info_val == Fence context * KCPU extra_info_val == Fence seqno. */ - KBASE_KTRACE_CODE_MAKE_CODE(FENCE_WAIT_START), + KBASE_KTRACE_CODE_MAKE_CODE(KCPU_FENCE_WAIT_START), /* KTrace info_val == Fence context * KCPU extra_info_val == Fence seqno. 
*/ - KBASE_KTRACE_CODE_MAKE_CODE(FENCE_WAIT_END), + KBASE_KTRACE_CODE_MAKE_CODE(KCPU_FENCE_WAIT_END), +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_ENTER_SC_RAIL), + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_EXIT_SC_RAIL), + KBASE_KTRACE_CODE_MAKE_CODE(SC_RAIL_RECHECK_IDLE), + KBASE_KTRACE_CODE_MAKE_CODE(SC_RAIL_RECHECK_NOT_IDLE), + KBASE_KTRACE_CODE_MAKE_CODE(SC_RAIL_CAN_TURN_OFF), +#endif #if 0 /* Dummy section to avoid breaking formatting */ }; #endif -/* ***** THE LACK OF HEADER GUARDS IS INTENTIONAL ***** */ + /* ***** THE LACK OF HEADER GUARDS IS INTENTIONAL ***** */ diff --git a/mali_kbase/debug/backend/mali_kbase_debug_ktrace_csf.c b/mali_kbase/debug/backend/mali_kbase_debug_ktrace_csf.c index 824ca4b..cff6f89 100644 --- a/mali_kbase/debug/backend/mali_kbase_debug_ktrace_csf.c +++ b/mali_kbase/debug/backend/mali_kbase_debug_ktrace_csf.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2020-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -98,6 +98,9 @@ void kbasep_ktrace_add_csf(struct kbase_device *kbdev, struct kbase_ktrace_msg *trace_msg; struct kbase_context *kctx = NULL; + if (unlikely(!kbasep_ktrace_initialized(&kbdev->ktrace))) + return; + spin_lock_irqsave(&kbdev->ktrace.lock, irqflags); /* Reserve and update indices */ @@ -165,6 +168,9 @@ void kbasep_ktrace_add_csf_kcpu(struct kbase_device *kbdev, struct kbase_ktrace_msg *trace_msg; struct kbase_context *kctx = queue->kctx; + if (unlikely(!kbasep_ktrace_initialized(&kbdev->ktrace))) + return; + spin_lock_irqsave(&kbdev->ktrace.lock, irqflags); /* Reserve and update indices */ diff --git a/mali_kbase/debug/backend/mali_kbase_debug_ktrace_defs_csf.h b/mali_kbase/debug/backend/mali_kbase_debug_ktrace_defs_csf.h index 7f32cd2..1896e10 100644 --- a/mali_kbase/debug/backend/mali_kbase_debug_ktrace_defs_csf.h +++ b/mali_kbase/debug/backend/mali_kbase_debug_ktrace_defs_csf.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2020-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -47,7 +47,7 @@ * 1.3: * Add a lot of extra new traces. Tweak some existing scheduler related traces * to contain extra information information/happen at slightly different times. - * SCHEDULER_EXIT_PROTM now has group information + * SCHEDULER_PROTM_EXIT now has group information */ #define KBASE_KTRACE_VERSION_MAJOR 1 #define KBASE_KTRACE_VERSION_MINOR 3 diff --git a/mali_kbase/debug/backend/mali_kbase_debug_ktrace_jm.c b/mali_kbase/debug/backend/mali_kbase_debug_ktrace_jm.c index 05d1677..6597a15 100644 --- a/mali_kbase/debug/backend/mali_kbase_debug_ktrace_jm.c +++ b/mali_kbase/debug/backend/mali_kbase_debug_ktrace_jm.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2020-2022 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -80,6 +80,9 @@ void kbasep_ktrace_add_jm(struct kbase_device *kbdev, unsigned long irqflags; struct kbase_ktrace_msg *trace_msg; + if (unlikely(!kbasep_ktrace_initialized(&kbdev->ktrace))) + return; + spin_lock_irqsave(&kbdev->ktrace.lock, irqflags); /* Reserve and update indices */ diff --git a/mali_kbase/debug/backend/mali_kbase_debug_linux_ktrace_csf.h b/mali_kbase/debug/backend/mali_kbase_debug_linux_ktrace_csf.h index 9ee7f81..e70a498 100644 --- a/mali_kbase/debug/backend/mali_kbase_debug_linux_ktrace_csf.h +++ b/mali_kbase/debug/backend/mali_kbase_debug_linux_ktrace_csf.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2020-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -30,37 +30,52 @@ /* * Generic CSF events - using the common DEFINE_MALI_ADD_EVENT */ -DEFINE_MALI_ADD_EVENT(EVICT_CTX_SLOTS); -DEFINE_MALI_ADD_EVENT(FIRMWARE_BOOT); -DEFINE_MALI_ADD_EVENT(FIRMWARE_REBOOT); -DEFINE_MALI_ADD_EVENT(SCHEDULER_TOCK); +DEFINE_MALI_ADD_EVENT(SCHEDULER_EVICT_CTX_SLOTS_START); +DEFINE_MALI_ADD_EVENT(SCHEDULER_EVICT_CTX_SLOTS_END); +DEFINE_MALI_ADD_EVENT(CSF_FIRMWARE_BOOT); +DEFINE_MALI_ADD_EVENT(CSF_FIRMWARE_REBOOT); +DEFINE_MALI_ADD_EVENT(SCHEDULER_TOCK_INVOKE); +DEFINE_MALI_ADD_EVENT(SCHEDULER_TICK_INVOKE); +DEFINE_MALI_ADD_EVENT(SCHEDULER_TOCK_START); DEFINE_MALI_ADD_EVENT(SCHEDULER_TOCK_END); -DEFINE_MALI_ADD_EVENT(SCHEDULER_TICK); +DEFINE_MALI_ADD_EVENT(SCHEDULER_TICK_START); DEFINE_MALI_ADD_EVENT(SCHEDULER_TICK_END); -DEFINE_MALI_ADD_EVENT(SCHEDULER_RESET); -DEFINE_MALI_ADD_EVENT(SCHEDULER_WAIT_PROTM_QUIT); -DEFINE_MALI_ADD_EVENT(SCHEDULER_WAIT_PROTM_QUIT_DONE); -DEFINE_MALI_ADD_EVENT(SYNC_UPDATE_EVENT); -DEFINE_MALI_ADD_EVENT(SYNC_UPDATE_EVENT_NOTIFY_GPU); -DEFINE_MALI_ADD_EVENT(CSF_INTERRUPT); +DEFINE_MALI_ADD_EVENT(SCHEDULER_RESET_START); +DEFINE_MALI_ADD_EVENT(SCHEDULER_RESET_END); +DEFINE_MALI_ADD_EVENT(SCHEDULER_PROTM_WAIT_QUIT_START); +DEFINE_MALI_ADD_EVENT(SCHEDULER_PROTM_WAIT_QUIT_END); +DEFINE_MALI_ADD_EVENT(SCHEDULER_GROUP_SYNC_UPDATE_EVENT); +DEFINE_MALI_ADD_EVENT(CSF_SYNC_UPDATE_NOTIFY_GPU_EVENT); +DEFINE_MALI_ADD_EVENT(CSF_INTERRUPT_START); DEFINE_MALI_ADD_EVENT(CSF_INTERRUPT_END); -DEFINE_MALI_ADD_EVENT(CSG_INTERRUPT_PROCESS); -DEFINE_MALI_ADD_EVENT(GLB_REQ_ACQ); -DEFINE_MALI_ADD_EVENT(SCHEDULER_CAN_IDLE); -DEFINE_MALI_ADD_EVENT(SCHEDULER_ADVANCE_TICK); -DEFINE_MALI_ADD_EVENT(SCHEDULER_NOADVANCE_TICK); -DEFINE_MALI_ADD_EVENT(SCHEDULER_INSERT_RUNNABLE); -DEFINE_MALI_ADD_EVENT(SCHEDULER_REMOVE_RUNNABLE); -DEFINE_MALI_ADD_EVENT(SCHEDULER_ROTATE_RUNNABLE); -DEFINE_MALI_ADD_EVENT(SCHEDULER_HEAD_RUNNABLE); -DEFINE_MALI_ADD_EVENT(IDLE_WORKER_BEGIN); -DEFINE_MALI_ADD_EVENT(IDLE_WORKER_END); -DEFINE_MALI_ADD_EVENT(GROUP_SYNC_UPDATE_WORKER_BEGIN); -DEFINE_MALI_ADD_EVENT(GROUP_SYNC_UPDATE_WORKER_END); -DEFINE_MALI_ADD_EVENT(SLOTS_STATUS_UPDATE_ACK); -DEFINE_MALI_ADD_EVENT(GPU_IDLE_HANDLING_START); -DEFINE_MALI_ADD_EVENT(MCU_HALTED); -DEFINE_MALI_ADD_EVENT(MCU_IN_SLEEP); +DEFINE_MALI_ADD_EVENT(CSF_INTERRUPT_GLB_REQ_ACK); +DEFINE_MALI_ADD_EVENT(SCHEDULER_GPU_IDLE_EVENT_CAN_SUSPEND); +DEFINE_MALI_ADD_EVENT(SCHEDULER_TICK_ADVANCE); 
+DEFINE_MALI_ADD_EVENT(SCHEDULER_TICK_NOADVANCE); +DEFINE_MALI_ADD_EVENT(SCHEDULER_RUNNABLE_KCTX_INSERT); +DEFINE_MALI_ADD_EVENT(SCHEDULER_RUNNABLE_KCTX_REMOVE); +DEFINE_MALI_ADD_EVENT(SCHEDULER_RUNNABLE_KCTX_ROTATE); +DEFINE_MALI_ADD_EVENT(SCHEDULER_RUNNABLE_KCTX_HEAD); +DEFINE_MALI_ADD_EVENT(SCHEDULER_GPU_IDLE_WORKER_START); +DEFINE_MALI_ADD_EVENT(SCHEDULER_GPU_IDLE_WORKER_END); +DEFINE_MALI_ADD_EVENT(SCHEDULER_GROUP_SYNC_UPDATE_WORKER_START); +DEFINE_MALI_ADD_EVENT(SCHEDULER_GROUP_SYNC_UPDATE_WORKER_END); +DEFINE_MALI_ADD_EVENT(SCHEDULER_UPDATE_IDLE_SLOTS_ACK); +DEFINE_MALI_ADD_EVENT(SCHEDULER_GPU_IDLE_WORKER_HANDLING_START); +DEFINE_MALI_ADD_EVENT(SCHEDULER_GPU_IDLE_WORKER_HANDLING_END); +DEFINE_MALI_ADD_EVENT(CSF_FIRMWARE_MCU_HALTED); +DEFINE_MALI_ADD_EVENT(CSF_FIRMWARE_MCU_SLEEP); +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS +DEFINE_MALI_ADD_EVENT(SCHEDULER_ENTER_SC_RAIL); +DEFINE_MALI_ADD_EVENT(SCHEDULER_EXIT_SC_RAIL); +#endif +DEFINE_MALI_ADD_EVENT(SCHED_BUSY); +DEFINE_MALI_ADD_EVENT(SCHED_INACTIVE); +DEFINE_MALI_ADD_EVENT(SCHED_SUSPENDED); +DEFINE_MALI_ADD_EVENT(SCHED_SLEEPING); +#define KBASEP_MCU_STATE(n) DEFINE_MALI_ADD_EVENT(PM_MCU_ ## n); +#include "backend/gpu/mali_kbase_pm_mcu_states.h" +#undef KBASEP_MCU_STATE DECLARE_EVENT_CLASS(mali_csf_grp_q_template, TP_PROTO(struct kbase_device *kbdev, struct kbase_queue_group *group, @@ -130,38 +145,55 @@ DECLARE_EVENT_CLASS(mali_csf_grp_q_template, __entry->kctx_tgid, __entry->kctx_id, __entry->group_handle, \ __entry->csg_nr, __entry->slot_prio, __entry->info_val)) -DEFINE_MALI_CSF_GRP_EVENT(CSG_SLOT_START); -DEFINE_MALI_CSF_GRP_EVENT(CSG_SLOT_STOP); -DEFINE_MALI_CSF_GRP_EVENT(CSG_SLOT_STARTED); +DEFINE_MALI_CSF_GRP_EVENT(CSG_SLOT_START_REQ); +DEFINE_MALI_CSF_GRP_EVENT(CSG_SLOT_STOP_REQ); +DEFINE_MALI_CSF_GRP_EVENT(CSG_SLOT_RUNNING); DEFINE_MALI_CSF_GRP_EVENT(CSG_SLOT_STOPPED); DEFINE_MALI_CSF_GRP_EVENT(CSG_SLOT_CLEANED); -DEFINE_MALI_CSF_GRP_EVENT(CSG_SLOT_STATUS_UPDATE); +DEFINE_MALI_CSF_GRP_EVENT(CSG_UPDATE_IDLE_SLOT_REQ); DEFINE_MALI_CSF_GRP_EVENT(CSG_SLOT_IDLE_SET); +DEFINE_MALI_CSF_GRP_EVENT(CSG_INTERRUPT_NO_NON_IDLE_GROUPS); +DEFINE_MALI_CSF_GRP_EVENT(CSG_INTERRUPT_NON_IDLE_GROUPS); DEFINE_MALI_CSF_GRP_EVENT(CSG_SLOT_IDLE_CLEAR); -DEFINE_MALI_CSF_GRP_EVENT(CSG_PRIO_UPDATE); -DEFINE_MALI_CSF_GRP_EVENT(CSG_SYNC_UPDATE_INTERRUPT); -DEFINE_MALI_CSF_GRP_EVENT(CSG_IDLE_INTERRUPT); -DEFINE_MALI_CSF_GRP_EVENT(CSG_PROGRESS_TIMER_INTERRUPT); +DEFINE_MALI_CSF_GRP_EVENT(CSG_SLOT_PRIO_UPDATE); +DEFINE_MALI_CSF_GRP_EVENT(CSG_INTERRUPT_SYNC_UPDATE); +DEFINE_MALI_CSF_GRP_EVENT(CSG_INTERRUPT_IDLE); +DEFINE_MALI_CSF_GRP_EVENT(CSG_INTERRUPT_PROGRESS_TIMER_EVENT); +DEFINE_MALI_CSF_GRP_EVENT(CSG_INTERRUPT_PROCESS_START); DEFINE_MALI_CSF_GRP_EVENT(CSG_INTERRUPT_PROCESS_END); DEFINE_MALI_CSF_GRP_EVENT(GROUP_SYNC_UPDATE_DONE); DEFINE_MALI_CSF_GRP_EVENT(GROUP_DESCHEDULE); DEFINE_MALI_CSF_GRP_EVENT(GROUP_SCHEDULE); -DEFINE_MALI_CSF_GRP_EVENT(GROUP_EVICT_SCHED); -DEFINE_MALI_CSF_GRP_EVENT(GROUP_INSERT_RUNNABLE); -DEFINE_MALI_CSF_GRP_EVENT(GROUP_REMOVE_RUNNABLE); -DEFINE_MALI_CSF_GRP_EVENT(GROUP_ROTATE_RUNNABLE); -DEFINE_MALI_CSF_GRP_EVENT(GROUP_HEAD_RUNNABLE); -DEFINE_MALI_CSF_GRP_EVENT(GROUP_INSERT_IDLE_WAIT); -DEFINE_MALI_CSF_GRP_EVENT(GROUP_REMOVE_IDLE_WAIT); -DEFINE_MALI_CSF_GRP_EVENT(GROUP_HEAD_IDLE_WAIT); -DEFINE_MALI_CSF_GRP_EVENT(SCHEDULER_CHECK_PROTM_ENTER); -DEFINE_MALI_CSF_GRP_EVENT(SCHEDULER_ENTER_PROTM); -DEFINE_MALI_CSF_GRP_EVENT(SCHEDULER_EXIT_PROTM); +DEFINE_MALI_CSF_GRP_EVENT(GROUP_EVICT); 
+DEFINE_MALI_CSF_GRP_EVENT(GROUP_RUNNABLE_INSERT); +DEFINE_MALI_CSF_GRP_EVENT(GROUP_RUNNABLE_REMOVE); +DEFINE_MALI_CSF_GRP_EVENT(GROUP_RUNNABLE_ROTATE); +DEFINE_MALI_CSF_GRP_EVENT(GROUP_RUNNABLE_HEAD); +DEFINE_MALI_CSF_GRP_EVENT(GROUP_IDLE_WAIT_INSERT); +DEFINE_MALI_CSF_GRP_EVENT(GROUP_IDLE_WAIT_REMOVE); +DEFINE_MALI_CSF_GRP_EVENT(GROUP_IDLE_WAIT_HEAD); +DEFINE_MALI_CSF_GRP_EVENT(SCHEDULER_PROTM_ENTER_CHECK); +DEFINE_MALI_CSF_GRP_EVENT(SCHEDULER_PROTM_ENTER); +DEFINE_MALI_CSF_GRP_EVENT(SCHEDULER_PROTM_EXIT); DEFINE_MALI_CSF_GRP_EVENT(SCHEDULER_TOP_GRP); -DEFINE_MALI_CSF_GRP_EVENT(SCHEDULER_NONIDLE_OFFSLOT_INC); -DEFINE_MALI_CSF_GRP_EVENT(SCHEDULER_NONIDLE_OFFSLOT_DEC); -DEFINE_MALI_CSF_GRP_EVENT(PROTM_EVENT_WORKER_BEGIN); +DEFINE_MALI_CSF_GRP_EVENT(SCHEDULER_NONIDLE_OFFSLOT_GRP_INC); +DEFINE_MALI_CSF_GRP_EVENT(SCHEDULER_NONIDLE_OFFSLOT_GRP_DEC); +DEFINE_MALI_CSF_GRP_EVENT(SCHEDULER_HANDLE_IDLE_SLOTS); +DEFINE_MALI_CSF_GRP_EVENT(PROTM_EVENT_WORKER_START); DEFINE_MALI_CSF_GRP_EVENT(PROTM_EVENT_WORKER_END); +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS +DEFINE_MALI_CSF_GRP_EVENT(SC_RAIL_RECHECK_IDLE); +DEFINE_MALI_CSF_GRP_EVENT(SC_RAIL_RECHECK_NOT_IDLE); +DEFINE_MALI_CSF_GRP_EVENT(SC_RAIL_CAN_TURN_OFF); +#endif +DEFINE_MALI_CSF_GRP_EVENT(CSF_GROUP_INACTIVE); +DEFINE_MALI_CSF_GRP_EVENT(CSF_GROUP_RUNNABLE); +DEFINE_MALI_CSF_GRP_EVENT(CSF_GROUP_IDLE); +DEFINE_MALI_CSF_GRP_EVENT(CSF_GROUP_SUSPENDED); +DEFINE_MALI_CSF_GRP_EVENT(CSF_GROUP_SUSPENDED_ON_IDLE); +DEFINE_MALI_CSF_GRP_EVENT(CSF_GROUP_SUSPENDED_ON_WAIT_SYNC); +DEFINE_MALI_CSF_GRP_EVENT(CSF_GROUP_FAULT_EVICTED); +DEFINE_MALI_CSF_GRP_EVENT(CSF_GROUP_TERMINATED); #undef DEFINE_MALI_CSF_GRP_EVENT @@ -176,22 +208,22 @@ DEFINE_MALI_CSF_GRP_EVENT(PROTM_EVENT_WORKER_END); DEFINE_MALI_CSF_GRP_Q_EVENT(CSI_START); DEFINE_MALI_CSF_GRP_Q_EVENT(CSI_STOP); -DEFINE_MALI_CSF_GRP_Q_EVENT(CSI_STOP_REQUESTED); -DEFINE_MALI_CSF_GRP_Q_EVENT(CSI_IGNORED_INTERRUPTS_GROUP_SUSPEND); -DEFINE_MALI_CSF_GRP_Q_EVENT(CSI_FAULT_INTERRUPT); -DEFINE_MALI_CSF_GRP_Q_EVENT(CSI_TILER_OOM_INTERRUPT); -DEFINE_MALI_CSF_GRP_Q_EVENT(CSI_PROTM_PEND_INTERRUPT); +DEFINE_MALI_CSF_GRP_Q_EVENT(CSI_STOP_REQ); +DEFINE_MALI_CSF_GRP_Q_EVENT(CSI_INTERRUPT_GROUP_SUSPENDS_IGNORED); +DEFINE_MALI_CSF_GRP_Q_EVENT(CSI_INTERRUPT_FAULT); +DEFINE_MALI_CSF_GRP_Q_EVENT(CSI_INTERRUPT_TILER_OOM); +DEFINE_MALI_CSF_GRP_Q_EVENT(CSI_INTERRUPT_PROTM_PEND); DEFINE_MALI_CSF_GRP_Q_EVENT(CSI_PROTM_ACK); DEFINE_MALI_CSF_GRP_Q_EVENT(QUEUE_START); DEFINE_MALI_CSF_GRP_Q_EVENT(QUEUE_STOP); -DEFINE_MALI_CSF_GRP_Q_EVENT(QUEUE_SYNC_UPDATE); -DEFINE_MALI_CSF_GRP_Q_EVENT(QUEUE_SYNC_UPDATE_EVALUATED); -DEFINE_MALI_CSF_GRP_Q_EVENT(QUEUE_SYNC_STATUS_WAIT); -DEFINE_MALI_CSF_GRP_Q_EVENT(QUEUE_SYNC_CURRENT_VAL); -DEFINE_MALI_CSF_GRP_Q_EVENT(QUEUE_SYNC_TEST_VAL); -DEFINE_MALI_CSF_GRP_Q_EVENT(QUEUE_SYNC_BLOCKED_REASON); -DEFINE_MALI_CSF_GRP_Q_EVENT(PROTM_PENDING_SET); -DEFINE_MALI_CSF_GRP_Q_EVENT(PROTM_PENDING_CLEAR); +DEFINE_MALI_CSF_GRP_Q_EVENT(QUEUE_SYNC_UPDATE_EVAL_START); +DEFINE_MALI_CSF_GRP_Q_EVENT(QUEUE_SYNC_UPDATE_EVAL_END); +DEFINE_MALI_CSF_GRP_Q_EVENT(QUEUE_SYNC_UPDATE_WAIT_STATUS); +DEFINE_MALI_CSF_GRP_Q_EVENT(QUEUE_SYNC_UPDATE_CUR_VAL); +DEFINE_MALI_CSF_GRP_Q_EVENT(QUEUE_SYNC_UPDATE_TEST_VAL); +DEFINE_MALI_CSF_GRP_Q_EVENT(QUEUE_SYNC_UPDATE_BLOCKED_REASON); +DEFINE_MALI_CSF_GRP_Q_EVENT(CSI_PROTM_PEND_SET); +DEFINE_MALI_CSF_GRP_Q_EVENT(CSI_PROTM_PEND_CLEAR); #undef DEFINE_MALI_CSF_GRP_Q_EVENT @@ -230,14 +262,14 @@ DECLARE_EVENT_CLASS(mali_csf_kcpu_queue_template, u64 info_val1, u64 info_val2), \ TP_ARGS(queue, info_val1, 
info_val2)) -DEFINE_MALI_CSF_KCPU_EVENT(KCPU_QUEUE_NEW); -DEFINE_MALI_CSF_KCPU_EVENT(KCPU_QUEUE_DESTROY); -DEFINE_MALI_CSF_KCPU_EVENT(CQS_SET); -DEFINE_MALI_CSF_KCPU_EVENT(CQS_WAIT_START); -DEFINE_MALI_CSF_KCPU_EVENT(CQS_WAIT_END); -DEFINE_MALI_CSF_KCPU_EVENT(FENCE_SIGNAL); -DEFINE_MALI_CSF_KCPU_EVENT(FENCE_WAIT_START); -DEFINE_MALI_CSF_KCPU_EVENT(FENCE_WAIT_END); +DEFINE_MALI_CSF_KCPU_EVENT(KCPU_QUEUE_CREATE); +DEFINE_MALI_CSF_KCPU_EVENT(KCPU_QUEUE_DELETE); +DEFINE_MALI_CSF_KCPU_EVENT(KCPU_CQS_SET); +DEFINE_MALI_CSF_KCPU_EVENT(KCPU_CQS_WAIT_START); +DEFINE_MALI_CSF_KCPU_EVENT(KCPU_CQS_WAIT_END); +DEFINE_MALI_CSF_KCPU_EVENT(KCPU_FENCE_SIGNAL); +DEFINE_MALI_CSF_KCPU_EVENT(KCPU_FENCE_WAIT_START); +DEFINE_MALI_CSF_KCPU_EVENT(KCPU_FENCE_WAIT_END); #undef DEFINE_MALI_CSF_KCPU_EVENT diff --git a/mali_kbase/debug/mali_kbase_debug_ktrace.c b/mali_kbase/debug/mali_kbase_debug_ktrace.c index 9bf8610..3cbd2da 100644 --- a/mali_kbase/debug/mali_kbase_debug_ktrace.c +++ b/mali_kbase/debug/mali_kbase_debug_ktrace.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2020-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -27,13 +27,13 @@ int kbase_ktrace_init(struct kbase_device *kbdev) #if KBASE_KTRACE_TARGET_RBUF struct kbase_ktrace_msg *rbuf; + spin_lock_init(&kbdev->ktrace.lock); rbuf = kmalloc_array(KBASE_KTRACE_SIZE, sizeof(*rbuf), GFP_KERNEL); if (!rbuf) return -EINVAL; kbdev->ktrace.rbuf = rbuf; - spin_lock_init(&kbdev->ktrace.lock); #endif /* KBASE_KTRACE_TARGET_RBUF */ return 0; } @@ -42,6 +42,7 @@ void kbase_ktrace_term(struct kbase_device *kbdev) { #if KBASE_KTRACE_TARGET_RBUF kfree(kbdev->ktrace.rbuf); + kbdev->ktrace.rbuf = NULL; #endif /* KBASE_KTRACE_TARGET_RBUF */ } @@ -131,7 +132,7 @@ static void kbasep_ktrace_dump_msg(struct kbase_device *kbdev, lockdep_assert_held(&kbdev->ktrace.lock); kbasep_ktrace_format_msg(trace_msg, buffer, sizeof(buffer)); - dev_dbg(kbdev->dev, "%s", buffer); + dev_err(kbdev->dev, "%s", buffer); } struct kbase_ktrace_msg *kbasep_ktrace_reserve(struct kbase_ktrace *ktrace) @@ -183,6 +184,9 @@ void kbasep_ktrace_add(struct kbase_device *kbdev, enum kbase_ktrace_code code, unsigned long irqflags; struct kbase_ktrace_msg *trace_msg; + if (unlikely(!kbasep_ktrace_initialized(&kbdev->ktrace))) + return; + WARN_ON((flags & ~KBASE_KTRACE_FLAG_COMMON_ALL)); spin_lock_irqsave(&kbdev->ktrace.lock, irqflags); @@ -212,34 +216,61 @@ void kbasep_ktrace_clear(struct kbase_device *kbdev) spin_unlock_irqrestore(&kbdev->ktrace.lock, flags); } +static inline u32 ktrace_buffer_distance(u32 start, u32 end) { + if (end == start) + return 0; + if (end > start) + return end - start; + return KBASE_KTRACE_SIZE; +} + void kbasep_ktrace_dump(struct kbase_device *kbdev) { unsigned long flags; u32 start; u32 end; + u32 i = 0; + u32 distance = 0; char buffer[KTRACE_DUMP_MESSAGE_SIZE] = "Dumping trace:\n"; kbasep_ktrace_format_header(buffer, sizeof(buffer), strlen(buffer)); - dev_dbg(kbdev->dev, "%s", buffer); + dev_err(kbdev->dev, "%s", buffer); spin_lock_irqsave(&kbdev->ktrace.lock, flags); start = kbdev->ktrace.first_out; end = kbdev->ktrace.next_in; - - while (start != end) { - struct kbase_ktrace_msg *trace_msg = &kbdev->ktrace.rbuf[start]; - + distance = ktrace_buffer_distance(start, end); + for (i = 0; i < distance; ++i) { + struct 
kbase_ktrace_msg *trace_msg = &kbdev->ktrace.rbuf[end]; kbasep_ktrace_dump_msg(kbdev, trace_msg); - start = (start + 1) & KBASE_KTRACE_MASK; + end = (end + 1) & KBASE_KTRACE_MASK; } - dev_dbg(kbdev->dev, "TRACE_END"); + dev_err(kbdev->dev, "TRACE_END: (%i entries)", i); kbasep_ktrace_clear_locked(kbdev); spin_unlock_irqrestore(&kbdev->ktrace.lock, flags); } +u32 kbasep_ktrace_copy(struct kbase_device* kbdev, struct kbase_ktrace_msg* msgs, u32 num_msgs) +{ + u32 start = kbdev->ktrace.first_out; + u32 end = kbdev->ktrace.next_in; + u32 i = 0; + u32 distance = min(ktrace_buffer_distance(start, end), num_msgs); + + lockdep_assert_held(&kbdev->ktrace.lock); + + for (i = 0; i < distance; ++i) { + struct kbase_ktrace_msg *trace_msg = &kbdev->ktrace.rbuf[end]; + memcpy(&msgs[i], trace_msg, sizeof(struct kbase_ktrace_msg)); + end = (end + 1) & KBASE_KTRACE_MASK; + } + + return i; +} + #if IS_ENABLED(CONFIG_DEBUG_FS) struct trace_seq_state { struct kbase_ktrace_msg trace_buf[KBASE_KTRACE_SIZE]; diff --git a/mali_kbase/debug/mali_kbase_debug_ktrace.h b/mali_kbase/debug/mali_kbase_debug_ktrace.h index f1e6d3d..7c988f4 100644 --- a/mali_kbase/debug/mali_kbase_debug_ktrace.h +++ b/mali_kbase/debug/mali_kbase_debug_ktrace.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2020-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -82,6 +82,18 @@ void kbase_ktrace_debugfs_init(struct kbase_device *kbdev); */ #if KBASE_KTRACE_TARGET_RBUF /** + * kbasep_ktrace_initialized - Check whether kbase ktrace is initialized + * + * @ktrace: ktrace of kbase device. + * + * Return: true if ktrace has been initialized. + */ +static inline bool kbasep_ktrace_initialized(struct kbase_ktrace *ktrace) +{ + return ktrace->rbuf != NULL; +} + +/** * kbasep_ktrace_add - internal function to add trace to the ringbuffer. * @kbdev: kbase device * @code: ktrace code @@ -111,6 +123,18 @@ void kbasep_ktrace_clear(struct kbase_device *kbdev); */ void kbasep_ktrace_dump(struct kbase_device *kbdev); +/** + * kbasep_ktrace_copy - copy ktrace buffer. + * Elements in the buffer will be ordered from earliest to latest. + * Precondition: ktrace lock must be held. + * + * @kbdev: kbase device + * @msgs: a region of memory of size data_size that the ktrace buffer will be copied to + * @num_msgs: the size of data. + * Return: The number of elements copied. + */ + u32 kbasep_ktrace_copy(struct kbase_device* kbdev, struct kbase_ktrace_msg* msgs, u32 num_msgs); + #define KBASE_KTRACE_RBUF_ADD(kbdev, code, kctx, info_val) \ kbasep_ktrace_add(kbdev, KBASE_KTRACE_CODE(code), kctx, 0, \ info_val) \ diff --git a/mali_kbase/debug/mali_kbase_debug_ktrace_codes.h b/mali_kbase/debug/mali_kbase_debug_ktrace_codes.h index 1c6b4cd..e2a1e8c 100644 --- a/mali_kbase/debug/mali_kbase_debug_ktrace_codes.h +++ b/mali_kbase/debug/mali_kbase_debug_ktrace_codes.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2011-2015, 2018-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2011-2015, 2018-2022 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -142,6 +142,11 @@ int dummy_array[] = { KBASE_KTRACE_CODE_MAKE_CODE(PM_RUNTIME_SUSPEND_CALLBACK), KBASE_KTRACE_CODE_MAKE_CODE(PM_RUNTIME_RESUME_CALLBACK), + /* info_val = l2 state */ +#define KBASEP_L2_STATE(n) KBASE_KTRACE_CODE_MAKE_CODE(PM_L2_ ## n), +#include "backend/gpu/mali_kbase_pm_l2_states.h" +#undef KBASEP_L2_STATE + /* * Context Scheduler events */ @@ -157,6 +162,10 @@ int dummy_array[] = { KBASE_KTRACE_CODE_MAKE_CODE(ARB_VM_STATE), KBASE_KTRACE_CODE_MAKE_CODE(ARB_VM_EVT), #endif +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + KBASE_KTRACE_CODE_MAKE_CODE(PM_RAIL_ON), + KBASE_KTRACE_CODE_MAKE_CODE(PM_RAIL_OFF), +#endif #if MALI_USE_CSF #include "debug/backend/mali_kbase_debug_ktrace_codes_csf.h" diff --git a/mali_kbase/debug/mali_kbase_debug_ktrace_defs.h b/mali_kbase/debug/mali_kbase_debug_ktrace_defs.h index 4694b78..8d9e11e 100644 --- a/mali_kbase/debug/mali_kbase_debug_ktrace_defs.h +++ b/mali_kbase/debug/mali_kbase_debug_ktrace_defs.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2020-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -138,8 +138,8 @@ enum kbase_ktrace_code { }; /** - * struct kbase_ktrace - object representing a trace message added to trace - * buffer trace_rbuf in &kbase_device + * struct kbase_ktrace_msg - object representing a trace message added to trace + * buffer trace_rbuf in &kbase_device * @timestamp: CPU timestamp at which the trace message was added. * @thread_id: id of the thread in the context of which trace message was * added. diff --git a/mali_kbase/debug/mali_kbase_debug_linux_ktrace.h b/mali_kbase/debug/mali_kbase_debug_linux_ktrace.h index 5fac763..1b95306 100644 --- a/mali_kbase/debug/mali_kbase_debug_linux_ktrace.h +++ b/mali_kbase/debug/mali_kbase_debug_linux_ktrace.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2014, 2018, 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2014, 2018, 2020-2022 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -98,6 +98,9 @@ DEFINE_MALI_ADD_EVENT(PM_WAKE_WAITERS); DEFINE_MALI_ADD_EVENT(PM_POWEROFF_WAIT_WQ); DEFINE_MALI_ADD_EVENT(PM_RUNTIME_SUSPEND_CALLBACK); DEFINE_MALI_ADD_EVENT(PM_RUNTIME_RESUME_CALLBACK); +#define KBASEP_L2_STATE(n) DEFINE_MALI_ADD_EVENT(PM_L2_ ## n); +#include "backend/gpu/mali_kbase_pm_l2_states.h" +#undef KBASEP_L2_STATE DEFINE_MALI_ADD_EVENT(SCHED_RETAIN_CTX_NOLOCK); DEFINE_MALI_ADD_EVENT(SCHED_RELEASE_CTX); #ifdef CONFIG_MALI_ARBITER_SUPPORT @@ -107,6 +110,11 @@ DEFINE_MALI_ADD_EVENT(ARB_VM_STATE); DEFINE_MALI_ADD_EVENT(ARB_VM_EVT); #endif +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS +DEFINE_MALI_ADD_EVENT(PM_RAIL_ON); +DEFINE_MALI_ADD_EVENT(PM_RAIL_OFF); +#endif + #if MALI_USE_CSF #include "backend/mali_kbase_debug_linux_ktrace_csf.h" #else diff --git a/mali_kbase/device/backend/mali_kbase_device_csf.c b/mali_kbase/device/backend/mali_kbase_device_csf.c index 5325658..571761f 100644 --- a/mali_kbase/device/backend/mali_kbase_device_csf.c +++ b/mali_kbase/device/backend/mali_kbase_device_csf.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2019-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -23,26 +23,27 @@ #include <device/mali_kbase_device.h> #include <mali_kbase_hwaccess_backend.h> -#include <mali_kbase_hwcnt_backend_csf_if_fw.h> -#include <mali_kbase_hwcnt_watchdog_if_timer.h> +#include <hwcnt/backend/mali_kbase_hwcnt_backend_csf_if_fw.h> +#include <hwcnt/mali_kbase_hwcnt_watchdog_if_timer.h> #include <mali_kbase_ctx_sched.h> #include <mali_kbase_reset_gpu.h> #include <csf/mali_kbase_csf.h> #include <csf/ipa_control/mali_kbase_csf_ipa_control.h> - -#if IS_ENABLED(CONFIG_MALI_NO_MALI) #include <backend/gpu/mali_kbase_model_linux.h> -#endif #include <mali_kbase.h> #include <backend/gpu/mali_kbase_irq_internal.h> #include <backend/gpu/mali_kbase_pm_internal.h> -#include <backend/gpu/mali_kbase_js_internal.h> #include <backend/gpu/mali_kbase_clk_rate_trace_mgr.h> #include <csf/mali_kbase_csf_csg_debugfs.h> -#include <mali_kbase_hwcnt_virtualizer.h> +#include <csf/mali_kbase_csf_kcpu_fence_debugfs.h> +#include <hwcnt/mali_kbase_hwcnt_virtualizer.h> #include <mali_kbase_kinstr_prfcnt.h> #include <mali_kbase_vinstr.h> +#include <tl/mali_kbase_timeline.h> +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) +#include <mali_kbase_gpu_metrics.h> +#endif /** * kbase_device_firmware_hwcnt_term - Terminate CSF firmware and HWC @@ -60,7 +61,7 @@ static void kbase_device_firmware_hwcnt_term(struct kbase_device *kbdev) kbase_vinstr_term(kbdev->vinstr_ctx); kbase_hwcnt_virtualizer_term(kbdev->hwcnt_gpu_virt); kbase_hwcnt_backend_csf_metadata_term(&kbdev->hwcnt_gpu_iface); - kbase_csf_firmware_term(kbdev); + kbase_csf_firmware_unload_term(kbdev); } } @@ -86,18 +87,14 @@ static int kbase_backend_late_init(struct kbase_device *kbdev) if (err) goto fail_pm_powerup; - err = kbase_backend_timer_init(kbdev); - if (err) - goto fail_timer; - #ifdef CONFIG_MALI_DEBUG -#ifndef CONFIG_MALI_NO_MALI +#if IS_ENABLED(CONFIG_MALI_REAL_HW) if (kbasep_common_test_interrupt_handlers(kbdev) != 0) { dev_err(kbdev->dev, "Interrupt assignment check failed.\n"); err = -EINVAL; goto 
fail_interrupt_test; } -#endif /* !CONFIG_MALI_NO_MALI */ +#endif /* IS_ENABLED(CONFIG_MALI_REAL_HW) */ #endif /* CONFIG_MALI_DEBUG */ kbase_ipa_control_init(kbdev); @@ -141,13 +138,11 @@ fail_pm_metrics_init: kbase_ipa_control_term(kbdev); #ifdef CONFIG_MALI_DEBUG -#ifndef CONFIG_MALI_NO_MALI +#if IS_ENABLED(CONFIG_MALI_REAL_HW) fail_interrupt_test: -#endif /* !CONFIG_MALI_NO_MALI */ +#endif /* IS_ENABLED(CONFIG_MALI_REAL_HW) */ #endif /* CONFIG_MALI_DEBUG */ - kbase_backend_timer_term(kbdev); -fail_timer: kbase_pm_context_idle(kbdev); kbase_hwaccess_pm_halt(kbdev); fail_pm_powerup: @@ -191,12 +186,26 @@ static int kbase_csf_early_init(struct kbase_device *kbdev) } /** - * kbase_csf_early_init - Early termination for firmware & scheduler. + * kbase_csf_early_term() - Early termination for firmware & scheduler. * @kbdev: Device pointer */ static void kbase_csf_early_term(struct kbase_device *kbdev) { kbase_csf_scheduler_early_term(kbdev); + kbase_csf_firmware_early_term(kbdev); +} + +/** + * kbase_csf_late_init - late initialization for firmware. + * @kbdev: Device pointer + * + * Return: 0 on success, error code otherwise. + */ +static int kbase_csf_late_init(struct kbase_device *kbdev) +{ + int err = kbase_csf_firmware_late_init(kbdev); + + return err; } /** @@ -268,60 +277,55 @@ static void kbase_device_hwcnt_backend_csf_term(struct kbase_device *kbdev) } static const struct kbase_device_init dev_init[] = { -#if IS_ENABLED(CONFIG_MALI_NO_MALI) - { kbase_gpu_device_create, kbase_gpu_device_destroy, - "Dummy model initialization failed" }, -#else +#if !IS_ENABLED(CONFIG_MALI_REAL_HW) + { kbase_gpu_device_create, kbase_gpu_device_destroy, "Dummy model initialization failed" }, +#else /* !IS_ENABLED(CONFIG_MALI_REAL_HW) */ { assign_irqs, NULL, "IRQ search failed" }, +#endif /* !IS_ENABLED(CONFIG_MALI_REAL_HW) */ +#if !IS_ENABLED(CONFIG_MALI_NO_MALI) { registers_map, registers_unmap, "Register map failed" }, -#endif - { power_control_init, power_control_term, - "Power control initialization failed" }, +#endif /* !IS_ENABLED(CONFIG_MALI_NO_MALI) */ +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) + { kbase_gpu_metrics_init, kbase_gpu_metrics_term, "GPU metrics initialization failed" }, +#endif /* IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) */ + { power_control_init, power_control_term, "Power control initialization failed" }, { kbase_device_io_history_init, kbase_device_io_history_term, "Register access history initialization failed" }, - { kbase_device_early_init, kbase_device_early_term, - "Early device initialization failed" }, - { kbase_device_populate_max_freq, NULL, - "Populating max frequency failed" }, + { kbase_device_early_init, kbase_device_early_term, "Early device initialization failed" }, + { kbase_backend_time_init, NULL, "Time backend initialization failed" }, { kbase_device_misc_init, kbase_device_misc_term, "Miscellaneous device initialization failed" }, { kbase_device_pcm_dev_init, kbase_device_pcm_dev_term, "Priority control manager initialization failed" }, - { kbase_ctx_sched_init, kbase_ctx_sched_term, - "Context scheduler initialization failed" }, - { kbase_mem_init, kbase_mem_term, - "Memory subsystem initialization failed" }, + { kbase_ctx_sched_init, kbase_ctx_sched_term, "Context scheduler initialization failed" }, + { kbase_mem_init, kbase_mem_term, "Memory subsystem initialization failed" }, { kbase_csf_protected_memory_init, kbase_csf_protected_memory_term, "Protected memory allocator initialization failed" }, { kbase_device_coherency_init, NULL, 
"Device coherency init failed" }, { kbase_protected_mode_init, kbase_protected_mode_term, "Protected mode subsystem initialization failed" }, - { kbase_device_list_init, kbase_device_list_term, - "Device list setup failed" }, + { kbase_device_list_init, kbase_device_list_term, "Device list setup failed" }, { kbase_device_timeline_init, kbase_device_timeline_term, "Timeline stream initialization failed" }, { kbase_clk_rate_trace_manager_init, kbase_clk_rate_trace_manager_term, "Clock rate trace manager initialization failed" }, - { kbase_lowest_gpu_freq_init, NULL, - "Lowest freq initialization failed" }, - { kbase_device_hwcnt_watchdog_if_init, - kbase_device_hwcnt_watchdog_if_term, + { kbase_device_hwcnt_watchdog_if_init, kbase_device_hwcnt_watchdog_if_term, "GPU hwcnt backend watchdog interface creation failed" }, - { kbase_device_hwcnt_backend_csf_if_init, - kbase_device_hwcnt_backend_csf_if_term, + { kbase_device_hwcnt_backend_csf_if_init, kbase_device_hwcnt_backend_csf_if_term, "GPU hwcnt backend CSF interface creation failed" }, - { kbase_device_hwcnt_backend_csf_init, - kbase_device_hwcnt_backend_csf_term, + { kbase_device_hwcnt_backend_csf_init, kbase_device_hwcnt_backend_csf_term, "GPU hwcnt backend creation failed" }, { kbase_device_hwcnt_context_init, kbase_device_hwcnt_context_term, "GPU hwcnt context initialization failed" }, - { kbase_backend_late_init, kbase_backend_late_term, - "Late backend initialization failed" }, - { kbase_csf_early_init, kbase_csf_early_term, - "Early CSF initialization failed" }, + { kbase_csf_early_init, kbase_csf_early_term, "Early CSF initialization failed" }, + { kbase_backend_late_init, kbase_backend_late_term, "Late backend initialization failed" }, + { kbase_csf_late_init, NULL, "Late CSF initialization failed" }, { NULL, kbase_device_firmware_hwcnt_term, NULL }, - { kbase_device_debugfs_init, kbase_device_debugfs_term, - "DebugFS initialization failed" }, + { kbase_debug_csf_fault_init, kbase_debug_csf_fault_term, + "CSF fault debug initialization failed" }, + { kbase_device_debugfs_init, kbase_device_debugfs_term, "DebugFS initialization failed" }, + { kbase_csf_fence_timer_debugfs_init, kbase_csf_fence_timer_debugfs_term, + "Fence timeout DebugFS initialization failed" }, /* Sysfs init needs to happen before registering the device with * misc_register(), otherwise it causes a race condition between * registering the device and a uevent event being generated for @@ -339,8 +343,11 @@ static const struct kbase_device_init dev_init[] = { "Misc device registration failed" }, { kbase_gpuprops_populate_user_buffer, kbase_gpuprops_free_user_buffer, "GPU property population failed" }, - { kbase_device_late_init, kbase_device_late_term, - "Late device initialization failed" }, + { kbase_device_late_init, kbase_device_late_term, "Late device initialization failed" }, +#if IS_ENABLED(CONFIG_MALI_CORESIGHT) + { kbase_debug_coresight_csf_init, kbase_debug_coresight_csf_term, + "Coresight initialization failed" }, +#endif /* IS_ENABLED(CONFIG_MALI_CORESIGHT) */ }; static void kbase_device_term_partial(struct kbase_device *kbdev, @@ -354,7 +361,6 @@ static void kbase_device_term_partial(struct kbase_device *kbdev, void kbase_device_term(struct kbase_device *kbdev) { - kbdev->csf.mali_file_inode = NULL; kbase_device_term_partial(kbdev, ARRAY_SIZE(dev_init)); kbase_mem_halt(kbdev); } @@ -468,7 +474,7 @@ static int kbase_csf_firmware_deferred_init(struct kbase_device *kbdev) lockdep_assert_held(&kbdev->fw_load_lock); - err = kbase_csf_firmware_init(kbdev); + 
err = kbase_csf_firmware_load_init(kbdev); if (!err) { unsigned long flags; @@ -498,11 +504,12 @@ int kbase_device_firmware_init_once(struct kbase_device *kbdev) ret = kbase_device_hwcnt_csf_deferred_init(kbdev); if (ret) { - kbase_csf_firmware_term(kbdev); + kbase_csf_firmware_unload_term(kbdev); goto out; } kbase_csf_debugfs_init(kbdev); + kbase_timeline_io_debugfs_init(kbdev); out: kbase_pm_context_idle(kbdev); } @@ -511,4 +518,4 @@ out: return ret; } -KBASE_EXPORT_TEST_API(kbase_device_firmware_init_once); +KBASE_EXPORT_TEST_API(kbase_device_firmware_init_once);
\ No newline at end of file diff --git a/mali_kbase/device/backend/mali_kbase_device_hw_csf.c b/mali_kbase/device/backend/mali_kbase_device_hw_csf.c index e2228ca..c837f5a 100644 --- a/mali_kbase/device/backend/mali_kbase_device_hw_csf.c +++ b/mali_kbase/device/backend/mali_kbase_device_hw_csf.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2020-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -24,6 +24,7 @@ #include <backend/gpu/mali_kbase_instr_internal.h> #include <backend/gpu/mali_kbase_pm_internal.h> #include <device/mali_kbase_device.h> +#include <device/mali_kbase_device_internal.h> #include <mali_kbase_reset_gpu.h> #include <mmu/mali_kbase_mmu.h> #include <mali_kbase_ctx_sched.h> @@ -57,7 +58,7 @@ static void kbase_gpu_fault_interrupt(struct kbase_device *kbdev) { const u32 status = kbase_reg_read(kbdev, GPU_CONTROL_REG(GPU_FAULTSTATUS)); - const bool as_valid = status & GPU_FAULTSTATUS_JASID_VALID_FLAG; + const bool as_valid = status & GPU_FAULTSTATUS_JASID_VALID_MASK; const u32 as_nr = (status & GPU_FAULTSTATUS_JASID_MASK) >> GPU_FAULTSTATUS_JASID_SHIFT; bool bus_fault = (status & GPU_FAULTSTATUS_EXCEPTION_TYPE_MASK) == @@ -83,6 +84,37 @@ static void kbase_gpu_fault_interrupt(struct kbase_device *kbdev) } +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS +/* When the GLB_PWROFF_TIMER expires, FW will write the SHADER_PWROFF register, this sequence + * follows: + * - SHADER_PWRTRANS goes high + * - SHADER_READY goes low + * - Iterator is told not to send any more work to the core + * - Wait for the core to drain + * - SHADER_PWRACTIVE goes low + * - Do an IPA sample + * - Flush the core + * - Apply functional isolation + * - Turn the clock off + * - Put the core in reset + * - Apply electrical isolation + * - Power off the core + * - SHADER_PWRTRANS goes low + * + * It's therefore safe to turn off the SC rail when: + * - SHADER_READY == 0, this means the SC's last transitioned to OFF + * - SHADER_PWRTRANS == 0, this means the SC's have finished transitioning + */ +static bool safe_to_turn_off_sc_rail(struct kbase_device *kbdev) +{ + lockdep_assert_held(&kbdev->hwaccess_lock); + return (kbase_reg_read(kbdev, GPU_CONTROL_REG(SHADER_READY_HI)) || + kbase_reg_read(kbdev, GPU_CONTROL_REG(SHADER_READY_LO)) || + kbase_reg_read(kbdev, GPU_CONTROL_REG(SHADER_PWRTRANS_HI)) || + kbase_reg_read(kbdev, GPU_CONTROL_REG(SHADER_PWRTRANS_LO))) == 0; +} +#endif /* CONFIG_MALI_HOST_CONTROLS_SC_RAILS */ + void kbase_gpu_interrupt(struct kbase_device *kbdev, u32 val) { KBASE_KTRACE_ADD(kbdev, CORE_GPU_IRQ, NULL, val); @@ -115,6 +147,9 @@ void kbase_gpu_interrupt(struct kbase_device *kbdev, u32 val) GPU_EXCEPTION_TYPE_SW_FAULT_0, } } }; + kbase_debug_csf_fault_notify(kbdev, scheduler->active_protm_grp->kctx, + DF_GPU_PROTECTED_FAULT); + scheduler->active_protm_grp->faulted = true; kbase_csf_add_group_fatal_error( scheduler->active_protm_grp, &err_payload); @@ -146,7 +181,6 @@ void kbase_gpu_interrupt(struct kbase_device *kbdev, u32 val) dev_dbg(kbdev->dev, "Doorbell mirror interrupt received"); spin_lock_irqsave(&kbdev->hwaccess_lock, flags); - WARN_ON(!kbase_csf_scheduler_get_nr_active_csgs(kbdev)); kbase_pm_disable_db_mirror_interrupt(kbdev); kbdev->pm.backend.exit_gpu_sleep_mode = true; kbase_csf_scheduler_invoke_tick(kbdev); @@ -166,6 +200,16 @@ 
void kbase_gpu_interrupt(struct kbase_device *kbdev, u32 val) if (val & CLEAN_CACHES_COMPLETED) kbase_clean_caches_done(kbdev); +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + if (val & POWER_CHANGED_ALL) { + unsigned long flags; + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); + kbdev->pm.backend.sc_pwroff_safe = safe_to_turn_off_sc_rail(kbdev); + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + } +#endif + + if (val & (POWER_CHANGED_ALL | MCU_STATUS_GPU_IRQ)) { kbase_pm_power_changed(kbdev); } else if (val & CLEAN_CACHES_COMPLETED) { @@ -184,7 +228,7 @@ void kbase_gpu_interrupt(struct kbase_device *kbdev, u32 val) } #if !IS_ENABLED(CONFIG_MALI_NO_MALI) -static bool kbase_is_register_accessible(u32 offset) +bool kbase_is_register_accessible(u32 offset) { #ifdef CONFIG_MALI_DEBUG if (((offset >= MCU_SUBSYSTEM_BASE) && (offset < IPA_CONTROL_BASE)) || @@ -196,11 +240,16 @@ static bool kbase_is_register_accessible(u32 offset) return true; } +#endif /* !IS_ENABLED(CONFIG_MALI_NO_MALI) */ +#if IS_ENABLED(CONFIG_MALI_REAL_HW) void kbase_reg_write(struct kbase_device *kbdev, u32 offset, u32 value) { - KBASE_DEBUG_ASSERT(kbdev->pm.backend.gpu_powered); - KBASE_DEBUG_ASSERT(kbdev->dev != NULL); + if (WARN_ON(!kbdev->pm.backend.gpu_powered)) + return; + + if (WARN_ON(kbdev->dev == NULL)) + return; if (!kbase_is_register_accessible(offset)) return; @@ -220,8 +269,11 @@ u32 kbase_reg_read(struct kbase_device *kbdev, u32 offset) { u32 val; - KBASE_DEBUG_ASSERT(kbdev->pm.backend.gpu_powered); - KBASE_DEBUG_ASSERT(kbdev->dev != NULL); + if (WARN_ON(!kbdev->pm.backend.gpu_powered)) + return 0; + + if (WARN_ON(kbdev->dev == NULL)) + return 0; if (!kbase_is_register_accessible(offset)) return 0; @@ -238,4 +290,4 @@ u32 kbase_reg_read(struct kbase_device *kbdev, u32 offset) return val; } KBASE_EXPORT_TEST_API(kbase_reg_read); -#endif /* !IS_ENABLED(CONFIG_MALI_NO_MALI) */ +#endif /* IS_ENABLED(CONFIG_MALI_REAL_HW) */ diff --git a/mali_kbase/device/backend/mali_kbase_device_hw_jm.c b/mali_kbase/device/backend/mali_kbase_device_hw_jm.c index ff57cf6..8f7b39b 100644 --- a/mali_kbase/device/backend/mali_kbase_device_hw_jm.c +++ b/mali_kbase/device/backend/mali_kbase_device_hw_jm.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2020-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -63,9 +63,6 @@ void kbase_gpu_interrupt(struct kbase_device *kbdev, u32 val) if (val & RESET_COMPLETED) kbase_pm_reset_done(kbdev); - if (val & PRFCNT_SAMPLE_COMPLETED) - kbase_instr_hwcnt_sample_done(kbdev); - /* Defer clearing CLEAN_CACHES_COMPLETED to kbase_clean_caches_done. * We need to acquire hwaccess_lock to avoid a race condition with * kbase_gpu_cache_flush_and_busy_wait @@ -73,6 +70,13 @@ void kbase_gpu_interrupt(struct kbase_device *kbdev, u32 val) KBASE_KTRACE_ADD(kbdev, CORE_GPU_IRQ_CLEAR, NULL, val & ~CLEAN_CACHES_COMPLETED); kbase_reg_write(kbdev, GPU_CONTROL_REG(GPU_IRQ_CLEAR), val & ~CLEAN_CACHES_COMPLETED); + /* kbase_instr_hwcnt_sample_done frees the HWCNT pipeline to request another + * sample. Therefore this must be called after clearing the IRQ to avoid a + * race between clearing and the next sample raising the IRQ again. 
+ */ + if (val & PRFCNT_SAMPLE_COMPLETED) + kbase_instr_hwcnt_sample_done(kbdev); + /* kbase_pm_check_transitions (called by kbase_pm_power_changed) must * be called after the IRQ has been cleared. This is because it might * trigger further power transitions and we don't want to miss the @@ -102,11 +106,10 @@ void kbase_gpu_interrupt(struct kbase_device *kbdev, u32 val) KBASE_KTRACE_ADD(kbdev, CORE_GPU_IRQ_DONE, NULL, val); } -#if !IS_ENABLED(CONFIG_MALI_NO_MALI) +#if IS_ENABLED(CONFIG_MALI_REAL_HW) void kbase_reg_write(struct kbase_device *kbdev, u32 offset, u32 value) { - KBASE_DEBUG_ASSERT(kbdev->pm.backend.gpu_powered); - KBASE_DEBUG_ASSERT(kbdev->dev != NULL); + WARN_ON(!kbdev->pm.backend.gpu_powered); writel(value, kbdev->reg + offset); @@ -121,10 +124,10 @@ KBASE_EXPORT_TEST_API(kbase_reg_write); u32 kbase_reg_read(struct kbase_device *kbdev, u32 offset) { - u32 val; + u32 val = 0; - KBASE_DEBUG_ASSERT(kbdev->pm.backend.gpu_powered); - KBASE_DEBUG_ASSERT(kbdev->dev != NULL); + if (WARN_ON(!kbdev->pm.backend.gpu_powered)) + return val; val = readl(kbdev->reg + offset); @@ -138,4 +141,4 @@ u32 kbase_reg_read(struct kbase_device *kbdev, u32 offset) return val; } KBASE_EXPORT_TEST_API(kbase_reg_read); -#endif /* !IS_ENABLED(CONFIG_MALI_NO_MALI) */ +#endif /* IS_ENABLED(CONFIG_MALI_REAL_HW) */ diff --git a/mali_kbase/device/backend/mali_kbase_device_jm.c b/mali_kbase/device/backend/mali_kbase_device_jm.c index ab75bc6..0ce2bc8 100644 --- a/mali_kbase/device/backend/mali_kbase_device_jm.c +++ b/mali_kbase/device/backend/mali_kbase_device_jm.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -29,13 +29,10 @@ #include <mali_kbase_hwaccess_backend.h> #include <mali_kbase_ctx_sched.h> #include <mali_kbase_reset_gpu.h> -#include <mali_kbase_hwcnt_watchdog_if_timer.h> -#include <mali_kbase_hwcnt_backend_jm.h> -#include <mali_kbase_hwcnt_backend_jm_watchdog.h> - -#if IS_ENABLED(CONFIG_MALI_NO_MALI) +#include <hwcnt/mali_kbase_hwcnt_watchdog_if_timer.h> +#include <hwcnt/backend/mali_kbase_hwcnt_backend_jm.h> +#include <hwcnt/backend/mali_kbase_hwcnt_backend_jm_watchdog.h> #include <backend/gpu/mali_kbase_model_linux.h> -#endif /* CONFIG_MALI_NO_MALI */ #ifdef CONFIG_MALI_ARBITER_SUPPORT #include <arbiter/mali_kbase_arbiter_pm.h> @@ -48,6 +45,9 @@ #include <backend/gpu/mali_kbase_pm_internal.h> #include <mali_kbase_dummy_job_wa.h> #include <backend/gpu/mali_kbase_clk_rate_trace_mgr.h> +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) +#include <mali_kbase_gpu_metrics.h> +#endif /** * kbase_backend_late_init - Perform any backend-specific initialization. 
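
Both register-access hunks above (mali_kbase_device_hw_csf.c and mali_kbase_device_hw_jm.c) replace the hard KBASE_DEBUG_ASSERT checks with WARN_ON() guards that fail soft: a write is skipped and a read returns 0 when the GPU is not powered, instead of halting the kernel. The stand-alone C sketch below illustrates only that guard pattern; the struct, the simplified WARN_ON macro and the fake_* helper names are illustrative stand-ins, not the driver's real symbols.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    struct fake_dev {
            bool gpu_powered;
            uint32_t regs[64];
    };

    /* Simplified stand-in for the kernel's WARN_ON(): report the condition
     * once to stderr and evaluate to it, so it can gate an early return.
     */
    #define WARN_ON(cond) \
            ({ bool __c = (cond); if (__c) fprintf(stderr, "WARN: %s\n", #cond); __c; })

    static void fake_reg_write(struct fake_dev *dev, unsigned int offset, uint32_t value)
    {
            /* Fail soft: skip the access instead of asserting. */
            if (WARN_ON(!dev->gpu_powered))
                    return;
            dev->regs[offset] = value;
    }

    static uint32_t fake_reg_read(struct fake_dev *dev, unsigned int offset)
    {
            /* Fail soft: a read while powered off returns 0. */
            if (WARN_ON(!dev->gpu_powered))
                    return 0;
            return dev->regs[offset];
    }

    int main(void)
    {
            struct fake_dev dev = { .gpu_powered = false };

            fake_reg_write(&dev, 3, 0xabcd);                            /* skipped, warns   */
            printf("read = %u\n", (unsigned)fake_reg_read(&dev, 3));    /* prints 0, warns  */

            dev.gpu_powered = true;
            fake_reg_write(&dev, 3, 0xabcd);
            printf("read = 0x%x\n", (unsigned)fake_reg_read(&dev, 3)); /* prints 0xabcd    */
            return 0;
    }

The same shape is used by the patched kbase_reg_read()/kbase_reg_write(): warn, bail out early, and let the caller proceed with a benign default value.
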
@@ -76,13 +76,13 @@ static int kbase_backend_late_init(struct kbase_device *kbdev) goto fail_timer; #ifdef CONFIG_MALI_DEBUG -#ifndef CONFIG_MALI_NO_MALI +#if IS_ENABLED(CONFIG_MALI_REAL_HW) if (kbasep_common_test_interrupt_handlers(kbdev) != 0) { dev_err(kbdev->dev, "Interrupt assignment check failed.\n"); err = -EINVAL; goto fail_interrupt_test; } -#endif /* !CONFIG_MALI_NO_MALI */ +#endif /* IS_ENABLED(CONFIG_MALI_REAL_HW) */ #endif /* CONFIG_MALI_DEBUG */ err = kbase_job_slot_init(kbdev); @@ -121,9 +121,9 @@ fail_devfreq_init: fail_job_slot: #ifdef CONFIG_MALI_DEBUG -#ifndef CONFIG_MALI_NO_MALI +#if IS_ENABLED(CONFIG_MALI_REAL_HW) fail_interrupt_test: -#endif /* !CONFIG_MALI_NO_MALI */ +#endif /* IS_ENABLED(CONFIG_MALI_REAL_HW) */ #endif /* CONFIG_MALI_DEBUG */ kbase_backend_timer_term(kbdev); @@ -215,17 +215,22 @@ static void kbase_device_hwcnt_backend_jm_watchdog_term(struct kbase_device *kbd } static const struct kbase_device_init dev_init[] = { -#if IS_ENABLED(CONFIG_MALI_NO_MALI) +#if !IS_ENABLED(CONFIG_MALI_REAL_HW) { kbase_gpu_device_create, kbase_gpu_device_destroy, "Dummy model initialization failed" }, -#else +#else /* !IS_ENABLED(CONFIG_MALI_REAL_HW) */ { assign_irqs, NULL, "IRQ search failed" }, +#endif /* !IS_ENABLED(CONFIG_MALI_REAL_HW) */ +#if !IS_ENABLED(CONFIG_MALI_NO_MALI) { registers_map, registers_unmap, "Register map failed" }, -#endif +#endif /* !IS_ENABLED(CONFIG_MALI_NO_MALI) */ +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) + { kbase_gpu_metrics_init, kbase_gpu_metrics_term, "GPU metrics initialization failed" }, +#endif /* IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) */ { kbase_device_io_history_init, kbase_device_io_history_term, "Register access history initialization failed" }, { kbase_device_pm_init, kbase_device_pm_term, "Power management initialization failed" }, { kbase_device_early_init, kbase_device_early_term, "Early device initialization failed" }, - { kbase_device_populate_max_freq, NULL, "Populating max frequency failed" }, + { kbase_backend_time_init, NULL, "Time backend initialization failed" }, { kbase_device_misc_init, kbase_device_misc_term, "Miscellaneous device initialization failed" }, { kbase_device_pcm_dev_init, kbase_device_pcm_dev_term, @@ -241,7 +246,6 @@ static const struct kbase_device_init dev_init[] = { "Timeline stream initialization failed" }, { kbase_clk_rate_trace_manager_init, kbase_clk_rate_trace_manager_term, "Clock rate trace manager initialization failed" }, - { kbase_lowest_gpu_freq_init, NULL, "Lowest freq initialization failed" }, { kbase_instr_backend_init, kbase_instr_backend_term, "Instrumentation backend initialization failed" }, { kbase_device_hwcnt_watchdog_if_init, kbase_device_hwcnt_watchdog_if_term, @@ -323,20 +327,21 @@ int kbase_device_init(struct kbase_device *kbdev) } } - kthread_init_worker(&kbdev->job_done_worker); - kbdev->job_done_worker_thread = kbase_create_realtime_thread(kbdev, - kthread_worker_fn, &kbdev->job_done_worker, "mali_jd_thread"); - if (IS_ERR(kbdev->job_done_worker_thread)) - return PTR_ERR(kbdev->job_done_worker_thread); + if (err) + return err; + + err = kbase_kthread_run_worker_rt(kbdev, &kbdev->job_done_worker, "mali_jd_thread"); + if (err) + return err; err = kbase_pm_apc_init(kbdev); if (err) return err; kthread_init_worker(&kbdev->event_worker); - kbdev->event_worker_thread = kthread_run(kthread_worker_fn, - &kbdev->event_worker, "mali_event_thread"); - if (IS_ERR(kbdev->event_worker_thread)) { + kbdev->event_worker.task = + kthread_run(kthread_worker_fn, 
&kbdev->event_worker, "mali_event_thread"); + if (IS_ERR(kbdev->event_worker.task)) { err = -ENOMEM; } @@ -358,4 +363,4 @@ int kbase_device_firmware_init_once(struct kbase_device *kbdev) mutex_unlock(&kbdev->fw_load_lock); return ret; -} +}
\ No newline at end of file diff --git a/mali_kbase/device/mali_kbase_device.c b/mali_kbase/device/mali_kbase_device.c index c123010..e5b3e2b 100644 --- a/mali_kbase/device/mali_kbase_device.c +++ b/mali_kbase/device/mali_kbase_device.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2010-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2010-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -35,6 +35,7 @@ #include <mali_kbase.h> #include <mali_kbase_defs.h> #include <mali_kbase_hwaccess_instr.h> +#include <mali_kbase_hwaccess_time.h> #include <mali_kbase_hw.h> #include <mali_kbase_config_defaults.h> #include <linux/priority_control_manager.h> @@ -42,8 +43,8 @@ #include <tl/mali_kbase_timeline.h> #include "mali_kbase_kinstr_prfcnt.h" #include "mali_kbase_vinstr.h" -#include "mali_kbase_hwcnt_context.h" -#include "mali_kbase_hwcnt_virtualizer.h" +#include "hwcnt/mali_kbase_hwcnt_context.h" +#include "hwcnt/mali_kbase_hwcnt_virtualizer.h" #include "mali_kbase_device.h" #include "mali_kbase_device_internal.h" @@ -56,17 +57,15 @@ #include "arbiter/mali_kbase_arbiter_pm.h" #endif /* CONFIG_MALI_ARBITER_SUPPORT */ -/* NOTE: Magic - 0x45435254 (TRCE in ASCII). - * Supports tracing feature provided in the base module. - * Please keep it in sync with the value of base module. - */ -#define TRACE_BUFFER_HEADER_SPECIAL 0x45435254 +#if defined(CONFIG_DEBUG_FS) && !IS_ENABLED(CONFIG_MALI_NO_MALI) /* Number of register accesses for the buffer that we allocate during * initialization time. The buffer size can be changed later via debugfs. */ #define KBASEP_DEFAULT_REGISTER_HISTORY_SIZE ((u16)512) +#endif /* defined(CONFIG_DEBUG_FS) && !IS_ENABLED(CONFIG_MALI_NO_MALI) */ + static DEFINE_MUTEX(kbase_dev_list_lock); static LIST_HEAD(kbase_dev_list); static int kbase_dev_nr; @@ -187,8 +186,8 @@ static int mali_oom_notifier_handler(struct notifier_block *nb, kbdev_alloc_total = KBASE_PAGES_TO_KIB(atomic_read(&(kbdev->memdev.used_pages))); - dev_err(kbdev->dev, "OOM notifier: dev %s %lu kB\n", kbdev->devname, - kbdev_alloc_total); + dev_info(kbdev->dev, + "System reports low memory, GPU memory usage summary:\n"); mutex_lock(&kbdev->kctx_list_lock); @@ -202,15 +201,18 @@ static int mali_oom_notifier_handler(struct notifier_block *nb, pid_struct = find_get_pid(kctx->pid); task = pid_task(pid_struct, PIDTYPE_PID); - dev_err(kbdev->dev, - "OOM notifier: tsk %s tgid (%u) pid (%u) %lu kB\n", - task ? task->comm : "[null task]", kctx->tgid, - kctx->pid, task_alloc_total); + dev_info(kbdev->dev, + " tsk %s tgid %u pid %u has allocated %lu kB GPU memory\n", + task ? 
task->comm : "[null task]", kctx->tgid, kctx->pid, + task_alloc_total); put_pid(pid_struct); rcu_read_unlock(); } + dev_info(kbdev->dev, "End of summary, device usage is %lu kB\n", + kbdev_alloc_total); + mutex_unlock(&kbdev->kctx_list_lock); return NOTIFY_OK; } @@ -228,11 +230,14 @@ int kbase_device_misc_init(struct kbase_device * const kbdev) kbdev->cci_snoop_enabled = false; np = kbdev->dev->of_node; if (np != NULL) { - if (of_property_read_u32(np, "snoop_enable_smc", - &kbdev->snoop_enable_smc)) + /* Read "-" versions of the properties and fallback to "_" + * if these are not found + */ + if (of_property_read_u32(np, "snoop-enable-smc", &kbdev->snoop_enable_smc) && + of_property_read_u32(np, "snoop_enable_smc", &kbdev->snoop_enable_smc)) kbdev->snoop_enable_smc = 0; - if (of_property_read_u32(np, "snoop_disable_smc", - &kbdev->snoop_disable_smc)) + if (of_property_read_u32(np, "snoop-disable-smc", &kbdev->snoop_disable_smc) && + of_property_read_u32(np, "snoop_disable_smc", &kbdev->snoop_disable_smc)) kbdev->snoop_disable_smc = 0; /* Either both or none of the calls should be provided. */ if (!((kbdev->snoop_disable_smc == 0 @@ -279,9 +284,7 @@ int kbase_device_misc_init(struct kbase_device * const kbdev) goto dma_set_mask_failed; - /* There is no limit for Mali, so set to max. We only do this if dma_parms - * is already allocated by the platform. - */ + /* There is no limit for Mali, so set to max. */ if (kbdev->dev->dma_parms) err = dma_set_max_seg_size(kbdev->dev, UINT_MAX); if (err) @@ -293,12 +296,9 @@ int kbase_device_misc_init(struct kbase_device * const kbdev) if (err) goto dma_set_mask_failed; - err = kbase_ktrace_init(kbdev); - if (err) - goto term_as; err = kbase_pbha_read_dtb(kbdev); if (err) - goto term_ktrace; + goto term_as; init_waitqueue_head(&kbdev->cache_clean_wait); @@ -308,10 +308,15 @@ int kbase_device_misc_init(struct kbase_device * const kbdev) kbdev->pm.dvfs_period = DEFAULT_PM_DVFS_PERIOD; - kbdev->reset_timeout_ms = DEFAULT_RESET_TIMEOUT_MS; +#if MALI_USE_CSF + kbdev->reset_timeout_ms = kbase_get_timeout_ms(kbdev, CSF_GPU_RESET_TIMEOUT); +#else /* MALI_USE_CSF */ + kbdev->reset_timeout_ms = JM_DEFAULT_RESET_TIMEOUT_MS; +#endif /* !MALI_USE_CSF */ kbdev->mmu_mode = kbase_mmu_mode_get_aarch64(); - + kbdev->mmu_or_gpu_cache_op_wait_time_ms = + kbase_get_timeout_ms(kbdev, MMU_AS_INACTIVE_WAIT_TIMEOUT); mutex_init(&kbdev->kctx_list_lock); INIT_LIST_HEAD(&kbdev->kctx_list); @@ -324,10 +329,16 @@ int kbase_device_misc_init(struct kbase_device * const kbdev) "Unable to register OOM notifier for Mali - but will continue\n"); kbdev->oom_notifier_block.notifier_call = NULL; } + +#if MALI_USE_CSF +#if IS_ENABLED(CONFIG_SYNC_FILE) + atomic_set(&kbdev->live_fence_metadata, 0); +#endif /* IS_ENABLED(CONFIG_SYNC_FILE) */ + atomic_set(&kbdev->fence_signal_timeout_enabled, 1); +#endif + return 0; -term_ktrace: - kbase_ktrace_term(kbdev); term_as: kbase_device_all_as_term(kbdev); dma_set_mask_failed: @@ -344,14 +355,16 @@ void kbase_device_misc_term(struct kbase_device *kbdev) #if KBASE_KTRACE_ENABLE kbase_debug_assert_register_hook(NULL, NULL); #endif - - kbase_ktrace_term(kbdev); - kbase_device_all_as_term(kbdev); if (kbdev->oom_notifier_block.notifier_call) unregister_oom_notifier(&kbdev->oom_notifier_block); + +#if MALI_USE_CSF && IS_ENABLED(CONFIG_SYNC_FILE) + if (atomic_read(&kbdev->live_fence_metadata) > 0) + dev_warn(kbdev->dev, "Terminating Kbase device with live fence metadata!"); +#endif } void kbase_device_free(struct kbase_device *kbdev) @@ -361,8 +374,7 @@ void 
kbase_device_free(struct kbase_device *kbdev) void kbase_device_id_init(struct kbase_device *kbdev) { - scnprintf(kbdev->devname, DEVNAME_SIZE, "%s%d", kbase_drv_name, - kbase_dev_nr); + scnprintf(kbdev->devname, DEVNAME_SIZE, "%s%d", KBASE_DRV_NAME, kbase_dev_nr); kbdev->id = kbase_dev_nr; } @@ -484,10 +496,14 @@ int kbase_device_early_init(struct kbase_device *kbdev) { int err; + err = kbase_ktrace_init(kbdev); + if (err) + return err; + err = kbasep_platform_device_init(kbdev); if (err) - return err; + goto ktrace_term; err = kbase_pm_runtime_init(kbdev); if (err) @@ -501,7 +517,12 @@ int kbase_device_early_init(struct kbase_device *kbdev) /* Ensure we can access the GPU registers */ kbase_pm_register_access_enable(kbdev); - /* Find out GPU properties based on the GPU feature registers */ + /* + * Find out GPU properties based on the GPU feature registers. + * Note that this does not populate the few properties that depend on + * hw_features being initialized. Those are set by kbase_gpuprops_set_features + * soon after this in the init process. + */ kbase_gpuprops_set(kbdev); /* We're done accessing the GPU registers for now. */ @@ -524,6 +545,8 @@ fail_interrupts: kbase_pm_runtime_term(kbdev); fail_runtime_pm: kbasep_platform_device_term(kbdev); +ktrace_term: + kbase_ktrace_term(kbdev); return err; } @@ -540,6 +563,7 @@ void kbase_device_early_term(struct kbase_device *kbdev) #endif /* CONFIG_MALI_ARBITER_SUPPORT */ kbase_pm_runtime_term(kbdev); kbasep_platform_device_term(kbdev); + kbase_ktrace_term(kbdev); } int kbase_device_late_init(struct kbase_device *kbdev) diff --git a/mali_kbase/device/mali_kbase_device.h b/mali_kbase/device/mali_kbase_device.h index 5ff970a..e9cb5c2 100644 --- a/mali_kbase/device/mali_kbase_device.h +++ b/mali_kbase/device/mali_kbase_device.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2019-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -39,7 +39,7 @@ const struct list_head *kbase_device_get_list(void); void kbase_device_put_list(const struct list_head *dev_list); /** - * Kbase_increment_device_id - increment device id. + * kbase_increment_device_id - increment device id. * * Used to increment device id on successful initialization of the device. */ @@ -116,6 +116,26 @@ u32 kbase_reg_read(struct kbase_device *kbdev, u32 offset); bool kbase_is_gpu_removed(struct kbase_device *kbdev); /** + * kbase_gpu_cache_flush_pa_range_and_busy_wait() - Start a cache physical range flush + * and busy wait + * + * @kbdev: kbase device to issue the MMU operation on. + * @phys: Starting address of the physical range to start the operation on. + * @nr_bytes: Number of bytes to work on. + * @flush_op: Flush command register value to be sent to HW + * + * Issue a cache flush physical range command, then busy wait an irq status. + * This function will clear FLUSH_PA_RANGE_COMPLETED irq mask bit + * and busy-wait the rawstat register. + * + * Return: 0 if successful or a negative error code on failure. 
+ */ +#if MALI_USE_CSF +int kbase_gpu_cache_flush_pa_range_and_busy_wait(struct kbase_device *kbdev, phys_addr_t phys, + size_t nr_bytes, u32 flush_op); +#endif /* MALI_USE_CSF */ + +/** * kbase_gpu_cache_flush_and_busy_wait - Start a cache flush and busy wait * @kbdev: Kbase device * @flush_op: Flush command register value to be sent to HW @@ -171,6 +191,7 @@ void kbase_gpu_wait_cache_clean(struct kbase_device *kbdev); * called from paths (like GPU reset) where an indefinite wait for the * completion of cache clean operation can cause deadlock, as the operation may * never complete. + * If cache clean times out, reset GPU to recover. * * Return: 0 if successful or a negative error code on failure. */ @@ -188,7 +209,7 @@ int kbase_gpu_wait_cache_clean_timeout(struct kbase_device *kbdev, void kbase_gpu_cache_clean_wait_complete(struct kbase_device *kbdev); /** - * kbase_clean_caches_done - Issue preiously queued cache clean request or + * kbase_clean_caches_done - Issue previously queued cache clean request or * wake up the requester that issued cache clean. * @kbdev: Kbase device * diff --git a/mali_kbase/device/mali_kbase_device_hw.c b/mali_kbase/device/mali_kbase_device_hw.c index 249d5f8..8126b9b 100644 --- a/mali_kbase/device/mali_kbase_device_hw.c +++ b/mali_kbase/device/mali_kbase_device_hw.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2014-2016, 2018-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2014-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -27,49 +27,108 @@ #include <mali_kbase_reset_gpu.h> #include <mmu/mali_kbase_mmu.h> -#if !IS_ENABLED(CONFIG_MALI_NO_MALI) bool kbase_is_gpu_removed(struct kbase_device *kbdev) { - u32 val; + if (!IS_ENABLED(CONFIG_MALI_ARBITER_SUPPORT)) + return false; - val = kbase_reg_read(kbdev, GPU_CONTROL_REG(GPU_ID)); - - return val == 0; + return (kbase_reg_read(kbdev, GPU_CONTROL_REG(GPU_ID)) == 0); } -#endif /* !IS_ENABLED(CONFIG_MALI_NO_MALI) */ -static int busy_wait_cache_clean_irq(struct kbase_device *kbdev) +/** + * busy_wait_cache_operation - Wait for a pending cache flush to complete + * + * @kbdev: Pointer of kbase device. + * @irq_bit: IRQ bit cache flush operation to wait on. + * + * It will reset GPU if the wait fails. + * + * Return: 0 on success, error code otherwise. + */ +static int busy_wait_cache_operation(struct kbase_device *kbdev, u32 irq_bit) { - /* Previously MMU-AS command was used for L2 cache flush on page-table update. - * And we're using the same max-loops count for GPU command, because amount of - * L2 cache flush overhead are same between them. 
- */ - unsigned int max_loops = KBASE_AS_INACTIVE_MAX_LOOPS; + const ktime_t wait_loop_start = ktime_get_raw(); + const u32 wait_time_ms = kbdev->mmu_or_gpu_cache_op_wait_time_ms; + bool completed = false; + s64 diff; + + do { + unsigned int i; + + for (i = 0; i < 1000; i++) { + if (kbase_reg_read(kbdev, GPU_CONTROL_REG(GPU_IRQ_RAWSTAT)) & irq_bit) { + completed = true; + break; + } + } - /* Wait for the GPU cache clean operation to complete */ - while (--max_loops && - !(kbase_reg_read(kbdev, GPU_CONTROL_REG(GPU_IRQ_RAWSTAT)) & - CLEAN_CACHES_COMPLETED)) { - ; - } + diff = ktime_to_ms(ktime_sub(ktime_get_raw(), wait_loop_start)); + } while ((diff < wait_time_ms) && !completed); + + if (!completed) { + char *irq_flag_name; + + switch (irq_bit) { + case CLEAN_CACHES_COMPLETED: + irq_flag_name = "CLEAN_CACHES_COMPLETED"; + break; + case FLUSH_PA_RANGE_COMPLETED: + irq_flag_name = "FLUSH_PA_RANGE_COMPLETED"; + break; + default: + irq_flag_name = "UNKNOWN"; + break; + } - /* reset gpu if time-out occurred */ - if (max_loops == 0) { dev_err(kbdev->dev, - "CLEAN_CACHES_COMPLETED bit stuck, might be caused by slow/unstable GPU clock or possible faulty FPGA connector\n"); + "Stuck waiting on %s bit, might be due to unstable GPU clk/pwr or possible faulty FPGA connector\n", + irq_flag_name); + if (kbase_prepare_to_reset_gpu_locked(kbdev, RESET_FLAGS_NONE)) kbase_reset_gpu_locked(kbdev); + return -EBUSY; } - /* Clear the interrupt CLEAN_CACHES_COMPLETED bit. */ - KBASE_KTRACE_ADD(kbdev, CORE_GPU_IRQ_CLEAR, NULL, CLEAN_CACHES_COMPLETED); - kbase_reg_write(kbdev, GPU_CONTROL_REG(GPU_IRQ_CLEAR), - CLEAN_CACHES_COMPLETED); + KBASE_KTRACE_ADD(kbdev, CORE_GPU_IRQ_CLEAR, NULL, irq_bit); + kbase_reg_write(kbdev, GPU_CONTROL_REG(GPU_IRQ_CLEAR), irq_bit); return 0; } +#if MALI_USE_CSF +#define U64_LO_MASK ((1ULL << 32) - 1) +#define U64_HI_MASK (~U64_LO_MASK) + +int kbase_gpu_cache_flush_pa_range_and_busy_wait(struct kbase_device *kbdev, phys_addr_t phys, + size_t nr_bytes, u32 flush_op) +{ + u64 start_pa, end_pa; + int ret = 0; + + lockdep_assert_held(&kbdev->hwaccess_lock); + + /* 1. Clear the interrupt FLUSH_PA_RANGE_COMPLETED bit. */ + kbase_reg_write(kbdev, GPU_CONTROL_REG(GPU_IRQ_CLEAR), FLUSH_PA_RANGE_COMPLETED); + + /* 2. Issue GPU_CONTROL.COMMAND.FLUSH_PA_RANGE operation. */ + start_pa = phys; + end_pa = start_pa + nr_bytes - 1; + + kbase_reg_write(kbdev, GPU_CONTROL_REG(GPU_COMMAND_ARG0_LO), start_pa & U64_LO_MASK); + kbase_reg_write(kbdev, GPU_CONTROL_REG(GPU_COMMAND_ARG0_HI), + (start_pa & U64_HI_MASK) >> 32); + kbase_reg_write(kbdev, GPU_CONTROL_REG(GPU_COMMAND_ARG1_LO), end_pa & U64_LO_MASK); + kbase_reg_write(kbdev, GPU_CONTROL_REG(GPU_COMMAND_ARG1_HI), (end_pa & U64_HI_MASK) >> 32); + kbase_reg_write(kbdev, GPU_CONTROL_REG(GPU_COMMAND), flush_op); + + /* 3. Busy-wait irq status to be enabled. */ + ret = busy_wait_cache_operation(kbdev, (u32)FLUSH_PA_RANGE_COMPLETED); + + return ret; +} +#endif /* MALI_USE_CSF */ + int kbase_gpu_cache_flush_and_busy_wait(struct kbase_device *kbdev, u32 flush_op) { @@ -97,7 +156,7 @@ int kbase_gpu_cache_flush_and_busy_wait(struct kbase_device *kbdev, irq_mask & ~CLEAN_CACHES_COMPLETED); /* busy wait irq status to be enabled */ - ret = busy_wait_cache_clean_irq(kbdev); + ret = busy_wait_cache_operation(kbdev, (u32)CLEAN_CACHES_COMPLETED); if (ret) return ret; @@ -118,7 +177,7 @@ int kbase_gpu_cache_flush_and_busy_wait(struct kbase_device *kbdev, kbase_reg_write(kbdev, GPU_CONTROL_REG(GPU_COMMAND), flush_op); /* 3. Busy-wait irq status to be enabled. 
*/ - ret = busy_wait_cache_clean_irq(kbdev); + ret = busy_wait_cache_operation(kbdev, (u32)CLEAN_CACHES_COMPLETED); if (ret) return ret; @@ -225,8 +284,9 @@ static inline bool get_cache_clean_flag(struct kbase_device *kbdev) void kbase_gpu_wait_cache_clean(struct kbase_device *kbdev) { while (get_cache_clean_flag(kbdev)) { - wait_event_interruptible(kbdev->cache_clean_wait, - !kbdev->cache_clean_in_progress); + if (wait_event_interruptible(kbdev->cache_clean_wait, + !kbdev->cache_clean_in_progress)) + dev_warn(kbdev->dev, "Wait for cache clean is interrupted"); } } @@ -234,6 +294,7 @@ int kbase_gpu_wait_cache_clean_timeout(struct kbase_device *kbdev, unsigned int wait_timeout_ms) { long remaining = msecs_to_jiffies(wait_timeout_ms); + int result = 0; while (remaining && get_cache_clean_flag(kbdev)) { remaining = wait_event_timeout(kbdev->cache_clean_wait, @@ -241,5 +302,15 @@ int kbase_gpu_wait_cache_clean_timeout(struct kbase_device *kbdev, remaining); } - return (remaining ? 0 : -ETIMEDOUT); + if (!remaining) { + dev_err(kbdev->dev, + "Cache clean timed out. Might be caused by unstable GPU clk/pwr or faulty system"); + + if (kbase_prepare_to_reset_gpu_locked(kbdev, RESET_FLAGS_HWC_UNRECOVERABLE_ERROR)) + kbase_reset_gpu_locked(kbdev); + + result = -ETIMEDOUT; + } + + return result; } diff --git a/mali_kbase/device/mali_kbase_device_internal.h b/mali_kbase/device/mali_kbase_device_internal.h index d4f6875..de54c83 100644 --- a/mali_kbase/device/mali_kbase_device_internal.h +++ b/mali_kbase/device/mali_kbase_device_internal.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -89,3 +89,13 @@ int kbase_device_late_init(struct kbase_device *kbdev); * @kbdev: Device pointer */ void kbase_device_late_term(struct kbase_device *kbdev); + +#if MALI_USE_CSF && !IS_ENABLED(CONFIG_MALI_NO_MALI) +/** + * kbase_is_register_accessible - Checks if register is accessible + * @offset: Register offset + * + * Return: true if the register is accessible, false otherwise. + */ +bool kbase_is_register_accessible(u32 offset); +#endif /* MALI_USE_CSF && !IS_ENABLED(CONFIG_MALI_NO_MALI) */ diff --git a/mali_kbase/gpu/backend/mali_kbase_gpu_fault_csf.c b/mali_kbase/gpu/backend/mali_kbase_gpu_fault_csf.c index 893a335..60ba9be 100644 --- a/mali_kbase/gpu/backend/mali_kbase_gpu_fault_csf.c +++ b/mali_kbase/gpu/backend/mali_kbase_gpu_fault_csf.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2023 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -86,6 +86,9 @@ const char *kbase_gpu_exception_name(u32 const exception_code) case CS_FATAL_EXCEPTION_TYPE_FIRMWARE_INTERNAL_ERROR: e = "FIRMWARE_INTERNAL_ERROR"; break; + case CS_FATAL_EXCEPTION_TYPE_CS_UNRECOVERABLE: + e = "CS_UNRECOVERABLE"; + break; case CS_FAULT_EXCEPTION_TYPE_RESOURCE_EVICTION_TIMEOUT: e = "RESOURCE_EVICTION_TIMEOUT"; break; @@ -102,6 +105,70 @@ const char *kbase_gpu_exception_name(u32 const exception_code) case GPU_FAULTSTATUS_EXCEPTION_TYPE_GPU_CACHEABILITY_FAULT: e = "GPU_CACHEABILITY_FAULT"; break; + /* MMU Fault */ + case CS_FAULT_EXCEPTION_TYPE_TRANSLATION_FAULT_L0: + e = "TRANSLATION_FAULT at level 0"; + break; + case CS_FAULT_EXCEPTION_TYPE_TRANSLATION_FAULT_L1: + e = "TRANSLATION_FAULT at level 1"; + break; + case CS_FAULT_EXCEPTION_TYPE_TRANSLATION_FAULT_L2: + e = "TRANSLATION_FAULT at level 2"; + break; + case CS_FAULT_EXCEPTION_TYPE_TRANSLATION_FAULT_L3: + e = "TRANSLATION_FAULT at level 3"; + break; + case CS_FAULT_EXCEPTION_TYPE_TRANSLATION_FAULT_L4: + e = "TRANSLATION_FAULT"; + break; + case CS_FAULT_EXCEPTION_TYPE_PERMISSION_FAULT_0: + e = "PERMISSION_FAULT at level 0"; + break; + case CS_FAULT_EXCEPTION_TYPE_PERMISSION_FAULT_1: + e = "PERMISSION_FAULT at level 1"; + break; + case CS_FAULT_EXCEPTION_TYPE_PERMISSION_FAULT_2: + e = "PERMISSION_FAULT at level 2"; + break; + case CS_FAULT_EXCEPTION_TYPE_PERMISSION_FAULT_3: + e = "PERMISSION_FAULT at level 3"; + break; + case CS_FAULT_EXCEPTION_TYPE_ACCESS_FLAG_1: + e = "ACCESS_FLAG at level 1"; + break; + case CS_FAULT_EXCEPTION_TYPE_ACCESS_FLAG_2: + e = "ACCESS_FLAG at level 2"; + break; + case CS_FAULT_EXCEPTION_TYPE_ACCESS_FLAG_3: + e = "ACCESS_FLAG at level 3"; + break; + case CS_FAULT_EXCEPTION_TYPE_ADDRESS_SIZE_FAULT_IN: + e = "ADDRESS_SIZE_FAULT_IN"; + break; + case CS_FAULT_EXCEPTION_TYPE_ADDRESS_SIZE_FAULT_OUT_0: + e = "ADDRESS_SIZE_FAULT_OUT_0 at level 0"; + break; + case CS_FAULT_EXCEPTION_TYPE_ADDRESS_SIZE_FAULT_OUT_1: + e = "ADDRESS_SIZE_FAULT_OUT_1 at level 1"; + break; + case CS_FAULT_EXCEPTION_TYPE_ADDRESS_SIZE_FAULT_OUT_2: + e = "ADDRESS_SIZE_FAULT_OUT_2 at level 2"; + break; + case CS_FAULT_EXCEPTION_TYPE_ADDRESS_SIZE_FAULT_OUT_3: + e = "ADDRESS_SIZE_FAULT_OUT_3 at level 3"; + break; + case CS_FAULT_EXCEPTION_TYPE_MEMORY_ATTRIBUTE_FAULT_0: + e = "MEMORY_ATTRIBUTE_FAULT_0 at level 0"; + break; + case CS_FAULT_EXCEPTION_TYPE_MEMORY_ATTRIBUTE_FAULT_1: + e = "MEMORY_ATTRIBUTE_FAULT_1 at level 1"; + break; + case CS_FAULT_EXCEPTION_TYPE_MEMORY_ATTRIBUTE_FAULT_2: + e = "MEMORY_ATTRIBUTE_FAULT_2 at level 2"; + break; + case CS_FAULT_EXCEPTION_TYPE_MEMORY_ATTRIBUTE_FAULT_3: + e = "MEMORY_ATTRIBUTE_FAULT_3 at level 3"; + break; /* Any other exception code is unknown */ default: e = "UNKNOWN"; diff --git a/mali_kbase/gpu/backend/mali_kbase_gpu_fault_jm.c b/mali_kbase/gpu/backend/mali_kbase_gpu_fault_jm.c index 37015cc..7f3743c 100644 --- a/mali_kbase/gpu/backend/mali_kbase_gpu_fault_jm.c +++ b/mali_kbase/gpu/backend/mali_kbase_gpu_fault_jm.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2022 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -170,7 +170,7 @@ const char *kbase_gpu_exception_name(u32 const exception_code) default: e = "UNKNOWN"; break; - }; + } return e; } diff --git a/mali_kbase/gpu/backend/mali_kbase_gpu_regmap_csf.h b/mali_kbase/gpu/backend/mali_kbase_gpu_regmap_csf.h index f6945b3..ab989e0 100644 --- a/mali_kbase/gpu/backend/mali_kbase_gpu_regmap_csf.h +++ b/mali_kbase/gpu/backend/mali_kbase_gpu_regmap_csf.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -28,6 +28,17 @@ #error "Cannot be compiled with JM" #endif +/* GPU control registers */ +#define MCU_CONTROL 0x700 + +#define L2_CONFIG_PBHA_HWU_SHIFT GPU_U(12) +#define L2_CONFIG_PBHA_HWU_MASK (GPU_U(0xF) << L2_CONFIG_PBHA_HWU_SHIFT) +#define L2_CONFIG_PBHA_HWU_GET(reg_val) \ + (((reg_val)&L2_CONFIG_PBHA_HWU_MASK) >> L2_CONFIG_PBHA_HWU_SHIFT) +#define L2_CONFIG_PBHA_HWU_SET(reg_val, value) \ + (((reg_val) & ~L2_CONFIG_PBHA_HWU_MASK) | \ + (((value) << L2_CONFIG_PBHA_HWU_SHIFT) & L2_CONFIG_PBHA_HWU_MASK)) + /* GPU_CONTROL_MCU base address */ #define GPU_CONTROL_MCU_BASE 0x3000 @@ -35,38 +46,41 @@ #define MCU_SUBSYSTEM_BASE 0x20000 /* IPA control registers */ -#define IPA_CONTROL_BASE 0x40000 -#define IPA_CONTROL_REG(r) (IPA_CONTROL_BASE+(r)) -#define COMMAND 0x000 /* (WO) Command register */ -#define STATUS 0x004 /* (RO) Status register */ -#define TIMER 0x008 /* (RW) Timer control register */ - -#define SELECT_CSHW_LO 0x010 /* (RW) Counter select for CS hardware, low word */ -#define SELECT_CSHW_HI 0x014 /* (RW) Counter select for CS hardware, high word */ -#define SELECT_MEMSYS_LO 0x018 /* (RW) Counter select for Memory system, low word */ -#define SELECT_MEMSYS_HI 0x01C /* (RW) Counter select for Memory system, high word */ -#define SELECT_TILER_LO 0x020 /* (RW) Counter select for Tiler cores, low word */ -#define SELECT_TILER_HI 0x024 /* (RW) Counter select for Tiler cores, high word */ -#define SELECT_SHADER_LO 0x028 /* (RW) Counter select for Shader cores, low word */ -#define SELECT_SHADER_HI 0x02C /* (RW) Counter select for Shader cores, high word */ +#define IPA_CONTROL_BASE 0x40000 +#define IPA_CONTROL_REG(r) (IPA_CONTROL_BASE + (r)) + +#define COMMAND 0x000 /* (WO) Command register */ +#define STATUS 0x004 /* (RO) Status register */ +#define TIMER 0x008 /* (RW) Timer control register */ + +#define SELECT_CSHW_LO 0x010 /* (RW) Counter select for CS hardware, low word */ +#define SELECT_CSHW_HI 0x014 /* (RW) Counter select for CS hardware, high word */ +#define SELECT_MEMSYS_LO 0x018 /* (RW) Counter select for Memory system, low word */ +#define SELECT_MEMSYS_HI 0x01C /* (RW) Counter select for Memory system, high word */ +#define SELECT_TILER_LO 0x020 /* (RW) Counter select for Tiler cores, low word */ +#define SELECT_TILER_HI 0x024 /* (RW) Counter select for Tiler cores, high word */ +#define SELECT_SHADER_LO 0x028 /* (RW) Counter select for Shader cores, low word */ +#define SELECT_SHADER_HI 0x02C /* (RW) Counter select for Shader cores, high word */ /* Accumulated counter values for CS hardware */ -#define VALUE_CSHW_BASE 0x100 -#define VALUE_CSHW_REG_LO(n) (VALUE_CSHW_BASE + ((n) 
<< 3)) /* (RO) Counter value #n, low word */ -#define VALUE_CSHW_REG_HI(n) (VALUE_CSHW_BASE + ((n) << 3) + 4) /* (RO) Counter value #n, high word */ +#define VALUE_CSHW_BASE 0x100 +#define VALUE_CSHW_REG_LO(n) (VALUE_CSHW_BASE + ((n) << 3)) /* (RO) Counter value #n, low word */ +#define VALUE_CSHW_REG_HI(n) (VALUE_CSHW_BASE + ((n) << 3) + 4) /* (RO) Counter value #n, high word */ /* Accumulated counter values for memory system */ -#define VALUE_MEMSYS_BASE 0x140 -#define VALUE_MEMSYS_REG_LO(n) (VALUE_MEMSYS_BASE + ((n) << 3)) /* (RO) Counter value #n, low word */ -#define VALUE_MEMSYS_REG_HI(n) (VALUE_MEMSYS_BASE + ((n) << 3) + 4) /* (RO) Counter value #n, high word */ +#define VALUE_MEMSYS_BASE 0x140 +#define VALUE_MEMSYS_REG_LO(n) (VALUE_MEMSYS_BASE + ((n) << 3)) /* (RO) Counter value #n, low word */ +#define VALUE_MEMSYS_REG_HI(n) (VALUE_MEMSYS_BASE + ((n) << 3) + 4) /* (RO) Counter value #n, high word */ -#define VALUE_TILER_BASE 0x180 -#define VALUE_TILER_REG_LO(n) (VALUE_TILER_BASE + ((n) << 3)) /* (RO) Counter value #n, low word */ -#define VALUE_TILER_REG_HI(n) (VALUE_TILER_BASE + ((n) << 3) + 4) /* (RO) Counter value #n, high word */ +#define VALUE_TILER_BASE 0x180 +#define VALUE_TILER_REG_LO(n) (VALUE_TILER_BASE + ((n) << 3)) /* (RO) Counter value #n, low word */ +#define VALUE_TILER_REG_HI(n) (VALUE_TILER_BASE + ((n) << 3) + 4) /* (RO) Counter value #n, high word */ -#define VALUE_SHADER_BASE 0x1C0 -#define VALUE_SHADER_REG_LO(n) (VALUE_SHADER_BASE + ((n) << 3)) /* (RO) Counter value #n, low word */ -#define VALUE_SHADER_REG_HI(n) (VALUE_SHADER_BASE + ((n) << 3) + 4) /* (RO) Counter value #n, high word */ +#define VALUE_SHADER_BASE 0x1C0 +#define VALUE_SHADER_REG_LO(n) (VALUE_SHADER_BASE + ((n) << 3)) /* (RO) Counter value #n, low word */ +#define VALUE_SHADER_REG_HI(n) (VALUE_SHADER_BASE + ((n) << 3) + 4) /* (RO) Counter value #n, high word */ + +#define AS_STATUS_AS_ACTIVE_INT 0x2 /* Set to implementation defined, outer caching */ #define AS_MEMATTR_AARCH64_OUTER_IMPL_DEF 0x88ull @@ -113,7 +127,6 @@ /* GPU control registers */ #define CORE_FEATURES 0x008 /* () Shader Core Features */ -#define MCU_CONTROL 0x700 #define MCU_STATUS 0x704 #define MCU_CNTRL_ENABLE (1 << 0) @@ -123,44 +136,20 @@ #define MCU_CNTRL_DOORBELL_DISABLE_SHIFT (31) #define MCU_CNTRL_DOORBELL_DISABLE_MASK (1 << MCU_CNTRL_DOORBELL_DISABLE_SHIFT) -#define MCU_STATUS_HALTED (1 << 1) - -#define PRFCNT_BASE_LO 0x060 /* (RW) Performance counter memory - * region base address, low word - */ -#define PRFCNT_BASE_HI 0x064 /* (RW) Performance counter memory - * region base address, high word - */ -#define PRFCNT_CONFIG 0x068 /* (RW) Performance counter - * configuration - */ - -#define PRFCNT_CSHW_EN 0x06C /* (RW) Performance counter - * enable for CS Hardware - */ - -#define PRFCNT_SHADER_EN 0x070 /* (RW) Performance counter enable - * flags for shader cores - */ -#define PRFCNT_TILER_EN 0x074 /* (RW) Performance counter enable - * flags for tiler - */ -#define PRFCNT_MMU_L2_EN 0x07C /* (RW) Performance counter enable - * flags for MMU/L2 cache - */ +#define MCU_STATUS_HALTED (1 << 1) /* JOB IRQ flags */ -#define JOB_IRQ_GLOBAL_IF (1 << 31) /* Global interface interrupt received */ +#define JOB_IRQ_GLOBAL_IF (1u << 31) /* Global interface interrupt received */ /* GPU_COMMAND codes */ #define GPU_COMMAND_CODE_NOP 0x00 /* No operation, nothing happens */ #define GPU_COMMAND_CODE_RESET 0x01 /* Reset the GPU */ -#define GPU_COMMAND_CODE_PRFCNT 0x02 /* Clear or sample performance counters */ #define GPU_COMMAND_CODE_TIME 
0x03 /* Configure time sources */ #define GPU_COMMAND_CODE_FLUSH_CACHES 0x04 /* Flush caches */ #define GPU_COMMAND_CODE_SET_PROTECTED_MODE 0x05 /* Places the GPU in protected mode */ #define GPU_COMMAND_CODE_FINISH_HALT 0x06 /* Halt CSF */ #define GPU_COMMAND_CODE_CLEAR_FAULT 0x07 /* Clear GPU_FAULTSTATUS and GPU_FAULTADDRESS, TODX */ +#define GPU_COMMAND_CODE_FLUSH_PA_RANGE 0x08 /* Flush the GPU caches for a physical range, TITX */ /* GPU_COMMAND_RESET payloads */ @@ -179,27 +168,34 @@ */ #define GPU_COMMAND_RESET_PAYLOAD_HARD_RESET 0x02 -/* GPU_COMMAND_PRFCNT payloads */ -#define GPU_COMMAND_PRFCNT_PAYLOAD_SAMPLE 0x01 /* Sample performance counters */ -#define GPU_COMMAND_PRFCNT_PAYLOAD_CLEAR 0x02 /* Clear performance counters */ - /* GPU_COMMAND_TIME payloads */ #define GPU_COMMAND_TIME_DISABLE 0x00 /* Disable cycle counter */ #define GPU_COMMAND_TIME_ENABLE 0x01 /* Enable cycle counter */ /* GPU_COMMAND_FLUSH_CACHES payloads bits for L2 caches */ -#define GPU_COMMAND_FLUSH_PAYLOAD_L2_NONE 0x000 /* No flush */ -#define GPU_COMMAND_FLUSH_PAYLOAD_L2_CLEAN 0x001 /* CLN only */ -#define GPU_COMMAND_FLUSH_PAYLOAD_L2_CLEAN_INVALIDATE 0x003 /* CLN + INV */ +#define GPU_COMMAND_FLUSH_CACHES_PAYLOAD_L2_NONE 0x000 /* No flush */ +#define GPU_COMMAND_FLUSH_CACHES_PAYLOAD_L2_CLEAN 0x001 /* CLN only */ +#define GPU_COMMAND_FLUSH_CACHES_PAYLOAD_L2_CLEAN_INVALIDATE 0x003 /* CLN + INV */ /* GPU_COMMAND_FLUSH_CACHES payloads bits for Load-store caches */ -#define GPU_COMMAND_FLUSH_PAYLOAD_LSC_NONE 0x000 /* No flush */ -#define GPU_COMMAND_FLUSH_PAYLOAD_LSC_CLEAN 0x010 /* CLN only */ -#define GPU_COMMAND_FLUSH_PAYLOAD_LSC_CLEAN_INVALIDATE 0x030 /* CLN + INV */ +#define GPU_COMMAND_FLUSH_CACHES_PAYLOAD_LSC_NONE 0x000 /* No flush */ +#define GPU_COMMAND_FLUSH_CACHES_PAYLOAD_LSC_CLEAN 0x010 /* CLN only */ +#define GPU_COMMAND_FLUSH_CACHES_PAYLOAD_LSC_CLEAN_INVALIDATE 0x030 /* CLN + INV */ /* GPU_COMMAND_FLUSH_CACHES payloads bits for Other caches */ -#define GPU_COMMAND_FLUSH_PAYLOAD_OTHER_NONE 0x000 /* No flush */ -#define GPU_COMMAND_FLUSH_PAYLOAD_OTHER_INVALIDATE 0x200 /* INV only */ +#define GPU_COMMAND_FLUSH_CACHES_PAYLOAD_OTHER_NONE 0x000 /* No flush */ +#define GPU_COMMAND_FLUSH_CACHES_PAYLOAD_OTHER_INVALIDATE 0x200 /* INV only */ + +/* GPU_COMMAND_FLUSH_PA_RANGE payload bits for flush modes */ +#define GPU_COMMAND_FLUSH_PA_RANGE_PAYLOAD_MODE_NONE 0x00 /* No flush */ +#define GPU_COMMAND_FLUSH_PA_RANGE_PAYLOAD_MODE_CLEAN 0x01 /* CLN only */ +#define GPU_COMMAND_FLUSH_PA_RANGE_PAYLOAD_MODE_INVALIDATE 0x02 /* INV only */ +#define GPU_COMMAND_FLUSH_PA_RANGE_PAYLOAD_MODE_CLEAN_INVALIDATE 0x03 /* CLN + INV */ + +/* GPU_COMMAND_FLUSH_PA_RANGE payload bits for which caches should be the target of the command */ +#define GPU_COMMAND_FLUSH_PA_RANGE_PAYLOAD_OTHER_CACHE 0x10 /* Other caches */ +#define GPU_COMMAND_FLUSH_PA_RANGE_PAYLOAD_LSC_CACHE 0x20 /* Load-store caches */ +#define GPU_COMMAND_FLUSH_PA_RANGE_PAYLOAD_L2_CACHE 0x40 /* L2 caches */ /* GPU_COMMAND command + payload */ #define GPU_COMMAND_CODE_PAYLOAD(opcode, payload) \ @@ -218,14 +214,6 @@ #define GPU_COMMAND_HARD_RESET \ GPU_COMMAND_CODE_PAYLOAD(GPU_COMMAND_CODE_RESET, GPU_COMMAND_RESET_PAYLOAD_HARD_RESET) -/* Clear all performance counters, setting them all to zero. 
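The new FLUSH_PA_RANGE command is composed like the existing FLUSH_CACHES command: a mode value ORed with the target-cache bits, wrapped by GPU_COMMAND_CODE_PAYLOAD(); the physical range itself is presumably supplied through the GPU_COMMAND_ARG0/ARG1 registers added elsewhere in this patch. A small sketch of the composition (the helper function is illustrative):

/* Illustration only: compose a FLUSH_PA_RANGE command word. The MODE_* bits
 * select the clean/invalidate behaviour and the *_CACHE bits select the
 * target caches; this combination matches the
 * GPU_COMMAND_FLUSH_PA_RANGE_CLN_INV_L2_LSC define further down.
 */
static inline u32 example_flush_pa_range_cmd(void)
{
	return GPU_COMMAND_CODE_PAYLOAD(
		GPU_COMMAND_CODE_FLUSH_PA_RANGE,
		GPU_COMMAND_FLUSH_PA_RANGE_PAYLOAD_MODE_CLEAN_INVALIDATE |
			GPU_COMMAND_FLUSH_PA_RANGE_PAYLOAD_LSC_CACHE |
			GPU_COMMAND_FLUSH_PA_RANGE_PAYLOAD_L2_CACHE);
}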
*/ -#define GPU_COMMAND_PRFCNT_CLEAR \ - GPU_COMMAND_CODE_PAYLOAD(GPU_COMMAND_CODE_PRFCNT, GPU_COMMAND_PRFCNT_PAYLOAD_CLEAR) - -/* Sample all performance counters, writing them out to memory */ -#define GPU_COMMAND_PRFCNT_SAMPLE \ - GPU_COMMAND_CODE_PAYLOAD(GPU_COMMAND_CODE_PRFCNT, GPU_COMMAND_PRFCNT_PAYLOAD_SAMPLE) - /* Starts the cycle counter, and system timestamp propagation */ #define GPU_COMMAND_CYCLE_COUNT_START \ GPU_COMMAND_CODE_PAYLOAD(GPU_COMMAND_CODE_TIME, GPU_COMMAND_TIME_ENABLE) @@ -235,28 +223,53 @@ GPU_COMMAND_CODE_PAYLOAD(GPU_COMMAND_CODE_TIME, GPU_COMMAND_TIME_DISABLE) /* Clean and invalidate L2 cache (Equivalent to FLUSH_PT) */ -#define GPU_COMMAND_CACHE_CLN_INV_L2 \ - GPU_COMMAND_CODE_PAYLOAD( \ - GPU_COMMAND_CODE_FLUSH_CACHES, \ - (GPU_COMMAND_FLUSH_PAYLOAD_L2_CLEAN_INVALIDATE | \ - GPU_COMMAND_FLUSH_PAYLOAD_LSC_NONE | \ - GPU_COMMAND_FLUSH_PAYLOAD_OTHER_NONE)) +#define GPU_COMMAND_CACHE_CLN_INV_L2 \ + GPU_COMMAND_CODE_PAYLOAD(GPU_COMMAND_CODE_FLUSH_CACHES, \ + (GPU_COMMAND_FLUSH_CACHES_PAYLOAD_L2_CLEAN_INVALIDATE | \ + GPU_COMMAND_FLUSH_CACHES_PAYLOAD_LSC_NONE | \ + GPU_COMMAND_FLUSH_CACHES_PAYLOAD_OTHER_NONE)) /* Clean and invalidate L2 and LSC caches (Equivalent to FLUSH_MEM) */ -#define GPU_COMMAND_CACHE_CLN_INV_L2_LSC \ - GPU_COMMAND_CODE_PAYLOAD( \ - GPU_COMMAND_CODE_FLUSH_CACHES, \ - (GPU_COMMAND_FLUSH_PAYLOAD_L2_CLEAN_INVALIDATE | \ - GPU_COMMAND_FLUSH_PAYLOAD_LSC_CLEAN_INVALIDATE | \ - GPU_COMMAND_FLUSH_PAYLOAD_OTHER_NONE)) +#define GPU_COMMAND_CACHE_CLN_INV_L2_LSC \ + GPU_COMMAND_CODE_PAYLOAD(GPU_COMMAND_CODE_FLUSH_CACHES, \ + (GPU_COMMAND_FLUSH_CACHES_PAYLOAD_L2_CLEAN_INVALIDATE | \ + GPU_COMMAND_FLUSH_CACHES_PAYLOAD_LSC_CLEAN_INVALIDATE | \ + GPU_COMMAND_FLUSH_CACHES_PAYLOAD_OTHER_NONE)) /* Clean and invalidate L2, LSC, and Other caches */ -#define GPU_COMMAND_CACHE_CLN_INV_FULL \ - GPU_COMMAND_CODE_PAYLOAD( \ - GPU_COMMAND_CODE_FLUSH_CACHES, \ - (GPU_COMMAND_FLUSH_PAYLOAD_L2_CLEAN_INVALIDATE | \ - GPU_COMMAND_FLUSH_PAYLOAD_LSC_CLEAN_INVALIDATE | \ - GPU_COMMAND_FLUSH_PAYLOAD_OTHER_INVALIDATE)) +#define GPU_COMMAND_CACHE_CLN_INV_FULL \ + GPU_COMMAND_CODE_PAYLOAD(GPU_COMMAND_CODE_FLUSH_CACHES, \ + (GPU_COMMAND_FLUSH_CACHES_PAYLOAD_L2_CLEAN_INVALIDATE | \ + GPU_COMMAND_FLUSH_CACHES_PAYLOAD_LSC_CLEAN_INVALIDATE | \ + GPU_COMMAND_FLUSH_CACHES_PAYLOAD_OTHER_INVALIDATE)) + +/* Clean and invalidate only LSC cache */ +#define GPU_COMMAND_CACHE_CLN_INV_LSC \ + GPU_COMMAND_CODE_PAYLOAD(GPU_COMMAND_CODE_FLUSH_CACHES, \ + (GPU_COMMAND_FLUSH_CACHES_PAYLOAD_L2_NONE | \ + GPU_COMMAND_FLUSH_CACHES_PAYLOAD_LSC_CLEAN_INVALIDATE | \ + GPU_COMMAND_FLUSH_CACHES_PAYLOAD_OTHER_NONE)) + +/* Clean and invalidate physical range L2 cache (equivalent to FLUSH_PT) */ +#define GPU_COMMAND_FLUSH_PA_RANGE_CLN_INV_L2 \ + GPU_COMMAND_CODE_PAYLOAD(GPU_COMMAND_CODE_FLUSH_PA_RANGE, \ + (GPU_COMMAND_FLUSH_PA_RANGE_PAYLOAD_MODE_CLEAN_INVALIDATE | \ + GPU_COMMAND_FLUSH_PA_RANGE_PAYLOAD_L2_CACHE)) + +/* Clean and invalidate physical range L2 and LSC cache (equivalent to FLUSH_MEM) */ +#define GPU_COMMAND_FLUSH_PA_RANGE_CLN_INV_L2_LSC \ + GPU_COMMAND_CODE_PAYLOAD(GPU_COMMAND_CODE_FLUSH_PA_RANGE, \ + (GPU_COMMAND_FLUSH_PA_RANGE_PAYLOAD_MODE_CLEAN_INVALIDATE | \ + GPU_COMMAND_FLUSH_PA_RANGE_PAYLOAD_LSC_CACHE | \ + GPU_COMMAND_FLUSH_PA_RANGE_PAYLOAD_L2_CACHE)) + +/* Clean and invalidate physical range L2, LSC and Other caches */ +#define GPU_COMMAND_FLUSH_PA_RANGE_CLN_INV_FULL \ + GPU_COMMAND_CODE_PAYLOAD(GPU_COMMAND_CODE_FLUSH_PA_RANGE, \ + (GPU_COMMAND_FLUSH_PA_RANGE_PAYLOAD_MODE_CLEAN_INVALIDATE | 
\ + GPU_COMMAND_FLUSH_PA_RANGE_PAYLOAD_OTHER_CACHE | \ + GPU_COMMAND_FLUSH_PA_RANGE_PAYLOAD_LSC_CACHE | \ + GPU_COMMAND_FLUSH_PA_RANGE_PAYLOAD_L2_CACHE)) /* Merge cache flush commands */ #define GPU_COMMAND_FLUSH_CACHE_MERGE(cmd1, cmd2) ((cmd1) | (cmd2)) @@ -285,13 +298,13 @@ #define GPU_FAULTSTATUS_ACCESS_TYPE_MASK \ (0x3ul << GPU_FAULTSTATUS_ACCESS_TYPE_SHIFT) -#define GPU_FAULTSTATUS_ADDR_VALID_SHIFT 10 -#define GPU_FAULTSTATUS_ADDR_VALID_FLAG \ - (1ul << GPU_FAULTSTATUS_ADDR_VALID_SHIFT) +#define GPU_FAULTSTATUS_ADDRESS_VALID_SHIFT GPU_U(10) +#define GPU_FAULTSTATUS_ADDRESS_VALID_MASK \ + (GPU_U(0x1) << GPU_FAULTSTATUS_ADDRESS_VALID_SHIFT) -#define GPU_FAULTSTATUS_JASID_VALID_SHIFT 11 -#define GPU_FAULTSTATUS_JASID_VALID_FLAG \ - (1ul << GPU_FAULTSTATUS_JASID_VALID_SHIFT) +#define GPU_FAULTSTATUS_JASID_VALID_SHIFT GPU_U(11) +#define GPU_FAULTSTATUS_JASID_VALID_MASK \ + (GPU_U(0x1) << GPU_FAULTSTATUS_JASID_VALID_SHIFT) #define GPU_FAULTSTATUS_JASID_SHIFT 12 #define GPU_FAULTSTATUS_JASID_MASK (0xF << GPU_FAULTSTATUS_JASID_SHIFT) @@ -337,14 +350,16 @@ (((value) << GPU_FAULTSTATUS_ADDRESS_VALID_SHIFT) & GPU_FAULTSTATUS_ADDRESS_VALID_MASK)) /* IRQ flags */ -#define GPU_FAULT (1 << 0) /* A GPU Fault has occurred */ -#define GPU_PROTECTED_FAULT (1 << 1) /* A GPU fault has occurred in protected mode */ -#define RESET_COMPLETED (1 << 8) /* Set when a reset has completed. */ -#define POWER_CHANGED_SINGLE (1 << 9) /* Set when a single core has finished powering up or down. */ -#define POWER_CHANGED_ALL (1 << 10) /* Set when all cores have finished powering up or down. */ -#define CLEAN_CACHES_COMPLETED (1 << 17) /* Set when a cache clean operation has completed. */ -#define DOORBELL_MIRROR (1 << 18) /* Mirrors the doorbell interrupt line to the CPU */ -#define MCU_STATUS_GPU_IRQ (1 << 19) /* MCU requires attention */ +#define GPU_FAULT (1 << 0) /* A GPU Fault has occurred */ +#define GPU_PROTECTED_FAULT (1 << 1) /* A GPU fault has occurred in protected mode */ +#define RESET_COMPLETED (1 << 8) /* Set when a reset has completed. */ +#define POWER_CHANGED_SINGLE (1 << 9) /* Set when a single core has finished powering up or down. */ +#define POWER_CHANGED_ALL (1 << 10) /* Set when all cores have finished powering up or down. */ +#define CLEAN_CACHES_COMPLETED (1 << 17) /* Set when a cache clean operation has completed. */ +#define DOORBELL_MIRROR (1 << 18) /* Mirrors the doorbell interrupt line to the CPU */ +#define MCU_STATUS_GPU_IRQ (1 << 19) /* MCU requires attention */ +#define FLUSH_PA_RANGE_COMPLETED \ + (1 << 20) /* Set when a physical range cache clean operation has completed. 
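FLUSH_PA_RANGE_COMPLETED sits alongside CLEAN_CACHES_COMPLETED in the GPU IRQ status word, so completion of a physical-range flush can be distinguished from a full cache clean. A hedged sketch of testing these bits; the handler function is illustrative, not the driver's real IRQ handler:

/* Illustration only: decode the cache-maintenance completion bits from a
 * GPU_IRQ_STATUS value that has already been read from the hardware.
 */
static void example_handle_flush_irqs(struct kbase_device *kbdev, u32 irq_status)
{
	if (irq_status & CLEAN_CACHES_COMPLETED)
		dev_dbg(kbdev->dev, "cache clean/invalidate completed");

	if (irq_status & FLUSH_PA_RANGE_COMPLETED)
		dev_dbg(kbdev->dev, "physical-range flush completed");
}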
*/ /* * In Debug build, @@ -362,7 +377,11 @@ #define GPU_IRQ_REG_COMMON (GPU_FAULT | GPU_PROTECTED_FAULT | RESET_COMPLETED \ | POWER_CHANGED_ALL | MCU_STATUS_GPU_IRQ) -/* GPU_CONTROL_MCU.GPU_IRQ_RAWSTAT */ -#define PRFCNT_SAMPLE_COMPLETED (1 << 16) /* Set when performance count sample has completed */ +/* GPU_FEATURES register */ +#define GPU_FEATURES_RAY_TRACING_SHIFT GPU_U(2) +#define GPU_FEATURES_RAY_TRACING_MASK (GPU_U(0x1) << GPU_FEATURES_RAY_TRACING_SHIFT) +#define GPU_FEATURES_RAY_TRACING_GET(reg_val) \ + (((reg_val)&GPU_FEATURES_RAY_TRACING_MASK) >> GPU_FEATURES_RAY_TRACING_SHIFT) +/* End of GPU_FEATURES register */ #endif /* _KBASE_GPU_REGMAP_CSF_H_ */ diff --git a/mali_kbase/gpu/backend/mali_kbase_gpu_regmap_jm.h b/mali_kbase/gpu/backend/mali_kbase_gpu_regmap_jm.h index d1cd8fc..387cd50 100644 --- a/mali_kbase/gpu/backend/mali_kbase_gpu_regmap_jm.h +++ b/mali_kbase/gpu/backend/mali_kbase_gpu_regmap_jm.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -59,28 +59,27 @@ #define CORE_FEATURES 0x008 /* (RO) Shader Core Features */ #define JS_PRESENT 0x01C /* (RO) Job slots present */ - -#define PRFCNT_BASE_LO 0x060 /* (RW) Performance counter memory - * region base address, low word - */ -#define PRFCNT_BASE_HI 0x064 /* (RW) Performance counter memory - * region base address, high word - */ -#define PRFCNT_CONFIG 0x068 /* (RW) Performance counter - * configuration - */ -#define PRFCNT_JM_EN 0x06C /* (RW) Performance counter enable - * flags for Job Manager - */ -#define PRFCNT_SHADER_EN 0x070 /* (RW) Performance counter enable - * flags for shader cores - */ -#define PRFCNT_TILER_EN 0x074 /* (RW) Performance counter enable - * flags for tiler - */ -#define PRFCNT_MMU_L2_EN 0x07C /* (RW) Performance counter enable - * flags for MMU/L2 cache - */ +#define LATEST_FLUSH 0x038 /* (RO) Flush ID of latest + * clean-and-invalidate operation + */ +#define PRFCNT_BASE_LO 0x060 /* (RW) Performance counter memory + * region base address, low word + */ +#define PRFCNT_BASE_HI 0x064 /* (RW) Performance counter memory + * region base address, high word + */ +#define PRFCNT_CONFIG 0x068 /* (RW) Performance counter configuration */ +#define PRFCNT_JM_EN 0x06C /* (RW) Performance counter enable + * flags for Job Manager + */ +#define PRFCNT_SHADER_EN 0x070 /* (RW) Performance counter enable + * flags for shader cores */ +#define PRFCNT_TILER_EN 0x074 /* (RW) Performance counter enable + * flags for tiler + */ +#define PRFCNT_MMU_L2_EN 0x07C /* (RW) Performance counter enable + * flags for MMU/L2 cache + */ #define JS0_FEATURES 0x0C0 /* (RO) Features of job slot 0 */ #define JS1_FEATURES 0x0C4 /* (RO) Features of job slot 1 */ @@ -109,6 +108,7 @@ #define JOB_IRQ_THROTTLE 0x014 /* cycles to delay delivering an interrupt externally. The JOB_IRQ_STATUS is NOT affected by this, just the delivery of the interrupt. 
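GPU_FEATURES_RAY_TRACING follows the same SHIFT/MASK/GET pattern used throughout these headers. A minimal sketch of the pattern, assuming the GPU_FEATURES register value was read elsewhere:

/* Illustration only: test the ray tracing feature bit in a GPU_FEATURES
 * register value obtained separately.
 */
static inline bool example_has_ray_tracing(u64 gpu_features)
{
	return GPU_FEATURES_RAY_TRACING_GET(gpu_features) != 0;
}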
*/ #define JOB_SLOT0 0x800 /* Configuration registers for job slot 0 */ +#define JOB_SLOT_REG(n, r) (JOB_CONTROL_REG(JOB_SLOT0 + ((n) << 7)) + (r)) #define JOB_SLOT1 0x880 /* Configuration registers for job slot 1 */ #define JOB_SLOT2 0x900 /* Configuration registers for job slot 2 */ #define JOB_SLOT3 0x980 /* Configuration registers for job slot 3 */ @@ -125,48 +125,41 @@ #define JOB_SLOT14 0xF00 /* Configuration registers for job slot 14 */ #define JOB_SLOT15 0xF80 /* Configuration registers for job slot 15 */ -#define JOB_SLOT_REG(n, r) (JOB_CONTROL_REG(JOB_SLOT0 + ((n) << 7)) + (r)) - -#define JS_HEAD_LO 0x00 /* (RO) Job queue head pointer for job slot n, low word */ -#define JS_HEAD_HI 0x04 /* (RO) Job queue head pointer for job slot n, high word */ -#define JS_TAIL_LO 0x08 /* (RO) Job queue tail pointer for job slot n, low word */ -#define JS_TAIL_HI 0x0C /* (RO) Job queue tail pointer for job slot n, high word */ -#define JS_AFFINITY_LO 0x10 /* (RO) Core affinity mask for job slot n, low word */ -#define JS_AFFINITY_HI 0x14 /* (RO) Core affinity mask for job slot n, high word */ -#define JS_CONFIG 0x18 /* (RO) Configuration settings for job slot n */ -/* (RO) Extended affinity mask for job slot n*/ -#define JS_XAFFINITY 0x1C - -#define JS_COMMAND 0x20 /* (WO) Command register for job slot n */ -#define JS_STATUS 0x24 /* (RO) Status register for job slot n */ - -#define JS_HEAD_NEXT_LO 0x40 /* (RW) Next job queue head pointer for job slot n, low word */ -#define JS_HEAD_NEXT_HI 0x44 /* (RW) Next job queue head pointer for job slot n, high word */ - -#define JS_AFFINITY_NEXT_LO 0x50 /* (RW) Next core affinity mask for job slot n, low word */ -#define JS_AFFINITY_NEXT_HI 0x54 /* (RW) Next core affinity mask for job slot n, high word */ -#define JS_CONFIG_NEXT 0x58 /* (RW) Next configuration settings for job slot n */ -/* (RW) Next extended affinity mask for job slot n */ -#define JS_XAFFINITY_NEXT 0x5C - -#define JS_COMMAND_NEXT 0x60 /* (RW) Next command register for job slot n */ - -#define JS_FLUSH_ID_NEXT 0x70 /* (RW) Next job slot n cache flush ID */ +/* JM Job control register definitions for mali_kbase_debug_job_fault */ +#define JS_HEAD_LO 0x00 /* (RO) Job queue head pointer for job slot n, low word */ +#define JS_HEAD_HI 0x04 /* (RO) Job queue head pointer for job slot n, high word */ +#define JS_TAIL_LO 0x08 /* (RO) Job queue tail pointer for job slot n, low word */ +#define JS_TAIL_HI 0x0C /* (RO) Job queue tail pointer for job slot n, high word */ +#define JS_AFFINITY_LO 0x10 /* (RO) Core affinity mask for job slot n, low word */ +#define JS_AFFINITY_HI 0x14 /* (RO) Core affinity mask for job slot n, high word */ +#define JS_CONFIG 0x18 /* (RO) Configuration settings for job slot n */ +#define JS_XAFFINITY 0x1C /* (RO) Extended affinity mask for job slot n*/ +#define JS_COMMAND 0x20 /* (WO) Command register for job slot n */ +#define JS_STATUS 0x24 /* (RO) Status register for job slot n */ +#define JS_HEAD_NEXT_LO 0x40 /* (RW) Next job queue head pointer for job slot n, low word */ +#define JS_HEAD_NEXT_HI 0x44 /* (RW) Next job queue head pointer for job slot n, high word */ +#define JS_AFFINITY_NEXT_LO 0x50 /* (RW) Next core affinity mask for job slot n, low word */ +#define JS_AFFINITY_NEXT_HI 0x54 /* (RW) Next core affinity mask for job slot n, high word */ +#define JS_CONFIG_NEXT 0x58 /* (RW) Next configuration settings for job slot n */ +#define JS_XAFFINITY_NEXT 0x5C /* (RW) Next extended affinity mask for job slot n */ +#define JS_COMMAND_NEXT 0x60 /* (RW) Next 
command register for job slot n */ + +#define JS_FLUSH_ID_NEXT 0x70 /* (RW) Next job slot n cache flush ID */ /* No JM-specific MMU control registers */ /* No JM-specific MMU address space control registers */ /* JS_COMMAND register commands */ -#define JS_COMMAND_NOP 0x00 /* NOP Operation. Writing this value is ignored */ -#define JS_COMMAND_START 0x01 /* Start processing a job chain. Writing this value is ignored */ -#define JS_COMMAND_SOFT_STOP 0x02 /* Gently stop processing a job chain */ -#define JS_COMMAND_HARD_STOP 0x03 /* Rudely stop processing a job chain */ -#define JS_COMMAND_SOFT_STOP_0 0x04 /* Execute SOFT_STOP if JOB_CHAIN_FLAG is 0 */ -#define JS_COMMAND_HARD_STOP_0 0x05 /* Execute HARD_STOP if JOB_CHAIN_FLAG is 0 */ -#define JS_COMMAND_SOFT_STOP_1 0x06 /* Execute SOFT_STOP if JOB_CHAIN_FLAG is 1 */ -#define JS_COMMAND_HARD_STOP_1 0x07 /* Execute HARD_STOP if JOB_CHAIN_FLAG is 1 */ +#define JS_COMMAND_NOP 0x00 /* NOP Operation. Writing this value is ignored */ +#define JS_COMMAND_START 0x01 /* Start processing a job chain. Writing this value is ignored */ +#define JS_COMMAND_SOFT_STOP 0x02 /* Gently stop processing a job chain */ +#define JS_COMMAND_HARD_STOP 0x03 /* Rudely stop processing a job chain */ +#define JS_COMMAND_SOFT_STOP_0 0x04 /* Execute SOFT_STOP if JOB_CHAIN_FLAG is 0 */ +#define JS_COMMAND_HARD_STOP_0 0x05 /* Execute HARD_STOP if JOB_CHAIN_FLAG is 0 */ +#define JS_COMMAND_SOFT_STOP_1 0x06 /* Execute SOFT_STOP if JOB_CHAIN_FLAG is 1 */ +#define JS_COMMAND_HARD_STOP_1 0x07 /* Execute HARD_STOP if JOB_CHAIN_FLAG is 1 */ -#define JS_COMMAND_MASK 0x07 /* Mask of bits currently in use by the HW */ +#define JS_COMMAND_MASK 0x07 /* Mask of bits currently in use by the HW */ /* Possible values of JS_CONFIG and JS_CONFIG_NEXT registers */ #define JS_CONFIG_START_FLUSH_NO_ACTION (0u << 0) @@ -262,19 +255,22 @@ #define GPU_COMMAND_CACHE_CLN_INV_L2 GPU_COMMAND_CLEAN_INV_CACHES #define GPU_COMMAND_CACHE_CLN_INV_L2_LSC GPU_COMMAND_CLEAN_INV_CACHES #define GPU_COMMAND_CACHE_CLN_INV_FULL GPU_COMMAND_CLEAN_INV_CACHES +#define GPU_COMMAND_CACHE_CLN_INV_LSC GPU_COMMAND_CLEAN_INV_CACHES /* Merge cache flush commands */ #define GPU_COMMAND_FLUSH_CACHE_MERGE(cmd1, cmd2) \ ((cmd1) > (cmd2) ? (cmd1) : (cmd2)) /* IRQ flags */ -#define GPU_FAULT (1 << 0) /* A GPU Fault has occurred */ -#define MULTIPLE_GPU_FAULTS (1 << 7) /* More than one GPU Fault occurred. */ -#define RESET_COMPLETED (1 << 8) /* Set when a reset has completed. */ -#define POWER_CHANGED_SINGLE (1 << 9) /* Set when a single core has finished powering up or down. */ -#define POWER_CHANGED_ALL (1 << 10) /* Set when all cores have finished powering up or down. */ -#define PRFCNT_SAMPLE_COMPLETED (1 << 16) /* Set when a performance count sample has completed. */ -#define CLEAN_CACHES_COMPLETED (1 << 17) /* Set when a cache clean operation has completed. */ +#define GPU_FAULT (1 << 0) /* A GPU Fault has occurred */ +#define MULTIPLE_GPU_FAULTS (1 << 7) /* More than one GPU Fault occurred. */ +#define RESET_COMPLETED (1 << 8) /* Set when a reset has completed. */ +#define POWER_CHANGED_SINGLE (1 << 9) /* Set when a single core has finished powering up or down. */ +#define POWER_CHANGED_ALL (1 << 10) /* Set when all cores have finished powering up or down. */ +#define PRFCNT_SAMPLE_COMPLETED (1 << 16) /* Set when a performance count sample has completed. */ +#define CLEAN_CACHES_COMPLETED (1 << 17) /* Set when a cache clean operation has completed. 
*/ +#define FLUSH_PA_RANGE_COMPLETED \ + (1 << 20) /* Set when a physical range cache clean operation has completed. */ /* * In Debug build, diff --git a/mali_kbase/gpu/mali_kbase_gpu.c b/mali_kbase/gpu/mali_kbase_gpu.c index 8a84ef5..eee670f 100644 --- a/mali_kbase/gpu/mali_kbase_gpu.c +++ b/mali_kbase/gpu/mali_kbase_gpu.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -32,7 +32,7 @@ const char *kbase_gpu_access_type_name(u32 fault_status) return "READ"; case AS_FAULTSTATUS_ACCESS_TYPE_WRITE: return "WRITE"; - case AS_FAULTSTATUS_ACCESS_TYPE_EX: + case AS_FAULTSTATUS_ACCESS_TYPE_EXECUTE: return "EXECUTE"; default: WARN_ON(1); diff --git a/mali_kbase/gpu/mali_kbase_gpu_fault.h b/mali_kbase/gpu/mali_kbase_gpu_fault.h index 8b50a5d..6a937a5 100644 --- a/mali_kbase/gpu/mali_kbase_gpu_fault.h +++ b/mali_kbase/gpu/mali_kbase_gpu_fault.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2019-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -27,9 +27,9 @@ * * @exception_code: exception code * - * This function is called from the interrupt handler when a GPU fault occurs. + * This function is called by error handlers when GPU reports an error. * - * Return: name associated with the exception code + * Return: Error string associated with the exception code */ const char *kbase_gpu_exception_name(u32 exception_code); diff --git a/mali_kbase/gpu/mali_kbase_gpu_regmap.h b/mali_kbase/gpu/mali_kbase_gpu_regmap.h index 1d2a49b..a92b498 100644 --- a/mali_kbase/gpu/mali_kbase_gpu_regmap.h +++ b/mali_kbase/gpu/mali_kbase_gpu_regmap.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2010-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2010-2023 ARM Limited. All rights reserved. 
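kbase_gpu_access_type_name() takes the raw fault status word; with the rename above, the execute case now uses the spelled-out AS_FAULTSTATUS_ACCESS_TYPE_EXECUTE constant. A hedged logging sketch; the wrapper function is hypothetical:

/* Illustration only: report the access type of a fault by name. */
static void example_log_access_type(struct kbase_device *kbdev, u32 fault_status)
{
	dev_warn(kbdev->dev, "GPU fault access type: %s",
		 kbase_gpu_access_type_name(fault_status));
}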
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -25,6 +25,7 @@ #include <uapi/gpu/arm/midgard/gpu/mali_kbase_gpu_regmap.h> #include <uapi/gpu/arm/midgard/gpu/mali_kbase_gpu_coherency.h> #include <uapi/gpu/arm/midgard/gpu/mali_kbase_gpu_id.h> + #if MALI_USE_CSF #include "backend/mali_kbase_gpu_regmap_csf.h" #else @@ -34,15 +35,21 @@ /* GPU_U definition */ #ifdef __ASSEMBLER__ #define GPU_U(x) x +#define GPU_UL(x) x +#define GPU_ULL(x) x #else #define GPU_U(x) x##u +#define GPU_UL(x) x##ul +#define GPU_ULL(x) x##ull #endif /* __ASSEMBLER__ */ + /* Begin Register Offsets */ /* GPU control registers */ #define GPU_CONTROL_BASE 0x0000 #define GPU_CONTROL_REG(r) (GPU_CONTROL_BASE + (r)) + #define GPU_ID 0x000 /* (RO) GPU and revision identifier */ #define L2_FEATURES 0x004 /* (RO) Level 2 cache features */ #define TILER_FEATURES 0x00C /* (RO) Tiler Features */ @@ -53,9 +60,12 @@ #define GPU_IRQ_CLEAR 0x024 /* (WO) */ #define GPU_IRQ_MASK 0x028 /* (RW) */ #define GPU_IRQ_STATUS 0x02C /* (RO) */ - #define GPU_COMMAND 0x030 /* (WO) */ + #define GPU_STATUS 0x034 /* (RO) */ +#define GPU_STATUS_PRFCNT_ACTIVE (1 << 2) /* Set if the performance counters are active. */ +#define GPU_STATUS_CYCLE_COUNT_ACTIVE (1 << 6) /* Set if the cycle counter is active. */ +#define GPU_STATUS_PROTECTED_MODE_ACTIVE (1 << 7) /* Set if protected mode is active */ #define GPU_DBGEN (1 << 8) /* DBGEN wire status */ @@ -65,10 +75,9 @@ #define L2_CONFIG 0x048 /* (RW) Level 2 cache configuration */ -#define GROUPS_L2_COHERENT (1 << 0) /* Cores groups are l2 coherent */ -#define SUPER_L2_COHERENT (1 << 1) /* Shader cores within a core - * supergroup are l2 coherent - */ +/* Cores groups are l2 coherent */ +#define MEM_FEATURES_COHERENT_CORE_GROUP_SHIFT GPU_U(0) +#define MEM_FEATURES_COHERENT_CORE_GROUP_MASK (GPU_U(0x1) << MEM_FEATURES_COHERENT_CORE_GROUP_SHIFT) #define PWR_KEY 0x050 /* (WO) Power manager key register */ #define PWR_OVERRIDE0 0x054 /* (RW) Power manager override settings */ @@ -96,6 +105,11 @@ #define TEXTURE_FEATURES_REG(n) GPU_CONTROL_REG(TEXTURE_FEATURES_0 + ((n) << 2)) +#define GPU_COMMAND_ARG0_LO 0x0D0 /* (RW) Additional parameter 0 for GPU commands, low word */ +#define GPU_COMMAND_ARG0_HI 0x0D4 /* (RW) Additional parameter 0 for GPU commands, high word */ +#define GPU_COMMAND_ARG1_LO 0x0D8 /* (RW) Additional parameter 1 for GPU commands, low word */ +#define GPU_COMMAND_ARG1_HI 0x0DC /* (RW) Additional parameter 1 for GPU commands, high word */ + #define SHADER_PRESENT_LO 0x100 /* (RO) Shader core present bitmap, low word */ #define SHADER_PRESENT_HI 0x104 /* (RO) Shader core present bitmap, high word */ @@ -105,9 +119,6 @@ #define L2_PRESENT_LO 0x120 /* (RO) Level 2 cache present bitmap, low word */ #define L2_PRESENT_HI 0x124 /* (RO) Level 2 cache present bitmap, high word */ -#define STACK_PRESENT_LO 0xE00 /* (RO) Core stack present bitmap, low word */ -#define STACK_PRESENT_HI 0xE04 /* (RO) Core stack present bitmap, high word */ - #define SHADER_READY_LO 0x140 /* (RO) Shader core ready bitmap, low word */ #define SHADER_READY_HI 0x144 /* (RO) Shader core ready bitmap, high word */ @@ -117,18 +128,23 @@ #define L2_READY_LO 0x160 /* (RO) Level 2 cache ready bitmap, low word */ #define L2_READY_HI 0x164 /* (RO) Level 2 cache ready bitmap, high word */ -#define STACK_READY_LO 0xE10 /* (RO) Core stack ready bitmap, low word */ -#define STACK_READY_HI 0xE14 /* (RO) Core stack ready bitmap, high word 
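GPU_COMMAND_ARG0/ARG1 are conventional split 64-bit registers: the low and high words are written separately before issuing the command that consumes them (for example a FLUSH_PA_RANGE). A hedged sketch using the driver's existing kbase_reg_write() accessor; the helper itself is illustrative:

/* Illustration only: program a 64-bit argument into the GPU_COMMAND_ARG0
 * register pair before issuing a command that consumes it.
 */
static void example_write_command_arg0(struct kbase_device *kbdev, u64 arg)
{
	kbase_reg_write(kbdev, GPU_CONTROL_REG(GPU_COMMAND_ARG0_LO),
			(u32)(arg & 0xFFFFFFFFull));
	kbase_reg_write(kbdev, GPU_CONTROL_REG(GPU_COMMAND_ARG0_HI),
			(u32)(arg >> 32));
}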
*/ - #define SHADER_PWRON_LO 0x180 /* (WO) Shader core power on bitmap, low word */ #define SHADER_PWRON_HI 0x184 /* (WO) Shader core power on bitmap, high word */ +#define SHADER_PWRFEATURES 0x188 /* (RW) Shader core power features */ + #define TILER_PWRON_LO 0x190 /* (WO) Tiler core power on bitmap, low word */ #define TILER_PWRON_HI 0x194 /* (WO) Tiler core power on bitmap, high word */ #define L2_PWRON_LO 0x1A0 /* (WO) Level 2 cache power on bitmap, low word */ #define L2_PWRON_HI 0x1A4 /* (WO) Level 2 cache power on bitmap, high word */ +#define STACK_PRESENT_LO 0xE00 /* (RO) Core stack present bitmap, low word */ +#define STACK_PRESENT_HI 0xE04 /* (RO) Core stack present bitmap, high word */ + +#define STACK_READY_LO 0xE10 /* (RO) Core stack ready bitmap, low word */ +#define STACK_READY_HI 0xE14 /* (RO) Core stack ready bitmap, high word */ + #define STACK_PWRON_LO 0xE20 /* (RO) Core stack power on bitmap, low word */ #define STACK_PWRON_HI 0xE24 /* (RO) Core stack power on bitmap, high word */ @@ -176,6 +192,8 @@ #define COHERENCY_FEATURES 0x300 /* (RO) Coherency features present */ #define COHERENCY_ENABLE 0x304 /* (RW) Coherency enable */ +#define AMBA_FEATURES 0x300 /* (RO) AMBA bus supported features */ +#define AMBA_ENABLE 0x304 /* (RW) AMBA features enable */ #define SHADER_CONFIG 0xF04 /* (RW) Shader core configuration (implementation-specific) */ #define TILER_CONFIG 0xF08 /* (RW) Tiler core configuration (implementation-specific) */ @@ -184,7 +202,6 @@ /* Job control registers */ #define JOB_CONTROL_BASE 0x1000 - #define JOB_CONTROL_REG(r) (JOB_CONTROL_BASE + (r)) #define JOB_IRQ_RAWSTAT 0x000 /* Raw interrupt status register */ @@ -194,6 +211,10 @@ /* MMU control registers */ +#define MMU_CONTROL_BASE 0x2000 +#define MMU_CONTROL_REG(r) (MMU_CONTROL_BASE + (r)) + +#define MMU_IRQ_RAWSTAT 0x000 /* (RW) Raw interrupt status register */ #define MMU_IRQ_CLEAR 0x004 /* (WO) Interrupt clear register */ #define MMU_IRQ_MASK 0x008 /* (RW) Interrupt mask register */ #define MMU_IRQ_STATUS 0x00C /* (RO) Interrupt status register */ @@ -217,28 +238,26 @@ /* MMU address space control registers */ -#define MMU_AS_REG(n, r) (MMU_REG(MMU_AS0 + ((n) << 6)) + (r)) - -#define AS_TRANSTAB_LO 0x00 /* (RW) Translation Table Base Address for address space n, low word */ -#define AS_TRANSTAB_HI 0x04 /* (RW) Translation Table Base Address for address space n, high word */ -#define AS_MEMATTR_LO 0x08 /* (RW) Memory attributes for address space n, low word. */ -#define AS_MEMATTR_HI 0x0C /* (RW) Memory attributes for address space n, high word. 
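The MMU interrupt registers are now addressed through a dedicated MMU_CONTROL_REG() wrapper. A hedged sketch of reading the raw MMU IRQ status with the driver's existing kbase_reg_read() accessor (the helper is illustrative):

/* Illustration only: read the raw MMU interrupt status using the new
 * MMU_CONTROL_REG() addressing.
 */
static u32 example_read_mmu_irq_status(struct kbase_device *kbdev)
{
	return kbase_reg_read(kbdev, MMU_CONTROL_REG(MMU_IRQ_RAWSTAT));
}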
*/ -#define AS_LOCKADDR_LO 0x10 /* (RW) Lock region address for address space n, low word */ -#define AS_LOCKADDR_HI 0x14 /* (RW) Lock region address for address space n, high word */ -#define AS_COMMAND 0x18 /* (WO) MMU command register for address space n */ -#define AS_FAULTSTATUS 0x1C /* (RO) MMU fault status register for address space n */ -#define AS_FAULTADDRESS_LO 0x20 /* (RO) Fault Address for address space n, low word */ -#define AS_FAULTADDRESS_HI 0x24 /* (RO) Fault Address for address space n, high word */ -#define AS_STATUS 0x28 /* (RO) Status flags for address space n */ - -/* (RW) Translation table configuration for address space n, low word */ -#define AS_TRANSCFG_LO 0x30 -/* (RW) Translation table configuration for address space n, high word */ -#define AS_TRANSCFG_HI 0x34 -/* (RO) Secondary fault address for address space n, low word */ -#define AS_FAULTEXTRA_LO 0x38 -/* (RO) Secondary fault address for address space n, high word */ -#define AS_FAULTEXTRA_HI 0x3C +#define MMU_STAGE1 0x2000 /* () MMU control registers */ +#define MMU_STAGE1_REG(r) (MMU_STAGE1 + (r)) + +#define MMU_AS_REG(n, r) (MMU_AS0 + ((n) << 6) + (r)) + +#define AS_TRANSTAB_LO 0x00 /* (RW) Translation Table Base Address for address space n, low word */ +#define AS_TRANSTAB_HI 0x04 /* (RW) Translation Table Base Address for address space n, high word */ +#define AS_MEMATTR_LO 0x08 /* (RW) Memory attributes for address space n, low word. */ +#define AS_MEMATTR_HI 0x0C /* (RW) Memory attributes for address space n, high word. */ +#define AS_LOCKADDR_LO 0x10 /* (RW) Lock region address for address space n, low word */ +#define AS_LOCKADDR_HI 0x14 /* (RW) Lock region address for address space n, high word */ +#define AS_COMMAND 0x18 /* (WO) MMU command register for address space n */ +#define AS_FAULTSTATUS 0x1C /* (RO) MMU fault status register for address space n */ +#define AS_FAULTADDRESS_LO 0x20 /* (RO) Fault Address for address space n, low word */ +#define AS_FAULTADDRESS_HI 0x24 /* (RO) Fault Address for address space n, high word */ +#define AS_STATUS 0x28 /* (RO) Status flags for address space n */ +#define AS_TRANSCFG_LO 0x30 /* (RW) Translation table configuration for address space n, low word */ +#define AS_TRANSCFG_HI 0x34 /* (RW) Translation table configuration for address space n, high word */ +#define AS_FAULTEXTRA_LO 0x38 /* (RO) Secondary fault address for address space n, low word */ +#define AS_FAULTEXTRA_HI 0x3C /* (RO) Secondary fault address for address space n, high word */ /* End Register Offsets */ @@ -288,7 +307,7 @@ (((reg_val)&AS_FAULTSTATUS_ACCESS_TYPE_MASK) >> AS_FAULTSTATUS_ACCESS_TYPE_SHIFT) #define AS_FAULTSTATUS_ACCESS_TYPE_ATOMIC (0x0) -#define AS_FAULTSTATUS_ACCESS_TYPE_EX (0x1) +#define AS_FAULTSTATUS_ACCESS_TYPE_EXECUTE (0x1) #define AS_FAULTSTATUS_ACCESS_TYPE_READ (0x2) #define AS_FAULTSTATUS_ACCESS_TYPE_WRITE (0x3) @@ -355,8 +374,8 @@ (((value) << AS_LOCKADDR_LOCKADDR_SIZE_SHIFT) & \ AS_LOCKADDR_LOCKADDR_SIZE_MASK)) #define AS_LOCKADDR_LOCKADDR_BASE_SHIFT GPU_U(12) -#define AS_LOCKADDR_LOCKADDR_BASE_MASK \ - (GPU_U(0xFFFFFFFFFFFFF) << AS_LOCKADDR_LOCKADDR_BASE_SHIFT) +#define AS_LOCKADDR_LOCKADDR_BASE_MASK \ + (GPU_ULL(0xFFFFFFFFFFFFF) << AS_LOCKADDR_LOCKADDR_BASE_SHIFT) #define AS_LOCKADDR_LOCKADDR_BASE_GET(reg_val) \ (((reg_val)&AS_LOCKADDR_LOCKADDR_BASE_MASK) >> \ AS_LOCKADDR_LOCKADDR_BASE_SHIFT) @@ -364,11 +383,11 @@ (((reg_val) & ~AS_LOCKADDR_LOCKADDR_BASE_MASK) | \ (((value) << AS_LOCKADDR_LOCKADDR_BASE_SHIFT) & \ AS_LOCKADDR_LOCKADDR_BASE_MASK)) - -/* 
GPU_STATUS values */ -#define GPU_STATUS_PRFCNT_ACTIVE (1 << 2) /* Set if the performance counters are active. */ -#define GPU_STATUS_CYCLE_COUNT_ACTIVE (1 << 6) /* Set if the cycle counter is active. */ -#define GPU_STATUS_PROTECTED_MODE_ACTIVE (1 << 7) /* Set if protected mode is active */ +#define AS_LOCKADDR_FLUSH_SKIP_LEVELS_SHIFT (6) +#define AS_LOCKADDR_FLUSH_SKIP_LEVELS_MASK ((0xF) << AS_LOCKADDR_FLUSH_SKIP_LEVELS_SHIFT) +#define AS_LOCKADDR_FLUSH_SKIP_LEVELS_SET(reg_val, value) \ + (((reg_val) & ~AS_LOCKADDR_FLUSH_SKIP_LEVELS_MASK) | \ + ((value << AS_LOCKADDR_FLUSH_SKIP_LEVELS_SHIFT) & AS_LOCKADDR_FLUSH_SKIP_LEVELS_MASK)) /* PRFCNT_CONFIG register values */ #define PRFCNT_CONFIG_MODE_SHIFT 0 /* Counter mode position. */ @@ -454,6 +473,60 @@ #define L2_CONFIG_ASN_HASH_ENABLE_MASK (1ul << L2_CONFIG_ASN_HASH_ENABLE_SHIFT) /* End L2_CONFIG register */ +/* AMBA_FEATURES register */ +#define AMBA_FEATURES_ACE_LITE_SHIFT GPU_U(0) +#define AMBA_FEATURES_ACE_LITE_MASK (GPU_U(0x1) << AMBA_FEATURES_ACE_LITE_SHIFT) +#define AMBA_FEATURES_ACE_LITE_GET(reg_val) \ + (((reg_val)&AMBA_FEATURES_ACE_LITE_MASK) >> \ + AMBA_FEATURES_ACE_LITE_SHIFT) +#define AMBA_FEATURES_ACE_LITE_SET(reg_val, value) \ + (((reg_val) & ~AMBA_FEATURES_ACE_LITE_MASK) | \ + (((value) << AMBA_FEATURES_ACE_LITE_SHIFT) & \ + AMBA_FEATURES_ACE_LITE_MASK)) +#define AMBA_FEATURES_ACE_SHIFT GPU_U(1) +#define AMBA_FEATURES_ACE_MASK (GPU_U(0x1) << AMBA_FEATURES_ACE_SHIFT) +#define AMBA_FEATURES_ACE_GET(reg_val) \ + (((reg_val)&AMBA_FEATURES_ACE_MASK) >> AMBA_FEATURES_ACE_SHIFT) +#define AMBA_FEATURES_ACE_SET(reg_val, value) \ + (((reg_val) & ~AMBA_FEATURES_ACE_MASK) | \ + (((value) << AMBA_FEATURES_ACE_SHIFT) & AMBA_FEATURES_ACE_MASK)) +#define AMBA_FEATURES_MEMORY_CACHE_SUPPORT_SHIFT GPU_U(5) +#define AMBA_FEATURES_MEMORY_CACHE_SUPPORT_MASK \ + (GPU_U(0x1) << AMBA_FEATURES_MEMORY_CACHE_SUPPORT_SHIFT) +#define AMBA_FEATURES_MEMORY_CACHE_SUPPORT_GET(reg_val) \ + (((reg_val)&AMBA_FEATURES_MEMORY_CACHE_SUPPORT_MASK) >> \ + AMBA_FEATURES_MEMORY_CACHE_SUPPORT_SHIFT) +#define AMBA_FEATURES_MEMORY_CACHE_SUPPORT_SET(reg_val, value) \ + (((reg_val) & ~AMBA_FEATURES_MEMORY_CACHE_SUPPORT_MASK) | \ + (((value) << AMBA_FEATURES_MEMORY_CACHE_SUPPORT_SHIFT) & \ + AMBA_FEATURES_MEMORY_CACHE_SUPPORT_MASK)) + +/* AMBA_ENABLE register */ +#define AMBA_ENABLE_COHERENCY_PROTOCOL_SHIFT GPU_U(0) +#define AMBA_ENABLE_COHERENCY_PROTOCOL_MASK \ + (GPU_U(0x1F) << AMBA_ENABLE_COHERENCY_PROTOCOL_SHIFT) +#define AMBA_ENABLE_COHERENCY_PROTOCOL_GET(reg_val) \ + (((reg_val)&AMBA_ENABLE_COHERENCY_PROTOCOL_MASK) >> \ + AMBA_ENABLE_COHERENCY_PROTOCOL_SHIFT) +#define AMBA_ENABLE_COHERENCY_PROTOCOL_SET(reg_val, value) \ + (((reg_val) & ~AMBA_ENABLE_COHERENCY_PROTOCOL_MASK) | \ + (((value) << AMBA_ENABLE_COHERENCY_PROTOCOL_SHIFT) & \ + AMBA_ENABLE_COHERENCY_PROTOCOL_MASK)) +/* AMBA_ENABLE_coherency_protocol values */ +#define AMBA_ENABLE_COHERENCY_PROTOCOL_ACE_LITE 0x0 +#define AMBA_ENABLE_COHERENCY_PROTOCOL_ACE 0x1 +#define AMBA_ENABLE_COHERENCY_PROTOCOL_NO_COHERENCY 0x1F +/* End of AMBA_ENABLE_coherency_protocol values */ +#define AMBA_ENABLE_MEMORY_CACHE_SUPPORT_SHIFT GPU_U(5) +#define AMBA_ENABLE_MEMORY_CACHE_SUPPORT_MASK \ + (GPU_U(0x1) << AMBA_ENABLE_MEMORY_CACHE_SUPPORT_SHIFT) +#define AMBA_ENABLE_MEMORY_CACHE_SUPPORT_GET(reg_val) \ + (((reg_val)&AMBA_ENABLE_MEMORY_CACHE_SUPPORT_MASK) >> \ + AMBA_ENABLE_MEMORY_CACHE_SUPPORT_SHIFT) +#define AMBA_ENABLE_MEMORY_CACHE_SUPPORT_SET(reg_val, value) \ + (((reg_val) & ~AMBA_ENABLE_MEMORY_CACHE_SUPPORT_MASK) | \ + 
(((value) << AMBA_ENABLE_MEMORY_CACHE_SUPPORT_SHIFT) & \ + AMBA_ENABLE_MEMORY_CACHE_SUPPORT_MASK)) /* IDVS_GROUP register */ #define IDVS_GROUP_SIZE_SHIFT (16) diff --git a/mali_kbase/hwcnt/Kbuild b/mali_kbase/hwcnt/Kbuild new file mode 100644 index 0000000..8c8775f --- /dev/null +++ b/mali_kbase/hwcnt/Kbuild @@ -0,0 +1,37 @@ +# SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note +# +# (C) COPYRIGHT 2022 ARM Limited. All rights reserved. +# +# This program is free software and is provided to you under the terms of the +# GNU General Public License version 2 as published by the Free Software +# Foundation, and any use by you of this program is subject to the terms +# of such GNU license. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, you can access it online at +# http://www.gnu.org/licenses/gpl-2.0.html. +# +# + +mali_kbase-y += \ + hwcnt/mali_kbase_hwcnt.o \ + hwcnt/mali_kbase_hwcnt_gpu.o \ + hwcnt/mali_kbase_hwcnt_gpu_narrow.o \ + hwcnt/mali_kbase_hwcnt_types.o \ + hwcnt/mali_kbase_hwcnt_virtualizer.o \ + hwcnt/mali_kbase_hwcnt_watchdog_if_timer.o + +ifeq ($(CONFIG_MALI_CSF_SUPPORT),y) + mali_kbase-y += \ + hwcnt/backend/mali_kbase_hwcnt_backend_csf.o \ + hwcnt/backend/mali_kbase_hwcnt_backend_csf_if_fw.o +else + mali_kbase-y += \ + hwcnt/backend/mali_kbase_hwcnt_backend_jm.o \ + hwcnt/backend/mali_kbase_hwcnt_backend_jm_watchdog.o +endif diff --git a/mali_kbase/mali_kbase_hwcnt_backend.h b/mali_kbase/hwcnt/backend/mali_kbase_hwcnt_backend.h index b069fc1..6cfa6f5 100644 --- a/mali_kbase/mali_kbase_hwcnt_backend.h +++ b/mali_kbase/hwcnt/backend/mali_kbase_hwcnt_backend.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2018, 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2018, 2020-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -56,8 +56,8 @@ struct kbase_hwcnt_backend; * * Return: Non-NULL pointer to immutable hardware counter metadata. */ -typedef const struct kbase_hwcnt_metadata *kbase_hwcnt_backend_metadata_fn( - const struct kbase_hwcnt_backend_info *info); +typedef const struct kbase_hwcnt_metadata * +kbase_hwcnt_backend_metadata_fn(const struct kbase_hwcnt_backend_info *info); /** * typedef kbase_hwcnt_backend_init_fn - Initialise a counter backend. @@ -69,9 +69,8 @@ typedef const struct kbase_hwcnt_metadata *kbase_hwcnt_backend_metadata_fn( * * Return: 0 on success, else error code. */ -typedef int kbase_hwcnt_backend_init_fn( - const struct kbase_hwcnt_backend_info *info, - struct kbase_hwcnt_backend **out_backend); +typedef int kbase_hwcnt_backend_init_fn(const struct kbase_hwcnt_backend_info *info, + struct kbase_hwcnt_backend **out_backend); /** * typedef kbase_hwcnt_backend_term_fn - Terminate a counter backend. @@ -86,8 +85,7 @@ typedef void kbase_hwcnt_backend_term_fn(struct kbase_hwcnt_backend *backend); * * Return: Backend timestamp in nanoseconds. 
*/ -typedef u64 kbase_hwcnt_backend_timestamp_ns_fn( - struct kbase_hwcnt_backend *backend); +typedef u64 kbase_hwcnt_backend_timestamp_ns_fn(struct kbase_hwcnt_backend *backend); /** * typedef kbase_hwcnt_backend_dump_enable_fn - Start counter dumping with the @@ -102,9 +100,8 @@ typedef u64 kbase_hwcnt_backend_timestamp_ns_fn( * * Return: 0 on success, else error code. */ -typedef int kbase_hwcnt_backend_dump_enable_fn( - struct kbase_hwcnt_backend *backend, - const struct kbase_hwcnt_enable_map *enable_map); +typedef int kbase_hwcnt_backend_dump_enable_fn(struct kbase_hwcnt_backend *backend, + const struct kbase_hwcnt_enable_map *enable_map); /** * typedef kbase_hwcnt_backend_dump_enable_nolock_fn - Start counter dumping @@ -118,9 +115,9 @@ typedef int kbase_hwcnt_backend_dump_enable_fn( * * Return: 0 on success, else error code. */ -typedef int kbase_hwcnt_backend_dump_enable_nolock_fn( - struct kbase_hwcnt_backend *backend, - const struct kbase_hwcnt_enable_map *enable_map); +typedef int +kbase_hwcnt_backend_dump_enable_nolock_fn(struct kbase_hwcnt_backend *backend, + const struct kbase_hwcnt_enable_map *enable_map); /** * typedef kbase_hwcnt_backend_dump_disable_fn - Disable counter dumping with @@ -130,8 +127,7 @@ typedef int kbase_hwcnt_backend_dump_enable_nolock_fn( * If the backend is already disabled, does nothing. * Any undumped counter values since the last dump get will be lost. */ -typedef void kbase_hwcnt_backend_dump_disable_fn( - struct kbase_hwcnt_backend *backend); +typedef void kbase_hwcnt_backend_dump_disable_fn(struct kbase_hwcnt_backend *backend); /** * typedef kbase_hwcnt_backend_dump_clear_fn - Reset all the current undumped @@ -142,8 +138,7 @@ typedef void kbase_hwcnt_backend_dump_disable_fn( * * Return: 0 on success, else error code. */ -typedef int kbase_hwcnt_backend_dump_clear_fn( - struct kbase_hwcnt_backend *backend); +typedef int kbase_hwcnt_backend_dump_clear_fn(struct kbase_hwcnt_backend *backend); /** * typedef kbase_hwcnt_backend_dump_request_fn - Request an asynchronous counter @@ -157,9 +152,8 @@ typedef int kbase_hwcnt_backend_dump_clear_fn( * * Return: 0 on success, else error code. */ -typedef int kbase_hwcnt_backend_dump_request_fn( - struct kbase_hwcnt_backend *backend, - u64 *dump_time_ns); +typedef int kbase_hwcnt_backend_dump_request_fn(struct kbase_hwcnt_backend *backend, + u64 *dump_time_ns); /** * typedef kbase_hwcnt_backend_dump_wait_fn - Wait until the last requested @@ -170,8 +164,7 @@ typedef int kbase_hwcnt_backend_dump_request_fn( * * Return: 0 on success, else error code. */ -typedef int kbase_hwcnt_backend_dump_wait_fn( - struct kbase_hwcnt_backend *backend); +typedef int kbase_hwcnt_backend_dump_wait_fn(struct kbase_hwcnt_backend *backend); /** * typedef kbase_hwcnt_backend_dump_get_fn - Copy or accumulate enable the @@ -189,11 +182,10 @@ typedef int kbase_hwcnt_backend_dump_wait_fn( * * Return: 0 on success, else error code. 
*/ -typedef int kbase_hwcnt_backend_dump_get_fn( - struct kbase_hwcnt_backend *backend, - struct kbase_hwcnt_dump_buffer *dump_buffer, - const struct kbase_hwcnt_enable_map *enable_map, - bool accumulate); +typedef int kbase_hwcnt_backend_dump_get_fn(struct kbase_hwcnt_backend *backend, + struct kbase_hwcnt_dump_buffer *dump_buffer, + const struct kbase_hwcnt_enable_map *enable_map, + bool accumulate); /** * struct kbase_hwcnt_backend_interface - Hardware counter backend virtual diff --git a/mali_kbase/mali_kbase_hwcnt_backend_csf.c b/mali_kbase/hwcnt/backend/mali_kbase_hwcnt_backend_csf.c index c42f2a0..27acfc6 100644 --- a/mali_kbase/mali_kbase_hwcnt_backend_csf.c +++ b/mali_kbase/hwcnt/backend/mali_kbase_hwcnt_backend_csf.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2021-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2021-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -19,9 +19,9 @@ * */ -#include "mali_kbase_hwcnt_backend_csf.h" -#include "mali_kbase_hwcnt_gpu.h" -#include "mali_kbase_hwcnt_types.h" +#include "hwcnt/backend/mali_kbase_hwcnt_backend_csf.h" +#include "hwcnt/mali_kbase_hwcnt_gpu.h" +#include "hwcnt/mali_kbase_hwcnt_types.h" #include <linux/log2.h> #include <linux/kernel.h> @@ -36,8 +36,13 @@ #define BASE_MAX_NR_CLOCKS_REGULATORS 2 #endif +#if IS_ENABLED(CONFIG_MALI_IS_FPGA) && !IS_ENABLED(CONFIG_MALI_NO_MALI) +/* Backend watch dog timer interval in milliseconds: 18 seconds. */ +#define HWCNT_BACKEND_WATCHDOG_TIMER_INTERVAL_MS ((u32)18000) +#else /* Backend watch dog timer interval in milliseconds: 1 second. */ #define HWCNT_BACKEND_WATCHDOG_TIMER_INTERVAL_MS ((u32)1000) +#endif /* IS_FPGA && !NO_MALI */ /** * enum kbase_hwcnt_backend_csf_dump_state - HWC CSF backend dumping states. @@ -168,23 +173,29 @@ struct kbase_hwcnt_backend_csf_info { /** * struct kbase_hwcnt_csf_physical_layout - HWC sample memory physical layout * information. + * @hw_block_cnt: Total number of hardware counters blocks. The hw counters blocks are + * sub-categorized into 4 classes: front-end, tiler, memory system, and shader. + * hw_block_cnt = fe_cnt + tiler_cnt + mmu_l2_cnt + shader_cnt. * @fe_cnt: Front end block count. * @tiler_cnt: Tiler block count. - * @mmu_l2_cnt: Memory system(MMU and L2 cache) block count. + * @mmu_l2_cnt: Memory system (MMU and L2 cache) block count. * @shader_cnt: Shader Core block count. - * @block_cnt: Total block count (sum of all other block counts). + * @fw_block_cnt: Total number of firmware counters blocks. + * @block_cnt: Total block count (sum of all counter blocks: hw_block_cnt + fw_block_cnt). * @shader_avail_mask: Bitmap of all shader cores in the system. * @enable_mask_offset: Offset in array elements of enable mask in each block * starting from the beginning of block. - * @headers_per_block: Header size per block. - * @counters_per_block: Counters size per block. - * @values_per_block: Total size per block. + * @headers_per_block: For any block, the number of counters designated as block's header. + * @counters_per_block: For any block, the number of counters designated as block's payload. + * @values_per_block: For any block, the number of counters in total (header + payload). 
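A concrete, hypothetical sizing may make the split clearer: with 1 front-end block, 1 tiler block, 2 MMU/L2 blocks and 8 shader cores, hw_block_cnt = 1 + 1 + 2 + 8 = 12; with a single firmware block, fw_block_cnt = 1 and block_cnt = 13. Likewise, a 256-byte sample block made of 4-byte hardware counter values would give values_per_block = 64, split between headers_per_block and counters_per_block.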
*/ struct kbase_hwcnt_csf_physical_layout { + u8 hw_block_cnt; u8 fe_cnt; u8 tiler_cnt; u8 mmu_l2_cnt; u8 shader_cnt; + u8 fw_block_cnt; u8 block_cnt; u64 shader_avail_mask; size_t enable_mask_offset; @@ -256,8 +267,7 @@ struct kbase_hwcnt_backend_csf { struct work_struct hwc_threshold_work; }; -static bool kbasep_hwcnt_backend_csf_backend_exists( - struct kbase_hwcnt_backend_csf_info *csf_info) +static bool kbasep_hwcnt_backend_csf_backend_exists(struct kbase_hwcnt_backend_csf_info *csf_info) { WARN_ON(!csf_info); csf_info->csf_if->assert_lock_held(csf_info->csf_if->ctx); @@ -271,19 +281,22 @@ static bool kbasep_hwcnt_backend_csf_backend_exists( * @backend_csf: Non-NULL pointer to backend. * @enable_map: Non-NULL pointer to enable map specifying enabled counters. */ -static void kbasep_hwcnt_backend_csf_cc_initial_sample( - struct kbase_hwcnt_backend_csf *backend_csf, - const struct kbase_hwcnt_enable_map *enable_map) +static void +kbasep_hwcnt_backend_csf_cc_initial_sample(struct kbase_hwcnt_backend_csf *backend_csf, + const struct kbase_hwcnt_enable_map *enable_map) { u64 clk_enable_map = enable_map->clk_enable_map; u64 cycle_counts[BASE_MAX_NR_CLOCKS_REGULATORS]; size_t clk; + memset(cycle_counts, 0, sizeof(cycle_counts)); + /* Read cycle count from CSF interface for both clock domains. */ - backend_csf->info->csf_if->get_gpu_cycle_count( - backend_csf->info->csf_if->ctx, cycle_counts, clk_enable_map); + backend_csf->info->csf_if->get_gpu_cycle_count(backend_csf->info->csf_if->ctx, cycle_counts, + clk_enable_map); - kbase_hwcnt_metadata_for_each_clock(enable_map->metadata, clk) { + kbase_hwcnt_metadata_for_each_clock(enable_map->metadata, clk) + { if (kbase_hwcnt_clk_enable_map_enabled(clk_enable_map, clk)) backend_csf->prev_cycle_count[clk] = cycle_counts[clk]; } @@ -292,42 +305,37 @@ static void kbasep_hwcnt_backend_csf_cc_initial_sample( backend_csf->clk_enable_map = clk_enable_map; } -static void -kbasep_hwcnt_backend_csf_cc_update(struct kbase_hwcnt_backend_csf *backend_csf) +static void kbasep_hwcnt_backend_csf_cc_update(struct kbase_hwcnt_backend_csf *backend_csf) { u64 cycle_counts[BASE_MAX_NR_CLOCKS_REGULATORS]; size_t clk; - backend_csf->info->csf_if->assert_lock_held( - backend_csf->info->csf_if->ctx); + memset(cycle_counts, 0, sizeof(cycle_counts)); + + backend_csf->info->csf_if->assert_lock_held(backend_csf->info->csf_if->ctx); - backend_csf->info->csf_if->get_gpu_cycle_count( - backend_csf->info->csf_if->ctx, cycle_counts, - backend_csf->clk_enable_map); + backend_csf->info->csf_if->get_gpu_cycle_count(backend_csf->info->csf_if->ctx, cycle_counts, + backend_csf->clk_enable_map); - kbase_hwcnt_metadata_for_each_clock(backend_csf->info->metadata, clk) { - if (kbase_hwcnt_clk_enable_map_enabled( - backend_csf->clk_enable_map, clk)) { + kbase_hwcnt_metadata_for_each_clock(backend_csf->info->metadata, clk) + { + if (kbase_hwcnt_clk_enable_map_enabled(backend_csf->clk_enable_map, clk)) { backend_csf->cycle_count_elapsed[clk] = - cycle_counts[clk] - - backend_csf->prev_cycle_count[clk]; + cycle_counts[clk] - backend_csf->prev_cycle_count[clk]; backend_csf->prev_cycle_count[clk] = cycle_counts[clk]; } } } /* CSF backend implementation of kbase_hwcnt_backend_timestamp_ns_fn */ -static u64 -kbasep_hwcnt_backend_csf_timestamp_ns(struct kbase_hwcnt_backend *backend) +static u64 kbasep_hwcnt_backend_csf_timestamp_ns(struct kbase_hwcnt_backend *backend) { - struct kbase_hwcnt_backend_csf *backend_csf = - (struct kbase_hwcnt_backend_csf *)backend; + struct kbase_hwcnt_backend_csf 
*backend_csf = (struct kbase_hwcnt_backend_csf *)backend; if (!backend_csf || !backend_csf->info || !backend_csf->info->csf_if) return 0; - return backend_csf->info->csf_if->timestamp_ns( - backend_csf->info->csf_if->ctx); + return backend_csf->info->csf_if->timestamp_ns(backend_csf->info->csf_if->ctx); } /** kbasep_hwcnt_backend_csf_process_enable_map() - Process the enable_map to @@ -336,8 +344,8 @@ kbasep_hwcnt_backend_csf_timestamp_ns(struct kbase_hwcnt_backend *backend) * required. *@phys_enable_map: HWC physical enable map to be processed. */ -static void kbasep_hwcnt_backend_csf_process_enable_map( - struct kbase_hwcnt_physical_enable_map *phys_enable_map) +static void +kbasep_hwcnt_backend_csf_process_enable_map(struct kbase_hwcnt_physical_enable_map *phys_enable_map) { WARN_ON(!phys_enable_map); @@ -361,46 +369,55 @@ static void kbasep_hwcnt_backend_csf_init_layout( const struct kbase_hwcnt_backend_csf_if_prfcnt_info *prfcnt_info, struct kbase_hwcnt_csf_physical_layout *phys_layout) { - u8 shader_core_cnt; + size_t shader_core_cnt; size_t values_per_block; + size_t fw_blocks_count; + size_t hw_blocks_count; WARN_ON(!prfcnt_info); WARN_ON(!phys_layout); shader_core_cnt = fls64(prfcnt_info->core_mask); - values_per_block = - prfcnt_info->prfcnt_block_size / KBASE_HWCNT_VALUE_HW_BYTES; + values_per_block = prfcnt_info->prfcnt_block_size / KBASE_HWCNT_VALUE_HW_BYTES; + fw_blocks_count = div_u64(prfcnt_info->prfcnt_fw_size, prfcnt_info->prfcnt_block_size); + hw_blocks_count = div_u64(prfcnt_info->prfcnt_hw_size, prfcnt_info->prfcnt_block_size); + + /* The number of hardware counters reported by the GPU matches the legacy guess-work we + * have done in the past + */ + WARN_ON(hw_blocks_count != KBASE_HWCNT_V5_FE_BLOCK_COUNT + + KBASE_HWCNT_V5_TILER_BLOCK_COUNT + + prfcnt_info->l2_count + shader_core_cnt); *phys_layout = (struct kbase_hwcnt_csf_physical_layout){ .fe_cnt = KBASE_HWCNT_V5_FE_BLOCK_COUNT, .tiler_cnt = KBASE_HWCNT_V5_TILER_BLOCK_COUNT, .mmu_l2_cnt = prfcnt_info->l2_count, .shader_cnt = shader_core_cnt, - .block_cnt = KBASE_HWCNT_V5_FE_BLOCK_COUNT + - KBASE_HWCNT_V5_TILER_BLOCK_COUNT + - prfcnt_info->l2_count + shader_core_cnt, + .fw_block_cnt = fw_blocks_count, + .hw_block_cnt = hw_blocks_count, + .block_cnt = fw_blocks_count + hw_blocks_count, .shader_avail_mask = prfcnt_info->core_mask, .headers_per_block = KBASE_HWCNT_V5_HEADERS_PER_BLOCK, .values_per_block = values_per_block, - .counters_per_block = - values_per_block - KBASE_HWCNT_V5_HEADERS_PER_BLOCK, + .counters_per_block = values_per_block - KBASE_HWCNT_V5_HEADERS_PER_BLOCK, .enable_mask_offset = KBASE_HWCNT_V5_PRFCNT_EN_HEADER, }; } -static void kbasep_hwcnt_backend_csf_reset_internal_buffers( - struct kbase_hwcnt_backend_csf *backend_csf) +static void +kbasep_hwcnt_backend_csf_reset_internal_buffers(struct kbase_hwcnt_backend_csf *backend_csf) { size_t user_buf_bytes = backend_csf->info->metadata->dump_buf_bytes; memset(backend_csf->to_user_buf, 0, user_buf_bytes); memset(backend_csf->accum_buf, 0, user_buf_bytes); - memset(backend_csf->old_sample_buf, 0, - backend_csf->info->prfcnt_info.dump_bytes); + memset(backend_csf->old_sample_buf, 0, backend_csf->info->prfcnt_info.dump_bytes); } -static void kbasep_hwcnt_backend_csf_zero_sample_prfcnt_en_header( - struct kbase_hwcnt_backend_csf *backend_csf, u32 *sample) +static void +kbasep_hwcnt_backend_csf_zero_sample_prfcnt_en_header(struct kbase_hwcnt_backend_csf *backend_csf, + u32 *sample) { u32 block_idx; const struct kbase_hwcnt_csf_physical_layout *phys_layout; @@ 
-414,8 +431,8 @@ static void kbasep_hwcnt_backend_csf_zero_sample_prfcnt_en_header( } } -static void kbasep_hwcnt_backend_csf_zero_all_prfcnt_en_header( - struct kbase_hwcnt_backend_csf *backend_csf) +static void +kbasep_hwcnt_backend_csf_zero_all_prfcnt_en_header(struct kbase_hwcnt_backend_csf *backend_csf) { u32 idx; u32 *sample; @@ -426,19 +443,16 @@ static void kbasep_hwcnt_backend_csf_zero_all_prfcnt_en_header( for (idx = 0; idx < backend_csf->info->ring_buf_cnt; idx++) { sample = (u32 *)&cpu_dump_base[idx * dump_bytes]; - kbasep_hwcnt_backend_csf_zero_sample_prfcnt_en_header( - backend_csf, sample); + kbasep_hwcnt_backend_csf_zero_sample_prfcnt_en_header(backend_csf, sample); } } -static void kbasep_hwcnt_backend_csf_update_user_sample( - struct kbase_hwcnt_backend_csf *backend_csf) +static void kbasep_hwcnt_backend_csf_update_user_sample(struct kbase_hwcnt_backend_csf *backend_csf) { size_t user_buf_bytes = backend_csf->info->metadata->dump_buf_bytes; /* Copy the data into the sample and wait for the user to get it. */ - memcpy(backend_csf->to_user_buf, backend_csf->accum_buf, - user_buf_bytes); + memcpy(backend_csf->to_user_buf, backend_csf->accum_buf, user_buf_bytes); /* After copied data into user sample, clear the accumulator values to * prepare for the next accumulator, such as the next request or @@ -448,9 +462,8 @@ static void kbasep_hwcnt_backend_csf_update_user_sample( } static void kbasep_hwcnt_backend_csf_accumulate_sample( - const struct kbase_hwcnt_csf_physical_layout *phys_layout, - size_t dump_bytes, u64 *accum_buf, const u32 *old_sample_buf, - const u32 *new_sample_buf, bool clearing_samples) + const struct kbase_hwcnt_csf_physical_layout *phys_layout, size_t dump_bytes, + u64 *accum_buf, const u32 *old_sample_buf, const u32 *new_sample_buf, bool clearing_samples) { size_t block_idx; const u32 *old_block = old_sample_buf; @@ -458,11 +471,17 @@ static void kbasep_hwcnt_backend_csf_accumulate_sample( u64 *acc_block = accum_buf; const size_t values_per_block = phys_layout->values_per_block; - for (block_idx = 0; block_idx < phys_layout->block_cnt; block_idx++) { - const u32 old_enable_mask = - old_block[phys_layout->enable_mask_offset]; - const u32 new_enable_mask = - new_block[phys_layout->enable_mask_offset]; + /* Performance counter blocks for firmware are stored before blocks for hardware. + * We skip over the firmware's performance counter blocks (counters dumping is not + * supported for firmware blocks, only hardware ones). + */ + old_block += values_per_block * phys_layout->fw_block_cnt; + new_block += values_per_block * phys_layout->fw_block_cnt; + + for (block_idx = phys_layout->fw_block_cnt; block_idx < phys_layout->block_cnt; + block_idx++) { + const u32 old_enable_mask = old_block[phys_layout->enable_mask_offset]; + const u32 new_enable_mask = new_block[phys_layout->enable_mask_offset]; if (new_enable_mask == 0) { /* Hardware block was unavailable or we didn't turn on @@ -475,9 +494,7 @@ static void kbasep_hwcnt_backend_csf_accumulate_sample( size_t ctr_idx; /* Unconditionally copy the headers. */ - for (ctr_idx = 0; - ctr_idx < phys_layout->headers_per_block; - ctr_idx++) { + for (ctr_idx = 0; ctr_idx < phys_layout->headers_per_block; ctr_idx++) { acc_block[ctr_idx] = new_block[ctr_idx]; } @@ -506,34 +523,25 @@ static void kbasep_hwcnt_backend_csf_accumulate_sample( * counters only, as we know previous * values are zeroes. 
*/ - for (ctr_idx = - phys_layout - ->headers_per_block; - ctr_idx < values_per_block; - ctr_idx++) { - acc_block[ctr_idx] += - new_block[ctr_idx]; + for (ctr_idx = phys_layout->headers_per_block; + ctr_idx < values_per_block; ctr_idx++) { + acc_block[ctr_idx] += new_block[ctr_idx]; } } else { /* Hardware block was previously * available. Accumulate the delta * between old and new counter values. */ - for (ctr_idx = - phys_layout - ->headers_per_block; - ctr_idx < values_per_block; - ctr_idx++) { + for (ctr_idx = phys_layout->headers_per_block; + ctr_idx < values_per_block; ctr_idx++) { acc_block[ctr_idx] += - new_block[ctr_idx] - - old_block[ctr_idx]; + new_block[ctr_idx] - old_block[ctr_idx]; } } } else { for (ctr_idx = phys_layout->headers_per_block; ctr_idx < values_per_block; ctr_idx++) { - acc_block[ctr_idx] += - new_block[ctr_idx]; + acc_block[ctr_idx] += new_block[ctr_idx]; } } } @@ -542,27 +550,25 @@ static void kbasep_hwcnt_backend_csf_accumulate_sample( acc_block += values_per_block; } - WARN_ON(old_block != - old_sample_buf + (dump_bytes / KBASE_HWCNT_VALUE_HW_BYTES)); - WARN_ON(new_block != - new_sample_buf + (dump_bytes / KBASE_HWCNT_VALUE_HW_BYTES)); - WARN_ON(acc_block != - accum_buf + (dump_bytes / KBASE_HWCNT_VALUE_HW_BYTES)); + WARN_ON(old_block != old_sample_buf + (dump_bytes / KBASE_HWCNT_VALUE_HW_BYTES)); + WARN_ON(new_block != new_sample_buf + (dump_bytes / KBASE_HWCNT_VALUE_HW_BYTES)); + WARN_ON(acc_block != accum_buf + (dump_bytes / KBASE_HWCNT_VALUE_HW_BYTES) - + (values_per_block * phys_layout->fw_block_cnt)); (void)dump_bytes; } -static void kbasep_hwcnt_backend_csf_accumulate_samples( - struct kbase_hwcnt_backend_csf *backend_csf, u32 extract_index_to_start, - u32 insert_index_to_stop) +static void kbasep_hwcnt_backend_csf_accumulate_samples(struct kbase_hwcnt_backend_csf *backend_csf, + u32 extract_index_to_start, + u32 insert_index_to_stop) { u32 raw_idx; - unsigned long flags; + unsigned long flags = 0UL; u8 *cpu_dump_base = (u8 *)backend_csf->ring_buf_cpu_base; const size_t ring_buf_cnt = backend_csf->info->ring_buf_cnt; const size_t buf_dump_bytes = backend_csf->info->prfcnt_info.dump_bytes; bool clearing_samples = backend_csf->info->prfcnt_info.clearing_samples; u32 *old_sample_buf = backend_csf->old_sample_buf; - u32 *new_sample_buf; + u32 *new_sample_buf = old_sample_buf; if (extract_index_to_start == insert_index_to_stop) /* No samples to accumulate. Early out. */ @@ -570,25 +576,22 @@ static void kbasep_hwcnt_backend_csf_accumulate_samples( /* Sync all the buffers to CPU side before read the data. */ backend_csf->info->csf_if->ring_buf_sync(backend_csf->info->csf_if->ctx, - backend_csf->ring_buf, - extract_index_to_start, + backend_csf->ring_buf, extract_index_to_start, insert_index_to_stop, true); /* Consider u32 wrap case, '!=' is used here instead of '<' operator */ - for (raw_idx = extract_index_to_start; raw_idx != insert_index_to_stop; - raw_idx++) { + for (raw_idx = extract_index_to_start; raw_idx != insert_index_to_stop; raw_idx++) { /* The logical "&" acts as a modulo operation since buf_count * must be a power of two. 
*/ const u32 buf_idx = raw_idx & (ring_buf_cnt - 1); - new_sample_buf = - (u32 *)&cpu_dump_base[buf_idx * buf_dump_bytes]; + new_sample_buf = (u32 *)&cpu_dump_base[buf_idx * buf_dump_bytes]; - kbasep_hwcnt_backend_csf_accumulate_sample( - &backend_csf->phys_layout, buf_dump_bytes, - backend_csf->accum_buf, old_sample_buf, new_sample_buf, - clearing_samples); + kbasep_hwcnt_backend_csf_accumulate_sample(&backend_csf->phys_layout, + buf_dump_bytes, backend_csf->accum_buf, + old_sample_buf, new_sample_buf, + clearing_samples); old_sample_buf = new_sample_buf; } @@ -597,19 +600,16 @@ static void kbasep_hwcnt_backend_csf_accumulate_samples( memcpy(backend_csf->old_sample_buf, new_sample_buf, buf_dump_bytes); /* Reset the prfcnt_en header on each sample before releasing them. */ - for (raw_idx = extract_index_to_start; raw_idx != insert_index_to_stop; - raw_idx++) { + for (raw_idx = extract_index_to_start; raw_idx != insert_index_to_stop; raw_idx++) { const u32 buf_idx = raw_idx & (ring_buf_cnt - 1); u32 *sample = (u32 *)&cpu_dump_base[buf_idx * buf_dump_bytes]; - kbasep_hwcnt_backend_csf_zero_sample_prfcnt_en_header( - backend_csf, sample); + kbasep_hwcnt_backend_csf_zero_sample_prfcnt_en_header(backend_csf, sample); } /* Sync zeroed buffers to avoid coherency issues on future use. */ backend_csf->info->csf_if->ring_buf_sync(backend_csf->info->csf_if->ctx, - backend_csf->ring_buf, - extract_index_to_start, + backend_csf->ring_buf, extract_index_to_start, insert_index_to_stop, false); /* After consuming all samples between extract_idx and insert_idx, @@ -617,22 +617,20 @@ static void kbasep_hwcnt_backend_csf_accumulate_samples( * can be released back to the ring buffer pool. */ backend_csf->info->csf_if->lock(backend_csf->info->csf_if->ctx, &flags); - backend_csf->info->csf_if->set_extract_index( - backend_csf->info->csf_if->ctx, insert_index_to_stop); + backend_csf->info->csf_if->set_extract_index(backend_csf->info->csf_if->ctx, + insert_index_to_stop); /* Update the watchdog last seen index to check any new FW auto samples * in next watchdog callback. */ backend_csf->watchdog_last_seen_insert_idx = insert_index_to_stop; - backend_csf->info->csf_if->unlock(backend_csf->info->csf_if->ctx, - flags); + backend_csf->info->csf_if->unlock(backend_csf->info->csf_if->ctx, flags); } static void kbasep_hwcnt_backend_csf_change_es_and_wake_waiters( struct kbase_hwcnt_backend_csf *backend_csf, enum kbase_hwcnt_backend_csf_enable_state new_state) { - backend_csf->info->csf_if->assert_lock_held( - backend_csf->info->csf_if->ctx); + backend_csf->info->csf_if->assert_lock_held(backend_csf->info->csf_if->ctx); if (backend_csf->enable_state != new_state) { backend_csf->enable_state = new_state; @@ -645,7 +643,7 @@ static void kbasep_hwcnt_backend_watchdog_timer_cb(void *info) { struct kbase_hwcnt_backend_csf_info *csf_info = info; struct kbase_hwcnt_backend_csf *backend_csf; - unsigned long flags; + unsigned long flags = 0UL; csf_info->csf_if->lock(csf_info->csf_if->ctx, &flags); @@ -663,26 +661,22 @@ static void kbasep_hwcnt_backend_watchdog_timer_cb(void *info) (!csf_info->fw_in_protected_mode) && /* 3. dump state indicates no other dumping is in progress. 
*/ ((backend_csf->dump_state == KBASE_HWCNT_BACKEND_CSF_DUMP_IDLE) || - (backend_csf->dump_state == - KBASE_HWCNT_BACKEND_CSF_DUMP_COMPLETED))) { - u32 extract_index; - u32 insert_index; + (backend_csf->dump_state == KBASE_HWCNT_BACKEND_CSF_DUMP_COMPLETED))) { + u32 extract_index = 0U; + u32 insert_index = 0U; /* Read the raw extract and insert indexes from the CSF interface. */ - csf_info->csf_if->get_indexes(csf_info->csf_if->ctx, - &extract_index, &insert_index); + csf_info->csf_if->get_indexes(csf_info->csf_if->ctx, &extract_index, &insert_index); /* Do watchdog request if no new FW auto samples. */ - if (insert_index == - backend_csf->watchdog_last_seen_insert_idx) { + if (insert_index == backend_csf->watchdog_last_seen_insert_idx) { /* Trigger the watchdog request. */ csf_info->csf_if->dump_request(csf_info->csf_if->ctx); /* A watchdog dump is required, change the state to * start the request process. */ - backend_csf->dump_state = - KBASE_HWCNT_BACKEND_CSF_DUMP_WATCHDOG_REQUESTED; + backend_csf->dump_state = KBASE_HWCNT_BACKEND_CSF_DUMP_WATCHDOG_REQUESTED; } } @@ -691,12 +685,10 @@ static void kbasep_hwcnt_backend_watchdog_timer_cb(void *info) * counter enabled interrupt. */ if ((backend_csf->enable_state == KBASE_HWCNT_BACKEND_CSF_ENABLED) || - (backend_csf->enable_state == - KBASE_HWCNT_BACKEND_CSF_TRANSITIONING_TO_ENABLED)) { + (backend_csf->enable_state == KBASE_HWCNT_BACKEND_CSF_TRANSITIONING_TO_ENABLED)) { /* Reschedule the timer for next watchdog callback. */ - csf_info->watchdog_if->modify( - csf_info->watchdog_if->timer, - HWCNT_BACKEND_WATCHDOG_TIMER_INTERVAL_MS); + csf_info->watchdog_if->modify(csf_info->watchdog_if->timer, + HWCNT_BACKEND_WATCHDOG_TIMER_INTERVAL_MS); } csf_info->csf_if->unlock(csf_info->csf_if->ctx, flags); @@ -712,15 +704,14 @@ static void kbasep_hwcnt_backend_watchdog_timer_cb(void *info) */ static void kbasep_hwcnt_backend_csf_dump_worker(struct work_struct *work) { - unsigned long flags; + unsigned long flags = 0ULL; struct kbase_hwcnt_backend_csf *backend_csf; u32 insert_index_to_acc; - u32 extract_index; - u32 insert_index; + u32 extract_index = 0U; + u32 insert_index = 0U; WARN_ON(!work); - backend_csf = container_of(work, struct kbase_hwcnt_backend_csf, - hwc_dump_work); + backend_csf = container_of(work, struct kbase_hwcnt_backend_csf, hwc_dump_work); backend_csf->info->csf_if->lock(backend_csf->info->csf_if->ctx, &flags); /* Assert the backend is not destroyed. */ WARN_ON(backend_csf != backend_csf->info->backend); @@ -729,26 +720,22 @@ static void kbasep_hwcnt_backend_csf_dump_worker(struct work_struct *work) * launched. */ if (backend_csf->enable_state != KBASE_HWCNT_BACKEND_CSF_ENABLED) { - WARN_ON(backend_csf->dump_state != - KBASE_HWCNT_BACKEND_CSF_DUMP_IDLE); + WARN_ON(backend_csf->dump_state != KBASE_HWCNT_BACKEND_CSF_DUMP_IDLE); WARN_ON(!completion_done(&backend_csf->dump_completed)); - backend_csf->info->csf_if->unlock( - backend_csf->info->csf_if->ctx, flags); + backend_csf->info->csf_if->unlock(backend_csf->info->csf_if->ctx, flags); return; } - WARN_ON(backend_csf->dump_state != - KBASE_HWCNT_BACKEND_CSF_DUMP_WORKER_LAUNCHED); + WARN_ON(backend_csf->dump_state != KBASE_HWCNT_BACKEND_CSF_DUMP_WORKER_LAUNCHED); backend_csf->dump_state = KBASE_HWCNT_BACKEND_CSF_DUMP_ACCUMULATING; insert_index_to_acc = backend_csf->insert_index_to_accumulate; /* Read the raw extract and insert indexes from the CSF interface. 
*/ - backend_csf->info->csf_if->get_indexes(backend_csf->info->csf_if->ctx, - &extract_index, &insert_index); + backend_csf->info->csf_if->get_indexes(backend_csf->info->csf_if->ctx, &extract_index, + &insert_index); - backend_csf->info->csf_if->unlock(backend_csf->info->csf_if->ctx, - flags); + backend_csf->info->csf_if->unlock(backend_csf->info->csf_if->ctx, flags); /* Accumulate up to the insert we grabbed at the prfcnt request * interrupt. @@ -769,22 +756,18 @@ static void kbasep_hwcnt_backend_csf_dump_worker(struct work_struct *work) /* The backend was disabled or had an error while we were accumulating. */ if (backend_csf->enable_state != KBASE_HWCNT_BACKEND_CSF_ENABLED) { - WARN_ON(backend_csf->dump_state != - KBASE_HWCNT_BACKEND_CSF_DUMP_IDLE); + WARN_ON(backend_csf->dump_state != KBASE_HWCNT_BACKEND_CSF_DUMP_IDLE); WARN_ON(!completion_done(&backend_csf->dump_completed)); - backend_csf->info->csf_if->unlock( - backend_csf->info->csf_if->ctx, flags); + backend_csf->info->csf_if->unlock(backend_csf->info->csf_if->ctx, flags); return; } - WARN_ON(backend_csf->dump_state != - KBASE_HWCNT_BACKEND_CSF_DUMP_ACCUMULATING); + WARN_ON(backend_csf->dump_state != KBASE_HWCNT_BACKEND_CSF_DUMP_ACCUMULATING); /* Our work here is done - set the wait object and unblock waiters. */ backend_csf->dump_state = KBASE_HWCNT_BACKEND_CSF_DUMP_COMPLETED; complete_all(&backend_csf->dump_completed); - backend_csf->info->csf_if->unlock(backend_csf->info->csf_if->ctx, - flags); + backend_csf->info->csf_if->unlock(backend_csf->info->csf_if->ctx, flags); } /** @@ -797,30 +780,28 @@ static void kbasep_hwcnt_backend_csf_dump_worker(struct work_struct *work) */ static void kbasep_hwcnt_backend_csf_threshold_worker(struct work_struct *work) { - unsigned long flags; + unsigned long flags = 0ULL; struct kbase_hwcnt_backend_csf *backend_csf; - u32 extract_index; - u32 insert_index; + u32 extract_index = 0U; + u32 insert_index = 0U; WARN_ON(!work); - backend_csf = container_of(work, struct kbase_hwcnt_backend_csf, - hwc_threshold_work); + backend_csf = container_of(work, struct kbase_hwcnt_backend_csf, hwc_threshold_work); backend_csf->info->csf_if->lock(backend_csf->info->csf_if->ctx, &flags); /* Assert the backend is not destroyed. */ WARN_ON(backend_csf != backend_csf->info->backend); /* Read the raw extract and insert indexes from the CSF interface. */ - backend_csf->info->csf_if->get_indexes(backend_csf->info->csf_if->ctx, - &extract_index, &insert_index); + backend_csf->info->csf_if->get_indexes(backend_csf->info->csf_if->ctx, &extract_index, + &insert_index); /* The backend was disabled or had an error while the worker was being * launched. */ if (backend_csf->enable_state != KBASE_HWCNT_BACKEND_CSF_ENABLED) { - backend_csf->info->csf_if->unlock( - backend_csf->info->csf_if->ctx, flags); + backend_csf->info->csf_if->unlock(backend_csf->info->csf_if->ctx, flags); return; } @@ -829,14 +810,11 @@ static void kbasep_hwcnt_backend_csf_threshold_worker(struct work_struct *work) * interfere. 
*/ if ((backend_csf->dump_state != KBASE_HWCNT_BACKEND_CSF_DUMP_IDLE) && - (backend_csf->dump_state != - KBASE_HWCNT_BACKEND_CSF_DUMP_COMPLETED)) { - backend_csf->info->csf_if->unlock( - backend_csf->info->csf_if->ctx, flags); + (backend_csf->dump_state != KBASE_HWCNT_BACKEND_CSF_DUMP_COMPLETED)) { + backend_csf->info->csf_if->unlock(backend_csf->info->csf_if->ctx, flags); return; } - backend_csf->info->csf_if->unlock(backend_csf->info->csf_if->ctx, - flags); + backend_csf->info->csf_if->unlock(backend_csf->info->csf_if->ctx, flags); /* Accumulate everything we possibly can. We grabbed the insert index * immediately after we acquired the lock but before we checked whether @@ -845,14 +823,13 @@ static void kbasep_hwcnt_backend_csf_threshold_worker(struct work_struct *work) * fact that our insert will not exceed the concurrent dump's * insert_to_accumulate, so we don't risk accumulating too much data. */ - kbasep_hwcnt_backend_csf_accumulate_samples(backend_csf, extract_index, - insert_index); + kbasep_hwcnt_backend_csf_accumulate_samples(backend_csf, extract_index, insert_index); /* No need to wake up anything since it is not a user dump request. */ } -static void kbase_hwcnt_backend_csf_submit_dump_worker( - struct kbase_hwcnt_backend_csf_info *csf_info) +static void +kbase_hwcnt_backend_csf_submit_dump_worker(struct kbase_hwcnt_backend_csf_info *csf_info) { u32 extract_index; @@ -860,31 +837,26 @@ static void kbase_hwcnt_backend_csf_submit_dump_worker( csf_info->csf_if->assert_lock_held(csf_info->csf_if->ctx); WARN_ON(!kbasep_hwcnt_backend_csf_backend_exists(csf_info)); - WARN_ON(csf_info->backend->enable_state != - KBASE_HWCNT_BACKEND_CSF_ENABLED); - WARN_ON(csf_info->backend->dump_state != - KBASE_HWCNT_BACKEND_CSF_DUMP_QUERYING_INSERT); + WARN_ON(csf_info->backend->enable_state != KBASE_HWCNT_BACKEND_CSF_ENABLED); + WARN_ON(csf_info->backend->dump_state != KBASE_HWCNT_BACKEND_CSF_DUMP_QUERYING_INSERT); /* Save insert index now so that the dump worker only accumulates the * HWC data associated with this request. Extract index is not stored * as that needs to be checked when accumulating to prevent re-reading * buffers that have already been read and returned to the GPU. */ - csf_info->csf_if->get_indexes( - csf_info->csf_if->ctx, &extract_index, - &csf_info->backend->insert_index_to_accumulate); - csf_info->backend->dump_state = - KBASE_HWCNT_BACKEND_CSF_DUMP_WORKER_LAUNCHED; + csf_info->csf_if->get_indexes(csf_info->csf_if->ctx, &extract_index, + &csf_info->backend->insert_index_to_accumulate); + csf_info->backend->dump_state = KBASE_HWCNT_BACKEND_CSF_DUMP_WORKER_LAUNCHED; /* Submit the accumulator task into the work queue. 
*/ - queue_work(csf_info->backend->hwc_dump_workq, - &csf_info->backend->hwc_dump_work); + queue_work(csf_info->backend->hwc_dump_workq, &csf_info->backend->hwc_dump_work); } -static void kbasep_hwcnt_backend_csf_get_physical_enable( - struct kbase_hwcnt_backend_csf *backend_csf, - const struct kbase_hwcnt_enable_map *enable_map, - struct kbase_hwcnt_backend_csf_if_enable *enable) +static void +kbasep_hwcnt_backend_csf_get_physical_enable(struct kbase_hwcnt_backend_csf *backend_csf, + const struct kbase_hwcnt_enable_map *enable_map, + struct kbase_hwcnt_backend_csf_if_enable *enable) { enum kbase_hwcnt_physical_set phys_counter_set; struct kbase_hwcnt_physical_enable_map phys_enable_map; @@ -896,8 +868,7 @@ static void kbasep_hwcnt_backend_csf_get_physical_enable( */ kbasep_hwcnt_backend_csf_process_enable_map(&phys_enable_map); - kbase_hwcnt_gpu_set_to_physical(&phys_counter_set, - backend_csf->info->counter_set); + kbase_hwcnt_gpu_set_to_physical(&phys_counter_set, backend_csf->info->counter_set); /* Use processed enable_map to enable HWC in HW level. */ enable->fe_bm = phys_enable_map.fe_bm; @@ -909,33 +880,29 @@ static void kbasep_hwcnt_backend_csf_get_physical_enable( } /* CSF backend implementation of kbase_hwcnt_backend_dump_enable_nolock_fn */ -static int kbasep_hwcnt_backend_csf_dump_enable_nolock( - struct kbase_hwcnt_backend *backend, - const struct kbase_hwcnt_enable_map *enable_map) +static int +kbasep_hwcnt_backend_csf_dump_enable_nolock(struct kbase_hwcnt_backend *backend, + const struct kbase_hwcnt_enable_map *enable_map) { - struct kbase_hwcnt_backend_csf *backend_csf = - (struct kbase_hwcnt_backend_csf *)backend; + struct kbase_hwcnt_backend_csf *backend_csf = (struct kbase_hwcnt_backend_csf *)backend; struct kbase_hwcnt_backend_csf_if_enable enable; int err; - if (!backend_csf || !enable_map || - (enable_map->metadata != backend_csf->info->metadata)) + if (!backend_csf || !enable_map || (enable_map->metadata != backend_csf->info->metadata)) return -EINVAL; - backend_csf->info->csf_if->assert_lock_held( - backend_csf->info->csf_if->ctx); + backend_csf->info->csf_if->assert_lock_held(backend_csf->info->csf_if->ctx); - kbasep_hwcnt_backend_csf_get_physical_enable(backend_csf, enable_map, - &enable); + kbasep_hwcnt_backend_csf_get_physical_enable(backend_csf, enable_map, &enable); /* enable_state should be DISABLED before we transfer it to enabled */ if (backend_csf->enable_state != KBASE_HWCNT_BACKEND_CSF_DISABLED) return -EIO; - err = backend_csf->info->watchdog_if->enable( - backend_csf->info->watchdog_if->timer, - HWCNT_BACKEND_WATCHDOG_TIMER_INTERVAL_MS, - kbasep_hwcnt_backend_watchdog_timer_cb, backend_csf->info); + err = backend_csf->info->watchdog_if->enable(backend_csf->info->watchdog_if->timer, + HWCNT_BACKEND_WATCHDOG_TIMER_INTERVAL_MS, + kbasep_hwcnt_backend_watchdog_timer_cb, + backend_csf->info); if (err) return err; @@ -953,58 +920,46 @@ static int kbasep_hwcnt_backend_csf_dump_enable_nolock( } /* CSF backend implementation of kbase_hwcnt_backend_dump_enable_fn */ -static int kbasep_hwcnt_backend_csf_dump_enable( - struct kbase_hwcnt_backend *backend, - const struct kbase_hwcnt_enable_map *enable_map) +static int kbasep_hwcnt_backend_csf_dump_enable(struct kbase_hwcnt_backend *backend, + const struct kbase_hwcnt_enable_map *enable_map) { int errcode; - unsigned long flags; - struct kbase_hwcnt_backend_csf *backend_csf = - (struct kbase_hwcnt_backend_csf *)backend; + unsigned long flags = 0UL; + struct kbase_hwcnt_backend_csf *backend_csf = (struct 
kbase_hwcnt_backend_csf *)backend; if (!backend_csf) return -EINVAL; backend_csf->info->csf_if->lock(backend_csf->info->csf_if->ctx, &flags); - errcode = kbasep_hwcnt_backend_csf_dump_enable_nolock(backend, - enable_map); - backend_csf->info->csf_if->unlock(backend_csf->info->csf_if->ctx, - flags); + errcode = kbasep_hwcnt_backend_csf_dump_enable_nolock(backend, enable_map); + backend_csf->info->csf_if->unlock(backend_csf->info->csf_if->ctx, flags); return errcode; } static void kbasep_hwcnt_backend_csf_wait_enable_transition_complete( struct kbase_hwcnt_backend_csf *backend_csf, unsigned long *lock_flags) { - backend_csf->info->csf_if->assert_lock_held( - backend_csf->info->csf_if->ctx); - - while ((backend_csf->enable_state == - KBASE_HWCNT_BACKEND_CSF_TRANSITIONING_TO_ENABLED) || - (backend_csf->enable_state == - KBASE_HWCNT_BACKEND_CSF_TRANSITIONING_TO_DISABLED)) { - backend_csf->info->csf_if->unlock( - backend_csf->info->csf_if->ctx, *lock_flags); - - wait_event( - backend_csf->enable_state_waitq, - (backend_csf->enable_state != - KBASE_HWCNT_BACKEND_CSF_TRANSITIONING_TO_ENABLED) && - (backend_csf->enable_state != - KBASE_HWCNT_BACKEND_CSF_TRANSITIONING_TO_DISABLED)); - - backend_csf->info->csf_if->lock(backend_csf->info->csf_if->ctx, - lock_flags); + backend_csf->info->csf_if->assert_lock_held(backend_csf->info->csf_if->ctx); + + while ((backend_csf->enable_state == KBASE_HWCNT_BACKEND_CSF_TRANSITIONING_TO_ENABLED) || + (backend_csf->enable_state == KBASE_HWCNT_BACKEND_CSF_TRANSITIONING_TO_DISABLED)) { + backend_csf->info->csf_if->unlock(backend_csf->info->csf_if->ctx, *lock_flags); + + wait_event(backend_csf->enable_state_waitq, + (backend_csf->enable_state != + KBASE_HWCNT_BACKEND_CSF_TRANSITIONING_TO_ENABLED) && + (backend_csf->enable_state != + KBASE_HWCNT_BACKEND_CSF_TRANSITIONING_TO_DISABLED)); + + backend_csf->info->csf_if->lock(backend_csf->info->csf_if->ctx, lock_flags); } } /* CSF backend implementation of kbase_hwcnt_backend_dump_disable_fn */ -static void -kbasep_hwcnt_backend_csf_dump_disable(struct kbase_hwcnt_backend *backend) +static void kbasep_hwcnt_backend_csf_dump_disable(struct kbase_hwcnt_backend *backend) { - unsigned long flags; - struct kbase_hwcnt_backend_csf *backend_csf = - (struct kbase_hwcnt_backend_csf *)backend; + unsigned long flags = 0UL; + struct kbase_hwcnt_backend_csf *backend_csf = (struct kbase_hwcnt_backend_csf *)backend; bool do_disable = false; WARN_ON(!backend_csf); @@ -1014,24 +969,20 @@ kbasep_hwcnt_backend_csf_dump_disable(struct kbase_hwcnt_backend *backend) /* Make sure we wait until any previous enable or disable have completed * before doing anything. */ - kbasep_hwcnt_backend_csf_wait_enable_transition_complete(backend_csf, - &flags); + kbasep_hwcnt_backend_csf_wait_enable_transition_complete(backend_csf, &flags); if (backend_csf->enable_state == KBASE_HWCNT_BACKEND_CSF_DISABLED || - backend_csf->enable_state == - KBASE_HWCNT_BACKEND_CSF_UNRECOVERABLE_ERROR) { + backend_csf->enable_state == KBASE_HWCNT_BACKEND_CSF_UNRECOVERABLE_ERROR) { /* If we are already disabled or in an unrecoverable error * state, there is nothing for us to do. 
*/ - backend_csf->info->csf_if->unlock( - backend_csf->info->csf_if->ctx, flags); + backend_csf->info->csf_if->unlock(backend_csf->info->csf_if->ctx, flags); return; } if (backend_csf->enable_state == KBASE_HWCNT_BACKEND_CSF_ENABLED) { kbasep_hwcnt_backend_csf_change_es_and_wake_waiters( - backend_csf, - KBASE_HWCNT_BACKEND_CSF_TRANSITIONING_TO_DISABLED); + backend_csf, KBASE_HWCNT_BACKEND_CSF_TRANSITIONING_TO_DISABLED); backend_csf->dump_state = KBASE_HWCNT_BACKEND_CSF_DUMP_IDLE; complete_all(&backend_csf->dump_completed); /* Only disable if we were previously enabled - in all other @@ -1043,15 +994,13 @@ kbasep_hwcnt_backend_csf_dump_disable(struct kbase_hwcnt_backend *backend) WARN_ON(backend_csf->dump_state != KBASE_HWCNT_BACKEND_CSF_DUMP_IDLE); WARN_ON(!completion_done(&backend_csf->dump_completed)); - backend_csf->info->csf_if->unlock(backend_csf->info->csf_if->ctx, - flags); + backend_csf->info->csf_if->unlock(backend_csf->info->csf_if->ctx, flags); /* Deregister the timer and block until any timer callback has completed. * We've transitioned out of the ENABLED state so we can guarantee it * won't reschedule itself. */ - backend_csf->info->watchdog_if->disable( - backend_csf->info->watchdog_if->timer); + backend_csf->info->watchdog_if->disable(backend_csf->info->watchdog_if->timer); /* Block until any async work has completed. We have transitioned out of * the ENABLED state so we can guarantee no new work will concurrently @@ -1062,11 +1011,9 @@ kbasep_hwcnt_backend_csf_dump_disable(struct kbase_hwcnt_backend *backend) backend_csf->info->csf_if->lock(backend_csf->info->csf_if->ctx, &flags); if (do_disable) - backend_csf->info->csf_if->dump_disable( - backend_csf->info->csf_if->ctx); + backend_csf->info->csf_if->dump_disable(backend_csf->info->csf_if->ctx); - kbasep_hwcnt_backend_csf_wait_enable_transition_complete(backend_csf, - &flags); + kbasep_hwcnt_backend_csf_wait_enable_transition_complete(backend_csf, &flags); switch (backend_csf->enable_state) { case KBASE_HWCNT_BACKEND_CSF_DISABLED_WAIT_FOR_WORKER: @@ -1075,8 +1022,7 @@ kbasep_hwcnt_backend_csf_dump_disable(struct kbase_hwcnt_backend *backend) break; case KBASE_HWCNT_BACKEND_CSF_UNRECOVERABLE_ERROR_WAIT_FOR_WORKER: kbasep_hwcnt_backend_csf_change_es_and_wake_waiters( - backend_csf, - KBASE_HWCNT_BACKEND_CSF_UNRECOVERABLE_ERROR); + backend_csf, KBASE_HWCNT_BACKEND_CSF_UNRECOVERABLE_ERROR); break; default: WARN_ON(true); @@ -1086,8 +1032,7 @@ kbasep_hwcnt_backend_csf_dump_disable(struct kbase_hwcnt_backend *backend) backend_csf->user_requested = false; backend_csf->watchdog_last_seen_insert_idx = 0; - backend_csf->info->csf_if->unlock(backend_csf->info->csf_if->ctx, - flags); + backend_csf->info->csf_if->unlock(backend_csf->info->csf_if->ctx, flags); /* After disable, zero the header of all buffers in the ring buffer back * to 0 to prepare for the next enable. @@ -1095,9 +1040,9 @@ kbasep_hwcnt_backend_csf_dump_disable(struct kbase_hwcnt_backend *backend) kbasep_hwcnt_backend_csf_zero_all_prfcnt_en_header(backend_csf); /* Sync zeroed buffers to avoid coherency issues on future use. */ - backend_csf->info->csf_if->ring_buf_sync( - backend_csf->info->csf_if->ctx, backend_csf->ring_buf, 0, - backend_csf->info->ring_buf_cnt, false); + backend_csf->info->csf_if->ring_buf_sync(backend_csf->info->csf_if->ctx, + backend_csf->ring_buf, 0, + backend_csf->info->ring_buf_cnt, false); /* Reset accumulator, old_sample_buf and user_sample to all-0 to prepare * for next enable. 
@@ -1106,13 +1051,11 @@ kbasep_hwcnt_backend_csf_dump_disable(struct kbase_hwcnt_backend *backend) } /* CSF backend implementation of kbase_hwcnt_backend_dump_request_fn */ -static int -kbasep_hwcnt_backend_csf_dump_request(struct kbase_hwcnt_backend *backend, - u64 *dump_time_ns) +static int kbasep_hwcnt_backend_csf_dump_request(struct kbase_hwcnt_backend *backend, + u64 *dump_time_ns) { - unsigned long flags; - struct kbase_hwcnt_backend_csf *backend_csf = - (struct kbase_hwcnt_backend_csf *)backend; + unsigned long flags = 0UL; + struct kbase_hwcnt_backend_csf *backend_csf = (struct kbase_hwcnt_backend_csf *)backend; bool do_request = false; bool watchdog_dumping = false; @@ -1125,22 +1068,18 @@ kbasep_hwcnt_backend_csf_dump_request(struct kbase_hwcnt_backend *backend, * the user dump buffer is already zeroed. We can just short circuit to * the DUMP_COMPLETED state. */ - if (backend_csf->enable_state == - KBASE_HWCNT_BACKEND_CSF_TRANSITIONING_TO_ENABLED) { - backend_csf->dump_state = - KBASE_HWCNT_BACKEND_CSF_DUMP_COMPLETED; + if (backend_csf->enable_state == KBASE_HWCNT_BACKEND_CSF_TRANSITIONING_TO_ENABLED) { + backend_csf->dump_state = KBASE_HWCNT_BACKEND_CSF_DUMP_COMPLETED; *dump_time_ns = kbasep_hwcnt_backend_csf_timestamp_ns(backend); kbasep_hwcnt_backend_csf_cc_update(backend_csf); backend_csf->user_requested = true; - backend_csf->info->csf_if->unlock( - backend_csf->info->csf_if->ctx, flags); + backend_csf->info->csf_if->unlock(backend_csf->info->csf_if->ctx, flags); return 0; } /* Otherwise, make sure we're already enabled. */ if (backend_csf->enable_state != KBASE_HWCNT_BACKEND_CSF_ENABLED) { - backend_csf->info->csf_if->unlock( - backend_csf->info->csf_if->ctx, flags); + backend_csf->info->csf_if->unlock(backend_csf->info->csf_if->ctx, flags); return -EIO; } @@ -1153,15 +1092,12 @@ kbasep_hwcnt_backend_csf_dump_request(struct kbase_hwcnt_backend *backend, * request can be processed instead of ignored. */ if ((backend_csf->dump_state != KBASE_HWCNT_BACKEND_CSF_DUMP_IDLE) && - (backend_csf->dump_state != - KBASE_HWCNT_BACKEND_CSF_DUMP_COMPLETED) && - (backend_csf->dump_state != - KBASE_HWCNT_BACKEND_CSF_DUMP_WATCHDOG_REQUESTED)) { + (backend_csf->dump_state != KBASE_HWCNT_BACKEND_CSF_DUMP_COMPLETED) && + (backend_csf->dump_state != KBASE_HWCNT_BACKEND_CSF_DUMP_WATCHDOG_REQUESTED)) { /* HWC is disabled or another user dump is ongoing, * or we're on fault. */ - backend_csf->info->csf_if->unlock( - backend_csf->info->csf_if->ctx, flags); + backend_csf->info->csf_if->unlock(backend_csf->info->csf_if->ctx, flags); /* HWC is disabled or another dump is ongoing, or we are on * fault. */ @@ -1171,8 +1107,7 @@ kbasep_hwcnt_backend_csf_dump_request(struct kbase_hwcnt_backend *backend, /* Reset the completion so dump_wait() has something to wait on. */ reinit_completion(&backend_csf->dump_completed); - if (backend_csf->dump_state == - KBASE_HWCNT_BACKEND_CSF_DUMP_WATCHDOG_REQUESTED) + if (backend_csf->dump_state == KBASE_HWCNT_BACKEND_CSF_DUMP_WATCHDOG_REQUESTED) watchdog_dumping = true; if ((backend_csf->enable_state == KBASE_HWCNT_BACKEND_CSF_ENABLED) && @@ -1180,15 +1115,13 @@ kbasep_hwcnt_backend_csf_dump_request(struct kbase_hwcnt_backend *backend, /* Only do the request if we are fully enabled and not in * protected mode. 
*/ - backend_csf->dump_state = - KBASE_HWCNT_BACKEND_CSF_DUMP_REQUESTED; + backend_csf->dump_state = KBASE_HWCNT_BACKEND_CSF_DUMP_REQUESTED; do_request = true; } else { /* Skip the request and waiting for ack and go straight to * checking the insert and kicking off the worker to do the dump */ - backend_csf->dump_state = - KBASE_HWCNT_BACKEND_CSF_DUMP_QUERYING_INSERT; + backend_csf->dump_state = KBASE_HWCNT_BACKEND_CSF_DUMP_QUERYING_INSERT; } /* CSF firmware might enter protected mode now, but still call request. @@ -1210,31 +1143,26 @@ kbasep_hwcnt_backend_csf_dump_request(struct kbase_hwcnt_backend *backend, * ownership of the sample which watchdog requested. */ if (!watchdog_dumping) - backend_csf->info->csf_if->dump_request( - backend_csf->info->csf_if->ctx); + backend_csf->info->csf_if->dump_request(backend_csf->info->csf_if->ctx); } else kbase_hwcnt_backend_csf_submit_dump_worker(backend_csf->info); - backend_csf->info->csf_if->unlock(backend_csf->info->csf_if->ctx, - flags); + backend_csf->info->csf_if->unlock(backend_csf->info->csf_if->ctx, flags); /* Modify watchdog timer to delay the regular check time since * just requested. */ - backend_csf->info->watchdog_if->modify( - backend_csf->info->watchdog_if->timer, - HWCNT_BACKEND_WATCHDOG_TIMER_INTERVAL_MS); + backend_csf->info->watchdog_if->modify(backend_csf->info->watchdog_if->timer, + HWCNT_BACKEND_WATCHDOG_TIMER_INTERVAL_MS); return 0; } /* CSF backend implementation of kbase_hwcnt_backend_dump_wait_fn */ -static int -kbasep_hwcnt_backend_csf_dump_wait(struct kbase_hwcnt_backend *backend) +static int kbasep_hwcnt_backend_csf_dump_wait(struct kbase_hwcnt_backend *backend) { - unsigned long flags; - struct kbase_hwcnt_backend_csf *backend_csf = - (struct kbase_hwcnt_backend_csf *)backend; + unsigned long flags = 0UL; + struct kbase_hwcnt_backend_csf *backend_csf = (struct kbase_hwcnt_backend_csf *)backend; int errcode; if (!backend_csf) @@ -1247,26 +1175,21 @@ kbasep_hwcnt_backend_csf_dump_wait(struct kbase_hwcnt_backend *backend) * set. 
*/ if (backend_csf->user_requested && - ((backend_csf->dump_state == - KBASE_HWCNT_BACKEND_CSF_DUMP_COMPLETED) || - (backend_csf->dump_state == - KBASE_HWCNT_BACKEND_CSF_DUMP_WATCHDOG_REQUESTED))) + ((backend_csf->dump_state == KBASE_HWCNT_BACKEND_CSF_DUMP_COMPLETED) || + (backend_csf->dump_state == KBASE_HWCNT_BACKEND_CSF_DUMP_WATCHDOG_REQUESTED))) errcode = 0; else errcode = -EIO; - backend_csf->info->csf_if->unlock(backend_csf->info->csf_if->ctx, - flags); + backend_csf->info->csf_if->unlock(backend_csf->info->csf_if->ctx, flags); return errcode; } /* CSF backend implementation of kbase_hwcnt_backend_dump_clear_fn */ -static int -kbasep_hwcnt_backend_csf_dump_clear(struct kbase_hwcnt_backend *backend) +static int kbasep_hwcnt_backend_csf_dump_clear(struct kbase_hwcnt_backend *backend) { - struct kbase_hwcnt_backend_csf *backend_csf = - (struct kbase_hwcnt_backend_csf *)backend; + struct kbase_hwcnt_backend_csf *backend_csf = (struct kbase_hwcnt_backend_csf *)backend; int errcode; u64 ts; @@ -1285,13 +1208,12 @@ kbasep_hwcnt_backend_csf_dump_clear(struct kbase_hwcnt_backend *backend) } /* CSF backend implementation of kbase_hwcnt_backend_dump_get_fn */ -static int kbasep_hwcnt_backend_csf_dump_get( - struct kbase_hwcnt_backend *backend, - struct kbase_hwcnt_dump_buffer *dst, - const struct kbase_hwcnt_enable_map *dst_enable_map, bool accumulate) +static int kbasep_hwcnt_backend_csf_dump_get(struct kbase_hwcnt_backend *backend, + struct kbase_hwcnt_dump_buffer *dst, + const struct kbase_hwcnt_enable_map *dst_enable_map, + bool accumulate) { - struct kbase_hwcnt_backend_csf *backend_csf = - (struct kbase_hwcnt_backend_csf *)backend; + struct kbase_hwcnt_backend_csf *backend_csf = (struct kbase_hwcnt_backend_csf *)backend; int ret; size_t clk; @@ -1301,9 +1223,9 @@ static int kbasep_hwcnt_backend_csf_dump_get( return -EINVAL; /* Extract elapsed cycle count for each clock domain if enabled. */ - kbase_hwcnt_metadata_for_each_clock(dst_enable_map->metadata, clk) { - if (!kbase_hwcnt_clk_enable_map_enabled( - dst_enable_map->clk_enable_map, clk)) + kbase_hwcnt_metadata_for_each_clock(dst_enable_map->metadata, clk) + { + if (!kbase_hwcnt_clk_enable_map_enabled(dst_enable_map->clk_enable_map, clk)) continue; /* Reset the counter to zero if accumulation is off. */ @@ -1316,8 +1238,7 @@ static int kbasep_hwcnt_backend_csf_dump_get( * as it is undefined to call this function without a prior succeeding * one to dump_wait(). */ - ret = kbase_hwcnt_csf_dump_get(dst, backend_csf->to_user_buf, - dst_enable_map, accumulate); + ret = kbase_hwcnt_csf_dump_get(dst, backend_csf->to_user_buf, dst_enable_map, accumulate); return ret; } @@ -1329,8 +1250,7 @@ static int kbasep_hwcnt_backend_csf_dump_get( * Can be safely called on a backend in any state of partial construction. * */ -static void -kbasep_hwcnt_backend_csf_destroy(struct kbase_hwcnt_backend_csf *backend_csf) +static void kbasep_hwcnt_backend_csf_destroy(struct kbase_hwcnt_backend_csf *backend_csf) { if (!backend_csf) return; @@ -1360,9 +1280,8 @@ kbasep_hwcnt_backend_csf_destroy(struct kbase_hwcnt_backend_csf *backend_csf) * * Return: 0 on success, else error code. 
*/ -static int -kbasep_hwcnt_backend_csf_create(struct kbase_hwcnt_backend_csf_info *csf_info, - struct kbase_hwcnt_backend_csf **out_backend) +static int kbasep_hwcnt_backend_csf_create(struct kbase_hwcnt_backend_csf_info *csf_info, + struct kbase_hwcnt_backend_csf **out_backend) { struct kbase_hwcnt_backend_csf *backend_csf = NULL; int errcode = -ENOMEM; @@ -1375,27 +1294,23 @@ kbasep_hwcnt_backend_csf_create(struct kbase_hwcnt_backend_csf_info *csf_info, goto alloc_error; backend_csf->info = csf_info; - kbasep_hwcnt_backend_csf_init_layout(&csf_info->prfcnt_info, - &backend_csf->phys_layout); + kbasep_hwcnt_backend_csf_init_layout(&csf_info->prfcnt_info, &backend_csf->phys_layout); - backend_csf->accum_buf = - kzalloc(csf_info->metadata->dump_buf_bytes, GFP_KERNEL); + backend_csf->accum_buf = kzalloc(csf_info->metadata->dump_buf_bytes, GFP_KERNEL); if (!backend_csf->accum_buf) goto err_alloc_acc_buf; - backend_csf->old_sample_buf = - kzalloc(csf_info->prfcnt_info.dump_bytes, GFP_KERNEL); + backend_csf->old_sample_buf = kzalloc(csf_info->prfcnt_info.dump_bytes, GFP_KERNEL); if (!backend_csf->old_sample_buf) goto err_alloc_pre_sample_buf; - backend_csf->to_user_buf = - kzalloc(csf_info->metadata->dump_buf_bytes, GFP_KERNEL); + backend_csf->to_user_buf = kzalloc(csf_info->metadata->dump_buf_bytes, GFP_KERNEL); if (!backend_csf->to_user_buf) goto err_alloc_user_sample_buf; - errcode = csf_info->csf_if->ring_buf_alloc( - csf_info->csf_if->ctx, csf_info->ring_buf_cnt, - &backend_csf->ring_buf_cpu_base, &backend_csf->ring_buf); + errcode = csf_info->csf_if->ring_buf_alloc(csf_info->csf_if->ctx, csf_info->ring_buf_cnt, + &backend_csf->ring_buf_cpu_base, + &backend_csf->ring_buf); if (errcode) goto err_ring_buf_alloc; errcode = -ENOMEM; @@ -1404,9 +1319,9 @@ kbasep_hwcnt_backend_csf_create(struct kbase_hwcnt_backend_csf_info *csf_info, kbasep_hwcnt_backend_csf_zero_all_prfcnt_en_header(backend_csf); /* Sync zeroed buffers to avoid coherency issues on use. 
*/ - backend_csf->info->csf_if->ring_buf_sync( - backend_csf->info->csf_if->ctx, backend_csf->ring_buf, 0, - backend_csf->info->ring_buf_cnt, false); + backend_csf->info->csf_if->ring_buf_sync(backend_csf->info->csf_if->ctx, + backend_csf->ring_buf, 0, + backend_csf->info->ring_buf_cnt, false); init_completion(&backend_csf->dump_completed); @@ -1420,10 +1335,8 @@ kbasep_hwcnt_backend_csf_create(struct kbase_hwcnt_backend_csf_info *csf_info, if (!backend_csf->hwc_dump_workq) goto err_alloc_workqueue; - INIT_WORK(&backend_csf->hwc_dump_work, - kbasep_hwcnt_backend_csf_dump_worker); - INIT_WORK(&backend_csf->hwc_threshold_work, - kbasep_hwcnt_backend_csf_threshold_worker); + INIT_WORK(&backend_csf->hwc_dump_work, kbasep_hwcnt_backend_csf_dump_worker); + INIT_WORK(&backend_csf->hwc_threshold_work, kbasep_hwcnt_backend_csf_threshold_worker); backend_csf->enable_state = KBASE_HWCNT_BACKEND_CSF_DISABLED; backend_csf->dump_state = KBASE_HWCNT_BACKEND_CSF_DUMP_IDLE; @@ -1434,7 +1347,6 @@ kbasep_hwcnt_backend_csf_create(struct kbase_hwcnt_backend_csf_info *csf_info, *out_backend = backend_csf; return 0; - destroy_workqueue(backend_csf->hwc_dump_workq); err_alloc_workqueue: backend_csf->info->csf_if->ring_buf_free(backend_csf->info->csf_if->ctx, backend_csf->ring_buf); @@ -1454,14 +1366,12 @@ alloc_error: } /* CSF backend implementation of kbase_hwcnt_backend_init_fn */ -static int -kbasep_hwcnt_backend_csf_init(const struct kbase_hwcnt_backend_info *info, - struct kbase_hwcnt_backend **out_backend) +static int kbasep_hwcnt_backend_csf_init(const struct kbase_hwcnt_backend_info *info, + struct kbase_hwcnt_backend **out_backend) { - unsigned long flags; + unsigned long flags = 0UL; struct kbase_hwcnt_backend_csf *backend_csf = NULL; - struct kbase_hwcnt_backend_csf_info *csf_info = - (struct kbase_hwcnt_backend_csf_info *)info; + struct kbase_hwcnt_backend_csf_info *csf_info = (struct kbase_hwcnt_backend_csf_info *)info; int errcode; bool success = false; @@ -1482,11 +1392,9 @@ kbasep_hwcnt_backend_csf_init(const struct kbase_hwcnt_backend_info *info, *out_backend = (struct kbase_hwcnt_backend *)backend_csf; success = true; if (csf_info->unrecoverable_error_happened) - backend_csf->enable_state = - KBASE_HWCNT_BACKEND_CSF_UNRECOVERABLE_ERROR; + backend_csf->enable_state = KBASE_HWCNT_BACKEND_CSF_UNRECOVERABLE_ERROR; } - backend_csf->info->csf_if->unlock(backend_csf->info->csf_if->ctx, - flags); + backend_csf->info->csf_if->unlock(backend_csf->info->csf_if->ctx, flags); /* Destroy the new created backend if the backend has already created * before. 
In normal case, this won't happen if the client call init() @@ -1503,9 +1411,8 @@ kbasep_hwcnt_backend_csf_init(const struct kbase_hwcnt_backend_info *info, /* CSF backend implementation of kbase_hwcnt_backend_term_fn */ static void kbasep_hwcnt_backend_csf_term(struct kbase_hwcnt_backend *backend) { - unsigned long flags; - struct kbase_hwcnt_backend_csf *backend_csf = - (struct kbase_hwcnt_backend_csf *)backend; + unsigned long flags = 0UL; + struct kbase_hwcnt_backend_csf *backend_csf = (struct kbase_hwcnt_backend_csf *)backend; if (!backend) return; @@ -1517,8 +1424,7 @@ static void kbasep_hwcnt_backend_csf_term(struct kbase_hwcnt_backend *backend) */ backend_csf->info->csf_if->lock(backend_csf->info->csf_if->ctx, &flags); backend_csf->info->backend = NULL; - backend_csf->info->csf_if->unlock(backend_csf->info->csf_if->ctx, - flags); + backend_csf->info->csf_if->unlock(backend_csf->info->csf_if->ctx, flags); kbasep_hwcnt_backend_csf_destroy(backend_csf); } @@ -1530,8 +1436,7 @@ static void kbasep_hwcnt_backend_csf_term(struct kbase_hwcnt_backend *backend) * Can be safely called on a backend info in any state of partial construction. * */ -static void kbasep_hwcnt_backend_csf_info_destroy( - const struct kbase_hwcnt_backend_csf_info *info) +static void kbasep_hwcnt_backend_csf_info_destroy(const struct kbase_hwcnt_backend_csf_info *info) { if (!info) return; @@ -1558,10 +1463,10 @@ static void kbasep_hwcnt_backend_csf_info_destroy( * * Return: 0 on success, else error code. */ -static int kbasep_hwcnt_backend_csf_info_create( - struct kbase_hwcnt_backend_csf_if *csf_if, u32 ring_buf_cnt, - struct kbase_hwcnt_watchdog_interface *watchdog_if, - const struct kbase_hwcnt_backend_csf_info **out_info) +static int +kbasep_hwcnt_backend_csf_info_create(struct kbase_hwcnt_backend_csf_if *csf_if, u32 ring_buf_cnt, + struct kbase_hwcnt_watchdog_interface *watchdog_if, + const struct kbase_hwcnt_backend_csf_info **out_info) { struct kbase_hwcnt_backend_csf_info *info = NULL; @@ -1584,8 +1489,7 @@ static int kbasep_hwcnt_backend_csf_info_create( .counter_set = KBASE_HWCNT_SET_PRIMARY, #endif .backend = NULL, .csf_if = csf_if, .ring_buf_cnt = ring_buf_cnt, - .fw_in_protected_mode = false, - .unrecoverable_error_happened = false, + .fw_in_protected_mode = false, .unrecoverable_error_happened = false, .watchdog_if = watchdog_if, }; *out_info = info; @@ -1605,19 +1509,17 @@ kbasep_hwcnt_backend_csf_metadata(const struct kbase_hwcnt_backend_info *info) return ((const struct kbase_hwcnt_backend_csf_info *)info)->metadata; } -static void kbasep_hwcnt_backend_csf_handle_unrecoverable_error( - struct kbase_hwcnt_backend_csf *backend_csf) +static void +kbasep_hwcnt_backend_csf_handle_unrecoverable_error(struct kbase_hwcnt_backend_csf *backend_csf) { bool do_disable = false; - backend_csf->info->csf_if->assert_lock_held( - backend_csf->info->csf_if->ctx); + backend_csf->info->csf_if->assert_lock_held(backend_csf->info->csf_if->ctx); /* We are already in or transitioning to the unrecoverable error state. * Early out. 
*/ - if ((backend_csf->enable_state == - KBASE_HWCNT_BACKEND_CSF_UNRECOVERABLE_ERROR) || + if ((backend_csf->enable_state == KBASE_HWCNT_BACKEND_CSF_UNRECOVERABLE_ERROR) || (backend_csf->enable_state == KBASE_HWCNT_BACKEND_CSF_UNRECOVERABLE_ERROR_WAIT_FOR_WORKER)) return; @@ -1627,8 +1529,7 @@ static void kbasep_hwcnt_backend_csf_handle_unrecoverable_error( */ if (backend_csf->enable_state == KBASE_HWCNT_BACKEND_CSF_DISABLED) { kbasep_hwcnt_backend_csf_change_es_and_wake_waiters( - backend_csf, - KBASE_HWCNT_BACKEND_CSF_UNRECOVERABLE_ERROR); + backend_csf, KBASE_HWCNT_BACKEND_CSF_UNRECOVERABLE_ERROR); return; } @@ -1636,12 +1537,11 @@ static void kbasep_hwcnt_backend_csf_handle_unrecoverable_error( * disabled, we don't want to disable twice if an unrecoverable error * happens while we are disabling. */ - do_disable = (backend_csf->enable_state != - KBASE_HWCNT_BACKEND_CSF_TRANSITIONING_TO_DISABLED); + do_disable = + (backend_csf->enable_state != KBASE_HWCNT_BACKEND_CSF_TRANSITIONING_TO_DISABLED); kbasep_hwcnt_backend_csf_change_es_and_wake_waiters( - backend_csf, - KBASE_HWCNT_BACKEND_CSF_UNRECOVERABLE_ERROR_WAIT_FOR_WORKER); + backend_csf, KBASE_HWCNT_BACKEND_CSF_UNRECOVERABLE_ERROR_WAIT_FOR_WORKER); /* Transition the dump to the IDLE state and unblock any waiters. The * IDLE state signifies an error. @@ -1654,15 +1554,13 @@ static void kbasep_hwcnt_backend_csf_handle_unrecoverable_error( * happens while we are disabling. */ if (do_disable) - backend_csf->info->csf_if->dump_disable( - backend_csf->info->csf_if->ctx); + backend_csf->info->csf_if->dump_disable(backend_csf->info->csf_if->ctx); } -static void kbasep_hwcnt_backend_csf_handle_recoverable_error( - struct kbase_hwcnt_backend_csf *backend_csf) +static void +kbasep_hwcnt_backend_csf_handle_recoverable_error(struct kbase_hwcnt_backend_csf *backend_csf) { - backend_csf->info->csf_if->assert_lock_held( - backend_csf->info->csf_if->ctx); + backend_csf->info->csf_if->assert_lock_held(backend_csf->info->csf_if->ctx); switch (backend_csf->enable_state) { case KBASE_HWCNT_BACKEND_CSF_DISABLED: @@ -1678,8 +1576,7 @@ static void kbasep_hwcnt_backend_csf_handle_recoverable_error( /* A seemingly recoverable error that occurs while we are * transitioning to enabled is probably unrecoverable. */ - kbasep_hwcnt_backend_csf_handle_unrecoverable_error( - backend_csf); + kbasep_hwcnt_backend_csf_handle_unrecoverable_error(backend_csf); return; case KBASE_HWCNT_BACKEND_CSF_ENABLED: /* Start transitioning to the disabled state. We can't wait for @@ -1688,22 +1585,19 @@ static void kbasep_hwcnt_backend_csf_handle_recoverable_error( * disable(). */ kbasep_hwcnt_backend_csf_change_es_and_wake_waiters( - backend_csf, - KBASE_HWCNT_BACKEND_CSF_TRANSITIONING_TO_DISABLED); + backend_csf, KBASE_HWCNT_BACKEND_CSF_TRANSITIONING_TO_DISABLED); /* Transition the dump to the IDLE state and unblock any * waiters. The IDLE state signifies an error. 
*/ backend_csf->dump_state = KBASE_HWCNT_BACKEND_CSF_DUMP_IDLE; complete_all(&backend_csf->dump_completed); - backend_csf->info->csf_if->dump_disable( - backend_csf->info->csf_if->ctx); + backend_csf->info->csf_if->dump_disable(backend_csf->info->csf_if->ctx); return; } } -void kbase_hwcnt_backend_csf_protm_entered( - struct kbase_hwcnt_backend_interface *iface) +void kbase_hwcnt_backend_csf_protm_entered(struct kbase_hwcnt_backend_interface *iface) { struct kbase_hwcnt_backend_csf_info *csf_info = (struct kbase_hwcnt_backend_csf_info *)iface->info; @@ -1717,8 +1611,7 @@ void kbase_hwcnt_backend_csf_protm_entered( kbase_hwcnt_backend_csf_on_prfcnt_sample(iface); } -void kbase_hwcnt_backend_csf_protm_exited( - struct kbase_hwcnt_backend_interface *iface) +void kbase_hwcnt_backend_csf_protm_exited(struct kbase_hwcnt_backend_interface *iface) { struct kbase_hwcnt_backend_csf_info *csf_info; @@ -1728,10 +1621,9 @@ void kbase_hwcnt_backend_csf_protm_exited( csf_info->fw_in_protected_mode = false; } -void kbase_hwcnt_backend_csf_on_unrecoverable_error( - struct kbase_hwcnt_backend_interface *iface) +void kbase_hwcnt_backend_csf_on_unrecoverable_error(struct kbase_hwcnt_backend_interface *iface) { - unsigned long flags; + unsigned long flags = 0UL; struct kbase_hwcnt_backend_csf_info *csf_info; csf_info = (struct kbase_hwcnt_backend_csf_info *)iface->info; @@ -1749,10 +1641,9 @@ void kbase_hwcnt_backend_csf_on_unrecoverable_error( csf_info->csf_if->unlock(csf_info->csf_if->ctx, flags); } -void kbase_hwcnt_backend_csf_on_before_reset( - struct kbase_hwcnt_backend_interface *iface) +void kbase_hwcnt_backend_csf_on_before_reset(struct kbase_hwcnt_backend_interface *iface) { - unsigned long flags; + unsigned long flags = 0UL; struct kbase_hwcnt_backend_csf_info *csf_info; struct kbase_hwcnt_backend_csf *backend_csf; @@ -1768,8 +1659,7 @@ void kbase_hwcnt_backend_csf_on_before_reset( backend_csf = csf_info->backend; if ((backend_csf->enable_state != KBASE_HWCNT_BACKEND_CSF_DISABLED) && - (backend_csf->enable_state != - KBASE_HWCNT_BACKEND_CSF_UNRECOVERABLE_ERROR)) { + (backend_csf->enable_state != KBASE_HWCNT_BACKEND_CSF_UNRECOVERABLE_ERROR)) { /* Before a reset occurs, we must either have been disabled * (else we lose data) or we should have encountered an * unrecoverable error. Either way, we will have disabled the @@ -1780,13 +1670,11 @@ void kbase_hwcnt_backend_csf_on_before_reset( * We can't wait for this disable to complete, but it doesn't * really matter, the power is being pulled. 
*/ - kbasep_hwcnt_backend_csf_handle_unrecoverable_error( - csf_info->backend); + kbasep_hwcnt_backend_csf_handle_unrecoverable_error(csf_info->backend); } /* A reset is the only way to exit the unrecoverable error state */ - if (backend_csf->enable_state == - KBASE_HWCNT_BACKEND_CSF_UNRECOVERABLE_ERROR) { + if (backend_csf->enable_state == KBASE_HWCNT_BACKEND_CSF_UNRECOVERABLE_ERROR) { kbasep_hwcnt_backend_csf_change_es_and_wake_waiters( backend_csf, KBASE_HWCNT_BACKEND_CSF_DISABLED); } @@ -1794,8 +1682,7 @@ void kbase_hwcnt_backend_csf_on_before_reset( csf_info->csf_if->unlock(csf_info->csf_if->ctx, flags); } -void kbase_hwcnt_backend_csf_on_prfcnt_sample( - struct kbase_hwcnt_backend_interface *iface) +void kbase_hwcnt_backend_csf_on_prfcnt_sample(struct kbase_hwcnt_backend_interface *iface) { struct kbase_hwcnt_backend_csf_info *csf_info; struct kbase_hwcnt_backend_csf *backend_csf; @@ -1809,10 +1696,8 @@ void kbase_hwcnt_backend_csf_on_prfcnt_sample( backend_csf = csf_info->backend; /* Skip the dump_work if it's a watchdog request. */ - if (backend_csf->dump_state == - KBASE_HWCNT_BACKEND_CSF_DUMP_WATCHDOG_REQUESTED) { - backend_csf->dump_state = - KBASE_HWCNT_BACKEND_CSF_DUMP_COMPLETED; + if (backend_csf->dump_state == KBASE_HWCNT_BACKEND_CSF_DUMP_WATCHDOG_REQUESTED) { + backend_csf->dump_state = KBASE_HWCNT_BACKEND_CSF_DUMP_COMPLETED; return; } @@ -1826,8 +1711,7 @@ void kbase_hwcnt_backend_csf_on_prfcnt_sample( kbase_hwcnt_backend_csf_submit_dump_worker(csf_info); } -void kbase_hwcnt_backend_csf_on_prfcnt_threshold( - struct kbase_hwcnt_backend_interface *iface) +void kbase_hwcnt_backend_csf_on_prfcnt_threshold(struct kbase_hwcnt_backend_interface *iface) { struct kbase_hwcnt_backend_csf_info *csf_info; struct kbase_hwcnt_backend_csf *backend_csf; @@ -1844,12 +1728,10 @@ void kbase_hwcnt_backend_csf_on_prfcnt_threshold( /* Submit the threshold work into the work queue to consume the * available samples. */ - queue_work(backend_csf->hwc_dump_workq, - &backend_csf->hwc_threshold_work); + queue_work(backend_csf->hwc_dump_workq, &backend_csf->hwc_threshold_work); } -void kbase_hwcnt_backend_csf_on_prfcnt_overflow( - struct kbase_hwcnt_backend_interface *iface) +void kbase_hwcnt_backend_csf_on_prfcnt_overflow(struct kbase_hwcnt_backend_interface *iface) { struct kbase_hwcnt_backend_csf_info *csf_info; @@ -1870,8 +1752,7 @@ void kbase_hwcnt_backend_csf_on_prfcnt_overflow( kbasep_hwcnt_backend_csf_handle_recoverable_error(csf_info->backend); } -void kbase_hwcnt_backend_csf_on_prfcnt_enable( - struct kbase_hwcnt_backend_interface *iface) +void kbase_hwcnt_backend_csf_on_prfcnt_enable(struct kbase_hwcnt_backend_interface *iface) { struct kbase_hwcnt_backend_csf_info *csf_info; struct kbase_hwcnt_backend_csf *backend_csf; @@ -1884,12 +1765,10 @@ void kbase_hwcnt_backend_csf_on_prfcnt_enable( return; backend_csf = csf_info->backend; - if (backend_csf->enable_state == - KBASE_HWCNT_BACKEND_CSF_TRANSITIONING_TO_ENABLED) { + if (backend_csf->enable_state == KBASE_HWCNT_BACKEND_CSF_TRANSITIONING_TO_ENABLED) { kbasep_hwcnt_backend_csf_change_es_and_wake_waiters( backend_csf, KBASE_HWCNT_BACKEND_CSF_ENABLED); - } else if (backend_csf->enable_state == - KBASE_HWCNT_BACKEND_CSF_ENABLED) { + } else if (backend_csf->enable_state == KBASE_HWCNT_BACKEND_CSF_ENABLED) { /* Unexpected, but we are already in the right state so just * ignore it. */ @@ -1897,13 +1776,11 @@ void kbase_hwcnt_backend_csf_on_prfcnt_enable( /* Unexpected state change, assume everything is broken until * we reset. 
*/ - kbasep_hwcnt_backend_csf_handle_unrecoverable_error( - csf_info->backend); + kbasep_hwcnt_backend_csf_handle_unrecoverable_error(csf_info->backend); } } -void kbase_hwcnt_backend_csf_on_prfcnt_disable( - struct kbase_hwcnt_backend_interface *iface) +void kbase_hwcnt_backend_csf_on_prfcnt_disable(struct kbase_hwcnt_backend_interface *iface) { struct kbase_hwcnt_backend_csf_info *csf_info; struct kbase_hwcnt_backend_csf *backend_csf; @@ -1916,13 +1793,10 @@ void kbase_hwcnt_backend_csf_on_prfcnt_disable( return; backend_csf = csf_info->backend; - if (backend_csf->enable_state == - KBASE_HWCNT_BACKEND_CSF_TRANSITIONING_TO_DISABLED) { + if (backend_csf->enable_state == KBASE_HWCNT_BACKEND_CSF_TRANSITIONING_TO_DISABLED) { kbasep_hwcnt_backend_csf_change_es_and_wake_waiters( - backend_csf, - KBASE_HWCNT_BACKEND_CSF_DISABLED_WAIT_FOR_WORKER); - } else if (backend_csf->enable_state == - KBASE_HWCNT_BACKEND_CSF_DISABLED) { + backend_csf, KBASE_HWCNT_BACKEND_CSF_DISABLED_WAIT_FOR_WORKER); + } else if (backend_csf->enable_state == KBASE_HWCNT_BACKEND_CSF_DISABLED) { /* Unexpected, but we are already in the right state so just * ignore it. */ @@ -1930,15 +1804,12 @@ void kbase_hwcnt_backend_csf_on_prfcnt_disable( /* Unexpected state change, assume everything is broken until * we reset. */ - kbasep_hwcnt_backend_csf_handle_unrecoverable_error( - csf_info->backend); + kbasep_hwcnt_backend_csf_handle_unrecoverable_error(csf_info->backend); } } -int kbase_hwcnt_backend_csf_metadata_init( - struct kbase_hwcnt_backend_interface *iface) +int kbase_hwcnt_backend_csf_metadata_init(struct kbase_hwcnt_backend_interface *iface) { - int errcode; struct kbase_hwcnt_backend_csf_info *csf_info; struct kbase_hwcnt_gpu_info gpu_info; @@ -1949,8 +1820,7 @@ int kbase_hwcnt_backend_csf_metadata_init( WARN_ON(!csf_info->csf_if->get_prfcnt_info); - csf_info->csf_if->get_prfcnt_info(csf_info->csf_if->ctx, - &csf_info->prfcnt_info); + csf_info->csf_if->get_prfcnt_info(csf_info->csf_if->ctx, &csf_info->prfcnt_info); /* The clock domain counts should not exceed the number of maximum * number of clock regulators. @@ -1962,25 +1832,12 @@ int kbase_hwcnt_backend_csf_metadata_init( gpu_info.core_mask = csf_info->prfcnt_info.core_mask; gpu_info.clk_cnt = csf_info->prfcnt_info.clk_cnt; gpu_info.prfcnt_values_per_block = - csf_info->prfcnt_info.prfcnt_block_size / - KBASE_HWCNT_VALUE_HW_BYTES; - errcode = kbase_hwcnt_csf_metadata_create( - &gpu_info, csf_info->counter_set, &csf_info->metadata); - if (errcode) - return errcode; - - /* - * Dump abstraction size should be exactly twice the size and layout as - * the physical dump size since 64-bit per value used in metadata. 
- */ - WARN_ON(csf_info->prfcnt_info.dump_bytes * 2 != - csf_info->metadata->dump_buf_bytes); - - return 0; + csf_info->prfcnt_info.prfcnt_block_size / KBASE_HWCNT_VALUE_HW_BYTES; + return kbase_hwcnt_csf_metadata_create(&gpu_info, csf_info->counter_set, + &csf_info->metadata); } -void kbase_hwcnt_backend_csf_metadata_term( - struct kbase_hwcnt_backend_interface *iface) +void kbase_hwcnt_backend_csf_metadata_term(struct kbase_hwcnt_backend_interface *iface) { struct kbase_hwcnt_backend_csf_info *csf_info; @@ -1994,10 +1851,9 @@ void kbase_hwcnt_backend_csf_metadata_term( } } -int kbase_hwcnt_backend_csf_create( - struct kbase_hwcnt_backend_csf_if *csf_if, u32 ring_buf_cnt, - struct kbase_hwcnt_watchdog_interface *watchdog_if, - struct kbase_hwcnt_backend_interface *iface) +int kbase_hwcnt_backend_csf_create(struct kbase_hwcnt_backend_csf_if *csf_if, u32 ring_buf_cnt, + struct kbase_hwcnt_watchdog_interface *watchdog_if, + struct kbase_hwcnt_backend_interface *iface) { int errcode; const struct kbase_hwcnt_backend_csf_info *info = NULL; @@ -2009,8 +1865,7 @@ int kbase_hwcnt_backend_csf_create( if (!is_power_of_2(ring_buf_cnt)) return -EINVAL; - errcode = kbasep_hwcnt_backend_csf_info_create(csf_if, ring_buf_cnt, - watchdog_if, &info); + errcode = kbasep_hwcnt_backend_csf_info_create(csf_if, ring_buf_cnt, watchdog_if, &info); if (errcode) return errcode; diff --git a/mali_kbase/mali_kbase_hwcnt_backend_csf.h b/mali_kbase/hwcnt/backend/mali_kbase_hwcnt_backend_csf.h index e0cafbe..9c5a5c9 100644 --- a/mali_kbase/mali_kbase_hwcnt_backend_csf.h +++ b/mali_kbase/hwcnt/backend/mali_kbase_hwcnt_backend_csf.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2021-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -27,9 +27,9 @@ #ifndef _KBASE_HWCNT_BACKEND_CSF_H_ #define _KBASE_HWCNT_BACKEND_CSF_H_ -#include "mali_kbase_hwcnt_backend.h" -#include "mali_kbase_hwcnt_backend_csf_if.h" -#include "mali_kbase_hwcnt_watchdog_if.h" +#include "hwcnt/backend/mali_kbase_hwcnt_backend.h" +#include "hwcnt/backend/mali_kbase_hwcnt_backend_csf_if.h" +#include "hwcnt/mali_kbase_hwcnt_watchdog_if.h" /** * kbase_hwcnt_backend_csf_create() - Create a CSF hardware counter backend @@ -47,10 +47,9 @@ * * Return: 0 on success, else error code. */ -int kbase_hwcnt_backend_csf_create( - struct kbase_hwcnt_backend_csf_if *csf_if, u32 ring_buf_cnt, - struct kbase_hwcnt_watchdog_interface *watchdog_if, - struct kbase_hwcnt_backend_interface *iface); +int kbase_hwcnt_backend_csf_create(struct kbase_hwcnt_backend_csf_if *csf_if, u32 ring_buf_cnt, + struct kbase_hwcnt_watchdog_interface *watchdog_if, + struct kbase_hwcnt_backend_interface *iface); /** * kbase_hwcnt_backend_csf_metadata_init() - Initialize the metadata for a CSF @@ -58,16 +57,14 @@ int kbase_hwcnt_backend_csf_create( * @iface: Non-NULL pointer to backend interface structure * Return: 0 on success, else error code. */ -int kbase_hwcnt_backend_csf_metadata_init( - struct kbase_hwcnt_backend_interface *iface); +int kbase_hwcnt_backend_csf_metadata_init(struct kbase_hwcnt_backend_interface *iface); /** * kbase_hwcnt_backend_csf_metadata_term() - Terminate the metadata for a CSF * hardware counter backend. * @iface: Non-NULL pointer to backend interface structure. 
*/ -void kbase_hwcnt_backend_csf_metadata_term( - struct kbase_hwcnt_backend_interface *iface); +void kbase_hwcnt_backend_csf_metadata_term(struct kbase_hwcnt_backend_interface *iface); /** * kbase_hwcnt_backend_csf_destroy() - Destroy a CSF hardware counter backend @@ -77,8 +74,7 @@ void kbase_hwcnt_backend_csf_metadata_term( * Can be safely called on an all-zeroed interface, or on an already destroyed * interface. */ -void kbase_hwcnt_backend_csf_destroy( - struct kbase_hwcnt_backend_interface *iface); +void kbase_hwcnt_backend_csf_destroy(struct kbase_hwcnt_backend_interface *iface); /** * kbase_hwcnt_backend_csf_protm_entered() - CSF HWC backend function to receive @@ -86,8 +82,7 @@ void kbase_hwcnt_backend_csf_destroy( * has been entered. * @iface: Non-NULL pointer to HWC backend interface. */ -void kbase_hwcnt_backend_csf_protm_entered( - struct kbase_hwcnt_backend_interface *iface); +void kbase_hwcnt_backend_csf_protm_entered(struct kbase_hwcnt_backend_interface *iface); /** * kbase_hwcnt_backend_csf_protm_exited() - CSF HWC backend function to receive @@ -95,8 +90,7 @@ void kbase_hwcnt_backend_csf_protm_entered( * been exited. * @iface: Non-NULL pointer to HWC backend interface. */ -void kbase_hwcnt_backend_csf_protm_exited( - struct kbase_hwcnt_backend_interface *iface); +void kbase_hwcnt_backend_csf_protm_exited(struct kbase_hwcnt_backend_interface *iface); /** * kbase_hwcnt_backend_csf_on_unrecoverable_error() - CSF HWC backend function @@ -108,8 +102,7 @@ void kbase_hwcnt_backend_csf_protm_exited( * with reset, or that may put HWC logic in state that could result in hang. For * example, on bus error, or when FW becomes unresponsive. */ -void kbase_hwcnt_backend_csf_on_unrecoverable_error( - struct kbase_hwcnt_backend_interface *iface); +void kbase_hwcnt_backend_csf_on_unrecoverable_error(struct kbase_hwcnt_backend_interface *iface); /** * kbase_hwcnt_backend_csf_on_before_reset() - CSF HWC backend function to be @@ -119,16 +112,14 @@ void kbase_hwcnt_backend_csf_on_unrecoverable_error( * were in it. * @iface: Non-NULL pointer to HWC backend interface. */ -void kbase_hwcnt_backend_csf_on_before_reset( - struct kbase_hwcnt_backend_interface *iface); +void kbase_hwcnt_backend_csf_on_before_reset(struct kbase_hwcnt_backend_interface *iface); /** * kbase_hwcnt_backend_csf_on_prfcnt_sample() - CSF performance counter sample * complete interrupt handler. * @iface: Non-NULL pointer to HWC backend interface. */ -void kbase_hwcnt_backend_csf_on_prfcnt_sample( - struct kbase_hwcnt_backend_interface *iface); +void kbase_hwcnt_backend_csf_on_prfcnt_sample(struct kbase_hwcnt_backend_interface *iface); /** * kbase_hwcnt_backend_csf_on_prfcnt_threshold() - CSF performance counter @@ -136,31 +127,27 @@ void kbase_hwcnt_backend_csf_on_prfcnt_sample( * interrupt handler. * @iface: Non-NULL pointer to HWC backend interface. */ -void kbase_hwcnt_backend_csf_on_prfcnt_threshold( - struct kbase_hwcnt_backend_interface *iface); +void kbase_hwcnt_backend_csf_on_prfcnt_threshold(struct kbase_hwcnt_backend_interface *iface); /** * kbase_hwcnt_backend_csf_on_prfcnt_overflow() - CSF performance counter buffer * overflow interrupt handler. * @iface: Non-NULL pointer to HWC backend interface. */ -void kbase_hwcnt_backend_csf_on_prfcnt_overflow( - struct kbase_hwcnt_backend_interface *iface); +void kbase_hwcnt_backend_csf_on_prfcnt_overflow(struct kbase_hwcnt_backend_interface *iface); /** * kbase_hwcnt_backend_csf_on_prfcnt_enable() - CSF performance counter enabled * interrupt handler. 
* @iface: Non-NULL pointer to HWC backend interface. */ -void kbase_hwcnt_backend_csf_on_prfcnt_enable( - struct kbase_hwcnt_backend_interface *iface); +void kbase_hwcnt_backend_csf_on_prfcnt_enable(struct kbase_hwcnt_backend_interface *iface); /** * kbase_hwcnt_backend_csf_on_prfcnt_disable() - CSF performance counter * disabled interrupt handler. * @iface: Non-NULL pointer to HWC backend interface. */ -void kbase_hwcnt_backend_csf_on_prfcnt_disable( - struct kbase_hwcnt_backend_interface *iface); +void kbase_hwcnt_backend_csf_on_prfcnt_disable(struct kbase_hwcnt_backend_interface *iface); #endif /* _KBASE_HWCNT_BACKEND_CSF_H_ */ diff --git a/mali_kbase/mali_kbase_hwcnt_backend_csf_if.h b/mali_kbase/hwcnt/backend/mali_kbase_hwcnt_backend_csf_if.h index 9c4fef5..382a3ad 100644 --- a/mali_kbase/mali_kbase_hwcnt_backend_csf_if.h +++ b/mali_kbase/hwcnt/backend/mali_kbase_hwcnt_backend_csf_if.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2021-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -55,8 +55,12 @@ struct kbase_hwcnt_backend_csf_if_enable { /** * struct kbase_hwcnt_backend_csf_if_prfcnt_info - Performance counter * information. + * @prfcnt_hw_size: Total length in bytes of all the hardware counters data. The hardware + * counters are sub-divided into 4 classes: front-end, shader, tiler, and + * memory system (l2 cache + MMU). + * @prfcnt_fw_size: Total length in bytes of all the firmware counters data. * @dump_bytes: Bytes of GPU memory required to perform a performance - * counter dump. + * counter dump. dump_bytes = prfcnt_hw_size + prfcnt_fw_size. * @prfcnt_block_size: Bytes of each performance counter block. * @l2_count: The MMU L2 cache count. * @core_mask: Shader core mask. @@ -65,6 +69,8 @@ struct kbase_hwcnt_backend_csf_if_enable { * is taken. */ struct kbase_hwcnt_backend_csf_if_prfcnt_info { + size_t prfcnt_hw_size; + size_t prfcnt_fw_size; size_t dump_bytes; size_t prfcnt_block_size; size_t l2_count; @@ -79,8 +85,8 @@ struct kbase_hwcnt_backend_csf_if_prfcnt_info { * held. * @ctx: Non-NULL pointer to a CSF context. */ -typedef void kbase_hwcnt_backend_csf_if_assert_lock_held_fn( - struct kbase_hwcnt_backend_csf_if_ctx *ctx); +typedef void +kbase_hwcnt_backend_csf_if_assert_lock_held_fn(struct kbase_hwcnt_backend_csf_if_ctx *ctx); /** * typedef kbase_hwcnt_backend_csf_if_lock_fn - Acquire backend spinlock. @@ -89,9 +95,8 @@ typedef void kbase_hwcnt_backend_csf_if_assert_lock_held_fn( * @flags: Pointer to the memory location that would store the previous * interrupt state. */ -typedef void kbase_hwcnt_backend_csf_if_lock_fn( - struct kbase_hwcnt_backend_csf_if_ctx *ctx, - unsigned long *flags); +typedef void kbase_hwcnt_backend_csf_if_lock_fn(struct kbase_hwcnt_backend_csf_if_ctx *ctx, + unsigned long *flags); /** * typedef kbase_hwcnt_backend_csf_if_unlock_fn - Release backend spinlock. @@ -100,9 +105,8 @@ typedef void kbase_hwcnt_backend_csf_if_lock_fn( * @flags: Previously stored interrupt state when Scheduler interrupt * spinlock was acquired. 
*/ -typedef void kbase_hwcnt_backend_csf_if_unlock_fn( - struct kbase_hwcnt_backend_csf_if_ctx *ctx, - unsigned long flags); +typedef void kbase_hwcnt_backend_csf_if_unlock_fn(struct kbase_hwcnt_backend_csf_if_ctx *ctx, + unsigned long flags); /** * typedef kbase_hwcnt_backend_csf_if_get_prfcnt_info_fn - Get performance @@ -131,10 +135,10 @@ typedef void kbase_hwcnt_backend_csf_if_get_prfcnt_info_fn( * * Return: 0 on success, else error code. */ -typedef int kbase_hwcnt_backend_csf_if_ring_buf_alloc_fn( - struct kbase_hwcnt_backend_csf_if_ctx *ctx, u32 buf_count, - void **cpu_dump_base, - struct kbase_hwcnt_backend_csf_if_ring_buf **ring_buf); +typedef int +kbase_hwcnt_backend_csf_if_ring_buf_alloc_fn(struct kbase_hwcnt_backend_csf_if_ctx *ctx, + u32 buf_count, void **cpu_dump_base, + struct kbase_hwcnt_backend_csf_if_ring_buf **ring_buf); /** * typedef kbase_hwcnt_backend_csf_if_ring_buf_sync_fn - Sync HWC dump buffers @@ -153,10 +157,10 @@ typedef int kbase_hwcnt_backend_csf_if_ring_buf_alloc_fn( * Flush cached HWC dump buffer data to ensure that all writes from GPU and CPU * are correctly observed. */ -typedef void kbase_hwcnt_backend_csf_if_ring_buf_sync_fn( - struct kbase_hwcnt_backend_csf_if_ctx *ctx, - struct kbase_hwcnt_backend_csf_if_ring_buf *ring_buf, - u32 buf_index_first, u32 buf_index_last, bool for_cpu); +typedef void +kbase_hwcnt_backend_csf_if_ring_buf_sync_fn(struct kbase_hwcnt_backend_csf_if_ctx *ctx, + struct kbase_hwcnt_backend_csf_if_ring_buf *ring_buf, + u32 buf_index_first, u32 buf_index_last, bool for_cpu); /** * typedef kbase_hwcnt_backend_csf_if_ring_buf_free_fn - Free a ring buffer for @@ -165,9 +169,9 @@ typedef void kbase_hwcnt_backend_csf_if_ring_buf_sync_fn( * @ctx: Non-NULL pointer to a CSF interface context. * @ring_buf: Non-NULL pointer to the ring buffer which to be freed. */ -typedef void kbase_hwcnt_backend_csf_if_ring_buf_free_fn( - struct kbase_hwcnt_backend_csf_if_ctx *ctx, - struct kbase_hwcnt_backend_csf_if_ring_buf *ring_buf); +typedef void +kbase_hwcnt_backend_csf_if_ring_buf_free_fn(struct kbase_hwcnt_backend_csf_if_ctx *ctx, + struct kbase_hwcnt_backend_csf_if_ring_buf *ring_buf); /** * typedef kbase_hwcnt_backend_csf_if_timestamp_ns_fn - Get the current @@ -177,8 +181,7 @@ typedef void kbase_hwcnt_backend_csf_if_ring_buf_free_fn( * * Return: CSF interface timestamp in nanoseconds. */ -typedef u64 kbase_hwcnt_backend_csf_if_timestamp_ns_fn( - struct kbase_hwcnt_backend_csf_if_ctx *ctx); +typedef u64 kbase_hwcnt_backend_csf_if_timestamp_ns_fn(struct kbase_hwcnt_backend_csf_if_ctx *ctx); /** * typedef kbase_hwcnt_backend_csf_if_dump_enable_fn - Setup and enable hardware @@ -189,10 +192,10 @@ typedef u64 kbase_hwcnt_backend_csf_if_timestamp_ns_fn( * * Requires lock to be taken before calling. */ -typedef void kbase_hwcnt_backend_csf_if_dump_enable_fn( - struct kbase_hwcnt_backend_csf_if_ctx *ctx, - struct kbase_hwcnt_backend_csf_if_ring_buf *ring_buf, - struct kbase_hwcnt_backend_csf_if_enable *enable); +typedef void +kbase_hwcnt_backend_csf_if_dump_enable_fn(struct kbase_hwcnt_backend_csf_if_ctx *ctx, + struct kbase_hwcnt_backend_csf_if_ring_buf *ring_buf, + struct kbase_hwcnt_backend_csf_if_enable *enable); /** * typedef kbase_hwcnt_backend_csf_if_dump_disable_fn - Disable hardware counter @@ -201,8 +204,7 @@ typedef void kbase_hwcnt_backend_csf_if_dump_enable_fn( * * Requires lock to be taken before calling. 
*/ -typedef void kbase_hwcnt_backend_csf_if_dump_disable_fn( - struct kbase_hwcnt_backend_csf_if_ctx *ctx); +typedef void kbase_hwcnt_backend_csf_if_dump_disable_fn(struct kbase_hwcnt_backend_csf_if_ctx *ctx); /** * typedef kbase_hwcnt_backend_csf_if_dump_request_fn - Request a HWC dump. @@ -211,8 +213,7 @@ typedef void kbase_hwcnt_backend_csf_if_dump_disable_fn( * * Requires lock to be taken before calling. */ -typedef void kbase_hwcnt_backend_csf_if_dump_request_fn( - struct kbase_hwcnt_backend_csf_if_ctx *ctx); +typedef void kbase_hwcnt_backend_csf_if_dump_request_fn(struct kbase_hwcnt_backend_csf_if_ctx *ctx); /** * typedef kbase_hwcnt_backend_csf_if_get_indexes_fn - Get current extract and @@ -225,9 +226,8 @@ typedef void kbase_hwcnt_backend_csf_if_dump_request_fn( * * Requires lock to be taken before calling. */ -typedef void kbase_hwcnt_backend_csf_if_get_indexes_fn( - struct kbase_hwcnt_backend_csf_if_ctx *ctx, u32 *extract_index, - u32 *insert_index); +typedef void kbase_hwcnt_backend_csf_if_get_indexes_fn(struct kbase_hwcnt_backend_csf_if_ctx *ctx, + u32 *extract_index, u32 *insert_index); /** * typedef kbase_hwcnt_backend_csf_if_set_extract_index_fn - Update the extract @@ -239,8 +239,9 @@ typedef void kbase_hwcnt_backend_csf_if_get_indexes_fn( * * Requires lock to be taken before calling. */ -typedef void kbase_hwcnt_backend_csf_if_set_extract_index_fn( - struct kbase_hwcnt_backend_csf_if_ctx *ctx, u32 extract_index); +typedef void +kbase_hwcnt_backend_csf_if_set_extract_index_fn(struct kbase_hwcnt_backend_csf_if_ctx *ctx, + u32 extract_index); /** * typedef kbase_hwcnt_backend_csf_if_get_gpu_cycle_count_fn - Get the current @@ -254,9 +255,9 @@ typedef void kbase_hwcnt_backend_csf_if_set_extract_index_fn( * * Requires lock to be taken before calling. */ -typedef void kbase_hwcnt_backend_csf_if_get_gpu_cycle_count_fn( - struct kbase_hwcnt_backend_csf_if_ctx *ctx, u64 *cycle_counts, - u64 clk_enable_map); +typedef void +kbase_hwcnt_backend_csf_if_get_gpu_cycle_count_fn(struct kbase_hwcnt_backend_csf_if_ctx *ctx, + u64 *cycle_counts, u64 clk_enable_map); /** * struct kbase_hwcnt_backend_csf_if - Hardware counter backend CSF virtual @@ -273,8 +274,6 @@ typedef void kbase_hwcnt_backend_csf_if_get_gpu_cycle_count_fn( * @timestamp_ns: Function ptr to get the current CSF interface * timestamp. * @dump_enable: Function ptr to enable dumping. - * @dump_enable_nolock: Function ptr to enable dumping while the - * backend-specific spinlock is already held. * @dump_disable: Function ptr to disable dumping. * @dump_request: Function ptr to request a dump. * @get_indexes: Function ptr to get extract and insert indexes of the diff --git a/mali_kbase/mali_kbase_hwcnt_backend_csf_if_fw.c b/mali_kbase/hwcnt/backend/mali_kbase_hwcnt_backend_csf_if_fw.c index 15ffbfa..c8cf934 100644 --- a/mali_kbase/mali_kbase_hwcnt_backend_csf_if_fw.c +++ b/mali_kbase/hwcnt/backend/mali_kbase_hwcnt_backend_csf_if_fw.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2021-2023 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -26,24 +26,19 @@ #include <mali_kbase.h> #include <gpu/mali_kbase_gpu_regmap.h> #include <device/mali_kbase_device.h> -#include "mali_kbase_hwcnt_gpu.h" -#include "mali_kbase_hwcnt_types.h" +#include "hwcnt/mali_kbase_hwcnt_gpu.h" +#include "hwcnt/mali_kbase_hwcnt_types.h" #include <csf/mali_kbase_csf_registers.h> #include "csf/mali_kbase_csf_firmware.h" -#include "mali_kbase_hwcnt_backend_csf_if_fw.h" +#include "hwcnt/backend/mali_kbase_hwcnt_backend_csf_if_fw.h" #include "mali_kbase_hwaccess_time.h" #include "backend/gpu/mali_kbase_clk_rate_trace_mgr.h" +#include <backend/gpu/mali_kbase_model_linux.h> #include <linux/log2.h> #include "mali_kbase_ccswe.h" -#if IS_ENABLED(CONFIG_MALI_NO_MALI) -#include <backend/gpu/mali_kbase_model_dummy.h> -#endif /* CONFIG_MALI_NO_MALI */ - -/** The number of nanoseconds in a second. */ -#define NSECS_IN_SEC 1000000000ull /* ns */ /* Ring buffer virtual address start at 4GB */ #define KBASE_HWC_CSF_RING_BUFFER_VA_START (1ull << 32) @@ -90,8 +85,8 @@ struct kbase_hwcnt_backend_csf_if_fw_ctx { struct kbase_ccswe ccswe_shader_cores; }; -static void kbasep_hwcnt_backend_csf_if_fw_assert_lock_held( - struct kbase_hwcnt_backend_csf_if_ctx *ctx) +static void +kbasep_hwcnt_backend_csf_if_fw_assert_lock_held(struct kbase_hwcnt_backend_csf_if_ctx *ctx) { struct kbase_hwcnt_backend_csf_if_fw_ctx *fw_ctx; struct kbase_device *kbdev; @@ -104,9 +99,10 @@ static void kbasep_hwcnt_backend_csf_if_fw_assert_lock_held( kbase_csf_scheduler_spin_lock_assert_held(kbdev); } -static void -kbasep_hwcnt_backend_csf_if_fw_lock(struct kbase_hwcnt_backend_csf_if_ctx *ctx, - unsigned long *flags) +static void kbasep_hwcnt_backend_csf_if_fw_lock(struct kbase_hwcnt_backend_csf_if_ctx *ctx, + unsigned long *flags) + __acquires(&(struct kbase_hwcnt_backend_csf_if_fw_ctx) + ctx->kbdev->csf.scheduler.interrupt_lock) { struct kbase_hwcnt_backend_csf_if_fw_ctx *fw_ctx; struct kbase_device *kbdev; @@ -119,8 +115,10 @@ kbasep_hwcnt_backend_csf_if_fw_lock(struct kbase_hwcnt_backend_csf_if_ctx *ctx, kbase_csf_scheduler_spin_lock(kbdev, flags); } -static void kbasep_hwcnt_backend_csf_if_fw_unlock( - struct kbase_hwcnt_backend_csf_if_ctx *ctx, unsigned long flags) +static void kbasep_hwcnt_backend_csf_if_fw_unlock(struct kbase_hwcnt_backend_csf_if_ctx *ctx, + unsigned long flags) + __releases(&(struct kbase_hwcnt_backend_csf_if_fw_ctx) + ctx->kbdev->csf.scheduler.interrupt_lock) { struct kbase_hwcnt_backend_csf_if_fw_ctx *fw_ctx; struct kbase_device *kbdev; @@ -141,22 +139,19 @@ static void kbasep_hwcnt_backend_csf_if_fw_unlock( * @clk_index: Clock index * @clk_rate_hz: Clock frequency(hz) */ -static void kbasep_hwcnt_backend_csf_if_fw_on_freq_change( - struct kbase_clk_rate_listener *rate_listener, u32 clk_index, - u32 clk_rate_hz) +static void +kbasep_hwcnt_backend_csf_if_fw_on_freq_change(struct kbase_clk_rate_listener *rate_listener, + u32 clk_index, u32 clk_rate_hz) { - struct kbase_hwcnt_backend_csf_if_fw_ctx *fw_ctx = - container_of(rate_listener, - struct kbase_hwcnt_backend_csf_if_fw_ctx, - rate_listener); + struct kbase_hwcnt_backend_csf_if_fw_ctx *fw_ctx = container_of( + rate_listener, struct kbase_hwcnt_backend_csf_if_fw_ctx, rate_listener); u64 timestamp_ns; if (clk_index != KBASE_CLOCK_DOMAIN_SHADER_CORES) return; timestamp_ns = ktime_get_raw_ns(); - kbase_ccswe_freq_change(&fw_ctx->ccswe_shader_cores, timestamp_ns, - 
clk_rate_hz); + kbase_ccswe_freq_change(&fw_ctx->ccswe_shader_cores, timestamp_ns, clk_rate_hz); } /** @@ -165,17 +160,16 @@ static void kbasep_hwcnt_backend_csf_if_fw_on_freq_change( * @fw_ctx: Non-NULL pointer to CSF firmware interface context. * @clk_enable_map: Non-NULL pointer to enable map specifying enabled counters. */ -static void kbasep_hwcnt_backend_csf_if_fw_cc_enable( - struct kbase_hwcnt_backend_csf_if_fw_ctx *fw_ctx, u64 clk_enable_map) +static void +kbasep_hwcnt_backend_csf_if_fw_cc_enable(struct kbase_hwcnt_backend_csf_if_fw_ctx *fw_ctx, + u64 clk_enable_map) { struct kbase_device *kbdev = fw_ctx->kbdev; - if (kbase_hwcnt_clk_enable_map_enabled( - clk_enable_map, KBASE_CLOCK_DOMAIN_SHADER_CORES)) { + if (kbase_hwcnt_clk_enable_map_enabled(clk_enable_map, KBASE_CLOCK_DOMAIN_SHADER_CORES)) { /* software estimation for non-top clock domains */ struct kbase_clk_rate_trace_manager *rtm = &kbdev->pm.clk_rtm; - const struct kbase_clk_data *clk_data = - rtm->clks[KBASE_CLOCK_DOMAIN_SHADER_CORES]; + const struct kbase_clk_data *clk_data = rtm->clks[KBASE_CLOCK_DOMAIN_SHADER_CORES]; u32 cur_freq; unsigned long flags; u64 timestamp_ns; @@ -186,11 +180,9 @@ static void kbasep_hwcnt_backend_csf_if_fw_cc_enable( cur_freq = (u32)clk_data->clock_val; kbase_ccswe_reset(&fw_ctx->ccswe_shader_cores); - kbase_ccswe_freq_change(&fw_ctx->ccswe_shader_cores, - timestamp_ns, cur_freq); + kbase_ccswe_freq_change(&fw_ctx->ccswe_shader_cores, timestamp_ns, cur_freq); - kbase_clk_rate_trace_manager_subscribe_no_lock( - rtm, &fw_ctx->rate_listener); + kbase_clk_rate_trace_manager_subscribe_no_lock(rtm, &fw_ctx->rate_listener); spin_unlock_irqrestore(&rtm->lock, flags); } @@ -203,17 +195,15 @@ static void kbasep_hwcnt_backend_csf_if_fw_cc_enable( * * @fw_ctx: Non-NULL pointer to CSF firmware interface context. 
*/ -static void kbasep_hwcnt_backend_csf_if_fw_cc_disable( - struct kbase_hwcnt_backend_csf_if_fw_ctx *fw_ctx) +static void +kbasep_hwcnt_backend_csf_if_fw_cc_disable(struct kbase_hwcnt_backend_csf_if_fw_ctx *fw_ctx) { struct kbase_device *kbdev = fw_ctx->kbdev; struct kbase_clk_rate_trace_manager *rtm = &kbdev->pm.clk_rtm; u64 clk_enable_map = fw_ctx->clk_enable_map; - if (kbase_hwcnt_clk_enable_map_enabled(clk_enable_map, - KBASE_CLOCK_DOMAIN_SHADER_CORES)) - kbase_clk_rate_trace_manager_unsubscribe( - rtm, &fw_ctx->rate_listener); + if (kbase_hwcnt_clk_enable_map_enabled(clk_enable_map, KBASE_CLOCK_DOMAIN_SHADER_CORES)) + kbase_clk_rate_trace_manager_unsubscribe(rtm, &fw_ctx->rate_listener); } static void kbasep_hwcnt_backend_csf_if_fw_get_prfcnt_info( @@ -221,32 +211,31 @@ static void kbasep_hwcnt_backend_csf_if_fw_get_prfcnt_info( struct kbase_hwcnt_backend_csf_if_prfcnt_info *prfcnt_info) { #if IS_ENABLED(CONFIG_MALI_NO_MALI) - size_t dummy_model_blk_count; struct kbase_hwcnt_backend_csf_if_fw_ctx *fw_ctx = (struct kbase_hwcnt_backend_csf_if_fw_ctx *)ctx; - prfcnt_info->l2_count = KBASE_DUMMY_MODEL_MAX_MEMSYS_BLOCKS; - prfcnt_info->core_mask = - (1ull << KBASE_DUMMY_MODEL_MAX_SHADER_CORES) - 1; - /* 1 FE block + 1 Tiler block + l2_count blocks + shader_core blocks */ - dummy_model_blk_count = - 2 + prfcnt_info->l2_count + fls64(prfcnt_info->core_mask); - prfcnt_info->dump_bytes = - dummy_model_blk_count * KBASE_DUMMY_MODEL_BLOCK_SIZE; - prfcnt_info->prfcnt_block_size = - KBASE_HWCNT_V5_DEFAULT_VALUES_PER_BLOCK * - KBASE_HWCNT_VALUE_HW_BYTES; - prfcnt_info->clk_cnt = 1; - prfcnt_info->clearing_samples = true; + *prfcnt_info = (struct kbase_hwcnt_backend_csf_if_prfcnt_info){ + .l2_count = KBASE_DUMMY_MODEL_MAX_MEMSYS_BLOCKS, + .core_mask = (1ull << KBASE_DUMMY_MODEL_MAX_SHADER_CORES) - 1, + .prfcnt_hw_size = + KBASE_DUMMY_MODEL_MAX_NUM_HARDWARE_BLOCKS * KBASE_DUMMY_MODEL_BLOCK_SIZE, + .prfcnt_fw_size = + KBASE_DUMMY_MODEL_MAX_FIRMWARE_BLOCKS * KBASE_DUMMY_MODEL_BLOCK_SIZE, + .dump_bytes = KBASE_DUMMY_MODEL_MAX_SAMPLE_SIZE, + .prfcnt_block_size = KBASE_DUMMY_MODEL_BLOCK_SIZE, + .clk_cnt = 1, + .clearing_samples = true, + }; + fw_ctx->buf_bytes = prfcnt_info->dump_bytes; #else struct kbase_hwcnt_backend_csf_if_fw_ctx *fw_ctx; struct kbase_device *kbdev; u32 prfcnt_size; - u32 prfcnt_hw_size = 0; - u32 prfcnt_fw_size = 0; - u32 prfcnt_block_size = KBASE_HWCNT_V5_DEFAULT_VALUES_PER_BLOCK * - KBASE_HWCNT_VALUE_HW_BYTES; + u32 prfcnt_hw_size; + u32 prfcnt_fw_size; + u32 prfcnt_block_size = + KBASE_HWCNT_V5_DEFAULT_VALUES_PER_BLOCK * KBASE_HWCNT_VALUE_HW_BYTES; WARN_ON(!ctx); WARN_ON(!prfcnt_info); @@ -254,8 +243,8 @@ static void kbasep_hwcnt_backend_csf_if_fw_get_prfcnt_info( fw_ctx = (struct kbase_hwcnt_backend_csf_if_fw_ctx *)ctx; kbdev = fw_ctx->kbdev; prfcnt_size = kbdev->csf.global_iface.prfcnt_size; - prfcnt_hw_size = (prfcnt_size & 0xFF) << 8; - prfcnt_fw_size = (prfcnt_size >> 16) << 8; + prfcnt_hw_size = GLB_PRFCNT_SIZE_HARDWARE_SIZE_GET(prfcnt_size); + prfcnt_fw_size = GLB_PRFCNT_SIZE_FIRMWARE_SIZE_GET(prfcnt_size); fw_ctx->buf_bytes = prfcnt_hw_size + prfcnt_fw_size; /* Read the block size if the GPU has the register PRFCNT_FEATURES @@ -263,33 +252,31 @@ static void kbasep_hwcnt_backend_csf_if_fw_get_prfcnt_info( */ if ((kbdev->gpu_props.props.raw_props.gpu_id & GPU_ID2_PRODUCT_MODEL) >= GPU_ID2_PRODUCT_TTUX) { - prfcnt_block_size = - PRFCNT_FEATURES_COUNTER_BLOCK_SIZE_GET(kbase_reg_read( - kbdev, GPU_CONTROL_REG(PRFCNT_FEATURES))) - << 8; + prfcnt_block_size = 
PRFCNT_FEATURES_COUNTER_BLOCK_SIZE_GET( + kbase_reg_read(kbdev, GPU_CONTROL_REG(PRFCNT_FEATURES))) + << 8; } - prfcnt_info->dump_bytes = fw_ctx->buf_bytes; - prfcnt_info->prfcnt_block_size = prfcnt_block_size; - prfcnt_info->l2_count = kbdev->gpu_props.props.l2_props.num_l2_slices; - prfcnt_info->core_mask = - kbdev->gpu_props.props.coherency_info.group[0].core_mask; - - prfcnt_info->clk_cnt = fw_ctx->clk_cnt; - prfcnt_info->clearing_samples = true; + *prfcnt_info = (struct kbase_hwcnt_backend_csf_if_prfcnt_info){ + .prfcnt_hw_size = prfcnt_hw_size, + .prfcnt_fw_size = prfcnt_fw_size, + .dump_bytes = fw_ctx->buf_bytes, + .prfcnt_block_size = prfcnt_block_size, + .l2_count = kbdev->gpu_props.props.l2_props.num_l2_slices, + .core_mask = kbdev->gpu_props.props.coherency_info.group[0].core_mask, + .clk_cnt = fw_ctx->clk_cnt, + .clearing_samples = true, + }; /* Block size must be multiple of counter size. */ - WARN_ON((prfcnt_info->prfcnt_block_size % KBASE_HWCNT_VALUE_HW_BYTES) != - 0); + WARN_ON((prfcnt_info->prfcnt_block_size % KBASE_HWCNT_VALUE_HW_BYTES) != 0); /* Total size must be multiple of block size. */ - WARN_ON((prfcnt_info->dump_bytes % prfcnt_info->prfcnt_block_size) != - 0); + WARN_ON((prfcnt_info->dump_bytes % prfcnt_info->prfcnt_block_size) != 0); #endif } static int kbasep_hwcnt_backend_csf_if_fw_ring_buf_alloc( - struct kbase_hwcnt_backend_csf_if_ctx *ctx, u32 buf_count, - void **cpu_dump_base, + struct kbase_hwcnt_backend_csf_if_ctx *ctx, u32 buf_count, void **cpu_dump_base, struct kbase_hwcnt_backend_csf_if_ring_buf **out_ring_buf) { struct kbase_device *kbdev; @@ -341,9 +328,8 @@ static int kbasep_hwcnt_backend_csf_if_fw_ring_buf_alloc( goto page_list_alloc_error; /* Get physical page for the buffer */ - ret = kbase_mem_pool_alloc_pages( - &kbdev->mem_pools.small[KBASE_MEM_GROUP_CSF_FW], num_pages, - phys, false); + ret = kbase_mem_pool_alloc_pages(&kbdev->mem_pools.small[KBASE_MEM_GROUP_CSF_FW], num_pages, + phys, false, NULL); if (ret != num_pages) goto phys_mem_pool_alloc_error; @@ -359,16 +345,19 @@ static int kbasep_hwcnt_backend_csf_if_fw_ring_buf_alloc( KBASE_REG_MEMATTR_INDEX(AS_MEMATTR_INDEX_NON_CACHEABLE); /* Update MMU table */ - ret = kbase_mmu_insert_pages(kbdev, &kbdev->csf.mcu_mmu, - gpu_va_base >> PAGE_SHIFT, phys, num_pages, - flags, MCU_AS_NR, KBASE_MEM_GROUP_CSF_FW, - mmu_sync_info); + ret = kbase_mmu_insert_pages(kbdev, &kbdev->csf.mcu_mmu, gpu_va_base >> PAGE_SHIFT, phys, + num_pages, flags, MCU_AS_NR, KBASE_MEM_GROUP_CSF_FW, + mmu_sync_info, NULL); if (ret) goto mmu_insert_failed; kfree(page_list); +#if IS_ENABLED(CONFIG_MALI_NO_MALI) + fw_ring_buf->gpu_dump_base = (uintptr_t)cpu_addr; +#else fw_ring_buf->gpu_dump_base = gpu_va_base; +#endif /* CONFIG_MALI_NO_MALI */ fw_ring_buf->cpu_dump_base = cpu_addr; fw_ring_buf->phys = phys; fw_ring_buf->num_pages = num_pages; @@ -376,23 +365,15 @@ static int kbasep_hwcnt_backend_csf_if_fw_ring_buf_alloc( fw_ring_buf->as_nr = MCU_AS_NR; *cpu_dump_base = fw_ring_buf->cpu_dump_base; - *out_ring_buf = - (struct kbase_hwcnt_backend_csf_if_ring_buf *)fw_ring_buf; - -#if IS_ENABLED(CONFIG_MALI_NO_MALI) - /* The dummy model needs the CPU mapping. 
*/ - gpu_model_set_dummy_prfcnt_base_cpu(fw_ring_buf->cpu_dump_base, kbdev, - phys, num_pages); -#endif /* CONFIG_MALI_NO_MALI */ + *out_ring_buf = (struct kbase_hwcnt_backend_csf_if_ring_buf *)fw_ring_buf; return 0; mmu_insert_failed: vunmap(cpu_addr); vmap_error: - kbase_mem_pool_free_pages( - &kbdev->mem_pools.small[KBASE_MEM_GROUP_CSF_FW], num_pages, - phys, false, false); + kbase_mem_pool_free_pages(&kbdev->mem_pools.small[KBASE_MEM_GROUP_CSF_FW], num_pages, phys, + false, false); phys_mem_pool_alloc_error: kfree(page_list); page_list_alloc_error: @@ -402,10 +383,10 @@ phys_alloc_error: return -ENOMEM; } -static void kbasep_hwcnt_backend_csf_if_fw_ring_buf_sync( - struct kbase_hwcnt_backend_csf_if_ctx *ctx, - struct kbase_hwcnt_backend_csf_if_ring_buf *ring_buf, - u32 buf_index_first, u32 buf_index_last, bool for_cpu) +static void +kbasep_hwcnt_backend_csf_if_fw_ring_buf_sync(struct kbase_hwcnt_backend_csf_if_ctx *ctx, + struct kbase_hwcnt_backend_csf_if_ring_buf *ring_buf, + u32 buf_index_first, u32 buf_index_last, bool for_cpu) { struct kbase_hwcnt_backend_csf_if_fw_ring_buf *fw_ring_buf = (struct kbase_hwcnt_backend_csf_if_fw_ring_buf *)ring_buf; @@ -422,14 +403,21 @@ static void kbasep_hwcnt_backend_csf_if_fw_ring_buf_sync( WARN_ON(!ctx); WARN_ON(!ring_buf); +#if IS_ENABLED(CONFIG_MALI_NO_MALI) + /* When using the dummy backend syncing the ring buffer is unnecessary as + * the ring buffer is only accessed by the CPU. It may also cause data loss + * due to cache invalidation so return early. + */ + return; +#endif /* CONFIG_MALI_NO_MALI */ + /* The index arguments for this function form an inclusive, exclusive * range. * However, when masking back to the available buffers we will make this * inclusive at both ends so full flushes are not 0 -> 0. */ ring_buf_index_first = buf_index_first & (fw_ring_buf->buf_count - 1); - ring_buf_index_last = - (buf_index_last - 1) & (fw_ring_buf->buf_count - 1); + ring_buf_index_last = (buf_index_last - 1) & (fw_ring_buf->buf_count - 1); /* The start address is the offset of the first buffer. 
*/ start_address = fw_ctx->buf_bytes * ring_buf_index_first; @@ -446,15 +434,11 @@ static void kbasep_hwcnt_backend_csf_if_fw_ring_buf_sync( struct page *pg = as_page(fw_ring_buf->phys[i]); if (for_cpu) { - kbase_sync_single_for_cpu(fw_ctx->kbdev, - kbase_dma_addr(pg), - PAGE_SIZE, - DMA_BIDIRECTIONAL); + kbase_sync_single_for_cpu(fw_ctx->kbdev, kbase_dma_addr(pg), + PAGE_SIZE, DMA_BIDIRECTIONAL); } else { - kbase_sync_single_for_device(fw_ctx->kbdev, - kbase_dma_addr(pg), - PAGE_SIZE, - DMA_BIDIRECTIONAL); + kbase_sync_single_for_device(fw_ctx->kbdev, kbase_dma_addr(pg), + PAGE_SIZE, DMA_BIDIRECTIONAL); } } @@ -466,28 +450,24 @@ static void kbasep_hwcnt_backend_csf_if_fw_ring_buf_sync( struct page *pg = as_page(fw_ring_buf->phys[i]); if (for_cpu) { - kbase_sync_single_for_cpu(fw_ctx->kbdev, - kbase_dma_addr(pg), PAGE_SIZE, + kbase_sync_single_for_cpu(fw_ctx->kbdev, kbase_dma_addr(pg), PAGE_SIZE, DMA_BIDIRECTIONAL); } else { - kbase_sync_single_for_device(fw_ctx->kbdev, - kbase_dma_addr(pg), - PAGE_SIZE, + kbase_sync_single_for_device(fw_ctx->kbdev, kbase_dma_addr(pg), PAGE_SIZE, DMA_BIDIRECTIONAL); } } } -static u64 kbasep_hwcnt_backend_csf_if_fw_timestamp_ns( - struct kbase_hwcnt_backend_csf_if_ctx *ctx) +static u64 kbasep_hwcnt_backend_csf_if_fw_timestamp_ns(struct kbase_hwcnt_backend_csf_if_ctx *ctx) { CSTD_UNUSED(ctx); return ktime_get_raw_ns(); } -static void kbasep_hwcnt_backend_csf_if_fw_ring_buf_free( - struct kbase_hwcnt_backend_csf_if_ctx *ctx, - struct kbase_hwcnt_backend_csf_if_ring_buf *ring_buf) +static void +kbasep_hwcnt_backend_csf_if_fw_ring_buf_free(struct kbase_hwcnt_backend_csf_if_ctx *ctx, + struct kbase_hwcnt_backend_csf_if_ring_buf *ring_buf) { struct kbase_hwcnt_backend_csf_if_fw_ring_buf *fw_ring_buf = (struct kbase_hwcnt_backend_csf_if_fw_ring_buf *)ring_buf; @@ -500,17 +480,15 @@ static void kbasep_hwcnt_backend_csf_if_fw_ring_buf_free( if (fw_ring_buf->phys) { u64 gpu_va_base = KBASE_HWC_CSF_RING_BUFFER_VA_START; - WARN_ON(kbase_mmu_teardown_pages( - fw_ctx->kbdev, &fw_ctx->kbdev->csf.mcu_mmu, - gpu_va_base >> PAGE_SHIFT, fw_ring_buf->num_pages, + WARN_ON(kbase_mmu_teardown_firmware_pages( + fw_ctx->kbdev, &fw_ctx->kbdev->csf.mcu_mmu, gpu_va_base >> PAGE_SHIFT, + fw_ring_buf->phys, fw_ring_buf->num_pages, fw_ring_buf->num_pages, MCU_AS_NR)); vunmap(fw_ring_buf->cpu_dump_base); - kbase_mem_pool_free_pages( - &fw_ctx->kbdev->mem_pools.small[KBASE_MEM_GROUP_CSF_FW], - fw_ring_buf->num_pages, fw_ring_buf->phys, false, - false); + kbase_mem_pool_free_pages(&fw_ctx->kbdev->mem_pools.small[KBASE_MEM_GROUP_CSF_FW], + fw_ring_buf->num_pages, fw_ring_buf->phys, false, false); kfree(fw_ring_buf->phys); @@ -518,10 +496,10 @@ static void kbasep_hwcnt_backend_csf_if_fw_ring_buf_free( } } -static void kbasep_hwcnt_backend_csf_if_fw_dump_enable( - struct kbase_hwcnt_backend_csf_if_ctx *ctx, - struct kbase_hwcnt_backend_csf_if_ring_buf *ring_buf, - struct kbase_hwcnt_backend_csf_if_enable *enable) +static void +kbasep_hwcnt_backend_csf_if_fw_dump_enable(struct kbase_hwcnt_backend_csf_if_ctx *ctx, + struct kbase_hwcnt_backend_csf_if_ring_buf *ring_buf, + struct kbase_hwcnt_backend_csf_if_enable *enable) { u32 prfcnt_config; struct kbase_device *kbdev; @@ -540,12 +518,11 @@ static void kbasep_hwcnt_backend_csf_if_fw_dump_enable( global_iface = &kbdev->csf.global_iface; /* Configure */ - prfcnt_config = fw_ring_buf->buf_count; - prfcnt_config |= enable->counter_set << PRFCNT_CONFIG_SETSELECT_SHIFT; + prfcnt_config = GLB_PRFCNT_CONFIG_SIZE_SET(0, fw_ring_buf->buf_count); + 
prfcnt_config = GLB_PRFCNT_CONFIG_SET_SELECT_SET(prfcnt_config, enable->counter_set); /* Configure the ring buffer base address */ - kbase_csf_firmware_global_input(global_iface, GLB_PRFCNT_JASID, - fw_ring_buf->as_nr); + kbase_csf_firmware_global_input(global_iface, GLB_PRFCNT_JASID, fw_ring_buf->as_nr); kbase_csf_firmware_global_input(global_iface, GLB_PRFCNT_BASE_LO, fw_ring_buf->gpu_dump_base & U32_MAX); kbase_csf_firmware_global_input(global_iface, GLB_PRFCNT_BASE_HI, @@ -555,38 +532,29 @@ static void kbasep_hwcnt_backend_csf_if_fw_dump_enable( kbase_csf_firmware_global_input(global_iface, GLB_PRFCNT_EXTRACT, 0); /* Configure the enable bitmap */ - kbase_csf_firmware_global_input(global_iface, GLB_PRFCNT_CSF_EN, - enable->fe_bm); - kbase_csf_firmware_global_input(global_iface, GLB_PRFCNT_SHADER_EN, - enable->shader_bm); - kbase_csf_firmware_global_input(global_iface, GLB_PRFCNT_MMU_L2_EN, - enable->mmu_l2_bm); - kbase_csf_firmware_global_input(global_iface, GLB_PRFCNT_TILER_EN, - enable->tiler_bm); + kbase_csf_firmware_global_input(global_iface, GLB_PRFCNT_CSF_EN, enable->fe_bm); + kbase_csf_firmware_global_input(global_iface, GLB_PRFCNT_SHADER_EN, enable->shader_bm); + kbase_csf_firmware_global_input(global_iface, GLB_PRFCNT_MMU_L2_EN, enable->mmu_l2_bm); + kbase_csf_firmware_global_input(global_iface, GLB_PRFCNT_TILER_EN, enable->tiler_bm); /* Configure the HWC set and buffer size */ - kbase_csf_firmware_global_input(global_iface, GLB_PRFCNT_CONFIG, - prfcnt_config); + kbase_csf_firmware_global_input(global_iface, GLB_PRFCNT_CONFIG, prfcnt_config); kbdev->csf.hwcnt.enable_pending = true; /* Unmask the interrupts */ - kbase_csf_firmware_global_input_mask( - global_iface, GLB_ACK_IRQ_MASK, - GLB_ACK_IRQ_MASK_PRFCNT_SAMPLE_MASK, - GLB_ACK_IRQ_MASK_PRFCNT_SAMPLE_MASK); - kbase_csf_firmware_global_input_mask( - global_iface, GLB_ACK_IRQ_MASK, - GLB_ACK_IRQ_MASK_PRFCNT_THRESHOLD_MASK, - GLB_ACK_IRQ_MASK_PRFCNT_THRESHOLD_MASK); - kbase_csf_firmware_global_input_mask( - global_iface, GLB_ACK_IRQ_MASK, - GLB_ACK_IRQ_MASK_PRFCNT_OVERFLOW_MASK, - GLB_ACK_IRQ_MASK_PRFCNT_OVERFLOW_MASK); - kbase_csf_firmware_global_input_mask( - global_iface, GLB_ACK_IRQ_MASK, - GLB_ACK_IRQ_MASK_PRFCNT_ENABLE_MASK, - GLB_ACK_IRQ_MASK_PRFCNT_ENABLE_MASK); + kbase_csf_firmware_global_input_mask(global_iface, GLB_ACK_IRQ_MASK, + GLB_ACK_IRQ_MASK_PRFCNT_SAMPLE_MASK, + GLB_ACK_IRQ_MASK_PRFCNT_SAMPLE_MASK); + kbase_csf_firmware_global_input_mask(global_iface, GLB_ACK_IRQ_MASK, + GLB_ACK_IRQ_MASK_PRFCNT_THRESHOLD_MASK, + GLB_ACK_IRQ_MASK_PRFCNT_THRESHOLD_MASK); + kbase_csf_firmware_global_input_mask(global_iface, GLB_ACK_IRQ_MASK, + GLB_ACK_IRQ_MASK_PRFCNT_OVERFLOW_MASK, + GLB_ACK_IRQ_MASK_PRFCNT_OVERFLOW_MASK); + kbase_csf_firmware_global_input_mask(global_iface, GLB_ACK_IRQ_MASK, + GLB_ACK_IRQ_MASK_PRFCNT_ENABLE_MASK, + GLB_ACK_IRQ_MASK_PRFCNT_ENABLE_MASK); /* Enable the HWC */ kbase_csf_firmware_global_input_mask(global_iface, GLB_REQ, @@ -594,15 +562,12 @@ static void kbasep_hwcnt_backend_csf_if_fw_dump_enable( GLB_REQ_PRFCNT_ENABLE_MASK); kbase_csf_ring_doorbell(kbdev, CSF_KERNEL_DOORBELL_NR); - prfcnt_config = kbase_csf_firmware_global_input_read(global_iface, - GLB_PRFCNT_CONFIG); + prfcnt_config = kbase_csf_firmware_global_input_read(global_iface, GLB_PRFCNT_CONFIG); - kbasep_hwcnt_backend_csf_if_fw_cc_enable(fw_ctx, - enable->clk_enable_map); + kbasep_hwcnt_backend_csf_if_fw_cc_enable(fw_ctx, enable->clk_enable_map); } -static void kbasep_hwcnt_backend_csf_if_fw_dump_disable( - struct 
kbase_hwcnt_backend_csf_if_ctx *ctx) +static void kbasep_hwcnt_backend_csf_if_fw_dump_disable(struct kbase_hwcnt_backend_csf_if_ctx *ctx) { struct kbase_device *kbdev; struct kbase_csf_global_iface *global_iface; @@ -617,20 +582,16 @@ static void kbasep_hwcnt_backend_csf_if_fw_dump_disable( /* Disable the HWC */ kbdev->csf.hwcnt.enable_pending = true; - kbase_csf_firmware_global_input_mask(global_iface, GLB_REQ, 0, - GLB_REQ_PRFCNT_ENABLE_MASK); + kbase_csf_firmware_global_input_mask(global_iface, GLB_REQ, 0, GLB_REQ_PRFCNT_ENABLE_MASK); kbase_csf_ring_doorbell(kbdev, CSF_KERNEL_DOORBELL_NR); /* mask the interrupts */ - kbase_csf_firmware_global_input_mask( - global_iface, GLB_ACK_IRQ_MASK, 0, - GLB_ACK_IRQ_MASK_PRFCNT_SAMPLE_MASK); - kbase_csf_firmware_global_input_mask( - global_iface, GLB_ACK_IRQ_MASK, 0, - GLB_ACK_IRQ_MASK_PRFCNT_THRESHOLD_MASK); - kbase_csf_firmware_global_input_mask( - global_iface, GLB_ACK_IRQ_MASK, 0, - GLB_ACK_IRQ_MASK_PRFCNT_OVERFLOW_MASK); + kbase_csf_firmware_global_input_mask(global_iface, GLB_ACK_IRQ_MASK, 0, + GLB_ACK_IRQ_MASK_PRFCNT_SAMPLE_MASK); + kbase_csf_firmware_global_input_mask(global_iface, GLB_ACK_IRQ_MASK, 0, + GLB_ACK_IRQ_MASK_PRFCNT_THRESHOLD_MASK); + kbase_csf_firmware_global_input_mask(global_iface, GLB_ACK_IRQ_MASK, 0, + GLB_ACK_IRQ_MASK_PRFCNT_OVERFLOW_MASK); /* In case we have a previous request in flight when the disable * happens. @@ -640,8 +601,7 @@ static void kbasep_hwcnt_backend_csf_if_fw_dump_disable( kbasep_hwcnt_backend_csf_if_fw_cc_disable(fw_ctx); } -static void kbasep_hwcnt_backend_csf_if_fw_dump_request( - struct kbase_hwcnt_backend_csf_if_ctx *ctx) +static void kbasep_hwcnt_backend_csf_if_fw_dump_request(struct kbase_hwcnt_backend_csf_if_ctx *ctx) { u32 glb_req; struct kbase_device *kbdev; @@ -664,9 +624,8 @@ static void kbasep_hwcnt_backend_csf_if_fw_dump_request( kbase_csf_ring_doorbell(kbdev, CSF_KERNEL_DOORBELL_NR); } -static void kbasep_hwcnt_backend_csf_if_fw_get_indexes( - struct kbase_hwcnt_backend_csf_if_ctx *ctx, u32 *extract_index, - u32 *insert_index) +static void kbasep_hwcnt_backend_csf_if_fw_get_indexes(struct kbase_hwcnt_backend_csf_if_ctx *ctx, + u32 *extract_index, u32 *insert_index) { struct kbase_hwcnt_backend_csf_if_fw_ctx *fw_ctx = (struct kbase_hwcnt_backend_csf_if_fw_ctx *)ctx; @@ -676,14 +635,15 @@ static void kbasep_hwcnt_backend_csf_if_fw_get_indexes( WARN_ON(!insert_index); kbasep_hwcnt_backend_csf_if_fw_assert_lock_held(ctx); - *extract_index = kbase_csf_firmware_global_input_read( - &fw_ctx->kbdev->csf.global_iface, GLB_PRFCNT_EXTRACT); - *insert_index = kbase_csf_firmware_global_output( - &fw_ctx->kbdev->csf.global_iface, GLB_PRFCNT_INSERT); + *extract_index = kbase_csf_firmware_global_input_read(&fw_ctx->kbdev->csf.global_iface, + GLB_PRFCNT_EXTRACT); + *insert_index = kbase_csf_firmware_global_output(&fw_ctx->kbdev->csf.global_iface, + GLB_PRFCNT_INSERT); } -static void kbasep_hwcnt_backend_csf_if_fw_set_extract_index( - struct kbase_hwcnt_backend_csf_if_ctx *ctx, u32 extract_idx) +static void +kbasep_hwcnt_backend_csf_if_fw_set_extract_index(struct kbase_hwcnt_backend_csf_if_ctx *ctx, + u32 extract_idx) { struct kbase_hwcnt_backend_csf_if_fw_ctx *fw_ctx = (struct kbase_hwcnt_backend_csf_if_fw_ctx *)ctx; @@ -694,13 +654,13 @@ static void kbasep_hwcnt_backend_csf_if_fw_set_extract_index( /* Set the raw extract index to release the buffer back to the ring * buffer. 
*/ - kbase_csf_firmware_global_input(&fw_ctx->kbdev->csf.global_iface, - GLB_PRFCNT_EXTRACT, extract_idx); + kbase_csf_firmware_global_input(&fw_ctx->kbdev->csf.global_iface, GLB_PRFCNT_EXTRACT, + extract_idx); } -static void kbasep_hwcnt_backend_csf_if_fw_get_gpu_cycle_count( - struct kbase_hwcnt_backend_csf_if_ctx *ctx, u64 *cycle_counts, - u64 clk_enable_map) +static void +kbasep_hwcnt_backend_csf_if_fw_get_gpu_cycle_count(struct kbase_hwcnt_backend_csf_if_ctx *ctx, + u64 *cycle_counts, u64 clk_enable_map) { struct kbase_hwcnt_backend_csf_if_fw_ctx *fw_ctx = (struct kbase_hwcnt_backend_csf_if_fw_ctx *)ctx; @@ -717,12 +677,12 @@ static void kbasep_hwcnt_backend_csf_if_fw_get_gpu_cycle_count( if (clk == KBASE_CLOCK_DOMAIN_TOP) { /* Read cycle count for top clock domain. */ - kbase_backend_get_gpu_time_norequest( - fw_ctx->kbdev, &cycle_counts[clk], NULL, NULL); + kbase_backend_get_gpu_time_norequest(fw_ctx->kbdev, &cycle_counts[clk], + NULL, NULL); } else { /* Estimate cycle count for non-top clock domain. */ - cycle_counts[clk] = kbase_ccswe_cycle_at( - &fw_ctx->ccswe_shader_cores, timestamp_ns); + cycle_counts[clk] = + kbase_ccswe_cycle_at(&fw_ctx->ccswe_shader_cores, timestamp_ns); } } } @@ -732,8 +692,8 @@ static void kbasep_hwcnt_backend_csf_if_fw_get_gpu_cycle_count( * * @fw_ctx: Pointer to context to destroy. */ -static void kbasep_hwcnt_backend_csf_if_fw_ctx_destroy( - struct kbase_hwcnt_backend_csf_if_fw_ctx *fw_ctx) +static void +kbasep_hwcnt_backend_csf_if_fw_ctx_destroy(struct kbase_hwcnt_backend_csf_if_fw_ctx *fw_ctx) { if (!fw_ctx) return; @@ -748,9 +708,9 @@ static void kbasep_hwcnt_backend_csf_if_fw_ctx_destroy( * @out_ctx: Non-NULL pointer to where info is stored on success. * Return: 0 on success, else error code. */ -static int kbasep_hwcnt_backend_csf_if_fw_ctx_create( - struct kbase_device *kbdev, - struct kbase_hwcnt_backend_csf_if_fw_ctx **out_ctx) +static int +kbasep_hwcnt_backend_csf_if_fw_ctx_create(struct kbase_device *kbdev, + struct kbase_hwcnt_backend_csf_if_fw_ctx **out_ctx) { u8 clk; int errcode = -ENOMEM; @@ -774,8 +734,7 @@ static int kbasep_hwcnt_backend_csf_if_fw_ctx_create( ctx->clk_enable_map = 0; kbase_ccswe_init(&ctx->ccswe_shader_cores); - ctx->rate_listener.notify = - kbasep_hwcnt_backend_csf_if_fw_on_freq_change; + ctx->rate_listener.notify = kbasep_hwcnt_backend_csf_if_fw_on_freq_change; *out_ctx = ctx; @@ -785,8 +744,7 @@ error: return errcode; } -void kbase_hwcnt_backend_csf_if_fw_destroy( - struct kbase_hwcnt_backend_csf_if *if_fw) +void kbase_hwcnt_backend_csf_if_fw_destroy(struct kbase_hwcnt_backend_csf_if *if_fw) { if (!if_fw) return; @@ -796,8 +754,8 @@ void kbase_hwcnt_backend_csf_if_fw_destroy( memset(if_fw, 0, sizeof(*if_fw)); } -int kbase_hwcnt_backend_csf_if_fw_create( - struct kbase_device *kbdev, struct kbase_hwcnt_backend_csf_if *if_fw) +int kbase_hwcnt_backend_csf_if_fw_create(struct kbase_device *kbdev, + struct kbase_hwcnt_backend_csf_if *if_fw) { int errcode; struct kbase_hwcnt_backend_csf_if_fw_ctx *ctx = NULL; @@ -810,8 +768,7 @@ int kbase_hwcnt_backend_csf_if_fw_create( return errcode; if_fw->ctx = (struct kbase_hwcnt_backend_csf_if_ctx *)ctx; - if_fw->assert_lock_held = - kbasep_hwcnt_backend_csf_if_fw_assert_lock_held; + if_fw->assert_lock_held = kbasep_hwcnt_backend_csf_if_fw_assert_lock_held; if_fw->lock = kbasep_hwcnt_backend_csf_if_fw_lock; if_fw->unlock = kbasep_hwcnt_backend_csf_if_fw_unlock; if_fw->get_prfcnt_info = kbasep_hwcnt_backend_csf_if_fw_get_prfcnt_info; @@ -822,11 +779,9 @@ int 
kbase_hwcnt_backend_csf_if_fw_create( if_fw->dump_enable = kbasep_hwcnt_backend_csf_if_fw_dump_enable; if_fw->dump_disable = kbasep_hwcnt_backend_csf_if_fw_dump_disable; if_fw->dump_request = kbasep_hwcnt_backend_csf_if_fw_dump_request; - if_fw->get_gpu_cycle_count = - kbasep_hwcnt_backend_csf_if_fw_get_gpu_cycle_count; + if_fw->get_gpu_cycle_count = kbasep_hwcnt_backend_csf_if_fw_get_gpu_cycle_count; if_fw->get_indexes = kbasep_hwcnt_backend_csf_if_fw_get_indexes; - if_fw->set_extract_index = - kbasep_hwcnt_backend_csf_if_fw_set_extract_index; + if_fw->set_extract_index = kbasep_hwcnt_backend_csf_if_fw_set_extract_index; return 0; } diff --git a/mali_kbase/mali_kbase_hwcnt_backend_csf_if_fw.h b/mali_kbase/hwcnt/backend/mali_kbase_hwcnt_backend_csf_if_fw.h index b69668b..71d1506 100644 --- a/mali_kbase/mali_kbase_hwcnt_backend_csf_if_fw.h +++ b/mali_kbase/hwcnt/backend/mali_kbase_hwcnt_backend_csf_if_fw.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2021-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -26,7 +26,7 @@ #ifndef _KBASE_HWCNT_BACKEND_CSF_IF_FW_H_ #define _KBASE_HWCNT_BACKEND_CSF_IF_FW_H_ -#include "mali_kbase_hwcnt_backend_csf_if.h" +#include "hwcnt/backend/mali_kbase_hwcnt_backend_csf_if.h" /** * kbase_hwcnt_backend_csf_if_fw_create() - Create a firmware CSF interface @@ -36,15 +36,14 @@ * creation success. * Return: 0 on success, else error code. */ -int kbase_hwcnt_backend_csf_if_fw_create( - struct kbase_device *kbdev, struct kbase_hwcnt_backend_csf_if *if_fw); +int kbase_hwcnt_backend_csf_if_fw_create(struct kbase_device *kbdev, + struct kbase_hwcnt_backend_csf_if *if_fw); /** * kbase_hwcnt_backend_csf_if_fw_destroy() - Destroy a firmware CSF interface of * hardware counter backend. * @if_fw: Pointer to a CSF interface to destroy. */ -void kbase_hwcnt_backend_csf_if_fw_destroy( - struct kbase_hwcnt_backend_csf_if *if_fw); +void kbase_hwcnt_backend_csf_if_fw_destroy(struct kbase_hwcnt_backend_csf_if *if_fw); #endif /* _KBASE_HWCNT_BACKEND_CSF_IF_FW_H_ */ diff --git a/mali_kbase/mali_kbase_hwcnt_backend_jm.c b/mali_kbase/hwcnt/backend/mali_kbase_hwcnt_backend_jm.c index e418212..8b3caac 100644 --- a/mali_kbase/mali_kbase_hwcnt_backend_jm.c +++ b/mali_kbase/hwcnt/backend/mali_kbase_hwcnt_backend_jm.c @@ -19,18 +19,15 @@ * */ -#include "mali_kbase_hwcnt_backend_jm.h" -#include "mali_kbase_hwcnt_gpu.h" -#include "mali_kbase_hwcnt_types.h" +#include "hwcnt/backend/mali_kbase_hwcnt_backend_jm.h" +#include "hwcnt/mali_kbase_hwcnt_gpu.h" +#include "hwcnt/mali_kbase_hwcnt_types.h" #include "mali_kbase.h" #include "backend/gpu/mali_kbase_pm_ca.h" #include "mali_kbase_hwaccess_instr.h" #include "mali_kbase_hwaccess_time.h" #include "mali_kbase_ccswe.h" - -#if IS_ENABLED(CONFIG_MALI_NO_MALI) -#include "backend/gpu/mali_kbase_model_dummy.h" -#endif /* CONFIG_MALI_NO_MALI */ +#include "backend/gpu/mali_kbase_model_linux.h" #include "backend/gpu/mali_kbase_clk_rate_trace_mgr.h" #include "backend/gpu/mali_kbase_pm_internal.h" @@ -136,9 +133,8 @@ struct kbase_hwcnt_backend_jm { * * Return: 0 on success, else error code. 
*/ -static int -kbasep_hwcnt_backend_jm_gpu_info_init(struct kbase_device *kbdev, - struct kbase_hwcnt_gpu_info *info) +static int kbasep_hwcnt_backend_jm_gpu_info_init(struct kbase_device *kbdev, + struct kbase_hwcnt_gpu_info *info) { size_t clk; @@ -153,13 +149,11 @@ kbasep_hwcnt_backend_jm_gpu_info_init(struct kbase_device *kbdev, { const struct base_gpu_props *props = &kbdev->gpu_props.props; const size_t l2_count = props->l2_props.num_l2_slices; - const size_t core_mask = - props->coherency_info.group[0].core_mask; + const size_t core_mask = props->coherency_info.group[0].core_mask; info->l2_count = l2_count; info->core_mask = core_mask; - info->prfcnt_values_per_block = - KBASE_HWCNT_V5_DEFAULT_VALUES_PER_BLOCK; + info->prfcnt_values_per_block = KBASE_HWCNT_V5_DEFAULT_VALUES_PER_BLOCK; } #endif /* CONFIG_MALI_NO_MALI */ @@ -173,9 +167,8 @@ kbasep_hwcnt_backend_jm_gpu_info_init(struct kbase_device *kbdev, return 0; } -static void kbasep_hwcnt_backend_jm_init_layout( - const struct kbase_hwcnt_gpu_info *gpu_info, - struct kbase_hwcnt_jm_physical_layout *phys_layout) +static void kbasep_hwcnt_backend_jm_init_layout(const struct kbase_hwcnt_gpu_info *gpu_info, + struct kbase_hwcnt_jm_physical_layout *phys_layout) { u8 shader_core_cnt; @@ -189,32 +182,29 @@ static void kbasep_hwcnt_backend_jm_init_layout( .tiler_cnt = KBASE_HWCNT_V5_TILER_BLOCK_COUNT, .mmu_l2_cnt = gpu_info->l2_count, .shader_cnt = shader_core_cnt, - .block_cnt = KBASE_HWCNT_V5_FE_BLOCK_COUNT + - KBASE_HWCNT_V5_TILER_BLOCK_COUNT + + .block_cnt = KBASE_HWCNT_V5_FE_BLOCK_COUNT + KBASE_HWCNT_V5_TILER_BLOCK_COUNT + gpu_info->l2_count + shader_core_cnt, .shader_avail_mask = gpu_info->core_mask, .headers_per_block = KBASE_HWCNT_V5_HEADERS_PER_BLOCK, .values_per_block = gpu_info->prfcnt_values_per_block, - .counters_per_block = gpu_info->prfcnt_values_per_block - - KBASE_HWCNT_V5_HEADERS_PER_BLOCK, + .counters_per_block = + gpu_info->prfcnt_values_per_block - KBASE_HWCNT_V5_HEADERS_PER_BLOCK, .enable_mask_offset = KBASE_HWCNT_V5_PRFCNT_EN_HEADER, }; } -static void kbasep_hwcnt_backend_jm_dump_sample( - const struct kbase_hwcnt_backend_jm *const backend_jm) +static void +kbasep_hwcnt_backend_jm_dump_sample(const struct kbase_hwcnt_backend_jm *const backend_jm) { size_t block_idx; const u32 *new_sample_buf = backend_jm->cpu_dump_va; const u32 *new_block = new_sample_buf; u64 *dst_buf = backend_jm->to_user_buf; u64 *dst_block = dst_buf; - const size_t values_per_block = - backend_jm->phys_layout.values_per_block; + const size_t values_per_block = backend_jm->phys_layout.values_per_block; const size_t dump_bytes = backend_jm->info->dump_bytes; - for (block_idx = 0; block_idx < backend_jm->phys_layout.block_cnt; - block_idx++) { + for (block_idx = 0; block_idx < backend_jm->phys_layout.block_cnt; block_idx++) { size_t ctr_idx; for (ctr_idx = 0; ctr_idx < values_per_block; ctr_idx++) @@ -224,10 +214,8 @@ static void kbasep_hwcnt_backend_jm_dump_sample( dst_block += values_per_block; } - WARN_ON(new_block != - new_sample_buf + (dump_bytes / KBASE_HWCNT_VALUE_HW_BYTES)); - WARN_ON(dst_block != - dst_buf + (dump_bytes / KBASE_HWCNT_VALUE_HW_BYTES)); + WARN_ON(new_block != new_sample_buf + (dump_bytes / KBASE_HWCNT_VALUE_HW_BYTES)); + WARN_ON(dst_block != dst_buf + (dump_bytes / KBASE_HWCNT_VALUE_HW_BYTES)); } /** @@ -237,21 +225,18 @@ static void kbasep_hwcnt_backend_jm_dump_sample( * @clk_index: Clock index * @clk_rate_hz: Clock frequency(hz) */ -static void kbasep_hwcnt_backend_jm_on_freq_change( - struct kbase_clk_rate_listener 
*rate_listener, - u32 clk_index, - u32 clk_rate_hz) +static void kbasep_hwcnt_backend_jm_on_freq_change(struct kbase_clk_rate_listener *rate_listener, + u32 clk_index, u32 clk_rate_hz) { - struct kbase_hwcnt_backend_jm *backend_jm = container_of( - rate_listener, struct kbase_hwcnt_backend_jm, rate_listener); + struct kbase_hwcnt_backend_jm *backend_jm = + container_of(rate_listener, struct kbase_hwcnt_backend_jm, rate_listener); u64 timestamp_ns; if (clk_index != KBASE_CLOCK_DOMAIN_SHADER_CORES) return; timestamp_ns = ktime_get_raw_ns(); - kbase_ccswe_freq_change( - &backend_jm->ccswe_shader_cores, timestamp_ns, clk_rate_hz); + kbase_ccswe_freq_change(&backend_jm->ccswe_shader_cores, timestamp_ns, clk_rate_hz); } /** @@ -261,53 +246,42 @@ static void kbasep_hwcnt_backend_jm_on_freq_change( * @enable_map: Non-NULL pointer to enable map specifying enabled counters. * @timestamp_ns: Timestamp(ns) when HWCNT were enabled. */ -static void kbasep_hwcnt_backend_jm_cc_enable( - struct kbase_hwcnt_backend_jm *backend_jm, - const struct kbase_hwcnt_enable_map *enable_map, - u64 timestamp_ns) +static void kbasep_hwcnt_backend_jm_cc_enable(struct kbase_hwcnt_backend_jm *backend_jm, + const struct kbase_hwcnt_enable_map *enable_map, + u64 timestamp_ns) { struct kbase_device *kbdev = backend_jm->kctx->kbdev; u64 clk_enable_map = enable_map->clk_enable_map; u64 cycle_count; - if (kbase_hwcnt_clk_enable_map_enabled( - clk_enable_map, KBASE_CLOCK_DOMAIN_TOP)) { + if (kbase_hwcnt_clk_enable_map_enabled(clk_enable_map, KBASE_CLOCK_DOMAIN_TOP)) { /* turn on the cycle counter */ kbase_pm_request_gpu_cycle_counter_l2_is_on(kbdev); /* Read cycle count for top clock domain. */ - kbase_backend_get_gpu_time_norequest( - kbdev, &cycle_count, NULL, NULL); + kbase_backend_get_gpu_time_norequest(kbdev, &cycle_count, NULL, NULL); - backend_jm->prev_cycle_count[KBASE_CLOCK_DOMAIN_TOP] = - cycle_count; + backend_jm->prev_cycle_count[KBASE_CLOCK_DOMAIN_TOP] = cycle_count; } - if (kbase_hwcnt_clk_enable_map_enabled( - clk_enable_map, KBASE_CLOCK_DOMAIN_SHADER_CORES)) { + if (kbase_hwcnt_clk_enable_map_enabled(clk_enable_map, KBASE_CLOCK_DOMAIN_SHADER_CORES)) { /* software estimation for non-top clock domains */ struct kbase_clk_rate_trace_manager *rtm = &kbdev->pm.clk_rtm; - const struct kbase_clk_data *clk_data = - rtm->clks[KBASE_CLOCK_DOMAIN_SHADER_CORES]; + const struct kbase_clk_data *clk_data = rtm->clks[KBASE_CLOCK_DOMAIN_SHADER_CORES]; u32 cur_freq; unsigned long flags; spin_lock_irqsave(&rtm->lock, flags); - cur_freq = (u32) clk_data->clock_val; + cur_freq = (u32)clk_data->clock_val; kbase_ccswe_reset(&backend_jm->ccswe_shader_cores); - kbase_ccswe_freq_change( - &backend_jm->ccswe_shader_cores, - timestamp_ns, - cur_freq); + kbase_ccswe_freq_change(&backend_jm->ccswe_shader_cores, timestamp_ns, cur_freq); - kbase_clk_rate_trace_manager_subscribe_no_lock( - rtm, &backend_jm->rate_listener); + kbase_clk_rate_trace_manager_subscribe_no_lock(rtm, &backend_jm->rate_listener); spin_unlock_irqrestore(&rtm->lock, flags); /* ccswe was reset. The estimated cycle is zero. */ - backend_jm->prev_cycle_count[ - KBASE_CLOCK_DOMAIN_SHADER_CORES] = 0; + backend_jm->prev_cycle_count[KBASE_CLOCK_DOMAIN_SHADER_CORES] = 0; } /* Keep clk_enable_map for dump_request. */ @@ -319,28 +293,22 @@ static void kbasep_hwcnt_backend_jm_cc_enable( * * @backend_jm: Non-NULL pointer to backend. 
*/ -static void kbasep_hwcnt_backend_jm_cc_disable( - struct kbase_hwcnt_backend_jm *backend_jm) +static void kbasep_hwcnt_backend_jm_cc_disable(struct kbase_hwcnt_backend_jm *backend_jm) { struct kbase_device *kbdev = backend_jm->kctx->kbdev; struct kbase_clk_rate_trace_manager *rtm = &kbdev->pm.clk_rtm; u64 clk_enable_map = backend_jm->clk_enable_map; - if (kbase_hwcnt_clk_enable_map_enabled( - clk_enable_map, KBASE_CLOCK_DOMAIN_TOP)) { + if (kbase_hwcnt_clk_enable_map_enabled(clk_enable_map, KBASE_CLOCK_DOMAIN_TOP)) { /* turn off the cycle counter */ kbase_pm_release_gpu_cycle_counter(kbdev); } - if (kbase_hwcnt_clk_enable_map_enabled( - clk_enable_map, KBASE_CLOCK_DOMAIN_SHADER_CORES)) { - - kbase_clk_rate_trace_manager_unsubscribe( - rtm, &backend_jm->rate_listener); + if (kbase_hwcnt_clk_enable_map_enabled(clk_enable_map, KBASE_CLOCK_DOMAIN_SHADER_CORES)) { + kbase_clk_rate_trace_manager_unsubscribe(rtm, &backend_jm->rate_listener); } } - /** * kbasep_hwcnt_gpu_update_curr_config() - Update the destination buffer with * current config information. @@ -356,38 +324,33 @@ static void kbasep_hwcnt_backend_jm_cc_disable( * * Return: 0 on success, else error code. */ -static int kbasep_hwcnt_gpu_update_curr_config( - struct kbase_device *kbdev, - struct kbase_hwcnt_curr_config *curr_config) +static int kbasep_hwcnt_gpu_update_curr_config(struct kbase_device *kbdev, + struct kbase_hwcnt_curr_config *curr_config) { if (WARN_ON(!kbdev) || WARN_ON(!curr_config)) return -EINVAL; lockdep_assert_held(&kbdev->hwaccess_lock); - curr_config->num_l2_slices = - kbdev->gpu_props.curr_config.l2_slices; - curr_config->shader_present = - kbdev->gpu_props.curr_config.shader_present; + curr_config->num_l2_slices = kbdev->gpu_props.curr_config.l2_slices; + curr_config->shader_present = kbdev->gpu_props.curr_config.shader_present; return 0; } /* JM backend implementation of kbase_hwcnt_backend_timestamp_ns_fn */ -static u64 kbasep_hwcnt_backend_jm_timestamp_ns( - struct kbase_hwcnt_backend *backend) +static u64 kbasep_hwcnt_backend_jm_timestamp_ns(struct kbase_hwcnt_backend *backend) { (void)backend; return ktime_get_raw_ns(); } /* JM backend implementation of kbase_hwcnt_backend_dump_enable_nolock_fn */ -static int kbasep_hwcnt_backend_jm_dump_enable_nolock( - struct kbase_hwcnt_backend *backend, - const struct kbase_hwcnt_enable_map *enable_map) +static int +kbasep_hwcnt_backend_jm_dump_enable_nolock(struct kbase_hwcnt_backend *backend, + const struct kbase_hwcnt_enable_map *enable_map) { int errcode; - struct kbase_hwcnt_backend_jm *backend_jm = - (struct kbase_hwcnt_backend_jm *)backend; + struct kbase_hwcnt_backend_jm *backend_jm = (struct kbase_hwcnt_backend_jm *)backend; struct kbase_context *kctx; struct kbase_device *kbdev; struct kbase_hwcnt_physical_enable_map phys_enable_map; @@ -406,22 +369,25 @@ static int kbasep_hwcnt_backend_jm_dump_enable_nolock( kbase_hwcnt_gpu_enable_map_to_physical(&phys_enable_map, enable_map); - kbase_hwcnt_gpu_set_to_physical(&phys_counter_set, - backend_jm->info->counter_set); + kbase_hwcnt_gpu_set_to_physical(&phys_counter_set, backend_jm->info->counter_set); enable.fe_bm = phys_enable_map.fe_bm; enable.shader_bm = phys_enable_map.shader_bm; enable.tiler_bm = phys_enable_map.tiler_bm; enable.mmu_l2_bm = phys_enable_map.mmu_l2_bm; enable.counter_set = phys_counter_set; +#if IS_ENABLED(CONFIG_MALI_NO_MALI) + /* The dummy model needs the CPU mapping. 
*/ + enable.dump_buffer = (uintptr_t)backend_jm->cpu_dump_va; +#else enable.dump_buffer = backend_jm->gpu_dump_va; +#endif /* CONFIG_MALI_NO_MALI */ enable.dump_buffer_bytes = backend_jm->info->dump_bytes; timestamp_ns = kbasep_hwcnt_backend_jm_timestamp_ns(backend); /* Update the current configuration information. */ - errcode = kbasep_hwcnt_gpu_update_curr_config(kbdev, - &backend_jm->curr_config); + errcode = kbasep_hwcnt_gpu_update_curr_config(kbdev, &backend_jm->curr_config); if (errcode) goto error; @@ -441,14 +407,12 @@ error: } /* JM backend implementation of kbase_hwcnt_backend_dump_enable_fn */ -static int kbasep_hwcnt_backend_jm_dump_enable( - struct kbase_hwcnt_backend *backend, - const struct kbase_hwcnt_enable_map *enable_map) +static int kbasep_hwcnt_backend_jm_dump_enable(struct kbase_hwcnt_backend *backend, + const struct kbase_hwcnt_enable_map *enable_map) { unsigned long flags; int errcode; - struct kbase_hwcnt_backend_jm *backend_jm = - (struct kbase_hwcnt_backend_jm *)backend; + struct kbase_hwcnt_backend_jm *backend_jm = (struct kbase_hwcnt_backend_jm *)backend; struct kbase_device *kbdev; if (!backend_jm) @@ -458,8 +422,7 @@ static int kbasep_hwcnt_backend_jm_dump_enable( spin_lock_irqsave(&kbdev->hwaccess_lock, flags); - errcode = kbasep_hwcnt_backend_jm_dump_enable_nolock( - backend, enable_map); + errcode = kbasep_hwcnt_backend_jm_dump_enable_nolock(backend, enable_map); spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); @@ -467,12 +430,10 @@ static int kbasep_hwcnt_backend_jm_dump_enable( } /* JM backend implementation of kbase_hwcnt_backend_dump_disable_fn */ -static void kbasep_hwcnt_backend_jm_dump_disable( - struct kbase_hwcnt_backend *backend) +static void kbasep_hwcnt_backend_jm_dump_disable(struct kbase_hwcnt_backend *backend) { int errcode; - struct kbase_hwcnt_backend_jm *backend_jm = - (struct kbase_hwcnt_backend_jm *)backend; + struct kbase_hwcnt_backend_jm *backend_jm = (struct kbase_hwcnt_backend_jm *)backend; if (WARN_ON(!backend_jm) || !backend_jm->enabled) return; @@ -486,11 +447,9 @@ static void kbasep_hwcnt_backend_jm_dump_disable( } /* JM backend implementation of kbase_hwcnt_backend_dump_clear_fn */ -static int kbasep_hwcnt_backend_jm_dump_clear( - struct kbase_hwcnt_backend *backend) +static int kbasep_hwcnt_backend_jm_dump_clear(struct kbase_hwcnt_backend *backend) { - struct kbase_hwcnt_backend_jm *backend_jm = - (struct kbase_hwcnt_backend_jm *)backend; + struct kbase_hwcnt_backend_jm *backend_jm = (struct kbase_hwcnt_backend_jm *)backend; if (!backend_jm || !backend_jm->enabled) return -EINVAL; @@ -499,12 +458,10 @@ static int kbasep_hwcnt_backend_jm_dump_clear( } /* JM backend implementation of kbase_hwcnt_backend_dump_request_fn */ -static int kbasep_hwcnt_backend_jm_dump_request( - struct kbase_hwcnt_backend *backend, - u64 *dump_time_ns) +static int kbasep_hwcnt_backend_jm_dump_request(struct kbase_hwcnt_backend *backend, + u64 *dump_time_ns) { - struct kbase_hwcnt_backend_jm *backend_jm = - (struct kbase_hwcnt_backend_jm *)backend; + struct kbase_hwcnt_backend_jm *backend_jm = (struct kbase_hwcnt_backend_jm *)backend; struct kbase_device *kbdev; const struct kbase_hwcnt_metadata *metadata; u64 current_cycle_count; @@ -523,28 +480,25 @@ static int kbasep_hwcnt_backend_jm_dump_request( *dump_time_ns = kbasep_hwcnt_backend_jm_timestamp_ns(backend); ret = kbase_instr_hwcnt_request_dump(backend_jm->kctx); - kbase_hwcnt_metadata_for_each_clock(metadata, clk) { - if (!kbase_hwcnt_clk_enable_map_enabled( - backend_jm->clk_enable_map, 
clk)) + kbase_hwcnt_metadata_for_each_clock(metadata, clk) + { + if (!kbase_hwcnt_clk_enable_map_enabled(backend_jm->clk_enable_map, clk)) continue; if (clk == KBASE_CLOCK_DOMAIN_TOP) { /* Read cycle count for top clock domain. */ - kbase_backend_get_gpu_time_norequest( - kbdev, &current_cycle_count, - NULL, NULL); + kbase_backend_get_gpu_time_norequest(kbdev, &current_cycle_count, + NULL, NULL); } else { /* * Estimate cycle count for non-top clock * domain. */ current_cycle_count = kbase_ccswe_cycle_at( - &backend_jm->ccswe_shader_cores, - *dump_time_ns); + &backend_jm->ccswe_shader_cores, *dump_time_ns); } backend_jm->cycle_count_elapsed[clk] = - current_cycle_count - - backend_jm->prev_cycle_count[clk]; + current_cycle_count - backend_jm->prev_cycle_count[clk]; /* * Keep the current cycle count for later calculation. @@ -558,11 +512,9 @@ static int kbasep_hwcnt_backend_jm_dump_request( } /* JM backend implementation of kbase_hwcnt_backend_dump_wait_fn */ -static int kbasep_hwcnt_backend_jm_dump_wait( - struct kbase_hwcnt_backend *backend) +static int kbasep_hwcnt_backend_jm_dump_wait(struct kbase_hwcnt_backend *backend) { - struct kbase_hwcnt_backend_jm *backend_jm = - (struct kbase_hwcnt_backend_jm *)backend; + struct kbase_hwcnt_backend_jm *backend_jm = (struct kbase_hwcnt_backend_jm *)backend; if (!backend_jm || !backend_jm->enabled) return -EINVAL; @@ -571,14 +523,12 @@ static int kbasep_hwcnt_backend_jm_dump_wait( } /* JM backend implementation of kbase_hwcnt_backend_dump_get_fn */ -static int kbasep_hwcnt_backend_jm_dump_get( - struct kbase_hwcnt_backend *backend, - struct kbase_hwcnt_dump_buffer *dst, - const struct kbase_hwcnt_enable_map *dst_enable_map, - bool accumulate) +static int kbasep_hwcnt_backend_jm_dump_get(struct kbase_hwcnt_backend *backend, + struct kbase_hwcnt_dump_buffer *dst, + const struct kbase_hwcnt_enable_map *dst_enable_map, + bool accumulate) { - struct kbase_hwcnt_backend_jm *backend_jm = - (struct kbase_hwcnt_backend_jm *)backend; + struct kbase_hwcnt_backend_jm *backend_jm = (struct kbase_hwcnt_backend_jm *)backend; size_t clk; #if IS_ENABLED(CONFIG_MALI_NO_MALI) struct kbase_device *kbdev; @@ -592,16 +542,15 @@ static int kbasep_hwcnt_backend_jm_dump_get( return -EINVAL; /* Invalidate the kernel buffer before reading from it. */ - kbase_sync_mem_regions( - backend_jm->kctx, backend_jm->vmap, KBASE_SYNC_TO_CPU); + kbase_sync_mem_regions(backend_jm->kctx, backend_jm->vmap, KBASE_SYNC_TO_CPU); /* Dump sample to the internal 64-bit user buffer. */ kbasep_hwcnt_backend_jm_dump_sample(backend_jm); /* Extract elapsed cycle count for each clock domain if enabled. */ - kbase_hwcnt_metadata_for_each_clock(dst_enable_map->metadata, clk) { - if (!kbase_hwcnt_clk_enable_map_enabled( - dst_enable_map->clk_enable_map, clk)) + kbase_hwcnt_metadata_for_each_clock(dst_enable_map->metadata, clk) + { + if (!kbase_hwcnt_clk_enable_map_enabled(dst_enable_map->clk_enable_map, clk)) continue; /* Reset the counter to zero if accumulation is off. */ @@ -616,17 +565,16 @@ static int kbasep_hwcnt_backend_jm_dump_get( spin_lock_irqsave(&kbdev->hwaccess_lock, flags); /* Update the current configuration information. 
*/ - errcode = kbasep_hwcnt_gpu_update_curr_config(kbdev, - &backend_jm->curr_config); + errcode = kbasep_hwcnt_gpu_update_curr_config(kbdev, &backend_jm->curr_config); spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); if (errcode) return errcode; #endif /* CONFIG_MALI_NO_MALI */ - return kbase_hwcnt_jm_dump_get(dst, backend_jm->to_user_buf, - dst_enable_map, backend_jm->pm_core_mask, - &backend_jm->curr_config, accumulate); + return kbase_hwcnt_jm_dump_get(dst, backend_jm->to_user_buf, dst_enable_map, + backend_jm->pm_core_mask, &backend_jm->curr_config, + accumulate); } /** @@ -638,10 +586,8 @@ static int kbasep_hwcnt_backend_jm_dump_get( * * Return: 0 on success, else error code. */ -static int kbasep_hwcnt_backend_jm_dump_alloc( - const struct kbase_hwcnt_backend_jm_info *info, - struct kbase_context *kctx, - u64 *gpu_dump_va) +static int kbasep_hwcnt_backend_jm_dump_alloc(const struct kbase_hwcnt_backend_jm_info *info, + struct kbase_context *kctx, u64 *gpu_dump_va) { struct kbase_va_region *reg; u64 flags; @@ -656,16 +602,12 @@ static int kbasep_hwcnt_backend_jm_dump_alloc( WARN_ON(!kctx); WARN_ON(!gpu_dump_va); - flags = BASE_MEM_PROT_CPU_RD | - BASE_MEM_PROT_GPU_WR | - BASEP_MEM_PERMANENT_KERNEL_MAPPING | - BASE_MEM_CACHED_CPU | - BASE_MEM_UNCACHED_GPU; + flags = BASE_MEM_PROT_CPU_RD | BASE_MEM_PROT_GPU_WR | BASEP_MEM_PERMANENT_KERNEL_MAPPING | + BASE_MEM_CACHED_CPU | BASE_MEM_UNCACHED_GPU; nr_pages = PFN_UP(info->dump_bytes); - reg = kbase_mem_alloc(kctx, nr_pages, nr_pages, 0, &flags, gpu_dump_va, - mmu_sync_info); + reg = kbase_mem_alloc(kctx, nr_pages, nr_pages, 0, &flags, gpu_dump_va, mmu_sync_info); if (!reg) return -ENOMEM; @@ -678,9 +620,7 @@ static int kbasep_hwcnt_backend_jm_dump_alloc( * @kctx: Non-NULL pointer to kbase context. * @gpu_dump_va: GPU dump buffer virtual address. */ -static void kbasep_hwcnt_backend_jm_dump_free( - struct kbase_context *kctx, - u64 gpu_dump_va) +static void kbasep_hwcnt_backend_jm_dump_free(struct kbase_context *kctx, u64 gpu_dump_va) { WARN_ON(!kctx); if (gpu_dump_va) @@ -693,8 +633,7 @@ static void kbasep_hwcnt_backend_jm_dump_free( * * Can be safely called on a backend in any state of partial construction. */ -static void kbasep_hwcnt_backend_jm_destroy( - struct kbase_hwcnt_backend_jm *backend) +static void kbasep_hwcnt_backend_jm_destroy(struct kbase_hwcnt_backend_jm *backend) { if (!backend) return; @@ -707,8 +646,7 @@ static void kbasep_hwcnt_backend_jm_destroy( kbase_phy_alloc_mapping_put(kctx, backend->vmap); if (backend->gpu_dump_va) - kbasep_hwcnt_backend_jm_dump_free( - kctx, backend->gpu_dump_va); + kbasep_hwcnt_backend_jm_dump_free(kctx, backend->gpu_dump_va); kbasep_js_release_privileged_ctx(kbdev, kctx); kbase_destroy_context(kctx); @@ -726,16 +664,12 @@ static void kbasep_hwcnt_backend_jm_destroy( * * Return: 0 on success, else error code. 
*/ -static int kbasep_hwcnt_backend_jm_create( - const struct kbase_hwcnt_backend_jm_info *info, - struct kbase_hwcnt_backend_jm **out_backend) +static int kbasep_hwcnt_backend_jm_create(const struct kbase_hwcnt_backend_jm_info *info, + struct kbase_hwcnt_backend_jm **out_backend) { int errcode; struct kbase_device *kbdev; struct kbase_hwcnt_backend_jm *backend = NULL; -#if IS_ENABLED(CONFIG_MALI_NO_MALI) - size_t page_count; -#endif WARN_ON(!info); WARN_ON(!out_backend); @@ -747,42 +681,31 @@ static int kbasep_hwcnt_backend_jm_create( goto alloc_error; backend->info = info; - kbasep_hwcnt_backend_jm_init_layout(&info->hwcnt_gpu_info, - &backend->phys_layout); + kbasep_hwcnt_backend_jm_init_layout(&info->hwcnt_gpu_info, &backend->phys_layout); backend->kctx = kbase_create_context(kbdev, true, - BASE_CONTEXT_SYSTEM_MONITOR_SUBMIT_DISABLED, 0, NULL); + BASE_CONTEXT_SYSTEM_MONITOR_SUBMIT_DISABLED, 0, NULL); if (!backend->kctx) goto alloc_error; kbasep_js_schedule_privileged_ctx(kbdev, backend->kctx); - errcode = kbasep_hwcnt_backend_jm_dump_alloc( - info, backend->kctx, &backend->gpu_dump_va); + errcode = kbasep_hwcnt_backend_jm_dump_alloc(info, backend->kctx, &backend->gpu_dump_va); if (errcode) goto error; - backend->cpu_dump_va = kbase_phy_alloc_mapping_get(backend->kctx, - backend->gpu_dump_va, &backend->vmap); + backend->cpu_dump_va = + kbase_phy_alloc_mapping_get(backend->kctx, backend->gpu_dump_va, &backend->vmap); if (!backend->cpu_dump_va || !backend->vmap) goto alloc_error; - backend->to_user_buf = - kzalloc(info->metadata->dump_buf_bytes, GFP_KERNEL); + backend->to_user_buf = kzalloc(info->metadata->dump_buf_bytes, GFP_KERNEL); if (!backend->to_user_buf) goto alloc_error; kbase_ccswe_init(&backend->ccswe_shader_cores); backend->rate_listener.notify = kbasep_hwcnt_backend_jm_on_freq_change; -#if IS_ENABLED(CONFIG_MALI_NO_MALI) - /* The dummy model needs the CPU mapping. */ - page_count = PFN_UP(info->dump_bytes); - gpu_model_set_dummy_prfcnt_base_cpu(backend->cpu_dump_va, kbdev, - backend->vmap->cpu_pages, - page_count); -#endif /* CONFIG_MALI_NO_MALI */ - *out_backend = backend; return 0; @@ -804,9 +727,8 @@ kbasep_hwcnt_backend_jm_metadata(const struct kbase_hwcnt_backend_info *info) } /* JM backend implementation of kbase_hwcnt_backend_init_fn */ -static int kbasep_hwcnt_backend_jm_init( - const struct kbase_hwcnt_backend_info *info, - struct kbase_hwcnt_backend **out_backend) +static int kbasep_hwcnt_backend_jm_init(const struct kbase_hwcnt_backend_info *info, + struct kbase_hwcnt_backend **out_backend) { int errcode; struct kbase_hwcnt_backend_jm *backend = NULL; @@ -814,8 +736,8 @@ static int kbasep_hwcnt_backend_jm_init( if (!info || !out_backend) return -EINVAL; - errcode = kbasep_hwcnt_backend_jm_create( - (const struct kbase_hwcnt_backend_jm_info *) info, &backend); + errcode = kbasep_hwcnt_backend_jm_create((const struct kbase_hwcnt_backend_jm_info *)info, + &backend); if (errcode) return errcode; @@ -831,8 +753,7 @@ static void kbasep_hwcnt_backend_jm_term(struct kbase_hwcnt_backend *backend) return; kbasep_hwcnt_backend_jm_dump_disable(backend); - kbasep_hwcnt_backend_jm_destroy( - (struct kbase_hwcnt_backend_jm *)backend); + kbasep_hwcnt_backend_jm_destroy((struct kbase_hwcnt_backend_jm *)backend); } /** @@ -841,8 +762,7 @@ static void kbasep_hwcnt_backend_jm_term(struct kbase_hwcnt_backend *backend) * * Can be safely called on a backend info in any state of partial construction. 
*/ -static void kbasep_hwcnt_backend_jm_info_destroy( - const struct kbase_hwcnt_backend_jm_info *info) +static void kbasep_hwcnt_backend_jm_info_destroy(const struct kbase_hwcnt_backend_jm_info *info) { if (!info) return; @@ -858,9 +778,8 @@ static void kbasep_hwcnt_backend_jm_info_destroy( * * Return: 0 on success, else error code. */ -static int kbasep_hwcnt_backend_jm_info_create( - struct kbase_device *kbdev, - const struct kbase_hwcnt_backend_jm_info **out_info) +static int kbasep_hwcnt_backend_jm_info_create(struct kbase_device *kbdev, + const struct kbase_hwcnt_backend_jm_info **out_info) { int errcode = -ENOMEM; struct kbase_hwcnt_backend_jm_info *info = NULL; @@ -883,15 +802,12 @@ static int kbasep_hwcnt_backend_jm_info_create( info->counter_set = KBASE_HWCNT_SET_PRIMARY; #endif - errcode = kbasep_hwcnt_backend_jm_gpu_info_init(kbdev, - &info->hwcnt_gpu_info); + errcode = kbasep_hwcnt_backend_jm_gpu_info_init(kbdev, &info->hwcnt_gpu_info); if (errcode) goto error; - errcode = kbase_hwcnt_jm_metadata_create(&info->hwcnt_gpu_info, - info->counter_set, - &info->metadata, - &info->dump_bytes); + errcode = kbase_hwcnt_jm_metadata_create(&info->hwcnt_gpu_info, info->counter_set, + &info->metadata, &info->dump_bytes); if (errcode) goto error; @@ -903,9 +819,8 @@ error: return errcode; } -int kbase_hwcnt_backend_jm_create( - struct kbase_device *kbdev, - struct kbase_hwcnt_backend_interface *iface) +int kbase_hwcnt_backend_jm_create(struct kbase_device *kbdev, + struct kbase_hwcnt_backend_interface *iface) { int errcode; const struct kbase_hwcnt_backend_jm_info *info = NULL; @@ -934,8 +849,7 @@ int kbase_hwcnt_backend_jm_create( return 0; } -void kbase_hwcnt_backend_jm_destroy( - struct kbase_hwcnt_backend_interface *iface) +void kbase_hwcnt_backend_jm_destroy(struct kbase_hwcnt_backend_interface *iface) { if (!iface) return; diff --git a/mali_kbase/mali_kbase_hwcnt_backend_jm.h b/mali_kbase/hwcnt/backend/mali_kbase_hwcnt_backend_jm.h index 1bc3906..4a6293c 100644 --- a/mali_kbase/mali_kbase_hwcnt_backend_jm.h +++ b/mali_kbase/hwcnt/backend/mali_kbase_hwcnt_backend_jm.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2018, 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2018, 2020-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -27,7 +27,7 @@ #ifndef _KBASE_HWCNT_BACKEND_JM_H_ #define _KBASE_HWCNT_BACKEND_JM_H_ -#include "mali_kbase_hwcnt_backend.h" +#include "hwcnt/backend/mali_kbase_hwcnt_backend.h" struct kbase_device; @@ -42,9 +42,8 @@ struct kbase_device; * * Return: 0 on success, else error code. */ -int kbase_hwcnt_backend_jm_create( - struct kbase_device *kbdev, - struct kbase_hwcnt_backend_interface *iface); +int kbase_hwcnt_backend_jm_create(struct kbase_device *kbdev, + struct kbase_hwcnt_backend_interface *iface); /** * kbase_hwcnt_backend_jm_destroy() - Destroy a JM hardware counter backend @@ -54,7 +53,6 @@ int kbase_hwcnt_backend_jm_create( * Can be safely called on an all-zeroed interface, or on an already destroyed * interface. 
*/ -void kbase_hwcnt_backend_jm_destroy( - struct kbase_hwcnt_backend_interface *iface); +void kbase_hwcnt_backend_jm_destroy(struct kbase_hwcnt_backend_interface *iface); #endif /* _KBASE_HWCNT_BACKEND_JM_H_ */ diff --git a/mali_kbase/mali_kbase_hwcnt_backend_jm_watchdog.c b/mali_kbase/hwcnt/backend/mali_kbase_hwcnt_backend_jm_watchdog.c index cdf3cd9..a8654ea 100644 --- a/mali_kbase/mali_kbase_hwcnt_backend_jm_watchdog.c +++ b/mali_kbase/hwcnt/backend/mali_kbase_hwcnt_backend_jm_watchdog.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2021-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -21,13 +21,20 @@ #include <mali_kbase.h> -#include <mali_kbase_hwcnt_gpu.h> -#include <mali_kbase_hwcnt_types.h> +#include <hwcnt/mali_kbase_hwcnt_gpu.h> +#include <hwcnt/mali_kbase_hwcnt_types.h> -#include <mali_kbase_hwcnt_backend.h> -#include <mali_kbase_hwcnt_watchdog_if.h> +#include <hwcnt/backend/mali_kbase_hwcnt_backend.h> +#include <hwcnt/backend/mali_kbase_hwcnt_backend_jm_watchdog.h> +#include <hwcnt/mali_kbase_hwcnt_watchdog_if.h> +#if IS_ENABLED(CONFIG_MALI_IS_FPGA) && !IS_ENABLED(CONFIG_MALI_NO_MALI) +/* Backend watch dog timer interval in milliseconds: 18 seconds. */ +static const u32 hwcnt_backend_watchdog_timer_interval_ms = 18000; +#else +/* Backend watch dog timer interval in milliseconds: 1 second. */ static const u32 hwcnt_backend_watchdog_timer_interval_ms = 1000; +#endif /* IS_FPGA && !NO_MALI */ /* * IDLE_BUFFER_EMPTY -> USER_DUMPING_BUFFER_EMPTY on dump_request. @@ -112,8 +119,7 @@ enum backend_watchdog_state { */ enum wd_init_state { HWCNT_JM_WD_INIT_START, - HWCNT_JM_WD_INIT_ALLOC = HWCNT_JM_WD_INIT_START, - HWCNT_JM_WD_INIT_BACKEND, + HWCNT_JM_WD_INIT_BACKEND = HWCNT_JM_WD_INIT_START, HWCNT_JM_WD_INIT_ENABLE_MAP, HWCNT_JM_WD_INIT_DUMP_BUFFER, HWCNT_JM_WD_INIT_END @@ -290,16 +296,10 @@ kbasep_hwcnt_backend_jm_watchdog_term_partial(struct kbase_hwcnt_backend_jm_watc if (!wd_backend) return; - /* disable timer thread to avoid concurrent access to shared resources */ - wd_backend->info->dump_watchdog_iface->disable( - wd_backend->info->dump_watchdog_iface->timer); + WARN_ON(state > HWCNT_JM_WD_INIT_END); - /*will exit the loop when state reaches HWCNT_JM_WD_INIT_START*/ while (state-- > HWCNT_JM_WD_INIT_START) { switch (state) { - case HWCNT_JM_WD_INIT_ALLOC: - kfree(wd_backend); - break; case HWCNT_JM_WD_INIT_BACKEND: wd_backend->info->jm_backend_iface->term(wd_backend->jm_backend); break; @@ -313,6 +313,8 @@ kbasep_hwcnt_backend_jm_watchdog_term_partial(struct kbase_hwcnt_backend_jm_watc break; } } + + kfree(wd_backend); } /* Job manager watchdog backend, implementation of kbase_hwcnt_backend_term_fn @@ -320,11 +322,17 @@ kbasep_hwcnt_backend_jm_watchdog_term_partial(struct kbase_hwcnt_backend_jm_watc */ static void kbasep_hwcnt_backend_jm_watchdog_term(struct kbase_hwcnt_backend *backend) { + struct kbase_hwcnt_backend_jm_watchdog *wd_backend = + (struct kbase_hwcnt_backend_jm_watchdog *)backend; + if (!backend) return; - kbasep_hwcnt_backend_jm_watchdog_term_partial( - (struct kbase_hwcnt_backend_jm_watchdog *)backend, HWCNT_JM_WD_INIT_END); + /* disable timer thread to avoid concurrent access to shared resources */ + wd_backend->info->dump_watchdog_iface->disable( + wd_backend->info->dump_watchdog_iface->timer); + + 
kbasep_hwcnt_backend_jm_watchdog_term_partial(wd_backend, HWCNT_JM_WD_INIT_END); } /* Job manager watchdog backend, implementation of kbase_hwcnt_backend_init_fn */ @@ -344,20 +352,20 @@ static int kbasep_hwcnt_backend_jm_watchdog_init(const struct kbase_hwcnt_backen jm_info = wd_info->jm_backend_iface->info; metadata = wd_info->jm_backend_iface->metadata(wd_info->jm_backend_iface->info); + wd_backend = kmalloc(sizeof(*wd_backend), GFP_KERNEL); + if (!wd_backend) { + *out_backend = NULL; + return -ENOMEM; + } + + *wd_backend = (struct kbase_hwcnt_backend_jm_watchdog){ + .info = wd_info, + .timeout_ms = hwcnt_backend_watchdog_timer_interval_ms, + .locked = { .state = HWCNT_JM_WD_IDLE_BUFFER_EMPTY, .is_enabled = false } + }; + while (state < HWCNT_JM_WD_INIT_END && !errcode) { switch (state) { - case HWCNT_JM_WD_INIT_ALLOC: - wd_backend = kmalloc(sizeof(*wd_backend), GFP_KERNEL); - if (wd_backend) { - *wd_backend = (struct kbase_hwcnt_backend_jm_watchdog){ - .info = wd_info, - .timeout_ms = hwcnt_backend_watchdog_timer_interval_ms, - .locked = { .state = HWCNT_JM_WD_IDLE_BUFFER_EMPTY, - .is_enabled = false } - }; - } else - errcode = -ENOMEM; - break; case HWCNT_JM_WD_INIT_BACKEND: errcode = wd_info->jm_backend_iface->init(jm_info, &wd_backend->jm_backend); break; @@ -817,5 +825,5 @@ void kbase_hwcnt_backend_jm_watchdog_destroy(struct kbase_hwcnt_backend_interfac kfree((struct kbase_hwcnt_backend_jm_watchdog_info *)iface->info); /*blanking the watchdog backend interface*/ - *iface = (struct kbase_hwcnt_backend_interface){ NULL }; + memset(iface, 0, sizeof(*iface)); } diff --git a/mali_kbase/mali_kbase_hwcnt_backend_jm_watchdog.h b/mali_kbase/hwcnt/backend/mali_kbase_hwcnt_backend_jm_watchdog.h index 5021b4f..02a7952 100644 --- a/mali_kbase/mali_kbase_hwcnt_backend_jm_watchdog.h +++ b/mali_kbase/hwcnt/backend/mali_kbase_hwcnt_backend_jm_watchdog.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2021-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -32,8 +32,8 @@ #ifndef _KBASE_HWCNT_BACKEND_JM_WATCHDOG_H_ #define _KBASE_HWCNT_BACKEND_JM_WATCHDOG_H_ -#include <mali_kbase_hwcnt_backend.h> -#include <mali_kbase_hwcnt_watchdog_if.h> +#include <hwcnt/backend/mali_kbase_hwcnt_backend.h> +#include <hwcnt/mali_kbase_hwcnt_watchdog_if.h> /** * kbase_hwcnt_backend_jm_watchdog_create() - Create a job manager hardware counter watchdog diff --git a/mali_kbase/mali_kbase_hwcnt.c b/mali_kbase/hwcnt/mali_kbase_hwcnt.c index a54f005..34deb5d 100644 --- a/mali_kbase/mali_kbase_hwcnt.c +++ b/mali_kbase/hwcnt/mali_kbase_hwcnt.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2018, 2020-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2018-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -23,10 +23,10 @@ * Implementation of hardware counter context and accumulator APIs. 
*/ -#include "mali_kbase_hwcnt_context.h" -#include "mali_kbase_hwcnt_accumulator.h" -#include "mali_kbase_hwcnt_backend.h" -#include "mali_kbase_hwcnt_types.h" +#include "hwcnt/mali_kbase_hwcnt_context.h" +#include "hwcnt/mali_kbase_hwcnt_accumulator.h" +#include "hwcnt/backend/mali_kbase_hwcnt_backend.h" +#include "hwcnt/mali_kbase_hwcnt_types.h" #include <linux/mutex.h> #include <linux/spinlock.h> @@ -39,11 +39,7 @@ * @ACCUM_STATE_ENABLED: Enabled state, where dumping is enabled if there are * any enabled counters. */ -enum kbase_hwcnt_accum_state { - ACCUM_STATE_ERROR, - ACCUM_STATE_DISABLED, - ACCUM_STATE_ENABLED -}; +enum kbase_hwcnt_accum_state { ACCUM_STATE_ERROR, ACCUM_STATE_DISABLED, ACCUM_STATE_ENABLED }; /** * struct kbase_hwcnt_accumulator - Hardware counter accumulator structure. @@ -130,9 +126,8 @@ struct kbase_hwcnt_context { struct workqueue_struct *wq; }; -int kbase_hwcnt_context_init( - const struct kbase_hwcnt_backend_interface *iface, - struct kbase_hwcnt_context **out_hctx) +int kbase_hwcnt_context_init(const struct kbase_hwcnt_backend_interface *iface, + struct kbase_hwcnt_context **out_hctx) { struct kbase_hwcnt_context *hctx = NULL; @@ -149,8 +144,7 @@ int kbase_hwcnt_context_init( mutex_init(&hctx->accum_lock); hctx->accum_inited = false; - hctx->wq = - alloc_workqueue("mali_kbase_hwcnt", WQ_HIGHPRI | WQ_UNBOUND, 0); + hctx->wq = alloc_workqueue("mali_kbase_hwcnt", WQ_HIGHPRI | WQ_UNBOUND, 0); if (!hctx->wq) goto err_alloc_workqueue; @@ -208,35 +202,30 @@ static int kbasep_hwcnt_accumulator_init(struct kbase_hwcnt_context *hctx) WARN_ON(!hctx); WARN_ON(!hctx->accum_inited); - errcode = hctx->iface->init( - hctx->iface->info, &hctx->accum.backend); + errcode = hctx->iface->init(hctx->iface->info, &hctx->accum.backend); if (errcode) goto error; hctx->accum.metadata = hctx->iface->metadata(hctx->iface->info); hctx->accum.state = ACCUM_STATE_ERROR; - errcode = kbase_hwcnt_enable_map_alloc(hctx->accum.metadata, - &hctx->accum.enable_map); + errcode = kbase_hwcnt_enable_map_alloc(hctx->accum.metadata, &hctx->accum.enable_map); if (errcode) goto error; hctx->accum.enable_map_any_enabled = false; - errcode = kbase_hwcnt_dump_buffer_alloc(hctx->accum.metadata, - &hctx->accum.accum_buf); + errcode = kbase_hwcnt_dump_buffer_alloc(hctx->accum.metadata, &hctx->accum.accum_buf); if (errcode) goto error; - errcode = kbase_hwcnt_enable_map_alloc(hctx->accum.metadata, - &hctx->accum.scratch_map); + errcode = kbase_hwcnt_enable_map_alloc(hctx->accum.metadata, &hctx->accum.scratch_map); if (errcode) goto error; hctx->accum.accumulated = false; - hctx->accum.ts_last_dump_ns = - hctx->iface->timestamp_ns(hctx->accum.backend); + hctx->accum.ts_last_dump_ns = hctx->iface->timestamp_ns(hctx->accum.backend); return 0; @@ -252,8 +241,7 @@ error: * @hctx: Non-NULL pointer to hardware counter context. * @accumulate: True if we should accumulate before disabling, else false. 
*/ -static void kbasep_hwcnt_accumulator_disable( - struct kbase_hwcnt_context *hctx, bool accumulate) +static void kbasep_hwcnt_accumulator_disable(struct kbase_hwcnt_context *hctx, bool accumulate) { int errcode = 0; bool backend_enabled = false; @@ -272,8 +260,7 @@ static void kbasep_hwcnt_accumulator_disable( WARN_ON(hctx->disable_count != 0); WARN_ON(hctx->accum.state == ACCUM_STATE_DISABLED); - if ((hctx->accum.state == ACCUM_STATE_ENABLED) && - (accum->enable_map_any_enabled)) + if ((hctx->accum.state == ACCUM_STATE_ENABLED) && (accum->enable_map_any_enabled)) backend_enabled = true; if (!backend_enabled) @@ -297,8 +284,8 @@ static void kbasep_hwcnt_accumulator_disable( if (errcode) goto disable; - errcode = hctx->iface->dump_get(accum->backend, - &accum->accum_buf, &accum->enable_map, accum->accumulated); + errcode = hctx->iface->dump_get(accum->backend, &accum->accum_buf, &accum->enable_map, + accum->accumulated); if (errcode) goto disable; @@ -336,8 +323,7 @@ static void kbasep_hwcnt_accumulator_enable(struct kbase_hwcnt_context *hctx) /* The backend only needs enabling if any counters are enabled */ if (accum->enable_map_any_enabled) - errcode = hctx->iface->dump_enable_nolock( - accum->backend, &accum->enable_map); + errcode = hctx->iface->dump_enable_nolock(accum->backend, &accum->enable_map); if (!errcode) accum->state = ACCUM_STATE_ENABLED; @@ -364,12 +350,9 @@ static void kbasep_hwcnt_accumulator_enable(struct kbase_hwcnt_context *hctx) * * Return: 0 on success, else error code. */ -static int kbasep_hwcnt_accumulator_dump( - struct kbase_hwcnt_context *hctx, - u64 *ts_start_ns, - u64 *ts_end_ns, - struct kbase_hwcnt_dump_buffer *dump_buf, - const struct kbase_hwcnt_enable_map *new_map) +static int kbasep_hwcnt_accumulator_dump(struct kbase_hwcnt_context *hctx, u64 *ts_start_ns, + u64 *ts_end_ns, struct kbase_hwcnt_dump_buffer *dump_buf, + const struct kbase_hwcnt_enable_map *new_map) { int errcode = 0; unsigned long flags; @@ -379,7 +362,7 @@ static int kbasep_hwcnt_accumulator_dump( bool cur_map_any_enabled; struct kbase_hwcnt_enable_map *cur_map; bool new_map_any_enabled = false; - u64 dump_time_ns; + u64 dump_time_ns = 0; struct kbase_hwcnt_accumulator *accum; WARN_ON(!hctx); @@ -398,8 +381,7 @@ static int kbasep_hwcnt_accumulator_dump( kbase_hwcnt_enable_map_copy(cur_map, &accum->enable_map); if (new_map) - new_map_any_enabled = - kbase_hwcnt_enable_map_any_enabled(new_map); + new_map_any_enabled = kbase_hwcnt_enable_map_any_enabled(new_map); /* * We're holding accum_lock, so the accumulator state might transition @@ -426,8 +408,7 @@ static int kbasep_hwcnt_accumulator_dump( * then we'll do it ourselves after the dump. */ if (new_map) { - kbase_hwcnt_enable_map_copy( - &accum->enable_map, new_map); + kbase_hwcnt_enable_map_copy(&accum->enable_map, new_map); accum->enable_map_any_enabled = new_map_any_enabled; } @@ -440,12 +421,10 @@ static int kbasep_hwcnt_accumulator_dump( /* Initiate the dump if the backend is enabled. 
*/ if ((state == ACCUM_STATE_ENABLED) && cur_map_any_enabled) { if (dump_buf) { - errcode = hctx->iface->dump_request( - accum->backend, &dump_time_ns); + errcode = hctx->iface->dump_request(accum->backend, &dump_time_ns); dump_requested = true; } else { - dump_time_ns = hctx->iface->timestamp_ns( - accum->backend); + dump_time_ns = hctx->iface->timestamp_ns(accum->backend); errcode = hctx->iface->dump_clear(accum->backend); } @@ -457,8 +436,7 @@ static int kbasep_hwcnt_accumulator_dump( /* Copy any accumulation into the dest buffer */ if (accum->accumulated && dump_buf) { - kbase_hwcnt_dump_buffer_copy( - dump_buf, &accum->accum_buf, cur_map); + kbase_hwcnt_dump_buffer_copy(dump_buf, &accum->accum_buf, cur_map); dump_written = true; } @@ -483,8 +461,7 @@ static int kbasep_hwcnt_accumulator_dump( * we're already enabled and holding accum_lock is impossible. */ if (new_map_any_enabled) { - errcode = hctx->iface->dump_enable( - accum->backend, new_map); + errcode = hctx->iface->dump_enable(accum->backend, new_map); if (errcode) goto error; } @@ -495,11 +472,8 @@ static int kbasep_hwcnt_accumulator_dump( /* If we dumped, copy or accumulate it into the destination */ if (dump_requested) { WARN_ON(state != ACCUM_STATE_ENABLED); - errcode = hctx->iface->dump_get( - accum->backend, - dump_buf, - cur_map, - dump_written); + errcode = hctx->iface->dump_get(accum->backend, dump_buf, cur_map, + dump_written); if (errcode) goto error; dump_written = true; @@ -540,8 +514,7 @@ error: * @hctx: Non-NULL pointer to hardware counter context. * @accumulate: True if we should accumulate before disabling, else false. */ -static void kbasep_hwcnt_context_disable( - struct kbase_hwcnt_context *hctx, bool accumulate) +static void kbasep_hwcnt_context_disable(struct kbase_hwcnt_context *hctx, bool accumulate) { unsigned long flags; @@ -563,9 +536,8 @@ static void kbasep_hwcnt_context_disable( } } -int kbase_hwcnt_accumulator_acquire( - struct kbase_hwcnt_context *hctx, - struct kbase_hwcnt_accumulator **accum) +int kbase_hwcnt_accumulator_acquire(struct kbase_hwcnt_context *hctx, + struct kbase_hwcnt_accumulator **accum) { int errcode = 0; unsigned long flags; @@ -618,9 +590,7 @@ int kbase_hwcnt_accumulator_acquire( * Regardless of initial state, counters don't need to be enabled via * the backend, as the initial enable map has no enabled counters. */ - hctx->accum.state = (hctx->disable_count == 0) ? - ACCUM_STATE_ENABLED : - ACCUM_STATE_DISABLED; + hctx->accum.state = (hctx->disable_count == 0) ? 
ACCUM_STATE_ENABLED : ACCUM_STATE_DISABLED; spin_unlock_irqrestore(&hctx->state_lock, flags); @@ -728,8 +698,7 @@ void kbase_hwcnt_context_enable(struct kbase_hwcnt_context *hctx) spin_unlock_irqrestore(&hctx->state_lock, flags); } -const struct kbase_hwcnt_metadata *kbase_hwcnt_context_metadata( - struct kbase_hwcnt_context *hctx) +const struct kbase_hwcnt_metadata *kbase_hwcnt_context_metadata(struct kbase_hwcnt_context *hctx) { if (!hctx) return NULL; @@ -737,8 +706,7 @@ const struct kbase_hwcnt_metadata *kbase_hwcnt_context_metadata( return hctx->iface->metadata(hctx->iface->info); } -bool kbase_hwcnt_context_queue_work(struct kbase_hwcnt_context *hctx, - struct work_struct *work) +bool kbase_hwcnt_context_queue_work(struct kbase_hwcnt_context *hctx, struct work_struct *work) { if (WARN_ON(!hctx) || WARN_ON(!work)) return false; @@ -746,12 +714,10 @@ bool kbase_hwcnt_context_queue_work(struct kbase_hwcnt_context *hctx, return queue_work(hctx->wq, work); } -int kbase_hwcnt_accumulator_set_counters( - struct kbase_hwcnt_accumulator *accum, - const struct kbase_hwcnt_enable_map *new_map, - u64 *ts_start_ns, - u64 *ts_end_ns, - struct kbase_hwcnt_dump_buffer *dump_buf) +int kbase_hwcnt_accumulator_set_counters(struct kbase_hwcnt_accumulator *accum, + const struct kbase_hwcnt_enable_map *new_map, + u64 *ts_start_ns, u64 *ts_end_ns, + struct kbase_hwcnt_dump_buffer *dump_buf) { int errcode; struct kbase_hwcnt_context *hctx; @@ -767,19 +733,15 @@ int kbase_hwcnt_accumulator_set_counters( mutex_lock(&hctx->accum_lock); - errcode = kbasep_hwcnt_accumulator_dump( - hctx, ts_start_ns, ts_end_ns, dump_buf, new_map); + errcode = kbasep_hwcnt_accumulator_dump(hctx, ts_start_ns, ts_end_ns, dump_buf, new_map); mutex_unlock(&hctx->accum_lock); return errcode; } -int kbase_hwcnt_accumulator_dump( - struct kbase_hwcnt_accumulator *accum, - u64 *ts_start_ns, - u64 *ts_end_ns, - struct kbase_hwcnt_dump_buffer *dump_buf) +int kbase_hwcnt_accumulator_dump(struct kbase_hwcnt_accumulator *accum, u64 *ts_start_ns, + u64 *ts_end_ns, struct kbase_hwcnt_dump_buffer *dump_buf) { int errcode; struct kbase_hwcnt_context *hctx; @@ -794,8 +756,7 @@ int kbase_hwcnt_accumulator_dump( mutex_lock(&hctx->accum_lock); - errcode = kbasep_hwcnt_accumulator_dump( - hctx, ts_start_ns, ts_end_ns, dump_buf, NULL); + errcode = kbasep_hwcnt_accumulator_dump(hctx, ts_start_ns, ts_end_ns, dump_buf, NULL); mutex_unlock(&hctx->accum_lock); diff --git a/mali_kbase/mali_kbase_hwcnt_accumulator.h b/mali_kbase/hwcnt/mali_kbase_hwcnt_accumulator.h index af542ea..069e020 100644 --- a/mali_kbase/mali_kbase_hwcnt_accumulator.h +++ b/mali_kbase/hwcnt/mali_kbase_hwcnt_accumulator.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2018, 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2018, 2020-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -67,9 +67,8 @@ struct kbase_hwcnt_dump_buffer; * * Return: 0 on success or error code. */ -int kbase_hwcnt_accumulator_acquire( - struct kbase_hwcnt_context *hctx, - struct kbase_hwcnt_accumulator **accum); +int kbase_hwcnt_accumulator_acquire(struct kbase_hwcnt_context *hctx, + struct kbase_hwcnt_accumulator **accum); /** * kbase_hwcnt_accumulator_release() - Release a hardware counter accumulator. 
@@ -102,12 +101,10 @@ void kbase_hwcnt_accumulator_release(struct kbase_hwcnt_accumulator *accum); * * Return: 0 on success or error code. */ -int kbase_hwcnt_accumulator_set_counters( - struct kbase_hwcnt_accumulator *accum, - const struct kbase_hwcnt_enable_map *new_map, - u64 *ts_start_ns, - u64 *ts_end_ns, - struct kbase_hwcnt_dump_buffer *dump_buf); +int kbase_hwcnt_accumulator_set_counters(struct kbase_hwcnt_accumulator *accum, + const struct kbase_hwcnt_enable_map *new_map, + u64 *ts_start_ns, u64 *ts_end_ns, + struct kbase_hwcnt_dump_buffer *dump_buf); /** * kbase_hwcnt_accumulator_dump() - Perform a dump of the currently enabled @@ -127,11 +124,8 @@ int kbase_hwcnt_accumulator_set_counters( * * Return: 0 on success or error code. */ -int kbase_hwcnt_accumulator_dump( - struct kbase_hwcnt_accumulator *accum, - u64 *ts_start_ns, - u64 *ts_end_ns, - struct kbase_hwcnt_dump_buffer *dump_buf); +int kbase_hwcnt_accumulator_dump(struct kbase_hwcnt_accumulator *accum, u64 *ts_start_ns, + u64 *ts_end_ns, struct kbase_hwcnt_dump_buffer *dump_buf); /** * kbase_hwcnt_accumulator_timestamp_ns() - Get the current accumulator backend diff --git a/mali_kbase/mali_kbase_hwcnt_context.h b/mali_kbase/hwcnt/mali_kbase_hwcnt_context.h index 34423d1..89732a9 100644 --- a/mali_kbase/mali_kbase_hwcnt_context.h +++ b/mali_kbase/hwcnt/mali_kbase_hwcnt_context.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2018, 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2018, 2020-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -43,9 +43,8 @@ struct kbase_hwcnt_context; * * Return: 0 on success, else error code. */ -int kbase_hwcnt_context_init( - const struct kbase_hwcnt_backend_interface *iface, - struct kbase_hwcnt_context **out_hctx); +int kbase_hwcnt_context_init(const struct kbase_hwcnt_backend_interface *iface, + struct kbase_hwcnt_context **out_hctx); /** * kbase_hwcnt_context_term() - Terminate a hardware counter context. @@ -61,8 +60,7 @@ void kbase_hwcnt_context_term(struct kbase_hwcnt_context *hctx); * * Return: Non-NULL pointer to metadata, or NULL on error. */ -const struct kbase_hwcnt_metadata *kbase_hwcnt_context_metadata( - struct kbase_hwcnt_context *hctx); +const struct kbase_hwcnt_metadata *kbase_hwcnt_context_metadata(struct kbase_hwcnt_context *hctx); /** * kbase_hwcnt_context_disable() - Increment the disable count of the context. @@ -145,7 +143,6 @@ void kbase_hwcnt_context_enable(struct kbase_hwcnt_context *hctx); * this meant progress through the power management states could be stalled * for however long that higher priority thread took. */ -bool kbase_hwcnt_context_queue_work(struct kbase_hwcnt_context *hctx, - struct work_struct *work); +bool kbase_hwcnt_context_queue_work(struct kbase_hwcnt_context *hctx, struct work_struct *work); #endif /* _KBASE_HWCNT_CONTEXT_H_ */ diff --git a/mali_kbase/mali_kbase_hwcnt_gpu.c b/mali_kbase/hwcnt/mali_kbase_hwcnt_gpu.c index 752d096..74916da 100644 --- a/mali_kbase/mali_kbase_hwcnt_gpu.c +++ b/mali_kbase/hwcnt/mali_kbase_hwcnt_gpu.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2018-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2018-2022 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -19,10 +19,9 @@ * */ -#include "mali_kbase_hwcnt_gpu.h" -#include "mali_kbase_hwcnt_types.h" +#include "hwcnt/mali_kbase_hwcnt_gpu.h" +#include "hwcnt/mali_kbase_hwcnt_types.h" -#include <linux/bug.h> #include <linux/err.h> /** enum enable_map_idx - index into a block enable map that spans multiple u64 array elements @@ -33,8 +32,7 @@ enum enable_map_idx { EM_COUNT, }; -static void kbasep_get_fe_block_type(u64 *dst, enum kbase_hwcnt_set counter_set, - bool is_csf) +static void kbasep_get_fe_block_type(u64 *dst, enum kbase_hwcnt_set counter_set, bool is_csf) { switch (counter_set) { case KBASE_HWCNT_SET_PRIMARY: @@ -44,21 +42,20 @@ static void kbasep_get_fe_block_type(u64 *dst, enum kbase_hwcnt_set counter_set, if (is_csf) *dst = KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_FE2; else - *dst = KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_UNDEFINED; + *dst = KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_FE_UNDEFINED; break; case KBASE_HWCNT_SET_TERTIARY: if (is_csf) *dst = KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_FE3; else - *dst = KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_UNDEFINED; + *dst = KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_FE_UNDEFINED; break; default: WARN_ON(true); } } -static void kbasep_get_tiler_block_type(u64 *dst, - enum kbase_hwcnt_set counter_set) +static void kbasep_get_tiler_block_type(u64 *dst, enum kbase_hwcnt_set counter_set) { switch (counter_set) { case KBASE_HWCNT_SET_PRIMARY: @@ -66,15 +63,14 @@ static void kbasep_get_tiler_block_type(u64 *dst, break; case KBASE_HWCNT_SET_SECONDARY: case KBASE_HWCNT_SET_TERTIARY: - *dst = KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_UNDEFINED; + *dst = KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_TILER_UNDEFINED; break; default: WARN_ON(true); } } -static void kbasep_get_sc_block_type(u64 *dst, enum kbase_hwcnt_set counter_set, - bool is_csf) +static void kbasep_get_sc_block_type(u64 *dst, enum kbase_hwcnt_set counter_set, bool is_csf) { switch (counter_set) { case KBASE_HWCNT_SET_PRIMARY: @@ -87,15 +83,14 @@ static void kbasep_get_sc_block_type(u64 *dst, enum kbase_hwcnt_set counter_set, if (is_csf) *dst = KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_SC3; else - *dst = KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_UNDEFINED; + *dst = KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_SC_UNDEFINED; break; default: WARN_ON(true); } } -static void kbasep_get_memsys_block_type(u64 *dst, - enum kbase_hwcnt_set counter_set) +static void kbasep_get_memsys_block_type(u64 *dst, enum kbase_hwcnt_set counter_set) { switch (counter_set) { case KBASE_HWCNT_SET_PRIMARY: @@ -105,7 +100,7 @@ static void kbasep_get_memsys_block_type(u64 *dst, *dst = KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_MEMSYS2; break; case KBASE_HWCNT_SET_TERTIARY: - *dst = KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_UNDEFINED; + *dst = KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_MEMSYS_UNDEFINED; break; default: WARN_ON(true); @@ -123,15 +118,14 @@ static void kbasep_get_memsys_block_type(u64 *dst, * * Return: 0 on success, else error code. 
*/ -static int kbasep_hwcnt_backend_gpu_metadata_create( - const struct kbase_hwcnt_gpu_info *gpu_info, const bool is_csf, - enum kbase_hwcnt_set counter_set, - const struct kbase_hwcnt_metadata **metadata) +static int kbasep_hwcnt_backend_gpu_metadata_create(const struct kbase_hwcnt_gpu_info *gpu_info, + const bool is_csf, + enum kbase_hwcnt_set counter_set, + const struct kbase_hwcnt_metadata **metadata) { struct kbase_hwcnt_description desc; struct kbase_hwcnt_group_description group; - struct kbase_hwcnt_block_description - blks[KBASE_HWCNT_V5_BLOCK_TYPE_COUNT]; + struct kbase_hwcnt_block_description blks[KBASE_HWCNT_V5_BLOCK_TYPE_COUNT]; size_t non_sc_block_count; size_t sc_block_count; @@ -157,22 +151,19 @@ static int kbasep_hwcnt_backend_gpu_metadata_create( kbasep_get_fe_block_type(&blks[0].type, counter_set, is_csf); blks[0].inst_cnt = 1; blks[0].hdr_cnt = KBASE_HWCNT_V5_HEADERS_PER_BLOCK; - blks[0].ctr_cnt = gpu_info->prfcnt_values_per_block - - KBASE_HWCNT_V5_HEADERS_PER_BLOCK; + blks[0].ctr_cnt = gpu_info->prfcnt_values_per_block - KBASE_HWCNT_V5_HEADERS_PER_BLOCK; /* One Tiler block */ kbasep_get_tiler_block_type(&blks[1].type, counter_set); blks[1].inst_cnt = 1; blks[1].hdr_cnt = KBASE_HWCNT_V5_HEADERS_PER_BLOCK; - blks[1].ctr_cnt = gpu_info->prfcnt_values_per_block - - KBASE_HWCNT_V5_HEADERS_PER_BLOCK; + blks[1].ctr_cnt = gpu_info->prfcnt_values_per_block - KBASE_HWCNT_V5_HEADERS_PER_BLOCK; /* l2_count memsys blks */ kbasep_get_memsys_block_type(&blks[2].type, counter_set); blks[2].inst_cnt = gpu_info->l2_count; blks[2].hdr_cnt = KBASE_HWCNT_V5_HEADERS_PER_BLOCK; - blks[2].ctr_cnt = gpu_info->prfcnt_values_per_block - - KBASE_HWCNT_V5_HEADERS_PER_BLOCK; + blks[2].ctr_cnt = gpu_info->prfcnt_values_per_block - KBASE_HWCNT_V5_HEADERS_PER_BLOCK; /* * There are as many shader cores in the system as there are bits set in @@ -193,8 +184,7 @@ static int kbasep_hwcnt_backend_gpu_metadata_create( kbasep_get_sc_block_type(&blks[3].type, counter_set, is_csf); blks[3].inst_cnt = sc_block_count; blks[3].hdr_cnt = KBASE_HWCNT_V5_HEADERS_PER_BLOCK; - blks[3].ctr_cnt = gpu_info->prfcnt_values_per_block - - KBASE_HWCNT_V5_HEADERS_PER_BLOCK; + blks[3].ctr_cnt = gpu_info->prfcnt_values_per_block - KBASE_HWCNT_V5_HEADERS_PER_BLOCK; WARN_ON(KBASE_HWCNT_V5_BLOCK_TYPE_COUNT != 4); @@ -221,8 +211,7 @@ static int kbasep_hwcnt_backend_gpu_metadata_create( * * Return: Size of buffer the GPU needs to perform a counter dump. */ -static size_t -kbasep_hwcnt_backend_jm_dump_bytes(const struct kbase_hwcnt_gpu_info *gpu_info) +static size_t kbasep_hwcnt_backend_jm_dump_bytes(const struct kbase_hwcnt_gpu_info *gpu_info) { WARN_ON(!gpu_info); @@ -230,11 +219,10 @@ kbasep_hwcnt_backend_jm_dump_bytes(const struct kbase_hwcnt_gpu_info *gpu_info) gpu_info->prfcnt_values_per_block * KBASE_HWCNT_VALUE_HW_BYTES; } -int kbase_hwcnt_jm_metadata_create( - const struct kbase_hwcnt_gpu_info *gpu_info, - enum kbase_hwcnt_set counter_set, - const struct kbase_hwcnt_metadata **out_metadata, - size_t *out_dump_bytes) +int kbase_hwcnt_jm_metadata_create(const struct kbase_hwcnt_gpu_info *gpu_info, + enum kbase_hwcnt_set counter_set, + const struct kbase_hwcnt_metadata **out_metadata, + size_t *out_dump_bytes) { int errcode; const struct kbase_hwcnt_metadata *metadata; @@ -251,8 +239,7 @@ int kbase_hwcnt_jm_metadata_create( * all the available L2 cache and Shader cores are allocated. 
*/ dump_bytes = kbasep_hwcnt_backend_jm_dump_bytes(gpu_info); - errcode = kbasep_hwcnt_backend_gpu_metadata_create( - gpu_info, false, counter_set, &metadata); + errcode = kbasep_hwcnt_backend_gpu_metadata_create(gpu_info, false, counter_set, &metadata); if (errcode) return errcode; @@ -277,10 +264,9 @@ void kbase_hwcnt_jm_metadata_destroy(const struct kbase_hwcnt_metadata *metadata kbase_hwcnt_metadata_destroy(metadata); } -int kbase_hwcnt_csf_metadata_create( - const struct kbase_hwcnt_gpu_info *gpu_info, - enum kbase_hwcnt_set counter_set, - const struct kbase_hwcnt_metadata **out_metadata) +int kbase_hwcnt_csf_metadata_create(const struct kbase_hwcnt_gpu_info *gpu_info, + enum kbase_hwcnt_set counter_set, + const struct kbase_hwcnt_metadata **out_metadata) { int errcode; const struct kbase_hwcnt_metadata *metadata; @@ -288,8 +274,7 @@ int kbase_hwcnt_csf_metadata_create( if (!gpu_info || !out_metadata) return -EINVAL; - errcode = kbasep_hwcnt_backend_gpu_metadata_create( - gpu_info, true, counter_set, &metadata); + errcode = kbasep_hwcnt_backend_gpu_metadata_create(gpu_info, true, counter_set, &metadata); if (errcode) return errcode; @@ -298,8 +283,7 @@ int kbase_hwcnt_csf_metadata_create( return 0; } -void kbase_hwcnt_csf_metadata_destroy( - const struct kbase_hwcnt_metadata *metadata) +void kbase_hwcnt_csf_metadata_destroy(const struct kbase_hwcnt_metadata *metadata) { if (!metadata) return; @@ -307,10 +291,7 @@ void kbase_hwcnt_csf_metadata_destroy( kbase_hwcnt_metadata_destroy(metadata); } -static bool is_block_type_shader( - const u64 grp_type, - const u64 blk_type, - const size_t blk) +static bool is_block_type_shader(const u64 grp_type, const u64 blk_type, const size_t blk) { bool is_shader = false; @@ -320,22 +301,22 @@ static bool is_block_type_shader( if (blk_type == KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_SC || blk_type == KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_SC2 || - blk_type == KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_SC3) + blk_type == KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_SC3 || + blk_type == KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_SC_UNDEFINED) is_shader = true; return is_shader; } -static bool is_block_type_l2_cache( - const u64 grp_type, - const u64 blk_type) +static bool is_block_type_l2_cache(const u64 grp_type, const u64 blk_type) { bool is_l2_cache = false; switch (grp_type) { case KBASE_HWCNT_GPU_GROUP_TYPE_V5: if (blk_type == KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_MEMSYS || - blk_type == KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_MEMSYS2) + blk_type == KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_MEMSYS2 || + blk_type == KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_MEMSYS_UNDEFINED) is_l2_cache = true; break; default: @@ -347,10 +328,8 @@ static bool is_block_type_l2_cache( } int kbase_hwcnt_jm_dump_get(struct kbase_hwcnt_dump_buffer *dst, u64 *src, - const struct kbase_hwcnt_enable_map *dst_enable_map, - u64 pm_core_mask, - const struct kbase_hwcnt_curr_config *curr_config, - bool accumulate) + const struct kbase_hwcnt_enable_map *dst_enable_map, u64 pm_core_mask, + const struct kbase_hwcnt_curr_config *curr_config, bool accumulate) { const struct kbase_hwcnt_metadata *metadata; size_t grp, blk, blk_inst; @@ -361,28 +340,23 @@ int kbase_hwcnt_jm_dump_get(struct kbase_hwcnt_dump_buffer *dst, u64 *src, /* Variables to deal with the current configuration */ int l2_count = 0; - if (!dst || !src || !dst_enable_map || - (dst_enable_map->metadata != dst->metadata)) + if (!dst || !src || !dst_enable_map || (dst_enable_map->metadata != dst->metadata)) return -EINVAL; metadata = dst->metadata; - 
kbase_hwcnt_metadata_for_each_block( - metadata, grp, blk, blk_inst) { - const size_t hdr_cnt = - kbase_hwcnt_metadata_block_headers_count( - metadata, grp, blk); + kbase_hwcnt_metadata_for_each_block(metadata, grp, blk, blk_inst) + { + const size_t hdr_cnt = kbase_hwcnt_metadata_block_headers_count(metadata, grp, blk); const size_t ctr_cnt = - kbase_hwcnt_metadata_block_counters_count( - metadata, grp, blk); - const u64 blk_type = kbase_hwcnt_metadata_block_type( - metadata, grp, blk); + kbase_hwcnt_metadata_block_counters_count(metadata, grp, blk); + const u64 blk_type = kbase_hwcnt_metadata_block_type(metadata, grp, blk); const bool is_shader_core = is_block_type_shader( - kbase_hwcnt_metadata_group_type(metadata, grp), - blk_type, blk); + kbase_hwcnt_metadata_group_type(metadata, grp), blk_type, blk); const bool is_l2_cache = is_block_type_l2_cache( - kbase_hwcnt_metadata_group_type(metadata, grp), - blk_type); + kbase_hwcnt_metadata_group_type(metadata, grp), blk_type); + const bool is_undefined = kbase_hwcnt_is_block_type_undefined( + kbase_hwcnt_metadata_group_type(metadata, grp), blk_type); bool hw_res_available = true; /* @@ -409,25 +383,46 @@ int kbase_hwcnt_jm_dump_get(struct kbase_hwcnt_dump_buffer *dst, u64 *src, /* * Skip block if no values in the destination block are enabled. */ - if (kbase_hwcnt_enable_map_block_enabled( - dst_enable_map, grp, blk, blk_inst)) { - u64 *dst_blk = kbase_hwcnt_dump_buffer_block_instance( - dst, grp, blk, blk_inst); + if (kbase_hwcnt_enable_map_block_enabled(dst_enable_map, grp, blk, blk_inst)) { + u64 *dst_blk = + kbase_hwcnt_dump_buffer_block_instance(dst, grp, blk, blk_inst); const u64 *src_blk = dump_src + src_offset; + bool blk_powered; + + if (!is_shader_core) { + /* Under the current PM system, counters will + * only be enabled after all non shader core + * blocks are powered up. + */ + blk_powered = true; + } else { + /* Check the PM core mask to see if the shader + * core is powered up. + */ + blk_powered = core_mask & 1; + } - if ((!is_shader_core || (core_mask & 1)) && hw_res_available) { + if (blk_powered && !is_undefined && hw_res_available) { + /* Only powered and defined blocks have valid data. */ if (accumulate) { - kbase_hwcnt_dump_buffer_block_accumulate( - dst_blk, src_blk, hdr_cnt, - ctr_cnt); + kbase_hwcnt_dump_buffer_block_accumulate(dst_blk, src_blk, + hdr_cnt, ctr_cnt); } else { - kbase_hwcnt_dump_buffer_block_copy( - dst_blk, src_blk, - (hdr_cnt + ctr_cnt)); + kbase_hwcnt_dump_buffer_block_copy(dst_blk, src_blk, + (hdr_cnt + ctr_cnt)); + } + } else { + /* Even though the block might be undefined, the + * user has enabled counter collection for it. + * We should not propagate garbage data. 
+ */ + if (accumulate) { + /* No-op to preserve existing values */ + } else { + /* src is garbage, so zero the dst */ + kbase_hwcnt_dump_buffer_block_zero(dst_blk, + (hdr_cnt + ctr_cnt)); } - } else if (!accumulate) { - kbase_hwcnt_dump_buffer_block_zero( - dst_blk, (hdr_cnt + ctr_cnt)); } } @@ -442,42 +437,55 @@ int kbase_hwcnt_jm_dump_get(struct kbase_hwcnt_dump_buffer *dst, u64 *src, } int kbase_hwcnt_csf_dump_get(struct kbase_hwcnt_dump_buffer *dst, u64 *src, - const struct kbase_hwcnt_enable_map *dst_enable_map, - bool accumulate) + const struct kbase_hwcnt_enable_map *dst_enable_map, bool accumulate) { const struct kbase_hwcnt_metadata *metadata; const u64 *dump_src = src; size_t src_offset = 0; size_t grp, blk, blk_inst; - if (!dst || !src || !dst_enable_map || - (dst_enable_map->metadata != dst->metadata)) + if (!dst || !src || !dst_enable_map || (dst_enable_map->metadata != dst->metadata)) return -EINVAL; metadata = dst->metadata; - kbase_hwcnt_metadata_for_each_block(metadata, grp, blk, blk_inst) { - const size_t hdr_cnt = kbase_hwcnt_metadata_block_headers_count( - metadata, grp, blk); + kbase_hwcnt_metadata_for_each_block(metadata, grp, blk, blk_inst) + { + const size_t hdr_cnt = kbase_hwcnt_metadata_block_headers_count(metadata, grp, blk); const size_t ctr_cnt = - kbase_hwcnt_metadata_block_counters_count(metadata, grp, - blk); + kbase_hwcnt_metadata_block_counters_count(metadata, grp, blk); + const uint64_t blk_type = kbase_hwcnt_metadata_block_type(metadata, grp, blk); + const bool is_undefined = kbase_hwcnt_is_block_type_undefined( + kbase_hwcnt_metadata_group_type(metadata, grp), blk_type); /* * Skip block if no values in the destination block are enabled. */ - if (kbase_hwcnt_enable_map_block_enabled(dst_enable_map, grp, - blk, blk_inst)) { - u64 *dst_blk = kbase_hwcnt_dump_buffer_block_instance( - dst, grp, blk, blk_inst); + if (kbase_hwcnt_enable_map_block_enabled(dst_enable_map, grp, blk, blk_inst)) { + u64 *dst_blk = + kbase_hwcnt_dump_buffer_block_instance(dst, grp, blk, blk_inst); const u64 *src_blk = dump_src + src_offset; - if (accumulate) { - kbase_hwcnt_dump_buffer_block_accumulate( - dst_blk, src_blk, hdr_cnt, ctr_cnt); + if (!is_undefined) { + if (accumulate) { + kbase_hwcnt_dump_buffer_block_accumulate(dst_blk, src_blk, + hdr_cnt, ctr_cnt); + } else { + kbase_hwcnt_dump_buffer_block_copy(dst_blk, src_blk, + (hdr_cnt + ctr_cnt)); + } } else { - kbase_hwcnt_dump_buffer_block_copy( - dst_blk, src_blk, (hdr_cnt + ctr_cnt)); + /* Even though the block might be undefined, the + * user has enabled counter collection for it. + * We should not propagate garbage data. + */ + if (accumulate) { + /* No-op to preserve existing values */ + } else { + /* src is garbage, so zero the dst */ + kbase_hwcnt_dump_buffer_block_zero(dst_blk, + (hdr_cnt + ctr_cnt)); + } } } @@ -498,12 +506,9 @@ int kbase_hwcnt_csf_dump_get(struct kbase_hwcnt_dump_buffer *dst, u64 *src, * @hi: Non-NULL pointer to where high 64 bits of block enable map abstraction * will be stored. 
*/ -static inline void kbasep_hwcnt_backend_gpu_block_map_from_physical( - u32 phys, - u64 *lo, - u64 *hi) +static inline void kbasep_hwcnt_backend_gpu_block_map_from_physical(u32 phys, u64 *lo, u64 *hi) { - u64 dwords[2] = {0, 0}; + u64 dwords[2] = { 0, 0 }; size_t dword_idx; @@ -528,9 +533,8 @@ static inline void kbasep_hwcnt_backend_gpu_block_map_from_physical( *hi = dwords[1]; } -void kbase_hwcnt_gpu_enable_map_to_physical( - struct kbase_hwcnt_physical_enable_map *dst, - const struct kbase_hwcnt_enable_map *src) +void kbase_hwcnt_gpu_enable_map_to_physical(struct kbase_hwcnt_physical_enable_map *dst, + const struct kbase_hwcnt_enable_map *src) { const struct kbase_hwcnt_metadata *metadata; u64 fe_bm[EM_COUNT] = { 0 }; @@ -544,17 +548,13 @@ void kbase_hwcnt_gpu_enable_map_to_physical( metadata = src->metadata; - kbase_hwcnt_metadata_for_each_block( - metadata, grp, blk, blk_inst) { - const u64 grp_type = kbase_hwcnt_metadata_group_type( - metadata, grp); - const u64 blk_type = kbase_hwcnt_metadata_block_type( - metadata, grp, blk); - const u64 *blk_map = kbase_hwcnt_enable_map_block_instance( - src, grp, blk, blk_inst); - - if ((enum kbase_hwcnt_gpu_group_type)grp_type == - KBASE_HWCNT_GPU_GROUP_TYPE_V5) { + kbase_hwcnt_metadata_for_each_block(metadata, grp, blk, blk_inst) + { + const u64 grp_type = kbase_hwcnt_metadata_group_type(metadata, grp); + const u64 blk_type = kbase_hwcnt_metadata_block_type(metadata, grp, blk); + const u64 *blk_map = kbase_hwcnt_enable_map_block_instance(src, grp, blk, blk_inst); + + if ((enum kbase_hwcnt_gpu_group_type)grp_type == KBASE_HWCNT_GPU_GROUP_TYPE_V5) { const size_t map_stride = kbase_hwcnt_metadata_block_enable_map_stride(metadata, grp, blk); size_t map_idx; @@ -564,7 +564,10 @@ void kbase_hwcnt_gpu_enable_map_to_physical( break; switch ((enum kbase_hwcnt_gpu_v5_block_type)blk_type) { - case KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_UNDEFINED: + case KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_FE_UNDEFINED: + case KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_SC_UNDEFINED: + case KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_TILER_UNDEFINED: + case KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_MEMSYS_UNDEFINED: /* Nothing to do in this case. 
*/ break; case KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_FE: @@ -602,8 +605,7 @@ void kbase_hwcnt_gpu_enable_map_to_physical( kbase_hwcnt_backend_gpu_block_map_to_physical(mmu_l2_bm[EM_LO], mmu_l2_bm[EM_HI]); } -void kbase_hwcnt_gpu_set_to_physical(enum kbase_hwcnt_physical_set *dst, - enum kbase_hwcnt_set src) +void kbase_hwcnt_gpu_set_to_physical(enum kbase_hwcnt_physical_set *dst, enum kbase_hwcnt_set src) { switch (src) { case KBASE_HWCNT_SET_PRIMARY: @@ -620,9 +622,8 @@ void kbase_hwcnt_gpu_set_to_physical(enum kbase_hwcnt_physical_set *dst, } } -void kbase_hwcnt_gpu_enable_map_from_physical( - struct kbase_hwcnt_enable_map *dst, - const struct kbase_hwcnt_physical_enable_map *src) +void kbase_hwcnt_gpu_enable_map_from_physical(struct kbase_hwcnt_enable_map *dst, + const struct kbase_hwcnt_physical_enable_map *src) { const struct kbase_hwcnt_metadata *metadata; @@ -645,16 +646,13 @@ void kbase_hwcnt_gpu_enable_map_from_physical( kbasep_hwcnt_backend_gpu_block_map_from_physical(src->mmu_l2_bm, &mmu_l2_bm[EM_LO], &mmu_l2_bm[EM_HI]); - kbase_hwcnt_metadata_for_each_block(metadata, grp, blk, blk_inst) { - const u64 grp_type = kbase_hwcnt_metadata_group_type( - metadata, grp); - const u64 blk_type = kbase_hwcnt_metadata_block_type( - metadata, grp, blk); - u64 *blk_map = kbase_hwcnt_enable_map_block_instance( - dst, grp, blk, blk_inst); + kbase_hwcnt_metadata_for_each_block(metadata, grp, blk, blk_inst) + { + const u64 grp_type = kbase_hwcnt_metadata_group_type(metadata, grp); + const u64 blk_type = kbase_hwcnt_metadata_block_type(metadata, grp, blk); + u64 *blk_map = kbase_hwcnt_enable_map_block_instance(dst, grp, blk, blk_inst); - if ((enum kbase_hwcnt_gpu_group_type)grp_type == - KBASE_HWCNT_GPU_GROUP_TYPE_V5) { + if ((enum kbase_hwcnt_gpu_group_type)grp_type == KBASE_HWCNT_GPU_GROUP_TYPE_V5) { const size_t map_stride = kbase_hwcnt_metadata_block_enable_map_stride(metadata, grp, blk); size_t map_idx; @@ -664,7 +662,10 @@ void kbase_hwcnt_gpu_enable_map_from_physical( break; switch ((enum kbase_hwcnt_gpu_v5_block_type)blk_type) { - case KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_UNDEFINED: + case KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_FE_UNDEFINED: + case KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_SC_UNDEFINED: + case KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_TILER_UNDEFINED: + case KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_MEMSYS_UNDEFINED: /* Nothing to do in this case. 
*/ break; case KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_FE: @@ -694,29 +695,25 @@ void kbase_hwcnt_gpu_enable_map_from_physical( } } -void kbase_hwcnt_gpu_patch_dump_headers( - struct kbase_hwcnt_dump_buffer *buf, - const struct kbase_hwcnt_enable_map *enable_map) +void kbase_hwcnt_gpu_patch_dump_headers(struct kbase_hwcnt_dump_buffer *buf, + const struct kbase_hwcnt_enable_map *enable_map) { const struct kbase_hwcnt_metadata *metadata; size_t grp, blk, blk_inst; - if (WARN_ON(!buf) || WARN_ON(!enable_map) || - WARN_ON(buf->metadata != enable_map->metadata)) + if (WARN_ON(!buf) || WARN_ON(!enable_map) || WARN_ON(buf->metadata != enable_map->metadata)) return; metadata = buf->metadata; - kbase_hwcnt_metadata_for_each_block(metadata, grp, blk, blk_inst) { - const u64 grp_type = - kbase_hwcnt_metadata_group_type(metadata, grp); - u64 *buf_blk = kbase_hwcnt_dump_buffer_block_instance( - buf, grp, blk, blk_inst); - const u64 *blk_map = kbase_hwcnt_enable_map_block_instance( - enable_map, grp, blk, blk_inst); + kbase_hwcnt_metadata_for_each_block(metadata, grp, blk, blk_inst) + { + const u64 grp_type = kbase_hwcnt_metadata_group_type(metadata, grp); + u64 *buf_blk = kbase_hwcnt_dump_buffer_block_instance(buf, grp, blk, blk_inst); + const u64 *blk_map = + kbase_hwcnt_enable_map_block_instance(enable_map, grp, blk, blk_inst); - if ((enum kbase_hwcnt_gpu_group_type)grp_type == - KBASE_HWCNT_GPU_GROUP_TYPE_V5) { + if ((enum kbase_hwcnt_gpu_group_type)grp_type == KBASE_HWCNT_GPU_GROUP_TYPE_V5) { const size_t map_stride = kbase_hwcnt_metadata_block_enable_map_stride(metadata, grp, blk); u64 prfcnt_bm[EM_COUNT] = { 0 }; diff --git a/mali_kbase/mali_kbase_hwcnt_gpu.h b/mali_kbase/hwcnt/mali_kbase_hwcnt_gpu.h index 648f85f..a49c31e 100644 --- a/mali_kbase/mali_kbase_hwcnt_gpu.h +++ b/mali_kbase/hwcnt/mali_kbase_hwcnt_gpu.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2018, 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2018, 2020-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -22,6 +22,7 @@ #ifndef _KBASE_HWCNT_GPU_H_ #define _KBASE_HWCNT_GPU_H_ +#include <linux/bug.h> #include <linux/types.h> struct kbase_device; @@ -33,9 +34,8 @@ struct kbase_hwcnt_dump_buffer; #define KBASE_HWCNT_V5_BLOCK_TYPE_COUNT 4 #define KBASE_HWCNT_V5_HEADERS_PER_BLOCK 4 #define KBASE_HWCNT_V5_DEFAULT_COUNTERS_PER_BLOCK 60 -#define KBASE_HWCNT_V5_DEFAULT_VALUES_PER_BLOCK \ - (KBASE_HWCNT_V5_HEADERS_PER_BLOCK + \ - KBASE_HWCNT_V5_DEFAULT_COUNTERS_PER_BLOCK) +#define KBASE_HWCNT_V5_DEFAULT_VALUES_PER_BLOCK \ + (KBASE_HWCNT_V5_HEADERS_PER_BLOCK + KBASE_HWCNT_V5_DEFAULT_COUNTERS_PER_BLOCK) /* FrontEnd block count in V5 GPU hardware counter. */ #define KBASE_HWCNT_V5_FE_BLOCK_COUNT 1 @@ -60,33 +60,40 @@ enum kbase_hwcnt_gpu_group_type { /** * enum kbase_hwcnt_gpu_v5_block_type - GPU V5 hardware counter block types, * used to identify metadata blocks. - * @KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_UNDEFINED: Undefined block (e.g. if a - * counter set that a block - * doesn't support is used). * @KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_FE: Front End block (Job manager * or CSF HW). * @KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_FE2: Secondary Front End block (Job * manager or CSF HW). * @KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_FE3: Tertiary Front End block (Job * manager or CSF HW). 
+ * @KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_FE_UNDEFINED: Undefined Front End block + * (e.g. if a counter set that + * a block doesn't support is + * used). * @KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_TILER: Tiler block. + * @KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_TILER_UNDEFINED: Undefined Tiler block. * @KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_SC: Shader Core block. * @KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_SC2: Secondary Shader Core block. * @KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_SC3: Tertiary Shader Core block. + * @KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_SC_UNDEFINED: Undefined Shader Core block. * @KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_MEMSYS: Memsys block. * @KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_MEMSYS2: Secondary Memsys block. + * @KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_MEMSYS_UNDEFINED: Undefined Memsys block. */ enum kbase_hwcnt_gpu_v5_block_type { - KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_UNDEFINED, KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_FE, KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_FE2, KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_FE3, + KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_FE_UNDEFINED, KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_TILER, + KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_TILER_UNDEFINED, KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_SC, KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_SC2, KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_SC3, + KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_SC_UNDEFINED, KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_MEMSYS, KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_MEMSYS2, + KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_MEMSYS_UNDEFINED, }; /** @@ -188,6 +195,27 @@ struct kbase_hwcnt_curr_config { }; /** + * kbase_hwcnt_is_block_type_undefined() - Check if a block type is undefined. + * + * @grp_type: Hardware counter group type. + * @blk_type: Hardware counter block type. + * + * Return: true if the block type is undefined, else false. + */ +static inline bool kbase_hwcnt_is_block_type_undefined(const uint64_t grp_type, + const uint64_t blk_type) +{ + /* Warn on unknown group type */ + if (WARN_ON(grp_type != KBASE_HWCNT_GPU_GROUP_TYPE_V5)) + return false; + + return (blk_type == KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_FE_UNDEFINED || + blk_type == KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_TILER_UNDEFINED || + blk_type == KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_SC_UNDEFINED || + blk_type == KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_MEMSYS_UNDEFINED); +} + +/** * kbase_hwcnt_jm_metadata_create() - Create hardware counter metadata for the * JM GPUs. * @info: Non-NULL pointer to info struct. @@ -199,19 +227,17 @@ struct kbase_hwcnt_curr_config { * * Return: 0 on success, else error code. */ -int kbase_hwcnt_jm_metadata_create( - const struct kbase_hwcnt_gpu_info *info, - enum kbase_hwcnt_set counter_set, - const struct kbase_hwcnt_metadata **out_metadata, - size_t *out_dump_bytes); +int kbase_hwcnt_jm_metadata_create(const struct kbase_hwcnt_gpu_info *info, + enum kbase_hwcnt_set counter_set, + const struct kbase_hwcnt_metadata **out_metadata, + size_t *out_dump_bytes); /** * kbase_hwcnt_jm_metadata_destroy() - Destroy JM GPU hardware counter metadata. * * @metadata: Pointer to metadata to destroy. */ -void kbase_hwcnt_jm_metadata_destroy( - const struct kbase_hwcnt_metadata *metadata); +void kbase_hwcnt_jm_metadata_destroy(const struct kbase_hwcnt_metadata *metadata); /** * kbase_hwcnt_csf_metadata_create() - Create hardware counter metadata for the @@ -223,18 +249,16 @@ void kbase_hwcnt_jm_metadata_destroy( * * Return: 0 on success, else error code. 
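The hunk above replaces the single shared PERF_UNDEFINED entry with one undefined variant per block class and adds kbase_hwcnt_is_block_type_undefined() so callers can test for any of them with a single predicate. The snippet below is a standalone userspace sketch of that check; the reduced enum and the sample dump loop are illustrative only, not driver code.

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

enum blk_type {
	BLK_FE, BLK_FE_UNDEFINED,
	BLK_TILER, BLK_TILER_UNDEFINED,
	BLK_SC, BLK_SC_UNDEFINED,
	BLK_MEMSYS, BLK_MEMSYS_UNDEFINED,
};

/* Same shape as kbase_hwcnt_is_block_type_undefined(), minus the group-type
 * WARN_ON, and using the reduced enum above. */
static bool blk_type_is_undefined(enum blk_type t)
{
	return t == BLK_FE_UNDEFINED || t == BLK_TILER_UNDEFINED ||
	       t == BLK_SC_UNDEFINED || t == BLK_MEMSYS_UNDEFINED;
}

int main(void)
{
	/* A block whose chosen counter set is unsupported shows up as the
	 * matching *_UNDEFINED type; consumers simply skip it. */
	const enum blk_type blocks[] = { BLK_FE, BLK_TILER, BLK_SC_UNDEFINED, BLK_MEMSYS };

	for (size_t i = 0; i < sizeof(blocks) / sizeof(blocks[0]); i++)
		printf("block %zu: %s\n", i,
		       blk_type_is_undefined(blocks[i]) ? "skip" : "dump");
	return 0;
}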
*/ -int kbase_hwcnt_csf_metadata_create( - const struct kbase_hwcnt_gpu_info *info, - enum kbase_hwcnt_set counter_set, - const struct kbase_hwcnt_metadata **out_metadata); +int kbase_hwcnt_csf_metadata_create(const struct kbase_hwcnt_gpu_info *info, + enum kbase_hwcnt_set counter_set, + const struct kbase_hwcnt_metadata **out_metadata); /** * kbase_hwcnt_csf_metadata_destroy() - Destroy CSF GPU hardware counter * metadata. * @metadata: Pointer to metadata to destroy. */ -void kbase_hwcnt_csf_metadata_destroy( - const struct kbase_hwcnt_metadata *metadata); +void kbase_hwcnt_csf_metadata_destroy(const struct kbase_hwcnt_metadata *metadata); /** * kbase_hwcnt_jm_dump_get() - Copy or accumulate enabled counters from the raw @@ -260,8 +284,7 @@ void kbase_hwcnt_csf_metadata_destroy( int kbase_hwcnt_jm_dump_get(struct kbase_hwcnt_dump_buffer *dst, u64 *src, const struct kbase_hwcnt_enable_map *dst_enable_map, const u64 pm_core_mask, - const struct kbase_hwcnt_curr_config *curr_config, - bool accumulate); + const struct kbase_hwcnt_curr_config *curr_config, bool accumulate); /** * kbase_hwcnt_csf_dump_get() - Copy or accumulate enabled counters from the raw @@ -281,8 +304,7 @@ int kbase_hwcnt_jm_dump_get(struct kbase_hwcnt_dump_buffer *dst, u64 *src, * Return: 0 on success, else error code. */ int kbase_hwcnt_csf_dump_get(struct kbase_hwcnt_dump_buffer *dst, u64 *src, - const struct kbase_hwcnt_enable_map *dst_enable_map, - bool accumulate); + const struct kbase_hwcnt_enable_map *dst_enable_map, bool accumulate); /** * kbase_hwcnt_backend_gpu_block_map_to_physical() - Convert from a block @@ -336,9 +358,8 @@ static inline u32 kbase_hwcnt_backend_gpu_block_map_to_physical(u64 lo, u64 hi) * individual counter block value, but the physical enable map uses 1 bit for * every 4 counters, shared over all instances of a block. */ -void kbase_hwcnt_gpu_enable_map_to_physical( - struct kbase_hwcnt_physical_enable_map *dst, - const struct kbase_hwcnt_enable_map *src); +void kbase_hwcnt_gpu_enable_map_to_physical(struct kbase_hwcnt_physical_enable_map *dst, + const struct kbase_hwcnt_enable_map *src); /** * kbase_hwcnt_gpu_set_to_physical() - Map counter set selection to physical @@ -347,8 +368,7 @@ void kbase_hwcnt_gpu_enable_map_to_physical( * @dst: Non-NULL pointer to destination physical SET_SELECT value. * @src: Non-NULL pointer to source counter set selection. */ -void kbase_hwcnt_gpu_set_to_physical(enum kbase_hwcnt_physical_set *dst, - enum kbase_hwcnt_set src); +void kbase_hwcnt_gpu_set_to_physical(enum kbase_hwcnt_physical_set *dst, enum kbase_hwcnt_set src); /** * kbase_hwcnt_gpu_enable_map_from_physical() - Convert a physical enable map to @@ -364,9 +384,8 @@ void kbase_hwcnt_gpu_set_to_physical(enum kbase_hwcnt_physical_set *dst, * more than 64, so the enable map abstraction has nowhere to store the enable * information for the 64 non-existent counters. */ -void kbase_hwcnt_gpu_enable_map_from_physical( - struct kbase_hwcnt_enable_map *dst, - const struct kbase_hwcnt_physical_enable_map *src); +void kbase_hwcnt_gpu_enable_map_from_physical(struct kbase_hwcnt_enable_map *dst, + const struct kbase_hwcnt_physical_enable_map *src); /** * kbase_hwcnt_gpu_patch_dump_headers() - Patch all the performance counter @@ -382,8 +401,7 @@ void kbase_hwcnt_gpu_enable_map_from_physical( * kernel-user boundary, to ensure the header is accurate for the enable map * used by the user. 
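The comments above describe the two directions of the enable-map conversion: the enable map keeps one bit per counter value, while the physical map shares one bit across every four counters of a block (and across all instances of that block), which is also why converting back from a physical map is coarse. The sketch below models that folding in plain userspace C; the OR-of-four rule and the 128-value block size are assumptions for illustration, not a copy of kbase_hwcnt_backend_gpu_block_map_to_physical().

#include <assert.h>
#include <stdint.h>

static uint32_t block_map_to_physical(uint64_t lo, uint64_t hi)
{
	uint32_t phys = 0;

	for (unsigned int group = 0; group < 32; group++) {
		uint64_t word = (group < 16) ? lo : hi;
		unsigned int shift = (group % 16) * 4;

		if ((word >> shift) & 0xF)	/* any of the 4 counters enabled */
			phys |= 1u << group;
	}
	return phys;
}

static void block_map_from_physical(uint32_t phys, uint64_t *lo, uint64_t *hi)
{
	*lo = 0;
	*hi = 0;
	for (unsigned int group = 0; group < 32; group++) {
		if (!(phys & (1u << group)))
			continue;
		/* Expanding is coarse: all 4 counters in the group come back. */
		if (group < 16)
			*lo |= 0xFull << ((group % 16) * 4);
		else
			*hi |= 0xFull << ((group % 16) * 4);
	}
}

int main(void)
{
	uint64_t lo, hi;
	uint32_t phys = block_map_to_physical(0x1, 0);	/* only counter 0 enabled */

	assert(phys == 0x1);
	block_map_from_physical(phys, &lo, &hi);
	assert(lo == 0xF && hi == 0);	/* counters 0-3 return: round trip is lossy */
	return 0;
}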
*/ -void kbase_hwcnt_gpu_patch_dump_headers( - struct kbase_hwcnt_dump_buffer *buf, - const struct kbase_hwcnt_enable_map *enable_map); +void kbase_hwcnt_gpu_patch_dump_headers(struct kbase_hwcnt_dump_buffer *buf, + const struct kbase_hwcnt_enable_map *enable_map); #endif /* _KBASE_HWCNT_GPU_H_ */ diff --git a/mali_kbase/mali_kbase_hwcnt_gpu_narrow.c b/mali_kbase/hwcnt/mali_kbase_hwcnt_gpu_narrow.c index e2caa1c..0cf2f94 100644 --- a/mali_kbase/mali_kbase_hwcnt_gpu_narrow.c +++ b/mali_kbase/hwcnt/mali_kbase_hwcnt_gpu_narrow.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2021-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -19,21 +19,19 @@ * */ -#include "mali_kbase_hwcnt_gpu.h" -#include "mali_kbase_hwcnt_gpu_narrow.h" +#include "hwcnt/mali_kbase_hwcnt_gpu.h" +#include "hwcnt/mali_kbase_hwcnt_gpu_narrow.h" #include <linux/bug.h> #include <linux/err.h> #include <linux/slab.h> -int kbase_hwcnt_gpu_metadata_narrow_create( - const struct kbase_hwcnt_metadata_narrow **dst_md_narrow, - const struct kbase_hwcnt_metadata *src_md) +int kbase_hwcnt_gpu_metadata_narrow_create(const struct kbase_hwcnt_metadata_narrow **dst_md_narrow, + const struct kbase_hwcnt_metadata *src_md) { struct kbase_hwcnt_description desc; struct kbase_hwcnt_group_description group; - struct kbase_hwcnt_block_description - blks[KBASE_HWCNT_V5_BLOCK_TYPE_COUNT]; + struct kbase_hwcnt_block_description blks[KBASE_HWCNT_V5_BLOCK_TYPE_COUNT]; size_t prfcnt_values_per_block; size_t blk; int err; @@ -47,18 +45,15 @@ int kbase_hwcnt_gpu_metadata_narrow_create( * count in the metadata. */ if ((kbase_hwcnt_metadata_group_count(src_md) != 1) || - (kbase_hwcnt_metadata_block_count(src_md, 0) != - KBASE_HWCNT_V5_BLOCK_TYPE_COUNT)) + (kbase_hwcnt_metadata_block_count(src_md, 0) != KBASE_HWCNT_V5_BLOCK_TYPE_COUNT)) return -EINVAL; /* Get the values count in the first block. */ - prfcnt_values_per_block = - kbase_hwcnt_metadata_block_values_count(src_md, 0, 0); + prfcnt_values_per_block = kbase_hwcnt_metadata_block_values_count(src_md, 0, 0); /* check all blocks should have same values count. */ for (blk = 1; blk < KBASE_HWCNT_V5_BLOCK_TYPE_COUNT; blk++) { - size_t val_cnt = - kbase_hwcnt_metadata_block_values_count(src_md, 0, blk); + size_t val_cnt = kbase_hwcnt_metadata_block_values_count(src_md, 0, blk); if (val_cnt != prfcnt_values_per_block) return -EINVAL; } @@ -75,12 +70,10 @@ int kbase_hwcnt_gpu_metadata_narrow_create( prfcnt_values_per_block = 64; for (blk = 0; blk < KBASE_HWCNT_V5_BLOCK_TYPE_COUNT; blk++) { - size_t blk_hdr_cnt = kbase_hwcnt_metadata_block_headers_count( - src_md, 0, blk); + size_t blk_hdr_cnt = kbase_hwcnt_metadata_block_headers_count(src_md, 0, blk); blks[blk] = (struct kbase_hwcnt_block_description){ .type = kbase_hwcnt_metadata_block_type(src_md, 0, blk), - .inst_cnt = kbase_hwcnt_metadata_block_instance_count( - src_md, 0, blk), + .inst_cnt = kbase_hwcnt_metadata_block_instance_count(src_md, 0, blk), .hdr_cnt = blk_hdr_cnt, .ctr_cnt = prfcnt_values_per_block - blk_hdr_cnt, }; @@ -105,8 +98,7 @@ int kbase_hwcnt_gpu_metadata_narrow_create( * only supports 32-bit but the created metadata uses 64-bit for * block entry. 
*/ - metadata_narrow->dump_buf_bytes = - metadata_narrow->metadata->dump_buf_bytes >> 1; + metadata_narrow->dump_buf_bytes = metadata_narrow->metadata->dump_buf_bytes >> 1; *dst_md_narrow = metadata_narrow; } else { kfree(metadata_narrow); @@ -115,8 +107,7 @@ int kbase_hwcnt_gpu_metadata_narrow_create( return err; } -void kbase_hwcnt_gpu_metadata_narrow_destroy( - const struct kbase_hwcnt_metadata_narrow *md_narrow) +void kbase_hwcnt_gpu_metadata_narrow_destroy(const struct kbase_hwcnt_metadata_narrow *md_narrow) { if (!md_narrow) return; @@ -125,9 +116,8 @@ void kbase_hwcnt_gpu_metadata_narrow_destroy( kfree(md_narrow); } -int kbase_hwcnt_dump_buffer_narrow_alloc( - const struct kbase_hwcnt_metadata_narrow *md_narrow, - struct kbase_hwcnt_dump_buffer_narrow *dump_buf) +int kbase_hwcnt_dump_buffer_narrow_alloc(const struct kbase_hwcnt_metadata_narrow *md_narrow, + struct kbase_hwcnt_dump_buffer_narrow *dump_buf) { size_t dump_buf_bytes; size_t clk_cnt_buf_bytes; @@ -137,8 +127,7 @@ int kbase_hwcnt_dump_buffer_narrow_alloc( return -EINVAL; dump_buf_bytes = md_narrow->dump_buf_bytes; - clk_cnt_buf_bytes = - sizeof(*dump_buf->clk_cnt_buf) * md_narrow->metadata->clk_cnt; + clk_cnt_buf_bytes = sizeof(*dump_buf->clk_cnt_buf) * md_narrow->metadata->clk_cnt; /* Make a single allocation for both dump_buf and clk_cnt_buf. */ buf = kmalloc(dump_buf_bytes + clk_cnt_buf_bytes, GFP_KERNEL); @@ -154,14 +143,15 @@ int kbase_hwcnt_dump_buffer_narrow_alloc( return 0; } -void kbase_hwcnt_dump_buffer_narrow_free( - struct kbase_hwcnt_dump_buffer_narrow *dump_buf_narrow) +void kbase_hwcnt_dump_buffer_narrow_free(struct kbase_hwcnt_dump_buffer_narrow *dump_buf_narrow) { if (!dump_buf_narrow) return; kfree(dump_buf_narrow->dump_buf); - *dump_buf_narrow = (struct kbase_hwcnt_dump_buffer_narrow){ 0 }; + *dump_buf_narrow = (struct kbase_hwcnt_dump_buffer_narrow){ .md_narrow = NULL, + .dump_buf = NULL, + .clk_cnt_buf = NULL }; } int kbase_hwcnt_dump_buffer_narrow_array_alloc( @@ -180,8 +170,7 @@ int kbase_hwcnt_dump_buffer_narrow_array_alloc( return -EINVAL; dump_buf_bytes = md_narrow->dump_buf_bytes; - clk_cnt_buf_bytes = sizeof(*dump_bufs->bufs->clk_cnt_buf) * - md_narrow->metadata->clk_cnt; + clk_cnt_buf_bytes = sizeof(*dump_bufs->bufs->clk_cnt_buf) * md_narrow->metadata->clk_cnt; /* Allocate memory for the dump buffer struct array */ buffers = kmalloc_array(n, sizeof(*buffers), GFP_KERNEL); @@ -234,27 +223,22 @@ void kbase_hwcnt_dump_buffer_narrow_array_free( memset(dump_bufs, 0, sizeof(*dump_bufs)); } -void kbase_hwcnt_dump_buffer_block_copy_strict_narrow(u32 *dst_blk, - const u64 *src_blk, - const u64 *blk_em, - size_t val_cnt) +void kbase_hwcnt_dump_buffer_block_copy_strict_narrow(u32 *dst_blk, const u64 *src_blk, + const u64 *blk_em, size_t val_cnt) { size_t val; for (val = 0; val < val_cnt; val++) { - bool val_enabled = - kbase_hwcnt_enable_map_block_value_enabled(blk_em, val); - u32 src_val = - (src_blk[val] > U32_MAX) ? U32_MAX : (u32)src_blk[val]; + bool val_enabled = kbase_hwcnt_enable_map_block_value_enabled(blk_em, val); + u32 src_val = (src_blk[val] > U32_MAX) ? U32_MAX : (u32)src_blk[val]; dst_blk[val] = val_enabled ? 
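kbase_hwcnt_dump_buffer_block_copy_strict_narrow() above makes the narrowing rule explicit: 64-bit values above U32_MAX are clamped rather than truncated on the way into the 32-bit user buffer, and non-enabled values are written as zero. A minimal sketch of the same rule, with the enable map simplified to a single bitmask:

#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static void copy_strict_narrow(uint32_t *dst, const uint64_t *src,
			       uint64_t enable_bits, size_t val_cnt)
{
	for (size_t val = 0; val < val_cnt; val++) {
		int enabled = (enable_bits >> val) & 1;
		uint32_t narrowed = (src[val] > UINT32_MAX) ? UINT32_MAX : (uint32_t)src[val];

		dst[val] = enabled ? narrowed : 0;
	}
}

int main(void)
{
	const uint64_t src[3] = { 42, 0x1FFFFFFFFULL, 7 };
	uint32_t dst[3];

	copy_strict_narrow(dst, src, 0x3 /* only values 0 and 1 enabled */, 3);
	assert(dst[0] == 42);
	assert(dst[1] == UINT32_MAX);	/* saturated, not truncated */
	assert(dst[2] == 0);		/* disabled value is zeroed */
	return 0;
}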
src_val : 0; } } -void kbase_hwcnt_dump_buffer_copy_strict_narrow( - struct kbase_hwcnt_dump_buffer_narrow *dst_narrow, - const struct kbase_hwcnt_dump_buffer *src, - const struct kbase_hwcnt_enable_map *dst_enable_map) +void kbase_hwcnt_dump_buffer_copy_strict_narrow(struct kbase_hwcnt_dump_buffer_narrow *dst_narrow, + const struct kbase_hwcnt_dump_buffer *src, + const struct kbase_hwcnt_enable_map *dst_enable_map) { const struct kbase_hwcnt_metadata_narrow *metadata_narrow; size_t grp; @@ -262,68 +246,53 @@ void kbase_hwcnt_dump_buffer_copy_strict_narrow( if (WARN_ON(!dst_narrow) || WARN_ON(!src) || WARN_ON(!dst_enable_map) || WARN_ON(dst_narrow->md_narrow->metadata == src->metadata) || - WARN_ON(dst_narrow->md_narrow->metadata->grp_cnt != - src->metadata->grp_cnt) || + WARN_ON(dst_narrow->md_narrow->metadata->grp_cnt != src->metadata->grp_cnt) || WARN_ON(src->metadata->grp_cnt != 1) || WARN_ON(dst_narrow->md_narrow->metadata->grp_metadata[0].blk_cnt != src->metadata->grp_metadata[0].blk_cnt) || WARN_ON(dst_narrow->md_narrow->metadata->grp_metadata[0].blk_cnt != KBASE_HWCNT_V5_BLOCK_TYPE_COUNT) || - WARN_ON(dst_narrow->md_narrow->metadata->grp_metadata[0] - .blk_metadata[0] - .ctr_cnt > + WARN_ON(dst_narrow->md_narrow->metadata->grp_metadata[0].blk_metadata[0].ctr_cnt > src->metadata->grp_metadata[0].blk_metadata[0].ctr_cnt)) return; /* Don't use src metadata since src buffer is bigger than dst buffer. */ metadata_narrow = dst_narrow->md_narrow; - for (grp = 0; - grp < kbase_hwcnt_metadata_narrow_group_count(metadata_narrow); - grp++) { + for (grp = 0; grp < kbase_hwcnt_metadata_narrow_group_count(metadata_narrow); grp++) { size_t blk; - size_t blk_cnt = kbase_hwcnt_metadata_narrow_block_count( - metadata_narrow, grp); + size_t blk_cnt = kbase_hwcnt_metadata_narrow_block_count(metadata_narrow, grp); for (blk = 0; blk < blk_cnt; blk++) { size_t blk_inst; - size_t blk_inst_cnt = - kbase_hwcnt_metadata_narrow_block_instance_count( - metadata_narrow, grp, blk); + size_t blk_inst_cnt = kbase_hwcnt_metadata_narrow_block_instance_count( + metadata_narrow, grp, blk); - for (blk_inst = 0; blk_inst < blk_inst_cnt; - blk_inst++) { + for (blk_inst = 0; blk_inst < blk_inst_cnt; blk_inst++) { /* The narrowed down buffer is only 32-bit. 
*/ - u32 *dst_blk = - kbase_hwcnt_dump_buffer_narrow_block_instance( - dst_narrow, grp, blk, blk_inst); - const u64 *src_blk = - kbase_hwcnt_dump_buffer_block_instance( - src, grp, blk, blk_inst); - const u64 *blk_em = - kbase_hwcnt_enable_map_block_instance( - dst_enable_map, grp, blk, - blk_inst); - size_t val_cnt = - kbase_hwcnt_metadata_narrow_block_values_count( - metadata_narrow, grp, blk); + u32 *dst_blk = kbase_hwcnt_dump_buffer_narrow_block_instance( + dst_narrow, grp, blk, blk_inst); + const u64 *src_blk = kbase_hwcnt_dump_buffer_block_instance( + src, grp, blk, blk_inst); + const u64 *blk_em = kbase_hwcnt_enable_map_block_instance( + dst_enable_map, grp, blk, blk_inst); + size_t val_cnt = kbase_hwcnt_metadata_narrow_block_values_count( + metadata_narrow, grp, blk); /* Align upwards to include padding bytes */ val_cnt = KBASE_HWCNT_ALIGN_UPWARDS( - val_cnt, - (KBASE_HWCNT_BLOCK_BYTE_ALIGNMENT / - KBASE_HWCNT_VALUE_BYTES)); + val_cnt, (KBASE_HWCNT_BLOCK_BYTE_ALIGNMENT / + KBASE_HWCNT_VALUE_BYTES)); - kbase_hwcnt_dump_buffer_block_copy_strict_narrow( - dst_blk, src_blk, blk_em, val_cnt); + kbase_hwcnt_dump_buffer_block_copy_strict_narrow(dst_blk, src_blk, + blk_em, val_cnt); } } } for (clk = 0; clk < metadata_narrow->metadata->clk_cnt; clk++) { - bool clk_enabled = kbase_hwcnt_clk_enable_map_enabled( - dst_enable_map->clk_enable_map, clk); + bool clk_enabled = + kbase_hwcnt_clk_enable_map_enabled(dst_enable_map->clk_enable_map, clk); - dst_narrow->clk_cnt_buf[clk] = - clk_enabled ? src->clk_cnt_buf[clk] : 0; + dst_narrow->clk_cnt_buf[clk] = clk_enabled ? src->clk_cnt_buf[clk] : 0; } } diff --git a/mali_kbase/mali_kbase_hwcnt_gpu_narrow.h b/mali_kbase/hwcnt/mali_kbase_hwcnt_gpu_narrow.h index af6fa19..afd236d 100644 --- a/mali_kbase/mali_kbase_hwcnt_gpu_narrow.h +++ b/mali_kbase/hwcnt/mali_kbase_hwcnt_gpu_narrow.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2021-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -22,7 +22,7 @@ #ifndef _KBASE_HWCNT_GPU_NARROW_H_ #define _KBASE_HWCNT_GPU_NARROW_H_ -#include "mali_kbase_hwcnt_types.h" +#include "hwcnt/mali_kbase_hwcnt_types.h" #include <linux/types.h> struct kbase_device; @@ -86,8 +86,8 @@ struct kbase_hwcnt_dump_buffer_narrow_array { * * Return: Number of hardware counter groups described by narrow metadata. */ -static inline size_t kbase_hwcnt_metadata_narrow_group_count( - const struct kbase_hwcnt_metadata_narrow *md_narrow) +static inline size_t +kbase_hwcnt_metadata_narrow_group_count(const struct kbase_hwcnt_metadata_narrow *md_narrow) { return kbase_hwcnt_metadata_group_count(md_narrow->metadata); } @@ -100,8 +100,9 @@ static inline size_t kbase_hwcnt_metadata_narrow_group_count( * * Return: Type of the group grp. */ -static inline u64 kbase_hwcnt_metadata_narrow_group_type( - const struct kbase_hwcnt_metadata_narrow *md_narrow, size_t grp) +static inline u64 +kbase_hwcnt_metadata_narrow_group_type(const struct kbase_hwcnt_metadata_narrow *md_narrow, + size_t grp) { return kbase_hwcnt_metadata_group_type(md_narrow->metadata, grp); } @@ -114,8 +115,9 @@ static inline u64 kbase_hwcnt_metadata_narrow_group_type( * * Return: Number of blocks in group grp. 
*/ -static inline size_t kbase_hwcnt_metadata_narrow_block_count( - const struct kbase_hwcnt_metadata_narrow *md_narrow, size_t grp) +static inline size_t +kbase_hwcnt_metadata_narrow_block_count(const struct kbase_hwcnt_metadata_narrow *md_narrow, + size_t grp) { return kbase_hwcnt_metadata_block_count(md_narrow->metadata, grp); } @@ -131,11 +133,9 @@ static inline size_t kbase_hwcnt_metadata_narrow_block_count( * Return: Number of instances of block blk in group grp. */ static inline size_t kbase_hwcnt_metadata_narrow_block_instance_count( - const struct kbase_hwcnt_metadata_narrow *md_narrow, size_t grp, - size_t blk) + const struct kbase_hwcnt_metadata_narrow *md_narrow, size_t grp, size_t blk) { - return kbase_hwcnt_metadata_block_instance_count(md_narrow->metadata, - grp, blk); + return kbase_hwcnt_metadata_block_instance_count(md_narrow->metadata, grp, blk); } /** @@ -148,12 +148,11 @@ static inline size_t kbase_hwcnt_metadata_narrow_block_instance_count( * * Return: Number of counter headers in each instance of block blk in group grp. */ -static inline size_t kbase_hwcnt_metadata_narrow_block_headers_count( - const struct kbase_hwcnt_metadata_narrow *md_narrow, size_t grp, - size_t blk) +static inline size_t +kbase_hwcnt_metadata_narrow_block_headers_count(const struct kbase_hwcnt_metadata_narrow *md_narrow, + size_t grp, size_t blk) { - return kbase_hwcnt_metadata_block_headers_count(md_narrow->metadata, - grp, blk); + return kbase_hwcnt_metadata_block_headers_count(md_narrow->metadata, grp, blk); } /** @@ -167,11 +166,9 @@ static inline size_t kbase_hwcnt_metadata_narrow_block_headers_count( * Return: Number of counters in each instance of block blk in group grp. */ static inline size_t kbase_hwcnt_metadata_narrow_block_counters_count( - const struct kbase_hwcnt_metadata_narrow *md_narrow, size_t grp, - size_t blk) + const struct kbase_hwcnt_metadata_narrow *md_narrow, size_t grp, size_t blk) { - return kbase_hwcnt_metadata_block_counters_count(md_narrow->metadata, - grp, blk); + return kbase_hwcnt_metadata_block_counters_count(md_narrow->metadata, grp, blk); } /** @@ -184,14 +181,12 @@ static inline size_t kbase_hwcnt_metadata_narrow_block_counters_count( * Return: Number of headers plus counters in each instance of block blk * in group grp. */ -static inline size_t kbase_hwcnt_metadata_narrow_block_values_count( - const struct kbase_hwcnt_metadata_narrow *md_narrow, size_t grp, - size_t blk) +static inline size_t +kbase_hwcnt_metadata_narrow_block_values_count(const struct kbase_hwcnt_metadata_narrow *md_narrow, + size_t grp, size_t blk) { - return kbase_hwcnt_metadata_narrow_block_counters_count(md_narrow, grp, - blk) + - kbase_hwcnt_metadata_narrow_block_headers_count(md_narrow, grp, - blk); + return kbase_hwcnt_metadata_narrow_block_counters_count(md_narrow, grp, blk) + + kbase_hwcnt_metadata_narrow_block_headers_count(md_narrow, grp, blk); } /** @@ -205,18 +200,13 @@ static inline size_t kbase_hwcnt_metadata_narrow_block_values_count( * * Return: u32* to the dump buffer for the block instance. 
*/ -static inline u32 *kbase_hwcnt_dump_buffer_narrow_block_instance( - const struct kbase_hwcnt_dump_buffer_narrow *buf, size_t grp, - size_t blk, size_t blk_inst) +static inline u32 * +kbase_hwcnt_dump_buffer_narrow_block_instance(const struct kbase_hwcnt_dump_buffer_narrow *buf, + size_t grp, size_t blk, size_t blk_inst) { - return buf->dump_buf + - buf->md_narrow->metadata->grp_metadata[grp].dump_buf_index + - buf->md_narrow->metadata->grp_metadata[grp] - .blk_metadata[blk] - .dump_buf_index + - (buf->md_narrow->metadata->grp_metadata[grp] - .blk_metadata[blk] - .dump_buf_stride * + return buf->dump_buf + buf->md_narrow->metadata->grp_metadata[grp].dump_buf_index + + buf->md_narrow->metadata->grp_metadata[grp].blk_metadata[blk].dump_buf_index + + (buf->md_narrow->metadata->grp_metadata[grp].blk_metadata[blk].dump_buf_stride * blk_inst); } @@ -239,17 +229,15 @@ static inline u32 *kbase_hwcnt_dump_buffer_narrow_block_instance( * * Return: 0 on success, else error code. */ -int kbase_hwcnt_gpu_metadata_narrow_create( - const struct kbase_hwcnt_metadata_narrow **dst_md_narrow, - const struct kbase_hwcnt_metadata *src_md); +int kbase_hwcnt_gpu_metadata_narrow_create(const struct kbase_hwcnt_metadata_narrow **dst_md_narrow, + const struct kbase_hwcnt_metadata *src_md); /** * kbase_hwcnt_gpu_metadata_narrow_destroy() - Destroy a hardware counter narrow * metadata object. * @md_narrow: Pointer to hardware counter narrow metadata. */ -void kbase_hwcnt_gpu_metadata_narrow_destroy( - const struct kbase_hwcnt_metadata_narrow *md_narrow); +void kbase_hwcnt_gpu_metadata_narrow_destroy(const struct kbase_hwcnt_metadata_narrow *md_narrow); /** * kbase_hwcnt_dump_buffer_narrow_alloc() - Allocate a narrow dump buffer. @@ -260,9 +248,8 @@ void kbase_hwcnt_gpu_metadata_narrow_destroy( * * Return: 0 on success, else error code. */ -int kbase_hwcnt_dump_buffer_narrow_alloc( - const struct kbase_hwcnt_metadata_narrow *md_narrow, - struct kbase_hwcnt_dump_buffer_narrow *dump_buf); +int kbase_hwcnt_dump_buffer_narrow_alloc(const struct kbase_hwcnt_metadata_narrow *md_narrow, + struct kbase_hwcnt_dump_buffer_narrow *dump_buf); /** * kbase_hwcnt_dump_buffer_narrow_free() - Free a narrow dump buffer. @@ -271,8 +258,7 @@ int kbase_hwcnt_dump_buffer_narrow_alloc( * Can be safely called on an all-zeroed narrow dump buffer structure, or on an * already freed narrow dump buffer. */ -void kbase_hwcnt_dump_buffer_narrow_free( - struct kbase_hwcnt_dump_buffer_narrow *dump_buf); +void kbase_hwcnt_dump_buffer_narrow_free(struct kbase_hwcnt_dump_buffer_narrow *dump_buf); /** * kbase_hwcnt_dump_buffer_narrow_array_alloc() - Allocate an array of narrow @@ -320,10 +306,8 @@ void kbase_hwcnt_dump_buffer_narrow_array_free( * source value is bigger than U32_MAX, or copy the value from source if the * corresponding source value is less than or equal to U32_MAX. */ -void kbase_hwcnt_dump_buffer_block_copy_strict_narrow(u32 *dst_blk, - const u64 *src_blk, - const u64 *blk_em, - size_t val_cnt); +void kbase_hwcnt_dump_buffer_block_copy_strict_narrow(u32 *dst_blk, const u64 *src_blk, + const u64 *blk_em, size_t val_cnt); /** * kbase_hwcnt_dump_buffer_copy_strict_narrow() - Copy all enabled values to a @@ -339,9 +323,8 @@ void kbase_hwcnt_dump_buffer_block_copy_strict_narrow(u32 *dst_blk, * corresponding source value is bigger than U32_MAX, or copy the value from * source if the corresponding source value is less than or equal to U32_MAX. 
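kbase_hwcnt_dump_buffer_narrow_block_instance() above is pure index arithmetic: group offset plus block offset plus stride times instance, into one flat buffer. The reduced model below shows the same addressing with made-up metadata structs; the real structures carry far more fields.

#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct blk_md { size_t dump_buf_index; size_t dump_buf_stride; };
struct grp_md { size_t dump_buf_index; struct blk_md blk[2]; };

static uint32_t *narrow_block_instance(uint32_t *dump_buf, const struct grp_md *grp,
				       size_t blk, size_t blk_inst)
{
	return dump_buf + grp->dump_buf_index +
	       grp->blk[blk].dump_buf_index +
	       grp->blk[blk].dump_buf_stride * blk_inst;
}

int main(void)
{
	uint32_t buf[256] = { 0 };
	/* One group at offset 0: block 0 has 2 instances of stride 64,
	 * block 1 starts right after them. */
	struct grp_md grp = {
		.dump_buf_index = 0,
		.blk = { { .dump_buf_index = 0, .dump_buf_stride = 64 },
			 { .dump_buf_index = 128, .dump_buf_stride = 64 } },
	};

	assert(narrow_block_instance(buf, &grp, 0, 1) == buf + 64);
	assert(narrow_block_instance(buf, &grp, 1, 0) == buf + 128);
	return 0;
}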
*/ -void kbase_hwcnt_dump_buffer_copy_strict_narrow( - struct kbase_hwcnt_dump_buffer_narrow *dst_narrow, - const struct kbase_hwcnt_dump_buffer *src, - const struct kbase_hwcnt_enable_map *dst_enable_map); +void kbase_hwcnt_dump_buffer_copy_strict_narrow(struct kbase_hwcnt_dump_buffer_narrow *dst_narrow, + const struct kbase_hwcnt_dump_buffer *src, + const struct kbase_hwcnt_enable_map *dst_enable_map); #endif /* _KBASE_HWCNT_GPU_NARROW_H_ */ diff --git a/mali_kbase/mali_kbase_hwcnt_types.c b/mali_kbase/hwcnt/mali_kbase_hwcnt_types.c index d925ed7..763eb31 100644 --- a/mali_kbase/mali_kbase_hwcnt_types.c +++ b/mali_kbase/hwcnt/mali_kbase_hwcnt_types.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2018, 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2018, 2020-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -19,13 +19,12 @@ * */ -#include "mali_kbase_hwcnt_types.h" +#include "hwcnt/mali_kbase_hwcnt_types.h" #include <linux/slab.h> -int kbase_hwcnt_metadata_create( - const struct kbase_hwcnt_description *desc, - const struct kbase_hwcnt_metadata **out_metadata) +int kbase_hwcnt_metadata_create(const struct kbase_hwcnt_description *desc, + const struct kbase_hwcnt_metadata **out_metadata) { char *buf; struct kbase_hwcnt_metadata *metadata; @@ -56,8 +55,7 @@ int kbase_hwcnt_metadata_create( /* Block metadata */ for (grp = 0; grp < desc->grp_cnt; grp++) { - size += sizeof(struct kbase_hwcnt_block_metadata) * - desc->grps[grp].blk_cnt; + size += sizeof(struct kbase_hwcnt_block_metadata) * desc->grps[grp].blk_cnt; } /* Single allocation for the entire metadata */ @@ -83,8 +81,7 @@ int kbase_hwcnt_metadata_create( for (grp = 0; grp < desc->grp_cnt; grp++) { size_t blk; - const struct kbase_hwcnt_group_description *grp_desc = - desc->grps + grp; + const struct kbase_hwcnt_group_description *grp_desc = desc->grps + grp; struct kbase_hwcnt_group_metadata *grp_md = grp_mds + grp; size_t group_enable_map_count = 0; @@ -94,37 +91,28 @@ int kbase_hwcnt_metadata_create( /* Bump allocate this group's block metadata */ struct kbase_hwcnt_block_metadata *blk_mds = (struct kbase_hwcnt_block_metadata *)(buf + offset); - offset += sizeof(struct kbase_hwcnt_block_metadata) * - grp_desc->blk_cnt; + offset += sizeof(struct kbase_hwcnt_block_metadata) * grp_desc->blk_cnt; /* Fill in each block in the group's information */ for (blk = 0; blk < grp_desc->blk_cnt; blk++) { - const struct kbase_hwcnt_block_description *blk_desc = - grp_desc->blks + blk; - struct kbase_hwcnt_block_metadata *blk_md = - blk_mds + blk; - const size_t n_values = - blk_desc->hdr_cnt + blk_desc->ctr_cnt; + const struct kbase_hwcnt_block_description *blk_desc = grp_desc->blks + blk; + struct kbase_hwcnt_block_metadata *blk_md = blk_mds + blk; + const size_t n_values = blk_desc->hdr_cnt + blk_desc->ctr_cnt; blk_md->type = blk_desc->type; blk_md->inst_cnt = blk_desc->inst_cnt; blk_md->hdr_cnt = blk_desc->hdr_cnt; blk_md->ctr_cnt = blk_desc->ctr_cnt; blk_md->enable_map_index = group_enable_map_count; - blk_md->enable_map_stride = - kbase_hwcnt_bitfield_count(n_values); + blk_md->enable_map_stride = kbase_hwcnt_bitfield_count(n_values); blk_md->dump_buf_index = group_dump_buffer_count; - blk_md->dump_buf_stride = - KBASE_HWCNT_ALIGN_UPWARDS( - n_values, - (KBASE_HWCNT_BLOCK_BYTE_ALIGNMENT / - KBASE_HWCNT_VALUE_BYTES)); + 
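kbase_hwcnt_metadata_create() above sizes the whole metadata hierarchy up front, makes a single allocation, and then bump-allocates each group's and block's metadata out of it by advancing an offset, so the destroy path is a single kfree(). The sketch below reproduces that pattern with placeholder structs; names and counts are illustrative only.

#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct block { size_t hdr_cnt, ctr_cnt; };
struct group { struct block *blocks; size_t blk_cnt; };
struct top { struct group *groups; size_t grp_cnt; };

static struct top *metadata_create(size_t grp_cnt, size_t blk_cnt)
{
	size_t size = sizeof(struct top) +
		      sizeof(struct group) * grp_cnt +
		      sizeof(struct block) * grp_cnt * blk_cnt;
	char *buf = malloc(size);
	size_t offset = 0;
	struct top *top;

	if (!buf)
		return NULL;
	memset(buf, 0, size);

	top = (struct top *)(buf + offset);
	offset += sizeof(struct top);

	top->grp_cnt = grp_cnt;
	top->groups = (struct group *)(buf + offset);
	offset += sizeof(struct group) * grp_cnt;

	for (size_t grp = 0; grp < grp_cnt; grp++) {
		/* Bump allocate this group's block metadata. */
		top->groups[grp].blk_cnt = blk_cnt;
		top->groups[grp].blocks = (struct block *)(buf + offset);
		offset += sizeof(struct block) * blk_cnt;
	}
	assert(offset == size);	/* everything fits in the one allocation */
	return top;
}

int main(void)
{
	struct top *md = metadata_create(1, 4);

	assert(md && md->groups[0].blk_cnt == 4);
	free(md);	/* one free releases the whole hierarchy */
	return 0;
}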
blk_md->dump_buf_stride = KBASE_HWCNT_ALIGN_UPWARDS( + n_values, + (KBASE_HWCNT_BLOCK_BYTE_ALIGNMENT / KBASE_HWCNT_VALUE_BYTES)); blk_md->avail_mask_index = group_avail_mask_bits; - group_enable_map_count += - blk_md->enable_map_stride * blk_md->inst_cnt; - group_dump_buffer_count += - blk_md->dump_buf_stride * blk_md->inst_cnt; + group_enable_map_count += blk_md->enable_map_stride * blk_md->inst_cnt; + group_dump_buffer_count += blk_md->dump_buf_stride * blk_md->inst_cnt; group_avail_mask_bits += blk_md->inst_cnt; } @@ -144,8 +132,7 @@ int kbase_hwcnt_metadata_create( /* Fill in the top level metadata's information */ metadata->grp_cnt = desc->grp_cnt; metadata->grp_metadata = grp_mds; - metadata->enable_map_bytes = - enable_map_count * KBASE_HWCNT_BITFIELD_BYTES; + metadata->enable_map_bytes = enable_map_count * KBASE_HWCNT_BITFIELD_BYTES; metadata->dump_buf_bytes = dump_buf_count * KBASE_HWCNT_VALUE_BYTES; metadata->avail_mask = desc->avail_mask; metadata->clk_cnt = desc->clk_cnt; @@ -155,8 +142,7 @@ int kbase_hwcnt_metadata_create( * bit per 4 bytes in the dump buffer. */ WARN_ON(metadata->dump_buf_bytes != - (metadata->enable_map_bytes * - BITS_PER_BYTE * KBASE_HWCNT_VALUE_BYTES)); + (metadata->enable_map_bytes * BITS_PER_BYTE * KBASE_HWCNT_VALUE_BYTES)); *out_metadata = metadata; return 0; @@ -167,9 +153,8 @@ void kbase_hwcnt_metadata_destroy(const struct kbase_hwcnt_metadata *metadata) kfree(metadata); } -int kbase_hwcnt_enable_map_alloc( - const struct kbase_hwcnt_metadata *metadata, - struct kbase_hwcnt_enable_map *enable_map) +int kbase_hwcnt_enable_map_alloc(const struct kbase_hwcnt_metadata *metadata, + struct kbase_hwcnt_enable_map *enable_map) { u64 *enable_map_buf; @@ -177,8 +162,7 @@ int kbase_hwcnt_enable_map_alloc( return -EINVAL; if (metadata->enable_map_bytes > 0) { - enable_map_buf = - kzalloc(metadata->enable_map_bytes, GFP_KERNEL); + enable_map_buf = kzalloc(metadata->enable_map_bytes, GFP_KERNEL); if (!enable_map_buf) return -ENOMEM; } else { @@ -200,9 +184,8 @@ void kbase_hwcnt_enable_map_free(struct kbase_hwcnt_enable_map *enable_map) enable_map->metadata = NULL; } -int kbase_hwcnt_dump_buffer_alloc( - const struct kbase_hwcnt_metadata *metadata, - struct kbase_hwcnt_dump_buffer *dump_buf) +int kbase_hwcnt_dump_buffer_alloc(const struct kbase_hwcnt_metadata *metadata, + struct kbase_hwcnt_dump_buffer *dump_buf) { size_t dump_buf_bytes; size_t clk_cnt_buf_bytes; @@ -235,10 +218,8 @@ void kbase_hwcnt_dump_buffer_free(struct kbase_hwcnt_dump_buffer *dump_buf) memset(dump_buf, 0, sizeof(*dump_buf)); } -int kbase_hwcnt_dump_buffer_array_alloc( - const struct kbase_hwcnt_metadata *metadata, - size_t n, - struct kbase_hwcnt_dump_buffer_array *dump_bufs) +int kbase_hwcnt_dump_buffer_array_alloc(const struct kbase_hwcnt_metadata *metadata, size_t n, + struct kbase_hwcnt_dump_buffer_array *dump_bufs) { struct kbase_hwcnt_dump_buffer *buffers; size_t buf_idx; @@ -251,8 +232,7 @@ int kbase_hwcnt_dump_buffer_array_alloc( return -EINVAL; dump_buf_bytes = metadata->dump_buf_bytes; - clk_cnt_buf_bytes = - sizeof(*dump_bufs->bufs->clk_cnt_buf) * metadata->clk_cnt; + clk_cnt_buf_bytes = sizeof(*dump_bufs->bufs->clk_cnt_buf) * metadata->clk_cnt; /* Allocate memory for the dump buffer struct array */ buffers = kmalloc_array(n, sizeof(*buffers), GFP_KERNEL); @@ -283,15 +263,13 @@ int kbase_hwcnt_dump_buffer_array_alloc( buffers[buf_idx].metadata = metadata; buffers[buf_idx].dump_buf = (u64 *)(addr + dump_buf_offset); - buffers[buf_idx].clk_cnt_buf = - (u64 *)(addr + 
clk_cnt_buf_offset); + buffers[buf_idx].clk_cnt_buf = (u64 *)(addr + clk_cnt_buf_offset); } return 0; } -void kbase_hwcnt_dump_buffer_array_free( - struct kbase_hwcnt_dump_buffer_array *dump_bufs) +void kbase_hwcnt_dump_buffer_array_free(struct kbase_hwcnt_dump_buffer_array *dump_bufs) { if (!dump_bufs) return; @@ -301,84 +279,71 @@ void kbase_hwcnt_dump_buffer_array_free( memset(dump_bufs, 0, sizeof(*dump_bufs)); } -void kbase_hwcnt_dump_buffer_zero( - struct kbase_hwcnt_dump_buffer *dst, - const struct kbase_hwcnt_enable_map *dst_enable_map) +void kbase_hwcnt_dump_buffer_zero(struct kbase_hwcnt_dump_buffer *dst, + const struct kbase_hwcnt_enable_map *dst_enable_map) { const struct kbase_hwcnt_metadata *metadata; size_t grp, blk, blk_inst; - if (WARN_ON(!dst) || - WARN_ON(!dst_enable_map) || + if (WARN_ON(!dst) || WARN_ON(!dst_enable_map) || WARN_ON(dst->metadata != dst_enable_map->metadata)) return; metadata = dst->metadata; - kbase_hwcnt_metadata_for_each_block(metadata, grp, blk, blk_inst) { + kbase_hwcnt_metadata_for_each_block(metadata, grp, blk, blk_inst) + { u64 *dst_blk; size_t val_cnt; - if (!kbase_hwcnt_enable_map_block_enabled( - dst_enable_map, grp, blk, blk_inst)) + if (!kbase_hwcnt_enable_map_block_enabled(dst_enable_map, grp, blk, blk_inst)) continue; - dst_blk = kbase_hwcnt_dump_buffer_block_instance( - dst, grp, blk, blk_inst); - val_cnt = kbase_hwcnt_metadata_block_values_count( - metadata, grp, blk); + dst_blk = kbase_hwcnt_dump_buffer_block_instance(dst, grp, blk, blk_inst); + val_cnt = kbase_hwcnt_metadata_block_values_count(metadata, grp, blk); kbase_hwcnt_dump_buffer_block_zero(dst_blk, val_cnt); } - memset(dst->clk_cnt_buf, 0, - sizeof(*dst->clk_cnt_buf) * metadata->clk_cnt); + memset(dst->clk_cnt_buf, 0, sizeof(*dst->clk_cnt_buf) * metadata->clk_cnt); } -void kbase_hwcnt_dump_buffer_zero_strict( - struct kbase_hwcnt_dump_buffer *dst) +void kbase_hwcnt_dump_buffer_zero_strict(struct kbase_hwcnt_dump_buffer *dst) { if (WARN_ON(!dst)) return; memset(dst->dump_buf, 0, dst->metadata->dump_buf_bytes); - memset(dst->clk_cnt_buf, 0, - sizeof(*dst->clk_cnt_buf) * dst->metadata->clk_cnt); + memset(dst->clk_cnt_buf, 0, sizeof(*dst->clk_cnt_buf) * dst->metadata->clk_cnt); } -void kbase_hwcnt_dump_buffer_zero_non_enabled( - struct kbase_hwcnt_dump_buffer *dst, - const struct kbase_hwcnt_enable_map *dst_enable_map) +void kbase_hwcnt_dump_buffer_zero_non_enabled(struct kbase_hwcnt_dump_buffer *dst, + const struct kbase_hwcnt_enable_map *dst_enable_map) { const struct kbase_hwcnt_metadata *metadata; size_t grp, blk, blk_inst; - if (WARN_ON(!dst) || - WARN_ON(!dst_enable_map) || + if (WARN_ON(!dst) || WARN_ON(!dst_enable_map) || WARN_ON(dst->metadata != dst_enable_map->metadata)) return; metadata = dst->metadata; - kbase_hwcnt_metadata_for_each_block(metadata, grp, blk, blk_inst) { - u64 *dst_blk = kbase_hwcnt_dump_buffer_block_instance( - dst, grp, blk, blk_inst); - const u64 *blk_em = kbase_hwcnt_enable_map_block_instance( - dst_enable_map, grp, blk, blk_inst); - size_t val_cnt = kbase_hwcnt_metadata_block_values_count( - metadata, grp, blk); + kbase_hwcnt_metadata_for_each_block(metadata, grp, blk, blk_inst) + { + u64 *dst_blk = kbase_hwcnt_dump_buffer_block_instance(dst, grp, blk, blk_inst); + const u64 *blk_em = + kbase_hwcnt_enable_map_block_instance(dst_enable_map, grp, blk, blk_inst); + size_t val_cnt = kbase_hwcnt_metadata_block_values_count(metadata, grp, blk); /* Align upwards to include padding bytes */ - val_cnt = KBASE_HWCNT_ALIGN_UPWARDS(val_cnt, - 
(KBASE_HWCNT_BLOCK_BYTE_ALIGNMENT / - KBASE_HWCNT_VALUE_BYTES)); + val_cnt = KBASE_HWCNT_ALIGN_UPWARDS( + val_cnt, (KBASE_HWCNT_BLOCK_BYTE_ALIGNMENT / KBASE_HWCNT_VALUE_BYTES)); - if (kbase_hwcnt_metadata_block_instance_avail( - metadata, grp, blk, blk_inst)) { + if (kbase_hwcnt_metadata_block_instance_avail(metadata, grp, blk, blk_inst)) { /* Block available, so only zero non-enabled values */ - kbase_hwcnt_dump_buffer_block_zero_non_enabled( - dst_blk, blk_em, val_cnt); + kbase_hwcnt_dump_buffer_block_zero_non_enabled(dst_blk, blk_em, val_cnt); } else { /* Block not available, so zero the entire thing */ kbase_hwcnt_dump_buffer_block_zero(dst_blk, val_cnt); @@ -386,188 +351,159 @@ void kbase_hwcnt_dump_buffer_zero_non_enabled( } } -void kbase_hwcnt_dump_buffer_copy( - struct kbase_hwcnt_dump_buffer *dst, - const struct kbase_hwcnt_dump_buffer *src, - const struct kbase_hwcnt_enable_map *dst_enable_map) +void kbase_hwcnt_dump_buffer_copy(struct kbase_hwcnt_dump_buffer *dst, + const struct kbase_hwcnt_dump_buffer *src, + const struct kbase_hwcnt_enable_map *dst_enable_map) { const struct kbase_hwcnt_metadata *metadata; size_t grp, blk, blk_inst; size_t clk; - if (WARN_ON(!dst) || - WARN_ON(!src) || - WARN_ON(!dst_enable_map) || - WARN_ON(dst == src) || + if (WARN_ON(!dst) || WARN_ON(!src) || WARN_ON(!dst_enable_map) || WARN_ON(dst == src) || WARN_ON(dst->metadata != src->metadata) || WARN_ON(dst->metadata != dst_enable_map->metadata)) return; metadata = dst->metadata; - kbase_hwcnt_metadata_for_each_block(metadata, grp, blk, blk_inst) { + kbase_hwcnt_metadata_for_each_block(metadata, grp, blk, blk_inst) + { u64 *dst_blk; const u64 *src_blk; size_t val_cnt; - if (!kbase_hwcnt_enable_map_block_enabled( - dst_enable_map, grp, blk, blk_inst)) + if (!kbase_hwcnt_enable_map_block_enabled(dst_enable_map, grp, blk, blk_inst)) continue; - dst_blk = kbase_hwcnt_dump_buffer_block_instance( - dst, grp, blk, blk_inst); - src_blk = kbase_hwcnt_dump_buffer_block_instance( - src, grp, blk, blk_inst); - val_cnt = kbase_hwcnt_metadata_block_values_count( - metadata, grp, blk); + dst_blk = kbase_hwcnt_dump_buffer_block_instance(dst, grp, blk, blk_inst); + src_blk = kbase_hwcnt_dump_buffer_block_instance(src, grp, blk, blk_inst); + val_cnt = kbase_hwcnt_metadata_block_values_count(metadata, grp, blk); kbase_hwcnt_dump_buffer_block_copy(dst_blk, src_blk, val_cnt); } - kbase_hwcnt_metadata_for_each_clock(metadata, clk) { - if (kbase_hwcnt_clk_enable_map_enabled( - dst_enable_map->clk_enable_map, clk)) + kbase_hwcnt_metadata_for_each_clock(metadata, clk) + { + if (kbase_hwcnt_clk_enable_map_enabled(dst_enable_map->clk_enable_map, clk)) dst->clk_cnt_buf[clk] = src->clk_cnt_buf[clk]; } } -void kbase_hwcnt_dump_buffer_copy_strict( - struct kbase_hwcnt_dump_buffer *dst, - const struct kbase_hwcnt_dump_buffer *src, - const struct kbase_hwcnt_enable_map *dst_enable_map) +void kbase_hwcnt_dump_buffer_copy_strict(struct kbase_hwcnt_dump_buffer *dst, + const struct kbase_hwcnt_dump_buffer *src, + const struct kbase_hwcnt_enable_map *dst_enable_map) { const struct kbase_hwcnt_metadata *metadata; size_t grp, blk, blk_inst; size_t clk; - if (WARN_ON(!dst) || - WARN_ON(!src) || - WARN_ON(!dst_enable_map) || - WARN_ON(dst == src) || + if (WARN_ON(!dst) || WARN_ON(!src) || WARN_ON(!dst_enable_map) || WARN_ON(dst == src) || WARN_ON(dst->metadata != src->metadata) || WARN_ON(dst->metadata != dst_enable_map->metadata)) return; metadata = dst->metadata; - kbase_hwcnt_metadata_for_each_block(metadata, grp, blk, blk_inst) { - u64 
*dst_blk = kbase_hwcnt_dump_buffer_block_instance( - dst, grp, blk, blk_inst); - const u64 *src_blk = kbase_hwcnt_dump_buffer_block_instance( - src, grp, blk, blk_inst); - const u64 *blk_em = kbase_hwcnt_enable_map_block_instance( - dst_enable_map, grp, blk, blk_inst); - size_t val_cnt = kbase_hwcnt_metadata_block_values_count( - metadata, grp, blk); + kbase_hwcnt_metadata_for_each_block(metadata, grp, blk, blk_inst) + { + u64 *dst_blk = kbase_hwcnt_dump_buffer_block_instance(dst, grp, blk, blk_inst); + const u64 *src_blk = + kbase_hwcnt_dump_buffer_block_instance(src, grp, blk, blk_inst); + const u64 *blk_em = + kbase_hwcnt_enable_map_block_instance(dst_enable_map, grp, blk, blk_inst); + size_t val_cnt = kbase_hwcnt_metadata_block_values_count(metadata, grp, blk); /* Align upwards to include padding bytes */ - val_cnt = KBASE_HWCNT_ALIGN_UPWARDS(val_cnt, - (KBASE_HWCNT_BLOCK_BYTE_ALIGNMENT / - KBASE_HWCNT_VALUE_BYTES)); + val_cnt = KBASE_HWCNT_ALIGN_UPWARDS( + val_cnt, (KBASE_HWCNT_BLOCK_BYTE_ALIGNMENT / KBASE_HWCNT_VALUE_BYTES)); - kbase_hwcnt_dump_buffer_block_copy_strict( - dst_blk, src_blk, blk_em, val_cnt); + kbase_hwcnt_dump_buffer_block_copy_strict(dst_blk, src_blk, blk_em, val_cnt); } - kbase_hwcnt_metadata_for_each_clock(metadata, clk) { + kbase_hwcnt_metadata_for_each_clock(metadata, clk) + { bool clk_enabled = - kbase_hwcnt_clk_enable_map_enabled( - dst_enable_map->clk_enable_map, clk); + kbase_hwcnt_clk_enable_map_enabled(dst_enable_map->clk_enable_map, clk); dst->clk_cnt_buf[clk] = clk_enabled ? src->clk_cnt_buf[clk] : 0; } } -void kbase_hwcnt_dump_buffer_accumulate( - struct kbase_hwcnt_dump_buffer *dst, - const struct kbase_hwcnt_dump_buffer *src, - const struct kbase_hwcnt_enable_map *dst_enable_map) +void kbase_hwcnt_dump_buffer_accumulate(struct kbase_hwcnt_dump_buffer *dst, + const struct kbase_hwcnt_dump_buffer *src, + const struct kbase_hwcnt_enable_map *dst_enable_map) { const struct kbase_hwcnt_metadata *metadata; size_t grp, blk, blk_inst; size_t clk; - if (WARN_ON(!dst) || - WARN_ON(!src) || - WARN_ON(!dst_enable_map) || - WARN_ON(dst == src) || + if (WARN_ON(!dst) || WARN_ON(!src) || WARN_ON(!dst_enable_map) || WARN_ON(dst == src) || WARN_ON(dst->metadata != src->metadata) || WARN_ON(dst->metadata != dst_enable_map->metadata)) return; metadata = dst->metadata; - kbase_hwcnt_metadata_for_each_block(metadata, grp, blk, blk_inst) { + kbase_hwcnt_metadata_for_each_block(metadata, grp, blk, blk_inst) + { u64 *dst_blk; const u64 *src_blk; size_t hdr_cnt; size_t ctr_cnt; - if (!kbase_hwcnt_enable_map_block_enabled( - dst_enable_map, grp, blk, blk_inst)) + if (!kbase_hwcnt_enable_map_block_enabled(dst_enable_map, grp, blk, blk_inst)) continue; - dst_blk = kbase_hwcnt_dump_buffer_block_instance( - dst, grp, blk, blk_inst); - src_blk = kbase_hwcnt_dump_buffer_block_instance( - src, grp, blk, blk_inst); - hdr_cnt = kbase_hwcnt_metadata_block_headers_count( - metadata, grp, blk); - ctr_cnt = kbase_hwcnt_metadata_block_counters_count( - metadata, grp, blk); - - kbase_hwcnt_dump_buffer_block_accumulate( - dst_blk, src_blk, hdr_cnt, ctr_cnt); + dst_blk = kbase_hwcnt_dump_buffer_block_instance(dst, grp, blk, blk_inst); + src_blk = kbase_hwcnt_dump_buffer_block_instance(src, grp, blk, blk_inst); + hdr_cnt = kbase_hwcnt_metadata_block_headers_count(metadata, grp, blk); + ctr_cnt = kbase_hwcnt_metadata_block_counters_count(metadata, grp, blk); + + kbase_hwcnt_dump_buffer_block_accumulate(dst_blk, src_blk, hdr_cnt, ctr_cnt); } - kbase_hwcnt_metadata_for_each_clock(metadata, clk) { - 
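Unlike the copy paths, kbase_hwcnt_dump_buffer_accumulate() above hands each block's header/counter split (hdr_cnt, ctr_cnt) to kbase_hwcnt_dump_buffer_block_accumulate(), whose body is not part of this hunk. A plausible per-block step, stated here as an assumption rather than the driver's implementation, is that headers take the latest sample while counters are summed:

#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative only: headers come from the new sample, counters are added
 * onto the running total. */
static void block_accumulate(uint64_t *dst, const uint64_t *src,
			     size_t hdr_cnt, size_t ctr_cnt)
{
	size_t i;

	for (i = 0; i < hdr_cnt; i++)
		dst[i] = src[i];	/* headers: latest value wins */
	for (; i < hdr_cnt + ctr_cnt; i++)
		dst[i] += src[i];	/* counters: accumulate */
}

int main(void)
{
	uint64_t dst[4] = { 1, 2, 10, 20 };	/* 2 headers, 2 counters */
	const uint64_t src[4] = { 5, 6, 1, 2 };

	block_accumulate(dst, src, 2, 2);
	assert(dst[0] == 5 && dst[1] == 6);	/* replaced */
	assert(dst[2] == 11 && dst[3] == 22);	/* summed */
	return 0;
}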
if (kbase_hwcnt_clk_enable_map_enabled( - dst_enable_map->clk_enable_map, clk)) + kbase_hwcnt_metadata_for_each_clock(metadata, clk) + { + if (kbase_hwcnt_clk_enable_map_enabled(dst_enable_map->clk_enable_map, clk)) dst->clk_cnt_buf[clk] += src->clk_cnt_buf[clk]; } } -void kbase_hwcnt_dump_buffer_accumulate_strict( - struct kbase_hwcnt_dump_buffer *dst, - const struct kbase_hwcnt_dump_buffer *src, - const struct kbase_hwcnt_enable_map *dst_enable_map) +void kbase_hwcnt_dump_buffer_accumulate_strict(struct kbase_hwcnt_dump_buffer *dst, + const struct kbase_hwcnt_dump_buffer *src, + const struct kbase_hwcnt_enable_map *dst_enable_map) { const struct kbase_hwcnt_metadata *metadata; size_t grp, blk, blk_inst; size_t clk; - if (WARN_ON(!dst) || - WARN_ON(!src) || - WARN_ON(!dst_enable_map) || - WARN_ON(dst == src) || + if (WARN_ON(!dst) || WARN_ON(!src) || WARN_ON(!dst_enable_map) || WARN_ON(dst == src) || WARN_ON(dst->metadata != src->metadata) || WARN_ON(dst->metadata != dst_enable_map->metadata)) return; metadata = dst->metadata; - kbase_hwcnt_metadata_for_each_block(metadata, grp, blk, blk_inst) { - u64 *dst_blk = kbase_hwcnt_dump_buffer_block_instance( - dst, grp, blk, blk_inst); - const u64 *src_blk = kbase_hwcnt_dump_buffer_block_instance( - src, grp, blk, blk_inst); - const u64 *blk_em = kbase_hwcnt_enable_map_block_instance( - dst_enable_map, grp, blk, blk_inst); - size_t hdr_cnt = kbase_hwcnt_metadata_block_headers_count( - metadata, grp, blk); - size_t ctr_cnt = kbase_hwcnt_metadata_block_counters_count( - metadata, grp, blk); + kbase_hwcnt_metadata_for_each_block(metadata, grp, blk, blk_inst) + { + u64 *dst_blk = kbase_hwcnt_dump_buffer_block_instance(dst, grp, blk, blk_inst); + const u64 *src_blk = + kbase_hwcnt_dump_buffer_block_instance(src, grp, blk, blk_inst); + const u64 *blk_em = + kbase_hwcnt_enable_map_block_instance(dst_enable_map, grp, blk, blk_inst); + size_t hdr_cnt = kbase_hwcnt_metadata_block_headers_count(metadata, grp, blk); + size_t ctr_cnt = kbase_hwcnt_metadata_block_counters_count(metadata, grp, blk); /* Align upwards to include padding bytes */ - ctr_cnt = KBASE_HWCNT_ALIGN_UPWARDS(hdr_cnt + ctr_cnt, - (KBASE_HWCNT_BLOCK_BYTE_ALIGNMENT / - KBASE_HWCNT_VALUE_BYTES) - hdr_cnt); + ctr_cnt = KBASE_HWCNT_ALIGN_UPWARDS( + hdr_cnt + ctr_cnt, + (KBASE_HWCNT_BLOCK_BYTE_ALIGNMENT / KBASE_HWCNT_VALUE_BYTES) - hdr_cnt); - kbase_hwcnt_dump_buffer_block_accumulate_strict( - dst_blk, src_blk, blk_em, hdr_cnt, ctr_cnt); + kbase_hwcnt_dump_buffer_block_accumulate_strict(dst_blk, src_blk, blk_em, hdr_cnt, + ctr_cnt); } - kbase_hwcnt_metadata_for_each_clock(metadata, clk) { - if (kbase_hwcnt_clk_enable_map_enabled( - dst_enable_map->clk_enable_map, clk)) + kbase_hwcnt_metadata_for_each_clock(metadata, clk) + { + if (kbase_hwcnt_clk_enable_map_enabled(dst_enable_map->clk_enable_map, clk)) dst->clk_cnt_buf[clk] += src->clk_cnt_buf[clk]; else dst->clk_cnt_buf[clk] = 0; diff --git a/mali_kbase/mali_kbase_hwcnt_types.h b/mali_kbase/hwcnt/mali_kbase_hwcnt_types.h index 9397840..5c5ada4 100644 --- a/mali_kbase/mali_kbase_hwcnt_types.h +++ b/mali_kbase/hwcnt/mali_kbase_hwcnt_types.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2018, 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2018, 2020-2022 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -104,8 +104,7 @@ #define KBASE_HWCNT_AVAIL_MASK_BITS (sizeof(u64) * BITS_PER_BYTE) /* Minimum alignment of each block of hardware counters */ -#define KBASE_HWCNT_BLOCK_BYTE_ALIGNMENT \ - (KBASE_HWCNT_BITFIELD_BITS * KBASE_HWCNT_VALUE_BYTES) +#define KBASE_HWCNT_BLOCK_BYTE_ALIGNMENT (KBASE_HWCNT_BITFIELD_BITS * KBASE_HWCNT_VALUE_BYTES) /** * KBASE_HWCNT_ALIGN_UPWARDS() - Calculate next aligned value. @@ -115,7 +114,7 @@ * Return: Input value if already aligned to the specified boundary, or next * (incrementing upwards) aligned value. */ -#define KBASE_HWCNT_ALIGN_UPWARDS(value, alignment) \ +#define KBASE_HWCNT_ALIGN_UPWARDS(value, alignment) \ (value + ((alignment - (value % alignment)) % alignment)) /** @@ -307,9 +306,8 @@ struct kbase_hwcnt_dump_buffer_array { * * Return: 0 on success, else error code. */ -int kbase_hwcnt_metadata_create( - const struct kbase_hwcnt_description *desc, - const struct kbase_hwcnt_metadata **metadata); +int kbase_hwcnt_metadata_create(const struct kbase_hwcnt_description *desc, + const struct kbase_hwcnt_metadata **metadata); /** * kbase_hwcnt_metadata_destroy() - Destroy a hardware counter metadata object. @@ -323,8 +321,7 @@ void kbase_hwcnt_metadata_destroy(const struct kbase_hwcnt_metadata *metadata); * * Return: Number of hardware counter groups described by metadata. */ -static inline size_t -kbase_hwcnt_metadata_group_count(const struct kbase_hwcnt_metadata *metadata) +static inline size_t kbase_hwcnt_metadata_group_count(const struct kbase_hwcnt_metadata *metadata) { if (WARN_ON(!metadata)) return 0; @@ -339,9 +336,8 @@ kbase_hwcnt_metadata_group_count(const struct kbase_hwcnt_metadata *metadata) * * Return: Type of the group grp. */ -static inline u64 -kbase_hwcnt_metadata_group_type(const struct kbase_hwcnt_metadata *metadata, - size_t grp) +static inline u64 kbase_hwcnt_metadata_group_type(const struct kbase_hwcnt_metadata *metadata, + size_t grp) { if (WARN_ON(!metadata) || WARN_ON(grp >= metadata->grp_cnt)) return 0; @@ -356,9 +352,8 @@ kbase_hwcnt_metadata_group_type(const struct kbase_hwcnt_metadata *metadata, * * Return: Number of blocks in group grp. */ -static inline size_t -kbase_hwcnt_metadata_block_count(const struct kbase_hwcnt_metadata *metadata, - size_t grp) +static inline size_t kbase_hwcnt_metadata_block_count(const struct kbase_hwcnt_metadata *metadata, + size_t grp) { if (WARN_ON(!metadata) || WARN_ON(grp >= metadata->grp_cnt)) return 0; @@ -374,9 +369,8 @@ kbase_hwcnt_metadata_block_count(const struct kbase_hwcnt_metadata *metadata, * * Return: Type of the block blk in group grp. */ -static inline u64 -kbase_hwcnt_metadata_block_type(const struct kbase_hwcnt_metadata *metadata, - size_t grp, size_t blk) +static inline u64 kbase_hwcnt_metadata_block_type(const struct kbase_hwcnt_metadata *metadata, + size_t grp, size_t blk) { if (WARN_ON(!metadata) || WARN_ON(grp >= metadata->grp_cnt) || WARN_ON(blk >= metadata->grp_metadata[grp].blk_cnt)) @@ -394,8 +388,9 @@ kbase_hwcnt_metadata_block_type(const struct kbase_hwcnt_metadata *metadata, * * Return: Number of instances of block blk in group grp. 
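KBASE_HWCNT_ALIGN_UPWARDS() above rounds a value count up to the next multiple of the given alignment and leaves already-aligned values untouched. A quick standalone check of the arithmetic, using 8 as a stand-in for the driver's (KBASE_HWCNT_BLOCK_BYTE_ALIGNMENT / KBASE_HWCNT_VALUE_BYTES):

#include <assert.h>

#define ALIGN_UPWARDS(value, alignment) \
	((value) + (((alignment) - ((value) % (alignment))) % (alignment)))

int main(void)
{
	assert(ALIGN_UPWARDS(0, 8) == 0);
	assert(ALIGN_UPWARDS(8, 8) == 8);	/* aligned input is unchanged */
	assert(ALIGN_UPWARDS(61, 8) == 64);	/* rounded up to the next multiple of 8 */
	return 0;
}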
*/ -static inline size_t kbase_hwcnt_metadata_block_instance_count( - const struct kbase_hwcnt_metadata *metadata, size_t grp, size_t blk) +static inline size_t +kbase_hwcnt_metadata_block_instance_count(const struct kbase_hwcnt_metadata *metadata, size_t grp, + size_t blk) { if (WARN_ON(!metadata) || WARN_ON(grp >= metadata->grp_cnt) || WARN_ON(blk >= metadata->grp_metadata[grp].blk_cnt)) @@ -413,8 +408,9 @@ static inline size_t kbase_hwcnt_metadata_block_instance_count( * * Return: Number of counter headers in each instance of block blk in group grp. */ -static inline size_t kbase_hwcnt_metadata_block_headers_count( - const struct kbase_hwcnt_metadata *metadata, size_t grp, size_t blk) +static inline size_t +kbase_hwcnt_metadata_block_headers_count(const struct kbase_hwcnt_metadata *metadata, size_t grp, + size_t blk) { if (WARN_ON(!metadata) || WARN_ON(grp >= metadata->grp_cnt) || WARN_ON(blk >= metadata->grp_metadata[grp].blk_cnt)) @@ -431,8 +427,9 @@ static inline size_t kbase_hwcnt_metadata_block_headers_count( * * Return: Number of counters in each instance of block blk in group grp. */ -static inline size_t kbase_hwcnt_metadata_block_counters_count( - const struct kbase_hwcnt_metadata *metadata, size_t grp, size_t blk) +static inline size_t +kbase_hwcnt_metadata_block_counters_count(const struct kbase_hwcnt_metadata *metadata, size_t grp, + size_t blk) { if (WARN_ON(!metadata) || WARN_ON(grp >= metadata->grp_cnt) || WARN_ON(blk >= metadata->grp_metadata[grp].blk_cnt)) @@ -449,8 +446,9 @@ static inline size_t kbase_hwcnt_metadata_block_counters_count( * * Return: enable map stride in each instance of block blk in group grp. */ -static inline size_t kbase_hwcnt_metadata_block_enable_map_stride( - const struct kbase_hwcnt_metadata *metadata, size_t grp, size_t blk) +static inline size_t +kbase_hwcnt_metadata_block_enable_map_stride(const struct kbase_hwcnt_metadata *metadata, + size_t grp, size_t blk) { if (WARN_ON(!metadata) || WARN_ON(grp >= metadata->grp_cnt) || WARN_ON(blk >= metadata->grp_metadata[grp].blk_cnt)) @@ -468,8 +466,9 @@ static inline size_t kbase_hwcnt_metadata_block_enable_map_stride( * Return: Number of headers plus counters in each instance of block blk * in group grp. */ -static inline size_t kbase_hwcnt_metadata_block_values_count( - const struct kbase_hwcnt_metadata *metadata, size_t grp, size_t blk) +static inline size_t +kbase_hwcnt_metadata_block_values_count(const struct kbase_hwcnt_metadata *metadata, size_t grp, + size_t blk) { if (WARN_ON(!metadata) || WARN_ON(grp >= metadata->grp_cnt) || WARN_ON(blk >= metadata->grp_metadata[grp].blk_cnt)) @@ -490,10 +489,13 @@ static inline size_t kbase_hwcnt_metadata_block_values_count( * Iteration order is group, then block, then block instance (i.e. linearly * through memory). 
*/ -#define kbase_hwcnt_metadata_for_each_block(md, grp, blk, blk_inst) \ - for ((grp) = 0; (grp) < kbase_hwcnt_metadata_group_count((md)); (grp)++) \ - for ((blk) = 0; (blk) < kbase_hwcnt_metadata_block_count((md), (grp)); (blk)++) \ - for ((blk_inst) = 0; (blk_inst) < kbase_hwcnt_metadata_block_instance_count((md), (grp), (blk)); (blk_inst)++) +#define kbase_hwcnt_metadata_for_each_block(md, grp, blk, blk_inst) \ + for ((grp) = 0; (grp) < kbase_hwcnt_metadata_group_count((md)); (grp)++) \ + for ((blk) = 0; (blk) < kbase_hwcnt_metadata_block_count((md), (grp)); (blk)++) \ + for ((blk_inst) = 0; \ + (blk_inst) < \ + kbase_hwcnt_metadata_block_instance_count((md), (grp), (blk)); \ + (blk_inst)++) /** * kbase_hwcnt_metadata_block_avail_bit() - Get the bit index into the avail @@ -504,10 +506,9 @@ static inline size_t kbase_hwcnt_metadata_block_values_count( * * Return: The bit index into the avail mask for the block. */ -static inline size_t kbase_hwcnt_metadata_block_avail_bit( - const struct kbase_hwcnt_metadata *metadata, - size_t grp, - size_t blk) +static inline size_t +kbase_hwcnt_metadata_block_avail_bit(const struct kbase_hwcnt_metadata *metadata, size_t grp, + size_t blk) { if (WARN_ON(!metadata) || WARN_ON(grp >= metadata->grp_cnt) || WARN_ON(blk >= metadata->grp_metadata[grp].blk_cnt)) @@ -527,11 +528,9 @@ static inline size_t kbase_hwcnt_metadata_block_avail_bit( * * Return: true if the block instance is available, else false. */ -static inline bool kbase_hwcnt_metadata_block_instance_avail( - const struct kbase_hwcnt_metadata *metadata, - size_t grp, - size_t blk, - size_t blk_inst) +static inline bool +kbase_hwcnt_metadata_block_instance_avail(const struct kbase_hwcnt_metadata *metadata, size_t grp, + size_t blk, size_t blk_inst) { size_t bit; u64 mask; @@ -553,9 +552,8 @@ static inline bool kbase_hwcnt_metadata_block_instance_avail( * * Return: 0 on success, else error code. */ -int kbase_hwcnt_enable_map_alloc( - const struct kbase_hwcnt_metadata *metadata, - struct kbase_hwcnt_enable_map *enable_map); +int kbase_hwcnt_enable_map_alloc(const struct kbase_hwcnt_metadata *metadata, + struct kbase_hwcnt_enable_map *enable_map); /** * kbase_hwcnt_enable_map_free() - Free an enable map. @@ -577,9 +575,8 @@ void kbase_hwcnt_enable_map_free(struct kbase_hwcnt_enable_map *enable_map); * Return: u64* to the bitfield(s) used as the enable map for the * block instance. 
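The reflowed kbase_hwcnt_metadata_for_each_block() macro above is just three nested loops over group, block and block instance, i.e. a linear walk through the dump-buffer layout. Expanded by hand with placeholder counts standing in for the metadata accessors, it looks like this:

#include <stddef.h>
#include <stdio.h>

int main(void)
{
	const size_t grp_cnt = 1, blk_cnt = 4, inst_cnt[4] = { 1, 1, 8, 2 };
	size_t grp, blk, blk_inst;

	for (grp = 0; grp < grp_cnt; grp++)
		for (blk = 0; blk < blk_cnt; blk++)
			for (blk_inst = 0; blk_inst < inst_cnt[blk]; blk_inst++)
				printf("grp %zu blk %zu inst %zu\n", grp, blk, blk_inst);
	return 0;
}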
*/ -static inline u64 * -kbase_hwcnt_enable_map_block_instance(const struct kbase_hwcnt_enable_map *map, - size_t grp, size_t blk, size_t blk_inst) +static inline u64 *kbase_hwcnt_enable_map_block_instance(const struct kbase_hwcnt_enable_map *map, + size_t grp, size_t blk, size_t blk_inst) { if (WARN_ON(!map) || WARN_ON(!map->hwcnt_enable_map)) return NULL; @@ -589,15 +586,9 @@ kbase_hwcnt_enable_map_block_instance(const struct kbase_hwcnt_enable_map *map, WARN_ON(blk_inst >= map->metadata->grp_metadata[grp].blk_metadata[blk].inst_cnt)) return map->hwcnt_enable_map; - return map->hwcnt_enable_map + - map->metadata->grp_metadata[grp].enable_map_index + - map->metadata->grp_metadata[grp] - .blk_metadata[blk] - .enable_map_index + - (map->metadata->grp_metadata[grp] - .blk_metadata[blk] - .enable_map_stride * - blk_inst); + return map->hwcnt_enable_map + map->metadata->grp_metadata[grp].enable_map_index + + map->metadata->grp_metadata[grp].blk_metadata[blk].enable_map_index + + (map->metadata->grp_metadata[grp].blk_metadata[blk].enable_map_stride * blk_inst); } /** @@ -609,8 +600,7 @@ kbase_hwcnt_enable_map_block_instance(const struct kbase_hwcnt_enable_map *map, */ static inline size_t kbase_hwcnt_bitfield_count(size_t val_cnt) { - return (val_cnt + KBASE_HWCNT_BITFIELD_BITS - 1) / - KBASE_HWCNT_BITFIELD_BITS; + return (val_cnt + KBASE_HWCNT_BITFIELD_BITS - 1) / KBASE_HWCNT_BITFIELD_BITS; } /** @@ -620,11 +610,8 @@ static inline size_t kbase_hwcnt_bitfield_count(size_t val_cnt) * @blk: Index of the block in the group. * @blk_inst: Index of the block instance in the block. */ -static inline void kbase_hwcnt_enable_map_block_disable_all( - struct kbase_hwcnt_enable_map *dst, - size_t grp, - size_t blk, - size_t blk_inst) +static inline void kbase_hwcnt_enable_map_block_disable_all(struct kbase_hwcnt_enable_map *dst, + size_t grp, size_t blk, size_t blk_inst) { size_t val_cnt; size_t bitfld_cnt; @@ -644,15 +631,13 @@ static inline void kbase_hwcnt_enable_map_block_disable_all( * kbase_hwcnt_enable_map_disable_all() - Disable all values in the enable map. * @dst: Non-NULL pointer to enable map to zero. */ -static inline void kbase_hwcnt_enable_map_disable_all( - struct kbase_hwcnt_enable_map *dst) +static inline void kbase_hwcnt_enable_map_disable_all(struct kbase_hwcnt_enable_map *dst) { if (WARN_ON(!dst) || WARN_ON(!dst->metadata)) return; if (dst->hwcnt_enable_map != NULL) - memset(dst->hwcnt_enable_map, 0, - dst->metadata->enable_map_bytes); + memset(dst->hwcnt_enable_map, 0, dst->metadata->enable_map_bytes); dst->clk_enable_map = 0; } @@ -664,11 +649,8 @@ static inline void kbase_hwcnt_enable_map_disable_all( * @blk: Index of the block in the group. * @blk_inst: Index of the block instance in the block. 
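kbase_hwcnt_bitfield_count() above is a ceiling division: the number of u64 bitfield words needed to hold one enable bit per value. A standalone check:

#include <assert.h>
#include <stddef.h>

#define BITFIELD_BITS 64	/* stands in for KBASE_HWCNT_BITFIELD_BITS */

static size_t bitfield_count(size_t val_cnt)
{
	return (val_cnt + BITFIELD_BITS - 1) / BITFIELD_BITS;
}

int main(void)
{
	assert(bitfield_count(1) == 1);
	assert(bitfield_count(64) == 1);	/* exactly one full word */
	assert(bitfield_count(65) == 2);	/* spills into a second word */
	assert(bitfield_count(128) == 2);
	return 0;
}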
*/ -static inline void kbase_hwcnt_enable_map_block_enable_all( - struct kbase_hwcnt_enable_map *dst, - size_t grp, - size_t blk, - size_t blk_inst) +static inline void kbase_hwcnt_enable_map_block_enable_all(struct kbase_hwcnt_enable_map *dst, + size_t grp, size_t blk, size_t blk_inst) { size_t val_cnt; size_t bitfld_cnt; @@ -683,8 +665,7 @@ static inline void kbase_hwcnt_enable_map_block_enable_all( bitfld_cnt = kbase_hwcnt_bitfield_count(val_cnt); for (bitfld_idx = 0; bitfld_idx < bitfld_cnt; bitfld_idx++) { - const u64 remaining_values = val_cnt - - (bitfld_idx * KBASE_HWCNT_BITFIELD_BITS); + const u64 remaining_values = val_cnt - (bitfld_idx * KBASE_HWCNT_BITFIELD_BITS); u64 block_enable_map_mask = U64_MAX; if (remaining_values < KBASE_HWCNT_BITFIELD_BITS) @@ -699,8 +680,7 @@ static inline void kbase_hwcnt_enable_map_block_enable_all( * map. * @dst: Non-NULL pointer to enable map. */ -static inline void kbase_hwcnt_enable_map_enable_all( - struct kbase_hwcnt_enable_map *dst) +static inline void kbase_hwcnt_enable_map_enable_all(struct kbase_hwcnt_enable_map *dst) { size_t grp, blk, blk_inst; @@ -708,8 +688,7 @@ static inline void kbase_hwcnt_enable_map_enable_all( return; kbase_hwcnt_metadata_for_each_block(dst->metadata, grp, blk, blk_inst) - kbase_hwcnt_enable_map_block_enable_all( - dst, grp, blk, blk_inst); + kbase_hwcnt_enable_map_block_enable_all(dst, grp, blk, blk_inst); dst->clk_enable_map = (1ull << dst->metadata->clk_cnt) - 1; } @@ -721,9 +700,8 @@ static inline void kbase_hwcnt_enable_map_enable_all( * * The dst and src MUST have been created from the same metadata. */ -static inline void kbase_hwcnt_enable_map_copy( - struct kbase_hwcnt_enable_map *dst, - const struct kbase_hwcnt_enable_map *src) +static inline void kbase_hwcnt_enable_map_copy(struct kbase_hwcnt_enable_map *dst, + const struct kbase_hwcnt_enable_map *src) { if (WARN_ON(!dst) || WARN_ON(!src) || WARN_ON(!dst->metadata) || WARN_ON(dst->metadata != src->metadata)) @@ -733,8 +711,7 @@ static inline void kbase_hwcnt_enable_map_copy( if (WARN_ON(!src->hwcnt_enable_map)) return; - memcpy(dst->hwcnt_enable_map, - src->hwcnt_enable_map, + memcpy(dst->hwcnt_enable_map, src->hwcnt_enable_map, dst->metadata->enable_map_bytes); } @@ -748,9 +725,8 @@ static inline void kbase_hwcnt_enable_map_copy( * * The dst and src MUST have been created from the same metadata. */ -static inline void kbase_hwcnt_enable_map_union( - struct kbase_hwcnt_enable_map *dst, - const struct kbase_hwcnt_enable_map *src) +static inline void kbase_hwcnt_enable_map_union(struct kbase_hwcnt_enable_map *dst, + const struct kbase_hwcnt_enable_map *src) { if (WARN_ON(!dst) || WARN_ON(!src) || WARN_ON(!dst->metadata) || WARN_ON(dst->metadata != src->metadata)) @@ -781,11 +757,9 @@ static inline void kbase_hwcnt_enable_map_union( * * Return: true if any values in the block are enabled, else false. 
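Several helpers in this hunk (block_enable_all above, block_enabled just below) walk an array of 64-bit enable words and build a mask for the final, partially used word with (1ull << remaining) - 1. A self-contained sketch of that pattern, with hypothetical names:

#include <stdint.h>
#include <stddef.h>

#define BITFIELD_BITS 64

/* Set the first val_cnt value bits across an array of 64-bit words,
 * masking the tail of the last word the same way the kbase helpers do.
 */
static void enable_first_n(uint64_t *bitfld, size_t val_cnt)
{
	size_t words = (val_cnt + BITFIELD_BITS - 1) / BITFIELD_BITS;
	size_t i;

	for (i = 0; i < words; i++) {
		uint64_t remaining = val_cnt - i * BITFIELD_BITS;
		uint64_t mask = UINT64_MAX;

		if (remaining < BITFIELD_BITS)
			mask = (1ull << remaining) - 1;
		bitfld[i] |= mask;
	}
}

The round-up word count on the first line is the same ceiling division performed by kbase_hwcnt_bitfield_count.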
*/ -static inline bool kbase_hwcnt_enable_map_block_enabled( - const struct kbase_hwcnt_enable_map *enable_map, - size_t grp, - size_t blk, - size_t blk_inst) +static inline bool +kbase_hwcnt_enable_map_block_enabled(const struct kbase_hwcnt_enable_map *enable_map, size_t grp, + size_t blk, size_t blk_inst) { bool any_enabled = false; size_t val_cnt; @@ -801,15 +775,13 @@ static inline bool kbase_hwcnt_enable_map_block_enabled( bitfld_cnt = kbase_hwcnt_bitfield_count(val_cnt); for (bitfld_idx = 0; bitfld_idx < bitfld_cnt; bitfld_idx++) { - const u64 remaining_values = val_cnt - - (bitfld_idx * KBASE_HWCNT_BITFIELD_BITS); + const u64 remaining_values = val_cnt - (bitfld_idx * KBASE_HWCNT_BITFIELD_BITS); u64 block_enable_map_mask = U64_MAX; if (remaining_values < KBASE_HWCNT_BITFIELD_BITS) block_enable_map_mask = (1ull << remaining_values) - 1; - any_enabled = any_enabled || - (block_enable_map[bitfld_idx] & block_enable_map_mask); + any_enabled = any_enabled || (block_enable_map[bitfld_idx] & block_enable_map_mask); } return any_enabled; @@ -821,8 +793,8 @@ static inline bool kbase_hwcnt_enable_map_block_enabled( * * Return: true if any values are enabled, else false. */ -static inline bool kbase_hwcnt_enable_map_any_enabled( - const struct kbase_hwcnt_enable_map *enable_map) +static inline bool +kbase_hwcnt_enable_map_any_enabled(const struct kbase_hwcnt_enable_map *enable_map) { size_t grp, blk, blk_inst; u64 clk_enable_map_mask; @@ -832,14 +804,12 @@ static inline bool kbase_hwcnt_enable_map_any_enabled( clk_enable_map_mask = (1ull << enable_map->metadata->clk_cnt) - 1; - if (enable_map->metadata->clk_cnt > 0 && - (enable_map->clk_enable_map & clk_enable_map_mask)) + if (enable_map->metadata->clk_cnt > 0 && (enable_map->clk_enable_map & clk_enable_map_mask)) return true; - kbase_hwcnt_metadata_for_each_block( - enable_map->metadata, grp, blk, blk_inst) { - if (kbase_hwcnt_enable_map_block_enabled( - enable_map, grp, blk, blk_inst)) + kbase_hwcnt_metadata_for_each_block(enable_map->metadata, grp, blk, blk_inst) + { + if (kbase_hwcnt_enable_map_block_enabled(enable_map, grp, blk, blk_inst)) return true; } @@ -855,9 +825,7 @@ static inline bool kbase_hwcnt_enable_map_any_enabled( * * Return: true if the value was enabled, else false. */ -static inline bool kbase_hwcnt_enable_map_block_value_enabled( - const u64 *bitfld, - size_t val_idx) +static inline bool kbase_hwcnt_enable_map_block_value_enabled(const u64 *bitfld, size_t val_idx) { const size_t idx = val_idx / KBASE_HWCNT_BITFIELD_BITS; const size_t bit = val_idx % KBASE_HWCNT_BITFIELD_BITS; @@ -873,9 +841,7 @@ static inline bool kbase_hwcnt_enable_map_block_value_enabled( * kbase_hwcnt_enable_map_block_instance. * @val_idx: Index of the value to enable in the block instance. */ -static inline void kbase_hwcnt_enable_map_block_enable_value( - u64 *bitfld, - size_t val_idx) +static inline void kbase_hwcnt_enable_map_block_enable_value(u64 *bitfld, size_t val_idx) { const size_t idx = val_idx / KBASE_HWCNT_BITFIELD_BITS; const size_t bit = val_idx % KBASE_HWCNT_BITFIELD_BITS; @@ -891,9 +857,7 @@ static inline void kbase_hwcnt_enable_map_block_enable_value( * kbase_hwcnt_enable_map_block_instance. * @val_idx: Index of the value to disable in the block instance. 
*/ -static inline void kbase_hwcnt_enable_map_block_disable_value( - u64 *bitfld, - size_t val_idx) +static inline void kbase_hwcnt_enable_map_block_disable_value(u64 *bitfld, size_t val_idx) { const size_t idx = val_idx / KBASE_HWCNT_BITFIELD_BITS; const size_t bit = val_idx % KBASE_HWCNT_BITFIELD_BITS; @@ -911,9 +875,8 @@ static inline void kbase_hwcnt_enable_map_block_disable_value( * * Return: 0 on success, else error code. */ -int kbase_hwcnt_dump_buffer_alloc( - const struct kbase_hwcnt_metadata *metadata, - struct kbase_hwcnt_dump_buffer *dump_buf); +int kbase_hwcnt_dump_buffer_alloc(const struct kbase_hwcnt_metadata *metadata, + struct kbase_hwcnt_dump_buffer *dump_buf); /** * kbase_hwcnt_dump_buffer_free() - Free a dump buffer. @@ -936,10 +899,8 @@ void kbase_hwcnt_dump_buffer_free(struct kbase_hwcnt_dump_buffer *dump_buf); * * Return: 0 on success, else error code. */ -int kbase_hwcnt_dump_buffer_array_alloc( - const struct kbase_hwcnt_metadata *metadata, - size_t n, - struct kbase_hwcnt_dump_buffer_array *dump_bufs); +int kbase_hwcnt_dump_buffer_array_alloc(const struct kbase_hwcnt_metadata *metadata, size_t n, + struct kbase_hwcnt_dump_buffer_array *dump_bufs); /** * kbase_hwcnt_dump_buffer_array_free() - Free a dump buffer array. @@ -948,8 +909,7 @@ int kbase_hwcnt_dump_buffer_array_alloc( * Can be safely called on an all-zeroed dump buffer array structure, or on an * already freed dump buffer array. */ -void kbase_hwcnt_dump_buffer_array_free( - struct kbase_hwcnt_dump_buffer_array *dump_bufs); +void kbase_hwcnt_dump_buffer_array_free(struct kbase_hwcnt_dump_buffer_array *dump_bufs); /** * kbase_hwcnt_dump_buffer_block_instance() - Get the pointer to a block @@ -961,9 +921,8 @@ void kbase_hwcnt_dump_buffer_array_free( * * Return: u64* to the dump buffer for the block instance. */ -static inline u64 *kbase_hwcnt_dump_buffer_block_instance( - const struct kbase_hwcnt_dump_buffer *buf, size_t grp, size_t blk, - size_t blk_inst) +static inline u64 *kbase_hwcnt_dump_buffer_block_instance(const struct kbase_hwcnt_dump_buffer *buf, + size_t grp, size_t blk, size_t blk_inst) { if (WARN_ON(!buf) || WARN_ON(!buf->dump_buf)) return NULL; @@ -975,10 +934,7 @@ static inline u64 *kbase_hwcnt_dump_buffer_block_instance( return buf->dump_buf + buf->metadata->grp_metadata[grp].dump_buf_index + buf->metadata->grp_metadata[grp].blk_metadata[blk].dump_buf_index + - (buf->metadata->grp_metadata[grp] - .blk_metadata[blk] - .dump_buf_stride * - blk_inst); + (buf->metadata->grp_metadata[grp].blk_metadata[blk].dump_buf_stride * blk_inst); } /** @@ -990,9 +946,8 @@ static inline u64 *kbase_hwcnt_dump_buffer_block_instance( * * The dst and dst_enable_map MUST have been created from the same metadata. */ -void kbase_hwcnt_dump_buffer_zero( - struct kbase_hwcnt_dump_buffer *dst, - const struct kbase_hwcnt_enable_map *dst_enable_map); +void kbase_hwcnt_dump_buffer_zero(struct kbase_hwcnt_dump_buffer *dst, + const struct kbase_hwcnt_enable_map *dst_enable_map); /** * kbase_hwcnt_dump_buffer_block_zero() - Zero all values in a block. @@ -1000,8 +955,7 @@ void kbase_hwcnt_dump_buffer_zero( * kbase_hwcnt_dump_buffer_block_instance. * @val_cnt: Number of values in the block. */ -static inline void kbase_hwcnt_dump_buffer_block_zero(u64 *dst_blk, - size_t val_cnt) +static inline void kbase_hwcnt_dump_buffer_block_zero(u64 *dst_blk, size_t val_cnt) { if (WARN_ON(!dst_blk)) return; @@ -1017,8 +971,7 @@ static inline void kbase_hwcnt_dump_buffer_block_zero(u64 *dst_blk, * Slower than the non-strict variant. 
* @dst: Non-NULL pointer to dump buffer. */ -void kbase_hwcnt_dump_buffer_zero_strict( - struct kbase_hwcnt_dump_buffer *dst); +void kbase_hwcnt_dump_buffer_zero_strict(struct kbase_hwcnt_dump_buffer *dst); /** * kbase_hwcnt_dump_buffer_zero_non_enabled() - Zero all non-enabled values in @@ -1031,9 +984,8 @@ void kbase_hwcnt_dump_buffer_zero_strict( * * The dst and dst_enable_map MUST have been created from the same metadata. */ -void kbase_hwcnt_dump_buffer_zero_non_enabled( - struct kbase_hwcnt_dump_buffer *dst, - const struct kbase_hwcnt_enable_map *dst_enable_map); +void kbase_hwcnt_dump_buffer_zero_non_enabled(struct kbase_hwcnt_dump_buffer *dst, + const struct kbase_hwcnt_enable_map *dst_enable_map); /** * kbase_hwcnt_dump_buffer_block_zero_non_enabled() - Zero all non-enabled @@ -1047,9 +999,8 @@ void kbase_hwcnt_dump_buffer_zero_non_enabled( * kbase_hwcnt_enable_map_block_instance. * @val_cnt: Number of values in the block. */ -static inline void -kbase_hwcnt_dump_buffer_block_zero_non_enabled(u64 *dst_blk, const u64 *blk_em, - size_t val_cnt) +static inline void kbase_hwcnt_dump_buffer_block_zero_non_enabled(u64 *dst_blk, const u64 *blk_em, + size_t val_cnt) { size_t val; @@ -1073,10 +1024,9 @@ kbase_hwcnt_dump_buffer_block_zero_non_enabled(u64 *dst_blk, const u64 *blk_em, * The dst, src, and dst_enable_map MUST have been created from the same * metadata. */ -void kbase_hwcnt_dump_buffer_copy( - struct kbase_hwcnt_dump_buffer *dst, - const struct kbase_hwcnt_dump_buffer *src, - const struct kbase_hwcnt_enable_map *dst_enable_map); +void kbase_hwcnt_dump_buffer_copy(struct kbase_hwcnt_dump_buffer *dst, + const struct kbase_hwcnt_dump_buffer *src, + const struct kbase_hwcnt_enable_map *dst_enable_map); /** * kbase_hwcnt_dump_buffer_block_copy() - Copy all block values from src to dst. @@ -1086,8 +1036,7 @@ void kbase_hwcnt_dump_buffer_copy( * kbase_hwcnt_dump_buffer_block_instance. * @val_cnt: Number of values in the block. */ -static inline void kbase_hwcnt_dump_buffer_block_copy(u64 *dst_blk, - const u64 *src_blk, +static inline void kbase_hwcnt_dump_buffer_block_copy(u64 *dst_blk, const u64 *src_blk, size_t val_cnt) { if (WARN_ON(!dst_blk) || WARN_ON(!src_blk)) @@ -1113,10 +1062,9 @@ static inline void kbase_hwcnt_dump_buffer_block_copy(u64 *dst_blk, * The dst, src, and dst_enable_map MUST have been created from the same * metadata. */ -void kbase_hwcnt_dump_buffer_copy_strict( - struct kbase_hwcnt_dump_buffer *dst, - const struct kbase_hwcnt_dump_buffer *src, - const struct kbase_hwcnt_enable_map *dst_enable_map); +void kbase_hwcnt_dump_buffer_copy_strict(struct kbase_hwcnt_dump_buffer *dst, + const struct kbase_hwcnt_dump_buffer *src, + const struct kbase_hwcnt_enable_map *dst_enable_map); /** * kbase_hwcnt_dump_buffer_block_copy_strict() - Copy all enabled block values @@ -1134,10 +1082,8 @@ void kbase_hwcnt_dump_buffer_copy_strict( * * After the copy, any disabled values in dst will be zero. 
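The per-value helpers earlier in this hunk all share one indexing convention: value v of a block instance lives at bit (v % 64) of word (v / 64) of its enable-map bitfields. A small standalone sketch of that convention, with hypothetical names:

#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

#define BITFIELD_BITS 64

static bool value_enabled(const uint64_t *bitfld, size_t val_idx)
{
	return (bitfld[val_idx / BITFIELD_BITS] >> (val_idx % BITFIELD_BITS)) & 1u;
}

static void value_enable(uint64_t *bitfld, size_t val_idx)
{
	bitfld[val_idx / BITFIELD_BITS] |= 1ull << (val_idx % BITFIELD_BITS);
}

static void value_disable(uint64_t *bitfld, size_t val_idx)
{
	bitfld[val_idx / BITFIELD_BITS] &= ~(1ull << (val_idx % BITFIELD_BITS));
}

The strict copy implemented just below applies the same per-value test to decide whether dst gets src's value or zero.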
*/ -static inline void kbase_hwcnt_dump_buffer_block_copy_strict(u64 *dst_blk, - const u64 *src_blk, - const u64 *blk_em, - size_t val_cnt) +static inline void kbase_hwcnt_dump_buffer_block_copy_strict(u64 *dst_blk, const u64 *src_blk, + const u64 *blk_em, size_t val_cnt) { size_t val; @@ -1145,8 +1091,7 @@ static inline void kbase_hwcnt_dump_buffer_block_copy_strict(u64 *dst_blk, return; for (val = 0; val < val_cnt; val++) { - bool val_enabled = kbase_hwcnt_enable_map_block_value_enabled( - blk_em, val); + bool val_enabled = kbase_hwcnt_enable_map_block_value_enabled(blk_em, val); dst_blk[val] = val_enabled ? src_blk[val] : 0; } @@ -1165,10 +1110,9 @@ static inline void kbase_hwcnt_dump_buffer_block_copy_strict(u64 *dst_blk, * The dst, src, and dst_enable_map MUST have been created from the same * metadata. */ -void kbase_hwcnt_dump_buffer_accumulate( - struct kbase_hwcnt_dump_buffer *dst, - const struct kbase_hwcnt_dump_buffer *src, - const struct kbase_hwcnt_enable_map *dst_enable_map); +void kbase_hwcnt_dump_buffer_accumulate(struct kbase_hwcnt_dump_buffer *dst, + const struct kbase_hwcnt_dump_buffer *src, + const struct kbase_hwcnt_enable_map *dst_enable_map); /** * kbase_hwcnt_dump_buffer_block_accumulate() - Copy all block headers and @@ -1181,10 +1125,8 @@ void kbase_hwcnt_dump_buffer_accumulate( * @hdr_cnt: Number of headers in the block. * @ctr_cnt: Number of counters in the block. */ -static inline void kbase_hwcnt_dump_buffer_block_accumulate(u64 *dst_blk, - const u64 *src_blk, - size_t hdr_cnt, - size_t ctr_cnt) +static inline void kbase_hwcnt_dump_buffer_block_accumulate(u64 *dst_blk, const u64 *src_blk, + size_t hdr_cnt, size_t ctr_cnt) { size_t ctr; @@ -1219,10 +1161,9 @@ static inline void kbase_hwcnt_dump_buffer_block_accumulate(u64 *dst_blk, * The dst, src, and dst_enable_map MUST have been created from the same * metadata. */ -void kbase_hwcnt_dump_buffer_accumulate_strict( - struct kbase_hwcnt_dump_buffer *dst, - const struct kbase_hwcnt_dump_buffer *src, - const struct kbase_hwcnt_enable_map *dst_enable_map); +void kbase_hwcnt_dump_buffer_accumulate_strict(struct kbase_hwcnt_dump_buffer *dst, + const struct kbase_hwcnt_dump_buffer *src, + const struct kbase_hwcnt_enable_map *dst_enable_map); /** * kbase_hwcnt_dump_buffer_block_accumulate_strict() - Copy all enabled block @@ -1241,21 +1182,19 @@ void kbase_hwcnt_dump_buffer_accumulate_strict( * @hdr_cnt: Number of headers in the block. * @ctr_cnt: Number of counters in the block. */ -static inline void kbase_hwcnt_dump_buffer_block_accumulate_strict( - u64 *dst_blk, const u64 *src_blk, const u64 *blk_em, size_t hdr_cnt, - size_t ctr_cnt) +static inline void kbase_hwcnt_dump_buffer_block_accumulate_strict(u64 *dst_blk, const u64 *src_blk, + const u64 *blk_em, + size_t hdr_cnt, size_t ctr_cnt) { size_t ctr; if (WARN_ON(!dst_blk) || WARN_ON(!src_blk)) return; - kbase_hwcnt_dump_buffer_block_copy_strict( - dst_blk, src_blk, blk_em, hdr_cnt); + kbase_hwcnt_dump_buffer_block_copy_strict(dst_blk, src_blk, blk_em, hdr_cnt); for (ctr = hdr_cnt; ctr < ctr_cnt + hdr_cnt; ctr++) { - bool ctr_enabled = kbase_hwcnt_enable_map_block_value_enabled( - blk_em, ctr); + bool ctr_enabled = kbase_hwcnt_enable_map_block_value_enabled(blk_em, ctr); if (ctr_enabled) dst_blk[ctr] += src_blk[ctr]; @@ -1270,8 +1209,7 @@ static inline void kbase_hwcnt_dump_buffer_block_accumulate_strict( * @md: Non-NULL pointer to metadata. * @clk: size_t variable used as clock iterator. 
*/ -#define kbase_hwcnt_metadata_for_each_clock(md, clk) \ - for ((clk) = 0; (clk) < (md)->clk_cnt; (clk)++) +#define kbase_hwcnt_metadata_for_each_clock(md, clk) for ((clk) = 0; (clk) < (md)->clk_cnt; (clk)++) /** * kbase_hwcnt_clk_enable_map_enabled() - Check if the given index is enabled @@ -1281,8 +1219,7 @@ static inline void kbase_hwcnt_dump_buffer_block_accumulate_strict( * * Return: true if the index of the clock domain is enabled, else false. */ -static inline bool kbase_hwcnt_clk_enable_map_enabled( - const u64 clk_enable_map, const size_t index) +static inline bool kbase_hwcnt_clk_enable_map_enabled(const u64 clk_enable_map, const size_t index) { if (WARN_ON(index >= 64)) return false; diff --git a/mali_kbase/mali_kbase_hwcnt_virtualizer.c b/mali_kbase/hwcnt/mali_kbase_hwcnt_virtualizer.c index 52ecb7b..d618764 100644 --- a/mali_kbase/mali_kbase_hwcnt_virtualizer.c +++ b/mali_kbase/hwcnt/mali_kbase_hwcnt_virtualizer.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2018, 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2018, 2020-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -19,10 +19,10 @@ * */ -#include "mali_kbase_hwcnt_virtualizer.h" -#include "mali_kbase_hwcnt_accumulator.h" -#include "mali_kbase_hwcnt_context.h" -#include "mali_kbase_hwcnt_types.h" +#include "hwcnt/mali_kbase_hwcnt_virtualizer.h" +#include "hwcnt/mali_kbase_hwcnt_accumulator.h" +#include "hwcnt/mali_kbase_hwcnt_context.h" +#include "hwcnt/mali_kbase_hwcnt_types.h" #include <linux/mutex.h> #include <linux/slab.h> @@ -75,8 +75,8 @@ struct kbase_hwcnt_virtualizer_client { u64 ts_start_ns; }; -const struct kbase_hwcnt_metadata *kbase_hwcnt_virtualizer_metadata( - struct kbase_hwcnt_virtualizer *hvirt) +const struct kbase_hwcnt_metadata * +kbase_hwcnt_virtualizer_metadata(struct kbase_hwcnt_virtualizer *hvirt) { if (!hvirt) return NULL; @@ -90,8 +90,7 @@ const struct kbase_hwcnt_metadata *kbase_hwcnt_virtualizer_metadata( * * Will safely free a client in any partial state of construction. */ -static void kbasep_hwcnt_virtualizer_client_free( - struct kbase_hwcnt_virtualizer_client *hvcli) +static void kbasep_hwcnt_virtualizer_client_free(struct kbase_hwcnt_virtualizer_client *hvcli) { if (!hvcli) return; @@ -110,9 +109,8 @@ static void kbasep_hwcnt_virtualizer_client_free( * * Return: 0 on success, else error code. */ -static int kbasep_hwcnt_virtualizer_client_alloc( - const struct kbase_hwcnt_metadata *metadata, - struct kbase_hwcnt_virtualizer_client **out_hvcli) +static int kbasep_hwcnt_virtualizer_client_alloc(const struct kbase_hwcnt_metadata *metadata, + struct kbase_hwcnt_virtualizer_client **out_hvcli) { int errcode; struct kbase_hwcnt_virtualizer_client *hvcli = NULL; @@ -145,9 +143,9 @@ error: * @hvcli: Non-NULL pointer to virtualizer client. * @dump_buf: Non-NULL pointer to dump buffer to accumulate from. 
*/ -static void kbasep_hwcnt_virtualizer_client_accumulate( - struct kbase_hwcnt_virtualizer_client *hvcli, - const struct kbase_hwcnt_dump_buffer *dump_buf) +static void +kbasep_hwcnt_virtualizer_client_accumulate(struct kbase_hwcnt_virtualizer_client *hvcli, + const struct kbase_hwcnt_dump_buffer *dump_buf) { WARN_ON(!hvcli); WARN_ON(!dump_buf); @@ -155,12 +153,10 @@ static void kbasep_hwcnt_virtualizer_client_accumulate( if (hvcli->has_accum) { /* If already some accumulation, accumulate */ - kbase_hwcnt_dump_buffer_accumulate( - &hvcli->accum_buf, dump_buf, &hvcli->enable_map); + kbase_hwcnt_dump_buffer_accumulate(&hvcli->accum_buf, dump_buf, &hvcli->enable_map); } else { /* If no accumulation, copy */ - kbase_hwcnt_dump_buffer_copy( - &hvcli->accum_buf, dump_buf, &hvcli->enable_map); + kbase_hwcnt_dump_buffer_copy(&hvcli->accum_buf, dump_buf, &hvcli->enable_map); } hvcli->has_accum = true; } @@ -173,8 +169,7 @@ static void kbasep_hwcnt_virtualizer_client_accumulate( * * Will safely terminate the accumulator in any partial state of initialisation. */ -static void kbasep_hwcnt_virtualizer_accumulator_term( - struct kbase_hwcnt_virtualizer *hvirt) +static void kbasep_hwcnt_virtualizer_accumulator_term(struct kbase_hwcnt_virtualizer *hvirt) { WARN_ON(!hvirt); lockdep_assert_held(&hvirt->lock); @@ -194,8 +189,7 @@ static void kbasep_hwcnt_virtualizer_accumulator_term( * * Return: 0 on success, else error code. */ -static int kbasep_hwcnt_virtualizer_accumulator_init( - struct kbase_hwcnt_virtualizer *hvirt) +static int kbasep_hwcnt_virtualizer_accumulator_init(struct kbase_hwcnt_virtualizer *hvirt) { int errcode; @@ -204,18 +198,15 @@ static int kbasep_hwcnt_virtualizer_accumulator_init( WARN_ON(hvirt->client_count); WARN_ON(hvirt->accum); - errcode = kbase_hwcnt_accumulator_acquire( - hvirt->hctx, &hvirt->accum); + errcode = kbase_hwcnt_accumulator_acquire(hvirt->hctx, &hvirt->accum); if (errcode) goto error; - errcode = kbase_hwcnt_enable_map_alloc( - hvirt->metadata, &hvirt->scratch_map); + errcode = kbase_hwcnt_enable_map_alloc(hvirt->metadata, &hvirt->scratch_map); if (errcode) goto error; - errcode = kbase_hwcnt_dump_buffer_alloc( - hvirt->metadata, &hvirt->scratch_buf); + errcode = kbase_hwcnt_dump_buffer_alloc(hvirt->metadata, &hvirt->scratch_buf); if (errcode) goto error; @@ -234,10 +225,9 @@ error: * * Return: 0 on success, else error code. 
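kbasep_hwcnt_virtualizer_client_accumulate, reflowed above, either adds a new dump into the client's accumulation buffer or copies it in wholesale on the first dump, tracking the state with has_accum. A standalone sketch of that accumulate-or-copy pattern over a plain counter array (all names hypothetical):

#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

struct vcli_sketch {
	bool has_accum;
	uint64_t accum[64]; /* hypothetical fixed-size counter buffer; keep n <= 64 */
};

static void client_accumulate(struct vcli_sketch *cli, const uint64_t *dump, size_t n)
{
	size_t i;

	/* First dump copies, later dumps accumulate. */
	for (i = 0; i < n; i++)
		cli->accum[i] = cli->has_accum ? cli->accum[i] + dump[i] : dump[i];
	cli->has_accum = true;
}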
*/ -static int kbasep_hwcnt_virtualizer_client_add( - struct kbase_hwcnt_virtualizer *hvirt, - struct kbase_hwcnt_virtualizer_client *hvcli, - const struct kbase_hwcnt_enable_map *enable_map) +static int kbasep_hwcnt_virtualizer_client_add(struct kbase_hwcnt_virtualizer *hvirt, + struct kbase_hwcnt_virtualizer_client *hvcli, + const struct kbase_hwcnt_enable_map *enable_map) { int errcode = 0; u64 ts_start_ns; @@ -258,28 +248,25 @@ static int kbasep_hwcnt_virtualizer_client_add( if (hvirt->client_count == 1) { /* First client, so just pass the enable map onwards as is */ - errcode = kbase_hwcnt_accumulator_set_counters(hvirt->accum, - enable_map, &ts_start_ns, &ts_end_ns, NULL); + errcode = kbase_hwcnt_accumulator_set_counters(hvirt->accum, enable_map, + &ts_start_ns, &ts_end_ns, NULL); } else { struct kbase_hwcnt_virtualizer_client *pos; /* Make the scratch enable map the union of all enable maps */ - kbase_hwcnt_enable_map_copy( - &hvirt->scratch_map, enable_map); - list_for_each_entry(pos, &hvirt->clients, node) - kbase_hwcnt_enable_map_union( - &hvirt->scratch_map, &pos->enable_map); + kbase_hwcnt_enable_map_copy(&hvirt->scratch_map, enable_map); + list_for_each_entry (pos, &hvirt->clients, node) + kbase_hwcnt_enable_map_union(&hvirt->scratch_map, &pos->enable_map); /* Set the counters with the new union enable map */ - errcode = kbase_hwcnt_accumulator_set_counters(hvirt->accum, - &hvirt->scratch_map, - &ts_start_ns, &ts_end_ns, - &hvirt->scratch_buf); + errcode = kbase_hwcnt_accumulator_set_counters(hvirt->accum, &hvirt->scratch_map, + &ts_start_ns, &ts_end_ns, + &hvirt->scratch_buf); /* Accumulate into only existing clients' accumulation bufs */ if (!errcode) - list_for_each_entry(pos, &hvirt->clients, node) - kbasep_hwcnt_virtualizer_client_accumulate( - pos, &hvirt->scratch_buf); + list_for_each_entry (pos, &hvirt->clients, node) + kbasep_hwcnt_virtualizer_client_accumulate(pos, + &hvirt->scratch_buf); } if (errcode) goto error; @@ -307,9 +294,8 @@ error: * @hvirt: Non-NULL pointer to the hardware counter virtualizer. * @hvcli: Non-NULL pointer to the virtualizer client to remove. 
*/ -static void kbasep_hwcnt_virtualizer_client_remove( - struct kbase_hwcnt_virtualizer *hvirt, - struct kbase_hwcnt_virtualizer_client *hvcli) +static void kbasep_hwcnt_virtualizer_client_remove(struct kbase_hwcnt_virtualizer *hvirt, + struct kbase_hwcnt_virtualizer_client *hvcli) { int errcode = 0; u64 ts_start_ns; @@ -329,22 +315,21 @@ static void kbasep_hwcnt_virtualizer_client_remove( struct kbase_hwcnt_virtualizer_client *pos; /* Make the scratch enable map the union of all enable maps */ kbase_hwcnt_enable_map_disable_all(&hvirt->scratch_map); - list_for_each_entry(pos, &hvirt->clients, node) - kbase_hwcnt_enable_map_union( - &hvirt->scratch_map, &pos->enable_map); + list_for_each_entry (pos, &hvirt->clients, node) + kbase_hwcnt_enable_map_union(&hvirt->scratch_map, &pos->enable_map); /* Set the counters with the new union enable map */ - errcode = kbase_hwcnt_accumulator_set_counters(hvirt->accum, - &hvirt->scratch_map, - &ts_start_ns, &ts_end_ns, - &hvirt->scratch_buf); + errcode = kbase_hwcnt_accumulator_set_counters(hvirt->accum, &hvirt->scratch_map, + &ts_start_ns, &ts_end_ns, + &hvirt->scratch_buf); /* Accumulate into remaining clients' accumulation bufs */ - if (!errcode) - list_for_each_entry(pos, &hvirt->clients, node) - kbasep_hwcnt_virtualizer_client_accumulate( - pos, &hvirt->scratch_buf); + if (!errcode) { + list_for_each_entry (pos, &hvirt->clients, node) + kbasep_hwcnt_virtualizer_client_accumulate(pos, + &hvirt->scratch_buf); - /* Store the most recent dump time for rate limiting */ - hvirt->ts_last_dump_ns = ts_end_ns; + /* Store the most recent dump time for rate limiting */ + hvirt->ts_last_dump_ns = ts_end_ns; + } } WARN_ON(errcode); } @@ -370,11 +355,8 @@ static void kbasep_hwcnt_virtualizer_client_remove( * Return: 0 on success or error code. 
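Both client_add and client_remove above rebuild the accumulator's scratch enable map as the union (bitwise OR) of every remaining client's map before calling set_counters. A minimal sketch of that union step over a hypothetical flat representation:

#include <stdint.h>
#include <stddef.h>

/* Hypothetical layout: each client's enable map is an array of 64-bit words
 * of the same length "words".
 */
static void union_enable_maps(uint64_t *scratch, const uint64_t *const *client_maps,
			      size_t client_cnt, size_t words)
{
	size_t c, w;

	for (w = 0; w < words; w++)
		scratch[w] = 0;
	for (c = 0; c < client_cnt; c++)
		for (w = 0; w < words; w++)
			scratch[w] |= client_maps[c][w];
}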
*/ static int kbasep_hwcnt_virtualizer_client_set_counters( - struct kbase_hwcnt_virtualizer *hvirt, - struct kbase_hwcnt_virtualizer_client *hvcli, - const struct kbase_hwcnt_enable_map *enable_map, - u64 *ts_start_ns, - u64 *ts_end_ns, + struct kbase_hwcnt_virtualizer *hvirt, struct kbase_hwcnt_virtualizer_client *hvcli, + const struct kbase_hwcnt_enable_map *enable_map, u64 *ts_start_ns, u64 *ts_end_ns, struct kbase_hwcnt_dump_buffer *dump_buf) { int errcode; @@ -391,32 +373,29 @@ static int kbasep_hwcnt_virtualizer_client_set_counters( /* Make the scratch enable map the union of all enable maps */ kbase_hwcnt_enable_map_copy(&hvirt->scratch_map, enable_map); - list_for_each_entry(pos, &hvirt->clients, node) + list_for_each_entry (pos, &hvirt->clients, node) /* Ignore the enable map of the selected client */ if (pos != hvcli) - kbase_hwcnt_enable_map_union( - &hvirt->scratch_map, &pos->enable_map); + kbase_hwcnt_enable_map_union(&hvirt->scratch_map, &pos->enable_map); /* Set the counters with the new union enable map */ - errcode = kbase_hwcnt_accumulator_set_counters(hvirt->accum, - &hvirt->scratch_map, ts_start_ns, ts_end_ns, - &hvirt->scratch_buf); + errcode = kbase_hwcnt_accumulator_set_counters(hvirt->accum, &hvirt->scratch_map, + ts_start_ns, ts_end_ns, &hvirt->scratch_buf); if (errcode) return errcode; /* Accumulate into all accumulation bufs except the selected client's */ - list_for_each_entry(pos, &hvirt->clients, node) + list_for_each_entry (pos, &hvirt->clients, node) if (pos != hvcli) - kbasep_hwcnt_virtualizer_client_accumulate( - pos, &hvirt->scratch_buf); + kbasep_hwcnt_virtualizer_client_accumulate(pos, &hvirt->scratch_buf); /* Finally, write into the dump buf */ if (dump_buf) { const struct kbase_hwcnt_dump_buffer *src = &hvirt->scratch_buf; if (hvcli->has_accum) { - kbase_hwcnt_dump_buffer_accumulate( - &hvcli->accum_buf, src, &hvcli->enable_map); + kbase_hwcnt_dump_buffer_accumulate(&hvcli->accum_buf, src, + &hvcli->enable_map); src = &hvcli->accum_buf; } kbase_hwcnt_dump_buffer_copy(dump_buf, src, &hvcli->enable_map); @@ -436,12 +415,10 @@ static int kbasep_hwcnt_virtualizer_client_set_counters( return errcode; } -int kbase_hwcnt_virtualizer_client_set_counters( - struct kbase_hwcnt_virtualizer_client *hvcli, - const struct kbase_hwcnt_enable_map *enable_map, - u64 *ts_start_ns, - u64 *ts_end_ns, - struct kbase_hwcnt_dump_buffer *dump_buf) +int kbase_hwcnt_virtualizer_client_set_counters(struct kbase_hwcnt_virtualizer_client *hvcli, + const struct kbase_hwcnt_enable_map *enable_map, + u64 *ts_start_ns, u64 *ts_end_ns, + struct kbase_hwcnt_dump_buffer *dump_buf) { int errcode; struct kbase_hwcnt_virtualizer *hvirt; @@ -464,14 +441,12 @@ int kbase_hwcnt_virtualizer_client_set_counters( * to the accumulator, saving a fair few copies and * accumulations. 
*/ - errcode = kbase_hwcnt_accumulator_set_counters( - hvirt->accum, enable_map, - ts_start_ns, ts_end_ns, dump_buf); + errcode = kbase_hwcnt_accumulator_set_counters(hvirt->accum, enable_map, + ts_start_ns, ts_end_ns, dump_buf); if (!errcode) { /* Update the selected client's enable map */ - kbase_hwcnt_enable_map_copy( - &hvcli->enable_map, enable_map); + kbase_hwcnt_enable_map_copy(&hvcli->enable_map, enable_map); /* Fix up the timestamps */ *ts_start_ns = hvcli->ts_start_ns; @@ -483,8 +458,7 @@ int kbase_hwcnt_virtualizer_client_set_counters( } else { /* Otherwise, do the full virtualize */ errcode = kbasep_hwcnt_virtualizer_client_set_counters( - hvirt, hvcli, enable_map, - ts_start_ns, ts_end_ns, dump_buf); + hvirt, hvcli, enable_map, ts_start_ns, ts_end_ns, dump_buf); } mutex_unlock(&hvirt->lock); @@ -507,12 +481,10 @@ int kbase_hwcnt_virtualizer_client_set_counters( * * Return: 0 on success or error code. */ -static int kbasep_hwcnt_virtualizer_client_dump( - struct kbase_hwcnt_virtualizer *hvirt, - struct kbase_hwcnt_virtualizer_client *hvcli, - u64 *ts_start_ns, - u64 *ts_end_ns, - struct kbase_hwcnt_dump_buffer *dump_buf) +static int kbasep_hwcnt_virtualizer_client_dump(struct kbase_hwcnt_virtualizer *hvirt, + struct kbase_hwcnt_virtualizer_client *hvcli, + u64 *ts_start_ns, u64 *ts_end_ns, + struct kbase_hwcnt_dump_buffer *dump_buf) { int errcode; struct kbase_hwcnt_virtualizer_client *pos; @@ -525,24 +497,23 @@ static int kbasep_hwcnt_virtualizer_client_dump( lockdep_assert_held(&hvirt->lock); /* Perform the dump */ - errcode = kbase_hwcnt_accumulator_dump(hvirt->accum, - ts_start_ns, ts_end_ns, &hvirt->scratch_buf); + errcode = kbase_hwcnt_accumulator_dump(hvirt->accum, ts_start_ns, ts_end_ns, + &hvirt->scratch_buf); if (errcode) return errcode; /* Accumulate into all accumulation bufs except the selected client's */ - list_for_each_entry(pos, &hvirt->clients, node) + list_for_each_entry (pos, &hvirt->clients, node) if (pos != hvcli) - kbasep_hwcnt_virtualizer_client_accumulate( - pos, &hvirt->scratch_buf); + kbasep_hwcnt_virtualizer_client_accumulate(pos, &hvirt->scratch_buf); /* Finally, write into the dump buf */ if (dump_buf) { const struct kbase_hwcnt_dump_buffer *src = &hvirt->scratch_buf; if (hvcli->has_accum) { - kbase_hwcnt_dump_buffer_accumulate( - &hvcli->accum_buf, src, &hvcli->enable_map); + kbase_hwcnt_dump_buffer_accumulate(&hvcli->accum_buf, src, + &hvcli->enable_map); src = &hvcli->accum_buf; } kbase_hwcnt_dump_buffer_copy(dump_buf, src, &hvcli->enable_map); @@ -578,11 +549,8 @@ static int kbasep_hwcnt_virtualizer_client_dump( * Return: 0 on success or error code. 
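As the comment in the hunk above notes, kbase_hwcnt_virtualizer_client_set_counters (and client_dump further down) keeps a fast path: roughly, when there is a single client with nothing accumulated, the request is passed straight to the accumulator, skipping the scratch-map union and the extra copies. A sketch of that dispatch, with hypothetical names and callbacks:

#include <stdbool.h>

/* Hypothetical dispatch mirroring the single-client fast path. */
static int client_dump_sketch(int client_count, bool has_accum,
			      int (*fast_dump)(void), int (*full_virtualize)(void))
{
	if (client_count == 1 && !has_accum)
		return fast_dump();       /* pass straight through to the accumulator */
	return full_virtualize();         /* union maps, accumulate, then copy out */
}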
*/ static int kbasep_hwcnt_virtualizer_client_dump_rate_limited( - struct kbase_hwcnt_virtualizer *hvirt, - struct kbase_hwcnt_virtualizer_client *hvcli, - u64 *ts_start_ns, - u64 *ts_end_ns, - struct kbase_hwcnt_dump_buffer *dump_buf) + struct kbase_hwcnt_virtualizer *hvirt, struct kbase_hwcnt_virtualizer_client *hvcli, + u64 *ts_start_ns, u64 *ts_end_ns, struct kbase_hwcnt_dump_buffer *dump_buf) { bool rate_limited = true; @@ -602,10 +570,8 @@ static int kbasep_hwcnt_virtualizer_client_dump_rate_limited( */ rate_limited = false; } else { - const u64 ts_ns = - kbase_hwcnt_accumulator_timestamp_ns(hvirt->accum); - const u64 time_since_last_dump_ns = - ts_ns - hvirt->ts_last_dump_ns; + const u64 ts_ns = kbase_hwcnt_accumulator_timestamp_ns(hvirt->accum); + const u64 time_since_last_dump_ns = ts_ns - hvirt->ts_last_dump_ns; /* Dump period equals or exceeds the threshold */ if (time_since_last_dump_ns >= hvirt->dump_threshold_ns) @@ -613,8 +579,8 @@ static int kbasep_hwcnt_virtualizer_client_dump_rate_limited( } if (!rate_limited) - return kbasep_hwcnt_virtualizer_client_dump( - hvirt, hvcli, ts_start_ns, ts_end_ns, dump_buf); + return kbasep_hwcnt_virtualizer_client_dump(hvirt, hvcli, ts_start_ns, ts_end_ns, + dump_buf); /* If we've gotten this far, the client must have something accumulated * otherwise it is a logic error @@ -622,8 +588,7 @@ static int kbasep_hwcnt_virtualizer_client_dump_rate_limited( WARN_ON(!hvcli->has_accum); if (dump_buf) - kbase_hwcnt_dump_buffer_copy( - dump_buf, &hvcli->accum_buf, &hvcli->enable_map); + kbase_hwcnt_dump_buffer_copy(dump_buf, &hvcli->accum_buf, &hvcli->enable_map); hvcli->has_accum = false; *ts_start_ns = hvcli->ts_start_ns; @@ -633,11 +598,9 @@ static int kbasep_hwcnt_virtualizer_client_dump_rate_limited( return 0; } -int kbase_hwcnt_virtualizer_client_dump( - struct kbase_hwcnt_virtualizer_client *hvcli, - u64 *ts_start_ns, - u64 *ts_end_ns, - struct kbase_hwcnt_dump_buffer *dump_buf) +int kbase_hwcnt_virtualizer_client_dump(struct kbase_hwcnt_virtualizer_client *hvcli, + u64 *ts_start_ns, u64 *ts_end_ns, + struct kbase_hwcnt_dump_buffer *dump_buf) { int errcode; struct kbase_hwcnt_virtualizer *hvirt; @@ -659,8 +622,8 @@ int kbase_hwcnt_virtualizer_client_dump( * to the accumulator, saving a fair few copies and * accumulations. 
*/ - errcode = kbase_hwcnt_accumulator_dump( - hvirt->accum, ts_start_ns, ts_end_ns, dump_buf); + errcode = kbase_hwcnt_accumulator_dump(hvirt->accum, ts_start_ns, ts_end_ns, + dump_buf); if (!errcode) { /* Fix up the timestamps */ @@ -681,20 +644,17 @@ int kbase_hwcnt_virtualizer_client_dump( return errcode; } -int kbase_hwcnt_virtualizer_client_create( - struct kbase_hwcnt_virtualizer *hvirt, - const struct kbase_hwcnt_enable_map *enable_map, - struct kbase_hwcnt_virtualizer_client **out_hvcli) +int kbase_hwcnt_virtualizer_client_create(struct kbase_hwcnt_virtualizer *hvirt, + const struct kbase_hwcnt_enable_map *enable_map, + struct kbase_hwcnt_virtualizer_client **out_hvcli) { int errcode; struct kbase_hwcnt_virtualizer_client *hvcli; - if (!hvirt || !enable_map || !out_hvcli || - (enable_map->metadata != hvirt->metadata)) + if (!hvirt || !enable_map || !out_hvcli || (enable_map->metadata != hvirt->metadata)) return -EINVAL; - errcode = kbasep_hwcnt_virtualizer_client_alloc( - hvirt->metadata, &hvcli); + errcode = kbasep_hwcnt_virtualizer_client_alloc(hvirt->metadata, &hvcli); if (errcode) return errcode; @@ -713,8 +673,7 @@ int kbase_hwcnt_virtualizer_client_create( return 0; } -void kbase_hwcnt_virtualizer_client_destroy( - struct kbase_hwcnt_virtualizer_client *hvcli) +void kbase_hwcnt_virtualizer_client_destroy(struct kbase_hwcnt_virtualizer_client *hvcli) { if (!hvcli) return; @@ -728,10 +687,8 @@ void kbase_hwcnt_virtualizer_client_destroy( kbasep_hwcnt_virtualizer_client_free(hvcli); } -int kbase_hwcnt_virtualizer_init( - struct kbase_hwcnt_context *hctx, - u64 dump_threshold_ns, - struct kbase_hwcnt_virtualizer **out_hvirt) +int kbase_hwcnt_virtualizer_init(struct kbase_hwcnt_context *hctx, u64 dump_threshold_ns, + struct kbase_hwcnt_virtualizer **out_hvirt) { struct kbase_hwcnt_virtualizer *virt; const struct kbase_hwcnt_metadata *metadata; @@ -758,8 +715,7 @@ int kbase_hwcnt_virtualizer_init( return 0; } -void kbase_hwcnt_virtualizer_term( - struct kbase_hwcnt_virtualizer *hvirt) +void kbase_hwcnt_virtualizer_term(struct kbase_hwcnt_virtualizer *hvirt) { if (!hvirt) return; @@ -768,7 +724,7 @@ void kbase_hwcnt_virtualizer_term( if (WARN_ON(hvirt->client_count != 0)) { struct kbase_hwcnt_virtualizer_client *pos, *n; - list_for_each_entry_safe(pos, n, &hvirt->clients, node) + list_for_each_entry_safe (pos, n, &hvirt->clients, node) kbase_hwcnt_virtualizer_client_destroy(pos); } diff --git a/mali_kbase/mali_kbase_hwcnt_virtualizer.h b/mali_kbase/hwcnt/mali_kbase_hwcnt_virtualizer.h index 08e8e9f..485ba74 100644 --- a/mali_kbase/mali_kbase_hwcnt_virtualizer.h +++ b/mali_kbase/hwcnt/mali_kbase_hwcnt_virtualizer.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2018, 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2018, 2020-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -51,17 +51,14 @@ struct kbase_hwcnt_dump_buffer; * * Return: 0 on success, else error code. */ -int kbase_hwcnt_virtualizer_init( - struct kbase_hwcnt_context *hctx, - u64 dump_threshold_ns, - struct kbase_hwcnt_virtualizer **out_hvirt); +int kbase_hwcnt_virtualizer_init(struct kbase_hwcnt_context *hctx, u64 dump_threshold_ns, + struct kbase_hwcnt_virtualizer **out_hvirt); /** * kbase_hwcnt_virtualizer_term - Terminate a hardware counter virtualizer. * @hvirt: Pointer to virtualizer to be terminated. 
*/ -void kbase_hwcnt_virtualizer_term( - struct kbase_hwcnt_virtualizer *hvirt); +void kbase_hwcnt_virtualizer_term(struct kbase_hwcnt_virtualizer *hvirt); /** * kbase_hwcnt_virtualizer_metadata - Get the hardware counter metadata used by @@ -71,8 +68,8 @@ void kbase_hwcnt_virtualizer_term( * * Return: Non-NULL pointer to metadata, or NULL on error. */ -const struct kbase_hwcnt_metadata *kbase_hwcnt_virtualizer_metadata( - struct kbase_hwcnt_virtualizer *hvirt); +const struct kbase_hwcnt_metadata * +kbase_hwcnt_virtualizer_metadata(struct kbase_hwcnt_virtualizer *hvirt); /** * kbase_hwcnt_virtualizer_client_create - Create a new virtualizer client. @@ -84,17 +81,15 @@ const struct kbase_hwcnt_metadata *kbase_hwcnt_virtualizer_metadata( * * Return: 0 on success, else error code. */ -int kbase_hwcnt_virtualizer_client_create( - struct kbase_hwcnt_virtualizer *hvirt, - const struct kbase_hwcnt_enable_map *enable_map, - struct kbase_hwcnt_virtualizer_client **out_hvcli); +int kbase_hwcnt_virtualizer_client_create(struct kbase_hwcnt_virtualizer *hvirt, + const struct kbase_hwcnt_enable_map *enable_map, + struct kbase_hwcnt_virtualizer_client **out_hvcli); /** * kbase_hwcnt_virtualizer_client_destroy() - Destroy a virtualizer client. * @hvcli: Pointer to the hardware counter client. */ -void kbase_hwcnt_virtualizer_client_destroy( - struct kbase_hwcnt_virtualizer_client *hvcli); +void kbase_hwcnt_virtualizer_client_destroy(struct kbase_hwcnt_virtualizer_client *hvcli); /** * kbase_hwcnt_virtualizer_client_set_counters - Perform a dump of the client's @@ -115,12 +110,10 @@ void kbase_hwcnt_virtualizer_client_destroy( * * Return: 0 on success or error code. */ -int kbase_hwcnt_virtualizer_client_set_counters( - struct kbase_hwcnt_virtualizer_client *hvcli, - const struct kbase_hwcnt_enable_map *enable_map, - u64 *ts_start_ns, - u64 *ts_end_ns, - struct kbase_hwcnt_dump_buffer *dump_buf); +int kbase_hwcnt_virtualizer_client_set_counters(struct kbase_hwcnt_virtualizer_client *hvcli, + const struct kbase_hwcnt_enable_map *enable_map, + u64 *ts_start_ns, u64 *ts_end_ns, + struct kbase_hwcnt_dump_buffer *dump_buf); /** * kbase_hwcnt_virtualizer_client_dump - Perform a dump of the client's @@ -136,11 +129,9 @@ int kbase_hwcnt_virtualizer_client_set_counters( * * Return: 0 on success or error code. */ -int kbase_hwcnt_virtualizer_client_dump( - struct kbase_hwcnt_virtualizer_client *hvcli, - u64 *ts_start_ns, - u64 *ts_end_ns, - struct kbase_hwcnt_dump_buffer *dump_buf); +int kbase_hwcnt_virtualizer_client_dump(struct kbase_hwcnt_virtualizer_client *hvcli, + u64 *ts_start_ns, u64 *ts_end_ns, + struct kbase_hwcnt_dump_buffer *dump_buf); /** * kbase_hwcnt_virtualizer_queue_work() - Queue hardware counter related async diff --git a/mali_kbase/mali_kbase_hwcnt_watchdog_if.h b/mali_kbase/hwcnt/mali_kbase_hwcnt_watchdog_if.h index 1873318..501c008 100644 --- a/mali_kbase/mali_kbase_hwcnt_watchdog_if.h +++ b/mali_kbase/hwcnt/mali_kbase_hwcnt_watchdog_if.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2021-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -50,17 +50,17 @@ typedef void kbase_hwcnt_watchdog_callback_fn(void *user_data); * * Return: 0 if the watchdog timer enabled successfully, error code otherwise. 
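The watchdog header touched below (mali_kbase_hwcnt_watchdog_if.h) exposes the timer as a virtual interface: enable/disable/modify function typedefs gathered as members of struct kbase_hwcnt_watchdog_interface, so a backend can be swapped behind the same calls. A generic sketch of that pattern, with hypothetical names and pointer-style typedefs:

#include <stdint.h>

struct wd_ctx; /* opaque backend context, hypothetical */

typedef int (*wd_enable_fn)(struct wd_ctx *ctx, uint32_t period_ms,
			    void (*callback)(void *), void *user_data);
typedef void (*wd_disable_fn)(struct wd_ctx *ctx);
typedef void (*wd_modify_fn)(struct wd_ctx *ctx, uint32_t delay_ms);

struct wd_interface {
	struct wd_ctx *ctx;
	wd_enable_fn enable;
	wd_disable_fn disable;
	wd_modify_fn modify;
};

/* Callers go through the interface, never the backend directly. */
static int wd_start(struct wd_interface *wif, uint32_t period_ms,
		    void (*cb)(void *), void *user_data)
{
	return wif->enable(wif->ctx, period_ms, cb, user_data);
}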
*/ -typedef int kbase_hwcnt_watchdog_enable_fn( - const struct kbase_hwcnt_watchdog_info *timer, u32 period_ms, - kbase_hwcnt_watchdog_callback_fn *callback, void *user_data); +typedef int kbase_hwcnt_watchdog_enable_fn(const struct kbase_hwcnt_watchdog_info *timer, + u32 period_ms, + kbase_hwcnt_watchdog_callback_fn *callback, + void *user_data); /** * typedef kbase_hwcnt_watchdog_disable_fn - Disable watchdog timer * * @timer: Non-NULL pointer to a watchdog timer interface context */ -typedef void -kbase_hwcnt_watchdog_disable_fn(const struct kbase_hwcnt_watchdog_info *timer); +typedef void kbase_hwcnt_watchdog_disable_fn(const struct kbase_hwcnt_watchdog_info *timer); /** * typedef kbase_hwcnt_watchdog_modify_fn - Modify watchdog timer's timeout @@ -68,9 +68,8 @@ kbase_hwcnt_watchdog_disable_fn(const struct kbase_hwcnt_watchdog_info *timer); * @timer: Non-NULL pointer to a watchdog timer interface context * @delay_ms: Watchdog timer expiration in milliseconds */ -typedef void -kbase_hwcnt_watchdog_modify_fn(const struct kbase_hwcnt_watchdog_info *timer, - u32 delay_ms); +typedef void kbase_hwcnt_watchdog_modify_fn(const struct kbase_hwcnt_watchdog_info *timer, + u32 delay_ms); /** * struct kbase_hwcnt_watchdog_interface - Hardware counter watchdog virtual interface. diff --git a/mali_kbase/mali_kbase_hwcnt_watchdog_if_timer.c b/mali_kbase/hwcnt/mali_kbase_hwcnt_watchdog_if_timer.c index 69b957a..4caa832 100644 --- a/mali_kbase/mali_kbase_hwcnt_watchdog_if_timer.c +++ b/mali_kbase/hwcnt/mali_kbase_hwcnt_watchdog_if_timer.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2021-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -20,8 +20,8 @@ */ #include "mali_kbase.h" -#include "mali_kbase_hwcnt_watchdog_if.h" -#include "mali_kbase_hwcnt_watchdog_if_timer.h" +#include "hwcnt/mali_kbase_hwcnt_watchdog_if.h" +#include "hwcnt/mali_kbase_hwcnt_watchdog_if_timer.h" #include <linux/workqueue.h> #include <linux/slab.h> @@ -62,12 +62,10 @@ static void kbasep_hwcnt_watchdog_callback(struct work_struct *const work) } static int kbasep_hwcnt_watchdog_if_timer_enable( - const struct kbase_hwcnt_watchdog_info *const timer, - u32 const period_ms, kbase_hwcnt_watchdog_callback_fn *const callback, - void *const user_data) + const struct kbase_hwcnt_watchdog_info *const timer, u32 const period_ms, + kbase_hwcnt_watchdog_callback_fn *const callback, void *const user_data) { - struct kbase_hwcnt_watchdog_if_timer_info *const timer_info = - (void *)timer; + struct kbase_hwcnt_watchdog_if_timer_info *const timer_info = (void *)timer; if (WARN_ON(!timer) || WARN_ON(!callback) || WARN_ON(timer_info->timer_enabled)) return -EINVAL; @@ -81,11 +79,10 @@ static int kbasep_hwcnt_watchdog_if_timer_enable( return 0; } -static void kbasep_hwcnt_watchdog_if_timer_disable( - const struct kbase_hwcnt_watchdog_info *const timer) +static void +kbasep_hwcnt_watchdog_if_timer_disable(const struct kbase_hwcnt_watchdog_info *const timer) { - struct kbase_hwcnt_watchdog_if_timer_info *const timer_info = - (void *)timer; + struct kbase_hwcnt_watchdog_if_timer_info *const timer_info = (void *)timer; if (WARN_ON(!timer)) return; @@ -97,11 +94,11 @@ static void kbasep_hwcnt_watchdog_if_timer_disable( timer_info->timer_enabled = false; } -static void 
kbasep_hwcnt_watchdog_if_timer_modify( - const struct kbase_hwcnt_watchdog_info *const timer, u32 const delay_ms) +static void +kbasep_hwcnt_watchdog_if_timer_modify(const struct kbase_hwcnt_watchdog_info *const timer, + u32 const delay_ms) { - struct kbase_hwcnt_watchdog_if_timer_info *const timer_info = - (void *)timer; + struct kbase_hwcnt_watchdog_if_timer_info *const timer_info = (void *)timer; if (WARN_ON(!timer) || WARN_ON(!timer_info->timer_enabled)) return; @@ -109,8 +106,7 @@ static void kbasep_hwcnt_watchdog_if_timer_modify( mod_delayed_work(timer_info->workq, &timer_info->dwork, msecs_to_jiffies(delay_ms)); } -void kbase_hwcnt_watchdog_if_timer_destroy( - struct kbase_hwcnt_watchdog_interface *const watchdog_if) +void kbase_hwcnt_watchdog_if_timer_destroy(struct kbase_hwcnt_watchdog_interface *const watchdog_if) { struct kbase_hwcnt_watchdog_if_timer_info *timer_info; @@ -125,11 +121,12 @@ void kbase_hwcnt_watchdog_if_timer_destroy( destroy_workqueue(timer_info->workq); kfree(timer_info); - *watchdog_if = (struct kbase_hwcnt_watchdog_interface){ NULL }; + *watchdog_if = (struct kbase_hwcnt_watchdog_interface){ + .timer = NULL, .enable = NULL, .disable = NULL, .modify = NULL + }; } -int kbase_hwcnt_watchdog_if_timer_create( - struct kbase_hwcnt_watchdog_interface *const watchdog_if) +int kbase_hwcnt_watchdog_if_timer_create(struct kbase_hwcnt_watchdog_interface *const watchdog_if) { struct kbase_hwcnt_watchdog_if_timer_info *timer_info; @@ -140,9 +137,7 @@ int kbase_hwcnt_watchdog_if_timer_create( if (!timer_info) return -ENOMEM; - *timer_info = - (struct kbase_hwcnt_watchdog_if_timer_info){ .timer_enabled = - false }; + *timer_info = (struct kbase_hwcnt_watchdog_if_timer_info){ .timer_enabled = false }; INIT_DELAYED_WORK(&timer_info->dwork, kbasep_hwcnt_watchdog_callback); diff --git a/mali_kbase/mali_kbase_hwcnt_watchdog_if_timer.h b/mali_kbase/hwcnt/mali_kbase_hwcnt_watchdog_if_timer.h index 3bd69c3..a545ad3 100644 --- a/mali_kbase/mali_kbase_hwcnt_watchdog_if_timer.h +++ b/mali_kbase/hwcnt/mali_kbase_hwcnt_watchdog_if_timer.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2021-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -35,8 +35,7 @@ struct kbase_hwcnt_watchdog_interface; * * Return: 0 on success, error otherwise. 
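One small functional change in the timer backend above: on destroy, the interface is now reset with a compound literal that names every member instead of "{ NULL }", spelling out the cleared state field by field. A generic sketch of the pattern over a hypothetical struct:

#include <stddef.h>

struct iface_sketch {
	void *timer;
	int (*enable)(void);
	void (*disable)(void);
	void (*modify)(void);
};

static void iface_reset(struct iface_sketch *ifc)
{
	/* Designated initializers: any unnamed members are still zeroed,
	 * but the intent is explicit per field.
	 */
	*ifc = (struct iface_sketch){
		.timer = NULL, .enable = NULL, .disable = NULL, .modify = NULL
	};
}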
*/ -int kbase_hwcnt_watchdog_if_timer_create( - struct kbase_hwcnt_watchdog_interface *watchdog_if); +int kbase_hwcnt_watchdog_if_timer_create(struct kbase_hwcnt_watchdog_interface *watchdog_if); /** * kbase_hwcnt_watchdog_if_timer_destroy() - Destroy a watchdog interface of hardware counter @@ -44,7 +43,6 @@ int kbase_hwcnt_watchdog_if_timer_create( * * @watchdog_if: Pointer to watchdog interface to destroy */ -void kbase_hwcnt_watchdog_if_timer_destroy( - struct kbase_hwcnt_watchdog_interface *watchdog_if); +void kbase_hwcnt_watchdog_if_timer_destroy(struct kbase_hwcnt_watchdog_interface *watchdog_if); #endif /* _KBASE_HWCNT_WATCHDOG_IF_TIMER_H_ */ diff --git a/mali_kbase/ipa/backend/mali_kbase_ipa_counter_common_csf.c b/mali_kbase/ipa/backend/mali_kbase_ipa_counter_common_csf.c index 81dc56b..60b061e 100644 --- a/mali_kbase/ipa/backend/mali_kbase_ipa_counter_common_csf.c +++ b/mali_kbase/ipa/backend/mali_kbase_ipa_counter_common_csf.c @@ -281,7 +281,7 @@ int kbase_ipa_counter_dynamic_coeff(struct kbase_ipa_model *model, u32 *coeffp) if (WARN_ON(ret)) return ret; - now = ktime_get(); + now = ktime_get_raw(); diff = ktime_sub(now, kbdev->ipa.last_sample_time); diff_ms = ktime_to_ms(diff); diff --git a/mali_kbase/ipa/backend/mali_kbase_ipa_counter_common_jm.c b/mali_kbase/ipa/backend/mali_kbase_ipa_counter_common_jm.c index e240117..34515a9 100644 --- a/mali_kbase/ipa/backend/mali_kbase_ipa_counter_common_jm.c +++ b/mali_kbase/ipa/backend/mali_kbase_ipa_counter_common_jm.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2017-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2017-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -31,7 +31,7 @@ #define DEFAULT_MIN_SAMPLE_CYCLES 10000 /** - * read_hwcnt() - read a counter value + * kbase_ipa_read_hwcnt() - read a counter value * @model_data: pointer to model data * @offset: offset, in bytes, into vinstr buffer * diff --git a/mali_kbase/ipa/backend/mali_kbase_ipa_counter_common_jm.h b/mali_kbase/ipa/backend/mali_kbase_ipa_counter_common_jm.h index e1718c6..6089610 100644 --- a/mali_kbase/ipa/backend/mali_kbase_ipa_counter_common_jm.h +++ b/mali_kbase/ipa/backend/mali_kbase_ipa_counter_common_jm.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2017-2018, 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2017-2018, 2020-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -23,8 +23,8 @@ #define _KBASE_IPA_COUNTER_COMMON_JM_H_ #include "mali_kbase.h" -#include "mali_kbase_hwcnt_virtualizer.h" -#include "mali_kbase_hwcnt_types.h" +#include "hwcnt/mali_kbase_hwcnt_virtualizer.h" +#include "hwcnt/mali_kbase_hwcnt_types.h" /* Maximum number of IPA groups for an IPA model. */ #define KBASE_IPA_MAX_GROUP_DEF_NUM 16 @@ -83,7 +83,7 @@ struct kbase_ipa_model_vinstr_data { }; /** - * struct ipa_group - represents a single IPA group + * struct kbase_ipa_group - represents a single IPA group * @name: name of the IPA group * @default_value: default value of coefficient for IPA group. 
* Coefficients are interpreted as fractions where the @@ -152,7 +152,7 @@ s64 kbase_ipa_single_counter( s32 coeff, u32 counter); /** - * attach_vinstr() - attach a vinstr_buffer to an IPA model. + * kbase_ipa_attach_vinstr() - attach a vinstr_buffer to an IPA model. * @model_data: pointer to model data * * Attach a vinstr_buffer to an IPA model. The vinstr_buffer @@ -164,7 +164,7 @@ s64 kbase_ipa_single_counter( int kbase_ipa_attach_vinstr(struct kbase_ipa_model_vinstr_data *model_data); /** - * detach_vinstr() - detach a vinstr_buffer from an IPA model. + * kbase_ipa_detach_vinstr() - detach a vinstr_buffer from an IPA model. * @model_data: pointer to model data * * Detach a vinstr_buffer from an IPA model. diff --git a/mali_kbase/ipa/backend/mali_kbase_ipa_counter_csf.c b/mali_kbase/ipa/backend/mali_kbase_ipa_counter_csf.c index 66e56e2..21b4e52 100644 --- a/mali_kbase/ipa/backend/mali_kbase_ipa_counter_csf.c +++ b/mali_kbase/ipa/backend/mali_kbase_ipa_counter_csf.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2020-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -23,10 +23,13 @@ #include "mali_kbase.h" /* MEMSYS counter block offsets */ +#define L2_RD_MSG_IN_CU (13) #define L2_RD_MSG_IN (16) #define L2_WR_MSG_IN (18) +#define L2_SNP_MSG_IN (20) #define L2_RD_MSG_OUT (22) #define L2_READ_LOOKUP (26) +#define L2_EXT_READ_NOSNP (30) #define L2_EXT_WRITE_NOSNP_FULL (43) /* SC counter block offsets */ @@ -36,17 +39,23 @@ #define FULL_QUAD_WARPS (21) #define EXEC_INSTR_FMA (27) #define EXEC_INSTR_CVT (28) +#define EXEC_INSTR_SFU (29) #define EXEC_INSTR_MSG (30) #define TEX_FILT_NUM_OPS (39) #define LS_MEM_READ_SHORT (45) #define LS_MEM_WRITE_SHORT (47) #define VARY_SLOT_16 (51) +#define BEATS_RD_LSC_EXT (57) +#define BEATS_RD_TEX (58) +#define BEATS_RD_TEX_EXT (59) +#define FRAG_QUADS_COARSE (68) /* Tiler counter block offsets */ #define IDVS_POS_SHAD_STALL (23) #define PREFETCH_STALL (25) #define VFETCH_POS_READ_WAIT (29) #define VFETCH_VERTEX_WAIT (30) +#define PRIMASSY_STALL (32) #define IDVS_VAR_SHAD_STALL (38) #define ITER_STALL (40) #define PMGR_PTR_RD_STALL (48) @@ -59,9 +68,6 @@ .counter_block_type = block_type, \ } -#define CSHW_COUNTER_DEF(cnt_name, coeff, cnt_idx) \ - COUNTER_DEF(cnt_name, coeff, cnt_idx, KBASE_IPA_CORE_TYPE_CSHW) - #define MEMSYS_COUNTER_DEF(cnt_name, coeff, cnt_idx) \ COUNTER_DEF(cnt_name, coeff, cnt_idx, KBASE_IPA_CORE_TYPE_MEMSYS) @@ -114,6 +120,15 @@ static const struct kbase_ipa_counter ipa_top_level_cntrs_def_ttux[] = { TILER_COUNTER_DEF("vfetch_vertex_wait", -391964, VFETCH_VERTEX_WAIT), }; +static const struct kbase_ipa_counter ipa_top_level_cntrs_def_ttix[] = { + TILER_COUNTER_DEF("primassy_stall", 471953, PRIMASSY_STALL), + TILER_COUNTER_DEF("idvs_var_shad_stall", -460559, IDVS_VAR_SHAD_STALL), + + MEMSYS_COUNTER_DEF("l2_rd_msg_in_cu", -6189604, L2_RD_MSG_IN_CU), + MEMSYS_COUNTER_DEF("l2_snp_msg_in", 6289609, L2_SNP_MSG_IN), + MEMSYS_COUNTER_DEF("l2_ext_read_nosnp", 512341, L2_EXT_READ_NOSNP), +}; + /* These tables provide a description of each performance counter * used by the shader cores counter model for energy estimation. 
*/ @@ -153,6 +168,17 @@ static const struct kbase_ipa_counter ipa_shader_core_cntrs_def_ttux[] = { SC_COUNTER_DEF("frag_quads_ezs_update", 372032, FRAG_QUADS_EZS_UPDATE), }; +static const struct kbase_ipa_counter ipa_shader_core_cntrs_def_ttix[] = { + SC_COUNTER_DEF("exec_instr_fma", 192642, EXEC_INSTR_FMA), + SC_COUNTER_DEF("exec_instr_msg", 1326465, EXEC_INSTR_MSG), + SC_COUNTER_DEF("beats_rd_tex", 163518, BEATS_RD_TEX), + SC_COUNTER_DEF("beats_rd_lsc_ext", 127475, BEATS_RD_LSC_EXT), + SC_COUNTER_DEF("frag_quads_coarse", -36247, FRAG_QUADS_COARSE), + SC_COUNTER_DEF("ls_mem_write_short", 51547, LS_MEM_WRITE_SHORT), + SC_COUNTER_DEF("beats_rd_tex_ext", -43370, BEATS_RD_TEX_EXT), + SC_COUNTER_DEF("exec_instr_sfu", 31583, EXEC_INSTR_SFU), +}; + #define IPA_POWER_MODEL_OPS(gpu, init_token) \ const struct kbase_ipa_model_ops kbase_ ## gpu ## _ipa_model_ops = { \ .name = "mali-" #gpu "-power-model", \ @@ -184,13 +210,13 @@ static const struct kbase_ipa_counter ipa_shader_core_cntrs_def_ttux[] = { #define ALIAS_POWER_MODEL(gpu, as_gpu) \ IPA_POWER_MODEL_OPS(gpu, as_gpu) -/* Reference voltage value is 750 mV. - */ +/* Reference voltage value is 750 mV. */ STANDARD_POWER_MODEL(todx, 750); STANDARD_POWER_MODEL(tgrx, 750); STANDARD_POWER_MODEL(tvax, 750); - STANDARD_POWER_MODEL(ttux, 750); +/* Reference voltage value is 550 mV. */ +STANDARD_POWER_MODEL(ttix, 550); /* Assuming LODX is an alias of TODX for IPA */ ALIAS_POWER_MODEL(lodx, todx); @@ -198,10 +224,14 @@ ALIAS_POWER_MODEL(lodx, todx); /* Assuming LTUX is an alias of TTUX for IPA */ ALIAS_POWER_MODEL(ltux, ttux); +/* Assuming LTUX is an alias of TTUX for IPA */ +ALIAS_POWER_MODEL(ltix, ttix); + static const struct kbase_ipa_model_ops *ipa_counter_model_ops[] = { &kbase_todx_ipa_model_ops, &kbase_lodx_ipa_model_ops, &kbase_tgrx_ipa_model_ops, &kbase_tvax_ipa_model_ops, - &kbase_ttux_ipa_model_ops, &kbase_ltux_ipa_model_ops + &kbase_ttux_ipa_model_ops, &kbase_ltux_ipa_model_ops, + &kbase_ttix_ipa_model_ops, &kbase_ltix_ipa_model_ops, }; const struct kbase_ipa_model_ops *kbase_ipa_counter_model_ops_find( @@ -240,6 +270,10 @@ const char *kbase_ipa_counter_model_name_from_id(u32 gpu_id) return "mali-ttux-power-model"; case GPU_ID2_PRODUCT_LTUX: return "mali-ltux-power-model"; + case GPU_ID2_PRODUCT_TTIX: + return "mali-ttix-power-model"; + case GPU_ID2_PRODUCT_LTIX: + return "mali-ltix-power-model"; default: return NULL; } diff --git a/mali_kbase/ipa/backend/mali_kbase_ipa_counter_jm.c b/mali_kbase/ipa/backend/mali_kbase_ipa_counter_jm.c index f11be0d..5a204ae 100644 --- a/mali_kbase/ipa/backend/mali_kbase_ipa_counter_jm.c +++ b/mali_kbase/ipa/backend/mali_kbase_ipa_counter_jm.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2016-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2016-2023 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -23,28 +23,19 @@ #include "mali_kbase_ipa_counter_common_jm.h" #include "mali_kbase.h" - -#if IS_ENABLED(CONFIG_MALI_NO_MALI) -#include <backend/gpu/mali_kbase_model_dummy.h> -#endif /* CONFIG_MALI_NO_MALI */ +#include <backend/gpu/mali_kbase_model_linux.h> /* Performance counter blocks base offsets */ #define JM_BASE (0 * KBASE_IPA_NR_BYTES_PER_BLOCK) -#define TILER_BASE (1 * KBASE_IPA_NR_BYTES_PER_BLOCK) #define MEMSYS_BASE (2 * KBASE_IPA_NR_BYTES_PER_BLOCK) /* JM counter block offsets */ #define JM_GPU_ACTIVE (KBASE_IPA_NR_BYTES_PER_CNT * 6) -/* Tiler counter block offsets */ -#define TILER_ACTIVE (KBASE_IPA_NR_BYTES_PER_CNT * 45) - /* MEMSYS counter block offsets */ #define MEMSYS_L2_ANY_LOOKUP (KBASE_IPA_NR_BYTES_PER_CNT * 25) /* SC counter block offsets */ -#define SC_FRAG_ACTIVE (KBASE_IPA_NR_BYTES_PER_CNT * 4) -#define SC_EXEC_CORE_ACTIVE (KBASE_IPA_NR_BYTES_PER_CNT * 26) #define SC_EXEC_INSTR_FMA (KBASE_IPA_NR_BYTES_PER_CNT * 27) #define SC_EXEC_INSTR_COUNT (KBASE_IPA_NR_BYTES_PER_CNT * 28) #define SC_EXEC_INSTR_MSG (KBASE_IPA_NR_BYTES_PER_CNT * 30) @@ -52,16 +43,14 @@ #define SC_TEX_COORD_ISSUE (KBASE_IPA_NR_BYTES_PER_CNT * 40) #define SC_TEX_TFCH_NUM_OPERATIONS (KBASE_IPA_NR_BYTES_PER_CNT * 42) #define SC_VARY_INSTR (KBASE_IPA_NR_BYTES_PER_CNT * 49) -#define SC_VARY_SLOT_32 (KBASE_IPA_NR_BYTES_PER_CNT * 50) -#define SC_VARY_SLOT_16 (KBASE_IPA_NR_BYTES_PER_CNT * 51) -#define SC_BEATS_RD_LSC (KBASE_IPA_NR_BYTES_PER_CNT * 56) -#define SC_BEATS_WR_LSC (KBASE_IPA_NR_BYTES_PER_CNT * 61) #define SC_BEATS_WR_TIB (KBASE_IPA_NR_BYTES_PER_CNT * 62) /** - * get_jm_counter() - get performance counter offset inside the Job Manager block + * kbase_g7x_power_model_get_jm_counter() - get performance counter offset + * inside the Job Manager block * @model_data: pointer to GPU model data. - * @counter_block_offset: offset in bytes of the performance counter inside the Job Manager block. + * @counter_block_offset: offset in bytes of the performance counter inside + * the Job Manager block. * * Return: Block offset in bytes of the required performance counter. */ @@ -72,9 +61,11 @@ static u32 kbase_g7x_power_model_get_jm_counter(struct kbase_ipa_model_vinstr_da } /** - * get_memsys_counter() - get performance counter offset inside the Memory System block + * kbase_g7x_power_model_get_memsys_counter() - get performance counter offset + * inside the Memory System block * @model_data: pointer to GPU model data. - * @counter_block_offset: offset in bytes of the performance counter inside the (first) Memory System block. + * @counter_block_offset: offset in bytes of the performance counter inside + * the (first) Memory System block. * * Return: Block offset in bytes of the required performance counter. */ @@ -88,9 +79,11 @@ static u32 kbase_g7x_power_model_get_memsys_counter(struct kbase_ipa_model_vinst } /** - * get_sc_counter() - get performance counter offset inside the Shader Cores block + * kbase_g7x_power_model_get_sc_counter() - get performance counter offset + * inside the Shader Cores block * @model_data: pointer to GPU model data. - * @counter_block_offset: offset in bytes of the performance counter inside the (first) Shader Cores block. + * @counter_block_offset: offset in bytes of the performance counter inside + * the (first) Shader Cores block. * * Return: Block offset in bytes of the required performance counter. 
*/ @@ -110,10 +103,12 @@ static u32 kbase_g7x_power_model_get_sc_counter(struct kbase_ipa_model_vinstr_da } /** - * memsys_single_counter() - calculate energy for a single Memory System performance counter. + * kbase_g7x_sum_all_memsys_blocks() - calculate energy for a single Memory + * System performance counter. * @model_data: pointer to GPU model data. * @coeff: default value of coefficient for IPA group. - * @counter_block_offset: offset in bytes of the counter inside the block it belongs to. + * @counter_block_offset: offset in bytes of the counter inside the block it + * belongs to. * * Return: Energy estimation for a single Memory System performance counter. */ @@ -130,12 +125,15 @@ static s64 kbase_g7x_sum_all_memsys_blocks( } /** - * sum_all_shader_cores() - calculate energy for a Shader Cores performance counter for all cores. + * kbase_g7x_sum_all_shader_cores() - calculate energy for a Shader Cores + * performance counter for all cores. * @model_data: pointer to GPU model data. * @coeff: default value of coefficient for IPA group. - * @counter_block_offset: offset in bytes of the counter inside the block it belongs to. + * @counter_block_offset: offset in bytes of the counter inside the block it + * belongs to. * - * Return: Energy estimation for a Shader Cores performance counter for all cores. + * Return: Energy estimation for a Shader Cores performance counter for all + * cores. */ static s64 kbase_g7x_sum_all_shader_cores( struct kbase_ipa_model_vinstr_data *model_data, @@ -150,7 +148,7 @@ static s64 kbase_g7x_sum_all_shader_cores( } /** - * jm_single_counter() - calculate energy for a single Job Manager performance counter. + * kbase_g7x_jm_single_counter() - calculate energy for a single Job Manager performance counter. * @model_data: pointer to GPU model data. * @coeff: default value of coefficient for IPA group. * @counter_block_offset: offset in bytes of the counter inside the block it belongs to. @@ -170,7 +168,7 @@ static s64 kbase_g7x_jm_single_counter( } /** - * get_active_cycles() - return the GPU_ACTIVE counter + * kbase_g7x_get_active_cycles() - return the GPU_ACTIVE counter * @model_data: pointer to GPU model data. 
* * Return: the number of cycles the GPU was active during the counter sampling @@ -457,16 +455,14 @@ static const struct kbase_ipa_group ipa_groups_def_tbax[] = { }, }; - -#define IPA_POWER_MODEL_OPS(gpu, init_token) \ - const struct kbase_ipa_model_ops kbase_ ## gpu ## _ipa_model_ops = { \ - .name = "mali-" #gpu "-power-model", \ - .init = kbase_ ## init_token ## _power_model_init, \ - .term = kbase_ipa_vinstr_common_model_term, \ - .get_dynamic_coeff = kbase_ipa_vinstr_dynamic_coeff, \ - .reset_counter_data = kbase_ipa_vinstr_reset_data, \ - }; \ - KBASE_EXPORT_TEST_API(kbase_ ## gpu ## _ipa_model_ops) +#define IPA_POWER_MODEL_OPS(gpu, init_token) \ + static const struct kbase_ipa_model_ops kbase_##gpu##_ipa_model_ops = { \ + .name = "mali-" #gpu "-power-model", \ + .init = kbase_##init_token##_power_model_init, \ + .term = kbase_ipa_vinstr_common_model_term, \ + .get_dynamic_coeff = kbase_ipa_vinstr_dynamic_coeff, \ + .reset_counter_data = kbase_ipa_vinstr_reset_data, \ + } #define STANDARD_POWER_MODEL(gpu, reference_voltage) \ static int kbase_ ## gpu ## _power_model_init(\ diff --git a/mali_kbase/ipa/mali_kbase_ipa.c b/mali_kbase/ipa/mali_kbase_ipa.c index 428e68b..0e8abb1 100644 --- a/mali_kbase/ipa/mali_kbase_ipa.c +++ b/mali_kbase/ipa/mali_kbase_ipa.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2016-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2016-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -84,11 +84,11 @@ KBASE_EXPORT_TEST_API(kbase_ipa_model_name_from_id); static struct device_node *get_model_dt_node(struct kbase_ipa_model *model, bool dt_required) { - struct device_node *model_dt_node; + struct device_node *model_dt_node = NULL; char compat_string[64]; - snprintf(compat_string, sizeof(compat_string), "arm,%s", - model->ops->name); + if (unlikely(!scnprintf(compat_string, sizeof(compat_string), "arm,%s", model->ops->name))) + return NULL; /* of_find_compatible_node() will call of_node_put() on the root node, * so take a reference on it first. @@ -111,12 +111,12 @@ int kbase_ipa_model_add_param_s32(struct kbase_ipa_model *model, const char *name, s32 *addr, size_t num_elems, bool dt_required) { - int err, i; + int err = -EINVAL, i; struct device_node *model_dt_node = get_model_dt_node(model, dt_required); char *origin; - err = of_property_read_u32_array(model_dt_node, name, addr, num_elems); + err = of_property_read_u32_array(model_dt_node, name, (u32 *)addr, num_elems); /* We're done with model_dt_node now, so drop the reference taken in * get_model_dt_node()/of_find_compatible_node(). 
*/ @@ -138,11 +138,17 @@ int kbase_ipa_model_add_param_s32(struct kbase_ipa_model *model, for (i = 0; i < num_elems; ++i) { char elem_name[32]; - if (num_elems == 1) - snprintf(elem_name, sizeof(elem_name), "%s", name); - else - snprintf(elem_name, sizeof(elem_name), "%s.%d", - name, i); + if (num_elems == 1) { + if (unlikely(!scnprintf(elem_name, sizeof(elem_name), "%s", name))) { + err = -ENOMEM; + goto exit; + } + } else { + if (unlikely(!scnprintf(elem_name, sizeof(elem_name), "%s.%d", name, i))) { + err = -ENOMEM; + goto exit; + } + } dev_dbg(model->kbdev->dev, "%s.%s = %d (%s)\n", model->ops->name, elem_name, addr[i], origin); @@ -164,7 +170,7 @@ int kbase_ipa_model_add_param_string(struct kbase_ipa_model *model, int err; struct device_node *model_dt_node = get_model_dt_node(model, dt_required); - const char *string_prop_value; + const char *string_prop_value = ""; char *origin; err = of_property_read_string(model_dt_node, name, @@ -324,7 +330,7 @@ int kbase_ipa_init(struct kbase_device *kbdev) kbdev->ipa.configured_model = default_model; } - kbdev->ipa.last_sample_time = ktime_get(); + kbdev->ipa.last_sample_time = ktime_get_raw(); end: if (err) @@ -750,7 +756,7 @@ void kbase_ipa_reset_data(struct kbase_device *kbdev) mutex_lock(&kbdev->ipa.lock); - now = ktime_get(); + now = ktime_get_raw(); diff = ktime_sub(now, kbdev->ipa.last_sample_time); elapsed_time = ktime_to_ms(diff); @@ -765,7 +771,7 @@ void kbase_ipa_reset_data(struct kbase_device *kbdev) if (model != kbdev->ipa.fallback_model) model->ops->reset_counter_data(model); - kbdev->ipa.last_sample_time = ktime_get(); + kbdev->ipa.last_sample_time = ktime_get_raw(); } mutex_unlock(&kbdev->ipa.lock); diff --git a/mali_kbase/ipa/mali_kbase_ipa.h b/mali_kbase/ipa/mali_kbase_ipa.h index c668af9..4f35b9e 100644 --- a/mali_kbase/ipa/mali_kbase_ipa.h +++ b/mali_kbase/ipa/mali_kbase_ipa.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2016-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2016-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -266,7 +266,6 @@ int kbase_get_real_power(struct devfreq *df, u32 *power, unsigned long freq, unsigned long voltage); -#if MALI_UNIT_TEST /* Called by kbase_get_real_power() to invoke the power models. * Must be called with kbdev->ipa.lock held. * This function is only exposed for use by unit tests. @@ -274,7 +273,6 @@ int kbase_get_real_power(struct devfreq *df, u32 *power, int kbase_get_real_power_locked(struct kbase_device *kbdev, u32 *power, unsigned long freq, unsigned long voltage); -#endif /* MALI_UNIT_TEST */ extern struct devfreq_cooling_power kbase_ipa_power_model_ops; diff --git a/mali_kbase/ipa/mali_kbase_ipa_debugfs.c b/mali_kbase/ipa/mali_kbase_ipa_debugfs.c index d554fff..a8523a7 100644 --- a/mali_kbase/ipa/mali_kbase_ipa_debugfs.c +++ b/mali_kbase/ipa/mali_kbase_ipa_debugfs.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2017-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2017-2022 ARM Limited. All rights reserved. 
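The kbase_ipa_model_add_param_s32 change above swaps unchecked snprintf calls for checked scnprintf calls when building the "arm,<model>" compatible string and the per-element property names, bailing out instead of using an empty or truncated buffer. A small userspace sketch of the same check-before-use pattern, with ordinary snprintf standing in for the kernel's scnprintf and an invented model name:

#include <stdio.h>
#include <string.h>

/* Build "arm,<model>" into buf; fail instead of silently using an empty or
 * truncated compatible string.
 */
static int build_compat_string(char *buf, size_t buf_sz, const char *model)
{
	int n = snprintf(buf, buf_sz, "arm,%s", model);

	if (n <= 0)                  /* nothing was formatted */
		return -1;
	if ((size_t)n >= buf_sz)     /* output did not fit in the buffer */
		return -1;
	return 0;
}

int main(void)
{
	char compat[64];

	if (build_compat_string(compat, sizeof(compat), "ttix-power-model"))
		fprintf(stderr, "could not build compatible string\n");
	else
		printf("compatible = \"%s\"\n", compat);
	return 0;
}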
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -20,6 +20,7 @@ */ #include <linux/debugfs.h> +#include <linux/version_compat_defs.h> #include <linux/list.h> #include <linux/mutex.h> @@ -27,10 +28,6 @@ #include "mali_kbase_ipa.h" #include "mali_kbase_ipa_debugfs.h" -#if (KERNEL_VERSION(4, 7, 0) > LINUX_VERSION_CODE) -#define DEFINE_DEBUGFS_ATTRIBUTE DEFINE_SIMPLE_ATTRIBUTE -#endif - struct kbase_ipa_model_param { char *name; union { diff --git a/mali_kbase/ipa/mali_kbase_ipa_simple.c b/mali_kbase/ipa/mali_kbase_ipa_simple.c index fadae7d..0fd2136 100644 --- a/mali_kbase/ipa/mali_kbase_ipa_simple.c +++ b/mali_kbase/ipa/mali_kbase_ipa_simple.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2016-2018, 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2016-2018, 2020-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -33,6 +33,8 @@ #include "mali_kbase_ipa_simple.h" #include "mali_kbase_ipa_debugfs.h" +#if MALI_USE_CSF + /* This is used if the dynamic power for top-level is estimated separately * through the counter model. To roughly match the contribution of top-level * power in the total dynamic power, when calculated through counter model, @@ -43,6 +45,8 @@ */ #define TOP_LEVEL_DYN_COEFF_SCALER (3) +#endif /* MALI_USE_CSF */ + #if MALI_UNIT_TEST static int dummy_temp; @@ -227,14 +231,12 @@ static int add_params(struct kbase_ipa_model *model) (struct kbase_ipa_model_simple_data *)model->model_data; err = kbase_ipa_model_add_param_s32(model, "static-coefficient", - &model_data->static_coefficient, - 1, true); + (s32 *)&model_data->static_coefficient, 1, true); if (err) goto end; err = kbase_ipa_model_add_param_s32(model, "dynamic-coefficient", - &model_data->dynamic_coefficient, - 1, true); + (s32 *)&model_data->dynamic_coefficient, 1, true); if (err) goto end; @@ -321,8 +323,9 @@ static int kbase_simple_power_model_recalculate(struct kbase_ipa_model *model) mutex_lock(&model->kbdev->ipa.lock); if (IS_ERR_OR_NULL(tz)) { - pr_warn_ratelimited("Error %ld getting thermal zone \'%s\', not yet ready?\n", - PTR_ERR(tz), tz_name); + pr_warn_ratelimited( + "Error %d getting thermal zone \'%s\', not yet ready?\n", + PTR_ERR_OR_ZERO(tz), tz_name); return -EPROBE_DEFER; } diff --git a/mali_kbase/jm/mali_kbase_jm_defs.h b/mali_kbase/jm/mali_kbase_jm_defs.h index 3c4d6b2..e694f9f 100644 --- a/mali_kbase/jm/mali_kbase_jm_defs.h +++ b/mali_kbase/jm/mali_kbase_jm_defs.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2019-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -135,13 +135,22 @@ /** * enum kbase_timeout_selector - The choice of which timeout to get scaled * using the lowest GPU frequency. - * @KBASE_TIMEOUT_SELECTOR_COUNT: Number of timeout selectors. Must be last in - * the enum. + * @MMU_AS_INACTIVE_WAIT_TIMEOUT: Maximum waiting time in ms for the completion + * of a MMU operation + * @JM_DEFAULT_JS_FREE_TIMEOUT: Maximum timeout to wait for JS_COMMAND_NEXT + * to be updated on HW side so a Job Slot is + * considered free. 
+ * @KBASE_TIMEOUT_SELECTOR_COUNT: Number of timeout selectors. + * @KBASE_DEFAULT_TIMEOUT: Fallthrough in case an invalid timeout is + * passed. */ enum kbase_timeout_selector { + MMU_AS_INACTIVE_WAIT_TIMEOUT, + JM_DEFAULT_JS_FREE_TIMEOUT, /* Must be the last in the enum */ - KBASE_TIMEOUT_SELECTOR_COUNT + KBASE_TIMEOUT_SELECTOR_COUNT, + KBASE_DEFAULT_TIMEOUT = JM_DEFAULT_JS_FREE_TIMEOUT }; #if IS_ENABLED(CONFIG_DEBUG_FS) @@ -194,8 +203,6 @@ struct kbase_jd_atom_dependency { static inline const struct kbase_jd_atom * kbase_jd_katom_dep_atom(const struct kbase_jd_atom_dependency *dep) { - KBASE_DEBUG_ASSERT(dep != NULL); - return (const struct kbase_jd_atom *)(dep->atom); } @@ -209,8 +216,6 @@ kbase_jd_katom_dep_atom(const struct kbase_jd_atom_dependency *dep) static inline u8 kbase_jd_katom_dep_type( const struct kbase_jd_atom_dependency *dep) { - KBASE_DEBUG_ASSERT(dep != NULL); - return dep->dep_type; } @@ -227,8 +232,6 @@ static inline void kbase_jd_katom_dep_set( { struct kbase_jd_atom_dependency *dep; - KBASE_DEBUG_ASSERT(const_dep != NULL); - dep = (struct kbase_jd_atom_dependency *)const_dep; dep->atom = a; @@ -245,8 +248,6 @@ static inline void kbase_jd_katom_dep_clear( { struct kbase_jd_atom_dependency *dep; - KBASE_DEBUG_ASSERT(const_dep != NULL); - dep = (struct kbase_jd_atom_dependency *)const_dep; dep->atom = NULL; @@ -361,19 +362,6 @@ enum kbase_atom_exit_protected_state { }; /** - * struct kbase_ext_res - Contains the info for external resources referred - * by an atom, which have been mapped on GPU side. - * @gpu_address: Start address of the memory region allocated for - * the resource from GPU virtual address space. - * @alloc: pointer to physical pages tracking object, set on - * mapping the external resource on GPU side. - */ -struct kbase_ext_res { - u64 gpu_address; - struct kbase_mem_phy_alloc *alloc; -}; - -/** * struct kbase_jd_atom - object representing the atom, containing the complete * state and attributes of an atom. * @work: work item for the bottom half processing of the atom, @@ -406,7 +394,8 @@ struct kbase_ext_res { * each allocation is read in order to enforce an * overall physical memory usage limit. * @nr_extres: number of external resources referenced by the atom. - * @extres: pointer to the location containing info about + * @extres: Pointer to @nr_extres VA regions containing the external + * resource allocation and other information. * @nr_extres external resources referenced by the atom. * @device_nr: indicates the coregroup with which the atom is * associated, when @@ -424,16 +413,21 @@ struct kbase_ext_res { * sync through soft jobs and for the implicit * synchronization required on access to external * resources. - * @dma_fence.fence_in: Input fence + * @dma_fence.fence_in: Points to the dma-buf input fence for this atom. + * The atom would complete only after the fence is + * signaled. * @dma_fence.fence: Points to the dma-buf output fence for this atom. + * @dma_fence.fence_cb: The object that is passed at the time of adding the + * callback that gets invoked when @dma_fence.fence_in + * is signaled. + * @dma_fence.fence_cb_added: Flag to keep a track if the callback was successfully + * added for @dma_fence.fence_in, which is supposed to be + * invoked on the signaling of fence. * @dma_fence.context: The dma-buf fence context number for this atom. A * unique context number is allocated to each katom in * the context on context creation. * @dma_fence.seqno: The dma-buf fence sequence number for this atom. 
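The reworked kbase_timeout_selector enum above closes with a COUNT member and aliases KBASE_DEFAULT_TIMEOUT to an existing selector, so an invalid selector can fall back to a defined timeout. A standalone sketch of that enum-plus-fallback-lookup pattern; the selector names and millisecond values here are invented for the example:

#include <stdio.h>

enum timeout_selector {
	MMU_AS_INACTIVE_WAIT,
	JS_FREE_WAIT,
	/* Must be last */
	TIMEOUT_SELECTOR_COUNT,
	DEFAULT_TIMEOUT = JS_FREE_WAIT
};

static unsigned int get_timeout_ms(int sel)
{
	static const unsigned int timeout_ms[TIMEOUT_SELECTOR_COUNT] = {
		[MMU_AS_INACTIVE_WAIT] = 2000,   /* example value */
		[JS_FREE_WAIT]         = 100,    /* example value */
	};

	if (sel < 0 || sel >= TIMEOUT_SELECTOR_COUNT)
		sel = DEFAULT_TIMEOUT;           /* fall back for bad input */
	return timeout_ms[sel];
}

int main(void)
{
	printf("%u\n", get_timeout_ms(JS_FREE_WAIT));
	printf("%u\n", get_timeout_ms(42));      /* invalid: uses the default */
	return 0;
}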
This * is increased every time this katom uses dma-buf fence - * @dma_fence.callbacks: List of all callbacks set up to wait on other fences - * @dma_fence.dep_count: Atomic counter of number of outstandind dma-buf fence - * dependencies for this atom. * @event_code: Event code for the job chain represented by the atom, * both HW and low-level SW events are represented by * event codes. @@ -516,7 +510,6 @@ struct kbase_ext_res { * BASE_JD_REQ_START_RENDERPASS set in its core requirements * with an atom that has BASE_JD_REQ_END_RENDERPASS set. * @jc_fragment: Set of GPU fragment job chains - * @retry_count: TODO: Not used,to be removed */ struct kbase_jd_atom { struct kthread_work work; @@ -536,21 +529,17 @@ struct kbase_jd_atom { #endif /* MALI_JIT_PRESSURE_LIMIT_BASE */ u16 nr_extres; - struct kbase_ext_res *extres; + struct kbase_va_region **extres; u32 device_nr; u64 jc; void *softjob_data; -#if defined(CONFIG_SYNC) - struct sync_fence *fence; - struct sync_fence_waiter sync_waiter; -#endif /* CONFIG_SYNC */ -#if defined(CONFIG_MALI_DMA_FENCE) || defined(CONFIG_SYNC_FILE) +#if IS_ENABLED(CONFIG_SYNC_FILE) struct { /* Use the functions/API defined in mali_kbase_fence.h to * when working with this sub struct */ -#if defined(CONFIG_SYNC_FILE) +#if IS_ENABLED(CONFIG_SYNC_FILE) #if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) struct fence *fence_in; #else @@ -573,38 +562,21 @@ struct kbase_jd_atom { #else struct dma_fence *fence; #endif + + /* This is the callback object that is registered for the fence_in. + * The callback is invoked when the fence_in is signaled. + */ +#if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) + struct fence_cb fence_cb; +#else + struct dma_fence_cb fence_cb; +#endif + bool fence_cb_added; + unsigned int context; atomic_t seqno; - /* This contains a list of all callbacks set up to wait on - * other fences. This atom must be held back from JS until all - * these callbacks have been called and dep_count have reached - * 0. The initial value of dep_count must be equal to the - * number of callbacks on this list. - * - * This list is protected by jctx.lock. Callbacks are added to - * this list when the atom is built and the wait are set up. - * All the callbacks then stay on the list until all callbacks - * have been called and the atom is queued, or cancelled, and - * then all callbacks are taken off the list and freed. - */ - struct list_head callbacks; - /* Atomic counter of number of outstandind dma-buf fence - * dependencies for this atom. When dep_count reaches 0 the - * atom may be queued. - * - * The special value "-1" may only be set after the count - * reaches 0, while holding jctx.lock. This indicates that the - * atom has been handled, either queued in JS or cancelled. - * - * If anyone but the dma-fence worker sets this to -1 they must - * ensure that any potentially queued worker must have - * completed before allowing the atom to be marked as unused. - * This can be done by flushing the fence work queue: - * kctx->dma_fence.wq. 
- */ - atomic_t dep_count; } dma_fence; -#endif /* CONFIG_MALI_DMA_FENCE || CONFIG_SYNC_FILE */ +#endif /* CONFIG_SYNC_FILE */ /* Note: refer to kbasep_js_atom_retained_state, which will take a copy * of some of the following members @@ -623,12 +595,10 @@ struct kbase_jd_atom { #if IS_ENABLED(CONFIG_GPU_TRACEPOINTS) int work_id; #endif - int slot_nr; + unsigned int slot_nr; u32 atom_flags; - int retry_count; - enum kbase_atom_gpu_rb_state gpu_rb_state; bool need_cache_flush_cores_retained; @@ -672,7 +642,7 @@ static inline bool kbase_jd_katom_is_protected( } /** - * kbase_atom_is_younger - query if one atom is younger by age than another + * kbase_jd_atom_is_younger - query if one atom is younger by age than another * * @katom_a: the first atom * @katom_b: the second atom diff --git a/mali_kbase/jm/mali_kbase_jm_js.h b/mali_kbase/jm/mali_kbase_jm_js.h index f01e8bb..53819ca 100644 --- a/mali_kbase/jm/mali_kbase_jm_js.h +++ b/mali_kbase/jm/mali_kbase_jm_js.h @@ -29,6 +29,8 @@ #include "mali_kbase_js_ctx_attr.h" +#define JS_MAX_RUNNING_JOBS 8 + /** * kbasep_js_devdata_init - Initialize the Job Scheduler * @kbdev: The kbase_device to operate on @@ -130,15 +132,15 @@ void kbasep_js_kctx_term(struct kbase_context *kctx); * Atoms of higher priority might still be able to be pulled from the context * on @js. This helps with starting a high priority atom as soon as possible. */ -static inline void kbase_jsctx_slot_prio_blocked_set(struct kbase_context *kctx, - int js, int sched_prio) +static inline void kbase_jsctx_slot_prio_blocked_set(struct kbase_context *kctx, unsigned int js, + int sched_prio) { struct kbase_jsctx_slot_tracking *slot_tracking = &kctx->slot_tracking[js]; lockdep_assert_held(&kctx->kbdev->hwaccess_lock); WARN(!slot_tracking->atoms_pulled_pri[sched_prio], - "When marking slot %d as blocked for priority %d on a kctx, no atoms were pulled - the slot cannot become unblocked", + "When marking slot %u as blocked for priority %d on a kctx, no atoms were pulled - the slot cannot become unblocked", js, sched_prio); slot_tracking->blocked |= ((kbase_js_prio_bitmap_t)1) << sched_prio; @@ -508,19 +510,6 @@ bool kbase_js_dep_resolved_submit(struct kbase_context *kctx, struct kbase_jd_atom *katom); /** - * jsctx_ll_flush_to_rb() - Pushes atoms from the linked list to ringbuffer. - * @kctx: Context Pointer - * @prio: Priority (specifies the queue together with js). - * @js: Job slot (specifies the queue together with prio). - * - * Pushes all possible atoms from the linked list to the ringbuffer. - * Number of atoms are limited to free space in the ringbuffer and - * number of available atoms in the linked list. - * - */ -void jsctx_ll_flush_to_rb(struct kbase_context *kctx, int prio, int js); - -/** * kbase_js_pull - Pull an atom from a context in the job scheduler for * execution. * @@ -534,7 +523,7 @@ void jsctx_ll_flush_to_rb(struct kbase_context *kctx, int prio, int js); * Return: a pointer to an atom, or NULL if there are no atoms for this * slot that can be currently run. */ -struct kbase_jd_atom *kbase_js_pull(struct kbase_context *kctx, int js); +struct kbase_jd_atom *kbase_js_pull(struct kbase_context *kctx, unsigned int js); /** * kbase_js_unpull - Return an atom to the job scheduler ringbuffer. @@ -615,10 +604,10 @@ bool kbase_js_atom_blocked_on_x_dep(struct kbase_jd_atom *katom); * been used. 
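The dma_fence rework above drops the callbacks list and dep_count in favour of a single fence_cb plus a fence_cb_added flag: the callback on fence_in is registered at most once, and teardown only deregisters it if registration actually succeeded. A standalone sketch of that register-once/remember-if-added pattern; the fence type and helpers below are invented stand-ins, not the kernel dma-fence API:

#include <stdbool.h>
#include <stdio.h>

struct fake_fence {
	bool signalled;
	void (*cb)(struct fake_fence *f);
};

/* Returns 0 on success, -1 if the fence is already signalled (so no callback
 * will ever fire), mirroring how adding a callback to a signalled fence fails.
 */
static int fake_fence_add_callback(struct fake_fence *f,
				   void (*cb)(struct fake_fence *f))
{
	if (f->signalled)
		return -1;
	f->cb = cb;
	return 0;
}

static void fake_fence_remove_callback(struct fake_fence *f)
{
	f->cb = NULL;
}

struct atom {
	struct fake_fence *fence_in;
	bool fence_cb_added;     /* set only when add_callback succeeded */
};

static void on_fence_signalled(struct fake_fence *f)
{
	printf("input fence signalled, atom can be queued\n");
}

static void atom_wait_on_fence(struct atom *a)
{
	if (!fake_fence_add_callback(a->fence_in, on_fence_signalled))
		a->fence_cb_added = true;
	else
		printf("fence already signalled, queue atom immediately\n");
}

static void atom_teardown(struct atom *a)
{
	/* Only undo the registration if it was actually made. */
	if (a->fence_cb_added)
		fake_fence_remove_callback(a->fence_in);
	a->fence_cb_added = false;
}

int main(void)
{
	struct fake_fence fence = { .signalled = false };
	struct atom a = { .fence_in = &fence };

	atom_wait_on_fence(&a);
	atom_teardown(&a);
	return 0;
}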
* */ -void kbase_js_sched(struct kbase_device *kbdev, int js_mask); +void kbase_js_sched(struct kbase_device *kbdev, unsigned int js_mask); /** - * kbase_jd_zap_context - Attempt to deschedule a context that is being + * kbase_js_zap_context - Attempt to deschedule a context that is being * destroyed * @kctx: Context pointer * @@ -705,8 +694,10 @@ static inline bool kbasep_js_is_submit_allowed( bool is_allowed; /* Ensure context really is scheduled in */ - KBASE_DEBUG_ASSERT(kctx->as_nr != KBASEP_AS_NR_INVALID); - KBASE_DEBUG_ASSERT(kbase_ctx_flag(kctx, KCTX_SCHEDULED)); + if (WARN((kctx->as_nr == KBASEP_AS_NR_INVALID) || !kbase_ctx_flag(kctx, KCTX_SCHEDULED), + "%s: kctx %pK has assigned AS %d and context flag %d\n", __func__, (void *)kctx, + kctx->as_nr, atomic_read(&kctx->flags))) + return false; test_bit = (u16) (1u << kctx->as_nr); @@ -733,8 +724,10 @@ static inline void kbasep_js_set_submit_allowed( u16 set_bit; /* Ensure context really is scheduled in */ - KBASE_DEBUG_ASSERT(kctx->as_nr != KBASEP_AS_NR_INVALID); - KBASE_DEBUG_ASSERT(kbase_ctx_flag(kctx, KCTX_SCHEDULED)); + if (WARN((kctx->as_nr == KBASEP_AS_NR_INVALID) || !kbase_ctx_flag(kctx, KCTX_SCHEDULED), + "%s: kctx %pK has assigned AS %d and context flag %d\n", __func__, (void *)kctx, + kctx->as_nr, atomic_read(&kctx->flags))) + return; set_bit = (u16) (1u << kctx->as_nr); @@ -763,8 +756,10 @@ static inline void kbasep_js_clear_submit_allowed( u16 clear_mask; /* Ensure context really is scheduled in */ - KBASE_DEBUG_ASSERT(kctx->as_nr != KBASEP_AS_NR_INVALID); - KBASE_DEBUG_ASSERT(kbase_ctx_flag(kctx, KCTX_SCHEDULED)); + if (WARN((kctx->as_nr == KBASEP_AS_NR_INVALID) || !kbase_ctx_flag(kctx, KCTX_SCHEDULED), + "%s: kctx %pK has assigned AS %d and context flag %d\n", __func__, (void *)kctx, + kctx->as_nr, atomic_read(&kctx->flags))) + return; clear_bit = (u16) (1u << kctx->as_nr); clear_mask = ~clear_bit; @@ -798,7 +793,7 @@ static inline void kbasep_js_atom_retained_state_init_invalid( * @retained_state: where to copy * @katom: where to copy from * - * Copy atom state that can be made available after jd_done_nolock() is called + * Copy atom state that can be made available after kbase_jd_done_nolock() is called * on that atom. 
*/ static inline void kbasep_js_atom_retained_state_copy( @@ -872,9 +867,6 @@ static inline void kbase_js_runpool_inc_context_count( struct kbasep_js_device_data *js_devdata; struct kbasep_js_kctx_info *js_kctx_info; - KBASE_DEBUG_ASSERT(kbdev != NULL); - KBASE_DEBUG_ASSERT(kctx != NULL); - js_devdata = &kbdev->js_data; js_kctx_info = &kctx->jctx.sched_info; @@ -882,13 +874,12 @@ static inline void kbase_js_runpool_inc_context_count( lockdep_assert_held(&js_devdata->runpool_mutex); /* Track total contexts */ - KBASE_DEBUG_ASSERT(js_devdata->nr_all_contexts_running < S8_MAX); + WARN_ON_ONCE(js_devdata->nr_all_contexts_running >= JS_MAX_RUNNING_JOBS); ++(js_devdata->nr_all_contexts_running); if (!kbase_ctx_flag(kctx, KCTX_SUBMIT_DISABLED)) { /* Track contexts that can submit jobs */ - KBASE_DEBUG_ASSERT(js_devdata->nr_user_contexts_running < - S8_MAX); + WARN_ON_ONCE(js_devdata->nr_user_contexts_running >= JS_MAX_RUNNING_JOBS); ++(js_devdata->nr_user_contexts_running); } } @@ -909,9 +900,6 @@ static inline void kbase_js_runpool_dec_context_count( struct kbasep_js_device_data *js_devdata; struct kbasep_js_kctx_info *js_kctx_info; - KBASE_DEBUG_ASSERT(kbdev != NULL); - KBASE_DEBUG_ASSERT(kctx != NULL); - js_devdata = &kbdev->js_data; js_kctx_info = &kctx->jctx.sched_info; @@ -920,12 +908,12 @@ static inline void kbase_js_runpool_dec_context_count( /* Track total contexts */ --(js_devdata->nr_all_contexts_running); - KBASE_DEBUG_ASSERT(js_devdata->nr_all_contexts_running >= 0); + WARN_ON_ONCE(js_devdata->nr_all_contexts_running < 0); if (!kbase_ctx_flag(kctx, KCTX_SUBMIT_DISABLED)) { /* Track contexts that can submit jobs */ --(js_devdata->nr_user_contexts_running); - KBASE_DEBUG_ASSERT(js_devdata->nr_user_contexts_running >= 0); + WARN_ON_ONCE(js_devdata->nr_user_contexts_running < 0); } } @@ -950,8 +938,8 @@ extern const base_jd_prio kbasep_js_relative_priority_to_atom[KBASE_JS_ATOM_SCHED_PRIO_COUNT]; /** - * kbasep_js_atom_prio_to_sched_prio(): - Convert atom priority (base_jd_prio) - * to relative ordering + * kbasep_js_atom_prio_to_sched_prio - Convert atom priority (base_jd_prio) + * to relative ordering. * @atom_prio: Priority ID to translate. * * Atom priority values for @ref base_jd_prio cannot be compared directly to @@ -980,16 +968,33 @@ static inline int kbasep_js_atom_prio_to_sched_prio(base_jd_prio atom_prio) return kbasep_js_atom_priority_to_relative[atom_prio]; } -static inline base_jd_prio kbasep_js_sched_prio_to_atom_prio(int sched_prio) +/** + * kbasep_js_sched_prio_to_atom_prio - Convert relative scheduler priority + * to atom priority (base_jd_prio). + * + * @kbdev: Device pointer + * @sched_prio: Relative scheduler priority to translate. + * + * This function will convert relative scheduler priority back into base_jd_prio + * values. It takes values which priorities are monotonically increasing + * and converts them to the corresponding base_jd_prio values. If an invalid number is + * passed in (i.e. not within the expected range) an error code is returned instead. + * + * The mapping is 1:1 and the size of the valid input range is the same as the + * size of the valid output range, i.e. + * KBASE_JS_ATOM_SCHED_PRIO_COUNT == BASE_JD_NR_PRIO_LEVELS + * + * Return: On success: a value in the inclusive range + * 0..BASE_JD_NR_PRIO_LEVELS-1. On failure: BASE_JD_PRIO_INVALID. 
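The submission-allowed helpers and runpool context counters above trade KBASE_DEBUG_ASSERT for WARN-style checks: the broken precondition is logged at runtime and the function returns a safe result instead of relying on a debug-only assertion. A standalone sketch of that check-warn-and-return pattern, with a simplified context structure and fprintf standing in for the kernel's WARN:

#include <stdbool.h>
#include <stdio.h>

#define AS_NR_INVALID (-1)

struct context {
	int as_nr;          /* assigned GPU address space, or AS_NR_INVALID */
	bool scheduled;
	unsigned short submit_allowed_bits;
};

static bool is_submit_allowed(struct context *kctx)
{
	/* Precondition: the context must really be scheduled in. Warn and
	 * report "not allowed" rather than crashing if it is not.
	 */
	if (kctx->as_nr == AS_NR_INVALID || !kctx->scheduled) {
		fprintf(stderr, "context %p not scheduled (as_nr %d)\n",
			(void *)kctx, kctx->as_nr);
		return false;
	}

	return kctx->submit_allowed_bits & (1u << kctx->as_nr);
}

int main(void)
{
	struct context bad  = { .as_nr = AS_NR_INVALID, .scheduled = false };
	struct context good = { .as_nr = 3, .scheduled = true,
				.submit_allowed_bits = 1u << 3 };

	printf("bad: %d, good: %d\n", is_submit_allowed(&bad),
	       is_submit_allowed(&good));
	return 0;
}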
+ */ +static inline base_jd_prio kbasep_js_sched_prio_to_atom_prio(struct kbase_device *kbdev, + int sched_prio) { - unsigned int prio_idx; - - KBASE_DEBUG_ASSERT(sched_prio >= 0 && - sched_prio < KBASE_JS_ATOM_SCHED_PRIO_COUNT); - - prio_idx = (unsigned int)sched_prio; - - return kbasep_js_relative_priority_to_atom[prio_idx]; + if (likely(sched_prio >= 0 && sched_prio < KBASE_JS_ATOM_SCHED_PRIO_COUNT)) + return kbasep_js_relative_priority_to_atom[sched_prio]; + /* Invalid priority value if reached here */ + dev_warn(kbdev->dev, "Unknown JS scheduling priority %d", sched_prio); + return BASE_JD_PRIO_INVALID; } /** diff --git a/mali_kbase/jm/mali_kbase_js_defs.h b/mali_kbase/jm/mali_kbase_js_defs.h index c5cb9ea..009ff02 100644 --- a/mali_kbase/jm/mali_kbase_js_defs.h +++ b/mali_kbase/jm/mali_kbase_js_defs.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2011-2018, 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2011-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -277,6 +277,7 @@ typedef u32 kbase_atom_ordering_flag_t; * @nr_contexts_runnable:Number of contexts that can either be pulled from or * arecurrently running * @soft_job_timeout_ms:Value for JS_SOFT_JOB_TIMEOUT + * @js_free_wait_time_ms: Maximum waiting time in ms for a Job Slot to be seen free. * @queue_mutex: Queue Lock, used to access the Policy's queue of contexts * independently of the Run Pool. * Of course, you don't need the Run Pool lock to access this. @@ -329,6 +330,8 @@ struct kbasep_js_device_data { u32 nr_contexts_pullable; atomic_t nr_contexts_runnable; atomic_t soft_job_timeout_ms; + u32 js_free_wait_time_ms; + struct rt_mutex queue_mutex; /* * Run Pool mutex, for managing contexts within the runpool. @@ -339,6 +342,30 @@ struct kbasep_js_device_data { * * the kbasep_js_kctx_info::runpool substructure */ struct mutex runpool_mutex; + +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) + /** + * @gpu_metrics_timer: High-resolution timer used to periodically emit the GPU metrics + * tracepoints for applications that are using the GPU. The timer is + * needed for the long duration handling so that the length of work + * period is within the allowed limit. + */ + struct hrtimer gpu_metrics_timer; + + /** + * @gpu_metrics_timer_needed: Flag to indicate if the @gpu_metrics_timer is needed. + * The timer won't be started after the expiry if the flag + * isn't set. + */ + bool gpu_metrics_timer_needed; + + /** + * @gpu_metrics_timer_running: Flag to indicate if the @gpu_metrics_timer is running. + * The flag is set to false when the timer is cancelled or + * is not restarted after the expiry. + */ + bool gpu_metrics_timer_running; +#endif }; /** @@ -387,7 +414,7 @@ struct kbasep_js_kctx_info { * @sched_priority: priority * @device_nr: Core group atom was executed on * - * Subset of atom state that can be available after jd_done_nolock() is called + * Subset of atom state that can be available after kbase_jd_done_nolock() is called * on that atom. A copy must be taken via kbasep_js_atom_retained_state_copy(), * because the original atom could disappear. 
*/ diff --git a/mali_kbase/mali_base_hwconfig_features.h b/mali_kbase/mali_base_hwconfig_features.h index a713681..724145f 100644 --- a/mali_kbase/mali_base_hwconfig_features.h +++ b/mali_kbase/mali_base_hwconfig_features.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2014-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2014-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -21,7 +21,7 @@ /* AUTOMATICALLY GENERATED FILE. If you want to amend the issues/features, * please update base/tools/hwconfig_generator/hwc_{issues,features}.py - * For more information see base/tools/hwconfig_generator/README + * For more information see base/tools/docs/hwconfig_generator.md */ #ifndef _BASE_HWCONFIG_FEATURES_H_ @@ -38,6 +38,9 @@ enum base_hw_feature { BASE_HW_FEATURE_ASN_HASH, BASE_HW_FEATURE_GPU_SLEEP, BASE_HW_FEATURE_FLUSH_INV_SHADER_OTHER, + BASE_HW_FEATURE_CORE_FEATURES, + BASE_HW_FEATURE_PBHA_HWU, + BASE_HW_FEATURE_LARGE_PAGE_ALLOC, BASE_HW_FEATURE_END }; @@ -87,6 +90,7 @@ __attribute__((unused)) static const enum base_hw_feature base_hw_features_tGOx[ BASE_HW_FEATURE_PROTECTED_DEBUG_MODE, BASE_HW_FEATURE_TLS_HASHING, BASE_HW_FEATURE_IDVS_GROUP_SIZE, + BASE_HW_FEATURE_CORE_FEATURES, BASE_HW_FEATURE_END }; @@ -128,47 +132,52 @@ __attribute__((unused)) static const enum base_hw_feature base_hw_features_tBAx[ BASE_HW_FEATURE_END }; -__attribute__((unused)) static const enum base_hw_feature base_hw_features_tDUx[] = { +__attribute__((unused)) static const enum base_hw_feature base_hw_features_tODx[] = { BASE_HW_FEATURE_FLUSH_REDUCTION, BASE_HW_FEATURE_PROTECTED_DEBUG_MODE, - BASE_HW_FEATURE_IDVS_GROUP_SIZE, BASE_HW_FEATURE_L2_CONFIG, BASE_HW_FEATURE_CLEAN_ONLY_SAFE, - BASE_HW_FEATURE_FLUSH_INV_SHADER_OTHER, BASE_HW_FEATURE_END }; -__attribute__((unused)) static const enum base_hw_feature base_hw_features_tODx[] = { +__attribute__((unused)) static const enum base_hw_feature base_hw_features_tGRx[] = { BASE_HW_FEATURE_FLUSH_REDUCTION, BASE_HW_FEATURE_PROTECTED_DEBUG_MODE, BASE_HW_FEATURE_L2_CONFIG, BASE_HW_FEATURE_CLEAN_ONLY_SAFE, + BASE_HW_FEATURE_CORE_FEATURES, BASE_HW_FEATURE_END }; -__attribute__((unused)) static const enum base_hw_feature base_hw_features_tGRx[] = { +__attribute__((unused)) static const enum base_hw_feature base_hw_features_tVAx[] = { BASE_HW_FEATURE_FLUSH_REDUCTION, BASE_HW_FEATURE_PROTECTED_DEBUG_MODE, BASE_HW_FEATURE_L2_CONFIG, BASE_HW_FEATURE_CLEAN_ONLY_SAFE, + BASE_HW_FEATURE_CORE_FEATURES, BASE_HW_FEATURE_END }; -__attribute__((unused)) static const enum base_hw_feature base_hw_features_tVAx[] = { +__attribute__((unused)) static const enum base_hw_feature base_hw_features_tTUx[] = { BASE_HW_FEATURE_FLUSH_REDUCTION, BASE_HW_FEATURE_PROTECTED_DEBUG_MODE, BASE_HW_FEATURE_L2_CONFIG, BASE_HW_FEATURE_CLEAN_ONLY_SAFE, + BASE_HW_FEATURE_ASN_HASH, + BASE_HW_FEATURE_GPU_SLEEP, + BASE_HW_FEATURE_CORE_FEATURES, BASE_HW_FEATURE_END }; -__attribute__((unused)) static const enum base_hw_feature base_hw_features_tTUx[] = { +__attribute__((unused)) static const enum base_hw_feature base_hw_features_tTIx[] = { BASE_HW_FEATURE_FLUSH_REDUCTION, BASE_HW_FEATURE_PROTECTED_DEBUG_MODE, BASE_HW_FEATURE_L2_CONFIG, BASE_HW_FEATURE_CLEAN_ONLY_SAFE, BASE_HW_FEATURE_ASN_HASH, BASE_HW_FEATURE_GPU_SLEEP, + BASE_HW_FEATURE_CORE_FEATURES, + BASE_HW_FEATURE_PBHA_HWU, BASE_HW_FEATURE_END }; diff --git 
a/mali_kbase/mali_base_hwconfig_issues.h b/mali_kbase/mali_base_hwconfig_issues.h index 8766a6d..003edda 100644 --- a/mali_kbase/mali_base_hwconfig_issues.h +++ b/mali_kbase/mali_base_hwconfig_issues.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2014-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2014-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -21,7 +21,7 @@ /* AUTOMATICALLY GENERATED FILE. If you want to amend the issues/features, * please update base/tools/hwconfig_generator/hwc_{issues,features}.py - * For more information see base/tools/hwconfig_generator/README + * For more information see base/tools/docs/hwconfig_generator.md */ #ifndef _BASE_HWCONFIG_ISSUES_H_ @@ -61,6 +61,13 @@ enum base_hw_issue { BASE_HW_ISSUE_GPU2019_3212, BASE_HW_ISSUE_TURSEHW_1997, BASE_HW_ISSUE_GPU2019_3878, + BASE_HW_ISSUE_TURSEHW_2716, + BASE_HW_ISSUE_GPU2019_3901, + BASE_HW_ISSUE_GPU2021PRO_290, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_TITANHW_2679, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -85,6 +92,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_tMIx_r0p0 BASE_HW_ISSUE_TSIX_2033, BASE_HW_ISSUE_TTRX_921, BASE_HW_ISSUE_GPU2017_1336, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -105,6 +115,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_tMIx_r0p0 BASE_HW_ISSUE_TSIX_2033, BASE_HW_ISSUE_TTRX_921, BASE_HW_ISSUE_GPU2017_1336, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -125,6 +138,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_tMIx_r0p1 BASE_HW_ISSUE_TSIX_2033, BASE_HW_ISSUE_TTRX_921, BASE_HW_ISSUE_GPU2017_1336, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -140,6 +156,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_model_tMI BASE_HW_ISSUE_TMIX_8343, BASE_HW_ISSUE_TMIX_8456, BASE_HW_ISSUE_TSIX_2033, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -153,6 +172,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_tHEx_r0p0 BASE_HW_ISSUE_TSIX_2033, BASE_HW_ISSUE_TTRX_921, BASE_HW_ISSUE_GPU2017_1336, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -166,6 +188,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_tHEx_r0p1 BASE_HW_ISSUE_TSIX_2033, BASE_HW_ISSUE_TTRX_921, BASE_HW_ISSUE_GPU2017_1336, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -179,6 +204,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_tHEx_r0p2 BASE_HW_ISSUE_TSIX_2033, BASE_HW_ISSUE_TTRX_921, BASE_HW_ISSUE_GPU2017_1336, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -191,6 +219,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_tHEx_r0p3 BASE_HW_ISSUE_TSIX_2033, BASE_HW_ISSUE_TTRX_921, BASE_HW_ISSUE_GPU2017_1336, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + 
BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -201,6 +232,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_model_tHE BASE_HW_ISSUE_TMIX_8042, BASE_HW_ISSUE_TMIX_8133, BASE_HW_ISSUE_TSIX_2033, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -214,6 +248,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_tSIx_r0p0 BASE_HW_ISSUE_TTRX_921, BASE_HW_ISSUE_GPU2017_1336, BASE_HW_ISSUE_TTRX_3464, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -227,6 +264,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_tSIx_r0p1 BASE_HW_ISSUE_TTRX_921, BASE_HW_ISSUE_GPU2017_1336, BASE_HW_ISSUE_TTRX_3464, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -239,6 +279,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_tSIx_r1p0 BASE_HW_ISSUE_TTRX_921, BASE_HW_ISSUE_GPU2017_1336, BASE_HW_ISSUE_TTRX_3464, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -250,6 +293,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_tSIx_r1p1 BASE_HW_ISSUE_TTRX_921, BASE_HW_ISSUE_GPU2017_1336, BASE_HW_ISSUE_TTRX_3464, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -260,6 +306,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_model_tSI BASE_HW_ISSUE_TSIX_1116, BASE_HW_ISSUE_TSIX_2033, BASE_HW_ISSUE_TTRX_3464, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -271,6 +320,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_tDVx_r0p0 BASE_HW_ISSUE_TTRX_921, BASE_HW_ISSUE_GPU2017_1336, BASE_HW_ISSUE_TTRX_3464, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -281,6 +333,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_model_tDV BASE_HW_ISSUE_TSIX_1116, BASE_HW_ISSUE_TSIX_2033, BASE_HW_ISSUE_TTRX_3464, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -293,6 +348,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_tNOx_r0p0 BASE_HW_ISSUE_TTRX_921, BASE_HW_ISSUE_GPU2017_1336, BASE_HW_ISSUE_TTRX_3464, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -303,6 +361,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_model_tNO BASE_HW_ISSUE_TSIX_1116, BASE_HW_ISSUE_TSIX_2033, BASE_HW_ISSUE_TTRX_3464, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -315,6 +376,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_tGOx_r0p0 BASE_HW_ISSUE_TTRX_921, BASE_HW_ISSUE_GPU2017_1336, BASE_HW_ISSUE_TTRX_3464, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -327,6 +391,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_tGOx_r1p0 BASE_HW_ISSUE_TTRX_921, BASE_HW_ISSUE_GPU2017_1336, BASE_HW_ISSUE_TTRX_3464, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -337,6 +404,9 @@ 
__attribute__((unused)) static const enum base_hw_issue base_hw_issues_model_tGO BASE_HW_ISSUE_TSIX_1116, BASE_HW_ISSUE_TSIX_2033, BASE_HW_ISSUE_TTRX_3464, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -353,6 +423,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_tTRx_r0p0 BASE_HW_ISSUE_TTRX_3470, BASE_HW_ISSUE_TTRX_3464, BASE_HW_ISSUE_TTRX_3485, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -369,6 +442,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_tTRx_r0p1 BASE_HW_ISSUE_TTRX_3470, BASE_HW_ISSUE_TTRX_3464, BASE_HW_ISSUE_TTRX_3485, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -384,6 +460,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_tTRx_r0p2 BASE_HW_ISSUE_TTRX_3083, BASE_HW_ISSUE_TTRX_3470, BASE_HW_ISSUE_TTRX_3464, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -396,6 +475,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_model_tTR BASE_HW_ISSUE_TTRX_3083, BASE_HW_ISSUE_TTRX_3470, BASE_HW_ISSUE_TTRX_3464, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -412,6 +494,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_tNAx_r0p0 BASE_HW_ISSUE_TTRX_3470, BASE_HW_ISSUE_TTRX_3464, BASE_HW_ISSUE_TTRX_3485, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -427,6 +512,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_tNAx_r0p1 BASE_HW_ISSUE_TTRX_3083, BASE_HW_ISSUE_TTRX_3470, BASE_HW_ISSUE_TTRX_3464, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -439,6 +527,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_model_tNA BASE_HW_ISSUE_TTRX_3083, BASE_HW_ISSUE_TTRX_3470, BASE_HW_ISSUE_TTRX_3464, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -453,6 +544,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_tBEx_r0p0 BASE_HW_ISSUE_TTRX_3470, BASE_HW_ISSUE_TTRX_3464, BASE_HW_ISSUE_TTRX_3485, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -466,6 +560,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_tBEx_r0p1 BASE_HW_ISSUE_TTRX_3083, BASE_HW_ISSUE_TTRX_3470, BASE_HW_ISSUE_TTRX_3464, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -479,6 +576,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_tBEx_r1p0 BASE_HW_ISSUE_TTRX_3083, BASE_HW_ISSUE_TTRX_3470, BASE_HW_ISSUE_TTRX_3464, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -492,6 +592,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_tBEx_r1p1 BASE_HW_ISSUE_TTRX_3083, BASE_HW_ISSUE_TTRX_3470, BASE_HW_ISSUE_TTRX_3464, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -504,6 +607,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_model_tBE 
BASE_HW_ISSUE_TTRX_3083, BASE_HW_ISSUE_TTRX_3470, BASE_HW_ISSUE_TTRX_3464, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -518,6 +624,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_lBEx_r1p0 BASE_HW_ISSUE_TTRX_3470, BASE_HW_ISSUE_TTRX_3464, BASE_HW_ISSUE_TTRX_3485, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -531,6 +640,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_lBEx_r1p1 BASE_HW_ISSUE_TTRX_3083, BASE_HW_ISSUE_TTRX_3470, BASE_HW_ISSUE_TTRX_3464, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -544,6 +656,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_tBAx_r0p0 BASE_HW_ISSUE_TTRX_3083, BASE_HW_ISSUE_TTRX_3470, BASE_HW_ISSUE_TTRX_3464, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -557,6 +672,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_tBAx_r1p0 BASE_HW_ISSUE_TTRX_3083, BASE_HW_ISSUE_TTRX_3470, BASE_HW_ISSUE_TTRX_3464, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -569,105 +687,201 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_model_tBA BASE_HW_ISSUE_TTRX_3083, BASE_HW_ISSUE_TTRX_3470, BASE_HW_ISSUE_TTRX_3464, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; -__attribute__((unused)) static const enum base_hw_issue base_hw_issues_tDUx_r0p0[] = { - BASE_HW_ISSUE_9435, +__attribute__((unused)) static const enum base_hw_issue base_hw_issues_tODx_r0p0[] = { BASE_HW_ISSUE_TSIX_2033, BASE_HW_ISSUE_TTRX_1337, - BASE_HW_ISSUE_TTRX_921, - BASE_HW_ISSUE_TTRX_3414, - BASE_HW_ISSUE_TTRX_3083, + BASE_HW_ISSUE_GPU2019_3212, + BASE_HW_ISSUE_GPU2019_3878, + BASE_HW_ISSUE_GPU2019_3901, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; -__attribute__((unused)) static const enum base_hw_issue base_hw_issues_model_tDUx[] = { - BASE_HW_ISSUE_5736, - BASE_HW_ISSUE_9435, +__attribute__((unused)) static const enum base_hw_issue base_hw_issues_model_tODx[] = { BASE_HW_ISSUE_TSIX_2033, BASE_HW_ISSUE_TTRX_1337, - BASE_HW_ISSUE_TTRX_3414, - BASE_HW_ISSUE_TTRX_3083, + BASE_HW_ISSUE_GPU2019_3212, + BASE_HW_ISSUE_GPU2019_3878, + BASE_HW_ISSUE_GPU2019_3901, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; -__attribute__((unused)) static const enum base_hw_issue base_hw_issues_tODx_r0p0[] = { - BASE_HW_ISSUE_9435, +__attribute__((unused)) static const enum base_hw_issue base_hw_issues_tGRx_r0p0[] = { BASE_HW_ISSUE_TSIX_2033, BASE_HW_ISSUE_TTRX_1337, - BASE_HW_ISSUE_GPU2019_3212, BASE_HW_ISSUE_GPU2019_3878, + BASE_HW_ISSUE_GPU2019_3901, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; -__attribute__((unused)) static const enum base_hw_issue base_hw_issues_model_tODx[] = { - BASE_HW_ISSUE_5736, - BASE_HW_ISSUE_9435, +__attribute__((unused)) static const enum base_hw_issue base_hw_issues_model_tGRx[] = { BASE_HW_ISSUE_TSIX_2033, BASE_HW_ISSUE_TTRX_1337, - BASE_HW_ISSUE_GPU2019_3212, BASE_HW_ISSUE_GPU2019_3878, + BASE_HW_ISSUE_GPU2019_3901, + BASE_HW_ISSUE_TITANHW_2710, + 
BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; -__attribute__((unused)) static const enum base_hw_issue base_hw_issues_tGRx_r0p0[] = { - BASE_HW_ISSUE_9435, +__attribute__((unused)) static const enum base_hw_issue base_hw_issues_tVAx_r0p0[] = { BASE_HW_ISSUE_TSIX_2033, BASE_HW_ISSUE_TTRX_1337, BASE_HW_ISSUE_GPU2019_3878, + BASE_HW_ISSUE_GPU2019_3901, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; -__attribute__((unused)) static const enum base_hw_issue base_hw_issues_model_tGRx[] = { - BASE_HW_ISSUE_5736, - BASE_HW_ISSUE_9435, +__attribute__((unused)) static const enum base_hw_issue base_hw_issues_model_tVAx[] = { BASE_HW_ISSUE_TSIX_2033, BASE_HW_ISSUE_TTRX_1337, BASE_HW_ISSUE_GPU2019_3878, + BASE_HW_ISSUE_GPU2019_3901, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; -__attribute__((unused)) static const enum base_hw_issue base_hw_issues_tVAx_r0p0[] = { - BASE_HW_ISSUE_9435, +__attribute__((unused)) static const enum base_hw_issue base_hw_issues_tTUx_r0p0[] = { BASE_HW_ISSUE_TSIX_2033, BASE_HW_ISSUE_TTRX_1337, + BASE_HW_ISSUE_TURSEHW_1997, BASE_HW_ISSUE_GPU2019_3878, + BASE_HW_ISSUE_TURSEHW_2716, + BASE_HW_ISSUE_GPU2019_3901, + BASE_HW_ISSUE_GPU2021PRO_290, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_TITANHW_2679, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; -__attribute__((unused)) static const enum base_hw_issue base_hw_issues_model_tVAx[] = { - BASE_HW_ISSUE_5736, - BASE_HW_ISSUE_9435, +__attribute__((unused)) static const enum base_hw_issue base_hw_issues_tTUx_r0p1[] = { BASE_HW_ISSUE_TSIX_2033, BASE_HW_ISSUE_TTRX_1337, + BASE_HW_ISSUE_TURSEHW_1997, BASE_HW_ISSUE_GPU2019_3878, + BASE_HW_ISSUE_TURSEHW_2716, + BASE_HW_ISSUE_GPU2019_3901, + BASE_HW_ISSUE_GPU2021PRO_290, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_TITANHW_2679, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; __attribute__((unused)) static const enum base_hw_issue base_hw_issues_model_tTUx[] = { - BASE_HW_ISSUE_5736, - BASE_HW_ISSUE_9435, BASE_HW_ISSUE_TSIX_2033, BASE_HW_ISSUE_TTRX_1337, BASE_HW_ISSUE_GPU2019_3878, + BASE_HW_ISSUE_TURSEHW_2716, + BASE_HW_ISSUE_GPU2019_3901, + BASE_HW_ISSUE_GPU2021PRO_290, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_TITANHW_2679, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; -__attribute__((unused)) static const enum base_hw_issue base_hw_issues_tTUx_r0p0[] = { - BASE_HW_ISSUE_9435, +__attribute__((unused)) static const enum base_hw_issue base_hw_issues_tTUx_r1p0[] = { BASE_HW_ISSUE_TSIX_2033, BASE_HW_ISSUE_TTRX_1337, - BASE_HW_ISSUE_TURSEHW_1997, BASE_HW_ISSUE_GPU2019_3878, + BASE_HW_ISSUE_TURSEHW_2716, + BASE_HW_ISSUE_GPU2019_3901, + BASE_HW_ISSUE_GPU2021PRO_290, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_TITANHW_2679, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; -__attribute__((unused)) static const enum base_hw_issue base_hw_issues_tTUx_r1p0[] = { - BASE_HW_ISSUE_9435, +__attribute__((unused)) static const enum base_hw_issue base_hw_issues_tTUx_r1p1[] = { + BASE_HW_ISSUE_TSIX_2033, + BASE_HW_ISSUE_TTRX_1337, + BASE_HW_ISSUE_GPU2019_3878, + BASE_HW_ISSUE_TURSEHW_2716, + BASE_HW_ISSUE_GPU2019_3901, + BASE_HW_ISSUE_GPU2021PRO_290, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_TITANHW_2679, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, + BASE_HW_ISSUE_END +}; 
+ +__attribute__((unused)) static const enum base_hw_issue base_hw_issues_tTUx_r1p2[] = { BASE_HW_ISSUE_TSIX_2033, BASE_HW_ISSUE_TTRX_1337, BASE_HW_ISSUE_GPU2019_3878, + BASE_HW_ISSUE_TURSEHW_2716, + BASE_HW_ISSUE_GPU2019_3901, + BASE_HW_ISSUE_GPU2021PRO_290, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_TITANHW_2679, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, + BASE_HW_ISSUE_END +}; + +__attribute__((unused)) static const enum base_hw_issue base_hw_issues_tTUx_r1p3[] = { + BASE_HW_ISSUE_TSIX_2033, + BASE_HW_ISSUE_TTRX_1337, + BASE_HW_ISSUE_GPU2019_3878, + BASE_HW_ISSUE_TURSEHW_2716, + BASE_HW_ISSUE_GPU2019_3901, + BASE_HW_ISSUE_GPU2021PRO_290, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_TITANHW_2679, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, + BASE_HW_ISSUE_END +}; + +__attribute__((unused)) static const enum base_hw_issue base_hw_issues_model_tTIx[] = { + BASE_HW_ISSUE_TSIX_2033, + BASE_HW_ISSUE_TTRX_1337, + BASE_HW_ISSUE_TURSEHW_2716, + BASE_HW_ISSUE_GPU2021PRO_290, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_TITANHW_2679, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, + BASE_HW_ISSUE_END +}; + +__attribute__((unused)) static const enum base_hw_issue base_hw_issues_tTIx_r0p0[] = { + BASE_HW_ISSUE_TSIX_2033, + BASE_HW_ISSUE_TTRX_1337, + BASE_HW_ISSUE_TURSEHW_2716, + BASE_HW_ISSUE_GPU2021PRO_290, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_TITANHW_2679, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; diff --git a/mali_kbase/mali_kbase.h b/mali_kbase/mali_kbase.h index 9f2d209..d9e632f 100644 --- a/mali_kbase/mali_kbase.h +++ b/mali_kbase/mali_kbase.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2010-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2010-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -52,6 +52,7 @@ #include <uapi/gpu/arm/midgard/mali_base_kernel.h> #include <mali_kbase_linux.h> +#include <linux/version_compat_defs.h> /* * Include mali_kbase_defs.h first as this provides types needed by other local @@ -61,9 +62,7 @@ #include "debug/mali_kbase_debug_ktrace.h" #include "context/mali_kbase_context.h" -#include "mali_kbase_strings.h" #include "mali_kbase_mem_lowlevel.h" -#include "mali_kbase_utility.h" #include "mali_kbase_mem.h" #include "mmu/mali_kbase_mmu.h" #include "mali_kbase_gpu_memory_debugfs.h" @@ -75,7 +74,9 @@ #include "mali_kbase_jd_debugfs.h" #include "mali_kbase_jm.h" #include "mali_kbase_js.h" -#endif /* !MALI_USE_CSF */ +#else /* !MALI_USE_CSF */ +#include "csf/mali_kbase_debug_csf_fault.h" +#endif /* MALI_USE_CSF */ #include "ipa/mali_kbase_ipa.h" @@ -85,16 +86,12 @@ #include "mali_linux_trace.h" +#define KBASE_DRV_NAME "mali" +#define KBASE_TIMELINE_NAME KBASE_DRV_NAME ".timeline" + #if MALI_USE_CSF #include "csf/mali_kbase_csf.h" -#endif -#ifndef u64_to_user_ptr -/* Introduced in Linux v4.6 */ -#define u64_to_user_ptr(x) ((void __user *)(uintptr_t)x) -#endif - -#if MALI_USE_CSF /* Physical memory group ID for CSF user I/O. 
*/ #define KBASE_MEM_GROUP_CSF_IO BASE_MEM_GROUP_DEFAULT @@ -266,7 +263,7 @@ void kbase_jd_cancel(struct kbase_device *kbdev, struct kbase_jd_atom *katom); void kbase_jd_zap_context(struct kbase_context *kctx); /* - * jd_done_nolock - Perform the necessary handling of an atom that has completed + * kbase_jd_done_nolock - Perform the necessary handling of an atom that has completed * the execution. * * @katom: Pointer to the atom that completed the execution @@ -282,7 +279,7 @@ void kbase_jd_zap_context(struct kbase_context *kctx); * * The caller must hold the kbase_jd_context.lock. */ -bool jd_done_nolock(struct kbase_jd_atom *katom, bool post_immediately); +bool kbase_jd_done_nolock(struct kbase_jd_atom *katom, bool post_immediately); void kbase_jd_free_external_resources(struct kbase_jd_atom *katom); void kbase_jd_dep_clear_locked(struct kbase_jd_atom *katom); @@ -345,21 +342,8 @@ int kbase_job_slot_softstop_start_rp(struct kbase_context *kctx, void kbase_job_slot_softstop(struct kbase_device *kbdev, int js, struct kbase_jd_atom *target_katom); -void kbase_job_slot_softstop_swflags(struct kbase_device *kbdev, int js, - struct kbase_jd_atom *target_katom, u32 sw_flags); - -/** - * kbase_job_slot_hardstop - Hard-stop the specified job slot - * @kctx: The kbase context that contains the job(s) that should - * be hard-stopped - * @js: The job slot to hard-stop - * @target_katom: The job that should be hard-stopped (or NULL for all - * jobs from the context) - * Context: - * The job slot lock must be held when calling this function. - */ -void kbase_job_slot_hardstop(struct kbase_context *kctx, int js, - struct kbase_jd_atom *target_katom); +void kbase_job_slot_softstop_swflags(struct kbase_device *kbdev, unsigned int js, + struct kbase_jd_atom *target_katom, u32 sw_flags); /** * kbase_job_check_enter_disjoint - potentiall enter disjoint mode @@ -454,19 +438,6 @@ static inline void kbase_free_user_buffer( } } -/** - * kbase_mem_copy_from_extres() - Copy from external resources. - * - * @kctx: kbase context within which the copying is to take place. - * @buf_data: Pointer to the information about external resources: - * pages pertaining to the external resource, number of - * pages to copy. - * - * Return: 0 on success, error code otherwise. - */ -int kbase_mem_copy_from_extres(struct kbase_context *kctx, - struct kbase_debug_copy_buffer *buf_data); - #if !MALI_USE_CSF int kbase_process_soft_job(struct kbase_jd_atom *katom); int kbase_prepare_soft_job(struct kbase_jd_atom *katom); @@ -474,7 +445,7 @@ void kbase_finish_soft_job(struct kbase_jd_atom *katom); void kbase_cancel_soft_job(struct kbase_jd_atom *katom); void kbase_resume_suspended_soft_jobs(struct kbase_device *kbdev); void kbasep_remove_waiting_soft_job(struct kbase_jd_atom *katom); -#if defined(CONFIG_SYNC) || defined(CONFIG_SYNC_FILE) +#if IS_ENABLED(CONFIG_SYNC_FILE) void kbase_soft_event_wait_callback(struct kbase_jd_atom *katom); #endif int kbase_soft_event_update(struct kbase_context *kctx, @@ -493,9 +464,9 @@ void kbasep_as_do_poke(struct work_struct *work); * * @kbdev: The kbase device structure for the device * - * The caller should ensure that either kbdev->pm.active_count_lock is held, or - * a dmb was executed recently (to ensure the value is most - * up-to-date). However, without a lock the value could change afterwards. + * The caller should ensure that either kbase_device::kbase_pm_device_data::lock is held, + * or a dmb was executed recently (to ensure the value is most up-to-date). 
+ * However, without a lock the value could change afterwards. * * Return: * * false if a suspend is not in progress @@ -506,6 +477,22 @@ static inline bool kbase_pm_is_suspending(struct kbase_device *kbdev) return kbdev->pm.suspending; } +/** + * kbase_pm_is_resuming - Check whether System resume of GPU device is in progress. + * + * @kbdev: The kbase device structure for the device + * + * The caller should ensure that either kbase_device::kbase_pm_device_data::lock is held, + * or a dmb was executed recently (to ensure the value is most up-to-date). + * However, without a lock the value could change afterwards. + * + * Return: true if System resume is in progress, otherwise false. + */ +static inline bool kbase_pm_is_resuming(struct kbase_device *kbdev) +{ + return kbdev->pm.resuming; +} + #ifdef CONFIG_MALI_ARBITER_SUPPORT /* * Check whether a gpu lost is in progress @@ -559,6 +546,23 @@ static inline bool kbase_pm_is_active(struct kbase_device *kbdev) } /** + * kbase_pm_gpu_freq_init() - Find the lowest frequency that the GPU can + * run as using the device tree, then query the + * GPU properties to find out the highest GPU + * frequency and store both of them within the + * @kbase_device. + * @kbdev: Pointer to kbase device. + * + * This function could be called from kbase_clk_rate_trace_manager_init, + * but is left separate as it can be called as soon as + * dev_pm_opp_of_add_table() has been called to initialize the OPP table, + * which occurs in power_control_init(). + * + * Return: 0 on success, negative error code on failure. + */ +int kbase_pm_gpu_freq_init(struct kbase_device *kbdev); + +/** * kbase_pm_metrics_start - Start the utilization metrics timer * @kbdev: Pointer to the kbase device for which to start the utilization * metrics calculation thread. @@ -576,6 +580,40 @@ void kbase_pm_metrics_start(struct kbase_device *kbdev); */ void kbase_pm_metrics_stop(struct kbase_device *kbdev); +/** + * kbase_pm_init_event_log - Initialize the event log and make it discoverable + * + * @kbdev: The kbase device structure for the device (must be a valid pointer) + */ +void kbase_pm_init_event_log(struct kbase_device *kbdev); + +/** + * kbase_pm_max_event_log_size - Get the largest size of the power management event log + * + * @kbdev: The kbase device structure for the device (must be a valid pointer) + * + * Return: The size of a buffer large enough to contain the log at any time. + */ +u64 kbase_pm_max_event_log_size(struct kbase_device *kbdev); + +/** + * kbase_pm_copy_event_log - Retrieve a copy of the power management event log + * + * @kbdev: The kbase device structure for the device (must be a valid pointer) + * @buffer: If non-NULL, a buffer of @size bytes to copy the data into + * @size: The size of buffer (should be at least as large as returned by + * kbase_pm_event_max_log_size()) + * + * This function is called when dumping a debug log of all recent events in the + * power management backend. + * + * Return: 0 if the log could be copied successfully, otherwise an error code. + * + * Requires kbdev->pmaccess_lock to be held. + */ +int kbase_pm_copy_event_log(struct kbase_device *kbdev, + void *buffer, u64 size); + #if MALI_USE_CSF && defined(KBASE_PM_RUNTIME) /** * kbase_pm_handle_runtime_suspend - Handle the runtime suspend of GPU @@ -614,6 +652,7 @@ int kbase_pm_handle_runtime_suspend(struct kbase_device *kbdev); * Return: 0 if the wake up was successful. 
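/*
 * Illustrative caller, not part of the patch, for the new PM event log API
 * declared above: size a buffer with kbase_pm_max_event_log_size() and then
 * snapshot the log with kbase_pm_copy_event_log(). The allocation strategy
 * and the example_* name are assumptions made for this sketch; per the
 * kerneldoc, kbdev->pmaccess_lock must be held around the copy.
 */
static int example_dump_pm_event_log(struct kbase_device *kbdev)
{
	u64 size = kbase_pm_max_event_log_size(kbdev);
	void *buf = kvmalloc((size_t)size, GFP_KERNEL);
	int err;

	if (!buf)
		return -ENOMEM;

	/* Caller is responsible for taking kbdev->pmaccess_lock here... */
	err = kbase_pm_copy_event_log(kbdev, buf, size);
	/* ...and for dropping it before the buffer is consumed or freed. */

	if (!err) {
		/* e.g. hand 'buf' to a debugfs blob or a crash-dump writer */
	}

	kvfree(buf);
	return err;
}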
*/ int kbase_pm_force_mcu_wakeup_after_sleep(struct kbase_device *kbdev); + #endif #if !MALI_USE_CSF @@ -763,23 +802,153 @@ void kbase_device_pcm_dev_term(struct kbase_device *const kbdev); #define KBASE_DISJOINT_STATE_INTERLEAVED_CONTEXT_COUNT_THRESHOLD 2 /** - * kbase_create_realtime_thread - Create a realtime thread with an appropriate coremask + * kbase_kthread_run_rt - Create a realtime thread with an appropriate coremask * - * @kbdev: the kbase device - * @threadfn: the function the realtime thread will execute - * @data: pointer to the thread's data - * @namefmt: a name for the thread. + * @kbdev: the kbase device + * @threadfn: the function the realtime thread will execute + * @thread_param: data pointer to @threadfn + * @namefmt: a name for the thread. * * Creates a realtime kthread with priority &KBASE_RT_THREAD_PRIO and restricted * to cores defined by &KBASE_RT_THREAD_CPUMASK_MIN and &KBASE_RT_THREAD_CPUMASK_MAX. * - * Return: A valid &struct task_struct pointer on success, or an ERR_PTR on failure. + * Wakes up the task. + * + * Return: IS_ERR() on failure, or a valid task pointer. */ -struct task_struct * kbase_create_realtime_thread(struct kbase_device *kbdev, - int (*threadfn)(void *data), void *data, const char namefmt[]); +struct task_struct *kbase_kthread_run_rt(struct kbase_device *kbdev, + int (*threadfn)(void *data), void *thread_param, const char namefmt[], ...); + +/** + * kbase_kthread_run_worker_rt - Create a realtime kthread_worker_fn with an appropriate coremask + * + * @kbdev: the kbase device + * @worker: pointer to the thread's parameters + * @namefmt: a name for the thread. + * + * Creates a realtime kthread_worker_fn thread with priority &KBASE_RT_THREAD_PRIO and restricted + * to cores defined by &KBASE_RT_THREAD_CPUMASK_MIN and &KBASE_RT_THREAD_CPUMASK_MAX. + * + * Wakes up the task. + * + * Return: Zero on success, or an PTR_ERR on failure. + */ +int kbase_kthread_run_worker_rt(struct kbase_device *kbdev, + struct kthread_worker *worker, const char namefmt[], ...); + +/** + * kbase_destroy_kworker_stack - Destroy a kthread_worker and it's thread on the stack + * + * @worker: pointer to the thread's kworker + */ +void kbase_destroy_kworker_stack(struct kthread_worker *worker); #if !defined(UINT64_MAX) #define UINT64_MAX ((uint64_t)0xFFFFFFFFFFFFFFFFULL) #endif +/** + * kbase_file_fops_count() - Get the kfile::fops_count value + * + * @kfile: Pointer to the object representing the mali device file. + * + * The value is read with kfile::lock held. + * + * Return: sampled value of kfile::fops_count. + */ +static inline u32 kbase_file_fops_count(struct kbase_file *kfile) +{ + u32 fops_count; + + spin_lock(&kfile->lock); + fops_count = kfile->fops_count; + spin_unlock(&kfile->lock); + + return fops_count; +} + +/** + * kbase_file_inc_fops_count_unless_closed() - Increment the kfile::fops_count value if the + * kfile::owner is still set. + * + * @kfile: Pointer to the object representing the /dev/malixx device file instance. + * + * Return: true if the increment was done otherwise false. + */ +static inline bool kbase_file_inc_fops_count_unless_closed(struct kbase_file *kfile) +{ + bool count_incremented = false; + + spin_lock(&kfile->lock); + if (kfile->owner) { + kfile->fops_count++; + count_incremented = true; + } + spin_unlock(&kfile->lock); + + return count_incremented; +} + +/** + * kbase_file_dec_fops_count() - Decrement the kfile::fops_count value + * + * @kfile: Pointer to the object representing the /dev/malixx device file instance. 
+ * + * This function shall only be called to decrement kfile::fops_count if a successful call + * to kbase_file_inc_fops_count_unless_closed() was made previously by the current thread. + * + * The function would enqueue the kfile::destroy_kctx_work if the process that originally + * created the file instance has closed its copy and no Kbase handled file operations are + * in progress and no memory mappings are present for the file instance. + */ +static inline void kbase_file_dec_fops_count(struct kbase_file *kfile) +{ + spin_lock(&kfile->lock); + WARN_ON_ONCE(kfile->fops_count <= 0); + kfile->fops_count--; + if (unlikely(!kfile->fops_count && !kfile->owner && !kfile->map_count)) { + queue_work(system_wq, &kfile->destroy_kctx_work); +#if IS_ENABLED(CONFIG_DEBUG_FS) + wake_up(&kfile->zero_fops_count_wait); +#endif + } + spin_unlock(&kfile->lock); +} + +/** + * kbase_file_inc_cpu_mapping_count() - Increment the kfile::map_count value. + * + * @kfile: Pointer to the object representing the /dev/malixx device file instance. + * + * This function shall be called when the memory mapping on /dev/malixx device file + * instance is created. The kbase_file::setup_state shall be KBASE_FILE_COMPLETE. + */ +static inline void kbase_file_inc_cpu_mapping_count(struct kbase_file *kfile) +{ + spin_lock(&kfile->lock); + kfile->map_count++; + spin_unlock(&kfile->lock); +} + +/** + * kbase_file_dec_cpu_mapping_count() - Decrement the kfile::map_count value + * + * @kfile: Pointer to the object representing the /dev/malixx device file instance. + * + * This function is called to decrement kfile::map_count value when the memory mapping + * on /dev/malixx device file is closed. + * The function would enqueue the kfile::destroy_kctx_work if the process that originally + * created the file instance has closed its copy and there are no mappings present and no + * Kbase handled file operations are in progress for the file instance. + */ +static inline void kbase_file_dec_cpu_mapping_count(struct kbase_file *kfile) +{ + spin_lock(&kfile->lock); + WARN_ON_ONCE(kfile->map_count <= 0); + kfile->map_count--; + if (unlikely(!kfile->map_count && !kfile->owner && !kfile->fops_count)) + queue_work(system_wq, &kfile->destroy_kctx_work); + spin_unlock(&kfile->lock); +} + #endif diff --git a/mali_kbase/mali_kbase_as_fault_debugfs.c b/mali_kbase/mali_kbase_as_fault_debugfs.c index 77f450d..ad33691 100644 --- a/mali_kbase/mali_kbase_as_fault_debugfs.c +++ b/mali_kbase/mali_kbase_as_fault_debugfs.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2016-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2016-2022 ARM Limited. All rights reserved. 
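/*
 * Illustrative pattern, not part of the patch, for how the fops_count
 * helpers above are meant to bracket a file operation: take the count only
 * while the owning process still has its copy of the file open, do the
 * work, then drop the count. The real entry points later in this patch
 * (kbase_ioctl(), kbase_read(), ...) follow exactly this shape via
 * kbase_file_inc_fops_count_if_allowed(). The example_* name is an
 * assumption for the sketch.
 */
static long example_guarded_fop(struct kbase_file *kfile)
{
	long ret;

	if (!kbase_file_inc_fops_count_unless_closed(kfile))
		return -EPERM;	/* owner already closed its copy of the fd */

	ret = 0;	/* ... the actual file operation would run here ... */

	kbase_file_dec_fops_count(kfile);	/* may queue deferred kctx destruction */
	return ret;
}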
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -98,11 +98,9 @@ void kbase_as_fault_debugfs_init(struct kbase_device *kbdev) "unable to create address_spaces debugfs directory"); } else { for (i = 0; i < kbdev->nr_hw_address_spaces; i++) { - snprintf(as_name, ARRAY_SIZE(as_name), "as%u", i); - debugfs_create_file(as_name, 0444, - debugfs_directory, - (void *)(uintptr_t)i, - &as_fault_fops); + if (likely(scnprintf(as_name, ARRAY_SIZE(as_name), "as%u", i))) + debugfs_create_file(as_name, 0444, debugfs_directory, + (void *)(uintptr_t)i, &as_fault_fops); } } diff --git a/mali_kbase/mali_kbase_config.c b/mali_kbase/mali_kbase_config.c index 37dbca1..32f404b 100644 --- a/mali_kbase/mali_kbase_config.c +++ b/mali_kbase/mali_kbase_config.c @@ -63,7 +63,6 @@ void kbasep_platform_device_late_term(struct kbase_device *kbdev) platform_funcs_p->platform_late_term_func(kbdev); } -#if !MALI_USE_CSF int kbasep_platform_context_init(struct kbase_context *kctx) { struct kbase_platform_funcs_conf *platform_funcs_p; @@ -84,21 +83,41 @@ void kbasep_platform_context_term(struct kbase_context *kctx) platform_funcs_p->platform_handler_context_term_func(kctx); } -void kbasep_platform_event_atom_submit(struct kbase_jd_atom *katom) +void kbasep_platform_event_work_begin(void *param) { struct kbase_platform_funcs_conf *platform_funcs_p; - platform_funcs_p = (struct kbase_platform_funcs_conf *)PLATFORM_FUNCS; - if (platform_funcs_p && platform_funcs_p->platform_handler_atom_submit_func) - platform_funcs_p->platform_handler_atom_submit_func(katom); + platform_funcs_p = (struct kbase_platform_funcs_conf*)PLATFORM_FUNCS; + if (platform_funcs_p && platform_funcs_p->platform_handler_work_begin_func) + platform_funcs_p->platform_handler_work_begin_func(param); } -void kbasep_platform_event_atom_complete(struct kbase_jd_atom *katom) +void kbasep_platform_event_work_end(void *param) { struct kbase_platform_funcs_conf *platform_funcs_p; - platform_funcs_p = (struct kbase_platform_funcs_conf *)PLATFORM_FUNCS; - if (platform_funcs_p && platform_funcs_p->platform_handler_atom_complete_func) - platform_funcs_p->platform_handler_atom_complete_func(katom); + platform_funcs_p = (struct kbase_platform_funcs_conf*)PLATFORM_FUNCS; + if (platform_funcs_p && platform_funcs_p->platform_handler_work_end_func) + platform_funcs_p->platform_handler_work_end_func(param); } -#endif + +int kbasep_platform_fw_config_init(struct kbase_device *kbdev) +{ + struct kbase_platform_funcs_conf *platform_funcs_p; + + platform_funcs_p = (struct kbase_platform_funcs_conf*)PLATFORM_FUNCS; + if (platform_funcs_p && platform_funcs_p->platform_fw_cfg_init_func) + return platform_funcs_p->platform_fw_cfg_init_func(kbdev); + + return 0; +} + +void kbasep_platform_event_core_dump(struct kbase_device *kbdev, const char* reason) +{ + struct kbase_platform_funcs_conf *platform_funcs_p; + + platform_funcs_p = (struct kbase_platform_funcs_conf*)PLATFORM_FUNCS; + if (platform_funcs_p && platform_funcs_p->platform_handler_core_dump_func) + platform_funcs_p->platform_handler_core_dump_func(kbdev, reason); +} + diff --git a/mali_kbase/mali_kbase_config.h b/mali_kbase/mali_kbase_config.h index ecfdb28..ab65216 100644 --- a/mali_kbase/mali_kbase_config.h +++ b/mali_kbase/mali_kbase_config.h @@ -34,14 +34,9 @@ /* Forward declaration of struct kbase_device */ struct kbase_device; -#if !MALI_USE_CSF /* Forward declaration of struct kbase_context */ struct kbase_context; 
-/* Forward declaration of struct kbase_atom */ -struct kbase_jd_atom; -#endif - /** * struct kbase_platform_funcs_conf - Specifies platform integration function * pointers for DDK events such as device init and term. @@ -104,8 +99,6 @@ struct kbase_platform_funcs_conf { * can be accessed (and possibly terminated) in here. */ void (*platform_late_term_func)(struct kbase_device *kbdev); - -#if !MALI_USE_CSF /** * @platform_handler_context_init_func: platform specific handler for * when a new kbase_context is created. @@ -129,33 +122,63 @@ struct kbase_platform_funcs_conf { */ void (*platform_handler_context_term_func)(struct kbase_context *kctx); /** - * @platform_handler_atom_submit_func: platform specific handler for - * when a kbase_jd_atom is submitted. - * @katom - kbase_jd_atom pointer + * platform_handler_work_begin_func - Platform specific handler whose + * function changes depending on the + * backend used. + * @param + * - If job manager GPU: Param is a pointer of type struct kbase_jd_atom*, + * to the atom that just started executing. + * - If CSF GPU: Param is a pointer of type struct kbase_queue_group*, to + * the group resident in a CSG slot which just started executing. + * + * Function pointer for platform specific handling at the point when a unit + * of work starts running on the GPU or set to NULL if not required. The + * function cannot assume that it is running in a process context. * - * Function pointer for platform specific handling at the point when an - * atom is submitted to the GPU or set to NULL if not required. The + * Context: + * - If job manager: Function must be runnable in an interrupt context. + */ + void (*platform_handler_work_begin_func)(void* param); + /** + * platform_handler_work_end_func - Platform specific handler whose function + * changes depending on the backend used. + * @param + * - If job manager GPU: Param is a pointer of type struct kbase_jd_atom*, + * to the atom that just completed. + * - If CSF GPU: Param is a pointer of type struct kbase_queue_group*, to + * the group resident in a CSG slot which just completed or suspended + * execution. + * + * Function pointer for platform specific handling at the point when a unit + * of work stops running on the GPU or set to NULL if not required. The * function cannot assume that it is running in a process context. * - * Context: The caller must hold the hwaccess_lock. Function must be - * runnable in an interrupt context. + * Context: + * - If job manager: Function must be runnable in an interrupt context. */ - void (*platform_handler_atom_submit_func)(struct kbase_jd_atom *katom); + void (*platform_handler_work_end_func)(void* param); /** - * @platform_handler_atom_complete_func: platform specific handler for - * when a kbase_jd_atom completes. - * @katom - kbase_jd_atom pointer + * platform_fw_cfg_init_func - Platform specific callback for FW configuration * - * Function pointer for platform specific handling at the point when an - * atom stops running on the GPU or set to NULL if not required. The - * function cannot assume that it is running in a process context. + * @kbdev: kbase_device pointer + * + * Function pointer for platform specific FW configuration * - * Context: The caller must hold the hwaccess_lock. Function must be - * runnable in an interrupt context. 
+ * Context: Process context */ - void (*platform_handler_atom_complete_func)( - struct kbase_jd_atom *katom); -#endif + int (*platform_fw_cfg_init_func)(struct kbase_device *kbdev); + /** + * platform_handler_core_dump_func - Platform specific handler for triggering a core dump. + * + * @kbdev: kbase_device pointer + * @reason: A null terminated string containing a dump reason + * + * Function pointer for platform specific handling at the point an internal error + * has occurred, to dump debug info about the error. Or set to NULL if not required. + * + * Context: The caller must hold the hwaccess lock + */ + void (*platform_handler_core_dump_func)(struct kbase_device *kbdev, const char* reason); }; /* @@ -297,6 +320,14 @@ struct kbase_pm_callback_conf { int (*soft_reset_callback)(struct kbase_device *kbdev); /* + * Optional callback for full hardware reset of the GPU + * + * This callback will be called by the power management core to trigger + * a GPU hardware reset. + */ + void (*hardware_reset_callback)(struct kbase_device *kbdev); + + /* * Optional callback invoked after GPU becomes idle, not supported on * JM GPUs. * @@ -338,6 +369,24 @@ struct kbase_pm_callback_conf { * this feature. */ void (*power_runtime_gpu_active_callback)(struct kbase_device *kbdev); + +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + /* + * This callback will be invoked by the Kbase when GPU becomes active + * to turn on the shader core power rails. + * This callback is invoked from process context and the power rails + * must be turned on before the completion of callback. + */ + void (*power_on_sc_rails_callback)(struct kbase_device *kbdev); + + /* + * This callback will be invoked by the Kbase when GPU becomes idle + * to turn off the shader core power rails. + * This callback is invoked from process context and the power rails + * must be turned off before the completion of callback. + */ + void (*power_off_sc_rails_callback)(struct kbase_device *kbdev); +#endif }; /* struct kbase_gpu_clk_notifier_data - Data for clock rate change notifier. @@ -511,7 +560,6 @@ int kbasep_platform_device_late_init(struct kbase_device *kbdev); */ void kbasep_platform_device_late_term(struct kbase_device *kbdev); -#if !MALI_USE_CSF /** * kbasep_platform_context_init - Platform specific callback when a kernel * context is created @@ -538,28 +586,58 @@ int kbasep_platform_context_init(struct kbase_context *kctx); void kbasep_platform_context_term(struct kbase_context *kctx); /** - * kbasep_platform_event_atom_submit - Platform specific callback when an atom - * is submitted to the GPU - * @katom: kbase_jd_atom pointer + * kbasep_platform_event_work_begin - Platform specific callback whose function + * changes depending on the backend used. + * Signals that a unit of work has started + * running on the GPU. + * @param + * - If job manager GPU: Param is a pointer of type struct kbase_jd_atom*, + * to the atom that just started executing. + * - If CSF GPU: Param is a pointer of type struct kbase_queue_group*, to + * the group resident in a CSG slot which just started executing. * * Function calls a platform defined routine if specified in the configuration - * attributes. The routine should not assume that it is in a process context. + * attributes. The routine should not assume that it is in a process context. * - * Return: 0 if no errors were encountered. Negative error code otherwise. 
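/*
 * Illustrative sketch, not part of the patch, of a platform integration
 * using the reworked hooks declared in struct kbase_platform_funcs_conf
 * above: the same two work callbacks now cover both backends, receiving a
 * struct kbase_jd_atom * on Job Manager GPUs or a struct kbase_queue_group *
 * on CSF GPUs. The example_* function bodies are assumptions for the sketch;
 * unused hooks may simply be left NULL.
 */
static void example_work_begin(void *param)
{
	/* May run in interrupt context on JM GPUs: keep this non-blocking. */
	(void)param;
}

static void example_work_end(void *param)
{
	(void)param;
}

static int example_fw_cfg_init(struct kbase_device *kbdev)
{
	/* Process context: firmware configuration may sleep. */
	return 0;
}

struct kbase_platform_funcs_conf example_platform_funcs = {
	.platform_handler_work_begin_func = example_work_begin,
	.platform_handler_work_end_func = example_work_end,
	.platform_fw_cfg_init_func = example_fw_cfg_init,
};
/* PLATFORM_FUNCS in the platform config would typically refer to this struct. */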
*/ -void kbasep_platform_event_atom_submit(struct kbase_jd_atom *katom); +void kbasep_platform_event_work_begin(void *param); /** - * kbasep_platform_event_atom_complete - Platform specific callback when an atom - * has stopped running on the GPU - * @katom: kbase_jd_atom pointer + * kbasep_platform_event_work_end - Platform specific callback whose function + * changes depending on the backend used. + * Signals that a unit of work has completed. + * @param + * - If job manager GPU: Param is a pointer of type struct kbase_jd_atom*, + * to the atom that just completed. + * - If CSF GPU: Param is a pointer of type struct kbase_queue_group*, to + * the group resident in a CSG slot which just completed or suspended execution. * * Function calls a platform defined routine if specified in the configuration - * attributes. The routine should not assume that it is in a process context. + * attributes. The routine should not assume that it is in a process context. * */ -void kbasep_platform_event_atom_complete(struct kbase_jd_atom *katom); -#endif +void kbasep_platform_event_work_end(void *param); + +/** + * kbasep_platform_fw_config_init - Platform specific callback to configure FW + * + * @kbdev - kbase_device pointer + * + * Function calls a platform defined routine if specified in the configuration attributes. + * + */ +int kbasep_platform_fw_config_init(struct kbase_device *kbdev); + +/** + * kbasep_platform_event_core_dump - Platform specific callback to act on a firmware error. + * + * @kbdev - kbase_device pointer + * @reason: A null terminated string containing a dump reason + * + * Function calls a platform defined routine if specified in the configuration attributes. + * + */ +void kbasep_platform_event_core_dump(struct kbase_device *kbdev, const char* reason); #ifndef CONFIG_OF /** diff --git a/mali_kbase/mali_kbase_config_defaults.h b/mali_kbase/mali_kbase_config_defaults.h index 18e40b5..fa73612 100644 --- a/mali_kbase/mali_kbase_config_defaults.h +++ b/mali_kbase/mali_kbase_config_defaults.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2013-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2013-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -89,6 +89,18 @@ enum { KBASE_3BIT_AID_4 = 0x7 }; +#if MALI_USE_CSF +/* + * Default value for the TIMER register of the IPA Control interface, + * expressed in milliseconds. + * + * The chosen value is a trade off between two requirements: the IPA Control + * interface should sample counters with a resolution in the order of + * milliseconds, while keeping GPU overhead as limited as possible. + */ +#define IPA_CONTROL_TIMER_DEFAULT_VALUE_MS ((u32)10) /* 10 milliseconds */ +#endif /* MALI_USE_CSF */ + /* Default period for DVFS sampling (can be overridden by platform header) */ #ifndef DEFAULT_PM_DVFS_PERIOD #define DEFAULT_PM_DVFS_PERIOD 100 /* 100ms */ @@ -158,11 +170,6 @@ enum { */ #define DEFAULT_JS_RESET_TICKS_DUMPING (15020) /* 1502s */ -/* Default number of milliseconds given for other jobs on the GPU to be - * soft-stopped when the GPU needs to be reset. - */ -#define DEFAULT_RESET_TIMEOUT_MS (3000) /* 3s */ - /* Nominal reference frequency that was used to obtain all following * <...>_TIMEOUT_CYCLES macros, in kHz. 
* @@ -176,11 +183,12 @@ enum { * * This is also the default timeout to be used when an invalid timeout * selector is used to retrieve the timeout on CSF GPUs. + * This shouldn't be used as a timeout for the CSG suspend request. * * Based on 75000ms timeout at nominal 100MHz, as is required for Android - based * on scaling from a 50MHz GPU system. */ -#define CSF_FIRMWARE_TIMEOUT_CYCLES (7500000000) +#define CSF_FIRMWARE_TIMEOUT_CYCLES (7500000000ull) /* Timeout in clock cycles for GPU Power Management to reach the desired * Shader, L2 and MCU state. @@ -189,11 +197,41 @@ enum { */ #define CSF_PM_TIMEOUT_CYCLES (250000000) -/* Waiting timeout in clock cycles for GPU reset to complete. +/* Waiting timeout in clock cycles for a CSG to be suspended. + * + * Based on 30s timeout at 100MHz, scaled from 5s at 600Mhz GPU frequency. + * More cycles (1s @ 100Mhz = 100000000) are added up to ensure that + * host timeout is always bigger than FW timeout. + */ +#define CSF_CSG_SUSPEND_TIMEOUT_CYCLES (3100000000ull) + +/* Waiting timeout in clock cycles for GPU reset to complete. */ +#define CSF_GPU_RESET_TIMEOUT_CYCLES (CSF_CSG_SUSPEND_TIMEOUT_CYCLES * 2) + +/* Waiting timeout in clock cycles for GPU firmware to boot. + * + * Based on 250ms timeout at 100MHz, scaled from a 50MHz GPU system. + */ +#define CSF_FIRMWARE_BOOT_TIMEOUT_CYCLES (25000000) + +/* Waiting timeout for a ping request to be acknowledged, in clock cycles. + * + * Based on 6000ms timeout at 100MHz, scaled from a 50MHz GPU system. + */ +#define CSF_FIRMWARE_PING_TIMEOUT_CYCLES (600000000ull) + +/* Waiting timeout for a KCPU queue's fence signal blocked to long, in clock cycles. * - * Based on 2500ms timeout at 100MHz, scaled from a 50MHz GPU system. + * Based on 10s timeout at 100MHz, scaled from a 50MHz GPU system. */ -#define CSF_GPU_RESET_TIMEOUT_CYCLES (250000000) +#define KCPU_FENCE_SIGNAL_TIMEOUT_CYCLES (1000000000ull) + +/* Waiting timeout for task execution on an endpoint. Based on the + * DEFAULT_PROGRESS_TIMEOUT. + * + * Based on 25s timeout at 100Mhz, scaled from a 500MHz GPU system. + */ +#define DEFAULT_PROGRESS_TIMEOUT_CYCLES (2500000000ull) #else /* MALI_USE_CSF */ @@ -202,7 +240,22 @@ enum { */ #define JM_DEFAULT_TIMEOUT_CYCLES (150000000) -#endif /* MALI_USE_CSF */ +/* Default number of milliseconds given for other jobs on the GPU to be + * soft-stopped when the GPU needs to be reset. + */ +#define JM_DEFAULT_RESET_TIMEOUT_MS (3000) /* 3s */ + +/* Default timeout in clock cycles to be used when checking if JS_COMMAND_NEXT + * is updated on HW side so a Job Slot is considered free. + * This timeout will only take effect on GPUs with low value for the minimum + * GPU clock frequency (<= 100MHz). + * + * Based on 1ms timeout at 100MHz. Will default to 0ms on GPUs with higher + * value for minimum GPU clock frequency. + */ +#define JM_DEFAULT_JS_FREE_TIMEOUT_CYCLES (100000) + +#endif /* !MALI_USE_CSF */ /* Default timeslice that a context is scheduled in for, in nanoseconds. * @@ -238,5 +291,18 @@ enum { */ #define DEFAULT_IR_THRESHOLD (192) -#endif /* _KBASE_CONFIG_DEFAULTS_H_ */ +/* Waiting time in clock cycles for the completion of a MMU operation. + * + * Ideally 1.6M GPU cycles required for the L2 cache (512KiB slice) flush. + * + * As a pessimistic value, 50M GPU cycles ( > 30 times bigger ) is chosen. + * It corresponds to 0.5s in GPU @ 100Mhz. 
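/*
 * The *_TIMEOUT_CYCLES values above are scaled to wall-clock time using the
 * GPU clock: with the frequency expressed in kHz (i.e. cycles per
 * millisecond), timeout_ms = cycles / freq_khz. Worked examples from the
 * comments above:
 *
 *   CSF_FIRMWARE_TIMEOUT_CYCLES      7500000000 / 100000 kHz -> 75000 ms
 *   CSF_FIRMWARE_BOOT_TIMEOUT_CYCLES   25000000 / 100000 kHz ->   250 ms
 *
 * A minimal sketch of that conversion (helper name and the fallback value
 * are assumptions for illustration):
 */
static u64 example_timeout_cycles_to_ms(u64 timeout_cycles, u32 freq_khz)
{
	if (!freq_khz)
		freq_khz = 100000;	/* nominal 100 MHz reference frequency */

	return div_u64(timeout_cycles, freq_khz);
}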
+ */ +#define MMU_AS_INACTIVE_WAIT_TIMEOUT_CYCLES ((u64)50 * 1024 * 1024) + +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) +/* Default value of the time interval at which GPU metrics tracepoints are emitted. */ +#define DEFAULT_GPU_METRICS_TP_EMIT_INTERVAL_NS (500000000u) /* 500 ms */ +#endif +#endif /* _KBASE_CONFIG_DEFAULTS_H_ */ diff --git a/mali_kbase/mali_kbase_core_linux.c b/mali_kbase/mali_kbase_core_linux.c index e714056..d8fab9f 100644 --- a/mali_kbase/mali_kbase_core_linux.c +++ b/mali_kbase/mali_kbase_core_linux.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2010-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2010-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -31,11 +31,8 @@ #include <ipa/mali_kbase_ipa_debugfs.h> #endif /* CONFIG_DEVFREQ_THERMAL */ #endif /* CONFIG_MALI_DEVFREQ */ -#if IS_ENABLED(CONFIG_MALI_NO_MALI) #include "backend/gpu/mali_kbase_model_linux.h" -#include <backend/gpu/mali_kbase_model_dummy.h> -#endif /* CONFIG_MALI_NO_MALI */ -#include "mali_kbase_mem_profile_debugfs_buf_size.h" +#include "uapi/gpu/arm/midgard/mali_kbase_mem_profile_debugfs_buf_size.h" #include "mali_kbase_mem.h" #include "mali_kbase_mem_pool_debugfs.h" #include "mali_kbase_mem_pool_group.h" @@ -54,8 +51,8 @@ #if !MALI_USE_CSF #include "mali_kbase_kinstr_jm.h" #endif -#include "mali_kbase_hwcnt_context.h" -#include "mali_kbase_hwcnt_virtualizer.h" +#include "hwcnt/mali_kbase_hwcnt_context.h" +#include "hwcnt/mali_kbase_hwcnt_virtualizer.h" #include "mali_kbase_kinstr_prfcnt.h" #include "mali_kbase_vinstr.h" #if MALI_USE_CSF @@ -80,6 +77,9 @@ #include "mali_kbase_pbha_debugfs.h" #endif +/* Pixel includes */ +#include "platform/pixel/pixel_gpu_slc.h" + #include <linux/module.h> #include <linux/init.h> #include <linux/poll.h> @@ -96,14 +96,16 @@ #include <linux/fs.h> #include <linux/uaccess.h> #include <linux/interrupt.h> +#include <linux/irq.h> #include <linux/mm.h> #include <linux/compat.h> /* is_compat_task/in_compat_syscall */ #include <linux/mman.h> #include <linux/version.h> +#include <linux/version_compat_defs.h> #include <mali_kbase_hw.h> -#if defined(CONFIG_SYNC) || defined(CONFIG_SYNC_FILE) +#if IS_ENABLED(CONFIG_SYNC_FILE) #include <mali_kbase_sync.h> -#endif /* CONFIG_SYNC || CONFIG_SYNC_FILE */ +#endif /* CONFIG_SYNC_FILE */ #include <linux/clk.h> #include <linux/clk-provider.h> #include <linux/delay.h> @@ -122,11 +124,6 @@ #include <mali_kbase_caps.h> -/* GPU IRQ Tags */ -#define JOB_IRQ_TAG 0 -#define MMU_IRQ_TAG 1 -#define GPU_IRQ_TAG 2 - #define KERNEL_SIDE_DDK_VERSION_STRING "K:" MALI_RELEASE_NAME "(GPL)" /** @@ -138,9 +135,6 @@ (((minor) & 0xFFF) << 8) | \ ((0 & 0xFF) << 0)) -#define KBASE_API_MIN(api_version) ((api_version >> 8) & 0xFFF) -#define KBASE_API_MAJ(api_version) ((api_version >> 20) & 0xFFF) - /** * struct mali_kbase_capability_def - kbase capabilities table * @@ -172,6 +166,13 @@ static const struct mali_kbase_capability_def kbase_caps_table[MALI_KBASE_NUM_CA #endif }; +#if (KERNEL_VERSION(5, 3, 0) <= LINUX_VERSION_CODE) +/* Mutex to synchronize the probe of multiple kbase instances */ +static struct mutex kbase_probe_mutex; +#endif + +static void kbase_file_destroy_kctx_worker(struct work_struct *work); + /** * mali_kbase_supports_cap - Query whether a kbase capability is supported * @@ -200,35 +201,92 @@ bool mali_kbase_supports_cap(unsigned 
long api_version, enum mali_kbase_cap cap) return supported; } -struct task_struct *kbase_create_realtime_thread(struct kbase_device *kbdev, - int (*threadfn)(void *data), void *data, const char namefmt[]) +static void kbase_set_sched_rt(struct kbase_device *kbdev, struct task_struct *task, char *thread_name) { unsigned int i; - - cpumask_t mask = { CPU_BITS_NONE }; - static const struct sched_param param = { .sched_priority = KBASE_RT_THREAD_PRIO, }; - struct task_struct *ret = kthread_create(kthread_worker_fn, data, namefmt); + cpumask_t mask = { CPU_BITS_NONE }; + for (i = KBASE_RT_THREAD_CPUMASK_MIN; i <= KBASE_RT_THREAD_CPUMASK_MAX ; i++) + cpumask_set_cpu(i, &mask); + kthread_bind_mask(task, &mask); - if (!IS_ERR(ret)) { - for (i = KBASE_RT_THREAD_CPUMASK_MIN; i <= KBASE_RT_THREAD_CPUMASK_MAX ; i++) - cpumask_set_cpu(i, &mask); + wake_up_process(task); - kthread_bind_mask(ret, &mask); + if (sched_setscheduler_nocheck(task, SCHED_FIFO, ¶m)) + dev_warn(kbdev->dev, "%s not set to RT prio", thread_name); + else + dev_dbg(kbdev->dev, "%s set to RT prio: %i", + thread_name, param.sched_priority); +} - wake_up_process(ret); +struct task_struct *kbase_kthread_run_rt(struct kbase_device *kbdev, + int (*threadfn)(void *data), void *thread_param, const char namefmt[], ...) +{ + struct task_struct *task; + va_list args; + char name_buf[128]; + int len; - if (sched_setscheduler_nocheck(ret, SCHED_FIFO, ¶m)) - dev_warn(kbdev->dev, "%s not set to RT prio", namefmt); - else - dev_dbg(kbdev->dev, "%s set to RT prio: %i", - namefmt, param.sched_priority); + /* Construct the thread name */ + va_start(args, namefmt); + len = vsnprintf(name_buf, sizeof(name_buf), namefmt, args); + va_end(args); + if (len + 1 > sizeof(name_buf)) { + dev_warn(kbdev->dev, "RT thread name truncated to %s", name_buf); } - return ret; + task = kthread_create(threadfn, thread_param, name_buf); + + if (!IS_ERR(task)) { + kbase_set_sched_rt(kbdev, task, name_buf); + } + + return task; +} + +int kbase_kthread_run_worker_rt(struct kbase_device *kbdev, + struct kthread_worker *worker, const char namefmt[], ...) +{ + struct task_struct *task; + va_list args; + char name_buf[128]; + int len; + + /* Construct the thread name */ + va_start(args, namefmt); + len = vsnprintf(name_buf, sizeof(name_buf), namefmt, args); + va_end(args); + if (len + 1 > sizeof(name_buf)) { + dev_warn(kbdev->dev, "RT thread name truncated to %s", name_buf); + } + + kthread_init_worker(worker); + + task = kthread_create(kthread_worker_fn, worker, name_buf); + + if (!IS_ERR(task)) { + worker->task = task; + kbase_set_sched_rt(kbdev, task, name_buf); + return 0; + } + + return PTR_ERR(task); +} + +void kbase_destroy_kworker_stack(struct kthread_worker *worker) +{ + struct task_struct *task; + + task = worker->task; + if (WARN_ON(!task)) + return; + + kthread_flush_worker(worker); + kthread_stop(task); + WARN_ON(!list_empty(&worker->work_list)); } /** @@ -245,6 +303,8 @@ struct task_struct *kbase_create_realtime_thread(struct kbase_device *kbdev, * * Return: Address of an object representing a simulated device file, or NULL * on failure. + * + * Note: This function always gets called in Userspace context. 
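/*
 * Illustrative usage, not part of the patch, of the refactored RT-thread
 * helpers defined above: a subsystem embeds a kthread_worker, brings it up
 * with kbase_kthread_run_worker_rt(), queues kthread_work items on it, and
 * tears it down with kbase_destroy_kworker_stack(). The example_* names and
 * the "mali_example_rt" thread name are assumptions for the sketch.
 */
static void example_work_fn(struct kthread_work *work)
{
	/* Runs on the SCHED_FIFO worker bound to the KBASE_RT_THREAD cpumask. */
}

static int example_start_rt_worker(struct kbase_device *kbdev,
				   struct kthread_worker *worker,
				   struct kthread_work *work)
{
	int err = kbase_kthread_run_worker_rt(kbdev, worker, "mali_example_rt");

	if (err)
		return err;

	kthread_init_work(work, example_work_fn);
	kthread_queue_work(worker, work);
	return 0;
}

/* ...later, on shutdown: kbase_destroy_kworker_stack(worker); */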
*/ static struct kbase_file *kbase_file_new(struct kbase_device *const kbdev, struct file *const filp) @@ -257,6 +317,17 @@ static struct kbase_file *kbase_file_new(struct kbase_device *const kbdev, kfile->kctx = NULL; kfile->api_version = 0; atomic_set(&kfile->setup_state, KBASE_FILE_NEED_VSN); + /* Store the pointer to the file table structure of current process. */ + kfile->owner = current->files; + INIT_WORK(&kfile->destroy_kctx_work, kbase_file_destroy_kctx_worker); + spin_lock_init(&kfile->lock); + kfile->fops_count = 0; + kfile->map_count = 0; + typecheck(typeof(kfile->map_count), typeof(current->mm->map_count)); +#if IS_ENABLED(CONFIG_DEBUG_FS) + init_waitqueue_head(&kfile->zero_fops_count_wait); +#endif + init_waitqueue_head(&kfile->event_queue); } return kfile; } @@ -337,18 +408,46 @@ static int kbase_file_create_kctx(struct kbase_file *kfile, base_context_create_flags flags); /** + * kbase_file_inc_fops_count_if_allowed - Increment the kfile::fops_count value if the file + * operation is allowed for the current process. + * + * @kfile: Pointer to the object representing the /dev/malixx device file instance. + * + * The function shall be called at the beginning of certain file operation methods + * implemented for @kbase_fops, like ioctl, poll, read and mmap. + * + * kbase_file_dec_fops_count() shall be called if the increment was done. + * + * Return: true if the increment was done otherwise false. + * + * Note: This function shall always be called in Userspace context. + */ +static bool kbase_file_inc_fops_count_if_allowed(struct kbase_file *const kfile) +{ + /* Disallow file operations from the other process that shares the instance + * of /dev/malixx file i.e. 'kfile' or disallow file operations if parent + * process has closed the file instance. + */ + if (unlikely(kfile->owner != current->files)) + return false; + + return kbase_file_inc_fops_count_unless_closed(kfile); +} + +/** * kbase_file_get_kctx_if_setup_complete - Get a kernel base context * pointer from a device file * * @kfile: A device file created by kbase_file_new() * - * This function returns an error code (encoded with ERR_PTR) if no context - * has been created for the given @kfile. This makes it safe to use in - * circumstances where the order of initialization cannot be enforced, but - * only if the caller checks the return value. + * This function returns NULL if no context has been created for the given @kfile. + * This makes it safe to use in circumstances where the order of initialization + * cannot be enforced, but only if the caller checks the return value. * * Return: Address of the kernel base context associated with the @kfile, or * NULL if no context exists. + * + * Note: This function shall always be called in Userspace context. */ static struct kbase_context *kbase_file_get_kctx_if_setup_complete( struct kbase_file *const kfile) @@ -362,37 +461,103 @@ static struct kbase_context *kbase_file_get_kctx_if_setup_complete( } /** - * kbase_file_delete - Destroy an object representing a device file + * kbase_file_destroy_kctx - Destroy the Kbase context created for @kfile. * * @kfile: A device file created by kbase_file_new() - * - * If any context was created for the @kfile then it is destroyed. 
*/ -static void kbase_file_delete(struct kbase_file *const kfile) +static void kbase_file_destroy_kctx(struct kbase_file *const kfile) { - struct kbase_device *kbdev = NULL; - - if (WARN_ON(!kfile)) + if (atomic_cmpxchg(&kfile->setup_state, KBASE_FILE_COMPLETE, + KBASE_FILE_DESTROY_CTX) != KBASE_FILE_COMPLETE) return; - kfile->filp->private_data = NULL; - kbdev = kfile->kbdev; - - if (atomic_read(&kfile->setup_state) == KBASE_FILE_COMPLETE) { - struct kbase_context *kctx = kfile->kctx; - #if IS_ENABLED(CONFIG_DEBUG_FS) - kbasep_mem_profile_debugfs_remove(kctx); + kbasep_mem_profile_debugfs_remove(kfile->kctx); + kbase_context_debugfs_term(kfile->kctx); #endif - kbase_context_debugfs_term(kctx); - kbase_destroy_context(kctx); + kbase_destroy_context(kfile->kctx); + dev_dbg(kfile->kbdev->dev, "Deleted kbase context"); +} + +/** + * kbase_file_destroy_kctx_worker - Work item to destroy the Kbase context. + * + * @work: Pointer to the kfile::destroy_kctx_work. + * + * The work item shall only be enqueued if the context termination could not + * be done from @kbase_flush(). + */ +static void kbase_file_destroy_kctx_worker(struct work_struct *work) +{ + struct kbase_file *kfile = + container_of(work, struct kbase_file, destroy_kctx_work); + + WARN_ON_ONCE(kfile->owner); + WARN_ON_ONCE(kfile->map_count); + WARN_ON_ONCE(kfile->fops_count); + + kbase_file_destroy_kctx(kfile); +} + +/** + * kbase_file_destroy_kctx_on_flush - Try destroy the Kbase context from the flush() + * method of @kbase_fops. + * + * @kfile: A device file created by kbase_file_new() + */ +static void kbase_file_destroy_kctx_on_flush(struct kbase_file *const kfile) +{ + bool can_destroy_context = false; - dev_dbg(kbdev->dev, "deleted base context\n"); + spin_lock(&kfile->lock); + kfile->owner = NULL; + /* To destroy the context from flush() method, unlike the release() + * method, need to synchronize manually against the other threads in + * the current process that could be operating on the /dev/malixx file. + * + * Only destroy the context if all the memory mappings on the + * /dev/malixx file instance have been closed. If there are mappings + * present then the context would be destroyed later when the last + * mapping is closed. + * Also, only destroy the context if no file operations are in progress. + */ + can_destroy_context = !kfile->map_count && !kfile->fops_count; + spin_unlock(&kfile->lock); + + if (likely(can_destroy_context)) { + WARN_ON_ONCE(work_pending(&kfile->destroy_kctx_work)); + kbase_file_destroy_kctx(kfile); } +} - kbase_release_device(kbdev); +/** + * kbase_file_delete - Destroy an object representing a device file + * + * @kfile: A device file created by kbase_file_new() + * + * If any context was created for the @kfile and is still alive, then it is destroyed. + */ +static void kbase_file_delete(struct kbase_file *const kfile) +{ + if (WARN_ON(!kfile)) + return; + + /* All the CPU mappings on the device file should have been closed */ + WARN_ON_ONCE(kfile->map_count); +#if IS_ENABLED(CONFIG_DEBUG_FS) + /* There could still be file operations due to the debugfs file (mem_view) */ + wait_event(kfile->zero_fops_count_wait, !kbase_file_fops_count(kfile)); +#else + /* There shall not be any file operations in progress on the device file */ + WARN_ON_ONCE(kfile->fops_count); +#endif + kfile->filp->private_data = NULL; + cancel_work_sync(&kfile->destroy_kctx_work); + /* Destroy the context if it wasn't done earlier from the flush() method. 
*/ + kbase_file_destroy_kctx(kfile); + kbase_release_device(kfile->kbdev); kfree(kfile); } @@ -463,6 +628,7 @@ static struct kbase_device *to_kbase_device(struct device *dev) int assign_irqs(struct kbase_device *kbdev) { + static const char *const irq_names_caps[] = { "JOB", "MMU", "GPU" }; struct platform_device *pdev; int i; @@ -470,40 +636,35 @@ int assign_irqs(struct kbase_device *kbdev) return -ENODEV; pdev = to_platform_device(kbdev->dev); - /* 3 IRQ resources */ - for (i = 0; i < 3; i++) { - struct resource irq_res; + + for (i = 0; i < ARRAY_SIZE(irq_names_caps); i++) { + struct irq_data *irqdata; int irq; - int irqtag; - irq = platform_get_irq(pdev, i); + /* We recommend using Upper case for the irq names in dts, but if + * there are devices in the world using Lower case then we should + * avoid breaking support for them. So try using names in Upper case + * first then try using Lower case names. If both attempts fail then + * we assume there is no IRQ resource specified for the GPU. + */ + irq = platform_get_irq_byname(pdev, irq_names_caps[i]); if (irq < 0) { - dev_err(kbdev->dev, "No IRQ resource at index %d\n", i); - return irq; - } + static const char *const irq_names[] = { "job", "mmu", "gpu" }; -#if IS_ENABLED(CONFIG_OF) - if (irq != of_irq_to_resource(kbdev->dev->of_node, i, &irq_res)) { - dev_err(kbdev->dev, "Failed to get irq resource at index %d\n", i); + irq = platform_get_irq_byname(pdev, irq_names[i]); + } + + if (irq < 0) { + dev_err(kbdev->dev, "No IRQ resource '%s'\n", irq_names_caps[i]); return irq; } - if (!strncasecmp(irq_res.name, "JOB", 4)) { - irqtag = JOB_IRQ_TAG; - } else if (!strncasecmp(irq_res.name, "MMU", 4)) { - irqtag = MMU_IRQ_TAG; - } else if (!strncasecmp(irq_res.name, "GPU", 4)) { - irqtag = GPU_IRQ_TAG; - } else { - dev_err(&pdev->dev, "Invalid irq res name: '%s'\n", - irq_res.name); + kbdev->irqs[i].irq = (u32)irq; + irqdata = irq_get_irq_data((unsigned int)irq); + if (likely(irqdata)) + kbdev->irqs[i].flags = irqd_get_trigger_type(irqdata); + else return -EINVAL; - } -#else - irqtag = i; -#endif /* CONFIG_OF */ - kbdev->irqs[irqtag].irq = irq; - kbdev->irqs[irqtag].flags = irq_res.flags & IRQF_TRIGGER_MASK; } return 0; @@ -539,27 +700,6 @@ void kbase_release_device(struct kbase_device *kbdev) EXPORT_SYMBOL(kbase_release_device); #if IS_ENABLED(CONFIG_DEBUG_FS) -#if KERNEL_VERSION(4, 6, 0) > LINUX_VERSION_CODE && \ - !(KERNEL_VERSION(4, 4, 28) <= LINUX_VERSION_CODE && \ - KERNEL_VERSION(4, 5, 0) > LINUX_VERSION_CODE) -/* - * Older versions, before v4.6, of the kernel doesn't have - * kstrtobool_from_user(), except longterm 4.4.y which had it added in 4.4.28 - */ -static int kstrtobool_from_user(const char __user *s, size_t count, bool *res) -{ - char buf[4]; - - count = min(count, sizeof(buf) - 1); - - if (copy_from_user(buf, s, count)) - return -EFAULT; - buf[count] = '\0'; - - return strtobool(buf, res); -} -#endif - static ssize_t write_ctx_infinite_cache(struct file *f, const char __user *ubuf, size_t size, loff_t *off) { struct kbase_context *kctx = f->private_data; @@ -671,13 +811,8 @@ static int kbase_file_create_kctx(struct kbase_file *const kfile, kbdev = kfile->kbdev; -#if (KERNEL_VERSION(4, 6, 0) <= LINUX_VERSION_CODE) kctx = kbase_create_context(kbdev, in_compat_syscall(), - flags, kfile->api_version, kfile->filp); -#else - kctx = kbase_create_context(kbdev, is_compat_task(), - flags, kfile->api_version, kfile->filp); -#endif /* (KERNEL_VERSION(4, 6, 0) <= LINUX_VERSION_CODE) */ + flags, kfile->api_version, kfile); /* if bad flags, 
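/*
 * For reference on the assign_irqs() change above: platform_get_irq_byname()
 * resolves the three GPU interrupts from the standard "interrupt-names"
 * property in the device tree, trying the recommended upper-case names first
 * and falling back to lower-case, e.g. (illustrative fragment only):
 *
 *     interrupt-names = "JOB", "MMU", "GPU";
 */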
will stay stuck in setup mode */ if (!kctx) @@ -687,7 +822,8 @@ static int kbase_file_create_kctx(struct kbase_file *const kfile, kbase_ctx_flag_set(kctx, KCTX_INFINITE_CACHE); #if IS_ENABLED(CONFIG_DEBUG_FS) - snprintf(kctx_name, 64, "%d_%d", kctx->tgid, kctx->id); + if (unlikely(!scnprintf(kctx_name, 64, "%d_%d", kctx->tgid, kctx->id))) + return -ENOMEM; mutex_init(&kctx->mem_profile_lock); @@ -698,16 +834,8 @@ static int kbase_file_create_kctx(struct kbase_file *const kfile, /* we don't treat this as a fail - just warn about it */ dev_warn(kbdev->dev, "couldn't create debugfs dir for kctx\n"); } else { -#if (KERNEL_VERSION(4, 7, 0) > LINUX_VERSION_CODE) - /* prevent unprivileged use of debug file system - * in old kernel version - */ - debugfs_create_file("infinite_cache", 0600, kctx->kctx_dentry, - kctx, &kbase_infinite_cache_fops); -#else debugfs_create_file("infinite_cache", 0644, kctx->kctx_dentry, kctx, &kbase_infinite_cache_fops); -#endif debugfs_create_file("force_same_va", 0600, kctx->kctx_dentry, kctx, &kbase_force_same_va_fops); @@ -734,6 +862,11 @@ static int kbase_open(struct inode *inode, struct file *filp) if (!kbdev) return -ENODEV; +#if (KERNEL_VERSION(6, 0, 0) > LINUX_VERSION_CODE) + /* Set address space operations for page migration */ + kbase_mem_migrate_set_address_space_ops(kbdev, filp); +#endif + /* Device-wide firmware load is moved here from probing to comply with * Android GKI vendor guideline. */ @@ -765,6 +898,36 @@ static int kbase_release(struct inode *inode, struct file *filp) return 0; } +/** + * kbase_flush - Function implementing the flush() method of @kbase_fops. + * + * @filp: Pointer to the /dev/malixx device file instance. + * @id: Pointer to the file table structure of current process. + * If @filp is being shared by multiple processes then @id can differ + * from kfile::owner. + * + * This function is called everytime the copy of @filp is closed. So if 3 processes + * are sharing the @filp then this function would be called 3 times and only after + * that kbase_release() would get called. + * + * Return: 0 if successful, otherwise a negative error code. + * + * Note: This function always gets called in Userspace context when the + * file is closed. + */ +static int kbase_flush(struct file *filp, fl_owner_t id) +{ + struct kbase_file *const kfile = filp->private_data; + + /* Try to destroy the context if the flush() method has been called for the + * process that created the instance of /dev/malixx file i.e. 'kfile'. 
+ */ + if (kfile->owner == id) + kbase_file_destroy_kctx_on_flush(kfile); + + return 0; +} + static int kbase_api_set_flags(struct kbase_file *kfile, struct kbase_ioctl_set_flags *flags) { @@ -818,12 +981,21 @@ static int kbase_api_set_flags(struct kbase_file *kfile, return err; } +#if !MALI_USE_CSF static int kbase_api_apc_request(struct kbase_file *kfile, struct kbase_ioctl_apc_request *apc) { kbase_pm_apc_request(kfile->kbdev, apc->dur_usec); return 0; } +#endif + +static int kbase_api_buffer_liveness_update(struct kbase_context *kctx, + struct kbase_ioctl_buffer_liveness_update *update) +{ + /* Defer handling to platform */ + return gpu_pixel_handle_buffer_liveness_update_ioctl(kctx, update); +} #if !MALI_USE_CSF static int kbase_api_job_submit(struct kbase_context *kctx, @@ -1053,9 +1225,9 @@ static int kbase_api_get_cpu_gpu_timeinfo(struct kbase_context *kctx, union kbase_ioctl_get_cpu_gpu_timeinfo *timeinfo) { u32 flags = timeinfo->in.request_flags; - struct timespec64 ts; - u64 timestamp; - u64 cycle_cnt; + struct timespec64 ts = { 0 }; + u64 timestamp = 0; + u64 cycle_cnt = 0; kbase_pm_context_active(kctx->kbdev); @@ -1084,11 +1256,7 @@ static int kbase_api_get_cpu_gpu_timeinfo(struct kbase_context *kctx, static int kbase_api_hwcnt_set(struct kbase_context *kctx, struct kbase_ioctl_hwcnt_values *values) { - gpu_model_set_dummy_prfcnt_sample( - (u32 __user *)(uintptr_t)values->data, - values->size); - - return 0; + return gpu_model_set_dummy_prfcnt_user_sample(u64_to_user_ptr(values->data), values->size); } #endif /* CONFIG_MALI_NO_MALI */ @@ -1122,52 +1290,11 @@ static int kbase_api_get_ddk_version(struct kbase_context *kctx, return len; } -/* Defaults for legacy just-in-time memory allocator initialization - * kernel calls - */ -#define DEFAULT_MAX_JIT_ALLOCATIONS 255 -#define JIT_LEGACY_TRIM_LEVEL (0) /* No trimming */ - -static int kbase_api_mem_jit_init_10_2(struct kbase_context *kctx, - struct kbase_ioctl_mem_jit_init_10_2 *jit_init) -{ - kctx->jit_version = 1; - - /* since no phys_pages parameter, use the maximum: va_pages */ - return kbase_region_tracker_init_jit(kctx, jit_init->va_pages, - DEFAULT_MAX_JIT_ALLOCATIONS, - JIT_LEGACY_TRIM_LEVEL, BASE_MEM_GROUP_DEFAULT, - jit_init->va_pages); -} - -static int kbase_api_mem_jit_init_11_5(struct kbase_context *kctx, - struct kbase_ioctl_mem_jit_init_11_5 *jit_init) -{ - int i; - - kctx->jit_version = 2; - - for (i = 0; i < sizeof(jit_init->padding); i++) { - /* Ensure all padding bytes are 0 for potential future - * extension - */ - if (jit_init->padding[i]) - return -EINVAL; - } - - /* since no phys_pages parameter, use the maximum: va_pages */ - return kbase_region_tracker_init_jit(kctx, jit_init->va_pages, - jit_init->max_allocations, jit_init->trim_level, - jit_init->group_id, jit_init->va_pages); -} - static int kbase_api_mem_jit_init(struct kbase_context *kctx, struct kbase_ioctl_mem_jit_init *jit_init) { int i; - kctx->jit_version = 3; - for (i = 0; i < sizeof(jit_init->padding); i++) { /* Ensure all padding bytes are 0 for potential future * extension @@ -1325,7 +1452,7 @@ static int kbase_api_mem_flags_change(struct kbase_context *kctx, static int kbase_api_stream_create(struct kbase_context *kctx, struct kbase_ioctl_stream_create *stream) { -#if defined(CONFIG_SYNC) || defined(CONFIG_SYNC_FILE) +#if IS_ENABLED(CONFIG_SYNC_FILE) int fd, ret; /* Name must be NULL-terminated and padded with NULLs, so check last @@ -1347,7 +1474,7 @@ static int kbase_api_stream_create(struct kbase_context *kctx, static int 
kbase_api_fence_validate(struct kbase_context *kctx, struct kbase_ioctl_fence_validate *validate) { -#if defined(CONFIG_SYNC) || defined(CONFIG_SYNC_FILE) +#if IS_ENABLED(CONFIG_SYNC_FILE) return kbase_sync_fence_validate(validate->fd); #else return -ENOENT; @@ -1361,12 +1488,18 @@ static int kbase_api_mem_profile_add(struct kbase_context *kctx, int err; if (data->len > KBASE_MEM_PROFILE_MAX_BUF_SIZE) { - dev_err(kctx->kbdev->dev, "mem_profile_add: buffer too big\n"); + dev_err(kctx->kbdev->dev, "mem_profile_add: buffer too big"); return -EINVAL; } + if (!data->len) { + dev_err(kctx->kbdev->dev, "mem_profile_add: buffer size is 0"); + /* Should return -EINVAL, but returning -ENOMEM for backwards compat */ + return -ENOMEM; + } + buf = kmalloc(data->len, GFP_KERNEL); - if (ZERO_OR_NULL_PTR(buf)) + if (!buf) return -ENOMEM; err = copy_from_user(buf, u64_to_user_ptr(data->buffer), @@ -1406,7 +1539,7 @@ static int kbase_api_sticky_resource_map(struct kbase_context *kctx, if (ret != 0) return -EFAULT; - kbase_gpu_vm_lock(kctx); + kbase_gpu_vm_lock_with_pmode_sync(kctx); for (i = 0; i < map->count; i++) { if (!kbase_sticky_resource_acquire(kctx, gpu_addr[i])) { @@ -1423,7 +1556,7 @@ static int kbase_api_sticky_resource_map(struct kbase_context *kctx, } } - kbase_gpu_vm_unlock(kctx); + kbase_gpu_vm_unlock_with_pmode_sync(kctx); return ret; } @@ -1444,7 +1577,7 @@ static int kbase_api_sticky_resource_unmap(struct kbase_context *kctx, if (ret != 0) return -EFAULT; - kbase_gpu_vm_lock(kctx); + kbase_gpu_vm_lock_with_pmode_sync(kctx); for (i = 0; i < unmap->count; i++) { if (!kbase_sticky_resource_release_force(kctx, NULL, gpu_addr[i])) { @@ -1453,7 +1586,7 @@ static int kbase_api_sticky_resource_unmap(struct kbase_context *kctx, } } - kbase_gpu_vm_unlock(kctx); + kbase_gpu_vm_unlock_with_pmode_sync(kctx); return ret; } @@ -1518,6 +1651,7 @@ static int kbasep_cs_queue_group_create_1_6( struct kbase_context *kctx, union kbase_ioctl_cs_queue_group_create_1_6 *create) { + int ret, i; union kbase_ioctl_cs_queue_group_create new_create = { .in = { .tiler_mask = create->in.tiler_mask, @@ -1531,16 +1665,61 @@ static int kbasep_cs_queue_group_create_1_6( .compute_max = create->in.compute_max, } }; - int ret = kbase_csf_queue_group_create(kctx, &new_create); + for (i = 0; i < ARRAY_SIZE(create->in.padding); i++) { + if (create->in.padding[i] != 0) { + dev_warn(kctx->kbdev->dev, "Invalid padding not 0 in queue group create\n"); + return -EINVAL; + } + } + + ret = kbase_csf_queue_group_create(kctx, &new_create); create->out.group_handle = new_create.out.group_handle; create->out.group_uid = new_create.out.group_uid; return ret; } + +static int kbasep_cs_queue_group_create_1_18(struct kbase_context *kctx, + union kbase_ioctl_cs_queue_group_create_1_18 *create) +{ + int ret, i; + union kbase_ioctl_cs_queue_group_create + new_create = { .in = { + .tiler_mask = create->in.tiler_mask, + .fragment_mask = create->in.fragment_mask, + .compute_mask = create->in.compute_mask, + .cs_min = create->in.cs_min, + .priority = create->in.priority, + .tiler_max = create->in.tiler_max, + .fragment_max = create->in.fragment_max, + .compute_max = create->in.compute_max, + .csi_handlers = create->in.csi_handlers, + .dvs_buf = create->in.dvs_buf, + } }; + + for (i = 0; i < ARRAY_SIZE(create->in.padding); i++) { + if (create->in.padding[i] != 0) { + dev_warn(kctx->kbdev->dev, "Invalid padding not 0 in queue group create\n"); + return -EINVAL; + } + } + + ret = kbase_csf_queue_group_create(kctx, &new_create); + + 
create->out.group_handle = new_create.out.group_handle; + create->out.group_uid = new_create.out.group_uid; + + return ret; +} + static int kbasep_cs_queue_group_create(struct kbase_context *kctx, union kbase_ioctl_cs_queue_group_create *create) { + if (create->in.reserved != 0) { + dev_warn(kctx->kbdev->dev, "Invalid reserved field not 0 in queue group create\n"); + return -EINVAL; + } return kbase_csf_queue_group_create(kctx, create); } @@ -1573,12 +1752,31 @@ static int kbasep_kcpu_queue_enqueue(struct kbase_context *kctx, static int kbasep_cs_tiler_heap_init(struct kbase_context *kctx, union kbase_ioctl_cs_tiler_heap_init *heap_init) { + if (heap_init->in.group_id >= MEMORY_GROUP_MANAGER_NR_GROUPS) + return -EINVAL; + else + kctx->jit_group_id = heap_init->in.group_id; + + return kbase_csf_tiler_heap_init(kctx, heap_init->in.chunk_size, + heap_init->in.initial_chunks, heap_init->in.max_chunks, + heap_init->in.target_in_flight, heap_init->in.buf_desc_va, + &heap_init->out.gpu_heap_va, + &heap_init->out.first_chunk_va); +} + +static int kbasep_cs_tiler_heap_init_1_13(struct kbase_context *kctx, + union kbase_ioctl_cs_tiler_heap_init_1_13 *heap_init) +{ + if (heap_init->in.group_id >= MEMORY_GROUP_MANAGER_NR_GROUPS) + return -EINVAL; + kctx->jit_group_id = heap_init->in.group_id; return kbase_csf_tiler_heap_init(kctx, heap_init->in.chunk_size, - heap_init->in.initial_chunks, heap_init->in.max_chunks, - heap_init->in.target_in_flight, - &heap_init->out.gpu_heap_va, &heap_init->out.first_chunk_va); + heap_init->in.initial_chunks, heap_init->in.max_chunks, + heap_init->in.target_in_flight, 0, + &heap_init->out.gpu_heap_va, + &heap_init->out.first_chunk_va); } static int kbasep_cs_tiler_heap_term(struct kbase_context *kctx, @@ -1660,6 +1858,30 @@ static int kbasep_ioctl_cs_cpu_queue_dump(struct kbase_context *kctx, cpu_queue_info->size); } +static int kbase_ioctl_read_user_page(struct kbase_context *kctx, + union kbase_ioctl_read_user_page *user_page) +{ + struct kbase_device *kbdev = kctx->kbdev; + unsigned long flags; + + /* As of now, only LATEST_FLUSH is supported */ + if (unlikely(user_page->in.offset != LATEST_FLUSH)) + return -EINVAL; + + /* Validating padding that must be zero */ + if (unlikely(user_page->in.padding != 0)) + return -EINVAL; + + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); + if (!kbdev->pm.backend.gpu_powered) + user_page->out.val_lo = POWER_DOWN_LATEST_FLUSH_VALUE; + else + user_page->out.val_lo = kbase_reg_read(kbdev, USER_REG(LATEST_FLUSH)); + user_page->out.val_hi = 0; + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + + return 0; +} #endif /* MALI_USE_CSF */ static int kbasep_ioctl_context_priority_check(struct kbase_context *kctx, @@ -1755,9 +1977,8 @@ static int kbasep_ioctl_set_limited_core_count(struct kbase_context *kctx, return 0; } -static long kbase_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) +static long kbase_kfile_ioctl(struct kbase_file *kfile, unsigned int cmd, unsigned long arg) { - struct kbase_file *const kfile = filp->private_data; struct kbase_context *kctx = NULL; struct kbase_device *kbdev = kfile->kbdev; void __user *uarg = (void __user *)arg; @@ -1785,12 +2006,14 @@ static long kbase_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) kfile); break; +#if !MALI_USE_CSF case KBASE_IOCTL_APC_REQUEST: KBASE_HANDLE_IOCTL_IN(KBASE_IOCTL_APC_REQUEST, kbase_api_apc_request, struct kbase_ioctl_apc_request, kfile); break; +#endif case KBASE_IOCTL_KINSTR_PRFCNT_ENUM_INFO: KBASE_HANDLE_IOCTL_INOUT( @@ -1868,18 
+2091,6 @@ static long kbase_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) struct kbase_ioctl_get_ddk_version, kctx); break; - case KBASE_IOCTL_MEM_JIT_INIT_10_2: - KBASE_HANDLE_IOCTL_IN(KBASE_IOCTL_MEM_JIT_INIT_10_2, - kbase_api_mem_jit_init_10_2, - struct kbase_ioctl_mem_jit_init_10_2, - kctx); - break; - case KBASE_IOCTL_MEM_JIT_INIT_11_5: - KBASE_HANDLE_IOCTL_IN(KBASE_IOCTL_MEM_JIT_INIT_11_5, - kbase_api_mem_jit_init_11_5, - struct kbase_ioctl_mem_jit_init_11_5, - kctx); - break; case KBASE_IOCTL_MEM_JIT_INIT: KBASE_HANDLE_IOCTL_IN(KBASE_IOCTL_MEM_JIT_INIT, kbase_api_mem_jit_init, @@ -2081,6 +2292,11 @@ static long kbase_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) kbasep_cs_queue_group_create_1_6, union kbase_ioctl_cs_queue_group_create_1_6, kctx); break; + case KBASE_IOCTL_CS_QUEUE_GROUP_CREATE_1_18: + KBASE_HANDLE_IOCTL_INOUT(KBASE_IOCTL_CS_QUEUE_GROUP_CREATE_1_18, + kbasep_cs_queue_group_create_1_18, + union kbase_ioctl_cs_queue_group_create_1_18, kctx); + break; case KBASE_IOCTL_CS_QUEUE_GROUP_CREATE: KBASE_HANDLE_IOCTL_INOUT(KBASE_IOCTL_CS_QUEUE_GROUP_CREATE, kbasep_cs_queue_group_create, @@ -2117,6 +2333,11 @@ static long kbase_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) union kbase_ioctl_cs_tiler_heap_init, kctx); break; + case KBASE_IOCTL_CS_TILER_HEAP_INIT_1_13: + KBASE_HANDLE_IOCTL_INOUT(KBASE_IOCTL_CS_TILER_HEAP_INIT_1_13, + kbasep_cs_tiler_heap_init_1_13, + union kbase_ioctl_cs_tiler_heap_init_1_13, kctx); + break; case KBASE_IOCTL_CS_TILER_HEAP_TERM: KBASE_HANDLE_IOCTL_IN(KBASE_IOCTL_CS_TILER_HEAP_TERM, kbasep_cs_tiler_heap_term, @@ -2135,6 +2356,11 @@ static long kbase_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) struct kbase_ioctl_cs_cpu_queue_info, kctx); break; + /* This IOCTL will be kept for backward compatibility */ + case KBASE_IOCTL_READ_USER_PAGE: + KBASE_HANDLE_IOCTL_INOUT(KBASE_IOCTL_READ_USER_PAGE, kbase_ioctl_read_user_page, + union kbase_ioctl_read_user_page, kctx); + break; #endif /* MALI_USE_CSF */ #if MALI_UNIT_TEST case KBASE_IOCTL_TLSTREAM_STATS: @@ -2156,6 +2382,12 @@ static long kbase_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) struct kbase_ioctl_set_limited_core_count, kctx); break; + case KBASE_IOCTL_BUFFER_LIVENESS_UPDATE: + KBASE_HANDLE_IOCTL_IN(KBASE_IOCTL_BUFFER_LIVENESS_UPDATE, + kbase_api_buffer_liveness_update, + struct kbase_ioctl_buffer_liveness_update, + kctx); + break; } dev_warn(kbdev->dev, "Unknown ioctl 0x%x nr:%d", cmd, _IOC_NR(cmd)); @@ -2163,20 +2395,45 @@ static long kbase_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) return -ENOIOCTLCMD; } +static long kbase_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) +{ + struct kbase_file *const kfile = filp->private_data; + long ioctl_ret; + + if (unlikely(!kbase_file_inc_fops_count_if_allowed(kfile))) + return -EPERM; + + ioctl_ret = kbase_kfile_ioctl(kfile, cmd, arg); + kbase_file_dec_fops_count(kfile); + + return ioctl_ret; +} + #if MALI_USE_CSF static ssize_t kbase_read(struct file *filp, char __user *buf, size_t count, loff_t *f_pos) { struct kbase_file *const kfile = filp->private_data; - struct kbase_context *const kctx = - kbase_file_get_kctx_if_setup_complete(kfile); + struct kbase_context *kctx; struct base_csf_notification event_data = { .type = BASE_CSF_NOTIFICATION_EVENT }; const size_t data_size = sizeof(event_data); bool read_event = false, read_error = false; + ssize_t err = 0; - if (unlikely(!kctx)) + if 
(unlikely(!kbase_file_inc_fops_count_if_allowed(kfile))) return -EPERM; + kctx = kbase_file_get_kctx_if_setup_complete(kfile); + if (unlikely(!kctx)) { + err = -EPERM; + goto out; + } + + if (count < data_size) { + err = -ENOBUFS; + goto out; + } + if (atomic_read(&kctx->event_count)) read_event = true; else @@ -2199,28 +2456,39 @@ static ssize_t kbase_read(struct file *filp, char __user *buf, size_t count, lof if (copy_to_user(buf, &event_data, data_size) != 0) { dev_warn(kctx->kbdev->dev, "Failed to copy data\n"); - return -EFAULT; + err = -EFAULT; + goto out; } if (read_event) atomic_set(&kctx->event_count, 0); - return data_size; +out: + kbase_file_dec_fops_count(kfile); + return err ? err : data_size; } #else /* MALI_USE_CSF */ static ssize_t kbase_read(struct file *filp, char __user *buf, size_t count, loff_t *f_pos) { struct kbase_file *const kfile = filp->private_data; - struct kbase_context *const kctx = - kbase_file_get_kctx_if_setup_complete(kfile); + struct kbase_context *kctx; struct base_jd_event_v2 uevent; int out_count = 0; + ssize_t err = 0; - if (unlikely(!kctx)) + if (unlikely(!kbase_file_inc_fops_count_if_allowed(kfile))) return -EPERM; - if (count < sizeof(uevent)) - return -ENOBUFS; + kctx = kbase_file_get_kctx_if_setup_complete(kfile); + if (unlikely(!kctx)) { + err = -EPERM; + goto out; + } + + if (count < sizeof(uevent)) { + err = -ENOBUFS; + goto out; + } memset(&uevent, 0, sizeof(uevent)); @@ -2229,46 +2497,78 @@ static ssize_t kbase_read(struct file *filp, char __user *buf, size_t count, lof if (out_count > 0) goto out; - if (filp->f_flags & O_NONBLOCK) - return -EAGAIN; + if (filp->f_flags & O_NONBLOCK) { + err = -EAGAIN; + goto out; + } - if (wait_event_interruptible(kctx->event_queue, - kbase_event_pending(kctx)) != 0) - return -ERESTARTSYS; + if (wait_event_interruptible(kctx->kfile->event_queue, + kbase_event_pending(kctx)) != 0) { + err = -ERESTARTSYS; + goto out; + } } if (uevent.event_code == BASE_JD_EVENT_DRV_TERMINATED) { - if (out_count == 0) - return -EPIPE; + if (out_count == 0) { + err = -EPIPE; + goto out; + } goto out; } - if (copy_to_user(buf, &uevent, sizeof(uevent)) != 0) - return -EFAULT; + if (copy_to_user(buf, &uevent, sizeof(uevent)) != 0) { + err = -EFAULT; + goto out; + } buf += sizeof(uevent); out_count++; count -= sizeof(uevent); } while (count >= sizeof(uevent)); - out: - return out_count * sizeof(uevent); +out: + kbase_file_dec_fops_count(kfile); + return err ? 
err : (out_count * sizeof(uevent)); } #endif /* MALI_USE_CSF */ -static unsigned int kbase_poll(struct file *filp, poll_table *wait) +static __poll_t kbase_poll(struct file *filp, poll_table *wait) { struct kbase_file *const kfile = filp->private_data; - struct kbase_context *const kctx = - kbase_file_get_kctx_if_setup_complete(kfile); + struct kbase_context *kctx; + __poll_t ret = 0; - if (unlikely(!kctx)) - return POLLERR; + if (unlikely(!kbase_file_inc_fops_count_if_allowed(kfile))) { +#if (KERNEL_VERSION(4, 19, 0) > LINUX_VERSION_CODE) + ret = POLLNVAL; +#else + ret = EPOLLNVAL; +#endif + return ret; + } + + kctx = kbase_file_get_kctx_if_setup_complete(kfile); + if (unlikely(!kctx)) { +#if (KERNEL_VERSION(4, 19, 0) > LINUX_VERSION_CODE) + ret = POLLERR; +#else + ret = EPOLLERR; +#endif + goto out; + } - poll_wait(filp, &kctx->event_queue, wait); - if (kbase_event_pending(kctx)) - return POLLIN | POLLRDNORM; + poll_wait(filp, &kfile->event_queue, wait); + if (kbase_event_pending(kctx)) { +#if (KERNEL_VERSION(4, 19, 0) > LINUX_VERSION_CODE) + ret = POLLIN | POLLRDNORM; +#else + ret = EPOLLIN | EPOLLRDNORM; +#endif + } - return 0; +out: + kbase_file_dec_fops_count(kfile); + return ret; } void _kbase_event_wakeup(struct kbase_context *kctx, bool sync) @@ -2277,12 +2577,12 @@ void _kbase_event_wakeup(struct kbase_context *kctx, bool sync) if(sync) { dev_dbg(kctx->kbdev->dev, "Waking event queue for context %pK (sync)\n", (void *)kctx); - wake_up_interruptible_sync(&kctx->event_queue); + wake_up_interruptible_sync(&kctx->kfile->event_queue); } else { dev_dbg(kctx->kbdev->dev, "Waking event queue for context %pK (nosync)\n",(void *)kctx); - wake_up_interruptible(&kctx->event_queue); + wake_up_interruptible(&kctx->kfile->event_queue); } } @@ -2291,7 +2591,10 @@ KBASE_EXPORT_TEST_API(_kbase_event_wakeup); #if MALI_USE_CSF int kbase_event_pending(struct kbase_context *ctx) { - WARN_ON_ONCE(!ctx); + KBASE_DEBUG_ASSERT(ctx); + + if (unlikely(!ctx)) + return -EPERM; return (atomic_read(&ctx->event_count) != 0) || kbase_csf_event_error_pending(ctx) || @@ -2302,6 +2605,9 @@ int kbase_event_pending(struct kbase_context *ctx) { KBASE_DEBUG_ASSERT(ctx); + if (unlikely(!ctx)) + return -EPERM; + return (atomic_read(&ctx->event_count) != 0) || (atomic_read(&ctx->event_closed) != 0); } @@ -2312,13 +2618,20 @@ KBASE_EXPORT_TEST_API(kbase_event_pending); static int kbase_mmap(struct file *const filp, struct vm_area_struct *const vma) { struct kbase_file *const kfile = filp->private_data; - struct kbase_context *const kctx = - kbase_file_get_kctx_if_setup_complete(kfile); + struct kbase_context *kctx; + int ret; - if (unlikely(!kctx)) + if (unlikely(!kbase_file_inc_fops_count_if_allowed(kfile))) return -EPERM; - return kbase_context_mmap(kctx, vma); + kctx = kbase_file_get_kctx_if_setup_complete(kfile); + if (likely(kctx)) + ret = kbase_context_mmap(kctx, vma); + else + ret = -EPERM; + + kbase_file_dec_fops_count(kfile); + return ret; } static int kbase_check_flags(int flags) @@ -2337,18 +2650,26 @@ static unsigned long kbase_get_unmapped_area(struct file *const filp, const unsigned long pgoff, const unsigned long flags) { struct kbase_file *const kfile = filp->private_data; - struct kbase_context *const kctx = - kbase_file_get_kctx_if_setup_complete(kfile); + struct kbase_context *kctx; + unsigned long address; - if (unlikely(!kctx)) + if (unlikely(!kbase_file_inc_fops_count_if_allowed(kfile))) return -EPERM; - return kbase_context_get_unmapped_area(kctx, addr, len, pgoff, flags); + kctx = 
kbase_file_get_kctx_if_setup_complete(kfile); + if (likely(kctx)) + address = kbase_context_get_unmapped_area(kctx, addr, len, pgoff, flags); + else + address = -EPERM; + + kbase_file_dec_fops_count(kfile); + return address; } static const struct file_operations kbase_fops = { .owner = THIS_MODULE, .open = kbase_open, + .flush = kbase_flush, .release = kbase_release, .read = kbase_read, .poll = kbase_poll, @@ -2579,7 +2900,7 @@ static ssize_t core_mask_store(struct device *dev, struct device_attribute *attr new_core_mask[1] = new_core_mask[2] = new_core_mask[0]; #endif - mutex_lock(&kbdev->pm.lock); + rt_mutex_lock(&kbdev->pm.lock); spin_lock_irqsave(&kbdev->hwaccess_lock, flags); shader_present = kbdev->gpu_props.props.raw_props.shader_present; @@ -2649,7 +2970,7 @@ static ssize_t core_mask_store(struct device *dev, struct device_attribute *attr unlock: spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); - mutex_unlock(&kbdev->pm.lock); + rt_mutex_unlock(&kbdev->pm.lock); end: return err; } @@ -3271,22 +3592,22 @@ static ssize_t gpuinfo_show(struct device *dev, .name = "Mali-G510" }, { .id = GPU_ID2_PRODUCT_TVAX >> KBASE_GPU_ID_VERSION_PRODUCT_ID_SHIFT, .name = "Mali-G310" }, - { .id = GPU_ID2_PRODUCT_TTUX >> KBASE_GPU_ID_VERSION_PRODUCT_ID_SHIFT, - .name = "Mali-TTUX" }, - { .id = GPU_ID2_PRODUCT_LTUX >> KBASE_GPU_ID_VERSION_PRODUCT_ID_SHIFT, - .name = "Mali-LTUX" }, + { .id = GPU_ID2_PRODUCT_LTIX >> KBASE_GPU_ID_VERSION_PRODUCT_ID_SHIFT, + .name = "Mali-G620" }, }; const char *product_name = "(Unknown Mali GPU)"; struct kbase_device *kbdev; u32 gpu_id; unsigned int product_id, product_id_mask; unsigned int i; + struct kbase_gpu_props *gpu_props; kbdev = to_kbase_device(dev); if (!kbdev) return -ENODEV; - gpu_id = kbdev->gpu_props.props.raw_props.gpu_id; + gpu_props = &kbdev->gpu_props; + gpu_id = gpu_props->props.raw_props.gpu_id; product_id = gpu_id >> KBASE_GPU_ID_VERSION_PRODUCT_ID_SHIFT; product_id_mask = GPU_ID2_PRODUCT_MODEL >> KBASE_GPU_ID_VERSION_PRODUCT_ID_SHIFT; @@ -3300,6 +3621,47 @@ static ssize_t gpuinfo_show(struct device *dev, } } +#if MALI_USE_CSF + if ((product_id & product_id_mask) == + ((GPU_ID2_PRODUCT_TTUX >> KBASE_GPU_ID_VERSION_PRODUCT_ID_SHIFT) & product_id_mask)) { + const bool rt_supported = + GPU_FEATURES_RAY_TRACING_GET(gpu_props->props.raw_props.gpu_features); + const u8 nr_cores = gpu_props->num_cores; + + /* Mali-G715-Immortalis if 10 < number of cores with ray tracing supported. + * Mali-G715 if 10 < number of cores without ray tracing supported. + * Mali-G715 if 7 <= number of cores <= 10 regardless of ray tracing. + * Mali-G615 if number of cores < 7. + */ + if ((nr_cores > 10) && rt_supported) + product_name = "Mali-G715-Immortalis"; + else if (nr_cores >= 7) + product_name = "Mali-G715"; + + if (nr_cores < 7) { + dev_warn(kbdev->dev, "nr_cores(%u) GPU ID must be G615", nr_cores); + product_name = "Mali-G615"; + } else + dev_dbg(kbdev->dev, "GPU ID_Name: %s, nr_cores(%u)\n", product_name, + nr_cores); + } + + if ((product_id & product_id_mask) == + ((GPU_ID2_PRODUCT_TTIX >> KBASE_GPU_ID_VERSION_PRODUCT_ID_SHIFT) & product_id_mask)) { + const bool rt_supported = + GPU_FEATURES_RAY_TRACING_GET(gpu_props->props.raw_props.gpu_features); + const u8 nr_cores = gpu_props->num_cores; + + if ((nr_cores >= 10) && rt_supported) + product_name = "Mali-G720-Immortalis"; + else + product_name = (nr_cores >= 6) ? 
"Mali-G720" : "Mali-G620"; + + dev_dbg(kbdev->dev, "GPU ID_Name: %s (ID: 0x%x), nr_cores(%u)\n", product_name, + nr_cores, product_id & product_id_mask); + } +#endif /* MALI_USE_CSF */ + return scnprintf(buf, PAGE_SIZE, "%s %d cores r%dp%d 0x%04X\n", product_name, kbdev->gpu_props.num_cores, (gpu_id & GPU_ID_VERSION_MAJOR) >> KBASE_GPU_ID_VERSION_MAJOR_SHIFT, @@ -3372,6 +3734,56 @@ static ssize_t dvfs_period_show(struct device *dev, static DEVICE_ATTR_RW(dvfs_period); +int kbase_pm_gpu_freq_init(struct kbase_device *kbdev) +{ + int err; + /* Uses default reference frequency defined in below macro */ + u64 lowest_freq_khz = DEFAULT_REF_TIMEOUT_FREQ_KHZ; + + /* Only check lowest frequency in cases when OPPs are used and + * present in the device tree. + */ +#ifdef CONFIG_PM_OPP + struct dev_pm_opp *opp_ptr; + unsigned long found_freq = 0; + + /* find lowest frequency OPP */ + opp_ptr = dev_pm_opp_find_freq_ceil(kbdev->dev, &found_freq); + if (IS_ERR(opp_ptr)) { + dev_err(kbdev->dev, "No OPPs found in device tree! Scaling timeouts using %llu kHz", + (unsigned long long)lowest_freq_khz); + } else { +#if KERNEL_VERSION(4, 11, 0) <= LINUX_VERSION_CODE + dev_pm_opp_put(opp_ptr); /* decrease OPP refcount */ +#endif + /* convert found frequency to KHz */ + found_freq /= 1000; + + /* If lowest frequency in OPP table is still higher + * than the reference, then keep the reference frequency + * as the one to use for scaling . + */ + if (found_freq < lowest_freq_khz) + lowest_freq_khz = found_freq; + } +#else + dev_err(kbdev->dev, "No operating-points-v2 node or operating-points property in DT"); +#endif + + kbdev->lowest_gpu_freq_khz = lowest_freq_khz; + + err = kbase_device_populate_max_freq(kbdev); + if (unlikely(err < 0)) + return -1; + + dev_dbg(kbdev->dev, "Lowest frequency identified is %llu kHz", kbdev->lowest_gpu_freq_khz); + dev_dbg(kbdev->dev, + "Setting default highest frequency to %u kHz (pending devfreq initialization", + kbdev->gpu_props.props.core_props.gpu_freq_khz_max); + + return 0; +} + /** * pm_poweroff_store - Store callback for the pm_poweroff sysfs file. 
* @dev: The device with sysfs file is for @@ -3481,21 +3893,32 @@ static ssize_t reset_timeout_store(struct device *dev, { struct kbase_device *kbdev; int ret; - int reset_timeout; + u32 reset_timeout; + u32 default_reset_timeout; kbdev = to_kbase_device(dev); if (!kbdev) return -ENODEV; - ret = kstrtoint(buf, 0, &reset_timeout); - if (ret || reset_timeout <= 0) { + ret = kstrtou32(buf, 0, &reset_timeout); + if (ret || reset_timeout == 0) { dev_err(kbdev->dev, "Couldn't process reset_timeout write operation.\n" "Use format <reset_timeout_ms>\n"); return -EINVAL; } +#if MALI_USE_CSF + default_reset_timeout = kbase_get_timeout_ms(kbdev, CSF_GPU_RESET_TIMEOUT); +#else /* MALI_USE_CSF */ + default_reset_timeout = JM_DEFAULT_RESET_TIMEOUT_MS; +#endif /* !MALI_USE_CSF */ + + if (reset_timeout < default_reset_timeout) + dev_warn(kbdev->dev, "requested reset_timeout(%u) is smaller than default(%u)", + reset_timeout, default_reset_timeout); + kbdev->reset_timeout_ms = reset_timeout; - dev_dbg(kbdev->dev, "Reset timeout: %dms\n", reset_timeout); + dev_dbg(kbdev->dev, "Reset timeout: %ums\n", reset_timeout); return count; } @@ -4290,7 +4713,7 @@ static int kbase_common_reg_map(struct kbase_device *kbdev) static void kbase_common_reg_unmap(struct kbase_device * const kbdev) { } -#else /* CONFIG_MALI_NO_MALI */ +#else /* !IS_ENABLED(CONFIG_MALI_NO_MALI) */ static int kbase_common_reg_map(struct kbase_device *kbdev) { int err = 0; @@ -4326,7 +4749,7 @@ static void kbase_common_reg_unmap(struct kbase_device * const kbdev) kbdev->reg_size = 0; } } -#endif /* CONFIG_MALI_NO_MALI */ +#endif /* !IS_ENABLED(CONFIG_MALI_NO_MALI) */ int registers_map(struct kbase_device * const kbdev) { @@ -4379,8 +4802,10 @@ static bool kbase_is_pm_enabled(const struct device_node *gpu_node) const void *operating_point_node; bool is_pm_enable = false; - power_model_node = of_get_child_by_name(gpu_node, - "power_model"); + power_model_node = of_get_child_by_name(gpu_node, "power-model"); + if (!power_model_node) + power_model_node = of_get_child_by_name(gpu_node, "power_model"); + if (power_model_node) is_pm_enable = true; @@ -4401,8 +4826,9 @@ static bool kbase_is_pv_enabled(const struct device_node *gpu_node) { const void *arbiter_if_node; - arbiter_if_node = of_get_property(gpu_node, - "arbiter_if", NULL); + arbiter_if_node = of_get_property(gpu_node, "arbiter-if", NULL); + if (!arbiter_if_node) + arbiter_if_node = of_get_property(gpu_node, "arbiter_if", NULL); return arbiter_if_node ? 
true : false; } @@ -4530,14 +4956,14 @@ int power_control_init(struct kbase_device *kbdev) for (i = 0; i < BASE_MAX_NR_CLOCKS_REGULATORS; i++) { kbdev->regulators[i] = regulator_get_optional(kbdev->dev, regulator_names[i]); - if (IS_ERR_OR_NULL(kbdev->regulators[i])) { + if (IS_ERR(kbdev->regulators[i])) { err = PTR_ERR(kbdev->regulators[i]); kbdev->regulators[i] = NULL; break; } } if (err == -EPROBE_DEFER) { - while ((i > 0) && (i < BASE_MAX_NR_CLOCKS_REGULATORS)) + while (i > 0) regulator_put(kbdev->regulators[--i]); return err; } @@ -4558,7 +4984,7 @@ int power_control_init(struct kbase_device *kbdev) */ for (i = 0; i < BASE_MAX_NR_CLOCKS_REGULATORS; i++) { kbdev->clocks[i] = of_clk_get(kbdev->dev->of_node, i); - if (IS_ERR_OR_NULL(kbdev->clocks[i])) { + if (IS_ERR(kbdev->clocks[i])) { err = PTR_ERR(kbdev->clocks[i]); kbdev->clocks[i] = NULL; break; @@ -4574,7 +5000,7 @@ int power_control_init(struct kbase_device *kbdev) } } if (err == -EPROBE_DEFER) { - while ((i > 0) && (i < BASE_MAX_NR_CLOCKS_REGULATORS)) { + while (i > 0) { clk_disable_unprepare(kbdev->clocks[--i]); clk_put(kbdev->clocks[i]); } @@ -4591,16 +5017,47 @@ int power_control_init(struct kbase_device *kbdev) */ #if defined(CONFIG_PM_OPP) #if defined(CONFIG_REGULATOR) +#if (KERNEL_VERSION(6, 0, 0) <= LINUX_VERSION_CODE) + if (kbdev->nr_regulators > 0) { + kbdev->token = dev_pm_opp_set_regulators(kbdev->dev, regulator_names); + + if (kbdev->token < 0) { + err = kbdev->token; + goto regulators_probe_defer; + } + + } +#elif (KERNEL_VERSION(4, 10, 0) <= LINUX_VERSION_CODE) if (kbdev->nr_regulators > 0) { - kbdev->opp_token = dev_pm_opp_set_regulators(kbdev->dev, - regulator_names); + kbdev->opp_table = dev_pm_opp_set_regulators(kbdev->dev, + regulator_names, BASE_MAX_NR_CLOCKS_REGULATORS); + + if (IS_ERR(kbdev->opp_table)) { + err = PTR_ERR(kbdev->opp_table); + goto regulators_probe_defer; + } } +#endif /* (KERNEL_VERSION(6, 0, 0) <= LINUX_VERSION_CODE) */ #endif /* CONFIG_REGULATOR */ err = dev_pm_opp_of_add_table(kbdev->dev); CSTD_UNUSED(err); #endif /* CONFIG_PM_OPP */ return 0; +#if defined(CONFIG_PM_OPP) && \ + ((KERNEL_VERSION(4, 10, 0) <= LINUX_VERSION_CODE) && defined(CONFIG_REGULATOR)) +regulators_probe_defer: + for (i = 0; i < BASE_MAX_NR_CLOCKS_REGULATORS; i++) { + if (kbdev->clocks[i]) { + if (__clk_is_enabled(kbdev->clocks[i])) + clk_disable_unprepare(kbdev->clocks[i]); + clk_put(kbdev->clocks[i]); + kbdev->clocks[i] = NULL; + } else + break; + } +#endif + clocks_probe_defer: #if defined(CONFIG_REGULATOR) for (i = 0; i < BASE_MAX_NR_CLOCKS_REGULATORS; i++) @@ -4617,8 +5074,13 @@ void power_control_term(struct kbase_device *kbdev) #if defined(CONFIG_PM_OPP) dev_pm_opp_of_remove_table(kbdev->dev); #if defined(CONFIG_REGULATOR) - if (kbdev->opp_token >= 0) - dev_pm_opp_put_regulators(kbdev->opp_token); +#if (KERNEL_VERSION(6, 0, 0) <= LINUX_VERSION_CODE) + if (kbdev->token > -EPERM) + dev_pm_opp_put_regulators(kbdev->token); +#elif (KERNEL_VERSION(4, 10, 0) <= LINUX_VERSION_CODE) + if (!IS_ERR_OR_NULL(kbdev->opp_table)) + dev_pm_opp_put_regulators(kbdev->opp_table); +#endif /* (KERNEL_VERSION(6, 0, 0) <= LINUX_VERSION_CODE) */ #endif /* CONFIG_REGULATOR */ #endif /* CONFIG_PM_OPP */ @@ -4659,18 +5121,18 @@ static int type##_quirks_set(void *data, u64 val) \ kbdev = (struct kbase_device *)data; \ kbdev->hw_quirks_##type = (u32)val; \ trigger_reset(kbdev); \ - return 0;\ + return 0; \ } \ \ static int type##_quirks_get(void *data, u64 *val) \ { \ - struct kbase_device *kbdev;\ - kbdev = (struct kbase_device *)data;\ - 
*val = kbdev->hw_quirks_##type;\ - return 0;\ + struct kbase_device *kbdev; \ + kbdev = (struct kbase_device *)data; \ + *val = kbdev->hw_quirks_##type; \ + return 0; \ } \ -DEFINE_SIMPLE_ATTRIBUTE(fops_##type##_quirks, type##_quirks_get,\ - type##_quirks_set, "%llu\n") +DEFINE_DEBUGFS_ATTRIBUTE(fops_##type##_quirks, type##_quirks_get, \ + type##_quirks_set, "%llu\n") MAKE_QUIRK_ACCESSORS(sc); MAKE_QUIRK_ACCESSORS(tiler); @@ -4700,8 +5162,46 @@ static int kbase_device_debugfs_reset_write(void *data, u64 wait_for_reset) return 0; } -DEFINE_SIMPLE_ATTRIBUTE(fops_trigger_reset, - NULL, &kbase_device_debugfs_reset_write, "%llu\n"); +DEFINE_DEBUGFS_ATTRIBUTE(fops_trigger_reset, NULL, &kbase_device_debugfs_reset_write, "%llu\n"); + +/** + * kbase_device_debugfs_trigger_uevent_write - send a GPU uevent + * @file: File object to write to + * @ubuf: User buffer to read data from + * @count: Length of user buffer + * @ppos: Offset within file object + * + * Return: bytes read. + */ +static ssize_t kbase_device_debugfs_trigger_uevent_write(struct file *file, + const char __user *ubuf, size_t count, loff_t *ppos) +{ + struct kbase_device *kbdev = (struct kbase_device *)file->private_data; + struct gpu_uevent evt = { 0 }; + char str[8] = { 0 }; + + if (count >= sizeof(str)) + return -EINVAL; + + if (copy_from_user(str, ubuf, count)) + return -EINVAL; + + str[count] = '\0'; + + if (sscanf(str, "%u %u", &evt.type, &evt.info) != 2) + return -EINVAL; + + pixel_gpu_uevent_send(kbdev, (const struct gpu_uevent *) &evt); + + return count; +} + +static const struct file_operations fops_trigger_uevent = { + .owner = THIS_MODULE, + .open = simple_open, + .write = kbase_device_debugfs_trigger_uevent_write, + .llseek = default_llseek, +}; /** * debugfs_protected_debug_mode_read - "protected_debug_mode" debugfs read @@ -4785,57 +5285,84 @@ static const struct file_operations .release = single_release, }; -int kbase_device_debugfs_init(struct kbase_device *kbdev) +/** + * debugfs_ctx_defaults_init - Create the default configuration of new contexts in debugfs + * @kbdev: An instance of the GPU platform device, allocated from the probe method of the driver. + * Return: A pointer to the last dentry that it tried to create, whether successful or not. + * Could be NULL or encode another error value. 
+ */ +static struct dentry *debugfs_ctx_defaults_init(struct kbase_device *const kbdev) { - struct dentry *debugfs_ctx_defaults_directory; - int err; /* prevent unprivileged use of debug file system * in old kernel version */ -#if (KERNEL_VERSION(4, 7, 0) <= LINUX_VERSION_CODE) - /* only for newer kernel version debug file system is safe */ const mode_t mode = 0644; -#else - const mode_t mode = 0600; -#endif + struct dentry *dentry = debugfs_create_dir("defaults", kbdev->debugfs_ctx_directory); + struct dentry *debugfs_ctx_defaults_directory = dentry; + + if (IS_ERR_OR_NULL(dentry)) { + dev_err(kbdev->dev, "Couldn't create mali debugfs ctx defaults directory\n"); + return dentry; + } + + debugfs_create_bool("infinite_cache", mode, + debugfs_ctx_defaults_directory, + &kbdev->infinite_cache_active_default); - kbdev->mali_debugfs_directory = debugfs_create_dir(kbdev->devname, - NULL); - if (IS_ERR_OR_NULL(kbdev->mali_debugfs_directory)) { + dentry = debugfs_create_file("mem_pool_max_size", mode, debugfs_ctx_defaults_directory, + &kbdev->mem_pool_defaults.small, + &kbase_device_debugfs_mem_pool_max_size_fops); + if (IS_ERR_OR_NULL(dentry)) { + dev_err(kbdev->dev, "Unable to create mem_pool_max_size debugfs entry\n"); + return dentry; + } + + dentry = debugfs_create_file("lp_mem_pool_max_size", mode, debugfs_ctx_defaults_directory, + &kbdev->mem_pool_defaults.large, + &kbase_device_debugfs_mem_pool_max_size_fops); + if (IS_ERR_OR_NULL(dentry)) + dev_err(kbdev->dev, "Unable to create lp_mem_pool_max_size debugfs entry\n"); + + return dentry; +} + +/** + * init_debugfs - Create device-wide debugfs directories and files for the Mali driver + * @kbdev: An instance of the GPU platform device, allocated from the probe method of the driver. + * Return: A pointer to the last dentry that it tried to create, whether successful or not. + * Could be NULL or encode another error value. 
+ */ +static struct dentry *init_debugfs(struct kbase_device *kbdev) +{ + struct dentry *dentry = debugfs_create_dir(kbdev->devname, NULL); + + kbdev->mali_debugfs_directory = dentry; + if (IS_ERR_OR_NULL(dentry)) { dev_err(kbdev->dev, "Couldn't create mali debugfs directory: %s\n", kbdev->devname); - err = -ENOMEM; - goto out; + return dentry; } - kbdev->debugfs_ctx_directory = debugfs_create_dir("ctx", - kbdev->mali_debugfs_directory); - if (IS_ERR_OR_NULL(kbdev->debugfs_ctx_directory)) { + dentry = debugfs_create_dir("ctx", kbdev->mali_debugfs_directory); + kbdev->debugfs_ctx_directory = dentry; + if (IS_ERR_OR_NULL(dentry)) { dev_err(kbdev->dev, "Couldn't create mali debugfs ctx directory\n"); - err = -ENOMEM; - goto out; + return dentry; } - kbdev->debugfs_instr_directory = debugfs_create_dir("instrumentation", - kbdev->mali_debugfs_directory); - if (IS_ERR_OR_NULL(kbdev->debugfs_instr_directory)) { + dentry = debugfs_create_dir("instrumentation", kbdev->mali_debugfs_directory); + kbdev->debugfs_instr_directory = dentry; + if (IS_ERR_OR_NULL(dentry)) { dev_err(kbdev->dev, "Couldn't create mali debugfs instrumentation directory\n"); - err = -ENOMEM; - goto out; - } - - debugfs_ctx_defaults_directory = debugfs_create_dir("defaults", - kbdev->debugfs_ctx_directory); - if (IS_ERR_OR_NULL(debugfs_ctx_defaults_directory)) { - dev_err(kbdev->dev, "Couldn't create mali debugfs ctx defaults directory\n"); - err = -ENOMEM; - goto out; + return dentry; } kbasep_regs_history_debugfs_init(kbdev); -#if !MALI_USE_CSF +#if MALI_USE_CSF + kbase_debug_csf_fault_debugfs_init(kbdev); +#else /* MALI_USE_CSF */ kbase_debug_job_fault_debugfs_init(kbdev); #endif /* !MALI_USE_CSF */ @@ -4849,41 +5376,62 @@ int kbase_device_debugfs_init(struct kbase_device *kbdev) /* fops_* variables created by invocations of macro * MAKE_QUIRK_ACCESSORS() above. 
*/ - debugfs_create_file("quirks_sc", 0644, + dentry = debugfs_create_file("quirks_sc", 0644, kbdev->mali_debugfs_directory, kbdev, &fops_sc_quirks); - debugfs_create_file("quirks_tiler", 0644, + if (IS_ERR_OR_NULL(dentry)) { + dev_err(kbdev->dev, "Unable to create quirks_sc debugfs entry\n"); + return dentry; + } + + dentry = debugfs_create_file("quirks_tiler", 0644, kbdev->mali_debugfs_directory, kbdev, &fops_tiler_quirks); - debugfs_create_file("quirks_mmu", 0644, + if (IS_ERR_OR_NULL(dentry)) { + dev_err(kbdev->dev, "Unable to create quirks_tiler debugfs entry\n"); + return dentry; + } + + dentry = debugfs_create_file("quirks_mmu", 0644, kbdev->mali_debugfs_directory, kbdev, &fops_mmu_quirks); - debugfs_create_file("quirks_gpu", 0644, kbdev->mali_debugfs_directory, - kbdev, &fops_gpu_quirks); - - debugfs_create_bool("infinite_cache", mode, - debugfs_ctx_defaults_directory, - &kbdev->infinite_cache_active_default); + if (IS_ERR_OR_NULL(dentry)) { + dev_err(kbdev->dev, "Unable to create quirks_mmu debugfs entry\n"); + return dentry; + } - debugfs_create_file("mem_pool_max_size", mode, - debugfs_ctx_defaults_directory, - &kbdev->mem_pool_defaults.small, - &kbase_device_debugfs_mem_pool_max_size_fops); + dentry = debugfs_create_file("quirks_gpu", 0644, kbdev->mali_debugfs_directory, + kbdev, &fops_gpu_quirks); + if (IS_ERR_OR_NULL(dentry)) { + dev_err(kbdev->dev, "Unable to create quirks_gpu debugfs entry\n"); + return dentry; + } - debugfs_create_file("lp_mem_pool_max_size", mode, - debugfs_ctx_defaults_directory, - &kbdev->mem_pool_defaults.large, - &kbase_device_debugfs_mem_pool_max_size_fops); + dentry = debugfs_ctx_defaults_init(kbdev); + if (IS_ERR_OR_NULL(dentry)) + return dentry; if (kbase_hw_has_feature(kbdev, BASE_HW_FEATURE_PROTECTED_DEBUG_MODE)) { - debugfs_create_file("protected_debug_mode", 0444, + dentry = debugfs_create_file("protected_debug_mode", 0444, kbdev->mali_debugfs_directory, kbdev, &fops_protected_debug_mode); + if (IS_ERR_OR_NULL(dentry)) { + dev_err(kbdev->dev, "Unable to create protected_debug_mode debugfs entry\n"); + return dentry; + } } - debugfs_create_file("reset", 0644, + dentry = debugfs_create_file("reset", 0644, kbdev->mali_debugfs_directory, kbdev, &fops_trigger_reset); + if (IS_ERR_OR_NULL(dentry)) { + dev_err(kbdev->dev, "Unable to create reset debugfs entry\n"); + return dentry; + } + + debugfs_create_file("trigger_uevent", 0644, + kbdev->mali_debugfs_directory, kbdev, + &fops_trigger_uevent); kbase_ktrace_debugfs_init(kbdev); @@ -4895,18 +5443,30 @@ int kbase_device_debugfs_init(struct kbase_device *kbdev) #endif /* CONFIG_MALI_DEVFREQ */ #if !MALI_USE_CSF - debugfs_create_file("serialize_jobs", 0644, + dentry = debugfs_create_file("serialize_jobs", 0644, kbdev->mali_debugfs_directory, kbdev, &kbasep_serialize_jobs_debugfs_fops); - + if (IS_ERR_OR_NULL(dentry)) { + dev_err(kbdev->dev, "Unable to create serialize_jobs debugfs entry\n"); + return dentry; + } + kbase_timeline_io_debugfs_init(kbdev); #endif kbase_dvfs_status_debugfs_init(kbdev); - return 0; -out: - debugfs_remove_recursive(kbdev->mali_debugfs_directory); - return err; + return dentry; +} + +int kbase_device_debugfs_init(struct kbase_device *kbdev) +{ + struct dentry *dentry = init_debugfs(kbdev); + + if (IS_ERR_OR_NULL(dentry)) { + debugfs_remove_recursive(kbdev->mali_debugfs_directory); + return IS_ERR(dentry) ? 
PTR_ERR(dentry) : -ENOMEM; + } + return 0; } void kbase_device_debugfs_term(struct kbase_device *kbdev) @@ -5098,10 +5658,11 @@ static ssize_t fw_timeout_store(struct device *dev, ret = kstrtouint(buf, 0, &fw_timeout); if (ret || fw_timeout == 0) { - dev_err(kbdev->dev, "%s\n%s\n%u", - "Couldn't process fw_timeout write operation.", - "Use format 'fw_timeout_ms', and fw_timeout_ms > 0", - FIRMWARE_PING_INTERVAL_MS); + dev_err(kbdev->dev, + "Couldn't process fw_timeout write operation.\n" + "Use format 'fw_timeout_ms', and fw_timeout_ms > 0\n" + "Default fw_timeout: %u", + kbase_get_timeout_ms(kbdev, CSF_FIRMWARE_PING_TIMEOUT)); return -EINVAL; } @@ -5171,7 +5732,10 @@ static ssize_t idle_hysteresis_time_store(struct device *dev, return -EINVAL; } - kbase_csf_firmware_set_gpu_idle_hysteresis_time(kbdev, dur); + /* In sysFs, The unit of the input value of idle_hysteresis_time is us. + * But the unit of the input parameter of this function is ns, so multiply by 1000 + */ + kbase_csf_firmware_set_gpu_idle_hysteresis_time(kbdev, dur * NSEC_PER_USEC); return count; } @@ -5198,13 +5762,221 @@ static ssize_t idle_hysteresis_time_show(struct device *dev, if (!kbdev) return -ENODEV; - dur = kbase_csf_firmware_get_gpu_idle_hysteresis_time(kbdev); + /* The unit of return value of idle_hysteresis_time_show is us, So divide by 1000.*/ + dur = kbase_csf_firmware_get_gpu_idle_hysteresis_time(kbdev) / NSEC_PER_USEC; ret = scnprintf(buf, PAGE_SIZE, "%u\n", dur); return ret; } static DEVICE_ATTR_RW(idle_hysteresis_time); + +/** + * idle_hysteresis_time_ns_store - Store callback for CSF + * idle_hysteresis_time_ns sysfs file. + * + * @dev: The device with sysfs file is for + * @attr: The attributes of the sysfs file + * @buf: The value written to the sysfs file + * @count: The number of bytes written to the sysfs file + * + * This function is called when the idle_hysteresis_time_ns sysfs + * file is written to. + * + * This file contains values of the idle hysteresis duration in ns. + * + * Return: @count if the function succeeded. An error code on failure. + */ +static ssize_t idle_hysteresis_time_ns_store(struct device *dev, struct device_attribute *attr, + const char *buf, size_t count) +{ + struct kbase_device *kbdev; + u32 dur = 0; + + kbdev = to_kbase_device(dev); + if (!kbdev) + return -ENODEV; + + if (kstrtou32(buf, 0, &dur)) { + dev_err(kbdev->dev, "Couldn't process idle_hysteresis_time_ns write operation.\n" + "Use format <idle_hysteresis_time_ns>\n"); + return -EINVAL; + } + + kbase_csf_firmware_set_gpu_idle_hysteresis_time(kbdev, dur); + + return count; +} + +/** + * idle_hysteresis_time_ns_show - Show callback for CSF + * idle_hysteresis_time_ns sysfs entry. + * + * @dev: The device this sysfs file is for. + * @attr: The attributes of the sysfs file. + * @buf: The output buffer to receive the GPU information. + * + * This function is called to get the current idle hysteresis duration in ns. + * + * Return: The number of bytes output to @buf. + */ +static ssize_t idle_hysteresis_time_ns_show(struct device *dev, struct device_attribute *attr, + char *const buf) +{ + struct kbase_device *kbdev; + ssize_t ret; + u32 dur; + + kbdev = to_kbase_device(dev); + if (!kbdev) + return -ENODEV; + + dur = kbase_csf_firmware_get_gpu_idle_hysteresis_time(kbdev); + ret = scnprintf(buf, PAGE_SIZE, "%u\n", dur); + + return ret; +} + +static DEVICE_ATTR_RW(idle_hysteresis_time_ns); + +/** + * mcu_shader_pwroff_timeout_show - Get the MCU shader Core power-off time value. 
+ * + * @dev: The device this sysfs file is for. + * @attr: The attributes of the sysfs file. + * @buf: The output buffer for the sysfs file contents + * + * Get the internally recorded MCU shader Core power-off (nominal) timeout value. + * The unit of the value is in micro-seconds. + * + * Return: The number of bytes output to @buf if the + * function succeeded. A Negative value on failure. + */ +static ssize_t mcu_shader_pwroff_timeout_show(struct device *dev, struct device_attribute *attr, + char *const buf) +{ + struct kbase_device *kbdev = dev_get_drvdata(dev); + u32 pwroff; + + if (!kbdev) + return -ENODEV; + + /* The unit of return value of the function is us, So divide by 1000.*/ + pwroff = kbase_csf_firmware_get_mcu_core_pwroff_time(kbdev) / NSEC_PER_USEC; + return scnprintf(buf, PAGE_SIZE, "%u\n", pwroff); +} + +/** + * mcu_shader_pwroff_timeout_store - Set the MCU shader core power-off time value. + * + * @dev: The device with sysfs file is for + * @attr: The attributes of the sysfs file + * @buf: The value written to the sysfs file + * @count: The number of bytes to write to the sysfs file + * + * The duration value (unit: micro-seconds) for configuring MCU Shader Core + * timer, when the shader cores' power transitions are delegated to the + * MCU (normal operational mode) + * + * Return: @count if the function succeeded. An error code on failure. + */ +static ssize_t mcu_shader_pwroff_timeout_store(struct device *dev, struct device_attribute *attr, + const char *buf, size_t count) +{ + struct kbase_device *kbdev = dev_get_drvdata(dev); + u32 dur; + + const struct kbase_pm_policy *current_policy; + bool always_on; + + if (!kbdev) + return -ENODEV; + + if (kstrtouint(buf, 0, &dur)) + return -EINVAL; + + current_policy = kbase_pm_get_policy(kbdev); + always_on = current_policy == &kbase_pm_always_on_policy_ops; + if (dur == 0 && !always_on) + return -EINVAL; + + /* In sysFs, The unit of the input value of mcu_shader_pwroff_timeout is us. + * But the unit of the input parameter of this function is ns, so multiply by 1000 + */ + kbase_csf_firmware_set_mcu_core_pwroff_time(kbdev, dur * NSEC_PER_USEC); + + return count; +} + +static DEVICE_ATTR_RW(mcu_shader_pwroff_timeout); + +/** + * mcu_shader_pwroff_timeout_ns_show - Get the MCU shader Core power-off time value. + * + * @dev: The device this sysfs file is for. + * @attr: The attributes of the sysfs file. + * @buf: The output buffer for the sysfs file contents + * + * Get the internally recorded MCU shader Core power-off (nominal) timeout value. + * The unit of the value is in nanoseconds. + * + * Return: The number of bytes output to @buf if the + * function succeeded. A Negative value on failure. + */ +static ssize_t mcu_shader_pwroff_timeout_ns_show(struct device *dev, struct device_attribute *attr, + char *const buf) +{ + struct kbase_device *kbdev = dev_get_drvdata(dev); + u32 pwroff; + + if (!kbdev) + return -ENODEV; + + pwroff = kbase_csf_firmware_get_mcu_core_pwroff_time(kbdev); + return scnprintf(buf, PAGE_SIZE, "%u\n", pwroff); +} + +/** + * mcu_shader_pwroff_timeout_ns_store - Set the MCU shader core power-off time value. 
+ * + * @dev: The device with sysfs file is for + * @attr: The attributes of the sysfs file + * @buf: The value written to the sysfs file + * @count: The number of bytes to write to the sysfs file + * + * The duration value (unit: nanoseconds) for configuring MCU Shader Core + * timer, when the shader cores' power transitions are delegated to the + * MCU (normal operational mode) + * + * Return: @count if the function succeeded. An error code on failure. + */ +static ssize_t mcu_shader_pwroff_timeout_ns_store(struct device *dev, struct device_attribute *attr, + const char *buf, size_t count) +{ + struct kbase_device *kbdev = dev_get_drvdata(dev); + u32 dur; + + const struct kbase_pm_policy *current_policy; + bool always_on; + + if (!kbdev) + return -ENODEV; + + if (kstrtouint(buf, 0, &dur)) + return -EINVAL; + + current_policy = kbase_pm_get_policy(kbdev); + always_on = current_policy == &kbase_pm_always_on_policy_ops; + if (dur == 0 && !always_on) + return -EINVAL; + + kbase_csf_firmware_set_mcu_core_pwroff_time(kbdev, dur); + + return count; +} + +static DEVICE_ATTR_RW(mcu_shader_pwroff_timeout_ns); + #endif /* MALI_USE_CSF */ static struct attribute *kbase_scheduling_attrs[] = { @@ -5265,6 +6037,9 @@ static struct attribute *kbase_attrs[] = { &dev_attr_csg_scheduling_period.attr, &dev_attr_fw_timeout.attr, &dev_attr_idle_hysteresis_time.attr, + &dev_attr_idle_hysteresis_time_ns.attr, + &dev_attr_mcu_shader_pwroff_timeout.attr, + &dev_attr_mcu_shader_pwroff_timeout_ns.attr, #endif /* !MALI_USE_CSF */ &dev_attr_power_policy.attr, &dev_attr_core_mask.attr, @@ -5402,8 +6177,15 @@ static int kbase_platform_device_probe(struct platform_device *pdev) } kbdev->dev = &pdev->dev; - dev_set_drvdata(kbdev->dev, kbdev); +#if (KERNEL_VERSION(6, 0, 0) <= LINUX_VERSION_CODE) + kbdev->token = -EPERM; +#endif /* (KERNEL_VERSION(6, 0, 0) <= LINUX_VERSION_CODE) */ + + dev_set_drvdata(kbdev->dev, kbdev); +#if (KERNEL_VERSION(5, 3, 0) <= LINUX_VERSION_CODE) + mutex_lock(&kbase_probe_mutex); +#endif err = kbase_device_init(kbdev); if (err) { @@ -5415,14 +6197,28 @@ static int kbase_platform_device_probe(struct platform_device *pdev) dev_set_drvdata(kbdev->dev, NULL); kbase_device_free(kbdev); +#if (KERNEL_VERSION(5, 3, 0) <= LINUX_VERSION_CODE) + mutex_unlock(&kbase_probe_mutex); +#endif } else { +#if (KERNEL_VERSION(6, 1, 0) <= LINUX_VERSION_CODE) + /* Since upstream is not exporting mmap_min_addr, kbase at the + * moment is unable to track possible kernel changes via sysfs. + * Flag this out in a device info message. 
+ */ + dev_info(kbdev->dev, KBASE_COMPILED_MMAP_MIN_ADDR_MSG); +#endif + dev_info(kbdev->dev, "Probed as %s\n", dev_name(kbdev->mdev.this_device)); kbase_increment_device_id(); +#if (KERNEL_VERSION(5, 3, 0) <= LINUX_VERSION_CODE) + mutex_unlock(&kbase_probe_mutex); +#endif #ifdef CONFIG_MALI_ARBITER_SUPPORT - mutex_lock(&kbdev->pm.lock); + rt_mutex_lock(&kbdev->pm.lock); kbase_arbiter_pm_vm_event(kbdev, KBASE_VM_GPU_INITIALIZED_EVT); - mutex_unlock(&kbdev->pm.lock); + rt_mutex_unlock(&kbdev->pm.lock); #endif } @@ -5490,13 +6286,8 @@ static int kbase_device_resume(struct device *dev) #ifdef CONFIG_MALI_DEVFREQ dev_dbg(dev, "Callback %s\n", __func__); - if (kbdev->devfreq) { - mutex_lock(&kbdev->pm.lock); - if (kbdev->pm.active_count > 0) - kbase_devfreq_enqueue_work(kbdev, DEVFREQ_WORK_RESUME); - mutex_unlock(&kbdev->pm.lock); - flush_workqueue(kbdev->devfreq_queue.workq); - } + if (kbdev->devfreq) + kbase_devfreq_enqueue_work(kbdev, DEVFREQ_WORK_RESUME); #endif return 0; } @@ -5631,12 +6422,11 @@ static const struct dev_pm_ops kbase_pm_ops = { }; #if IS_ENABLED(CONFIG_OF) -static const struct of_device_id kbase_dt_ids[] = { - { .compatible = "arm,malit6xx" }, - { .compatible = "arm,mali-midgard" }, - { .compatible = "arm,mali-bifrost" }, - { /* sentinel */ } -}; +static const struct of_device_id kbase_dt_ids[] = { { .compatible = "arm,malit6xx" }, + { .compatible = "arm,mali-midgard" }, + { .compatible = "arm,mali-bifrost" }, + { .compatible = "arm,mali-valhall" }, + { /* sentinel */ } }; MODULE_DEVICE_TABLE(of, kbase_dt_ids); #endif @@ -5644,33 +6434,36 @@ static struct platform_driver kbase_platform_driver = { .probe = kbase_platform_device_probe, .remove = kbase_platform_device_remove, .driver = { - .name = kbase_drv_name, + .name = KBASE_DRV_NAME, .pm = &kbase_pm_ops, .of_match_table = of_match_ptr(kbase_dt_ids), .probe_type = PROBE_PREFER_ASYNCHRONOUS, }, }; -/* - * The driver will not provide a shortcut to create the Mali platform device - * anymore when using Device Tree. - */ -#if IS_ENABLED(CONFIG_OF) +#if (KERNEL_VERSION(5, 3, 0) > LINUX_VERSION_CODE) && IS_ENABLED(CONFIG_OF) module_platform_driver(kbase_platform_driver); #else - static int __init kbase_driver_init(void) { int ret; +#if (KERNEL_VERSION(5, 3, 0) <= LINUX_VERSION_CODE) + mutex_init(&kbase_probe_mutex); +#endif + +#ifndef CONFIG_OF ret = kbase_platform_register(); if (ret) return ret; - +#endif ret = platform_driver_register(&kbase_platform_driver); - - if (ret) +#ifndef CONFIG_OF + if (ret) { kbase_platform_unregister(); + return ret; + } +#endif return ret; } @@ -5678,14 +6471,14 @@ static int __init kbase_driver_init(void) static void __exit kbase_driver_exit(void) { platform_driver_unregister(&kbase_platform_driver); +#ifndef CONFIG_OF kbase_platform_unregister(); +#endif } module_init(kbase_driver_init); module_exit(kbase_driver_exit); - -#endif /* CONFIG_OF */ - +#endif MODULE_LICENSE("GPL"); MODULE_IMPORT_NS(DMA_BUF); MODULE_VERSION(MALI_RELEASE_NAME " (UK version " \ diff --git a/mali_kbase/mali_kbase_cs_experimental.h b/mali_kbase/mali_kbase_cs_experimental.h index 4dc09e4..7e885ca 100644 --- a/mali_kbase/mali_kbase_cs_experimental.h +++ b/mali_kbase/mali_kbase_cs_experimental.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2022 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -30,9 +30,9 @@ */ static inline void mali_kbase_print_cs_experimental(void) { -#if MALI_INCREMENTAL_RENDERING - pr_info("mali_kbase: INCREMENTAL_RENDERING (experimental) enabled"); -#endif /* MALI_INCREMENTAL_RENDERING */ +#if MALI_INCREMENTAL_RENDERING_JM + pr_info("mali_kbase: INCREMENTAL_RENDERING_JM (experimental) enabled"); +#endif /* MALI_INCREMENTAL_RENDERING_JM */ } #endif /* _KBASE_CS_EXPERIMENTAL_H_ */ diff --git a/mali_kbase/mali_kbase_ctx_sched.c b/mali_kbase/mali_kbase_ctx_sched.c index 8026e7f..ea4f300 100644 --- a/mali_kbase/mali_kbase_ctx_sched.c +++ b/mali_kbase/mali_kbase_ctx_sched.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2017-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2017-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -23,7 +23,9 @@ #include <mali_kbase_defs.h> #include "mali_kbase_ctx_sched.h" #include "tl/mali_kbase_tracepoints.h" -#if !MALI_USE_CSF +#if MALI_USE_CSF +#include "mali_kbase_reset_gpu.h" +#else #include <mali_kbase_hwaccess_jm.h> #endif @@ -67,6 +69,12 @@ void kbase_ctx_sched_term(struct kbase_device *kbdev) } } +void kbase_ctx_sched_init_ctx(struct kbase_context *kctx) +{ + kctx->as_nr = KBASEP_AS_NR_INVALID; + atomic_set(&kctx->refcount, 0); +} + /* kbasep_ctx_sched_find_as_for_ctx - Find a free address space * * @kbdev: The context for which to find a free address space @@ -111,7 +119,7 @@ int kbase_ctx_sched_retain_ctx(struct kbase_context *kctx) if (atomic_inc_return(&kctx->refcount) == 1) { int const free_as = kbasep_ctx_sched_find_as_for_ctx(kctx); - if (free_as != KBASEP_AS_NR_INVALID) { + if (free_as >= 0) { kbdev->as_free &= ~(1u << free_as); /* Only program the MMU if the context has not been * assigned the same address space before. @@ -152,9 +160,23 @@ void kbase_ctx_sched_retain_ctx_refcount(struct kbase_context *kctx) struct kbase_device *const kbdev = kctx->kbdev; lockdep_assert_held(&kbdev->hwaccess_lock); - WARN_ON(atomic_read(&kctx->refcount) == 0); - WARN_ON(kctx->as_nr == KBASEP_AS_NR_INVALID); - WARN_ON(kbdev->as_to_kctx[kctx->as_nr] != kctx); +#if MALI_USE_CSF + /* We expect the context to be active when this function is called, + * except for the case where a page fault is reported for it during + * the GPU reset sequence, in which case we can expect the refcount + * to be 0. 
+ */ + WARN_ON(!atomic_read(&kctx->refcount) && !kbase_reset_gpu_is_active(kbdev)); +#else + /* We expect the context to be active (and thus refcount should be non-zero) + * when this function is called + */ + WARN_ON(!atomic_read(&kctx->refcount)); +#endif + if (likely((kctx->as_nr >= 0) && (kctx->as_nr < BASE_MAX_NR_AS))) + WARN_ON(kbdev->as_to_kctx[kctx->as_nr] != kctx); + else + WARN(true, "Invalid as_nr(%d)", kctx->as_nr); atomic_inc(&kctx->refcount); } @@ -168,16 +190,17 @@ void kbase_ctx_sched_release_ctx(struct kbase_context *kctx) new_ref_count = atomic_dec_return(&kctx->refcount); if (new_ref_count == 0) { - kbdev->as_free |= (1u << kctx->as_nr); - if (kbase_ctx_flag(kctx, KCTX_AS_DISABLED_ON_FAULT)) { - KBASE_TLSTREAM_TL_KBASE_CTX_UNASSIGN_AS( - kbdev, kctx->id); - kbdev->as_to_kctx[kctx->as_nr] = NULL; - kctx->as_nr = KBASEP_AS_NR_INVALID; - kbase_ctx_flag_clear(kctx, KCTX_AS_DISABLED_ON_FAULT); + if (likely((kctx->as_nr >= 0) && (kctx->as_nr < BASE_MAX_NR_AS))) { + kbdev->as_free |= (1u << kctx->as_nr); + if (kbase_ctx_flag(kctx, KCTX_AS_DISABLED_ON_FAULT)) { + KBASE_TLSTREAM_TL_KBASE_CTX_UNASSIGN_AS(kbdev, kctx->id); + kbdev->as_to_kctx[kctx->as_nr] = NULL; + kctx->as_nr = KBASEP_AS_NR_INVALID; + kbase_ctx_flag_clear(kctx, KCTX_AS_DISABLED_ON_FAULT); #if !MALI_USE_CSF - kbase_backend_slot_kctx_purge_locked(kbdev, kctx); + kbase_backend_slot_kctx_purge_locked(kbdev, kctx); #endif + } } } @@ -187,13 +210,14 @@ void kbase_ctx_sched_release_ctx(struct kbase_context *kctx) void kbase_ctx_sched_remove_ctx(struct kbase_context *kctx) { struct kbase_device *const kbdev = kctx->kbdev; + unsigned long flags; - lockdep_assert_held(&kbdev->mmu_hw_mutex); - lockdep_assert_held(&kbdev->hwaccess_lock); + mutex_lock(&kbdev->mmu_hw_mutex); + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); WARN_ON(atomic_read(&kctx->refcount) != 0); - if (kctx->as_nr != KBASEP_AS_NR_INVALID) { + if ((kctx->as_nr >= 0) && (kctx->as_nr < BASE_MAX_NR_AS)) { if (kbdev->pm.backend.gpu_powered) kbase_mmu_disable(kctx); @@ -201,6 +225,9 @@ void kbase_ctx_sched_remove_ctx(struct kbase_context *kctx) kbdev->as_to_kctx[kctx->as_nr] = NULL; kctx->as_nr = KBASEP_AS_NR_INVALID; } + + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + mutex_unlock(&kbdev->mmu_hw_mutex); } void kbase_ctx_sched_restore_all_as(struct kbase_device *kbdev) @@ -212,6 +239,8 @@ void kbase_ctx_sched_restore_all_as(struct kbase_device *kbdev) WARN_ON(!kbdev->pm.backend.gpu_powered); + kbdev->mmu_unresponsive = false; + for (i = 0; i != kbdev->nr_hw_address_spaces; ++i) { struct kbase_context *kctx; @@ -264,7 +293,7 @@ struct kbase_context *kbase_ctx_sched_as_to_ctx_refcount( found_kctx = kbdev->as_to_kctx[as_nr]; - if (!WARN_ON(found_kctx == NULL)) + if (found_kctx) kbase_ctx_sched_retain_ctx_refcount(found_kctx); spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); @@ -313,16 +342,14 @@ struct kbase_context *kbase_ctx_sched_as_to_ctx_nolock( bool kbase_ctx_sched_inc_refcount_nolock(struct kbase_context *kctx) { bool result = false; - int as_nr; if (WARN_ON(kctx == NULL)) return result; lockdep_assert_held(&kctx->kbdev->hwaccess_lock); - as_nr = kctx->as_nr; if (atomic_read(&kctx->refcount) > 0) { - KBASE_DEBUG_ASSERT(as_nr >= 0); + KBASE_DEBUG_ASSERT(kctx->as_nr >= 0); kbase_ctx_sched_retain_ctx_refcount(kctx); KBASE_KTRACE_ADD(kctx->kbdev, SCHED_RETAIN_CTX_NOLOCK, kctx, diff --git a/mali_kbase/mali_kbase_ctx_sched.h b/mali_kbase/mali_kbase_ctx_sched.h index f787cc3..5a8d175 100644 --- a/mali_kbase/mali_kbase_ctx_sched.h +++ 
b/mali_kbase/mali_kbase_ctx_sched.h @@ -60,6 +60,15 @@ int kbase_ctx_sched_init(struct kbase_device *kbdev); void kbase_ctx_sched_term(struct kbase_device *kbdev); /** + * kbase_ctx_sched_ctx_init - Initialize per-context data fields for scheduling + * @kctx: The context to initialize + * + * This must be called during context initialization before any other context + * scheduling functions are called on @kctx + */ +void kbase_ctx_sched_init_ctx(struct kbase_context *kctx); + +/** * kbase_ctx_sched_retain_ctx - Retain a reference to the @ref kbase_context * @kctx: The context to which to retain a reference * @@ -113,9 +122,6 @@ void kbase_ctx_sched_release_ctx(struct kbase_context *kctx); * This function should be called when a context is being destroyed. The * context must no longer have any reference. If it has been assigned an * address space before then the AS will be unprogrammed. - * - * The kbase_device::mmu_hw_mutex and kbase_device::hwaccess_lock locks must be - * held whilst calling this function. */ void kbase_ctx_sched_remove_ctx(struct kbase_context *kctx); diff --git a/mali_kbase/mali_kbase_debug.h b/mali_kbase/mali_kbase_debug.h index d9eeed8..f0c4b59 100644 --- a/mali_kbase/mali_kbase_debug.h +++ b/mali_kbase/mali_kbase_debug.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2012-2015, 2017, 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2012-2015, 2017, 2020-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -65,7 +65,7 @@ struct kbasep_debug_assert_cb { #endif /** - * KBASEP_DEBUG_ASSERT_OUT(trace, function, ...) - (Private) system printing + * KBASEP_DEBUG_ASSERT_OUT() - (Private) system printing * function associated to the @ref KBASE_DEBUG_ASSERT_MSG event. * @trace: location in the code from where the message is printed * @function: function from where the message is printed @@ -125,7 +125,7 @@ struct kbasep_debug_assert_cb { #endif /* KBASE_DEBUG_DISABLE_ASSERTS */ /** - * KBASE_DEBUG_CODE( X ) - Executes the code inside the macro only in debug mode + * KBASE_DEBUG_CODE() - Executes the code inside the macro only in debug mode * @X: Code to compile only in debug mode. */ #ifdef CONFIG_MALI_DEBUG diff --git a/mali_kbase/mali_kbase_debug_job_fault.c b/mali_kbase/mali_kbase_debug_job_fault.c index 4f021b3..d6518b4 100644 --- a/mali_kbase/mali_kbase_debug_job_fault.c +++ b/mali_kbase/mali_kbase_debug_job_fault.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2012-2016, 2018-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2012-2016, 2018-2022 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -87,8 +87,7 @@ static bool kbase_ctx_has_no_event_pending(struct kbase_context *kctx) static int wait_for_job_fault(struct kbase_device *kbdev) { -#if KERNEL_VERSION(4, 7, 0) <= LINUX_VERSION_CODE && \ - KERNEL_VERSION(4, 15, 0) > LINUX_VERSION_CODE +#if KERNEL_VERSION(4, 15, 0) > LINUX_VERSION_CODE int ret = wait_event_interruptible_timeout(kbdev->job_fault_wq, kbase_is_job_fault_event_pending(kbdev), msecs_to_jiffies(2000)); diff --git a/mali_kbase/mali_kbase_debug_mem_allocs.c b/mali_kbase/mali_kbase_debug_mem_allocs.c new file mode 100644 index 0000000..0592187 --- /dev/null +++ b/mali_kbase/mali_kbase_debug_mem_allocs.c @@ -0,0 +1,135 @@ +// SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note +/* + * + * (C) COPYRIGHT 2022-2023 ARM Limited. All rights reserved. + * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. + * + */ + +/* + * Debugfs interface to dump information about GPU allocations in kctx + */ + +#include "mali_kbase_debug_mem_allocs.h" +#include "mali_kbase.h" + +#include <linux/string.h> +#include <linux/list.h> +#include <linux/file.h> + +#if IS_ENABLED(CONFIG_DEBUG_FS) + +/** + * debug_zone_mem_allocs_show - Show information from specific rbtree + * @zone: The memory zone to be displayed + * @sfile: The debugfs entry + * + * This function is called to show information about all the GPU allocations of a + * particular zone within GPU virtual memory space of a context. + * The information like the start virtual address and size (in bytes) is shown for + * every GPU allocation mapped in the zone. 
+ */ +static void debug_zone_mem_allocs_show(struct kbase_reg_zone *zone, struct seq_file *sfile) +{ + struct rb_node *p; + struct rb_root *rbtree = &zone->reg_rbtree; + struct kbase_va_region *reg; + const char *type_names[5] = { + "Native", + "Imported UMM", + "Imported user buf", + "Alias", + "Raw" + }; + +#define MEM_ALLOCS_HEADER \ + " VA, VA size, Commit size, Flags, Mem type\n" + seq_printf(sfile, "Zone name: %s\n:", kbase_reg_zone_get_name(zone->id)); + seq_printf(sfile, MEM_ALLOCS_HEADER); + for (p = rb_first(rbtree); p; p = rb_next(p)) { + reg = rb_entry(p, struct kbase_va_region, rblink); + if (!(reg->flags & KBASE_REG_FREE)) { + seq_printf(sfile, "%16llx, %16zx, %16zx, %8lx, %s\n", + reg->start_pfn << PAGE_SHIFT, reg->nr_pages << PAGE_SHIFT, + kbase_reg_current_backed_size(reg) << PAGE_SHIFT, + reg->flags, type_names[reg->gpu_alloc->type]); + } + } +} + +/** + * debug_ctx_mem_allocs_show - Show information about GPU allocations in a kctx + * @sfile: The debugfs entry + * @data: Data associated with the entry + * + * Return: + * 0 if successfully prints data in debugfs entry file + * -1 if it encountered an error + */ +static int debug_ctx_mem_allocs_show(struct seq_file *sfile, void *data) +{ + struct kbase_context *const kctx = sfile->private; + enum kbase_memory_zone zone_idx; + + kbase_gpu_vm_lock(kctx); + for (zone_idx = 0; zone_idx < CONTEXT_ZONE_MAX; zone_idx++) { + struct kbase_reg_zone *zone; + + zone = &kctx->reg_zone[zone_idx]; + debug_zone_mem_allocs_show(zone, sfile); + } + kbase_gpu_vm_unlock(kctx); + return 0; +} + +/* + * File operations related to debugfs entry for mem_zones + */ +static int debug_mem_allocs_open(struct inode *in, struct file *file) +{ + return single_open(file, debug_ctx_mem_allocs_show, in->i_private); +} + +static const struct file_operations kbase_debug_mem_allocs_fops = { + .owner = THIS_MODULE, + .open = debug_mem_allocs_open, + .read = seq_read, + .llseek = seq_lseek, + .release = single_release, +}; + +/* + * Initialize debugfs entry for mem_allocs + */ +void kbase_debug_mem_allocs_init(struct kbase_context *const kctx) +{ + /* Caller already ensures this, but we keep the pattern for + * maintenance safety. + */ + if (WARN_ON(!kctx) || WARN_ON(IS_ERR_OR_NULL(kctx->kctx_dentry))) + return; + + debugfs_create_file("mem_allocs", 0400, kctx->kctx_dentry, kctx, + &kbase_debug_mem_allocs_fops); +} +#else +/* + * Stub functions for when debugfs is disabled + */ +void kbase_debug_mem_allocs_init(struct kbase_context *const kctx) +{ +} +#endif diff --git a/mali_kbase/mali_kbase_debug_mem_allocs.h b/mali_kbase/mali_kbase_debug_mem_allocs.h new file mode 100644 index 0000000..8cf69c2 --- /dev/null +++ b/mali_kbase/mali_kbase_debug_mem_allocs.h @@ -0,0 +1,39 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +/* + * + * (C) COPYRIGHT 2022 ARM Limited. All rights reserved. + * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
 + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. + * + */ + +#ifndef _KBASE_DEBUG_MEM_ALLOCS_H +#define _KBASE_DEBUG_MEM_ALLOCS_H + +#include <mali_kbase.h> + +/** + * kbase_debug_mem_allocs_init() - Initialize the mem_allocs debugfs file + * @kctx: Pointer to kernel base context + * + * This function creates a "mem_allocs" file for a context to show information about the + * GPU allocations created for that context. + * + * The file is cleaned up by a call to debugfs_remove_recursive() deleting the + * parent directory. + */ +void kbase_debug_mem_allocs_init(struct kbase_context *kctx); + +#endif diff --git a/mali_kbase/mali_kbase_debug_mem_view.c b/mali_kbase/mali_kbase_debug_mem_view.c index ce87a00..7086c6b 100644 --- a/mali_kbase/mali_kbase_debug_mem_view.c +++ b/mali_kbase/mali_kbase_debug_mem_view.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2013-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2013-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -189,13 +189,13 @@ static const struct seq_operations ops = { .show = debug_mem_show, }; -static int debug_mem_zone_open(struct rb_root *rbtree, - struct debug_mem_data *mem_data) +static int debug_mem_zone_open(struct kbase_reg_zone *zone, struct debug_mem_data *mem_data) { int ret = 0; struct rb_node *p; struct kbase_va_region *reg; struct debug_mem_mapping *mapping; + struct rb_root *rbtree = &zone->reg_rbtree; for (p = rb_first(rbtree); p; p = rb_next(p)) { reg = rb_entry(p, struct kbase_va_region, rblink); @@ -233,8 +233,9 @@ static int debug_mem_open(struct inode *i, struct file *file) struct kbase_context *const kctx = i->i_private; struct debug_mem_data *mem_data; int ret; + enum kbase_memory_zone idx; - if (get_file_rcu(kctx->filp) == 0) + if (!kbase_file_inc_fops_count_unless_closed(kctx->kfile)) return -ENOENT; /* Check if file was opened in write mode.
GPU memory contents @@ -263,37 +264,15 @@ static int debug_mem_open(struct inode *i, struct file *file) mem_data->column_width = kctx->mem_view_column_width; - ret = debug_mem_zone_open(&kctx->reg_rbtree_same, mem_data); - if (ret != 0) { - kbase_gpu_vm_unlock(kctx); - goto out; - } - - ret = debug_mem_zone_open(&kctx->reg_rbtree_custom, mem_data); - if (ret != 0) { - kbase_gpu_vm_unlock(kctx); - goto out; - } - - ret = debug_mem_zone_open(&kctx->reg_rbtree_exec, mem_data); - if (ret != 0) { - kbase_gpu_vm_unlock(kctx); - goto out; - } + for (idx = 0; idx < CONTEXT_ZONE_MAX; idx++) { + struct kbase_reg_zone *zone = &kctx->reg_zone[idx]; -#if MALI_USE_CSF - ret = debug_mem_zone_open(&kctx->reg_rbtree_exec_fixed, mem_data); - if (ret != 0) { - kbase_gpu_vm_unlock(kctx); - goto out; - } - - ret = debug_mem_zone_open(&kctx->reg_rbtree_fixed, mem_data); - if (ret != 0) { - kbase_gpu_vm_unlock(kctx); - goto out; + ret = debug_mem_zone_open(zone, mem_data); + if (ret != 0) { + kbase_gpu_vm_unlock(kctx); + goto out; + } } -#endif kbase_gpu_vm_unlock(kctx); @@ -316,7 +295,7 @@ out: } seq_release(i, file); open_fail: - fput(kctx->filp); + kbase_file_dec_fops_count(kctx->kfile); return ret; } @@ -346,7 +325,7 @@ static int debug_mem_release(struct inode *inode, struct file *file) kfree(mem_data); } - fput(kctx->filp); + kbase_file_dec_fops_count(kctx->kfile); return 0; } diff --git a/mali_kbase/mali_kbase_debug_mem_view.h b/mali_kbase/mali_kbase_debug_mem_view.h index d034832..cb8050d 100644 --- a/mali_kbase/mali_kbase_debug_mem_view.h +++ b/mali_kbase/mali_kbase_debug_mem_view.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2013-2015, 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2013-2015, 2019-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -25,7 +25,7 @@ #include <mali_kbase.h> /** - * kbase_debug_mem_view_init - Initialize the mem_view sysfs file + * kbase_debug_mem_view_init - Initialize the mem_view debugfs file * @kctx: Pointer to kernel base context * * This function creates a "mem_view" file which can be used to get a view of diff --git a/mali_kbase/mali_kbase_debug_mem_zones.c b/mali_kbase/mali_kbase_debug_mem_zones.c new file mode 100644 index 0000000..115c9c3 --- /dev/null +++ b/mali_kbase/mali_kbase_debug_mem_zones.c @@ -0,0 +1,115 @@ +// SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note +/* + * + * (C) COPYRIGHT 2022-2023 ARM Limited. All rights reserved. + * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. 
+ * + */ + +/* + * Debugfs interface to dump information about GPU_VA memory zones + */ + +#include "mali_kbase_debug_mem_zones.h" +#include "mali_kbase.h" + +#include <linux/list.h> +#include <linux/file.h> + +#if IS_ENABLED(CONFIG_DEBUG_FS) + +/** + * debug_mem_zones_show - Show information about GPU_VA memory zones + * @sfile: The debugfs entry + * @data: Data associated with the entry + * + * This function is called to get the contents of the @c mem_zones debugfs file. + * This lists the start address and size (in pages) of each initialized memory + * zone within GPU_VA memory. + * + * Return: + * 0 if successfully prints data in debugfs entry file + * -1 if it encountered an error + */ +static int debug_mem_zones_show(struct seq_file *sfile, void *data) +{ + struct kbase_context *const kctx = sfile->private; + struct kbase_reg_zone *reg_zone; + enum kbase_memory_zone zone_idx; + + kbase_gpu_vm_lock(kctx); + + for (zone_idx = 0; zone_idx < CONTEXT_ZONE_MAX; zone_idx++) { + reg_zone = &kctx->reg_zone[zone_idx]; + + if (reg_zone->base_pfn) { + seq_printf(sfile, "%15s %u 0x%.16llx 0x%.16llx\n", + kbase_reg_zone_get_name(zone_idx), zone_idx, reg_zone->base_pfn, + reg_zone->va_size_pages); + } + } +#if MALI_USE_CSF + reg_zone = &kctx->kbdev->csf.mcu_shared_zone; + + if (reg_zone && reg_zone->base_pfn) { + seq_printf(sfile, "%15s %u 0x%.16llx 0x%.16llx\n", + kbase_reg_zone_get_name(MCU_SHARED_ZONE), MCU_SHARED_ZONE, + reg_zone->base_pfn, reg_zone->va_size_pages); + } +#endif + + kbase_gpu_vm_unlock(kctx); + return 0; +} + +/* + * File operations related to debugfs entry for mem_zones + */ +static int debug_mem_zones_open(struct inode *in, struct file *file) +{ + return single_open(file, debug_mem_zones_show, in->i_private); +} + +static const struct file_operations kbase_debug_mem_zones_fops = { + .owner = THIS_MODULE, + .open = debug_mem_zones_open, + .read = seq_read, + .llseek = seq_lseek, + .release = single_release, +}; + +/* + * Initialize debugfs entry for mem_zones + */ +void kbase_debug_mem_zones_init(struct kbase_context *const kctx) +{ + /* Caller already ensures this, but we keep the pattern for + * maintenance safety. + */ + if (WARN_ON(!kctx) || WARN_ON(IS_ERR_OR_NULL(kctx->kctx_dentry))) + return; + + debugfs_create_file("mem_zones", 0400, kctx->kctx_dentry, kctx, + &kbase_debug_mem_zones_fops); +} +#else +/* + * Stub functions for when debugfs is disabled + */ +void kbase_debug_mem_zones_init(struct kbase_context *const kctx) +{ +} +#endif diff --git a/mali_kbase/mali_kbase_debug_mem_zones.h b/mali_kbase/mali_kbase_debug_mem_zones.h new file mode 100644 index 0000000..acf349b --- /dev/null +++ b/mali_kbase/mali_kbase_debug_mem_zones.h @@ -0,0 +1,39 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +/* + * + * (C) COPYRIGHT 2022 ARM Limited. All rights reserved. + * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. 
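/*
 * Output format note for debug_mem_zones_show() above: each emitted line is
 * "<zone name> <zone index> <base PFN> <size in pages>", the last two fields
 * zero-padded hexadecimal. On CSF GPUs the device-wide MCU_SHARED_ZONE is
 * appended after the per-context zones. A purely hypothetical example line:
 *
 *         SAME_VA 0 0x0000000000004000 0x00000000003fc000
 */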
+ * + */ + +#ifndef _KBASE_DEBUG_MEM_ZONES_H +#define _KBASE_DEBUG_MEM_ZONES_H + +#include <mali_kbase.h> + +/** + * kbase_debug_mem_zones_init() - Initialize the mem_zones sysfs file + * @kctx: Pointer to kernel base context + * + * This function creates a "mem_zones" file which can be used to determine the + * address ranges of GPU memory zones, in the GPU Virtual-Address space. + * + * The file is cleaned up by a call to debugfs_remove_recursive() deleting the + * parent directory. + */ +void kbase_debug_mem_zones_init(struct kbase_context *kctx); + +#endif diff --git a/mali_kbase/mali_kbase_debugfs_helper.c b/mali_kbase/mali_kbase_debugfs_helper.c index fcc149b..c846491 100644 --- a/mali_kbase/mali_kbase_debugfs_helper.c +++ b/mali_kbase/mali_kbase_debugfs_helper.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2019-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software diff --git a/mali_kbase/mali_kbase_defs.h b/mali_kbase/mali_kbase_defs.h index 25e4f32..bdc3f6d 100644 --- a/mali_kbase/mali_kbase_defs.h +++ b/mali_kbase/mali_kbase_defs.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2011-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2011-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -35,13 +35,13 @@ #include <backend/gpu/mali_kbase_instr_defs.h> #include <mali_kbase_pm.h> #include <mali_kbase_gpuprops_types.h> -#include <mali_kbase_hwcnt_watchdog_if.h> +#include <hwcnt/mali_kbase_hwcnt_watchdog_if.h> #if MALI_USE_CSF -#include <mali_kbase_hwcnt_backend_csf.h> +#include <hwcnt/backend/mali_kbase_hwcnt_backend_csf.h> #else -#include <mali_kbase_hwcnt_backend_jm.h> -#include <mali_kbase_hwcnt_backend_jm_watchdog.h> +#include <hwcnt/backend/mali_kbase_hwcnt_backend_jm.h> +#include <hwcnt/backend/mali_kbase_hwcnt_backend_jm_watchdog.h> #endif #include <protected_mode_switcher.h> @@ -53,11 +53,7 @@ #include <linux/sizes.h> #include <linux/rtmutex.h> -#if defined(CONFIG_SYNC) -#include <sync.h> -#else #include "mali_kbase_fence_defs.h" -#endif #if IS_ENABLED(CONFIG_DEBUG_FS) #include <linux/debugfs.h> @@ -154,8 +150,7 @@ /* Maximum number of pages of memory that require a permanent mapping, per * kbase_context */ -#define KBASE_PERMANENTLY_MAPPED_MEM_LIMIT_PAGES ((32 * 1024ul * 1024ul) >> \ - PAGE_SHIFT) +#define KBASE_PERMANENTLY_MAPPED_MEM_LIMIT_PAGES ((64 * 1024ul * 1024ul) >> PAGE_SHIFT) /* Minimum threshold period for hwcnt dumps between different hwcnt virtualizer * clients, to reduce undesired system load. * If a virtualizer client requests a dump within this threshold period after @@ -188,6 +183,60 @@ struct kbase_as; struct kbase_mmu_setup; struct kbase_kinstr_jm; +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) +/** + * struct kbase_gpu_metrics - Object containing members that are used to emit + * GPU metrics tracepoints for all applications that + * created Kbase context(s) for a GPU. + * + * @active_list: List of applications that did some GPU activity in the recent work period. + * @inactive_list: List of applications that didn't do any GPU activity in the recent work period. 
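/*
 * Summary of what CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD adds (the
 * tracepoint linkage is an assumption based on the option name): the
 * device-wide lists above track which applications were GPU-active in the
 * recent work period, while the per-application kbase_gpu_metrics_ctx
 * documented next accumulates the active time that is reported through the
 * Android power/gpu_work_period tracepoint (GPU id, application UID,
 * work-period start/end, total active duration).
 */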
+ */ +struct kbase_gpu_metrics { + struct list_head active_list; + struct list_head inactive_list; +}; + +/** + * struct kbase_gpu_metrics_ctx - Object created for every application, that created + * Kbase context(s), containing members that are used + * to emit GPU metrics tracepoints for the application. + * + * @link: Links the object in kbase_device::gpu_metrics::active_list + * or kbase_device::gpu_metrics::inactive_list. + * @first_active_start_time: Records the time at which the application first became + * active in the current work period. + * @last_active_start_time: Records the time at which the application last became + * active in the current work period. + * @last_active_end_time: Records the time at which the application last became + * inactive in the current work period. + * @total_active: Tracks the time for which application has been active + * in the current work period. + * @prev_wp_active_end_time: Records the time at which the application last became + * inactive in the previous work period. + * @aid: Unique identifier for an application. + * @kctx_count: Counter to keep a track of the number of Kbase contexts + * created for an application. There may be multiple Kbase + * contexts contributing GPU activity data to a single GPU + * metrics context. + * @active_cnt: Counter that is updated every time the GPU activity starts + * and ends in the current work period for an application. + * @flags: Flags to track the state of GPU metrics context. + */ +struct kbase_gpu_metrics_ctx { + struct list_head link; + u64 first_active_start_time; + u64 last_active_start_time; + u64 last_active_end_time; + u64 total_active; + u64 prev_wp_active_end_time; + unsigned int aid; + unsigned int kctx_count; + u8 active_cnt; + u8 flags; +}; +#endif + /** * struct kbase_io_access - holds information about 1 register access * @@ -269,12 +318,25 @@ struct kbase_fault { bool protected_mode; }; +/** Maximum number of memory pages that should be allocated for the array + * of pointers to free PGDs. + * + * This number has been pre-calculated to deal with the maximum allocation + * size expressed by the default value of KBASE_MEM_ALLOC_MAX_SIZE. + * This is supposed to be enough for almost the entirety of MMU operations. + * Any size greater than KBASE_MEM_ALLOC_MAX_SIZE requires being broken down + * into multiple iterations, each dealing with at most KBASE_MEM_ALLOC_MAX_SIZE + * bytes. + * + * Please update this value if KBASE_MEM_ALLOC_MAX_SIZE changes. + */ +#define MAX_PAGES_FOR_FREE_PGDS ((size_t)9) + +/* Maximum number of pointers to free PGDs */ +#define MAX_FREE_PGDS ((PAGE_SIZE / sizeof(struct page *)) * MAX_PAGES_FOR_FREE_PGDS) + /** * struct kbase_mmu_table - object representing a set of GPU page tables - * @mmu_teardown_pages: Array containing pointers to 3 separate pages, used - * to cache the entries of top (L0) & intermediate level - * page tables (L1 & L2) to avoid repeated calls to - * kmap_atomic() during the MMU teardown. * @mmu_lock: Lock to serialize the accesses made to multi level GPU * page tables * @pgd: Physical address of the page allocated for the top @@ -286,29 +348,106 @@ struct kbase_fault { * Valid range is 0..(MEMORY_GROUP_MANAGER_NR_GROUPS-1). * @kctx: If this set of MMU tables belongs to a context then * this is a back-reference to the context, otherwise - * it is NULL + * it is NULL. + * @scratch_mem: Scratch memory used for MMU operations, which are + * serialized by the @mmu_lock. 
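/*
 * Worked sizing example for the scratch memory documented above, assuming
 * 4 KiB pages and 8-byte pointers: PAGE_SIZE / sizeof(struct page *) = 512,
 * so MAX_FREE_PGDS = 512 * MAX_PAGES_FOR_FREE_PGDS (9) = 4608 PGD pointers,
 * i.e. 9 pages (36 KiB) for free_pgds. teardown_pages needs one PAGE_SIZE
 * buffer per copied level (MIDGARD_MMU_BOTTOMLEVEL of them), matching the
 * three pages the old mmu_teardown_pages array cached for L0-L2. As the two
 * members share a union, each kbase_mmu_table carries the larger size.
 */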
*/ struct kbase_mmu_table { - u64 *mmu_teardown_pages[MIDGARD_MMU_BOTTOMLEVEL]; struct rt_mutex mmu_lock; phys_addr_t pgd; u8 group_id; struct kbase_context *kctx; + union { + /** + * @teardown_pages: Scratch memory used for backup copies of whole + * PGD pages when tearing down levels upon + * termination of the MMU table. + */ + struct { + /** + * @levels: Array of PGD pages, large enough to copy one PGD + * for each level of the MMU table. + */ + u64 levels[MIDGARD_MMU_BOTTOMLEVEL][PAGE_SIZE / sizeof(u64)]; + } teardown_pages; + /** + * @free_pgds: Scratch memory used for insertion, update and teardown + * operations to store a temporary list of PGDs to be freed + * at the end of the operation. + */ + struct { + /** @pgds: Array of pointers to PGDs to free. */ + struct page *pgds[MAX_FREE_PGDS]; + /** @head_index: Index of first free element in the PGDs array. */ + size_t head_index; + } free_pgds; + } scratch_mem; +}; + +/** + * enum kbase_memory_zone - Kbase memory zone identifier + * @SAME_VA_ZONE: Memory zone for allocations where the GPU and CPU VA coincide. + * @CUSTOM_VA_ZONE: When operating in compatibility mode, this zone is used to + * allow 32-bit userspace (either on a 32-bit device or a + * 32-bit application on a 64-bit device) to address the entirety + * of the GPU address space. The @CUSTOM_VA_ZONE is also used + * for JIT allocations: on 64-bit systems, the zone is created + * by reducing the size of the SAME_VA zone by a user-controlled + * amount, whereas on 32-bit systems, it is created as part of + * the existing CUSTOM_VA_ZONE + * @EXEC_VA_ZONE: Memory zone used to track GPU-executable memory. The start + * and end of this zone depend on the individual platform, + * and it is initialized upon user process request. + * @EXEC_FIXED_VA_ZONE: Memory zone used to contain GPU-executable memory + * that also permits FIXED/FIXABLE allocations. + * @FIXED_VA_ZONE: Memory zone used to allocate memory at userspace-supplied + * addresses. + * @MCU_SHARED_ZONE: Memory zone created for mappings shared between the MCU + * and Kbase. Currently this is the only zone type that is + * created on a per-device, rather than a per-context + * basis. + * @MEMORY_ZONE_MAX: Sentinel value used for iterating over all the memory zone + * identifiers. + * @CONTEXT_ZONE_MAX: Sentinel value used to keep track of the last per-context + * zone for iteration. + */ +enum kbase_memory_zone { + SAME_VA_ZONE, + CUSTOM_VA_ZONE, + EXEC_VA_ZONE, +#if IS_ENABLED(MALI_USE_CSF) + EXEC_FIXED_VA_ZONE, + FIXED_VA_ZONE, + MCU_SHARED_ZONE, +#endif + MEMORY_ZONE_MAX, +#if IS_ENABLED(MALI_USE_CSF) + CONTEXT_ZONE_MAX = FIXED_VA_ZONE + 1 +#else + CONTEXT_ZONE_MAX = EXEC_VA_ZONE + 1 +#endif }; /** - * struct kbase_reg_zone - Information about GPU memory region zones + * struct kbase_reg_zone - GPU memory zone information and region tracking + * @reg_rbtree: RB tree used to track kbase memory regions. * @base_pfn: Page Frame Number in GPU virtual address space for the start of * the Zone * @va_size_pages: Size of the Zone in pages + * @id: Memory zone identifier + * @cache: Pointer to a per-device slab allocator to allow for quickly allocating + * new regions * * Track information about a zone KBASE_REG_ZONE() and related macros. * In future, this could also store the &rb_root that are currently in * &kbase_context and &kbase_csf_device. 
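/*
 * Illustrative sketch (not taken from the driver) of the iteration pattern the
 * per-context zone array enables, mirroring the mem_zones/mem_allocs debugfs
 * code above: the old reg_rbtree_same/_custom/_exec/... fields are replaced by
 * indexing kctx->reg_zone[] with the enum values up to CONTEXT_ZONE_MAX.
 */
static void example_log_zone_layout(struct kbase_context *kctx)
{
	enum kbase_memory_zone idx;

	kbase_gpu_vm_lock(kctx);
	for (idx = 0; idx < CONTEXT_ZONE_MAX; idx++) {
		struct kbase_reg_zone *zone = &kctx->reg_zone[idx];

		if (!zone->base_pfn)
			continue;

		dev_info(kctx->kbdev->dev, "%s: base_pfn=0x%llx, size=%llu pages\n",
			 kbase_reg_zone_get_name(idx), zone->base_pfn, zone->va_size_pages);
	}
	kbase_gpu_vm_unlock(kctx);
}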
*/ struct kbase_reg_zone { + struct rb_root reg_rbtree; u64 base_pfn; u64 va_size_pages; + enum kbase_memory_zone id; + struct kmem_cache *cache; }; #if MALI_USE_CSF @@ -317,6 +456,8 @@ struct kbase_reg_zone { #include "jm/mali_kbase_jm_defs.h" #endif +#include "mali_kbase_hwaccess_time.h" + static inline int kbase_as_has_bus_fault(struct kbase_as *as, struct kbase_fault *fault) { @@ -403,7 +544,15 @@ struct kbase_clk_rate_trace_manager { * Note that some code paths keep shaders/the tiler * powered whilst this is 0. * Use kbase_pm_is_active() instead to check for such cases. - * @suspending: Flag indicating suspending/suspended + * @suspending: Flag set to true when System suspend of GPU device begins and + * set to false only when System resume of GPU device starts. + * So GPU device could be in suspended state while the flag is set. + * The flag is updated with @lock held. + * @resuming: Flag set to true when System resume of GPU device starts and is set + * to false when resume ends. The flag is set to true at the same time + * when @suspending is set to false with @lock held. + * The flag is currently used only to prevent Kbase context termination + * during System resume of GPU device. * @runtime_active: Flag to track if the GPU is in runtime suspended or active * state. This ensures that runtime_put and runtime_get * functions are called in pairs. For example if runtime_get @@ -414,7 +563,7 @@ struct kbase_clk_rate_trace_manager { * This structure contains data for the power management framework. * There is one instance of this structure per device in the system. * @zero_active_count_wait: Wait queue set when active_count == 0 - * @resume_wait: system resume of GPU device. + * @resume_wait: Wait queue to wait for the System suspend/resume of GPU device. * @debug_core_mask: Bit masks identifying the available shader cores that are * specified via sysfs. One mask per job slot. * @debug_core_mask_all: Bit masks identifying the available shader cores that @@ -432,9 +581,10 @@ struct kbase_clk_rate_trace_manager { * @clk_rtm: The state of the GPU clock rate trace manager */ struct kbase_pm_device_data { - struct mutex lock; + struct rt_mutex lock; int active_count; bool suspending; + bool resuming; #if MALI_USE_CSF bool runtime_active; #endif @@ -465,36 +615,40 @@ struct kbase_pm_device_data { /** * struct kbase_mem_pool - Page based memory pool for kctx/kbdev - * @kbdev: Kbase device where memory is used - * @cur_size: Number of free pages currently in the pool (may exceed - * @max_size in some corner cases) - * @max_size: Maximum number of free pages in the pool - * @order: order = 0 refers to a pool of 4 KB pages - * order = 9 refers to a pool of 2 MB pages (2^9 * 4KB = 2 MB) - * @group_id: A memory group ID to be passed to a platform-specific - * memory group manager, if present. Immutable. - * Valid range is 0..(MEMORY_GROUP_MANAGER_NR_GROUPS-1). - * @pool_lock: Lock protecting the pool - must be held when modifying - * @cur_size and @page_list - * @page_list: List of free pages in the pool - * @reclaim: Shrinker for kernel reclaim of free pages - * @next_pool: Pointer to next pool where pages can be allocated when this - * pool is empty. Pages will spill over to the next pool when - * this pool is full. Can be NULL if there is no next pool. 
- * @dying: true if the pool is being terminated, and any ongoing - * operations should be abandoned - * @dont_reclaim: true if the shrinker is forbidden from reclaiming memory from - * this pool, eg during a grow operation + * @kbdev: Kbase device where memory is used + * @cur_size: Number of free pages currently in the pool (may exceed + * @max_size in some corner cases) + * @max_size: Maximum number of free pages in the pool + * @order: order = 0 refers to a pool of 4 KB pages + * order = 9 refers to a pool of 2 MB pages (2^9 * 4KB = 2 MB) + * @group_id: A memory group ID to be passed to a platform-specific + * memory group manager, if present. Immutable. + * Valid range is 0..(MEMORY_GROUP_MANAGER_NR_GROUPS-1). + * @pool_lock: Lock protecting the pool - must be held when modifying + * @cur_size and @page_list + * @page_list: List of free pages in the pool + * @reclaim: Shrinker for kernel reclaim of free pages + * @isolation_in_progress_cnt: Number of pages in pool undergoing page isolation. + * This is used to avoid race condition between pool termination + * and page isolation for page migration. + * @next_pool: Pointer to next pool where pages can be allocated when this + * pool is empty. Pages will spill over to the next pool when + * this pool is full. Can be NULL if there is no next pool. + * @dying: true if the pool is being terminated, and any ongoing + * operations should be abandoned + * @dont_reclaim: true if the shrinker is forbidden from reclaiming memory from + * this pool, eg during a grow operation */ struct kbase_mem_pool { struct kbase_device *kbdev; - size_t cur_size; - size_t max_size; - u8 order; - u8 group_id; - spinlock_t pool_lock; - struct list_head page_list; - struct shrinker reclaim; + size_t cur_size; + size_t max_size; + u8 order; + u8 group_id; + spinlock_t pool_lock; + struct list_head page_list; + struct shrinker reclaim; + atomic_t isolation_in_progress_cnt; struct kbase_mem_pool *next_pool; @@ -581,7 +735,7 @@ struct kbase_devfreq_opp { * @entry_set_pte: program the pte to be a valid entry to encode the physical * address of the next lower level page table and also update * the number of valid entries. - * @entry_invalidate: clear out or invalidate the pte. + * @entries_invalidate: clear out or invalidate a range of ptes. * @get_num_valid_entries: returns the number of valid entries for a specific pgd. * @set_num_valid_entries: sets the number of valid entries for a specific pgd * @flags: bitmask of MMU mode flags. Refer to KBASE_MMU_MODE_ constants. @@ -598,8 +752,8 @@ struct kbase_mmu_mode { int (*pte_is_valid)(u64 pte, int level); void (*entry_set_ate)(u64 *entry, struct tagged_addr phy, unsigned long flags, int level); - void (*entry_set_pte)(u64 *pgd, u64 vpfn, phys_addr_t phy); - void (*entry_invalidate)(u64 *entry); + void (*entry_set_pte)(u64 *entry, phys_addr_t phy); + void (*entries_invalidate)(u64 *entry, u32 count); unsigned int (*get_num_valid_entries)(u64 *pgd); void (*set_num_valid_entries)(u64 *pgd, unsigned int num_of_valid_entries); @@ -675,6 +829,33 @@ struct kbase_process { }; /** + * struct kbase_mem_migrate - Object representing an instance for managing + * page migration. + * + * @free_pages_list: List of deferred pages to free. Mostly used when page migration + * is enabled. Pages in memory pool that require migrating + * will be freed instead. However page cannot be freed + * right away as Linux will need to release the page lock. + * Therefore page will be added to this list and freed later. 
+ * @free_pages_lock: This lock should be held when adding or removing pages + * from @free_pages_list. + * @free_pages_workq: Work queue to process the work items queued to free + * pages in @free_pages_list. + * @free_pages_work: Work item to free pages in @free_pages_list. + * @inode: Pointer to inode whose address space operations are used + * for page migration purposes. + */ +struct kbase_mem_migrate { + struct list_head free_pages_list; + spinlock_t free_pages_lock; + struct workqueue_struct *free_pages_workq; + struct work_struct free_pages_work; +#if (KERNEL_VERSION(6, 0, 0) > LINUX_VERSION_CODE) + struct inode *inode; +#endif +}; + +/** * struct kbase_device - Object representing an instance of GPU platform device, * allocated from the probe method of mali driver. * @hw_quirks_sc: Configuration to be used for the shader cores as per @@ -712,6 +893,10 @@ struct kbase_process { * @opp_token: Token linked to the device OPP structure maintaining the * link to OPPs attached to a device. This is obtained * after setting regulator names for the device. + * @token: Integer replacement for opp_table in kernel versions + * 6 and greater. Value is a token id number when 0 or greater, + * and a linux errno when negative. Must be initialised + * to an non-zero value as 0 is valid token id. * @devname: string containing the name used for GPU device instance, * miscellaneous device is registered using the same name. * @id: Unique identifier for the device, indicates the number of @@ -752,12 +937,18 @@ struct kbase_process { * to the GPU device. This points to an internal memory * group manager if no platform-specific memory group * manager was retrieved through device tree. + * @mmu_unresponsive: Flag to indicate MMU is not responding. + * Set if a MMU command isn't completed within + * &kbase_device:mmu_or_gpu_cache_op_wait_time_ms. + * Clear by kbase_ctx_sched_restore_all_as() after GPU reset completes. * @as: Array of objects representing address spaces of GPU. - * @as_free: Bitpattern of free/available GPU address spaces. * @as_to_kctx: Array of pointers to struct kbase_context, having * GPU adrress spaces assigned to them. + * @as_free: Bitpattern of free/available GPU address spaces. * @mmu_mask_change: Lock to serialize the access to MMU interrupt mask * register used in the handling of Bus & Page faults. + * @pagesize_2mb: Boolean to determine whether 2MiB page sizes are + * supported and used where possible. * @gpu_props: Object containing complete information about the * configuration/properties of GPU HW device in use. * @hw_issues_mask: List of SW workarounds for HW issues @@ -803,6 +994,7 @@ struct kbase_process { * GPU reset. * @lowest_gpu_freq_khz: Lowest frequency in KHz that the GPU can run at. Used * to calculate suitable timeouts for wait operations. + * @backend_time: Kbase backend time related attributes. * @cache_clean_in_progress: Set when a cache clean has been started, and * cleared when it has finished. This prevents multiple * cache cleans being done simultaneously. @@ -909,6 +1101,10 @@ struct kbase_process { * GPU2019-3878. PM state machine is invoked after * clearing this flag and @hwaccess_lock is used to * serialize the access. + * @mmu_page_migrate_in_progress: Set before starting a MMU page migration transaction + * and cleared after the transaction completes. PM L2 state is + * prevented from entering powering up/down transitions when the + * flag is set, @hwaccess_lock is used to serialize the access. 
* @poweroff_pending: Set when power off operation for GPU is started, reset when * power on for GPU is started. * @infinite_cache_active_default: Set to enable using infinite cache for all the @@ -978,11 +1174,8 @@ struct kbase_process { * @total_gpu_pages for both native and dma-buf imported * allocations. * @job_done_worker: Worker for job_done work. - * @job_done_worker_thread: Thread for job_done work. * @event_worker: Worker for event work. - * @event_worker_thread: Thread for event work. * @apc.worker: Worker for async power control work. - * @apc.thread: Thread for async power control work. * @apc.power_on_work: Work struct for powering on the GPU. * @apc.power_off_work: Work struct for powering off the GPU. * @apc.end_ts: The latest end timestamp to power off the GPU. @@ -1002,6 +1195,16 @@ struct kbase_process { * @oom_notifier_block: notifier_block containing kernel-registered out-of- * memory handler. * @proc_sysfs_node: Sysfs directory node to store per-process stats. + * @mem_migrate: Per device object for managing page migration. + * @live_fence_metadata: Count of live fence metadata structures created by + * KCPU queue. These structures may outlive kbase module + * itself. Therefore, in such a case, a warning should be + * be produced. + * @mmu_or_gpu_cache_op_wait_time_ms: Maximum waiting time in ms for the completion of + * a cache operation via MMU_AS_CONTROL or GPU_CONTROL. + * @va_region_slab: kmem_cache (slab) for allocated kbase_va_region structures. + * @fence_signal_timeout_enabled: Global flag for whether fence signal timeout tracking + * is enabled. */ struct kbase_device { u32 hw_quirks_sc; @@ -1026,12 +1229,16 @@ struct kbase_device { #if IS_ENABLED(CONFIG_REGULATOR) struct regulator *regulators[BASE_MAX_NR_CLOCKS_REGULATORS]; unsigned int nr_regulators; - int opp_token; +#if (KERNEL_VERSION(6, 0, 0) <= LINUX_VERSION_CODE) + int token; +#elif (KERNEL_VERSION(4, 10, 0) <= LINUX_VERSION_CODE) + struct opp_table *opp_table; +#endif /* (KERNEL_VERSION(6, 0, 0) <= LINUX_VERSION_CODE) */ #endif /* CONFIG_REGULATOR */ char devname[DEVNAME_SIZE]; u32 id; -#if IS_ENABLED(CONFIG_MALI_NO_MALI) +#if !IS_ENABLED(CONFIG_MALI_REAL_HW) void *model; struct kmem_cache *irq_slab; struct workqueue_struct *irq_workq; @@ -1039,7 +1246,7 @@ struct kbase_device { atomic_t serving_gpu_irq; atomic_t serving_mmu_irq; spinlock_t reg_op_lock; -#endif /* CONFIG_MALI_NO_MALI */ +#endif /* !IS_ENABLED(CONFIG_MALI_REAL_HW) */ struct kbase_pm_device_data pm; struct kbase_mem_pool_group mem_pools; @@ -1048,12 +1255,15 @@ struct kbase_device { struct memory_group_manager_device *mgm_dev; + bool mmu_unresponsive; struct kbase_as as[BASE_MAX_NR_AS]; - u16 as_free; struct kbase_context *as_to_kctx[BASE_MAX_NR_AS]; + u16 as_free; spinlock_t mmu_mask_change; + bool pagesize_2mb; + struct kbase_gpu_props gpu_props; unsigned long hw_issues_mask[(BASE_HW_ISSUE_END + BITS_PER_LONG - 1) / BITS_PER_LONG]; @@ -1067,6 +1277,12 @@ struct kbase_device { s8 nr_hw_address_spaces; s8 nr_user_address_spaces; + /** + * @pbha_propagate_bits: Record of Page-Based Hardware Attribute Propagate bits to + * restore to L2_CONFIG upon GPU reset. 
+ */ + u8 pbha_propagate_bits; + #if MALI_USE_CSF struct kbase_hwcnt_backend_csf_if hwcnt_backend_csf_if_fw; #else @@ -1101,6 +1317,8 @@ struct kbase_device { u64 lowest_gpu_freq_khz; + struct kbase_backend_time backend_time; + bool cache_clean_in_progress; u32 cache_clean_queued; wait_queue_head_t cache_clean_wait; @@ -1148,7 +1366,9 @@ struct kbase_device { #endif /* CONFIG_MALI_DEVFREQ */ unsigned long previous_frequency; +#if !MALI_USE_CSF atomic_t job_fault_debug; +#endif /* !MALI_USE_CSF */ #if IS_ENABLED(CONFIG_DEBUG_FS) struct dentry *mali_debugfs_directory; @@ -1159,11 +1379,13 @@ struct kbase_device { u64 debugfs_as_read_bitmap; #endif /* CONFIG_MALI_DEBUG */ +#if !MALI_USE_CSF wait_queue_head_t job_fault_wq; wait_queue_head_t job_fault_resume_wq; struct workqueue_struct *job_fault_resume_workq; struct list_head job_fault_event_list; spinlock_t job_fault_event_lock; +#endif /* !MALI_USE_CSF */ #if !MALI_CUSTOMER_RELEASE struct { @@ -1185,13 +1407,11 @@ struct kbase_device { #if MALI_USE_CSF bool mmu_hw_operation_in_progress; #endif + bool mmu_page_migrate_in_progress; bool poweroff_pending; -#if (KERNEL_VERSION(4, 4, 0) <= LINUX_VERSION_CODE) bool infinite_cache_active_default; -#else - u32 infinite_cache_active_default; -#endif + struct kbase_mem_pool_group_config mem_pool_defaults; u32 current_gpu_coherency_mode; @@ -1240,9 +1460,7 @@ struct kbase_device { struct kbasep_js_device_data js_data; struct kthread_worker job_done_worker; - struct task_struct *job_done_worker_thread; struct kthread_worker event_worker; - struct task_struct *event_worker_thread; /* See KBASE_JS_*_PRIORITY_MODE for details. */ u32 js_ctx_scheduling_mode; @@ -1258,7 +1476,6 @@ struct kbase_device { struct { struct kthread_worker worker; - struct task_struct *thread; struct kthread_work power_on_work; struct kthread_work power_off_work; ktime_t end_ts; @@ -1292,6 +1509,24 @@ struct kbase_device { struct notifier_block oom_notifier_block; struct kobject *proc_sysfs_node; + + struct kbase_mem_migrate mem_migrate; + +#if MALI_USE_CSF && IS_ENABLED(CONFIG_SYNC_FILE) + atomic_t live_fence_metadata; +#endif + u32 mmu_or_gpu_cache_op_wait_time_ms; + struct kmem_cache *va_region_slab; + +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) + /** + * @gpu_metrics: GPU device wide structure used for emitting GPU metrics tracepoints. + */ + struct kbase_gpu_metrics gpu_metrics; +#endif +#if MALI_USE_CSF + atomic_t fence_signal_timeout_enabled; +#endif }; /** @@ -1308,6 +1543,9 @@ struct kbase_device { * @KBASE_FILE_COMPLETE: Indicates if the setup for context has * completed, i.e. flags have been set for the * context. + * @KBASE_FILE_DESTROY_CTX: Indicates that destroying of context has begun or + * is complete. This state can only be reached after + * @KBASE_FILE_COMPLETE. * * The driver allows only limited interaction with user-space until setup * is complete. @@ -1317,7 +1555,8 @@ enum kbase_file_state { KBASE_FILE_VSN_IN_PROGRESS, KBASE_FILE_NEED_CTX, KBASE_FILE_CTX_IN_PROGRESS, - KBASE_FILE_COMPLETE + KBASE_FILE_COMPLETE, + KBASE_FILE_DESTROY_CTX }; /** @@ -1327,6 +1566,12 @@ enum kbase_file_state { * allocated from the probe method of the Mali driver. * @filp: Pointer to the struct file corresponding to device file * /dev/malixx instance, passed to the file's open method. + * @owner: Pointer to the file table structure of a process that + * created the instance of /dev/malixx device file. Set to + * NULL when that process closes the file instance. 
No more + * file operations would be allowed once set to NULL. + * It would be updated only in the Userspace context, i.e. + * when @kbase_open or @kbase_flush is called. * @kctx: Object representing an entity, among which GPU is * scheduled and which gets its own GPU address space. * Invalid until @setup_state is KBASE_FILE_COMPLETE. @@ -1335,13 +1580,44 @@ enum kbase_file_state { * @setup_state is KBASE_FILE_NEED_CTX. * @setup_state: Initialization state of the file. Values come from * the kbase_file_state enumeration. + * @destroy_kctx_work: Work item for destroying the @kctx, enqueued only when + * @fops_count and @map_count becomes zero after /dev/malixx + * file was previously closed by the @owner. + * @lock: Lock to serialize the access to members like @owner, @fops_count, + * @map_count. + * @fops_count: Counter that is incremented at the beginning of a method + * defined for @kbase_fops and is decremented at the end. + * So the counter keeps a track of the file operations in progress + * for /dev/malixx file, that are being handled by the Kbase. + * The counter is needed to defer the context termination as + * Userspace can close the /dev/malixx file and flush() method + * can get called when some other file operation is in progress. + * @map_count: Counter to keep a track of the memory mappings present on + * /dev/malixx file instance. The counter is needed to defer the + * context termination as Userspace can close the /dev/malixx + * file and flush() method can get called when mappings are still + * present. + * @zero_fops_count_wait: Waitqueue used to wait for the @fops_count to become 0. + * Currently needed only for the "mem_view" debugfs file. + * @event_queue: Wait queue used for blocking the thread, which consumes + * the base_jd_event corresponding to an atom, when there + * are no more posted events. */ struct kbase_file { struct kbase_device *kbdev; struct file *filp; + fl_owner_t owner; struct kbase_context *kctx; unsigned long api_version; atomic_t setup_state; + struct work_struct destroy_kctx_work; + spinlock_t lock; + int fops_count; + int map_count; +#if IS_ENABLED(CONFIG_DEBUG_FS) + wait_queue_head_t zero_fops_count_wait; +#endif + wait_queue_head_t event_queue; }; #if MALI_JIT_PRESSURE_LIMIT_BASE /** @@ -1374,10 +1650,6 @@ struct kbase_file { * * @KCTX_DYING: Set when the context process is in the process of being evicted. * - * @KCTX_NO_IMPLICIT_SYNC: Set when explicit Android fences are in use on this - * context, to disable use of implicit dma-buf fences. This is used to avoid - * potential synchronization deadlocks. - * * @KCTX_FORCE_SAME_VA: Set when BASE_MEM_SAME_VA should be forced on memory * allocations. For 64-bit clients it is enabled by default, and disabled by * default on 32-bit clients. Being able to clear this flag is only used for @@ -1420,7 +1692,6 @@ enum kbase_context_flags { KCTX_PRIVILEGED = 1U << 7, KCTX_SCHEDULED = 1U << 8, KCTX_DYING = 1U << 9, - KCTX_NO_IMPLICIT_SYNC = 1U << 10, KCTX_FORCE_SAME_VA = 1U << 11, KCTX_PULLED_SINCE_ACTIVE_JS0 = 1U << 12, KCTX_PULLED_SINCE_ACTIVE_JS1 = 1U << 13, @@ -1459,9 +1730,6 @@ enum kbase_context_flags { * * @KCTX_DYING: Set when the context process is in the process of being evicted. * - * @KCTX_NO_IMPLICIT_SYNC: Set when explicit Android fences are in use on this - * context, to disable use of implicit dma-buf fences. This is used to avoid - * potential synchronization deadlocks. * * @KCTX_FORCE_SAME_VA: Set when BASE_MEM_SAME_VA should be forced on memory * allocations. 
For 64-bit clients it is enabled by default, and disabled by @@ -1502,7 +1770,6 @@ enum kbase_context_flags { KCTX_PRIVILEGED = 1U << 7, KCTX_SCHEDULED = 1U << 8, KCTX_DYING = 1U << 9, - KCTX_NO_IMPLICIT_SYNC = 1U << 10, KCTX_FORCE_SAME_VA = 1U << 11, KCTX_PULLED_SINCE_ACTIVE_JS0 = 1U << 12, KCTX_PULLED_SINCE_ACTIVE_JS1 = 1U << 13, @@ -1520,8 +1787,8 @@ struct kbase_sub_alloc { /** * struct kbase_context - Kernel base context * - * @filp: Pointer to the struct file corresponding to device file - * /dev/malixx instance, passed to the file's open method. + * @kfile: Pointer to the object representing the /dev/malixx device + * file instance. * @kbdev: Pointer to the Kbase device for which the context is created. * @kctx_list_link: Node into Kbase device list of contexts. * @mmu: Structure holding details of the MMU tables for this @@ -1556,22 +1823,6 @@ struct kbase_sub_alloc { * for the allocations >= 2 MB in size. * @reg_lock: Lock used for GPU virtual address space management operations, * like adding/freeing a memory region in the address space. - * Can be converted to a rwlock ?. - * @reg_rbtree_same: RB tree of the memory regions allocated from the SAME_VA - * zone of the GPU virtual address space. Used for allocations - * having the same value for GPU & CPU virtual address. - * @reg_rbtree_custom: RB tree of the memory regions allocated from the CUSTOM_VA - * zone of the GPU virtual address space. - * @reg_rbtree_exec: RB tree of the memory regions allocated from the EXEC_VA - * zone of the GPU virtual address space. Used for GPU-executable - * allocations which don't need the SAME_VA property. - * @reg_rbtree_exec_fixed: RB tree of the memory regions allocated from the - * EXEC_FIXED_VA zone of the GPU virtual address space. Used for - * GPU-executable allocations with FIXED/FIXABLE GPU virtual - * addresses. - * @reg_rbtree_fixed: RB tree of the memory regions allocated from the FIXED_VA zone - * of the GPU virtual address space. Used for allocations with - * FIXED/FIXABLE GPU virtual addresses. * @num_fixable_allocs: A count for the number of memory allocations with the * BASE_MEM_FIXABLE property. * @num_fixed_allocs: A count for the number of memory allocations with the @@ -1588,9 +1839,6 @@ struct kbase_sub_alloc { * used in conjunction with @cookies bitmask mainly for * providing a mechansim to have the same value for CPU & * GPU virtual address. - * @event_queue: Wait queue used for blocking the thread, which consumes - * the base_jd_event corresponding to an atom, when there - * are no more posted events. * @tgid: Thread group ID of the process whose thread created * the context (by calling KBASE_IOCTL_VERSION_CHECK or * KBASE_IOCTL_SET_FLAGS, depending on the @api_version). @@ -1652,11 +1900,13 @@ struct kbase_sub_alloc { * is scheduled in and an atom is pulled from the context's per * slot runnable tree in JM GPU or GPU command queue * group is programmed on CSG slot in CSF GPU. - * @mm_update_lock: lock used for handling of special tracking page. * @process_mm: Pointer to the memory descriptor of the process which * created the context. Used for accounting the physical * pages used for GPU allocations, done for the context, - * to the memory consumed by the process. + * to the memory consumed by the process. A reference is taken + * on this descriptor for the Userspace created contexts so that + * Kbase can safely access it to update the memory usage counters. + * The reference is dropped on context termination. 
* @gpu_va_end: End address of the GPU va space (in 4KB page units) * @running_total_tiler_heap_nr_chunks: Running total of number of chunks in all * tiler heaps of the kbase context. @@ -1707,12 +1957,6 @@ struct kbase_sub_alloc { * memory allocations. * @jit_current_allocations_per_bin: Current number of in-flight just-in-time * memory allocations per bin. - * @jit_version: Version number indicating whether userspace is using - * old or new version of interface for just-in-time - * memory allocations. - * 1 -> client used KBASE_IOCTL_MEM_JIT_INIT_10_2 - * 2 -> client used KBASE_IOCTL_MEM_JIT_INIT_11_5 - * 3 -> client used KBASE_IOCTL_MEM_JIT_INIT * @jit_group_id: A memory group ID to be passed to a platform-specific * memory group manager. * Valid range is 0..(MEMORY_GROUP_MANAGER_NR_GROUPS-1). @@ -1784,6 +2028,11 @@ struct kbase_sub_alloc { * @limited_core_mask: The mask that is applied to the affinity in case of atoms * marked with BASE_JD_REQ_LIMITED_CORE_MASK. * @platform_data: Pointer to platform specific per-context data. + * @task: Pointer to the task structure of the main thread of the process + * that created the Kbase context. It would be set only for the + * contexts created by the Userspace and not for the contexts + * created internally by the Kbase. + * @comm: Record the process name * * A kernel base context is an entity among which the GPU is scheduled. * Each context has its own GPU address space. @@ -1792,7 +2041,7 @@ struct kbase_sub_alloc { * is made on the device file. */ struct kbase_context { - struct file *filp; + struct kbase_file *kfile; struct kbase_device *kbdev; struct list_head kctx_list_link; struct kbase_mmu_table mmu; @@ -1817,17 +2066,11 @@ struct kbase_context { struct list_head mem_partials; struct mutex reg_lock; - - struct rb_root reg_rbtree_same; - struct rb_root reg_rbtree_custom; - struct rb_root reg_rbtree_exec; #if MALI_USE_CSF - struct rb_root reg_rbtree_exec_fixed; - struct rb_root reg_rbtree_fixed; atomic64_t num_fixable_allocs; atomic64_t num_fixed_allocs; #endif - struct kbase_reg_zone reg_zone[KBASE_REG_ZONE_MAX]; + struct kbase_reg_zone reg_zone[CONTEXT_ZONE_MAX]; #if MALI_USE_CSF struct kbase_csf_context csf; @@ -1851,7 +2094,6 @@ struct kbase_context { DECLARE_BITMAP(cookies, BITS_PER_LONG); struct kbase_va_region *pending_regions[BITS_PER_LONG]; - wait_queue_head_t event_queue; pid_t tgid; pid_t pid; atomic_t used_pages; @@ -1866,19 +2108,12 @@ struct kbase_context { struct list_head waiting_soft_jobs; spinlock_t waiting_soft_jobs_lock; -#ifdef CONFIG_MALI_DMA_FENCE - struct { - struct list_head waiting_resource; - struct workqueue_struct *wq; - } dma_fence; -#endif /* CONFIG_MALI_DMA_FENCE */ int as_nr; atomic_t refcount; - spinlock_t mm_update_lock; - struct mm_struct __rcu *process_mm; + struct mm_struct *process_mm; u64 gpu_va_end; #if MALI_USE_CSF u32 running_total_tiler_heap_nr_chunks; @@ -1903,7 +2138,6 @@ struct kbase_context { u8 jit_max_allocations; u8 jit_current_allocations; u8 jit_current_allocations_per_bin[256]; - u8 jit_version; u8 jit_group_id; #if MALI_JIT_PRESSURE_LIMIT_BASE u64 jit_phys_pages_limit; @@ -1939,9 +2173,19 @@ struct kbase_context { u64 limited_core_mask; -#if !MALI_USE_CSF void *platform_data; + + struct task_struct *task; + +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) + /** + * @gpu_metrics_ctx: Pointer to the GPU metrics context corresponding to the + * application that created the Kbase context. 
+ */ + struct kbase_gpu_metrics_ctx *gpu_metrics_ctx; #endif + + char comm[TASK_COMM_LEN]; }; #ifdef CONFIG_MALI_CINSTR_GWT @@ -1970,17 +2214,15 @@ struct kbasep_gwt_list_element { * to a @kbase_context. * @ext_res_node: List head for adding the metadata to a * @kbase_context. - * @alloc: The physical memory allocation structure - * which is mapped. - * @gpu_addr: The GPU virtual address the resource is - * mapped to. + * @reg: External resource information, containing + * the corresponding VA region * @ref: Reference count. * * External resources can be mapped into multiple contexts as well as the same * context multiple times. - * As kbase_va_region itself isn't refcounted we can't attach our extra - * information to it as it could be removed under our feet leaving external - * resources pinned. + * As kbase_va_region is refcounted, we guarantee that it will be available + * for the duration of the external resource, meaning it is sufficient to use + * it to rederive any additional data, like the GPU address. * This metadata structure binds a single external resource to a single * context, ensuring that per context mapping is tracked separately so it can * be overridden when needed and abuses by the application (freeing the resource @@ -1988,8 +2230,7 @@ struct kbasep_gwt_list_element { */ struct kbase_ctx_ext_res_meta { struct list_head ext_res_node; - struct kbase_mem_phy_alloc *alloc; - u64 gpu_addr; + struct kbase_va_region *reg; u32 ref; }; @@ -2044,6 +2285,7 @@ static inline u64 kbase_get_lock_region_min_size_log2(struct kbase_gpu_props con /* Maximum number of loops polling the GPU for a cache flush before we assume it must have completed */ #define KBASE_CLEAN_CACHE_MAX_LOOPS 100000 /* Maximum number of loops polling the GPU for an AS command to complete before we assume the GPU has hung */ -#define KBASE_AS_INACTIVE_MAX_LOOPS 100000000 - +#define KBASE_AS_INACTIVE_MAX_LOOPS 100000 +/* Maximum number of loops polling the GPU PRFCNT_ACTIVE bit before we assume the GPU has hung */ +#define KBASE_PRFCNT_ACTIVE_MAX_LOOPS 100000000 #endif /* _KBASE_DEFS_H_ */ diff --git a/mali_kbase/mali_kbase_dma_fence.c b/mali_kbase/mali_kbase_dma_fence.c deleted file mode 100644 index c4129ff..0000000 --- a/mali_kbase/mali_kbase_dma_fence.c +++ /dev/null @@ -1,491 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note -/* - * - * (C) COPYRIGHT 2011-2016, 2020-2021 ARM Limited. All rights reserved. - * - * This program is free software and is provided to you under the terms of the - * GNU General Public License version 2 as published by the Free Software - * Foundation, and any use by you of this program is subject to the terms - * of such GNU license. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with this program; if not, you can access it online at - * http://www.gnu.org/licenses/gpl-2.0.html. - * - */ - -/* Include mali_kbase_dma_fence.h before checking for CONFIG_MALI_DMA_FENCE as - * it will be set there. 
- */ -#include "mali_kbase_dma_fence.h" -#include <linux/atomic.h> -#include <linux/list.h> -#include <linux/lockdep.h> -#include <linux/mutex.h> -#include <linux/version.h> -#include <linux/slab.h> -#include <linux/spinlock.h> -#include <linux/workqueue.h> -#include <linux/ww_mutex.h> -#include <mali_kbase.h> - -static void -kbase_dma_fence_work(struct work_struct *pwork); - -static void -kbase_dma_fence_waiters_add(struct kbase_jd_atom *katom) -{ - struct kbase_context *kctx = katom->kctx; - - list_add_tail(&katom->queue, &kctx->dma_fence.waiting_resource); -} - -static void -kbase_dma_fence_waiters_remove(struct kbase_jd_atom *katom) -{ - list_del(&katom->queue); -} - -static int -kbase_dma_fence_lock_reservations(struct kbase_dma_fence_resv_info *info, - struct ww_acquire_ctx *ctx) -{ -#if (KERNEL_VERSION(5, 4, 0) > LINUX_VERSION_CODE) - struct reservation_object *content_res = NULL; -#else - struct dma_resv *content_res = NULL; -#endif - unsigned int content_res_idx = 0; - unsigned int r; - int err = 0; - - ww_acquire_init(ctx, &reservation_ww_class); - -retry: - for (r = 0; r < info->dma_fence_resv_count; r++) { - if (info->resv_objs[r] == content_res) { - content_res = NULL; - continue; - } - - err = ww_mutex_lock(&info->resv_objs[r]->lock, ctx); - if (err) - goto error; - } - - ww_acquire_done(ctx); - return err; - -error: - content_res_idx = r; - - /* Unlock the locked one ones */ - while (r--) - ww_mutex_unlock(&info->resv_objs[r]->lock); - - if (content_res) - ww_mutex_unlock(&content_res->lock); - - /* If we deadlock try with lock_slow and retry */ - if (err == -EDEADLK) { - content_res = info->resv_objs[content_res_idx]; - ww_mutex_lock_slow(&content_res->lock, ctx); - goto retry; - } - - /* If we are here the function failed */ - ww_acquire_fini(ctx); - return err; -} - -static void -kbase_dma_fence_unlock_reservations(struct kbase_dma_fence_resv_info *info, - struct ww_acquire_ctx *ctx) -{ - unsigned int r; - - for (r = 0; r < info->dma_fence_resv_count; r++) - ww_mutex_unlock(&info->resv_objs[r]->lock); - ww_acquire_fini(ctx); -} - - - -/** - * kbase_dma_fence_queue_work() - Queue work to handle @katom - * @katom: Pointer to atom for which to queue work - * - * Queue kbase_dma_fence_work() for @katom to clean up the fence callbacks and - * submit the atom. - */ -static void -kbase_dma_fence_queue_work(struct kbase_jd_atom *katom) -{ - struct kbase_context *kctx = katom->kctx; - bool ret; - - INIT_WORK(&katom->work, kbase_dma_fence_work); - ret = queue_work(kctx->dma_fence.wq, &katom->work); - /* Warn if work was already queued, that should not happen. */ - WARN_ON(!ret); -} - -/** - * kbase_dma_fence_cancel_atom() - Cancels waiting on an atom - * @katom: Katom to cancel - * - * Locking: katom->dma_fence.callbacks list assumes jctx.lock is held. - */ -static void -kbase_dma_fence_cancel_atom(struct kbase_jd_atom *katom) -{ - lockdep_assert_held(&katom->kctx->jctx.lock); - - /* Cancel callbacks and clean up. */ - kbase_fence_free_callbacks(katom); - - /* Mark the atom as handled in case all fences signaled just before - * canceling the callbacks and the worker was queued. - */ - kbase_fence_dep_count_set(katom, -1); - - /* Prevent job_done_nolock from being called twice on an atom when - * there is a race between job completion and cancellation. 
- */ - - if (katom->status == KBASE_JD_ATOM_STATE_QUEUED) { - /* Wait was cancelled - zap the atom */ - katom->event_code = BASE_JD_EVENT_JOB_CANCELLED; - if (jd_done_nolock(katom, true)) - kbase_js_sched_all(katom->kctx->kbdev); - } -} - -/** - * kbase_dma_fence_work() - Worker thread called when a fence is signaled - * @pwork: work_struct containing a pointer to a katom - * - * This function will clean and mark all dependencies as satisfied - */ -static void -kbase_dma_fence_work(struct work_struct *pwork) -{ - struct kbase_jd_atom *katom; - struct kbase_jd_context *ctx; - - katom = container_of(pwork, struct kbase_jd_atom, work); - ctx = &katom->kctx->jctx; - - mutex_lock(&ctx->lock); - if (kbase_fence_dep_count_read(katom) != 0) - goto out; - - kbase_fence_dep_count_set(katom, -1); - - /* Remove atom from list of dma-fence waiting atoms. */ - kbase_dma_fence_waiters_remove(katom); - /* Cleanup callbacks. */ - kbase_fence_free_callbacks(katom); - /* - * Queue atom on GPU, unless it has already completed due to a failing - * dependency. Run jd_done_nolock() on the katom if it is completed. - */ - if (unlikely(katom->status == KBASE_JD_ATOM_STATE_COMPLETED)) - jd_done_nolock(katom, true); - else - kbase_jd_dep_clear_locked(katom); - -out: - mutex_unlock(&ctx->lock); -} - -static void -#if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) -kbase_dma_fence_cb(struct fence *fence, struct fence_cb *cb) -#else -kbase_dma_fence_cb(struct dma_fence *fence, struct dma_fence_cb *cb) -#endif -{ - struct kbase_fence_cb *kcb = container_of(cb, - struct kbase_fence_cb, - fence_cb); - struct kbase_jd_atom *katom = kcb->katom; - - /* If the atom is zapped dep_count will be forced to a negative number - * preventing this callback from ever scheduling work. Which in turn - * would reschedule the atom. - */ - - if (kbase_fence_dep_count_dec_and_test(katom)) - kbase_dma_fence_queue_work(katom); -} - -#if (KERNEL_VERSION(5, 4, 0) > LINUX_VERSION_CODE) -static int -kbase_dma_fence_add_reservation_callback(struct kbase_jd_atom *katom, - struct reservation_object *resv, - bool exclusive) -#else -static int -kbase_dma_fence_add_reservation_callback(struct kbase_jd_atom *katom, - struct dma_resv *resv, - bool exclusive) -#endif -{ -#if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) - struct fence *excl_fence = NULL; - struct fence **shared_fences = NULL; -#else - struct dma_fence *excl_fence = NULL; - struct dma_fence **shared_fences = NULL; -#endif - unsigned int shared_count = 0; - int err, i; - -#if (KERNEL_VERSION(5, 4, 0) > LINUX_VERSION_CODE) - err = reservation_object_get_fences_rcu( -#elif (KERNEL_VERSION(5, 14, 0) > LINUX_VERSION_CODE) - err = dma_resv_get_fences_rcu( -#else - err = dma_resv_get_fences( -#endif - resv, - &excl_fence, - &shared_count, - &shared_fences); - if (err) - return err; - - if (excl_fence) { - err = kbase_fence_add_callback(katom, - excl_fence, - kbase_dma_fence_cb); - - /* Release our reference, taken by reservation_object_get_fences_rcu(), - * to the fence. We have set up our callback (if that was possible), - * and it's the fence's owner is responsible for singling the fence - * before allowing it to disappear. - */ - dma_fence_put(excl_fence); - - if (err) - goto out; - } - - if (exclusive) { - for (i = 0; i < shared_count; i++) { - err = kbase_fence_add_callback(katom, - shared_fences[i], - kbase_dma_fence_cb); - if (err) - goto out; - } - } - - /* Release all our references to the shared fences, taken by - * reservation_object_get_fences_rcu(). 
We have set up our callback (if - * that was possible), and it's the fence's owner is responsible for - * signaling the fence before allowing it to disappear. - */ -out: - for (i = 0; i < shared_count; i++) - dma_fence_put(shared_fences[i]); - kfree(shared_fences); - - if (err) { - /* - * On error, cancel and clean up all callbacks that was set up - * before the error. - */ - kbase_fence_free_callbacks(katom); - } - - return err; -} - -#if (KERNEL_VERSION(5, 4, 0) > LINUX_VERSION_CODE) -void kbase_dma_fence_add_reservation(struct reservation_object *resv, - struct kbase_dma_fence_resv_info *info, - bool exclusive) -#else -void kbase_dma_fence_add_reservation(struct dma_resv *resv, - struct kbase_dma_fence_resv_info *info, - bool exclusive) -#endif -{ - unsigned int i; - - for (i = 0; i < info->dma_fence_resv_count; i++) { - /* Duplicate resource, ignore */ - if (info->resv_objs[i] == resv) - return; - } - - info->resv_objs[info->dma_fence_resv_count] = resv; - if (exclusive) - set_bit(info->dma_fence_resv_count, - info->dma_fence_excl_bitmap); - (info->dma_fence_resv_count)++; -} - -int kbase_dma_fence_wait(struct kbase_jd_atom *katom, - struct kbase_dma_fence_resv_info *info) -{ - int err, i; -#if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) - struct fence *fence; -#else - struct dma_fence *fence; -#endif - struct ww_acquire_ctx ww_ctx; - - lockdep_assert_held(&katom->kctx->jctx.lock); - - fence = kbase_fence_out_new(katom); - if (!fence) { - err = -ENOMEM; - dev_err(katom->kctx->kbdev->dev, - "Error %d creating fence.\n", err); - return err; - } - - kbase_fence_dep_count_set(katom, 1); - - err = kbase_dma_fence_lock_reservations(info, &ww_ctx); - if (err) { - dev_err(katom->kctx->kbdev->dev, - "Error %d locking reservations.\n", err); - kbase_fence_dep_count_set(katom, -1); - kbase_fence_out_remove(katom); - return err; - } - - for (i = 0; i < info->dma_fence_resv_count; i++) { -#if (KERNEL_VERSION(5, 4, 0) > LINUX_VERSION_CODE) - struct reservation_object *obj = info->resv_objs[i]; -#else - struct dma_resv *obj = info->resv_objs[i]; -#endif - if (!test_bit(i, info->dma_fence_excl_bitmap)) { -#if (KERNEL_VERSION(5, 4, 0) > LINUX_VERSION_CODE) - err = reservation_object_reserve_shared(obj); -#else - err = dma_resv_reserve_shared(obj, 0); -#endif - if (err) { - dev_err(katom->kctx->kbdev->dev, - "Error %d reserving space for shared fence.\n", err); - goto end; - } - - err = kbase_dma_fence_add_reservation_callback(katom, obj, false); - if (err) { - dev_err(katom->kctx->kbdev->dev, - "Error %d adding reservation to callback.\n", err); - goto end; - } - -#if (KERNEL_VERSION(5, 4, 0) > LINUX_VERSION_CODE) - reservation_object_add_shared_fence(obj, fence); -#else - dma_resv_add_shared_fence(obj, fence); -#endif - } else { - err = kbase_dma_fence_add_reservation_callback(katom, obj, true); - if (err) { - dev_err(katom->kctx->kbdev->dev, - "Error %d adding reservation to callback.\n", err); - goto end; - } - -#if (KERNEL_VERSION(5, 4, 0) > LINUX_VERSION_CODE) - reservation_object_add_excl_fence(obj, fence); -#else - dma_resv_add_excl_fence(obj, fence); -#endif - } - } - -end: - kbase_dma_fence_unlock_reservations(info, &ww_ctx); - - if (likely(!err)) { - /* Test if the callbacks are already triggered */ - if (kbase_fence_dep_count_dec_and_test(katom)) { - kbase_fence_dep_count_set(katom, -1); - kbase_fence_free_callbacks(katom); - } else { - /* Add katom to the list of dma-buf fence waiting atoms - * only if it is still waiting. 
- */ - kbase_dma_fence_waiters_add(katom); - } - } else { - /* There was an error, cancel callbacks, set dep_count to -1 to - * indicate that the atom has been handled (the caller will - * kill it for us), signal the fence, free callbacks and the - * fence. - */ - kbase_fence_free_callbacks(katom); - kbase_fence_dep_count_set(katom, -1); - kbase_dma_fence_signal(katom); - } - - return err; -} - -void kbase_dma_fence_cancel_all_atoms(struct kbase_context *kctx) -{ - struct list_head *list = &kctx->dma_fence.waiting_resource; - - while (!list_empty(list)) { - struct kbase_jd_atom *katom; - - katom = list_first_entry(list, struct kbase_jd_atom, queue); - kbase_dma_fence_waiters_remove(katom); - kbase_dma_fence_cancel_atom(katom); - } -} - -void kbase_dma_fence_cancel_callbacks(struct kbase_jd_atom *katom) -{ - /* Cancel callbacks and clean up. */ - if (kbase_fence_free_callbacks(katom)) - kbase_dma_fence_queue_work(katom); -} - -void kbase_dma_fence_signal(struct kbase_jd_atom *katom) -{ - if (!katom->dma_fence.fence) - return; - - /* Signal the atom's fence. */ - dma_fence_signal(katom->dma_fence.fence); - - kbase_fence_out_remove(katom); - - kbase_fence_free_callbacks(katom); -} - -void kbase_dma_fence_term(struct kbase_context *kctx) -{ - destroy_workqueue(kctx->dma_fence.wq); - kctx->dma_fence.wq = NULL; -} - -int kbase_dma_fence_init(struct kbase_context *kctx) -{ - INIT_LIST_HEAD(&kctx->dma_fence.waiting_resource); - - kctx->dma_fence.wq = alloc_workqueue("mali-fence-%d", - WQ_UNBOUND, 1, kctx->pid); - if (!kctx->dma_fence.wq) - return -ENOMEM; - - return 0; -} diff --git a/mali_kbase/mali_kbase_dma_fence.h b/mali_kbase/mali_kbase_dma_fence.h deleted file mode 100644 index be69118..0000000 --- a/mali_kbase/mali_kbase_dma_fence.h +++ /dev/null @@ -1,150 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ -/* - * - * (C) COPYRIGHT 2010-2016, 2020-2022 ARM Limited. All rights reserved. - * - * This program is free software and is provided to you under the terms of the - * GNU General Public License version 2 as published by the Free Software - * Foundation, and any use by you of this program is subject to the terms - * of such GNU license. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with this program; if not, you can access it online at - * http://www.gnu.org/licenses/gpl-2.0.html. - * - */ - -#ifndef _KBASE_DMA_FENCE_H_ -#define _KBASE_DMA_FENCE_H_ - -#ifdef CONFIG_MALI_DMA_FENCE - -#include <linux/list.h> -#include <linux/version.h> -#if (KERNEL_VERSION(5, 4, 0) > LINUX_VERSION_CODE) -#include <linux/reservation.h> -#else -#include <linux/dma-resv.h> -#endif -#include <mali_kbase_fence.h> - -/* Forward declaration from mali_kbase_defs.h */ -struct kbase_jd_atom; -struct kbase_context; - -/** - * struct kbase_dma_fence_resv_info - Structure with list of reservation objects - * @resv_objs: Array of reservation objects to attach the - * new fence to. - * @dma_fence_resv_count: Number of reservation objects in the array. - * @dma_fence_excl_bitmap: Specifies which resv_obj are exclusive. - * - * This is used by some functions to pass around a collection of data about - * reservation objects. 
- */ -struct kbase_dma_fence_resv_info { -#if (KERNEL_VERSION(5, 4, 0) > LINUX_VERSION_CODE) - struct reservation_object **resv_objs; -#else - struct dma_resv **resv_objs; -#endif - unsigned int dma_fence_resv_count; - unsigned long *dma_fence_excl_bitmap; -}; - -/** - * kbase_dma_fence_add_reservation() - Adds a resv to the array of resv_objs - * @resv: Reservation object to add to the array. - * @info: Pointer to struct with current reservation info - * @exclusive: Boolean indicating if exclusive access is needed - * - * The function adds a new reservation_object to an existing array of - * reservation_objects. At the same time keeps track of which objects require - * exclusive access in dma_fence_excl_bitmap. - */ -#if (KERNEL_VERSION(5, 4, 0) > LINUX_VERSION_CODE) -void kbase_dma_fence_add_reservation(struct reservation_object *resv, - struct kbase_dma_fence_resv_info *info, - bool exclusive); -#else -void kbase_dma_fence_add_reservation(struct dma_resv *resv, - struct kbase_dma_fence_resv_info *info, - bool exclusive); -#endif - -/** - * kbase_dma_fence_wait() - Creates a new fence and attaches it to the resv_objs - * @katom: Katom with the external dependency. - * @info: Pointer to struct with current reservation info - * - * Return: An error code or 0 if succeeds - */ -int kbase_dma_fence_wait(struct kbase_jd_atom *katom, - struct kbase_dma_fence_resv_info *info); - -/** - * kbase_dma_fence_cancel_ctx() - Cancel all dma-fences blocked atoms on kctx - * @kctx: Pointer to kbase context - * - * This function will cancel and clean up all katoms on @kctx that is waiting - * on dma-buf fences. - * - * Locking: jctx.lock needs to be held when calling this function. - */ -void kbase_dma_fence_cancel_all_atoms(struct kbase_context *kctx); - -/** - * kbase_dma_fence_cancel_callbacks() - Cancel only callbacks on katom - * @katom: Pointer to katom whose callbacks are to be canceled - * - * This function cancels all dma-buf fence callbacks on @katom, but does not - * cancel the katom itself. - * - * The caller is responsible for ensuring that jd_done_nolock is called on - * @katom. - * - * Locking: jctx.lock must be held when calling this function. - */ -void kbase_dma_fence_cancel_callbacks(struct kbase_jd_atom *katom); - -/** - * kbase_dma_fence_signal() - Signal katom's fence and clean up after wait - * @katom: Pointer to katom to signal and clean up - * - * This function will signal the @katom's fence, if it has one, and clean up - * the callback data from the katom's wait on earlier fences. - * - * Locking: jctx.lock must be held while calling this function. - */ -void kbase_dma_fence_signal(struct kbase_jd_atom *katom); - -/** - * kbase_dma_fence_term() - Terminate Mali dma-fence context - * @kctx: kbase context to terminate - */ -void kbase_dma_fence_term(struct kbase_context *kctx); - -/** - * kbase_dma_fence_init() - Initialize Mali dma-fence context - * @kctx: kbase context to initialize - * - * Return: 0 on success, error code otherwise. - */ -int kbase_dma_fence_init(struct kbase_context *kctx); - -#else /* !CONFIG_MALI_DMA_FENCE */ -/* Dummy functions for when dma-buf fence isn't enabled. 
*/ - -static inline int kbase_dma_fence_init(struct kbase_context *kctx) -{ - return 0; -} - -static inline void kbase_dma_fence_term(struct kbase_context *kctx) {} -#endif /* CONFIG_MALI_DMA_FENCE */ -#endif diff --git a/mali_kbase/mali_kbase_dummy_job_wa.c b/mali_kbase/mali_kbase_dummy_job_wa.c index 35934b9..c3c6046 100644 --- a/mali_kbase/mali_kbase_dummy_job_wa.c +++ b/mali_kbase/mali_kbase_dummy_job_wa.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -183,9 +183,9 @@ int kbase_dummy_job_wa_execute(struct kbase_device *kbdev, u64 cores) if (kbdev->dummy_job_wa.flags & KBASE_DUMMY_JOB_WA_FLAG_WAIT_POWERUP) { /* wait for power-ups */ - wait(kbdev, SHADER_READY_LO, (cores & U32_MAX), true); + wait(kbdev, GPU_CONTROL_REG(SHADER_READY_LO), (cores & U32_MAX), true); if (cores >> 32) - wait(kbdev, SHADER_READY_HI, (cores >> 32), true); + wait(kbdev, GPU_CONTROL_REG(SHADER_READY_HI), (cores >> 32), true); } if (kbdev->dummy_job_wa.flags & KBASE_DUMMY_JOB_WA_FLAG_SERIALIZE) { @@ -218,11 +218,11 @@ int kbase_dummy_job_wa_execute(struct kbase_device *kbdev, u64 cores) kbase_reg_write(kbdev, SHADER_PWROFF_HI, (cores >> 32)); /* wait for power off complete */ - wait(kbdev, SHADER_READY_LO, (cores & U32_MAX), false); - wait(kbdev, SHADER_PWRTRANS_LO, (cores & U32_MAX), false); + wait(kbdev, GPU_CONTROL_REG(SHADER_READY_LO), (cores & U32_MAX), false); + wait(kbdev, GPU_CONTROL_REG(SHADER_PWRTRANS_LO), (cores & U32_MAX), false); if (cores >> 32) { - wait(kbdev, SHADER_READY_HI, (cores >> 32), false); - wait(kbdev, SHADER_PWRTRANS_HI, (cores >> 32), false); + wait(kbdev, GPU_CONTROL_REG(SHADER_READY_HI), (cores >> 32), false); + wait(kbdev, GPU_CONTROL_REG(SHADER_PWRTRANS_HI), (cores >> 32), false); } kbase_reg_write(kbdev, GPU_CONTROL_REG(GPU_IRQ_CLEAR), U32_MAX); } diff --git a/mali_kbase/mali_kbase_dvfs_debugfs.c b/mali_kbase/mali_kbase_dvfs_debugfs.c index 1e584de..e4cb716 100644 --- a/mali_kbase/mali_kbase_dvfs_debugfs.c +++ b/mali_kbase/mali_kbase_dvfs_debugfs.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2020-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -68,11 +68,7 @@ static const struct file_operations kbasep_dvfs_utilization_debugfs_fops = { void kbase_dvfs_status_debugfs_init(struct kbase_device *kbdev) { struct dentry *file; -#if (KERNEL_VERSION(4, 7, 0) <= LINUX_VERSION_CODE) const mode_t mode = 0444; -#else - const mode_t mode = 0400; -#endif if (WARN_ON(!kbdev || IS_ERR_OR_NULL(kbdev->mali_debugfs_directory))) return; diff --git a/mali_kbase/mali_kbase_fence.c b/mali_kbase/mali_kbase_fence.c index 01557cd..b16b276 100644 --- a/mali_kbase/mali_kbase_fence.c +++ b/mali_kbase/mali_kbase_fence.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2011-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2011-2022 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -59,95 +59,3 @@ kbase_fence_out_new(struct kbase_jd_atom *katom) return fence; } -bool -kbase_fence_free_callbacks(struct kbase_jd_atom *katom) -{ - struct kbase_fence_cb *cb, *tmp; - bool res = false; - - lockdep_assert_held(&katom->kctx->jctx.lock); - - /* Clean up and free callbacks. */ - list_for_each_entry_safe(cb, tmp, &katom->dma_fence.callbacks, node) { - bool ret; - - /* Cancel callbacks that hasn't been called yet. */ - ret = dma_fence_remove_callback(cb->fence, &cb->fence_cb); - if (ret) { - int ret; - - /* Fence had not signaled, clean up after - * canceling. - */ - ret = atomic_dec_return(&katom->dma_fence.dep_count); - - if (unlikely(ret == 0)) - res = true; - } - - /* - * Release the reference taken in - * kbase_fence_add_callback(). - */ - dma_fence_put(cb->fence); - list_del(&cb->node); - kfree(cb); - } - - return res; -} - -#if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) -int -kbase_fence_add_callback(struct kbase_jd_atom *katom, - struct fence *fence, - fence_func_t callback) -#else -int -kbase_fence_add_callback(struct kbase_jd_atom *katom, - struct dma_fence *fence, - dma_fence_func_t callback) -#endif -{ - int err = 0; - struct kbase_fence_cb *kbase_fence_cb; - - if (!fence) - return -EINVAL; - - kbase_fence_cb = kmalloc(sizeof(*kbase_fence_cb), GFP_KERNEL); - if (!kbase_fence_cb) - return -ENOMEM; - - kbase_fence_cb->fence = fence; - kbase_fence_cb->katom = katom; - INIT_LIST_HEAD(&kbase_fence_cb->node); - atomic_inc(&katom->dma_fence.dep_count); - - err = dma_fence_add_callback(fence, &kbase_fence_cb->fence_cb, - callback); - if (err == -ENOENT) { - /* Fence signaled, get the completion result */ - err = dma_fence_get_status(fence); - - /* remap success completion to err code */ - if (err == 1) - err = 0; - - kfree(kbase_fence_cb); - atomic_dec(&katom->dma_fence.dep_count); - } else if (err) { - kfree(kbase_fence_cb); - atomic_dec(&katom->dma_fence.dep_count); - } else { - /* - * Get reference to fence that will be kept until callback gets - * cleaned up in kbase_fence_free_callbacks(). - */ - dma_fence_get(fence); - /* Add callback to katom's list of callbacks */ - list_add(&kbase_fence_cb->node, &katom->dma_fence.callbacks); - } - - return err; -} diff --git a/mali_kbase/mali_kbase_fence.h b/mali_kbase/mali_kbase_fence.h index 2842280..ea2ac34 100644 --- a/mali_kbase/mali_kbase_fence.h +++ b/mali_kbase/mali_kbase_fence.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2010-2018, 2020-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2010-2023 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -23,41 +23,62 @@ #define _KBASE_FENCE_H_ /* - * mali_kbase_fence.[hc] has common fence code used by both - * - CONFIG_MALI_DMA_FENCE - implicit DMA fences - * - CONFIG_SYNC_FILE - explicit fences beginning with 4.9 kernel + * mali_kbase_fence.[hc] has fence code used only by + * - CONFIG_SYNC_FILE - explicit fences */ -#if defined(CONFIG_MALI_DMA_FENCE) || defined(CONFIG_SYNC_FILE) +#if IS_ENABLED(CONFIG_SYNC_FILE) #include <linux/list.h> #include "mali_kbase_fence_defs.h" #include "mali_kbase.h" +#include "mali_kbase_refcount_defs.h" +#include <linux/version_compat_defs.h> +#if MALI_USE_CSF +/* Maximum number of characters in DMA fence timeline name. */ +#define MAX_TIMELINE_NAME (32) + +/** + * struct kbase_kcpu_dma_fence_meta - Metadata structure for dma fence objects containing + * information about KCPU queue. One instance per KCPU + * queue. + * + * @refcount: Atomic value to keep track of number of references to an instance. + * An instance can outlive the KCPU queue itself. + * @kbdev: Pointer to Kbase device. + * @kctx_id: Kbase context ID. + * @timeline_name: String of timeline name for associated fence object. + */ +struct kbase_kcpu_dma_fence_meta { + kbase_refcount_t refcount; + struct kbase_device *kbdev; + int kctx_id; + char timeline_name[MAX_TIMELINE_NAME]; +}; + +/** + * struct kbase_kcpu_dma_fence - Structure which extends a dma fence object to include a + * reference to metadata containing more informaiton about it. + * + * @base: Fence object itself. + * @metadata: Pointer to metadata structure. + */ +struct kbase_kcpu_dma_fence { #if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) -extern const struct fence_ops kbase_fence_ops; + struct fence base; #else -extern const struct dma_fence_ops kbase_fence_ops; + struct dma_fence base; +#endif /* LINUX_VERSION_CODE < KERNEL_VERSION(4, 10, 0) */ + struct kbase_kcpu_dma_fence_meta *metadata; +}; #endif -/** - * struct kbase_fence_cb - Mali dma-fence callback data struct - * @fence_cb: Callback function - * @katom: Pointer to katom that is waiting on this callback - * @fence: Pointer to the fence object on which this callback is waiting - * @node: List head for linking this callback to the katom - */ -struct kbase_fence_cb { #if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) - struct fence_cb fence_cb; - struct fence *fence; +extern const struct fence_ops kbase_fence_ops; #else - struct dma_fence_cb fence_cb; - struct dma_fence *fence; +extern const struct dma_fence_ops kbase_fence_ops; #endif - struct kbase_jd_atom *katom; - struct list_head node; -}; /** * kbase_fence_out_new() - Creates a new output fence and puts it on the atom @@ -71,7 +92,7 @@ struct fence *kbase_fence_out_new(struct kbase_jd_atom *katom); struct dma_fence *kbase_fence_out_new(struct kbase_jd_atom *katom); #endif -#if defined(CONFIG_SYNC_FILE) +#if IS_ENABLED(CONFIG_SYNC_FILE) /** * kbase_fence_fence_in_set() - Assign input fence to atom * @katom: Atom to assign input fence to @@ -102,9 +123,9 @@ static inline void kbase_fence_out_remove(struct kbase_jd_atom *katom) } } -#if defined(CONFIG_SYNC_FILE) +#if IS_ENABLED(CONFIG_SYNC_FILE) /** - * kbase_fence_out_remove() - Removes the input fence from atom + * kbase_fence_in_remove() - Removes the input fence from atom * @katom: Atom to remove input fence for * * This will also release the reference to this fence which the atom keeps @@ -140,144 +161,92 
@@ static inline bool kbase_fence_out_is_ours(struct kbase_jd_atom *katom) static inline int kbase_fence_out_signal(struct kbase_jd_atom *katom, int status) { - if (status) { -#if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE && \ - KERNEL_VERSION(4, 9, 68) <= LINUX_VERSION_CODE) - fence_set_error(katom->dma_fence.fence, status); -#elif (KERNEL_VERSION(4, 11, 0) <= LINUX_VERSION_CODE) - dma_fence_set_error(katom->dma_fence.fence, status); -#else - katom->dma_fence.fence->status = status; -#endif - } + if (status) + dma_fence_set_error_helper(katom->dma_fence.fence, status); return dma_fence_signal(katom->dma_fence.fence); } +#if IS_ENABLED(CONFIG_SYNC_FILE) /** - * kbase_fence_add_callback() - Add callback on @fence to block @katom - * @katom: Pointer to katom that will be blocked by @fence - * @fence: Pointer to fence on which to set up the callback - * @callback: Pointer to function to be called when fence is signaled + * kbase_fence_in_get() - Retrieve input fence for atom. + * @katom: Atom to get input fence from * - * Caller needs to hold a reference to @fence when calling this function, and - * the caller is responsible for releasing that reference. An additional - * reference to @fence will be taken when the callback was successfully set up - * and @fence needs to be kept valid until the callback has been called and - * cleanup have been done. + * A ref will be taken for the fence, so use @kbase_fence_put() to release it * - * Return: 0 on success: fence was either already signaled, or callback was - * set up. Negative error code is returned on error. + * Return: The fence, or NULL if there is no input fence for atom */ -#if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) -int kbase_fence_add_callback(struct kbase_jd_atom *katom, - struct fence *fence, - fence_func_t callback); -#else -int kbase_fence_add_callback(struct kbase_jd_atom *katom, - struct dma_fence *fence, - dma_fence_func_t callback); +#define kbase_fence_in_get(katom) dma_fence_get((katom)->dma_fence.fence_in) #endif /** - * kbase_fence_dep_count_set() - Set dep_count value on atom to specified value - * @katom: Atom to set dep_count for - * @val: value to set dep_count to - * - * The dep_count is available to the users of this module so that they can - * synchronize completion of the wait with cancellation and adding of more - * callbacks. For instance, a user could do the following: + * kbase_fence_out_get() - Retrieve output fence for atom. + * @katom: Atom to get output fence from * - * dep_count set to 1 - * callback #1 added, dep_count is increased to 2 - * callback #1 happens, dep_count decremented to 1 - * since dep_count > 0, no completion is done - * callback #2 is added, dep_count is increased to 2 - * dep_count decremented to 1 - * callback #2 happens, dep_count decremented to 0 - * since dep_count now is zero, completion executes + * A ref will be taken for the fence, so use @kbase_fence_put() to release it * - * The dep_count can also be used to make sure that the completion only - * executes once. This is typically done by setting dep_count to -1 for the - * thread that takes on this responsibility. 
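For illustration, the dep_count scheme walked through in the comment above can be sketched in ordinary userspace C. This is not kbase code: the names are hypothetical and C11 atomics stand in for the kernel's atomic_t, but the counting idea is the same — the counter starts at 1, each registered callback adds one, and completion runs only for whoever decrements it to zero.

    #include <stdatomic.h>
    #include <stdio.h>

    struct waiter {
    	atomic_int dep_count;	/* plays the role of katom->dma_fence.dep_count */
    };

    static void complete(struct waiter *w)
    {
    	/* Mark as handled so completion cannot run a second time. */
    	atomic_store(&w->dep_count, -1);
    	printf("all dependencies satisfied\n");
    }

    static void callback_fired(struct waiter *w)
    {
    	/* The dec-and-test step: only the final decrement completes. */
    	if (atomic_fetch_sub(&w->dep_count, 1) == 1)
    		complete(w);
    }

    int main(void)
    {
    	struct waiter w;

    	atomic_init(&w.dep_count, 1);		/* baseline reference */
    	atomic_fetch_add(&w.dep_count, 1);	/* callback #1 registered */
    	atomic_fetch_add(&w.dep_count, 1);	/* callback #2 registered */

    	callback_fired(&w);			/* count: 3 -> 2, no completion */
    	callback_fired(&w);			/* count: 2 -> 1, no completion */

    	/* Registration finished: drop the baseline reference. */
    	if (atomic_fetch_sub(&w.dep_count, 1) == 1)
    		complete(&w);			/* count: 1 -> 0, completes here */

    	return 0;
    }

Setting the counter to -1 on completion is what lets cancellation paths and late callbacks detect that the atom has already been handled.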
+ * Return: The fence, or NULL if there is no output fence for atom */ -static inline void -kbase_fence_dep_count_set(struct kbase_jd_atom *katom, int val) -{ - atomic_set(&katom->dma_fence.dep_count, val); -} +#define kbase_fence_out_get(katom) dma_fence_get((katom)->dma_fence.fence) + +#endif /* !MALI_USE_CSF */ /** - * kbase_fence_dep_count_dec_and_test() - Decrements dep_count - * @katom: Atom to decrement dep_count for + * kbase_fence_get() - Retrieve fence for a KCPUQ fence command. + * @fence_info: KCPUQ fence command * - * See @kbase_fence_dep_count_set for general description about dep_count + * A ref will be taken for the fence, so use @kbase_fence_put() to release it * - * Return: true if value was decremented to zero, otherwise false + * Return: The fence, or NULL if there is no fence for KCPUQ fence command */ -static inline bool -kbase_fence_dep_count_dec_and_test(struct kbase_jd_atom *katom) +#define kbase_fence_get(fence_info) dma_fence_get((fence_info)->fence) + +#if MALI_USE_CSF +#if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) +static inline struct kbase_kcpu_dma_fence *kbase_kcpu_dma_fence_get(struct fence *fence) +#else +static inline struct kbase_kcpu_dma_fence *kbase_kcpu_dma_fence_get(struct dma_fence *fence) +#endif { - return atomic_dec_and_test(&katom->dma_fence.dep_count); + if (fence->ops == &kbase_fence_ops) + return (struct kbase_kcpu_dma_fence *)fence; + + return NULL; } -/** - * kbase_fence_dep_count_read() - Returns the current dep_count value - * @katom: Pointer to katom - * - * See @kbase_fence_dep_count_set for general description about dep_count - * - * Return: The current dep_count value - */ -static inline int kbase_fence_dep_count_read(struct kbase_jd_atom *katom) +static inline void kbase_kcpu_dma_fence_meta_put(struct kbase_kcpu_dma_fence_meta *metadata) { - return atomic_read(&katom->dma_fence.dep_count); + if (kbase_refcount_dec_and_test(&metadata->refcount)) { + atomic_dec(&metadata->kbdev->live_fence_metadata); + kfree(metadata); + } } -/** - * kbase_fence_free_callbacks() - Free dma-fence callbacks on a katom - * @katom: Pointer to katom - * - * This function will free all fence callbacks on the katom's list of - * callbacks. Callbacks that have not yet been called, because their fence - * hasn't yet signaled, will first be removed from the fence. - * - * Locking: katom->dma_fence.callbacks list assumes jctx.lock is held. - * - * Return: true if dep_count reached 0, otherwise false. - */ -bool kbase_fence_free_callbacks(struct kbase_jd_atom *katom); - -#if defined(CONFIG_SYNC_FILE) -/** - * kbase_fence_in_get() - Retrieve input fence for atom. - * @katom: Atom to get input fence from - * - * A ref will be taken for the fence, so use @kbase_fence_put() to release it - * - * Return: The fence, or NULL if there is no input fence for atom - */ -#define kbase_fence_in_get(katom) dma_fence_get((katom)->dma_fence.fence_in) +#if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) +static inline void kbase_kcpu_dma_fence_put(struct fence *fence) +#else +static inline void kbase_kcpu_dma_fence_put(struct dma_fence *fence) #endif +{ + struct kbase_kcpu_dma_fence *kcpu_fence = kbase_kcpu_dma_fence_get(fence); -/** - * kbase_fence_out_get() - Retrieve output fence for atom. 
- * @katom: Atom to get output fence from - * - * A ref will be taken for the fence, so use @kbase_fence_put() to release it - * - * Return: The fence, or NULL if there is no output fence for atom - */ -#define kbase_fence_out_get(katom) dma_fence_get((katom)->dma_fence.fence) - -#endif /* !MALI_USE_CSF */ + if (kcpu_fence) + kbase_kcpu_dma_fence_meta_put(kcpu_fence->metadata); +} +#endif /* MALI_USE_CSF */ /** * kbase_fence_put() - Releases a reference to a fence * @fence: Fence to release reference for. */ -#define kbase_fence_put(fence) dma_fence_put(fence) - +#if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) +static inline void kbase_fence_put(struct fence *fence) +#else +static inline void kbase_fence_put(struct dma_fence *fence) +#endif +{ + dma_fence_put(fence); +} -#endif /* CONFIG_MALI_DMA_FENCE || defined(CONFIG_SYNC_FILE */ +#endif /* IS_ENABLED(CONFIG_SYNC_FILE) */ #endif /* _KBASE_FENCE_H_ */ diff --git a/mali_kbase/mali_kbase_fence_ops.c b/mali_kbase/mali_kbase_fence_ops.c index 14ddf03..f14a55e 100644 --- a/mali_kbase/mali_kbase_fence_ops.c +++ b/mali_kbase/mali_kbase_fence_ops.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2020-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -21,7 +21,7 @@ #include <linux/atomic.h> #include <linux/list.h> -#include <mali_kbase_fence_defs.h> +#include <mali_kbase_fence.h> #include <mali_kbase.h> static const char * @@ -31,7 +31,7 @@ kbase_fence_get_driver_name(struct fence *fence) kbase_fence_get_driver_name(struct dma_fence *fence) #endif { - return kbase_drv_name; + return KBASE_DRV_NAME; } static const char * @@ -41,7 +41,13 @@ kbase_fence_get_timeline_name(struct fence *fence) kbase_fence_get_timeline_name(struct dma_fence *fence) #endif { - return kbase_timeline_name; +#if MALI_USE_CSF + struct kbase_kcpu_dma_fence *kcpu_fence = (struct kbase_kcpu_dma_fence *)fence; + + return kcpu_fence->metadata->timeline_name; +#else + return KBASE_TIMELINE_NAME; +#endif /* MALI_USE_CSF */ } static bool @@ -62,22 +68,44 @@ kbase_fence_fence_value_str(struct dma_fence *fence, char *str, int size) #endif { #if (KERNEL_VERSION(5, 1, 0) > LINUX_VERSION_CODE) - snprintf(str, size, "%u", fence->seqno); + const char *format = "%u"; #else - snprintf(str, size, "%llu", fence->seqno); + const char *format = "%llu"; #endif + if (unlikely(!scnprintf(str, size, format, fence->seqno))) + pr_err("Fail to encode fence seqno to string"); } +#if MALI_USE_CSF +static void #if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) -const struct fence_ops kbase_fence_ops = { - .wait = fence_default_wait, +kbase_fence_release(struct fence *fence) #else -const struct dma_fence_ops kbase_fence_ops = { - .wait = dma_fence_default_wait, +kbase_fence_release(struct dma_fence *fence) +#endif +{ + struct kbase_kcpu_dma_fence *kcpu_fence = (struct kbase_kcpu_dma_fence *)fence; + + kbase_kcpu_dma_fence_meta_put(kcpu_fence->metadata); + kfree(kcpu_fence); +} #endif - .get_driver_name = kbase_fence_get_driver_name, - .get_timeline_name = kbase_fence_get_timeline_name, - .enable_signaling = kbase_fence_enable_signaling, - .fence_value_str = kbase_fence_fence_value_str -}; +#if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) +extern const struct fence_ops kbase_fence_ops; /* silence checker warning */ +const struct fence_ops 
kbase_fence_ops = { .wait = fence_default_wait, +#else +extern const struct dma_fence_ops kbase_fence_ops; /* silence checker warning */ +const struct dma_fence_ops kbase_fence_ops = { .wait = dma_fence_default_wait, +#endif + .get_driver_name = kbase_fence_get_driver_name, + .get_timeline_name = kbase_fence_get_timeline_name, + .enable_signaling = kbase_fence_enable_signaling, +#if MALI_USE_CSF + .fence_value_str = kbase_fence_fence_value_str, + .release = kbase_fence_release +#else + .fence_value_str = kbase_fence_fence_value_str +#endif +}; +KBASE_EXPORT_TEST_API(kbase_fence_ops); diff --git a/mali_kbase/mali_kbase_gpu_metrics.c b/mali_kbase/mali_kbase_gpu_metrics.c new file mode 100644 index 0000000..af3a08d --- /dev/null +++ b/mali_kbase/mali_kbase_gpu_metrics.c @@ -0,0 +1,260 @@ +// SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note +/* + * + * (C) COPYRIGHT 2023 ARM Limited. All rights reserved. + * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. + * + */ + +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) +#include "mali_power_gpu_work_period_trace.h" +#include <mali_kbase_gpu_metrics.h> + +/** + * enum gpu_metrics_ctx_flags - Flags for the GPU metrics context + * + * @ACTIVE_INTERVAL_IN_WP: Flag set when the application first becomes active in + * the current work period. + * + * @INSIDE_ACTIVE_LIST: Flag to track if object is in kbase_device::gpu_metrics::active_list + * + * All members need to be separate bits. This enum is intended for use in a + * bitmask where multiple values get OR-ed together. 
+ */ +enum gpu_metrics_ctx_flags { + ACTIVE_INTERVAL_IN_WP = 1 << 0, + INSIDE_ACTIVE_LIST = 1 << 1, +}; + +static inline bool gpu_metrics_ctx_flag(struct kbase_gpu_metrics_ctx *gpu_metrics_ctx, + enum gpu_metrics_ctx_flags flag) +{ + return (gpu_metrics_ctx->flags & flag); +} + +static inline void gpu_metrics_ctx_flag_set(struct kbase_gpu_metrics_ctx *gpu_metrics_ctx, + enum gpu_metrics_ctx_flags flag) +{ + gpu_metrics_ctx->flags |= flag; +} + +static inline void gpu_metrics_ctx_flag_clear(struct kbase_gpu_metrics_ctx *gpu_metrics_ctx, + enum gpu_metrics_ctx_flags flag) +{ + gpu_metrics_ctx->flags &= ~flag; +} + +static inline void validate_tracepoint_data(struct kbase_gpu_metrics_ctx *gpu_metrics_ctx, + u64 start_time, u64 end_time, u64 total_active) +{ +#ifdef CONFIG_MALI_DEBUG + WARN(total_active > NSEC_PER_SEC, + "total_active %llu > 1 second for aid %u active_cnt %u", + total_active, gpu_metrics_ctx->aid, gpu_metrics_ctx->active_cnt); + + WARN(start_time >= end_time, + "start_time %llu >= end_time %llu for aid %u active_cnt %u", + start_time, end_time, gpu_metrics_ctx->aid, gpu_metrics_ctx->active_cnt); + + WARN(total_active > (end_time - start_time), + "total_active %llu > end_time %llu - start_time %llu for aid %u active_cnt %u", + total_active, end_time, start_time, + gpu_metrics_ctx->aid, gpu_metrics_ctx->active_cnt); + + WARN(gpu_metrics_ctx->prev_wp_active_end_time > start_time, + "prev_wp_active_end_time %llu > start_time %llu for aid %u active_cnt %u", + gpu_metrics_ctx->prev_wp_active_end_time, start_time, + gpu_metrics_ctx->aid, gpu_metrics_ctx->active_cnt); +#endif +} + +static void emit_tracepoint_for_active_gpu_metrics_ctx(struct kbase_device *kbdev, + struct kbase_gpu_metrics_ctx *gpu_metrics_ctx, u64 current_time) +{ + const u64 start_time = gpu_metrics_ctx->first_active_start_time; + u64 total_active = gpu_metrics_ctx->total_active; + u64 end_time; + + /* Check if the GPU activity is currently ongoing */ + if (gpu_metrics_ctx->active_cnt) { + end_time = current_time; + total_active += + end_time - gpu_metrics_ctx->last_active_start_time; + + gpu_metrics_ctx->first_active_start_time = current_time; + gpu_metrics_ctx->last_active_start_time = current_time; + } else { + end_time = gpu_metrics_ctx->last_active_end_time; + gpu_metrics_ctx_flag_clear(gpu_metrics_ctx, ACTIVE_INTERVAL_IN_WP); + } + + trace_gpu_work_period(kbdev->id, gpu_metrics_ctx->aid, + start_time, end_time, total_active); + + validate_tracepoint_data(gpu_metrics_ctx, start_time, end_time, total_active); + gpu_metrics_ctx->prev_wp_active_end_time = end_time; + gpu_metrics_ctx->total_active = 0; +} + +void kbase_gpu_metrics_ctx_put(struct kbase_device *kbdev, + struct kbase_gpu_metrics_ctx *gpu_metrics_ctx) +{ + WARN_ON(list_empty(&gpu_metrics_ctx->link)); + WARN_ON(!gpu_metrics_ctx->kctx_count); + + gpu_metrics_ctx->kctx_count--; + if (gpu_metrics_ctx->kctx_count) + return; + + if (gpu_metrics_ctx_flag(gpu_metrics_ctx, ACTIVE_INTERVAL_IN_WP)) + emit_tracepoint_for_active_gpu_metrics_ctx(kbdev, + gpu_metrics_ctx, ktime_get_raw_ns()); + + list_del_init(&gpu_metrics_ctx->link); + kfree(gpu_metrics_ctx); +} + +struct kbase_gpu_metrics_ctx *kbase_gpu_metrics_ctx_get(struct kbase_device *kbdev, u32 aid) +{ + struct kbase_gpu_metrics *gpu_metrics = &kbdev->gpu_metrics; + struct kbase_gpu_metrics_ctx *gpu_metrics_ctx; + + list_for_each_entry(gpu_metrics_ctx, &gpu_metrics->active_list, link) { + if (gpu_metrics_ctx->aid == aid) { + WARN_ON(!gpu_metrics_ctx->kctx_count); + gpu_metrics_ctx->kctx_count++; + return 
gpu_metrics_ctx; + } + } + + list_for_each_entry(gpu_metrics_ctx, &gpu_metrics->inactive_list, link) { + if (gpu_metrics_ctx->aid == aid) { + WARN_ON(!gpu_metrics_ctx->kctx_count); + gpu_metrics_ctx->kctx_count++; + return gpu_metrics_ctx; + } + } + + return NULL; +} + +void kbase_gpu_metrics_ctx_init(struct kbase_device *kbdev, + struct kbase_gpu_metrics_ctx *gpu_metrics_ctx, unsigned int aid) +{ + gpu_metrics_ctx->aid = aid; + gpu_metrics_ctx->total_active = 0; + gpu_metrics_ctx->kctx_count = 1; + gpu_metrics_ctx->active_cnt = 0; + gpu_metrics_ctx->prev_wp_active_end_time = 0; + gpu_metrics_ctx->flags = 0; + list_add_tail(&gpu_metrics_ctx->link, &kbdev->gpu_metrics.inactive_list); +} + +void kbase_gpu_metrics_ctx_start_activity(struct kbase_context *kctx, u64 timestamp_ns) +{ + struct kbase_gpu_metrics_ctx *gpu_metrics_ctx = kctx->gpu_metrics_ctx; + + gpu_metrics_ctx->active_cnt++; + if (gpu_metrics_ctx->active_cnt == 1) + gpu_metrics_ctx->last_active_start_time = timestamp_ns; + + if (!gpu_metrics_ctx_flag(gpu_metrics_ctx, ACTIVE_INTERVAL_IN_WP)) { + gpu_metrics_ctx->first_active_start_time = timestamp_ns; + gpu_metrics_ctx_flag_set(gpu_metrics_ctx, ACTIVE_INTERVAL_IN_WP); + } + + if (!gpu_metrics_ctx_flag(gpu_metrics_ctx, INSIDE_ACTIVE_LIST)) { + list_move_tail(&gpu_metrics_ctx->link, &kctx->kbdev->gpu_metrics.active_list); + gpu_metrics_ctx_flag_set(gpu_metrics_ctx, INSIDE_ACTIVE_LIST); + } +} + +void kbase_gpu_metrics_ctx_end_activity(struct kbase_context *kctx, u64 timestamp_ns) +{ + struct kbase_gpu_metrics_ctx *gpu_metrics_ctx = kctx->gpu_metrics_ctx; + + if (WARN_ON_ONCE(!gpu_metrics_ctx->active_cnt)) + return; + + if (--gpu_metrics_ctx->active_cnt) + return; + + if (likely(timestamp_ns > gpu_metrics_ctx->last_active_start_time)) { + gpu_metrics_ctx->last_active_end_time = timestamp_ns; + gpu_metrics_ctx->total_active += + timestamp_ns - gpu_metrics_ctx->last_active_start_time; + return; + } + + /* Due to conversion from system timestamp to CPU timestamp (which involves rounding) + * the value for start and end timestamp could come as same. + */ + if (timestamp_ns == gpu_metrics_ctx->last_active_start_time) { + gpu_metrics_ctx->last_active_end_time = timestamp_ns + 1; + gpu_metrics_ctx->total_active += 1; + return; + } + + /* The following check is to detect the situation where 'ACT=0' event was not visible to + * the Kbase even though the system timestamp value sampled by FW was less than the system + * timestamp value sampled by Kbase just before the draining of trace buffer. 
+ */ + if (gpu_metrics_ctx->last_active_start_time == gpu_metrics_ctx->first_active_start_time && + gpu_metrics_ctx->prev_wp_active_end_time == gpu_metrics_ctx->first_active_start_time) { + WARN_ON_ONCE(gpu_metrics_ctx->total_active); + gpu_metrics_ctx->last_active_end_time = + gpu_metrics_ctx->prev_wp_active_end_time + 1; + gpu_metrics_ctx->total_active = 1; + return; + } + + WARN_ON_ONCE(1); +} + +void kbase_gpu_metrics_emit_tracepoint(struct kbase_device *kbdev, u64 ts) +{ + struct kbase_gpu_metrics *gpu_metrics = &kbdev->gpu_metrics; + struct kbase_gpu_metrics_ctx *gpu_metrics_ctx, *tmp; + + list_for_each_entry_safe(gpu_metrics_ctx, tmp, &gpu_metrics->active_list, link) { + if (!gpu_metrics_ctx_flag(gpu_metrics_ctx, ACTIVE_INTERVAL_IN_WP)) { + WARN_ON(!gpu_metrics_ctx_flag(gpu_metrics_ctx, INSIDE_ACTIVE_LIST)); + WARN_ON(gpu_metrics_ctx->active_cnt); + list_move_tail(&gpu_metrics_ctx->link, &gpu_metrics->inactive_list); + gpu_metrics_ctx_flag_clear(gpu_metrics_ctx, INSIDE_ACTIVE_LIST); + continue; + } + + emit_tracepoint_for_active_gpu_metrics_ctx(kbdev, gpu_metrics_ctx, ts); + } +} + +int kbase_gpu_metrics_init(struct kbase_device *kbdev) +{ + INIT_LIST_HEAD(&kbdev->gpu_metrics.active_list); + INIT_LIST_HEAD(&kbdev->gpu_metrics.inactive_list); + + dev_info(kbdev->dev, "GPU metrics tracepoint support enabled"); + return 0; +} + +void kbase_gpu_metrics_term(struct kbase_device *kbdev) +{ + WARN_ON_ONCE(!list_empty(&kbdev->gpu_metrics.active_list)); + WARN_ON_ONCE(!list_empty(&kbdev->gpu_metrics.inactive_list)); +} + +#endif diff --git a/mali_kbase/mali_kbase_gpu_metrics.h b/mali_kbase/mali_kbase_gpu_metrics.h new file mode 100644 index 0000000..adc8816 --- /dev/null +++ b/mali_kbase/mali_kbase_gpu_metrics.h @@ -0,0 +1,167 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +/* + * + * (C) COPYRIGHT 2023 ARM Limited. All rights reserved. + * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. + * + */ + +/** + * DOC: GPU metrics frontend APIs + */ + +#ifndef _KBASE_GPU_METRICS_H_ +#define _KBASE_GPU_METRICS_H_ + +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) +#include <mali_kbase.h> + +/** + * kbase_gpu_metrics_get_emit_interval() - Return the trace point emission interval. + * + * Return: The time interval in nanosecond for GPU metrics trace point emission. + */ +unsigned long kbase_gpu_metrics_get_emit_interval(void); + +/** + * kbase_gpu_metrics_ctx_put() - Decrement the Kbase context count for the GPU metrics + * context and free it if the count becomes 0. + * + * @kbdev: Pointer to the GPU device. + * @gpu_metrics_ctx: Pointer to the GPU metrics context. + * + * This function must be called when a Kbase context is destroyed. + * The function would decrement the Kbase context count for the GPU metrics context and + * free the memory if the count becomes 0. 
+ * The function would emit a power/gpu_work_period tracepoint for the GPU metrics context + * if there was some GPU activity done for it since the last tracepoint was emitted. + * + * Note: The caller must appropriately serialize the call to this function with the + * call to other GPU metrics functions declared in this file. + */ +void kbase_gpu_metrics_ctx_put(struct kbase_device *kbdev, + struct kbase_gpu_metrics_ctx *gpu_metrics_ctx); + +/** + * kbase_gpu_metrics_ctx_get() - Increment the Kbase context count for the GPU metrics + * context if it exists. + * + * @kbdev: Pointer to the GPU device. + * @aid: Unique identifier of the Application that is creating the Kbase context. + * + * This function must be called when a Kbase context is created. + * The function would increment the Kbase context count for the GPU metrics context, + * corresponding to the @aid, if it exists. + * + * Return: Pointer to the GPU metrics context corresponding to the @aid if it already + * exists otherwise NULL. + * + * Note: The caller must appropriately serialize the call to this function with the + * call to other GPU metrics functions declared in this file. + * The caller shall allocate memory for GPU metrics context structure if the + * function returns NULL. + */ +struct kbase_gpu_metrics_ctx *kbase_gpu_metrics_ctx_get(struct kbase_device *kbdev, u32 aid); + +/** + * kbase_gpu_metrics_ctx_init() - Initialise the GPU metrics context + * + * @kbdev: Pointer to the GPU device. + * @gpu_metrics_ctx: Pointer to the GPU metrics context. + * @aid: Unique identifier of the Application for which GPU metrics + * context needs to be initialized. + * + * This function must be called when a Kbase context is created, after the call to + * kbase_gpu_metrics_ctx_get() returned NULL and memory for the GPU metrics context + * structure was allocated. + * + * Note: The caller must appropriately serialize the call to this function with the + * call to other GPU metrics functions declared in this file. + */ +void kbase_gpu_metrics_ctx_init(struct kbase_device *kbdev, + struct kbase_gpu_metrics_ctx *gpu_metrics_ctx, u32 aid); + +/** + * kbase_gpu_metrics_ctx_start_activity() - Report the start of some GPU activity + * for GPU metrics context. + * + * @kctx: Pointer to the Kbase context contributing data to the GPU metrics context. + * @timestamp_ns: CPU timestamp at which the GPU activity started. + * + * The provided timestamp would be later used as the "start_time_ns" for the + * power/gpu_work_period tracepoint if this is the first GPU activity for the GPU + * metrics context in the current work period. + * + * Note: The caller must appropriately serialize the call to this function with the + * call to other GPU metrics functions declared in this file. + */ +void kbase_gpu_metrics_ctx_start_activity(struct kbase_context *kctx, u64 timestamp_ns); + +/** + * kbase_gpu_metrics_ctx_end_activity() - Report the end of some GPU activity + * for GPU metrics context. + * + * @kctx: Pointer to the Kbase context contributing data to the GPU metrics context. + * @timestamp_ns: CPU timestamp at which the GPU activity ended. + * + * The provided timestamp would be later used as the "end_time_ns" for the + * power/gpu_work_period tracepoint if this is the last GPU activity for the GPU + * metrics context in the current work period. + * + * Note: The caller must appropriately serialize the call to this function with the + * call to other GPU metrics functions declared in this file. 
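To illustrate the lifecycle these comments describe, a hypothetical caller (not part of the patch; the helper name and the surrounding locking are assumed) would try kbase_gpu_metrics_ctx_get() first and only allocate and initialise a new context when no other Kbase context of the same application already owns one:

    /* Illustrative sketch only - assumes the caller already serializes GPU
     * metrics calls as the comments above require.
     */
    static struct kbase_gpu_metrics_ctx *metrics_ctx_acquire(struct kbase_device *kbdev, u32 aid)
    {
    	struct kbase_gpu_metrics_ctx *gpu_metrics_ctx = kbase_gpu_metrics_ctx_get(kbdev, aid);

    	if (gpu_metrics_ctx)
    		return gpu_metrics_ctx;	/* existing context, kctx count already incremented */

    	gpu_metrics_ctx = kzalloc(sizeof(*gpu_metrics_ctx), GFP_KERNEL);
    	if (!gpu_metrics_ctx)
    		return NULL;

    	/* A freshly initialised context starts with a kctx count of 1. */
    	kbase_gpu_metrics_ctx_init(kbdev, gpu_metrics_ctx, aid);
    	return gpu_metrics_ctx;
    }

On Kbase context destruction the matching call is kbase_gpu_metrics_ctx_put(kbdev, gpu_metrics_ctx), which emits a final work-period tracepoint if there was unreported activity and frees the structure once the last context of that application drops it.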
+ */ +void kbase_gpu_metrics_ctx_end_activity(struct kbase_context *kctx, u64 timestamp_ns); + +/** + * kbase_gpu_metrics_emit_tracepoint() - Emit power/gpu_work_period tracepoint + * for active GPU metrics contexts. + * + * @kbdev: Pointer to the GPU device. + * @ts: Timestamp at which the tracepoint is being emitted. + * + * This function would loop through all the active GPU metrics contexts and emit a + * power/gpu_work_period tracepoint for them. + * The GPU metrics context that is found to be inactive since the last tracepoint + * was emitted would be moved to the inactive list. + * The current work period would be considered as over and a new work period would + * begin whenever any application does the GPU activity. + * + * Note: The caller must appropriately serialize the call to this function with the + * call to other GPU metrics functions declared in this file. + */ +void kbase_gpu_metrics_emit_tracepoint(struct kbase_device *kbdev, u64 ts); + +/** + * kbase_gpu_metrics_init() - Initialise a gpu_metrics instance for a GPU + * + * @kbdev: Pointer to the GPU device. + * + * This function is called once for each @kbdev. + * + * Return: 0 on success, or negative on failure. + */ +int kbase_gpu_metrics_init(struct kbase_device *kbdev); + +/** + * kbase_gpu_metrics_term() - Terminate a gpu_metrics instance + * + * @kbdev: Pointer to the GPU device. + */ +void kbase_gpu_metrics_term(struct kbase_device *kbdev); + +#endif +#endif /* _KBASE_GPU_METRICS_H_ */ diff --git a/mali_kbase/mali_kbase_gpuprops.c b/mali_kbase/mali_kbase_gpuprops.c index 91ef6d1..02d6bb2 100644 --- a/mali_kbase/mali_kbase_gpuprops.c +++ b/mali_kbase/mali_kbase_gpuprops.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2011-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2011-2023 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -49,7 +49,7 @@ static void kbase_gpuprops_construct_coherent_groups( props->coherency_info.coherency = props->raw_props.mem_features; props->coherency_info.num_core_groups = hweight64(props->raw_props.l2_present); - if (props->coherency_info.coherency & GROUPS_L2_COHERENT) { + if (props->coherency_info.coherency & MEM_FEATURES_COHERENT_CORE_GROUP_MASK) { /* Group is l2 coherent */ group_present = props->raw_props.l2_present; } else { @@ -198,7 +198,6 @@ static int kbase_gpuprops_get_props(struct base_gpu_props * const gpu_props, gpu_props->raw_props.mem_features = regdump.mem_features; gpu_props->raw_props.mmu_features = regdump.mmu_features; gpu_props->raw_props.l2_features = regdump.l2_features; - gpu_props->raw_props.core_features = regdump.core_features; gpu_props->raw_props.as_present = regdump.as_present; gpu_props->raw_props.js_present = regdump.js_present; @@ -312,9 +311,6 @@ static void kbase_gpuprops_calculate_props( struct base_gpu_props * const gpu_props, struct kbase_device *kbdev) { int i; -#if !MALI_USE_CSF - u32 gpu_id; -#endif /* Populate the base_gpu_props structure */ kbase_gpuprops_update_core_props_gpu_id(gpu_props); @@ -326,9 +322,6 @@ static void kbase_gpuprops_calculate_props( totalram_pages() << PAGE_SHIFT; #endif - gpu_props->core_props.num_exec_engines = - KBASE_UBFX32(gpu_props->raw_props.core_features, 0, 4); - for (i = 0; i < BASE_GPU_NUM_TEXTURE_FEATURES_REGISTERS; i++) gpu_props->core_props.texture_features[i] = gpu_props->raw_props.texture_features[i]; @@ -367,51 +360,23 @@ static void kbase_gpuprops_calculate_props( gpu_props->thread_props.tls_alloc = gpu_props->raw_props.thread_tls_alloc; - /* MIDHARC-2364 was intended for tULx. - * Workaround for the incorrectly applied THREAD_FEATURES to tDUx. 
- */ -#if !MALI_USE_CSF - gpu_id = kbdev->gpu_props.props.raw_props.gpu_id; -#endif - #if MALI_USE_CSF - CSTD_UNUSED(gpu_id); gpu_props->thread_props.max_registers = - KBASE_UBFX32(gpu_props->raw_props.thread_features, - 0U, 22); + KBASE_UBFX32(gpu_props->raw_props.thread_features, 0U, 22); gpu_props->thread_props.impl_tech = - KBASE_UBFX32(gpu_props->raw_props.thread_features, - 22U, 2); + KBASE_UBFX32(gpu_props->raw_props.thread_features, 22U, 2); gpu_props->thread_props.max_task_queue = - KBASE_UBFX32(gpu_props->raw_props.thread_features, - 24U, 8); + KBASE_UBFX32(gpu_props->raw_props.thread_features, 24U, 8); gpu_props->thread_props.max_thread_group_split = 0; #else - if ((gpu_id & GPU_ID2_PRODUCT_MODEL) == GPU_ID2_PRODUCT_TDUX) { - gpu_props->thread_props.max_registers = - KBASE_UBFX32(gpu_props->raw_props.thread_features, - 0U, 22); - gpu_props->thread_props.impl_tech = - KBASE_UBFX32(gpu_props->raw_props.thread_features, - 22U, 2); - gpu_props->thread_props.max_task_queue = - KBASE_UBFX32(gpu_props->raw_props.thread_features, - 24U, 8); - gpu_props->thread_props.max_thread_group_split = 0; - } else { - gpu_props->thread_props.max_registers = - KBASE_UBFX32(gpu_props->raw_props.thread_features, - 0U, 16); - gpu_props->thread_props.max_task_queue = - KBASE_UBFX32(gpu_props->raw_props.thread_features, - 16U, 8); - gpu_props->thread_props.max_thread_group_split = - KBASE_UBFX32(gpu_props->raw_props.thread_features, - 24U, 6); - gpu_props->thread_props.impl_tech = - KBASE_UBFX32(gpu_props->raw_props.thread_features, - 30U, 2); - } + gpu_props->thread_props.max_registers = + KBASE_UBFX32(gpu_props->raw_props.thread_features, 0U, 16); + gpu_props->thread_props.max_task_queue = + KBASE_UBFX32(gpu_props->raw_props.thread_features, 16U, 8); + gpu_props->thread_props.max_thread_group_split = + KBASE_UBFX32(gpu_props->raw_props.thread_features, 24U, 6); + gpu_props->thread_props.impl_tech = + KBASE_UBFX32(gpu_props->raw_props.thread_features, 30U, 2); #endif /* If values are not specified, then use defaults */ @@ -511,6 +476,21 @@ int kbase_gpuprops_set_features(struct kbase_device *kbdev) if (!kbase_hw_has_feature(kbdev, BASE_HW_FEATURE_THREAD_GROUP_SPLIT)) gpu_props->thread_props.max_thread_group_split = 0; + /* + * The CORE_FEATURES register has different meanings depending on GPU. + * On tGOx, bits[3:0] encode num_exec_engines. + * On CSF GPUs, bits[7:0] is an enumeration that needs to be parsed, + * instead. + * GPUs like tTIx have additional fields like LSC_SIZE that are + * otherwise reserved/RAZ on older GPUs. 
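As a concrete illustration of the non-CSF decode below, assuming KBASE_UBFX32(value, offset, size) is the usual unsigned bit-field extract, i.e. (value >> offset) & ((1u << size) - 1):

    	/* Hypothetical tGOx-style register value: bits[3:0] hold the engine count. */
    	u32 core_features = 0x00000003;
    	u32 num_exec_engines = KBASE_UBFX32(core_features, 0, 4);	/* == 3 */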
+ */ + gpu_props->raw_props.core_features = regdump.core_features; + +#if !MALI_USE_CSF + gpu_props->core_props.num_exec_engines = + KBASE_UBFX32(gpu_props->raw_props.core_features, 0, 4); +#endif + return err; } @@ -532,7 +512,7 @@ MODULE_PARM_DESC(override_l2_hash, "Override L2 hash config for testing"); static u32 l2_hash_values[ASN_HASH_COUNT] = { 0, }; -static int num_override_l2_hash_values; +static unsigned int num_override_l2_hash_values; module_param_array(l2_hash_values, uint, &num_override_l2_hash_values, 0000); MODULE_PARM_DESC(l2_hash_values, "Override L2 hash values config for testing"); @@ -586,7 +566,7 @@ kbase_read_l2_config_from_dt(struct kbase_device *const kbdev) kbdev->l2_hash_values_override = false; if (num_override_l2_hash_values) { - int i; + unsigned int i; kbdev->l2_hash_values_override = true; for (i = 0; i < num_override_l2_hash_values; i++) @@ -670,9 +650,11 @@ int kbase_gpuprops_update_l2_features(struct kbase_device *kbdev) int idx; const bool asn_he = regdump.l2_config & L2_CONFIG_ASN_HASH_ENABLE_MASK; +#if !IS_ENABLED(CONFIG_MALI_NO_MALI) if (!asn_he && kbdev->l2_hash_values_override) dev_err(kbdev->dev, "Failed to use requested ASN_HASH, fallback to default"); +#endif for (idx = 0; idx < ASN_HASH_COUNT; idx++) dev_info(kbdev->dev, "%s ASN_HASH[%d] is [0x%08x]\n", @@ -698,94 +680,102 @@ static struct { #define PROP(name, member) \ {KBASE_GPUPROP_ ## name, offsetof(struct base_gpu_props, member), \ sizeof(((struct base_gpu_props *)0)->member)} - PROP(PRODUCT_ID, core_props.product_id), - PROP(VERSION_STATUS, core_props.version_status), - PROP(MINOR_REVISION, core_props.minor_revision), - PROP(MAJOR_REVISION, core_props.major_revision), - PROP(GPU_FREQ_KHZ_MAX, core_props.gpu_freq_khz_max), - PROP(LOG2_PROGRAM_COUNTER_SIZE, core_props.log2_program_counter_size), - PROP(TEXTURE_FEATURES_0, core_props.texture_features[0]), - PROP(TEXTURE_FEATURES_1, core_props.texture_features[1]), - PROP(TEXTURE_FEATURES_2, core_props.texture_features[2]), - PROP(TEXTURE_FEATURES_3, core_props.texture_features[3]), - PROP(GPU_AVAILABLE_MEMORY_SIZE, core_props.gpu_available_memory_size), - PROP(NUM_EXEC_ENGINES, core_props.num_exec_engines), - - PROP(L2_LOG2_LINE_SIZE, l2_props.log2_line_size), - PROP(L2_LOG2_CACHE_SIZE, l2_props.log2_cache_size), - PROP(L2_NUM_L2_SLICES, l2_props.num_l2_slices), - - PROP(TILER_BIN_SIZE_BYTES, tiler_props.bin_size_bytes), - PROP(TILER_MAX_ACTIVE_LEVELS, tiler_props.max_active_levels), - - PROP(MAX_THREADS, thread_props.max_threads), - PROP(MAX_WORKGROUP_SIZE, thread_props.max_workgroup_size), - PROP(MAX_BARRIER_SIZE, thread_props.max_barrier_size), - PROP(MAX_REGISTERS, thread_props.max_registers), - PROP(MAX_TASK_QUEUE, thread_props.max_task_queue), - PROP(MAX_THREAD_GROUP_SPLIT, thread_props.max_thread_group_split), - PROP(IMPL_TECH, thread_props.impl_tech), - PROP(TLS_ALLOC, thread_props.tls_alloc), - - PROP(RAW_SHADER_PRESENT, raw_props.shader_present), - PROP(RAW_TILER_PRESENT, raw_props.tiler_present), - PROP(RAW_L2_PRESENT, raw_props.l2_present), - PROP(RAW_STACK_PRESENT, raw_props.stack_present), - PROP(RAW_L2_FEATURES, raw_props.l2_features), - PROP(RAW_CORE_FEATURES, raw_props.core_features), - PROP(RAW_MEM_FEATURES, raw_props.mem_features), - PROP(RAW_MMU_FEATURES, raw_props.mmu_features), - PROP(RAW_AS_PRESENT, raw_props.as_present), - PROP(RAW_JS_PRESENT, raw_props.js_present), - PROP(RAW_JS_FEATURES_0, raw_props.js_features[0]), - PROP(RAW_JS_FEATURES_1, raw_props.js_features[1]), - PROP(RAW_JS_FEATURES_2, 
raw_props.js_features[2]), - PROP(RAW_JS_FEATURES_3, raw_props.js_features[3]), - PROP(RAW_JS_FEATURES_4, raw_props.js_features[4]), - PROP(RAW_JS_FEATURES_5, raw_props.js_features[5]), - PROP(RAW_JS_FEATURES_6, raw_props.js_features[6]), - PROP(RAW_JS_FEATURES_7, raw_props.js_features[7]), - PROP(RAW_JS_FEATURES_8, raw_props.js_features[8]), - PROP(RAW_JS_FEATURES_9, raw_props.js_features[9]), - PROP(RAW_JS_FEATURES_10, raw_props.js_features[10]), - PROP(RAW_JS_FEATURES_11, raw_props.js_features[11]), - PROP(RAW_JS_FEATURES_12, raw_props.js_features[12]), - PROP(RAW_JS_FEATURES_13, raw_props.js_features[13]), - PROP(RAW_JS_FEATURES_14, raw_props.js_features[14]), - PROP(RAW_JS_FEATURES_15, raw_props.js_features[15]), - PROP(RAW_TILER_FEATURES, raw_props.tiler_features), - PROP(RAW_TEXTURE_FEATURES_0, raw_props.texture_features[0]), - PROP(RAW_TEXTURE_FEATURES_1, raw_props.texture_features[1]), - PROP(RAW_TEXTURE_FEATURES_2, raw_props.texture_features[2]), - PROP(RAW_TEXTURE_FEATURES_3, raw_props.texture_features[3]), - PROP(RAW_GPU_ID, raw_props.gpu_id), - PROP(RAW_THREAD_MAX_THREADS, raw_props.thread_max_threads), - PROP(RAW_THREAD_MAX_WORKGROUP_SIZE, - raw_props.thread_max_workgroup_size), + PROP(PRODUCT_ID, core_props.product_id), + PROP(VERSION_STATUS, core_props.version_status), + PROP(MINOR_REVISION, core_props.minor_revision), + PROP(MAJOR_REVISION, core_props.major_revision), + PROP(GPU_FREQ_KHZ_MAX, core_props.gpu_freq_khz_max), + PROP(LOG2_PROGRAM_COUNTER_SIZE, core_props.log2_program_counter_size), + PROP(TEXTURE_FEATURES_0, core_props.texture_features[0]), + PROP(TEXTURE_FEATURES_1, core_props.texture_features[1]), + PROP(TEXTURE_FEATURES_2, core_props.texture_features[2]), + PROP(TEXTURE_FEATURES_3, core_props.texture_features[3]), + PROP(GPU_AVAILABLE_MEMORY_SIZE, core_props.gpu_available_memory_size), + +#if MALI_USE_CSF +#define BACKWARDS_COMPAT_PROP(name, type) \ + { \ + KBASE_GPUPROP_##name, SIZE_MAX, sizeof(type) \ + } + BACKWARDS_COMPAT_PROP(NUM_EXEC_ENGINES, u8), +#else + PROP(NUM_EXEC_ENGINES, core_props.num_exec_engines), +#endif + + PROP(L2_LOG2_LINE_SIZE, l2_props.log2_line_size), + PROP(L2_LOG2_CACHE_SIZE, l2_props.log2_cache_size), + PROP(L2_NUM_L2_SLICES, l2_props.num_l2_slices), + + PROP(TILER_BIN_SIZE_BYTES, tiler_props.bin_size_bytes), + PROP(TILER_MAX_ACTIVE_LEVELS, tiler_props.max_active_levels), + + PROP(MAX_THREADS, thread_props.max_threads), + PROP(MAX_WORKGROUP_SIZE, thread_props.max_workgroup_size), + PROP(MAX_BARRIER_SIZE, thread_props.max_barrier_size), + PROP(MAX_REGISTERS, thread_props.max_registers), + PROP(MAX_TASK_QUEUE, thread_props.max_task_queue), + PROP(MAX_THREAD_GROUP_SPLIT, thread_props.max_thread_group_split), + PROP(IMPL_TECH, thread_props.impl_tech), + PROP(TLS_ALLOC, thread_props.tls_alloc), + + PROP(RAW_SHADER_PRESENT, raw_props.shader_present), + PROP(RAW_TILER_PRESENT, raw_props.tiler_present), + PROP(RAW_L2_PRESENT, raw_props.l2_present), + PROP(RAW_STACK_PRESENT, raw_props.stack_present), + PROP(RAW_L2_FEATURES, raw_props.l2_features), + PROP(RAW_CORE_FEATURES, raw_props.core_features), + PROP(RAW_MEM_FEATURES, raw_props.mem_features), + PROP(RAW_MMU_FEATURES, raw_props.mmu_features), + PROP(RAW_AS_PRESENT, raw_props.as_present), + PROP(RAW_JS_PRESENT, raw_props.js_present), + PROP(RAW_JS_FEATURES_0, raw_props.js_features[0]), + PROP(RAW_JS_FEATURES_1, raw_props.js_features[1]), + PROP(RAW_JS_FEATURES_2, raw_props.js_features[2]), + PROP(RAW_JS_FEATURES_3, raw_props.js_features[3]), + PROP(RAW_JS_FEATURES_4, 
raw_props.js_features[4]), + PROP(RAW_JS_FEATURES_5, raw_props.js_features[5]), + PROP(RAW_JS_FEATURES_6, raw_props.js_features[6]), + PROP(RAW_JS_FEATURES_7, raw_props.js_features[7]), + PROP(RAW_JS_FEATURES_8, raw_props.js_features[8]), + PROP(RAW_JS_FEATURES_9, raw_props.js_features[9]), + PROP(RAW_JS_FEATURES_10, raw_props.js_features[10]), + PROP(RAW_JS_FEATURES_11, raw_props.js_features[11]), + PROP(RAW_JS_FEATURES_12, raw_props.js_features[12]), + PROP(RAW_JS_FEATURES_13, raw_props.js_features[13]), + PROP(RAW_JS_FEATURES_14, raw_props.js_features[14]), + PROP(RAW_JS_FEATURES_15, raw_props.js_features[15]), + PROP(RAW_TILER_FEATURES, raw_props.tiler_features), + PROP(RAW_TEXTURE_FEATURES_0, raw_props.texture_features[0]), + PROP(RAW_TEXTURE_FEATURES_1, raw_props.texture_features[1]), + PROP(RAW_TEXTURE_FEATURES_2, raw_props.texture_features[2]), + PROP(RAW_TEXTURE_FEATURES_3, raw_props.texture_features[3]), + PROP(RAW_GPU_ID, raw_props.gpu_id), + PROP(RAW_THREAD_MAX_THREADS, raw_props.thread_max_threads), + PROP(RAW_THREAD_MAX_WORKGROUP_SIZE, raw_props.thread_max_workgroup_size), PROP(RAW_THREAD_MAX_BARRIER_SIZE, raw_props.thread_max_barrier_size), - PROP(RAW_THREAD_FEATURES, raw_props.thread_features), - PROP(RAW_COHERENCY_MODE, raw_props.coherency_mode), - PROP(RAW_THREAD_TLS_ALLOC, raw_props.thread_tls_alloc), - PROP(RAW_GPU_FEATURES, raw_props.gpu_features), - PROP(COHERENCY_NUM_GROUPS, coherency_info.num_groups), - PROP(COHERENCY_NUM_CORE_GROUPS, coherency_info.num_core_groups), - PROP(COHERENCY_COHERENCY, coherency_info.coherency), - PROP(COHERENCY_GROUP_0, coherency_info.group[0].core_mask), - PROP(COHERENCY_GROUP_1, coherency_info.group[1].core_mask), - PROP(COHERENCY_GROUP_2, coherency_info.group[2].core_mask), - PROP(COHERENCY_GROUP_3, coherency_info.group[3].core_mask), - PROP(COHERENCY_GROUP_4, coherency_info.group[4].core_mask), - PROP(COHERENCY_GROUP_5, coherency_info.group[5].core_mask), - PROP(COHERENCY_GROUP_6, coherency_info.group[6].core_mask), - PROP(COHERENCY_GROUP_7, coherency_info.group[7].core_mask), - PROP(COHERENCY_GROUP_8, coherency_info.group[8].core_mask), - PROP(COHERENCY_GROUP_9, coherency_info.group[9].core_mask), - PROP(COHERENCY_GROUP_10, coherency_info.group[10].core_mask), - PROP(COHERENCY_GROUP_11, coherency_info.group[11].core_mask), - PROP(COHERENCY_GROUP_12, coherency_info.group[12].core_mask), - PROP(COHERENCY_GROUP_13, coherency_info.group[13].core_mask), - PROP(COHERENCY_GROUP_14, coherency_info.group[14].core_mask), - PROP(COHERENCY_GROUP_15, coherency_info.group[15].core_mask), + PROP(RAW_THREAD_FEATURES, raw_props.thread_features), + PROP(RAW_COHERENCY_MODE, raw_props.coherency_mode), + PROP(RAW_THREAD_TLS_ALLOC, raw_props.thread_tls_alloc), + PROP(RAW_GPU_FEATURES, raw_props.gpu_features), + PROP(COHERENCY_NUM_GROUPS, coherency_info.num_groups), + PROP(COHERENCY_NUM_CORE_GROUPS, coherency_info.num_core_groups), + PROP(COHERENCY_COHERENCY, coherency_info.coherency), + PROP(COHERENCY_GROUP_0, coherency_info.group[0].core_mask), + PROP(COHERENCY_GROUP_1, coherency_info.group[1].core_mask), + PROP(COHERENCY_GROUP_2, coherency_info.group[2].core_mask), + PROP(COHERENCY_GROUP_3, coherency_info.group[3].core_mask), + PROP(COHERENCY_GROUP_4, coherency_info.group[4].core_mask), + PROP(COHERENCY_GROUP_5, coherency_info.group[5].core_mask), + PROP(COHERENCY_GROUP_6, coherency_info.group[6].core_mask), + PROP(COHERENCY_GROUP_7, coherency_info.group[7].core_mask), + PROP(COHERENCY_GROUP_8, coherency_info.group[8].core_mask), + 
PROP(COHERENCY_GROUP_9, coherency_info.group[9].core_mask), + PROP(COHERENCY_GROUP_10, coherency_info.group[10].core_mask), + PROP(COHERENCY_GROUP_11, coherency_info.group[11].core_mask), + PROP(COHERENCY_GROUP_12, coherency_info.group[12].core_mask), + PROP(COHERENCY_GROUP_13, coherency_info.group[13].core_mask), + PROP(COHERENCY_GROUP_14, coherency_info.group[14].core_mask), + PROP(COHERENCY_GROUP_15, coherency_info.group[15].core_mask), #undef PROP }; @@ -805,7 +795,7 @@ int kbase_gpuprops_populate_user_buffer(struct kbase_device *kbdev) } kprops->prop_buffer_size = size; - kprops->prop_buffer = kmalloc(size, GFP_KERNEL); + kprops->prop_buffer = kzalloc(size, GFP_KERNEL); if (!kprops->prop_buffer) { kprops->prop_buffer_size = 0; @@ -822,7 +812,14 @@ int kbase_gpuprops_populate_user_buffer(struct kbase_device *kbdev) for (i = 0; i < count; i++) { u32 type = gpu_property_mapping[i].type; u8 type_size; - void *field = ((u8 *)props) + gpu_property_mapping[i].offset; + const size_t offset = gpu_property_mapping[i].offset; + const u64 dummy_backwards_compat_value = (u64)0; + const void *field; + + if (likely(offset < sizeof(struct base_gpu_props))) + field = ((const u8 *)props) + offset; + else + field = &dummy_backwards_compat_value; switch (gpu_property_mapping[i].size) { case 1: @@ -848,16 +845,16 @@ int kbase_gpuprops_populate_user_buffer(struct kbase_device *kbdev) switch (type_size) { case KBASE_GPUPROP_VALUE_SIZE_U8: - WRITE_U8(*((u8 *)field)); + WRITE_U8(*((const u8 *)field)); break; case KBASE_GPUPROP_VALUE_SIZE_U16: - WRITE_U16(*((u16 *)field)); + WRITE_U16(*((const u16 *)field)); break; case KBASE_GPUPROP_VALUE_SIZE_U32: - WRITE_U32(*((u32 *)field)); + WRITE_U32(*((const u32 *)field)); break; case KBASE_GPUPROP_VALUE_SIZE_U64: - WRITE_U64(*((u64 *)field)); + WRITE_U64(*((const u64 *)field)); break; default: /* Cannot be reached */ WARN_ON(1); diff --git a/mali_kbase/mali_kbase_gwt.c b/mali_kbase/mali_kbase_gwt.c index 16cccee..4914e24 100644 --- a/mali_kbase/mali_kbase_gwt.c +++ b/mali_kbase/mali_kbase_gwt.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2010-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2010-2023 ARM Limited. All rights reserved. 
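The property table above pairs each KBASE_GPUPROP_* identifier with an offsetof()/sizeof() pair into struct base_gpu_props, and the CSF-only BACKWARDS_COMPAT_PROP entries carry an out-of-range offset (SIZE_MAX) so the populate loop emits a dummy zero instead of reading past the structure. A minimal userspace sketch of that table-driven encoding follows (illustrative only, not from this patch; the demo_* names are invented, and the real loop also writes a property-ID/size header before each value):

#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct demo_props {
        uint32_t product_id;
        uint8_t num_exec_engines;
};

struct demo_map {
        size_t offset;  /* offsetof() into demo_props, or SIZE_MAX if retired */
        size_t size;    /* sizeof() the field */
};

static size_t demo_encode(const struct demo_props *p, const struct demo_map *map,
                          size_t count, uint8_t *out)
{
        const uint64_t dummy = 0;       /* written for retired (SIZE_MAX) entries */
        size_t i, pos = 0;

        for (i = 0; i < count; i++) {
                const void *field = (map[i].offset < sizeof(*p)) ?
                        (const void *)((const uint8_t *)p + map[i].offset) : &dummy;

                memcpy(out + pos, field, map[i].size);
                pos += map[i].size;
        }
        return pos;
}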
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -53,17 +53,17 @@ static void kbase_gpu_gwt_setup_pages(struct kbase_context *kctx, unsigned long flag) { kbase_gpu_gwt_setup_page_permission(kctx, flag, - rb_first(&(kctx->reg_rbtree_same))); + rb_first(&kctx->reg_zone[SAME_VA_ZONE].reg_rbtree)); kbase_gpu_gwt_setup_page_permission(kctx, flag, - rb_first(&(kctx->reg_rbtree_custom))); + rb_first(&kctx->reg_zone[CUSTOM_VA_ZONE].reg_rbtree)); } int kbase_gpu_gwt_start(struct kbase_context *kctx) { - kbase_gpu_vm_lock(kctx); + kbase_gpu_vm_lock_with_pmode_sync(kctx); if (kctx->gwt_enabled) { - kbase_gpu_vm_unlock(kctx); + kbase_gpu_vm_unlock_with_pmode_sync(kctx); return -EBUSY; } @@ -90,7 +90,7 @@ int kbase_gpu_gwt_start(struct kbase_context *kctx) kbase_gpu_gwt_setup_pages(kctx, ~KBASE_REG_GPU_WR); - kbase_gpu_vm_unlock(kctx); + kbase_gpu_vm_unlock_with_pmode_sync(kctx); return 0; } @@ -125,14 +125,17 @@ int kbase_gpu_gwt_stop(struct kbase_context *kctx) return 0; } - +#if (KERNEL_VERSION(5, 13, 0) <= LINUX_VERSION_CODE) +static int list_cmp_function(void *priv, const struct list_head *a, const struct list_head *b) +#else static int list_cmp_function(void *priv, struct list_head *a, struct list_head *b) +#endif { - struct kbasep_gwt_list_element *elementA = container_of(a, - struct kbasep_gwt_list_element, link); - struct kbasep_gwt_list_element *elementB = container_of(b, - struct kbasep_gwt_list_element, link); + const struct kbasep_gwt_list_element *elementA = + container_of(a, struct kbasep_gwt_list_element, link); + const struct kbasep_gwt_list_element *elementB = + container_of(b, struct kbasep_gwt_list_element, link); CSTD_UNUSED(priv); diff --git a/mali_kbase/mali_kbase_hw.c b/mali_kbase/mali_kbase_hw.c index 75e4aaf..b07327a 100644 --- a/mali_kbase/mali_kbase_hw.c +++ b/mali_kbase/mali_kbase_hw.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2012-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2012-2023 ARM Limited. All rights reserved. 
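list_cmp_function() above uses the usual pattern for list_sort() comparators that must build on both sides of v5.13, where the list_head parameters became const. A minimal sketch of the same pattern with an invented element type (demo_elem) follows, illustrative only and not from this patch:

#include <linux/kernel.h>
#include <linux/list.h>
#include <linux/types.h>
#include <linux/version.h>

struct demo_elem {
        struct list_head link;
        u64 key;
};

#if (KERNEL_VERSION(5, 13, 0) <= LINUX_VERSION_CODE)
static int demo_cmp(void *priv, const struct list_head *a, const struct list_head *b)
#else
static int demo_cmp(void *priv, struct list_head *a, struct list_head *b)
#endif
{
        const struct demo_elem *ea = container_of(a, struct demo_elem, link);
        const struct demo_elem *eb = container_of(b, struct demo_elem, link);

        /* priv is unused here; return <0, 0 or >0 as list_sort() expects */
        return (ea->key > eb->key) - (ea->key < eb->key);
}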
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -68,9 +68,6 @@ void kbase_hw_set_features_mask(struct kbase_device *kbdev) case GPU_ID2_PRODUCT_TBAX: features = base_hw_features_tBAx; break; - case GPU_ID2_PRODUCT_TDUX: - features = base_hw_features_tDUx; - break; case GPU_ID2_PRODUCT_TODX: case GPU_ID2_PRODUCT_LODX: features = base_hw_features_tODx; @@ -85,6 +82,10 @@ void kbase_hw_set_features_mask(struct kbase_device *kbdev) case GPU_ID2_PRODUCT_LTUX: features = base_hw_features_tTUx; break; + case GPU_ID2_PRODUCT_TTIX: + case GPU_ID2_PRODUCT_LTIX: + features = base_hw_features_tTIx; + break; default: features = base_hw_features_generic; break; @@ -137,8 +138,7 @@ static const enum base_hw_issue *kbase_hw_get_issues_for_new_id( static const struct base_hw_product base_hw_products[] = { { GPU_ID2_PRODUCT_TMIX, - { { GPU_ID2_VERSION_MAKE(0, 0, 1), - base_hw_issues_tMIx_r0p0_05dev0 }, + { { GPU_ID2_VERSION_MAKE(0, 0, 1), base_hw_issues_tMIx_r0p0_05dev0 }, { GPU_ID2_VERSION_MAKE(0, 0, 2), base_hw_issues_tMIx_r0p0 }, { GPU_ID2_VERSION_MAKE(0, 1, 0), base_hw_issues_tMIx_r0p1 }, { U32_MAX /* sentinel value */, NULL } } }, @@ -208,10 +208,6 @@ static const enum base_hw_issue *kbase_hw_get_issues_for_new_id( { GPU_ID2_VERSION_MAKE(0, 0, 2), base_hw_issues_tBAx_r0p0 }, { U32_MAX, NULL } } }, - { GPU_ID2_PRODUCT_TDUX, - { { GPU_ID2_VERSION_MAKE(0, 0, 0), base_hw_issues_tDUx_r0p0 }, - { U32_MAX, NULL } } }, - { GPU_ID2_PRODUCT_TODX, { { GPU_ID2_VERSION_MAKE(0, 0, 0), base_hw_issues_tODx_r0p0 }, { GPU_ID2_VERSION_MAKE(0, 0, 4), base_hw_issues_tODx_r0p0 }, @@ -232,12 +228,27 @@ static const enum base_hw_issue *kbase_hw_get_issues_for_new_id( { GPU_ID2_PRODUCT_TTUX, { { GPU_ID2_VERSION_MAKE(0, 0, 0), base_hw_issues_tTUx_r0p0 }, + { GPU_ID2_VERSION_MAKE(0, 1, 0), base_hw_issues_tTUx_r0p1 }, { GPU_ID2_VERSION_MAKE(1, 0, 0), base_hw_issues_tTUx_r1p0 }, + { GPU_ID2_VERSION_MAKE(1, 1, 0), base_hw_issues_tTUx_r1p1 }, + { GPU_ID2_VERSION_MAKE(1, 2, 0), base_hw_issues_tTUx_r1p2 }, + { GPU_ID2_VERSION_MAKE(1, 3, 0), base_hw_issues_tTUx_r1p3 }, { U32_MAX, NULL } } }, { GPU_ID2_PRODUCT_LTUX, { { GPU_ID2_VERSION_MAKE(0, 0, 0), base_hw_issues_tTUx_r0p0 }, { GPU_ID2_VERSION_MAKE(1, 0, 0), base_hw_issues_tTUx_r1p0 }, + { GPU_ID2_VERSION_MAKE(1, 1, 0), base_hw_issues_tTUx_r1p1 }, + { GPU_ID2_VERSION_MAKE(1, 2, 0), base_hw_issues_tTUx_r1p2 }, + { GPU_ID2_VERSION_MAKE(1, 3, 0), base_hw_issues_tTUx_r1p3 }, + { U32_MAX, NULL } } }, + + { GPU_ID2_PRODUCT_TTIX, + { { GPU_ID2_VERSION_MAKE(0, 0, 0), base_hw_issues_tTIx_r0p0 }, + { U32_MAX, NULL } } }, + + { GPU_ID2_PRODUCT_LTIX, + { { GPU_ID2_VERSION_MAKE(0, 0, 0), base_hw_issues_tTIx_r0p0 }, { U32_MAX, NULL } } }, }; @@ -294,25 +305,20 @@ static const enum base_hw_issue *kbase_hw_get_issues_for_new_id( */ issues = fallback_issues; -#if MALI_CUSTOMER_RELEASE - dev_warn(kbdev->dev, - "GPU hardware issue table may need updating:\n" -#else - dev_info(kbdev->dev, -#endif - "r%dp%d status %d is unknown; treating as r%dp%d status %d", - (gpu_id & GPU_ID2_VERSION_MAJOR) >> - GPU_ID2_VERSION_MAJOR_SHIFT, - (gpu_id & GPU_ID2_VERSION_MINOR) >> - GPU_ID2_VERSION_MINOR_SHIFT, - (gpu_id & GPU_ID2_VERSION_STATUS) >> - GPU_ID2_VERSION_STATUS_SHIFT, - (fallback_version & GPU_ID2_VERSION_MAJOR) >> - GPU_ID2_VERSION_MAJOR_SHIFT, - (fallback_version & GPU_ID2_VERSION_MINOR) >> - GPU_ID2_VERSION_MINOR_SHIFT, - (fallback_version & GPU_ID2_VERSION_STATUS) >> - GPU_ID2_VERSION_STATUS_SHIFT); 
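The product tables above terminate each version list with a U32_MAX sentinel, and the new dev_notice() messages describe falling back to the closest known version when no exact match exists. The sketch below shows the shape of such a sentinel-terminated lookup (illustrative only; the "closest earlier entry" policy is a simplifying assumption, not necessarily the driver's exact rule):

#include <stddef.h>
#include <stdint.h>

struct demo_version_map {
        uint32_t version;       /* packed major/minor/status, UINT32_MAX = sentinel */
        const char *issues;     /* stand-in for the hardware-issues list */
};

static const char *demo_lookup(const struct demo_version_map *map, uint32_t version)
{
        const struct demo_version_map *fallback = NULL;
        size_t i;

        for (i = 0; map[i].version != UINT32_MAX; i++) {
                if (map[i].version == version)
                        return map[i].issues;           /* exact match */
                if (map[i].version < version)
                        fallback = &map[i];             /* remember closest earlier entry */
        }
        return fallback ? fallback->issues : NULL;      /* fallback, or unknown GPU */
}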
+ dev_notice(kbdev->dev, "r%dp%d status %d not found in HW issues table;\n", + (gpu_id & GPU_ID2_VERSION_MAJOR) >> GPU_ID2_VERSION_MAJOR_SHIFT, + (gpu_id & GPU_ID2_VERSION_MINOR) >> GPU_ID2_VERSION_MINOR_SHIFT, + (gpu_id & GPU_ID2_VERSION_STATUS) >> + GPU_ID2_VERSION_STATUS_SHIFT); + dev_notice(kbdev->dev, "falling back to closest match: r%dp%d status %d\n", + (fallback_version & GPU_ID2_VERSION_MAJOR) >> + GPU_ID2_VERSION_MAJOR_SHIFT, + (fallback_version & GPU_ID2_VERSION_MINOR) >> + GPU_ID2_VERSION_MINOR_SHIFT, + (fallback_version & GPU_ID2_VERSION_STATUS) >> + GPU_ID2_VERSION_STATUS_SHIFT); + dev_notice(kbdev->dev, + "Execution proceeding normally with fallback match\n"); gpu_id &= ~GPU_ID2_VERSION; gpu_id |= fallback_version; @@ -338,7 +344,7 @@ int kbase_hw_set_issues_mask(struct kbase_device *kbdev) issues = kbase_hw_get_issues_for_new_id(kbdev); if (issues == NULL) { dev_err(kbdev->dev, - "Unknown GPU ID %x", gpu_id); + "HW product - Unknown GPU ID %x", gpu_id); return -EINVAL; } @@ -382,9 +388,6 @@ int kbase_hw_set_issues_mask(struct kbase_device *kbdev) case GPU_ID2_PRODUCT_TBAX: issues = base_hw_issues_model_tBAx; break; - case GPU_ID2_PRODUCT_TDUX: - issues = base_hw_issues_model_tDUx; - break; case GPU_ID2_PRODUCT_TODX: case GPU_ID2_PRODUCT_LODX: issues = base_hw_issues_model_tODx; @@ -399,10 +402,13 @@ int kbase_hw_set_issues_mask(struct kbase_device *kbdev) case GPU_ID2_PRODUCT_LTUX: issues = base_hw_issues_model_tTUx; break; - + case GPU_ID2_PRODUCT_TTIX: + case GPU_ID2_PRODUCT_LTIX: + issues = base_hw_issues_model_tTIx; + break; default: dev_err(kbdev->dev, - "Unknown GPU ID %x", gpu_id); + "HW issues - Unknown GPU ID %x", gpu_id); return -EINVAL; } } diff --git a/mali_kbase/mali_kbase_hwaccess_jm.h b/mali_kbase/mali_kbase_hwaccess_jm.h index 95d7624..ca77c19 100644 --- a/mali_kbase/mali_kbase_hwaccess_jm.h +++ b/mali_kbase/mali_kbase_hwaccess_jm.h @@ -97,8 +97,8 @@ bool kbase_backend_use_ctx(struct kbase_device *kbdev, * Return: true if context is now active, false otherwise (ie if context does * not have an address space assigned) */ -bool kbase_backend_use_ctx_sched(struct kbase_device *kbdev, - struct kbase_context *kctx, int js); +bool kbase_backend_use_ctx_sched(struct kbase_device *kbdev, struct kbase_context *kctx, + unsigned int js); /** * kbase_backend_release_ctx_irq - Release a context from the GPU. This will @@ -183,8 +183,7 @@ void kbase_backend_reset(struct kbase_device *kbdev, ktime_t *end_timestamp); * * Return: Atom currently at the head of slot @js, or NULL */ -struct kbase_jd_atom *kbase_backend_inspect_tail(struct kbase_device *kbdev, - int js); +struct kbase_jd_atom *kbase_backend_inspect_tail(struct kbase_device *kbdev, unsigned int js); /** * kbase_backend_nr_atoms_on_slot() - Return the number of atoms currently on a @@ -194,7 +193,7 @@ struct kbase_jd_atom *kbase_backend_inspect_tail(struct kbase_device *kbdev, * * Return: Number of atoms currently on slot */ -int kbase_backend_nr_atoms_on_slot(struct kbase_device *kbdev, int js); +int kbase_backend_nr_atoms_on_slot(struct kbase_device *kbdev, unsigned int js); /** * kbase_backend_nr_atoms_submitted() - Return the number of atoms on a slot @@ -204,7 +203,7 @@ int kbase_backend_nr_atoms_on_slot(struct kbase_device *kbdev, int js); * * Return: Number of atoms currently on slot @js that are currently on the GPU. 
*/ -int kbase_backend_nr_atoms_submitted(struct kbase_device *kbdev, int js); +int kbase_backend_nr_atoms_submitted(struct kbase_device *kbdev, unsigned int js); /** * kbase_backend_ctx_count_changed() - Number of contexts ready to submit jobs @@ -233,10 +232,10 @@ void kbase_backend_timeouts_changed(struct kbase_device *kbdev); * * Return: Number of jobs that can be submitted. */ -int kbase_backend_slot_free(struct kbase_device *kbdev, int js); +int kbase_backend_slot_free(struct kbase_device *kbdev, unsigned int js); /** - * kbase_job_check_enter_disjoint - potentially leave disjoint state + * kbase_job_check_leave_disjoint - potentially leave disjoint state * @kbdev: kbase device * @target_katom: atom which is finishing * @@ -287,8 +286,8 @@ u32 kbase_backend_get_current_flush_id(struct kbase_device *kbdev); * Context: * The job slot lock must be held when calling this function. */ -void kbase_job_slot_hardstop(struct kbase_context *kctx, int js, - struct kbase_jd_atom *target_katom); +void kbase_job_slot_hardstop(struct kbase_context *kctx, unsigned int js, + struct kbase_jd_atom *target_katom); /** * kbase_gpu_atoms_submitted_any() - Inspect whether there are any atoms diff --git a/mali_kbase/mali_kbase_hwaccess_pm.h b/mali_kbase/mali_kbase_hwaccess_pm.h index 1c153c4..effb2ff 100644 --- a/mali_kbase/mali_kbase_hwaccess_pm.h +++ b/mali_kbase/mali_kbase_hwaccess_pm.h @@ -209,7 +209,7 @@ int kbase_pm_list_policies(struct kbase_device *kbdev, const struct kbase_pm_policy * const **list); /** - * kbase_protected_most_enable - Enable protected mode + * kbase_pm_protected_mode_enable() - Enable protected mode * * @kbdev: Address of the instance of a GPU platform device. * @@ -218,7 +218,7 @@ int kbase_pm_list_policies(struct kbase_device *kbdev, int kbase_pm_protected_mode_enable(struct kbase_device *kbdev); /** - * kbase_protected_mode_disable - Disable protected mode + * kbase_pm_protected_mode_disable() - Disable protected mode * * @kbdev: Address of the instance of a GPU platform device. * diff --git a/mali_kbase/mali_kbase_hwaccess_time.h b/mali_kbase/mali_kbase_hwaccess_time.h index 27e2cb7..f16348f 100644 --- a/mali_kbase/mali_kbase_hwaccess_time.h +++ b/mali_kbase/mali_kbase_hwaccess_time.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2014, 2018-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2014, 2018-2021, 2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -23,6 +23,56 @@ #define _KBASE_BACKEND_TIME_H_ /** + * struct kbase_backend_time - System timestamp attributes. + * + * @multiplier: Numerator of the converter's fraction. + * @divisor: Denominator of the converter's fraction. + * @offset: Converter's offset term. + * @device_scaled_timeouts: Timeouts in milliseconds that were scaled to be + * consistent with the minimum MCU frequency. This + * array caches the results of all of the conversions + * for ease of use later on. + * + * According to Generic timer spec, system timer: + * - Increments at a fixed frequency + * - Starts operating from zero + * + * Hence CPU time is a linear function of System Time. 
+ * + * CPU_ts = alpha * SYS_ts + beta + * + * Where + * - alpha = 10^9/SYS_ts_freq + * - beta is calculated by two timer samples taken at the same time: + * beta = CPU_ts_s - SYS_ts_s * alpha + * + * Since alpha is a rational number, we minimizing possible + * rounding error by simplifying the ratio. Thus alpha is stored + * as a simple `multiplier / divisor` ratio. + * + */ +struct kbase_backend_time { +#if MALI_USE_CSF + u64 multiplier; + u64 divisor; + s64 offset; +#endif + unsigned int device_scaled_timeouts[KBASE_TIMEOUT_SELECTOR_COUNT]; +}; + +#if MALI_USE_CSF +/** + * kbase_backend_time_convert_gpu_to_cpu() - Convert GPU timestamp to CPU timestamp. + * + * @kbdev: Kbase device pointer + * @gpu_ts: System timestamp value to converter. + * + * Return: The CPU timestamp. + */ +u64 __maybe_unused kbase_backend_time_convert_gpu_to_cpu(struct kbase_device *kbdev, u64 gpu_ts); +#endif + +/** * kbase_backend_get_gpu_time() - Get current GPU time * @kbdev: Device pointer * @cycle_counter: Pointer to u64 to store cycle counter in. @@ -47,7 +97,38 @@ void kbase_backend_get_gpu_time_norequest(struct kbase_device *kbdev, u64 *system_time, struct timespec64 *ts); -#endif /* _KBASE_BACKEND_TIME_H_ */ +/** + * kbase_device_set_timeout_ms - Set an unscaled device timeout in milliseconds, + * subject to the maximum timeout constraint. + * + * @kbdev: KBase device pointer. + * @selector: The specific timeout that should be scaled. + * @timeout_ms: The timeout in cycles which should be scaled. + * + * This function writes the absolute timeout in milliseconds to the table of + * precomputed device timeouts, while estabilishing an upped bound on the individual + * timeout of UINT_MAX milliseconds. + */ +void kbase_device_set_timeout_ms(struct kbase_device *kbdev, enum kbase_timeout_selector selector, + unsigned int timeout_ms); + +/** + * kbase_device_set_timeout - Calculate the given timeout using the provided + * timeout cycles and multiplier. + * + * @kbdev: KBase device pointer. + * @selector: The specific timeout that should be scaled. + * @timeout_cycles: The timeout in cycles which should be scaled. + * @cycle_multiplier: A multiplier applied to the number of cycles, allowing + * the callsite to scale the minimum timeout based on the + * host device. + * + * This function writes the scaled timeout to the per-device table to avoid + * having to recompute the timeouts every single time that the related methods + * are called. + */ +void kbase_device_set_timeout(struct kbase_device *kbdev, enum kbase_timeout_selector selector, + u64 timeout_cycles, u32 cycle_multiplier); /** * kbase_get_timeout_ms - Choose a timeout value to get a timeout scaled @@ -70,3 +151,17 @@ unsigned int kbase_get_timeout_ms(struct kbase_device *kbdev, * Return: Snapshot of the GPU cycle count register. */ u64 kbase_backend_get_cycle_cnt(struct kbase_device *kbdev); + +/** + * kbase_backend_time_init() - Initialize system timestamp converter. + * + * @kbdev: Kbase device pointer + * + * This function should only be called after GPU is powered-up and + * L2 cached power-up has been initiated. + * + * Return: Zero on success, error code otherwise. 
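The struct kbase_backend_time comment above defines CPU_ts = alpha * SYS_ts + beta, with alpha = 10^9 / SYS_ts_freq stored as a reduced multiplier/divisor pair and beta taken from two samples captured at the same instant. A minimal userspace sketch of that arithmetic follows (illustrative only, not the driver's implementation; it assumes a non-zero timer frequency and ignores 64-bit overflow in the multiply):

#include <stdint.h>

struct demo_time_conv {
        uint64_t multiplier;    /* numerator of alpha */
        uint64_t divisor;       /* denominator of alpha */
        int64_t offset;         /* beta */
};

static uint64_t demo_gcd(uint64_t a, uint64_t b)
{
        while (b) {
                uint64_t t = a % b;
                a = b;
                b = t;
        }
        return a;
}

static void demo_time_conv_init(struct demo_time_conv *c, uint64_t timer_hz,
                                uint64_t cpu_ns_sample, uint64_t sys_ts_sample)
{
        uint64_t g = demo_gcd(1000000000ULL, timer_hz);

        c->multiplier = 1000000000ULL / g;
        c->divisor = timer_hz / g;
        /* beta = CPU_ts_s - SYS_ts_s * alpha, from the paired samples */
        c->offset = (int64_t)cpu_ns_sample -
                    (int64_t)(sys_ts_sample * c->multiplier / c->divisor);
}

static uint64_t demo_gpu_to_cpu_ns(const struct demo_time_conv *c, uint64_t gpu_ts)
{
        /* CPU_ts = alpha * SYS_ts + beta */
        return gpu_ts * c->multiplier / c->divisor + (uint64_t)c->offset;
}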
+ */ +int kbase_backend_time_init(struct kbase_device *kbdev); + +#endif /* _KBASE_BACKEND_TIME_H_ */ diff --git a/mali_kbase/mali_kbase_jd.c b/mali_kbase/mali_kbase_jd.c index 97add10..15e30db 100644 --- a/mali_kbase/mali_kbase_jd.c +++ b/mali_kbase/mali_kbase_jd.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2010-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2010-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -28,6 +28,11 @@ #include <linux/version.h> #include <linux/ratelimit.h> #include <linux/priority_control_manager.h> +#if KERNEL_VERSION(4, 11, 0) <= LINUX_VERSION_CODE +#include <linux/sched/signal.h> +#else +#include <linux/signal.h> +#endif #include <mali_kbase_jm.h> #include <mali_kbase_kinstr_jm.h> @@ -35,7 +40,6 @@ #include <tl/mali_kbase_tracepoints.h> #include <mali_linux_trace.h> -#include "mali_kbase_dma_fence.h" #include <mali_kbase_cs_experimental.h> #include <mali_kbase_caps.h> @@ -82,7 +86,7 @@ static void jd_mark_atom_complete(struct kbase_jd_atom *katom) * Returns whether the JS needs a reschedule. * * Note that the caller must also check the atom status and - * if it is KBASE_JD_ATOM_STATE_COMPLETED must call jd_done_nolock + * if it is KBASE_JD_ATOM_STATE_COMPLETED must call kbase_jd_done_nolock */ static bool jd_run_atom(struct kbase_jd_atom *katom) { @@ -148,7 +152,7 @@ void kbase_jd_dep_clear_locked(struct kbase_jd_atom *katom) if (katom->status == KBASE_JD_ATOM_STATE_COMPLETED) { /* The atom has already finished */ - resched |= jd_done_nolock(katom, true); + resched |= kbase_jd_done_nolock(katom, true); } if (resched) @@ -158,15 +162,6 @@ void kbase_jd_dep_clear_locked(struct kbase_jd_atom *katom) void kbase_jd_free_external_resources(struct kbase_jd_atom *katom) { -#ifdef CONFIG_MALI_DMA_FENCE - /* Flush dma-fence workqueue to ensure that any callbacks that may have - * been queued are done before continuing. - * Any successfully completed atom would have had all it's callbacks - * completed before the atom was run, so only flush for failed atoms. 
- */ - if (katom->event_code != BASE_JD_EVENT_DONE) - flush_workqueue(katom->kctx->dma_fence.wq); -#endif /* CONFIG_MALI_DMA_FENCE */ } static void kbase_jd_post_external_resources(struct kbase_jd_atom *katom) @@ -174,10 +169,6 @@ static void kbase_jd_post_external_resources(struct kbase_jd_atom *katom) KBASE_DEBUG_ASSERT(katom); KBASE_DEBUG_ASSERT(katom->core_req & BASE_JD_REQ_EXTERNAL_RESOURCES); -#ifdef CONFIG_MALI_DMA_FENCE - kbase_dma_fence_signal(katom); -#endif /* CONFIG_MALI_DMA_FENCE */ - kbase_gpu_vm_lock(katom->kctx); /* only roll back if extres is non-NULL */ if (katom->extres) { @@ -185,13 +176,7 @@ static void kbase_jd_post_external_resources(struct kbase_jd_atom *katom) res_no = katom->nr_extres; while (res_no-- > 0) { - struct kbase_mem_phy_alloc *alloc = katom->extres[res_no].alloc; - struct kbase_va_region *reg; - - reg = kbase_region_tracker_find_region_base_address( - katom->kctx, - katom->extres[res_no].gpu_address); - kbase_unmap_external_resource(katom->kctx, reg, alloc); + kbase_unmap_external_resource(katom->kctx, katom->extres[res_no]); } kfree(katom->extres); katom->extres = NULL; @@ -207,26 +192,8 @@ static void kbase_jd_post_external_resources(struct kbase_jd_atom *katom) static int kbase_jd_pre_external_resources(struct kbase_jd_atom *katom, const struct base_jd_atom *user_atom) { - int err_ret_val = -EINVAL; + int err = -EINVAL; u32 res_no; -#ifdef CONFIG_MALI_DMA_FENCE - struct kbase_dma_fence_resv_info info = { - .resv_objs = NULL, - .dma_fence_resv_count = 0, - .dma_fence_excl_bitmap = NULL - }; -#if defined(CONFIG_SYNC) || defined(CONFIG_SYNC_FILE) - /* - * When both dma-buf fence and Android native sync is enabled, we - * disable dma-buf fence for contexts that are using Android native - * fences. - */ - const bool implicit_sync = !kbase_ctx_flag(katom->kctx, - KCTX_NO_IMPLICIT_SYNC); -#else /* CONFIG_SYNC || CONFIG_SYNC_FILE*/ - const bool implicit_sync = true; -#endif /* CONFIG_SYNC || CONFIG_SYNC_FILE */ -#endif /* CONFIG_MALI_DMA_FENCE */ struct base_external_resource *input_extres; KBASE_DEBUG_ASSERT(katom); @@ -240,68 +207,32 @@ static int kbase_jd_pre_external_resources(struct kbase_jd_atom *katom, const st if (!katom->extres) return -ENOMEM; - /* copy user buffer to the end of our real buffer. 
- * Make sure the struct sizes haven't changed in a way - * we don't support - */ - BUILD_BUG_ON(sizeof(*input_extres) > sizeof(*katom->extres)); - input_extres = (struct base_external_resource *) - (((unsigned char *)katom->extres) + - (sizeof(*katom->extres) - sizeof(*input_extres)) * - katom->nr_extres); + input_extres = kmalloc_array(katom->nr_extres, sizeof(*input_extres), GFP_KERNEL); + if (!input_extres) { + err = -ENOMEM; + goto failed_input_alloc; + } if (copy_from_user(input_extres, get_compat_pointer(katom->kctx, user_atom->extres_list), sizeof(*input_extres) * katom->nr_extres) != 0) { - err_ret_val = -EINVAL; - goto early_err_out; + err = -EINVAL; + goto failed_input_copy; } -#ifdef CONFIG_MALI_DMA_FENCE - if (implicit_sync) { - info.resv_objs = - kmalloc_array(katom->nr_extres, -#if (KERNEL_VERSION(5, 4, 0) > LINUX_VERSION_CODE) - sizeof(struct reservation_object *), -#else - sizeof(struct dma_resv *), -#endif - GFP_KERNEL); - if (!info.resv_objs) { - err_ret_val = -ENOMEM; - goto early_err_out; - } - - info.dma_fence_excl_bitmap = - kcalloc(BITS_TO_LONGS(katom->nr_extres), - sizeof(unsigned long), GFP_KERNEL); - if (!info.dma_fence_excl_bitmap) { - err_ret_val = -ENOMEM; - goto early_err_out; - } - } -#endif /* CONFIG_MALI_DMA_FENCE */ - /* Take the processes mmap lock */ down_read(kbase_mem_get_process_mmap_lock()); /* need to keep the GPU VM locked while we set up UMM buffers */ kbase_gpu_vm_lock(katom->kctx); for (res_no = 0; res_no < katom->nr_extres; res_no++) { - struct base_external_resource *res = &input_extres[res_no]; + struct base_external_resource *user_res = &input_extres[res_no]; struct kbase_va_region *reg; - struct kbase_mem_phy_alloc *alloc; -#ifdef CONFIG_MALI_DMA_FENCE - bool exclusive; - exclusive = (res->ext_resource & BASE_EXT_RES_ACCESS_EXCLUSIVE) - ? true : false; -#endif reg = kbase_region_tracker_find_region_enclosing_address( - katom->kctx, - res->ext_resource & ~BASE_EXT_RES_ACCESS_EXCLUSIVE); + katom->kctx, user_res->ext_resource & ~BASE_EXT_RES_ACCESS_EXCLUSIVE); /* did we find a matching region object? */ - if (kbase_is_region_invalid_or_free(reg)) { + if (unlikely(kbase_is_region_invalid_or_free(reg))) { /* roll back */ goto failed_loop; } @@ -311,36 +242,11 @@ static int kbase_jd_pre_external_resources(struct kbase_jd_atom *katom, const st katom->atom_flags |= KBASE_KATOM_FLAG_PROTECTED; } - alloc = kbase_map_external_resource(katom->kctx, reg, - current->mm); - if (!alloc) { - err_ret_val = -EINVAL; + err = kbase_map_external_resource(katom->kctx, reg, current->mm); + if (err) goto failed_loop; - } - -#ifdef CONFIG_MALI_DMA_FENCE - if (implicit_sync && - reg->gpu_alloc->type == KBASE_MEM_TYPE_IMPORTED_UMM) { -#if (KERNEL_VERSION(5, 4, 0) > LINUX_VERSION_CODE) - struct reservation_object *resv; -#else - struct dma_resv *resv; -#endif - resv = reg->gpu_alloc->imported.umm.dma_buf->resv; - if (resv) - kbase_dma_fence_add_reservation(resv, &info, - exclusive); - } -#endif /* CONFIG_MALI_DMA_FENCE */ - /* finish with updating out array with the data we found */ - /* NOTE: It is important that this is the last thing we do (or - * at least not before the first write) as we overwrite elements - * as we loop and could be overwriting ourself, so no writes - * until the last read for an element. 
- */ - katom->extres[res_no].gpu_address = reg->start_pfn << PAGE_SHIFT; /* save the start_pfn (as an address, not pfn) to use fast lookup later */ - katom->extres[res_no].alloc = alloc; + katom->extres[res_no] = reg; } /* successfully parsed the extres array */ /* drop the vm lock now */ @@ -349,57 +255,33 @@ static int kbase_jd_pre_external_resources(struct kbase_jd_atom *katom, const st /* Release the processes mmap lock */ up_read(kbase_mem_get_process_mmap_lock()); -#ifdef CONFIG_MALI_DMA_FENCE - if (implicit_sync) { - if (info.dma_fence_resv_count) { - int ret; - - ret = kbase_dma_fence_wait(katom, &info); - if (ret < 0) - goto failed_dma_fence_setup; - } - - kfree(info.resv_objs); - kfree(info.dma_fence_excl_bitmap); - } -#endif /* CONFIG_MALI_DMA_FENCE */ + /* Free the buffer holding data from userspace */ + kfree(input_extres); /* all done OK */ return 0; /* error handling section */ - -#ifdef CONFIG_MALI_DMA_FENCE -failed_dma_fence_setup: - /* Lock the processes mmap lock */ - down_read(kbase_mem_get_process_mmap_lock()); - - /* lock before we unmap */ - kbase_gpu_vm_lock(katom->kctx); -#endif - - failed_loop: - /* undo the loop work */ +failed_loop: + /* undo the loop work. We are guaranteed to have access to the VA region + * as we hold a reference to it until it's unmapped + */ while (res_no-- > 0) { - struct kbase_mem_phy_alloc *alloc = katom->extres[res_no].alloc; + struct kbase_va_region *reg = katom->extres[res_no]; - kbase_unmap_external_resource(katom->kctx, NULL, alloc); + kbase_unmap_external_resource(katom->kctx, reg); } kbase_gpu_vm_unlock(katom->kctx); /* Release the processes mmap lock */ up_read(kbase_mem_get_process_mmap_lock()); - early_err_out: +failed_input_copy: + kfree(input_extres); +failed_input_alloc: kfree(katom->extres); katom->extres = NULL; -#ifdef CONFIG_MALI_DMA_FENCE - if (implicit_sync) { - kfree(info.resv_objs); - kfree(info.dma_fence_excl_bitmap); - } -#endif - return err_ret_val; + return err; } static inline void jd_resolve_dep(struct list_head *out_list, @@ -422,10 +304,6 @@ static inline void jd_resolve_dep(struct list_head *out_list, if (katom->event_code != BASE_JD_EVENT_DONE && (dep_type != BASE_JD_DEP_TYPE_ORDER)) { -#ifdef CONFIG_MALI_DMA_FENCE - kbase_dma_fence_cancel_callbacks(dep_atom); -#endif - dep_atom->event_code = katom->event_code; KBASE_DEBUG_ASSERT(dep_atom->status != KBASE_JD_ATOM_STATE_UNUSED); @@ -439,35 +317,8 @@ static inline void jd_resolve_dep(struct list_head *out_list, (IS_GPU_ATOM(dep_atom) && !ctx_is_dying && !dep_atom->will_fail_event_code && !other_dep_atom->will_fail_event_code))) { - bool dep_satisfied = true; -#ifdef CONFIG_MALI_DMA_FENCE - int dep_count; - - dep_count = kbase_fence_dep_count_read(dep_atom); - if (likely(dep_count == -1)) { - dep_satisfied = true; - } else { - /* - * There are either still active callbacks, or - * all fences for this @dep_atom has signaled, - * but the worker that will queue the atom has - * not yet run. - * - * Wait for the fences to signal and the fence - * worker to run and handle @dep_atom. If - * @dep_atom was completed due to error on - * @katom, then the fence worker will pick up - * the complete status and error code set on - * @dep_atom above. 
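The failed_loop path above unwinds exactly the mappings that succeeded before the failure by walking res_no back down. That partial-rollback idiom is sketched below with invented demo_* names (illustrative only, not from this patch):

#include <stddef.h>

struct demo_res {
        int mapped;
};

static int demo_map_one(struct demo_res *r)
{
        r->mapped = 1;
        return 0;       /* a real implementation could fail here */
}

static void demo_unmap_one(struct demo_res *r)
{
        r->mapped = 0;
}

static int demo_map_all(struct demo_res *res, size_t n)
{
        size_t i;
        int err = 0;

        for (i = 0; i < n; i++) {
                err = demo_map_one(&res[i]);
                if (err)
                        goto undo;
        }
        return 0;

undo:
        while (i-- > 0)         /* unmap only the entries that were mapped */
                demo_unmap_one(&res[i]);
        return err;
}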
- */ - dep_satisfied = false; - } -#endif /* CONFIG_MALI_DMA_FENCE */ - - if (dep_satisfied) { - dep_atom->in_jd_list = true; - list_add_tail(&dep_atom->jd_item, out_list); - } + dep_atom->in_jd_list = true; + list_add_tail(&dep_atom->jd_item, out_list); } } } @@ -526,33 +377,8 @@ static void jd_try_submitting_deps(struct list_head *out_list, dep_atom->dep[0].atom); bool dep1_valid = is_dep_valid( dep_atom->dep[1].atom); - bool dep_satisfied = true; -#ifdef CONFIG_MALI_DMA_FENCE - int dep_count; - - dep_count = kbase_fence_dep_count_read( - dep_atom); - if (likely(dep_count == -1)) { - dep_satisfied = true; - } else { - /* - * There are either still active callbacks, or - * all fences for this @dep_atom has signaled, - * but the worker that will queue the atom has - * not yet run. - * - * Wait for the fences to signal and the fence - * worker to run and handle @dep_atom. If - * @dep_atom was completed due to error on - * @katom, then the fence worker will pick up - * the complete status and error code set on - * @dep_atom above. - */ - dep_satisfied = false; - } -#endif /* CONFIG_MALI_DMA_FENCE */ - if (dep0_valid && dep1_valid && dep_satisfied) { + if (dep0_valid && dep1_valid) { dep_atom->in_jd_list = true; list_add(&dep_atom->jd_item, out_list); } @@ -780,10 +606,13 @@ static void jd_mark_simple_gfx_frame_atoms(struct kbase_jd_atom *katom) } if (dep_fence && dep_vtx) { + unsigned long flags; dev_dbg(kbdev->dev, "Simple gfx frame: {vtx=%pK, wait=%pK}->frag=%pK\n", dep_vtx, dep_fence, katom); + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); katom->atom_flags |= KBASE_KATOM_FLAG_SIMPLE_FRAME_FRAGMENT; dep_vtx->atom_flags |= KBASE_KATOM_FLAG_DEFER_WHILE_POWEROFF; + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); } } @@ -796,7 +625,7 @@ static void jd_mark_simple_gfx_frame_atoms(struct kbase_jd_atom *katom) * * The caller must hold the kbase_jd_context.lock. 
*/ -bool jd_done_nolock(struct kbase_jd_atom *katom, bool post_immediately) +bool kbase_jd_done_nolock(struct kbase_jd_atom *katom, bool post_immediately) { struct kbase_context *kctx = katom->kctx; struct list_head completed_jobs; @@ -804,6 +633,8 @@ bool jd_done_nolock(struct kbase_jd_atom *katom, bool post_immediately) bool need_to_try_schedule_context = false; int i; + lockdep_assert_held(&kctx->jctx.lock); + KBASE_TLSTREAM_TL_JD_DONE_NO_LOCK_START(kctx->kbdev, katom); INIT_LIST_HEAD(&completed_jobs); @@ -855,14 +686,15 @@ bool jd_done_nolock(struct kbase_jd_atom *katom, bool post_immediately) dev_dbg(kctx->kbdev->dev, "Simple-frame fragment atom %pK unblocked\n", node); - node->atom_flags &= - ~KBASE_KATOM_FLAG_SIMPLE_FRAME_FRAGMENT; for (i = 0; i < 2; i++) { if (node->dep[i].atom && node->dep[i].atom->atom_flags & KBASE_KATOM_FLAG_DEFER_WHILE_POWEROFF) { + unsigned long flags; + spin_lock_irqsave(&kctx->kbdev->hwaccess_lock, flags); node->dep[i].atom->atom_flags &= ~KBASE_KATOM_FLAG_DEFER_WHILE_POWEROFF; + spin_unlock_irqrestore(&kctx->kbdev->hwaccess_lock, flags); dev_dbg(kctx->kbdev->dev, " Undeferred atom %pK\n", node->dep[i].atom); @@ -936,7 +768,7 @@ bool jd_done_nolock(struct kbase_jd_atom *katom, bool post_immediately) return need_to_try_schedule_context; } -KBASE_EXPORT_TEST_API(jd_done_nolock); +KBASE_EXPORT_TEST_API(kbase_jd_done_nolock); #if IS_ENABLED(CONFIG_GPU_TRACEPOINTS) enum { @@ -1044,7 +876,6 @@ static bool jd_submit_atom(struct kbase_context *const kctx, katom->jobslot = user_atom->jobslot; katom->seq_nr = user_atom->seq_nr; katom->atom_flags = 0; - katom->retry_count = 0; katom->need_cache_flush_cores_retained = 0; katom->pre_dep = NULL; katom->post_dep = NULL; @@ -1078,9 +909,6 @@ static bool jd_submit_atom(struct kbase_context *const kctx, INIT_LIST_HEAD(&katom->queue); INIT_LIST_HEAD(&katom->jd_item); -#ifdef CONFIG_MALI_DMA_FENCE - kbase_fence_dep_count_set(katom, -1); -#endif /* Don't do anything if there is a mess up with dependencies. * This is done in a separate cycle to check both the dependencies at ones, otherwise @@ -1105,7 +933,7 @@ static bool jd_submit_atom(struct kbase_context *const kctx, * dependencies. */ jd_trace_atom_submit(kctx, katom, NULL); - return jd_done_nolock(katom, true); + return kbase_jd_done_nolock(katom, true); } } } @@ -1169,7 +997,7 @@ static bool jd_submit_atom(struct kbase_context *const kctx, if (err >= 0) kbase_finish_soft_job(katom); } - return jd_done_nolock(katom, true); + return kbase_jd_done_nolock(katom, true); } katom->will_fail_event_code = katom->event_code; @@ -1195,7 +1023,7 @@ static bool jd_submit_atom(struct kbase_context *const kctx, /* Create a new atom. 
*/ jd_trace_atom_submit(kctx, katom, &katom->sched_priority); -#if !MALI_INCREMENTAL_RENDERING +#if !MALI_INCREMENTAL_RENDERING_JM /* Reject atoms for incremental rendering if not supported */ if (katom->core_req & (BASE_JD_REQ_START_RENDERPASS|BASE_JD_REQ_END_RENDERPASS)) { @@ -1203,9 +1031,9 @@ static bool jd_submit_atom(struct kbase_context *const kctx, "Rejecting atom with unsupported core_req 0x%x\n", katom->core_req); katom->event_code = BASE_JD_EVENT_JOB_INVALID; - return jd_done_nolock(katom, true); + return kbase_jd_done_nolock(katom, true); } -#endif /* !MALI_INCREMENTAL_RENDERING */ +#endif /* !MALI_INCREMENTAL_RENDERING_JM */ if (katom->core_req & BASE_JD_REQ_END_RENDERPASS) { WARN_ON(katom->jc != 0); @@ -1217,7 +1045,7 @@ static bool jd_submit_atom(struct kbase_context *const kctx, */ dev_err(kctx->kbdev->dev, "Rejecting atom with jc = NULL\n"); katom->event_code = BASE_JD_EVENT_JOB_INVALID; - return jd_done_nolock(katom, true); + return kbase_jd_done_nolock(katom, true); } /* Reject atoms with an invalid device_nr */ @@ -1227,7 +1055,7 @@ static bool jd_submit_atom(struct kbase_context *const kctx, "Rejecting atom with invalid device_nr %d\n", katom->device_nr); katom->event_code = BASE_JD_EVENT_JOB_INVALID; - return jd_done_nolock(katom, true); + return kbase_jd_done_nolock(katom, true); } /* Reject atoms with invalid core requirements */ @@ -1237,7 +1065,7 @@ static bool jd_submit_atom(struct kbase_context *const kctx, "Rejecting atom with invalid core requirements\n"); katom->event_code = BASE_JD_EVENT_JOB_INVALID; katom->core_req &= ~BASE_JD_REQ_EVENT_COALESCE; - return jd_done_nolock(katom, true); + return kbase_jd_done_nolock(katom, true); } /* Reject soft-job atom of certain types from accessing external resources */ @@ -1248,7 +1076,7 @@ static bool jd_submit_atom(struct kbase_context *const kctx, dev_err(kctx->kbdev->dev, "Rejecting soft-job atom accessing external resources\n"); katom->event_code = BASE_JD_EVENT_JOB_INVALID; - return jd_done_nolock(katom, true); + return kbase_jd_done_nolock(katom, true); } if (katom->core_req & BASE_JD_REQ_EXTERNAL_RESOURCES) { @@ -1256,7 +1084,7 @@ static bool jd_submit_atom(struct kbase_context *const kctx, if (kbase_jd_pre_external_resources(katom, user_atom) != 0) { /* setup failed (no access, bad resource, unknown resource types, etc.) */ katom->event_code = BASE_JD_EVENT_JOB_INVALID; - return jd_done_nolock(katom, true); + return kbase_jd_done_nolock(katom, true); } } @@ -1267,7 +1095,7 @@ static bool jd_submit_atom(struct kbase_context *const kctx, * JIT IDs - atom is invalid. 
*/ katom->event_code = BASE_JD_EVENT_JOB_INVALID; - return jd_done_nolock(katom, true); + return kbase_jd_done_nolock(katom, true); } #endif /* MALI_JIT_PRESSURE_LIMIT_BASE */ @@ -1281,13 +1109,13 @@ static bool jd_submit_atom(struct kbase_context *const kctx, if ((katom->core_req & BASE_JD_REQ_SOFT_JOB) == 0) { if (!kbase_js_is_atom_valid(kctx->kbdev, katom)) { katom->event_code = BASE_JD_EVENT_JOB_INVALID; - return jd_done_nolock(katom, true); + return kbase_jd_done_nolock(katom, true); } } else { /* Soft-job */ if (kbase_prepare_soft_job(katom) != 0) { katom->event_code = BASE_JD_EVENT_JOB_INVALID; - return jd_done_nolock(katom, true); + return kbase_jd_done_nolock(katom, true); } } @@ -1302,16 +1130,10 @@ static bool jd_submit_atom(struct kbase_context *const kctx, if (queued && !IS_GPU_ATOM(katom)) return false; -#ifdef CONFIG_MALI_DMA_FENCE - if (kbase_fence_dep_count_read(katom) != -1) - return false; - -#endif /* CONFIG_MALI_DMA_FENCE */ - if (katom->core_req & BASE_JD_REQ_SOFT_JOB) { if (kbase_process_soft_job(katom) == 0) { kbase_finish_soft_job(katom); - return jd_done_nolock(katom, true); + return kbase_jd_done_nolock(katom, true); } return false; } @@ -1341,7 +1163,7 @@ static bool jd_submit_atom(struct kbase_context *const kctx, } /* This is a pure dependency. Resolve it immediately */ - return jd_done_nolock(katom, true); + return kbase_jd_done_nolock(katom, true); } int kbase_jd_submit(struct kbase_context *kctx, @@ -1379,18 +1201,26 @@ int kbase_jd_submit(struct kbase_context *kctx, return -EINVAL; } + if (nr_atoms > BASE_JD_ATOM_COUNT) { + dev_dbg(kbdev->dev, "Invalid attempt to submit %u atoms at once for kctx %d_%d", + nr_atoms, kctx->tgid, kctx->id); + return -EINVAL; + } + /* All atoms submitted in this call have the same flush ID */ latest_flush = kbase_backend_get_current_flush_id(kbdev); for (i = 0; i < nr_atoms; i++) { - struct base_jd_atom user_atom; + struct base_jd_atom user_atom = { + .seq_nr = 0, + }; struct base_jd_fragment user_jc_incr; struct kbase_jd_atom *katom; if (unlikely(jd_atom_is_v2)) { if (copy_from_user(&user_atom.jc, user_addr, sizeof(struct base_jd_atom_v2)) != 0) { dev_dbg(kbdev->dev, - "Invalid atom address %p passed to job_submit\n", + "Invalid atom address %pK passed to job_submit\n", user_addr); err = -EFAULT; break; @@ -1401,7 +1231,7 @@ int kbase_jd_submit(struct kbase_context *kctx, } else { if (copy_from_user(&user_atom, user_addr, stride) != 0) { dev_dbg(kbdev->dev, - "Invalid atom address %p passed to job_submit\n", + "Invalid atom address %pK passed to job_submit\n", user_addr); err = -EFAULT; break; @@ -1507,6 +1337,12 @@ while (false) kbase_disjoint_event_potential(kbdev); rt_mutex_unlock(&jctx->lock); + if (fatal_signal_pending(current)) { + dev_dbg(kbdev->dev, "Fatal signal pending for kctx %d_%d", + kctx->tgid, kctx->id); + /* We're being killed so the result code doesn't really matter */ + return 0; + } } if (need_to_try_schedule_context) @@ -1598,8 +1434,8 @@ void kbase_jd_done_worker(struct kthread_work *data) kbasep_js_remove_job(kbdev, kctx, katom); rt_mutex_unlock(&js_kctx_info->ctx.jsctx_mutex); rt_mutex_unlock(&js_devdata->queue_mutex); - /* jd_done_nolock() requires the jsctx_mutex lock to be dropped */ - jd_done_nolock(katom, false); + /* kbase_jd_done_nolock() requires the jsctx_mutex lock to be dropped */ + kbase_jd_done_nolock(katom, false); /* katom may have been freed now, do not use! 
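The kbase_jd_submit() changes above add three guards: an upper bound on the number of atoms per call, zero-initialisation of the on-stack copy before copy_from_user(), and an early return once the submitting task has a fatal signal pending. A condensed sketch of that shape follows (illustrative only, not the driver code; the demo_* names and the 256 limit are invented):

#include <linux/errno.h>
#include <linux/sched/signal.h>
#include <linux/types.h>
#include <linux/uaccess.h>

#define DEMO_MAX_ATOMS 256u

static int demo_submit(void __user *user_addr, u32 nr_atoms, size_t stride)
{
        u32 i;

        if (nr_atoms > DEMO_MAX_ATOMS)
                return -EINVAL;

        for (i = 0; i < nr_atoms; i++) {
                u8 entry[64] = { 0 };   /* zero-init, as the patch does for user_atom */

                if (stride > sizeof(entry))
                        return -EINVAL;
                if (copy_from_user(entry, (u8 __user *)user_addr + i * stride, stride))
                        return -EFAULT;

                /* ... validate and queue the entry here ... */

                if (fatal_signal_pending(current))
                        return 0;       /* process is being killed; stop submitting */
        }
        return 0;
}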
*/ @@ -1665,7 +1501,7 @@ void kbase_jd_done_worker(struct kthread_work *data) kbase_js_sched_all(kbdev); if (!atomic_dec_return(&kctx->work_count)) { - /* If worker now idle then post all events that jd_done_nolock() + /* If worker now idle then post all events that kbase_jd_done_nolock() * has queued */ rt_mutex_lock(&jctx->lock); @@ -1711,8 +1547,10 @@ static void jd_cancel_worker(struct kthread_work *data) struct kbase_jd_context *jctx; struct kbase_context *kctx; struct kbasep_js_kctx_info *js_kctx_info; + bool need_to_try_schedule_context; bool attr_state_changed; struct kbase_device *kbdev; + CSTD_UNUSED(need_to_try_schedule_context); /* Soft jobs should never reach this function */ KBASE_DEBUG_ASSERT((katom->core_req & BASE_JD_REQ_SOFT_JOB) == 0); @@ -1738,7 +1576,13 @@ static void jd_cancel_worker(struct kthread_work *data) rt_mutex_lock(&jctx->lock); - jd_done_nolock(katom, true); + need_to_try_schedule_context = kbase_jd_done_nolock(katom, true); + /* Because we're zapping, we're not adding any more jobs to this ctx, so no need to + * schedule the context. There's also no need for the jsctx_mutex to have been taken + * around this too. + */ + KBASE_DEBUG_ASSERT(!need_to_try_schedule_context); + CSTD_UNUSED(need_to_try_schedule_context); /* katom may have been freed now, do not use! */ rt_mutex_unlock(&jctx->lock); @@ -1777,6 +1621,8 @@ void kbase_jd_done(struct kbase_jd_atom *katom, int slot_nr, kbdev = kctx->kbdev; KBASE_DEBUG_ASSERT(kbdev); + lockdep_assert_held(&kbdev->hwaccess_lock); + if (done_code & KBASE_JS_ATOM_DONE_EVICTED_FROM_NEXT) katom->event_code = BASE_JD_EVENT_REMOVED_FROM_NEXT; @@ -1854,20 +1700,8 @@ void kbase_jd_zap_context(struct kbase_context *kctx) kbase_cancel_soft_job(katom); } - -#ifdef CONFIG_MALI_DMA_FENCE - kbase_dma_fence_cancel_all_atoms(kctx); -#endif - rt_mutex_unlock(&kctx->jctx.lock); -#ifdef CONFIG_MALI_DMA_FENCE - /* Flush dma-fence workqueue to ensure that any callbacks that may have - * been queued are done before continuing. - */ - flush_workqueue(kctx->dma_fence.wq); -#endif - #if IS_ENABLED(CONFIG_DEBUG_FS) kbase_debug_job_fault_kctx_unblock(kctx); #endif @@ -1896,11 +1730,10 @@ int kbase_jd_init(struct kbase_context *kctx) kctx->jctx.atoms[i].event_code = BASE_JD_EVENT_JOB_INVALID; kctx->jctx.atoms[i].status = KBASE_JD_ATOM_STATE_UNUSED; -#if defined(CONFIG_MALI_DMA_FENCE) || defined(CONFIG_SYNC_FILE) +#if IS_ENABLED(CONFIG_SYNC_FILE) kctx->jctx.atoms[i].dma_fence.context = dma_fence_context_alloc(1); atomic_set(&kctx->jctx.atoms[i].dma_fence.seqno, 0); - INIT_LIST_HEAD(&kctx->jctx.atoms[i].dma_fence.callbacks); #endif } diff --git a/mali_kbase/mali_kbase_jd_debugfs.c b/mali_kbase/mali_kbase_jd_debugfs.c index f9b41d5..3e0a760 100644 --- a/mali_kbase/mali_kbase_jd_debugfs.c +++ b/mali_kbase/mali_kbase_jd_debugfs.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2014-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2014-2022 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -24,8 +24,7 @@ #include <linux/seq_file.h> #include <mali_kbase.h> #include <mali_kbase_jd_debugfs.h> -#include <mali_kbase_dma_fence.h> -#if defined(CONFIG_SYNC) || defined(CONFIG_SYNC_FILE) +#if IS_ENABLED(CONFIG_SYNC_FILE) #include <mali_kbase_sync.h> #endif #include <uapi/gpu/arm/midgard/mali_kbase_ioctl.h> @@ -38,7 +37,7 @@ struct kbase_jd_debugfs_depinfo { static void kbase_jd_debugfs_fence_info(struct kbase_jd_atom *atom, struct seq_file *sfile) { -#if defined(CONFIG_SYNC) || defined(CONFIG_SYNC_FILE) +#if IS_ENABLED(CONFIG_SYNC_FILE) struct kbase_sync_fence_info info; int res; @@ -58,55 +57,7 @@ static void kbase_jd_debugfs_fence_info(struct kbase_jd_atom *atom, default: break; } -#endif /* CONFIG_SYNC || CONFIG_SYNC_FILE */ - -#ifdef CONFIG_MALI_DMA_FENCE - if (atom->core_req & BASE_JD_REQ_EXTERNAL_RESOURCES) { - struct kbase_fence_cb *cb; - - if (atom->dma_fence.fence) { -#if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) - struct fence *fence = atom->dma_fence.fence; -#else - struct dma_fence *fence = atom->dma_fence.fence; -#endif - - seq_printf(sfile, -#if (KERNEL_VERSION(4, 8, 0) > LINUX_VERSION_CODE) - "Sd(%u#%u: %s) ", -#elif (KERNEL_VERSION(5, 1, 0) > LINUX_VERSION_CODE) - "Sd(%llu#%u: %s) ", -#else - "Sd(%llu#%llu: %s) ", -#endif - fence->context, fence->seqno, - dma_fence_is_signaled(fence) ? "signaled" : - "active"); - } - - list_for_each_entry(cb, &atom->dma_fence.callbacks, - node) { -#if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) - struct fence *fence = cb->fence; -#else - struct dma_fence *fence = cb->fence; -#endif - - seq_printf(sfile, -#if (KERNEL_VERSION(4, 8, 0) > LINUX_VERSION_CODE) - "Wd(%u#%u: %s) ", -#elif (KERNEL_VERSION(5, 1, 0) > LINUX_VERSION_CODE) - "Wd(%llu#%u: %s) ", -#else - "Wd(%llu#%llu: %s) ", -#endif - fence->context, fence->seqno, - dma_fence_is_signaled(fence) ? "signaled" : - "active"); - } - } -#endif /* CONFIG_MALI_DMA_FENCE */ - +#endif /* CONFIG_SYNC_FILE */ } static void kbasep_jd_debugfs_atom_deps( @@ -164,7 +115,7 @@ static int kbasep_jd_debugfs_atoms_show(struct seq_file *sfile, void *data) BASE_UK_VERSION_MINOR); /* Print table heading */ - seq_puts(sfile, " ID, Core req, St, CR, Predeps, Start time, Additional info...\n"); + seq_puts(sfile, " ID, Core req, St, Predeps, Start time, Additional info...\n"); atoms = kctx->jctx.atoms; /* General atom states */ @@ -184,8 +135,8 @@ static int kbasep_jd_debugfs_atoms_show(struct seq_file *sfile, void *data) * it is valid */ if (ktime_to_ns(atom->start_timestamp)) - start_timestamp = ktime_to_ns( - ktime_sub(ktime_get(), atom->start_timestamp)); + start_timestamp = + ktime_to_ns(ktime_sub(ktime_get_raw(), atom->start_timestamp)); kbasep_jd_debugfs_atom_deps(deps, atom); @@ -230,11 +181,7 @@ static const struct file_operations kbasep_jd_debugfs_atoms_fops = { void kbasep_jd_debugfs_ctx_init(struct kbase_context *kctx) { -#if (KERNEL_VERSION(4, 7, 0) <= LINUX_VERSION_CODE) const mode_t mode = 0444; -#else - const mode_t mode = 0400; -#endif /* Caller already ensures this, but we keep the pattern for * maintenance safety. diff --git a/mali_kbase/mali_kbase_jm.c b/mali_kbase/mali_kbase_jm.c index 6cbd6f1..1ac5cd3 100644 --- a/mali_kbase/mali_kbase_jm.c +++ b/mali_kbase/mali_kbase_jm.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2014-2021 ARM Limited. All rights reserved. 
+ * (C) COPYRIGHT 2013-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -37,15 +37,13 @@ * * Return: true if slot can still be submitted on, false if slot is now full. */ -static bool kbase_jm_next_job(struct kbase_device *kbdev, int js, - int nr_jobs_to_submit) +static bool kbase_jm_next_job(struct kbase_device *kbdev, unsigned int js, int nr_jobs_to_submit) { struct kbase_context *kctx; int i; kctx = kbdev->hwaccess.active_kctx[js]; - dev_dbg(kbdev->dev, - "Trying to run the next %d jobs in kctx %pK (s:%d)\n", + dev_dbg(kbdev->dev, "Trying to run the next %d jobs in kctx %pK (s:%u)\n", nr_jobs_to_submit, (void *)kctx, js); if (!kctx) @@ -60,7 +58,7 @@ static bool kbase_jm_next_job(struct kbase_device *kbdev, int js, kbase_backend_run_atom(kbdev, katom); } - dev_dbg(kbdev->dev, "Slot ringbuffer should now be full (s:%d)\n", js); + dev_dbg(kbdev->dev, "Slot ringbuffer should now be full (s:%u)\n", js); return false; } @@ -72,7 +70,7 @@ u32 kbase_jm_kick(struct kbase_device *kbdev, u32 js_mask) dev_dbg(kbdev->dev, "JM kick slot mask 0x%x\n", js_mask); while (js_mask) { - int js = ffs(js_mask) - 1; + unsigned int js = ffs(js_mask) - 1; int nr_jobs_to_submit = kbase_backend_slot_free(kbdev, js); if (kbase_jm_next_job(kbdev, js, nr_jobs_to_submit)) @@ -111,14 +109,14 @@ void kbase_jm_try_kick_all(struct kbase_device *kbdev) void kbase_jm_idle_ctx(struct kbase_device *kbdev, struct kbase_context *kctx) { - int js; + unsigned int js; lockdep_assert_held(&kbdev->hwaccess_lock); for (js = 0; js < BASE_JM_MAX_NR_SLOTS; js++) { if (kbdev->hwaccess.active_kctx[js] == kctx) { - dev_dbg(kbdev->dev, "Marking kctx %pK as inactive (s:%d)\n", - (void *)kctx, js); + dev_dbg(kbdev->dev, "Marking kctx %pK as inactive (s:%u)\n", (void *)kctx, + js); kbdev->hwaccess.active_kctx[js] = NULL; } } diff --git a/mali_kbase/mali_kbase_js.c b/mali_kbase/mali_kbase_js.c index 97af9c6..8d29f87 100644 --- a/mali_kbase/mali_kbase_js.c +++ b/mali_kbase/mali_kbase_js.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2011-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2011-2023 ARM Limited. All rights reserved. 
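kbase_jm_kick() above walks the slot mask with ffs(js_mask) - 1, handling one set bit per pass. A self-contained userspace sketch of that bitmask walk follows (illustrative only; the printf() stands in for submitting jobs):

#include <stdio.h>
#include <strings.h>

static void demo_kick(unsigned int js_mask)
{
        while (js_mask) {
                unsigned int js = (unsigned int)(ffs((int)js_mask) - 1);

                printf("kick slot %u\n", js);   /* stand-in for submitting jobs */
                js_mask &= ~(1u << js);         /* clear the bit just handled */
        }
}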
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -34,7 +34,22 @@ #include "mali_kbase_jm.h" #include "mali_kbase_hwaccess_jm.h" +#include <mali_kbase_hwaccess_time.h> #include <linux/priority_control_manager.h> +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) +#include <mali_kbase_gpu_metrics.h> + +static unsigned long gpu_metrics_tp_emit_interval_ns = DEFAULT_GPU_METRICS_TP_EMIT_INTERVAL_NS; + +module_param(gpu_metrics_tp_emit_interval_ns, ulong, 0444); +MODULE_PARM_DESC(gpu_metrics_tp_emit_interval_ns, + "Time interval in nano seconds at which GPU metrics tracepoints are emitted"); + +unsigned long kbase_gpu_metrics_get_emit_interval(void) +{ + return gpu_metrics_tp_emit_interval_ns; +} +#endif /* * Private types @@ -77,8 +92,7 @@ static kbasep_js_release_result kbasep_js_runpool_release_ctx_internal( struct kbase_device *kbdev, struct kbase_context *kctx, struct kbasep_js_atom_retained_state *katom_retained_state); -static int kbase_js_get_slot(struct kbase_device *kbdev, - struct kbase_jd_atom *katom); +static unsigned int kbase_js_get_slot(struct kbase_device *kbdev, struct kbase_jd_atom *katom); static void kbase_js_foreach_ctx_job(struct kbase_context *kctx, kbasep_js_ctx_job_cb *callback); @@ -101,6 +115,118 @@ static int kbase_ktrace_get_ctx_refcnt(struct kbase_context *kctx) * Private functions */ +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) +/** + * gpu_metrics_timer_callback() - Callback function for the GPU metrics hrtimer + * + * @timer: Pointer to the GPU metrics hrtimer + * + * This function will emit power/gpu_work_period tracepoint for all the active + * GPU metrics contexts. The timer will be restarted if needed. + * + * Return: enum value to indicate that timer should not be restarted. + */ +static enum hrtimer_restart gpu_metrics_timer_callback(struct hrtimer *timer) +{ + struct kbasep_js_device_data *js_devdata = + container_of(timer, struct kbasep_js_device_data, gpu_metrics_timer); + struct kbase_device *kbdev = + container_of(js_devdata, struct kbase_device, js_data); + unsigned long flags; + + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); + kbase_gpu_metrics_emit_tracepoint(kbdev, ktime_get_raw_ns()); + WARN_ON_ONCE(!js_devdata->gpu_metrics_timer_running); + if (js_devdata->gpu_metrics_timer_needed) { + hrtimer_start(&js_devdata->gpu_metrics_timer, + HR_TIMER_DELAY_NSEC(gpu_metrics_tp_emit_interval_ns), + HRTIMER_MODE_REL); + } else + js_devdata->gpu_metrics_timer_running = false; + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + + return HRTIMER_NORESTART; +} + +/** + * gpu_metrics_ctx_init() - Take a reference on GPU metrics context if it exists, + * otherwise allocate and initialise one. + * + * @kctx: Pointer to the Kbase context. + * + * The GPU metrics context represents an "Application" for the purposes of GPU metrics + * reporting. There may be multiple kbase_contexts contributing data to a single GPU + * metrics context. + * This function takes a reference on GPU metrics context if it already exists + * corresponding to the Application that is creating the Kbase context, otherwise + * memory is allocated for it and initialised. + * + * Return: 0 on success, or negative on failure. 
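gpu_metrics_timer_callback() above re-arms its own hrtimer while the "needed" flag is set and always returns HRTIMER_NORESTART. A minimal sketch of that self-rearming callback pattern follows (illustrative only; the demo_* names are invented and the locking and tracepoint emission are omitted):

#include <linux/hrtimer.h>
#include <linux/kernel.h>
#include <linux/ktime.h>
#include <linux/types.h>

struct demo_metrics {
        struct hrtimer timer;
        bool needed;
        u64 period_ns;
};

static enum hrtimer_restart demo_metrics_cb(struct hrtimer *timer)
{
        struct demo_metrics *m = container_of(timer, struct demo_metrics, timer);

        /* ... emit metrics here ... */

        if (m->needed)
                hrtimer_start(&m->timer, ns_to_ktime(m->period_ns), HRTIMER_MODE_REL);

        return HRTIMER_NORESTART;       /* any restart was done explicitly above */
}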
+ */ +static inline int gpu_metrics_ctx_init(struct kbase_context *kctx) +{ + struct kbase_gpu_metrics_ctx *gpu_metrics_ctx; + struct kbase_device *kbdev = kctx->kbdev; + unsigned long flags; + int ret = 0; + + const struct cred *cred = get_current_cred(); + const unsigned int aid = cred->euid.val; + + put_cred(cred); + + /* Return early if this is not a Userspace created context */ + if (unlikely(!kctx->kfile)) + return 0; + + /* Serialize against the other threads trying to create/destroy Kbase contexts. */ + mutex_lock(&kbdev->kctx_list_lock); + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); + gpu_metrics_ctx = kbase_gpu_metrics_ctx_get(kbdev, aid); + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + + if (!gpu_metrics_ctx) { + gpu_metrics_ctx = kmalloc(sizeof(*gpu_metrics_ctx), GFP_KERNEL); + + if (gpu_metrics_ctx) { + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); + kbase_gpu_metrics_ctx_init(kbdev, gpu_metrics_ctx, aid); + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + } else { + dev_err(kbdev->dev, "Allocation for gpu_metrics_ctx failed"); + ret = -ENOMEM; + } + } + + kctx->gpu_metrics_ctx = gpu_metrics_ctx; + mutex_unlock(&kbdev->kctx_list_lock); + + return ret; +} + +/** + * gpu_metrics_ctx_term() - Drop a reference on a GPU metrics context and free it + * if the refcount becomes 0. + * + * @kctx: Pointer to the Kbase context. + */ +static inline void gpu_metrics_ctx_term(struct kbase_context *kctx) +{ + unsigned long flags; + + /* Return early if this is not a Userspace created context */ + if (unlikely(!kctx->kfile)) + return; + + /* Serialize against the other threads trying to create/destroy Kbase contexts. */ + mutex_lock(&kctx->kbdev->kctx_list_lock); + spin_lock_irqsave(&kctx->kbdev->hwaccess_lock, flags); + kbase_gpu_metrics_ctx_put(kctx->kbdev, kctx->gpu_metrics_ctx); + spin_unlock_irqrestore(&kctx->kbdev->hwaccess_lock, flags); + mutex_unlock(&kctx->kbdev->kctx_list_lock); +} +#endif + /** * core_reqs_from_jsn_features - Convert JSn_FEATURES to core requirements * @features: JSn_FEATURE register value @@ -151,8 +277,7 @@ static void kbase_js_sync_timers(struct kbase_device *kbdev) * * Return: true if there are no atoms to pull, false otherwise. */ -static inline bool -jsctx_rb_none_to_pull_prio(struct kbase_context *kctx, int js, int prio) +static inline bool jsctx_rb_none_to_pull_prio(struct kbase_context *kctx, unsigned int js, int prio) { bool none_to_pull; struct jsctx_queue *rb = &kctx->jsctx_queue[prio][js]; @@ -161,9 +286,8 @@ jsctx_rb_none_to_pull_prio(struct kbase_context *kctx, int js, int prio) none_to_pull = RB_EMPTY_ROOT(&rb->runnable_tree); - dev_dbg(kctx->kbdev->dev, - "Slot %d (prio %d) is %spullable in kctx %pK\n", - js, prio, none_to_pull ? "not " : "", kctx); + dev_dbg(kctx->kbdev->dev, "Slot %u (prio %d) is %spullable in kctx %pK\n", js, prio, + none_to_pull ? "not " : "", kctx); return none_to_pull; } @@ -179,8 +303,7 @@ jsctx_rb_none_to_pull_prio(struct kbase_context *kctx, int js, int prio) * Return: true if the ring buffers for all priorities have no pullable atoms, * false otherwise. */ -static inline bool -jsctx_rb_none_to_pull(struct kbase_context *kctx, int js) +static inline bool jsctx_rb_none_to_pull(struct kbase_context *kctx, unsigned int js) { int prio; @@ -212,8 +335,8 @@ jsctx_rb_none_to_pull(struct kbase_context *kctx, int js) * * The HW access lock must always be held when calling this function. 
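gpu_metrics_ctx_init() above looks up an existing per-application metrics context and allocates one only when the lookup fails, so several kbase contexts created by the same application share it. The simplified get-or-create sketch below uses invented demo_* names and a single mutex over a plain list instead of the driver's kctx_list_lock/hwaccess_lock pairing (illustrative only, not from this patch):

#include <linux/mutex.h>
#include <linux/slab.h>
#include <linux/types.h>

struct demo_app_ctx {
        u32 aid;                /* application id the context is keyed on */
        u32 refcount;
        struct demo_app_ctx *next;
};

static DEFINE_MUTEX(demo_lock);
static struct demo_app_ctx *demo_list;

static struct demo_app_ctx *demo_ctx_get_or_create(u32 aid)
{
        struct demo_app_ctx *c;

        mutex_lock(&demo_lock);
        for (c = demo_list; c; c = c->next) {
                if (c->aid == aid) {
                        c->refcount++;
                        goto out;       /* existing context, just take a reference */
                }
        }

        c = kzalloc(sizeof(*c), GFP_KERNEL);
        if (c) {
                c->aid = aid;
                c->refcount = 1;
                c->next = demo_list;
                demo_list = c;
        }
out:
        mutex_unlock(&demo_lock);
        return c;
}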
*/ -static void jsctx_queue_foreach_prio(struct kbase_context *kctx, int js, - int prio, kbasep_js_ctx_job_cb *callback) +static void jsctx_queue_foreach_prio(struct kbase_context *kctx, unsigned int js, int prio, + kbasep_js_ctx_job_cb *callback) { struct jsctx_queue *queue = &kctx->jsctx_queue[prio][js]; @@ -272,7 +395,7 @@ static void jsctx_queue_foreach_prio(struct kbase_context *kctx, int js, * jsctx_queue_foreach_prio() to iterate over the queue and invoke @callback * for each entry, and remove the entry from the queue. */ -static inline void jsctx_queue_foreach(struct kbase_context *kctx, int js, +static inline void jsctx_queue_foreach(struct kbase_context *kctx, unsigned int js, kbasep_js_ctx_job_cb *callback) { int prio; @@ -293,15 +416,14 @@ static inline void jsctx_queue_foreach(struct kbase_context *kctx, int js, * * Return: Pointer to next atom in buffer, or NULL if there is no atom. */ -static inline struct kbase_jd_atom * -jsctx_rb_peek_prio(struct kbase_context *kctx, int js, int prio) +static inline struct kbase_jd_atom *jsctx_rb_peek_prio(struct kbase_context *kctx, unsigned int js, + int prio) { struct jsctx_queue *rb = &kctx->jsctx_queue[prio][js]; struct rb_node *node; lockdep_assert_held(&kctx->kbdev->hwaccess_lock); - dev_dbg(kctx->kbdev->dev, - "Peeking runnable tree of kctx %pK for prio %d (s:%d)\n", + dev_dbg(kctx->kbdev->dev, "Peeking runnable tree of kctx %pK for prio %d (s:%u)\n", (void *)kctx, prio, js); node = rb_first(&rb->runnable_tree); @@ -326,8 +448,7 @@ jsctx_rb_peek_prio(struct kbase_context *kctx, int js, int prio) * * Return: Pointer to next atom in buffer, or NULL if there is no atom. */ -static inline struct kbase_jd_atom * -jsctx_rb_peek(struct kbase_context *kctx, int js) +static inline struct kbase_jd_atom *jsctx_rb_peek(struct kbase_context *kctx, unsigned int js) { int prio; @@ -358,7 +479,7 @@ static inline void jsctx_rb_pull(struct kbase_context *kctx, struct kbase_jd_atom *katom) { int prio = katom->sched_priority; - int js = katom->slot_nr; + unsigned int js = katom->slot_nr; struct jsctx_queue *rb = &kctx->jsctx_queue[prio][js]; lockdep_assert_held(&kctx->kbdev->hwaccess_lock); @@ -377,14 +498,14 @@ jsctx_tree_add(struct kbase_context *kctx, struct kbase_jd_atom *katom) { struct kbase_device *kbdev = kctx->kbdev; int prio = katom->sched_priority; - int js = katom->slot_nr; + unsigned int js = katom->slot_nr; struct jsctx_queue *queue = &kctx->jsctx_queue[prio][js]; struct rb_node **new = &(queue->runnable_tree.rb_node), *parent = NULL; lockdep_assert_held(&kctx->kbdev->hwaccess_lock); - dev_dbg(kbdev->dev, "Adding atom %pK to runnable tree of kctx %pK (s:%d)\n", - (void *)katom, (void *)kctx, js); + dev_dbg(kbdev->dev, "Adding atom %pK to runnable tree of kctx %pK (s:%u)\n", (void *)katom, + (void *)kctx, js); while (*new) { struct kbase_jd_atom *entry = container_of(*new, @@ -425,15 +546,11 @@ jsctx_rb_unpull(struct kbase_context *kctx, struct kbase_jd_atom *katom) jsctx_tree_add(kctx, katom); } -static bool kbase_js_ctx_pullable(struct kbase_context *kctx, - int js, - bool is_scheduled); +static bool kbase_js_ctx_pullable(struct kbase_context *kctx, unsigned int js, bool is_scheduled); static bool kbase_js_ctx_list_add_pullable_nolock(struct kbase_device *kbdev, - struct kbase_context *kctx, - int js); + struct kbase_context *kctx, unsigned int js); static bool kbase_js_ctx_list_add_unpullable_nolock(struct kbase_device *kbdev, - struct kbase_context *kctx, - int js); + struct kbase_context *kctx, unsigned int js); typedef 
bool(katom_ordering_func)(const struct kbase_jd_atom *, const struct kbase_jd_atom *); @@ -541,6 +658,7 @@ int kbasep_js_devdata_init(struct kbase_device * const kbdev) jsdd->gpu_reset_ticks_dumping = DEFAULT_JS_RESET_TICKS_DUMPING; jsdd->ctx_timeslice_ns = DEFAULT_JS_CTX_TIMESLICE_NS; atomic_set(&jsdd->soft_job_timeout_ms, DEFAULT_JS_SOFT_JOB_TIMEOUT); + jsdd->js_free_wait_time_ms = kbase_get_timeout_ms(kbdev, JM_DEFAULT_JS_FREE_TIMEOUT); dev_dbg(kbdev->dev, "JS Config Attribs: "); dev_dbg(kbdev->dev, "\tscheduling_period_ns:%u", @@ -565,6 +683,7 @@ int kbasep_js_devdata_init(struct kbase_device * const kbdev) jsdd->ctx_timeslice_ns); dev_dbg(kbdev->dev, "\tsoft_job_timeout:%i", atomic_read(&jsdd->soft_job_timeout_ms)); + dev_dbg(kbdev->dev, "\tjs_free_wait_time_ms:%u", jsdd->js_free_wait_time_ms); if (!(jsdd->soft_stop_ticks < jsdd->hard_stop_ticks_ss && jsdd->hard_stop_ticks_ss < jsdd->gpu_reset_ticks_ss && @@ -609,6 +728,21 @@ int kbasep_js_devdata_init(struct kbase_device * const kbdev) } } +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) + if (!gpu_metrics_tp_emit_interval_ns || (gpu_metrics_tp_emit_interval_ns > NSEC_PER_SEC)) { + dev_warn( + kbdev->dev, + "Invalid value (%lu ns) for module param gpu_metrics_tp_emit_interval_ns. Using default value: %u ns", + gpu_metrics_tp_emit_interval_ns, DEFAULT_GPU_METRICS_TP_EMIT_INTERVAL_NS); + gpu_metrics_tp_emit_interval_ns = DEFAULT_GPU_METRICS_TP_EMIT_INTERVAL_NS; + } + + hrtimer_init(&jsdd->gpu_metrics_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL); + jsdd->gpu_metrics_timer.function = gpu_metrics_timer_callback; + jsdd->gpu_metrics_timer_needed = false; + jsdd->gpu_metrics_timer_running = false; +#endif + return 0; } @@ -619,8 +753,9 @@ void kbasep_js_devdata_halt(struct kbase_device *kbdev) void kbasep_js_devdata_term(struct kbase_device *kbdev) { - s8 zero_ctx_attr_ref_count[KBASEP_JS_CTX_ATTR_COUNT] = { 0, }; struct kbasep_js_device_data *js_devdata = &kbdev->js_data; + s8 zero_ctx_attr_ref_count[KBASEP_JS_CTX_ATTR_COUNT] = { 0, }; + CSTD_UNUSED(js_devdata); KBASE_DEBUG_ASSERT(kbdev != NULL); @@ -632,15 +767,31 @@ void kbasep_js_devdata_term(struct kbase_device *kbdev) zero_ctx_attr_ref_count, sizeof(zero_ctx_attr_ref_count)) == 0); CSTD_UNUSED(zero_ctx_attr_ref_count); + +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) + js_devdata->gpu_metrics_timer_needed = false; + hrtimer_cancel(&js_devdata->gpu_metrics_timer); +#endif } int kbasep_js_kctx_init(struct kbase_context *const kctx) { struct kbasep_js_kctx_info *js_kctx_info; int i, j; + int ret; + CSTD_UNUSED(js_kctx_info); KBASE_DEBUG_ASSERT(kctx != NULL); + CSTD_UNUSED(ret); +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) + ret = gpu_metrics_ctx_init(kctx); + if (ret) + return ret; +#endif + + kbase_ctx_sched_init_ctx(kctx); + for (i = 0; i < BASE_JM_MAX_NR_SLOTS; ++i) INIT_LIST_HEAD(&kctx->jctx.sched_info.ctx.ctx_list_entry[i]); @@ -679,9 +830,10 @@ void kbasep_js_kctx_term(struct kbase_context *kctx) { struct kbase_device *kbdev; struct kbasep_js_kctx_info *js_kctx_info; - int js; + unsigned int js; bool update_ctx_count = false; unsigned long flags; + CSTD_UNUSED(js_kctx_info); KBASE_DEBUG_ASSERT(kctx != NULL); @@ -717,6 +869,11 @@ void kbasep_js_kctx_term(struct kbase_context *kctx) kbase_backend_ctx_count_changed(kbdev); mutex_unlock(&kbdev->js_data.runpool_mutex); } + + kbase_ctx_sched_remove_ctx(kctx); +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) + gpu_metrics_ctx_term(kctx); +#endif } /* @@ -724,8 +881,8 @@ void kbasep_js_kctx_term(struct 
kbase_context *kctx) */ /* Should not normally use directly - use kbase_jsctx_slot_atom_pulled_dec() instead */ -static void kbase_jsctx_slot_prio_blocked_clear(struct kbase_context *kctx, - int js, int sched_prio) +static void kbase_jsctx_slot_prio_blocked_clear(struct kbase_context *kctx, unsigned int js, + int sched_prio) { struct kbase_jsctx_slot_tracking *slot_tracking = &kctx->slot_tracking[js]; @@ -737,7 +894,7 @@ static void kbase_jsctx_slot_prio_blocked_clear(struct kbase_context *kctx, NULL, 0, js, (unsigned int)sched_prio); } -static int kbase_jsctx_slot_atoms_pulled(struct kbase_context *kctx, int js) +static int kbase_jsctx_slot_atoms_pulled(struct kbase_context *kctx, unsigned int js) { return atomic_read(&kctx->slot_tracking[js].atoms_pulled); } @@ -747,7 +904,7 @@ static int kbase_jsctx_slot_atoms_pulled(struct kbase_context *kctx, int js) * - that priority level is blocked * - or, any higher priority level is blocked */ -static bool kbase_jsctx_slot_prio_is_blocked(struct kbase_context *kctx, int js, +static bool kbase_jsctx_slot_prio_is_blocked(struct kbase_context *kctx, unsigned int js, int sched_prio) { struct kbase_jsctx_slot_tracking *slot_tracking = @@ -787,7 +944,7 @@ static bool kbase_jsctx_slot_prio_is_blocked(struct kbase_context *kctx, int js, static int kbase_jsctx_slot_atom_pulled_inc(struct kbase_context *kctx, const struct kbase_jd_atom *katom) { - int js = katom->slot_nr; + unsigned int js = katom->slot_nr; int sched_prio = katom->sched_priority; struct kbase_jsctx_slot_tracking *slot_tracking = &kctx->slot_tracking[js]; @@ -796,7 +953,7 @@ static int kbase_jsctx_slot_atom_pulled_inc(struct kbase_context *kctx, lockdep_assert_held(&kctx->kbdev->hwaccess_lock); WARN(kbase_jsctx_slot_prio_is_blocked(kctx, js, sched_prio), - "Should not have pulled atoms for slot %d from a context that is blocked at priority %d or higher", + "Should not have pulled atoms for slot %u from a context that is blocked at priority %d or higher", js, sched_prio); nr_atoms_pulled = atomic_inc_return(&kctx->atoms_pulled_all_slots); @@ -825,7 +982,7 @@ static int kbase_jsctx_slot_atom_pulled_inc(struct kbase_context *kctx, static bool kbase_jsctx_slot_atom_pulled_dec(struct kbase_context *kctx, const struct kbase_jd_atom *katom) { - int js = katom->slot_nr; + unsigned int js = katom->slot_nr; int sched_prio = katom->sched_priority; int atoms_pulled_pri; struct kbase_jsctx_slot_tracking *slot_tracking = @@ -874,14 +1031,12 @@ static bool kbase_jsctx_slot_atom_pulled_dec(struct kbase_context *kctx, * Return: true if caller should call kbase_backend_ctx_count_changed() */ static bool kbase_js_ctx_list_add_pullable_nolock(struct kbase_device *kbdev, - struct kbase_context *kctx, - int js) + struct kbase_context *kctx, unsigned int js) { bool ret = false; lockdep_assert_held(&kbdev->hwaccess_lock); - dev_dbg(kbdev->dev, "Add pullable tail kctx %pK (s:%d)\n", - (void *)kctx, js); + dev_dbg(kbdev->dev, "Add pullable tail kctx %pK (s:%u)\n", (void *)kctx, js); if (!list_empty(&kctx->jctx.sched_info.ctx.ctx_list_entry[js])) list_del_init(&kctx->jctx.sched_info.ctx.ctx_list_entry[js]); @@ -916,14 +1071,13 @@ static bool kbase_js_ctx_list_add_pullable_nolock(struct kbase_device *kbdev, * * Return: true if caller should call kbase_backend_ctx_count_changed() */ -static bool kbase_js_ctx_list_add_pullable_head_nolock( - struct kbase_device *kbdev, struct kbase_context *kctx, int js) +static bool kbase_js_ctx_list_add_pullable_head_nolock(struct kbase_device *kbdev, + struct kbase_context *kctx, 
unsigned int js) { bool ret = false; lockdep_assert_held(&kbdev->hwaccess_lock); - dev_dbg(kbdev->dev, "Add pullable head kctx %pK (s:%d)\n", - (void *)kctx, js); + dev_dbg(kbdev->dev, "Add pullable head kctx %pK (s:%u)\n", (void *)kctx, js); if (!list_empty(&kctx->jctx.sched_info.ctx.ctx_list_entry[js])) list_del_init(&kctx->jctx.sched_info.ctx.ctx_list_entry[js]); @@ -961,8 +1115,7 @@ static bool kbase_js_ctx_list_add_pullable_head_nolock( * Return: true if caller should call kbase_backend_ctx_count_changed() */ static bool kbase_js_ctx_list_add_pullable_head(struct kbase_device *kbdev, - struct kbase_context *kctx, - int js) + struct kbase_context *kctx, unsigned int js) { bool ret; unsigned long flags; @@ -992,14 +1145,12 @@ static bool kbase_js_ctx_list_add_pullable_head(struct kbase_device *kbdev, * Return: true if caller should call kbase_backend_ctx_count_changed() */ static bool kbase_js_ctx_list_add_unpullable_nolock(struct kbase_device *kbdev, - struct kbase_context *kctx, - int js) + struct kbase_context *kctx, unsigned int js) { bool ret = false; lockdep_assert_held(&kbdev->hwaccess_lock); - dev_dbg(kbdev->dev, "Add unpullable tail kctx %pK (s:%d)\n", - (void *)kctx, js); + dev_dbg(kbdev->dev, "Add unpullable tail kctx %pK (s:%u)\n", (void *)kctx, js); list_move_tail(&kctx->jctx.sched_info.ctx.ctx_list_entry[js], &kbdev->js_data.ctx_list_unpullable[js][kctx->priority]); @@ -1034,9 +1185,8 @@ static bool kbase_js_ctx_list_add_unpullable_nolock(struct kbase_device *kbdev, * * Return: true if caller should call kbase_backend_ctx_count_changed() */ -static bool kbase_js_ctx_list_remove_nolock(struct kbase_device *kbdev, - struct kbase_context *kctx, - int js) +static bool kbase_js_ctx_list_remove_nolock(struct kbase_device *kbdev, struct kbase_context *kctx, + unsigned int js) { bool ret = false; @@ -1072,9 +1222,8 @@ static bool kbase_js_ctx_list_remove_nolock(struct kbase_device *kbdev, * Return: Context to use for specified slot. * NULL if no contexts present for specified slot */ -static struct kbase_context *kbase_js_ctx_list_pop_head_nolock( - struct kbase_device *kbdev, - int js) +static struct kbase_context *kbase_js_ctx_list_pop_head_nolock(struct kbase_device *kbdev, + unsigned int js) { struct kbase_context *kctx; int i; @@ -1090,9 +1239,8 @@ static struct kbase_context *kbase_js_ctx_list_pop_head_nolock( jctx.sched_info.ctx.ctx_list_entry[js]); list_del_init(&kctx->jctx.sched_info.ctx.ctx_list_entry[js]); - dev_dbg(kbdev->dev, - "Popped %pK from the pullable queue (s:%d)\n", - (void *)kctx, js); + dev_dbg(kbdev->dev, "Popped %pK from the pullable queue (s:%u)\n", (void *)kctx, + js); return kctx; } return NULL; @@ -1107,8 +1255,7 @@ static struct kbase_context *kbase_js_ctx_list_pop_head_nolock( * Return: Context to use for specified slot. 
* NULL if no contexts present for specified slot */ -static struct kbase_context *kbase_js_ctx_list_pop_head( - struct kbase_device *kbdev, int js) +static struct kbase_context *kbase_js_ctx_list_pop_head(struct kbase_device *kbdev, unsigned int js) { struct kbase_context *kctx; unsigned long flags; @@ -1132,8 +1279,7 @@ static struct kbase_context *kbase_js_ctx_list_pop_head( * Return: true if context can be pulled from on specified slot * false otherwise */ -static bool kbase_js_ctx_pullable(struct kbase_context *kctx, int js, - bool is_scheduled) +static bool kbase_js_ctx_pullable(struct kbase_context *kctx, unsigned int js, bool is_scheduled) { struct kbasep_js_device_data *js_devdata; struct kbase_jd_atom *katom; @@ -1152,8 +1298,7 @@ static bool kbase_js_ctx_pullable(struct kbase_context *kctx, int js, } katom = jsctx_rb_peek(kctx, js); if (!katom) { - dev_dbg(kbdev->dev, "JS: No pullable atom in kctx %pK (s:%d)\n", - (void *)kctx, js); + dev_dbg(kbdev->dev, "JS: No pullable atom in kctx %pK (s:%u)\n", (void *)kctx, js); return false; /* No pullable atoms */ } if (kbase_jsctx_slot_prio_is_blocked(kctx, js, katom->sched_priority)) { @@ -1161,7 +1306,7 @@ static bool kbase_js_ctx_pullable(struct kbase_context *kctx, int js, kctx->kbdev, JS_SLOT_PRIO_IS_BLOCKED, kctx, katom, katom->jc, js, (unsigned int)katom->sched_priority); dev_dbg(kbdev->dev, - "JS: kctx %pK is blocked from submitting atoms at priority %d and lower (s:%d)\n", + "JS: kctx %pK is blocked from submitting atoms at priority %d and lower (s:%u)\n", (void *)kctx, katom->sched_priority, js); return false; } @@ -1182,14 +1327,14 @@ static bool kbase_js_ctx_pullable(struct kbase_context *kctx, int js, if ((katom->atom_flags & KBASE_KATOM_FLAG_FAIL_BLOCKER) && kbase_backend_nr_atoms_on_slot(kctx->kbdev, js)) { dev_dbg(kbdev->dev, - "JS: Atom %pK has cross-slot fail dependency and atoms on slot (s:%d)\n", + "JS: Atom %pK has cross-slot fail dependency and atoms on slot (s:%u)\n", (void *)katom, js); return false; } } - dev_dbg(kbdev->dev, "JS: Atom %pK is pullable in kctx %pK (s:%d)\n", - (void *)katom, (void *)kctx, js); + dev_dbg(kbdev->dev, "JS: Atom %pK is pullable in kctx %pK (s:%u)\n", (void *)katom, + (void *)kctx, js); return true; } @@ -1200,7 +1345,7 @@ static bool kbase_js_dep_validate(struct kbase_context *kctx, struct kbase_device *kbdev = kctx->kbdev; bool ret = true; bool has_dep = false, has_x_dep = false; - int js = kbase_js_get_slot(kbdev, katom); + unsigned int js = kbase_js_get_slot(kbdev, katom); int prio = katom->sched_priority; int i; @@ -1208,7 +1353,7 @@ static bool kbase_js_dep_validate(struct kbase_context *kctx, struct kbase_jd_atom *dep_atom = katom->dep[i].atom; if (dep_atom) { - int dep_js = kbase_js_get_slot(kbdev, dep_atom); + unsigned int dep_js = kbase_js_get_slot(kbdev, dep_atom); int dep_prio = dep_atom->sched_priority; dev_dbg(kbdev->dev, @@ -1363,7 +1508,7 @@ static bool kbase_js_dep_validate(struct kbase_context *kctx, void kbase_js_set_ctx_priority(struct kbase_context *kctx, int new_priority) { struct kbase_device *kbdev = kctx->kbdev; - int js; + unsigned int js; lockdep_assert_held(&kbdev->hwaccess_lock); @@ -1789,10 +1934,12 @@ static kbasep_js_release_result kbasep_js_runpool_release_ctx_internal( unsigned long flags; struct kbasep_js_device_data *js_devdata; struct kbasep_js_kctx_info *js_kctx_info; + int kctx_as_nr = kctx->as_nr; kbasep_js_release_result release_result = 0u; bool runpool_ctx_attr_change = false; int new_ref_count; + CSTD_UNUSED(kctx_as_nr); 
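The gpu_metrics_tp_emit_interval_ns parameter and gpu_metrics_timer_callback() added earlier in this patch follow a common self-rearming hrtimer pattern: validate or clamp the module parameter, arm the timer once, then re-arm it from the callback under a lock while a "needed" flag is set, returning HRTIMER_NORESTART because the restart is done explicitly. A minimal standalone sketch of that pattern follows; every name prefixed with example_ is hypothetical and the 500 ms default is purely illustrative, not the driver's DEFAULT_GPU_METRICS_TP_EMIT_INTERVAL_NS.

#include <linux/hrtimer.h>
#include <linux/ktime.h>
#include <linux/module.h>
#include <linux/spinlock.h>

static unsigned long example_emit_interval_ns = 500 * NSEC_PER_MSEC;
module_param(example_emit_interval_ns, ulong, 0444);
MODULE_PARM_DESC(example_emit_interval_ns, "Emit interval in nanoseconds");

static struct hrtimer example_timer;
static bool example_timer_needed;
static DEFINE_SPINLOCK(example_lock);

static enum hrtimer_restart example_timer_cb(struct hrtimer *timer)
{
	unsigned long flags;

	spin_lock_irqsave(&example_lock, flags);
	/* Emit the periodic event (e.g. a tracepoint) here. */
	if (example_timer_needed)
		hrtimer_start(timer, ns_to_ktime(example_emit_interval_ns),
			      HRTIMER_MODE_REL);
	spin_unlock_irqrestore(&example_lock, flags);

	/* The timer was re-armed by hand above, so never ask the core to restart it. */
	return HRTIMER_NORESTART;
}

static void example_timer_setup(void)
{
	/* Clamp an out-of-range module parameter back to the default, as the patch does. */
	if (!example_emit_interval_ns || example_emit_interval_ns > NSEC_PER_SEC)
		example_emit_interval_ns = 500 * NSEC_PER_MSEC;

	hrtimer_init(&example_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
	example_timer.function = example_timer_cb;
	example_timer_needed = true;
	hrtimer_start(&example_timer, ns_to_ktime(example_emit_interval_ns),
		      HRTIMER_MODE_REL);
}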
KBASE_DEBUG_ASSERT(kbdev != NULL); KBASE_DEBUG_ASSERT(kctx != NULL); @@ -1809,7 +1956,7 @@ static kbasep_js_release_result kbasep_js_runpool_release_ctx_internal( * * Assert about out calling contract */ - mutex_lock(&kbdev->pm.lock); + rt_mutex_lock(&kbdev->pm.lock); spin_lock_irqsave(&kbdev->hwaccess_lock, flags); KBASE_DEBUG_ASSERT(atomic_read(&kctx->refcount) > 0); @@ -1911,7 +2058,7 @@ static kbasep_js_release_result kbasep_js_runpool_release_ctx_internal( kbase_backend_release_ctx_noirq(kbdev, kctx); - mutex_unlock(&kbdev->pm.lock); + rt_mutex_unlock(&kbdev->pm.lock); /* Note: Don't reuse kctx_as_nr now */ @@ -1934,7 +2081,7 @@ static kbasep_js_release_result kbasep_js_runpool_release_ctx_internal( katom_retained_state, runpool_ctx_attr_change); spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); - mutex_unlock(&kbdev->pm.lock); + rt_mutex_unlock(&kbdev->pm.lock); } return release_result; @@ -2064,9 +2211,8 @@ void kbase_js_set_timeouts(struct kbase_device *kbdev) kbase_backend_timeouts_changed(kbdev); } -static bool kbasep_js_schedule_ctx(struct kbase_device *kbdev, - struct kbase_context *kctx, - int js) +static bool kbasep_js_schedule_ctx(struct kbase_device *kbdev, struct kbase_context *kctx, + unsigned int js) { struct kbasep_js_device_data *js_devdata; struct kbasep_js_kctx_info *js_kctx_info; @@ -2074,7 +2220,7 @@ static bool kbasep_js_schedule_ctx(struct kbase_device *kbdev, bool kctx_suspended = false; int as_nr; - dev_dbg(kbdev->dev, "Scheduling kctx %pK (s:%d)\n", kctx, js); + dev_dbg(kbdev->dev, "Scheduling kctx %pK (s:%u)\n", kctx, js); js_devdata = &kbdev->js_data; js_kctx_info = &kctx->jctx.sched_info; @@ -2101,8 +2247,8 @@ static bool kbasep_js_schedule_ctx(struct kbase_device *kbdev, WARN_ON(as_nr == KBASEP_AS_NR_INVALID); } } - if (as_nr == KBASEP_AS_NR_INVALID) - return false; /* No address spaces currently available */ + if ((as_nr < 0) || (as_nr >= BASE_MAX_NR_AS)) + return false; /* No address space currently available */ /* * Atomic transaction on the Context and Run Pool begins @@ -2171,6 +2317,9 @@ static bool kbasep_js_schedule_ctx(struct kbase_device *kbdev, #else if (kbase_pm_is_suspending(kbdev)) { #endif + /* Cause it to leave at some later point */ + bool retained; + CSTD_UNUSED(retained); kbase_ctx_sched_inc_refcount_nolock(kctx); @@ -2205,9 +2354,8 @@ static bool kbasep_js_schedule_ctx(struct kbase_device *kbdev, return true; } -static bool kbase_js_use_ctx(struct kbase_device *kbdev, - struct kbase_context *kctx, - int js) +static bool kbase_js_use_ctx(struct kbase_device *kbdev, struct kbase_context *kctx, + unsigned int js) { unsigned long flags; @@ -2215,9 +2363,7 @@ static bool kbase_js_use_ctx(struct kbase_device *kbdev, if (kbase_ctx_flag(kctx, KCTX_SCHEDULED) && kbase_backend_use_ctx_sched(kbdev, kctx, js)) { - - dev_dbg(kbdev->dev, - "kctx %pK already has ASID - mark as active (s:%d)\n", + dev_dbg(kbdev->dev, "kctx %pK already has ASID - mark as active (s:%u)\n", (void *)kctx, js); if (kbdev->hwaccess.active_kctx[js] != kctx) { @@ -2484,8 +2630,7 @@ bool kbase_js_is_atom_valid(struct kbase_device *kbdev, return true; } -static int kbase_js_get_slot(struct kbase_device *kbdev, - struct kbase_jd_atom *katom) +static unsigned int kbase_js_get_slot(struct kbase_device *kbdev, struct kbase_jd_atom *katom) { if (katom->core_req & BASE_JD_REQ_JOB_SLOT) return katom->jobslot; @@ -2524,11 +2669,10 @@ bool kbase_js_dep_resolved_submit(struct kbase_context *kctx, (katom->pre_dep && (katom->pre_dep->atom_flags & KBASE_KATOM_FLAG_JSCTX_IN_X_DEP_LIST))) 
{ int prio = katom->sched_priority; - int js = katom->slot_nr; + unsigned int js = katom->slot_nr; struct jsctx_queue *queue = &kctx->jsctx_queue[prio][js]; - dev_dbg(kctx->kbdev->dev, "Add atom %pK to X_DEP list (s:%d)\n", - (void *)katom, js); + dev_dbg(kctx->kbdev->dev, "Add atom %pK to X_DEP list (s:%u)\n", (void *)katom, js); list_add_tail(&katom->queue, &queue->x_dep_head); katom->atom_flags |= KBASE_KATOM_FLAG_JSCTX_IN_X_DEP_LIST; @@ -2619,8 +2763,8 @@ static void kbase_js_move_to_tree(struct kbase_jd_atom *katom) * * Context: Caller must hold the HW access lock */ -static void kbase_js_evict_deps(struct kbase_context *kctx, - struct kbase_jd_atom *katom, int js, int prio) +static void kbase_js_evict_deps(struct kbase_context *kctx, struct kbase_jd_atom *katom, + unsigned int js, int prio) { struct kbase_jd_atom *x_dep = katom->x_post_dep; struct kbase_jd_atom *next_katom = katom->post_dep; @@ -2652,7 +2796,7 @@ static void kbase_js_evict_deps(struct kbase_context *kctx, } } -struct kbase_jd_atom *kbase_js_pull(struct kbase_context *kctx, int js) +struct kbase_jd_atom *kbase_js_pull(struct kbase_context *kctx, unsigned int js) { struct kbase_jd_atom *katom; struct kbasep_js_device_data *js_devdata; @@ -2662,8 +2806,7 @@ struct kbase_jd_atom *kbase_js_pull(struct kbase_context *kctx, int js) KBASE_DEBUG_ASSERT(kctx); kbdev = kctx->kbdev; - dev_dbg(kbdev->dev, "JS: pulling an atom from kctx %pK (s:%d)\n", - (void *)kctx, js); + dev_dbg(kbdev->dev, "JS: pulling an atom from kctx %pK (s:%u)\n", (void *)kctx, js); js_devdata = &kbdev->js_data; lockdep_assert_held(&kbdev->hwaccess_lock); @@ -2682,13 +2825,12 @@ struct kbase_jd_atom *kbase_js_pull(struct kbase_context *kctx, int js) katom = jsctx_rb_peek(kctx, js); if (!katom) { - dev_dbg(kbdev->dev, "JS: No pullable atom in kctx %pK (s:%d)\n", - (void *)kctx, js); + dev_dbg(kbdev->dev, "JS: No pullable atom in kctx %pK (s:%u)\n", (void *)kctx, js); return NULL; } if (kbase_jsctx_slot_prio_is_blocked(kctx, js, katom->sched_priority)) { dev_dbg(kbdev->dev, - "JS: kctx %pK is blocked from submitting atoms at priority %d and lower (s:%d)\n", + "JS: kctx %pK is blocked from submitting atoms at priority %d and lower (s:%u)\n", (void *)kctx, katom->sched_priority, js); return NULL; } @@ -2722,7 +2864,7 @@ struct kbase_jd_atom *kbase_js_pull(struct kbase_context *kctx, int js) if ((katom->atom_flags & KBASE_KATOM_FLAG_FAIL_BLOCKER) && kbase_backend_nr_atoms_on_slot(kbdev, js)) { dev_dbg(kbdev->dev, - "JS: Atom %pK has cross-slot fail dependency and atoms on slot (s:%d)\n", + "JS: Atom %pK has cross-slot fail dependency and atoms on slot (s:%u)\n", (void *)katom, js); return NULL; } @@ -2745,7 +2887,7 @@ struct kbase_jd_atom *kbase_js_pull(struct kbase_context *kctx, int js) katom->ticks = 0; - dev_dbg(kbdev->dev, "JS: successfully pulled atom %pK from kctx %pK (s:%d)\n", + dev_dbg(kbdev->dev, "JS: successfully pulled atom %pK from kctx %pK (s:%u)\n", (void *)katom, (void *)kctx, js); return katom; @@ -3276,6 +3418,7 @@ bool kbase_js_complete_atom_wq(struct kbase_context *kctx, int atom_slot; bool context_idle = false; int prio = katom->sched_priority; + bool slot_became_unblocked; kbdev = kctx->kbdev; atom_slot = katom->slot_nr; @@ -3298,44 +3441,37 @@ bool kbase_js_complete_atom_wq(struct kbase_context *kctx, mutex_lock(&js_devdata->runpool_mutex); spin_lock_irqsave(&kbdev->hwaccess_lock, flags); - if (katom->atom_flags & KBASE_KATOM_FLAG_JSCTX_IN_TREE) { - bool slot_became_unblocked; + WARN_ON(!(katom->atom_flags & 
KBASE_KATOM_FLAG_JSCTX_IN_TREE)); - dev_dbg(kbdev->dev, "Atom %pK is in runnable_tree\n", - (void *)katom); + dev_dbg(kbdev->dev, "Atom %pK is in runnable_tree\n", (void *)katom); - slot_became_unblocked = - kbase_jsctx_slot_atom_pulled_dec(kctx, katom); - context_idle = !kbase_jsctx_atoms_pulled(kctx); + slot_became_unblocked = kbase_jsctx_slot_atom_pulled_dec(kctx, katom); + context_idle = !kbase_jsctx_atoms_pulled(kctx); - if (!kbase_jsctx_atoms_pulled(kctx) && !kctx->slots_pullable) { - WARN_ON(!kbase_ctx_flag(kctx, KCTX_RUNNABLE_REF)); - kbase_ctx_flag_clear(kctx, KCTX_RUNNABLE_REF); - atomic_dec(&kbdev->js_data.nr_contexts_runnable); - timer_sync = true; - } + if (!kbase_jsctx_atoms_pulled(kctx) && !kctx->slots_pullable) { + WARN_ON(!kbase_ctx_flag(kctx, KCTX_RUNNABLE_REF)); + kbase_ctx_flag_clear(kctx, KCTX_RUNNABLE_REF); + atomic_dec(&kbdev->js_data.nr_contexts_runnable); + timer_sync = true; + } - /* If this slot has been blocked due to soft-stopped atoms, and - * all atoms have now been processed at this priority level and - * higher, then unblock the slot - */ - if (slot_became_unblocked) { - dev_dbg(kbdev->dev, - "kctx %pK is no longer blocked from submitting on slot %d at priority %d or higher\n", - (void *)kctx, atom_slot, prio); + /* If this slot has been blocked due to soft-stopped atoms, and + * all atoms have now been processed at this priority level and + * higher, then unblock the slot + */ + if (slot_became_unblocked) { + dev_dbg(kbdev->dev, + "kctx %pK is no longer blocked from submitting on slot %d at priority %d or higher\n", + (void *)kctx, atom_slot, prio); - if (kbase_js_ctx_pullable(kctx, atom_slot, true)) - timer_sync |= - kbase_js_ctx_list_add_pullable_nolock( - kbdev, kctx, atom_slot); - } + if (kbase_js_ctx_pullable(kctx, atom_slot, true)) + timer_sync |= + kbase_js_ctx_list_add_pullable_nolock(kbdev, kctx, atom_slot); } - WARN_ON(!(katom->atom_flags & KBASE_KATOM_FLAG_JSCTX_IN_TREE)); if (!kbase_jsctx_slot_atoms_pulled(kctx, atom_slot) && jsctx_rb_none_to_pull(kctx, atom_slot)) { - if (!list_empty( - &kctx->jctx.sched_info.ctx.ctx_list_entry[atom_slot])) + if (!list_empty(&kctx->jctx.sched_info.ctx.ctx_list_entry[atom_slot])) timer_sync |= kbase_js_ctx_list_remove_nolock( kctx->kbdev, kctx, atom_slot); } @@ -3348,7 +3484,7 @@ bool kbase_js_complete_atom_wq(struct kbase_context *kctx, if (!kbasep_js_is_submit_allowed(js_devdata, kctx) && !kbase_jsctx_atoms_pulled(kctx) && !kbase_ctx_flag(kctx, KCTX_DYING)) { - int js; + unsigned int js; kbasep_js_set_submit_allowed(js_devdata, kctx); @@ -3360,7 +3496,7 @@ bool kbase_js_complete_atom_wq(struct kbase_context *kctx, } } else if (katom->x_post_dep && kbasep_js_is_submit_allowed(js_devdata, kctx)) { - int js; + unsigned int js; for (js = 0; js < kbdev->gpu_props.num_job_slots; js++) { if (kbase_js_ctx_pullable(kctx, js, true)) @@ -3638,13 +3774,13 @@ done: return ret; } -void kbase_js_sched(struct kbase_device *kbdev, int js_mask) +void kbase_js_sched(struct kbase_device *kbdev, unsigned int js_mask) { struct kbasep_js_device_data *js_devdata; struct kbase_context *last_active[BASE_JM_MAX_NR_SLOTS]; bool timer_sync = false; bool ctx_waiting[BASE_JM_MAX_NR_SLOTS]; - int js; + unsigned int js; KBASE_TLSTREAM_TL_JS_SCHED_START(kbdev, 0); @@ -3690,18 +3826,15 @@ void kbase_js_sched(struct kbase_device *kbdev, int js_mask) if (!kctx) { js_mask &= ~(1 << js); - dev_dbg(kbdev->dev, - "No kctx on pullable list (s:%d)\n", - js); + dev_dbg(kbdev->dev, "No kctx on pullable list (s:%u)\n", js); break; } if 
(!kbase_ctx_flag(kctx, KCTX_ACTIVE)) { context_idle = true; - dev_dbg(kbdev->dev, - "kctx %pK is not active (s:%d)\n", - (void *)kctx, js); + dev_dbg(kbdev->dev, "kctx %pK is not active (s:%u)\n", (void *)kctx, + js); if (kbase_js_defer_activate_for_slot(kctx, js)) { bool ctx_count_changed; @@ -3724,8 +3857,7 @@ void kbase_js_sched(struct kbase_device *kbdev, int js_mask) if (kbase_pm_context_active_handle_suspend( kbdev, KBASE_PM_SUSPEND_HANDLER_DONT_INCREASE)) { - dev_dbg(kbdev->dev, - "Suspend pending (s:%d)\n", js); + dev_dbg(kbdev->dev, "Suspend pending (s:%u)\n", js); /* Suspend pending - return context to * queue and stop scheduling */ @@ -3786,16 +3918,13 @@ void kbase_js_sched(struct kbase_device *kbdev, int js_mask) kbase_ctx_flag_clear(kctx, KCTX_PULLED); if (!kbase_jm_kick(kbdev, 1 << js)) { - dev_dbg(kbdev->dev, - "No more jobs can be submitted (s:%d)\n", - js); + dev_dbg(kbdev->dev, "No more jobs can be submitted (s:%u)\n", js); js_mask &= ~(1 << js); } if (!kbase_ctx_flag(kctx, KCTX_PULLED)) { bool pullable; - dev_dbg(kbdev->dev, - "No atoms pulled from kctx %pK (s:%d)\n", + dev_dbg(kbdev->dev, "No atoms pulled from kctx %pK (s:%u)\n", (void *)kctx, js); pullable = kbase_js_ctx_pullable(kctx, js, @@ -3879,8 +4008,8 @@ void kbase_js_sched(struct kbase_device *kbdev, int js_mask) for (js = 0; js < BASE_JM_MAX_NR_SLOTS; js++) { if (kbdev->hwaccess.active_kctx[js] == last_active[js] && ctx_waiting[js]) { - dev_dbg(kbdev->dev, "Marking kctx %pK as inactive (s:%d)\n", - (void *)last_active[js], js); + dev_dbg(kbdev->dev, "Marking kctx %pK as inactive (s:%u)\n", + (void *)last_active[js], js); kbdev->hwaccess.active_kctx[js] = NULL; } } @@ -3951,7 +4080,7 @@ void kbase_js_zap_context(struct kbase_context *kctx) */ if (!kbase_ctx_flag(kctx, KCTX_SCHEDULED)) { unsigned long flags; - int js; + unsigned int js; spin_lock_irqsave(&kbdev->hwaccess_lock, flags); for (js = 0; js < kbdev->gpu_props.num_job_slots; js++) { @@ -3990,6 +4119,8 @@ void kbase_js_zap_context(struct kbase_context *kctx) rt_mutex_unlock(&kctx->jctx.lock); } else { unsigned long flags; + bool was_retained; + CSTD_UNUSED(was_retained); /* Case c: didn't evict, but it is scheduled - it's in the Run * Pool @@ -4068,7 +4199,7 @@ static void kbase_js_foreach_ctx_job(struct kbase_context *kctx, { struct kbase_device *kbdev; unsigned long flags; - u32 js; + unsigned int js; kbdev = kctx->kbdev; @@ -4088,13 +4219,15 @@ base_jd_prio kbase_js_priority_check(struct kbase_device *kbdev, base_jd_prio pr { struct priority_control_manager_device *pcm_device = kbdev->pcm_dev; int req_priority, out_priority; - base_jd_prio out_jd_priority = priority; - if (pcm_device) { - req_priority = kbasep_js_atom_prio_to_sched_prio(priority); - out_priority = pcm_device->ops.pcm_scheduler_priority_check(pcm_device, current, req_priority); - out_jd_priority = kbasep_js_sched_prio_to_atom_prio(out_priority); - } - return out_jd_priority; + req_priority = kbasep_js_atom_prio_to_sched_prio(priority); + out_priority = req_priority; + /* Does not use pcm defined priority check if PCM not defined or if + * kbasep_js_atom_prio_to_sched_prio returns an error + * (KBASE_JS_ATOM_SCHED_PRIO_INVALID). 
+ */ + if (pcm_device && (req_priority != KBASE_JS_ATOM_SCHED_PRIO_INVALID)) + out_priority = pcm_device->ops.pcm_scheduler_priority_check(pcm_device, current, + req_priority); + return kbasep_js_sched_prio_to_atom_prio(kbdev, out_priority); } - diff --git a/mali_kbase/mali_kbase_kinstr_jm.c b/mali_kbase/mali_kbase_kinstr_jm.c index 84efbb3..ca74540 100644 --- a/mali_kbase/mali_kbase_kinstr_jm.c +++ b/mali_kbase/mali_kbase_kinstr_jm.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -45,8 +45,14 @@ #include <linux/slab.h> #include <linux/spinlock.h> #include <linux/version.h> +#include <linux/version_compat_defs.h> #include <linux/wait.h> +/* Explicitly include epoll header for old kernels. Not required from 4.16. */ +#if KERNEL_VERSION(4, 16, 0) > LINUX_VERSION_CODE +#include <uapi/linux/eventpoll.h> +#endif + /* Define static_assert(). * * The macro was introduced in kernel 5.1. But older vendor kernels may define @@ -60,14 +66,6 @@ #define __static_assert(e, msg, ...) _Static_assert(e, msg) #endif -#if KERNEL_VERSION(4, 16, 0) >= LINUX_VERSION_CODE -typedef unsigned int __poll_t; -#endif - -#ifndef ENOTSUP -#define ENOTSUP EOPNOTSUPP -#endif - /* The module printing prefix */ #define PR_ "mali_kbase_kinstr_jm: " @@ -227,11 +225,8 @@ static inline bool reader_changes_is_valid_size(const size_t size) * * Return: * (0, U16_MAX] - the number of data elements allocated - * -EINVAL - a pointer was invalid - * -ENOTSUP - we do not support allocation of the context * -ERANGE - the requested memory size was invalid * -ENOMEM - could not allocate the memory - * -EADDRINUSE - the buffer memory was already allocated */ static int reader_changes_init(struct reader_changes *const changes, const size_t size) @@ -626,31 +621,34 @@ exit: * * Return: * * 0 - no data ready - * * POLLIN - state changes have been buffered - * * -EBADF - the file descriptor did not have an attached reader - * * -EINVAL - the IO control arguments were invalid + * * EPOLLIN | EPOLLRDNORM - state changes have been buffered + * * EPOLLHUP | EPOLLERR - IO control arguments were invalid or the file + * descriptor did not have an attached reader. */ static __poll_t reader_poll(struct file *const file, struct poll_table_struct *const wait) { struct reader *reader; struct reader_changes *changes; + __poll_t mask = 0; if (unlikely(!file || !wait)) - return -EINVAL; + return EPOLLHUP | EPOLLERR; reader = file->private_data; if (unlikely(!reader)) - return -EBADF; + return EPOLLHUP | EPOLLERR; changes = &reader->changes; - if (reader_changes_count(changes) >= changes->threshold) - return POLLIN; + return EPOLLIN | EPOLLRDNORM; poll_wait(file, &reader->wait_queue, wait); - return (reader_changes_count(changes) > 0) ? 
POLLIN : 0; + if (reader_changes_count(changes) > 0) + mask |= EPOLLIN | EPOLLRDNORM; + + return mask; } /* The file operations virtual function table */ @@ -666,7 +664,7 @@ static const struct file_operations file_operations = { static const size_t kbase_kinstr_jm_readers_max = 16; /** - * kbasep_kinstr_jm_release() - Invoked when the reference count is dropped + * kbase_kinstr_jm_release() - Invoked when the reference count is dropped * @ref: the context reference count */ static void kbase_kinstr_jm_release(struct kref *const ref) @@ -737,7 +735,7 @@ static int kbase_kinstr_jm_readers_add(struct kbase_kinstr_jm *const ctx, } /** - * readers_del() - Deletes a reader from the list of readers + * kbase_kinstr_jm_readers_del() - Deletes a reader from the list of readers * @ctx: the instrumentation context * @reader: the reader to delete */ diff --git a/mali_kbase/mali_kbase_kinstr_jm.h b/mali_kbase/mali_kbase_kinstr_jm.h index 2c904e5..84fabac 100644 --- a/mali_kbase/mali_kbase_kinstr_jm.h +++ b/mali_kbase/mali_kbase_kinstr_jm.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -71,8 +71,6 @@ #else /* empty wrapper macros for userspace */ #define static_branch_unlikely(key) (1) -#define KERNEL_VERSION(a, b, c) (0) -#define LINUX_VERSION_CODE (1) #endif /* __KERNEL__ */ /* Forward declarations */ diff --git a/mali_kbase/mali_kbase_kinstr_prfcnt.c b/mali_kbase/mali_kbase_kinstr_prfcnt.c index afc008b..f0c4da7 100644 --- a/mali_kbase/mali_kbase_kinstr_prfcnt.c +++ b/mali_kbase/mali_kbase_kinstr_prfcnt.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2021-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2021-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -21,8 +21,8 @@ #include "mali_kbase.h" #include "mali_kbase_kinstr_prfcnt.h" -#include "mali_kbase_hwcnt_virtualizer.h" -#include "mali_kbase_hwcnt_gpu.h" +#include "hwcnt/mali_kbase_hwcnt_virtualizer.h" +#include "hwcnt/mali_kbase_hwcnt_gpu.h" #include <uapi/gpu/arm/midgard/mali_kbase_ioctl.h> #include "mali_malisw.h" #include "mali_kbase_debug.h" @@ -36,8 +36,14 @@ #include <linux/mutex.h> #include <linux/poll.h> #include <linux/slab.h> +#include <linux/version_compat_defs.h> #include <linux/workqueue.h> +/* Explicitly include epoll header for old kernels. Not required from 4.16. */ +#if KERNEL_VERSION(4, 16, 0) > LINUX_VERSION_CODE +#include <uapi/linux/eventpoll.h> +#endif + /* The minimum allowed interval between dumps, in nanoseconds * (equivalent to 10KHz) */ @@ -46,9 +52,6 @@ /* The maximum allowed buffers per client */ #define MAX_BUFFER_COUNT 32 -/* The module printing prefix */ -#define KINSTR_PRFCNT_PREFIX "mali_kbase_kinstr_prfcnt: " - /** * struct kbase_kinstr_prfcnt_context - IOCTL interface for userspace hardware * counters. @@ -87,16 +90,13 @@ struct kbase_kinstr_prfcnt_sample { /** * struct kbase_kinstr_prfcnt_sample_array - Array of sample data. - * @page_addr: Address of allocated pages. A single allocation is used + * @user_buf: Address of allocated userspace buffer. 
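The reader_poll() rework above reflects the kernel's __poll_t convention: a poll handler returns an event mask (EPOLLIN | EPOLLRDNORM when data is readable, EPOLLHUP | EPOLLERR for a dead or invalid file) rather than a negative errno, and it registers the waitqueue with poll_wait() before computing the mask so a wake-up cannot be missed. A minimal sketch of that convention, with hypothetical example_* names:

#include <linux/atomic.h>
#include <linux/fs.h>
#include <linux/poll.h>
#include <linux/wait.h>

struct example_reader {
	wait_queue_head_t wait_queue;
	atomic_t pending;	/* number of buffered items ready to read */
};

static __poll_t example_poll(struct file *filp, struct poll_table_struct *wait)
{
	struct example_reader *reader = filp->private_data;
	__poll_t mask = 0;

	if (!reader)
		return EPOLLHUP | EPOLLERR;

	/* Register the waitqueue first, then evaluate readiness. */
	poll_wait(filp, &reader->wait_queue, wait);

	if (atomic_read(&reader->pending) > 0)
		mask |= EPOLLIN | EPOLLRDNORM;

	return mask;
}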
A single allocation is used * for all Dump Buffers in the array. - * @page_order: The allocation order of the pages, the order is on a - * logarithmic scale. * @sample_count: Number of allocated samples. * @samples: Non-NULL pointer to the array of Dump Buffers. */ struct kbase_kinstr_prfcnt_sample_array { - u64 page_addr; - unsigned int page_order; + u8 *user_buf; size_t sample_count; struct kbase_kinstr_prfcnt_sample *samples; }; @@ -120,16 +120,31 @@ struct kbase_kinstr_prfcnt_client_config { }; /** - * struct kbase_kinstr_prfcnt_async - Asynchronous sampling operation to - * carry out for a kinstr_prfcnt_client. - * @dump_work: Worker for performing asynchronous counter dumps. - * @user_data: User data for asynchronous dump in progress. - * @ts_end_ns: End timestamp of most recent async dump. + * enum kbase_kinstr_prfcnt_client_init_state - A list of + * initialisation states that the + * kinstr_prfcnt client can be at + * during initialisation. Useful + * for terminating a partially + * initialised client. + * + * @KINSTR_PRFCNT_UNINITIALISED : Client is uninitialised + * @KINSTR_PRFCNT_PARSE_SETUP : Parse the setup session + * @KINSTR_PRFCNT_ENABLE_MAP : Allocate memory for enable map + * @KINSTR_PRFCNT_DUMP_BUFFER : Allocate memory for dump buffer + * @KINSTR_PRFCNT_SAMPLE_ARRAY : Allocate memory for and initialise sample array + * @KINSTR_PRFCNT_VIRTUALIZER_CLIENT : Create virtualizer client + * @KINSTR_PRFCNT_WAITQ_MUTEX : Create and initialise mutex and waitqueue + * @KINSTR_PRFCNT_INITIALISED : Client is fully initialised */ -struct kbase_kinstr_prfcnt_async { - struct work_struct dump_work; - u64 user_data; - u64 ts_end_ns; +enum kbase_kinstr_prfcnt_client_init_state { + KINSTR_PRFCNT_UNINITIALISED, + KINSTR_PRFCNT_PARSE_SETUP = KINSTR_PRFCNT_UNINITIALISED, + KINSTR_PRFCNT_ENABLE_MAP, + KINSTR_PRFCNT_DUMP_BUFFER, + KINSTR_PRFCNT_SAMPLE_ARRAY, + KINSTR_PRFCNT_VIRTUALIZER_CLIENT, + KINSTR_PRFCNT_WAITQ_MUTEX, + KINSTR_PRFCNT_INITIALISED }; /** @@ -139,9 +154,7 @@ struct kbase_kinstr_prfcnt_async { * @hvcli: Hardware counter virtualizer client. * @node: Node used to attach this client to list in * kinstr_prfcnt context. - * @cmd_sync_lock: Lock coordinating the reader interface for commands - * that need interacting with the async sample dump - * worker thread. + * @cmd_sync_lock: Lock coordinating the reader interface for commands. * @next_dump_time_ns: Time in ns when this client's next periodic dump must * occur. If 0, not a periodic client. * @dump_interval_ns: Interval between periodic dumps. If 0, not a periodic @@ -162,15 +175,10 @@ struct kbase_kinstr_prfcnt_async { * @waitq: Client's notification queue. * @sample_size: Size of the data required for one sample, in bytes. * @sample_count: Number of samples the client is able to capture. - * @sync_sample_count: Number of available spaces for synchronous samples. - * It can differ from sample_count if asynchronous - * sample requests are reserving space in the buffer. * @user_data: User data associated with the session. * This is set when the session is started and stopped. * This value is ignored for control commands that * provide another value. - * @async: Asynchronous sampling operations to carry out in this - * client's session. 
*/ struct kbase_kinstr_prfcnt_client { struct kbase_kinstr_prfcnt_context *kinstr_ctx; @@ -191,9 +199,7 @@ struct kbase_kinstr_prfcnt_client { wait_queue_head_t waitq; size_t sample_size; size_t sample_count; - atomic_t sync_sample_count; u64 user_data; - struct kbase_kinstr_prfcnt_async async; }; static struct prfcnt_enum_item kinstr_prfcnt_supported_requests[] = { @@ -226,35 +232,29 @@ static struct prfcnt_enum_item kinstr_prfcnt_supported_requests[] = { * @filp: Non-NULL pointer to file structure. * @wait: Non-NULL pointer to poll table. * - * Return: POLLIN if data can be read without blocking, 0 if data can not be - * read without blocking, else error code. + * Return: EPOLLIN | EPOLLRDNORM if data can be read without blocking, 0 if + * data can not be read without blocking, else EPOLLHUP | EPOLLERR. */ -#if KERNEL_VERSION(4, 16, 0) >= LINUX_VERSION_CODE -static unsigned int -kbasep_kinstr_prfcnt_hwcnt_reader_poll(struct file *filp, - struct poll_table_struct *wait) -#else static __poll_t kbasep_kinstr_prfcnt_hwcnt_reader_poll(struct file *filp, struct poll_table_struct *wait) -#endif { struct kbase_kinstr_prfcnt_client *cli; if (!filp || !wait) - return -EINVAL; + return EPOLLHUP | EPOLLERR; cli = filp->private_data; if (!cli) - return -EINVAL; + return EPOLLHUP | EPOLLERR; poll_wait(filp, &cli->waitq, wait); if (atomic_read(&cli->write_idx) != atomic_read(&cli->fetch_idx)) - return POLLIN; + return EPOLLIN | EPOLLRDNORM; - return 0; + return (__poll_t)0; } /** @@ -392,7 +392,10 @@ kbase_hwcnt_metadata_block_type_to_prfcnt_block_type(u64 type) block_type = PRFCNT_BLOCK_TYPE_MEMORY; break; - case KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_UNDEFINED: + case KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_FE_UNDEFINED: + case KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_SC_UNDEFINED: + case KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_TILER_UNDEFINED: + case KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_MEMSYS_UNDEFINED: default: block_type = PRFCNT_BLOCK_TYPE_RESERVED; break; @@ -429,18 +432,23 @@ static int kbasep_kinstr_prfcnt_set_block_meta_items(struct kbase_hwcnt_enable_map *enable_map, struct kbase_hwcnt_dump_buffer *dst, struct prfcnt_metadata **block_meta_base, - u64 base_addr, u8 counter_set) + u8 *base_addr, u8 counter_set) { size_t grp, blk, blk_inst; struct prfcnt_metadata **ptr_md = block_meta_base; const struct kbase_hwcnt_metadata *metadata; + uint8_t block_idx = 0; if (!dst || !*block_meta_base) return -EINVAL; metadata = dst->metadata; kbase_hwcnt_metadata_for_each_block(metadata, grp, blk, blk_inst) { - u64 *dst_blk; + u8 *dst_blk; + + /* Block indices must be reported with no gaps. 
*/ + if (blk_inst == 0) + block_idx = 0; /* Skip unavailable or non-enabled blocks */ if (kbase_kinstr_is_block_type_reserved(metadata, grp, blk) || @@ -448,20 +456,21 @@ int kbasep_kinstr_prfcnt_set_block_meta_items(struct kbase_hwcnt_enable_map *ena !kbase_hwcnt_enable_map_block_enabled(enable_map, grp, blk, blk_inst)) continue; - dst_blk = kbase_hwcnt_dump_buffer_block_instance(dst, grp, blk, blk_inst); + dst_blk = (u8 *)kbase_hwcnt_dump_buffer_block_instance(dst, grp, blk, blk_inst); (*ptr_md)->hdr.item_type = PRFCNT_SAMPLE_META_TYPE_BLOCK; (*ptr_md)->hdr.item_version = PRFCNT_READER_API_VERSION; (*ptr_md)->u.block_md.block_type = kbase_hwcnt_metadata_block_type_to_prfcnt_block_type( kbase_hwcnt_metadata_block_type(metadata, grp, blk)); - (*ptr_md)->u.block_md.block_idx = (u8)blk_inst; + (*ptr_md)->u.block_md.block_idx = block_idx; (*ptr_md)->u.block_md.set = counter_set; (*ptr_md)->u.block_md.block_state = BLOCK_STATE_UNKNOWN; - (*ptr_md)->u.block_md.values_offset = (u32)((u64)(uintptr_t)dst_blk - base_addr); + (*ptr_md)->u.block_md.values_offset = (u32)(dst_blk - base_addr); /* update the buf meta data block pointer to next item */ (*ptr_md)++; + block_idx++; } return 0; @@ -504,7 +513,7 @@ static void kbasep_kinstr_prfcnt_set_sample_metadata( /* Dealing with counter blocks */ ptr_md++; if (WARN_ON(kbasep_kinstr_prfcnt_set_block_meta_items(&cli->enable_map, dump_buf, &ptr_md, - cli->sample_arr.page_addr, + cli->sample_arr.user_buf, cli->config.counter_set))) return; @@ -514,33 +523,6 @@ static void kbasep_kinstr_prfcnt_set_sample_metadata( } /** - * kbasep_kinstr_prfcnt_client_output_empty_sample() - Assemble an empty sample - * for output. - * @cli: Non-NULL pointer to a kinstr_prfcnt client. - * @buf_idx: The index to the sample array for saving the sample. - */ -static void kbasep_kinstr_prfcnt_client_output_empty_sample( - struct kbase_kinstr_prfcnt_client *cli, unsigned int buf_idx) -{ - struct kbase_hwcnt_dump_buffer *dump_buf; - struct prfcnt_metadata *ptr_md; - - if (WARN_ON(buf_idx >= cli->sample_arr.sample_count)) - return; - - dump_buf = &cli->sample_arr.samples[buf_idx].dump_buf; - ptr_md = cli->sample_arr.samples[buf_idx].sample_meta; - - kbase_hwcnt_dump_buffer_zero(dump_buf, &cli->enable_map); - - /* Use end timestamp from most recent async dump */ - ptr_md->u.sample_md.timestamp_start = cli->async.ts_end_ns; - ptr_md->u.sample_md.timestamp_end = cli->async.ts_end_ns; - - kbasep_kinstr_prfcnt_set_sample_metadata(cli, dump_buf, ptr_md); -} - -/** * kbasep_kinstr_prfcnt_client_output_sample() - Assemble a sample for output. * @cli: Non-NULL pointer to a kinstr_prfcnt client. * @buf_idx: The index to the sample array for saving the sample. @@ -589,16 +571,11 @@ static void kbasep_kinstr_prfcnt_client_output_sample( * @cli: Non-NULL pointer to a kinstr_prfcnt client. * @event_id: Event type that triggered the dump. * @user_data: User data to return to the user. - * @async_dump: Whether this is an asynchronous dump or not. - * @empty_sample: Sample block data will be 0 if this is true. * * Return: 0 on success, else error code. 
*/ -static int -kbasep_kinstr_prfcnt_client_dump(struct kbase_kinstr_prfcnt_client *cli, - enum base_hwcnt_reader_event event_id, - u64 user_data, bool async_dump, - bool empty_sample) +static int kbasep_kinstr_prfcnt_client_dump(struct kbase_kinstr_prfcnt_client *cli, + enum base_hwcnt_reader_event event_id, u64 user_data) { int ret; u64 ts_start_ns = 0; @@ -616,17 +593,11 @@ kbasep_kinstr_prfcnt_client_dump(struct kbase_kinstr_prfcnt_client *cli, /* Check if there is a place to copy HWC block into. Calculate the * number of available samples count, by taking into account the type * of dump. - * Asynchronous dumps have the ability to reserve space in the samples - * array for future dumps, unlike synchronous dumps. Because of that, - * the samples count for synchronous dumps is managed by a variable - * called sync_sample_count, that originally is defined as equal to the - * size of the whole array but later decreases every time an - * asynchronous dump request is pending and then re-increased every - * time an asynchronous dump request is completed. */ - available_samples_count = async_dump ? - cli->sample_arr.sample_count : - atomic_read(&cli->sync_sample_count); + available_samples_count = cli->sample_arr.sample_count; + WARN_ON(available_samples_count < 1); + /* Reserve one slot to store the implicit sample taken on CMD_STOP */ + available_samples_count -= 1; if (write_idx - read_idx == available_samples_count) { /* For periodic sampling, the current active dump * will be accumulated in the next sample, when @@ -642,38 +613,19 @@ kbasep_kinstr_prfcnt_client_dump(struct kbase_kinstr_prfcnt_client *cli, */ write_idx %= cli->sample_arr.sample_count; - if (!empty_sample) { - ret = kbase_hwcnt_virtualizer_client_dump( - cli->hvcli, &ts_start_ns, &ts_end_ns, &cli->tmp_buf); - /* HWC dump error, set the sample with error flag */ - if (ret) - cli->sample_flags |= SAMPLE_FLAG_ERROR; - - /* Make the sample ready and copy it to the userspace mapped buffer */ - kbasep_kinstr_prfcnt_client_output_sample( - cli, write_idx, user_data, ts_start_ns, ts_end_ns); - } else { - if (!async_dump) { - struct prfcnt_metadata *ptr_md; - /* User data will not be updated for empty samples. */ - ptr_md = cli->sample_arr.samples[write_idx].sample_meta; - ptr_md->u.sample_md.user_data = user_data; - } + ret = kbase_hwcnt_virtualizer_client_dump(cli->hvcli, &ts_start_ns, &ts_end_ns, + &cli->tmp_buf); + /* HWC dump error, set the sample with error flag */ + if (ret) + cli->sample_flags |= SAMPLE_FLAG_ERROR; - /* Make the sample ready and copy it to the userspace mapped buffer */ - kbasep_kinstr_prfcnt_client_output_empty_sample(cli, write_idx); - } + /* Make the sample ready and copy it to the userspace mapped buffer */ + kbasep_kinstr_prfcnt_client_output_sample(cli, write_idx, user_data, ts_start_ns, + ts_end_ns); /* Notify client. Make sure all changes to memory are visible. 
*/ wmb(); atomic_inc(&cli->write_idx); - if (async_dump) { - /* Remember the end timestamp of async dump for empty samples */ - if (!empty_sample) - cli->async.ts_end_ns = ts_end_ns; - - atomic_inc(&cli->sync_sample_count); - } wake_up_interruptible(&cli->waitq); /* Reset the flags for the next sample dump */ cli->sample_flags = 0; @@ -687,6 +639,9 @@ kbasep_kinstr_prfcnt_client_start(struct kbase_kinstr_prfcnt_client *cli, { int ret; u64 tm_start, tm_end; + unsigned int write_idx; + unsigned int read_idx; + size_t available_samples_count; WARN_ON(!cli); lockdep_assert_held(&cli->cmd_sync_lock); @@ -695,6 +650,16 @@ kbasep_kinstr_prfcnt_client_start(struct kbase_kinstr_prfcnt_client *cli, if (cli->active) return 0; + write_idx = atomic_read(&cli->write_idx); + read_idx = atomic_read(&cli->read_idx); + + /* Check whether there is space to store atleast an implicit sample + * corresponding to CMD_STOP. + */ + available_samples_count = cli->sample_count - (write_idx - read_idx); + if (!available_samples_count) + return -EBUSY; + kbase_hwcnt_gpu_enable_map_from_physical(&cli->enable_map, &cli->config.phys_em); @@ -707,7 +672,6 @@ kbasep_kinstr_prfcnt_client_start(struct kbase_kinstr_prfcnt_client *cli, cli->hvcli, &cli->enable_map, &tm_start, &tm_end, NULL); if (!ret) { - atomic_set(&cli->sync_sample_count, cli->sample_count); cli->active = true; cli->user_data = user_data; cli->sample_flags = 0; @@ -721,16 +685,6 @@ kbasep_kinstr_prfcnt_client_start(struct kbase_kinstr_prfcnt_client *cli, return ret; } -static int kbasep_kinstr_prfcnt_client_wait_async_done( - struct kbase_kinstr_prfcnt_client *cli) -{ - lockdep_assert_held(&cli->cmd_sync_lock); - - return wait_event_interruptible(cli->waitq, - atomic_read(&cli->sync_sample_count) == - cli->sample_count); -} - static int kbasep_kinstr_prfcnt_client_stop(struct kbase_kinstr_prfcnt_client *cli, u64 user_data) @@ -739,7 +693,7 @@ kbasep_kinstr_prfcnt_client_stop(struct kbase_kinstr_prfcnt_client *cli, u64 tm_start = 0; u64 tm_end = 0; struct kbase_hwcnt_physical_enable_map phys_em; - struct kbase_hwcnt_dump_buffer *tmp_buf = NULL; + size_t available_samples_count; unsigned int write_idx; unsigned int read_idx; @@ -750,12 +704,11 @@ kbasep_kinstr_prfcnt_client_stop(struct kbase_kinstr_prfcnt_client *cli, if (!cli->active) return -EINVAL; - /* Wait until pending async sample operation done */ - ret = kbasep_kinstr_prfcnt_client_wait_async_done(cli); - - if (ret < 0) - return -ERESTARTSYS; + mutex_lock(&cli->kinstr_ctx->lock); + /* Disable counters under the lock, so we do not race with the + * sampling thread. 
+ */ phys_em.fe_bm = 0; phys_em.tiler_bm = 0; phys_em.mmu_l2_bm = 0; @@ -763,15 +716,11 @@ kbasep_kinstr_prfcnt_client_stop(struct kbase_kinstr_prfcnt_client *cli, kbase_hwcnt_gpu_enable_map_from_physical(&cli->enable_map, &phys_em); - mutex_lock(&cli->kinstr_ctx->lock); - /* Check whether one has the buffer to hold the last sample */ write_idx = atomic_read(&cli->write_idx); read_idx = atomic_read(&cli->read_idx); - /* Check if there is a place to save the last stop produced sample */ - if (write_idx - read_idx < cli->sample_arr.sample_count) - tmp_buf = &cli->tmp_buf; + available_samples_count = cli->sample_count - (write_idx - read_idx); ret = kbase_hwcnt_virtualizer_client_set_counters(cli->hvcli, &cli->enable_map, @@ -781,7 +730,8 @@ kbasep_kinstr_prfcnt_client_stop(struct kbase_kinstr_prfcnt_client *cli, if (ret) cli->sample_flags |= SAMPLE_FLAG_ERROR; - if (tmp_buf) { + /* There must be a place to save the last stop produced sample */ + if (!WARN_ON(!available_samples_count)) { write_idx %= cli->sample_arr.sample_count; /* Handle the last stop sample */ kbase_hwcnt_gpu_enable_map_from_physical(&cli->enable_map, @@ -811,7 +761,6 @@ kbasep_kinstr_prfcnt_client_sync_dump(struct kbase_kinstr_prfcnt_client *cli, u64 user_data) { int ret; - bool empty_sample = false; lockdep_assert_held(&cli->cmd_sync_lock); @@ -819,90 +768,9 @@ kbasep_kinstr_prfcnt_client_sync_dump(struct kbase_kinstr_prfcnt_client *cli, if (!cli->active || cli->dump_interval_ns) return -EINVAL; - /* Wait until pending async sample operation done, this is required to - * satisfy the stated sample sequence following their issuing order, - * reflected by the sample start timestamp. - */ - if (atomic_read(&cli->sync_sample_count) != cli->sample_count) { - /* Return empty sample instead of performing real dump. - * As there is an async dump currently in-flight which will - * have the desired information. - */ - empty_sample = true; - ret = kbasep_kinstr_prfcnt_client_wait_async_done(cli); - - if (ret < 0) - return -ERESTARTSYS; - } - mutex_lock(&cli->kinstr_ctx->lock); - ret = kbasep_kinstr_prfcnt_client_dump(cli, - BASE_HWCNT_READER_EVENT_MANUAL, - user_data, false, empty_sample); - - mutex_unlock(&cli->kinstr_ctx->lock); - - return ret; -} - -static int -kbasep_kinstr_prfcnt_client_async_dump(struct kbase_kinstr_prfcnt_client *cli, - u64 user_data) -{ - unsigned int write_idx; - unsigned int read_idx; - unsigned int active_async_dumps; - unsigned int new_async_buf_idx; - int ret; - - lockdep_assert_held(&cli->cmd_sync_lock); - - /* If the client is not started, or not manual, the command invalid */ - if (!cli->active || cli->dump_interval_ns) - return -EINVAL; - - mutex_lock(&cli->kinstr_ctx->lock); - - write_idx = atomic_read(&cli->write_idx); - read_idx = atomic_read(&cli->read_idx); - active_async_dumps = - cli->sample_count - atomic_read(&cli->sync_sample_count); - new_async_buf_idx = write_idx + active_async_dumps; - - /* Check if there is a place to copy HWC block into. - * If successful, reserve space in the buffer for the asynchronous - * operation to make sure that it can actually take place. - * Because we reserve space for asynchronous dumps we need to take that - * in consideration here. - */ - ret = (new_async_buf_idx - read_idx == cli->sample_arr.sample_count) ? 
- -EBUSY : - 0; - - if (ret == -EBUSY) { - mutex_unlock(&cli->kinstr_ctx->lock); - return ret; - } - - if (active_async_dumps > 0) { - struct prfcnt_metadata *ptr_md; - unsigned int buf_idx = - new_async_buf_idx % cli->sample_arr.sample_count; - /* Instead of storing user_data, write it directly to future - * empty sample. - */ - ptr_md = cli->sample_arr.samples[buf_idx].sample_meta; - ptr_md->u.sample_md.user_data = user_data; - - atomic_dec(&cli->sync_sample_count); - } else { - cli->async.user_data = user_data; - atomic_dec(&cli->sync_sample_count); - - kbase_hwcnt_virtualizer_queue_work(cli->kinstr_ctx->hvirt, - &cli->async.dump_work); - } + ret = kbasep_kinstr_prfcnt_client_dump(cli, BASE_HWCNT_READER_EVENT_MANUAL, user_data); mutex_unlock(&cli->kinstr_ctx->lock); @@ -962,10 +830,6 @@ int kbasep_kinstr_prfcnt_cmd(struct kbase_kinstr_prfcnt_client *cli, ret = kbasep_kinstr_prfcnt_client_sync_dump( cli, control_cmd->user_data); break; - case PRFCNT_CONTROL_CMD_SAMPLE_ASYNC: - ret = kbasep_kinstr_prfcnt_client_async_dump( - cli, control_cmd->user_data); - break; case PRFCNT_CONTROL_CMD_DISCARD: ret = kbasep_kinstr_prfcnt_client_discard(cli); break; @@ -1017,23 +881,8 @@ kbasep_kinstr_prfcnt_get_sample(struct kbase_kinstr_prfcnt_client *cli, } read_idx %= cli->sample_arr.sample_count; - sample_offset_bytes = - (u64)(uintptr_t)cli->sample_arr.samples[read_idx].sample_meta - - (u64)(uintptr_t)cli->sample_arr.page_addr; - sample_meta = - (struct prfcnt_metadata *)cli->sample_arr.samples[read_idx] - .sample_meta; - - /* Verify that a valid sample has been dumped in the read_idx. - * There are situations where this may not be the case, - * for instance if the client is trying to get an asynchronous - * sample which has not been dumped yet. - */ - if (sample_meta->hdr.item_type != PRFCNT_SAMPLE_META_TYPE_SAMPLE || - sample_meta->hdr.item_version != PRFCNT_READER_API_VERSION) { - err = -EINVAL; - goto error_out; - } + sample_meta = cli->sample_arr.samples[read_idx].sample_meta; + sample_offset_bytes = (u8 *)sample_meta - cli->sample_arr.user_buf; sample_access->sequence = sample_meta->u.sample_md.seq; sample_access->sample_offset_bytes = sample_offset_bytes; @@ -1067,8 +916,7 @@ kbasep_kinstr_prfcnt_put_sample(struct kbase_kinstr_prfcnt_client *cli, read_idx %= cli->sample_arr.sample_count; sample_offset_bytes = - (u64)(uintptr_t)cli->sample_arr.samples[read_idx].sample_meta - - (u64)(uintptr_t)cli->sample_arr.page_addr; + (u8 *)cli->sample_arr.samples[read_idx].sample_meta - cli->sample_arr.user_buf; if (sample_access->sample_offset_bytes != sample_offset_bytes) { err = -EINVAL; @@ -1160,40 +1008,15 @@ static int kbasep_kinstr_prfcnt_hwcnt_reader_mmap(struct file *filp, struct vm_area_struct *vma) { struct kbase_kinstr_prfcnt_client *cli; - unsigned long vm_size, size, addr, pfn, offset; if (!filp || !vma) return -EINVAL; - cli = filp->private_data; + cli = filp->private_data; if (!cli) return -EINVAL; - vm_size = vma->vm_end - vma->vm_start; - - /* The mapping is allowed to span the entirety of the page allocation, - * not just the chunk where the dump buffers are allocated. - * This accommodates the corner case where the combined size of the - * dump buffers is smaller than a single page. - * This does not pose a security risk as the pages are zeroed on - * allocation, and anything out of bounds of the dump buffers is never - * written to. 
- */ - size = (1ull << cli->sample_arr.page_order) * PAGE_SIZE; - - if (vma->vm_pgoff > (size >> PAGE_SHIFT)) - return -EINVAL; - - offset = vma->vm_pgoff << PAGE_SHIFT; - - if (vm_size > size - offset) - return -EINVAL; - - addr = __pa(cli->sample_arr.page_addr + offset); - pfn = addr >> PAGE_SHIFT; - - return remap_pfn_range(vma, vma->vm_start, pfn, vm_size, - vma->vm_page_prot); + return remap_vmalloc_range(vma, cli->sample_arr.user_buf, 0); } static void kbasep_kinstr_prfcnt_sample_array_free( @@ -1202,27 +1025,51 @@ static void kbasep_kinstr_prfcnt_sample_array_free( if (!sample_arr) return; - kfree((void *)sample_arr->samples); - kfree((void *)(size_t)sample_arr->page_addr); + kfree(sample_arr->samples); + vfree(sample_arr->user_buf); memset(sample_arr, 0, sizeof(*sample_arr)); } -#if !MALI_KERNEL_TEST_API -static -#endif -void kbasep_kinstr_prfcnt_client_destroy(struct kbase_kinstr_prfcnt_client *cli) +static void +kbasep_kinstr_prfcnt_client_destroy_partial(struct kbase_kinstr_prfcnt_client *cli, + enum kbase_kinstr_prfcnt_client_init_state init_state) { if (!cli) return; - kbase_hwcnt_virtualizer_client_destroy(cli->hvcli); - kbasep_kinstr_prfcnt_sample_array_free(&cli->sample_arr); - kbase_hwcnt_dump_buffer_free(&cli->tmp_buf); - kbase_hwcnt_enable_map_free(&cli->enable_map); - mutex_destroy(&cli->cmd_sync_lock); + while (init_state-- > KINSTR_PRFCNT_UNINITIALISED) { + switch (init_state) { + case KINSTR_PRFCNT_INITIALISED: + /* This shouldn't be reached */ + break; + case KINSTR_PRFCNT_WAITQ_MUTEX: + mutex_destroy(&cli->cmd_sync_lock); + break; + case KINSTR_PRFCNT_VIRTUALIZER_CLIENT: + kbase_hwcnt_virtualizer_client_destroy(cli->hvcli); + break; + case KINSTR_PRFCNT_SAMPLE_ARRAY: + kbasep_kinstr_prfcnt_sample_array_free(&cli->sample_arr); + break; + case KINSTR_PRFCNT_DUMP_BUFFER: + kbase_hwcnt_dump_buffer_free(&cli->tmp_buf); + break; + case KINSTR_PRFCNT_ENABLE_MAP: + kbase_hwcnt_enable_map_free(&cli->enable_map); + break; + case KINSTR_PRFCNT_PARSE_SETUP: + /* Nothing to do here */ + break; + } + } kfree(cli); } +void kbasep_kinstr_prfcnt_client_destroy(struct kbase_kinstr_prfcnt_client *cli) +{ + kbasep_kinstr_prfcnt_client_destroy_partial(cli, KINSTR_PRFCNT_INITIALISED); +} + /** * kbasep_kinstr_prfcnt_hwcnt_reader_release() - hwcnt reader's release. * @inode: Non-NULL pointer to inode structure. @@ -1329,9 +1176,8 @@ static void kbasep_kinstr_prfcnt_dump_worker(struct work_struct *work) list_for_each_entry(pos, &kinstr_ctx->clients, node) { if (pos->active && (pos->next_dump_time_ns != 0) && (pos->next_dump_time_ns < cur_time_ns)) - kbasep_kinstr_prfcnt_client_dump( - pos, BASE_HWCNT_READER_EVENT_PERIODIC, - pos->user_data, false, false); + kbasep_kinstr_prfcnt_client_dump(pos, BASE_HWCNT_READER_EVENT_PERIODIC, + pos->user_data); } kbasep_kinstr_prfcnt_reschedule_worker(kinstr_ctx); @@ -1340,48 +1186,6 @@ static void kbasep_kinstr_prfcnt_dump_worker(struct work_struct *work) } /** - * kbasep_kinstr_prfcnt_async_dump_worker()- Dump worker for a manual client - * to take a single asynchronous - * sample. - * @work: Work structure. 
- */ -static void kbasep_kinstr_prfcnt_async_dump_worker(struct work_struct *work) -{ - struct kbase_kinstr_prfcnt_async *cli_async = - container_of(work, struct kbase_kinstr_prfcnt_async, dump_work); - struct kbase_kinstr_prfcnt_client *cli = container_of( - cli_async, struct kbase_kinstr_prfcnt_client, async); - - mutex_lock(&cli->kinstr_ctx->lock); - /* While the async operation is in flight, a sync stop might have been - * executed, for which the dump should be skipped. Further as we are - * doing an async dump, we expect that there is reserved buffer for - * this to happen. This is to avoid the rare corner case where the - * user side has issued a stop/start pair before the async work item - * get the chance to execute. - */ - if (cli->active && - (atomic_read(&cli->sync_sample_count) < cli->sample_count)) - kbasep_kinstr_prfcnt_client_dump(cli, - BASE_HWCNT_READER_EVENT_MANUAL, - cli->async.user_data, true, - false); - - /* While the async operation is in flight, more async dump requests - * may have been submitted. In this case, no more async dumps work - * will be queued. Instead space will be reserved for that dump and - * an empty sample will be return after handling the current async - * dump. - */ - while (cli->active && - (atomic_read(&cli->sync_sample_count) < cli->sample_count)) { - kbasep_kinstr_prfcnt_client_dump( - cli, BASE_HWCNT_READER_EVENT_MANUAL, 0, true, true); - } - mutex_unlock(&cli->kinstr_ctx->lock); -} - -/** * kbasep_kinstr_prfcnt_dump_timer() - Dump timer that schedules the dump worker for * execution as soon as possible. * @timer: Timer structure. @@ -1443,8 +1247,6 @@ void kbase_kinstr_prfcnt_term(struct kbase_kinstr_prfcnt_context *kinstr_ctx) if (!kinstr_ctx) return; - cancel_work_sync(&kinstr_ctx->dump_work); - /* Non-zero client count implies client leak */ if (WARN_ON(kinstr_ctx->client_count > 0)) { struct kbase_kinstr_prfcnt_client *pos, *n; @@ -1456,14 +1258,18 @@ void kbase_kinstr_prfcnt_term(struct kbase_kinstr_prfcnt_context *kinstr_ctx) } } + cancel_work_sync(&kinstr_ctx->dump_work); + WARN_ON(kinstr_ctx->client_count > 0); kfree(kinstr_ctx); } void kbase_kinstr_prfcnt_suspend(struct kbase_kinstr_prfcnt_context *kinstr_ctx) { - if (WARN_ON(!kinstr_ctx)) + if (!kinstr_ctx) { + pr_warn("%s: kinstr_ctx is NULL\n", __func__); return; + } mutex_lock(&kinstr_ctx->lock); @@ -1492,8 +1298,10 @@ void kbase_kinstr_prfcnt_suspend(struct kbase_kinstr_prfcnt_context *kinstr_ctx) void kbase_kinstr_prfcnt_resume(struct kbase_kinstr_prfcnt_context *kinstr_ctx) { - if (WARN_ON(!kinstr_ctx)) + if (!kinstr_ctx) { + pr_warn("%s: kinstr_ctx is NULL\n", __func__); return; + } mutex_lock(&kinstr_ctx->lock); @@ -1530,8 +1338,6 @@ static int kbasep_kinstr_prfcnt_sample_array_alloc(struct kbase_kinstr_prfcnt_cl struct kbase_kinstr_prfcnt_sample_array *sample_arr = &cli->sample_arr; struct kbase_kinstr_prfcnt_sample *samples; size_t sample_idx; - u64 addr; - unsigned int order; size_t dump_buf_bytes; size_t clk_cnt_buf_bytes; size_t sample_meta_bytes; @@ -1554,16 +1360,13 @@ static int kbasep_kinstr_prfcnt_sample_array_alloc(struct kbase_kinstr_prfcnt_cl if (!samples) return -ENOMEM; - order = get_order(sample_size * buffer_count); - addr = (u64)(uintptr_t)kzalloc(sample_size * buffer_count, GFP_KERNEL); + sample_arr->user_buf = vmalloc_user(sample_size * buffer_count); - if (!addr) { - kfree((void *)samples); + if (!sample_arr->user_buf) { + kfree(samples); return -ENOMEM; } - sample_arr->page_addr = addr; - sample_arr->page_order = order; sample_arr->sample_count = 
buffer_count; sample_arr->samples = samples; @@ -1577,12 +1380,11 @@ static int kbasep_kinstr_prfcnt_sample_array_alloc(struct kbase_kinstr_prfcnt_cl /* Internal layout in a sample buffer: [sample metadata, dump_buf, clk_cnt_buf]. */ samples[sample_idx].dump_buf.metadata = metadata; samples[sample_idx].sample_meta = - (struct prfcnt_metadata *)(uintptr_t)( - addr + sample_meta_offset); + (struct prfcnt_metadata *)(sample_arr->user_buf + sample_meta_offset); samples[sample_idx].dump_buf.dump_buf = - (u64 *)(uintptr_t)(addr + dump_buf_offset); + (u64 *)(sample_arr->user_buf + dump_buf_offset); samples[sample_idx].dump_buf.clk_cnt_buf = - (u64 *)(uintptr_t)(addr + clk_cnt_buf_offset); + (u64 *)(sample_arr->user_buf + clk_cnt_buf_offset); } return 0; @@ -1849,83 +1651,100 @@ int kbasep_kinstr_prfcnt_client_create(struct kbase_kinstr_prfcnt_context *kinst { int err; struct kbase_kinstr_prfcnt_client *cli; + enum kbase_kinstr_prfcnt_client_init_state init_state; - WARN_ON(!kinstr_ctx); - WARN_ON(!setup); - WARN_ON(!req_arr); + if (WARN_ON(!kinstr_ctx)) + return -EINVAL; + + if (WARN_ON(!setup)) + return -EINVAL; + + if (WARN_ON(!req_arr)) + return -EINVAL; cli = kzalloc(sizeof(*cli), GFP_KERNEL); if (!cli) return -ENOMEM; - cli->kinstr_ctx = kinstr_ctx; - err = kbasep_kinstr_prfcnt_parse_setup(kinstr_ctx, setup, &cli->config, req_arr); - - if (err < 0) - goto error; + for (init_state = KINSTR_PRFCNT_UNINITIALISED; init_state < KINSTR_PRFCNT_INITIALISED; + init_state++) { + err = 0; + switch (init_state) { + case KINSTR_PRFCNT_PARSE_SETUP: + cli->kinstr_ctx = kinstr_ctx; + err = kbasep_kinstr_prfcnt_parse_setup(kinstr_ctx, setup, &cli->config, + req_arr); - cli->config.buffer_count = MAX_BUFFER_COUNT; - cli->dump_interval_ns = cli->config.period_ns; - cli->next_dump_time_ns = 0; - cli->active = false; - atomic_set(&cli->write_idx, 0); - atomic_set(&cli->read_idx, 0); - atomic_set(&cli->fetch_idx, 0); + break; - err = kbase_hwcnt_enable_map_alloc(kinstr_ctx->metadata, - &cli->enable_map); + case KINSTR_PRFCNT_ENABLE_MAP: + cli->config.buffer_count = MAX_BUFFER_COUNT; + cli->dump_interval_ns = cli->config.period_ns; + cli->next_dump_time_ns = 0; + cli->active = false; + atomic_set(&cli->write_idx, 0); + atomic_set(&cli->read_idx, 0); + atomic_set(&cli->fetch_idx, 0); - if (err < 0) - goto error; + err = kbase_hwcnt_enable_map_alloc(kinstr_ctx->metadata, &cli->enable_map); + break; - kbase_hwcnt_gpu_enable_map_from_physical(&cli->enable_map, &cli->config.phys_em); + case KINSTR_PRFCNT_DUMP_BUFFER: + kbase_hwcnt_gpu_enable_map_from_physical(&cli->enable_map, + &cli->config.phys_em); - cli->sample_count = cli->config.buffer_count; - atomic_set(&cli->sync_sample_count, cli->sample_count); - cli->sample_size = kbasep_kinstr_prfcnt_get_sample_size(cli, kinstr_ctx->metadata); + cli->sample_count = cli->config.buffer_count; + cli->sample_size = + kbasep_kinstr_prfcnt_get_sample_size(cli, kinstr_ctx->metadata); - /* Use virtualizer's metadata to alloc tmp buffer which interacts with - * the HWC virtualizer. - */ - err = kbase_hwcnt_dump_buffer_alloc(kinstr_ctx->metadata, - &cli->tmp_buf); + /* Use virtualizer's metadata to alloc tmp buffer which interacts with + * the HWC virtualizer. 
+ */ + err = kbase_hwcnt_dump_buffer_alloc(kinstr_ctx->metadata, &cli->tmp_buf); + break; - if (err < 0) - goto error; + case KINSTR_PRFCNT_SAMPLE_ARRAY: + /* Disable clock map in setup, and enable clock map when start */ + cli->enable_map.clk_enable_map = 0; - /* Disable clock map in setup, and enable clock map when start */ - cli->enable_map.clk_enable_map = 0; + /* Use metadata from virtualizer to allocate dump buffers if + * kinstr_prfcnt doesn't have the truncated metadata. + */ + err = kbasep_kinstr_prfcnt_sample_array_alloc(cli, kinstr_ctx->metadata); - /* Use metadata from virtualizer to allocate dump buffers if - * kinstr_prfcnt doesn't have the truncated metadata. - */ - err = kbasep_kinstr_prfcnt_sample_array_alloc(cli, kinstr_ctx->metadata); + break; - if (err < 0) - goto error; + case KINSTR_PRFCNT_VIRTUALIZER_CLIENT: + /* Set enable map to be 0 to prevent virtualizer to init and kick the + * backend to count. + */ + kbase_hwcnt_gpu_enable_map_from_physical( + &cli->enable_map, &(struct kbase_hwcnt_physical_enable_map){ 0 }); - /* Set enable map to be 0 to prevent virtualizer to init and kick the backend to count */ - kbase_hwcnt_gpu_enable_map_from_physical(&cli->enable_map, - &(struct kbase_hwcnt_physical_enable_map){ 0 }); + err = kbase_hwcnt_virtualizer_client_create(kinstr_ctx->hvirt, + &cli->enable_map, &cli->hvcli); + break; - err = kbase_hwcnt_virtualizer_client_create( - kinstr_ctx->hvirt, &cli->enable_map, &cli->hvcli); + case KINSTR_PRFCNT_WAITQ_MUTEX: + init_waitqueue_head(&cli->waitq); + mutex_init(&cli->cmd_sync_lock); + break; - if (err < 0) - goto error; + case KINSTR_PRFCNT_INITIALISED: + /* This shouldn't be reached */ + break; + } - init_waitqueue_head(&cli->waitq); - INIT_WORK(&cli->async.dump_work, - kbasep_kinstr_prfcnt_async_dump_worker); - mutex_init(&cli->cmd_sync_lock); + if (err < 0) { + kbasep_kinstr_prfcnt_client_destroy_partial(cli, init_state); + return err; + } + } *out_vcli = cli; return 0; -error: - kbasep_kinstr_prfcnt_client_destroy(cli); - return err; } static size_t kbasep_kinstr_prfcnt_get_block_info_count( @@ -2033,7 +1852,6 @@ static int kbasep_kinstr_prfcnt_enum_info_count( struct kbase_kinstr_prfcnt_context *kinstr_ctx, struct kbase_ioctl_kinstr_prfcnt_enum_info *enum_info) { - int err = 0; uint32_t count = 0; size_t block_info_count = 0; const struct kbase_hwcnt_metadata *metadata; @@ -2054,7 +1872,7 @@ static int kbasep_kinstr_prfcnt_enum_info_count( enum_info->info_item_size = sizeof(struct prfcnt_enum_item); kinstr_ctx->info_item_count = count; - return err; + return 0; } static int kbasep_kinstr_prfcnt_enum_info_list( @@ -2148,17 +1966,18 @@ int kbase_kinstr_prfcnt_setup(struct kbase_kinstr_prfcnt_context *kinstr_ctx, union kbase_ioctl_kinstr_prfcnt_setup *setup) { int err; - unsigned int item_count; - unsigned long bytes; - struct prfcnt_request_item *req_arr; + size_t item_count; + size_t bytes; + struct prfcnt_request_item *req_arr = NULL; struct kbase_kinstr_prfcnt_client *cli = NULL; + const size_t max_bytes = 32 * sizeof(*req_arr); if (!kinstr_ctx || !setup) return -EINVAL; item_count = setup->in.request_item_count; - /* Limiting the request items to 2x of the expected: acommodating + /* Limiting the request items to 2x of the expected: accommodating * moderate duplications but rejecting excessive abuses. 
*/ if (!setup->in.requests_ptr || (item_count < 2) || (setup->in.request_item_size == 0) || @@ -2166,16 +1985,22 @@ int kbase_kinstr_prfcnt_setup(struct kbase_kinstr_prfcnt_context *kinstr_ctx, return -EINVAL; } - bytes = item_count * sizeof(*req_arr); - req_arr = kmalloc(bytes, GFP_KERNEL); + if (check_mul_overflow(item_count, sizeof(*req_arr), &bytes)) + return -EINVAL; + + /* Further limiting the max bytes to copy from userspace by setting it in the following + * fashion: a maximum of 1 mode item, 4 types of 3 sets for a total of 12 enable items, + * each currently at the size of prfcnt_request_item. + * + * Note: if more request types get added, this max limit needs to be updated. + */ + if (bytes > max_bytes) + return -EINVAL; - if (!req_arr) - return -ENOMEM; + req_arr = memdup_user(u64_to_user_ptr(setup->in.requests_ptr), bytes); - if (copy_from_user(req_arr, u64_to_user_ptr(setup->in.requests_ptr), bytes)) { - err = -EFAULT; - goto free_buf; - } + if (IS_ERR(req_arr)) + return PTR_ERR(req_arr); err = kbasep_kinstr_prfcnt_client_create(kinstr_ctx, setup, &cli, req_arr); diff --git a/mali_kbase/mali_kbase_kinstr_prfcnt.h b/mali_kbase/mali_kbase_kinstr_prfcnt.h index ec42ce0..53e9674 100644 --- a/mali_kbase/mali_kbase_kinstr_prfcnt.h +++ b/mali_kbase/mali_kbase_kinstr_prfcnt.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2021-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -26,7 +26,7 @@ #ifndef _KBASE_KINSTR_PRFCNT_H_ #define _KBASE_KINSTR_PRFCNT_H_ -#include "mali_kbase_hwcnt_types.h" +#include "hwcnt/mali_kbase_hwcnt_types.h" #include <uapi/gpu/arm/midgard/mali_kbase_hwcnt_reader.h> struct kbase_kinstr_prfcnt_context; @@ -80,7 +80,6 @@ void kbase_kinstr_prfcnt_suspend(struct kbase_kinstr_prfcnt_context *kinstr_ctx) */ void kbase_kinstr_prfcnt_resume(struct kbase_kinstr_prfcnt_context *kinstr_ctx); -#if MALI_KERNEL_TEST_API /** * kbasep_kinstr_prfcnt_get_block_info_list() - Get list of all block types * with their information. @@ -124,7 +123,7 @@ size_t kbasep_kinstr_prfcnt_get_sample_md_count(const struct kbase_hwcnt_metadat int kbasep_kinstr_prfcnt_set_block_meta_items(struct kbase_hwcnt_enable_map *enable_map, struct kbase_hwcnt_dump_buffer *dst, struct prfcnt_metadata **block_meta_base, - u64 base_addr, u8 counter_set); + u8 *base_addr, u8 counter_set); /** * kbasep_kinstr_prfcnt_client_create() - Create a kinstr_prfcnt client. @@ -158,7 +157,6 @@ int kbasep_kinstr_prfcnt_cmd(struct kbase_kinstr_prfcnt_client *cli, * @cli: kinstr_prfcnt client. Must not be attached to a kinstr_prfcnt context. */ void kbasep_kinstr_prfcnt_client_destroy(struct kbase_kinstr_prfcnt_client *cli); -#endif /* MALI_KERNEL_TEST_API */ /** * kbase_kinstr_prfcnt_enum_info - Enumerate performance counter information. diff --git a/mali_kbase/mali_kbase_linux.h b/mali_kbase/mali_kbase_linux.h index 1d8d196..e5c6f7a 100644 --- a/mali_kbase/mali_kbase_linux.h +++ b/mali_kbase/mali_kbase_linux.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2010-2014, 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2010-2014, 2020-2022 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -33,7 +33,7 @@ #include <linux/module.h> #include <linux/atomic.h> -#if (defined(MALI_KERNEL_TEST_API) && (1 == MALI_KERNEL_TEST_API)) +#if IS_ENABLED(MALI_KERNEL_TEST_API) #define KBASE_EXPORT_TEST_API(func) EXPORT_SYMBOL(func) #else #define KBASE_EXPORT_TEST_API(func) diff --git a/mali_kbase/mali_kbase_mem.c b/mali_kbase/mali_kbase_mem.c index 6562f01..5547bef 100644 --- a/mali_kbase/mali_kbase_mem.c +++ b/mali_kbase/mali_kbase_mem.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2010-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2010-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -43,6 +43,11 @@ #include <mmu/mali_kbase_mmu.h> #include <mali_kbase_config_defaults.h> #include <mali_kbase_trace_gpu_mem.h> +#include <linux/version_compat_defs.h> +#define VA_REGION_SLAB_NAME_PREFIX "va-region-slab-" +#define VA_REGION_SLAB_NAME_SIZE (DEVNAME_SIZE + sizeof(VA_REGION_SLAB_NAME_PREFIX) + 1) + +#if MALI_JIT_PRESSURE_LIMIT_BASE /* * Alignment of objects allocated by the GPU inside a just-in-time memory @@ -66,6 +71,7 @@ */ #define KBASE_GPU_ALLOCATED_OBJECT_MAX_BYTES (512u) +#endif /* MALI_JIT_PRESSURE_LIMIT_BASE */ /* Forward declarations */ static void free_partial_locked(struct kbase_context *kctx, @@ -89,68 +95,72 @@ static size_t kbase_get_num_cpu_va_bits(struct kbase_context *kctx) #error "Unknown CPU VA width for this architecture" #endif -#if IS_ENABLED(CONFIG_64BIT) - if (kbase_ctx_flag(kctx, KCTX_COMPAT)) + if (kbase_ctx_compat_mode(kctx)) cpu_va_bits = 32; -#endif return cpu_va_bits; } -/* This function finds out which RB tree the given pfn from the GPU VA belongs - * to based on the memory zone the pfn refers to - */ -static struct rb_root *kbase_gpu_va_to_rbtree(struct kbase_context *kctx, - u64 gpu_pfn) +unsigned long kbase_zone_to_bits(enum kbase_memory_zone zone) { - struct rb_root *rbtree = NULL; + return ((((unsigned long)zone) & ((1 << KBASE_REG_ZONE_BITS) - 1ul)) + << KBASE_REG_ZONE_SHIFT); +} - struct kbase_reg_zone *exec_va_zone = kbase_ctx_reg_zone_get(kctx, KBASE_REG_ZONE_EXEC_VA); +enum kbase_memory_zone kbase_bits_to_zone(unsigned long zone_bits) +{ + return (enum kbase_memory_zone)(((zone_bits) & KBASE_REG_ZONE_MASK) + >> KBASE_REG_ZONE_SHIFT); +} +char *kbase_reg_zone_get_name(enum kbase_memory_zone zone) +{ + switch (zone) { + case SAME_VA_ZONE: + return "SAME_VA"; + case CUSTOM_VA_ZONE: + return "CUSTOM_VA"; + case EXEC_VA_ZONE: + return "EXEC_VA"; #if MALI_USE_CSF - struct kbase_reg_zone *fixed_va_zone = - kbase_ctx_reg_zone_get(kctx, KBASE_REG_ZONE_FIXED_VA); - - struct kbase_reg_zone *exec_fixed_va_zone = - kbase_ctx_reg_zone_get(kctx, KBASE_REG_ZONE_EXEC_FIXED_VA); - - if (gpu_pfn >= fixed_va_zone->base_pfn) { - rbtree = &kctx->reg_rbtree_fixed; - return rbtree; - } else if (gpu_pfn >= exec_fixed_va_zone->base_pfn) { - rbtree = &kctx->reg_rbtree_exec_fixed; - return rbtree; - } + case MCU_SHARED_ZONE: + return "MCU_SHARED"; + case EXEC_FIXED_VA_ZONE: + return "EXEC_FIXED_VA"; + case FIXED_VA_ZONE: + return "FIXED_VA"; #endif - if (gpu_pfn >= exec_va_zone->base_pfn) - rbtree = &kctx->reg_rbtree_exec; - else { - u64 same_va_end; + default: + return NULL; + } +} -#if IS_ENABLED(CONFIG_64BIT) - if 
(kbase_ctx_flag(kctx, KCTX_COMPAT)) { -#endif /* CONFIG_64BIT */ - same_va_end = KBASE_REG_ZONE_CUSTOM_VA_BASE; -#if IS_ENABLED(CONFIG_64BIT) - } else { - struct kbase_reg_zone *same_va_zone = - kbase_ctx_reg_zone_get(kctx, - KBASE_REG_ZONE_SAME_VA); - same_va_end = kbase_reg_zone_end_pfn(same_va_zone); - } -#endif /* CONFIG_64BIT */ +/** + * kbase_gpu_pfn_to_rbtree - find the rb-tree tracking the region with the indicated GPU + * page frame number + * @kctx: kbase context + * @gpu_pfn: GPU PFN address + * + * Context: any context. + * + * Return: reference to the rb-tree root, NULL if not found + */ +static struct rb_root *kbase_gpu_pfn_to_rbtree(struct kbase_context *kctx, u64 gpu_pfn) +{ + enum kbase_memory_zone zone_idx; + struct kbase_reg_zone *zone; - if (gpu_pfn >= same_va_end) - rbtree = &kctx->reg_rbtree_custom; - else - rbtree = &kctx->reg_rbtree_same; + for (zone_idx = 0; zone_idx < CONTEXT_ZONE_MAX; zone_idx++) { + zone = &kctx->reg_zone[zone_idx]; + if ((gpu_pfn >= zone->base_pfn) && (gpu_pfn < kbase_reg_zone_end_pfn(zone))) + return &zone->reg_rbtree; } - return rbtree; + return NULL; } /* This function inserts a region into the tree. */ -static void kbase_region_tracker_insert(struct kbase_va_region *new_reg) +void kbase_region_tracker_insert(struct kbase_va_region *new_reg) { u64 start_pfn = new_reg->start_pfn; struct rb_node **link = NULL; @@ -251,7 +261,9 @@ struct kbase_va_region *kbase_region_tracker_find_region_enclosing_address( lockdep_assert_held(&kctx->reg_lock); - rbtree = kbase_gpu_va_to_rbtree(kctx, gpu_pfn); + rbtree = kbase_gpu_pfn_to_rbtree(kctx, gpu_pfn); + if (unlikely(!rbtree)) + return NULL; return kbase_find_region_enclosing_address(rbtree, gpu_addr); } @@ -289,7 +301,9 @@ struct kbase_va_region *kbase_region_tracker_find_region_base_address( lockdep_assert_held(&kctx->reg_lock); - rbtree = kbase_gpu_va_to_rbtree(kctx, gpu_pfn); + rbtree = kbase_gpu_pfn_to_rbtree(kctx, gpu_pfn); + if (unlikely(!rbtree)) + return NULL; return kbase_find_region_base_address(rbtree, gpu_addr); } @@ -376,10 +390,12 @@ void kbase_remove_va_region(struct kbase_device *kbdev, struct kbase_va_region *reg) { struct rb_node *rbprev; + struct kbase_reg_zone *zone = container_of(reg->rbtree, struct kbase_reg_zone, reg_rbtree); struct kbase_va_region *prev = NULL; struct rb_node *rbnext; struct kbase_va_region *next = NULL; struct rb_root *reg_rbtree = NULL; + struct kbase_va_region *orig_reg = reg; int merged_front = 0; int merged_back = 0; @@ -399,8 +415,8 @@ void kbase_remove_va_region(struct kbase_device *kbdev, */ u64 prev_end_pfn = prev->start_pfn + prev->nr_pages; - WARN_ON((prev->flags & KBASE_REG_ZONE_MASK) != - (reg->flags & KBASE_REG_ZONE_MASK)); + WARN_ON((kbase_bits_to_zone(prev->flags)) != + (kbase_bits_to_zone(reg->flags))); if (!WARN_ON(reg->start_pfn < prev_end_pfn)) prev->nr_pages += reg->start_pfn - prev_end_pfn; prev->nr_pages += reg->nr_pages; @@ -421,32 +437,30 @@ void kbase_remove_va_region(struct kbase_device *kbdev, */ u64 reg_end_pfn = reg->start_pfn + reg->nr_pages; - WARN_ON((next->flags & KBASE_REG_ZONE_MASK) != - (reg->flags & KBASE_REG_ZONE_MASK)); + WARN_ON((kbase_bits_to_zone(next->flags)) != + (kbase_bits_to_zone(reg->flags))); if (!WARN_ON(next->start_pfn < reg_end_pfn)) next->nr_pages += next->start_pfn - reg_end_pfn; next->start_pfn = reg->start_pfn; next->nr_pages += reg->nr_pages; rb_erase(&(reg->rblink), reg_rbtree); merged_back = 1; - if (merged_front) { - /* We already merged with prev, free it */ - kfree(reg); - } } } - /* If we failed to 
merge then we need to add a new block */ - if (!(merged_front || merged_back)) { + if (merged_front && merged_back) { + /* We already merged with prev, free it */ + kfree(reg); + } else if (!(merged_front || merged_back)) { + /* If we failed to merge then we need to add a new block */ + /* * We didn't merge anything. Try to add a new free * placeholder, and in any case, remove the original one. */ struct kbase_va_region *free_reg; - free_reg = kbase_alloc_free_region(reg_rbtree, - reg->start_pfn, reg->nr_pages, - reg->flags & KBASE_REG_ZONE_MASK); + free_reg = kbase_alloc_free_region(zone, reg->start_pfn, reg->nr_pages); if (!free_reg) { /* In case of failure, we cannot allocate a replacement * free region, so we will be left with a 'gap' in the @@ -477,6 +491,12 @@ void kbase_remove_va_region(struct kbase_device *kbdev, rb_replace_node(&(reg->rblink), &(free_reg->rblink), reg_rbtree); } + /* This operation is always safe because the function never frees + * the region. If the region has been merged to both front and back, + * then it's the previous region that is supposed to be freed. + */ + orig_reg->start_pfn = 0; + out: return; } @@ -487,6 +507,7 @@ KBASE_EXPORT_TEST_API(kbase_remove_va_region); * kbase_insert_va_region_nolock - Insert a VA region to the list, * replacing the existing one. * + * @kbdev: The kbase device * @new_reg: The new region to insert * @at_reg: The region to replace * @start_pfn: The Page Frame Number to insert at @@ -494,10 +515,14 @@ KBASE_EXPORT_TEST_API(kbase_remove_va_region); * * Return: 0 on success, error code otherwise. */ -static int kbase_insert_va_region_nolock(struct kbase_va_region *new_reg, - struct kbase_va_region *at_reg, u64 start_pfn, size_t nr_pages) +static int kbase_insert_va_region_nolock(struct kbase_device *kbdev, + struct kbase_va_region *new_reg, + struct kbase_va_region *at_reg, u64 start_pfn, + size_t nr_pages) { struct rb_root *reg_rbtree = NULL; + struct kbase_reg_zone *zone = + container_of(at_reg->rbtree, struct kbase_reg_zone, reg_rbtree); int err = 0; reg_rbtree = at_reg->rbtree; @@ -539,10 +564,8 @@ static int kbase_insert_va_region_nolock(struct kbase_va_region *new_reg, else { struct kbase_va_region *new_front_reg; - new_front_reg = kbase_alloc_free_region(reg_rbtree, - at_reg->start_pfn, - start_pfn - at_reg->start_pfn, - at_reg->flags & KBASE_REG_ZONE_MASK); + new_front_reg = kbase_alloc_free_region(zone, at_reg->start_pfn, + start_pfn - at_reg->start_pfn); if (new_front_reg) { at_reg->nr_pages -= nr_pages + new_front_reg->nr_pages; @@ -595,9 +618,9 @@ int kbase_add_va_region(struct kbase_context *kctx, #endif if (!(reg->flags & KBASE_REG_GPU_NX) && !addr && #if MALI_USE_CSF - ((reg->flags & KBASE_REG_ZONE_MASK) != KBASE_REG_ZONE_EXEC_FIXED_VA) && + ((kbase_bits_to_zone(reg->flags)) != EXEC_FIXED_VA_ZONE) && #endif - ((reg->flags & KBASE_REG_ZONE_MASK) != KBASE_REG_ZONE_EXEC_VA)) { + ((kbase_bits_to_zone(reg->flags)) != EXEC_VA_ZONE)) { if (cpu_va_bits > gpu_pc_bits) { align = max(align, (size_t)((1ULL << gpu_pc_bits) >> PAGE_SHIFT)); @@ -615,8 +638,7 @@ int kbase_add_va_region(struct kbase_context *kctx, * then don't retry, we're out of VA and there is * nothing which can be done about it. 
*/ - if ((reg->flags & KBASE_REG_ZONE_MASK) != - KBASE_REG_ZONE_CUSTOM_VA) + if ((kbase_bits_to_zone(reg->flags)) != CUSTOM_VA_ZONE) break; } while (kbase_jit_evict(kctx)); @@ -679,8 +701,7 @@ int kbase_add_va_region_rbtree(struct kbase_device *kbdev, goto exit; } - err = kbase_insert_va_region_nolock(reg, tmp, gpu_pfn, - nr_pages); + err = kbase_insert_va_region_nolock(kbdev, reg, tmp, gpu_pfn, nr_pages); if (err) { dev_warn(dev, "Failed to insert va region"); err = -ENOMEM; @@ -705,8 +726,7 @@ int kbase_add_va_region_rbtree(struct kbase_device *kbdev, nr_pages, align_offset, align_mask, &start_pfn); if (tmp) { - err = kbase_insert_va_region_nolock(reg, tmp, - start_pfn, nr_pages); + err = kbase_insert_va_region_nolock(kbdev, reg, tmp, start_pfn, nr_pages); if (unlikely(err)) { dev_warn(dev, "Failed to insert region: 0x%08llx start_pfn, %zu nr_pages", start_pfn, nr_pages); @@ -722,85 +742,27 @@ exit: return err; } -/* - * @brief Initialize the internal region tracker data structure. +/** + * kbase_reg_to_kctx - Obtain the kbase context tracking a VA region. + * @reg: VA region + * + * Return: + * * pointer to kbase context of the memory allocation + * * NULL if the region does not belong to a kbase context (for instance, + * if the allocation corresponds to a shared MCU region on CSF). */ -#if MALI_USE_CSF -static void kbase_region_tracker_ds_init(struct kbase_context *kctx, - struct kbase_va_region *same_va_reg, - struct kbase_va_region *custom_va_reg, - struct kbase_va_region *exec_va_reg, - struct kbase_va_region *exec_fixed_va_reg, - struct kbase_va_region *fixed_va_reg) -{ - u64 last_zone_end_pfn; - - kctx->reg_rbtree_same = RB_ROOT; - kbase_region_tracker_insert(same_va_reg); - - last_zone_end_pfn = same_va_reg->start_pfn + same_va_reg->nr_pages; - - /* Although custom_va_reg doesn't always exist, initialize - * unconditionally because of the mem_view debugfs - * implementation which relies on it being empty. - */ - kctx->reg_rbtree_custom = RB_ROOT; - kctx->reg_rbtree_exec = RB_ROOT; - - if (custom_va_reg) { - WARN_ON(custom_va_reg->start_pfn < last_zone_end_pfn); - kbase_region_tracker_insert(custom_va_reg); - last_zone_end_pfn = custom_va_reg->start_pfn + custom_va_reg->nr_pages; - } - - /* Initialize exec, fixed and exec_fixed. These are always - * initialized at this stage, if they will exist at all. 
- */ - kctx->reg_rbtree_fixed = RB_ROOT; - kctx->reg_rbtree_exec_fixed = RB_ROOT; - - if (exec_va_reg) { - WARN_ON(exec_va_reg->start_pfn < last_zone_end_pfn); - kbase_region_tracker_insert(exec_va_reg); - last_zone_end_pfn = exec_va_reg->start_pfn + exec_va_reg->nr_pages; - } - - if (exec_fixed_va_reg) { - WARN_ON(exec_fixed_va_reg->start_pfn < last_zone_end_pfn); - kbase_region_tracker_insert(exec_fixed_va_reg); - last_zone_end_pfn = exec_fixed_va_reg->start_pfn + exec_fixed_va_reg->nr_pages; - } - - if (fixed_va_reg) { - WARN_ON(fixed_va_reg->start_pfn < last_zone_end_pfn); - kbase_region_tracker_insert(fixed_va_reg); - last_zone_end_pfn = fixed_va_reg->start_pfn + fixed_va_reg->nr_pages; - } -} -#else -static void kbase_region_tracker_ds_init(struct kbase_context *kctx, - struct kbase_va_region *same_va_reg, - struct kbase_va_region *custom_va_reg) +static struct kbase_context *kbase_reg_to_kctx(struct kbase_va_region *reg) { - kctx->reg_rbtree_same = RB_ROOT; - kbase_region_tracker_insert(same_va_reg); + struct rb_root *rbtree = reg->rbtree; + struct kbase_reg_zone *zone = container_of(rbtree, struct kbase_reg_zone, reg_rbtree); - /* Although custom_va_reg and exec_va_reg don't always exist, - * initialize unconditionally because of the mem_view debugfs - * implementation which relies on them being empty. - * - * The difference between the two is that the EXEC_VA region - * is never initialized at this stage. - */ - kctx->reg_rbtree_custom = RB_ROOT; - kctx->reg_rbtree_exec = RB_ROOT; + if (!kbase_is_ctx_reg_zone(zone->id)) + return NULL; - if (custom_va_reg) - kbase_region_tracker_insert(custom_va_reg); + return container_of(zone - zone->id, struct kbase_context, reg_zone[0]); } -#endif /* MALI_USE_CSF */ -static void kbase_region_tracker_erase_rbtree(struct rb_root *rbtree) +void kbase_region_tracker_erase_rbtree(struct rb_root *rbtree) { struct rb_node *rbnode; struct kbase_va_region *reg; @@ -810,7 +772,13 @@ static void kbase_region_tracker_erase_rbtree(struct rb_root *rbtree) if (rbnode) { rb_erase(rbnode, rbtree); reg = rb_entry(rbnode, struct kbase_va_region, rblink); - WARN_ON(reg->va_refcnt != 1); + WARN_ON(kbase_refcount_read(®->va_refcnt) != 1); + if (kbase_is_page_migration_enabled()) { + struct kbase_context *kctx = kbase_reg_to_kctx(reg); + + if (kctx) + kbase_gpu_munmap(kctx, reg); + } /* Reset the start_pfn - as the rbtree is being * destroyed and we've already erased this region, there * is no further need to attempt to remove it. 
@@ -825,214 +793,261 @@ static void kbase_region_tracker_erase_rbtree(struct rb_root *rbtree) } while (rbnode); } -void kbase_region_tracker_term(struct kbase_context *kctx) -{ - kbase_gpu_vm_lock(kctx); - kbase_region_tracker_erase_rbtree(&kctx->reg_rbtree_same); - kbase_region_tracker_erase_rbtree(&kctx->reg_rbtree_custom); - kbase_region_tracker_erase_rbtree(&kctx->reg_rbtree_exec); -#if MALI_USE_CSF - WARN_ON(!list_empty(&kctx->csf.event_pages_head)); - kbase_region_tracker_erase_rbtree(&kctx->reg_rbtree_exec_fixed); - kbase_region_tracker_erase_rbtree(&kctx->reg_rbtree_fixed); - -#endif - kbase_gpu_vm_unlock(kctx); -} - -void kbase_region_tracker_term_rbtree(struct rb_root *rbtree) -{ - kbase_region_tracker_erase_rbtree(rbtree); -} - static size_t kbase_get_same_va_bits(struct kbase_context *kctx) { return min_t(size_t, kbase_get_num_cpu_va_bits(kctx), kctx->kbdev->gpu_props.mmu.va_bits); } -int kbase_region_tracker_init(struct kbase_context *kctx) +static int kbase_reg_zone_same_va_init(struct kbase_context *kctx, u64 gpu_va_limit) { - struct kbase_va_region *same_va_reg; - struct kbase_va_region *custom_va_reg = NULL; - size_t same_va_bits = kbase_get_same_va_bits(kctx); - u64 custom_va_size = KBASE_REG_ZONE_CUSTOM_VA_SIZE; - u64 gpu_va_bits = kctx->kbdev->gpu_props.mmu.va_bits; - u64 gpu_va_limit = (1ULL << gpu_va_bits) >> PAGE_SHIFT; - u64 same_va_pages; - u64 same_va_base = 1u; int err; -#if MALI_USE_CSF - struct kbase_va_region *exec_va_reg; - struct kbase_va_region *exec_fixed_va_reg; - struct kbase_va_region *fixed_va_reg; - - u64 exec_va_base; - u64 fixed_va_end; - u64 exec_fixed_va_base; - u64 fixed_va_base; - u64 fixed_va_pages; -#endif - - /* Take the lock as kbase_free_alloced_region requires it */ - kbase_gpu_vm_lock(kctx); + struct kbase_reg_zone *zone = kbase_ctx_reg_zone_get(kctx, SAME_VA_ZONE); + const size_t same_va_bits = kbase_get_same_va_bits(kctx); + const u64 base_pfn = 1u; + u64 nr_pages = (1ULL << (same_va_bits - PAGE_SHIFT)) - base_pfn; - same_va_pages = (1ULL << (same_va_bits - PAGE_SHIFT)) - same_va_base; + lockdep_assert_held(&kctx->reg_lock); #if MALI_USE_CSF - if ((same_va_base + same_va_pages) > KBASE_REG_ZONE_EXEC_VA_BASE_64) { + if ((base_pfn + nr_pages) > KBASE_REG_ZONE_EXEC_VA_BASE_64) { /* Depending on how the kernel is configured, it's possible (eg on aarch64) for * same_va_bits to reach 48 bits. Cap same_va_pages so that the same_va zone * doesn't cross into the exec_va zone. 
*/ - same_va_pages = KBASE_REG_ZONE_EXEC_VA_BASE_64 - same_va_base; + nr_pages = KBASE_REG_ZONE_EXEC_VA_BASE_64 - base_pfn; } #endif + err = kbase_reg_zone_init(kctx->kbdev, zone, SAME_VA_ZONE, base_pfn, nr_pages); + if (err) + return -ENOMEM; - /* all have SAME_VA */ - same_va_reg = - kbase_alloc_free_region(&kctx->reg_rbtree_same, same_va_base, - same_va_pages, KBASE_REG_ZONE_SAME_VA); + kctx->gpu_va_end = base_pfn + nr_pages; - if (!same_va_reg) { - err = -ENOMEM; - goto fail_unlock; - } - kbase_ctx_reg_zone_init(kctx, KBASE_REG_ZONE_SAME_VA, same_va_base, - same_va_pages); + return 0; +} -#if IS_ENABLED(CONFIG_64BIT) - /* 32-bit clients have custom VA zones */ - if (kbase_ctx_flag(kctx, KCTX_COMPAT)) { -#endif - if (gpu_va_limit <= KBASE_REG_ZONE_CUSTOM_VA_BASE) { - err = -EINVAL; - goto fail_free_same_va; - } - /* If the current size of TMEM is out of range of the - * virtual address space addressable by the MMU then - * we should shrink it to fit - */ - if ((KBASE_REG_ZONE_CUSTOM_VA_BASE + KBASE_REG_ZONE_CUSTOM_VA_SIZE) >= gpu_va_limit) - custom_va_size = gpu_va_limit - KBASE_REG_ZONE_CUSTOM_VA_BASE; +static void kbase_reg_zone_same_va_term(struct kbase_context *kctx) +{ + struct kbase_reg_zone *zone = kbase_ctx_reg_zone_get(kctx, SAME_VA_ZONE); - custom_va_reg = kbase_alloc_free_region( - &kctx->reg_rbtree_custom, - KBASE_REG_ZONE_CUSTOM_VA_BASE, - custom_va_size, KBASE_REG_ZONE_CUSTOM_VA); + kbase_reg_zone_term(zone); +} - if (!custom_va_reg) { - err = -ENOMEM; - goto fail_free_same_va; - } - kbase_ctx_reg_zone_init(kctx, KBASE_REG_ZONE_CUSTOM_VA, - KBASE_REG_ZONE_CUSTOM_VA_BASE, - custom_va_size); -#if IS_ENABLED(CONFIG_64BIT) - } else { - custom_va_size = 0; - } -#endif +static int kbase_reg_zone_custom_va_init(struct kbase_context *kctx, u64 gpu_va_limit) +{ + struct kbase_reg_zone *zone = kbase_ctx_reg_zone_get(kctx, CUSTOM_VA_ZONE); + u64 nr_pages = KBASE_REG_ZONE_CUSTOM_VA_SIZE; -#if MALI_USE_CSF - /* The position of EXEC_VA depends on whether the client is 32-bit or 64-bit. */ - exec_va_base = KBASE_REG_ZONE_EXEC_VA_BASE_64; + /* If the context does not support CUSTOM_VA zones, then we don't need to + * proceed past this point, and can pretend that it was initialized properly. + * In practice, this will mean that the zone metadata structure will be zero + * initialized and not contain a valid zone ID. + */ + if (!kbase_ctx_compat_mode(kctx)) + return 0; + + if (gpu_va_limit <= KBASE_REG_ZONE_CUSTOM_VA_BASE) + return -EINVAL; - /* Similarly the end of the FIXED_VA zone also depends on whether the client - * is 32 or 64-bits. + /* If the current size of TMEM is out of range of the + * virtual address space addressable by the MMU then + * we should shrink it to fit */ - fixed_va_end = KBASE_REG_ZONE_FIXED_VA_END_64; + if ((KBASE_REG_ZONE_CUSTOM_VA_BASE + KBASE_REG_ZONE_CUSTOM_VA_SIZE) >= gpu_va_limit) + nr_pages = gpu_va_limit - KBASE_REG_ZONE_CUSTOM_VA_BASE; -#if IS_ENABLED(CONFIG_64BIT) - if (kbase_ctx_flag(kctx, KCTX_COMPAT)) { - exec_va_base = KBASE_REG_ZONE_EXEC_VA_BASE_32; - fixed_va_end = KBASE_REG_ZONE_FIXED_VA_END_32; - } + if (kbase_reg_zone_init(kctx->kbdev, zone, CUSTOM_VA_ZONE, KBASE_REG_ZONE_CUSTOM_VA_BASE, + nr_pages)) + return -ENOMEM; + + /* On JM systems, this is the last memory zone that gets initialized, + * so the GPU VA ends right after the end of the CUSTOM_VA zone. 
On CSF, + * setting here is harmless, as the FIXED_VA initializer will overwrite + * it + */ + kctx->gpu_va_end += nr_pages; + + return 0; +} + +static void kbase_reg_zone_custom_va_term(struct kbase_context *kctx) +{ + struct kbase_reg_zone *zone = kbase_ctx_reg_zone_get(kctx, CUSTOM_VA_ZONE); + + kbase_reg_zone_term(zone); +} + +static inline u64 kbase_get_exec_va_zone_base(struct kbase_context *kctx) +{ + u64 base_pfn; + +#if MALI_USE_CSF + base_pfn = KBASE_REG_ZONE_EXEC_VA_BASE_64; + if (kbase_ctx_compat_mode(kctx)) + base_pfn = KBASE_REG_ZONE_EXEC_VA_BASE_32; +#else + /* EXEC_VA zone's codepaths are slightly easier when its base_pfn is + * initially U64_MAX + */ + base_pfn = U64_MAX; #endif - kbase_ctx_reg_zone_init(kctx, KBASE_REG_ZONE_EXEC_VA, exec_va_base, - KBASE_REG_ZONE_EXEC_VA_SIZE); + return base_pfn; +} - exec_va_reg = kbase_alloc_free_region(&kctx->reg_rbtree_exec, exec_va_base, - KBASE_REG_ZONE_EXEC_VA_SIZE, KBASE_REG_ZONE_EXEC_VA); +static inline int kbase_reg_zone_exec_va_init(struct kbase_context *kctx, u64 gpu_va_limit) +{ + struct kbase_reg_zone *zone = kbase_ctx_reg_zone_get(kctx, EXEC_VA_ZONE); + const u64 base_pfn = kbase_get_exec_va_zone_base(kctx); + u64 nr_pages = KBASE_REG_ZONE_EXEC_VA_SIZE; - if (!exec_va_reg) { - err = -ENOMEM; - goto fail_free_custom_va; - } +#if !MALI_USE_CSF + nr_pages = 0; +#endif - exec_fixed_va_base = exec_va_base + KBASE_REG_ZONE_EXEC_VA_SIZE; + return kbase_reg_zone_init(kctx->kbdev, zone, EXEC_VA_ZONE, base_pfn, nr_pages); +} - kbase_ctx_reg_zone_init(kctx, KBASE_REG_ZONE_EXEC_FIXED_VA, exec_fixed_va_base, - KBASE_REG_ZONE_EXEC_FIXED_VA_SIZE); +static void kbase_reg_zone_exec_va_term(struct kbase_context *kctx) +{ + struct kbase_reg_zone *zone = kbase_ctx_reg_zone_get(kctx, EXEC_VA_ZONE); - exec_fixed_va_reg = - kbase_alloc_free_region(&kctx->reg_rbtree_exec_fixed, exec_fixed_va_base, - KBASE_REG_ZONE_EXEC_FIXED_VA_SIZE, - KBASE_REG_ZONE_EXEC_FIXED_VA); + kbase_reg_zone_term(zone); +} - if (!exec_fixed_va_reg) { - err = -ENOMEM; - goto fail_free_exec_va; - } +#if MALI_USE_CSF +static inline u64 kbase_get_exec_fixed_va_zone_base(struct kbase_context *kctx) +{ + return kbase_get_exec_va_zone_base(kctx) + KBASE_REG_ZONE_EXEC_VA_SIZE; +} + +static int kbase_reg_zone_exec_fixed_va_init(struct kbase_context *kctx, u64 gpu_va_limit) +{ + struct kbase_reg_zone *zone = kbase_ctx_reg_zone_get(kctx, EXEC_FIXED_VA_ZONE); + const u64 base_pfn = kbase_get_exec_fixed_va_zone_base(kctx); + + return kbase_reg_zone_init(kctx->kbdev, zone, EXEC_FIXED_VA_ZONE, base_pfn, + KBASE_REG_ZONE_EXEC_FIXED_VA_SIZE); +} - fixed_va_base = exec_fixed_va_base + KBASE_REG_ZONE_EXEC_FIXED_VA_SIZE; - fixed_va_pages = fixed_va_end - fixed_va_base; +static void kbase_reg_zone_exec_fixed_va_term(struct kbase_context *kctx) +{ + struct kbase_reg_zone *zone = kbase_ctx_reg_zone_get(kctx, EXEC_FIXED_VA_ZONE); + + WARN_ON(!list_empty(&kctx->csf.event_pages_head)); + kbase_reg_zone_term(zone); +} + +static int kbase_reg_zone_fixed_va_init(struct kbase_context *kctx, u64 gpu_va_limit) +{ + struct kbase_reg_zone *zone = kbase_ctx_reg_zone_get(kctx, FIXED_VA_ZONE); + const u64 base_pfn = + kbase_get_exec_fixed_va_zone_base(kctx) + KBASE_REG_ZONE_EXEC_FIXED_VA_SIZE; + u64 fixed_va_end = KBASE_REG_ZONE_FIXED_VA_END_64; + u64 nr_pages; + + if (kbase_ctx_compat_mode(kctx)) + fixed_va_end = KBASE_REG_ZONE_FIXED_VA_END_32; - kbase_ctx_reg_zone_init(kctx, KBASE_REG_ZONE_FIXED_VA, fixed_va_base, fixed_va_pages); + nr_pages = fixed_va_end - base_pfn; - fixed_va_reg = 
kbase_alloc_free_region(&kctx->reg_rbtree_fixed, fixed_va_base, - fixed_va_pages, KBASE_REG_ZONE_FIXED_VA); + if (kbase_reg_zone_init(kctx->kbdev, zone, FIXED_VA_ZONE, base_pfn, nr_pages)) + return -ENOMEM; kctx->gpu_va_end = fixed_va_end; - if (!fixed_va_reg) { - err = -ENOMEM; - goto fail_free_exec_fixed_va; - } + return 0; +} - kbase_region_tracker_ds_init(kctx, same_va_reg, custom_va_reg, exec_va_reg, - exec_fixed_va_reg, fixed_va_reg); +static void kbase_reg_zone_fixed_va_term(struct kbase_context *kctx) +{ + struct kbase_reg_zone *zone = kbase_ctx_reg_zone_get(kctx, FIXED_VA_ZONE); - INIT_LIST_HEAD(&kctx->csf.event_pages_head); -#else - /* EXEC_VA zone's codepaths are slightly easier when its base_pfn is - * initially U64_MAX - */ - kbase_ctx_reg_zone_init(kctx, KBASE_REG_ZONE_EXEC_VA, U64_MAX, 0u); - /* Other zones are 0: kbase_create_context() uses vzalloc */ + kbase_reg_zone_term(zone); +} +#endif + +typedef int kbase_memory_zone_init(struct kbase_context *kctx, u64 gpu_va_limit); +typedef void kbase_memory_zone_term(struct kbase_context *kctx); + +struct kbase_memory_zone_init_meta { + kbase_memory_zone_init *init; + kbase_memory_zone_term *term; + char *error_msg; +}; + +static const struct kbase_memory_zone_init_meta zones_init[] = { + [SAME_VA_ZONE] = { kbase_reg_zone_same_va_init, kbase_reg_zone_same_va_term, + "Could not initialize SAME_VA zone" }, + [CUSTOM_VA_ZONE] = { kbase_reg_zone_custom_va_init, kbase_reg_zone_custom_va_term, + "Could not initialize CUSTOM_VA zone" }, + [EXEC_VA_ZONE] = { kbase_reg_zone_exec_va_init, kbase_reg_zone_exec_va_term, + "Could not initialize EXEC_VA zone" }, +#if MALI_USE_CSF + [EXEC_FIXED_VA_ZONE] = { kbase_reg_zone_exec_fixed_va_init, + kbase_reg_zone_exec_fixed_va_term, + "Could not initialize EXEC_FIXED_VA zone" }, + [FIXED_VA_ZONE] = { kbase_reg_zone_fixed_va_init, kbase_reg_zone_fixed_va_term, + "Could not initialize FIXED_VA zone" }, +#endif +}; - kbase_region_tracker_ds_init(kctx, same_va_reg, custom_va_reg); - kctx->gpu_va_end = same_va_base + same_va_pages + custom_va_size; +int kbase_region_tracker_init(struct kbase_context *kctx) +{ + const u64 gpu_va_bits = kctx->kbdev->gpu_props.mmu.va_bits; + const u64 gpu_va_limit = (1ULL << gpu_va_bits) >> PAGE_SHIFT; + int err; + unsigned int i; + + /* Take the lock as kbase_free_alloced_region requires it */ + kbase_gpu_vm_lock(kctx); + + for (i = 0; i < ARRAY_SIZE(zones_init); i++) { + err = zones_init[i].init(kctx, gpu_va_limit); + if (unlikely(err)) { + dev_err(kctx->kbdev->dev, "%s, err = %d\n", zones_init[i].error_msg, err); + goto term; + } + } +#if MALI_USE_CSF + INIT_LIST_HEAD(&kctx->csf.event_pages_head); #endif kctx->jit_va = false; kbase_gpu_vm_unlock(kctx); - return 0; -#if MALI_USE_CSF -fail_free_exec_fixed_va: - kbase_free_alloced_region(exec_fixed_va_reg); -fail_free_exec_va: - kbase_free_alloced_region(exec_va_reg); -fail_free_custom_va: - if (custom_va_reg) - kbase_free_alloced_region(custom_va_reg); -#endif + return 0; +term: + while (i-- > 0) + zones_init[i].term(kctx); -fail_free_same_va: - kbase_free_alloced_region(same_va_reg); -fail_unlock: kbase_gpu_vm_unlock(kctx); return err; } +void kbase_region_tracker_term(struct kbase_context *kctx) +{ + unsigned int i; + + WARN(kctx->as_nr != KBASEP_AS_NR_INVALID, + "kctx-%d_%d must first be scheduled out to flush GPU caches+tlbs before erasing remaining regions", + kctx->tgid, kctx->id); + + kbase_gpu_vm_lock(kctx); + + for (i = 0; i < ARRAY_SIZE(zones_init); i++) + zones_init[i].term(kctx); + + kbase_gpu_vm_unlock(kctx); +} 
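The reworked region tracker above replaces the old goto-based error unwinding with a table-driven scheme: zones_init[] pairs an init and a term callback (plus an error message) per zone, kbase_region_tracker_init() walks the table in order and on failure tears down only the entries that already succeeded, and kbase_region_tracker_term() reuses the same table so setup and teardown order cannot drift apart. The standalone C sketch below illustrates that idiom outside the driver; the stage names, callbacks and the injected failure are hypothetical stand-ins for the real zone initializers, not code from this commit.

#include <stdio.h>

/* Hypothetical stand-ins for the per-zone init/term callbacks. */
typedef int  (*stage_init_fn)(void *ctx);
typedef void (*stage_term_fn)(void *ctx);

struct stage_meta {
	stage_init_fn init;
	stage_term_fn term;
	const char *error_msg;
};

static int  init_ok(void *ctx)   { (void)ctx; return 0; }
static int  init_fail(void *ctx) { (void)ctx; return -12; /* pretend -ENOMEM */ }
static void term_noop(void *ctx) { (void)ctx; }

/* Table analogous to zones_init[]: the array order defines both the init
 * order and the reverse teardown order.
 */
static const struct stage_meta stages[] = {
	{ init_ok,   term_noop, "Could not initialize stage A" },
	{ init_ok,   term_noop, "Could not initialize stage B" },
	{ init_fail, term_noop, "Could not initialize stage C" },
};

static int tracker_init(void *ctx)
{
	unsigned int i;
	int err;

	for (i = 0; i < sizeof(stages) / sizeof(stages[0]); i++) {
		err = stages[i].init(ctx);
		if (err) {
			fprintf(stderr, "%s, err = %d\n", stages[i].error_msg, err);
			/* Roll back only the stages that already succeeded. */
			while (i-- > 0)
				stages[i].term(ctx);
			return err;
		}
	}
	return 0;
}

int main(void)
{
	/* Stage C fails, so B and then A are torn down in reverse order and
	 * the error is propagated -- the same shape as the `term:` path in
	 * kbase_region_tracker_init() above.
	 */
	return tracker_init(NULL) ? 1 : 0;
}
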
+ static bool kbase_has_exec_va_zone_locked(struct kbase_context *kctx) { struct kbase_reg_zone *exec_va_zone; lockdep_assert_held(&kctx->reg_lock); - exec_va_zone = kbase_ctx_reg_zone_get(kctx, KBASE_REG_ZONE_EXEC_VA); + exec_va_zone = kbase_ctx_reg_zone_get(kctx, EXEC_VA_ZONE); return (exec_va_zone->base_pfn != U64_MAX); } @@ -1072,16 +1087,16 @@ static bool kbase_region_tracker_has_allocs(struct kbase_context *kctx) lockdep_assert_held(&kctx->reg_lock); - for (zone_idx = 0; zone_idx < KBASE_REG_ZONE_MAX; ++zone_idx) { + for (zone_idx = 0; zone_idx < MEMORY_ZONE_MAX; zone_idx++) { struct kbase_reg_zone *zone; struct kbase_va_region *reg; u64 zone_base_addr; - unsigned long zone_bits = KBASE_REG_ZONE(zone_idx); - unsigned long reg_zone; + enum kbase_memory_zone reg_zone; - if (!kbase_is_ctx_reg_zone(zone_bits)) + if (!kbase_is_ctx_reg_zone(zone_idx)) continue; - zone = kbase_ctx_reg_zone_get(kctx, zone_bits); + + zone = kbase_ctx_reg_zone_get(kctx, zone_idx); zone_base_addr = zone->base_pfn << PAGE_SHIFT; reg = kbase_region_tracker_find_region_base_address( @@ -1089,21 +1104,21 @@ static bool kbase_region_tracker_has_allocs(struct kbase_context *kctx) if (!zone->va_size_pages) { WARN(reg, - "Should not have found a region that starts at 0x%.16llx for zone 0x%lx", - (unsigned long long)zone_base_addr, zone_bits); + "Should not have found a region that starts at 0x%.16llx for zone %s", + (unsigned long long)zone_base_addr, kbase_reg_zone_get_name(zone_idx)); continue; } if (WARN(!reg, - "There should always be a region that starts at 0x%.16llx for zone 0x%lx, couldn't find it", - (unsigned long long)zone_base_addr, zone_bits)) + "There should always be a region that starts at 0x%.16llx for zone %s, couldn't find it", + (unsigned long long)zone_base_addr, kbase_reg_zone_get_name(zone_idx))) return true; /* Safest return value */ - reg_zone = reg->flags & KBASE_REG_ZONE_MASK; - if (WARN(reg_zone != zone_bits, - "The region that starts at 0x%.16llx should be in zone 0x%lx but was found in the wrong zone 0x%lx", - (unsigned long long)zone_base_addr, zone_bits, - reg_zone)) + reg_zone = kbase_bits_to_zone(reg->flags); + if (WARN(reg_zone != zone_idx, + "The region that starts at 0x%.16llx should be in zone %s but was found in the wrong zone %s", + (unsigned long long)zone_base_addr, kbase_reg_zone_get_name(zone_idx), + kbase_reg_zone_get_name(reg_zone))) return true; /* Safest return value */ /* Unless the region is completely free, of the same size as @@ -1120,15 +1135,12 @@ static bool kbase_region_tracker_has_allocs(struct kbase_context *kctx) return false; } -#if IS_ENABLED(CONFIG_64BIT) static int kbase_region_tracker_init_jit_64(struct kbase_context *kctx, u64 jit_va_pages) { struct kbase_va_region *same_va_reg; - struct kbase_reg_zone *same_va_zone; + struct kbase_reg_zone *same_va_zone, *custom_va_zone; u64 same_va_zone_base_addr; - const unsigned long same_va_zone_bits = KBASE_REG_ZONE_SAME_VA; - struct kbase_va_region *custom_va_reg; u64 jit_va_start; lockdep_assert_held(&kctx->reg_lock); @@ -1139,14 +1151,14 @@ static int kbase_region_tracker_init_jit_64(struct kbase_context *kctx, * cause an overlap to happen with existing same VA allocations and the * custom VA zone. 
*/ - same_va_zone = kbase_ctx_reg_zone_get(kctx, same_va_zone_bits); + same_va_zone = kbase_ctx_reg_zone_get(kctx, SAME_VA_ZONE); same_va_zone_base_addr = same_va_zone->base_pfn << PAGE_SHIFT; same_va_reg = kbase_region_tracker_find_region_base_address( kctx, same_va_zone_base_addr); if (WARN(!same_va_reg, - "Already found a free region at the start of every zone, but now cannot find any region for zone base 0x%.16llx zone 0x%lx", - (unsigned long long)same_va_zone_base_addr, same_va_zone_bits)) + "Already found a free region at the start of every zone, but now cannot find any region for zone SAME_VA base 0x%.16llx", + (unsigned long long)same_va_zone_base_addr)) return -ENOMEM; /* kbase_region_tracker_has_allocs() in the caller has already ensured @@ -1167,28 +1179,17 @@ static int kbase_region_tracker_init_jit_64(struct kbase_context *kctx, /* * Create a custom VA zone at the end of the VA for allocations which - * JIT can use so it doesn't have to allocate VA from the kernel. - */ - custom_va_reg = - kbase_alloc_free_region(&kctx->reg_rbtree_custom, jit_va_start, - jit_va_pages, KBASE_REG_ZONE_CUSTOM_VA); - - /* - * The context will be destroyed if we fail here so no point - * reverting the change we made to same_va. + * JIT can use so it doesn't have to allocate VA from the kernel. Note + * that while the zone has already been zero-initialized during the + * region tracker initialization, we can just overwrite it. */ - if (!custom_va_reg) + custom_va_zone = kbase_ctx_reg_zone_get(kctx, CUSTOM_VA_ZONE); + if (kbase_reg_zone_init(kctx->kbdev, custom_va_zone, CUSTOM_VA_ZONE, jit_va_start, + jit_va_pages)) return -ENOMEM; - /* Since this is 64-bit, the custom zone will not have been - * initialized, so initialize it now - */ - kbase_ctx_reg_zone_init(kctx, KBASE_REG_ZONE_CUSTOM_VA, jit_va_start, - jit_va_pages); - kbase_region_tracker_insert(custom_va_reg); return 0; } -#endif int kbase_region_tracker_init_jit(struct kbase_context *kctx, u64 jit_va_pages, int max_allocations, int trim_level, int group_id, @@ -1229,10 +1230,8 @@ int kbase_region_tracker_init_jit(struct kbase_context *kctx, u64 jit_va_pages, goto exit_unlock; } -#if IS_ENABLED(CONFIG_64BIT) - if (!kbase_ctx_flag(kctx, KCTX_COMPAT)) + if (!kbase_ctx_compat_mode(kctx)) err = kbase_region_tracker_init_jit_64(kctx, jit_va_pages); -#endif /* * Nothing to do for 32-bit clients, JIT uses the existing * custom VA zone. 
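In the 64-bit JIT path just above, the CUSTOM_VA zone slot that was left zero-initialized during region tracker setup is re-initialized in place to cover the JIT range, instead of allocating a brand-new free region as the removed code did. Below is a minimal, self-contained sketch of that kind of carve-out; the concrete page counts, the struct layout and the assumption that the JIT range is taken from the tail of the SAME_VA zone after shrinking it are illustrative only, not read out of the diff.

#include <stdio.h>
#include <stdint.h>

struct zone {
	uint64_t base_pfn;       /* first GPU page frame of the zone */
	uint64_t va_size_pages;  /* zone length in pages */
};

static uint64_t zone_end_pfn(const struct zone *z)
{
	return z->base_pfn + z->va_size_pages;
}

int main(void)
{
	/* Illustrative numbers only: a SAME_VA-like donor zone starting at PFN 1. */
	struct zone same_va = { .base_pfn = 1, .va_size_pages = 1u << 20 };
	struct zone custom_va = { 0 };          /* dormant, zero-initialized slot */
	const uint64_t jit_va_pages = 1u << 16; /* requested JIT zone size */

	/* Shrink the donor zone, then re-initialize the dormant zone to cover
	 * exactly the pages that were given up, so the two ranges never overlap.
	 */
	same_va.va_size_pages -= jit_va_pages;
	custom_va.base_pfn = zone_end_pfn(&same_va);
	custom_va.va_size_pages = jit_va_pages;

	printf("SAME_VA:   [%llu, %llu)\n",
	       (unsigned long long)same_va.base_pfn,
	       (unsigned long long)zone_end_pfn(&same_va));
	printf("CUSTOM_VA: [%llu, %llu)\n",
	       (unsigned long long)custom_va.base_pfn,
	       (unsigned long long)zone_end_pfn(&custom_va));
	return 0;
}

Keeping the donor zone's end equal to the new zone's base matters because per-zone lookups such as kbase_gpu_pfn_to_rbtree() resolve a PFN by checking which single zone's [base_pfn, end_pfn) range contains it.
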
@@ -1259,12 +1258,11 @@ exit_unlock: int kbase_region_tracker_init_exec(struct kbase_context *kctx, u64 exec_va_pages) { #if !MALI_USE_CSF - struct kbase_va_region *exec_va_reg; struct kbase_reg_zone *exec_va_zone; struct kbase_reg_zone *target_zone; struct kbase_va_region *target_reg; u64 target_zone_base_addr; - unsigned long target_zone_bits; + enum kbase_memory_zone target_zone_id; u64 exec_va_start; int err; #endif @@ -1308,25 +1306,23 @@ int kbase_region_tracker_init_exec(struct kbase_context *kctx, u64 exec_va_pages goto exit_unlock; } -#if IS_ENABLED(CONFIG_64BIT) - if (kbase_ctx_flag(kctx, KCTX_COMPAT)) { -#endif + if (kbase_ctx_compat_mode(kctx)) { /* 32-bit client: take from CUSTOM_VA zone */ - target_zone_bits = KBASE_REG_ZONE_CUSTOM_VA; -#if IS_ENABLED(CONFIG_64BIT) + target_zone_id = CUSTOM_VA_ZONE; } else { /* 64-bit client: take from SAME_VA zone */ - target_zone_bits = KBASE_REG_ZONE_SAME_VA; + target_zone_id = SAME_VA_ZONE; } -#endif - target_zone = kbase_ctx_reg_zone_get(kctx, target_zone_bits); + + target_zone = kbase_ctx_reg_zone_get(kctx, target_zone_id); target_zone_base_addr = target_zone->base_pfn << PAGE_SHIFT; target_reg = kbase_region_tracker_find_region_base_address( kctx, target_zone_base_addr); if (WARN(!target_reg, - "Already found a free region at the start of every zone, but now cannot find any region for zone base 0x%.16llx zone 0x%lx", - (unsigned long long)target_zone_base_addr, target_zone_bits)) { + "Already found a free region at the start of every zone, but now cannot find any region for zone base 0x%.16llx zone %s", + (unsigned long long)target_zone_base_addr, + kbase_reg_zone_get_name(target_zone_id))) { err = -ENOMEM; goto exit_unlock; } @@ -1345,28 +1341,14 @@ int kbase_region_tracker_init_exec(struct kbase_context *kctx, u64 exec_va_pages /* Taken from the end of the target zone */ exec_va_start = kbase_reg_zone_end_pfn(target_zone) - exec_va_pages; - - exec_va_reg = kbase_alloc_free_region(&kctx->reg_rbtree_exec, - exec_va_start, - exec_va_pages, - KBASE_REG_ZONE_EXEC_VA); - if (!exec_va_reg) { - err = -ENOMEM; - goto exit_unlock; - } - /* Update EXEC_VA zone - * - * not using kbase_ctx_reg_zone_init() - it was already initialized - */ - exec_va_zone = kbase_ctx_reg_zone_get(kctx, KBASE_REG_ZONE_EXEC_VA); - exec_va_zone->base_pfn = exec_va_start; - exec_va_zone->va_size_pages = exec_va_pages; + exec_va_zone = kbase_ctx_reg_zone_get(kctx, EXEC_VA_ZONE); + if (kbase_reg_zone_init(kctx->kbdev, exec_va_zone, EXEC_VA_ZONE, exec_va_start, + exec_va_pages)) + return -ENOMEM; /* Update target zone and corresponding region */ target_reg->nr_pages -= exec_va_pages; target_zone->va_size_pages -= exec_va_pages; - - kbase_region_tracker_insert(exec_va_reg); err = 0; exit_unlock: @@ -1378,36 +1360,40 @@ exit_unlock: #if MALI_USE_CSF void kbase_mcu_shared_interface_region_tracker_term(struct kbase_device *kbdev) { - kbase_region_tracker_term_rbtree(&kbdev->csf.shared_reg_rbtree); + kbase_reg_zone_term(&kbdev->csf.mcu_shared_zone); } int kbase_mcu_shared_interface_region_tracker_init(struct kbase_device *kbdev) { - struct kbase_va_region *shared_reg; - u64 shared_reg_start_pfn; - u64 shared_reg_size; - - shared_reg_start_pfn = KBASE_REG_ZONE_MCU_SHARED_BASE; - shared_reg_size = KBASE_REG_ZONE_MCU_SHARED_SIZE; - - kbdev->csf.shared_reg_rbtree = RB_ROOT; - - shared_reg = kbase_alloc_free_region(&kbdev->csf.shared_reg_rbtree, - shared_reg_start_pfn, - shared_reg_size, - KBASE_REG_ZONE_MCU_SHARED); - if (!shared_reg) - return -ENOMEM; - - 
kbase_region_tracker_insert(shared_reg); - return 0; + return kbase_reg_zone_init(kbdev, &kbdev->csf.mcu_shared_zone, MCU_SHARED_ZONE, + KBASE_REG_ZONE_MCU_SHARED_BASE, MCU_SHARED_ZONE_SIZE); } #endif +static void kbasep_mem_page_size_init(struct kbase_device *kbdev) +{ +#if IS_ENABLED(CONFIG_LARGE_PAGE_ALLOC_OVERRIDE) +#if IS_ENABLED(CONFIG_LARGE_PAGE_ALLOC) + kbdev->pagesize_2mb = true; + if (kbase_hw_has_feature(kbdev, BASE_HW_FEATURE_LARGE_PAGE_ALLOC) != 1) { + dev_warn( + kbdev->dev, + "2MB page is enabled by force while current GPU-HW doesn't meet the requirement to do so.\n"); + } +#else /* IS_ENABLED(CONFIG_LARGE_PAGE_ALLOC) */ + kbdev->pagesize_2mb = false; +#endif /* IS_ENABLED(CONFIG_LARGE_PAGE_ALLOC) */ +#else /* IS_ENABLED(CONFIG_LARGE_PAGE_ALLOC_OVERRIDE) */ + /* Set it to the default based on which GPU is present */ + kbdev->pagesize_2mb = kbase_hw_has_feature(kbdev, BASE_HW_FEATURE_LARGE_PAGE_ALLOC); +#endif /* IS_ENABLED(CONFIG_LARGE_PAGE_ALLOC_OVERRIDE) */ +} + int kbase_mem_init(struct kbase_device *kbdev) { int err = 0; struct kbasep_mem_device *memdev; + char va_region_slab_name[VA_REGION_SLAB_NAME_SIZE]; #if IS_ENABLED(CONFIG_OF) struct device_node *mgm_node = NULL; #endif @@ -1416,6 +1402,20 @@ int kbase_mem_init(struct kbase_device *kbdev) memdev = &kbdev->memdev; + kbasep_mem_page_size_init(kbdev); + + scnprintf(va_region_slab_name, VA_REGION_SLAB_NAME_SIZE, VA_REGION_SLAB_NAME_PREFIX "%s", + kbdev->devname); + + /* Initialize slab cache for kbase_va_regions */ + kbdev->va_region_slab = + kmem_cache_create(va_region_slab_name, sizeof(struct kbase_va_region), 0, 0, NULL); + if (kbdev->va_region_slab == NULL) { + dev_err(kbdev->dev, "Failed to create va_region_slab\n"); + return -ENOMEM; + } + + kbase_mem_migrate_init(kbdev); kbase_mem_pool_group_config_set_max_size(&kbdev->mem_pool_defaults, KBASE_MEM_POOL_MAX_SIZE_KCTX); @@ -1479,8 +1479,7 @@ int kbase_mem_init(struct kbase_device *kbdev) kbase_mem_pool_group_config_set_max_size(&mem_pool_defaults, KBASE_MEM_POOL_MAX_SIZE_KBDEV); - err = kbase_mem_pool_group_init(&kbdev->mem_pools, kbdev, - &mem_pool_defaults, NULL); + err = kbase_mem_pool_group_init(&kbdev->mem_pools, kbdev, &mem_pool_defaults, NULL); } return err; @@ -1506,6 +1505,11 @@ void kbase_mem_term(struct kbase_device *kbdev) kbase_mem_pool_group_term(&kbdev->mem_pools); + kbase_mem_migrate_term(kbdev); + + kmem_cache_destroy(kbdev->va_region_slab); + kbdev->va_region_slab = NULL; + WARN_ON(kbdev->total_gpu_pages); WARN_ON(!RB_EMPTY_ROOT(&kbdev->process_root)); WARN_ON(!RB_EMPTY_ROOT(&kbdev->dma_buf_root)); @@ -1519,41 +1523,41 @@ KBASE_EXPORT_TEST_API(kbase_mem_term); /** * kbase_alloc_free_region - Allocate a free region object. * - * @rbtree: Backlink to the red-black tree of memory regions. + * @zone: CUSTOM_VA_ZONE or SAME_VA_ZONE * @start_pfn: The Page Frame Number in GPU virtual address space. * @nr_pages: The size of the region in pages. - * @zone: KBASE_REG_ZONE_CUSTOM_VA or KBASE_REG_ZONE_SAME_VA * * The allocated object is not part of any list yet, and is flagged as * KBASE_REG_FREE. No mapping is allocated yet. * - * zone is KBASE_REG_ZONE_CUSTOM_VA or KBASE_REG_ZONE_SAME_VA. - * * Return: pointer to the allocated region object on success, NULL otherwise. 
*/ -struct kbase_va_region *kbase_alloc_free_region(struct rb_root *rbtree, - u64 start_pfn, size_t nr_pages, int zone) +struct kbase_va_region *kbase_alloc_free_region(struct kbase_reg_zone *zone, u64 start_pfn, + size_t nr_pages) { struct kbase_va_region *new_reg; - KBASE_DEBUG_ASSERT(rbtree != NULL); - - /* zone argument should only contain zone related region flags */ - KBASE_DEBUG_ASSERT((zone & ~KBASE_REG_ZONE_MASK) == 0); KBASE_DEBUG_ASSERT(nr_pages > 0); /* 64-bit address range is the max */ KBASE_DEBUG_ASSERT(start_pfn + nr_pages <= (U64_MAX / PAGE_SIZE)); - new_reg = kzalloc(sizeof(*new_reg), GFP_KERNEL); + if (WARN_ON(!zone)) + return NULL; + + if (unlikely(!zone->base_pfn || !zone->va_size_pages)) + return NULL; + + new_reg = kmem_cache_zalloc(zone->cache, GFP_KERNEL); if (!new_reg) return NULL; - new_reg->va_refcnt = 1; + kbase_refcount_set(&new_reg->va_refcnt, 1); + atomic_set(&new_reg->no_user_free_count, 0); new_reg->cpu_alloc = NULL; /* no alloc bound yet */ new_reg->gpu_alloc = NULL; /* no alloc bound yet */ - new_reg->rbtree = rbtree; - new_reg->flags = zone | KBASE_REG_FREE; + new_reg->rbtree = &zone->reg_rbtree; + new_reg->flags = kbase_zone_to_bits(zone->id) | KBASE_REG_FREE; new_reg->flags |= KBASE_REG_GROWABLE; @@ -1565,42 +1569,15 @@ struct kbase_va_region *kbase_alloc_free_region(struct rb_root *rbtree, return new_reg; } - KBASE_EXPORT_TEST_API(kbase_alloc_free_region); -static struct kbase_context *kbase_reg_flags_to_kctx( - struct kbase_va_region *reg) +struct kbase_va_region *kbase_ctx_alloc_free_region(struct kbase_context *kctx, + enum kbase_memory_zone id, u64 start_pfn, + size_t nr_pages) { - struct kbase_context *kctx = NULL; - struct rb_root *rbtree = reg->rbtree; + struct kbase_reg_zone *zone = kbase_ctx_reg_zone_get_nolock(kctx, id); - switch (reg->flags & KBASE_REG_ZONE_MASK) { - case KBASE_REG_ZONE_CUSTOM_VA: - kctx = container_of(rbtree, struct kbase_context, - reg_rbtree_custom); - break; - case KBASE_REG_ZONE_SAME_VA: - kctx = container_of(rbtree, struct kbase_context, - reg_rbtree_same); - break; - case KBASE_REG_ZONE_EXEC_VA: - kctx = container_of(rbtree, struct kbase_context, - reg_rbtree_exec); - break; -#if MALI_USE_CSF - case KBASE_REG_ZONE_EXEC_FIXED_VA: - kctx = container_of(rbtree, struct kbase_context, reg_rbtree_exec_fixed); - break; - case KBASE_REG_ZONE_FIXED_VA: - kctx = container_of(rbtree, struct kbase_context, reg_rbtree_fixed); - break; -#endif - default: - WARN(1, "Unknown zone in region: flags=0x%lx\n", reg->flags); - break; - } - - return kctx; + return kbase_alloc_free_region(zone, start_pfn, nr_pages); } /** @@ -1614,18 +1591,18 @@ static struct kbase_context *kbase_reg_flags_to_kctx( * alloc object will be released. * It is a bug if no alloc object exists for non-free regions. 
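Editor's note: the switch from KBASE_REG_ZONE_* flag values to enum kbase_memory_zone identifiers relies on packing the zone id into the reserved bits of reg->flags (see the KBASE_REG_ZONE_SHIFT/KBASE_REG_ZONE_MASK definitions later in this patch). A minimal sketch of that round-trip under that assumed layout follows; the driver's real kbase_zone_to_bits()/kbase_bits_to_zone() helpers may differ in detail.

/* Illustrative only: pack/unpack a zone id into the region flag word,
 * assuming the KBASE_REG_ZONE_SHIFT/KBASE_REG_ZONE_MASK layout defined
 * further down in this patch.
 */
static inline unsigned long example_zone_to_bits(enum kbase_memory_zone id)
{
        return ((unsigned long)id << KBASE_REG_ZONE_SHIFT) & KBASE_REG_ZONE_MASK;
}

static inline enum kbase_memory_zone example_bits_to_zone(unsigned long flags)
{
        return (enum kbase_memory_zone)((flags & KBASE_REG_ZONE_MASK) >> KBASE_REG_ZONE_SHIFT);
}
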
* + * If region is MCU_SHARED_ZONE it is freed */ void kbase_free_alloced_region(struct kbase_va_region *reg) { #if MALI_USE_CSF - if ((reg->flags & KBASE_REG_ZONE_MASK) == - KBASE_REG_ZONE_MCU_SHARED) { + if (kbase_bits_to_zone(reg->flags) == MCU_SHARED_ZONE) { kfree(reg); return; } #endif if (!(reg->flags & KBASE_REG_FREE)) { - struct kbase_context *kctx = kbase_reg_flags_to_kctx(reg); + struct kbase_context *kctx = kbase_reg_to_kctx(reg); if (WARN_ON(!kctx)) return; @@ -1633,10 +1610,17 @@ void kbase_free_alloced_region(struct kbase_va_region *reg) if (WARN_ON(kbase_is_region_invalid(reg))) return; - dev_dbg(kctx->kbdev->dev, "Freeing memory region %pK\n", - (void *)reg); + dev_dbg(kctx->kbdev->dev, "Freeing memory region %pK\n of zone %s", (void *)reg, + kbase_reg_zone_get_name(kbase_bits_to_zone(reg->flags))); #if MALI_USE_CSF if (reg->flags & KBASE_REG_CSF_EVENT) + /* + * This should not be reachable if called from 'mcu_shared' functions + * such as: + * kbase_csf_firmware_mcu_shared_mapping_init + * kbase_csf_firmware_mcu_shared_mapping_term + */ + kbase_unlink_event_mem_page(kctx, reg); #endif @@ -1650,8 +1634,6 @@ void kbase_free_alloced_region(struct kbase_va_region *reg) * on the list at termination time of the region tracker. */ if (!list_empty(®->gpu_alloc->evict_node)) { - mutex_unlock(&kctx->jit_evict_lock); - /* * Unlink the physical allocation before unmaking it * evictable so that the allocation isn't grown back to @@ -1662,6 +1644,8 @@ void kbase_free_alloced_region(struct kbase_va_region *reg) if (reg->cpu_alloc != reg->gpu_alloc) reg->gpu_alloc->reg = NULL; + mutex_unlock(&kctx->jit_evict_lock); + /* * If a region has been made evictable then we must * unmake it before trying to free it. @@ -1736,41 +1720,45 @@ int kbase_gpu_mmap(struct kbase_context *kctx, struct kbase_va_region *reg, KBASE_DEBUG_ASSERT(alloc->imported.alias.aliased); for (i = 0; i < alloc->imported.alias.nents; i++) { if (alloc->imported.alias.aliased[i].alloc) { - err = kbase_mmu_insert_pages( - kctx->kbdev, &kctx->mmu, - reg->start_pfn + (i * stride), - alloc->imported.alias.aliased[i] - .alloc->pages + - alloc->imported.alias.aliased[i] - .offset, + err = kbase_mmu_insert_aliased_pages( + kctx->kbdev, &kctx->mmu, reg->start_pfn + (i * stride), + alloc->imported.alias.aliased[i].alloc->pages + + alloc->imported.alias.aliased[i].offset, alloc->imported.alias.aliased[i].length, - reg->flags & gwt_mask, kctx->as_nr, - group_id, mmu_sync_info); + reg->flags & gwt_mask, kctx->as_nr, group_id, mmu_sync_info, + NULL); if (err) - goto bad_insert; + goto bad_aliased_insert; /* Note: mapping count is tracked at alias * creation time */ } else { - err = kbase_mmu_insert_single_page( - kctx, reg->start_pfn + i * stride, - kctx->aliasing_sink_page, + err = kbase_mmu_insert_single_aliased_page( + kctx, reg->start_pfn + i * stride, kctx->aliasing_sink_page, alloc->imported.alias.aliased[i].length, - (reg->flags & mask & gwt_mask) | attr, - group_id, mmu_sync_info); + (reg->flags & mask & gwt_mask) | attr, group_id, + mmu_sync_info); if (err) - goto bad_insert; + goto bad_aliased_insert; } } } else { - err = kbase_mmu_insert_pages(kctx->kbdev, &kctx->mmu, - reg->start_pfn, - kbase_get_gpu_phy_pages(reg), - kbase_reg_current_backed_size(reg), - reg->flags & gwt_mask, kctx->as_nr, - group_id, mmu_sync_info); + if (reg->gpu_alloc->type == KBASE_MEM_TYPE_IMPORTED_UMM || + reg->gpu_alloc->type == KBASE_MEM_TYPE_IMPORTED_USER_BUF) { + err = kbase_mmu_insert_pages_skip_status_update( + kctx->kbdev, &kctx->mmu, 
reg->start_pfn, + kbase_get_gpu_phy_pages(reg), kbase_reg_current_backed_size(reg), + reg->flags & gwt_mask, kctx->as_nr, group_id, mmu_sync_info, reg); + } else { + err = kbase_mmu_insert_pages(kctx->kbdev, &kctx->mmu, reg->start_pfn, + kbase_get_gpu_phy_pages(reg), + kbase_reg_current_backed_size(reg), + reg->flags & gwt_mask, kctx->as_nr, group_id, + mmu_sync_info, reg); + } + if (err) goto bad_insert; kbase_mem_phy_alloc_gpu_mapped(alloc); @@ -1780,9 +1768,9 @@ int kbase_gpu_mmap(struct kbase_context *kctx, struct kbase_va_region *reg, !WARN_ON(reg->nr_pages < reg->gpu_alloc->nents) && reg->gpu_alloc->type == KBASE_MEM_TYPE_IMPORTED_UMM && reg->gpu_alloc->imported.umm.current_mapping_usage_count) { - /* For padded imported dma-buf memory, map the dummy aliasing - * page from the end of the dma-buf pages, to the end of the - * region using a read only mapping. + /* For padded imported dma-buf or user-buf memory, map the dummy + * aliasing page from the end of the imported pages, to the end of + * the region using a read only mapping. * * Only map when it's imported dma-buf memory that is currently * mapped. @@ -1790,23 +1778,31 @@ int kbase_gpu_mmap(struct kbase_context *kctx, struct kbase_va_region *reg, * Assume reg->gpu_alloc->nents is the number of actual pages * in the dma-buf memory. */ - err = kbase_mmu_insert_single_page( - kctx, reg->start_pfn + reg->gpu_alloc->nents, - kctx->aliasing_sink_page, + err = kbase_mmu_insert_single_imported_page( + kctx, reg->start_pfn + reg->gpu_alloc->nents, kctx->aliasing_sink_page, reg->nr_pages - reg->gpu_alloc->nents, - (reg->flags | KBASE_REG_GPU_RD) & ~KBASE_REG_GPU_WR, - KBASE_MEM_GROUP_SINK, mmu_sync_info); + (reg->flags | KBASE_REG_GPU_RD) & ~KBASE_REG_GPU_WR, KBASE_MEM_GROUP_SINK, + mmu_sync_info); if (err) goto bad_insert; } return err; -bad_insert: - kbase_mmu_teardown_pages(kctx->kbdev, &kctx->mmu, - reg->start_pfn, reg->nr_pages, - kctx->as_nr); +bad_aliased_insert: + while (i-- > 0) { + struct tagged_addr *phys_alloc = NULL; + u64 const stride = alloc->imported.alias.stride; + if (alloc->imported.alias.aliased[i].alloc != NULL) + phys_alloc = alloc->imported.alias.aliased[i].alloc->pages + + alloc->imported.alias.aliased[i].offset; + + kbase_mmu_teardown_pages(kctx->kbdev, &kctx->mmu, reg->start_pfn + (i * stride), + phys_alloc, alloc->imported.alias.aliased[i].length, + alloc->imported.alias.aliased[i].length, kctx->as_nr); + } +bad_insert: kbase_remove_va_region(kctx->kbdev, reg); return err; @@ -1814,12 +1810,13 @@ bad_insert: KBASE_EXPORT_TEST_API(kbase_gpu_mmap); -static void kbase_jd_user_buf_unmap(struct kbase_context *kctx, - struct kbase_mem_phy_alloc *alloc, bool writeable); +static void kbase_jd_user_buf_unmap(struct kbase_context *kctx, struct kbase_mem_phy_alloc *alloc, + struct kbase_va_region *reg); int kbase_gpu_munmap(struct kbase_context *kctx, struct kbase_va_region *reg) { int err = 0; + struct kbase_mem_phy_alloc *alloc; if (reg->start_pfn == 0) return 0; @@ -1827,67 +1824,95 @@ int kbase_gpu_munmap(struct kbase_context *kctx, struct kbase_va_region *reg) if (!reg->gpu_alloc) return -EINVAL; + alloc = reg->gpu_alloc; + /* Tear down GPU page tables, depending on memory type. 
*/ - switch (reg->gpu_alloc->type) { + switch (alloc->type) { case KBASE_MEM_TYPE_ALIAS: { size_t i = 0; - struct kbase_mem_phy_alloc *alloc = reg->gpu_alloc; - /* Due to the way the number of valid PTEs and ATEs are tracked * currently, only the GPU virtual range that is backed & mapped - * should be passed to the kbase_mmu_teardown_pages() function, - * hence individual aliased regions needs to be unmapped - * separately. + * should be passed to the page teardown function, hence individual + * aliased regions needs to be unmapped separately. */ for (i = 0; i < alloc->imported.alias.nents; i++) { - if (alloc->imported.alias.aliased[i].alloc) { - int err_loop = kbase_mmu_teardown_pages( - kctx->kbdev, &kctx->mmu, - reg->start_pfn + - (i * - alloc->imported.alias.stride), - alloc->imported.alias.aliased[i].length, - kctx->as_nr); - if (WARN_ON_ONCE(err_loop)) - err = err_loop; - } + struct tagged_addr *phys_alloc = NULL; + int err_loop; + + if (alloc->imported.alias.aliased[i].alloc != NULL) + phys_alloc = alloc->imported.alias.aliased[i].alloc->pages + + alloc->imported.alias.aliased[i].offset; + + err_loop = kbase_mmu_teardown_pages( + kctx->kbdev, &kctx->mmu, + reg->start_pfn + (i * alloc->imported.alias.stride), + phys_alloc, alloc->imported.alias.aliased[i].length, + alloc->imported.alias.aliased[i].length, kctx->as_nr); + + if (WARN_ON_ONCE(err_loop)) + err = err_loop; } } break; - case KBASE_MEM_TYPE_IMPORTED_UMM: - err = kbase_mmu_teardown_pages(kctx->kbdev, &kctx->mmu, - reg->start_pfn, reg->nr_pages, kctx->as_nr); + case KBASE_MEM_TYPE_IMPORTED_UMM: { + size_t nr_phys_pages = reg->nr_pages; + size_t nr_virt_pages = reg->nr_pages; + /* If the region has import padding and falls under the threshold for + * issuing a partial GPU cache flush, we want to reduce the number of + * physical pages that get flushed. + + * This is symmetric with case of mapping the memory, which first maps + * each imported physical page to a separate virtual page, and then + * maps the single aliasing sink page to each of the virtual padding + * pages. + */ + if (reg->flags & KBASE_REG_IMPORT_PAD) + nr_phys_pages = alloc->nents + 1; + + err = kbase_mmu_teardown_imported_pages(kctx->kbdev, &kctx->mmu, + reg->start_pfn, alloc->pages, + nr_phys_pages, nr_virt_pages, + kctx->as_nr); + } break; - default: - err = kbase_mmu_teardown_pages(kctx->kbdev, &kctx->mmu, - reg->start_pfn, kbase_reg_current_backed_size(reg), - kctx->as_nr); + case KBASE_MEM_TYPE_IMPORTED_USER_BUF: { + size_t nr_reg_pages = kbase_reg_current_backed_size(reg); + + err = kbase_mmu_teardown_imported_pages(kctx->kbdev, &kctx->mmu, + reg->start_pfn, alloc->pages, + nr_reg_pages, nr_reg_pages, + kctx->as_nr); + } + break; + default: { + size_t nr_reg_pages = kbase_reg_current_backed_size(reg); + + err = kbase_mmu_teardown_pages(kctx->kbdev, &kctx->mmu, reg->start_pfn, + alloc->pages, nr_reg_pages, nr_reg_pages, + kctx->as_nr); + } break; } /* Update tracking, and other cleanup, depending on memory type. */ - switch (reg->gpu_alloc->type) { + switch (alloc->type) { case KBASE_MEM_TYPE_ALIAS: /* We mark the source allocs as unmapped from the GPU when * putting reg's allocs */ break; case KBASE_MEM_TYPE_IMPORTED_USER_BUF: { - struct kbase_alloc_import_user_buf *user_buf = - ®->gpu_alloc->imported.user_buf; - - if (user_buf->current_mapping_usage_count & PINNED_ON_IMPORT) { - user_buf->current_mapping_usage_count &= - ~PINNED_ON_IMPORT; - - /* The allocation could still have active mappings. 
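Editor's note: the IMPORTED_UMM teardown case above distinguishes physical from virtual page counts when the import was padded. A worked sketch with made-up sizes (the real counts come from alloc->nents and reg->nr_pages):

/* Illustrative only: a 10-page dma-buf imported into a 16-page padded region.
 * Physical teardown covers the real pages plus the single aliasing sink page;
 * virtual teardown still covers the whole padded GPU VA range.
 */
static void example_padded_umm_teardown_counts(void)
{
        size_t dma_buf_pages = 10;                /* alloc->nents */
        size_t va_pages = 16;                     /* reg->nr_pages */
        size_t nr_phys_pages = dma_buf_pages + 1; /* == 11, includes sink page */
        size_t nr_virt_pages = va_pages;          /* == 16, whole padded range */

        (void)nr_phys_pages;
        (void)nr_virt_pages;
}
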
*/ - if (user_buf->current_mapping_usage_count == 0) { - kbase_jd_user_buf_unmap(kctx, reg->gpu_alloc, - (reg->flags & (KBASE_REG_CPU_WR | - KBASE_REG_GPU_WR))); - } + struct kbase_alloc_import_user_buf *user_buf = &alloc->imported.user_buf; + + if (user_buf->current_mapping_usage_count & PINNED_ON_IMPORT) { + user_buf->current_mapping_usage_count &= ~PINNED_ON_IMPORT; + + /* The allocation could still have active mappings. */ + if (user_buf->current_mapping_usage_count == 0) { + kbase_jd_user_buf_unmap(kctx, alloc, reg); } } + } fallthrough; default: kbase_mem_phy_alloc_gpu_unmapped(reg->gpu_alloc); @@ -2007,7 +2032,8 @@ void kbase_sync_single(struct kbase_context *kctx, BUG_ON(!cpu_page); BUG_ON(offset + size > PAGE_SIZE); - dma_addr = kbase_dma_addr(cpu_page) + offset; + dma_addr = kbase_dma_addr_from_tagged(t_cpu_pa) + offset; + if (sync_fn == KBASE_SYNC_TO_CPU) dma_sync_single_for_cpu(kctx->kbdev->dev, dma_addr, size, DMA_BIDIRECTIONAL); @@ -2018,29 +2044,30 @@ void kbase_sync_single(struct kbase_context *kctx, void *src = NULL; void *dst = NULL; struct page *gpu_page; + dma_addr_t dma_addr; if (WARN(!gpu_pa, "No GPU PA found for infinite cache op")) return; gpu_page = pfn_to_page(PFN_DOWN(gpu_pa)); + dma_addr = kbase_dma_addr_from_tagged(t_gpu_pa) + offset; if (sync_fn == KBASE_SYNC_TO_DEVICE) { - src = ((unsigned char *)kmap(cpu_page)) + offset; - dst = ((unsigned char *)kmap(gpu_page)) + offset; + src = ((unsigned char *)kbase_kmap(cpu_page)) + offset; + dst = ((unsigned char *)kbase_kmap(gpu_page)) + offset; } else if (sync_fn == KBASE_SYNC_TO_CPU) { - dma_sync_single_for_cpu(kctx->kbdev->dev, - kbase_dma_addr(gpu_page) + offset, - size, DMA_BIDIRECTIONAL); - src = ((unsigned char *)kmap(gpu_page)) + offset; - dst = ((unsigned char *)kmap(cpu_page)) + offset; + dma_sync_single_for_cpu(kctx->kbdev->dev, dma_addr, size, + DMA_BIDIRECTIONAL); + src = ((unsigned char *)kbase_kmap(gpu_page)) + offset; + dst = ((unsigned char *)kbase_kmap(cpu_page)) + offset; } + memcpy(dst, src, size); - kunmap(gpu_page); - kunmap(cpu_page); + kbase_kunmap(gpu_page, src); + kbase_kunmap(cpu_page, dst); if (sync_fn == KBASE_SYNC_TO_DEVICE) - dma_sync_single_for_device(kctx->kbdev->dev, - kbase_dma_addr(gpu_page) + offset, - size, DMA_BIDIRECTIONAL); + dma_sync_single_for_device(kctx->kbdev->dev, dma_addr, size, + DMA_BIDIRECTIONAL); } } @@ -2186,29 +2213,27 @@ int kbase_mem_free_region(struct kbase_context *kctx, struct kbase_va_region *re __func__, (void *)reg, (void *)kctx); lockdep_assert_held(&kctx->reg_lock); - if (reg->flags & KBASE_REG_NO_USER_FREE) { + if (kbase_va_region_is_no_user_free(reg)) { dev_warn(kctx->kbdev->dev, "Attempt to free GPU memory whose freeing by user space is forbidden!\n"); return -EINVAL; } - /* - * Unlink the physical allocation before unmaking it evictable so - * that the allocation isn't grown back to its last backed size - * as we're going to unmap it anyway. - */ - reg->cpu_alloc->reg = NULL; - if (reg->cpu_alloc != reg->gpu_alloc) - reg->gpu_alloc->reg = NULL; - - /* - * If a region has been made evictable then we must unmake it + /* If a region has been made evictable then we must unmake it * before trying to free it. * If the memory hasn't been reclaimed it will be unmapped and freed * below, if it has been reclaimed then the operations below are no-ops. 
*/ if (reg->flags & KBASE_REG_DONT_NEED) { - KBASE_DEBUG_ASSERT(reg->cpu_alloc->type == - KBASE_MEM_TYPE_NATIVE); + WARN_ON(reg->cpu_alloc->type != KBASE_MEM_TYPE_NATIVE); + mutex_lock(&kctx->jit_evict_lock); + /* Unlink the physical allocation before unmaking it evictable so + * that the allocation isn't grown back to its last backed size + * as we're going to unmap it anyway. + */ + reg->cpu_alloc->reg = NULL; + if (reg->cpu_alloc != reg->gpu_alloc) + reg->gpu_alloc->reg = NULL; + mutex_unlock(&kctx->jit_evict_lock); kbase_mem_evictable_unmake(reg->gpu_alloc); } @@ -2219,8 +2244,8 @@ int kbase_mem_free_region(struct kbase_context *kctx, struct kbase_va_region *re } #if MALI_USE_CSF - if (((reg->flags & KBASE_REG_ZONE_MASK) == KBASE_REG_ZONE_FIXED_VA) || - ((reg->flags & KBASE_REG_ZONE_MASK) == KBASE_REG_ZONE_EXEC_FIXED_VA)) { + if (((kbase_bits_to_zone(reg->flags)) == FIXED_VA_ZONE) || + ((kbase_bits_to_zone(reg->flags)) == EXEC_FIXED_VA_ZONE)) { if (reg->flags & KBASE_REG_FIXED_ADDRESS) atomic64_dec(&kctx->num_fixed_allocs); else @@ -2268,7 +2293,7 @@ int kbase_mem_free(struct kbase_context *kctx, u64 gpu_addr) __func__); return -EINVAL; } - kbase_gpu_vm_lock(kctx); + kbase_gpu_vm_lock_with_pmode_sync(kctx); if (gpu_addr >= BASE_MEM_COOKIE_BASE && gpu_addr < BASE_MEM_FIRST_FREE_ADDRESS) { @@ -2297,7 +2322,7 @@ int kbase_mem_free(struct kbase_context *kctx, u64 gpu_addr) goto out_unlock; } - if ((reg->flags & KBASE_REG_ZONE_MASK) == KBASE_REG_ZONE_SAME_VA) { + if ((kbase_bits_to_zone(reg->flags)) == SAME_VA_ZONE) { /* SAME_VA must be freed through munmap */ dev_warn(kctx->kbdev->dev, "%s called on SAME_VA memory 0x%llX", __func__, gpu_addr); @@ -2308,7 +2333,7 @@ int kbase_mem_free(struct kbase_context *kctx, u64 gpu_addr) } out_unlock: - kbase_gpu_vm_unlock(kctx); + kbase_gpu_vm_unlock_with_pmode_sync(kctx); return err; } @@ -2407,8 +2432,11 @@ int kbase_update_region_flags(struct kbase_context *kctx, if (flags & BASEP_MEM_PERMANENT_KERNEL_MAPPING) reg->flags |= KBASE_REG_PERMANENT_KERNEL_MAPPING; - if (flags & BASEP_MEM_NO_USER_FREE) - reg->flags |= KBASE_REG_NO_USER_FREE; + if (flags & BASEP_MEM_NO_USER_FREE) { + kbase_gpu_vm_lock(kctx); + kbase_va_region_no_user_free_inc(reg); + kbase_gpu_vm_unlock(kctx); + } if (flags & BASE_MEM_GPU_VA_SAME_4GB_PAGE) reg->flags |= KBASE_REG_GPU_VA_SAME_4GB_PAGE; @@ -2457,21 +2485,18 @@ int kbase_alloc_phy_pages_helper(struct kbase_mem_phy_alloc *alloc, * allocation is visible to the OOM killer */ kbase_process_page_usage_inc(kctx, nr_pages_requested); + kbase_trace_gpu_mem_usage_inc(kctx->kbdev, kctx, nr_pages_requested); tp = alloc->pages + alloc->nents; -#ifdef CONFIG_MALI_2MB_ALLOC /* Check if we have enough pages requested so we can allocate a large * page (512 * 4KB = 2MB ) */ - if (nr_left >= (SZ_2M / SZ_4K)) { + if (kbdev->pagesize_2mb && nr_left >= (SZ_2M / SZ_4K)) { int nr_lp = nr_left / (SZ_2M / SZ_4K); - res = kbase_mem_pool_alloc_pages( - &kctx->mem_pools.large[alloc->group_id], - nr_lp * (SZ_2M / SZ_4K), - tp, - true); + res = kbase_mem_pool_alloc_pages(&kctx->mem_pools.large[alloc->group_id], + nr_lp * (SZ_2M / SZ_4K), tp, true, kctx->task); if (res > 0) { nr_left -= res; @@ -2525,7 +2550,7 @@ int kbase_alloc_phy_pages_helper(struct kbase_mem_phy_alloc *alloc, err = kbase_mem_pool_grow( &kctx->mem_pools.large[alloc->group_id], - 1); + 1, kctx->task); if (err) break; } while (1); @@ -2566,13 +2591,11 @@ int kbase_alloc_phy_pages_helper(struct kbase_mem_phy_alloc *alloc, } } } -no_new_partial: -#endif +no_new_partial: if (nr_left) { - 
res = kbase_mem_pool_alloc_pages( - &kctx->mem_pools.small[alloc->group_id], - nr_left, tp, false); + res = kbase_mem_pool_alloc_pages(&kctx->mem_pools.small[alloc->group_id], nr_left, + tp, false, kctx->task); if (res <= 0) goto alloc_failed; } @@ -2584,8 +2607,6 @@ no_new_partial: alloc->nents += nr_pages_requested; - kbase_trace_gpu_mem_usage_inc(kctx->kbdev, kctx, nr_pages_requested); - done: return 0; @@ -2595,19 +2616,13 @@ alloc_failed: size_t nr_pages_to_free = nr_pages_requested - nr_left; alloc->nents += nr_pages_to_free; - - kbase_process_page_usage_inc(kctx, nr_pages_to_free); - atomic_add(nr_pages_to_free, &kctx->used_pages); - atomic_add(nr_pages_to_free, - &kctx->kbdev->memdev.used_pages); - kbase_free_phy_pages_helper(alloc, nr_pages_to_free); } - kbase_process_page_usage_dec(kctx, nr_pages_requested); - atomic_sub(nr_pages_requested, &kctx->used_pages); - atomic_sub(nr_pages_requested, - &kctx->kbdev->memdev.used_pages); + kbase_trace_gpu_mem_usage_dec(kctx->kbdev, kctx, nr_left); + kbase_process_page_usage_dec(kctx, nr_left); + atomic_sub(nr_left, &kctx->used_pages); + atomic_sub(nr_left, &kctx->kbdev->memdev.used_pages); invalid_request: return -ENOMEM; @@ -2631,18 +2646,17 @@ struct tagged_addr *kbase_alloc_phy_pages_helper_locked( lockdep_assert_held(&pool->pool_lock); -#if !defined(CONFIG_MALI_2MB_ALLOC) - WARN_ON(pool->order); -#endif + kctx = alloc->imported.native.kctx; + kbdev = kctx->kbdev; + + if (!kbdev->pagesize_2mb) + WARN_ON(pool->order); if (alloc->reg) { if (nr_pages_requested > alloc->reg->nr_pages - alloc->nents) goto invalid_request; } - kctx = alloc->imported.native.kctx; - kbdev = kctx->kbdev; - lockdep_assert_held(&kctx->mem_partials_lock); if (nr_pages_requested == 0) @@ -2657,12 +2671,12 @@ struct tagged_addr *kbase_alloc_phy_pages_helper_locked( * allocation is visible to the OOM killer */ kbase_process_page_usage_inc(kctx, nr_pages_requested); + kbase_trace_gpu_mem_usage_inc(kctx->kbdev, kctx, nr_pages_requested); tp = alloc->pages + alloc->nents; new_pages = tp; -#ifdef CONFIG_MALI_2MB_ALLOC - if (pool->order) { + if (kbdev->pagesize_2mb && pool->order) { int nr_lp = nr_left / (SZ_2M / SZ_4K); res = kbase_mem_pool_alloc_pages_locked(pool, @@ -2746,15 +2760,12 @@ struct tagged_addr *kbase_alloc_phy_pages_helper_locked( if (nr_left) goto alloc_failed; } else { -#endif res = kbase_mem_pool_alloc_pages_locked(pool, nr_left, tp); if (res <= 0) goto alloc_failed; -#ifdef CONFIG_MALI_2MB_ALLOC } -#endif KBASE_TLSTREAM_AUX_PAGESALLOC( kbdev, @@ -2763,8 +2774,6 @@ struct tagged_addr *kbase_alloc_phy_pages_helper_locked( alloc->nents += nr_pages_requested; - kbase_trace_gpu_mem_usage_inc(kctx->kbdev, kctx, nr_pages_requested); - done: return new_pages; @@ -2775,8 +2784,7 @@ alloc_failed: struct tagged_addr *start_free = alloc->pages + alloc->nents; -#ifdef CONFIG_MALI_2MB_ALLOC - if (pool->order) { + if (kbdev->pagesize_2mb && pool->order) { while (nr_pages_to_free) { if (is_huge_head(*start_free)) { kbase_mem_pool_free_pages_locked( @@ -2794,17 +2802,15 @@ alloc_failed: } } } else { -#endif kbase_mem_pool_free_pages_locked(pool, nr_pages_to_free, start_free, false, /* not dirty */ true); /* return to pool */ -#ifdef CONFIG_MALI_2MB_ALLOC } -#endif } + kbase_trace_gpu_mem_usage_dec(kctx->kbdev, kctx, nr_pages_requested); kbase_process_page_usage_dec(kctx, nr_pages_requested); atomic_sub(nr_pages_requested, &kctx->used_pages); atomic_sub(nr_pages_requested, &kctx->kbdev->memdev.used_pages); @@ -3064,6 +3070,13 @@ 
KBASE_EXPORT_TEST_API(kbase_free_phy_pages_helper_locked); /** * kbase_jd_user_buf_unpin_pages - Release the pinned pages of a user buffer. * @alloc: The allocation for the imported user buffer. + * + * This must only be called when terminating an alloc, when its refcount + * (number of users) has become 0. This also ensures it is only called once all + * CPU mappings have been closed. + * + * Instead call kbase_jd_user_buf_unmap() if you need to unpin pages on active + * allocations */ static void kbase_jd_user_buf_unpin_pages(struct kbase_mem_phy_alloc *alloc); #endif @@ -3194,9 +3207,32 @@ out_rollback: out_term: return -1; } - KBASE_EXPORT_TEST_API(kbase_alloc_phy_pages); +void kbase_set_phy_alloc_page_status(struct kbase_mem_phy_alloc *alloc, + enum kbase_page_status status) +{ + u32 i = 0; + + for (; i < alloc->nents; i++) { + struct tagged_addr phys = alloc->pages[i]; + struct kbase_page_metadata *page_md = kbase_page_private(as_page(phys)); + + /* Skip the 4KB page that is part of a large page, as the large page is + * excluded from the migration process. + */ + if (is_huge(phys) || is_partial(phys)) + continue; + + if (!page_md) + continue; + + spin_lock(&page_md->migrate_lock); + page_md->status = PAGE_STATUS_SET(page_md->status, (u8)status); + spin_unlock(&page_md->migrate_lock); + } +} + bool kbase_check_alloc_flags(unsigned long flags) { /* Only known input flags should be set. */ @@ -3437,30 +3473,36 @@ int kbase_check_alloc_sizes(struct kbase_context *kctx, unsigned long flags, #undef KBASE_MSG_PRE } -/** - * Acquire the per-context region list lock - * @kctx: KBase context - */ void kbase_gpu_vm_lock(struct kbase_context *kctx) { KBASE_DEBUG_ASSERT(kctx != NULL); mutex_lock(&kctx->reg_lock); } - KBASE_EXPORT_TEST_API(kbase_gpu_vm_lock); -/** - * Release the per-context region list lock - * @kctx: KBase context - */ +void kbase_gpu_vm_lock_with_pmode_sync(struct kbase_context *kctx) +{ +#if MALI_USE_CSF + down_read(&kctx->kbdev->csf.pmode_sync_sem); +#endif + kbase_gpu_vm_lock(kctx); +} + void kbase_gpu_vm_unlock(struct kbase_context *kctx) { KBASE_DEBUG_ASSERT(kctx != NULL); mutex_unlock(&kctx->reg_lock); } - KBASE_EXPORT_TEST_API(kbase_gpu_vm_unlock); +void kbase_gpu_vm_unlock_with_pmode_sync(struct kbase_context *kctx) +{ + kbase_gpu_vm_unlock(kctx); +#if MALI_USE_CSF + up_read(&kctx->kbdev->csf.pmode_sync_sem); +#endif +} + #if IS_ENABLED(CONFIG_DEBUG_FS) struct kbase_jit_debugfs_data { int (*func)(struct kbase_jit_debugfs_data *data); @@ -3688,12 +3730,7 @@ void kbase_jit_debugfs_init(struct kbase_context *kctx) /* prevent unprivileged use of debug file system * in old kernel version */ -#if (KERNEL_VERSION(4, 7, 0) <= LINUX_VERSION_CODE) - /* only for newer kernel version debug file system is safe */ const mode_t mode = 0444; -#else - const mode_t mode = 0400; -#endif /* Caller already ensures this, but we keep the pattern for * maintenance safety. @@ -3767,7 +3804,15 @@ static void kbase_jit_destroy_worker(struct work_struct *work) mutex_unlock(&kctx->jit_evict_lock); kbase_gpu_vm_lock(kctx); - reg->flags &= ~KBASE_REG_NO_USER_FREE; + + /* + * Incrementing the refcount is prevented on JIT regions. + * If/when this ever changes we would need to compensate + * by implementing "free on putting the last reference", + * but only for JIT regions. 
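Editor's note: the new lock helpers above wrap the per-context region lock with the CSF protected-mode synchronisation semaphore. A minimal usage sketch follows, assuming a caller that modifies the GPU VA layout; the function name is hypothetical.

/* Illustrative only: pairing of the pmode-aware lock helpers from this patch.
 * On CSF builds the read side of kbdev->csf.pmode_sync_sem is taken before
 * the region lock and released after it, in the opposite order.
 */
static void example_update_va_layout(struct kbase_context *kctx)
{
        kbase_gpu_vm_lock_with_pmode_sync(kctx);

        /* ... create, shrink or free kbase_va_region mappings here ... */

        kbase_gpu_vm_unlock_with_pmode_sync(kctx);
}
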
+ */ + WARN_ON(atomic_read(®->no_user_free_count) > 1); + kbase_va_region_no_user_free_dec(reg); kbase_mem_free_region(kctx, reg); kbase_gpu_vm_unlock(kctx); } while (1); @@ -3782,6 +3827,7 @@ int kbase_jit_init(struct kbase_context *kctx) INIT_WORK(&kctx->jit_work, kbase_jit_destroy_worker); #if MALI_USE_CSF + mutex_init(&kctx->csf.kcpu_queues.jit_lock); INIT_LIST_HEAD(&kctx->csf.kcpu_queues.jit_cmds_head); INIT_LIST_HEAD(&kctx->csf.kcpu_queues.jit_blocked_queues); #else /* !MALI_USE_CSF */ @@ -4020,25 +4066,18 @@ static int kbase_jit_grow(struct kbase_context *kctx, if (reg->gpu_alloc->nents >= info->commit_pages) goto done; - /* Grow the backing */ - old_size = reg->gpu_alloc->nents; - /* Allocate some more pages */ delta = info->commit_pages - reg->gpu_alloc->nents; pages_required = delta; -#ifdef CONFIG_MALI_2MB_ALLOC - if (pages_required >= (SZ_2M / SZ_4K)) { + if (kctx->kbdev->pagesize_2mb && pages_required >= (SZ_2M / SZ_4K)) { pool = &kctx->mem_pools.large[kctx->jit_group_id]; /* Round up to number of 2 MB pages required */ pages_required += ((SZ_2M / SZ_4K) - 1); pages_required /= (SZ_2M / SZ_4K); } else { -#endif pool = &kctx->mem_pools.small[kctx->jit_group_id]; -#ifdef CONFIG_MALI_2MB_ALLOC } -#endif if (reg->cpu_alloc != reg->gpu_alloc) pages_required *= 2; @@ -4059,7 +4098,7 @@ static int kbase_jit_grow(struct kbase_context *kctx, spin_unlock(&kctx->mem_partials_lock); kbase_gpu_vm_unlock(kctx); - ret = kbase_mem_pool_grow(pool, pool_delta); + ret = kbase_mem_pool_grow(pool, pool_delta, kctx->task); kbase_gpu_vm_lock(kctx); if (ret) @@ -4069,6 +4108,17 @@ static int kbase_jit_grow(struct kbase_context *kctx, kbase_mem_pool_lock(pool); } + if (reg->gpu_alloc->nents >= info->commit_pages) { + kbase_mem_pool_unlock(pool); + spin_unlock(&kctx->mem_partials_lock); + dev_info( + kctx->kbdev->dev, + "JIT alloc grown beyond the required number of initially required pages, this grow no longer needed."); + goto done; + } + + old_size = reg->gpu_alloc->nents; + delta = info->commit_pages - old_size; gpu_pages = kbase_alloc_phy_pages_helper_locked(reg->gpu_alloc, pool, delta, &prealloc_sas[0]); if (!gpu_pages) { @@ -4219,11 +4269,11 @@ static bool jit_allow_allocate(struct kbase_context *kctx, const struct base_jit_alloc_info *info, bool ignore_pressure_limit) { -#if MALI_USE_CSF - lockdep_assert_held(&kctx->csf.kcpu_queues.lock); -#else +#if !MALI_USE_CSF lockdep_assert_held(&kctx->jctx.lock); -#endif +#else /* MALI_USE_CSF */ + lockdep_assert_held(&kctx->csf.kcpu_queues.jit_lock); +#endif /* !MALI_USE_CSF */ #if MALI_JIT_PRESSURE_LIMIT_BASE if (!ignore_pressure_limit && @@ -4314,25 +4364,25 @@ struct kbase_va_region *kbase_jit_allocate(struct kbase_context *kctx, */ const enum kbase_caller_mmu_sync_info mmu_sync_info = CALLER_MMU_SYNC; -#if MALI_USE_CSF - lockdep_assert_held(&kctx->csf.kcpu_queues.lock); -#else +#if !MALI_USE_CSF lockdep_assert_held(&kctx->jctx.lock); -#endif +#else /* MALI_USE_CSF */ + lockdep_assert_held(&kctx->csf.kcpu_queues.jit_lock); +#endif /* !MALI_USE_CSF */ if (!jit_allow_allocate(kctx, info, ignore_pressure_limit)) return NULL; -#ifdef CONFIG_MALI_2MB_ALLOC - /* Preallocate memory for the sub-allocation structs */ - for (i = 0; i != ARRAY_SIZE(prealloc_sas); ++i) { - prealloc_sas[i] = kmalloc(sizeof(*prealloc_sas[i]), GFP_KERNEL); - if (!prealloc_sas[i]) - goto end; + if (kctx->kbdev->pagesize_2mb) { + /* Preallocate memory for the sub-allocation structs */ + for (i = 0; i != ARRAY_SIZE(prealloc_sas); ++i) { + prealloc_sas[i] = 
kmalloc(sizeof(*prealloc_sas[i]), GFP_KERNEL); + if (!prealloc_sas[i]) + goto end; + } } -#endif - kbase_gpu_vm_lock(kctx); + kbase_gpu_vm_lock_with_pmode_sync(kctx); mutex_lock(&kctx->jit_evict_lock); /* @@ -4414,12 +4464,12 @@ struct kbase_va_region *kbase_jit_allocate(struct kbase_context *kctx, kbase_jit_done_phys_increase(kctx, needed_pages); #endif /* MALI_JIT_PRESSURE_LIMIT_BASE */ - kbase_gpu_vm_unlock(kctx); + kbase_gpu_vm_unlock_with_pmode_sync(kctx); if (ret < 0) { /* * An update to an allocation from the pool failed, - * chances are slim a new allocation would fair any + * chances are slim a new allocation would fare any * better so return the allocation to the pool and * return the function with failure. */ @@ -4441,6 +4491,17 @@ struct kbase_va_region *kbase_jit_allocate(struct kbase_context *kctx, mutex_unlock(&kctx->jit_evict_lock); reg = NULL; goto end; + } else { + /* A suitable JIT allocation existed on the evict list, so we need + * to make sure that the NOT_MOVABLE property is cleared. + */ + if (kbase_is_page_migration_enabled()) { + kbase_gpu_vm_lock(kctx); + mutex_lock(&kctx->jit_evict_lock); + kbase_set_phy_alloc_page_status(reg->gpu_alloc, ALLOCATED_MAPPED); + mutex_unlock(&kctx->jit_evict_lock); + kbase_gpu_vm_unlock(kctx); + } } } else { /* No suitable JIT allocation was found so create a new one */ @@ -4468,7 +4529,7 @@ struct kbase_va_region *kbase_jit_allocate(struct kbase_context *kctx, #endif /* MALI_JIT_PRESSURE_LIMIT_BASE */ mutex_unlock(&kctx->jit_evict_lock); - kbase_gpu_vm_unlock(kctx); + kbase_gpu_vm_unlock_with_pmode_sync(kctx); reg = kbase_mem_alloc(kctx, info->va_pages, info->commit_pages, info->extension, &flags, &gpu_addr, mmu_sync_info); @@ -4497,6 +4558,29 @@ struct kbase_va_region *kbase_jit_allocate(struct kbase_context *kctx, } } + /* Similarly to tiler heap init, there is a short window of time + * where the (either recycled or newly allocated, in our case) region has + * "no user free" count incremented but is still missing the DONT_NEED flag, and + * doesn't yet have the ACTIVE_JIT_ALLOC flag either. Temporarily leaking the + * allocation is the least bad option that doesn't lead to a security issue down the + * line (it will eventually be cleaned up during context termination). + * + * We also need to call kbase_gpu_vm_lock regardless, as we're updating the region + * flags. 
+ */ + kbase_gpu_vm_lock(kctx); + if (unlikely(atomic_read(®->no_user_free_count) > 1)) { + kbase_gpu_vm_unlock(kctx); + dev_err(kctx->kbdev->dev, "JIT region has no_user_free_count > 1!\n"); + + mutex_lock(&kctx->jit_evict_lock); + list_move(®->jit_node, &kctx->jit_pool_head); + mutex_unlock(&kctx->jit_evict_lock); + + reg = NULL; + goto end; + } + trace_mali_jit_alloc(reg, info->id); kctx->jit_current_allocations++; @@ -4514,6 +4598,7 @@ struct kbase_va_region *kbase_jit_allocate(struct kbase_context *kctx, kbase_jit_report_update_pressure(kctx, reg, info->va_pages, KBASE_JIT_REPORT_ON_ALLOC_OR_FREE); #endif /* MALI_JIT_PRESSURE_LIMIT_BASE */ + kbase_gpu_vm_unlock(kctx); end: for (i = 0; i != ARRAY_SIZE(prealloc_sas); ++i) @@ -4526,6 +4611,12 @@ void kbase_jit_free(struct kbase_context *kctx, struct kbase_va_region *reg) { u64 old_pages; +#if !MALI_USE_CSF + lockdep_assert_held(&kctx->jctx.lock); +#else /* MALI_USE_CSF */ + lockdep_assert_held(&kctx->csf.kcpu_queues.jit_lock); +#endif /* !MALI_USE_CSF */ + /* JIT id not immediately available here, so use 0u */ trace_mali_jit_free(reg, 0u); @@ -4540,9 +4631,9 @@ void kbase_jit_free(struct kbase_context *kctx, struct kbase_va_region *reg) u64 delta = old_pages - new_size; if (delta) { - mutex_lock(&kctx->reg_lock); + kbase_gpu_vm_lock_with_pmode_sync(kctx); kbase_mem_shrink(kctx, reg, old_pages - delta); - mutex_unlock(&kctx->reg_lock); + kbase_gpu_vm_unlock_with_pmode_sync(kctx); } } @@ -4578,12 +4669,18 @@ void kbase_jit_free(struct kbase_context *kctx, struct kbase_va_region *reg) list_move(®->jit_node, &kctx->jit_pool_head); + /* Inactive JIT regions should be freed by the shrinker and not impacted + * by page migration. Once freed, they will enter into the page migration + * state machine via the mempools. + */ + if (kbase_is_page_migration_enabled()) + kbase_set_phy_alloc_page_status(reg->gpu_alloc, NOT_MOVABLE); mutex_unlock(&kctx->jit_evict_lock); } void kbase_jit_backing_lost(struct kbase_va_region *reg) { - struct kbase_context *kctx = kbase_reg_flags_to_kctx(reg); + struct kbase_context *kctx = kbase_reg_to_kctx(reg); if (WARN_ON(!kctx)) return; @@ -4624,7 +4721,14 @@ bool kbase_jit_evict(struct kbase_context *kctx) mutex_unlock(&kctx->jit_evict_lock); if (reg) { - reg->flags &= ~KBASE_REG_NO_USER_FREE; + /* + * Incrementing the refcount is prevented on JIT regions. + * If/when this ever changes we would need to compensate + * by implementing "free on putting the last reference", + * but only for JIT regions. + */ + WARN_ON(atomic_read(®->no_user_free_count) > 1); + kbase_va_region_no_user_free_dec(reg); kbase_mem_free_region(kctx, reg); } @@ -4636,8 +4740,7 @@ void kbase_jit_term(struct kbase_context *kctx) struct kbase_va_region *walker; /* Free all allocations for this context */ - - kbase_gpu_vm_lock(kctx); + kbase_gpu_vm_lock_with_pmode_sync(kctx); mutex_lock(&kctx->jit_evict_lock); /* Free all allocations from the pool */ while (!list_empty(&kctx->jit_pool_head)) { @@ -4646,7 +4749,14 @@ void kbase_jit_term(struct kbase_context *kctx) list_del(&walker->jit_node); list_del_init(&walker->gpu_alloc->evict_node); mutex_unlock(&kctx->jit_evict_lock); - walker->flags &= ~KBASE_REG_NO_USER_FREE; + /* + * Incrementing the refcount is prevented on JIT regions. + * If/when this ever changes we would need to compensate + * by implementing "free on putting the last reference", + * but only for JIT regions. 
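Editor's note: KBASE_REG_NO_USER_FREE has become a counted property (no_user_free_count) rather than a single flag bit. The helpers themselves are outside this hunk, so the sketch below is only an assumption of roughly how kbase_va_region_no_user_free_inc()/_dec() and kbase_va_region_is_no_user_free() could behave; the real implementations may take additional locks or perform extra checks.

/* Illustrative only: an atomic count standing in for the old NO_USER_FREE bit.
 * The real helpers live outside this hunk and may differ.
 */
static inline void example_no_user_free_inc(struct kbase_va_region *reg)
{
        atomic_inc(&reg->no_user_free_count);
}

static inline void example_no_user_free_dec(struct kbase_va_region *reg)
{
        WARN_ON(atomic_dec_return(&reg->no_user_free_count) < 0);
}

static inline bool example_is_no_user_free(struct kbase_va_region *reg)
{
        return atomic_read(&reg->no_user_free_count) > 0;
}
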
+ */ + WARN_ON(atomic_read(&walker->no_user_free_count) > 1); + kbase_va_region_no_user_free_dec(walker); kbase_mem_free_region(kctx, walker); mutex_lock(&kctx->jit_evict_lock); } @@ -4658,7 +4768,14 @@ void kbase_jit_term(struct kbase_context *kctx) list_del(&walker->jit_node); list_del_init(&walker->gpu_alloc->evict_node); mutex_unlock(&kctx->jit_evict_lock); - walker->flags &= ~KBASE_REG_NO_USER_FREE; + /* + * Incrementing the refcount is prevented on JIT regions. + * If/when this ever changes we would need to compensate + * by implementing "free on putting the last reference", + * but only for JIT regions. + */ + WARN_ON(atomic_read(&walker->no_user_free_count) > 1); + kbase_va_region_no_user_free_dec(walker); kbase_mem_free_region(kctx, walker); mutex_lock(&kctx->jit_evict_lock); } @@ -4666,7 +4783,7 @@ void kbase_jit_term(struct kbase_context *kctx) WARN_ON(kctx->jit_phys_pages_to_be_allocated); #endif mutex_unlock(&kctx->jit_evict_lock); - kbase_gpu_vm_unlock(kctx); + kbase_gpu_vm_unlock_with_pmode_sync(kctx); /* * Flush the freeing of allocations whose backing has been freed @@ -4772,7 +4889,23 @@ void kbase_unpin_user_buf_page(struct page *page) #if MALI_USE_CSF static void kbase_jd_user_buf_unpin_pages(struct kbase_mem_phy_alloc *alloc) { - if (alloc->nents) { + /* In CSF builds, we keep pages pinned until the last reference is + * released on the alloc. A refcount of 0 also means we can be sure + * that all CPU mappings have been closed on this alloc, and no more + * mappings of it will be created. + * + * Further, the WARN() below captures the restriction that this + * function will not handle anything other than the alloc termination + * path, because the caller of kbase_mem_phy_alloc_put() is not + * required to hold the kctx's reg_lock, and so we could not handle + * removing an existing CPU mapping here. + * + * Refer to this function's kernel-doc comments for alternatives for + * unpinning a User buffer. + */ + + if (alloc->nents && !WARN(kref_read(&alloc->kref) != 0, + "must only be called on terminating an allocation")) { struct page **pages = alloc->imported.user_buf.pages; long i; @@ -4780,6 +4913,8 @@ static void kbase_jd_user_buf_unpin_pages(struct kbase_mem_phy_alloc *alloc) for (i = 0; i < alloc->nents; i++) kbase_unpin_user_buf_page(pages[i]); + + alloc->nents = 0; } } #endif @@ -4795,6 +4930,8 @@ int kbase_jd_user_buf_pin_pages(struct kbase_context *kctx, long i; int write; + lockdep_assert_held(&kctx->reg_lock); + if (WARN_ON(alloc->type != KBASE_MEM_TYPE_IMPORTED_USER_BUF)) return -EINVAL; @@ -4810,18 +4947,7 @@ int kbase_jd_user_buf_pin_pages(struct kbase_context *kctx, write = reg->flags & (KBASE_REG_CPU_WR | KBASE_REG_GPU_WR); -#if KERNEL_VERSION(4, 6, 0) > LINUX_VERSION_CODE - pinned_pages = get_user_pages(NULL, mm, address, alloc->imported.user_buf.nr_pages, -#if KERNEL_VERSION(4, 4, 168) <= LINUX_VERSION_CODE && \ -KERNEL_VERSION(4, 5, 0) > LINUX_VERSION_CODE - write ? FOLL_WRITE : 0, pages, NULL); -#else - write, 0, pages, NULL); -#endif -#elif KERNEL_VERSION(4, 9, 0) > LINUX_VERSION_CODE - pinned_pages = get_user_pages_remote(NULL, mm, address, alloc->imported.user_buf.nr_pages, - write, 0, pages, NULL); -#elif KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE +#if KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE pinned_pages = get_user_pages_remote(NULL, mm, address, alloc->imported.user_buf.nr_pages, write ? 
FOLL_WRITE : 0, pages, NULL); #elif KERNEL_VERSION(5, 9, 0) > LINUX_VERSION_CODE @@ -4836,6 +4962,9 @@ KERNEL_VERSION(4, 5, 0) > LINUX_VERSION_CODE return pinned_pages; if (pinned_pages != alloc->imported.user_buf.nr_pages) { + /* Above code already ensures there will not have been a CPU + * mapping by ensuring alloc->nents is 0 + */ for (i = 0; i < pinned_pages; i++) kbase_unpin_user_buf_page(pages[i]); return -ENOMEM; @@ -4849,43 +4978,65 @@ KERNEL_VERSION(4, 5, 0) > LINUX_VERSION_CODE static int kbase_jd_user_buf_map(struct kbase_context *kctx, struct kbase_va_region *reg) { - long pinned_pages; + int err; + long pinned_pages = 0; struct kbase_mem_phy_alloc *alloc; struct page **pages; struct tagged_addr *pa; - long i; - unsigned long address; + long i, dma_mapped_pages; struct device *dev; - unsigned long offset; - unsigned long local_size; unsigned long gwt_mask = ~0; - int err = kbase_jd_user_buf_pin_pages(kctx, reg); - /* Calls to this function are inherently asynchronous, with respect to * MMU operations. */ const enum kbase_caller_mmu_sync_info mmu_sync_info = CALLER_MMU_ASYNC; + bool write; + enum dma_data_direction dma_dir; + + /* If neither the CPU nor the GPU needs write access, use DMA_TO_DEVICE + * to avoid potentially-destructive CPU cache invalidates that could + * corruption of user data. + */ + write = reg->flags & (KBASE_REG_CPU_WR | KBASE_REG_GPU_WR); + dma_dir = write ? DMA_BIDIRECTIONAL : DMA_TO_DEVICE; + + lockdep_assert_held(&kctx->reg_lock); + + err = kbase_jd_user_buf_pin_pages(kctx, reg); if (err) return err; alloc = reg->gpu_alloc; pa = kbase_get_gpu_phy_pages(reg); - address = alloc->imported.user_buf.address; pinned_pages = alloc->nents; pages = alloc->imported.user_buf.pages; dev = kctx->kbdev->dev; - offset = address & ~PAGE_MASK; - local_size = alloc->imported.user_buf.size; + /* Manual CPU cache synchronization. + * + * The driver disables automatic CPU cache synchronization because the + * memory pages that enclose the imported region may also contain + * sub-regions which are not imported and that are allocated and used + * by the user process. This may be the case of memory at the beginning + * of the first page and at the end of the last page. Automatic CPU cache + * synchronization would force some operations on those memory allocations, + * unbeknown to the user process: in particular, a CPU cache invalidate + * upon unmapping would destroy the content of dirty CPU caches and cause + * the user process to lose CPU writes to the non-imported sub-regions. + * + * When the GPU claims ownership of the imported memory buffer, it shall + * commit CPU writes for the whole of all pages that enclose the imported + * region, otherwise the initial content of memory would be wrong. 
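Editor's note: the comment above motivates mapping user pages with automatic CPU cache maintenance disabled and then syncing explicitly. A minimal sketch of that pattern with the standard DMA API (kernels >= 4.10) follows; the function and parameter names are hypothetical and this is not the driver's actual helper.

#include <linux/dma-mapping.h>

/* Illustrative only: map a whole page with implicit CPU cache maintenance
 * skipped, then hand ownership to the device with an explicit sync, mirroring
 * the loop that follows in kbase_jd_user_buf_map().
 */
static int example_map_user_page(struct device *dev, struct page *page,
                                 enum dma_data_direction dir, dma_addr_t *dma_addr_out)
{
        dma_addr_t dma_addr =
                dma_map_page_attrs(dev, page, 0, PAGE_SIZE, dir, DMA_ATTR_SKIP_CPU_SYNC);

        if (dma_mapping_error(dev, dma_addr))
                return -ENOMEM;

        /* Commit dirty CPU cache lines for the whole page before the GPU
         * touches it; this replaces the sync dma_map_page() would have done.
         */
        dma_sync_single_for_device(dev, dma_addr, PAGE_SIZE, dir);

        *dma_addr_out = dma_addr;
        return 0;
}
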
+ */ for (i = 0; i < pinned_pages; i++) { dma_addr_t dma_addr; - unsigned long min; - - min = MIN(PAGE_SIZE - offset, local_size); - dma_addr = dma_map_page(dev, pages[i], - offset, min, - DMA_BIDIRECTIONAL); +#if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) + dma_addr = dma_map_page(dev, pages[i], 0, PAGE_SIZE, dma_dir); +#else + dma_addr = dma_map_page_attrs(dev, pages[i], 0, PAGE_SIZE, dma_dir, + DMA_ATTR_SKIP_CPU_SYNC); +#endif err = dma_mapping_error(dev, dma_addr); if (err) goto unwind; @@ -4893,8 +5044,7 @@ static int kbase_jd_user_buf_map(struct kbase_context *kctx, alloc->imported.user_buf.dma_addrs[i] = dma_addr; pa[i] = as_tagged(page_to_phys(pages[i])); - local_size -= min; - offset = 0; + dma_sync_single_for_device(dev, dma_addr, PAGE_SIZE, dma_dir); } #ifdef CONFIG_MALI_CINSTR_GWT @@ -4902,23 +5052,44 @@ static int kbase_jd_user_buf_map(struct kbase_context *kctx, gwt_mask = ~KBASE_REG_GPU_WR; #endif - err = kbase_mmu_insert_pages(kctx->kbdev, &kctx->mmu, reg->start_pfn, - pa, kbase_reg_current_backed_size(reg), - reg->flags & gwt_mask, kctx->as_nr, - alloc->group_id, mmu_sync_info); + err = kbase_mmu_insert_pages_skip_status_update(kctx->kbdev, &kctx->mmu, reg->start_pfn, pa, + kbase_reg_current_backed_size(reg), + reg->flags & gwt_mask, kctx->as_nr, + alloc->group_id, mmu_sync_info, NULL); if (err == 0) return 0; /* fall down */ unwind: alloc->nents = 0; - while (i--) { - dma_unmap_page(kctx->kbdev->dev, - alloc->imported.user_buf.dma_addrs[i], - PAGE_SIZE, DMA_BIDIRECTIONAL); + dma_mapped_pages = i; + /* Run the unmap loop in the same order as map loop, and perform again + * CPU cache synchronization to re-write the content of dirty CPU caches + * to memory. This is precautionary measure in case a GPU job has taken + * advantage of a partially GPU-mapped range to write and corrupt the + * content of memory, either inside or outside the imported region. + * + * Notice that this error recovery path doesn't try to be optimal and just + * flushes the entire page range. + */ + for (i = 0; i < dma_mapped_pages; i++) { + dma_addr_t dma_addr = alloc->imported.user_buf.dma_addrs[i]; + + dma_sync_single_for_device(dev, dma_addr, PAGE_SIZE, dma_dir); +#if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) + dma_unmap_page(dev, dma_addr, PAGE_SIZE, dma_dir); +#else + dma_unmap_page_attrs(dev, dma_addr, PAGE_SIZE, dma_dir, DMA_ATTR_SKIP_CPU_SYNC); +#endif } - while (++i < pinned_pages) { + /* The user buffer could already have been previously pinned before + * entering this function, and hence there could potentially be CPU + * mappings of it + */ + kbase_mem_shrink_cpu_mapping(kctx, reg, 0, pinned_pages); + + for (i = 0; i < pinned_pages; i++) { kbase_unpin_user_buf_page(pages[i]); pages[i] = NULL; } @@ -4926,34 +5097,165 @@ unwind: return err; } +/* user_buf_sync_read_only_page - This function handles syncing a single page that has read access, + * only, on both the CPU and * GPU, so it is ready to be unmapped. + * @kctx: kbase context + * @imported_size: the number of bytes to sync + * @dma_addr: DMA address of the bytes to be sync'd + * @offset_within_page: (unused) offset of the bytes within the page. Passed so that the calling + * signature is identical to user_buf_sync_writable_page(). + */ +static void user_buf_sync_read_only_page(struct kbase_context *kctx, unsigned long imported_size, + dma_addr_t dma_addr, unsigned long offset_within_page) +{ + /* Manual cache synchronization. 
+ * + * Writes from neither the CPU nor GPU are possible via this mapping, + * so we just sync the entire page to the device. + */ + dma_sync_single_for_device(kctx->kbdev->dev, dma_addr, imported_size, DMA_TO_DEVICE); +} + +/* user_buf_sync_writable_page - This function handles syncing a single page that has read + * and writable access, from either (or both of) the CPU and GPU, + * so it is ready to be unmapped. + * @kctx: kbase context + * @imported_size: the number of bytes to unmap + * @dma_addr: DMA address of the bytes to be unmapped + * @offset_within_page: offset of the bytes within the page. This is the offset to the subrange of + * the memory that is "imported" and so is intended for GPU access. Areas of + * the page outside of this - whilst still GPU accessible - are not intended + * for use by GPU work, and should also not be modified as the userspace CPU + * threads may be modifying them. + */ +static void user_buf_sync_writable_page(struct kbase_context *kctx, unsigned long imported_size, + dma_addr_t dma_addr, unsigned long offset_within_page) +{ + /* Manual CPU cache synchronization. + * + * When the GPU returns ownership of the buffer to the CPU, the driver + * needs to treat imported and non-imported memory differently. + * + * The first case to consider is non-imported sub-regions at the + * beginning of the first page and at the end of last page. For these + * sub-regions: CPU cache shall be committed with a clean+invalidate, + * in order to keep the last CPU write. + * + * Imported region prefers the opposite treatment: this memory has been + * legitimately mapped and used by the GPU, hence GPU writes shall be + * committed to memory, while CPU cache shall be invalidated to make + * sure that CPU reads the correct memory content. + * + * The following diagram shows the expect value of the variables + * used in this loop in the corner case of an imported region encloed + * by a single memory page: + * + * page boundary ->|---------- | <- dma_addr (initial value) + * | | + * | - - - - - | <- offset_within_page + * |XXXXXXXXXXX|\ + * |XXXXXXXXXXX| \ + * |XXXXXXXXXXX| }- imported_size + * |XXXXXXXXXXX| / + * |XXXXXXXXXXX|/ + * | - - - - - | <- offset_within_page + imported_size + * | |\ + * | | }- PAGE_SIZE - imported_size - + * | |/ offset_within_page + * | | + * page boundary ->|-----------| + * + * If the imported region is enclosed by more than one page, then + * offset_within_page = 0 for any page after the first. + */ + + /* Only for first page: handle non-imported range at the beginning. */ + if (offset_within_page > 0) { + dma_sync_single_for_device(kctx->kbdev->dev, dma_addr, offset_within_page, + DMA_BIDIRECTIONAL); + dma_addr += offset_within_page; + } + + /* For every page: handle imported range. */ + if (imported_size > 0) + dma_sync_single_for_cpu(kctx->kbdev->dev, dma_addr, imported_size, + DMA_BIDIRECTIONAL); + + /* Only for last page (that may coincide with first page): + * handle non-imported range at the end. + */ + if ((imported_size + offset_within_page) < PAGE_SIZE) { + dma_addr += imported_size; + dma_sync_single_for_device(kctx->kbdev->dev, dma_addr, + PAGE_SIZE - imported_size - offset_within_page, + DMA_BIDIRECTIONAL); + } +} + /* This function would also perform the work of unpinning pages on Job Manager * GPUs, which implies that a call to kbase_jd_user_buf_pin_pages() will NOT * have a corresponding call to kbase_jd_user_buf_unpin_pages(). 
*/ -static void kbase_jd_user_buf_unmap(struct kbase_context *kctx, - struct kbase_mem_phy_alloc *alloc, bool writeable) +static void kbase_jd_user_buf_unmap(struct kbase_context *kctx, struct kbase_mem_phy_alloc *alloc, + struct kbase_va_region *reg) { long i; struct page **pages; - unsigned long size = alloc->imported.user_buf.size; + unsigned long offset_within_page = alloc->imported.user_buf.address & ~PAGE_MASK; + unsigned long remaining_size = alloc->imported.user_buf.size; + bool writable = (reg->flags & (KBASE_REG_CPU_WR | KBASE_REG_GPU_WR)); + + lockdep_assert_held(&kctx->reg_lock); KBASE_DEBUG_ASSERT(alloc->type == KBASE_MEM_TYPE_IMPORTED_USER_BUF); pages = alloc->imported.user_buf.pages; + +#if !MALI_USE_CSF + kbase_mem_shrink_cpu_mapping(kctx, reg, 0, alloc->nents); +#endif + for (i = 0; i < alloc->imported.user_buf.nr_pages; i++) { - unsigned long local_size; + unsigned long imported_size = MIN(remaining_size, PAGE_SIZE - offset_within_page); + /* Notice: this is a temporary variable that is used for DMA sync + * operations, and that could be incremented by an offset if the + * current page contains both imported and non-imported memory + * sub-regions. + * + * It is valid to add an offset to this value, because the offset + * is always kept within the physically contiguous dma-mapped range + * and there's no need to translate to physical address to offset it. + * + * This variable is not going to be used for the actual DMA unmap + * operation, that shall always use the original DMA address of the + * whole memory page. + */ dma_addr_t dma_addr = alloc->imported.user_buf.dma_addrs[i]; + enum dma_data_direction dma_dir = writable ? DMA_BIDIRECTIONAL : DMA_TO_DEVICE; + + if (writable) + user_buf_sync_writable_page(kctx, imported_size, dma_addr, + offset_within_page); + else + user_buf_sync_read_only_page(kctx, imported_size, dma_addr, + offset_within_page); - local_size = MIN(size, PAGE_SIZE - (dma_addr & ~PAGE_MASK)); - dma_unmap_page(kctx->kbdev->dev, dma_addr, local_size, - DMA_BIDIRECTIONAL); - if (writeable) + /* Notice: use the original DMA address to unmap the whole memory page. 
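Editor's note: the unmap loop above walks the imported range one CPU page at a time, with only the first page possibly starting at a non-zero offset. A worked sketch with made-up numbers (0x2000 bytes imported at page offset 0x300), assuming the kernel's min() helper and a 4KB PAGE_SIZE:

/* Illustrative only: per-page sync sizes for a hypothetical import.
 *   page 0: offset 0x300, imported_size 0xD00
 *   page 1: offset 0x000, imported_size 0x1000
 *   page 2: offset 0x000, imported_size 0x300
 */
static void example_walk_imported_range(void)
{
        unsigned long offset_within_page = 0x300;
        unsigned long remaining_size = 0x2000;

        while (remaining_size) {
                unsigned long imported_size =
                        min(remaining_size, PAGE_SIZE - offset_within_page);

                remaining_size -= imported_size;
                offset_within_page = 0;
        }
}
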
*/ +#if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) + dma_unmap_page(kctx->kbdev->dev, alloc->imported.user_buf.dma_addrs[i], PAGE_SIZE, + dma_dir); +#else + dma_unmap_page_attrs(kctx->kbdev->dev, alloc->imported.user_buf.dma_addrs[i], + PAGE_SIZE, dma_dir, DMA_ATTR_SKIP_CPU_SYNC); +#endif + if (writable) set_page_dirty_lock(pages[i]); #if !MALI_USE_CSF kbase_unpin_user_buf_page(pages[i]); pages[i] = NULL; #endif - size -= local_size; + remaining_size -= imported_size; + offset_within_page = 0; } #if !MALI_USE_CSF alloc->nents = 0; @@ -4964,7 +5266,8 @@ int kbase_mem_copy_to_pinned_user_pages(struct page **dest_pages, void *src_page, size_t *to_copy, unsigned int nr_pages, unsigned int *target_page_nr, size_t offset) { - void *target_page = kmap(dest_pages[*target_page_nr]); + void *target_page = kbase_kmap(dest_pages[*target_page_nr]); + size_t chunk = PAGE_SIZE-offset; if (!target_page) { @@ -4977,13 +5280,13 @@ int kbase_mem_copy_to_pinned_user_pages(struct page **dest_pages, memcpy(target_page + offset, src_page, chunk); *to_copy -= chunk; - kunmap(dest_pages[*target_page_nr]); + kbase_kunmap(dest_pages[*target_page_nr], target_page); *target_page_nr += 1; if (*target_page_nr >= nr_pages || *to_copy == 0) return 0; - target_page = kmap(dest_pages[*target_page_nr]); + target_page = kbase_kmap(dest_pages[*target_page_nr]); if (!target_page) { pr_err("%s: kmap failure", __func__); return -ENOMEM; @@ -4995,16 +5298,16 @@ int kbase_mem_copy_to_pinned_user_pages(struct page **dest_pages, memcpy(target_page, src_page + PAGE_SIZE-offset, chunk); *to_copy -= chunk; - kunmap(dest_pages[*target_page_nr]); + kbase_kunmap(dest_pages[*target_page_nr], target_page); return 0; } -struct kbase_mem_phy_alloc *kbase_map_external_resource( - struct kbase_context *kctx, struct kbase_va_region *reg, - struct mm_struct *locked_mm) +int kbase_map_external_resource(struct kbase_context *kctx, struct kbase_va_region *reg, + struct mm_struct *locked_mm) { - int err; + int err = 0; + struct kbase_mem_phy_alloc *alloc = reg->gpu_alloc; lockdep_assert_held(&kctx->reg_lock); @@ -5013,7 +5316,7 @@ struct kbase_mem_phy_alloc *kbase_map_external_resource( case KBASE_MEM_TYPE_IMPORTED_USER_BUF: { if ((reg->gpu_alloc->imported.user_buf.mm != locked_mm) && (!reg->gpu_alloc->nents)) - goto exit; + return -EINVAL; reg->gpu_alloc->imported.user_buf.current_mapping_usage_count++; if (reg->gpu_alloc->imported.user_buf @@ -5021,7 +5324,7 @@ struct kbase_mem_phy_alloc *kbase_map_external_resource( err = kbase_jd_user_buf_map(kctx, reg); if (err) { reg->gpu_alloc->imported.user_buf.current_mapping_usage_count--; - goto exit; + return err; } } } @@ -5029,21 +5332,30 @@ struct kbase_mem_phy_alloc *kbase_map_external_resource( case KBASE_MEM_TYPE_IMPORTED_UMM: { err = kbase_mem_umm_map(kctx, reg); if (err) - goto exit; + return err; break; } default: - goto exit; + dev_dbg(kctx->kbdev->dev, + "Invalid external resource GPU allocation type (%x) on mapping", + alloc->type); + return -EINVAL; } - return kbase_mem_phy_alloc_get(reg->gpu_alloc); -exit: - return NULL; + kbase_va_region_alloc_get(kctx, reg); + kbase_mem_phy_alloc_get(alloc); + return err; } -void kbase_unmap_external_resource(struct kbase_context *kctx, - struct kbase_va_region *reg, struct kbase_mem_phy_alloc *alloc) +void kbase_unmap_external_resource(struct kbase_context *kctx, struct kbase_va_region *reg) { + /* gpu_alloc was used in kbase_map_external_resources, so we need to use it for the + * unmapping operation. 
+ */ + struct kbase_mem_phy_alloc *alloc = reg->gpu_alloc; + + lockdep_assert_held(&kctx->reg_lock); + switch (alloc->type) { case KBASE_MEM_TYPE_IMPORTED_UMM: { kbase_mem_umm_unmap(kctx, reg, alloc); @@ -5053,28 +5365,29 @@ void kbase_unmap_external_resource(struct kbase_context *kctx, alloc->imported.user_buf.current_mapping_usage_count--; if (alloc->imported.user_buf.current_mapping_usage_count == 0) { - bool writeable = true; - - if (!kbase_is_region_invalid_or_free(reg) && - reg->gpu_alloc == alloc) - kbase_mmu_teardown_pages( - kctx->kbdev, - &kctx->mmu, - reg->start_pfn, - kbase_reg_current_backed_size(reg), - kctx->as_nr); - - if (reg && ((reg->flags & (KBASE_REG_CPU_WR | KBASE_REG_GPU_WR)) == 0)) - writeable = false; + if (!kbase_is_region_invalid_or_free(reg)) { + kbase_mmu_teardown_imported_pages( + kctx->kbdev, &kctx->mmu, reg->start_pfn, alloc->pages, + kbase_reg_current_backed_size(reg), + kbase_reg_current_backed_size(reg), kctx->as_nr); + } - kbase_jd_user_buf_unmap(kctx, alloc, writeable); + kbase_jd_user_buf_unmap(kctx, alloc, reg); + } } - } break; default: - break; + WARN(1, "Invalid external resource GPU allocation type (%x) on unmapping", + alloc->type); + return; } kbase_mem_phy_alloc_put(alloc); + kbase_va_region_alloc_put(kctx, reg); +} + +static inline u64 kbasep_get_va_gpu_addr(struct kbase_va_region *reg) +{ + return reg->start_pfn << PAGE_SHIFT; } struct kbase_ctx_ext_res_meta *kbase_sticky_resource_acquire( @@ -5090,7 +5403,7 @@ struct kbase_ctx_ext_res_meta *kbase_sticky_resource_acquire( * metadata which matches the region which is being acquired. */ list_for_each_entry(walker, &kctx->ext_res_meta_head, ext_res_node) { - if (walker->gpu_addr == gpu_addr) { + if (kbasep_get_va_gpu_addr(walker->reg) == gpu_addr) { meta = walker; meta->ref++; break; @@ -5102,8 +5415,7 @@ struct kbase_ctx_ext_res_meta *kbase_sticky_resource_acquire( struct kbase_va_region *reg; /* Find the region */ - reg = kbase_region_tracker_find_region_enclosing_address( - kctx, gpu_addr); + reg = kbase_region_tracker_find_region_enclosing_address(kctx, gpu_addr); if (kbase_is_region_invalid_or_free(reg)) goto failed; @@ -5111,18 +5423,18 @@ struct kbase_ctx_ext_res_meta *kbase_sticky_resource_acquire( meta = kzalloc(sizeof(*meta), GFP_KERNEL); if (!meta) goto failed; - /* * Fill in the metadata object and acquire a reference * for the physical resource. */ - meta->alloc = kbase_map_external_resource(kctx, reg, NULL); - meta->ref = 1; + meta->reg = reg; - if (!meta->alloc) + /* Map the external resource to the GPU allocation of the region + * and acquire the reference to the VA region + */ + if (kbase_map_external_resource(kctx, meta->reg, NULL)) goto fail_map; - - meta->gpu_addr = reg->start_pfn << PAGE_SHIFT; + meta->ref = 1; list_add(&meta->ext_res_node, &kctx->ext_res_meta_head); } @@ -5147,7 +5459,7 @@ find_sticky_resource_meta(struct kbase_context *kctx, u64 gpu_addr) * metadata which matches the region which is being released. */ list_for_each_entry(walker, &kctx->ext_res_meta_head, ext_res_node) - if (walker->gpu_addr == gpu_addr) + if (kbasep_get_va_gpu_addr(walker->reg) == gpu_addr) return walker; return NULL; @@ -5156,14 +5468,7 @@ find_sticky_resource_meta(struct kbase_context *kctx, u64 gpu_addr) static void release_sticky_resource_meta(struct kbase_context *kctx, struct kbase_ctx_ext_res_meta *meta) { - struct kbase_va_region *reg; - - /* Drop the physical memory reference and free the metadata. 
*/ - reg = kbase_region_tracker_find_region_enclosing_address( - kctx, - meta->gpu_addr); - - kbase_unmap_external_resource(kctx, reg, meta->alloc); + kbase_unmap_external_resource(kctx, meta->reg); list_del(&meta->ext_res_node); kfree(meta); } diff --git a/mali_kbase/mali_kbase_mem.h b/mali_kbase/mali_kbase_mem.h index 4ac4feb..1a59706 100644 --- a/mali_kbase/mali_kbase_mem.h +++ b/mali_kbase/mali_kbase_mem.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2010-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2010-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -37,6 +37,8 @@ #include "mali_kbase_defs.h" /* Required for kbase_mem_evictable_unmake */ #include "mali_kbase_mem_linux.h" +#include "mali_kbase_mem_migrate.h" +#include "mali_kbase_refcount_defs.h" static inline void kbase_process_page_usage_inc(struct kbase_context *kctx, int pages); @@ -60,6 +62,186 @@ static inline void kbase_process_page_usage_inc(struct kbase_context *kctx, #define KBASEP_TMEM_GROWABLE_BLOCKSIZE_PAGES_HW_ISSUE_8316 (1u << KBASEP_TMEM_GROWABLE_BLOCKSIZE_PAGES_LOG2_HW_ISSUE_8316) #define KBASEP_TMEM_GROWABLE_BLOCKSIZE_PAGES_HW_ISSUE_9630 (1u << KBASEP_TMEM_GROWABLE_BLOCKSIZE_PAGES_LOG2_HW_ISSUE_9630) +/* Free region */ +#define KBASE_REG_FREE (1ul << 0) +/* CPU write access */ +#define KBASE_REG_CPU_WR (1ul << 1) +/* GPU write access */ +#define KBASE_REG_GPU_WR (1ul << 2) +/* No eXecute flag */ +#define KBASE_REG_GPU_NX (1ul << 3) +/* Is CPU cached? */ +#define KBASE_REG_CPU_CACHED (1ul << 4) +/* Is GPU cached? + * Some components within the GPU might only be able to access memory that is + * GPU cacheable. Refer to the specific GPU implementation for more details. + */ +#define KBASE_REG_GPU_CACHED (1ul << 5) + +#define KBASE_REG_GROWABLE (1ul << 6) +/* Can grow on pf? */ +#define KBASE_REG_PF_GROW (1ul << 7) + +/* Allocation doesn't straddle the 4GB boundary in GPU virtual space */ +#define KBASE_REG_GPU_VA_SAME_4GB_PAGE (1ul << 8) + +/* inner shareable coherency */ +#define KBASE_REG_SHARE_IN (1ul << 9) +/* inner & outer shareable coherency */ +#define KBASE_REG_SHARE_BOTH (1ul << 10) + +#if MALI_USE_CSF +/* Space for 8 different zones */ +#define KBASE_REG_ZONE_BITS 3 +#else +/* Space for 4 different zones */ +#define KBASE_REG_ZONE_BITS 2 +#endif + +/* The bits 11-13 (inclusive) of the kbase_va_region flag are reserved + * for information about the zone in which it was allocated. + */ +#define KBASE_REG_ZONE_SHIFT (11ul) +#define KBASE_REG_ZONE_MASK (((1 << KBASE_REG_ZONE_BITS) - 1ul) << KBASE_REG_ZONE_SHIFT) + +#if KBASE_REG_ZONE_MAX > (1 << KBASE_REG_ZONE_BITS) +#error "Too many zones for the number of zone bits defined" +#endif + +/* GPU read access */ +#define KBASE_REG_GPU_RD (1ul << 14) +/* CPU read access */ +#define KBASE_REG_CPU_RD (1ul << 15) + +/* Index of chosen MEMATTR for this region (0..7) */ +#define KBASE_REG_MEMATTR_MASK (7ul << 16) +#define KBASE_REG_MEMATTR_INDEX(x) (((x)&7) << 16) +#define KBASE_REG_MEMATTR_VALUE(x) (((x)&KBASE_REG_MEMATTR_MASK) >> 16) + +#define KBASE_REG_PROTECTED (1ul << 19) + +/* Region belongs to a shrinker. + * + * This can either mean that it is part of the JIT/Ephemeral or tiler heap + * shrinker paths. 
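/*
 * Standalone illustration (user-space C, not part of the patch) of how the
 * region flag word defined above packs independent fields: the zone sits in
 * bits 11-13 and the MEMATTR index in bits 16-18, so both can be ORed together
 * with the boolean flags. The EX_-prefixed macros are local copies of the
 * header macros for the demo (EX_KBASE_REG_ZONE_BITS is 3, the CSF value;
 * JM GPUs use 2); main() only prints the round trip.
 */
#include <stdio.h>

#define EX_KBASE_REG_CPU_WR (1ul << 1)
#define EX_KBASE_REG_GPU_WR (1ul << 2)
#define EX_KBASE_REG_ZONE_BITS 3
#define EX_KBASE_REG_ZONE_SHIFT (11ul)
#define EX_KBASE_REG_ZONE_MASK (((1 << EX_KBASE_REG_ZONE_BITS) - 1ul) << EX_KBASE_REG_ZONE_SHIFT)
#define EX_KBASE_REG_MEMATTR_MASK (7ul << 16)
#define EX_KBASE_REG_MEMATTR_INDEX(x) (((x) & 7) << 16)
#define EX_KBASE_REG_MEMATTR_VALUE(x) (((x) & EX_KBASE_REG_MEMATTR_MASK) >> 16)

int main(void)
{
        unsigned long flags = EX_KBASE_REG_CPU_WR | EX_KBASE_REG_GPU_WR |
                              EX_KBASE_REG_MEMATTR_INDEX(5);

        printf("memattr index = %lu, zone bits = 0x%lx\n",
               EX_KBASE_REG_MEMATTR_VALUE(flags), flags & EX_KBASE_REG_ZONE_MASK);
        return 0;
}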
Should be removed only after making sure that there are + * no references remaining to it in these paths, as it may cause the physical + * backing of the region to disappear during use. + */ +#define KBASE_REG_DONT_NEED (1ul << 20) + +/* Imported buffer is padded? */ +#define KBASE_REG_IMPORT_PAD (1ul << 21) + +#if MALI_USE_CSF +/* CSF event memory */ +#define KBASE_REG_CSF_EVENT (1ul << 22) +/* Bit 23 is reserved. + * + * Do not remove, use the next unreserved bit for new flags + */ +#define KBASE_REG_RESERVED_BIT_23 (1ul << 23) +#else +/* Bit 22 is reserved. + * + * Do not remove, use the next unreserved bit for new flags + */ +#define KBASE_REG_RESERVED_BIT_22 (1ul << 22) +/* The top of the initial commit is aligned to extension pages. + * Extent must be a power of 2 + */ +#define KBASE_REG_TILER_ALIGN_TOP (1ul << 23) +#endif /* MALI_USE_CSF */ + +/* Bit 24 is currently unused and is available for use for a new flag */ + +/* Memory has permanent kernel side mapping */ +#define KBASE_REG_PERMANENT_KERNEL_MAPPING (1ul << 25) + +/* GPU VA region has been freed by the userspace, but still remains allocated + * due to the reference held by CPU mappings created on the GPU VA region. + * + * A region with this flag set has had kbase_gpu_munmap() called on it, but can + * still be looked-up in the region tracker as a non-free region. Hence must + * not create or update any more GPU mappings on such regions because they will + * not be unmapped when the region is finally destroyed. + * + * Since such regions are still present in the region tracker, new allocations + * attempted with BASE_MEM_SAME_VA might fail if their address intersects with + * a region with this flag set. + * + * In addition, this flag indicates the gpu_alloc member might no longer valid + * e.g. in infinite cache simulation. + */ +#define KBASE_REG_VA_FREED (1ul << 26) + +/* If set, the heap info address points to a u32 holding the used size in bytes; + * otherwise it points to a u64 holding the lowest address of unused memory. + */ +#define KBASE_REG_HEAP_INFO_IS_SIZE (1ul << 27) + +/* Allocation is actively used for JIT memory */ +#define KBASE_REG_ACTIVE_JIT_ALLOC (1ul << 28) + +#if MALI_USE_CSF +/* This flag only applies to allocations in the EXEC_FIXED_VA and FIXED_VA + * memory zones, and it determines whether they were created with a fixed + * GPU VA address requested by the user. + */ +#define KBASE_REG_FIXED_ADDRESS (1ul << 29) +#else +#define KBASE_REG_RESERVED_BIT_29 (1ul << 29) +#endif + +#define KBASE_REG_ZONE_CUSTOM_VA_BASE (0x100000000ULL >> PAGE_SHIFT) + +#if MALI_USE_CSF +/* only used with 32-bit clients */ +/* On a 32bit platform, custom VA should be wired from 4GB to 2^(43). + */ +#define KBASE_REG_ZONE_CUSTOM_VA_SIZE (((1ULL << 43) >> PAGE_SHIFT) - KBASE_REG_ZONE_CUSTOM_VA_BASE) +#else +/* only used with 32-bit clients */ +/* On a 32bit platform, custom VA should be wired from 4GB to the VA limit of the + * GPU. Unfortunately, the Linux mmap() interface limits us to 2^32 pages (2^44 + * bytes, see mmap64 man page for reference). So we put the default limit to the + * maximum possible on Linux and shrink it down, if required by the GPU, during + * initialization. + */ +#define KBASE_REG_ZONE_CUSTOM_VA_SIZE (((1ULL << 44) >> PAGE_SHIFT) - KBASE_REG_ZONE_CUSTOM_VA_BASE) +/* end 32-bit clients only */ +#endif + +/* The starting address and size of the GPU-executable zone are dynamic + * and depend on the platform and the number of pages requested by the + * user process, with an upper limit of 4 GB. 
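/*
 * Standalone illustration (user-space C, not part of the patch) of the zone
 * extents defined above, evaluated for a 4 KiB page (EX_PAGE_SHIFT == 12 is an
 * assumption standing in for PAGE_SHIFT, and the CUSTOM_VA figure is the
 * 32-bit-client CSF case that ends at 2^43).
 */
#include <stdint.h>
#include <stdio.h>

#define EX_PAGE_SHIFT 12

int main(void)
{
        uint64_t custom_va_base = 0x100000000ULL >> EX_PAGE_SHIFT;
        uint64_t custom_va_size = ((1ULL << 43) >> EX_PAGE_SHIFT) - custom_va_base;
        uint64_t exec_va_max_pages = (1ULL << 32) >> EX_PAGE_SHIFT;

        printf("CUSTOM_VA: base pfn 0x%llx, %llu pages\n",
               (unsigned long long)custom_va_base,
               (unsigned long long)custom_va_size);
        printf("EXEC_VA cap: %llu pages (4 GB)\n",
               (unsigned long long)exec_va_max_pages);
        return 0;
}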
+ */ +#define KBASE_REG_ZONE_EXEC_VA_MAX_PAGES ((1ULL << 32) >> PAGE_SHIFT) /* 4 GB */ +#define KBASE_REG_ZONE_EXEC_VA_SIZE KBASE_REG_ZONE_EXEC_VA_MAX_PAGES + +#if MALI_USE_CSF +#define KBASE_REG_ZONE_MCU_SHARED_BASE (0x04000000ULL >> PAGE_SHIFT) +#define MCU_SHARED_ZONE_SIZE (((0x08000000ULL) >> PAGE_SHIFT) - KBASE_REG_ZONE_MCU_SHARED_BASE) + +/* For CSF GPUs, the EXEC_VA zone is always 4GB in size, and starts at 2^47 for 64-bit + * clients, and 2^43 for 32-bit clients. + */ +#define KBASE_REG_ZONE_EXEC_VA_BASE_64 ((1ULL << 47) >> PAGE_SHIFT) +#define KBASE_REG_ZONE_EXEC_VA_BASE_32 ((1ULL << 43) >> PAGE_SHIFT) +/* Executable zone supporting FIXED/FIXABLE allocations. + * It is always 4GB in size. + */ +#define KBASE_REG_ZONE_EXEC_FIXED_VA_SIZE KBASE_REG_ZONE_EXEC_VA_MAX_PAGES + +/* Non-executable zone supporting FIXED/FIXABLE allocations. + * It extends from (2^47) up to (2^48)-1, for 64-bit userspace clients, and from + * (2^43) up to (2^44)-1 for 32-bit userspace clients. For the same reason, + * the end of the FIXED_VA zone for 64-bit clients is (2^48)-1. + */ +#define KBASE_REG_ZONE_FIXED_VA_END_64 ((1ULL << 48) >> PAGE_SHIFT) +#define KBASE_REG_ZONE_FIXED_VA_END_32 ((1ULL << 44) >> PAGE_SHIFT) + +#endif + /* * A CPU mapping */ @@ -182,6 +364,106 @@ struct kbase_mem_phy_alloc { } imported; }; +/** + * enum kbase_page_status - Status of a page used for page migration. + * + * @MEM_POOL: Stable state. Page is located in a memory pool and can safely + * be migrated. + * @ALLOCATE_IN_PROGRESS: Transitory state. A page is set to this status as + * soon as it leaves a memory pool. + * @SPILL_IN_PROGRESS: Transitory state. Corner case where pages in a memory + * pool of a dying context are being moved to the device + * memory pool. + * @NOT_MOVABLE: Stable state. Page has been allocated for an object that is + * not movable, but may return to be movable when the object + * is freed. + * @ALLOCATED_MAPPED: Stable state. Page has been allocated, mapped to GPU + * and has reference to kbase_mem_phy_alloc object. + * @PT_MAPPED: Stable state. Similar to ALLOCATED_MAPPED, but page doesn't + * reference kbase_mem_phy_alloc object. Used as a page in MMU + * page table. + * @FREE_IN_PROGRESS: Transitory state. A page is set to this status as soon as + * the driver manages to acquire a lock on the page while + * unmapping it. This status means that a memory release is + * happening and it's still not complete. + * @FREE_ISOLATED_IN_PROGRESS: Transitory state. This is a very particular corner case. + * A page is isolated while it is in ALLOCATED_MAPPED state, + * but then the driver tries to destroy the allocation. + * @FREE_PT_ISOLATED_IN_PROGRESS: Transitory state. This is a very particular corner case. + * A page is isolated while it is in PT_MAPPED state, but + * then the driver tries to destroy the allocation. + * + * Pages can only be migrated in stable states. 
+ */ +enum kbase_page_status { + MEM_POOL = 0, + ALLOCATE_IN_PROGRESS, + SPILL_IN_PROGRESS, + NOT_MOVABLE, + ALLOCATED_MAPPED, + PT_MAPPED, + FREE_IN_PROGRESS, + FREE_ISOLATED_IN_PROGRESS, + FREE_PT_ISOLATED_IN_PROGRESS, +}; + +#define PGD_VPFN_LEVEL_MASK ((u64)0x3) +#define PGD_VPFN_LEVEL_GET_LEVEL(pgd_vpfn_level) (pgd_vpfn_level & PGD_VPFN_LEVEL_MASK) +#define PGD_VPFN_LEVEL_GET_VPFN(pgd_vpfn_level) (pgd_vpfn_level & ~PGD_VPFN_LEVEL_MASK) +#define PGD_VPFN_LEVEL_SET(pgd_vpfn, level) \ + ((pgd_vpfn & ~PGD_VPFN_LEVEL_MASK) | (level & PGD_VPFN_LEVEL_MASK)) + +/** + * struct kbase_page_metadata - Metadata for each page in kbase + * + * @kbdev: Pointer to kbase device. + * @dma_addr: DMA address mapped to page. + * @migrate_lock: A spinlock to protect the private metadata. + * @data: Member in union valid based on @status. + * @status: Status to keep track if page can be migrated at any + * given moment. MSB will indicate if page is isolated. + * Protected by @migrate_lock. + * @vmap_count: Counter of kernel mappings. + * @group_id: Memory group ID obtained at the time of page allocation. + * + * Each 4KB page will have a reference to this struct in the private field. + * This will be used to keep track of information required for Linux page + * migration functionality as well as address for DMA mapping. + */ +struct kbase_page_metadata { + dma_addr_t dma_addr; + spinlock_t migrate_lock; + + union { + struct { + struct kbase_mem_pool *pool; + /* Pool could be terminated after page is isolated and therefore + * won't be able to get reference to kbase device. + */ + struct kbase_device *kbdev; + } mem_pool; + struct { + struct kbase_va_region *reg; + struct kbase_mmu_table *mmut; + u64 vpfn; + } mapped; + struct { + struct kbase_mmu_table *mmut; + u64 pgd_vpfn_level; + } pt_mapped; + struct { + struct kbase_device *kbdev; + } free_isolated; + struct { + struct kbase_device *kbdev; + } free_pt_isolated; + } data; + + u8 status; + u8 vmap_count; + u8 group_id; +}; + /* The top bit of kbase_alloc_import_user_buf::current_mapping_usage_count is * used to signify that a buffer was pinned when it was imported. Since the * reference count is limited by the number of atoms that can be submitted at @@ -204,6 +486,46 @@ enum kbase_jit_report_flags { KBASE_JIT_REPORT_ON_ALLOC_OR_FREE = (1u << 0) }; +/** + * kbase_zone_to_bits - Convert a memory zone @zone to the corresponding + * bitpattern, for ORing together with other flags. + * @zone: Memory zone + * + * Return: Bitpattern with the appropriate bits set. + */ +unsigned long kbase_zone_to_bits(enum kbase_memory_zone zone); + +/** + * kbase_bits_to_zone - Convert the bitpattern @zone_bits to the corresponding + * zone identifier + * @zone_bits: Memory allocation flag containing a zone pattern + * + * Return: Zone identifier for valid zone bitpatterns, + */ +enum kbase_memory_zone kbase_bits_to_zone(unsigned long zone_bits); + +/** + * kbase_mem_zone_get_name - Get the string name for a given memory zone + * @zone: Memory zone identifier + * + * Return: string for valid memory zone, NULL otherwise + */ +char *kbase_reg_zone_get_name(enum kbase_memory_zone zone); + +/** + * kbase_set_phy_alloc_page_status - Set the page migration status of the underlying + * physical allocation. + * @alloc: the physical allocation containing the pages whose metadata is going + * to be modified + * @status: the status the pages should end up in + * + * Note that this function does not go through all of the checking to ensure that + * proper states are set. 
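/*
 * Standalone illustration (user-space C, not part of the patch) of the
 * PGD_VPFN_LEVEL_* packing defined above: the MMU level is stashed in the two
 * low bits of the stored virtual PFN, which the driver keeps available for
 * this purpose, and is masked back out on retrieval. The EX_-prefixed macros
 * are local copies for the demo.
 */
#include <stdint.h>
#include <stdio.h>

#define EX_PGD_VPFN_LEVEL_MASK ((uint64_t)0x3)
#define EX_PGD_VPFN_LEVEL_GET_LEVEL(v) ((v) & EX_PGD_VPFN_LEVEL_MASK)
#define EX_PGD_VPFN_LEVEL_GET_VPFN(v) ((v) & ~EX_PGD_VPFN_LEVEL_MASK)
#define EX_PGD_VPFN_LEVEL_SET(vpfn, level) \
        (((vpfn) & ~EX_PGD_VPFN_LEVEL_MASK) | ((level) & EX_PGD_VPFN_LEVEL_MASK))

int main(void)
{
        uint64_t packed = EX_PGD_VPFN_LEVEL_SET(0x4000ULL, 2);

        printf("packed 0x%llx -> vpfn 0x%llx, level %llu\n",
               (unsigned long long)packed,
               (unsigned long long)EX_PGD_VPFN_LEVEL_GET_VPFN(packed),
               (unsigned long long)EX_PGD_VPFN_LEVEL_GET_LEVEL(packed));
        return 0;
}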
Instead, it is only used when we change the allocation + * to NOT_MOVABLE or from NOT_MOVABLE to ALLOCATED_MAPPED + */ +void kbase_set_phy_alloc_page_status(struct kbase_mem_phy_alloc *alloc, + enum kbase_page_status status); + static inline void kbase_mem_phy_alloc_gpu_mapped(struct kbase_mem_phy_alloc *alloc) { KBASE_DEBUG_ASSERT(alloc); @@ -224,8 +546,9 @@ static inline void kbase_mem_phy_alloc_gpu_unmapped(struct kbase_mem_phy_alloc * } /** - * kbase_mem_phy_alloc_kernel_mapped - Increment kernel_mappings - * counter for a memory region to prevent commit and flag changes + * kbase_mem_phy_alloc_kernel_mapped - Increment kernel_mappings counter for a + * memory region to prevent commit and flag + * changes * * @alloc: Pointer to physical pages tracking object */ @@ -303,6 +626,8 @@ static inline struct kbase_mem_phy_alloc *kbase_mem_phy_alloc_put(struct kbase_m * @jit_usage_id: The last just-in-time memory usage ID for this region. * @jit_bin_id: The just-in-time memory bin this region came from. * @va_refcnt: Number of users of this region. Protected by reg_lock. + * @no_user_free_count: Number of contexts that want to prevent the region + * from being freed by userspace. * @heap_info_gpu_addr: Pointer to an object in GPU memory defining an end of * an allocated region * The object can be one of: @@ -330,200 +655,6 @@ struct kbase_va_region { size_t nr_pages; size_t initial_commit; size_t threshold_pages; - -/* Free region */ -#define KBASE_REG_FREE (1ul << 0) -/* CPU write access */ -#define KBASE_REG_CPU_WR (1ul << 1) -/* GPU write access */ -#define KBASE_REG_GPU_WR (1ul << 2) -/* No eXecute flag */ -#define KBASE_REG_GPU_NX (1ul << 3) -/* Is CPU cached? */ -#define KBASE_REG_CPU_CACHED (1ul << 4) -/* Is GPU cached? - * Some components within the GPU might only be able to access memory that is - * GPU cacheable. Refer to the specific GPU implementation for more details. - */ -#define KBASE_REG_GPU_CACHED (1ul << 5) - -#define KBASE_REG_GROWABLE (1ul << 6) -/* Can grow on pf? */ -#define KBASE_REG_PF_GROW (1ul << 7) - -/* Allocation doesn't straddle the 4GB boundary in GPU virtual space */ -#define KBASE_REG_GPU_VA_SAME_4GB_PAGE (1ul << 8) - -/* inner shareable coherency */ -#define KBASE_REG_SHARE_IN (1ul << 9) -/* inner & outer shareable coherency */ -#define KBASE_REG_SHARE_BOTH (1ul << 10) - -#if MALI_USE_CSF -/* Space for 8 different zones */ -#define KBASE_REG_ZONE_BITS 3 -#else -/* Space for 4 different zones */ -#define KBASE_REG_ZONE_BITS 2 -#endif - -#define KBASE_REG_ZONE_MASK (((1 << KBASE_REG_ZONE_BITS) - 1ul) << 11) -#define KBASE_REG_ZONE(x) (((x) & ((1 << KBASE_REG_ZONE_BITS) - 1ul)) << 11) -#define KBASE_REG_ZONE_IDX(x) (((x) & KBASE_REG_ZONE_MASK) >> 11) - -#if KBASE_REG_ZONE_MAX > (1 << KBASE_REG_ZONE_BITS) -#error "Too many zones for the number of zone bits defined" -#endif - -/* GPU read access */ -#define KBASE_REG_GPU_RD (1ul << 14) -/* CPU read access */ -#define KBASE_REG_CPU_RD (1ul << 15) - -/* Index of chosen MEMATTR for this region (0..7) */ -#define KBASE_REG_MEMATTR_MASK (7ul << 16) -#define KBASE_REG_MEMATTR_INDEX(x) (((x) & 7) << 16) -#define KBASE_REG_MEMATTR_VALUE(x) (((x) & KBASE_REG_MEMATTR_MASK) >> 16) - -#define KBASE_REG_PROTECTED (1ul << 19) - -#define KBASE_REG_DONT_NEED (1ul << 20) - -/* Imported buffer is padded? */ -#define KBASE_REG_IMPORT_PAD (1ul << 21) - -#if MALI_USE_CSF -/* CSF event memory */ -#define KBASE_REG_CSF_EVENT (1ul << 22) -#else -/* Bit 22 is reserved. 
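/*
 * Illustrative sketch (not part of the patch): per the note above,
 * kbase_set_phy_alloc_page_status() is only used for the coarse transitions
 * between NOT_MOVABLE and ALLOCATED_MAPPED, e.g. when an allocation is handed
 * to a subsystem that cannot tolerate page migration and later released.
 * example_pin_alloc/example_unpin_alloc are hypothetical helpers.
 */
static void example_pin_alloc(struct kbase_mem_phy_alloc *alloc)
{
        kbase_set_phy_alloc_page_status(alloc, NOT_MOVABLE);
}

static void example_unpin_alloc(struct kbase_mem_phy_alloc *alloc)
{
        kbase_set_phy_alloc_page_status(alloc, ALLOCATED_MAPPED);
}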
- * - * Do not remove, use the next unreserved bit for new flags - */ -#define KBASE_REG_RESERVED_BIT_22 (1ul << 22) -#endif - -#if !MALI_USE_CSF -/* The top of the initial commit is aligned to extension pages. - * Extent must be a power of 2 - */ -#define KBASE_REG_TILER_ALIGN_TOP (1ul << 23) -#else -/* Bit 23 is reserved. - * - * Do not remove, use the next unreserved bit for new flags - */ -#define KBASE_REG_RESERVED_BIT_23 (1ul << 23) -#endif /* !MALI_USE_CSF */ - -/* Whilst this flag is set the GPU allocation is not supposed to be freed by - * user space. The flag will remain set for the lifetime of JIT allocations. - */ -#define KBASE_REG_NO_USER_FREE (1ul << 24) - -/* Memory has permanent kernel side mapping */ -#define KBASE_REG_PERMANENT_KERNEL_MAPPING (1ul << 25) - -/* GPU VA region has been freed by the userspace, but still remains allocated - * due to the reference held by CPU mappings created on the GPU VA region. - * - * A region with this flag set has had kbase_gpu_munmap() called on it, but can - * still be looked-up in the region tracker as a non-free region. Hence must - * not create or update any more GPU mappings on such regions because they will - * not be unmapped when the region is finally destroyed. - * - * Since such regions are still present in the region tracker, new allocations - * attempted with BASE_MEM_SAME_VA might fail if their address intersects with - * a region with this flag set. - * - * In addition, this flag indicates the gpu_alloc member might no longer valid - * e.g. in infinite cache simulation. - */ -#define KBASE_REG_VA_FREED (1ul << 26) - -/* If set, the heap info address points to a u32 holding the used size in bytes; - * otherwise it points to a u64 holding the lowest address of unused memory. - */ -#define KBASE_REG_HEAP_INFO_IS_SIZE (1ul << 27) - -/* Allocation is actively used for JIT memory */ -#define KBASE_REG_ACTIVE_JIT_ALLOC (1ul << 28) - -#if MALI_USE_CSF -/* This flag only applies to allocations in the EXEC_FIXED_VA and FIXED_VA - * memory zones, and it determines whether they were created with a fixed - * GPU VA address requested by the user. - */ -#define KBASE_REG_FIXED_ADDRESS (1ul << 29) -#else -#define KBASE_REG_RESERVED_BIT_29 (1ul << 29) -#endif - -#define KBASE_REG_ZONE_SAME_VA KBASE_REG_ZONE(0) - -#define KBASE_REG_ZONE_CUSTOM_VA KBASE_REG_ZONE(1) -#define KBASE_REG_ZONE_CUSTOM_VA_BASE (0x100000000ULL >> PAGE_SHIFT) - -#if MALI_USE_CSF -/* only used with 32-bit clients */ -/* On a 32bit platform, custom VA should be wired from 4GB to 2^(43). - */ -#define KBASE_REG_ZONE_CUSTOM_VA_SIZE \ - (((1ULL << 43) >> PAGE_SHIFT) - KBASE_REG_ZONE_CUSTOM_VA_BASE) -#else -/* only used with 32-bit clients */ -/* On a 32bit platform, custom VA should be wired from 4GB to the VA limit of the - * GPU. Unfortunately, the Linux mmap() interface limits us to 2^32 pages (2^44 - * bytes, see mmap64 man page for reference). So we put the default limit to the - * maximum possible on Linux and shrink it down, if required by the GPU, during - * initialization. - */ -#define KBASE_REG_ZONE_CUSTOM_VA_SIZE \ - (((1ULL << 44) >> PAGE_SHIFT) - KBASE_REG_ZONE_CUSTOM_VA_BASE) -/* end 32-bit clients only */ -#endif - -/* The starting address and size of the GPU-executable zone are dynamic - * and depend on the platform and the number of pages requested by the - * user process, with an upper limit of 4 GB. 
- */ -#define KBASE_REG_ZONE_EXEC_VA KBASE_REG_ZONE(2) -#define KBASE_REG_ZONE_EXEC_VA_MAX_PAGES ((1ULL << 32) >> PAGE_SHIFT) /* 4 GB */ - -#if MALI_USE_CSF -#define KBASE_REG_ZONE_MCU_SHARED KBASE_REG_ZONE(3) -#define KBASE_REG_ZONE_MCU_SHARED_BASE (0x04000000ULL >> PAGE_SHIFT) -#define KBASE_REG_ZONE_MCU_SHARED_SIZE (((0x08000000ULL) >> PAGE_SHIFT) - \ - KBASE_REG_ZONE_MCU_SHARED_BASE) - -/* For CSF GPUs, the EXEC_VA zone is always 4GB in size, and starts at 2^47 for 64-bit - * clients, and 2^43 for 32-bit clients. - */ -#define KBASE_REG_ZONE_EXEC_VA_BASE_64 ((1ULL << 47) >> PAGE_SHIFT) -#define KBASE_REG_ZONE_EXEC_VA_BASE_32 ((1ULL << 43) >> PAGE_SHIFT) -#define KBASE_REG_ZONE_EXEC_VA_SIZE KBASE_REG_ZONE_EXEC_VA_MAX_PAGES - -/* Executable zone supporting FIXED/FIXABLE allocations. - * It is always 4GB in size. - */ - -#define KBASE_REG_ZONE_EXEC_FIXED_VA KBASE_REG_ZONE(4) -#define KBASE_REG_ZONE_EXEC_FIXED_VA_SIZE KBASE_REG_ZONE_EXEC_VA_MAX_PAGES - -/* Non-executable zone supporting FIXED/FIXABLE allocations. - * It extends from (2^47) up to (2^48)-1, for 64-bit userspace clients, and from - * (2^43) up to (2^44)-1 for 32-bit userspace clients. - */ -#define KBASE_REG_ZONE_FIXED_VA KBASE_REG_ZONE(5) - -/* Again - 32-bit userspace cannot map addresses beyond 2^44, but 64-bit can - and so - * the end of the FIXED_VA zone for 64-bit clients is (2^48)-1. - */ -#define KBASE_REG_ZONE_FIXED_VA_END_64 ((1ULL << 48) >> PAGE_SHIFT) -#define KBASE_REG_ZONE_FIXED_VA_END_32 ((1ULL << 44) >> PAGE_SHIFT) - -#endif - unsigned long flags; size_t extension; struct kbase_mem_phy_alloc *cpu_alloc; @@ -559,24 +690,24 @@ struct kbase_va_region { size_t used_pages; #endif /* MALI_JIT_PRESSURE_LIMIT_BASE */ - int va_refcnt; + kbase_refcount_t va_refcnt; + atomic_t no_user_free_count; }; /** - * kbase_is_ctx_reg_zone - determine whether a KBASE_REG_ZONE_<...> is for a - * context or for a device - * @zone_bits: A KBASE_REG_ZONE_<...> to query + * kbase_is_ctx_reg_zone - Determine whether a zone is associated with a + * context or with the device + * @zone: Zone identifier * - * Return: True if the zone for @zone_bits is a context zone, False otherwise + * Return: True if @zone is a context zone, False otherwise */ -static inline bool kbase_is_ctx_reg_zone(unsigned long zone_bits) +static inline bool kbase_is_ctx_reg_zone(enum kbase_memory_zone zone) { - WARN_ON((zone_bits & KBASE_REG_ZONE_MASK) != zone_bits); - return (zone_bits == KBASE_REG_ZONE_SAME_VA || #if MALI_USE_CSF - zone_bits == KBASE_REG_ZONE_EXEC_FIXED_VA || zone_bits == KBASE_REG_ZONE_FIXED_VA || + return !(zone == MCU_SHARED_ZONE); +#else + return true; #endif - zone_bits == KBASE_REG_ZONE_CUSTOM_VA || zone_bits == KBASE_REG_ZONE_EXEC_VA); } /* Special marker for failed JIT allocations that still must be marked as @@ -602,6 +733,23 @@ static inline bool kbase_is_region_invalid_or_free(struct kbase_va_region *reg) return (kbase_is_region_invalid(reg) || kbase_is_region_free(reg)); } +/** + * kbase_is_region_shrinkable - Check if a region is "shrinkable". + * A shrinkable regions is a region for which its backing pages (reg->gpu_alloc->pages) + * can be freed at any point, even though the kbase_va_region structure itself + * may have been refcounted. + * Regions that aren't on a shrinker, but could be shrunk at any point in future + * without warning are still considered "shrinkable" (e.g. Active JIT allocs) + * + * @reg: Pointer to region + * + * Return: true if the region is "shrinkable", false if not. 
+ */ +static inline bool kbase_is_region_shrinkable(struct kbase_va_region *reg) +{ + return (reg->flags & KBASE_REG_DONT_NEED) || (reg->flags & KBASE_REG_ACTIVE_JIT_ALLOC); +} + void kbase_remove_va_region(struct kbase_device *kbdev, struct kbase_va_region *reg); static inline void kbase_region_refcnt_free(struct kbase_device *kbdev, @@ -619,14 +767,12 @@ static inline void kbase_region_refcnt_free(struct kbase_device *kbdev, static inline struct kbase_va_region *kbase_va_region_alloc_get( struct kbase_context *kctx, struct kbase_va_region *region) { - lockdep_assert_held(&kctx->reg_lock); + WARN_ON(!kbase_refcount_read(®ion->va_refcnt)); + WARN_ON(kbase_refcount_read(®ion->va_refcnt) == INT_MAX); - WARN_ON(!region->va_refcnt); - - /* non-atomic as kctx->reg_lock is held */ dev_dbg(kctx->kbdev->dev, "va_refcnt %d before get %pK\n", - region->va_refcnt, (void *)region); - region->va_refcnt++; + kbase_refcount_read(®ion->va_refcnt), (void *)region); + kbase_refcount_inc(®ion->va_refcnt); return region; } @@ -634,21 +780,67 @@ static inline struct kbase_va_region *kbase_va_region_alloc_get( static inline struct kbase_va_region *kbase_va_region_alloc_put( struct kbase_context *kctx, struct kbase_va_region *region) { - lockdep_assert_held(&kctx->reg_lock); - - WARN_ON(region->va_refcnt <= 0); + WARN_ON(kbase_refcount_read(®ion->va_refcnt) <= 0); WARN_ON(region->flags & KBASE_REG_FREE); - /* non-atomic as kctx->reg_lock is held */ - region->va_refcnt--; - dev_dbg(kctx->kbdev->dev, "va_refcnt %d after put %pK\n", - region->va_refcnt, (void *)region); - if (!region->va_refcnt) + if (kbase_refcount_dec_and_test(®ion->va_refcnt)) kbase_region_refcnt_free(kctx->kbdev, region); + else + dev_dbg(kctx->kbdev->dev, "va_refcnt %d after put %pK\n", + kbase_refcount_read(®ion->va_refcnt), (void *)region); return NULL; } +/** + * kbase_va_region_is_no_user_free - Check if user free is forbidden for the region. + * A region that must not be freed by userspace indicates that it is owned by some other + * kbase subsystem, for example tiler heaps, JIT memory or CSF queues. + * Such regions must not be shrunk (i.e. have their backing pages freed), except by the + * current owner. + * Hence, callers cannot rely on this check alone to determine if a region might be shrunk + * by any part of kbase. Instead they should use kbase_is_region_shrinkable(). + * + * @region: Pointer to region. + * + * Return: true if userspace cannot free the region, false if userspace can free the region. + */ +static inline bool kbase_va_region_is_no_user_free(struct kbase_va_region *region) +{ + return atomic_read(®ion->no_user_free_count) > 0; +} + +/** + * kbase_va_region_no_user_free_inc - Increment "no user free" count for a region. + * Calling this function will prevent the region to be shrunk by parts of kbase that + * don't own the region (as long as the count stays above zero). Refer to + * kbase_va_region_is_no_user_free() for more information. + * + * @region: Pointer to region (not shrinkable). + * + * Return: the pointer to the region passed as argument. + */ +static inline void kbase_va_region_no_user_free_inc(struct kbase_va_region *region) +{ + WARN_ON(kbase_is_region_shrinkable(region)); + WARN_ON(atomic_read(®ion->no_user_free_count) == INT_MAX); + + /* non-atomic as kctx->reg_lock is held */ + atomic_inc(®ion->no_user_free_count); +} + +/** + * kbase_va_region_no_user_free_dec - Decrement "no user free" count for a region. + * + * @region: Pointer to region (not shrinkable). 
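/*
 * Illustrative sketch (not part of the patch): a kbase subsystem that needs a
 * region to survive a userspace free now pairs a va_refcnt get with a
 * no-user-free count increment, and releases both when it is done.
 * example_take_region/example_drop_region are hypothetical helpers; the kbase
 * calls are as declared above.
 */
static void example_take_region(struct kbase_context *kctx,
                                struct kbase_va_region *reg)
{
        kbase_va_region_alloc_get(kctx, reg);   /* keeps the region structure alive */
        kbase_va_region_no_user_free_inc(reg);  /* WARNs if the region is shrinkable */
}

static void example_drop_region(struct kbase_context *kctx,
                                struct kbase_va_region *reg)
{
        kbase_va_region_no_user_free_dec(reg);
        kbase_va_region_alloc_put(kctx, reg);   /* may free the region on the last put */
}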
+ */ +static inline void kbase_va_region_no_user_free_dec(struct kbase_va_region *region) +{ + WARN_ON(!kbase_va_region_is_no_user_free(region)); + + atomic_dec(®ion->no_user_free_count); +} + /* Common functions */ static inline struct tagged_addr *kbase_get_cpu_phy_pages( struct kbase_va_region *reg) @@ -862,12 +1054,9 @@ static inline size_t kbase_mem_pool_config_get_max_size( * * Return: 0 on success, negative -errno on error */ -int kbase_mem_pool_init(struct kbase_mem_pool *pool, - const struct kbase_mem_pool_config *config, - unsigned int order, - int group_id, - struct kbase_device *kbdev, - struct kbase_mem_pool *next_pool); +int kbase_mem_pool_init(struct kbase_mem_pool *pool, const struct kbase_mem_pool_config *config, + unsigned int order, int group_id, struct kbase_device *kbdev, + struct kbase_mem_pool *next_pool); /** * kbase_mem_pool_term - Destroy a memory pool @@ -947,6 +1136,9 @@ void kbase_mem_pool_free_locked(struct kbase_mem_pool *pool, struct page *p, * @pages: Pointer to array where the physical address of the allocated * pages will be stored. * @partial_allowed: If fewer pages allocated is allowed + * @page_owner: Pointer to the task that created the Kbase context for which + * the pages are being allocated. It can be NULL if the pages + * won't be associated with any Kbase context. * * Like kbase_mem_pool_alloc() but optimized for allocating many pages. * @@ -963,7 +1155,8 @@ void kbase_mem_pool_free_locked(struct kbase_mem_pool *pool, struct page *p, * this lock, it should use kbase_mem_pool_alloc_pages_locked() instead. */ int kbase_mem_pool_alloc_pages(struct kbase_mem_pool *pool, size_t nr_4k_pages, - struct tagged_addr *pages, bool partial_allowed); + struct tagged_addr *pages, bool partial_allowed, + struct task_struct *page_owner); /** * kbase_mem_pool_alloc_pages_locked - Allocate pages from memory pool @@ -1075,13 +1268,17 @@ void kbase_mem_pool_set_max_size(struct kbase_mem_pool *pool, size_t max_size); * kbase_mem_pool_grow - Grow the pool * @pool: Memory pool to grow * @nr_to_grow: Number of pages to add to the pool + * @page_owner: Pointer to the task that created the Kbase context for which + * the memory pool is being grown. It can be NULL if the pages + * to be allocated won't be associated with any Kbase context. * * Adds @nr_to_grow pages to the pool. Note that this may cause the pool to * become larger than the maximum size specified. * * Return: 0 on success, -ENOMEM if unable to allocate sufficent pages */ -int kbase_mem_pool_grow(struct kbase_mem_pool *pool, size_t nr_to_grow); +int kbase_mem_pool_grow(struct kbase_mem_pool *pool, size_t nr_to_grow, + struct task_struct *page_owner); /** * kbase_mem_pool_trim - Grow or shrink the pool to a new size @@ -1115,6 +1312,16 @@ void kbase_mem_pool_mark_dying(struct kbase_mem_pool *pool); struct page *kbase_mem_alloc_page(struct kbase_mem_pool *pool); /** + * kbase_mem_pool_free_page - Free a page from a memory pool. + * @pool: Memory pool to free a page from + * @p: Page to free + * + * This will free any associated data stored for the page and release + * the page back to the kernel. 
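/*
 * Illustrative sketch (not part of the patch): kbase_mem_pool_alloc_pages()
 * and kbase_mem_pool_grow() now thread through the task that created the
 * owning context, so pool work done from kernel threads can still reference
 * the creating process; NULL is passed when the pages are not tied to any
 * context. example_alloc_for_ctx is a hypothetical helper built from the
 * declarations above.
 */
static int example_alloc_for_ctx(struct kbase_mem_pool *pool, size_t nr_4k_pages,
                                 struct tagged_addr *pages,
                                 struct task_struct *ctx_owner)
{
        /* ctx_owner may be NULL; partial allocation is not permitted here */
        return kbase_mem_pool_alloc_pages(pool, nr_4k_pages, pages, false, ctx_owner);
}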
+ */ +void kbase_mem_pool_free_page(struct kbase_mem_pool *pool, struct page *p); + +/** * kbase_region_tracker_init - Initialize the region tracker data structure * @kctx: kbase context * @@ -1159,18 +1366,19 @@ int kbase_region_tracker_init_exec(struct kbase_context *kctx, u64 exec_va_pages void kbase_region_tracker_term(struct kbase_context *kctx); /** - * kbase_region_tracker_term_rbtree - Free memory for a region tracker + * kbase_region_tracker_erase_rbtree - Free memory for a region tracker * * @rbtree: Region tracker tree root * * This will free all the regions within the region tracker */ -void kbase_region_tracker_term_rbtree(struct rb_root *rbtree); +void kbase_region_tracker_erase_rbtree(struct rb_root *rbtree); struct kbase_va_region *kbase_region_tracker_find_region_enclosing_address( struct kbase_context *kctx, u64 gpu_addr); struct kbase_va_region *kbase_find_region_enclosing_address( struct rb_root *rbtree, u64 gpu_addr); +void kbase_region_tracker_insert(struct kbase_va_region *new_reg); /** * kbase_region_tracker_find_region_base_address - Check that a pointer is @@ -1187,8 +1395,11 @@ struct kbase_va_region *kbase_region_tracker_find_region_base_address( struct kbase_va_region *kbase_find_region_base_address(struct rb_root *rbtree, u64 gpu_addr); -struct kbase_va_region *kbase_alloc_free_region(struct rb_root *rbtree, - u64 start_pfn, size_t nr_pages, int zone); +struct kbase_va_region *kbase_alloc_free_region(struct kbase_reg_zone *zone, u64 start_pfn, + size_t nr_pages); +struct kbase_va_region *kbase_ctx_alloc_free_region(struct kbase_context *kctx, + enum kbase_memory_zone id, u64 start_pfn, + size_t nr_pages); void kbase_free_alloced_region(struct kbase_va_region *reg); int kbase_add_va_region(struct kbase_context *kctx, struct kbase_va_region *reg, u64 addr, size_t nr_pages, size_t align); @@ -1199,6 +1410,32 @@ int kbase_add_va_region_rbtree(struct kbase_device *kbdev, bool kbase_check_alloc_flags(unsigned long flags); bool kbase_check_import_flags(unsigned long flags); +static inline bool kbase_import_size_is_valid(struct kbase_device *kbdev, u64 va_pages) +{ + if (va_pages > KBASE_MEM_ALLOC_MAX_SIZE) { + dev_dbg( + kbdev->dev, + "Import attempted with va_pages==%lld larger than KBASE_MEM_ALLOC_MAX_SIZE!", + (unsigned long long)va_pages); + return false; + } + + return true; +} + +static inline bool kbase_alias_size_is_valid(struct kbase_device *kbdev, u64 va_pages) +{ + if (va_pages > KBASE_MEM_ALLOC_MAX_SIZE) { + dev_dbg( + kbdev->dev, + "Alias attempted with va_pages==%lld larger than KBASE_MEM_ALLOC_MAX_SIZE!", + (unsigned long long)va_pages); + return false; + } + + return true; +} + /** * kbase_check_alloc_sizes - check user space sizes parameters for an * allocation @@ -1233,9 +1470,75 @@ int kbase_check_alloc_sizes(struct kbase_context *kctx, unsigned long flags, int kbase_update_region_flags(struct kbase_context *kctx, struct kbase_va_region *reg, unsigned long flags); +/** + * kbase_gpu_vm_lock() - Acquire the per-context region list lock + * @kctx: KBase context + * + * Care must be taken when making an allocation whilst holding this lock, because of interaction + * with the Kernel's OoM-killer and use of this lock in &vm_operations_struct close() handlers. + * + * If this lock is taken during a syscall, and/or the allocation is 'small' then it is safe to use. + * + * If the caller is not in a syscall, and the allocation is 'large', then it must not hold this + * lock. 
+ * + * This is because the kernel OoM killer might target the process corresponding to that same kbase + * context, and attempt to call the context's close() handlers for its open VMAs. This is safe if + * the allocating caller is in a syscall, because the VMA close() handlers are delayed until all + * syscalls have finished (noting that no new syscalls can start as the remaining user threads will + * have been killed too), and so there is no possibility of contention between the thread + * allocating with this lock held, and the VMA close() handler. + * + * However, outside of a syscall (e.g. a kworker or other kthread), one of kbase's VMA close() + * handlers (kbase_cpu_vm_close()) also takes this lock, and so prevents the process from being + * killed until the caller of the function allocating memory has released this lock. On subsequent + * retries for allocating a page, the OoM killer would be re-invoked but skips over the process + * stuck in its close() handler. + * + * Also because the caller is not in a syscall, the page allocation code in the kernel is not aware + * that the allocation is being done on behalf of another process, and so does not realize that + * process has received a kill signal due to an OoM, and so will continually retry with the OoM + * killer until enough memory has been released, or until all other killable processes have been + * killed (at which point the kernel halts with a panic). + * + * However, if the allocation outside of a syscall is small enough to be satisfied by killing + * another process, then the allocation completes, the caller releases this lock, and + * kbase_cpu_vm_close() can unblock and allow the process to be killed. + * + * Hence, this is effectively a deadlock with kbase_cpu_vm_close(), except that if the memory + * allocation is small enough the deadlock can be resolved. For that reason, such a memory deadlock + * is NOT discovered with CONFIG_PROVE_LOCKING. + * + * If this may be called outside of a syscall, consider moving allocations outside of this lock, or + * use __GFP_NORETRY for such allocations (which will allow direct-reclaim attempts, but will + * prevent OoM kills to satisfy the allocation, and will just fail the allocation instead). + */ void kbase_gpu_vm_lock(struct kbase_context *kctx); + +/** + * kbase_gpu_vm_lock_with_pmode_sync() - Wrapper of kbase_gpu_vm_lock. + * @kctx: KBase context + * + * Same as kbase_gpu_vm_lock for JM GPU. + * Additionally acquire P.mode read-write semaphore for CSF GPU. + */ +void kbase_gpu_vm_lock_with_pmode_sync(struct kbase_context *kctx); + +/** + * kbase_gpu_vm_unlock() - Release the per-context region list lock + * @kctx: KBase context + */ void kbase_gpu_vm_unlock(struct kbase_context *kctx); +/** + * kbase_gpu_vm_unlock_with_pmode_sync() - Wrapper of kbase_gpu_vm_unlock. + * @kctx: KBase context + * + * Same as kbase_gpu_vm_unlock for JM GPU. + * Additionally release P.mode read-write semaphore for CSF GPU. + */ +void kbase_gpu_vm_unlock_with_pmode_sync(struct kbase_context *kctx); + int kbase_alloc_phy_pages(struct kbase_va_region *reg, size_t vsize, size_t size); /** @@ -1311,6 +1614,7 @@ void kbase_mmu_disable_as(struct kbase_device *kbdev, int as_nr); void kbase_mmu_interrupt(struct kbase_device *kbdev, u32 irq_stat); +#if defined(CONFIG_MALI_VECTOR_DUMP) /** * kbase_mmu_dump() - Dump the MMU tables to a buffer. 
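/*
 * Illustrative sketch (not part of the patch) of the guidance in the
 * kbase_gpu_vm_lock() documentation above: a kthread/kworker that must
 * allocate while holding the region lock uses __GFP_NORETRY, so a failed
 * allocation is returned to the caller instead of deadlocking against
 * kbase_cpu_vm_close() via the OoM killer. example_worker_alloc is a
 * hypothetical helper.
 */
static void *example_worker_alloc(struct kbase_context *kctx, size_t bytes)
{
        void *buf;

        kbase_gpu_vm_lock(kctx);
        /* Direct reclaim is still attempted, but the OoM killer will not be
         * invoked on this allocation's behalf; NULL is returned instead.
         */
        buf = kzalloc(bytes, GFP_KERNEL | __GFP_NORETRY);
        kbase_gpu_vm_unlock(kctx);

        return buf;
}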
* @@ -1330,6 +1634,7 @@ void kbase_mmu_interrupt(struct kbase_device *kbdev, u32 irq_stat); * (including if the @c nr_pages is too small) */ void *kbase_mmu_dump(struct kbase_context *kctx, int nr_pages); +#endif /** * kbase_sync_now - Perform cache maintenance on a memory region @@ -1449,15 +1754,21 @@ int kbasep_find_enclosing_gpu_mapping_start_and_offset( * @alloc: allocation object to add pages to * @nr_pages_requested: number of physical pages to allocate * - * Allocates \a nr_pages_requested and updates the alloc object. + * Allocates @nr_pages_requested and updates the alloc object. * - * Return: 0 if all pages have been successfully allocated. Error code otherwise + * Note: if kbase_gpu_vm_lock() is to be held around this function to ensure thread-safe updating + * of @alloc, then refer to the documentation of kbase_gpu_vm_lock() about the requirements of + * either calling during a syscall, or ensuring the allocation is small. These requirements prevent + * an effective deadlock between the kernel's OoM killer and kbase's VMA close() handlers, which + * could take kbase_gpu_vm_lock() too. * - * Note : The caller must not hold vm_lock, as this could cause a deadlock if - * the kernel OoM killer runs. If the caller must allocate pages while holding - * this lock, it should use kbase_mem_pool_alloc_pages_locked() instead. + * If the requirements of kbase_gpu_vm_lock() cannot be satisfied when calling this function, but + * @alloc must still be updated in a thread-safe way, then instead use + * kbase_alloc_phy_pages_helper_locked() and restructure callers into the sequence outlined there. * * This function cannot be used from interrupt context + * + * Return: 0 if all pages have been successfully allocated. Error code otherwise */ int kbase_alloc_phy_pages_helper(struct kbase_mem_phy_alloc *alloc, size_t nr_pages_requested); @@ -1467,17 +1778,19 @@ int kbase_alloc_phy_pages_helper(struct kbase_mem_phy_alloc *alloc, * @alloc: allocation object to add pages to * @pool: Memory pool to allocate from * @nr_pages_requested: number of physical pages to allocate - * @prealloc_sa: Information about the partial allocation if the amount - * of memory requested is not a multiple of 2MB. One - * instance of struct kbase_sub_alloc must be allocated by - * the caller iff CONFIG_MALI_2MB_ALLOC is enabled. * - * Allocates \a nr_pages_requested and updates the alloc object. This function - * does not allocate new pages from the kernel, and therefore will never trigger - * the OoM killer. Therefore, it can be run while the vm_lock is held. + * @prealloc_sa: Information about the partial allocation if the amount of memory requested + * is not a multiple of 2MB. One instance of struct kbase_sub_alloc must be + * allocated by the caller if kbdev->pagesize_2mb is enabled. * - * As new pages can not be allocated, the caller must ensure there are - * sufficient pages in the pool. Usage of this function should look like : + * Allocates @nr_pages_requested and updates the alloc object. This function does not allocate new + * pages from the kernel, and therefore will never trigger the OoM killer. Therefore, it can be + * called whilst a thread operating outside of a syscall has held the region list lock + * (kbase_gpu_vm_lock()), as it will not cause an effective deadlock with VMA close() handlers used + * by the OoM killer. + * + * As new pages can not be allocated, the caller must ensure there are sufficient pages in the + * pool. 
Usage of this function should look like : * * kbase_gpu_vm_lock(kctx); * kbase_mem_pool_lock(pool) @@ -1490,24 +1803,24 @@ int kbase_alloc_phy_pages_helper(struct kbase_mem_phy_alloc *alloc, * } * kbase_alloc_phy_pages_helper_locked(pool) * kbase_mem_pool_unlock(pool) - * Perform other processing that requires vm_lock... + * // Perform other processing that requires vm_lock... * kbase_gpu_vm_unlock(kctx); * - * This ensures that the pool can be grown to the required size and that the - * allocation can complete without another thread using the newly grown pages. + * This ensures that the pool can be grown to the required size and that the allocation can + * complete without another thread using the newly grown pages. * - * If CONFIG_MALI_2MB_ALLOC is defined and the allocation is >= 2MB, then - * @pool must be alloc->imported.native.kctx->lp_mem_pool. Otherwise it must be - * alloc->imported.native.kctx->mem_pool. - * @prealloc_sa is used to manage the non-2MB sub-allocation. It has to be - * pre-allocated because we must not sleep (due to the usage of kmalloc()) - * whilst holding pool->pool_lock. - * @prealloc_sa shall be set to NULL if it has been consumed by this function - * to indicate that the caller must not free it. + * If kbdev->pagesize_2mb is enabled and the allocation is >= 2MB, then @pool must be one of the + * pools from alloc->imported.native.kctx->mem_pools.large[]. Otherwise it must be one of the + * mempools from alloc->imported.native.kctx->mem_pools.small[]. * - * Return: Pointer to array of allocated pages. NULL on failure. + * @prealloc_sa is used to manage the non-2MB sub-allocation. It has to be pre-allocated because we + * must not sleep (due to the usage of kmalloc()) whilst holding pool->pool_lock. @prealloc_sa + * shall be set to NULL if it has been consumed by this function to indicate that the caller no + * longer owns it and should not access it further. + * + * Note: Caller must hold @pool->pool_lock * - * Note : Caller must hold pool->pool_lock + * Return: Pointer to array of allocated pages. NULL on failure. 
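/*
 * Illustrative sketch (not part of the patch) of the sequence described in the
 * comment above: the pool is grown with both locks dropped (growth may
 * allocate from the kernel), and the pages are then taken under the locks with
 * kbase_alloc_phy_pages_helper_locked(), which never allocates from the
 * kernel. example_alloc_backing is a hypothetical helper;
 * kbase_mem_pool_size() is assumed to be the usual pool-size accessor from
 * this header, and prealloc_sa follows the rules documented above.
 */
static struct tagged_addr *example_alloc_backing(struct kbase_context *kctx,
                                                 struct kbase_mem_phy_alloc *alloc,
                                                 struct kbase_mem_pool *pool,
                                                 size_t nr_pages,
                                                 struct kbase_sub_alloc **prealloc_sa)
{
        struct tagged_addr *pages;

        kbase_gpu_vm_lock(kctx);
        kbase_mem_pool_lock(pool);

        while (kbase_mem_pool_size(pool) < nr_pages) {
                kbase_mem_pool_unlock(pool);
                kbase_gpu_vm_unlock(kctx);

                /* Grow without the locks held; NULL: no specific page owner */
                if (kbase_mem_pool_grow(pool, nr_pages, NULL))
                        return NULL;

                kbase_gpu_vm_lock(kctx);
                kbase_mem_pool_lock(pool);
        }

        pages = kbase_alloc_phy_pages_helper_locked(alloc, pool, nr_pages, prealloc_sa);

        kbase_mem_pool_unlock(pool);
        /* ... other processing that requires the vm lock ... */
        kbase_gpu_vm_unlock(kctx);

        return pages;
}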
*/ struct tagged_addr *kbase_alloc_phy_pages_helper_locked( struct kbase_mem_phy_alloc *alloc, struct kbase_mem_pool *pool, @@ -1546,7 +1859,7 @@ void kbase_free_phy_pages_helper_locked(struct kbase_mem_phy_alloc *alloc, struct kbase_mem_pool *pool, struct tagged_addr *pages, size_t nr_pages_to_free); -static inline void kbase_set_dma_addr(struct page *p, dma_addr_t dma_addr) +static inline void kbase_set_dma_addr_as_priv(struct page *p, dma_addr_t dma_addr) { SetPagePrivate(p); if (sizeof(dma_addr_t) > sizeof(p->private)) { @@ -1562,7 +1875,7 @@ static inline void kbase_set_dma_addr(struct page *p, dma_addr_t dma_addr) } } -static inline dma_addr_t kbase_dma_addr(struct page *p) +static inline dma_addr_t kbase_dma_addr_as_priv(struct page *p) { if (sizeof(dma_addr_t) > sizeof(p->private)) return ((dma_addr_t)page_private(p)) << PAGE_SHIFT; @@ -1570,11 +1883,35 @@ static inline dma_addr_t kbase_dma_addr(struct page *p) return (dma_addr_t)page_private(p); } -static inline void kbase_clear_dma_addr(struct page *p) +static inline void kbase_clear_dma_addr_as_priv(struct page *p) { ClearPagePrivate(p); } +static inline struct kbase_page_metadata *kbase_page_private(struct page *p) +{ + return (struct kbase_page_metadata *)page_private(p); +} + +static inline dma_addr_t kbase_dma_addr(struct page *p) +{ + if (kbase_is_page_migration_enabled()) + return kbase_page_private(p)->dma_addr; + + return kbase_dma_addr_as_priv(p); +} + +static inline dma_addr_t kbase_dma_addr_from_tagged(struct tagged_addr tagged_pa) +{ + phys_addr_t pa = as_phys_addr_t(tagged_pa); + struct page *page = pfn_to_page(PFN_DOWN(pa)); + dma_addr_t dma_addr = (is_huge(tagged_pa) || is_partial(tagged_pa)) ? + kbase_dma_addr_as_priv(page) : + kbase_dma_addr(page); + + return dma_addr; +} + /** * kbase_flush_mmu_wqs() - Flush MMU workqueues. * @kbdev: Device pointer. @@ -1733,8 +2070,8 @@ void kbase_jit_report_update_pressure(struct kbase_context *kctx, unsigned int flags); /** - * jit_trim_necessary_pages() - calculate and trim the least pages possible to - * satisfy a new JIT allocation + * kbase_jit_trim_necessary_pages() - calculate and trim the least pages + * possible to satisfy a new JIT allocation * * @kctx: Pointer to the kbase context * @needed_pages: Number of JIT physical pages by which trimming is requested. @@ -1868,28 +2205,36 @@ bool kbase_has_exec_va_zone(struct kbase_context *kctx); /** * kbase_map_external_resource - Map an external resource to the GPU. * @kctx: kbase context. - * @reg: The region to map. + * @reg: External resource to map. * @locked_mm: The mm_struct which has been locked for this operation. * - * Return: The physical allocation which backs the region on success or NULL - * on failure. + * On successful mapping, the VA region and the gpu_alloc refcounts will be + * increased, making it safe to use and store both values directly. + * + * Return: Zero on success, or negative error code. */ -struct kbase_mem_phy_alloc *kbase_map_external_resource( - struct kbase_context *kctx, struct kbase_va_region *reg, - struct mm_struct *locked_mm); +int kbase_map_external_resource(struct kbase_context *kctx, struct kbase_va_region *reg, + struct mm_struct *locked_mm); /** * kbase_unmap_external_resource - Unmap an external resource from the GPU. * @kctx: kbase context. - * @reg: The region to unmap or NULL if it has already been released. - * @alloc: The physical allocation being unmapped. 
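/*
 * Standalone illustration (user-space C, not part of the patch) of the trick
 * used by kbase_set_dma_addr_as_priv()/kbase_dma_addr_as_priv() above: when a
 * 64-bit DMA address must fit in a 32-bit page->private, the page-aligned
 * address is stored shifted right by PAGE_SHIFT and shifted back on read.
 * EX_PAGE_SHIFT == 12 is an assumption standing in for PAGE_SHIFT.
 */
#include <stdint.h>
#include <stdio.h>

#define EX_PAGE_SHIFT 12

int main(void)
{
        uint64_t dma_addr = 0x1c0000000ULL;             /* page-aligned, above 4 GB */
        uint32_t stored = (uint32_t)(dma_addr >> EX_PAGE_SHIFT);
        uint64_t recovered = (uint64_t)stored << EX_PAGE_SHIFT;

        printf("stored 0x%x -> recovered 0x%llx (matches: %d)\n",
               stored, (unsigned long long)recovered, recovered == dma_addr);
        return 0;
}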
+ * @reg: VA region corresponding to external resource + * + * On successful unmapping, the VA region and the gpu_alloc refcounts will + * be decreased. If the refcount reaches zero, both @reg and the corresponding + * allocation may be freed, so using them after returning from this function + * requires the caller to explicitly check their state. */ -void kbase_unmap_external_resource(struct kbase_context *kctx, - struct kbase_va_region *reg, struct kbase_mem_phy_alloc *alloc); +void kbase_unmap_external_resource(struct kbase_context *kctx, struct kbase_va_region *reg); /** * kbase_unpin_user_buf_page - Unpin a page of a user buffer. * @page: page to unpin + * + * The caller must have ensured that there are no CPU mappings for @page (as + * might be created from the struct kbase_mem_phy_alloc that tracks @page), and + * that userspace will not be able to recreate the CPU mappings again. */ void kbase_unpin_user_buf_page(struct page *page); @@ -1973,7 +2318,7 @@ static inline void kbase_mem_pool_lock(struct kbase_mem_pool *pool) } /** - * kbase_mem_pool_lock - Release a memory pool + * kbase_mem_pool_unlock - Release a memory pool * @pool: Memory pool to lock */ static inline void kbase_mem_pool_unlock(struct kbase_mem_pool *pool) @@ -2119,83 +2464,102 @@ int kbase_mem_copy_to_pinned_user_pages(struct page **dest_pages, unsigned int *target_page_nr, size_t offset); /** - * kbase_reg_zone_end_pfn - return the end Page Frame Number of @zone - * @zone: zone to query + * kbase_ctx_reg_zone_get_nolock - Get a zone from @kctx where the caller does + * not have @kctx 's region lock + * @kctx: Pointer to kbase context + * @zone: Zone identifier * - * Return: The end of the zone corresponding to @zone + * This should only be used in performance-critical paths where the code is + * resilient to a race with the zone changing, and only when the zone is tracked + * by the @kctx. + * + * Return: The zone corresponding to @zone */ -static inline u64 kbase_reg_zone_end_pfn(struct kbase_reg_zone *zone) +static inline struct kbase_reg_zone *kbase_ctx_reg_zone_get_nolock(struct kbase_context *kctx, + enum kbase_memory_zone zone) { - return zone->base_pfn + zone->va_size_pages; + WARN_ON(!kbase_is_ctx_reg_zone(zone)); + return &kctx->reg_zone[zone]; } /** - * kbase_ctx_reg_zone_init - initialize a zone in @kctx + * kbase_ctx_reg_zone_get - Get a memory zone from @kctx * @kctx: Pointer to kbase context - * @zone_bits: A KBASE_REG_ZONE_<...> to initialize + * @zone: Zone identifier + * + * Note that the zone is not refcounted, so there is no corresponding operation to + * put the zone back. 
+ * + * Return: The zone corresponding to @zone + */ +static inline struct kbase_reg_zone *kbase_ctx_reg_zone_get(struct kbase_context *kctx, + enum kbase_memory_zone zone) +{ + lockdep_assert_held(&kctx->reg_lock); + return kbase_ctx_reg_zone_get_nolock(kctx, zone); +} + +/** + * kbase_reg_zone_init - Initialize a zone in @kctx + * @kbdev: Pointer to kbase device in order to initialize the VA region cache + * @zone: Memory zone + * @id: Memory zone identifier to facilitate lookups * @base_pfn: Page Frame Number in GPU virtual address space for the start of * the Zone * @va_size_pages: Size of the Zone in pages + * + * Return: + * * 0 on success + * * -ENOMEM on error */ -static inline void kbase_ctx_reg_zone_init(struct kbase_context *kctx, - unsigned long zone_bits, - u64 base_pfn, u64 va_size_pages) +static inline int kbase_reg_zone_init(struct kbase_device *kbdev, struct kbase_reg_zone *zone, + enum kbase_memory_zone id, u64 base_pfn, u64 va_size_pages) { - struct kbase_reg_zone *zone; + struct kbase_va_region *reg; - lockdep_assert_held(&kctx->reg_lock); - WARN_ON(!kbase_is_ctx_reg_zone(zone_bits)); + *zone = (struct kbase_reg_zone){ .reg_rbtree = RB_ROOT, + .base_pfn = base_pfn, + .va_size_pages = va_size_pages, + .id = id, + .cache = kbdev->va_region_slab }; + + if (unlikely(!va_size_pages)) + return 0; + + reg = kbase_alloc_free_region(zone, base_pfn, va_size_pages); + if (unlikely(!reg)) + return -ENOMEM; + + kbase_region_tracker_insert(reg); - zone = &kctx->reg_zone[KBASE_REG_ZONE_IDX(zone_bits)]; - *zone = (struct kbase_reg_zone){ - .base_pfn = base_pfn, .va_size_pages = va_size_pages, - }; + return 0; } /** - * kbase_ctx_reg_zone_get_nolock - get a zone from @kctx where the caller does - * not have @kctx 's region lock - * @kctx: Pointer to kbase context - * @zone_bits: A KBASE_REG_ZONE_<...> to retrieve - * - * This should only be used in performance-critical paths where the code is - * resilient to a race with the zone changing. 
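/*
 * Illustrative sketch (not part of the patch): kbase_reg_zone_init() now owns
 * both the zone bookkeeping and the creation of the single free region that
 * spans it, so zone set-up and tear-down reduce to an init/term pair.
 * example_setup_custom_va is a hypothetical helper; the CUSTOM_VA_ZONE
 * identifier and the base macro are taken from the hunks above.
 */
static int example_setup_custom_va(struct kbase_context *kctx, u64 va_size_pages)
{
        int err;

        err = kbase_reg_zone_init(kctx->kbdev, &kctx->reg_zone[CUSTOM_VA_ZONE],
                                  CUSTOM_VA_ZONE, KBASE_REG_ZONE_CUSTOM_VA_BASE,
                                  va_size_pages);
        if (err)
                return err;     /* -ENOMEM: the spanning free region could not be allocated */

        /* ... use the zone ... */

        kbase_reg_zone_term(&kctx->reg_zone[CUSTOM_VA_ZONE]);
        return 0;
}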
+ * kbase_reg_zone_end_pfn - return the end Page Frame Number of @zone + * @zone: zone to query * - * Return: The zone corresponding to @zone_bits + * Return: The end of the zone corresponding to @zone */ -static inline struct kbase_reg_zone * -kbase_ctx_reg_zone_get_nolock(struct kbase_context *kctx, - unsigned long zone_bits) +static inline u64 kbase_reg_zone_end_pfn(struct kbase_reg_zone *zone) { - WARN_ON(!kbase_is_ctx_reg_zone(zone_bits)); - - return &kctx->reg_zone[KBASE_REG_ZONE_IDX(zone_bits)]; + return zone->base_pfn + zone->va_size_pages; } /** - * kbase_ctx_reg_zone_get - get a zone from @kctx - * @kctx: Pointer to kbase context - * @zone_bits: A KBASE_REG_ZONE_<...> to retrieve - * - * The get is not refcounted - there is no corresponding 'put' operation - * - * Return: The zone corresponding to @zone_bits + * kbase_reg_zone_term - Terminate the memory zone tracker + * @zone: Memory zone */ -static inline struct kbase_reg_zone * -kbase_ctx_reg_zone_get(struct kbase_context *kctx, unsigned long zone_bits) +static inline void kbase_reg_zone_term(struct kbase_reg_zone *zone) { - lockdep_assert_held(&kctx->reg_lock); - WARN_ON(!kbase_is_ctx_reg_zone(zone_bits)); - - return &kctx->reg_zone[KBASE_REG_ZONE_IDX(zone_bits)]; + kbase_region_tracker_erase_rbtree(&zone->reg_rbtree); } /** * kbase_mem_allow_alloc - Check if allocation of GPU memory is allowed * @kctx: Pointer to kbase context * - * Don't allow the allocation of GPU memory until user space has set up the - * tracking page (which sets kctx->process_mm) or if the ioctl has been issued + * Don't allow the allocation of GPU memory if the ioctl has been issued * from the forked child process using the mali device file fd inherited from * the parent process. * @@ -2203,13 +2567,23 @@ kbase_ctx_reg_zone_get(struct kbase_context *kctx, unsigned long zone_bits) */ static inline bool kbase_mem_allow_alloc(struct kbase_context *kctx) { - bool allow_alloc = true; - - rcu_read_lock(); - allow_alloc = (rcu_dereference(kctx->process_mm) == current->mm); - rcu_read_unlock(); + return (kctx->process_mm == current->mm); +} - return allow_alloc; +/** + * kbase_mem_mmgrab - Wrapper function to take reference on mm_struct of current process + */ +static inline void kbase_mem_mmgrab(void) +{ + /* This merely takes a reference on the memory descriptor structure + * i.e. mm_struct of current process and not on its address space and + * so won't block the freeing of address space on process exit. + */ +#if KERNEL_VERSION(4, 11, 0) > LINUX_VERSION_CODE + atomic_inc(¤t->mm->mm_count); +#else + mmgrab(current->mm); +#endif } /** diff --git a/mali_kbase/mali_kbase_mem_linux.c b/mali_kbase/mali_kbase_mem_linux.c index 23d55b2..d154583 100644 --- a/mali_kbase/mali_kbase_mem_linux.c +++ b/mali_kbase/mali_kbase_mem_linux.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2010-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2010-2023 ARM Limited. All rights reserved. 
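/*
 * Illustrative sketch (not part of the patch): kbase_mem_mmgrab() above pins
 * only the mm_struct itself (mm_count), not the address space (mm_users), so
 * the process can still tear down its mappings on exit. The reference is
 * balanced with mmdrop(); example_pin_current_mm/example_unpin_mm are
 * hypothetical helpers.
 */
static struct mm_struct *example_pin_current_mm(void)
{
        kbase_mem_mmgrab();             /* takes a reference on current->mm */
        return current->mm;
}

static void example_unpin_mm(struct mm_struct *mm)
{
        mmdrop(mm);                     /* balances kbase_mem_mmgrab() */
}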
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -31,14 +31,13 @@ #include <linux/fs.h> #include <linux/version.h> #include <linux/dma-mapping.h> -#if (KERNEL_VERSION(4, 8, 0) > LINUX_VERSION_CODE) -#include <linux/dma-attrs.h> -#endif /* LINUX_VERSION_CODE < 4.8.0 */ #include <linux/dma-buf.h> #include <linux/shrinker.h> #include <linux/cache.h> #include <linux/memory_group_manager.h> - +#include <linux/math64.h> +#include <linux/migrate.h> +#include <linux/version.h> #include <mali_kbase.h> #include <mali_kbase_mem_linux.h> #include <tl/mali_kbase_tracepoints.h> @@ -84,23 +83,34 @@ #define IR_THRESHOLD_STEPS (256u) #if MALI_USE_CSF -static int kbase_csf_cpu_mmap_user_reg_page(struct kbase_context *kctx, - struct vm_area_struct *vma); -static int kbase_csf_cpu_mmap_user_io_pages(struct kbase_context *kctx, - struct vm_area_struct *vma); +static int kbase_csf_cpu_mmap_user_reg_page(struct kbase_context *kctx, struct vm_area_struct *vma); +static int kbase_csf_cpu_mmap_user_io_pages(struct kbase_context *kctx, struct vm_area_struct *vma); #endif -static int kbase_vmap_phy_pages(struct kbase_context *kctx, - struct kbase_va_region *reg, u64 offset_bytes, size_t size, - struct kbase_vmap_struct *map); +static int kbase_vmap_phy_pages(struct kbase_context *kctx, struct kbase_va_region *reg, + u64 offset_bytes, size_t size, struct kbase_vmap_struct *map, + kbase_vmap_flag vmap_flags); static void kbase_vunmap_phy_pages(struct kbase_context *kctx, struct kbase_vmap_struct *map); static int kbase_tracking_page_setup(struct kbase_context *kctx, struct vm_area_struct *vma); -static int kbase_mem_shrink_gpu_mapping(struct kbase_context *kctx, - struct kbase_va_region *reg, - u64 new_pages, u64 old_pages); +static bool is_process_exiting(struct vm_area_struct *vma) +{ + /* PF_EXITING flag can't be reliably used here for the detection + * of process exit, as 'mm_users' counter could still be non-zero + * when all threads of the process have exited. Later when the + * thread (which took a reference on the 'mm' of process that + * exited) drops it reference, the vm_ops->close method would be + * called for all the vmas (owned by 'mm' of process that exited) + * but the PF_EXITING flag may not be neccessarily set for the + * thread at that time. + */ + if (atomic_read(&vma->vm_mm->mm_users)) + return false; + + return true; +} /* Retrieve the associated region pointer if the GPU address corresponds to * one of the event memory pages. 
The enclosing region, if found, shouldn't @@ -182,20 +192,12 @@ static int kbase_phy_alloc_mapping_init(struct kbase_context *kctx, reg->cpu_alloc->type != KBASE_MEM_TYPE_NATIVE) return -EINVAL; - if (size > (KBASE_PERMANENTLY_MAPPED_MEM_LIMIT_PAGES - - atomic_read(&kctx->permanent_mapped_pages))) { - dev_warn(kctx->kbdev->dev, "Request for %llu more pages mem needing a permanent mapping would breach limit %lu, currently at %d pages", - (u64)size, - KBASE_PERMANENTLY_MAPPED_MEM_LIMIT_PAGES, - atomic_read(&kctx->permanent_mapped_pages)); - return -ENOMEM; - } - kern_mapping = kzalloc(sizeof(*kern_mapping), GFP_KERNEL); if (!kern_mapping) return -ENOMEM; - err = kbase_vmap_phy_pages(kctx, reg, 0u, size_bytes, kern_mapping); + err = kbase_vmap_phy_pages(kctx, reg, 0u, size_bytes, kern_mapping, + KBASE_VMAP_FLAG_PERMANENT_MAP_ACCOUNTING); if (err < 0) goto vmap_fail; @@ -203,7 +205,6 @@ static int kbase_phy_alloc_mapping_init(struct kbase_context *kctx, reg->flags &= ~KBASE_REG_GROWABLE; reg->cpu_alloc->permanent_map = kern_mapping; - atomic_add(size, &kctx->permanent_mapped_pages); return 0; vmap_fail: @@ -219,13 +220,6 @@ void kbase_phy_alloc_mapping_term(struct kbase_context *kctx, kfree(alloc->permanent_map); alloc->permanent_map = NULL; - - /* Mappings are only done on cpu_alloc, so don't need to worry about - * this being reduced a second time if a separate gpu_alloc is - * freed - */ - WARN_ON(alloc->nents > atomic_read(&kctx->permanent_mapped_pages)); - atomic_sub(alloc->nents, &kctx->permanent_mapped_pages); } void *kbase_phy_alloc_mapping_get(struct kbase_context *kctx, @@ -293,9 +287,8 @@ struct kbase_va_region *kbase_mem_alloc(struct kbase_context *kctx, u64 va_pages u64 extension, u64 *flags, u64 *gpu_va, enum kbase_caller_mmu_sync_info mmu_sync_info) { - int zone; struct kbase_va_region *reg; - struct rb_root *rbtree; + enum kbase_memory_zone zone; struct device *dev; KBASE_DEBUG_ASSERT(kctx); @@ -365,32 +358,25 @@ struct kbase_va_region *kbase_mem_alloc(struct kbase_context *kctx, u64 va_pages #endif /* find out which VA zone to use */ - if (*flags & BASE_MEM_SAME_VA) { - rbtree = &kctx->reg_rbtree_same; - zone = KBASE_REG_ZONE_SAME_VA; - } + if (*flags & BASE_MEM_SAME_VA) + zone = SAME_VA_ZONE; #if MALI_USE_CSF /* fixed va_zone always exists */ else if (*flags & (BASE_MEM_FIXED | BASE_MEM_FIXABLE)) { if (*flags & BASE_MEM_PROT_GPU_EX) { - rbtree = &kctx->reg_rbtree_exec_fixed; - zone = KBASE_REG_ZONE_EXEC_FIXED_VA; + zone = EXEC_FIXED_VA_ZONE; } else { - rbtree = &kctx->reg_rbtree_fixed; - zone = KBASE_REG_ZONE_FIXED_VA; + zone = FIXED_VA_ZONE; } } #endif else if ((*flags & BASE_MEM_PROT_GPU_EX) && kbase_has_exec_va_zone(kctx)) { - rbtree = &kctx->reg_rbtree_exec; - zone = KBASE_REG_ZONE_EXEC_VA; + zone = EXEC_VA_ZONE; } else { - rbtree = &kctx->reg_rbtree_custom; - zone = KBASE_REG_ZONE_CUSTOM_VA; + zone = CUSTOM_VA_ZONE; } - reg = kbase_alloc_free_region(rbtree, PFN_DOWN(*gpu_va), - va_pages, zone); + reg = kbase_ctx_alloc_free_region(kctx, zone, PFN_DOWN(*gpu_va), va_pages); if (!reg) { dev_err(dev, "Failed to allocate free region"); @@ -445,7 +431,7 @@ struct kbase_va_region *kbase_mem_alloc(struct kbase_context *kctx, u64 va_pages } reg->initial_commit = commit_pages; - kbase_gpu_vm_lock(kctx); + kbase_gpu_vm_lock_with_pmode_sync(kctx); if (reg->flags & KBASE_REG_PERMANENT_KERNEL_MAPPING) { /* Permanent kernel mappings must happen as soon as @@ -456,7 +442,7 @@ struct kbase_va_region *kbase_mem_alloc(struct kbase_context *kctx, u64 va_pages int err = 
kbase_phy_alloc_mapping_init(kctx, reg, va_pages, commit_pages); if (err < 0) { - kbase_gpu_vm_unlock(kctx); + kbase_gpu_vm_unlock_with_pmode_sync(kctx); goto no_kern_mapping; } } @@ -468,7 +454,7 @@ struct kbase_va_region *kbase_mem_alloc(struct kbase_context *kctx, u64 va_pages /* Bind to a cookie */ if (bitmap_empty(kctx->cookies, BITS_PER_LONG)) { dev_err(dev, "No cookies available for allocation!"); - kbase_gpu_vm_unlock(kctx); + kbase_gpu_vm_unlock_with_pmode_sync(kctx); goto no_cookie; } /* return a cookie */ @@ -483,10 +469,28 @@ struct kbase_va_region *kbase_mem_alloc(struct kbase_context *kctx, u64 va_pages *gpu_va = (u64) cookie; } else /* we control the VA */ { - if (kbase_gpu_mmap(kctx, reg, *gpu_va, va_pages, 1, + size_t align = 1; + + if (kctx->kbdev->pagesize_2mb) { + /* If there's enough (> 33 bits) of GPU VA space, align to 2MB + * boundaries. The similar condition is used for mapping from + * the SAME_VA zone inside kbase_context_get_unmapped_area(). + */ + if (kctx->kbdev->gpu_props.mmu.va_bits > 33) { + if (va_pages >= (SZ_2M / SZ_4K)) + align = (SZ_2M / SZ_4K); + } + if (*gpu_va) + align = 1; +#if !MALI_USE_CSF + if (reg->flags & KBASE_REG_TILER_ALIGN_TOP) + align = 1; +#endif /* !MALI_USE_CSF */ + } + if (kbase_gpu_mmap(kctx, reg, *gpu_va, va_pages, align, mmu_sync_info) != 0) { dev_warn(dev, "Failed to map memory on GPU"); - kbase_gpu_vm_unlock(kctx); + kbase_gpu_vm_unlock_with_pmode_sync(kctx); goto no_mmap; } /* return real GPU VA */ @@ -504,7 +508,7 @@ struct kbase_va_region *kbase_mem_alloc(struct kbase_context *kctx, u64 va_pages } #endif /* MALI_JIT_PRESSURE_LIMIT_BASE */ - kbase_gpu_vm_unlock(kctx); + kbase_gpu_vm_unlock_with_pmode_sync(kctx); #if MALI_USE_CSF if (*flags & BASE_MEM_FIXABLE) @@ -623,8 +627,8 @@ int kbase_mem_query(struct kbase_context *kctx, #if MALI_USE_CSF if (KBASE_REG_CSF_EVENT & reg->flags) *out |= BASE_MEM_CSF_EVENT; - if (((KBASE_REG_ZONE_MASK & reg->flags) == KBASE_REG_ZONE_FIXED_VA) || - ((KBASE_REG_ZONE_MASK & reg->flags) == KBASE_REG_ZONE_EXEC_FIXED_VA)) { + if ((kbase_bits_to_zone(reg->flags) == FIXED_VA_ZONE) || + (kbase_bits_to_zone(reg->flags) == EXEC_FIXED_VA_ZONE)) { if (KBASE_REG_FIXED_ADDRESS & reg->flags) *out |= BASE_MEM_FIXED; else @@ -659,24 +663,33 @@ out_unlock: * @s: Shrinker * @sc: Shrinker control * - * Return: Number of pages which can be freed. + * Return: Number of pages which can be freed or SHRINK_EMPTY if no page remains. */ static unsigned long kbase_mem_evictable_reclaim_count_objects(struct shrinker *s, struct shrink_control *sc) { - struct kbase_context *kctx; + struct kbase_context *kctx = container_of(s, struct kbase_context, reclaim); + int evict_nents = atomic_read(&kctx->evict_nents); + unsigned long nr_freeable_items; - kctx = container_of(s, struct kbase_context, reclaim); - - WARN((sc->gfp_mask & __GFP_ATOMIC), - "Shrinkers cannot be called for GFP_ATOMIC allocations. Check kernel mm for problems. gfp_mask==%x\n", - sc->gfp_mask); WARN(in_atomic(), - "Shrinker called whilst in atomic context. The caller must switch to using GFP_ATOMIC or similar. gfp_mask==%x\n", + "Shrinker called in atomic context. The caller must use GFP_ATOMIC or similar, then Shrinkers must not be called. 
gfp_mask==%x\n", sc->gfp_mask); - return atomic_read(&kctx->evict_nents); + if (unlikely(evict_nents < 0)) { + dev_err(kctx->kbdev->dev, "invalid evict_nents(%d)", evict_nents); + nr_freeable_items = 0; + } else { + nr_freeable_items = evict_nents; + } + +#if KERNEL_VERSION(4, 19, 0) <= LINUX_VERSION_CODE + if (nr_freeable_items == 0) + nr_freeable_items = SHRINK_EMPTY; +#endif + + return nr_freeable_items; } /** @@ -685,8 +698,8 @@ unsigned long kbase_mem_evictable_reclaim_count_objects(struct shrinker *s, * @s: Shrinker * @sc: Shrinker control * - * Return: Number of pages freed (can be less then requested) or -1 if the - * shrinker failed to free pages in its pool. + * Return: Number of pages freed (can be less then requested) or + * SHRINK_STOP if reclaim isn't possible. * * Note: * This function accesses region structures without taking the region lock, @@ -709,22 +722,27 @@ unsigned long kbase_mem_evictable_reclaim_scan_objects(struct shrinker *s, kctx = container_of(s, struct kbase_context, reclaim); +#if MALI_USE_CSF + if (!down_read_trylock(&kctx->kbdev->csf.pmode_sync_sem)) { + dev_warn(kctx->kbdev->dev, + "Can't shrink GPU memory when P.Mode entrance is in progress"); + return 0; + } +#endif mutex_lock(&kctx->jit_evict_lock); list_for_each_entry_safe(alloc, tmp, &kctx->evict_list, evict_node) { int err; + if (!alloc->reg) + continue; + err = kbase_mem_shrink_gpu_mapping(kctx, alloc->reg, 0, alloc->nents); - if (err != 0) { - /* - * Failed to remove GPU mapping, tell the shrinker - * to stop trying to shrink our slab even though we - * have pages in it. - */ - freed = -1; - goto out_unlock; - } + + /* Failed to remove GPU mapping, proceed to next one. */ + if (err != 0) + continue; /* * Update alloc->evicted before freeing the backing so the @@ -748,9 +766,11 @@ unsigned long kbase_mem_evictable_reclaim_scan_objects(struct shrinker *s, if (freed > sc->nr_to_scan) break; } -out_unlock: - mutex_unlock(&kctx->jit_evict_lock); + mutex_unlock(&kctx->jit_evict_lock); +#if MALI_USE_CSF + up_read(&kctx->kbdev->csf.pmode_sync_sem); +#endif return freed; } @@ -768,7 +788,11 @@ int kbase_mem_evictable_init(struct kbase_context *kctx) * struct shrinker does not define batch */ kctx->reclaim.batch = 0; - register_shrinker(&kctx->reclaim, "mali-mem-evictable"); +#if KERNEL_VERSION(6, 0, 0) > LINUX_VERSION_CODE + register_shrinker(&kctx->reclaim); +#else + register_shrinker(&kctx->reclaim, "mali-mem"); +#endif return 0; } @@ -832,6 +856,9 @@ int kbase_mem_evictable_make(struct kbase_mem_phy_alloc *gpu_alloc) lockdep_assert_held(&kctx->reg_lock); + /* Memory is in the process of transitioning to the shrinker, and + * should ignore migration attempts + */ kbase_mem_shrink_cpu_mapping(kctx, gpu_alloc->reg, 0, gpu_alloc->nents); @@ -839,12 +866,17 @@ int kbase_mem_evictable_make(struct kbase_mem_phy_alloc *gpu_alloc) /* This allocation can't already be on a list. */ WARN_ON(!list_empty(&gpu_alloc->evict_node)); - /* - * Add the allocation to the eviction list, after this point the shrink + /* Add the allocation to the eviction list, after this point the shrink * can reclaim it. */ list_add(&gpu_alloc->evict_node, &kctx->evict_list); atomic_add(gpu_alloc->nents, &kctx->evict_nents); + + /* Indicate to page migration that the memory can be reclaimed by the shrinker. 
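
The reworked shrinker callbacks above follow the standard count/scan contract: the count callback reports how many pages could be freed (returning SHRINK_EMPTY on 4.19 and later kernels when nothing is cached), the scan callback returns the number of pages actually freed or SHRINK_STOP, and register_shrinker() gains a name argument from 6.0. A minimal kernel-style sketch of that contract, where my_cache_pages and the "my-cache" name are illustrative only and not part of the patch:

#include <linux/atomic.h>
#include <linux/shrinker.h>
#include <linux/version.h>

static atomic_t my_cache_pages;    /* illustrative counter of reclaimable pages */

static unsigned long my_count(struct shrinker *s, struct shrink_control *sc)
{
        unsigned long n = atomic_read(&my_cache_pages);

#if KERNEL_VERSION(4, 19, 0) <= LINUX_VERSION_CODE
        /* Tell the VM there is nothing to scan rather than returning 0. */
        if (n == 0)
                return SHRINK_EMPTY;
#endif
        return n;
}

static unsigned long my_scan(struct shrinker *s, struct shrink_control *sc)
{
        /* Free up to sc->nr_to_scan pages here and return how many were
         * freed, or SHRINK_STOP when reclaim is not possible right now.
         */
        return SHRINK_STOP;
}

static struct shrinker my_shrinker = {
        .count_objects = my_count,
        .scan_objects = my_scan,
        .seeks = DEFAULT_SEEKS,
};

static int my_shrinker_init(void)
{
#if KERNEL_VERSION(6, 0, 0) > LINUX_VERSION_CODE
        return register_shrinker(&my_shrinker);
#else
        return register_shrinker(&my_shrinker, "my-cache");
#endif
}
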
+ */ + if (kbase_is_page_migration_enabled()) + kbase_set_phy_alloc_page_status(gpu_alloc, NOT_MOVABLE); + mutex_unlock(&kctx->jit_evict_lock); kbase_mem_evictable_mark_reclaim(gpu_alloc); @@ -896,6 +928,15 @@ bool kbase_mem_evictable_unmake(struct kbase_mem_phy_alloc *gpu_alloc) gpu_alloc->evicted, 0, mmu_sync_info); gpu_alloc->evicted = 0; + + /* Since the allocation is no longer evictable, and we ensure that + * it grows back to its pre-eviction size, we will consider the + * state of it to be ALLOCATED_MAPPED, as that is the only state + * in which a physical allocation could transition to NOT_MOVABLE + * from. + */ + if (kbase_is_page_migration_enabled()) + kbase_set_phy_alloc_page_status(gpu_alloc, ALLOCATED_MAPPED); } } @@ -941,13 +982,22 @@ int kbase_mem_flags_change(struct kbase_context *kctx, u64 gpu_addr, unsigned in /* now we can lock down the context, and find the region */ down_write(kbase_mem_get_process_mmap_lock()); - kbase_gpu_vm_lock(kctx); + kbase_gpu_vm_lock_with_pmode_sync(kctx); /* Validate the region */ reg = kbase_region_tracker_find_region_base_address(kctx, gpu_addr); if (kbase_is_region_invalid_or_free(reg)) goto out_unlock; + /* There is no use case to support MEM_FLAGS_CHANGE ioctl for allocations + * that have NO_USER_FREE flag set, to mark them as evictable/reclaimable. + * This would usually include JIT allocations, Tiler heap related allocations + * & GPU queue ringbuffer and none of them needs to be explicitly marked + * as evictable by Userspace. + */ + if (kbase_va_region_is_no_user_free(reg)) + goto out_unlock; + /* Is the region being transitioning between not needed and needed? */ prev_needed = (KBASE_REG_DONT_NEED & reg->flags) == KBASE_REG_DONT_NEED; new_needed = (BASE_MEM_DONT_NEED & flags) == BASE_MEM_DONT_NEED; @@ -1045,7 +1095,7 @@ int kbase_mem_flags_change(struct kbase_context *kctx, u64 gpu_addr, unsigned in reg->flags = new_flags; out_unlock: - kbase_gpu_vm_unlock(kctx); + kbase_gpu_vm_unlock_with_pmode_sync(kctx); up_write(kbase_mem_get_process_mmap_lock()); out: return ret; @@ -1101,19 +1151,7 @@ int kbase_mem_do_sync_imported(struct kbase_context *kctx, ret = 0; } #else - /* Though the below version check could be superfluous depending upon the version condition - * used for enabling KBASE_MEM_ION_SYNC_WORKAROUND, we still keep this check here to allow - * ease of modification for non-ION systems or systems where ION has been patched. 
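
With pre-4.6 kernels gone, the import sync path above collapses to the two-argument dma_buf CPU access calls. A small hedged sketch of that bracketing for an already-imported buffer, with error handling trimmed and the direction chosen for a buffer the GPU may also write:

#include <linux/dma-buf.h>
#include <linux/dma-direction.h>

static int cpu_touch_imported(struct dma_buf *buf)
{
        int err;

        /* Claim the buffer for CPU access. */
        err = dma_buf_begin_cpu_access(buf, DMA_BIDIRECTIONAL);
        if (err)
                return err;

        /* ... CPU reads/writes of the buffer contents happen here ... */

        /* Hand ownership back to the device side. */
        return dma_buf_end_cpu_access(buf, DMA_BIDIRECTIONAL);
}
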
- */ -#if KERNEL_VERSION(4, 6, 0) > LINUX_VERSION_CODE && !defined(CONFIG_CHROMEOS) - dma_buf_end_cpu_access(dma_buf, - 0, dma_buf->size, - dir); - ret = 0; -#else - ret = dma_buf_end_cpu_access(dma_buf, - dir); -#endif + ret = dma_buf_end_cpu_access(dma_buf, dir); #endif /* KBASE_MEM_ION_SYNC_WORKAROUND */ break; case KBASE_SYNC_TO_CPU: @@ -1130,11 +1168,7 @@ int kbase_mem_do_sync_imported(struct kbase_context *kctx, ret = 0; } #else - ret = dma_buf_begin_cpu_access(dma_buf, -#if KERNEL_VERSION(4, 6, 0) > LINUX_VERSION_CODE && !defined(CONFIG_CHROMEOS) - 0, dma_buf->size, -#endif - dir); + ret = dma_buf_begin_cpu_access(dma_buf, dir); #endif /* KBASE_MEM_ION_SYNC_WORKAROUND */ break; } @@ -1281,11 +1315,11 @@ int kbase_mem_umm_map(struct kbase_context *kctx, gwt_mask = ~KBASE_REG_GPU_WR; #endif - err = kbase_mmu_insert_pages(kctx->kbdev, &kctx->mmu, reg->start_pfn, - kbase_get_gpu_phy_pages(reg), - kbase_reg_current_backed_size(reg), - reg->flags & gwt_mask, kctx->as_nr, - alloc->group_id, mmu_sync_info); + err = kbase_mmu_insert_pages_skip_status_update(kctx->kbdev, &kctx->mmu, reg->start_pfn, + kbase_get_gpu_phy_pages(reg), + kbase_reg_current_backed_size(reg), + reg->flags & gwt_mask, kctx->as_nr, + alloc->group_id, mmu_sync_info, NULL); if (err) goto bad_insert; @@ -1298,11 +1332,11 @@ int kbase_mem_umm_map(struct kbase_context *kctx, * Assume alloc->nents is the number of actual pages in the * dma-buf memory. */ - err = kbase_mmu_insert_single_page( - kctx, reg->start_pfn + alloc->nents, - kctx->aliasing_sink_page, reg->nr_pages - alloc->nents, - (reg->flags | KBASE_REG_GPU_RD) & ~KBASE_REG_GPU_WR, - KBASE_MEM_GROUP_SINK, mmu_sync_info); + err = kbase_mmu_insert_single_imported_page( + kctx, reg->start_pfn + alloc->nents, kctx->aliasing_sink_page, + reg->nr_pages - alloc->nents, + (reg->flags | KBASE_REG_GPU_RD) & ~KBASE_REG_GPU_WR, KBASE_MEM_GROUP_SINK, + mmu_sync_info); if (err) goto bad_pad_insert; } @@ -1310,11 +1344,8 @@ int kbase_mem_umm_map(struct kbase_context *kctx, return 0; bad_pad_insert: - kbase_mmu_teardown_pages(kctx->kbdev, - &kctx->mmu, - reg->start_pfn, - alloc->nents, - kctx->as_nr); + kbase_mmu_teardown_imported_pages(kctx->kbdev, &kctx->mmu, reg->start_pfn, alloc->pages, + alloc->nents, alloc->nents, kctx->as_nr); bad_insert: kbase_mem_umm_unmap_attachment(kctx, alloc); bad_map_attachment: @@ -1342,11 +1373,9 @@ void kbase_mem_umm_unmap(struct kbase_context *kctx, if (!kbase_is_region_invalid_or_free(reg) && reg->gpu_alloc == alloc) { int err; - err = kbase_mmu_teardown_pages(kctx->kbdev, - &kctx->mmu, - reg->start_pfn, - reg->nr_pages, - kctx->as_nr); + err = kbase_mmu_teardown_imported_pages(kctx->kbdev, &kctx->mmu, reg->start_pfn, + alloc->pages, reg->nr_pages, reg->nr_pages, + kctx->as_nr); WARN_ON(err); } @@ -1393,6 +1422,7 @@ static struct kbase_va_region *kbase_mem_from_umm(struct kbase_context *kctx, struct kbase_va_region *reg; struct dma_buf *dma_buf; struct dma_buf_attachment *dma_attachment; + enum kbase_memory_zone zone; bool shared_zone = false; bool need_sync = false; int group_id; @@ -1418,6 +1448,9 @@ static struct kbase_va_region *kbase_mem_from_umm(struct kbase_context *kctx, return NULL; } + if (!kbase_import_size_is_valid(kctx->kbdev, *va_pages)) + return NULL; + /* ignore SAME_VA */ *flags &= ~BASE_MEM_SAME_VA; @@ -1438,24 +1471,21 @@ static struct kbase_va_region *kbase_mem_from_umm(struct kbase_context *kctx, if (*flags & BASE_MEM_IMPORT_SYNC_ON_MAP_UNMAP) need_sync = true; -#if IS_ENABLED(CONFIG_64BIT) - if (!kbase_ctx_flag(kctx, 
KCTX_COMPAT)) { + if (!kbase_ctx_compat_mode(kctx)) { /* * 64-bit tasks require us to reserve VA on the CPU that we use * on the GPU. */ shared_zone = true; } -#endif if (shared_zone) { *flags |= BASE_MEM_NEED_MMAP; - reg = kbase_alloc_free_region(&kctx->reg_rbtree_same, - 0, *va_pages, KBASE_REG_ZONE_SAME_VA); - } else { - reg = kbase_alloc_free_region(&kctx->reg_rbtree_custom, - 0, *va_pages, KBASE_REG_ZONE_CUSTOM_VA); - } + zone = SAME_VA_ZONE; + } else + zone = CUSTOM_VA_ZONE; + + reg = kbase_ctx_alloc_free_region(kctx, zone, 0, *va_pages); if (!reg) { dma_buf_detach(dma_buf, dma_attachment); @@ -1539,16 +1569,18 @@ static struct kbase_va_region *kbase_mem_from_user_buffer( struct kbase_context *kctx, unsigned long address, unsigned long size, u64 *va_pages, u64 *flags) { - long i; + long i, dma_mapped_pages; struct kbase_va_region *reg; - struct rb_root *rbtree; long faulted_pages; - int zone = KBASE_REG_ZONE_CUSTOM_VA; + enum kbase_memory_zone zone = CUSTOM_VA_ZONE; bool shared_zone = false; u32 cache_line_alignment = kbase_get_cache_line_alignment(kctx->kbdev); struct kbase_alloc_import_user_buf *user_buf; struct page **pages = NULL; + struct tagged_addr *pa; + struct device *dev; int write; + enum dma_data_direction dma_dir; /* Flag supported only for dma-buf imported memory */ if (*flags & BASE_MEM_IMPORT_SYNC_ON_MAP_UNMAP) @@ -1585,31 +1617,29 @@ static struct kbase_va_region *kbase_mem_from_user_buffer( /* 64-bit address range is the max */ goto bad_size; + if (!kbase_import_size_is_valid(kctx->kbdev, *va_pages)) + goto bad_size; + /* SAME_VA generally not supported with imported memory (no known use cases) */ *flags &= ~BASE_MEM_SAME_VA; if (*flags & BASE_MEM_IMPORT_SHARED) shared_zone = true; -#if IS_ENABLED(CONFIG_64BIT) - if (!kbase_ctx_flag(kctx, KCTX_COMPAT)) { + if (!kbase_ctx_compat_mode(kctx)) { /* * 64-bit tasks require us to reserve VA on the CPU that we use * on the GPU. */ shared_zone = true; } -#endif if (shared_zone) { *flags |= BASE_MEM_NEED_MMAP; - zone = KBASE_REG_ZONE_SAME_VA; - rbtree = &kctx->reg_rbtree_same; - } else - rbtree = &kctx->reg_rbtree_custom; - - reg = kbase_alloc_free_region(rbtree, 0, *va_pages, zone); + zone = SAME_VA_ZONE; + } + reg = kbase_ctx_alloc_free_region(kctx, zone, 0, *va_pages); if (!reg) goto no_region; @@ -1634,11 +1664,7 @@ static struct kbase_va_region *kbase_mem_from_user_buffer( user_buf->address = address; user_buf->nr_pages = *va_pages; user_buf->mm = current->mm; -#if KERNEL_VERSION(4, 11, 0) > LINUX_VERSION_CODE - atomic_inc(¤t->mm->mm_count); -#else - mmgrab(current->mm); -#endif + kbase_mem_mmgrab(); if (reg->gpu_alloc->properties & KBASE_MEM_PHY_ALLOC_LARGE) user_buf->pages = vmalloc(*va_pages * sizeof(struct page *)); else @@ -1663,19 +1689,9 @@ static struct kbase_va_region *kbase_mem_from_user_buffer( down_read(kbase_mem_get_process_mmap_lock()); write = reg->flags & (KBASE_REG_CPU_WR | KBASE_REG_GPU_WR); + dma_dir = write ? DMA_BIDIRECTIONAL : DMA_TO_DEVICE; -#if KERNEL_VERSION(4, 6, 0) > LINUX_VERSION_CODE - faulted_pages = get_user_pages(current, current->mm, address, *va_pages, -#if KERNEL_VERSION(4, 4, 168) <= LINUX_VERSION_CODE && \ -KERNEL_VERSION(4, 5, 0) > LINUX_VERSION_CODE - write ? 
FOLL_WRITE : 0, pages, NULL); -#else - write, 0, pages, NULL); -#endif -#elif KERNEL_VERSION(4, 9, 0) > LINUX_VERSION_CODE - faulted_pages = get_user_pages(address, *va_pages, - write, 0, pages, NULL); -#elif KERNEL_VERSION(5, 9, 0) > LINUX_VERSION_CODE +#if KERNEL_VERSION(5, 9, 0) > LINUX_VERSION_CODE faulted_pages = get_user_pages(address, *va_pages, write ? FOLL_WRITE : 0, pages, NULL); #else @@ -1706,31 +1722,44 @@ KERNEL_VERSION(4, 5, 0) > LINUX_VERSION_CODE reg->gpu_alloc->nents = 0; reg->extension = 0; - if (pages) { - struct device *dev = kctx->kbdev->dev; - unsigned long local_size = user_buf->size; - unsigned long offset = user_buf->address & ~PAGE_MASK; - struct tagged_addr *pa = kbase_get_gpu_phy_pages(reg); + pa = kbase_get_gpu_phy_pages(reg); + dev = kctx->kbdev->dev; + if (pages) { /* Top bit signifies that this was pinned on import */ user_buf->current_mapping_usage_count |= PINNED_ON_IMPORT; + /* Manual CPU cache synchronization. + * + * The driver disables automatic CPU cache synchronization because the + * memory pages that enclose the imported region may also contain + * sub-regions which are not imported and that are allocated and used + * by the user process. This may be the case of memory at the beginning + * of the first page and at the end of the last page. Automatic CPU cache + * synchronization would force some operations on those memory allocations, + * unbeknown to the user process: in particular, a CPU cache invalidate + * upon unmapping would destroy the content of dirty CPU caches and cause + * the user process to lose CPU writes to the non-imported sub-regions. + * + * When the GPU claims ownership of the imported memory buffer, it shall + * commit CPU writes for the whole of all pages that enclose the imported + * region, otherwise the initial content of memory would be wrong. + */ for (i = 0; i < faulted_pages; i++) { dma_addr_t dma_addr; - unsigned long min; - - min = MIN(PAGE_SIZE - offset, local_size); - dma_addr = dma_map_page(dev, pages[i], - offset, min, - DMA_BIDIRECTIONAL); +#if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) + dma_addr = dma_map_page(dev, pages[i], 0, PAGE_SIZE, dma_dir); +#else + dma_addr = dma_map_page_attrs(dev, pages[i], 0, PAGE_SIZE, dma_dir, + DMA_ATTR_SKIP_CPU_SYNC); +#endif if (dma_mapping_error(dev, dma_addr)) goto unwind_dma_map; user_buf->dma_addrs[i] = dma_addr; pa[i] = as_tagged(page_to_phys(pages[i])); - local_size -= min; - offset = 0; + dma_sync_single_for_device(dev, dma_addr, PAGE_SIZE, dma_dir); } reg->gpu_alloc->nents = faulted_pages; @@ -1739,13 +1768,29 @@ KERNEL_VERSION(4, 5, 0) > LINUX_VERSION_CODE return reg; unwind_dma_map: - while (i--) { - dma_unmap_page(kctx->kbdev->dev, - user_buf->dma_addrs[i], - PAGE_SIZE, DMA_BIDIRECTIONAL); + dma_mapped_pages = i; + /* Run the unmap loop in the same order as map loop, and perform again + * CPU cache synchronization to re-write the content of dirty CPU caches + * to memory. This precautionary measure is kept here to keep this code + * aligned with kbase_jd_user_buf_map() to allow for a potential refactor + * in the future. 
+ */ + for (i = 0; i < dma_mapped_pages; i++) { + dma_addr_t dma_addr = user_buf->dma_addrs[i]; + + dma_sync_single_for_device(dev, dma_addr, PAGE_SIZE, dma_dir); +#if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) + dma_unmap_page(dev, dma_addr, PAGE_SIZE, dma_dir); +#else + dma_unmap_page_attrs(dev, dma_addr, PAGE_SIZE, dma_dir, DMA_ATTR_SKIP_CPU_SYNC); +#endif } fault_mismatch: if (pages) { + /* In this case, the region was not yet in the region tracker, + * and so there are no CPU mappings to remove before we unpin + * the page + */ for (i = 0; i < faulted_pages; i++) kbase_unpin_user_buf_page(pages[i]); } @@ -1758,7 +1803,6 @@ no_alloc_obj: no_region: bad_size: return NULL; - } @@ -1770,6 +1814,8 @@ u64 kbase_mem_alias(struct kbase_context *kctx, u64 *flags, u64 stride, u64 gpu_va; size_t i; bool coherent; + uint64_t max_stride; + enum kbase_memory_zone zone; /* Calls to this function are inherently asynchronous, with respect to * MMU operations. @@ -1802,30 +1848,31 @@ u64 kbase_mem_alias(struct kbase_context *kctx, u64 *flags, u64 stride, if (!nents) goto bad_nents; - if (nents > (U64_MAX / PAGE_SIZE) / stride) + max_stride = div64_u64(U64_MAX, nents); + + if (stride > max_stride) + goto bad_size; + + if ((nents * stride) > (U64_MAX / PAGE_SIZE)) /* 64-bit address range is the max */ goto bad_size; /* calculate the number of pages this alias will cover */ *num_pages = nents * stride; -#if IS_ENABLED(CONFIG_64BIT) - if (!kbase_ctx_flag(kctx, KCTX_COMPAT)) { + if (!kbase_alias_size_is_valid(kctx->kbdev, *num_pages)) + goto bad_size; + + if (!kbase_ctx_compat_mode(kctx)) { /* 64-bit tasks must MMAP anyway, but not expose this address to * clients */ + zone = SAME_VA_ZONE; *flags |= BASE_MEM_NEED_MMAP; - reg = kbase_alloc_free_region(&kctx->reg_rbtree_same, 0, - *num_pages, - KBASE_REG_ZONE_SAME_VA); - } else { -#else - if (1) { -#endif - reg = kbase_alloc_free_region(&kctx->reg_rbtree_custom, - 0, *num_pages, - KBASE_REG_ZONE_CUSTOM_VA); - } + } else + zone = CUSTOM_VA_ZONE; + + reg = kbase_ctx_alloc_free_region(kctx, zone, 0, *num_pages); if (!reg) goto no_reg; @@ -1847,7 +1894,7 @@ u64 kbase_mem_alias(struct kbase_context *kctx, u64 *flags, u64 stride, if (!reg->gpu_alloc->imported.alias.aliased) goto no_aliased_array; - kbase_gpu_vm_lock(kctx); + kbase_gpu_vm_lock_with_pmode_sync(kctx); /* validate and add src handles */ for (i = 0; i < nents; i++) { @@ -1873,9 +1920,9 @@ u64 kbase_mem_alias(struct kbase_context *kctx, u64 *flags, u64 stride, /* validate found region */ if (kbase_is_region_invalid_or_free(aliasing_reg)) goto bad_handle; /* Not found/already free */ - if (aliasing_reg->flags & KBASE_REG_DONT_NEED) + if (kbase_is_region_shrinkable(aliasing_reg)) goto bad_handle; /* Ephemeral region */ - if (aliasing_reg->flags & KBASE_REG_NO_USER_FREE) + if (kbase_va_region_is_no_user_free(aliasing_reg)) goto bad_handle; /* JIT regions can't be * aliased. 
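
The alias path above now rejects oversized requests in two steps: stride is first bounded by div64_u64(U64_MAX, nents), and only then is nents * stride compared against U64_MAX / PAGE_SIZE, so neither multiplication can wrap. A small runnable user-space sketch of the same check, assuming 4 KiB pages and a non-zero nents (the driver rejects nents == 0 earlier):

#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE 4096ULL   /* assumed 4 KiB pages */

/* Mirrors the two-step overflow check in kbase_mem_alias(); nents is
 * assumed non-zero, as the driver rejects nents == 0 before this point.
 */
static int alias_size_ok(uint64_t nents, uint64_t stride)
{
        uint64_t max_stride = UINT64_MAX / nents;   /* div64_u64(U64_MAX, nents) */

        if (stride > max_stride)
                return 0;   /* nents * stride itself would overflow */
        if (nents * stride > UINT64_MAX / PAGE_SIZE)
                return 0;   /* the byte count would overflow */
        return 1;
}

int main(void)
{
        printf("%d\n", alias_size_ok(4, 1024));                 /* 1: fits */
        printf("%d\n", alias_size_ok(1ULL << 32, 1ULL << 33));  /* 0: overflows */
        return 0;
}
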
NO_USER_FREE flag * covers the entire lifetime @@ -1930,8 +1977,7 @@ u64 kbase_mem_alias(struct kbase_context *kctx, u64 *flags, u64 stride, } } -#if IS_ENABLED(CONFIG_64BIT) - if (!kbase_ctx_flag(kctx, KCTX_COMPAT)) { + if (!kbase_ctx_compat_mode(kctx)) { /* Bind to a cookie */ if (bitmap_empty(kctx->cookies, BITS_PER_LONG)) { dev_err(kctx->kbdev->dev, "No cookies available for allocation!"); @@ -1946,10 +1992,8 @@ u64 kbase_mem_alias(struct kbase_context *kctx, u64 *flags, u64 stride, /* relocate to correct base */ gpu_va += PFN_DOWN(BASE_MEM_COOKIE_BASE); gpu_va <<= PAGE_SHIFT; - } else /* we control the VA */ { -#else - if (1) { -#endif + } else { + /* we control the VA */ if (kbase_gpu_mmap(kctx, reg, 0, *num_pages, 1, mmu_sync_info) != 0) { dev_warn(kctx->kbdev->dev, "Failed to map memory on GPU"); @@ -1962,20 +2006,18 @@ u64 kbase_mem_alias(struct kbase_context *kctx, u64 *flags, u64 stride, reg->flags &= ~KBASE_REG_FREE; reg->flags &= ~KBASE_REG_GROWABLE; - kbase_gpu_vm_unlock(kctx); + kbase_gpu_vm_unlock_with_pmode_sync(kctx); return gpu_va; -#if IS_ENABLED(CONFIG_64BIT) no_cookie: -#endif no_mmap: bad_handle: /* Marking the source allocs as not being mapped on the GPU and putting * them is handled by putting reg's allocs, so no rollback of those * actions is done here. */ - kbase_gpu_vm_unlock(kctx); + kbase_gpu_vm_unlock_with_pmode_sync(kctx); no_aliased_array: invalid_flags: kbase_mem_phy_alloc_put(reg->cpu_alloc); @@ -2035,7 +2077,10 @@ int kbase_mem_import(struct kbase_context *kctx, enum base_mem_import_type type, /* Remove COHERENT_SYSTEM flag if coherent mem is unavailable */ *flags &= ~BASE_MEM_COHERENT_SYSTEM; } - + if (((*flags & BASE_MEM_CACHED_CPU) == 0) && (type == BASE_MEM_IMPORT_TYPE_USER_BUFFER)) { + dev_warn(kctx->kbdev->dev, "USER_BUFFER must be CPU cached"); + goto bad_flags; + } if ((padding != 0) && (type != BASE_MEM_IMPORT_TYPE_UMM)) { dev_warn(kctx->kbdev->dev, "padding is only supported for UMM"); @@ -2083,7 +2128,7 @@ int kbase_mem_import(struct kbase_context *kctx, enum base_mem_import_type type, if (!reg) goto no_reg; - kbase_gpu_vm_lock(kctx); + kbase_gpu_vm_lock_with_pmode_sync(kctx); /* mmap needed to setup VA? 
*/ if (*flags & (BASE_MEM_SAME_VA | BASE_MEM_NEED_MMAP)) { @@ -2118,13 +2163,13 @@ int kbase_mem_import(struct kbase_context *kctx, enum base_mem_import_type type, /* clear out private flags */ *flags &= ((1UL << BASE_MEM_FLAGS_NR_BITS) - 1); - kbase_gpu_vm_unlock(kctx); + kbase_gpu_vm_unlock_with_pmode_sync(kctx); return 0; no_gpu_va: no_cookie: - kbase_gpu_vm_unlock(kctx); + kbase_gpu_vm_unlock_with_pmode_sync(kctx); kbase_mem_phy_alloc_put(reg->cpu_alloc); kbase_mem_phy_alloc_put(reg->gpu_alloc); kfree(reg); @@ -2149,11 +2194,9 @@ int kbase_mem_grow_gpu_mapping(struct kbase_context *kctx, /* Map the new pages into the GPU */ phy_pages = kbase_get_gpu_phy_pages(reg); - ret = kbase_mmu_insert_pages(kctx->kbdev, &kctx->mmu, - reg->start_pfn + old_pages, - phy_pages + old_pages, delta, reg->flags, - kctx->as_nr, reg->gpu_alloc->group_id, - mmu_sync_info); + ret = kbase_mmu_insert_pages(kctx->kbdev, &kctx->mmu, reg->start_pfn + old_pages, + phy_pages + old_pages, delta, reg->flags, kctx->as_nr, + reg->gpu_alloc->group_id, mmu_sync_info, reg); return ret; } @@ -2168,33 +2211,21 @@ void kbase_mem_shrink_cpu_mapping(struct kbase_context *kctx, /* Nothing to do */ return; - unmap_mapping_range(kctx->filp->f_inode->i_mapping, + unmap_mapping_range(kctx->kfile->filp->f_inode->i_mapping, (gpu_va_start + new_pages)<<PAGE_SHIFT, (old_pages - new_pages)<<PAGE_SHIFT, 1); } -/** - * kbase_mem_shrink_gpu_mapping - Shrink the GPU mapping of an allocation - * @kctx: Context the region belongs to - * @reg: The GPU region or NULL if there isn't one - * @new_pages: The number of pages after the shrink - * @old_pages: The number of pages before the shrink - * - * Return: 0 on success, negative -errno on error - * - * Unmap the shrunk pages from the GPU mapping. Note that the size of the region - * itself is unmodified as we still need to reserve the VA, only the page tables - * will be modified by this function. 
- */ -static int kbase_mem_shrink_gpu_mapping(struct kbase_context *const kctx, - struct kbase_va_region *const reg, - u64 const new_pages, u64 const old_pages) +int kbase_mem_shrink_gpu_mapping(struct kbase_context *const kctx, + struct kbase_va_region *const reg, u64 const new_pages, + u64 const old_pages) { u64 delta = old_pages - new_pages; + struct kbase_mem_phy_alloc *alloc = reg->gpu_alloc; int ret = 0; - ret = kbase_mmu_teardown_pages(kctx->kbdev, &kctx->mmu, - reg->start_pfn + new_pages, delta, kctx->as_nr); + ret = kbase_mmu_teardown_pages(kctx->kbdev, &kctx->mmu, reg->start_pfn + new_pages, + alloc->pages + new_pages, delta, delta, kctx->as_nr); return ret; } @@ -2221,7 +2252,7 @@ int kbase_mem_commit(struct kbase_context *kctx, u64 gpu_addr, u64 new_pages) } down_write(kbase_mem_get_process_mmap_lock()); - kbase_gpu_vm_lock(kctx); + kbase_gpu_vm_lock_with_pmode_sync(kctx); /* Validate the region */ reg = kbase_region_tracker_find_region_base_address(kctx, gpu_addr); @@ -2258,8 +2289,11 @@ int kbase_mem_commit(struct kbase_context *kctx, u64 gpu_addr, u64 new_pages) if (atomic_read(®->cpu_alloc->kernel_mappings) > 0) goto out_unlock; - /* can't grow regions which are ephemeral */ - if (reg->flags & KBASE_REG_DONT_NEED) + + if (kbase_is_region_shrinkable(reg)) + goto out_unlock; + + if (kbase_va_region_is_no_user_free(reg)) goto out_unlock; #ifdef CONFIG_MALI_MEMORY_FULLY_BACKED @@ -2322,7 +2356,7 @@ int kbase_mem_commit(struct kbase_context *kctx, u64 gpu_addr, u64 new_pages) } out_unlock: - kbase_gpu_vm_unlock(kctx); + kbase_gpu_vm_unlock_with_pmode_sync(kctx); if (read_locked) up_read(kbase_mem_get_process_mmap_lock()); else @@ -2350,6 +2384,21 @@ int kbase_mem_shrink(struct kbase_context *const kctx, return -EINVAL; delta = old_pages - new_pages; + if (kctx->kbdev->pagesize_2mb) { + struct tagged_addr *start_free = reg->gpu_alloc->pages + new_pages; + + /* Move the end of new committed range to a valid location. + * This mirrors the adjustment done inside kbase_free_phy_pages_helper(). + */ + while (delta && is_huge(*start_free) && !is_huge_head(*start_free)) { + start_free++; + new_pages++; + delta--; + } + + if (!delta) + return 0; + } /* Update the GPU mapping */ err = kbase_mem_shrink_gpu_mapping(kctx, reg, @@ -2362,18 +2411,6 @@ int kbase_mem_shrink(struct kbase_context *const kctx, kbase_free_phy_pages_helper(reg->cpu_alloc, delta); if (reg->cpu_alloc != reg->gpu_alloc) kbase_free_phy_pages_helper(reg->gpu_alloc, delta); -#ifdef CONFIG_MALI_2MB_ALLOC - if (kbase_reg_current_backed_size(reg) > new_pages) { - old_pages = new_pages; - new_pages = kbase_reg_current_backed_size(reg); - - /* Update GPU mapping. */ - err = kbase_mem_grow_gpu_mapping(kctx, reg, - new_pages, old_pages, CALLER_MMU_ASYNC); - } -#else - WARN_ON(kbase_reg_current_backed_size(reg) != new_pages); -#endif } return err; @@ -2404,55 +2441,27 @@ static void kbase_cpu_vm_close(struct vm_area_struct *vma) KBASE_DEBUG_ASSERT(map->kctx); KBASE_DEBUG_ASSERT(map->alloc); - kbase_gpu_vm_lock(map->kctx); + kbase_gpu_vm_lock_with_pmode_sync(map->kctx); if (map->free_on_close) { - KBASE_DEBUG_ASSERT((map->region->flags & KBASE_REG_ZONE_MASK) == - KBASE_REG_ZONE_SAME_VA); + KBASE_DEBUG_ASSERT(kbase_bits_to_zone(map->region->flags) == SAME_VA_ZONE); /* Avoid freeing memory on the process death which results in * GPU Page Fault. 
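
When the device uses 2MB pages, kbase_mem_shrink above nudges the new committed size forward past any 4K entries that are non-head parts of a huge page, so the shrink boundary never lands inside a 2MB allocation; if that consumes the whole delta the shrink becomes a no-op. A toy user-space model of that adjustment (a real huge page has 512 sub-pages, the array here is shortened for illustration):

#include <stdio.h>

enum pg { SMALL, HUGE_HEAD, HUGE_TAIL };

/* Model of the loop added to kbase_mem_shrink(): advance new_pages past
 * huge-page tail entries so the shrink boundary never splits a 2MB page.
 */
static unsigned long adjust_shrink(const enum pg *pages, unsigned long old_pages,
                                   unsigned long new_pages)
{
        unsigned long delta = old_pages - new_pages;

        while (delta && pages[new_pages] == HUGE_TAIL) {
                new_pages++;
                delta--;
        }
        return new_pages;
}

int main(void)
{
        /* A (shortened) huge page starts at index 0; a shrink lands inside it. */
        enum pg pages[8] = { HUGE_HEAD, HUGE_TAIL, HUGE_TAIL, HUGE_TAIL,
                             HUGE_TAIL, HUGE_TAIL, SMALL, SMALL };

        /* Request to keep 2 pages out of 8: the boundary is pushed to 6. */
        printf("new_pages=%lu\n", adjust_shrink(pages, 8, 2));
        return 0;
}
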
Memory will be freed in kbase_destroy_context */ - if (!(current->flags & PF_EXITING)) + if (!is_process_exiting(vma)) kbase_mem_free_region(map->kctx, map->region); } list_del(&map->mappings_list); kbase_va_region_alloc_put(map->kctx, map->region); - kbase_gpu_vm_unlock(map->kctx); + kbase_gpu_vm_unlock_with_pmode_sync(map->kctx); kbase_mem_phy_alloc_put(map->alloc); + kbase_file_dec_cpu_mapping_count(map->kctx->kfile); kfree(map); } -static int kbase_cpu_vm_split(struct vm_area_struct *vma, unsigned long addr) -{ - struct kbase_cpu_mapping *map = vma->vm_private_data; - - KBASE_DEBUG_ASSERT(map->kctx); - KBASE_DEBUG_ASSERT(map->count > 0); - - /* - * We should never have a map/munmap pairing on a kbase_context managed - * vma such that the munmap only unmaps a portion of the vma range. - * Should this arise, the kernel attempts to split the vma range to - * ensure that it only unmaps the requested region. To achieve this it - * attempts to split the containing vma split occurs, and this callback - * is reached. By returning -EINVAL here we inform the kernel that such - * splits are not supported so that it instead unmaps the entire region. - * Since this is indicative of a bug in the map/munmap code in the - * driver, we raise a WARN here to indicate that this invalid - * state has been reached. - */ - dev_warn(map->kctx->kbdev->dev, - "%s: vma region split requested: addr=%lx map->count=%d reg=%p reg->start_pfn=%llx reg->nr_pages=%zu", - __func__, addr, map->count, map->region, map->region->start_pfn, - map->region->nr_pages); - WARN_ON_ONCE(1); - - return -EINVAL; -} - static struct kbase_aliased *get_aliased_alloc(struct vm_area_struct *vma, struct kbase_va_region *reg, pgoff_t *start_off, @@ -2508,9 +2517,17 @@ static vm_fault_t kbase_cpu_vm_fault(struct vm_fault *vmf) KBASE_DEBUG_ASSERT(map->kctx); KBASE_DEBUG_ASSERT(map->alloc); + kbase_gpu_vm_lock(map->kctx); + + /* Reject faults for SAME_VA mapping of UMM allocations */ + if ((map->alloc->type == KBASE_MEM_TYPE_IMPORTED_UMM) && map->free_on_close) { + dev_warn(map->kctx->kbdev->dev, "Invalid CPU access to UMM memory for ctx %d_%d", + map->kctx->tgid, map->kctx->id); + goto exit; + } + map_start_pgoff = vma->vm_pgoff - map->region->start_pfn; - kbase_gpu_vm_lock(map->kctx); if (unlikely(map->region->cpu_alloc->type == KBASE_MEM_TYPE_ALIAS)) { struct kbase_aliased *aliased = get_aliased_alloc(vma, map->region, &map_start_pgoff, 1); @@ -2561,7 +2578,6 @@ exit: const struct vm_operations_struct kbase_vm_ops = { .open = kbase_cpu_vm_open, .close = kbase_cpu_vm_close, - .may_split = kbase_cpu_vm_split, .fault = kbase_cpu_vm_fault }; @@ -2626,9 +2642,9 @@ static int kbase_cpu_mmap(struct kbase_context *kctx, vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot); } - if (!kaddr) { + if (!kaddr) vm_flags_set(vma, VM_PFNMAP); - } else { + else { WARN_ON(aligned_offset); /* MIXEDMAP so we can vfree the kaddr early and not track it after map time */ vm_flags_set(vma, VM_MIXEDMAP); @@ -2652,6 +2668,7 @@ static int kbase_cpu_mmap(struct kbase_context *kctx, map->alloc->properties |= KBASE_MEM_PHY_ALLOC_ACCESSED_CACHED; list_add(&map->mappings_list, &map->alloc->mappings); + kbase_file_inc_cpu_mapping_count(kctx->kfile); out: return err; @@ -2673,7 +2690,6 @@ static void kbase_free_unused_jit_allocations(struct kbase_context *kctx) while (kbase_jit_evict(kctx)) ; } -#endif static int kbase_mmu_dump_mmap(struct kbase_context *kctx, struct vm_area_struct *vma, @@ -2686,13 +2702,13 @@ static int kbase_mmu_dump_mmap(struct kbase_context *kctx, 
size_t size; int err = 0; + lockdep_assert_held(&kctx->reg_lock); + dev_dbg(kctx->kbdev->dev, "%s\n", __func__); size = (vma->vm_end - vma->vm_start); nr_pages = size >> PAGE_SHIFT; -#ifdef CONFIG_MALI_VECTOR_DUMP kbase_free_unused_jit_allocations(kctx); -#endif kaddr = kbase_mmu_dump(kctx, nr_pages); @@ -2701,8 +2717,7 @@ static int kbase_mmu_dump_mmap(struct kbase_context *kctx, goto out; } - new_reg = kbase_alloc_free_region(&kctx->reg_rbtree_same, 0, nr_pages, - KBASE_REG_ZONE_SAME_VA); + new_reg = kbase_ctx_alloc_free_region(kctx, SAME_VA_ZONE, 0, nr_pages); if (!new_reg) { err = -ENOMEM; WARN_ON(1); @@ -2740,7 +2755,7 @@ out_va_region: out: return err; } - +#endif void kbase_os_mem_map_lock(struct kbase_context *kctx) { @@ -2760,7 +2775,7 @@ static int kbasep_reg_mmap(struct kbase_context *kctx, size_t *nr_pages, size_t *aligned_offset) { - int cookie = vma->vm_pgoff - PFN_DOWN(BASE_MEM_COOKIE_BASE); + unsigned int cookie = vma->vm_pgoff - PFN_DOWN(BASE_MEM_COOKIE_BASE); struct kbase_va_region *reg; int err = 0; @@ -2801,7 +2816,6 @@ static int kbasep_reg_mmap(struct kbase_context *kctx, /* adjust down nr_pages to what we have physically */ *nr_pages = kbase_reg_current_backed_size(reg); - if (kbase_gpu_mmap(kctx, reg, vma->vm_start + *aligned_offset, reg->nr_pages, 1, mmu_sync_info) != 0) { dev_err(kctx->kbdev->dev, "%s:%d\n", __FILE__, __LINE__); @@ -2861,7 +2875,7 @@ int kbase_context_mmap(struct kbase_context *const kctx, goto out; } - kbase_gpu_vm_lock(kctx); + kbase_gpu_vm_lock_with_pmode_sync(kctx); if (vma->vm_pgoff == PFN_DOWN(BASE_MEM_MAP_TRACKING_HANDLE)) { /* The non-mapped tracking helper page */ @@ -2881,6 +2895,7 @@ int kbase_context_mmap(struct kbase_context *const kctx, err = -EINVAL; goto out_unlock; case PFN_DOWN(BASE_MEM_MMU_DUMP_HANDLE): +#if defined(CONFIG_MALI_VECTOR_DUMP) /* MMU dump */ err = kbase_mmu_dump_mmap(kctx, vma, ®, &kaddr); if (err != 0) @@ -2888,17 +2903,22 @@ int kbase_context_mmap(struct kbase_context *const kctx, /* free the region on munmap */ free_on_close = 1; break; +#else + /* Illegal handle for direct map */ + err = -EINVAL; + goto out_unlock; +#endif /* defined(CONFIG_MALI_VECTOR_DUMP) */ #if MALI_USE_CSF case PFN_DOWN(BASEP_MEM_CSF_USER_REG_PAGE_HANDLE): - kbase_gpu_vm_unlock(kctx); + kbase_gpu_vm_unlock_with_pmode_sync(kctx); err = kbase_csf_cpu_mmap_user_reg_page(kctx, vma); goto out; case PFN_DOWN(BASEP_MEM_CSF_USER_IO_PAGES_HANDLE) ... 
PFN_DOWN(BASE_MEM_COOKIE_BASE) - 1: { - kbase_gpu_vm_unlock(kctx); - mutex_lock(&kctx->csf.lock); + kbase_gpu_vm_unlock_with_pmode_sync(kctx); + rt_mutex_lock(&kctx->csf.lock); err = kbase_csf_cpu_mmap_user_io_pages(kctx, vma); - mutex_unlock(&kctx->csf.lock); + rt_mutex_unlock(&kctx->csf.lock); goto out; } #endif @@ -2975,7 +2995,7 @@ int kbase_context_mmap(struct kbase_context *const kctx, err = kbase_cpu_mmap(kctx, reg, vma, kaddr, nr_pages, aligned_offset, free_on_close); - +#if defined(CONFIG_MALI_VECTOR_DUMP) if (vma->vm_pgoff == PFN_DOWN(BASE_MEM_MMU_DUMP_HANDLE)) { /* MMU dump - userspace should now have a reference on * the pages, so we can now free the kernel mapping @@ -2994,9 +3014,9 @@ int kbase_context_mmap(struct kbase_context *const kctx, */ vma->vm_pgoff = PFN_DOWN(vma->vm_start); } - +#endif /* defined(CONFIG_MALI_VECTOR_DUMP) */ out_unlock: - kbase_gpu_vm_unlock(kctx); + kbase_gpu_vm_unlock_with_pmode_sync(kctx); out: if (err) dev_err(dev, "mmap failed %d\n", err); @@ -3036,9 +3056,108 @@ void kbase_sync_mem_regions(struct kbase_context *kctx, } } -static int kbase_vmap_phy_pages(struct kbase_context *kctx, - struct kbase_va_region *reg, u64 offset_bytes, size_t size, - struct kbase_vmap_struct *map) +/** + * kbase_vmap_phy_pages_migrate_count_increment - Increment VMAP count for + * array of physical pages + * + * @pages: Array of pages. + * @page_count: Number of pages. + * @flags: Region flags. + * + * This function is supposed to be called only if page migration support + * is enabled in the driver. + * + * The counter of kernel CPU mappings of the physical pages involved in a + * mapping operation is incremented by 1. Errors are handled by making pages + * not movable. Permanent kernel mappings will be marked as not movable, too. + */ +static void kbase_vmap_phy_pages_migrate_count_increment(struct tagged_addr *pages, + size_t page_count, unsigned long flags) +{ + size_t i; + + if (!IS_ENABLED(CONFIG_PAGE_MIGRATION_SUPPORT)) + return; + + for (i = 0; i < page_count; i++) { + struct page *p = as_page(pages[i]); + struct kbase_page_metadata *page_md = kbase_page_private(p); + + /* Skip the 4KB page that is part of a large page, as the large page is + * excluded from the migration process. + */ + if (is_huge(pages[i]) || is_partial(pages[i])) + continue; + + spin_lock(&page_md->migrate_lock); + /* Mark permanent kernel mappings as NOT_MOVABLE because they're likely + * to stay mapped for a long time. However, keep on counting the number + * of mappings even for them: they don't represent an exception for the + * vmap_count. + * + * At the same time, errors need to be handled if a client tries to add + * too many mappings, hence a page may end up in the NOT_MOVABLE state + * anyway even if it's not a permanent kernel mapping. + */ + if (flags & KBASE_REG_PERMANENT_KERNEL_MAPPING) + page_md->status = PAGE_STATUS_SET(page_md->status, (u8)NOT_MOVABLE); + if (page_md->vmap_count < U8_MAX) + page_md->vmap_count++; + else + page_md->status = PAGE_STATUS_SET(page_md->status, (u8)NOT_MOVABLE); + spin_unlock(&page_md->migrate_lock); + } +} + +/** + * kbase_vunmap_phy_pages_migrate_count_decrement - Decrement VMAP count for + * array of physical pages + * + * @pages: Array of pages. + * @page_count: Number of pages. + * + * This function is supposed to be called only if page migration support + * is enabled in the driver. + * + * The counter of kernel CPU mappings of the physical pages involved in a + * mapping operation is decremented by 1. 
Errors are handled by making pages + * not movable. + */ +static void kbase_vunmap_phy_pages_migrate_count_decrement(struct tagged_addr *pages, + size_t page_count) +{ + size_t i; + + if (!IS_ENABLED(CONFIG_PAGE_MIGRATION_SUPPORT)) + return; + + for (i = 0; i < page_count; i++) { + struct page *p = as_page(pages[i]); + struct kbase_page_metadata *page_md = kbase_page_private(p); + + /* Skip the 4KB page that is part of a large page, as the large page is + * excluded from the migration process. + */ + if (is_huge(pages[i]) || is_partial(pages[i])) + continue; + + spin_lock(&page_md->migrate_lock); + /* Decrement the number of mappings for all kinds of pages, including + * pages which are NOT_MOVABLE (e.g. permanent kernel mappings). + * However, errors still need to be handled if a client tries to remove + * more mappings than created. + */ + if (page_md->vmap_count == 0) + page_md->status = PAGE_STATUS_SET(page_md->status, (u8)NOT_MOVABLE); + else + page_md->vmap_count--; + spin_unlock(&page_md->migrate_lock); + } +} + +static int kbase_vmap_phy_pages(struct kbase_context *kctx, struct kbase_va_region *reg, + u64 offset_bytes, size_t size, struct kbase_vmap_struct *map, + kbase_vmap_flag vmap_flags) { unsigned long page_index; unsigned int offset_in_page = offset_bytes & ~PAGE_MASK; @@ -3049,6 +3168,12 @@ static int kbase_vmap_phy_pages(struct kbase_context *kctx, pgprot_t prot; size_t i; + if (WARN_ON(vmap_flags & ~KBASE_VMAP_INPUT_FLAGS)) + return -EINVAL; + + if (WARN_ON(kbase_is_region_invalid_or_free(reg))) + return -EINVAL; + if (!size || !map || !reg->cpu_alloc || !reg->gpu_alloc) return -EINVAL; @@ -3065,6 +3190,17 @@ static int kbase_vmap_phy_pages(struct kbase_context *kctx, if (page_index + page_count > kbase_reg_current_backed_size(reg)) return -ENOMEM; + if ((vmap_flags & KBASE_VMAP_FLAG_PERMANENT_MAP_ACCOUNTING) && + (page_count > (KBASE_PERMANENTLY_MAPPED_MEM_LIMIT_PAGES - + atomic_read(&kctx->permanent_mapped_pages)))) { + dev_warn( + kctx->kbdev->dev, + "Request for %llu more pages mem needing a permanent mapping would breach limit %lu, currently at %d pages", + (u64)page_count, KBASE_PERMANENTLY_MAPPED_MEM_LIMIT_PAGES, + atomic_read(&kctx->permanent_mapped_pages)); + return -ENOMEM; + } + if (reg->flags & KBASE_REG_DONT_NEED) return -EINVAL; @@ -3091,6 +3227,13 @@ static int kbase_vmap_phy_pages(struct kbase_context *kctx, */ cpu_addr = vmap(pages, page_count, VM_MAP, prot); + /* If page migration is enabled, increment the number of VMA mappings + * of all physical pages. In case of errors, e.g. too many mappings, + * make the page not movable to prevent trouble. 
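
The permanent-mapping limit check above, and the matching vunmap accounting further down, both size the mapping as PFN_UP(offset_in_page + size), i.e. the number of CPU pages the byte range actually touches. A small runnable example of that arithmetic, assuming 4 KiB pages:

#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE  4096UL                      /* assumed 4 KiB pages */
#define PAGE_MASK  (~(PAGE_SIZE - 1))
#define PFN_UP(x)  (((x) + PAGE_SIZE - 1) / PAGE_SIZE)

int main(void)
{
        uint64_t offset_bytes = 0x3ff0;        /* 16 bytes before a page boundary */
        size_t size = 64;                      /* the range spills onto the next page */

        unsigned long offset_in_page = offset_bytes & ~PAGE_MASK;
        size_t page_count = PFN_UP(offset_in_page + size);

        /* 0xff0 + 64 bytes -> 2 pages are vmapped and accounted */
        printf("offset_in_page=%#lx page_count=%zu\n", offset_in_page, page_count);
        return 0;
}
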
+ */ + if (kbase_is_page_migration_enabled() && !kbase_mem_is_imported(reg->gpu_alloc->type)) + kbase_vmap_phy_pages_migrate_count_increment(page_array, page_count, reg->flags); + kfree(pages); if (!cpu_addr) @@ -3103,61 +3246,79 @@ static int kbase_vmap_phy_pages(struct kbase_context *kctx, map->gpu_pages = &kbase_get_gpu_phy_pages(reg)[page_index]; map->addr = (void *)((uintptr_t)cpu_addr + offset_in_page); map->size = size; - map->sync_needed = ((reg->flags & KBASE_REG_CPU_CACHED) != 0) && - !kbase_mem_is_imported(map->gpu_alloc->type); + map->flags = vmap_flags; + if ((reg->flags & KBASE_REG_CPU_CACHED) && !kbase_mem_is_imported(map->gpu_alloc->type)) + map->flags |= KBASE_VMAP_FLAG_SYNC_NEEDED; - if (map->sync_needed) + if (map->flags & KBASE_VMAP_FLAG_SYNC_NEEDED) kbase_sync_mem_regions(kctx, map, KBASE_SYNC_TO_CPU); + if (vmap_flags & KBASE_VMAP_FLAG_PERMANENT_MAP_ACCOUNTING) + atomic_add(page_count, &kctx->permanent_mapped_pages); + kbase_mem_phy_alloc_kernel_mapped(reg->cpu_alloc); + return 0; } -void *kbase_vmap_prot(struct kbase_context *kctx, u64 gpu_addr, size_t size, - unsigned long prot_request, struct kbase_vmap_struct *map) +void *kbase_vmap_reg(struct kbase_context *kctx, struct kbase_va_region *reg, u64 gpu_addr, + size_t size, unsigned long prot_request, struct kbase_vmap_struct *map, + kbase_vmap_flag vmap_flags) { - struct kbase_va_region *reg; - void *addr = NULL; u64 offset_bytes; struct kbase_mem_phy_alloc *cpu_alloc; struct kbase_mem_phy_alloc *gpu_alloc; int err; - kbase_gpu_vm_lock(kctx); + lockdep_assert_held(&kctx->reg_lock); - reg = kbase_region_tracker_find_region_enclosing_address(kctx, - gpu_addr); - if (kbase_is_region_invalid_or_free(reg)) - goto out_unlock; + if (WARN_ON(kbase_is_region_invalid_or_free(reg))) + return NULL; /* check access permissions can be satisfied * Intended only for checking KBASE_REG_{CPU,GPU}_{RD,WR} */ if ((reg->flags & prot_request) != prot_request) - goto out_unlock; + return NULL; offset_bytes = gpu_addr - (reg->start_pfn << PAGE_SHIFT); cpu_alloc = kbase_mem_phy_alloc_get(reg->cpu_alloc); gpu_alloc = kbase_mem_phy_alloc_get(reg->gpu_alloc); - err = kbase_vmap_phy_pages(kctx, reg, offset_bytes, size, map); + err = kbase_vmap_phy_pages(kctx, reg, offset_bytes, size, map, vmap_flags); if (err < 0) goto fail_vmap_phy_pages; - addr = map->addr; - -out_unlock: - kbase_gpu_vm_unlock(kctx); - return addr; + return map->addr; fail_vmap_phy_pages: - kbase_gpu_vm_unlock(kctx); kbase_mem_phy_alloc_put(cpu_alloc); kbase_mem_phy_alloc_put(gpu_alloc); - return NULL; } +void *kbase_vmap_prot(struct kbase_context *kctx, u64 gpu_addr, size_t size, + unsigned long prot_request, struct kbase_vmap_struct *map) +{ + struct kbase_va_region *reg; + void *addr = NULL; + + kbase_gpu_vm_lock(kctx); + + reg = kbase_region_tracker_find_region_enclosing_address(kctx, gpu_addr); + if (kbase_is_region_invalid_or_free(reg)) + goto out_unlock; + + if (reg->gpu_alloc->type != KBASE_MEM_TYPE_NATIVE) + goto out_unlock; + + addr = kbase_vmap_reg(kctx, reg, gpu_addr, size, prot_request, map, 0u); + +out_unlock: + kbase_gpu_vm_unlock(kctx); + return addr; +} + void *kbase_vmap(struct kbase_context *kctx, u64 gpu_addr, size_t size, struct kbase_vmap_struct *map) { @@ -3178,16 +3339,34 @@ static void kbase_vunmap_phy_pages(struct kbase_context *kctx, vunmap(addr); - if (map->sync_needed) + /* If page migration is enabled, decrement the number of VMA mappings + * for all physical pages. Now is a good time to do it because references + * haven't been released yet. 
+ */ + if (kbase_is_page_migration_enabled() && !kbase_mem_is_imported(map->gpu_alloc->type)) { + const size_t page_count = PFN_UP(map->offset_in_page + map->size); + struct tagged_addr *pages_array = map->cpu_pages; + + kbase_vunmap_phy_pages_migrate_count_decrement(pages_array, page_count); + } + + if (map->flags & KBASE_VMAP_FLAG_SYNC_NEEDED) kbase_sync_mem_regions(kctx, map, KBASE_SYNC_TO_DEVICE); + if (map->flags & KBASE_VMAP_FLAG_PERMANENT_MAP_ACCOUNTING) { + size_t page_count = PFN_UP(map->offset_in_page + map->size); + + WARN_ON(page_count > atomic_read(&kctx->permanent_mapped_pages)); + atomic_sub(page_count, &kctx->permanent_mapped_pages); + } kbase_mem_phy_alloc_kernel_unmapped(map->cpu_alloc); + map->offset_in_page = 0; map->cpu_pages = NULL; map->gpu_pages = NULL; map->addr = NULL; map->size = 0; - map->sync_needed = false; + map->flags = 0; } void kbase_vunmap(struct kbase_context *kctx, struct kbase_vmap_struct *map) @@ -3200,11 +3379,14 @@ KBASE_EXPORT_TEST_API(kbase_vunmap); static void kbasep_add_mm_counter(struct mm_struct *mm, int member, long value) { -#if (KERNEL_VERSION(4, 19, 0) <= LINUX_VERSION_CODE) - /* To avoid the build breakage due to an unexported kernel symbol - * 'mm_trace_rss_stat' from later kernels, i.e. from V4.19.0 onwards, - * we inline here the equivalent of 'add_mm_counter()' from linux - * kernel V5.4.0~8. +#if (KERNEL_VERSION(6, 2, 0) <= LINUX_VERSION_CODE) + /* To avoid the build breakage due to the type change in rss_stat, + * we inline here the equivalent of 'add_mm_counter()' from linux kernel V6.2. + */ + percpu_counter_add(&mm->rss_stat[member], value); +#elif (KERNEL_VERSION(5, 5, 0) <= LINUX_VERSION_CODE) + /* To avoid the build breakage due to an unexported kernel symbol 'mm_trace_rss_stat', + * we inline here the equivalent of 'add_mm_counter()' from linux kernel V5.5. 
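
Taken together, the hunks above move the whole permanent-mapping budget into the vmap/vunmap pair: the limit is checked against kctx->permanent_mapped_pages before mapping, the counter is incremented only once the mapping exists, and the same page count is subtracted again on unmap. A minimal user-space model of that check/add/sub lifecycle; LIMIT_PAGES is illustrative and not the driver's constant, and in the driver the check and increment run with the region lock held so they cannot race:

#include <stdatomic.h>
#include <stdio.h>

#define LIMIT_PAGES 32768          /* illustrative limit, not the driver's value */

static atomic_int mapped_pages;    /* counterpart of kctx->permanent_mapped_pages */

/* Reserve page_count pages against the limit, mirroring the check made for
 * KBASE_VMAP_FLAG_PERMANENT_MAP_ACCOUNTING before the mapping is created.
 */
static int reserve_pages(int page_count)
{
        if (page_count > LIMIT_PAGES - atomic_load(&mapped_pages))
                return -1;                     /* would breach the limit */
        atomic_fetch_add(&mapped_pages, page_count);
        return 0;
}

/* Counterpart of the subtraction done in kbase_vunmap_phy_pages(). */
static void release_pages(int page_count)
{
        atomic_fetch_sub(&mapped_pages, page_count);
}

int main(void)
{
        printf("reserve 1024: %d\n", reserve_pages(1024));   /* 0: fits */
        printf("reserve 32000: %d\n", reserve_pages(32000)); /* -1: over the limit */
        release_pages(1024);
        return 0;
}
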
*/ atomic_long_add(value, &mm->rss_stat.count[member]); #else @@ -3214,73 +3396,44 @@ static void kbasep_add_mm_counter(struct mm_struct *mm, int member, long value) void kbasep_os_process_page_usage_update(struct kbase_context *kctx, int pages) { - struct mm_struct *mm; + struct mm_struct *mm = kctx->process_mm; - rcu_read_lock(); - mm = rcu_dereference(kctx->process_mm); - if (mm) { - atomic_add(pages, &kctx->nonmapped_pages); -#ifdef SPLIT_RSS_COUNTING - kbasep_add_mm_counter(mm, MM_FILEPAGES, pages); -#else - spin_lock(&mm->page_table_lock); - kbasep_add_mm_counter(mm, MM_FILEPAGES, pages); - spin_unlock(&mm->page_table_lock); -#endif - } - rcu_read_unlock(); -} - -static void kbasep_os_process_page_usage_drain(struct kbase_context *kctx) -{ - int pages; - struct mm_struct *mm; - - spin_lock(&kctx->mm_update_lock); - mm = rcu_dereference_protected(kctx->process_mm, lockdep_is_held(&kctx->mm_update_lock)); - if (!mm) { - spin_unlock(&kctx->mm_update_lock); + if (unlikely(!mm)) return; - } - rcu_assign_pointer(kctx->process_mm, NULL); - spin_unlock(&kctx->mm_update_lock); - synchronize_rcu(); - - pages = atomic_xchg(&kctx->nonmapped_pages, 0); + atomic_add(pages, &kctx->nonmapped_pages); #ifdef SPLIT_RSS_COUNTING - kbasep_add_mm_counter(mm, MM_FILEPAGES, -pages); + kbasep_add_mm_counter(mm, MM_FILEPAGES, pages); #else spin_lock(&mm->page_table_lock); - kbasep_add_mm_counter(mm, MM_FILEPAGES, -pages); + kbasep_add_mm_counter(mm, MM_FILEPAGES, pages); spin_unlock(&mm->page_table_lock); #endif } +static void kbase_special_vm_open(struct vm_area_struct *vma) +{ + struct kbase_context *kctx = vma->vm_private_data; + + kbase_file_inc_cpu_mapping_count(kctx->kfile); +} + static void kbase_special_vm_close(struct vm_area_struct *vma) { - struct kbase_context *kctx; + struct kbase_context *kctx = vma->vm_private_data; - kctx = vma->vm_private_data; - kbasep_os_process_page_usage_drain(kctx); + kbase_file_dec_cpu_mapping_count(kctx->kfile); } static const struct vm_operations_struct kbase_vm_special_ops = { + .open = kbase_special_vm_open, .close = kbase_special_vm_close, }; static int kbase_tracking_page_setup(struct kbase_context *kctx, struct vm_area_struct *vma) { - /* check that this is the only tracking page */ - spin_lock(&kctx->mm_update_lock); - if (rcu_dereference_protected(kctx->process_mm, lockdep_is_held(&kctx->mm_update_lock))) { - spin_unlock(&kctx->mm_update_lock); - return -EFAULT; - } - - rcu_assign_pointer(kctx->process_mm, current->mm); - - spin_unlock(&kctx->mm_update_lock); + if (vma_pages(vma) != 1) + return -EINVAL; /* no real access */ vm_flags_clear(vma, VM_READ | VM_MAYREAD | VM_WRITE | VM_MAYWRITE | VM_EXEC | VM_MAYEXEC); @@ -3288,6 +3441,7 @@ static int kbase_tracking_page_setup(struct kbase_context *kctx, struct vm_area_ vma->vm_ops = &kbase_vm_special_ops; vma->vm_private_data = kctx; + kbase_file_inc_cpu_mapping_count(kctx->kfile); return 0; } @@ -3311,9 +3465,27 @@ static unsigned long get_queue_doorbell_pfn(struct kbase_device *kbdev, (u64)queue->doorbell_nr * CSF_HW_DOORBELL_PAGE_SIZE)); } +static int +#if (KERNEL_VERSION(5, 13, 0) <= LINUX_VERSION_CODE || \ + KERNEL_VERSION(5, 11, 0) > LINUX_VERSION_CODE) +kbase_csf_user_io_pages_vm_mremap(struct vm_area_struct *vma) +#else +kbase_csf_user_io_pages_vm_mremap(struct vm_area_struct *vma, unsigned long flags) +#endif +{ + pr_debug("Unexpected call to mremap method for User IO pages mapping vma\n"); + return -EINVAL; +} + +static int kbase_csf_user_io_pages_vm_split(struct vm_area_struct *vma, unsigned long addr) +{ 
+ pr_debug("Unexpected call to split method for User IO pages mapping vma\n"); + return -EINVAL; +} + static void kbase_csf_user_io_pages_vm_open(struct vm_area_struct *vma) { - WARN(1, "Unexpected attempt to clone private vma\n"); + pr_debug("Unexpected call to the open method for User IO pages mapping vma\n"); vma->vm_private_data = NULL; } @@ -3324,12 +3496,16 @@ static void kbase_csf_user_io_pages_vm_close(struct vm_area_struct *vma) struct kbase_device *kbdev; int err; bool reset_prevented = false; + struct kbase_file *kfile; - if (WARN_ON(!queue)) + if (!queue) { + pr_debug("Close method called for the new User IO pages mapping vma\n"); return; + } kctx = queue->kctx; kbdev = kctx->kbdev; + kfile = kctx->kfile; err = kbase_reset_gpu_prevent_and_wait(kbdev); if (err) @@ -3340,15 +3516,16 @@ static void kbase_csf_user_io_pages_vm_close(struct vm_area_struct *vma) else reset_prevented = true; - mutex_lock(&kctx->csf.lock); - kbase_csf_queue_unbind(queue); - mutex_unlock(&kctx->csf.lock); + rt_mutex_lock(&kctx->csf.lock); + kbase_csf_queue_unbind(queue, is_process_exiting(vma)); + rt_mutex_unlock(&kctx->csf.lock); if (reset_prevented) kbase_reset_gpu_allow(kbdev); + kbase_file_dec_cpu_mapping_count(kfile); /* Now as the vma is closed, drop the reference on mali device file */ - fput(kctx->filp); + fput(kfile->filp); } #if (KERNEL_VERSION(4, 11, 0) > LINUX_VERSION_CODE) @@ -3370,9 +3547,12 @@ static vm_fault_t kbase_csf_user_io_pages_vm_fault(struct vm_fault *vmf) struct memory_group_manager_device *mgm_dev; /* Few sanity checks up front */ - if ((nr_pages != BASEP_QUEUE_NR_MMAP_USER_PAGES) || - (vma->vm_pgoff != queue->db_file_offset)) + if (!queue || (nr_pages != BASEP_QUEUE_NR_MMAP_USER_PAGES) || + (vma->vm_pgoff != queue->db_file_offset)) { + pr_warn("Unexpected CPU page fault on User IO pages mapping for process %s tgid %d pid %d\n", + current->comm, current->tgid, current->pid); return VM_FAULT_SIGBUS; + } kbdev = queue->kctx->kbdev; mgm_dev = kbdev->mgm_dev; @@ -3382,13 +3562,6 @@ static vm_fault_t kbase_csf_user_io_pages_vm_fault(struct vm_fault *vmf) /* Always map the doorbell page as uncached */ doorbell_pgprot = pgprot_device(vma->vm_page_prot); -#if ((KERNEL_VERSION(4, 4, 147) >= LINUX_VERSION_CODE) || \ - ((KERNEL_VERSION(4, 6, 0) > LINUX_VERSION_CODE) && \ - (KERNEL_VERSION(4, 5, 0) <= LINUX_VERSION_CODE))) - vma->vm_page_prot = doorbell_pgprot; - input_page_pgprot = doorbell_pgprot; - output_page_pgprot = doorbell_pgprot; -#else if (kbdev->system_coherency == COHERENCY_NONE) { input_page_pgprot = pgprot_writecombine(vma->vm_page_prot); output_page_pgprot = pgprot_writecombine(vma->vm_page_prot); @@ -3396,7 +3569,6 @@ static vm_fault_t kbase_csf_user_io_pages_vm_fault(struct vm_fault *vmf) input_page_pgprot = vma->vm_page_prot; output_page_pgprot = vma->vm_page_prot; } -#endif doorbell_cpu_addr = vma->vm_start; @@ -3435,6 +3607,12 @@ exit: static const struct vm_operations_struct kbase_csf_user_io_pages_vm_ops = { .open = kbase_csf_user_io_pages_vm_open, .close = kbase_csf_user_io_pages_vm_close, +#if KERNEL_VERSION(5, 11, 0) <= LINUX_VERSION_CODE + .may_split = kbase_csf_user_io_pages_vm_split, +#else + .split = kbase_csf_user_io_pages_vm_split, +#endif + .mremap = kbase_csf_user_io_pages_vm_mremap, .fault = kbase_csf_user_io_pages_vm_fault }; @@ -3500,6 +3678,7 @@ static int kbase_csf_cpu_mmap_user_io_pages(struct kbase_context *kctx, /* Also adjust the vm_pgoff */ vma->vm_pgoff = queue->db_file_offset; + kbase_file_inc_cpu_mapping_count(kctx->kfile); return 0; map_failed: 
@@ -3514,13 +3693,78 @@ map_failed: return err; } +/** + * kbase_csf_user_reg_vm_open - VMA open function for the USER page + * + * @vma: Pointer to the struct containing information about + * the userspace mapping of USER page. + * Note: + * This function isn't expected to be called. If called (i.e> mremap), + * set private_data as NULL to indicate to close() and fault() functions. + */ +static void kbase_csf_user_reg_vm_open(struct vm_area_struct *vma) +{ + pr_debug("Unexpected call to the open method for USER register mapping"); + vma->vm_private_data = NULL; +} + +/** + * kbase_csf_user_reg_vm_close - VMA close function for the USER page + * + * @vma: Pointer to the struct containing information about + * the userspace mapping of USER page. + */ static void kbase_csf_user_reg_vm_close(struct vm_area_struct *vma) { struct kbase_context *kctx = vma->vm_private_data; + struct kbase_device *kbdev; + struct kbase_file *kfile; + + if (unlikely(!kctx)) { + pr_debug("Close function called for the unexpected mapping"); + return; + } + + kbdev = kctx->kbdev; + kfile = kctx->kfile; + + if (unlikely(!kctx->csf.user_reg.vma)) + dev_warn(kbdev->dev, "user_reg VMA pointer unexpectedly NULL for ctx %d_%d", + kctx->tgid, kctx->id); + + mutex_lock(&kbdev->csf.reg_lock); + list_del_init(&kctx->csf.user_reg.link); + mutex_unlock(&kbdev->csf.reg_lock); - WARN_ON(!kctx->csf.user_reg_vma); + kctx->csf.user_reg.vma = NULL; - kctx->csf.user_reg_vma = NULL; + kbase_file_dec_cpu_mapping_count(kfile); + /* Now as the VMA is closed, drop the reference on mali device file */ + fput(kfile->filp); +} + +/** + * kbase_csf_user_reg_vm_mremap - VMA mremap function for the USER page + * + * @vma: Pointer to the struct containing information about + * the userspace mapping of USER page. + * + * Return: -EINVAL + * + * Note: + * User space must not attempt mremap on USER page mapping. + * This function will return an error to fail the attempt. 
+ */ +static int +#if ((KERNEL_VERSION(5, 13, 0) <= LINUX_VERSION_CODE) || \ + (KERNEL_VERSION(5, 11, 0) > LINUX_VERSION_CODE)) +kbase_csf_user_reg_vm_mremap(struct vm_area_struct *vma) +#else +kbase_csf_user_reg_vm_mremap(struct vm_area_struct *vma, unsigned long flags) +#endif +{ + pr_debug("Unexpected call to mremap method for USER page mapping vma\n"); + return -EINVAL; } #if (KERNEL_VERSION(4, 11, 0) > LINUX_VERSION_CODE) @@ -3533,44 +3777,52 @@ static vm_fault_t kbase_csf_user_reg_vm_fault(struct vm_fault *vmf) struct vm_area_struct *vma = vmf->vma; #endif struct kbase_context *kctx = vma->vm_private_data; - struct kbase_device *kbdev = kctx->kbdev; - struct memory_group_manager_device *mgm_dev = kbdev->mgm_dev; - unsigned long pfn = PFN_DOWN(kbdev->reg_start + USER_BASE); + struct kbase_device *kbdev; + struct memory_group_manager_device *mgm_dev; + unsigned long pfn; size_t nr_pages = PFN_DOWN(vma->vm_end - vma->vm_start); vm_fault_t ret = VM_FAULT_SIGBUS; unsigned long flags; /* Few sanity checks up front */ - if (WARN_ON(nr_pages != 1) || - WARN_ON(vma != kctx->csf.user_reg_vma) || - WARN_ON(vma->vm_pgoff != - PFN_DOWN(BASEP_MEM_CSF_USER_REG_PAGE_HANDLE))) + + if (!kctx || (nr_pages != 1) || (vma != kctx->csf.user_reg.vma) || + (vma->vm_pgoff != kctx->csf.user_reg.file_offset)) { + pr_err("Unexpected CPU page fault on USER page mapping for process %s tgid %d pid %d\n", + current->comm, current->tgid, current->pid); return VM_FAULT_SIGBUS; + } + + kbdev = kctx->kbdev; + mgm_dev = kbdev->mgm_dev; + pfn = PFN_DOWN(kbdev->reg_start + USER_BASE); mutex_lock(&kbdev->csf.reg_lock); + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); - /* Don't map in the actual register page if GPU is powered down. - * Always map in the dummy page in no mali builds. + /* Dummy page will be mapped during GPU off. + * + * In no mail builds, always map in the dummy page. 
*/ -#if IS_ENABLED(CONFIG_MALI_NO_MALI) - pfn = PFN_DOWN(as_phys_addr_t(kbdev->csf.dummy_user_reg_page)); -#else - if (!kbdev->pm.backend.gpu_powered) - pfn = PFN_DOWN(as_phys_addr_t(kbdev->csf.dummy_user_reg_page)); -#endif + if (IS_ENABLED(CONFIG_MALI_NO_MALI) || !kbdev->pm.backend.gpu_powered) + pfn = PFN_DOWN(as_phys_addr_t(kbdev->csf.user_reg.dummy_page)); spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + list_move_tail(&kctx->csf.user_reg.link, &kbdev->csf.user_reg.list); ret = mgm_dev->ops.mgm_vmf_insert_pfn_prot(mgm_dev, KBASE_MEM_GROUP_CSF_FW, vma, vma->vm_start, pfn, vma->vm_page_prot); + mutex_unlock(&kbdev->csf.reg_lock); return ret; } static const struct vm_operations_struct kbase_csf_user_reg_vm_ops = { + .open = kbase_csf_user_reg_vm_open, .close = kbase_csf_user_reg_vm_close, + .mremap = kbase_csf_user_reg_vm_mremap, .fault = kbase_csf_user_reg_vm_fault }; @@ -3578,9 +3830,10 @@ static int kbase_csf_cpu_mmap_user_reg_page(struct kbase_context *kctx, struct vm_area_struct *vma) { size_t nr_pages = PFN_DOWN(vma->vm_end - vma->vm_start); + struct kbase_device *kbdev = kctx->kbdev; /* Few sanity checks */ - if (kctx->csf.user_reg_vma) + if (kctx->csf.user_reg.vma) return -EBUSY; if (nr_pages != 1) @@ -3599,11 +3852,25 @@ static int kbase_csf_cpu_mmap_user_reg_page(struct kbase_context *kctx, */ vm_flags_set(vma, VM_PFNMAP); - kctx->csf.user_reg_vma = vma; + kctx->csf.user_reg.vma = vma; + + mutex_lock(&kbdev->csf.reg_lock); + kctx->csf.user_reg.file_offset = kbdev->csf.user_reg.file_offset++; + mutex_unlock(&kbdev->csf.reg_lock); + /* Make VMA point to the special internal file, but don't drop the + * reference on mali device file (that would be done later when the + * VMA is closed). + */ + vma->vm_file = kctx->kbdev->csf.user_reg.filp; + get_file(vma->vm_file); + + /* Also adjust the vm_pgoff */ + vma->vm_pgoff = kctx->csf.user_reg.file_offset; vma->vm_ops = &kbase_csf_user_reg_vm_ops; vma->vm_private_data = kctx; + kbase_file_inc_cpu_mapping_count(kctx->kfile); return 0; } diff --git a/mali_kbase/mali_kbase_mem_linux.h b/mali_kbase/mali_kbase_mem_linux.h index 1f6877a..6dda44b 100644 --- a/mali_kbase/mali_kbase_mem_linux.h +++ b/mali_kbase/mali_kbase_mem_linux.h @@ -217,6 +217,26 @@ int kbase_mem_evictable_make(struct kbase_mem_phy_alloc *gpu_alloc); */ bool kbase_mem_evictable_unmake(struct kbase_mem_phy_alloc *alloc); +typedef unsigned int kbase_vmap_flag; + +/* Sync operations are needed on beginning and ending of access to kernel-mapped GPU memory. + * + * This is internal to the struct kbase_vmap_struct and should not be passed in by callers of + * kbase_vmap-related functions. + */ +#define KBASE_VMAP_FLAG_SYNC_NEEDED (((kbase_vmap_flag)1) << 0) + +/* Permanently mapped memory accounting (including enforcing limits) should be done on the + * kernel-mapped GPU memory. + * + * This should be used if the kernel mapping is going to live for a potentially long time, for + * example if it will persist after the caller has returned. 
+ */ +#define KBASE_VMAP_FLAG_PERMANENT_MAP_ACCOUNTING (((kbase_vmap_flag)1) << 1) + +/* Set of flags that can be passed into kbase_vmap-related functions */ +#define KBASE_VMAP_INPUT_FLAGS (KBASE_VMAP_FLAG_PERMANENT_MAP_ACCOUNTING) + struct kbase_vmap_struct { off_t offset_in_page; struct kbase_mem_phy_alloc *cpu_alloc; @@ -225,9 +245,55 @@ struct kbase_vmap_struct { struct tagged_addr *gpu_pages; void *addr; size_t size; - bool sync_needed; + kbase_vmap_flag flags; }; +/** + * kbase_mem_shrink_gpu_mapping - Shrink the GPU mapping of an allocation + * @kctx: Context the region belongs to + * @reg: The GPU region or NULL if there isn't one + * @new_pages: The number of pages after the shrink + * @old_pages: The number of pages before the shrink + * + * Return: 0 on success, negative -errno on error + * + * Unmap the shrunk pages from the GPU mapping. Note that the size of the region + * itself is unmodified as we still need to reserve the VA, only the page tables + * will be modified by this function. + */ +int kbase_mem_shrink_gpu_mapping(struct kbase_context *kctx, struct kbase_va_region *reg, + u64 new_pages, u64 old_pages); + +/** + * kbase_vmap_reg - Map part of an existing region into the kernel safely, only if the requested + * access permissions are supported + * @kctx: Context @reg belongs to + * @reg: The GPU region to map part of + * @gpu_addr: Start address of VA range to map, which must be within @reg + * @size: Size of VA range, which when added to @gpu_addr must be within @reg + * @prot_request: Flags indicating how the caller will then access the memory + * @map: Structure to be given to kbase_vunmap() on freeing + * @vmap_flags: Flags of type kbase_vmap_flag + * + * Return: Kernel-accessible CPU pointer to the VA range, or NULL on error + * + * Variant of kbase_vmap_prot() that can be used given an existing region. + * + * The caller must satisfy one of the following for @reg: + * * It must have been obtained by finding it on the region tracker, and the region lock must not + * have been released in the mean time. + * * Or, it must have been refcounted with a call to kbase_va_region_alloc_get(), and the region + * lock is now held again. + * * Or, @reg has had NO_USER_FREE set at creation time or under the region lock, and the + * region lock is now held again. + * + * The acceptable @vmap_flags are those in %KBASE_VMAP_INPUT_FLAGS. + * + * Refer to kbase_vmap_prot() for more information on the operation of this function. 
+ */ +void *kbase_vmap_reg(struct kbase_context *kctx, struct kbase_va_region *reg, u64 gpu_addr, + size_t size, unsigned long prot_request, struct kbase_vmap_struct *map, + kbase_vmap_flag vmap_flags); /** * kbase_vmap_prot - Map a GPU VA range into the kernel safely, only if the @@ -439,18 +505,7 @@ u32 kbase_get_cache_line_alignment(struct kbase_device *kbdev); static inline vm_fault_t vmf_insert_pfn_prot(struct vm_area_struct *vma, unsigned long addr, unsigned long pfn, pgprot_t pgprot) { - int err; - -#if ((KERNEL_VERSION(4, 4, 147) >= LINUX_VERSION_CODE) || \ - ((KERNEL_VERSION(4, 6, 0) > LINUX_VERSION_CODE) && \ - (KERNEL_VERSION(4, 5, 0) <= LINUX_VERSION_CODE))) - if (pgprot_val(pgprot) != pgprot_val(vma->vm_page_prot)) - return VM_FAULT_SIGBUS; - - err = vm_insert_pfn(vma, addr, pfn); -#else - err = vm_insert_pfn_prot(vma, addr, pfn, pgprot); -#endif + int err = vm_insert_pfn_prot(vma, addr, pfn, pgprot); if (unlikely(err == -ENOMEM)) return VM_FAULT_OOM; diff --git a/mali_kbase/mali_kbase_mem_migrate.c b/mali_kbase/mali_kbase_mem_migrate.c new file mode 100644 index 0000000..4c2cc0f --- /dev/null +++ b/mali_kbase/mali_kbase_mem_migrate.c @@ -0,0 +1,712 @@ +// SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note +/* + * + * (C) COPYRIGHT 2022-2023 ARM Limited. All rights reserved. + * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. + * + */ + +/** + * DOC: Base kernel page migration implementation. + */ +#include <linux/migrate.h> + +#include <mali_kbase.h> +#include <mali_kbase_mem_migrate.h> +#include <mmu/mali_kbase_mmu.h> + +/* Global integer used to determine if module parameter value has been + * provided and if page migration feature is enabled. + * Feature is disabled on all platforms by default. + */ +#if !IS_ENABLED(CONFIG_PAGE_MIGRATION_SUPPORT) +/* If page migration support is explicitly compiled out, there should be no way to change + * this int. Its value is automatically 0 as a global. + */ +const int kbase_page_migration_enabled; +/* module_param is not called so this value cannot be changed at insmod when compiled + * without support for page migration. 
+ */ +#else +/* -1 as default, 0 when manually set as off and 1 when manually set as on */ +int kbase_page_migration_enabled = -1; +module_param(kbase_page_migration_enabled, int, 0444); +MODULE_PARM_DESC(kbase_page_migration_enabled, + "Explicitly enable or disable page migration with 1 or 0 respectively."); +#endif /* !IS_ENABLED(CONFIG_PAGE_MIGRATION_SUPPORT) */ + +KBASE_EXPORT_TEST_API(kbase_page_migration_enabled); + +bool kbase_is_page_migration_enabled(void) +{ + /* Handle uninitialised int case */ + if (kbase_page_migration_enabled < 0) + return false; + return IS_ENABLED(CONFIG_PAGE_MIGRATION_SUPPORT) && kbase_page_migration_enabled; +} +KBASE_EXPORT_SYMBOL(kbase_is_page_migration_enabled); + +#if (KERNEL_VERSION(6, 0, 0) <= LINUX_VERSION_CODE) +static const struct movable_operations movable_ops; +#endif + +bool kbase_alloc_page_metadata(struct kbase_device *kbdev, struct page *p, dma_addr_t dma_addr, + u8 group_id) +{ + struct kbase_page_metadata *page_md; + + /* A check for kbase_page_migration_enabled would help here too but it's already being + * checked in the only caller of this function. + */ + if (!IS_ENABLED(CONFIG_PAGE_MIGRATION_SUPPORT)) + return false; + + page_md = kzalloc(sizeof(struct kbase_page_metadata), GFP_KERNEL); + if (!page_md) + return false; + + SetPagePrivate(p); + set_page_private(p, (unsigned long)page_md); + page_md->dma_addr = dma_addr; + page_md->status = PAGE_STATUS_SET(page_md->status, (u8)ALLOCATE_IN_PROGRESS); + page_md->vmap_count = 0; + page_md->group_id = group_id; + spin_lock_init(&page_md->migrate_lock); + + lock_page(p); +#if (KERNEL_VERSION(6, 0, 0) <= LINUX_VERSION_CODE) + __SetPageMovable(p, &movable_ops); + page_md->status = PAGE_MOVABLE_SET(page_md->status); +#else + /* In some corner cases, the driver may attempt to allocate memory pages + * even before the device file is open and the mapping for address space + * operations is created. In that case, it is impossible to assign address + * space operations to memory pages: simply pretend that they are movable, + * even if they are not. + * + * The page will go through all state transitions but it will never be + * actually considered movable by the kernel. This is due to the fact that + * the page cannot be marked as NOT_MOVABLE upon creation, otherwise the + * memory pool will always refuse to add it to the pool and schedule + * a worker thread to free it later. + * + * Page metadata may seem redundant in this case, but they are not, + * because memory pools expect metadata to be present when page migration + * is enabled and because the pages may always return to memory pools and + * gain the movable property later on in their life cycle. 
+ */ + if (kbdev->mem_migrate.inode && kbdev->mem_migrate.inode->i_mapping) { + __SetPageMovable(p, kbdev->mem_migrate.inode->i_mapping); + page_md->status = PAGE_MOVABLE_SET(page_md->status); + } +#endif + unlock_page(p); + + return true; +} + +static void kbase_free_page_metadata(struct kbase_device *kbdev, struct page *p, u8 *group_id) +{ + struct device *const dev = kbdev->dev; + struct kbase_page_metadata *page_md; + dma_addr_t dma_addr; + + if (!IS_ENABLED(CONFIG_PAGE_MIGRATION_SUPPORT)) + return; + page_md = kbase_page_private(p); + if (!page_md) + return; + + if (group_id) + *group_id = page_md->group_id; + dma_addr = kbase_dma_addr(p); + dma_unmap_page(dev, dma_addr, PAGE_SIZE, DMA_BIDIRECTIONAL); + + kfree(page_md); + set_page_private(p, 0); + ClearPagePrivate(p); +} + +#if IS_ENABLED(CONFIG_PAGE_MIGRATION_SUPPORT) +/* This function is only called when page migration + * support is not explicitly compiled out. + */ +static void kbase_free_pages_worker(struct work_struct *work) +{ + struct kbase_mem_migrate *mem_migrate = + container_of(work, struct kbase_mem_migrate, free_pages_work); + struct kbase_device *kbdev = container_of(mem_migrate, struct kbase_device, mem_migrate); + struct page *p, *tmp; + struct kbase_page_metadata *page_md; + LIST_HEAD(free_list); + + spin_lock(&mem_migrate->free_pages_lock); + list_splice_init(&mem_migrate->free_pages_list, &free_list); + spin_unlock(&mem_migrate->free_pages_lock); + list_for_each_entry_safe(p, tmp, &free_list, lru) { + u8 group_id = 0; + list_del_init(&p->lru); + + lock_page(p); + page_md = kbase_page_private(p); + if (page_md && IS_PAGE_MOVABLE(page_md->status)) { + __ClearPageMovable(p); + page_md->status = PAGE_MOVABLE_CLEAR(page_md->status); + } + unlock_page(p); + + kbase_free_page_metadata(kbdev, p, &group_id); + kbdev->mgm_dev->ops.mgm_free_page(kbdev->mgm_dev, group_id, p, 0); + } +} +#endif + +void kbase_free_page_later(struct kbase_device *kbdev, struct page *p) +{ + struct kbase_mem_migrate *mem_migrate = &kbdev->mem_migrate; + + if (!IS_ENABLED(CONFIG_PAGE_MIGRATION_SUPPORT)) + return; + spin_lock(&mem_migrate->free_pages_lock); + list_add(&p->lru, &mem_migrate->free_pages_list); + spin_unlock(&mem_migrate->free_pages_lock); +} + +/** + * kbasep_migrate_page_pt_mapped - Migrate a memory page that is mapped + * in a PGD of kbase_mmu_table. + * + * @old_page: Existing PGD page to remove + * @new_page: Destination for migrating the existing PGD page to + * + * Replace an existing PGD page with a new page by migrating its content. More specifically: + * the new page shall replace the existing PGD page in the MMU page table. Before returning, + * the new page shall be set as movable and not isolated, while the old page shall lose + * the movable property. The meta data attached to the PGD page is transferred to the + * new (replacement) page. + * + * This function returns early with an error if called when not compiled with + * CONFIG_PAGE_MIGRATION_SUPPORT. + * + * Return: 0 on migration success, or -EAGAIN for a later retry. Otherwise it's a failure + * and the migration is aborted. 
+ */ +static int kbasep_migrate_page_pt_mapped(struct page *old_page, struct page *new_page) +{ + struct kbase_page_metadata *page_md = kbase_page_private(old_page); + struct kbase_context *kctx = page_md->data.pt_mapped.mmut->kctx; + struct kbase_device *kbdev = kctx->kbdev; + dma_addr_t old_dma_addr = page_md->dma_addr; + dma_addr_t new_dma_addr; + int ret; + + if (!IS_ENABLED(CONFIG_PAGE_MIGRATION_SUPPORT)) + return -EINVAL; + + /* Create a new dma map for the new page */ + new_dma_addr = dma_map_page(kbdev->dev, new_page, 0, PAGE_SIZE, DMA_BIDIRECTIONAL); + if (dma_mapping_error(kbdev->dev, new_dma_addr)) + return -ENOMEM; + + /* Lock context to protect access to the page in physical allocation. + * This blocks the CPU page fault handler from remapping pages. + * Only MCU's mmut is device wide, i.e. no corresponding kctx. + */ + kbase_gpu_vm_lock_with_pmode_sync(kctx); + + ret = kbase_mmu_migrate_page( + as_tagged(page_to_phys(old_page)), as_tagged(page_to_phys(new_page)), old_dma_addr, + new_dma_addr, PGD_VPFN_LEVEL_GET_LEVEL(page_md->data.pt_mapped.pgd_vpfn_level)); + + if (ret == 0) { + dma_unmap_page(kbdev->dev, old_dma_addr, PAGE_SIZE, DMA_BIDIRECTIONAL); + __ClearPageMovable(old_page); + ClearPagePrivate(old_page); + put_page(old_page); + +#if (KERNEL_VERSION(6, 0, 0) <= LINUX_VERSION_CODE) + __SetPageMovable(new_page, &movable_ops); + page_md->status = PAGE_MOVABLE_SET(page_md->status); +#else + if (kbdev->mem_migrate.inode->i_mapping) { + __SetPageMovable(new_page, kbdev->mem_migrate.inode->i_mapping); + page_md->status = PAGE_MOVABLE_SET(page_md->status); + } +#endif + SetPagePrivate(new_page); + get_page(new_page); + } else + dma_unmap_page(kbdev->dev, new_dma_addr, PAGE_SIZE, DMA_BIDIRECTIONAL); + + /* Page fault handler for CPU mapping unblocked. */ + kbase_gpu_vm_unlock_with_pmode_sync(kctx); + + return ret; +} + +/* + * kbasep_migrate_page_allocated_mapped - Migrate a memory page that is both + * allocated and mapped. + * + * @old_page: Page to remove. + * @new_page: Page to add. + * + * Replace an old page with a new page by migrating its content and all its + * CPU and GPU mappings. More specifically: the new page shall replace the + * old page in the MMU page table, as well as in the page array of the physical + * allocation, which is used to create CPU mappings. Before returning, the new + * page shall be set as movable and not isolated, while the old page shall lose + * the movable property. + * + * This function returns early with an error if called when not compiled with + * CONFIG_PAGE_MIGRATION_SUPPORT. + */ +static int kbasep_migrate_page_allocated_mapped(struct page *old_page, struct page *new_page) +{ + struct kbase_page_metadata *page_md = kbase_page_private(old_page); + struct kbase_context *kctx = page_md->data.mapped.mmut->kctx; + dma_addr_t old_dma_addr, new_dma_addr; + int ret; + + if (!IS_ENABLED(CONFIG_PAGE_MIGRATION_SUPPORT)) + return -EINVAL; + old_dma_addr = page_md->dma_addr; + new_dma_addr = dma_map_page(kctx->kbdev->dev, new_page, 0, PAGE_SIZE, DMA_BIDIRECTIONAL); + if (dma_mapping_error(kctx->kbdev->dev, new_dma_addr)) + return -ENOMEM; + + /* Lock context to protect access to array of pages in physical allocation. + * This blocks the CPU page fault handler from remapping pages. + */ + kbase_gpu_vm_lock_with_pmode_sync(kctx); + + /* Unmap the old physical range. 
*/ + unmap_mapping_range(kctx->kfile->filp->f_inode->i_mapping, + page_md->data.mapped.vpfn << PAGE_SHIFT, + PAGE_SIZE, 1); + + ret = kbase_mmu_migrate_page(as_tagged(page_to_phys(old_page)), + as_tagged(page_to_phys(new_page)), old_dma_addr, new_dma_addr, + MIDGARD_MMU_BOTTOMLEVEL); + + if (ret == 0) { + dma_unmap_page(kctx->kbdev->dev, old_dma_addr, PAGE_SIZE, DMA_BIDIRECTIONAL); + + SetPagePrivate(new_page); + get_page(new_page); + + /* Clear PG_movable from the old page and release reference. */ + ClearPagePrivate(old_page); + __ClearPageMovable(old_page); + put_page(old_page); + + /* Set PG_movable to the new page. */ +#if (KERNEL_VERSION(6, 0, 0) <= LINUX_VERSION_CODE) + __SetPageMovable(new_page, &movable_ops); + page_md->status = PAGE_MOVABLE_SET(page_md->status); +#else + if (kctx->kbdev->mem_migrate.inode->i_mapping) { + __SetPageMovable(new_page, kctx->kbdev->mem_migrate.inode->i_mapping); + page_md->status = PAGE_MOVABLE_SET(page_md->status); + } +#endif + } else + dma_unmap_page(kctx->kbdev->dev, new_dma_addr, PAGE_SIZE, DMA_BIDIRECTIONAL); + + /* Page fault handler for CPU mapping unblocked. */ + kbase_gpu_vm_unlock_with_pmode_sync(kctx); + + return ret; +} + +/** + * kbase_page_isolate - Isolate a page for migration. + * + * @p: Pointer of the page struct of page to isolate. + * @mode: LRU Isolation modes. + * + * Callback function for Linux to isolate a page and prepare it for migration. + * This callback is not registered if compiled without CONFIG_PAGE_MIGRATION_SUPPORT. + * + * Return: true on success, false otherwise. + */ +static bool kbase_page_isolate(struct page *p, isolate_mode_t mode) +{ + bool status_mem_pool = false; + struct kbase_mem_pool *mem_pool = NULL; + struct kbase_page_metadata *page_md = kbase_page_private(p); + + if (!IS_ENABLED(CONFIG_PAGE_MIGRATION_SUPPORT)) + return false; + CSTD_UNUSED(mode); + + if (!page_md || !IS_PAGE_MOVABLE(page_md->status)) + return false; + + if (!spin_trylock(&page_md->migrate_lock)) + return false; + + if (WARN_ON(IS_PAGE_ISOLATED(page_md->status))) { + spin_unlock(&page_md->migrate_lock); + return false; + } + + switch (PAGE_STATUS_GET(page_md->status)) { + case MEM_POOL: + /* Prepare to remove page from memory pool later only if pool is not + * in the process of termination. + */ + mem_pool = page_md->data.mem_pool.pool; + status_mem_pool = true; + preempt_disable(); + atomic_inc(&mem_pool->isolation_in_progress_cnt); + break; + case ALLOCATED_MAPPED: + /* Mark the page into isolated state, but only if it has no + * kernel CPU mappings + */ + if (page_md->vmap_count == 0) + page_md->status = PAGE_ISOLATE_SET(page_md->status, 1); + break; + case PT_MAPPED: + /* Mark the page into isolated state. */ + page_md->status = PAGE_ISOLATE_SET(page_md->status, 1); + break; + case SPILL_IN_PROGRESS: + case ALLOCATE_IN_PROGRESS: + case FREE_IN_PROGRESS: + break; + case NOT_MOVABLE: + /* Opportunistically clear the movable property for these pages */ + __ClearPageMovable(p); + page_md->status = PAGE_MOVABLE_CLEAR(page_md->status); + break; + default: + /* State should always fall in one of the previous cases! + * Also notice that FREE_ISOLATED_IN_PROGRESS or + * FREE_PT_ISOLATED_IN_PROGRESS is impossible because + * that state only applies to pages that are already isolated. + */ + page_md->status = PAGE_ISOLATE_SET(page_md->status, 0); + break; + } + + spin_unlock(&page_md->migrate_lock); + + /* If the page is still in the memory pool: try to remove it. 
This will fail + * if pool lock is taken which could mean page no longer exists in pool. + */ + if (status_mem_pool) { + if (!spin_trylock(&mem_pool->pool_lock)) { + atomic_dec(&mem_pool->isolation_in_progress_cnt); + preempt_enable(); + return false; + } + + spin_lock(&page_md->migrate_lock); + /* Check status again to ensure page has not been removed from memory pool. */ + if (PAGE_STATUS_GET(page_md->status) == MEM_POOL) { + page_md->status = PAGE_ISOLATE_SET(page_md->status, 1); + list_del_init(&p->lru); + mem_pool->cur_size--; + } + spin_unlock(&page_md->migrate_lock); + spin_unlock(&mem_pool->pool_lock); + atomic_dec(&mem_pool->isolation_in_progress_cnt); + preempt_enable(); + } + + return IS_PAGE_ISOLATED(page_md->status); +} + +/** + * kbase_page_migrate - Migrate content of old page to new page provided. + * + * @mapping: Pointer to address_space struct associated with pages. + * @new_page: Pointer to the page struct of new page. + * @old_page: Pointer to the page struct of old page. + * @mode: Mode to determine if migration will be synchronised. + * + * Callback function for Linux to migrate the content of the old page to the + * new page provided. + * This callback is not registered if compiled without CONFIG_PAGE_MIGRATION_SUPPORT. + * + * Return: 0 on success, error code otherwise. + */ +#if (KERNEL_VERSION(6, 0, 0) > LINUX_VERSION_CODE) +static int kbase_page_migrate(struct address_space *mapping, struct page *new_page, + struct page *old_page, enum migrate_mode mode) +#else +static int kbase_page_migrate(struct page *new_page, struct page *old_page, enum migrate_mode mode) +#endif +{ + int err = 0; + bool status_mem_pool = false; + bool status_free_pt_isolated_in_progress = false; + bool status_free_isolated_in_progress = false; + bool status_pt_mapped = false; + bool status_mapped = false; + bool status_not_movable = false; + struct kbase_page_metadata *page_md = kbase_page_private(old_page); + struct kbase_device *kbdev = NULL; + +#if (KERNEL_VERSION(6, 0, 0) > LINUX_VERSION_CODE) + CSTD_UNUSED(mapping); +#endif + CSTD_UNUSED(mode); + + if (!kbase_is_page_migration_enabled() || !page_md || !IS_PAGE_MOVABLE(page_md->status)) + return -EINVAL; + + if (!spin_trylock(&page_md->migrate_lock)) + return -EAGAIN; + + if (WARN_ON(!IS_PAGE_ISOLATED(page_md->status))) { + spin_unlock(&page_md->migrate_lock); + return -EINVAL; + } + + switch (PAGE_STATUS_GET(page_md->status)) { + case MEM_POOL: + status_mem_pool = true; + kbdev = page_md->data.mem_pool.kbdev; + break; + case ALLOCATED_MAPPED: + status_mapped = true; + break; + case PT_MAPPED: + status_pt_mapped = true; + break; + case FREE_ISOLATED_IN_PROGRESS: + status_free_isolated_in_progress = true; + kbdev = page_md->data.free_isolated.kbdev; + break; + case FREE_PT_ISOLATED_IN_PROGRESS: + status_free_pt_isolated_in_progress = true; + kbdev = page_md->data.free_pt_isolated.kbdev; + break; + case NOT_MOVABLE: + status_not_movable = true; + break; + default: + /* State should always fall in one of the previous cases! */ + err = -EAGAIN; + break; + } + + spin_unlock(&page_md->migrate_lock); + + if (status_mem_pool || status_free_isolated_in_progress || + status_free_pt_isolated_in_progress) { + struct kbase_mem_migrate *mem_migrate = &kbdev->mem_migrate; + + kbase_free_page_metadata(kbdev, old_page, NULL); + __ClearPageMovable(old_page); + put_page(old_page); + + /* Just free new page to avoid lock contention. 
*/ + INIT_LIST_HEAD(&new_page->lru); + get_page(new_page); + set_page_private(new_page, 0); + kbase_free_page_later(kbdev, new_page); + queue_work(mem_migrate->free_pages_workq, &mem_migrate->free_pages_work); + } else if (status_not_movable) { + err = -EINVAL; + } else if (status_mapped) { + err = kbasep_migrate_page_allocated_mapped(old_page, new_page); + } else if (status_pt_mapped) { + err = kbasep_migrate_page_pt_mapped(old_page, new_page); + } + + /* While we want to preserve the movability of pages for which we return + * EAGAIN, according to the kernel docs, movable pages for which a critical + * error is returned are called putback on, which may not be what we + * expect. + */ + if (err < 0 && err != -EAGAIN) { + __ClearPageMovable(old_page); + page_md->status = PAGE_MOVABLE_CLEAR(page_md->status); + } + + return err; +} + +/** + * kbase_page_putback - Return isolated page back to kbase. + * + * @p: Pointer of the page struct of page. + * + * Callback function for Linux to return isolated page back to kbase. This + * will only be called for a page that has been isolated but failed to + * migrate. This function will put back the given page to the state it was + * in before it was isolated. + * This callback is not registered if compiled without CONFIG_PAGE_MIGRATION_SUPPORT. + */ +static void kbase_page_putback(struct page *p) +{ + bool status_mem_pool = false; + bool status_free_isolated_in_progress = false; + bool status_free_pt_isolated_in_progress = false; + struct kbase_page_metadata *page_md = kbase_page_private(p); + struct kbase_device *kbdev = NULL; + + if (!IS_ENABLED(CONFIG_PAGE_MIGRATION_SUPPORT)) + return; + /* If we don't have page metadata, the page may not belong to the + * driver or may already have been freed, and there's nothing we can do + */ + if (!page_md) + return; + + spin_lock(&page_md->migrate_lock); + + if (WARN_ON(!IS_PAGE_ISOLATED(page_md->status))) { + spin_unlock(&page_md->migrate_lock); + return; + } + + switch (PAGE_STATUS_GET(page_md->status)) { + case MEM_POOL: + status_mem_pool = true; + kbdev = page_md->data.mem_pool.kbdev; + break; + case ALLOCATED_MAPPED: + page_md->status = PAGE_ISOLATE_SET(page_md->status, 0); + break; + case PT_MAPPED: + case NOT_MOVABLE: + /* Pages should no longer be isolated if they are in a stable state + * and used by the driver. + */ + page_md->status = PAGE_ISOLATE_SET(page_md->status, 0); + break; + case FREE_ISOLATED_IN_PROGRESS: + status_free_isolated_in_progress = true; + kbdev = page_md->data.free_isolated.kbdev; + break; + case FREE_PT_ISOLATED_IN_PROGRESS: + status_free_pt_isolated_in_progress = true; + kbdev = page_md->data.free_pt_isolated.kbdev; + break; + default: + /* State should always fall in one of the previous cases! */ + break; + } + + spin_unlock(&page_md->migrate_lock); + + /* If page was in a memory pool then just free it to avoid lock contention. The + * same is also true to status_free_pt_isolated_in_progress. 
+ */ + if (status_mem_pool || status_free_isolated_in_progress || + status_free_pt_isolated_in_progress) { + __ClearPageMovable(p); + page_md->status = PAGE_MOVABLE_CLEAR(page_md->status); + if (!WARN_ON_ONCE(!kbdev)) { + struct kbase_mem_migrate *mem_migrate = &kbdev->mem_migrate; + + kbase_free_page_later(kbdev, p); + queue_work(mem_migrate->free_pages_workq, &mem_migrate->free_pages_work); + } + } +} + +#if (KERNEL_VERSION(6, 0, 0) <= LINUX_VERSION_CODE) +static const struct movable_operations movable_ops = { + .isolate_page = kbase_page_isolate, + .migrate_page = kbase_page_migrate, + .putback_page = kbase_page_putback, +}; +#else +static const struct address_space_operations kbase_address_space_ops = { + .isolate_page = kbase_page_isolate, + .migratepage = kbase_page_migrate, + .putback_page = kbase_page_putback, +}; +#endif + +#if (KERNEL_VERSION(6, 0, 0) > LINUX_VERSION_CODE) +void kbase_mem_migrate_set_address_space_ops(struct kbase_device *kbdev, struct file *const filp) +{ + if (!kbase_is_page_migration_enabled()) + return; + + mutex_lock(&kbdev->fw_load_lock); + + if (filp) { + filp->f_inode->i_mapping->a_ops = &kbase_address_space_ops; + + if (!kbdev->mem_migrate.inode) { + kbdev->mem_migrate.inode = filp->f_inode; + /* This reference count increment is balanced by iput() + * upon termination. + */ + atomic_inc(&filp->f_inode->i_count); + } else { + WARN_ON(kbdev->mem_migrate.inode != filp->f_inode); + } + } + + mutex_unlock(&kbdev->fw_load_lock); +} +#endif + +void kbase_mem_migrate_init(struct kbase_device *kbdev) +{ +#if !IS_ENABLED(CONFIG_PAGE_MIGRATION_SUPPORT) + /* Page migration explicitly disabled at compile time - do nothing */ + return; +#else + struct kbase_mem_migrate *mem_migrate = &kbdev->mem_migrate; + + /* Page migration support compiled in, either explicitly or + * by default, so the default behaviour is to follow the choice + * of large pages if not selected at insmod. Check insmod parameter + * integer for a negative value to see if insmod parameter was + * passed in at all (it will override the default negative value). + */ + if (kbase_page_migration_enabled < 0) + kbase_page_migration_enabled = kbdev->pagesize_2mb ? 1 : 0; + else + dev_info(kbdev->dev, "Page migration support explicitly %s at insmod.", + kbase_page_migration_enabled ? "enabled" : "disabled"); + + spin_lock_init(&mem_migrate->free_pages_lock); + INIT_LIST_HEAD(&mem_migrate->free_pages_list); + +#if (KERNEL_VERSION(6, 0, 0) > LINUX_VERSION_CODE) + mem_migrate->inode = NULL; +#endif + mem_migrate->free_pages_workq = + alloc_workqueue("free_pages_workq", WQ_UNBOUND | WQ_MEM_RECLAIM, 1); + INIT_WORK(&mem_migrate->free_pages_work, kbase_free_pages_worker); +#endif +} + +void kbase_mem_migrate_term(struct kbase_device *kbdev) +{ + struct kbase_mem_migrate *mem_migrate = &kbdev->mem_migrate; + +#if !IS_ENABLED(CONFIG_PAGE_MIGRATION_SUPPORT) + /* Page migration explicitly disabled at compile time - do nothing */ + return; +#endif + if (mem_migrate->free_pages_workq) + destroy_workqueue(mem_migrate->free_pages_workq); +#if (KERNEL_VERSION(6, 0, 0) > LINUX_VERSION_CODE) + iput(mem_migrate->inode); +#endif +} diff --git a/mali_kbase/mali_kbase_mem_migrate.h b/mali_kbase/mali_kbase_mem_migrate.h new file mode 100644 index 0000000..e9f3fc4 --- /dev/null +++ b/mali_kbase/mali_kbase_mem_migrate.h @@ -0,0 +1,118 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +/* + * + * (C) COPYRIGHT 2022-2023 ARM Limited. All rights reserved. 
+ * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. + * + */ +#ifndef _KBASE_MEM_MIGRATE_H +#define _KBASE_MEM_MIGRATE_H + +/** + * DOC: Base kernel page migration implementation. + */ + +#define PAGE_STATUS_MASK ((u8)0x3F) +#define PAGE_STATUS_GET(status) (status & PAGE_STATUS_MASK) +#define PAGE_STATUS_SET(status, value) ((status & ~PAGE_STATUS_MASK) | (value & PAGE_STATUS_MASK)) + +#define PAGE_ISOLATE_SHIFT (7) +#define PAGE_ISOLATE_MASK ((u8)1 << PAGE_ISOLATE_SHIFT) +#define PAGE_ISOLATE_SET(status, value) \ + ((status & ~PAGE_ISOLATE_MASK) | (value << PAGE_ISOLATE_SHIFT)) +#define IS_PAGE_ISOLATED(status) ((bool)(status & PAGE_ISOLATE_MASK)) + +#define PAGE_MOVABLE_SHIFT (6) +#define PAGE_MOVABLE_MASK ((u8)1 << PAGE_MOVABLE_SHIFT) +#define PAGE_MOVABLE_CLEAR(status) ((status) & ~PAGE_MOVABLE_MASK) +#define PAGE_MOVABLE_SET(status) (status | PAGE_MOVABLE_MASK) + +#define IS_PAGE_MOVABLE(status) ((bool)(status & PAGE_MOVABLE_MASK)) + +/* Global integer used to determine if module parameter value has been + * provided and if page migration feature is enabled. + */ +#if !IS_ENABLED(CONFIG_PAGE_MIGRATION_SUPPORT) +extern const int kbase_page_migration_enabled; +#else +extern int kbase_page_migration_enabled; +#endif + +/** + * kbase_alloc_page_metadata - Allocate and initialize page metadata + * @kbdev: Pointer to kbase device. + * @p: Page to assign metadata to. + * @dma_addr: DMA address mapped to paged. + * @group_id: Memory group ID associated with the entity that is + * allocating the page metadata. + * + * This will allocate memory for the page's metadata, initialize it and + * assign a reference to the page's private field. Importantly, once + * the metadata is set and ready this function will mark the page as + * movable. + * + * Return: true if successful or false otherwise. + */ +bool kbase_alloc_page_metadata(struct kbase_device *kbdev, struct page *p, dma_addr_t dma_addr, + u8 group_id); + +bool kbase_is_page_migration_enabled(void); + +/** + * kbase_free_page_later - Defer freeing of given page. + * @kbdev: Pointer to kbase device + * @p: Page to free + * + * This will add given page to a list of pages which will be freed at + * a later time. + */ +void kbase_free_page_later(struct kbase_device *kbdev, struct page *p); + +#if (KERNEL_VERSION(6, 0, 0) > LINUX_VERSION_CODE) +/* + * kbase_mem_migrate_set_address_space_ops - Set address space operations + * + * @kbdev: Pointer to object representing an instance of GPU platform device. + * @filp: Pointer to the struct file corresponding to device file + * /dev/malixx instance, passed to the file's open method. + * + * Assign address space operations to the given file struct @filp and + * add a reference to @kbdev. 
+ */ +void kbase_mem_migrate_set_address_space_ops(struct kbase_device *kbdev, struct file *const filp); +#endif + +/* + * kbase_mem_migrate_init - Initialise kbase page migration + * + * @kbdev: Pointer to kbase device + * + * Enables page migration by default based on GPU and setup work queue to + * defer freeing pages during page migration callbacks. + */ +void kbase_mem_migrate_init(struct kbase_device *kbdev); + +/* + * kbase_mem_migrate_term - Terminate kbase page migration + * + * @kbdev: Pointer to kbase device + * + * This will flush any work left to free pages from page migration + * and destroy workqueue associated. + */ +void kbase_mem_migrate_term(struct kbase_device *kbdev); + +#endif /* _KBASE_migrate_H */ diff --git a/mali_kbase/mali_kbase_mem_pool.c b/mali_kbase/mali_kbase_mem_pool.c index c991adf..d942ff5 100644 --- a/mali_kbase/mali_kbase_mem_pool.c +++ b/mali_kbase/mali_kbase_mem_pool.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2015-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2015-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -21,12 +21,18 @@ #include <mali_kbase.h> #include <linux/mm.h> +#include <linux/migrate.h> #include <linux/dma-mapping.h> #include <linux/highmem.h> #include <linux/spinlock.h> #include <linux/shrinker.h> #include <linux/atomic.h> #include <linux/version.h> +#if KERNEL_VERSION(4, 11, 0) <= LINUX_VERSION_CODE +#include <linux/sched/signal.h> +#else +#include <linux/signal.h> +#endif #define pool_dbg(pool, format, ...) \ dev_dbg(pool->kbdev->dev, "%s-pool [%zu/%zu]: " format, \ @@ -70,6 +76,41 @@ static void kbase_mem_pool_ordered_add_array_spill_locked( struct tagged_addr *pages, struct list_head *spillover_list, bool zero, bool sync); +/** + * can_alloc_page() - Check if the current thread can allocate a physical page + * + * @pool: Pointer to the memory pool. + * @page_owner: Pointer to the task/process that created the Kbase context + * for which a page needs to be allocated. It can be NULL if + * the page won't be associated with Kbase context. + * + * This function checks if the current thread can make a request to kernel to + * allocate a physical page. If the process that created the context is exiting or + * is being killed, then there is no point in doing a page allocation. + * + * The check done by the function is particularly helpful when the system is running + * low on memory. When a page is allocated from the context of a kernel thread, OoM + * killer doesn't consider the kernel thread for killing and kernel keeps retrying + * to allocate the page as long as the OoM killer is able to kill processes. + * The check allows to quickly exit the page allocation loop once OoM + * killer has initiated the killing of @page_owner, thereby unblocking the context + * termination for @page_owner and freeing of GPU memory allocated by it. This helps + * in preventing the kernel panic and also limits the number of innocent processes + * that get killed. + * + * Return: true if the page can be allocated otherwise false. 
+ */ +static inline bool can_alloc_page(struct kbase_mem_pool *pool, struct task_struct *page_owner) +{ + if (page_owner && ((page_owner->flags & PF_EXITING) || fatal_signal_pending(page_owner))) { + dev_info(pool->kbdev->dev, "%s : Process %s/%d exiting", __func__, page_owner->comm, + task_pid_nr(page_owner)); + return false; + } + + return true; +} + static size_t kbase_mem_pool_capacity(struct kbase_mem_pool *pool) { ssize_t max_size = kbase_mem_pool_max_size(pool); @@ -88,9 +129,47 @@ static bool kbase_mem_pool_is_empty(struct kbase_mem_pool *pool) return kbase_mem_pool_size(pool) == 0; } +static bool set_pool_new_page_metadata(struct kbase_mem_pool *pool, struct page *p, + struct list_head *page_list, size_t *list_size) +{ + struct kbase_page_metadata *page_md = kbase_page_private(p); + bool not_movable = false; + + lockdep_assert_held(&pool->pool_lock); + + /* Free the page instead of adding it to the pool if it's not movable. + * Only update page status and add the page to the memory pool if + * it is not isolated. + */ + if (!IS_ENABLED(CONFIG_PAGE_MIGRATION_SUPPORT)) + not_movable = true; + else { + spin_lock(&page_md->migrate_lock); + if (PAGE_STATUS_GET(page_md->status) == (u8)NOT_MOVABLE) { + not_movable = true; + } else if (!WARN_ON_ONCE(IS_PAGE_ISOLATED(page_md->status))) { + page_md->status = PAGE_STATUS_SET(page_md->status, (u8)MEM_POOL); + page_md->data.mem_pool.pool = pool; + page_md->data.mem_pool.kbdev = pool->kbdev; + list_add(&p->lru, page_list); + (*list_size)++; + } + spin_unlock(&page_md->migrate_lock); + } + + if (not_movable) { + kbase_free_page_later(pool->kbdev, p); + pool_dbg(pool, "skipping a not movable page\n"); + } + + return not_movable; +} + static void kbase_mem_pool_add_locked(struct kbase_mem_pool *pool, struct page *p) { + bool queue_work_to_free = false; + if (mali_kbase_mem_pool_order_pages_enabled) { kbase_mem_pool_ordered_add_locked(pool, p); return; @@ -98,8 +177,19 @@ static void kbase_mem_pool_add_locked(struct kbase_mem_pool *pool, lockdep_assert_held(&pool->pool_lock); - list_add(&p->lru, &pool->page_list); - pool->cur_size++; + if (!pool->order && kbase_is_page_migration_enabled()) { + if (set_pool_new_page_metadata(pool, p, &pool->page_list, &pool->cur_size)) + queue_work_to_free = true; + } else { + list_add(&p->lru, &pool->page_list); + pool->cur_size++; + } + + if (queue_work_to_free) { + struct kbase_mem_migrate *mem_migrate = &pool->kbdev->mem_migrate; + + queue_work(mem_migrate->free_pages_workq, &mem_migrate->free_pages_work); + } pool_dbg(pool, "added page\n"); } @@ -114,10 +204,28 @@ static void kbase_mem_pool_add(struct kbase_mem_pool *pool, struct page *p) static void kbase_mem_pool_add_list_locked(struct kbase_mem_pool *pool, struct list_head *page_list, size_t nr_pages) { + bool queue_work_to_free = false; + lockdep_assert_held(&pool->pool_lock); - list_splice(page_list, &pool->page_list); - pool->cur_size += nr_pages; + if (!pool->order && kbase_is_page_migration_enabled()) { + struct page *p, *tmp; + + list_for_each_entry_safe(p, tmp, page_list, lru) { + list_del_init(&p->lru); + if (set_pool_new_page_metadata(pool, p, &pool->page_list, &pool->cur_size)) + queue_work_to_free = true; + } + } else { + list_splice(page_list, &pool->page_list); + pool->cur_size += nr_pages; + } + + if (queue_work_to_free) { + struct kbase_mem_migrate *mem_migrate = &pool->kbdev->mem_migrate; + + queue_work(mem_migrate->free_pages_workq, &mem_migrate->free_pages_work); + } pool_dbg(pool, "added %zu pages\n", nr_pages); } @@ -130,7 +238,8 @@ 
static void kbase_mem_pool_add_list(struct kbase_mem_pool *pool, kbase_mem_pool_unlock(pool); } -static struct page *kbase_mem_pool_remove_locked(struct kbase_mem_pool *pool) +static struct page *kbase_mem_pool_remove_locked(struct kbase_mem_pool *pool, + enum kbase_page_status status) { struct page *p; @@ -140,6 +249,16 @@ static struct page *kbase_mem_pool_remove_locked(struct kbase_mem_pool *pool) return NULL; p = list_first_entry(&pool->page_list, struct page, lru); + + if (!pool->order && kbase_is_page_migration_enabled()) { + struct kbase_page_metadata *page_md = kbase_page_private(p); + + spin_lock(&page_md->migrate_lock); + WARN_ON(PAGE_STATUS_GET(page_md->status) != (u8)MEM_POOL); + page_md->status = PAGE_STATUS_SET(page_md->status, (u8)status); + spin_unlock(&page_md->migrate_lock); + } + list_del_init(&p->lru); pool->cur_size--; @@ -148,12 +267,13 @@ static struct page *kbase_mem_pool_remove_locked(struct kbase_mem_pool *pool) return p; } -static struct page *kbase_mem_pool_remove(struct kbase_mem_pool *pool) +static struct page *kbase_mem_pool_remove(struct kbase_mem_pool *pool, + enum kbase_page_status status) { struct page *p; kbase_mem_pool_lock(pool); - p = kbase_mem_pool_remove_locked(pool); + p = kbase_mem_pool_remove_locked(pool, status); kbase_mem_pool_unlock(pool); return p; @@ -163,9 +283,9 @@ static void kbase_mem_pool_sync_page(struct kbase_mem_pool *pool, struct page *p) { struct device *dev = pool->kbdev->dev; + dma_addr_t dma_addr = pool->order ? kbase_dma_addr_as_priv(p) : kbase_dma_addr(p); - dma_sync_single_for_device(dev, kbase_dma_addr(p), - (PAGE_SIZE << pool->order), DMA_BIDIRECTIONAL); + dma_sync_single_for_device(dev, dma_addr, (PAGE_SIZE << pool->order), DMA_BIDIRECTIONAL); } static void kbase_mem_pool_zero_page(struct kbase_mem_pool *pool, @@ -196,7 +316,7 @@ static void kbase_mem_pool_spill(struct kbase_mem_pool *next_pool, struct page *kbase_mem_alloc_page(struct kbase_mem_pool *pool) { struct page *p; - gfp_t gfp = GFP_HIGHUSER | __GFP_ZERO; + gfp_t gfp = __GFP_ZERO; struct kbase_device *const kbdev = pool->kbdev; struct device *const dev = kbdev->dev; dma_addr_t dma_addr; @@ -204,7 +324,9 @@ struct page *kbase_mem_alloc_page(struct kbase_mem_pool *pool) /* don't warn on higher order failures */ if (pool->order) - gfp |= __GFP_NOWARN; + gfp |= GFP_HIGHUSER | __GFP_NOWARN; + else + gfp |= kbase_is_page_migration_enabled() ? 
GFP_HIGHUSER_MOVABLE : GFP_HIGHUSER; p = kbdev->mgm_dev->ops.mgm_alloc_page(kbdev->mgm_dev, pool->group_id, gfp, pool->order); @@ -220,30 +342,59 @@ struct page *kbase_mem_alloc_page(struct kbase_mem_pool *pool) return NULL; } - WARN_ON(dma_addr != page_to_phys(p)); - for (i = 0; i < (1u << pool->order); i++) - kbase_set_dma_addr(p+i, dma_addr + PAGE_SIZE * i); + /* Setup page metadata for 4KB pages when page migration is enabled */ + if (!pool->order && kbase_is_page_migration_enabled()) { + INIT_LIST_HEAD(&p->lru); + if (!kbase_alloc_page_metadata(kbdev, p, dma_addr, pool->group_id)) { + dma_unmap_page(dev, dma_addr, PAGE_SIZE, DMA_BIDIRECTIONAL); + kbdev->mgm_dev->ops.mgm_free_page(kbdev->mgm_dev, pool->group_id, p, + pool->order); + return NULL; + } + } else { + WARN_ON(dma_addr != page_to_phys(p)); + for (i = 0; i < (1u << pool->order); i++) + kbase_set_dma_addr_as_priv(p + i, dma_addr + PAGE_SIZE * i); + } return p; } -static void kbase_mem_pool_free_page(struct kbase_mem_pool *pool, - struct page *p) +static void enqueue_free_pool_pages_work(struct kbase_mem_pool *pool) { - struct kbase_device *const kbdev = pool->kbdev; - struct device *const dev = kbdev->dev; - dma_addr_t dma_addr = kbase_dma_addr(p); - int i; + struct kbase_mem_migrate *mem_migrate = &pool->kbdev->mem_migrate; + + if (!pool->order && kbase_is_page_migration_enabled()) + queue_work(mem_migrate->free_pages_workq, &mem_migrate->free_pages_work); +} - dma_unmap_page(dev, dma_addr, (PAGE_SIZE << pool->order), - DMA_BIDIRECTIONAL); - for (i = 0; i < (1u << pool->order); i++) - kbase_clear_dma_addr(p+i); +void kbase_mem_pool_free_page(struct kbase_mem_pool *pool, struct page *p) +{ + struct kbase_device *kbdev; + + if (WARN_ON(!pool)) + return; + if (WARN_ON(!p)) + return; + + kbdev = pool->kbdev; - kbdev->mgm_dev->ops.mgm_free_page(kbdev->mgm_dev, - pool->group_id, p, pool->order); + if (!pool->order && kbase_is_page_migration_enabled()) { + kbase_free_page_later(kbdev, p); + pool_dbg(pool, "page to be freed to kernel later\n"); + } else { + int i; + dma_addr_t dma_addr = kbase_dma_addr_as_priv(p); + + for (i = 0; i < (1u << pool->order); i++) + kbase_clear_dma_addr_as_priv(p + i); + + dma_unmap_page(kbdev->dev, dma_addr, (PAGE_SIZE << pool->order), DMA_BIDIRECTIONAL); - pool_dbg(pool, "freed page to kernel\n"); + kbdev->mgm_dev->ops.mgm_free_page(kbdev->mgm_dev, pool->group_id, p, pool->order); + + pool_dbg(pool, "freed page to kernel\n"); + } } static size_t kbase_mem_pool_shrink_locked(struct kbase_mem_pool *pool, @@ -255,10 +406,13 @@ static size_t kbase_mem_pool_shrink_locked(struct kbase_mem_pool *pool, lockdep_assert_held(&pool->pool_lock); for (i = 0; i < nr_to_shrink && !kbase_mem_pool_is_empty(pool); i++) { - p = kbase_mem_pool_remove_locked(pool); + p = kbase_mem_pool_remove_locked(pool, FREE_IN_PROGRESS); kbase_mem_pool_free_page(pool, p); } + /* Freeing of pages will be deferred when page migration is enabled. 
*/ + enqueue_free_pool_pages_work(pool); + return i; } @@ -274,8 +428,8 @@ static size_t kbase_mem_pool_shrink(struct kbase_mem_pool *pool, return nr_freed; } -int kbase_mem_pool_grow(struct kbase_mem_pool *pool, - size_t nr_to_grow) +int kbase_mem_pool_grow(struct kbase_mem_pool *pool, size_t nr_to_grow, + struct task_struct *page_owner) { struct page *p; size_t i; @@ -293,6 +447,9 @@ int kbase_mem_pool_grow(struct kbase_mem_pool *pool, } kbase_mem_pool_unlock(pool); + if (unlikely(!can_alloc_page(pool, page_owner))) + return -ENOMEM; + p = kbase_mem_alloc_page(pool); if (!p) { kbase_mem_pool_lock(pool); @@ -310,6 +467,7 @@ int kbase_mem_pool_grow(struct kbase_mem_pool *pool, return 0; } +KBASE_EXPORT_TEST_API(kbase_mem_pool_grow); void kbase_mem_pool_trim(struct kbase_mem_pool *pool, size_t new_size) { @@ -324,7 +482,7 @@ void kbase_mem_pool_trim(struct kbase_mem_pool *pool, size_t new_size) if (new_size < cur_size) kbase_mem_pool_shrink(pool, cur_size - new_size); else if (new_size > cur_size) - err = kbase_mem_pool_grow(pool, new_size - cur_size); + err = kbase_mem_pool_grow(pool, new_size - cur_size, NULL); if (err) { size_t grown_size = kbase_mem_pool_size(pool); @@ -365,6 +523,9 @@ static unsigned long kbase_mem_pool_reclaim_count_objects(struct shrinker *s, kbase_mem_pool_lock(pool); if (pool->dont_reclaim && !pool->dying) { kbase_mem_pool_unlock(pool); + /* Tell shrinker to skip reclaim + * even though freeable pages are available + */ return 0; } pool_size = kbase_mem_pool_size(pool); @@ -384,7 +545,10 @@ static unsigned long kbase_mem_pool_reclaim_scan_objects(struct shrinker *s, kbase_mem_pool_lock(pool); if (pool->dont_reclaim && !pool->dying) { kbase_mem_pool_unlock(pool); - return 0; + /* Tell shrinker that reclaim can't be made and + * do not attempt again for this reclaim context. 
+ */ + return SHRINK_STOP; } pool_dbg(pool, "reclaim scan %ld:\n", sc->nr_to_scan); @@ -398,12 +562,9 @@ static unsigned long kbase_mem_pool_reclaim_scan_objects(struct shrinker *s, return freed; } -int kbase_mem_pool_init(struct kbase_mem_pool *pool, - const struct kbase_mem_pool_config *config, - unsigned int order, - int group_id, - struct kbase_device *kbdev, - struct kbase_mem_pool *next_pool) +int kbase_mem_pool_init(struct kbase_mem_pool *pool, const struct kbase_mem_pool_config *config, + unsigned int order, int group_id, struct kbase_device *kbdev, + struct kbase_mem_pool *next_pool) { if (WARN_ON(group_id < 0) || WARN_ON(group_id >= MEMORY_GROUP_MANAGER_NR_GROUPS)) { @@ -417,6 +578,7 @@ int kbase_mem_pool_init(struct kbase_mem_pool *pool, pool->kbdev = kbdev; pool->next_pool = next_pool; pool->dying = false; + atomic_set(&pool->isolation_in_progress_cnt, 0); spin_lock_init(&pool->pool_lock); INIT_LIST_HEAD(&pool->page_list); @@ -428,12 +590,17 @@ int kbase_mem_pool_init(struct kbase_mem_pool *pool, * struct shrinker does not define batch */ pool->reclaim.batch = 0; - register_shrinker(&pool->reclaim, "mali-mempool"); +#if KERNEL_VERSION(6, 0, 0) > LINUX_VERSION_CODE + register_shrinker(&pool->reclaim); +#else + register_shrinker(&pool->reclaim, "mali-mem-pool"); +#endif pool_dbg(pool, "initialized\n"); return 0; } +KBASE_EXPORT_TEST_API(kbase_mem_pool_init); void kbase_mem_pool_mark_dying(struct kbase_mem_pool *pool) { @@ -465,15 +632,17 @@ void kbase_mem_pool_term(struct kbase_mem_pool *pool) /* Zero pages first without holding the next_pool lock */ for (i = 0; i < nr_to_spill; i++) { - p = kbase_mem_pool_remove_locked(pool); - list_add(&p->lru, &spill_list); + p = kbase_mem_pool_remove_locked(pool, SPILL_IN_PROGRESS); + if (p) + list_add(&p->lru, &spill_list); } } while (!kbase_mem_pool_is_empty(pool)) { /* Free remaining pages to kernel */ - p = kbase_mem_pool_remove_locked(pool); - list_add(&p->lru, &free_list); + p = kbase_mem_pool_remove_locked(pool, FREE_IN_PROGRESS); + if (p) + list_add(&p->lru, &free_list); } kbase_mem_pool_unlock(pool); @@ -506,8 +675,19 @@ void kbase_mem_pool_term(struct kbase_mem_pool *pool) kbase_mem_pool_free_page(pool, p); } + /* Freeing of pages will be deferred when page migration is enabled. */ + enqueue_free_pool_pages_work(pool); + + /* Before returning wait to make sure there are no pages undergoing page isolation + * which will require reference to this pool. 
+ */ + if (kbase_is_page_migration_enabled()) { + while (atomic_read(&pool->isolation_in_progress_cnt)) + cpu_relax(); + } pool_dbg(pool, "terminated\n"); } +KBASE_EXPORT_TEST_API(kbase_mem_pool_term); struct page *kbase_mem_pool_alloc(struct kbase_mem_pool *pool) { @@ -515,7 +695,7 @@ struct page *kbase_mem_pool_alloc(struct kbase_mem_pool *pool) do { pool_dbg(pool, "alloc()\n"); - p = kbase_mem_pool_remove(pool); + p = kbase_mem_pool_remove(pool, ALLOCATE_IN_PROGRESS); if (p) return p; @@ -528,17 +708,10 @@ struct page *kbase_mem_pool_alloc(struct kbase_mem_pool *pool) struct page *kbase_mem_pool_alloc_locked(struct kbase_mem_pool *pool) { - struct page *p; - lockdep_assert_held(&pool->pool_lock); pool_dbg(pool, "alloc_locked()\n"); - p = kbase_mem_pool_remove_locked(pool); - - if (p) - return p; - - return NULL; + return kbase_mem_pool_remove_locked(pool, ALLOCATE_IN_PROGRESS); } void kbase_mem_pool_free(struct kbase_mem_pool *pool, struct page *p, @@ -565,6 +738,8 @@ void kbase_mem_pool_free(struct kbase_mem_pool *pool, struct page *p, } else { /* Free page */ kbase_mem_pool_free_page(pool, p); + /* Freeing of pages will be deferred when page migration is enabled. */ + enqueue_free_pool_pages_work(pool); } } @@ -589,11 +764,14 @@ void kbase_mem_pool_free_locked(struct kbase_mem_pool *pool, struct page *p, } else { /* Free page */ kbase_mem_pool_free_page(pool, p); + /* Freeing of pages will be deferred when page migration is enabled. */ + enqueue_free_pool_pages_work(pool); } } int kbase_mem_pool_alloc_pages(struct kbase_mem_pool *pool, size_t nr_4k_pages, - struct tagged_addr *pages, bool partial_allowed) + struct tagged_addr *pages, bool partial_allowed, + struct task_struct *page_owner) { struct page *p; size_t nr_from_pool; @@ -612,10 +790,12 @@ int kbase_mem_pool_alloc_pages(struct kbase_mem_pool *pool, size_t nr_4k_pages, /* Get pages from this pool */ kbase_mem_pool_lock(pool); nr_from_pool = min(nr_pages_internal, kbase_mem_pool_size(pool)); + while (nr_from_pool--) { int j; - p = kbase_mem_pool_remove_locked(pool); + p = kbase_mem_pool_remove_locked(pool, ALLOCATE_IN_PROGRESS); + if (pool->order) { pages[i++] = as_tagged_tag(page_to_phys(p), HUGE_HEAD | HUGE_PAGE); @@ -631,8 +811,8 @@ int kbase_mem_pool_alloc_pages(struct kbase_mem_pool *pool, size_t nr_4k_pages, if (i != nr_4k_pages && pool->next_pool) { /* Allocate via next pool */ - err = kbase_mem_pool_alloc_pages(pool->next_pool, - nr_4k_pages - i, pages + i, partial_allowed); + err = kbase_mem_pool_alloc_pages(pool->next_pool, nr_4k_pages - i, pages + i, + partial_allowed, page_owner); if (err < 0) goto err_rollback; @@ -641,6 +821,9 @@ int kbase_mem_pool_alloc_pages(struct kbase_mem_pool *pool, size_t nr_4k_pages, } else { /* Get any remaining pages from kernel */ while (i != nr_4k_pages) { + if (unlikely(!can_alloc_page(pool, page_owner))) + goto err_rollback; + p = kbase_mem_alloc_page(pool); if (!p) { if (partial_allowed) @@ -674,6 +857,9 @@ done: err_rollback: kbase_mem_pool_free_pages(pool, i, pages, NOT_DIRTY, NOT_RECLAIMED); + dev_warn(pool->kbdev->dev, + "Failed allocation request for remaining %zu pages after obtaining %zu pages already.\n", + nr_4k_pages, i); return err; } @@ -703,7 +889,7 @@ int kbase_mem_pool_alloc_pages_locked(struct kbase_mem_pool *pool, for (i = 0; i < nr_pages_internal; i++) { int j; - p = kbase_mem_pool_remove_locked(pool); + p = kbase_mem_pool_remove_locked(pool, ALLOCATE_IN_PROGRESS); if (pool->order) { *pages++ = as_tagged_tag(page_to_phys(p), HUGE_HEAD | HUGE_PAGE); @@ -810,6 +996,7 
@@ void kbase_mem_pool_free_pages(struct kbase_mem_pool *pool, size_t nr_pages, size_t nr_to_pool; LIST_HEAD(to_pool_list); size_t i = 0; + bool pages_released = false; if (mali_kbase_mem_pool_order_pages_enabled) { kbase_mem_pool_ordered_free_pages(pool, nr_pages, pages, dirty, @@ -848,13 +1035,17 @@ void kbase_mem_pool_free_pages(struct kbase_mem_pool *pool, size_t nr_pages, pages[i] = as_tagged(0); continue; } - p = as_page(pages[i]); kbase_mem_pool_free_page(pool, p); pages[i] = as_tagged(0); + pages_released = true; } + /* Freeing of pages will be deferred when page migration is enabled. */ + if (pages_released) + enqueue_free_pool_pages_work(pool); + pool_dbg(pool, "free_pages(%zu) done\n", nr_pages); } @@ -867,6 +1058,7 @@ void kbase_mem_pool_free_pages_locked(struct kbase_mem_pool *pool, size_t nr_to_pool; LIST_HEAD(to_pool_list); size_t i = 0; + bool pages_released = false; if (mali_kbase_mem_pool_order_pages_enabled) { kbase_mem_pool_ordered_free_pages_locked(pool, nr_pages, pages, @@ -903,8 +1095,13 @@ void kbase_mem_pool_free_pages_locked(struct kbase_mem_pool *pool, kbase_mem_pool_free_page(pool, p); pages[i] = as_tagged(0); + pages_released = true; } + /* Freeing of pages will be deferred when page migration is enabled. */ + if (pages_released) + enqueue_free_pool_pages_work(pool); + pool_dbg(pool, "free_pages_locked(%zu) done\n", nr_pages); } diff --git a/mali_kbase/mali_kbase_mem_pool_debugfs.c b/mali_kbase/mali_kbase_mem_pool_debugfs.c index cfb43b0..3b1b2ba 100644 --- a/mali_kbase/mali_kbase_mem_pool_debugfs.c +++ b/mali_kbase/mali_kbase_mem_pool_debugfs.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2014-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2014-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -168,13 +168,7 @@ static const struct file_operations kbase_mem_pool_debugfs_max_size_fops = { void kbase_mem_pool_debugfs_init(struct dentry *parent, struct kbase_context *kctx) { - /* prevent unprivileged use of debug file in old kernel version */ -#if (KERNEL_VERSION(4, 7, 0) <= LINUX_VERSION_CODE) - /* only for newer kernel version debug file system is safe */ const mode_t mode = 0644; -#else - const mode_t mode = 0600; -#endif debugfs_create_file("mem_pool_size", mode, parent, &kctx->mem_pools.small, &kbase_mem_pool_debugfs_fops); diff --git a/mali_kbase/mali_kbase_mem_pool_group.c b/mali_kbase/mali_kbase_mem_pool_group.c index 8d7bb4d..49c4b04 100644 --- a/mali_kbase/mali_kbase_mem_pool_group.c +++ b/mali_kbase/mali_kbase_mem_pool_group.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2022 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -43,29 +43,22 @@ void kbase_mem_pool_group_config_set_max_size( } } -int kbase_mem_pool_group_init( - struct kbase_mem_pool_group *const mem_pools, - struct kbase_device *const kbdev, - const struct kbase_mem_pool_group_config *const configs, - struct kbase_mem_pool_group *next_pools) +int kbase_mem_pool_group_init(struct kbase_mem_pool_group *const mem_pools, + struct kbase_device *const kbdev, + const struct kbase_mem_pool_group_config *const configs, + struct kbase_mem_pool_group *next_pools) { int gid, err = 0; for (gid = 0; gid < MEMORY_GROUP_MANAGER_NR_GROUPS; ++gid) { - err = kbase_mem_pool_init(&mem_pools->small[gid], - &configs->small[gid], - KBASE_MEM_POOL_4KB_PAGE_TABLE_ORDER, - gid, - kbdev, - next_pools ? &next_pools->small[gid] : NULL); + err = kbase_mem_pool_init(&mem_pools->small[gid], &configs->small[gid], + KBASE_MEM_POOL_4KB_PAGE_TABLE_ORDER, gid, kbdev, + next_pools ? &next_pools->small[gid] : NULL); if (!err) { - err = kbase_mem_pool_init(&mem_pools->large[gid], - &configs->large[gid], - KBASE_MEM_POOL_2MB_PAGE_TABLE_ORDER, - gid, - kbdev, - next_pools ? &next_pools->large[gid] : NULL); + err = kbase_mem_pool_init(&mem_pools->large[gid], &configs->large[gid], + KBASE_MEM_POOL_2MB_PAGE_TABLE_ORDER, gid, kbdev, + next_pools ? &next_pools->large[gid] : NULL); if (err) kbase_mem_pool_term(&mem_pools->small[gid]); } diff --git a/mali_kbase/mali_kbase_mem_pool_group.h b/mali_kbase/mali_kbase_mem_pool_group.h index c50ffdb..fe8ce77 100644 --- a/mali_kbase/mali_kbase_mem_pool_group.h +++ b/mali_kbase/mali_kbase_mem_pool_group.h @@ -49,8 +49,8 @@ static inline struct kbase_mem_pool *kbase_mem_pool_group_select( } /** - * kbase_mem_pool_group_config_init - Set the initial configuration for a - * set of memory pools + * kbase_mem_pool_group_config_set_max_size - Set the initial configuration for + * a set of memory pools * * @configs: Initial configuration for the set of memory pools * @max_size: Maximum number of free 4 KiB pages each pool can hold @@ -86,13 +86,12 @@ void kbase_mem_pool_group_config_set_max_size( * * Return: 0 on success, otherwise a negative error code */ -int kbase_mem_pool_group_init(struct kbase_mem_pool_group *mem_pools, - struct kbase_device *kbdev, - const struct kbase_mem_pool_group_config *configs, - struct kbase_mem_pool_group *next_pools); +int kbase_mem_pool_group_init(struct kbase_mem_pool_group *mem_pools, struct kbase_device *kbdev, + const struct kbase_mem_pool_group_config *configs, + struct kbase_mem_pool_group *next_pools); /** - * kbase_mem_pool_group_term - Mark a set of memory pools as dying + * kbase_mem_pool_group_mark_dying - Mark a set of memory pools as dying * * @mem_pools: Set of memory pools to mark * diff --git a/mali_kbase/mali_kbase_mem_profile_debugfs.c b/mali_kbase/mali_kbase_mem_profile_debugfs.c index 92ab1b8..9317023 100644 --- a/mali_kbase/mali_kbase_mem_profile_debugfs.c +++ b/mali_kbase/mali_kbase_mem_profile_debugfs.c @@ -69,11 +69,7 @@ static const struct file_operations kbasep_mem_profile_debugfs_fops = { int kbasep_mem_profile_debugfs_insert(struct kbase_context *kctx, char *data, size_t size) { -#if (KERNEL_VERSION(4, 7, 0) <= LINUX_VERSION_CODE) const mode_t mode = 0444; -#else - const mode_t mode = 0400; -#endif int err = 0; mutex_lock(&kctx->mem_profile_lock); diff --git a/mali_kbase/mali_kbase_native_mgm.c b/mali_kbase/mali_kbase_native_mgm.c index 
4554bee..10a7f50 100644 --- a/mali_kbase/mali_kbase_native_mgm.c +++ b/mali_kbase/mali_kbase_native_mgm.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -140,6 +140,30 @@ kbase_native_mgm_update_gpu_pte(struct memory_group_manager_device *mgm_dev, return pte; } +/** + * kbase_native_mgm_pte_to_original_pte - Native method to undo changes done in + * kbase_native_mgm_update_gpu_pte() + * + * @mgm_dev: The memory group manager the request is being made through. + * @group_id: A physical memory group ID, which must be valid but is not used. + * Its valid range is 0 .. MEMORY_GROUP_MANAGER_NR_GROUPS-1. + * @mmu_level: The level of the MMU page table where the page is getting mapped. + * @pte: The prepared page table entry. + * + * This function simply returns the @pte without modification. + * + * Return: A GPU page table entry to be stored in a page table. + */ +static u64 kbase_native_mgm_pte_to_original_pte(struct memory_group_manager_device *mgm_dev, + int group_id, int mmu_level, u64 pte) +{ + CSTD_UNUSED(mgm_dev); + CSTD_UNUSED(group_id); + CSTD_UNUSED(mmu_level); + + return pte; +} + struct memory_group_manager_device kbase_native_mgm_dev = { .ops = { .mgm_alloc_page = kbase_native_mgm_alloc, @@ -147,6 +171,7 @@ struct memory_group_manager_device kbase_native_mgm_dev = { .mgm_get_import_memory_id = NULL, .mgm_vmf_insert_pfn_prot = kbase_native_mgm_vmf_insert_pfn_prot, .mgm_update_gpu_pte = kbase_native_mgm_update_gpu_pte, + .mgm_pte_to_original_pte = kbase_native_mgm_pte_to_original_pte, }, .data = NULL }; diff --git a/mali_kbase/mali_kbase_pbha.c b/mali_kbase/mali_kbase_pbha.c index 90406b2..b446bd5 100644 --- a/mali_kbase/mali_kbase_pbha.c +++ b/mali_kbase/mali_kbase_pbha.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2021-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2021-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -23,7 +23,10 @@ #include <device/mali_kbase_device.h> #include <mali_kbase.h> + +#if MALI_USE_CSF #define DTB_SET_SIZE 2 +#endif static bool read_setting_valid(unsigned int id, unsigned int read_setting) { @@ -209,31 +212,36 @@ void kbase_pbha_write_settings(struct kbase_device *kbdev) } } -int kbase_pbha_read_dtb(struct kbase_device *kbdev) +#if MALI_USE_CSF +static int kbase_pbha_read_int_id_override_property(struct kbase_device *kbdev, + const struct device_node *pbha_node) { u32 dtb_data[SYSC_ALLOC_COUNT * sizeof(u32) * DTB_SET_SIZE]; - const struct device_node *pbha_node; int sz, i; bool valid = true; - if (!kbasep_pbha_supported(kbdev)) - return 0; + sz = of_property_count_elems_of_size(pbha_node, "int-id-override", sizeof(u32)); - pbha_node = of_get_child_by_name(kbdev->dev->of_node, "pbha"); - if (!pbha_node) + if (sz == -EINVAL) { + /* There is no int-id-override field. Fallback to int_id_override instead */ + sz = of_property_count_elems_of_size(pbha_node, "int_id_override", sizeof(u32)); + } + if (sz == -EINVAL) { + /* There is no int_id_override field. This is valid - but there's nothing further + * to do here. 
+ */ return 0; - - sz = of_property_count_elems_of_size(pbha_node, "int_id_override", - sizeof(u32)); + } if (sz <= 0 || (sz % DTB_SET_SIZE != 0)) { dev_err(kbdev->dev, "Bad DTB format: pbha.int_id_override\n"); return -EINVAL; } - if (of_property_read_u32_array(pbha_node, "int_id_override", dtb_data, - sz) != 0) { - dev_err(kbdev->dev, - "Failed to read DTB pbha.int_id_override\n"); - return -EINVAL; + if (of_property_read_u32_array(pbha_node, "int-id-override", dtb_data, sz) != 0) { + /* There may be no int-id-override field. Fallback to int_id_override instead */ + if (of_property_read_u32_array(pbha_node, "int_id_override", dtb_data, sz) != 0) { + dev_err(kbdev->dev, "Failed to read DTB pbha.int_id_override\n"); + return -EINVAL; + } } for (i = 0; valid && i < sz; i = i + DTB_SET_SIZE) { @@ -256,3 +264,66 @@ int kbase_pbha_read_dtb(struct kbase_device *kbdev) } return 0; } + +static int kbase_pbha_read_propagate_bits_property(struct kbase_device *kbdev, + const struct device_node *pbha_node) +{ + u32 bits = 0; + int err; + + if (!kbase_hw_has_feature(kbdev, BASE_HW_FEATURE_PBHA_HWU)) + return 0; + + err = of_property_read_u32(pbha_node, "propagate-bits", &bits); + + if (err == -EINVAL) { + err = of_property_read_u32(pbha_node, "propagate_bits", &bits); + } + + if (err < 0) { + if (err != -EINVAL) { + dev_err(kbdev->dev, + "DTB value for propagate_bits is improperly formed (err=%d)\n", + err); + return err; + } else { + /* Property does not exist */ + kbdev->pbha_propagate_bits = 0; + return 0; + } + } + + if (bits > (L2_CONFIG_PBHA_HWU_MASK >> L2_CONFIG_PBHA_HWU_SHIFT)) { + dev_err(kbdev->dev, "Bad DTB value for propagate_bits: 0x%x\n", bits); + return -EINVAL; + } + + kbdev->pbha_propagate_bits = bits; + return 0; +} +#endif /* MALI_USE_CSF */ + +int kbase_pbha_read_dtb(struct kbase_device *kbdev) +{ +#if MALI_USE_CSF + const struct device_node *pbha_node; + int err; + + if (!kbasep_pbha_supported(kbdev)) + return 0; + + pbha_node = of_get_child_by_name(kbdev->dev->of_node, "pbha"); + if (!pbha_node) + return 0; + + err = kbase_pbha_read_int_id_override_property(kbdev, pbha_node); + + if (err < 0) + return err; + + err = kbase_pbha_read_propagate_bits_property(kbdev, pbha_node); + return err; +#else + return 0; +#endif +} diff --git a/mali_kbase/mali_kbase_pbha_debugfs.c b/mali_kbase/mali_kbase_pbha_debugfs.c index 47eab63..1cc29c7 100644 --- a/mali_kbase/mali_kbase_pbha_debugfs.c +++ b/mali_kbase/mali_kbase_pbha_debugfs.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2021-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -20,13 +20,15 @@ */ #include "mali_kbase_pbha_debugfs.h" - #include "mali_kbase_pbha.h" - #include <device/mali_kbase_device.h> #include <mali_kbase_reset_gpu.h> #include <mali_kbase.h> +#if MALI_USE_CSF +#include "backend/gpu/mali_kbase_pm_internal.h" +#endif + static int int_id_overrides_show(struct seq_file *sfile, void *data) { struct kbase_device *kbdev = sfile->private; @@ -108,6 +110,90 @@ static int int_id_overrides_open(struct inode *in, struct file *file) return single_open(file, int_id_overrides_show, in->i_private); } +#if MALI_USE_CSF +/** + * propagate_bits_show - Read PBHA bits from L2_CONFIG out to debugfs. + * + * @sfile: The debugfs entry. + * @data: Data associated with the entry. 
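(Sketch only, not part of the merged patch: the device-tree parsing above accepts both the dashed property names "int-id-override"/"propagate-bits" and the legacy underscored spellings, trying the dashed name first and falling back on -EINVAL. The hypothetical helper below, with made-up example_* names, just illustrates that same fallback pattern for a single u32 property.)

#include <linux/of.h>
#include <linux/errno.h>

/* Hypothetical helper: prefer the dashed property name, fall back to the
 * legacy underscored spelling if the dashed one is absent (-EINVAL).
 */
static int example_read_u32_with_fallback(const struct device_node *np,
                                          const char *dashed, const char *legacy,
                                          u32 *out)
{
        int err = of_property_read_u32(np, dashed, out);

        if (err == -EINVAL)     /* property not present under the dashed name */
                err = of_property_read_u32(np, legacy, out);

        return err;
}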
+ * + * Return: 0 in all cases. + */ +static int propagate_bits_show(struct seq_file *sfile, void *data) +{ + struct kbase_device *kbdev = sfile->private; + u32 l2_config_val; + + kbase_csf_scheduler_pm_active(kbdev); + kbase_pm_wait_for_l2_powered(kbdev); + l2_config_val = L2_CONFIG_PBHA_HWU_GET(kbase_reg_read(kbdev, GPU_CONTROL_REG(L2_CONFIG))); + kbase_csf_scheduler_pm_idle(kbdev); + + seq_printf(sfile, "PBHA Propagate Bits: 0x%x\n", l2_config_val); + return 0; +} + +static int propagate_bits_open(struct inode *in, struct file *file) +{ + return single_open(file, propagate_bits_show, in->i_private); +} + +/** + * propagate_bits_write - Write input value from debugfs to PBHA bits of L2_CONFIG register. + * + * @file: Pointer to file struct of debugfs node. + * @ubuf: Pointer to user buffer with value to be written. + * @count: Size of user buffer. + * @ppos: Not used. + * + * Return: Size of buffer passed in when successful, but error code E2BIG/EINVAL otherwise. + */ +static ssize_t propagate_bits_write(struct file *file, const char __user *ubuf, size_t count, + loff_t *ppos) +{ + struct seq_file *sfile = file->private_data; + struct kbase_device *kbdev = sfile->private; + /* 32 characters should be enough for the input string in any base */ + char raw_str[32]; + unsigned long propagate_bits; + + if (count >= sizeof(raw_str)) + return -E2BIG; + if (copy_from_user(raw_str, ubuf, count)) + return -EINVAL; + raw_str[count] = '\0'; + if (kstrtoul(raw_str, 0, &propagate_bits)) + return -EINVAL; + + /* Check propagate_bits input argument does not + * exceed the maximum size of the propagate_bits mask. + */ + if (propagate_bits > (L2_CONFIG_PBHA_HWU_MASK >> L2_CONFIG_PBHA_HWU_SHIFT)) + return -EINVAL; + /* Cast to u8 is safe as check is done already to ensure size is within + * correct limits. 
+ */ + kbdev->pbha_propagate_bits = (u8)propagate_bits; + + /* GPU Reset will set new values in L2 config */ + if (kbase_prepare_to_reset_gpu(kbdev, RESET_FLAGS_NONE)) { + kbase_reset_gpu(kbdev); + kbase_reset_gpu_wait(kbdev); + } + + return count; +} + +static const struct file_operations pbha_propagate_bits_fops = { + .owner = THIS_MODULE, + .open = propagate_bits_open, + .read = seq_read, + .write = propagate_bits_write, + .llseek = seq_lseek, + .release = single_release, +}; +#endif /* MALI_USE_CSF */ + static const struct file_operations pbha_int_id_overrides_fops = { .owner = THIS_MODULE, .open = int_id_overrides_open, @@ -120,14 +206,10 @@ static const struct file_operations pbha_int_id_overrides_fops = { void kbase_pbha_debugfs_init(struct kbase_device *kbdev) { if (kbasep_pbha_supported(kbdev)) { -#if (KERNEL_VERSION(4, 7, 0) <= LINUX_VERSION_CODE) - /* only for newer kernel version debug file system is safe */ const mode_t mode = 0644; -#else - const mode_t mode = 0600; -#endif struct dentry *debugfs_pbha_dir = debugfs_create_dir( "pbha", kbdev->mali_debugfs_directory); + if (IS_ERR_OR_NULL(debugfs_pbha_dir)) { dev_err(kbdev->dev, "Couldn't create mali debugfs page-based hardware attributes directory\n"); @@ -136,5 +218,10 @@ void kbase_pbha_debugfs_init(struct kbase_device *kbdev) debugfs_create_file("int_id_overrides", mode, debugfs_pbha_dir, kbdev, &pbha_int_id_overrides_fops); +#if MALI_USE_CSF + if (kbase_hw_has_feature(kbdev, BASE_HW_FEATURE_PBHA_HWU)) + debugfs_create_file("propagate_bits", mode, debugfs_pbha_dir, kbdev, + &pbha_propagate_bits_fops); +#endif /* MALI_USE_CSF */ } } diff --git a/mali_kbase/mali_kbase_pbha_debugfs.h b/mali_kbase/mali_kbase_pbha_debugfs.h index 3f477b4..508ecdf 100644 --- a/mali_kbase/mali_kbase_pbha_debugfs.h +++ b/mali_kbase/mali_kbase_pbha_debugfs.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2021-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -25,7 +25,7 @@ #include <mali_kbase.h> /** - * kbasep_pbha_debugfs_init - Initialize pbha debugfs directory + * kbase_pbha_debugfs_init - Initialize pbha debugfs directory * * @kbdev: Device pointer */ diff --git a/mali_kbase/mali_kbase_platform_fake.c b/mali_kbase/mali_kbase_platform_fake.c index bf525ed..265c676 100644 --- a/mali_kbase/mali_kbase_platform_fake.c +++ b/mali_kbase/mali_kbase_platform_fake.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2011-2014, 2016-2017, 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2011-2014, 2016-2017, 2020-2022 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -32,14 +32,15 @@ */ #include <mali_kbase_config.h> +#ifndef CONFIG_OF + #define PLATFORM_CONFIG_RESOURCE_COUNT 4 -#define PLATFORM_CONFIG_IRQ_RES_COUNT 3 static struct platform_device *mali_device; -#ifndef CONFIG_OF /** - * Convert data in struct kbase_io_resources struct to Linux-specific resources + * kbasep_config_parse_io_resources - Convert data in struct kbase_io_resources + * struct to Linux-specific resources * @io_resources: Input IO resource data * @linux_resources: Pointer to output array of Linux resource structures * @@ -72,14 +73,11 @@ static void kbasep_config_parse_io_resources(const struct kbase_io_resources *io linux_resources[3].end = io_resources->gpu_irq_number; linux_resources[3].flags = IORESOURCE_IRQ | IORESOURCE_IRQ_HIGHLEVEL; } -#endif /* CONFIG_OF */ int kbase_platform_register(void) { struct kbase_platform_config *config; -#ifndef CONFIG_OF struct resource resources[PLATFORM_CONFIG_RESOURCE_COUNT]; -#endif int err; config = kbase_get_platform_config(); /* declared in midgard/mali_kbase_config.h but defined in platform folder */ @@ -92,7 +90,6 @@ int kbase_platform_register(void) if (mali_device == NULL) return -ENOMEM; -#ifndef CONFIG_OF kbasep_config_parse_io_resources(config->io_resources, resources); err = platform_device_add_resources(mali_device, resources, PLATFORM_CONFIG_RESOURCE_COUNT); if (err) { @@ -100,7 +97,6 @@ int kbase_platform_register(void) mali_device = NULL; return err; } -#endif /* CONFIG_OF */ err = platform_device_add(mali_device); if (err) { @@ -119,3 +115,5 @@ void kbase_platform_unregister(void) platform_device_unregister(mali_device); } EXPORT_SYMBOL(kbase_platform_unregister); + +#endif /* CONFIG_OF */ diff --git a/mali_kbase/mali_kbase_pm.c b/mali_kbase/mali_kbase_pm.c index de2422c..d6c559a 100644 --- a/mali_kbase/mali_kbase_pm.c +++ b/mali_kbase/mali_kbase_pm.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2010-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2010-2023 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -27,7 +27,7 @@ #include <gpu/mali_kbase_gpu_regmap.h> #include <mali_kbase_vinstr.h> #include <mali_kbase_kinstr_prfcnt.h> -#include <mali_kbase_hwcnt_context.h> +#include <hwcnt/mali_kbase_hwcnt_context.h> #include <mali_kbase_pm.h> #include <backend/gpu/mali_kbase_pm_internal.h> @@ -159,13 +159,13 @@ int kbase_pm_driver_suspend(struct kbase_device *kbdev) */ kbase_hwcnt_context_disable(kbdev->hwcnt_gpu_ctx); - mutex_lock(&kbdev->pm.lock); + rt_mutex_lock(&kbdev->pm.lock); if (WARN_ON(kbase_pm_is_suspending(kbdev))) { - mutex_unlock(&kbdev->pm.lock); + rt_mutex_unlock(&kbdev->pm.lock); return 0; } kbdev->pm.suspending = true; - mutex_unlock(&kbdev->pm.lock); + rt_mutex_unlock(&kbdev->pm.lock); #ifdef CONFIG_MALI_ARBITER_SUPPORT if (kbdev->arb.arb_if) { @@ -194,9 +194,9 @@ int kbase_pm_driver_suspend(struct kbase_device *kbdev) kbasep_js_suspend(kbdev); #else if (kbase_csf_scheduler_pm_suspend(kbdev)) { - mutex_lock(&kbdev->pm.lock); + rt_mutex_lock(&kbdev->pm.lock); kbdev->pm.suspending = false; - mutex_unlock(&kbdev->pm.lock); + rt_mutex_unlock(&kbdev->pm.lock); return -1; } #endif @@ -211,13 +211,31 @@ int kbase_pm_driver_suspend(struct kbase_device *kbdev) kbdev->pm.active_count == 0); dev_dbg(kbdev->dev, ">wait_event - waiting done\n"); +#if MALI_USE_CSF + /* At this point, any kbase context termination should either have run to + * completion and any further context termination can only begin after + * the system resumes. Therefore, it is now safe to skip taking the context + * list lock when traversing the context list. + */ + if (kbase_csf_kcpu_queue_halt_timers(kbdev)) { + rt_mutex_lock(&kbdev->pm.lock); + kbdev->pm.suspending = false; + rt_mutex_unlock(&kbdev->pm.lock); + return -1; + } +#endif + /* NOTE: We synchronize with anything that was just finishing a * kbase_pm_context_idle() call by locking the pm.lock below */ if (kbase_hwaccess_pm_suspend(kbdev)) { - mutex_lock(&kbdev->pm.lock); +#if MALI_USE_CSF + /* Resume the timers in case of suspend failure. */ + kbase_csf_kcpu_queue_resume_timers(kbdev); +#endif + rt_mutex_lock(&kbdev->pm.lock); kbdev->pm.suspending = false; - mutex_unlock(&kbdev->pm.lock); + rt_mutex_unlock(&kbdev->pm.lock); return -1; } @@ -262,6 +280,8 @@ void kbase_pm_driver_resume(struct kbase_device *kbdev, bool arb_gpu_start) kbasep_js_resume(kbdev); #else kbase_csf_scheduler_pm_resume(kbdev); + + kbase_csf_kcpu_queue_resume_timers(kbdev); #endif /* Matching idle call, to power off the GPU/cores if we didn't actually @@ -283,6 +303,10 @@ void kbase_pm_driver_resume(struct kbase_device *kbdev, bool arb_gpu_start) /* Resume HW counters intermediaries. 
*/ kbase_vinstr_resume(kbdev->vinstr_ctx); kbase_kinstr_prfcnt_resume(kbdev->kinstr_prfcnt_ctx); + /* System resume callback is complete */ + kbdev->pm.resuming = false; + /* Unblock the threads waiting for the completion of System suspend/resume */ + wake_up_all(&kbdev->pm.resume_wait); } int kbase_pm_suspend(struct kbase_device *kbdev) @@ -462,11 +486,11 @@ static enum hrtimer_restart kbase_pm_apc_timer_callback(struct hrtimer *timer) int kbase_pm_apc_init(struct kbase_device *kbdev) { - kthread_init_worker(&kbdev->apc.worker); - kbdev->apc.thread = kbase_create_realtime_thread(kbdev, - kthread_worker_fn, &kbdev->apc.worker, "mali_apc_thread"); - if (IS_ERR(kbdev->apc.thread)) - return PTR_ERR(kbdev->apc.thread); + int ret; + + ret = kbase_kthread_run_worker_rt(kbdev, &kbdev->apc.worker, "mali_apc_thread"); + if (ret) + return ret; /* * We initialize power off and power on work on init as they will each @@ -486,6 +510,5 @@ int kbase_pm_apc_init(struct kbase_device *kbdev) void kbase_pm_apc_term(struct kbase_device *kbdev) { hrtimer_cancel(&kbdev->apc.timer); - kthread_flush_worker(&kbdev->apc.worker); - kthread_stop(kbdev->apc.thread); + kbase_destroy_kworker_stack(&kbdev->apc.worker); } diff --git a/mali_kbase/mali_kbase_pm.h b/mali_kbase/mali_kbase_pm.h index 7252bc7..4ff3699 100644 --- a/mali_kbase/mali_kbase_pm.h +++ b/mali_kbase/mali_kbase_pm.h @@ -292,4 +292,14 @@ void kbase_pm_apc_term(struct kbase_device *kbdev); */ void kbase_pm_apc_request(struct kbase_device *kbdev, u32 dur_usec); +/** + * Print debug message indicating power state of GPU + * @kbdev: The kbase device structure for the device (must be a valid pointer) + * @timeout_msg: A message to print. + * + * Prerequisite: GPU is powered. + * Takes and releases kbdev->hwaccess_lock on CSF GPUs. + */ +void kbase_gpu_timeout_debug_message(struct kbase_device *kbdev, const char *timeout_msg); + #endif /* _KBASE_PM_H_ */ diff --git a/mali_kbase/mali_kbase_refcount_defs.h b/mali_kbase/mali_kbase_refcount_defs.h new file mode 100644 index 0000000..c517a2d --- /dev/null +++ b/mali_kbase/mali_kbase_refcount_defs.h @@ -0,0 +1,57 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +/* + * + * (C) COPYRIGHT 2023 ARM Limited. All rights reserved. + * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. 
+ * + */ + +#ifndef _KBASE_REFCOUNT_DEFS_H_ +#define _KBASE_REFCOUNT_DEFS_H_ + +/* + * The Refcount API is available from 4.11 onwards + * This file hides the compatibility issues with this for the rest the driver + */ + +#include <linux/version.h> +#include <linux/types.h> + +#if (KERNEL_VERSION(4, 11, 0) > LINUX_VERSION_CODE) + +#define kbase_refcount_t atomic_t +#define kbase_refcount_read(x) atomic_read(x) +#define kbase_refcount_set(x, v) atomic_set(x, v) +#define kbase_refcount_dec_and_test(x) atomic_dec_and_test(x) +#define kbase_refcount_dec(x) atomic_dec(x) +#define kbase_refcount_inc_not_zero(x) atomic_inc_not_zero(x) +#define kbase_refcount_inc(x) atomic_inc(x) + +#else + +#include <linux/refcount.h> + +#define kbase_refcount_t refcount_t +#define kbase_refcount_read(x) refcount_read(x) +#define kbase_refcount_set(x, v) refcount_set(x, v) +#define kbase_refcount_dec_and_test(x) refcount_dec_and_test(x) +#define kbase_refcount_dec(x) refcount_dec(x) +#define kbase_refcount_inc_not_zero(x) refcount_inc_not_zero(x) +#define kbase_refcount_inc(x) refcount_inc(x) + +#endif /* (KERNEL_VERSION(4, 11, 0) > LINUX_VERSION_CODE) */ + +#endif /* _KBASE_REFCOUNT_DEFS_H_ */ diff --git a/mali_kbase/mali_kbase_regs_history_debugfs.c b/mali_kbase/mali_kbase_regs_history_debugfs.c index f8dec6b..c19b4a3 100644 --- a/mali_kbase/mali_kbase_regs_history_debugfs.c +++ b/mali_kbase/mali_kbase_regs_history_debugfs.c @@ -25,6 +25,7 @@ #if defined(CONFIG_DEBUG_FS) && !IS_ENABLED(CONFIG_MALI_NO_MALI) #include <linux/debugfs.h> +#include <linux/version_compat_defs.h> /** * kbase_io_history_resize - resize the register access history buffer. @@ -158,11 +159,8 @@ static int regs_history_size_set(void *data, u64 val) return kbase_io_history_resize(h, (u16)val); } - -DEFINE_SIMPLE_ATTRIBUTE(regs_history_size_fops, - regs_history_size_get, - regs_history_size_set, - "%llu\n"); +DEFINE_DEBUGFS_ATTRIBUTE(regs_history_size_fops, regs_history_size_get, regs_history_size_set, + "%llu\n"); /** * regs_history_show - show callback for the register access history file. diff --git a/mali_kbase/mali_kbase_reset_gpu.h b/mali_kbase/mali_kbase_reset_gpu.h index ff631e9..5063b64 100644 --- a/mali_kbase/mali_kbase_reset_gpu.h +++ b/mali_kbase/mali_kbase_reset_gpu.h @@ -144,6 +144,14 @@ void kbase_reset_gpu_assert_prevented(struct kbase_device *kbdev); void kbase_reset_gpu_assert_failed_or_prevented(struct kbase_device *kbdev); /** + * kbase_reset_gpu_failed - Return whether a previous GPU reset failed. + * + * @kbdev: Device pointer + * + */ +bool kbase_reset_gpu_failed(struct kbase_device *kbdev); + +/** * RESET_FLAGS_NONE - Flags for kbase_prepare_to_reset_gpu */ #define RESET_FLAGS_NONE (0U) @@ -151,6 +159,9 @@ void kbase_reset_gpu_assert_failed_or_prevented(struct kbase_device *kbdev); /* This reset should be treated as an unrecoverable error by HW counter logic */ #define RESET_FLAGS_HWC_UNRECOVERABLE_ERROR ((unsigned int)(1 << 0)) +/* pixel: Powercycle the GPU instead of attempting a soft/hard reset (only used on CSF hw). */ +#define RESET_FLAGS_FORCE_PM_HW_RESET ((unsigned int)(1 << 1)) + /** * kbase_prepare_to_reset_gpu_locked - Prepare for resetting the GPU. 
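(Illustrative sketch, not part of the patch: the new mali_kbase_refcount_defs.h above maps the kbase_refcount_* wrappers to refcount_t on v4.11+ kernels and to plain atomic_t on older ones. The struct my_obj and its helpers below are hypothetical and only show the intended usage pattern.)

#include "mali_kbase_refcount_defs.h"

struct my_obj {                         /* hypothetical refcounted object */
        kbase_refcount_t refcount;
};

static void my_obj_init(struct my_obj *obj)
{
        /* Maps to refcount_set() on v4.11+, atomic_set() on older kernels. */
        kbase_refcount_set(&obj->refcount, 1);
}

static void my_obj_get(struct my_obj *obj)
{
        kbase_refcount_inc(&obj->refcount);
}

static bool my_obj_put(struct my_obj *obj)
{
        /* True when the last reference is dropped, so the caller can free obj. */
        return kbase_refcount_dec_and_test(&obj->refcount);
}

On newer kernels this indirection picks up refcount_t's saturation and overflow checking for free, which a bare atomic_t counter does not provide.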
* @kbdev: Device pointer @@ -237,6 +248,18 @@ int kbase_reset_gpu_silent(struct kbase_device *kbdev); bool kbase_reset_gpu_is_active(struct kbase_device *kbdev); /** + * kbase_reset_gpu_not_pending - Reports if the GPU reset isn't pending + * + * @kbdev: Device pointer + * + * Note that unless appropriate locks are held when using this function, the + * state could change immediately afterwards. + * + * Return: True if the GPU reset isn't pending. + */ +bool kbase_reset_gpu_is_not_pending(struct kbase_device *kbdev); + +/** * kbase_reset_gpu_wait - Wait for a GPU reset to complete * @kbdev: Device pointer * diff --git a/mali_kbase/mali_kbase_smc.h b/mali_kbase/mali_kbase_smc.h index 91eb9ee..40a3483 100644 --- a/mali_kbase/mali_kbase_smc.h +++ b/mali_kbase/mali_kbase_smc.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2015, 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2015, 2020-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -49,7 +49,7 @@ u64 kbase_invoke_smc_fid(u32 fid, u64 arg0, u64 arg1, u64 arg2); /** - * kbase_invoke_smc_fid - Perform a secure monitor call + * kbase_invoke_smc - Perform a secure monitor call * @oen: Owning Entity number (SIP, STD etc). * @function_number: The function number within the OEN. * @smc64: use SMC64 calling convention instead of SMC32. diff --git a/mali_kbase/mali_kbase_softjobs.c b/mali_kbase/mali_kbase_softjobs.c index bbb0934..31da049 100644 --- a/mali_kbase/mali_kbase_softjobs.c +++ b/mali_kbase/mali_kbase_softjobs.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2011-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2011-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -23,7 +23,7 @@ #include <linux/dma-buf.h> #include <asm/cacheflush.h> -#if defined(CONFIG_SYNC) || defined(CONFIG_SYNC_FILE) +#if IS_ENABLED(CONFIG_SYNC_FILE) #include <mali_kbase_sync.h> #include <mali_kbase_fence.h> #endif @@ -41,6 +41,7 @@ #include <linux/kernel.h> #include <linux/cache.h> #include <linux/file.h> +#include <linux/version_compat_defs.h> #if !MALI_USE_CSF /** @@ -75,7 +76,7 @@ static void kbasep_add_waiting_with_timeout(struct kbase_jd_atom *katom) /* Record the start time of this atom so we could cancel it at * the right time. */ - katom->start_timestamp = ktime_get(); + katom->start_timestamp = ktime_get_raw(); /* Add the atom to the waiting list before the timer is * (re)started to make sure that it gets processed. 
@@ -206,7 +207,7 @@ static int kbase_dump_cpu_gpu_time(struct kbase_jd_atom *katom) return 0; } -#if defined(CONFIG_SYNC) || defined(CONFIG_SYNC_FILE) +#if IS_ENABLED(CONFIG_SYNC_FILE) /* Called by the explicit fence mechanism when a fence wait has completed */ void kbase_soft_event_wait_callback(struct kbase_jd_atom *katom) { @@ -215,7 +216,7 @@ void kbase_soft_event_wait_callback(struct kbase_jd_atom *katom) rt_mutex_lock(&kctx->jctx.lock); kbasep_remove_waiting_soft_job(katom); kbase_finish_soft_job(katom); - if (jd_done_nolock(katom, true)) + if (kbase_jd_done_nolock(katom, true)) kbase_js_sched_all(kctx->kbdev); rt_mutex_unlock(&kctx->jctx.lock); } @@ -229,7 +230,7 @@ static void kbasep_soft_event_complete_job(struct kthread_work *work) int resched; rt_mutex_lock(&kctx->jctx.lock); - resched = jd_done_nolock(katom, true); + resched = kbase_jd_done_nolock(katom, true); rt_mutex_unlock(&kctx->jctx.lock); if (resched) @@ -390,7 +391,7 @@ void kbasep_soft_job_timeout_worker(struct timer_list *timer) soft_job_timeout); u32 timeout_ms = (u32)atomic_read( &kctx->kbdev->js_data.soft_job_timeout_ms); - ktime_t cur_time = ktime_get(); + ktime_t cur_time = ktime_get_raw(); bool restarting = false; unsigned long lflags; struct list_head *entry, *tmp; @@ -500,10 +501,11 @@ out: static void kbasep_soft_event_cancel_job(struct kbase_jd_atom *katom) { katom->event_code = BASE_JD_EVENT_JOB_CANCELLED; - if (jd_done_nolock(katom, true)) + if (kbase_jd_done_nolock(katom, true)) kbase_js_sched_all(katom->kctx->kbdev); } +#if IS_ENABLED(CONFIG_MALI_VECTOR_DUMP) || MALI_UNIT_TEST static void kbase_debug_copy_finish(struct kbase_jd_atom *katom) { struct kbase_debug_copy_buffer *buffers = katom->softjob_data; @@ -675,8 +677,8 @@ static int kbase_debug_copy_prepare(struct kbase_jd_atom *katom) case KBASE_MEM_TYPE_IMPORTED_USER_BUF: { struct kbase_mem_phy_alloc *alloc = reg->gpu_alloc; - unsigned long nr_pages = - alloc->imported.user_buf.nr_pages; + const unsigned long nr_pages = alloc->imported.user_buf.nr_pages; + const unsigned long start = alloc->imported.user_buf.address; if (alloc->imported.user_buf.mm != current->mm) { ret = -EINVAL; @@ -688,11 +690,9 @@ static int kbase_debug_copy_prepare(struct kbase_jd_atom *katom) ret = -ENOMEM; goto out_unlock; } - - ret = get_user_pages_fast( - alloc->imported.user_buf.address, - nr_pages, 0, - buffers[i].extres_pages); + kbase_gpu_vm_unlock(katom->kctx); + ret = get_user_pages_fast(start, nr_pages, 0, buffers[i].extres_pages); + kbase_gpu_vm_lock(katom->kctx); if (ret != nr_pages) { /* Adjust number of pages, so that we only * attempt to release pages in the array that we @@ -730,7 +730,6 @@ out_cleanup: return ret; } -#endif /* !MALI_USE_CSF */ #if KERNEL_VERSION(5, 6, 0) <= LINUX_VERSION_CODE static void *dma_buf_kmap_page(struct kbase_mem_phy_alloc *gpu_alloc, @@ -753,7 +752,7 @@ static void *dma_buf_kmap_page(struct kbase_mem_phy_alloc *gpu_alloc, if (page_index == page_num) { *page = sg_page_iter_page(&sg_iter); - return kmap(*page); + return kbase_kmap(*page); } page_index++; } @@ -762,8 +761,18 @@ static void *dma_buf_kmap_page(struct kbase_mem_phy_alloc *gpu_alloc, } #endif -int kbase_mem_copy_from_extres(struct kbase_context *kctx, - struct kbase_debug_copy_buffer *buf_data) +/** + * kbase_mem_copy_from_extres() - Copy from external resources. + * + * @kctx: kbase context within which the copying is to take place. 
+ * @buf_data: Pointer to the information about external resources: + * pages pertaining to the external resource, number of + * pages to copy. + * + * Return: 0 on success, error code otherwise. + */ +static int kbase_mem_copy_from_extres(struct kbase_context *kctx, + struct kbase_debug_copy_buffer *buf_data) { unsigned int i; unsigned int target_page_nr = 0; @@ -789,14 +798,13 @@ int kbase_mem_copy_from_extres(struct kbase_context *kctx, for (i = 0; i < buf_data->nr_extres_pages && target_page_nr < buf_data->nr_pages; i++) { struct page *pg = buf_data->extres_pages[i]; - void *extres_page = kmap(pg); - + void *extres_page = kbase_kmap(pg); if (extres_page) { ret = kbase_mem_copy_to_pinned_user_pages( pages, extres_page, &to_copy, buf_data->nr_pages, &target_page_nr, offset); - kunmap(pg); + kbase_kunmap(pg, extres_page); if (ret) goto out_unlock; } @@ -812,11 +820,7 @@ int kbase_mem_copy_from_extres(struct kbase_context *kctx, dma_to_copy = min(dma_buf->size, (size_t)(buf_data->nr_extres_pages * PAGE_SIZE)); - ret = dma_buf_begin_cpu_access(dma_buf, -#if KERNEL_VERSION(4, 6, 0) > LINUX_VERSION_CODE && !defined(CONFIG_CHROMEOS) - 0, dma_to_copy, -#endif - DMA_FROM_DEVICE); + ret = dma_buf_begin_cpu_access(dma_buf, DMA_FROM_DEVICE); if (ret) goto out_unlock; @@ -835,7 +839,7 @@ int kbase_mem_copy_from_extres(struct kbase_context *kctx, &target_page_nr, offset); #if KERNEL_VERSION(5, 6, 0) <= LINUX_VERSION_CODE - kunmap(pg); + kbase_kunmap(pg, extres_page); #else dma_buf_kunmap(dma_buf, i, extres_page); #endif @@ -843,11 +847,7 @@ int kbase_mem_copy_from_extres(struct kbase_context *kctx, break; } } - dma_buf_end_cpu_access(dma_buf, -#if KERNEL_VERSION(4, 6, 0) > LINUX_VERSION_CODE && !defined(CONFIG_CHROMEOS) - 0, dma_to_copy, -#endif - DMA_FROM_DEVICE); + dma_buf_end_cpu_access(dma_buf, DMA_FROM_DEVICE); break; } default: @@ -858,7 +858,6 @@ out_unlock: return ret; } -#if !MALI_USE_CSF static int kbase_debug_copy(struct kbase_jd_atom *katom) { struct kbase_debug_copy_buffer *buffers = katom->softjob_data; @@ -876,6 +875,7 @@ static int kbase_debug_copy(struct kbase_jd_atom *katom) return 0; } +#endif /* IS_ENABLED(CONFIG_MALI_VECTOR_DUMP) || MALI_UNIT_TEST */ #endif /* !MALI_USE_CSF */ #define KBASEP_JIT_ALLOC_GPU_ADDR_ALIGNMENT ((u32)0x7) @@ -935,26 +935,6 @@ int kbasep_jit_alloc_validate(struct kbase_context *kctx, #if !MALI_USE_CSF -/* - * Sizes of user data to copy for each just-in-time memory interface version - * - * In interface version 2 onwards this is the same as the struct size, allowing - * copying of arrays of structures from userspace. - * - * In interface version 1 the structure size was variable, and hence arrays of - * structures cannot be supported easily, and were not a feature present in - * version 1 anyway. - */ -static const size_t jit_info_copy_size_for_jit_version[] = { - /* in jit_version 1, the structure did not have any end padding, hence - * it could be a different size on 32 and 64-bit clients. 
We therefore - * do not copy past the last member - */ - [1] = offsetofend(struct base_jit_alloc_info_10_2, id), - [2] = sizeof(struct base_jit_alloc_info_11_5), - [3] = sizeof(struct base_jit_alloc_info) -}; - static int kbase_jit_allocate_prepare(struct kbase_jd_atom *katom) { __user u8 *data = (__user u8 *)(uintptr_t) katom->jc; @@ -964,18 +944,18 @@ static int kbase_jit_allocate_prepare(struct kbase_jd_atom *katom) u32 count; int ret; u32 i; - size_t jit_info_user_copy_size; - WARN_ON(kctx->jit_version >= - ARRAY_SIZE(jit_info_copy_size_for_jit_version)); - jit_info_user_copy_size = - jit_info_copy_size_for_jit_version[kctx->jit_version]; - WARN_ON(jit_info_user_copy_size > sizeof(*info)); + if (!kbase_mem_allow_alloc(kctx)) { + dev_dbg(kbdev->dev, "Invalid attempt to allocate JIT memory by %s/%d for ctx %d_%d", + current->comm, current->pid, kctx->tgid, kctx->id); + ret = -EINVAL; + goto fail; + } /* For backwards compatibility, and to prevent reading more than 1 jit * info struct on jit version 1 */ - if (katom->nr_extres == 0 || kctx->jit_version == 1) + if (katom->nr_extres == 0) katom->nr_extres = 1; count = katom->nr_extres; @@ -995,17 +975,11 @@ static int kbase_jit_allocate_prepare(struct kbase_jd_atom *katom) katom->softjob_data = info; - for (i = 0; i < count; i++, info++, data += jit_info_user_copy_size) { - if (copy_from_user(info, data, jit_info_user_copy_size) != 0) { + for (i = 0; i < count; i++, info++, data += sizeof(*info)) { + if (copy_from_user(info, data, sizeof(*info)) != 0) { ret = -EINVAL; goto free_info; } - /* Clear any remaining bytes when user struct is smaller than - * kernel struct. For jit version 1, this also clears the - * padding bytes - */ - memset(((u8 *)info) + jit_info_user_copy_size, 0, - sizeof(*info) - jit_info_user_copy_size); ret = kbasep_jit_alloc_validate(kctx, info); if (ret) @@ -1357,7 +1331,7 @@ static void kbasep_jit_finish_worker(struct kthread_work *work) rt_mutex_lock(&kctx->jctx.lock); kbase_finish_soft_job(katom); - resched = jd_done_nolock(katom, true); + resched = kbase_jd_done_nolock(katom, true); rt_mutex_unlock(&kctx->jctx.lock); if (resched) @@ -1486,10 +1460,11 @@ static void kbase_ext_res_process(struct kbase_jd_atom *katom, bool map) if (!kbase_sticky_resource_acquire(katom->kctx, gpu_addr)) goto failed_loop; - } else + } else { if (!kbase_sticky_resource_release_force(katom->kctx, NULL, gpu_addr)) failed = true; + } } /* @@ -1549,7 +1524,7 @@ int kbase_process_soft_job(struct kbase_jd_atom *katom) ret = kbase_dump_cpu_gpu_time(katom); break; -#if defined(CONFIG_SYNC) || defined(CONFIG_SYNC_FILE) +#if IS_ENABLED(CONFIG_SYNC_FILE) case BASE_JD_REQ_SOFT_FENCE_TRIGGER: katom->event_code = kbase_sync_fence_out_trigger(katom, katom->event_code == BASE_JD_EVENT_DONE ? 
@@ -1578,6 +1553,7 @@ int kbase_process_soft_job(struct kbase_jd_atom *katom) case BASE_JD_REQ_SOFT_EVENT_RESET: kbasep_soft_event_update_locked(katom, BASE_JD_SOFT_EVENT_RESET); break; +#if IS_ENABLED(CONFIG_MALI_VECTOR_DUMP) || MALI_UNIT_TEST case BASE_JD_REQ_SOFT_DEBUG_COPY: { int res = kbase_debug_copy(katom); @@ -1586,6 +1562,7 @@ int kbase_process_soft_job(struct kbase_jd_atom *katom) katom->event_code = BASE_JD_EVENT_JOB_INVALID; break; } +#endif /* IS_ENABLED(CONFIG_MALI_VECTOR_DUMP) || MALI_UNIT_TEST */ case BASE_JD_REQ_SOFT_JIT_ALLOC: ret = kbase_jit_allocate_process(katom); break; @@ -1609,7 +1586,7 @@ int kbase_process_soft_job(struct kbase_jd_atom *katom) void kbase_cancel_soft_job(struct kbase_jd_atom *katom) { switch (katom->core_req & BASE_JD_REQ_SOFT_JOB_TYPE) { -#if defined(CONFIG_SYNC) || defined(CONFIG_SYNC_FILE) +#if IS_ENABLED(CONFIG_SYNC_FILE) case BASE_JD_REQ_SOFT_FENCE_WAIT: kbase_sync_fence_in_cancel_wait(katom); break; @@ -1632,7 +1609,7 @@ int kbase_prepare_soft_job(struct kbase_jd_atom *katom) return -EINVAL; } break; -#if defined(CONFIG_SYNC) || defined(CONFIG_SYNC_FILE) +#if IS_ENABLED(CONFIG_SYNC_FILE) case BASE_JD_REQ_SOFT_FENCE_TRIGGER: { struct base_fence fence; @@ -1683,20 +1660,9 @@ int kbase_prepare_soft_job(struct kbase_jd_atom *katom) fence.basep.fd); if (ret < 0) return ret; - -#ifdef CONFIG_MALI_DMA_FENCE - /* - * Set KCTX_NO_IMPLICIT_FENCE in the context the first - * time a soft fence wait job is observed. This will - * prevent the implicit dma-buf fence to conflict with - * the Android native sync fences. - */ - if (!kbase_ctx_flag(katom->kctx, KCTX_NO_IMPLICIT_SYNC)) - kbase_ctx_flag_set(katom->kctx, KCTX_NO_IMPLICIT_SYNC); -#endif /* CONFIG_MALI_DMA_FENCE */ } break; -#endif /* CONFIG_SYNC || CONFIG_SYNC_FILE */ +#endif /* CONFIG_SYNC_FILE */ case BASE_JD_REQ_SOFT_JIT_ALLOC: return kbase_jit_allocate_prepare(katom); case BASE_JD_REQ_SOFT_JIT_FREE: @@ -1707,8 +1673,10 @@ int kbase_prepare_soft_job(struct kbase_jd_atom *katom) if (katom->jc == 0) return -EINVAL; break; +#if IS_ENABLED(CONFIG_MALI_VECTOR_DUMP) || MALI_UNIT_TEST case BASE_JD_REQ_SOFT_DEBUG_COPY: return kbase_debug_copy_prepare(katom); +#endif /* IS_ENABLED(CONFIG_MALI_VECTOR_DUMP) || MALI_UNIT_TEST */ case BASE_JD_REQ_SOFT_EXT_RES_MAP: return kbase_ext_res_prepare(katom); case BASE_JD_REQ_SOFT_EXT_RES_UNMAP: @@ -1729,7 +1697,7 @@ void kbase_finish_soft_job(struct kbase_jd_atom *katom) case BASE_JD_REQ_SOFT_DUMP_CPU_GPU_TIME: /* Nothing to do */ break; -#if defined(CONFIG_SYNC) || defined(CONFIG_SYNC_FILE) +#if IS_ENABLED(CONFIG_SYNC_FILE) case BASE_JD_REQ_SOFT_FENCE_TRIGGER: /* If fence has not yet been signaled, do it now */ kbase_sync_fence_out_trigger(katom, katom->event_code == @@ -1739,10 +1707,12 @@ void kbase_finish_soft_job(struct kbase_jd_atom *katom) /* Release katom's reference to fence object */ kbase_sync_fence_in_remove(katom); break; -#endif /* CONFIG_SYNC || CONFIG_SYNC_FILE */ +#endif /* CONFIG_SYNC_FILE */ +#if IS_ENABLED(CONFIG_MALI_VECTOR_DUMP) || MALI_UNIT_TEST case BASE_JD_REQ_SOFT_DEBUG_COPY: kbase_debug_copy_finish(katom); break; +#endif /* IS_ENABLED(CONFIG_MALI_VECTOR_DUMP) || MALI_UNIT_TEST */ case BASE_JD_REQ_SOFT_JIT_ALLOC: kbase_jit_allocate_finish(katom); break; @@ -1793,7 +1763,7 @@ void kbase_resume_suspended_soft_jobs(struct kbase_device *kbdev) if (kbase_process_soft_job(katom_iter) == 0) { kbase_finish_soft_job(katom_iter); - resched |= jd_done_nolock(katom_iter, true); + resched |= kbase_jd_done_nolock(katom_iter, true); #ifdef 
CONFIG_MALI_ARBITER_SUPPORT atomic_dec(&kbdev->pm.gpu_users_waiting); #endif /* CONFIG_MALI_ARBITER_SUPPORT */ diff --git a/mali_kbase/mali_kbase_sync.h b/mali_kbase/mali_kbase_sync.h index e820dcc..2b466a6 100644 --- a/mali_kbase/mali_kbase_sync.h +++ b/mali_kbase/mali_kbase_sync.h @@ -30,9 +30,6 @@ #include <linux/fdtable.h> #include <linux/syscalls.h> -#if IS_ENABLED(CONFIG_SYNC) -#include <sync.h> -#endif #if IS_ENABLED(CONFIG_SYNC_FILE) #include "mali_kbase_fence_defs.h" #include <linux/sync_file.h> @@ -181,7 +178,7 @@ int kbase_sync_fence_out_info_get(struct kbase_jd_atom *katom, struct kbase_sync_fence_info *info); #endif /* !MALI_USE_CSF */ -#if defined(CONFIG_SYNC_FILE) +#if IS_ENABLED(CONFIG_SYNC_FILE) #if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) void kbase_sync_fence_info_get(struct fence *fence, struct kbase_sync_fence_info *info); diff --git a/mali_kbase/mali_kbase_sync_android.c b/mali_kbase/mali_kbase_sync_android.c deleted file mode 100644 index c028b1c..0000000 --- a/mali_kbase/mali_kbase_sync_android.c +++ /dev/null @@ -1,520 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note -/* - * - * (C) COPYRIGHT 2012-2017, 2020-2021 ARM Limited. All rights reserved. - * - * This program is free software and is provided to you under the terms of the - * GNU General Public License version 2 as published by the Free Software - * Foundation, and any use by you of this program is subject to the terms - * of such GNU license. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with this program; if not, you can access it online at - * http://www.gnu.org/licenses/gpl-2.0.html. - * - */ - -/* - * Code for supporting explicit Android fences (CONFIG_SYNC) - * Known to be good for kernels 4.5 and earlier. 
- * Replaced with CONFIG_SYNC_FILE for 4.9 and later kernels - * (see mali_kbase_sync_file.c) - */ - -#include <linux/sched.h> -#include <linux/fdtable.h> -#include <linux/file.h> -#include <linux/fs.h> -#include <linux/module.h> -#include <linux/anon_inodes.h> -#include <linux/version.h> -#include "sync.h" -#include <mali_kbase.h> -#include <mali_kbase_sync.h> - -struct mali_sync_timeline { - struct sync_timeline timeline; - atomic_t counter; - atomic_t signaled; -}; - -struct mali_sync_pt { - struct sync_pt pt; - int order; - int result; -}; - -static struct mali_sync_timeline *to_mali_sync_timeline( - struct sync_timeline *timeline) -{ - return container_of(timeline, struct mali_sync_timeline, timeline); -} - -static struct mali_sync_pt *to_mali_sync_pt(struct sync_pt *pt) -{ - return container_of(pt, struct mali_sync_pt, pt); -} - -static struct sync_pt *timeline_dup(struct sync_pt *pt) -{ - struct mali_sync_pt *mpt = to_mali_sync_pt(pt); - struct mali_sync_pt *new_mpt; - struct sync_pt *new_pt = sync_pt_create(sync_pt_parent(pt), - sizeof(struct mali_sync_pt)); - - if (!new_pt) - return NULL; - - new_mpt = to_mali_sync_pt(new_pt); - new_mpt->order = mpt->order; - new_mpt->result = mpt->result; - - return new_pt; -} - -static int timeline_has_signaled(struct sync_pt *pt) -{ - struct mali_sync_pt *mpt = to_mali_sync_pt(pt); - struct mali_sync_timeline *mtl = to_mali_sync_timeline( - sync_pt_parent(pt)); - int result = mpt->result; - - int diff = atomic_read(&mtl->signaled) - mpt->order; - - if (diff >= 0) - return (result < 0) ? result : 1; - - return 0; -} - -static int timeline_compare(struct sync_pt *a, struct sync_pt *b) -{ - struct mali_sync_pt *ma = container_of(a, struct mali_sync_pt, pt); - struct mali_sync_pt *mb = container_of(b, struct mali_sync_pt, pt); - - int diff = ma->order - mb->order; - - if (diff == 0) - return 0; - - return (diff < 0) ? -1 : 1; -} - -static void timeline_value_str(struct sync_timeline *timeline, char *str, - int size) -{ - struct mali_sync_timeline *mtl = to_mali_sync_timeline(timeline); - - snprintf(str, size, "%d", atomic_read(&mtl->signaled)); -} - -static void pt_value_str(struct sync_pt *pt, char *str, int size) -{ - struct mali_sync_pt *mpt = to_mali_sync_pt(pt); - - snprintf(str, size, "%d(%d)", mpt->order, mpt->result); -} - -static struct sync_timeline_ops mali_timeline_ops = { - .driver_name = "Mali", - .dup = timeline_dup, - .has_signaled = timeline_has_signaled, - .compare = timeline_compare, - .timeline_value_str = timeline_value_str, - .pt_value_str = pt_value_str, -}; - -/* Allocates a timeline for Mali - * - * One timeline should be allocated per API context. 
- */ -static struct sync_timeline *mali_sync_timeline_alloc(const char *name) -{ - struct sync_timeline *tl; - struct mali_sync_timeline *mtl; - - tl = sync_timeline_create(&mali_timeline_ops, - sizeof(struct mali_sync_timeline), name); - if (!tl) - return NULL; - - /* Set the counter in our private struct */ - mtl = to_mali_sync_timeline(tl); - atomic_set(&mtl->counter, 0); - atomic_set(&mtl->signaled, 0); - - return tl; -} - -static int kbase_stream_close(struct inode *inode, struct file *file) -{ - struct sync_timeline *tl; - - tl = (struct sync_timeline *)file->private_data; - sync_timeline_destroy(tl); - return 0; -} - -static const struct file_operations stream_fops = { - .owner = THIS_MODULE, - .release = kbase_stream_close, -}; - -int kbase_sync_fence_stream_create(const char *name, int *const out_fd) -{ - struct sync_timeline *tl; - - if (!out_fd) - return -EINVAL; - - tl = mali_sync_timeline_alloc(name); - if (!tl) - return -EINVAL; - - *out_fd = anon_inode_getfd(name, &stream_fops, tl, O_RDONLY|O_CLOEXEC); - - if (*out_fd < 0) { - sync_timeline_destroy(tl); - return -EINVAL; - } - - return 0; -} - -#if !MALI_USE_CSF -/* Allocates a sync point within the timeline. - * - * The timeline must be the one allocated by kbase_sync_timeline_alloc - * - * Sync points must be triggered in *exactly* the same order as they are - * allocated. - */ -static struct sync_pt *kbase_sync_pt_alloc(struct sync_timeline *parent) -{ - struct sync_pt *pt = sync_pt_create(parent, - sizeof(struct mali_sync_pt)); - struct mali_sync_timeline *mtl = to_mali_sync_timeline(parent); - struct mali_sync_pt *mpt; - - if (!pt) - return NULL; - - mpt = to_mali_sync_pt(pt); - mpt->order = atomic_inc_return(&mtl->counter); - mpt->result = 0; - - return pt; -} - -int kbase_sync_fence_out_create(struct kbase_jd_atom *katom, int tl_fd) -{ - struct sync_timeline *tl; - struct sync_pt *pt; - struct sync_fence *fence; - int fd; - struct file *tl_file; - - tl_file = fget(tl_fd); - if (tl_file == NULL) - return -EBADF; - - if (tl_file->f_op != &stream_fops) { - fd = -EBADF; - goto out; - } - - tl = tl_file->private_data; - - pt = kbase_sync_pt_alloc(tl); - if (!pt) { - fd = -EFAULT; - goto out; - } - - fence = sync_fence_create("mali_fence", pt); - if (!fence) { - sync_pt_free(pt); - fd = -EFAULT; - goto out; - } - - /* from here the fence owns the sync_pt */ - - /* create a fd representing the fence */ - fd = get_unused_fd_flags(O_RDWR | O_CLOEXEC); - if (fd < 0) { - sync_fence_put(fence); - goto out; - } - - /* bind fence to the new fd */ - sync_fence_install(fence, fd); - - katom->fence = sync_fence_fdget(fd); - if (katom->fence == NULL) { - /* The only way the fence can be NULL is if userspace closed it - * for us, so we don't need to clear it up - */ - fd = -EINVAL; - goto out; - } - -out: - fput(tl_file); - - return fd; -} - -int kbase_sync_fence_in_from_fd(struct kbase_jd_atom *katom, int fd) -{ - katom->fence = sync_fence_fdget(fd); - return katom->fence ? 0 : -ENOENT; -} -#endif /* !MALI_USE_CSF */ - -int kbase_sync_fence_validate(int fd) -{ - struct sync_fence *fence; - - fence = sync_fence_fdget(fd); - if (!fence) - return -EINVAL; - - sync_fence_put(fence); - return 0; -} - -#if !MALI_USE_CSF -/* Returns true if the specified timeline is allocated by Mali */ -static int kbase_sync_timeline_is_ours(struct sync_timeline *timeline) -{ - return timeline->ops == &mali_timeline_ops; -} - -/* Signals a particular sync point - * - * Sync points must be triggered in *exactly* the same order as they are - * allocated. 
- * - * If they are signaled in the wrong order then a message will be printed in - * debug builds and otherwise attempts to signal order sync_pts will be ignored. - * - * result can be negative to indicate error, any other value is interpreted as - * success. - */ -static void kbase_sync_signal_pt(struct sync_pt *pt, int result) -{ - struct mali_sync_pt *mpt = to_mali_sync_pt(pt); - struct mali_sync_timeline *mtl = to_mali_sync_timeline( - sync_pt_parent(pt)); - int signaled; - int diff; - - mpt->result = result; - - do { - signaled = atomic_read(&mtl->signaled); - - diff = signaled - mpt->order; - - if (diff > 0) { - /* The timeline is already at or ahead of this point. - * This should not happen unless userspace has been - * signaling fences out of order, so warn but don't - * violate the sync_pt API. - * The warning is only in debug builds to prevent - * a malicious user being able to spam dmesg. - */ -#ifdef CONFIG_MALI_DEBUG - pr_err("Fences were triggered in a different order to allocation!"); -#endif /* CONFIG_MALI_DEBUG */ - return; - } - } while (atomic_cmpxchg(&mtl->signaled, - signaled, mpt->order) != signaled); -} - -enum base_jd_event_code -kbase_sync_fence_out_trigger(struct kbase_jd_atom *katom, int result) -{ - struct sync_pt *pt; - struct sync_timeline *timeline; - - if (!katom->fence) - return BASE_JD_EVENT_JOB_CANCELLED; - - if (katom->fence->num_fences != 1) { - /* Not exactly one item in the list - so it didn't (directly) - * come from us - */ - return BASE_JD_EVENT_JOB_CANCELLED; - } - - pt = container_of(katom->fence->cbs[0].sync_pt, struct sync_pt, base); - timeline = sync_pt_parent(pt); - - if (!kbase_sync_timeline_is_ours(timeline)) { - /* Fence has a sync_pt which isn't ours! */ - return BASE_JD_EVENT_JOB_CANCELLED; - } - - kbase_sync_signal_pt(pt, result); - - sync_timeline_signal(timeline); - - kbase_sync_fence_out_remove(katom); - - return (result < 0) ? BASE_JD_EVENT_JOB_CANCELLED : BASE_JD_EVENT_DONE; -} - -static inline int kbase_fence_get_status(struct sync_fence *fence) -{ - if (!fence) - return -ENOENT; - - return atomic_read(&fence->status); -} - -static void kbase_fence_wait_callback(struct sync_fence *fence, - struct sync_fence_waiter *waiter) -{ - struct kbase_jd_atom *katom = container_of(waiter, - struct kbase_jd_atom, sync_waiter); - struct kbase_context *kctx = katom->kctx; - - /* Propagate the fence status to the atom. - * If negative then cancel this atom and its dependencies. - */ - if (kbase_fence_get_status(fence) < 0) - katom->event_code = BASE_JD_EVENT_JOB_CANCELLED; - - /* To prevent a potential deadlock we schedule the work onto the - * job_done_worker kthread - * - * The issue is that we may signal the timeline while holding - * kctx->jctx.lock and the callbacks are run synchronously from - * sync_timeline_signal. So we simply defer the work. 
- */ - - kthread_init_work(&katom->work, kbase_sync_fence_wait_worker); - kthread_queue_work(&kctx->kbdev->job_done_worker, &katom->work); -} - -int kbase_sync_fence_in_wait(struct kbase_jd_atom *katom) -{ - int ret; - - sync_fence_waiter_init(&katom->sync_waiter, kbase_fence_wait_callback); - - ret = sync_fence_wait_async(katom->fence, &katom->sync_waiter); - - if (ret == 1) { - /* Already signaled */ - return 0; - } - - if (ret < 0) { - katom->event_code = BASE_JD_EVENT_JOB_CANCELLED; - /* We should cause the dependent jobs in the bag to be failed, - * to do this we schedule the work queue to complete this job - */ - kthread_init_work(&katom->work, kbase_sync_fence_wait_worker); - kthread_queue_work(&katom->kctx->kbdev->job_done_worker, &katom->work); - - } - - return 1; -} - -void kbase_sync_fence_in_cancel_wait(struct kbase_jd_atom *katom) -{ - if (sync_fence_cancel_async(katom->fence, &katom->sync_waiter) != 0) { - /* The wait wasn't cancelled - leave the cleanup for - * kbase_fence_wait_callback - */ - return; - } - - /* Wait was cancelled - zap the atoms */ - katom->event_code = BASE_JD_EVENT_JOB_CANCELLED; - - kbasep_remove_waiting_soft_job(katom); - kbase_finish_soft_job(katom); - - if (jd_done_nolock(katom, true)) - kbase_js_sched_all(katom->kctx->kbdev); -} - -void kbase_sync_fence_out_remove(struct kbase_jd_atom *katom) -{ - if (katom->fence) { - sync_fence_put(katom->fence); - katom->fence = NULL; - } -} - -void kbase_sync_fence_in_remove(struct kbase_jd_atom *katom) -{ - if (katom->fence) { - sync_fence_put(katom->fence); - katom->fence = NULL; - } -} - -int kbase_sync_fence_in_info_get(struct kbase_jd_atom *katom, - struct kbase_sync_fence_info *info) -{ - u32 string_len; - - if (!katom->fence) - return -ENOENT; - - info->fence = katom->fence; - info->status = kbase_fence_get_status(katom->fence); - - string_len = strscpy(info->name, katom->fence->name, sizeof(info->name)); - string_len += sizeof(char); - /* Make sure that the source string fit into the buffer. */ - KBASE_DEBUG_ASSERT(string_len <= sizeof(info->name)); - CSTD_UNUSED(string_len); - - return 0; -} - -int kbase_sync_fence_out_info_get(struct kbase_jd_atom *katom, - struct kbase_sync_fence_info *info) -{ - u32 string_len; - - if (!katom->fence) - return -ENOENT; - - info->fence = katom->fence; - info->status = kbase_fence_get_status(katom->fence); - - string_len = strscpy(info->name, katom->fence->name, sizeof(info->name)); - string_len += sizeof(char); - /* Make sure that the source string fit into the buffer. */ - KBASE_DEBUG_ASSERT(string_len <= sizeof(info->name)); - CSTD_UNUSED(string_len); - - return 0; -} - -#ifdef CONFIG_MALI_FENCE_DEBUG -void kbase_sync_fence_in_dump(struct kbase_jd_atom *katom) -{ - /* Dump out the full state of all the Android sync fences. - * The function sync_dump() isn't exported to modules, so force - * sync_fence_wait() to time out to trigger sync_dump(). - */ - if (katom->fence) - sync_fence_wait(katom->fence, 1); -} -#endif -#endif /* !MALI_USE_CSF */ diff --git a/mali_kbase/mali_kbase_sync_file.c b/mali_kbase/mali_kbase_sync_file.c index 1462a6b..d98eba9 100644 --- a/mali_kbase/mali_kbase_sync_file.c +++ b/mali_kbase/mali_kbase_sync_file.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2012-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2012-2022 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -21,9 +21,6 @@ /* * Code for supporting explicit Linux fences (CONFIG_SYNC_FILE) - * Introduced in kernel 4.9. - * Android explicit fences (CONFIG_SYNC) can be used for older kernels - * (see mali_kbase_sync_android.c) */ #include <linux/sched.h> @@ -101,10 +98,13 @@ int kbase_sync_fence_in_from_fd(struct kbase_jd_atom *katom, int fd) struct dma_fence *fence = sync_file_get_fence(fd); #endif + lockdep_assert_held(&katom->kctx->jctx.lock); + if (!fence) return -ENOENT; kbase_fence_fence_in_set(katom, fence); + katom->dma_fence.fence_cb_added = false; return 0; } @@ -156,36 +156,31 @@ static void kbase_fence_wait_callback(struct dma_fence *fence, struct dma_fence_cb *cb) #endif { - struct kbase_fence_cb *kcb = container_of(cb, - struct kbase_fence_cb, - fence_cb); - struct kbase_jd_atom *katom = kcb->katom; + struct kbase_jd_atom *katom = container_of(cb, struct kbase_jd_atom, + dma_fence.fence_cb); struct kbase_context *kctx = katom->kctx; /* Cancel atom if fence is erroneous */ + if (dma_fence_is_signaled(katom->dma_fence.fence_in) && #if (KERNEL_VERSION(4, 11, 0) <= LINUX_VERSION_CODE || \ (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE && \ KERNEL_VERSION(4, 9, 68) <= LINUX_VERSION_CODE)) - if (dma_fence_is_signaled(kcb->fence) && kcb->fence->error < 0) + katom->dma_fence.fence_in->error < 0) #else - if (dma_fence_is_signaled(kcb->fence) && kcb->fence->status < 0) + katom->dma_fence.fence_in->status < 0) #endif katom->event_code = BASE_JD_EVENT_JOB_CANCELLED; - if (kbase_fence_dep_count_dec_and_test(katom)) { - /* We take responsibility of handling this */ - kbase_fence_dep_count_set(katom, -1); - /* To prevent a potential deadlock we schedule the work onto the - * job_done_worker kthread - * - * The issue is that we may signal the timeline while holding - * kctx->jctx.lock and the callbacks are run synchronously from - * sync_timeline_signal. So we simply defer the work. - */ - kthread_init_work(&katom->work, kbase_sync_fence_wait_worker); - kthread_queue_work(&kctx->kbdev->job_done_worker, &katom->work); - } + /* To prevent a potential deadlock we schedule the work onto the + * job_done_wq workqueue + * + * The issue is that we may signal the timeline while holding + * kctx->jctx.lock and the callbacks are run synchronously from + * sync_timeline_signal. So we simply defer the work. + */ + kthread_init_work(&katom->work, kbase_sync_fence_wait_worker); + kthread_queue_work(&kctx->kbdev->job_done_worker, &katom->work); } int kbase_sync_fence_in_wait(struct kbase_jd_atom *katom) @@ -197,53 +192,77 @@ int kbase_sync_fence_in_wait(struct kbase_jd_atom *katom) struct dma_fence *fence; #endif - fence = kbase_fence_in_get(katom); + lockdep_assert_held(&katom->kctx->jctx.lock); + + fence = katom->dma_fence.fence_in; if (!fence) return 0; /* no input fence to wait for, good to go! */ - kbase_fence_dep_count_set(katom, 1); + err = dma_fence_add_callback(fence, &katom->dma_fence.fence_cb, + kbase_fence_wait_callback); + if (err == -ENOENT) { + int fence_status = dma_fence_get_status(fence); + + if (fence_status == 1) { + /* Fence is already signaled with no error. The completion + * for FENCE_WAIT softjob can be done right away. 
+ */ + return 0; + } - err = kbase_fence_add_callback(katom, fence, kbase_fence_wait_callback); + /* Fence shouldn't be in not signaled state */ + if (!fence_status) { + struct kbase_sync_fence_info info; - kbase_fence_put(fence); + kbase_sync_fence_in_info_get(katom, &info); - if (likely(!err)) { - /* Test if the callbacks are already triggered */ - if (kbase_fence_dep_count_dec_and_test(katom)) { - kbase_fence_free_callbacks(katom); - kbase_fence_dep_count_set(katom, -1); - return 0; /* Already signaled, good to go right now */ + dev_warn(katom->kctx->kbdev->dev, + "Unexpected status for fence %s of ctx:%d_%d atom:%d", + info.name, katom->kctx->tgid, katom->kctx->id, + kbase_jd_atom_id(katom->kctx, katom)); } - /* Callback installed, so we just need to wait for it... */ - } else { - /* Failure */ - kbase_fence_free_callbacks(katom); - kbase_fence_dep_count_set(katom, -1); + /* If fence is signaled with an error, then the FENCE_WAIT softjob is + * considered to be failed. + */ + } + if (unlikely(err)) { + /* We should cause the dependent jobs in the bag to be failed. */ katom->event_code = BASE_JD_EVENT_JOB_CANCELLED; - /* We should cause the dependent jobs in the bag to be failed, - * to do this we schedule the work queue to complete this job - */ - kthread_init_work(&katom->work, kbase_sync_fence_wait_worker); - kthread_queue_work(&katom->kctx->kbdev->job_done_worker, &katom->work); + /* The completion for FENCE_WAIT softjob can be done right away. */ + return 0; } - return 1; /* completion to be done later by callback/worker */ + /* Callback was successfully installed */ + katom->dma_fence.fence_cb_added = true; + + /* Completion to be done later by callback/worker */ + return 1; } void kbase_sync_fence_in_cancel_wait(struct kbase_jd_atom *katom) { - if (!kbase_fence_free_callbacks(katom)) { - /* The wait wasn't cancelled - - * leave the cleanup for kbase_fence_wait_callback - */ - return; - } + lockdep_assert_held(&katom->kctx->jctx.lock); + + if (katom->dma_fence.fence_cb_added) { + if (!dma_fence_remove_callback(katom->dma_fence.fence_in, + &katom->dma_fence.fence_cb)) { + /* The callback is already removed so leave the cleanup + * for kbase_fence_wait_callback. 
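The fence-wait rework above drops kbase's private callback list and dep_count in favour of a single struct dma_fence_cb embedded in the atom, driven directly by the kernel's dma_fence API. A minimal sketch of that pattern, outside kbase and under placeholder names (struct my_waiter, my_fence_signaled(), my_wait_on_fence() and my_cancel_wait() are illustrative, not driver symbols):

#include <linux/dma-fence.h>
#include <linux/kernel.h>
#include <linux/workqueue.h>

struct my_waiter {
	struct dma_fence *fence_in;
	struct dma_fence_cb fence_cb;
	bool fence_cb_added;
	struct work_struct work;	/* INIT_WORK()'d by the owner */
};

/* Called from the signaller's context: defer real work, never block here. */
static void my_fence_signaled(struct dma_fence *fence, struct dma_fence_cb *cb)
{
	struct my_waiter *w = container_of(cb, struct my_waiter, fence_cb);

	schedule_work(&w->work);
}

/*
 * Returns 1 if completion will happen later via the callback, 0 if the fence
 * had already signalled cleanly, and a negative error otherwise.
 */
static int my_wait_on_fence(struct my_waiter *w)
{
	int err = dma_fence_add_callback(w->fence_in, &w->fence_cb,
					 my_fence_signaled);

	if (err == -ENOENT)
		return dma_fence_get_status(w->fence_in) < 0 ? -EIO : 0;
	if (err)
		return err;

	w->fence_cb_added = true;
	return 1;
}

/* True if the wait was cancelled before the callback consumed it. */
static bool my_cancel_wait(struct my_waiter *w)
{
	return w->fence_cb_added &&
	       dma_fence_remove_callback(w->fence_in, &w->fence_cb);
}

The property the patch relies on is that dma_fence_add_callback() returns -ENOENT when the fence has already signalled, and dma_fence_get_status() then distinguishes a clean signal (1) from an error (< 0); that is how the reworked kbase_sync_fence_in_wait() decides between completing the FENCE_WAIT softjob immediately and cancelling it.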
+ */ + return; + } + } else { + struct kbase_sync_fence_info info; - /* Take responsibility of completion */ - kbase_fence_dep_count_set(katom, -1); + kbase_sync_fence_in_info_get(katom, &info); + dev_warn(katom->kctx->kbdev->dev, + "Callback was not added earlier for fence %s of ctx:%d_%d atom:%d", + info.name, katom->kctx->tgid, katom->kctx->id, + kbase_jd_atom_id(katom->kctx, katom)); + } /* Wait was cancelled - zap the atoms */ katom->event_code = BASE_JD_EVENT_JOB_CANCELLED; @@ -251,7 +270,7 @@ void kbase_sync_fence_in_cancel_wait(struct kbase_jd_atom *katom) kbasep_remove_waiting_soft_job(katom); kbase_finish_soft_job(katom); - if (jd_done_nolock(katom, true)) + if (kbase_jd_done_nolock(katom, true)) kbase_js_sched_all(katom->kctx->kbdev); } @@ -262,8 +281,29 @@ void kbase_sync_fence_out_remove(struct kbase_jd_atom *katom) void kbase_sync_fence_in_remove(struct kbase_jd_atom *katom) { - kbase_fence_free_callbacks(katom); + lockdep_assert_held(&katom->kctx->jctx.lock); + + if (katom->dma_fence.fence_cb_added) { + bool removed = dma_fence_remove_callback(katom->dma_fence.fence_in, + &katom->dma_fence.fence_cb); + + /* Here it is expected that the callback should have already been removed + * previously either by kbase_sync_fence_in_cancel_wait() or when the fence + * was signaled and kbase_sync_fence_wait_worker() was called. + */ + if (removed) { + struct kbase_sync_fence_info info; + + kbase_sync_fence_in_info_get(katom, &info); + dev_warn(katom->kctx->kbdev->dev, + "Callback was not removed earlier for fence %s of ctx:%d_%d atom:%d", + info.name, katom->kctx->tgid, katom->kctx->id, + kbase_jd_atom_id(katom->kctx, katom)); + } + } + kbase_fence_in_remove(katom); + katom->dma_fence.fence_cb_added = false; } #endif /* !MALI_USE_CSF */ @@ -277,7 +317,7 @@ void kbase_sync_fence_info_get(struct dma_fence *fence, { info->fence = fence; - /* translate into CONFIG_SYNC status: + /* Translate into the following status, with support for error handling: * < 0 : error * 0 : active * 1 : signaled @@ -298,10 +338,7 @@ void kbase_sync_fence_info_get(struct dma_fence *fence, info->status = 0; /* still active (unsignaled) */ } -#if (KERNEL_VERSION(4, 8, 0) > LINUX_VERSION_CODE) - scnprintf(info->name, sizeof(info->name), "%u#%u", - fence->context, fence->seqno); -#elif (KERNEL_VERSION(5, 1, 0) > LINUX_VERSION_CODE) +#if (KERNEL_VERSION(5, 1, 0) > LINUX_VERSION_CODE) scnprintf(info->name, sizeof(info->name), "%llu#%u", fence->context, fence->seqno); #else diff --git a/mali_kbase/mali_kbase_utility.h b/mali_kbase/mali_kbase_utility.h deleted file mode 100644 index 2dad49b..0000000 --- a/mali_kbase/mali_kbase_utility.h +++ /dev/null @@ -1,52 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ -/* - * - * (C) COPYRIGHT 2012-2013, 2015, 2018, 2020-2021 ARM Limited. All rights reserved. - * - * This program is free software and is provided to you under the terms of the - * GNU General Public License version 2 as published by the Free Software - * Foundation, and any use by you of this program is subject to the terms - * of such GNU license. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with this program; if not, you can access it online at - * http://www.gnu.org/licenses/gpl-2.0.html. 
- * - */ - -#ifndef _KBASE_UTILITY_H -#define _KBASE_UTILITY_H - -#ifndef _KBASE_H_ -#error "Don't include this file directly, use mali_kbase.h instead" -#endif - -static inline void kbase_timer_setup(struct timer_list *timer, - void (*callback)(struct timer_list *timer)) -{ -#if KERNEL_VERSION(4, 14, 0) > LINUX_VERSION_CODE - setup_timer(timer, (void (*)(unsigned long)) callback, - (unsigned long) timer); -#else - timer_setup(timer, callback, 0); -#endif -} - -#ifndef WRITE_ONCE - #ifdef ASSIGN_ONCE - #define WRITE_ONCE(x, val) ASSIGN_ONCE(val, x) - #else - #define WRITE_ONCE(x, val) (ACCESS_ONCE(x) = (val)) - #endif -#endif - -#ifndef READ_ONCE - #define READ_ONCE(x) ACCESS_ONCE(x) -#endif - -#endif /* _KBASE_UTILITY_H */ diff --git a/mali_kbase/mali_kbase_vinstr.c b/mali_kbase/mali_kbase_vinstr.c index d7a6c98..eb6911e 100644 --- a/mali_kbase/mali_kbase_vinstr.c +++ b/mali_kbase/mali_kbase_vinstr.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2011-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2011-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -20,11 +20,11 @@ */ #include "mali_kbase_vinstr.h" -#include "mali_kbase_hwcnt_virtualizer.h" -#include "mali_kbase_hwcnt_types.h" +#include "hwcnt/mali_kbase_hwcnt_virtualizer.h" +#include "hwcnt/mali_kbase_hwcnt_types.h" #include <uapi/gpu/arm/midgard/mali_kbase_hwcnt_reader.h> -#include "mali_kbase_hwcnt_gpu.h" -#include "mali_kbase_hwcnt_gpu_narrow.h" +#include "hwcnt/mali_kbase_hwcnt_gpu.h" +#include "hwcnt/mali_kbase_hwcnt_gpu_narrow.h" #include <uapi/gpu/arm/midgard/mali_kbase_ioctl.h> #include "mali_malisw.h" #include "mali_kbase_debug.h" @@ -38,8 +38,14 @@ #include <linux/mutex.h> #include <linux/poll.h> #include <linux/slab.h> +#include <linux/version_compat_defs.h> #include <linux/workqueue.h> +/* Explicitly include epoll header for old kernels. Not required from 4.16. 
*/ +#if KERNEL_VERSION(4, 16, 0) > LINUX_VERSION_CODE +#include <uapi/linux/eventpoll.h> +#endif + /* Hwcnt reader API version */ #define HWCNT_READER_API 1 @@ -113,9 +119,7 @@ struct kbase_vinstr_client { wait_queue_head_t waitq; }; -static unsigned int kbasep_vinstr_hwcnt_reader_poll( - struct file *filp, - poll_table *wait); +static __poll_t kbasep_vinstr_hwcnt_reader_poll(struct file *filp, poll_table *wait); static long kbasep_vinstr_hwcnt_reader_ioctl( struct file *filp, @@ -453,7 +457,7 @@ static int kbasep_vinstr_client_create( errcode = -ENOMEM; vcli->dump_bufs_meta = kmalloc_array( - setup->buffer_count, sizeof(*vcli->dump_bufs_meta), GFP_KERNEL); + setup->buffer_count, sizeof(*vcli->dump_bufs_meta), GFP_KERNEL | __GFP_ZERO); if (!vcli->dump_bufs_meta) goto error; @@ -517,8 +521,6 @@ void kbase_vinstr_term(struct kbase_vinstr_context *vctx) if (!vctx) return; - cancel_work_sync(&vctx->dump_work); - /* Non-zero client count implies client leak */ if (WARN_ON(vctx->client_count != 0)) { struct kbase_vinstr_client *pos, *n; @@ -530,6 +532,7 @@ void kbase_vinstr_term(struct kbase_vinstr_context *vctx) } } + cancel_work_sync(&vctx->dump_work); kbase_hwcnt_gpu_metadata_narrow_destroy(vctx->metadata_user); WARN_ON(vctx->client_count != 0); @@ -538,8 +541,10 @@ void kbase_vinstr_term(struct kbase_vinstr_context *vctx) void kbase_vinstr_suspend(struct kbase_vinstr_context *vctx) { - if (WARN_ON(!vctx)) + if (!vctx) { + pr_warn("%s: vctx is NULL\n", __func__); return; + } mutex_lock(&vctx->lock); @@ -568,8 +573,10 @@ void kbase_vinstr_suspend(struct kbase_vinstr_context *vctx) void kbase_vinstr_resume(struct kbase_vinstr_context *vctx) { - if (WARN_ON(!vctx)) + if (!vctx) { + pr_warn("%s:vctx is NULL\n", __func__); return; + } mutex_lock(&vctx->lock); @@ -1036,26 +1043,25 @@ static long kbasep_vinstr_hwcnt_reader_ioctl( * @filp: Non-NULL pointer to file structure. * @wait: Non-NULL pointer to poll table. * - * Return: POLLIN if data can be read without blocking, 0 if data can not be - * read without blocking, else error code. + * Return: EPOLLIN | EPOLLRDNORM if data can be read without blocking, 0 if + * data can not be read without blocking, else EPOLLHUP | EPOLLERR. */ -static unsigned int kbasep_vinstr_hwcnt_reader_poll( - struct file *filp, - poll_table *wait) +static __poll_t kbasep_vinstr_hwcnt_reader_poll(struct file *filp, poll_table *wait) { struct kbase_vinstr_client *cli; if (!filp || !wait) - return -EINVAL; + return EPOLLHUP | EPOLLERR; cli = filp->private_data; if (!cli) - return -EINVAL; + return EPOLLHUP | EPOLLERR; poll_wait(filp, &cli->waitq, wait); if (kbasep_vinstr_hwcnt_reader_buffer_ready(cli)) - return POLLIN; - return 0; + return EPOLLIN | EPOLLRDNORM; + + return (__poll_t)0; } /** diff --git a/mali_kbase/mali_linux_trace.h b/mali_kbase/mali_linux_trace.h index 2a243dd..1293a0b 100644 --- a/mali_kbase/mali_linux_trace.h +++ b/mali_kbase/mali_linux_trace.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2011-2016, 2018-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2011-2023 ARM Limited. All rights reserved. 
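The kbasep_vinstr_hwcnt_reader_poll() conversion above follows the kernel-wide switch of poll handlers to the __poll_t type: they now return EPOLL* mask bits rather than negative errnos, hence EPOLLHUP | EPOLLERR on a bad file and EPOLLIN | EPOLLRDNORM when a buffer is ready. A stripped-down handler of the same shape, with hypothetical my_reader naming:

#include <linux/fs.h>
#include <linux/module.h>
#include <linux/poll.h>
#include <linux/wait.h>

struct my_reader {
	wait_queue_head_t waitq;
	bool buffer_ready;	/* stands in for the real ring-buffer test */
};

static __poll_t my_reader_poll(struct file *filp, poll_table *wait)
{
	struct my_reader *r = filp->private_data;

	if (!r)
		return EPOLLHUP | EPOLLERR;

	/* Register on the waitqueue before testing the condition so that a
	 * wake-up racing with this check is not lost.
	 */
	poll_wait(filp, &r->waitq, wait);

	if (READ_ONCE(r->buffer_ready))
		return EPOLLIN | EPOLLRDNORM;

	return 0;
}

static const struct file_operations my_reader_fops = {
	.owner = THIS_MODULE,
	.poll = my_reader_poll,
};

On kernels before 4.16 the EPOLL* constants live in uapi/linux/eventpoll.h, which is why the vinstr change also adds that include behind a version check.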
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -173,7 +173,7 @@ TRACE_EVENT(mali_total_alloc_pages_change, ((status) & AS_FAULTSTATUS_ACCESS_TYPE_MASK) #define KBASE_MMU_FAULT_ACCESS_SYMBOLIC_STRINGS _ENSURE_PARENTHESIS(\ {AS_FAULTSTATUS_ACCESS_TYPE_ATOMIC, "ATOMIC" }, \ - {AS_FAULTSTATUS_ACCESS_TYPE_EX, "EXECUTE"}, \ + {AS_FAULTSTATUS_ACCESS_TYPE_EXECUTE, "EXECUTE"}, \ {AS_FAULTSTATUS_ACCESS_TYPE_READ, "READ" }, \ {AS_FAULTSTATUS_ACCESS_TYPE_WRITE, "WRITE" }) #define KBASE_MMU_FAULT_STATUS_ACCESS_PRINT(status) \ @@ -531,6 +531,23 @@ TRACE_EVENT(mali_jit_trim, TP_printk("freed_pages=%zu", __entry->freed_pages) ); +/* trace_mali_protected_mode + * + * Trace point to indicate if GPU is in protected mode + */ +TRACE_EVENT(mali_protected_mode, + TP_PROTO(bool protm), + TP_ARGS(protm), + TP_STRUCT__entry( + __field(bool, protm) + ), + TP_fast_assign( + __entry->protm = protm; + ), + TP_printk("Protected mode: %d" , __entry->protm) +); + + #include "debug/mali_kbase_debug_linux_ktrace.h" #endif /* _TRACE_MALI_H */ diff --git a/mali_kbase/mali_malisw.h b/mali_kbase/mali_malisw.h index fc8dcbc..d9db189 100644 --- a/mali_kbase/mali_malisw.h +++ b/mali_kbase/mali_malisw.h @@ -19,7 +19,7 @@ * */ -/** +/* * Kernel-wide include for common macros and types. */ @@ -97,16 +97,12 @@ */ #define CSTD_STR2(x) CSTD_STR1(x) -/* LINUX_VERSION_CODE < 5.4 */ -#if (KERNEL_VERSION(5, 4, 0) > LINUX_VERSION_CODE) -#if defined(GCC_VERSION) && GCC_VERSION >= 70000 + #ifndef fallthrough + #define fallthrough __fallthrough + #endif /* fallthrough */ + #ifndef __fallthrough #define __fallthrough __attribute__((fallthrough)) #endif /* __fallthrough */ -#define fallthrough __fallthrough -#else -#define fallthrough CSTD_NOP(...) /* fallthrough */ -#endif /* GCC_VERSION >= 70000 */ -#endif /* KERNEL_VERSION(5, 4, 0) */ #endif /* _MALISW_H_ */ diff --git a/mali_kbase/mali_kbase_strings.c b/mali_kbase/mali_power_gpu_work_period_trace.c index 84784be..8e7bf6f 100644 --- a/mali_kbase/mali_kbase_strings.c +++ b/mali_kbase/mali_power_gpu_work_period_trace.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2010-2016, 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -19,10 +19,10 @@ * */ -#include "mali_kbase_strings.h" - -#define KBASE_DRV_NAME "mali" -#define KBASE_TIMELINE_NAME KBASE_DRV_NAME ".timeline" - -const char kbase_drv_name[] = KBASE_DRV_NAME; -const char kbase_timeline_name[] = KBASE_TIMELINE_NAME; +/* Create the trace point if not configured in kernel */ +#ifndef CONFIG_TRACE_POWER_GPU_WORK_PERIOD +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) +#define CREATE_TRACE_POINTS +#include "mali_power_gpu_work_period_trace.h" +#endif /* CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD */ +#endif diff --git a/mali_kbase/mali_power_gpu_work_period_trace.h b/mali_kbase/mali_power_gpu_work_period_trace.h new file mode 100644 index 0000000..46e86ad --- /dev/null +++ b/mali_kbase/mali_power_gpu_work_period_trace.h @@ -0,0 +1,88 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +/* + * + * (C) COPYRIGHT 2023 ARM Limited. All rights reserved. 
+ * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. + * + */ + +#ifndef _TRACE_POWER_GPU_WORK_PERIOD_MALI +#define _TRACE_POWER_GPU_WORK_PERIOD_MALI +#endif + +#undef TRACE_SYSTEM +#define TRACE_SYSTEM power +#undef TRACE_INCLUDE_FILE +#define TRACE_INCLUDE_FILE mali_power_gpu_work_period_trace +#undef TRACE_INCLUDE_PATH +#define TRACE_INCLUDE_PATH . + +#if !defined(_TRACE_POWER_GPU_WORK_PERIOD_H) || defined(TRACE_HEADER_MULTI_READ) +#define _TRACE_POWER_GPU_WORK_PERIOD_H + +#include <linux/tracepoint.h> + +/** + * gpu_work_period - Reports GPU work period metrics + * + * @gpu_id: Unique GPU Identifier + * @uid: UID of an application + * @start_time_ns: Start time of a GPU work period in nanoseconds + * @end_time_ns: End time of a GPU work period in nanoseconds + * @total_active_duration_ns: Total amount of time the GPU was running GPU work for given + * UID during the GPU work period, in nanoseconds. This duration does + * not double-account parallel GPU work for the same UID. + */ +TRACE_EVENT(gpu_work_period, + + TP_PROTO( + u32 gpu_id, + u32 uid, + u64 start_time_ns, + u64 end_time_ns, + u64 total_active_duration_ns + ), + + TP_ARGS(gpu_id, uid, start_time_ns, end_time_ns, total_active_duration_ns), + + TP_STRUCT__entry( + __field(u32, gpu_id) + __field(u32, uid) + __field(u64, start_time_ns) + __field(u64, end_time_ns) + __field(u64, total_active_duration_ns) + ), + + TP_fast_assign( + __entry->gpu_id = gpu_id; + __entry->uid = uid; + __entry->start_time_ns = start_time_ns; + __entry->end_time_ns = end_time_ns; + __entry->total_active_duration_ns = total_active_duration_ns; + ), + + TP_printk("gpu_id=%u uid=%u start_time_ns=%llu end_time_ns=%llu total_active_duration_ns=%llu", + __entry->gpu_id, + __entry->uid, + __entry->start_time_ns, + __entry->end_time_ns, + __entry->total_active_duration_ns) +); + +#endif /* _TRACE_POWER_GPU_WORK_PERIOD_H */ + +/* This part must be outside protection */ +#include <trace/define_trace.h> diff --git a/mali_kbase/mmu/backend/mali_kbase_mmu_csf.c b/mali_kbase/mmu/backend/mali_kbase_mmu_csf.c index c9ba3fc..a057d3c 100644 --- a/mali_kbase/mmu/backend/mali_kbase_mmu_csf.c +++ b/mali_kbase/mmu/backend/mali_kbase_mmu_csf.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -88,12 +88,11 @@ static void submit_work_pagefault(struct kbase_device *kbdev, u32 as_nr, * context's address space, when the page fault occurs for * MCU's address space. 
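The new mali_power_gpu_work_period_trace.h above is a standard TRACE_EVENT header: it does nothing on its own until exactly one .c file defines CREATE_TRACE_POINTS before including it (which is what the new mali_power_gpu_work_period_trace.c does, guarded so the tracepoint is only instantiated when the kernel does not already provide power/gpu_work_period), and emitters then call the generated trace_gpu_work_period() stub. A hedged sketch of the consumer side; emit_work_period() is illustrative, not a kbase function:

/* In exactly one compilation unit: instantiate the tracepoint bodies. */
#define CREATE_TRACE_POINTS
#include "mali_power_gpu_work_period_trace.h"

/*
 * Everywhere else, include the header without CREATE_TRACE_POINTS and call
 * the generated stub; it compiles down to a static-key no-op unless the
 * power:gpu_work_period event has been enabled.
 */
static void emit_work_period(u32 gpu_id, u32 uid, u64 start_ns, u64 end_ns,
			     u64 active_ns)
{
	if (trace_gpu_work_period_enabled())
		trace_gpu_work_period(gpu_id, uid, start_ns, end_ns, active_ns);
}

Because the header sets TRACE_INCLUDE_PATH to ".", the including file's directory typically has to be added to the include path (ccflags-y += -I$(src) style) for the trace machinery to find it again.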
*/ - if (!queue_work(as->pf_wq, &as->work_pagefault)) - kbase_ctx_sched_release_ctx(kctx); - else { + if (!queue_work(as->pf_wq, &as->work_pagefault)) { dev_dbg(kbdev->dev, - "Page fault is already pending for as %u\n", - as_nr); + "Page fault is already pending for as %u", as_nr); + kbase_ctx_sched_release_ctx(kctx); + } else { atomic_inc(&kbdev->faults_pending); } } @@ -122,6 +121,8 @@ void kbase_mmu_report_mcu_as_fault_and_reset(struct kbase_device *kbdev, access_type, kbase_gpu_access_type_name(fault->status), source_id); + kbase_debug_csf_fault_notify(kbdev, NULL, DF_GPU_PAGE_FAULT); + /* Report MMU fault for all address spaces (except MCU_AS_NR) */ for (as_no = 1; as_no < kbdev->nr_hw_address_spaces; as_no++) submit_work_pagefault(kbdev, as_no, fault); @@ -145,21 +146,21 @@ void kbase_gpu_report_bus_fault_and_kill(struct kbase_context *kctx, GPU_FAULTSTATUS_ACCESS_TYPE_SHIFT; int source_id = (status & GPU_FAULTSTATUS_SOURCE_ID_MASK) >> GPU_FAULTSTATUS_SOURCE_ID_SHIFT; - const char *addr_valid = (status & GPU_FAULTSTATUS_ADDR_VALID_FLAG) ? - "true" : "false"; + const char *addr_valid = (status & GPU_FAULTSTATUS_ADDRESS_VALID_MASK) ? "true" : "false"; int as_no = as->number; unsigned long flags; + const uintptr_t fault_addr = fault->addr; /* terminal fault, print info about the fault */ dev_err(kbdev->dev, - "GPU bus fault in AS%d at VA 0x%016llX\n" - "VA_VALID: %s\n" + "GPU bus fault in AS%d at PA %pK\n" + "PA_VALID: %s\n" "raw fault status: 0x%X\n" "exception type 0x%X: %s\n" "access type 0x%X: %s\n" "source id 0x%X\n" "pid: %d\n", - as_no, fault->addr, + as_no, (void *)fault_addr, addr_valid, status, exception_type, kbase_gpu_exception_name(exception_type), @@ -188,6 +189,7 @@ void kbase_gpu_report_bus_fault_and_kill(struct kbase_context *kctx, kbase_reg_write(kbdev, GPU_CONTROL_REG(GPU_COMMAND), GPU_COMMAND_CLEAR_FAULT); spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + } /* @@ -244,6 +246,8 @@ void kbase_mmu_report_fault_and_kill(struct kbase_context *kctx, spin_lock_irqsave(&kbdev->hwaccess_lock, flags); kbase_mmu_disable(kctx); kbase_ctx_flag_set(kctx, KCTX_AS_DISABLED_ON_FAULT); + kbase_debug_csf_fault_notify(kbdev, kctx, DF_GPU_PAGE_FAULT); + kbase_csf_ctx_report_page_fault_for_active_groups(kctx, fault); spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); mutex_unlock(&kbdev->mmu_hw_mutex); @@ -262,6 +266,7 @@ void kbase_mmu_report_fault_and_kill(struct kbase_context *kctx, KBASE_MMU_FAULT_TYPE_PAGE_UNEXPECTED); kbase_mmu_hw_enable_fault(kbdev, as, KBASE_MMU_FAULT_TYPE_PAGE_UNEXPECTED); + } /** @@ -363,9 +368,9 @@ void kbase_mmu_interrupt(struct kbase_device *kbdev, u32 irq_stat) /* remember current mask */ spin_lock_irqsave(&kbdev->mmu_mask_change, flags); - new_mask = kbase_reg_read(kbdev, MMU_REG(MMU_IRQ_MASK)); + new_mask = kbase_reg_read(kbdev, MMU_CONTROL_REG(MMU_IRQ_MASK)); /* mask interrupts for now */ - kbase_reg_write(kbdev, MMU_REG(MMU_IRQ_MASK), 0); + kbase_reg_write(kbdev, MMU_CONTROL_REG(MMU_IRQ_MASK), 0); spin_unlock_irqrestore(&kbdev->mmu_mask_change, flags); while (pf_bits) { @@ -375,11 +380,11 @@ void kbase_mmu_interrupt(struct kbase_device *kbdev, u32 irq_stat) struct kbase_fault *fault = &as->pf_data; /* find faulting address */ - fault->addr = kbase_reg_read(kbdev, MMU_AS_REG(as_no, - AS_FAULTADDRESS_HI)); + fault->addr = kbase_reg_read(kbdev, + MMU_STAGE1_REG(MMU_AS_REG(as_no, AS_FAULTADDRESS_HI))); fault->addr <<= 32; - fault->addr |= kbase_reg_read(kbdev, MMU_AS_REG(as_no, - AS_FAULTADDRESS_LO)); + fault->addr |= kbase_reg_read( + kbdev, 
MMU_STAGE1_REG(MMU_AS_REG(as_no, AS_FAULTADDRESS_LO))); /* Mark the fault protected or not */ fault->protected_mode = false; @@ -388,14 +393,14 @@ void kbase_mmu_interrupt(struct kbase_device *kbdev, u32 irq_stat) kbase_as_fault_debugfs_new(kbdev, as_no); /* record the fault status */ - fault->status = kbase_reg_read(kbdev, MMU_AS_REG(as_no, - AS_FAULTSTATUS)); + fault->status = + kbase_reg_read(kbdev, MMU_STAGE1_REG(MMU_AS_REG(as_no, AS_FAULTSTATUS))); - fault->extra_addr = kbase_reg_read(kbdev, - MMU_AS_REG(as_no, AS_FAULTEXTRA_HI)); + fault->extra_addr = + kbase_reg_read(kbdev, MMU_STAGE1_REG(MMU_AS_REG(as_no, AS_FAULTEXTRA_HI))); fault->extra_addr <<= 32; - fault->extra_addr |= kbase_reg_read(kbdev, - MMU_AS_REG(as_no, AS_FAULTEXTRA_LO)); + fault->extra_addr |= + kbase_reg_read(kbdev, MMU_STAGE1_REG(MMU_AS_REG(as_no, AS_FAULTEXTRA_LO))); /* Mark page fault as handled */ pf_bits &= ~(1UL << as_no); @@ -427,9 +432,9 @@ void kbase_mmu_interrupt(struct kbase_device *kbdev, u32 irq_stat) /* reenable interrupts */ spin_lock_irqsave(&kbdev->mmu_mask_change, flags); - tmp = kbase_reg_read(kbdev, MMU_REG(MMU_IRQ_MASK)); + tmp = kbase_reg_read(kbdev, MMU_CONTROL_REG(MMU_IRQ_MASK)); new_mask |= tmp; - kbase_reg_write(kbdev, MMU_REG(MMU_IRQ_MASK), new_mask); + kbase_reg_write(kbdev, MMU_CONTROL_REG(MMU_IRQ_MASK), new_mask); spin_unlock_irqrestore(&kbdev->mmu_mask_change, flags); } @@ -465,19 +470,16 @@ static void kbase_mmu_gpu_fault_worker(struct work_struct *data) spin_lock_irqsave(&kbdev->hwaccess_lock, flags); fault = &faulting_as->gf_data; status = fault->status; - as_valid = status & GPU_FAULTSTATUS_JASID_VALID_FLAG; + as_valid = status & GPU_FAULTSTATUS_JASID_VALID_MASK; address = fault->addr; spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); dev_warn(kbdev->dev, "GPU Fault 0x%08x (%s) in AS%u at 0x%016llx\n" "ASID_VALID: %s, ADDRESS_VALID: %s\n", - status, - kbase_gpu_exception_name( - GPU_FAULTSTATUS_EXCEPTION_TYPE_GET(status)), - as_nr, address, - as_valid ? "true" : "false", - status & GPU_FAULTSTATUS_ADDR_VALID_FLAG ? "true" : "false"); + status, kbase_gpu_exception_name(GPU_FAULTSTATUS_EXCEPTION_TYPE_GET(status)), + as_nr, address, as_valid ? "true" : "false", + status & GPU_FAULTSTATUS_ADDRESS_VALID_MASK ? "true" : "false"); kctx = kbase_ctx_sched_as_to_ctx(kbdev, as_nr); kbase_csf_ctx_handle_fault(kctx, fault); @@ -547,14 +549,14 @@ void kbase_mmu_gpu_fault_interrupt(struct kbase_device *kbdev, u32 status, } KBASE_EXPORT_TEST_API(kbase_mmu_gpu_fault_interrupt); -int kbase_mmu_as_init(struct kbase_device *kbdev, int i) +int kbase_mmu_as_init(struct kbase_device *kbdev, unsigned int i) { kbdev->as[i].number = i; kbdev->as[i].bf_data.addr = 0ULL; kbdev->as[i].pf_data.addr = 0ULL; kbdev->as[i].gf_data.addr = 0ULL; - kbdev->as[i].pf_wq = alloc_workqueue("mali_mmu%d", 0, 1, i); + kbdev->as[i].pf_wq = alloc_workqueue("mali_mmu%d", WQ_UNBOUND, 0, i); if (!kbdev->as[i].pf_wq) return -ENOMEM; diff --git a/mali_kbase/mmu/backend/mali_kbase_mmu_jm.c b/mali_kbase/mmu/backend/mali_kbase_mmu_jm.c index fad5554..5c774c2 100644 --- a/mali_kbase/mmu/backend/mali_kbase_mmu_jm.c +++ b/mali_kbase/mmu/backend/mali_kbase_mmu_jm.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2023 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -63,15 +63,16 @@ void kbase_gpu_report_bus_fault_and_kill(struct kbase_context *kctx, u32 const exception_data = (status >> 8) & 0xFFFFFF; int const as_no = as->number; unsigned long flags; + const uintptr_t fault_addr = fault->addr; /* terminal fault, print info about the fault */ dev_err(kbdev->dev, - "GPU bus fault in AS%d at VA 0x%016llX\n" + "GPU bus fault in AS%d at PA %pK\n" "raw fault status: 0x%X\n" "exception type 0x%X: %s\n" "exception data 0x%X\n" "pid: %d\n", - as_no, fault->addr, + as_no, (void *)fault_addr, status, exception_type, kbase_gpu_exception_name(exception_type), exception_data, @@ -94,6 +95,7 @@ void kbase_gpu_report_bus_fault_and_kill(struct kbase_context *kctx, KBASE_MMU_FAULT_TYPE_BUS_UNEXPECTED); kbase_mmu_hw_enable_fault(kbdev, as, KBASE_MMU_FAULT_TYPE_BUS_UNEXPECTED); + } /* @@ -320,14 +322,14 @@ void kbase_mmu_interrupt(struct kbase_device *kbdev, u32 irq_stat) /* remember current mask */ spin_lock_irqsave(&kbdev->mmu_mask_change, flags); - new_mask = kbase_reg_read(kbdev, MMU_REG(MMU_IRQ_MASK)); + new_mask = kbase_reg_read(kbdev, MMU_CONTROL_REG(MMU_IRQ_MASK)); /* mask interrupts for now */ - kbase_reg_write(kbdev, MMU_REG(MMU_IRQ_MASK), 0); + kbase_reg_write(kbdev, MMU_CONTROL_REG(MMU_IRQ_MASK), 0); spin_unlock_irqrestore(&kbdev->mmu_mask_change, flags); while (bf_bits | pf_bits) { struct kbase_as *as; - int as_no; + unsigned int as_no; struct kbase_context *kctx; struct kbase_fault *fault; @@ -353,11 +355,11 @@ void kbase_mmu_interrupt(struct kbase_device *kbdev, u32 irq_stat) kctx = kbase_ctx_sched_as_to_ctx_refcount(kbdev, as_no); /* find faulting address */ - fault->addr = kbase_reg_read(kbdev, MMU_AS_REG(as_no, - AS_FAULTADDRESS_HI)); + fault->addr = kbase_reg_read(kbdev, + MMU_STAGE1_REG(MMU_AS_REG(as_no, AS_FAULTADDRESS_HI))); fault->addr <<= 32; - fault->addr |= kbase_reg_read(kbdev, MMU_AS_REG(as_no, - AS_FAULTADDRESS_LO)); + fault->addr |= kbase_reg_read( + kbdev, MMU_STAGE1_REG(MMU_AS_REG(as_no, AS_FAULTADDRESS_LO))); /* Mark the fault protected or not */ fault->protected_mode = kbdev->protected_mode; @@ -370,13 +372,13 @@ void kbase_mmu_interrupt(struct kbase_device *kbdev, u32 irq_stat) kbase_as_fault_debugfs_new(kbdev, as_no); /* record the fault status */ - fault->status = kbase_reg_read(kbdev, MMU_AS_REG(as_no, - AS_FAULTSTATUS)); - fault->extra_addr = kbase_reg_read(kbdev, - MMU_AS_REG(as_no, AS_FAULTEXTRA_HI)); + fault->status = + kbase_reg_read(kbdev, MMU_STAGE1_REG(MMU_AS_REG(as_no, AS_FAULTSTATUS))); + fault->extra_addr = + kbase_reg_read(kbdev, MMU_STAGE1_REG(MMU_AS_REG(as_no, AS_FAULTEXTRA_HI))); fault->extra_addr <<= 32; - fault->extra_addr |= kbase_reg_read(kbdev, - MMU_AS_REG(as_no, AS_FAULTEXTRA_LO)); + fault->extra_addr |= + kbase_reg_read(kbdev, MMU_STAGE1_REG(MMU_AS_REG(as_no, AS_FAULTEXTRA_LO))); if (kbase_as_has_bus_fault(as, fault)) { /* Mark bus fault as handled. 
@@ -404,9 +406,9 @@ void kbase_mmu_interrupt(struct kbase_device *kbdev, u32 irq_stat) /* reenable interrupts */ spin_lock_irqsave(&kbdev->mmu_mask_change, flags); - tmp = kbase_reg_read(kbdev, MMU_REG(MMU_IRQ_MASK)); + tmp = kbase_reg_read(kbdev, MMU_CONTROL_REG(MMU_IRQ_MASK)); new_mask |= tmp; - kbase_reg_write(kbdev, MMU_REG(MMU_IRQ_MASK), new_mask); + kbase_reg_write(kbdev, MMU_CONTROL_REG(MMU_IRQ_MASK), new_mask); spin_unlock_irqrestore(&kbdev->mmu_mask_change, flags); dev_dbg(kbdev->dev, "Leaving %s irq_stat %u\n", @@ -422,13 +424,13 @@ int kbase_mmu_switch_to_ir(struct kbase_context *const kctx, return kbase_job_slot_softstop_start_rp(kctx, reg); } -int kbase_mmu_as_init(struct kbase_device *kbdev, int i) +int kbase_mmu_as_init(struct kbase_device *kbdev, unsigned int i) { kbdev->as[i].number = i; kbdev->as[i].bf_data.addr = 0ULL; kbdev->as[i].pf_data.addr = 0ULL; - kbdev->as[i].pf_wq = alloc_workqueue("mali_mmu%d", 0, 1, i); + kbdev->as[i].pf_wq = alloc_workqueue("mali_mmu%u", 0, 0, i); if (!kbdev->as[i].pf_wq) return -ENOMEM; diff --git a/mali_kbase/mmu/mali_kbase_mmu.c b/mali_kbase/mmu/mali_kbase_mmu.c index 26ddd95..d6b4eb7 100644 --- a/mali_kbase/mmu/mali_kbase_mmu.c +++ b/mali_kbase/mmu/mali_kbase_mmu.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2010-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2010-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -25,6 +25,7 @@ #include <linux/kernel.h> #include <linux/dma-mapping.h> +#include <linux/migrate.h> #include <mali_kbase.h> #include <gpu/mali_kbase_gpu_fault.h> #include <gpu/mali_kbase_gpu_regmap.h> @@ -45,10 +46,35 @@ #if !MALI_USE_CSF #include <mali_kbase_hwaccess_jm.h> #endif +#include <linux/version_compat_defs.h> #include <mali_kbase_trace_gpu_mem.h> #include <backend/gpu/mali_kbase_pm_internal.h> +/* Threshold used to decide whether to flush full caches or just a physical range */ +#define KBASE_PA_RANGE_THRESHOLD_NR_PAGES 20 +#define MGM_DEFAULT_PTE_GROUP (0) + +/* Macro to convert updated PDGs to flags indicating levels skip in flush */ +#define pgd_level_to_skip_flush(dirty_pgds) (~(dirty_pgds) & 0xF) + +static int mmu_insert_pages_no_flush(struct kbase_device *kbdev, struct kbase_mmu_table *mmut, + const u64 start_vpfn, struct tagged_addr *phys, size_t nr, + unsigned long flags, int const group_id, u64 *dirty_pgds, + struct kbase_va_region *reg, bool ignore_page_migration); + +/* Small wrapper function to factor out GPU-dependent context releasing */ +static void release_ctx(struct kbase_device *kbdev, + struct kbase_context *kctx) +{ +#if MALI_USE_CSF + CSTD_UNUSED(kbdev); + kbase_ctx_sched_release_ctx_lock(kctx); +#else /* MALI_USE_CSF */ + kbasep_js_runpool_release_ctx(kbdev, kctx); +#endif /* MALI_USE_CSF */ +} + static void mmu_hw_operation_begin(struct kbase_device *kbdev) { #if !IS_ENABLED(CONFIG_MALI_NO_MALI) @@ -91,7 +117,8 @@ static void mmu_hw_operation_end(struct kbase_device *kbdev) /** * mmu_flush_cache_on_gpu_ctrl() - Check if cache flush needs to be done - * through GPU_CONTROL interface + * through GPU_CONTROL interface. + * * @kbdev: kbase device to check GPU model ID on. 
* * This function returns whether a cache flush for page table update should @@ -109,119 +136,213 @@ static bool mmu_flush_cache_on_gpu_ctrl(struct kbase_device *kbdev) } /** - * mmu_flush_invalidate_on_gpu_ctrl() - Flush and invalidate the GPU caches - * through GPU_CONTROL interface. - * @kbdev: kbase device to issue the MMU operation on. - * @as: address space to issue the MMU operation on. - * @op_param: parameters for the operation. + * mmu_flush_pa_range() - Flush physical address range * - * This wrapper function alternates AS_COMMAND_FLUSH_PT and AS_COMMAND_FLUSH_MEM - * to equivalent GPU_CONTROL command FLUSH_CACHES. - * The function first issue LOCK to MMU-AS with kbase_mmu_hw_do_operation(). - * And issues cache-flush with kbase_gpu_cache_flush_and_busy_wait() function - * then issue UNLOCK to MMU-AS with kbase_mmu_hw_do_operation(). + * @kbdev: kbase device to issue the MMU operation on. + * @phys: Starting address of the physical range to start the operation on. + * @nr_bytes: Number of bytes to work on. + * @op: Type of cache flush operation to perform. * - * Return: Zero if the operation was successful, non-zero otherwise. + * Issue a cache flush physical range command. */ -static int -mmu_flush_invalidate_on_gpu_ctrl(struct kbase_device *kbdev, - struct kbase_as *as, - struct kbase_mmu_hw_op_param *op_param) +#if MALI_USE_CSF +static void mmu_flush_pa_range(struct kbase_device *kbdev, phys_addr_t phys, size_t nr_bytes, + enum kbase_mmu_op_type op) { u32 flush_op; - int ret, ret2; - - if (WARN_ON(kbdev == NULL) || - WARN_ON(as == NULL) || - WARN_ON(op_param == NULL)) - return -EINVAL; lockdep_assert_held(&kbdev->hwaccess_lock); - lockdep_assert_held(&kbdev->mmu_hw_mutex); /* Translate operation to command */ - if (op_param->op == KBASE_MMU_OP_FLUSH_PT) { - flush_op = GPU_COMMAND_CACHE_CLN_INV_L2; - } else if (op_param->op == KBASE_MMU_OP_FLUSH_MEM) { - flush_op = GPU_COMMAND_CACHE_CLN_INV_L2_LSC; - } else { - dev_warn(kbdev->dev, "Invalid flush request (op = %d)\n", - op_param->op); - return -EINVAL; + if (op == KBASE_MMU_OP_FLUSH_PT) + flush_op = GPU_COMMAND_FLUSH_PA_RANGE_CLN_INV_L2; + else if (op == KBASE_MMU_OP_FLUSH_MEM) + flush_op = GPU_COMMAND_FLUSH_PA_RANGE_CLN_INV_L2_LSC; + else { + dev_warn(kbdev->dev, "Invalid flush request (op = %d)", op); + return; } - /* 1. Issue MMU_AS_CONTROL.COMMAND.LOCK operation. */ - op_param->op = KBASE_MMU_OP_LOCK; - ret = kbase_mmu_hw_do_operation(kbdev, as, op_param); - if (ret) - return ret; + if (kbase_gpu_cache_flush_pa_range_and_busy_wait(kbdev, phys, nr_bytes, flush_op)) + dev_err(kbdev->dev, "Flush for physical address range did not complete"); +} +#endif - /* 2. Issue GPU_CONTROL.COMMAND.FLUSH_CACHES operation */ - ret = kbase_gpu_cache_flush_and_busy_wait(kbdev, flush_op); +/** + * mmu_invalidate() - Perform an invalidate operation on MMU caches. + * @kbdev: The Kbase device. + * @kctx: The Kbase context. + * @as_nr: GPU address space number for which invalidate is required. + * @op_param: Non-NULL pointer to struct containing information about the MMU + * operation to perform. + * + * Perform an MMU invalidate operation on a particual address space + * by issuing a UNLOCK command. + */ +static void mmu_invalidate(struct kbase_device *kbdev, struct kbase_context *kctx, int as_nr, + const struct kbase_mmu_hw_op_param *op_param) +{ + unsigned long flags; - /* 3. Issue MMU_AS_CONTROL.COMMAND.UNLOCK operation. 
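The dirty_pgds bookkeeping introduced above records, as a bitmask, which page-table levels were actually written during an update; pgd_level_to_skip_flush() then inverts the low four bits, and the result is used later in this patch to set op_param.flush_skip_levels so the MMU flush can skip untouched levels. A tiny self-contained illustration (the macro is restated only so the snippet stands alone):

#include <linux/bits.h>
#include <linux/types.h>

/* Same definition as the one added to mali_kbase_mmu.c above. */
#define pgd_level_to_skip_flush(dirty_pgds) (~(dirty_pgds) & 0xF)

/* Levels 0 and 3 were written during the update... */
static inline u64 example_skip_mask(void)
{
	const u64 dirty_pgds = BIT(0) | BIT(3);	/* 0b1001 */

	/* ...so levels 1 and 2 may be skipped: (~0b1001) & 0xF == 0b0110 */
	return pgd_level_to_skip_flush(dirty_pgds);
}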
*/ - op_param->op = KBASE_MMU_OP_UNLOCK; - ret2 = kbase_mmu_hw_do_operation(kbdev, as, op_param); + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); - return ret ?: ret2; + if (kbdev->pm.backend.gpu_ready && (!kctx || kctx->as_nr >= 0)) { + as_nr = kctx ? kctx->as_nr : as_nr; + if (kbase_mmu_hw_do_unlock(kbdev, &kbdev->as[as_nr], op_param)) + dev_err(kbdev->dev, + "Invalidate after GPU page table update did not complete"); + } + + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); +} + +/* Perform a flush/invalidate on a particular address space + */ +static void mmu_flush_invalidate_as(struct kbase_device *kbdev, struct kbase_as *as, + const struct kbase_mmu_hw_op_param *op_param) +{ + unsigned long flags; + + /* AS transaction begin */ + mutex_lock(&kbdev->mmu_hw_mutex); + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); + + if (kbdev->pm.backend.gpu_ready && (kbase_mmu_hw_do_flush_locked(kbdev, as, op_param))) + dev_err(kbdev->dev, "Flush for GPU page table update did not complete"); + + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + mutex_unlock(&kbdev->mmu_hw_mutex); + /* AS transaction end */ } /** - * kbase_mmu_flush_invalidate() - Flush and invalidate the GPU caches. - * @kctx: The KBase context. - * @vpfn: The virtual page frame number to start the flush on. - * @nr: The number of pages to flush. - * @sync: Set if the operation should be synchronous or not. + * mmu_flush_invalidate() - Perform a flush operation on GPU caches. + * @kbdev: The Kbase device. + * @kctx: The Kbase context. + * @as_nr: GPU address space number for which flush + invalidate is required. + * @op_param: Non-NULL pointer to struct containing information about the MMU + * operation to perform. * - * Issue a cache flush + invalidate to the GPU caches and invalidate the TLBs. + * This function performs the cache flush operation described by @op_param. + * The function retains a reference to the given @kctx and releases it + * after performing the flush operation. * - * If sync is not set then transactions still in flight when the flush is issued - * may use the old page tables and the data they write will not be written out - * to memory, this function returns after the flush has been issued but - * before all accesses which might effect the flushed region have completed. + * If operation is set to KBASE_MMU_OP_FLUSH_PT then this function will issue + * a cache flush + invalidate to the L2 caches and invalidate the TLBs. * - * If sync is set then accesses in the flushed region will be drained - * before data is flush and invalidated through L1, L2 and into memory, - * after which point this function will return. - * @mmu_sync_info: Indicates whether this call is synchronous wrt MMU ops. + * If operation is set to KBASE_MMU_OP_FLUSH_MEM then this function will issue + * a cache flush + invalidate to the L2 and GPU Load/Store caches as well as + * invalidating the TLBs. */ -static void -kbase_mmu_flush_invalidate(struct kbase_context *kctx, u64 vpfn, size_t nr, - bool sync, - enum kbase_caller_mmu_sync_info mmu_sync_info); +static void mmu_flush_invalidate(struct kbase_device *kbdev, struct kbase_context *kctx, int as_nr, + const struct kbase_mmu_hw_op_param *op_param) +{ + bool ctx_is_in_runpool; + + /* Early out if there is nothing to do */ + if (op_param->nr == 0) + return; + + /* If no context is provided then MMU operation is performed on address + * space which does not belong to user space context. Otherwise, retain + * refcount to context provided and release after flush operation. 
+ */ + if (!kctx) { + mmu_flush_invalidate_as(kbdev, &kbdev->as[as_nr], op_param); + } else { +#if !MALI_USE_CSF + rt_mutex_lock(&kbdev->js_data.queue_mutex); + ctx_is_in_runpool = kbase_ctx_sched_inc_refcount(kctx); + rt_mutex_unlock(&kbdev->js_data.queue_mutex); +#else + ctx_is_in_runpool = kbase_ctx_sched_inc_refcount_if_as_valid(kctx); +#endif /* !MALI_USE_CSF */ + + if (ctx_is_in_runpool) { + KBASE_DEBUG_ASSERT(kctx->as_nr != KBASEP_AS_NR_INVALID); + + mmu_flush_invalidate_as(kbdev, &kbdev->as[kctx->as_nr], op_param); + + release_ctx(kbdev, kctx); + } + } +} /** - * kbase_mmu_flush_invalidate_no_ctx() - Flush and invalidate the GPU caches. - * @kbdev: Device pointer. - * @vpfn: The virtual page frame number to start the flush on. - * @nr: The number of pages to flush. - * @sync: Set if the operation should be synchronous or not. - * @as_nr: GPU address space number for which flush + invalidate is required. - * @mmu_sync_info: Indicates whether this call is synchronous wrt MMU ops. + * mmu_flush_invalidate_on_gpu_ctrl() - Perform a flush operation on GPU caches via + * the GPU_CONTROL interface + * @kbdev: The Kbase device. + * @kctx: The Kbase context. + * @as_nr: GPU address space number for which flush + invalidate is required. + * @op_param: Non-NULL pointer to struct containing information about the MMU + * operation to perform. * - * This is used for MMU tables which do not belong to a user space context. + * Perform a flush/invalidate on a particular address space via the GPU_CONTROL + * interface. */ -static void kbase_mmu_flush_invalidate_no_ctx( - struct kbase_device *kbdev, u64 vpfn, size_t nr, bool sync, int as_nr, - enum kbase_caller_mmu_sync_info mmu_sync_info); +static void mmu_flush_invalidate_on_gpu_ctrl(struct kbase_device *kbdev, struct kbase_context *kctx, + int as_nr, const struct kbase_mmu_hw_op_param *op_param) +{ + unsigned long flags; + + /* AS transaction begin */ + mutex_lock(&kbdev->mmu_hw_mutex); + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); + + if (kbdev->pm.backend.gpu_ready && (!kctx || kctx->as_nr >= 0)) { + as_nr = kctx ? kctx->as_nr : as_nr; + if (kbase_mmu_hw_do_flush_on_gpu_ctrl(kbdev, &kbdev->as[as_nr], op_param)) + dev_err(kbdev->dev, "Flush for GPU page table update did not complete"); + } + + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + mutex_unlock(&kbdev->mmu_hw_mutex); +} + +static void kbase_mmu_sync_pgd_gpu(struct kbase_device *kbdev, struct kbase_context *kctx, + phys_addr_t phys, size_t size, + enum kbase_mmu_op_type flush_op) +{ + kbase_mmu_flush_pa_range(kbdev, kctx, phys, size, flush_op); +} + +static void kbase_mmu_sync_pgd_cpu(struct kbase_device *kbdev, dma_addr_t handle, size_t size) +{ + /* Ensure that the GPU can read the pages from memory. + * + * pixel: b/200555454 requires this sync to happen even if the system + * is coherent. + */ + dma_sync_single_for_device(kbdev->dev, handle, size, + DMA_TO_DEVICE); +} /** * kbase_mmu_sync_pgd() - sync page directory to memory when needed. - * @kbdev: Device pointer. - * @handle: Address of DMA region. - * @size: Size of the region to sync. + * @kbdev: Device pointer. + * @kctx: Context pointer. + * @phys: Starting physical address of the destination region. + * @handle: Address of DMA region. + * @size: Size of the region to sync. + * @flush_op: MMU cache flush operation to perform on the physical address + * range, if GPU control is available. 
+ * + * This function is called whenever the association between a virtual address + * range and a physical address range changes, because a mapping is created or + * destroyed. + * One of the effects of this operation is performing an MMU cache flush + * operation only on the physical address range affected by this function, if + * GPU control is available. * * This should be called after each page directory update. */ -static void kbase_mmu_sync_pgd(struct kbase_device *kbdev, - dma_addr_t handle, size_t size) +static void kbase_mmu_sync_pgd(struct kbase_device *kbdev, struct kbase_context *kctx, + phys_addr_t phys, dma_addr_t handle, size_t size, + enum kbase_mmu_op_type flush_op) { - /* In non-coherent system, ensure the GPU can read - * the pages from memory - */ - if (kbdev->system_coherency == COHERENCY_NONE) - dma_sync_single_for_device(kbdev->dev, handle, size, - DMA_TO_DEVICE); + + kbase_mmu_sync_pgd_cpu(kbdev, handle, size); + kbase_mmu_sync_pgd_gpu(kbdev, kctx, phys, size, flush_op); } /* @@ -233,35 +354,153 @@ static void kbase_mmu_sync_pgd(struct kbase_device *kbdev, * a 4kB physical page. */ -static int kbase_mmu_update_pages_no_flush(struct kbase_context *kctx, u64 vpfn, - struct tagged_addr *phys, size_t nr, - unsigned long flags, int group_id); - /** * kbase_mmu_update_and_free_parent_pgds() - Update number of valid entries and * free memory of the page directories * - * @kbdev: Device pointer. - * @mmut: GPU MMU page table. - * @pgds: Physical addresses of page directories to be freed. - * @vpfn: The virtual page frame number. - * @level: The level of MMU page table. + * @kbdev: Device pointer. + * @mmut: GPU MMU page table. + * @pgds: Physical addresses of page directories to be freed. + * @vpfn: The virtual page frame number. + * @level: The level of MMU page table. + * @flush_op: The type of MMU flush operation to perform. + * @dirty_pgds: Flags to track every level where a PGD has been updated. */ static void kbase_mmu_update_and_free_parent_pgds(struct kbase_device *kbdev, - struct kbase_mmu_table *mmut, - phys_addr_t *pgds, u64 vpfn, - int level); + struct kbase_mmu_table *mmut, phys_addr_t *pgds, + u64 vpfn, int level, + enum kbase_mmu_op_type flush_op, u64 *dirty_pgds); + +static void kbase_mmu_account_freed_pgd(struct kbase_device *kbdev, struct kbase_mmu_table *mmut) +{ + atomic_sub(1, &kbdev->memdev.used_pages); + + /* If MMU tables belong to a context then pages will have been accounted + * against it, so we must decrement the usage counts here. 
+ */ + if (mmut->kctx) { + kbase_process_page_usage_dec(mmut->kctx, 1); + atomic_sub(1, &mmut->kctx->used_pages); + } + + kbase_trace_gpu_mem_usage_dec(kbdev, mmut->kctx, 1); +} + +static bool kbase_mmu_handle_isolated_pgd_page(struct kbase_device *kbdev, + struct kbase_mmu_table *mmut, + struct page *p) +{ + struct kbase_page_metadata *page_md = kbase_page_private(p); + bool page_is_isolated = false; + + lockdep_assert_held(&mmut->mmu_lock); + + if (!kbase_is_page_migration_enabled()) + return false; + + spin_lock(&page_md->migrate_lock); + if (PAGE_STATUS_GET(page_md->status) == PT_MAPPED) { + WARN_ON_ONCE(!mmut->kctx); + if (IS_PAGE_ISOLATED(page_md->status)) { + page_md->status = PAGE_STATUS_SET(page_md->status, + FREE_PT_ISOLATED_IN_PROGRESS); + page_md->data.free_pt_isolated.kbdev = kbdev; + page_is_isolated = true; + } else { + page_md->status = + PAGE_STATUS_SET(page_md->status, FREE_IN_PROGRESS); + } + } else if ((PAGE_STATUS_GET(page_md->status) == FREE_IN_PROGRESS) || + (PAGE_STATUS_GET(page_md->status) == ALLOCATE_IN_PROGRESS)) { + /* Nothing to do - fall through */ + } else { + WARN_ON_ONCE(PAGE_STATUS_GET(page_md->status) != NOT_MOVABLE); + } + spin_unlock(&page_md->migrate_lock); + + if (unlikely(page_is_isolated)) { + /* Do the CPU cache flush and accounting here for the isolated + * PGD page, which is done inside kbase_mmu_free_pgd() for the + * PGD page that did not get isolated. + */ + dma_sync_single_for_device(kbdev->dev, kbase_dma_addr(p), PAGE_SIZE, + DMA_BIDIRECTIONAL); + kbase_mmu_account_freed_pgd(kbdev, mmut); + } + + return page_is_isolated; +} + /** * kbase_mmu_free_pgd() - Free memory of the page directory * * @kbdev: Device pointer. * @mmut: GPU MMU page table. * @pgd: Physical address of page directory to be freed. - * @dirty: Flag to indicate whether the page may be dirty in the cache. + * + * This function is supposed to be called with mmu_lock held and after + * ensuring that the GPU won't be able to access the page. */ -static void kbase_mmu_free_pgd(struct kbase_device *kbdev, - struct kbase_mmu_table *mmut, phys_addr_t pgd, - bool dirty); +static void kbase_mmu_free_pgd(struct kbase_device *kbdev, struct kbase_mmu_table *mmut, + phys_addr_t pgd) +{ + struct page *p; + bool page_is_isolated = false; + + lockdep_assert_held(&mmut->mmu_lock); + + p = pfn_to_page(PFN_DOWN(pgd)); + page_is_isolated = kbase_mmu_handle_isolated_pgd_page(kbdev, mmut, p); + + if (likely(!page_is_isolated)) { + kbase_mem_pool_free(&kbdev->mem_pools.small[mmut->group_id], p, true); + kbase_mmu_account_freed_pgd(kbdev, mmut); + } +} + +/** + * kbase_mmu_free_pgds_list() - Free the PGD pages present in the list + * + * @kbdev: Device pointer. + * @mmut: GPU MMU page table. + * + * This function will call kbase_mmu_free_pgd() on each page directory page + * present in the list of free PGDs inside @mmut. + * + * The function is supposed to be called after the GPU cache and MMU TLB has + * been invalidated post the teardown loop. + * + * The mmu_lock shall be held prior to calling the function. 
+ */ +static void kbase_mmu_free_pgds_list(struct kbase_device *kbdev, struct kbase_mmu_table *mmut) +{ + size_t i; + + lockdep_assert_held(&mmut->mmu_lock); + + for (i = 0; i < mmut->scratch_mem.free_pgds.head_index; i++) + kbase_mmu_free_pgd(kbdev, mmut, page_to_phys(mmut->scratch_mem.free_pgds.pgds[i])); + + mmut->scratch_mem.free_pgds.head_index = 0; +} + +static void kbase_mmu_add_to_free_pgds_list(struct kbase_mmu_table *mmut, struct page *p) +{ + lockdep_assert_held(&mmut->mmu_lock); + + if (WARN_ON_ONCE(mmut->scratch_mem.free_pgds.head_index > (MAX_FREE_PGDS - 1))) + return; + + mmut->scratch_mem.free_pgds.pgds[mmut->scratch_mem.free_pgds.head_index++] = p; +} + +static inline void kbase_mmu_reset_free_pgds_list(struct kbase_mmu_table *mmut) +{ + lockdep_assert_held(&mmut->mmu_lock); + + mmut->scratch_mem.free_pgds.head_index = 0; +} + /** * reg_grow_calc_extra_pages() - Calculate the number of backed pages to add to * a region on a GPU page fault @@ -289,7 +528,7 @@ static size_t reg_grow_calc_extra_pages(struct kbase_device *kbdev, if (!multiple) { dev_warn( kbdev->dev, - "VA Region 0x%llx extension was 0, allocator needs to set this properly for KBASE_REG_PF_GROW\n", + "VA Region 0x%llx extension was 0, allocator needs to set this properly for KBASE_REG_PF_GROW", ((unsigned long long)reg->start_pfn) << PAGE_SHIFT); return minimum_extra; } @@ -345,13 +584,14 @@ static size_t reg_grow_calc_extra_pages(struct kbase_device *kbdev, static void kbase_gpu_mmu_handle_write_faulting_as(struct kbase_device *kbdev, struct kbase_as *faulting_as, u64 start_pfn, size_t nr, - u32 kctx_id) + u32 kctx_id, u64 dirty_pgds) { /* Calls to this function are inherently synchronous, with respect to * MMU operations. */ const enum kbase_caller_mmu_sync_info mmu_sync_info = CALLER_MMU_SYNC; struct kbase_mmu_hw_op_param op_param; + int ret = 0; mutex_lock(&kbdev->mmu_hw_mutex); @@ -359,27 +599,31 @@ static void kbase_gpu_mmu_handle_write_faulting_as(struct kbase_device *kbdev, KBASE_MMU_FAULT_TYPE_PAGE); /* flush L2 and unlock the VA (resumes the MMU) */ - op_param = (struct kbase_mmu_hw_op_param){ - .vpfn = start_pfn, - .nr = nr, - .op = KBASE_MMU_OP_FLUSH_PT, - .kctx_id = kctx_id, - .mmu_sync_info = mmu_sync_info, - }; + op_param.vpfn = start_pfn; + op_param.nr = nr; + op_param.op = KBASE_MMU_OP_FLUSH_PT; + op_param.kctx_id = kctx_id; + op_param.mmu_sync_info = mmu_sync_info; if (mmu_flush_cache_on_gpu_ctrl(kbdev)) { unsigned long irq_flags; spin_lock_irqsave(&kbdev->hwaccess_lock, irq_flags); - mmu_flush_invalidate_on_gpu_ctrl(kbdev, faulting_as, &op_param); + op_param.flush_skip_levels = + pgd_level_to_skip_flush(dirty_pgds); + ret = kbase_mmu_hw_do_flush_on_gpu_ctrl(kbdev, faulting_as, &op_param); spin_unlock_irqrestore(&kbdev->hwaccess_lock, irq_flags); } else { mmu_hw_operation_begin(kbdev); - kbase_mmu_hw_do_operation(kbdev, faulting_as, &op_param); + ret = kbase_mmu_hw_do_flush(kbdev, faulting_as, &op_param); mmu_hw_operation_end(kbdev); } mutex_unlock(&kbdev->mmu_hw_mutex); + if (ret) + dev_err(kbdev->dev, + "Flush for GPU page fault due to write access did not complete"); + kbase_mmu_hw_enable_fault(kbdev, faulting_as, KBASE_MMU_FAULT_TYPE_PAGE); } @@ -412,8 +656,8 @@ static void kbase_gpu_mmu_handle_write_fault(struct kbase_context *kctx, struct tagged_addr *fault_phys_addr; struct kbase_fault *fault; u64 fault_pfn, pfn_offset; - int ret; int as_no; + u64 dirty_pgds = 0; as_no = faulting_as->number; kbdev = container_of(faulting_as, struct kbase_device, as[as_no]); @@ -472,12 +716,11 @@ static 
void kbase_gpu_mmu_handle_write_fault(struct kbase_context *kctx, } /* Now make this faulting page writable to GPU. */ - ret = kbase_mmu_update_pages_no_flush(kctx, fault_pfn, - fault_phys_addr, - 1, region->flags, region->gpu_alloc->group_id); + kbase_mmu_update_pages_no_flush(kbdev, &kctx->mmu, fault_pfn, fault_phys_addr, 1, + region->flags, region->gpu_alloc->group_id, &dirty_pgds); kbase_gpu_mmu_handle_write_faulting_as(kbdev, faulting_as, fault_pfn, 1, - kctx->id); + kctx->id, dirty_pgds); kbase_gpu_vm_unlock(kctx); } @@ -492,7 +735,7 @@ static void kbase_gpu_mmu_handle_permission_fault(struct kbase_context *kctx, case AS_FAULTSTATUS_ACCESS_TYPE_WRITE: kbase_gpu_mmu_handle_write_fault(kctx, faulting_as); break; - case AS_FAULTSTATUS_ACCESS_TYPE_EX: + case AS_FAULTSTATUS_ACCESS_TYPE_EXECUTE: kbase_mmu_report_fault_and_kill(kctx, faulting_as, "Execute Permission fault", fault); break; @@ -508,31 +751,68 @@ static void kbase_gpu_mmu_handle_permission_fault(struct kbase_context *kctx, } #endif -#define MAX_POOL_LEVEL 2 +/** + * estimate_pool_space_required - Determine how much a pool should be grown by to support a future + * allocation + * @pool: The memory pool to check, including its linked pools + * @pages_required: Number of 4KiB pages require for the pool to support a future allocation + * + * The value returned is accounting for the size of @pool and the size of each memory pool linked to + * @pool. Hence, the caller should use @pool and (if not already satisfied) all its linked pools to + * allocate from. + * + * Note: this is only an estimate, because even during the calculation the memory pool(s) involved + * can be updated to be larger or smaller. Hence, the result is only a guide as to whether an + * allocation could succeed, or an estimate of the correct amount to grow the pool by. The caller + * should keep attempting an allocation and then re-growing with a new value queried form this + * function until the allocation succeeds. + * + * Return: an estimate of the amount of extra 4KiB pages in @pool that are required to satisfy an + * allocation, or 0 if @pool (including its linked pools) is likely to already satisfy the + * allocation. + */ +static size_t estimate_pool_space_required(struct kbase_mem_pool *pool, const size_t pages_required) +{ + size_t pages_still_required; + + for (pages_still_required = pages_required; pool != NULL && pages_still_required; + pool = pool->next_pool) { + size_t pool_size_4k; + + kbase_mem_pool_lock(pool); + + pool_size_4k = kbase_mem_pool_size(pool) << pool->order; + if (pool_size_4k >= pages_still_required) + pages_still_required = 0; + else + pages_still_required -= pool_size_4k; + + kbase_mem_pool_unlock(pool); + } + return pages_still_required; +} /** * page_fault_try_alloc - Try to allocate memory from a context pool * @kctx: Context pointer * @region: Region to grow - * @new_pages: Number of 4 kB pages to allocate - * @pages_to_grow: Pointer to variable to store number of outstanding pages on - * failure. This can be either 4 kB or 2 MB pages, depending on - * the number of pages requested. - * @grow_2mb_pool: Pointer to variable to store which pool needs to grow - true - * for 2 MB, false for 4 kB. + * @new_pages: Number of 4 KiB pages to allocate + * @pages_to_grow: Pointer to variable to store number of outstanding pages on failure. This can be + * either 4 KiB or 2 MiB pages, depending on the number of pages requested. + * @grow_2mb_pool: Pointer to variable to store which pool needs to grow - true for 2 MiB, false for + * 4 KiB. 
* @prealloc_sas: Pointer to kbase_sub_alloc structures * - * This function will try to allocate as many pages as possible from the context - * pool, then if required will try to allocate the remaining pages from the - * device pool. + * This function will try to allocate as many pages as possible from the context pool, then if + * required will try to allocate the remaining pages from the device pool. * - * This function will not allocate any new memory beyond that is already - * present in the context or device pools. This is because it is intended to be - * called with the vm_lock held, which could cause recursive locking if the - * allocation caused the out-of-memory killer to run. + * This function will not allocate any new memory beyond that is already present in the context or + * device pools. This is because it is intended to be called whilst the thread has acquired the + * region list lock with kbase_gpu_vm_lock(), and a large enough memory allocation whilst that is + * held could invoke the OoM killer and cause an effective deadlock with kbase_cpu_vm_close(). * - * If 2 MB pages are enabled and new_pages is >= 2 MB then pages_to_grow will be - * a count of 2 MB pages, otherwise it will be a count of 4 kB pages. + * If 2 MiB pages are enabled and new_pages is >= 2 MiB then pages_to_grow will be a count of 2 MiB + * pages, otherwise it will be a count of 4 KiB pages. * * Return: true if successful, false on failure */ @@ -541,13 +821,15 @@ static bool page_fault_try_alloc(struct kbase_context *kctx, int *pages_to_grow, bool *grow_2mb_pool, struct kbase_sub_alloc **prealloc_sas) { - struct tagged_addr *gpu_pages[MAX_POOL_LEVEL] = {NULL}; - struct tagged_addr *cpu_pages[MAX_POOL_LEVEL] = {NULL}; - size_t pages_alloced[MAX_POOL_LEVEL] = {0}; + size_t total_gpu_pages_alloced = 0; + size_t total_cpu_pages_alloced = 0; struct kbase_mem_pool *pool, *root_pool; - int pool_level = 0; bool alloc_failed = false; size_t pages_still_required; + size_t total_mempools_free_4k = 0; + + lockdep_assert_held(&kctx->reg_lock); + lockdep_assert_held(&kctx->mem_partials_lock); if (WARN_ON(region->gpu_alloc->group_id >= MEMORY_GROUP_MANAGER_NR_GROUPS)) { @@ -556,42 +838,21 @@ static bool page_fault_try_alloc(struct kbase_context *kctx, return false; } -#ifdef CONFIG_MALI_2MB_ALLOC - if (new_pages >= (SZ_2M / SZ_4K)) { + if (kctx->kbdev->pagesize_2mb && new_pages >= (SZ_2M / SZ_4K)) { root_pool = &kctx->mem_pools.large[region->gpu_alloc->group_id]; *grow_2mb_pool = true; } else { -#endif root_pool = &kctx->mem_pools.small[region->gpu_alloc->group_id]; *grow_2mb_pool = false; -#ifdef CONFIG_MALI_2MB_ALLOC } -#endif if (region->gpu_alloc != region->cpu_alloc) new_pages *= 2; - pages_still_required = new_pages; - /* Determine how many pages are in the pools before trying to allocate. * Don't attempt to allocate & free if the allocation can't succeed. */ - for (pool = root_pool; pool != NULL; pool = pool->next_pool) { - size_t pool_size_4k; - - kbase_mem_pool_lock(pool); - - pool_size_4k = kbase_mem_pool_size(pool) << pool->order; - if (pool_size_4k >= pages_still_required) - pages_still_required = 0; - else - pages_still_required -= pool_size_4k; - - kbase_mem_pool_unlock(pool); - - if (!pages_still_required) - break; - } + pages_still_required = estimate_pool_space_required(root_pool, new_pages); if (pages_still_required) { /* Insufficient pages in pools. 
Don't try to allocate - just @@ -602,11 +863,11 @@ static bool page_fault_try_alloc(struct kbase_context *kctx, return false; } - /* Since we've dropped the pool locks, the amount of memory in the pools - * may change between the above check and the actual allocation. + /* Since we're not holding any of the mempool locks, the amount of memory in the pools may + * change between the above estimate and the actual allocation. */ - pool = root_pool; - for (pool_level = 0; pool_level < MAX_POOL_LEVEL; pool_level++) { + pages_still_required = new_pages; + for (pool = root_pool; pool != NULL && pages_still_required; pool = pool->next_pool) { size_t pool_size_4k; size_t pages_to_alloc_4k; size_t pages_to_alloc_4k_per_alloc; @@ -615,94 +876,92 @@ static bool page_fault_try_alloc(struct kbase_context *kctx, /* Allocate as much as possible from this pool*/ pool_size_4k = kbase_mem_pool_size(pool) << pool->order; - pages_to_alloc_4k = MIN(new_pages, pool_size_4k); + total_mempools_free_4k += pool_size_4k; + pages_to_alloc_4k = MIN(pages_still_required, pool_size_4k); if (region->gpu_alloc == region->cpu_alloc) pages_to_alloc_4k_per_alloc = pages_to_alloc_4k; else pages_to_alloc_4k_per_alloc = pages_to_alloc_4k >> 1; - pages_alloced[pool_level] = pages_to_alloc_4k; if (pages_to_alloc_4k) { - gpu_pages[pool_level] = - kbase_alloc_phy_pages_helper_locked( - region->gpu_alloc, pool, - pages_to_alloc_4k_per_alloc, - &prealloc_sas[0]); + struct tagged_addr *gpu_pages = + kbase_alloc_phy_pages_helper_locked(region->gpu_alloc, pool, + pages_to_alloc_4k_per_alloc, + &prealloc_sas[0]); - if (!gpu_pages[pool_level]) { + if (!gpu_pages) alloc_failed = true; - } else if (region->gpu_alloc != region->cpu_alloc) { - cpu_pages[pool_level] = - kbase_alloc_phy_pages_helper_locked( - region->cpu_alloc, pool, - pages_to_alloc_4k_per_alloc, - &prealloc_sas[1]); - - if (!cpu_pages[pool_level]) + else + total_gpu_pages_alloced += pages_to_alloc_4k_per_alloc; + + if (!alloc_failed && region->gpu_alloc != region->cpu_alloc) { + struct tagged_addr *cpu_pages = kbase_alloc_phy_pages_helper_locked( + region->cpu_alloc, pool, pages_to_alloc_4k_per_alloc, + &prealloc_sas[1]); + + if (!cpu_pages) alloc_failed = true; + else + total_cpu_pages_alloced += pages_to_alloc_4k_per_alloc; } } kbase_mem_pool_unlock(pool); if (alloc_failed) { - WARN_ON(!new_pages); - WARN_ON(pages_to_alloc_4k >= new_pages); - WARN_ON(pages_to_alloc_4k_per_alloc >= new_pages); + WARN_ON(!pages_still_required); + WARN_ON(pages_to_alloc_4k >= pages_still_required); + WARN_ON(pages_to_alloc_4k_per_alloc >= pages_still_required); break; } - new_pages -= pages_to_alloc_4k; - - if (!new_pages) - break; - - pool = pool->next_pool; - if (!pool) - break; + pages_still_required -= pages_to_alloc_4k; } - if (new_pages) { - /* Allocation was unsuccessful */ - int max_pool_level = pool_level; - - pool = root_pool; - - /* Free memory allocated so far */ - for (pool_level = 0; pool_level <= max_pool_level; - pool_level++) { - kbase_mem_pool_lock(pool); + if (pages_still_required) { + /* Allocation was unsuccessful. 
We have dropped the mem_pool lock after allocation, + * so must in any case use kbase_free_phy_pages_helper() rather than + * kbase_free_phy_pages_helper_locked() + */ + if (total_gpu_pages_alloced > 0) + kbase_free_phy_pages_helper(region->gpu_alloc, total_gpu_pages_alloced); + if (region->gpu_alloc != region->cpu_alloc && total_cpu_pages_alloced > 0) + kbase_free_phy_pages_helper(region->cpu_alloc, total_cpu_pages_alloced); - if (region->gpu_alloc != region->cpu_alloc) { - if (pages_alloced[pool_level] && - cpu_pages[pool_level]) - kbase_free_phy_pages_helper_locked( - region->cpu_alloc, - pool, cpu_pages[pool_level], - pages_alloced[pool_level]); + if (alloc_failed) { + /* Note that in allocating from the above memory pools, we always ensure + * never to request more than is available in each pool with the pool's + * lock held. Hence failing to allocate in such situations would be unusual + * and we should cancel the growth instead (as re-growing the memory pool + * might not fix the situation) + */ + dev_warn( + kctx->kbdev->dev, + "Page allocation failure of %zu pages: managed %zu pages, mempool (inc linked pools) had %zu pages available", + new_pages, total_gpu_pages_alloced + total_cpu_pages_alloced, + total_mempools_free_4k); + *pages_to_grow = 0; + } else { + /* Tell the caller to try to grow the memory pool + * + * Freeing pages above may have spilled or returned them to the OS, so we + * have to take into account how many are still in the pool before giving a + * new estimate for growth required of the pool. We can just re-estimate a + * new value. + */ + pages_still_required = estimate_pool_space_required(root_pool, new_pages); + if (pages_still_required) { + *pages_to_grow = pages_still_required; + } else { + /* It's possible another thread could've grown the pool to be just + * big enough after we rolled back the allocation. Request at least + * one more page to ensure the caller doesn't fail the growth by + * conflating it with the alloc_failed case above + */ + *pages_to_grow = 1u; } - - if (pages_alloced[pool_level] && gpu_pages[pool_level]) - kbase_free_phy_pages_helper_locked( - region->gpu_alloc, - pool, gpu_pages[pool_level], - pages_alloced[pool_level]); - - kbase_mem_pool_unlock(pool); - - pool = pool->next_pool; } - /* - * If the allocation failed despite there being enough memory in - * the pool, then just fail. Otherwise, try to grow the memory - * pool. 
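
/*
 * [Illustrative sketch, not part of this patch] The decision taken in this
 * error path boils down to the helper below: an allocation failure while the
 * pools still reported enough pages cancels the growth (regrowing would not
 * help), otherwise the shortfall is re-estimated after the roll-back and at
 * least one page is requested so the caller does not mistake the retry for the
 * hard-failure case. Names are illustrative.
 */
#include <stddef.h>

static size_t demo_pages_to_grow(int alloc_failed, size_t shortfall_after_rollback)
{
	if (alloc_failed)
		return 0; /* cancel growth: pools had pages yet allocation failed */

	/* Another thread may have refilled the pools while they were unlocked,
	 * so the new estimate can legitimately be zero; still ask for one page.
	 */
	return shortfall_after_rollback ? shortfall_after_rollback : 1;
}
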
- */ - if (alloc_failed) - *pages_to_grow = 0; - else - *pages_to_grow = new_pages; - return false; } @@ -712,18 +971,6 @@ static bool page_fault_try_alloc(struct kbase_context *kctx, return true; } -/* Small wrapper function to factor out GPU-dependent context releasing */ -static void release_ctx(struct kbase_device *kbdev, - struct kbase_context *kctx) -{ -#if MALI_USE_CSF - CSTD_UNUSED(kbdev); - kbase_ctx_sched_release_ctx_lock(kctx); -#else /* MALI_USE_CSF */ - kbasep_js_runpool_release_ctx(kbdev, kctx); -#endif /* MALI_USE_CSF */ -} - void kbase_mmu_page_fault_worker(struct work_struct *data) { u64 fault_pfn; @@ -758,9 +1005,8 @@ void kbase_mmu_page_fault_worker(struct work_struct *data) as_no = faulting_as->number; kbdev = container_of(faulting_as, struct kbase_device, as[as_no]); - dev_dbg(kbdev->dev, - "Entering %s %pK, fault_pfn %lld, as_no %d\n", - __func__, (void *)data, fault_pfn, as_no); + dev_dbg(kbdev->dev, "Entering %s %pK, fault_pfn %lld, as_no %d", __func__, (void *)data, + fault_pfn, as_no); /* Grab the context that was already refcounted in kbase_mmu_interrupt() * Therefore, it cannot be scheduled out of this AS until we explicitly @@ -783,8 +1029,7 @@ void kbase_mmu_page_fault_worker(struct work_struct *data) #ifdef CONFIG_MALI_ARBITER_SUPPORT /* check if we still have GPU */ if (unlikely(kbase_is_gpu_removed(kbdev))) { - dev_dbg(kbdev->dev, - "%s: GPU has been removed\n", __func__); + dev_dbg(kbdev->dev, "%s: GPU has been removed", __func__); goto fault_done; } #endif @@ -847,20 +1092,24 @@ void kbase_mmu_page_fault_worker(struct work_struct *data) goto fault_done; } -#ifdef CONFIG_MALI_2MB_ALLOC - /* Preallocate memory for the sub-allocation structs if necessary */ - for (i = 0; i != ARRAY_SIZE(prealloc_sas); ++i) { - prealloc_sas[i] = kmalloc(sizeof(*prealloc_sas[i]), GFP_KERNEL); - if (!prealloc_sas[i]) { - kbase_mmu_report_fault_and_kill(kctx, faulting_as, - "Failed pre-allocating memory for sub-allocations' metadata", - fault); - goto fault_done; +page_fault_retry: + if (kbdev->pagesize_2mb) { + /* Preallocate (or re-allocate) memory for the sub-allocation structs if necessary */ + for (i = 0; i != ARRAY_SIZE(prealloc_sas); ++i) { + if (!prealloc_sas[i]) { + prealloc_sas[i] = kmalloc(sizeof(*prealloc_sas[i]), GFP_KERNEL); + + if (!prealloc_sas[i]) { + kbase_mmu_report_fault_and_kill( + kctx, faulting_as, + "Failed pre-allocating memory for sub-allocations' metadata", + fault); + goto fault_done; + } + } } } -#endif /* CONFIG_MALI_2MB_ALLOC */ -page_fault_retry: /* so we have a translation fault, * let's see if it is for growable memory */ @@ -938,16 +1187,29 @@ page_fault_retry: * transaction (which should cause the other page fault to be * raised again). */ - op_param = (struct kbase_mmu_hw_op_param){ - .vpfn = 0, - .nr = 0, - .op = KBASE_MMU_OP_UNLOCK, - .kctx_id = kctx->id, - .mmu_sync_info = mmu_sync_info, - }; - mmu_hw_operation_begin(kbdev); - kbase_mmu_hw_do_operation(kbdev, faulting_as, &op_param); - mmu_hw_operation_end(kbdev); + op_param.mmu_sync_info = mmu_sync_info; + op_param.kctx_id = kctx->id; + if (!mmu_flush_cache_on_gpu_ctrl(kbdev)) { + mmu_hw_operation_begin(kbdev); + err = kbase_mmu_hw_do_unlock_no_addr(kbdev, faulting_as, + &op_param); + mmu_hw_operation_end(kbdev); + } else { + /* Can safely skip the invalidate for all levels in case + * of duplicate page faults. 
+ */ + op_param.flush_skip_levels = 0xF; + op_param.vpfn = fault_pfn; + op_param.nr = 1; + err = kbase_mmu_hw_do_unlock(kbdev, faulting_as, + &op_param); + } + + if (err) { + dev_err(kbdev->dev, + "Invalidation for MMU did not complete on handling page fault @ 0x%llx", + fault->addr); + } mutex_unlock(&kbdev->mmu_hw_mutex); @@ -962,8 +1224,7 @@ page_fault_retry: /* cap to max vsize */ new_pages = min(new_pages, region->nr_pages - current_backed_size); - dev_dbg(kctx->kbdev->dev, "Allocate %zu pages on page fault\n", - new_pages); + dev_dbg(kctx->kbdev->dev, "Allocate %zu pages on page fault", new_pages); if (new_pages == 0) { struct kbase_mmu_hw_op_param op_param; @@ -975,16 +1236,29 @@ page_fault_retry: KBASE_MMU_FAULT_TYPE_PAGE); /* See comment [1] about UNLOCK usage */ - op_param = (struct kbase_mmu_hw_op_param){ - .vpfn = 0, - .nr = 0, - .op = KBASE_MMU_OP_UNLOCK, - .kctx_id = kctx->id, - .mmu_sync_info = mmu_sync_info, - }; - mmu_hw_operation_begin(kbdev); - kbase_mmu_hw_do_operation(kbdev, faulting_as, &op_param); - mmu_hw_operation_end(kbdev); + op_param.mmu_sync_info = mmu_sync_info; + op_param.kctx_id = kctx->id; + if (!mmu_flush_cache_on_gpu_ctrl(kbdev)) { + mmu_hw_operation_begin(kbdev); + err = kbase_mmu_hw_do_unlock_no_addr(kbdev, faulting_as, + &op_param); + mmu_hw_operation_end(kbdev); + } else { + /* Can safely skip the invalidate for all levels in case + * of duplicate page faults. + */ + op_param.flush_skip_levels = 0xF; + op_param.vpfn = fault_pfn; + op_param.nr = 1; + err = kbase_mmu_hw_do_unlock(kbdev, faulting_as, + &op_param); + } + + if (err) { + dev_err(kbdev->dev, + "Invalidation for MMU did not complete on handling page fault @ 0x%llx", + fault->addr); + } mutex_unlock(&kbdev->mmu_hw_mutex); @@ -1009,6 +1283,7 @@ page_fault_retry: spin_unlock(&kctx->mem_partials_lock); if (grown) { + u64 dirty_pgds = 0; u64 pfn_offset; struct kbase_mmu_hw_op_param op_param; @@ -1026,10 +1301,11 @@ page_fault_retry: * so the no_flush version of insert_pages is used which allows * us to unlock the MMU as we see fit. */ - err = kbase_mmu_insert_pages_no_flush(kbdev, &kctx->mmu, - region->start_pfn + pfn_offset, - &kbase_get_gpu_phy_pages(region)[pfn_offset], - new_pages, region->flags, region->gpu_alloc->group_id); + err = mmu_insert_pages_no_flush(kbdev, &kctx->mmu, region->start_pfn + pfn_offset, + &kbase_get_gpu_phy_pages(region)[pfn_offset], + new_pages, region->flags, + region->gpu_alloc->group_id, &dirty_pgds, region, + false); if (err) { kbase_free_phy_pages_helper(region->gpu_alloc, new_pages); @@ -1048,23 +1324,18 @@ page_fault_retry: (u64)new_pages); trace_mali_mmu_page_fault_grow(region, fault, new_pages); -#if MALI_INCREMENTAL_RENDERING +#if MALI_INCREMENTAL_RENDERING_JM /* Switch to incremental rendering if we have nearly run out of * memory in a JIT memory allocation. 
*/ if (region->threshold_pages && kbase_reg_current_backed_size(region) > region->threshold_pages) { - - dev_dbg(kctx->kbdev->dev, - "%zu pages exceeded IR threshold %zu\n", - new_pages + current_backed_size, - region->threshold_pages); + dev_dbg(kctx->kbdev->dev, "%zu pages exceeded IR threshold %zu", + new_pages + current_backed_size, region->threshold_pages); if (kbase_mmu_switch_to_ir(kctx, region) >= 0) { - dev_dbg(kctx->kbdev->dev, - "Get region %pK for IR\n", - (void *)region); + dev_dbg(kctx->kbdev->dev, "Get region %pK for IR", (void *)region); kbase_va_region_alloc_get(kctx, region); } } @@ -1084,25 +1355,22 @@ page_fault_retry: kbase_mmu_hw_clear_fault(kbdev, faulting_as, KBASE_MMU_FAULT_TYPE_PAGE); - /* flush L2 and unlock the VA (resumes the MMU) */ - op_param = (struct kbase_mmu_hw_op_param){ - .vpfn = fault->addr >> PAGE_SHIFT, - .nr = new_pages, - .op = KBASE_MMU_OP_FLUSH_PT, - .kctx_id = kctx->id, - .mmu_sync_info = mmu_sync_info, - }; + op_param.vpfn = region->start_pfn + pfn_offset; + op_param.nr = new_pages; + op_param.op = KBASE_MMU_OP_FLUSH_PT; + op_param.kctx_id = kctx->id; + op_param.mmu_sync_info = mmu_sync_info; if (mmu_flush_cache_on_gpu_ctrl(kbdev)) { - unsigned long irq_flags; - - spin_lock_irqsave(&kbdev->hwaccess_lock, irq_flags); - err = mmu_flush_invalidate_on_gpu_ctrl(kbdev, faulting_as, - &op_param); - spin_unlock_irqrestore(&kbdev->hwaccess_lock, irq_flags); + /* Unlock to invalidate the TLB (and resume the MMU) */ + op_param.flush_skip_levels = + pgd_level_to_skip_flush(dirty_pgds); + err = kbase_mmu_hw_do_unlock(kbdev, faulting_as, + &op_param); } else { + /* flush L2 and unlock the VA (resumes the MMU) */ mmu_hw_operation_begin(kbdev); - err = kbase_mmu_hw_do_operation(kbdev, faulting_as, - &op_param); + err = kbase_mmu_hw_do_flush(kbdev, faulting_as, + &op_param); mmu_hw_operation_end(kbdev); } @@ -1148,6 +1416,7 @@ page_fault_retry: kbase_gpu_vm_unlock(kctx); } else { int ret = -ENOMEM; + const u8 group_id = region->gpu_alloc->group_id; kbase_gpu_vm_unlock(kctx); @@ -1155,37 +1424,31 @@ page_fault_retry: * Otherwise fail the allocation. 
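
/*
 * [Illustrative sketch, not part of this patch] The dirty_pgds bookkeeping used
 * above keeps one bit per page-table level: every level whose PGD was written
 * gets its bit set, and levels whose bit is clear can be skipped by the MMU
 * flush. The skip computation below is an assumption made for illustration;
 * pgd_level_to_skip_flush() in the driver is the authoritative mapping.
 */
#include <stdint.h>

#define DEMO_MMU_LEVELS 4 /* MIDGARD_MMU_TOPLEVEL (0) .. MIDGARD_MMU_BOTTOMLEVEL (3) */

static inline void demo_mark_level_dirty(uint64_t *dirty_pgds, unsigned int level)
{
	*dirty_pgds |= 1ULL << level;
}

static inline uint64_t demo_levels_to_skip(uint64_t dirty_pgds)
{
	/* Assumed semantics: skip exactly the levels that were never dirtied. */
	return ~dirty_pgds & ((1ULL << DEMO_MMU_LEVELS) - 1);
}
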
*/ if (pages_to_grow > 0) { -#ifdef CONFIG_MALI_2MB_ALLOC - if (grow_2mb_pool) { + if (kbdev->pagesize_2mb && grow_2mb_pool) { /* Round page requirement up to nearest 2 MB */ struct kbase_mem_pool *const lp_mem_pool = - &kctx->mem_pools.large[ - region->gpu_alloc->group_id]; + &kctx->mem_pools.large[group_id]; pages_to_grow = (pages_to_grow + ((1 << lp_mem_pool->order) - 1)) >> lp_mem_pool->order; ret = kbase_mem_pool_grow(lp_mem_pool, - pages_to_grow); + pages_to_grow, kctx->task); } else { -#endif struct kbase_mem_pool *const mem_pool = - &kctx->mem_pools.small[ - region->gpu_alloc->group_id]; + &kctx->mem_pools.small[group_id]; ret = kbase_mem_pool_grow(mem_pool, - pages_to_grow); -#ifdef CONFIG_MALI_2MB_ALLOC + pages_to_grow, kctx->task); } -#endif } if (ret < 0) { /* failed to extend, handle as a normal PF */ kbase_mmu_report_fault_and_kill(kctx, faulting_as, "Page allocation failure", fault); } else { - dev_dbg(kbdev->dev, "Try again after pool_grow\n"); + dev_dbg(kbdev->dev, "Try again after pool_grow"); goto page_fault_retry; } } @@ -1212,24 +1475,27 @@ fault_done: release_ctx(kbdev, kctx); atomic_dec(&kbdev->faults_pending); - dev_dbg(kbdev->dev, "Leaving page_fault_worker %pK\n", (void *)data); + dev_dbg(kbdev->dev, "Leaving page_fault_worker %pK", (void *)data); } static phys_addr_t kbase_mmu_alloc_pgd(struct kbase_device *kbdev, struct kbase_mmu_table *mmut) { u64 *page; - int i; struct page *p; + phys_addr_t pgd; p = kbase_mem_pool_alloc(&kbdev->mem_pools.small[mmut->group_id]); if (!p) - return 0; + return KBASE_MMU_INVALID_PGD_ADDRESS; + + page = kbase_kmap(p); - page = kmap(p); if (page == NULL) goto alloc_free; + pgd = page_to_phys(p); + /* If the MMU tables belong to a context then account the memory usage * to that context, otherwise the MMU tables are device wide and are * only accounted to the device. @@ -1250,33 +1516,43 @@ static phys_addr_t kbase_mmu_alloc_pgd(struct kbase_device *kbdev, kbase_trace_gpu_mem_usage_inc(kbdev, mmut->kctx, 1); - for (i = 0; i < KBASE_MMU_PAGE_ENTRIES; i++) - kbdev->mmu_mode->entry_invalidate(&page[i]); + kbdev->mmu_mode->entries_invalidate(page, KBASE_MMU_PAGE_ENTRIES); - kbase_mmu_sync_pgd(kbdev, kbase_dma_addr(p), PAGE_SIZE); + /* As this page is newly created, therefore there is no content to + * clean or invalidate in the GPU caches. + */ + kbase_mmu_sync_pgd_cpu(kbdev, kbase_dma_addr(p), PAGE_SIZE); - kunmap(p); - return page_to_phys(p); + kbase_kunmap(p, page); + return pgd; alloc_free: kbase_mem_pool_free(&kbdev->mem_pools.small[mmut->group_id], p, false); - return 0; + return KBASE_MMU_INVALID_PGD_ADDRESS; } -/* Given PGD PFN for level N, return PGD PFN for level N+1, allocating the - * new table from the pool if needed and possible +/** + * mmu_get_next_pgd() - Given PGD PFN for level N, return PGD PFN for level N+1 + * + * @kbdev: Device pointer. + * @mmut: GPU MMU page table. + * @pgd: Physical addresse of level N page directory. + * @vpfn: The virtual page frame number. + * @level: The level of MMU page table (N). 
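
/*
 * [Illustrative sketch, not part of this patch] The rounding applied to
 * pages_to_grow above converts a shortfall counted in 4 KiB pages into whole
 * large-pool entries using the pool order (order 9 means 512 x 4 KiB = 2 MiB
 * per entry), always rounding up: 513 small pages become 2 large pages.
 */
#include <stddef.h>

static size_t demo_round_up_to_pool_entries(size_t small_pages, unsigned int pool_order)
{
	return (small_pages + ((1u << pool_order) - 1)) >> pool_order;
}
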
+ * + * Return: + * * 0 - OK + * * -EFAULT - level N+1 PGD does not exist + * * -EINVAL - kmap() failed for level N PGD PFN */ -static int mmu_get_next_pgd(struct kbase_device *kbdev, - struct kbase_mmu_table *mmut, - phys_addr_t *pgd, u64 vpfn, int level) +static int mmu_get_next_pgd(struct kbase_device *kbdev, struct kbase_mmu_table *mmut, + phys_addr_t *pgd, u64 vpfn, int level) { u64 *page; phys_addr_t target_pgd; struct page *p; - KBASE_DEBUG_ASSERT(*pgd); - lockdep_assert_held(&mmut->mmu_lock); /* @@ -1287,43 +1563,92 @@ static int mmu_get_next_pgd(struct kbase_device *kbdev, vpfn &= 0x1FF; p = pfn_to_page(PFN_DOWN(*pgd)); - page = kmap(p); + page = kbase_kmap(p); if (page == NULL) { - dev_warn(kbdev->dev, "%s: kmap failure\n", __func__); + dev_err(kbdev->dev, "%s: kmap failure", __func__); return -EINVAL; } - target_pgd = kbdev->mmu_mode->pte_to_phy_addr(page[vpfn]); + if (!kbdev->mmu_mode->pte_is_valid(page[vpfn], level)) { + dev_dbg(kbdev->dev, "%s: invalid PTE at level %d vpfn 0x%llx", __func__, level, + vpfn); + kbase_kunmap(p, page); + return -EFAULT; + } else { + target_pgd = kbdev->mmu_mode->pte_to_phy_addr( + kbdev->mgm_dev->ops.mgm_pte_to_original_pte( + kbdev->mgm_dev, MGM_DEFAULT_PTE_GROUP, level, page[vpfn])); + } - if (!target_pgd) { - target_pgd = kbase_mmu_alloc_pgd(kbdev, mmut); - if (!target_pgd) { - dev_dbg(kbdev->dev, "%s: kbase_mmu_alloc_pgd failure\n", - __func__); - kunmap(p); - return -ENOMEM; - } + kbase_kunmap(p, page); + *pgd = target_pgd; - kbdev->mmu_mode->entry_set_pte(page, vpfn, target_pgd); + return 0; +} - kbase_mmu_sync_pgd(kbdev, kbase_dma_addr(p), PAGE_SIZE); - /* Rely on the caller to update the address space flags. */ +/** + * mmu_get_lowest_valid_pgd() - Find a valid PGD at or closest to in_level + * + * @kbdev: Device pointer. + * @mmut: GPU MMU page table. + * @vpfn: The virtual page frame number. + * @in_level: The level of MMU page table (N). + * @out_level: Set to the level of the lowest valid PGD found on success. + * Invalid on error. + * @out_pgd: Set to the lowest valid PGD found on success. + * Invalid on error. + * + * Does a page table walk starting from top level (L0) to in_level to find a valid PGD at or + * closest to in_level + * + * Terminology: + * Level-0 = Top-level = highest + * Level-3 = Bottom-level = lowest + * + * Return: + * * 0 - OK + * * -EINVAL - kmap() failed during page table walk. + */ +static int mmu_get_lowest_valid_pgd(struct kbase_device *kbdev, struct kbase_mmu_table *mmut, + u64 vpfn, int in_level, int *out_level, phys_addr_t *out_pgd) +{ + phys_addr_t pgd; + int l; + int err = 0; + + lockdep_assert_held(&mmut->mmu_lock); + pgd = mmut->pgd; + + for (l = MIDGARD_MMU_TOPLEVEL; l < in_level; l++) { + err = mmu_get_next_pgd(kbdev, mmut, &pgd, vpfn, l); + + /* Handle failure condition */ + if (err) { + dev_dbg(kbdev->dev, + "%s: mmu_get_next_pgd() failed to find a valid pgd at level %d", + __func__, l + 1); + break; + } } - kunmap(p); - *pgd = target_pgd; + *out_pgd = pgd; + *out_level = l; - return 0; + /* -EFAULT indicates that pgd param was valid but the next pgd entry at vpfn was invalid. + * This implies that we have found the lowest valid pgd. Reset the error code. 
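
/*
 * [Illustrative sketch, not part of this patch] The page-table walks above
 * (mmu_get_next_pgd() / mmu_get_lowest_valid_pgd()) slice the virtual page
 * frame number into 9-bit indices, one per level, with level 0 (top) taking the
 * most significant bits and level 3 (bottom) the least. A small stand-alone
 * program showing the slicing:
 */
#include <stdint.h>
#include <stdio.h>

static unsigned int demo_pgd_index(uint64_t vpfn, int level)
{
	return (unsigned int)((vpfn >> ((3 - level) * 9)) & 0x1FF);
}

int main(void)
{
	const uint64_t vpfn = 0x123456789ULL;
	int level;

	for (level = 0; level <= 3; level++)
		printf("level %d index = 0x%03x\n", level, demo_pgd_index(vpfn, level));

	return 0;
}
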
+ */ + if (err == -EFAULT) + err = 0; + + return err; } /* - * Returns the PGD for the specified level of translation + * On success, sets out_pgd to the PGD for the specified level of translation + * Returns -EFAULT if a valid PGD is not found */ -static int mmu_get_pgd_at_level(struct kbase_device *kbdev, - struct kbase_mmu_table *mmut, - u64 vpfn, - int level, - phys_addr_t *out_pgd) +static int mmu_get_pgd_at_level(struct kbase_device *kbdev, struct kbase_mmu_table *mmut, u64 vpfn, + int level, phys_addr_t *out_pgd) { phys_addr_t pgd; int l; @@ -1335,9 +1660,9 @@ static int mmu_get_pgd_at_level(struct kbase_device *kbdev, int err = mmu_get_next_pgd(kbdev, mmut, &pgd, vpfn, l); /* Handle failure condition */ if (err) { - dev_dbg(kbdev->dev, - "%s: mmu_get_next_pgd failure at level %d\n", - __func__, l); + dev_err(kbdev->dev, + "%s: mmu_get_next_pgd() failed to find a valid pgd at level %d", + __func__, l + 1); return err; } } @@ -1347,20 +1672,11 @@ static int mmu_get_pgd_at_level(struct kbase_device *kbdev, return 0; } -static int mmu_get_bottom_pgd(struct kbase_device *kbdev, - struct kbase_mmu_table *mmut, - u64 vpfn, - phys_addr_t *out_pgd) -{ - return mmu_get_pgd_at_level(kbdev, mmut, vpfn, MIDGARD_MMU_BOTTOMLEVEL, - out_pgd); -} - static void mmu_insert_pages_failure_recovery(struct kbase_device *kbdev, - struct kbase_mmu_table *mmut, - u64 from_vpfn, u64 to_vpfn) + struct kbase_mmu_table *mmut, u64 from_vpfn, + u64 to_vpfn, u64 *dirty_pgds, + struct tagged_addr *phys, bool ignore_page_migration) { - phys_addr_t pgd; u64 vpfn = from_vpfn; struct kbase_mmu_mode const *mmu_mode; @@ -1371,9 +1687,9 @@ static void mmu_insert_pages_failure_recovery(struct kbase_device *kbdev, lockdep_assert_held(&mmut->mmu_lock); mmu_mode = kbdev->mmu_mode; + kbase_mmu_reset_free_pgds_list(mmut); while (vpfn < to_vpfn) { - unsigned int i; unsigned int idx = vpfn & 0x1FF; unsigned int count = KBASE_MMU_PAGE_ENTRIES - idx; unsigned int pcount = 0; @@ -1381,6 +1697,8 @@ static void mmu_insert_pages_failure_recovery(struct kbase_device *kbdev, int level; u64 *page; phys_addr_t pgds[MIDGARD_MMU_BOTTOMLEVEL + 1]; + phys_addr_t pgd = mmut->pgd; + struct page *p = phys_to_page(pgd); register unsigned int num_of_valid_entries; @@ -1388,17 +1706,17 @@ static void mmu_insert_pages_failure_recovery(struct kbase_device *kbdev, count = left; /* need to check if this is a 2MB page or a 4kB */ - pgd = mmut->pgd; - for (level = MIDGARD_MMU_TOPLEVEL; level <= MIDGARD_MMU_BOTTOMLEVEL; level++) { idx = (vpfn >> ((3 - level) * 9)) & 0x1FF; pgds[level] = pgd; - page = kmap(phys_to_page(pgd)); + page = kbase_kmap(p); if (mmu_mode->ate_is_valid(page[idx], level)) break; /* keep the mapping */ - kunmap(phys_to_page(pgd)); - pgd = mmu_mode->pte_to_phy_addr(page[idx]); + kbase_kunmap(p, page); + pgd = mmu_mode->pte_to_phy_addr(kbdev->mgm_dev->ops.mgm_pte_to_original_pte( + kbdev->mgm_dev, MGM_DEFAULT_PTE_GROUP, level, page[idx])); + p = phys_to_page(pgd); } switch (level) { @@ -1411,68 +1729,312 @@ static void mmu_insert_pages_failure_recovery(struct kbase_device *kbdev, pcount = count; break; default: - dev_warn(kbdev->dev, "%sNo support for ATEs at level %d\n", - __func__, level); + dev_warn(kbdev->dev, "%sNo support for ATEs at level %d", __func__, level); goto next; } + if (dirty_pgds && pcount > 0) + *dirty_pgds |= 1ULL << level; + num_of_valid_entries = mmu_mode->get_num_valid_entries(page); if (WARN_ON_ONCE(num_of_valid_entries < pcount)) num_of_valid_entries = 0; else num_of_valid_entries -= pcount; + /* Invalidate the 
entries we added */ + mmu_mode->entries_invalidate(&page[idx], pcount); + if (!num_of_valid_entries) { - kunmap(phys_to_page(pgd)); + kbase_kunmap(p, page); - kbase_mmu_free_pgd(kbdev, mmut, pgd, true); + kbase_mmu_add_to_free_pgds_list(mmut, p); - kbase_mmu_update_and_free_parent_pgds(kbdev, mmut, pgds, - vpfn, level); + kbase_mmu_update_and_free_parent_pgds(kbdev, mmut, pgds, vpfn, level, + KBASE_MMU_OP_NONE, dirty_pgds); vpfn += count; continue; } - /* Invalidate the entries we added */ - for (i = 0; i < pcount; i++) - mmu_mode->entry_invalidate(&page[idx + i]); - mmu_mode->set_num_valid_entries(page, num_of_valid_entries); - kbase_mmu_sync_pgd(kbdev, - kbase_dma_addr(phys_to_page(pgd)) + 8 * idx, - 8 * pcount); - kunmap(phys_to_page(pgd)); + /* MMU cache flush strategy is NONE because GPU cache maintenance is + * going to be done by the caller + */ + kbase_mmu_sync_pgd(kbdev, mmut->kctx, pgd + (idx * sizeof(u64)), + kbase_dma_addr(p) + sizeof(u64) * idx, sizeof(u64) * pcount, + KBASE_MMU_OP_NONE); + kbase_kunmap(p, page); next: vpfn += count; } + + /* If page migration is enabled: the only way to recover from failure + * is to mark all pages as not movable. It is not predictable what's + * going to happen to these pages at this stage. They might return + * movable once they are returned to a memory pool. + */ + if (kbase_is_page_migration_enabled() && !ignore_page_migration && phys) { + const u64 num_pages = to_vpfn - from_vpfn + 1; + u64 i; + + for (i = 0; i < num_pages; i++) { + struct page *phys_page = as_page(phys[i]); + struct kbase_page_metadata *page_md = kbase_page_private(phys_page); + + if (page_md) { + spin_lock(&page_md->migrate_lock); + page_md->status = PAGE_STATUS_SET(page_md->status, (u8)NOT_MOVABLE); + spin_unlock(&page_md->migrate_lock); + } + } + } } -/* - * Map the single page 'phys' 'nr' of times, starting at GPU PFN 'vpfn' +static void mmu_flush_invalidate_insert_pages(struct kbase_device *kbdev, + struct kbase_mmu_table *mmut, const u64 vpfn, + size_t nr, u64 dirty_pgds, + enum kbase_caller_mmu_sync_info mmu_sync_info, + bool insert_pages_failed) +{ + struct kbase_mmu_hw_op_param op_param; + int as_nr = 0; + + op_param.vpfn = vpfn; + op_param.nr = nr; + op_param.op = KBASE_MMU_OP_FLUSH_PT; + op_param.mmu_sync_info = mmu_sync_info; + op_param.kctx_id = mmut->kctx ? mmut->kctx->id : 0xFFFFFFFF; + op_param.flush_skip_levels = pgd_level_to_skip_flush(dirty_pgds); + +#if MALI_USE_CSF + as_nr = mmut->kctx ? mmut->kctx->as_nr : MCU_AS_NR; +#else + WARN_ON(!mmut->kctx); +#endif + + /* MMU cache flush strategy depends on whether GPU control commands for + * flushing physical address ranges are supported. The new physical pages + * are not present in GPU caches therefore they don't need any cache + * maintenance, but PGDs in the page table may or may not be created anew. + * + * Operations that affect the whole GPU cache shall only be done if it's + * impossible to update physical ranges. + * + * On GPUs where flushing by physical address range is supported, + * full cache flush is done when an error occurs during + * insert_pages() to keep the error handling simpler. + */ + if (mmu_flush_cache_on_gpu_ctrl(kbdev) && !insert_pages_failed) + mmu_invalidate(kbdev, mmut->kctx, as_nr, &op_param); + else + mmu_flush_invalidate(kbdev, mmut->kctx, as_nr, &op_param); +} + +/** + * update_parent_pgds() - Updates the page table from bottom level towards + * the top level to insert a new ATE + * + * @kbdev: Device pointer. + * @mmut: GPU MMU page table. 
+ * @cur_level: The level of MMU page table where the ATE needs to be added. + * The bottom PGD level. + * @insert_level: The level of MMU page table where the chain of newly allocated + * PGDs needs to be linked-in/inserted. + * @insert_vpfn: The virtual page frame number for the ATE. + * @pgds_to_insert: Ptr to an array (size MIDGARD_MMU_BOTTOMLEVEL+1) that contains + * the physical addresses of newly allocated PGDs from index + * insert_level+1 to cur_level, and an existing PGD at index + * insert_level. + * + * The newly allocated PGDs are linked from the bottom level up and inserted into the PGD + * at insert_level which already exists in the MMU Page Tables. Migration status is also + * updated for all the newly allocated PGD pages. + * + * Return: + * * 0 - OK + * * -EFAULT - level N+1 PGD does not exist + * * -EINVAL - kmap() failed for level N PGD PFN + */ +static int update_parent_pgds(struct kbase_device *kbdev, struct kbase_mmu_table *mmut, + int cur_level, int insert_level, u64 insert_vpfn, + phys_addr_t *pgds_to_insert) +{ + int pgd_index; + int err = 0; + + /* Add a PTE for the new PGD page at pgd_index into the parent PGD at (pgd_index-1) + * Loop runs from the bottom-most to the top-most level so that all entries in the chain + * are valid when they are inserted into the MMU Page table via the insert_level PGD. + */ + for (pgd_index = cur_level; pgd_index > insert_level; pgd_index--) { + int parent_index = pgd_index - 1; + phys_addr_t parent_pgd = pgds_to_insert[parent_index]; + unsigned int current_valid_entries; + u64 pte; + phys_addr_t target_pgd = pgds_to_insert[pgd_index]; + u64 parent_vpfn = (insert_vpfn >> ((3 - parent_index) * 9)) & 0x1FF; + struct page *parent_page = pfn_to_page(PFN_DOWN(parent_pgd)); + u64 *parent_page_va; + + if (WARN_ON_ONCE(target_pgd == KBASE_MMU_INVALID_PGD_ADDRESS)) { + err = -EFAULT; + goto failure_recovery; + } + + parent_page_va = kbase_kmap(parent_page); + + if (unlikely(parent_page_va == NULL)) { + dev_err(kbdev->dev, "%s: kmap failure", __func__); + err = -EINVAL; + goto failure_recovery; + } + + current_valid_entries = kbdev->mmu_mode->get_num_valid_entries(parent_page_va); + + kbdev->mmu_mode->entry_set_pte(&pte, target_pgd); + parent_page_va[parent_vpfn] = kbdev->mgm_dev->ops.mgm_update_gpu_pte( + kbdev->mgm_dev, MGM_DEFAULT_PTE_GROUP, parent_index, pte); + kbdev->mmu_mode->set_num_valid_entries(parent_page_va, current_valid_entries + 1); + kbase_kunmap(parent_page, parent_page_va); + + if (parent_index != insert_level) { + /* Newly allocated PGDs */ + kbase_mmu_sync_pgd_cpu( + kbdev, kbase_dma_addr(parent_page) + (parent_vpfn * sizeof(u64)), + sizeof(u64)); + } else { + /* A new valid entry is added to an existing PGD. Perform the + * invalidate operation for GPU cache as it could be having a + * cacheline that contains the entry (in an invalid form). 
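
/*
 * [Illustrative sketch, not part of this patch] The ordering that
 * update_parent_pgds() relies on can be modelled with in-memory arrays: new
 * PGDs are wired together from the bottom level upwards, so every child is
 * complete before a parent entry points at it, and only the final store into
 * the PGD that already exists at insert_level publishes the whole chain to the
 * live table. The PTE encoding below is a placeholder, not the real one.
 */
#include <stdint.h>

#define DEMO_PGD_ENTRIES 512

static uint64_t demo_make_pte(const uint64_t *child_pgd)
{
	return (uint64_t)(uintptr_t)child_pgd | 1u; /* placeholder "valid" encoding */
}

static unsigned int demo_level_index(uint64_t vpfn, int level)
{
	return (unsigned int)((vpfn >> ((3 - level) * 9)) & 0x1FF);
}

static void demo_link_new_pgds(uint64_t *pgds[], int insert_level, int cur_level, uint64_t vpfn)
{
	int level;

	/* pgds[insert_level] is already reachable from the live page table;
	 * pgds[insert_level + 1 .. cur_level] are freshly allocated and private.
	 */
	for (level = cur_level; level > insert_level; level--)
		pgds[level - 1][demo_level_index(vpfn, level - 1)] = demo_make_pte(pgds[level]);
}
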
+ */ + kbase_mmu_sync_pgd( + kbdev, mmut->kctx, parent_pgd + (parent_vpfn * sizeof(u64)), + kbase_dma_addr(parent_page) + (parent_vpfn * sizeof(u64)), + sizeof(u64), KBASE_MMU_OP_FLUSH_PT); + } + + /* Update the new target_pgd page to its stable state */ + if (kbase_is_page_migration_enabled()) { + struct kbase_page_metadata *page_md = + kbase_page_private(phys_to_page(target_pgd)); + + spin_lock(&page_md->migrate_lock); + + WARN_ON_ONCE(PAGE_STATUS_GET(page_md->status) != ALLOCATE_IN_PROGRESS || + IS_PAGE_ISOLATED(page_md->status)); + + if (mmut->kctx) { + page_md->status = PAGE_STATUS_SET(page_md->status, PT_MAPPED); + page_md->data.pt_mapped.mmut = mmut; + page_md->data.pt_mapped.pgd_vpfn_level = + PGD_VPFN_LEVEL_SET(insert_vpfn, parent_index); + } else { + page_md->status = PAGE_STATUS_SET(page_md->status, NOT_MOVABLE); + } + + spin_unlock(&page_md->migrate_lock); + } + } + + return 0; + +failure_recovery: + /* Cleanup PTEs from PGDs. The Parent PGD in the loop above is just "PGD" here */ + for (; pgd_index < cur_level; pgd_index++) { + phys_addr_t pgd = pgds_to_insert[pgd_index]; + struct page *pgd_page = pfn_to_page(PFN_DOWN(pgd)); + u64 *pgd_page_va = kbase_kmap(pgd_page); + u64 vpfn = (insert_vpfn >> ((3 - pgd_index) * 9)) & 0x1FF; + + kbdev->mmu_mode->entries_invalidate(&pgd_page_va[vpfn], 1); + kbase_kunmap(pgd_page, pgd_page_va); + } + + return err; +} + +/** + * mmu_insert_alloc_pgds() - allocate memory for PGDs from level_low to + * level_high (inclusive) + * + * @kbdev: Device pointer. + * @mmut: GPU MMU page table. + * @level_low: The lower bound for the levels for which the PGD allocs are required + * @level_high: The higher bound for the levels for which the PGD allocs are required + * @new_pgds: Ptr to an array (size MIDGARD_MMU_BOTTOMLEVEL+1) to write the + * newly allocated PGD addresses to. + * + * Numerically, level_low < level_high, not to be confused with top level and + * bottom level concepts for MMU PGDs. They are only used as low and high bounds + * in an incrementing for-loop. + * + * Return: + * * 0 - OK + * * -ENOMEM - allocation failed for a PGD. 
*/ -int kbase_mmu_insert_single_page(struct kbase_context *kctx, u64 vpfn, - struct tagged_addr phys, size_t nr, - unsigned long flags, int const group_id, - enum kbase_caller_mmu_sync_info mmu_sync_info) +static int mmu_insert_alloc_pgds(struct kbase_device *kbdev, struct kbase_mmu_table *mmut, + phys_addr_t *new_pgds, int level_low, int level_high) +{ + int err = 0; + int i; + + lockdep_assert_held(&mmut->mmu_lock); + + for (i = level_low; i <= level_high; i++) { + do { + new_pgds[i] = kbase_mmu_alloc_pgd(kbdev, mmut); + if (new_pgds[i] != KBASE_MMU_INVALID_PGD_ADDRESS) + break; + + rt_mutex_unlock(&mmut->mmu_lock); + err = kbase_mem_pool_grow(&kbdev->mem_pools.small[mmut->group_id], + level_high, NULL); + rt_mutex_lock(&mmut->mmu_lock); + if (err) { + dev_err(kbdev->dev, "%s: kbase_mem_pool_grow() returned error %d", + __func__, err); + + /* Free all PGDs allocated in previous successful iterations + * from (i-1) to level_low + */ + for (i = (i - 1); i >= level_low; i--) { + if (new_pgds[i] != KBASE_MMU_INVALID_PGD_ADDRESS) + kbase_mmu_free_pgd(kbdev, mmut, new_pgds[i]); + } + + return err; + } + } while (1); + } + + return 0; +} + +static int kbase_mmu_insert_single_page(struct kbase_context *kctx, u64 start_vpfn, + struct tagged_addr phys, size_t nr, unsigned long flags, + int const group_id, + enum kbase_caller_mmu_sync_info mmu_sync_info, + bool ignore_page_migration) { phys_addr_t pgd; u64 *pgd_page; - /* In case the insert_single_page only partially completes - * we need to be able to recover - */ - bool recover_required = false; - u64 start_vpfn = vpfn; - size_t recover_count = 0; + u64 insert_vpfn = start_vpfn; size_t remain = nr; int err; struct kbase_device *kbdev; + u64 dirty_pgds = 0; + unsigned int i; + phys_addr_t new_pgds[MIDGARD_MMU_BOTTOMLEVEL + 1]; + enum kbase_mmu_op_type flush_op; + struct kbase_mmu_table *mmut = &kctx->mmu; + int l, cur_level, insert_level; if (WARN_ON(kctx == NULL)) return -EINVAL; /* 64-bit address range is the max */ - KBASE_DEBUG_ASSERT(vpfn <= (U64_MAX / PAGE_SIZE)); + KBASE_DEBUG_ASSERT(start_vpfn <= (U64_MAX / PAGE_SIZE)); kbdev = kctx->kbdev; @@ -1480,77 +2042,88 @@ int kbase_mmu_insert_single_page(struct kbase_context *kctx, u64 vpfn, if (nr == 0) return 0; - rt_mutex_lock(&kctx->mmu.mmu_lock); + /* If page migration is enabled, pages involved in multiple GPU mappings + * are always treated as not movable. + */ + if (kbase_is_page_migration_enabled() && !ignore_page_migration) { + struct page *phys_page = as_page(phys); + struct kbase_page_metadata *page_md = kbase_page_private(phys_page); + + if (page_md) { + spin_lock(&page_md->migrate_lock); + page_md->status = PAGE_STATUS_SET(page_md->status, (u8)NOT_MOVABLE); + spin_unlock(&page_md->migrate_lock); + } + } + + rt_mutex_lock(&mmut->mmu_lock); while (remain) { - unsigned int i; - unsigned int index = vpfn & 0x1FF; - unsigned int count = KBASE_MMU_PAGE_ENTRIES - index; + unsigned int vindex = insert_vpfn & 0x1FF; + unsigned int count = KBASE_MMU_PAGE_ENTRIES - vindex; struct page *p; register unsigned int num_of_valid_entries; + bool newly_created_pgd = false; if (count > remain) count = remain; + cur_level = MIDGARD_MMU_BOTTOMLEVEL; + insert_level = cur_level; + /* - * Repeatedly calling mmu_get_bottom_pgd() is clearly + * Repeatedly calling mmu_get_lowest_valid_pgd() is clearly * suboptimal. We don't have to re-parse the whole tree * each time (just cache the l0-l2 sequence). * On the other hand, it's only a gain when we map more than * 256 pages at once (on average). Do we really care? 
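
/*
 * [Illustrative sketch, not part of this patch] mmu_insert_alloc_pgds() above
 * follows a drop-lock / grow / retry pattern: the page-table lock must not be
 * held across a pool grow that can block, so it is released, the pool is grown,
 * the lock is re-taken and the PGD allocation is retried. The tiny model below
 * uses a pthread mutex and a counter-backed "pool" purely for illustration.
 */
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t demo_table_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned int demo_pool_free; /* pages sitting in the illustrative pool */

static bool demo_try_alloc_pgd(void **out_pgd)
{
	if (!demo_pool_free)
		return false;
	demo_pool_free--;
	*out_pgd = &demo_pool_free; /* stand-in for a real page */
	return true;
}

static int demo_grow_pool(unsigned int pages)
{
	demo_pool_free += pages; /* the real grow can sleep and can fail */
	return 0;
}

/* Called with demo_table_lock held, analogous to the driver's mmu_lock. */
static int demo_alloc_pgd_with_retry(void **out_pgd)
{
	for (;;) {
		int err;

		if (demo_try_alloc_pgd(out_pgd))
			return 0;

		/* Never grow the pool while holding the table lock. */
		pthread_mutex_unlock(&demo_table_lock);
		err = demo_grow_pool(1);
		pthread_mutex_lock(&demo_table_lock);

		if (err)
			return err; /* caller frees any PGDs it already obtained */
	}
}
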
*/ - do { - err = mmu_get_bottom_pgd(kbdev, &kctx->mmu, - vpfn, &pgd); - if (err != -ENOMEM) - break; - /* Fill the memory pool with enough pages for - * the page walk to succeed - */ - rt_mutex_unlock(&kctx->mmu.mmu_lock); - err = kbase_mem_pool_grow( - &kbdev->mem_pools.small[ - kctx->mmu.group_id], - MIDGARD_MMU_BOTTOMLEVEL); - rt_mutex_lock(&kctx->mmu.mmu_lock); - } while (!err); + /* insert_level < cur_level if there's no valid PGD for cur_level and insert_vpn */ + err = mmu_get_lowest_valid_pgd(kbdev, mmut, insert_vpfn, cur_level, &insert_level, + &pgd); + if (err) { - dev_warn(kbdev->dev, "%s: mmu_get_bottom_pgd failure\n", - __func__); - if (recover_required) { - /* Invalidate the pages we have partially - * completed - */ - mmu_insert_pages_failure_recovery(kbdev, - &kctx->mmu, - start_vpfn, - start_vpfn + recover_count); - } + dev_err(kbdev->dev, "%s: mmu_get_lowest_valid_pgd() returned error %d", + __func__, err); goto fail_unlock; } + /* No valid pgd at cur_level */ + if (insert_level != cur_level) { + /* Allocate new pgds for all missing levels from the required level + * down to the lowest valid pgd at insert_level + */ + err = mmu_insert_alloc_pgds(kbdev, mmut, new_pgds, (insert_level + 1), + cur_level); + if (err) + goto fail_unlock; + + newly_created_pgd = true; + + new_pgds[insert_level] = pgd; + + /* If we didn't find an existing valid pgd at cur_level, + * we've now allocated one. The ATE in the next step should + * be inserted in this newly allocated pgd. + */ + pgd = new_pgds[cur_level]; + } + p = pfn_to_page(PFN_DOWN(pgd)); - pgd_page = kmap(p); + + pgd_page = kbase_kmap(p); if (!pgd_page) { - dev_warn(kbdev->dev, "%s: kmap failure\n", __func__); - if (recover_required) { - /* Invalidate the pages we have partially - * completed - */ - mmu_insert_pages_failure_recovery(kbdev, - &kctx->mmu, - start_vpfn, - start_vpfn + recover_count); - } + dev_err(kbdev->dev, "%s: kmap failure", __func__); err = -ENOMEM; - goto fail_unlock; + + goto fail_unlock_free_pgds; } num_of_valid_entries = kbdev->mmu_mode->get_num_valid_entries(pgd_page); for (i = 0; i < count; i++) { - unsigned int ofs = index + i; + unsigned int ofs = vindex + i; /* Fail if the current page is a valid ATE entry */ KBASE_DEBUG_ASSERT(0 == (pgd_page[ofs] & 1UL)); @@ -1562,55 +2135,170 @@ int kbase_mmu_insert_single_page(struct kbase_context *kctx, u64 vpfn, kbdev->mmu_mode->set_num_valid_entries( pgd_page, num_of_valid_entries + count); - vpfn += count; - remain -= count; + dirty_pgds |= 1ULL << (newly_created_pgd ? insert_level : MIDGARD_MMU_BOTTOMLEVEL); - kbase_mmu_sync_pgd(kbdev, - kbase_dma_addr(p) + (index * sizeof(u64)), - count * sizeof(u64)); - - kunmap(p); - /* We have started modifying the page table. - * If further pages need inserting and fail we need to undo what - * has already taken place + /* MMU cache flush operation here will depend on whether bottom level + * PGD is newly created or not. + * + * If bottom level PGD is newly created then no GPU cache maintenance is + * required as the PGD will not exist in GPU cache. Otherwise GPU cache + * maintenance is required for existing PGD. */ - recover_required = true; - recover_count += count; + flush_op = newly_created_pgd ? 
KBASE_MMU_OP_NONE : KBASE_MMU_OP_FLUSH_PT; + + kbase_mmu_sync_pgd(kbdev, kctx, pgd + (vindex * sizeof(u64)), + kbase_dma_addr(p) + (vindex * sizeof(u64)), count * sizeof(u64), + flush_op); + + if (newly_created_pgd) { + err = update_parent_pgds(kbdev, mmut, cur_level, insert_level, insert_vpfn, + new_pgds); + if (err) { + dev_err(kbdev->dev, "%s: update_parent_pgds() failed (%d)", + __func__, err); + + kbdev->mmu_mode->entries_invalidate(&pgd_page[vindex], count); + + kbase_kunmap(p, pgd_page); + goto fail_unlock_free_pgds; + } + } + + insert_vpfn += count; + remain -= count; + kbase_kunmap(p, pgd_page); } - rt_mutex_unlock(&kctx->mmu.mmu_lock); - kbase_mmu_flush_invalidate(kctx, start_vpfn, nr, false, mmu_sync_info); + + rt_mutex_unlock(&mmut->mmu_lock); + + mmu_flush_invalidate_insert_pages(kbdev, mmut, start_vpfn, nr, dirty_pgds, mmu_sync_info, + false); + return 0; +fail_unlock_free_pgds: + /* Free the pgds allocated by us from insert_level+1 to bottom level */ + for (l = cur_level; l > insert_level; l--) + kbase_mmu_free_pgd(kbdev, mmut, new_pgds[l]); + fail_unlock: - rt_mutex_unlock(&kctx->mmu.mmu_lock); - kbase_mmu_flush_invalidate(kctx, start_vpfn, nr, false, mmu_sync_info); + if (insert_vpfn != start_vpfn) { + /* Invalidate the pages we have partially completed */ + mmu_insert_pages_failure_recovery(kbdev, mmut, start_vpfn, insert_vpfn, &dirty_pgds, + NULL, true); + } + + mmu_flush_invalidate_insert_pages(kbdev, mmut, start_vpfn, nr, dirty_pgds, mmu_sync_info, + true); + kbase_mmu_free_pgds_list(kbdev, mmut); + rt_mutex_unlock(&mmut->mmu_lock); + return err; } -static void kbase_mmu_free_pgd(struct kbase_device *kbdev, - struct kbase_mmu_table *mmut, phys_addr_t pgd, - bool dirty) +int kbase_mmu_insert_single_imported_page(struct kbase_context *kctx, u64 vpfn, + struct tagged_addr phys, size_t nr, unsigned long flags, + int const group_id, + enum kbase_caller_mmu_sync_info mmu_sync_info) { - struct page *p; + /* The aliasing sink page has metadata and shall be moved to NOT_MOVABLE. */ + return kbase_mmu_insert_single_page(kctx, vpfn, phys, nr, flags, group_id, mmu_sync_info, + false); +} - lockdep_assert_held(&mmut->mmu_lock); +int kbase_mmu_insert_single_aliased_page(struct kbase_context *kctx, u64 vpfn, + struct tagged_addr phys, size_t nr, unsigned long flags, + int const group_id, + enum kbase_caller_mmu_sync_info mmu_sync_info) +{ + /* The aliasing sink page has metadata and shall be moved to NOT_MOVABLE. */ + return kbase_mmu_insert_single_page(kctx, vpfn, phys, nr, flags, group_id, mmu_sync_info, + false); +} - p = pfn_to_page(PFN_DOWN(pgd)); +static void kbase_mmu_progress_migration_on_insert(struct tagged_addr phys, + struct kbase_va_region *reg, + struct kbase_mmu_table *mmut, const u64 vpfn) +{ + struct page *phys_page = as_page(phys); + struct kbase_page_metadata *page_md = kbase_page_private(phys_page); - kbase_mem_pool_free(&kbdev->mem_pools.small[mmut->group_id], - p, dirty); + if (!IS_ENABLED(CONFIG_PAGE_MIGRATION_SUPPORT)) + return; - atomic_sub(1, &kbdev->memdev.used_pages); + spin_lock(&page_md->migrate_lock); - /* If MMU tables belong to a context then pages will have been accounted - * against it, so we must decrement the usage counts here. + /* If no GPU va region is given: the metadata provided are + * invalid. 
+ * + * If the page is already allocated and mapped: this is + * an additional GPU mapping, probably to create a memory + * alias, which means it is no longer possible to migrate + * the page easily because tracking all the GPU mappings + * would be too costly. + * + * In any case: the page becomes not movable. It is kept + * alive, but attempts to migrate it will fail. The page + * will be freed if it is still not movable when it returns + * to a memory pool. Notice that the movable flag is not + * cleared because that would require taking the page lock. */ - if (mmut->kctx) { - kbase_process_page_usage_dec(mmut->kctx, 1); - atomic_sub(1, &mmut->kctx->used_pages); + if (!reg || PAGE_STATUS_GET(page_md->status) == (u8)ALLOCATED_MAPPED) { + page_md->status = PAGE_STATUS_SET(page_md->status, (u8)NOT_MOVABLE); + } else if (PAGE_STATUS_GET(page_md->status) == (u8)ALLOCATE_IN_PROGRESS) { + page_md->status = PAGE_STATUS_SET(page_md->status, (u8)ALLOCATED_MAPPED); + page_md->data.mapped.reg = reg; + page_md->data.mapped.mmut = mmut; + page_md->data.mapped.vpfn = vpfn; } - kbase_trace_gpu_mem_usage_dec(kbdev, mmut->kctx, 1); + spin_unlock(&page_md->migrate_lock); +} + +static void kbase_mmu_progress_migration_on_teardown(struct kbase_device *kbdev, + struct tagged_addr *phys, size_t requested_nr) +{ + size_t i; + + if (!IS_ENABLED(CONFIG_PAGE_MIGRATION_SUPPORT)) + return; + + for (i = 0; i < requested_nr; i++) { + struct page *phys_page = as_page(phys[i]); + struct kbase_page_metadata *page_md = kbase_page_private(phys_page); + + /* Skip the 4KB page that is part of a large page, as the large page is + * excluded from the migration process. + */ + if (is_huge(phys[i]) || is_partial(phys[i])) + continue; + + if (page_md) { + u8 status; + + spin_lock(&page_md->migrate_lock); + status = PAGE_STATUS_GET(page_md->status); + + if (status == ALLOCATED_MAPPED) { + if (IS_PAGE_ISOLATED(page_md->status)) { + page_md->status = PAGE_STATUS_SET( + page_md->status, (u8)FREE_ISOLATED_IN_PROGRESS); + page_md->data.free_isolated.kbdev = kbdev; + /* At this point, we still have a reference + * to the page via its page migration metadata, + * and any page with the FREE_ISOLATED_IN_PROGRESS + * status will subsequently be freed in either + * kbase_page_migrate() or kbase_page_putback() + */ + phys[i] = as_tagged(0); + } else + page_md->status = PAGE_STATUS_SET(page_md->status, + (u8)FREE_IN_PROGRESS); + } + + spin_unlock(&page_md->migrate_lock); + } + } } u64 kbase_mmu_create_ate(struct kbase_device *const kbdev, @@ -1624,12 +2312,10 @@ u64 kbase_mmu_create_ate(struct kbase_device *const kbdev, group_id, level, entry); } -int kbase_mmu_insert_pages_no_flush(struct kbase_device *kbdev, - struct kbase_mmu_table *mmut, - const u64 start_vpfn, - struct tagged_addr *phys, size_t nr, - unsigned long flags, - int const group_id) +static int mmu_insert_pages_no_flush(struct kbase_device *kbdev, struct kbase_mmu_table *mmut, + const u64 start_vpfn, struct tagged_addr *phys, size_t nr, + unsigned long flags, int const group_id, u64 *dirty_pgds, + struct kbase_va_region *reg, bool ignore_page_migration) { phys_addr_t pgd; u64 *pgd_page; @@ -1637,6 +2323,9 @@ int kbase_mmu_insert_pages_no_flush(struct kbase_device *kbdev, size_t remain = nr; int err; struct kbase_mmu_mode const *mmu_mode; + unsigned int i; + phys_addr_t new_pgds[MIDGARD_MMU_BOTTOMLEVEL + 1]; + int l, cur_level, insert_level; /* Note that 0 is a valid start_vpfn */ /* 64-bit address range is the max */ @@ -1651,12 +2340,12 @@ int 
kbase_mmu_insert_pages_no_flush(struct kbase_device *kbdev, rt_mutex_lock(&mmut->mmu_lock); while (remain) { - unsigned int i; unsigned int vindex = insert_vpfn & 0x1FF; unsigned int count = KBASE_MMU_PAGE_ENTRIES - vindex; struct page *p; - int cur_level; register unsigned int num_of_valid_entries; + bool newly_created_pgd = false; + enum kbase_mmu_op_type flush_op; if (count > remain) count = remain; @@ -1666,55 +2355,54 @@ int kbase_mmu_insert_pages_no_flush(struct kbase_device *kbdev, else cur_level = MIDGARD_MMU_BOTTOMLEVEL; + insert_level = cur_level; + /* - * Repeatedly calling mmu_get_pgd_at_level() is clearly + * Repeatedly calling mmu_get_lowest_valid_pgd() is clearly * suboptimal. We don't have to re-parse the whole tree * each time (just cache the l0-l2 sequence). * On the other hand, it's only a gain when we map more than * 256 pages at once (on average). Do we really care? */ - do { - err = mmu_get_pgd_at_level(kbdev, mmut, insert_vpfn, - cur_level, &pgd); - if (err != -ENOMEM) - break; - /* Fill the memory pool with enough pages for - * the page walk to succeed - */ - rt_mutex_unlock(&mmut->mmu_lock); - err = kbase_mem_pool_grow( - &kbdev->mem_pools.small[mmut->group_id], - cur_level); - rt_mutex_lock(&mmut->mmu_lock); - } while (!err); + /* insert_level < cur_level if there's no valid PGD for cur_level and insert_vpn */ + err = mmu_get_lowest_valid_pgd(kbdev, mmut, insert_vpfn, cur_level, &insert_level, + &pgd); if (err) { - dev_warn(kbdev->dev, - "%s: mmu_get_bottom_pgd failure\n", __func__); - if (insert_vpfn != start_vpfn) { - /* Invalidate the pages we have partially - * completed - */ - mmu_insert_pages_failure_recovery(kbdev, - mmut, start_vpfn, insert_vpfn); - } + dev_err(kbdev->dev, "%s: mmu_get_lowest_valid_pgd() returned error %d", + __func__, err); goto fail_unlock; } + /* No valid pgd at cur_level */ + if (insert_level != cur_level) { + /* Allocate new pgds for all missing levels from the required level + * down to the lowest valid pgd at insert_level + */ + err = mmu_insert_alloc_pgds(kbdev, mmut, new_pgds, (insert_level + 1), + cur_level); + if (err) + goto fail_unlock; + + newly_created_pgd = true; + + new_pgds[insert_level] = pgd; + + /* If we didn't find an existing valid pgd at cur_level, + * we've now allocated one. The ATE in the next step should + * be inserted in this newly allocated pgd. 
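
/*
 * [Illustrative sketch, not part of this patch] Both insertion loops above
 * process a run of pages in chunks that never cross a bottom-level PGD
 * boundary: each PGD holds 512 entries, so a chunk is at most the distance from
 * the current index to the end of the PGD. A stand-alone demonstration:
 */
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define DEMO_PAGE_ENTRIES 512u

int main(void)
{
	uint64_t vpfn = 500;  /* starts 12 entries before a PGD boundary */
	size_t remain = 1000; /* total pages to insert */

	while (remain) {
		unsigned int index = (unsigned int)(vpfn & 0x1FF);
		unsigned int count = DEMO_PAGE_ENTRIES - index;

		if (count > remain)
			count = (unsigned int)remain;

		printf("write %3u entries at index %3u of the PGD covering vpfn 0x%llx\n",
		       count, index, (unsigned long long)vpfn);

		vpfn += count;
		remain -= count;
	}

	return 0;
}
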
+ */ + pgd = new_pgds[cur_level]; + } + p = pfn_to_page(PFN_DOWN(pgd)); - pgd_page = kmap(p); + pgd_page = kbase_kmap(p); + if (!pgd_page) { - dev_warn(kbdev->dev, "%s: kmap failure\n", - __func__); - if (insert_vpfn != start_vpfn) { - /* Invalidate the pages we have partially - * completed - */ - mmu_insert_pages_failure_recovery(kbdev, - mmut, start_vpfn, insert_vpfn); - } + dev_err(kbdev->dev, "%s: kmap failure", __func__); err = -ENOMEM; - goto fail_unlock; + + goto fail_unlock_free_pgds; } num_of_valid_entries = @@ -1722,18 +2410,8 @@ int kbase_mmu_insert_pages_no_flush(struct kbase_device *kbdev, if (cur_level == MIDGARD_MMU_LEVEL(2)) { int level_index = (insert_vpfn >> 9) & 0x1FF; - u64 *target = &pgd_page[level_index]; - - if (mmu_mode->pte_is_valid(*target, cur_level)) { - kbase_mmu_free_pgd( - kbdev, mmut, - kbdev->mmu_mode->pte_to_phy_addr( - *target), - false); - num_of_valid_entries--; - } - *target = kbase_mmu_create_ate(kbdev, *phys, flags, - cur_level, group_id); + pgd_page[level_index] = + kbase_mmu_create_ate(kbdev, *phys, flags, cur_level, group_id); num_of_valid_entries++; } else { @@ -1752,27 +2430,94 @@ int kbase_mmu_insert_pages_no_flush(struct kbase_device *kbdev, *target = kbase_mmu_create_ate(kbdev, phys[i], flags, cur_level, group_id); + + /* If page migration is enabled, this is the right time + * to update the status of the page. + */ + if (kbase_is_page_migration_enabled() && !ignore_page_migration && + !is_huge(phys[i]) && !is_partial(phys[i])) + kbase_mmu_progress_migration_on_insert(phys[i], reg, mmut, + insert_vpfn + i); } num_of_valid_entries += count; } mmu_mode->set_num_valid_entries(pgd_page, num_of_valid_entries); + if (dirty_pgds) + *dirty_pgds |= 1ULL << (newly_created_pgd ? insert_level : cur_level); + + /* MMU cache flush operation here will depend on whether bottom level + * PGD is newly created or not. + * + * If bottom level PGD is newly created then no GPU cache maintenance is + * required as the PGD will not exist in GPU cache. Otherwise GPU cache + * maintenance is required for existing PGD. + */ + flush_op = newly_created_pgd ? KBASE_MMU_OP_NONE : KBASE_MMU_OP_FLUSH_PT; + + kbase_mmu_sync_pgd(kbdev, mmut->kctx, pgd + (vindex * sizeof(u64)), + kbase_dma_addr(p) + (vindex * sizeof(u64)), count * sizeof(u64), + flush_op); + + if (newly_created_pgd) { + err = update_parent_pgds(kbdev, mmut, cur_level, insert_level, insert_vpfn, + new_pgds); + if (err) { + dev_err(kbdev->dev, "%s: update_parent_pgds() failed (%d)", + __func__, err); + + kbdev->mmu_mode->entries_invalidate(&pgd_page[vindex], count); + + kbase_kunmap(p, pgd_page); + goto fail_unlock_free_pgds; + } + } + phys += count; insert_vpfn += count; remain -= count; + kbase_kunmap(p, pgd_page); + } - kbase_mmu_sync_pgd(kbdev, - kbase_dma_addr(p) + (vindex * sizeof(u64)), - count * sizeof(u64)); + rt_mutex_unlock(&mmut->mmu_lock); - kunmap(p); - } + return 0; - err = 0; +fail_unlock_free_pgds: + /* Free the pgds allocated by us from insert_level+1 to bottom level */ + for (l = cur_level; l > insert_level; l--) + kbase_mmu_free_pgd(kbdev, mmut, new_pgds[l]); fail_unlock: + if (insert_vpfn != start_vpfn) { + /* Invalidate the pages we have partially completed */ + mmu_insert_pages_failure_recovery(kbdev, mmut, start_vpfn, insert_vpfn, dirty_pgds, + phys, ignore_page_migration); + } + + mmu_flush_invalidate_insert_pages(kbdev, mmut, start_vpfn, nr, + dirty_pgds ? 
*dirty_pgds : 0xF, CALLER_MMU_ASYNC, true); + kbase_mmu_free_pgds_list(kbdev, mmut); rt_mutex_unlock(&mmut->mmu_lock); + + return err; +} + +int kbase_mmu_insert_pages_no_flush(struct kbase_device *kbdev, struct kbase_mmu_table *mmut, + const u64 start_vpfn, struct tagged_addr *phys, size_t nr, + unsigned long flags, int const group_id, u64 *dirty_pgds, + struct kbase_va_region *reg) +{ + int err; + + /* Early out if there is nothing to do */ + if (nr == 0) + return 0; + + err = mmu_insert_pages_no_flush(kbdev, mmut, start_vpfn, phys, nr, flags, group_id, + dirty_pgds, reg, false); + return err; } @@ -1780,31 +2525,86 @@ fail_unlock: * Map 'nr' pages pointed to by 'phys' at GPU PFN 'vpfn' for GPU address space * number 'as_nr'. */ -int kbase_mmu_insert_pages(struct kbase_device *kbdev, - struct kbase_mmu_table *mmut, u64 vpfn, - struct tagged_addr *phys, size_t nr, - unsigned long flags, int as_nr, int const group_id, - enum kbase_caller_mmu_sync_info mmu_sync_info) +int kbase_mmu_insert_pages(struct kbase_device *kbdev, struct kbase_mmu_table *mmut, u64 vpfn, + struct tagged_addr *phys, size_t nr, unsigned long flags, int as_nr, + int const group_id, enum kbase_caller_mmu_sync_info mmu_sync_info, + struct kbase_va_region *reg) { int err; + u64 dirty_pgds = 0; + + /* Early out if there is nothing to do */ + if (nr == 0) + return 0; - err = kbase_mmu_insert_pages_no_flush(kbdev, mmut, vpfn, - phys, nr, flags, group_id); + err = mmu_insert_pages_no_flush(kbdev, mmut, vpfn, phys, nr, flags, group_id, &dirty_pgds, + reg, false); + if (err) + return err; - if (mmut->kctx) - kbase_mmu_flush_invalidate(mmut->kctx, vpfn, nr, false, - mmu_sync_info); - else - kbase_mmu_flush_invalidate_no_ctx(kbdev, vpfn, nr, false, as_nr, - mmu_sync_info); + mmu_flush_invalidate_insert_pages(kbdev, mmut, vpfn, nr, dirty_pgds, mmu_sync_info, false); - return err; + return 0; } KBASE_EXPORT_TEST_API(kbase_mmu_insert_pages); +int kbase_mmu_insert_pages_skip_status_update(struct kbase_device *kbdev, + struct kbase_mmu_table *mmut, u64 vpfn, + struct tagged_addr *phys, size_t nr, + unsigned long flags, int as_nr, int const group_id, + enum kbase_caller_mmu_sync_info mmu_sync_info, + struct kbase_va_region *reg) +{ + int err; + u64 dirty_pgds = 0; + + /* Early out if there is nothing to do */ + if (nr == 0) + return 0; + + /* Imported allocations don't have metadata and therefore always ignore the + * page migration logic. + */ + err = mmu_insert_pages_no_flush(kbdev, mmut, vpfn, phys, nr, flags, group_id, &dirty_pgds, + reg, true); + if (err) + return err; + + mmu_flush_invalidate_insert_pages(kbdev, mmut, vpfn, nr, dirty_pgds, mmu_sync_info, false); + + return 0; +} + +int kbase_mmu_insert_aliased_pages(struct kbase_device *kbdev, struct kbase_mmu_table *mmut, + u64 vpfn, struct tagged_addr *phys, size_t nr, + unsigned long flags, int as_nr, int const group_id, + enum kbase_caller_mmu_sync_info mmu_sync_info, + struct kbase_va_region *reg) +{ + int err; + u64 dirty_pgds = 0; + + /* Early out if there is nothing to do */ + if (nr == 0) + return 0; + + /* Memory aliases are always built on top of existing allocations, + * therefore the state of physical pages shall be updated. 
+ */ + err = mmu_insert_pages_no_flush(kbdev, mmut, vpfn, phys, nr, flags, group_id, &dirty_pgds, + reg, false); + if (err) + return err; + + mmu_flush_invalidate_insert_pages(kbdev, mmut, vpfn, nr, dirty_pgds, mmu_sync_info, false); + + return 0; +} + +#if !MALI_USE_CSF /** - * kbase_mmu_flush_invalidate_noretain() - Flush and invalidate the GPU caches + * kbase_mmu_flush_noretain() - Flush and invalidate the GPU caches * without retaining the kbase context. * @kctx: The KBase context. * @vpfn: The virtual page frame number to start the flush on. @@ -1813,17 +2613,15 @@ KBASE_EXPORT_TEST_API(kbase_mmu_insert_pages); * As per kbase_mmu_flush_invalidate but doesn't retain the kctx or do any * other locking. */ -static void kbase_mmu_flush_invalidate_noretain(struct kbase_context *kctx, - u64 vpfn, size_t nr) +static void kbase_mmu_flush_noretain(struct kbase_context *kctx, u64 vpfn, size_t nr) { struct kbase_device *kbdev = kctx->kbdev; - struct kbase_mmu_hw_op_param op_param; int err; - /* Calls to this function are inherently asynchronous, with respect to * MMU operations. */ const enum kbase_caller_mmu_sync_info mmu_sync_info = CALLER_MMU_ASYNC; + struct kbase_mmu_hw_op_param op_param; lockdep_assert_held(&kctx->kbdev->hwaccess_lock); lockdep_assert_held(&kctx->kbdev->mmu_hw_mutex); @@ -1833,154 +2631,32 @@ static void kbase_mmu_flush_invalidate_noretain(struct kbase_context *kctx, return; /* flush L2 and unlock the VA (resumes the MMU) */ - op_param = (struct kbase_mmu_hw_op_param){ - .vpfn = vpfn, - .nr = nr, - .op = KBASE_MMU_OP_FLUSH_MEM, - .kctx_id = kctx->id, - .mmu_sync_info = mmu_sync_info, - }; - + op_param.vpfn = vpfn; + op_param.nr = nr; + op_param.op = KBASE_MMU_OP_FLUSH_MEM; + op_param.kctx_id = kctx->id; + op_param.mmu_sync_info = mmu_sync_info; if (mmu_flush_cache_on_gpu_ctrl(kbdev)) { - err = mmu_flush_invalidate_on_gpu_ctrl( - kbdev, &kbdev->as[kctx->as_nr], &op_param); + /* Value used to prevent skipping of any levels when flushing */ + op_param.flush_skip_levels = pgd_level_to_skip_flush(0xF); + err = kbase_mmu_hw_do_flush_on_gpu_ctrl(kbdev, &kbdev->as[kctx->as_nr], + &op_param); } else { - err = kbase_mmu_hw_do_operation(kbdev, &kbdev->as[kctx->as_nr], - &op_param); + err = kbase_mmu_hw_do_flush_locked(kbdev, &kbdev->as[kctx->as_nr], + &op_param); } if (err) { /* Flush failed to complete, assume the * GPU has hung and perform a reset to recover */ - dev_err(kbdev->dev, "Flush for GPU page table update did not complete. Issuing GPU soft-reset to recover\n"); + dev_err(kbdev->dev, "Flush for GPU page table update did not complete. Issuing GPU soft-reset to recover"); if (kbase_prepare_to_reset_gpu_locked(kbdev, RESET_FLAGS_NONE)) kbase_reset_gpu_locked(kbdev); } } - -/* Perform a flush/invalidate on a particular address space - */ -static void -kbase_mmu_flush_invalidate_as(struct kbase_device *kbdev, struct kbase_as *as, - u64 vpfn, size_t nr, bool sync, u32 kctx_id, - enum kbase_caller_mmu_sync_info mmu_sync_info) -{ - int err; - bool gpu_powered; - unsigned long flags; - struct kbase_mmu_hw_op_param op_param; - - spin_lock_irqsave(&kbdev->hwaccess_lock, flags); - gpu_powered = kbdev->pm.backend.gpu_powered; - spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); - - /* GPU is off so there's no need to perform flush/invalidate. - * But even if GPU is not actually powered down, after gpu_powered flag - * was set to false, it is still safe to skip the flush/invalidate. 
- * The TLB invalidation will anyways be performed due to AS_COMMAND_UPDATE - * which is sent when address spaces are restored after gpu_powered flag - * is set to true. Flushing of L2 cache is certainly not required as L2 - * cache is definitely off if gpu_powered is false. - */ - if (!gpu_powered) - return; - - if (kbase_pm_context_active_handle_suspend(kbdev, - KBASE_PM_SUSPEND_HANDLER_DONT_REACTIVATE)) { - /* GPU has just been powered off due to system suspend. - * So again, no need to perform flush/invalidate. - */ - return; - } - - /* AS transaction begin */ - mutex_lock(&kbdev->mmu_hw_mutex); - - op_param = (struct kbase_mmu_hw_op_param){ - .vpfn = vpfn, - .nr = nr, - .kctx_id = kctx_id, - .mmu_sync_info = mmu_sync_info, - }; - - if (sync) - op_param.op = KBASE_MMU_OP_FLUSH_MEM; - else - op_param.op = KBASE_MMU_OP_FLUSH_PT; - - if (mmu_flush_cache_on_gpu_ctrl(kbdev)) { - spin_lock_irqsave(&kbdev->hwaccess_lock, flags); - err = mmu_flush_invalidate_on_gpu_ctrl(kbdev, as, &op_param); - spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); - } else { - mmu_hw_operation_begin(kbdev); - err = kbase_mmu_hw_do_operation(kbdev, as, &op_param); - mmu_hw_operation_end(kbdev); - } - - if (err) { - /* Flush failed to complete, assume the GPU has hung and - * perform a reset to recover - */ - dev_err(kbdev->dev, "Flush for GPU page table update did not complete. Issuing GPU soft-reset to recover\n"); - - if (kbase_prepare_to_reset_gpu( - kbdev, RESET_FLAGS_HWC_UNRECOVERABLE_ERROR)) - kbase_reset_gpu(kbdev); - } - - mutex_unlock(&kbdev->mmu_hw_mutex); - /* AS transaction end */ - - kbase_pm_context_idle(kbdev); -} - -static void -kbase_mmu_flush_invalidate_no_ctx(struct kbase_device *kbdev, u64 vpfn, - size_t nr, bool sync, int as_nr, - enum kbase_caller_mmu_sync_info mmu_sync_info) -{ - /* Skip if there is nothing to do */ - if (nr) { - kbase_mmu_flush_invalidate_as(kbdev, &kbdev->as[as_nr], vpfn, - nr, sync, 0xFFFFFFFF, - mmu_sync_info); - } -} - -static void -kbase_mmu_flush_invalidate(struct kbase_context *kctx, u64 vpfn, size_t nr, - bool sync, - enum kbase_caller_mmu_sync_info mmu_sync_info) -{ - struct kbase_device *kbdev; - bool ctx_is_in_runpool; - - /* Early out if there is nothing to do */ - if (nr == 0) - return; - - kbdev = kctx->kbdev; -#if !MALI_USE_CSF - rt_mutex_lock(&kbdev->js_data.queue_mutex); - ctx_is_in_runpool = kbase_ctx_sched_inc_refcount(kctx); - rt_mutex_unlock(&kbdev->js_data.queue_mutex); -#else - ctx_is_in_runpool = kbase_ctx_sched_inc_refcount_if_as_valid(kctx); -#endif /* !MALI_USE_CSF */ - - if (ctx_is_in_runpool) { - KBASE_DEBUG_ASSERT(kctx->as_nr != KBASEP_AS_NR_INVALID); - - kbase_mmu_flush_invalidate_as(kbdev, &kbdev->as[kctx->as_nr], - vpfn, nr, sync, kctx->id, - mmu_sync_info); - - release_ctx(kbdev, kctx); - } -} +#endif void kbase_mmu_update(struct kbase_device *kbdev, struct kbase_mmu_table *mmut, @@ -2002,6 +2678,88 @@ void kbase_mmu_disable_as(struct kbase_device *kbdev, int as_nr) kbdev->mmu_mode->disable_as(kbdev, as_nr); } +#if MALI_USE_CSF +void kbase_mmu_disable(struct kbase_context *kctx) +{ + /* Calls to this function are inherently asynchronous, with respect to + * MMU operations. + */ + const enum kbase_caller_mmu_sync_info mmu_sync_info = CALLER_MMU_ASYNC; + struct kbase_device *kbdev = kctx->kbdev; + struct kbase_mmu_hw_op_param op_param = { 0 }; + int lock_err, flush_err; + + /* ASSERT that the context has a valid as_nr, which is only the case + * when it's scheduled in. 
+ * + * as_nr won't change because the caller has the hwaccess_lock + */ + KBASE_DEBUG_ASSERT(kctx->as_nr != KBASEP_AS_NR_INVALID); + + lockdep_assert_held(&kctx->kbdev->hwaccess_lock); + lockdep_assert_held(&kctx->kbdev->mmu_hw_mutex); + + op_param.vpfn = 0; + op_param.nr = ~0; + op_param.op = KBASE_MMU_OP_FLUSH_MEM; + op_param.kctx_id = kctx->id; + op_param.mmu_sync_info = mmu_sync_info; + +#if MALI_USE_CSF + /* 0xF value used to prevent skipping of any levels when flushing */ + if (mmu_flush_cache_on_gpu_ctrl(kbdev)) + op_param.flush_skip_levels = pgd_level_to_skip_flush(0xF); +#endif + + /* lock MMU to prevent existing jobs on GPU from executing while the AS is + * not yet disabled + */ + lock_err = kbase_mmu_hw_do_lock(kbdev, &kbdev->as[kctx->as_nr], &op_param); + if (lock_err) + dev_err(kbdev->dev, "Failed to lock AS %d for ctx %d_%d", kctx->as_nr, kctx->tgid, + kctx->id); + + /* Issue the flush command only when L2 cache is in stable power on state. + * Any other state for L2 cache implies that shader cores are powered off, + * which in turn implies there is no execution happening on the GPU. + */ + if (kbdev->pm.backend.l2_state == KBASE_L2_ON) { + flush_err = kbase_gpu_cache_flush_and_busy_wait(kbdev, + GPU_COMMAND_CACHE_CLN_INV_L2_LSC); + if (flush_err) + dev_err(kbdev->dev, + "Failed to flush GPU cache when disabling AS %d for ctx %d_%d", + kctx->as_nr, kctx->tgid, kctx->id); + } + kbdev->mmu_mode->disable_as(kbdev, kctx->as_nr); + + if (!lock_err) { + /* unlock the MMU to allow it to resume */ + lock_err = + kbase_mmu_hw_do_unlock_no_addr(kbdev, &kbdev->as[kctx->as_nr], &op_param); + if (lock_err) + dev_err(kbdev->dev, "Failed to unlock AS %d for ctx %d_%d", kctx->as_nr, + kctx->tgid, kctx->id); + } + +#if !MALI_USE_CSF + /* + * JM GPUs has some L1 read only caches that need to be invalidated + * with START_FLUSH configuration. Purge the MMU disabled kctx from + * the slot_rb tracking field so such invalidation is performed when + * a new katom is executed on the affected slots. + */ + kbase_backend_slot_kctx_purge_locked(kbdev, kctx); +#endif + + /* kbase_gpu_cache_flush_and_busy_wait() will reset the GPU on timeout. Only + * reset the GPU if locking or unlocking fails. + */ + if (lock_err) + if (kbase_prepare_to_reset_gpu_locked(kbdev, RESET_FLAGS_NONE)) + kbase_reset_gpu_locked(kbdev); +} +#else void kbase_mmu_disable(struct kbase_context *kctx) { /* ASSERT that the context has a valid as_nr, which is only the case @@ -2021,7 +2779,7 @@ void kbase_mmu_disable(struct kbase_context *kctx) * The job scheduler code will already be holding the locks and context * so just do the flush. 
*/ - kbase_mmu_flush_invalidate_noretain(kctx, 0, ~0); + kbase_mmu_flush_noretain(kctx, 0, ~0); kctx->kbdev->mmu_mode->disable_as(kctx->kbdev, kctx->as_nr); #if !MALI_USE_CSF @@ -2034,12 +2792,13 @@ void kbase_mmu_disable(struct kbase_context *kctx) kbase_backend_slot_kctx_purge_locked(kctx->kbdev, kctx); #endif } +#endif KBASE_EXPORT_TEST_API(kbase_mmu_disable); static void kbase_mmu_update_and_free_parent_pgds(struct kbase_device *kbdev, - struct kbase_mmu_table *mmut, - phys_addr_t *pgds, u64 vpfn, - int level) + struct kbase_mmu_table *mmut, phys_addr_t *pgds, + u64 vpfn, int level, + enum kbase_mmu_op_type flush_op, u64 *dirty_pgds) { int current_level; @@ -2047,83 +2806,116 @@ static void kbase_mmu_update_and_free_parent_pgds(struct kbase_device *kbdev, for (current_level = level - 1; current_level >= MIDGARD_MMU_LEVEL(0); current_level--) { - u64 *current_page = kmap(phys_to_page(pgds[current_level])); + phys_addr_t current_pgd = pgds[current_level]; + struct page *p = phys_to_page(current_pgd); + + u64 *current_page = kbase_kmap(p); unsigned int current_valid_entries = kbdev->mmu_mode->get_num_valid_entries(current_page); + int index = (vpfn >> ((3 - current_level) * 9)) & 0x1FF; + /* We need to track every level that needs updating */ + if (dirty_pgds) + *dirty_pgds |= 1ULL << current_level; + + kbdev->mmu_mode->entries_invalidate(¤t_page[index], 1); if (current_valid_entries == 1 && current_level != MIDGARD_MMU_LEVEL(0)) { - kunmap(phys_to_page(pgds[current_level])); + kbase_kunmap(p, current_page); - kbase_mmu_free_pgd(kbdev, mmut, pgds[current_level], - true); - } else { - int index = (vpfn >> ((3 - current_level) * 9)) & 0x1FF; - - kbdev->mmu_mode->entry_invalidate(¤t_page[index]); + /* Ensure the cacheline containing the last valid entry + * of PGD is invalidated from the GPU cache, before the + * PGD page is freed. + */ + kbase_mmu_sync_pgd_gpu(kbdev, mmut->kctx, + current_pgd + (index * sizeof(u64)), + sizeof(u64), flush_op); + kbase_mmu_add_to_free_pgds_list(mmut, p); + } else { current_valid_entries--; kbdev->mmu_mode->set_num_valid_entries( current_page, current_valid_entries); - kbase_mmu_sync_pgd(kbdev, - kbase_dma_addr(phys_to_page( - pgds[current_level])) + - 8 * index, - 8 * 1); + kbase_kunmap(p, current_page); - kunmap(phys_to_page(pgds[current_level])); + kbase_mmu_sync_pgd(kbdev, mmut->kctx, current_pgd + (index * sizeof(u64)), + kbase_dma_addr(p) + (index * sizeof(u64)), sizeof(u64), + flush_op); break; } } } -/* - * We actually discard the ATE and free the page table pages if no valid entries - * exist in PGD. +/** + * mmu_flush_invalidate_teardown_pages() - Perform flush operation after unmapping pages. * - * IMPORTANT: This uses kbasep_js_runpool_release_ctx() when the context is - * currently scheduled into the runpool, and so potentially uses a lot of locks. - * These locks must be taken in the correct order with respect to others - * already held by the caller. Refer to kbasep_js_runpool_release_ctx() for more - * information. + * @kbdev: Pointer to kbase device. + * @kctx: Pointer to kbase context. + * @as_nr: Address space number, for GPU cache maintenance operations + * that happen outside a specific kbase context. + * @phys: Array of physical pages to flush. + * @phys_page_nr: Number of physical pages to flush. + * @op_param: Non-NULL pointer to struct containing information about the flush + * operation to perform. + * + * This function will do one of three things: + * 1. 
Invalidate the MMU caches, followed by a partial GPU cache flush of the + * individual pages that were unmapped if feature is supported on GPU. + * 2. Perform a full GPU cache flush through the GPU_CONTROL interface if feature is + * supported on GPU or, + * 3. Perform a full GPU cache flush through the MMU_CONTROL interface. + * + * When performing a partial GPU cache flush, the number of physical + * pages does not have to be identical to the number of virtual pages on the MMU, + * to support a single physical address flush for an aliased page. */ -int kbase_mmu_teardown_pages(struct kbase_device *kbdev, - struct kbase_mmu_table *mmut, u64 vpfn, size_t nr, int as_nr) +static void mmu_flush_invalidate_teardown_pages(struct kbase_device *kbdev, + struct kbase_context *kctx, int as_nr, + struct tagged_addr *phys, size_t phys_page_nr, + struct kbase_mmu_hw_op_param *op_param) { - phys_addr_t pgd; - u64 start_vpfn = vpfn; - size_t requested_nr = nr; - struct kbase_mmu_mode const *mmu_mode; - int err = -EFAULT; + if (!mmu_flush_cache_on_gpu_ctrl(kbdev)) { + /* Full cache flush through the MMU_COMMAND */ + mmu_flush_invalidate(kbdev, kctx, as_nr, op_param); + } else if (op_param->op == KBASE_MMU_OP_FLUSH_MEM) { + /* Full cache flush through the GPU_CONTROL */ + mmu_flush_invalidate_on_gpu_ctrl(kbdev, kctx, as_nr, op_param); + } +#if MALI_USE_CSF + else { + /* Partial GPU cache flush with MMU cache invalidation */ + unsigned long irq_flags; + unsigned int i; + bool flush_done = false; - /* Calls to this function are inherently asynchronous, with respect to - * MMU operations. - */ - const enum kbase_caller_mmu_sync_info mmu_sync_info = CALLER_MMU_ASYNC; + mmu_invalidate(kbdev, kctx, as_nr, op_param); - if (nr == 0) { - /* early out if nothing to do */ - return 0; + for (i = 0; !flush_done && i < phys_page_nr; i++) { + spin_lock_irqsave(&kbdev->hwaccess_lock, irq_flags); + if (kbdev->pm.backend.gpu_ready && (!kctx || kctx->as_nr >= 0)) + mmu_flush_pa_range(kbdev, as_phys_addr_t(phys[i]), PAGE_SIZE, + KBASE_MMU_OP_FLUSH_MEM); + else + flush_done = true; + spin_unlock_irqrestore(&kbdev->hwaccess_lock, irq_flags); + } } +#endif +} - if (!rt_mutex_trylock(&mmut->mmu_lock)) { - /* - * Sometimes, mmu_lock takes long time to be released. - * In that case, kswapd is stuck until it can hold - * the lock. Instead, just bail out here so kswapd - * could reclaim other pages. 
- */ - if (current_is_kswapd()) - return -EBUSY; - rt_mutex_lock(&mmut->mmu_lock); - } +static int kbase_mmu_teardown_pgd_pages(struct kbase_device *kbdev, struct kbase_mmu_table *mmut, + u64 vpfn, size_t nr, u64 *dirty_pgds, + struct list_head *free_pgds_list, + enum kbase_mmu_op_type flush_op) +{ + struct kbase_mmu_mode const *mmu_mode = kbdev->mmu_mode; - mmu_mode = kbdev->mmu_mode; + lockdep_assert_held(&mmut->mmu_lock); + kbase_mmu_reset_free_pgds_list(mmut); while (nr) { - unsigned int i; unsigned int index = vpfn & 0x1FF; unsigned int count = KBASE_MMU_PAGE_ENTRIES - index; unsigned int pcount; @@ -2131,19 +2923,19 @@ int kbase_mmu_teardown_pages(struct kbase_device *kbdev, u64 *page; phys_addr_t pgds[MIDGARD_MMU_BOTTOMLEVEL + 1]; register unsigned int num_of_valid_entries; + phys_addr_t pgd = mmut->pgd; + struct page *p = phys_to_page(pgd); if (count > nr) count = nr; - /* need to check if this is a 2MB or a 4kB page */ - pgd = mmut->pgd; - + /* need to check if this is a 2MB page or a 4kB */ for (level = MIDGARD_MMU_TOPLEVEL; level <= MIDGARD_MMU_BOTTOMLEVEL; level++) { phys_addr_t next_pgd; index = (vpfn >> ((3 - level) * 9)) & 0x1FF; - page = kmap(phys_to_page(pgd)); + page = kbase_kmap(p); if (mmu_mode->ate_is_valid(page[index], level)) break; /* keep the mapping */ else if (!mmu_mode->pte_is_valid(page[index], level)) { @@ -2166,28 +2958,31 @@ int kbase_mmu_teardown_pages(struct kbase_device *kbdev, count = nr; goto next; } - next_pgd = mmu_mode->pte_to_phy_addr(page[index]); + next_pgd = mmu_mode->pte_to_phy_addr( + kbdev->mgm_dev->ops.mgm_pte_to_original_pte( + kbdev->mgm_dev, MGM_DEFAULT_PTE_GROUP, level, page[index])); + kbase_kunmap(p, page); pgds[level] = pgd; - kunmap(phys_to_page(pgd)); pgd = next_pgd; + p = phys_to_page(pgd); } switch (level) { case MIDGARD_MMU_LEVEL(0): case MIDGARD_MMU_LEVEL(1): - dev_warn(kbdev->dev, - "%s: No support for ATEs at level %d\n", - __func__, level); - kunmap(phys_to_page(pgd)); + dev_warn(kbdev->dev, "%s: No support for ATEs at level %d", __func__, + level); + kbase_kunmap(p, page); goto out; case MIDGARD_MMU_LEVEL(2): /* can only teardown if count >= 512 */ if (count >= 512) { pcount = 1; } else { - dev_warn(kbdev->dev, - "%s: limiting teardown as it tries to do a partial 2MB teardown, need 512, but have %d to tear down\n", - __func__, count); + dev_warn( + kbdev->dev, + "%s: limiting teardown as it tries to do a partial 2MB teardown, need 512, but have %d to tear down", + __func__, count); pcount = 0; } break; @@ -2196,68 +2991,205 @@ int kbase_mmu_teardown_pages(struct kbase_device *kbdev, pcount = count; break; default: - dev_err(kbdev->dev, - "%s: found non-mapped memory, early out\n", - __func__); + dev_err(kbdev->dev, "%s: found non-mapped memory, early out", __func__); vpfn += count; nr -= count; continue; } + if (pcount > 0) + *dirty_pgds |= 1ULL << level; + num_of_valid_entries = mmu_mode->get_num_valid_entries(page); if (WARN_ON_ONCE(num_of_valid_entries < pcount)) num_of_valid_entries = 0; else num_of_valid_entries -= pcount; + /* Invalidate the entries we added */ + mmu_mode->entries_invalidate(&page[index], pcount); + if (!num_of_valid_entries) { - kunmap(phys_to_page(pgd)); + kbase_kunmap(p, page); + + /* Ensure the cacheline(s) containing the last valid entries + * of PGD is invalidated from the GPU cache, before the + * PGD page is freed. 
+ */ + kbase_mmu_sync_pgd_gpu(kbdev, mmut->kctx, + pgd + (index * sizeof(u64)), + pcount * sizeof(u64), flush_op); - kbase_mmu_free_pgd(kbdev, mmut, pgd, true); + kbase_mmu_add_to_free_pgds_list(mmut, p); - kbase_mmu_update_and_free_parent_pgds(kbdev, mmut, pgds, - vpfn, level); + kbase_mmu_update_and_free_parent_pgds(kbdev, mmut, pgds, vpfn, level, + flush_op, dirty_pgds); vpfn += count; nr -= count; continue; } - /* Invalidate the entries we added */ - for (i = 0; i < pcount; i++) - mmu_mode->entry_invalidate(&page[index + i]); - mmu_mode->set_num_valid_entries(page, num_of_valid_entries); - kbase_mmu_sync_pgd( - kbdev, kbase_dma_addr(phys_to_page(pgd)) + 8 * index, - 8 * pcount); + kbase_mmu_sync_pgd(kbdev, mmut->kctx, pgd + (index * sizeof(u64)), + kbase_dma_addr(p) + (index * sizeof(u64)), pcount * sizeof(u64), + flush_op); next: - kunmap(phys_to_page(pgd)); - vpfn += count; - nr -= count; + kbase_kunmap(p, page); + vpfn += count; + nr -= count; } - err = 0; out: - rt_mutex_unlock(&mmut->mmu_lock); + return 0; +} - if (mmut->kctx) - kbase_mmu_flush_invalidate(mmut->kctx, start_vpfn, requested_nr, - true, mmu_sync_info); - else - kbase_mmu_flush_invalidate_no_ctx(kbdev, start_vpfn, - requested_nr, true, as_nr, - mmu_sync_info); +/** + * mmu_teardown_pages - Remove GPU virtual addresses from the MMU page table + * + * @kbdev: Pointer to kbase device. + * @mmut: Pointer to GPU MMU page table. + * @vpfn: Start page frame number of the GPU virtual pages to unmap. + * @phys: Array of physical pages currently mapped to the virtual + * pages to unmap, or NULL. This is used for GPU cache maintenance + * and page migration support. + * @nr_phys_pages: Number of physical pages to flush. + * @nr_virt_pages: Number of virtual pages whose PTEs should be destroyed. + * @as_nr: Address space number, for GPU cache maintenance operations + * that happen outside a specific kbase context. + * @ignore_page_migration: Whether page migration metadata should be ignored. + * + * We actually discard the ATE and free the page table pages if no valid entries + * exist in the PGD. + * + * IMPORTANT: This uses kbasep_js_runpool_release_ctx() when the context is + * currently scheduled into the runpool, and so potentially uses a lot of locks. + * These locks must be taken in the correct order with respect to others + * already held by the caller. Refer to kbasep_js_runpool_release_ctx() for more + * information. + * + * The @p phys pointer to physical pages is not necessary for unmapping virtual memory, + * but it is used for fine-grained GPU cache maintenance. If @p phys is NULL, + * GPU cache maintenance will be done as usual; that is, invalidating the whole GPU caches + * instead of specific physical address ranges. + * + * Return: 0 on success, otherwise an error code. + */ +static int mmu_teardown_pages(struct kbase_device *kbdev, struct kbase_mmu_table *mmut, u64 vpfn, + struct tagged_addr *phys, size_t nr_phys_pages, size_t nr_virt_pages, + int as_nr, bool ignore_page_migration) +{ + u64 start_vpfn = vpfn; + enum kbase_mmu_op_type flush_op = KBASE_MMU_OP_NONE; + struct kbase_mmu_hw_op_param op_param; + int err = -EFAULT; + u64 dirty_pgds = 0; + LIST_HEAD(free_pgds_list); + + /* Calls to this function are inherently asynchronous, with respect to + * MMU operations. + */ + const enum kbase_caller_mmu_sync_info mmu_sync_info = CALLER_MMU_ASYNC; + + /* This function performs two operations: MMU maintenance and flushing + * the caches. 
To ensure internal consistency between the caches and the + * MMU, it does not make sense to be able to flush only the physical pages + * from the cache and keep the PTE, nor does it make sense to use this + * function to remove a PTE and keep the physical pages in the cache. + * + * However, we have legitimate cases where we can try to tear down a mapping + * with zero virtual and zero physical pages, so we must have the following + * behaviour: + * - if both physical and virtual page counts are zero, return early + * - if either physical and virtual page counts are zero, return early + * - if there are fewer physical pages than virtual pages, return -EINVAL + */ + if (unlikely(nr_virt_pages == 0 || nr_phys_pages == 0)) + return 0; + + if (unlikely(nr_virt_pages < nr_phys_pages)) + return -EINVAL; + + /* MMU cache flush strategy depends on the number of pages to unmap. In both cases + * the operation is invalidate but the granularity of cache maintenance may change + * according to the situation. + * + * If GPU control command operations are present and the number of pages is "small", + * then the optimal strategy is flushing on the physical address range of the pages + * which are affected by the operation. That implies both the PGDs which are modified + * or removed from the page table and the physical pages which are freed from memory. + * + * Otherwise, there's no alternative to invalidating the whole GPU cache. + */ + if (mmu_flush_cache_on_gpu_ctrl(kbdev) && phys && + nr_phys_pages <= KBASE_PA_RANGE_THRESHOLD_NR_PAGES) + flush_op = KBASE_MMU_OP_FLUSH_PT; + + if (!rt_mutex_trylock(&mmut->mmu_lock)) { + /* + * Sometimes, mmu_lock takes long time to be released. + * In that case, kswapd is stuck until it can hold + * the lock. Instead, just bail out here so kswapd + * could reclaim other pages. + */ + if (current_is_kswapd()) + return -EBUSY; + rt_mutex_lock(&mmut->mmu_lock); + } + + err = kbase_mmu_teardown_pgd_pages(kbdev, mmut, vpfn, nr_virt_pages, &dirty_pgds, + &free_pgds_list, flush_op); + + /* Set up MMU operation parameters. See above about MMU cache flush strategy. */ + op_param = (struct kbase_mmu_hw_op_param){ + .vpfn = start_vpfn, + .nr = nr_virt_pages, + .mmu_sync_info = mmu_sync_info, + .kctx_id = mmut->kctx ? mmut->kctx->id : 0xFFFFFFFF, + .op = (flush_op == KBASE_MMU_OP_FLUSH_PT) ? KBASE_MMU_OP_FLUSH_PT : + KBASE_MMU_OP_FLUSH_MEM, + .flush_skip_levels = pgd_level_to_skip_flush(dirty_pgds), + }; + mmu_flush_invalidate_teardown_pages(kbdev, mmut->kctx, as_nr, phys, nr_phys_pages, + &op_param); + + /* If page migration is enabled: the status of all physical pages involved + * shall be updated, unless they are not movable. Their status shall be + * updated before releasing the lock to protect against concurrent + * requests to migrate the pages, if they have been isolated. 
+ */ + if (kbase_is_page_migration_enabled() && phys && !ignore_page_migration) + kbase_mmu_progress_migration_on_teardown(kbdev, phys, nr_phys_pages); + + kbase_mmu_free_pgds_list(kbdev, mmut); + + rt_mutex_unlock(&mmut->mmu_lock); return err; } -KBASE_EXPORT_TEST_API(kbase_mmu_teardown_pages); +int kbase_mmu_teardown_pages(struct kbase_device *kbdev, struct kbase_mmu_table *mmut, u64 vpfn, + struct tagged_addr *phys, size_t nr_phys_pages, size_t nr_virt_pages, + int as_nr) +{ + return mmu_teardown_pages(kbdev, mmut, vpfn, phys, nr_phys_pages, nr_virt_pages, as_nr, + false); +} + +int kbase_mmu_teardown_imported_pages(struct kbase_device *kbdev, struct kbase_mmu_table *mmut, + u64 vpfn, struct tagged_addr *phys, size_t nr_phys_pages, + size_t nr_virt_pages, int as_nr) +{ + return mmu_teardown_pages(kbdev, mmut, vpfn, phys, nr_phys_pages, nr_virt_pages, as_nr, + true); +} /** - * kbase_mmu_update_pages_no_flush() - Update attributes data in GPU page table entries + * kbase_mmu_update_pages_no_flush() - Update phy pages and attributes data in GPU + * page table entries * - * @kctx: Kbase context + * @kbdev: Pointer to kbase device. + * @mmut: The involved MMU table * @vpfn: Virtual PFN (Page Frame Number) of the first page to update * @phys: Pointer to the array of tagged physical addresses of the physical * pages that are pointed to by the page table entries (that need to @@ -2267,28 +3199,25 @@ KBASE_EXPORT_TEST_API(kbase_mmu_teardown_pages); * @flags: Flags * @group_id: The physical memory group in which the page was allocated. * Valid range is 0..(MEMORY_GROUP_MANAGER_NR_GROUPS-1). + * @dirty_pgds: Flags to track every level where a PGD has been updated. * * This will update page table entries that already exist on the GPU based on - * the new flags that are passed (the physical pages pointed to by the page - * table entries remain unchanged). It is used as a response to the changes of - * the memory attributes. + * new flags and replace any existing phy pages that are passed (the PGD pages + * remain unchanged). It is used as a response to the changes of phys as well + * as the the memory attributes. * * The caller is responsible for validating the memory attributes. * * Return: 0 if the attributes data in page table entries were updated * successfully, otherwise an error code. 
*/ -static int kbase_mmu_update_pages_no_flush(struct kbase_context *kctx, u64 vpfn, - struct tagged_addr *phys, size_t nr, - unsigned long flags, int const group_id) +int kbase_mmu_update_pages_no_flush(struct kbase_device *kbdev, struct kbase_mmu_table *mmut, + u64 vpfn, struct tagged_addr *phys, size_t nr, + unsigned long flags, int const group_id, u64 *dirty_pgds) { phys_addr_t pgd; u64 *pgd_page; int err; - struct kbase_device *kbdev; - - if (WARN_ON(kctx == NULL)) - return -EINVAL; KBASE_DEBUG_ASSERT(vpfn <= (U64_MAX / PAGE_SIZE)); @@ -2296,9 +3225,7 @@ static int kbase_mmu_update_pages_no_flush(struct kbase_context *kctx, u64 vpfn, if (nr == 0) return 0; - rt_mutex_lock(&kctx->mmu.mmu_lock); - - kbdev = kctx->kbdev; + rt_mutex_lock(&mmut->mmu_lock); while (nr) { unsigned int i; @@ -2314,12 +3241,12 @@ static int kbase_mmu_update_pages_no_flush(struct kbase_context *kctx, u64 vpfn, if (is_huge(*phys) && (index == index_in_large_page(*phys))) cur_level = MIDGARD_MMU_LEVEL(2); - err = mmu_get_pgd_at_level(kbdev, &kctx->mmu, vpfn, cur_level, &pgd); + err = mmu_get_pgd_at_level(kbdev, mmut, vpfn, cur_level, &pgd); if (WARN_ON(err)) goto fail_unlock; p = pfn_to_page(PFN_DOWN(pgd)); - pgd_page = kmap(p); + pgd_page = kbase_kmap(p); if (!pgd_page) { dev_warn(kbdev->dev, "kmap failure on update_pages"); err = -ENOMEM; @@ -2341,9 +3268,9 @@ static int kbase_mmu_update_pages_no_flush(struct kbase_context *kctx, u64 vpfn, pgd_page[level_index] = kbase_mmu_create_ate(kbdev, *target_phys, flags, MIDGARD_MMU_LEVEL(2), group_id); - kbase_mmu_sync_pgd(kbdev, - kbase_dma_addr(p) + (level_index * sizeof(u64)), - sizeof(u64)); + kbase_mmu_sync_pgd(kbdev, mmut->kctx, pgd + (level_index * sizeof(u64)), + kbase_dma_addr(p) + (level_index * sizeof(u64)), + sizeof(u64), KBASE_MMU_OP_NONE); } else { for (i = 0; i < count; i++) { #ifdef CONFIG_MALI_DEBUG @@ -2355,148 +3282,568 @@ static int kbase_mmu_update_pages_no_flush(struct kbase_context *kctx, u64 vpfn, phys[i], flags, MIDGARD_MMU_BOTTOMLEVEL, group_id); } - kbase_mmu_sync_pgd(kbdev, - kbase_dma_addr(p) + (index * sizeof(u64)), - count * sizeof(u64)); + + /* MMU cache flush strategy is NONE because GPU cache maintenance + * will be done by the caller. + */ + kbase_mmu_sync_pgd(kbdev, mmut->kctx, pgd + (index * sizeof(u64)), + kbase_dma_addr(p) + (index * sizeof(u64)), + count * sizeof(u64), KBASE_MMU_OP_NONE); } kbdev->mmu_mode->set_num_valid_entries(pgd_page, num_of_valid_entries); + if (dirty_pgds && count > 0) + *dirty_pgds |= 1ULL << cur_level; + phys += count; vpfn += count; nr -= count; - kunmap(p); + kbase_kunmap(p, pgd_page); } - rt_mutex_unlock(&kctx->mmu.mmu_lock); + rt_mutex_unlock(&mmut->mmu_lock); return 0; fail_unlock: - rt_mutex_unlock(&kctx->mmu.mmu_lock); + rt_mutex_unlock(&mmut->mmu_lock); return err; } -int kbase_mmu_update_pages(struct kbase_context *kctx, u64 vpfn, - struct tagged_addr *phys, size_t nr, - unsigned long flags, int const group_id) +static int kbase_mmu_update_pages_common(struct kbase_device *kbdev, struct kbase_context *kctx, + u64 vpfn, struct tagged_addr *phys, size_t nr, + unsigned long flags, int const group_id) { int err; + u64 dirty_pgds = 0; + struct kbase_mmu_table *mmut; +#if !MALI_USE_CSF + if (unlikely(kctx == NULL)) + return -EINVAL; + + mmut = &kctx->mmu; +#else + mmut = kctx ? 
&kctx->mmu : &kbdev->csf.mcu_mmu; +#endif + + err = kbase_mmu_update_pages_no_flush(kbdev, mmut, vpfn, phys, nr, flags, group_id, + &dirty_pgds); + + kbase_mmu_flush_invalidate_update_pages(kbdev, kctx, vpfn, nr, dirty_pgds); + + return err; +} + +void kbase_mmu_flush_invalidate_update_pages(struct kbase_device *kbdev, struct kbase_context *kctx, u64 vpfn, + size_t nr, u64 dirty_pgds) +{ + struct kbase_mmu_hw_op_param op_param; /* Calls to this function are inherently asynchronous, with respect to * MMU operations. */ const enum kbase_caller_mmu_sync_info mmu_sync_info = CALLER_MMU_ASYNC; + int as_nr; - err = kbase_mmu_update_pages_no_flush(kctx, vpfn, phys, nr, flags, - group_id); - kbase_mmu_flush_invalidate(kctx, vpfn, nr, true, mmu_sync_info); - return err; +#if !MALI_USE_CSF + if (unlikely(kctx == NULL)) + return; + + as_nr = kctx->as_nr; +#else + as_nr = kctx ? kctx->as_nr : MCU_AS_NR; +#endif + + op_param = (const struct kbase_mmu_hw_op_param){ + .vpfn = vpfn, + .nr = nr, + .op = KBASE_MMU_OP_FLUSH_MEM, + .kctx_id = kctx ? kctx->id : 0xFFFFFFFF, + .mmu_sync_info = mmu_sync_info, + .flush_skip_levels = pgd_level_to_skip_flush(dirty_pgds), + }; + + if (mmu_flush_cache_on_gpu_ctrl(kbdev)) + mmu_flush_invalidate_on_gpu_ctrl(kbdev, kctx, as_nr, &op_param); + else + mmu_flush_invalidate(kbdev, kctx, as_nr, &op_param); } -static void mmu_teardown_level(struct kbase_device *kbdev, - struct kbase_mmu_table *mmut, phys_addr_t pgd, - int level) +int kbase_mmu_update_pages(struct kbase_context *kctx, u64 vpfn, struct tagged_addr *phys, + size_t nr, unsigned long flags, int const group_id) +{ + if (unlikely(kctx == NULL)) + return -EINVAL; + + return kbase_mmu_update_pages_common(kctx->kbdev, kctx, vpfn, phys, nr, flags, group_id); +} + +#if MALI_USE_CSF +int kbase_mmu_update_csf_mcu_pages(struct kbase_device *kbdev, u64 vpfn, struct tagged_addr *phys, + size_t nr, unsigned long flags, int const group_id) +{ + return kbase_mmu_update_pages_common(kbdev, NULL, vpfn, phys, nr, flags, group_id); +} +#endif /* MALI_USE_CSF */ + +static void mmu_page_migration_transaction_begin(struct kbase_device *kbdev) +{ + lockdep_assert_held(&kbdev->hwaccess_lock); + + WARN_ON_ONCE(kbdev->mmu_page_migrate_in_progress); + kbdev->mmu_page_migrate_in_progress = true; +} + +static void mmu_page_migration_transaction_end(struct kbase_device *kbdev) +{ + lockdep_assert_held(&kbdev->hwaccess_lock); + WARN_ON_ONCE(!kbdev->mmu_page_migrate_in_progress); + kbdev->mmu_page_migrate_in_progress = false; + /* Invoke the PM state machine, as the MMU page migration session + * may have deferred a transition in L2 state machine. + */ + kbase_pm_update_state(kbdev); +} + +int kbase_mmu_migrate_page(struct tagged_addr old_phys, struct tagged_addr new_phys, + dma_addr_t old_dma_addr, dma_addr_t new_dma_addr, int level) +{ + struct kbase_page_metadata *page_md = kbase_page_private(as_page(old_phys)); + struct kbase_mmu_hw_op_param op_param; + struct kbase_mmu_table *mmut = (level == MIDGARD_MMU_BOTTOMLEVEL) ? 
+ page_md->data.mapped.mmut : + page_md->data.pt_mapped.mmut; + struct kbase_device *kbdev; + phys_addr_t pgd; + u64 *old_page, *new_page, *pgd_page, *target, vpfn; + int index, check_state, ret = 0; + unsigned long hwaccess_flags = 0; + unsigned int num_of_valid_entries; + u8 vmap_count = 0; + + /* If page migration support is not compiled in, return with fault */ + if (!IS_ENABLED(CONFIG_PAGE_MIGRATION_SUPPORT)) + return -EINVAL; + /* Due to the hard binding of mmu_command_instr with kctx_id via kbase_mmu_hw_op_param, + * here we skip the no kctx case, which is only used with MCU's mmut. + */ + if (!mmut->kctx) + return -EINVAL; + + if (level > MIDGARD_MMU_BOTTOMLEVEL) + return -EINVAL; + else if (level == MIDGARD_MMU_BOTTOMLEVEL) + vpfn = page_md->data.mapped.vpfn; + else + vpfn = PGD_VPFN_LEVEL_GET_VPFN(page_md->data.pt_mapped.pgd_vpfn_level); + + kbdev = mmut->kctx->kbdev; + index = (vpfn >> ((3 - level) * 9)) & 0x1FF; + + /* Create all mappings before copying content. + * This is done as early as possible because it is the only operation that may + * fail. It is possible to do this before taking any locks because the + * pages to migrate are not going to change and even the parent PGD is not + * going to be affected by any other concurrent operation, since the page + * has been isolated before migration and therefore it cannot disappear in + * the middle of this function. + */ + old_page = kbase_kmap(as_page(old_phys)); + if (!old_page) { + dev_warn(kbdev->dev, "%s: kmap failure for old page.", __func__); + ret = -EINVAL; + goto old_page_map_error; + } + + new_page = kbase_kmap(as_page(new_phys)); + if (!new_page) { + dev_warn(kbdev->dev, "%s: kmap failure for new page.", __func__); + ret = -EINVAL; + goto new_page_map_error; + } + + /* GPU cache maintenance affects both memory content and page table, + * but at two different stages. A single virtual memory page is affected + * by the migration. + * + * Notice that the MMU maintenance is done in the following steps: + * + * 1) The MMU region is locked without performing any other operation. + * This lock must cover the entire migration process, in order to + * prevent any GPU access to the virtual page whose physical page + * is being migrated. + * 2) Immediately after locking: the MMU region content is flushed via + * GPU control while the lock is taken and without unlocking. + * The region must stay locked for the duration of the whole page + * migration procedure. + * This is necessary to make sure that pending writes to the old page + * are finalized before copying content to the new page. + * 3) Before unlocking: changes to the page table are flushed. + * Finer-grained GPU control operations are used if possible, otherwise + * the whole GPU cache shall be flushed again. + * This is necessary to make sure that the GPU accesses the new page + * after migration. + * 4) The MMU region is unlocked. + */ +#define PGD_VPFN_MASK(level) (~((((u64)1) << ((3 - level) * 9)) - 1)) + op_param.mmu_sync_info = CALLER_MMU_ASYNC; + op_param.kctx_id = mmut->kctx->id; + op_param.vpfn = vpfn & PGD_VPFN_MASK(level); + op_param.nr = 1 << ((3 - level) * 9); + op_param.op = KBASE_MMU_OP_FLUSH_PT; + /* When level is not MIDGARD_MMU_BOTTOMLEVEL, it is assumed PGD page migration */ + op_param.flush_skip_levels = (level == MIDGARD_MMU_BOTTOMLEVEL) ? 
+ pgd_level_to_skip_flush(1ULL << level) : + pgd_level_to_skip_flush(3ULL << level); + + rt_mutex_lock(&mmut->mmu_lock); + + /* The state was evaluated before entering this function, but it could + * have changed before the mmu_lock was taken. However, the state + * transitions which are possible at this point are only two, and in both + * cases it is a stable state progressing to a "free in progress" state. + * + * After taking the mmu_lock the state can no longer change: read it again + * and make sure that it hasn't changed before continuing. + */ + spin_lock(&page_md->migrate_lock); + check_state = PAGE_STATUS_GET(page_md->status); + if (level == MIDGARD_MMU_BOTTOMLEVEL) + vmap_count = page_md->vmap_count; + spin_unlock(&page_md->migrate_lock); + + if (level == MIDGARD_MMU_BOTTOMLEVEL) { + if (check_state != ALLOCATED_MAPPED) { + dev_dbg(kbdev->dev, + "%s: state changed to %d (was %d), abort page migration", __func__, + check_state, ALLOCATED_MAPPED); + ret = -EAGAIN; + goto page_state_change_out; + } else if (vmap_count > 0) { + dev_dbg(kbdev->dev, "%s: page was multi-mapped, abort page migration", + __func__); + ret = -EAGAIN; + goto page_state_change_out; + } + } else { + if (check_state != PT_MAPPED) { + dev_dbg(kbdev->dev, + "%s: state changed to %d (was %d), abort PGD page migration", + __func__, check_state, PT_MAPPED); + WARN_ON_ONCE(check_state != FREE_PT_ISOLATED_IN_PROGRESS); + ret = -EAGAIN; + goto page_state_change_out; + } + } + + ret = mmu_get_pgd_at_level(kbdev, mmut, vpfn, level, &pgd); + if (ret) { + dev_err(kbdev->dev, "%s: failed to find PGD for old page.", __func__); + goto get_pgd_at_level_error; + } + + pgd_page = kbase_kmap(phys_to_page(pgd)); + if (!pgd_page) { + dev_warn(kbdev->dev, "%s: kmap failure for PGD page.", __func__); + ret = -EINVAL; + goto pgd_page_map_error; + } + + mutex_lock(&kbdev->mmu_hw_mutex); + + /* Lock MMU region and flush GPU cache by using GPU control, + * in order to keep MMU region locked. + */ + spin_lock_irqsave(&kbdev->hwaccess_lock, hwaccess_flags); + if (unlikely(!kbase_pm_l2_allow_mmu_page_migration(kbdev))) { + /* Defer the migration as L2 is in a transitional phase */ + spin_unlock_irqrestore(&kbdev->hwaccess_lock, hwaccess_flags); + mutex_unlock(&kbdev->mmu_hw_mutex); + dev_dbg(kbdev->dev, "%s: L2 in transtion, abort PGD page migration", __func__); + ret = -EAGAIN; + goto l2_state_defer_out; + } + /* Prevent transitional phases in L2 by starting the transaction */ + mmu_page_migration_transaction_begin(kbdev); + if (kbdev->pm.backend.gpu_ready && mmut->kctx->as_nr >= 0) { + int as_nr = mmut->kctx->as_nr; + struct kbase_as *as = &kbdev->as[as_nr]; + + ret = kbase_mmu_hw_do_lock(kbdev, as, &op_param); + if (!ret) { + ret = kbase_gpu_cache_flush_and_busy_wait( + kbdev, GPU_COMMAND_CACHE_CLN_INV_L2_LSC); + } + if (ret) + mmu_page_migration_transaction_end(kbdev); + } + spin_unlock_irqrestore(&kbdev->hwaccess_lock, hwaccess_flags); + + if (ret < 0) { + mutex_unlock(&kbdev->mmu_hw_mutex); + dev_err(kbdev->dev, "%s: failed to lock MMU region or flush GPU cache", __func__); + goto undo_mappings; + } + + /* Copy memory content. + * + * It is necessary to claim the ownership of the DMA buffer for the old + * page before performing the copy, to make sure of reading a consistent + * version of its content, before copying. 
After the copy, ownership of + * the DMA buffer for the new page is given to the GPU in order to make + * the content visible to potential GPU access that may happen as soon as + * this function releases the lock on the MMU region. + */ + dma_sync_single_for_cpu(kbdev->dev, old_dma_addr, PAGE_SIZE, DMA_BIDIRECTIONAL); + memcpy(new_page, old_page, PAGE_SIZE); + dma_sync_single_for_device(kbdev->dev, new_dma_addr, PAGE_SIZE, DMA_BIDIRECTIONAL); + + /* Remap GPU virtual page. + * + * This code rests on the assumption that page migration is only enabled + * for 4 kB pages, that necessarily live in the bottom level of the MMU + * page table. For this reason, the PGD level tells us inequivocably + * whether the page being migrated is a "content page" or another PGD + * of the page table: + * + * - Bottom level implies ATE (Address Translation Entry) + * - Any other level implies PTE (Page Table Entry) + * + * The current implementation doesn't handle the case of a level 0 PGD, + * that is: the root PGD of the page table. + */ + target = &pgd_page[index]; + + /* Certain entries of a page table page encode the count of valid entries + * present in that page. So need to save & restore the count information + * when updating the PTE/ATE to point to the new page. + */ + num_of_valid_entries = kbdev->mmu_mode->get_num_valid_entries(pgd_page); + + if (level == MIDGARD_MMU_BOTTOMLEVEL) { + WARN_ON_ONCE((*target & 1UL) == 0); + *target = + kbase_mmu_create_ate(kbdev, new_phys, page_md->data.mapped.reg->flags, + level, page_md->data.mapped.reg->gpu_alloc->group_id); + } else { + u64 managed_pte; + +#ifdef CONFIG_MALI_DEBUG + /* The PTE should be pointing to the page being migrated */ + WARN_ON_ONCE(as_phys_addr_t(old_phys) != kbdev->mmu_mode->pte_to_phy_addr( + kbdev->mgm_dev->ops.mgm_pte_to_original_pte( + kbdev->mgm_dev, MGM_DEFAULT_PTE_GROUP, level, pgd_page[index]))); +#endif + kbdev->mmu_mode->entry_set_pte(&managed_pte, as_phys_addr_t(new_phys)); + *target = kbdev->mgm_dev->ops.mgm_update_gpu_pte( + kbdev->mgm_dev, MGM_DEFAULT_PTE_GROUP, level, managed_pte); + } + + kbdev->mmu_mode->set_num_valid_entries(pgd_page, num_of_valid_entries); + + /* This function always updates a single entry inside an existing PGD, + * therefore cache maintenance is necessary and affects a single entry. + */ + kbase_mmu_sync_pgd(kbdev, mmut->kctx, pgd + (index * sizeof(u64)), + kbase_dma_addr(phys_to_page(pgd)) + (index * sizeof(u64)), sizeof(u64), + KBASE_MMU_OP_FLUSH_PT); + + /* Unlock MMU region. + * + * Notice that GPUs which don't issue flush commands via GPU control + * still need an additional GPU cache flush here, this time only + * for the page table, because the function call above to sync PGDs + * won't have any effect on them. 
+ */ + spin_lock_irqsave(&kbdev->hwaccess_lock, hwaccess_flags); + if (kbdev->pm.backend.gpu_ready && mmut->kctx->as_nr >= 0) { + int as_nr = mmut->kctx->as_nr; + struct kbase_as *as = &kbdev->as[as_nr]; + + if (mmu_flush_cache_on_gpu_ctrl(kbdev)) { + ret = kbase_mmu_hw_do_unlock(kbdev, as, &op_param); + } else { + ret = kbase_gpu_cache_flush_and_busy_wait(kbdev, + GPU_COMMAND_CACHE_CLN_INV_L2); + if (!ret) + ret = kbase_mmu_hw_do_unlock_no_addr(kbdev, as, &op_param); + } + } + spin_unlock_irqrestore(&kbdev->hwaccess_lock, hwaccess_flags); + /* Releasing locks before checking the migration transaction error state */ + mutex_unlock(&kbdev->mmu_hw_mutex); + + spin_lock_irqsave(&kbdev->hwaccess_lock, hwaccess_flags); + /* Release the transition prevention in L2 by ending the transaction */ + mmu_page_migration_transaction_end(kbdev); + spin_unlock_irqrestore(&kbdev->hwaccess_lock, hwaccess_flags); + + /* Checking the final migration transaction error state */ + if (ret < 0) { + dev_err(kbdev->dev, "%s: failed to unlock MMU region.", __func__); + goto undo_mappings; + } + + /* Undertaking metadata transfer, while we are holding the mmu_lock */ + spin_lock(&page_md->migrate_lock); + if (level == MIDGARD_MMU_BOTTOMLEVEL) { + size_t page_array_index = + page_md->data.mapped.vpfn - page_md->data.mapped.reg->start_pfn; + + WARN_ON(PAGE_STATUS_GET(page_md->status) != ALLOCATED_MAPPED); + + /* Replace page in array of pages of the physical allocation. */ + page_md->data.mapped.reg->gpu_alloc->pages[page_array_index] = new_phys; + } + /* Update the new page dma_addr with the transferred metadata from the old_page */ + page_md->dma_addr = new_dma_addr; + page_md->status = PAGE_ISOLATE_SET(page_md->status, 0); + spin_unlock(&page_md->migrate_lock); + set_page_private(as_page(new_phys), (unsigned long)page_md); + /* Old page metatdata pointer cleared as it now owned by the new page */ + set_page_private(as_page(old_phys), 0); + +l2_state_defer_out: + kbase_kunmap(phys_to_page(pgd), pgd_page); +pgd_page_map_error: +get_pgd_at_level_error: +page_state_change_out: + rt_mutex_unlock(&mmut->mmu_lock); + + kbase_kunmap(as_page(new_phys), new_page); +new_page_map_error: + kbase_kunmap(as_page(old_phys), old_page); +old_page_map_error: + return ret; + +undo_mappings: + /* Unlock the MMU table and undo mappings. */ + rt_mutex_unlock(&mmut->mmu_lock); + kbase_kunmap(phys_to_page(pgd), pgd_page); + kbase_kunmap(as_page(new_phys), new_page); + kbase_kunmap(as_page(old_phys), old_page); + + return ret; +} + +static void mmu_teardown_level(struct kbase_device *kbdev, struct kbase_mmu_table *mmut, + phys_addr_t pgd, unsigned int level) { - phys_addr_t target_pgd; u64 *pgd_page; int i; - struct kbase_mmu_mode const *mmu_mode; - u64 *pgd_page_buffer; + struct memory_group_manager_device *mgm_dev = kbdev->mgm_dev; + struct kbase_mmu_mode const *mmu_mode = kbdev->mmu_mode; + u64 *pgd_page_buffer = NULL; + struct page *p = phys_to_page(pgd); lockdep_assert_held(&mmut->mmu_lock); - /* Early-out. No need to kmap to check entries for L3 PGD. */ - if (level == MIDGARD_MMU_BOTTOMLEVEL) { - kbase_mmu_free_pgd(kbdev, mmut, pgd, true); + pgd_page = kbase_kmap_atomic(p); + /* kmap_atomic should NEVER fail. 
*/ + if (WARN_ON_ONCE(pgd_page == NULL)) return; + if (level < MIDGARD_MMU_BOTTOMLEVEL) { + /* Copy the page to our preallocated buffer so that we can minimize + * kmap_atomic usage + */ + pgd_page_buffer = mmut->scratch_mem.teardown_pages.levels[level]; + memcpy(pgd_page_buffer, pgd_page, PAGE_SIZE); } - pgd_page = kmap_atomic(pfn_to_page(PFN_DOWN(pgd))); - /* kmap_atomic should NEVER fail. */ - if (WARN_ON(pgd_page == NULL)) - return; - /* Copy the page to our preallocated buffer so that we can minimize - * kmap_atomic usage + /* When page migration is enabled, kbase_region_tracker_term() would ensure + * there are no pages left mapped on the GPU for a context. Hence the count + * of valid entries is expected to be zero here. */ - pgd_page_buffer = mmut->mmu_teardown_pages[level]; - memcpy(pgd_page_buffer, pgd_page, PAGE_SIZE); - kunmap_atomic(pgd_page); + if (kbase_is_page_migration_enabled() && mmut->kctx) + WARN_ON_ONCE(kbdev->mmu_mode->get_num_valid_entries(pgd_page)); + /* Invalidate page after copying */ + mmu_mode->entries_invalidate(pgd_page, KBASE_MMU_PAGE_ENTRIES); + kbase_kunmap_atomic(pgd_page); pgd_page = pgd_page_buffer; - mmu_mode = kbdev->mmu_mode; - - for (i = 0; i < KBASE_MMU_PAGE_ENTRIES; i++) { - target_pgd = mmu_mode->pte_to_phy_addr(pgd_page[i]); - - if (target_pgd) { + if (level < MIDGARD_MMU_BOTTOMLEVEL) { + for (i = 0; i < KBASE_MMU_PAGE_ENTRIES; i++) { if (mmu_mode->pte_is_valid(pgd_page[i], level)) { - mmu_teardown_level(kbdev, mmut, - target_pgd, - level + 1); + phys_addr_t target_pgd = mmu_mode->pte_to_phy_addr( + mgm_dev->ops.mgm_pte_to_original_pte(mgm_dev, + MGM_DEFAULT_PTE_GROUP, + level, pgd_page[i])); + + mmu_teardown_level(kbdev, mmut, target_pgd, level + 1); } } } - kbase_mmu_free_pgd(kbdev, mmut, pgd, true); + kbase_mmu_free_pgd(kbdev, mmut, pgd); +} + +static void kbase_mmu_mark_non_movable(struct page *page) +{ + struct kbase_page_metadata *page_md; + + if (!kbase_is_page_migration_enabled()) + return; + + page_md = kbase_page_private(page); + + spin_lock(&page_md->migrate_lock); + page_md->status = PAGE_STATUS_SET(page_md->status, NOT_MOVABLE); + + if (IS_PAGE_MOVABLE(page_md->status)) + page_md->status = PAGE_MOVABLE_CLEAR(page_md->status); + + spin_unlock(&page_md->migrate_lock); } int kbase_mmu_init(struct kbase_device *const kbdev, struct kbase_mmu_table *const mmut, struct kbase_context *const kctx, int const group_id) { - int level; - if (WARN_ON(group_id >= MEMORY_GROUP_MANAGER_NR_GROUPS) || WARN_ON(group_id < 0)) return -EINVAL; + compiletime_assert(KBASE_MEM_ALLOC_MAX_SIZE <= (((8ull << 30) >> PAGE_SHIFT)), + "List of free PGDs may not be large enough."); + compiletime_assert(MAX_PAGES_FOR_FREE_PGDS >= MIDGARD_MMU_BOTTOMLEVEL, + "Array of MMU levels is not large enough."); + mmut->group_id = group_id; rt_mutex_init(&mmut->mmu_lock); mmut->kctx = kctx; - mmut->pgd = 0; - - /* Preallocate MMU depth of 3 pages for mmu_teardown_level to use */ - for (level = MIDGARD_MMU_TOPLEVEL; - level < MIDGARD_MMU_BOTTOMLEVEL; level++) { - mmut->mmu_teardown_pages[level] = - kmalloc(PAGE_SIZE, GFP_KERNEL); - - if (!mmut->mmu_teardown_pages[level]) { - kbase_mmu_term(kbdev, mmut); - return -ENOMEM; - } - } + mmut->pgd = KBASE_MMU_INVALID_PGD_ADDRESS; /* We allocate pages into the kbdev memory pool, then * kbase_mmu_alloc_pgd will allocate out of that pool. This is done to * avoid allocations from the kernel happening with the lock held. 
*/ - while (!mmut->pgd) { + while (mmut->pgd == KBASE_MMU_INVALID_PGD_ADDRESS) { int err; err = kbase_mem_pool_grow( &kbdev->mem_pools.small[mmut->group_id], - MIDGARD_MMU_BOTTOMLEVEL); + MIDGARD_MMU_BOTTOMLEVEL, kctx ? kctx->task : NULL); if (err) { kbase_mmu_term(kbdev, mmut); return -ENOMEM; } - rt_mutex_lock(&mmut->mmu_lock); mmut->pgd = kbase_mmu_alloc_pgd(kbdev, mmut); - rt_mutex_unlock(&mmut->mmu_lock); } + kbase_mmu_mark_non_movable(pfn_to_page(PFN_DOWN(mmut->pgd))); return 0; } void kbase_mmu_term(struct kbase_device *kbdev, struct kbase_mmu_table *mmut) { - int level; + WARN((mmut->kctx) && (mmut->kctx->as_nr != KBASEP_AS_NR_INVALID), + "kctx-%d_%d must first be scheduled out to flush GPU caches+tlbs before tearing down MMU tables", + mmut->kctx->tgid, mmut->kctx->id); - if (mmut->pgd) { + if (mmut->pgd != KBASE_MMU_INVALID_PGD_ADDRESS) { rt_mutex_lock(&mmut->mmu_lock); mmu_teardown_level(kbdev, mmut, mmut->pgd, MIDGARD_MMU_TOPLEVEL); rt_mutex_unlock(&mmut->mmu_lock); @@ -2504,20 +3851,29 @@ void kbase_mmu_term(struct kbase_device *kbdev, struct kbase_mmu_table *mmut) if (mmut->kctx) KBASE_TLSTREAM_AUX_PAGESALLOC(kbdev, mmut->kctx->id, 0); } - - for (level = MIDGARD_MMU_TOPLEVEL; - level < MIDGARD_MMU_BOTTOMLEVEL; level++) { - if (!mmut->mmu_teardown_pages[level]) - break; - kfree(mmut->mmu_teardown_pages[level]); - } } -void kbase_mmu_as_term(struct kbase_device *kbdev, int i) +void kbase_mmu_as_term(struct kbase_device *kbdev, unsigned int i) { destroy_workqueue(kbdev->as[i].pf_wq); } +void kbase_mmu_flush_pa_range(struct kbase_device *kbdev, struct kbase_context *kctx, + phys_addr_t phys, size_t size, + enum kbase_mmu_op_type flush_op) +{ +#if MALI_USE_CSF + unsigned long irq_flags; + + spin_lock_irqsave(&kbdev->hwaccess_lock, irq_flags); + if (mmu_flush_cache_on_gpu_ctrl(kbdev) && (flush_op != KBASE_MMU_OP_NONE) && + kbdev->pm.backend.gpu_ready && (!kctx || kctx->as_nr >= 0)) + mmu_flush_pa_range(kbdev, phys, size, KBASE_MMU_OP_FLUSH_PT); + spin_unlock_irqrestore(&kbdev->hwaccess_lock, irq_flags); +#endif +} + +#ifdef CONFIG_MALI_VECTOR_DUMP static size_t kbasep_mmu_dump_level(struct kbase_context *kctx, phys_addr_t pgd, int level, char ** const buffer, size_t *size_left) { @@ -2536,9 +3892,9 @@ static size_t kbasep_mmu_dump_level(struct kbase_context *kctx, phys_addr_t pgd, kbdev = kctx->kbdev; mmu_mode = kbdev->mmu_mode; - pgd_page = kmap(pfn_to_page(PFN_DOWN(pgd))); + pgd_page = kbase_kmap(pfn_to_page(PFN_DOWN(pgd))); if (!pgd_page) { - dev_warn(kbdev->dev, "%s: kmap failure\n", __func__); + dev_warn(kbdev->dev, "%s: kmap failure", __func__); return 0; } @@ -2563,13 +3919,15 @@ static size_t kbasep_mmu_dump_level(struct kbase_context *kctx, phys_addr_t pgd, for (i = 0; i < KBASE_MMU_PAGE_ENTRIES; i++) { if (mmu_mode->pte_is_valid(pgd_page[i], level)) { target_pgd = mmu_mode->pte_to_phy_addr( - pgd_page[i]); + kbdev->mgm_dev->ops.mgm_pte_to_original_pte( + kbdev->mgm_dev, MGM_DEFAULT_PTE_GROUP, + level, pgd_page[i])); dump_size = kbasep_mmu_dump_level(kctx, target_pgd, level + 1, buffer, size_left); if (!dump_size) { - kunmap(pfn_to_page(PFN_DOWN(pgd))); + kbase_kunmap(pfn_to_page(PFN_DOWN(pgd)), pgd_page); return 0; } size += dump_size; @@ -2577,7 +3935,7 @@ static size_t kbasep_mmu_dump_level(struct kbase_context *kctx, phys_addr_t pgd, } } - kunmap(pfn_to_page(PFN_DOWN(pgd))); + kbase_kunmap(pfn_to_page(PFN_DOWN(pgd)), pgd_page); return size; } @@ -2657,6 +4015,7 @@ fail_free: return NULL; } KBASE_EXPORT_TEST_API(kbase_mmu_dump); +#endif /* CONFIG_MALI_VECTOR_DUMP */ 
void kbase_mmu_bus_fault_worker(struct work_struct *data) { @@ -2689,8 +4048,7 @@ void kbase_mmu_bus_fault_worker(struct work_struct *data) #ifdef CONFIG_MALI_ARBITER_SUPPORT /* check if we still have GPU */ if (unlikely(kbase_is_gpu_removed(kbdev))) { - dev_dbg(kbdev->dev, - "%s: GPU has been removed\n", __func__); + dev_dbg(kbdev->dev, "%s: GPU has been removed", __func__); release_ctx(kbdev, kctx); atomic_dec(&kbdev->faults_pending); return; diff --git a/mali_kbase/mmu/mali_kbase_mmu.h b/mali_kbase/mmu/mali_kbase_mmu.h index 49665fb..e13e9b9 100644 --- a/mali_kbase/mmu/mali_kbase_mmu.h +++ b/mali_kbase/mmu/mali_kbase_mmu.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2019-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -25,17 +25,19 @@ #include <uapi/gpu/arm/midgard/mali_base_kernel.h> #define KBASE_MMU_PAGE_ENTRIES 512 +#define KBASE_MMU_INVALID_PGD_ADDRESS (~(phys_addr_t)0) struct kbase_context; struct kbase_mmu_table; +struct kbase_va_region; /** * enum kbase_caller_mmu_sync_info - MMU-synchronous caller info. * A pointer to this type is passed down from the outer-most callers in the kbase * module - where the information resides as to the synchronous / asynchronous * nature of the call flow, with respect to MMU operations. ie - does the call flow relate to - * existing GPU work does it come from requests (like ioctl) from user-space, power management, - * etc. + * existing GPU work or does it come from requests (like ioctl) from user-space, power + * management, etc. * * @CALLER_MMU_UNSET_SYNCHRONICITY: default value must be invalid to avoid accidental choice * of a 'valid' value @@ -49,6 +51,26 @@ enum kbase_caller_mmu_sync_info { }; /** + * enum kbase_mmu_op_type - enum for MMU operations + * @KBASE_MMU_OP_NONE: To help catch uninitialized struct + * @KBASE_MMU_OP_FIRST: The lower boundary of enum + * @KBASE_MMU_OP_LOCK: Lock memory region + * @KBASE_MMU_OP_UNLOCK: Unlock memory region + * @KBASE_MMU_OP_FLUSH_PT: Flush page table (CLN+INV L2 only) + * @KBASE_MMU_OP_FLUSH_MEM: Flush memory (CLN+INV L2+LSC) + * @KBASE_MMU_OP_COUNT: The upper boundary of enum + */ +enum kbase_mmu_op_type { + KBASE_MMU_OP_NONE = 0, /* Must be zero */ + KBASE_MMU_OP_FIRST, /* Must be the first non-zero op */ + KBASE_MMU_OP_LOCK = KBASE_MMU_OP_FIRST, + KBASE_MMU_OP_UNLOCK, + KBASE_MMU_OP_FLUSH_PT, + KBASE_MMU_OP_FLUSH_MEM, + KBASE_MMU_OP_COUNT /* Must be the last in enum */ +}; + +/** * kbase_mmu_as_init() - Initialising GPU address space object. * * @kbdev: The kbase device structure for the device (must be a valid pointer). @@ -59,7 +81,7 @@ enum kbase_caller_mmu_sync_info { * * Return: 0 on success and non-zero value on failure. */ -int kbase_mmu_as_init(struct kbase_device *kbdev, int i); +int kbase_mmu_as_init(struct kbase_device *kbdev, unsigned int i); /** * kbase_mmu_as_term() - Terminate address space object. @@ -70,7 +92,7 @@ int kbase_mmu_as_init(struct kbase_device *kbdev, int i); * This is called upon device termination to destroy * the address space object of the device. 
*/ -void kbase_mmu_as_term(struct kbase_device *kbdev, int i); +void kbase_mmu_as_term(struct kbase_device *kbdev, unsigned int i); /** * kbase_mmu_init - Initialise an object representing GPU page tables @@ -129,27 +151,143 @@ void kbase_mmu_term(struct kbase_device *kbdev, struct kbase_mmu_table *mmut); u64 kbase_mmu_create_ate(struct kbase_device *kbdev, struct tagged_addr phy, unsigned long flags, int level, int group_id); -int kbase_mmu_insert_pages_no_flush(struct kbase_device *kbdev, - struct kbase_mmu_table *mmut, - const u64 start_vpfn, - struct tagged_addr *phys, size_t nr, - unsigned long flags, int group_id); -int kbase_mmu_insert_pages(struct kbase_device *kbdev, - struct kbase_mmu_table *mmut, u64 vpfn, - struct tagged_addr *phys, size_t nr, - unsigned long flags, int as_nr, int group_id, - enum kbase_caller_mmu_sync_info mmu_sync_info); -int kbase_mmu_insert_single_page(struct kbase_context *kctx, u64 vpfn, - struct tagged_addr phys, size_t nr, - unsigned long flags, int group_id, - enum kbase_caller_mmu_sync_info mmu_sync_info); - -int kbase_mmu_teardown_pages(struct kbase_device *kbdev, - struct kbase_mmu_table *mmut, u64 vpfn, - size_t nr, int as_nr); +int kbase_mmu_insert_pages_no_flush(struct kbase_device *kbdev, struct kbase_mmu_table *mmut, + u64 vpfn, struct tagged_addr *phys, size_t nr, + unsigned long flags, int group_id, u64 *dirty_pgds, + struct kbase_va_region *reg); +int kbase_mmu_insert_pages(struct kbase_device *kbdev, struct kbase_mmu_table *mmut, u64 vpfn, + struct tagged_addr *phys, size_t nr, unsigned long flags, int as_nr, + int group_id, enum kbase_caller_mmu_sync_info mmu_sync_info, + struct kbase_va_region *reg); + +/** + * kbase_mmu_insert_pages_skip_status_update - Map 'nr' pages pointed to by 'phys' + * at GPU PFN 'vpfn' for GPU address space number 'as_nr'. + * + * @kbdev: Instance of GPU platform device, allocated from the probe method. + * @mmut: GPU page tables. + * @vpfn: Start page frame number of the GPU virtual pages to map. + * @phys: Physical address of the page to be mapped. + * @nr: The number of pages to map. + * @flags: Bitmask of attributes of the GPU memory region being mapped. + * @as_nr: The GPU address space number. + * @group_id: The physical memory group in which the page was allocated. + * @mmu_sync_info: MMU-synchronous caller info. + * @reg: The region whose physical allocation is to be mapped. + * + * Similar to kbase_mmu_insert_pages() but skips updating each pages metadata + * for page migration. + * + * Return: 0 if successful, otherwise a negative error code. 
+ */ +int kbase_mmu_insert_pages_skip_status_update(struct kbase_device *kbdev, + struct kbase_mmu_table *mmut, u64 vpfn, + struct tagged_addr *phys, size_t nr, + unsigned long flags, int as_nr, int group_id, + enum kbase_caller_mmu_sync_info mmu_sync_info, + struct kbase_va_region *reg); +int kbase_mmu_insert_aliased_pages(struct kbase_device *kbdev, struct kbase_mmu_table *mmut, + u64 vpfn, struct tagged_addr *phys, size_t nr, + unsigned long flags, int as_nr, int group_id, + enum kbase_caller_mmu_sync_info mmu_sync_info, + struct kbase_va_region *reg); +int kbase_mmu_insert_single_imported_page(struct kbase_context *kctx, u64 vpfn, + struct tagged_addr phys, size_t nr, unsigned long flags, + int group_id, + enum kbase_caller_mmu_sync_info mmu_sync_info); +int kbase_mmu_insert_single_aliased_page(struct kbase_context *kctx, u64 vpfn, + struct tagged_addr phys, size_t nr, unsigned long flags, + int group_id, + enum kbase_caller_mmu_sync_info mmu_sync_info); + +int kbase_mmu_teardown_pages(struct kbase_device *kbdev, struct kbase_mmu_table *mmut, u64 vpfn, + struct tagged_addr *phys, size_t nr_phys_pages, size_t nr_virt_pages, + int as_nr); +int kbase_mmu_teardown_imported_pages(struct kbase_device *kbdev, struct kbase_mmu_table *mmut, + u64 vpfn, struct tagged_addr *phys, size_t nr_phys_pages, + size_t nr_virt_pages, int as_nr); +#define kbase_mmu_teardown_firmware_pages(kbdev, mmut, vpfn, phys, nr_phys_pages, nr_virt_pages, \ + as_nr) \ + kbase_mmu_teardown_imported_pages(kbdev, mmut, vpfn, phys, nr_phys_pages, nr_virt_pages, \ + as_nr) + int kbase_mmu_update_pages(struct kbase_context *kctx, u64 vpfn, struct tagged_addr *phys, size_t nr, unsigned long flags, int const group_id); +#if MALI_USE_CSF +/** + * kbase_mmu_update_csf_mcu_pages - Update MCU mappings with changes of phys and flags + * + * @kbdev: Pointer to kbase device. + * @vpfn: Virtual PFN (Page Frame Number) of the first page to update + * @phys: Pointer to the array of tagged physical addresses of the physical + * pages that are pointed to by the page table entries (that need to + * be updated). + * @nr: Number of pages to update + * @flags: Flags + * @group_id: The physical memory group in which the page was allocated. + * Valid range is 0..(MEMORY_GROUP_MANAGER_NR_GROUPS-1). + * + * Return: 0 on success, otherwise an error code. + */ +int kbase_mmu_update_csf_mcu_pages(struct kbase_device *kbdev, u64 vpfn, struct tagged_addr *phys, + size_t nr, unsigned long flags, int const group_id); +#endif + +/** + * kbase_mmu_migrate_page - Migrate GPU mappings and content between memory pages + * + * @old_phys: Old physical page to be replaced. + * @new_phys: New physical page used to replace old physical page. + * @old_dma_addr: DMA address of the old page. + * @new_dma_addr: DMA address of the new page. + * @level: MMU page table level of the provided PGD. + * + * The page migration process is made of 2 big steps: + * + * 1) Copy the content of the old page to the new page. + * 2) Remap the virtual page, that is: replace either the ATE (if the old page + * was a regular page) or the PTE (if the old page was used as a PGD) in the + * MMU page table with the new page. + * + * During the process, the MMU region is locked to prevent GPU access to the + * virtual memory page that is being remapped. + * + * Before copying the content of the old page to the new page and while the + * MMU region is locked, a GPU cache flush is performed to make sure that + * pending GPU writes are finalized to the old page before copying. 
+ * That is necessary because otherwise there's a risk that GPU writes might + * be finalized to the old page, and not new page, after migration. + * The MMU region is unlocked only at the end of the migration operation. + * + * Return: 0 on success, otherwise an error code. + */ +int kbase_mmu_migrate_page(struct tagged_addr old_phys, struct tagged_addr new_phys, + dma_addr_t old_dma_addr, dma_addr_t new_dma_addr, int level); + +/** + * kbase_mmu_flush_pa_range() - Flush physical address range from the GPU caches + * + * @kbdev: Instance of GPU platform device, allocated from the probe method. + * @kctx: Pointer to kbase context, it can be NULL if the physical address + * range is not associated with User created context. + * @phys: Starting address of the physical range to start the operation on. + * @size: Number of bytes to work on. + * @flush_op: Type of cache flush operation to perform. + * + * Issue a cache flush physical range command. This function won't perform any + * flush if the GPU doesn't support FLUSH_PA_RANGE command. The flush would be + * performed only if the context has a JASID assigned to it. + * This function is basically a wrapper for kbase_gpu_cache_flush_pa_range_and_busy_wait(). + */ +void kbase_mmu_flush_pa_range(struct kbase_device *kbdev, struct kbase_context *kctx, + phys_addr_t phys, size_t size, + enum kbase_mmu_op_type flush_op); +void kbase_mmu_flush_invalidate_update_pages(struct kbase_device *kbdev, struct kbase_context *kctx, u64 vpfn, + size_t nr, u64 dirty_pgds); +int kbase_mmu_update_pages_no_flush(struct kbase_device *kbdev, struct kbase_mmu_table *mmut, + u64 vpfn, struct tagged_addr *phys, size_t nr, + unsigned long flags, int group_id, u64 *dirty_pgds); /** * kbase_mmu_bus_fault_interrupt - Process a bus fault interrupt. diff --git a/mali_kbase/mmu/mali_kbase_mmu_hw.h b/mali_kbase/mmu/mali_kbase_mmu_hw.h index 31658e0..49e050e 100644 --- a/mali_kbase/mmu/mali_kbase_mmu_hw.h +++ b/mali_kbase/mmu/mali_kbase_mmu_hw.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2014-2015, 2018-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2014-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -55,32 +55,14 @@ enum kbase_mmu_fault_type { }; /** - * enum kbase_mmu_op_type - enum for MMU operations - * @KBASE_MMU_OP_NONE: To help catch uninitialized struct - * @KBASE_MMU_OP_FIRST: The lower boundary of enum - * @KBASE_MMU_OP_LOCK: Lock memory region - * @KBASE_MMU_OP_UNLOCK: Unlock memory region - * @KBASE_MMU_OP_FLUSH_PT: Flush page table (CLN+INV L2 only) - * @KBASE_MMU_OP_FLUSH_MEM: Flush memory (CLN+INV L2+LSC) - * @KBASE_MMU_OP_COUNT: The upper boundary of enum - */ -enum kbase_mmu_op_type { - KBASE_MMU_OP_NONE = 0, /* Must be zero */ - KBASE_MMU_OP_FIRST, /* Must be the first non-zero op */ - KBASE_MMU_OP_LOCK = KBASE_MMU_OP_FIRST, - KBASE_MMU_OP_UNLOCK, - KBASE_MMU_OP_FLUSH_PT, - KBASE_MMU_OP_FLUSH_MEM, - KBASE_MMU_OP_COUNT /* Must be the last in enum */ -}; - -/** - * struct kbase_mmu_hw_op_param - parameters for kbase_mmu_hw_do_operation() - * @vpfn: MMU Virtual Page Frame Number to start the operation on. - * @nr: Number of pages to work on. - * @op: Operation type (written to ASn_COMMAND). - * @kctx_id: Kernel context ID for MMU command tracepoint - * @mmu_sync_info: Indicates whether this call is synchronous wrt MMU ops. 
+ * struct kbase_mmu_hw_op_param - parameters for kbase_mmu_hw_do_* functions + * @vpfn: MMU Virtual Page Frame Number to start the operation on. + * @nr: Number of pages to work on. + * @op: Operation type (written to AS_COMMAND). + * @kctx_id: Kernel context ID for MMU command tracepoint. + * @mmu_sync_info: Indicates whether this call is synchronous wrt MMU ops. + * @flush_skip_levels: Page table levels to skip flushing. (Only + * applicable if GPU supports feature) */ struct kbase_mmu_hw_op_param { u64 vpfn; @@ -88,6 +70,7 @@ struct kbase_mmu_hw_op_param { enum kbase_mmu_op_type op; u32 kctx_id; enum kbase_caller_mmu_sync_info mmu_sync_info; + u64 flush_skip_levels; }; /** @@ -102,18 +85,120 @@ void kbase_mmu_hw_configure(struct kbase_device *kbdev, struct kbase_as *as); /** - * kbase_mmu_hw_do_operation - Issue an operation to the MMU. - * @kbdev: kbase device to issue the MMU operation on. - * @as: address space to issue the MMU operation on. - * @op_param: parameters for the operation. + * kbase_mmu_hw_do_lock - Issue LOCK command to the MMU and program + * the LOCKADDR register. + * + * @kbdev: Kbase device to issue the MMU operation on. + * @as: Address space to issue the MMU operation on. + * @op_param: Pointer to struct containing information about the MMU + * operation to perform. + * + * hwaccess_lock needs to be held when calling this function. + * + * Return: 0 if issuing the command was successful, otherwise an error code. + */ +int kbase_mmu_hw_do_lock(struct kbase_device *kbdev, struct kbase_as *as, + const struct kbase_mmu_hw_op_param *op_param); + +/** + * kbase_mmu_hw_do_unlock_no_addr - Issue UNLOCK command to the MMU without + * programming the LOCKADDR register and wait + * for it to complete before returning. + * + * @kbdev: Kbase device to issue the MMU operation on. + * @as: Address space to issue the MMU operation on. + * @op_param: Pointer to struct containing information about the MMU + * operation to perform. + * + * This function should be called for GPU where GPU command is used to flush + * the cache(s) instead of MMU command. + * + * Return: 0 if issuing the command was successful, otherwise an error code. + */ +int kbase_mmu_hw_do_unlock_no_addr(struct kbase_device *kbdev, struct kbase_as *as, + const struct kbase_mmu_hw_op_param *op_param); + +/** + * kbase_mmu_hw_do_unlock - Issue UNLOCK command to the MMU and wait for it + * to complete before returning. + * + * @kbdev: Kbase device to issue the MMU operation on. + * @as: Address space to issue the MMU operation on. + * @op_param: Pointer to struct containing information about the MMU + * operation to perform. + * + * Return: 0 if issuing the command was successful, otherwise an error code. + */ +int kbase_mmu_hw_do_unlock(struct kbase_device *kbdev, struct kbase_as *as, + const struct kbase_mmu_hw_op_param *op_param); +/** + * kbase_mmu_hw_do_lock - Issue a LOCK operation to the MMU. * - * Issue an operation (MMU invalidate, MMU flush, etc) on the address space that - * is associated with the provided kbase_context over the specified range + * @kbdev: Kbase device to issue the MMU operation on. + * @as: Address space to issue the MMU operation on. + * @op_param: Pointer to struct containing information about the MMU + * operation to perform. + * + * Context: Acquires the hwaccess_lock, expects the caller to hold the mmu_hw_mutex * * Return: Zero if the operation was successful, non-zero otherwise. 
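A minimal sketch of how a caller might drive the split lock/flush/unlock interface using the parameter block documented above; the locking context and the call site are assumptions, not code taken from this patch:

/* Illustrative only: flush a range of nr pages starting at vpfn for one
 * address space, using the per-operation parameter block. Caller-side
 * locking (hwaccess_lock / mmu_hw_mutex) is elided.
 */
static int example_flush_range(struct kbase_device *kbdev, struct kbase_as *as,
                               struct kbase_context *kctx, u64 vpfn, u32 nr)
{
        const struct kbase_mmu_hw_op_param op_param = {
                .vpfn = vpfn,
                .nr = nr,
                .op = KBASE_MMU_OP_FLUSH_PT,
                .kctx_id = kctx ? kctx->id : 0xFFFFFFFF,
                .mmu_sync_info = CALLER_MMU_SYNC,
                .flush_skip_levels = 0, /* do not skip any page table levels */
        };

        /* On GPUs that flush via the GPU control interface the sequence is
         * LOCK, then a GPU cache flush, then UNLOCK, wrapped by this helper.
         */
        return kbase_mmu_hw_do_flush_on_gpu_ctrl(kbdev, as, &op_param);
}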
*/ -int kbase_mmu_hw_do_operation(struct kbase_device *kbdev, struct kbase_as *as, - struct kbase_mmu_hw_op_param *op_param); +int kbase_mmu_hw_do_lock(struct kbase_device *kbdev, struct kbase_as *as, + const struct kbase_mmu_hw_op_param *op_param); + +/** + * kbase_mmu_hw_do_flush - Issue a flush operation to the MMU. + * + * @kbdev: Kbase device to issue the MMU operation on. + * @as: Address space to issue the MMU operation on. + * @op_param: Pointer to struct containing information about the MMU + * operation to perform. + * + * Issue a flush operation on the address space as per the information + * specified inside @op_param. This function should not be called for + * GPUs where MMU command to flush the cache(s) is deprecated. + * mmu_hw_mutex needs to be held when calling this function. + * + * Return: 0 if the operation was successful, non-zero otherwise. + */ +int kbase_mmu_hw_do_flush(struct kbase_device *kbdev, struct kbase_as *as, + const struct kbase_mmu_hw_op_param *op_param); + +/** + * kbase_mmu_hw_do_flush_locked - Issue a flush operation to the MMU. + * + * @kbdev: Kbase device to issue the MMU operation on. + * @as: Address space to issue the MMU operation on. + * @op_param: Pointer to struct containing information about the MMU + * operation to perform. + * + * Issue a flush operation on the address space as per the information + * specified inside @op_param. This function should not be called for + * GPUs where MMU command to flush the cache(s) is deprecated. + * Both mmu_hw_mutex and hwaccess_lock need to be held when calling this + * function. + * + * Return: 0 if the operation was successful, non-zero otherwise. + */ +int kbase_mmu_hw_do_flush_locked(struct kbase_device *kbdev, struct kbase_as *as, + const struct kbase_mmu_hw_op_param *op_param); + +/** + * kbase_mmu_hw_do_flush_on_gpu_ctrl - Issue a flush operation to the MMU. + * + * @kbdev: Kbase device to issue the MMU operation on. + * @as: Address space to issue the MMU operation on. + * @op_param: Pointer to struct containing information about the MMU + * operation to perform. + * + * Issue a flush operation on the address space as per the information + * specified inside @op_param. GPU command is used to flush the cache(s) + * instead of the MMU command. + * + * Return: 0 if the operation was successful, non-zero otherwise. + */ +int kbase_mmu_hw_do_flush_on_gpu_ctrl(struct kbase_device *kbdev, struct kbase_as *as, + const struct kbase_mmu_hw_op_param *op_param); /** * kbase_mmu_hw_clear_fault - Clear a fault that has been previously reported by diff --git a/mali_kbase/mmu/mali_kbase_mmu_hw_direct.c b/mali_kbase/mmu/mali_kbase_mmu_hw_direct.c index cdf9a84..d5411bd 100644 --- a/mali_kbase/mmu/mali_kbase_mmu_hw_direct.c +++ b/mali_kbase/mmu/mali_kbase_mmu_hw_direct.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2014-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2014-2023 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -24,15 +24,40 @@ #include <mali_kbase.h> #include <mali_kbase_ctx_sched.h> #include <mali_kbase_mem.h> +#include <mali_kbase_reset_gpu.h> #include <mmu/mali_kbase_mmu_hw.h> #include <tl/mali_kbase_tracepoints.h> +#include <linux/delay.h> + +#if MALI_USE_CSF +/** + * mmu_has_flush_skip_pgd_levels() - Check if the GPU has the feature + * AS_LOCKADDR_FLUSH_SKIP_LEVELS + * + * @gpu_props: GPU properties for the GPU instance. + * + * This function returns whether a cache flush can apply the skip flags of + * AS_LOCKADDR_FLUSH_SKIP_LEVELS. + * + * Return: True if cache flush has the said feature. + */ +static bool mmu_has_flush_skip_pgd_levels(struct kbase_gpu_props const *gpu_props) +{ + u32 const signature = + gpu_props->props.raw_props.gpu_id & (GPU_ID2_ARCH_MAJOR | GPU_ID2_ARCH_REV); + + return signature >= (u32)GPU_ID2_PRODUCT_MAKE(12, 0, 4, 0); +} +#endif /** * lock_region() - Generate lockaddr to lock memory region in MMU - * @gpu_props: GPU properties for finding the MMU lock region size - * @pfn: Starting page frame number of the region to lock - * @num_pages: Number of pages to lock. It must be greater than 0. - * @lockaddr: Address and size of memory region to lock + * + * @gpu_props: GPU properties for finding the MMU lock region size. + * @lockaddr: Address and size of memory region to lock. + * @op_param: Pointer to a struct containing the starting page frame number of + * the region to lock, the number of pages to lock and page table + * levels to skip when flushing (if supported). * * The lockaddr value is a combination of the starting address and * the size of the region that encompasses all the memory pages to lock. @@ -63,14 +88,14 @@ * * Return: 0 if success, or an error code on failure. */ -static int lock_region(struct kbase_gpu_props const *gpu_props, u64 pfn, u32 num_pages, - u64 *lockaddr) +static int lock_region(struct kbase_gpu_props const *gpu_props, u64 *lockaddr, + const struct kbase_mmu_hw_op_param *op_param) { - const u64 lockaddr_base = pfn << PAGE_SHIFT; - const u64 lockaddr_end = ((pfn + num_pages) << PAGE_SHIFT) - 1; + const u64 lockaddr_base = op_param->vpfn << PAGE_SHIFT; + const u64 lockaddr_end = ((op_param->vpfn + op_param->nr) << PAGE_SHIFT) - 1; u64 lockaddr_size_log2; - if (num_pages == 0) + if (op_param->nr == 0) return -EINVAL; /* The MMU lock region is a self-aligned region whose size @@ -101,7 +126,7 @@ static int lock_region(struct kbase_gpu_props const *gpu_props, u64 pfn, u32 num * therefore the highest bit that differs is bit #16 * and the region size (as a logarithm) is 16 + 1 = 17, i.e. 128 kB. */ - lockaddr_size_log2 = fls(lockaddr_base ^ lockaddr_end); + lockaddr_size_log2 = fls64(lockaddr_base ^ lockaddr_end); /* Cap the size against minimum and maximum values allowed. */ if (lockaddr_size_log2 > KBASE_LOCK_REGION_MAX_SIZE_LOG2) @@ -123,40 +148,69 @@ static int lock_region(struct kbase_gpu_props const *gpu_props, u64 pfn, u32 num *lockaddr = lockaddr_base & ~((1ull << lockaddr_size_log2) - 1); *lockaddr |= lockaddr_size_log2 - 1; +#if MALI_USE_CSF + if (mmu_has_flush_skip_pgd_levels(gpu_props)) + *lockaddr = + AS_LOCKADDR_FLUSH_SKIP_LEVELS_SET(*lockaddr, op_param->flush_skip_levels); +#endif + return 0; } -static int wait_ready(struct kbase_device *kbdev, - unsigned int as_nr) +/** + * wait_ready() - Wait for previously issued MMU command to complete. 
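As a stand-alone illustration of the self-aligned LOCKADDR encoding computed by the reworked lock_region() above, the following sketch mirrors the described arithmetic; the page shift and the minimum/maximum size caps are assumed values, and __builtin_clzll() stands in for the kernel's fls64():

#include <stdint.h>

/* Illustrative only: the lock region size is 2^n, where n is derived from the
 * highest bit that differs between the first and last byte addresses of the
 * range; the base is aligned down to that size and the low-order size field
 * carries (n - 1).
 */
static uint64_t example_lockaddr(uint64_t vpfn, uint32_t num_pages)
{
        const unsigned int page_shift = 12;    /* assumed 4 kB pages */
        const unsigned int min_size_log2 = 15; /* assumed lower cap */
        const unsigned int max_size_log2 = 48; /* assumed upper cap; the real
                                                * helper may reject larger sizes */
        uint64_t base, end;
        unsigned int size_log2;

        if (num_pages == 0)
                return 0; /* the real helper rejects empty ranges */

        base = vpfn << page_shift;
        end = ((vpfn + num_pages) << page_shift) - 1;
        size_log2 = 64 - __builtin_clzll(base ^ end); /* fls64(base ^ end) */

        if (size_log2 < min_size_log2)
                size_log2 = min_size_log2;
        if (size_log2 > max_size_log2)
                size_log2 = max_size_log2;

        return (base & ~((1ull << size_log2) - 1)) | (size_log2 - 1);
}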
+ * + * @kbdev: Kbase device to wait for a MMU command to complete. + * @as_nr: Address space to wait for a MMU command to complete. + * + * Reset GPU if the wait for previously issued command fails. + * + * Return: 0 on successful completion. negative error on failure. + */ +static int wait_ready(struct kbase_device *kbdev, unsigned int as_nr) { - unsigned int max_loops = KBASE_AS_INACTIVE_MAX_LOOPS; + const ktime_t wait_loop_start = ktime_get_raw(); + const u32 mmu_as_inactive_wait_time_ms = kbdev->mmu_or_gpu_cache_op_wait_time_ms; + s64 diff; - /* Wait for the MMU status to indicate there is no active command. */ - while (--max_loops && - kbase_reg_read(kbdev, MMU_AS_REG(as_nr, AS_STATUS)) & - AS_STATUS_AS_ACTIVE) { - ; - } + if (unlikely(kbdev->mmu_unresponsive)) + return -EBUSY; - if (WARN_ON_ONCE(max_loops == 0)) { - dev_err(kbdev->dev, - "AS_ACTIVE bit stuck for as %u, might be caused by slow/unstable GPU clock or possible faulty FPGA connector", - as_nr); - return -1; - } + do { + unsigned int i; - return 0; + for (i = 0; i < 1000; i++) { + /* Wait for the MMU status to indicate there is no active command */ + if (!(kbase_reg_read(kbdev, MMU_STAGE1_REG(MMU_AS_REG(as_nr, AS_STATUS))) & + AS_STATUS_AS_ACTIVE)) + return 0; + } + + diff = ktime_to_ms(ktime_sub(ktime_get_raw(), wait_loop_start)); + } while (diff < mmu_as_inactive_wait_time_ms); + + dev_err(kbdev->dev, + "AS_ACTIVE bit stuck for as %u. Might be caused by unstable GPU clk/pwr or faulty system", + as_nr); + kbdev->mmu_unresponsive = true; + if (kbase_prepare_to_reset_gpu_locked(kbdev, RESET_FLAGS_HWC_UNRECOVERABLE_ERROR)) + kbase_reset_gpu_locked(kbdev); + + return -ETIMEDOUT; } static int write_cmd(struct kbase_device *kbdev, int as_nr, u32 cmd) { - int status; - /* write AS_COMMAND when MMU is ready to accept another command */ - status = wait_ready(kbdev, as_nr); - if (status == 0) - kbase_reg_write(kbdev, MMU_AS_REG(as_nr, AS_COMMAND), cmd); - else { + const int status = wait_ready(kbdev, as_nr); + + if (likely(status == 0)) + kbase_reg_write(kbdev, MMU_STAGE1_REG(MMU_AS_REG(as_nr, AS_COMMAND)), cmd); + else if (status == -EBUSY) { + dev_dbg(kbdev->dev, + "Skipped the wait for AS_ACTIVE bit for as %u, before sending MMU command %u", + as_nr, cmd); + } else { dev_err(kbdev->dev, "Wait for AS_ACTIVE bit failed for as %u, before sending MMU command %u", as_nr, cmd); @@ -165,6 +219,131 @@ static int write_cmd(struct kbase_device *kbdev, int as_nr, u32 cmd) return status; } +#if MALI_USE_CSF +static int wait_l2_power_trans_complete(struct kbase_device *kbdev) +{ + const ktime_t wait_loop_start = ktime_get_raw(); + const u32 pwr_trans_wait_time_ms = kbdev->mmu_or_gpu_cache_op_wait_time_ms; + s64 diff; + u64 value; + + lockdep_assert_held(&kbdev->hwaccess_lock); + + do { + unsigned int i; + + for (i = 0; i < 1000; i++) { + value = kbase_reg_read(kbdev, GPU_CONTROL_REG(L2_PWRTRANS_HI)); + value <<= 32; + value |= kbase_reg_read(kbdev, GPU_CONTROL_REG(L2_PWRTRANS_LO)); + + if (!value) + return 0; + } + + diff = ktime_to_ms(ktime_sub(ktime_get_raw(), wait_loop_start)); + } while (diff < pwr_trans_wait_time_ms); + + dev_warn(kbdev->dev, "L2_PWRTRANS %016llx set for too long", value); + + if (kbase_prepare_to_reset_gpu_locked(kbdev, RESET_FLAGS_NONE)) + kbase_reset_gpu_locked(kbdev); + + return -ETIMEDOUT; +} + +#if !IS_ENABLED(CONFIG_MALI_NO_MALI) +static int wait_cores_power_trans_complete(struct kbase_device *kbdev) +{ +#define WAIT_TIMEOUT 50000 /* 50ms timeout */ +#define DELAY_TIME_IN_US 1 + const int max_iterations = 
WAIT_TIMEOUT; + int loop; + + lockdep_assert_held(&kbdev->hwaccess_lock); + + for (loop = 0; loop < max_iterations; loop++) { + u32 lo = + kbase_reg_read(kbdev, GPU_CONTROL_REG(SHADER_PWRTRANS_LO)); + u32 hi = + kbase_reg_read(kbdev, GPU_CONTROL_REG(SHADER_PWRTRANS_HI)); + + if (!lo && !hi) + break; + + udelay(DELAY_TIME_IN_US); + } + + if (loop == max_iterations) { + dev_warn(kbdev->dev, "SHADER_PWRTRANS %08x%08x set for too long", + kbase_reg_read(kbdev, GPU_CONTROL_REG(SHADER_PWRTRANS_HI)), + kbase_reg_read(kbdev, GPU_CONTROL_REG(SHADER_PWRTRANS_LO))); + return -ETIMEDOUT; + } + + return 0; +} + +/** + * apply_hw_issue_GPU2019_3901_wa - Apply WA for the HW issue GPU2019_3901 + * + * @kbdev: Kbase device to issue the MMU operation on. + * @mmu_cmd: Pointer to the variable contain the value of MMU command + * that needs to be sent to flush the L2 cache and do an + * implicit unlock. + * @as_nr: Address space number for which MMU command needs to be + * sent. + * + * This function ensures that the flush of LSC is not missed for the pages that + * were unmapped from the GPU, due to the power down transition of shader cores. + * + * Return: 0 if the WA was successfully applied, non-zero otherwise. + */ +static int apply_hw_issue_GPU2019_3901_wa(struct kbase_device *kbdev, u32 *mmu_cmd, + unsigned int as_nr) +{ + int ret = 0; + + lockdep_assert_held(&kbdev->hwaccess_lock); + + /* Check if L2 is OFF. The cores also must be OFF if L2 is not up, so + * the workaround can be safely skipped. + */ + if (kbdev->pm.backend.l2_state != KBASE_L2_OFF) { + if (unlikely(*mmu_cmd != AS_COMMAND_FLUSH_MEM)) { + dev_warn(kbdev->dev, "Unexpected MMU command(%u) received", *mmu_cmd); + return -EINVAL; + } + + /* Wait for the LOCK MMU command to complete, issued by the caller */ + ret = wait_ready(kbdev, as_nr); + if (unlikely(ret)) + return ret; + + ret = kbase_gpu_cache_flush_and_busy_wait(kbdev, + GPU_COMMAND_CACHE_CLN_INV_LSC); + if (unlikely(ret)) + return ret; + + ret = wait_cores_power_trans_complete(kbdev); + if (unlikely(ret)) { + if (kbase_prepare_to_reset_gpu_locked(kbdev, + RESET_FLAGS_HWC_UNRECOVERABLE_ERROR)) + kbase_reset_gpu_locked(kbdev); + return ret; + } + + /* As LSC is guaranteed to have been flushed we can use FLUSH_PT + * MMU command to only flush the L2. 
+ */ + *mmu_cmd = AS_COMMAND_FLUSH_PT; + } + + return ret; +} +#endif /* !IS_ENABLED(CONFIG_MALI_NO_MALI) */ +#endif /* MALI_USE_CSF */ + void kbase_mmu_hw_configure(struct kbase_device *kbdev, struct kbase_as *as) { struct kbase_mmu_setup *current_setup = &as->current_setup; @@ -195,19 +374,18 @@ void kbase_mmu_hw_configure(struct kbase_device *kbdev, struct kbase_as *as) transcfg = (transcfg | AS_TRANSCFG_PTW_SH_OS); } - kbase_reg_write(kbdev, MMU_AS_REG(as->number, AS_TRANSCFG_LO), - transcfg); - kbase_reg_write(kbdev, MMU_AS_REG(as->number, AS_TRANSCFG_HI), + kbase_reg_write(kbdev, MMU_STAGE1_REG(MMU_AS_REG(as->number, AS_TRANSCFG_LO)), transcfg); + kbase_reg_write(kbdev, MMU_STAGE1_REG(MMU_AS_REG(as->number, AS_TRANSCFG_HI)), (transcfg >> 32) & 0xFFFFFFFFUL); - kbase_reg_write(kbdev, MMU_AS_REG(as->number, AS_TRANSTAB_LO), + kbase_reg_write(kbdev, MMU_STAGE1_REG(MMU_AS_REG(as->number, AS_TRANSTAB_LO)), current_setup->transtab & 0xFFFFFFFFUL); - kbase_reg_write(kbdev, MMU_AS_REG(as->number, AS_TRANSTAB_HI), + kbase_reg_write(kbdev, MMU_STAGE1_REG(MMU_AS_REG(as->number, AS_TRANSTAB_HI)), (current_setup->transtab >> 32) & 0xFFFFFFFFUL); - kbase_reg_write(kbdev, MMU_AS_REG(as->number, AS_MEMATTR_LO), + kbase_reg_write(kbdev, MMU_STAGE1_REG(MMU_AS_REG(as->number, AS_MEMATTR_LO)), current_setup->memattr & 0xFFFFFFFFUL); - kbase_reg_write(kbdev, MMU_AS_REG(as->number, AS_MEMATTR_HI), + kbase_reg_write(kbdev, MMU_STAGE1_REG(MMU_AS_REG(as->number, AS_MEMATTR_HI)), (current_setup->memattr >> 32) & 0xFFFFFFFFUL); KBASE_TLSTREAM_TL_ATTRIB_AS_CONFIG(kbdev, as, @@ -222,93 +400,302 @@ void kbase_mmu_hw_configure(struct kbase_device *kbdev, struct kbase_as *as) #endif } -int kbase_mmu_hw_do_operation(struct kbase_device *kbdev, struct kbase_as *as, - struct kbase_mmu_hw_op_param *op_param) +/** + * mmu_command_instr - Record an MMU command for instrumentation purposes. + * + * @kbdev: Kbase device used to issue MMU operation on. + * @kctx_id: Kernel context ID for MMU command tracepoint. + * @cmd: Command issued to the MMU. + * @lock_addr: Address of memory region locked for the operation. + * @mmu_sync_info: Indicates whether this call is synchronous wrt MMU ops. + */ +static void mmu_command_instr(struct kbase_device *kbdev, u32 kctx_id, u32 cmd, u64 lock_addr, + enum kbase_caller_mmu_sync_info mmu_sync_info) +{ + u64 lock_addr_base = AS_LOCKADDR_LOCKADDR_BASE_GET(lock_addr); + u32 lock_addr_size = AS_LOCKADDR_LOCKADDR_SIZE_GET(lock_addr); + + bool is_mmu_synchronous = (mmu_sync_info == CALLER_MMU_SYNC); + + KBASE_TLSTREAM_AUX_MMU_COMMAND(kbdev, kctx_id, cmd, is_mmu_synchronous, lock_addr_base, + lock_addr_size); +} + +/* Helper function to program the LOCKADDR register before LOCK/UNLOCK command + * is issued. + */ +static int mmu_hw_set_lock_addr(struct kbase_device *kbdev, int as_nr, u64 *lock_addr, + const struct kbase_mmu_hw_op_param *op_param) +{ + int ret; + + ret = lock_region(&kbdev->gpu_props, lock_addr, op_param); + + if (!ret) { + /* Set the region that needs to be updated */ + kbase_reg_write(kbdev, MMU_STAGE1_REG(MMU_AS_REG(as_nr, AS_LOCKADDR_LO)), + *lock_addr & 0xFFFFFFFFUL); + kbase_reg_write(kbdev, MMU_STAGE1_REG(MMU_AS_REG(as_nr, AS_LOCKADDR_HI)), + (*lock_addr >> 32) & 0xFFFFFFFFUL); + } + return ret; +} + +/** + * mmu_hw_do_lock_no_wait - Issue LOCK command to the MMU and return without + * waiting for it's completion. + * + * @kbdev: Kbase device to issue the MMU operation on. + * @as: Address space to issue the MMU operation on. 
+ * @lock_addr: Address of memory region locked for this operation. + * @op_param: Pointer to a struct containing information about the MMU operation. + * + * Return: 0 if issuing the command was successful, otherwise an error code. + */ +static int mmu_hw_do_lock_no_wait(struct kbase_device *kbdev, struct kbase_as *as, u64 *lock_addr, + const struct kbase_mmu_hw_op_param *op_param) +{ + int ret; + + ret = mmu_hw_set_lock_addr(kbdev, as->number, lock_addr, op_param); + + if (likely(!ret)) + ret = write_cmd(kbdev, as->number, AS_COMMAND_LOCK); + + return ret; +} + +/** + * mmu_hw_do_lock - Issue LOCK command to the MMU and wait for its completion. + * + * @kbdev: Kbase device to issue the MMU operation on. + * @as: Address space to issue the MMU operation on. + * @op_param: Pointer to a struct containing information about the MMU operation. + * + * Return: 0 if issuing the LOCK command was successful, otherwise an error code. + */ +static int mmu_hw_do_lock(struct kbase_device *kbdev, struct kbase_as *as, + const struct kbase_mmu_hw_op_param *op_param) { int ret; u64 lock_addr = 0x0; - if (WARN_ON(kbdev == NULL) || - WARN_ON(as == NULL) || - WARN_ON(op_param == NULL)) + if (WARN_ON(kbdev == NULL) || WARN_ON(as == NULL)) return -EINVAL; - lockdep_assert_held(&kbdev->mmu_hw_mutex); + ret = mmu_hw_do_lock_no_wait(kbdev, as, &lock_addr, op_param); + + if (!ret) + ret = wait_ready(kbdev, as->number); + + if (!ret) + mmu_command_instr(kbdev, op_param->kctx_id, AS_COMMAND_LOCK, lock_addr, + op_param->mmu_sync_info); + else + dev_err(kbdev->dev, "AS_ACTIVE bit stuck after sending UNLOCK command"); - if (op_param->op == KBASE_MMU_OP_UNLOCK) { - /* Unlock doesn't require a lock first */ - ret = write_cmd(kbdev, as->number, AS_COMMAND_UNLOCK); + return ret; +} + +int kbase_mmu_hw_do_lock(struct kbase_device *kbdev, struct kbase_as *as, + const struct kbase_mmu_hw_op_param *op_param) +{ + lockdep_assert_held(&kbdev->hwaccess_lock); - /* Wait for UNLOCK command to complete */ + return mmu_hw_do_lock(kbdev, as, op_param); +} + +int kbase_mmu_hw_do_unlock_no_addr(struct kbase_device *kbdev, struct kbase_as *as, + const struct kbase_mmu_hw_op_param *op_param) +{ + int ret = 0; + + if (WARN_ON(kbdev == NULL) || WARN_ON(as == NULL)) + return -EINVAL; + + ret = write_cmd(kbdev, as->number, AS_COMMAND_UNLOCK); + + /* Wait for UNLOCK command to complete */ + if (likely(!ret)) ret = wait_ready(kbdev, as->number); - if (!ret) { - /* read MMU_AS_CONTROL.LOCKADDR register */ - lock_addr |= (u64)kbase_reg_read(kbdev, - MMU_AS_REG(as->number, AS_LOCKADDR_HI)) << 32; - lock_addr |= (u64)kbase_reg_read(kbdev, - MMU_AS_REG(as->number, AS_LOCKADDR_LO)); + if (likely(!ret)) { + u64 lock_addr = 0x0; + /* read MMU_AS_CONTROL.LOCKADDR register */ + lock_addr |= (u64)kbase_reg_read( + kbdev, MMU_STAGE1_REG(MMU_AS_REG(as->number, AS_LOCKADDR_HI))) + << 32; + lock_addr |= (u64)kbase_reg_read( + kbdev, MMU_STAGE1_REG(MMU_AS_REG(as->number, AS_LOCKADDR_LO))); + + mmu_command_instr(kbdev, op_param->kctx_id, AS_COMMAND_UNLOCK, + lock_addr, op_param->mmu_sync_info); + } + + return ret; +} + +int kbase_mmu_hw_do_unlock(struct kbase_device *kbdev, struct kbase_as *as, + const struct kbase_mmu_hw_op_param *op_param) +{ + int ret = 0; + u64 lock_addr = 0x0; + + if (WARN_ON(kbdev == NULL) || WARN_ON(as == NULL)) + return -EINVAL; + + ret = mmu_hw_set_lock_addr(kbdev, as->number, &lock_addr, op_param); + + if (!ret) + ret = kbase_mmu_hw_do_unlock_no_addr(kbdev, as, + op_param); + + return ret; +} + +/** + * mmu_hw_do_flush - Flush MMU and 
wait for its completion. + * + * @kbdev: Kbase device to issue the MMU operation on. + * @as: Address space to issue the MMU operation on. + * @op_param: Pointer to a struct containing information about the MMU operation. + * @hwaccess_locked: Flag to indicate if the lock has been held. + * + * Return: 0 if flushing MMU was successful, otherwise an error code. + */ +static int mmu_hw_do_flush(struct kbase_device *kbdev, struct kbase_as *as, + const struct kbase_mmu_hw_op_param *op_param, bool hwaccess_locked) +{ + int ret; + u64 lock_addr = 0x0; + u32 mmu_cmd = AS_COMMAND_FLUSH_MEM; + const enum kbase_mmu_op_type flush_op = op_param->op; + + if (WARN_ON(kbdev == NULL) || WARN_ON(as == NULL)) + return -EINVAL; + + /* MMU operations can be either FLUSH_PT or FLUSH_MEM, anything else at + * this point would be unexpected. + */ + if (flush_op != KBASE_MMU_OP_FLUSH_PT && flush_op != KBASE_MMU_OP_FLUSH_MEM) { + dev_err(kbdev->dev, "Unexpected flush operation received"); + return -EINVAL; + } + + lockdep_assert_held(&kbdev->mmu_hw_mutex); + + if (flush_op == KBASE_MMU_OP_FLUSH_PT) + mmu_cmd = AS_COMMAND_FLUSH_PT; + + /* Lock the region that needs to be updated */ + ret = mmu_hw_do_lock_no_wait(kbdev, as, &lock_addr, op_param); + if (ret) + return ret; + +#if MALI_USE_CSF && !IS_ENABLED(CONFIG_MALI_NO_MALI) + /* WA for the BASE_HW_ISSUE_GPU2019_3901. */ + if (kbase_hw_has_issue(kbdev, BASE_HW_ISSUE_GPU2019_3901) && + mmu_cmd == AS_COMMAND_FLUSH_MEM) { + if (!hwaccess_locked) { + unsigned long flags = 0; + + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); + ret = apply_hw_issue_GPU2019_3901_wa(kbdev, &mmu_cmd, as->number); + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + } else { + ret = apply_hw_issue_GPU2019_3901_wa(kbdev, &mmu_cmd, as->number); } - } else if (op_param->op >= KBASE_MMU_OP_FIRST && - op_param->op < KBASE_MMU_OP_COUNT) { - ret = lock_region(&kbdev->gpu_props, op_param->vpfn, op_param->nr, &lock_addr); - - if (!ret) { - /* Lock the region that needs to be updated */ - kbase_reg_write(kbdev, - MMU_AS_REG(as->number, AS_LOCKADDR_LO), - lock_addr & 0xFFFFFFFFUL); - kbase_reg_write(kbdev, - MMU_AS_REG(as->number, AS_LOCKADDR_HI), - (lock_addr >> 32) & 0xFFFFFFFFUL); - write_cmd(kbdev, as->number, AS_COMMAND_LOCK); - - /* Translate and send operation to HW */ - switch (op_param->op) { - case KBASE_MMU_OP_FLUSH_PT: - write_cmd(kbdev, as->number, - AS_COMMAND_FLUSH_PT); - break; - case KBASE_MMU_OP_FLUSH_MEM: - write_cmd(kbdev, as->number, - AS_COMMAND_FLUSH_MEM); - break; - case KBASE_MMU_OP_LOCK: - /* No further operation. */ - break; - default: - dev_warn(kbdev->dev, - "Unsupported MMU operation (op=%d).\n", - op_param->op); - return -EINVAL; - }; - - /* Wait for the command to complete */ - ret = wait_ready(kbdev, as->number); + + if (ret) { + dev_warn( + kbdev->dev, + "Failed to apply WA for HW issue when doing MMU flush op on VA range %llx-%llx for AS %u", + op_param->vpfn << PAGE_SHIFT, + ((op_param->vpfn + op_param->nr) << PAGE_SHIFT) - 1, as->number); + /* Continue with the MMU flush operation */ } - } else { - /* Code should not reach here. 
*/ - dev_warn(kbdev->dev, "Invalid mmu operation (op=%d).\n", - op_param->op); + } +#endif + + ret = write_cmd(kbdev, as->number, mmu_cmd); + + /* Wait for the command to complete */ + if (likely(!ret)) + ret = wait_ready(kbdev, as->number); + + if (likely(!ret)) { + mmu_command_instr(kbdev, op_param->kctx_id, mmu_cmd, lock_addr, + op_param->mmu_sync_info); +#if MALI_USE_CSF + if (flush_op == KBASE_MMU_OP_FLUSH_MEM && + kbdev->pm.backend.apply_hw_issue_TITANHW_2938_wa && + kbdev->pm.backend.l2_state == KBASE_L2_PEND_OFF) + ret = wait_l2_power_trans_complete(kbdev); +#endif + } + + return ret; +} + +int kbase_mmu_hw_do_flush_locked(struct kbase_device *kbdev, struct kbase_as *as, + const struct kbase_mmu_hw_op_param *op_param) +{ + lockdep_assert_held(&kbdev->hwaccess_lock); + + return mmu_hw_do_flush(kbdev, as, op_param, true); +} + +int kbase_mmu_hw_do_flush(struct kbase_device *kbdev, struct kbase_as *as, + const struct kbase_mmu_hw_op_param *op_param) +{ + return mmu_hw_do_flush(kbdev, as, op_param, false); +} + +int kbase_mmu_hw_do_flush_on_gpu_ctrl(struct kbase_device *kbdev, struct kbase_as *as, + const struct kbase_mmu_hw_op_param *op_param) +{ + int ret, ret2; + u32 gpu_cmd = GPU_COMMAND_CACHE_CLN_INV_L2_LSC; + const enum kbase_mmu_op_type flush_op = op_param->op; + + if (WARN_ON(kbdev == NULL) || WARN_ON(as == NULL)) + return -EINVAL; + + /* MMU operations can be either FLUSH_PT or FLUSH_MEM, anything else at + * this point would be unexpected. + */ + if (flush_op != KBASE_MMU_OP_FLUSH_PT && flush_op != KBASE_MMU_OP_FLUSH_MEM) { + dev_err(kbdev->dev, "Unexpected flush operation received"); return -EINVAL; } - /* MMU command instrumentation */ - if (!ret) { - u64 lock_addr_base = AS_LOCKADDR_LOCKADDR_BASE_GET(lock_addr); - u32 lock_addr_size = AS_LOCKADDR_LOCKADDR_SIZE_GET(lock_addr); + lockdep_assert_held(&kbdev->hwaccess_lock); + lockdep_assert_held(&kbdev->mmu_hw_mutex); + + if (flush_op == KBASE_MMU_OP_FLUSH_PT) + gpu_cmd = GPU_COMMAND_CACHE_CLN_INV_L2; + + /* 1. Issue MMU_AS_CONTROL.COMMAND.LOCK operation. */ + ret = mmu_hw_do_lock(kbdev, as, op_param); + if (ret) + return ret; - bool is_mmu_synchronous = false; + /* 2. Issue GPU_CONTROL.COMMAND.FLUSH_CACHES operation */ + ret = kbase_gpu_cache_flush_and_busy_wait(kbdev, gpu_cmd); - if (op_param->mmu_sync_info == CALLER_MMU_SYNC) - is_mmu_synchronous = true; + /* 3. Issue MMU_AS_CONTROL.COMMAND.UNLOCK operation. 
*/ + ret2 = kbase_mmu_hw_do_unlock_no_addr(kbdev, as, op_param); - KBASE_TLSTREAM_AUX_MMU_COMMAND(kbdev, op_param->kctx_id, - op_param->op, is_mmu_synchronous, - lock_addr_base, lock_addr_size); +#if MALI_USE_CSF + if (!ret && !ret2) { + if (flush_op == KBASE_MMU_OP_FLUSH_MEM && + kbdev->pm.backend.apply_hw_issue_TITANHW_2938_wa && + kbdev->pm.backend.l2_state == KBASE_L2_PEND_OFF) + ret = wait_l2_power_trans_complete(kbdev); } +#endif - return ret; + return ret ?: ret2; } void kbase_mmu_hw_clear_fault(struct kbase_device *kbdev, struct kbase_as *as, @@ -333,7 +720,7 @@ void kbase_mmu_hw_clear_fault(struct kbase_device *kbdev, struct kbase_as *as, type == KBASE_MMU_FAULT_TYPE_BUS_UNEXPECTED) pf_bf_mask |= MMU_BUS_ERROR(as->number); #endif - kbase_reg_write(kbdev, MMU_REG(MMU_IRQ_CLEAR), pf_bf_mask); + kbase_reg_write(kbdev, MMU_CONTROL_REG(MMU_IRQ_CLEAR), pf_bf_mask); unlock: spin_unlock_irqrestore(&kbdev->mmu_mask_change, flags); @@ -357,15 +744,15 @@ void kbase_mmu_hw_enable_fault(struct kbase_device *kbdev, struct kbase_as *as, if (kbdev->irq_reset_flush) goto unlock; - irq_mask = kbase_reg_read(kbdev, MMU_REG(MMU_IRQ_MASK)) | - MMU_PAGE_FAULT(as->number); + irq_mask = + kbase_reg_read(kbdev, MMU_CONTROL_REG(MMU_IRQ_MASK)) | MMU_PAGE_FAULT(as->number); #if !MALI_USE_CSF if (type == KBASE_MMU_FAULT_TYPE_BUS || type == KBASE_MMU_FAULT_TYPE_BUS_UNEXPECTED) irq_mask |= MMU_BUS_ERROR(as->number); #endif - kbase_reg_write(kbdev, MMU_REG(MMU_IRQ_MASK), irq_mask); + kbase_reg_write(kbdev, MMU_CONTROL_REG(MMU_IRQ_MASK), irq_mask); unlock: spin_unlock_irqrestore(&kbdev->mmu_mask_change, flags); diff --git a/mali_kbase/mmu/mali_kbase_mmu_mode_aarch64.c b/mali_kbase/mmu/mali_kbase_mmu_mode_aarch64.c index c061099..f2c6274 100644 --- a/mali_kbase/mmu/mali_kbase_mmu_mode_aarch64.c +++ b/mali_kbase/mmu/mali_kbase_mmu_mode_aarch64.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2010-2014, 2016-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2010-2014, 2016-2022 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -35,10 +35,8 @@ #define ENTRY_IS_INVAL 2ULL #define ENTRY_IS_PTE 3ULL -#define ENTRY_ATTR_BITS (7ULL << 2) /* bits 4:2 */ #define ENTRY_ACCESS_RW (1ULL << 6) /* bits 6:7 */ #define ENTRY_ACCESS_RO (3ULL << 6) -#define ENTRY_SHARE_BITS (3ULL << 8) /* bits 9:8 */ #define ENTRY_ACCESS_BIT (1ULL << 10) #define ENTRY_NX_BIT (1ULL << 54) @@ -189,35 +187,31 @@ static void set_num_valid_entries(u64 *pgd, unsigned int num_of_valid_entries) << UNUSED_BIT_POSITION_IN_PAGE_DESCRIPTOR); } -static void entry_set_pte(u64 *pgd, u64 vpfn, phys_addr_t phy) +static void entry_set_pte(u64 *entry, phys_addr_t phy) { - unsigned int nr_entries = get_num_valid_entries(pgd); - - page_table_entry_set(&pgd[vpfn], (phy & PAGE_MASK) | ENTRY_ACCESS_BIT | - ENTRY_IS_PTE); - - set_num_valid_entries(pgd, nr_entries + 1); + page_table_entry_set(entry, (phy & PAGE_MASK) | ENTRY_ACCESS_BIT | ENTRY_IS_PTE); } -static void entry_invalidate(u64 *entry) +static void entries_invalidate(u64 *entry, u32 count) { - page_table_entry_set(entry, ENTRY_IS_INVAL); + u32 i; + + for (i = 0; i < count; i++) + page_table_entry_set(entry + i, ENTRY_IS_INVAL); } -static const struct kbase_mmu_mode aarch64_mode = { - .update = mmu_update, - .get_as_setup = kbase_mmu_get_as_setup, - .disable_as = mmu_disable_as, - .pte_to_phy_addr = pte_to_phy_addr, - .ate_is_valid = ate_is_valid, - .pte_is_valid = pte_is_valid, - .entry_set_ate = entry_set_ate, - .entry_set_pte = entry_set_pte, - .entry_invalidate = entry_invalidate, - .get_num_valid_entries = get_num_valid_entries, - .set_num_valid_entries = set_num_valid_entries, - .flags = KBASE_MMU_MODE_HAS_NON_CACHEABLE -}; +static const struct kbase_mmu_mode aarch64_mode = { .update = mmu_update, + .get_as_setup = kbase_mmu_get_as_setup, + .disable_as = mmu_disable_as, + .pte_to_phy_addr = pte_to_phy_addr, + .ate_is_valid = ate_is_valid, + .pte_is_valid = pte_is_valid, + .entry_set_ate = entry_set_ate, + .entry_set_pte = entry_set_pte, + .entries_invalidate = entries_invalidate, + .get_num_valid_entries = get_num_valid_entries, + .set_num_valid_entries = set_num_valid_entries, + .flags = KBASE_MMU_MODE_HAS_NON_CACHEABLE }; struct kbase_mmu_mode const *kbase_mmu_mode_get_aarch64(void) { diff --git a/mali_kbase/platform/Kconfig b/mali_kbase/platform/Kconfig index de4203c..b190e26 100644 --- a/mali_kbase/platform/Kconfig +++ b/mali_kbase/platform/Kconfig @@ -1,6 +1,6 @@ # SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note # -# (C) COPYRIGHT 2012-2013, 2017, 2021 ARM Limited. All rights reserved. +# (C) COPYRIGHT 2012-2023 ARM Limited. All rights reserved. 
# # This program is free software and is provided to you under the terms of the # GNU General Public License version 2 as published by the Free Software @@ -20,7 +20,7 @@ # Add your platform specific Kconfig file here # -# "drivers/gpu/arm/midgard/platform/xxx/Kconfig" +# "$(MALI_KCONFIG_EXT_PREFIX)drivers/gpu/arm/midgard/platform/xxx/Kconfig" # # Where xxx is the platform name is the name set in MALI_PLATFORM_NAME # diff --git a/mali_kbase/platform/devicetree/Kbuild b/mali_kbase/platform/devicetree/Kbuild index 5eeccfa..995c4cd 100644 --- a/mali_kbase/platform/devicetree/Kbuild +++ b/mali_kbase/platform/devicetree/Kbuild @@ -20,6 +20,5 @@ mali_kbase-y += \ platform/$(MALI_PLATFORM_DIR)/mali_kbase_config_devicetree.o \ - platform/$(MALI_PLATFORM_DIR)/mali_kbase_config_platform.o \ platform/$(MALI_PLATFORM_DIR)/mali_kbase_runtime_pm.o \ platform/$(MALI_PLATFORM_DIR)/mali_kbase_clk_rate_trace.o diff --git a/mali_kbase/platform/devicetree/mali_kbase_config_platform.h b/mali_kbase/platform/devicetree/mali_kbase_config_platform.h index 743885f..584a721 100644 --- a/mali_kbase/platform/devicetree/mali_kbase_config_platform.h +++ b/mali_kbase/platform/devicetree/mali_kbase_config_platform.h @@ -33,13 +33,12 @@ * Attached value: pointer to @ref kbase_platform_funcs_conf * Default value: See @ref kbase_platform_funcs_conf */ -#define PLATFORM_FUNCS (&platform_funcs) +#define PLATFORM_FUNCS (NULL) #define CLK_RATE_TRACE_OPS (&clk_rate_trace_ops) extern struct kbase_pm_callback_conf pm_callbacks; extern struct kbase_clk_rate_trace_op_conf clk_rate_trace_ops; -extern struct kbase_platform_funcs_conf platform_funcs; /** * AUTO_SUSPEND_DELAY - Autosuspend delay * diff --git a/mali_kbase/platform/devicetree/mali_kbase_runtime_pm.c b/mali_kbase/platform/devicetree/mali_kbase_runtime_pm.c index 3881d28..a019229 100644 --- a/mali_kbase/platform/devicetree/mali_kbase_runtime_pm.c +++ b/mali_kbase/platform/devicetree/mali_kbase_runtime_pm.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2015-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2015-2022 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -50,7 +50,6 @@ static void enable_gpu_power_control(struct kbase_device *kbdev) } } - static void disable_gpu_power_control(struct kbase_device *kbdev) { unsigned int i; @@ -82,8 +81,7 @@ static int pm_callback_power_on(struct kbase_device *kbdev) int error; unsigned long flags; - dev_dbg(kbdev->dev, "%s %p\n", __func__, - (void *)kbdev->dev->pm_domain); + dev_dbg(kbdev->dev, "%s %pK\n", __func__, (void *)kbdev->dev->pm_domain); spin_lock_irqsave(&kbdev->hwaccess_lock, flags); WARN_ON(kbdev->pm.backend.gpu_powered); @@ -99,9 +97,8 @@ static int pm_callback_power_on(struct kbase_device *kbdev) #else spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); +#ifdef KBASE_PM_RUNTIME error = pm_runtime_get_sync(kbdev->dev); - enable_gpu_power_control(kbdev); - if (error == 1) { /* * Let core know that the chip has not been @@ -109,8 +106,11 @@ static int pm_callback_power_on(struct kbase_device *kbdev) */ ret = 0; } - dev_dbg(kbdev->dev, "pm_runtime_get_sync returned %d\n", error); +#else + enable_gpu_power_control(kbdev); +#endif /* KBASE_PM_RUNTIME */ + #endif /* MALI_USE_CSF */ return ret; @@ -126,7 +126,9 @@ static void pm_callback_power_off(struct kbase_device *kbdev) WARN_ON(kbdev->pm.backend.gpu_powered); #if MALI_USE_CSF if (likely(kbdev->csf.firmware_inited)) { +#ifdef CONFIG_MALI_DEBUG WARN_ON(kbase_csf_scheduler_get_nr_active_csgs(kbdev)); +#endif WARN_ON(kbdev->pm.backend.mcu_state != KBASE_MCU_OFF); } spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); @@ -241,7 +243,9 @@ static int pm_callback_runtime_on(struct kbase_device *kbdev) { dev_dbg(kbdev->dev, "%s\n", __func__); +#if !MALI_USE_CSF enable_gpu_power_control(kbdev); +#endif return 0; } @@ -249,7 +253,9 @@ static void pm_callback_runtime_off(struct kbase_device *kbdev) { dev_dbg(kbdev->dev, "%s\n", __func__); +#if !MALI_USE_CSF disable_gpu_power_control(kbdev); +#endif } static void pm_callback_resume(struct kbase_device *kbdev) @@ -264,6 +270,17 @@ static void pm_callback_suspend(struct kbase_device *kbdev) pm_callback_runtime_off(kbdev); } +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS +static void pm_callback_sc_rails_on(struct kbase_device *kbdev) +{ + dev_dbg(kbdev->dev, "SC rails are on"); +} + +static void pm_callback_sc_rails_off(struct kbase_device *kbdev) +{ + dev_dbg(kbdev->dev, "SC rails are off"); +} +#endif struct kbase_pm_callback_conf pm_callbacks = { .power_on_callback = pm_callback_power_on, @@ -289,6 +306,9 @@ struct kbase_pm_callback_conf pm_callbacks = { .power_runtime_gpu_idle_callback = NULL, .power_runtime_gpu_active_callback = NULL, #endif -}; - +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + .power_on_sc_rails_callback = pm_callback_sc_rails_on, + .power_off_sc_rails_callback = pm_callback_sc_rails_off, +#endif +}; diff --git a/mali_kbase/platform/meson/Kbuild b/mali_kbase/platform/meson/Kbuild new file mode 100644 index 0000000..3f55378 --- /dev/null +++ b/mali_kbase/platform/meson/Kbuild @@ -0,0 +1,23 @@ +# SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note +# +# (C) COPYRIGHT 2012-2017, 2019-2021 ARM Limited. All rights reserved. +# +# This program is free software and is provided to you under the terms of the +# GNU General Public License version 2 as published by the Free Software +# Foundation, and any use by you of this program is subject to the terms +# of such GNU license. 
+# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, you can access it online at +# http://www.gnu.org/licenses/gpl-2.0.html. +# +# + +mali_kbase-y += \ + platform/$(MALI_PLATFORM_DIR)/mali_kbase_config_meson.o \ + platform/$(MALI_PLATFORM_DIR)/mali_kbase_runtime_pm.o diff --git a/mali_kbase/platform/devicetree/mali_kbase_config_platform.c b/mali_kbase/platform/meson/mali_kbase_config_meson.c index 2eebed0..c999a52 100644 --- a/mali_kbase/platform/devicetree/mali_kbase_config_platform.c +++ b/mali_kbase/platform/meson/mali_kbase_config_meson.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2021-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2015, 2017, 2019, 2021, 2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -20,24 +20,34 @@ */ #include <mali_kbase.h> -#include <mali_kbase_defs.h> #include <mali_kbase_config.h> -#include "mali_kbase_config_platform.h" -#include <device/mali_kbase_device.h> -#include <mali_kbase_hwaccess_time.h> -#include <gpu/mali_kbase_gpu_regmap.h> +#include <backend/gpu/mali_kbase_pm_internal.h> -#include <linux/kthread.h> -#include <linux/timer.h> -#include <linux/jiffies.h> -#include <linux/wait.h> -#include <linux/delay.h> -#include <linux/gcd.h> -#include <asm/arch_timer.h> +static struct kbase_platform_config dummy_platform_config; -struct kbase_platform_funcs_conf platform_funcs = { - .platform_init_func = NULL, - .platform_term_func = NULL, - .platform_late_init_func = NULL, - .platform_late_term_func = NULL, -}; +struct kbase_platform_config *kbase_get_platform_config(void) +{ + return &dummy_platform_config; +} + +#ifndef CONFIG_OF +int kbase_platform_register(void) +{ + return 0; +} + +void kbase_platform_unregister(void) +{ +} +#endif + +#ifdef CONFIG_MALI_MIDGARD_DVFS +#if MALI_USE_CSF +int kbase_platform_dvfs_event(struct kbase_device *kbdev, u32 utilisation) +#else +int kbase_platform_dvfs_event(struct kbase_device *kbdev, u32 utilisation, u32 util_gl_share, u32 util_cl_share[2]) +#endif +{ + return 1; +} +#endif /* CONFIG_MALI_MIDGARD_DVFS */ diff --git a/mali_kbase/platform/meson/mali_kbase_config_platform.h b/mali_kbase/platform/meson/mali_kbase_config_platform.h new file mode 100644 index 0000000..866a7de --- /dev/null +++ b/mali_kbase/platform/meson/mali_kbase_config_platform.h @@ -0,0 +1,45 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +/* + * + * (C) COPYRIGHT 2014-2017, 2019-2023 ARM Limited. All rights reserved. + * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. + * + */ + +/** + * POWER_MANAGEMENT_CALLBACKS - Power management configuration + * + * Attached value: pointer to @ref kbase_pm_callback_conf + * Default value: See @ref kbase_pm_callback_conf + */ +#define POWER_MANAGEMENT_CALLBACKS (&pm_callbacks) + +/** + * PLATFORM_FUNCS - Platform specific configuration functions + * + * Attached value: pointer to @ref kbase_platform_funcs_conf + * Default value: See @ref kbase_platform_funcs_conf + */ +#define PLATFORM_FUNCS (NULL) + +extern struct kbase_pm_callback_conf pm_callbacks; + +/** + * AUTO_SUSPEND_DELAY - Autosuspend delay + * + * The delay time (in milliseconds) to be used for autosuspend + */ +#define AUTO_SUSPEND_DELAY (100) diff --git a/mali_kbase/platform/meson/mali_kbase_runtime_pm.c b/mali_kbase/platform/meson/mali_kbase_runtime_pm.c new file mode 100644 index 0000000..a9b380c --- /dev/null +++ b/mali_kbase/platform/meson/mali_kbase_runtime_pm.c @@ -0,0 +1,290 @@ +// SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note +/* + * + * (C) COPYRIGHT 2015, 2017-2022 ARM Limited. All rights reserved. + * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. 
+ * + */ + +#include <mali_kbase.h> +#include <mali_kbase_defs.h> +#include <device/mali_kbase_device.h> + +#include <linux/pm_runtime.h> +#include <linux/reset.h> +#include <linux/clk.h> +#include <linux/clk-provider.h> +#include <linux/delay.h> +#include <linux/regulator/consumer.h> + +#include "mali_kbase_config_platform.h" + + +static struct reset_control **resets; +static int nr_resets; + +static int resets_init(struct kbase_device *kbdev) +{ + struct device_node *np; + int i; + int err = 0; + + np = kbdev->dev->of_node; + + nr_resets = of_count_phandle_with_args(np, "resets", "#reset-cells"); + if (nr_resets <= 0) { + dev_err(kbdev->dev, "Failed to get GPU resets from dtb\n"); + return nr_resets; + } + + resets = devm_kcalloc(kbdev->dev, nr_resets, sizeof(*resets), + GFP_KERNEL); + if (!resets) + return -ENOMEM; + + for (i = 0; i < nr_resets; ++i) { + resets[i] = devm_reset_control_get_exclusive_by_index( + kbdev->dev, i); + if (IS_ERR(resets[i])) { + err = PTR_ERR(resets[i]); + nr_resets = i; + break; + } + } + + return err; +} + +static int pm_callback_soft_reset(struct kbase_device *kbdev) +{ + int ret, i; + + if (!resets) { + ret = resets_init(kbdev); + if (ret) + return ret; + } + + for (i = 0; i < nr_resets; ++i) + reset_control_assert(resets[i]); + + udelay(10); + + for (i = 0; i < nr_resets; ++i) + reset_control_deassert(resets[i]); + + udelay(10); + + /* Override Power Management Settings, values from manufacturer's defaults */ + kbase_reg_write(kbdev, GPU_CONTROL_REG(PWR_KEY), 0x2968A819); + kbase_reg_write(kbdev, GPU_CONTROL_REG(PWR_OVERRIDE1), + 0xfff | (0x20 << 16)); + + /* + * RESET_COMPLETED interrupt will be raised, so continue with + * the normal soft reset procedure + */ + return 0; +} + +static void enable_gpu_power_control(struct kbase_device *kbdev) +{ + unsigned int i; + +#if defined(CONFIG_REGULATOR) + for (i = 0; i < kbdev->nr_regulators; i++) { + if (WARN_ON(kbdev->regulators[i] == NULL)) + ; + else if (!regulator_is_enabled(kbdev->regulators[i])) + WARN_ON(regulator_enable(kbdev->regulators[i])); + } +#endif + + for (i = 0; i < kbdev->nr_clocks; i++) { + if (WARN_ON(kbdev->clocks[i] == NULL)) + ; + else if (!__clk_is_enabled(kbdev->clocks[i])) + WARN_ON(clk_prepare_enable(kbdev->clocks[i])); + } +} + +static void disable_gpu_power_control(struct kbase_device *kbdev) +{ + unsigned int i; + + for (i = 0; i < kbdev->nr_clocks; i++) { + if (WARN_ON(kbdev->clocks[i] == NULL)) + ; + else if (__clk_is_enabled(kbdev->clocks[i])) { + clk_disable_unprepare(kbdev->clocks[i]); + WARN_ON(__clk_is_enabled(kbdev->clocks[i])); + } + } + +#if defined(CONFIG_REGULATOR) + for (i = 0; i < kbdev->nr_regulators; i++) { + if (WARN_ON(kbdev->regulators[i] == NULL)) + ; + else if (regulator_is_enabled(kbdev->regulators[i])) + WARN_ON(regulator_disable(kbdev->regulators[i])); + } +#endif +} + +static int pm_callback_power_on(struct kbase_device *kbdev) +{ + int ret = 1; /* Assume GPU has been powered off */ + int error; + + dev_dbg(kbdev->dev, "%s %pK\n", __func__, (void *)kbdev->dev->pm_domain); + +#ifdef KBASE_PM_RUNTIME + error = pm_runtime_get_sync(kbdev->dev); + if (error == 1) { + /* + * Let core know that the chip has not been + * powered off, so we can save on re-initialization. 
+ */ + ret = 0; + } + dev_dbg(kbdev->dev, "pm_runtime_get_sync returned %d\n", error); +#else + enable_gpu_power_control(kbdev); +#endif + + return ret; +} + +static void pm_callback_power_off(struct kbase_device *kbdev) +{ + dev_dbg(kbdev->dev, "%s\n", __func__); + +#ifdef KBASE_PM_RUNTIME + pm_runtime_mark_last_busy(kbdev->dev); + pm_runtime_put_autosuspend(kbdev->dev); +#else + /* Power down the GPU immediately as runtime PM is disabled */ + disable_gpu_power_control(kbdev); +#endif +} + +#ifdef KBASE_PM_RUNTIME +static int kbase_device_runtime_init(struct kbase_device *kbdev) +{ + int ret = 0; + + dev_dbg(kbdev->dev, "%s\n", __func__); + + pm_runtime_set_autosuspend_delay(kbdev->dev, AUTO_SUSPEND_DELAY); + pm_runtime_use_autosuspend(kbdev->dev); + + pm_runtime_set_active(kbdev->dev); + pm_runtime_enable(kbdev->dev); + + if (!pm_runtime_enabled(kbdev->dev)) { + dev_warn(kbdev->dev, "pm_runtime not enabled"); + ret = -EINVAL; + } else if (atomic_read(&kbdev->dev->power.usage_count)) { + dev_warn(kbdev->dev, "%s: Device runtime usage count unexpectedly non zero %d", + __func__, atomic_read(&kbdev->dev->power.usage_count)); + ret = -EINVAL; + } + + return ret; +} + +static void kbase_device_runtime_disable(struct kbase_device *kbdev) +{ + dev_dbg(kbdev->dev, "%s\n", __func__); + + if (atomic_read(&kbdev->dev->power.usage_count)) + dev_warn(kbdev->dev, "%s: Device runtime usage count unexpectedly non zero %d", + __func__, atomic_read(&kbdev->dev->power.usage_count)); + + pm_runtime_disable(kbdev->dev); +} +#endif /* KBASE_PM_RUNTIME */ + +static int pm_callback_runtime_on(struct kbase_device *kbdev) +{ + dev_dbg(kbdev->dev, "%s\n", __func__); + + enable_gpu_power_control(kbdev); + return 0; +} + +static void pm_callback_runtime_off(struct kbase_device *kbdev) +{ + dev_dbg(kbdev->dev, "%s\n", __func__); + + disable_gpu_power_control(kbdev); +} + +static void pm_callback_resume(struct kbase_device *kbdev) +{ + int ret = pm_callback_runtime_on(kbdev); + + WARN_ON(ret); +} + +static void pm_callback_suspend(struct kbase_device *kbdev) +{ + pm_callback_runtime_off(kbdev); +} + +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS +static void pm_callback_sc_rails_on(struct kbase_device *kbdev) +{ + dev_dbg(kbdev->dev, "SC rails are on"); +} + +static void pm_callback_sc_rails_off(struct kbase_device *kbdev) +{ + dev_dbg(kbdev->dev, "SC rails are off"); +} +#endif + +struct kbase_pm_callback_conf pm_callbacks = { + .power_on_callback = pm_callback_power_on, + .power_off_callback = pm_callback_power_off, + .power_suspend_callback = pm_callback_suspend, + .power_resume_callback = pm_callback_resume, + .soft_reset_callback = pm_callback_soft_reset, +#ifdef KBASE_PM_RUNTIME + .power_runtime_init_callback = kbase_device_runtime_init, + .power_runtime_term_callback = kbase_device_runtime_disable, + .power_runtime_on_callback = pm_callback_runtime_on, + .power_runtime_off_callback = pm_callback_runtime_off, +#else /* KBASE_PM_RUNTIME */ + .power_runtime_init_callback = NULL, + .power_runtime_term_callback = NULL, + .power_runtime_on_callback = NULL, + .power_runtime_off_callback = NULL, +#endif /* KBASE_PM_RUNTIME */ + +#if MALI_USE_CSF && defined(KBASE_PM_RUNTIME) + .power_runtime_gpu_idle_callback = pm_callback_runtime_gpu_idle, + .power_runtime_gpu_active_callback = pm_callback_runtime_gpu_active, +#else + .power_runtime_gpu_idle_callback = NULL, + .power_runtime_gpu_active_callback = NULL, +#endif + +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + .power_on_sc_rails_callback = pm_callback_sc_rails_on, + 
.power_off_sc_rails_callback = pm_callback_sc_rails_off, +#endif +}; diff --git a/mali_kbase/platform/pixel/Kbuild b/mali_kbase/platform/pixel/Kbuild index 1d368c9..b80c87b 100644 --- a/mali_kbase/platform/pixel/Kbuild +++ b/mali_kbase/platform/pixel/Kbuild @@ -21,7 +21,9 @@ mali_kbase-y += \ platform/$(MALI_PLATFORM_DIR)/pixel_gpu.o \ - platform/$(MALI_PLATFORM_DIR)/pixel_gpu_power.o + platform/$(MALI_PLATFORM_DIR)/pixel_gpu_power.o \ + platform/$(MALI_PLATFORM_DIR)/pixel_gpu_uevent.o \ + platform/$(MALI_PLATFORM_DIR)/pixel_gpu_itmon.o mali_kbase-$(CONFIG_MALI_MIDGARD_DVFS) += \ platform/$(MALI_PLATFORM_DIR)/pixel_gpu_dvfs.o \ @@ -34,3 +36,14 @@ mali_kbase-$(CONFIG_MALI_PIXEL_GPU_QOS) += \ mali_kbase-$(CONFIG_MALI_PIXEL_GPU_THERMAL) += \ platform/$(MALI_PLATFORM_DIR)/pixel_gpu_tmu.o + +ifneq ($(filter -DCONFIG_MALI_PIXEL_GPU_SSCD, $(ccflags-y)),) +mali_kbase-y += \ + platform/$(MALI_PLATFORM_DIR)/pixel_gpu_sscd.o +endif + +mali_kbase-$(CONFIG_MALI_PIXEL_GPU_SLC) += \ + platform/$(MALI_PLATFORM_DIR)/pixel_gpu_slc.o + +mali_kbase-$(CONFIG_MALI_CSF_SUPPORT) += \ + platform/$(MALI_PLATFORM_DIR)/pixel_gpu_debug.o diff --git a/mali_kbase/platform/pixel/mali_kbase_config_platform.h b/mali_kbase/platform/pixel/mali_kbase_config_platform.h index 87df05d..57cec12 100644 --- a/mali_kbase/platform/pixel/mali_kbase_config_platform.h +++ b/mali_kbase/platform/pixel/mali_kbase_config_platform.h @@ -45,7 +45,10 @@ * Attached value: pointer to @ref kbase_clk_rate_trace_op_conf * Default value: See @ref kbase_clk_rate_trace_op_conf */ +#ifdef CONFIG_MALI_MIDGARD_DVFS #define CLK_RATE_TRACE_OPS (&pixel_clk_rate_trace_ops) +extern struct kbase_clk_rate_trace_op_conf pixel_clk_rate_trace_ops; +#endif /** * Platform specific configuration functions @@ -56,7 +59,6 @@ #define PLATFORM_FUNCS (&platform_funcs) extern struct kbase_pm_callback_conf pm_callbacks; -extern struct kbase_clk_rate_trace_op_conf pixel_clk_rate_trace_ops; extern struct kbase_platform_funcs_conf platform_funcs; #ifdef CONFIG_MALI_PIXEL_GPU_SECURE_RENDERING @@ -65,13 +67,6 @@ extern struct protected_mode_ops pixel_protected_ops; #endif /* CONFIG_MALI_PIXEL_GPU_SECURE_RENDERING */ /** - * Autosuspend delay - * - * The delay time (in milliseconds) to be used for autosuspend - */ -#define AUTO_SUSPEND_DELAY (100) - -/** * DVFS Utilization evaluation period * * The amount of time (in milliseconds) between sucessive measurements of the @@ -86,8 +81,16 @@ extern struct protected_mode_ops pixel_protected_ops; #include <linux/workqueue.h> #endif /* CONFIG_MALI_MIDGARD_DVFS */ +#if IS_ENABLED(CONFIG_EXYNOS_ITMON) +#include <linux/atomic.h> +#include <linux/notifier.h> +#include <linux/workqueue.h> +#endif /* IS_ENABLED(CONFIG_EXYNOS_ITMON) */ + /* SOC level includes */ +#if IS_ENABLED(CONFIG_GOOGLE_BCL) #include <soc/google/bcl.h> +#endif #if IS_ENABLED(CONFIG_EXYNOS_PD) #include <soc/google/exynos-pd.h> #endif @@ -102,8 +105,10 @@ extern struct protected_mode_ops pixel_protected_ops; #include "pixel_gpu_dvfs.h" #endif /* CONFIG_MALI_MIDGARD_DVFS */ +#include "pixel_gpu_uevent.h" + /* All port specific fields go here */ -#define OF_DATA_NUM_MAX 128 +#define OF_DATA_NUM_MAX 140 #define CPU_FREQ_MAX INT_MAX enum gpu_power_state { @@ -116,14 +121,15 @@ enum gpu_power_state { * The power state can thus be defined as the highest-level domain that * is currently powered on. * - * GLOBAL: The frontend (JM, CSF), including registers. - * COREGROUP: The L2 and AXI interface, Tiler, and MMU. - * STACKS: The shader cores. 
+ * GLOBAL: JM, CSF: The frontend (JM, CSF), including registers. + * CSF: The L2 and AXI interface, Tiler, and MMU. + * STACKS: JM, CSF: The shader cores. + * JM: The L2 and AXI interface, Tiler, and MMU. */ GPU_POWER_LEVEL_OFF = 0, GPU_POWER_LEVEL_GLOBAL = 1, - GPU_POWER_LEVEL_COREGROUP = 2, - GPU_POWER_LEVEL_STACKS = 3, + GPU_POWER_LEVEL_STACKS = 2, + GPU_POWER_LEVEL_NUM }; /** @@ -231,7 +237,9 @@ struct gpu_dvfs_metrics_uid_stats; * @pm.domain: The power domain the GPU is in. * @pm.status_reg_offset: Register offset to the G3D status in the PMU. Set via DT. * @pm.status_local_power_mask: Mask to extract power status of the GPU. Set via DT. - * @pm.autosuspend_delay: Delay (in ms) before PM runtime should trigger auto suspend. + * @pm.use_autosuspend: Use autosuspend on the TOP domain if true, sync suspend if false. + * @pm.autosuspend_delay: Delay (in ms) before PM runtime should trigger auto suspend on TOP + * domain if use_autosuspend is true. * @pm.bcl_dev: Pointer to the Battery Current Limiter device. * * @tz_protection_enabled: Storing the secure rendering state of the GPU. Access to this is @@ -271,9 +279,9 @@ struct gpu_dvfs_metrics_uid_stats; * @dvfs.metrics.last_power_state: The GPU's power state when the DVFS metric logic was last run. * @dvfs.metrics.last_level: The GPU's level when the DVFS metric logic was last run. * @dvfs.metrics.transtab: Pointer to the DVFS transition table. - * @dvfs.metrics.js_uid_stats: An array of pointers to the per-UID stats blocks currently - * resident in each of the GPU's job slots. Access is controlled by - * the hwaccess lock. + * @dvfs.metrics.work_uid_stats: An array of pointers to the per-UID stats blocks currently + * resident in each of the GPU's job slots, or CSG slots. + * Access is controlled by the dvfs.metrics.lock. * @dvfs.metrics.uid_stats_list: List head pointer to the linked list of per-UID stats blocks. * Modification to the linked list itself (not its elements) is * protected by the kctx_list lock. @@ -293,6 +301,16 @@ struct gpu_dvfs_metrics_uid_stats; * @dvfs.qos.bts.enabled: Stores whether Bus Traffic Shaping (BTS) is currently enabled * @dvfs.qos.bts.threshold: The G3D shader stack clock at which BTS will be enabled. Set via DT. * @dvfs.qos.bts.scenario: The index of the BTS scenario to be used. Set via DT. + * + * @slc.lock: Synchronize updates to the SLC partition accounting variables. + * @slc.demand: The total demand for SLC space, an aggregation of each kctx's demand. + * @slc.usage: The total amount of SLC space used, an aggregation of each kctx's usage. + * + * @itmon.wq: A workqueue for ITMON page table search. + * @itmon.work: The work item for the above. + * @itmon.nb: The ITMON notifier block. + * @itmon.pa: The faulting physical address. + * @itmon.active: Active count, non-zero while a search is active. 
*/ struct pixel_context { struct kbase_device *kbdev; @@ -303,10 +321,10 @@ struct pixel_context { struct device *domain_devs[GPU_PM_DOMAIN_COUNT]; struct device_link *domain_links[GPU_PM_DOMAIN_COUNT]; - struct exynos_pm_domain *domain; unsigned int status_reg_offset; unsigned int status_local_power_mask; + bool use_autosuspend; unsigned int autosuspend_delay; #ifdef CONFIG_MALI_MIDGARD_DVFS struct gpu_dvfs_opp_metrics power_off_metrics; @@ -315,6 +333,10 @@ struct pixel_context { #if IS_ENABLED(CONFIG_GOOGLE_BCL) struct bcl_device *bcl_dev; #endif + struct pixel_rail_state_log *rail_state_log; +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + bool ifpo_enabled; +#endif } pm; #ifdef CONFIG_MALI_PIXEL_GPU_SECURE_RENDERING @@ -328,17 +350,21 @@ struct pixel_context { struct workqueue_struct *control_wq; struct work_struct control_work; atomic_t util; +#if !MALI_USE_CSF atomic_t util_gl; atomic_t util_cl; +#endif struct workqueue_struct *clockdown_wq; struct delayed_work clockdown_work; unsigned int clockdown_hysteresis; + bool updates_enabled; struct gpu_dvfs_clk clks[GPU_DVFS_CLK_COUNT]; struct gpu_dvfs_opp *table; int table_size; + int step_up_val; int level; int level_target; int level_max; @@ -354,11 +380,16 @@ struct pixel_context { } governor; struct { + spinlock_t lock; u64 last_time; bool last_power_state; int last_level; int *transtab; - struct gpu_dvfs_metrics_uid_stats *js_uid_stats[BASE_JM_MAX_NR_SLOTS]; +#if !MALI_USE_CSF + struct gpu_dvfs_metrics_uid_stats *work_uid_stats[BASE_JM_MAX_NR_SLOTS * SLOT_RB_SIZE]; +#else + struct gpu_dvfs_metrics_uid_stats *work_uid_stats[MAX_SUPPORTED_CSGS]; +#endif /* !MALI_USE_CSF */ struct list_head uid_stats_list; } metrics; @@ -389,6 +420,38 @@ struct pixel_context { #endif /* CONFIG_MALI_PIXEL_GPU_THERMAL */ } dvfs; #endif /* CONFIG_MALI_MIDGARD_DVFS */ + + struct { + struct mutex lock; + u64 demand; + u64 usage; + } slc; + +#if IS_ENABLED(CONFIG_EXYNOS_ITMON) + struct { + struct workqueue_struct *wq; + struct work_struct work; + struct notifier_block nb; + phys_addr_t pa; + atomic_t active; + } itmon; +#endif +}; + +/** + * struct pixel_platform_data - Per kbase_context Pixel specific platform data + * + * @stats: Tracks the dvfs metrics for the UID associated with this context + * + * @slc.peak_demand: The parent context's maximum demand for SLC space + * @slc.peak_usage: The parent context's maximum use of SLC space + */ +struct pixel_platform_data { + struct gpu_dvfs_metrics_uid_stats* stats; + struct { + u64 peak_demand; + u64 peak_usage; + } slc; }; #endif /* _KBASE_CONFIG_PLATFORM_H_ */ diff --git a/mali_kbase/platform/pixel/pixel_gpu.c b/mali_kbase/platform/pixel/pixel_gpu.c index 940f125..3e8977c 100644 --- a/mali_kbase/platform/pixel/pixel_gpu.c +++ b/mali_kbase/platform/pixel/pixel_gpu.c @@ -21,10 +21,15 @@ #ifdef CONFIG_MALI_PIXEL_GPU_SECURE_RENDERING #include <device/mali_kbase_device_internal.h> #endif /* CONFIG_MALI_PIXEL_GPU_SECURE_RENDERING */ +#if MALI_USE_CSF +#include <csf/mali_kbase_csf_firmware_cfg.h> +#endif /* Pixel integration includes */ #include "mali_kbase_config_platform.h" #include "pixel_gpu_control.h" +#include "pixel_gpu_sscd.h" +#include "pixel_gpu_slc.h" #define CREATE_TRACE_POINTS #include "pixel_gpu_trace.h" @@ -35,6 +40,10 @@ */ #define GPU_SMC_TZPC_OK 0 +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS +#define HOST_CONTROLS_SC_RAILS_CFG_ENTRY_NAME "Host controls SC rails" +#endif + /** * pixel_gpu_secure_mode_enable() - Enables secure mode for the GPU * @@ -118,6 +127,123 @@ struct protected_mode_ops 
pixel_protected_ops = { #endif /* CONFIG_MALI_PIXEL_GPU_SECURE_RENDERING */ +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS +/** + * gpu_pixel_enable_host_ctrl_sc_rails() - Enable the config in FW to support host based + * control of SC power rails + * + * Look for the config entry that enables support in FW for the Host based + * control of shader core power rails and set it before the initial boot + * or reload of firmware. + * + * @kbdev: Kbase device structure + * + * Return: 0 if successful, negative error code on failure + */ +static int gpu_pixel_enable_host_ctrl_sc_rails(struct kbase_device *kbdev) +{ + u32 addr; + int ec = kbase_csf_firmware_cfg_find_config_address( + kbdev, HOST_CONTROLS_SC_RAILS_CFG_ENTRY_NAME, &addr); + + if (!ec) { + kbase_csf_update_firmware_memory(kbdev, addr, 1); + } + + return ec; +} +#endif + +static int gpu_fw_cfg_init(struct kbase_device *kbdev) { + int ec = 0; + +#if MALI_USE_CSF +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + ec = gpu_pixel_enable_host_ctrl_sc_rails(kbdev); + if (ec) + dev_warn(kbdev->dev, "pixel: failed to enable SC rail host-control"); +#endif + if (gpu_sscd_fw_log_init(kbdev, 0)) { + dev_warn(kbdev->dev, "pixel: failed to enable FW log"); + } +#endif + + return ec; +} + +/** + * gpu_pixel_kctx_init() - Called when a kernel context is created + * + * @kctx: The &struct kbase_context that is being initialized + * + * This function is called when the GPU driver is initializing a new kernel context. + * + * Return: Returns 0 on success, or an error code on failure. + */ +static int gpu_pixel_kctx_init(struct kbase_context *kctx) +{ + struct kbase_device* kbdev = kctx->kbdev; + int err; + + kctx->platform_data = kzalloc(sizeof(struct pixel_platform_data), GFP_KERNEL); + if (kctx->platform_data == NULL) { + dev_err(kbdev->dev, "pixel: failed to alloc platform_data for kctx"); + err = -ENOMEM; + goto done; + } + + err = gpu_dvfs_kctx_init(kctx); + if (err) { + dev_err(kbdev->dev, "pixel: DVFS kctx init failed\n"); + goto done; + } + + err = gpu_slc_kctx_init(kctx); + if (err) { + dev_err(kbdev->dev, "pixel: SLC kctx init failed\n"); + goto done; + } + +done: + return err; +} + +/** + * gpu_pixel_kctx_term() - Called when a kernel context is terminated + * + * @kctx: The &struct kbase_context that is being terminated + */ +static void gpu_pixel_kctx_term(struct kbase_context *kctx) +{ + gpu_slc_kctx_term(kctx); + gpu_dvfs_kctx_term(kctx); + + kfree(kctx->platform_data); + kctx->platform_data = NULL; +} + +static const struct kbase_device_init dev_init[] = { + { gpu_pm_init, gpu_pm_term, "PM init failed" }, +#ifdef CONFIG_MALI_MIDGARD_DVFS + { gpu_dvfs_init, gpu_dvfs_term, "DVFS init failed" }, +#endif + { gpu_sysfs_init, gpu_sysfs_term, "sysfs init failed" }, + { gpu_sscd_init, gpu_sscd_term, "SSCD init failed" }, + { gpu_slc_init, gpu_slc_term, "SLC init failed" }, +#if IS_ENABLED(CONFIG_EXYNOS_ITMON) + { gpu_itmon_init, gpu_itmon_term, "ITMON notifier init failed" }, +#endif +}; + +static void gpu_pixel_term_partial(struct kbase_device *kbdev, + unsigned int i) +{ + while (i-- > 0) { + if (dev_init[i].term) + dev_init[i].term(kbdev); + } +} + /** * gpu_pixel_init() - Initializes the Pixel integration for the Mali GPU. 
* @@ -127,8 +253,8 @@ struct protected_mode_ops pixel_protected_ops = { */ static int gpu_pixel_init(struct kbase_device *kbdev) { - int ret; - + int ret = 0; + unsigned int i; struct pixel_context *pc; pc = kzalloc(sizeof(struct pixel_context), GFP_KERNEL); @@ -141,26 +267,22 @@ static int gpu_pixel_init(struct kbase_device *kbdev) kbdev->platform_context = pc; pc->kbdev = kbdev; - ret = gpu_pm_init(kbdev); - if (ret) { - dev_err(kbdev->dev, "power management init failed\n"); - goto done; - } - -#ifdef CONFIG_MALI_MIDGARD_DVFS - ret = gpu_dvfs_init(kbdev); - if (ret) { - dev_err(kbdev->dev, "DVFS init failed\n"); - goto done; + for (i = 0; i < ARRAY_SIZE(dev_init); i++) { + if (dev_init[i].init) { + ret = dev_init[i].init(kbdev); + if (ret) { + dev_err(kbdev->dev, "%s error = %d\n", + dev_init[i].err_mes, ret); + break; + } + } } -#endif /* CONFIG_MALI_MIDGARD_DVFS */ - ret = gpu_sysfs_init(kbdev); if (ret) { - dev_err(kbdev->dev, "sysfs init failed\n"); - goto done; + gpu_pixel_term_partial(kbdev, i); + kbdev->platform_context = NULL; + kfree(pc); } - ret = 0; done: return ret; @@ -175,10 +297,7 @@ static void gpu_pixel_term(struct kbase_device *kbdev) { struct pixel_context *pc = kbdev->platform_context; - gpu_sysfs_term(kbdev); - gpu_dvfs_term(kbdev); - gpu_pm_term(kbdev); - + gpu_pixel_term_partial(kbdev, ARRAY_SIZE(dev_init)); kbdev->platform_context = NULL; kfree(pc); } @@ -186,8 +305,10 @@ static void gpu_pixel_term(struct kbase_device *kbdev) struct kbase_platform_funcs_conf platform_funcs = { .platform_init_func = &gpu_pixel_init, .platform_term_func = &gpu_pixel_term, - .platform_handler_context_init_func = &gpu_dvfs_kctx_init, - .platform_handler_context_term_func = &gpu_dvfs_kctx_term, - .platform_handler_atom_submit_func = &gpu_dvfs_metrics_job_start, - .platform_handler_atom_complete_func = &gpu_dvfs_metrics_job_end, + .platform_handler_context_init_func = &gpu_pixel_kctx_init, + .platform_handler_context_term_func = &gpu_pixel_kctx_term, + .platform_handler_work_begin_func = &gpu_dvfs_metrics_work_begin, + .platform_handler_work_end_func = &gpu_dvfs_metrics_work_end, + .platform_fw_cfg_init_func = &gpu_fw_cfg_init, + .platform_handler_core_dump_func = &gpu_sscd_dump, }; diff --git a/mali_kbase/platform/pixel/pixel_gpu_control.h b/mali_kbase/platform/pixel/pixel_gpu_control.h index 5b4e184..51b3063 100644 --- a/mali_kbase/platform/pixel/pixel_gpu_control.h +++ b/mali_kbase/platform/pixel/pixel_gpu_control.h @@ -12,19 +12,44 @@ bool gpu_pm_get_power_state(struct kbase_device *kbdev); int gpu_pm_init(struct kbase_device *kbdev); void gpu_pm_term(struct kbase_device *kbdev); +void* gpu_pm_get_rail_state_log(struct kbase_device *kbdev); +unsigned int gpu_pm_get_rail_state_log_size(struct kbase_device *kbdev); /* DVFS */ void gpu_dvfs_event_power_on(struct kbase_device *kbdev); void gpu_dvfs_event_power_off(struct kbase_device *kbdev); + +#ifdef CONFIG_MALI_MIDGARD_DVFS int gpu_dvfs_init(struct kbase_device *kbdev); void gpu_dvfs_term(struct kbase_device *kbdev); +void gpu_dvfs_disable_updates(struct kbase_device *kbdev); +void gpu_dvfs_enable_updates(struct kbase_device *kbdev); +#else +static int __maybe_unused gpu_dvfs_init(struct kbase_device *kbdev) { return 0; } +static void __maybe_unused gpu_dvfs_term(struct kbase_device *kbdev) {} +static void __maybe_unused gpu_dvfs_disable_updates(struct kbase_device *kbdev) {} +static void __maybe_unused gpu_dvfs_enable_updates(struct kbase_device *kbdev) {} +#endif /* sysfs */ +#ifdef CONFIG_MALI_MIDGARD_DVFS int 
gpu_sysfs_init(struct kbase_device *kbdev); void gpu_sysfs_term(struct kbase_device *kbdev); +#else +static int __maybe_unused gpu_sysfs_init(struct kbase_device *kbdev) { return 0; } +static void __maybe_unused gpu_sysfs_term(struct kbase_device *kbdev) {} +#endif /* Kernel context callbacks */ +#ifdef CONFIG_MALI_MIDGARD_DVFS int gpu_dvfs_kctx_init(struct kbase_context *kctx); void gpu_dvfs_kctx_term(struct kbase_context *kctx); +#endif + +/* ITMON notifier */ +#if IS_ENABLED(CONFIG_EXYNOS_ITMON) +int gpu_itmon_init(struct kbase_device *kbdev); +void gpu_itmon_term(struct kbase_device *kbdev); +#endif #endif /* _PIXEL_GPU_CONTROL_H_ */ diff --git a/mali_kbase/platform/pixel/pixel_gpu_debug.c b/mali_kbase/platform/pixel/pixel_gpu_debug.c new file mode 100644 index 0000000..f08f0b0 --- /dev/null +++ b/mali_kbase/platform/pixel/pixel_gpu_debug.c @@ -0,0 +1,103 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright 2022 Google LLC. + * + * Author: Jack Diver <diverj@google.com> + */ + +/* Mali core includes */ +#include <mali_kbase.h> +#include <device/mali_kbase_device.h> + +/* Pixel integration includes */ +#include "pixel_gpu_debug.h" + +#define GPU_DBG_LO 0x00000FE8 +#define PIXEL_STACK_PDC_ADDR 0x000770DB +#define PIXEL_CG_PDC_ADDR 0x000760DB +#define PIXEL_SC_PDC_ADDR 0x000740DB +#define GPU_PDC_ADDR(offset, val) ((offset) + ((val) << 8)) +#define GPU_DBG_ACTIVE_BIT (1 << 31) +#define GPU_DBG_ACTIVE_MAX_LOOPS 1000000 +#define GPU_DBG_INVALID (~0U) + +static bool gpu_debug_check_dbg_active(struct kbase_device *kbdev) +{ + int i = 0; + u32 val; + + lockdep_assert_held(&kbdev->hwaccess_lock); + + /* Wait for the active bit to drop, indicating the DBG command completed */ + do { + val = kbase_reg_read(kbdev, GPU_CONTROL_REG(GPU_STATUS)); + } while ((val & GPU_DBG_ACTIVE_BIT) && i++ < GPU_DBG_ACTIVE_MAX_LOOPS); + + if (val & GPU_DBG_ACTIVE_BIT) { + dev_err(kbdev->dev, "Timed out waiting for GPU DBG command to complete"); + return false; + } + + dev_dbg(kbdev->dev, "Waited for %d iterations before GPU DBG command completed", i); + + return true; +} + +static u32 gpu_debug_read_pdc(struct kbase_device *kbdev, u32 pdc_offset) +{ + lockdep_assert_held(&kbdev->hwaccess_lock); + + /* Write the debug command */ + kbase_reg_write(kbdev, GPU_CONTROL_REG(GPU_COMMAND), pdc_offset); + /* Wait for the debug command to complete */ + if (!gpu_debug_check_dbg_active(kbdev)) + return GPU_DBG_INVALID; + + /* Read the result */ + return kbase_reg_read(kbdev, GPU_CONTROL_REG(GPU_DBG_LO)); +} + +static void gpu_debug_read_sparse_pdcs(struct kbase_device *kbdev, u32 *out, u64 available, + u64 offset, u64 logical_max) +{ + int sparse_idx, logical_idx = 0; + + for (sparse_idx = 0; sparse_idx < BITS_PER_TYPE(u64) && logical_idx < logical_max; ++sparse_idx) { + /* Skip if we don't have this core in our configuration */ + if (!(available & BIT_ULL(sparse_idx))) + continue; + + /* GPU debug command expects the sparse core index */ + out[logical_idx] = gpu_debug_read_pdc(kbdev, GPU_PDC_ADDR(offset, sparse_idx)); + + ++logical_idx; + } +} + +void gpu_debug_read_pdc_status(struct kbase_device *kbdev, struct pixel_gpu_pdc_status *status) +{ + struct gpu_raw_gpu_props *raw_props; + + lockdep_assert_held(&kbdev->hwaccess_lock); + + status->meta = (struct pixel_gpu_pdc_status_metadata) { + .magic = "pdcs", + .version = 2, + }; + + /* If there's no external power we skip the register read/writes, + * We know all the PDC signals will be 0 in this case + */ + if (!kbdev->pm.backend.gpu_powered) { + memset(&status->state, 0, 
sizeof(status->state)); + return; + } + + raw_props = &kbdev->gpu_props.props.raw_props; + + status->state.core_group = gpu_debug_read_pdc(kbdev, PIXEL_CG_PDC_ADDR); + gpu_debug_read_sparse_pdcs(kbdev, status->state.shader_cores, raw_props->shader_present, + PIXEL_SC_PDC_ADDR, PIXEL_MALI_SC_COUNT); + gpu_debug_read_sparse_pdcs(kbdev, status->state.stacks, raw_props->stack_present, + PIXEL_STACK_PDC_ADDR, PIXEL_MALI_STACK_COUNT); +} diff --git a/mali_kbase/platform/pixel/pixel_gpu_debug.h b/mali_kbase/platform/pixel/pixel_gpu_debug.h new file mode 100644 index 0000000..c4bcd4a --- /dev/null +++ b/mali_kbase/platform/pixel/pixel_gpu_debug.h @@ -0,0 +1,130 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright 2022 Google LLC. + * + * Author: Jack Diver <diverj@google.com> + */ + +#ifndef _PIXEL_GPU_DEBUG_H_ +#define _PIXEL_GPU_DEBUG_H_ + +/* This is currently only supported for Odin */ +#define PIXEL_MALI_SC_COUNT 0x7 +#define PIXEL_MALI_STACK_COUNT 0x3 + +/** + * enum pixel_gpu_pdc_state - PDC internal state + */ +enum pixel_gpu_pdc_state { + PIXEL_GPU_PDC_STATE_POWER_OFF, + PIXEL_GPU_PDC_STATE_UP_POWER, + PIXEL_GPU_PDC_STATE_UP_ISOLATE, + PIXEL_GPU_PDC_STATE_UP_RESET, + PIXEL_GPU_PDC_STATE_UP_CLOCK, + PIXEL_GPU_PDC_STATE_UP_FUNC_ISOLATE, + PIXEL_GPU_PDC_STATE_UP_RESP, + PIXEL_GPU_PDC_STATE_UNUSED7, + PIXEL_GPU_PDC_STATE_UNUSED8, + PIXEL_GPU_PDC_STATE_POWER_ON, + PIXEL_GPU_PDC_STATE_DOWN_FUNC_ISOLATE, + PIXEL_GPU_PDC_STATE_DOWN_CLOCK, + PIXEL_GPU_PDC_STATE_DOWN_RESET, + PIXEL_GPU_PDC_STATE_DOWN_ISOLATE, + PIXEL_GPU_PDC_STATE_DOWN_POWER, + PIXEL_GPU_PDC_STATE_DOWN_RESP, + PIXEL_GPU_PDC_STATE_FAST_FUNC_ISOLATE, + PIXEL_GPU_PDC_STATE_FAST_CLOCK, + PIXEL_GPU_PDC_STATE_FAST_RESET, + PIXEL_GPU_PDC_STATE_FAST_RESP, + PIXEL_GPU_PDC_STATE_FAST_WAIT, + PIXEL_GPU_PDC_STATE_UNUSED11, + PIXEL_GPU_PDC_STATE_UNUSED12, + PIXEL_GPU_PDC_STATE_UNUSED13, + PIXEL_GPU_PDC_STATE_UNUSED14, + PIXEL_GPU_PDC_STATE_UNUSED15, + PIXEL_GPU_PDC_STATE_UNUSED16, + PIXEL_GPU_PDC_STATE_UNUSED17, + PIXEL_GPU_PDC_STATE_UNUSED1A, + PIXEL_GPU_PDC_STATE_UNUSED1B, + PIXEL_GPU_PDC_STATE_UNUSED1F, +}; + +/** + * struct pixel_gpu_pdc_status_bits - PDC status layout + * + * @state: PDC state, see enum pixel_gpu_pdc_state for details + * @func_iso_n: Functional isolation request + * @func_iso_ack_n Functional isolation complete + * @pwrup: Power up request + * @pwrup_ack Power up request acknowledged by PDC + * @reset_n Reset request + * @reset_ack_n Reset request acknowledged by PDC + * @isolate_n Physical isolation enable request + * @isolate_ack_n Physical isolation enable request has been acknowledged by PDC + * @clken Clock enable request + * @clken_ack Clock enable request acknowledged from internal gating + * @power_is_on PDC thinks power domain is fully on + * @power_is_off PDC thinks power domain is fully off + * @_reserved Undocumented + **/ +struct pixel_gpu_pdc_status_bits { + uint32_t state : 5; + uint32_t func_iso_n : 1; + uint32_t func_iso_ack_n : 1; + uint32_t pwrup : 1; + uint32_t pwrup_ack : 1; + uint32_t reset_n : 1; + uint32_t reset_ack_n : 1; + uint32_t isolate_n : 1; + uint32_t isolate_ack_n : 1; + uint32_t clken : 1; + uint32_t clken_ack : 1; + uint32_t power_is_on : 1; + uint32_t power_is_off : 1; + uint32_t _reserved : 15; +}; +_Static_assert(sizeof(struct pixel_gpu_pdc_status_bits) == sizeof(uint32_t), + "Incorrect pixel_gpu_pdc_status_bits size"); + +/** + * struct pixel_gpu_pdc_status_metadata - Info about the PDC status format + * + * @magic: Always 'pdcs', helps find the log in memory dumps + * 
@version: Updated whenever the binary layout changes + * @_reserved: Bytes reserved for future use + **/ +struct pixel_gpu_pdc_status_metadata { + char magic[4]; + uint8_t version; + char _reserved[11]; +} __attribute__((packed)); +_Static_assert(sizeof(struct pixel_gpu_pdc_status_metadata) == 16, + "Incorrect pixel_gpu_pdc_status_metadata size"); + +/** + * struct pixel_gpu_pdc_status - FW view of PDC state + * + * @meta: Info about the status format + * @core_group: Core group PDC state + * @shader_cores: Shader core PDC state + **/ +struct pixel_gpu_pdc_status { + struct pixel_gpu_pdc_status_metadata meta; + struct { + uint32_t core_group; + uint32_t shader_cores[PIXEL_MALI_SC_COUNT]; + uint32_t stacks[PIXEL_MALI_STACK_COUNT]; + } state; +} __attribute__((packed)); + +#if MALI_USE_CSF +void gpu_debug_read_pdc_status(struct kbase_device *kbdev, struct pixel_gpu_pdc_status *status); +#else +static void __maybe_unused gpu_debug_read_pdc_status(struct kbase_device *kbdev, + struct pixel_gpu_pdc_status *status) +{ + (void)kbdev, (void)status; +} +#endif + +#endif /* _PIXEL_GPU_DEBUG_H_ */ diff --git a/mali_kbase/platform/pixel/pixel_gpu_dvfs.c b/mali_kbase/platform/pixel/pixel_gpu_dvfs.c index ae6f496..f758867 100644 --- a/mali_kbase/platform/pixel/pixel_gpu_dvfs.c +++ b/mali_kbase/platform/pixel/pixel_gpu_dvfs.c @@ -26,12 +26,33 @@ #include "pixel_gpu_dvfs.h" #include "pixel_gpu_trace.h" -#define DVFS_TABLE_ROW_MAX (12) +#define DVFS_TABLE_ROW_MAX (14) +#define DVFS_TABLES_MAX (2) static struct gpu_dvfs_opp gpu_dvfs_table[DVFS_TABLE_ROW_MAX]; /* DVFS event handling code */ /** + * gpu_dvfs_set_freq() - Request a frequency change for a GPU domain + * + * @kbdev: &struct kbase_device for the GPU. + * @domain: The GPU domain that shall have its frequency changed. + * @level: The frequency level to set the GPU domain to. + * + * Context: Expects the caller to hold the domain access lock + * + * Return: See cal_dfs_set_rate + */ +static int gpu_dvfs_set_freq(struct kbase_device *kbdev, enum gpu_dvfs_clk_index domain, int level) +{ + struct pixel_context *pc = kbdev->platform_context; + + lockdep_assert_held(&pc->pm.domain->access_lock); + + return cal_dfs_set_rate(pc->dvfs.clks[domain].cal_id, pc->dvfs.table[level].clk[domain]); +} + +/** * gpu_dvfs_set_new_level() - Updates the GPU operating point. * * @kbdev: The &struct kbase_device for the GPU. @@ -43,7 +64,6 @@ static struct gpu_dvfs_opp gpu_dvfs_table[DVFS_TABLE_ROW_MAX]; static int gpu_dvfs_set_new_level(struct kbase_device *kbdev, int next_level) { struct pixel_context *pc = kbdev->platform_context; - int c; lockdep_assert_held(&pc->dvfs.lock); @@ -55,8 +75,17 @@ static int gpu_dvfs_set_new_level(struct kbase_device *kbdev, int next_level) mutex_lock(&pc->pm.domain->access_lock); - for (c = 0; c < GPU_DVFS_CLK_COUNT; c++) - cal_dfs_set_rate(pc->dvfs.clks[c].cal_id, pc->dvfs.table[next_level].clk[c]); + /* We must enforce the CLK_G3DL2 >= CLK_G3D constraint. + * When clocking down we must set G3D CLK first to avoid violating the constraint.
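	 * As an illustration (the frequencies below are hypothetical, not taken
	 * from any table): dropping from a 700/700 MHz operating point to
	 * 400/400 MHz lowers CLK_G3D (shaders) first, so the constraint holds as
	 * 700 >= 400 and then 400 >= 400; raising the level back up sets
	 * CLK_G3DL2 (top level) first, giving 700 >= 400 and then 700 >= 700,
	 * whereas the opposite order would momentarily leave CLK_G3DL2 below
	 * CLK_G3D.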
+ */ + if (next_level > pc->dvfs.level) { + gpu_dvfs_set_freq(kbdev, GPU_DVFS_CLK_SHADERS, next_level); + gpu_dvfs_set_freq(kbdev, GPU_DVFS_CLK_TOP_LEVEL, next_level); + } else { + gpu_dvfs_set_freq(kbdev, GPU_DVFS_CLK_TOP_LEVEL, next_level); + gpu_dvfs_set_freq(kbdev, GPU_DVFS_CLK_SHADERS, next_level); + } + mutex_unlock(&pc->pm.domain->access_lock); @@ -82,6 +111,8 @@ static int gpu_dvfs_set_new_level(struct kbase_device *kbdev, int next_level) * taking into account the priority levels of each level lock. It ensures that votes on minimum and * maximum levels originating from different level lock types are supported. * + * Context: Expects the caller to hold the DVFS lock + * * Note: This is the only function that should write to &level_scaling_max or &level_scaling_min. */ static void gpu_dvfs_process_level_locks(struct kbase_device *kbdev) @@ -267,6 +298,7 @@ static void gpu_dvfs_clockdown_worker(struct work_struct *data) static inline void gpu_dvfs_set_level_locks_from_util(struct kbase_device *kbdev, struct gpu_dvfs_utlization *util_stats) { +#if !MALI_USE_CSF struct pixel_context *pc = kbdev->platform_context; bool cl_lock_set = (pc->dvfs.level_locks[GPU_DVFS_LEVEL_LOCK_COMPUTE].level_min != -1 || pc->dvfs.level_locks[GPU_DVFS_LEVEL_LOCK_COMPUTE].level_max != -1); @@ -277,6 +309,7 @@ static inline void gpu_dvfs_set_level_locks_from_util(struct kbase_device *kbdev pc->dvfs.level_scaling_compute_min, -1); else if (util_stats->util_cl == 0 && cl_lock_set) gpu_dvfs_reset_level_lock(kbdev, GPU_DVFS_LEVEL_LOCK_COMPUTE); +#endif /* !MALI_USE_CSF */ } /** @@ -299,10 +332,12 @@ void gpu_dvfs_select_level(struct kbase_device *kbdev) struct pixel_context *pc = kbdev->platform_context; struct gpu_dvfs_utlization util_stats; - if (gpu_pm_get_power_state(kbdev)) { + if (pc->dvfs.updates_enabled && gpu_pm_get_power_state(kbdev)) { util_stats.util = atomic_read(&pc->dvfs.util); +#if !MALI_USE_CSF util_stats.util_gl = atomic_read(&pc->dvfs.util_gl); util_stats.util_cl = atomic_read(&pc->dvfs.util_cl); +#endif gpu_dvfs_set_level_locks_from_util(kbdev, &util_stats); @@ -325,6 +360,41 @@ void gpu_dvfs_select_level(struct kbase_device *kbdev) } } +#ifdef CONFIG_MALI_MIDGARD_DVFS +/** + * gpu_dvfs_disable_updates() - Ensure DVFS updates are disabled + * + * @kbdev: The &struct kbase_device for the GPU. + * + * Ensure that no DVFS updates will occur after this call completes. + */ +void gpu_dvfs_disable_updates(struct kbase_device *kbdev) { + struct pixel_context *pc = kbdev->platform_context; + + mutex_lock(&pc->dvfs.lock); + pc->dvfs.updates_enabled = false; + mutex_unlock(&pc->dvfs.lock); + + flush_workqueue(pc->dvfs.control_wq); +} + +/** + * gpu_dvfs_enable_updates() - Ensure DVFS updates are enabled + * + * @kbdev: The &struct kbase_device for the GPU. + * + * Ensure that DVFS updates will occur after this call completes, undoing the effect of + * gpu_dvfs_disable_updates(). + */ +void gpu_dvfs_enable_updates(struct kbase_device *kbdev) { + struct pixel_context *pc = kbdev->platform_context; + + mutex_lock(&pc->dvfs.lock); + pc->dvfs.updates_enabled = true; + mutex_unlock(&pc->dvfs.lock); +} +#endif + /** * gpu_dvfs_control_worker() - The workqueue worker that changes DVFS on utilization change. * * @@ -341,6 +411,34 @@ static void gpu_dvfs_control_worker(struct work_struct *data) mutex_unlock(&pc->dvfs.lock); } +#if MALI_USE_CSF +/** + * kbase_platform_dvfs_event() - Callback from Mali driver to report updated utilization metrics. + * + * @kbdev: The &struct kbase_device for the GPU.
+ * @utilisation: The calculated utilization as measured by the core Mali driver's metrics system. + * + * This is the function that bridges the core Mali driver and the Pixel integration code. As this call is + * made in interrupt context, it is swiftly handed off to a workqueue for further processing. + * + * Context: Interrupt context. + * + * Return: Returns 1 to signal success as specified in mali_kbase_pm_internal.h. + */ +int kbase_platform_dvfs_event(struct kbase_device *kbdev, u32 utilisation) +{ + struct pixel_context *pc = kbdev->platform_context; + int proc = raw_smp_processor_id(); + + /* TODO (b/187175695): Report this data via a custom ftrace event instead */ + trace_clock_set_rate("gpu_util", utilisation, proc); + + atomic_set(&pc->dvfs.util, utilisation); + queue_work(pc->dvfs.control_wq, &pc->dvfs.control_work); + + return 1; +} +#else /* MALI_USE_CSF */ /** * kbase_platform_dvfs_event() - Callback from Mali driver to report updated utilization metrics. * * @@ -374,6 +472,7 @@ int kbase_platform_dvfs_event(struct kbase_device *kbdev, u32 utilisation, return 1; } +#endif /* Initialization code */ @@ -405,39 +504,39 @@ static int find_voltage_for_freq(struct kbase_device *kbdev, unsigned int clock, } /** - * gpu_dvfs_update_asv_table() - Populate the GPU's DVFS table from DT. + * validate_and_parse_dvfs_table() - Validate and populate the GPU's DVFS table from DT. * * @kbdev: The &struct kbase_device for the GPU. + * @dvfs_table_num: DVFS table number to be validated and parsed. * - * This function reads data out of the GPU's device tree entry and uses it to populate - * &gpu_dvfs_table. For each entry in the DVFS table, it makes calls to determine voltages from ECT. - * It also checks for any level locks specified in the devicetree and ensures that the effective - * scaling range is set up. + * This function reads data out of the GPU's device tree entry, validates it, and + * uses it to populate &gpu_dvfs_table. For each entry in the DVFS table, it makes + * calls to determine voltages from ECT. It also checks for any level locks specified + * in the devicetree and ensures that the effective scaling range is set up. * - * This function will fail if the required data is not present in the GPU's device tree entry. + * This function will fail if the particular DVFS table's operating points do not + * match the ECT table for the device. * - * Return: Returns the size of the DVFS table on success, -EINVAL on failure. + * Return: Returns the number of operating points in the DVFS table on success, -EINVAL on failure.
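 *
 * As an illustration only (the values below are made up, and real tables may
 * carry additional columns), a device tree fragment of this shape describes
 * two operating points, where the first five columns of each row are the
 * top-level clock, shader clock, util_min, util_max and hysteresis:
 *
 *   gpu_dvfs_table_size_v2 = <2 5>;
 *   gpu_dvfs_table_v2 = <800000 800000 60 95 3
 *                        700000 700000 60 90 3>;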
*/ -static int gpu_dvfs_update_asv_table(struct kbase_device *kbdev) +static int validate_and_parse_dvfs_table(struct kbase_device *kbdev, int dvfs_table_num) { - struct pixel_context *pc = kbdev->platform_context; - struct device_node *np = kbdev->dev->of_node; + char table_name[64]; + char table_size_name[64]; int i, idx, c; - int of_data_int_array[OF_DATA_NUM_MAX]; int dvfs_table_row_num = 0, dvfs_table_col_num = 0; int dvfs_table_size = 0; - - struct dvfs_rate_volt vf_map[GPU_DVFS_CLK_COUNT][16]; - int level_count[GPU_DVFS_CLK_COUNT]; - int scaling_level_max = -1, scaling_level_min = -1; int scaling_freq_max_devicetree = INT_MAX; int scaling_freq_min_devicetree = 0; int scaling_freq_min_compute = 0; + int level_count[GPU_DVFS_CLK_COUNT]; + struct dvfs_rate_volt vf_map[GPU_DVFS_CLK_COUNT][16]; - bool use_asv_v1 = false; + struct device_node *np = kbdev->dev->of_node; + struct pixel_context *pc = kbdev->platform_context; /* Get frequency -> voltage mapping */ for (c = 0; c < GPU_DVFS_CLK_COUNT; c++) { @@ -448,22 +547,9 @@ static int gpu_dvfs_update_asv_table(struct kbase_device *kbdev) } } - /* We detect which ASV table the GPU is running by checking which - * operating points are available from ECT. We check for 202MHz on the - * GPU shader cores as this is only available in ASV v0.3 and later. - */ - if (find_voltage_for_freq(kbdev, 202000, NULL, vf_map[GPU_DVFS_CLK_SHADERS], - level_count[GPU_DVFS_CLK_SHADERS])) - use_asv_v1 = true; - - /* Get size of DVFS table data from device tree */ - if (use_asv_v1) { - if (of_property_read_u32_array(np, "gpu_dvfs_table_size_v1", of_data_int_array, 2)) - goto err; - } else { - if (of_property_read_u32_array(np, "gpu_dvfs_table_size_v2", of_data_int_array, 2)) - goto err; - } + sprintf(table_size_name, "gpu_dvfs_table_size_v%d", dvfs_table_num); + if (of_property_read_u32_array(np, table_size_name, of_data_int_array, 2)) + goto err; dvfs_table_row_num = of_data_int_array[0]; dvfs_table_col_num = of_data_int_array[1]; @@ -471,27 +557,39 @@ static int gpu_dvfs_update_asv_table(struct kbase_device *kbdev) if (dvfs_table_row_num > DVFS_TABLE_ROW_MAX) { dev_err(kbdev->dev, - "DVFS table has %d rows but only up to %d are supported\n", - dvfs_table_row_num, DVFS_TABLE_ROW_MAX); + "DVFS table %d has %d rows but only up to %d are supported", + dvfs_table_num, dvfs_table_row_num, DVFS_TABLE_ROW_MAX); goto err; } if (dvfs_table_size > OF_DATA_NUM_MAX) { - dev_err(kbdev->dev, "DVFS table is too big\n"); + dev_err(kbdev->dev, "DVFS table %d is too big", dvfs_table_num); goto err; } - - if (use_asv_v1) - of_property_read_u32_array(np, "gpu_dvfs_table_v1", - of_data_int_array, dvfs_table_size); - else - of_property_read_u32_array(np, "gpu_dvfs_table_v2", - of_data_int_array, dvfs_table_size); + sprintf(table_name, "gpu_dvfs_table_v%d", dvfs_table_num); + if (of_property_read_u32_array(np, table_name, of_data_int_array, dvfs_table_size)) + goto err; of_property_read_u32(np, "gpu_dvfs_max_freq", &scaling_freq_max_devicetree); of_property_read_u32(np, "gpu_dvfs_min_freq", &scaling_freq_min_devicetree); of_property_read_u32(np, "gpu_dvfs_min_freq_compute", &scaling_freq_min_compute); + /* Check if there is a voltage mapping for each frequency in the ECT table */ + for (i = 0; i < dvfs_table_row_num; i++) { + idx = i * dvfs_table_col_num; + + /* Get and validate voltages from cal-if */ + for (c = 0; c < GPU_DVFS_CLK_COUNT; c++) { + if (find_voltage_for_freq(kbdev, of_data_int_array[idx + c], + NULL, vf_map[c], level_count[c])) { + dev_dbg(kbdev->dev, + "Failed to 
find voltage for clock %u frequency %u in gpu_dvfs_table_v%d\n", + c, of_data_int_array[idx + c], dvfs_table_num); + goto err; + } + } + } + /* Process DVFS table data from device tree and store it in OPP table */ for (i = 0; i < dvfs_table_row_num; i++) { idx = i * dvfs_table_col_num; @@ -500,6 +598,11 @@ static int gpu_dvfs_update_asv_table(struct kbase_device *kbdev) gpu_dvfs_table[i].clk[GPU_DVFS_CLK_TOP_LEVEL] = of_data_int_array[idx + 0]; gpu_dvfs_table[i].clk[GPU_DVFS_CLK_SHADERS] = of_data_int_array[idx + 1]; + for (c = 0; c < GPU_DVFS_CLK_COUNT; c++) { + find_voltage_for_freq(kbdev, gpu_dvfs_table[i].clk[c], + &(gpu_dvfs_table[i].vol[c]), vf_map[c], level_count[c]); + } + gpu_dvfs_table[i].util_min = of_data_int_array[idx + 2]; gpu_dvfs_table[i].util_max = of_data_int_array[idx + 3]; gpu_dvfs_table[i].hysteresis = of_data_int_array[idx + 4]; @@ -524,19 +627,6 @@ static int gpu_dvfs_update_asv_table(struct kbase_device *kbdev) if (gpu_dvfs_table[i].clk[GPU_DVFS_CLK_SHADERS] >= scaling_freq_min_compute) pc->dvfs.level_scaling_compute_min = i; - - /* Get and validate voltages from cal-if */ - for (c = 0; c < GPU_DVFS_CLK_COUNT; c++) { - if (find_voltage_for_freq(kbdev, gpu_dvfs_table[i].clk[c], - &(gpu_dvfs_table[i].vol[c]), - vf_map[c], level_count[c])) { - dev_err(kbdev->dev, - "Failed to find voltage for clock %u frequency %u\n", - c, gpu_dvfs_table[i].clk[c]); - goto err; - } - } - } pc->dvfs.level_max = 0; @@ -547,11 +637,43 @@ static int gpu_dvfs_update_asv_table(struct kbase_device *kbdev) return dvfs_table_row_num; err: - dev_err(kbdev->dev, "failed to set GPU ASV table\n"); return -EINVAL; } /** + * gpu_dvfs_update_asv_table() - Populate the GPU's DVFS table from DT. + * + * @kbdev: The &struct kbase_device for the GPU. + * + * This function iterates through the list of DVFS tables available in the device tree + * and calls validate_and_parse_dvfs_table() to select the valid one for the device. + * + * This function will fail if the required data is not present in the GPU's device tree entry. + * + * Context: Expects the caller to hold the DVFS lock + * + * Return: Returns the number of operating points in the DVFS table on success, -EINVAL on failure.
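 *
 * Tables are tried from the highest numbered variant downwards (with
 * DVFS_TABLES_MAX of 2, gpu_dvfs_table_v2 is tried before gpu_dvfs_table_v1),
 * and the first table whose every operating point has a matching voltage in
 * ECT is the one that is used.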
+ */ +static int gpu_dvfs_update_asv_table(struct kbase_device *kbdev) +{ + int dvfs_table_idx, dvfs_table_row_num; + struct pixel_context *pc = kbdev->platform_context; + + lockdep_assert_held(&pc->dvfs.lock); + + for (dvfs_table_idx = DVFS_TABLES_MAX; dvfs_table_idx > 0; dvfs_table_idx--) { + dvfs_table_row_num = validate_and_parse_dvfs_table(kbdev, dvfs_table_idx); + if (dvfs_table_row_num > 0) + break; + } + if (dvfs_table_row_num <= 0) { + dev_err(kbdev->dev, "failed to set GPU DVFS table"); + } + + return dvfs_table_row_num; +} + +/** * gpu_dvfs_set_initial_level() - Set the initial GPU clocks * * @kbdev: The &struct kbase_device for the GPU @@ -576,17 +698,17 @@ static int gpu_dvfs_set_initial_level(struct kbase_device *kbdev) mutex_lock(&pc->pm.domain->access_lock); for (c = 0; c < GPU_DVFS_CLK_COUNT; c++) { - ret = cal_dfs_set_rate(pc->dvfs.clks[c].cal_id, pc->dvfs.table[level].clk[c]); + ret = gpu_dvfs_set_freq(kbdev, c, level); if (ret) { dev_err(kbdev->dev, "Failed to set boot frequency %d on clock index %d (err: %d)\n", pc->dvfs.table[level].clk[c], c, ret); - goto done; + break; } } -done: mutex_unlock(&pc->pm.domain->access_lock); + return ret; } @@ -615,6 +737,8 @@ int gpu_dvfs_init(struct kbase_device *kbdev) pc->dvfs.level_locks[i].level_max = -1; } + pc->dvfs.updates_enabled = true; + /* Get data from DT */ if (of_property_read_u32(np, "gpu0_cmu_cal_id", &pc->dvfs.clks[GPU_DVFS_CLK_TOP_LEVEL].cal_id) || @@ -643,6 +767,12 @@ int gpu_dvfs_init(struct kbase_device *kbdev) goto done; } + /* Setup dvfs step up value */ + if (of_property_read_u32(np, "gpu_dvfs_step_up_val", &pc->dvfs.step_up_val)) { + ret = -EINVAL; + goto done; + } + /* Initialize power down hysteresis */ if (of_property_read_u32(np, "gpu_dvfs_clockdown_hysteresis", &pc->dvfs.clockdown_hysteresis)) { diff --git a/mali_kbase/platform/pixel/pixel_gpu_dvfs.h b/mali_kbase/platform/pixel/pixel_gpu_dvfs.h index d133693..c1f1587 100644 --- a/mali_kbase/platform/pixel/pixel_gpu_dvfs.h +++ b/mali_kbase/platform/pixel/pixel_gpu_dvfs.h @@ -142,27 +142,99 @@ void gpu_dvfs_governor_term(struct kbase_device *kbdev); * @active_kctx_count: Count of active kernel contexts operating under this UID. Should only be * accessed while holding the kctx_list lock. * @uid: The UID for this stats block. - * @atoms_in_flight: The number of atoms currently executing on the GPU from this UID. Should only - * be accessed while holding the hwaccess lock. + * @active_work_count: Count of currently executing units of work on the GPU from this UID. Should + * only be accessed while holding the hwaccess lock if using a job manager GPU, + * CSF GPUs require holding the csf.scheduler.lock. * @period_start: The time (in nanoseconds) that the current active period for this UID began. - * Should only be accessed while holding the hwaccess lock. + * Should only be accessed while holding the hwaccess lock if using a job + * manager GPU, CSF GPUs require holding the csf.scheduler.lock. * @tis_stats: &struct gpu_dvfs_opp_metrics block storing time in state data for this UID. - * Should only be accessed while holding the hwaccess lock. + * Should only be accessed while holding the hwaccess lock if using a job + * manager GPU, CSF GPUs require holding the csf.scheduler.lock. 
 */ struct gpu_dvfs_metrics_uid_stats { struct list_head uid_list_link; int active_kctx_count; kuid_t uid; - int atoms_in_flight; + int active_work_count; u64 period_start; struct gpu_dvfs_opp_metrics *tis_stats; }; +/** + * gpu_dvfs_metrics_update() - Updates GPU metrics on level or power change. + * + * @kbdev: The &struct kbase_device for the GPU. + * @old_level: The level that the GPU has just moved from. Can be the same as &new_level. + * @new_level: The level that the GPU has just moved to. Can be the same as &old_level. This + * parameter is ignored if &power_state is false. + * @power_state: The current power state of the GPU. Can be the same as the current power state. + * + * This function should be called (1) right after a change in power state of the GPU, or (2) just + * after changing the level of a powered on GPU. It will update the metrics for each of the GPU + * DVFS level metrics and the power metrics as appropriate. + * + * Context: Expects the caller to hold the dvfs.lock & dvfs.metrics.lock. + */ void gpu_dvfs_metrics_update(struct kbase_device *kbdev, int old_level, int new_level, bool power_state); -void gpu_dvfs_metrics_job_start(struct kbase_jd_atom *atom); -void gpu_dvfs_metrics_job_end(struct kbase_jd_atom *atom); + +/** + * gpu_dvfs_metrics_work_begin() - Notification of when a unit of work starts on + * the GPU + * + * @param: + * - If job manager GPU: The &struct kbase_jd_atom that has just been submitted to the GPU. + * - If CSF GPU: The &struct kbase_queue_group that has just been submitted to the GPU. + * + * For job manager GPUs: + * This function is called when an atom is submitted to the GPU by way of writing to the + * JSn_HEAD_NEXTn register. + * + * For CSF GPUs: + * This function is called when a group resident in a CSG slot starts executing. + * + * Context: Acquires the dvfs.metrics.lock. May be in IRQ context + */ +void gpu_dvfs_metrics_work_begin(void *param); + +/** + * gpu_dvfs_metrics_work_end() - Notification of when a unit of work stops + * running on the GPU + * + * @param: + * - If job manager GPU: The &struct kbase_jd_atom that has just stopped running on the GPU + * - If CSF GPU: The &struct kbase_queue_group that has just stopped running on the GPU + * + * This function is called when a unit of work is no longer running on the GPU, + * either due to successful completion, failure, preemption, or GPU reset. + * + * For job manager GPUs, a unit of work refers to an atom. + * + * For CSF GPUs, it refers to a group resident in a CSG slot, and so this + * function is called when that CSG slot completes or suspends execution of + * the group. + * + * Context: Acquires the dvfs.metrics.lock. May be in IRQ context + */ +void gpu_dvfs_metrics_work_end(void *param); + +/** + * gpu_dvfs_metrics_init() - Initializes DVFS metrics. + * + * @kbdev: The &struct kbase_device for the GPU. + * + * Context: Process context. Takes and releases the DVFS lock. + * + * Return: On success, returns 0 otherwise returns an error code. + */ int gpu_dvfs_metrics_init(struct kbase_device *kbdev); + +/** + * gpu_dvfs_metrics_term() - Terminates DVFS metrics + * + * @kbdev: The &struct kbase_device for the GPU.
+ */ void gpu_dvfs_metrics_term(struct kbase_device *kbdev); /** diff --git a/mali_kbase/platform/pixel/pixel_gpu_dvfs_governor.c b/mali_kbase/platform/pixel/pixel_gpu_dvfs_governor.c index b817aff..28d4073 100644 --- a/mali_kbase/platform/pixel/pixel_gpu_dvfs_governor.c +++ b/mali_kbase/platform/pixel/pixel_gpu_dvfs_governor.c @@ -7,11 +7,13 @@ /* Mali core includes */ #include <mali_kbase.h> +#include <trace/events/power.h> /* Pixel integration includes */ #include "mali_kbase_config_platform.h" #include "pixel_gpu_control.h" #include "pixel_gpu_dvfs.h" +#include "pixel_gpu_trace.h" /** * gpu_dvfs_governor_basic() - The evaluation function for &GPU_DVFS_GOVERNOR_BASIC. @@ -96,15 +98,16 @@ static int gpu_dvfs_governor_quickstep(struct kbase_device *kbdev, int level_max = pc->dvfs.level_max; int level_min = pc->dvfs.level_min; int util = util_stats->util; + int step_up = pc->dvfs.step_up_val; lockdep_assert_held(&pc->dvfs.lock); if ((level > level_max) && (util > tbl[level].util_max)) { /* We need to clock up. */ - if (level >= 2 && (util > (100 + tbl[level].util_max) / 2)) { - dev_dbg(kbdev->dev, "DVFS +2: %d -> %d (u: %d / %d)\n", - level, level - 2, util, tbl[level].util_max); - level -= 2; + if (level >= step_up && (util > (100 + tbl[level].util_max) / 2)) { + dev_dbg(kbdev->dev, "DVFS +%d: %d -> %d (u: %d / %d)\n", + step_up, level, level - step_up, util, tbl[level].util_max); + level -= step_up; pc->dvfs.governor.delay = tbl[level].hysteresis / 2; } else { dev_dbg(kbdev->dev, "DVFS +1: %d -> %d (u: %d / %d)\n", @@ -164,11 +167,24 @@ int gpu_dvfs_governor_get_next_level(struct kbase_device *kbdev, struct gpu_dvfs_utlization *util_stats) { struct pixel_context *pc = kbdev->platform_context; - int level; + int level, ret; lockdep_assert_held(&pc->dvfs.lock); level = governors[pc->dvfs.governor.curr].evaluate(kbdev, util_stats); - return clamp(level, pc->dvfs.level_scaling_max, pc->dvfs.level_scaling_min); + if (level != pc->dvfs.level) { + trace_clock_set_rate("gpu_gov_rec", pc->dvfs.table[level].clk[GPU_DVFS_CLK_SHADERS], + raw_smp_processor_id()); + } + + ret = clamp(level, pc->dvfs.level_scaling_max, pc->dvfs.level_scaling_min); + if (ret != level) { + trace_gpu_gov_rec_violate(pc->dvfs.table[level].clk[GPU_DVFS_CLK_SHADERS], + pc->dvfs.table[ret].clk[GPU_DVFS_CLK_SHADERS], + pc->dvfs.table[pc->dvfs.level_scaling_min].clk[GPU_DVFS_CLK_SHADERS], + pc->dvfs.table[pc->dvfs.level_scaling_max].clk[GPU_DVFS_CLK_SHADERS]); + } + + return ret; } /** diff --git a/mali_kbase/platform/pixel/pixel_gpu_dvfs_metrics.c b/mali_kbase/platform/pixel/pixel_gpu_dvfs_metrics.c index 5d3da59..c7c2b81 100644 --- a/mali_kbase/platform/pixel/pixel_gpu_dvfs_metrics.c +++ b/mali_kbase/platform/pixel/pixel_gpu_dvfs_metrics.c @@ -25,6 +25,7 @@ #include "mali_kbase_config_platform.h" #include "pixel_gpu_control.h" #include "pixel_gpu_dvfs.h" +#include "mali_power_gpu_frequency_trace.h" static void *enumerate_gpu_clk(struct kbase_device *kbdev, unsigned int index) { @@ -85,7 +86,6 @@ static void gpu_dvfs_metrics_trace_clock(struct kbase_device *kbdev, int old_lev struct pixel_context *pc = kbdev->platform_context; struct kbase_gpu_clk_notifier_data nd; int c; - int proc = raw_smp_processor_id(); int clks[GPU_DVFS_CLK_COUNT]; for (c = 0; c < GPU_DVFS_CLK_COUNT; c++) { @@ -103,9 +103,8 @@ static void gpu_dvfs_metrics_trace_clock(struct kbase_device *kbdev, int old_lev } - /* TODO: Remove reporting clocks this way when we transition to Perfetto */ - trace_clock_set_rate("gpu0", clks[GPU_DVFS_CLK_TOP_LEVEL], proc); - 
trace_clock_set_rate("gpu1", clks[GPU_DVFS_CLK_SHADERS], proc); + trace_gpu_frequency(clks[GPU_DVFS_CLK_TOP_LEVEL], 0); + trace_gpu_frequency(clks[GPU_DVFS_CLK_SHADERS], 1); } /** @@ -114,61 +113,44 @@ static void gpu_dvfs_metrics_trace_clock(struct kbase_device *kbdev, int old_lev * @kbdev: The &struct kbase_device for the GPU. * @event_time: The time of the clock change event in nanoseconds. * - * Called when the operating point is changing so that the per-UID time in state data for in-flight - * atoms can be updated. Note that this function need only be called when the operating point is - * changing _and_ the GPU is powered on. This is because no atoms will be in-flight when the GPU is - * powered down. + * Called when the operating point is changing so that the per-UID time in state + * data for active work can be updated. Note that this function need only be + * called when the operating point is changing _and_ the GPU is powered on. + * This is because no work will be active when the GPU is powered down. * - * Context: Called in process context, invokes an IRQ context and takes the per-UID metrics spin - * lock. + * Context: Called in process context. Requires the dvfs.lock & dvfs.metrics.lock to be held. */ static void gpu_dvfs_metrics_uid_level_change(struct kbase_device *kbdev, u64 event_time) { struct pixel_context *pc = kbdev->platform_context; struct gpu_dvfs_metrics_uid_stats *stats; - unsigned long flags; int i; + int const nr_slots = ARRAY_SIZE(pc->dvfs.metrics.work_uid_stats); lockdep_assert_held(&pc->dvfs.lock); + lockdep_assert_held(&pc->dvfs.metrics.lock); - spin_lock_irqsave(&kbdev->hwaccess_lock, flags); - - for (i = 0; i < BASE_JM_MAX_NR_SLOTS; i++) { - stats = pc->dvfs.metrics.js_uid_stats[i]; + for (i = 0; i < nr_slots; i++) { + stats = pc->dvfs.metrics.work_uid_stats[i]; if (stats && stats->period_start != event_time) { - WARN_ON(stats->period_start == 0); + WARN_ON_ONCE(stats->period_start == 0); stats->tis_stats[pc->dvfs.level].time_total += (event_time - stats->period_start); stats->period_start = event_time; } } - - spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); } -/** - * gpu_dvfs_metrics_update() - Updates GPU metrics on level or power change. - * - * @kbdev: The &struct kbase_device for the GPU. - * @old_level: The level that the GPU has just moved from. Can be the same as &new_level. - * @new_level: The level that the GPU has just moved to. Can be the same as &old_level. This - * parameter is ignored if &power_state is false. - * @power_state: The current power state of the GPU. Can be the same as the current power state. - * - * This function should be called (1) right after a change in power state of the GPU, or (2) just - * after changing the level of a powered on GPU. It will update the metrics for each of the GPU - * DVFS level metrics and the power metrics as appropriate. - * - * Context: Expects the caller to hold the DVFS lock. 
- */ void gpu_dvfs_metrics_update(struct kbase_device *kbdev, int old_level, int new_level, bool power_state) { struct pixel_context *pc = kbdev->platform_context; const u64 prev = pc->dvfs.metrics.last_time; u64 curr = ktime_get_ns(); + unsigned long flags; lockdep_assert_held(&pc->dvfs.lock); + spin_lock_irqsave(&pc->dvfs.metrics.lock, flags); if (pc->dvfs.metrics.last_power_state) { if (power_state) { @@ -210,74 +192,125 @@ void gpu_dvfs_metrics_update(struct kbase_device *kbdev, int old_level, int new_ pc->dvfs.metrics.last_power_state = power_state; pc->dvfs.metrics.last_time = curr; pc->dvfs.metrics.last_level = new_level; + spin_unlock_irqrestore(&pc->dvfs.metrics.lock, flags); gpu_dvfs_metrics_trace_clock(kbdev, old_level, new_level, power_state); } -/** - * gpu_dvfs_metrics_job_start() - Notification of when an atom starts on the GPU - * - * @atom: The &struct kbase_jd_atom that has just been submitted to the GPU. - * - * This function is called when an atom is submitted to the GPU by way of writing to the - * JSn_HEAD_NEXTn register. - * - * Context: May be in IRQ context, assumes that the hwaccess lock is held, and in turn takes and - * releases the metrics UID spin lock. - */ -void gpu_dvfs_metrics_job_start(struct kbase_jd_atom *atom) +void gpu_dvfs_metrics_work_begin(void* param) { - struct kbase_device *kbdev = atom->kctx->kbdev; - struct pixel_context *pc = kbdev->platform_context; - struct gpu_dvfs_metrics_uid_stats *stats = atom->kctx->platform_data; - int js = atom->slot_nr; +#if !MALI_USE_CSF + struct kbase_jd_atom* unit = param; + const int slot = unit->slot_nr; +#else + struct kbase_queue_group* unit = param; + const int slot = unit->csg_nr; +#endif + struct kbase_context* kctx = unit->kctx; + struct kbase_device* kbdev = kctx->kbdev; + struct pixel_context* pc = kbdev->platform_context; + struct pixel_platform_data *pd = kctx->platform_data; + struct gpu_dvfs_metrics_uid_stats* uid_stats = pd->stats; + struct gpu_dvfs_metrics_uid_stats** work_stats = &pc->dvfs.metrics.work_uid_stats[slot]; + const u64 curr = ktime_get_ns(); + unsigned long flags; + + dev_dbg(kbdev->dev, "work_begin, slot: %d, uid: %d", slot, uid_stats->uid.val); + + spin_lock_irqsave(&pc->dvfs.metrics.lock, flags); - lockdep_assert_held(&kbdev->hwaccess_lock); +#if !MALI_USE_CSF + /* + * JM slots can have 2 Atoms submitted per slot, with different UIDs + * Use the secondary slot if the first is occupied + */ + if (*work_stats != NULL) { + work_stats = &pc->dvfs.metrics.work_uid_stats[slot + BASE_JM_MAX_NR_SLOTS]; + } +#endif + + /* Nothing should be mapped to this slot */ + WARN_ON_ONCE(*work_stats != NULL); - if (stats->atoms_in_flight == 0) { - /* This is the start of a new period */ - WARN_ON(stats->period_start != 0); - stats->period_start = ktime_get_ns(); + /* + * First new work associated with this UID, start tracking the per UID + * time now + */ + if (uid_stats->active_work_count == 0) + { + /* + * This is the start of a new period, the start time shouldn't have + * been set or should have been cleared. 
+ */ + WARN_ON_ONCE(uid_stats->period_start != 0); + uid_stats->period_start = curr; } + ++uid_stats->active_work_count; + + /* Link the UID stats to the stream slot */ + *work_stats = uid_stats; - stats->atoms_in_flight++; - pc->dvfs.metrics.js_uid_stats[js] = stats; + spin_unlock_irqrestore(&pc->dvfs.metrics.lock, flags); } -/** - * gpu_dvfs_metrics_job_end() - Notification of when an atom stops running on the GPU - * - * @atom: The &struct kbase_jd_atom that has just stopped running on the GPU - * - * This function is called when an atom is no longer running on the GPU, either due to successful - * completion, failure, preemption, or GPU reset. - * - * Context: May be in IRQ context, assumes that the hwaccess lock is held, and in turn takes and - * releases the metrics UID spin lock. - */ -void gpu_dvfs_metrics_job_end(struct kbase_jd_atom *atom) +void gpu_dvfs_metrics_work_end(void *param) { - struct kbase_device *kbdev = atom->kctx->kbdev; - struct pixel_context *pc = kbdev->platform_context; - struct gpu_dvfs_metrics_uid_stats *stats = atom->kctx->platform_data; - int js = atom->slot_nr; - u64 curr = ktime_get_ns(); +#if !MALI_USE_CSF + struct kbase_jd_atom* unit = param; + const int slot = unit->slot_nr; +#else + struct kbase_queue_group* unit = param; + const int slot = unit->csg_nr; +#endif + struct kbase_context* kctx = unit->kctx; + struct kbase_device* kbdev = kctx->kbdev; + struct pixel_context* pc = kbdev->platform_context; + struct pixel_platform_data *pd = kctx->platform_data; + struct gpu_dvfs_metrics_uid_stats* uid_stats = pd->stats; + struct gpu_dvfs_metrics_uid_stats** work_stats = &pc->dvfs.metrics.work_uid_stats[slot]; + const u64 curr = ktime_get_ns(); + unsigned long flags; - lockdep_assert_held(&kbdev->hwaccess_lock); + dev_dbg(kbdev->dev, "work_end, slot: %d, uid: %d", slot, uid_stats->uid.val); - WARN_ON(stats->period_start == 0); - WARN_ON(stats->atoms_in_flight == 0); + spin_lock_irqsave(&pc->dvfs.metrics.lock, flags); - stats->atoms_in_flight--; - stats->tis_stats[pc->dvfs.level].time_total += (curr - stats->period_start); +#if !MALI_USE_CSF + /* + * JM slots can have 2 Atoms submitted per slot, with different UIDs + * If the primary slot is not for this uid, then check the secondary slot + */ + if (*work_stats != uid_stats) { + work_stats = &pc->dvfs.metrics.work_uid_stats[slot + BASE_JM_MAX_NR_SLOTS]; + } +#endif - if (stats->atoms_in_flight == 0) - /* This is the end of a period */ - stats->period_start = 0; - else - stats->period_start = curr; + /* We should have something mapped to this slot */ + WARN_ON_ONCE(*work_stats == NULL); + /* Should be the same stats */ + WARN_ON_ONCE(uid_stats != *work_stats); + /* Forgot to init the start time? */ + WARN_ON_ONCE(uid_stats->period_start == 0); + /* No jobs so how could have something have completed? */ + if (!WARN_ON_ONCE(uid_stats->active_work_count == 0)) + --uid_stats->active_work_count; + /* + * We could only update this when the work count equals zero, and + * avoid updating the period_start often. However we get more timely + * updates this way. + */ + uid_stats->tis_stats[pc->dvfs.level].time_total += (curr - uid_stats->period_start); + + /* + * Reset the period start time when there is no work associated with + * this UID, or update it to prevent double counting. + */ + uid_stats->period_start = uid_stats->active_work_count == 0 ? 
0 : curr; - pc->dvfs.metrics.js_uid_stats[js] = NULL; + /* Unlink the UID stats from the slot stats */ + *work_stats = NULL; + + spin_unlock_irqrestore(&pc->dvfs.metrics.lock, flags); } /** @@ -345,6 +378,7 @@ int gpu_dvfs_kctx_init(struct kbase_context *kctx) { struct kbase_device *kbdev = kctx->kbdev; struct pixel_context *pc = kbdev->platform_context; + struct pixel_platform_data *pd = kctx->platform_data; struct task_struct *task; kuid_t uid; @@ -397,7 +431,7 @@ int gpu_dvfs_kctx_init(struct kbase_context *kctx) stats->active_kctx_count++; /* Store a direct link in the kctx */ - kctx->platform_data = stats; + pd->stats = stats; done: mutex_unlock(&kbdev->kctx_list_lock); @@ -405,7 +439,7 @@ done: } /** - * gpu_dvfs_kctx_init() - Called when a kernel context is terminated + * gpu_dvfs_kctx_term() - Called when a kernel context is terminated * * @kctx: The &struct kbase_context that is being terminated * @@ -415,7 +449,8 @@ done: void gpu_dvfs_kctx_term(struct kbase_context *kctx) { struct kbase_device *kbdev = kctx->kbdev; - struct gpu_dvfs_metrics_uid_stats *stats = kctx->platform_data; + struct pixel_platform_data *pd = kctx->platform_data; + struct gpu_dvfs_metrics_uid_stats *stats = pd->stats; unsigned long flags; spin_lock_irqsave(&kbdev->hwaccess_lock, flags); @@ -424,21 +459,13 @@ void gpu_dvfs_kctx_term(struct kbase_context *kctx) spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); } -/** - * gpu_dvfs_metrics_init() - Initializes DVFS metrics. - * - * @kbdev: The &struct kbase_device for the GPU. - * - * Context: Process context. Takes and releases the DVFS lock. - * - * Return: On success, returns 0 otherwise returns an error code. - */ int gpu_dvfs_metrics_init(struct kbase_device *kbdev) { struct pixel_context *pc = kbdev->platform_context; int c; mutex_lock(&pc->dvfs.lock); + spin_lock_init(&pc->dvfs.metrics.lock); pc->dvfs.metrics.last_time = ktime_get_ns(); pc->dvfs.metrics.last_power_state = gpu_pm_get_power_state(kbdev); @@ -460,14 +487,11 @@ int gpu_dvfs_metrics_init(struct kbase_device *kbdev) /* Initialize per-UID metrics */ INIT_LIST_HEAD(&pc->dvfs.metrics.uid_stats_list); + memset(pc->dvfs.metrics.work_uid_stats, 0, sizeof(pc->dvfs.metrics.work_uid_stats)); + return 0; } -/** - * gpu_dvfs_metrics_term() - Terminates DVFS metrics - * - * @kbdev: The &struct kbase_device for the GPU. - */ void gpu_dvfs_metrics_term(struct kbase_device *kbdev) { struct pixel_context *pc = kbdev->platform_context; diff --git a/mali_kbase/platform/pixel/pixel_gpu_itmon.c b/mali_kbase/platform/pixel/pixel_gpu_itmon.c new file mode 100644 index 0000000..7dbce37 --- /dev/null +++ b/mali_kbase/platform/pixel/pixel_gpu_itmon.c @@ -0,0 +1,383 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright 2023 Google LLC. + * + * This platform component registers an ITMON notifier callback which filters + * fabric fault reports where the GPU is identified as the initiator of the + * transaction. + * + * When such a fault occurs, it searches for the faulting physical address in + * the GPU page tables of all GPU contexts. If the physical address appears in + * a page table, the context and corresponding virtual address are logged. + * + * Otherwise, a message is logged indicating that the physical address does not + * appear in any GPU page table. 
+ */ + +#if IS_ENABLED(CONFIG_EXYNOS_ITMON) + +/* Linux includes */ +#include <linux/of.h> + +/* SOC includes */ +#include <soc/google/exynos-itmon.h> + +/* Mali core includes */ +#include <mali_kbase.h> + +/* Pixel integration includes */ +#include "mali_kbase_config_platform.h" +#include "pixel_gpu_control.h" + + +/* GPU page tables may use more physical address bits than the bus, to encode + * other information. We'll need to mask those away to match with bus + * addresses. + */ +#define PHYSICAL_ADDRESS_BITS 36 +#define PHYSICAL_ADDRESS_MASK ((1ULL << (PHYSICAL_ADDRESS_BITS)) - 1) + +/* Convert KBASE_MMU_PAGE_ENTRIES to number of bits */ +#define KBASE_MMU_PAGE_ENTRIES_LOG2 const_ilog2(KBASE_MMU_PAGE_ENTRIES) + + +/** + * pixel_gpu_itmon_search_pgd() - Search a page directory page. + * + * @mmu_mode: The &struct kbase_mmu_mode PTE accessor functions. + * @level: The level of the page directory. + * @pa_pgd: The physical address of the page directory page. + * @pa_search: The physical address to search for. + * @va_prefix: The virtual address prefix above this level. + * + * Return: The virtual address mapped to the physical address, or zero. + */ +static u64 pixel_gpu_itmon_search_pgd(struct kbase_mmu_mode const *mmu_mode, + int level, phys_addr_t pa_pgd, phys_addr_t pa_search, u64 va_prefix) +{ + u64 va_found = 0; + int i; + + /* Map the page */ + const u64 *entry = kmap(pfn_to_page(PFN_DOWN(pa_pgd))); + if (!entry) + return 0; + + /* Shift the VA prefix left to make room for this new level */ + va_prefix <<= KBASE_MMU_PAGE_ENTRIES_LOG2; + + /* For each entry in the page directory */ + for (i = 0; i < KBASE_MMU_PAGE_ENTRIES; i++) { + + /* Is this a PTE, an ATE, or invalid? */ + if (mmu_mode->pte_is_valid(entry[i], level)) { + + /* PTE: Get the physical address of the next level PGD */ + phys_addr_t pa_next = mmu_mode->pte_to_phy_addr(entry[i]) + & PHYSICAL_ADDRESS_MASK; + + /* Recurse into it */ + if (pa_next) { + va_found = pixel_gpu_itmon_search_pgd(mmu_mode, level + 1, + pa_next, pa_search, va_prefix); + if (va_found) + break; + } + + } else if (mmu_mode->ate_is_valid(entry[i], level)) { + + /* ATE: Get the page (or block) physical address */ + phys_addr_t pa_start = mmu_mode->pte_to_phy_addr(entry[i]) + & PHYSICAL_ADDRESS_MASK; + + if (pa_start) { + /* Get the size of the block: + * this may be larger than a page, depending on level. + * A competent compiler will hoist this out of the loop. + */ + int remaining_levels = MIDGARD_MMU_BOTTOMLEVEL - level; + size_t block_size = PAGE_SIZE << + (KBASE_MMU_PAGE_ENTRIES_LOG2 * remaining_levels); + + /* Test if the block contains the PA we are searching for */ + if ((pa_search >= pa_start) && + (pa_search < (pa_start + block_size))) { + + /* Combine translated and non-translated address bits */ + va_found = (va_prefix * block_size) + + (pa_search % block_size); + break; + } + } + } + + /* Advance the virtual address prefix with each entry */ + va_prefix++; + } + + kunmap(pfn_to_page(PFN_DOWN(pa_pgd))); + + return va_found; +} + +/** + * pixel_gpu_itmon_search_page_table() - Search a page table for a PA. + * + * @kbdev: The &struct kbase_device. + * @table: The &struct kbase_mmu_table to search. + * @pa: The physical address to search for. + * + * Return: The virtual address mapped to the physical address, or zero. 
+ */ +static u64 pixel_gpu_itmon_search_page_table(struct kbase_device *kbdev, + struct kbase_mmu_table* table, phys_addr_t pa) +{ + u64 va; + + rt_mutex_lock(&table->mmu_lock); + va = pixel_gpu_itmon_search_pgd(kbdev->mmu_mode, MIDGARD_MMU_TOPLEVEL, + table->pgd, pa, 0); + rt_mutex_unlock(&table->mmu_lock); + + return va; +} + +/** + * pixel_gpu_itmon_search_context() - Search the page tables of a context. + * + * @pc: The &struct pixel_context. + * @kctx: The &struct kbase_context to search. + * + * Return: True if the faulting physical address was found. + */ +static bool pixel_gpu_itmon_search_context(struct pixel_context *pc, + struct kbase_context *kctx) +{ + u64 va = pixel_gpu_itmon_search_page_table(pc->kbdev, &kctx->mmu, + pc->itmon.pa); + + /* If a mapping was found */ + if (va) { + /* Get the task from the context */ + struct pid *pid_struct; + struct task_struct *task; + + rcu_read_lock(); + pid_struct = find_get_pid(kctx->pid); + task = pid_task(pid_struct, PIDTYPE_PID); + + /* And report it */ + dev_err(pc->kbdev->dev, + "ITMON: Faulting physical address 0x%llX appears in page table of " + "task %s (pid %u), mapped from virtual address 0x%llx (as %d)\n", + pc->itmon.pa, task ? task->comm : "[null task]", kctx->pid, va, + kctx->as_nr); + + put_pid(pid_struct); + rcu_read_unlock(); + + return true; + } + + return false; +} + +#if MALI_USE_CSF +/** + * pixel_gpu_itmon_search_csffw() - Search the CSF MCU page table. + * + * @pc: The &struct pixel_context. + * + * Return: True if the faulting physical address was found. + */ +static bool pixel_gpu_itmon_search_csffw(struct pixel_context *pc) +{ + struct kbase_device *kbdev = pc->kbdev; + + u64 va = pixel_gpu_itmon_search_page_table(kbdev, &kbdev->csf.mcu_mmu, + pc->itmon.pa); + + /* If a mapping was found */ + if (va) { + dev_err(kbdev->dev, + "ITMON: Faulting physical address 0x%llX appears in CSF MCU page " + "table, mapped from virtual address 0x%llx (as 0)\n", + pc->itmon.pa, va); + return true; + } + + return false; +} +#endif /* MALI_USE_CSF */ + +/** + * pixel_gpu_itmon_worker() - ITMON fault worker. + * + * Required to be able to lock mutexes while searching page tables. + * + * @data: The &struct work_struct. + */ +static void pixel_gpu_itmon_worker(struct work_struct *data) +{ + /* Recover the pixel_context */ + struct pixel_context *pc = container_of(data, struct pixel_context, + itmon.work); + + struct kbase_device *kbdev = pc->kbdev; + struct kbase_context *kctx; + bool found = false; + + /* Log that the work has started */ + dev_err(kbdev->dev, + "ITMON: Searching for physical address 0x%llX across all GPU page " + "tables...\n", pc->itmon.pa); + + /* Search the CSF MCU page table first */ +#if MALI_USE_CSF + found |= pixel_gpu_itmon_search_csffw(pc); +#endif + + mutex_lock(&kbdev->kctx_list_lock); + + /* Enumerate all contexts and search their page tables */ + list_for_each_entry(kctx, &kbdev->kctx_list, kctx_list_link) { + found |= pixel_gpu_itmon_search_context(pc, kctx); + } + + mutex_unlock(&kbdev->kctx_list_lock); + + /* For completeness, log that we did not find the fault address anywhere */ + if (!found) { + dev_err(kbdev->dev, + "ITMON: Faulting physical address 0x%llX NOT PRESENT in any GPU " + "page table - GPU would not have initiated this access\n", + pc->itmon.pa); + } + + /* Let the ITMON ISR know that we're done and it can continue */ + atomic_dec(&pc->itmon.active); +} + +/** + * pixel_gpu_itmon_notifier() - Handle an ITMON fault report. 
+ * + * @nb: The &struct notifier_block inside &struct pixel_context. + * @action: Unused. + * @nb_data: The ITMON report. + * + * Return: NOTIFY_OK to continue calling other notifier blocks. + */ +static int pixel_gpu_itmon_notifier(struct notifier_block *nb, + unsigned long action, void *nb_data) +{ + /* Recover the pixel_context */ + struct pixel_context *pc = container_of(nb, struct pixel_context, itmon.nb); + + /* Get details of the ITMON report */ + struct itmon_notifier *itmon_info = nb_data; + + /* Filter out non-GPU ports */ + if ((!itmon_info->port) || + (strncmp(itmon_info->port, "GPU", 3) && + strncmp(itmon_info->port, "G3D", 3))) + return NOTIFY_OK; + + /* Immediately acknowledge that this fault matched our filter */ + dev_err(pc->kbdev->dev, + "Detected relevant ITMON fault report from %s to 0x%llX, " + "enqueueing work...\n", itmon_info->port, (u64)itmon_info->target_addr); + + /* Make sure we have finished processing previous work */ + if (atomic_fetch_inc(&pc->itmon.active) != 0) { + atomic_dec(&pc->itmon.active); + dev_err(pc->kbdev->dev, "Previous work not yet finished, skipping\n"); + return NOTIFY_OK; + } + + /* Save the PA to search for */ + pc->itmon.pa = itmon_info->target_addr; + + /* Access to GPU page tables is protected by a mutex, which we cannot lock + * here in an atomic context. Queue work to another CPU to do the search. + */ + queue_work(pc->itmon.wq, &pc->itmon.work); + + /* (Try to) busy-wait for that work to complete, before we ramdump */ + { + u64 start = ktime_get_ns(); + + while (atomic_read(&pc->itmon.active) > 0) { + + if ((ktime_get_ns() - start) < (NSEC_PER_SEC / 2)) { + udelay(10000); + } else { + dev_err(pc->kbdev->dev, + "Timed out waiting for ITMON work, this is not an error\n"); + break; + } + } + } + + return NOTIFY_OK; +} + +/** + * gpu_itmon_init() - Initialize ITMON notifier callback. + * + * @kbdev: The &struct kbase_device. + * + * Return: An error code, or 0 on success. + */ +int gpu_itmon_init(struct kbase_device *kbdev) +{ + struct pixel_context *pc = kbdev->platform_context; + + /* The additional diagnostic information offered by this callback is only + * useful if it can be collected as part of a ramdump. Ramdumps are + * disabled in "user" builds, so query the build variant and skip + * initialization if that is the case. + */ + struct device_node *dpm = of_find_node_by_name(NULL, "dpm"); + const char *variant = NULL; + if ((!dpm) || of_property_read_string(dpm, "variant", &variant) || + (!strcmp(variant, "user"))) + return 0; + + /* Create a workqueue that can run on any CPU with high priority, so that + * it can run while we (try to) wait for it in the ITMON interrupt. + */ + pc->itmon.wq = alloc_workqueue("mali_itmon_wq", WQ_UNBOUND | WQ_HIGHPRI, 1); + if (!pc->itmon.wq) + return -ENOMEM; + INIT_WORK(&pc->itmon.work, pixel_gpu_itmon_worker); + + /* Then register our ITMON notifier callback */ + pc->itmon.nb.notifier_call = pixel_gpu_itmon_notifier; + itmon_notifier_chain_register(&pc->itmon.nb); + + return 0; +} + +/** + * gpu_itmon_term() - Terminate ITMON notifier callback. + * + * @kbdev: The &struct kbase_device. 
+ */ +void gpu_itmon_term(struct kbase_device *kbdev) +{ + struct pixel_context *pc = kbdev->platform_context; + + if (pc->itmon.wq) { + /* Unregister our ITMON notifier callback first */ + itmon_notifier_chain_unregister(&pc->itmon.nb); + + /* Then it's safe to destroy the workqueue */ + destroy_workqueue(pc->itmon.wq); + pc->itmon.wq = NULL; + } +} + +/* Depend on ITMON driver */ +MODULE_SOFTDEP("pre: itmon"); + +#endif /* IS_ENABLED(CONFIG_EXYNOS_ITMON) */
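The address reconstruction performed by pixel_gpu_itmon_search_pgd() above comes down to a little arithmetic: the walk accumulates the entry index taken at each level into va_prefix, and once a valid ATE covering the searched physical address is found, the virtual address is that prefix scaled by the level's block size plus the untranslated low bits. The sketch below restates that math as standalone C; it assumes 4 KiB pages, 512 entries per directory and a 4-level table (which is what KBASE_MMU_PAGE_ENTRIES_LOG2 and MIDGARD_MMU_BOTTOMLEVEL imply), and the macro and function names are illustrative, not the driver's.

#include <stdint.h>

#define EXAMPLE_PAGE_SIZE       4096ULL
#define ENTRIES_PER_LEVEL_LOG2  9     /* 512 entries per page directory */
#define BOTTOM_LEVEL            3     /* levels 0..3, like MIDGARD_MMU_BOTTOMLEVEL */

/* Size of the region covered by one entry at the given level. */
static uint64_t block_size(int level)
{
        return EXAMPLE_PAGE_SIZE << (ENTRIES_PER_LEVEL_LOG2 * (BOTTOM_LEVEL - level));
}

/* Rebuild a virtual address from the indices walked so far (va_prefix), the
 * level at which a valid ATE was found, and the searched physical address pa
 * that lies inside that block.
 */
static uint64_t rebuild_va(uint64_t va_prefix, int level, uint64_t pa)
{
        return va_prefix * block_size(level) + (pa % block_size(level));
}

At the bottom level a block is a single 4 KiB page, so the reconstructed VA is simply the walked page index times the page size plus the page offset; at higher levels the same formula handles block mappings that cover more than one page.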
\ No newline at end of file diff --git a/mali_kbase/platform/pixel/pixel_gpu_power.c b/mali_kbase/platform/pixel/pixel_gpu_power.c index 33ea438..7b28f9e 100644 --- a/mali_kbase/platform/pixel/pixel_gpu_power.c +++ b/mali_kbase/platform/pixel/pixel_gpu_power.c @@ -15,10 +15,12 @@ /* SOC includes */ #if IS_ENABLED(CONFIG_EXYNOS_PMU_IF) #include <soc/google/exynos-pmu-if.h> +#include <soc/google/exynos-pd.h> #endif #if IS_ENABLED(CONFIG_CAL_IF) #include <soc/google/cal-if.h> #endif +#include <linux/soc/samsung/exynos-smc.h> /* Mali core includes */ #include <mali_kbase.h> @@ -27,6 +29,7 @@ #include "mali_kbase_config_platform.h" #include "pixel_gpu_control.h" #include "pixel_gpu_trace.h" +#include <trace/events/power.h> /* * GPU_PM_DOMAIN_NAMES - names for GPU power domains. @@ -40,28 +43,217 @@ static const char * const GPU_PM_DOMAIN_NAMES[GPU_PM_DOMAIN_COUNT] = { }; /** - * gpu_pm_power_on_cores() - Powers on the GPU shader cores. + * struct pixel_rail_transition - Represents a power rail state transition + * + * @begin_timestamp: Time-stamp from when the transition began + * @end_timestamp: Time-stamp from when the transition completed + * @from: Rail state at the start of the transition + * @to: Rail state at the end of the transition + **/ +struct pixel_rail_transition { + ktime_t begin_timestamp; + ktime_t end_timestamp; + uint8_t from; + uint8_t to; +} __attribute__((packed)); +_Static_assert(sizeof(struct pixel_rail_transition) == 18, + "Incorrect pixel_rail_transition size"); +_Static_assert(GPU_POWER_LEVEL_NUM < ((uint8_t)(~0U)), "gpu_power_state must fit in one byte"); + +#define PIXEL_RAIL_LOG_MAX (PAGE_SIZE / sizeof(struct pixel_rail_transition)) + +/** + * struct pixel_rail_state_metadata - Info about the rail transition log + * + * @magic: Always 'pprs', helps find the log in memory dumps + * @version: Updated whenever the binary layout changes + * @log_address: The memory address of the power rail state log + * @log_offset: The offset of the power rail state log within an SSCD + * @log_length: Number of used bytes in the power rail state log ring buffer. 
+ * The length will be <= (FW_TRACE_BUF_NR_PAGES << PAGE_SHIFT) + * @last_entry: The last entry index, used to find the start and end of the ring buffer + * @log_entry_stride: The stride in bytes between entries within the log + * @_reserved: Bytes reserved for future use + **/ +struct pixel_rail_state_metadata { + char magic[4]; + uint8_t version; + uint64_t log_address; + uint32_t log_offset; + uint32_t log_length; + uint32_t last_entry; + uint8_t log_entry_stride; + char _reserved[6]; +} __attribute__((packed)); +_Static_assert(sizeof(struct pixel_rail_state_metadata) == 32, + "Incorrect pixel_rail_state_metadata size"); + + +/** + * struct pixel_rail_state_log - Log containing a record of power rail state transitions + * + * @meta: Info about the log + * @log_rb: The actual log + **/ +struct pixel_rail_state_log { + struct pixel_rail_state_metadata meta; + struct pixel_rail_transition log_rb[PIXEL_RAIL_LOG_MAX]; +} __attribute__((packed)); + +/** + * gpu_pm_rail_state_log_last_entry() - Get a handle to the last logged rail transition + * + * @log: The &struct pixel_rail_state_log containing all logged transitions + * + * Context: Process context + * + * Return: Most recent log entry + */ +static struct pixel_rail_transition * +gpu_pm_rail_state_log_last_entry(struct pixel_rail_state_log *log) +{ + return &log->log_rb[log->meta.last_entry]; +} + +/** + * gpu_pm_rail_state_start_transition_lock() - Mark the start of a power rail transition + * + * @pc: The &struct pixel_context for the GPU + * + * Mark the beginning of a power rail transition. This function starts a critical section + * by holding the pm.lock, and creates a new log entry to record the transition. + * + * Context: Process context, acquires pc->pm.lock and does not release it + */ +static void gpu_pm_rail_state_start_transition_lock(struct pixel_context *pc) +{ + struct pixel_rail_state_log *log; + struct pixel_rail_transition *entry; + + mutex_lock(&pc->pm.lock); + + log = pc->pm.rail_state_log; + log->meta.last_entry = (log->meta.last_entry + 1) % PIXEL_RAIL_LOG_MAX; + log->meta.log_length = max(log->meta.last_entry, log->meta.log_length); + entry = gpu_pm_rail_state_log_last_entry(log); + + /* Clear to prevent leaking an old event */ + memset(entry, 0, sizeof(struct pixel_rail_transition)); + + entry->from = (uint8_t)pc->pm.state; + entry->begin_timestamp = ktime_get_ns(); +} + +/** + * gpu_pm_rail_state_end_transition_unlock() - Mark the end of a power rail transition + * + * @pc: The &struct pixel_context for the GPU + * + * Mark the end of a power rail transition. This function ends a critical section + * by releasing the pm.lock, and completes the partial event log entry added when + * the transition began. + * + * Context: Process context, expects pc->pm.lock to be held, releases pc->pm.lock + */ +static void gpu_pm_rail_state_end_transition_unlock(struct pixel_context *pc) +{ + struct pixel_rail_transition *entry; + + lockdep_assert_held(&pc->pm.lock); + + entry = gpu_pm_rail_state_log_last_entry(pc->pm.rail_state_log); + + entry->end_timestamp = ktime_get_ns(); + entry->to = (uint8_t)pc->pm.state; + trace_gpu_power_state(entry->end_timestamp - entry->begin_timestamp, entry->from, entry->to); + + mutex_unlock(&pc->pm.lock); +} + +/** + * gpu_pm_get_rail_state_log() - Obtain a handle to the rail state log * * @kbdev: The &struct kbase_device for the GPU. * - * Powers on the CORES domain and issues trace points and events. Also powers on TOP and cancels - * any pending suspend operations on it. 
+ * Context: Process context * - * Context: Process context. Takes and releases PM lock. + * Return: Opaque handle to rail state log + */ +void* gpu_pm_get_rail_state_log(struct kbase_device *kbdev) +{ + return ((struct pixel_context *)kbdev->platform_context)->pm.rail_state_log; +} + + +/** + * gpu_pm_get_rail_state_log_size() - Size in bytes of the rail state log * - * Return: If GPU state has been lost, 1 is returned. Otherwise 0 is returned. + * @kbdev: The &struct kbase_device for the GPU. + * + * Context: Process context + * + * Return: Size in bytes of the rail state log, for dumping purposes + */ +unsigned int gpu_pm_get_rail_state_log_size(struct kbase_device *kbdev) +{ + return sizeof(struct pixel_rail_state_log); +} + +/** + * gpu_pm_rail_state_log_init() - Allocate and initialize the power rail state transition log + * + * @kbdev: The &struct kbase_device for the GPU. + * + * Context: Process context + * + * Return: Owning pointer to allocated rail state log + */ +static struct pixel_rail_state_log* gpu_pm_rail_state_log_init(struct kbase_device *kbdev) +{ + struct pixel_rail_state_log* log = kzalloc(sizeof(struct pixel_rail_state_log), GFP_KERNEL); + + if (log == NULL) { + dev_err(kbdev->dev, "Failed to allocated pm_rail_state_log"); + return log; + } + + log->meta = (struct pixel_rail_state_metadata) { + .magic = "pprs", + .version = 1, + .log_address = (uint64_t)log->log_rb, + .log_offset = offsetof(struct pixel_rail_state_log, log_rb), + .log_length = 0, + .last_entry = 0, + .log_entry_stride = (uint8_t)sizeof(struct pixel_rail_transition), + }; + + return log; +} + +/** + * gpu_pm_rail_state_log_term() - Free the rail state transition log + * + * @log: The &struct pixel_rail_state_log to destroy + * + * Context: Process context */ -static int gpu_pm_power_on_cores(struct kbase_device *kbdev) +static void gpu_pm_rail_state_log_term(struct pixel_rail_state_log *log) +{ + kfree(log); +} + +/** + * gpu_pm_power_on_top_nolock() - See gpu_pm_power_on_top + * + * @kbdev: The &struct kbase_device for the GPU. + */ +static int gpu_pm_power_on_top_nolock(struct kbase_device *kbdev) { int ret; struct pixel_context *pc = kbdev->platform_context; - u64 start_ns = ktime_get_ns(); - - mutex_lock(&pc->pm.lock); pm_runtime_get_sync(pc->pm.domain_devs[GPU_PM_DOMAIN_TOP]); pm_runtime_get_sync(pc->pm.domain_devs[GPU_PM_DOMAIN_CORES]); - /* * We determine whether GPU state was lost by detecting whether the GPU state reached * GPU_POWER_LEVEL_OFF before we entered this function. The GPU state is set to be @@ -75,61 +267,114 @@ static int gpu_pm_power_on_cores(struct kbase_device *kbdev) */ ret = (pc->pm.state == GPU_POWER_LEVEL_OFF); - trace_gpu_power_state(ktime_get_ns() - start_ns, - GPU_POWER_LEVEL_GLOBAL, GPU_POWER_LEVEL_STACKS); + gpu_dvfs_enable_updates(kbdev); #ifdef CONFIG_MALI_MIDGARD_DVFS + kbase_pm_metrics_start(kbdev); gpu_dvfs_event_power_on(kbdev); #endif - #if IS_ENABLED(CONFIG_GOOGLE_BCL) + if (!pc->pm.bcl_dev) + pc->pm.bcl_dev = google_retrieve_bcl_handle(); if (pc->pm.bcl_dev) google_init_gpu_ratio(pc->pm.bcl_dev); #endif - pc->pm.state = GPU_POWER_LEVEL_STACKS; +#if !IS_ENABLED(CONFIG_SOC_GS101) + if (exynos_smc(SMC_PROTECTION_SET, 0, PROT_G3D, SMC_PROTECTION_ENABLE) != 0) { + dev_err(kbdev->dev, "Couldn't enable protected mode after GPU power-on"); + } +#endif - mutex_unlock(&pc->pm.lock); + pc->pm.state = GPU_POWER_LEVEL_STACKS; return ret; } /** - * gpu_pm_power_off_cores() - Powers off the GPU shader cores. 
+ * gpu_pm_power_on_top() - Powers on the GPU global domains and shader cores. * * @kbdev: The &struct kbase_device for the GPU. * - * Powers off the CORES domain and issues trace points and events. Also marks the TOP domain for - * delayed suspend. Complete power down of all GPU domains will only occur after this delayed - * suspend, and the kernel notifies of this change via the &gpu_pm_callback_power_runtime_suspend - * callback. + * Powers on the CORES domain and issues trace points and events. Also powers on TOP and cancels + * any pending suspend operations on it. * - * Note: If the we have already performed these operations without an intervening call to - * &gpu_pm_power_on_cores, then we take no action. + * Context: Process context. Takes and releases PM lock. * - * Context: Process context. Takes and releases the PM lock. + * Return: If GPU state has been lost, 1 is returned. Otherwise 0 is returned. */ -static void gpu_pm_power_off_cores(struct kbase_device *kbdev) +static int gpu_pm_power_on_top(struct kbase_device *kbdev) { + int ret; struct pixel_context *pc = kbdev->platform_context; - u64 start_ns = ktime_get_ns(); - mutex_lock(&pc->pm.lock); + gpu_pm_rail_state_start_transition_lock(pc); + ret = gpu_pm_power_on_top_nolock(kbdev); + gpu_pm_rail_state_end_transition_unlock(pc); + + return ret; +} + +/** + * gpu_pm_power_off_top_nolock() - See gpu_pm_power_off_top + * + * @kbdev: The &struct kbase_device for the GPU. + */ +static void gpu_pm_power_off_top_nolock(struct kbase_device *kbdev) +{ + struct pixel_context *pc = kbdev->platform_context; - if (pc->pm.state > GPU_POWER_LEVEL_GLOBAL) { + if (pc->pm.state == GPU_POWER_LEVEL_STACKS) { pm_runtime_put_sync(pc->pm.domain_devs[GPU_PM_DOMAIN_CORES]); pc->pm.state = GPU_POWER_LEVEL_GLOBAL; + } + + if (pc->pm.state == GPU_POWER_LEVEL_GLOBAL) { +#if !IS_ENABLED(CONFIG_SOC_GS101) + if (exynos_smc(SMC_PROTECTION_SET, 0, PROT_G3D, SMC_PROTECTION_DISABLE) != 0) { + dev_err(kbdev->dev, "Couldn't disable protected mode before GPU power-off"); + } +#endif - pm_runtime_mark_last_busy(pc->pm.domain_devs[GPU_PM_DOMAIN_TOP]); - pm_runtime_put_autosuspend(pc->pm.domain_devs[GPU_PM_DOMAIN_TOP]); + gpu_dvfs_disable_updates(kbdev); + + if (pc->pm.use_autosuspend) { + pm_runtime_mark_last_busy(pc->pm.domain_devs[GPU_PM_DOMAIN_TOP]); + pm_runtime_put_autosuspend(pc->pm.domain_devs[GPU_PM_DOMAIN_TOP]); + } else { + pm_runtime_put_sync_suspend(pc->pm.domain_devs[GPU_PM_DOMAIN_TOP]); + } + pc->pm.state = GPU_POWER_LEVEL_OFF; - trace_gpu_power_state(ktime_get_ns() - start_ns, - GPU_POWER_LEVEL_STACKS, GPU_POWER_LEVEL_GLOBAL); #ifdef CONFIG_MALI_MIDGARD_DVFS gpu_dvfs_event_power_off(kbdev); + kbase_pm_metrics_stop(kbdev); #endif + } +} - mutex_unlock(&pc->pm.lock); +/** + * gpu_pm_power_off_top() - Instruct GPU to transition to OFF. + * + * @kbdev: The &struct kbase_device for the GPU. + * + * Powers off the CORES domain if they are on. Marks the TOP domain for delayed + * suspend. The complete power down of all GPU domains will only occur after + * this delayed suspend, and the kernel notifies of this change via the + * &gpu_pm_callback_power_runtime_suspend callback. + * + * Note: If the we have already performed these operations without an intervening call to + * &gpu_pm_power_on_top, then we take no action. + * + * Context: Process context. Takes and releases the PM lock. 
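The rail state log introduced above is meant to be recovered from a ramdump or SSCD rather than read by the kernel, so the metadata fields (the 'pprs' magic, version, log_offset, log_entry_stride and last_entry) are what an offline tool would key off. The following is a hypothetical host-side decoder sketched under the assumption that the raw bytes of struct pixel_rail_state_log have already been extracted from a dump; the struct mirrors the version-1 packed layout above but is not part of the driver.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#pragma pack(push, 1)
struct rail_transition {                /* 18 bytes, mirrors pixel_rail_transition */
        uint64_t begin_ns, end_ns;      /* timestamps in nanoseconds */
        uint8_t from, to;               /* power states */
};
struct rail_log_meta {                  /* 32 bytes, mirrors pixel_rail_state_metadata */
        char magic[4];
        uint8_t version;
        uint64_t log_address;
        uint32_t log_offset, log_length, last_entry;
        uint8_t stride;
        char reserved[6];
};
#pragma pack(pop)

/* Print the most recent transition recorded in an extracted log blob. */
static void print_last_rail_transition(const uint8_t *blob)
{
        const struct rail_log_meta *meta = (const void *)blob;
        const uint8_t *entries;
        const struct rail_transition *last;

        if (memcmp(meta->magic, "pprs", 4) != 0 || meta->version != 1)
                return;

        entries = blob + meta->log_offset;
        last = (const void *)(entries + (size_t)meta->last_entry * meta->stride);
        printf("last transition: %u -> %u, %llu ns\n",
               (unsigned)last->from, (unsigned)last->to,
               (unsigned long long)(last->end_ns - last->begin_ns));
}

Walking the whole ring buffer works the same way: entries sit log_entry_stride bytes apart starting at log_offset, and last_entry marks where the newest record was written before the index wraps.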
+ */ +static void gpu_pm_power_off_top(struct kbase_device *kbdev) +{ + struct pixel_context *pc = kbdev->platform_context; + + gpu_pm_rail_state_start_transition_lock(pc); + gpu_pm_power_off_top_nolock(kbdev); + gpu_pm_rail_state_end_transition_unlock(pc); } /** @@ -152,7 +397,7 @@ static int gpu_pm_callback_power_on(struct kbase_device *kbdev) { dev_dbg(kbdev->dev, "%s\n", __func__); - return gpu_pm_power_on_cores(kbdev); + return gpu_pm_power_on_top(kbdev); } /** @@ -170,7 +415,7 @@ static void gpu_pm_callback_power_off(struct kbase_device *kbdev) { dev_dbg(kbdev->dev, "%s\n", __func__); - gpu_pm_power_off_cores(kbdev); + gpu_pm_power_off_top(kbdev); } /** @@ -204,117 +449,168 @@ static void gpu_pm_callback_power_suspend(struct kbase_device *kbdev) { dev_dbg(kbdev->dev, "%s\n", __func__); - gpu_pm_power_off_cores(kbdev); + gpu_pm_power_off_top(kbdev); } -#ifdef KBASE_PM_RUNTIME +#if IS_ENABLED(KBASE_PM_RUNTIME) /** - * gpu_pm_callback_power_runtime_suspend() - Called when a TOP domain is going to runtime suspend + * gpu_pm_callback_power_runtime_init() - Initialize runtime power management. * - * @dev: The device that is going to runtime suspend + * @kbdev: The &struct kbase_device for the GPU. * - * This callback is made when @dev is about to enter runtime suspend. In our case, this occurs when - * the TOP domain of GPU is about to enter runtime suspend. At this point we take the opportunity - * to store that state will be lost and disable DVFS metrics gathering. + * This callback is made by the core Mali driver at the point where runtime power management is + * being initialized early on in the probe of the Mali device. * - * Note: This function doesn't take the PM lock prior to updating GPU state as it doesn't explicitly - * attempt to update GPU power domain state. The caller of this function (or another function - * further up the callstack) will hold &power.lock for the TOP domain's &struct device and - * that is sufficient for ensuring serialization of the GPU power state. + * We enable autosuspend for the TOP domain so that after the autosuspend delay, the core Mali + * driver knows to disable the collection of GPU utilization data used for DVFS purposes. * - * Return: Always returns 0. + * Return: Returns 0 on success, or an error code on failure. */ -static int gpu_pm_callback_power_runtime_suspend(struct device *dev) +static int gpu_pm_callback_power_runtime_init(struct kbase_device *kbdev) { - struct kbase_device *kbdev = dev_get_drvdata(dev); struct pixel_context *pc = kbdev->platform_context; dev_dbg(kbdev->dev, "%s\n", __func__); - WARN_ON(pc->pm.state > GPU_POWER_LEVEL_GLOBAL); - pc->pm.state = GPU_POWER_LEVEL_OFF; + if (!pm_runtime_enabled(pc->pm.domain_devs[GPU_PM_DOMAIN_TOP]) || + !pm_runtime_enabled(pc->pm.domain_devs[GPU_PM_DOMAIN_CORES])) { + dev_warn(kbdev->dev, "pm_runtime not enabled\n"); + return -ENOSYS; + } -#ifdef CONFIG_MALI_MIDGARD_DVFS - kbase_pm_metrics_stop(kbdev); -#endif + if (pc->pm.use_autosuspend) { + pm_runtime_set_autosuspend_delay(pc->pm.domain_devs[GPU_PM_DOMAIN_TOP], + pc->pm.autosuspend_delay); + pm_runtime_use_autosuspend(pc->pm.domain_devs[GPU_PM_DOMAIN_TOP]); + } return 0; } /** - * gpu_pm_callback_power_runtime_resume() - Called when a TOP domain is going to runtime resume - * - * @dev: The device that is going to runtime suspend + * kbase_device_runtime_term() - Initialize runtime power management. * - * This callback is made when @dev is about to runtime resume. 
In our case, this occurs when - * the TOP domain of GPU is about to runtime resume. We use this callback to enable DVFS metrics - * gathering. + * @kbdev: The &struct kbase_device for the GPU. * - * Return: Always returns 0. + * This callback is made via the core Mali driver at the point where runtime power management needs + * to be de-initialized. Currently this only happens if the device probe fails at a point after + * which runtime power management has been initialized. */ -static int gpu_pm_callback_power_runtime_resume(struct device *dev) +static void gpu_pm_callback_power_runtime_term(struct kbase_device *kbdev) { -#ifdef CONFIG_MALI_MIDGARD_DVFS - struct kbase_device *kbdev = dev_get_drvdata(dev); + struct pixel_context *pc = kbdev->platform_context; - kbase_pm_metrics_start(kbdev); -#endif - return 0; + dev_dbg(kbdev->dev, "%s\n", __func__); + + pm_runtime_disable(pc->pm.domain_devs[GPU_PM_DOMAIN_CORES]); + pm_runtime_disable(pc->pm.domain_devs[GPU_PM_DOMAIN_TOP]); } +#endif /* IS_ENABLED(KBASE_PM_RUNTIME) */ + + +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS /** - * gpu_pm_callback_power_runtime_init() - Initialize runtime power management. + * gpu_pm_power_on_cores() - Powers on the GPU shader cores for + * CONFIG_MALI_HOST_CONTROLS_SC_RAILS integrations. * * @kbdev: The &struct kbase_device for the GPU. * - * This callback is made by the core Mali driver at the point where runtime power management is - * being initialized early on in the probe of the Mali device. - * - * We enable autosuspend for the TOP domain so that after the autosuspend delay, the core Mali - * driver knows to disable the collection of GPU utilization data used for DVFS purposes. + * Powers on the CORES domain for CONFIG_MALI_HOST_CONTROLS_SC_RAILS + * integrations. Afterwards shaders must be powered and may be used by GPU. * - * Return: Returns 0 on success, or an error code on failure. + * Context: Process context. Takes and releases PM lock. */ -static int gpu_pm_callback_power_runtime_init(struct kbase_device *kbdev) -{ +static void gpu_pm_power_on_cores(struct kbase_device *kbdev) { struct pixel_context *pc = kbdev->platform_context; - dev_dbg(kbdev->dev, "%s\n", __func__); + gpu_pm_rail_state_start_transition_lock(pc); - pm_runtime_set_autosuspend_delay(pc->pm.domain_devs[GPU_PM_DOMAIN_TOP], - pc->pm.autosuspend_delay); - pm_runtime_use_autosuspend(pc->pm.domain_devs[GPU_PM_DOMAIN_TOP]); + if (pc->pm.state == GPU_POWER_LEVEL_GLOBAL && pc->pm.ifpo_enabled) { + pm_runtime_get_sync(pc->pm.domain_devs[GPU_PM_DOMAIN_CORES]); + pc->pm.state = GPU_POWER_LEVEL_STACKS; - if (!pm_runtime_enabled(pc->pm.domain_devs[GPU_PM_DOMAIN_TOP]) || - !pm_runtime_enabled(pc->pm.domain_devs[GPU_PM_DOMAIN_CORES])) { - dev_warn(kbdev->dev, "pm_runtime not enabled\n"); - return -ENOSYS; +#ifdef CONFIG_MALI_MIDGARD_DVFS + gpu_dvfs_event_power_on(kbdev); +#endif } - return 0; + gpu_pm_rail_state_end_transition_unlock(pc); } /** - * kbase_device_runtime_term() - Initialize runtime power management. + * gpu_pm_power_off_cores() - Powers off the GPU shader cores for + * CONFIG_MALI_HOST_CONTROLS_SC_RAILS integrations. * * @kbdev: The &struct kbase_device for the GPU. * - * This callback is made via the core Mali driver at the point where runtime power management needs - * to be de-initialized. Currently this only happens if the device probe fails at a point after - * which runtime power management has been initialized. + * Powers off the CORES domain for CONFIG_MALI_HOST_CONTROLS_SC_RAILS + * integrations. 
Afterwards shaders are not powered and may not be used by GPU. + * + * Context: Process context. Takes and releases PM lock. */ -static void gpu_pm_callback_power_runtime_term(struct kbase_device *kbdev) -{ +static void gpu_pm_power_off_cores(struct kbase_device *kbdev) { struct pixel_context *pc = kbdev->platform_context; + gpu_pm_rail_state_start_transition_lock(pc); + + if (pc->pm.state == GPU_POWER_LEVEL_STACKS && pc->pm.ifpo_enabled) { + pm_runtime_put_sync(pc->pm.domain_devs[GPU_PM_DOMAIN_CORES]); + pc->pm.state = GPU_POWER_LEVEL_GLOBAL; + +#ifdef CONFIG_MALI_MIDGARD_DVFS + gpu_dvfs_event_power_off(kbdev); +#endif + } + + gpu_pm_rail_state_end_transition_unlock(pc); +} + +/** + * gpu_pm_callback_power_sc_rails_on() - Called by GPU when shaders are needed. + * + * @kbdev: The device that needs its shaders powered on. + * + * This callback is made when @dev needs shader cores powered on integrations + * using CONFIG_MALI_HOST_CONTROLS_SC_RAILS. + */ +static void gpu_pm_callback_power_sc_rails_on(struct kbase_device *kbdev) { dev_dbg(kbdev->dev, "%s\n", __func__); - pm_runtime_disable(pc->pm.domain_devs[GPU_PM_DOMAIN_CORES]); - pm_runtime_disable(pc->pm.domain_devs[GPU_PM_DOMAIN_TOP]); + gpu_pm_power_on_cores(kbdev); } -#endif /* KBASE_PM_RUNTIME */ +/** + * gpu_pm_callback_power_sc_rails_off() - Called by GPU when shaders are idle. + * + * @kbdev: The device that needs its shaders powered on. + * + * This callback is made when @dev coud have its shader cores powered off on + * integrations using CONFIG_MALI_HOST_CONTROLS_SC_RAILS. + */ +static void gpu_pm_callback_power_sc_rails_off(struct kbase_device *kbdev) { + dev_dbg(kbdev->dev, "%s\n", __func__); + + gpu_pm_power_off_cores(kbdev); +} +#endif /* CONFIG_MALI_HOST_CONTROLS_SC_RAILS */ + +static void gpu_pm_hw_reset(struct kbase_device *kbdev) +{ + struct pixel_context *pc = kbdev->platform_context; + + /* Ensure the power cycle happens inside one critical section */ + gpu_pm_rail_state_start_transition_lock(pc); + + dev_warn(kbdev->dev, "pixel: performing GPU hardware reset"); + + gpu_pm_power_off_top_nolock(kbdev); + /* GPU state loss is intended */ + (void)gpu_pm_power_on_top_nolock(kbdev); + + gpu_pm_rail_state_end_transition_unlock(pc); +} /* * struct pm_callbacks - Callbacks for linking to core Mali KMD power management @@ -350,7 +646,7 @@ struct kbase_pm_callback_conf pm_callbacks = { .power_on_callback = gpu_pm_callback_power_on, .power_suspend_callback = gpu_pm_callback_power_suspend, .power_resume_callback = NULL, -#ifdef KBASE_PM_RUNTIME +#if IS_ENABLED(KBASE_PM_RUNTIME) .power_runtime_init_callback = gpu_pm_callback_power_runtime_init, .power_runtime_term_callback = gpu_pm_callback_power_runtime_term, .power_runtime_off_callback = NULL, @@ -363,39 +659,16 @@ struct kbase_pm_callback_conf pm_callbacks = { .power_runtime_on_callback = NULL, .power_runtime_idle_callback = NULL, #endif /* KBASE_PM_RUNTIME */ - .soft_reset_callback = NULL + .soft_reset_callback = NULL, + .hardware_reset_callback = gpu_pm_hw_reset, +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + .power_on_sc_rails_callback = gpu_pm_callback_power_sc_rails_on, + .power_off_sc_rails_callback = gpu_pm_callback_power_sc_rails_off, +#endif /* CONFIG_MALI_HOST_CONTROLS_SC_RAILS */ }; /** - * gpu_pm_get_pm_cores_domain() - Find the GPU's power domain. - * - * @g3d_genpd_name: A string containing the name of the power domain - * - * Searches through the available power domains in device tree for one that - * matched @g3d_genpd_name and returns it if found. 
- * - * Return: A pointer to the power domain if found, NULL otherwise. - */ -static struct exynos_pm_domain *gpu_pm_get_pm_cores_domain(const char *g3d_genpd_name) -{ - struct device_node *np; - struct platform_device *pdev; - struct exynos_pm_domain *pd; - - for_each_compatible_node(np, NULL, "samsung,exynos-pd") { - if (of_device_is_available(np)) { - pdev = of_find_device_by_node(np); - pd = (struct exynos_pm_domain *)platform_get_drvdata(pdev); - if (strcmp(g3d_genpd_name, (const char *)(pd->genpd.name)) == 0) - return pd; - } - } - - return NULL; -} - -/** - * gpu_pm_get_power_state() - Returns the current power state of a GPU. + * gpu_pm_get_power_state() - Returns the current power state of the GPU. * * @kbdev: The &struct kbase_device for the GPU. * @@ -472,20 +745,17 @@ int gpu_pm_init(struct kbase_device *kbdev) } } - /* - * We set up runtime pm callbacks specifically for the TOP domain. This is so that when we - * use autosupend it will only affect the TOP domain and not CORES as we control the power - * state of CORES directly. - */ - pc->pm.domain_devs[GPU_PM_DOMAIN_TOP]->pm_domain->ops.runtime_suspend = - &gpu_pm_callback_power_runtime_suspend; - pc->pm.domain_devs[GPU_PM_DOMAIN_TOP]->pm_domain->ops.runtime_resume = - &gpu_pm_callback_power_runtime_resume; +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + pc->pm.ifpo_enabled = true; +#endif if (of_property_read_u32(np, "gpu_pm_autosuspend_delay", &pc->pm.autosuspend_delay)) { - pc->pm.autosuspend_delay = AUTO_SUSPEND_DELAY; - dev_info(kbdev->dev, "autosuspend delay not set in DT, using default of %dms\n", - AUTO_SUSPEND_DELAY); + pc->pm.use_autosuspend = false; + pc->pm.autosuspend_delay = 0; + dev_info(kbdev->dev, "using synchronous suspend for TOP domain\n"); + } else { + pc->pm.use_autosuspend = true; + dev_info(kbdev->dev, "autosuspend delay set to %ims for TOP domain\n", pc->pm.autosuspend_delay); } if (of_property_read_u32(np, "gpu_pmu_status_reg_offset", &pc->pm.status_reg_offset)) { @@ -507,14 +777,19 @@ int gpu_pm_init(struct kbase_device *kbdev) goto error; } - pc->pm.domain = gpu_pm_get_pm_cores_domain(g3d_power_domain_name); - if (pc->pm.domain == NULL) + pc->pm.domain = exynos_pd_lookup_name(g3d_power_domain_name); + if (pc->pm.domain == NULL) { + dev_err(kbdev->dev, "Failed to find GPU power domain '%s'\n", + g3d_power_domain_name); return -ENODEV; + } #if IS_ENABLED(CONFIG_GOOGLE_BCL) pc->pm.bcl_dev = google_retrieve_bcl_handle(); #endif + pc->pm.rail_state_log = gpu_pm_rail_state_log_init(kbdev); + return 0; error: @@ -535,6 +810,8 @@ void gpu_pm_term(struct kbase_device *kbdev) struct pixel_context *pc = kbdev->platform_context; int i; + gpu_pm_rail_state_log_term(pc->pm.rail_state_log); + for (i = 0; i < GPU_PM_DOMAIN_COUNT; i++) { if (pc->pm.domain_devs[i]) { if (pc->pm.domain_links[i]) diff --git a/mali_kbase/platform/pixel/pixel_gpu_slc.c b/mali_kbase/platform/pixel/pixel_gpu_slc.c new file mode 100644 index 0000000..94409d2 --- /dev/null +++ b/mali_kbase/platform/pixel/pixel_gpu_slc.c @@ -0,0 +1,462 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright 2022-2023 Google LLC. 
+ * + * Author: Jack Diver <diverj@google.com> + */ + +/* Mali core includes */ +#include <mali_kbase.h> + +/* UAPI includes */ +#include <uapi/gpu/arm/midgard/platform/pixel/pixel_gpu_common_slc.h> +/* Back-door mali_pixel include */ +#include <uapi/gpu/arm/midgard/platform/pixel/pixel_memory_group_manager.h> + +/* Pixel integration includes */ +#include "mali_kbase_config_platform.h" +#include "pixel_gpu_slc.h" + +struct dirty_region { + u64 first_vpfn; + u64 last_vpfn; + u64 dirty_pgds; +}; + +/** + * struct gpu_slc_liveness_update_info - Buffer info, and live ranges + * + * @buffer_va: Array of buffer base virtual addresses + * @buffer_sizes: Array of buffer sizes + * @buffer_count: Number of elements in the va and sizes buffers + * @live_ranges: Array of &struct kbase_pixel_gpu_slc_liveness_mark denoting live ranges for + * each buffer + * @live_ranges_count: Number of elements in the live ranges buffer + */ +struct gpu_slc_liveness_update_info { + u64* buffer_va; + u64* buffer_sizes; + u64 buffer_count; + struct kbase_pixel_gpu_slc_liveness_mark* live_ranges; + u64 live_ranges_count; +}; + +/** + * gpu_slc_lock_as - Lock the current process address space + * + * @kctx: The &struct kbase_context + */ +static void gpu_slc_lock_as(struct kbase_context *kctx) +{ + down_write(kbase_mem_get_process_mmap_lock()); + kbase_gpu_vm_lock(kctx); +} + +/** + * gpu_slc_unlock_as - Unlock the current process address space + * + * @kctx: The &struct kbase_context + */ +static void gpu_slc_unlock_as(struct kbase_context *kctx) +{ + kbase_gpu_vm_unlock(kctx); + up_write(kbase_mem_get_process_mmap_lock()); +} + +/** + * gpu_slc_in_group - Check whether the region is SLC cacheable + * + * @reg: The gpu memory region to check for an SLC cacheable memory group. + */ +static bool gpu_slc_in_group(struct kbase_va_region* reg) +{ + return reg->gpu_alloc->group_id == MGM_SLC_GROUP_ID; +} + +/** + * gpu_slc_get_region - Find the gpu memory region from a virtual address + * + * @kctx: The &struct kbase_context + * @va: The base gpu virtual address of the region + * + * Return: On success, returns a valid memory region. On failure NULL is returned. 
+ */ +static struct kbase_va_region* gpu_slc_get_region(struct kbase_context *kctx, u64 va) +{ + struct kbase_va_region *reg; + + if (!va) + goto invalid; + + if ((va & ~PAGE_MASK) && (va >= PAGE_SIZE)) + goto invalid; + + /* Find the region that the virtual address belongs to */ + reg = kbase_region_tracker_find_region_base_address(kctx, va); + + /* Validate the region */ + if (kbase_is_region_invalid_or_free(reg)) + goto invalid; + + return reg; + +invalid: + dev_dbg(kctx->kbdev->dev, "pixel: failed to find valid region for gpu_va: %llu", va); + return NULL; +} + +/** + * gpu_slc_migrate_region - Add PBHA that will make the pages SLC cacheable + * + * @kctx: The &struct kbase_context + * @reg: The gpu memory region migrate to an SLC cacheable memory group + * @dirty_reg: The &struct dirty_region containing the extent of the dirty page table entries + */ +static void gpu_slc_migrate_region(struct kbase_context *kctx, struct kbase_va_region *reg, struct dirty_region *dirty_reg) +{ + int err; + u64 vpfn; + size_t page_nr; + + KBASE_DEBUG_ASSERT(kctx); + KBASE_DEBUG_ASSERT(reg); + + vpfn = reg->start_pfn; + page_nr = kbase_reg_current_backed_size(reg); + + err = kbase_mmu_update_pages_no_flush(kctx->kbdev, &kctx->mmu, vpfn, + kbase_get_gpu_phy_pages(reg), + page_nr, + reg->flags, + MGM_SLC_GROUP_ID, + &dirty_reg->dirty_pgds); + + /* Track the dirty region */ + dirty_reg->first_vpfn = min(dirty_reg->first_vpfn, vpfn); + dirty_reg->last_vpfn = max(dirty_reg->last_vpfn, vpfn + page_nr); + + if (err) + dev_warn(kctx->kbdev->dev, "pixel: failed to move region to SLC: %d", err); + else + /* If everything is good, then set the new group on the region. */ + reg->gpu_alloc->group_id = MGM_SLC_GROUP_ID; +} + +/** + * gpu_slc_flush_dirty_region - Perform an MMU flush for a dirty page region + * + * @kctx: The &struct kbase_context + * @dirty_reg: The &struct dirty_region containing the extent of the dirty page table entries + */ +static void gpu_slc_flush_dirty_region(struct kbase_context *kctx, struct dirty_region *dirty_reg) +{ + size_t const dirty_page_nr = + (dirty_reg->last_vpfn - min(dirty_reg->first_vpfn, dirty_reg->last_vpfn)); + + if (!dirty_page_nr) + return; + + kbase_mmu_flush_invalidate_update_pages( + kctx->kbdev, kctx, dirty_reg->first_vpfn, dirty_page_nr, dirty_reg->dirty_pgds); +} + +/** + * gpu_slc_resize_partition - Attempt to resize the GPU's SLC partition to meet demand. + * + * @kbdev: The &struct kbase_device for the GPU. + */ +static void gpu_slc_resize_partition(struct kbase_device* kbdev) +{ + struct pixel_context *pc = kbdev->platform_context; + + /* Request that the mgm select an SLC partition that fits our demand */ + pixel_mgm_resize_group_to_fit(kbdev->mgm_dev, MGM_SLC_GROUP_ID, pc->slc.demand); + + dev_dbg(kbdev->dev, "pixel: resized GPU SLC partition to meet demand: %llu", pc->slc.demand); +} + +/** + * gpu_slc_get_partition_size - Query the current size of the GPU's SLC partition. + * + * @kbdev: The &struct kbase_device for the GPU. + * + * Returns the size of the GPU's SLC partition. + */ +static u64 gpu_slc_get_partition_size(struct kbase_device* kbdev) +{ + u64 const partition_size = pixel_mgm_query_group_size(kbdev->mgm_dev, MGM_SLC_GROUP_ID); + + dev_dbg(kbdev->dev, "pixel: GPU SLC partition partition size: %llu", partition_size); + + return partition_size; +} + +/** + * gpu_slc_liveness_update - Respond to a liveness update by trying to put the new buffers into free + * SLC space, and resizing the partition to meet demand. 
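The demand figure this function maintains is a high-water mark: scanning the begin/end marks in order, it is the largest sum of buffer sizes that are live at the same time, and that peak (not the total of all buffers) is what the SLC partition is resized to. A self-contained sketch of just that computation is shown below; the mark/enum names are illustrative stand-ins for kbase_pixel_gpu_slc_liveness_mark, and the usage side (which only counts buffers actually migrated into the SLC group) is omitted.

#include <stdint.h>

enum mark_type { RANGE_BEGIN, RANGE_END };
struct mark {
        enum mark_type type;
        uint32_t index;         /* which buffer this mark refers to */
};

/* Peak demand: the high-water mark of the sizes of simultaneously live
 * buffers, scanning the liveness marks in submission order. The driver
 * additionally WARNs if the running sum is non-zero at the end, which
 * indicates a missing end marker.
 */
static uint64_t peak_demand(const struct mark *marks, int n_marks,
                            const uint64_t *sizes)
{
        uint64_t current = 0, peak = 0;
        int i;

        for (i = 0; i < n_marks; i++) {
                if (marks[i].type == RANGE_BEGIN) {
                        current += sizes[marks[i].index];
                        if (current > peak)
                                peak = current;
                } else {
                        current -= sizes[marks[i].index];
                }
        }
        return peak;
}

For example, two 32 MiB buffers whose lifetimes never overlap produce a peak demand of 32 MiB, while overlapping lifetimes produce 64 MiB, which is why the partition is sized from the peak rather than from the sum of all buffers in the update.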
+ * + * @kctx: The &struct kbase_context corresponding to a user space context which sent the liveness + * update + * @info: See struct gpu_slc_liveness_update_info + */ +static void gpu_slc_liveness_update(struct kbase_context* kctx, + struct gpu_slc_liveness_update_info* info) +{ + struct kbase_device* kbdev = kctx->kbdev; + struct pixel_context *pc = kbdev->platform_context; + struct pixel_platform_data *kctx_pd = kctx->platform_data; + struct dirty_region dirty_reg = { + .first_vpfn = U64_MAX, + .last_vpfn = 0, + .dirty_pgds = 0, + }; + u64 current_usage = 0; + u64 current_demand = 0; + u64 free_space; + int i; + + /* Lock the process address space before modifying ATE's */ + gpu_slc_lock_as(kctx); + + /* Synchronize updates to the partition size and usage */ + mutex_lock(&pc->slc.lock); + + dev_dbg(kbdev->dev, "pixel: buffer liveness update received"); + + /* Remove the usage and demand from the previous liveness update */ + pc->slc.demand -= kctx_pd->slc.peak_demand; + pc->slc.usage -= kctx_pd->slc.peak_usage; + kctx_pd->slc.peak_demand = 0; + kctx_pd->slc.peak_usage = 0; + + /* Calculate the remaining free space in the SLC partition (floored at 0) */ + free_space = gpu_slc_get_partition_size(kbdev); + free_space -= min(free_space, pc->slc.usage); + + for (i = 0; i < info->live_ranges_count; ++i) + { + struct kbase_va_region *reg; + u64 size; + u64 va; + u32 index = info->live_ranges[i].index; + + if (unlikely(index >= info->buffer_count)) + continue; + + size = info->buffer_sizes[index]; + va = info->buffer_va[index]; + + reg = gpu_slc_get_region(kctx, va); + if(!reg) + continue; + + switch (info->live_ranges[i].type) + { + case KBASE_PIXEL_GPU_LIVE_RANGE_BEGIN: + /* Update demand as though there's no size limit */ + current_demand += size; + kctx_pd->slc.peak_demand = max(kctx_pd->slc.peak_demand, current_demand); + + /* Check whether there's free space in the partition to store the buffer */ + if (free_space >= current_usage + size) + gpu_slc_migrate_region(kctx, reg, &dirty_reg); + + /* This may be true, even if the space calculation above returned false, + * as a previous call to this function may have migrated the region. + * In such a scenario, the current_usage may exceed the available free_space + * and we will be oversubscribed to the SLC partition. + * We could migrate the region back to the non-SLC group, but this would + * require an SLC flush, so for now we do nothing. 
+ */ + if (gpu_slc_in_group(reg)) { + current_usage += size; + kctx_pd->slc.peak_usage = max(kctx_pd->slc.peak_usage, current_usage); + } + break; + case KBASE_PIXEL_GPU_LIVE_RANGE_END: + current_demand -= size; + if (gpu_slc_in_group(reg)) + current_usage -= size; + break; + } + } + /* Perform single page table flush */ + gpu_slc_flush_dirty_region(kctx, &dirty_reg); + + /* Indicates a missing live range end marker */ + WARN_ON_ONCE(current_demand != 0 || current_usage != 0); + + /* Update the total usage and demand */ + pc->slc.demand += kctx_pd->slc.peak_demand; + pc->slc.usage += kctx_pd->slc.peak_usage; + + dev_dbg(kbdev->dev, + "pixel: kctx_%d, peak_demand: %llu, peak_usage: %llu", + kctx->id, + kctx_pd->slc.peak_demand, + kctx_pd->slc.peak_usage); + dev_dbg(kbdev->dev, "pixel: kbdev, demand: %llu, usage: %llu", pc->slc.demand, pc->slc.usage); + + /* Trigger partition resize based on the new demand */ + gpu_slc_resize_partition(kctx->kbdev); + + mutex_unlock(&pc->slc.lock); + gpu_slc_unlock_as(kctx); +} + +/** + * gpu_pixel_handle_buffer_liveness_update_ioctl() - See gpu_slc_liveness_update + * + * @kctx: The &struct kbase_context corresponding to a user space context which sent the liveness + * update + * @update: See struct kbase_ioctl_buffer_liveness_update + * + * Context: Process context. Takes and releases the GPU power domain lock. Expects the caller to + * hold the DVFS lock. + */ +int gpu_pixel_handle_buffer_liveness_update_ioctl(struct kbase_context* kctx, + struct kbase_ioctl_buffer_liveness_update* update) +{ + int err = -EINVAL; + struct gpu_slc_liveness_update_info info; + u64* buff = NULL; + u64 total_buff_size; + + /* Compute the sizes of the user space arrays that we need to copy */ + u64 const buffer_info_size = sizeof(u64) * update->buffer_count; + u64 const live_ranges_size = + sizeof(struct kbase_pixel_gpu_slc_liveness_mark) * update->live_ranges_count; + + /* Guard against overflows and empty sizes */ + if (!buffer_info_size || !live_ranges_size) + goto done; + if (U64_MAX / sizeof(u64) < update->buffer_count) + goto done; + if (U64_MAX / sizeof(struct kbase_pixel_gpu_slc_liveness_mark) < update->live_ranges_count) + goto done; + /* Guard against nullptr */ + if (!update->live_ranges_address || !update->buffer_va_address || !update->buffer_sizes_address) + goto done; + /* Calculate the total buffer size required and detect overflows */ + if ((U64_MAX - live_ranges_size) / 2 < buffer_info_size) + goto done; + + total_buff_size = buffer_info_size * 2 + live_ranges_size; + + /* Allocate the memory we require to copy from user space */ + buff = kmalloc(total_buff_size, GFP_KERNEL); + if (buff == NULL) { + dev_err(kctx->kbdev->dev, "pixel: failed to allocate buffer for liveness update"); + err = -ENOMEM; + goto done; + } + + /* Set up the info struct by pointing into the allocation. 
All 8 byte aligned */ + info = (struct gpu_slc_liveness_update_info){ + .buffer_va = buff, + .buffer_sizes = buff + update->buffer_count, + .buffer_count = update->buffer_count, + .live_ranges = (struct kbase_pixel_gpu_slc_liveness_mark*)(buff + update->buffer_count * 2), + .live_ranges_count = update->live_ranges_count, + }; + + /* Copy the data from user space */ + err = + copy_from_user(info.live_ranges, u64_to_user_ptr(update->live_ranges_address), live_ranges_size); + if (err) { + dev_err(kctx->kbdev->dev, "pixel: failed to copy live ranges"); + err = -EFAULT; + goto done; + } + + err = copy_from_user( + info.buffer_sizes, u64_to_user_ptr(update->buffer_sizes_address), buffer_info_size); + if (err) { + dev_err(kctx->kbdev->dev, "pixel: failed to copy buffer sizes"); + err = -EFAULT; + goto done; + } + + err = copy_from_user(info.buffer_va, u64_to_user_ptr(update->buffer_va_address), buffer_info_size); + if (err) { + dev_err(kctx->kbdev->dev, "pixel: failed to copy buffer addresses"); + err = -EFAULT; + goto done; + } + + /* Execute an slc update */ + gpu_slc_liveness_update(kctx, &info); + +done: + kfree(buff); + + return err; +} + +/** + * gpu_slc_kctx_init() - Called when a kernel context is created + * + * @kctx: The &struct kbase_context that is being initialized + * + * This function is called when the GPU driver is initializing a new kernel context. This event is + * used to set up data structures that will be used to track this context's usage of the SLC. + * + * Return: Returns 0 on success, or an error code on failure. + */ +int gpu_slc_kctx_init(struct kbase_context *kctx) +{ + (void)kctx; + return 0; +} + +/** + * gpu_slc_kctx_term() - Called when a kernel context is terminated + * + * @kctx: The &struct kbase_context that is being terminated + * + * Free up SLC space used by the buffers that this context owns. + */ +void gpu_slc_kctx_term(struct kbase_context *kctx) +{ + struct kbase_device* kbdev = kctx->kbdev; + struct pixel_context *pc = kbdev->platform_context; + struct pixel_platform_data *kctx_pd = kctx->platform_data; + + mutex_lock(&pc->slc.lock); + + /* Deduct the usage and demand, freeing that SLC space for the next update */ + pc->slc.demand -= kctx_pd->slc.peak_demand; + pc->slc.usage -= kctx_pd->slc.peak_usage; + + /* Trigger partition resize based on the new demand */ + gpu_slc_resize_partition(kctx->kbdev); + + mutex_unlock(&pc->slc.lock); +} + + +/** + * gpu_slc_init - Initialize the SLC partition for the GPU + * + * @kbdev: The &struct kbase_device for the GPU. + * + * Return: On success, returns 0. On failure an error code is returned. + */ +int gpu_slc_init(struct kbase_device *kbdev) +{ + struct pixel_context *pc = kbdev->platform_context; + + mutex_init(&pc->slc.lock); + + return 0; +} + +/** + * gpu_slc_term() - Terminates the Pixel GPU SLC partition. + * + * @kbdev: The &struct kbase_device for the GPU. + */ +void gpu_slc_term(struct kbase_device *kbdev) +{ + (void)kbdev; +} diff --git a/mali_kbase/platform/pixel/pixel_gpu_slc.h b/mali_kbase/platform/pixel/pixel_gpu_slc.h new file mode 100644 index 0000000..29b4eb3 --- /dev/null +++ b/mali_kbase/platform/pixel/pixel_gpu_slc.h @@ -0,0 +1,37 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright 2022-2023 Google LLC. 
+ * + * Author: Jack Diver <diverj@google.com> + */ +#ifndef _PIXEL_GPU_SLC_H_ +#define _PIXEL_GPU_SLC_H_ + +#ifdef CONFIG_MALI_PIXEL_GPU_SLC +int gpu_pixel_handle_buffer_liveness_update_ioctl(struct kbase_context* kctx, + struct kbase_ioctl_buffer_liveness_update* update); + +int gpu_slc_init(struct kbase_device *kbdev); + +void gpu_slc_term(struct kbase_device *kbdev); + +int gpu_slc_kctx_init(struct kbase_context *kctx); + +void gpu_slc_kctx_term(struct kbase_context *kctx); +#else +static int __maybe_unused gpu_pixel_handle_buffer_liveness_update_ioctl(struct kbase_context* kctx, + struct kbase_ioctl_buffer_liveness_update* update) +{ + return (void)kctx, (void)update, 0; +} + +int __maybe_unused gpu_slc_init(struct kbase_device *kbdev) { return (void)kbdev, 0; } + +void __maybe_unused gpu_slc_term(struct kbase_device *kbdev) { (void)kbdev; } + +static int __maybe_unused gpu_slc_kctx_init(struct kbase_context *kctx) { return (void)kctx, 0; } + +static void __maybe_unused gpu_slc_kctx_term(struct kbase_context* kctx) { (void)kctx; } +#endif /* CONFIG_MALI_PIXEL_GPU_SLC */ + +#endif /* _PIXEL_GPU_SLC_H_ */ diff --git a/mali_kbase/platform/pixel/pixel_gpu_sscd.c b/mali_kbase/platform/pixel/pixel_gpu_sscd.c new file mode 100644 index 0000000..b374b00 --- /dev/null +++ b/mali_kbase/platform/pixel/pixel_gpu_sscd.c @@ -0,0 +1,720 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright 2021 Google LLC. + * + * Author: Jack Diver <diverj@google.com> + */ + +/* Mali core includes */ +#include <mali_kbase.h> +#include <csf/mali_kbase_csf_trace_buffer.h> +#include <csf/mali_kbase_csf_firmware.h> +#include <csf/mali_kbase_csf_firmware_cfg.h> +#include <csf/mali_kbase_csf_firmware_core_dump.h> + +/* Pixel integration includes */ +#include "mali_kbase_config_platform.h" +#include <mali_kbase_reset_gpu.h> +#include "pixel_gpu_sscd.h" +#include "pixel_gpu_debug.h" +#include "pixel_gpu_control.h" +#include <linux/platform_data/sscoredump.h> +#include <linux/platform_device.h> + +/*************************************************************************************************** + * This feature is a WIP, and is pending Firmware + core KMD support for: * + * - Dumping FW private memory * + * - Suspending the MCU * + * - Dumping MCU registers * + **************************************************************************************************/ + +static void sscd_release(struct device *dev) +{ + (void)dev; +} + +static struct sscd_platform_data sscd_pdata; +const static struct platform_device sscd_dev_init = { .name = "mali", + .driver_override = SSCD_NAME, + .id = -1, + .dev = { + .platform_data = &sscd_pdata, + .release = sscd_release, + } }; +static struct platform_device sscd_dev; + +enum +{ + MCU_REGISTERS = 0x1, + GPU_REGISTERS = 0x2, + PRIVATE_MEM = 0x3, + SHARED_MEM = 0x4, + FW_TRACE = 0x5, + PM_EVENT_LOG = 0x6, + POWER_RAIL_LOG = 0x7, + PDC_STATUS = 0x8, + KTRACE = 0x9, + CONTEXTS = 0xA, + FW_CORE_DUMP = 0xB, + NUM_SEGMENTS +} sscd_segs; + +static void get_pm_event_log(struct kbase_device *kbdev, struct sscd_segment *seg) +{ + lockdep_assert_held(&kbdev->hwaccess_lock); + + if (seg->addr == NULL) + return; + + if (kbase_pm_copy_event_log(kbdev, seg->addr, seg->size)) { + dev_warn(kbdev->dev, "pixel: failed to report PM event log"); + } +} + +/** + * struct pixel_fw_trace_metadata - Info about the FW trace log + * + * @magic: Always 'pfwt', helps find the log in memory dumps + * @trace_address: The memory address of the FW trace log + * @trace_length: Number of used bytes in the trace ring 
buffer. + * The length will be <= (FW_TRACE_BUF_NR_PAGES << PAGE_SHIFT) + * @version: Updated whenever the binary layout changes + * @_reserved: Bytes reserved for future use + **/ +struct pixel_fw_trace_metadata { + char magic[4]; + uint64_t trace_address; + uint32_t trace_length; + uint8_t version; + char _reserved[31]; +} __attribute__((packed)); +_Static_assert(sizeof(struct pixel_fw_trace_metadata) == 48, + "Incorrect pixel_fw_trace_metadata size"); + +/** + * struct pixel_fw_trace - The FW trace and associated meta data + * + * @meta: Info about the trace log + * @trace_log: The actual trace log + **/ +struct pixel_fw_trace { + struct pixel_fw_trace_metadata meta; + char trace_log[FW_TRACE_BUF_NR_PAGES << PAGE_SHIFT]; +}; + +static void get_fw_trace(struct kbase_device *kbdev, struct sscd_segment *seg) +{ + struct firmware_trace_buffer *tb; + struct pixel_fw_trace *fw_trace; + + lockdep_assert_held(&kbdev->hwaccess_lock); + + if (seg->addr == NULL) + return; + + fw_trace = seg->addr; + + /* Write the default meta data */ + fw_trace->meta = (struct pixel_fw_trace_metadata) { + .magic = "pfwt", + .trace_address = 0, + .trace_length = 0, + .version = 1, + }; + + tb = kbase_csf_firmware_get_trace_buffer(kbdev, KBASE_CSFFW_LOG_BUF_NAME); + + if (tb == NULL) { + dev_err(kbdev->dev, "pixel: failed to open firmware trace buffer"); + return; + } + + /* Write the trace log */ + fw_trace->meta.trace_address = (uint64_t)tb; + fw_trace->meta.trace_length = kbase_csf_firmware_trace_buffer_read_data( + tb, fw_trace->trace_log, sizeof(fw_trace->trace_log)); + + return; +} + +/** + * struct pixel_ktrace_metadata - Info about the ktrace log + * + * @magic: Always 'ktra', helps find the log in memory dumps + * @trace_address: The memory address of the ktrace log + * @trace_start: Start of the ktrace ringbuffer + * @trace_end: End of the ktrace ringbuffer + * @version_major: Ktrace major version. + * @version_minor: Ktrace minor version. 
+ * @_reserved: Bytes reserved for future use + **/ +struct pixel_ktrace_metadata { + char magic[4]; + uint64_t trace_address; + uint32_t trace_start; + uint32_t trace_end; + uint8_t version_major; + uint8_t version_minor; + char _reserved[28]; +} __attribute__((packed)); +_Static_assert(sizeof(struct pixel_ktrace_metadata) == 50, + "Incorrect pixel_ktrace_metadata size"); + +struct pixel_ktrace { + struct pixel_ktrace_metadata meta; +#if KBASE_KTRACE_TARGET_RBUF + struct kbase_ktrace_msg trace_log[KBASE_KTRACE_SIZE]; +#endif +}; +static void get_ktrace(struct kbase_device *kbdev, + struct sscd_segment *seg) +{ + struct pixel_ktrace *ktrace = seg->addr; +#if KBASE_KTRACE_TARGET_RBUF + unsigned long flags; + u32 entries_copied = 0; +#endif + + if (seg->addr == NULL) + return; + + ktrace->meta = (struct pixel_ktrace_metadata) { .magic = "ktra" }; +#if KBASE_KTRACE_TARGET_RBUF + lockdep_assert_held(&kbdev->hwaccess_lock); + spin_lock_irqsave(&kbdev->ktrace.lock, flags); + ktrace->meta.trace_address = (uint64_t)kbdev->ktrace.rbuf; + ktrace->meta.trace_start = kbdev->ktrace.first_out; + ktrace->meta.trace_end = kbdev->ktrace.next_in; + ktrace->meta.version_major = KBASE_KTRACE_VERSION_MAJOR; + ktrace->meta.version_minor = KBASE_KTRACE_VERSION_MINOR; + + entries_copied = kbasep_ktrace_copy(kbdev, seg->addr, KBASE_KTRACE_SIZE); + if (entries_copied != KBASE_KTRACE_SIZE) + dev_warn(kbdev->dev, "only copied %i of %i ktrace entries", + entries_copied, KBASE_KTRACE_SIZE); + spin_unlock_irqrestore(&kbdev->ktrace.lock, flags); + + KBASE_KTRACE_RBUF_DUMP(kbdev); +#else + dev_warn(kbdev->dev, "ktrace information not present"); +#endif +} + +#if MALI_USE_CSF +/** + * enum pixel_context_state - a coarse platform independent state for a context. + * + * @PIXEL_CONTEXT_ACTIVE: The context is running (in some capacity) on GPU. + * @PIXEL_CONTEXT_RUNNABLE: The context is runnable, but not running on GPU. + * @PIXEL_CONTEXT_INACTIVE: The context is not acive. + */ +enum pixel_context_state { + PIXEL_CONTEXT_ACTIVE = 0, + PIXEL_CONTEXT_RUNNABLE, + PIXEL_CONTEXT_INACTIVE +}; + +/** + * struct pixel_context_metadata - metadata for context information. + * + * @magic: always "c@tx" + * @version: version marker. + * @platform: unique id for platform reporting context. + * @_reserved: reserved. + */ +struct pixel_context_metadata { + char magic[4]; + u8 version; + u32 platform; + char _reserved[27]; +} __attribute__((packed)); +_Static_assert(sizeof(struct pixel_context_metadata) == 36, + "Incorrect pixel_context_metadata size"); + +/** + * struct pixel_context_snapshot_entry - platform independent context record for + * crash reports. + * @id: The context id. + * @pid: The PID that owns this context. + * @tgid: The TGID that owns this context. + * @context_state: The coarse state for a context. + * @priority: The priority of this context. + * @gpu_slot: The handle that the context may have representing the + * resource granted to run on the GPU. + * @platform_state: The platform-dependendant state, if any. + * @time_in_state: The amount of time in ms that this context has been + * in @platform_state. + */ +struct pixel_context_snapshot_entry { + u32 id; + u32 pid; + u32 tgid; + u8 context_state; + u32 priority; + u32 gpu_slot; + u32 platform_state; + u64 time_in_state; +} __attribute__((packed)); +_Static_assert(sizeof(struct pixel_context_snapshot_entry) == 33, + "Incorrect pixel_context_metadata size"); + +/** + * struct pixel_context_snapshot - list of platform independent context info. 
+ * + * List of contexts of interest during SSCD generation time. + * + * @meta: The metadata for the segment. + * @num_contexts: The number of contexts in the list. + * @contexts: The context information. + */ +struct pixel_context_snapshot { + struct pixel_context_metadata meta; + u32 num_contexts; + struct pixel_context_snapshot_entry contexts[]; +} __attribute__((packed)); + +static int pixel_context_snapshot_init(struct kbase_device *kbdev, + struct sscd_segment* segment, + size_t num_entries) { + segment->size = sizeof(struct pixel_context_snapshot) + + num_entries * sizeof(struct pixel_context_snapshot_entry); + segment->addr = kzalloc(segment->size, GFP_KERNEL); + if (segment->addr == NULL) { + segment->size = 0; + dev_err(kbdev->dev, + "pixel: failed to allocate context snapshot buffer"); + return -ENOMEM; + } + return 0; +} + +static void pixel_context_snapshot_term(struct sscd_segment* segment) { + if (segment && segment->addr) { + kfree(segment->addr); + segment->size = 0; + segment->addr = NULL; + } +} + +/* get_and_init_contexts - fill the CONTEXT segment + * + * If this function returns 0, the caller is responsible for freeing segment->addr. + * + * @kbdev: kbase_device + * @segment: the CONTEXT segment for report + * + * Return: 0 on success. + */ +static int get_and_init_contexts(struct kbase_device *kbdev, + struct sscd_segment *segment) +{ + u32 csg_nr; + u32 num_csg = kbdev->csf.global_iface.group_num; + struct pixel_context_snapshot *context_snapshot; + struct kbase_csf_scheduler *const scheduler = &kbdev->csf.scheduler; + size_t num_entries; + size_t entry_idx; + int rc; + + if (!rt_mutex_trylock(&kbdev->csf.scheduler.lock)) { + dev_warn(kbdev->dev, "could not lock scheduler during dump."); + return -EBUSY; + } + + num_entries = bitmap_weight(scheduler->csg_inuse_bitmap, num_csg); + rc = pixel_context_snapshot_init(kbdev, segment, num_entries); + if (rc) { + rt_mutex_unlock(&kbdev->csf.scheduler.lock); + return rc; + } + context_snapshot = segment->addr; + context_snapshot->num_contexts = num_entries; + + context_snapshot->meta = (struct pixel_context_metadata) { + .magic = "c@tx", + .platform = kbdev->gpu_props.props.raw_props.gpu_id, + .version = 1, + }; + + entry_idx = 0; + for_each_set_bit(csg_nr, scheduler->csg_inuse_bitmap, num_csg) { + struct kbase_csf_csg_slot *slot = + &kbdev->csf.scheduler.csg_slots[csg_nr]; + struct pixel_context_snapshot_entry *entry = + &context_snapshot->contexts[entry_idx++]; + entry->context_state = PIXEL_CONTEXT_ACTIVE; + entry->gpu_slot = csg_nr; + entry->platform_state = atomic_read(&slot->state); + entry->priority = slot->priority; + entry->time_in_state = (jiffies - slot->trigger_jiffies) / HZ; + if (slot->resident_group) { + entry->id = slot->resident_group->handle; + entry->pid = slot->resident_group->kctx->pid; + entry->tgid = slot->resident_group->kctx->tgid; + } + } + + rt_mutex_unlock(&kbdev->csf.scheduler.lock); + return 0; +} +#endif + +struct pixel_fw_core_dump { + char magic[4]; + u32 reserved; + char git_sha[BUILD_INFO_GIT_SHA_LEN]; + char core_dump[]; +}; + +static void get_and_init_fw_core_dump(struct kbase_device *kbdev, struct sscd_segment *seg) +{ + const size_t core_dump_size = get_fw_core_dump_size(kbdev); + + int i; + struct pixel_fw_core_dump *fw_core_dump; + struct kbase_csf_firmware_interface *interface; + struct page *page; + u32 *p; + size_t size; + size_t write_size; + + if (core_dump_size == -1) + { + dev_err(kbdev->dev, "pixel: failed to get firmware core dump size"); + } + + seg->size = sizeof(struct 
pixel_fw_core_dump) + core_dump_size; + seg->addr = kzalloc(seg->size, GFP_KERNEL); + + if (seg->addr == NULL) { + seg->size = 0; + dev_err(kbdev->dev, "pixel: failed to allocate for firmware core dump buffer"); + return; + } + + fw_core_dump = (struct pixel_fw_core_dump *) seg->addr; + + strncpy(fw_core_dump->magic, "fwcd", 4); + memcpy(fw_core_dump->git_sha, fw_git_sha, BUILD_INFO_GIT_SHA_LEN); + + // Dumping ELF header + { + struct fw_core_dump_data private = {.kbdev = kbdev}; + struct seq_file m = {.private = &private, .buf = fw_core_dump->core_dump, .size = core_dump_size}; + fw_core_dump_write_elf_header(&m); + size = m.count; + if (unlikely(m.count >= m.size)) + dev_warn(kbdev->dev, "firmware core dump header may be larger than buffer size"); + } + + // Dumping pages + list_for_each_entry(interface, &kbdev->csf.firmware_interfaces, node) { + /* Skip memory sections that cannot be read or are protected. */ + if ((interface->flags & CSF_FIRMWARE_ENTRY_PROTECTED) || + (interface->flags & CSF_FIRMWARE_ENTRY_READ) == 0) + continue; + + for(i = 0; i < interface->num_pages; i++) + { + page = as_page(interface->phys[i]); + write_size = size < core_dump_size ? min(core_dump_size - size, (size_t) FW_PAGE_SIZE) : 0; + if (write_size) + { + p = kmap_atomic(page); + memcpy(fw_core_dump->core_dump + size, p, write_size); + kunmap_atomic(p); + } + size += FW_PAGE_SIZE; + + if (size < FW_PAGE_SIZE) + break; + } + } + + if (unlikely(size != core_dump_size)) + { + dev_err(kbdev->dev, "firmware core dump size and buffer size are different"); + kfree(seg->addr); + seg->addr = NULL; + seg->size = 0; + } + + return; +} +/* + * Stub pending FW support + */ +static void get_fw_private_memory(struct kbase_device *kbdev, struct sscd_segment *seg) +{ + (void)kbdev; + (void)seg; +} +/* + * Stub pending FW support + */ +static void get_fw_shared_memory(struct kbase_device *kbdev, struct sscd_segment *seg) +{ + (void)kbdev; + (void)seg; +} +/* + * Stub pending FW support + */ +static void get_fw_registers(struct kbase_device *kbdev, struct sscd_segment *seg) +{ + (void)kbdev; + (void)seg; +} + +/* + * Stub pending FW support + */ +static void get_gpu_registers(struct kbase_device *kbdev, struct sscd_segment *seg) +{ + (void)kbdev; + (void)seg; +} +/* + * Stub pending FW support + */ +static void flush_caches(struct kbase_device *kbdev) +{ + (void)kbdev; +} +/* + * Stub pending FW support + */ +static void suspend_mcu(struct kbase_device *kbdev) +{ + (void)kbdev; +} + +static void get_rail_state_log(struct kbase_device *kbdev, struct sscd_segment *seg) +{ + lockdep_assert_held(&((struct pixel_context*)kbdev->platform_context)->pm.lock); + + seg->addr = gpu_pm_get_rail_state_log(kbdev); + seg->size = gpu_pm_get_rail_state_log_size(kbdev); +} + +static void get_pdc_state(struct kbase_device *kbdev, struct pixel_gpu_pdc_status *pdc_status, + struct sscd_segment *seg) +{ + lockdep_assert_held(&kbdev->hwaccess_lock); + + if (pdc_status == NULL) { + dev_err(kbdev->dev, "pixel: failed to read PDC status, no storage"); + return; + } + gpu_debug_read_pdc_status(kbdev, pdc_status); + seg->addr = pdc_status; + seg->size = sizeof(*pdc_status); +} + +static int segments_init(struct kbase_device *kbdev, struct sscd_segment* segments) +{ + /* Zero init everything for safety */ + memset(segments, 0, sizeof(struct sscd_segment) * NUM_SEGMENTS); + + segments[PM_EVENT_LOG].size = kbase_pm_max_event_log_size(kbdev); + segments[PM_EVENT_LOG].addr = kzalloc(segments[PM_EVENT_LOG].size, GFP_KERNEL); + + if 
(!segments[PM_EVENT_LOG].addr) { + segments[PM_EVENT_LOG].size = 0; + dev_err(kbdev->dev, "pixel: failed to allocate for PM event log"); + return -ENOMEM; + } + + segments[FW_TRACE].size = sizeof(struct pixel_fw_trace); + segments[FW_TRACE].addr = kzalloc(sizeof(struct pixel_fw_trace), GFP_KERNEL); + + if (segments[FW_TRACE].addr == NULL) { + segments[FW_TRACE].size = 0; + dev_err(kbdev->dev, "pixel: failed to allocate for firmware trace description"); + return -ENOMEM; + } + + segments[KTRACE].size = sizeof(struct pixel_ktrace); + segments[KTRACE].addr = kzalloc(sizeof(struct pixel_ktrace), GFP_KERNEL); + if (segments[KTRACE].addr == NULL) { + segments[KTRACE].size = 0; + dev_err(kbdev->dev, "pixel: failed to allocate for ktrace buffer"); + return -ENOMEM; + } + + return 0; +} + +static void segments_term(struct kbase_device *kbdev, struct sscd_segment* segments) +{ + (void)kbdev; + + kfree(segments[FW_TRACE].addr); + kfree(segments[PM_EVENT_LOG].addr); + kfree(segments[KTRACE].addr); +#if MALI_USE_CSF + pixel_context_snapshot_term(segments); +#endif + /* Null out the pointers */ + memset(segments, 0, sizeof(struct sscd_segment) * NUM_SEGMENTS); +} + +#define GPU_HANG_SSCD_TIMEOUT_MS (300000) /* 300s */ + +/** + * gpu_sscd_dump() - Initiates and reports a subsystem core-dump of the GPU. + * + * @kbdev: The &struct kbase_device for the GPU. + * @reason: A null terminated string containing a dump reason + * + * Context: Process context. + */ +void gpu_sscd_dump(struct kbase_device *kbdev, const char* reason) +{ + struct sscd_segment segs[NUM_SEGMENTS]; + struct sscd_platform_data *pdata = dev_get_platdata(&sscd_dev.dev); + struct pixel_context *pc = kbdev->platform_context; + int ec = 0; + unsigned long flags, current_ts = jiffies; + struct pixel_gpu_pdc_status pdc_status; + static unsigned long last_hang_sscd_ts; +#if MALI_USE_CSF + int fwcd_err; +#endif + + if (!strcmp(reason, "GPU hang")) { + /* GPU hang - avoid multiple coredumps for the same hang until + * GPU_HANG_SSCD_TIMEOUT_MS passes and GPU reset shows no failure. 
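+ * For example (illustrative reading of the check below): the first hang always produces a dump; a later hang is dumped again only once GPU_HANG_SSCD_TIMEOUT_MS (300 s) has elapsed since the previous dump and kbase_reset_gpu_failed() reports no failed reset, otherwise it is skipped.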
+ */ + if (!last_hang_sscd_ts || (time_after(current_ts, + last_hang_sscd_ts + msecs_to_jiffies(GPU_HANG_SSCD_TIMEOUT_MS)) && + !kbase_reset_gpu_failed(kbdev))) { + last_hang_sscd_ts = current_ts; + } else { + dev_info(kbdev->dev, "pixel: skipping mali subsystem core dump"); + return; + } + } + + dev_info(kbdev->dev, "pixel: mali subsystem core dump in progress"); + /* No point in proceeding if we can't report the dumped data */ + if (!pdata->sscd_report) { + dev_warn(kbdev->dev, "pixel: failed to report core dump, sscd_report was NULL"); + return; + } + +#if MALI_USE_CSF + fwcd_err = fw_core_dump_create(kbdev); + if (fwcd_err) + dev_err(kbdev->dev, "pixel: failed to create firmware core dump (%d)", fwcd_err); +#endif + + ec = segments_init(kbdev, segs); + if (ec != 0) { + dev_err(kbdev->dev, + "pixel: failed to init core dump segments (%d), partial dump in progress", ec); + } + + /* We don't want anything messing with the HW while we dump */ + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); + + /* Read the FW view of GPU PDC state, we get this early */ + get_pdc_state(kbdev, &pdc_status, &segs[PDC_STATUS]); + + /* Suspend the MCU to prevent it from overwriting the data we want to dump */ + suspend_mcu(kbdev); + + /* Flush the cache so our memory page reads contain up to date values */ + flush_caches(kbdev); + + /* Read out the updated FW private memory pages */ + get_fw_private_memory(kbdev, &segs[PRIVATE_MEM]); + + /* Read out the updated memory shared between host and firmware */ + get_fw_shared_memory(kbdev, &segs[SHARED_MEM]); + + get_fw_registers(kbdev, &segs[MCU_REGISTERS]); + get_gpu_registers(kbdev, &segs[GPU_REGISTERS]); + + get_fw_trace(kbdev, &segs[FW_TRACE]); + + get_pm_event_log(kbdev, &segs[PM_EVENT_LOG]); + + get_ktrace(kbdev, &segs[KTRACE]); + +#if MALI_USE_CSF + ec = get_and_init_contexts(kbdev, &segs[CONTEXTS]); + if (ec) { + dev_err(kbdev->dev, + "could not collect active contexts: rc: %i", ec); + } + + if (!fwcd_err) + get_and_init_fw_core_dump(kbdev, &segs[FW_CORE_DUMP]); +#endif + + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + + /* Acquire the pm lock to prevent modifications to the rail state log */ + mutex_lock(&pc->pm.lock); + + get_rail_state_log(kbdev, &segs[POWER_RAIL_LOG]); + + /* Report the core dump and generate an ELF header for it */ + pdata->sscd_report(&sscd_dev, segs, NUM_SEGMENTS, SSCD_FLAGS_ELFARM64HDR, reason); + + /* Must be held until the dump completes, as the log is referenced rather than copied */ + mutex_unlock(&pc->pm.lock); + + segments_term(kbdev, segs); +} + +/** + * gpu_sscd_fw_log_init() - Sets the FW log verbosity. + * + * @kbdev: The &struct kbase_device for the GPU. + * @level: The log verbosity. + * + * Context: Process context. + * + * Return: On success returns 0, otherwise returns an error code. + */ +int gpu_sscd_fw_log_init(struct kbase_device *kbdev, u32 level) +{ + u32 addr; + int ec = kbase_csf_firmware_cfg_find_config_address(kbdev, "Log verbosity", &addr); + + if (!ec) { + /* Update the FW log verbosity in FW memory */ + kbase_csf_update_firmware_memory(kbdev, addr, level); + } + + return ec; +} + +/** + * gpu_sscd_init() - Registers the SSCD platform device. + * + * @kbdev: The &struct kbase_device for the GPU. + * + * Context: Process context. + * + * Return: On success returns 0, otherwise returns an error code. + */ +int gpu_sscd_init(struct kbase_device *kbdev) +{ + sscd_dev = sscd_dev_init; + return platform_device_register(&sscd_dev); +} + +/** + * gpu_sscd_term() - Unregisters the SSCD platform device.
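+ * + * Counterpart to gpu_sscd_init(); releases the SSCD platform device that gpu_sscd_init() registered.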
+ * + * @kbdev: The &struct kbase_device for the GPU. + * + * Context: Process context. + */ +void gpu_sscd_term(struct kbase_device *kbdev) +{ + platform_device_unregister(&sscd_dev); +} diff --git a/mali_kbase/platform/pixel/pixel_gpu_sscd.h b/mali_kbase/platform/pixel/pixel_gpu_sscd.h new file mode 100644 index 0000000..68f7a0b --- /dev/null +++ b/mali_kbase/platform/pixel/pixel_gpu_sscd.h @@ -0,0 +1,37 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright 2021 Google LLC. + * + * Author: Jack Diver <diverj@google.com> + */ + +#ifndef _PIXEL_GPU_SSCD_H_ +#define _PIXEL_GPU_SSCD_H_ + +#include <mali_kbase.h> + +#ifdef CONFIG_MALI_PIXEL_GPU_SSCD +int gpu_sscd_fw_log_init(struct kbase_device *kbdev, u32 level); + +int gpu_sscd_init(struct kbase_device *kbdev); + +void gpu_sscd_term(struct kbase_device *kbdev); + +void gpu_sscd_dump(struct kbase_device *kbdev, const char* reason); +#else +static int __maybe_unused gpu_sscd_fw_log_init(struct kbase_device *kbdev, u32 level) +{ + return (void)kbdev, (void)level, 0; +} + +static int __maybe_unused gpu_sscd_init(struct kbase_device *kbdev) { return (void)kbdev, 0; } + +static void __maybe_unused gpu_sscd_term(struct kbase_device *kbdev) { (void)kbdev; } + +static void __maybe_unused gpu_sscd_dump(struct kbase_device *kbdev, const char* reason) +{ + (void)kbdev, (void)reason; +} +#endif /* CONFIG_MALI_PIXEL_GPU_SSCD */ + +#endif /* _PIXEL_GPU_SSCD_H_ */ diff --git a/mali_kbase/platform/pixel/pixel_gpu_sysfs.c b/mali_kbase/platform/pixel/pixel_gpu_sysfs.c index e856039..f6164f9 100644 --- a/mali_kbase/platform/pixel/pixel_gpu_sysfs.c +++ b/mali_kbase/platform/pixel/pixel_gpu_sysfs.c @@ -7,11 +7,13 @@ /* Mali core includes */ #include <mali_kbase.h> +#include <trace/events/power.h> /* Pixel integration includes */ #include "mali_kbase_config_platform.h" #include "pixel_gpu_control.h" #include "pixel_gpu_dvfs.h" +#include "pixel_gpu_sscd.h" static const char *gpu_dvfs_level_lock_names[GPU_DVFS_LEVEL_LOCK_COUNT] = { "devicetree", @@ -315,12 +317,25 @@ static ssize_t uid_time_in_state_h_show(struct device *dev, struct device_attrib return ret; } +static ssize_t trigger_core_dump_store(struct device *dev, struct device_attribute *attr, + const char *buf, size_t count) +{ + struct kbase_device *kbdev = dev->driver_data; + + (void)attr, (void)buf; + + gpu_sscd_dump(kbdev, "Manual core dump"); + + return count; +} + DEVICE_ATTR_RO(utilization); DEVICE_ATTR_RO(clock_info); DEVICE_ATTR_RO(dvfs_table); DEVICE_ATTR_RO(power_stats); DEVICE_ATTR_RO(uid_time_in_state); DEVICE_ATTR_RO(uid_time_in_state_h); +DEVICE_ATTR_WO(trigger_core_dump); /* devfreq-like attributes */ @@ -431,6 +446,8 @@ static ssize_t hint_max_freq_store(struct device *dev, struct device_attribute * if (level < 0) return -EINVAL; + trace_clock_set_rate("gpu_hint_max", clock, raw_smp_processor_id()); + mutex_lock(&pc->dvfs.lock); gpu_dvfs_update_level_lock(kbdev, GPU_DVFS_LEVEL_LOCK_HINT, -1, level); gpu_dvfs_select_level(kbdev); @@ -475,6 +492,8 @@ static ssize_t hint_min_freq_store(struct device *dev, struct device_attribute * if (level < 0) return -EINVAL; + trace_clock_set_rate("gpu_hint_min", clock, raw_smp_processor_id()); + mutex_lock(&pc->dvfs.lock); gpu_dvfs_update_level_lock(kbdev, GPU_DVFS_LEVEL_LOCK_HINT, level, -1); gpu_dvfs_select_level(kbdev); @@ -676,6 +695,57 @@ static ssize_t governor_store(struct device *dev, struct device_attribute *attr, return ret; } +static ssize_t ifpo_show(struct device *dev, struct device_attribute *attr, char *buf) +{ +#ifdef 
CONFIG_MALI_HOST_CONTROLS_SC_RAILS + struct kbase_device *kbdev = dev->driver_data; + struct pixel_context *pc = kbdev->platform_context; + ssize_t ret = 0; + + if (!pc) + return -ENODEV; + + mutex_lock(&pc->pm.lock); + ret = scnprintf(buf, PAGE_SIZE, "%d\n", pc->pm.ifpo_enabled); + mutex_unlock(&pc->pm.lock); + return ret; +#else + return -ENOTSUPP; +#endif +} + +static ssize_t ifpo_store(struct device *dev, struct device_attribute *attr, + const char *buf, size_t count) +{ +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + int ret; + bool enabled; + struct kbase_device *kbdev = dev->driver_data; + struct pixel_context *pc = kbdev->platform_context; + if (!pc) + return -ENODEV; + + ret = strtobool(buf, &enabled); + if (ret) + return -EINVAL; + + mutex_lock(&kbdev->csf.scheduler.lock); + + if (!enabled) { + turn_on_sc_power_rails(kbdev); + } + + mutex_lock(&pc->pm.lock); + pc->pm.ifpo_enabled = enabled; + mutex_unlock(&pc->pm.lock); + mutex_unlock(&kbdev->csf.scheduler.lock); + + return count; +#else + return -ENOTSUPP; +#endif +} + /* Define devfreq-like attributes */ DEVICE_ATTR_RO(available_frequencies); @@ -691,6 +761,7 @@ DEVICE_ATTR_RO(time_in_state); DEVICE_ATTR_RO(trans_stat); DEVICE_ATTR_RO(available_governors); DEVICE_ATTR_RW(governor); +DEVICE_ATTR_RW(ifpo); /* Initialization code */ @@ -722,7 +793,9 @@ static struct { { "time_in_state", &dev_attr_time_in_state }, { "trans_stat", &dev_attr_trans_stat }, { "available_governors", &dev_attr_available_governors }, - { "governor", &dev_attr_governor } + { "governor", &dev_attr_governor }, + { "trigger_core_dump", &dev_attr_trigger_core_dump }, + { "ifpo", &dev_attr_ifpo } }; /** diff --git a/mali_kbase/platform/pixel/pixel_gpu_tmu.c b/mali_kbase/platform/pixel/pixel_gpu_tmu.c index a7b064b..dd49236 100644 --- a/mali_kbase/platform/pixel/pixel_gpu_tmu.c +++ b/mali_kbase/platform/pixel/pixel_gpu_tmu.c @@ -207,7 +207,7 @@ static int gpu_tmu_notifier(struct notifier_block *notifier, unsigned long event return NOTIFY_BAD; } dev_info(kbdev->dev, - "%s: GPU_THROTTLING event received limiting GPU clock to %d kHz\n", + "%s: Adjusting GPU clock to %d kHz for thermal constraints (this is normal)\n", __func__, pc->dvfs.table[level].clk[GPU_DVFS_CLK_SHADERS]); break; default: diff --git a/mali_kbase/platform/pixel/pixel_gpu_trace.h b/mali_kbase/platform/pixel/pixel_gpu_trace.h index 775adde..6c30f1b 100644 --- a/mali_kbase/platform/pixel/pixel_gpu_trace.h +++ b/mali_kbase/platform/pixel/pixel_gpu_trace.h @@ -22,7 +22,6 @@ #define GPU_POWER_STATE_SYMBOLIC_STRINGS \ {GPU_POWER_LEVEL_STACKS, "STACKS"}, \ - {GPU_POWER_LEVEL_COREGROUP, "COREGROUP"}, \ {GPU_POWER_LEVEL_GLOBAL, "GLOBAL"}, \ {GPU_POWER_LEVEL_OFF, "OFF"} @@ -46,6 +45,30 @@ TRACE_EVENT(gpu_power_state, ) ); +TRACE_EVENT(gpu_gov_rec_violate, + TP_PROTO(unsigned int recfreq, unsigned int retfreq, + unsigned int minlvfreq, unsigned int maxlvfreq), + TP_ARGS(recfreq, retfreq, minlvfreq, maxlvfreq), + TP_STRUCT__entry( + __field(unsigned int, recfreq) + __field(unsigned int, retfreq) + __field(unsigned int, minlvfreq) + __field(unsigned int, maxlvfreq) + ), + TP_fast_assign( + __entry->recfreq = recfreq; + __entry->retfreq = retfreq; + __entry->minlvfreq = minlvfreq; + __entry->maxlvfreq = maxlvfreq; + ), + TP_printk("rec=%u ret=%u min=%u max=%u", + __entry->recfreq, + __entry->retfreq, + __entry->minlvfreq, + __entry->maxlvfreq + ) +); + #endif /* _TRACE_PIXEL_GPU_H */ /* This part must be outside protection */ diff --git a/mali_kbase/platform/pixel/pixel_gpu_uevent.c 
b/mali_kbase/platform/pixel/pixel_gpu_uevent.c new file mode 100644 index 0000000..a1db47c --- /dev/null +++ b/mali_kbase/platform/pixel/pixel_gpu_uevent.c @@ -0,0 +1,94 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright 2023 Google LLC. + * + * Author: Varad Gautam <varadgautam@google.com> + */ + +#include <linux/spinlock.h> +#include "pixel_gpu_uevent.h" + +#define GPU_UEVENT_TIMEOUT_MS (1200000U) /* 20min */ + +static struct gpu_uevent_ctx { + unsigned long last_uevent_ts[GPU_UEVENT_TYPE_MAX]; + spinlock_t lock; +} gpu_uevent_ctx = { + .last_uevent_ts = {0}, + .lock = __SPIN_LOCK_UNLOCKED(gpu_uevent_ctx.lock) +}; + +static bool gpu_uevent_check_valid(const struct gpu_uevent *evt) +{ + switch (evt->type) { + case GPU_UEVENT_TYPE_KMD_ERROR: + switch (evt->info) { + case GPU_UEVENT_INFO_CSG_REQ_STATUS_UPDATE: + case GPU_UEVENT_INFO_CSG_SUSPEND: + case GPU_UEVENT_INFO_CSG_SLOTS_SUSPEND: + case GPU_UEVENT_INFO_CSG_GROUP_SUSPEND: + case GPU_UEVENT_INFO_CSG_EP_CFG: + case GPU_UEVENT_INFO_CSG_SLOTS_START: + case GPU_UEVENT_INFO_GROUP_TERM: + case GPU_UEVENT_INFO_QUEUE_START: + case GPU_UEVENT_INFO_QUEUE_STOP: + case GPU_UEVENT_INFO_QUEUE_STOP_ACK: + case GPU_UEVENT_INFO_CSG_SLOT_READY: + case GPU_UEVENT_INFO_L2_PM_TIMEOUT: + case GPU_UEVENT_INFO_PM_TIMEOUT: + return true; + default: + break; + } + break; + case GPU_UEVENT_TYPE_GPU_RESET: + switch (evt->info) { + case GPU_UEVENT_INFO_CSF_RESET_OK: + case GPU_UEVENT_INFO_CSF_RESET_FAILED: + return true; + default: + break; + } + break; + default: + break; + } + + return false; +} + +void pixel_gpu_uevent_send(struct kbase_device *kbdev, const struct gpu_uevent *evt) +{ + enum uevent_env_idx { + ENV_IDX_TYPE, + ENV_IDX_INFO, + ENV_IDX_NULL, + ENV_IDX_MAX + }; + char *env[ENV_IDX_MAX] = {0}; + unsigned long flags, current_ts = jiffies; + bool suppress_uevent = false; + + if (!gpu_uevent_check_valid(evt)) { + dev_err(kbdev->dev, "unrecognized uevent type=%u info=%u", evt->type, evt->info); + return; + } + + env[ENV_IDX_TYPE] = (char *) gpu_uevent_type_str(evt->type); + env[ENV_IDX_INFO] = (char *) gpu_uevent_info_str(evt->info); + env[ENV_IDX_NULL] = NULL; + + spin_lock_irqsave(&gpu_uevent_ctx.lock, flags); + + if (time_after(current_ts, gpu_uevent_ctx.last_uevent_ts[evt->type] + + msecs_to_jiffies(GPU_UEVENT_TIMEOUT_MS))) { + gpu_uevent_ctx.last_uevent_ts[evt->type] = current_ts; + } else { + suppress_uevent = true; + } + + spin_unlock_irqrestore(&gpu_uevent_ctx.lock, flags); + + if (!suppress_uevent) + kobject_uevent_env(&kbdev->dev->kobj, KOBJ_CHANGE, env); +} diff --git a/mali_kbase/platform/pixel/pixel_gpu_uevent.h b/mali_kbase/platform/pixel/pixel_gpu_uevent.h new file mode 100644 index 0000000..1fe3c50 --- /dev/null +++ b/mali_kbase/platform/pixel/pixel_gpu_uevent.h @@ -0,0 +1,74 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright 2023 Google LLC. 
+ * + * Author: Varad Gautam <varadgautam@google.com> + */ + +#ifndef _PIXEL_GPU_UEVENT_H_ +#define _PIXEL_GPU_UEVENT_H_ + +#include <mali_kbase.h> + +#define GPU_UEVENT_TYPE_LIST \ + GPU_UEVENT_TYPE(NONE) \ + GPU_UEVENT_TYPE(KMD_ERROR) \ + GPU_UEVENT_TYPE(GPU_RESET) \ + GPU_UEVENT_TYPE(MAX) + +#define GPU_UEVENT_TYPE(type) GPU_UEVENT_TYPE_##type, +enum gpu_uevent_type { + GPU_UEVENT_TYPE_LIST +}; + +#undef GPU_UEVENT_TYPE +#define GPU_UEVENT_TYPE(type) "GPU_UEVENT_TYPE="#type, +static inline const char *gpu_uevent_type_str(enum gpu_uevent_type type) { + static const char * const gpu_uevent_types[] = { + GPU_UEVENT_TYPE_LIST + }; + return gpu_uevent_types[type]; +} +#undef GPU_UEVENT_TYPE + +#define GPU_UEVENT_INFO_LIST \ + GPU_UEVENT_INFO(NONE) \ + GPU_UEVENT_INFO(CSG_REQ_STATUS_UPDATE) \ + GPU_UEVENT_INFO(CSG_SUSPEND) \ + GPU_UEVENT_INFO(CSG_SLOTS_SUSPEND) \ + GPU_UEVENT_INFO(CSG_GROUP_SUSPEND) \ + GPU_UEVENT_INFO(CSG_EP_CFG) \ + GPU_UEVENT_INFO(CSG_SLOTS_START) \ + GPU_UEVENT_INFO(GROUP_TERM) \ + GPU_UEVENT_INFO(QUEUE_START) \ + GPU_UEVENT_INFO(QUEUE_STOP) \ + GPU_UEVENT_INFO(QUEUE_STOP_ACK) \ + GPU_UEVENT_INFO(CSG_SLOT_READY) \ + GPU_UEVENT_INFO(L2_PM_TIMEOUT) \ + GPU_UEVENT_INFO(PM_TIMEOUT) \ + GPU_UEVENT_INFO(CSF_RESET_OK) \ + GPU_UEVENT_INFO(CSF_RESET_FAILED) \ + GPU_UEVENT_INFO(MAX) + +#define GPU_UEVENT_INFO(info) GPU_UEVENT_INFO_##info, +enum gpu_uevent_info { + GPU_UEVENT_INFO_LIST +}; +#undef GPU_UEVENT_INFO +#define GPU_UEVENT_INFO(info) "GPU_UEVENT_INFO="#info, +static inline const char *gpu_uevent_info_str(enum gpu_uevent_info info) { + static const char * const gpu_uevent_infos[] = { + GPU_UEVENT_INFO_LIST + }; + return gpu_uevent_infos[info]; +} +#undef GPU_UEVENT_INFO + +struct gpu_uevent { + enum gpu_uevent_type type; + enum gpu_uevent_info info; +}; + +void pixel_gpu_uevent_send(struct kbase_device *kbdev, const struct gpu_uevent *evt); + +#endif /* _PIXEL_GPU_UEVENT_H_ */ diff --git a/mali_kbase/tests/Kbuild b/mali_kbase/tests/Kbuild index ee3de7b..72ca70a 100644 --- a/mali_kbase/tests/Kbuild +++ b/mali_kbase/tests/Kbuild @@ -1,6 +1,6 @@ # SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note # -# (C) COPYRIGHT 2017, 2020-2021 ARM Limited. All rights reserved. +# (C) COPYRIGHT 2017-2023 ARM Limited. All rights reserved. # # This program is free software and is provided to you under the terms of the # GNU General Public License version 2 as published by the Free Software @@ -17,6 +17,7 @@ # http://www.gnu.org/licenses/gpl-2.0.html. # # +src:=$(if $(patsubst /%,,$(src)),$(srctree)/$(src),$(src)) ccflags-y += -I$(src)/include \ -I$(src) @@ -27,4 +28,6 @@ subdir-ccflags-y += -I$(src)/include \ obj-$(CONFIG_MALI_KUTF) += kutf/ obj-$(CONFIG_MALI_KUTF_IRQ_TEST) += mali_kutf_irq_test/ obj-$(CONFIG_MALI_KUTF_CLK_RATE_TRACE) += mali_kutf_clk_rate_trace/kernel/ +obj-$(CONFIG_MALI_KUTF_MGM_INTEGRATION) += mali_kutf_mgm_integration_test/ + diff --git a/mali_kbase/tests/Kconfig b/mali_kbase/tests/Kconfig index a86e1ce..f100901 100644 --- a/mali_kbase/tests/Kconfig +++ b/mali_kbase/tests/Kconfig @@ -1,6 +1,6 @@ # SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note # -# (C) COPYRIGHT 2017, 2020-2021 ARM Limited. All rights reserved. +# (C) COPYRIGHT 2017, 2020-2023 ARM Limited. All rights reserved. 
# # This program is free software and is provided to you under the terms of the # GNU General Public License version 2 as published by the Free Software @@ -52,6 +52,19 @@ config MALI_KUTF_CLK_RATE_TRACE Modules: - mali_kutf_clk_rate_trace_test_portal.ko +config MALI_KUTF_MGM_INTEGRATION_TEST + bool "Build Mali KUTF MGM integration test module" + depends on MALI_KUTF + default y + help + This option will build the MGM integration test module. + It can test the implementation of PTE translation for specific + group ids. + + Modules: + - mali_kutf_mgm_integration_test.ko + + comment "Enable MALI_DEBUG for KUTF modules support" depends on MALI_MIDGARD && !MALI_DEBUG && MALI_KUTF diff --git a/mali_kbase/tests/Mconfig b/mali_kbase/tests/Mconfig index 167facd..aa09274 100644 --- a/mali_kbase/tests/Mconfig +++ b/mali_kbase/tests/Mconfig @@ -1,6 +1,6 @@ # SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note # -# (C) COPYRIGHT 2018-2021 ARM Limited. All rights reserved. +# (C) COPYRIGHT 2018-2023 ARM Limited. All rights reserved. # # This program is free software and is provided to you under the terms of the # GNU General Public License version 2 as published by the Free Software @@ -26,8 +26,8 @@ menuconfig MALI_KUTF This option will build the Mali testing framework modules. Modules: - - kutf.ko - - kutf_test.ko + - kutf.ko + - kutf_test.ko config MALI_KUTF_IRQ_TEST bool "Build Mali KUTF IRQ test module" @@ -38,7 +38,7 @@ config MALI_KUTF_IRQ_TEST It can determine the latency of the Mali GPU IRQ on your system. Modules: - - mali_kutf_irq_test.ko + - mali_kutf_irq_test.ko config MALI_KUTF_CLK_RATE_TRACE bool "Build Mali KUTF Clock rate trace test module" @@ -50,12 +50,25 @@ config MALI_KUTF_CLK_RATE_TRACE basic trace test in the system. Modules: - - mali_kutf_clk_rate_trace_test_portal.ko + - mali_kutf_clk_rate_trace_test_portal.ko + +config MALI_KUTF_MGM_INTEGRATION_TEST + bool "Build Mali KUTF MGM integration test module" + depends on MALI_KUTF + default y + help + This option will build the MGM integration test module. + It can test the implementation of PTE translation for specific + group ids. + + Modules: + - mali_kutf_mgm_integration_test.ko + # Enable MALI_DEBUG for KUTF modules support config UNIT_TEST_KERNEL_MODULES - bool - default y if UNIT_TEST_CODE && BACKEND_KERNEL - default n + bool + default y if UNIT_TEST_CODE && BACKEND_KERNEL + default n diff --git a/mali_kbase/tests/build.bp b/mali_kbase/tests/build.bp index 9d6137d..5581ba9 100644 --- a/mali_kbase/tests/build.bp +++ b/mali_kbase/tests/build.bp @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2021-2023 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -25,7 +25,7 @@ bob_defaults { "include", "./../../", "./../", - "./" + "./", ], } @@ -38,3 +38,9 @@ bob_defaults { kbuild_options: ["CONFIG_UNIT_TEST_KERNEL_MODULES=y"], }, } + +bob_defaults { + name: "kernel_unit_tests", + add_to_alias: ["unit_tests"], + srcs: [".*_unit_test/"], +} diff --git a/mali_kbase/tests/include/kutf/kutf_helpers.h b/mali_kbase/tests/include/kutf/kutf_helpers.h index c4c713c..3f68efa 100644 --- a/mali_kbase/tests/include/kutf/kutf_helpers.h +++ b/mali_kbase/tests/include/kutf/kutf_helpers.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2017, 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2017, 2020-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -31,6 +31,7 @@ */ #include <kutf/kutf_suite.h> +#include <linux/device.h> /** * kutf_helper_pending_input() - Check any pending lines sent by user space @@ -81,4 +82,28 @@ int kutf_helper_input_enqueue(struct kutf_context *context, */ void kutf_helper_input_enqueue_end_of_data(struct kutf_context *context); +/** + * kutf_helper_ignore_dmesg() - Write message in dmesg to instruct parser + * to ignore errors, until the counterpart + * is written to dmesg to stop ignoring errors. + * @dev: Device pointer to write to dmesg using. + * + * This function writes "Start ignoring dmesg warnings" to dmesg, which + * the parser will read and not log any errors. Only to be used in cases where + * we expect an error to be produced in dmesg but that we do not want to be + * flagged as an error. + */ +void kutf_helper_ignore_dmesg(struct device *dev); + +/** + * kutf_helper_stop_ignoring_dmesg() - Write message in dmesg to instruct parser + * to stop ignoring errors. + * @dev: Device pointer to write to dmesg using. + * + * This function writes "Stop ignoring dmesg warnings" to dmesg, which + * the parser will read and continue to log any errors. Counterpart to + * kutf_helper_ignore_dmesg(). + */ +void kutf_helper_stop_ignoring_dmesg(struct device *dev); + #endif /* _KERNEL_UTF_HELPERS_H_ */ diff --git a/mali_kbase/tests/kutf/Kbuild b/mali_kbase/tests/kutf/Kbuild index c4790bc..3b3bc4c 100644 --- a/mali_kbase/tests/kutf/Kbuild +++ b/mali_kbase/tests/kutf/Kbuild @@ -19,9 +19,9 @@ # ifeq ($(CONFIG_MALI_KUTF),y) -obj-m += kutf.o +obj-m += mali_kutf.o -kutf-y := \ +mali_kutf-y := \ kutf_mem.o \ kutf_resultset.o \ kutf_suite.o \ diff --git a/mali_kbase/tests/kutf/kutf_helpers.c b/mali_kbase/tests/kutf/kutf_helpers.c index d207d1c..4273619 100644 --- a/mali_kbase/tests/kutf/kutf_helpers.c +++ b/mali_kbase/tests/kutf/kutf_helpers.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2017, 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2017, 2020-2022 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -127,3 +127,15 @@ void kutf_helper_input_enqueue_end_of_data(struct kutf_context *context) { kutf_helper_input_enqueue(context, NULL, 0); } + +void kutf_helper_ignore_dmesg(struct device *dev) +{ + dev_info(dev, "KUTF: Start ignoring dmesg warnings\n"); +} +EXPORT_SYMBOL(kutf_helper_ignore_dmesg); + +void kutf_helper_stop_ignoring_dmesg(struct device *dev) +{ + dev_info(dev, "KUTF: Stop ignoring dmesg warnings\n"); +} +EXPORT_SYMBOL(kutf_helper_stop_ignoring_dmesg); diff --git a/mali_kbase/tests/kutf/kutf_helpers_user.c b/mali_kbase/tests/kutf/kutf_helpers_user.c index f88e138..c4e2943 100644 --- a/mali_kbase/tests/kutf/kutf_helpers_user.c +++ b/mali_kbase/tests/kutf/kutf_helpers_user.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2017, 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2017, 2020-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -28,7 +28,7 @@ #include <linux/slab.h> #include <linux/export.h> -const char *valtype_names[] = { +static const char *const valtype_names[] = { "INVALID", "U64", "STR", diff --git a/mali_kbase/tests/kutf/kutf_suite.c b/mali_kbase/tests/kutf/kutf_suite.c index 91065b5..4468066 100644 --- a/mali_kbase/tests/kutf/kutf_suite.c +++ b/mali_kbase/tests/kutf/kutf_suite.c @@ -106,22 +106,16 @@ struct kutf_convert_table { enum kutf_result_status result; }; -struct kutf_convert_table kutf_convert[] = { -#define ADD_UTF_RESULT(_name) \ -{ \ - #_name, \ - _name, \ -}, -ADD_UTF_RESULT(KUTF_RESULT_BENCHMARK) -ADD_UTF_RESULT(KUTF_RESULT_SKIP) -ADD_UTF_RESULT(KUTF_RESULT_UNKNOWN) -ADD_UTF_RESULT(KUTF_RESULT_PASS) -ADD_UTF_RESULT(KUTF_RESULT_DEBUG) -ADD_UTF_RESULT(KUTF_RESULT_INFO) -ADD_UTF_RESULT(KUTF_RESULT_WARN) -ADD_UTF_RESULT(KUTF_RESULT_FAIL) -ADD_UTF_RESULT(KUTF_RESULT_FATAL) -ADD_UTF_RESULT(KUTF_RESULT_ABORT) +static const struct kutf_convert_table kutf_convert[] = { +#define ADD_UTF_RESULT(_name) \ + { \ +#_name, _name, \ + } + ADD_UTF_RESULT(KUTF_RESULT_BENCHMARK), ADD_UTF_RESULT(KUTF_RESULT_SKIP), + ADD_UTF_RESULT(KUTF_RESULT_UNKNOWN), ADD_UTF_RESULT(KUTF_RESULT_PASS), + ADD_UTF_RESULT(KUTF_RESULT_DEBUG), ADD_UTF_RESULT(KUTF_RESULT_INFO), + ADD_UTF_RESULT(KUTF_RESULT_WARN), ADD_UTF_RESULT(KUTF_RESULT_FAIL), + ADD_UTF_RESULT(KUTF_RESULT_FATAL), ADD_UTF_RESULT(KUTF_RESULT_ABORT), }; #define UTF_CONVERT_SIZE (ARRAY_SIZE(kutf_convert)) @@ -191,8 +185,7 @@ static void kutf_set_expected_result(struct kutf_context *context, * * Return: 1 if test result was successfully converted to string, 0 otherwise */ -static int kutf_result_to_string(char **result_str, - enum kutf_result_status result) +static int kutf_result_to_string(const char **result_str, enum kutf_result_status result) { int i; int ret = 0; @@ -382,7 +375,7 @@ static ssize_t kutf_debugfs_run_read(struct file *file, char __user *buf, struct kutf_result *res; unsigned long bytes_not_copied; ssize_t bytes_copied = 0; - char *kutf_str_ptr = NULL; + const char *kutf_str_ptr = NULL; size_t kutf_str_len = 0; size_t message_len = 0; char separator = ':'; @@ -599,11 +592,7 @@ static int create_fixture_variant(struct kutf_test_function *test_func, goto fail_file; } -#if KERNEL_VERSION(4, 7, 0) <= LINUX_VERSION_CODE tmp = debugfs_create_file_unsafe( 
-#else - tmp = debugfs_create_file( -#endif "run", 0600, test_fix->dir, test_fix, &kutf_debugfs_run_ops); diff --git a/mali_kbase/tests/kutf/kutf_utils.c b/mali_kbase/tests/kutf/kutf_utils.c index 2ae1510..21f5fad 100644 --- a/mali_kbase/tests/kutf/kutf_utils.c +++ b/mali_kbase/tests/kutf/kutf_utils.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2014, 2017, 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2014, 2017, 2020-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -31,7 +31,7 @@ static char tmp_buffer[KUTF_MAX_DSPRINTF_LEN]; -DEFINE_MUTEX(buffer_lock); +static DEFINE_MUTEX(buffer_lock); const char *kutf_dsprintf(struct kutf_mempool *pool, const char *fmt, ...) diff --git a/mali_kbase/tests/mali_kutf_clk_rate_trace/kernel/mali_kutf_clk_rate_trace_test.c b/mali_kbase/tests/mali_kutf_clk_rate_trace/kernel/mali_kutf_clk_rate_trace_test.c index 935f8ca..8b86fb0 100644 --- a/mali_kbase/tests/mali_kutf_clk_rate_trace/kernel/mali_kutf_clk_rate_trace_test.c +++ b/mali_kbase/tests/mali_kutf_clk_rate_trace/kernel/mali_kutf_clk_rate_trace_test.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2020-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2020-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -46,7 +46,7 @@ #define MINOR_FOR_FIRST_KBASE_DEV (-1) /* KUTF test application pointer for this test */ -struct kutf_application *kutf_app; +static struct kutf_application *kutf_app; enum portal_server_state { PORTAL_STATE_NO_CLK, @@ -113,7 +113,7 @@ struct kbasep_cmd_name_pair { const char *name; }; -struct kbasep_cmd_name_pair kbasep_portal_cmd_name_map[] = { +static const struct kbasep_cmd_name_pair kbasep_portal_cmd_name_map[] = { { PORTAL_CMD_GET_PLATFORM, GET_PLATFORM }, { PORTAL_CMD_GET_CLK_RATE_MGR, GET_CLK_RATE_MGR }, { PORTAL_CMD_GET_CLK_RATE_TRACE, GET_CLK_RATE_TRACE }, @@ -128,7 +128,7 @@ struct kbasep_cmd_name_pair kbasep_portal_cmd_name_map[] = { * this pointer is engaged, new requests for create fixture will fail * hence limiting the use of the portal at any time to a singleton. 
*/ -struct kutf_clk_rate_trace_fixture_data *g_ptr_portal_data; +static struct kutf_clk_rate_trace_fixture_data *g_ptr_portal_data; #define PORTAL_MSG_LEN (KUTF_MAX_LINE_LENGTH - MAX_REPLY_NAME_LEN) static char portal_msg_buf[PORTAL_MSG_LEN]; @@ -442,8 +442,9 @@ static const char *kutf_clk_trace_do_get_platform( #if defined(CONFIG_MALI_ARBITER_SUPPORT) && defined(CONFIG_OF) struct kutf_clk_rate_trace_fixture_data *data = context->fixture; - arbiter_if_node = - of_get_property(data->kbdev->dev->of_node, "arbiter_if", NULL); + arbiter_if_node = of_get_property(data->kbdev->dev->of_node, "arbiter-if", NULL); + if (!arbiter_if_node) + arbiter_if_node = of_get_property(data->kbdev->dev->of_node, "arbiter_if", NULL); #endif if (arbiter_if_node) { power_node = of_find_compatible_node(NULL, NULL, @@ -825,14 +826,14 @@ static void *mali_kutf_clk_rate_trace_create_fixture( if (!data) return NULL; - *data = (const struct kutf_clk_rate_trace_fixture_data) { 0 }; + memset(data, 0, sizeof(*data)); pr_debug("Hooking up the test portal to kbdev clk rate trace\n"); spin_lock(&kbdev->pm.clk_rtm.lock); if (g_ptr_portal_data != NULL) { pr_warn("Test portal is already in use, run aborted\n"); - kutf_test_fail(context, "Portal allows single session only"); spin_unlock(&kbdev->pm.clk_rtm.lock); + kutf_test_fail(context, "Portal allows single session only"); return NULL; } @@ -909,7 +910,7 @@ static int __init mali_kutf_clk_rate_trace_test_module_init(void) { struct kutf_suite *suite; unsigned int filters; - union kutf_callback_data suite_data = { 0 }; + union kutf_callback_data suite_data = { NULL }; pr_debug("Creating app\n"); diff --git a/mali_kbase/tests/mali_kutf_irq_test/mali_kutf_irq_test_main.c b/mali_kbase/tests/mali_kutf_irq_test/mali_kutf_irq_test_main.c index 5824a4c..f2a014d 100644 --- a/mali_kbase/tests/mali_kutf_irq_test/mali_kutf_irq_test_main.c +++ b/mali_kbase/tests/mali_kutf_irq_test/mali_kutf_irq_test_main.c @@ -40,7 +40,7 @@ */ /* KUTF test application pointer for this test */ -struct kutf_application *irq_app; +static struct kutf_application *irq_app; /** * struct kutf_irq_fixture_data - test fixture used by the test functions. @@ -51,8 +51,6 @@ struct kutf_irq_fixture_data { struct kbase_device *kbdev; }; -#define SEC_TO_NANO(s) ((s)*1000000000LL) - /* ID for the GPU IRQ */ #define GPU_IRQ_HANDLER 2 @@ -212,6 +210,11 @@ static void mali_kutf_irq_latency(struct kutf_context *context) average_time += irq_time - start_time; udelay(10); + /* Sleep for a ms, every 10000 iterations, to avoid misleading warning + * of CPU softlockup when all GPU IRQs keep going to the same CPU. + */ + if (!(i % 10000)) + msleep(1); } /* Go back to default handler */ diff --git a/mali_kbase/arbitration/ptm/Kconfig b/mali_kbase/tests/mali_kutf_mgm_integration_test/Kbuild index 074ebd5..e9bff98 100644 --- a/mali_kbase/arbitration/ptm/Kconfig +++ b/mali_kbase/tests/mali_kutf_mgm_integration_test/Kbuild @@ -1,6 +1,6 @@ -# SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note OR MIT +# SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note # -# (C) COPYRIGHT 2021 ARM Limited. All rights reserved. +# (C) COPYRIGHT 2022 ARM Limited. All rights reserved. 
# # This program is free software and is provided to you under the terms of the # GNU General Public License version 2 as published by the Free Software @@ -18,11 +18,8 @@ # # -config MALI_PARTITION_MANAGER - tristate "Enable compilation of partition manager modules" - depends on MALI_ARBITRATION - default n - help - This option enables the compilation of the partition manager - modules used to configure the Mali-G78AE GPU. +ifeq ($(CONFIG_MALI_KUTF_MGM_INTEGRATION_TEST),y) +obj-m += mali_kutf_mgm_integration_test.o +mali_kutf_mgm_integration_test-y := mali_kutf_mgm_integration_test_main.o +endif diff --git a/mali_kbase/tests/mali_kutf_mgm_integration_test/build.bp b/mali_kbase/tests/mali_kutf_mgm_integration_test/build.bp new file mode 100644 index 0000000..8b995f8 --- /dev/null +++ b/mali_kbase/tests/mali_kutf_mgm_integration_test/build.bp @@ -0,0 +1,41 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +/* + * + * (C) COPYRIGHT 2022 ARM Limited. All rights reserved. + * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. + * + */ +bob_kernel_module { + name: "mali_kutf_mgm_integration_test", + defaults: [ + "mali_kbase_shared_config_defaults", + "kernel_test_configs", + "kernel_test_includes", + ], + srcs: [ + "Kbuild", + "mali_kutf_mgm_integration_test_main.c", + ], + extra_symbols: [ + "mali_kbase", + "kutf", + ], + enabled: false, + mali_kutf_mgm_integration_test: { + kbuild_options: ["CONFIG_MALI_KUTF_MGM_INTEGRATION_TEST=y"], + enabled: true, + }, +} diff --git a/mali_kbase/tests/mali_kutf_mgm_integration_test/mali_kutf_mgm_integration_test_main.c b/mali_kbase/tests/mali_kutf_mgm_integration_test/mali_kutf_mgm_integration_test_main.c new file mode 100644 index 0000000..5a42bd6 --- /dev/null +++ b/mali_kbase/tests/mali_kutf_mgm_integration_test/mali_kutf_mgm_integration_test_main.c @@ -0,0 +1,210 @@ +// SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note +/* + * + * (C) COPYRIGHT 2022 ARM Limited. All rights reserved. + * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. 
+ * + */ +#include <linux/module.h> +#include "mali_kbase.h" +#include <kutf/kutf_suite.h> +#include <kutf/kutf_utils.h> +#include <kutf/kutf_helpers.h> +#include <kutf/kutf_helpers_user.h> + +#define MINOR_FOR_FIRST_KBASE_DEV (-1) + +#define BASE_MEM_GROUP_COUNT (16) +#define PA_MAX ((1ULL << 48) - 1) +#define PA_START_BIT 12 +#define ENTRY_ACCESS_BIT (1ULL << 10) + +#define ENTRY_IS_ATE_L3 3ULL +#define ENTRY_IS_ATE_L02 1ULL + +#define MGM_INTEGRATION_SUITE_NAME "mgm_integration" +#define MGM_INTEGRATION_PTE_TRANSLATION "pte_translation" + +static char msg_buf[KUTF_MAX_LINE_LENGTH]; + +/* KUTF test application pointer for this test */ +struct kutf_application *mgm_app; + +/** + * struct kutf_mgm_fixture_data - test fixture used by test functions + * @kbdev: kbase device for the GPU. + * @group_id: Memory group ID to test based on fixture index. + */ +struct kutf_mgm_fixture_data { + struct kbase_device *kbdev; + int group_id; +}; + +/** + * mali_kutf_mgm_pte_translation_test() - Tests forward and reverse translation + * of PTE by the MGM module + * @context: KUTF context within which to perform the test. + * + * This test creates PTEs with physical addresses in the range + * 0x0000-0xFFFFFFFFF000 and tests that mgm_update_gpu_pte() returns a different + * PTE and mgm_pte_to_original_pte() returns the original PTE. This is tested + * at MMU level 2 and 3 as mgm_update_gpu_pte() is called for ATEs only. + * + * This test is run for a specific group_id depending on the fixture_id. + */ +static void mali_kutf_mgm_pte_translation_test(struct kutf_context *context) +{ + struct kutf_mgm_fixture_data *data = context->fixture; + struct kbase_device *kbdev = data->kbdev; + struct memory_group_manager_device *mgm_dev = kbdev->mgm_dev; + u64 addr; + + for (addr = 1 << (PA_START_BIT - 1); addr <= PA_MAX; addr <<= 1) { + /* Mask 1 << 11 by ~0xFFF to get 0x0000 at first iteration */ + phys_addr_t pa = addr; + u8 mmu_level; + + /* Test MMU level 3 and 2 (2MB pages) only */ + for (mmu_level = MIDGARD_MMU_LEVEL(2); mmu_level <= MIDGARD_MMU_LEVEL(3); + mmu_level++) { + u64 translated_pte; + u64 returned_pte; + u64 original_pte; + + if (mmu_level == MIDGARD_MMU_LEVEL(3)) + original_pte = + (pa & PAGE_MASK) | ENTRY_ACCESS_BIT | ENTRY_IS_ATE_L3; + else + original_pte = + (pa & PAGE_MASK) | ENTRY_ACCESS_BIT | ENTRY_IS_ATE_L02; + + dev_dbg(kbdev->dev, "Testing group_id=%u, mmu_level=%u, pte=0x%llx\n", + data->group_id, mmu_level, original_pte); + + translated_pte = mgm_dev->ops.mgm_update_gpu_pte(mgm_dev, data->group_id, + mmu_level, original_pte); + if (translated_pte == original_pte) { + snprintf( + msg_buf, sizeof(msg_buf), + "PTE unchanged. translated_pte (0x%llx) == original_pte (0x%llx) for mmu_level=%u, group_id=%d", + translated_pte, original_pte, mmu_level, data->group_id); + kutf_test_fail(context, msg_buf); + return; + } + + returned_pte = mgm_dev->ops.mgm_pte_to_original_pte( + mgm_dev, data->group_id, mmu_level, translated_pte); + dev_dbg(kbdev->dev, "\treturned_pte=%llx\n", returned_pte); + + if (returned_pte != original_pte) { + snprintf( + msg_buf, sizeof(msg_buf), + "Original PTE not returned. 
returned_pte (0x%llx) != original_pte (0x%llx) for mmu_level=%u, group_id=%d", + returned_pte, original_pte, mmu_level, data->group_id); + kutf_test_fail(context, msg_buf); + return; + } + } + } + snprintf(msg_buf, sizeof(msg_buf), "Translation passed for group_id=%d", data->group_id); + kutf_test_pass(context, msg_buf); +} + +/** + * mali_kutf_mgm_integration_create_fixture() - Creates the fixture data + * required for all tests in the mgm integration suite. + * @context: KUTF context. + * + * Return: Fixture data created on success or NULL on failure + */ +static void *mali_kutf_mgm_integration_create_fixture(struct kutf_context *context) +{ + struct kutf_mgm_fixture_data *data; + struct kbase_device *kbdev; + + pr_debug("Finding kbase device\n"); + kbdev = kbase_find_device(MINOR_FOR_FIRST_KBASE_DEV); + if (kbdev == NULL) { + kutf_test_fail(context, "Failed to find kbase device"); + return NULL; + } + pr_debug("Creating fixture\n"); + + data = kutf_mempool_alloc(&context->fixture_pool, sizeof(struct kutf_mgm_fixture_data)); + if (!data) + return NULL; + data->kbdev = kbdev; + data->group_id = context->fixture_index; + + pr_debug("Fixture created\n"); + return data; +} + +/** + * mali_kutf_mgm_integration_remove_fixture() - Destroy fixture data previously + * created by mali_kutf_mgm_integration_create_fixture. + * @context: KUTF context. + */ +static void mali_kutf_mgm_integration_remove_fixture(struct kutf_context *context) +{ + struct kutf_mgm_fixture_data *data = context->fixture; + struct kbase_device *kbdev = data->kbdev; + + kbase_release_device(kbdev); +} + +/** + * mali_kutf_mgm_integration_test_main_init() - Module entry point for this test. + * + * Return: 0 on success, error code on failure. + */ +static int __init mali_kutf_mgm_integration_test_main_init(void) +{ + struct kutf_suite *suite; + + mgm_app = kutf_create_application("mgm"); + + if (mgm_app == NULL) { + pr_warn("Creation of mgm KUTF app failed!\n"); + return -ENOMEM; + } + suite = kutf_create_suite(mgm_app, MGM_INTEGRATION_SUITE_NAME, BASE_MEM_GROUP_COUNT, + mali_kutf_mgm_integration_create_fixture, + mali_kutf_mgm_integration_remove_fixture); + if (suite == NULL) { + pr_warn("Creation of %s suite failed!\n", MGM_INTEGRATION_SUITE_NAME); + kutf_destroy_application(mgm_app); + return -ENOMEM; + } + kutf_add_test(suite, 0x0, MGM_INTEGRATION_PTE_TRANSLATION, + mali_kutf_mgm_pte_translation_test); + return 0; +} + +/** + * mali_kutf_mgm_integration_test_main_exit() - Module exit point for this test. + */ +static void __exit mali_kutf_mgm_integration_test_main_exit(void) +{ + kutf_destroy_application(mgm_app); +} + +module_init(mali_kutf_mgm_integration_test_main_init); +module_exit(mali_kutf_mgm_integration_test_main_exit); + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("ARM Ltd."); +MODULE_VERSION("1.0"); diff --git a/mali_kbase/thirdparty/mali_kbase_mmap.c b/mali_kbase/thirdparty/mali_kbase_mmap.c index 1e636b9..20f7496 100644 --- a/mali_kbase/thirdparty/mali_kbase_mmap.c +++ b/mali_kbase/thirdparty/mali_kbase_mmap.c @@ -303,8 +303,7 @@ unsigned long kbase_context_get_unmapped_area(struct kbase_context *const kctx, * is no free region at the address found originally by too large a * same_va_end_addr here, and will fail the allocation gracefully. 
*/ - struct kbase_reg_zone *zone = - kbase_ctx_reg_zone_get_nolock(kctx, KBASE_REG_ZONE_SAME_VA); + struct kbase_reg_zone *zone = kbase_ctx_reg_zone_get_nolock(kctx, SAME_VA_ZONE); u64 same_va_end_addr = kbase_reg_zone_end_pfn(zone) << PAGE_SHIFT; #if (KERNEL_VERSION(6, 1, 0) <= LINUX_VERSION_CODE) const unsigned long mmap_end = arch_get_mmap_end(addr, len, flags); @@ -386,7 +385,7 @@ unsigned long kbase_context_get_unmapped_area(struct kbase_context *const kctx, #ifndef CONFIG_64BIT } else { return current->mm->get_unmapped_area( - kctx->filp, addr, len, pgoff, flags); + kctx->kfile->filp, addr, len, pgoff, flags); #endif } diff --git a/mali_kbase/tl/Kbuild b/mali_kbase/tl/Kbuild index 4344850..1ecf3e4 100644 --- a/mali_kbase/tl/Kbuild +++ b/mali_kbase/tl/Kbuild @@ -1,6 +1,6 @@ # SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note # -# (C) COPYRIGHT 2021 ARM Limited. All rights reserved. +# (C) COPYRIGHT 2022 ARM Limited. All rights reserved. # # This program is free software and is provided to you under the terms of the # GNU General Public License version 2 as published by the Free Software diff --git a/mali_kbase/tl/backend/mali_kbase_timeline_csf.c b/mali_kbase/tl/backend/mali_kbase_timeline_csf.c index a6062f1..e96e05b 100644 --- a/mali_kbase/tl/backend/mali_kbase_timeline_csf.c +++ b/mali_kbase/tl/backend/mali_kbase_timeline_csf.c @@ -84,7 +84,7 @@ void kbase_create_timeline_objects(struct kbase_device *kbdev) * stream tracepoints are emitted to ensure we don't change the * scheduler until after then */ - mutex_lock(&kbdev->csf.scheduler.lock); + rt_mutex_lock(&kbdev->csf.scheduler.lock); for (slot_i = 0; slot_i < kbdev->csf.global_iface.group_num; slot_i++) { @@ -105,7 +105,7 @@ void kbase_create_timeline_objects(struct kbase_device *kbdev) */ kbase_timeline_streams_body_reset(timeline); - mutex_unlock(&kbdev->csf.scheduler.lock); + rt_mutex_unlock(&kbdev->csf.scheduler.lock); /* For each context in the device... */ list_for_each_entry(kctx, &timeline->tl_kctx_list, tl_kctx_list_node) { diff --git a/mali_kbase/tl/mali_kbase_timeline.c b/mali_kbase/tl/mali_kbase_timeline.c index d656c03..20356d6 100644 --- a/mali_kbase/tl/mali_kbase_timeline.c +++ b/mali_kbase/tl/mali_kbase_timeline.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2015-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2015-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -24,9 +24,6 @@ #include "mali_kbase_tracepoints.h" #include <mali_kbase.h> -#include <mali_kbase_jm.h> - -#include <linux/anon_inodes.h> #include <linux/atomic.h> #include <linux/file.h> #include <linux/mutex.h> @@ -35,7 +32,7 @@ #include <linux/stringify.h> #include <linux/timer.h> #include <linux/wait.h> - +#include <linux/delay.h> /* The period of autoflush checker execution in milliseconds. 
*/ #define AUTOFLUSH_INTERVAL 1000 /* ms */ @@ -184,90 +181,109 @@ static void kbase_tlstream_current_devfreq_target(struct kbase_device *kbdev) } #endif /* CONFIG_MALI_DEVFREQ */ -int kbase_timeline_io_acquire(struct kbase_device *kbdev, u32 flags) +int kbase_timeline_acquire(struct kbase_device *kbdev, u32 flags) { - int ret = 0; + int err = 0; u32 timeline_flags = TLSTREAM_ENABLED | flags; - struct kbase_timeline *timeline = kbdev->timeline; + struct kbase_timeline *timeline; + int rcode; + + if (WARN_ON(!kbdev) || WARN_ON(flags & ~BASE_TLSTREAM_FLAGS_MASK)) + return -EINVAL; - if (!atomic_cmpxchg(timeline->timeline_flags, 0, timeline_flags)) { - int rcode; + timeline = kbdev->timeline; + if (WARN_ON(!timeline)) + return -EFAULT; + + if (atomic_cmpxchg(timeline->timeline_flags, 0, timeline_flags)) + return -EBUSY; #if MALI_USE_CSF - if (flags & BASE_TLSTREAM_ENABLE_CSFFW_TRACEPOINTS) { - ret = kbase_csf_tl_reader_start( - &timeline->csf_tl_reader, kbdev); - if (ret) { - atomic_set(timeline->timeline_flags, 0); - return ret; - } - } -#endif - ret = anon_inode_getfd( - "[mali_tlstream]", - &kbasep_tlstream_fops, - timeline, - O_RDONLY | O_CLOEXEC); - if (ret < 0) { + if (flags & BASE_TLSTREAM_ENABLE_CSFFW_TRACEPOINTS) { + err = kbase_csf_tl_reader_start(&timeline->csf_tl_reader, kbdev); + if (err) { atomic_set(timeline->timeline_flags, 0); -#if MALI_USE_CSF - kbase_csf_tl_reader_stop(&timeline->csf_tl_reader); -#endif - return ret; + return err; } + } +#endif - /* Reset and initialize header streams. */ - kbase_tlstream_reset( - &timeline->streams[TL_STREAM_TYPE_OBJ_SUMMARY]); + /* Reset and initialize header streams. */ + kbase_tlstream_reset(&timeline->streams[TL_STREAM_TYPE_OBJ_SUMMARY]); - timeline->obj_header_btc = obj_desc_header_size; - timeline->aux_header_btc = aux_desc_header_size; + timeline->obj_header_btc = obj_desc_header_size; + timeline->aux_header_btc = aux_desc_header_size; #if !MALI_USE_CSF - /* If job dumping is enabled, readjust the software event's - * timeout as the default value of 3 seconds is often - * insufficient. - */ - if (flags & BASE_TLSTREAM_JOB_DUMPING_ENABLED) { - dev_info(kbdev->dev, - "Job dumping is enabled, readjusting the software event's timeout\n"); - atomic_set(&kbdev->js_data.soft_job_timeout_ms, - 1800000); - } + /* If job dumping is enabled, readjust the software event's + * timeout as the default value of 3 seconds is often + * insufficient. + */ + if (flags & BASE_TLSTREAM_JOB_DUMPING_ENABLED) { + dev_info(kbdev->dev, + "Job dumping is enabled, readjusting the software event's timeout\n"); + atomic_set(&kbdev->js_data.soft_job_timeout_ms, 1800000); + } #endif /* !MALI_USE_CSF */ - /* Summary stream was cleared during acquire. - * Create static timeline objects that will be - * read by client. - */ - kbase_create_timeline_objects(kbdev); + /* Summary stream was cleared during acquire. + * Create static timeline objects that will be + * read by client. + */ + kbase_create_timeline_objects(kbdev); #ifdef CONFIG_MALI_DEVFREQ - /* Devfreq target tracepoints are only fired when the target - * changes, so we won't know the current target unless we - * send it now. - */ - kbase_tlstream_current_devfreq_target(kbdev); + /* Devfreq target tracepoints are only fired when the target + * changes, so we won't know the current target unless we + * send it now. + */ + kbase_tlstream_current_devfreq_target(kbdev); #endif /* CONFIG_MALI_DEVFREQ */ - /* Start the autoflush timer. 
- * We must do this after creating timeline objects to ensure we - * don't auto-flush the streams which will be reset during the - * summarization process. - */ - atomic_set(&timeline->autoflush_timer_active, 1); - rcode = mod_timer(&timeline->autoflush_timer, - jiffies + - msecs_to_jiffies(AUTOFLUSH_INTERVAL)); - CSTD_UNUSED(rcode); - } else { - ret = -EBUSY; - } + /* Start the autoflush timer. + * We must do this after creating timeline objects to ensure we + * don't auto-flush the streams which will be reset during the + * summarization process. + */ + atomic_set(&timeline->autoflush_timer_active, 1); + rcode = mod_timer(&timeline->autoflush_timer, + jiffies + msecs_to_jiffies(AUTOFLUSH_INTERVAL)); + CSTD_UNUSED(rcode); + + timeline->last_acquire_time = ktime_get_raw(); + + return err; +} + +void kbase_timeline_release(struct kbase_timeline *timeline) +{ + ktime_t elapsed_time; + s64 elapsed_time_ms, time_to_sleep; + + if (WARN_ON(!timeline) || WARN_ON(!atomic_read(timeline->timeline_flags))) + return; + + /* Get the amount of time passed since the timeline was acquired and ensure + * we sleep for long enough such that it has been at least + * TIMELINE_HYSTERESIS_TIMEOUT_MS amount of time between acquire and release. + * This prevents userspace from spamming acquire and release too quickly. + */ + elapsed_time = ktime_sub(ktime_get_raw(), timeline->last_acquire_time); + elapsed_time_ms = ktime_to_ms(elapsed_time); + time_to_sleep = (elapsed_time_ms < 0 ? TIMELINE_HYSTERESIS_TIMEOUT_MS : + TIMELINE_HYSTERESIS_TIMEOUT_MS - elapsed_time_ms); + if (time_to_sleep > 0) + msleep_interruptible(time_to_sleep); - if (ret >= 0) - timeline->last_acquire_time = ktime_get(); +#if MALI_USE_CSF + kbase_csf_tl_reader_stop(&timeline->csf_tl_reader); +#endif - return ret; + /* Stop autoflush timer before releasing access to streams. */ + atomic_set(&timeline->autoflush_timer_active, 0); + del_timer_sync(&timeline->autoflush_timer); + + atomic_set(timeline->timeline_flags, 0); } int kbase_timeline_streams_flush(struct kbase_timeline *timeline) @@ -275,11 +291,17 @@ int kbase_timeline_streams_flush(struct kbase_timeline *timeline) enum tl_stream_type stype; bool has_bytes = false; size_t nbytes = 0; + + if (WARN_ON(!timeline)) + return -EINVAL; + #if MALI_USE_CSF - int ret = kbase_csf_tl_reader_flush_buffer(&timeline->csf_tl_reader); + { + int ret = kbase_csf_tl_reader_flush_buffer(&timeline->csf_tl_reader); - if (ret > 0) - has_bytes = true; + if (ret > 0) + has_bytes = true; + } #endif for (stype = 0; stype < TL_STREAM_TYPE_COUNT; stype++) { diff --git a/mali_kbase/tl/mali_kbase_timeline.h b/mali_kbase/tl/mali_kbase_timeline.h index 96a4b18..62be6c6 100644 --- a/mali_kbase/tl/mali_kbase_timeline.h +++ b/mali_kbase/tl/mali_kbase_timeline.h @@ -117,4 +117,12 @@ void kbase_timeline_post_kbase_context_destroy(struct kbase_context *kctx); void kbase_timeline_stats(struct kbase_timeline *timeline, u32 *bytes_collected, u32 *bytes_generated); #endif /* MALI_UNIT_TEST */ +/** + * kbase_timeline_io_debugfs_init - Add a debugfs entry for reading timeline stream data + * + * @kbdev: An instance of the GPU platform device, allocated from the probe + * method of the driver. 
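kbase_timeline_release() above enforces a minimum spacing between acquire and release: if less than TIMELINE_HYSTERESIS_TIMEOUT_MS has passed since the acquire timestamp (now taken with ktime_get_raw()), it sleeps for the remainder before tearing the streams down. A small sketch of that computation; the 500 ms constant is an assumed value for illustration, not the driver's actual timeout.

    #include <linux/ktime.h>
    #include <linux/delay.h>

    #define DEMO_HYSTERESIS_MS 500  /* assumed value, purely illustrative */

    static ktime_t demo_last_acquire;

    static void demo_release_with_hysteresis(void)
    {
            s64 elapsed_ms = ktime_to_ms(ktime_sub(ktime_get_raw(), demo_last_acquire));
            s64 to_sleep = (elapsed_ms < 0) ? DEMO_HYSTERESIS_MS :
                                              DEMO_HYSTERESIS_MS - elapsed_ms;

            /* Sleep only if the client released "too quickly" after acquiring. */
            if (to_sleep > 0)
                    msleep_interruptible(to_sleep);

            /* ... actual stream teardown would follow here ... */
    }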
+ */ +void kbase_timeline_io_debugfs_init(struct kbase_device *kbdev); + #endif /* _KBASE_TIMELINE_H */ diff --git a/mali_kbase/tl/mali_kbase_timeline_io.c b/mali_kbase/tl/mali_kbase_timeline_io.c index 3391e75..ae57006 100644 --- a/mali_kbase/tl/mali_kbase_timeline_io.c +++ b/mali_kbase/tl/mali_kbase_timeline_io.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2019-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -24,26 +24,74 @@ #include "mali_kbase_tracepoints.h" #include "mali_kbase_timeline.h" -#include <linux/delay.h> +#include <device/mali_kbase_device.h> + #include <linux/poll.h> +#include <linux/version_compat_defs.h> +#include <linux/anon_inodes.h> + +/* Explicitly include epoll header for old kernels. Not required from 4.16. */ +#if KERNEL_VERSION(4, 16, 0) > LINUX_VERSION_CODE +#include <uapi/linux/eventpoll.h> +#endif + +static int kbase_unprivileged_global_profiling; + +/** + * kbase_unprivileged_global_profiling_set - set permissions for unprivileged processes + * + * @val: String containing value to set. Only strings representing positive + * integers are accepted as valid; any non-positive integer (including 0) + * is rejected. + * @kp: Module parameter associated with this method. + * + * This method can only be used to enable permissions for unprivileged processes, + * if they are disabled: for this reason, the only values which are accepted are + * strings representing positive integers. Since it's impossible to disable + * permissions once they're set, any integer which is non-positive is rejected, + * including 0. + * + * Return: 0 if success, otherwise error code. + */ +static int kbase_unprivileged_global_profiling_set(const char *val, const struct kernel_param *kp) +{ + int new_val; + int ret = kstrtoint(val, 0, &new_val); + + if (ret == 0) { + if (new_val < 1) + return -EINVAL; + + kbase_unprivileged_global_profiling = 1; + } + + return ret; +} + +static const struct kernel_param_ops kbase_global_unprivileged_profiling_ops = { + .get = param_get_int, + .set = kbase_unprivileged_global_profiling_set, +}; + +module_param_cb(kbase_unprivileged_global_profiling, &kbase_global_unprivileged_profiling_ops, + &kbase_unprivileged_global_profiling, 0600); /* The timeline stream file operations functions. */ static ssize_t kbasep_timeline_io_read(struct file *filp, char __user *buffer, size_t size, loff_t *f_pos); -static unsigned int kbasep_timeline_io_poll(struct file *filp, - poll_table *wait); +static __poll_t kbasep_timeline_io_poll(struct file *filp, poll_table *wait); static int kbasep_timeline_io_release(struct inode *inode, struct file *filp); static int kbasep_timeline_io_fsync(struct file *filp, loff_t start, loff_t end, int datasync); -/* The timeline stream file operations structure. 
*/ -const struct file_operations kbasep_tlstream_fops = { - .owner = THIS_MODULE, - .release = kbasep_timeline_io_release, - .read = kbasep_timeline_io_read, - .poll = kbasep_timeline_io_poll, - .fsync = kbasep_timeline_io_fsync, -}; +static bool timeline_is_permitted(void) +{ +#if KERNEL_VERSION(5, 8, 0) <= LINUX_VERSION_CODE + return kbase_unprivileged_global_profiling || perfmon_capable(); +#else + return kbase_unprivileged_global_profiling || capable(CAP_SYS_ADMIN); +#endif +} /** * kbasep_timeline_io_packet_pending - check timeline streams for pending @@ -290,9 +338,10 @@ static ssize_t kbasep_timeline_io_read(struct file *filp, char __user *buffer, * @filp: Pointer to file structure * @wait: Pointer to poll table * - * Return: POLLIN if data can be read without blocking, otherwise zero + * Return: EPOLLIN | EPOLLRDNORM if data can be read without blocking, + * otherwise zero, or EPOLLHUP | EPOLLERR on error. */ -static unsigned int kbasep_timeline_io_poll(struct file *filp, poll_table *wait) +static __poll_t kbasep_timeline_io_poll(struct file *filp, poll_table *wait) { struct kbase_tlstream *stream; unsigned int rb_idx; @@ -302,20 +351,94 @@ static unsigned int kbasep_timeline_io_poll(struct file *filp, poll_table *wait) KBASE_DEBUG_ASSERT(wait); if (WARN_ON(!filp->private_data)) - return -EFAULT; + return EPOLLHUP | EPOLLERR; timeline = (struct kbase_timeline *)filp->private_data; /* If there are header bytes to copy, read will not block */ if (kbasep_timeline_has_header_data(timeline)) - return POLLIN; + return EPOLLIN | EPOLLRDNORM; poll_wait(filp, &timeline->event_queue, wait); if (kbasep_timeline_io_packet_pending(timeline, &stream, &rb_idx)) - return POLLIN; - return 0; + return EPOLLIN | EPOLLRDNORM; + + return (__poll_t)0; +} + +int kbase_timeline_io_acquire(struct kbase_device *kbdev, u32 flags) +{ + /* The timeline stream file operations structure. 
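The new kbase_unprivileged_global_profiling parameter is effectively write-once: its setter only accepts strings that parse to a positive integer, and once set it cannot be cleared at runtime; timeline_is_permitted() then allows either that parameter or a capability check (perfmon_capable() on >= 5.8 kernels, CAP_SYS_ADMIN otherwise). A hedged sketch of the same module_param_cb() pattern for a generic enable-only flag; the "demo_" names are illustrative.

    #include <linux/module.h>
    #include <linux/moduleparam.h>
    #include <linux/kernel.h>

    static int demo_unprivileged_access;

    /* Accept only positive integers; the flag can be enabled but never cleared. */
    static int demo_access_set(const char *val, const struct kernel_param *kp)
    {
            int new_val;
            int ret = kstrtoint(val, 0, &new_val);

            if (ret)
                    return ret;
            if (new_val < 1)
                    return -EINVAL;

            demo_unprivileged_access = 1;
            return 0;
    }

    static const struct kernel_param_ops demo_access_ops = {
            .get = param_get_int,
            .set = demo_access_set,
    };

    module_param_cb(demo_unprivileged_access, &demo_access_ops,
                    &demo_unprivileged_access, 0600);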
*/ + static const struct file_operations kbasep_tlstream_fops = { + .owner = THIS_MODULE, + .release = kbasep_timeline_io_release, + .read = kbasep_timeline_io_read, + .poll = kbasep_timeline_io_poll, + .fsync = kbasep_timeline_io_fsync, + }; + int err; + + if (!timeline_is_permitted()) + return -EPERM; + + if (WARN_ON(!kbdev) || (flags & ~BASE_TLSTREAM_FLAGS_MASK)) + return -EINVAL; + + err = kbase_timeline_acquire(kbdev, flags); + if (err) + return err; + + err = anon_inode_getfd("[mali_tlstream]", &kbasep_tlstream_fops, kbdev->timeline, + O_RDONLY | O_CLOEXEC); + if (err < 0) + kbase_timeline_release(kbdev->timeline); + + return err; } +#if IS_ENABLED(CONFIG_DEBUG_FS) +static int kbasep_timeline_io_open(struct inode *in, struct file *file) +{ + struct kbase_device *const kbdev = in->i_private; + + if (WARN_ON(!kbdev)) + return -EFAULT; + + file->private_data = kbdev->timeline; + return kbase_timeline_acquire(kbdev, BASE_TLSTREAM_FLAGS_MASK & + ~BASE_TLSTREAM_JOB_DUMPING_ENABLED); +} + +void kbase_timeline_io_debugfs_init(struct kbase_device *const kbdev) +{ + static const struct file_operations kbasep_tlstream_debugfs_fops = { + .owner = THIS_MODULE, + .open = kbasep_timeline_io_open, + .release = kbasep_timeline_io_release, + .read = kbasep_timeline_io_read, + .poll = kbasep_timeline_io_poll, + .fsync = kbasep_timeline_io_fsync, + }; + struct dentry *file; + + if (WARN_ON(!kbdev) || WARN_ON(IS_ERR_OR_NULL(kbdev->mali_debugfs_directory))) + return; + + file = debugfs_create_file("tlstream", 0400, kbdev->mali_debugfs_directory, kbdev, + &kbasep_tlstream_debugfs_fops); + + if (IS_ERR_OR_NULL(file)) + dev_warn(kbdev->dev, "Unable to create timeline debugfs entry"); +} +#else +/* + * Stub function for when debugfs is disabled + */ +void kbase_timeline_io_debugfs_init(struct kbase_device *const kbdev) +{ +} +#endif + /** * kbasep_timeline_io_release - release timeline stream descriptor * @inode: Pointer to inode structure @@ -325,55 +448,18 @@ static unsigned int kbasep_timeline_io_poll(struct file *filp, poll_table *wait) */ static int kbasep_timeline_io_release(struct inode *inode, struct file *filp) { - struct kbase_timeline *timeline; - ktime_t elapsed_time; - s64 elapsed_time_ms, time_to_sleep; - - KBASE_DEBUG_ASSERT(inode); - KBASE_DEBUG_ASSERT(filp); - KBASE_DEBUG_ASSERT(filp->private_data); - CSTD_UNUSED(inode); - timeline = (struct kbase_timeline *)filp->private_data; - - /* Get the amount of time passed since the timeline was acquired and ensure - * we sleep for long enough such that it has been at least - * TIMELINE_HYSTERESIS_TIMEOUT_MS amount of time between acquire and release. - * This prevents userspace from spamming acquire and release too quickly. - */ - elapsed_time = ktime_sub(ktime_get(), timeline->last_acquire_time); - elapsed_time_ms = ktime_to_ms(elapsed_time); - time_to_sleep = MIN(TIMELINE_HYSTERESIS_TIMEOUT_MS, - TIMELINE_HYSTERESIS_TIMEOUT_MS - elapsed_time_ms); - if (time_to_sleep > 0) - msleep(time_to_sleep); - -#if MALI_USE_CSF - kbase_csf_tl_reader_stop(&timeline->csf_tl_reader); -#endif - - /* Stop autoflush timer before releasing access to streams. 
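kbase_timeline_io_acquire() now separates policy from plumbing: it checks permissions, calls kbase_timeline_acquire(), and only then wraps the timeline in an anonymous-inode file descriptor, undoing the acquire if fd creation fails; the debugfs entry reuses the same read/poll/release ops behind an open() hook. A rough sketch of the anon-fd half of that pattern, using placeholder names and assuming the demo helpers exist elsewhere.

    #include <linux/anon_inodes.h>
    #include <linux/fcntl.h>
    #include <linux/fs.h>

    /* demo_acquire()/demo_release() and demo_fops are assumed to be defined elsewhere. */
    extern int demo_acquire(void *res);
    extern void demo_release(void *res);
    extern const struct file_operations demo_fops;

    static int demo_open_fd(void *res)
    {
            int fd;
            int err = demo_acquire(res);

            if (err)
                    return err;

            /* On success this installs and returns an fd; file->private_data = res. */
            fd = anon_inode_getfd("[demo_stream]", &demo_fops, res,
                                  O_RDONLY | O_CLOEXEC);
            if (fd < 0)
                    demo_release(res);  /* undo the acquire if no fd could be handed out */

            return fd;
    }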
*/ - atomic_set(&timeline->autoflush_timer_active, 0); - del_timer_sync(&timeline->autoflush_timer); - - atomic_set(timeline->timeline_flags, 0); + kbase_timeline_release(filp->private_data); return 0; } static int kbasep_timeline_io_fsync(struct file *filp, loff_t start, loff_t end, int datasync) { - struct kbase_timeline *timeline; - CSTD_UNUSED(start); CSTD_UNUSED(end); CSTD_UNUSED(datasync); - if (WARN_ON(!filp->private_data)) - return -EFAULT; - - timeline = (struct kbase_timeline *)filp->private_data; - - return kbase_timeline_streams_flush(timeline); + return kbase_timeline_streams_flush(filp->private_data); } diff --git a/mali_kbase/tl/mali_kbase_timeline_priv.h b/mali_kbase/tl/mali_kbase_timeline_priv.h index bf2c385..de30bcc 100644 --- a/mali_kbase/tl/mali_kbase_timeline_priv.h +++ b/mali_kbase/tl/mali_kbase_timeline_priv.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -51,7 +51,7 @@ * @event_queue: Timeline stream event queue * @bytes_collected: Number of bytes read by user * @timeline_flags: Zero, if timeline is disabled. Timeline stream flags - * otherwise. See kbase_timeline_io_acquire(). + * otherwise. See kbase_timeline_acquire(). * @obj_header_btc: Remaining bytes to copy for the object stream header * @aux_header_btc: Remaining bytes to copy for the aux stream header * @last_acquire_time: The time at which timeline was last acquired. @@ -77,8 +77,27 @@ struct kbase_timeline { #endif }; -extern const struct file_operations kbasep_tlstream_fops; - void kbase_create_timeline_objects(struct kbase_device *kbdev); +/** + * kbase_timeline_acquire - acquire timeline for a userspace client. + * @kbdev: An instance of the GPU platform device, allocated from the probe + * method of the driver. + * @flags: Timeline stream flags + * + * Each timeline instance can be acquired by only one userspace client at a time. + * + * Return: Zero on success, error number on failure (e.g. if already acquired). + */ +int kbase_timeline_acquire(struct kbase_device *kbdev, u32 flags); + +/** + * kbase_timeline_release - release timeline for a userspace client. + * @timeline: Timeline instance to be stopped. It must be previously acquired + * with kbase_timeline_acquire(). + * + * Releasing the timeline instance allows it to be acquired by another userspace client. + */ +void kbase_timeline_release(struct kbase_timeline *timeline); + #endif /* _KBASE_TIMELINE_PRIV_H */ diff --git a/mali_kbase/tl/mali_kbase_tlstream.h b/mali_kbase/tl/mali_kbase_tlstream.h index 6660cf5..c142849 100644 --- a/mali_kbase/tl/mali_kbase_tlstream.h +++ b/mali_kbase/tl/mali_kbase_tlstream.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2015-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2015-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -27,17 +27,13 @@ #include <linux/wait.h> /* The maximum size of a single packet used by timeline. */ -#define PACKET_SIZE 4096 /* bytes */ +#define PACKET_SIZE 4096 /* bytes */ /* The number of packets used by one timeline stream. 
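With PACKET_COUNT raised to an unconditional 128 (previously 32, or 64 when job/vector dumping was configured) and PACKET_SIZE still 4096 bytes, each timeline stream's ring buffer grows from 128 KiB (or 256 KiB) to 512 KiB. A compile-time restatement of that arithmetic, under the assumption that these two constants match the header above:

    #include <linux/build_bug.h>

    #define DEMO_PACKET_SIZE  4096u  /* bytes, as in mali_kbase_tlstream.h */
    #define DEMO_PACKET_COUNT 128u

    /* 128 packets * 4096 bytes = 524288 bytes = 512 KiB per stream. */
    static_assert(DEMO_PACKET_SIZE * DEMO_PACKET_COUNT == 512u * 1024u);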
*/ -#if defined(CONFIG_MALI_JOB_DUMP) || defined(CONFIG_MALI_VECTOR_DUMP) - #define PACKET_COUNT 64 -#else - #define PACKET_COUNT 32 -#endif +#define PACKET_COUNT 128 /* The maximum expected length of string in tracepoint descriptor. */ -#define STRLEN_MAX 64 /* bytes */ +#define STRLEN_MAX 64 /* bytes */ /** * struct kbase_tlstream - timeline stream structure diff --git a/mali_kbase/tl/mali_kbase_tracepoints.c b/mali_kbase/tl/mali_kbase_tracepoints.c index 6aae4e0..f62c755 100644 --- a/mali_kbase/tl/mali_kbase_tracepoints.c +++ b/mali_kbase/tl/mali_kbase_tracepoints.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2010-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2010-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -84,9 +84,12 @@ enum tl_msg_id_obj { KBASE_TL_ATTRIB_ATOM_PRIORITIZED, KBASE_TL_ATTRIB_ATOM_JIT, KBASE_TL_KBASE_NEW_DEVICE, + KBASE_TL_KBASE_GPUCMDQUEUE_KICK, KBASE_TL_KBASE_DEVICE_PROGRAM_CSG, KBASE_TL_KBASE_DEVICE_DEPROGRAM_CSG, - KBASE_TL_KBASE_DEVICE_HALT_CSG, + KBASE_TL_KBASE_DEVICE_HALTING_CSG, + KBASE_TL_KBASE_DEVICE_SUSPEND_CSG, + KBASE_TL_KBASE_DEVICE_CSG_IDLE, KBASE_TL_KBASE_NEW_CTX, KBASE_TL_KBASE_DEL_CTX, KBASE_TL_KBASE_CTX_ASSIGN_AS, @@ -97,17 +100,19 @@ enum tl_msg_id_obj { KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_FENCE_WAIT, KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_CQS_WAIT, KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_CQS_SET, + KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_CQS_WAIT_OPERATION, + KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_CQS_SET_OPERATION, KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_MAP_IMPORT, KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_UNMAP_IMPORT, KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_UNMAP_IMPORT_FORCE, - KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_ERROR_BARRIER, - KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_GROUP_SUSPEND, KBASE_TL_KBASE_ARRAY_BEGIN_KCPUQUEUE_ENQUEUE_JIT_ALLOC, KBASE_TL_KBASE_ARRAY_ITEM_KCPUQUEUE_ENQUEUE_JIT_ALLOC, KBASE_TL_KBASE_ARRAY_END_KCPUQUEUE_ENQUEUE_JIT_ALLOC, KBASE_TL_KBASE_ARRAY_BEGIN_KCPUQUEUE_ENQUEUE_JIT_FREE, KBASE_TL_KBASE_ARRAY_ITEM_KCPUQUEUE_ENQUEUE_JIT_FREE, KBASE_TL_KBASE_ARRAY_END_KCPUQUEUE_ENQUEUE_JIT_FREE, + KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_ERROR_BARRIER, + KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_GROUP_SUSPEND, KBASE_TL_KBASE_KCPUQUEUE_EXECUTE_FENCE_SIGNAL_START, KBASE_TL_KBASE_KCPUQUEUE_EXECUTE_FENCE_SIGNAL_END, KBASE_TL_KBASE_KCPUQUEUE_EXECUTE_FENCE_WAIT_START, @@ -115,6 +120,9 @@ enum tl_msg_id_obj { KBASE_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_WAIT_START, KBASE_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_WAIT_END, KBASE_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_SET, + KBASE_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_WAIT_OPERATION_START, + KBASE_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_WAIT_OPERATION_END, + KBASE_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_SET_OPERATION, KBASE_TL_KBASE_KCPUQUEUE_EXECUTE_MAP_IMPORT_START, KBASE_TL_KBASE_KCPUQUEUE_EXECUTE_MAP_IMPORT_END, KBASE_TL_KBASE_KCPUQUEUE_EXECUTE_UNMAP_IMPORT_START, @@ -305,11 +313,11 @@ enum tl_msg_id_obj { "@p", \ "atom") \ TRACEPOINT_DESC(KBASE_TL_JD_DONE_NO_LOCK_START, \ - "Within function jd_done_nolock", \ + "Within function kbase_jd_done_nolock", \ "@p", \ "atom") \ TRACEPOINT_DESC(KBASE_TL_JD_DONE_NO_LOCK_END, \ - "Within function jd_done_nolock - end", \ + "Within function kbase_jd_done_nolock - end", \ "@p", \ "atom") \ TRACEPOINT_DESC(KBASE_TL_JD_DONE_START, \ @@ -352,16 +360,28 @@ enum tl_msg_id_obj { "New KBase Device", \ "@IIIIIII", \ 
"kbase_device_id,kbase_device_gpu_core_count,kbase_device_max_num_csgs,kbase_device_as_count,kbase_device_sb_entry_count,kbase_device_has_cross_stream_sync,kbase_device_supports_gpu_sleep") \ + TRACEPOINT_DESC(KBASE_TL_KBASE_GPUCMDQUEUE_KICK, \ + "Kernel receives a request to process new GPU queue instructions", \ + "@IL", \ + "kernel_ctx_id,buffer_gpu_addr") \ TRACEPOINT_DESC(KBASE_TL_KBASE_DEVICE_PROGRAM_CSG, \ "CSG is programmed to a slot", \ "@IIIII", \ - "kbase_device_id,kernel_ctx_id,gpu_cmdq_grp_handle,kbase_device_csg_slot_index,kbase_device_csg_slot_resumed") \ + "kbase_device_id,kernel_ctx_id,gpu_cmdq_grp_handle,kbase_device_csg_slot_index,kbase_device_csg_slot_resuming") \ TRACEPOINT_DESC(KBASE_TL_KBASE_DEVICE_DEPROGRAM_CSG, \ "CSG is deprogrammed from a slot", \ "@II", \ "kbase_device_id,kbase_device_csg_slot_index") \ - TRACEPOINT_DESC(KBASE_TL_KBASE_DEVICE_HALT_CSG, \ - "CSG is halted", \ + TRACEPOINT_DESC(KBASE_TL_KBASE_DEVICE_HALTING_CSG, \ + "CSG is halting", \ + "@III", \ + "kbase_device_id,kbase_device_csg_slot_index,kbase_device_csg_slot_suspending") \ + TRACEPOINT_DESC(KBASE_TL_KBASE_DEVICE_SUSPEND_CSG, \ + "CSG is suspended", \ + "@II", \ + "kbase_device_id,kbase_device_csg_slot_index") \ + TRACEPOINT_DESC(KBASE_TL_KBASE_DEVICE_CSG_IDLE, \ + "KBase device is notified that CSG is idle.", \ "@II", \ "kbase_device_id,kbase_device_csg_slot_index") \ TRACEPOINT_DESC(KBASE_TL_KBASE_NEW_CTX, \ @@ -399,11 +419,19 @@ enum tl_msg_id_obj { TRACEPOINT_DESC(KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_CQS_WAIT, \ "KCPU Queue enqueues Wait on Cross Queue Sync Object", \ "@pLII", \ - "kcpu_queue,cqs_obj_gpu_addr,cqs_obj_compare_value,cqs_obj_inherit_error") \ + "kcpu_queue,cqs_obj_gpu_addr,compare_value,inherit_error") \ TRACEPOINT_DESC(KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_CQS_SET, \ "KCPU Queue enqueues Set on Cross Queue Sync Object", \ "@pL", \ "kcpu_queue,cqs_obj_gpu_addr") \ + TRACEPOINT_DESC(KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_CQS_WAIT_OPERATION, \ + "KCPU Queue enqueues Wait Operation on Cross Queue Sync Object", \ + "@pLLIII", \ + "kcpu_queue,cqs_obj_gpu_addr,compare_value,condition,data_type,inherit_error") \ + TRACEPOINT_DESC(KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_CQS_SET_OPERATION, \ + "KCPU Queue enqueues Set Operation on Cross Queue Sync Object", \ + "@pLLII", \ + "kcpu_queue,cqs_obj_gpu_addr,value,operation,data_type") \ TRACEPOINT_DESC(KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_MAP_IMPORT, \ "KCPU Queue enqueues Map Import", \ "@pL", \ @@ -416,14 +444,6 @@ enum tl_msg_id_obj { "KCPU Queue enqueues Unmap Import ignoring reference count", \ "@pL", \ "kcpu_queue,map_import_buf_gpu_addr") \ - TRACEPOINT_DESC(KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_ERROR_BARRIER, \ - "KCPU Queue enqueues Error Barrier", \ - "@p", \ - "kcpu_queue") \ - TRACEPOINT_DESC(KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_GROUP_SUSPEND, \ - "KCPU Queue enqueues Group Suspend", \ - "@ppI", \ - "kcpu_queue,group_suspend_buf,gpu_cmdq_grp_handle") \ TRACEPOINT_DESC(KBASE_TL_KBASE_ARRAY_BEGIN_KCPUQUEUE_ENQUEUE_JIT_ALLOC, \ "Begin array of KCPU Queue enqueues JIT Alloc", \ "@p", \ @@ -448,6 +468,14 @@ enum tl_msg_id_obj { "End array of KCPU Queue enqueues JIT Free", \ "@p", \ "kcpu_queue") \ + TRACEPOINT_DESC(KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_ERROR_BARRIER, \ + "KCPU Queue enqueues Error Barrier", \ + "@p", \ + "kcpu_queue") \ + TRACEPOINT_DESC(KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_GROUP_SUSPEND, \ + "KCPU Queue enqueues Group Suspend", \ + "@ppI", \ + "kcpu_queue,group_suspend_buf,gpu_cmdq_grp_handle") \ 
TRACEPOINT_DESC(KBASE_TL_KBASE_KCPUQUEUE_EXECUTE_FENCE_SIGNAL_START, \ "KCPU Queue starts a Signal on Fence", \ "@p", \ @@ -465,15 +493,27 @@ enum tl_msg_id_obj { "@pI", \ "kcpu_queue,execute_error") \ TRACEPOINT_DESC(KBASE_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_WAIT_START, \ - "KCPU Queue starts a Wait on an array of Cross Queue Sync Objects", \ + "KCPU Queue starts a Wait on Cross Queue Sync Object", \ "@p", \ "kcpu_queue") \ TRACEPOINT_DESC(KBASE_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_WAIT_END, \ - "KCPU Queue ends a Wait on an array of Cross Queue Sync Objects", \ + "KCPU Queue ends a Wait on Cross Queue Sync Object", \ "@pI", \ "kcpu_queue,execute_error") \ TRACEPOINT_DESC(KBASE_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_SET, \ - "KCPU Queue executes a Set on an array of Cross Queue Sync Objects", \ + "KCPU Queue executes a Set on Cross Queue Sync Object", \ + "@pI", \ + "kcpu_queue,execute_error") \ + TRACEPOINT_DESC(KBASE_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_WAIT_OPERATION_START, \ + "KCPU Queue starts a Wait Operation on Cross Queue Sync Object", \ + "@p", \ + "kcpu_queue") \ + TRACEPOINT_DESC(KBASE_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_WAIT_OPERATION_END, \ + "KCPU Queue ends a Wait Operation on Cross Queue Sync Object", \ + "@pI", \ + "kcpu_queue,execute_error") \ + TRACEPOINT_DESC(KBASE_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_SET_OPERATION, \ + "KCPU Queue executes a Set Operation on Cross Queue Sync Object", \ "@pI", \ "kcpu_queue,execute_error") \ TRACEPOINT_DESC(KBASE_TL_KBASE_KCPUQUEUE_EXECUTE_MAP_IMPORT_START, \ @@ -2092,13 +2132,40 @@ void __kbase_tlstream_tl_kbase_new_device( kbase_tlstream_msgbuf_release(stream, acq_flags); } +void __kbase_tlstream_tl_kbase_gpucmdqueue_kick( + struct kbase_tlstream *stream, + u32 kernel_ctx_id, + u64 buffer_gpu_addr +) +{ + const u32 msg_id = KBASE_TL_KBASE_GPUCMDQUEUE_KICK; + const size_t msg_size = sizeof(msg_id) + sizeof(u64) + + sizeof(kernel_ctx_id) + + sizeof(buffer_gpu_addr) + ; + char *buffer; + unsigned long acq_flags; + size_t pos = 0; + + buffer = kbase_tlstream_msgbuf_acquire(stream, msg_size, &acq_flags); + + pos = kbasep_serialize_bytes(buffer, pos, &msg_id, sizeof(msg_id)); + pos = kbasep_serialize_timestamp(buffer, pos); + pos = kbasep_serialize_bytes(buffer, + pos, &kernel_ctx_id, sizeof(kernel_ctx_id)); + pos = kbasep_serialize_bytes(buffer, + pos, &buffer_gpu_addr, sizeof(buffer_gpu_addr)); + + kbase_tlstream_msgbuf_release(stream, acq_flags); +} + void __kbase_tlstream_tl_kbase_device_program_csg( struct kbase_tlstream *stream, u32 kbase_device_id, u32 kernel_ctx_id, u32 gpu_cmdq_grp_handle, u32 kbase_device_csg_slot_index, - u32 kbase_device_csg_slot_resumed + u32 kbase_device_csg_slot_resuming ) { const u32 msg_id = KBASE_TL_KBASE_DEVICE_PROGRAM_CSG; @@ -2107,7 +2174,7 @@ void __kbase_tlstream_tl_kbase_device_program_csg( + sizeof(kernel_ctx_id) + sizeof(gpu_cmdq_grp_handle) + sizeof(kbase_device_csg_slot_index) - + sizeof(kbase_device_csg_slot_resumed) + + sizeof(kbase_device_csg_slot_resuming) ; char *buffer; unsigned long acq_flags; @@ -2126,7 +2193,7 @@ void __kbase_tlstream_tl_kbase_device_program_csg( pos = kbasep_serialize_bytes(buffer, pos, &kbase_device_csg_slot_index, sizeof(kbase_device_csg_slot_index)); pos = kbasep_serialize_bytes(buffer, - pos, &kbase_device_csg_slot_resumed, sizeof(kbase_device_csg_slot_resumed)); + pos, &kbase_device_csg_slot_resuming, sizeof(kbase_device_csg_slot_resuming)); kbase_tlstream_msgbuf_release(stream, acq_flags); } @@ -2158,13 +2225,71 @@ void __kbase_tlstream_tl_kbase_device_deprogram_csg( 
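Each emitter in mali_kbase_tracepoints.c follows the same shape: compute msg_size from sizeof() of the message ID, a 64-bit timestamp and every argument, reserve that many bytes with kbase_tlstream_msgbuf_acquire(), serialize field by field, then release. A condensed sketch of that layout logic using stand-in helpers; demo_put() and demo_emit_queue_kick() are placeholders, not driver functions.

    #include <linux/string.h>
    #include <linux/types.h>

    static size_t demo_put(char *buf, size_t pos, const void *val, size_t len)
    {
            memcpy(buf + pos, val, len);    /* mirrors kbasep_serialize_bytes() */
            return pos + len;
    }

    static size_t demo_emit_queue_kick(char *buf, u32 msg_id, u64 timestamp,
                                       u32 kernel_ctx_id, u64 buffer_gpu_addr)
    {
            /* msg_id + timestamp + each argument, in the same order msg_size adds them. */
            size_t pos = 0;

            pos = demo_put(buf, pos, &msg_id, sizeof(msg_id));
            pos = demo_put(buf, pos, &timestamp, sizeof(timestamp));
            pos = demo_put(buf, pos, &kernel_ctx_id, sizeof(kernel_ctx_id));
            pos = demo_put(buf, pos, &buffer_gpu_addr, sizeof(buffer_gpu_addr));

            /* pos == sizeof(u32) + sizeof(u64) + sizeof(u32) + sizeof(u64) */
            return pos;
    }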
kbase_tlstream_msgbuf_release(stream, acq_flags); } -void __kbase_tlstream_tl_kbase_device_halt_csg( +void __kbase_tlstream_tl_kbase_device_halting_csg( + struct kbase_tlstream *stream, + u32 kbase_device_id, + u32 kbase_device_csg_slot_index, + u32 kbase_device_csg_slot_suspending +) +{ + const u32 msg_id = KBASE_TL_KBASE_DEVICE_HALTING_CSG; + const size_t msg_size = sizeof(msg_id) + sizeof(u64) + + sizeof(kbase_device_id) + + sizeof(kbase_device_csg_slot_index) + + sizeof(kbase_device_csg_slot_suspending) + ; + char *buffer; + unsigned long acq_flags; + size_t pos = 0; + + buffer = kbase_tlstream_msgbuf_acquire(stream, msg_size, &acq_flags); + + pos = kbasep_serialize_bytes(buffer, pos, &msg_id, sizeof(msg_id)); + pos = kbasep_serialize_timestamp(buffer, pos); + pos = kbasep_serialize_bytes(buffer, + pos, &kbase_device_id, sizeof(kbase_device_id)); + pos = kbasep_serialize_bytes(buffer, + pos, &kbase_device_csg_slot_index, sizeof(kbase_device_csg_slot_index)); + pos = kbasep_serialize_bytes(buffer, + pos, &kbase_device_csg_slot_suspending, sizeof(kbase_device_csg_slot_suspending)); + + kbase_tlstream_msgbuf_release(stream, acq_flags); +} + +void __kbase_tlstream_tl_kbase_device_suspend_csg( struct kbase_tlstream *stream, u32 kbase_device_id, u32 kbase_device_csg_slot_index ) { - const u32 msg_id = KBASE_TL_KBASE_DEVICE_HALT_CSG; + const u32 msg_id = KBASE_TL_KBASE_DEVICE_SUSPEND_CSG; + const size_t msg_size = sizeof(msg_id) + sizeof(u64) + + sizeof(kbase_device_id) + + sizeof(kbase_device_csg_slot_index) + ; + char *buffer; + unsigned long acq_flags; + size_t pos = 0; + + buffer = kbase_tlstream_msgbuf_acquire(stream, msg_size, &acq_flags); + + pos = kbasep_serialize_bytes(buffer, pos, &msg_id, sizeof(msg_id)); + pos = kbasep_serialize_timestamp(buffer, pos); + pos = kbasep_serialize_bytes(buffer, + pos, &kbase_device_id, sizeof(kbase_device_id)); + pos = kbasep_serialize_bytes(buffer, + pos, &kbase_device_csg_slot_index, sizeof(kbase_device_csg_slot_index)); + + kbase_tlstream_msgbuf_release(stream, acq_flags); +} + +void __kbase_tlstream_tl_kbase_device_csg_idle( + struct kbase_tlstream *stream, + u32 kbase_device_id, + u32 kbase_device_csg_slot_index +) +{ + const u32 msg_id = KBASE_TL_KBASE_DEVICE_CSG_IDLE; const size_t msg_size = sizeof(msg_id) + sizeof(u64) + sizeof(kbase_device_id) + sizeof(kbase_device_csg_slot_index) @@ -2401,16 +2526,16 @@ void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_cqs_wait( struct kbase_tlstream *stream, const void *kcpu_queue, u64 cqs_obj_gpu_addr, - u32 cqs_obj_compare_value, - u32 cqs_obj_inherit_error + u32 compare_value, + u32 inherit_error ) { const u32 msg_id = KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_CQS_WAIT; const size_t msg_size = sizeof(msg_id) + sizeof(u64) + sizeof(kcpu_queue) + sizeof(cqs_obj_gpu_addr) - + sizeof(cqs_obj_compare_value) - + sizeof(cqs_obj_inherit_error) + + sizeof(compare_value) + + sizeof(inherit_error) ; char *buffer; unsigned long acq_flags; @@ -2425,9 +2550,9 @@ void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_cqs_wait( pos = kbasep_serialize_bytes(buffer, pos, &cqs_obj_gpu_addr, sizeof(cqs_obj_gpu_addr)); pos = kbasep_serialize_bytes(buffer, - pos, &cqs_obj_compare_value, sizeof(cqs_obj_compare_value)); + pos, &compare_value, sizeof(compare_value)); pos = kbasep_serialize_bytes(buffer, - pos, &cqs_obj_inherit_error, sizeof(cqs_obj_inherit_error)); + pos, &inherit_error, sizeof(inherit_error)); kbase_tlstream_msgbuf_release(stream, acq_flags); } @@ -2459,16 +2584,24 @@ void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_cqs_set( 
kbase_tlstream_msgbuf_release(stream, acq_flags); } -void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_map_import( +void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_cqs_wait_operation( struct kbase_tlstream *stream, const void *kcpu_queue, - u64 map_import_buf_gpu_addr + u64 cqs_obj_gpu_addr, + u64 compare_value, + u32 condition, + u32 data_type, + u32 inherit_error ) { - const u32 msg_id = KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_MAP_IMPORT; + const u32 msg_id = KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_CQS_WAIT_OPERATION; const size_t msg_size = sizeof(msg_id) + sizeof(u64) + sizeof(kcpu_queue) - + sizeof(map_import_buf_gpu_addr) + + sizeof(cqs_obj_gpu_addr) + + sizeof(compare_value) + + sizeof(condition) + + sizeof(data_type) + + sizeof(inherit_error) ; char *buffer; unsigned long acq_flags; @@ -2481,21 +2614,35 @@ void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_map_import( pos = kbasep_serialize_bytes(buffer, pos, &kcpu_queue, sizeof(kcpu_queue)); pos = kbasep_serialize_bytes(buffer, - pos, &map_import_buf_gpu_addr, sizeof(map_import_buf_gpu_addr)); + pos, &cqs_obj_gpu_addr, sizeof(cqs_obj_gpu_addr)); + pos = kbasep_serialize_bytes(buffer, + pos, &compare_value, sizeof(compare_value)); + pos = kbasep_serialize_bytes(buffer, + pos, &condition, sizeof(condition)); + pos = kbasep_serialize_bytes(buffer, + pos, &data_type, sizeof(data_type)); + pos = kbasep_serialize_bytes(buffer, + pos, &inherit_error, sizeof(inherit_error)); kbase_tlstream_msgbuf_release(stream, acq_flags); } -void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_unmap_import( +void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_cqs_set_operation( struct kbase_tlstream *stream, const void *kcpu_queue, - u64 map_import_buf_gpu_addr + u64 cqs_obj_gpu_addr, + u64 value, + u32 operation, + u32 data_type ) { - const u32 msg_id = KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_UNMAP_IMPORT; + const u32 msg_id = KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_CQS_SET_OPERATION; const size_t msg_size = sizeof(msg_id) + sizeof(u64) + sizeof(kcpu_queue) - + sizeof(map_import_buf_gpu_addr) + + sizeof(cqs_obj_gpu_addr) + + sizeof(value) + + sizeof(operation) + + sizeof(data_type) ; char *buffer; unsigned long acq_flags; @@ -2508,18 +2655,24 @@ void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_unmap_import( pos = kbasep_serialize_bytes(buffer, pos, &kcpu_queue, sizeof(kcpu_queue)); pos = kbasep_serialize_bytes(buffer, - pos, &map_import_buf_gpu_addr, sizeof(map_import_buf_gpu_addr)); + pos, &cqs_obj_gpu_addr, sizeof(cqs_obj_gpu_addr)); + pos = kbasep_serialize_bytes(buffer, + pos, &value, sizeof(value)); + pos = kbasep_serialize_bytes(buffer, + pos, &operation, sizeof(operation)); + pos = kbasep_serialize_bytes(buffer, + pos, &data_type, sizeof(data_type)); kbase_tlstream_msgbuf_release(stream, acq_flags); } -void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_unmap_import_force( +void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_map_import( struct kbase_tlstream *stream, const void *kcpu_queue, u64 map_import_buf_gpu_addr ) { - const u32 msg_id = KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_UNMAP_IMPORT_FORCE; + const u32 msg_id = KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_MAP_IMPORT; const size_t msg_size = sizeof(msg_id) + sizeof(u64) + sizeof(kcpu_queue) + sizeof(map_import_buf_gpu_addr) @@ -2540,14 +2693,16 @@ void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_unmap_import_force( kbase_tlstream_msgbuf_release(stream, acq_flags); } -void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_error_barrier( +void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_unmap_import( struct kbase_tlstream *stream, - const void 
*kcpu_queue + const void *kcpu_queue, + u64 map_import_buf_gpu_addr ) { - const u32 msg_id = KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_ERROR_BARRIER; + const u32 msg_id = KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_UNMAP_IMPORT; const size_t msg_size = sizeof(msg_id) + sizeof(u64) + sizeof(kcpu_queue) + + sizeof(map_import_buf_gpu_addr) ; char *buffer; unsigned long acq_flags; @@ -2559,22 +2714,22 @@ void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_error_barrier( pos = kbasep_serialize_timestamp(buffer, pos); pos = kbasep_serialize_bytes(buffer, pos, &kcpu_queue, sizeof(kcpu_queue)); + pos = kbasep_serialize_bytes(buffer, + pos, &map_import_buf_gpu_addr, sizeof(map_import_buf_gpu_addr)); kbase_tlstream_msgbuf_release(stream, acq_flags); } -void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_group_suspend( +void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_unmap_import_force( struct kbase_tlstream *stream, const void *kcpu_queue, - const void *group_suspend_buf, - u32 gpu_cmdq_grp_handle + u64 map_import_buf_gpu_addr ) { - const u32 msg_id = KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_GROUP_SUSPEND; + const u32 msg_id = KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_UNMAP_IMPORT_FORCE; const size_t msg_size = sizeof(msg_id) + sizeof(u64) + sizeof(kcpu_queue) - + sizeof(group_suspend_buf) - + sizeof(gpu_cmdq_grp_handle) + + sizeof(map_import_buf_gpu_addr) ; char *buffer; unsigned long acq_flags; @@ -2587,9 +2742,7 @@ void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_group_suspend( pos = kbasep_serialize_bytes(buffer, pos, &kcpu_queue, sizeof(kcpu_queue)); pos = kbasep_serialize_bytes(buffer, - pos, &group_suspend_buf, sizeof(group_suspend_buf)); - pos = kbasep_serialize_bytes(buffer, - pos, &gpu_cmdq_grp_handle, sizeof(gpu_cmdq_grp_handle)); + pos, &map_import_buf_gpu_addr, sizeof(map_import_buf_gpu_addr)); kbase_tlstream_msgbuf_release(stream, acq_flags); } @@ -2772,6 +2925,60 @@ void __kbase_tlstream_tl_kbase_array_end_kcpuqueue_enqueue_jit_free( kbase_tlstream_msgbuf_release(stream, acq_flags); } +void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_error_barrier( + struct kbase_tlstream *stream, + const void *kcpu_queue +) +{ + const u32 msg_id = KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_ERROR_BARRIER; + const size_t msg_size = sizeof(msg_id) + sizeof(u64) + + sizeof(kcpu_queue) + ; + char *buffer; + unsigned long acq_flags; + size_t pos = 0; + + buffer = kbase_tlstream_msgbuf_acquire(stream, msg_size, &acq_flags); + + pos = kbasep_serialize_bytes(buffer, pos, &msg_id, sizeof(msg_id)); + pos = kbasep_serialize_timestamp(buffer, pos); + pos = kbasep_serialize_bytes(buffer, + pos, &kcpu_queue, sizeof(kcpu_queue)); + + kbase_tlstream_msgbuf_release(stream, acq_flags); +} + +void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_group_suspend( + struct kbase_tlstream *stream, + const void *kcpu_queue, + const void *group_suspend_buf, + u32 gpu_cmdq_grp_handle +) +{ + const u32 msg_id = KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_GROUP_SUSPEND; + const size_t msg_size = sizeof(msg_id) + sizeof(u64) + + sizeof(kcpu_queue) + + sizeof(group_suspend_buf) + + sizeof(gpu_cmdq_grp_handle) + ; + char *buffer; + unsigned long acq_flags; + size_t pos = 0; + + buffer = kbase_tlstream_msgbuf_acquire(stream, msg_size, &acq_flags); + + pos = kbasep_serialize_bytes(buffer, pos, &msg_id, sizeof(msg_id)); + pos = kbasep_serialize_timestamp(buffer, pos); + pos = kbasep_serialize_bytes(buffer, + pos, &kcpu_queue, sizeof(kcpu_queue)); + pos = kbasep_serialize_bytes(buffer, + pos, &group_suspend_buf, sizeof(group_suspend_buf)); + pos = kbasep_serialize_bytes(buffer, + pos, 
&gpu_cmdq_grp_handle, sizeof(gpu_cmdq_grp_handle)); + + kbase_tlstream_msgbuf_release(stream, acq_flags); +} + void __kbase_tlstream_tl_kbase_kcpuqueue_execute_fence_signal_start( struct kbase_tlstream *stream, const void *kcpu_queue @@ -2949,6 +3156,83 @@ void __kbase_tlstream_tl_kbase_kcpuqueue_execute_cqs_set( kbase_tlstream_msgbuf_release(stream, acq_flags); } +void __kbase_tlstream_tl_kbase_kcpuqueue_execute_cqs_wait_operation_start( + struct kbase_tlstream *stream, + const void *kcpu_queue +) +{ + const u32 msg_id = KBASE_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_WAIT_OPERATION_START; + const size_t msg_size = sizeof(msg_id) + sizeof(u64) + + sizeof(kcpu_queue) + ; + char *buffer; + unsigned long acq_flags; + size_t pos = 0; + + buffer = kbase_tlstream_msgbuf_acquire(stream, msg_size, &acq_flags); + + pos = kbasep_serialize_bytes(buffer, pos, &msg_id, sizeof(msg_id)); + pos = kbasep_serialize_timestamp(buffer, pos); + pos = kbasep_serialize_bytes(buffer, + pos, &kcpu_queue, sizeof(kcpu_queue)); + + kbase_tlstream_msgbuf_release(stream, acq_flags); +} + +void __kbase_tlstream_tl_kbase_kcpuqueue_execute_cqs_wait_operation_end( + struct kbase_tlstream *stream, + const void *kcpu_queue, + u32 execute_error +) +{ + const u32 msg_id = KBASE_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_WAIT_OPERATION_END; + const size_t msg_size = sizeof(msg_id) + sizeof(u64) + + sizeof(kcpu_queue) + + sizeof(execute_error) + ; + char *buffer; + unsigned long acq_flags; + size_t pos = 0; + + buffer = kbase_tlstream_msgbuf_acquire(stream, msg_size, &acq_flags); + + pos = kbasep_serialize_bytes(buffer, pos, &msg_id, sizeof(msg_id)); + pos = kbasep_serialize_timestamp(buffer, pos); + pos = kbasep_serialize_bytes(buffer, + pos, &kcpu_queue, sizeof(kcpu_queue)); + pos = kbasep_serialize_bytes(buffer, + pos, &execute_error, sizeof(execute_error)); + + kbase_tlstream_msgbuf_release(stream, acq_flags); +} + +void __kbase_tlstream_tl_kbase_kcpuqueue_execute_cqs_set_operation( + struct kbase_tlstream *stream, + const void *kcpu_queue, + u32 execute_error +) +{ + const u32 msg_id = KBASE_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_SET_OPERATION; + const size_t msg_size = sizeof(msg_id) + sizeof(u64) + + sizeof(kcpu_queue) + + sizeof(execute_error) + ; + char *buffer; + unsigned long acq_flags; + size_t pos = 0; + + buffer = kbase_tlstream_msgbuf_acquire(stream, msg_size, &acq_flags); + + pos = kbasep_serialize_bytes(buffer, pos, &msg_id, sizeof(msg_id)); + pos = kbasep_serialize_timestamp(buffer, pos); + pos = kbasep_serialize_bytes(buffer, + pos, &kcpu_queue, sizeof(kcpu_queue)); + pos = kbasep_serialize_bytes(buffer, + pos, &execute_error, sizeof(execute_error)); + + kbase_tlstream_msgbuf_release(stream, acq_flags); +} + void __kbase_tlstream_tl_kbase_kcpuqueue_execute_map_import_start( struct kbase_tlstream *stream, const void *kcpu_queue diff --git a/mali_kbase/tl/mali_kbase_tracepoints.h b/mali_kbase/tl/mali_kbase_tracepoints.h index b15fe6a..f1f4761 100644 --- a/mali_kbase/tl/mali_kbase_tracepoints.h +++ b/mali_kbase/tl/mali_kbase_tracepoints.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2010-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2010-2023 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -77,7 +77,7 @@ extern const size_t aux_desc_header_size; #define TL_JS_EVENT_STOP GATOR_JOB_SLOT_STOP #define TL_JS_EVENT_SOFT_STOP GATOR_JOB_SLOT_SOFT_STOPPED -#define TLSTREAM_ENABLED (1 << 31) +#define TLSTREAM_ENABLED (1u << 31) void __kbase_tlstream_tl_new_ctx( struct kbase_tlstream *stream, @@ -396,13 +396,19 @@ void __kbase_tlstream_tl_kbase_new_device( u32 kbase_device_supports_gpu_sleep ); +void __kbase_tlstream_tl_kbase_gpucmdqueue_kick( + struct kbase_tlstream *stream, + u32 kernel_ctx_id, + u64 buffer_gpu_addr +); + void __kbase_tlstream_tl_kbase_device_program_csg( struct kbase_tlstream *stream, u32 kbase_device_id, u32 kernel_ctx_id, u32 gpu_cmdq_grp_handle, u32 kbase_device_csg_slot_index, - u32 kbase_device_csg_slot_resumed + u32 kbase_device_csg_slot_resuming ); void __kbase_tlstream_tl_kbase_device_deprogram_csg( @@ -411,7 +417,20 @@ void __kbase_tlstream_tl_kbase_device_deprogram_csg( u32 kbase_device_csg_slot_index ); -void __kbase_tlstream_tl_kbase_device_halt_csg( +void __kbase_tlstream_tl_kbase_device_halting_csg( + struct kbase_tlstream *stream, + u32 kbase_device_id, + u32 kbase_device_csg_slot_index, + u32 kbase_device_csg_slot_suspending +); + +void __kbase_tlstream_tl_kbase_device_suspend_csg( + struct kbase_tlstream *stream, + u32 kbase_device_id, + u32 kbase_device_csg_slot_index +); + +void __kbase_tlstream_tl_kbase_device_csg_idle( struct kbase_tlstream *stream, u32 kbase_device_id, u32 kbase_device_csg_slot_index @@ -468,8 +487,8 @@ void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_cqs_wait( struct kbase_tlstream *stream, const void *kcpu_queue, u64 cqs_obj_gpu_addr, - u32 cqs_obj_compare_value, - u32 cqs_obj_inherit_error + u32 compare_value, + u32 inherit_error ); void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_cqs_set( @@ -478,34 +497,41 @@ void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_cqs_set( u64 cqs_obj_gpu_addr ); -void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_map_import( +void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_cqs_wait_operation( struct kbase_tlstream *stream, const void *kcpu_queue, - u64 map_import_buf_gpu_addr + u64 cqs_obj_gpu_addr, + u64 compare_value, + u32 condition, + u32 data_type, + u32 inherit_error ); -void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_unmap_import( +void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_cqs_set_operation( struct kbase_tlstream *stream, const void *kcpu_queue, - u64 map_import_buf_gpu_addr + u64 cqs_obj_gpu_addr, + u64 value, + u32 operation, + u32 data_type ); -void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_unmap_import_force( +void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_map_import( struct kbase_tlstream *stream, const void *kcpu_queue, u64 map_import_buf_gpu_addr ); -void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_error_barrier( +void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_unmap_import( struct kbase_tlstream *stream, - const void *kcpu_queue + const void *kcpu_queue, + u64 map_import_buf_gpu_addr ); -void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_group_suspend( +void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_unmap_import_force( struct kbase_tlstream *stream, const void *kcpu_queue, - const void *group_suspend_buf, - u32 gpu_cmdq_grp_handle + u64 map_import_buf_gpu_addr ); void __kbase_tlstream_tl_kbase_array_begin_kcpuqueue_enqueue_jit_alloc( @@ -548,6 +574,18 @@ void 
__kbase_tlstream_tl_kbase_array_end_kcpuqueue_enqueue_jit_free( const void *kcpu_queue ); +void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_error_barrier( + struct kbase_tlstream *stream, + const void *kcpu_queue +); + +void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_group_suspend( + struct kbase_tlstream *stream, + const void *kcpu_queue, + const void *group_suspend_buf, + u32 gpu_cmdq_grp_handle +); + void __kbase_tlstream_tl_kbase_kcpuqueue_execute_fence_signal_start( struct kbase_tlstream *stream, const void *kcpu_queue @@ -587,6 +625,23 @@ void __kbase_tlstream_tl_kbase_kcpuqueue_execute_cqs_set( u32 execute_error ); +void __kbase_tlstream_tl_kbase_kcpuqueue_execute_cqs_wait_operation_start( + struct kbase_tlstream *stream, + const void *kcpu_queue +); + +void __kbase_tlstream_tl_kbase_kcpuqueue_execute_cqs_wait_operation_end( + struct kbase_tlstream *stream, + const void *kcpu_queue, + u32 execute_error +); + +void __kbase_tlstream_tl_kbase_kcpuqueue_execute_cqs_set_operation( + struct kbase_tlstream *stream, + const void *kcpu_queue, + u32 execute_error +); + void __kbase_tlstream_tl_kbase_kcpuqueue_execute_map_import_start( struct kbase_tlstream *stream, const void *kcpu_queue @@ -1686,7 +1741,7 @@ struct kbase_tlstream; } while (0) /** - * KBASE_TLSTREAM_TL_JD_DONE_NO_LOCK_START - Within function jd_done_nolock + * KBASE_TLSTREAM_TL_JD_DONE_NO_LOCK_START - Within function kbase_jd_done_nolock * * @kbdev: Kbase device * @atom: Atom identifier @@ -1705,7 +1760,7 @@ struct kbase_tlstream; } while (0) /** - * KBASE_TLSTREAM_TL_JD_DONE_NO_LOCK_END - Within function jd_done_nolock - end + * KBASE_TLSTREAM_TL_JD_DONE_NO_LOCK_END - Within function kbase_jd_done_nolock - end * * @kbdev: Kbase device * @atom: Atom identifier @@ -1982,6 +2037,37 @@ struct kbase_tlstream; #endif /* MALI_USE_CSF */ /** + * KBASE_TLSTREAM_TL_KBASE_GPUCMDQUEUE_KICK - Kernel receives a request to process new GPU queue instructions + * + * @kbdev: Kbase device + * @kernel_ctx_id: Unique ID for the KBase Context + * @buffer_gpu_addr: Address of the GPU queue's command buffer + */ +#if MALI_USE_CSF +#define KBASE_TLSTREAM_TL_KBASE_GPUCMDQUEUE_KICK( \ + kbdev, \ + kernel_ctx_id, \ + buffer_gpu_addr \ + ) \ + do { \ + int enabled = atomic_read(&kbdev->timeline_flags); \ + if (enabled & BASE_TLSTREAM_ENABLE_CSF_TRACEPOINTS) \ + __kbase_tlstream_tl_kbase_gpucmdqueue_kick( \ + __TL_DISPATCH_STREAM(kbdev, obj), \ + kernel_ctx_id, \ + buffer_gpu_addr \ + ); \ + } while (0) +#else +#define KBASE_TLSTREAM_TL_KBASE_GPUCMDQUEUE_KICK( \ + kbdev, \ + kernel_ctx_id, \ + buffer_gpu_addr \ + ) \ + do { } while (0) +#endif /* MALI_USE_CSF */ + +/** * KBASE_TLSTREAM_TL_KBASE_DEVICE_PROGRAM_CSG - CSG is programmed to a slot * * @kbdev: Kbase device @@ -1989,7 +2075,7 @@ struct kbase_tlstream; * @kernel_ctx_id: Unique ID for the KBase Context * @gpu_cmdq_grp_handle: GPU Command Queue Group handle which will match userspace * @kbase_device_csg_slot_index: The index of the slot in the scheduler being programmed - * @kbase_device_csg_slot_resumed: Whether the csg is being resumed + * @kbase_device_csg_slot_resuming: Whether the csg is being resumed */ #if MALI_USE_CSF #define KBASE_TLSTREAM_TL_KBASE_DEVICE_PROGRAM_CSG( \ @@ -1998,7 +2084,7 @@ struct kbase_tlstream; kernel_ctx_id, \ gpu_cmdq_grp_handle, \ kbase_device_csg_slot_index, \ - kbase_device_csg_slot_resumed \ + kbase_device_csg_slot_resuming \ ) \ do { \ int enabled = atomic_read(&kbdev->timeline_flags); \ @@ -2009,7 +2095,7 @@ struct kbase_tlstream; kernel_ctx_id, \ 
gpu_cmdq_grp_handle, \ kbase_device_csg_slot_index, \ - kbase_device_csg_slot_resumed \ + kbase_device_csg_slot_resuming \ ); \ } while (0) #else @@ -2019,7 +2105,7 @@ struct kbase_tlstream; kernel_ctx_id, \ gpu_cmdq_grp_handle, \ kbase_device_csg_slot_index, \ - kbase_device_csg_slot_resumed \ + kbase_device_csg_slot_resuming \ ) \ do { } while (0) #endif /* MALI_USE_CSF */ @@ -2029,7 +2115,7 @@ struct kbase_tlstream; * * @kbdev: Kbase device * @kbase_device_id: The ID of the physical hardware - * @kbase_device_csg_slot_index: The index of the slot in the scheduler being programmed + * @kbase_device_csg_slot_index: The index of the slot in the scheduler whose CSG is being deprogrammed */ #if MALI_USE_CSF #define KBASE_TLSTREAM_TL_KBASE_DEVICE_DEPROGRAM_CSG( \ @@ -2056,14 +2142,80 @@ struct kbase_tlstream; #endif /* MALI_USE_CSF */ /** - * KBASE_TLSTREAM_TL_KBASE_DEVICE_HALT_CSG - CSG is halted + * KBASE_TLSTREAM_TL_KBASE_DEVICE_HALTING_CSG - CSG is halting * * @kbdev: Kbase device * @kbase_device_id: The ID of the physical hardware - * @kbase_device_csg_slot_index: The index of the slot in the scheduler being programmed + * @kbase_device_csg_slot_index: The index of the slot in the scheduler whose CSG is being halted + * @kbase_device_csg_slot_suspending: Whether the csg is being suspended + */ +#if MALI_USE_CSF +#define KBASE_TLSTREAM_TL_KBASE_DEVICE_HALTING_CSG( \ + kbdev, \ + kbase_device_id, \ + kbase_device_csg_slot_index, \ + kbase_device_csg_slot_suspending \ + ) \ + do { \ + int enabled = atomic_read(&kbdev->timeline_flags); \ + if (enabled & BASE_TLSTREAM_ENABLE_CSF_TRACEPOINTS) \ + __kbase_tlstream_tl_kbase_device_halting_csg( \ + __TL_DISPATCH_STREAM(kbdev, obj), \ + kbase_device_id, \ + kbase_device_csg_slot_index, \ + kbase_device_csg_slot_suspending \ + ); \ + } while (0) +#else +#define KBASE_TLSTREAM_TL_KBASE_DEVICE_HALTING_CSG( \ + kbdev, \ + kbase_device_id, \ + kbase_device_csg_slot_index, \ + kbase_device_csg_slot_suspending \ + ) \ + do { } while (0) +#endif /* MALI_USE_CSF */ + +/** + * KBASE_TLSTREAM_TL_KBASE_DEVICE_SUSPEND_CSG - CSG is suspended + * + * @kbdev: Kbase device + * @kbase_device_id: The ID of the physical hardware + * @kbase_device_csg_slot_index: The index of the slot in the scheduler whose CSG is being suspended + */ +#if MALI_USE_CSF +#define KBASE_TLSTREAM_TL_KBASE_DEVICE_SUSPEND_CSG( \ + kbdev, \ + kbase_device_id, \ + kbase_device_csg_slot_index \ + ) \ + do { \ + int enabled = atomic_read(&kbdev->timeline_flags); \ + if (enabled & BASE_TLSTREAM_ENABLE_CSF_TRACEPOINTS) \ + __kbase_tlstream_tl_kbase_device_suspend_csg( \ + __TL_DISPATCH_STREAM(kbdev, obj), \ + kbase_device_id, \ + kbase_device_csg_slot_index \ + ); \ + } while (0) +#else +#define KBASE_TLSTREAM_TL_KBASE_DEVICE_SUSPEND_CSG( \ + kbdev, \ + kbase_device_id, \ + kbase_device_csg_slot_index \ + ) \ + do { } while (0) +#endif /* MALI_USE_CSF */ + +/** + * KBASE_TLSTREAM_TL_KBASE_DEVICE_CSG_IDLE - KBase device is notified that CSG is idle. 
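Every tracepoint macro in this header carries the same gate: on CSF builds it reads kbdev->timeline_flags and dispatches to the emitter only when the CSF tracepoint bit is set, while on non-CSF builds the whole macro collapses to do { } while (0), so call sites need no #ifdefs. A hypothetical call site, assuming this header is included and the device ID and slot index are passed in by the caller:

    /* Hypothetical call site; compiles to nothing when MALI_USE_CSF is 0. */
    static void demo_on_csg_suspended(struct kbase_device *kbdev, u32 dev_id, u32 slot)
    {
            KBASE_TLSTREAM_TL_KBASE_DEVICE_SUSPEND_CSG(kbdev, dev_id, slot);
    }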
+ * + * @kbdev: Kbase device + * @kbase_device_id: The ID of the physical hardware + * @kbase_device_csg_slot_index: The index of the slot in the scheduler whose CSG for which we are receiving an idle notification */ #if MALI_USE_CSF -#define KBASE_TLSTREAM_TL_KBASE_DEVICE_HALT_CSG( \ +#define KBASE_TLSTREAM_TL_KBASE_DEVICE_CSG_IDLE( \ kbdev, \ kbase_device_id, \ kbase_device_csg_slot_index \ @@ -2071,14 +2223,14 @@ struct kbase_tlstream; do { \ int enabled = atomic_read(&kbdev->timeline_flags); \ if (enabled & BASE_TLSTREAM_ENABLE_CSF_TRACEPOINTS) \ - __kbase_tlstream_tl_kbase_device_halt_csg( \ + __kbase_tlstream_tl_kbase_device_csg_idle( \ __TL_DISPATCH_STREAM(kbdev, obj), \ kbase_device_id, \ kbase_device_csg_slot_index \ ); \ } while (0) #else -#define KBASE_TLSTREAM_TL_KBASE_DEVICE_HALT_CSG( \ +#define KBASE_TLSTREAM_TL_KBASE_DEVICE_CSG_IDLE( \ kbdev, \ kbase_device_id, \ kbase_device_csg_slot_index \ @@ -2336,16 +2488,16 @@ struct kbase_tlstream; * @kbdev: Kbase device * @kcpu_queue: KCPU queue * @cqs_obj_gpu_addr: CQS Object GPU pointer - * @cqs_obj_compare_value: Semaphore value that should be exceeded for the WAIT to pass - * @cqs_obj_inherit_error: Flag which indicates if the CQS object error state should be inherited by the queue + * @compare_value: Semaphore value that should be exceeded for the WAIT to pass + * @inherit_error: Flag which indicates if the CQS object error state should be inherited by the queue */ #if MALI_USE_CSF #define KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_CQS_WAIT( \ kbdev, \ kcpu_queue, \ cqs_obj_gpu_addr, \ - cqs_obj_compare_value, \ - cqs_obj_inherit_error \ + compare_value, \ + inherit_error \ ) \ do { \ int enabled = atomic_read(&kbdev->timeline_flags); \ @@ -2354,8 +2506,8 @@ struct kbase_tlstream; __TL_DISPATCH_STREAM(kbdev, obj), \ kcpu_queue, \ cqs_obj_gpu_addr, \ - cqs_obj_compare_value, \ - cqs_obj_inherit_error \ + compare_value, \ + inherit_error \ ); \ } while (0) #else @@ -2363,8 +2515,8 @@ struct kbase_tlstream; kbdev, \ kcpu_queue, \ cqs_obj_gpu_addr, \ - cqs_obj_compare_value, \ - cqs_obj_inherit_error \ + compare_value, \ + inherit_error \ ) \ do { } while (0) #endif /* MALI_USE_CSF */ @@ -2401,76 +2553,104 @@ struct kbase_tlstream; #endif /* MALI_USE_CSF */ /** - * KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_MAP_IMPORT - KCPU Queue enqueues Map Import + * KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_CQS_WAIT_OPERATION - KCPU Queue enqueues Wait Operation on Cross Queue Sync Object * * @kbdev: Kbase device * @kcpu_queue: KCPU queue - * @map_import_buf_gpu_addr: Map import buffer GPU pointer + * @cqs_obj_gpu_addr: CQS Object GPU pointer + * @compare_value: Value that should be compared to semaphore value for the WAIT to pass + * @condition: Condition for unblocking WAITs on Timeline Cross Queue Sync Object (e.g. 
greater than, less or equal) + * @data_type: Data type of a CQS Object's value + * @inherit_error: Flag which indicates if the CQS object error state should be inherited by the queue */ #if MALI_USE_CSF -#define KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_MAP_IMPORT( \ +#define KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_CQS_WAIT_OPERATION( \ kbdev, \ kcpu_queue, \ - map_import_buf_gpu_addr \ + cqs_obj_gpu_addr, \ + compare_value, \ + condition, \ + data_type, \ + inherit_error \ ) \ do { \ int enabled = atomic_read(&kbdev->timeline_flags); \ if (enabled & BASE_TLSTREAM_ENABLE_CSF_TRACEPOINTS) \ - __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_map_import( \ + __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_cqs_wait_operation( \ __TL_DISPATCH_STREAM(kbdev, obj), \ kcpu_queue, \ - map_import_buf_gpu_addr \ + cqs_obj_gpu_addr, \ + compare_value, \ + condition, \ + data_type, \ + inherit_error \ ); \ } while (0) #else -#define KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_MAP_IMPORT( \ +#define KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_CQS_WAIT_OPERATION( \ kbdev, \ kcpu_queue, \ - map_import_buf_gpu_addr \ + cqs_obj_gpu_addr, \ + compare_value, \ + condition, \ + data_type, \ + inherit_error \ ) \ do { } while (0) #endif /* MALI_USE_CSF */ /** - * KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_UNMAP_IMPORT - KCPU Queue enqueues Unmap Import + * KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_CQS_SET_OPERATION - KCPU Queue enqueues Set Operation on Cross Queue Sync Object * * @kbdev: Kbase device * @kcpu_queue: KCPU queue - * @map_import_buf_gpu_addr: Map import buffer GPU pointer + * @cqs_obj_gpu_addr: CQS Object GPU pointer + * @value: Value that will be set or added to semaphore + * @operation: Operation type performed on semaphore value (SET or ADD) + * @data_type: Data type of a CQS Object's value */ #if MALI_USE_CSF -#define KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_UNMAP_IMPORT( \ +#define KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_CQS_SET_OPERATION( \ kbdev, \ kcpu_queue, \ - map_import_buf_gpu_addr \ + cqs_obj_gpu_addr, \ + value, \ + operation, \ + data_type \ ) \ do { \ int enabled = atomic_read(&kbdev->timeline_flags); \ if (enabled & BASE_TLSTREAM_ENABLE_CSF_TRACEPOINTS) \ - __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_unmap_import( \ + __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_cqs_set_operation( \ __TL_DISPATCH_STREAM(kbdev, obj), \ kcpu_queue, \ - map_import_buf_gpu_addr \ + cqs_obj_gpu_addr, \ + value, \ + operation, \ + data_type \ ); \ } while (0) #else -#define KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_UNMAP_IMPORT( \ +#define KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_CQS_SET_OPERATION( \ kbdev, \ kcpu_queue, \ - map_import_buf_gpu_addr \ + cqs_obj_gpu_addr, \ + value, \ + operation, \ + data_type \ ) \ do { } while (0) #endif /* MALI_USE_CSF */ /** - * KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_UNMAP_IMPORT_FORCE - KCPU Queue enqueues Unmap Import ignoring reference count + * KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_MAP_IMPORT - KCPU Queue enqueues Map Import * * @kbdev: Kbase device * @kcpu_queue: KCPU queue * @map_import_buf_gpu_addr: Map import buffer GPU pointer */ #if MALI_USE_CSF -#define KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_UNMAP_IMPORT_FORCE( \ +#define KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_MAP_IMPORT( \ kbdev, \ kcpu_queue, \ map_import_buf_gpu_addr \ @@ -2478,14 +2658,14 @@ struct kbase_tlstream; do { \ int enabled = atomic_read(&kbdev->timeline_flags); \ if (enabled & BASE_TLSTREAM_ENABLE_CSF_TRACEPOINTS) \ - __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_unmap_import_force( \ + 
__kbase_tlstream_tl_kbase_kcpuqueue_enqueue_map_import( \ __TL_DISPATCH_STREAM(kbdev, obj), \ kcpu_queue, \ map_import_buf_gpu_addr \ ); \ } while (0) #else -#define KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_UNMAP_IMPORT_FORCE( \ +#define KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_MAP_IMPORT( \ kbdev, \ kcpu_queue, \ map_import_buf_gpu_addr \ @@ -2494,63 +2674,63 @@ struct kbase_tlstream; #endif /* MALI_USE_CSF */ /** - * KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_ERROR_BARRIER - KCPU Queue enqueues Error Barrier + * KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_UNMAP_IMPORT - KCPU Queue enqueues Unmap Import * * @kbdev: Kbase device * @kcpu_queue: KCPU queue + * @map_import_buf_gpu_addr: Map import buffer GPU pointer */ #if MALI_USE_CSF -#define KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_ERROR_BARRIER( \ +#define KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_UNMAP_IMPORT( \ kbdev, \ - kcpu_queue \ + kcpu_queue, \ + map_import_buf_gpu_addr \ ) \ do { \ int enabled = atomic_read(&kbdev->timeline_flags); \ if (enabled & BASE_TLSTREAM_ENABLE_CSF_TRACEPOINTS) \ - __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_error_barrier( \ + __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_unmap_import( \ __TL_DISPATCH_STREAM(kbdev, obj), \ - kcpu_queue \ + kcpu_queue, \ + map_import_buf_gpu_addr \ ); \ } while (0) #else -#define KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_ERROR_BARRIER( \ +#define KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_UNMAP_IMPORT( \ kbdev, \ - kcpu_queue \ + kcpu_queue, \ + map_import_buf_gpu_addr \ ) \ do { } while (0) #endif /* MALI_USE_CSF */ /** - * KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_GROUP_SUSPEND - KCPU Queue enqueues Group Suspend + * KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_UNMAP_IMPORT_FORCE - KCPU Queue enqueues Unmap Import ignoring reference count * * @kbdev: Kbase device * @kcpu_queue: KCPU queue - * @group_suspend_buf: Pointer to the suspend buffer structure - * @gpu_cmdq_grp_handle: GPU Command Queue Group handle which will match userspace + * @map_import_buf_gpu_addr: Map import buffer GPU pointer */ #if MALI_USE_CSF -#define KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_GROUP_SUSPEND( \ +#define KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_UNMAP_IMPORT_FORCE( \ kbdev, \ kcpu_queue, \ - group_suspend_buf, \ - gpu_cmdq_grp_handle \ + map_import_buf_gpu_addr \ ) \ do { \ int enabled = atomic_read(&kbdev->timeline_flags); \ if (enabled & BASE_TLSTREAM_ENABLE_CSF_TRACEPOINTS) \ - __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_group_suspend( \ + __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_unmap_import_force( \ __TL_DISPATCH_STREAM(kbdev, obj), \ kcpu_queue, \ - group_suspend_buf, \ - gpu_cmdq_grp_handle \ + map_import_buf_gpu_addr \ ); \ } while (0) #else -#define KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_GROUP_SUSPEND( \ +#define KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_UNMAP_IMPORT_FORCE( \ kbdev, \ kcpu_queue, \ - group_suspend_buf, \ - gpu_cmdq_grp_handle \ + map_import_buf_gpu_addr \ ) \ do { } while (0) #endif /* MALI_USE_CSF */ @@ -2758,6 +2938,68 @@ struct kbase_tlstream; #endif /* MALI_USE_CSF */ /** + * KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_ERROR_BARRIER - KCPU Queue enqueues Error Barrier + * + * @kbdev: Kbase device + * @kcpu_queue: KCPU queue + */ +#if MALI_USE_CSF +#define KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_ERROR_BARRIER( \ + kbdev, \ + kcpu_queue \ + ) \ + do { \ + int enabled = atomic_read(&kbdev->timeline_flags); \ + if (enabled & BASE_TLSTREAM_ENABLE_CSF_TRACEPOINTS) \ + __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_error_barrier( \ + __TL_DISPATCH_STREAM(kbdev, obj), \ 
+ kcpu_queue \ + ); \ + } while (0) +#else +#define KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_ERROR_BARRIER( \ + kbdev, \ + kcpu_queue \ + ) \ + do { } while (0) +#endif /* MALI_USE_CSF */ + +/** + * KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_GROUP_SUSPEND - KCPU Queue enqueues Group Suspend + * + * @kbdev: Kbase device + * @kcpu_queue: KCPU queue + * @group_suspend_buf: Pointer to the suspend buffer structure + * @gpu_cmdq_grp_handle: GPU Command Queue Group handle which will match userspace + */ +#if MALI_USE_CSF +#define KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_GROUP_SUSPEND( \ + kbdev, \ + kcpu_queue, \ + group_suspend_buf, \ + gpu_cmdq_grp_handle \ + ) \ + do { \ + int enabled = atomic_read(&kbdev->timeline_flags); \ + if (enabled & BASE_TLSTREAM_ENABLE_CSF_TRACEPOINTS) \ + __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_group_suspend( \ + __TL_DISPATCH_STREAM(kbdev, obj), \ + kcpu_queue, \ + group_suspend_buf, \ + gpu_cmdq_grp_handle \ + ); \ + } while (0) +#else +#define KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_GROUP_SUSPEND( \ + kbdev, \ + kcpu_queue, \ + group_suspend_buf, \ + gpu_cmdq_grp_handle \ + ) \ + do { } while (0) +#endif /* MALI_USE_CSF */ + +/** * KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_EXECUTE_FENCE_SIGNAL_START - KCPU Queue starts a Signal on Fence * * @kbdev: Kbase device @@ -2874,7 +3116,7 @@ struct kbase_tlstream; #endif /* MALI_USE_CSF */ /** - * KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_WAIT_START - KCPU Queue starts a Wait on an array of Cross Queue Sync Objects + * KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_WAIT_START - KCPU Queue starts a Wait on Cross Queue Sync Object * * @kbdev: Kbase device * @kcpu_queue: KCPU queue @@ -2901,7 +3143,7 @@ struct kbase_tlstream; #endif /* MALI_USE_CSF */ /** - * KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_WAIT_END - KCPU Queue ends a Wait on an array of Cross Queue Sync Objects + * KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_WAIT_END - KCPU Queue ends a Wait on Cross Queue Sync Object * * @kbdev: Kbase device * @kcpu_queue: KCPU queue @@ -2932,7 +3174,7 @@ struct kbase_tlstream; #endif /* MALI_USE_CSF */ /** - * KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_SET - KCPU Queue executes a Set on an array of Cross Queue Sync Objects + * KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_SET - KCPU Queue executes a Set on Cross Queue Sync Object * * @kbdev: Kbase device * @kcpu_queue: KCPU queue @@ -2963,6 +3205,95 @@ struct kbase_tlstream; #endif /* MALI_USE_CSF */ /** + * KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_WAIT_OPERATION_START - KCPU Queue starts a Wait Operation on Cross Queue Sync Object + * + * @kbdev: Kbase device + * @kcpu_queue: KCPU queue + */ +#if MALI_USE_CSF +#define KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_WAIT_OPERATION_START( \ + kbdev, \ + kcpu_queue \ + ) \ + do { \ + int enabled = atomic_read(&kbdev->timeline_flags); \ + if (enabled & BASE_TLSTREAM_ENABLE_CSF_TRACEPOINTS) \ + __kbase_tlstream_tl_kbase_kcpuqueue_execute_cqs_wait_operation_start( \ + __TL_DISPATCH_STREAM(kbdev, obj), \ + kcpu_queue \ + ); \ + } while (0) +#else +#define KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_WAIT_OPERATION_START( \ + kbdev, \ + kcpu_queue \ + ) \ + do { } while (0) +#endif /* MALI_USE_CSF */ + +/** + * KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_WAIT_OPERATION_END - KCPU Queue ends a Wait Operation on Cross Queue Sync Object + * + * @kbdev: Kbase device + * @kcpu_queue: KCPU queue + * @execute_error: Non-zero error code if KCPU Queue item completed with error, else zero + */ +#if MALI_USE_CSF +#define 
KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_WAIT_OPERATION_END( \ + kbdev, \ + kcpu_queue, \ + execute_error \ + ) \ + do { \ + int enabled = atomic_read(&kbdev->timeline_flags); \ + if (enabled & BASE_TLSTREAM_ENABLE_CSF_TRACEPOINTS) \ + __kbase_tlstream_tl_kbase_kcpuqueue_execute_cqs_wait_operation_end( \ + __TL_DISPATCH_STREAM(kbdev, obj), \ + kcpu_queue, \ + execute_error \ + ); \ + } while (0) +#else +#define KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_WAIT_OPERATION_END( \ + kbdev, \ + kcpu_queue, \ + execute_error \ + ) \ + do { } while (0) +#endif /* MALI_USE_CSF */ + +/** + * KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_SET_OPERATION - KCPU Queue executes a Set Operation on Cross Queue Sync Object + * + * @kbdev: Kbase device + * @kcpu_queue: KCPU queue + * @execute_error: Non-zero error code if KCPU Queue item completed with error, else zero + */ +#if MALI_USE_CSF +#define KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_SET_OPERATION( \ + kbdev, \ + kcpu_queue, \ + execute_error \ + ) \ + do { \ + int enabled = atomic_read(&kbdev->timeline_flags); \ + if (enabled & BASE_TLSTREAM_ENABLE_CSF_TRACEPOINTS) \ + __kbase_tlstream_tl_kbase_kcpuqueue_execute_cqs_set_operation( \ + __TL_DISPATCH_STREAM(kbdev, obj), \ + kcpu_queue, \ + execute_error \ + ); \ + } while (0) +#else +#define KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_SET_OPERATION( \ + kbdev, \ + kcpu_queue, \ + execute_error \ + ) \ + do { } while (0) +#endif /* MALI_USE_CSF */ + +/** * KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_EXECUTE_MAP_IMPORT_START - KCPU Queue starts a Map Import * * @kbdev: Kbase device diff --git a/mali_pixel/BUILD.bazel b/mali_pixel/BUILD.bazel index 21f1633..11b066e 100644 --- a/mali_pixel/BUILD.bazel +++ b/mali_pixel/BUILD.bazel @@ -17,6 +17,7 @@ kernel_module( ], kernel_build = "//private/google-modules/soc/gs:gs_kernel_build", visibility = [ + "//private/google-modules/gpu/mali_kbase:__pkg__", "//private/google-modules/soc/gs:__pkg__", ], deps = [ diff --git a/mali_pixel/Kbuild b/mali_pixel/Kbuild index 87e432a..4b519a9 100644 --- a/mali_pixel/Kbuild +++ b/mali_pixel/Kbuild @@ -23,21 +23,38 @@ src:=$(if $(patsubst /%,,$(src)),$(srctree)/$(src),$(src)) CONFIG_MALI_MEMORY_GROUP_MANAGER ?= m CONFIG_MALI_PRIORITY_CONTROL_MANAGER ?= m +CONFIG_MALI_PROTECTED_MEMORY_ALLOCATOR ?= m +CONFIG_MALI_PIXEL_STATS ?= m +CONFIG_MALI_PIXEL_GPU_SLC=y -DEFINES += \ - -DCONFIG_MALI_MEMORY_GROUP_MANAGER=$(CONFIG_MALI_MEMORY_GROUP_MANAGER) \ - -DCONFIG_MALI_PRIORITY_CONTROL_MANAGER=$(CONFIG_MALI_PRIORITY_CONTROL_MANAGER) +mali_pixel-objs := -# Use our defines when compiling, and include mali platform module headers -ccflags-y += $(DEFINES) -I$(src)/../common/include +ifeq ($(CONFIG_MALI_PIXEL_STATS),m) + DEFINES += -DCONFIG_MALI_PIXEL_STATS + mali_pixel-objs += mali_pixel_stats.o +endif -mali_pixel-objs := ifeq ($(CONFIG_MALI_MEMORY_GROUP_MANAGER),m) + DEFINES += -DCONFIG_MALI_MEMORY_GROUP_MANAGER mali_pixel-objs += memory_group_manager.o endif ifeq ($(CONFIG_MALI_PRIORITY_CONTROL_MANAGER),m) + DEFINES += -DCONFIG_MALI_PRIORITY_CONTROL_MANAGER mali_pixel-objs += priority_control_manager.o endif +ifeq ($(CONFIG_MALI_PROTECTED_MEMORY_ALLOCATOR),m) + DEFINES += -DCONFIG_MALI_PROTECTED_MEMORY_ALLOCATOR + mali_pixel-objs += protected_memory_allocator.o +endif +ifeq ($(CONFIG_MALI_PIXEL_GPU_SLC),y) + DEFINES += -DCONFIG_MALI_PIXEL_GPU_SLC +endif + +# Use our defines when compiling, and include mali platform module headers +ccflags-y += \ + $(DEFINES) \ + -I$(src)/../common/include \ + -I$(srctree)/include/linux # Add kernel 
module target if any of our config options is enabled ifneq ($(mali_pixel-objs),) diff --git a/mali_pixel/Kconfig b/mali_pixel/Kconfig index 2406990..10ab093 100644 --- a/mali_pixel/Kconfig +++ b/mali_pixel/Kconfig @@ -25,8 +25,22 @@ config MALI_MEMORY_GROUP_MANAGER for allocation and release of pages for memory pools managed by Mali GPU device drivers. +config MALI_MEMORY_GROUP_MANAGER_DEBUG_FS + depends on MALI_MEMORY_GROUP_MANAGER && DEBUG_FS + bool "Enable Mali memory group manager debugfs nodes" + default n + help + Enables support for memory group manager debugfs nodes + config MALI_PRIORITY_CONTROL_MANAGER tristate "MALI_PRIORITY_CONTROL_MANAGER" help This option enables an implementation of a priority control manager for determining the target GPU scheduling priority of a process. + +config MALI_PROTECTED_MEMORY_ALLOCATOR + tristate "MALI_PROTECTED_MEMORY_ALLOCATOR" + help + This option enables an implementation of a protected memory allocator + for allocation and release of pages of protected memory for use by + Mali GPU device drivers. diff --git a/mali_pixel/Makefile b/mali_pixel/Makefile index 2bff7de..7b09188 100644 --- a/mali_pixel/Makefile +++ b/mali_pixel/Makefile @@ -8,6 +8,9 @@ M ?= $(shell pwd) KBUILD_OPTIONS += CONFIG_MALI_MEMORY_GROUP_MANAGER=m KBUILD_OPTIONS += CONFIG_MALI_PRIORITY_CONTROL_MANAGER=m +KBUILD_OPTIONS += CONFIG_MALI_PROTECTED_MEMORY_ALLOCATOR=m +KBUILD_OPTIONS += CONFIG_MALI_PIXEL_STATS=m +KBUILD_OPTIONS += CONFIG_MALI_PIXEL_GPU_SLC=y KBUILD_OPTIONS += $(KBUILD_EXTRA) # Extra config if any diff --git a/mali_pixel/mali_pixel_mod.c b/mali_pixel/mali_pixel_mod.c index 1fd4865..75b6c87 100644 --- a/mali_pixel/mali_pixel_mod.c +++ b/mali_pixel/mali_pixel_mod.c @@ -8,37 +8,17 @@ MODULE_DESCRIPTION("Pixel platform integration for GPU"); MODULE_IMPORT_NS(DMA_BUF); MODULE_AUTHOR("<sidaths@google.com>"); MODULE_VERSION("1.0"); -MODULE_SOFTDEP("pre: pixel_stat_sysfs"); MODULE_SOFTDEP("pre: slc_pmon"); MODULE_SOFTDEP("pre: slc_dummy"); MODULE_SOFTDEP("pre: slc_acpm"); -extern struct kobject *pixel_stat_kobj; - -struct kobject *pixel_stat_gpu_kobj; - -static int mali_pixel_init_pixel_stats(void) -{ - struct kobject *pixel_stat = pixel_stat_kobj; - - WARN_ON(pixel_stat_kobj == NULL); - - pixel_stat_gpu_kobj = kobject_create_and_add("gpu", pixel_stat); - if (!pixel_stat_gpu_kobj) - return -ENOMEM; - - return 0; -} - static int __init mali_pixel_init(void) { int ret = 0; - /* The Pixel Stats Sysfs module needs to be loaded first */ - if (pixel_stat_kobj == NULL) - return -EPROBE_DEFER; - +#ifdef CONFIG_MALI_PIXEL_STATS ret = mali_pixel_init_pixel_stats(); +#endif if (ret) goto fail_pixel_stats; @@ -50,13 +30,23 @@ static int __init mali_pixel_init(void) #ifdef CONFIG_MALI_PRIORITY_CONTROL_MANAGER ret = platform_driver_register(&priority_control_manager_driver); -#else #endif if (ret) goto fail_pcm; +#ifdef CONFIG_MALI_PROTECTED_MEMORY_ALLOCATOR + ret = platform_driver_register(&protected_memory_allocator_driver); +#endif + if (ret) + goto fail_pma; + goto exit; +fail_pma: +#ifdef CONFIG_MALI_PRIORITY_CONTROL_MANAGER + platform_driver_unregister(&priority_control_manager_driver); +#endif + fail_pcm: #ifdef CONFIG_MALI_MEMORY_GROUP_MANAGER platform_driver_unregister(&memory_group_manager_driver); @@ -74,6 +64,9 @@ module_init(mali_pixel_init); static void __exit mali_pixel_exit(void) { +#ifdef CONFIG_MALI_PROTECTED_MEMORY_ALLOCATOR + platform_driver_unregister(&protected_memory_allocator_driver); +#endif #ifdef CONFIG_MALI_PRIORITY_CONTROL_MANAGER 
platform_driver_unregister(&priority_control_manager_driver); #endif diff --git a/mali_pixel/mali_pixel_mod.h b/mali_pixel/mali_pixel_mod.h index 0f5f0d3..3a43c9e 100644 --- a/mali_pixel/mali_pixel_mod.h +++ b/mali_pixel/mali_pixel_mod.h @@ -8,4 +8,12 @@ extern struct platform_driver memory_group_manager_driver; #ifdef CONFIG_MALI_PRIORITY_CONTROL_MANAGER extern struct platform_driver priority_control_manager_driver; -#endif
\ No newline at end of file +#endif + +#ifdef CONFIG_MALI_PROTECTED_MEMORY_ALLOCATOR +extern struct platform_driver protected_memory_allocator_driver; +#endif + +#ifdef CONFIG_MALI_PIXEL_STATS +extern int mali_pixel_init_pixel_stats(void); +#endif diff --git a/mali_pixel/mali_pixel_stats.c b/mali_pixel/mali_pixel_stats.c new file mode 100644 index 0000000..dba388e --- /dev/null +++ b/mali_pixel/mali_pixel_stats.c @@ -0,0 +1,24 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include "mali_pixel_mod.h" +#include <linux/module.h> + +MODULE_SOFTDEP("pre: pixel_stat_sysfs"); + +extern struct kobject *pixel_stat_kobj; + +struct kobject *pixel_stat_gpu_kobj; + +int mali_pixel_init_pixel_stats(void) +{ + struct kobject *pixel_stat = pixel_stat_kobj; + + if (pixel_stat_kobj == NULL) + return -EPROBE_DEFER; + + pixel_stat_gpu_kobj = kobject_create_and_add("gpu", pixel_stat); + if (!pixel_stat_gpu_kobj) + return -ENOMEM; + + return 0; +} diff --git a/mali_pixel/memory_group_manager.c b/mali_pixel/memory_group_manager.c index 5c98a5d..0cde4e0 100644 --- a/mali_pixel/memory_group_manager.c +++ b/mali_pixel/memory_group_manager.c @@ -8,7 +8,7 @@ */ #include <linux/atomic.h> -#ifdef CONFIG_DEBUG_FS +#ifdef CONFIG_MALI_MEMORY_GROUP_MANAGER_DEBUG_FS #include <linux/debugfs.h> #endif #include <linux/fs.h> @@ -19,27 +19,41 @@ #include <linux/platform_device.h> #include <linux/slab.h> #include <linux/version.h> +#include <linux/limits.h> #include <linux/memory_group_manager.h> #include <soc/google/pt.h> +#include <uapi/gpu/arm/midgard/platform/pixel/pixel_memory_group_manager.h> + + +#define ORDER_SMALL_PAGE 0 +#define ORDER_LARGE_PAGE 9 + +/* Borr does not have "real" PBHA support. However, since we only use a 36-bit PA on the bus, + * AxADDR[39:36] is wired up to the GPU AxUSER[PBHA] field seen by the rest of the system. + * Those AxADDR bits come from [39:36] in the page descriptor. + * + * Odin and Turse have "real" PBHA support using a dedicated output signal and page descriptor field. + * The AxUSER[PBHA] field is driven by the GPU's PBHA signal, and AxADDR[39:36] is dropped. + * The page descriptor PBHA field is [62:59]. + * + * We could write to both of these locations, as each SoC only reads from its respective PBHA + * location with the other being ignored or dropped. + * + * b/148988078 contains confirmation of the above description. + */ +#if IS_ENABLED(CONFIG_SOC_GS101) #define PBHA_BIT_POS (36) +#else +#define PBHA_BIT_POS (59) +#endif #define PBHA_BIT_MASK (0xf) #define MGM_PBHA_DEFAULT 0 -#define GROUP_ID_TO_PT_IDX(x) ((x)-1) -/* The Mali driver requires that allocations made on one of the groups - * are not treated specially. - */ -#define MGM_RESERVED_GROUP_ID 0 - -/* Imported memory is handled by the allocator of the memory, and the Mali - * DDK will request a group_id for such memory via mgm_get_import_memory_id(). - * We specify which group we want to use for this here. - */ -#define MGM_IMPORTED_MEMORY_GROUP_ID (MEMORY_GROUP_MANAGER_NR_GROUPS - 1) +#define MGM_SENTINEL_PT_SIZE U64_MAX #define INVALID_GROUP_ID(group_id) \ (WARN_ON((group_id) < 0) || \ @@ -68,8 +82,12 @@ static inline vm_fault_t vmf_insert_pfn_prot(struct vm_area_struct *vma, * @lp_size: The number of allocated large(2MB) pages * @insert_pfn: The number of calls to map pages for CPU access. * @update_gpu_pte: The number of calls to update GPU page table entries. 
- * @ptid: The partition ID for this group + * @ptid: The active partition ID for this group * @pbha: The PBHA bits assigned to this group, + * @base_pt: The base partition ID available to this group. + * @pt_num: The number of partitions available to this group. + * @active_pt_idx: The relative index for the partition backing the group. + * Different from the absolute ptid. * @state: The lifecycle state of the partition associated with this group * This structure allows page allocation information to be displayed via * debugfs. Display is organized per group with small and large sized pages. @@ -77,11 +95,17 @@ static inline vm_fault_t vmf_insert_pfn_prot(struct vm_area_struct *vma, struct mgm_group { atomic_t size; atomic_t lp_size; +#ifdef CONFIG_MALI_MEMORY_GROUP_MANAGER_DEBUG_FS atomic_t insert_pfn; atomic_t update_gpu_pte; +#endif ptid_t ptid; ptpbha_t pbha; + + u32 base_pt; + u32 pt_num; + u32 active_pt_idx; enum { MGM_GROUP_STATE_NEW = 0, MGM_GROUP_STATE_ENABLED = 10, @@ -91,10 +115,23 @@ struct mgm_group { }; /** + * struct partition_stats - Structure for tracking sizing of a partition + * + * @capacity: The total capacity of each partition + * @size: The current size of each partition + */ +struct partition_stats { + u64 capacity; + atomic64_t size; +}; + +/** * struct mgm_groups - Structure for groups of memory group manager * * @groups: To keep track of the number of allocated pages of all groups * @ngroups: Number of groups actually used + * @npartitions: Number of partitions used by all groups combined + * @pt_stats: The sizing info for each partition * @dev: device attached * @pt_handle: Link to SLC partition data * @kobj: &sruct kobject used for linking to pixel_stats_sysfs node @@ -106,10 +143,12 @@ struct mgm_group { struct mgm_groups { struct mgm_group groups[MEMORY_GROUP_MANAGER_NR_GROUPS]; size_t ngroups; + size_t npartitions; + struct partition_stats *pt_stats; struct device *dev; struct pt_handle *pt_handle; struct kobject kobj; -#ifdef CONFIG_DEBUG_FS +#ifdef CONFIG_MALI_MEMORY_GROUP_MANAGER_DEBUG_FS struct dentry *mgm_debugfs_root; #endif }; @@ -118,7 +157,7 @@ struct mgm_groups { * DebugFS */ -#ifdef CONFIG_DEBUG_FS +#ifdef CONFIG_MALI_MEMORY_GROUP_MANAGER_DEBUG_FS static int mgm_debugfs_state_get(void *data, u64 *val) { @@ -249,15 +288,14 @@ static int mgm_debugfs_init(struct mgm_groups *mgm_data) return 0; } -#endif /* CONFIG_DEBUG_FS */ +#endif /* CONFIG_MALI_MEMORY_GROUP_MANAGER_DEBUG_FS */ /* * Pixel Stats sysfs */ -extern struct kobject *pixel_stat_gpu_kobj; +#ifdef CONFIG_MALI_PIXEL_STATS -#define ORDER_SMALL_PAGE 0 -#define ORDER_LARGE_PAGE 9 +extern struct kobject *pixel_stat_gpu_kobj; #define MGM_ATTR_RO(_name) \ static struct kobj_attribute _name##_attr = __ATTR_RO(_name) @@ -343,41 +381,81 @@ static void mgm_sysfs_term(struct mgm_groups *data) kobject_put(&data->kobj); } +#else /* CONFIG_MALI_PIXEL_STATS */ + +static int mgm_sysfs_init(struct mgm_groups *data) +{ + return 0; +} + +static void mgm_sysfs_term(struct mgm_groups *data) +{} + +#endif /* CONFIG_MALI_PIXEL_STATS */ + +static int group_pt_id(struct mgm_groups *data, enum pixel_mgm_group_id group_id, int pt_index) +{ + struct mgm_group *group = &data->groups[group_id]; + if (WARN_ON_ONCE(pt_index >= group->pt_num)) + return 0; + + return group->base_pt + pt_index; +} + +static int group_active_pt_id(struct mgm_groups *data, enum pixel_mgm_group_id group_id) +{ + return group_pt_id(data, group_id, data->groups[group_id].active_pt_idx); +} + static atomic64_t total_gpu_pages = ATOMIC64_INIT(0); 
-static void update_size(struct memory_group_manager_device *mgm_dev, int - group_id, int order, bool alloc) +static atomic_t* get_size_counter(struct memory_group_manager_device* mgm_dev, int group_id, int order) { - static DEFINE_RATELIMIT_STATE(gpu_alloc_rs, 10*HZ, 1); + static atomic_t err_atomic; struct mgm_groups *data = mgm_dev->data; switch (order) { case ORDER_SMALL_PAGE: - if (alloc) { - atomic_inc(&data->groups[group_id].size); - atomic64_inc(&total_gpu_pages); - } else { - WARN_ON(atomic_read(&data->groups[group_id].size) == 0); - atomic_dec(&data->groups[group_id].size); - atomic64_dec(&total_gpu_pages); - } - break; - + return &data->groups[group_id].size; case ORDER_LARGE_PAGE: - if (alloc) { - atomic_inc(&data->groups[group_id].lp_size); - atomic64_add(1 << ORDER_LARGE_PAGE, &total_gpu_pages); - } else { - WARN_ON(atomic_read( - &data->groups[group_id].lp_size) == 0); - atomic_dec(&data->groups[group_id].lp_size); - atomic64_sub(1 << ORDER_LARGE_PAGE, &total_gpu_pages); - } - break; - + return &data->groups[group_id].lp_size; default: dev_err(data->dev, "Unknown order(%d)\n", order); - break; + return &err_atomic; + } +} + +static void update_size(struct memory_group_manager_device *mgm_dev, int + group_id, int order, bool alloc) +{ + static DEFINE_RATELIMIT_STATE(gpu_alloc_rs, 10*HZ, 1); + atomic_t* size = get_size_counter(mgm_dev, group_id, order); + + if (alloc) { + atomic_inc(size); + atomic64_add(1 << order, &total_gpu_pages); + } else { + if (atomic_dec_return(size) < 0) { + /* b/289501175 + * Pages are often 'migrated' to the SLC group, which needs special + * accounting. + * + * TODO: Remove after SLC MGM decoupling b/290354607 + */ + if (!WARN_ON(group_id != MGM_SLC_GROUP_ID)) { + /* Undo the dec, and instead decrement the reserved group counter. + * This is still making the assumption that the migration came from + * the reserved group. Currently this is always true, however it + * might not be in future. It would be invasive and costly to track + * where every page came from, so instead this will be fixed as part + * of the b/290354607 effort. + */ + atomic_inc(size); + update_size(mgm_dev, MGM_RESERVED_GROUP_ID, order, alloc); + return; + } + } + atomic64_sub(1 << order, &total_gpu_pages); } if (atomic64_read(&total_gpu_pages) >= (4 << (30 - PAGE_SHIFT)) && @@ -385,6 +463,185 @@ static void update_size(struct memory_group_manager_device *mgm_dev, int pr_warn("total_gpu_pages %lld\n", atomic64_read(&total_gpu_pages)); } +static void pt_size_invalidate(struct mgm_groups* data, int pt_idx) +{ + /* Set the size to a known sentinel value so that we can later detect an update */ + atomic64_set(&data->pt_stats[pt_idx].size, MGM_SENTINEL_PT_SIZE); +} + +static void pt_size_init(struct mgm_groups* data, int pt_idx, size_t size) +{ + /* The resize callback may have already been executed, which would have set + * the correct size. Only update the size if this has not happened. + * We can tell that no resize took place if the size is still a sentinel. 
+ */ + atomic64_cmpxchg(&data->pt_stats[pt_idx].size, MGM_SENTINEL_PT_SIZE, size); +} + +static void validate_ptid(struct mgm_groups* data, enum pixel_mgm_group_id group_id, int ptid) +{ + if (ptid == -EINVAL) + dev_err(data->dev, "Failed to get partition for group: %d\n", group_id); + else + dev_info(data->dev, "pt_client_mutate returned ptid=%d for group=%d", ptid, group_id); +} + +static void update_group(struct mgm_groups* data, + enum pixel_mgm_group_id group_id, + int ptid, + int relative_pt_idx) +{ + int const abs_pt_idx = group_pt_id(data, group_id, relative_pt_idx); + int const pbha = pt_pbha(data->dev->of_node, abs_pt_idx); + + if (pbha == PT_PBHA_INVALID) + dev_err(data->dev, "Failed to get PBHA for group: %d\n", group_id); + else + dev_info(data->dev, "pt_pbha returned PBHA=%d for group=%d", pbha, group_id); + + data->groups[group_id].ptid = ptid; + data->groups[group_id].pbha = pbha; + data->groups[group_id].state = MGM_GROUP_STATE_ENABLED; + data->groups[group_id].active_pt_idx = relative_pt_idx; +} + +static void disable_partition(struct mgm_groups* data, enum pixel_mgm_group_id group_id) +{ + int const active_idx = group_active_pt_id(data, group_id); + + /* Skip if not already enabled */ + if (data->groups[group_id].state != MGM_GROUP_STATE_ENABLED) + return; + + pt_client_disable_no_free(data->pt_handle, active_idx); + data->groups[group_id].state = MGM_GROUP_STATE_DISABLED_NOT_FREED; + + pt_size_invalidate(data, active_idx); + pt_size_init(data, active_idx, 0); +} + +static void enable_partition(struct mgm_groups* data, enum pixel_mgm_group_id group_id) +{ + int ptid; + size_t size = 0; + int const active_idx = group_active_pt_id(data, group_id); + + /* Skip if already enabled */ + if (data->groups[group_id].state == MGM_GROUP_STATE_ENABLED) + return; + + pt_size_invalidate(data, active_idx); + + ptid = pt_client_enable_size(data->pt_handle, active_idx, &size); + + validate_ptid(data, group_id, ptid); + + update_group(data, group_id, ptid, data->groups[group_id].active_pt_idx); + + pt_size_init(data, active_idx, size); +} + +static void set_group_partition(struct mgm_groups* data, + enum pixel_mgm_group_id group_id, + int new_pt_index) +{ + int ptid; + size_t size = 0; + int const active_idx = group_active_pt_id(data, group_id); + int const new_idx = group_pt_id(data, group_id, new_pt_index); + + /* Early out if no changes are needed */ + if (new_idx == active_idx) + return; + + pt_size_invalidate(data, new_idx); + + ptid = pt_client_mutate_size(data->pt_handle, active_idx, new_idx, &size); + + validate_ptid(data, group_id, ptid); + + update_group(data, group_id, ptid, new_pt_index); + + pt_size_init(data, new_idx, size); + /* Reset old partition size */ + atomic64_set(&data->pt_stats[active_idx].size, data->pt_stats[active_idx].capacity); +} + +u64 pixel_mgm_query_group_size(struct memory_group_manager_device* mgm_dev, + enum pixel_mgm_group_id group_id) +{ + struct mgm_groups *data; + struct mgm_group *group; + u64 size = 0; + + /* Early out if the group doesn't exist */ + if (INVALID_GROUP_ID(group_id)) + goto done; + + data = mgm_dev->data; + group = &data->groups[group_id]; + + /* Early out if the group has no partitions */ + if (group->pt_num == 0) + goto done; + + size = atomic64_read(&data->pt_stats[group_active_pt_id(data, group_id)].size); + +done: + return size; +} +EXPORT_SYMBOL(pixel_mgm_query_group_size); + +void pixel_mgm_resize_group_to_fit(struct memory_group_manager_device* mgm_dev, + enum pixel_mgm_group_id group_id, + u64 demand) +{ + struct mgm_groups 
*data; + struct mgm_group *group; + s64 diff, cur_size, min_diff = S64_MAX; + int pt_idx; + + /* Early out if the group doesn't exist */ + if (INVALID_GROUP_ID(group_id)) + goto done; + + data = mgm_dev->data; + group = &data->groups[group_id]; + + /* Early out if the group has no partitions */ + if (group->pt_num == 0) + goto done; + + /* We can disable the partition if there's no demand */ + if (demand == 0) + { + disable_partition(data, group_id); + goto done; + } + + /* Calculate best partition to use, by finding the nearest capacity */ + for (pt_idx = 0; pt_idx < group->pt_num; ++pt_idx) + { + cur_size = data->pt_stats[group_pt_id(data, group_id, pt_idx)].capacity; + diff = abs(demand - cur_size); + + if (diff > min_diff) + break; + + min_diff = diff; + } + + /* Ensure the partition is enabled before trying to mutate it */ + enable_partition(data, group_id); + set_group_partition(data, group_id, pt_idx - 1); + +done: + dev_dbg(data->dev, "%s: resized memory_group_%d for demand: %lldB", __func__, group_id, demand); + + return; +} +EXPORT_SYMBOL(pixel_mgm_resize_group_to_fit); + static struct page *mgm_alloc_page( struct memory_group_manager_device *mgm_dev, int group_id, gfp_t gfp_mask, unsigned int order) @@ -400,7 +657,7 @@ static struct page *mgm_alloc_page( return NULL; if (WARN_ON_ONCE((group_id != MGM_RESERVED_GROUP_ID) && - (GROUP_ID_TO_PT_IDX(group_id) >= data->ngroups))) + (group_active_pt_id(data, group_id) >= data->npartitions))) return NULL; /* We don't expect to be allocting pages into the group used for @@ -413,38 +670,9 @@ static struct page *mgm_alloc_page( * ensure that we have enabled the relevant partitions for it. */ if (group_id != MGM_RESERVED_GROUP_ID) { - int ptid, pbha; switch (data->groups[group_id].state) { case MGM_GROUP_STATE_NEW: - ptid = pt_client_enable(data->pt_handle, - GROUP_ID_TO_PT_IDX(group_id)); - if (ptid == -EINVAL) { - dev_err(data->dev, - "Failed to get partition for group: " - "%d\n", group_id); - } else { - dev_info(data->dev, - "pt_client_enable returned ptid=%d for" - " group=%d", - ptid, group_id); - } - - pbha = pt_pbha(data->dev->of_node, - GROUP_ID_TO_PT_IDX(group_id)); - if (pbha == PT_PBHA_INVALID) { - dev_err(data->dev, - "Failed to get PBHA for group: %d\n", - group_id); - } else { - dev_info(data->dev, - "pt_pbha returned PBHA=%d for group=%d", - pbha, group_id); - } - - data->groups[group_id].ptid = ptid; - data->groups[group_id].pbha = pbha; - data->groups[group_id].state = MGM_GROUP_STATE_ENABLED; - + enable_partition(data, group_id); break; case MGM_GROUP_STATE_ENABLED: case MGM_GROUP_STATE_DISABLED_NOT_FREED: @@ -534,7 +762,7 @@ static u64 mgm_update_gpu_pte( switch (group_id) { case MGM_RESERVED_GROUP_ID: - case MGM_IMPORTED_MEMORY_GROUP_ID: + case MGM_IMPORTED_MEMORY_GROUP_ID: /* The reserved group doesn't set PBHA bits */ /* TODO: Determine what to do with imported memory */ break; @@ -558,7 +786,35 @@ static u64 mgm_update_gpu_pte( } } +#ifdef CONFIG_MALI_MEMORY_GROUP_MANAGER_DEBUG_FS atomic_inc(&data->groups[group_id].update_gpu_pte); +#endif + + return pte; +} + +static u64 mgm_pte_to_original_pte(struct memory_group_manager_device *mgm_dev, int group_id, + int mmu_level, u64 pte) +{ + struct mgm_groups *const data = mgm_dev->data; + u64 old_pte; + + if (INVALID_GROUP_ID(group_id)) + return pte; + + switch (group_id) { + case MGM_RESERVED_GROUP_ID: + case MGM_IMPORTED_MEMORY_GROUP_ID: + /* The reserved group doesn't set PBHA bits */ + /* TODO: Determine what to do with imported memory */ + break; + default: + /* All 
other groups will have PBHA bits, so clear them */ + old_pte = pte; + pte &= ~((u64)PBHA_BIT_MASK << PBHA_BIT_POS); + dev_dbg(data->dev, "%s: group_id=%d pte=0x%llx -> 0x%llx\n", __func__, group_id, + old_pte, pte); + } return pte; } @@ -582,57 +838,105 @@ static vm_fault_t mgm_vmf_insert_pfn_prot( fault = vmf_insert_pfn_prot(vma, addr, pfn, prot); - if (fault == VM_FAULT_NOPAGE) - atomic_inc(&data->groups[group_id].insert_pfn); - else + if (fault != VM_FAULT_NOPAGE) dev_err(data->dev, "vmf_insert_pfn_prot failed\n"); +#ifdef CONFIG_MALI_MEMORY_GROUP_MANAGER_DEBUG_FS + else + atomic_inc(&data->groups[group_id].insert_pfn); +#endif return fault; } static void mgm_resize_callback(void *data, int id, size_t size_allocated) { - /* Currently we don't do anything on partition resize */ struct mgm_groups *const mgm_data = (struct mgm_groups *)data; - dev_dbg(mgm_data->dev, "Resize callback called, size_allocated: %zu\n", - size_allocated); + dev_dbg(mgm_data->dev, "Resize callback called, size_allocated: %zu\n", size_allocated); + /* Update the partition size for the group */ + atomic64_set(&mgm_data->pt_stats[id].size, size_allocated); } static int mgm_initialize_data(struct mgm_groups *mgm_data) { int i, ret; - const int ngroups = of_property_count_strings(mgm_data->dev->of_node, "pt_id"); + /* +1 to include the required default group */ + const int ngroups = of_property_count_strings(mgm_data->dev->of_node, "groups") + 1; if (WARN_ON(ngroups < 0) || WARN_ON(ngroups > MEMORY_GROUP_MANAGER_NR_GROUPS)) { mgm_data->ngroups = 0; } else { mgm_data->ngroups = ngroups; } + mgm_data->npartitions = of_property_count_strings(mgm_data->dev->of_node, "pt_id"); + + mgm_data->pt_stats = kzalloc(mgm_data->npartitions * sizeof(struct partition_stats), GFP_KERNEL); + if (mgm_data->pt_stats == NULL) { + dev_err(mgm_data->dev, "failed to allocate space for pt_stats"); + ret = -ENOMEM; + goto out_err; + } + + for (i = 0; i < mgm_data->npartitions; i++) { + struct partition_stats* stats; + u32 capacity_kb; + ret = of_property_read_u32_index(mgm_data->dev->of_node, "pt_size", i, &capacity_kb); + if (ret) { + dev_err(mgm_data->dev, "failed to read pt_size[%d]", i); + continue; + } + + stats = &mgm_data->pt_stats[i]; + // Convert from KB to bytes + stats->capacity = (u64)capacity_kb << 10; + atomic64_set(&stats->size, stats->capacity); + } for (i = 0; i < MEMORY_GROUP_MANAGER_NR_GROUPS; i++) { atomic_set(&mgm_data->groups[i].size, 0); atomic_set(&mgm_data->groups[i].lp_size, 0); +#ifdef CONFIG_MALI_MEMORY_GROUP_MANAGER_DEBUG_FS atomic_set(&mgm_data->groups[i].insert_pfn, 0); atomic_set(&mgm_data->groups[i].update_gpu_pte, 0); +#endif mgm_data->groups[i].pbha = MGM_PBHA_DEFAULT; + mgm_data->groups[i].base_pt = 0; + mgm_data->groups[i].pt_num = 0; + mgm_data->groups[i].active_pt_idx = 0; mgm_data->groups[i].state = MGM_GROUP_STATE_NEW; } + /* Discover the partitions belonging to each memory group, skipping the reserved group */ + for (i = 1; i < mgm_data->ngroups; i++) { + /* Device tree has no description for the reserved group */ + int const dt_idx = i - 1; + + int err = of_property_read_u32_index( + mgm_data->dev->of_node, "group_base_pt", dt_idx, &mgm_data->groups[i].base_pt); + if (err) { + dev_warn(mgm_data->dev, "failed to read base pt index for group %d", i); + continue; + } + + err = of_property_read_u32_index( + mgm_data->dev->of_node, "group_pt_num", dt_idx, &mgm_data->groups[i].pt_num); + if (err) + dev_warn(mgm_data->dev, "failed to read pt number for group %d", i); + } + /* * Initialize SLC partitions. 
We don't enable partitions until * we actually allocate memory to the corresponding memory * group */ - mgm_data->pt_handle = pt_client_register( - mgm_data->dev->of_node, - (void *)mgm_data, &mgm_resize_callback); + mgm_data->pt_handle = + pt_client_register(mgm_data->dev->of_node, (void*)mgm_data, &mgm_resize_callback); if (IS_ERR(mgm_data->pt_handle)) { ret = PTR_ERR(mgm_data->pt_handle); dev_err(mgm_data->dev, "pt_client_register returned %d\n", ret); - return ret; + goto out_err; } /* We don't use PBHA bits for the reserved memory group, and so @@ -640,13 +944,26 @@ static int mgm_initialize_data(struct mgm_groups *mgm_data) */ mgm_data->groups[MGM_RESERVED_GROUP_ID].state = MGM_GROUP_STATE_ENABLED; - ret = mgm_debugfs_init(mgm_data); - if (ret) - goto out; + if ((ret = mgm_debugfs_init(mgm_data))) + goto out_err; - ret = mgm_sysfs_init(mgm_data); + if ((ret = mgm_sysfs_init(mgm_data))) + goto out_err; + +#ifdef CONFIG_MALI_PIXEL_GPU_SLC + /* We enable the SLC partition by default to support dynamic SLC caching. + * Enabling will initialize the partition, by querying the pbha and assigning a ptid. + * We then immediately disable the partition, effectively resizing the group to zero, + * whilst still retaining other properties such as pbha. + */ + enable_partition(mgm_data, MGM_SLC_GROUP_ID); + disable_partition(mgm_data, MGM_SLC_GROUP_ID); +#endif -out: + return ret; + +out_err: + kfree(mgm_data->pt_stats); return ret; } @@ -677,8 +994,10 @@ static void mgm_term_data(struct mgm_groups *data) break; case MGM_GROUP_STATE_ENABLED: + pt_client_disable(data->pt_handle, group_active_pt_id(data, i)); + break; case MGM_GROUP_STATE_DISABLED_NOT_FREED: - pt_client_free(data->pt_handle, group->ptid); + pt_client_free(data->pt_handle, group_active_pt_id(data, i)); break; default: @@ -704,12 +1023,14 @@ static int memory_group_manager_probe(struct platform_device *pdev) return -ENOMEM; mgm_dev->owner = THIS_MODULE; - mgm_dev->ops.mgm_alloc_page = mgm_alloc_page; - mgm_dev->ops.mgm_free_page = mgm_free_page; - mgm_dev->ops.mgm_get_import_memory_id = - mgm_get_import_memory_id; - mgm_dev->ops.mgm_vmf_insert_pfn_prot = mgm_vmf_insert_pfn_prot; - mgm_dev->ops.mgm_update_gpu_pte = mgm_update_gpu_pte; + mgm_dev->ops = (struct memory_group_manager_ops){ + .mgm_alloc_page = mgm_alloc_page, + .mgm_free_page = mgm_free_page, + .mgm_get_import_memory_id = mgm_get_import_memory_id, + .mgm_update_gpu_pte = mgm_update_gpu_pte, + .mgm_pte_to_original_pte = mgm_pte_to_original_pte, + .mgm_vmf_insert_pfn_prot = mgm_vmf_insert_pfn_prot, + }; mgm_data = kzalloc(sizeof(*mgm_data), GFP_KERNEL); if (!mgm_data) { diff --git a/mali_pixel/protected_memory_allocator.c b/mali_pixel/protected_memory_allocator.c new file mode 100644 index 0000000..25b5bde --- /dev/null +++ b/mali_pixel/protected_memory_allocator.c @@ -0,0 +1,580 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright 2021 Google LLC. + * + * Protected memory allocator driver for allocation and release of pages of + * protected memory for use by Mali GPU device drivers. 
+ */ + +#include <linux/dma-buf.h> +#include <linux/dma-heap.h> +#include <linux/of.h> +#include <linux/module.h> +#include <linux/platform_device.h> +#include <linux/protected_memory_allocator.h> +#include <linux/slab.h> + +#define MALI_PMA_DMA_HEAP_NAME "vframe-secure" +#define MALI_PMA_SLAB_SIZE (1 << 16) +#define MALI_PMA_SLAB_BLOCK_SIZE (PAGE_SIZE) +#define MALI_PMA_SLAB_BLOCK_COUNT \ + (MALI_PMA_SLAB_SIZE / MALI_PMA_SLAB_BLOCK_SIZE) +#define MALI_PMA_MAX_ALLOC_SIZE (MALI_PMA_SLAB_SIZE) + +/** + * struct mali_pma_dev - Structure for managing a Mali protected memory + * allocator device. + * + * @pma_dev: The base protected memory allocator device. + * @dev: The device for which to allocate protected memory. + * @dma_heap: The DMA buffer heap from which to allocate protected memory. + * @slab_list: List of allocated slabs of protected memory. + * @slab_mutex: Mutex used to serialize access to the slab list. + */ +struct mali_pma_dev { + struct protected_memory_allocator_device pma_dev; + struct device *dev; + struct dma_heap *dma_heap; + struct list_head slab_list; + struct mutex slab_mutex; +}; + +/** + * struct mali_protected_memory_allocation - Structure for tracking a Mali + * protected memory allocation. + * + * @pma: The base protected memory allocation record. + * @slab: Protected memory slab used for allocation. + * @first_block_index: Index of first memory block allocated from the slab. + * @block_count: Count of the number of blocks allocated from the slab. + */ +struct mali_protected_memory_allocation { + struct protected_memory_allocation pma; + struct mali_pma_slab *slab; + int first_block_index; + int block_count; +}; + +/** + * struct mali_pma_slab - Structure for managing a slab of Mali protected + * memory. + * + * @list_entry: Entry in slab list. + * @base: Physical base address of slab memory. + * @dma_buf: The DMA buffer allocated for the slab . A reference to the DMA + * buffer is held by this pointer. + * @dma_attachment: The DMA buffer device attachment. + * @dma_sg_table: The DMA buffer scatter/gather table. + * @allocated_block_map: Bit map of allocated blocks in the slab. 
+ */ +struct mali_pma_slab { + struct list_head list_entry; + phys_addr_t base; + struct dma_buf *dma_buf; + struct dma_buf_attachment *dma_attachment; + struct sg_table *dma_sg_table; + uint64_t allocated_block_map; +}; +static_assert(8 * sizeof(((struct mali_pma_slab *) 0)->allocated_block_map) >= + MALI_PMA_SLAB_BLOCK_COUNT); + +static struct protected_memory_allocation *mali_pma_alloc_page( + struct protected_memory_allocator_device *pma_dev, + unsigned int order); + +static phys_addr_t mali_pma_get_phys_addr( + struct protected_memory_allocator_device *pma_dev, + struct protected_memory_allocation *pma); + +static void mali_pma_free_page( + struct protected_memory_allocator_device *pma_dev, + struct protected_memory_allocation *pma); + +static bool mali_pma_slab_alloc( + struct mali_pma_dev* mali_pma_dev, + struct mali_protected_memory_allocation *mali_pma, size_t size); + +static void mali_pma_slab_dealloc( + struct mali_pma_dev* mali_pma_dev, + struct mali_protected_memory_allocation *mali_pma); + +static bool mali_pma_slab_find_available( + struct mali_pma_dev* mali_pma_dev, size_t size, + struct mali_pma_slab** p_slab, int* p_block_index); + +static struct mali_pma_slab* mali_pma_slab_add( + struct mali_pma_dev* mali_pma_dev); + +static void mali_pma_slab_remove( + struct mali_pma_dev* mali_pma_dev, struct mali_pma_slab* slab); + +static int protected_memory_allocator_probe(struct platform_device *pdev); + +static int protected_memory_allocator_remove(struct platform_device *pdev); + +/** + * mali_pma_alloc_page - Allocate protected memory pages + * + * @pma_dev: The protected memory allocator the request is being made + * through. + * @order: How many pages to allocate, as a base-2 logarithm. + * + * Return: Pointer to allocated memory, or NULL if allocation failed. + */ +static struct protected_memory_allocation *mali_pma_alloc_page( + struct protected_memory_allocator_device *pma_dev, + unsigned int order) { + struct mali_pma_dev *mali_pma_dev; + struct protected_memory_allocation* pma = NULL; + struct mali_protected_memory_allocation *mali_pma; + size_t alloc_size; + bool succeeded = false; + + /* Get the Mali protected memory allocator device record. */ + mali_pma_dev = container_of(pma_dev, struct mali_pma_dev, pma_dev); + + /* Check requested size against the maximum size. */ + alloc_size = 1 << (PAGE_SHIFT + order); + if (alloc_size > MALI_PMA_MAX_ALLOC_SIZE) { + dev_err(mali_pma_dev->dev, + "Protected memory allocation size %zu too big\n", + alloc_size); + goto out; + } + + /* Allocate a Mali protected memory allocation record. */ + mali_pma = devm_kzalloc( + mali_pma_dev->dev, sizeof(*mali_pma), GFP_KERNEL); + if (!mali_pma) { + dev_err(mali_pma_dev->dev, + "Failed to allocate a Mali protected memory allocation " + "record\n"); + goto out; + } + pma = &(mali_pma->pma); + pma->order = order; + + /* Allocate Mali protected memory from a slab. */ + if (!mali_pma_slab_alloc(mali_pma_dev, mali_pma, alloc_size)) { + dev_err(mali_pma_dev->dev, + "Failed to allocate Mali protected memory.\n"); + goto out; + } + + /* Mark the allocation as successful. */ + succeeded = true; + +out: + /* Clean up on error. */ + if (!succeeded) { + if (pma) { + mali_pma_free_page(pma_dev, pma); + pma = NULL; + } + } + + return pma; +} + +/** + * mali_pma_get_phys_addr - Get the physical address of the protected memory + * allocation + * + * @pma_dev: The protected memory allocator the request is being made + * through. 
+ * @pma: The protected memory allocation whose physical address + * shall be retrieved + * + * Return: The physical address of the given allocation. + */ +static phys_addr_t mali_pma_get_phys_addr( + struct protected_memory_allocator_device *pma_dev, + struct protected_memory_allocation *pma) { + return pma->pa; +} + +/** + * mali_pma_free_page - Free a page of memory + * + * @pma_dev: The protected memory allocator the request is being made + * through. + * @pma: The protected memory allocation to free. + */ +static void mali_pma_free_page( + struct protected_memory_allocator_device *pma_dev, + struct protected_memory_allocation *pma) { + struct mali_pma_dev *mali_pma_dev; + struct mali_protected_memory_allocation *mali_pma; + + /* + * Get the Mali protected memory allocator device record and allocation + * record. + */ + mali_pma_dev = container_of(pma_dev, struct mali_pma_dev, pma_dev); + mali_pma = + container_of(pma, struct mali_protected_memory_allocation, pma); + + /* Deallocate Mali protected memory from the slab. */ + mali_pma_slab_dealloc(mali_pma_dev, mali_pma); + + /* Deallocate the Mali protected memory allocation record. */ + devm_kfree(mali_pma_dev->dev, mali_pma); +} + +/** + * mali_pma_slab_alloc - Allocate protected memory from a slab + * + * @mali_pma_dev: Mali protected memory allocator device. + * @mali_pma: Mali protected memory allocation record to hold the slab memory. + * @size: Size in bytes of memory to allocate. + * + * Return: True if memory was successfully allocated. + */ +static bool mali_pma_slab_alloc( + struct mali_pma_dev *mali_pma_dev, + struct mali_protected_memory_allocation *mali_pma, size_t size) { + struct mali_pma_slab *slab; + int start_block; + int block_count; + bool succeeded = false; + + /* Lock the slab list. */ + mutex_lock(&(mali_pma_dev->slab_mutex)); + + /* + * Try finding an existing slab from which to allocate. If none are + * available, add a new slab and allocate from it. + */ + if (!mali_pma_slab_find_available( + mali_pma_dev, size, &slab, &start_block)) { + slab = mali_pma_slab_add(mali_pma_dev); + if (!slab) { + goto out; + } + start_block = 0; + } + + /* Allocate a contiguous set of blocks from the slab. */ + block_count = DIV_ROUND_UP(size, MALI_PMA_SLAB_BLOCK_SIZE); + bitmap_set((unsigned long *) &(slab->allocated_block_map), + start_block, block_count); + + /* + * Use the allocated slab memory for the Mali protected memory + * allocation. + */ + mali_pma->pma.pa = + slab->base + (start_block * MALI_PMA_SLAB_BLOCK_SIZE); + mali_pma->slab = slab; + mali_pma->first_block_index = start_block; + mali_pma->block_count = block_count; + + /* Mark the allocation as successful. */ + succeeded = true; + +out: + /* Unlock the slab list. */ + mutex_unlock(&(mali_pma_dev->slab_mutex)); + + return succeeded; +} + +/** + * mali_pma_slab_dealloc - Deallocate protected memory from a slab + * + * @mali_pma_dev: Mali protected memory allocator device. + * @mali_pma: Mali protected memory allocation record holding slab memory to + * deallocate. + */ +static void mali_pma_slab_dealloc( + struct mali_pma_dev *mali_pma_dev, + struct mali_protected_memory_allocation *mali_pma) { + struct mali_pma_slab *slab; + + /* Lock the slab list. */ + mutex_lock(&(mali_pma_dev->slab_mutex)); + + /* Get the slab. */ + slab = mali_pma->slab; + + /* Deallocate the slab. */ + if (slab != NULL) { + /* Deallocate all the blocks in the slab. 
*/ + bitmap_clear((unsigned long *) &(slab->allocated_block_map), + mali_pma->first_block_index, + mali_pma->block_count); + + /* If no slab blocks remain allocated, remove the slab. */ + if (bitmap_empty( + (unsigned long *) &(slab->allocated_block_map), + MALI_PMA_SLAB_BLOCK_COUNT)) { + mali_pma_slab_remove(mali_pma_dev, slab); + } + } + + /* Unlock the slab list. */ + mutex_unlock(&(mali_pma_dev->slab_mutex)); +} + +/** + * mali_pma_slab_find_available - Find a slab with available memory + * + * Must be called with the slab list mutex locked. + * + * @mali_pma_dev: Mali protected memory allocator device. + * @size: Size in bytes of requested memory. + * @p_slab: Returned slab with requested memory available. + * @p_block_index: Returned starting block index of available memory. + * + * Return: True if a slab was found with the requested memory available. + */ +static bool mali_pma_slab_find_available( + struct mali_pma_dev *mali_pma_dev, size_t size, + struct mali_pma_slab **p_slab, int *p_block_index) { + struct mali_pma_slab *slab; + int block_count; + int start_block; + bool found = false; + + /* Ensure the slab list mutex is locked. */ + lockdep_assert_held(&(mali_pma_dev->slab_mutex)); + + /* Search slabs for a contiguous set of blocks of the requested size. */ + block_count = DIV_ROUND_UP(size, MALI_PMA_SLAB_BLOCK_SIZE); + list_for_each_entry(slab, &(mali_pma_dev->slab_list), list_entry) { + start_block = bitmap_find_next_zero_area_off( + (unsigned long *) &(slab->allocated_block_map), + MALI_PMA_SLAB_BLOCK_COUNT, 0, block_count, 0, 0); + if (start_block < MALI_PMA_SLAB_BLOCK_COUNT) { + found = true; + break; + } + } + + /* Return results if found. */ + if (found) { + *p_slab = slab; + *p_block_index = start_block; + } + + return found; +} + +/** + * mali_pma_slab_add - Allocate and add a new slab + * + * Must be called with the slab list mutex locked. + * + * @mali_pma_dev: Mali protected memory allocator device. + * + * Return: Newly added slab. + */ +static struct mali_pma_slab *mali_pma_slab_add( + struct mali_pma_dev *mali_pma_dev) { + struct mali_pma_slab *slab = NULL; + struct dma_buf *dma_buf; + struct dma_buf_attachment *dma_attachment; + struct sg_table *dma_sg_table; + bool succeeded = false; + + /* Ensure the slab list mutex is locked. */ + lockdep_assert_held(&(mali_pma_dev->slab_mutex)); + + /* Allocate and initialize a Mali protected memory slab record. */ + slab = devm_kzalloc(mali_pma_dev->dev, sizeof(*slab), GFP_KERNEL); + if (!slab) { + dev_err(mali_pma_dev->dev, + "Failed to allocate a Mali protected memory slab.\n"); + goto out; + } + INIT_LIST_HEAD(&(slab->list_entry)); + + /* Allocate a DMA buffer. */ + dma_buf = dma_heap_buffer_alloc( + mali_pma_dev->dma_heap, MALI_PMA_SLAB_SIZE, O_RDWR, 0); + if (IS_ERR(dma_buf)) { + dev_err(mali_pma_dev->dev, + "Failed to allocate a DMA buffer of size %d\n", + MALI_PMA_SLAB_SIZE); + goto out; + } + slab->dma_buf = dma_buf; + + /* Attach the device to the DMA buffer. */ + dma_attachment = dma_buf_attach(dma_buf, mali_pma_dev->dev); + if (IS_ERR(dma_attachment)) { + dev_err(mali_pma_dev->dev, + "Failed to attach the device to the DMA buffer\n"); + goto out; + } + slab->dma_attachment = dma_attachment; + + /* Map the DMA buffer into the attached device address space. 
*/ + dma_sg_table = + dma_buf_map_attachment(dma_attachment, DMA_BIDIRECTIONAL); + if (IS_ERR(dma_sg_table)) { + dev_err(mali_pma_dev->dev, "Failed to map the DMA buffer\n"); + goto out; + } + slab->dma_sg_table = dma_sg_table; + slab->base = page_to_phys(sg_page(dma_sg_table->sgl)); + + /* Add the slab to the slab list. */ + list_add(&(slab->list_entry), &(mali_pma_dev->slab_list)); + + /* Mark that the slab was successfully added. */ + succeeded = true; + +out: + /* Clean up on failure. */ + if (!succeeded && (slab != NULL)) { + mali_pma_slab_remove(mali_pma_dev, slab); + slab = NULL; + } + + return slab; +} + +/** + * mali_pma_slab_remove - Remove and deallocate a slab + * + * Must be called with the slab list mutex locked. + * + * @mali_pma_dev: Mali protected memory allocator device. + * @slab: Slab to remove and deallocate. + */ +static void mali_pma_slab_remove( + struct mali_pma_dev *mali_pma_dev, struct mali_pma_slab *slab) { + /* Ensure the slab list mutex is locked. */ + lockdep_assert_held(&(mali_pma_dev->slab_mutex)); + + /* Free the Mali protected memory slab allocation. */ + if (slab->dma_sg_table) { + dma_buf_unmap_attachment( + slab->dma_attachment, + slab->dma_sg_table, DMA_BIDIRECTIONAL); + } + if (slab->dma_attachment) { + dma_buf_detach(slab->dma_buf, slab->dma_attachment); + } + if (slab->dma_buf) { + dma_buf_put(slab->dma_buf); + } + + /* Remove the slab from the slab list. */ + list_del(&(slab->list_entry)); + + /* Deallocate the Mali protected memory slab record. */ + devm_kfree(mali_pma_dev->dev, slab); +} + +/** + * protected_memory_allocator_probe - Probe the protected memory allocator + * device + * + * @pdev: The platform device to probe. + */ +static int protected_memory_allocator_probe(struct platform_device *pdev) +{ + struct dma_heap *pma_heap; + struct mali_pma_dev *mali_pma_dev; + struct protected_memory_allocator_device *pma_dev; + int ret = 0; + + /* Try locating a PMA heap, defer if not present (yet). */ + pma_heap = dma_heap_find(MALI_PMA_DMA_HEAP_NAME); + if (!pma_heap) { + dev_warn(&(pdev->dev), + "Failed to find \"%s\" DMA buffer heap. Deferring.\n", + MALI_PMA_DMA_HEAP_NAME); + ret = -EPROBE_DEFER; + goto out; + } + + /* Create a Mali protected memory allocator device record. */ + mali_pma_dev = kzalloc(sizeof(*mali_pma_dev), GFP_KERNEL); + if (!mali_pma_dev) { + dev_err(&(pdev->dev), + "Failed to create a Mali protected memory allocator " + "device record\n"); + dma_heap_put(pma_heap); + ret = -ENOMEM; + goto out; + } + pma_dev = &(mali_pma_dev->pma_dev); + platform_set_drvdata(pdev, pma_dev); + + /* Initialize the slab list. */ + INIT_LIST_HEAD(&(mali_pma_dev->slab_list)); + mutex_init(&(mali_pma_dev->slab_mutex)); + + /* Configure the Mali protected memory allocator. */ + mali_pma_dev->dev = &(pdev->dev); + pma_dev->owner = THIS_MODULE; + pma_dev->ops.pma_alloc_page = mali_pma_alloc_page; + pma_dev->ops.pma_get_phys_addr = mali_pma_get_phys_addr; + pma_dev->ops.pma_free_page = mali_pma_free_page; + + /* Assign the DMA buffer heap. */ + mali_pma_dev->dma_heap = pma_heap; + + /* Log that the protected memory allocator was successfully probed. */ + dev_info(&(pdev->dev), + "Protected memory allocator probed successfully\n"); + +out: + return ret; +} + +/** + * protected_memory_allocator_remove - Remove the protected memory allocator + * device + * + * @pdev: The protected memory allocator platform device to remove. 
+ */ +static int protected_memory_allocator_remove(struct platform_device *pdev) +{ + struct protected_memory_allocator_device *pma_dev; + struct mali_pma_dev *mali_pma_dev; + + /* Get the Mali protected memory allocator device record. */ + pma_dev = platform_get_drvdata(pdev); + if (!pma_dev) { + return 0; + } + mali_pma_dev = container_of(pma_dev, struct mali_pma_dev, pma_dev); + + /* Warn if there are any outstanding protected memory slabs. */ + if (!list_empty(&(mali_pma_dev->slab_list))) { + dev_warn(&(pdev->dev), + "Some protected memory has been left allocated\n"); + } + + /* Release the DMA buffer heap. */ + if (mali_pma_dev->dma_heap) { + dma_heap_put(mali_pma_dev->dma_heap); + } + + /* Free the Mali protected memory allocator device record. */ + kfree(mali_pma_dev); + + return 0; +} + +static const struct of_device_id protected_memory_allocator_dt_ids[] = { + { .compatible = "arm,protected-memory-allocator" }, + { /* sentinel */ } +}; +MODULE_DEVICE_TABLE(of, protected_memory_allocator_dt_ids); + +struct platform_driver protected_memory_allocator_driver = { + .probe = protected_memory_allocator_probe, + .remove = protected_memory_allocator_remove, + .driver = { + .name = "mali-pma", + .owner = THIS_MODULE, + .of_match_table = of_match_ptr(protected_memory_allocator_dt_ids), + .suppress_bind_attrs = true, + } +}; + |
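Editor's note: the memory_group_manager.c hunks above (mgm_update_gpu_pte() and the new mgm_pte_to_original_pte()) revolve around packing a 4-bit PBHA value into the upper bits of a GPU page table entry and stripping it back out. The snippet below is a minimal, standalone userspace sketch of that bit manipulation only, written to match the PBHA_BIT_POS / PBHA_BIT_MASK layout described in the diff (36 on gs101 "Borr", 59 on Odin/Turse); the helper names, the test harness, and the example PBHA and PTE values are illustrative and are not part of the patch.

    #include <stdint.h>
    #include <stdio.h>

    #define PBHA_BIT_POS  59          /* 36 when CONFIG_SOC_GS101 is set, per the diff */
    #define PBHA_BIT_MASK 0xfULL      /* PBHA is a 4-bit field */

    /* Mirror of what mgm_update_gpu_pte() does for non-reserved groups:
     * clear any stale PBHA bits, then OR in the group's PBHA value. */
    static uint64_t pte_set_pbha(uint64_t pte, uint64_t pbha)
    {
            pte &= ~(PBHA_BIT_MASK << PBHA_BIT_POS);
            return pte | ((pbha & PBHA_BIT_MASK) << PBHA_BIT_POS);
    }

    /* Mirror of mgm_pte_to_original_pte(): drop the PBHA field so the
     * original PTE value is recovered. */
    static uint64_t pte_clear_pbha(uint64_t pte)
    {
            return pte & ~(PBHA_BIT_MASK << PBHA_BIT_POS);
    }

    int main(void)
    {
            uint64_t pte = 0x0000000012345000ULL;   /* example PTE, illustrative only */
            uint64_t tagged = pte_set_pbha(pte, 0x3);

            printf("tagged   = 0x%016llx\n", (unsigned long long)tagged);
            printf("restored = 0x%016llx\n", (unsigned long long)pte_clear_pbha(tagged));
            return 0;
    }

As the comment in the diff explains, the two positions exist because gs101 derives AxUSER[PBHA] from AxADDR[39:36] of the page descriptor, while later SoCs carry PBHA in a dedicated descriptor field at bits [62:59]; only the position differs, the encode/strip logic is the same.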